July 29, 2019

Paper Group ANR 72

A New Urban Objects Detection Framework Using Weakly Annotated Sets. Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition. Measuring Catastrophic Forgetting in Neural Networks. KEPLER: Keypoint and Pose Estimation of Unconstrained Faces by Learning Efficient H-CNN Regressors. ROAD: Reality Oriented Adaptation for Sema …

A New Urban Objects Detection Framework Using Weakly Annotated Sets


Title	A New Urban Objects Detection Framework Using Weakly Annotated Sets
Authors	Eric Keiji, Gabriel Ferreira, Claudio Silva, Roberto M. Cesar Jr
Abstract	Urban informatics explores data science methods to address different urban issues intensively based on data. The large variety and quantity of data available should be explored, but this brings important challenges. For instance, although there are powerful computer vision methods that may be explored, they may require large annotated datasets. In this work we propose a novel approach to automatically creating an object recognition system with minimal manual annotation. The basic idea behind the method is to use large input datasets gathered from available online cameras in large cities. An off-the-shelf weak classifier is used to detect an initial set of urban elements of interest (e.g. cars, pedestrians, bikes, etc.). This initial dataset undergoes a quality control procedure and is subsequently used to fine-tune a strong classifier. Quality control and comparative performance assessment are used as part of the pipeline. We evaluate the method for detecting cars based on monitoring cameras. Experimental results using real data show that despite losing generality, the final detector provides better detection rates tailored to the selected cameras. The programmed robot gathered 770 video hours from 24 online city cameras (~300GB), which has been fed to the proposed system. Our approach has shown that the method nearly doubled the recall (93%) with respect to state-of-the-art methods using off-the-shelf algorithms.
Tasks	Object Recognition
Published	2017-06-28
URL	http://arxiv.org/abs/1706.09308v2
PDF	http://arxiv.org/pdf/1706.09308v2.pdf
PWC	https://paperswithcode.com/paper/a-new-urban-objects-detection-framework-using
Repo
Framework

Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition


Title	Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition
Authors	Fei Tao, Gang Liu
Abstract	Long short-term memory (LSTM) is normally used in recurrent neural networks (RNNs) as the basic recurrent unit. However, conventional LSTM assumes that the state at the current time step depends on the previous time step. This assumption constrains the time dependency modeling capability. In this study, we propose a new variation of LSTM, advanced LSTM (A-LSTM), for better temporal context modeling. We employ A-LSTM in weighted pooling RNN for emotion recognition. The A-LSTM outperforms the conventional LSTM by 5.5% relatively. The A-LSTM based weighted pooling RNN can also complement the state-of-the-art emotion classification framework. This shows the advantage of A-LSTM.
Tasks	Emotion Classification, Emotion Recognition
Published	2017-10-27
URL	http://arxiv.org/abs/1710.10197v1
PDF	http://arxiv.org/pdf/1710.10197v1.pdf
PWC	https://paperswithcode.com/paper/advanced-lstm-a-study-about-better-time
Repo
Framework

Measuring Catastrophic Forgetting in Neural Networks


Title	Measuring Catastrophic Forgetting in Neural Networks
Authors	Ronald Kemker, Marc McClure, Angelina Abitino, Tyler Hayes, Christopher Kanan
Abstract	Deep neural networks are used in many state-of-the-art systems for machine perception. Once a network is trained to do a specific task, e.g., bird classification, it cannot easily be trained to do new tasks, e.g., incrementally learning to recognize additional bird species or learning an entirely different task such as flower recognition. When new tasks are added, typical deep neural networks are prone to catastrophically forgetting previous tasks. Networks that are capable of assimilating new information incrementally, much like how humans form new memories over time, will be more efficient than re-training the model from scratch each time a new task needs to be learned. There have been multiple attempts to develop schemes that mitigate catastrophic forgetting, but these methods have not been directly compared, the tests used to evaluate them vary considerably, and these methods have only been evaluated on small-scale problems (e.g., MNIST). In this paper, we introduce new metrics and benchmarks for directly comparing five different mechanisms designed to mitigate catastrophic forgetting in neural networks: regularization, ensembling, rehearsal, dual-memory, and sparse-coding. Our experiments on real-world images and sounds show that the mechanism(s) that are critical for optimal performance vary based on the incremental training paradigm and type of data being used, but they all demonstrate that the catastrophic forgetting problem has yet to be solved.
Tasks
Published	2017-08-07
URL	http://arxiv.org/abs/1708.02072v4
PDF	http://arxiv.org/pdf/1708.02072v4.pdf
PWC	https://paperswithcode.com/paper/measuring-catastrophic-forgetting-in-neural
Repo
Framework

KEPLER: Keypoint and Pose Estimation of Unconstrained Faces by Learning Efficient H-CNN Regressors


Title	KEPLER: Keypoint and Pose Estimation of Unconstrained Faces by Learning Efficient H-CNN Regressors
Authors	Amit Kumar, Azadeh Alavi, Rama Chellappa
Abstract	Keypoint detection is one of the most important pre-processing steps in tasks such as face modeling, recognition and verification. In this paper, we present an iterative method for Keypoint Estimation and Pose prediction of unconstrained faces by Learning Efficient H-CNN Regressors (KEPLER) for addressing the face alignment problem. Recent state-of-the-art methods have shown improvements in face keypoint detection by employing Convolutional Neural Networks (CNNs). Although a simple feed-forward neural network can learn the mapping between input and output spaces, it cannot learn the inherent structural dependencies. We present a novel architecture called H-CNN (Heatmap-CNN) which captures structured global and local features and thus favors accurate keypoint detection. H-CNN is jointly trained on the visibility, fiducials and 3D-pose of the face. As the iterations proceed, the error decreases making the gradients small and thus requiring efficient training of DCNNs to mitigate this. KEPLER performs global corrections in pose and fiducials for the first four iterations followed by local corrections in the subsequent stage. As a by-product, KEPLER also provides 3D pose (pitch, yaw and roll) of the face accurately. In this paper, we show that without using any 3D information, KEPLER outperforms state-of-the-art methods for alignment on challenging datasets such as AFW and AFLW.
Tasks	Face Alignment, Head Pose Estimation, Keypoint Detection, Pose Estimation, Pose Prediction
Published	2017-02-16
URL	http://arxiv.org/abs/1702.05085v1
PDF	http://arxiv.org/pdf/1702.05085v1.pdf
PWC	https://paperswithcode.com/paper/kepler-keypoint-and-pose-estimation-of
Repo
Framework

ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes


Title	ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes
Authors	Yuhua Chen, Wen Li, Luc Van Gool
Abstract	Exploiting synthetic data to learn deep models has attracted increasing attention in recent years. However, the intrinsic domain difference between synthetic and real images usually causes a significant performance drop when applying the learned model to real world scenarios. This is mainly due to two reasons: 1) the model overfits to synthetic images, making the convolutional filters unable to extract informative representations for real images; 2) there is a distribution difference between synthetic and real data, which is also known as the domain adaptation problem. To this end, we propose a new reality oriented adaptation approach for urban scene semantic segmentation by learning from synthetic data. First, we propose a target guided distillation approach to learn the real image style, which is achieved by training the segmentation model to imitate a pretrained real style model using real images. Second, we further take advantage of the intrinsic spatial structure present in urban scene images, and propose a spatial-aware adaptation scheme to effectively align the distribution of two domains. These two modules can be readily integrated with existing state-of-the-art semantic segmentation networks to improve their generalizability when adapting from synthetic to real urban scenes. We evaluate the proposed method on the Cityscapes dataset by adapting from the GTAV and SYNTHIA datasets, where the results demonstrate the effectiveness of our method.
Tasks	Domain Adaptation, Semantic Segmentation, Synthetic-to-Real Translation
Published	2017-11-30
URL	http://arxiv.org/abs/1711.11556v2
PDF	http://arxiv.org/pdf/1711.11556v2.pdf
PWC	https://paperswithcode.com/paper/road-reality-oriented-adaptation-for-semantic
Repo
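Framework	

Reconstruction of Electrical Impedance Tomography Using Fish School Search, Non-Blind Search, and Genetic Algorithm
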

Title	Reconstruction of Electrical Impedance Tomography Using Fish School Search, Non-Blind Search, and Genetic Algorithm
Authors	Valter Augusto de Freitas Barbosa, Reiga Ramalho Ribeiro, Allan Rivalles Souza Feitosa, Victor Luiz Bezerra Araújo da Silva, Arthur Diego Dias Rocha, Rafaela Covello de Freitas, Ricardo Emmanuel de Souza, Wellington Pinheiro dos Santos
Abstract	Electrical Impedance Tomography (EIT) is a noninvasive imaging technique that does not use ionizing radiation, with application both in environmental sciences and in health. Image reconstruction is performed by solving an ill-posed inverse problem. Evolutionary Computation and Swarm Intelligence have become a source of methods for solving inverse problems. Fish School Search (FSS) is a promising search and optimization method, based on the dynamics of schools of fish. In this article the authors present a method for reconstruction of EIT images based on FSS and Non-Blind Search (NBS). The method was evaluated using numerical phantoms consisting of electrical conductivity images with subjects in the center, between the center and the edge and on the edge of a circular section, with meshes of 415 finite elements. The authors performed 20 simulations for each configuration. Results showed that both FSS and FSS-NBS were able to converge faster than genetic algorithms.
Tasks	Image Reconstruction
Published	2017-12-03
URL	http://arxiv.org/abs/1712.00789v1
PDF	http://arxiv.org/pdf/1712.00789v1.pdf
PWC	https://paperswithcode.com/paper/reconstruction-of-electrical-impedance
Repo
Framework

Deep Learning for Accelerated Ultrasound Imaging


Title	Deep Learning for Accelerated Ultrasound Imaging
Authors	Yeo Hun Yoon, Jong Chul Ye
Abstract	In portable, 3-D, or ultra-fast ultrasound (US) imaging systems, there is an increasing demand to reconstruct high quality images from a limited amount of data. However, the existing solutions require either hardware changes or computationally expensive algorithms. To overcome these limitations, here we propose a novel deep learning approach that interpolates the missing RF data by utilizing the sparsity of the RF data in the Fourier domain. Extensive experimental results from sub-sampled RF data from a real US system confirmed that the proposed method can effectively reduce the data rate without sacrificing the image quality.
Tasks
Published	2017-10-27
URL	http://arxiv.org/abs/1710.10006v1
PDF	http://arxiv.org/pdf/1710.10006v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-accelerated-ultrasound
Repo
Framework

Event Coreference Resolution by Iteratively Unfolding Inter-dependencies among Events


Title	Event Coreference Resolution by Iteratively Unfolding Inter-dependencies among Events
Authors	Prafulla Kumar Choubey, Ruihong Huang
Abstract	We introduce a novel iterative approach for event coreference resolution that gradually builds event clusters by exploiting inter-dependencies among event mentions within the same chain as well as across event chains. Among event mentions in the same chain, we distinguish within-document (WD) and cross-document (CD) event coreference links by using two distinct pairwise classifiers, trained separately to capture differences in feature distributions of within- and cross-document event clusters. Our event coreference approach alternates between WD and CD clustering and combines arguments from both event clusters after every merge, continuing until no more merges can be made. It then performs further merging between event chains that are both closely related to a set of other chains of events. Experiments on the ECB+ corpus show that our model outperforms state-of-the-art methods in the joint task of WD and CD event coreference resolution.
Tasks	Coreference Resolution
Published	2017-07-23
URL	http://arxiv.org/abs/1707.07344v1
PDF	http://arxiv.org/pdf/1707.07344v1.pdf
PWC	https://paperswithcode.com/paper/event-coreference-resolution-by-iteratively
Repo
Framework

Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext


Title	Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext
Authors	John Wieting, Jonathan Mallinson, Kevin Gimpel
Abstract	We consider the problem of learning general-purpose, paraphrastic sentence embeddings in the setting of Wieting et al. (2016b). We use neural machine translation to generate sentential paraphrases via back-translation of bilingual sentence pairs. We evaluate the paraphrase pairs by their ability to serve as training data for learning paraphrastic sentence embeddings. We find that the data quality is stronger than prior work based on bitext and on par with manually-written English paraphrase pairs, with the advantage that our approach can scale up to generate large training sets for many languages and domains. We experiment with several language pairs and data sources, and develop a variety of data filtering techniques. In the process, we explore how neural machine translation output differs from human-written sentences, finding clear differences in length, the amount of repetition, and the use of rare words.
Tasks	Machine Translation, Sentence Embeddings
Published	2017-06-06
URL	http://arxiv.org/abs/1706.01847v1
PDF	http://arxiv.org/pdf/1706.01847v1.pdf
PWC	https://paperswithcode.com/paper/learning-paraphrastic-sentence-embeddings
Repo
Framework

A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)


Title	A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)
Authors	Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, Aaron Sidford
Abstract	This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and addresses model mis-specification.
Tasks
Published	2017-10-25
URL	http://arxiv.org/abs/1710.09430v2
PDF	http://arxiv.org/pdf/1710.09430v2.pdf
PWC	https://paperswithcode.com/paper/a-markov-chain-theory-approach-to
Repo
Framework
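
The abstract above concerns iterate-averaged SGD on least squares. As a rough illustration only (not the paper's analysis or code), the sketch below runs one pass of constant-step SGD on synthetic data and keeps a running average of the iterates. The helper names, the step size, the noise level, and the choice to average from the first iterate are all assumptions made for this example.

```python
import random


def make_data(n, w_true, noise_std, seed=1):
    """Synthetic least-squares data: y = <w_true, x> + Gaussian noise (assumed setup)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w_true]
        y = sum(wj * xj for wj, xj in zip(w_true, x)) + rng.gauss(0.0, noise_std)
        data.append((x, y))
    return data


def averaged_sgd_least_squares(samples, dim, step_size, seed=0):
    """One pass of constant-step SGD on 0.5 * (<w, x> - y)^2.

    Returns (last_iterate, averaged_iterate).
    """
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)
    w = [0.0] * dim
    w_avg = [0.0] * dim
    for t, i in enumerate(order, start=1):
        x, y = samples[i]
        residual = sum(wj * xj for wj, xj in zip(w, x)) - y
        # Stochastic gradient of the squared loss is residual * x.
        w = [wj - step_size * residual * xj for wj, xj in zip(w, x)]
        # Running mean of all iterates seen so far.
        w_avg = [a + (wj - a) / t for a, wj in zip(w_avg, w)]
    return w, w_avg


data = make_data(5000, [2.0, -1.0], noise_std=0.1)
w_last, w_avg = averaged_sgd_least_squares(data, dim=2, step_size=0.05)
print("last iterate:", w_last)
print("averaged iterate:", w_avg)
```

With a constant step size the last iterate keeps fluctuating around the solution, while the average smooths those fluctuations out; this is the object whose optimality the paper characterizes.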

Trace norm regularization and faster inference for embedded speech recognition RNNs


Title	Trace norm regularization and faster inference for embedded speech recognition RNNs
Authors	Markus Kliegl, Siddharth Goyal, Kexin Zhao, Kavya Srinet, Mohammad Shoeybi
Abstract	We propose and evaluate new techniques for compressing and speeding up dense matrix multiplications as found in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, we introduce and study a trace norm regularization technique for training low rank factored versions of matrix multiplications. Compared to standard low rank training, we show that our method leads to good accuracy versus number of parameter trade-offs and can be used to speed up training of large models. For speedup, we enable faster inference on ARM processors through new open sourced kernels optimized for small batch sizes, resulting in 3x to 7x speed ups over the widely used gemmlowp library. Beyond LVCSR, we expect our techniques and kernels to be more generally applicable to embedded neural networks with large fully connected or recurrent layers.
Tasks	Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published	2017-10-25
URL	http://arxiv.org/abs/1710.09026v2
PDF	http://arxiv.org/pdf/1710.09026v2.pdf
PWC	https://paperswithcode.com/paper/trace-norm-regularization-and-faster
Repo
Framework
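
The abstract above refers to low rank factored versions of matrix multiplications. The sketch below is a toy illustration of that idea only: a weight matrix W of shape m x n is replaced by factors U (m x r) and V (r x n), so that W x is computed as U (V x). The matrices, sizes, and helper names are hypothetical, and the trace norm regularizer and the ARM kernels from the paper are not implemented here.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two list-of-rows matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def dense_param_count(m, n):
    """Parameters in a dense m x n weight matrix."""
    return m * n


def factored_param_count(m, n, r):
    """Parameters in a rank-r factorization U (m x r) times V (r x n)."""
    return r * (m + n)


# Hypothetical rank-2 factors of a 4 x 6 weight matrix W = U V.
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
V = [[1.0, 2.0, 0.0, 1.0, 0.0, 3.0],
     [0.0, 1.0, 1.0, 0.0, 2.0, 1.0]]
W = matmul(U, V)
x = [1.0, 0.0, 2.0, 1.0, 0.0, 1.0]

dense_out = matvec(W, x)                 # one multiply with the full matrix
factored_out = matvec(U, matvec(V, x))   # two multiplies with the thin factors

print(dense_out, factored_out)
print(dense_param_count(1024, 1024), factored_param_count(1024, 1024, 64))
```

The saving only appears when the rank r is well below m and n; for a 1024 x 1024 layer with r = 64, the factored form stores one eighth of the parameters.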

3D Vision Guided Robotic Charging Station for Electric and Plug-in Hybrid Vehicles


Title	3D Vision Guided Robotic Charging Station for Electric and Plug-in Hybrid Vehicles
Authors	Justinas Miseikis, Matthias Ruther, Bernhard Walzel, Mario Hirz, Helmut Brunner
Abstract	Electric vehicles (EVs) and plug-in hybrid vehicles (PHEVs) are rapidly gaining popularity on our roads. Besides a comparatively high purchasing price, the two main problems limiting their use are the short driving range and the inconvenient charging process. In this paper we address the latter by presenting an automatic robot-based charging station with 3D vision guidance for plugging and unplugging the charger. First of all, the whole system concept consisting of a 3D vision system, a UR10 robot and a charging station is presented. Then we show the shape-based matching methods used to successfully identify and get the exact pose of the charging port. The same approach is used to calibrate the camera-robot system using only the known structure of the connector plug and no additional markers. Finally, a three-step robot motion planning procedure for plug-in is presented and functionality is demonstrated in a series of successful experiments.
Tasks	Motion Planning
Published	2017-03-15
URL	http://arxiv.org/abs/1703.05381v1
PDF	http://arxiv.org/pdf/1703.05381v1.pdf
PWC	https://paperswithcode.com/paper/3d-vision-guided-robotic-charging-station-for
Repo
Framework

Extraction of Airways with Probabilistic State-space Models and Bayesian Smoothing


Title	Extraction of Airways with Probabilistic State-space Models and Bayesian Smoothing
Authors	Raghavendra Selvan, Jens Petersen, Jesper H. Pedersen, Marleen de Bruijne
Abstract	Segmenting tree structures is common in several image processing applications. In medical image analysis, reliable segmentations of airways, vessels, neurons and other tree structures can enable important clinical applications. We present a framework for tracking tree structures comprising elongated branches using probabilistic state-space models and Bayesian smoothing. Unlike most existing methods that proceed with sequential tracking of branches, we present an exploratory method that is less sensitive to local anomalies in the data due to acquisition noise and/or interfering structures. The evolution of individual branches is modelled using a process model and the observed data is incorporated into the update step of the Bayesian smoother using a measurement model that is based on a multi-scale blob detector. Bayesian smoothing is performed using the RTS (Rauch-Tung-Striebel) smoother, which provides Gaussian density estimates of branch states at each tracking step. We select likely branch seed points automatically based on the response of the blob detection and track from all such seed points using the RTS smoother. We use covariance of the marginal posterior density estimated for each branch to discriminate false positive and true positive branches. The method is evaluated on 3D chest CT scans to track airways. We show that the presented method results in additional branches compared to a baseline method based on region growing on probability images.
Tasks
Published	2017-08-07
URL	http://arxiv.org/abs/1708.02096v1
PDF	http://arxiv.org/pdf/1708.02096v1.pdf
PWC	https://paperswithcode.com/paper/extraction-of-airways-with-probabilistic
Repo
Framework
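
The abstract above relies on the RTS (Rauch-Tung-Striebel) smoother. As a minimal sketch of that smoother alone, the code below runs a scalar Kalman filter forward and then the RTS backward pass. The one-dimensional model x_k = a * x_{k-1} + noise, y_k = x_k + noise, the parameter values, and the observation list are assumptions for illustration; the paper's branch state model and blob-detector measurement model are not reproduced.

```python
def kalman_filter(observations, a, q, r, m0, p0):
    """Scalar Kalman filter for x_k = a*x_{k-1} + N(0, q), y_k = x_k + N(0, r)."""
    m, p = m0, p0
    pred_means, pred_vars, filt_means, filt_vars = [], [], [], []
    for y in observations:
        # Prediction step.
        m_pred = a * m
        p_pred = a * a * p + q
        # Update step with the new observation.
        gain = p_pred / (p_pred + r)
        m = m_pred + gain * (y - m_pred)
        p = (1.0 - gain) * p_pred
        pred_means.append(m_pred)
        pred_vars.append(p_pred)
        filt_means.append(m)
        filt_vars.append(p)
    return pred_means, pred_vars, filt_means, filt_vars


def rts_smoother(observations, a, q, r, m0, p0):
    """Forward Kalman filter followed by the RTS backward recursion."""
    pred_means, pred_vars, filt_means, filt_vars = kalman_filter(
        observations, a, q, r, m0, p0
    )
    smooth_means = list(filt_means)
    smooth_vars = list(filt_vars)
    for k in range(len(observations) - 2, -1, -1):
        # Smoother gain links the filtered state at k to the prediction at k + 1.
        g = filt_vars[k] * a / pred_vars[k + 1]
        smooth_means[k] = filt_means[k] + g * (smooth_means[k + 1] - pred_means[k + 1])
        smooth_vars[k] = filt_vars[k] + g * g * (smooth_vars[k + 1] - pred_vars[k + 1])
    return filt_means, filt_vars, smooth_means, smooth_vars


ys = [0.9, 1.2, 0.8, 1.1, 1.0, 1.4, 0.7]  # made-up noisy observations
filt_m, filt_v, smooth_m, smooth_v = rts_smoother(
    ys, a=1.0, q=0.01, r=0.25, m0=0.0, p0=1.0
)
print("filtered variances:", filt_v)
print("smoothed variances:", smooth_v)
```

Each smoothed estimate is a Gaussian with a mean and a variance, and the smoothed variance is never larger than the filtered one because later observations are also used. The abstract describes using this posterior covariance to separate true from false positive branches.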

Generating and Estimating Nonverbal Alphabets for Situated and Multimodal Communications


Title	Generating and Estimating Nonverbal Alphabets for Situated and Multimodal Communications
Authors	Serhii Hamotskyi, Sergii Stirenko, Yuri Gordienko, Anis Rojbi
Abstract	In this paper, we discuss a formalized approach for generating and estimating symbols (and alphabets), which can be communicated by a wide range of non-verbal means based on specific user requirements (medium, priorities, type of information that needs to be conveyed). A short characterization of basic terms and parameters of such symbols (and alphabets) with approaches to generate them is given. Then the framework, experimental setup, and some machine learning methods to estimate usefulness and effectiveness of the nonverbal alphabets and systems are presented. The previous results demonstrate that usage of multimodal data sources (like wearable accelerometer, heart monitor, muscle movements sensors, brain-computer interface) along with machine learning approaches can provide a deeper understanding of the usefulness and effectiveness of such alphabets and systems for nonverbal and situated communication. The symbols (and alphabets) generated and estimated by such methods may be useful in various applications: from synthetic languages and constructed scripts to multimodal nonverbal and situated interaction between people and artificial intelligence systems through Human-Computer Interfaces, such as mouse gestures, touchpads, body gestures, eye-tracking cameras, wearables, and brain-computing interfaces, especially in applications for elderly care and people with disabilities.
Tasks
Published	2017-12-12
URL	http://arxiv.org/abs/1712.04314v1
PDF	http://arxiv.org/pdf/1712.04314v1.pdf
PWC	https://paperswithcode.com/paper/generating-and-estimating-nonverbal-alphabets
Repo
Framework

Continual Learning in Generative Adversarial Nets


Title	Continual Learning in Generative Adversarial Nets
Authors	Ari Seff, Alex Beatson, Daniel Suo, Han Liu
Abstract	Developments in deep generative models have allowed for tractable learning of high-dimensional data distributions. While the employed learning procedures typically assume that training data is drawn i.i.d. from the distribution of interest, it may be desirable to model distinct distributions which are observed sequentially, such as when different classes are encountered over time. Although conditional variations of deep generative models permit multiple distributions to be modeled by a single network in a disentangled fashion, they are susceptible to catastrophic forgetting when the distributions are encountered sequentially. In this paper, we adapt recent work in reducing catastrophic forgetting to the task of training generative adversarial networks on a sequence of distinct distributions, enabling continual generative modeling.
Tasks	Continual Learning
Published	2017-05-23
URL	http://arxiv.org/abs/1705.08395v1
PDF	http://arxiv.org/pdf/1705.08395v1.pdf
PWC	https://paperswithcode.com/paper/continual-learning-in-generative-adversarial
Repo
Framework