Paper Group ANR 72
A New Urban Objects Detection Framework Using Weakly Annotated Sets
Title | A New Urban Objects Detection Framework Using Weakly Annotated Sets |
Authors | Eric Keiji, Gabriel Ferreira, Claudio Silva, Roberto M. Cesar Jr |
Abstract | Urban informatics explores data science methods to address urban issues based intensively on data. The large variety and quantity of available data should be exploited, but this brings important challenges. For instance, although there are powerful computer vision methods that may be explored, they may require large annotated datasets. In this work we propose a novel approach to automatically creating an object recognition system with minimal manual annotation. The basic idea behind the method is to use large input datasets collected from online cameras available in large cities. An off-the-shelf weak classifier is used to detect an initial set of urban elements of interest (e.g. cars, pedestrians, bikes, etc.). This initial dataset undergoes a quality-control procedure and is subsequently used to fine-tune a strong classifier. Quality control and comparative performance assessment are used as part of the pipeline. We evaluate the method for detecting cars in monitoring cameras. Experimental results using real data show that, despite losing generality, the final detector provides better detection rates tailored to the selected cameras. The programmed robot gathered 770 video hours from 24 online city cameras (~300GB), which were fed to the proposed system. Our results show that the method nearly doubled the recall (to 93%) with respect to state-of-the-art methods using off-the-shelf algorithms. |
Tasks | Object Recognition |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09308v2 |
http://arxiv.org/pdf/1706.09308v2.pdf | |
PWC | https://paperswithcode.com/paper/a-new-urban-objects-detection-framework-using |
Repo | |
Framework | |
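The pipeline described above (off-the-shelf weak detector, quality control, fine-tuning of a strong detector) can be summarized in a short sketch. This is a minimal illustration, not the authors' code: the detector interfaces, the confidence threshold, and the `fine_tune` routine are hypothetical placeholders.

```python
# Minimal sketch of the weakly-annotated training pipeline (hypothetical interfaces).

def build_detector(frames, weak_detector, strong_model, min_confidence=0.8):
    """Collect pseudo-labels with a weak detector, filter them, fine-tune a strong model."""
    pseudo_labels = []
    for frame in frames:                              # frames grabbed from online city cameras
        detections = weak_detector(frame)             # e.g. an off-the-shelf car detector
        kept = [d for d in detections if d["score"] >= min_confidence]
        if kept:                                      # quality control: keep confident frames only
            pseudo_labels.append((frame, kept))
    strong_model.fine_tune(pseudo_labels)             # adapt the strong classifier to these cameras
    return strong_model
```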
Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition
Title | Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition |
Authors | Fei Tao, Gang Liu |
Abstract | Long short-term memory (LSTM) is normally used as the basic recurrent unit in recurrent neural networks (RNNs). However, conventional LSTM assumes that the state at the current time step depends only on the previous time step. This assumption constrains its time dependency modeling capability. In this study, we propose a new variation of LSTM, advanced LSTM (A-LSTM), for better temporal context modeling. We employ A-LSTM in a weighted-pooling RNN for emotion recognition. The A-LSTM outperforms the conventional LSTM by 5.5% relative. The A-LSTM based weighted-pooling RNN can also complement the state-of-the-art emotion classification framework. This shows the advantage of A-LSTM. |
Tasks | Emotion Classification, Emotion Recognition |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.10197v1 |
http://arxiv.org/pdf/1710.10197v1.pdf | |
PWC | https://paperswithcode.com/paper/advanced-lstm-a-study-about-better-time |
Repo | |
Framework | |
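The abstract does not spell out the A-LSTM update, so the sketch below only illustrates the general idea of letting the current step depend on several earlier states through a softmax-normalized weighting, instead of on the immediately preceding state alone. The tensor shapes and the combination rule are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def combine_past_states(past_states, scores):
    """Blend several earlier hidden states into one 'previous' state.

    past_states: list of hidden-state vectors h_{t-1}, h_{t-2}, ... (assumed shape (d,))
    scores:      unnormalized importance scores, one per past state (learned in practice)
    """
    w = softmax(np.asarray(scores, dtype=float))
    return sum(wi * h for wi, h in zip(w, past_states))

# A conventional LSTM feeds only h_{t-1} into its gates; the idea here is to feed
# this weighted blend instead, so the cell can look further back in time.
h_prev = combine_past_states([np.ones(4), np.zeros(4), np.full(4, 0.5)], scores=[2.0, 0.1, 0.5])
```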
Measuring Catastrophic Forgetting in Neural Networks
Title | Measuring Catastrophic Forgetting in Neural Networks |
Authors | Ronald Kemker, Marc McClure, Angelina Abitino, Tyler Hayes, Christopher Kanan |
Abstract | Deep neural networks are used in many state-of-the-art systems for machine perception. Once a network is trained to do a specific task, e.g., bird classification, it cannot easily be trained to do new tasks, e.g., incrementally learning to recognize additional bird species or learning an entirely different task such as flower recognition. When new tasks are added, typical deep neural networks are prone to catastrophically forgetting previous tasks. Networks that are capable of assimilating new information incrementally, much like how humans form new memories over time, will be more efficient than re-training the model from scratch each time a new task needs to be learned. There have been multiple attempts to develop schemes that mitigate catastrophic forgetting, but these methods have not been directly compared, the tests used to evaluate them vary considerably, and these methods have only been evaluated on small-scale problems (e.g., MNIST). In this paper, we introduce new metrics and benchmarks for directly comparing five different mechanisms designed to mitigate catastrophic forgetting in neural networks: regularization, ensembling, rehearsal, dual-memory, and sparse-coding. Our experiments on real-world images and sounds show that the mechanism(s) that are critical for optimal performance vary based on the incremental training paradigm and type of data being used, but they all demonstrate that the catastrophic forgetting problem has yet to be solved. |
Tasks | |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02072v4 |
http://arxiv.org/pdf/1708.02072v4.pdf | |
PWC | https://paperswithcode.com/paper/measuring-catastrophic-forgetting-in-neural |
Repo | |
Framework | |
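A simple way to quantify the forgetting the abstract refers to is to track accuracy on a previously learned task after each new task is added; the sketch below computes such a retention score. The paper's metrics differ in detail, so treat this only as an illustrative measurement, with `train` and `evaluate` as hypothetical helper functions.

```python
def retention_scores(model, train, evaluate, task_sequence):
    """Train on tasks sequentially and report accuracy on the first task after each stage.

    train(model, task)    -- hypothetical: updates the model on one task
    evaluate(model, task) -- hypothetical: returns accuracy on a held-out split of that task
    """
    base_task = task_sequence[0]
    train(model, base_task)
    base_acc = evaluate(model, base_task)

    scores = []
    for task in task_sequence[1:]:
        train(model, task)                       # incremental training on the new task
        # how much of the original skill survives, relative to accuracy right after learning it
        scores.append(evaluate(model, base_task) / base_acc)
    return scores                                # values near 1.0 mean little forgetting
```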
KEPLER: Keypoint and Pose Estimation of Unconstrained Faces by Learning Efficient H-CNN Regressors
Title | KEPLER: Keypoint and Pose Estimation of Unconstrained Faces by Learning Efficient H-CNN Regressors |
Authors | Amit Kumar, Azadeh Alavi, Rama Chellappa |
Abstract | Keypoint detection is one of the most important pre-processing steps in tasks such as face modeling, recognition and verification. In this paper, we present an iterative method for Keypoint Estimation and Pose prediction of unconstrained faces by Learning Efficient H-CNN Regressors (KEPLER) for addressing the face alignment problem. Recent state-of-the-art methods have shown improvements in face keypoint detection by employing Convolutional Neural Networks (CNNs). Although a simple feed-forward neural network can learn the mapping between input and output spaces, it cannot learn the inherent structural dependencies. We present a novel architecture called H-CNN (Heatmap-CNN) which captures structured global and local features and thus favors accurate keypoint detection. H-CNN is jointly trained on the visibility, fiducials and 3D pose of the face. As the iterations proceed, the error decreases, making the gradients small; efficient training of DCNNs is thus required to mitigate this. KEPLER performs global corrections in pose and fiducials for the first four iterations, followed by local corrections in the subsequent stage. As a by-product, KEPLER also provides an accurate 3D pose (pitch, yaw and roll) of the face. In this paper, we show that without using any 3D information, KEPLER outperforms state-of-the-art methods for alignment on challenging datasets such as AFW and AFLW. |
Tasks | Face Alignment, Head Pose Estimation, Keypoint Detection, Pose Estimation, Pose Prediction |
Published | 2017-02-16 |
URL | http://arxiv.org/abs/1702.05085v1 |
http://arxiv.org/pdf/1702.05085v1.pdf | |
PWC | https://paperswithcode.com/paper/kepler-keypoint-and-pose-estimation-of |
Repo | |
Framework | |
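H-CNN predicts keypoints via heatmaps; a common post-processing step (not specific to this paper) is to read each keypoint location off the argmax of its heatmap, with the peak value serving as a rough visibility score. The sketch below shows only that step, with assumed array shapes.

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps):
    """heatmaps: array of shape (K, H, W), one channel per facial keypoint.

    Returns a (K, 2) array of (row, col) coordinates, one per keypoint,
    plus the peak value of each map as a crude confidence/visibility score.
    """
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    coords = np.stack(np.unravel_index(flat_idx, (h, w)), axis=1)
    confidences = heatmaps.reshape(k, -1).max(axis=1)
    return coords, confidences

coords, conf = keypoints_from_heatmaps(np.random.rand(21, 64, 64))  # e.g. 21 fiducials
```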
ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes
Title | ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes |
Authors | Yuhua Chen, Wen Li, Luc Van Gool |
Abstract | Exploiting synthetic data to learn deep models has attracted increasing attention in recent years. However, the intrinsic domain difference between synthetic and real images usually causes a significant performance drop when applying the learned model to real-world scenarios. This is mainly due to two reasons: 1) the model overfits to synthetic images, making the convolutional filters incompetent to extract informative representations for real images; 2) there is a distribution difference between synthetic and real data, which is also known as the domain adaptation problem. To address this, we propose a new reality-oriented adaptation approach for urban scene semantic segmentation by learning from synthetic data. First, we propose a target-guided distillation approach to learn the real image style, which is achieved by training the segmentation model to imitate a pretrained real-style model using real images. Second, we further take advantage of the intrinsic spatial structure present in urban scene images, and propose a spatial-aware adaptation scheme to effectively align the distribution of the two domains. These two modules can be readily integrated with existing state-of-the-art semantic segmentation networks to improve their generalizability when adapting from synthetic to real urban scenes. We evaluate the proposed method on the Cityscapes dataset by adapting from the GTAV and SYNTHIA datasets, where the results demonstrate the effectiveness of our method. |
Tasks | Domain Adaptation, Semantic Segmentation, Synthetic-to-Real Translation |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11556v2 |
http://arxiv.org/pdf/1711.11556v2.pdf | |
PWC | https://paperswithcode.com/paper/road-reality-oriented-adaptation-for-semantic |
Repo | |
Framework | |
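The target-guided distillation step trains the segmentation backbone to imitate a model pretrained on real images when both are fed the same real image. A minimal sketch of such a feature-imitation loss is shown below; the L2 form and the feature interfaces are assumptions, not the paper's exact formulation.

```python
import numpy as np

def distillation_loss(student_features, teacher_features):
    """Mean squared distance between student and (frozen) teacher feature maps
    computed on the same real image; minimizing it pulls the student toward the
    'real image style' of the pretrained teacher."""
    s = np.asarray(student_features, dtype=float)
    t = np.asarray(teacher_features, dtype=float)
    return np.mean((s - t) ** 2)

# Typical use: total_loss = seg_loss_on_synthetic + lambda_d * distillation_loss(f_student, f_teacher)
```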
Reconstruction of Electrical Impedance Tomography Using Fish School Search, Non-Blind Search, and Genetic Algorithm
Title | Reconstruction of Electrical Impedance Tomography Using Fish School Search, Non-Blind Search, and Genetic Algorithm |
Authors | Valter Augusto de Freitas Barbosa, Reiga Ramalho Ribeiro, Allan Rivalles Souza Feitosa, Victor Luiz Bezerra Araújo da Silva, Arthur Diego Dias Rocha, Rafaela Covello de Freitas, Ricardo Emmanuel de Souza, Wellington Pinheiro dos Santos |
Abstract | Electrical Impedance Tomography (EIT) is a noninvasive imaging technique that does not use ionizing radiation, with applications both in environmental sciences and in health. Image reconstruction is performed by solving an ill-posed inverse problem. Evolutionary Computation and Swarm Intelligence have become a source of methods for solving inverse problems. Fish School Search (FSS) is a promising search and optimization method based on the dynamics of schools of fish. In this article the authors present a method for reconstruction of EIT images based on FSS and Non-Blind Search (NBS). The method was evaluated using numerical phantoms consisting of electrical conductivity images with objects in the center, between the center and the edge, and on the edge of a circular section, with meshes of 415 finite elements. The authors performed 20 simulations for each configuration. Results showed that both FSS and FSS-NBS were able to converge faster than genetic algorithms. |
Tasks | Image Reconstruction |
Published | 2017-12-03 |
URL | http://arxiv.org/abs/1712.00789v1 |
http://arxiv.org/pdf/1712.00789v1.pdf | |
PWC | https://paperswithcode.com/paper/reconstruction-of-electrical-impedance |
Repo | |
Framework | |
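Whatever the search heuristic (FSS, FSS-NBS, or a genetic algorithm), EIT reconstruction is driven by a fitness that compares measured boundary voltages with those simulated from a candidate conductivity image. The sketch below shows one common choice of fitness (relative error); the forward solver is a hypothetical placeholder, and the exact objective used in the paper may differ.

```python
import numpy as np

def fitness(candidate_conductivity, measured_voltages, forward_solve):
    """Lower is better: relative error between measured electrode voltages and the
    voltages simulated from a candidate conductivity distribution.

    forward_solve -- hypothetical FEM forward model mapping conductivity -> voltages
    """
    simulated = forward_solve(candidate_conductivity)
    num = np.linalg.norm(measured_voltages - simulated)
    den = np.linalg.norm(measured_voltages)
    return num / den

# A population-based optimizer (fish school, GA, ...) proposes candidate images
# and keeps the ones with the smallest fitness.
```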
Deep Learning for Accelerated Ultrasound Imaging
Title | Deep Learning for Accelerated Ultrasound Imaging |
Authors | Yeo Hun Yoon, Jong Chul Ye |
Abstract | In portable, 3-D, or ultra-fast ultrasound (US) imaging systems, there is an increasing demand to reconstruct high quality images from a limited amount of data. However, the existing solutions require either hardware changes or computationally expensive algorithms. To overcome these limitations, here we propose a novel deep learning approach that interpolates the missing RF data by utilizing the sparsity of the RF data in the Fourier domain. Extensive experimental results on sub-sampled RF data from a real US system confirmed that the proposed method can effectively reduce the data rate without sacrificing image quality. |
Tasks | |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.10006v1 |
http://arxiv.org/pdf/1710.10006v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-accelerated-ultrasound |
Repo | |
Framework | |
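The approach above learns to fill in missing RF channels. As context, a simple baseline (not the paper's network) is to interpolate a sub-sampled RF frame linearly across the channel axis; the deep model replaces this interpolation with a learned mapping exploiting Fourier-domain sparsity. Array shapes below are assumptions.

```python
import numpy as np

def linear_channel_interpolation(rf_subsampled, kept_channels, num_channels):
    """rf_subsampled: array (len(kept_channels), num_samples) of acquired RF lines.
    Returns an array (num_channels, num_samples) with the missing lines filled in by
    linear interpolation across channels -- a naive stand-in for a learned interpolator."""
    num_samples = rf_subsampled.shape[1]
    full = np.empty((num_channels, num_samples))
    all_channels = np.arange(num_channels)
    for s in range(num_samples):
        full[:, s] = np.interp(all_channels, kept_channels, rf_subsampled[:, s])
    return full

# Keep every 4th of 64 channels, then reconstruct the full frame:
full_rf = linear_channel_interpolation(np.random.randn(16, 2048), np.arange(0, 64, 4), 64)
```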
Event Coreference Resolution by Iteratively Unfolding Inter-dependencies among Events
Title | Event Coreference Resolution by Iteratively Unfolding Inter-dependencies among Events |
Authors | Prafulla Kumar Choubey, Ruihong Huang |
Abstract | We introduce a novel iterative approach for event coreference resolution that gradually builds event clusters by exploiting inter-dependencies among event mentions within the same chain as well as across event chains. Among event mentions in the same chain, we distinguish within-document (WD) and cross-document (CD) event coreference links by using two distinct pairwise classifiers, trained separately to capture differences in feature distributions of within- and cross-document event clusters. Our event coreference approach alternates between WD and CD clustering and combines arguments from both event clusters after every merge, continuing until no more merges can be made. It then performs further merging between event chains that are both closely related to a set of other event chains. Experiments on the ECB+ corpus show that our model outperforms state-of-the-art methods on the joint task of WD and CD event coreference resolution. |
Tasks | Coreference Resolution |
Published | 2017-07-23 |
URL | http://arxiv.org/abs/1707.07344v1 |
http://arxiv.org/pdf/1707.07344v1.pdf | |
PWC | https://paperswithcode.com/paper/event-coreference-resolution-by-iteratively |
Repo | |
Framework | |
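The alternating WD/CD clustering described above can be pictured as a greedy agglomerative merge driven by two pairwise scorers. The sketch below captures only that alternation; the scorer interfaces, the 0.5 merge threshold, and the stopping rule are illustrative assumptions, and the paper's final cross-chain merging stage is omitted.

```python
def greedy_merge(clusters, score, threshold):
    """Repeatedly merge the highest-scoring cluster pair until no pair clears the threshold."""
    merged = True
    while merged and len(clusters) > 1:
        merged = False
        pairs = [(i, j, score(clusters[i], clusters[j]))
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j, s = max(pairs, key=lambda t: t[2])
        if s >= threshold:
            clusters[i] = clusters[i] + clusters[j]   # combine mentions (and their arguments)
            del clusters[j]
            merged = True
    return clusters

def iterative_clustering(mentions, wd_score, cd_score, threshold=0.5):
    """Alternate within-document (WD) and cross-document (CD) passes until stable.

    wd_score / cd_score -- hypothetical pairwise classifiers returning a merge score
    for two clusters of event mentions.
    """
    clusters = [[m] for m in mentions]
    previous = None
    while previous != len(clusters):
        previous = len(clusters)
        clusters = greedy_merge(clusters, wd_score, threshold)
        clusters = greedy_merge(clusters, cd_score, threshold)
    return clusters
```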
Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext
Title | Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext |
Authors | John Wieting, Jonathan Mallinson, Kevin Gimpel |
Abstract | We consider the problem of learning general-purpose, paraphrastic sentence embeddings in the setting of Wieting et al. (2016b). We use neural machine translation to generate sentential paraphrases via back-translation of bilingual sentence pairs. We evaluate the paraphrase pairs by their ability to serve as training data for learning paraphrastic sentence embeddings. We find that the data quality is stronger than prior work based on bitext and on par with manually-written English paraphrase pairs, with the advantage that our approach can scale up to generate large training sets for many languages and domains. We experiment with several language pairs and data sources, and develop a variety of data filtering techniques. In the process, we explore how neural machine translation output differs from human-written sentences, finding clear differences in length, the amount of repetition, and the use of rare words. |
Tasks | Machine Translation, Sentence Embeddings |
Published | 2017-06-06 |
URL | http://arxiv.org/abs/1706.01847v1 |
http://arxiv.org/pdf/1706.01847v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-paraphrastic-sentence-embeddings |
Repo | |
Framework | |
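Back-translation turns a bilingual pair (f, e) into an English paraphrase pair (e, e') by translating the foreign side into English with an NMT system. A minimal sketch follows; the translation function and the crude length-ratio filter are assumptions standing in for the data-filtering techniques the paper explores.

```python
def backtranslate_pairs(bitext, translate_to_english, max_len_ratio=1.5):
    """bitext: iterable of (foreign_sentence, english_reference) pairs.

    Yields (reference, back_translation) paraphrase pairs suitable as training data
    for paraphrastic sentence embeddings, after a simple length-based filter.
    """
    for foreign, reference in bitext:
        paraphrase = translate_to_english(foreign)        # hypothetical NMT system
        longer = max(len(paraphrase), len(reference))
        shorter = max(1, min(len(paraphrase), len(reference)))
        if paraphrase != reference and longer / shorter <= max_len_ratio:
            yield reference, paraphrase
```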
A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)
Title | A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares) |
Authors | Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Venkata Krishna Pillutla, Aaron Sidford |
Abstract | This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and addresses model mis-specification. |
Tasks | |
Published | 2017-10-25 |
URL | http://arxiv.org/abs/1710.09430v2 |
http://arxiv.org/pdf/1710.09430v2.pdf | |
PWC | https://paperswithcode.com/paper/a-markov-chain-theory-approach-to |
Repo | |
Framework | |
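For least squares, the iterate-averaged SGD the abstract analyzes is easy to state concretely: run constant-step-size SGD on single samples and return the average of the iterates. The step size, pass count, and data below are illustrative only, not the paper's settings.

```python
import numpy as np

def averaged_sgd_least_squares(X, y, step_size=0.01, passes=5, seed=0):
    """Constant-step-size SGD on 0.5*(x.w - y)^2 with running iterate averaging."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_bar = np.zeros(d)
    t = 0
    for _ in range(passes):
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]      # stochastic gradient for one sample
            w = w - step_size * grad
            t += 1
            w_bar += (w - w_bar) / t             # running average of the iterates
    return w_bar

# Tiny example with a known solution w* = [2, -1]:
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=500)
print(averaged_sgd_least_squares(X, y))          # approximately [2, -1]
```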
Trace norm regularization and faster inference for embedded speech recognition RNNs
Title | Trace norm regularization and faster inference for embedded speech recognition RNNs |
Authors | Markus Kliegl, Siddharth Goyal, Kexin Zhao, Kavya Srinet, Mohammad Shoeybi |
Abstract | We propose and evaluate new techniques for compressing and speeding up dense matrix multiplications as found in the fully connected and recurrent layers of neural networks for embedded large vocabulary continuous speech recognition (LVCSR). For compression, we introduce and study a trace norm regularization technique for training low-rank factored versions of matrix multiplications. Compared to standard low-rank training, we show that our method leads to good accuracy-versus-parameter-count trade-offs and can be used to speed up training of large models. For speedup, we enable faster inference on ARM processors through newly open-sourced kernels optimized for small batch sizes, resulting in 3x to 7x speedups over the widely used gemmlowp library. Beyond LVCSR, we expect our techniques and kernels to be more generally applicable to embedded neural networks with large fully connected or recurrent layers. |
Tasks | Large Vocabulary Continuous Speech Recognition, Speech Recognition |
Published | 2017-10-25 |
URL | http://arxiv.org/abs/1710.09026v2 |
http://arxiv.org/pdf/1710.09026v2.pdf | |
PWC | https://paperswithcode.com/paper/trace-norm-regularization-and-faster |
Repo | |
Framework | |
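The regularizer rests on the variational form of the trace (nuclear) norm: for any factorization W = U Vᵀ, ½(||U||_F² + ||V||_F²) upper-bounds ||W||_*, with equality at the optimal factorization. The NumPy sketch below only evaluates that penalty and checks the bound; it is not the authors' training code.

```python
import numpy as np

def trace_norm_penalty(U, V):
    """Variational upper bound on the nuclear norm of W = U @ V.T:
    0.5 * (||U||_F^2 + ||V||_F^2) >= ||U @ V.T||_*."""
    return 0.5 * (np.sum(U ** 2) + np.sum(V ** 2))

# Sanity check against the exact nuclear norm obtained from the SVD of W:
rng = np.random.default_rng(0)
U = rng.normal(size=(64, 8))                 # factored weights of a dense layer, rank 8
V = rng.normal(size=(32, 8))
W = U @ V.T
exact = np.linalg.svd(W, compute_uv=False).sum()
print(trace_norm_penalty(U, V) >= exact)     # True: the penalty upper-bounds the nuclear norm
```

Adding this penalty to the training loss of a factored layer encourages genuinely low-rank weights rather than merely constraining the factor shapes.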
3D Vision Guided Robotic Charging Station for Electric and Plug-in Hybrid Vehicles
Title | 3D Vision Guided Robotic Charging Station for Electric and Plug-in Hybrid Vehicles |
Authors | Justinas Miseikis, Matthias Ruther, Bernhard Walzel, Mario Hirz, Helmut Brunner |
Abstract | Electric vehicles (EVs) and plug-in hybrid vehicles (PHEVs) are rapidly gaining popularity on our roads. Besides a comparatively high purchase price, the two main problems limiting their use are the short driving range and the inconvenient charging process. In this paper we address the latter by presenting an automatic robot-based charging station with 3D vision guidance for plugging and unplugging the charger. First, the overall system concept, consisting of a 3D vision system, a UR10 robot and a charging station, is presented. Then we show the shape-based matching methods used to successfully identify and estimate the exact pose of the charging port. The same approach is used to calibrate the camera-robot system using only the known structure of the connector plug and no additional markers. Finally, a three-step robot motion planning procedure for plug-in is presented and its functionality is demonstrated in a series of successful experiments. |
Tasks | Motion Planning |
Published | 2017-03-15 |
URL | http://arxiv.org/abs/1703.05381v1 |
http://arxiv.org/pdf/1703.05381v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-vision-guided-robotic-charging-station-for |
Repo | |
Framework | |
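The three-step plug-in motion described above can be pictured as a sequence of waypoints derived from the estimated pose of the charging port. The sketch below only illustrates that staging (coarse approach, fine alignment, insertion along the port axis); the distances and the simplified pose representation are assumptions, not the paper's parameters.

```python
import numpy as np

def plug_in_waypoints(port_position, port_axis, approach_dist=0.15, align_dist=0.05):
    """Return three Cartesian waypoints leading the plug into the charging port.

    port_position -- 3D position of the port (from the 3D vision system)
    port_axis     -- unit vector pointing out of the port toward the robot
    """
    axis = np.asarray(port_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    p = np.asarray(port_position, dtype=float)
    return [
        p + approach_dist * axis,   # 1) coarse approach in front of the port
        p + align_dist * axis,      # 2) fine alignment close to the port
        p,                          # 3) straight insertion along the port axis
    ]

waypoints = plug_in_waypoints([0.6, 0.1, 0.4], [0.0, -1.0, 0.0])
```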
Extraction of Airways with Probabilistic State-space Models and Bayesian Smoothing
Title | Extraction of Airways with Probabilistic State-space Models and Bayesian Smoothing |
Authors | Raghavendra Selvan, Jens Petersen, Jesper H. Pedersen, Marleen de Bruijne |
Abstract | Segmenting tree structures is common in several image processing applications. In medical image analysis, reliable segmentations of airways, vessels, neurons and other tree structures can enable important clinical applications. We present a framework for tracking tree structures comprising elongated branches using probabilistic state-space models and Bayesian smoothing. Unlike most existing methods that proceed with sequential tracking of branches, we present an exploratory method that is less sensitive to local anomalies in the data due to acquisition noise and/or interfering structures. The evolution of individual branches is modelled using a process model, and the observed data are incorporated into the update step of the Bayesian smoother using a measurement model based on a multi-scale blob detector. Bayesian smoothing is performed using the RTS (Rauch-Tung-Striebel) smoother, which provides Gaussian density estimates of branch states at each tracking step. We select likely branch seed points automatically based on the response of the blob detector and track from all such seed points using the RTS smoother. We use the covariance of the marginal posterior density estimated for each branch to discriminate false positive and true positive branches. The method is evaluated on 3D chest CT scans to track airways. We show that the presented method results in additional branches compared to a baseline method based on region growing on probability images. |
Tasks | |
Published | 2017-08-07 |
URL | http://arxiv.org/abs/1708.02096v1 |
http://arxiv.org/pdf/1708.02096v1.pdf | |
PWC | https://paperswithcode.com/paper/extraction-of-airways-with-probabilistic |
Repo | |
Framework | |
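The Bayesian smoothing at the core of the method is the standard Rauch-Tung-Striebel pass: a Kalman filter runs forward over the branch states, then a backward recursion refines each estimate using later observations. Below is a generic linear-Gaussian RTS smoother sketch; the airway-specific state, process, and measurement models are not reproduced.

```python
import numpy as np

def rts_smoother(zs, F, H, Q, R, x0, P0):
    """Generic linear-Gaussian RTS smoother.

    zs: (T, m) measurements; F, Q: process model; H, R: measurement model; x0, P0: prior.
    Returns smoothed state means (T, n) and covariances (T, n, n).
    """
    T, n = len(zs), len(x0)
    xs_f, Ps_f, xs_p, Ps_p = [], [], [], []
    x, P = np.array(x0, dtype=float), np.array(P0, dtype=float)
    for z in zs:                                  # forward Kalman filter
        xp, Pp = F @ x, F @ P @ F.T + Q           # predict
        S = H @ Pp @ H.T + R
        K = Pp @ H.T @ np.linalg.inv(S)           # Kalman gain
        x = xp + K @ (z - H @ xp)                 # update with the measurement
        P = (np.eye(n) - K @ H) @ Pp
        xs_p.append(xp); Ps_p.append(Pp)
        xs_f.append(x); Ps_f.append(P)
    xs, Ps = xs_f[-1].copy(), Ps_f[-1].copy()
    xs_s, Ps_s = [xs], [Ps]
    for t in range(T - 2, -1, -1):                # backward RTS pass
        G = Ps_f[t] @ F.T @ np.linalg.inv(Ps_p[t + 1])
        xs = xs_f[t] + G @ (xs_s[0] - xs_p[t + 1])
        Ps = Ps_f[t] + G @ (Ps_s[0] - Ps_p[t + 1]) @ G.T
        xs_s.insert(0, xs); Ps_s.insert(0, Ps)
    return np.array(xs_s), np.array(Ps_s)
```

The smoothed covariances returned here are what the method uses to separate true branches from false positives.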
Generating and Estimating Nonverbal Alphabets for Situated and Multimodal Communications
Title | Generating and Estimating Nonverbal Alphabets for Situated and Multimodal Communications |
Authors | Serhii Hamotskyi, Sergii Stirenko, Yuri Gordienko, Anis Rojbi |
Abstract | In this paper, we discuss a formalized approach for generating and estimating symbols (and alphabets) that can be communicated by a wide range of non-verbal means based on specific user requirements (medium, priorities, type of information that needs to be conveyed). A short characterization of the basic terms and parameters of such symbols (and alphabets), along with approaches to generate them, is given. Then the framework, experimental setup, and some machine learning methods to estimate the usefulness and effectiveness of the nonverbal alphabets and systems are presented. Previous results demonstrate that the usage of multimodal data sources (like wearable accelerometers, heart monitors, muscle movement sensors, and brain-computer interfaces) along with machine learning approaches can provide a deeper understanding of the usefulness and effectiveness of such alphabets and systems for nonverbal and situated communication. The symbols (and alphabets) generated and estimated by such methods may be useful in various applications: from synthetic languages and constructed scripts to multimodal nonverbal and situated interaction between people and artificial intelligence systems through Human-Computer Interfaces, such as mouse gestures, touchpads, body gestures, eye-tracking cameras, wearables, and brain-computer interfaces, especially in applications for elderly care and people with disabilities. |
Tasks | |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04314v1 |
http://arxiv.org/pdf/1712.04314v1.pdf | |
PWC | https://paperswithcode.com/paper/generating-and-estimating-nonverbal-alphabets |
Repo | |
Framework | |
Continual Learning in Generative Adversarial Nets
Title | Continual Learning in Generative Adversarial Nets |
Authors | Ari Seff, Alex Beatson, Daniel Suo, Han Liu |
Abstract | Developments in deep generative models have allowed for tractable learning of high-dimensional data distributions. While the employed learning procedures typically assume that training data is drawn i.i.d. from the distribution of interest, it may be desirable to model distinct distributions which are observed sequentially, such as when different classes are encountered over time. Although conditional variations of deep generative models permit multiple distributions to be modeled by a single network in a disentangled fashion, they are susceptible to catastrophic forgetting when the distributions are encountered sequentially. In this paper, we adapt recent work in reducing catastrophic forgetting to the task of training generative adversarial networks on a sequence of distinct distributions, enabling continual generative modeling. |
Tasks | Continual Learning |
Published | 2017-05-23 |
URL | http://arxiv.org/abs/1705.08395v1 |
http://arxiv.org/pdf/1705.08395v1.pdf | |
PWC | https://paperswithcode.com/paper/continual-learning-in-generative-adversarial |
Repo | |
Framework | |
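The "recent work in reducing catastrophic forgetting" the abstract adapts is a weight-consolidation style approach: when moving to a new distribution, the generator's parameters are pulled toward their values from the previous task, weighted by a per-parameter importance estimate such as the Fisher information. The NumPy sketch below shows only that EWC-style quadratic penalty (an assumption about the exact form used), which would be added to the generator's adversarial loss on the new task.

```python
import numpy as np

def consolidation_penalty(params, old_params, importance, strength=1.0):
    """EWC-style penalty: (strength / 2) * sum_i F_i * (theta_i - theta_i_old)^2.

    params, old_params -- dicts of parameter arrays for the current and previous task
    importance         -- per-parameter importance weights (e.g. a Fisher-information estimate)
    """
    total = 0.0
    for name, theta in params.items():
        diff = theta - old_params[name]
        total += np.sum(importance[name] * diff ** 2)
    return 0.5 * strength * total

# During training on distribution k:
#     generator_loss = adversarial_loss + consolidation_penalty(params, params_k_minus_1, fisher)
```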