Paper Group ANR 342
SAR Target Recognition Using the Multi-aspect-aware Bidirectional LSTM Recurrent Neural Networks
Title | SAR Target Recognition Using the Multi-aspect-aware Bidirectional LSTM Recurrent Neural Networks |
Authors | Fan Zhang, Chen Hu, Qiang Yin, Wei Li, Hengchao Li, Wen Hong |
Abstract | The outstanding pattern recognition performance of deep learning brings new vitality to synthetic aperture radar (SAR) automatic target recognition (ATR). However, current deep learning based ATR solutions are limited in that each learning process handles only one SAR image, learning static scattering information while missing space-varying information. Multi-aspect joint recognition, which introduces space-varying scattering information, should clearly improve classification accuracy and robustness. In this paper, a novel multi-aspect-aware method is proposed to realize this idea by learning space-varying scattering information with bidirectional Long Short-Term Memory (LSTM) recurrent neural networks. Specifically, we first select images from different aspects to generate multi-aspect space-varying image sequences. Then, the Gabor filter and the three-patch local binary pattern (TPLBP) are progressively applied to extract comprehensive spatial features, followed by dimensionality reduction with a Multi-layer Perceptron (MLP) network. Finally, we design a bidirectional LSTM recurrent neural network to learn the multi-aspect features, integrating a softmax classifier to achieve target recognition. Experimental results demonstrate that the proposed method achieves 99.9% accuracy on a 10-class recognition task. Moreover, its anti-noise and anti-confusion performance is also better than that of conventional deep learning based methods. |
Tasks | Dimensionality Reduction |
Published | 2017-07-25 |
URL | http://arxiv.org/abs/1707.09875v1 |
PDF | http://arxiv.org/pdf/1707.09875v1.pdf |
PWC | https://paperswithcode.com/paper/sar-target-recognition-using-the-multi-aspect |
Repo | |
Framework | |
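As an illustrative companion to the abstract above, here is a minimal PyTorch sketch of a bidirectional LSTM that classifies a sequence of per-aspect feature vectors with a softmax output. It is not the authors' implementation; the feature dimension, hidden size, and number of aspect views are assumptions.

```python
# Minimal sketch (not the paper's code) of a bidirectional LSTM classifier
# over multi-aspect feature sequences. All hyperparameters are illustrative.
import torch
import torch.nn as nn

class MultiAspectBiLSTM(nn.Module):
    def __init__(self, feat_dim=64, hidden=128, n_classes=10):
        super().__init__()
        # Processes the aspect sequence in both directions.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                 # x: (batch, n_aspects, feat_dim)
        out, _ = self.lstm(x)             # (batch, n_aspects, 2*hidden)
        return self.fc(out[:, -1])        # logits from the final time step;
                                          # softmax lives in CrossEntropyLoss

model = MultiAspectBiLSTM()
seq = torch.randn(8, 5, 64)               # 8 targets, 5 aspect views each
print(model(seq).shape)                    # torch.Size([8, 10])
```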
Consistent Multiple Graph Matching with Multi-layer Random Walks Synchronization
Title | Consistent Multiple Graph Matching with Multi-layer Random Walks Synchronization |
Authors | Han-Mu Park, Kuk-Jin Yoon |
Abstract | We address the correspondence search problem among multiple graphs with complex properties while considering matching consistency. We describe each pair of graphs by combining multiple attributes, then jointly match them in a unified framework. The main contributions of this paper are twofold. First, we formulate the global correspondence search problem of multi-attributed graphs using a set of multi-layer structures. The proposed formulation describes each pair of graphs as a multi-layer structure and jointly considers all matching pairs. Second, we propose a robust multiple graph matching method based on the multi-layer random walks framework. The proposed framework synchronizes the movements of random walkers and leads them to consistent matching candidates. In extensive experiments, the proposed method exhibits more robust and accurate performance than state-of-the-art multiple graph matching algorithms. |
Tasks | Graph Matching |
Published | 2017-12-07 |
URL | http://arxiv.org/abs/1712.02575v2 |
PDF | http://arxiv.org/pdf/1712.02575v2.pdf |
PWC | https://paperswithcode.com/paper/consistent-multiple-graph-matching-with-multi |
Repo | |
Framework | |
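The following sketch illustrates the random-walk view of graph matching for a single pair of graphs, reduced to a restart-based power iteration over a match-affinity matrix. The paper's multi-layer structures and walker synchronization across graph pairs are omitted, and all sizes are illustrative.

```python
# Simplified, single-pair random-walk matching in the spirit of the
# reweighted random walks this paper builds on; not the proposed method.
import numpy as np

def random_walk_matching(M, n1, n2, alpha=0.2, iters=100):
    """M: (n1*n2, n1*n2) nonnegative affinity between candidate matches."""
    P = M / np.maximum(M.sum(axis=1, keepdims=True), 1e-12)  # row-stochastic
    x = np.full(n1 * n2, 1.0 / (n1 * n2))                    # uniform start
    for _ in range(iters):
        x = (1 - alpha) * (x @ P) + alpha / (n1 * n2)        # walk + restart
    return x.reshape(n1, n2)  # soft assignment; discretize e.g. Hungarian

rng = np.random.default_rng(0)
M = rng.random((6, 6)); M = (M + M.T) / 2    # toy symmetric affinity, n1=2, n2=3
print(random_walk_matching(M, 2, 3).round(3))
```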
A Fuzzy Brute Force Matching Method for Binary Image Features
Title | A Fuzzy Brute Force Matching Method for Binary Image Features |
Authors | Erkan Bostanci, Nadia Kanwal, Betul Bostanci, Mehmet Serdar Guzel |
Abstract | Matching binary image features is an important step in many computer vision applications. Conventionally, an arbitrary threshold on the Hamming distance is used to separate correct from incorrect matches, which may improve or degrade the matching results depending on the input images. This is mainly due to the image content, which is affected by the scene, lighting, and imaging conditions. This paper presents a fuzzy logic based approach to brute force matching of image features that overcomes this limitation. The method was tested on a well-known image database with known ground truth, and is shown to produce a higher number of correct matches than constant distance thresholds. The nature of fuzzy logic, which accommodates vague information and tolerates errors, has been successfully exploited in an image processing context: the uncertainty arising from the imaging conditions is handled with compact fuzzy matching membership functions. |
Tasks | |
Published | 2017-04-20 |
URL | http://arxiv.org/abs/1704.06018v1 |
PDF | http://arxiv.org/pdf/1704.06018v1.pdf |
PWC | https://paperswithcode.com/paper/a-fuzzy-brute-force-matching-method-for |
Repo | |
Framework | |
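A toy sketch of the core idea: brute-force matching of binary descriptors by Hamming distance, with a fuzzy membership function over distances in place of a hard threshold. The piecewise-linear membership and its breakpoints are invented for illustration, not the paper's exact functions.

```python
# Fuzzy acceptance of brute-force binary-feature matches; the membership
# function below is an assumed stand-in, not the paper's.
import numpy as np

def hamming(a, b):
    return int(np.unpackbits(a ^ b).sum())

def membership(d, full=20, zero=60):
    """Degree (1 -> 0) to which Hamming distance d counts as a good match."""
    return float(np.clip((zero - d) / (zero - full), 0.0, 1.0))

def fuzzy_match(desc1, desc2, cutoff=0.5):
    matches = []
    for i, a in enumerate(desc1):
        dists = [hamming(a, b) for b in desc2]
        j = int(np.argmin(dists))
        if membership(dists[j]) >= cutoff:   # fuzzy accept, no hard threshold
            matches.append((i, j, dists[j]))
    return matches

rng = np.random.default_rng(1)
d1 = rng.integers(0, 256, (5, 32), dtype=np.uint8)        # 256-bit descriptors
d2 = d1 ^ (rng.random((5, 32)) < 0.02).astype(np.uint8)   # noisy copies
print(fuzzy_match(d1, d2))
```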
Learning to Predict Charges for Criminal Cases with Legal Basis
Title | Learning to Predict Charges for Criminal Cases with Legal Basis |
Authors | Bingfeng Luo, Yansong Feng, Jianbo Xu, Xiang Zhang, Dongyan Zhao |
Abstract | The charge prediction task is to determine appropriate charges for a given case, which is helpful for legal assistant systems where the user input is a fact description. We argue that relevant law articles play an important role in this task, and therefore propose an attention-based neural network method to jointly model the charge prediction task and the relevant article extraction task in a unified framework. The experimental results show that, besides providing a legal basis, the relevant articles can also clearly improve the charge prediction results, and our full model can effectively predict appropriate charges for cases with different expression styles. |
Tasks | |
Published | 2017-07-28 |
URL | http://arxiv.org/abs/1707.09168v1 |
PDF | http://arxiv.org/pdf/1707.09168v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-predict-charges-for-criminal |
Repo | |
Framework | |
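A hedged sketch of the joint modeling idea: attention over law-article representations, conditioned on an encoded fact description, feeding a charge classifier. The encoders, dimensions, and bilinear attention form are placeholder assumptions rather than the paper's architecture.

```python
# Attention over article embeddings conditioned on the fact encoding;
# a sketch of the idea, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChargePredictor(nn.Module):
    def __init__(self, dim=128, n_charges=50):
        super().__init__()
        self.attn = nn.Linear(dim, dim, bias=False)   # bilinear attention map
        self.out = nn.Linear(2 * dim, n_charges)

    def forward(self, fact, articles):
        # fact: (batch, dim); articles: (batch, n_articles, dim)
        scores = torch.einsum('bd,bnd->bn', self.attn(fact), articles)
        weights = F.softmax(scores, dim=1)               # article attention
        ctx = torch.einsum('bn,bnd->bd', weights, articles)
        return self.out(torch.cat([fact, ctx], dim=1))   # charge logits

model = ChargePredictor()
print(model(torch.randn(2, 128), torch.randn(2, 10, 128)).shape)  # (2, 50)
```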
Multi-kernel learning of deep convolutional features for action recognition
Title | Multi-kernel learning of deep convolutional features for action recognition |
Authors | Biswa Sengupta, Yu Qian |
Abstract | Image understanding using deep convolutional networks has reached human-level performance, yet the closely related problem of video understanding, especially action recognition, has not reached the requisite level of maturity. We combine multi-kernel support vector machines (SVMs) with a multi-stream deep convolutional neural network to achieve close to state-of-the-art performance on a 51-class activity recognition problem (the HMDB-51 dataset); this dataset has proved particularly challenging for deep neural networks due to its heterogeneity in camera viewpoints, video quality, etc. The resulting architecture is named pillar networks, as each (very) deep neural network acts as a pillar for the hierarchical classifiers. In addition, we illustrate that hand-crafted features such as improved dense trajectories (iDT) and Multi-skip Feature Stacking (MIFS), as additional pillars, can further improve performance. |
Tasks | Activity Recognition, Temporal Action Localization, Video Understanding |
Published | 2017-07-21 |
URL | http://arxiv.org/abs/1707.06923v2 |
PDF | http://arxiv.org/pdf/1707.06923v2.pdf |
PWC | https://paperswithcode.com/paper/multi-kernel-learning-of-deep-convolutional |
Repo | |
Framework | |
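A small sketch of the multi-kernel SVM stage: one kernel per feature stream ("pillar"), combined into a precomputed Gram matrix for an SVM. The features, weights, and kernel choice are synthetic assumptions; a full multiple-kernel-learning method would learn the combination weights rather than fixing them as here.

```python
# Combine per-stream kernels and train an SVM on the precomputed kernel;
# synthetic stand-in data, fixed (not learned) combination weights.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
streams = [rng.standard_normal((100, 64)) for _ in range(3)]  # 3 'pillars'
y = rng.integers(0, 5, 100)                                   # 5 toy classes

weights = [0.5, 0.3, 0.2]                  # assumed combination weights
K = sum(w * rbf_kernel(X) for w, X in zip(weights, streams))

clf = SVC(kernel='precomputed').fit(K, y)
print(clf.score(K, y))                     # training accuracy on toy data
```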
Multi-level SVM Based CAD Tool for Classifying Structural MRIs
Title | Multi-level SVM Based CAD Tool for Classifying Structural MRIs |
Authors | Jerrin Thomas Panachakel, Jeena R. S. |
Abstract | The revolutionary developments in the field of supervised machine learning have paved the way for the development of CAD tools that assist doctors in diagnosis. Recently, supervised learning has been employed in the prediction of neurological disorders such as Alzheimer’s disease. We propose a CAD (Computer Aided Diagnosis) tool for differentiating neural lesions caused by CVA (Cerebrovascular Accident) from lesions caused by other neural disorders, using Non-negative Matrix Factorisation (NMF) and Haralick features for feature extraction and an SVM (Support Vector Machine) for pattern recognition. We also introduce a multi-level classification system that has better classification efficiency, sensitivity, and specificity than systems using NMF or Haralick features alone. Cross-validation was performed using the LOOCV (Leave-One-Out Cross-Validation) method, and our proposed system achieves a classification accuracy of over 86%. |
Tasks | |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08227v1 |
PDF | http://arxiv.org/pdf/1706.08227v1.pdf |
PWC | https://paperswithcode.com/paper/multi-level-svm-based-cad-tool-for |
Repo | |
Framework | |
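One level of the described pipeline can be sketched as follows: NMF features, an SVM classifier, and leave-one-out cross-validation, on synthetic stand-in data. Haralick texture features and the multi-level cascade are omitted.

```python
# One level of an NMF + SVM pipeline with LOOCV; toy data, not MRI.
import numpy as np
from sklearn.decomposition import NMF
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((40, 256))            # 40 toy 'images', nonnegative pixels
y = rng.integers(0, 2, 40)           # CVA vs. other lesion (toy labels)

W = NMF(n_components=8, max_iter=500, random_state=0).fit_transform(X)
scores = cross_val_score(SVC(kernel='rbf'), W, y, cv=LeaveOneOut())
print(f"LOOCV accuracy: {scores.mean():.2f}")
```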
Improved Regularization Techniques for End-to-End Speech Recognition
Title | Improved Regularization Techniques for End-to-End Speech Recognition |
Authors | Yingbo Zhou, Caiming Xiong, Richard Socher |
Abstract | Regularization is important for end-to-end speech models, since the models are highly flexible and easy to overfit. Data augmentation and dropout have been important for improving end-to-end models in other domains, but they are relatively underexplored for end-to-end speech models. Therefore, we investigate the effectiveness of both methods for end-to-end trainable, deep speech recognition models. We augment audio data through random perturbations of tempo, pitch, volume, and temporal alignment, and by adding random noise. We further investigate the effect of dropout when applied to the inputs of all layers of the network. We show that the combination of data augmentation and dropout gives a relative performance improvement of over 20% on both the Wall Street Journal (WSJ) and LibriSpeech datasets. Our model’s performance is also competitive with other end-to-end speech models on both datasets. |
Tasks | Data Augmentation, End-To-End Speech Recognition, Speech Recognition |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.07108v1 |
PDF | http://arxiv.org/pdf/1712.07108v1.pdf |
PWC | https://paperswithcode.com/paper/improved-regularization-techniques-for-end-to |
Repo | |
Framework | |
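The style of augmentation described above can be sketched with naive DSP: random volume, a crude tempo change via linear resampling, and additive noise. Real pipelines would use proper tempo and pitch algorithms; every parameter here is an illustrative assumption.

```python
# Naive audio augmentation sketch: volume, tempo (linear resample), noise.
import numpy as np

def augment(wave, rng):
    gain = rng.uniform(0.7, 1.3)                       # random volume
    tempo = rng.uniform(0.9, 1.1)                      # random tempo factor
    idx = np.arange(0, len(wave) - 1, tempo)           # crude resampling grid
    stretched = np.interp(idx, np.arange(len(wave)), wave)
    noise = rng.normal(0, 0.005, len(stretched))       # additive noise
    return gain * stretched + noise

rng = np.random.default_rng(0)
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of 440 Hz
print(len(wave), len(augment(wave, rng)))                   # lengths differ
```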
Two-Sample Tests for Large Random Graphs Using Network Statistics
Title | Two-Sample Tests for Large Random Graphs Using Network Statistics |
Authors | Debarghya Ghoshdastidar, Maurilio Gutzeit, Alexandra Carpentier, Ulrike von Luxburg |
Abstract | We consider a two-sample hypothesis testing problem, where the distributions are defined on the space of undirected graphs, and one has access to only one observation from each model. A motivating example for this problem is comparing the friendship networks on Facebook and LinkedIn. The practical approach to such problems is to compare the networks based on certain network statistics. In this paper, we present a general principle for two-sample hypothesis testing in such scenarios without making any assumption about the network generation process. The main contribution of the paper is a general formulation of the problem based on concentration of network statistics, and consequently, a consistent two-sample test that arises as the natural solution for this problem. We also show that the proposed test is minimax optimal for certain network statistics. |
Tasks | |
Published | 2017-05-17 |
URL | http://arxiv.org/abs/1705.06168v2 |
PDF | http://arxiv.org/pdf/1705.06168v2.pdf |
PWC | https://paperswithcode.com/paper/two-sample-tests-for-large-random-graphs |
Repo | |
Framework | |
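An illustrative statistic-based test on a single pair of graphs: reject when a network statistic (edge density here) differs by more than a concentration-style threshold. The Hoeffding-type threshold below is a simplified stand-in, not the paper's minimax-optimal construction.

```python
# Two-sample test from one graph per model, via edge density plus a
# Hoeffding-style concentration threshold; a simplified illustration.
import numpy as np

def edge_density(A):
    n = A.shape[0]
    return A[np.triu_indices(n, 1)].mean()

def two_sample_test(A, B, alpha=0.05):
    n = A.shape[0]
    m = n * (n - 1) // 2
    # Deviation bound for a mean of m edge indicators, one per graph.
    tau = 2 * np.sqrt(np.log(2 / alpha) / (2 * m))
    stat = abs(edge_density(A) - edge_density(B))
    return stat > tau, stat, tau

rng = np.random.default_rng(0)
n = 200
A = np.triu(rng.random((n, n)) < 0.10, 1); A = (A | A.T).astype(int)
B = np.triu(rng.random((n, n)) < 0.16, 1); B = (B | B.T).astype(int)
print(two_sample_test(A, B))   # densities 0.10 vs 0.16 -> reject
```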
Exposure: A White-Box Photo Post-Processing Framework
Title | Exposure: A White-Box Photo Post-Processing Framework |
Authors | Yuanming Hu, Hao He, Chenxi Xu, Baoyuan Wang, Stephen Lin |
Abstract | Retouching can significantly elevate the visual appeal of photos, but many casual photographers lack the expertise to do this well. To address this problem, previous works have proposed automatic retouching systems based on supervised learning from paired training images acquired before and after manual editing. As it is difficult for users to acquire paired images that reflect their retouching preferences, we present in this paper a deep learning approach that is instead trained on unpaired data, namely a set of photographs that exhibits a retouching style the user likes, which is much easier to collect. Our system is formulated using deep convolutional neural networks that learn to apply different retouching operations on an input image. Network training with respect to various types of edits is enabled by modeling these retouching operations in a unified manner as resolution-independent differentiable filters. To apply the filters in a proper sequence and with suitable parameters, we employ a deep reinforcement learning approach that learns to make decisions on what action to take next, given the current state of the image. In contrast to many deep learning systems, ours provides users with an understandable solution in the form of conventional retouching edits, rather than just a “black-box” result. Through quantitative comparisons and user studies, we show that this technique generates retouching results consistent with the provided photo set. |
Tasks | |
Published | 2017-09-27 |
URL | http://arxiv.org/abs/1709.09602v2 |
PDF | http://arxiv.org/pdf/1709.09602v2.pdf |
PWC | https://paperswithcode.com/paper/exposure-a-white-box-photo-post-processing |
Repo | |
Framework | |
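One building block of such a system, a resolution-independent differentiable filter, can be sketched as an exposure adjustment whose scalar parameter receives gradients; the reinforcement-learned policy that sequences filters and picks their parameters is omitted.

```python
# A single differentiable retouching filter (exposure); the parameter 'ev'
# is trainable. A sketch of the filter idea, not the paper's full system.
import torch

def exposure_filter(img, ev):
    """img: (..., 3, H, W) linear RGB in [0, 1]; ev: exposure in stops."""
    return torch.clamp(img * torch.exp2(ev), 0.0, 1.0)

img = torch.rand(1, 3, 64, 64)
ev = torch.tensor(0.5, requires_grad=True)    # +0.5 stop, learnable
out = exposure_filter(img, ev)
loss = (out - img).abs().mean()               # toy objective
loss.backward()
print(ev.grad)                                # gradient flows into ev
```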
Controlled Tactile Exploration and Haptic Object Recognition
Title | Controlled Tactile Exploration and Haptic Object Recognition |
Authors | Massimo Regoli, Nawid Jamali, Giorgio Metta, Lorenzo Natale |
Abstract | In this paper we propose a novel method for in-hand object recognition. The method is composed of a grasp stabilization controller and two exploratory behaviours that capture the shape and softness of an object. Grasp stabilization plays an important role in recognizing objects. First, it prevents the object from slipping and facilitates exploration of the object. Second, reaching a stable and repeatable position adds robustness to the learning algorithm and increases invariance with respect to the way the robot grasps the object. The stable poses are estimated using a Gaussian mixture model (GMM). We present experimental results showing that with our method the classifier can successfully distinguish 30 objects. We also compare our method with a benchmark experiment in which grasp stabilization is disabled, and show, with statistical significance, that our method outperforms the benchmark. |
Tasks | Object Recognition |
Published | 2017-06-27 |
URL | http://arxiv.org/abs/1706.08697v1 |
PDF | http://arxiv.org/pdf/1706.08697v1.pdf |
PWC | https://paperswithcode.com/paper/controlled-tactile-exploration-and-haptic |
Repo | |
Framework | |
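The pose-modeling step can be sketched with scikit-learn: fit a Gaussian mixture to stable grasp configurations and use its log-likelihood to judge whether a new pose lies near a stable mode. The 3-D pose representation and the data are toy assumptions, not the robot's actual state.

```python
# GMM over stable grasp poses; likelihood flags proximity to a stable mode.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy 3-D grasp configurations clustered around two stable poses.
stable = np.vstack([rng.normal([0.1, 0.5, 0.2], 0.02, (50, 3)),
                    rng.normal([0.4, 0.1, 0.3], 0.02, (50, 3))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(stable)
new_pose = np.array([[0.11, 0.49, 0.21]])
print(gmm.score_samples(new_pose))  # high log-likelihood -> near a stable pose
```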
Multi-stage Multi-recursive-input Fully Convolutional Networks for Neuronal Boundary Detection
Title | Multi-stage Multi-recursive-input Fully Convolutional Networks for Neuronal Boundary Detection |
Authors | Wei Shen, Bin Wang, Yuan Jiang, Yan Wang, Alan Yuille |
Abstract | In the field of connectomics, neuroscientists seek to identify cortical connectivity comprehensively. Neuronal boundary detection in Electron Microscopy (EM) images is often performed to assist the automatic reconstruction of neuronal circuits. But the segmentation of EM images is challenging, as it requires the detector to find both filament-like thin and blob-like thick membranes while suppressing ambiguous intracellular structures. In this paper, we propose multi-stage multi-recursive-input fully convolutional networks to address this problem. The multiple recursive inputs for one stage, i.e., the multiple side outputs with different receptive field sizes learned from the lower stage, provide multi-scale contextual boundary information for the subsequent learning. This design is biologically plausible, resembling the way the human visual system compares different possible segmentation solutions to resolve ambiguous boundaries. Our multi-stage networks are trained end-to-end and achieve promising results on two publicly available EM segmentation datasets: the mouse piriform cortex dataset and the ISBI 2012 EM dataset. |
Tasks | Boundary Detection |
Published | 2017-03-24 |
URL | http://arxiv.org/abs/1703.08493v2 |
PDF | http://arxiv.org/pdf/1703.08493v2.pdf |
PWC | https://paperswithcode.com/paper/multi-stage-multi-recursive-input-fully |
Repo | |
Framework | |
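A toy sketch of the multi-recursive-input idea: a stage consumes the image concatenated with several side outputs of the previous stage as extra channels. The two-layer backbone and identical-resolution side outputs are simplifications of the paper's architecture.

```python
# Stage 2 takes the image plus stage 1's side outputs as recursive inputs.
import torch
import torch.nn as nn

class Stage(nn.Module):
    def __init__(self, in_ch, n_side=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        # One 1x1 side-output head per scale (all full-resolution here).
        self.sides = nn.ModuleList(nn.Conv2d(32, 1, 1) for _ in range(n_side))

    def forward(self, x):
        h = self.body(x)
        return [torch.sigmoid(s(h)) for s in self.sides]

img = torch.randn(1, 1, 64, 64)              # toy EM image
stage1, stage2 = Stage(1), Stage(1 + 3)
sides = stage1(img)                          # 3 side outputs
x2 = torch.cat([img] + sides, dim=1)         # recursive inputs to stage 2
print([s.shape for s in stage2(x2)])
```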
NMODE — Neuro-MODule Evolution
Title | NMODE — Neuro-MODule Evolution |
Authors | Keyan Ghazi-Zahedi |
Abstract | Modularisation, repetition, and symmetry are structural features shared by almost all biological neural networks, yet they are very unlikely to be found by means of structural evolution of artificial neural networks. This paper introduces NMODE, which is specifically designed to operate on neuro-modules. NMODE also addresses a second problem in the context of evolutionary robotics, the incremental evolution of complex behaviours for complex machines, by offering a way to interface neuro-modules. The scenario we have in mind is a complex walking machine, for which a locomotion module is evolved first and then extended by other modules in later stages. We show that NMODE is able to evolve a locomotion behaviour for a standard six-legged walking machine in approximately 10 generations, and we show how it can be used for the incremental evolution of a complex walking machine. The entire source code used in this paper is publicly available through GitHub. |
Tasks | |
Published | 2017-01-18 |
URL | http://arxiv.org/abs/1701.05121v1 |
PDF | http://arxiv.org/pdf/1701.05121v1.pdf |
PWC | https://paperswithcode.com/paper/nmode-neuro-module-evolution |
Repo | |
Framework | |
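As a rough illustration only, the sketch below reduces neuroevolution to a (mu+lambda)-style loop over fixed-size weight vectors; NMODE's defining features, evolving neuro-module structure and module interfaces, are not captured here.

```python
# Toy (mu+lambda) evolution of weight vectors; a stand-in for the kind of
# evolutionary loop NMODE runs, with structure evolution omitted.
import numpy as np

rng = np.random.default_rng(0)

def fitness(w):                       # toy stand-in for walking distance
    return -np.sum((w - 0.5) ** 2)

pop = [rng.normal(0, 1, 8) for _ in range(10)]
for gen in range(10):                 # a handful of generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:5]
    pop = parents + [p + rng.normal(0, 0.1, 8) for p in parents]  # mutate
print(round(fitness(pop[0]), 4))      # best fitness approaches 0
```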
Learning Robust Dialog Policies in Noisy Environments
Title | Learning Robust Dialog Policies in Noisy Environments |
Authors | Maryam Fazel-Zarandi, Shang-Wen Li, Jin Cao, Jared Casale, Peter Henderson, David Whitney, Alborz Geramifard |
Abstract | Modern virtual personal assistants provide a convenient interface for completing daily tasks via voice commands. An important consideration for these assistants is the ability to recover from automatic speech recognition (ASR) and natural language understanding (NLU) errors. In this paper, we focus on learning robust dialog policies to recover from these errors. To this end, we develop a user simulator which interacts with the assistant through voice commands in realistic scenarios with noisy audio, and use it to learn dialog policies through deep reinforcement learning. We show that dialogs generated by our simulator are indistinguishable from human generated dialogs, as determined by human evaluators. Furthermore, preliminary experimental results show that the learned policies in noisy environments achieve the same execution success rate with fewer dialog turns compared to fixed rule-based policies. |
Tasks | Speech Recognition |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.04034v1 |
PDF | http://arxiv.org/pdf/1712.04034v1.pdf |
PWC | https://paperswithcode.com/paper/learning-robust-dialog-policies-in-noisy |
Repo | |
Framework | |
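A toy bandit-style sketch of the policy-learning setup: given a bucketed ASR confidence, the agent learns whether to execute a command or ask for confirmation. The environment, rewards, and state space are invented for illustration and far simpler than the paper's simulator and deep RL method.

```python
# Tiny tabular policy learning over ASR-confidence buckets; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_states, actions = 3, ['execute', 'confirm']   # low/med/high ASR confidence
Q = np.zeros((n_states, 2))
p_correct = [0.3, 0.6, 0.9]                     # P(ASR was right | bucket)

for _ in range(5000):
    s = rng.integers(n_states)
    a = rng.integers(2) if rng.random() < 0.1 else int(Q[s].argmax())
    if a == 0:                                  # execute immediately
        r = 1.0 if rng.random() < p_correct[s] else -2.0
    else:                                       # confirm: safe but slower
        r = 0.5
    Q[s, a] += 0.1 * (r - Q[s, a])              # bandit-style update

print([actions[int(i)] for i in Q.argmax(axis=1)])  # confirm at low confidence
```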
RDeepSense: Reliable Deep Mobile Computing Models with Uncertainty Estimations
Title | RDeepSense: Reliable Deep Mobile Computing Models with Uncertainty Estimations |
Authors | Shuochao Yao, Yiran Zhao, Huajie Shao, Aston Zhang, Chao Zhang, Shen Li, Tarek Abdelzaher |
Abstract | Recent advances in deep learning have led various applications to unprecedented achievements, which could potentially bring higher intelligence to a broad spectrum of mobile and ubiquitous applications. Although existing studies have demonstrated the effectiveness and feasibility of running deep neural network inference on mobile and embedded devices, they have overlooked the reliability of mobile computing models. Reliability measurements such as predictive uncertainty estimations are key factors for improving decision accuracy and user experience. In this work, we propose RDeepSense, the first deep learning model that provides well-calibrated uncertainty estimations for resource-constrained mobile and embedded devices. RDeepSense enables predictive uncertainty by adopting a tunable proper scoring rule as the training criterion and dropout as an implicit Bayesian approximation, and we theoretically prove the correctness of this approach. To reduce computational complexity, RDeepSense employs efficient dropout and predictive distribution estimation instead of model ensembles or sampling-based methods for inference. We evaluate RDeepSense with four mobile sensing applications on Intel Edison devices. Results show that RDeepSense can reduce energy consumption by around 90% while producing superior uncertainty estimations and preserving at least the same model accuracy compared with other state-of-the-art methods. |
Tasks | |
Published | 2017-09-09 |
URL | http://arxiv.org/abs/1709.02980v1 |
PDF | http://arxiv.org/pdf/1709.02980v1.pdf |
PWC | https://paperswithcode.com/paper/rdeepsense-reliable-deep-mobile-computing |
Repo | |
Framework | |
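A hedged sketch of the training recipe described above: dropout at the input of every layer, and a predictive distribution (mean and variance) trained with a proper scoring rule, here the Gaussian negative log-likelihood. Layer sizes and the particular scoring rule are assumptions.

```python
# Dropout on every layer's input plus a (mean, log-variance) output trained
# with a proper scoring rule; a sketch of the idea, not RDeepSense itself.
import torch
import torch.nn as nn

class UncertainMLP(nn.Module):
    def __init__(self, d_in=10, hidden=64, p=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(p), nn.Linear(d_in, hidden), nn.ReLU(),
            nn.Dropout(p), nn.Linear(hidden, 2))   # -> (mean, log-variance)

    def forward(self, x):
        mu, log_var = self.net(x).unbind(dim=-1)
        return mu, log_var

model = UncertainMLP()
x, y = torch.randn(32, 10), torch.randn(32)
mu, log_var = model(x)
# Gaussian NLL, a proper scoring rule over the predictive distribution.
nll = 0.5 * (log_var + (y - mu) ** 2 / log_var.exp()).mean()
nll.backward()
print(float(nll))
```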
Deep Structured Learning for Facial Action Unit Intensity Estimation
Title | Deep Structured Learning for Facial Action Unit Intensity Estimation |
Authors | Robert Walecki, Ognjen Rudovic, Vladimir Pavlovic, Björn Schuller, Maja Pantic |
Abstract | We consider the task of automated estimation of facial expression intensity. This involves estimation of multiple output variables (facial action units, AUs) that are structurally dependent. Their structure arises from statistically induced co-occurrence patterns of AU intensity levels. Modeling this structure is critical for improving the estimation performance; however, this performance is bounded by the quality of the input features extracted from face images. The goal of this paper is to model these structures and estimate complex feature representations simultaneously by combining conditional random field (CRF) encoded AU dependencies with deep learning. To this end, we propose a novel Copula CNN deep learning approach for modeling multivariate ordinal variables. Our model accounts for ordinal structure in output variables and their non-linear dependencies via copula functions modeled as cliques of a CRF. These are jointly optimized with deep CNN feature encoding layers using a newly introduced balanced batch iterative training algorithm. We demonstrate the effectiveness of our approach on the task of AU intensity estimation on two benchmark datasets. We show that joint learning of the deep features and the target output structure results in significant performance gains compared to existing deep structured models for analysis of facial expressions. |
Tasks | |
Published | 2017-04-14 |
URL | http://arxiv.org/abs/1704.04481v1 |
PDF | http://arxiv.org/pdf/1704.04481v1.pdf |
PWC | https://paperswithcode.com/paper/deep-structured-learning-for-facial-action |
Repo | |
Framework | |
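One ingredient of the model, ordinal outputs for AU intensity, can be sketched as a cumulative-link head that yields a probability for each intensity level. The copula/CRF modeling of inter-AU dependencies and the balanced-batch training are omitted; dimensions are illustrative.

```python
# Cumulative-link (proportional odds) head for ordinal AU intensity levels;
# cutpoints are kept ordered here only by initialization (toy sketch).
import torch
import torch.nn as nn

class OrdinalHead(nn.Module):
    def __init__(self, d_in=128, n_levels=6):
        super().__init__()
        self.score = nn.Linear(d_in, 1)
        # Sorted cutpoints between adjacent intensity levels.
        self.cuts = nn.Parameter(torch.linspace(-2, 2, n_levels - 1))

    def forward(self, feat):
        s = self.score(feat)                      # (batch, 1) latent intensity
        cdf = torch.sigmoid(self.cuts - s)        # P(level <= k), (batch, L-1)
        upper = torch.cat([cdf, torch.ones_like(s)], dim=1)
        lower = torch.cat([torch.zeros_like(s), cdf], dim=1)
        return upper - lower                      # P(level == k), sums to 1

head = OrdinalHead()
probs = head(torch.randn(4, 128))
print(probs.shape, probs.sum(dim=1))              # (4, 6), all ones
```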