October 17, 2019

2915 words 14 mins read

Paper Group ANR 760

Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-Encoders

Title Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-Encoders
Authors Emad M. Grais, Dominic Ward, Mark D. Plumbley
Abstract Supervised multi-channel audio source separation requires extracting useful spectral, temporal, and spatial features from the mixed signals. The success of many existing systems is therefore largely dependent on the choice of features used for training. In this work, we introduce a novel multi-channel, multi-resolution convolutional auto-encoder neural network that works on raw time-domain signals to determine appropriate multi-resolution features for separating the singing-voice from stereo music. Our experimental results show that the proposed method can achieve multi-channel audio source separation without the need for hand-crafted features or any pre- or post-processing.
Tasks
Published 2018-03-02
URL http://arxiv.org/abs/1803.00702v1
PDF http://arxiv.org/pdf/1803.00702v1.pdf
PWC https://paperswithcode.com/paper/raw-multi-channel-audio-source-separation
Repo
Framework
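
The multi-resolution idea in the abstract above can be sketched as a set of parallel 1-D convolutions with different kernel sizes acting as learned filterbanks over the raw stereo waveform. The PyTorch snippet below is a minimal illustration under that assumption; the layer sizes, strides, and averaging of branch outputs are illustrative choices, not the authors' exact architecture.

```python
# Minimal sketch: parallel encoder/decoder branches at different temporal
# resolutions operating directly on raw stereo audio (assumed layout).
import torch
import torch.nn as nn

class MultiResConvAutoEncoder(nn.Module):
    def __init__(self, channels=2, kernel_sizes=(256, 512, 1024), hidden=32):
        super().__init__()
        # One encoder/decoder pair per temporal resolution.
        self.encoders = nn.ModuleList([
            nn.Conv1d(channels, hidden, k, stride=k // 2, padding=k // 2)
            for k in kernel_sizes
        ])
        self.decoders = nn.ModuleList([
            nn.ConvTranspose1d(hidden, channels, k, stride=k // 2, padding=k // 2)
            for k in kernel_sizes
        ])

    def forward(self, mixture):                      # mixture: (batch, 2, samples)
        outputs = []
        for enc, dec in zip(self.encoders, self.decoders):
            z = torch.relu(enc(mixture))             # multi-resolution features
            outputs.append(dec(z, output_size=mixture.shape))
        return torch.stack(outputs).mean(dim=0)      # estimated singing voice

model = MultiResConvAutoEncoder()
voice = model(torch.randn(4, 2, 44100))              # one second of stereo audio
```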

Predicting Conversion of Mild Cognitive Impairments to Alzheimer’s Disease and Exploring Impact of Neuroimaging

Title Predicting Conversion of Mild Cognitive Impairments to Alzheimer’s Disease and Exploring Impact of Neuroimaging
Authors Yaroslav Shmulev, Mikhail Belyaev
Abstract Nowadays, a lot of scientific effort is concentrated on the diagnosis of Alzheimer’s Disease (AD) by applying deep learning methods to neuroimaging data. In 2017 alone, more than a hundred papers dedicated to AD diagnosis were published, whereas only a few works considered the problem of mild cognitive impairment (MCI) conversion to AD. However, conversion prediction is an important problem, since approximately 15% of patients with MCI convert to AD every year. In the current work, we focus on conversion prediction using brain Magnetic Resonance Imaging and clinical data, such as demographics, cognitive assessments, and genetic and biochemical markers. First, we applied state-of-the-art deep learning algorithms to the neuroimaging data and compared these results with two machine learning algorithms that we fit using the clinical data. As a result, the models trained on the clinical data outperform the deep learning algorithms applied to the MR images. To explore the impact of neuroimaging further, we trained a deep feed-forward embedding using similarity learning with a Histogram loss on all available MRIs and obtained a 64-dimensional vector representation of the neuroimaging data. Using the learned representation from the deep embedding increased the quality of prediction based on the neuroimaging. Finally, the current results on this dataset show that neuroimaging does affect conversion prediction but cannot noticeably increase the quality of the prediction. The best results for predicting MCI-to-AD conversion are obtained by an XGBoost algorithm trained on the clinical and embedding data, with an accuracy of 0.76 ± 0.01 and an area under the ROC curve of 0.86 ± 0.01.
Tasks
Published 2018-07-30
URL http://arxiv.org/abs/1807.11228v1
PDF http://arxiv.org/pdf/1807.11228v1.pdf
PWC https://paperswithcode.com/paper/predicting-conversion-of-mild-cognitive
Repo
Framework
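
The final prediction stage described above (XGBoost on clinical features concatenated with the 64-dimensional MRI embedding) can be sketched as follows. The data below is synthetic and the feature layout is an assumption; only the overall pipeline is illustrated.

```python
# Hedged sketch of the best-performing pipeline: clinical features plus a
# learned MRI embedding fed to an XGBoost classifier, evaluated by ROC AUC.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
clinical = rng.normal(size=(n, 20))        # demographics, cognitive scores, markers
embedding = rng.normal(size=(n, 64))       # 64-d vector from the deep MRI embedding
X = np.hstack([clinical, embedding])
y = rng.integers(0, 2, size=n)             # 1 = converts to AD within the horizon

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = xgb.XGBClassifier(n_estimators=300, max_depth=3, learning_rate=0.05)
clf.fit(X_tr, y_tr)
print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```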

On the Learning Dynamics of Deep Neural Networks

Title On the Learning Dynamics of Deep Neural Networks
Authors Remi Tachet des Combes, Mohammad Pezeshki, Samira Shabanian, Aaron Courville, Yoshua Bengio
Abstract While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day poorly understood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that, given proper initialization, learning expands along parallel, independent modes and that certain regions of parameter space might lead to failed training. We also demonstrate that input norm and the frequency of features in the dataset lead to distinct convergence speeds, which might shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful for understanding recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon that we baptize “gradient starvation”, where the most frequent features in a dataset prevent the learning of other, less frequent but equally informative features.
Tasks
Published 2018-09-18
URL https://arxiv.org/abs/1809.06848v2
PDF https://arxiv.org/pdf/1809.06848v2.pdf
PWC https://paperswithcode.com/paper/on-the-learning-dynamics-of-deep-neural
Repo
Framework
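
As a toy illustration of the kind of dynamics studied above, the sketch below trains a small nonlinear network on linearly separable data and prints the training classification error over epochs, which tends to decrease in a roughly sigmoidal fashion. The architecture and hyperparameters are placeholders, not the paper's experimental setup.

```python
# Toy experiment: track classification error of a small nonlinear net trained
# with cross-entropy on linearly separable 2-D data.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1000, 2)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).float()        # linearly separable labels

net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(200):
    opt.zero_grad()
    logits = net(X).squeeze(1)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()
    if epoch % 20 == 0:
        err = ((logits > 0).float() != y).float().mean().item()
        print(f"epoch {epoch:3d}  loss {loss.item():.3f}  error {err:.3f}")
```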

Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks

Title Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks
Authors Linfeng Song, Zhiguo Wang, Mo Yu, Yue Zhang, Radu Florian, Daniel Gildea
Abstract Multi-hop reading comprehension focuses on one type of factoid question, where a system needs to properly integrate multiple pieces of evidence to correctly answer a question. Previous work approximates global evidence with local coreference information, encoding coreference chains with DAG-styled GRU layers within a gated-attention reader. However, coreference is limited in providing information for rich inference. We introduce a new method for better connecting global evidence, which forms more complex graphs compared to DAGs. To perform evidence integration on our graphs, we investigate two recent graph neural networks, namely graph convolutional network (GCN) and graph recurrent network (GRN). Experiments on two standard datasets show that richer global information leads to better answers. Our method performs better than all published results on these datasets.
Tasks Multi-Hop Reading Comprehension, Reading Comprehension
Published 2018-09-06
URL http://arxiv.org/abs/1809.02040v1
PDF http://arxiv.org/pdf/1809.02040v1.pdf
PWC https://paperswithcode.com/paper/exploring-graph-structured-passage
Repo
Framework
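
A minimal sketch of the evidence-integration step is given below: a single graph convolutional layer updates mention-node representations over a passage graph. This assumes mean aggregation over neighbours and is not the authors' exact GCN/GRN formulation.

```python
# One GCN layer over a small passage graph of mention nodes (assumed setup).
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # adj: (nodes, nodes) binary adjacency with self-loops included.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear(adj @ h / deg))   # mean over neighbours

nodes = torch.randn(6, 128)                 # 6 mention nodes, 128-d encodings
adj = torch.eye(6)
adj[0, 3] = adj[3, 0] = 1.0                 # coreference / same-entity edge
layer = GCNLayer(128)
updated = layer(nodes, adj)                 # evidence-integrated node states
```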

Fusion of stereo and still monocular depth estimates in a self-supervised learning context

Title Fusion of stereo and still monocular depth estimates in a self-supervised learning context
Authors Diogo Martins, Kevin van Hecke, Guido de Croon
Abstract We study how autonomous robots can learn by themselves to improve their depth estimation capability. In particular, we investigate a self-supervised learning setup in which stereo vision depth estimates serve as targets for a convolutional neural network (CNN) that transforms a single still image into a dense depth map. After training, the stereo and mono estimates are fused with a novel fusion method that preserves high-confidence stereo estimates while leveraging the CNN estimates in the low-confidence regions. The main contribution of the article is to show that the fused estimates lead to higher performance than the stereo vision estimates alone. Experiments are performed on the KITTI dataset and on board a Parrot SLAMDunk, showing that even rather limited CNNs can help provide stereo-vision-equipped robots with more reliable depth maps for autonomous navigation.
Tasks Autonomous Navigation, Depth Estimation
Published 2018-03-20
URL http://arxiv.org/abs/1803.07512v1
PDF http://arxiv.org/pdf/1803.07512v1.pdf
PWC https://paperswithcode.com/paper/fusion-of-stereo-and-still-monocular-depth
Repo
Framework
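
The fusion rule described above can be sketched per pixel: keep the stereo depth wherever its confidence is high and fall back to the monocular CNN estimate elsewhere. The NumPy snippet below assumes a generic confidence map and threshold; the paper's actual fusion method may weight the two estimates differently.

```python
# Hedged per-pixel fusion of stereo and monocular depth maps.
import numpy as np

def fuse_depth(stereo_depth, mono_depth, stereo_conf, threshold=0.7):
    """Keep stereo depth where confident, use the CNN estimate elsewhere."""
    return np.where(stereo_conf >= threshold, stereo_depth, mono_depth)

h, w = 240, 320
stereo = np.random.uniform(1.0, 10.0, (h, w))   # metres, from stereo matching
mono = np.random.uniform(1.0, 10.0, (h, w))     # metres, from the trained CNN
conf = np.random.uniform(0.0, 1.0, (h, w))      # stereo matching confidence
fused = fuse_depth(stereo, mono, conf)
```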

Interpreting Recurrent and Attention-Based Neural Models: a Case Study on Natural Language Inference

Title Interpreting Recurrent and Attention-Based Neural Models: a Case Study on Natural Language Inference
Authors Reza Ghaeini, Xiaoli Z. Fern, Prasad Tadepalli
Abstract Deep learning models have achieved remarkable success in natural language inference (NLI) tasks. While these models are widely explored, they are hard to interpret and it is often unclear how and why they actually work. In this paper, we take a step toward explaining such deep learning based models through a case study on a popular neural model for NLI. In particular, we propose to interpret the intermediate layers of NLI models by visualizing the saliency of attention and LSTM gating signals. We present several examples for which our methods are able to reveal interesting insights and identify the critical information contributing to the model decisions.
Tasks Natural Language Inference
Published 2018-08-12
URL http://arxiv.org/abs/1808.03894v1
PDF http://arxiv.org/pdf/1808.03894v1.pdf
PWC https://paperswithcode.com/paper/interpreting-recurrent-and-attention-based
Repo
Framework
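
One of the visualisation tools mentioned above, gradient-based saliency of attention, can be sketched as follows: back-propagate the score of one predicted class to the attention weights and inspect the resulting per-token saliency. The tiny attention layer below is a stand-in, not the NLI model from the paper.

```python
# Gradient-based saliency of attention weights (illustrative model only).
import torch
import torch.nn as nn

torch.manual_seed(0)
embeddings = torch.randn(1, 10, 64, requires_grad=True)   # premise tokens
attn_logits = nn.Linear(64, 1)(embeddings).squeeze(-1)    # (1, 10)
attn = torch.softmax(attn_logits, dim=-1)
attn.retain_grad()                                         # keep grad for saliency
context = (attn.unsqueeze(-1) * embeddings).sum(dim=1)    # attended summary
score = nn.Linear(64, 3)(context)[0, 0]                   # score of one NLI class

score.backward()
saliency = (attn * attn.grad).abs().squeeze(0)            # per-token saliency
print(saliency)
```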

Coupled Fluid Density and Motion from Single Views

Title Coupled Fluid Density and Motion from Single Views
Authors Marie-Lena Eckert, Wolfgang Heidrich, Nils Thuerey
Abstract We present a novel method to reconstruct a fluid’s 3D density and motion based on just a single sequence of images. This is rendered possible by using powerful physical priors for this strongly under-determined problem. More specifically, we propose a novel strategy to infer density updates strongly coupled to previous and current estimates of the flow motion. Additionally, we employ an accurate discretization and depth-based regularizers to compute stable solutions. Using only one view for the reconstruction reduces the complexity of the capturing setup drastically and could even allow for online video databases or smart-phone videos as inputs. The reconstructed 3D velocity can then be flexibly utilized, e.g., for re-simulation, domain modification or guiding purposes. We will demonstrate the capacity of our method with a series of synthetic test cases and the reconstruction of real smoke plumes captured with a Raspberry Pi camera.
Tasks
Published 2018-06-18
URL http://arxiv.org/abs/1806.06613v1
PDF http://arxiv.org/pdf/1806.06613v1.pdf
PWC https://paperswithcode.com/paper/coupled-fluid-density-and-motion-from-single
Repo
Framework

Compositional Verification for Autonomous Systems with Deep Learning Components

Title Compositional Verification for Autonomous Systems with Deep Learning Components
Authors Corina S. Pasareanu, Divya Gopinath, Huafeng Yu
Abstract As autonomy becomes prevalent in many applications, ranging from recommendation systems to fully autonomous vehicles, there is an increased need to provide safety guarantees for such systems. The problem is difficult, as these are large, complex systems which operate in uncertain environments, requiring data-driven machine-learning components. However, learning techniques such as Deep Neural Networks, widely used today, are inherently unpredictable and lack the theoretical foundations to provide strong assurance guarantees. We present a compositional approach for the scalable, formal verification of autonomous systems that contain Deep Neural Network components. The approach uses assume-guarantee reasoning whereby contracts, encoding the input-output behavior of individual components, allow the designer to model and incorporate the behavior of the learning-enabled components working side-by-side with the other components. We illustrate the approach on an example taken from the autonomous vehicles domain.
Tasks Autonomous Vehicles, Recommendation Systems
Published 2018-10-18
URL http://arxiv.org/abs/1810.08303v1
PDF http://arxiv.org/pdf/1810.08303v1.pdf
PWC https://paperswithcode.com/paper/compositional-verification-for-autonomous
Repo
Framework
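
The assume-guarantee contracts mentioned above can be illustrated with a small runtime check: each component's contract records an assumption on its inputs and a guarantee on its outputs. Real compositional verification is formal rather than test-based; the sketch below, with a hypothetical perception component, only conveys the interface.

```python
# Illustrative contract structure and runtime check (not formal verification).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Contract:
    assumption: Callable[[float], bool]          # what the component expects
    guarantee: Callable[[float, float], bool]    # what it promises about its output

def check(component, contract, inputs):
    """Check the guarantee on every input that satisfies the assumption."""
    for x in inputs:
        if contract.assumption(x):
            y = component(x)
            assert contract.guarantee(x, y), f"contract violated at input {x}"

# Hypothetical perception component: estimates distance to an obstacle.
perception = lambda x: 0.95 * x
perception_contract = Contract(
    assumption=lambda x: 0.0 <= x <= 100.0,            # valid sensor range (metres)
    guarantee=lambda x, y: abs(y - x) <= 0.1 * x + 1,  # bounded estimation error
)
check(perception, perception_contract, [0.0, 5.0, 50.0, 100.0])
```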

An Analysis of the Accuracy of the P300 BCI

Title An Analysis of the Accuracy of the P300 BCI
Authors Nitzan S. Artzi, Oren Shriki
Abstract The P300 Brain-Computer Interface (BCI) is a well-established communication channel for severely disabled people. The P300 event-related potential is mostly characterized by its amplitude or its area, which correlate with the spelling accuracy of the P300 speller. Here, we introduce a novel approach for estimating the efficiency of this BCI by considering the P300 signal-to-noise ratio (SNR), a parameter that estimates the spatial and temporal noise levels and has a significantly stronger correlation with spelling accuracy. Furthermore, we suggest a Gaussian noise model, which utilizes the P300 event-related potential SNR to predict spelling accuracy under various conditions for LDA-based classification. We demonstrate the utility of this analysis using real data and discuss its potential applications, such as speeding up the process of electrode selection.
Tasks
Published 2018-12-11
URL http://arxiv.org/abs/1901.03299v1
PDF http://arxiv.org/pdf/1901.03299v1.pdf
PWC https://paperswithcode.com/paper/an-analysis-of-the-accuracy-of-the-p300-bci
Repo
Framework
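
A Gaussian noise model in the spirit of the analysis above can be sketched with a Monte Carlo simulation: target classifier scores are drawn from N(SNR, 1), non-target scores from N(0, 1), and spelling a 6x6 speller symbol requires the target row and the target column to each win among six candidates. The exact model in the paper may differ; this only illustrates how SNR maps to predicted accuracy.

```python
# Monte Carlo sketch of a Gaussian noise model for P300 spelling accuracy.
import numpy as np

def predicted_accuracy(snr, n_rows=6, n_trials=100_000, seed=0):
    rng = np.random.default_rng(seed)
    # Row selection: the target row score must exceed all non-target rows.
    target = rng.normal(snr, 1.0, size=n_trials)
    others = rng.normal(0.0, 1.0, size=(n_trials, n_rows - 1))
    p_row = np.mean(target > others.max(axis=1))
    # Column selection is modelled identically, so square the probability.
    return p_row ** 2

for snr in (0.5, 1.0, 2.0, 3.0):
    print(f"SNR {snr:.1f} -> predicted spelling accuracy {predicted_accuracy(snr):.2f}")
```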

Structured Neural Topic Models for Reviews

Title Structured Neural Topic Models for Reviews
Authors Babak Esmaeili, Hongyi Huang, Byron C. Wallace, Jan-Willem van de Meent
Abstract We present Variational Aspect-based Latent Topic Allocation (VALTA), a family of autoencoding topic models that learn aspect-based representations of reviews. VALTA defines a user-item encoder that maps bag-of-words vectors for combined reviews associated with each paired user and item onto structured embeddings, which in turn define per-aspect topic weights. We model individual reviews in a structured manner by inferring an aspect assignment for each sentence in a given review, where the per-aspect topic weights obtained by the user-item encoder serve to define a mixture over topics, conditioned on the aspect. The result is an autoencoding neural topic model for reviews, which can be trained in a fully unsupervised manner to learn topics that are structured into aspects. Experimental evaluation on a large number of datasets demonstrates that aspects are interpretable, yield higher coherence scores than non-structured autoencoding topic model variants, and can be utilized to perform aspect-based comparison and genre discovery.
Tasks Topic Models
Published 2018-12-12
URL http://arxiv.org/abs/1812.05035v2
PDF http://arxiv.org/pdf/1812.05035v2.pdf
PWC https://paperswithcode.com/paper/structured-neural-topic-models-for-reviews
Repo
Framework
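
A structural sketch of the encoder described above: a bag-of-words review vector is mapped to per-aspect topic weights, and each sentence is reconstructed from the topics of its assigned aspect. The dimensions, the single linear encoder, and the hard aspect assignment below are simplifying assumptions, not the VALTA model itself.

```python
# Structural sketch: bag-of-words -> per-aspect topic weights -> word logits.
import torch
import torch.nn as nn

vocab, n_aspects, n_topics = 2000, 4, 10
encoder = nn.Linear(vocab, n_aspects * n_topics)
decoder = nn.Linear(n_topics, vocab)               # shared topic-word decoder

bow = torch.rand(1, vocab)                         # combined user-item reviews
aspect_topics = torch.softmax(
    encoder(bow).view(1, n_aspects, n_topics), dim=-1)     # per-aspect topic weights

sentence_aspect = 2                                # inferred aspect of one sentence
word_logits = decoder(aspect_topics[:, sentence_aspect])   # reconstruct its words
```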

Proximal Online Gradient is Optimum for Dynamic Regret

Title Proximal Online Gradient is Optimum for Dynamic Regret
Authors Yawei Zhao, Shuang Qiu, Ji Liu
Abstract In online learning, the dynamic regret metric chooses a reference (optimal) solution that may change over time, while the typical (static) regret metric assumes the reference solution to be constant over the whole time horizon. The dynamic regret metric is particularly interesting for applications such as online recommendation, since customers’ preferences always evolve over time. While the online gradient method has been shown to be optimal for the static regret metric, the optimal algorithm for dynamic regret remains unknown. In this paper, we show that proximal online gradient (a generalized version of online gradient) is optimal for dynamic regret, by proving a lower bound that matches an upper bound which slightly improves on the existing one.
Tasks
Published 2018-10-08
URL https://arxiv.org/abs/1810.03594v6
PDF https://arxiv.org/pdf/1810.03594v6.pdf
PWC https://paperswithcode.com/paper/proximal-online-gradient-is-optimum-for
Repo
Framework
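
The algorithm analysed above reduces, per round, to a gradient step on the current loss followed by a proximal step on the regulariser. The sketch below assumes a synthetic sequence of quadratic losses and an L1 regulariser, whose proximal operator is soft-thresholding.

```python
# Proximal online gradient with an L1 regulariser (synthetic loss sequence).
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
d, T, eta, lam = 5, 100, 0.1, 0.01
x = np.zeros(d)
for t in range(T):
    target = rng.normal(size=d)                     # drifting comparator at round t
    grad = x - target                               # gradient of f_t(x) = 0.5*||x - target||^2
    x = soft_threshold(x - eta * grad, eta * lam)   # proximal online gradient step
```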

Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition

Title Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition
Authors Wei-Ning Hsu, James Glass
Abstract The performance of automatic speech recognition (ASR) systems can be significantly compromised by previously unseen conditions, which is typically due to a mismatch between training and testing distributions. In this paper, we address robustness by studying domain invariant features, such that domain information becomes transparent to ASR systems, resolving the mismatch problem. Specifically, we investigate a recent model, called the Factorized Hierarchical Variational Autoencoder (FHVAE). FHVAEs learn to factorize sequence-level and segment-level attributes into different latent variables without supervision. We argue that the set of latent variables that contain segment-level information is our desired domain invariant feature for ASR. Experiments are conducted on Aurora-4 and CHiME-4, which demonstrate 41% and 27% absolute word error rate reductions respectively on mismatched domains.
Tasks Speech Recognition
Published 2018-03-07
URL http://arxiv.org/abs/1803.02551v1
PDF http://arxiv.org/pdf/1803.02551v1.pdf
PWC https://paperswithcode.com/paper/extracting-domain-invariant-features-by
Repo
Framework
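
The factorisation described above can be sketched structurally: one encoder infers a sequence-level latent shared across all segments of an utterance (speaker and channel attributes), another infers a segment-level latent per short segment, and the segment-level latents are what get used as domain-invariant ASR features. The snippet below only illustrates this shapes-and-wiring view, not the FHVAE training objective.

```python
# Structural sketch of sequence-level vs. segment-level latent extraction.
import torch
import torch.nn as nn

feat_dim, z_dim = 80, 32
seq_encoder = nn.GRU(feat_dim, z_dim, batch_first=True)          # -> sequence-level latent
seg_encoder = nn.GRU(feat_dim + z_dim, z_dim, batch_first=True)  # -> segment-level latent

utterance = torch.randn(1, 200, feat_dim)        # 200 frames of filterbank features
_, z2 = seq_encoder(utterance)                   # (1, 1, z_dim): utterance attributes
segments = utterance.unfold(1, 20, 20).permute(0, 1, 3, 2).reshape(-1, 20, feat_dim)
z2_tiled = z2.transpose(0, 1).expand(segments.size(0), 20, z_dim)
_, z1 = seg_encoder(torch.cat([segments, z2_tiled], dim=-1))
asr_features = z1.squeeze(0)                     # segment-level, domain-invariant
```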

Learning to Race through Coordinate Descent Bayesian Optimisation

Title Learning to Race through Coordinate Descent Bayesian Optimisation
Authors Rafael Oliveira, Fernando H. M. Rocha, Lionel Ott, Vitor Guizilini, Fabio Ramos, Valdir Grassi Jr
Abstract In the automation of many kinds of processes, the observable outcome can often be described as the combined effect of an entire sequence of actions, or controls, applied throughout its execution. In these cases, strategies to optimise control policies for individual stages of the process might not be applicable, and instead the whole policy might have to be optimised at once. On the other hand, the cost to evaluate the policy’s performance might also be high, so it is desirable that a solution can be found with as few interactions with the real system as possible. We consider the problem of optimising control policies to allow a robot to complete a given race track within a minimum amount of time. We assume that the robot has no prior information about the track or its own dynamical model, just an initial valid driving example. Localisation is only applied to monitor the robot and to provide an indication of its position along the track’s centre axis. We propose a method for finding a policy that minimises the time per lap while keeping the vehicle on the track, using a Bayesian optimisation (BO) approach over a reproducing kernel Hilbert space. We apply an algorithm to search more efficiently over high-dimensional policy-parameter spaces with BO by iterating over each dimension individually, in a sequential coordinate-descent-like scheme. Experiments demonstrate the performance of the algorithm against other methods in a simulated car racing environment.
Tasks Bayesian Optimisation, Car Racing
Published 2018-02-17
URL http://arxiv.org/abs/1802.06179v1
PDF http://arxiv.org/pdf/1802.06179v1.pdf
PWC https://paperswithcode.com/paper/learning-to-race-through-coordinate-descent
Repo
Framework
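
The coordinate-descent flavour of Bayesian optimisation described above can be sketched as follows: sweep over policy dimensions and, for each one, fit a 1-D Gaussian-process surrogate from a handful of rollouts while the other parameters stay fixed. The lap-time objective, acquisition rule, and budget below are all stand-ins for the real setup.

```python
# Coordinate-descent style 1-D Bayesian optimisation over policy parameters.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def lap_time(policy):                       # placeholder for the real rollout cost
    return np.sum((policy - 0.3) ** 2)

rng = np.random.default_rng(0)
policy = rng.uniform(0, 1, size=8)          # 8 policy parameters
candidates = np.linspace(0, 1, 50)[:, None]

for sweep in range(3):
    for d in range(len(policy)):
        # Evaluate a few points along coordinate d, others fixed.
        xs, ys = [], []
        for v in rng.uniform(0, 1, size=5):
            trial = policy.copy()
            trial[d] = v
            xs.append([v])
            ys.append(lap_time(trial))
        gp = GaussianProcessRegressor(kernel=RBF(0.2)).fit(xs, ys)
        mu, sigma = gp.predict(candidates, return_std=True)
        policy[d] = candidates[np.argmin(mu - sigma)][0]   # optimistic pick
print("optimised policy:", np.round(policy, 2))
```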

Presentation Attack Detection for Iris Recognition: An Assessment of the State of the Art

Title Presentation Attack Detection for Iris Recognition: An Assessment of the State of the Art
Authors Adam Czajka, Kevin W. Bowyer
Abstract Iris recognition is increasingly used in large-scale applications. As a result, presentation attack detection for iris recognition takes on fundamental importance. This survey covers the diverse research literature on this topic. Different categories of presentation attack are described and placed in an application-relevant framework, and the state of the art in detecting each category of attack is summarized. One conclusion from this is that presentation attack detection for iris recognition is not yet a solved problem. Datasets available for research are described, research directions for the near- and medium-term future are outlined, and a short list of recommended readings is suggested.
Tasks Iris Recognition
Published 2018-03-31
URL http://arxiv.org/abs/1804.00194v3
PDF http://arxiv.org/pdf/1804.00194v3.pdf
PWC https://paperswithcode.com/paper/presentation-attack-detection-for-iris
Repo
Framework

Learning Selfie-Friendly Abstraction from Artistic Style Images

Title Learning Selfie-Friendly Abstraction from Artistic Style Images
Authors Yicun Liu, Jimmy Ren, Jianbo Liu, Jiawei Zhang, Xiaohao Chen
Abstract Artistic style transfer can be thought of as a process that generates different versions of abstraction of the original image. However, most artistic style transfer operators are not optimized for human faces and thus suffer from two undesirable artifacts when applied to selfies. First, the edges of human faces may unpleasantly deviate from the ones in the original image. Second, the skin color is far from faithful to the original one, which is usually problematic in producing quality selfies. In this paper, we take a different approach and formulate this abstraction process as a gradient-domain learning problem. We aim to learn a type of abstraction which not only achieves the specified artistic style but also circumvents the two aforementioned drawbacks, making it highly applicable to selfie photography. We also show that our method can be directly generalized to videos with high inter-frame consistency. Our method is also robust to non-selfie images, and generalization to various kinds of real-life scenes is discussed. We will make our code publicly available.
Tasks Style Transfer
Published 2018-05-05
URL http://arxiv.org/abs/1805.02085v2
PDF http://arxiv.org/pdf/1805.02085v2.pdf
PWC https://paperswithcode.com/paper/learning-selfie-friendly-abstraction-from
Repo
Framework
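
The gradient-domain formulation mentioned above can be sketched by training against image gradients rather than raw stylised pixels, which encourages the network to preserve facial edges. The tiny network, target, and L1 gradient loss below are placeholder assumptions, not the paper's architecture.

```python
# Gradient-domain training sketch: match image gradients of the prediction
# to those of a stylised target instead of matching pixels directly.
import torch
import torch.nn.functional as F

def image_gradients(img):                       # img: (batch, channels, H, W)
    dx = img[:, :, :, 1:] - img[:, :, :, :-1]
    dy = img[:, :, 1:, :] - img[:, :, :-1, :]
    return dx, dy

selfie = torch.rand(1, 3, 64, 64)
stylised_target = torch.rand(1, 3, 64, 64)      # output of an artistic-style operator

net = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)   # stand-in for the real model
pred = net(selfie)

dx_p, dy_p = image_gradients(pred)
dx_t, dy_t = image_gradients(stylised_target)
loss = F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t)    # gradient-domain loss
loss.backward()
```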