October 17, 2019

2915 words 14 mins read

Paper Group ANR 760

Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-Encoders

Title Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-Encoders
Authors Emad M. Grais, Dominic Ward, Mark D. Plumbley
Abstract Supervised multi-channel audio source separation requires extracting useful spectral, temporal, and spatial features from the mixed signals. The success of many existing systems is therefore largely dependent on the choice of features used for training. In this work, we introduce a novel multi-channel, multi-resolution convolutional auto-encoder neural network that works on raw time-domain signals to determine appropriate multi-resolution features for separating the singing-voice from stereo music. Our experimental results show that the proposed method can achieve multi-channel audio source separation without the need for hand-crafted features or any pre- or post-processing.
Tasks
Published 2018-03-02
URL http://arxiv.org/abs/1803.00702v1
PDF http://arxiv.org/pdf/1803.00702v1.pdf
PWC https://paperswithcode.com/paper/raw-multi-channel-audio-source-separation
Repo
Framework
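
The multi-resolution idea in the abstract above can be sketched as a set of parallel 1-D convolutions with different kernel sizes acting as learned filterbanks over the raw stereo waveform. The PyTorch snippet below is a minimal illustration under that assumption; the layer sizes, strides, and averaging of branch outputs are illustrative choices, not the authors' exact architecture.

```python
# Minimal sketch: parallel encoder/decoder branches at different temporal
# resolutions operating directly on raw stereo audio (assumed layout).
import torch
import torch.nn as nn

class MultiResConvAutoEncoder(nn.Module):
    def __init__(self, channels=2, kernel_sizes=(256, 512, 1024), hidden=32):
        super().__init__()
        # One encoder/decoder pair per temporal resolution.
        self.encoders = nn.ModuleList([
            nn.Conv1d(channels, hidden, k, stride=k // 2, padding=k // 2)
            for k in kernel_sizes
        ])
        self.decoders = nn.ModuleList([
            nn.ConvTranspose1d(hidden, channels, k, stride=k // 2, padding=k // 2)
            for k in kernel_sizes
        ])

    def forward(self, mixture):                      # mixture: (batch, 2, samples)
        outputs = []
        for enc, dec in zip(self.encoders, self.decoders):
            z = torch.relu(enc(mixture))             # multi-resolution features
            outputs.append(dec(z, output_size=mixture.shape))
        return torch.stack(outputs).mean(dim=0)      # estimated singing voice

model = MultiResConvAutoEncoder()
voice = model(torch.randn(4, 2, 44100))              # one second of stereo audio
```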

Predicting Conversion of Mild Cognitive Impairments to Alzheimer’s Disease and Exploring Impact of Neuroimaging

Title Predicting Conversion of Mild Cognitive Impairments to Alzheimer’s Disease and Exploring Impact of Neuroimaging
Authors Yaroslav Shmulev, Mikhail Belyaev
Abstract Nowadays, a lot of scientific effort is concentrated on the diagnosis of Alzheimer’s Disease (AD) by applying deep learning methods to neuroimaging data. In 2017 alone, more than a hundred papers dedicated to AD diagnosis were published, whereas only a few works considered the problem of mild cognitive impairment (MCI) conversion to AD. However, conversion prediction is an important problem, since approximately 15% of patients with MCI convert to AD every year. In the current work, we focus on conversion prediction using brain Magnetic Resonance Imaging and clinical data, such as demographics, cognitive assessments, and genetic and biochemical markers. First, we applied state-of-the-art deep learning algorithms to the neuroimaging data and compared these results with two machine learning algorithms that we fit using the clinical data. As a result, the models trained on the clinical data outperform the deep learning algorithms applied to the MR images. To explore the impact of neuroimaging further, we trained a deep feed-forward embedding using similarity learning with a Histogram loss on all available MRIs and obtained a 64-dimensional vector representation of the neuroimaging data. Using the learned representation from the deep embedding increased the quality of prediction based on the neuroimaging. Finally, the current results on this dataset show that neuroimaging does affect conversion prediction but cannot noticeably increase the quality of the prediction. The best results for predicting MCI-to-AD conversion are obtained by an XGBoost algorithm trained on the clinical and embedding data, with an accuracy of 0.76 ± 0.01 and an area under the ROC curve of 0.86 ± 0.01.
Tasks
Published 2018-07-30
URL http://arxiv.org/abs/1807.11228v1
PDF http://arxiv.org/pdf/1807.11228v1.pdf
PWC https://paperswithcode.com/paper/predicting-conversion-of-mild-cognitive
Repo
Framework
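
The final prediction stage described above (XGBoost on clinical features concatenated with the 64-dimensional MRI embedding) can be sketched as follows. The data below is synthetic and the feature layout is an assumption; only the overall pipeline is illustrated.

```python
# Hedged sketch of the best-performing pipeline: clinical features plus a
# learned MRI embedding fed to an XGBoost classifier, evaluated by ROC AUC.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
clinical = rng.normal(size=(n, 20))        # demographics, cognitive scores, markers
embedding = rng.normal(size=(n, 64))       # 64-d vector from the deep MRI embedding
X = np.hstack([clinical, embedding])
y = rng.integers(0, 2, size=n)             # 1 = converts to AD within the horizon

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = xgb.XGBClassifier(n_estimators=300, max_depth=3, learning_rate=0.05)
clf.fit(X_tr, y_tr)
print("ROC AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```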

On the Learning Dynamics of Deep Neural Networks

Title On the Learning Dynamics of Deep Neural Networks
Authors Remi Tachet des Combes, Mohammad Pezeshki, Samira Shabanian, Aaron Courville, Yoshua Bengio
Abstract While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day poorly understood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that, given proper initialization, learning expands along parallel, independent modes and that certain regions of parameter space might lead to failed training. We also demonstrate that input norm and the frequency of features in the dataset lead to distinct convergence speeds, which might shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful for understanding recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon that we baptize “gradient starvation”, where the most frequent features in a dataset prevent the learning of other, less frequent but equally informative features.
Tasks
Published 2018-09-18
URL https://arxiv.org/abs/1809.06848v2
PDF https://arxiv.org/pdf/1809.06848v2.pdf
PWC https://paperswithcode.com/paper/on-the-learning-dynamics-of-deep-neural
Repo
Framework
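
As a toy illustration of the kind of dynamics studied above, the sketch below trains a small nonlinear network on linearly separable data and prints the training classification error over epochs, which tends to decrease in a roughly sigmoidal fashion. The architecture and hyperparameters are placeholders, not the paper's experimental setup.

```python
# Toy experiment: track classification error of a small nonlinear net trained
# with cross-entropy on linearly separable 2-D data.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1000, 2)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).float()        # linearly separable labels

net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(200):
    opt.zero_grad()
    logits = net(X).squeeze(1)
    loss = loss_fn(logits, y)
    loss.backward()
    opt.step()
    if epoch % 20 == 0:
        err = ((logits > 0).float() != y).float().mean().item()
        print(f"epoch {epoch:3d}  loss {loss.item():.3f}  error {err:.3f}")
```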

Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks

Title Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks
Authors Linfeng Song, Zhiguo Wang, Mo Yu, Yue Zhang, Radu Florian, Daniel Gildea
Abstract Multi-hop reading comprehension focuses on one type of factoid question, where a system needs to properly integrate multiple pieces of evidence to correctly answer a question. Previous work approximates global evidence with local coreference information, encoding coreference chains with DAG-styled GRU layers within a gated-attention reader. However, coreference is limited in providing information for rich inference. We introduce a new method for better connecting global evidence, which forms more complex graphs compared to DAGs. To perform evidence integration on our graphs, we investigate two recent graph neural networks, namely graph convolutional network (GCN) and graph recurrent network (GRN). Experiments on two standard datasets show that richer global information leads to better answers. Our method performs better than all published results on these datasets.
Tasks Multi-Hop Reading Comprehension, Reading Comprehension
Published 2018-09-06
URL http://arxiv.org/abs/1809.02040v1
PDF http://arxiv.org/pdf/1809.02040v1.pdf
PWC https://paperswithcode.com/paper/exploring-graph-structured-passage
Repo
Framework
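
A minimal sketch of the evidence-integration step is given below: a single graph convolutional layer updates mention-node representations over a passage graph. This assumes mean aggregation over neighbours and is not the authors' exact GCN/GRN formulation.

```python
# One GCN layer over a small passage graph of mention nodes (assumed setup).
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # adj: (nodes, nodes) binary adjacency with self-loops included.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear(adj @ h / deg))   # mean over neighbours

nodes = torch.randn(6, 128)                 # 6 mention nodes, 128-d encodings
adj = torch.eye(6)
adj[0, 3] = adj[3, 0] = 1.0                 # coreference / same-entity edge
layer = GCNLayer(128)
updated = layer(nodes, adj)                 # evidence-integrated node states
```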

Fusion of stereo and still monocular depth estimates in a self-supervised learning context

Title Fusion of stereo and still monocular depth estimates in a self-supervised learning context
Authors Diogo Martins, Kevin van Hecke, Guido de Croon
Abstract We study how autonomous robots can learn by themselves to improve their depth estimation capability. In particular, we investigate a self-supervised learning setup in which stereo vision depth estimates serve as targets for a convolutional neural network (CNN) that transforms a single still image into a dense depth map. After training, the stereo and mono estimates are fused with a novel fusion method that preserves high-confidence stereo estimates while leveraging the CNN estimates in the low-confidence regions. The main contribution of the article is to show that the fused estimates lead to higher performance than the stereo vision estimates alone. Experiments are performed on the KITTI dataset and on board a Parrot SLAMDunk, showing that even rather limited CNNs can help provide stereo-vision-equipped robots with more reliable depth maps for autonomous navigation.
Tasks Autonomous Navigation, Depth Estimation
Published 2018-03-20
URL http://arxiv.org/abs/1803.07512v1
PDF http://arxiv.org/pdf/1803.07512v1.pdf
PWC https://paperswithcode.com/paper/fusion-of-stereo-and-still-monocular-depth
Repo
Framework
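
The fusion rule described above can be sketched per pixel: keep the stereo depth wherever its confidence is high and fall back to the monocular CNN estimate elsewhere. The NumPy snippet below assumes a generic confidence map and threshold; the paper's actual fusion method may weight the two estimates differently.

```python
# Hedged per-pixel fusion of stereo and monocular depth maps.
import numpy as np

def fuse_depth(stereo_depth, mono_depth, stereo_conf, threshold=0.7):
    """Keep stereo depth where confident, use the CNN estimate elsewhere."""
    return np.where(stereo_conf >= threshold, stereo_depth, mono_depth)

h, w = 240, 320
stereo = np.random.uniform(1.0, 10.0, (h, w))   # metres, from stereo matching
mono = np.random.uniform(1.0, 10.0, (h, w))     # metres, from the trained CNN
conf = np.random.uniform(0.0, 1.0, (h, w))      # stereo matching confidence
fused = fuse_depth(stereo, mono, conf)
```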

Interpreting Recurrent and Attention-Based Neural Models: a Case Study on Natural Language Inference

Title Interpreting Recurrent and Attention-Based Neural Models: a Case Study on Natural Language Inference
Authors Reza Ghaeini, Xiaoli Z. Fern, Prasad Tadepalli
Abstract Deep learning models have achieved remarkable success in natural language inference (NLI) tasks. While these models are widely explored, they are hard to interpret and it is often unclear how and why they actually work. In this paper, we take a step toward explaining such deep learning based models through a case study on a popular neural model for NLI. In particular, we propose to interpret the intermediate layers of NLI models by visualizing the saliency of attention and LSTM gating signals. We present several examples for which our methods are able to reveal interesting insights and identify the critical information contributing to the model decisions.
Tasks Natural Language Inference
Published 2018-08-12
URL http://arxiv.org/abs/1808.03894v1
PDF http://arxiv.org/pdf/1808.03894v1.pdf
PWC https://paperswithcode.com/paper/interpreting-recurrent-and-attention-based
Repo
Framework
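
One of the visualisation tools mentioned above, gradient-based saliency of attention, can be sketched as follows: back-propagate the score of one predicted class to the attention weights and inspect the resulting per-token saliency. The tiny attention layer below is a stand-in, not the NLI model from the paper.

```python
# Gradient-based saliency of attention weights (illustrative model only).
import torch
import torch.nn as nn

torch.manual_seed(0)
embeddings = torch.randn(1, 10, 64, requires_grad=True)   # premise tokens
attn_logits = nn.Linear(64, 1)(embeddings).squeeze(-1)    # (1, 10)
attn = torch.softmax(attn_logits, dim=-1)
attn.retain_grad()                                         # keep grad for saliency
context = (attn.unsqueeze(-1) * embeddings).sum(dim=1)    # attended summary
score = nn.Linear(64, 3)(context)[0, 0]                   # score of one NLI class

score.backward()
saliency = (attn * attn.grad).abs().squeeze(0)            # per-token saliency
print(saliency)
```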

Coupled Fluid Density and Motion from Single Views

Title Coupled Fluid Density and Motion from Single Views
Authors Marie-Lena Eckert, Wolfgang Heidrich, Nils Thuerey
Abstract We present a novel method to reconstruct a fluid’s 3D density and motion based on just a single sequence of images. This is rendered possible by using powerful physical priors for this strongly under-determined problem. More specifically, we propose a novel strategy to infer density updates strongly coupled to previous and current estimates of the flow motion. Additionally, we employ an accurate discretization and depth-based regularizers to compute stable solutions. Using only one view for the reconstruction reduces the complexity of the capturing setup drastically and could even allow for online video databases or smart-phone videos as inputs. The reconstructed 3D velocity can then be flexibly utilized, e.g., for re-simulation, domain modification or guiding purposes. We will demonstrate the capacity of our method with a series of synthetic test cases and the reconstruction of real smoke plumes captured with a Raspberry Pi camera.
Tasks
Published 2018-06-18
URL http://arxiv.org/abs/1806.06613v1
PDF http://arxiv.org/pdf/1806.06613v1.pdf
PWC https://paperswithcode.com/paper/coupled-fluid-density-and-motion-from-single
Repo
Framework

Compositional Verification for Autonomous Systems with Deep Learning Components

Title Compositional Verification for Autonomous Systems with Deep Learning Components
Authors Corina S. Pasareanu, Divya Gopinath, Huafeng Yu
Abstract As autonomy becomes prevalent in many applications, ranging from recommendation systems to fully autonomous vehicles, there is an increased need to provide safety guarantees for such systems. The problem is difficult, as these are large, complex systems which operate in uncertain environments, requiring data-driven machine-learning components. However, learning techniques such as Deep Neural Networks, widely used today, are inherently unpredictable and lack the theoretical foundations to provide strong assurance guarantees. We present a compositional approach for the scalable, formal verification of autonomous systems that contain Deep Neural Network components. The approach uses assume-guarantee reasoning whereby contracts, encoding the input-output behavior of individual components, allow the designer to model and incorporate the behavior of the learning-enabled components working side-by-side with the other components. We illustrate the approach on an example taken from the autonomous vehicles domain.
Tasks Autonomous Vehicles, Recommendation Systems
Published 2018-10-18
URL http://arxiv.org/abs/1810.08303v1
PDF http://arxiv.org/pdf/1810.08303v1.pdf
PWC https://paperswithcode.com/paper/compositional-verification-for-autonomous
Repo
Framework
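
The assume-guarantee contracts mentioned above can be illustrated with a small runtime check: each component's contract records an assumption on its inputs and a guarantee on its outputs. Real compositional verification is formal rather than test-based; the sketch below, with a hypothetical perception component, only conveys the interface.

```python
# Illustrative contract structure and runtime check (not formal verification).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Contract:
    assumption: Callable[[float], bool]          # what the component expects
    guarantee: Callable[[float, float], bool]    # what it promises about its output

def check(component, contract, inputs):
    """Check the guarantee on every input that satisfies the assumption."""
    for x in inputs:
        if contract.assumption(x):
            y = component(x)
            assert contract.guarantee(x, y), f"contract violated at input {x}"

# Hypothetical perception component: estimates distance to an obstacle.
perception = lambda x: 0.95 * x
perception_contract = Contract(
    assumption=lambda x: 0.0 <= x <= 100.0,            # valid sensor range (metres)
    guarantee=lambda x, y: abs(y - x) <= 0.1 * x + 1,  # bounded estimation error
)
check(perception, perception_contract, [0.0, 5.0, 50.0, 100.0])
```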

An Analysis of the Accuracy of the P300 BCI

Title An Analysis of the Accuracy of the P300 BCI
Authors Nitzan S. Artzi, Oren Shriki
Abstract The P300 Brain-Computer Interface (BCI) is a well-established communication channel for severely disabled people. The P300 event-related potential is mostly characterized by its amplitude or its area, which correlate with the spelling accuracy of the P300 speller. Here, we introduce a novel approach for estimating the efficiency of this BCI by considering the P300 signal-to-noise ratio (SNR), a parameter that estimates the spatial and temporal noise levels and has a significantly stronger correlation with spelling accuracy. Furthermore, we suggest a Gaussian noise model, which utilizes the P300 event-related potential SNR to predict spelling accuracy under various conditions for LDA-based classification. We demonstrate the utility of this analysis using real data and discuss its potential applications, such as speeding up the process of electrode selection.
Tasks
Published 2018-12-11
URL http://arxiv.org/abs/1901.03299v1
PDF http://arxiv.org/pdf/1901.03299v1.pdf
PWC https://paperswithcode.com/paper/an-analysis-of-the-accuracy-of-the-p300-bci
Repo
Framework
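
A Gaussian noise model in the spirit of the analysis above can be sketched with a Monte Carlo simulation: target classifier scores are drawn from N(SNR, 1), non-target scores from N(0, 1), and spelling a 6x6 speller symbol requires the target row and the target column to each win among six candidates. The exact model in the paper may differ; this only illustrates how SNR maps to predicted accuracy.

```python
# Monte Carlo sketch of a Gaussian noise model for P300 spelling accuracy.
import numpy as np

def predicted_accuracy(snr, n_rows=6, n_trials=100_000, seed=0):
    rng = np.random.default_rng(seed)
    # Row selection: the target row score must exceed all non-target rows.
    target = rng.normal(snr, 1.0, size=n_trials)
    others = rng.normal(0.0, 1.0, size=(n_trials, n_rows - 1))
    p_row = np.mean(target > others.max(axis=1))
    # Column selection is modelled identically, so square the probability.
    return p_row ** 2

for snr in (0.5, 1.0, 2.0, 3.0):
    print(f"SNR {snr:.1f} -> predicted spelling accuracy {predicted_accuracy(snr):.2f}")
```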

Structured Neural Topic Models for Reviews

Title Structured Neural Topic Models for Reviews
Authors Babak Esmaeili, Hongyi Huang, Byron C. Wallace, Jan-Willem van de Meent
Abstract We present Variational Aspect-based Latent Topic Allocation (VALTA), a family of autoencoding topic models that learn aspect-based representations of reviews. VALTA defines a user-item encoder that maps bag-of-words vectors for combined reviews associated with each paired user and item onto structured embeddings, which in turn define per-aspect topic weights. We model individual reviews in a structured manner by inferring an aspect assignment for each sentence in a given review, where the per-aspect topic weights obtained by the user-item encoder serve to define a mixture over topics, conditioned on the aspect. The result is an autoencoding neural topic model for reviews, which can be trained in a fully unsupervised manner to learn topics that are structured into aspects. Experimental evaluation on a large number of datasets demonstrates that aspects are interpretable, yield higher coherence scores than non-structured autoencoding topic model variants, and can be utilized to perform aspect-based comparison and genre discovery.
Tasks Topic Models
Published 2018-12-12
URL http://arxiv.org/abs/1812.05035v2
PDF http://arxiv.org/pdf/1812.05035v2.pdf
PWC https://paperswithcode.com/paper/structured-neural-topic-models-for-reviews
Repo
Framework
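
A structural sketch of the encoder described above: a bag-of-words review vector is mapped to per-aspect topic weights, and each sentence is reconstructed from the topics of its assigned aspect. The dimensions, the single linear encoder, and the hard aspect assignment below are simplifying assumptions, not the VALTA model itself.

```python
# Structural sketch: bag-of-words -> per-aspect topic weights -> word logits.
import torch
import torch.nn as nn

vocab, n_aspects, n_topics = 2000, 4, 10
encoder = nn.Linear(vocab, n_aspects * n_topics)
decoder = nn.Linear(n_topics, vocab)               # shared topic-word decoder

bow = torch.rand(1, vocab)                         # combined user-item reviews
aspect_topics = torch.softmax(
    encoder(bow).view(1, n_aspects, n_topics), dim=-1)     # per-aspect topic weights

sentence_aspect = 2                                # inferred aspect of one sentence
word_logits = decoder(aspect_topics[:, sentence_aspect])   # reconstruct its words
```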

Proximal Online Gradient is Optimum for Dynamic Regret

Title Proximal Online Gradient is Optimum for Dynamic Regret
Authors Yawei Zhao, Shuang Qiu, Ji Liu
Abstract In online learning, the dynamic regret metric chooses a reference (optimal) solution that may change over time, while the typical (static) regret metric assumes the reference solution to be constant over the whole time horizon. The dynamic regret metric is particularly interesting for applications such as online recommendation, since customers’ preferences always evolve over time. While the online gradient method has been shown to be optimal for the static regret metric, the optimal algorithm for dynamic regret remains unknown. In this paper, we show that proximal online gradient (a generalized version of online gradient) is optimal for dynamic regret, by proving a lower bound that matches an upper bound which slightly improves on the existing one.
Tasks
Published 2018-10-08
URL https://arxiv.org/abs/1810.03594v6
PDF https://arxiv.org/pdf/1810.03594v6.pdf
PWC https://paperswithcode.com/paper/proximal-online-gradient-is-optimum-for
Repo
Framework
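
The algorithm analysed above reduces, per round, to a gradient step on the current loss followed by a proximal step on the regulariser. The sketch below assumes a synthetic sequence of quadratic losses and an L1 regulariser, whose proximal operator is soft-thresholding.

```python
# Proximal online gradient with an L1 regulariser (synthetic loss sequence).
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(0)
d, T, eta, lam = 5, 100, 0.1, 0.01
x = np.zeros(d)
for t in range(T):
    target = rng.normal(size=d)                     # drifting comparator at round t
    grad = x - target                               # gradient of f_t(x) = 0.5*||x - target||^2
    x = soft_threshold(x - eta * grad, eta * lam)   # proximal online gradient step
```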

Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition

Title Extracting Domain Invariant Features by Unsupervised Learning for Robust Automatic Speech Recognition
Authors Wei-Ning Hsu, James Glass
Abstract The performance of automatic speech recognition (ASR) systems can be significantly compromised by previously unseen conditions, which is typically due to a mismatch between training and testing distributions. In this paper, we address robustness by studying domain invariant features, such that domain information becomes transparent to ASR systems, resolving the mismatch problem. Specifically, we investigate a recent model, called the Factorized Hierarchical Variational Autoencoder (FHVAE). FHVAEs learn to factorize sequence-level and segment-level attributes into different latent variables without supervision. We argue that the set of latent variables that contain segment-level information is our desired domain invariant feature for ASR. Experiments are conducted on Aurora-4 and CHiME-4, which demonstrate 41% and 27% absolute word error rate reductions respectively on mismatched domains.
Tasks Speech Recognition
Published 2018-03-07
URL http://arxiv.org/abs/1803.02551v1
PDF http://arxiv.org/pdf/1803.02551v1.pdf
PWC https://paperswithcode.com/paper/extracting-domain-invariant-features-by
Repo
Framework
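
The factorisation described above can be sketched structurally: one encoder infers a sequence-level latent shared across all segments of an utterance (speaker and channel attributes), another infers a segment-level latent per short segment, and the segment-level latents are what get used as domain-invariant ASR features. The snippet below only illustrates this shapes-and-wiring view, not the FHVAE training objective.

```python
# Structural sketch of sequence-level vs. segment-level latent extraction.
import torch
import torch.nn as nn

feat_dim, z_dim = 80, 32
seq_encoder = nn.GRU(feat_dim, z_dim, batch_first=True)          # -> sequence-level latent
seg_encoder = nn.GRU(feat_dim + z_dim, z_dim, batch_first=True)  # -> segment-level latent

utterance = torch.randn(1, 200, feat_dim)        # 200 frames of filterbank features
_, z2 = seq_encoder(utterance)                   # (1, 1, z_dim): utterance attributes
segments = utterance.unfold(1, 20, 20).permute(0, 1, 3, 2).reshape(-1, 20, feat_dim)
z2_tiled = z2.transpose(0, 1).expand(segments.size(0), 20, z_dim)
_, z1 = seg_encoder(torch.cat([segments, z2_tiled], dim=-1))
asr_features = z1.squeeze(0)                     # segment-level, domain-invariant
```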

Learning to Race through Coordinate Descent Bayesian Optimisation

Title Learning to Race through Coordinate Descent Bayesian Optimisation
Authors Rafael Oliveira, Fernando H. M. Rocha, Lionel Ott, Vitor Guizilini, Fabio Ramos, Valdir Grassi Jr
Abstract In the automation of many kinds of processes, the observable outcome can often be described as the combined effect of an entire sequence of actions, or controls, applied throughout its execution. In these cases, strategies to optimise control policies for individual stages of the process might not be applicable, and instead the whole policy might have to be optimised at once. On the other hand, the cost to evaluate the policy’s performance might also be high, so it is desirable that a solution can be found with as few interactions with the real system as possible. We consider the problem of optimising control policies to allow a robot to complete a given race track within a minimum amount of time. We assume that the robot has no prior information about the track or its own dynamical model, just an initial valid driving example. Localisation is only applied to monitor the robot and to provide an indication of its position along the track’s centre axis. We propose a method for finding a policy that minimises the time per lap while keeping the vehicle on the track, using a Bayesian optimisation (BO) approach over a reproducing kernel Hilbert space. We apply an algorithm to search more efficiently over high-dimensional policy-parameter spaces with BO by iterating over each dimension individually, in a sequential coordinate-descent-like scheme. Experiments demonstrate the performance of the algorithm against other methods in a simulated car racing environment.
Tasks Bayesian Optimisation, Car Racing
Published 2018-02-17
URL http://arxiv.org/abs/1802.06179v1
PDF http://arxiv.org/pdf/1802.06179v1.pdf
PWC https://paperswithcode.com/paper/learning-to-race-through-coordinate-descent
Repo
Framework
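
The coordinate-descent flavour of Bayesian optimisation described above can be sketched as follows: sweep over policy dimensions and, for each one, fit a 1-D Gaussian-process surrogate from a handful of rollouts while the other parameters stay fixed. The lap-time objective, acquisition rule, and budget below are all stand-ins for the real setup.

```python
# Coordinate-descent style 1-D Bayesian optimisation over policy parameters.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def lap_time(policy):                       # placeholder for the real rollout cost
    return np.sum((policy - 0.3) ** 2)

rng = np.random.default_rng(0)
policy = rng.uniform(0, 1, size=8)          # 8 policy parameters
candidates = np.linspace(0, 1, 50)[:, None]

for sweep in range(3):
    for d in range(len(policy)):
        # Evaluate a few points along coordinate d, others fixed.
        xs, ys = [], []
        for v in rng.uniform(0, 1, size=5):
            trial = policy.copy()
            trial[d] = v
            xs.append([v])
            ys.append(lap_time(trial))
        gp = GaussianProcessRegressor(kernel=RBF(0.2)).fit(xs, ys)
        mu, sigma = gp.predict(candidates, return_std=True)
        policy[d] = candidates[np.argmin(mu - sigma)][0]   # optimistic pick
print("optimised policy:", np.round(policy, 2))
```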

Presentation Attack Detection for Iris Recognition: An Assessment of the State of the Art

Title Presentation Attack Detection for Iris Recognition: An Assessment of the State of the Art
Authors Adam Czajka, Kevin W. Bowyer
Abstract Iris recognition is increasingly used in large-scale applications. As a result, presentation attack detection for iris recognition takes on fundamental importance. This survey covers the diverse research literature on this topic. Different categories of presentation attack are described and placed in an application-relevant framework, and the state of the art in detecting each category of attack is summarized. One conclusion from this is that presentation attack detection for iris recognition is not yet a solved problem. Datasets available for research are described, research directions for the near- and medium-term future are outlined, and a short list of recommended readings is suggested.
Tasks Iris Recognition
Published 2018-03-31
URL http://arxiv.org/abs/1804.00194v3
PDF http://arxiv.org/pdf/1804.00194v3.pdf
PWC https://paperswithcode.com/paper/presentation-attack-detection-for-iris
Repo
Framework

Learning Selfie-Friendly Abstraction from Artistic Style Images

Title Learning Selfie-Friendly Abstraction from Artistic Style Images
Authors Yicun Liu, Jimmy Ren, Jianbo Liu, Jiawei Zhang, Xiaohao Chen
Abstract Artistic style transfer can be thought of as a process that generates different versions of abstraction of the original image. However, most artistic style transfer operators are not optimized for human faces and thus suffer from two undesirable artifacts when applied to selfies. First, the edges of human faces may unpleasantly deviate from the ones in the original image. Second, the skin color is far from faithful to the original one, which is usually problematic in producing quality selfies. In this paper, we take a different approach and formulate this abstraction process as a gradient-domain learning problem. We aim to learn a type of abstraction which not only achieves the specified artistic style but also circumvents the two aforementioned drawbacks, making it highly applicable to selfie photography. We also show that our method can be directly generalized to videos with high inter-frame consistency. Our method is also robust to non-selfie images, and generalization to various kinds of real-life scenes is discussed. We will make our code publicly available.
Tasks Style Transfer
Published 2018-05-05
URL http://arxiv.org/abs/1805.02085v2
PDF http://arxiv.org/pdf/1805.02085v2.pdf
PWC https://paperswithcode.com/paper/learning-selfie-friendly-abstraction-from
Repo
Framework
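
The gradient-domain formulation mentioned above can be sketched by training against image gradients rather than raw stylised pixels, which encourages the network to preserve facial edges. The tiny network, target, and L1 gradient loss below are placeholder assumptions, not the paper's architecture.

```python
# Gradient-domain training sketch: match image gradients of the prediction
# to those of a stylised target instead of matching pixels directly.
import torch
import torch.nn.functional as F

def image_gradients(img):                       # img: (batch, channels, H, W)
    dx = img[:, :, :, 1:] - img[:, :, :, :-1]
    dy = img[:, :, 1:, :] - img[:, :, :-1, :]
    return dx, dy

selfie = torch.rand(1, 3, 64, 64)
stylised_target = torch.rand(1, 3, 64, 64)      # output of an artistic-style operator

net = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)   # stand-in for the real model
pred = net(selfie)

dx_p, dy_p = image_gradients(pred)
dx_t, dy_t = image_gradients(stylised_target)
loss = F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t)    # gradient-domain loss
loss.backward()
```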