January 28, 2020


Paper Group ANR 935

Structural Recurrent Neural Network for Traffic Speed Prediction. Robust Monocular Edge Visual Odometry through Coarse-to-Fine Data Association. Zero-Shot Task Transfer. Data Efficient Voice Cloning for Neural Singing Synthesis. Faster Neural Network Training with Data Echoing. Neural Network based End-to-End Query by Example Spoken Term Detection. …

Structural Recurrent Neural Network for Traffic Speed Prediction


Title	Structural Recurrent Neural Network for Traffic Speed Prediction
Authors	Youngjoo Kim, Peng Wang, Lyudmila Mihaylova
Abstract	Deep neural networks have recently demonstrated traffic prediction capability using time series data obtained from sensors mounted on road segments. However, capturing the spatio-temporal features of traffic data often requires training a large number of parameters, which increases the computational burden. In this work, we demonstrate that embedding topological information about the road network improves the learning of traffic features. We use a graph of a vehicular road network with recurrent neural networks (RNNs) to infer the interaction between adjacent road segments as well as the temporal dynamics. The topology of the road network is converted into a spatio-temporal graph to form a structural RNN (SRNN). The proposed approach is validated on traffic speed data from the road network of the city of Santander in Spain. The experiment shows that the graph-based method outperforms state-of-the-art methods based on spatio-temporal images while requiring far fewer parameters to train.
Tasks	Time Series, Traffic Prediction
Published	2019-02-18
URL	http://arxiv.org/abs/1902.06506v1
PDF	http://arxiv.org/pdf/1902.06506v1.pdf
PWC	https://paperswithcode.com/paper/structural-recurrent-neural-network-for
Repo
Framework
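To make the "road topology to spatio-temporal graph" step concrete, here is a minimal sketch. It assumes the road network is given as a hypothetical adjacency dict of segment IDs and that each segment has a list of speed readings per time step. Each segment gets a temporal edge to itself (speed at t-1) and spatial edges to its neighbours (here summarised by their mean speed at t). The functions `spatial_edges` and `node_inputs` and the mean aggregation are illustrative assumptions, not the paper's SRNN architecture, which attaches RNNs to nodes and edges.

```python
from typing import Dict, List, Tuple

def spatial_edges(adjacency: Dict[int, List[int]]) -> List[Tuple[int, int]]:
    """Undirected spatial edges between adjacent road segments, each listed once."""
    return sorted({(min(u, v), max(u, v))
                   for u, nbrs in adjacency.items() for v in nbrs if u != v})

def node_inputs(adjacency: Dict[int, List[int]],
                speeds: Dict[int, List[float]],
                t: int) -> Dict[int, List[float]]:
    """Per-segment input at time t (t >= 1):
    [own speed at t, own speed at t-1 (temporal edge), mean neighbour speed at t (spatial edges)].
    Mean aggregation is a hypothetical choice for illustration."""
    if t < 1:
        raise ValueError("t must be >= 1 so that a temporal edge exists")
    inputs = {}
    for seg, nbrs in adjacency.items():
        series = speeds[seg]
        nbr_mean = sum(speeds[n][t] for n in nbrs) / len(nbrs) if nbrs else series[t]
        inputs[seg] = [series[t], series[t - 1], nbr_mean]
    return inputs
```

For a three-segment road `{0: [1], 1: [0, 2], 2: [1]}`, the middle segment's spatial feature averages its two neighbours, while the end segments see only one neighbour each.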

Robust Monocular Edge Visual Odometry through Coarse-to-Fine Data Association


Title	Robust Monocular Edge Visual Odometry through Coarse-to-Fine Data Association
Authors	Xiaolong Wu, Patricio Vela, Cedric Pradalier
Abstract	In this work, we propose a monocular visual odometry framework that exploits the best attributes of edge features for illumination-robust camera tracking while mitigating the performance degradation of edge mapping. In the front end, an ICP-based edge registration provides robust motion estimation and coarse data association under lighting changes. In the back end, a novel edge-guided data association pipeline searches for the best photometrically matched points along geometrically possible edges through template matching, so that the matches can be further refined in later bundle adjustment. The core of our data association strategy is a point-to-edge geometric uncertainty analysis, which analytically derives (1) a probabilistic search-length formula that significantly reduces the search space and speeds up the system, and (2) a geometric confidence metric for detecting mapping degradation based on the predicted depth uncertainty. Moreover, a patch-size adaptation strategy based on match confidence is integrated into the pipeline to reduce matching ambiguity. We present extensive analysis and evaluation of the proposed system on synthetic and real-world benchmark datasets under illumination changes and large camera motions, where it outperforms current state-of-the-art algorithms.
Tasks	Monocular Visual Odometry, Motion Estimation, Visual Odometry
Published	2019-09-25
URL	https://arxiv.org/abs/1909.11362v2
PDF	https://arxiv.org/pdf/1909.11362v2.pdf
PWC	https://paperswithcode.com/paper/robust-semi-direct-monocular-visual-odometry
Repo
Framework
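The back-end step of "searching for the best photometrically matched point along an edge through template matching" can be sketched as follows. This assumes grayscale images stored as nested lists, a candidate list of pixel positions already restricted to the edge (the paper's probabilistic search-length formula is not reproduced here), and a sum-of-squared-differences (SSD) patch cost. SSD and the helper names are assumptions; the paper's actual photometric cost and adaptive patch size may differ.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]

def patch(img: Image, r: int, c: int, half: int) -> List[float]:
    """Flatten the (2*half+1)^2 window centred at (r, c); caller keeps it in bounds."""
    return [img[i][j]
            for i in range(r - half, r + half + 1)
            for j in range(c - half, c + half + 1)]

def ssd(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared intensity differences between two flattened patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_match_along_edge(ref: Image, ref_px: Tuple[int, int], cur: Image,
                          candidates: List[Tuple[int, int]],
                          half: int = 1) -> Tuple[int, int]:
    """Return the candidate pixel in `cur` whose patch best matches the reference patch."""
    template = patch(ref, ref_px[0], ref_px[1], half)
    return min(candidates, key=lambda rc: ssd(template, patch(cur, rc[0], rc[1], half)))
```

Shrinking `candidates` is where a search-length bound pays off: the cost of this loop grows linearly with the number of positions tested along the edge.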

Zero-Shot Task Transfer


Title	Zero-Shot Task Transfer
Authors	Arghya Pal, Vineeth N Balasubramanian
Abstract	In this work, we present TTNet, a novel meta-learning algorithm that regresses model parameters for novel tasks for which no ground truth is available (zero-shot tasks). To adapt to novel zero-shot tasks, our meta-learner learns from the model parameters of known tasks (with ground truth) and the correlation of known tasks to zero-shot tasks. This intuition is grounded in cognitive science, where a subject (a human baby) can adapt to a novel concept (depth understanding) by correlating it with old concepts (hand movement or self-motion), without receiving explicit supervision. We evaluated our model on the Taskonomy dataset, with four tasks as zero-shot: surface normal, room layout, depth, and camera pose estimation. These tasks were chosen based on the complexity of data acquisition and the complexity of learning them with a deep network. Our proposed methodology outperforms state-of-the-art models (which use ground truth) on each of our zero-shot tasks, showing promise for zero-shot task transfer. We also conducted extensive experiments to study the various choices in our methodology, and showed how the proposed method can also be used in transfer learning. To the best of our knowledge, this is the first such effort on zero-shot learning in the task space.
Tasks	Meta-Learning, Pose Estimation, Transfer Learning, Zero-Shot Learning
Published	2019-03-04
URL	http://arxiv.org/abs/1903.01092v1
PDF	http://arxiv.org/pdf/1903.01092v1.pdf
PWC	https://paperswithcode.com/paper/zero-shot-task-transfer
Repo
Framework

Data Efficient Voice Cloning for Neural Singing Synthesis


Title	Data Efficient Voice Cloning for Neural Singing Synthesis
Authors	Merlijn Blaauw, Jordi Bonada, Ryunosuke Daido
Abstract	There are many use cases in singing synthesis where creating voices from small amounts of data is desirable. In text-to-speech there have been several promising results that apply voice cloning techniques to modern deep learning based models. In this work, we adapt one such technique to the case of singing synthesis. By leveraging data from many speakers to first create a multispeaker model, small amounts of target data can then efficiently adapt the model to new unseen voices. We evaluate the system using listening tests across a number of different use cases, languages and kinds of data.
Tasks
Published	2019-02-19
URL	http://arxiv.org/abs/1902.07292v1
PDF	http://arxiv.org/pdf/1902.07292v1.pdf
PWC	https://paperswithcode.com/paper/data-efficient-voice-cloning-for-neural
Repo
Framework

Faster Neural Network Training with Data Echoing


Title	Faster Neural Network Training with Data Echoing
Authors	Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl
Abstract	In the twilight of Moore’s law, GPUs and other specialized hardware accelerators have dramatically sped up neural network training. However, earlier stages of the training pipeline, such as disk I/O and data preprocessing, do not run on accelerators. As accelerators continue to improve, these earlier stages will increasingly become the bottleneck. In this paper, we introduce “data echoing,” which reduces the total computation used by earlier pipeline stages and speeds up training whenever computation upstream from accelerators dominates the training time. Data echoing reuses (or “echoes”) intermediate outputs from earlier pipeline stages in order to reclaim idle capacity. We investigate the behavior of different data echoing algorithms on various workloads, for various amounts of echoing, and for various batch sizes. We find that in all settings, at least one data echoing algorithm can match the baseline’s predictive performance using less upstream computation. We measured a 3.25-fold decrease in wall-clock time for ResNet-50 on ImageNet when reading training data over a network.
Tasks
Published	2019-07-12
URL	https://arxiv.org/abs/1907.05550v2
PDF	https://arxiv.org/pdf/1907.05550v2.pdf
PWC	https://paperswithcode.com/paper/faster-neural-network-training-with-data
Repo
Framework
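The echoing idea can be sketched with plain Python generators. This is a minimal illustration, assuming the expensive upstream stage (I/O, decoding, augmentation) is any iterable and that echoes are followed by a fixed-size shuffle buffer so repeated items are not always adjacent. The names `echo` and `shuffled` are hypothetical; a real pipeline would insert the equivalent operation into its input framework after the slow stage.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def echo(upstream: Iterable[T], echo_factor: int) -> Iterator[T]:
    """Yield each upstream item `echo_factor` times, so the upstream stage
    only has to produce 1/echo_factor as many items."""
    if echo_factor < 1:
        raise ValueError("echo_factor must be >= 1")
    for item in upstream:
        for _ in range(echo_factor):
            yield item

def shuffled(stream: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Approximate shuffle using a fixed-size buffer (spreads echoed copies apart)."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in stream:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end and emit it.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    rng.shuffle(buffer)
    yield from buffer
```

With `echo_factor=2`, the accelerator sees the same number of training examples while the upstream stage does half the work; whether that preserves predictive performance is exactly what the paper measures.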

Neural Network based End-to-End Query by Example Spoken Term Detection


Title	Neural Network based End-to-End Query by Example Spoken Term Detection
Authors	Dhananjay Ram, Lesly Miculicich, Hervé Bourlard
Abstract	This paper focuses on the problem of query by example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art approaches primarily rely on dynamic time warping (DTW) based template matching using phone posterior or bottleneck features extracted from a deep neural network (DNN). We use both monolingual and multilingual bottleneck features, and show that multilingual features perform increasingly better with more training languages. Previously, it has been shown that DTW-based matching can be replaced with CNN-based matching when using posterior features. Here, we show that CNN-based matching also outperforms DTW-based matching with bottleneck features. In this case, the feature extraction and pattern matching stages of our QbE-STD system are optimized independently of each other. We propose to integrate these two stages in a fully neural-network-based end-to-end learning framework so that both stages are optimized jointly. The proposed approaches are evaluated on two challenging multilingual datasets, Spoken Web Search 2013 and Query by Example Search on Speech Task 2014, demonstrating significant improvements in each case.
Tasks
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08332v1
PDF	https://arxiv.org/pdf/1911.08332v1.pdf
PWC	https://paperswithcode.com/paper/neural-network-based-end-to-end-query-by
Repo
Framework
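For reference, the DTW-based baseline that the CNN matcher replaces can be sketched as subsequence DTW: the spoken query may start and end anywhere inside the test utterance. This sketch assumes frame-level feature vectors (e.g. posteriors or bottleneck features) as lists of floats and uses cosine distance as the local frame cost; the distance choice and the function names are assumptions, not the paper's exact configuration.

```python
import math
from typing import List, Sequence

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two non-zero feature frames."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def subsequence_dtw(query: List[Sequence[float]], test: List[Sequence[float]]) -> float:
    """Minimum alignment cost of the whole query against any region of `test`."""
    m, n = len(query), len(test)
    cost = [[float("inf")] * n for _ in range(m)]
    for j in range(n):
        cost[0][j] = cosine_distance(query[0], test[j])  # free start anywhere in test
    for i in range(1, m):
        for j in range(n):
            best_prev = cost[i - 1][j]  # vertical step
            if j > 0:
                best_prev = min(best_prev, cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = cosine_distance(query[i], test[j]) + best_prev
    return min(cost[m - 1])  # free end anywhere in test
```

A detection decision would then threshold this score (lower means more likely to contain the query); the paper instead learns the matching with a CNN and, in the end-to-end variant, trains it jointly with the feature extractor.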

Robust Risk Minimization for Statistical Learning


Title	Robust Risk Minimization for Statistical Learning
Authors	Muhammad Osama, Dave Zachariah, Peter Stoica
Abstract	We consider a general statistical learning problem where an unknown fraction of the training data is corrupted. We develop a robust learning method that only requires specifying an upper bound on the corrupted data fraction. The method minimizes a risk function defined by a non-parametric distribution with unknown probability weights. We derive and analyse the optimal weights and show how they provide robustness against corrupted data. Furthermore, we give a computationally efficient coordinate descent algorithm to solve the risk minimization problem. We demonstrate the broad applicability of the method, including regression, classification, unsupervised learning and classic parameter estimation, with state-of-the-art performance.
Tasks
Published	2019-10-03
URL	https://arxiv.org/abs/1910.01544v2
PDF	https://arxiv.org/pdf/1910.01544v2.pdf
PWC	https://paperswithcode.com/paper/robust-risk-minimization-for-statistical
Repo
Framework
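To illustrate what "only an upper bound ε on the corrupted fraction" buys, here is a deliberately simpler, swapped-in technique: iteratively trimmed least squares for a location parameter. Points are given weight 1 or 0, and the ⌊εn⌋ points with the largest squared residuals get weight 0. This is NOT the paper's non-parametric weighting or its coordinate descent algorithm; the function name and the fixed iteration count are assumptions for the sketch.

```python
import statistics
from typing import List

def trimmed_location(data: List[float], eps: float, iters: int = 20) -> float:
    """Estimate a location parameter assuming at most a fraction `eps` of points are corrupted."""
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps must be in [0, 0.5)")
    keep = len(data) - int(eps * len(data))
    theta = statistics.median(data)  # robust starting point
    for _ in range(iters):
        # Weight 1 for the `keep` smallest squared residuals, weight 0 for the rest.
        kept = sorted(data, key=lambda x: (x - theta) ** 2)[:keep]
        theta = sum(kept) / keep
    return theta
```

With `eps=0` this reduces to the ordinary sample mean, which a single gross outlier can drag arbitrarily far; a positive bound lets the estimator discard it.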

NeuronInspect: Detecting Backdoors in Neural Networks via Output Explanations


Title	NeuronInspect: Detecting Backdoors in Neural Networks via Output Explanations
Authors	Xijie Huang, Moustafa Alzantot, Mani Srivastava
Abstract	Deep neural networks have achieved state-of-the-art performance on various tasks. However, their lack of interpretability and transparency makes it easier for malicious attackers to inject trojan backdoors into them, causing the model to behave abnormally when a backdoor sample containing a specific trigger is input. In this paper, we propose NeuronInspect, a framework to detect trojan backdoors in deep neural networks via output explanation techniques. NeuronInspect first identifies the existence of backdoor attack targets by generating explanation heatmaps of the output layer. We observe that heatmaps generated from clean and backdoored models have different characteristics. We therefore extract features that measure three attributes of explanations from an attacked model: sparsity, smoothness and persistence. We combine these features and use outlier detection to find the outliers, which form the set of attack targets. We demonstrate the effectiveness and efficiency of NeuronInspect on the MNIST digit recognition dataset and the GTSRB traffic sign recognition dataset. We extensively evaluate NeuronInspect on different attack scenarios and show better robustness and effectiveness than the state-of-the-art trojan backdoor detection technique Neural Cleanse, by a large margin.
Tasks	Outlier Detection, Traffic Sign Recognition
Published	2019-11-18
URL	https://arxiv.org/abs/1911.07399v1
PDF	https://arxiv.org/pdf/1911.07399v1.pdf
PWC	https://paperswithcode.com/paper/neuroninspect-detecting-backdoors-in-neural
Repo
Framework
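The final "combine features, then flag outliers" step can be sketched with a median-absolute-deviation (MAD) detector over one combined score per output class. This assumes higher scores mean more backdoor-like explanations (so only the high side is flagged) and uses the common 1.4826 consistency constant with an anomaly-index threshold of 2; these choices, and the name `mad_outliers`, are assumptions rather than NeuronInspect's published settings.

```python
import statistics
from typing import Dict, List

def mad_outliers(scores: Dict[int, float], threshold: float = 2.0) -> List[int]:
    """Return class labels whose score is an upper outlier under a MAD anomaly index."""
    values = list(scores.values())
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread, nothing can be called an outlier
    scale = 1.4826 * mad  # makes MAD a consistent std estimate for Gaussian data
    return sorted(label for label, v in scores.items()
                  if v > median and (v - median) / scale > threshold)
```

Using the median and MAD rather than mean and standard deviation keeps a single attacked class from inflating the statistics it is being compared against.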

Sequential Neural Networks as Automata


Title	Sequential Neural Networks as Automata
Authors	William Merrill
Abstract	This work attempts to explain the types of computation that neural networks can perform by relating them to automata. We first define what it means for a real-time network with bounded precision to accept a language. A measure of network memory follows from this definition. We then characterize the classes of languages acceptable by various recurrent networks, attention, and convolutional networks. We find that LSTMs function like counter machines and relate convolutional networks to the subregular hierarchy. Overall, this work attempts to increase our understanding and ability to interpret neural networks through the lens of theory. These theoretical insights help explain neural computation, as well as the relationship between neural networks and natural language grammar.
Tasks
Published	2019-06-04
URL	https://arxiv.org/abs/1906.01615v2
PDF	https://arxiv.org/pdf/1906.01615v2.pdf
PWC	https://paperswithcode.com/paper/sequential-neural-networks-as-automata
Repo
Framework
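As a concrete picture of "LSTMs function like counter machines", here is a one-counter acceptor for the language aⁿbⁿ. The analogy, stated loosely: an LSTM cell with its forget gate held open can add +1 to its cell state on `a` and -1 on `b`, much like `count` below. This is a symbolic counter machine written for illustration, not an LSTM and not code from the paper.

```python
def accepts_anbn(s: str) -> bool:
    """Accept strings of the form a^n b^n (n >= 0) using a single integer counter."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False  # an 'a' after any 'b' is not allowed
            count += 1        # increment, like adding to a cell state
        elif ch == "b":
            seen_b = True
            count -= 1        # decrement
            if count < 0:
                return False  # more b's than a's so far
        else:
            return False      # symbol outside the alphabet {a, b}
    return count == 0
```

A finite-state automaton cannot recognise this language for unbounded n, which is why counting behaviour places such networks beyond the regular languages.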

Deep learning for cardiac image segmentation: A review


Title	Deep learning for cardiac image segmentation: A review
Authors	Chen Chen, Chen Qin, Huaqi Qiu, Giacomo Tarroni, Jinming Duan, Wenjia Bai, Daniel Rueckert
Abstract	Deep learning has become the most widely used approach for cardiac image segmentation in recent years. In this paper, we provide a review of over 100 cardiac image segmentation papers using deep learning, covering common imaging modalities including magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound (US), and major anatomical structures of interest (ventricles, atria and vessels). In addition, a summary of publicly available cardiac image datasets and code repositories is included to provide a basis for reproducible research. Finally, we discuss the challenges and limitations of current deep-learning-based approaches (scarcity of labels, model generalizability across different domains, interpretability) and suggest potential directions for future research.
Tasks	Computed Tomography (CT), Semantic Segmentation
Published	2019-11-09
URL	https://arxiv.org/abs/1911.03723v1
PDF	https://arxiv.org/pdf/1911.03723v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-cardiac-image-segmentation
Repo
Framework

Optimization on the Surface of the (Hyper)-Sphere


Title	Optimization on the Surface of the (Hyper)-Sphere
Authors	Parameswaran Raman, Jiasen Yang
Abstract	The Thomson problem is a classical problem in physics that studies how $n$ charged particles distribute themselves on the surface of a sphere in $k$ dimensions. When $k=2$, i.e. a circle, the particles settle at equally spaced points, a configuration that can be computed analytically. However, for higher dimensions $k \ge 3$, starting with the standard sphere ($k=3$), little is understood analytically. Finding the global minimum in these settings is particularly hard, since the optimization problem becomes increasingly computationally intensive for larger values of $k$ and $n$. In this work, we explore a wide variety of numerical optimization methods to solve the Thomson problem. In our empirical study, we find stochastic gradient-based methods (SGD) to be a compelling choice for this problem, as they scale well with the number of points.
Tasks
Published	2019-09-13
URL	https://arxiv.org/abs/1909.06463v1
PDF	https://arxiv.org/pdf/1909.06463v1.pdf
PWC	https://paperswithcode.com/paper/optimization-on-the-surface-of-the-hyper
Repo
Framework
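A minimal way to attack the Thomson problem numerically is projected gradient descent: move each point along the Coulomb repulsion from all others, then renormalise it back onto the unit sphere. The energy is $E = \sum_{i<j} 1/\lVert x_i - x_j \rVert$, and $-\partial E/\partial x_i = \sum_{j \ne i} (x_i - x_j)/\lVert x_i - x_j \rVert^3$. The sketch below uses full-batch gradients (an SGD variant would sample pairs or points per step); the step size, step count, and function names are assumptions, and `k` is the number of coordinates per point, matching the abstract's convention ($k=2$ circle, $k=3$ standard sphere).

```python
import math
import random
from typing import List

Vec = List[float]

def _normalize(v: Vec) -> Vec:
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def thomson_energy(points: List[Vec]) -> float:
    """Coulomb energy: sum over unordered pairs of 1 / distance."""
    energy = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            energy += 1.0 / math.dist(points[i], points[j])
    return energy

def thomson_pgd(n: int, k: int, lr: float = 0.02, steps: int = 5000, seed: int = 0) -> List[Vec]:
    """Projected gradient descent for n charges on the unit sphere in R^k."""
    rng = random.Random(seed)
    pts = [_normalize([rng.gauss(0.0, 1.0) for _ in range(k)]) for _ in range(n)]
    for _ in range(steps):
        forces = [[0.0] * k for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                r3 = math.dist(pts[i], pts[j]) ** 3
                for c in range(k):
                    forces[i][c] += (pts[i][c] - pts[j][c]) / r3  # equals -dE/dx_i
        # Gradient step along the repulsive force, then project back onto the sphere.
        pts = [_normalize([p + lr * f for p, f in zip(pts[i], forces[i])]) for i in range(n)]
    return pts
```

For `n=2` the points end up antipodal (energy 0.5), and for `n=4, k=3` they approach a regular tetrahedron. The double loop costs O(n²) per step, which is one reason stochastic variants become attractive as n grows.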

Automated Augmentation with Reinforcement Learning and GANs for Robust Identification of Traffic Signs using Front Camera Images


Title	Automated Augmentation with Reinforcement Learning and GANs for Robust Identification of Traffic Signs using Front Camera Images
Authors	Sohini Roy Chowdhury, Lars Tornberg, Robin Halvfordsson, Jonatan Nordh, Adam Suhren Gustafsson, Joel Wall, Mattias Westerberg, Adam Wirehed, Louis Tilloy, Zhanying Hu, Haoyuan Tan, Meng Pan, Jonas Sjoberg
Abstract	Traffic sign identification using camera images from vehicles plays a critical role in autonomous driving and path planning. However, front camera images can be distorted by blurriness, lighting variations and vandalism, which can degrade detection performance. As a solution, machine learning models must be trained with data from multiple domains, but collecting and labeling more data in each new domain is time-consuming and expensive. In this work, we present an end-to-end framework to augment traffic sign training data using optimal reinforcement learning policies and a variety of Generative Adversarial Network (GAN) models, which can then be used to train traffic sign detector modules. Our automated augmenter enables learning from transformed nighttime, poor-lighting, and variably occluded images using the LISA Traffic Sign and BDD-Nexar datasets. The proposed method maps training data from one domain to another, improving traffic sign detection precision/recall from 0.70/0.66 to 0.83/0.71 for nighttime images.
Tasks	Autonomous Driving
Published	2019-11-15
URL	https://arxiv.org/abs/1911.06486v1
PDF	https://arxiv.org/pdf/1911.06486v1.pdf
PWC	https://paperswithcode.com/paper/automated-augmentation-with-reinforcement
Repo
Framework
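The reported precision/recall pairs can be condensed into a single number with the F1 score, the harmonic mean of precision and recall. The short calculation below is our own arithmetic on the abstract's figures, not a number reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.70, 0.66), 3))  # 0.679 (before augmentation)
print(round(f1_score(0.83, 0.71), 3))  # 0.765 (after augmentation)
```

Most of the gain comes from precision; recall improves only from 0.66 to 0.71.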

On the Importance of the Kullback-Leibler Divergence Term in Variational Autoencoders for Text Generation


Title	On the Importance of the Kullback-Leibler Divergence Term in Variational Autoencoders for Text Generation
Authors	Victor Prokhorov, Ehsan Shareghi, Yingzhen Li, Mohammad Taher Pilehvar, Nigel Collier
Abstract	Variational Autoencoders (VAEs) are known to suffer from learning uninformative latent representations of the input due to issues such as collapse of the approximate posterior or entanglement of the latent space. We impose an explicit constraint on the Kullback-Leibler (KL) divergence term inside the VAE objective function. While the explicit constraint naturally avoids posterior collapse, we use it to further understand the significance of the KL term in controlling the information transmitted through the VAE channel. Within this framework, we explore different properties of the estimated posterior distribution and highlight the trade-off between the amount of information encoded in a latent code during training and the generative capacity of the model.
Tasks	Text Generation
Published	2019-09-30
URL	https://arxiv.org/abs/1909.13668v1
PDF	https://arxiv.org/pdf/1909.13668v1.pdf
PWC	https://paperswithcode.com/paper/on-the-importance-of-the-kullback-leibler
Repo
Framework
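One common way to impose an explicit KL constraint is to penalise the distance of the KL term from a target value $C$: $\mathcal{L} = \text{recon NLL} + \beta\,\lvert \mathrm{KL} - C \rvert$. The sketch below assumes that penalty form and a diagonal Gaussian posterior against a standard normal prior, for which $\mathrm{KL} = \tfrac12 \sum_d (\mu_d^2 + \sigma_d^2 - \log \sigma_d^2 - 1)$. Check the paper for its exact objective; the function names here are hypothetical.

```python
import math
from typing import Sequence

def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def constrained_vae_loss(recon_nll: float, mu: Sequence[float], log_var: Sequence[float],
                         target_kl: float, beta: float = 1.0) -> float:
    """Reconstruction NLL plus a penalty pulling the KL term toward `target_kl`."""
    return recon_nll + beta * abs(gaussian_kl(mu, log_var) - target_kl)
```

Because the penalty is zero only when KL equals $C > 0$, the optimiser cannot drive the KL to zero, which is how such a constraint sidesteps posterior collapse; sweeping $C$ then controls how much information the latent code carries.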

Scalable methods for computing state similarity in deterministic Markov Decision Processes


Title	Scalable methods for computing state similarity in deterministic Markov Decision Processes
Authors	Pablo Samuel Castro
Abstract	We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an elegant formalism that captures behavioral equivalence between states and provides strong theoretical guarantees on differences in optimal behaviour. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has thus far rendered them impractical for large problems. In this paper we present a new version of the metric that is tied to a behavior policy in an MDP, along with an analysis of its theoretical properties. We then present two new algorithms for approximating bisimulation metrics in large, deterministic MDPs. The first does so via sampling and is guaranteed to converge to the true metric. The second is a differentiable loss that allows us to learn an approximation even for continuous-state MDPs, which had not been possible prior to this work.
Tasks
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09291v1
PDF	https://arxiv.org/pdf/1911.09291v1.pdf
PWC	https://paperswithcode.com/paper/scalable-methods-for-computing-state
Repo
Framework
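For a deterministic MDP and a fixed policy π, a policy-dependent bisimulation metric can be written as the fixed point of $d(s,t) = \lvert r^\pi(s) - r^\pi(t) \rvert + \gamma\, d(s'_\pi, t'_\pi)$, where $s'_\pi$ is the successor of $s$ under π. The sketch below computes it exactly by iterating this update over all state pairs, which is the expensive tabular approach the paper's sampling and learned variants are meant to avoid. It assumes rewards and successors are already given per state under π; the function name and inputs are hypothetical.

```python
from typing import List

def on_policy_bisim(rewards: List[float], next_state: List[int],
                    gamma: float = 0.9, tol: float = 1e-8,
                    max_iters: int = 10_000) -> List[List[float]]:
    """Exact fixed-point iteration for a policy-dependent bisimulation metric
    in a deterministic MDP.
    rewards[s]    : reward for taking pi(s) in state s
    next_state[s] : deterministic successor of s under pi
    """
    n = len(rewards)
    d = [[0.0] * n for _ in range(n)]
    for _ in range(max_iters):
        new_d = [[abs(rewards[s] - rewards[t]) + gamma * d[next_state[s]][next_state[t]]
                  for t in range(n)] for s in range(n)]
        delta = max(abs(new_d[s][t] - d[s][t]) for s in range(n) for t in range(n))
        d = new_d
        if delta < tol:  # the update is a gamma-contraction, so this converges
            break
    return d
```

In a three-state example where states 0 and 1 self-loop with reward 1 and state 2 self-loops with reward 0, states 0 and 1 are at distance 0, and each is at distance $1/(1-\gamma) = 10$ from state 2 when $\gamma = 0.9$.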

On the Physical Interpretation of Proper Orthogonal Decomposition and Dynamic Mode Decomposition for Liquid Injection


Title	On the Physical Interpretation of Proper Orthogonal Decomposition and Dynamic Mode Decomposition for Liquid Injection
Authors	Scott B. Leask, Vincent G. McDonell
Abstract	The modal decomposition techniques of proper orthogonal decomposition (POD) and dynamic mode decomposition (DMD) have become common methods for analysing the spatio-temporal coherence of dynamical systems. These techniques are of particular interest for liquid injection systems, because multiphase interactions are inherently complex and extracting the underlying flow processes is desirable. Although numerous works investigating flow processes have applied POD and DMD, the results are often highly interpretive, with a limited link between the decomposition theory and the interpreted physical meaning of the extracted modes. Here, we provide insight into the interpretation of POD and DMD modes in a hierarchical structure. The interpretation of modes for simple canonical systems is validated through knowledge of the underlying processes that dominate the systems. We show that modes which capture true underlying phenomena produce subsequent modes at higher harmonics, up to the Nyquist limit, whose modal structure scales decrease in proportion to increasing modal frequency. These higher harmonics primarily encode motion information and may or may not capture additional structural information, depending on the system. We demonstrate these findings first on canonical liquid injection systems to enhance the interpretation and understanding of results extracted from practical jet-in-crossflow systems.
Tasks
Published	2019-09-17
URL	https://arxiv.org/abs/1909.07576v1
PDF	https://arxiv.org/pdf/1909.07576v1.pdf
PWC	https://paperswithcode.com/paper/on-the-physical-interpretation-of-proper
Repo
Framework
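For readers new to POD, the leading mode is the dominant left singular vector of the mean-subtracted snapshot matrix. In practice one calls an SVD routine such as `numpy.linalg.svd`; the dependency-free sketch below instead uses power iteration on $XX^\top$ without forming the covariance explicitly. It assumes each snapshot is a list of field values at the same spatial points; the function name and iteration count are assumptions.

```python
import math
from typing import List

def leading_pod_mode(snapshots: List[List[float]], iters: int = 200) -> List[float]:
    """Leading spatial POD mode of mean-subtracted snapshots via power iteration."""
    n_t, m = len(snapshots), len(snapshots[0])
    mean = [sum(s[i] for s in snapshots) / n_t for i in range(m)]
    fluct = [[s[i] - mean[i] for i in range(m)] for s in snapshots]
    # Start from the most energetic fluctuation snapshot (it lies in the data span).
    v = max(fluct, key=lambda x: sum(c * c for c in x))
    for _ in range(iters):
        # w = X X^T v, computed as sum_t x_t (x_t . v).
        w = [0.0] * m
        for x in fluct:
            proj = sum(a * b for a, b in zip(x, v))
            for i in range(m):
                w[i] += proj * x[i]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("snapshots have no fluctuating energy")
        v = [c / norm for c in w]
    # POD modes are defined up to sign; make the largest-magnitude entry positive.
    pivot = max(v, key=abs)
    return [c if pivot > 0 else -c for c in v]
```

Each mode's interpretation still has to come from physics, which is the paper's point: the decomposition returns orthogonal structures ranked by energy, not labelled phenomena.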