January 26, 2020

2944 words 14 mins read

Paper Group ANR 1576

A Selective Overview of Deep Learning. Hierarchical Reinforcement Learning for Quadruped Locomotion. Self-Organization and Artificial Life. Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition. Semi-supervised voice conversion with amortized variational inference. Investigation of Error Simulation T …

A Selective Overview of Deep Learning

Title A Selective Overview of Deep Learning
Authors Jianqing Fan, Cong Ma, Yiqiao Zhong
Abstract Deep learning has arguably achieved tremendous success in recent years. In simple words, deep learning uses the composition of many nonlinear functions to model the complex dependency between input features and labels. While neural networks have a long history, recent advances have greatly improved their performance in computer vision, natural language processing, etc. From the statistical and scientific perspective, it is natural to ask: What is deep learning? What are the new characteristics of deep learning, compared with classical methods? What are the theoretical foundations of deep learning? To answer these questions, we introduce common neural network models (e.g., convolutional neural nets, recurrent neural nets, generative adversarial nets) and training techniques (e.g., stochastic gradient descent, dropout, batch normalization) from a statistical point of view. Along the way, we highlight new characteristics of deep learning (including depth and over-parametrization) and explain their practical and theoretical benefits. We also sample recent results on theories of deep learning, many of which are only suggestive. While a complete understanding of deep learning remains elusive, we hope that our perspectives and discussions serve as a stimulus for new statistical research.
Tasks
Published 2019-04-10
URL http://arxiv.org/abs/1904.05526v2
PDF http://arxiv.org/pdf/1904.05526v2.pdf
PWC https://paperswithcode.com/paper/a-selective-overview-of-deep-learning
Repo
Framework
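The overview above surveys training techniques such as stochastic gradient descent, dropout, and batch normalization. As a minimal NumPy sketch of those three ingredients (illustrative only; the shapes, rates, and constants below are assumptions, not taken from the paper):

```python
import numpy as np

def sgd_step(params, grads, lr=0.1):
    """One stochastic gradient descent update: theta <- theta - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]

def inverted_dropout(h, keep_prob=0.8, rng=np.random.default_rng(0)):
    """Randomly zero activations and rescale so the expected value is unchanged."""
    mask = (rng.random(h.shape) < keep_prob) / keep_prob
    return h * mask

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mu, var = x.mean(axis=0), x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```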

Hierarchical Reinforcement Learning for Quadruped Locomotion

Title Hierarchical Reinforcement Learning for Quadruped Locomotion
Authors Deepali Jain, Atil Iscen, Ken Caluwaerts
Abstract Legged locomotion is a challenging task for learning algorithms, especially when the task requires a diverse set of primitive behaviors. To solve these problems, we introduce a hierarchical framework to automatically decompose complex locomotion tasks. A high-level policy issues commands in a latent space and also selects for how long the low-level policy will execute the latent command. Concurrently, the low-level policy uses the latent command and only the robot’s on-board sensors to control the robot’s actuators. Our approach allows the high-level policy to run at a lower frequency than the low-level one. We test our framework on a path-following task for a dynamic quadruped robot and we show that steering behaviors automatically emerge in the latent command space as low-level skills are needed for this task. We then show efficient adaptation of the trained policy to a different task by transfer of the trained low-level policy. Finally, we validate the policies on a real quadruped robot. To the best of our knowledge, this is the first application of end-to-end hierarchical learning to a real robotic locomotion task.
Tasks Hierarchical Reinforcement Learning
Published 2019-05-22
URL https://arxiv.org/abs/1905.08926v1
PDF https://arxiv.org/pdf/1905.08926v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-reinforcement-learning-for
Repo
Framework
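A minimal sketch of the two-level control loop described in the abstract, assuming hypothetical `high_policy`, `low_policy`, and `env` objects with the interfaces used below; it is not the authors' implementation:

```python
import numpy as np

def run_hierarchy(high_policy, low_policy, env, episode_steps=1000):
    """Illustrative control loop: the high-level policy picks a latent command and a
    duration at low frequency; the low-level policy tracks the command every step
    using only on-board sensor readings."""
    obs = env.reset()
    t = 0
    while t < episode_steps:
        latent, duration = high_policy(obs)        # low-frequency decision
        for _ in range(duration):                  # low-level runs every control step
            proprio = env.onboard_sensors()        # no external state
            action = low_policy(np.concatenate([latent, proprio]))
            obs, done = env.step(action)
            t += 1
            if done or t >= episode_steps:
                return obs
    return obs
```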

Self-Organization and Artificial Life

Title Self-Organization and Artificial Life
Authors Carlos Gershenson, Vito Trianni, Justin Werfel, Hiroki Sayama
Abstract Self-organization can be broadly defined as the ability of a system to display ordered spatio-temporal patterns solely as the result of the interactions among the system components. Processes of this kind characterize both living and artificial systems, making self-organization a concept that is at the basis of several disciplines, from physics to biology to engineering. Placed at the frontiers between disciplines, Artificial Life (ALife) has heavily borrowed concepts and tools from the study of self-organization, providing mechanistic interpretations of life-like phenomena as well as useful constructivist approaches to artificial system design. Despite its broad usage within ALife, the concept of self-organization has been often excessively stretched or misinterpreted, calling for a clarification that could help with tracing the borders between what can and cannot be considered self-organization. In this review, we discuss the fundamental aspects of self-organization and list the main usages within three primary ALife domains, namely “soft” (mathematical/computational modeling), “hard” (physical robots), and “wet” (chemical/biological systems) ALife. Finally, we discuss the usefulness of self-organization within ALife studies, point to perspectives for future research, and list open questions.
Tasks Artificial Life
Published 2019-03-14
URL http://arxiv.org/abs/1903.07456v1
PDF http://arxiv.org/pdf/1903.07456v1.pdf
PWC https://paperswithcode.com/paper/self-organization-and-artificial-life
Repo
Framework
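Self-organization as defined above, order arising purely from local interactions, can be illustrated with a toy one-dimensional majority-rule automaton (a generic example, not taken from the review):

```python
import numpy as np

def majority_rule_step(state):
    """Each cell copies the majority of itself and its two neighbours (periodic
    boundary). Ordered domains emerge from purely local interactions."""
    left, right = np.roll(state, 1), np.roll(state, -1)
    return ((left + state + right) >= 2).astype(int)

rng = np.random.default_rng(0)
state = rng.integers(0, 2, size=40)
for _ in range(10):
    state = majority_rule_step(state)
print("".join("#" if c else "." for c in state))  # contiguous ordered domains
```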

Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition

Title Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition
Authors Zheng Lian, Jianhua Tao, Bin Liu, Jian Huang
Abstract Prior works on speech emotion recognition utilize various unsupervised learning approaches to deal with low-resource samples. However, these methods pay less attention to modeling long-term dynamic dependencies, which are important for speech emotion recognition. To address this problem, this paper combines an unsupervised representation learning strategy, Future Observation Prediction (FOP), with transfer learning approaches (such as Fine-tuning and Hypercolumns). To verify the effectiveness of the proposed method, we conduct experiments on the IEMOCAP database. Experimental results demonstrate that our method is superior to current state-of-the-art unsupervised learning strategies.
Tasks Emotion Recognition, Representation Learning, Speech Emotion Recognition, Transfer Learning, Unsupervised Representation Learning
Published 2019-10-24
URL https://arxiv.org/abs/1910.13806v1
PDF https://arxiv.org/pdf/1910.13806v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-representation-learning-with-2
Repo
Framework
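A minimal PyTorch sketch of the future-observation-prediction idea: encode a window of past acoustic frames and regress a frame several steps ahead. The feature dimension, hidden size, and the use of a GRU encoder are assumptions for illustration, not the paper's configuration:

```python
import torch
import torch.nn as nn

class FOPPretrainer(nn.Module):
    """Encode past frames and predict a future frame; the regression loss provides
    an unsupervised pretraining signal."""
    def __init__(self, feat_dim=40, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.predictor = nn.Linear(hidden, feat_dim)

    def forward(self, past_frames):          # (batch, time, feat_dim)
        _, h = self.encoder(past_frames)     # final hidden state summarizes the past
        return self.predictor(h[-1])         # predicted future frame

model = FOPPretrainer()
past = torch.randn(8, 50, 40)                # 50 past frames per utterance
future = torch.randn(8, 40)                  # frame k steps ahead (placeholder data)
loss = nn.functional.mse_loss(model(past), future)
loss.backward()
```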

Semi-supervised voice conversion with amortized variational inference

Title Semi-supervised voice conversion with amortized variational inference
Authors Cory Stephenson, Gokce Keskin, Anil Thomas, Oguz H. Elibol
Abstract In this work we introduce a semi-supervised approach to the voice conversion problem, in which speech from a source speaker is converted into speech of a target speaker. The proposed method makes use of both parallel and non-parallel utterances from the source and target simultaneously during training. This approach can be used to extend existing parallel data voice conversion systems such that they can be trained with semi-supervision. We show that incorporating semi-supervision improves the voice conversion performance compared to fully supervised training when the number of parallel utterances is limited, as in many practical applications. Additionally, we find that increasing the number of non-parallel utterances used in training continues to improve performance when the amount of parallel training data is held constant.
Tasks Voice Conversion
Published 2019-09-30
URL https://arxiv.org/abs/1910.00067v1
PDF https://arxiv.org/pdf/1910.00067v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-voice-conversion-with
Repo
Framework
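A sketch of how such a semi-supervised objective can be assembled: a supervised conversion loss on parallel pairs plus a VAE-style term on non-parallel speech. The `model` interface (`convert`, `encode`, `decode`), the L1 losses, and the `beta` weight are assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def semi_supervised_vc_loss(model, parallel_batch, nonparallel_batch, beta=1.0):
    """Combine a supervised loss on (source, target) pairs with an amortized
    variational (reconstruction + KL) loss on unpaired utterances."""
    src, tgt = parallel_batch
    sup_loss = F.l1_loss(model.convert(src), tgt)

    x = nonparallel_batch
    mu, logvar = model.encode(x)
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
    recon = model.decode(z)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    unsup_loss = F.l1_loss(recon, x) + beta * kl

    return sup_loss + unsup_loss
```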

Investigation of Error Simulation Techniques for Learning Dialog Policies for Conversational Error Recovery

Title Investigation of Error Simulation Techniques for Learning Dialog Policies for Conversational Error Recovery
Authors Maryam Fazel-Zarandi, Longshaokan Wang, Aditya Tiwari, Spyros Matsoukas
Abstract Training dialog policies for speech-based virtual assistants requires a plethora of conversational data. The data collection phase is often expensive and time-consuming due to human involvement. To address this issue, a common solution is to build user simulators for data generation. For the successful deployment of the trained policies into real-world domains, it is vital that the user simulator mimics realistic conditions. In particular, speech-based assistants are heavily affected by automatic speech recognition and language understanding errors, hence the user simulator should be able to simulate similar errors. In this paper, we review the existing error simulation methods that induce errors at the audio, phoneme, text, or semantic level, and conduct detailed comparisons between the audio-level and text-level methods. In the process, we improve the existing text-level method by introducing confidence score prediction and out-of-vocabulary word mapping. We also explore the impact of audio-level and text-level methods on learning a simple clarification dialog policy to recover from errors, to provide insight into future improvements for both approaches.
Tasks Speech Recognition
Published 2019-11-08
URL https://arxiv.org/abs/1911.03378v1
PDF https://arxiv.org/pdf/1911.03378v1.pdf
PWC https://paperswithcode.com/paper/investigation-of-error-simulation-techniques
Repo
Framework
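A toy illustration of text-level error simulation with a confidence score and out-of-vocabulary handling. The confusion table, substitution rate, and confidence decay below are invented for the example; the paper learns these components rather than hard-coding them:

```python
import random

CONFUSIONS = {"play": ["pray", "clay"], "weather": ["whether"]}   # toy confusion table

def simulate_asr_errors(tokens, error_rate=0.15, vocab=None, rng=random.Random(0)):
    """Substitute words using a confusion table, map out-of-vocabulary words to a
    placeholder, and emit a per-utterance confidence that drops with each error."""
    out, confidence = [], 1.0
    for tok in tokens:
        if vocab is not None and tok not in vocab:
            out.append("<unk>")
            confidence *= 0.8
        elif tok in CONFUSIONS and rng.random() < error_rate:
            out.append(rng.choice(CONFUSIONS[tok]))
            confidence *= 0.7
        else:
            out.append(tok)
    return out, confidence

print(simulate_asr_errors("play the weather forecast".split(),
                          vocab={"play", "the", "weather"}))
```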

Deep Learning Models to Predict Pediatric Asthma Emergency Department Visits

Title Deep Learning Models to Predict Pediatric Asthma Emergency Department Visits
Authors Xiao Wang, Zhijie Wang, Yolande M. Pengetnze, Barry S. Lachman, Vikas Chowdhry
Abstract Pediatric asthma is the most prevalent chronic childhood illness, afflicting about 6.2 million children in the United States. However, asthma could be better managed by identifying and avoiding triggers and through education about medications and proper disease management strategies. This research utilizes deep learning methodologies to predict asthma-related emergency department (ED) visits within 3 months using Medicaid claims data. We compare prediction results against a traditional statistical classification model, penalized Lasso logistic regression, which we trained and have deployed since 2015. The results indicate that the deep learning model, an artificial neural network (ANN), slightly outperforms (AUC = 0.845) the Lasso logistic regression (AUC = 0.842). This may be attributable to the nonlinear nature of the ANN.
Tasks
Published 2019-07-25
URL https://arxiv.org/abs/1907.11195v1
PDF https://arxiv.org/pdf/1907.11195v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-models-to-predict-pediatric
Repo
Framework
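A hedged sketch of this kind of model comparison on synthetic data: an L1-penalized logistic regression versus a small neural network, evaluated by ROC AUC with scikit-learn. The synthetic features, class imbalance, and hyperparameters are placeholders; the study itself uses Medicaid claims data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for claims-derived features with an imbalanced outcome.
X, y = make_classification(n_samples=5000, n_features=50, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_tr, y_tr)
ann = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0).fit(X_tr, y_tr)

for name, clf in [("Lasso LR", lasso), ("ANN", ann)]:
    print(name, roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```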

DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling

Title DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
Authors Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi
Abstract For sequence models with large vocabularies, a majority of network parameters lie in the input and output layers. In this work, we describe a new method, DeFINE, for learning deep token representations efficiently. Our architecture uses a hierarchical structure with novel skip-connections which allows for the use of low dimensional input and output layers, reducing total parameters and training time while delivering similar or better performance versus existing methods. DeFINE can be incorporated easily in new or existing sequence models. Compared to state-of-the-art methods including adaptive input representations, this technique results in a 6% to 20% drop in perplexity. On WikiText-103, DeFINE reduces the total parameters of Transformer-XL by half with minimal impact on performance. On the Penn Treebank, DeFINE improves AWD-LSTM by 4 points with a 17% reduction in parameters, achieving comparable performance to state-of-the-art methods with fewer parameters. For machine translation, DeFINE improves the efficiency of the Transformer model by about 1.4 times while delivering similar performance.
Tasks Machine Translation, Word Embeddings
Published 2019-11-27
URL https://arxiv.org/abs/1911.12385v2
PDF https://arxiv.org/pdf/1911.12385v2.pdf
PWC https://paperswithcode.com/paper/define-deep-factorized-input-word-embeddings-1
Repo
Framework
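A rough PyTorch sketch of the factorized-embedding idea: a low-dimensional lookup table expanded to the model dimension through a small stack of layers with skip connections. The dimensions, depth, and the zero-padded skip path are assumptions for illustration, not DeFINE's exact architecture:

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Look tokens up in a low-dimensional table, then grow to the model dimension
    hierarchically, carrying the input forward at each stage via a skip connection."""
    def __init__(self, vocab_size, low_dim=64, model_dim=512, depth=3):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, low_dim)
        dims = [low_dim] + [low_dim * 2 ** i for i in range(1, depth)] + [model_dim]
        self.layers = nn.ModuleList(nn.Linear(dims[i], dims[i + 1]) for i in range(depth))

    def forward(self, token_ids):
        x = self.lookup(token_ids)
        for layer in self.layers:
            h = torch.relu(layer(x))
            # skip connection: pad the narrower input to the new width and add it
            x = h + nn.functional.pad(x, (0, h.size(-1) - x.size(-1)))
        return x

emb = FactorizedEmbedding(vocab_size=10000)
print(emb(torch.randint(0, 10000, (2, 7))).shape)   # torch.Size([2, 7, 512])
```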

Essentia: Mining Domain-Specific Paraphrases with Word-Alignment Graphs

Title Essentia: Mining Domain-Specific Paraphrases with Word-Alignment Graphs
Authors Danni Ma, Chen Chen, Behzad Golshan, Wang-Chiew Tan
Abstract Paraphrases are important linguistic resources for a wide variety of NLP applications. Many techniques for automatic paraphrase mining from general corpora have been proposed. While these techniques are successful at discovering generic paraphrases, they often fail to identify domain-specific paraphrases (e.g., {staff, concierge} in the hospitality domain). This is because current techniques are often based on statistical methods, while domain-specific corpora are too small to fit statistical methods. In this paper, we present an unsupervised graph-based technique to mine paraphrases from a small set of sentences that roughly share the same topic or intent. Our system, Essentia, relies on word-alignment techniques to create a word-alignment graph that merges and organizes tokens from input sentences. The resulting graph is then used to generate candidate paraphrases. We demonstrate that our system obtains high-quality paraphrases, as evaluated by crowd workers. We further show that the majority of the identified paraphrases are domain-specific and thus complement existing paraphrase databases.
Tasks Word Alignment
Published 2019-10-01
URL https://arxiv.org/abs/1910.00637v2
PDF https://arxiv.org/pdf/1910.00637v2.pdf
PWC https://paperswithcode.com/paper/essentia-mining-domain-specific-paraphrases
Repo
Framework
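A toy rendition of the word-alignment intuition: tokens linked by an alignment serve as shared anchors, and the spans between consecutive anchors are paired as candidate paraphrases. The alignment here is hand-specified, whereas Essentia obtains it from word-alignment techniques and builds a full graph over many sentences:

```python
def paraphrase_candidates(sent_a, sent_b, alignment):
    """Pair the unaligned spans that sit between the same aligned anchor tokens
    in two sentences; such spans are candidate paraphrases."""
    a, b = sent_a.split(), sent_b.split()
    anchors = sorted(alignment) + [(len(a), len(b))]
    cands, ia, ib = [], 0, 0
    for i, j in anchors:
        span_a, span_b = a[ia:i], b[ib:j]
        if span_a != span_b and (span_a or span_b):
            cands.append((" ".join(span_a), " ".join(span_b)))
        ia, ib = i + 1, j + 1
    return cands

print(paraphrase_candidates(
    "please call the concierge desk",
    "please call the staff",
    alignment=[(0, 0), (1, 1), (2, 2)]))   # -> [('concierge desk', 'staff')]
```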

Can learning from natural image denoising be used for seismic data interpolation?

Title Can learning from natural image denoising be used for seismic data interpolation?
Authors Hao Zhang, Xiuyan Yang, Jianwei Ma
Abstract We propose a convolutional neural network (CNN) denoising based method for seismic data interpolation. It provides a simple and efficient way to break through the scarcity of geophysical training labels that deep learning methods often require. The new method consists of two steps: (1) train a set of CNN denoisers on clean-noisy pairs of natural images to learn denoising; (2) integrate the trained CNN denoisers into a projection onto convex sets (POCS) framework to perform seismic data interpolation. The method alleviates the demand for large volumes of seismic data with similar features that end-to-end deep learning approaches to seismic data interpolation require. Additionally, the proposed method is flexible across many missing-trace patterns because such patterns are not involved in the training step, giving it a plug-and-play nature. These properties indicate the high generalizability of our approach and reduce the need for problem-specific training. Primary results on synthetic and field data show promising interpolation performance of the presented CNN-POCS method in terms of signal-to-noise ratio, de-aliasing and weak-feature reconstruction, in comparison with traditional $f$-$x$ prediction filtering and curvelet transform based POCS methods.
Tasks De-aliasing, Denoising, Image Denoising
Published 2019-02-27
URL http://arxiv.org/abs/1902.10379v2
PDF http://arxiv.org/pdf/1902.10379v2.pdf
PWC https://paperswithcode.com/paper/can-learning-from-natural-image-denoising-be
Repo
Framework
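A minimal sketch of the plug-and-play POCS loop described above: alternate a learned denoising step with re-insertion of the observed traces. The `denoiser` callable stands in for the trained CNN denoisers; the smoothing function in the usage example is only a placeholder:

```python
import numpy as np

def cnn_pocs_interpolate(observed, mask, denoiser, n_iter=50):
    """Alternate a denoising step (learned prior) with enforcement of data
    consistency on the traces that were actually recorded."""
    x = observed.copy()
    for _ in range(n_iter):
        x = denoiser(x)                  # projection-like step by the learned prior
        x[mask] = observed[mask]         # re-insert the known traces
    return x

# toy usage with a smoothing "denoiser" standing in for the CNN
seismic = np.random.randn(64, 64)
mask = np.random.rand(64, 64) < 0.5      # True where traces were recorded
observed = np.where(mask, seismic, 0.0)
smooth = lambda z: 0.5 * z + 0.25 * (np.roll(z, 1, 0) + np.roll(z, -1, 0))
recon = cnn_pocs_interpolate(observed, mask, smooth)
```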

A comparison of end-to-end models for long-form speech recognition

Title A comparison of end-to-end models for long-form speech recognition
Authors Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara Sainath, Yonghui Wu
Abstract End-to-end automatic speech recognition (ASR) models, including both attention-based models and the recurrent neural network transducer (RNN-T), have shown superior performance compared to conventional systems. However, previous studies have focused primarily on short utterances that typically last for just a few seconds or, at most, a few tens of seconds. Whether such architectures are practical on long utterances that last from minutes to hours remains an open question. In this paper, we both investigate and improve the performance of end-to-end models on long-form transcription. We first present an empirical comparison of different end-to-end models on a real world long-form task and demonstrate that the RNN-T model is much more robust than attention-based systems in this regime. We next explore two improvements to attention-based systems that significantly improve their performance: restricting the attention to be monotonic, and applying a novel decoding algorithm that breaks long utterances into shorter overlapping segments. Combining these two improvements, we show that attention-based end-to-end models can be very competitive with RNN-T on long-form speech recognition.
Tasks Speech Recognition
Published 2019-11-06
URL https://arxiv.org/abs/1911.02242v1
PDF https://arxiv.org/pdf/1911.02242v1.pdf
PWC https://paperswithcode.com/paper/a-comparison-of-end-to-end-models-for-long
Repo
Framework
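A small sketch of the overlapping-segment idea for long-form decoding: split the frame sequence into fixed-length windows that overlap, decode each window, then merge hypotheses. The segment and overlap lengths are arbitrary, and the paper's merging of overlapping hypotheses is more involved than this simple split:

```python
def overlapping_segments(n_frames, segment_len=3000, overlap=300):
    """Return (start, end) frame indices of overlapping decoding windows."""
    starts = range(0, max(n_frames - overlap, 1), segment_len - overlap)
    return [(s, min(s + segment_len, n_frames)) for s in starts]

print(overlapping_segments(10000))
# [(0, 3000), (2700, 5700), (5400, 8400), (8100, 10000)]
```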

Deep Learning for Brain Tumor Segmentation in Radiosurgery: Prospective Clinical Evaluation

Title Deep Learning for Brain Tumor Segmentation in Radiosurgery: Prospective Clinical Evaluation
Authors Boris Shirokikh, Alexandra Dalechina, Alexey Shevtsov, Egor Krivov, Valery Kostjuchenko, Amayak Durgaryan, Mikhail Galkin, Ivan Osinov, Andrey Golanov, Mikhail Belyaev
Abstract Stereotactic radiosurgery is a minimally-invasive treatment option for a large number of patients with intracranial tumors. As part of the therapy treatment, accurate delineation of brain tumors is of great importance. However, slice-by-slice manual segmentation on T1c MRI could be time-consuming (especially for multiple metastases) and subjective. In our work, we compared several deep convolutional networks architectures and training procedures and evaluated the best model in a radiation therapy department for three types of brain tumors: meningiomas, schwannomas and multiple brain metastases. The developed semiautomatic segmentation system accelerates the contouring process by 2.2 times on average and increases inter-rater agreement from 92.0% to 96.5%.
Tasks Brain Tumor Segmentation
Published 2019-09-06
URL https://arxiv.org/abs/1909.02799v3
PDF https://arxiv.org/pdf/1909.02799v3.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-brain-tumor-segmentation-in
Repo
Framework
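As a generic aside, segmentation quality in studies like this is commonly summarized with the Dice similarity coefficient; a minimal implementation follows. The paper's inter-rater agreement figure may be computed differently, so this is only a standard reference metric:

```python
import numpy as np

def dice_score(pred, gt, eps=1e-8):
    """Dice similarity coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)

a = np.zeros((64, 64)); a[20:40, 20:40] = 1
b = np.zeros((64, 64)); b[22:42, 22:42] = 1
print(round(dice_score(a, b), 3))   # 0.81 for this toy pair of masks
```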

End-to-End Boundary Aware Networks for Medical Image Segmentation

Title End-to-End Boundary Aware Networks for Medical Image Segmentation
Authors Ali Hatamizadeh, Demetri Terzopoulos, Andriy Myronenko
Abstract Fully convolutional neural networks (CNNs) have proven to be effective at representing and classifying textural information, thus transforming image intensity into output class masks that achieve semantic image segmentation. In medical image analysis, however, expert manual segmentation often relies on the boundaries of anatomical structures of interest. We propose boundary aware CNNs for medical image segmentation. Our networks are designed to account for organ boundary information, both by providing a special network edge branch and edge-aware loss terms, and they are trainable end-to-end. We validate their effectiveness on the task of brain tumor segmentation using the BraTS 2018 dataset. Our experiments reveal that our approach yields more accurate segmentation results, which makes it promising for more extensive application to medical image segmentation.
Tasks Brain Tumor Segmentation, Medical Image Segmentation, Semantic Segmentation
Published 2019-08-21
URL https://arxiv.org/abs/1908.08071v2
PDF https://arxiv.org/pdf/1908.08071v2.pdf
PWC https://paperswithcode.com/paper/boundary-aware-networks-for-medical-image
Repo
Framework
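A hedged PyTorch sketch of one way to realize an edge branch and edge-aware loss: derive a boundary map from the ground-truth mask by dilation-and-difference, then add a BCE term on the edge branch. The extraction trick and the loss weighting are assumptions, not necessarily the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def boundary_map(mask):
    """Soft boundary of a binary mask via max-pool dilation minus the mask."""
    dilated = F.max_pool2d(mask, kernel_size=3, stride=1, padding=1)
    return (dilated - mask).clamp(0, 1)

def edge_aware_loss(seg_logits, edge_logits, gt_mask, edge_weight=0.5):
    """Standard segmentation BCE plus a BCE term that makes a dedicated edge
    branch predict the boundary of the ground-truth mask."""
    seg_loss = F.binary_cross_entropy_with_logits(seg_logits, gt_mask)
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, boundary_map(gt_mask))
    return seg_loss + edge_weight * edge_loss

gt = torch.zeros(1, 1, 64, 64); gt[..., 20:40, 20:40] = 1.0
loss = edge_aware_loss(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64), gt)
```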

DualDis: Dual-Branch Disentangling with Adversarial Learning

Title DualDis: Dual-Branch Disentangling with Adversarial Learning
Authors Thomas Robert, Nicolas Thome, Matthieu Cord
Abstract In computer vision, disentangling techniques aim at improving latent representations of images by modeling factors of variation. In this paper, we propose DualDis, a new auto-encoder-based framework that disentangles and linearizes class and attribute information. This is achieved thanks to a two-branch architecture forcing the separation of the two kinds of information, accompanied by a decoder for image reconstruction and generation. To effectively separate the information, we propose to use a combination of regular and adversarial classifiers to guide the two branches in specializing for class and attribute information respectively. We also investigate the possibility of using semi-supervised learning for an effective disentangling even using few labels. We leverage the linearization property of the latent spaces for semantic image editing and generation of new images. We validate our approach on CelebA, Yale-B and NORB by measuring the efficiency of information separation via classification metrics, visual image manipulation and data augmentation.
Tasks Data Augmentation, Image Reconstruction
Published 2019-06-03
URL https://arxiv.org/abs/1906.00804v1
PDF https://arxiv.org/pdf/1906.00804v1.pdf
PWC https://paperswithcode.com/paper/190600804
Repo
Framework
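A sketch of the regular-plus-adversarial classifier setup on the two branches, using gradient reversal for the adversarial part. The gradient-reversal mechanism, head shapes, and loss weighting are illustrative choices; the paper's adversarial training may be implemented differently:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass, so the
    encoder learns to remove the factor the adversarial head tries to predict."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad

def dual_branch_losses(z_class, z_attr, y_class, y_attr, cls_heads, adv_heads):
    """Each branch gets a regular classifier for its own factor and an adversarial
    classifier (through gradient reversal) for the other factor."""
    reg = F.cross_entropy(cls_heads["class"](z_class), y_class) \
        + F.cross_entropy(cls_heads["attr"](z_attr), y_attr)
    adv = F.cross_entropy(adv_heads["attr"](GradReverse.apply(z_class)), y_attr) \
        + F.cross_entropy(adv_heads["class"](GradReverse.apply(z_attr)), y_class)
    return reg + adv

z_c = torch.randn(8, 32, requires_grad=True)
z_a = torch.randn(8, 32, requires_grad=True)
yc, ya = torch.randint(0, 10, (8,)), torch.randint(0, 5, (8,))
cls_heads = {"class": nn.Linear(32, 10), "attr": nn.Linear(32, 5)}
adv_heads = {"class": nn.Linear(32, 10), "attr": nn.Linear(32, 5)}
dual_branch_losses(z_c, z_a, yc, ya, cls_heads, adv_heads).backward()
```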

A logical-based corpus for cross-lingual evaluation

Title A logical-based corpus for cross-lingual evaluation
Authors Felipe Salvatore, Marcelo Finger, Roberto Hirata Jr
Abstract At present, different deep learning models achieve high accuracy on popular inference datasets such as SNLI, MNLI, and SciTail. However, several indicators suggest that those datasets can be exploited using simple linguistic patterns. This makes it difficult to assess the actual capacity of machine learning models to solve the complex task of textual inference. We propose a new set of syntactic tasks focused on contradiction detection that require specific capacities over linguistic logical forms such as: Boolean coordination, quantifiers, definite descriptions, and counting operators. We evaluate two kinds of deep learning models that implicitly exploit language structure: recurrent models and the Transformer network BERT. We show that although BERT is clearly more effective at generalizing over most logical forms, there is room for improvement when dealing with counting operators. Since the syntactic tasks can be implemented in different languages, we demonstrate a successful case of cross-lingual transfer learning between English and Portuguese.
Tasks Cross-Lingual Transfer, Natural Language Inference, Transfer Learning
Published 2019-05-10
URL https://arxiv.org/abs/1905.05704v5
PDF https://arxiv.org/pdf/1905.05704v5.pdf
PWC https://paperswithcode.com/paper/using-syntactical-and-logical-forms-to
Repo
Framework
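A toy generator in the spirit of such synthetic tasks, producing a premise/hypothesis pair whose label hinges on a counting operator. The templates and names are invented for illustration and are not taken from the released corpus:

```python
import random

NAMES = ["Ana", "Bruno", "Carla", "Daniel"]

def counting_example(n_mentioned=3, claimed=2, rng=random.Random(0)):
    """Premise lists who attended; the hypothesis asserts an exact count, so the
    label depends on whether the count matches the number of people mentioned."""
    people = rng.sample(NAMES, n_mentioned)
    premise = ", ".join(people[:-1]) + f" and {people[-1]} went to the meeting."
    hypothesis = f"Exactly {claimed} people went to the meeting."
    label = "entailment" if claimed == n_mentioned else "contradiction"
    return premise, hypothesis, label

print(counting_example())
```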