July 29, 2019

2921 words 14 mins read

Paper Group AWR 177


CosmoGAN: creating high-fidelity weak lensing convergence maps using Generative Adversarial Networks

Title CosmoGAN: creating high-fidelity weak lensing convergence maps using Generative Adversarial Networks
Authors Mustafa Mustafa, Deborah Bard, Wahid Bhimji, Zarija Lukić, Rami Al-Rfou, Jan M. Kratochvil
Abstract Inferring model parameters from experimental data is a grand challenge in many sciences, including cosmology. This often relies critically on high fidelity numerical simulations, which are prohibitively computationally expensive. The application of deep learning techniques to generative modeling is renewing interest in using high dimensional density estimators as computationally inexpensive emulators of fully-fledged simulations. These generative models have the potential to make a dramatic shift in the field of scientific simulations, but for that shift to happen we need to study the performance of such generators in the precision regime needed for science applications. To this end, in this work we apply Generative Adversarial Networks to the problem of generating weak lensing convergence maps. We show that our generator network produces maps that are described by, with high statistical confidence, the same summary statistics as the fully simulated maps.
Tasks
Published 2017-06-07
URL https://arxiv.org/abs/1706.02390v6
PDF https://arxiv.org/pdf/1706.02390v6.pdf
PWC https://paperswithcode.com/paper/cosmogan-creating-high-fidelity-weak-lensing
Repo https://github.com/MustafaMustafa/cosmoGAN
Framework tf
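
A minimal PyTorch sketch of a DCGAN-style generator (the family cosmoGAN builds on) producing single-channel maps from Gaussian noise. The layer widths, 32x32 output resolution, and Tanh output range are assumptions for brevity, not the exact cosmoGAN configuration (whose released code is in TensorFlow).

```python
# Minimal DCGAN-style generator sketch for single-channel maps.
# All sizes here are illustrative assumptions, not the cosmoGAN config.
import torch
import torch.nn as nn

class MapGenerator(nn.Module):
    def __init__(self, z_dim=64, base=128):
        super().__init__()
        self.net = nn.Sequential(
            # project the noise vector to a 4x4 feature map, then upsample to 32x32
            nn.ConvTranspose2d(z_dim, base * 4, 4, 1, 0), nn.BatchNorm2d(base * 4), nn.ReLU(True),
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1), nn.BatchNorm2d(base * 2), nn.ReLU(True),
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1), nn.BatchNorm2d(base), nn.ReLU(True),
            nn.ConvTranspose2d(base, 1, 4, 2, 1), nn.Tanh(),  # one channel: the map
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

maps = MapGenerator()(torch.randn(8, 64))  # -> (8, 1, 32, 32) generated maps
```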

Transition-Based Generation from Abstract Meaning Representations

Title Transition-Based Generation from Abstract Meaning Representations
Authors Timo Schick
Abstract This work addresses the task of generating English sentences from Abstract Meaning Representation (AMR) graphs. To cope with this task, we transform each input AMR graph into a structure similar to a dependency tree and annotate it with syntactic information by applying various predefined actions to it. Subsequently, a sentence is obtained from this tree structure by visiting its nodes in a specific order. We train maximum entropy models to estimate the probability of each individual action and devise an algorithm that efficiently approximates the best sequence of actions to be applied. Using a substandard language model, our generator achieves a BLEU score of 27.4 on the LDC2014T12 test set, the best result reported so far without using silver standard annotations from another corpus as additional training data.
Tasks Language Modelling
Published 2017-07-24
URL http://arxiv.org/abs/1707.07591v1
PDF http://arxiv.org/pdf/1707.07591v1.pdf
PWC https://paperswithcode.com/paper/transition-based-generation-from-abstract
Repo https://github.com/timoschick/amr-gen
Framework none
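
A toy sketch of the decoding loop described above: a learned model scores predefined actions and the generator greedily applies the best applicable one until the tree is consumed. The two actions, the features, and the probabilities below are invented placeholders, not the paper's actual action inventory or maximum entropy features.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    pending: list                               # tree nodes left to visit
    tokens: list = field(default_factory=list)  # emitted English words

def maxent_score(state, action):
    # stand-in for the trained maximum entropy model P(action | features(state))
    p_realize = 0.4 if state.pending[0].endswith("-01") else 0.9  # toy feature
    return p_realize if action == "REALIZE" else 1.0 - p_realize

def step(state, action):
    node = state.pending.pop(0)
    if action == "REALIZE":                     # emit a surface word for this node
        state.tokens.append(node)
    # DELETE drops the node without emitting anything

def generate(nodes):
    state = State(pending=list(nodes))
    while state.pending:                        # greedy approximation of the best action sequence
        best = max(("REALIZE", "DELETE"), key=lambda a: maxent_score(state, a))
        step(state, best)
    return " ".join(state.tokens)

print(generate(["the", "boy", "want-01"]))      # -> "the boy"
```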

Reinforcement Learning via Recurrent Convolutional Neural Networks

Title Reinforcement Learning via Recurrent Convolutional Neural Networks
Authors Tanmay Shankar, Santosha K. Dwivedy, Prithwijit Guha
Abstract Deep Reinforcement Learning has enabled the learning of policies for complex tasks in partially observable environments, without explicitly learning the underlying model of the tasks. While such model-free methods achieve considerable performance, they often ignore the structure of the task. We present a natural representation of Reinforcement Learning (RL) problems using Recurrent Convolutional Neural Networks (RCNNs) to better exploit this inherent structure. We define 3 such RCNNs, whose forward passes execute an efficient Value Iteration, propagate beliefs of state in partially observable environments, and choose optimal actions respectively. Backpropagating gradients through these RCNNs allows the system to explicitly learn the Transition Model and Reward Function associated with the underlying MDP, serving as an elegant alternative to classical model-based RL. We evaluate the proposed algorithms in simulation, considering a robot planning problem. We demonstrate the capability of our framework to reduce the cost of replanning, learn accurate MDP models, and finally re-plan with learnt models to achieve near-optimal policies.
Tasks
Published 2017-01-09
URL http://arxiv.org/abs/1701.02392v1
PDF http://arxiv.org/pdf/1701.02392v1.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-via-recurrent
Repo https://github.com/tanmayshankar/RCNN_MDP
Framework none
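
The first of the three RCNNs rests on a neat observation: on a grid world, the Bellman backup of Value Iteration is a convolution of the value map with per-action transition kernels followed by a max over action channels, so unrolled VI is a recurrent convolutional forward pass. A hedged numpy sketch with a toy 8x8 grid and hand-fixed (rather than learned) kernels and rewards:

```python
import numpy as np
from scipy.signal import convolve2d

H = W = 8
gamma = 0.95
V = np.zeros((H, W))
R = np.full((H, W), -0.1); R[0, W - 1] = 1.0          # goal in one corner

# 3x3 transition kernels for 4 actions: mostly-deterministic moves
kernels = []
for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
    k = np.full((3, 3), 0.05 / 8)                     # small chance of slipping
    k[1 + dy, 1 + dx] = 0.95                          # intended move
    kernels.append(k)

for _ in range(50):                                   # VI recurrence = RCNN forward pass
    Q = np.stack([R + gamma * convolve2d(V, k, mode="same") for k in kernels])
    V = Q.max(axis=0)                                 # max over action channels

policy = Q.argmax(axis=0)                             # greedy action per cell
```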

Analogs of Linguistic Structure in Deep Representations

Title Analogs of Linguistic Structure in Deep Representations
Authors Jacob Andreas, Dan Klein
Abstract We investigate the compositional structure of message vectors computed by a deep network trained on a communication game. By comparing truth-conditional representations of encoder-produced message vectors to human-produced referring expressions, we are able to identify aligned (vector, utterance) pairs with the same meaning. We then search for structured relationships among these aligned pairs to discover simple vector space transformations corresponding to negation, conjunction, and disjunction. Our results suggest that neural representations are capable of spontaneously developing a “syntax” with functional analogues to qualitative properties of natural language.
Tasks
Published 2017-07-25
URL http://arxiv.org/abs/1707.08139v1
PDF http://arxiv.org/pdf/1707.08139v1.pdf
PWC https://paperswithcode.com/paper/analogs-of-linguistic-structure-in-deep
Repo https://github.com/LeenaShekhar/NLP-Linguistics-ML-Resources
Framework tf
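
The "structured relationships" step can be sketched as fitting a linear map between aligned message vectors, for instance one that sends the vector for an utterance to the vector for its negation, and checking that the map generalizes. The numpy sketch below uses synthetic vectors in place of encoder-produced messages.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 200
M = rng.normal(size=(n, d))                 # message vectors for utterances u_i
true_map = rng.normal(size=(d, d)) / np.sqrt(d)
M_neg = M @ true_map                        # message vectors for "not u_i" (synthetic)

# fit negation as a single linear transformation: minimize ||M N - M_neg||_F
N, *_ = np.linalg.lstsq(M, M_neg, rcond=None)

held_out = rng.normal(size=(1, d))
predicted_negation = held_out @ N           # should match held_out @ true_map
```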

Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF

Title Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF
Authors Yan Shao, Christian Hardmeier, Jörg Tiedemann, Joakim Nivre
Abstract We present a character-based model for joint segmentation and POS tagging for Chinese. The bidirectional RNN-CRF architecture for general sequence tagging is adapted and applied with novel vector representations of Chinese characters that capture rich contextual information and sub-character-level features. The proposed model is extensively evaluated and compared with a state-of-the-art tagger on CTB5, CTB9 and UD Chinese. The experimental results indicate that our model is accurate and robust across datasets of different sizes, genres and annotation schemes. We obtain state-of-the-art performance on CTB5, achieving an F1-score of 94.38 for joint segmentation and POS tagging.
Tasks
Published 2017-04-05
URL http://arxiv.org/abs/1704.01314v3
PDF http://arxiv.org/pdf/1704.01314v3.pdf
PWC https://paperswithcode.com/paper/character-based-joint-segmentation-and-pos
Repo https://github.com/yanshao9798/tagger
Framework tf
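
A hedged PyTorch sketch of the character-level tagging view: each character receives a joint label that combines a segmentation tag (e.g. B/E/S) with a POS tag, so predicting the label sequence solves both tasks at once. The toy tag set is illustrative, the network is untrained, and a greedy argmax stands in for the CRF layer's Viterbi decoding.

```python
import torch
import torch.nn as nn

chars = list("他喜欢学习")                          # toy input sentence
vocab = {c: i for i, c in enumerate(sorted(set(chars)))}
tags = ["S-PN", "B-VV", "E-VV", "B-NN", "E-NN"]    # joint seg+POS labels (toy subset)

emb = nn.Embedding(len(vocab), 32)
bilstm = nn.LSTM(32, 64, bidirectional=True, batch_first=True)
proj = nn.Linear(128, len(tags))                   # 128 = 2 * hidden size

x = torch.tensor([[vocab[c] for c in chars]])
h, _ = bilstm(emb(x))
scores = proj(h)                                   # (1, seq_len, n_tags) emissions
pred = scores.argmax(-1)[0]                        # greedy; a CRF would run Viterbi
print([(c, tags[i]) for c, i in zip(chars, pred.tolist())])
```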

AENet: Learning Deep Audio Features for Video Analysis

Title AENet: Learning Deep Audio Features for Video Analysis
Authors Naoya Takahashi, Michael Gygli, Luc Van Gool
Abstract We propose a new deep network for audio event recognition, called AENet. In contrast to speech, sounds coming from audio events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of clear sub-word units that are present in speech. In order to incorporate this long-time frequency structure of audio events, we introduce a convolutional neural network (CNN) operating on a large temporal input. In contrast to previous works, this allows us to train an audio event detection system end-to-end. The combination of our network architecture and a novel data augmentation method outperforms previous methods for audio event detection by 16%. Furthermore, we perform transfer learning and show that our model learnt generic audio features, similar to the way CNNs learn generic features on vision tasks. In video analysis, combining visual features and traditional audio features such as MFCC typically only leads to marginal improvements. Instead, combining visual features with our AENet features, which can be computed efficiently on a GPU, leads to significant performance improvements on action recognition and video highlight detection. In video highlight detection, our audio features improve the performance by more than 8% over visual features alone.
Tasks Data Augmentation, Temporal Action Localization, Transfer Learning
Published 2017-01-03
URL http://arxiv.org/abs/1701.00599v2
PDF http://arxiv.org/pdf/1701.00599v2.pdf
PWC https://paperswithcode.com/paper/aenet-learning-deep-audio-features-for-video
Repo https://github.com/znaoya/aenet
Framework none
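
The fusion experiment is straightforward to sketch: concatenate clip-level visual features with clip-level audio features and train a classifier. The sketch below uses random stand-ins for both feature sets (the paper's audio features are AENet CNN embeddings) and scikit-learn's logistic regression as a placeholder classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d_vis, d_aud = 500, 512, 256
visual = rng.normal(size=(n, d_vis))              # stand-in for visual features
audio = rng.normal(size=(n, d_aud))               # stand-in for AENet embeddings
y = rng.integers(0, 10, size=n)                   # toy action labels

fused = np.concatenate([visual, audio], axis=1)   # simple late fusion
clf = LogisticRegression(max_iter=1000).fit(fused, y)
```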

AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding

Title AI Challenger : A Large-scale Dataset for Going Deeper in Image Understanding
Authors Jiahong Wu, He Zheng, Bo Zhao, Yixin Li, Baoming Yan, Rui Liang, Wenjia Wang, Shipei Zhou, Guosen Lin, Yanwei Fu, Yizhou Wang, Yonggang Wang
Abstract Significant progress has been achieved in Computer Vision by leveraging large-scale image datasets. However, large-scale datasets for complex Computer Vision tasks beyond classification are still limited. This paper proposes a large-scale dataset named AIC (AI Challenger) with three sub-datasets: human keypoint detection (HKD), large-scale attribute dataset (LAD) and image Chinese captioning (ICC). In this dataset, we annotate class labels (LAD), keypoint coordinates (HKD), bounding boxes (HKD and LAD), attributes (LAD) and captions (ICC). These rich annotations bridge the semantic gap between low-level images and high-level concepts. The proposed dataset is an effective benchmark to evaluate and improve different computational methods. In addition, for related tasks, others can also use our dataset as a new resource to pre-train their models.
Tasks Keypoint Detection
Published 2017-11-17
URL http://arxiv.org/abs/1711.06475v1
PDF http://arxiv.org/pdf/1711.06475v1.pdf
PWC https://paperswithcode.com/paper/ai-challenger-a-large-scale-dataset-for-going
Repo https://github.com/chingswy/HumanPoseMemo
Framework pytorch
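
To make the three annotation types concrete, here is a hypothetical Python schema; the field names and shapes are illustrative guesses, not the dataset's actual file format.

```python
# Hypothetical schema for the three AIC annotation types; field names
# and shapes are illustrative, not the dataset's real format.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class HKDAnnotation:                                   # human keypoint detection
    bbox: Tuple[float, float, float, float]            # x, y, w, h
    keypoints: List[Tuple[float, float, int]]          # x, y, visibility flag

@dataclass
class LADAnnotation:                                   # large-scale attribute dataset
    label: str
    bbox: Tuple[float, float, float, float]
    attributes: List[str]

@dataclass
class ICCAnnotation:                                   # image Chinese captioning
    captions: List[str]                                # several captions per image
```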

Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

Title Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments
Authors Ziteng Wang, Emmanuel Vincent, Romain Serizel, Yonghong Yan
Abstract Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer, are popular signal processing techniques which can improve speech recognition performance. In this paper, we present an experimental study on these linear filters in a specific speech recognition task, namely the CHiME-4 challenge, which features real recordings in multiple noisy environments. Specifically, the rank-1 MWF is employed for noise reduction and a new constant residual noise power constraint is derived which enhances the recognition performance. To fulfill the underlying rank-1 assumption, the speech covariance matrix is reconstructed based on eigenvectors or generalized eigenvectors. Then the rank-1 constrained MWF is evaluated against alternative multichannel linear filters under the same framework, which involves a Bidirectional Long Short-Term Memory (BLSTM) network for mask estimation. The proposed filter outperforms the alternatives, leading to a 40% relative Word Error Rate (WER) reduction compared with the baseline Weighted Delay and Sum (WDAS) beamformer on the real test set, and a 15% relative WER reduction compared with the GEV-BAN method. The results also suggest that the speech recognition accuracy correlates more with the Mel-frequency cepstral coefficients (MFCC) feature variance than with the noise reduction or the speech distortion level.
Tasks Speech Recognition
Published 2017-07-01
URL http://arxiv.org/abs/1707.00201v2
PDF http://arxiv.org/pdf/1707.00201v2.pdf
PWC https://paperswithcode.com/paper/rank-1-constrained-multichannel-wiener-filter
Repo https://github.com/ZitengWang/nn_mask
Framework none
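
For reference, a hedged numpy sketch of the rank-1 MWF pipeline for a single frequency bin: estimate speech and noise covariances with a time-frequency mask (a random stand-in here for the BLSTM-estimated mask), force the speech covariance to rank one via its principal eigenpair, and apply the standard MWF formula. The array size, trade-off parameter mu, and reference channel are toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 4, 200                                    # microphones, frames
X = rng.normal(size=(M, T)) + 1j * rng.normal(size=(M, T))  # one STFT bin
mask = rng.uniform(size=T)                       # stand-in for the BLSTM speech mask

Phi_s = (mask * X) @ X.conj().T / mask.sum()     # masked covariance estimates
Phi_n = ((1 - mask) * X) @ X.conj().T / (1 - mask).sum()

# rank-1 reconstruction: keep only the principal eigenpair of Phi_s
eigvals, V = np.linalg.eigh(Phi_s)
a = V[:, -1]
Phi_s1 = eigvals[-1] * np.outer(a, a.conj())

# MWF with reference mic 0; mu trades speech distortion vs. residual noise
mu = 1.0
num = np.linalg.solve(Phi_n, Phi_s1)             # Phi_n^{-1} Phi_s1
w = (num / (mu + np.trace(num).real))[:, 0]      # filter for the reference channel
enhanced = w.conj() @ X                          # enhanced single-channel signal
```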

Stable Architectures for Deep Neural Networks

Title Stable Architectures for Deep Neural Networks
Authors Eldad Haber, Lars Ruthotto
Abstract Deep neural networks have become invaluable tools for supervised machine learning, e.g., classification of text or images. While often offering superior results over traditional techniques and successfully expressing complicated patterns in data, deep architectures are known to be challenging to design and train such that they generalize well to new data. Important issues with deep architectures are numerical instabilities in derivative-based learning algorithms commonly called exploding or vanishing gradients. In this paper we propose new forward propagation techniques inspired by systems of Ordinary Differential Equations (ODE) that overcome this challenge and lead to well-posed learning problems for arbitrarily deep networks. The backbone of our approach is our interpretation of deep learning as a parameter estimation problem of nonlinear dynamical systems. Given this formulation, we analyze stability and well-posedness of deep learning and use this new understanding to develop new network architectures. We relate the exploding and vanishing gradient phenomenon to the stability of the discrete ODE and present several strategies for stabilizing deep learning for very deep networks. While our new architectures restrict the solution space, several numerical experiments show their competitiveness with state-of-the-art networks.
Tasks
Published 2017-05-09
URL http://arxiv.org/abs/1705.03341v3
PDF http://arxiv.org/pdf/1705.03341v3.pdf
PWC https://paperswithcode.com/paper/stable-architectures-for-deep-neural-networks
Repo https://github.com/TheoryDev/Deep-neural-network-training-optimisation
Framework pytorch
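
One of the paper's stabilizing constructions is easy to sketch: view a residual block as a forward-Euler step Y <- Y + h*sigma(K Y + b) of an ODE and make K antisymmetric (K = S - S^T), so the Jacobian's eigenvalues lie on the imaginary axis and signals neither explode nor decay with depth. A minimal PyTorch sketch, with the layer width, depth, and step size h as toy choices:

```python
import torch
import torch.nn as nn

class AntisymmetricResBlock(nn.Module):
    """Forward-Euler residual step with an antisymmetric weight matrix."""
    def __init__(self, dim, h=0.1):
        super().__init__()
        self.S = nn.Parameter(torch.randn(dim, dim) / dim ** 0.5)
        self.b = nn.Parameter(torch.zeros(dim))
        self.h = h

    def forward(self, y):
        K = self.S - self.S.t()                  # antisymmetric: eigenvalues imaginary
        return y + self.h * torch.tanh(y @ K.t() + self.b)

net = nn.Sequential(*[AntisymmetricResBlock(32) for _ in range(100)])  # very deep
out = net(torch.randn(4, 32))                    # activations stay well-behaved
```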

Modelling Domain Relationships for Transfer Learning on Retrieval-based Question Answering Systems in E-commerce

Title Modelling Domain Relationships for Transfer Learning on Retrieval-based Question Answering Systems in E-commerce
Authors Jianfei Yu, Minghui Qiu, Jing Jiang, Jun Huang, Shuangyong Song, Wei Chu, Haiqing Chen
Abstract In this paper, we study transfer learning for the paraphrase identification (PI) and natural language inference (NLI) problems, aiming to propose a general framework which can effectively and efficiently adapt the shared knowledge learned from a resource-rich source domain to a resource-poor target domain. Specifically, since most existing transfer learning methods only focus on learning a shared feature space across domains while ignoring the relationship between the source and target domains, we propose to simultaneously learn shared representations and domain relationships in a unified framework. Furthermore, we propose an efficient and effective hybrid model by combining a sentence encoding-based method and a sentence interaction-based method as our base model. Extensive experiments on both paraphrase identification and natural language inference demonstrate that our base model is efficient and has promising performance compared to the competing models, and our transfer learning method can significantly boost the performance. Further analysis shows that the inter-domain and intra-domain relationships captured by our model are insightful. Last but not least, we deploy our transfer learning model for PI into our online chatbot system, which brings significant improvements over our existing system. Finally, we launch our new system on the chatbot platform Eva in our E-commerce site AliExpress.
Tasks Chatbot, Natural Language Inference, Paraphrase Identification, Question Answering, Transfer Learning
Published 2017-11-23
URL http://arxiv.org/abs/1711.08726v1
PDF http://arxiv.org/pdf/1711.08726v1.pdf
PWC https://paperswithcode.com/paper/modelling-domain-relationships-for-transfer
Repo https://github.com/jefferyYu/WSDM18_codes
Framework tf
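
A hedged sketch of the shared-space-plus-domain-relationship idea in PyTorch: a shared encoder, one private encoder per domain, and a learned per-domain scalar that gates how strongly each domain draws on the shared space. This is an illustrative reduction of the paper's framework, not its exact model.

```python
import torch
import torch.nn as nn

class SharedPrivateModel(nn.Module):
    def __init__(self, d_in, d_hid, n_domains, n_classes):
        super().__init__()
        self.shared = nn.Linear(d_in, d_hid)                 # shared across domains
        self.private = nn.ModuleList(nn.Linear(d_in, d_hid) for _ in range(n_domains))
        self.relation = nn.Parameter(torch.ones(n_domains))  # learned domain weights
        self.clf = nn.Linear(2 * d_hid, n_classes)

    def forward(self, x, domain):
        s = torch.tanh(self.shared(x)) * self.relation[domain]  # gated shared space
        p = torch.tanh(self.private[domain](x))                 # domain-specific space
        return self.clf(torch.cat([s, p], dim=-1))

model = SharedPrivateModel(d_in=300, d_hid=128, n_domains=2, n_classes=2)
logits = model(torch.randn(8, 300), domain=0)
```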

A Deeper Look at Experience Replay

Title A Deeper Look at Experience Replay
Authors Shangtong Zhang, Richard S. Sutton
Abstract Experience replay is widely used in various deep reinforcement learning (RL) algorithms; in this paper we rethink its utility. It introduces a new hyper-parameter, the memory buffer size, which needs careful tuning. Unfortunately, the importance of this hyper-parameter has long been underestimated in the community. In this paper we conduct a systematic empirical study of experience replay under various function representations. We show that a large replay buffer can significantly hurt performance. Moreover, we propose a simple O(1) method to remedy the negative influence of a large replay buffer. We demonstrate its utility in both a simple grid world and challenging domains like Atari games.
Tasks Atari Games
Published 2017-12-04
URL http://arxiv.org/abs/1712.01275v3
PDF http://arxiv.org/pdf/1712.01275v3.pdf
PWC https://paperswithcode.com/paper/a-deeper-look-at-experience-replay
Repo https://github.com/seungjaeryanlee/combined-experience-replay
Framework pytorch
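
The linked repo names the O(1) remedy: combined experience replay, in which every sampled mini-batch always includes the most recent transition, so fresh experience cannot be starved by a very large buffer. A minimal sketch:

```python
import random
from collections import deque

class CombinedReplayBuffer:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # batch_size - 1 uniform samples plus the latest transition (O(1) extra cost)
        batch = random.sample(self.buffer, min(batch_size - 1, len(self.buffer)))
        batch.append(self.buffer[-1])
        return batch

buf = CombinedReplayBuffer(capacity=100_000)
for t in range(1000):
    buf.push((t, "state", "action", "reward"))
batch = buf.sample(32)          # always contains the newest transition
```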

Guide Actor-Critic for Continuous Control

Title Guide Actor-Critic for Continuous Control
Authors Voot Tangkaratt, Abbas Abdolmaleki, Masashi Sugiyama
Abstract Actor-critic methods solve reinforcement learning problems by updating a parameterized policy known as an actor in a direction that increases an estimate of the expected return known as a critic. However, existing actor-critic methods only use values or gradients of the critic to update the policy parameter. In this paper, we propose a novel actor-critic method called the guide actor-critic (GAC). GAC first learns a guide actor that locally maximizes the critic and then updates the policy parameter based on the guide actor by supervised learning. Our main theoretical contributions are twofold. First, we show that GAC updates the guide actor by performing second-order optimization in the action space, where the curvature matrix is based on the Hessians of the critic. Second, we show that the deterministic policy gradient method is a special case of GAC when the Hessians are ignored. Through experiments, we show that our method is a promising reinforcement learning method for continuous control.
Tasks Continuous Control
Published 2017-05-22
URL http://arxiv.org/abs/1705.07606v2
PDF http://arxiv.org/pdf/1705.07606v2.pdf
PWC https://paperswithcode.com/paper/guide-actor-critic-for-continuous-control
Repo https://github.com/voot-t/guide-actor-critic
Framework tf
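
The second-order step that GAC builds on can be sketched with autograd: around the current action, take a Newton-like step on the critic using its gradient and Hessian with respect to the action. The quadratic critic below is a toy stand-in, and the full method additionally fits the parameterized policy to the resulting guide actor by supervised learning.

```python
import torch

def critic(a):                          # toy quadratic critic, maximized at a_star
    a_star = torch.tensor([0.5, -1.0])
    return -((a - a_star) ** 2).sum()

a = torch.zeros(2, requires_grad=True)  # current action
q = critic(a)
g, = torch.autograd.grad(q, a, create_graph=True)            # critic gradient
H = torch.stack([torch.autograd.grad(g[i], a, retain_graph=True)[0]
                 for i in range(2)])                          # critic Hessian

guide = a - torch.linalg.solve(H, g)    # Newton step toward the critic's maximum
print(guide)                            # -> [0.5, -1.0] for this quadratic critic
```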

Generalization Tower Network: A Novel Deep Neural Network Architecture for Multi-Task Learning

Title Generalization Tower Network: A Novel Deep Neural Network Architecture for Multi-Task Learning
Authors Yuhang Song, Mai Xu, Songyang Zhang, Liangyu Huo
Abstract Deep learning (DL) advances state-of-the-art reinforcement learning (RL) by incorporating deep neural networks to learn representations from the input to RL. However, the conventional deep neural network architecture is limited in learning representations for multi-task RL (MT-RL), as multiple tasks can call for different kinds of representations. In this paper, we thus propose a novel deep neural network architecture, namely the generalization tower network (GTN), which can achieve MT-RL within a single learned model. Specifically, the architecture of GTN is composed of both horizontal and vertical streams. In our GTN architecture, horizontal streams are used to learn representations shared across similar tasks, whereas vertical streams are introduced to handle diverse tasks by encoding hierarchical shared knowledge of these tasks. The effectiveness of the introduced vertical stream is validated by experimental results, which further verify that our GTN architecture advances the state of the art in MT-RL, as tested on 51 Atari games.
Tasks Atari Games, Multi-Task Learning
Published 2017-10-27
URL http://arxiv.org/abs/1710.10036v3
PDF http://arxiv.org/pdf/1710.10036v3.pdf
PWC https://paperswithcode.com/paper/generalization-tower-network-a-novel-deep
Repo https://github.com/YuhangSong/GTN
Framework pytorch
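
A loose, hedged PyTorch reading of the tower layout: horizontal streams as shared recurrent cells and vertical streams as connections passing information upward between levels. The wiring below is one plausible interpretation of the abstract, not the exact GTN.

```python
import torch
import torch.nn as nn

class GTNSketch(nn.Module):
    def __init__(self, d=64, levels=3, n_actions=6):
        super().__init__()
        self.horizontal = nn.ModuleList(nn.GRUCell(d, d) for _ in range(levels))
        self.vertical = nn.ModuleList(nn.Linear(d, d) for _ in range(levels - 1))
        self.head = nn.Linear(d, n_actions)

    def forward(self, x, hidden):
        new_hidden, inp = [], x
        for i, cell in enumerate(self.horizontal):
            h = cell(inp, hidden[i])                       # horizontal: shared recurrence
            new_hidden.append(h)
            if i < len(self.vertical):
                inp = torch.relu(self.vertical[i](h))      # vertical: pass upward
        return self.head(new_hidden[-1]), new_hidden

model = GTNSketch()
hidden = [torch.zeros(1, 64) for _ in range(3)]
logits, hidden = model(torch.randn(1, 64), hidden)
```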

Key-Value Retrieval Networks for Task-Oriented Dialogue

Title Key-Value Retrieval Networks for Task-Oriented Dialogue
Authors Mihail Eric, Christopher D. Manning
Abstract Neural task-oriented dialogue systems often struggle to smoothly interface with a knowledge base. In this work, we seek to address this problem by proposing a new neural dialogue agent that is able to effectively sustain grounded, multi-domain discourse through a novel key-value retrieval mechanism. The model is end-to-end differentiable and does not need to explicitly model dialogue state or belief trackers. We also release a new dataset of 3,031 dialogues that are grounded through underlying knowledge bases and span three distinct tasks in the in-car personal assistant space: calendar scheduling, weather information retrieval, and point-of-interest navigation. Our architecture is simultaneously trained on data from all domains and significantly outperforms a competitive rule-based system and other existing neural dialogue architectures on the provided domains according to both automatic and human evaluation metrics.
Tasks Information Retrieval, Task-Oriented Dialogue Systems
Published 2017-05-15
URL http://arxiv.org/abs/1705.05414v2
PDF http://arxiv.org/pdf/1705.05414v2.pdf
PWC https://paperswithcode.com/paper/key-value-retrieval-networks-for-task
Repo https://github.com/ysglh/Task-Oriented-Dialogue-Dataset-Survey
Framework none
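
A hedged sketch of the key-value retrieval mechanism as the abstract describes it: the decoder state attends over embedded KB keys, and each key's attention weight boosts the logit of the vocabulary token holding its value, biasing decoding toward grounded entries. The toy KB, embeddings, and token ids below are invented stand-ins for the learned components.

```python
import torch
import torch.nn.functional as F

d, vocab_size = 32, 1000
kb = [("dinner", "time", 901), ("dinner", "date", 902)]  # (subject, relation, value-token id)

state = torch.randn(d)                        # decoder hidden state
key_emb = torch.randn(len(kb), d)             # embeddings of subject+relation keys
attn = F.softmax(key_emb @ state, dim=0)      # attention over KB keys

logits = torch.randn(vocab_size)              # ordinary vocabulary logits
for (_, _, value_token), score in zip(kb, attn):
    logits[value_token] += score              # retrieval biases decoding toward KB values
```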

Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Title Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Authors Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu
Abstract This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize time-domain waveforms from those spectrograms. Our model achieves a mean opinion score (MOS) of $4.53$, comparable to a MOS of $4.58$ for professionally recorded speech. To validate our design choices, we present ablation studies of key components of our system and evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and $F_0$ features. We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of the WaveNet architecture.
Tasks Speech Synthesis
Published 2017-12-16
URL http://arxiv.org/abs/1712.05884v2
PDF http://arxiv.org/pdf/1712.05884v2.pdf
PWC https://paperswithcode.com/paper/natural-tts-synthesis-by-conditioning-wavenet
Repo https://github.com/CorentinJ/Real-Time-Voice-Cloning
Framework tf
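
The two-stage interface the abstract describes is easy to make concrete. Below is a hedged Python sketch with both stages stubbed out by random tensors; the function names, frames-per-character ratio, and hop size are illustrative assumptions, and only the mel-spectrogram hand-off between the stages mirrors the paper.

```python
import torch

def text_to_mel(char_ids: torch.Tensor, n_mels: int = 80) -> torch.Tensor:
    # stand-in for the recurrent seq2seq feature prediction network
    n_frames = char_ids.numel() * 5            # rough frames-per-character guess
    return torch.randn(n_mels, n_frames)

def vocoder(mel: torch.Tensor, hop: int = 256) -> torch.Tensor:
    # stand-in for the modified WaveNet vocoder conditioned on mel frames
    return torch.randn(mel.shape[1] * hop)

chars = torch.tensor([ord(c) for c in "hello world"])
wav = vocoder(text_to_mel(chars))              # ~14k samples of (random) audio
```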