July 30, 2019

2739 words 13 mins read

Paper Group AWR 13

Conditional Time Series Forecasting with Convolutional Neural Networks. Learning to Segment Every Thing. Rasa: Open Source Language Understanding and Dialogue Management. Joint 3D Proposal Generation and Object Detection from View Aggregation. Contaminated speech training methods for robust DNN-HMM distant speech recognition. MUTAN: Multimodal Tuck …

Conditional Time Series Forecasting with Convolutional Neural Networks

Title Conditional Time Series Forecasting with Convolutional Neural Networks
Authors Anastasia Borovykh, Sander Bohte, Cornelis W. Oosterlee
Abstract We present a method for conditional time series forecasting based on an adaptation of the recent deep convolutional WaveNet architecture. The proposed network contains stacks of dilated convolutions, which allow it to access a broad range of history when forecasting, and a ReLU activation function; conditioning is performed by applying multiple convolutional filters in parallel to separate time series, which allows for fast processing of data and exploitation of the correlation structure between the multivariate time series. We test and analyze the performance of the convolutional network both unconditionally and conditionally for financial time series forecasting using the S&P500, the volatility index, the CBOE interest rate and several exchange rates, and extensively compare it to the performance of the well-known autoregressive model and a long short-term memory network. We show that a convolutional network is well-suited for regression-type problems and is able to effectively learn dependencies in and between the series without the need for long historical time series, that it is a time-efficient and easy-to-implement alternative to recurrent-type networks, and that it tends to outperform linear and recurrent models.
Tasks Time Series, Time Series Forecasting
Published 2017-03-14
URL http://arxiv.org/abs/1703.04691v5
PDF http://arxiv.org/pdf/1703.04691v5.pdf
PWC https://paperswithcode.com/paper/conditional-time-series-forecasting-with
Repo https://github.com/litanli/wavenet-time-series-forecasting
Framework pytorch
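
The core of the architecture above is a stack of dilated causal convolutions, with the conditioning series handled by a parallel convolutional filter. A minimal PyTorch sketch of such a stack follows; the layer widths, dilation schedule and the simple additive conditioning are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DilatedCausalStack(nn.Module):
    """WaveNet-style stack: each layer doubles the dilation so the
    receptive field grows exponentially with depth."""
    def __init__(self, in_channels=1, cond_channels=1, hidden=16, layers=4):
        super().__init__()
        self.input_proj = nn.Conv1d(in_channels, hidden, kernel_size=1)
        self.cond_proj = nn.Conv1d(cond_channels, hidden, kernel_size=1)
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, hidden, kernel_size=2, dilation=2 ** i)
            for i in range(layers)
        )
        self.head = nn.Conv1d(hidden, 1, kernel_size=1)

    def forward(self, x, cond):
        # x, cond: (batch, channels, time); conditioning is a simple add here
        h = self.input_proj(x) + self.cond_proj(cond)
        for conv in self.convs:
            pad = conv.dilation[0] * (conv.kernel_size[0] - 1)
            h = torch.relu(conv(nn.functional.pad(h, (pad, 0))))  # left-pad => causal
        return self.head(h)  # one forecast per time step

# toy usage: forecast a target series conditioned on a second series
x = torch.randn(8, 1, 128)
cond = torch.randn(8, 1, 128)
print(DilatedCausalStack()(x, cond).shape)  # torch.Size([8, 1, 128])
```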

Learning to Segment Every Thing

Title Learning to Segment Every Thing
Authors Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick
Abstract Most methods for object instance segmentation require all training examples to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance segmentation models to ~100 well-annotated classes. The goal of this paper is to propose a new partially supervised training paradigm, together with a novel weight transfer function, that enables training instance segmentation models on a large set of categories all of which have box annotations, but only a small fraction of which have mask annotations. These contributions allow us to train Mask R-CNN to detect and segment 3000 visual concepts using box annotations from the Visual Genome dataset and mask annotations from the 80 classes in the COCO dataset. We evaluate our approach in a controlled study on the COCO dataset. This work is a first step towards instance segmentation models that have broad comprehension of the visual world.
Tasks Instance Segmentation, Semantic Segmentation
Published 2017-11-28
URL http://arxiv.org/abs/1711.10370v2
PDF http://arxiv.org/pdf/1711.10370v2.pdf
PWC https://paperswithcode.com/paper/learning-to-segment-every-thing
Repo https://github.com/facebookresearch/detectron
Framework pytorch
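
The key idea is a weight transfer function that predicts a category's mask-head weights from its box-detection weights, so categories with only box annotations still obtain usable mask weights. The sketch below is a hedged illustration of such a transfer function; the two-layer MLP and the dimensions are assumptions for illustration, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class WeightTransfer(nn.Module):
    """Maps a per-class box-head weight vector to a per-class mask-head
    weight vector, so mask weights exist even for classes that only
    have bounding-box annotations."""
    def __init__(self, box_dim=1024, mask_dim=256, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(box_dim, hidden),
            nn.LeakyReLU(),
            nn.Linear(hidden, mask_dim),
        )

    def forward(self, box_weights):
        # box_weights: (num_classes, box_dim) -> (num_classes, mask_dim)
        return self.mlp(box_weights)

# toy usage: predict mask weights for 3000 classes from their box weights
transfer = WeightTransfer()
box_w = torch.randn(3000, 1024)
print(transfer(box_w).shape)  # torch.Size([3000, 256])
```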

Rasa: Open Source Language Understanding and Dialogue Management

Title Rasa: Open Source Language Understanding and Dialogue Management
Authors Tom Bocklisch, Joey Faulkner, Nick Pawlowski, Alan Nichol
Abstract We introduce a pair of tools, Rasa NLU and Rasa Core, which are open source python libraries for building conversational software. Their purpose is to make machine-learning based dialogue management and language understanding accessible to non-specialist software developers. In terms of design philosophy, we aim for ease of use, and bootstrapping from minimal (or no) initial training data. Both packages are extensively documented and ship with a comprehensive suite of tests. The code is available at https://github.com/RasaHQ/
Tasks Dialogue Management
Published 2017-12-14
URL http://arxiv.org/abs/1712.05181v2
PDF http://arxiv.org/pdf/1712.05181v2.pdf
PWC https://paperswithcode.com/paper/rasa-open-source-language-understanding-and
Repo https://github.com/RasaHQ/rasa
Framework none
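
Rasa NLU's job is intent classification plus entity extraction on user messages. The snippet below is not Rasa's API; it is a generic scikit-learn sketch of the kind of intent-classification pipeline such a library wraps, with made-up utterances and intents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# tiny made-up training set: (utterance, intent)
examples = [
    ("hi there", "greet"),
    ("hello", "greet"),
    ("i want to book a table", "book_table"),
    ("reserve a table for two", "book_table"),
    ("bye", "goodbye"),
    ("see you later", "goodbye"),
]
texts, intents = zip(*examples)

# bag-of-words features + a linear classifier: roughly the shape of a
# minimal NLU pipeline, without entity extraction
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, intents)

print(clf.predict(["can i reserve a table"]))  # e.g. ['book_table']
```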

Joint 3D Proposal Generation and Object Detection from View Aggregation

Title Joint 3D Proposal Generation and Object Detection from View Aggregation
Authors Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, Steven Waslander
Abstract We present AVOD, an Aggregate View Object Detection network for autonomous driving scenarios. The proposed neural network architecture uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network. The proposed RPN uses a novel architecture capable of performing multimodal feature fusion on high resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes. Using these proposals, the second stage detection network performs accurate oriented 3D bounding box regression and category classification to predict the extents, orientation, and classification of objects in 3D space. Our proposed architecture is shown to produce state of the art results on the KITTI 3D object detection benchmark while running in real time with a low memory footprint, making it a suitable candidate for deployment on autonomous vehicles. Code is at: https://github.com/kujason/avod
Tasks 3D Object Detection, Autonomous Driving, Autonomous Vehicles, Object Detection
Published 2017-12-06
URL http://arxiv.org/abs/1712.02294v4
PDF http://arxiv.org/pdf/1712.02294v4.pdf
PWC https://paperswithcode.com/paper/joint-3d-proposal-generation-and-object
Repo https://github.com/kujason/avod
Framework tf
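
The RPN described above fuses, per anchor, features cropped from a bird's-eye-view (LIDAR) feature map and from an image feature map. Below is a hedged sketch of that fusion step as an element-wise mean of two equal-sized crops followed by a small regressor; the crop size, layer widths and output parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusedProposalHead(nn.Module):
    """Fuses an image-view crop and a BEV-view crop for one anchor and
    regresses proposal parameters from the fused feature."""
    def __init__(self, channels=32, crop=3, num_outputs=7):  # 7 box params: an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * crop * crop, 256),
            nn.ReLU(),
            nn.Linear(256, num_outputs),
        )

    def forward(self, img_crop, bev_crop):
        # both crops: (num_anchors, channels, crop, crop)
        fused = 0.5 * (img_crop + bev_crop)   # element-wise mean fusion
        return self.fc(fused)

# toy usage: 100 anchors, each with a 32x3x3 crop from each view
img = torch.randn(100, 32, 3, 3)
bev = torch.randn(100, 32, 3, 3)
print(FusedProposalHead()(img, bev).shape)  # torch.Size([100, 7])
```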

Contaminated speech training methods for robust DNN-HMM distant speech recognition

Title Contaminated speech training methods for robust DNN-HMM distant speech recognition
Authors Mirco Ravanelli, Maurizio Omologo
Abstract Despite the significant progress made in recent years, state-of-the-art speech recognition technologies provide satisfactory performance only in the close-talking condition. Robustness of distant speech recognition in adverse acoustic conditions, on the other hand, remains a crucial open issue for future applications of human-machine interaction. To this end, several advances in speech enhancement, acoustic scene analysis and acoustic modeling have recently contributed to improving the state of the art in the field. One of the most effective approaches to deriving robust acoustic models is based on using contaminated speech, which has proved helpful in reducing the acoustic mismatch between training and testing conditions. In this paper, we revisit this classical approach in the context of modern DNN-HMM systems and propose the adoption of three methods, namely asymmetric context windowing, close-talk based supervision, and close-talk based pre-training. The experimental results, obtained using both real and simulated data, show a significant advantage in using these three methods, overall providing a 15% error rate reduction compared to the baseline systems. The same trend in performance is confirmed whether using a high-quality training set of small size or a large one.
Tasks Distant Speech Recognition, Speech Enhancement, Speech Recognition
Published 2017-10-10
URL http://arxiv.org/abs/1710.03538v1
PDF http://arxiv.org/pdf/1710.03538v1.pdf
PWC https://paperswithcode.com/paper/contaminated-speech-training-methods-for
Repo https://github.com/mravanelli/pySpeechRev
Framework none
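
One of the three methods, asymmetric context windowing, feeds the acoustic model more past frames than future frames, since in a reverberant room the future context is more corrupted by the reverberation tail. A small sketch of building such input windows follows; the 9-past/3-future split and the 40-dimensional features are illustrative assumptions.

```python
import numpy as np

def asymmetric_context(features, past=9, future=3):
    """Stack an asymmetric window of frames around each time step.

    features: (num_frames, feat_dim) acoustic features (e.g. MFCC/fMLLR).
    Returns:  (num_frames, (past + 1 + future) * feat_dim).
    """
    num_frames, feat_dim = features.shape
    # replicate edge frames so every time step has a full window
    padded = np.pad(features, ((past, future), (0, 0)), mode="edge")
    windows = [
        padded[t : t + past + 1 + future].reshape(-1)
        for t in range(num_frames)
    ]
    return np.stack(windows)

# toy usage: 100 frames of 40-dim features -> 100 x (13 * 40) network inputs
feats = np.random.randn(100, 40)
print(asymmetric_context(feats).shape)  # (100, 520)
```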

MUTAN: Multimodal Tucker Fusion for Visual Question Answering

Title MUTAN: Multimodal Tucker Fusion for Visual Question Answering
Authors Hedi Ben-younes, Rémi Cadene, Matthieu Cord, Nicolas Thome
Abstract Bilinear models provide an appealing framework for mixing and merging information in Visual Question Answering (VQA) tasks. They help to learn high-level associations between question meaning and visual concepts in the image, but they suffer from huge dimensionality issues. We introduce MUTAN, a multimodal tensor-based Tucker decomposition that efficiently parametrizes bilinear interactions between visual and textual representations. In addition to the Tucker framework, we design a low-rank matrix-based decomposition to explicitly constrain the interaction rank. With MUTAN, we control the complexity of the merging scheme while keeping interpretable fusion relations. We show how our MUTAN model generalizes some of the latest VQA architectures, providing state-of-the-art results.
Tasks Visual Question Answering
Published 2017-05-18
URL http://arxiv.org/abs/1705.06676v1
PDF http://arxiv.org/pdf/1705.06676v1.pdf
PWC https://paperswithcode.com/paper/mutan-multimodal-tucker-fusion-for-visual
Repo https://github.com/Cadene/vqa.pytorch
Framework pytorch
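
MUTAN fuses the question and image embeddings through a Tucker-decomposed bilinear interaction whose core is constrained to low rank. The sketch below shows the rank-constrained fusion as a sum of element-wise products of R parallel projections; the dimensions and rank are illustrative assumptions, and the full model's gating and normalisation details are omitted.

```python
import torch
import torch.nn as nn

class LowRankBilinearFusion(nn.Module):
    """Rank-R bilinear fusion: project question and image R times,
    multiply element-wise, and sum over the R rank-one slices."""
    def __init__(self, q_dim=2400, v_dim=2048, fused_dim=510, rank=10):
        super().__init__()
        self.rank = rank
        self.q_proj = nn.Linear(q_dim, fused_dim * rank)
        self.v_proj = nn.Linear(v_dim, fused_dim * rank)

    def forward(self, q, v):
        # q: (batch, q_dim), v: (batch, v_dim)
        batch = q.size(0)
        qs = self.q_proj(q).view(batch, self.rank, -1)
        vs = self.v_proj(v).view(batch, self.rank, -1)
        return (qs * vs).sum(dim=1)          # (batch, fused_dim)

# toy usage: fuse a question embedding with an image embedding
q = torch.randn(4, 2400)
v = torch.randn(4, 2048)
print(LowRankBilinearFusion()(q, v).shape)  # torch.Size([4, 510])
```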

The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments

Title The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments
Authors Mirco Ravanelli, Maurizio Omologo
Abstract This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus is composed of both real and simulated material, and it includes 12 US and 12 UK English native speakers. Each speaker uttered different sets of phonetically-rich sentences, newspaper articles, conversational speech, keywords, and commands. From this material, a large set of 1-minute sequences was generated, which also includes typical domestic background noise as well as inter/intra-room reverberation effects. Dev and test sets were derived, which represent a very precious material for different studies on multi-microphone speech processing and distant-speech recognition. Various tasks and corresponding Kaldi recipes have already been developed. The paper reports a first set of baseline results obtained using different techniques, including Deep Neural Networks (DNN), aligned with the state-of-the-art at international level.
Tasks Distant Speech Recognition, Speech Recognition
Published 2017-10-06
URL http://arxiv.org/abs/1710.02560v1
PDF http://arxiv.org/pdf/1710.02560v1.pdf
PWC https://paperswithcode.com/paper/the-dirha-english-corpus-and-related-tasks
Repo https://github.com/mravanelli/pySpeechRev
Framework none

Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition

Title Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition
Authors Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee
Abstract In this paper, a novel architecture for a deep recurrent neural network, the residual LSTM, is introduced. A plain LSTM has an internal memory cell that can learn long-term dependencies of sequential data. It also provides a temporal shortcut path to avoid vanishing or exploding gradients in the temporal domain. The residual LSTM provides an additional spatial shortcut path from lower layers for efficient training of deep networks with multiple LSTM layers. Compared with the previous work, highway LSTM, the residual LSTM separates the spatial shortcut path from the temporal one by using output layers, which can help to avoid a conflict between spatial and temporal-domain gradient flows. Furthermore, the residual LSTM reuses the output projection matrix and the output gate of the LSTM to control the spatial information flow instead of additional gate networks, which reduces network parameters by more than 10%. An experiment on distant speech recognition with the AMI SDM corpus shows that 10-layer plain and highway LSTM networks presented 13.7% and 6.2% increases in WER over 3-layer baselines, respectively. In contrast, 10-layer residual LSTM networks provided the lowest WER of 41.0%, which corresponds to 3.3% and 2.8% WER reductions over the plain and highway LSTM networks, respectively.
Tasks Distant Speech Recognition, Speech Recognition
Published 2017-01-10
URL http://arxiv.org/abs/1701.03360v3
PDF http://arxiv.org/pdf/1701.03360v3.pdf
PWC https://paperswithcode.com/paper/residual-lstm-design-of-a-deep-recurrent
Repo https://github.com/kdgutier/residual_lstm/blob/master/residual_lstm.py
Framework pytorch
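
The residual LSTM adds a spatial shortcut from the layer input to the layer output, while the LSTM cell itself keeps the temporal shortcut. A simplified sketch of one such layer is given below; it collapses the paper's reuse of the output gate and projection into a plain projected-output-plus-shortcut form, so the dimensions and structure are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualLSTMLayer(nn.Module):
    """One LSTM layer whose output is projected back to the input size and
    added to the layer input (spatial shortcut), easing deep stacking."""
    def __init__(self, dim=256, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, dim)   # projection back to the shortcut size

    def forward(self, x):
        # x: (batch, time, dim)
        h, _ = self.lstm(x)
        return self.proj(h) + x              # spatial residual connection

# toy usage: a 10-layer stack of residual LSTM layers
stack = nn.Sequential(*[ResidualLSTMLayer() for _ in range(10)])
frames = torch.randn(2, 50, 256)
print(stack(frames).shape)  # torch.Size([2, 50, 256])
```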

VGGFace2: A dataset for recognising faces across pose and age

Title VGGFace2: A dataset for recognising faces across pose and age
Authors Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, Andrew Zisserman
Abstract In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages to ensure a high accuracy for the images of each identity. To assess face recognition performance using the new dataset, we train ResNet-50 (with and without Squeeze-and-Excitation blocks) Convolutional Neural Networks on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 leads to improved recognition performance over pose and age. Finally, using the models trained on these datasets, we demonstrate state-of-the-art performance on all the IARPA Janus face recognition benchmarks, e.g. IJB-A, IJB-B and IJB-C, exceeding the previous state-of-the-art by a large margin. Datasets and models are publicly available.
Tasks Face Recognition, Image Retrieval
Published 2017-10-23
URL http://arxiv.org/abs/1710.08092v2
PDF http://arxiv.org/pdf/1710.08092v2.pdf
PWC https://paperswithcode.com/paper/vggface2-a-dataset-for-recognising-faces
Repo https://github.com/chenggongliang/arcface
Framework mxnet

Multilabel Classification with R Package mlr

Title Multilabel Classification with R Package mlr
Authors Philipp Probst, Quay Au, Giuseppe Casalicchio, Clemens Stachl, Bernd Bischl
Abstract We implemented several multilabel classification algorithms in the machine learning package mlr. The implemented methods are binary relevance, classifier chains, nested stacking, dependent binary relevance and stacking, which can be used with any base learner that is accessible in mlr. Moreover, there is access to the multilabel classification versions of randomForestSRC and rFerns. All these methods can be easily compared by different implemented multilabel performance measures and resampling methods in the standardized mlr framework. In a benchmark experiment with several multilabel datasets, the performance of the different methods is evaluated.
Tasks
Published 2017-03-27
URL http://arxiv.org/abs/1703.08991v2
PDF http://arxiv.org/pdf/1703.08991v2.pdf
PWC https://paperswithcode.com/paper/multilabel-classification-with-r-package-mlr
Repo https://github.com/mlr-org/mlr
Framework none
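
The paper's methods are implemented in R's mlr, but the two simplest ones, binary relevance (one independent binary classifier per label) and classifier chains (each classifier also sees the preceding labels' predictions), can be illustrated with scikit-learn. The sketch below is a Python analogue under those assumptions, not the mlr code itself.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

X, Y = make_multilabel_classification(n_samples=500, n_classes=5, n_labels=3, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

base = LogisticRegression(max_iter=1000)

# binary relevance: one independent classifier per label
br = MultiOutputClassifier(base).fit(X_tr, Y_tr)

# classifier chain: earlier label predictions are fed to later classifiers
cc = ClassifierChain(base, random_state=0).fit(X_tr, Y_tr)

print("binary relevance hamming loss:", hamming_loss(Y_te, br.predict(X_te)))
print("classifier chain hamming loss:", hamming_loss(Y_te, cc.predict(X_te)))
```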

Performance Evaluation of Channel Decoding With Deep Neural Networks

Title Performance Evaluation of Channel Decoding With Deep Neural Networks
Authors Wei Lyu, Zhaoyang Zhang, Chunxu Jiao, Kangjian Qin, Huazi Zhang
Abstract With the demand for high data rates and low latency in fifth generation (5G) systems, the deep neural network decoder (NND) has become a promising candidate due to its capability for one-shot decoding and parallel computing. In this paper, three types of NND, i.e., multi-layer perceptron (MLP), convolutional neural network (CNN) and recurrent neural network (RNN), are proposed with the same parameter magnitude. The performance of these deep neural networks is evaluated through extensive simulation. Numerical results show that the RNN has the best decoding performance, yet at the price of the highest computational overhead. Moreover, we find there exists a saturation length for each type of neural network, which is caused by their restricted learning abilities.
Tasks
Published 2017-11-01
URL http://arxiv.org/abs/1711.00727v2
PDF http://arxiv.org/pdf/1711.00727v2.pdf
PWC https://paperswithcode.com/paper/performance-evaluation-of-channel-decoding
Repo https://github.com/levylv/deep-neural-network-decoder
Framework tf
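
An NND is trained to map the noisy channel observation of a codeword back to the information bits in one shot. Below is a toy sketch of that idea with an MLP decoder on a random (16,8) linear block code over a BPSK/AWGN channel; the code, network size and noise level are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
k, n = 8, 16
G = torch.randint(0, 2, (k, n)).float()        # toy random generator matrix

def batch(size, noise_std=0.5):
    bits = torch.randint(0, 2, (size, k)).float()
    codewords = (bits @ G) % 2                 # encode
    tx = 1 - 2 * codewords                     # BPSK: 0 -> +1, 1 -> -1
    rx = tx + noise_std * torch.randn_like(tx) # AWGN channel
    return rx, bits

# one-shot decoder: noisy channel output -> estimated information bits
decoder = nn.Sequential(nn.Linear(n, 128), nn.ReLU(),
                        nn.Linear(128, 64), nn.ReLU(),
                        nn.Linear(64, k))
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    rx, bits = batch(256)
    loss = loss_fn(decoder(rx), bits)
    opt.zero_grad(); loss.backward(); opt.step()

rx, bits = batch(1000)
ber = ((decoder(rx) > 0).float() != bits).float().mean()
print(f"bit error rate: {ber:.3f}")
```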

A Deep Reinforced Model for Abstractive Summarization

Title A Deep Reinforced Model for Abstractive Summarization
Authors Romain Paulus, Caiming Xiong, Richard Socher
Abstract Attentional, RNN-based encoder-decoder models for abstractive summarization have achieved good performance on short input and output sequences. For longer documents and summaries however these models often include repetitive and incoherent phrases. We introduce a neural network model with a novel intra-attention that attends over the input and continuously generated output separately, and a new training method that combines standard supervised word prediction and reinforcement learning (RL). Models trained only with supervised learning often exhibit “exposure bias” - they assume ground truth is provided at each step during training. However, when standard word prediction is combined with the global sequence prediction training of RL the resulting summaries become more readable. We evaluate this model on the CNN/Daily Mail and New York Times datasets. Our model obtains a 41.16 ROUGE-1 score on the CNN/Daily Mail dataset, an improvement over previous state-of-the-art models. Human evaluation also shows that our model produces higher quality summaries.
Tasks Abstractive Text Summarization
Published 2017-05-11
URL http://arxiv.org/abs/1705.04304v3
PDF http://arxiv.org/pdf/1705.04304v3.pdf
PWC https://paperswithcode.com/paper/a-deep-reinforced-model-for-abstractive
Repo https://github.com/JRC1995/Abstractive-Summarization
Framework tf
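
The training objective mixes teacher-forced maximum likelihood with a self-critical policy-gradient term whose baseline is the reward of the greedy decode. A hedged sketch of how such a mixed loss could be assembled is shown below; the reward function and γ are placeholders, and the decoder itself is omitted.

```python
import torch

def mixed_loss(logp_ml, logp_sample, reward_sample, reward_greedy, gamma=0.98):
    """Mixed ML + self-critical RL objective (simplified sketch).

    logp_ml:       (batch,) sum of log-probs of the ground-truth summary (teacher forcing)
    logp_sample:   (batch,) sum of log-probs of a sampled summary
    reward_sample: (batch,) e.g. ROUGE of the sampled summary
    reward_greedy: (batch,) ROUGE of the greedy summary (self-critical baseline)
    """
    loss_ml = -logp_ml.mean()
    # push up the probability of samples that beat the greedy baseline
    loss_rl = ((reward_greedy - reward_sample) * logp_sample).mean()
    return gamma * loss_rl + (1.0 - gamma) * loss_ml

# toy usage with made-up numbers
batch = 4
print(mixed_loss(torch.randn(batch), torch.randn(batch),
                 torch.rand(batch), torch.rand(batch)))
```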

Generative Adversarial Network for Abstractive Text Summarization

Title Generative Adversarial Network for Abstractive Text Summarization
Authors Linqing Liu, Yao Lu, Min Yang, Qiang Qu, Jia Zhu, Hongyan Li
Abstract In this paper, we propose an adversarial process for abstractive text summarization, in which we simultaneously train a generative model G and a discriminative model D. In particular, we build the generator G as an agent of reinforcement learning, which takes the raw text as input and predicts the abstractive summarization. We also build a discriminator which attempts to distinguish the generated summary from the ground truth summary. Extensive experiments demonstrate that our model achieves competitive ROUGE scores with the state-of-the-art methods on CNN/Daily Mail dataset. Qualitatively, we show that our model is able to generate more abstractive, readable and diverse summaries.
Tasks Abstractive Text Summarization, Text Summarization
Published 2017-11-26
URL http://arxiv.org/abs/1711.09357v1
PDF http://arxiv.org/pdf/1711.09357v1.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-network-for
Repo https://github.com/iwangjian/textsum-gan
Framework tf

The E2E Dataset: New Challenges For End-to-End Generation

Title The E2E Dataset: New Challenges For End-to-End Generation
Authors Jekaterina Novikova, Ondřej Dušek, Verena Rieser
Abstract This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data.
Tasks Data-to-Text Generation, Text Generation
Published 2017-06-28
URL http://arxiv.org/abs/1706.09254v2
PDF http://arxiv.org/pdf/1706.09254v2.pdf
PWC https://paperswithcode.com/paper/the-e2e-dataset-new-challenges-for-end-to-end
Repo https://github.com/UFAL-DSG/tgen
Framework tf
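
Each E2E training instance pairs a meaning representation (MR), written as comma-separated attribute[value] slots, with a human-written reference text. The small sketch below parses that MR format; the example MR string is made up for illustration.

```python
import re

def parse_mr(mr: str) -> dict:
    """Parse an E2E-style meaning representation of the form
    'attribute[value], attribute[value], ...' into a dict."""
    return {attr.strip(): value for attr, value in re.findall(r"([^,\[]+)\[([^\]]*)\]", mr)}

# made-up MR in the dataset's attribute[value] style
mr = "name[Hypothetical Cafe], eatType[coffee shop], priceRange[cheap], area[riverside]"
print(parse_mr(mr))
# {'name': 'Hypothetical Cafe', 'eatType': 'coffee shop', 'priceRange': 'cheap', 'area': 'riverside'}
```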

Binary Bouncy Particle Sampler

Title Binary Bouncy Particle Sampler
Authors Ari Pakman
Abstract The Bouncy Particle Sampler is a novel rejection-free, non-reversible sampler for differentiable probability distributions over continuous variables. We generalize the algorithm to piecewise differentiable distributions and apply it to generic binary distributions using a piecewise differentiable augmentation. We illustrate the new algorithm on a binary Markov Random Field example and compare it to binary Hamiltonian Monte Carlo. Our results suggest that binary BPS samplers are better suited to easy-to-mix distributions.
Tasks
Published 2017-11-02
URL http://arxiv.org/abs/1711.00922v1
PDF http://arxiv.org/pdf/1711.00922v1.pdf
PWC https://paperswithcode.com/paper/binary-bouncy-particle-sampler
Repo https://github.com/aripakman/binary_bps
Framework none