July 30, 2019

2739 words 13 mins read

Paper Group AWR 13

Conditional Time Series Forecasting with Convolutional Neural Networks. Learning to Segment Every Thing. Rasa: Open Source Language Understanding and Dialogue Management. Joint 3D Proposal Generation and Object Detection from View Aggregation. Contaminated speech training methods for robust DNN-HMM distant speech recognition. MUTAN: Multimodal Tuck …

Conditional Time Series Forecasting with Convolutional Neural Networks

Title Conditional Time Series Forecasting with Convolutional Neural Networks
Authors Anastasia Borovykh, Sander Bohte, Cornelis W. Oosterlee
Abstract We present a method for conditional time series forecasting based on an adaptation of the recent deep convolutional WaveNet architecture. The proposed network contains stacks of dilated convolutions, which allow it to access a broad range of history when forecasting, and a ReLU activation function; conditioning is performed by applying multiple convolutional filters in parallel to separate time series, which allows for fast processing of data and exploitation of the correlation structure between the multivariate time series. We test and analyze the performance of the convolutional network both unconditionally and conditionally for financial time series forecasting using the S&P500, the volatility index, the CBOE interest rate and several exchange rates, and extensively compare it to the performance of the well-known autoregressive model and a long short-term memory network. We show that a convolutional network is well-suited for regression-type problems and is able to effectively learn dependencies in and between the series without the need for long historical time series, that it is a time-efficient and easy-to-implement alternative to recurrent-type networks, and that it tends to outperform linear and recurrent models.
Tasks Time Series, Time Series Forecasting
Published 2017-03-14
URL http://arxiv.org/abs/1703.04691v5
PDF http://arxiv.org/pdf/1703.04691v5.pdf
PWC https://paperswithcode.com/paper/conditional-time-series-forecasting-with
Repo https://github.com/litanli/wavenet-time-series-forecasting
Framework pytorch
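
The core of the architecture above is a stack of dilated causal convolutions, with the conditioning series handled by a parallel convolutional filter. A minimal PyTorch sketch of such a stack follows; the layer widths, dilation schedule and the simple additive conditioning are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class DilatedCausalStack(nn.Module):
    """WaveNet-style stack: each layer doubles the dilation so the
    receptive field grows exponentially with depth."""
    def __init__(self, in_channels=1, cond_channels=1, hidden=16, layers=4):
        super().__init__()
        self.input_proj = nn.Conv1d(in_channels, hidden, kernel_size=1)
        self.cond_proj = nn.Conv1d(cond_channels, hidden, kernel_size=1)
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, hidden, kernel_size=2, dilation=2 ** i)
            for i in range(layers)
        )
        self.head = nn.Conv1d(hidden, 1, kernel_size=1)

    def forward(self, x, cond):
        # x, cond: (batch, channels, time); conditioning is a simple add here
        h = self.input_proj(x) + self.cond_proj(cond)
        for conv in self.convs:
            pad = conv.dilation[0] * (conv.kernel_size[0] - 1)
            h = torch.relu(conv(nn.functional.pad(h, (pad, 0))))  # left-pad => causal
        return self.head(h)  # one forecast per time step

# toy usage: forecast a target series conditioned on a second series
x = torch.randn(8, 1, 128)
cond = torch.randn(8, 1, 128)
print(DilatedCausalStack()(x, cond).shape)  # torch.Size([8, 1, 128])
```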

Learning to Segment Every Thing

Title Learning to Segment Every Thing
Authors Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick
Abstract Most methods for object instance segmentation require all training examples to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance segmentation models to ~100 well-annotated classes. The goal of this paper is to propose a new partially supervised training paradigm, together with a novel weight transfer function, that enables training instance segmentation models on a large set of categories all of which have box annotations, but only a small fraction of which have mask annotations. These contributions allow us to train Mask R-CNN to detect and segment 3000 visual concepts using box annotations from the Visual Genome dataset and mask annotations from the 80 classes in the COCO dataset. We evaluate our approach in a controlled study on the COCO dataset. This work is a first step towards instance segmentation models that have broad comprehension of the visual world.
Tasks Instance Segmentation, Semantic Segmentation
Published 2017-11-28
URL http://arxiv.org/abs/1711.10370v2
PDF http://arxiv.org/pdf/1711.10370v2.pdf
PWC https://paperswithcode.com/paper/learning-to-segment-every-thing
Repo https://github.com/facebookresearch/detectron
Framework pytorch
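
The key idea is a weight transfer function that predicts a category's mask-head weights from its box-detection weights, so categories with only box annotations still obtain usable mask weights. The sketch below is a hedged illustration of such a transfer function; the two-layer MLP and the dimensions are assumptions for illustration, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class WeightTransfer(nn.Module):
    """Maps a per-class box-head weight vector to a per-class mask-head
    weight vector, so mask weights exist even for classes that only
    have bounding-box annotations."""
    def __init__(self, box_dim=1024, mask_dim=256, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(box_dim, hidden),
            nn.LeakyReLU(),
            nn.Linear(hidden, mask_dim),
        )

    def forward(self, box_weights):
        # box_weights: (num_classes, box_dim) -> (num_classes, mask_dim)
        return self.mlp(box_weights)

# toy usage: predict mask weights for 3000 classes from their box weights
transfer = WeightTransfer()
box_w = torch.randn(3000, 1024)
print(transfer(box_w).shape)  # torch.Size([3000, 256])
```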

Rasa: Open Source Language Understanding and Dialogue Management

Title Rasa: Open Source Language Understanding and Dialogue Management
Authors Tom Bocklisch, Joey Faulkner, Nick Pawlowski, Alan Nichol
Abstract We introduce a pair of tools, Rasa NLU and Rasa Core, which are open source python libraries for building conversational software. Their purpose is to make machine-learning based dialogue management and language understanding accessible to non-specialist software developers. In terms of design philosophy, we aim for ease of use, and bootstrapping from minimal (or no) initial training data. Both packages are extensively documented and ship with a comprehensive suite of tests. The code is available at https://github.com/RasaHQ/
Tasks Dialogue Management
Published 2017-12-14
URL http://arxiv.org/abs/1712.05181v2
PDF http://arxiv.org/pdf/1712.05181v2.pdf
PWC https://paperswithcode.com/paper/rasa-open-source-language-understanding-and
Repo https://github.com/RasaHQ/rasa
Framework none
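
Rasa NLU's job is intent classification plus entity extraction on user messages. The snippet below is not Rasa's API; it is a generic scikit-learn sketch of the kind of intent-classification pipeline such a library wraps, with made-up utterances and intents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# tiny made-up training set: (utterance, intent)
examples = [
    ("hi there", "greet"),
    ("hello", "greet"),
    ("i want to book a table", "book_table"),
    ("reserve a table for two", "book_table"),
    ("bye", "goodbye"),
    ("see you later", "goodbye"),
]
texts, intents = zip(*examples)

# bag-of-words features + a linear classifier: roughly the shape of a
# minimal NLU pipeline, without entity extraction
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, intents)

print(clf.predict(["can i reserve a table"]))  # e.g. ['book_table']
```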

Joint 3D Proposal Generation and Object Detection from View Aggregation

Title Joint 3D Proposal Generation and Object Detection from View Aggregation
Authors Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, Steven Waslander
Abstract We present AVOD, an Aggregate View Object Detection network for autonomous driving scenarios. The proposed neural network architecture uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network. The proposed RPN uses a novel architecture capable of performing multimodal feature fusion on high resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes. Using these proposals, the second stage detection network performs accurate oriented 3D bounding box regression and category classification to predict the extents, orientation, and classification of objects in 3D space. Our proposed architecture is shown to produce state of the art results on the KITTI 3D object detection benchmark while running in real time with a low memory footprint, making it a suitable candidate for deployment on autonomous vehicles. Code is at: https://github.com/kujason/avod
Tasks 3D Object Detection, Autonomous Driving, Autonomous Vehicles, Object Detection
Published 2017-12-06
URL http://arxiv.org/abs/1712.02294v4
PDF http://arxiv.org/pdf/1712.02294v4.pdf
PWC https://paperswithcode.com/paper/joint-3d-proposal-generation-and-object
Repo https://github.com/kujason/avod
Framework tf
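
The RPN described above fuses, per anchor, features cropped from a bird's-eye-view (LIDAR) feature map and from an image feature map. Below is a hedged sketch of that fusion step as an element-wise mean of two equal-sized crops followed by a small regressor; the crop size, layer widths and output parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FusedProposalHead(nn.Module):
    """Fuses an image-view crop and a BEV-view crop for one anchor and
    regresses proposal parameters from the fused feature."""
    def __init__(self, channels=32, crop=3, num_outputs=7):  # 7 box params: an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * crop * crop, 256),
            nn.ReLU(),
            nn.Linear(256, num_outputs),
        )

    def forward(self, img_crop, bev_crop):
        # both crops: (num_anchors, channels, crop, crop)
        fused = 0.5 * (img_crop + bev_crop)   # element-wise mean fusion
        return self.fc(fused)

# toy usage: 100 anchors, each with a 32x3x3 crop from each view
img = torch.randn(100, 32, 3, 3)
bev = torch.randn(100, 32, 3, 3)
print(FusedProposalHead()(img, bev).shape)  # torch.Size([100, 7])
```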

Contaminated speech training methods for robust DNN-HMM distant speech recognition

Title Contaminated speech training methods for robust DNN-HMM distant speech recognition
Authors Mirco Ravanelli, Maurizio Omologo
Abstract Despite the significant progress made in recent years, state-of-the-art speech recognition technologies provide satisfactory performance only in the close-talking condition. Robustness of distant speech recognition in adverse acoustic conditions, on the other hand, remains a crucial open issue for future applications of human-machine interaction. To this end, several advances in speech enhancement, acoustic scene analysis and acoustic modeling have recently contributed to improving the state of the art in the field. One of the most effective approaches to deriving robust acoustic models is based on using contaminated speech, which has proved helpful in reducing the acoustic mismatch between training and testing conditions. In this paper, we revisit this classical approach in the context of modern DNN-HMM systems and propose the adoption of three methods, namely asymmetric context windowing, close-talk based supervision, and close-talk based pre-training. The experimental results, obtained using both real and simulated data, show a significant advantage in using these three methods, overall providing a 15% error rate reduction compared to the baseline systems. The same trend in performance is confirmed whether using a high-quality training set of small size or a large one.
Tasks Distant Speech Recognition, Speech Enhancement, Speech Recognition
Published 2017-10-10
URL http://arxiv.org/abs/1710.03538v1
PDF http://arxiv.org/pdf/1710.03538v1.pdf
PWC https://paperswithcode.com/paper/contaminated-speech-training-methods-for
Repo https://github.com/mravanelli/pySpeechRev
Framework none
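
One of the three methods, asymmetric context windowing, feeds the acoustic model more past frames than future frames, since in a reverberant room the future context is more corrupted by the reverberation tail. A small sketch of building such input windows follows; the 9-past/3-future split and the 40-dimensional features are illustrative assumptions.

```python
import numpy as np

def asymmetric_context(features, past=9, future=3):
    """Stack an asymmetric window of frames around each time step.

    features: (num_frames, feat_dim) acoustic features (e.g. MFCC/fMLLR).
    Returns:  (num_frames, (past + 1 + future) * feat_dim).
    """
    num_frames, feat_dim = features.shape
    # replicate edge frames so every time step has a full window
    padded = np.pad(features, ((past, future), (0, 0)), mode="edge")
    windows = [
        padded[t : t + past + 1 + future].reshape(-1)
        for t in range(num_frames)
    ]
    return np.stack(windows)

# toy usage: 100 frames of 40-dim features -> 100 x (13 * 40) network inputs
feats = np.random.randn(100, 40)
print(asymmetric_context(feats).shape)  # (100, 520)
```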

MUTAN: Multimodal Tucker Fusion for Visual Question Answering

Title MUTAN: Multimodal Tucker Fusion for Visual Question Answering
Authors Hedi Ben-younes, Rémi Cadene, Matthieu Cord, Nicolas Thome
Abstract Bilinear models provide an appealing framework for mixing and merging information in Visual Question Answering (VQA) tasks. They help to learn high-level associations between question meaning and visual concepts in the image, but they suffer from huge dimensionality issues. We introduce MUTAN, a multimodal tensor-based Tucker decomposition that efficiently parametrizes bilinear interactions between visual and textual representations. In addition to the Tucker framework, we design a low-rank matrix-based decomposition to explicitly constrain the interaction rank. With MUTAN, we control the complexity of the merging scheme while keeping interpretable fusion relations. We show how our MUTAN model generalizes some of the latest VQA architectures, providing state-of-the-art results.
Tasks Visual Question Answering
Published 2017-05-18
URL http://arxiv.org/abs/1705.06676v1
PDF http://arxiv.org/pdf/1705.06676v1.pdf
PWC https://paperswithcode.com/paper/mutan-multimodal-tucker-fusion-for-visual
Repo https://github.com/Cadene/vqa.pytorch
Framework pytorch
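
MUTAN fuses the question and image embeddings through a Tucker-decomposed bilinear interaction whose core is constrained to low rank. The sketch below shows the rank-constrained fusion as a sum of element-wise products of R parallel projections; the dimensions and rank are illustrative assumptions, and the full model's gating and normalisation details are omitted.

```python
import torch
import torch.nn as nn

class LowRankBilinearFusion(nn.Module):
    """Rank-R bilinear fusion: project question and image R times,
    multiply element-wise, and sum over the R rank-one slices."""
    def __init__(self, q_dim=2400, v_dim=2048, fused_dim=510, rank=10):
        super().__init__()
        self.rank = rank
        self.q_proj = nn.Linear(q_dim, fused_dim * rank)
        self.v_proj = nn.Linear(v_dim, fused_dim * rank)

    def forward(self, q, v):
        # q: (batch, q_dim), v: (batch, v_dim)
        batch = q.size(0)
        qs = self.q_proj(q).view(batch, self.rank, -1)
        vs = self.v_proj(v).view(batch, self.rank, -1)
        return (qs * vs).sum(dim=1)          # (batch, fused_dim)

# toy usage: fuse a question embedding with an image embedding
q = torch.randn(4, 2400)
v = torch.randn(4, 2048)
print(LowRankBilinearFusion()(q, v).shape)  # torch.Size([4, 510])
```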

The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments

Title The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments
Authors Mirco Ravanelli, Maurizio Omologo
Abstract This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus is composed of both real and simulated material, and it includes 12 US and 12 UK English native speakers. Each speaker uttered different sets of phonetically-rich sentences, newspaper articles, conversational speech, keywords, and commands. From this material, a large set of 1-minute sequences was generated, which also includes typical domestic background noise as well as inter/intra-room reverberation effects. Dev and test sets were derived, which represent a very precious material for different studies on multi-microphone speech processing and distant-speech recognition. Various tasks and corresponding Kaldi recipes have already been developed. The paper reports a first set of baseline results obtained using different techniques, including Deep Neural Networks (DNN), aligned with the state-of-the-art at international level.
Tasks Distant Speech Recognition, Speech Recognition
Published 2017-10-06
URL http://arxiv.org/abs/1710.02560v1
PDF http://arxiv.org/pdf/1710.02560v1.pdf
PWC https://paperswithcode.com/paper/the-dirha-english-corpus-and-related-tasks
Repo https://github.com/mravanelli/pySpeechRev
Framework none

Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition

Title Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition
Authors Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee
Abstract In this paper, a novel architecture for a deep recurrent neural network, the residual LSTM, is introduced. A plain LSTM has an internal memory cell that can learn long-term dependencies of sequential data. It also provides a temporal shortcut path to avoid vanishing or exploding gradients in the temporal domain. The residual LSTM provides an additional spatial shortcut path from lower layers for efficient training of deep networks with multiple LSTM layers. Compared with the previous work, highway LSTM, the residual LSTM separates the spatial shortcut path from the temporal one by using output layers, which can help to avoid a conflict between spatial and temporal-domain gradient flows. Furthermore, the residual LSTM reuses the output projection matrix and the output gate of the LSTM to control the spatial information flow instead of additional gate networks, which reduces network parameters by more than 10%. An experiment on distant speech recognition with the AMI SDM corpus shows that 10-layer plain and highway LSTM networks presented 13.7% and 6.2% increases in WER over 3-layer baselines, respectively. In contrast, 10-layer residual LSTM networks provided the lowest WER of 41.0%, which corresponds to 3.3% and 2.8% WER reductions over the plain and highway LSTM networks, respectively.
Tasks Distant Speech Recognition, Speech Recognition
Published 2017-01-10
URL http://arxiv.org/abs/1701.03360v3
PDF http://arxiv.org/pdf/1701.03360v3.pdf
PWC https://paperswithcode.com/paper/residual-lstm-design-of-a-deep-recurrent
Repo https://github.com/kdgutier/residual_lstm/blob/master/residual_lstm.py
Framework pytorch
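
The residual LSTM adds a spatial shortcut from the layer input to the layer output, while the LSTM cell itself keeps the temporal shortcut. A simplified sketch of one such layer is given below; it collapses the paper's reuse of the output gate and projection into a plain projected-output-plus-shortcut form, so the dimensions and structure are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualLSTMLayer(nn.Module):
    """One LSTM layer whose output is projected back to the input size and
    added to the layer input (spatial shortcut), easing deep stacking."""
    def __init__(self, dim=256, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, dim)   # projection back to the shortcut size

    def forward(self, x):
        # x: (batch, time, dim)
        h, _ = self.lstm(x)
        return self.proj(h) + x              # spatial residual connection

# toy usage: a 10-layer stack of residual LSTM layers
stack = nn.Sequential(*[ResidualLSTMLayer() for _ in range(10)])
frames = torch.randn(2, 50, 256)
print(stack(frames).shape)  # torch.Size([2, 50, 256])
```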

VGGFace2: A dataset for recognising faces across pose and age

Title VGGFace2: A dataset for recognising faces across pose and age
Authors Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, Andrew Zisserman
Abstract In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages to ensure a high accuracy for the images of each identity. To assess face recognition performance using the new dataset, we train ResNet-50 (with and without Squeeze-and-Excitation blocks) Convolutional Neural Networks on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 leads to improved recognition performance over pose and age. Finally, using the models trained on these datasets, we demonstrate state-of-the-art performance on all the IARPA Janus face recognition benchmarks, e.g. IJB-A, IJB-B and IJB-C, exceeding the previous state-of-the-art by a large margin. Datasets and models are publicly available.
Tasks Face Recognition, Image Retrieval
Published 2017-10-23
URL http://arxiv.org/abs/1710.08092v2
PDF http://arxiv.org/pdf/1710.08092v2.pdf
PWC https://paperswithcode.com/paper/vggface2-a-dataset-for-recognising-faces
Repo https://github.com/chenggongliang/arcface
Framework mxnet

Multilabel Classification with R Package mlr

Title Multilabel Classification with R Package mlr
Authors Philipp Probst, Quay Au, Giuseppe Casalicchio, Clemens Stachl, Bernd Bischl
Abstract We implemented several multilabel classification algorithms in the machine learning package mlr. The implemented methods are binary relevance, classifier chains, nested stacking, dependent binary relevance and stacking, which can be used with any base learner that is accessible in mlr. Moreover, there is access to the multilabel classification versions of randomForestSRC and rFerns. All these methods can be easily compared by different implemented multilabel performance measures and resampling methods in the standardized mlr framework. In a benchmark experiment with several multilabel datasets, the performance of the different methods is evaluated.
Tasks
Published 2017-03-27
URL http://arxiv.org/abs/1703.08991v2
PDF http://arxiv.org/pdf/1703.08991v2.pdf
PWC https://paperswithcode.com/paper/multilabel-classification-with-r-package-mlr
Repo https://github.com/mlr-org/mlr
Framework none
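
The paper's methods are implemented in R's mlr, but the two simplest ones, binary relevance (one independent binary classifier per label) and classifier chains (each classifier also sees the preceding labels' predictions), can be illustrated with scikit-learn. The sketch below is a Python analogue under those assumptions, not the mlr code itself.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

X, Y = make_multilabel_classification(n_samples=500, n_classes=5, n_labels=3, random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

base = LogisticRegression(max_iter=1000)

# binary relevance: one independent classifier per label
br = MultiOutputClassifier(base).fit(X_tr, Y_tr)

# classifier chain: earlier label predictions are fed to later classifiers
cc = ClassifierChain(base, random_state=0).fit(X_tr, Y_tr)

print("binary relevance hamming loss:", hamming_loss(Y_te, br.predict(X_te)))
print("classifier chain hamming loss:", hamming_loss(Y_te, cc.predict(X_te)))
```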

Performance Evaluation of Channel Decoding With Deep Neural Networks

Title Performance Evaluation of Channel Decoding With Deep Neural Networks
Authors Wei Lyu, Zhaoyang Zhang, Chunxu Jiao, Kangjian Qin, Huazi Zhang
Abstract With the demand for high data rates and low latency in fifth generation (5G) systems, the deep neural network decoder (NND) has become a promising candidate due to its capability for one-shot decoding and parallel computing. In this paper, three types of NND, i.e., multi-layer perceptron (MLP), convolutional neural network (CNN) and recurrent neural network (RNN), are proposed with the same parameter magnitude. The performance of these deep neural networks is evaluated through extensive simulation. Numerical results show that the RNN has the best decoding performance, yet at the price of the highest computational overhead. Moreover, we find there exists a saturation length for each type of neural network, which is caused by their restricted learning abilities.
Tasks
Published 2017-11-01
URL http://arxiv.org/abs/1711.00727v2
PDF http://arxiv.org/pdf/1711.00727v2.pdf
PWC https://paperswithcode.com/paper/performance-evaluation-of-channel-decoding
Repo https://github.com/levylv/deep-neural-network-decoder
Framework tf
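
An NND is trained to map the noisy channel observation of a codeword back to the information bits in one shot. Below is a toy sketch of that idea with an MLP decoder on a random (16,8) linear block code over a BPSK/AWGN channel; the code, network size and noise level are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
k, n = 8, 16
G = torch.randint(0, 2, (k, n)).float()        # toy random generator matrix

def batch(size, noise_std=0.5):
    bits = torch.randint(0, 2, (size, k)).float()
    codewords = (bits @ G) % 2                 # encode
    tx = 1 - 2 * codewords                     # BPSK: 0 -> +1, 1 -> -1
    rx = tx + noise_std * torch.randn_like(tx) # AWGN channel
    return rx, bits

# one-shot decoder: noisy channel output -> estimated information bits
decoder = nn.Sequential(nn.Linear(n, 128), nn.ReLU(),
                        nn.Linear(128, 64), nn.ReLU(),
                        nn.Linear(64, k))
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    rx, bits = batch(256)
    loss = loss_fn(decoder(rx), bits)
    opt.zero_grad(); loss.backward(); opt.step()

rx, bits = batch(1000)
ber = ((decoder(rx) > 0).float() != bits).float().mean()
print(f"bit error rate: {ber:.3f}")
```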

A Deep Reinforced Model for Abstractive Summarization

Title A Deep Reinforced Model for Abstractive Summarization
Authors Romain Paulus, Caiming Xiong, Richard Socher
Abstract Attentional, RNN-based encoder-decoder models for abstractive summarization have achieved good performance on short input and output sequences. For longer documents and summaries however these models often include repetitive and incoherent phrases. We introduce a neural network model with a novel intra-attention that attends over the input and continuously generated output separately, and a new training method that combines standard supervised word prediction and reinforcement learning (RL). Models trained only with supervised learning often exhibit “exposure bias” - they assume ground truth is provided at each step during training. However, when standard word prediction is combined with the global sequence prediction training of RL the resulting summaries become more readable. We evaluate this model on the CNN/Daily Mail and New York Times datasets. Our model obtains a 41.16 ROUGE-1 score on the CNN/Daily Mail dataset, an improvement over previous state-of-the-art models. Human evaluation also shows that our model produces higher quality summaries.
Tasks Abstractive Text Summarization
Published 2017-05-11
URL http://arxiv.org/abs/1705.04304v3
PDF http://arxiv.org/pdf/1705.04304v3.pdf
PWC https://paperswithcode.com/paper/a-deep-reinforced-model-for-abstractive
Repo https://github.com/JRC1995/Abstractive-Summarization
Framework tf
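
The training objective mixes teacher-forced maximum likelihood with a self-critical policy-gradient term whose baseline is the reward of the greedy decode. A hedged sketch of how such a mixed loss could be assembled is shown below; the reward function and γ are placeholders, and the decoder itself is omitted.

```python
import torch

def mixed_loss(logp_ml, logp_sample, reward_sample, reward_greedy, gamma=0.98):
    """Mixed ML + self-critical RL objective (simplified sketch).

    logp_ml:       (batch,) sum of log-probs of the ground-truth summary (teacher forcing)
    logp_sample:   (batch,) sum of log-probs of a sampled summary
    reward_sample: (batch,) e.g. ROUGE of the sampled summary
    reward_greedy: (batch,) ROUGE of the greedy summary (self-critical baseline)
    """
    loss_ml = -logp_ml.mean()
    # push up the probability of samples that beat the greedy baseline
    loss_rl = ((reward_greedy - reward_sample) * logp_sample).mean()
    return gamma * loss_rl + (1.0 - gamma) * loss_ml

# toy usage with made-up numbers
batch = 4
print(mixed_loss(torch.randn(batch), torch.randn(batch),
                 torch.rand(batch), torch.rand(batch)))
```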

Generative Adversarial Network for Abstractive Text Summarization

Title Generative Adversarial Network for Abstractive Text Summarization
Authors Linqing Liu, Yao Lu, Min Yang, Qiang Qu, Jia Zhu, Hongyan Li
Abstract In this paper, we propose an adversarial process for abstractive text summarization, in which we simultaneously train a generative model G and a discriminative model D. In particular, we build the generator G as an agent of reinforcement learning, which takes the raw text as input and predicts the abstractive summarization. We also build a discriminator which attempts to distinguish the generated summary from the ground truth summary. Extensive experiments demonstrate that our model achieves competitive ROUGE scores with the state-of-the-art methods on CNN/Daily Mail dataset. Qualitatively, we show that our model is able to generate more abstractive, readable and diverse summaries.
Tasks Abstractive Text Summarization, Text Summarization
Published 2017-11-26
URL http://arxiv.org/abs/1711.09357v1
PDF http://arxiv.org/pdf/1711.09357v1.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-network-for
Repo https://github.com/iwangjian/textsum-gan
Framework tf

The E2E Dataset: New Challenges For End-to-End Generation

Title The E2E Dataset: New Challenges For End-to-End Generation
Authors Jekaterina Novikova, Ondřej Dušek, Verena Rieser
Abstract This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data.
Tasks Data-to-Text Generation, Text Generation
Published 2017-06-28
URL http://arxiv.org/abs/1706.09254v2
PDF http://arxiv.org/pdf/1706.09254v2.pdf
PWC https://paperswithcode.com/paper/the-e2e-dataset-new-challenges-for-end-to-end
Repo https://github.com/UFAL-DSG/tgen
Framework tf
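
Each E2E training instance pairs a meaning representation (MR), written as comma-separated attribute[value] slots, with a human-written reference text. The small sketch below parses that MR format; the example MR string is made up for illustration.

```python
import re

def parse_mr(mr: str) -> dict:
    """Parse an E2E-style meaning representation of the form
    'attribute[value], attribute[value], ...' into a dict."""
    return {attr.strip(): value for attr, value in re.findall(r"([^,\[]+)\[([^\]]*)\]", mr)}

# made-up MR in the dataset's attribute[value] style
mr = "name[Hypothetical Cafe], eatType[coffee shop], priceRange[cheap], area[riverside]"
print(parse_mr(mr))
# {'name': 'Hypothetical Cafe', 'eatType': 'coffee shop', 'priceRange': 'cheap', 'area': 'riverside'}
```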

Binary Bouncy Particle Sampler

Title Binary Bouncy Particle Sampler
Authors Ari Pakman
Abstract The Bouncy Particle Sampler is a novel rejection-free, non-reversible sampler for differentiable probability distributions over continuous variables. We generalize the algorithm to piecewise differentiable distributions and apply it to generic binary distributions using a piecewise differentiable augmentation. We illustrate the new algorithm on a binary Markov Random Field example and compare it to binary Hamiltonian Monte Carlo. Our results suggest that binary BPS samplers are better suited to easy-to-mix distributions.
Tasks
Published 2017-11-02
URL http://arxiv.org/abs/1711.00922v1
PDF http://arxiv.org/pdf/1711.00922v1.pdf
PWC https://paperswithcode.com/paper/binary-bouncy-particle-sampler
Repo https://github.com/aripakman/binary_bps
Framework none