Paper Group AWR 13
Conditional Time Series Forecasting with Convolutional Neural Networks. Learning to Segment Every Thing. Rasa: Open Source Language Understanding and Dialogue Management. Joint 3D Proposal Generation and Object Detection from View Aggregation. Contaminated speech training methods for robust DNN-HMM distant speech recognition. MUTAN: Multimodal Tuck …
Conditional Time Series Forecasting with Convolutional Neural Networks
Title | Conditional Time Series Forecasting with Convolutional Neural Networks |
Authors | Anastasia Borovykh, Sander Bohte, Cornelis W. Oosterlee |
Abstract | We present a method for conditional time series forecasting based on an adaptation of the recent deep convolutional WaveNet architecture. The proposed network contains stacks of dilated convolutions that allow it to access a broad range of history when forecasting, together with a ReLU activation function; conditioning is performed by applying multiple convolutional filters in parallel to separate time series, which allows for fast processing of the data and exploitation of the correlation structure between the multivariate time series. We test and analyze the performance of the convolutional network both unconditionally and conditionally for financial time series forecasting using the S&P500, the volatility index, the CBOE interest rate and several exchange rates, and extensively compare it to the performance of the well-known autoregressive model and a long short-term memory network. We show that a convolutional network is well suited for regression-type problems and is able to effectively learn dependencies within and between the series without the need for long historical time series, that it is a time-efficient and easy-to-implement alternative to recurrent-type networks, and that it tends to outperform linear and recurrent models. |
Tasks | Time Series, Time Series Forecasting |
Published | 2017-03-14 |
URL | http://arxiv.org/abs/1703.04691v5 |
http://arxiv.org/pdf/1703.04691v5.pdf | |
PWC | https://paperswithcode.com/paper/conditional-time-series-forecasting-with |
Repo | https://github.com/litanli/wavenet-time-series-forecasting |
Framework | pytorch |
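The core architectural ingredient described in the abstract is a stack of dilated causal convolutions whose receptive field grows exponentially with depth. Below is a minimal sketch of that ingredient, assuming PyTorch; it is not the authors' implementation, and the channel count, depth, kernel size and toy input are illustrative placeholders (the conditioning branch is omitted).

```python
# Minimal sketch of a WaveNet-style dilated causal stack; sizes are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalStack(nn.Module):
    def __init__(self, channels=16, layers=4, kernel_size=2):
        super().__init__()
        # Dilation doubles per layer, so the receptive field grows exponentially.
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size, dilation=2 ** i)
            for i in range(layers)
        )
        self.kernel_size = kernel_size

    def forward(self, x):                         # x: (batch, channels, time)
        for i, conv in enumerate(self.convs):
            pad = (self.kernel_size - 1) * (2 ** i)
            x = F.relu(conv(F.pad(x, (pad, 0))))  # left-padding keeps the convolution causal
        return x

series = torch.randn(8, 16, 128)                  # toy batch of multivariate series
print(DilatedCausalStack()(series).shape)         # torch.Size([8, 16, 128])
```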
Learning to Segment Every Thing
Title | Learning to Segment Every Thing |
Authors | Ronghang Hu, Piotr Dollár, Kaiming He, Trevor Darrell, Ross Girshick |
Abstract | Most methods for object instance segmentation require all training examples to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance segmentation models to ~100 well-annotated classes. The goal of this paper is to propose a new partially supervised training paradigm, together with a novel weight transfer function, that enables training instance segmentation models on a large set of categories all of which have box annotations, but only a small fraction of which have mask annotations. These contributions allow us to train Mask R-CNN to detect and segment 3000 visual concepts using box annotations from the Visual Genome dataset and mask annotations from the 80 classes in the COCO dataset. We evaluate our approach in a controlled study on the COCO dataset. This work is a first step towards instance segmentation models that have broad comprehension of the visual world. |
Tasks | Instance Segmentation, Semantic Segmentation |
Published | 2017-11-28 |
URL | http://arxiv.org/abs/1711.10370v2 |
http://arxiv.org/pdf/1711.10370v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-segment-every-thing |
Repo | https://github.com/facebookresearch/detectron |
Framework | pytorch |
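The weight transfer function mentioned in the abstract maps a category's box-head (detection) weights to its mask-head weights, so categories that have only box annotations still receive a mask predictor. A hedged sketch of that mapping follows, with made-up dimensions and a generic MLP standing in for the learned transfer function; it is not the paper's exact parameterization.

```python
# Rough sketch, assuming PyTorch; dimensions and the MLP form are assumptions.
import torch
import torch.nn as nn

class WeightTransfer(nn.Module):
    def __init__(self, box_dim=1024, mask_dim=256, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(box_dim, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, mask_dim),
        )

    def forward(self, w_box):          # (num_classes, box_dim) detection weights
        return self.mlp(w_box)         # (num_classes, mask_dim) predicted mask weights

w_det = torch.randn(3000, 1024)        # e.g. per-class box-head weight vectors
print(WeightTransfer()(w_det).shape)   # torch.Size([3000, 256])
```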
Rasa: Open Source Language Understanding and Dialogue Management
Title | Rasa: Open Source Language Understanding and Dialogue Management |
Authors | Tom Bocklisch, Joey Faulkner, Nick Pawlowski, Alan Nichol |
Abstract | We introduce a pair of tools, Rasa NLU and Rasa Core, which are open-source Python libraries for building conversational software. Their purpose is to make machine learning-based dialogue management and language understanding accessible to non-specialist software developers. In terms of design philosophy, we aim for ease of use and for bootstrapping from minimal (or no) initial training data. Both packages are extensively documented and ship with a comprehensive suite of tests. The code is available at https://github.com/RasaHQ/ |
Tasks | Dialogue Management |
Published | 2017-12-14 |
URL | http://arxiv.org/abs/1712.05181v2 |
http://arxiv.org/pdf/1712.05181v2.pdf | |
PWC | https://paperswithcode.com/paper/rasa-open-source-language-understanding-and |
Repo | https://github.com/RasaHQ/rasa |
Framework | none |
Joint 3D Proposal Generation and Object Detection from View Aggregation
Title | Joint 3D Proposal Generation and Object Detection from View Aggregation |
Authors | Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, Steven Waslander |
Abstract | We present AVOD, an Aggregate View Object Detection network for autonomous driving scenarios. The proposed neural network architecture uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network. The proposed RPN uses a novel architecture capable of performing multimodal feature fusion on high resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes. Using these proposals, the second stage detection network performs accurate oriented 3D bounding box regression and category classification to predict the extents, orientation, and classification of objects in 3D space. Our proposed architecture is shown to produce state-of-the-art results on the KITTI 3D object detection benchmark while running in real time with a low memory footprint, making it a suitable candidate for deployment on autonomous vehicles. Code is available at: https://github.com/kujason/avod |
Tasks | 3D Object Detection, Autonomous Driving, Autonomous Vehicles, Object Detection |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02294v4 |
http://arxiv.org/pdf/1712.02294v4.pdf | |
PWC | https://paperswithcode.com/paper/joint-3d-proposal-generation-and-object |
Repo | https://github.com/kujason/avod |
Framework | tf |
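One step the abstract highlights is fusing per-anchor features from the LIDAR bird's-eye-view map and the image feature map at high resolution. The toy sketch below illustrates such a fusion step (crops resized to a common grid and averaged element-wise); it assumes PyTorch rather than the authors' TensorFlow code, and the crop sizes, channel count and mean fusion are assumptions.

```python
# Illustrative fusion of two per-anchor feature crops; not the AVOD implementation.
import torch
import torch.nn.functional as F

def fuse_anchor_features(bev_crop, img_crop, out_size=(3, 3)):
    bev = F.adaptive_avg_pool2d(bev_crop, out_size)   # (C, 3, 3) resized BEV crop
    img = F.adaptive_avg_pool2d(img_crop, out_size)   # (C, 3, 3) resized image crop
    return 0.5 * (bev + img)                          # element-wise mean fusion

fused = fuse_anchor_features(torch.randn(32, 7, 11), torch.randn(32, 14, 9))
print(fused.shape)   # torch.Size([32, 3, 3])
```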
Contaminated speech training methods for robust DNN-HMM distant speech recognition
Title | Contaminated speech training methods for robust DNN-HMM distant speech recognition |
Authors | Mirco Ravanelli, Maurizio Omologo |
Abstract | Despite the significant progress made in recent years, state-of-the-art speech recognition technologies provide satisfactory performance only in the close-talking condition. Robustness of distant speech recognition in adverse acoustic conditions, on the other hand, remains a crucial open issue for future applications of human-machine interaction. To this end, several advances in speech enhancement, acoustic scene analysis and acoustic modeling have recently contributed to improving the state of the art in the field. One of the most effective approaches to deriving robust acoustic models is based on using contaminated speech, which has proved helpful in reducing the acoustic mismatch between training and testing conditions. In this paper, we revisit this classical approach in the context of modern DNN-HMM systems and propose the adoption of three methods, namely asymmetric context windowing, close-talk based supervision, and close-talk based pre-training. The experimental results, obtained using both real and simulated data, show a significant advantage in using these three methods, overall providing a 15% error rate reduction compared to the baseline systems. The same trend in performance is confirmed whether a high-quality training set of small size or a large one is used. |
Tasks | Distant Speech Recognition, Speech Enhancement, Speech Recognition |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03538v1 |
http://arxiv.org/pdf/1710.03538v1.pdf | |
PWC | https://paperswithcode.com/paper/contaminated-speech-training-methods-for |
Repo | https://github.com/mravanelli/pySpeechRev |
Framework | none |
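Of the three proposed methods, asymmetric context windowing is the easiest to illustrate: each frame is concatenated with more past than future context, since reverberation smears energy forward in time. The NumPy sketch below shows the idea only; the left/right window sizes are assumptions, not the paper's configuration.

```python
# Toy sketch of asymmetric context windowing over acoustic feature frames.
import numpy as np

def asymmetric_context(feats, left=10, right=2):
    T, _ = feats.shape
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    # Concatenate `left` past frames, the current frame and `right` future frames.
    return np.stack([padded[t:t + left + right + 1].ravel() for t in range(T)])

frames = np.random.randn(100, 40)          # 100 frames of 40-dim acoustic features
print(asymmetric_context(frames).shape)    # (100, 520) = (100, 40 * 13)
```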
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
Title | MUTAN: Multimodal Tucker Fusion for Visual Question Answering |
Authors | Hedi Ben-younes, Rémi Cadene, Matthieu Cord, Nicolas Thome |
Abstract | Bilinear models provide an appealing framework for mixing and merging information in Visual Question Answering (VQA) tasks. They help to learn high level associations between question meaning and visual concepts in the image, but they suffer from huge dimensionality issues. We introduce MUTAN, a multimodal tensor-based Tucker decomposition to efficiently parametrize bilinear interactions between visual and textual representations. In addition to the Tucker framework, we design a low-rank matrix-based decomposition to explicitly constrain the interaction rank. With MUTAN, we control the complexity of the merging scheme while keeping interpretable fusion relations. We show how our MUTAN model generalizes some of the latest VQA architectures, providing state-of-the-art results. |
Tasks | Visual Question Answering |
Published | 2017-05-18 |
URL | http://arxiv.org/abs/1705.06676v1 |
http://arxiv.org/pdf/1705.06676v1.pdf | |
PWC | https://paperswithcode.com/paper/mutan-multimodal-tucker-fusion-for-visual |
Repo | https://github.com/Cadene/vqa.pytorch |
Framework | pytorch |
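The central idea, bilinear fusion through a Tucker decomposition, can be pictured as projecting the question and image features and combining them through a small core tensor. The PyTorch sketch below is hedged: dimensions, activations and the output classifier are placeholders, and the additional low-rank constraint on the core described in the abstract is omitted.

```python
# Illustrative Tucker-style fusion; not the authors' implementation.
import torch
import torch.nn as nn

class TuckerFusion(nn.Module):
    def __init__(self, dq=2400, dv=2048, tq=160, tv=160, to=360, n_answers=2000):
        super().__init__()
        self.Wq, self.Wv = nn.Linear(dq, tq), nn.Linear(dv, tv)
        self.core = nn.Parameter(torch.randn(tq, tv, to) * 0.01)  # Tucker core tensor
        self.out = nn.Linear(to, n_answers)

    def forward(self, q, v):
        q, v = torch.tanh(self.Wq(q)), torch.tanh(self.Wv(v))     # projected factors
        z = torch.einsum("bq,bv,qvo->bo", q, v, self.core)        # bilinear interaction
        return self.out(z)

logits = TuckerFusion()(torch.randn(4, 2400), torch.randn(4, 2048))
print(logits.shape)   # torch.Size([4, 2000])
```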
The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments
Title | The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments |
Authors | Mirco Ravanelli, Maurizio Omologo |
Abstract | This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus is composed of both real and simulated material, and it includes 12 US and 12 UK English native speakers. Each speaker uttered different sets of phonetically-rich sentences, newspaper articles, conversational speech, keywords, and commands. From this material, a large set of 1-minute sequences was generated, which also includes typical domestic background noise as well as inter/intra-room reverberation effects. Dev and test sets were derived, which represent valuable material for various studies on multi-microphone speech processing and distant-speech recognition. Various tasks and corresponding Kaldi recipes have already been developed. The paper reports a first set of baseline results, obtained using different techniques including Deep Neural Networks (DNN), that are in line with the international state of the art. |
Tasks | Distant Speech Recognition, Speech Recognition |
Published | 2017-10-06 |
URL | http://arxiv.org/abs/1710.02560v1 |
http://arxiv.org/pdf/1710.02560v1.pdf | |
PWC | https://paperswithcode.com/paper/the-dirha-english-corpus-and-related-tasks |
Repo | https://github.com/mravanelli/pySpeechRev |
Framework | none |
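Simulated material of the kind described above is typically produced by convolving clean close-talk recordings with measured room impulse responses and adding background noise at a chosen SNR. The NumPy sketch below is a generic illustration of that contamination step, not the corpus generation pipeline; the impulse response, noise and SNR are synthetic stand-ins.

```python
# Generic speech-contamination illustration (reverberation + additive noise).
import numpy as np

def contaminate(clean, rir, noise, snr_db=10.0):
    reverberated = np.convolve(clean, rir)[: len(clean)]   # simulate room reverberation
    # Scale the noise so the reverberated-speech-to-noise ratio equals snr_db.
    gain = np.sqrt(np.sum(reverberated ** 2) /
                   (np.sum(noise ** 2) * 10 ** (snr_db / 10)))
    return reverberated + gain * noise[: len(reverberated)]

clean = np.random.randn(16000)                                   # 1 s "speech" at 16 kHz (toy)
rir = np.exp(-np.linspace(0, 8, 4000)) * np.random.randn(4000)   # toy impulse response
print(contaminate(clean, rir, np.random.randn(16000)).shape)     # (16000,)
```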
Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition
Title | Residual LSTM: Design of a Deep Recurrent Architecture for Distant Speech Recognition |
Authors | Jaeyoung Kim, Mostafa El-Khamy, Jungwon Lee |
Abstract | In this paper, a novel architecture for a deep recurrent neural network, residual LSTM, is introduced. A plain LSTM has an internal memory cell that can learn long-term dependencies of sequential data. It also provides a temporal shortcut path to avoid vanishing or exploding gradients in the temporal domain. The residual LSTM provides an additional spatial shortcut path from lower layers for efficient training of deep networks with multiple LSTM layers. Compared with the previous work, highway LSTM, residual LSTM separates the spatial shortcut path from the temporal one by using output layers, which can help to avoid a conflict between spatial- and temporal-domain gradient flows. Furthermore, residual LSTM reuses the output projection matrix and the output gate of the LSTM to control the spatial information flow instead of adding gate networks, which reduces the number of network parameters by more than 10%. An experiment on distant speech recognition with the AMI SDM corpus shows that 10-layer plain and highway LSTM networks exhibit 13.7% and 6.2% increases in WER over 3-layer baselines, respectively. In contrast, 10-layer residual LSTM networks provide the lowest WER, 41.0%, which corresponds to 3.3% and 2.8% WER reductions over the plain and highway LSTM networks, respectively. |
Tasks | Distant Speech Recognition, Speech Recognition |
Published | 2017-01-10 |
URL | http://arxiv.org/abs/1701.03360v3 |
http://arxiv.org/pdf/1701.03360v3.pdf | |
PWC | https://paperswithcode.com/paper/residual-lstm-design-of-a-deep-recurrent |
Repo | https://github.com/kdgutier/residual_lstm/blob/master/residual_lstm.py |
Framework | pytorch |
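The key design point, a spatial shortcut kept separate from the LSTM's temporal recurrence and routed through the output projection, can be sketched roughly as below. PyTorch is assumed; the paper's reuse of the output gate is simplified away, and the layer sizes are illustrative.

```python
# Rough sketch of a residual LSTM layer; not the paper's exact formulation.
import torch
import torch.nn as nn

class ResidualLSTMLayer(nn.Module):
    def __init__(self, dim=256, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, dim)   # output projection back to the input size

    def forward(self, x):                    # x: (batch, time, dim)
        h, _ = self.lstm(x)                  # temporal path handled inside the LSTM
        return self.proj(h) + x              # spatial shortcut added after the projection

out = ResidualLSTMLayer()(torch.randn(2, 50, 256))
print(out.shape)   # torch.Size([2, 50, 256])
```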
VGGFace2: A dataset for recognising faces across pose and age
Title | VGGFace2: A dataset for recognising faces across pose and age |
Authors | Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, Andrew Zisserman |
Abstract | In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages to ensure a high accuracy for the images of each identity. To assess face recognition performance using the new dataset, we train ResNet-50 (with and without Squeeze-and-Excitation blocks) Convolutional Neural Networks on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 leads to improved recognition performance over pose and age. Finally, using the models trained on these datasets, we demonstrate state-of-the-art performance on all the IARPA Janus face recognition benchmarks, e.g. IJB-A, IJB-B and IJB-C, exceeding the previous state-of-the-art by a large margin. Datasets and models are publicly available. |
Tasks | Face Recognition, Image Retrieval |
Published | 2017-10-23 |
URL | http://arxiv.org/abs/1710.08092v2 |
http://arxiv.org/pdf/1710.08092v2.pdf | |
PWC | https://paperswithcode.com/paper/vggface2-a-dataset-for-recognising-faces |
Repo | https://github.com/chenggongliang/arcface |
Framework | mxnet |
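The evaluation described above trains ResNet-50 classifiers over the dataset's 9131 identities. A minimal sketch of that kind of setup is shown below using torchvision for illustration; the Squeeze-and-Excitation variant, the data pipeline and the actual training loop are omitted, and this is not the authors' pipeline.

```python
# Minimal identity-classification sketch, assuming torchvision is available.
import torch
import torch.nn as nn
import torchvision.models as models

net = models.resnet50(num_classes=9131)        # one output per VGGFace2 identity
criterion = nn.CrossEntropyLoss()

images = torch.randn(2, 3, 224, 224)           # toy batch of face crops
labels = torch.tensor([0, 42])                 # toy identity labels
print(criterion(net(images), labels).item())
```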
Multilabel Classification with R Package mlr
Title | Multilabel Classification with R Package mlr |
Authors | Philipp Probst, Quay Au, Giuseppe Casalicchio, Clemens Stachl, Bernd Bischl |
Abstract | We implemented several multilabel classification algorithms in the machine learning package mlr. The implemented methods are binary relevance, classifier chains, nested stacking, dependent binary relevance and stacking, which can be used with any base learner that is accessible in mlr. Moreover, there is access to the multilabel classification versions of randomForestSRC and rFerns. All these methods can easily be compared using the multilabel performance measures and resampling methods implemented in the standardized mlr framework. In a benchmark experiment with several multilabel datasets, the performance of the different methods is evaluated. |
Tasks | |
Published | 2017-03-27 |
URL | http://arxiv.org/abs/1703.08991v2 |
http://arxiv.org/pdf/1703.08991v2.pdf | |
PWC | https://paperswithcode.com/paper/multilabel-classification-with-r-package-mlr |
Repo | https://github.com/mlr-org/mlr |
Framework | none |
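The listed methods are package-independent; for readers outside R, here is a hedged sketch of the simplest one, binary relevance (one independent binary classifier per label), using scikit-learn rather than mlr. The data is random and purely illustrative.

```python
# Binary relevance sketch with scikit-learn; mlr's R interface differs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = (rng.random(size=(200, 3)) > 0.5).astype(int)    # three binary labels

# One LogisticRegression is fitted per label column, independently of the others.
br = MultiOutputClassifier(LogisticRegression()).fit(X, Y)
print(br.predict(X[:5]).shape)                       # (5, 3)
```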
Performance Evaluation of Channel Decoding With Deep Neural Networks
Title | Performance Evaluation of Channel Decoding With Deep Neural Networks |
Authors | Wei Lyu, Zhaoyang Zhang, Chunxu Jiao, Kangjian Qin, Huazi Zhang |
Abstract | With the demand for high data rates and low latency in fifth generation (5G) systems, the deep neural network decoder (NND) has become a promising candidate due to its capability for one-shot decoding and parallel computing. In this paper, three types of NND, i.e., multi-layer perceptron (MLP), convolutional neural network (CNN) and recurrent neural network (RNN), are proposed with the same parameter magnitude. The performance of these deep neural networks is evaluated through extensive simulation. Numerical results show that the RNN has the best decoding performance, yet at the price of the highest computational overhead. Moreover, we find that there exists a saturation length for each type of neural network, which is caused by their restricted learning abilities. |
Tasks | |
Published | 2017-11-01 |
URL | http://arxiv.org/abs/1711.00727v2 |
http://arxiv.org/pdf/1711.00727v2.pdf | |
PWC | https://paperswithcode.com/paper/performance-evaluation-of-channel-decoding |
Repo | https://github.com/levylv/deep-neural-network-decoder |
Framework | tf |
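The MLP variant of a neural network decoder can be pictured as a feed-forward map from a block of noisy channel observations to per-bit estimates of the information word, produced in one shot. The toy PyTorch sketch below only illustrates that shape of computation; the block lengths and layer sizes are placeholders, and no real channel code or training is included.

```python
# Illustrative one-shot MLP decoder skeleton; not the paper's setup.
import torch
import torch.nn as nn

N, K = 16, 8                                   # toy codeword / message lengths
mlp_decoder = nn.Sequential(
    nn.Linear(N, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, K), nn.Sigmoid(),            # per-bit posterior estimates
)

received = torch.randn(32, N)                  # batch of noisy received words
print(mlp_decoder(received).shape)             # torch.Size([32, 8])
```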
A Deep Reinforced Model for Abstractive Summarization
Title | A Deep Reinforced Model for Abstractive Summarization |
Authors | Romain Paulus, Caiming Xiong, Richard Socher |
Abstract | Attentional, RNN-based encoder-decoder models for abstractive summarization have achieved good performance on short input and output sequences. For longer documents and summaries however these models often include repetitive and incoherent phrases. We introduce a neural network model with a novel intra-attention that attends over the input and continuously generated output separately, and a new training method that combines standard supervised word prediction and reinforcement learning (RL). Models trained only with supervised learning often exhibit “exposure bias” - they assume ground truth is provided at each step during training. However, when standard word prediction is combined with the global sequence prediction training of RL the resulting summaries become more readable. We evaluate this model on the CNN/Daily Mail and New York Times datasets. Our model obtains a 41.16 ROUGE-1 score on the CNN/Daily Mail dataset, an improvement over previous state-of-the-art models. Human evaluation also shows that our model produces higher quality summaries. |
Tasks | Abstractive Text Summarization |
Published | 2017-05-11 |
URL | http://arxiv.org/abs/1705.04304v3 |
http://arxiv.org/pdf/1705.04304v3.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-reinforced-model-for-abstractive |
Repo | https://github.com/JRC1995/Abstractive-Summarization |
Framework | tf |
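The training method described above combines a maximum-likelihood term with a policy-gradient term in which the sampled summary's reward (e.g. ROUGE) is baselined by the reward of the greedily decoded summary. The sketch below is schematic: the weighting and tensor shapes are illustrative assumptions, not the paper's exact values.

```python
# Schematic mixed ML + self-critical RL objective; not the authors' code.
import torch

def mixed_loss(log_probs_sampled, reward_sampled, reward_greedy, ml_loss, gamma=0.9):
    # REINFORCE-style term: negative advantage times log-likelihood of the sampled summary.
    rl_loss = -(reward_sampled - reward_greedy) * log_probs_sampled.sum()
    return gamma * rl_loss + (1.0 - gamma) * ml_loss

loss = mixed_loss(log_probs_sampled=torch.randn(20),      # per-token log-probs (toy)
                  reward_sampled=torch.tensor(0.35),      # e.g. ROUGE of the sampled summary
                  reward_greedy=torch.tensor(0.30),       # e.g. ROUGE of the greedy summary
                  ml_loss=torch.tensor(2.1))
print(loss.item())
```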
Generative Adversarial Network for Abstractive Text Summarization
Title | Generative Adversarial Network for Abstractive Text Summarization |
Authors | Linqing Liu, Yao Lu, Min Yang, Qiang Qu, Jia Zhu, Hongyan Li |
Abstract | In this paper, we propose an adversarial process for abstractive text summarization, in which we simultaneously train a generative model G and a discriminative model D. In particular, we build the generator G as an agent of reinforcement learning, which takes the raw text as input and predicts the abstractive summarization. We also build a discriminator which attempts to distinguish the generated summary from the ground truth summary. Extensive experiments demonstrate that our model achieves competitive ROUGE scores with the state-of-the-art methods on CNN/Daily Mail dataset. Qualitatively, we show that our model is able to generate more abstractive, readable and diverse summaries. |
Tasks | Abstractive Text Summarization, Text Summarization |
Published | 2017-11-26 |
URL | http://arxiv.org/abs/1711.09357v1 |
http://arxiv.org/pdf/1711.09357v1.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-network-for |
Repo | https://github.com/iwangjian/textsum-gan |
Framework | tf |
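The adversarial setup in the abstract alternates a discriminator update (ground-truth vs. generated summaries) with a generator update that treats the discriminator's score as a reinforcement-learning reward. The minimal sketch below is schematic only; the encodings and models are placeholders, not the paper's architecture.

```python
# Schematic adversarial summarization objective; summaries stand in as random embeddings.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
bce = nn.BCELoss()

real_emb, fake_emb = torch.randn(8, 128), torch.randn(8, 128)   # toy summary encodings

# Discriminator step: distinguish ground-truth from generated summaries.
d_loss = bce(disc(real_emb), torch.ones(8, 1)) + bce(disc(fake_emb), torch.zeros(8, 1))

# Generator step: REINFORCE with the discriminator's score as the reward.
log_prob_fake = torch.randn(8)                  # generator log-likelihoods (toy)
reward = disc(fake_emb).squeeze(1).detach()
g_loss = -(reward * log_prob_fake).mean()
print(d_loss.item(), g_loss.item())
```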
The E2E Dataset: New Challenges For End-to-End Generation
Title | The E2E Dataset: New Challenges For End-to-End Generation |
Authors | Jekaterina Novikova, Ondřej Dušek, Verena Rieser |
Abstract | This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data. |
Tasks | Data-to-Text Generation, Text Generation |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09254v2 |
http://arxiv.org/pdf/1706.09254v2.pdf | |
PWC | https://paperswithcode.com/paper/the-e2e-dataset-new-challenges-for-end-to-end |
Repo | https://github.com/UFAL-DSG/tgen |
Framework | tf |
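An E2E instance pairs a meaning representation of attribute[value] slots with one or more human-written references, and the system must decide which slots to verbalize. The example below is hypothetical and only illustrates the format, not an actual record from the dataset.

```python
# Hypothetical example of the attribute[value] meaning-representation format.
mr = ("name[The Example Arms], eatType[pub], food[English], "
      "area[riverside], familyFriendly[yes]")
reference = "The Example Arms is a family-friendly English pub by the riverside."
print(mr)
print("->", reference)
```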
Binary Bouncy Particle Sampler
Title | Binary Bouncy Particle Sampler |
Authors | Ari Pakman |
Abstract | The Bouncy Particle Sampler is a novel rejection-free non-reversible sampler for differentiable probability distributions over continuous variables. We generalize the algorithm to piecewise differentiable distributions and apply it to generic binary distributions using a piecewise differentiable augmentation. We illustrate the new algorithm in a binary Markov Random Field example, and compare it to binary Hamiltonian Monte Carlo. Our results suggest that binary BPS samplers are better suited to distributions that are easy to mix. |
Tasks | |
Published | 2017-11-02 |
URL | http://arxiv.org/abs/1711.00922v1 |
http://arxiv.org/pdf/1711.00922v1.pdf | |
PWC | https://paperswithcode.com/paper/binary-bouncy-particle-sampler |
Repo | https://github.com/aripakman/binary_bps |
Framework | none |