October 21, 2019

Paper Group AWR 163

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

Title SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Authors Taku Kudo, John Richardson
Abstract This paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, including Neural Machine Translation. It provides open-source C++ and Python implementations for subword units. While existing subword segmentation tools assume that the input is pre-tokenized into word sequences, SentencePiece can train subword models directly from raw sentences, which allows us to make a purely end-to-end and language independent system. We perform a validation experiment of NMT on English-Japanese machine translation, and find that it is possible to achieve comparable accuracy to direct subword training from raw sentences. We also compare the performance of subword training and segmentation with various configurations. SentencePiece is available under the Apache 2 license at https://github.com/google/sentencepiece.
Tasks Machine Translation
Published 2018-08-19
URL http://arxiv.org/abs/1808.06226v1
PDF http://arxiv.org/pdf/1808.06226v1.pdf
PWC https://paperswithcode.com/paper/sentencepiece-a-simple-and-language-1
Repo https://github.com/google/sentencepiece
Framework tf
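The idea of segmenting raw sentences without pre-tokenization can be sketched in a few lines of Python. This is a toy illustration only: it relies on the SentencePiece convention of treating whitespace as an ordinary symbol (the meta symbol U+2581), while the vocabulary `toy_vocab` and the greedy longest-match segmentation are assumptions made for the example, not the BPE or unigram models the library actually trains.

```python
# Toy illustration only: the vocabulary and greedy longest-match segmentation
# are assumptions, not the BPE or unigram algorithms SentencePiece trains.
WS = "\u2581"  # meta symbol that stands in for a space


def encode(text, vocab):
    """Segment raw text into pieces without any pre-tokenization."""
    s = WS + text.replace(" ", WS)
    pieces, i = [], 0
    while i < len(s):
        # Take the longest vocabulary entry that matches at position i,
        # falling back to a single character when nothing matches.
        for j in range(len(s), i, -1):
            if s[i:j] in vocab or j == i + 1:
                pieces.append(s[i:j])
                i = j
                break
    return pieces


def decode(pieces):
    """Detokenize by concatenating pieces and restoring spaces."""
    return "".join(pieces).replace(WS, " ")[1:]


toy_vocab = {WS + "Hello", WS + "wor", "ld", "."}  # hypothetical vocabulary
pieces = encode("Hello world.", toy_vocab)
restored = decode(pieces)
```

Because spaces are kept inside the pieces, decoding is a plain concatenation and recovers the input exactly, with no language-specific rules.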

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq

Title Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq
Authors Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Jason Li, Huyen Nguyen, Carl Case, Paulius Micikevicius
Abstract We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training. Benchmarks on machine translation and speech recognition tasks show that models built using OpenSeq2Seq give state-of-the-art performance at 1.5-3x less training time. OpenSeq2Seq currently provides building blocks for models that solve a wide range of tasks including neural machine translation, automatic speech recognition, and speech synthesis.
Tasks Machine Translation, Speech Recognition, Speech Synthesis
Published 2018-05-25
URL http://arxiv.org/abs/1805.10387v2
PDF http://arxiv.org/pdf/1805.10387v2.pdf
PWC https://paperswithcode.com/paper/mixed-precision-training-for-nlp-and-speech
Repo https://github.com/rickyHong/OpenSeq2Seq-repl
Framework tf

SFace: An Efficient Network for Face Detection in Large Scale Variations

Title SFace: An Efficient Network for Face Detection in Large Scale Variations
Authors Jianfeng Wang, Ye Yuan, Boxun Li, Gang Yu, Sun Jian
Abstract Face detection serves as a fundamental research topic for many applications like face recognition. Impressive progress has been made especially with the recent development of convolutional neural networks. However, the issue of large scale variations, which widely exists in high resolution images/videos, has not been well addressed in the literature. In this paper, we present a novel algorithm called SFace, which efficiently integrates the anchor-based method and anchor-free method to address the scale issues. A new dataset called 4K-Face is also introduced to evaluate the performance of face detection with extremely large scale variations. The SFace architecture shows promising results on the new 4K-Face benchmarks. In addition, our method can run at 50 frames per second (fps) with an accuracy of 80% AP on the standard WIDER FACE dataset, which outperforms state-of-the-art algorithms by almost one order of magnitude in speed while achieving comparable performance.
Tasks Face Detection, Face Recognition
Published 2018-04-18
URL http://arxiv.org/abs/1804.06559v2
PDF http://arxiv.org/pdf/1804.06559v2.pdf
PWC https://paperswithcode.com/paper/sface-an-efficient-network-for-face-detection
Repo https://github.com/wjfwzzc/4K-Face
Framework none

Stacked Dense U-Nets with Dual Transformers for Robust Face Alignment

Title Stacked Dense U-Nets with Dual Transformers for Robust Face Alignment
Authors Jia Guo, Jiankang Deng, Niannan Xue, Stefanos Zafeiriou
Abstract Face Analysis Project on MXNet
Tasks Face Alignment, Face Recognition, Robust Face Alignment, Robust Face Recognition
Published 2018-12-05
URL http://arxiv.org/abs/1812.01936v1
PDF http://arxiv.org/pdf/1812.01936v1.pdf
PWC https://paperswithcode.com/paper/stacked-dense-u-nets-with-dual-transformers
Repo https://github.com/deepinsight/insightface
Framework mxnet

Learning Representations of Sets through Optimized Permutations

Title Learning Representations of Sets through Optimized Permutations
Authors Yan Zhang, Jonathon Hare, Adam Prügel-Bennett
Abstract Representations of sets are challenging to learn because operations on sets should be permutation-invariant. To this end, we propose a Permutation-Optimisation module that learns how to permute a set end-to-end. The permuted set can be further processed to learn a permutation-invariant representation of that set, avoiding a bottleneck in traditional set models. We demonstrate our model’s ability to learn permutations and set representations with either explicit or implicit supervision on four datasets, on which we achieve state-of-the-art results: number sorting, image mosaics, classification from image mosaics, and visual question answering.
Tasks Question Answering, Visual Question Answering
Published 2018-12-10
URL http://arxiv.org/abs/1812.03928v3
PDF http://arxiv.org/pdf/1812.03928v3.pdf
PWC https://paperswithcode.com/paper/learning-representations-of-sets-through
Repo https://github.com/iclr2019-anon123456/perm-optim
Framework pytorch

Improved Mixed-Example Data Augmentation

Title Improved Mixed-Example Data Augmentation
Authors Cecilia Summers, Michael J. Dinneen
Abstract In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these types of transformations make intuitive sense, recent work has demonstrated that even non-label-preserving data augmentation can be surprisingly effective, examining this type of data augmentation through linear combinations of pairs of examples. Despite their effectiveness, little is known about why such methods work. In this work, we aim to explore a new, more generalized form of this type of data augmentation in order to determine whether such linearity is necessary. By considering this broader scope of “mixed-example data augmentation”, we find a much larger space of practical augmentation techniques, including methods that improve upon previous state-of-the-art. This generalization has benefits beyond the promise of improved performance, revealing a number of types of mixed-example data augmentation that are radically different from those considered in prior work, which provides evidence that current theories for the effectiveness of such methods are incomplete and suggests that any such theory must explain a much broader phenomenon. Code is available at https://github.com/ceciliaresearch/MixedExample.
Tasks Data Augmentation, Image Augmentation
Published 2018-05-29
URL http://arxiv.org/abs/1805.11272v4
PDF http://arxiv.org/pdf/1805.11272v4.pdf
PWC https://paperswithcode.com/paper/improved-mixed-example-data-augmentation
Repo https://github.com/ceciliaresearch/MixedExample
Framework tf
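As a rough sketch of what mixing pairs of examples means, the Python below shows the linear combination that the abstract refers to, next to one non-linear alternative. The function names and the row-wise concatenation variant are illustrative assumptions chosen for this example and are not taken from the authors' code.

```python
def linear_mix(x1, y1, x2, y2, lam):
    """Blend two examples and their one-hot labels with weight lam."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def concat_mix(img1, y1, img2, y2, lam):
    """Non-linear mix (hypothetical variant): top rows from img1, rest from img2."""
    rows = len(img1)
    cut = round(lam * rows)
    img = img1[:cut] + img2[cut:]
    frac = cut / rows  # label weight follows the share of rows actually used
    y = [frac * a + (1 - frac) * b for a, b in zip(y1, y2)]
    return img, y


# Two tiny 4x2 "images" with one-hot labels for a two-class problem.
ones = [[1, 1] for _ in range(4)]
zeros = [[0, 0] for _ in range(4)]
mixed_img, mixed_label = concat_mix(ones, [1, 0], zeros, [0, 1], lam=0.5)
```

In both cases the resulting label is soft, so neither transformation is label-preserving in the usual sense.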

Hydra: an Ensemble of Convolutional Neural Networks for Geospatial Land Classification

Title Hydra: an Ensemble of Convolutional Neural Networks for Geospatial Land Classification
Authors Rodrigo Minetto, Mauricio Pamplona Segundo, Sudeep Sarkar
Abstract We describe in this paper Hydra, an ensemble of convolutional neural networks (CNN) for geospatial land classification. The idea behind Hydra is to create an initial CNN that is coarsely optimized but provides a good starting point for further optimization, which will serve as the Hydra’s body. Then, the obtained weights are fine-tuned multiple times with different augmentation techniques, crop styles, and class weights to form an ensemble of CNNs that represent the Hydra’s heads. By doing so, we prompt convergence to different endpoints, which is a desirable aspect for ensembles. With this framework, we were able to reduce the training time while maintaining the classification performance of the ensemble. We created ensembles for our experiments using two state-of-the-art CNN architectures, ResNet and DenseNet. We have demonstrated the application of our Hydra framework in two datasets, FMOW and NWPU-RESISC45, achieving results comparable to the state-of-the-art for the former and the best reported performance so far for the latter. Code and CNN models are available at https://github.com/maups/hydra-fmow
Tasks
Published 2018-02-10
URL http://arxiv.org/abs/1802.03518v2
PDF http://arxiv.org/pdf/1802.03518v2.pdf
PWC https://paperswithcode.com/paper/hydra-an-ensemble-of-convolutional-neural
Repo https://github.com/maups/hydra-fmow
Framework tf

Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning

Title Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning
Authors Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser
Abstract Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free. In this work, we demonstrate that it is possible to discover and learn these synergies from scratch through model-free deep reinforcement learning. Our method involves training two fully convolutional networks that map from visual observations to actions: one infers the utility of pushes for a dense pixel-wise sampling of end effector orientations and locations, while the other does the same for grasping. Both networks are trained jointly in a Q-learning framework and are entirely self-supervised by trial and error, where rewards are provided from successful grasps. In this way, our policy learns pushing motions that enable future grasps, while learning grasps that can leverage past pushes. During picking experiments in both simulation and real-world scenarios, we find that our system quickly learns complex behaviors amid challenging cases of clutter, and achieves better grasping success rates and picking efficiencies than baseline alternatives after only a few hours of training. We further demonstrate that our method is capable of generalizing to novel objects. Qualitative results (videos), code, pre-trained models, and simulation environments are available at http://vpg.cs.princeton.edu
Tasks Q-Learning
Published 2018-03-27
URL http://arxiv.org/abs/1803.09956v3
PDF http://arxiv.org/pdf/1803.09956v3.pdf
PWC https://paperswithcode.com/paper/learning-synergies-between-pushing-and
Repo https://github.com/cww97/visual-language-grasping
Framework pytorch

ATOMO: Communication-efficient Learning via Atomic Sparsification

Title ATOMO: Communication-efficient Learning via Atomic Sparsification
Authors Hongyi Wang, Scott Sievert, Zachary Charles, Shengchao Liu, Stephen Wright, Dimitris Papailiopoulos
Abstract Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. We present ATOMO, a general framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO gives a random unbiased sparsification of the atoms minimizing variance. We show that recent methods such as QSGD and TernGrad are special cases of ATOMO and that sparsifying the singular value decomposition of neural network gradients, rather than their coordinates, can lead to significantly faster distributed training.
Tasks
Published 2018-06-11
URL http://arxiv.org/abs/1806.04090v3
PDF http://arxiv.org/pdf/1806.04090v3.pdf
PWC https://paperswithcode.com/paper/atomo-communication-efficient-learning-via
Repo https://github.com/hwang595/ATOMO
Framework pytorch
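A minimal sketch of unbiased sparsification for the simplest atomic decomposition, the element-wise one, might look as follows. The keep probabilities here are a simplifying assumption (proportional to magnitude and capped at 1); they are not the variance-minimizing allocation that ATOMO computes, and the function names are hypothetical.

```python
import random


def keep_probabilities(grad, budget):
    """Keep probability per coordinate, proportional to magnitude, capped at 1."""
    total = sum(abs(g) for g in grad)
    if total == 0:
        return [0.0] * len(grad)
    return [min(1.0, budget * abs(g) / total) for g in grad]


def sparsify(grad, budget, rng=random):
    """Randomly drop coordinates and rescale survivors so E[output] == grad."""
    probs = keep_probabilities(grad, budget)
    return [g / p if p > 0 and rng.random() < p else 0.0
            for g, p in zip(grad, probs)]


grad = [3.0, -1.0, 0.0]
probs = keep_probabilities(grad, budget=1)  # expected number of non-zeros <= 1
sparse = sparsify(grad, budget=1, rng=random.Random(0))
```

Dividing each surviving coordinate by its keep probability is what makes the estimate unbiased: a coordinate kept with probability p contributes p * (g / p) = g in expectation.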

Composable Deep Reinforcement Learning for Robotic Manipulation

Title Composable Deep Reinforcement Learning for Robotic Manipulation
Authors Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine
Abstract Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained using soft Q-learning can be applied to real-world robotic manipulation. The application of this method to real-world manipulation is facilitated by two important features of soft Q-learning. First, soft Q-learning can learn multimodal exploration strategies by learning policies represented by expressive energy-based models. Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies. This compositionality provides an especially valuable tool for real-world manipulation, where constructing new policies by composing existing skills can provide a large gain in efficiency over training from scratch. Our experimental evaluation demonstrates that soft Q-learning is substantially more sample efficient than prior model-free deep reinforcement learning methods, and that compositionality can be performed for both simulated and real-world tasks.
Tasks Q-Learning
Published 2018-03-19
URL http://arxiv.org/abs/1803.06773v1
PDF http://arxiv.org/pdf/1803.06773v1.pdf
PWC https://paperswithcode.com/paper/composable-deep-reinforcement-learning-for
Repo https://github.com/haarnoja/softqlearning
Framework none
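The composition idea can be illustrated with a small discrete-action sketch in Python. It assumes that skills are composed by averaging their Q-values and that the policy is proportional to exp(Q / alpha); the paper itself works with continuous actions and energy-based models, so the tables and names below are purely hypothetical.

```python
import math


def soft_policy(q_values, alpha=1.0):
    """Maximum entropy policy: pi(a) proportional to exp(Q(a) / alpha)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / alpha) for q in q_values]
    z = sum(weights)
    return [w / z for w in weights]


def compose(q_tables, alpha=1.0):
    """Average the Q-values of several skills, then act on the result."""
    n = len(q_tables)
    avg = [sum(qs) / n for qs in zip(*q_tables)]
    return soft_policy(avg, alpha)


# Two hypothetical skills over three actions; only action 1 suits both.
q_skill_a = [0.0, 2.0, 2.0]
q_skill_b = [2.0, 2.0, 0.0]
composed = compose([q_skill_a, q_skill_b])
```

Because each soft policy keeps probability mass on every reasonable action, the average of the two tables still favors the action that both skills rate highly, without any further training.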

Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages

Title Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages
Authors Siddique Latif, Adnan Qayyum, Muhammad Usman, Junaid Qadir
Abstract Cross-lingual speech emotion recognition is an important task for practical applications. The performance of automatic speech emotion recognition systems degrades in cross-corpus scenarios, particularly in scenarios involving multiple languages or a previously unseen language such as Urdu for which limited or no data is available. In this study, we investigate the problem of cross-lingual emotion recognition for the Urdu language and contribute URDU, the first ever spontaneous Urdu-language speech emotion database. Evaluations are performed using three different Western languages against Urdu, and experimental results on different possible scenarios suggest various interesting aspects for designing more adaptive emotion recognition systems for such limited languages. The results show that selecting training instances from multiple languages can deliver results comparable to the baseline, and that augmenting the training data with a fraction of the test-language data can help to boost accuracy for speech emotion recognition. URDU data is publicly available for further research.
Tasks Emotion Recognition, Speech Emotion Recognition
Published 2018-12-15
URL http://arxiv.org/abs/1812.10411v1
PDF http://arxiv.org/pdf/1812.10411v1.pdf
PWC https://paperswithcode.com/paper/cross-lingual-speech-emotion-recognition-urdu
Repo https://github.com/siddiquelatif/URDU-Dataset
Framework none

Learning sparse transformations through backpropagation

Title Learning sparse transformations through backpropagation
Authors Peter Bloem
Abstract Many transformations in deep learning architectures are sparsely connected. When such transformations cannot be designed by hand, they can be learned, even through plain backpropagation, for instance in attention mechanisms. However, during learning, such sparse structures are often represented in a dense form, as we do not know beforehand which elements will eventually become non-zero. We introduce the adaptive, sparse hyperlayer, a method for learning a sparse transformation, parametrized sparsely: as index-tuples with associated values. To overcome the lack of gradients from such a discrete structure, we introduce a method of randomly sampling connections, and backpropagating over the randomly wired computation graph. To show that this approach allows us to train a model to competitive performance on real data, we use it to build two architectures. First, an attention mechanism for visual classification. Second, we implement a method for differentiable sorting: specifically, learning to sort unlabeled MNIST digits, given only the correct order.
Tasks
Published 2018-10-22
URL http://arxiv.org/abs/1810.09184v1
PDF http://arxiv.org/pdf/1810.09184v1.pdf
PWC https://paperswithcode.com/paper/learning-sparse-transformations-through
Repo https://github.com/MaestroGraph/quicksort
Framework pytorch

End-to-end neural relation extraction using deep biaffine attention

Title End-to-end neural relation extraction using deep biaffine attention
Authors Dat Quoc Nguyen, Karin Verspoor
Abstract We propose a neural network model for joint extraction of named entities and relations between them, without any hand-crafted features. The key contribution of our model is to extend a BiLSTM-CRF-based entity recognition model with a deep biaffine attention layer to model second-order interactions between latent features for relation classification, specifically attending to the role of an entity in a directional relationship. On the benchmark “relation and entity recognition” dataset CoNLL04, experimental results show that our model outperforms previous models, producing new state-of-the-art performances.
Tasks Relation Classification, Relation Extraction
Published 2018-12-29
URL http://arxiv.org/abs/1812.11275v1
PDF http://arxiv.org/pdf/1812.11275v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-neural-relation-extraction-using
Repo https://github.com/datquocnguyen/jointRE
Framework none

Simple Unsupervised Keyphrase Extraction using Sentence Embeddings

Title Simple Unsupervised Keyphrase Extraction using Sentence Embeddings
Authors Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl, Martin Jaggi
Abstract Keyphrase extraction is the task of automatically selecting a small set of phrases that best describe a given free text document. Supervised keyphrase extraction requires large amounts of labeled training data and generalizes very poorly outside the domain of the training data. At the same time, unsupervised systems have poor accuracy, and often do not generalize well, as they require the input document to belong to a larger corpus also given as input. Addressing these drawbacks, in this paper, we tackle keyphrase extraction from single documents with EmbedRank: a novel unsupervised method that leverages sentence embeddings. EmbedRank achieves higher F-scores than graph-based state-of-the-art systems on standard datasets and is suitable for real-time processing of large amounts of Web data. With EmbedRank, we also explicitly increase coverage and diversity among the selected keyphrases by introducing an embedding-based maximal marginal relevance (MMR) for new phrases. A user study including over 200 votes showed that, although reducing the phrases’ semantic overlap leads to no gains in F-score, our high diversity selection is preferred by humans.
Tasks Sentence Embeddings
Published 2018-01-13
URL http://arxiv.org/abs/1801.04470v3
PDF http://arxiv.org/pdf/1801.04470v3.pdf
PWC https://paperswithcode.com/paper/simple-unsupervised-keyphrase-extraction
Repo https://github.com/swisscom/ai-research-keyphrase-extraction
Framework none
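The maximal marginal relevance step mentioned in the abstract can be sketched in Python as a greedy selection that trades relevance to the document against similarity to phrases already chosen. This is the textbook MMR formulation applied to made-up two-dimensional embeddings; the vectors, the `lam` default, and the function names are assumptions, and EmbedRank's own variant may normalize the similarities differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(doc_vec, candidates, k, lam=0.5):
    """Greedily pick k phrases, trading relevance against redundancy."""
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def score(name):
            relevance = cosine(remaining[name], doc_vec)
            redundancy = max(
                (cosine(remaining[name], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Hypothetical embeddings: two near-duplicate phrases and one distinct phrase.
doc = [1.0, 1.0]
phrases = {
    "neural network": [1.0, 0.9],
    "neural networks": [1.0, 0.8],
    "face detection": [0.2, 1.0],
}
diverse = mmr_select(doc, phrases, k=2, lam=0.5)
relevance_only = mmr_select(doc, phrases, k=2, lam=1.0)
```

With `lam=1.0` the selection ranks purely by relevance and returns both near-duplicates, while `lam=0.5` replaces the second one with the less similar phrase.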

Improving Response Selection in Multi-Turn Dialogue Systems by Incorporating Domain Knowledge

Title Improving Response Selection in Multi-Turn Dialogue Systems by Incorporating Domain Knowledge
Authors Debanjan Chaudhuri, Agustinus Kristiadi, Jens Lehmann, Asja Fischer
Abstract Building systems that can communicate with humans is a core problem in Artificial Intelligence. This work proposes a novel neural network architecture for response selection in an end-to-end multi-turn conversational dialogue setting. The architecture applies context level attention and incorporates additional external knowledge provided by descriptions of domain-specific words. It uses a bi-directional Gated Recurrent Unit (GRU) for encoding context and responses and learns to attend over the context words given the latent response representation and vice versa. In addition, it incorporates external domain-specific information using another GRU for encoding the domain keyword descriptions. This allows better representation of domain-specific keywords in responses and hence improves the overall performance. Experimental results show that our model outperforms all other state-of-the-art methods for response selection in multi-turn conversations.
Tasks
Published 2018-09-10
URL http://arxiv.org/abs/1809.03194v3
PDF http://arxiv.org/pdf/1809.03194v3.pdf
PWC https://paperswithcode.com/paper/improving-response-selection-in-multi-turn
Repo https://github.com/SmartDataAnalytics/AK-DE-biGRU
Framework pytorch