October 21, 2019


Paper Group AWR 163


SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing


Title	SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Authors	Taku Kudo, John Richardson
Abstract	This paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, including Neural Machine Translation. It provides open-source C++ and Python implementations for subword units. While existing subword segmentation tools assume that the input is pre-tokenized into word sequences, SentencePiece can train subword models directly from raw sentences, which allows us to make a purely end-to-end and language independent system. We perform a validation experiment of NMT on English-Japanese machine translation, and find that it is possible to achieve comparable accuracy to direct subword training from raw sentences. We also compare the performance of subword training and segmentation with various configurations. SentencePiece is available under the Apache 2 license at https://github.com/google/sentencepiece.
Tasks	Machine Translation
Published	2018-08-19
URL	http://arxiv.org/abs/1808.06226v1
PDF	http://arxiv.org/pdf/1808.06226v1.pdf
PWC	https://paperswithcode.com/paper/sentencepiece-a-simple-and-language-1
Repo	https://github.com/google/sentencepiece
Framework	tf
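
The abstract's claim that raw sentences can be processed without pre-tokenization can be illustrated with a toy round trip. This is a minimal sketch, not the SentencePiece implementation: it assumes the library's convention of escaping spaces with the meta symbol "▁" (U+2581), and the fixed-width chunking in the hypothetical `encode` function is only a stand-in for a learned subword model.

```python
META = "\u2581"  # the "▁" symbol that stands in for a space


def encode(text, width=3):
    """Escape spaces, then cut into fixed-width pieces (stand-in for a learned model)."""
    escaped = text.replace(" ", META)
    return [escaped[i:i + width] for i in range(0, len(escaped), width)]


def decode(pieces):
    """Concatenate pieces and restore spaces; no language-specific rules are needed."""
    return "".join(pieces).replace(META, " ")


pieces = encode("Hello world.")
restored = decode(pieces)
```

Because whitespace is kept as an ordinary symbol, decoding is plain concatenation, and the same two functions work for text written without spaces.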

Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq


Title	Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq
Authors	Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Jason Li, Huyen Nguyen, Carl Case, Paulius Micikevicius
Abstract	We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training. Benchmarks on machine translation and speech recognition tasks show that models built using OpenSeq2Seq give state-of-the-art performance at 1.5-3x less training time. OpenSeq2Seq currently provides building blocks for models that solve a wide range of tasks including neural machine translation, automatic speech recognition, and speech synthesis.
Tasks	Machine Translation, Speech Recognition, Speech Synthesis
Published	2018-05-25
URL	http://arxiv.org/abs/1805.10387v2
PDF	http://arxiv.org/pdf/1805.10387v2.pdf
PWC	https://paperswithcode.com/paper/mixed-precision-training-for-nlp-and-speech
Repo	https://github.com/rickyHong/OpenSeq2Seq-repl
Framework	tf

SFace: An Efficient Network for Face Detection in Large Scale Variations


Title	SFace: An Efficient Network for Face Detection in Large Scale Variations
Authors	Jianfeng Wang, Ye Yuan, Boxun Li, Gang Yu, Sun Jian
Abstract	Face detection serves as a fundamental research topic for many applications like face recognition. Impressive progress has been made especially with the recent development of convolutional neural networks. However, the issue of large scale variations, which widely exists in high resolution images/videos, has not been well addressed in the literature. In this paper, we present a novel algorithm called SFace, which efficiently integrates the anchor-based method and anchor-free method to address the scale issues. A new dataset called 4K-Face is also introduced to evaluate the performance of face detection with extremely large scale variations. The SFace architecture shows promising results on the new 4K-Face benchmarks. In addition, our method can run at 50 frames per second (fps) with an accuracy of 80% AP on the standard WIDER FACE dataset, which outperforms the state-of-the-art algorithms by almost one order of magnitude in speed while achieving comparable performance.
Tasks	Face Detection, Face Recognition
Published	2018-04-18
URL	http://arxiv.org/abs/1804.06559v2
PDF	http://arxiv.org/pdf/1804.06559v2.pdf
PWC	https://paperswithcode.com/paper/sface-an-efficient-network-for-face-detection
Repo	https://github.com/wjfwzzc/4K-Face
Framework	none

Stacked Dense U-Nets with Dual Transformers for Robust Face Alignment


Title	Stacked Dense U-Nets with Dual Transformers for Robust Face Alignment
Authors	Jia Guo, Jiankang Deng, Niannan Xue, Stefanos Zafeiriou
Abstract	Face Analysis Project on MXNet
Tasks	Face Alignment, Face Recognition, Robust Face Alignment, Robust Face Recognition
Published	2018-12-05
URL	http://arxiv.org/abs/1812.01936v1
PDF	http://arxiv.org/pdf/1812.01936v1.pdf
PWC	https://paperswithcode.com/paper/stacked-dense-u-nets-with-dual-transformers
Repo	https://github.com/deepinsight/insightface
Framework	mxnet

Learning Representations of Sets through Optimized Permutations


Title	Learning Representations of Sets through Optimized Permutations
Authors	Yan Zhang, Jonathon Hare, Adam Prügel-Bennett
Abstract	Representations of sets are challenging to learn because operations on sets should be permutation-invariant. To this end, we propose a Permutation-Optimisation module that learns how to permute a set end-to-end. The permuted set can be further processed to learn a permutation-invariant representation of that set, avoiding a bottleneck in traditional set models. We demonstrate our model’s ability to learn permutations and set representations with either explicit or implicit supervision on four datasets, on which we achieve state-of-the-art results: number sorting, image mosaics, classification from image mosaics, and visual question answering.
Tasks	Question Answering, Visual Question Answering
Published	2018-12-10
URL	http://arxiv.org/abs/1812.03928v3
PDF	http://arxiv.org/pdf/1812.03928v3.pdf
PWC	https://paperswithcode.com/paper/learning-representations-of-sets-through
Repo	https://github.com/iclr2019-anon123456/perm-optim
Framework	pytorch

Improved Mixed-Example Data Augmentation


Title	Improved Mixed-Example Data Augmentation
Authors	Cecilia Summers, Michael J. Dinneen
Abstract	In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these types of transformations make intuitive sense, recent work has demonstrated that even non-label-preserving data augmentation can be surprisingly effective, examining this type of data augmentation through linear combinations of pairs of examples. Despite their effectiveness, little is known about why such methods work. In this work, we aim to explore a new, more generalized form of this type of data augmentation in order to determine whether such linearity is necessary. By considering this broader scope of “mixed-example data augmentation”, we find a much larger space of practical augmentation techniques, including methods that improve upon previous state-of-the-art. This generalization has benefits beyond the promise of improved performance, revealing a number of types of mixed-example data augmentation that are radically different from those considered in prior work, which provides evidence that current theories for the effectiveness of such methods are incomplete and suggests that any such theory must explain a much broader phenomenon. Code is available at https://github.com/ceciliaresearch/MixedExample.
Tasks	Data Augmentation, Image Augmentation
Published	2018-05-29
URL	http://arxiv.org/abs/1805.11272v4
PDF	http://arxiv.org/pdf/1805.11272v4.pdf
PWC	https://paperswithcode.com/paper/improved-mixed-example-data-augmentation
Repo	https://github.com/ceciliaresearch/MixedExample
Framework	tf
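
The linear baseline that the abstract sets out to generalize, combining pairs of examples and their labels, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `mix_examples`, the fixed mixing weight `lam`, and the toy two-element inputs are all assumptions.

```python
def mix_examples(x1, y1, x2, y2, lam):
    """Blend two examples and their one-hot labels with weight lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # The same convex combination is applied to inputs and to labels,
    # so the result is no longer a label-preserving transformation.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Toy inputs: two "images" of two pixels each, with one-hot labels.
x, y = mix_examples([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.75)
```

The paper's question is whether this linearity is necessary; the non-linear variants it explores are not reproduced here.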

Hydra: an Ensemble of Convolutional Neural Networks for Geospatial Land Classification


Title	Hydra: an Ensemble of Convolutional Neural Networks for Geospatial Land Classification
Authors	Rodrigo Minetto, Mauricio Pamplona Segundo, Sudeep Sarkar
Abstract	In this paper we describe Hydra, an ensemble of convolutional neural networks (CNN) for geospatial land classification. The idea behind Hydra is to create an initial CNN that is coarsely optimized but provides a good starting point for further optimization, which will serve as the Hydra’s body. Then, the obtained weights are fine-tuned multiple times with different augmentation techniques, crop styles, and class weights to form an ensemble of CNNs that represent the Hydra’s heads. By doing so, we prompt convergence to different endpoints, which is a desirable aspect for ensembles. With this framework, we were able to reduce the training time while maintaining the classification performance of the ensemble. We created ensembles for our experiments using two state-of-the-art CNN architectures, ResNet and DenseNet. We have demonstrated the application of our Hydra framework in two datasets, FMOW and NWPU-RESISC45, achieving results comparable to the state-of-the-art for the former and the best reported performance so far for the latter. Code and CNN models are available at https://github.com/maups/hydra-fmow
Tasks
Published	2018-02-10
URL	http://arxiv.org/abs/1802.03518v2
PDF	http://arxiv.org/pdf/1802.03518v2.pdf
PWC	https://paperswithcode.com/paper/hydra-an-ensemble-of-convolutional-neural
Repo	https://github.com/maups/hydra-fmow
Framework	tf

Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning


Title	Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning
Authors	Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser
Abstract	Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free. In this work, we demonstrate that it is possible to discover and learn these synergies from scratch through model-free deep reinforcement learning. Our method involves training two fully convolutional networks that map from visual observations to actions: one infers the utility of pushes for a dense pixel-wise sampling of end effector orientations and locations, while the other does the same for grasping. Both networks are trained jointly in a Q-learning framework and are entirely self-supervised by trial and error, where rewards are provided from successful grasps. In this way, our policy learns pushing motions that enable future grasps, while learning grasps that can leverage past pushes. During picking experiments in both simulation and real-world scenarios, we find that our system quickly learns complex behaviors amid challenging cases of clutter, and achieves better grasping success rates and picking efficiencies than baseline alternatives after only a few hours of training. We further demonstrate that our method is capable of generalizing to novel objects. Qualitative results (videos), code, pre-trained models, and simulation environments are available at http://vpg.cs.princeton.edu
Tasks	Q-Learning
Published	2018-03-27
URL	http://arxiv.org/abs/1803.09956v3
PDF	http://arxiv.org/pdf/1803.09956v3.pdf
PWC	https://paperswithcode.com/paper/learning-synergies-between-pushing-and
Repo	https://github.com/cww97/visual-language-grasping
Framework	pytorch

ATOMO: Communication-efficient Learning via Atomic Sparsification


Title	ATOMO: Communication-efficient Learning via Atomic Sparsification
Authors	Hongyi Wang, Scott Sievert, Zachary Charles, Shengchao Liu, Stephen Wright, Dimitris Papailiopoulos
Abstract	Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. We present ATOMO, a general framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO gives a random unbiased sparsification of the atoms minimizing variance. We show that recent methods such as QSGD and TernGrad are special cases of ATOMO and that sparsifying the singular value decomposition of neural network gradients, rather than their coordinates, can lead to significantly faster distributed training.
Tasks
Published	2018-06-11
URL	http://arxiv.org/abs/1806.04090v3
PDF	http://arxiv.org/pdf/1806.04090v3.pdf
PWC	https://paperswithcode.com/paper/atomo-communication-efficient-learning-via
Repo	https://github.com/hwang595/ATOMO
Framework	pytorch
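
The idea of a random, unbiased sparsification under a budget can be sketched for the simplest case, the element-wise decomposition. This is an illustrative simplification, not the ATOMO algorithm: the keep-probabilities below are simply proportional to magnitude and clipped at 1, whereas the paper derives the variance-minimizing probabilities. The function name `sparsify` and the toy gradient are assumptions.

```python
import random


def sparsify(grad, budget, rng):
    """Keep coordinate i with probability p_i and rescale it by 1 / p_i.

    Rescaling makes the estimate unbiased: E[output_i] = p_i * (g_i / p_i) = g_i.
    The probabilities are a simple heuristic (an assumption of this sketch),
    so the expected number of kept coordinates is at most `budget`.
    """
    total = sum(abs(g) for g in grad)
    if total == 0.0:
        return [0.0] * len(grad)
    out = []
    for g in grad:
        p = min(1.0, budget * abs(g) / total)
        if p > 0.0 and rng.random() < p:
            out.append(g / p)
        else:
            out.append(0.0)
    return out


rng = random.Random(0)
sparse = sparsify([4.0, -2.0, 1.0, 1.0], budget=2, rng=rng)
```

Averaging many such draws recovers the dense gradient, which is the property that lets sparsified updates stand in for full ones during training.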

Composable Deep Reinforcement Learning for Robotic Manipulation


Title	Composable Deep Reinforcement Learning for Robotic Manipulation
Authors	Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine
Abstract	Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained using soft Q-learning can be applied to real-world robotic manipulation. The application of this method to real-world manipulation is facilitated by two important features of soft Q-learning. First, soft Q-learning can learn multimodal exploration strategies by learning policies represented by expressive energy-based models. Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies. This compositionality provides an especially valuable tool for real-world manipulation, where constructing new policies by composing existing skills can provide a large gain in efficiency over training from scratch. Our experimental evaluation demonstrates that soft Q-learning is substantially more sample efficient than prior model-free deep reinforcement learning methods, and that compositionality can be performed for both simulated and real-world tasks.
Tasks	Q-Learning
Published	2018-03-19
URL	http://arxiv.org/abs/1803.06773v1
PDF	http://arxiv.org/pdf/1803.06773v1.pdf
PWC	https://paperswithcode.com/paper/composable-deep-reinforcement-learning-for
Repo	https://github.com/haarnoja/softqlearning
Framework	none
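
The composition of policies described in the abstract can be sketched in a discrete-action toy setting. The assumptions here are loud ones: composition is taken to mean averaging the constituent Q-values, the policy is a Boltzmann distribution over those values, and the action set, the names `boltzmann_policy` and `compose`, and the numbers are all made up for illustration. The paper itself works with continuous actions and energy-based models.

```python
import math


def boltzmann_policy(q_values, temperature=1.0):
    """Maximum-entropy policy over discrete actions: pi(a) proportional to exp(Q(a) / T)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(weights)
    return [w / z for w in weights]


def compose(q_a, q_b):
    """Combine two skills by averaging their Q-values (assumed composition rule)."""
    return [(a + b) / 2.0 for a, b in zip(q_a, q_b)]


# Toy Q-values for three actions under two separately trained skills.
q_reach = [2.0, 0.0, 2.0]
q_avoid = [0.0, 0.0, 2.0]
policy = boltzmann_policy(compose(q_reach, q_avoid))
```

The composed policy concentrates on the action that both skills rate highly, without any further training.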

Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages


Title	Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages
Authors	Siddique Latif, Adnan Qayyum, Muhammad Usman, Junaid Qadir
Abstract	Cross-lingual speech emotion recognition is an important task for practical applications. The performance of automatic speech emotion recognition systems degrades in cross-corpus scenarios, particularly in scenarios involving multiple languages or a previously unseen language such as Urdu for which limited or no data is available. In this study, we investigate the problem of cross-lingual emotion recognition for the Urdu language and contribute URDU, the first ever spontaneous Urdu-language speech emotion database. Evaluations are performed using three different Western languages against Urdu, and experimental results on different possible scenarios suggest various interesting aspects for designing more adaptive emotion recognition systems for such limited languages. The results show that selecting training instances from multiple languages can deliver results comparable to the baseline, and that augmenting the training set with a fraction of the test-language data can help to boost accuracy for speech emotion recognition. URDU data is publicly available for further research.
Tasks	Emotion Recognition, Speech Emotion Recognition
Published	2018-12-15
URL	http://arxiv.org/abs/1812.10411v1
PDF	http://arxiv.org/pdf/1812.10411v1.pdf
PWC	https://paperswithcode.com/paper/cross-lingual-speech-emotion-recognition-urdu
Repo	https://github.com/siddiquelatif/URDU-Dataset
Framework	none

Learning sparse transformations through backpropagation


Title	Learning sparse transformations through backpropagation
Authors	Peter Bloem
Abstract	Many transformations in deep learning architectures are sparsely connected. When such transformations cannot be designed by hand, they can be learned, even through plain backpropagation, for instance in attention mechanisms. However, during learning, such sparse structures are often represented in a dense form, as we do not know beforehand which elements will eventually become non-zero. We introduce the adaptive, sparse hyperlayer, a method for learning a sparse transformation, parametrized sparsely: as index-tuples with associated values. To overcome the lack of gradients from such a discrete structure, we introduce a method of randomly sampling connections, and backpropagating over the randomly wired computation graph. To show that this approach allows us to train a model to competitive performance on real data, we use it to build two architectures. First, an attention mechanism for visual classification. Second, we implement a method for differentiable sorting: specifically, learning to sort unlabeled MNIST digits, given only the correct order.
Tasks
Published	2018-10-22
URL	http://arxiv.org/abs/1810.09184v1
PDF	http://arxiv.org/pdf/1810.09184v1.pdf
PWC	https://paperswithcode.com/paper/learning-sparse-transformations-through
Repo	https://github.com/MaestroGraph/quicksort
Framework	pytorch

End-to-end neural relation extraction using deep biaffine attention


Title	End-to-end neural relation extraction using deep biaffine attention
Authors	Dat Quoc Nguyen, Karin Verspoor
Abstract	We propose a neural network model for joint extraction of named entities and relations between them, without any hand-crafted features. The key contribution of our model is to extend a BiLSTM-CRF-based entity recognition model with a deep biaffine attention layer to model second-order interactions between latent features for relation classification, specifically attending to the role of an entity in a directional relationship. On the benchmark “relation and entity recognition” dataset CoNLL04, experimental results show that our model outperforms previous models, producing new state-of-the-art performances.
Tasks	Relation Classification, Relation Extraction
Published	2018-12-29
URL	http://arxiv.org/abs/1812.11275v1
PDF	http://arxiv.org/pdf/1812.11275v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-neural-relation-extraction-using
Repo	https://github.com/datquocnguyen/jointRE
Framework	none
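
The second-order, direction-sensitive scoring that the abstract attributes to the biaffine layer can be sketched for a single relation label. This is a generic biaffine form, a bilinear term plus a linear term plus a bias, and not the authors' implementation: the function name `biaffine_score`, the plain-list vectors, and the toy weights are assumptions.

```python
def biaffine_score(head, tail, U, w, b):
    """Score a directed (head, tail) entity pair for one relation label.

    bilinear: head^T U tail captures pairwise feature interactions
    linear:   w . [head; tail] scores each argument on its own
    """
    bilinear = sum(
        head[i] * U[i][j] * tail[j]
        for i in range(len(head))
        for j in range(len(tail))
    )
    linear = sum(wk * xk for wk, xk in zip(w, head + tail))
    return bilinear + linear + b


# Toy latent features for two entities (made-up numbers).
e1, e2 = [1.0, 2.0], [3.0, 4.0]
U = [[0.0, 1.0], [0.0, 0.0]]  # a non-symmetric U makes the score depend on direction
w = [0.0, 0.0, 0.0, 0.0]
forward = biaffine_score(e1, e2, U, w, 0.0)
backward = biaffine_score(e2, e1, U, w, 0.0)
```

With a non-symmetric `U`, swapping the two entities changes the score, which is what allows a model of this kind to distinguish the roles in a directional relationship.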

Simple Unsupervised Keyphrase Extraction using Sentence Embeddings


Title	Simple Unsupervised Keyphrase Extraction using Sentence Embeddings
Authors	Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl, Martin Jaggi
Abstract	Keyphrase extraction is the task of automatically selecting a small set of phrases that best describe a given free text document. Supervised keyphrase extraction requires large amounts of labeled training data and generalizes very poorly outside the domain of the training data. At the same time, unsupervised systems have poor accuracy, and often do not generalize well, as they require the input document to belong to a larger corpus also given as input. Addressing these drawbacks, in this paper, we tackle keyphrase extraction from single documents with EmbedRank: a novel unsupervised method, that leverages sentence embeddings. EmbedRank achieves higher F-scores than graph-based state of the art systems on standard datasets and is suitable for real-time processing of large amounts of Web data. With EmbedRank, we also explicitly increase coverage and diversity among the selected keyphrases by introducing an embedding-based maximal marginal relevance (MMR) for new phrases. A user study including over 200 votes showed that, although reducing the phrases’ semantic overlap leads to no gains in F-score, our high diversity selection is preferred by humans.
Tasks	Sentence Embeddings
Published	2018-01-13
URL	http://arxiv.org/abs/1801.04470v3
PDF	http://arxiv.org/pdf/1801.04470v3.pdf
PWC	https://paperswithcode.com/paper/simple-unsupervised-keyphrase-extraction
Repo	https://github.com/swisscom/ai-research-keyphrase-extraction
Framework	none
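
The embedding-based maximal marginal relevance (MMR) step mentioned in the abstract can be sketched as a greedy selection loop. This is a minimal sketch under stated assumptions, not the EmbedRank code: it uses raw cosine similarity without any score normalization, and the function names, the trade-off weight `lam`, and the two-dimensional toy embeddings are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mmr_select(doc_vec, candidates, k, lam=0.5):
    """Greedily pick k phrases, trading relevance to the document against
    similarity to the phrases already selected."""
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def score(name):
            relevance = cosine(remaining[name], doc_vec)
            redundancy = max(
                (cosine(remaining[name], candidates[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy embeddings (hypothetical values, not produced by a real sentence encoder).
doc = [1.0, 1.0]
phrases = {
    "neural network": [1.0, 0.9],
    "neural networks": [1.0, 0.95],
    "face detection": [0.2, 1.0],
}
diverse = mmr_select(doc, phrases, k=2, lam=0.5)
relevance_only = mmr_select(doc, phrases, k=2, lam=1.0)
```

With `lam=1.0` the two near-duplicate phrases are both chosen; lowering `lam` swaps the second one for a less redundant phrase, which is the diversity effect the user study in the abstract evaluates.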

Improving Response Selection in Multi-Turn Dialogue Systems by Incorporating Domain Knowledge


Title	Improving Response Selection in Multi-Turn Dialogue Systems by Incorporating Domain Knowledge
Authors	Debanjan Chaudhuri, Agustinus Kristiadi, Jens Lehmann, Asja Fischer
Abstract	Building systems that can communicate with humans is a core problem in Artificial Intelligence. This work proposes a novel neural network architecture for response selection in an end-to-end multi-turn conversational dialogue setting. The architecture applies context level attention and incorporates additional external knowledge provided by descriptions of domain-specific words. It uses a bi-directional Gated Recurrent Unit (GRU) for encoding context and responses and learns to attend over the context words given the latent response representation and vice versa. In addition, it incorporates external domain-specific information using another GRU for encoding the domain keyword descriptions. This allows better representation of domain-specific keywords in responses and hence improves the overall performance. Experimental results show that our model outperforms all other state-of-the-art methods for response selection in multi-turn conversations.
Tasks
Published	2018-09-10
URL	http://arxiv.org/abs/1809.03194v3
PDF	http://arxiv.org/pdf/1809.03194v3.pdf
PWC	https://paperswithcode.com/paper/improving-response-selection-in-multi-turn
Repo	https://github.com/SmartDataAnalytics/AK-DE-biGRU
Framework	pytorch