Paper Group AWR 163
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq. SFace: An Efficient Network for Face Detection in Large Scale Variations. Stacked Dense U-Nets with Dual Transformers for Robust Face Alignment. Learning Representat …
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Title | SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing |
Authors | Taku Kudo, John Richardson |
Abstract | This paper describes SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, including Neural Machine Translation. It provides open-source C++ and Python implementations for subword units. While existing subword segmentation tools assume that the input is pre-tokenized into word sequences, SentencePiece can train subword models directly from raw sentences, which allows us to make a purely end-to-end and language independent system. We perform a validation experiment of NMT on English-Japanese machine translation, and find that it is possible to achieve comparable accuracy to direct subword training from raw sentences. We also compare the performance of subword training and segmentation with various configurations. SentencePiece is available under the Apache 2 license at https://github.com/google/sentencepiece. |
Tasks | Machine Translation |
Published | 2018-08-19 |
URL | http://arxiv.org/abs/1808.06226v1 |
PDF | http://arxiv.org/pdf/1808.06226v1.pdf |
PWC | https://paperswithcode.com/paper/sentencepiece-a-simple-and-language-1 |
Repo | https://github.com/google/sentencepiece |
Framework | tf |
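SentencePiece trains directly on raw sentences by treating whitespace as an ordinary symbol (the meta symbol "▁", U+2581), and that is what makes detokenization lossless. The sketch below illustrates only that idea. It assumes a tiny hand-written vocabulary and greedy longest-match segmentation, so it is not the library's BPE or unigram training. `encode`, `decode`, and `VOCAB` are hypothetical names, not the `sentencepiece` API.

```python
META = "\u2581"  # "▁": the visible marker SentencePiece uses for whitespace

# Hypothetical, hand-written vocabulary; the real tool learns one from raw text.
VOCAB = {META + "Hello", META + "wor", "ld", ".", META}


def encode(text: str, vocab: set[str] = VOCAB) -> list[str]:
    """Greedy longest-match segmentation of raw text (no word pre-tokenization)."""
    s = META + text.replace(" ", META)  # whitespace becomes an ordinary symbol
    pieces, i = [], 0
    while i < len(s):
        # Take the longest vocabulary piece starting at i; fall back to one character.
        for j in range(len(s), i, -1):
            if s[i:j] in vocab:
                pieces.append(s[i:j])
                i = j
                break
        else:
            pieces.append(s[i])
            i += 1
    return pieces


def decode(pieces: list[str]) -> str:
    """Lossless detokenization: concatenate pieces and turn META back into spaces."""
    text = "".join(pieces).replace(META, " ")
    return text[1:] if text.startswith(" ") else text  # drop the META added by encode


print(encode("Hello world."))          # ['▁Hello', '▁wor', 'ld', '.']
print(decode(encode("Hello world.")))  # Hello world.
```

Because spaces survive as `▁` inside the pieces, `decode` needs no language-specific rules to restore the original string, including repeated spaces.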
Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq
Title | Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq |
Authors | Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Jason Li, Huyen Nguyen, Carl Case, Paulius Micikevicius |
Abstract | We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training. Benchmarks on machine translation and speech recognition tasks show that models built using OpenSeq2Seq give state-of-the-art performance with 1.5-3x less training time. OpenSeq2Seq currently provides building blocks for models that solve a wide range of tasks including neural machine translation, automatic speech recognition, and speech synthesis. |
Tasks | Machine Translation, Speech Recognition, Speech Synthesis |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10387v2 |
PDF | http://arxiv.org/pdf/1805.10387v2.pdf |
PWC | https://paperswithcode.com/paper/mixed-precision-training-for-nlp-and-speech |
Repo | https://github.com/rickyHong/OpenSeq2Seq-repl |
Framework | tf |
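Mixed-precision training keeps activations and gradients in fp16. A standard companion technique is loss scaling: the loss is multiplied by a constant before backpropagation so that small gradients do not underflow in fp16, and the constant is divided back out in higher precision before the weight update. The following sketch simulates only that arithmetic effect with Python's `struct` half-precision format (`'e'`), assuming a fixed scale factor. It is not OpenSeq2Seq code, and `to_fp16` and `fp16_gradient` are hypothetical helpers.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 binary16 and back (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_gradient(true_grad: float, loss_scale: float) -> float:
    """Simulate a gradient stored in fp16 under static loss scaling.

    Scaling the loss scales every gradient by the same factor; the scaled value
    is what the fp16 backward pass would hold, and it is unscaled afterwards in
    higher precision.
    """
    scaled = to_fp16(true_grad * loss_scale)
    return scaled / loss_scale


tiny = 1e-8                                    # below fp16's smallest subnormal (~6e-8)
print(fp16_gradient(tiny, loss_scale=1.0))     # 0.0: the gradient underflows
print(fp16_gradient(tiny, loss_scale=1024.0))  # close to 1e-8: scaling preserves it
```

Too large a scale has the opposite problem: `struct.pack` raises `OverflowError` once a scaled value exceeds the fp16 range, which is why a scale is usually chosen (or adapted) to stay below it.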
SFace: An Efficient Network for Face Detection in Large Scale Variations
Title | SFace: An Efficient Network for Face Detection in Large Scale Variations |
Authors | Jianfeng Wang, Ye Yuan, Boxun Li, Gang Yu, Sun Jian |
Abstract | Face detection serves as a fundamental research topic for many applications like face recognition. Impressive progress has been made especially with the recent development of convolutional neural networks. However, the issue of large scale variations, which widely exists in high resolution images/videos, has not been well addressed in the literature. In this paper, we present a novel algorithm called SFace, which efficiently integrates the anchor-based method and anchor-free method to address the scale issues. A new dataset called 4K-Face is also introduced to evaluate the performance of face detection with extremely large scale variations. The SFace architecture shows promising results on the new 4K-Face benchmarks. In addition, our method can run at 50 frames per second (fps) with an accuracy of 80% AP on the standard WIDER FACE dataset, which outperforms the state-of-the-art algorithms by almost one order of magnitude in speed while achieving comparable performance. |
Tasks | Face Detection, Face Recognition |
Published | 2018-04-18 |
URL | http://arxiv.org/abs/1804.06559v2 |
PDF | http://arxiv.org/pdf/1804.06559v2.pdf |
PWC | https://paperswithcode.com/paper/sface-an-efficient-network-for-face-detection |
Repo | https://github.com/wjfwzzc/4K-Face |
Framework | none |
Stacked Dense U-Nets with Dual Transformers for Robust Face Alignment
Title | Stacked Dense U-Nets with Dual Transformers for Robust Face Alignment |
Authors | Jia Guo, Jiankang Deng, Niannan Xue, Stefanos Zafeiriou |
Abstract | Face Analysis Project on MXNet |
Tasks | Face Alignment, Face Recognition, Robust Face Alignment, Robust Face Recognition |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.01936v1 |
PDF | http://arxiv.org/pdf/1812.01936v1.pdf |
PWC | https://paperswithcode.com/paper/stacked-dense-u-nets-with-dual-transformers |
Repo | https://github.com/deepinsight/insightface |
Framework | mxnet |
Learning Representations of Sets through Optimized Permutations
Title | Learning Representations of Sets through Optimized Permutations |
Authors | Yan Zhang, Jonathon Hare, Adam Prügel-Bennett |
Abstract | Representations of sets are challenging to learn because operations on sets should be permutation-invariant. To this end, we propose a Permutation-Optimisation module that learns how to permute a set end-to-end. The permuted set can be further processed to learn a permutation-invariant representation of that set, avoiding a bottleneck in traditional set models. We demonstrate our model’s ability to learn permutations and set representations with either explicit or implicit supervision on four datasets, on which we achieve state-of-the-art results: number sorting, image mosaics, classification from image mosaics, and visual question answering. |
Tasks | Question Answering, Visual Question Answering |
Published | 2018-12-10 |
URL | http://arxiv.org/abs/1812.03928v3 |
PDF | http://arxiv.org/pdf/1812.03928v3.pdf |
PWC | https://paperswithcode.com/paper/learning-representations-of-sets-through |
Repo | https://github.com/iclr2019-anon123456/perm-optim |
Framework | pytorch |
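The abstract does not spell out the internals of the Permutation-Optimisation module. A common building block for learning permutations end-to-end is Sinkhorn normalization, which turns a matrix of scores into a soft, differentiable permutation. The sketch below shows that building block only, under that assumption; it is not the authors' module, and the score matrix is made up.

```python
import math


def sinkhorn(logits: list[list[float]], n_iters: int = 50) -> list[list[float]]:
    """Map an n x n score matrix to an (approximately) doubly stochastic matrix.

    Alternately normalizes the rows and columns of exp(logits). The result is a
    differentiable relaxation of a permutation matrix: every row and column sums
    to (approximately) 1.
    """
    n = len(logits)
    m = []
    for row in logits:
        top = max(row)  # a per-row shift cancels out after row normalization
        m.append([math.exp(v - top) for v in row])
    for _ in range(n_iters):
        row_sums = [sum(row) for row in m]
        m = [[v / s for v in row] for row, s in zip(m, row_sums)]  # rows sum to 1
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]  # columns sum to 1
    return m


scores = [[1.0, 0.2, 3.0], [2.5, 0.1, 0.3], [0.4, 2.0, 0.2]]
soft = sinkhorn(scores)
# Largest entry per row: element 0 -> slot 2, element 1 -> slot 0, element 2 -> slot 1.
print([max(range(3), key=row.__getitem__) for row in soft])  # [2, 0, 1]
```

Reading off a row-wise argmax is only guaranteed to give a valid permutation when the soft matrix is close to a hard one; otherwise an assignment solver would be needed.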
Improved Mixed-Example Data Augmentation
Title | Improved Mixed-Example Data Augmentation |
Authors | Cecilia Summers, Michael J. Dinneen |
Abstract | In order to reduce overfitting, neural networks are typically trained with data augmentation, the practice of artificially generating additional training data via label-preserving transformations of existing training examples. While these types of transformations make intuitive sense, recent work has demonstrated that even non-label-preserving data augmentation can be surprisingly effective, examining this type of data augmentation through linear combinations of pairs of examples. Despite their effectiveness, little is known about why such methods work. In this work, we aim to explore a new, more generalized form of this type of data augmentation in order to determine whether such linearity is necessary. By considering this broader scope of “mixed-example data augmentation”, we find a much larger space of practical augmentation techniques, including methods that improve upon previous state-of-the-art. This generalization has benefits beyond the promise of improved performance, revealing a number of types of mixed-example data augmentation that are radically different from those considered in prior work, which provides evidence that current theories for the effectiveness of such methods are incomplete and suggests that any such theory must explain a much broader phenomenon. Code is available at https://github.com/ceciliaresearch/MixedExample. |
Tasks | Data Augmentation, Image Augmentation |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11272v4 |
PDF | http://arxiv.org/pdf/1805.11272v4.pdf |
PWC | https://paperswithcode.com/paper/improved-mixed-example-data-augmentation |
Repo | https://github.com/ceciliaresearch/MixedExample |
Framework | tf |
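Mixup-style augmentation combines two examples linearly, and the paper asks whether that linearity is necessary. The sketch below contrasts linear mixing with one non-linear alternative: stacking the top rows of one image on the bottom rows of another, with each label weighted by its image's share of the rows. It assumes images are small lists of pixel rows and labels are one-hot lists. The function names are hypothetical, and this illustrates the idea rather than reproducing the paper's code.

```python
Image = list[list[float]]  # H x W grayscale image as rows of pixel values


def mixup(x1: Image, y1: list[float], x2: Image, y2: list[float],
          lam: float) -> tuple[Image, list[float]]:
    """Linear mixing: blend pixels and labels with the same weight lam."""
    x = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)] for r1, r2 in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def vertical_concat(x1: Image, y1: list[float], x2: Image, y2: list[float],
                    lam: float) -> tuple[Image, list[float]]:
    """Non-linear mixing: top rows from x1, remaining rows from x2.

    Each source's label weight equals the fraction of rows it contributes.
    """
    h = len(x1)
    cut = round(lam * h)  # number of rows taken from x1
    x = [row[:] for row in x1[:cut]] + [row[:] for row in x2[cut:]]
    w = cut / h
    y = [w * a + (1 - w) * b for a, b in zip(y1, y2)]
    return x, y
```

In training, `lam` would typically be drawn at random per pair, for example with `random.betavariate(alpha, alpha)`.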
Hydra: an Ensemble of Convolutional Neural Networks for Geospatial Land Classification
Title | Hydra: an Ensemble of Convolutional Neural Networks for Geospatial Land Classification |
Authors | Rodrigo Minetto, Mauricio Pamplona Segundo, Sudeep Sarkar |
Abstract | We describe in this paper Hydra, an ensemble of convolutional neural networks (CNN) for geospatial land classification. The idea behind Hydra is to create an initial CNN that is coarsely optimized but provides a good starting point for further optimization, which will serve as the Hydra’s body. Then, the obtained weights are fine-tuned multiple times with different augmentation techniques, crop styles, and class weights to form an ensemble of CNNs that represent the Hydra’s heads. By doing so, we encourage convergence to different endpoints, which is a desirable aspect for ensembles. With this framework, we were able to reduce the training time while maintaining the classification performance of the ensemble. We created ensembles for our experiments using two state-of-the-art CNN architectures, ResNet and DenseNet. We have demonstrated the application of our Hydra framework on two datasets, FMOW and NWPU-RESISC45, achieving results comparable to the state of the art for the former and the best reported performance so far for the latter. Code and CNN models are available at https://github.com/maups/hydra-fmow |
Tasks | |
Published | 2018-02-10 |
URL | http://arxiv.org/abs/1802.03518v2 |
PDF | http://arxiv.org/pdf/1802.03518v2.pdf |
PWC | https://paperswithcode.com/paper/hydra-an-ensemble-of-convolutional-neural |
Repo | https://github.com/maups/hydra-fmow |
Framework | tf |
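The abstract does not say how the heads' outputs are combined. Assuming the common choice of averaging each head's softmax probabilities, ensemble prediction could be sketched as follows. The logits are made up, and `ensemble_predict` is a hypothetical helper.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one head's class logits."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(head_logits: list[list[float]]) -> tuple[int, list[float]]:
    """Average the heads' class probabilities; return (predicted class, averaged probs)."""
    probs = [softmax(logits) for logits in head_logits]
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


# Three hypothetical heads fine-tuned from the same body; one of them disagrees.
heads = [[5.0, 0.0, 0.0], [0.0, 1.0, 0.0], [4.0, 0.0, 0.0]]
label, probs = ensemble_predict(heads)
print(label)  # 0
```

Averaging probabilities lets confident heads outweigh an uncertain dissenting head, which is one reason it is a frequent default over majority voting.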
Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning
Title | Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning |
Authors | Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser |
Abstract | Skilled robotic manipulation benefits from complex synergies between non-prehensile (e.g. pushing) and prehensile (e.g. grasping) actions: pushing can help rearrange cluttered objects to make space for arms and fingers; likewise, grasping can help displace objects to make pushing movements more precise and collision-free. In this work, we demonstrate that it is possible to discover and learn these synergies from scratch through model-free deep reinforcement learning. Our method involves training two fully convolutional networks that map from visual observations to actions: one infers the utility of pushes for a dense pixel-wise sampling of end effector orientations and locations, while the other does the same for grasping. Both networks are trained jointly in a Q-learning framework and are entirely self-supervised by trial and error, where rewards are provided from successful grasps. In this way, our policy learns pushing motions that enable future grasps, while learning grasps that can leverage past pushes. During picking experiments in both simulation and real-world scenarios, we find that our system quickly learns complex behaviors amid challenging cases of clutter, and achieves better grasping success rates and picking efficiencies than baseline alternatives after only a few hours of training. We further demonstrate that our method is capable of generalizing to novel objects. Qualitative results (videos), code, pre-trained models, and simulation environments are available at http://vpg.cs.princeton.edu |
Tasks | Q-Learning |
Published | 2018-03-27 |
URL | http://arxiv.org/abs/1803.09956v3 |
PDF | http://arxiv.org/pdf/1803.09956v3.pdf |
PWC | https://paperswithcode.com/paper/learning-synergies-between-pushing-and |
Repo | https://github.com/cww97/visual-language-grasping |
Framework | pytorch |
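The abstract describes two fully convolutional networks that output dense pixel-wise Q-values for pushes and grasps over several end-effector orientations, trained jointly with Q-learning. The sketch below shows only greedy action selection over such maps and the standard one-step Q-learning target. The tiny hand-written maps stand in for network outputs. `best_action`, `q_target`, the reward of 1.0, and the discount `gamma` are hypothetical choices, not values from the paper.

```python
# Q maps are indexed q[rotation][row][col]. In the paper they come from two fully
# convolutional networks; here they are tiny hand-written stand-ins.
QMap = list[list[list[float]]]


def best_action(q_push: QMap, q_grasp: QMap) -> tuple[str, int, int, int, float]:
    """Greedy policy: the primitive, rotation and pixel with the highest Q-value."""
    best = ("push", 0, 0, 0, float("-inf"))
    for name, q in (("push", q_push), ("grasp", q_grasp)):
        for rot, plane in enumerate(q):
            for r, row in enumerate(plane):
                for c, value in enumerate(row):
                    if value > best[-1]:
                        best = (name, rot, r, c, value)
    return best


def q_target(reward: float, next_push: QMap, next_grasp: QMap, gamma: float) -> float:
    """One-step Q-learning target: y = r + gamma * max_a' Q(s', a')."""
    return reward + gamma * best_action(next_push, next_grasp)[-1]


q_push = [[[0.1, 0.2], [0.0, 0.3]]]                               # 1 rotation, 2x2 pixels
q_grasp = [[[0.05, 0.9], [0.2, 0.1]], [[0.4, 0.0], [0.0, 0.0]]]  # 2 rotations
print(best_action(q_push, q_grasp))  # ('grasp', 0, 0, 1, 0.9)
```

Because both primitives are scored on the same pixel grid, the policy can pick a push whenever its expected future value (for example, enabling a later grasp) beats every grasp currently available.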
ATOMO: Communication-efficient Learning via Atomic Sparsification
Title | ATOMO: Communication-efficient Learning via Atomic Sparsification |
Authors | Hongyi Wang, Scott Sievert, Zachary Charles, Shengchao Liu, Stephen Wright, Dimitris Papailiopoulos |
Abstract | Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. We present ATOMO, a general framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO gives a random unbiased sparsification of the atoms minimizing variance. We show that recent methods such as QSGD and TernGrad are special cases of ATOMO and that sparsifying the singular value decomposition of neural network gradients, rather than their coordinates, can lead to significantly faster distributed training. |
Tasks | |
Published | 2018-06-11 |
URL | http://arxiv.org/abs/1806.04090v3 |
PDF | http://arxiv.org/pdf/1806.04090v3.pdf |
PWC | https://paperswithcode.com/paper/atomo-communication-efficient-learning-via |
Repo | https://github.com/hwang595/ATOMO |
Framework | pytorch |
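For the element-wise decomposition, where the atoms are simply the gradient's coordinates, the sparsification described in the abstract can be sketched in three steps. First, choose keep-probabilities proportional to each atom's magnitude, capped at 1, so that they sum to the sparsity budget. Second, keep each atom independently with its probability. Third, rescale each kept atom by 1/p so the estimate stays unbiased. This is a minimal sketch under those assumptions. The function names are hypothetical, and an SVD-based variant would apply the same procedure to singular values.

```python
import random


def atomo_probabilities(coeffs: list[float], budget: float) -> list[float]:
    """Keep-probabilities p_i proportional to |coeff_i|, capped at 1, summing to budget.

    This choice minimizes the variance sum(coeff_i**2 / p_i) of the unbiased
    estimator below under the constraint sum(p_i) = budget. Assumes budget does
    not exceed the number of nonzero coefficients.
    """
    mags = [abs(c) for c in coeffs]
    p = [0.0] * len(coeffs)
    active = [i for i, m in enumerate(mags) if m > 0]
    remaining = budget
    while active:
        total = sum(mags[i] for i in active)
        capped = {i for i in active if remaining * mags[i] / total >= 1.0}
        if not capped:
            for i in active:
                p[i] = remaining * mags[i] / total
            break
        for i in capped:  # atoms this large are always transmitted
            p[i] = 1.0
        remaining -= len(capped)
        active = [i for i in active if i not in capped]
    return p


def sparsify(coeffs: list[float], budget: float, rng: random.Random) -> list[float]:
    """Unbiased random sparsification: keep atom i with prob p_i and rescale by 1/p_i."""
    p = atomo_probabilities(coeffs, budget)
    return [c / pi if pi > 0 and rng.random() < pi else 0.0 for c, pi in zip(coeffs, p)]


print(atomo_probabilities([4.0, 1.0, 1.0, 0.0], budget=2.0))  # [1.0, 0.5, 0.5, 0.0]
```

With a budget of 2, on average only two coordinates are sent per step, yet the expected value of every coordinate equals the original gradient.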
Composable Deep Reinforcement Learning for Robotic Manipulation
Title | Composable Deep Reinforcement Learning for Robotic Manipulation |
Authors | Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine |
Abstract | Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained using soft Q-learning can be applied to real-world robotic manipulation. The application of this method to real-world manipulation is facilitated by two important features of soft Q-learning. First, soft Q-learning can learn multimodal exploration strategies by learning policies represented by expressive energy-based models. Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies. This compositionality provides an especially valuable tool for real-world manipulation, where constructing new policies by composing existing skills can provide a large gain in efficiency over training from scratch. Our experimental evaluation demonstrates that soft Q-learning is substantially more sample efficient than prior model-free deep reinforcement learning methods, and that compositionality can be performed for both simulated and real-world tasks. |
Tasks | Q-Learning |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1803.06773v1 |
PDF | http://arxiv.org/pdf/1803.06773v1.pdf |
PWC | https://paperswithcode.com/paper/composable-deep-reinforcement-learning-for |
Repo | https://github.com/haarnoja/softqlearning |
Framework | none |
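In soft Q-learning, the policy is a Boltzmann distribution over Q-values with temperature α, and the abstract's compositionality result concerns building a new policy from previously learned ones. A minimal discrete-action sketch follows. It assumes composition by averaging Q-values; the paper works with continuous actions and energy-based models, so this is only an illustration. The Q-values and `alpha` are made up.

```python
import math


def soft_value(q: list[float], alpha: float) -> float:
    """Soft state value V = alpha * log(sum_a exp(Q(a) / alpha)), computed stably."""
    top = max(q)
    return top + alpha * math.log(sum(math.exp((v - top) / alpha) for v in q))


def soft_policy(q: list[float], alpha: float) -> list[float]:
    """Energy-based (Boltzmann) policy pi(a) = exp((Q(a) - V) / alpha)."""
    v = soft_value(q, alpha)
    return [math.exp((qa - v) / alpha) for qa in q]


def compose(*qs: list[float]) -> list[float]:
    """Compose skills by averaging their Q-values action by action."""
    return [sum(vals) / len(qs) for vals in zip(*qs)]


# Hypothetical Q-values over three discrete actions for two separately trained skills.
q_a = [2.0, 1.5, 0.0]  # skill A prefers action 0 and tolerates action 1
q_b = [0.0, 1.5, 2.0]  # skill B prefers action 2 and tolerates action 1
pi = soft_policy(compose(q_a, q_b), alpha=0.5)
print(max(range(3), key=pi.__getitem__))  # 1: the action acceptable to both skills
```

Neither skill's own favourite action wins after composition; the composed policy concentrates on the action that both Q-functions rate reasonably well.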
Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages
Title | Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages |
Authors | Siddique Latif, Adnan Qayyum, Muhammad Usman, Junaid Qadir |
Abstract | Cross-lingual speech emotion recognition is an important task for practical applications. The performance of automatic speech emotion recognition systems degrades in cross-corpus scenarios, particularly in scenarios involving multiple languages or a previously unseen language such as Urdu for which limited or no data is available. In this study, we investigate the problem of cross-lingual emotion recognition for the Urdu language and contribute URDU, the first spontaneous Urdu-language speech emotion database. Evaluations are performed using three different Western languages against Urdu, and experimental results across different scenarios suggest several considerations for designing more adaptive emotion recognition systems for such low-resource languages. The results show that selecting training instances from multiple languages can deliver results comparable to the baseline, and that augmenting the training data with a fraction of the test-language data can help boost speech emotion recognition accuracy. The URDU data is publicly available for further research. |
Tasks | Emotion Recognition, Speech Emotion Recognition |
Published | 2018-12-15 |
URL | http://arxiv.org/abs/1812.10411v1 |
PDF | http://arxiv.org/pdf/1812.10411v1.pdf |
PWC | https://paperswithcode.com/paper/cross-lingual-speech-emotion-recognition-urdu |
Repo | https://github.com/siddiquelatif/URDU-Dataset |
Framework | none |
Learning sparse transformations through backpropagation
Title | Learning sparse transformations through backpropagation |
Authors | Peter Bloem |
Abstract | Many transformations in deep learning architectures are sparsely connected. When such transformations cannot be designed by hand, they can be learned, even through plain backpropagation, for instance in attention mechanisms. However, during learning, such sparse structures are often represented in a dense form, as we do not know beforehand which elements will eventually become non-zero. We introduce the adaptive, sparse hyperlayer, a method for learning a sparse transformation, parametrized sparsely: as index-tuples with associated values. To overcome the lack of gradients from such a discrete structure, we introduce a method of randomly sampling connections, and backpropagating over the randomly wired computation graph. To show that this approach allows us to train a model to competitive performance on real data, we use it to build two architectures. First, an attention mechanism for visual classification. Second, we implement a method for differentiable sorting: specifically, learning to sort unlabeled MNIST digits, given only the correct order. |
Tasks | |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09184v1 |
PDF | http://arxiv.org/pdf/1810.09184v1.pdf |
PWC | https://paperswithcode.com/paper/learning-sparse-transformations-through |
Repo | https://github.com/MaestroGraph/quicksort |
Framework | pytorch |
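The hyperlayer stores a sparse matrix as index tuples with values, and because integer indices carry no gradient, it works with continuous index tuples that are mapped onto nearby integer ones. The sketch below is a simplified, deterministic version of that idea: each continuous tuple is spread over its floor/ceil integer neighbours with normalized Gaussian weights, whereas the paper samples connections randomly. The function names and the `sigma` value are hypothetical.

```python
import math
from itertools import product


def discretize(points: list[tuple[float, float]], values: list[float],
               sigma: float) -> dict[tuple[int, int], float]:
    """Spread each continuous (row, col) index tuple over its nearest integer tuples.

    Every neighbour receives the tuple's value times a normalized Gaussian weight
    of its distance to the continuous point. With autodiff, gradients with respect
    to the continuous indices would flow through these weights.
    """
    entries: dict[tuple[int, int], float] = {}
    for (r, c), v in zip(points, values):
        rows = {math.floor(r), math.ceil(r)}
        cols = {math.floor(c), math.ceil(c)}
        neighbours = list(product(rows, cols))
        d2 = [(r - i) ** 2 + (c - j) ** 2 for i, j in neighbours]
        nearest = min(d2)
        # Shift by the nearest distance so the largest weight is exactly 1 (avoids underflow).
        weights = [math.exp(-(d - nearest) / (2 * sigma ** 2)) for d in d2]
        total = sum(weights)
        for (i, j), w in zip(neighbours, weights):
            entries[(i, j)] = entries.get((i, j), 0.0) + v * w / total
    return entries


def sparse_matvec(entries: dict[tuple[int, int], float], x: list[float],
                  n_rows: int) -> list[float]:
    """Multiply the sparse matrix {(row, col): value} by a dense vector x."""
    y = [0.0] * n_rows
    for (i, j), v in entries.items():
        y[i] += v * x[j]
    return y


W = discretize([(0.5, 0.0)], [2.0], sigma=0.5)  # halfway between rows 0 and 1
print(sparse_matvec(W, [3.0], n_rows=2))        # [3.0, 3.0]
```

Only the handful of entries near each continuous index are ever materialized, so memory stays proportional to the number of index tuples rather than to the full matrix size.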
End-to-end neural relation extraction using deep biaffine attention
Title | End-to-end neural relation extraction using deep biaffine attention |
Authors | Dat Quoc Nguyen, Karin Verspoor |
Abstract | We propose a neural network model for joint extraction of named entities and relations between them, without any hand-crafted features. The key contribution of our model is to extend a BiLSTM-CRF-based entity recognition model with a deep biaffine attention layer to model second-order interactions between latent features for relation classification, specifically attending to the role of an entity in a directional relationship. On the benchmark “relation and entity recognition” dataset CoNLL04, experimental results show that our model outperforms previous models, producing new state-of-the-art performances. |
Tasks | Relation Classification, Relation Extraction |
Published | 2018-12-29 |
URL | http://arxiv.org/abs/1812.11275v1 |
PDF | http://arxiv.org/pdf/1812.11275v1.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-neural-relation-extraction-using |
Repo | https://github.com/datquocnguyen/jointRE |
Framework | none |
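A biaffine layer scores an ordered pair of entity representations with a bilinear term plus a linear term. The sketch assumes the common form s = h_head^T U h_tail + w^T [h_head; h_tail] + b for a single relation label, with hand-picked toy parameters. In a full model there would be one such scorer (or one slice of a tensor U) per relation label, and the representations would come from the BiLSTM-CRF encoder; `biaffine_score` is a hypothetical name.

```python
def biaffine_score(h_head: list[float], h_tail: list[float],
                   U: list[list[float]], w: list[float], b: float) -> float:
    """s = h_head^T U h_tail + w^T [h_head; h_tail] + b for one relation label.

    The bilinear term models second-order interactions between the two entity
    representations. Because U need not be symmetric, swapping head and tail
    changes the score, which is what makes the relation directional.
    """
    bilinear = sum(h_head[i] * U[i][j] * h_tail[j]
                   for i in range(len(h_head)) for j in range(len(h_tail)))
    linear = sum(wk * xk for wk, xk in zip(w, h_head + h_tail))
    return bilinear + linear + b


# Toy 2-D entity representations and hand-picked parameters.
e1, e2 = [1.0, 0.0], [0.0, 1.0]
U = [[0.0, 2.0], [0.0, 0.0]]
w = [0.0, 0.0, 0.0, 0.0]
print(biaffine_score(e1, e2, U, w, 0.5))  # 2.5: e1 as head, e2 as tail
print(biaffine_score(e2, e1, U, w, 0.5))  # 0.5: roles swapped
```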
Simple Unsupervised Keyphrase Extraction using Sentence Embeddings
Title | Simple Unsupervised Keyphrase Extraction using Sentence Embeddings |
Authors | Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl, Martin Jaggi |
Abstract | Keyphrase extraction is the task of automatically selecting a small set of phrases that best describe a given free text document. Supervised keyphrase extraction requires large amounts of labeled training data and generalizes very poorly outside the domain of the training data. At the same time, unsupervised systems have poor accuracy, and often do not generalize well, as they require the input document to belong to a larger corpus also given as input. Addressing these drawbacks, in this paper, we tackle keyphrase extraction from single documents with EmbedRank: a novel unsupervised method that leverages sentence embeddings. EmbedRank achieves higher F-scores than graph-based state-of-the-art systems on standard datasets and is suitable for real-time processing of large amounts of Web data. With EmbedRank, we also explicitly increase coverage and diversity among the selected keyphrases by introducing an embedding-based maximal marginal relevance (MMR) for new phrases. A user study including over 200 votes showed that, although reducing the phrases’ semantic overlap leads to no gains in F-score, our high diversity selection is preferred by humans. |
Tasks | Sentence Embeddings |
Published | 2018-01-13 |
URL | http://arxiv.org/abs/1801.04470v3 |
PDF | http://arxiv.org/pdf/1801.04470v3.pdf |
PWC | https://paperswithcode.com/paper/simple-unsupervised-keyphrase-extraction |
Repo | https://github.com/swisscom/ai-research-keyphrase-extraction |
Framework | none |
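The diversity step relies on maximal marginal relevance (MMR) computed over embeddings. This minimal sketch assumes plain cosine similarity, a hypothetical trade-off parameter `lam`, and toy 2-D vectors in place of real sentence and phrase embeddings; the paper's exact similarity normalization may differ.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(doc_emb: list[float], cand_embs: list[list[float]], k: int,
               lam: float = 0.5) -> list[int]:
    """Greedy maximal marginal relevance over candidate phrase embeddings.

    Each step picks the candidate most similar to the document while being least
    similar to the phrases already selected. Returns candidate indices in order.
    """
    selected: list[int] = []
    remaining = list(range(len(cand_embs)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(cand_embs[i], doc_emb)
            redundancy = max((cosine(cand_embs[i], cand_embs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


doc = [1.0, 1.0]
cands = [[1.0, 0.9], [1.0, 0.95], [0.2, 1.0]]  # candidates 0 and 1 are near-duplicates
print(mmr_select(doc, cands, k=2, lam=1.0))    # [1, 0]: relevance only
print(mmr_select(doc, cands, k=2, lam=0.5))    # [1, 2]: the near-duplicate is skipped
```

Setting `lam` to 1 recovers plain relevance ranking; lowering it trades a little similarity to the document for less overlap among the selected phrases.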
Improving Response Selection in Multi-Turn Dialogue Systems by Incorporating Domain Knowledge
Title | Improving Response Selection in Multi-Turn Dialogue Systems by Incorporating Domain Knowledge |
Authors | Debanjan Chaudhuri, Agustinus Kristiadi, Jens Lehmann, Asja Fischer |
Abstract | Building systems that can communicate with humans is a core problem in Artificial Intelligence. This work proposes a novel neural network architecture for response selection in an end-to-end multi-turn conversational dialogue setting. The architecture applies context level attention and incorporates additional external knowledge provided by descriptions of domain-specific words. It uses a bi-directional Gated Recurrent Unit (GRU) for encoding context and responses and learns to attend over the context words given the latent response representation and vice versa. In addition, it incorporates external domain-specific information using another GRU for encoding the domain keyword descriptions. This allows better representation of domain-specific keywords in responses and hence improves the overall performance. Experimental results show that our model outperforms all other state-of-the-art methods for response selection in multi-turn conversations. |
Tasks | |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03194v3 |
PDF | http://arxiv.org/pdf/1809.03194v3.pdf |
PWC | https://paperswithcode.com/paper/improving-response-selection-in-multi-turn |
Repo | https://github.com/SmartDataAnalytics/AK-DE-biGRU |
Framework | pytorch |
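The model attends over the context words given the latent response representation, and vice versa. This minimal dot-product attention sketch assumes plain vectors in place of the bi-directional GRU states. The dot-product scoring function is an assumption and may differ from the paper's; `attend` is a hypothetical helper.

```python
import math


def attend(query: list[float], keys: list[list[float]]) -> tuple[list[float], list[float]]:
    """Dot-product attention: softmax(key . query) weights and the weighted sum of keys."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # shift for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    summary = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(len(keys[0]))]
    return weights, summary


# Hypothetical 2-D vectors standing in for GRU states of three context words.
context = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
response = [2.0, 0.0]
weights, summary = attend(response, context)  # context words attended given the response
```

The "vice versa" direction is the same call with roles swapped: a context vector as the query and the response's word states as the keys.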