May 7, 2019

2632 words 13 mins read

Paper Group AWR 71

CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples. Context Gates for Neural Machine Translation. Using Fast Weights to Attend to the Recent Past. Dynamic Network Surgery for Efficient DNNs. The Parallel Knowledge Gradient Method for Batch Bayesian Optimization. Controlling Perceptual Factors in Neural Style Transfer. …

CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples

Title CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples
Authors Filip Radenović, Giorgos Tolias, Ondřej Chum
Abstract Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks. However, this achievement is preceded by extreme manual annotation in order to perform either training from scratch or fine-tuning for the target task. In this work, we propose to fine-tune CNN for image retrieval from a large collection of unordered images in a fully automated manner. We employ state-of-the-art retrieval and Structure-from-Motion (SfM) methods to obtain 3D models, which are used to guide the selection of the training data for CNN fine-tuning. We show that both hard positive and hard negative examples enhance the final performance in particular object retrieval with compact codes.
Tasks Image Retrieval
Published 2016-04-08
URL http://arxiv.org/abs/1604.02426v3
PDF http://arxiv.org/pdf/1604.02426v3.pdf
PWC https://paperswithcode.com/paper/cnn-image-retrieval-learns-from-bow
Repo https://github.com/filipradenovic/cnnimageretrieval
Framework pytorch
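
The fine-tuning objective in this line of work is typically a contrastive loss over matching and non-matching image pairs, with the non-matching pairs re-mined as the hardest negatives for each query. Below is a minimal, hedged sketch of that idea in PyTorch; the random descriptors, margin value, and mining pool are illustrative assumptions, not the authors' exact training pipeline (which draws pairs from SfM 3D models).

```python
import torch
import torch.nn.functional as F

def contrastive_loss(desc_a, desc_b, is_match, margin=0.7):
    """Contrastive loss over L2-normalized descriptors of an image pair."""
    d = F.pairwise_distance(desc_a, desc_b)
    loss_match = is_match * d.pow(2)
    loss_non_match = (1 - is_match) * F.relu(margin - d).pow(2)
    return 0.5 * (loss_match + loss_non_match).mean()

def mine_hard_negatives(query_desc, pool_desc, num_neg=5):
    """Pick the pool images closest to the query as hard negatives.
    (In the paper, negatives additionally come from different 3D models;
    that constraint is omitted here for brevity.)"""
    dists = torch.cdist(query_desc, pool_desc).squeeze(0)
    return torch.topk(dists, k=num_neg, largest=False).indices

# toy usage with random vectors standing in for CNN global descriptors
q = F.normalize(torch.randn(1, 512), dim=1)
pool = F.normalize(torch.randn(100, 512), dim=1)
neg_idx = mine_hard_negatives(q, pool)
loss = contrastive_loss(q.repeat(len(neg_idx), 1), pool[neg_idx],
                        is_match=torch.zeros(len(neg_idx)))
```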

Context Gates for Neural Machine Translation

Title Context Gates for Neural Machine Translation
Authors Zhaopeng Tu, Yang Liu, Zhengdong Lu, Xiaohua Liu, Hang Li
Abstract In neural machine translation (NMT), generation of a target word depends on both source and target contexts. We find that source contexts have a direct impact on the adequacy of a translation while target contexts affect the fluency. Intuitively, generation of a content word should rely more on the source context and generation of a functional word should rely more on the target context. Due to the lack of effective control over the influence from source and target contexts, conventional NMT tends to yield fluent but inadequate translations. To address this problem, we propose context gates which dynamically control the ratios at which source and target contexts contribute to the generation of target words. In this way, we can enhance both the adequacy and fluency of NMT with more careful control of the information flow from contexts. Experiments show that our approach significantly improves upon a standard attention-based NMT system by +2.3 BLEU points.
Tasks Machine Translation
Published 2016-08-22
URL http://arxiv.org/abs/1608.06043v3
PDF http://arxiv.org/pdf/1608.06043v3.pdf
PWC https://paperswithcode.com/paper/context-gates-for-neural-machine-translation
Repo https://github.com/tuzhaopeng/nmt
Framework none
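
As a rough illustration (not the authors' exact formulation), a context gate can be implemented as a sigmoid gate computed from the decoder state together with the source and target contexts, which then rescales how much of each context enters the next prediction. The dimensions and element-wise gating granularity below are assumptions.

```python
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    """Element-wise gate z balancing source context s and target context t."""
    def __init__(self, src_dim, tgt_dim, hid_dim):
        super().__init__()
        self.gate = nn.Linear(src_dim + tgt_dim + hid_dim, hid_dim)

    def forward(self, source_ctx, target_ctx, dec_state):
        z = torch.sigmoid(
            self.gate(torch.cat([source_ctx, target_ctx, dec_state], dim=-1)))
        # z -> weight on the source context, (1 - z) -> weight on the target context
        return z * source_ctx + (1 - z) * target_ctx

gate = ContextGate(src_dim=512, tgt_dim=512, hid_dim=512)
mixed = gate(torch.randn(8, 512), torch.randn(8, 512), torch.randn(8, 512))
```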

Using Fast Weights to Attend to the Recent Past

Title Using Fast Weights to Attend to the Recent Past
Authors Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu
Abstract Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among inputs, outputs and payoffs. There is no good reason for this restriction. Synapses have dynamics at many different time-scales and this suggests that artificial neural networks might benefit from variables that change slower than activities but much faster than the standard weights. These “fast weights” can be used to store temporary memories of the recent past and they provide a neurally plausible way of implementing the type of attention to the past that has recently proved very helpful in sequence-to-sequence models. By using fast weights we can avoid the need to store copies of neural activity patterns.
Tasks
Published 2016-10-20
URL http://arxiv.org/abs/1610.06258v3
PDF http://arxiv.org/pdf/1610.06258v3.pdf
PWC https://paperswithcode.com/paper/using-fast-weights-to-attend-to-the-recent
Repo https://github.com/GokuMohandas/fast-weights
Framework tf
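
A minimal sketch of the fast-weight update and the inner "settling" loop, assuming a plain tanh RNN cell; layer normalization and the paper's exact hyperparameters are omitted, and the decay/learning rates below are placeholders.

```python
import numpy as np

def fast_weights_step(h, x, A, W_h, W_x, lam=0.95, eta=0.5, inner_steps=3):
    """One time step of an RNN augmented with a fast-weight matrix A."""
    # preliminary state from the slow weights
    boundary = np.tanh(W_h @ h + W_x @ x)
    # decay the fast weights and store the current hidden state in them
    A = lam * A + eta * np.outer(h, h)
    # inner loop: the fast weights let the state attend to the recent past
    hs = boundary
    for _ in range(inner_steps):
        hs = np.tanh(W_h @ h + W_x @ x + A @ hs)
    return hs, A

dim, in_dim = 64, 16
rng = np.random.default_rng(0)
h, A = np.zeros(dim), np.zeros((dim, dim))
W_h = rng.normal(scale=0.05, size=(dim, dim))
W_x = rng.normal(scale=0.05, size=(dim, in_dim))
h, A = fast_weights_step(h, rng.normal(size=in_dim), A, W_h, W_x)
```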

Dynamic Network Surgery for Efficient DNNs

Title Dynamic Network Surgery for Efficient DNNs
Authors Yiwen Guo, Anbang Yao, Yurong Chen
Abstract Deep learning has become a ubiquitous technology for improving machine intelligence. However, most existing deep models are structurally very complex, making them difficult to deploy on mobile platforms with limited computational power. In this paper, we propose a novel network compression method called dynamic network surgery, which can remarkably reduce network complexity by pruning connections on the fly. Unlike previous methods that accomplish this task greedily, we incorporate connection splicing into the whole process to avoid incorrect pruning, turning compression into continual network maintenance. The effectiveness of our method is demonstrated experimentally: without any accuracy loss, it compresses the number of parameters in LeNet-5 and AlexNet by factors of 108× and 17.7× respectively, outperforming the recent pruning method by considerable margins. Code and some models are available at https://github.com/yiwenguo/Dynamic-Network-Surgery.
Tasks
Published 2016-08-16
URL http://arxiv.org/abs/1608.04493v2
PDF http://arxiv.org/pdf/1608.04493v2.pdf
PWC https://paperswithcode.com/paper/dynamic-network-surgery-for-efficient-dnns
Repo https://github.com/yiwenguo/Dynamic-Network-Surgery
Framework none
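
The core of the method is to keep every weight but multiply it by a binary mask that is recomputed during training: weights whose magnitude falls below a threshold are pruned (mask 0), and pruned weights whose magnitude grows back above a second threshold are spliced back in (mask 1). A hedged numpy sketch with illustrative thresholds:

```python
import numpy as np

def update_mask(weights, mask, prune_thr, splice_thr):
    """Surgery-style mask update; prune_thr < splice_thr gives a hysteresis
    band so entries do not flip back and forth on every step."""
    mag = np.abs(weights)
    mask = mask.copy()
    mask[mag < prune_thr] = 0.0   # prune weak connections
    mask[mag > splice_thr] = 1.0  # splice important ones back in
    return mask

# during training, gradients keep updating the dense weights, while the
# forward pass uses weights * mask; here we only demonstrate the mask update
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(256, 256))
mask = np.ones_like(w)
mask = update_mask(w, mask, prune_thr=0.05, splice_thr=0.08)
print("kept fraction:", mask.mean())
```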

The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

Title The Parallel Knowledge Gradient Method for Batch Bayesian Optimization
Authors Jian Wu, Peter I. Frazier
Abstract In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm — the parallel knowledge gradient method. By construction, this method provides the one-step Bayes-optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy.
Tasks
Published 2016-06-14
URL http://arxiv.org/abs/1606.04414v4
PDF http://arxiv.org/pdf/1606.04414v4.pdf
PWC https://paperswithcode.com/paper/the-parallel-knowledge-gradient-method-for
Repo https://github.com/wujian16/Cornell-MOE
Framework none
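
Conceptually, the parallel (batch) knowledge gradient scores a candidate batch by how much the maximum of the GP posterior mean is expected to increase after observing that batch, estimated with Monte Carlo "fantasy" observations. The sketch below uses scikit-learn's Gaussian process on a small grid purely for illustration; it is not the authors' efficient estimator, and the grid, kernel, and sample counts are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def parallel_kg(gp, X_obs, y_obs, X_batch, X_grid, n_fantasies=32, seed=0):
    """Monte Carlo estimate of the knowledge gradient for a batch of points."""
    rng = np.random.RandomState(seed)
    best_now = gp.predict(X_grid).max()
    gains = []
    for _ in range(n_fantasies):
        # sample a plausible outcome for the whole batch from the posterior
        y_fantasy = gp.sample_y(X_batch, random_state=rng.randint(1 << 30)).ravel()
        gp_f = GaussianProcessRegressor(kernel=gp.kernel_, optimizer=None)
        gp_f.fit(np.vstack([X_obs, X_batch]), np.concatenate([y_obs, y_fantasy]))
        gains.append(gp_f.predict(X_grid).max() - best_now)
    return float(np.mean(gains))

X_obs = np.random.uniform(-2, 2, size=(5, 1))
y_obs = -(X_obs ** 2).ravel()                    # toy objective to maximize
gp = GaussianProcessRegressor().fit(X_obs, y_obs)
X_grid = np.linspace(-2, 2, 101)[:, None]
print(parallel_kg(gp, X_obs, y_obs, np.array([[0.1], [1.5]]), X_grid))
```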

Controlling Perceptual Factors in Neural Style Transfer

Title Controlling Perceptual Factors in Neural Style Transfer
Authors Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, Aaron Hertzmann, Eli Shechtman
Abstract Neural Style Transfer has shown very exciting results enabling new forms of image manipulation. Here we extend the existing method to introduce control over spatial location, colour information and across spatial scale. We demonstrate how this enhances the method by allowing high-resolution controlled stylisation and helps to alleviate common failure cases such as applying ground textures to sky regions. Furthermore, by decomposing style into these perceptual factors we enable the combination of style information from multiple sources to generate new, perceptually appealing styles from existing ones. We also describe how these methods can be used to more efficiently produce large size, high-quality stylisation. Finally we show how the introduced control measures can be applied in recent methods for Fast Neural Style Transfer.
Tasks Style Transfer
Published 2016-11-23
URL http://arxiv.org/abs/1611.07865v2
PDF http://arxiv.org/pdf/1611.07865v2.pdf
PWC https://paperswithcode.com/paper/controlling-perceptual-factors-in-neural
Repo https://github.com/leongatys/NeuralImageSynthesis
Framework torch
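
Spatial control in this line of work is commonly implemented with "guided" Gram matrices: feature maps are multiplied by a per-region guidance mask before the Gram statistics are computed, so each style is only matched inside its region. A hedged PyTorch sketch, with the mask handling and normalization simplified:

```python
import torch

def guided_gram(features, guide):
    """Gram matrix of CNN features restricted to a spatial guidance mask.

    features: (C, H, W) feature map from a style layer
    guide:    (H, W) soft mask in [0, 1] selecting the region of interest
    """
    c, h, w = features.shape
    masked = features * guide                # broadcast mask over channels
    flat = masked.reshape(c, h * w)
    return flat @ flat.t() / guide.sum().clamp(min=1.0)

feats = torch.randn(256, 32, 32)
sky_mask = torch.zeros(32, 32)
sky_mask[:16] = 1.0                          # pretend the top half is sky
g_sky = guided_gram(feats, sky_mask)         # matched against the sky style's Gram
```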

Consensus Attention-based Neural Networks for Chinese Reading Comprehension

Title Consensus Attention-based Neural Networks for Chinese Reading Comprehension
Authors Yiming Cui, Ting Liu, Zhipeng Chen, Shijin Wang, Guoping Hu
Abstract Reading comprehension has seen a boom in recent NLP research. Several institutes have released Cloze-style reading comprehension datasets, which have greatly accelerated research on machine comprehension. In this work, we first present Chinese reading comprehension datasets, consisting of a People Daily news dataset and a Children’s Fairy Tale (CFT) dataset. We also propose a consensus attention-based neural network architecture to tackle the Cloze-style reading comprehension problem, which aims to induce a consensus attention over every word in the query. Experimental results show that the proposed neural network significantly outperforms state-of-the-art baselines on several public datasets. Furthermore, we set up a baseline for the Chinese reading comprehension task, which we hope will speed up future research.
Tasks Reading Comprehension
Published 2016-07-08
URL http://arxiv.org/abs/1607.02250v3
PDF http://arxiv.org/pdf/1607.02250v3.pdf
PWC https://paperswithcode.com/paper/consensus-attention-based-neural-networks-for
Repo https://github.com/ymcui/Chinese-Cloze-RC
Framework none
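
The consensus step can be pictured as computing a document-level attention distribution for each query word and then merging those distributions into a single one (for example by averaging, summing, or taking an element-wise max) before predicting the blank. A hedged sketch of that merging step, with dot-product attention standing in for the model's actual scoring:

```python
import torch

def consensus_attention(doc_states, query_states, mode="avg"):
    """Merge per-query-word attention over the document into one distribution.

    doc_states:   (doc_len, dim)   document token representations
    query_states: (query_len, dim) query token representations
    """
    scores = query_states @ doc_states.t()      # (query_len, doc_len)
    attn = torch.softmax(scores, dim=-1)        # attention for each query word
    if mode == "avg":
        merged = attn.mean(dim=0)
    elif mode == "max":
        merged = attn.max(dim=0).values
    else:                                       # "sum"
        merged = attn.sum(dim=0)
    return torch.softmax(merged, dim=-1)        # renormalized consensus

doc = torch.randn(200, 128)
query = torch.randn(20, 128)
p_answer = consensus_attention(doc, query)      # (200,) distribution over doc tokens
```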

Massively Multilingual Word Embeddings

Title Massively Multilingual Word Embeddings
Authors Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith
Abstract We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal for evaluation that will facilitate further research in this area, along with open-source releases of all our methods.
Tasks Multilingual Word Embeddings, Text Categorization, Word Embeddings
Published 2016-02-05
URL http://arxiv.org/abs/1602.01925v2
PDF http://arxiv.org/pdf/1602.01925v2.pdf
PWC https://paperswithcode.com/paper/massively-multilingual-word-embeddings
Repo https://github.com/idiap/mhan
Framework none
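
A hedged sketch of the CCA-style idea behind multiCCA: use a bilingual dictionary to align each language's monolingual embeddings with the English embeddings in a shared space. The toy vectors and the synthetic "dictionary" pairing below are placeholders; the actual method learns per-language projections from real dictionaries and monolingual corpora.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
dim = 50
# monolingual embeddings for dictionary-paired words (row i of each matrix
# corresponds to one translation pair, e.g. "dog" <-> "perro")
en_vecs = rng.normal(size=(500, dim))
es_vecs = en_vecs @ rng.normal(size=(dim, dim)) + 0.1 * rng.normal(size=(500, dim))

cca = CCA(n_components=40, max_iter=1000)
cca.fit(es_vecs, en_vecs)

def to_shared_space(vec_es):
    """Project a Spanish word vector into the shared (English-anchored) space."""
    return cca.transform(vec_es.reshape(1, -1))[0]

print(to_shared_space(es_vecs[0]).shape)   # (40,)
```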

Interacting Conceptual Spaces

Title Interacting Conceptual Spaces
Authors Josef Bolt, Bob Coecke, Fabrizio Genovese, Martha Lewis, Daniel Marsden, Robin Piedeleu
Abstract We propose applying the categorical compositional scheme of [6] to conceptual space models of cognition. In order to do this we introduce the category of convex relations as a new setting for categorical compositional semantics, emphasizing the convex structure important to conceptual space applications. We show how conceptual spaces for composite types such as adjectives and verbs can be constructed. We illustrate this new model on detailed examples.
Tasks
Published 2016-08-04
URL http://arxiv.org/abs/1608.01402v1
PDF http://arxiv.org/pdf/1608.01402v1.pdf
PWC https://paperswithcode.com/paper/interacting-conceptual-spaces
Repo https://github.com/1230113202/NV-JM-DD
Framework pytorch

Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates

Title Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates
Authors Hiroaki Hayashi, Jayanth Koushik, Graham Neubig
Abstract Adaptive gradient methods for stochastic optimization adjust the learning rate for each parameter locally. However, there is also a global learning rate which must be tuned in order to get the best performance. In this paper, we present a new algorithm that adapts the learning rate locally for each parameter separately, and also globally for all parameters together. Specifically, we modify Adam, a popular method for training deep learning models, with a coefficient that captures properties of the objective function. Empirically, we show that our method, which we call Eve, outperforms Adam and other popular methods in training deep neural networks, like convolutional neural networks for image classification, and recurrent neural networks for language tasks.
Tasks Image Classification, Stochastic Optimization
Published 2016-11-04
URL http://arxiv.org/abs/1611.01505v3
PDF http://arxiv.org/pdf/1611.01505v3.pdf
PWC https://paperswithcode.com/paper/eve-a-gradient-based-optimization-method-with
Repo https://github.com/muupan/chainer-eve
Framework none
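
A hedged, simplified sketch of the idea: keep the usual per-parameter Adam statistics, but also track how fast the objective itself is changing and divide the global learning rate by a smoothed, clipped version of that relative change. The symbols `beta3` and the clipping constant below are illustrative; see the paper for the exact schedule.

```python
import numpy as np

class SimplifiedEve:
    """Adam with a globally adapted learning rate driven by objective changes."""
    def __init__(self, lr=1e-3, betas=(0.9, 0.999), beta3=0.999, clip=10.0, eps=1e-8):
        self.lr, self.b1, self.b2, self.b3 = lr, betas[0], betas[1], beta3
        self.clip, self.eps = clip, eps
        self.m = self.v = None
        self.d, self.f_prev, self.t = 1.0, None, 0

    def step(self, params, grads, loss):
        self.t += 1
        if self.m is None:
            self.m, self.v = np.zeros_like(params), np.zeros_like(params)
        # global feedback term: smoothed relative change of the (positive) loss
        if self.f_prev is not None:
            rel = abs(loss - self.f_prev) / min(loss, self.f_prev)
            rel = np.clip(rel, 1.0 / self.clip, self.clip)
            self.d = self.b3 * self.d + (1 - self.b3) * rel
        self.f_prev = loss
        # ordinary Adam moment updates with bias correction
        self.m = self.b1 * self.m + (1 - self.b1) * grads
        self.v = self.b2 * self.v + (1 - self.b2) * grads ** 2
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return params - (self.lr / self.d) * m_hat / (np.sqrt(v_hat) + self.eps)

opt = SimplifiedEve()
w = np.array([5.0])
for _ in range(100):
    loss, grad = float(w[0] ** 2), 2 * w
    w = opt.step(w, grad, loss)
```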

Data Programming: Creating Large Training Sets, Quickly

Title Data Programming: Creating Large Training Sets, Quickly
Authors Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré
Abstract Large labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive part of applying machine learning. We therefore propose a paradigm for the programmatic creation of training sets called data programming in which users express weak supervision strategies or domain heuristics as labeling functions, which are programs that label subsets of the data, but that are noisy and may conflict. We show that by explicitly representing this training set labeling process as a generative model, we can “denoise” the generated training set, and establish theoretically that we can recover the parameters of these generative models in a handful of settings. We then show how to modify a discriminative loss function to make it noise-aware, and demonstrate our method over a range of discriminative models including logistic regression and LSTMs. Experimentally, on the 2014 TAC-KBP Slot Filling challenge, we show that data programming would have led to a new winning score, and also show that applying data programming to an LSTM model leads to a TAC-KBP score almost 6 F1 points over a state-of-the-art LSTM baseline (and into second place in the competition). Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable.
Tasks Slot Filling
Published 2016-05-25
URL http://arxiv.org/abs/1605.07723v3
PDF http://arxiv.org/pdf/1605.07723v3.pdf
PWC https://paperswithcode.com/paper/data-programming-creating-large-training-sets
Repo https://github.com/HazyResearch/metal
Framework pytorch
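
To make the workflow concrete, here is a hedged toy sketch: labeling functions are plain Python functions that vote {-1, 0, +1} on each example (0 = abstain), and the votes are aggregated into probabilistic training labels. The paper fits a generative model over the labeling functions' accuracies; the sketch below falls back to a simple weighted vote, and the example heuristics are invented for illustration.

```python
import numpy as np

ABSTAIN, NEG, POS = 0, -1, 1

def lf_contains_refund(text):        # domain heuristic as a labeling function
    return POS if "refund" in text.lower() else ABSTAIN

def lf_short_message(text):          # noisy heuristic: very short messages look negative
    return NEG if len(text.split()) < 4 else ABSTAIN

def lf_exclamation(text):
    return POS if "!" in text else ABSTAIN

LFS = [lf_contains_refund, lf_short_message, lf_exclamation]

def label_matrix(texts):
    return np.array([[lf(t) for lf in LFS] for t in texts])

def probabilistic_labels(L, lf_weights=None):
    """Weighted vote over non-abstaining labeling functions -> P(y = POS)."""
    w = np.ones(L.shape[1]) if lf_weights is None else np.asarray(lf_weights)
    score = (L * w).sum(axis=1)
    coverage = (L != ABSTAIN) @ w
    return 0.5 + 0.5 * np.divide(score, coverage,
                                 out=np.zeros_like(score, dtype=float),
                                 where=coverage > 0)

texts = ["I want a refund now!", "ok thanks", "great product, no complaints"]
L = label_matrix(texts)
print(probabilistic_labels(L))       # noisy training labels for a discriminative model
```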

Neural Architectures for Named Entity Recognition

Title Neural Architectures for Named Entity Recognition
Authors Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer
Abstract State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available. In this paper, we introduce two new neural architectures—one based on bidirectional LSTMs and conditional random fields, and the other that constructs and labels segments using a transition-based approach inspired by shift-reduce parsers. Our models rely on two sources of information about words: character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora. Our models obtain state-of-the-art performance in NER in four languages without resorting to any language-specific knowledge or resources such as gazetteers.
Tasks Named Entity Recognition
Published 2016-03-04
URL http://arxiv.org/abs/1603.01360v3
PDF http://arxiv.org/pdf/1603.01360v3.pdf
PWC https://paperswithcode.com/paper/neural-architectures-for-named-entity
Repo https://github.com/karlstratos/mention2vec
Framework none
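
A hedged sketch of the first architecture's backbone: word embeddings concatenated with a character-level representation, fed through a bidirectional LSTM and projected to per-token tag scores. The CRF transition layer and the paper's exact character model are omitted for brevity, and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, char_vocab, n_tags,
                 word_dim=100, char_dim=25, hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_dim, bidirectional=True, batch_first=True)
        self.lstm = nn.LSTM(word_dim + 2 * char_dim, hidden // 2,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(hidden, n_tags)    # a CRF would normally sit on top

    def forward(self, words, chars):
        # chars: (batch, seq_len, word_len); encode each word's characters
        b, s, w = chars.shape
        _, (h, _) = self.char_lstm(self.char_emb(chars.view(b * s, w)))
        char_repr = torch.cat([h[0], h[1]], dim=-1).view(b, s, -1)
        x = torch.cat([self.word_emb(words), char_repr], dim=-1)
        h_seq, _ = self.lstm(x)
        return self.out(h_seq)                  # (batch, seq_len, n_tags) emission scores

model = BiLSTMTagger(vocab_size=10000, char_vocab=80, n_tags=9)
scores = model(torch.randint(0, 10000, (2, 12)), torch.randint(0, 80, (2, 12, 8)))
```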

Reactive Collision Avoidance using Evolutionary Neural Networks

Title Reactive Collision Avoidance using Evolutionary Neural Networks
Authors Hesham Eraqi, Youssef EmadEldin, Mohamed Moustafa
Abstract Collision avoidance systems can play a vital role in reducing the number of accidents and saving human lives. In this paper, we introduce and validate a novel method for reactive collision avoidance of vehicles using evolutionary neural networks (ENN). A single front-facing rangefinder sensor is the only input required by our method. The training process and the analysis and validation of the proposed method are carried out in simulation. Extensive experiments are conducted to analyse the proposed method and evaluate its performance. First, we test the ability to learn collision avoidance on a static free track. Second, we analyse the effect of the rangefinder sensor resolution on the learning process. Third, we test the ability of vehicles to learn collision avoidance individually and simultaneously. Finally, we test the generality of the proposed method using a more realistic and powerful simulation environment (CarMaker), a camera as an alternative input sensor, and lane keeping as an extra feature to learn. The results are encouraging: the proposed method successfully allows vehicles to learn collision avoidance in scenarios unseen during training, and it generalizes well when the input sensor, the simulator, or the task to be learned is changed.
Tasks
Published 2016-09-27
URL http://arxiv.org/abs/1609.08414v1
PDF http://arxiv.org/pdf/1609.08414v1.pdf
PWC https://paperswithcode.com/paper/reactive-collision-avoidance-using
Repo https://github.com/heshameraqi/GA-NN-Car
Framework none
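
A hedged toy sketch of the evolutionary-neural-network idea: a small feed-forward network maps rangefinder readings to a steering command, and a genetic algorithm (selection, crossover, mutation) searches over the flattened weight vectors instead of using gradients. The fitness function here is a stand-in; in the paper it is obtained by driving the vehicle in simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
N_RAYS, HIDDEN = 9, 8
N_WEIGHTS = N_RAYS * HIDDEN + HIDDEN             # weights of a tiny 2-layer controller

def steer(genome, rays):
    """Map rangefinder distances to a steering command in [-1, 1]."""
    w1 = genome[:N_RAYS * HIDDEN].reshape(N_RAYS, HIDDEN)
    w2 = genome[N_RAYS * HIDDEN:]
    return float(np.tanh(np.tanh(rays @ w1) @ w2))

def fitness(genome):
    """Placeholder fitness: steer toward the side with more open space.
    In practice this would be distance driven without collision in the simulator."""
    score = 0.0
    for _ in range(50):
        rays = rng.uniform(0.1, 1.0, N_RAYS)
        desired = np.sign(rays[:4].sum() - rays[5:].sum())
        score -= abs(steer(genome, rays) - desired)
    return score

pop = rng.normal(size=(40, N_WEIGHTS))
for gen in range(30):
    scores = np.array([fitness(g) for g in pop])
    parents = pop[np.argsort(scores)[-10:]]                  # keep the fittest
    children = []
    while len(children) < len(pop):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(N_WEIGHTS)
        child = np.concatenate([a[:cut], b[cut:]])           # one-point crossover
        child += rng.normal(scale=0.1, size=N_WEIGHTS) * (rng.random(N_WEIGHTS) < 0.1)
        children.append(child)
    pop = np.array(children)
print("best fitness:", scores.max())
```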

Recurrent Memory Networks for Language Modeling

Title Recurrent Memory Networks for Language Modeling
Authors Ke Tran, Arianna Bisazza, Christof Monz
Abstract Recurrent Neural Networks (RNNs) have obtained excellent results in many natural language processing (NLP) tasks. However, understanding and interpreting the source of this success remains a challenge. In this paper, we propose the Recurrent Memory Network (RMN), a novel RNN architecture that not only amplifies the power of RNNs but also facilitates our understanding of their internal functioning and allows us to discover underlying patterns in data. We demonstrate the power of RMN on language modeling and sentence completion tasks. On language modeling, RMN outperforms Long Short-Term Memory (LSTM) networks on three large German, Italian, and English datasets. Additionally, we perform an in-depth analysis of various linguistic dimensions that RMN captures. On the Sentence Completion Challenge, for which it is essential to capture sentence coherence, our RMN obtains 69.2% accuracy, surpassing the previous state of the art by a large margin.
Tasks Language Modelling
Published 2016-01-06
URL http://arxiv.org/abs/1601.01272v2
PDF http://arxiv.org/pdf/1601.01272v2.pdf
PWC https://paperswithcode.com/paper/recurrent-memory-networks-for-language
Repo https://github.com/simonjisu/NMT
Framework pytorch
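
A hedged sketch of the memory-block idea: at each position, the LSTM hidden state attends over the representations of the most recent n words, and the attention-weighted summary is combined with the hidden state before prediction. The dimensions, memory size, and combination function below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MemoryBlock(nn.Module):
    """Attend over the last `mem_size` word representations with the LSTM state."""
    def __init__(self, dim, mem_size=15):
        super().__init__()
        self.mem_size = mem_size
        self.query = nn.Linear(dim, dim)

    def forward(self, hidden_t, recent_word_vecs):
        # recent_word_vecs: (mem_size, dim) embeddings of the most recent words
        scores = recent_word_vecs @ self.query(hidden_t)      # (mem_size,)
        attn = torch.softmax(scores, dim=0)
        memory = attn @ recent_word_vecs                       # weighted summary
        return hidden_t + memory                               # combined representation

dim = 128
block = MemoryBlock(dim)
h_t = torch.randn(dim)                    # current LSTM hidden state
recent = torch.randn(15, dim)             # embeddings of the 15 preceding words
out = block(h_t, recent)                  # fed to the softmax over the vocabulary
```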

Smooth Imitation Learning for Online Sequence Prediction

Title Smooth Imitation Learning for Online Sequence Prediction
Authors Hoang M. Le, Andrew Kang, Yisong Yue, Peter Carr
Abstract We study the problem of smooth imitation learning for online sequence prediction, where the goal is to train a policy that can smoothly imitate demonstrated behavior in a dynamic and continuous environment in response to online, sequential context input. Since the mapping from context to behavior is often complex, we take a learning reduction approach to reduce smooth imitation learning to a regression problem using complex function classes that are regularized to ensure smoothness. We present a learning meta-algorithm that achieves fast and stable convergence to a good policy. Our approach enjoys several attractive properties, including being fully deterministic, employing an adaptive learning rate that can provably yield larger policy improvements compared to previous approaches, and the ability to ensure stable convergence. Our empirical results demonstrate significant performance gains over previous approaches.
Tasks Imitation Learning
Published 2016-06-03
URL http://arxiv.org/abs/1606.00968v1
PDF http://arxiv.org/pdf/1606.00968v1.pdf
PWC https://paperswithcode.com/paper/smooth-imitation-learning-for-online-sequence
Repo https://github.com/lucianacendon/simile
Framework none
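
A hedged sketch of the smooth-policy idea: a learned regressor proposes an action from the current context, and the executed action is a convex blend of that proposal with the previous action so the output trajectory stays smooth. The regressor, blend weight, and toy features below are placeholders rather than the paper's full meta-algorithm.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

class SmoothPolicy:
    """Blend a context-driven prediction with the previous action for smoothness."""
    def __init__(self, regressor, smooth_lambda=0.8):
        self.regressor = regressor
        self.lam = smooth_lambda            # weight on the new prediction

    def act(self, context, prev_action):
        proposal = float(self.regressor.predict(context.reshape(1, -1))[0])
        return self.lam * proposal + (1 - self.lam) * prev_action

# toy setup: imitate a camera-pan-like signal from context features
rng = np.random.default_rng(0)
contexts = rng.normal(size=(500, 4))
expert_actions = contexts @ np.array([0.5, -0.2, 0.1, 0.3])
reg = GradientBoostingRegressor().fit(contexts, expert_actions)

policy = SmoothPolicy(reg, smooth_lambda=0.6)
action, trajectory = 0.0, []
for ctx in contexts[:50]:
    action = policy.act(ctx, action)       # actions change gradually step to step
    trajectory.append(action)
```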