Paper Group AWR 71
CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples. Context Gates for Neural Machine Translation. Using Fast Weights to Attend to the Recent Past. Dynamic Network Surgery for Efficient DNNs. The Parallel Knowledge Gradient Method for Batch Bayesian Optimization. Controlling Perceptual Factors in Neural Style Transfer. …
CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples
Title | CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples |
Authors | Filip Radenović, Giorgos Tolias, Ondřej Chum |
Abstract | Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks. However, this achievement is preceded by extreme manual annotation in order to perform either training from scratch or fine-tuning for the target task. In this work, we propose to fine-tune CNN for image retrieval from a large collection of unordered images in a fully automated manner. We employ state-of-the-art retrieval and Structure-from-Motion (SfM) methods to obtain 3D models, which are used to guide the selection of the training data for CNN fine-tuning. We show that both hard positive and hard negative examples enhance the final performance in particular object retrieval with compact codes. |
Tasks | Image Retrieval |
Published | 2016-04-08 |
URL | http://arxiv.org/abs/1604.02426v3 |
http://arxiv.org/pdf/1604.02426v3.pdf | |
PWC | https://paperswithcode.com/paper/cnn-image-retrieval-learns-from-bow |
Repo | https://github.com/filipradenovic/cnnimageretrieval |
Framework | pytorch |
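
The core training signal in this paper is a siamese network with a contrastive loss over hard positive and hard negative pairs mined automatically. As a rough illustration (not the authors' SfM-based pipeline), here is a minimal numpy sketch of the contrastive loss on L2-normalized descriptors and of distance-based hard-negative mining; the helper names (`l2n`, `mine_hard_negatives`) and the random descriptors are purely illustrative.

```python
import numpy as np

def l2n(x, eps=1e-8):
    """L2-normalize descriptors along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def contrastive_loss(anchor, other, is_positive, margin=0.7):
    """Contrastive loss on L2-normalized descriptors: positive pairs are
    pulled together, negative pairs pushed beyond `margin`."""
    d = np.linalg.norm(l2n(anchor) - l2n(other), axis=-1)
    pos_term = 0.5 * d ** 2
    neg_term = 0.5 * np.maximum(0.0, margin - d) ** 2
    return np.where(is_positive, pos_term, neg_term).mean()

def mine_hard_negatives(query_desc, pool_desc, pool_labels, query_label, k=5):
    """Pick the k negatives closest to the query (the hardest negatives)."""
    d = np.linalg.norm(l2n(pool_desc) - l2n(query_desc), axis=-1)
    neg_idx = np.where(pool_labels != query_label)[0]
    return neg_idx[np.argsort(d[neg_idx])[:k]]

# Toy usage with random descriptors standing in for CNN outputs.
rng = np.random.default_rng(0)
q = rng.normal(size=128)
pool = rng.normal(size=(100, 128))
labels = rng.integers(0, 10, size=100)
hard_neg = mine_hard_negatives(q, pool, labels, query_label=3)
print(contrastive_loss(np.tile(q, (5, 1)), pool[hard_neg], is_positive=np.zeros(5, bool)))
```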
Context Gates for Neural Machine Translation
Title | Context Gates for Neural Machine Translation |
Authors | Zhaopeng Tu, Yang Liu, Zhengdong Lu, Xiaohua Liu, Hang Li |
Abstract | In neural machine translation (NMT), generation of a target word depends on both source and target contexts. We find that source contexts have a direct impact on the adequacy of a translation while target contexts affect the fluency. Intuitively, generation of a content word should rely more on the source context and generation of a functional word should rely more on the target context. Due to the lack of effective control over the influence from source and target contexts, conventional NMT tends to yield fluent but inadequate translations. To address this problem, we propose context gates which dynamically control the ratios at which source and target contexts contribute to the generation of target words. In this way, we can enhance both the adequacy and fluency of NMT with more careful control of the information flow from contexts. Experiments show that our approach significantly improves upon a standard attention-based NMT system by +2.3 BLEU points. |
Tasks | Machine Translation |
Published | 2016-08-22 |
URL | http://arxiv.org/abs/1608.06043v3 |
http://arxiv.org/pdf/1608.06043v3.pdf | |
PWC | https://paperswithcode.com/paper/context-gates-for-neural-machine-translation |
Repo | https://github.com/tuzhaopeng/nmt |
Framework | none |
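
A context gate is an element-wise sigmoid gate that weighs the source context against the target context when producing the next target word. The numpy sketch below shows only the gating arithmetic; the projection matrices `Wz`, `Uz`, `Cz` and the assumption that both contexts share one dimensionality are simplifications of the paper's NMT decoder.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_gate(source_ctx, target_ctx, prev_embed, Wz, Uz, Cz):
    """Element-wise gate deciding how much source vs. target context
    flows into the next decoder state (simplified from the paper)."""
    z = sigmoid(Wz @ prev_embed + Uz @ target_ctx + Cz @ source_ctx)
    return z * source_ctx + (1.0 - z) * target_ctx

# Toy dimensions: both contexts are assumed projected to the same size.
rng = np.random.default_rng(1)
d = 8
gated = context_gate(
    source_ctx=rng.normal(size=d),
    target_ctx=rng.normal(size=d),
    prev_embed=rng.normal(size=d),
    Wz=rng.normal(size=(d, d)),
    Uz=rng.normal(size=(d, d)),
    Cz=rng.normal(size=(d, d)),
)
print(gated.shape)  # (8,)
```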
Using Fast Weights to Attend to the Recent Past
Title | Using Fast Weights to Attend to the Recent Past |
Authors | Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu |
Abstract | Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among inputs, outputs and payoffs. There is no good reason for this restriction. Synapses have dynamics at many different time-scales and this suggests that artificial neural networks might benefit from variables that change slower than activities but much faster than the standard weights. These “fast weights” can be used to store temporary memories of the recent past and they provide a neurally plausible way of implementing the type of attention to the past that has recently proved very helpful in sequence-to-sequence models. By using fast weights we can avoid the need to store copies of neural activity patterns. |
Tasks | |
Published | 2016-10-20 |
URL | http://arxiv.org/abs/1610.06258v3 |
http://arxiv.org/pdf/1610.06258v3.pdf | |
PWC | https://paperswithcode.com/paper/using-fast-weights-to-attend-to-the-recent |
Repo | https://github.com/GokuMohandas/fast-weights |
Framework | tf |
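
Fast weights maintain a rapidly changing matrix A that is decayed and updated with an outer product of recent hidden states, then applied in a short inner loop at each time step. Below is a minimal numpy sketch of that update; it omits the layer normalization used in the paper and uses a ReLU nonlinearity, so treat it as an illustration of the mechanism rather than the authors' implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def fast_weights_step(h, x, A, W, C, lam=0.95, eta=0.5, inner_steps=3):
    """One recurrent step with a fast-weight matrix A (simplified:
    no layer normalization, ReLU nonlinearity)."""
    A = lam * A + eta * np.outer(h, h)        # decay and Hebbian update
    hs = relu(W @ h + C @ x)                  # preliminary next state
    for _ in range(inner_steps):              # let A attend to the recent past
        hs = relu(W @ h + C @ x + A @ hs)
    return hs, A

rng = np.random.default_rng(2)
d, d_in = 16, 4
h, A = np.zeros(d), np.zeros((d, d))
W, C = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d_in))
for t in range(5):                            # unroll over a toy sequence
    h, A = fast_weights_step(h, rng.normal(size=d_in), A, W, C)
print(h.shape, A.shape)
```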
Dynamic Network Surgery for Efficient DNNs
Title | Dynamic Network Surgery for Efficient DNNs |
Authors | Yiwen Guo, Anbang Yao, Yurong Chen |
Abstract | Deep learning has become a ubiquitous technology to improve machine intelligence. However, most of the existing deep models are structurally very complex, making them difficult to deploy on mobile platforms with limited computational power. In this paper, we propose a novel network compression method called dynamic network surgery, which remarkably reduces network complexity through on-the-fly connection pruning. Unlike previous methods, which accomplish this task greedily, we incorporate connection splicing into the whole process to avoid incorrect pruning and turn it into continual network maintenance. The effectiveness of our method is demonstrated with experiments. Without any accuracy loss, our method can efficiently compress the number of parameters in LeNet-5 and AlexNet by factors of 108× and 17.7× respectively, showing that it outperforms the recent pruning method by considerable margins. Code and some models are available at https://github.com/yiwenguo/Dynamic-Network-Surgery. |
Tasks | |
Published | 2016-08-16 |
URL | http://arxiv.org/abs/1608.04493v2 |
http://arxiv.org/pdf/1608.04493v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-network-surgery-for-efficient-dnns |
Repo | https://github.com/yiwenguo/Dynamic-Network-Surgery |
Framework | none |
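
Dynamic network surgery keeps every weight learning while a binary mask decides which connections take part in the forward pass; connections can be pruned and later spliced back in as their magnitudes change. The numpy sketch below shows a simplified two-threshold mask update with a stand-in gradient; the thresholds `a`, `b` and the update schedule are assumptions, not the paper's exact procedure.

```python
import numpy as np

def update_mask(W, mask, a, b):
    """Surgery-style mask update (simplified): prune entries whose magnitude
    falls below `a`, splice back entries that grow above `b`, keep the rest."""
    new_mask = mask.copy()
    new_mask[np.abs(W) < a] = 0.0
    new_mask[np.abs(W) > b] = 1.0
    return new_mask

rng = np.random.default_rng(3)
W = rng.normal(size=(64, 64))
mask = np.ones_like(W)
for step in range(100):
    grad = rng.normal(size=W.shape) * 0.01     # stand-in for a real gradient
    W -= grad                                  # all weights keep learning,
    mask = update_mask(W, mask, a=0.5, b=0.6)  # even pruned ones, so they can be spliced back
effective = W * mask                           # what the forward pass actually uses
print("sparsity:", 1.0 - mask.mean())
```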
The Parallel Knowledge Gradient Method for Batch Bayesian Optimization
Title | The Parallel Knowledge Gradient Method for Batch Bayesian Optimization |
Authors | Jian Wu, Peter I. Frazier |
Abstract | In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm — the parallel knowledge gradient method. By construction, this method provides the one-step Bayes-optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy. |
Tasks | |
Published | 2016-06-14 |
URL | http://arxiv.org/abs/1606.04414v4 |
http://arxiv.org/pdf/1606.04414v4.pdf | |
PWC | https://paperswithcode.com/paper/the-parallel-knowledge-gradient-method-for |
Repo | https://github.com/wujian16/Cornell-MOE |
Framework | none |
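
The parallel knowledge gradient values a batch of points by the expected improvement of the posterior maximum after (hypothetically) observing that batch. The sketch below estimates this quantity by Monte Carlo over a small discrete grid with a zero-mean GP prior; the paper instead optimizes the acquisition over continuous batches using infinitesimal perturbation analysis, which this toy omits.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def batch_kg(mu, K, batch_idx, noise=1e-2, n_samples=200, seed=0):
    """Monte Carlo estimate of the knowledge-gradient value of observing
    the points in `batch_idx`, over a discrete candidate set."""
    rng = np.random.default_rng(seed)
    S = np.asarray(batch_idx)
    K_SS = K[np.ix_(S, S)] + noise * np.eye(len(S))
    K_xS = K[:, S]
    best_now = mu.max()
    gains = []
    for _ in range(n_samples):
        y = rng.multivariate_normal(mu[S], K_SS)               # fantasize observations
        mu_new = mu + K_xS @ np.linalg.solve(K_SS, y - mu[S])  # updated posterior mean
        gains.append(mu_new.max() - best_now)
    return float(np.mean(gains))

# Toy prior over a grid; pick the best batch of size 2 by exhaustive search.
X = np.linspace(0, 1, 15)
mu, K = np.zeros_like(X), rbf_kernel(X, X)
batches = [(i, j) for i in range(len(X)) for j in range(i + 1, len(X))]
best = max(batches, key=lambda b: batch_kg(mu, K, b))
print("chosen batch:", X[list(best)])
```

Using common random numbers (the fixed seed) across candidate batches keeps the comparison stable; with a stationary prior, well-separated batches tend to score higher because their fantasized observations are less redundant.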
Controlling Perceptual Factors in Neural Style Transfer
Title | Controlling Perceptual Factors in Neural Style Transfer |
Authors | Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, Aaron Hertzmann, Eli Shechtman |
Abstract | Neural Style Transfer has shown very exciting results enabling new forms of image manipulation. Here we extend the existing method to introduce control over spatial location, colour information and across spatial scale. We demonstrate how this enhances the method by allowing high-resolution controlled stylisation and helps to alleviate common failure cases such as applying ground textures to sky regions. Furthermore, by decomposing style into these perceptual factors we enable the combination of style information from multiple sources to generate new, perceptually appealing styles from existing ones. We also describe how these methods can be used to more efficiently produce large size, high-quality stylisation. Finally we show how the introduced control measures can be applied in recent methods for Fast Neural Style Transfer. |
Tasks | Style Transfer |
Published | 2016-11-23 |
URL | http://arxiv.org/abs/1611.07865v2 |
http://arxiv.org/pdf/1611.07865v2.pdf | |
PWC | https://paperswithcode.com/paper/controlling-perceptual-factors-in-neural |
Repo | https://github.com/leongatys/NeuralImageSynthesis |
Framework | torch |
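
Spatial control in this paper is implemented with guided Gram matrices: feature maps are weighted by per-region guidance masks before computing Gram statistics, so style from one region (e.g. sky) is only matched against the corresponding region of the output. A minimal numpy sketch, assuming the guidance masks are already at feature-map resolution (the paper describes how to propagate them there):

```python
import numpy as np

def guided_gram(features, mask):
    """Gram matrix of CNN features restricted to a spatial guidance mask.

    features: (C, H, W) activation maps; mask: (H, W) weights in [0, 1].
    """
    C, H, W = features.shape
    F = (features * mask[None]).reshape(C, -1)   # apply guidance channel-wise
    return F @ F.T / (H * W)

def guided_style_loss(feat_style, feat_gen, masks_style, masks_gen):
    """Sum of per-region Gram losses, one region per pair of guidance masks."""
    loss = 0.0
    for ms, mg in zip(masks_style, masks_gen):
        Gs, Gg = guided_gram(feat_style, ms), guided_gram(feat_gen, mg)
        loss += np.mean((Gs - Gg) ** 2)
    return loss

# Toy example: "sky" and "ground" regions as top/bottom halves of the image.
rng = np.random.default_rng(4)
C, H, W = 16, 32, 32
sky, ground = np.zeros((H, W)), np.zeros((H, W))
sky[: H // 2], ground[H // 2 :] = 1.0, 1.0
print(guided_style_loss(rng.random((C, H, W)), rng.random((C, H, W)),
                        [sky, ground], [sky, ground]))
```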
Consensus Attention-based Neural Networks for Chinese Reading Comprehension
Title | Consensus Attention-based Neural Networks for Chinese Reading Comprehension |
Authors | Yiming Cui, Ting Liu, Zhipeng Chen, Shijin Wang, Guoping Hu |
Abstract | Reading comprehension has seen a boom in recent NLP research. Several institutes have released Cloze-style reading comprehension data, and these releases have greatly accelerated research on machine comprehension. In this work, we first present Chinese reading comprehension datasets, consisting of a People Daily news dataset and a Children’s Fairy Tale (CFT) dataset. We also propose a consensus attention-based neural network architecture to tackle the Cloze-style reading comprehension problem, which aims to induce a consensus attention over every word in the query. Experimental results show that the proposed neural network significantly outperforms state-of-the-art baselines on several public datasets. Furthermore, we set up a baseline for the Chinese reading comprehension task, which we hope will speed up future research. |
Tasks | Reading Comprehension |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02250v3 |
http://arxiv.org/pdf/1607.02250v3.pdf | |
PWC | https://paperswithcode.com/paper/consensus-attention-based-neural-networks-for |
Repo | https://github.com/ymcui/Chinese-Cloze-RC |
Framework | none |
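
The consensus attention mechanism computes one attention distribution over the document for every query word and then merges them (for instance by averaging or taking the maximum) into a single distribution used to pick the answer. A simplified numpy sketch with dot-product scoring and random states standing in for the recurrent encoders:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def consensus_attention(doc_states, query_states, mode="avg"):
    """Per-query-word attention over the document, merged into one
    consensus distribution (simplified dot-product scoring)."""
    scores = query_states @ doc_states.T            # (Q, D) alignment scores
    alpha = softmax(scores, axis=-1)                # one distribution per query word
    if mode == "avg":
        merged = alpha.mean(axis=0)
    elif mode == "max":
        merged = alpha.max(axis=0)
    else:
        merged = alpha.sum(axis=0)
    return merged / merged.sum()                    # renormalize

rng = np.random.default_rng(5)
doc = rng.normal(size=(50, 32))      # 50 document positions, hidden size 32
query = rng.normal(size=(10, 32))    # 10 query positions
att = consensus_attention(doc, query)
print("predicted blank-filler position:", int(att.argmax()))
```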
Massively Multilingual Word Embeddings
Title | Massively Multilingual Word Embeddings |
Authors | Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith |
Abstract | We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal for evaluation that will facilitate further research in this area, along with open-source releases of all our methods. |
Tasks | Multilingual Word Embeddings, Text Categorization, Word Embeddings |
Published | 2016-02-05 |
URL | http://arxiv.org/abs/1602.01925v2 |
http://arxiv.org/pdf/1602.01925v2.pdf | |
PWC | https://paperswithcode.com/paper/massively-multilingual-word-embeddings |
Repo | https://github.com/idiap/mhan |
Framework | none |
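
multiCCA projects each non-English embedding space into the English space using canonical correlation analysis over dictionary translation pairs. The numpy sketch below shows a standard whitening-plus-SVD implementation of the CCA core on synthetic "dictionary" data; dimensions, regularization, and helper names are illustrative, and this is not the authors' released toolkit.

```python
import numpy as np

def cca_projections(X, Y, k, reg=1e-5):
    """Canonical correlation analysis via whitening + SVD.

    X, Y: (n, d1), (n, d2) paired rows (dictionary translation pairs).
    Returns projection matrices mapping each side into a shared k-dim space.
    """
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx_white, Wy_white = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, _, Vt = np.linalg.svd(Wx_white @ Cxy @ Wy_white)
    return Wx_white @ U[:, :k], Wy_white @ Vt.T[:, :k]

# Toy "dictionary": 200 translation pairs of 50-dim monolingual embeddings.
rng = np.random.default_rng(6)
shared = rng.normal(size=(200, 20))
X = shared @ rng.normal(size=(20, 50)) + 0.1 * rng.normal(size=(200, 50))
Y = shared @ rng.normal(size=(20, 50)) + 0.1 * rng.normal(size=(200, 50))
Wx, Wy = cca_projections(X, Y, k=20)
print((X @ Wx).shape, (Y @ Wy).shape)   # both land in the shared 20-dim space
```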
Interacting Conceptual Spaces
Title | Interacting Conceptual Spaces |
Authors | Josef Bolt, Bob Coecke, Fabrizio Genovese, Martha Lewis, Daniel Marsden, Robin Piedeleu |
Abstract | We propose applying the categorical compositional scheme of [6] to conceptual space models of cognition. In order to do this we introduce the category of convex relations as a new setting for categorical compositional semantics, emphasizing the convex structure important to conceptual space applications. We show how conceptual spaces for composite types such as adjectives and verbs can be constructed. We illustrate this new model on detailed examples. |
Tasks | |
Published | 2016-08-04 |
URL | http://arxiv.org/abs/1608.01402v1 |
http://arxiv.org/pdf/1608.01402v1.pdf | |
PWC | https://paperswithcode.com/paper/interacting-conceptual-spaces |
Repo | https://github.com/1230113202/NV-JM-DD |
Framework | pytorch |
Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates
Title | Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates |
Authors | Hiroaki Hayashi, Jayanth Koushik, Graham Neubig |
Abstract | Adaptive gradient methods for stochastic optimization adjust the learning rate for each parameter locally. However, there is also a global learning rate which must be tuned in order to get the best performance. In this paper, we present a new algorithm that adapts the learning rate locally for each parameter separately, and also globally for all parameters together. Specifically, we modify Adam, a popular method for training deep learning models, with a coefficient that captures properties of the objective function. Empirically, we show that our method, which we call Eve, outperforms Adam and other popular methods in training deep neural networks, like convolutional neural networks for image classification, and recurrent neural networks for language tasks. |
Tasks | Image Classification, Stochastic Optimization |
Published | 2016-11-04 |
URL | http://arxiv.org/abs/1611.01505v3 |
http://arxiv.org/pdf/1611.01505v3.pdf | |
PWC | https://paperswithcode.com/paper/eve-a-gradient-based-optimization-method-with |
Repo | https://github.com/muupan/chainer-eve |
Framework | none |
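
Eve keeps Adam's per-parameter adaptation but additionally rescales the global learning rate by a smoothed, clipped measure of how much the objective is changing. The sketch below is a simplified variant of that idea on a toy quadratic; the exact feedback formula in the paper (its clipping scheme and running objective estimate) differs from this approximation.

```python
import numpy as np

def eve_like_update(theta, grad, f_t, state, lr=1e-3, betas=(0.9, 0.999, 0.999),
                    eps=1e-8, clip=10.0):
    """Adam step whose global learning rate is divided by a smoothed, clipped
    relative change of the objective (simplified Eve-style rule)."""
    b1, b2, b3 = betas
    state["t"] += 1
    t = state["t"]
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** t)
    v_hat = state["v"] / (1 - b2 ** t)
    if state["f_prev"] is not None:
        r = abs(f_t - state["f_prev"]) / max(min(f_t, state["f_prev"]), eps)
        r = min(max(r, 1.0 / clip), clip)          # bound the feedback signal
        state["d"] = b3 * state["d"] + (1 - b3) * r
    state["f_prev"] = f_t
    return theta - (lr / state["d"]) * m_hat / (np.sqrt(v_hat) + eps)

# Toy quadratic objective f(theta) = ||theta||^2 / 2, gradient = theta.
theta = np.ones(5)
state = {"t": 0, "m": np.zeros(5), "v": np.zeros(5), "d": 1.0, "f_prev": None}
for _ in range(500):
    theta = eve_like_update(theta, grad=theta, f_t=0.5 * np.dot(theta, theta),
                            state=state, lr=1e-2)
print(np.linalg.norm(theta))
```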
Data Programming: Creating Large Training Sets, Quickly
Title | Data Programming: Creating Large Training Sets, Quickly |
Authors | Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré |
Abstract | Large labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive part of applying machine learning. We therefore propose a paradigm for the programmatic creation of training sets called data programming in which users express weak supervision strategies or domain heuristics as labeling functions, which are programs that label subsets of the data, but that are noisy and may conflict. We show that by explicitly representing this training set labeling process as a generative model, we can “denoise” the generated training set, and establish theoretically that we can recover the parameters of these generative models in a handful of settings. We then show how to modify a discriminative loss function to make it noise-aware, and demonstrate our method over a range of discriminative models including logistic regression and LSTMs. Experimentally, on the 2014 TAC-KBP Slot Filling challenge, we show that data programming would have led to a new winning score, and also show that applying data programming to an LSTM model improves the TAC-KBP score by almost 6 F1 points over a state-of-the-art LSTM baseline (moving it into second place in the competition). Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable. |
Tasks | Slot Filling |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.07723v3 |
http://arxiv.org/pdf/1605.07723v3.pdf | |
PWC | https://paperswithcode.com/paper/data-programming-creating-large-training-sets |
Repo | https://github.com/HazyResearch/metal |
Framework | pytorch |
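
In data programming, users write labeling functions that vote noisily (and may abstain or conflict), and a generative model estimates their accuracies in order to denoise the combined labels. The sketch below skips the generative modeling step and assumes the accuracies are known, which reduces denoising to a log-odds-weighted vote; the labeling functions and texts are toy examples.

```python
import numpy as np

# Three toy labeling functions over short texts: each returns +1, -1, or 0 (abstain).
def lf_contains_great(x):  return 1 if "great" in x else 0
def lf_contains_awful(x):  return -1 if "awful" in x else 0
def lf_exclamation(x):     return 1 if x.endswith("!") else 0

LFS = [lf_contains_great, lf_contains_awful, lf_exclamation]

def label_matrix(texts):
    """Apply every labeling function to every example (rows: examples)."""
    return np.array([[lf(x) for lf in LFS] for x in texts])

def weighted_vote(L, accuracies):
    """Combine noisy, conflicting labels with per-LF accuracy weights; data
    programming learns these weights with a generative model, here they are
    given, so denoising reduces to a weighted majority vote."""
    weights = np.log(accuracies / (1 - accuracies))          # log-odds weighting
    scores = L @ weights
    return np.sign(scores), 1 / (1 + np.exp(-2 * scores))    # label (0 = abstain), confidence

texts = ["great movie!", "awful plot", "great but awful pacing", "so boring"]
L = label_matrix(texts)
labels, conf = weighted_vote(L, accuracies=np.array([0.9, 0.8, 0.6]))
print(list(zip(texts, labels, conf.round(2))))
```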
Neural Architectures for Named Entity Recognition
Title | Neural Architectures for Named Entity Recognition |
Authors | Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer |
Abstract | State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available. In this paper, we introduce two new neural architectures—one based on bidirectional LSTMs and conditional random fields, and the other that constructs and labels segments using a transition-based approach inspired by shift-reduce parsers. Our models rely on two sources of information about words: character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora. Our models obtain state-of-the-art performance in NER in four languages without resorting to any language-specific knowledge or resources such as gazetteers. |
Tasks | Named Entity Recognition |
Published | 2016-03-04 |
URL | http://arxiv.org/abs/1603.01360v3 |
http://arxiv.org/pdf/1603.01360v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-architectures-for-named-entity |
Repo | https://github.com/karlstratos/mention2vec |
Framework | none |
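
The first architecture in the paper stacks a CRF layer on top of a bidirectional LSTM, so decoding amounts to Viterbi search over per-token emission scores plus tag-transition scores. Below is a self-contained numpy Viterbi decoder with random emissions standing in for BiLSTM outputs and a hand-set constraint forbidding O to I-* transitions; the tag set and scores are illustrative only.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Find the best tag sequence given per-token emission scores and
    tag-to-tag transition scores.

    emissions: (T, K) scores; transitions: (K, K), transitions[i, j] is the
    score of moving from tag i to tag j.
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # best previous tag for each current tag
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # follow back-pointers from the best final tag
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
rng = np.random.default_rng(7)
emissions = rng.normal(size=(6, len(TAGS)))          # stand-in for BiLSTM outputs
transitions = rng.normal(size=(len(TAGS), len(TAGS)))
transitions[0, 2] = transitions[0, 4] = -1e4         # forbid O -> I-* transitions
print([TAGS[i] for i in viterbi_decode(emissions, transitions)])
```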
Reactive Collision Avoidance using Evolutionary Neural Networks
Title | Reactive Collision Avoidance using Evolutionary Neural Networks |
Authors | Hesham Eraqi, Youssef EmadEldin, Mohamed Moustafa |
Abstract | Collision avoidance systems can play a vital role in reducing the number of accidents and saving human lives. In this paper, we introduce and validate a novel method for reactive vehicle collision avoidance using evolutionary neural networks (ENN). A single front-facing rangefinder sensor is the only input required by our method. The training process and the proposed method's analysis and validation are carried out in simulation. Extensive experiments are conducted to analyse the proposed method and evaluate its performance. First, we test the ability to learn collision avoidance on a static free track. Second, we analyse the effect of the rangefinder sensor resolution on the learning process. Third, we test the ability of vehicles to learn collision avoidance individually and simultaneously. Finally, we test the generality of the proposed method using a more realistic and powerful simulation environment (CarMaker), a camera as an alternative input sensor, and lane keeping as an extra feature to learn. The results are encouraging; the proposed method successfully allows vehicles to learn collision avoidance in scenarios unseen during training. It also generalizes well when the input sensor, the simulator, or the task to be learned is changed. |
Tasks | |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08414v1 |
http://arxiv.org/pdf/1609.08414v1.pdf | |
PWC | https://paperswithcode.com/paper/reactive-collision-avoidance-using |
Repo | https://github.com/heshameraqi/GA-NN-Car |
Framework | none |
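
The method evolves neural network weights with a genetic algorithm rather than gradient descent: a population of steering networks mapping rangefinder readings to a steering command is improved by selection, crossover, and mutation. The sketch below uses a toy analytic fitness (steer toward the side with more free space) instead of the paper's driving simulator, so it only illustrates the neuroevolution loop, not the actual training setup.

```python
import numpy as np

rng = np.random.default_rng(8)
N_RAYS, HIDDEN, POP, GENS = 9, 6, 60, 80

def unpack(genome):
    """Genome -> weights of a tiny one-hidden-layer steering network."""
    w1 = genome[: N_RAYS * HIDDEN].reshape(N_RAYS, HIDDEN)
    w2 = genome[N_RAYS * HIDDEN:].reshape(HIDDEN, 1)
    return w1, w2

def steer(genome, rays):
    """Map rangefinder readings (distance per ray) to a steering command in [-1, 1]."""
    w1, w2 = unpack(genome)
    return np.tanh(np.tanh(rays @ w1) @ w2).ravel()

def fitness(genome, scenes):
    """Toy proxy reward: steer toward the side with more free space
    (the paper instead scores behaviour in a driving simulator)."""
    cmd = steer(genome, scenes)
    free_space_bias = scenes[:, : N_RAYS // 2].sum(1) - scenes[:, N_RAYS // 2 + 1:].sum(1)
    target = np.tanh(free_space_bias)
    return -np.mean((cmd - target) ** 2)

genome_len = N_RAYS * HIDDEN + HIDDEN
pop = rng.normal(size=(POP, genome_len))
scenes = rng.uniform(0.0, 1.0, size=(500, N_RAYS))       # random rangefinder scans
for gen in range(GENS):
    scores = np.array([fitness(g, scenes) for g in pop])
    parents = pop[np.argsort(scores)[-POP // 2:]]         # truncation selection
    children = []
    for _ in range(POP - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, genome_len)                 # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child += rng.normal(scale=0.05, size=genome_len)  # Gaussian mutation
        children.append(child)
    pop = np.vstack([parents, children])
print("best fitness:", max(fitness(g, scenes) for g in pop))
```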
Recurrent Memory Networks for Language Modeling
Title | Recurrent Memory Networks for Language Modeling |
Authors | Ke Tran, Arianna Bisazza, Christof Monz |
Abstract | Recurrent Neural Networks (RNNs) have obtained excellent results in many natural language processing (NLP) tasks. However, understanding and interpreting the source of this success remains a challenge. In this paper, we propose the Recurrent Memory Network (RMN), a novel RNN architecture that not only amplifies the power of RNNs but also facilitates our understanding of their internal functioning and allows us to discover underlying patterns in data. We demonstrate the power of RMN on language modeling and sentence completion tasks. On language modeling, RMN outperforms Long Short-Term Memory (LSTM) networks on three large German, Italian, and English datasets. Additionally, we perform an in-depth analysis of various linguistic dimensions that RMN captures. On the Sentence Completion Challenge, for which it is essential to capture sentence coherence, our RMN obtains 69.2% accuracy, surpassing the previous state of the art by a large margin. |
Tasks | Language Modelling |
Published | 2016-01-06 |
URL | http://arxiv.org/abs/1601.01272v2 |
http://arxiv.org/pdf/1601.01272v2.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-memory-networks-for-language |
Repo | https://github.com/simonjisu/NMT |
Framework | pytorch |
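
The RMN inserts a memory block into the recurrence: the current hidden state attends over embeddings of the n most recent words, and the resulting mixture feeds into the prediction. A minimal numpy sketch of that attention step, omitting the separate input/output memory embeddings and the gating used in the paper to combine the memory vector with the LSTM state:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_block(h_t, recent_embeds, Wm, Wc):
    """Memory block of an RMN (simplified): the current hidden state attends
    over the n most recent word embeddings and returns their mixture."""
    keys = recent_embeds @ Wm                 # (n, d) memory representations
    vals = recent_embeds @ Wc
    att = softmax(keys @ h_t)                 # attention of h_t over the history
    return vals.T @ att, att

rng = np.random.default_rng(9)
d, n = 32, 15
h_t = rng.normal(size=d)                      # stand-in for the LSTM hidden state
recent = rng.normal(size=(n, d))              # embeddings of the last n words
Wm, Wc = rng.normal(size=(d, d)), rng.normal(size=(d, d))
mem, att = memory_block(h_t, recent, Wm, Wc)
print(mem.shape, att.argmax())                # which recent word got the most weight
```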
Smooth Imitation Learning for Online Sequence Prediction
Title | Smooth Imitation Learning for Online Sequence Prediction |
Authors | Hoang M. Le, Andrew Kang, Yisong Yue, Peter Carr |
Abstract | We study the problem of smooth imitation learning for online sequence prediction, where the goal is to train a policy that can smoothly imitate demonstrated behavior in a dynamic and continuous environment in response to online, sequential context input. Since the mapping from context to behavior is often complex, we take a learning reduction approach to reduce smooth imitation learning to a regression problem using complex function classes that are regularized to ensure smoothness. We present a learning meta-algorithm that achieves fast and stable convergence to a good policy. Our approach enjoys several attractive properties, including being fully deterministic, employing an adaptive learning rate that can provably yield larger policy improvements compared to previous approaches, and the ability to ensure stable convergence. Our empirical results demonstrate significant performance gains over previous approaches. |
Tasks | Imitation Learning |
Published | 2016-06-03 |
URL | http://arxiv.org/abs/1606.00968v1 |
http://arxiv.org/pdf/1606.00968v1.pdf | |
PWC | https://paperswithcode.com/paper/smooth-imitation-learning-for-online-sequence |
Repo | https://github.com/lucianacendon/simile |
Framework | none |
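
The learning-reduction view here is that smooth imitation becomes regression over (context, previous action) pairs, with the policy class constrained so that rollouts stay smooth. The sketch below is a deliberately crude stand-in: a least-squares policy plus exponential smoothing of the rolled-out action on a toy 1-D tracking task, rather than the paper's meta-algorithm with its adaptive learning rate; all function names and parameters are illustrative.

```python
import numpy as np

def fit_policy(contexts, prev_actions, expert_actions, reg=1e-3):
    """Least-squares policy over [context, previous action] features
    (a stand-in for the richer regularized function classes in the paper)."""
    X = np.hstack([contexts, prev_actions[:, None]])
    A = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ expert_actions)

def smooth_rollout(w, contexts, a0=0.0, sigma=0.7):
    """Online rollout that exponentially smooths the predicted action,
    trading imitation accuracy for smoothness."""
    a, out = a0, []
    for c in contexts:
        pred = np.dot(w, np.append(c, a))
        a = sigma * a + (1 - sigma) * pred     # smooth update of the action
        out.append(a)
    return np.array(out)

# Toy camera-planning-style task: track a noisy 1-D target smoothly.
rng = np.random.default_rng(10)
T = 300
target = np.cumsum(rng.normal(scale=0.2, size=T))        # expert trajectory
contexts = np.stack([target + rng.normal(scale=0.5, size=T),
                     np.gradient(target)], axis=1)        # noisy observations
prev = np.concatenate([[0.0], target[:-1]])
w = fit_policy(contexts, prev, target)
rollout = smooth_rollout(w, contexts)
print("tracking error:", np.mean((rollout - target) ** 2))
```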