Paper Group AWR 71
CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples. Context Gates for Neural Machine Translation. Using Fast Weights to Attend to the Recent Past. Dynamic Network Surgery for Efficient DNNs. The Parallel Knowledge Gradient Method for Batch Bayesian Optimization. Controlling Perceptual Factors in Neural Style Transfer. …
CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples
Title | CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples |
Authors | Filip Radenović, Giorgos Tolias, Ondřej Chum |
Abstract | Convolutional Neural Networks (CNNs) achieve state-of-the-art performance in many computer vision tasks. However, this achievement is preceded by extreme manual annotation in order to perform either training from scratch or fine-tuning for the target task. In this work, we propose to fine-tune CNN for image retrieval from a large collection of unordered images in a fully automated manner. We employ state-of-the-art retrieval and Structure-from-Motion (SfM) methods to obtain 3D models, which are used to guide the selection of the training data for CNN fine-tuning. We show that both hard positive and hard negative examples enhance the final performance in particular object retrieval with compact codes. |
Tasks | Image Retrieval |
Published | 2016-04-08 |
URL | http://arxiv.org/abs/1604.02426v3 |
http://arxiv.org/pdf/1604.02426v3.pdf | |
PWC | https://paperswithcode.com/paper/cnn-image-retrieval-learns-from-bow |
Repo | https://github.com/filipradenovic/cnnimageretrieval |
Framework | pytorch |
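
The core training signal in this paper is a siamese network with a contrastive loss over hard positive and hard negative pairs mined automatically. As a rough illustration (not the authors' SfM-based pipeline), here is a minimal numpy sketch of the contrastive loss on L2-normalized descriptors and of distance-based hard-negative mining; the helper names (`l2n`, `mine_hard_negatives`) and the random descriptors are purely illustrative.

```python
import numpy as np

def l2n(x, eps=1e-8):
    """L2-normalize descriptors along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def contrastive_loss(anchor, other, is_positive, margin=0.7):
    """Contrastive loss on L2-normalized descriptors: positive pairs are
    pulled together, negative pairs pushed beyond `margin`."""
    d = np.linalg.norm(l2n(anchor) - l2n(other), axis=-1)
    pos_term = 0.5 * d ** 2
    neg_term = 0.5 * np.maximum(0.0, margin - d) ** 2
    return np.where(is_positive, pos_term, neg_term).mean()

def mine_hard_negatives(query_desc, pool_desc, pool_labels, query_label, k=5):
    """Pick the k negatives closest to the query (the hardest negatives)."""
    d = np.linalg.norm(l2n(pool_desc) - l2n(query_desc), axis=-1)
    neg_idx = np.where(pool_labels != query_label)[0]
    return neg_idx[np.argsort(d[neg_idx])[:k]]

# Toy usage with random descriptors standing in for CNN outputs.
rng = np.random.default_rng(0)
q = rng.normal(size=128)
pool = rng.normal(size=(100, 128))
labels = rng.integers(0, 10, size=100)
hard_neg = mine_hard_negatives(q, pool, labels, query_label=3)
print(contrastive_loss(np.tile(q, (5, 1)), pool[hard_neg], is_positive=np.zeros(5, bool)))
```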
Context Gates for Neural Machine Translation
Title | Context Gates for Neural Machine Translation |
Authors | Zhaopeng Tu, Yang Liu, Zhengdong Lu, Xiaohua Liu, Hang Li |
Abstract | In neural machine translation (NMT), generation of a target word depends on both source and target contexts. We find that source contexts have a direct impact on the adequacy of a translation while target contexts affect the fluency. Intuitively, generation of a content word should rely more on the source context and generation of a functional word should rely more on the target context. Due to the lack of effective control over the influence from source and target contexts, conventional NMT tends to yield fluent but inadequate translations. To address this problem, we propose context gates which dynamically control the ratios at which source and target contexts contribute to the generation of target words. In this way, we can enhance both the adequacy and fluency of NMT with more careful control of the information flow from contexts. Experiments show that our approach significantly improves upon a standard attention-based NMT system by +2.3 BLEU points. |
Tasks | Machine Translation |
Published | 2016-08-22 |
URL | http://arxiv.org/abs/1608.06043v3 |
http://arxiv.org/pdf/1608.06043v3.pdf | |
PWC | https://paperswithcode.com/paper/context-gates-for-neural-machine-translation |
Repo | https://github.com/tuzhaopeng/nmt |
Framework | none |
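
A context gate is an element-wise sigmoid gate that weighs the source context against the target context when producing the next target word. The numpy sketch below shows only the gating arithmetic; the projection matrices `Wz`, `Uz`, `Cz` and the assumption that both contexts share one dimensionality are simplifications of the paper's NMT decoder.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_gate(source_ctx, target_ctx, prev_embed, Wz, Uz, Cz):
    """Element-wise gate deciding how much source vs. target context
    flows into the next decoder state (simplified from the paper)."""
    z = sigmoid(Wz @ prev_embed + Uz @ target_ctx + Cz @ source_ctx)
    return z * source_ctx + (1.0 - z) * target_ctx

# Toy dimensions: both contexts are assumed projected to the same size.
rng = np.random.default_rng(1)
d = 8
gated = context_gate(
    source_ctx=rng.normal(size=d),
    target_ctx=rng.normal(size=d),
    prev_embed=rng.normal(size=d),
    Wz=rng.normal(size=(d, d)),
    Uz=rng.normal(size=(d, d)),
    Cz=rng.normal(size=(d, d)),
)
print(gated.shape)  # (8,)
```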
Using Fast Weights to Attend to the Recent Past
Title | Using Fast Weights to Attend to the Recent Past |
Authors | Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu |
Abstract | Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among inputs, outputs and payoffs. There is no good reason for this restriction. Synapses have dynamics at many different time-scales and this suggests that artificial neural networks might benefit from variables that change slower than activities but much faster than the standard weights. These “fast weights” can be used to store temporary memories of the recent past and they provide a neurally plausible way of implementing the type of attention to the past that has recently proved very helpful in sequence-to-sequence models. By using fast weights we can avoid the need to store copies of neural activity patterns. |
Tasks | |
Published | 2016-10-20 |
URL | http://arxiv.org/abs/1610.06258v3 |
http://arxiv.org/pdf/1610.06258v3.pdf | |
PWC | https://paperswithcode.com/paper/using-fast-weights-to-attend-to-the-recent |
Repo | https://github.com/GokuMohandas/fast-weights |
Framework | tf |
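
Fast weights maintain a rapidly changing matrix A that is decayed and updated with an outer product of recent hidden states, then applied in a short inner loop at each time step. Below is a minimal numpy sketch of that update; it omits the layer normalization used in the paper and uses a ReLU nonlinearity, so treat it as an illustration of the mechanism rather than the authors' implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def fast_weights_step(h, x, A, W, C, lam=0.95, eta=0.5, inner_steps=3):
    """One recurrent step with a fast-weight matrix A (simplified:
    no layer normalization, ReLU nonlinearity)."""
    A = lam * A + eta * np.outer(h, h)        # decay and Hebbian update
    hs = relu(W @ h + C @ x)                  # preliminary next state
    for _ in range(inner_steps):              # let A attend to the recent past
        hs = relu(W @ h + C @ x + A @ hs)
    return hs, A

rng = np.random.default_rng(2)
d, d_in = 16, 4
h, A = np.zeros(d), np.zeros((d, d))
W, C = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d_in))
for t in range(5):                            # unroll over a toy sequence
    h, A = fast_weights_step(h, rng.normal(size=d_in), A, W, C)
print(h.shape, A.shape)
```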
Dynamic Network Surgery for Efficient DNNs
Title | Dynamic Network Surgery for Efficient DNNs |
Authors | Yiwen Guo, Anbang Yao, Yurong Chen |
Abstract | Deep learning has become a ubiquitous technology to improve machine intelligence. However, most of the existing deep models are structurally very complex, making them difficult to deploy on mobile platforms with limited computational power. In this paper, we propose a novel network compression method called dynamic network surgery, which remarkably reduces network complexity through on-the-fly connection pruning. Unlike previous methods, which accomplish this task greedily, we incorporate connection splicing into the whole process to avoid incorrect pruning and turn it into continual network maintenance. The effectiveness of our method is demonstrated with experiments. Without any accuracy loss, our method can efficiently compress the number of parameters in LeNet-5 and AlexNet by factors of 108× and 17.7× respectively, showing that it outperforms the recent pruning method by considerable margins. Code and some models are available at https://github.com/yiwenguo/Dynamic-Network-Surgery. |
Tasks | |
Published | 2016-08-16 |
URL | http://arxiv.org/abs/1608.04493v2 |
http://arxiv.org/pdf/1608.04493v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-network-surgery-for-efficient-dnns |
Repo | https://github.com/yiwenguo/Dynamic-Network-Surgery |
Framework | none |
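
Dynamic network surgery keeps every weight learning while a binary mask decides which connections take part in the forward pass; connections can be pruned and later spliced back in as their magnitudes change. The numpy sketch below shows a simplified two-threshold mask update with a stand-in gradient; the thresholds `a`, `b` and the update schedule are assumptions, not the paper's exact procedure.

```python
import numpy as np

def update_mask(W, mask, a, b):
    """Surgery-style mask update (simplified): prune entries whose magnitude
    falls below `a`, splice back entries that grow above `b`, keep the rest."""
    new_mask = mask.copy()
    new_mask[np.abs(W) < a] = 0.0
    new_mask[np.abs(W) > b] = 1.0
    return new_mask

rng = np.random.default_rng(3)
W = rng.normal(size=(64, 64))
mask = np.ones_like(W)
for step in range(100):
    grad = rng.normal(size=W.shape) * 0.01     # stand-in for a real gradient
    W -= grad                                  # all weights keep learning,
    mask = update_mask(W, mask, a=0.5, b=0.6)  # even pruned ones, so they can be spliced back
effective = W * mask                           # what the forward pass actually uses
print("sparsity:", 1.0 - mask.mean())
```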
The Parallel Knowledge Gradient Method for Batch Bayesian Optimization
Title | The Parallel Knowledge Gradient Method for Batch Bayesian Optimization |
Authors | Jian Wu, Peter I. Frazier |
Abstract | In many applications of black-box optimization, one can evaluate multiple points simultaneously, e.g. when evaluating the performances of several different neural network architectures in a parallel computing environment. In this paper, we develop a novel batch Bayesian optimization algorithm — the parallel knowledge gradient method. By construction, this method provides the one-step Bayes-optimal batch of points to sample. We provide an efficient strategy for computing this Bayes-optimal batch of points, and we demonstrate that the parallel knowledge gradient method finds global optima significantly faster than previous batch Bayesian optimization algorithms on both synthetic test functions and when tuning hyperparameters of practical machine learning algorithms, especially when function evaluations are noisy. |
Tasks | |
Published | 2016-06-14 |
URL | http://arxiv.org/abs/1606.04414v4 |
http://arxiv.org/pdf/1606.04414v4.pdf | |
PWC | https://paperswithcode.com/paper/the-parallel-knowledge-gradient-method-for |
Repo | https://github.com/wujian16/Cornell-MOE |
Framework | none |
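
The parallel knowledge gradient values a batch of points by the expected improvement of the posterior maximum after (hypothetically) observing that batch. The sketch below estimates this quantity by Monte Carlo over a small discrete grid with a zero-mean GP prior; the paper instead optimizes the acquisition over continuous batches using infinitesimal perturbation analysis, which this toy omits.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def batch_kg(mu, K, batch_idx, noise=1e-2, n_samples=200, seed=0):
    """Monte Carlo estimate of the knowledge-gradient value of observing
    the points in `batch_idx`, over a discrete candidate set."""
    rng = np.random.default_rng(seed)
    S = np.asarray(batch_idx)
    K_SS = K[np.ix_(S, S)] + noise * np.eye(len(S))
    K_xS = K[:, S]
    best_now = mu.max()
    gains = []
    for _ in range(n_samples):
        y = rng.multivariate_normal(mu[S], K_SS)               # fantasize observations
        mu_new = mu + K_xS @ np.linalg.solve(K_SS, y - mu[S])  # updated posterior mean
        gains.append(mu_new.max() - best_now)
    return float(np.mean(gains))

# Toy prior over a grid; pick the best batch of size 2 by exhaustive search.
X = np.linspace(0, 1, 15)
mu, K = np.zeros_like(X), rbf_kernel(X, X)
batches = [(i, j) for i in range(len(X)) for j in range(i + 1, len(X))]
best = max(batches, key=lambda b: batch_kg(mu, K, b))
print("chosen batch:", X[list(best)])
```

Using common random numbers (the fixed seed) across candidate batches keeps the comparison stable; with a stationary prior, well-separated batches tend to score higher because their fantasized observations are less redundant.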
Controlling Perceptual Factors in Neural Style Transfer
Title | Controlling Perceptual Factors in Neural Style Transfer |
Authors | Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, Aaron Hertzmann, Eli Shechtman |
Abstract | Neural Style Transfer has shown very exciting results enabling new forms of image manipulation. Here we extend the existing method to introduce control over spatial location, colour information and across spatial scale. We demonstrate how this enhances the method by allowing high-resolution controlled stylisation and helps to alleviate common failure cases such as applying ground textures to sky regions. Furthermore, by decomposing style into these perceptual factors we enable the combination of style information from multiple sources to generate new, perceptually appealing styles from existing ones. We also describe how these methods can be used to more efficiently produce large size, high-quality stylisation. Finally we show how the introduced control measures can be applied in recent methods for Fast Neural Style Transfer. |
Tasks | Style Transfer |
Published | 2016-11-23 |
URL | http://arxiv.org/abs/1611.07865v2 |
http://arxiv.org/pdf/1611.07865v2.pdf | |
PWC | https://paperswithcode.com/paper/controlling-perceptual-factors-in-neural |
Repo | https://github.com/leongatys/NeuralImageSynthesis |
Framework | torch |
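
Spatial control in this paper is implemented with guided Gram matrices: feature maps are weighted by per-region guidance masks before computing Gram statistics, so style from one region (e.g. sky) is only matched against the corresponding region of the output. A minimal numpy sketch, assuming the guidance masks are already at feature-map resolution (the paper describes how to propagate them there):

```python
import numpy as np

def guided_gram(features, mask):
    """Gram matrix of CNN features restricted to a spatial guidance mask.

    features: (C, H, W) activation maps; mask: (H, W) weights in [0, 1].
    """
    C, H, W = features.shape
    F = (features * mask[None]).reshape(C, -1)   # apply guidance channel-wise
    return F @ F.T / (H * W)

def guided_style_loss(feat_style, feat_gen, masks_style, masks_gen):
    """Sum of per-region Gram losses, one region per pair of guidance masks."""
    loss = 0.0
    for ms, mg in zip(masks_style, masks_gen):
        Gs, Gg = guided_gram(feat_style, ms), guided_gram(feat_gen, mg)
        loss += np.mean((Gs - Gg) ** 2)
    return loss

# Toy example: "sky" and "ground" regions as top/bottom halves of the image.
rng = np.random.default_rng(4)
C, H, W = 16, 32, 32
sky, ground = np.zeros((H, W)), np.zeros((H, W))
sky[: H // 2], ground[H // 2 :] = 1.0, 1.0
print(guided_style_loss(rng.random((C, H, W)), rng.random((C, H, W)),
                        [sky, ground], [sky, ground]))
```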
Consensus Attention-based Neural Networks for Chinese Reading Comprehension
Title | Consensus Attention-based Neural Networks for Chinese Reading Comprehension |
Authors | Yiming Cui, Ting Liu, Zhipeng Chen, Shijin Wang, Guoping Hu |
Abstract | Reading comprehension has seen a boom in recent NLP research. Several institutes have released Cloze-style reading comprehension data, and these releases have greatly accelerated research on machine comprehension. In this work, we first present Chinese reading comprehension datasets, consisting of a People Daily news dataset and a Children’s Fairy Tale (CFT) dataset. We also propose a consensus attention-based neural network architecture to tackle the Cloze-style reading comprehension problem, which aims to induce a consensus attention over every word in the query. Experimental results show that the proposed neural network significantly outperforms state-of-the-art baselines on several public datasets. Furthermore, we set up a baseline for the Chinese reading comprehension task, which we hope will speed up future research. |
Tasks | Reading Comprehension |
Published | 2016-07-08 |
URL | http://arxiv.org/abs/1607.02250v3 |
http://arxiv.org/pdf/1607.02250v3.pdf | |
PWC | https://paperswithcode.com/paper/consensus-attention-based-neural-networks-for |
Repo | https://github.com/ymcui/Chinese-Cloze-RC |
Framework | none |
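
The consensus attention mechanism computes one attention distribution over the document for every query word and then merges them (for instance by averaging or taking the maximum) into a single distribution used to pick the answer. A simplified numpy sketch with dot-product scoring and random states standing in for the recurrent encoders:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def consensus_attention(doc_states, query_states, mode="avg"):
    """Per-query-word attention over the document, merged into one
    consensus distribution (simplified dot-product scoring)."""
    scores = query_states @ doc_states.T            # (Q, D) alignment scores
    alpha = softmax(scores, axis=-1)                # one distribution per query word
    if mode == "avg":
        merged = alpha.mean(axis=0)
    elif mode == "max":
        merged = alpha.max(axis=0)
    else:
        merged = alpha.sum(axis=0)
    return merged / merged.sum()                    # renormalize

rng = np.random.default_rng(5)
doc = rng.normal(size=(50, 32))      # 50 document positions, hidden size 32
query = rng.normal(size=(10, 32))    # 10 query positions
att = consensus_attention(doc, query)
print("predicted blank-filler position:", int(att.argmax()))
```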
Massively Multilingual Word Embeddings
Title | Massively Multilingual Word Embeddings |
Authors | Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith |
Abstract | We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal for evaluation that will facilitate further research in this area, along with open-source releases of all our methods. |
Tasks | Multilingual Word Embeddings, Text Categorization, Word Embeddings |
Published | 2016-02-05 |
URL | http://arxiv.org/abs/1602.01925v2 |
http://arxiv.org/pdf/1602.01925v2.pdf | |
PWC | https://paperswithcode.com/paper/massively-multilingual-word-embeddings |
Repo | https://github.com/idiap/mhan |
Framework | none |
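
multiCCA projects each non-English embedding space into the English space using canonical correlation analysis over dictionary translation pairs. The numpy sketch below shows a standard whitening-plus-SVD implementation of the CCA core on synthetic "dictionary" data; dimensions, regularization, and helper names are illustrative, and this is not the authors' released toolkit.

```python
import numpy as np

def cca_projections(X, Y, k, reg=1e-5):
    """Canonical correlation analysis via whitening + SVD.

    X, Y: (n, d1), (n, d2) paired rows (dictionary translation pairs).
    Returns projection matrices mapping each side into a shared k-dim space.
    """
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx_white, Wy_white = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, _, Vt = np.linalg.svd(Wx_white @ Cxy @ Wy_white)
    return Wx_white @ U[:, :k], Wy_white @ Vt.T[:, :k]

# Toy "dictionary": 200 translation pairs of 50-dim monolingual embeddings.
rng = np.random.default_rng(6)
shared = rng.normal(size=(200, 20))
X = shared @ rng.normal(size=(20, 50)) + 0.1 * rng.normal(size=(200, 50))
Y = shared @ rng.normal(size=(20, 50)) + 0.1 * rng.normal(size=(200, 50))
Wx, Wy = cca_projections(X, Y, k=20)
print((X @ Wx).shape, (Y @ Wy).shape)   # both land in the shared 20-dim space
```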
Interacting Conceptual Spaces
Title | Interacting Conceptual Spaces |
Authors | Josef Bolt, Bob Coecke, Fabrizio Genovese, Martha Lewis, Daniel Marsden, Robin Piedeleu |
Abstract | We propose applying the categorical compositional scheme of [6] to conceptual space models of cognition. In order to do this we introduce the category of convex relations as a new setting for categorical compositional semantics, emphasizing the convex structure important to conceptual space applications. We show how conceptual spaces for composite types such as adjectives and verbs can be constructed. We illustrate this new model on detailed examples. |
Tasks | |
Published | 2016-08-04 |
URL | http://arxiv.org/abs/1608.01402v1 |
http://arxiv.org/pdf/1608.01402v1.pdf | |
PWC | https://paperswithcode.com/paper/interacting-conceptual-spaces |
Repo | https://github.com/1230113202/NV-JM-DD |
Framework | pytorch |
Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates
Title | Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates |
Authors | Hiroaki Hayashi, Jayanth Koushik, Graham Neubig |
Abstract | Adaptive gradient methods for stochastic optimization adjust the learning rate for each parameter locally. However, there is also a global learning rate which must be tuned in order to get the best performance. In this paper, we present a new algorithm that adapts the learning rate locally for each parameter separately, and also globally for all parameters together. Specifically, we modify Adam, a popular method for training deep learning models, with a coefficient that captures properties of the objective function. Empirically, we show that our method, which we call Eve, outperforms Adam and other popular methods in training deep neural networks, like convolutional neural networks for image classification, and recurrent neural networks for language tasks. |
Tasks | Image Classification, Stochastic Optimization |
Published | 2016-11-04 |
URL | http://arxiv.org/abs/1611.01505v3 |
http://arxiv.org/pdf/1611.01505v3.pdf | |
PWC | https://paperswithcode.com/paper/eve-a-gradient-based-optimization-method-with |
Repo | https://github.com/muupan/chainer-eve |
Framework | none |
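
Eve keeps Adam's per-parameter adaptation but additionally rescales the global learning rate by a smoothed, clipped measure of how much the objective is changing. The sketch below is a simplified variant of that idea on a toy quadratic; the exact feedback formula in the paper (its clipping scheme and running objective estimate) differs from this approximation.

```python
import numpy as np

def eve_like_update(theta, grad, f_t, state, lr=1e-3, betas=(0.9, 0.999, 0.999),
                    eps=1e-8, clip=10.0):
    """Adam step whose global learning rate is divided by a smoothed, clipped
    relative change of the objective (simplified Eve-style rule)."""
    b1, b2, b3 = betas
    state["t"] += 1
    t = state["t"]
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** t)
    v_hat = state["v"] / (1 - b2 ** t)
    if state["f_prev"] is not None:
        r = abs(f_t - state["f_prev"]) / max(min(f_t, state["f_prev"]), eps)
        r = min(max(r, 1.0 / clip), clip)          # bound the feedback signal
        state["d"] = b3 * state["d"] + (1 - b3) * r
    state["f_prev"] = f_t
    return theta - (lr / state["d"]) * m_hat / (np.sqrt(v_hat) + eps)

# Toy quadratic objective f(theta) = ||theta||^2 / 2, gradient = theta.
theta = np.ones(5)
state = {"t": 0, "m": np.zeros(5), "v": np.zeros(5), "d": 1.0, "f_prev": None}
for _ in range(500):
    theta = eve_like_update(theta, grad=theta, f_t=0.5 * np.dot(theta, theta),
                            state=state, lr=1e-2)
print(np.linalg.norm(theta))
```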
Data Programming: Creating Large Training Sets, Quickly
Title | Data Programming: Creating Large Training Sets, Quickly |
Authors | Alexander Ratner, Christopher De Sa, Sen Wu, Daniel Selsam, Christopher Ré |
Abstract | Large labeled training sets are the critical building blocks of supervised learning methods and are key enablers of deep learning techniques. For some applications, creating labeled training sets is the most time-consuming and expensive part of applying machine learning. We therefore propose a paradigm for the programmatic creation of training sets called data programming in which users express weak supervision strategies or domain heuristics as labeling functions, which are programs that label subsets of the data, but that are noisy and may conflict. We show that by explicitly representing this training set labeling process as a generative model, we can “denoise” the generated training set, and establish theoretically that we can recover the parameters of these generative models in a handful of settings. We then show how to modify a discriminative loss function to make it noise-aware, and demonstrate our method over a range of discriminative models including logistic regression and LSTMs. Experimentally, on the 2014 TAC-KBP Slot Filling challenge, we show that data programming would have led to a new winning score, and also show that applying data programming to an LSTM model improves the TAC-KBP score by almost 6 F1 points over a state-of-the-art LSTM baseline (moving it into second place in the competition). Additionally, in initial user studies we observed that data programming may be an easier way for non-experts to create machine learning models when training data is limited or unavailable. |
Tasks | Slot Filling |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.07723v3 |
http://arxiv.org/pdf/1605.07723v3.pdf | |
PWC | https://paperswithcode.com/paper/data-programming-creating-large-training-sets |
Repo | https://github.com/HazyResearch/metal |
Framework | pytorch |
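
In data programming, users write labeling functions that vote noisily (and may abstain or conflict), and a generative model estimates their accuracies in order to denoise the combined labels. The sketch below skips the generative modeling step and assumes the accuracies are known, which reduces denoising to a log-odds-weighted vote; the labeling functions and texts are toy examples.

```python
import numpy as np

# Three toy labeling functions over short texts: each returns +1, -1, or 0 (abstain).
def lf_contains_great(x):  return 1 if "great" in x else 0
def lf_contains_awful(x):  return -1 if "awful" in x else 0
def lf_exclamation(x):     return 1 if x.endswith("!") else 0

LFS = [lf_contains_great, lf_contains_awful, lf_exclamation]

def label_matrix(texts):
    """Apply every labeling function to every example (rows: examples)."""
    return np.array([[lf(x) for lf in LFS] for x in texts])

def weighted_vote(L, accuracies):
    """Combine noisy, conflicting labels with per-LF accuracy weights; data
    programming learns these weights with a generative model, here they are
    given, so denoising reduces to a weighted majority vote."""
    weights = np.log(accuracies / (1 - accuracies))          # log-odds weighting
    scores = L @ weights
    return np.sign(scores), 1 / (1 + np.exp(-2 * scores))    # label (0 = abstain), confidence

texts = ["great movie!", "awful plot", "great but awful pacing", "so boring"]
L = label_matrix(texts)
labels, conf = weighted_vote(L, accuracies=np.array([0.9, 0.8, 0.6]))
print(list(zip(texts, labels, conf.round(2))))
```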
Neural Architectures for Named Entity Recognition
Title | Neural Architectures for Named Entity Recognition |
Authors | Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer |
Abstract | State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available. In this paper, we introduce two new neural architectures—one based on bidirectional LSTMs and conditional random fields, and the other that constructs and labels segments using a transition-based approach inspired by shift-reduce parsers. Our models rely on two sources of information about words: character-based word representations learned from the supervised corpus and unsupervised word representations learned from unannotated corpora. Our models obtain state-of-the-art performance in NER in four languages without resorting to any language-specific knowledge or resources such as gazetteers. |
Tasks | Named Entity Recognition |
Published | 2016-03-04 |
URL | http://arxiv.org/abs/1603.01360v3 |
http://arxiv.org/pdf/1603.01360v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-architectures-for-named-entity |
Repo | https://github.com/karlstratos/mention2vec |
Framework | none |
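
The first architecture in the paper stacks a CRF layer on top of a bidirectional LSTM, so decoding amounts to Viterbi search over per-token emission scores plus tag-transition scores. Below is a self-contained numpy Viterbi decoder with random emissions standing in for BiLSTM outputs and a hand-set constraint forbidding O to I-* transitions; the tag set and scores are illustrative only.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Find the best tag sequence given per-token emission scores and
    tag-to-tag transition scores.

    emissions: (T, K) scores; transitions: (K, K), transitions[i, j] is the
    score of moving from tag i to tag j.
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # best previous tag for each current tag
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # follow back-pointers from the best final tag
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
rng = np.random.default_rng(7)
emissions = rng.normal(size=(6, len(TAGS)))          # stand-in for BiLSTM outputs
transitions = rng.normal(size=(len(TAGS), len(TAGS)))
transitions[0, 2] = transitions[0, 4] = -1e4         # forbid O -> I-* transitions
print([TAGS[i] for i in viterbi_decode(emissions, transitions)])
```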
Reactive Collision Avoidance using Evolutionary Neural Networks
Title | Reactive Collision Avoidance using Evolutionary Neural Networks |
Authors | Hesham Eraqi, Youssef EmadEldin, Mohamed Moustafa |
Abstract | Collision avoidance systems can play a vital role in reducing the number of accidents and saving human lives. In this paper, we introduce and validate a novel method for reactive vehicle collision avoidance using evolutionary neural networks (ENN). A single front-facing rangefinder sensor is the only input required by our method. The training process and the proposed method's analysis and validation are carried out in simulation. Extensive experiments are conducted to analyse the proposed method and evaluate its performance. First, we test the ability to learn collision avoidance on a static free track. Second, we analyse the effect of the rangefinder sensor resolution on the learning process. Third, we test the ability of vehicles to learn collision avoidance individually and simultaneously. Finally, we test the generality of the proposed method using a more realistic and powerful simulation environment (CarMaker), a camera as an alternative input sensor, and lane keeping as an extra feature to learn. The results are encouraging; the proposed method successfully allows vehicles to learn collision avoidance in scenarios unseen during training. It also generalizes well when the input sensor, the simulator, or the task to be learned is changed. |
Tasks | |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08414v1 |
http://arxiv.org/pdf/1609.08414v1.pdf | |
PWC | https://paperswithcode.com/paper/reactive-collision-avoidance-using |
Repo | https://github.com/heshameraqi/GA-NN-Car |
Framework | none |
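
The method evolves neural network weights with a genetic algorithm rather than gradient descent: a population of steering networks mapping rangefinder readings to a steering command is improved by selection, crossover, and mutation. The sketch below uses a toy analytic fitness (steer toward the side with more free space) instead of the paper's driving simulator, so it only illustrates the neuroevolution loop, not the actual training setup.

```python
import numpy as np

rng = np.random.default_rng(8)
N_RAYS, HIDDEN, POP, GENS = 9, 6, 60, 80

def unpack(genome):
    """Genome -> weights of a tiny one-hidden-layer steering network."""
    w1 = genome[: N_RAYS * HIDDEN].reshape(N_RAYS, HIDDEN)
    w2 = genome[N_RAYS * HIDDEN:].reshape(HIDDEN, 1)
    return w1, w2

def steer(genome, rays):
    """Map rangefinder readings (distance per ray) to a steering command in [-1, 1]."""
    w1, w2 = unpack(genome)
    return np.tanh(np.tanh(rays @ w1) @ w2).ravel()

def fitness(genome, scenes):
    """Toy proxy reward: steer toward the side with more free space
    (the paper instead scores behaviour in a driving simulator)."""
    cmd = steer(genome, scenes)
    free_space_bias = scenes[:, : N_RAYS // 2].sum(1) - scenes[:, N_RAYS // 2 + 1:].sum(1)
    target = np.tanh(free_space_bias)
    return -np.mean((cmd - target) ** 2)

genome_len = N_RAYS * HIDDEN + HIDDEN
pop = rng.normal(size=(POP, genome_len))
scenes = rng.uniform(0.0, 1.0, size=(500, N_RAYS))       # random rangefinder scans
for gen in range(GENS):
    scores = np.array([fitness(g, scenes) for g in pop])
    parents = pop[np.argsort(scores)[-POP // 2:]]         # truncation selection
    children = []
    for _ in range(POP - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, genome_len)                 # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child += rng.normal(scale=0.05, size=genome_len)  # Gaussian mutation
        children.append(child)
    pop = np.vstack([parents, children])
print("best fitness:", max(fitness(g, scenes) for g in pop))
```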
Recurrent Memory Networks for Language Modeling
Title | Recurrent Memory Networks for Language Modeling |
Authors | Ke Tran, Arianna Bisazza, Christof Monz |
Abstract | Recurrent Neural Networks (RNNs) have obtained excellent results in many natural language processing (NLP) tasks. However, understanding and interpreting the source of this success remains a challenge. In this paper, we propose the Recurrent Memory Network (RMN), a novel RNN architecture that not only amplifies the power of RNNs but also facilitates our understanding of their internal functioning and allows us to discover underlying patterns in data. We demonstrate the power of RMN on language modeling and sentence completion tasks. On language modeling, RMN outperforms Long Short-Term Memory (LSTM) networks on three large German, Italian, and English datasets. Additionally, we perform an in-depth analysis of various linguistic dimensions that RMN captures. On the Sentence Completion Challenge, for which it is essential to capture sentence coherence, our RMN obtains 69.2% accuracy, surpassing the previous state of the art by a large margin. |
Tasks | Language Modelling |
Published | 2016-01-06 |
URL | http://arxiv.org/abs/1601.01272v2 |
http://arxiv.org/pdf/1601.01272v2.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-memory-networks-for-language |
Repo | https://github.com/simonjisu/NMT |
Framework | pytorch |
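
The RMN inserts a memory block into the recurrence: the current hidden state attends over embeddings of the n most recent words, and the resulting mixture feeds into the prediction. A minimal numpy sketch of that attention step, omitting the separate input/output memory embeddings and the gating used in the paper to combine the memory vector with the LSTM state:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_block(h_t, recent_embeds, Wm, Wc):
    """Memory block of an RMN (simplified): the current hidden state attends
    over the n most recent word embeddings and returns their mixture."""
    keys = recent_embeds @ Wm                 # (n, d) memory representations
    vals = recent_embeds @ Wc
    att = softmax(keys @ h_t)                 # attention of h_t over the history
    return vals.T @ att, att

rng = np.random.default_rng(9)
d, n = 32, 15
h_t = rng.normal(size=d)                      # stand-in for the LSTM hidden state
recent = rng.normal(size=(n, d))              # embeddings of the last n words
Wm, Wc = rng.normal(size=(d, d)), rng.normal(size=(d, d))
mem, att = memory_block(h_t, recent, Wm, Wc)
print(mem.shape, att.argmax())                # which recent word got the most weight
```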
Smooth Imitation Learning for Online Sequence Prediction
Title | Smooth Imitation Learning for Online Sequence Prediction |
Authors | Hoang M. Le, Andrew Kang, Yisong Yue, Peter Carr |
Abstract | We study the problem of smooth imitation learning for online sequence prediction, where the goal is to train a policy that can smoothly imitate demonstrated behavior in a dynamic and continuous environment in response to online, sequential context input. Since the mapping from context to behavior is often complex, we take a learning reduction approach to reduce smooth imitation learning to a regression problem using complex function classes that are regularized to ensure smoothness. We present a learning meta-algorithm that achieves fast and stable convergence to a good policy. Our approach enjoys several attractive properties, including being fully deterministic, employing an adaptive learning rate that can provably yield larger policy improvements compared to previous approaches, and the ability to ensure stable convergence. Our empirical results demonstrate significant performance gains over previous approaches. |
Tasks | Imitation Learning |
Published | 2016-06-03 |
URL | http://arxiv.org/abs/1606.00968v1 |
http://arxiv.org/pdf/1606.00968v1.pdf | |
PWC | https://paperswithcode.com/paper/smooth-imitation-learning-for-online-sequence |
Repo | https://github.com/lucianacendon/simile |
Framework | none |
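
The learning-reduction view here is that smooth imitation becomes regression over (context, previous action) pairs, with the policy class constrained so that rollouts stay smooth. The sketch below is a deliberately crude stand-in: a least-squares policy plus exponential smoothing of the rolled-out action on a toy 1-D tracking task, rather than the paper's meta-algorithm with its adaptive learning rate; all function names and parameters are illustrative.

```python
import numpy as np

def fit_policy(contexts, prev_actions, expert_actions, reg=1e-3):
    """Least-squares policy over [context, previous action] features
    (a stand-in for the richer regularized function classes in the paper)."""
    X = np.hstack([contexts, prev_actions[:, None]])
    A = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ expert_actions)

def smooth_rollout(w, contexts, a0=0.0, sigma=0.7):
    """Online rollout that exponentially smooths the predicted action,
    trading imitation accuracy for smoothness."""
    a, out = a0, []
    for c in contexts:
        pred = np.dot(w, np.append(c, a))
        a = sigma * a + (1 - sigma) * pred     # smooth update of the action
        out.append(a)
    return np.array(out)

# Toy camera-planning-style task: track a noisy 1-D target smoothly.
rng = np.random.default_rng(10)
T = 300
target = np.cumsum(rng.normal(scale=0.2, size=T))        # expert trajectory
contexts = np.stack([target + rng.normal(scale=0.5, size=T),
                     np.gradient(target)], axis=1)        # noisy observations
prev = np.concatenate([[0.0], target[:-1]])
w = fit_policy(contexts, prev, target)
rollout = smooth_rollout(w, contexts)
print("tracking error:", np.mean((rollout - target) ** 2))
```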