July 30, 2019

Paper Group AWR 21

THUMT: An Open Source Toolkit for Neural Machine Translation. AMC: Attention guided Multi-modal Correlation Learning for Image Search. Data-Driven Sparse Structure Selection for Deep Neural Networks. Speech-Based Visual Question Answering. Second-Order Word Embeddings from Nearest Neighbor Topological Features. Parameter Space Noise for Exploration …

THUMT: An Open Source Toolkit for Neural Machine Translation

Title THUMT: An Open Source Toolkit for Neural Machine Translation
Authors Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, Yang Liu
Abstract This paper introduces THUMT, an open-source toolkit for neural machine translation (NMT) developed by the Natural Language Processing Group at Tsinghua University. THUMT implements the standard attention-based encoder-decoder framework on top of Theano and supports three training criteria: maximum likelihood estimation, minimum risk training, and semi-supervised training. It features a visualization tool for displaying the relevance between hidden states in neural networks and contextual words, which helps to analyze the internal workings of NMT. Experiments on Chinese-English datasets show that THUMT using minimum risk training significantly outperforms GroundHog, a state-of-the-art toolkit for NMT.
Tasks Machine Translation
Published 2017-06-20
URL http://arxiv.org/abs/1706.06415v1
PDF http://arxiv.org/pdf/1706.06415v1.pdf
PWC https://paperswithcode.com/paper/thumt-an-open-source-toolkit-for-neural
Repo https://github.com/insigh/THUMT
Framework tf

AMC: Attention guided Multi-modal Correlation Learning for Image Search

Title AMC: Attention guided Multi-modal Correlation Learning for Image Search
Authors Kan Chen, Trung Bui, Fang Chen, Zhaowen Wang, Ram Nevatia
Abstract Given a user’s query, traditional image search systems rank images according to their relevance to a single modality (e.g., image content or surrounding text). Nowadays, an increasing number of images on the Internet are available with associated metadata in rich modalities (e.g., titles, keywords, tags, etc.), which can be exploited for better similarity measure with queries. In this paper, we leverage visual and textual modalities for image search by learning their correlation with input query. According to the intent of query, attention mechanism can be introduced to adaptively balance the importance of different modalities. We propose a novel Attention guided Multi-modal Correlation (AMC) learning method which consists of a jointly learned hierarchy of intra and inter-attention networks. Conditioned on query’s intent, intra-attention networks (i.e., visual intra-attention network and language intra-attention network) attend on informative parts within each modality; a multi-modal inter-attention network promotes the importance of the most query-relevant modalities. In experiments, we evaluate AMC models on the search logs from two real world image search engines and show a significant boost on the ranking of user-clicked images in search results. Additionally, we extend AMC models to caption ranking task on COCO dataset and achieve competitive results compared with recent state-of-the-art methods.
Tasks Image Retrieval
Published 2017-04-03
URL http://arxiv.org/abs/1704.00763v1
PDF http://arxiv.org/pdf/1704.00763v1.pdf
PWC https://paperswithcode.com/paper/amc-attention-guided-multi-modal-correlation
Repo https://github.com/kanchen-usc/amc_att
Framework none

Data-Driven Sparse Structure Selection for Deep Neural Networks

Title Data-Driven Sparse Structure Selection for Deep Neural Networks
Authors Zehao Huang, Naiyan Wang
Abstract Deep convolutional neural networks have liberated their extraordinary power on various tasks. However, it is still very challenging to deploy state-of-the-art models into real-world applications due to their high computational complexity. How can we design a compact and effective network without massive experiments and expert knowledge? In this paper, we propose a simple and effective framework to learn and prune deep models in an end-to-end manner. In our framework, a new type of parameter, the scaling factor, is first introduced to scale the outputs of specific structures, such as neurons, groups or residual blocks. Then we add sparsity regularizations on these factors, and solve this optimization problem by a modified stochastic Accelerated Proximal Gradient (APG) method. By forcing some of the factors to zero, we can safely remove the corresponding structures, thus pruning the unimportant parts of a CNN. Compared with other structure selection methods that may need thousands of trials or iterative fine-tuning, our method is trained fully end-to-end in one training pass without bells and whistles. We evaluate our method, Sparse Structure Selection, with several state-of-the-art CNNs, and demonstrate very promising results with adaptive depth and width selection.
Tasks
Published 2017-07-05
URL http://arxiv.org/abs/1707.01213v3
PDF http://arxiv.org/pdf/1707.01213v3.pdf
PWC https://paperswithcode.com/paper/data-driven-sparse-structure-selection-for
Repo https://github.com/huangzehao/sparse-structure-selection
Framework mxnet
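
The pruning idea in the abstract above can be illustrated with a small sketch. It applies a plain proximal gradient step with L1 soft-thresholding to a list of scaling factors, then reports which structures survive. This is a simplification under stated assumptions: the paper uses a modified stochastic Accelerated Proximal Gradient method, while the step below is unaccelerated, and the function names and numbers are hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty threshold * |value|."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def prox_step(factors, grads, lr, l1_weight):
    """One plain (non-accelerated) proximal gradient step on the scaling factors."""
    return [
        soft_threshold(f - lr * g, lr * l1_weight)
        for f, g in zip(factors, grads)
    ]


def kept_structures(factors):
    """Indices of structures whose scaling factor is still non-zero."""
    return [i for i, f in enumerate(factors) if f != 0.0]


# Hypothetical scaling factors for four structures and their loss gradients.
factors = [0.9, 0.05, -0.4, 0.02]
grads = [0.1, 0.2, -0.1, 0.0]
updated = prox_step(factors, grads, lr=0.1, l1_weight=0.5)
surviving = kept_structures(updated)
```

Factors that the threshold drives exactly to zero mark structures that could be removed; the remaining ones keep a shrunken scale.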

Speech-Based Visual Question Answering

Title Speech-Based Visual Question Answering
Authors Ted Zhang, Dengxin Dai, Tinne Tuytelaars, Marie-Francine Moens, Luc Van Gool
Abstract This paper introduces speech-based visual question answering (VQA), the task of generating an answer given an image and a spoken question. Two methods are studied: an end-to-end, deep neural network that directly uses audio waveforms as input versus a pipelined approach that performs ASR (Automatic Speech Recognition) on the question, followed by text-based visual question answering. Furthermore, we investigate the robustness of both methods by injecting various levels of noise into the spoken question and find that both methods tolerate noise at similar levels.
Tasks Question Answering, Speech Recognition, Visual Question Answering
Published 2017-05-01
URL http://arxiv.org/abs/1705.00464v2
PDF http://arxiv.org/pdf/1705.00464v2.pdf
PWC https://paperswithcode.com/paper/speech-based-visual-question-answering
Repo https://github.com/zted/sbvqa
Framework none

Second-Order Word Embeddings from Nearest Neighbor Topological Features

Title Second-Order Word Embeddings from Nearest Neighbor Topological Features
Authors Denis Newman-Griffis, Eric Fosler-Lussier
Abstract We introduce second-order vector representations of words, induced from nearest neighborhood topological features in pre-trained contextual word embeddings. We then analyze the effects of using second-order embeddings as input features in two deep natural language processing models, for named entity recognition and recognizing textual entailment, as well as a linear model for paraphrase recognition. Surprisingly, we find that nearest neighbor information alone is sufficient to capture most of the performance benefits derived from using pre-trained word embeddings. Furthermore, second-order embeddings are able to handle highly heterogeneous data better than first-order representations, though at the cost of some specificity. Additionally, augmenting contextual embeddings with second-order information further improves model performance in some cases. Due to variance in the random initializations of word embeddings, utilizing nearest neighbor features from multiple first-order embedding samples can also contribute to downstream performance gains. Finally, we identify intriguing characteristics of second-order embedding spaces for further research, including much higher density and different semantic interpretations of cosine similarity.
Tasks Named Entity Recognition, Natural Language Inference, Word Embeddings
Published 2017-05-23
URL http://arxiv.org/abs/1705.08488v1
PDF http://arxiv.org/pdf/1705.08488v1.pdf
PWC https://paperswithcode.com/paper/second-order-word-embeddings-from-nearest
Repo https://github.com/drgriffis/knn-embedding
Framework tf

Parameter Space Noise for Exploration

Title Parameter Space Noise for Exploration
Authors Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz
Abstract Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent’s parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods makes it possible to combine the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually.
Tasks Continuous Control
Published 2017-06-06
URL http://arxiv.org/abs/1706.01905v2
PDF http://arxiv.org/pdf/1706.01905v2.pdf
PWC https://paperswithcode.com/paper/parameter-space-noise-for-exploration
Repo https://github.com/tensorflow/models
Framework tf
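
To make the contrast in the abstract above concrete, the sketch below compares the two kinds of noise on a toy linear policy. It is only an illustration under assumptions: the policy, the fixed noise scale, and all names are hypothetical, and any adaptation of the noise scale is left out.

```python
import random


def policy(weights, state):
    """Deterministic linear policy: the action is the dot product of weights and state."""
    return sum(w * s for w, s in zip(weights, state))


def act_with_action_noise(weights, state, sigma, rng):
    """Action-space noise: fresh noise is added to every action."""
    return policy(weights, state) + rng.gauss(0.0, sigma)


def perturb_parameters(weights, sigma, rng):
    """Parameter-space noise: perturb the weights once, e.g. at the start of an episode."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


rng = random.Random(0)
weights = [0.5, -0.2]
state = [1.0, 2.0]

# The same state visited twice within one episode.
noisy_weights = perturb_parameters(weights, sigma=0.1, rng=rng)
a1 = policy(noisy_weights, state)
a2 = policy(noisy_weights, state)  # same action: the perturbed policy is deterministic

b1 = act_with_action_noise(weights, state, sigma=0.1, rng=rng)
b2 = act_with_action_noise(weights, state, sigma=0.1, rng=rng)  # new noise on each call
```

With perturbed parameters the agent reacts the same way whenever it revisits a state during the episode, which is one way to read the "more consistent exploration" mentioned in the abstract.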

Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional Random Fields

Title Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional Random Fields
Authors Fei Liu, Timothy Baldwin, Trevor Cohn
Abstract Despite successful applications across a broad range of NLP tasks, conditional random fields (“CRFs”), in particular the linear-chain variant, are only able to model local features. While this has important benefits in terms of inference tractability, it limits the ability of the model to capture long-range dependencies between items. Attempts to extend CRFs to capture long-range dependencies have largely come at the cost of computational complexity and approximate inference. In this work, we propose an extension to CRFs by integrating external memory, taking inspiration from memory networks, thereby allowing CRFs to incorporate information far beyond neighbouring steps. Experiments across two tasks show substantial improvements over strong CRF and LSTM baselines.
Tasks
Published 2017-09-12
URL http://arxiv.org/abs/1709.03637v2
PDF http://arxiv.org/pdf/1709.03637v2.pdf
PWC https://paperswithcode.com/paper/capturing-long-range-contextual-dependencies
Repo https://github.com/liufly/mecrf
Framework tf

Uplift Modeling with Multiple Treatments and General Response Types

Title Uplift Modeling with Multiple Treatments and General Response Types
Authors Yan Zhao, Xiao Fang, David Simchi-Levi
Abstract Randomized experiments have been used to assist decision-making in many areas. They help people select the optimal treatment for the test population with certain statistical guarantees. However, subjects can show significant heterogeneity in response to treatments. The problem of customizing treatment assignment based on subject characteristics is known as uplift modeling, differential response analysis, or personalized treatment learning in the literature. A key feature of uplift modeling is that the data is unlabeled. It is impossible to know whether the chosen treatment is optimal for an individual subject because response under alternative treatments is unobserved. This presents a challenge to both the training and the evaluation of uplift models. In this paper we describe how to obtain an unbiased estimate of the key performance metric of an uplift model, the expected response. We present a new uplift algorithm which creates a forest of randomized trees. The trees are built with a splitting criterion designed to directly optimize their uplift performance based on the proposed evaluation method. Both the evaluation method and the algorithm apply to an arbitrary number of treatments and general response types. Experimental results on synthetic data and industry-provided data show that our algorithm leads to significant performance improvement over other applicable methods.
Tasks Decision Making
Published 2017-05-23
URL http://arxiv.org/abs/1705.08492v1
PDF http://arxiv.org/pdf/1705.08492v1.pdf
PWC https://paperswithcode.com/paper/uplift-modeling-with-multiple-treatments-and
Repo https://github.com/Matthias2193/APA
Framework none
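
One common way to estimate the expected response of an assignment rule from randomized data is inverse-probability weighting: keep only the subjects whose received treatment matches the rule's choice and reweight their responses by the assignment probability. The sketch below shows that idea; it is an assumption that this matches the estimator described in the abstract above, and the record layout, the rule, and the numbers are hypothetical.

```python
def expected_response(records, assign, treatment_probs):
    """Inverse-probability-weighted estimate of the expected response under `assign`.

    records: list of (features, treatment_received, response) tuples
             collected in a randomized experiment.
    assign: function mapping features to the treatment the model would choose.
    treatment_probs: probability with which each treatment was assigned.
    """
    total = 0.0
    for features, treatment, response in records:
        # Only subjects who happened to receive the chosen treatment contribute.
        if assign(features) == treatment:
            total += response / treatment_probs[treatment]
    return total / len(records)


def age_rule(features):
    """Hypothetical uplift model: treatment A for younger subjects, B otherwise."""
    return "A" if features["age"] < 40 else "B"


records = [
    ({"age": 25}, "A", 1.0),
    ({"age": 25}, "B", 0.0),
    ({"age": 60}, "A", 0.0),
    ({"age": 60}, "B", 2.0),
]
probs = {"A": 0.5, "B": 0.5}
estimate = expected_response(records, age_rule, probs)
```

Because the estimate needs only logged treatments and responses, the same function can score any candidate assignment rule on held-out experimental data.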

A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models

Title A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models
Authors Ilija Ilievski, Jiashi Feng
Abstract Visual question answering as a recently proposed multimodal learning task has enjoyed wide attention from the deep learning community. Lately, the focus was on developing new representation fusion methods and attention mechanisms to achieve superior performance. On the other hand, very little focus has been put on the models’ loss function, arguably one of the most important aspects of training deep learning models. The prevailing practice is to use a cross entropy loss function that penalizes the probability given to all the answers in the vocabulary except the single most common answer for the particular question. However, the VQA evaluation function compares the predicted answer with all the ground-truth answers for the given question and if there is a match, a partial point is given. This causes a discrepancy between the model’s cross entropy loss and the model’s accuracy as calculated by the VQA evaluation function. In this work, we propose a novel loss, termed as soft cross entropy, that considers all ground-truth answers and thus reduces the loss-accuracy discrepancy. The proposed loss leads to an improved training convergence of VQA models and an increase in accuracy of as much as 1.6%.
Tasks Question Answering, Visual Question Answering
Published 2017-08-02
URL http://arxiv.org/abs/1708.00584v1
PDF http://arxiv.org/pdf/1708.00584v1.pdf
PWC https://paperswithcode.com/paper/a-simple-loss-function-for-improving-the
Repo https://github.com/ilija139/vqa-soft
Framework torch
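
The difference between the two losses in the abstract above can be sketched in a few lines. The version below weights each ground-truth answer by how often annotators gave it; that weighting is an assumption made for illustration and may differ from the paper's exact definition, and the function names and counts are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def hard_cross_entropy(logits, answer_counts):
    """Standard loss: only the single most common answer is the target."""
    probs = softmax(logits)
    target = max(range(len(answer_counts)), key=lambda i: answer_counts[i])
    return -math.log(probs[target])


def soft_cross_entropy(logits, answer_counts):
    """Soft target: every ground-truth answer contributes, weighted by its annotator count."""
    probs = softmax(logits)
    total = sum(answer_counts)
    return -sum(
        (c / total) * math.log(p)
        for c, p in zip(answer_counts, probs)
        if c > 0
    )


# Hypothetical question with a 3-answer vocabulary: 6 annotators gave answer 0,
# 4 gave answer 1, none gave answer 2. The model prefers answer 1.
counts = [6, 4, 0]
logits = [0.0, 2.0, 0.0]
hard = hard_cross_entropy(logits, counts)
soft = soft_cross_entropy(logits, counts)
```

Here the model favors an answer that several annotators also gave, so the soft loss penalizes it less than the hard loss does.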

Stochastic Answer Networks for Machine Reading Comprehension

Title Stochastic Answer Networks for Machine Reading Comprehension
Authors Xiaodong Liu, Yelong Shen, Kevin Duh, Jianfeng Gao
Abstract We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension. Compared to previous work such as ReasoNet which used reinforcement learning to determine the number of steps, the unique feature is the use of a kind of stochastic prediction dropout on the answer module (final layer) of the neural network during the training. We show that this simple trick improves robustness and achieves results competitive to the state-of-the-art on the Stanford Question Answering Dataset (SQuAD), the Adversarial SQuAD, and the Microsoft MAchine Reading COmprehension Dataset (MS MARCO).
Tasks Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2017-12-10
URL http://arxiv.org/abs/1712.03556v2
PDF http://arxiv.org/pdf/1712.03556v2.pdf
PWC https://paperswithcode.com/paper/stochastic-answer-networks-for-machine
Repo https://github.com/kevinduh/san_mrc
Framework pytorch
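
A rough sketch of the stochastic prediction dropout described in the abstract above: the answer module produces one answer distribution per reasoning step, whole steps are dropped at random during training, and the remaining distributions are averaged. The details here (keeping at least one step, averaging all steps at evaluation time, the names and numbers) are assumptions for illustration, not taken from the paper's code.

```python
import random


def average_distributions(dists):
    """Element-wise mean of a list of equally sized probability distributions."""
    n = len(dists)
    return [sum(d[i] for d in dists) / n for i in range(len(dists[0]))]


def stochastic_answer(step_predictions, drop_rate, rng, training=True):
    """Average per-step answer distributions, randomly dropping whole steps in training."""
    if not training:
        return average_distributions(step_predictions)
    kept = [p for p in step_predictions if rng.random() >= drop_rate]
    if not kept:
        # Keep at least one step so the average is defined.
        kept = [rng.choice(step_predictions)]
    return average_distributions(kept)


# Hypothetical answer distributions from three reasoning steps over two candidates.
steps = [[0.7, 0.3], [0.5, 0.5], [0.9, 0.1]]
rng = random.Random(0)
train_pred = stochastic_answer(steps, drop_rate=0.4, rng=rng, training=True)
eval_pred = stochastic_answer(steps, drop_rate=0.4, rng=rng, training=False)
```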

Reinforced Mnemonic Reader for Machine Reading Comprehension

Title Reinforced Mnemonic Reader for Machine Reading Comprehension
Authors Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, Ming Zhou
Abstract In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a reattention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It always encourages the model to predict a more acceptable answer so as to address the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.
Tasks Machine Reading Comprehension, Question Answering, Reading Comprehension
Published 2017-05-08
URL http://arxiv.org/abs/1705.02798v6
PDF http://arxiv.org/pdf/1705.02798v6.pdf
PWC https://paperswithcode.com/paper/reinforced-mnemonic-reader-for-machine
Repo https://github.com/yly-revive/chainer-mreader
Framework none

FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension

Title FusionNet: Fusing via Fully-Aware Attention with Application to Machine Comprehension
Authors Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, Weizhu Chen
Abstract This paper introduces a new neural structure called FusionNet, which extends existing attention approaches from three perspectives. First, it puts forward a novel concept of “history of word” to characterize attention information from the lowest word-level embedding up to the highest semantic-level representation. Second, it introduces an improved attention scoring function that better utilizes the “history of word” concept. Third, it proposes a fully-aware multi-level attention mechanism to capture the complete information in one text (such as a question) and exploit it in its counterpart (such as context or passage) layer by layer. We apply FusionNet to the Stanford Question Answering Dataset (SQuAD) and it achieves the first position for both single and ensemble model on the official SQuAD leaderboard at the time of writing (Oct. 4th, 2017). Meanwhile, we verify the generalization of FusionNet with two adversarial SQuAD datasets and it sets up the new state-of-the-art on both datasets: on AddSent, FusionNet increases the best F1 metric from 46.6% to 51.4%; on AddOneSent, FusionNet boosts the best F1 metric from 56.0% to 60.7%.
Tasks Question Answering, Reading Comprehension
Published 2017-11-16
URL http://arxiv.org/abs/1711.07341v2
PDF http://arxiv.org/pdf/1711.07341v2.pdf
PWC https://paperswithcode.com/paper/fusionnet-fusing-via-fully-aware-attention
Repo https://github.com/felixgwu/FastFusionNet
Framework pytorch

DCN+: Mixed Objective and Deep Residual Coattention for Question Answering

Title DCN+: Mixed Objective and Deep Residual Coattention for Question Answering
Authors Caiming Xiong, Victor Zhong, Richard Socher
Abstract Traditional models for question answering optimize using cross entropy loss, which encourages exact answers at the cost of penalizing nearby or overlapping answers that are sometimes equally accurate. We propose a mixed objective that combines cross entropy loss with self-critical policy learning. The objective uses rewards derived from word overlap to solve the misalignment between evaluation metric and optimization objective. In addition to the mixed objective, we improve dynamic coattention networks (DCN) with a deep residual coattention encoder that is inspired by recent work in deep self-attention and residual networks. Our proposals improve model performance across question types and input lengths, especially for long questions that require the ability to capture long-term dependencies. On the Stanford Question Answering Dataset, our model achieves state-of-the-art results with 75.1% exact match accuracy and 83.1% F1, while the ensemble obtains 78.9% exact match accuracy and 86.0% F1.
Tasks Question Answering
Published 2017-10-31
URL http://arxiv.org/abs/1711.00106v2
PDF http://arxiv.org/pdf/1711.00106v2.pdf
PWC https://paperswithcode.com/paper/dcn-mixed-objective-and-deep-residual
Repo https://github.com/lmn-extracts/dcn_plus
Framework tf
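
The "rewards derived from word overlap" in the abstract above can be pictured with a token-level F1 score, the kind of overlap measure commonly used to evaluate SQuAD answers. The sketch below is a simplified version that assumes whitespace tokenization and lowercasing only; the function name and examples are hypothetical, and how the reward enters the mixed objective is not shown.

```python
from collections import Counter


def f1_overlap(prediction, ground_truth):
    """Token-level F1 between a predicted answer span and a ground-truth answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# An overlapping answer gets partial credit even though it is not an exact match.
partial = f1_overlap("Eiffel Tower", "the Eiffel Tower")
miss = f1_overlap("Paris", "London")
```

An exact-match criterion would score the first prediction as wrong, whereas the overlap score still rewards it, which is the misalignment the abstract refers to.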

Fast(er) Exact Decoding and Global Training for Transition-Based Dependency Parsing via a Minimal Feature Set

Title Fast(er) Exact Decoding and Global Training for Transition-Based Dependency Parsing via a Minimal Feature Set
Authors Tianze Shi, Liang Huang, Lillian Lee
Abstract We first present a minimal feature set for transition-based dependency parsing, continuing a recent trend started by Kiperwasser and Goldberg (2016a) and Cross and Huang (2016a) of using bi-directional LSTM features. We plug our minimal feature set into the dynamic-programming framework of Huang and Sagae (2010) and Kuhlmann et al. (2011) to produce the first implementation of worst-case O(n^3) exact decoders for arc-hybrid and arc-eager transition systems. With our minimal features, we also present O(n^3) global training methods. Finally, using ensembles including our new parsers, we achieve the best unlabeled attachment score reported (to our knowledge) on the Chinese Treebank and the “second-best-in-class” result on the English Penn Treebank.
Tasks Dependency Parsing, Transition-Based Dependency Parsing
Published 2017-08-30
URL http://arxiv.org/abs/1708.09403v1
PDF http://arxiv.org/pdf/1708.09403v1.pdf
PWC https://paperswithcode.com/paper/faster-exact-decoding-and-global-training-for
Repo https://github.com/tzshi/dp-parser-emnlp17
Framework none

Unified Spectral Clustering with Optimal Graph

Title Unified Spectral Clustering with Optimal Graph
Authors Zhao Kang, Chong Peng, Qiang Cheng, Zenglin Xu
Abstract Spectral clustering has found extensive use in many areas. Most traditional spectral clustering algorithms work in three separate steps: similarity graph construction; continuous labels learning; discretizing the learned labels by k-means clustering. Such common practice has two potential flaws, which may lead to severe information loss and performance degradation. First, a predefined similarity graph might not be optimal for subsequent clustering. It is well-accepted that the similarity graph highly affects the clustering results. To this end, we propose to automatically learn similarity information from data and simultaneously consider the constraint that the similarity matrix has exactly c connected components if there are c clusters. Second, the discrete solution may deviate from the spectral solution since the k-means method is well known to be sensitive to the initialization of cluster centers. In this work, we transform the candidate solution into a new one that better approximates the discrete one. Finally, those three subtasks are integrated into a unified framework, with each subtask iteratively boosted by using the results of the others towards an overall optimal solution. It is known that the performance of a kernel method is largely determined by the choice of kernels. To tackle this practical problem of how to select the most suitable kernel for a particular data set, we further extend our model to incorporate multiple kernel learning ability. Extensive experiments demonstrate the superiority of our proposed method as compared to existing clustering approaches.
Tasks graph construction
Published 2017-11-12
URL http://arxiv.org/abs/1711.04258v1
PDF http://arxiv.org/pdf/1711.04258v1.pdf
PWC https://paperswithcode.com/paper/unified-spectral-clustering-with-optimal
Repo https://github.com/sckangz/AAAI18
Framework none