July 29, 2019

2956 words 14 mins read

Paper Group AWR 99

Improving Lexical Choice in Neural Machine Translation. Joint Matrix-Tensor Factorization for Knowledge Base Inference. DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding. Hierarchical Attentive Recurrent Tracking. Training Ensembles to Detect Adversarial Examples. QCD-Aware Recursive Neural Networks for Jet Physics. …

Improving Lexical Choice in Neural Machine Translation

Title Improving Lexical Choice in Neural Machine Translation
Authors Toan Q. Nguyen, David Chiang
Abstract We explore two solutions to the problem of mistranslating rare words in neural machine translation. First, we argue that the standard output layer, which computes the inner product of a vector representing the context with all possible output word embeddings, rewards frequent words disproportionately, and we propose to fix the norms of both vectors to a constant value. Second, we integrate a simple lexical module which is jointly trained with the rest of the model. We evaluate our approaches on eight language pairs with data sizes ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU, surpassing phrase-based translation in nearly all settings.
Tasks Machine Translation, Word Embeddings
Published 2017-10-03
URL http://arxiv.org/abs/1710.01329v3
PDF http://arxiv.org/pdf/1710.01329v3.pdf
PWC https://paperswithcode.com/paper/improving-lexical-choice-in-neural-machine
Repo https://github.com/arturo-garza/NMTLexiconModel
Framework none
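
To make the paper’s first fix concrete, here is a minimal NumPy sketch of a fixed-norm output layer: both the context vector and every output word embedding are rescaled to a constant norm r before the inner product, so frequent words can no longer win simply by having larger embeddings. The scale r and the toy dimensions are illustrative assumptions, not the paper’s settings.

```python
import numpy as np

def fixed_norm_logits(h, W, r=5.0):
    """Output-layer scores with both vectors fixed to a constant norm.

    h: (d,) context vector from the decoder.
    W: (V, d) output word embeddings, one row per vocabulary word.
    r: the constant norm (an assumed hyperparameter for illustration).
    """
    h_hat = r * h / np.linalg.norm(h)
    W_hat = r * W / np.linalg.norm(W, axis=1, keepdims=True)
    return W_hat @ h_hat  # (V,) scores no longer reward large-norm words

# toy usage
rng = np.random.default_rng(0)
h, W = rng.normal(size=64), rng.normal(size=(1000, 64))
probs = np.exp(fixed_norm_logits(h, W))
probs /= probs.sum()
```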

Joint Matrix-Tensor Factorization for Knowledge Base Inference

Title Joint Matrix-Tensor Factorization for Knowledge Base Inference
Authors Prachi Jain, Shikhar Murty, Mausam, Soumen Chakrabarti
Abstract While several matrix factorization (MF) and tensor factorization (TF) models have been proposed for knowledge base (KB) inference, they have rarely been compared across various datasets. Is there a single model that performs well across datasets? If not, what characteristics of a dataset determine the performance of MF and TF models? Is there a joint TF+MF model that performs robustly on all datasets? We perform an extensive evaluation to compare popular KB inference models across popular datasets in the literature. In addition to answering the questions above, we remove a limitation in the standard evaluation protocol for MF models, propose an extension to MF models so that they can better handle out-of-vocabulary (OOV) entity pairs, and develop a novel combination of TF and MF models. We also analyze and explain the results based on models and dataset characteristics. Our best model is robust, and obtains strong results across all datasets.
Tasks
Published 2017-06-02
URL http://arxiv.org/abs/1706.00637v1
PDF http://arxiv.org/pdf/1706.00637v1.pdf
PWC https://paperswithcode.com/paper/joint-matrix-tensor-factorization-for
Repo https://github.com/MurtyShikhar/KBI
Framework none
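
A rough sketch of the pieces being combined: the MF score embeds each entity pair as one vector, the TF score uses a DistMult-style three-way product (a common stand-in; the paper’s exact TF model may differ), and the joint score below simply adds the two, which is only one plausible combination.

```python
import numpy as np

def mf_score(pair_emb, rel_emb):
    # Matrix factorization: one embedding per (subject, object) pair.
    return pair_emb @ rel_emb

def tf_score(subj_emb, rel_emb, obj_emb):
    # Tensor factorization, DistMult-style stand-in.
    return np.sum(subj_emb * rel_emb * obj_emb)

def joint_score(pair_emb, subj_emb, obj_emb, rel_mf, rel_tf):
    # Additive combination for illustration; the paper's novel
    # TF+MF combination may be more involved.
    return mf_score(pair_emb, rel_mf) + tf_score(subj_emb, rel_tf, obj_emb)

# toy usage with random embeddings
rng = np.random.default_rng(0)
d = 16
args = [rng.normal(size=d) for _ in range(5)]
print(joint_score(*args))
```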

DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

Title DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding
Authors Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, Chengqi Zhang
Abstract Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used in NLP tasks to capture long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, “Directional Self-Attention Network (DiSAN)”, is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is composed only of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre Natural Language Inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.
Tasks Natural Language Inference, Sentence Embedding
Published 2017-09-14
URL http://arxiv.org/abs/1709.04696v3
PDF http://arxiv.org/pdf/1709.04696v3.pdf
PWC https://paperswithcode.com/paper/disan-directional-self-attention-network-for
Repo https://github.com/taoshen58/DiSAN
Framework tf
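
A simplified sketch of the two stages: a forward-masked token-to-token attention that encodes temporal order, then a feature-wise (“multi-dimensional”) attention that compresses the sequence into a single vector. The scaled dot-product alignment and the projection `W` are simplifications; DiSAN’s learned multi-dimensional alignment is richer, and it masks the diagonal as well.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def directional_self_attention(X, forward=True):
    """Directionally masked attention over tokens X of shape (n, d)."""
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)                     # (n, n) alignment
    mask = np.tril(np.ones((n, n), dtype=bool)) if forward \
        else np.triu(np.ones((n, n), dtype=bool))
    scores = np.where(mask, scores, -1e9)             # temporal order encoded
    return softmax(scores, axis=1) @ X                # (n, d) contextual tokens

def multi_dim_attention(H, W):
    """Feature-wise attention compressing (n, d) tokens to a (d,) vector."""
    A = softmax(H @ W, axis=0)                        # one weighting per feature
    return (A * H).sum(axis=0)

# toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
sent = multi_dim_attention(directional_self_attention(X), rng.normal(size=(8, 8)))
```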

Hierarchical Attentive Recurrent Tracking

Title Hierarchical Attentive Recurrent Tracking
Authors Adam R. Kosiorek, Alex Bewley, Ingmar Posner
Abstract Class-agnostic object tracking is particularly difficult in cluttered environments, as target-specific discriminative models cannot be learned a priori. Inspired by how the human visual cortex employs spatial attention and separate “where” and “what” processing pathways to actively suppress irrelevant visual features, this work develops a hierarchical attentive recurrent model for single object tracking in videos. The first layer of attention discards the majority of the background by selecting a region containing the object of interest, while the subsequent layers tune in on visual features particular to the tracked object. This framework is fully differentiable and can be trained in a purely data-driven fashion by gradient methods. To improve training convergence, we augment the loss function with terms for a number of auxiliary tasks relevant to tracking. Evaluation of the proposed model is performed on two datasets: pedestrian tracking on the KTH activity recognition dataset and the more difficult KITTI object tracking dataset.
Tasks Activity Recognition, Object Tracking
Published 2017-06-28
URL http://arxiv.org/abs/1706.09262v2
PDF http://arxiv.org/pdf/1706.09262v2.pdf
PWC https://paperswithcode.com/paper/hierarchical-attentive-recurrent-tracking
Repo https://github.com/akosiorek/hart
Framework tf
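
The two pathways reduce to “crop first, then gate features”. The sketch below replaces the paper’s differentiable spatial attention with a hard integer crop and uses a hand-picked channel gate, purely for illustration.

```python
import numpy as np

def spatial_glimpse(frame, box):
    """'Where' pathway: keep only a region of interest.
    frame: (H, W, C) image; box: (y0, y1, x0, x1) integer crop, a
    non-differentiable simplification of the paper's attention."""
    y0, y1, x0, x1 = box
    return frame[y0:y1, x0:x1]

def appearance_gating(features, gate):
    """'What' pathway: suppress channels unrelated to the target.
    gate: (C,) values in [0, 1], e.g. produced by the recurrent state."""
    return features * gate

# toy usage: crop a region, then down-weight two of three channels
frame = np.random.rand(128, 128, 3)
glimpse = spatial_glimpse(frame, (40, 88, 32, 96))
gated = appearance_gating(glimpse, np.array([1.0, 0.2, 0.0]))
```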

Training Ensembles to Detect Adversarial Examples

Title Training Ensembles to Detect Adversarial Examples
Authors Alexander Bagnall, Razvan Bunescu, Gordon Stewart
Abstract We propose a new ensemble method for detecting and classifying adversarial examples generated by state-of-the-art attacks, including DeepFool and C&W. Our method works by training the members of an ensemble to have low classification error on random benign examples while simultaneously minimizing agreement on examples outside the training distribution. We evaluate on both MNIST and CIFAR-10, against oblivious and both white- and black-box adversaries.
Tasks
Published 2017-12-11
URL http://arxiv.org/abs/1712.04006v1
PDF http://arxiv.org/pdf/1712.04006v1.pdf
PWC https://paperswithcode.com/paper/training-ensembles-to-detect-adversarial
Repo https://github.com/bagnalla/ensemble_detect_adv
Framework tf
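
The stated objective has two terms: ordinary classification loss on benign data plus a penalty on ensemble agreement off-distribution. The sketch below measures agreement as the mean pairwise inner product of the members’ softmax outputs, which is one plausible choice and not necessarily the paper’s exact formulation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_loss(benign_logits, labels, ood_logits, lam=1.0):
    """benign_logits, ood_logits: (M, N, K) logits from M members."""
    M = benign_logits.shape[0]
    p = softmax(benign_logits)
    idx = np.arange(labels.shape[0])
    # average cross-entropy over members on benign examples
    ce = -np.mean([np.log(p[m, idx, labels] + 1e-12) for m in range(M)])
    # mean pairwise agreement of member predictions off-distribution
    q = softmax(ood_logits)
    agree, pairs = 0.0, 0
    for i in range(M):
        for j in range(i + 1, M):
            agree += np.mean(np.sum(q[i] * q[j], axis=-1))
            pairs += 1
    return ce + lam * agree / max(pairs, 1)
```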

QCD-Aware Recursive Neural Networks for Jet Physics

Title QCD-Aware Recursive Neural Networks for Jet Physics
Authors Gilles Louppe, Kyunghyun Cho, Cyril Becot, Kyle Cranmer
Abstract Recent progress in applying machine learning for jet physics has been built upon an analogy between calorimeters and images. In this work, we present a novel class of recursive neural networks built instead upon an analogy between QCD and natural languages. In the analogy, four-momenta are like words and the clustering history of sequential recombination jet algorithms is like the parsing of a sentence. Our approach works directly with the four-momenta of a variable-length set of particles, and the jet-based tree structure varies on an event-by-event basis. Our experiments highlight the flexibility of our method for building task-specific jet embeddings and show that recursive architectures are significantly more accurate and data efficient than previous image-based networks. We extend the analogy from individual jets (sentences) to full events (paragraphs), and show for the first time an event-level classifier operating on all the stable particles produced in an LHC event.
Tasks
Published 2017-02-02
URL http://arxiv.org/abs/1702.00748v2
PDF http://arxiv.org/pdf/1702.00748v2.pdf
PWC https://paperswithcode.com/paper/qcd-aware-recursive-neural-networks-for-jet
Repo https://github.com/SebastianMacaluso/RecNN_PyTorch_batch
Framework pytorch
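
The core recursion is easy to state: leaves embed four-momenta, and internal nodes combine their children following the jet algorithm’s clustering history. A minimal sketch with random weights, a tanh nonlinearity, and no per-node kinematic features (all simplifications relative to the paper):

```python
import numpy as np

def embed_jet(node, W_leaf, W_comb):
    """Recursively embed a jet clustering tree.

    node: a length-4 four-momentum array (leaf particle), or a
          (left, right) pair following the clustering history.
    """
    if isinstance(node, np.ndarray):       # leaf: project the four-momentum
        return np.tanh(W_leaf @ node)
    left, right = node
    h_l = embed_jet(left, W_leaf, W_comb)
    h_r = embed_jet(right, W_leaf, W_comb)
    return np.tanh(W_comb @ np.concatenate([h_l, h_r]))

# toy usage: three particles clustered as ((p1, p2), p3)
rng = np.random.default_rng(0)
d = 8
W_leaf, W_comb = rng.normal(size=(d, 4)), rng.normal(size=(d, 2 * d))
p1, p2, p3 = (rng.normal(size=4) for _ in range(3))
h = embed_jet(((p1, p2), p3), W_leaf, W_comb)   # (d,) jet embedding
```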

Residual and Plain Convolutional Neural Networks for 3D Brain MRI Classification

Title Residual and Plain Convolutional Neural Networks for 3D Brain MRI Classification
Authors Sergey Korolev, Amir Safiullin, Mikhail Belyaev, Yulia Dodonova
Abstract In recent years there have been a number of studies that applied deep learning algorithms to neuroimaging data. Pipelines used in those studies mostly require multiple processing steps for feature extraction, although modern advancements in deep learning for image classification can provide a powerful framework for automatic feature generation and more straightforward analysis. In this paper, we show how similar performance can be achieved by skipping these feature extraction steps with residual and plain 3D convolutional neural network architectures. We demonstrate the performance of the proposed approach for classification of Alzheimer’s disease versus mild cognitive impairment and normal controls on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset of 3D structural MRI brain scans.
Tasks Image Classification
Published 2017-01-23
URL http://arxiv.org/abs/1701.06643v1
PDF http://arxiv.org/pdf/1701.06643v1.pdf
PWC https://paperswithcode.com/paper/residual-and-plain-convolutional-neural
Repo https://github.com/west-gates/3DCNN-Vis
Framework none

Causal Patterns: Extraction of multiple causal relationships by Mixture of Probabilistic Partial Canonical Correlation Analysis

Title Causal Patterns: Extraction of multiple causal relationships by Mixture of Probabilistic Partial Canonical Correlation Analysis
Authors Hiroki Mori, Keisuke Kawano, Hiroki Yokoyama
Abstract In this paper, we propose a mixture of probabilistic partial canonical correlation analysis (MPPCCA) that extracts Causal Patterns from two multivariate time series. Causal patterns refer to the signal patterns within interactions of two elements having multiple types of mutually causal relationships, rather than a mixture of simultaneous correlations or the absence or presence of a causal relationship between the elements. In multivariate statistics, partial canonical correlation analysis (PCCA) evaluates the correlation between two multivariates after subtracting the effect of a third multivariate. PCCA can calculate the Granger Causality Index (which tests whether a time series can be predicted from another time series), but is not applicable to data containing multiple partial canonical correlations. After introducing the MPPCCA, we propose an expectation-maximization (EM) algorithm that estimates the parameters and latent variables of the MPPCCA. The MPPCCA is expected to extract multiple partial canonical correlations from data series without any supervised signals to split the data into clusters. The method was then evaluated in synthetic data experiments. In the synthetic dataset, our method estimated the multiple partial canonical correlations more accurately than the existing method. To determine the types of patterns detectable by the method, experiments were also conducted on real datasets. The method estimated the communication patterns in motion-capture data. The MPPCCA is applicable to various types of signals such as brain signals, human communication, and nonlinear complex multibody systems.
Tasks Motion Capture, Time Series
Published 2017-12-12
URL http://arxiv.org/abs/1712.04221v1
PDF http://arxiv.org/pdf/1712.04221v1.pdf
PWC https://paperswithcode.com/paper/causal-patterns-extraction-of-multiple-causal
Repo https://github.com/kskkwn/mppcca
Framework none
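
The PCCA building block the mixture is built on can be sketched directly: regress the third multivariate out of both series, then read the canonical correlations off an SVD of the whitened cross-covariance. This is only the classical step, not the probabilistic mixture or its EM algorithm.

```python
import numpy as np

def residualize(A, Z):
    # remove the linear effect of Z from A via least squares
    beta, *_ = np.linalg.lstsq(Z, A, rcond=None)
    return A - Z @ beta

def pcca_correlations(X, Y, Z, eps=1e-8):
    """Partial canonical correlations of X and Y given Z."""
    Xr, Yr = residualize(X, Z), residualize(Y, Z)
    Xr = Xr - Xr.mean(axis=0)
    Yr = Yr - Yr.mean(axis=0)
    n = X.shape[0]
    Cxx = Xr.T @ Xr / n + eps * np.eye(Xr.shape[1])
    Cyy = Yr.T @ Yr / n + eps * np.eye(Yr.shape[1])
    Cxy = Xr.T @ Yr / n
    # whiten both sides; the singular values are the correlations
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    return np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)

# toy usage: X and Y are related only through Z, so the partial
# canonical correlations should be near zero
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))
X = Z @ rng.normal(size=(2, 3)) + rng.normal(size=(500, 3))
Y = Z @ rng.normal(size=(2, 3)) + rng.normal(size=(500, 3))
print(pcca_correlations(X, Y, Z))
```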

A Workflow for Visual Diagnostics of Binary Classifiers using Instance-Level Explanations

Title A Workflow for Visual Diagnostics of Binary Classifiers using Instance-Level Explanations
Authors Josua Krause, Aritra Dasgupta, Jordan Swartz, Yindalon Aphinyanaphongs, Enrico Bertini
Abstract Human-in-the-loop data analysis applications necessitate greater transparency in machine learning models for experts to understand and trust their decisions. To this end, we propose a visual analytics workflow to help data scientists and domain experts explore, diagnose, and understand the decisions made by a binary classifier. The approach leverages “instance-level explanations”, measures of local feature relevance that explain single instances, and uses them to build a set of visual representations that guide the users in their investigation. The workflow is based on three main visual representations and steps: one based on aggregate statistics to see how data distributes across correct / incorrect decisions; one based on explanations to understand which features are used to make these decisions; and one based on raw data, to derive insights on potential root causes for the observed patterns. The workflow is derived from a long-term collaboration with a group of machine learning and healthcare professionals who used our method to make sense of machine learning models they developed. The case study from this collaboration demonstrates that the proposed workflow helps experts derive useful knowledge about the model and the phenomena it describes, enabling them to generate useful hypotheses on how a model can be improved.
Tasks
Published 2017-05-04
URL http://arxiv.org/abs/1705.01968v3
PDF http://arxiv.org/pdf/1705.01968v3.pdf
PWC https://paperswithcode.com/paper/a-workflow-for-visual-diagnostics-of-binary
Repo https://github.com/nyuvis/explanation_explorer
Framework none

Regularization and Optimization strategies in Deep Convolutional Neural Network

Title Regularization and Optimization strategies in Deep Convolutional Neural Network
Authors Pushparaja Murugan, Shanmugasundaram Durairaj
Abstract Convolutional Neural Networks, known as ConvNets, perform exceptionally well on many complex machine learning tasks. ConvNet architectures demand large and rich amounts of data and involve a vast number of parameters, which makes learning computationally expensive, slows convergence towards the global minima, and risks trapping the model in local minima with poor predictions. In some cases, the architecture overfits the data, making it difficult to generalise to new samples that were not in the training set. To address these limitations, many regularization and optimization strategies have been developed over the past few years. Studies also suggest that these techniques significantly increase the performance of networks while reducing computational cost. To apply these techniques, one must thoroughly understand how each of them works to increase the expressive power of the networks. This article provides the theoretical concepts and mathematical formulations of the most commonly used strategies in developing a ConvNet architecture.
Tasks
Published 2017-12-13
URL http://arxiv.org/abs/1712.04711v1
PDF http://arxiv.org/pdf/1712.04711v1.pdf
PWC https://paperswithcode.com/paper/regularization-and-optimization-strategies-in
Repo https://github.com/xiaorenwu1111/Machine-Learning
Framework tf

The Effects of Memory Replay in Reinforcement Learning

Title The Effects of Memory Replay in Reinforcement Learning
Authors Ruishan Liu, James Zou
Abstract Experience replay is a key technique behind many recent advances in deep reinforcement learning. Allowing the agent to learn from earlier memories can speed up learning and break undesirable temporal correlations. Despite its widespread application, very little is understood about the properties of experience replay. How does the amount of memory kept affect learning dynamics? Does it help to prioritize certain experiences? In this paper, we address these questions by formulating a dynamical systems ODE model of Q-learning with experience replay. We derive analytic solutions of the ODE for a simple setting. We show that even in this very simple setting, the amount of memory kept can substantially affect the agent’s performance. Too much or too little memory both slow down learning. Moreover, we characterize regimes where prioritized replay harms the agent’s learning. We show that our analytic solutions have excellent agreement with experiments. Finally, we propose a simple algorithm for adaptively changing the memory buffer size which achieves consistently good empirical performance.
Tasks Q-Learning
Published 2017-10-18
URL http://arxiv.org/abs/1710.06574v1
PDF http://arxiv.org/pdf/1710.06574v1.pdf
PWC https://paperswithcode.com/paper/the-effects-of-memory-replay-in-reinforcement
Repo https://github.com/VictorZuanazzi/Project_RL.git
Framework none
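
A toy tabular version makes the buffer-size question easy to experiment with. This is not the paper’s ODE model, just a standard Q-learning loop with uniform replay; `env_step(s, a) -> (s2, r, done)` is an assumed environment interface.

```python
import numpy as np

def q_learning_with_replay(env_step, n_states, n_actions, buffer_size,
                           episodes=200, batch=16, gamma=0.9, lr=0.1, eps=0.1):
    """Tabular Q-learning with a fixed-size uniform replay buffer."""
    Q = np.zeros((n_states, n_actions))
    buf = []
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s2, r, done = env_step(s, a)
            buf.append((s, a, r, s2, done))
            if len(buf) > buffer_size:   # the knob the paper analyzes:
                buf.pop(0)               # too much or too little memory hurts
            for i in rng.integers(len(buf), size=min(batch, len(buf))):
                bs, ba, br, bs2, bd = buf[i]
                target = br + (0.0 if bd else gamma * Q[bs2].max())
                Q[bs, ba] += lr * (target - Q[bs, ba])
            s = s2
    return Q
```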

Food Ingredients Recognition through Multi-label Learning

Title Food Ingredients Recognition through Multi-label Learning
Authors Marc Bolaños, Aina Ferrà, Petia Radeva
Abstract Automatically constructing a food diary that tracks the ingredients consumed can help people follow a healthy diet. We tackle the problem of food ingredients recognition as a multi-label learning problem. We propose a method for adapting a high-performing, state-of-the-art CNN to act as a multi-label predictor for learning recipes in terms of their lists of ingredients. We prove that our model is able to, given a picture, predict its list of ingredients, even if the recipe corresponding to the picture has never been seen by the model. We make public two new datasets suitable for this purpose. Furthermore, we prove that a model trained with a high variability of recipes and ingredients is able to generalize better on new data, and visualize how it specializes each of its neurons to different ingredients.
Tasks Multi-Label Classification, Multi-Label Learning
Published 2017-07-27
URL http://arxiv.org/abs/1707.08816v1
PDF http://arxiv.org/pdf/1707.08816v1.pdf
PWC https://paperswithcode.com/paper/food-ingredients-recognition-through-multi
Repo https://github.com/kshen3778/Ingredient-Detection
Framework pytorch
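
The adaptation the abstract describes is conventionally done by replacing the softmax head with independent per-ingredient sigmoids trained with binary cross-entropy. A sketch under that assumption (the paper’s head may differ):

```python
import numpy as np

def multilabel_loss(logits, targets, eps=1e-12):
    """Binary cross-entropy over independent per-label sigmoids.

    logits:  (N, K) scores from the CNN, one column per ingredient.
    targets: (N, K) binary matrix, 1 where the ingredient is present.
    """
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.mean(targets * np.log(p + eps)
                    + (1 - targets) * np.log(1 - p + eps))

def predict_ingredients(logits, threshold=0.5):
    # an ingredient is predicted present if its sigmoid clears the threshold
    return (1.0 / (1.0 + np.exp(-logits))) >= threshold
```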

Target Curricula via Selection of Minimum Feature Sets: a Case Study in Boolean Networks

Title Target Curricula via Selection of Minimum Feature Sets: a Case Study in Boolean Networks
Authors Shannon Fenn, Pablo Moscato
Abstract We consider the effect of introducing a curriculum of targets when training Boolean models on supervised Multi-Label Classification (MLC) problems. In particular, we consider how to order targets in the absence of prior knowledge, and how such a curriculum may be enforced when using meta-heuristics to train discrete non-linear models. We show that hierarchical dependencies between targets can be exploited by enforcing an appropriate curriculum using hierarchical loss functions. On several multi-output circuit-inference problems with known target difficulties, Feedforward Boolean Networks (FBNs) trained with such a loss function achieve significantly lower out-of-sample error, up to 10% in some cases. This improvement increases as the loss places more emphasis on target order and is strongly correlated with an easy-to-hard curriculum. We also demonstrate the same improvements on three real-world models and two Gene Regulatory Network (GRN) inference problems. We posit a simple a priori method for identifying an appropriate target order and estimating the strength of target relationships in Boolean MLCs. These methods use intrinsic dimension as a proxy for target difficulty, which is estimated using optimal solutions to a combinatorial optimisation problem known as the Minimum-Feature-Set (minFS) problem. We also demonstrate that the same generalisation gains can be achieved without providing any knowledge of target difficulty.
Tasks Multi-Label Classification
Published 2017-06-15
URL http://arxiv.org/abs/1706.04721v2
PDF http://arxiv.org/pdf/1706.04721v2.pdf
PWC https://paperswithcode.com/paper/target-curricula-via-selection-of-minimum
Repo https://github.com/shannonfenn/Multi-Label-Curricula-via-Minimum-Feature-Selection
Framework none
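
The minFS proxy can be illustrated with a greedy set-cover approximation: a feature “covers” an example pair with different target values if it separates them, and we repeatedly take the feature covering the most uncovered pairs. The paper uses optimal minFS solutions; the greedy version below is only a cheap stand-in.

```python
import numpy as np
from itertools import combinations

def greedy_min_feature_set(X, y):
    """Greedy approximation: small feature set separating all pairs of
    examples with different target values. X: (N, F) binary features."""
    pairs = [(i, j) for i, j in combinations(range(len(y)), 2) if y[i] != y[j]]
    uncovered = set(range(len(pairs)))
    chosen = []
    while uncovered:
        best, best_cov = None, set()
        for f in range(X.shape[1]):
            cov = {k for k in uncovered if X[pairs[k][0], f] != X[pairs[k][1], f]}
            if len(cov) > len(best_cov):
                best, best_cov = f, cov
        if best is None:        # some pair is inseparable by any feature
            break
        chosen.append(best)
        uncovered -= best_cov
    return chosen               # |chosen| proxies the target's difficulty

# toy usage: feature 0 alone separates the two classes
X = np.array([[0, 1, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0]])
y = np.array([0, 1, 0, 1])
print(greedy_min_feature_set(X, y))   # [0]
```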

Generalized Value Iteration Networks: Life Beyond Lattices

Title Generalized Value Iteration Networks: Life Beyond Lattices
Authors Sufeng Niu, Siheng Chen, Hanyu Guo, Colin Targonski, Melissa C. Smith, Jelena Kovačević
Abstract In this paper, we introduce a generalized value iteration network (GVIN), which is an end-to-end neural network planning module. GVIN emulates the value iteration algorithm by using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. We propose three novel differentiable kernels as graph convolution operators and show that the embedding based kernel achieves the best performance. We further propose episodic Q-learning, an improvement upon traditional n-step Q-learning that stabilizes training for networks that contain a planning module. Lastly, we evaluate GVIN on planning problems in 2D mazes, irregular graphs, and real-world street networks, showing that GVIN generalizes well for both arbitrary graphs and unseen graphs of larger scale and outperforms a naive generalization of VIN (discretizing a spatial graph into a 2D image).
Tasks Q-Learning
Published 2017-06-08
URL http://arxiv.org/abs/1706.02416v2
PDF http://arxiv.org/pdf/1706.02416v2.pdf
PWC https://paperswithcode.com/paper/generalized-value-iteration-networks-life
Repo https://github.com/sufengniu/GVIN
Framework tf
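
Stripped of the learned kernels, the computation GVIN emulates is ordinary value iteration restricted to a graph’s edges. A sketch with a hand-coded adjacency standing in for the learned graph-convolution operator (assumes every node has at least one neighbor):

```python
import numpy as np

def graph_value_iteration(A, reward, gamma=0.95, iters=100):
    """Value iteration on an irregular graph.

    A:      (N, N) adjacency matrix; A[i, j] > 0 means i -> j is allowed.
    reward: (N,) reward for arriving at each node.
    """
    V = reward.astype(float).copy()
    for _ in range(iters):
        # Q(i, j) = r(j) + gamma * V(j) for each neighbor j; act greedily
        Q = np.where(A > 0, reward[None, :] + gamma * V[None, :], -np.inf)
        V = Q.max(axis=1)
    return V

# toy usage: a 3-node path graph with reward at the far end
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
reward = np.array([0.0, 0.0, 1.0])
print(graph_value_iteration(A, reward))
```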

Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning

Title Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning
Authors Jason D. Williams, Kavosh Asadi, Geoffrey Zweig
Abstract End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates. Compared to existing end-to-end approaches, HCNs considerably reduce the amount of training data required, while retaining the key benefit of inferring a latent representation of dialog state. In addition, HCNs can be optimized with supervised learning, reinforcement learning, or a mixture of both. HCNs attain state-of-the-art performance on the bAbI dialog dataset, and outperform two commercially deployed customer-facing dialog systems.
Tasks
Published 2017-02-10
URL http://arxiv.org/abs/1702.03274v2
PDF http://arxiv.org/pdf/1702.03274v2.pdf
PWC https://paperswithcode.com/paper/hybrid-code-networks-practical-and-efficient
Repo https://github.com/deepmipt/DeepPavlov
Framework tf
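
The central trick, masking the RNN’s action distribution with hand-coded domain logic, fits in a few lines. The mask semantics here are an illustrative assumption; real HCNs also inject entity tracking and API calls that this sketch omits.

```python
import numpy as np

def masked_action_distribution(rnn_logits, action_mask):
    """Renormalize the RNN's scores over the actions the hand-written
    business logic allows in the current dialog state.

    rnn_logits:  (K,) scores over system action templates.
    action_mask: (K,) 1 if the action is currently allowed, else 0.
                 Assumes at least one action is allowed.
    """
    z = np.where(action_mask.astype(bool), rnn_logits, -np.inf)
    z = z - z.max()          # stable softmax; masked entries stay -inf
    e = np.exp(z)
    return e / e.sum()

# toy usage: forbid action 2 (e.g. "place order" before the user logs in)
probs = masked_action_distribution(np.array([1.0, 0.3, 2.5]),
                                   np.array([1, 1, 0]))
```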