Paper Group AWR 99
- Improving Lexical Choice in Neural Machine Translation
- Joint Matrix-Tensor Factorization for Knowledge Base Inference
- DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding
- Hierarchical Attentive Recurrent Tracking
- Training Ensembles to Detect Adversarial Examples
- QCD-Aware Recursive Neural Networks for Jet Physics
- Residual and Plain Convolutional Neural Networks for 3D Brain MRI Classification
- Causal Patterns: Extraction of multiple causal relationships by Mixture of Probabilistic Partial Canonical Correlation Analysis
- A Workflow for Visual Diagnostics of Binary Classifiers using Instance-Level Explanations
- Regularization and Optimization strategies in Deep Convolutional Neural Network
- The Effects of Memory Replay in Reinforcement Learning
- Food Ingredients Recognition through Multi-label Learning
- Target Curricula via Selection of Minimum Feature Sets: a Case Study in Boolean Networks
- Generalized Value Iteration Networks: Life Beyond Lattices
- Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning
Improving Lexical Choice in Neural Machine Translation
Title | Improving Lexical Choice in Neural Machine Translation |
Authors | Toan Q. Nguyen, David Chiang |
Abstract | We explore two solutions to the problem of mistranslating rare words in neural machine translation. First, we argue that the standard output layer, which computes the inner product of a vector representing the context with all possible output word embeddings, rewards frequent words disproportionately, and we propose to fix the norms of both vectors to a constant value. Second, we integrate a simple lexical module which is jointly trained with the rest of the model. We evaluate our approaches on eight language pairs with data sizes ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU, surpassing phrase-based translation in nearly all settings. |
Tasks | Machine Translation, Word Embeddings |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.01329v3 |
PDF | http://arxiv.org/pdf/1710.01329v3.pdf |
PWC | https://paperswithcode.com/paper/improving-lexical-choice-in-neural-machine |
Repo | https://github.com/arturo-garza/NMTLexiconModel |
Framework | none |
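
The paper's first fix, constraining vector norms in the output layer, is easy to sketch. Below is a minimal numpy illustration, assuming the score is an inner product between a norm-constrained context vector and norm-constrained output embeddings; the function name and the radius value are illustrative, not taken from the paper.

```python
import numpy as np

def fixed_norm_logits(context, embeddings, radius=5.0):
    """Score output words with norm-constrained vectors: both sides are
    rescaled to L2 norm `radius` (a hypothetical constant), so the inner
    product becomes radius**2 * cosine and frequent words can no longer
    win on embedding magnitude alone."""
    c = radius * context / np.linalg.norm(context)
    E = radius * embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return E @ c  # one logit per vocabulary word

rng = np.random.default_rng(0)
logits = fixed_norm_logits(rng.normal(size=512), rng.normal(size=(10000, 512)))
probs = np.exp(logits - logits.max())
probs /= probs.sum()  # softmax over the vocabulary
```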
Joint Matrix-Tensor Factorization for Knowledge Base Inference
Title | Joint Matrix-Tensor Factorization for Knowledge Base Inference |
Authors | Prachi Jain, Shikhar Murty, Mausam, Soumen Chakrabarti |
Abstract | While several matrix factorization (MF) and tensor factorization (TF) models have been proposed for knowledge base (KB) inference, they have rarely been compared across various datasets. Is there a single model that performs well across datasets? If not, what characteristics of a dataset determine the performance of MF and TF models? Is there a joint TF+MF model that performs robustly on all datasets? We perform an extensive evaluation to compare popular KB inference models across popular datasets in the literature. In addition to answering the questions above, we remove a limitation in the standard evaluation protocol for MF models, propose an extension to MF models so that they can better handle out-of-vocabulary (OOV) entity pairs, and develop a novel combination of TF and MF models. We also analyze and explain the results based on models and dataset characteristics. Our best model is robust, and obtains strong results across all datasets. |
Tasks | |
Published | 2017-06-02 |
URL | http://arxiv.org/abs/1706.00637v1 |
PDF | http://arxiv.org/pdf/1706.00637v1.pdf |
PWC | https://paperswithcode.com/paper/joint-matrix-tensor-factorization-for |
Repo | https://github.com/MurtyShikhar/KBI |
Framework | none |
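
Since the abstract combines tensor factorization (TF) and matrix factorization (MF) scorers, a hedged sketch can make the combination concrete. Everything below (DistMult as the TF component, additive combination, all names and sizes) is an illustrative assumption, not the paper's exact model.

```python
import numpy as np

def joint_tf_mf_score(head, tail, rel_tf, rel_mf, pair_emb):
    """Hypothetical joint scorer: a DistMult-style TF term over
    (head, relation, tail) embeddings, plus an MF term that scores the
    relation against an embedding of the entity *pair*, which is what
    lets MF-style models capture pair-specific regularities."""
    tf_term = float(np.sum(head * rel_tf * tail))
    mf_term = float(np.dot(rel_mf, pair_emb))
    return tf_term + mf_term

rng = np.random.default_rng(0)
score = joint_tf_mf_score(*(rng.normal(size=8) for _ in range(5)))
```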
DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding
Title | DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding |
Authors | Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, Chengqi Zhang |
Abstract | Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, “Directional Self-Attention Network (DiSAN)”, is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets. |
Tasks | Natural Language Inference, Sentence Embedding |
Published | 2017-09-14 |
URL | http://arxiv.org/abs/1709.04696v3 |
PDF | http://arxiv.org/pdf/1709.04696v3.pdf |
PWC | https://paperswithcode.com/paper/disan-directional-self-attention-network-for |
Repo | https://github.com/taoshen58/DiSAN |
Framework | tf |
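
The directional part of DiSAN can be conveyed with a toy mask. The sketch below uses plain dot-product attention in numpy; DiSAN's learned, multi-dimensional (feature-wise) scores are replaced by a single scalar score per pair, and the self-position is kept inside the mask so every row stays well-defined. Treat it as an assumption-laden illustration of the masking idea only.

```python
import numpy as np

def directional_self_attention(x, forward=True):
    """Token i may attend only to itself and tokens before it (forward)
    or after it (backward), which injects temporal order without any
    RNN/CNN structure."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    mask = np.tril(np.ones((n, n))) if forward else np.triu(np.ones((n, n)))
    scores = np.where(mask > 0, scores, -1e9)  # block the other direction
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ x  # order-aware token mixtures

out = directional_self_attention(np.random.default_rng(1).normal(size=(6, 8)))
```

Calling the function with `forward=False` gives the backward view; DiSAN combines both directions before the final multi-dimensional compression into a sentence vector.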
Hierarchical Attentive Recurrent Tracking
Title | Hierarchical Attentive Recurrent Tracking |
Authors | Adam R. Kosiorek, Alex Bewley, Ingmar Posner |
Abstract | Class-agnostic object tracking is particularly difficult in cluttered environments as target-specific discriminative models cannot be learned a priori. Inspired by how the human visual cortex employs spatial attention and separate “where” and “what” processing pathways to actively suppress irrelevant visual features, this work develops a hierarchical attentive recurrent model for single object tracking in videos. The first layer of attention discards the majority of background by selecting a region containing the object of interest, while the subsequent layers tune in on visual features particular to the tracked object. This framework is fully differentiable and can be trained in a purely data-driven fashion by gradient methods. To improve training convergence, we augment the loss function with terms for a number of auxiliary tasks relevant for tracking. Evaluation of the proposed model is performed on two datasets: pedestrian tracking on the KTH activity recognition dataset and the more difficult KITTI object tracking dataset. |
Tasks | Activity Recognition, Object Tracking |
Published | 2017-06-28 |
URL | http://arxiv.org/abs/1706.09262v2 |
PDF | http://arxiv.org/pdf/1706.09262v2.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-attentive-recurrent-tracking |
Repo | https://github.com/akosiorek/hart |
Framework | tf |
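
The first attention layer described in the abstract, selecting a region and discarding background, can be caricatured with a hard crop. HART's attention is differentiable (so gradients can move the window), which a hard crop cannot replicate; the sketch below only makes the data flow concrete, and all names and sizes are assumptions.

```python
import numpy as np

def spatial_glimpse(frame, center, size):
    """Keep a window around the predicted object location and discard
    the rest of the frame; later layers would then attend to features
    specific to the tracked object inside this glimpse."""
    (cy, cx), (h, w) = center, size
    y0 = int(np.clip(cy - h // 2, 0, frame.shape[0] - h))
    x0 = int(np.clip(cx - w // 2, 0, frame.shape[1] - w))
    return frame[y0:y0 + h, x0:x0 + w]

frame = np.zeros((480, 640))
glimpse = spatial_glimpse(frame, center=(200, 320), size=(64, 64))
```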
Training Ensembles to Detect Adversarial Examples
Title | Training Ensembles to Detect Adversarial Examples |
Authors | Alexander Bagnall, Razvan Bunescu, Gordon Stewart |
Abstract | We propose a new ensemble method for detecting and classifying adversarial examples generated by state-of-the-art attacks, including DeepFool and C&W. Our method works by training the members of an ensemble to have low classification error on random benign examples while simultaneously minimizing agreement on examples outside the training distribution. We evaluate on both MNIST and CIFAR-10, against oblivious, white-box, and black-box adversaries. |
Tasks | |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.04006v1 |
PDF | http://arxiv.org/pdf/1712.04006v1.pdf |
PWC | https://paperswithcode.com/paper/training-ensembles-to-detect-adversarial |
Repo | https://github.com/bagnalla/ensemble_detect_adv |
Framework | tf |
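
A rough numpy sketch of the training signal may help: low cross-entropy on benign examples, plus a reward for spreading the ensemble's votes on inputs outside the training distribution, so adversarial examples surface as high-disagreement points. The entropy-based disagreement measure, the weighting, and the shapes are assumptions, not the paper's exact objective.

```python
import numpy as np

def detection_loss(benign_probs, labels, ood_probs, lam=1.0):
    """benign_probs / ood_probs: (members, batch, classes) predicted
    probabilities; labels: (batch,) integer classes. Minimizing this
    keeps members accurate on benign data while maximizing the entropy
    of their averaged vote on out-of-distribution data."""
    eps = 1e-12
    picked = np.take_along_axis(benign_probs, labels[None, :, None], axis=2)
    ce = -np.log(picked + eps).mean()        # accuracy on benign examples
    mean_pred = ood_probs.mean(axis=0)       # ensemble's averaged vote
    entropy = -(mean_pred * np.log(mean_pred + eps)).sum(axis=-1).mean()
    return ce - lam * entropy                # low error, low agreement

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(10), size=(3, 4))  # 3 members, 4 examples
loss = detection_loss(p, rng.integers(0, 10, size=4), p)
```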
QCD-Aware Recursive Neural Networks for Jet Physics
Title | QCD-Aware Recursive Neural Networks for Jet Physics |
Authors | Gilles Louppe, Kyunghyun Cho, Cyril Becot, Kyle Cranmer |
Abstract | Recent progress in applying machine learning for jet physics has been built upon an analogy between calorimeters and images. In this work, we present a novel class of recursive neural networks built instead upon an analogy between QCD and natural languages. In the analogy, four-momenta are like words and the clustering history of sequential recombination jet algorithms is like the parsing of a sentence. Our approach works directly with the four-momenta of a variable-length set of particles, and the jet-based tree structure varies on an event-by-event basis. Our experiments highlight the flexibility of our method for building task-specific jet embeddings and show that recursive architectures are significantly more accurate and data efficient than previous image-based networks. We extend the analogy from individual jets (sentences) to full events (paragraphs), and show for the first time an event-level classifier operating on all the stable particles produced in an LHC event. |
Tasks | |
Published | 2017-02-02 |
URL | http://arxiv.org/abs/1702.00748v2 |
PDF | http://arxiv.org/pdf/1702.00748v2.pdf |
PWC | https://paperswithcode.com/paper/qcd-aware-recursive-neural-networks-for-jet |
Repo | https://github.com/SebastianMacaluso/RecNN_PyTorch_batch |
Framework | pytorch |
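
The sentence-parsing analogy maps naturally onto a recursive embedding over the jet's clustering tree. The sketch below assumes a single tanh combiner, tuples for internal nodes, and lists of four-momenta for leaves; the paper's actual gated architectures and kernels differ.

```python
import numpy as np

def embed_jet(node, U, W, b):
    """Leaves are particle four-momenta ("words"); internal nodes merge
    their children's embeddings the way a parser merges phrases, so a
    variable-size jet collapses into one fixed-size vector."""
    if isinstance(node, tuple):                       # internal node
        left = embed_jet(node[0], U, W, b)
        right = embed_jet(node[1], U, W, b)
        return np.tanh(W @ np.concatenate([left, right]) + b)
    return np.tanh(U @ np.asarray(node, dtype=float) + b)  # leaf

d = 16
rng = np.random.default_rng(2)
U, W, b = rng.normal(size=(d, 4)), rng.normal(size=(d, 2 * d)), np.zeros(d)
jet = (([100.0, 10.0, 5.0, 99.0], [50.0, -3.0, 2.0, 49.0]),
       [20.0, 1.0, 0.0, 19.9])                        # toy clustering history
vec = embed_jet(jet, U, W, b)                         # jet embedding
```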
Residual and Plain Convolutional Neural Networks for 3D Brain MRI Classification
Title | Residual and Plain Convolutional Neural Networks for 3D Brain MRI Classification |
Authors | Sergey Korolev, Amir Safiullin, Mikhail Belyaev, Yulia Dodonova |
Abstract | In recent years there have been a number of studies applying deep learning algorithms to neuroimaging data. The pipelines used in those studies mostly require multiple processing steps for feature extraction, although modern advancements in deep learning for image classification can provide a powerful framework for automatic feature generation and more straightforward analysis. In this paper, we show how similar performance can be achieved by skipping these feature extraction steps with residual and plain 3D convolutional neural network architectures. We demonstrate the performance of the proposed approach for classification of Alzheimer’s disease versus mild cognitive impairment and normal controls on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset of 3D structural MRI brain scans. |
Tasks | Image Classification |
Published | 2017-01-23 |
URL | http://arxiv.org/abs/1701.06643v1 |
PDF | http://arxiv.org/pdf/1701.06643v1.pdf |
PWC | https://paperswithcode.com/paper/residual-and-plain-convolutional-neural |
Repo | https://github.com/west-gates/3DCNN-Vis |
Framework | none |
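
For concreteness, here is what a plain 3D convolutional classifier for volumetric MRI might look like. PyTorch is used purely for brevity (no framework is listed for the repo above), and the channel counts, depth, and three-way head are illustrative rather than the paper's exact configuration.

```python
import torch
from torch import nn

# Plain (non-residual) 3D ConvNet sketch: volumes go straight in,
# with no hand-crafted feature-extraction pipeline beforehand.
plain_3d_cnn = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
    nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
    nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(32, 3),        # e.g. AD vs MCI vs normal controls
)

scan = torch.randn(1, 1, 64, 64, 64)  # (batch, channel, depth, height, width)
logits = plain_3d_cnn(scan)
```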
Causal Patterns: Extraction of multiple causal relationships by Mixture of Probabilistic Partial Canonical Correlation Analysis
Title | Causal Patterns: Extraction of multiple causal relationships by Mixture of Probabilistic Partial Canonical Correlation Analysis |
Authors | Hiroki Mori, Keisuke Kawano, Hiroki Yokoyama |
Abstract | In this paper, we propose a mixture of probabilistic partial canonical correlation analysis (MPPCCA) that extracts the Causal Patterns from two multivariate time series. Causal patterns refer to the signal patterns within interactions of two elements having multiple types of mutually causal relationships, rather than a mixture of simultaneous correlations or the absence or presence of a causal relationship between the elements. In multivariate statistics, partial canonical correlation analysis (PCCA) evaluates the correlation between two multivariates after subtracting the effect of the third multivariate. PCCA can calculate the Granger Causality Index (which tests whether a time series can be predicted from another time series), but is not applicable to data containing multiple partial canonical correlations. After introducing the MPPCCA, we propose an expectation-maximization (EM) algorithm that estimates the parameters and latent variables of the MPPCCA. The MPPCCA is expected to extract multiple partial canonical correlations from data series without any supervised signals to split the data into clusters. The method was then evaluated in synthetic data experiments. On the synthetic dataset, our method estimated the multiple partial canonical correlations more accurately than the existing method. To determine the types of patterns detectable by the method, experiments were also conducted on real datasets. The method estimated the communication patterns in motion-capture data. The MPPCCA is applicable to various types of signals such as brain signals, human communication, and nonlinear complex multibody systems. |
Tasks | Motion Capture, Time Series |
Published | 2017-12-12 |
URL | http://arxiv.org/abs/1712.04221v1 |
PDF | http://arxiv.org/pdf/1712.04221v1.pdf |
PWC | https://paperswithcode.com/paper/causal-patterns-extraction-of-multiple-causal |
Repo | https://github.com/kskkwn/mppcca |
Framework | none |
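
The PCCA building block, correlation between two multivariates after removing a third's effect, can be sketched directly. The residual-then-CCA construction below is a standard formulation written for illustration; the mixture and EM machinery of MPPCCA is omitted.

```python
import numpy as np

def partial_canonical_correlations(X, Y, Z):
    """Regress the third multivariate Z out of both X and Y, then
    compute ordinary canonical correlations on the residuals.
    Rows are samples, columns are variables."""
    def residual(A):
        beta, *_ = np.linalg.lstsq(Z, A, rcond=None)
        return A - Z @ beta                   # remove Z's linear effect
    Xr, Yr = residual(X), residual(Y)
    Xr = Xr - Xr.mean(0)
    Yr = Yr - Yr.mean(0)
    Lx = np.linalg.cholesky(Xr.T @ Xr)        # whitening factors
    Ly = np.linalg.cholesky(Yr.T @ Yr)
    M = np.linalg.solve(Lx, Xr.T @ Yr)
    M = np.linalg.solve(Ly, M.T).T
    return np.linalg.svd(M, compute_uv=False) # canonical correlations

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))
X = Z @ rng.normal(size=(3, 4)) + rng.normal(size=(500, 4))
Y = Z @ rng.normal(size=(3, 5)) + rng.normal(size=(500, 5))
print(partial_canonical_correlations(X, Y, Z))  # near zero: X-Y link is all via Z
```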
A Workflow for Visual Diagnostics of Binary Classifiers using Instance-Level Explanations
Title | A Workflow for Visual Diagnostics of Binary Classifiers using Instance-Level Explanations |
Authors | Josua Krause, Aritra Dasgupta, Jordan Swartz, Yindalon Aphinyanaphongs, Enrico Bertini |
Abstract | Human-in-the-loop data analysis applications necessitate greater transparency in machine learning models for experts to understand and trust their decisions. To this end, we propose a visual analytics workflow to help data scientists and domain experts explore, diagnose, and understand the decisions made by a binary classifier. The approach leverages “instance-level explanations”, measures of local feature relevance that explain single instances, and uses them to build a set of visual representations that guide the users in their investigation. The workflow is based on three main visual representations and steps: one based on aggregate statistics to see how data distributes across correct / incorrect decisions; one based on explanations to understand which features are used to make these decisions; and one based on raw data, to derive insights on potential root causes for the observed patterns. The workflow is derived from a long-term collaboration with a group of machine learning and healthcare professionals who used our method to make sense of machine learning models they developed. The case study from this collaboration demonstrates that the proposed workflow helps experts derive useful knowledge about the model and the phenomena it describes, and thus generate useful hypotheses on how the model can be improved. |
Tasks | |
Published | 2017-05-04 |
URL | http://arxiv.org/abs/1705.01968v3 |
PDF | http://arxiv.org/pdf/1705.01968v3.pdf |
PWC | https://paperswithcode.com/paper/a-workflow-for-visual-diagnostics-of-binary |
Repo | https://github.com/nyuvis/explanation_explorer |
Framework | none |
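
The “instance-level explanation” ingredient can be made concrete with a toy occlusion-style relevance measure; this generic stand-in is an assumption, not the specific explanation technique the authors build their workflow on.

```python
import numpy as np

def instance_explanation(predict_proba, x, baseline):
    """Local feature relevance for one instance: the drop in the
    model's predicted score when each feature is replaced by a
    baseline value. Any local relevance method could slot in here."""
    p = predict_proba(x)
    relevance = np.empty_like(x, dtype=float)
    for i in range(x.size):
        x2 = x.copy()
        x2[i] = baseline[i]
        relevance[i] = p - predict_proba(x2)
    return relevance

model = lambda v: 1.0 / (1.0 + np.exp(-(2.0 * v[0] - v[1])))  # stand-in classifier
print(instance_explanation(model, np.array([1.0, 0.5]), np.zeros(2)))
```

Aggregating such per-instance vectors across correct and incorrect predictions is the kind of summary the workflow's first two views could be built from.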
Regularization and Optimization strategies in Deep Convolutional Neural Network
Title | Regularization and Optimization strategies in Deep Convolutional Neural Network |
Authors | Pushparaja Murugan, Shanmugasundaram Durairaj |
Abstract | Convolutional Neural Networks, known as ConvNets, perform exceptionally well in many complex machine learning tasks. The architecture of ConvNets demands a huge and rich amount of data and involves a vast number of parameters, which makes learning computationally expensive, slows convergence towards the global minima, and can trap the network in poor local minima. In some cases, the architecture overfits the data, making it difficult to generalise to new samples that were not in the training set. To address these limitations, many regularization and optimization strategies have been developed in the past few years. Studies also suggest that these techniques significantly increase the performance of networks while reducing computational cost. To implement these techniques, one must thoroughly understand the theoretical concepts behind how each technique increases the expressive power of a network. This article provides the theoretical concepts and mathematical formulation of the most commonly used strategies in developing a ConvNet architecture. |
Tasks | |
Published | 2017-12-13 |
URL | http://arxiv.org/abs/1712.04711v1 |
PDF | http://arxiv.org/pdf/1712.04711v1.pdf |
PWC | https://paperswithcode.com/paper/regularization-and-optimization-strategies-in |
Repo | https://github.com/xiaorenwu1111/Machine-Learning |
Framework | tf |
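
As a concrete anchor for the survey, here is a minimal PyTorch sketch showing two of the most common strategies discussed: dropout as a regularizer inside the network, and an L2 penalty applied through the optimizer's weight decay. All sizes and hyperparameter values are illustrative.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Dropout2d(p=0.25),                      # randomly zero feature maps
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)  # L2 penalty

out = model(torch.randn(2, 3, 32, 32))         # (batch, classes) logits
```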
The Effects of Memory Replay in Reinforcement Learning
Title | The Effects of Memory Replay in Reinforcement Learning |
Authors | Ruishan Liu, James Zou |
Abstract | Experience replay is a key technique behind many recent advances in deep reinforcement learning. Allowing the agent to learn from earlier memories can speed up learning and break undesirable temporal correlations. Despite its widespread application, very little is understood about the properties of experience replay. How does the amount of memory kept affect learning dynamics? Does it help to prioritize certain experiences? In this paper, we address these questions by formulating a dynamical systems ODE model of Q-learning with experience replay. We derive analytic solutions of the ODE for a simple setting. We show that even in this very simple setting, the amount of memory kept can substantially affect the agent’s performance. Too much or too little memory both slow down learning. Moreover, we characterize regimes where prioritized replay harms the agent’s learning. We show that our analytic solutions have excellent agreement with experiments. Finally, we propose a simple algorithm for adaptively changing the memory buffer size which achieves consistently good empirical performance. |
Tasks | Q-Learning |
Published | 2017-10-18 |
URL | http://arxiv.org/abs/1710.06574v1 |
PDF | http://arxiv.org/pdf/1710.06574v1.pdf |
PWC | https://paperswithcode.com/paper/the-effects-of-memory-replay-in-reinforcement |
Repo | https://github.com/VictorZuanazzi/Project_RL.git |
Framework | none |
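
The paper's central knob, the amount of memory kept, corresponds to the capacity of an experience-replay buffer. Here is a minimal sketch; the adaptive buffer-resizing rule proposed in the paper is not reproduced, only the structure it would act on.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity memory of (state, action, reward, next_state)
    transitions; the oldest memories fall off when full, so `capacity`
    directly controls how far back the agent can learn from."""
    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)
    def push(self, transition):
        self.data.append(transition)
    def sample(self, batch_size):
        return random.sample(self.data, min(batch_size, len(self.data)))

buf = ReplayBuffer(capacity=1000)
buf.push((0, 1, 0.5, 2))        # one toy transition
batch = buf.sample(32)
```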
Food Ingredients Recognition through Multi-label Learning
Title | Food Ingredients Recognition through Multi-label Learning |
Authors | Marc Bolaños, Aina Ferrà, Petia Radeva |
Abstract | Automatically constructing a food diary that tracks the ingredients consumed can help people follow a healthy diet. We tackle the problem of food ingredient recognition as a multi-label learning problem. We propose a method for adapting a high-performing, state-of-the-art CNN to act as a multi-label predictor that learns recipes in terms of their lists of ingredients. We show that, given a picture, our model is able to predict its list of ingredients, even if the recipe corresponding to the picture has never been seen by the model. We make public two new datasets suitable for this purpose. Furthermore, we show that a model trained with a high variability of recipes and ingredients generalizes better on new data, and we visualize how it specializes each of its neurons to different ingredients. |
Tasks | Multi-Label Classification, Multi-Label Learning |
Published | 2017-07-27 |
URL | http://arxiv.org/abs/1707.08816v1 |
PDF | http://arxiv.org/pdf/1707.08816v1.pdf |
PWC | https://paperswithcode.com/paper/food-ingredients-recognition-through-multi |
Repo | https://github.com/kshen3778/Ingredient-Detection |
Framework | pytorch |
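
The multi-label adaptation described in the abstract amounts to replacing a softmax over classes with one sigmoid per ingredient. A minimal PyTorch sketch follows; the tiny placeholder backbone, vocabulary size, and label indices are all illustrative, not the paper's architecture.

```python
import torch
from torch import nn

num_ingredients = 1000                      # illustrative vocabulary size
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(8, num_ingredients)        # one logit per ingredient
criterion = nn.BCEWithLogitsLoss()          # independent per-label loss

images = torch.randn(4, 3, 224, 224)
targets = torch.zeros(4, num_ingredients)
targets[:, [3, 42]] = 1.0                   # each dish has several labels
loss = criterion(head(backbone(images)), targets)
```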
Target Curricula via Selection of Minimum Feature Sets: a Case Study in Boolean Networks
Title | Target Curricula via Selection of Minimum Feature Sets: a Case Study in Boolean Networks |
Authors | Shannon Fenn, Pablo Moscato |
Abstract | We consider the effect of introducing a curriculum of targets when training Boolean models on supervised Multi-Label Classification (MLC) problems. In particular, we consider how to order targets in the absence of prior knowledge, and how such a curriculum may be enforced when using meta-heuristics to train discrete non-linear models. We show that hierarchical dependencies between targets can be exploited by enforcing an appropriate curriculum using hierarchical loss functions. On several multi-output circuit-inference problems with known target difficulties, Feedforward Boolean Networks (FBNs) trained with such a loss function achieve significantly lower out-of-sample error, up to 10% in some cases. This improvement increases as the loss places more emphasis on target order and is strongly correlated with an easy-to-hard curriculum. We also demonstrate the same improvements on three real-world models and two Gene Regulatory Network (GRN) inference problems. We posit a simple a priori method for identifying an appropriate target order and estimating the strength of target relationships in Boolean MLCs. These methods use intrinsic dimension as a proxy for target difficulty, which is estimated using optimal solutions to a combinatorial optimisation problem known as the Minimum-Feature-Set (minFS) problem. We also demonstrate that the same generalisation gains can be achieved without providing any knowledge of target difficulty. |
Tasks | Multi-Label Classification |
Published | 2017-06-15 |
URL | http://arxiv.org/abs/1706.04721v2 |
PDF | http://arxiv.org/pdf/1706.04721v2.pdf |
PWC | https://paperswithcode.com/paper/target-curricula-via-selection-of-minimum |
Repo | https://github.com/shannonfenn/Multi-Label-Curricula-via-Minimum-Feature-Selection |
Framework | none |
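
One way to read "hierarchical loss function" is a per-target weighting that makes early (easy) targets dominate. The sketch below is in that spirit only; the geometric weighting and the `alpha` parameter are assumptions, not the paper's exact formulation.

```python
import numpy as np

def hierarchical_loss(errors, alpha=2.0):
    """Per-target errors, ordered easy-to-hard, are weighted with a
    geometric decay by rank, so the optimizer must get early targets
    right before later ones can pay off."""
    errors = np.asarray(errors, dtype=float)
    weights = alpha ** -np.arange(errors.size)
    return float(weights @ errors)

print(hierarchical_loss([0.1, 0.3, 0.5]))   # earlier targets count more
```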
Generalized Value Iteration Networks: Life Beyond Lattices
Title | Generalized Value Iteration Networks: Life Beyond Lattices |
Authors | Sufeng Niu, Siheng Chen, Hanyu Guo, Colin Targonski, Melissa C. Smith, Jelena Kovačević |
Abstract | In this paper, we introduce a generalized value iteration network (GVIN), which is an end-to-end neural network planning module. GVIN emulates the value iteration algorithm by using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. We propose three novel differentiable kernels as graph convolution operators and show that the embedding based kernel achieves the best performance. We further propose episodic Q-learning, an improvement upon traditional n-step Q-learning that stabilizes training for networks that contain a planning module. Lastly, we evaluate GVIN on planning problems in 2D mazes, irregular graphs, and real-world street networks, showing that GVIN generalizes well for both arbitrary graphs and unseen graphs of larger scale and outperforms a naive generalization of VIN (discretizing a spatial graph into a 2D image). |
Tasks | Q-Learning |
Published | 2017-06-08 |
URL | http://arxiv.org/abs/1706.02416v2 |
PDF | http://arxiv.org/pdf/1706.02416v2.pdf |
PWC | https://paperswithcode.com/paper/generalized-value-iteration-networks-life |
Repo | https://github.com/sufengniu/GVIN |
Framework | tf |
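
GVIN emulates value iteration with learned graph-convolution kernels; the classic recurrence it unrolls can be written down directly. In the sketch below the transition tensor P stands in for what GVIN would produce from the graph with its learned kernels; only the planning recurrence itself is shown.

```python
import numpy as np

def graph_value_iteration(P, r, gamma=0.9, iters=50):
    """P: (actions, nodes, nodes) row-stochastic transitions over an
    arbitrary graph; r: reward per node. Repeated max-backup converges
    to the optimal value of each node."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r[None, :] + gamma * np.einsum('anm,m->an', P, V)
        V = Q.max(axis=0)                    # greedy backup over actions
    return V

rng = np.random.default_rng(0)
P = rng.random((2, 5, 5))
P /= P.sum(axis=2, keepdims=True)            # normalize transitions
V = graph_value_iteration(P, rng.random(5))
```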
Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning
Title | Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning |
Authors | Jason D. Williams, Kavosh Asadi, Geoffrey Zweig |
Abstract | End-to-end learning of recurrent neural networks (RNNs) is an attractive solution for dialog systems; however, current techniques are data-intensive and require thousands of dialogs to learn simple behaviors. We introduce Hybrid Code Networks (HCNs), which combine an RNN with domain-specific knowledge encoded as software and system action templates. Compared to existing end-to-end approaches, HCNs considerably reduce the amount of training data required, while retaining the key benefit of inferring a latent representation of dialog state. In addition, HCNs can be optimized with supervised learning, reinforcement learning, or a mixture of both. HCNs attain state-of-the-art performance on the bAbI dialog dataset, and outperform two commercially deployed customer-facing dialog systems. |
Tasks | |
Published | 2017-02-10 |
URL | http://arxiv.org/abs/1702.03274v2 |
PDF | http://arxiv.org/pdf/1702.03274v2.pdf |
PWC | https://paperswithcode.com/paper/hybrid-code-networks-practical-and-efficient |
Repo | https://github.com/deepmipt/DeepPavlov |
Framework | tf |
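
A loose sketch of the HCN recipe: an RNN tracks a latent dialog state, and hand-written action masks (the domain-specific software of the abstract) zero out currently invalid action templates before the softmax. The class name, feature sizes, and the mask example are illustrative assumptions.

```python
import torch
from torch import nn

class HybridCodeNetworkSketch(nn.Module):
    def __init__(self, feat_dim=32, hidden=64, num_actions=10):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_actions)

    def forward(self, features, action_mask):
        h, _ = self.rnn(features)            # (batch, time, hidden)
        logits = self.out(h)
        logits = logits.masked_fill(action_mask == 0, float('-inf'))
        return logits.softmax(dim=-1)        # distribution over templates

net = HybridCodeNetworkSketch()
feats = torch.randn(1, 5, 32)                # one dialog, five turns
mask = torch.ones(1, 5, 10)
mask[..., 7] = 0                             # e.g. rule: action 7 not yet valid
probs = net(feats, mask)
```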