October 20, 2019

Paper Group AWR 351

Learning Time-Sensitive Strategies in Space Fortress. Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives. Boosting Domain Adaptation by Discovering Latent Domains. InfoSSM: Interpretable Unsupervised Learning of Nonparametric State-Space Model for Multi-modal Dynamics. End-to-End Latent Fingerprint Search. …

Learning Time-Sensitive Strategies in Space Fortress

Title Learning Time-Sensitive Strategies in Space Fortress
Authors Akshat Agarwal, Ryan Hope, Katia Sycara
Abstract Although there has been remarkable progress and impressive performance in reinforcement learning (RL) on Atari games, there are many problems with challenging characteristics that have not yet been explored in Deep Learning for RL. These include reward sparsity, abrupt context-dependent reversals of strategy, and time-sensitive game play. In this paper, we present Space Fortress, a game that incorporates all these characteristics, and experimentally show that the presence of any of these renders state-of-the-art Deep RL algorithms incapable of learning. We then present our enhancements to an existing algorithm and use an ablation study to show large performance gains from each enhancement. We discuss how each of these enhancements was able to help and also argue that appropriate transfer learning boosts performance.
Tasks Atari Games, Transfer Learning
Published 2018-05-17
URL http://arxiv.org/abs/1805.06824v4
PDF http://arxiv.org/pdf/1805.06824v4.pdf
PWC https://paperswithcode.com/paper/learning-time-sensitive-strategies-in-space
Repo https://github.com/agakshat/spacefortress
Framework pytorch

Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives

Title Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives
Authors Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam, Payel Das
Abstract In this paper we propose a novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network. Given an input, we find what should be minimally and sufficiently present (viz. important object pixels in an image) to justify its classification and, analogously, what should be minimally and necessarily absent (viz. certain background pixels). We argue that such explanations are natural for humans and are used commonly in domains such as health care and criminology. What is minimally but critically absent is an important part of an explanation, which, to the best of our knowledge, has not been explicitly identified by current methods that explain predictions of neural networks. We validate our approach on three real datasets obtained from diverse domains, namely the MNIST handwritten digits dataset, a large procurement fraud dataset, and a brain activity strength dataset. In all three cases, we witness the power of our approach in generating precise explanations that are also easy for human experts to understand and evaluate.
Tasks
Published 2018-02-21
URL http://arxiv.org/abs/1802.07623v2
PDF http://arxiv.org/pdf/1802.07623v2.pdf
PWC https://paperswithcode.com/paper/explanations-based-on-the-missing-towards
Repo https://github.com/SeldonIO/alibi
Framework tf
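
To make the notions of a pertinent positive (what must be present) and a pertinent negative (what must stay absent) concrete, here is a brute-force toy sketch over binary features and a hand-set linear classifier. It is not the paper's method, which solves an optimization problem against a black-box model; `predict`, `pertinent_positive` and `pertinent_negative` are hypothetical names, and exhaustive subset search is only feasible for a handful of features.

```python
from itertools import combinations


def predict(weights, bias, x):
    """Toy linear classifier: class 1 if w . x + b > 0, else class 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def pertinent_positive(weights, bias, x):
    """Smallest set of present features that, kept alone, preserves the prediction."""
    target = predict(weights, bias, x)
    present = [i for i, v in enumerate(x) if v == 1]
    for k in range(len(present) + 1):
        for subset in combinations(present, k):
            # Keep only the chosen present features; zero out everything else.
            kept = [1 if i in subset else 0 for i in range(len(x))]
            if predict(weights, bias, kept) == target:
                return set(subset)
    return set(present)


def pertinent_negative(weights, bias, x):
    """Smallest set of absent features whose addition changes the prediction."""
    target = predict(weights, bias, x)
    absent = [i for i, v in enumerate(x) if v == 0]
    for k in range(1, len(absent) + 1):
        for subset in combinations(absent, k):
            # Switch on the chosen absent features and keep the present ones.
            perturbed = [1 if (v == 1 or i in subset) else 0 for i, v in enumerate(x)]
            if predict(weights, bias, perturbed) != target:
                return set(subset)
    return None  # no combination of absent features flips the class
```

With weights `[2.0, 1.0, 1.0, -3.0]`, bias `-1.5` and input `[1, 1, 0, 0]` (class 1), feature 0 alone is a pertinent positive, and feature 3 is a pertinent negative: switching it on flips the prediction to class 0.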

Boosting Domain Adaptation by Discovering Latent Domains

Title Boosting Domain Adaptation by Discovering Latent Domains
Authors Massimiliano Mancini, Lorenzo Porzi, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Abstract Current Domain Adaptation (DA) methods based on deep architectures assume that the source samples arise from a single distribution. However, in practice, most datasets can be regarded as mixtures of multiple domains. In these cases exploiting single-source DA methods for learning target classifiers may lead to sub-optimal, if not poor, results. In addition, in many applications it is difficult to manually provide the domain labels for all source data points, i.e. latent domains should be automatically discovered. This paper introduces a novel Convolutional Neural Network (CNN) architecture which (i) automatically discovers latent domains in visual datasets and (ii) exploits this information to learn robust target classifiers. Our approach is based on the introduction of two main components, which can be embedded into any existing CNN architecture: (i) a side branch that automatically computes the assignment of a source sample to a latent domain and (ii) novel layers that exploit domain membership information to appropriately align the distribution of the CNN internal feature representations to a reference distribution. We test our approach on publicly-available datasets, showing that it outperforms state-of-the-art multi-source DA methods by a large margin.
Tasks Domain Adaptation
Published 2018-05-03
URL http://arxiv.org/abs/1805.01386v1
PDF http://arxiv.org/pdf/1805.01386v1.pdf
PWC https://paperswithcode.com/paper/boosting-domain-adaptation-by-discovering
Repo https://github.com/mancinimassimiliano/pytorch_wbn
Framework pytorch
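
The second component described above, layers that use soft domain membership to align feature distributions, can be sketched as a weighted batch normalization: each latent domain gets statistics computed with the side branch's assignment probabilities as sample weights, and each sample is normalized with a mixture of those per-domain statistics. This is a minimal scalar sketch under that assumption; it is not the code in `pytorch_wbn`, and it omits learned affine parameters and running statistics.

```python
import math


def weighted_domain_norm(xs, assignments, eps=1e-5):
    """Normalize scalar activations using soft latent-domain assignments.

    xs: N activations (one scalar feature per sample).
    assignments: N x D; assignments[i][d] is the probability that sample i
        belongs to latent domain d (rows sum to 1). Each domain is assumed
        to receive some positive total weight in the batch.
    """
    n_domains = len(assignments[0])
    means, variances = [], []
    for d in range(n_domains):
        w = [a[d] for a in assignments]
        total = sum(w)
        # Weighted mean and variance of the batch for domain d.
        mu = sum(wi * x for wi, x in zip(w, xs)) / total
        var = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / total
        means.append(mu)
        variances.append(var)
    out = []
    for x, a in zip(xs, assignments):
        # Mix the per-domain normalizations by the sample's domain probabilities.
        out.append(sum(a[d] * (x - means[d]) / math.sqrt(variances[d] + eps)
                       for d in range(n_domains)))
    return out
```

With hard assignments the sketch reduces to ordinary batch normalization applied separately to each domain; soft assignments interpolate between the domains' statistics.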

InfoSSM: Interpretable Unsupervised Learning of Nonparametric State-Space Model for Multi-modal Dynamics

Title InfoSSM: Interpretable Unsupervised Learning of Nonparametric State-Space Model for Multi-modal Dynamics
Authors Young-Jin Park, Han-Lim Choi
Abstract The goal of system identification is to learn the underlying physical dynamics behind time-series data. Gaussian processes (GPs) have been widely used to model probabilistic, nonparametric dynamics, since a GP can estimate the uncertainty of its predictions and avoid over-fitting. Traditional GPSSMs, however, are based on a Gaussian transition model and thus often have difficulty describing more complex transitions, e.g., aircraft motions. To resolve this challenge, this paper proposes a framework using multiple GP transition models, which is capable of describing multi-modal dynamics. Furthermore, we extend the model to an information-theoretic framework, the so-called InfoSSM, by introducing a mutual information regularizer that helps the model learn interpretable and distinguishable multiple dynamics models. Two illustrative numerical experiments, on a simple Dubins vehicle and in a high-fidelity flight simulator, demonstrate the performance and interpretability of the proposed model. Finally, this paper introduces a framework using InfoSSM with Bayesian filtering for air traffic control tracking.
Tasks Time Series
Published 2018-09-19
URL http://arxiv.org/abs/1809.07109v2
PDF http://arxiv.org/pdf/1809.07109v2.pdf
PWC https://paperswithcode.com/paper/infossm-interpretable-unsupervised-learning
Repo https://github.com/yjparkLiCS/InfoSSM
Framework tf

End-to-End Latent Fingerprint Search

Title End-to-End Latent Fingerprint Search
Authors Kai Cao, Dinh-Luan Nguyen, Cori Tymoszek, A. K. Jain
Abstract A system for identifying latent fingerprints, created at Michigan State University by Anil K. Jain, Kai Cao, Dinh-Luan Nguyen, and Cori Tymoszek.
Tasks Quantization
Published 2018-12-26
URL http://arxiv.org/abs/1812.10213v1
PDF http://arxiv.org/pdf/1812.10213v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-latent-fingerprint-search
Repo https://github.com/prip-lab/MSU-LatentAFIS
Framework pytorch

CRAVES: Controlling Robotic Arm with a Vision-based Economic System

Title CRAVES: Controlling Robotic Arm with a Vision-based Economic System
Authors Yiming Zuo, Weichao Qiu, Lingxi Xie, Fangwei Zhong, Yizhou Wang, Alan L. Yuille
Abstract Training a robotic arm to accomplish real-world tasks has been attracting increasing attention in both academia and industry. This work discusses the role of computer vision algorithms in this field. We focus on low-cost arms that are not equipped with sensors, so all decisions are made based on visual recognition, e.g., real-time 3D pose estimation. This requires annotating a lot of training data, which is both time-consuming and laborious. In this paper, we present an alternative solution, which uses a 3D model to create a large amount of synthetic data, trains a vision model in this virtual domain, and applies it to real-world images after domain adaptation. To this end, we design a semi-supervised approach, which fully leverages the geometric constraints among keypoints. We apply an iterative algorithm for optimization. Without any annotations on real images, our algorithm generalizes well and produces satisfying results on 3D pose estimation, which is evaluated on two real-world datasets. We also construct a vision-based control system for task accomplishment, for which we train a reinforcement learning agent in a virtual environment and apply it to the real world. Moreover, our approach, which requires only a 3D model, has the potential to generalize to other types of multi-rigid-body dynamic systems.
Tasks 3D Pose Estimation, Domain Adaptation, Pose Estimation
Published 2018-12-03
URL https://arxiv.org/abs/1812.00725v2
PDF https://arxiv.org/pdf/1812.00725v2.pdf
PWC https://paperswithcode.com/paper/towards-accurate-task-accomplishment-with-low
Repo https://github.com/zuoym15/craves.ai
Framework pytorch

An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization

Title An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization
Authors Gongbo Tang, Fabienne Cap, Eva Pettersson, Joakim Nivre
Abstract In this paper, we apply different NMT models to the problem of historical spelling normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The NMT models operate at different levels, have different attention mechanisms, and use different neural network architectures. Our results show that NMT models are much better than SMT models in terms of character error rate. Vanilla RNNs are competitive with GRUs/LSTMs in historical spelling normalization. Transformer models perform better only when provided with more training data. We also find that subword-level models with a small subword vocabulary are better than character-level models for low-resource languages. In addition, we propose a hybrid method which further improves the performance of historical spelling normalization.
Tasks Machine Translation
Published 2018-06-13
URL http://arxiv.org/abs/1806.05210v2
PDF http://arxiv.org/pdf/1806.05210v2.pdf
PWC https://paperswithcode.com/paper/an-evaluation-of-neural-machine-translation
Repo https://github.com/tanggongbo/normalization-NMT
Framework none
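
Character error rate (CER), the metric used in the comparison above, is commonly computed as the character-level edit distance between a system's output and the reference normalization, divided by the reference length. A minimal sketch follows; whether the paper aggregates per word, per sentence, or over the whole corpus is not stated in the abstract, so the corpus-level variant `corpus_cer` is an assumption, and the example spellings are illustrative.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(hypothesis, reference):
    """Edit distance normalized by the length of the reference string."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(hypothesis, reference) / len(reference)


def corpus_cer(pairs):
    """Total edit distance over total reference length for (hypothesis, reference) pairs."""
    total_edits = sum(levenshtein(h, r) for h, r in pairs)
    total_chars = sum(len(r) for _, r in pairs)
    return total_edits / total_chars
```

For example, leaving the historical spelling "vpon" unnormalized against the reference "upon" costs one substitution, a CER of 0.25.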

What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models

Title What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models
Authors Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, Anthony Bau, James Glass
Abstract Despite the remarkable evolution of deep neural networks in natural language processing (NLP), their interpretability remains a challenge. Previous work largely focused on what these models learn at the representation level. We break this analysis down further and study individual dimensions (neurons) in the vector representation learned by end-to-end neural models in NLP tasks. We propose two methods: Linguistic Correlation Analysis, based on a supervised method to extract the most relevant neurons with respect to an extrinsic task, and Cross-model Correlation Analysis, an unsupervised method to extract salient neurons w.r.t. the model itself. We evaluate the effectiveness of our techniques by ablating the identified neurons and reevaluating the network’s performance for two tasks: neural machine translation (NMT) and neural language modeling (NLM). We further present a comprehensive analysis of neurons with the aim to address the following questions: i) how localized or distributed are different linguistic properties in the models? ii) are certain neurons exclusive to some properties and not others? iii) is the information more or less distributed in NMT vs. NLM? and iv) how important are the neurons identified through the linguistic correlation method to the overall task? Our code is publicly available as part of the NeuroX toolkit (Dalvi et al. 2019).
Tasks Language Modelling, Machine Translation
Published 2018-12-21
URL http://arxiv.org/abs/1812.09355v1
PDF http://arxiv.org/pdf/1812.09355v1.pdf
PWC https://paperswithcode.com/paper/what-is-one-grain-of-sand-in-the-desert
Repo https://github.com/fdalvi/NeuroX
Framework pytorch
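
The Cross-model Correlation Analysis idea, ranking neurons without supervision by how strongly they are mirrored in other independently trained models, can be sketched as follows. Scoring each neuron by its maximum absolute Pearson correlation with any neuron of the other models is our assumption about the details; the NeuroX toolkit's own implementation may differ, and the activation lists in the example are toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long activation sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a constant neuron carries no correlational signal
    return cov / (sx * sy)


def cross_model_ranking(model, others):
    """Rank neurons of `model` by max |correlation| with any neuron in `others`.

    model: list of neurons, each a list of activations over the same tokens.
    others: list of other models, each in the same format as `model`.
    Returns neuron indices, most strongly shared neuron first.
    """
    scores = []
    for idx, neuron in enumerate(model):
        best = max(abs(pearson(neuron, other_neuron))
                   for other in others for other_neuron in other)
        scores.append((best, idx))
    return [idx for _, idx in sorted(scores, reverse=True)]
```

The top-ranked neurons are the candidates one would then ablate to check, as the abstract describes, how much task performance depends on them.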

OpenNMT: Neural Machine Translation Toolkit

Title OpenNMT: Neural Machine Translation Toolkit
Authors Guillaume Klein, Yoon Kim, Yuntian Deng, Vincent Nguyen, Jean Senellart, Alexander M. Rush
Abstract OpenNMT is an open-source toolkit for neural machine translation (NMT). The system prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as well as detailed pedagogical documentation about the underlying techniques. OpenNMT has been used in several production MT systems, modified for numerous research papers, and is implemented across several deep learning frameworks.
Tasks Machine Translation
Published 2018-05-28
URL http://arxiv.org/abs/1805.11462v1
PDF http://arxiv.org/pdf/1805.11462v1.pdf
PWC https://paperswithcode.com/paper/opennmt-neural-machine-translation-toolkit
Repo https://github.com/Waino/OpenNMT-py
Framework pytorch

Large batch size training of neural networks with adversarial training and second-order information

Title Large batch size training of neural networks with adversarial training and second-order information
Authors Zhewei Yao, Amir Gholami, Daiyaan Arfeen, Richard Liaw, Joseph Gonzalez, Kurt Keutzer, Michael Mahoney
Abstract The most straightforward method to accelerate Stochastic Gradient Descent (SGD) computation is to distribute the randomly selected batch of inputs over multiple processors. Keeping the distributed processors fully utilized requires commensurately growing the batch size. However, large batch training often leads to poorer generalization. A recently proposed solution for this problem is to use adaptive batch sizes in SGD. In this case, one starts with a small number of processes and scales the processes as training progresses. Two major challenges with this approach are (i) that dynamically resizing the cluster can add non-trivial overhead, in part since it is currently not supported, and (ii) that the overall speedup is limited by the initial phase with smaller batches. In this work, we address both challenges by developing a new adaptive batch size framework, with autoscaling based on the Ray framework. This allows very efficient elastic scaling with negligible resizing overhead (0.32% of time for ResNet18 ImageNet training). Furthermore, we propose a new adaptive batch size training scheme using second-order methods and adversarial training. These enable increasing batch sizes earlier during training, which shortens training time. We extensively evaluate our method on CIFAR-10/100, SVHN, TinyImageNet, and ImageNet datasets, using multiple neural networks, including ResNets and smaller networks such as SqueezeNext. Our method exceeds the performance of existing solutions in terms of both accuracy and the number of SGD iterations (up to 1% and $5\times$, respectively). Importantly, this is achieved without any additional hyper-parameter tuning to tailor our method in any of these experiments.
Tasks
Published 2018-10-02
URL https://arxiv.org/abs/1810.01021v3
PDF https://arxiv.org/pdf/1810.01021v3.pdf
PWC https://paperswithcode.com/paper/large-batch-size-training-of-neural-networks
Repo https://github.com/amirgholami/hessianflow
Framework pytorch
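
The second-order part of the adaptive batch size scheme needs curvature information. A standard way to obtain the dominant Hessian eigenvalue from Hessian-vector products alone is power iteration, sketched below and checked on an explicit 2x2 matrix. The rule in `next_batch_size` (double the batch once the eigenvalue has dropped by a fixed ratio) is a hypothetical illustration of curvature-driven batch growth, not the schedule used in the paper or in `hessianflow`.

```python
import math
import random


def top_eigenvalue(matvec, dim, iters=200, seed=0):
    """Power iteration: estimate the dominant (largest-magnitude) eigenvalue of a
    symmetric operator, given only a function computing matrix-vector products."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iters):
        hv = matvec(v)
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in hv]
        # Rayleigh quotient v^T H v of the current unit vector.
        eig = sum(a * b for a, b in zip(v, matvec(v)))
    return eig


def next_batch_size(batch_size, prev_eig, eig, ratio=2.0, max_batch=4096):
    """Hypothetical schedule: double the batch once curvature has dropped by `ratio`."""
    if prev_eig is not None and eig < prev_eig / ratio:
        return min(2 * batch_size, max_batch)
    return batch_size
```

In a real network, `matvec` would be a Hessian-vector product computed by automatic differentiation, so the full Hessian never has to be formed.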

Learning Long Term Dependencies via Fourier Recurrent Units

Title Learning Long Term Dependencies via Fourier Recurrent Units
Authors Jiong Zhang, Yibo Lin, Zhao Song, Inderjit S. Dhillon
Abstract It is well known that training recurrent neural networks for tasks that have long-term dependencies is challenging. One of the main reasons is the vanishing or exploding gradient problem, which prevents gradient information from propagating to early layers. In this paper we propose a simple recurrent architecture, the Fourier Recurrent Unit (FRU), that stabilizes the gradients that arise in its training while giving us stronger expressive power. Specifically, FRU summarizes the hidden states $h^{(t)}$ along the temporal dimension with Fourier basis functions. This allows gradients to easily reach any layer due to FRU’s residual learning structure and the global support of trigonometric functions. We show that FRU has gradient lower and upper bounds independent of the temporal dimension. We also show the strong expressivity of a sparse Fourier basis, from which FRU obtains its strong expressive power. Our experimental study also demonstrates that with fewer parameters the proposed architecture outperforms other recurrent architectures on many tasks.
Tasks
Published 2018-03-17
URL http://arxiv.org/abs/1803.06585v1
PDF http://arxiv.org/pdf/1803.06585v1.pdf
PWC https://paperswithcode.com/paper/learning-long-term-dependencies-via-fourier
Repo https://github.com/limbo018/FRU
Framework tf
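
The claim that FRU summarizes hidden states over time with Fourier basis functions can be illustrated with a simplified scalar sketch: each summary value accumulates every hidden state weighted by a cosine of its time step. This omits the learned weights, phases and nonlinearities of the actual FRU cell; `fourier_summary` is a hypothetical helper that only shows the additive, globally supported structure behind the gradient argument.

```python
import math


def fourier_summary(hidden_states, freqs):
    """Accumulate hidden states against cosine basis functions over time.

    hidden_states: T scalars h_1..h_T.
    freqs: frequencies f to summarize with.
    Returns one value per frequency: (1/T) * sum_t cos(2*pi*f*t/T) * h_t.
    """
    T = len(hidden_states)
    summary = [0.0] * len(freqs)
    for t, h in enumerate(hidden_states, start=1):
        for k, f in enumerate(freqs):
            # Additive (residual-style) update: every h_t enters the summary
            # directly instead of through a chain of recurrent multiplications.
            summary[k] += math.cos(2 * math.pi * f * t / T) * h / T
    return summary
```

In this simplified form, the derivative of the summary for frequency f with respect to h_t is cos(2*pi*f*t/T)/T, which does not shrink as the distance between t and T grows. Frequency 0 recovers the plain average of the hidden states.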

S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks

Title S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks
Authors Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang
Abstract In this paper, we present a novel Single Shot multi-Span Detector for temporal activity detection in long, untrimmed videos using a simple end-to-end fully three-dimensional convolutional (Conv3D) network. Our architecture, named S3D, encodes the entire video stream and discretizes the output space of temporal activity spans into a set of default spans over different temporal locations and scales. At prediction time, S3D predicts scores for the presence of activity categories in each default span and produces temporal adjustments relative to the span location to predict the precise activity duration. Unlike many state-of-the-art systems that require separate proposal and classification stages, S3D is intrinsically simple and designed specifically for single-shot, end-to-end temporal activity detection. When evaluated on the THUMOS’14 detection benchmark, S3D achieves state-of-the-art performance and is highly efficient, operating at 1271 FPS.
Tasks Action Detection, Activity Detection
Published 2018-07-21
URL http://arxiv.org/abs/1807.08069v2
PDF http://arxiv.org/pdf/1807.08069v2.pdf
PWC https://paperswithcode.com/paper/s3d-single-shot-multi-span-detector-via-fully
Repo https://github.com/dazhang-cv/Project
Framework none
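
The default-span mechanism described above resembles SSD-style anchors on a one-dimensional timeline. The sketch below generates default spans at several locations and scales, decodes predicted adjustments into a (start, end) segment, and measures temporal IoU, which is commonly used to match predictions to ground truth. The center-offset and log-width parametrization is borrowed from object detection and is an assumption; the abstract only states that S3D predicts adjustments relative to each default span.

```python
import math


def default_spans(num_locations, scales):
    """Default (center, width) spans on a timeline normalized to [0, 1]."""
    spans = []
    for i in range(num_locations):
        center = (i + 0.5) / num_locations
        for s in scales:
            spans.append((center, s))
    return spans


def decode(span, offsets):
    """Apply predicted (center offset, log-width) adjustments -> (start, end)."""
    (c, w), (dc, dw) = span, offsets
    center = c + dc * w          # shift measured in units of the span width
    width = w * math.exp(dw)     # multiplicative change of the span width
    return center - width / 2, center + width / 2


def temporal_iou(a, b):
    """Intersection over union of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```

For instance, decoding the default span (0.5, 0.2) with offsets (0.5, ln 2) moves its center to 0.6 and doubles its width, giving the segment (0.4, 0.8).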

Context-Aware Crowd Counting

Title Context-Aware Crowd Counting
Authors Weizhe Liu, Mathieu Salzmann, Pascal Fua
Abstract State-of-the-art methods for counting people in crowded scenes rely on deep networks to estimate crowd density. They typically use the same filters over the whole image or over large image patches. Only then do they estimate local scale to compensate for perspective distortion. This is typically achieved by training an auxiliary classifier to select, for predefined image patches, the best kernel size among a limited set of choices. As such, these methods are not end-to-end trainable and are restricted in the scope of context they can leverage. In this paper, we introduce an end-to-end trainable deep architecture that combines features obtained using multiple receptive field sizes and learns the importance of each such feature at each image location. In other words, our approach adaptively encodes the scale of the contextual information required to accurately predict crowd density. This yields an algorithm that outperforms state-of-the-art crowd counting methods, especially when perspective effects are strong.
Tasks Crowd Counting
Published 2018-11-26
URL http://arxiv.org/abs/1811.10452v2
PDF http://arxiv.org/pdf/1811.10452v2.pdf
PWC https://paperswithcode.com/paper/context-aware-crowd-counting
Repo https://github.com/weizheliu/Context-Aware-Crowd-Counting
Framework pytorch
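
The abstract describes combining features from multiple receptive field sizes with a learned importance for each scale at each image location. A minimal sketch of such a fusion follows: sigmoid weights per scale and location, then a normalized weighted average. How the importance scores are actually computed in the paper is not given in the abstract, so `scale_scores` is a hypothetical stand-in for the network's output. The count is the sum of the predicted density map, which is how density-based counting methods generally produce a count.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse_scales(scale_features, scale_scores):
    """Per-location weighted average of multi-scale features.

    scale_features[j][p]: feature of scale j at location p.
    scale_scores[j][p]: unnormalized importance of scale j at location p
        (hypothetical; a real network would predict these from the image).
    """
    n_scales, n_locs = len(scale_features), len(scale_features[0])
    fused = []
    for p in range(n_locs):
        weights = [sigmoid(scale_scores[j][p]) for j in range(n_scales)]
        total = sum(weights)
        # Normalizing by the weight sum keeps the fused value on the feature scale.
        fused.append(sum(w * scale_features[j][p] for j, w in enumerate(weights)) / total)
    return fused


def crowd_count(density_map):
    """Estimated count = sum (discrete integral) of the predicted density map."""
    return sum(density_map)
```

Because the weights vary per location, one part of the image can rely on a small receptive field while a part affected by strong perspective relies on a larger one.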

Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

Title Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient
Authors Rui Zhao, Volker Tresp
Abstract Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic. However, commonly used policy-based dialogue agents often end up focusing on simple utterances and suboptimal policies. To mitigate this problem, we propose a class of novel temperature-based extensions for policy gradient methods, which are referred to as Tempered Policy Gradients (TPGs). On a recent AI testbed, the GuessWhat?! game, we achieve significant improvements with two innovations. The first is an extension of the state-of-the-art solutions with Seq2Seq and Memory Network structures, which leads to an improvement of 7%. The second is the application of our newly developed TPG methods, which further improves performance by around 5% and, even more importantly, helps produce more convincing utterances.
Tasks Policy Gradient Methods, Visual Dialog
Published 2018-07-02
URL http://arxiv.org/abs/1807.00737v4
PDF http://arxiv.org/pdf/1807.00737v4.pdf
PWC https://paperswithcode.com/paper/learning-goal-oriented-visual-dialog-via
Repo https://github.com/ruizhaogit/GuessWhat-TemperedPolicyGradient
Framework pytorch
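
Temperature is the central knob in the tempered policy gradients described above. The sketch shows temperature-scaled softmax sampling over a policy's logits, where a higher temperature flattens the distribution and encourages exploration of less likely words. How TPG selects or adapts the temperature is not described in the abstract and is not modeled here; `sample_word` is a hypothetical helper.

```python
import math
import random


def tempered_softmax(logits, temperature=1.0):
    """Softmax over logits / temperature; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_word(logits, temperature, rng=random):
    """Sample a word index from the tempered policy distribution."""
    probs = tempered_softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With logits `[0, ln 3]`, temperature 1 gives probabilities 0.25 and 0.75; raising the temperature moves them toward 0.5 each, and lowering it concentrates almost all mass on the second word.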

Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection

Title Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection
Authors Da Zhang, Xiyang Dai, Yuan-Fang Wang
Abstract Recognizing instances at different scales simultaneously is a fundamental challenge in visual detection problems. While spatial multi-scale modeling has been well studied in object detection, how to effectively apply a multi-scale architecture to temporal models for activity detection is still under-explored. In this paper, we identify three unique challenges that need to be specifically handled for temporal activity detection compared to its spatial counterpart. To address all these issues, we propose Dynamic Temporal Pyramid Network (DTPN), a new activity detection framework with a multi-scale pyramidal architecture featuring three novel designs: (1) We sample input video frames dynamically with varying frames per second (FPS) to construct a natural pyramidal input for video of an arbitrary length. (2) We design a two-branch multi-scale temporal feature hierarchy to deal with the inherent temporal scale variation of activity instances. (3) We further exploit the temporal context of activities by appropriately fusing multi-scale feature maps, and demonstrate that both local and global temporal contexts are important. By combining all these components into a unified network, we end up with a single-shot activity detector involving single-pass inference and end-to-end training. Extensive experiments show that the proposed DTPN achieves state-of-the-art performance on the challenging ActivityNet dataset.
Tasks Action Detection, Activity Detection, Object Detection
Published 2018-08-07
URL http://arxiv.org/abs/1808.02536v2
PDF http://arxiv.org/pdf/1808.02536v2.pdf
PWC https://paperswithcode.com/paper/dynamic-temporal-pyramid-network-a-closer
Repo https://github.com/dazhang-cv/Project
Framework none
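
The first DTPN design choice, sampling frames at a varying rate so that videos of arbitrary length yield a fixed-size input, can be sketched as evenly spaced index selection: a fixed number of samples over a longer video implies a lower effective FPS. The pyramid built by averaging adjacent frame features is a simple stand-in for the learned multi-scale feature hierarchy, not the paper's architecture; both helper names are hypothetical.

```python
def sample_frame_indices(num_frames, num_samples):
    """Pick num_samples frame indices spread evenly over a video of any length.

    The effective sampling rate is num_samples / duration, so it varies per video.
    If num_samples > num_frames, some indices repeat (upsampling).
    """
    step = num_frames / num_samples
    return [min(num_frames - 1, int(i * step + step / 2)) for i in range(num_samples)]


def temporal_pyramid(features, levels):
    """Average adjacent pairs of per-frame features repeatedly to build coarser levels.

    An odd trailing element at any level is dropped.
    """
    pyramid = [features]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        pyramid.append([(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev) - 1, 2)])
    return pyramid
```

A 100-frame clip and an 8-frame clip both yield 4 sampled frames here, which is what lets a single network with a fixed input size handle videos of any length.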