October 20, 2019


Paper Group AWR 351

Learning Time-Sensitive Strategies in Space Fortress. Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives. Boosting Domain Adaptation by Discovering Latent Domains. InfoSSM: Interpretable Unsupervised Learning of Nonparametric State-Space Model for Multi-modal Dynamics. End-to-End Latent Fingerprint Search. …

Learning Time-Sensitive Strategies in Space Fortress


Title	Learning Time-Sensitive Strategies in Space Fortress
Authors	Akshat Agarwal, Ryan Hope, Katia Sycara
Abstract	Although there has been remarkable progress and impressive performance in reinforcement learning (RL) on Atari games, there are many problems with challenging characteristics that have not yet been explored in Deep Learning for RL. These include reward sparsity, abrupt context-dependent reversals of strategy and time-sensitive game play. In this paper, we present Space Fortress, a game that incorporates all these characteristics, and experimentally show that the presence of any of these renders state-of-the-art Deep RL algorithms incapable of learning. Then, we present our enhancements to an existing algorithm and show, through an ablation study, large performance increases from each enhancement. We discuss how each of these enhancements was able to help and also argue that appropriate transfer learning boosts performance.
Tasks	Atari Games, Transfer Learning
Published	2018-05-17
URL	http://arxiv.org/abs/1805.06824v4
PDF	http://arxiv.org/pdf/1805.06824v4.pdf
PWC	https://paperswithcode.com/paper/learning-time-sensitive-strategies-in-space
Repo	https://github.com/agakshat/spacefortress
Framework	pytorch

Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives


Title	Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives
Authors	Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam, Payel Das
Abstract	In this paper we propose a novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network. Given an input we find what should be minimally and sufficiently present (viz. important object pixels in an image) to justify its classification and analogously what should be minimally and necessarily absent (viz. certain background pixels). We argue that such explanations are natural for humans and are used commonly in domains such as health care and criminology. What is minimally but critically absent is an important part of an explanation, which to the best of our knowledge, has not been explicitly identified by current explanation methods that explain predictions of neural networks. We validate our approach on three real datasets obtained from diverse domains; namely, a handwritten digits dataset MNIST, a large procurement fraud dataset and a brain activity strength dataset. In all three cases, we witness the power of our approach in generating precise explanations that are also easy for human experts to understand and evaluate.
Tasks
Published	2018-02-21
URL	http://arxiv.org/abs/1802.07623v2
PDF	http://arxiv.org/pdf/1802.07623v2.pdf
PWC	https://paperswithcode.com/paper/explanations-based-on-the-missing-towards
Repo	https://github.com/SeldonIO/alibi
Framework	tf

Boosting Domain Adaptation by Discovering Latent Domains


Title	Boosting Domain Adaptation by Discovering Latent Domains
Authors	Massimiliano Mancini, Lorenzo Porzi, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci
Abstract	Current Domain Adaptation (DA) methods based on deep architectures assume that the source samples arise from a single distribution. However, in practice, most datasets can be regarded as mixtures of multiple domains. In these cases exploiting single-source DA methods for learning target classifiers may lead to sub-optimal, if not poor, results. In addition, in many applications it is difficult to manually provide the domain labels for all source data points, i.e. latent domains should be automatically discovered. This paper introduces a novel Convolutional Neural Network (CNN) architecture which (i) automatically discovers latent domains in visual datasets and (ii) exploits this information to learn robust target classifiers. Our approach is based on the introduction of two main components, which can be embedded into any existing CNN architecture: (i) a side branch that automatically computes the assignment of a source sample to a latent domain and (ii) novel layers that exploit domain membership information to appropriately align the distribution of the CNN internal feature representations to a reference distribution. We test our approach on publicly-available datasets, showing that it outperforms state-of-the-art multi-source DA methods by a large margin.
Tasks	Domain Adaptation
Published	2018-05-03
URL	http://arxiv.org/abs/1805.01386v1
PDF	http://arxiv.org/pdf/1805.01386v1.pdf
PWC	https://paperswithcode.com/paper/boosting-domain-adaptation-by-discovering
Repo	https://github.com/mancinimassimiliano/pytorch_wbn
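Framework	pytorch

InfoSSM: Interpretable Unsupervised Learning of Nonparametric State-Space Model for Multi-modal Dynamics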


Title	InfoSSM: Interpretable Unsupervised Learning of Nonparametric State-Space Model for Multi-modal Dynamics
Authors	Young-Jin Park, Han-Lim Choi
Abstract	The goal of system identification is to learn about the underlying physics dynamics behind the time-series data. To build a probabilistic and nonparametric dynamics model, Gaussian processes (GPs) have been widely used; a GP can estimate the uncertainty of prediction and avoid over-fitting. Traditional GPSSMs, however, are based on a Gaussian transition model and thus often have difficulty describing a more complex transition model, e.g. aircraft motions. To resolve the challenge, this paper proposes a framework using multiple GP transition models which is capable of describing multi-modal dynamics. Furthermore, we extend the model to the information-theoretic framework, the so-called InfoSSM, by introducing a mutual information regularizer that helps the model learn interpretable and distinguishable multiple dynamics models. Two illustrative numerical experiments, with a simple Dubins vehicle and a high-fidelity flight simulator, are presented to demonstrate the performance and interpretability of the proposed model. Finally, this paper introduces a framework using InfoSSM with Bayesian filtering for air traffic control tracking.
Tasks	Time Series
Published	2018-09-19
URL	http://arxiv.org/abs/1809.07109v2
PDF	http://arxiv.org/pdf/1809.07109v2.pdf
PWC	https://paperswithcode.com/paper/infossm-interpretable-unsupervised-learning
Repo	https://github.com/yjparkLiCS/InfoSSM
Framework	tf

End-to-End Latent Fingerprint Search


Title	End-to-End Latent Fingerprint Search
Authors	Kai Cao, Dinh-Luan Nguyen, Cori Tymoszek, A. K. Jain
Abstract	A system for identifying latent fingerprints. Created at Michigan State University by Anil K. Jain, Kai Cao, Dinh-Luan Nguyen, and Cori Tymoszek.
Tasks	Quantization
Published	2018-12-26
URL	http://arxiv.org/abs/1812.10213v1
PDF	http://arxiv.org/pdf/1812.10213v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-latent-fingerprint-search
Repo	https://github.com/prip-lab/MSU-LatentAFIS
Framework	pytorch

CRAVES: Controlling Robotic Arm with a Vision-based Economic System


Title	CRAVES: Controlling Robotic Arm with a Vision-based Economic System
Authors	Yiming Zuo, Weichao Qiu, Lingxi Xie, Fangwei Zhong, Yizhou Wang, Alan L. Yuille
Abstract	Training a robotic arm to accomplish real-world tasks has been attracting increasing attention in both academia and industry. This work discusses the role of computer vision algorithms in this field. We focus on low-cost arms that are not equipped with sensors, so all decisions are made upon visual recognition, e.g., real-time 3D pose estimation. This requires annotating a lot of training data, which is not only time-consuming but also laborious. In this paper, we present an alternative solution, which uses a 3D model to create a large amount of synthetic data, trains a vision model in this virtual domain, and applies it to real-world images after domain adaptation. To this end, we design a semi-supervised approach, which fully leverages the geometric constraints among keypoints. We apply an iterative algorithm for optimization. Without any annotations on real images, our algorithm generalizes well and produces satisfying results on 3D pose estimation, which is evaluated on two real-world datasets. We also construct a vision-based control system for task accomplishment, for which we train a reinforcement learning agent in a virtual environment and apply it to the real world. Moreover, our approach, with merely a 3D model being required, has the potential to generalize to other types of multi-rigid-body dynamic systems.
Tasks	3D Pose Estimation, Domain Adaptation, Pose Estimation
Published	2018-12-03
URL	https://arxiv.org/abs/1812.00725v2
PDF	https://arxiv.org/pdf/1812.00725v2.pdf
PWC	https://paperswithcode.com/paper/towards-accurate-task-accomplishment-with-low
Repo	https://github.com/zuoym15/craves.ai
Framework	pytorch

An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization


Title	An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization
Authors	Gongbo Tang, Fabienne Cap, Eva Pettersson, Joakim Nivre
Abstract	In this paper, we apply different NMT models to the problem of historical spelling normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The NMT models are at different levels, have different attention mechanisms, and different neural network architectures. Our results show that NMT models are much better than SMT models in terms of character error rate. The vanilla RNNs are competitive with GRUs/LSTMs in historical spelling normalization. Transformer models perform better only when provided with more training data. We also find that subword-level models with a small subword vocabulary are better than character-level models for low-resource languages. In addition, we propose a hybrid method which further improves the performance of historical spelling normalization.
Tasks	Machine Translation
Published	2018-06-13
URL	http://arxiv.org/abs/1806.05210v2
PDF	http://arxiv.org/pdf/1806.05210v2.pdf
PWC	https://paperswithcode.com/paper/an-evaluation-of-neural-machine-translation
Repo	https://github.com/tanggongbo/normalization-NMT
Framework	none
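
The abstract above compares systems by character error rate. As a minimal sketch, assuming the common definition (total character-level Levenshtein distance divided by total reference length; the paper may normalize differently), the metric could be computed as follows. The function names and the example word pair are hypothetical.

```python
def edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis character
                            curr[j - 1] + 1,      # insert a reference character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(hyps, refs):
    """Assumed definition: summed edit distance over summed reference length."""
    errors = sum(edit_distance(h, r) for h, r in zip(hyps, refs))
    total = sum(len(r) for r in refs)
    return errors / total


# Toy example: one wrong character out of four reference characters.
cer = character_error_rate(["vnto"], ["unto"])
```

Summing over the corpus before dividing keeps long words from being under-weighted, which is one reasonable design choice among several.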

What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models


Title	What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models
Authors	Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, Anthony Bau, James Glass
Abstract	Despite the remarkable evolution of deep neural networks in natural language processing (NLP), their interpretability remains a challenge. Previous work largely focused on what these models learn at the representation level. We break this analysis down further and study individual dimensions (neurons) in the vector representation learned by end-to-end neural models in NLP tasks. We propose two methods: Linguistic Correlation Analysis, based on a supervised method to extract the most relevant neurons with respect to an extrinsic task, and Cross-model Correlation Analysis, an unsupervised method to extract salient neurons w.r.t. the model itself. We evaluate the effectiveness of our techniques by ablating the identified neurons and reevaluating the network’s performance for two tasks: neural machine translation (NMT) and neural language modeling (NLM). We further present a comprehensive analysis of neurons with the aim to address the following questions: i) how localized or distributed are different linguistic properties in the models? ii) are certain neurons exclusive to some properties and not others? iii) is the information more or less distributed in NMT vs. NLM? and iv) how important are the neurons identified through the linguistic correlation method to the overall task? Our code is publicly available as part of the NeuroX toolkit (Dalvi et al. 2019).
Tasks	Language Modelling, Machine Translation
Published	2018-12-21
URL	http://arxiv.org/abs/1812.09355v1
PDF	http://arxiv.org/pdf/1812.09355v1.pdf
PWC	https://paperswithcode.com/paper/what-is-one-grain-of-sand-in-the-desert
Repo	https://github.com/fdalvi/NeuroX
Framework	pytorch

OpenNMT: Neural Machine Translation Toolkit


Title	OpenNMT: Neural Machine Translation Toolkit
Authors	Guillaume Klein, Yoon Kim, Yuntian Deng, Vincent Nguyen, Jean Senellart, Alexander M. Rush
Abstract	OpenNMT is an open-source toolkit for neural machine translation (NMT). The system prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as well as detailed pedagogical documentation about the underlying techniques. OpenNMT has been used in several production MT systems, modified for numerous research papers, and is implemented across several deep learning frameworks.
Tasks	Machine Translation
Published	2018-05-28
URL	http://arxiv.org/abs/1805.11462v1
PDF	http://arxiv.org/pdf/1805.11462v1.pdf
PWC	https://paperswithcode.com/paper/opennmt-neural-machine-translation-toolkit
Repo	https://github.com/Waino/OpenNMT-py
Framework	pytorch

Large batch size training of neural networks with adversarial training and second-order information


Title	Large batch size training of neural networks with adversarial training and second-order information
Authors	Zhewei Yao, Amir Gholami, Daiyaan Arfeen, Richard Liaw, Joseph Gonzalez, Kurt Keutzer, Michael Mahoney
Abstract	The most straightforward method to accelerate Stochastic Gradient Descent (SGD) computation is to distribute the randomly selected batch of inputs over multiple processors. To keep the distributed processors fully utilized requires commensurately growing the batch size. However, large batch training often leads to poorer generalization. A recently proposed solution for this problem is to use adaptive batch sizes in SGD. In this case, one starts with a small number of processes and scales the processes as training progresses. Two major challenges with this approach are (i) that dynamically resizing the cluster can add non-trivial overhead, in part since it is currently not supported, and (ii) that the overall speed up is limited by the initial phase with smaller batches. In this work, we address both challenges by developing a new adaptive batch size framework, with autoscaling based on the Ray framework. This allows very efficient elastic scaling with negligible resizing overhead (0.32% of time for ResNet18 ImageNet training). Furthermore, we propose a new adaptive batch size training scheme using second order methods and adversarial training. These enable increasing batch sizes earlier during training, which leads to better training time. We extensively evaluate our method on Cifar-10/100, SVHN, TinyImageNet, and ImageNet datasets, using multiple neural networks, including ResNets and smaller networks such as SqueezeNext. Our method exceeds the performance of existing solutions in terms of both accuracy and the number of SGD iterations (up to 1% and $5\times$, respectively). Importantly, this is achieved without any additional hyper-parameter tuning to tailor our method in any of these experiments.
Tasks
Published	2018-10-02
URL	https://arxiv.org/abs/1810.01021v3
PDF	https://arxiv.org/pdf/1810.01021v3.pdf
PWC	https://paperswithcode.com/paper/large-batch-size-training-of-neural-networks
Repo	https://github.com/amirgholami/hessianflow
Framework	pytorch

Learning Long Term Dependencies via Fourier Recurrent Units


Title	Learning Long Term Dependencies via Fourier Recurrent Units
Authors	Jiong Zhang, Yibo Lin, Zhao Song, Inderjit S. Dhillon
Abstract	It is a known fact that training recurrent neural networks for tasks that have long term dependencies is challenging. One of the main reasons is the vanishing or exploding gradient problem, which prevents gradient information from propagating to early layers. In this paper we propose a simple recurrent architecture, the Fourier Recurrent Unit (FRU), that stabilizes the gradients that arise in its training while giving us stronger expressive power. Specifically, FRU summarizes the hidden states $h^{(t)}$ along the temporal dimension with Fourier basis functions. This allows gradients to easily reach any layer due to FRU’s residual learning structure and the global support of trigonometric functions. We show that FRU has gradient lower and upper bounds independent of temporal dimension. We also show the strong expressivity of sparse Fourier basis, from which FRU obtains its strong expressive power. Our experimental study also demonstrates that with fewer parameters the proposed architecture outperforms other recurrent architectures on many tasks.
Tasks
Published	2018-03-17
URL	http://arxiv.org/abs/1803.06585v1
PDF	http://arxiv.org/pdf/1803.06585v1.pdf
PWC	https://paperswithcode.com/paper/learning-long-term-dependencies-via-fourier
Repo	https://github.com/limbo018/FRU
Framework	tf
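
The idea of summarizing hidden states along the temporal dimension with Fourier basis functions can be illustrated with a rough sketch. It assumes scalar hidden states, a cosine-only basis, and a 1/T scaling; these choices, and the function name `fourier_summary`, are illustrative assumptions and not the paper's exact unit.

```python
import math


def fourier_summary(hidden_states, freqs):
    """Summarize scalar hidden states h_1..h_T with cosine basis functions.

    Returns one statistic per frequency f:
        (1/T) * sum_t cos(2*pi*f*t/T) * h_t
    """
    T = len(hidden_states)
    summary = [0.0] * len(freqs)
    for t, h in enumerate(hidden_states, start=1):
        for k, f in enumerate(freqs):
            # Additive update: every time step contributes its own term,
            # so no step's contribution is multiplied away by later steps.
            summary[k] += math.cos(2 * math.pi * f * t / T) * h / T
    return summary


# A constant sequence has all of its energy at frequency 0.
stats = fourier_summary([1.0] * 8, freqs=[0, 1])
```

Because each step is added rather than repeatedly multiplied, every hidden state reaches the summary directly, which is the intuition behind the gradient-stability claim in the abstract.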

S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks


Title	S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks
Authors	Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang
Abstract	In this paper, we present a novel Single Shot multi-Span Detector for temporal activity detection in long, untrimmed videos using a simple end-to-end fully three-dimensional convolutional (Conv3D) network. Our architecture, named S3D, encodes the entire video stream and discretizes the output space of temporal activity spans into a set of default spans over different temporal locations and scales. At prediction time, S3D predicts scores for the presence of activity categories in each default span and produces temporal adjustments relative to the span location to predict the precise activity duration. Unlike many state-of-the-art systems that require a separate proposal and classification stage, our S3D is intrinsically simple and dedicatedly designed for single-shot, end-to-end temporal activity detection. When evaluated on the THUMOS’14 detection benchmark, S3D achieves state-of-the-art performance and is very efficient, operating at 1271 FPS.
Tasks	Action Detection, Activity Detection
Published	2018-07-21
URL	http://arxiv.org/abs/1807.08069v2
PDF	http://arxiv.org/pdf/1807.08069v2.pdf
PWC	https://paperswithcode.com/paper/s3d-single-shot-multi-span-detector-via-fully
Repo	https://github.com/dazhang-cv/Project
Framework	none
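
The default-span idea in the abstract above can be sketched in a few lines. This is an illustration under stated assumptions: the normalized [0, 1] timeline, the scale values, and the offset parameterization (borrowed from single-shot object detectors) are not taken from the paper, and all function names are hypothetical.

```python
import math


def default_spans(num_locations, scales):
    """Tile (center, length) default spans over a normalized [0, 1] timeline."""
    spans = []
    for i in range(num_locations):
        center = (i + 0.5) / num_locations
        for s in scales:
            spans.append((center, s))
    return spans


def decode(span, offsets):
    """Apply predicted (center, length) adjustments to one default span."""
    center, length = span
    d_center, d_length = offsets
    new_center = center + d_center * length      # shift relative to span length
    new_length = length * math.exp(d_length)     # rescale, always positive
    return (new_center - new_length / 2, new_center + new_length / 2)


def temporal_iou(a, b):
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


spans = default_spans(num_locations=4, scales=[0.1, 0.2])
start, end = decode(spans[0], offsets=(0.0, 0.0))  # zero offsets keep the span
```

In a detector of this style, each default span would be matched to ground-truth activities by temporal IoU during training, and the network would predict class scores plus the two offsets per span.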

Context-Aware Crowd Counting


Title	Context-Aware Crowd Counting
Authors	Weizhe Liu, Mathieu Salzmann, Pascal Fua
Abstract	State-of-the-art methods for counting people in crowded scenes rely on deep networks to estimate crowd density. They typically use the same filters over the whole image or over large image patches. Only then do they estimate local scale to compensate for perspective distortion. This is typically achieved by training an auxiliary classifier to select, for predefined image patches, the best kernel size among a limited set of choices. As such, these methods are not end-to-end trainable and restricted in the scope of context they can leverage. In this paper, we introduce an end-to-end trainable deep architecture that combines features obtained using multiple receptive field sizes and learns the importance of each such feature at each image location. In other words, our approach adaptively encodes the scale of the contextual information required to accurately predict crowd density. This yields an algorithm that outperforms state-of-the-art crowd counting methods, especially when perspective effects are strong.
Tasks	Crowd Counting
Published	2018-11-26
URL	http://arxiv.org/abs/1811.10452v2
PDF	http://arxiv.org/pdf/1811.10452v2.pdf
PWC	https://paperswithcode.com/paper/context-aware-crowd-counting
Repo	https://github.com/weizheliu/Context-Aware-Crowd-Counting
Framework	pytorch
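
To make the phrase "learns the importance of each such feature at each image location" concrete, here is a small sketch of a per-location weighted average over scales. It assumes the importance weights are already given as non-negative numbers (in the described architecture they would be learned), and it flattens image locations into a list; the name `fuse_scales` is hypothetical.

```python
def fuse_scales(features, weights):
    """Per-location weighted average over receptive-field sizes.

    features[j][p]: feature value at location p from scale j.
    weights[j][p]:  non-negative importance of scale j at location p.
    """
    num_locations = len(features[0])
    fused = []
    for p in range(num_locations):
        norm = sum(w[p] for w in weights)
        total = sum(f[p] * w[p] for f, w in zip(features, weights))
        fused.append(total / norm)
    return fused


# Two scales, two locations: location 0 trusts both scales equally,
# location 1 relies only on the second (larger) scale.
fused = fuse_scales(
    features=[[1.0, 2.0], [3.0, 4.0]],
    weights=[[1.0, 0.0], [1.0, 1.0]],
)
```

Because the weights vary by location, different parts of the image can draw on different context sizes, which is what lets such a model adapt to perspective distortion.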

Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient


Title	Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient
Authors	Rui Zhao, Volker Tresp
Abstract	Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic. However, commonly used policy-based dialogue agents often end up focusing on simple utterances and suboptimal policies. To mitigate this problem, we propose a class of novel temperature-based extensions for policy gradient methods, which are referred to as Tempered Policy Gradients (TPGs). On a recent AI-testbed, i.e., the GuessWhat?! game, we achieve significant improvements with two innovations. The first one is an extension of the state-of-the-art solutions with Seq2Seq and Memory Network structures that leads to an improvement of 7%. The second one is the application of our newly developed TPG methods, which improves the performance additionally by around 5% and, even more importantly, helps produce more convincing utterances.
Tasks	Policy Gradient Methods, Visual Dialog
Published	2018-07-02
URL	http://arxiv.org/abs/1807.00737v4
PDF	http://arxiv.org/pdf/1807.00737v4.pdf
PWC	https://paperswithcode.com/paper/learning-goal-oriented-visual-dialog-via
Repo	https://github.com/ruizhaogit/GuessWhat-TemperedPolicyGradient
Framework	pytorch
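
The abstract does not spell out the TPG variants, so the following only illustrates the general mechanism that temperature-based policy methods build on: scaling the logits by a temperature before the softmax. It is a minimal sketch with hypothetical function names, not the paper's update rule.

```python
import math
import random


def tempered_softmax(logits, temperature):
    """Softmax over logits / temperature.

    temperature > 1 flattens the distribution (more exploration);
    temperature < 1 sharpens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, temperature, rng=random):
    """Draw one action index from the tempered distribution."""
    probs = tempered_softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


sharp = tempered_softmax([2.0, 1.0, 0.0], temperature=1.0)
flat = tempered_softmax([2.0, 1.0, 0.0], temperature=5.0)
```

Sampling words from the flatter distribution makes rare utterances more likely to be tried during training, which is one plausible way to counter the drift toward simple utterances that the abstract mentions.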

Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection


Title	Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection
Authors	Da Zhang, Xiyang Dai, Yuan-Fang Wang
Abstract	Recognizing instances at different scales simultaneously is a fundamental challenge in visual detection problems. While spatial multi-scale modeling has been well studied in object detection, how to effectively apply a multi-scale architecture to temporal models for activity detection is still under-explored. In this paper, we identify three unique challenges that need to be specifically handled for temporal activity detection compared to its spatial counterpart. To address all these issues, we propose Dynamic Temporal Pyramid Network (DTPN), a new activity detection framework with a multi-scale pyramidal architecture featuring three novel designs: (1) We sample input video frames dynamically with varying frames per second (FPS) to construct a natural pyramidal input for video of an arbitrary length. (2) We design a two-branch multi-scale temporal feature hierarchy to deal with the inherent temporal scale variation of activity instances. (3) We further exploit the temporal context of activities by appropriately fusing multi-scale feature maps, and demonstrate that both local and global temporal contexts are important. By combining all these components into a uniform network, we end up with a single-shot activity detector involving single-pass inferencing and end-to-end training. Extensive experiments show that the proposed DTPN achieves state-of-the-art performance on the challenging ActivityNet dataset.
Tasks	Action Detection, Activity Detection, Object Detection
Published	2018-08-07
URL	http://arxiv.org/abs/1808.02536v2
PDF	http://arxiv.org/pdf/1808.02536v2.pdf
PWC	https://paperswithcode.com/paper/dynamic-temporal-pyramid-network-a-closer
Repo	https://github.com/dazhang-cv/Project
Framework	none