Paper Group AWR 351
Learning Time-Sensitive Strategies in Space Fortress
Title | Learning Time-Sensitive Strategies in Space Fortress |
Authors | Akshat Agarwal, Ryan Hope, Katia Sycara |
Abstract | Although there has been remarkable progress and impressive performance on reinforcement learning (RL) on Atari games, there are many problems with challenging characteristics that have not yet been explored in Deep Learning for RL. These include reward sparsity, abrupt context-dependent reversals of strategy and time-sensitive game play. In this paper, we present Space Fortress, a game that incorporates all these characteristics and experimentally show that the presence of any of these renders state of the art Deep RL algorithms incapable of learning. Then, we present our enhancements to an existing algorithm and show big performance increases through each enhancement through an ablation study. We discuss how each of these enhancements was able to help and also argue that appropriate transfer learning boosts performance. |
Tasks | Atari Games, Transfer Learning |
Published | 2018-05-17 |
URL | http://arxiv.org/abs/1805.06824v4 |
http://arxiv.org/pdf/1805.06824v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-time-sensitive-strategies-in-space |
Repo | https://github.com/agakshat/spacefortress |
Framework | pytorch |
Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives
Title | Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives |
Authors | Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam, Payel Das |
Abstract | In this paper we propose a novel method that provides contrastive explanations justifying the classification of an input by a black box classifier such as a deep neural network. Given an input we find what should be minimally and sufficiently present (viz. important object pixels in an image) to justify its classification and analogously what should be minimally and necessarily absent (viz. certain background pixels). We argue that such explanations are natural for humans and are used commonly in domains such as health care and criminology. What is minimally but critically absent is an important part of an explanation, which to the best of our knowledge, has not been explicitly identified by current explanation methods that explain predictions of neural networks. We validate our approach on three real datasets obtained from diverse domains; namely, a handwritten digits dataset MNIST, a large procurement fraud dataset and a brain activity strength dataset. In all three cases, we witness the power of our approach in generating precise explanations that are also easy for human experts to understand and evaluate. |
Tasks | |
Published | 2018-02-21 |
URL | http://arxiv.org/abs/1802.07623v2 |
http://arxiv.org/pdf/1802.07623v2.pdf | |
PWC | https://paperswithcode.com/paper/explanations-based-on-the-missing-towards |
Repo | https://github.com/SeldonIO/alibi |
Framework | tf |
Boosting Domain Adaptation by Discovering Latent Domains
Title | Boosting Domain Adaptation by Discovering Latent Domains |
Authors | Massimiliano Mancini, Lorenzo Porzi, Samuel Rota Bulò, Barbara Caputo, Elisa Ricci |
Abstract | Current Domain Adaptation (DA) methods based on deep architectures assume that the source samples arise from a single distribution. However, in practice, most datasets can be regarded as mixtures of multiple domains. In these cases exploiting single-source DA methods for learning target classifiers may lead to sub-optimal, if not poor, results. In addition, in many applications it is difficult to manually provide the domain labels for all source data points, i.e. latent domains should be automatically discovered. This paper introduces a novel Convolutional Neural Network (CNN) architecture which (i) automatically discovers latent domains in visual datasets and (ii) exploits this information to learn robust target classifiers. Our approach is based on the introduction of two main components, which can be embedded into any existing CNN architecture: (i) a side branch that automatically computes the assignment of a source sample to a latent domain and (ii) novel layers that exploit domain membership information to appropriately align the distribution of the CNN internal feature representations to a reference distribution. We test our approach on publicly-available datasets, showing that it outperforms state-of-the-art multi-source DA methods by a large margin. |
Tasks | Domain Adaptation |
Published | 2018-05-03 |
URL | http://arxiv.org/abs/1805.01386v1 |
http://arxiv.org/pdf/1805.01386v1.pdf | |
PWC | https://paperswithcode.com/paper/boosting-domain-adaptation-by-discovering |
Repo | https://github.com/mancinimassimiliano/pytorch_wbn |
Framework | pytorch |
InfoSSM: Interpretable Unsupervised Learning of Nonparametric State-Space Model for Multi-modal Dynamics
Title | InfoSSM: Interpretable Unsupervised Learning of Nonparametric State-Space Model for Multi-modal Dynamics |
Authors | Young-Jin Park, Han-Lim Choi |
Abstract | The goal of system identification is to learn about the underlying physical dynamics behind time-series data. To model probabilistic and nonparametric dynamics, Gaussian processes (GPs) have been widely used; GPs can estimate the uncertainty of prediction and avoid over-fitting. Traditional GPSSMs, however, are based on a Gaussian transition model and thus often have difficulty describing a more complex transition model, e.g. aircraft motions. To resolve this challenge, this paper proposes a framework using multiple GP transition models which is capable of describing multi-modal dynamics. Furthermore, we extend the model to an information-theoretic framework, the so-called InfoSSM, by introducing a mutual information regularizer that helps the model learn interpretable and distinguishable multiple dynamics models. Two illustrative numerical experiments, on a simple Dubins vehicle and a high-fidelity flight simulator, are presented to demonstrate the performance and interpretability of the proposed model. Finally, this paper introduces a framework using InfoSSM with Bayesian filtering for air traffic control tracking. |
Tasks | Time Series |
Published | 2018-09-19 |
URL | http://arxiv.org/abs/1809.07109v2 |
http://arxiv.org/pdf/1809.07109v2.pdf | |
PWC | https://paperswithcode.com/paper/infossm-interpretable-unsupervised-learning |
Repo | https://github.com/yjparkLiCS/InfoSSM |
Framework | tf |
End-to-End Latent Fingerprint Search
Title | End-to-End Latent Fingerprint Search |
Authors | Kai Cao, Dinh-Luan Nguyen, Cori Tymoszek, A. K. Jain |
Abstract | A system for identifying latent fingerprints. Created at Michigan State University by Anil K. Jain, Kai Cao, Dinh-Luan Nguyen, and Cori Tymoszek. |
Tasks | Quantization |
Published | 2018-12-26 |
URL | http://arxiv.org/abs/1812.10213v1 |
http://arxiv.org/pdf/1812.10213v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-latent-fingerprint-search |
Repo | https://github.com/prip-lab/MSU-LatentAFIS |
Framework | pytorch |
CRAVES: Controlling Robotic Arm with a Vision-based Economic System
Title | CRAVES: Controlling Robotic Arm with a Vision-based Economic System |
Authors | Yiming Zuo, Weichao Qiu, Lingxi Xie, Fangwei Zhong, Yizhou Wang, Alan L. Yuille |
Abstract | Training a robotic arm to accomplish real-world tasks has been attracting increasing attention in both academia and industry. This work discusses the role of computer vision algorithms in this field. We focus on low-cost arms on which no sensors are equipped and thus all decisions are made upon visual recognition, e.g., real-time 3D pose estimation. This requires annotating a lot of training data, which is not only time-consuming but also laborious. In this paper, we present an alternative solution, which uses a 3D model to create a large number of synthetic data, trains a vision model in this virtual domain, and applies it to real-world images after domain adaptation. To this end, we design a semi-supervised approach, which fully leverages the geometric constraints among keypoints. We apply an iterative algorithm for optimization. Without any annotations on real images, our algorithm generalizes well and produces satisfying results on 3D pose estimation, which is evaluated on two real-world datasets. We also construct a vision-based control system for task accomplishment, for which we train a reinforcement learning agent in a virtual environment and apply it to the real-world. Moreover, our approach, with merely a 3D model being required, has the potential to generalize to other types of multi-rigid-body dynamic systems. |
Tasks | 3D Pose Estimation, Domain Adaptation, Pose Estimation |
Published | 2018-12-03 |
URL | https://arxiv.org/abs/1812.00725v2 |
https://arxiv.org/pdf/1812.00725v2.pdf | |
PWC | https://paperswithcode.com/paper/towards-accurate-task-accomplishment-with-low |
Repo | https://github.com/zuoym15/craves.ai |
Framework | pytorch |
An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization
Title | An Evaluation of Neural Machine Translation Models on Historical Spelling Normalization |
Authors | Gongbo Tang, Fabienne Cap, Eva Pettersson, Joakim Nivre |
Abstract | In this paper, we apply different NMT models to the problem of historical spelling normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The NMT models are at different levels, have different attention mechanisms, and different neural network architectures. Our results show that NMT models are much better than SMT models in terms of character error rate. The vanilla RNNs are competitive to GRUs/LSTMs in historical spelling normalization. Transformer models perform better only when provided with more training data. We also find that subword-level models with a small subword vocabulary are better than character-level models for low-resource languages. In addition, we propose a hybrid method which further improves the performance of historical spelling normalization. |
Tasks | Machine Translation |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.05210v2 |
http://arxiv.org/pdf/1806.05210v2.pdf | |
PWC | https://paperswithcode.com/paper/an-evaluation-of-neural-machine-translation |
Repo | https://github.com/tanggongbo/normalization-NMT |
Framework | none |
What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models
Title | What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models |
Authors | Fahim Dalvi, Nadir Durrani, Hassan Sajjad, Yonatan Belinkov, Anthony Bau, James Glass |
Abstract | Despite the remarkable evolution of deep neural networks in natural language processing (NLP), their interpretability remains a challenge. Previous work largely focused on what these models learn at the representation level. We break this analysis down further and study individual dimensions (neurons) in the vector representation learned by end-to-end neural models in NLP tasks. We propose two methods: Linguistic Correlation Analysis, based on a supervised method to extract the most relevant neurons with respect to an extrinsic task, and Cross-model Correlation Analysis, an unsupervised method to extract salient neurons w.r.t. the model itself. We evaluate the effectiveness of our techniques by ablating the identified neurons and reevaluating the network’s performance for two tasks: neural machine translation (NMT) and neural language modeling (NLM). We further present a comprehensive analysis of neurons with the aim to address the following questions: i) how localized or distributed are different linguistic properties in the models? ii) are certain neurons exclusive to some properties and not others? iii) is the information more or less distributed in NMT vs. NLM? and iv) how important are the neurons identified through the linguistic correlation method to the overall task? Our code is publicly available as part of the NeuroX toolkit (Dalvi et al. 2019). |
Tasks | Language Modelling, Machine Translation |
Published | 2018-12-21 |
URL | http://arxiv.org/abs/1812.09355v1 |
http://arxiv.org/pdf/1812.09355v1.pdf | |
PWC | https://paperswithcode.com/paper/what-is-one-grain-of-sand-in-the-desert |
Repo | https://github.com/fdalvi/NeuroX |
Framework | pytorch |
OpenNMT: Neural Machine Translation Toolkit
Title | OpenNMT: Neural Machine Translation Toolkit |
Authors | Guillaume Klein, Yoon Kim, Yuntian Deng, Vincent Nguyen, Jean Senellart, Alexander M. Rush |
Abstract | OpenNMT is an open-source toolkit for neural machine translation (NMT). The system prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as well as detailed pedagogical documentation about the underlying techniques. OpenNMT has been used in several production MT systems, modified for numerous research papers, and is implemented across several deep learning frameworks. |
Tasks | Machine Translation |
Published | 2018-05-28 |
URL | http://arxiv.org/abs/1805.11462v1 |
http://arxiv.org/pdf/1805.11462v1.pdf | |
PWC | https://paperswithcode.com/paper/opennmt-neural-machine-translation-toolkit |
Repo | https://github.com/Waino/OpenNMT-py |
Framework | pytorch |
Large batch size training of neural networks with adversarial training and second-order information
Title | Large batch size training of neural networks with adversarial training and second-order information |
Authors | Zhewei Yao, Amir Gholami, Daiyaan Arfeen, Richard Liaw, Joseph Gonzalez, Kurt Keutzer, Michael Mahoney |
Abstract | The most straightforward method to accelerate Stochastic Gradient Descent (SGD) computation is to distribute the randomly selected batch of inputs over multiple processors. To keep the distributed processors fully utilized requires commensurately growing the batch size. However, large batch training often leads to poorer generalization. A recently proposed solution for this problem is to use adaptive batch sizes in SGD. In this case, one starts with a small number of processes and scales the processes as training progresses. Two major challenges with this approach are (i) that dynamically resizing the cluster can add non-trivial overhead, in part since it is currently not supported, and (ii) that the overall speed up is limited by the initial phase with smaller batches. In this work, we address both challenges by developing a new adaptive batch size framework, with autoscaling based on the Ray framework. This allows very efficient elastic scaling with negligible resizing overhead (0.32% of time for ResNet18 ImageNet training). Furthermore, we propose a new adaptive batch size training scheme using second order methods and adversarial training. These enable increasing batch sizes earlier during training, which leads to better training time. We extensively evaluate our method on Cifar-10/100, SVHN, TinyImageNet, and ImageNet datasets, using multiple neural networks, including ResNets and smaller networks such as SqueezeNext. Our method exceeds the performance of existing solutions in terms of both accuracy and the number of SGD iterations (up to 1% and $5\times$, respectively). Importantly, this is achieved without any additional hyper-parameter tuning to tailor our method in any of these experiments. |
Tasks | |
Published | 2018-10-02 |
URL | https://arxiv.org/abs/1810.01021v3 |
https://arxiv.org/pdf/1810.01021v3.pdf | |
PWC | https://paperswithcode.com/paper/large-batch-size-training-of-neural-networks |
Repo | https://github.com/amirgholami/hessianflow |
Framework | pytorch |
Learning Long Term Dependencies via Fourier Recurrent Units
Title | Learning Long Term Dependencies via Fourier Recurrent Units |
Authors | Jiong Zhang, Yibo Lin, Zhao Song, Inderjit S. Dhillon |
Abstract | It is a known fact that training recurrent neural networks for tasks that have long term dependencies is challenging. One of the main reasons is the vanishing or exploding gradient problem, which prevents gradient information from propagating to early layers. In this paper we propose a simple recurrent architecture, the Fourier Recurrent Unit (FRU), that stabilizes the gradients that arise in its training while giving us stronger expressive power. Specifically, FRU summarizes the hidden states $h^{(t)}$ along the temporal dimension with Fourier basis functions. This allows gradients to easily reach any layer due to FRU’s residual learning structure and the global support of trigonometric functions. We show that FRU has gradient lower and upper bounds independent of temporal dimension. We also show the strong expressivity of sparse Fourier basis, from which FRU obtains its strong expressive power. Our experimental study also demonstrates that with fewer parameters the proposed architecture outperforms other recurrent architectures on many tasks. |
Tasks | |
Published | 2018-03-17 |
URL | http://arxiv.org/abs/1803.06585v1 |
http://arxiv.org/pdf/1803.06585v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-long-term-dependencies-via-fourier |
Repo | https://github.com/limbo018/FRU |
Framework | tf |
S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks
Title | S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Networks |
Authors | Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang |
Abstract | In this paper, we present a novel Single Shot multi-Span Detector for temporal activity detection in long, untrimmed videos using a simple end-to-end fully three-dimensional convolutional (Conv3D) network. Our architecture, named S3D, encodes the entire video stream and discretizes the output space of temporal activity spans into a set of default spans over different temporal locations and scales. At prediction time, S3D predicts scores for the presence of activity categories in each default span and produces temporal adjustments relative to the span location to predict the precise activity duration. Unlike many state-of-the-art systems that require a separate proposal and classification stage, our S3D is intrinsically simple and dedicatedly designed for single-shot, end-to-end temporal activity detection. When evaluating on THUMOS’14 detection benchmark, S3D achieves state-of-the-art performance and is very efficient and can operate at 1271 FPS. |
Tasks | Action Detection, Activity Detection |
Published | 2018-07-21 |
URL | http://arxiv.org/abs/1807.08069v2 |
http://arxiv.org/pdf/1807.08069v2.pdf | |
PWC | https://paperswithcode.com/paper/s3d-single-shot-multi-span-detector-via-fully |
Repo | https://github.com/dazhang-cv/Project |
Framework | none |
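The S3D abstract above describes default spans placed over different temporal locations and scales, plus predicted adjustments relative to each span. The following is a minimal sketch of that idea only, not the authors' implementation: the function names `default_spans` and `decode`, the normalized `[0, 1]` time axis, and the offset parametrization (borrowed from SSD-style box decoding) are all assumptions made for illustration.

```python
import math

def default_spans(num_cells, scales):
    """One (center, width) span per temporal cell and scale, in normalized [0, 1] time."""
    spans = []
    for i in range(num_cells):
        center = (i + 0.5) / num_cells  # middle of the i-th temporal cell
        for s in scales:
            spans.append((center, s))
    return spans

def decode(span, d_center, d_width):
    """Apply predicted adjustments to a default span and return (start, end).

    Assumed parametrization (hypothetical): the center shifts by
    d_center * width and the width is rescaled by exp(d_width).
    """
    center, width = span
    new_center = center + d_center * width
    new_width = width * math.exp(d_width)
    return (new_center - new_width / 2, new_center + new_width / 2)
```

With zero adjustments, `decode` returns the default span itself, so the network only has to learn corrections around a fixed grid of candidate spans.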
Context-Aware Crowd Counting
Title | Context-Aware Crowd Counting |
Authors | Weizhe Liu, Mathieu Salzmann, Pascal Fua |
Abstract | State-of-the-art methods for counting people in crowded scenes rely on deep networks to estimate crowd density. They typically use the same filters over the whole image or over large image patches. Only then do they estimate local scale to compensate for perspective distortion. This is typically achieved by training an auxiliary classifier to select, for predefined image patches, the best kernel size among a limited set of choices. As such, these methods are not end-to-end trainable and restricted in the scope of context they can leverage. In this paper, we introduce an end-to-end trainable deep architecture that combines features obtained using multiple receptive field sizes and learns the importance of each such feature at each image location. In other words, our approach adaptively encodes the scale of the contextual information required to accurately predict crowd density. This yields an algorithm that outperforms state-of-the-art crowd counting methods, especially when perspective effects are strong. |
Tasks | Crowd Counting |
Published | 2018-11-26 |
URL | http://arxiv.org/abs/1811.10452v2 |
http://arxiv.org/pdf/1811.10452v2.pdf | |
PWC | https://paperswithcode.com/paper/context-aware-crowd-counting |
Repo | https://github.com/weizheliu/Context-Aware-Crowd-Counting |
Framework | pytorch |
Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient
Title | Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient |
Authors | Rui Zhao, Volker Tresp |
Abstract | Learning goal-oriented dialogues by means of deep reinforcement learning has recently become a popular research topic. However, commonly used policy-based dialogue agents often end up focusing on simple utterances and suboptimal policies. To mitigate this problem, we propose a class of novel temperature-based extensions for policy gradient methods, which are referred to as Tempered Policy Gradients (TPGs). On a recent AI-testbed, i.e., the GuessWhat?! game, we achieve significant improvements with two innovations. The first one is an extension of the state-of-the-art solutions with Seq2Seq and Memory Network structures that leads to an improvement of 7%. The second one is the application of our newly developed TPG methods, which improves the performance additionally by around 5% and, even more importantly, helps produce more convincing utterances. |
Tasks | Policy Gradient Methods, Visual Dialog |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00737v4 |
http://arxiv.org/pdf/1807.00737v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-goal-oriented-visual-dialog-via |
Repo | https://github.com/ruizhaogit/GuessWhat-TemperedPolicyGradient |
Framework | pytorch |
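The abstract above refers to temperature-based extensions of policy gradient methods but gives no formulas. As a generic illustration of what a temperature does to a policy's action distribution, here is a minimal sketch of a temperature-scaled softmax. It is not the TPG method from the paper; the function name `tempered_softmax` and its signature are assumptions made for this example.

```python
import math

def tempered_softmax(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

A temperature above 1 flattens the distribution, so sampling explores less likely actions more often, while a temperature below 1 sharpens it toward the highest-scoring action.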
Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection
Title | Dynamic Temporal Pyramid Network: A Closer Look at Multi-Scale Modeling for Activity Detection |
Authors | Da Zhang, Xiyang Dai, Yuan-Fang Wang |
Abstract | Recognizing instances at different scales simultaneously is a fundamental challenge in visual detection problems. While spatial multi-scale modeling has been well studied in object detection, how to effectively apply a multi-scale architecture to temporal models for activity detection is still under-explored. In this paper, we identify three unique challenges that need to be specifically handled for temporal activity detection compared to its spatial counterpart. To address all these issues, we propose Dynamic Temporal Pyramid Network (DTPN), a new activity detection framework with a multi-scale pyramidal architecture featuring three novel designs: (1) We sample input video frames dynamically with varying frames per second (FPS) to construct a natural pyramidal input for video of an arbitrary length. (2) We design a two-branch multi-scale temporal feature hierarchy to deal with the inherent temporal scale variation of activity instances. (3) We further exploit the temporal context of activities by appropriately fusing multi-scale feature maps, and demonstrate that both local and global temporal contexts are important. By combining all these components into a uniform network, we end up with a single-shot activity detector involving single-pass inferencing and end-to-end training. Extensive experiments show that the proposed DTPN achieves state-of-the-art performance on the challenging ActivityNet dataset. |
Tasks | Action Detection, Activity Detection, Object Detection |
Published | 2018-08-07 |
URL | http://arxiv.org/abs/1808.02536v2 |
http://arxiv.org/pdf/1808.02536v2.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-temporal-pyramid-network-a-closer |
Repo | https://github.com/dazhang-cv/Project |
Framework | none |