July 29, 2019

Paper Group AWR 201

Deep Layer Aggregation. ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. AOGNets: Compositional Grammatical Architectures for Deep Learning. Revisiting the Arcade Learning Environment: Evaluation Protocols …

Deep Layer Aggregation


Title	Deep Layer Aggregation
Authors	Fisher Yu, Dequan Wang, Evan Shelhamer, Trevor Darrell
Abstract	Visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves inference of what and where. Architectural efforts are exploring many dimensions for network backbones, designing deeper or wider architectures, but how to best aggregate layers and blocks across a network deserves further attention. Although skip connections have been incorporated to combine layers, these connections have been “shallow” themselves, and only fuse by simple, one-step operations. We augment standard architectures with deeper aggregation to better fuse information across layers. Our deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to make networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared to existing branching and merging schemes. The code is at https://github.com/ucbdrive/dla.
Tasks
Published	2017-07-20
URL	http://arxiv.org/abs/1707.06484v3
PDF	http://arxiv.org/pdf/1707.06484v3.pdf
PWC	https://paperswithcode.com/paper/deep-layer-aggregation
Repo	https://github.com/ucbdrive/dla
Framework	pytorch

ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games


Title	ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
Authors	Yuandong Tian, Qucheng Gong, Wenling Shang, Yuxin Wu, C. Lawrence Zitnick
Abstract	In this paper, we propose ELF, an Extensive, Lightweight and Flexible platform for fundamental reinforcement learning research. Using ELF, we implement a highly customizable real-time strategy (RTS) engine with three game environments (Mini-RTS, Capture the Flag and Tower Defense). Mini-RTS, as a miniature version of StarCraft, captures key game dynamics and runs at 40K frame-per-second (FPS) per core on a Macbook Pro notebook. When coupled with modern reinforcement learning methods, the system can train a full-game bot against built-in AIs end-to-end in one day with 6 CPUs and 1 GPU. In addition, our platform is flexible in terms of environment-agent communication topologies, choices of RL methods, changes in game parameters, and can host existing C/C++-based game environments like Arcade Learning Environment. Using ELF, we thoroughly explore training parameters and show that a network with Leaky ReLU and Batch Normalization coupled with long-horizon training and progressive curriculum beats the rule-based built-in AI more than 70% of the time in the full game of Mini-RTS. Strong performance is also achieved on the other two games. In game replays, we show our agents learn interesting strategies. ELF, along with its RL platform, is open-sourced at https://github.com/facebookresearch/ELF.
Tasks	Atari Games, Real-Time Strategy Games, Starcraft
Published	2017-07-04
URL	http://arxiv.org/abs/1707.01067v2
PDF	http://arxiv.org/pdf/1707.01067v2.pdf
PWC	https://paperswithcode.com/paper/elf-an-extensive-lightweight-and-flexible
Repo	https://github.com/GaoFangshu/ELF-example
Framework	pytorch

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation


Title	Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
Authors	Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba
Abstract	In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronecker-Factored Trust Region (ACKTR). To the best of our knowledge, this is the first scalable trust region natural gradient method for actor-critic methods. It is also a method that learns non-trivial tasks in continuous control as well as discrete control policies directly from raw pixel inputs. We tested our approach across discrete domains in Atari games as well as continuous domains in the MuJoCo environment. With the proposed methods, we are able to achieve higher rewards and a 2- to 3-fold improvement in sample efficiency on average, compared to previous state-of-the-art on-policy actor-critic methods. Code is available at https://github.com/openai/baselines
Tasks	Atari Games, Continuous Control
Published	2017-08-17
URL	http://arxiv.org/abs/1708.05144v2
PDF	http://arxiv.org/pdf/1708.05144v2.pdf
PWC	https://paperswithcode.com/paper/scalable-trust-region-method-for-deep
Repo	https://github.com/ghostFaceKillah/expert
Framework	tf
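The Kronecker-factored step mentioned in the abstract can be illustrated with a small sketch. It assumes the layer's curvature is approximated as a Kronecker product of two small symmetric factors, so that applying the inverse to a weight gradient only needs the two small inverses. The matrices A, G and GRAD_W below are made-up toy values, and the helper names are hypothetical. Damping, running averages and the trust-region scaling that a full ACKTR implementation would need are left out.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inv2(m):
    """Invert a 2x2 matrix (enough for this toy example)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def kron(a, g):
    """Kronecker product of a and g as a dense matrix."""
    n, m = len(a), len(g)
    return [[a[i][j] * g[k][l] for j in range(n) for l in range(m)]
            for i in range(n) for k in range(m)]


def vec(x):
    """Stack the columns of x into one flat vector."""
    return [x[i][j] for j in range(len(x[0])) for i in range(len(x))]


def kfac_direction(grad_w, a, g):
    """Apply the inverse of kron(a, g) to grad_w using only the small inverses."""
    return matmul(matmul(inv2(g), grad_w), inv2(a))


# Toy factors (assumed): A stands in for the input second moment,
# G for the second moment of the gradient with respect to the layer output.
A = [[2.0, 1.0], [1.0, 2.0]]
G = [[3.0, 1.0], [1.0, 2.0]]
GRAD_W = [[1.0, -2.0], [0.5, 4.0]]

step = kfac_direction(GRAD_W, A, G)

# Multiplying the full Kronecker approximation by the step recovers the gradient,
# which confirms that the cheap factored solve matches the dense one.
full = kron(A, G)
recovered = [sum(row[c] * v for c, v in enumerate(vec(step))) for row in full]
```

The point of the factorisation is cost: the dense matrix here is 4x4, but only two 2x2 inverses are ever computed.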

AOGNets: Compositional Grammatical Architectures for Deep Learning


Title	AOGNets: Compositional Grammatical Architectures for Deep Learning
Authors	Xilai Li, Xi Song, Tianfu Wu
Abstract	Neural architectures are the foundation for improving performance of deep neural networks (DNNs). This paper presents deep compositional grammatical architectures which harness the best of two worlds: grammar models and DNNs. The proposed architectures integrate compositionality and reconfigurability of the former and the capability of learning rich features of the latter in a principled way. We utilize AND-OR Grammar (AOG) as network generator in this paper and call the resulting networks AOGNets. An AOGNet consists of a number of stages each of which is composed of a number of AOG building blocks. An AOG building block splits its input feature map into N groups along feature channels and then treats it as a sentence of N words. It then jointly realizes a phrase structure grammar and a dependency grammar in bottom-up parsing the “sentence” for better feature exploration and reuse. It provides a unified framework for the best practices developed in state-of-the-art DNNs. In experiments, AOGNet is tested in the CIFAR-10, CIFAR-100 and ImageNet-1K classification benchmark and the MS-COCO object detection and segmentation benchmark. In CIFAR-10, CIFAR-100 and ImageNet-1K, AOGNet obtains better performance than ResNet and most of its variants, ResNeXt and its attention based variants such as SENet, DenseNet and DualPathNet. AOGNet also obtains the best model interpretability score using network dissection. AOGNet further shows better potential in adversarial defense. In MS-COCO, AOGNet obtains better performance than the ResNet and ResNeXt backbones in Mask R-CNN.
Tasks	Adversarial Defense, Image Classification, Object Detection, Representation Learning
Published	2017-11-15
URL	http://arxiv.org/abs/1711.05847v3
PDF	http://arxiv.org/pdf/1711.05847v3.pdf
PWC	https://paperswithcode.com/paper/learning-deep-compositional-grammatical
Repo	https://github.com/iVMCL/AOGNets
Framework	pytorch

Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents


Title	Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
Authors	Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling
Abstract	The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of different problem settings and it has been receiving increasing attention from the scientific community, leading to some high-profile success stories such as the much publicized Deep Q-Networks (DQN). In this article we take a big picture look at how the ALE is being used by the research community. We show how diverse the evaluation methodologies in the ALE have become with time, and highlight some key concerns when evaluating agents in the ALE. We use this discussion to present some methodological best practices and provide new benchmark results using these best practices. To further the progress in the field, we introduce a new version of the ALE that supports multiple game modes and provides a form of stochasticity we call sticky actions. We conclude this big picture look by revisiting challenges posed when the ALE was introduced, summarizing the state-of-the-art in various problems and highlighting problems that remain open.
Tasks	Atari Games
Published	2017-09-18
URL	http://arxiv.org/abs/1709.06009v2
PDF	http://arxiv.org/pdf/1709.06009v2.pdf
PWC	https://paperswithcode.com/paper/revisiting-the-arcade-learning-environment
Repo	https://github.com/bclyang/updated-atari-env
Framework	none
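The sticky-actions idea named in the abstract can be sketched as a thin wrapper: at each step, with some probability, the environment executes the previously executed action instead of the one the agent just chose. The class names, the `reset`/`step` interface, the no-op action index 0 and the default probability of 0.25 are all assumptions made for illustration. This is not the ALE API.

```python
import random


class StickyActionEnv:
    """Wraps an environment so the previous action repeats with some probability."""

    def __init__(self, env, repeat_prob=0.25, seed=None):
        self.env = env
        self.repeat_prob = repeat_prob  # assumed default, see lead-in
        self.rng = random.Random(seed)
        self.prev_action = 0  # assumed to be a no-op action

    def reset(self):
        self.prev_action = 0
        return self.env.reset()

    def step(self, action):
        # With probability repeat_prob, ignore the new action and repeat the old one.
        if self.rng.random() < self.repeat_prob:
            action = self.prev_action
        self.prev_action = action
        return self.env.step(action)


class EchoEnv:
    """Toy stand-in environment that simply reports which action was executed."""

    def reset(self):
        return None

    def step(self, action):
        return action
```

Because the repeat is decided inside the environment, an agent cannot rely on memorising a fixed action sequence, which is the kind of stochasticity the abstract refers to.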

Batched Large-scale Bayesian Optimization in High-dimensional Spaces


Title	Batched Large-scale Bayesian Optimization in High-dimensional Spaces
Authors	Zi Wang, Clement Gehring, Pushmeet Kohli, Stefanie Jegelka
Abstract	Bayesian optimization (BO) has become an effective approach for black-box function optimization problems when function evaluations are expensive and the optimum can be achieved within a relatively small number of queries. However, many cases, such as the ones with high-dimensional inputs, may require a much larger number of observations for optimization. Despite an abundance of observations thanks to parallel experiments, current BO techniques have been limited to merely a few thousand observations. In this paper, we propose ensemble Bayesian optimization (EBO) to address three current challenges in BO simultaneously: (1) large-scale observations; (2) high dimensional input spaces; and (3) selections of batch queries that balance quality and diversity. The key idea of EBO is to operate on an ensemble of additive Gaussian process models, each of which possesses a randomized strategy to divide and conquer. We show unprecedented, previously impossible results of scaling up BO to tens of thousands of observations within minutes of computation.
Tasks
Published	2017-06-05
URL	http://arxiv.org/abs/1706.01445v4
PDF	http://arxiv.org/pdf/1706.01445v4.pdf
PWC	https://paperswithcode.com/paper/batched-large-scale-bayesian-optimization-in
Repo	https://github.com/zi-w/Ensemble-Bayesian-Optimization
Framework	none

An overview of embedding models of entities and relationships for knowledge base completion


Title	An overview of embedding models of entities and relationships for knowledge base completion
Authors	Dat Quoc Nguyen
Abstract	Knowledge bases (KBs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge bases are typically incomplete, it is useful to be able to perform knowledge base completion or link prediction, i.e., predict whether a relationship not in the knowledge base is likely to be true. This paper serves as a comprehensive overview of embedding models of entities and relationships for knowledge base completion, summarizing up-to-date experimental results on standard benchmark datasets.
Tasks	Knowledge Base Completion, Link Prediction
Published	2017-03-23
URL	https://arxiv.org/abs/1703.08098v6
PDF	https://arxiv.org/pdf/1703.08098v6.pdf
PWC	https://paperswithcode.com/paper/an-overview-of-embedding-models-of-entities
Repo	https://github.com/Sujit-O/pykg2vec
Framework	tf

A Fast and Accurate Vietnamese Word Segmenter


Title	A Fast and Accurate Vietnamese Word Segmenter
Authors	Dat Quoc Nguyen, Dai Quoc Nguyen, Thanh Vu, Mark Dras, Mark Johnson
Abstract	We propose a novel approach to Vietnamese word segmentation. Our approach is based on the Single Classification Ripple Down Rules methodology (Compton and Jansen, 1990), where rules are stored in an exception structure and new rules are only added to correct segmentation errors given by existing rules. Experimental results on the benchmark Vietnamese treebank show that our approach outperforms previous state-of-the-art approaches JVnSegmenter, vnTokenizer, DongDu and UETsegmenter in terms of both accuracy and performance speed. Our code is open-source and available at: https://github.com/datquocnguyen/RDRsegmenter.
Tasks
Published	2017-09-19
URL	http://arxiv.org/abs/1709.06307v2
PDF	http://arxiv.org/pdf/1709.06307v2.pdf
PWC	https://paperswithcode.com/paper/a-fast-and-accurate-vietnamese-word-segmenter
Repo	https://github.com/datquocnguyen/RDRsegmenter
Framework	none

Consistent feature attribution for tree ensembles


Title	Consistent feature attribution for tree ensembles
Authors	Scott M. Lundberg, Su-In Lee
Abstract	Note that a newer expanded version of this paper is now available at: arXiv:1802.03888. It is critical in many applications to understand what features are important for a model, and why individual predictions were made. For tree ensemble methods these questions are usually answered by attributing importance values to input features, either globally or for a single prediction. Here we show that current feature attribution methods are inconsistent, which means changing the model to rely more on a given feature can actually decrease the importance assigned to that feature. To address this problem we develop fast exact solutions for SHAP (SHapley Additive exPlanation) values, which were recently shown to be the unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate. We integrate these improvements into the latest version of XGBoost, demonstrate the inconsistencies of current methods, and show how using SHAP values results in significantly improved supervised clustering performance. Feature importance values are a key part of understanding widely used models such as gradient boosting trees and random forests, so improvements to them have broad practical implications.
Tasks	Feature Importance
Published	2017-06-19
URL	http://arxiv.org/abs/1706.06060v6
PDF	http://arxiv.org/pdf/1706.06060v6.pdf
PWC	https://paperswithcode.com/paper/consistent-feature-attribution-for-tree
Repo	https://github.com/bgreenwell/fastshap
Framework	none
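To make the "locally accurate" property of Shapley-style attributions concrete, here is a brute-force sketch that averages each feature's marginal contribution over every feature ordering. It is not the paper's fast tree algorithm, which avoids this exponential enumeration. The function names and the toy value function are hypothetical.

```python
from itertools import permutations


def shapley_values(value_fn, n_features):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        present = set()
        prev = value_fn(frozenset(present))
        for i in order:
            present.add(i)
            cur = value_fn(frozenset(present))
            totals[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    return [t / len(orderings) for t in totals]


def toy_model(present):
    """Hypothetical value function: model output when only `present` features are known."""
    value = 0.0
    if 0 in present:
        value += 2.0
    if 0 in present and 1 in present:
        value += 4.0  # interaction effect shared between features 0 and 1
    return value


attributions = shapley_values(toy_model, 3)
```

In this toy case the attributions sum to the difference between the output with all features known and the output with none known, and the unused third feature receives zero.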

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting


Title	meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
Authors	Xu Sun, Xuancheng Ren, Shuming Ma, Houfeng Wang
Abstract	We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1-4% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at https://github.com/lancopku/meProp
Tasks
Published	2017-06-19
URL	http://arxiv.org/abs/1706.06197v5
PDF	http://arxiv.org/pdf/1706.06197v5.pdf
PWC	https://paperswithcode.com/paper/meprop-sparsified-back-propagation-for
Repo	https://github.com/jklj077/meProp
Framework	pytorch
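The top-k sparsification described in the abstract can be sketched for a single linear layer y = W x. Only the k largest-magnitude entries of the output gradient are kept, so only k rows of the weight gradient are non-zero. The function names are hypothetical, and the pure-Python lists stand in for tensors. The released meProp code is not reproduced here.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero out the rest."""
    if k <= 0:
        return [0.0] * len(grad)
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def sparse_linear_backward(x, grad_out, k):
    """Weight gradient of y = W x computed from the sparsified output gradient.

    Returns the sparsified output gradient and the weight gradient, whose
    rows are non-zero only for the k kept output units.
    """
    sparse = top_k_sparsify(grad_out, k)
    grad_w = [[g * xj for xj in x] for g in sparse]
    return sparse, grad_w


sparse, grad_w = sparse_linear_backward([1.0, 2.0], [0.1, -2.0, 0.3, 1.5], k=2)
```

A real implementation would skip the zeroed rows entirely rather than multiplying by zero, which is where the linear reduction in cost comes from.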

Learning Multi-Level Hierarchies with Hindsight


Title	Learning Multi-Level Hierarchies with Hindsight
Authors	Andrew Levy, George Konidaris, Robert Platt, Kate Saenko
Abstract	Hierarchical agents have the potential to solve sequential decision making tasks with greater sample efficiency than their non-hierarchical counterparts because hierarchical agents can break down tasks into sets of subtasks that only require short sequences of decisions. In order to realize this potential of faster learning, hierarchical agents need to be able to learn their multiple levels of policies in parallel so these simpler subproblems can be solved simultaneously. Yet, learning multiple levels of policies in parallel is hard because it is inherently unstable: changes in a policy at one level of the hierarchy may cause changes in the transition and reward functions at higher levels in the hierarchy, making it difficult to jointly learn multiple levels of policies. In this paper, we introduce a new Hierarchical Reinforcement Learning (HRL) framework, Hierarchical Actor-Critic (HAC), that can overcome the instability issues that arise when agents try to jointly learn multiple levels of policies. The main idea behind HAC is to train each level of the hierarchy independently of the lower levels by training each level as if the lower level policies are already optimal. We demonstrate experimentally in both grid world and simulated robotics domains that our approach can significantly accelerate learning relative to other non-hierarchical and hierarchical methods. Indeed, our framework is the first to successfully learn 3-level hierarchies in parallel in tasks with continuous state and action spaces.
Tasks	Decision Making, Hierarchical Reinforcement Learning
Published	2017-12-04
URL	https://arxiv.org/abs/1712.00948v5
PDF	https://arxiv.org/pdf/1712.00948v5.pdf
PWC	https://paperswithcode.com/paper/learning-multi-level-hierarchies-with
Repo	https://github.com/andrew-j-levy/Hierarchical-Actor-Critc-HAC-
Framework	tf

Distributional Reinforcement Learning with Quantile Regression


Title	Distributional Reinforcement Learning with Quantile Regression
Authors	Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos
Abstract	In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean. That is, we examine methods of learning the value distribution instead of the value function. We give results that close a number of gaps between the theoretical and algorithmic results given by Bellemare, Dabney, and Munos (2017). First, we extend existing results to the approximate distribution setting. Second, we present a novel distributional reinforcement learning algorithm consistent with our theoretical formulation. Finally, we evaluate this new algorithm on the Atari 2600 games, observing that it significantly outperforms many of the recent improvements on DQN, including the related distributional algorithm C51.
Tasks	Atari Games, Distributional Reinforcement Learning
Published	2017-10-27
URL	http://arxiv.org/abs/1710.10044v1
PDF	http://arxiv.org/pdf/1710.10044v1.pdf
PWC	https://paperswithcode.com/paper/distributional-reinforcement-learning-with-1
Repo	https://github.com/ars-ashuha/quantile-regression-dqn-pytorch
Framework	pytorch
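The quantile regression in the title rests on an asymmetric loss whose minimiser is a chosen quantile of the return distribution rather than its mean. The sketch below uses the plain pinball loss on a made-up list of sampled returns. The function names are hypothetical, and the smoothing and network training used in the paper are left out.

```python
def pinball_loss(theta, samples, tau):
    """Average quantile-regression (pinball) loss of estimate theta at level tau."""
    total = 0.0
    for x in samples:
        u = x - theta
        # Underestimates are weighted by tau, overestimates by (1 - tau).
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(samples)


def best_quantile_estimate(samples, tau):
    """Pick the candidate among the samples with the lowest pinball loss."""
    return min(samples, key=lambda theta: pinball_loss(theta, samples, tau))


# Toy sampled returns (assumed values for illustration only).
returns = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
quantiles = [best_quantile_estimate(returns, tau) for tau in (0.25, 0.5, 0.75)]
```

Estimating several such levels at once gives a coarse picture of the whole return distribution, which is the modelling shift the abstract describes.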

LabelBank: Revisiting Global Perspectives for Semantic Segmentation


Title	LabelBank: Revisiting Global Perspectives for Semantic Segmentation
Authors	Hexiang Hu, Zhiwei Deng, Guang-Tong Zhou, Fei Sha, Greg Mori
Abstract	Semantic segmentation requires a detailed labeling of image pixels by object category. Information derived from local image patches is necessary to describe the detailed shape of individual objects. However, this information is ambiguous and can result in noisy labels. Global inference of image content can instead capture the general semantic concepts present. We advocate that holistic inference of image concepts provides valuable information for detailed pixel labeling. We propose a generic framework to leverage holistic information in the form of a LabelBank for pixel-level segmentation. We show the ability of our framework to improve semantic segmentation performance in a variety of settings. We learn models for extracting a holistic LabelBank from visual cues, attributes, and/or textual descriptions. We demonstrate improvements in semantic segmentation accuracy on standard datasets across a range of state-of-the-art segmentation architectures and holistic inference approaches.
Tasks	Semantic Segmentation
Published	2017-03-29
URL	http://arxiv.org/abs/1703.09891v1
PDF	http://arxiv.org/pdf/1703.09891v1.pdf
PWC	https://paperswithcode.com/paper/labelbank-revisiting-global-perspectives-for
Repo	https://github.com/nightrome/cocostuff10k
Framework	none

XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification


Title	XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification
Authors	Cătălina Cangea, Petar Veličković, Pietro Liò
Abstract	In recent years, there have been numerous developments towards solving multimodal tasks, aiming to learn a stronger representation than through a single modality. Certain aspects of the data can be particularly useful in this case - for example, correlations in the space or time domain across modalities - but should be wisely exploited in order to benefit from their full predictive potential. We propose two deep learning architectures with multimodal cross-connections that allow for dataflow between several feature extractors (XFlow). Our models derive more interpretable features and achieve better performances than models which do not exchange representations, usefully exploiting correlations between audio and visual data, which have a different dimensionality and are nontrivially exchangeable. Our work improves on existing multimodal deep learning algorithms in two essential ways: (1) it presents a novel method for performing cross-modality (before features are learned from individual modalities) and (2) extends the previously proposed cross-connections which only transfer information between streams that process compatible data. Illustrating some of the representations learned by the connections, we analyse their contribution to the increase in discrimination ability and reveal their compatibility with a lip-reading network intermediate representation. We provide the research community with Digits, a new dataset consisting of three data types extracted from videos of people saying the digits 0-9. Results show that both cross-modal architectures outperform their baselines (by up to 11.5%) when evaluated on the AVletters, CUAVE and Digits datasets, achieving state-of-the-art results.
Tasks
Published	2017-09-02
URL	http://arxiv.org/abs/1709.00572v2
PDF	http://arxiv.org/pdf/1709.00572v2.pdf
PWC	https://paperswithcode.com/paper/xflow-1d-2d-cross-modal-deep-neural-networks
Repo	https://github.com/catalina17/XFlow
Framework	tf

Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video


Title	Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video
Authors	Aljoša Ošep, Paul Voigtlaender, Jonathon Luiten, Stefan Breuers, Bastian Leibe
Abstract	We explore object discovery and detector adaptation based on unlabeled video sequences captured from a mobile platform. We propose a fully automatic approach for object mining from video which builds upon a generic object tracking approach. By applying this method to three large video datasets from autonomous driving and mobile robotics scenarios, we demonstrate its robustness and generality. Based on the object mining results, we propose a novel approach for unsupervised object discovery by appearance-based clustering. We show that this approach successfully discovers interesting objects relevant to driving scenarios. In addition, we perform self-supervised detector adaptation in order to improve detection performance on the KITTI dataset for existing categories. Our approach has direct relevance for enabling large-scale object learning for autonomous driving.
Tasks	Autonomous Driving, Object Tracking
Published	2017-12-23
URL	http://arxiv.org/abs/1712.08832v1
PDF	http://arxiv.org/pdf/1712.08832v1.pdf
PWC	https://paperswithcode.com/paper/large-scale-object-discovery-and-detector
Repo	https://github.com/aljosaosep/kitti-track-collection
Framework	none