Paper Group AWR 201
Deep Layer Aggregation
Title | Deep Layer Aggregation |
Authors | Fisher Yu, Dequan Wang, Evan Shelhamer, Trevor Darrell |
Abstract | Visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves inference of what and where. Architectural efforts are exploring many dimensions for network backbones, designing deeper or wider architectures, but how to best aggregate layers and blocks across a network deserves further attention. Although skip connections have been incorporated to combine layers, these connections have been “shallow” themselves, and only fuse by simple, one-step operations. We augment standard architectures with deeper aggregation to better fuse information across layers. Our deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to make networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared to existing branching and merging schemes. The code is at https://github.com/ucbdrive/dla. |
Tasks | |
Published | 2017-07-20 |
URL | http://arxiv.org/abs/1707.06484v3 |
PDF | http://arxiv.org/pdf/1707.06484v3.pdf |
PWC | https://paperswithcode.com/paper/deep-layer-aggregation |
Repo | https://github.com/ucbdrive/dla |
Framework | pytorch |
ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
Title | ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games |
Authors | Yuandong Tian, Qucheng Gong, Wenling Shang, Yuxin Wu, C. Lawrence Zitnick |
Abstract | In this paper, we propose ELF, an Extensive, Lightweight and Flexible platform for fundamental reinforcement learning research. Using ELF, we implement a highly customizable real-time strategy (RTS) engine with three game environments (Mini-RTS, Capture the Flag and Tower Defense). Mini-RTS, as a miniature version of StarCraft, captures key game dynamics and runs at 40K frames per second (FPS) per core on a MacBook Pro notebook. When coupled with modern reinforcement learning methods, the system can train a full-game bot against built-in AIs end-to-end in one day with 6 CPUs and 1 GPU. In addition, our platform is flexible in terms of environment-agent communication topologies, choices of RL methods, changes in game parameters, and can host existing C/C++-based game environments like Arcade Learning Environment. Using ELF, we thoroughly explore training parameters and show that a network with Leaky ReLU and Batch Normalization coupled with long-horizon training and progressive curriculum beats the rule-based built-in AI more than 70% of the time in the full game of Mini-RTS. Strong performance is also achieved on the other two games. In game replays, we show our agents learn interesting strategies. ELF, along with its RL platform, is open-sourced at https://github.com/facebookresearch/ELF. |
Tasks | Atari Games, Real-Time Strategy Games, Starcraft |
Published | 2017-07-04 |
URL | http://arxiv.org/abs/1707.01067v2 |
PDF | http://arxiv.org/pdf/1707.01067v2.pdf |
PWC | https://paperswithcode.com/paper/elf-an-extensive-lightweight-and-flexible |
Repo | https://github.com/GaoFangshu/ELF-example |
Framework | pytorch |
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
Title | Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation |
Authors | Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, Jimmy Ba |
Abstract | In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronecker-Factored Trust Region (ACKTR). To the best of our knowledge, this is the first scalable trust region natural gradient method for actor-critic methods. It is also a method that learns non-trivial tasks in continuous control as well as discrete control policies directly from raw pixel inputs. We tested our approach across discrete domains in Atari games as well as continuous domains in the MuJoCo environment. With the proposed methods, we are able to achieve higher rewards and a 2- to 3-fold improvement in sample efficiency on average, compared to previous state-of-the-art on-policy actor-critic methods. Code is available at https://github.com/openai/baselines |
Tasks | Atari Games, Continuous Control |
Published | 2017-08-17 |
URL | http://arxiv.org/abs/1708.05144v2 |
PDF | http://arxiv.org/pdf/1708.05144v2.pdf |
PWC | https://paperswithcode.com/paper/scalable-trust-region-method-for-deep |
Repo | https://github.com/ghostFaceKillah/expert |
Framework | tf |
AOGNets: Compositional Grammatical Architectures for Deep Learning
Title | AOGNets: Compositional Grammatical Architectures for Deep Learning |
Authors | Xilai Li, Xi Song, Tianfu Wu |
Abstract | Neural architectures are the foundation for improving performance of deep neural networks (DNNs). This paper presents deep compositional grammatical architectures which harness the best of two worlds: grammar models and DNNs. The proposed architectures integrate compositionality and reconfigurability of the former and the capability of learning rich features of the latter in a principled way. We utilize AND-OR Grammar (AOG) as network generator in this paper and call the resulting networks AOGNets. An AOGNet consists of a number of stages each of which is composed of a number of AOG building blocks. An AOG building block splits its input feature map into N groups along feature channels and then treats it as a sentence of N words. It then jointly realizes a phrase structure grammar and a dependency grammar in bottom-up parsing the “sentence” for better feature exploration and reuse. It provides a unified framework for the best practices developed in state-of-the-art DNNs. In experiments, AOGNet is tested on the CIFAR-10, CIFAR-100 and ImageNet-1K classification benchmarks and the MS-COCO object detection and segmentation benchmark. In CIFAR-10, CIFAR-100 and ImageNet-1K, AOGNet obtains better performance than ResNet and most of its variants, ResNeXt and its attention based variants such as SENet, DenseNet and DualPathNet. AOGNet also obtains the best model interpretability score using network dissection. AOGNet further shows better potential in adversarial defense. In MS-COCO, AOGNet obtains better performance than the ResNet and ResNeXt backbones in Mask R-CNN. |
Tasks | Adversarial Defense, Image Classification, Object Detection, Representation Learning |
Published | 2017-11-15 |
URL | http://arxiv.org/abs/1711.05847v3 |
PDF | http://arxiv.org/pdf/1711.05847v3.pdf |
PWC | https://paperswithcode.com/paper/learning-deep-compositional-grammatical |
Repo | https://github.com/iVMCL/AOGNets |
Framework | pytorch |
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
Title | Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents |
Authors | Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling |
Abstract | The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of different problem settings and it has been receiving increasing attention from the scientific community, leading to some high-profile success stories such as the much publicized Deep Q-Networks (DQN). In this article we take a big picture look at how the ALE is being used by the research community. We show how diverse the evaluation methodologies in the ALE have become with time, and highlight some key concerns when evaluating agents in the ALE. We use this discussion to present some methodological best practices and provide new benchmark results using these best practices. To further the progress in the field, we introduce a new version of the ALE that supports multiple game modes and provides a form of stochasticity we call sticky actions. We conclude this big picture look by revisiting challenges posed when the ALE was introduced, summarizing the state-of-the-art in various problems and highlighting problems that remain open. |
Tasks | Atari Games |
Published | 2017-09-18 |
URL | http://arxiv.org/abs/1709.06009v2 |
PDF | http://arxiv.org/pdf/1709.06009v2.pdf |
PWC | https://paperswithcode.com/paper/revisiting-the-arcade-learning-environment |
Repo | https://github.com/bclyang/updated-atari-env |
Framework | none |
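The abstract above names sticky actions without defining them. A common reading, stated here as an assumption rather than something this entry confirms, is that at each step the environment repeats the previously executed action with a fixed probability instead of taking the agent's new choice. The sketch below illustrates that reading in plain Python; the class name `StickyActions`, the method `filter`, and the default probability of 0.25 are all hypothetical and not taken from the ALE code base.

```python
import random


class StickyActions:
    """Toy action filter: sometimes repeat the previously executed action.

    Hypothetical helper for illustration only; not the ALE API.
    """

    def __init__(self, repeat_prob=0.25, seed=None):
        if not 0.0 <= repeat_prob <= 1.0:
            raise ValueError("repeat_prob must be in [0, 1]")
        self.repeat_prob = repeat_prob
        self.rng = random.Random(seed)
        self.prev_action = None  # no action has been executed yet

    def filter(self, action):
        """Return the action that is actually executed this step."""
        # With probability repeat_prob, ignore the new action and
        # execute the previous one again (only possible after step one).
        if self.prev_action is not None and self.rng.random() < self.repeat_prob:
            action = self.prev_action
        self.prev_action = action
        return action
```

Because the repeat decision is random and outside the agent's control, a policy that simply replays a memorized action sequence no longer produces the same trajectory on every run, which is the kind of stochasticity the abstract refers to.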
Batched Large-scale Bayesian Optimization in High-dimensional Spaces
Title | Batched Large-scale Bayesian Optimization in High-dimensional Spaces |
Authors | Zi Wang, Clement Gehring, Pushmeet Kohli, Stefanie Jegelka |
Abstract | Bayesian optimization (BO) has become an effective approach for black-box function optimization problems when function evaluations are expensive and the optimum can be achieved within a relatively small number of queries. However, many cases, such as the ones with high-dimensional inputs, may require a much larger number of observations for optimization. Despite an abundance of observations thanks to parallel experiments, current BO techniques have been limited to merely a few thousand observations. In this paper, we propose ensemble Bayesian optimization (EBO) to address three current challenges in BO simultaneously: (1) large-scale observations; (2) high dimensional input spaces; and (3) selections of batch queries that balance quality and diversity. The key idea of EBO is to operate on an ensemble of additive Gaussian process models, each of which possesses a randomized strategy to divide and conquer. We show unprecedented, previously impossible results of scaling up BO to tens of thousands of observations within minutes of computation. |
Tasks | |
Published | 2017-06-05 |
URL | http://arxiv.org/abs/1706.01445v4 |
PDF | http://arxiv.org/pdf/1706.01445v4.pdf |
PWC | https://paperswithcode.com/paper/batched-large-scale-bayesian-optimization-in |
Repo | https://github.com/zi-w/Ensemble-Bayesian-Optimization |
Framework | none |
An overview of embedding models of entities and relationships for knowledge base completion
Title | An overview of embedding models of entities and relationships for knowledge base completion |
Authors | Dat Quoc Nguyen |
Abstract | Knowledge bases (KBs) of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge bases are typically incomplete, it is useful to be able to perform knowledge base completion or link prediction, i.e., predict whether a relationship not in the knowledge base is likely to be true. This paper serves as a comprehensive overview of embedding models of entities and relationships for knowledge base completion, summarizing up-to-date experimental results on standard benchmark datasets. |
Tasks | Knowledge Base Completion, Link Prediction |
Published | 2017-03-23 |
URL | https://arxiv.org/abs/1703.08098v6 |
PDF | https://arxiv.org/pdf/1703.08098v6.pdf |
PWC | https://paperswithcode.com/paper/an-overview-of-embedding-models-of-entities |
Repo | https://github.com/Sujit-O/pykg2vec |
Framework | tf |
A Fast and Accurate Vietnamese Word Segmenter
Title | A Fast and Accurate Vietnamese Word Segmenter |
Authors | Dat Quoc Nguyen, Dai Quoc Nguyen, Thanh Vu, Mark Dras, Mark Johnson |
Abstract | We propose a novel approach to Vietnamese word segmentation. Our approach is based on the Single Classification Ripple Down Rules methodology (Compton and Jansen, 1990), where rules are stored in an exception structure and new rules are only added to correct segmentation errors given by existing rules. Experimental results on the benchmark Vietnamese treebank show that our approach outperforms previous state-of-the-art approaches JVnSegmenter, vnTokenizer, DongDu and UETsegmenter in terms of both accuracy and performance speed. Our code is open-source and available at: https://github.com/datquocnguyen/RDRsegmenter. |
Tasks | |
Published | 2017-09-19 |
URL | http://arxiv.org/abs/1709.06307v2 |
PDF | http://arxiv.org/pdf/1709.06307v2.pdf |
PWC | https://paperswithcode.com/paper/a-fast-and-accurate-vietnamese-word-segmenter |
Repo | https://github.com/datquocnguyen/RDRsegmenter |
Framework | none |
Consistent feature attribution for tree ensembles
Title | Consistent feature attribution for tree ensembles |
Authors | Scott M. Lundberg, Su-In Lee |
Abstract | Note that a newer expanded version of this paper is now available at: arXiv:1802.03888. It is critical in many applications to understand what features are important for a model, and why individual predictions were made. For tree ensemble methods these questions are usually answered by attributing importance values to input features, either globally or for a single prediction. Here we show that current feature attribution methods are inconsistent, which means changing the model to rely more on a given feature can actually decrease the importance assigned to that feature. To address this problem we develop fast exact solutions for SHAP (SHapley Additive exPlanation) values, which were recently shown to be the unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate. We integrate these improvements into the latest version of XGBoost, demonstrate the inconsistencies of current methods, and show how using SHAP values results in significantly improved supervised clustering performance. Feature importance values are a key part of understanding widely used models such as gradient boosting trees and random forests, so improvements to them have broad practical implications. |
Tasks | Feature Importance |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.06060v6 |
PDF | http://arxiv.org/pdf/1706.06060v6.pdf |
PWC | https://paperswithcode.com/paper/consistent-feature-attribution-for-tree |
Repo | https://github.com/bgreenwell/fastshap |
Framework | none |
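To make the "locally accurate" property from the abstract above concrete, here is a minimal brute-force sketch of Shapley-value attribution in plain Python. It is not the fast tree algorithm the paper develops: it enumerates every feature subset, and it approximates "feature absent" by substituting a fixed baseline value, which is a simplifying assumption in place of the conditional expectations the abstract mentions. The function name `shapley_values` and the toy model are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley attributions for f at x by subset enumeration.

    Absent features are replaced by baseline values (an assumption of
    this sketch). Cost grows as 2**n, so this only suits tiny n.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real value, the rest use the baseline.
        return f([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Toy model: one additive feature and one interaction between two features.
def toy_model(z):
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
# Local accuracy: attributions sum to f(x) - f(baseline), up to rounding.
gap = sum(phi) - (toy_model([1.0, 1.0, 1.0]) - toy_model([0.0, 0.0, 0.0]))
```

In the toy model the interaction term is shared equally between the two features that jointly produce it, while the additive feature receives its full contribution.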
meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
Title | meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting |
Authors | Xu Sun, Xuancheng Ren, Shuming Ma, Houfeng Wang |
Abstract | We propose a simple yet effective technique for neural network learning. The forward propagation is computed as usual. In back propagation, only a small subset of the full gradient is computed to update the model parameters. The gradient vectors are sparsified in such a way that only the top-$k$ elements (in terms of magnitude) are kept. As a result, only $k$ rows or columns (depending on the layout) of the weight matrix are modified, leading to a linear reduction ($k$ divided by the vector dimension) in the computational cost. Surprisingly, experimental results demonstrate that we can update only 1-4% of the weights at each back propagation pass. This does not result in a larger number of training iterations. More interestingly, the accuracy of the resulting models is actually improved rather than degraded, and a detailed analysis is given. The code is available at https://github.com/lancopku/meProp |
Tasks | |
Published | 2017-06-19 |
URL | http://arxiv.org/abs/1706.06197v5 |
PDF | http://arxiv.org/pdf/1706.06197v5.pdf |
PWC | https://paperswithcode.com/paper/meprop-sparsified-back-propagation-for |
Repo | https://github.com/jklj077/meProp |
Framework | pytorch |
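The top-k sparsification described in the abstract above can be illustrated with a short sketch in plain Python. It assumes a single linear layer y = W x with one weight row per output unit; the function names `topk_sparsify` and `sparse_weight_grad` are hypothetical and do not come from the authors' repository.

```python
def topk_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero out the rest."""
    if k >= len(grad):
        return list(grad)
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def sparse_weight_grad(x, grad_out, k):
    """Weight gradient of y = W x using only the top-k entries of dL/dy.

    Returns {row_index: gradient_row}, so at most k rows of W are updated
    and the outer-product cost scales with k instead of len(grad_out).
    """
    sparse = topk_sparsify(grad_out, k)
    return {i: [g * xj for xj in x] for i, g in enumerate(sparse) if g != 0.0}


# With k = 2, only the rows for the two largest-magnitude output
# gradients (indices 1 and 3) receive an update.
rows = sparse_weight_grad([1.0, 2.0], [0.1, -3.0, 0.5, 2.0], k=2)
```

Returning a dictionary of touched rows, rather than a dense matrix full of zeros, is one way to make the cost reduction visible; a real implementation would more likely use sparse or indexed tensor operations.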
Learning Multi-Level Hierarchies with Hindsight
Title | Learning Multi-Level Hierarchies with Hindsight |
Authors | Andrew Levy, George Konidaris, Robert Platt, Kate Saenko |
Abstract | Hierarchical agents have the potential to solve sequential decision making tasks with greater sample efficiency than their non-hierarchical counterparts because hierarchical agents can break down tasks into sets of subtasks that only require short sequences of decisions. In order to realize this potential of faster learning, hierarchical agents need to be able to learn their multiple levels of policies in parallel so these simpler subproblems can be solved simultaneously. Yet, learning multiple levels of policies in parallel is hard because it is inherently unstable: changes in a policy at one level of the hierarchy may cause changes in the transition and reward functions at higher levels in the hierarchy, making it difficult to jointly learn multiple levels of policies. In this paper, we introduce a new Hierarchical Reinforcement Learning (HRL) framework, Hierarchical Actor-Critic (HAC), that can overcome the instability issues that arise when agents try to jointly learn multiple levels of policies. The main idea behind HAC is to train each level of the hierarchy independently of the lower levels by training each level as if the lower level policies are already optimal. We demonstrate experimentally in both grid world and simulated robotics domains that our approach can significantly accelerate learning relative to other non-hierarchical and hierarchical methods. Indeed, our framework is the first to successfully learn 3-level hierarchies in parallel in tasks with continuous state and action spaces. |
Tasks | Decision Making, Hierarchical Reinforcement Learning |
Published | 2017-12-04 |
URL | https://arxiv.org/abs/1712.00948v5 |
PDF | https://arxiv.org/pdf/1712.00948v5.pdf |
PWC | https://paperswithcode.com/paper/learning-multi-level-hierarchies-with |
Repo | https://github.com/andrew-j-levy/Hierarchical-Actor-Critc-HAC- |
Framework | tf |
Distributional Reinforcement Learning with Quantile Regression
Title | Distributional Reinforcement Learning with Quantile Regression |
Authors | Will Dabney, Mark Rowland, Marc G. Bellemare, Rémi Munos |
Abstract | In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean. That is, we examine methods of learning the value distribution instead of the value function. We give results that close a number of gaps between the theoretical and algorithmic results given by Bellemare, Dabney, and Munos (2017). First, we extend existing results to the approximate distribution setting. Second, we present a novel distributional reinforcement learning algorithm consistent with our theoretical formulation. Finally, we evaluate this new algorithm on the Atari 2600 games, observing that it significantly outperforms many of the recent improvements on DQN, including the related distributional algorithm C51. |
Tasks | Atari Games, Distributional Reinforcement Learning |
Published | 2017-10-27 |
URL | http://arxiv.org/abs/1710.10044v1 |
PDF | http://arxiv.org/pdf/1710.10044v1.pdf |
PWC | https://paperswithcode.com/paper/distributional-reinforcement-learning-with-1 |
Repo | https://github.com/ars-ashuha/quantile-regression-dqn-pytorch |
Framework | pytorch |
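The abstract above does not spell out the loss, so the following is only a sketch of generic quantile regression, not the paper's algorithm. It shows the standard quantile (pinball) loss and a subgradient-descent estimate of a single quantile of a sample, in plain Python. The function names, step size, and epoch count are hypothetical choices made for illustration.

```python
def pinball_loss(pred, target, tau):
    """Quantile loss: under-estimates cost tau, over-estimates cost 1 - tau."""
    u = target - pred
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_quantile(samples, tau, lr=0.05, epochs=200):
    """Estimate the tau-quantile of samples by subgradient descent."""
    q = 0.0
    for _ in range(epochs):
        for y in samples:
            if y > q:
                q += lr * tau            # prediction too low: move up
            elif y < q:
                q -= lr * (1.0 - tau)    # prediction too high: move down
    return q


# The estimate for tau = 0.5 settles near the median of 1..100.
median_estimate = fit_quantile(list(range(1, 101)), tau=0.5)
```

Fitting several such estimates at different values of tau yields a set of quantiles that together describe a distribution rather than just its mean, which is the general idea behind modeling the distribution over returns.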
LabelBank: Revisiting Global Perspectives for Semantic Segmentation
Title | LabelBank: Revisiting Global Perspectives for Semantic Segmentation |
Authors | Hexiang Hu, Zhiwei Deng, Guang-Tong Zhou, Fei Sha, Greg Mori |
Abstract | Semantic segmentation requires a detailed labeling of image pixels by object category. Information derived from local image patches is necessary to describe the detailed shape of individual objects. However, this information is ambiguous and can result in noisy labels. Global inference of image content can instead capture the general semantic concepts present. We advocate that holistic inference of image concepts provides valuable information for detailed pixel labeling. We propose a generic framework to leverage holistic information in the form of a LabelBank for pixel-level segmentation. We show the ability of our framework to improve semantic segmentation performance in a variety of settings. We learn models for extracting a holistic LabelBank from visual cues, attributes, and/or textual descriptions. We demonstrate improvements in semantic segmentation accuracy on standard datasets across a range of state-of-the-art segmentation architectures and holistic inference approaches. |
Tasks | Semantic Segmentation |
Published | 2017-03-29 |
URL | http://arxiv.org/abs/1703.09891v1 |
PDF | http://arxiv.org/pdf/1703.09891v1.pdf |
PWC | https://paperswithcode.com/paper/labelbank-revisiting-global-perspectives-for |
Repo | https://github.com/nightrome/cocostuff10k |
Framework | none |
XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification
Title | XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification |
Authors | Cătălina Cangea, Petar Veličković, Pietro Liò |
Abstract | In recent years, there have been numerous developments towards solving multimodal tasks, aiming to learn a stronger representation than through a single modality. Certain aspects of the data can be particularly useful in this case - for example, correlations in the space or time domain across modalities - but should be wisely exploited in order to benefit from their full predictive potential. We propose two deep learning architectures with multimodal cross-connections that allow for dataflow between several feature extractors (XFlow). Our models derive more interpretable features and achieve better performances than models which do not exchange representations, usefully exploiting correlations between audio and visual data, which have a different dimensionality and are nontrivially exchangeable. Our work improves on existing multimodal deep learning algorithms in two essential ways: (1) it presents a novel method for performing cross-modality (before features are learned from individual modalities) and (2) extends the previously proposed cross-connections which only transfer information between streams that process compatible data. Illustrating some of the representations learned by the connections, we analyse their contribution to the increase in discrimination ability and reveal their compatibility with a lip-reading network intermediate representation. We provide the research community with Digits, a new dataset consisting of three data types extracted from videos of people saying the digits 0-9. Results show that both cross-modal architectures outperform their baselines (by up to 11.5%) when evaluated on the AVletters, CUAVE and Digits datasets, achieving state-of-the-art results. |
Tasks | |
Published | 2017-09-02 |
URL | http://arxiv.org/abs/1709.00572v2 |
PDF | http://arxiv.org/pdf/1709.00572v2.pdf |
PWC | https://paperswithcode.com/paper/xflow-1d-2d-cross-modal-deep-neural-networks |
Repo | https://github.com/catalina17/XFlow |
Framework | tf |
Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video
Title | Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video |
Authors | Aljoša Ošep, Paul Voigtlaender, Jonathon Luiten, Stefan Breuers, Bastian Leibe |
Abstract | We explore object discovery and detector adaptation based on unlabeled video sequences captured from a mobile platform. We propose a fully automatic approach for object mining from video which builds upon a generic object tracking approach. By applying this method to three large video datasets from autonomous driving and mobile robotics scenarios, we demonstrate its robustness and generality. Based on the object mining results, we propose a novel approach for unsupervised object discovery by appearance-based clustering. We show that this approach successfully discovers interesting objects relevant to driving scenarios. In addition, we perform self-supervised detector adaptation in order to improve detection performance on the KITTI dataset for existing categories. Our approach has direct relevance for enabling large-scale object learning for autonomous driving. |
Tasks | Autonomous Driving, Object Tracking |
Published | 2017-12-23 |
URL | http://arxiv.org/abs/1712.08832v1 |
PDF | http://arxiv.org/pdf/1712.08832v1.pdf |
PWC | https://paperswithcode.com/paper/large-scale-object-discovery-and-detector |
Repo | https://github.com/aljosaosep/kitti-track-collection |
Framework | none |