May 7, 2019

3221 words 16 mins read

Paper Group AWR 100

Understanding the 2016 US Presidential Election using ecological inference and distribution regression with census microdata. Coordination Annotation Extension in the Penn Tree Bank. Facial Expression Recognition using Convolutional Neural Networks: State of the Art. Entropic Causal Inference. Unsupervised Monocular Depth Estimation with Left-Right …

Understanding the 2016 US Presidential Election using ecological inference and distribution regression with census microdata

Title Understanding the 2016 US Presidential Election using ecological inference and distribution regression with census microdata
Authors Seth Flaxman, Dougal Sutherland, Yu-Xiang Wang, Yee Whye Teh
Abstract We combine fine-grained spatially referenced census data with the vote outcomes from the 2016 US presidential election. Using this dataset, we perform ecological inference using distribution regression (Flaxman et al., KDD 2015) with a multinomial-logit regression so as to model the vote outcome (Trump, Clinton, Other/Didn’t vote) as a function of demographic and socioeconomic features. Ecological inference allows us to estimate “exit poll”-style results (e.g., Trump’s support among white women), but for entirely novel categories. We also perform exploratory data analysis to understand which census variables are predictive of voting for Trump, voting for Clinton, or not voting for either. All of our methods are implemented in Python and R and are available online for replication.
Tasks
Published 2016-11-11
URL http://arxiv.org/abs/1611.03787v1
PDF http://arxiv.org/pdf/1611.03787v1.pdf
PWC https://paperswithcode.com/paper/understanding-the-2016-us-presidential
Repo https://github.com/flaxter/us2016
Framework none
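
The distribution-regression idea lends itself to a compact sketch: each region's feature is the mean embedding of its individual-level census rows, and a multinomial logit is fit to the observed vote-share distributions by gradient descent on the cross-entropy. The sketch below uses synthetic data and is illustrative only, not the authors' released code (see the repo above).

```python
# Hedged sketch of ecological inference via distribution regression:
# region-level features are mean embeddings of individual census rows,
# and a multinomial logit maps them to vote-share distributions.
import numpy as np

rng = np.random.default_rng(0)
n_regions, n_people, n_feats, n_classes = 50, 200, 8, 3  # 3 outcomes: Trump / Clinton / Other-or-didn't-vote

# Individual-level features, aggregated to one mean embedding per region.
X = np.stack([rng.normal(size=(n_people, n_feats)).mean(axis=0) for _ in range(n_regions)])
true_W = rng.normal(size=(n_feats, n_classes))
logits = X @ true_W
Y = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # observed vote shares

# Fit multinomial logit on (mean embedding -> vote share) pairs by gradient
# descent on the cross-entropy between predicted and observed shares.
W = np.zeros((n_feats, n_classes))
for _ in range(2000):
    P = np.exp(X @ W)
    P /= P.sum(axis=1, keepdims=True)
    W -= 0.1 * X.T @ (P - Y) / n_regions  # gradient of the cross-entropy

print("mean abs error in recovered shares:", np.abs(P - Y).mean())
```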

Coordination Annotation Extension in the Penn Tree Bank

Title Coordination Annotation Extension in the Penn Tree Bank
Authors Jessica Ficler, Yoav Goldberg
Abstract Coordination is an important and common syntactic construction which is not handled well by state-of-the-art parsers. Coordinations in the Penn Treebank are missing internal structure in many cases, do not include explicit marking of the conjuncts, and contain various errors and inconsistencies. In this work, we initiated a manual annotation process for solving these issues. We identify the different elements in a coordination phrase and label each element with its function. We add phrase boundaries when these are missing, unify inconsistencies, and fix errors. The outcome is an extension of the PTB that includes consistent and detailed structures for coordinations. We make the coordination annotation publicly available, in the hope that it will facilitate further research into coordination disambiguation.
Tasks
Published 2016-06-08
URL http://arxiv.org/abs/1606.02529v1
PDF http://arxiv.org/pdf/1606.02529v1.pdf
PWC https://paperswithcode.com/paper/coordination-annotation-extension-in-the-penn
Repo https://github.com/Jess1ca/CoordinationExtPTB
Framework none
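
As a concrete illustration of what the extension marks up, the sketch below reads a PTB-style bracketed coordination phrase with NLTK and labels the coordinating conjunction (CC) and its sibling conjuncts. The heuristic is ours for illustration; the released annotations are richer and hand-verified.

```python
# Illustrative sketch (not the authors' annotation tooling): read a
# PTB-style bracketed coordination phrase and mark the conjuncts
# around the coordinating conjunction (CC).
from nltk import Tree

np_phrase = Tree.fromstring("(NP (NP (NN ice) (NN cream)) (CC and) (NP (NN cake)))")
children = list(np_phrase)
cc_positions = [i for i, child in enumerate(children) if child.label() == "CC"]

# Simple heuristic: siblings of the CC node are the conjuncts; the PTB
# extension described above adds such explicit markings where they are missing.
for i, child in enumerate(children):
    role = "CC" if i in cc_positions else "CONJUNCT"
    print(role, child)
```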

Facial Expression Recognition using Convolutional Neural Networks: State of the Art

Title Facial Expression Recognition using Convolutional Neural Networks: State of the Art
Authors Christopher Pramerdorfer, Martin Kampel
Abstract The ability to recognize facial expressions automatically enables novel applications in human-computer interaction and other areas. Consequently, there has been active research in this field, with several recent works utilizing Convolutional Neural Networks (CNNs) for feature extraction and inference. These works differ significantly in terms of CNN architectures and other factors. Based on the reported results alone, the performance impact of these factors is unclear. In this paper, we review the state of the art in image-based facial expression recognition using CNNs and highlight algorithmic differences and their performance impact. On this basis, we identify existing bottlenecks and consequently directions for advancing this research field. Furthermore, we demonstrate that overcoming one of these bottlenecks - the comparatively basic architectures of the CNNs utilized in this field - leads to a substantial performance increase. By forming an ensemble of modern deep CNNs, we obtain a FER2013 test accuracy of 75.2%, outperforming previous works without requiring auxiliary training data or face registration.
Tasks Facial Expression Recognition
Published 2016-12-09
URL http://arxiv.org/abs/1612.02903v1
PDF http://arxiv.org/pdf/1612.02903v1.pdf
PWC https://paperswithcode.com/paper/facial-expression-recognition-using
Repo https://github.com/apuayush/face_express
Framework none
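
A minimal PyTorch sketch of the ensembling step: average the softmax outputs of several independently initialized CNNs on 48x48 grayscale, 7-class FER2013-style inputs. The tiny architecture is illustrative only; the paper ensembles much deeper modern CNNs.

```python
# Hedged sketch of CNN ensembling for facial expression recognition.
import torch
import torch.nn as nn

class SmallFERNet(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 12 * 12, n_classes)  # 48 -> 24 -> 12

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

models = [SmallFERNet() for _ in range(3)]   # independently initialized members
x = torch.randn(4, 1, 48, 48)                # a batch of 48x48 grayscale face crops
with torch.no_grad():
    probs = torch.stack([m(x).softmax(dim=1) for m in models]).mean(dim=0)
print(probs.argmax(dim=1))                   # ensemble prediction per image
```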

Entropic Causal Inference

Title Entropic Causal Inference
Authors Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath, Babak Hassibi
Abstract We consider the problem of identifying the causal direction between two discrete random variables using observational data. Unlike previous work, we keep the most general functional model but make an assumption on the unobserved exogenous variable: Inspired by Occam’s razor, we assume that the exogenous variable is simple in the true causal direction. We quantify simplicity using Rényi entropy. Our main result is that, under natural assumptions, if the exogenous variable has low $H_0$ entropy (cardinality) in the true direction, it must have high $H_0$ entropy in the wrong direction. We establish several algorithmic hardness results about estimating the minimum-entropy exogenous variable. We show that the problem of finding the exogenous variable with minimum entropy is equivalent to the problem of finding the minimum joint entropy given $n$ marginal distributions, also known as the minimum entropy coupling problem. We propose an efficient greedy algorithm for the minimum entropy coupling problem that for $n=2$ provably finds a local optimum. This gives a greedy algorithm for finding the exogenous variable with minimum $H_1$ (Shannon) entropy. Our greedy entropy-based causal inference algorithm performs comparably to state-of-the-art additive noise models on real datasets. One advantage of our approach is that we make no use of the values of the random variables, only their distributions. Our method can therefore be used for causal inference on both ordinal and categorical data, unlike additive noise models.
Tasks Causal Inference
Published 2016-11-12
URL http://arxiv.org/abs/1611.04035v2
PDF http://arxiv.org/pdf/1611.04035v2.pdf
PWC https://paperswithcode.com/paper/entropic-causal-inference
Repo https://github.com/mkocaoglu/Entropic-Causality
Framework none
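
The greedy coupling idea is simple enough to sketch. For $n=2$ marginals, repeatedly match the largest remaining probability masses and assign their minimum to the joint. Tie-breaking and the stopping tolerance below are our assumptions, and details may differ from the paper's algorithm.

```python
# Hedged sketch of greedy minimum-entropy coupling for two marginals.
import numpy as np

def greedy_coupling(p, q):
    p, q = p.astype(float).copy(), q.astype(float).copy()
    joint = np.zeros((len(p), len(q)))
    while p.sum() > 1e-12:
        i, j = p.argmax(), q.argmax()   # match the two largest remaining masses
        m = min(p[i], q[j])
        joint[i, j] += m                # assign their minimum to the joint
        p[i] -= m
        q[j] -= m                       # each step zeroes at least one entry
    return joint

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.6, 0.4])
J = greedy_coupling(p, q)
H = -(J[J > 0] * np.log2(J[J > 0])).sum()  # Shannon entropy of the coupling
print(J, "joint entropy:", H)
```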

Unsupervised Monocular Depth Estimation with Left-Right Consistency

Title Unsupervised Monocular Depth Estimation with Left-Right Consistency
Authors Clément Godard, Oisin Mac Aodha, Gabriel J. Brostow
Abstract Learning based methods have shown very promising results for the task of depth estimation in single images. However, most existing approaches treat depth prediction as a supervised regression problem and as a result, require vast quantities of corresponding ground truth depth data for training. Just recording quality depth data in a range of environments is a challenging problem. In this paper, we innovate beyond existing approaches, replacing the use of explicit depth data during training with easier-to-obtain binocular stereo footage. We propose a novel training objective that enables our convolutional neural network to learn to perform single image depth estimation, despite the absence of ground truth depth data. Exploiting epipolar geometry constraints, we generate disparity images by training our network with an image reconstruction loss. We show that solving for image reconstruction alone results in poor quality depth images. To overcome this problem, we propose a novel training loss that enforces consistency between the disparities produced relative to both the left and right images, leading to improved performance and robustness compared to existing approaches. Our method produces state of the art results for monocular depth estimation on the KITTI driving dataset, even outperforming supervised methods that have been trained with ground truth depth.
Tasks Depth Estimation, Image Reconstruction
Published 2016-09-13
URL http://arxiv.org/abs/1609.03677v3
PDF http://arxiv.org/pdf/1609.03677v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-monocular-depth-estimation-with
Repo https://github.com/Yc174/monodepth
Framework tf
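
The left-right consistency term can be sketched directly. Assuming disparities are measured in pixels as left-to-right shifts, the loss below samples the right disparity map at each left pixel's matched location and penalizes disagreement with the left disparity, i.e. |d_l(x) - d_r(x - d_l(x))|. A minimal PyTorch sketch, not the authors' TensorFlow code:

```python
# Hedged sketch of the left-right disparity consistency loss.
import torch
import torch.nn.functional as F

def lr_consistency_loss(disp_l, disp_r):
    """disp_l, disp_r: (B,1,H,W) disparities in pixels (assumed left-to-right shifts)."""
    b, _, h, w = disp_l.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    xs = xs[None, None].float().expand(b, 1, h, w)
    ys = ys[None, None].float().expand(b, 1, h, w)
    sample_x = xs - disp_l                       # where each left pixel lands in the right view
    grid = torch.stack([
        2.0 * sample_x.squeeze(1) / (w - 1) - 1.0,  # grid_sample expects [-1, 1] coords
        2.0 * ys.squeeze(1) / (h - 1) - 1.0,
    ], dim=-1)
    disp_r_warped = F.grid_sample(disp_r, grid, align_corners=True)
    return (disp_l - disp_r_warped).abs().mean()  # |d_l(x) - d_r(x - d_l(x))|

loss = lr_consistency_loss(torch.rand(2, 1, 32, 64) * 5, torch.rand(2, 1, 32, 64) * 5)
print(loss)
```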

Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

Title Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging
Authors Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley
Abstract Environmental audio tagging aims to predict only the presence or absence of certain acoustic events in the acoustic scene of interest. In this paper we make contributions to audio tagging in two areas: acoustic modeling and feature learning. We propose to use a shrinking deep neural network (DNN) framework incorporating unsupervised feature learning to handle the multi-label classification task. For the acoustic modeling, a large set of contextual frames of the chunk are fed into the DNN to perform a multi-label classification for the expected tags, considering that only chunk-level (or utterance-level) rather than frame-level labels are available. Dropout and background-noise-aware training are also adopted to improve the generalization capability of the DNNs. For the unsupervised feature learning, we propose to use a symmetric or asymmetric deep de-noising auto-encoder (sDAE or aDAE) to generate new data-driven features from the Mel-Filter Bank (MFB) features. The new features, which are smoothed against background noise and more compact with contextual information, can further improve the performance of the DNN baseline. Compared with the standard Gaussian Mixture Model (GMM) baseline of the DCASE 2016 audio tagging challenge, our proposed method obtains a significant equal error rate (EER) reduction from 0.21 to 0.13 on the development set. The proposed aDAE system achieves a relative 6.7% EER reduction compared with the strong DNN baseline on the development set. Finally, the results also show that our approach obtains state-of-the-art performance with 0.15 EER on the evaluation set of the DCASE 2016 audio tagging task, while the EER of the challenge’s first-prize system is 0.17.
Tasks Audio Tagging, Multi-Label Classification
Published 2016-07-13
URL http://arxiv.org/abs/1607.03681v2
PDF http://arxiv.org/pdf/1607.03681v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-feature-learning-based-on-deep
Repo https://github.com/lgerrets/asait18-tagging
Framework tf
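
A minimal PyTorch sketch of the sDAE idea: corrupt Mel-filter-bank-like frames with noise, train an encoder-decoder to reconstruct the clean frames, and keep the bottleneck activations as compact, noise-smoothed features for the tagging DNN. All sizes are illustrative, not the paper's configuration.

```python
# Hedged sketch of a symmetric denoising auto-encoder over MFB-like frames.
import torch
import torch.nn as nn

n_mfb = 40  # Mel filter banks per frame (illustrative)
encoder = nn.Sequential(nn.Linear(n_mfb, 128), nn.ReLU(), nn.Linear(128, 24))
decoder = nn.Sequential(nn.Linear(24, 128), nn.ReLU(), nn.Linear(128, n_mfb))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

clean = torch.randn(256, n_mfb)                       # stand-in for MFB frames
for _ in range(100):
    noisy = clean + 0.1 * torch.randn_like(clean)     # background-noise corruption
    recon = decoder(encoder(noisy))                   # reconstruct the clean frame
    loss = nn.functional.mse_loss(recon, clean)
    opt.zero_grad(); loss.backward(); opt.step()

features = encoder(clean)  # new data-driven features fed to the tagging DNN
print(features.shape)
```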

ECO: Efficient Convolution Operators for Tracking

Title ECO: Efficient Convolution Operators for Tracking
Authors Martin Danelljan, Goutam Bhat, Fahad Shahbaz Khan, Michael Felsberg
Abstract In recent years, Discriminative Correlation Filter (DCF) based methods have significantly advanced the state-of-the-art in tracking. However, in the pursuit of ever-increasing tracking performance, their characteristic speed and real-time capability have gradually faded. Further, the increasingly complex models, with a massive number of trainable parameters, have introduced the risk of severe over-fitting. In this work, we tackle the key causes behind the problems of computational complexity and over-fitting, with the aim of simultaneously improving both speed and performance. We revisit the core DCF formulation and introduce: (i) a factorized convolution operator, which drastically reduces the number of parameters in the model; (ii) a compact generative model of the training sample distribution, that significantly reduces memory and time complexity, while providing better diversity of samples; (iii) a conservative model update strategy with improved robustness and reduced complexity. We perform comprehensive experiments on four benchmarks: VOT2016, UAV123, OTB-2015, and TempleColor. When using expensive deep features, our tracker provides a 20-fold speedup and achieves a 13.0% relative gain in Expected Average Overlap compared to the top-ranked method in the VOT2016 challenge. Moreover, our fast variant, using hand-crafted features, operates at 60 Hz on a single CPU, while obtaining 65.0% AUC on OTB-2015.
Tasks Visual Object Tracking
Published 2016-11-28
URL http://arxiv.org/abs/1611.09224v2
PDF http://arxiv.org/pdf/1611.09224v2.pdf
PWC https://paperswithcode.com/paper/eco-efficient-convolution-operators-for
Repo https://github.com/martin-danelljan/ECO
Framework pytorch
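
The factorized convolution operator can be sketched in a few lines: instead of one correlation filter per feature channel (D of them), learn a D x C projection with C much smaller than D plus C filters, and correlate in the Fourier domain. Plain NumPy with illustrative shapes; the learned quantities are random stand-ins here.

```python
# Hedged sketch of ECO's factorized convolution idea.
import numpy as np

D, C, H, W = 512, 16, 31, 31
rng = np.random.default_rng(0)
features = rng.normal(size=(D, H, W))      # deep feature map of the search region
P = rng.normal(size=(D, C)) / np.sqrt(D)   # learned channel projection (stand-in)
filters = rng.normal(size=(C, H, W))       # learned low-dimensional filters (stand-in)

projected = np.tensordot(P.T, features, axes=1)  # (C, H, W): D channels -> C
# Correlation in the Fourier domain: response = IFFT(sum_c conj(F_c) * Z_c)
response = np.real(np.fft.ifft2(
    (np.conj(np.fft.fft2(filters)) * np.fft.fft2(projected)).sum(axis=0)))
print("peak response at", np.unravel_index(response.argmax(), response.shape))
print("parameters:", D * C + C * H * W, "vs unfactorized:", D * H * W)
```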

Towards Deep Learning in Hindi NER: An approach to tackle the Labelled Data Scarcity

Title Towards Deep Learning in Hindi NER: An approach to tackle the Labelled Data Scarcity
Authors Vinayak Athavale, Shreenivas Bharadwaj, Monik Pamecha, Ameya Prabhu, Manish Shrivastava
Abstract In this paper we describe an end-to-end neural model for Named Entity Recognition (NER) which is based on a bi-directional RNN-LSTM. Almost all NER systems for Hindi use language-specific features and handcrafted rules with gazetteers. Our model is language independent and uses no domain-specific features or any handcrafted rules. Our models rely on semantic information in the form of word vectors which are learnt by an unsupervised learning algorithm on an unannotated corpus. Our model attained state-of-the-art performance in both English and Hindi without the use of any morphological analysis and without using gazetteers of any sort.
Tasks Morphological Analysis, Named Entity Recognition
Published 2016-10-31
URL http://arxiv.org/abs/1610.09756v2
PDF http://arxiv.org/pdf/1610.09756v2.pdf
PWC https://paperswithcode.com/paper/towards-deep-learning-in-hindi-ner-an
Repo https://github.com/monikkinom/ner-lstm
Framework tf
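
A minimal PyTorch sketch of the tagger the abstract describes: pretrained word vectors in, per-token entity tags out, with no gazetteers or handcrafted features. Dimensions and the tag-set size are illustrative assumptions.

```python
# Hedged sketch of a language-independent Bi-LSTM NER tagger.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, emb_dim=100, hidden=128, n_tags=5):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)  # 2x for the two directions

    def forward(self, word_vectors):  # (batch, seq_len, emb_dim)
        h, _ = self.lstm(word_vectors)
        return self.out(h)            # per-token tag scores

tagger = BiLSTMTagger()
sentence = torch.randn(1, 12, 100)      # stand-in for unsupervised word vectors
print(tagger(sentence).argmax(dim=-1))  # predicted tag per token
```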

Playing Doom with SLAM-Augmented Deep Reinforcement Learning

Title Playing Doom with SLAM-Augmented Deep Reinforcement Learning
Authors Shehroze Bhatti, Alban Desmaison, Ondrej Miksik, Nantas Nardelli, N. Siddharth, Philip H. S. Torr
Abstract A number of recent approaches to policy learning in 2D game domains have been successful going directly from raw input images to actions. However, when employed in complex 3D environments, they typically suffer from challenges related to partial observability, combinatorial exploration spaces, path planning, and a scarcity of rewarding scenarios. Inspired by prior work in human cognition that indicates how humans employ a variety of semantic concepts and abstractions (object categories, localisation, etc.) to reason about the world, we build an agent-model that incorporates such abstractions into its policy-learning framework. We augment the raw image input to a Deep Q-Learning Network (DQN) by adding details of objects and structural elements encountered, along with the agent’s localisation. The different components are automatically extracted and composed into a topological representation using on-the-fly object detection and 3D-scene reconstruction. We evaluate the efficacy of our approach in Doom, a 3D first-person combat game that exhibits a number of the challenges discussed, and show that our augmented framework consistently learns better, more effective policies.
Tasks Object Detection, Q-Learning
Published 2016-12-01
URL http://arxiv.org/abs/1612.00380v1
PDF http://arxiv.org/pdf/1612.00380v1.pdf
PWC https://paperswithcode.com/paper/playing-doom-with-slam-augmented-deep
Repo https://github.com/shehroze37/Augmented-Deep-Reinforcement-Learning-for-3D-environments
Framework none
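
At its simplest, the input augmentation reduces to stacking semantic maps as extra channels before the DQN's convolutional trunk. The sketch below assumes one object map and one localisation map; the paper composes these into a richer topological representation.

```python
# Hedged sketch of a DQN whose input is augmented with semantic channels.
import torch
import torch.nn as nn

frame = torch.rand(1, 3, 84, 84)        # raw RGB observation
object_map = torch.rand(1, 1, 84, 84)   # detected-object mask (illustrative)
loc_map = torch.rand(1, 1, 84, 84)      # agent-localisation map (illustrative)
x = torch.cat([frame, object_map, loc_map], dim=1)  # (1, 5, 84, 84)

dqn = nn.Sequential(
    nn.Conv2d(5, 32, 8, stride=4), nn.ReLU(),   # 84 -> 20
    nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),  # 20 -> 9
    nn.Flatten(),
    nn.Linear(64 * 9 * 9, 256), nn.ReLU(),
    nn.Linear(256, 8),                  # Q-value per discrete Doom action
)
print(dqn(x).shape)  # torch.Size([1, 8])
```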

Linear Support Tensor Machine: Pedestrian Detection in Thermal Infrared Images

Title Linear Support Tensor Machine: Pedestrian Detection in Thermal Infrared Images
Authors Sujoy Kumar Biswas, Peyman Milanfar
Abstract Pedestrian detection in thermal infrared images poses unique challenges because of the low resolution and noisy nature of the images. Here we propose a mid-level attribute in the form of a multidimensional template, or tensor, using Local Steering Kernels (LSK) as low-level descriptors for detecting pedestrians in far-infrared images. LSK is specifically designed to deal with intrinsic image noise and pixel-level uncertainty by capturing local image geometry succinctly instead of collecting local orientation statistics (e.g., histograms in HOG). Our second contribution is the introduction of a new image similarity kernel in the popular maximum-margin framework of support vector machines that results in a relatively short and simple training phase for building a rigid pedestrian detector. Our third contribution is to replace the sluggish but de facto standard sliding-window detection methodology with a multichannel discrete Fourier transform, facilitating very fast and efficient pedestrian localization. Experimental studies on publicly available thermal infrared images justify our proposals and model assumptions. In addition, this work also includes the release of our in-house annotations of pedestrians in more than 17,000 frames of the OSU Color-Thermal database, for the purpose of sharing with the research community.
Tasks Pedestrian Detection
Published 2016-09-26
URL http://arxiv.org/abs/1609.07878v1
PDF http://arxiv.org/pdf/1609.07878v1.pdf
PWC https://paperswithcode.com/paper/linear-support-tensor-machine-pedestrian
Repo https://github.com/tigereatsheep/LSKfeatures
Framework none
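
The third contribution, FFT-based localization, can be sketched with plain pixels standing in for the LSK tensor descriptors: zero-pad the template, multiply spectra, and read the detection off the correlation peak, replacing the sliding window entirely.

```python
# Hedged sketch of detection by cross-correlation in the Fourier domain.
import numpy as np

rng = np.random.default_rng(1)
image = rng.normal(size=(240, 320))
template = rng.normal(size=(64, 32))
image[100:164, 150:182] = template      # plant the "pedestrian" at (100, 150)

# Zero-pad the template to image size, correlate via the FFT:
# score = IFFT(conj(FFT(template)) * FFT(image)) is circular cross-correlation.
padded = np.zeros_like(image)
padded[:64, :32] = template
score = np.real(np.fft.ifft2(np.conj(np.fft.fft2(padded)) * np.fft.fft2(image)))
y, x = np.unravel_index(score.argmax(), score.shape)
print("detected top-left corner:", (y, x))  # expect (100, 150)
```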

Dataflow matrix machines as programmable, dynamically expandable, self-referential generalized recurrent neural networks

Title Dataflow matrix machines as programmable, dynamically expandable, self-referential generalized recurrent neural networks
Authors Michael Bukatin, Steve Matthews, Andrey Radul
Abstract Dataflow matrix machines are a powerful generalization of recurrent neural networks. They work with multiple types of linear streams and multiple types of neurons, including higher-order neurons which dynamically update the matrix describing weights and topology of the network in question while the network is running. It seems that the power of dataflow matrix machines is sufficient for them to be a convenient general purpose programming platform. This paper explores a number of useful programming idioms and constructions arising in this context.
Tasks
Published 2016-05-17
URL http://arxiv.org/abs/1605.05296v2
PDF http://arxiv.org/pdf/1605.05296v2.pdf
PWC https://paperswithcode.com/paper/dataflow-matrix-machines-as-programmable
Repo https://github.com/anhinga/typed-dmms
Framework none
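
A toy sketch of the central idea: the weight matrix is itself a stream the network can rewrite, so a "higher-order" rule edits weights while the network runs. The update rule here is invented for illustration; the paper develops a much richer discipline of stream and neuron types.

```python
# Hedged toy sketch of a self-modifying recurrent step, dataflow-matrix style.
import numpy as np

state = np.array([1.0, 0.0, 0.0])
W = np.eye(3) * 0.9                      # network topology/weights held as data

for step in range(5):
    state = np.tanh(W @ state)           # ordinary neuron update
    W[0, 2] += 0.1 * state[0]            # higher-order neuron edits the matrix (made-up rule)
    print(step, state.round(3), "W[0,2] =", round(W[0, 2], 3))
```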

Deep Image Retrieval: Learning global representations for image search

Title Deep Image Retrieval: Learning global representations for image search
Authors Albert Gordo, Jon Almazan, Jerome Revaud, Diane Larlus
Abstract We propose a novel approach for instance-level image retrieval. It produces a global and compact fixed-length representation for each image by aggregating many region-wise descriptors. In contrast to previous works employing pre-trained deep networks as a black box to produce features, our method leverages a deep architecture trained for the specific task of image retrieval. Our contribution is twofold: (i) we leverage a ranking framework to learn convolution and projection weights that are used to build the region features; and (ii) we employ a region proposal network to learn which regions should be pooled to form the final global descriptor. We show that using clean training data is key to the success of our approach. To that aim, we use a large scale but noisy landmark dataset and develop an automatic cleaning approach. The proposed architecture produces a global image representation in a single forward pass. Our approach significantly outperforms previous approaches based on global descriptors on standard datasets. It even surpasses most prior works based on costly local descriptor indexing and spatial verification. Additional material is available at www.xrce.xerox.com/Deep-Image-Retrieval.
Tasks Image Retrieval
Published 2016-04-05
URL http://arxiv.org/abs/1604.01325v2
PDF http://arxiv.org/pdf/1604.01325v2.pdf
PWC https://paperswithcode.com/paper/deep-image-retrieval-learning-global
Repo https://github.com/talal579/Deep-image-matching
Framework none
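
The aggregation at the heart of this approach can be sketched R-MAC-style: max-pool conv activations over several regions, l2-normalize each region vector, and sum into one fixed-length descriptor. The fixed region grid below is our simplifying assumption; the paper learns the pooling regions with a region proposal network.

```python
# Hedged sketch of region-pooled global descriptor aggregation.
import numpy as np

rng = np.random.default_rng(0)
fmap = rng.normal(size=(256, 20, 20))   # conv feature map (C, H, W)

regions = [(0, 0, 12, 12), (0, 8, 12, 20), (8, 0, 20, 12), (8, 8, 20, 20)]
descriptor = np.zeros(256)
for y0, x0, y1, x1 in regions:
    r = fmap[:, y0:y1, x0:x1].max(axis=(1, 2))   # max-pool the region
    descriptor += r / np.linalg.norm(r)          # l2-normalize, then sum
descriptor /= np.linalg.norm(descriptor)         # final global descriptor

print(descriptor.shape)  # one compact vector per image, compared by dot product
```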

Large-Scale Image Retrieval with Attentive Deep Local Features

Title Large-Scale Image Retrieval with Attentive Deep Local Features
Authors Hyeonwoo Noh, Andre Araujo, Jack Sim, Tobias Weyand, Bohyung Han
Abstract We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature). The new feature is based on convolutional neural networks, which are trained only with image-level annotations on a landmark image dataset. To identify semantically useful local features for image retrieval, we also propose an attention mechanism for keypoint selection, which shares most network layers with the descriptor. This framework can be used for image retrieval as a drop-in replacement for other keypoint detectors and descriptors, enabling more accurate feature matching and geometric verification. Our system produces reliable confidence scores to reject false positives; in particular, it is robust against queries that have no correct match in the database. To evaluate the proposed descriptor, we introduce a new large-scale dataset, referred to as the Google-Landmarks dataset, which involves challenges in both database and query images, such as background clutter, partial occlusion, multiple landmarks, and objects at variable scales. We show that DELF outperforms state-of-the-art global and local descriptors in the large-scale setting by significant margins. Code and dataset can be found at the project webpage: https://github.com/tensorflow/models/tree/master/research/delf .
Tasks Image Retrieval
Published 2016-12-19
URL http://arxiv.org/abs/1612.06321v4
PDF http://arxiv.org/pdf/1612.06321v4.pdf
PWC https://paperswithcode.com/paper/large-scale-image-retrieval-with-attentive
Repo https://github.com/jandaldrop/landmark-recognition-challenge
Framework tf
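
A minimal PyTorch sketch of the attention mechanism for keypoint selection: a small softplus-activated score head weights each spatial location, and attention-weighted pooling lets the model train from image-level labels alone. Layer sizes are illustrative, not DELF's exact configuration.

```python
# Hedged sketch of attention-weighted pooling over dense local features.
import torch
import torch.nn as nn

features = torch.randn(1, 1024, 14, 14)      # dense local descriptors
attn = nn.Sequential(nn.Conv2d(1024, 512, 1), nn.ReLU(),
                     nn.Conv2d(512, 1, 1), nn.Softplus())
scores = attn(features)                       # (1, 1, 14, 14) keypoint scores

# Training signal: attention-weighted average feature -> landmark classifier.
pooled = (features * scores).sum(dim=(2, 3)) / scores.sum(dim=(2, 3))
print(pooled.shape)                           # torch.Size([1, 1024])

# At retrieval time, keep only the highest-scoring locations as keypoints.
topk = scores.flatten().topk(40).indices
```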

Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach

Title Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach
Authors Satoshi Hara, Kohei Hayashi
Abstract Tree ensembles, such as random forests and boosted trees, are renowned for their high prediction performance. However, their interpretability is critically limited by their enormous complexity. In this study, we present a method to make a complex tree ensemble interpretable by simplifying the model. Specifically, we formalize the simplification of tree ensembles as a model selection problem. Given a complex tree ensemble, we aim to obtain the simplest representation that is essentially equivalent to the original one. To this end, we derive a Bayesian model selection algorithm that optimizes the simplified model while maintaining the prediction performance. Our numerical experiments on several datasets showed that complicated tree ensembles can be reasonably approximated by interpretable models.
Tasks Model Selection
Published 2016-06-29
URL http://arxiv.org/abs/1606.09066v3
PDF http://arxiv.org/pdf/1606.09066v3.pdf
PWC https://paperswithcode.com/paper/making-tree-ensembles-interpretable-a
Repo https://github.com/sato9hara/defragTrees
Framework none
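
A hedged illustration of the simplification objective, using distillation into a shallow surrogate tree rather than the paper's Bayesian model selection algorithm: fit a large forest, then train one depth-3 tree on the forest's own predictions and measure how faithfully it tracks the ensemble.

```python
# Not the paper's method: a surrogate-tree stand-in for ensemble simplification.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Train an interpretable depth-3 tree to mimic the forest's predictions.
simple = DecisionTreeClassifier(max_depth=3).fit(X, forest.predict(X))
print("forest accuracy:", forest.score(X, y))
print("depth-3 surrogate agreement with forest:", simple.score(X, forest.predict(X)))
```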

Continuous Deep Q-Learning with Model-based Acceleration

Title Continuous Deep Q-Learning with Model-based Acceleration
Authors Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine
Abstract Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm, which we call normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods. The NAF representation allows us to apply Q-learning with experience replay to continuous tasks, and substantially improves performance on a set of simulated robotic control tasks. To further improve the efficiency of our approach, we explore the use of learned models for accelerating model-free reinforcement learning. We show that iteratively refitted local linear models are especially effective for this, and demonstrate substantially faster learning on domains where such models are applicable.
Tasks Continuous Control, Q-Learning
Published 2016-03-02
URL http://arxiv.org/abs/1603.00748v1
PDF http://arxiv.org/pdf/1603.00748v1.pdf
PWC https://paperswithcode.com/paper/continuous-deep-q-learning-with-model-based
Repo https://github.com/ikostrikov/pytorch-rl
Framework pytorch
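
The NAF decomposition is concrete enough to sketch: Q(s,a) = V(s) - 0.5 (a - mu(s))^T P(s) (a - mu(s)) with P(s) = L(s) L(s)^T positive semi-definite, so argmax_a Q(s,a) = mu(s) in closed form, which is what makes Q-learning tractable for continuous actions. Network sizes below are illustrative.

```python
# Hedged sketch of the normalized advantage function (NAF) Q-value.
import torch
import torch.nn as nn

state_dim, act_dim = 8, 2
trunk = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
v_head = nn.Linear(64, 1)                        # state value V(s)
mu_head = nn.Linear(64, act_dim)                 # greedy action mu(s)
l_head = nn.Linear(64, act_dim * act_dim)        # entries of L, with P = L L^T

def q_value(s, a):
    h = trunk(s)
    L = torch.tril(l_head(h).view(-1, act_dim, act_dim))  # lower-triangular factor
    P = L @ L.transpose(1, 2)                    # positive semi-definite precision
    d = (a - mu_head(h)).unsqueeze(-1)
    adv = -0.5 * (d.transpose(1, 2) @ P @ d).squeeze(-1)   # quadratic advantage <= 0
    return v_head(h) + adv                       # Q(s,a) = V(s) + A(s,a)

s, a = torch.randn(4, state_dim), torch.randn(4, act_dim)
print(q_value(s, a).shape)  # torch.Size([4, 1])
```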