May 7, 2019

Paper Group AWR 19

Query-Efficient Imitation Learning for End-to-End Autonomous Driving

Title Query-Efficient Imitation Learning for End-to-End Autonomous Driving
Authors Jiakai Zhang, Kyunghyun Cho
Abstract One way to approach end-to-end autonomous driving is to learn a policy function that maps from a sensory input, such as an image frame from a front-facing camera, to a driving action, by imitating an expert driver, or a reference policy. This can be done by supervised learning, where a policy function is tuned to minimize the difference between the predicted and ground-truth actions. A policy function trained in this way, however, is known to suffer from unexpected behaviours due to the mismatch between the states reachable by the reference policy and trained policy functions. More advanced algorithms for imitation learning, such as DAgger, address this issue by iteratively collecting training examples from both reference and trained policies. These algorithms often require a large number of queries to a reference policy, which is undesirable as the reference policy is often expensive. In this paper, we propose an extension of DAgger, called SafeDAgger, that is query-efficient and more suitable for end-to-end autonomous driving. We evaluate the proposed SafeDAgger in a car racing simulator and show that it indeed requires fewer queries to a reference policy. We observe a significant speed up in convergence, which we conjecture to be due to the effect of automated curriculum learning.
Tasks Autonomous Driving, Car Racing, Imitation Learning
Published 2016-05-20
URL http://arxiv.org/abs/1605.06450v1
PDF http://arxiv.org/pdf/1605.06450v1.pdf
PWC https://paperswithcode.com/paper/query-efficient-imitation-learning-for-end-to
Repo https://github.com/mbhenaff/EEN
Framework pytorch
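
The query-saving idea in the abstract above can be illustrated with a toy rollout in which the expert is consulted only in states flagged as unsafe. This is a minimal sketch under stated assumptions, not the SafeDAgger algorithm from the paper: the 1-D state, `reference_policy`, the `is_safe` gate, and `collect_with_gate` are all hypothetical names introduced for illustration.

```python
def reference_policy(state):
    # Hypothetical expert: steer back toward the lane centre (state 0).
    return -0.5 * state


def collect_with_gate(policy, is_safe, start_states, steps=20):
    """Roll out a mixed policy and query the expert only in unsafe states.

    Returns the expert-labelled (state, action) pairs and the query count.
    """
    dataset, queries = [], 0
    for s in start_states:
        for _ in range(steps):
            if is_safe(s):
                a = policy(s)            # trained policy drives, no query
            else:
                a = reference_policy(s)  # one expert query
                queries += 1
                dataset.append((s, a))   # keep the labelled example
            s = s + a                    # toy dynamics (assumption)
    return dataset, queries


# A do-nothing learner and a simple distance-based gate.
data, n_queries = collect_with_gate(
    policy=lambda s: 0.0,
    is_safe=lambda s: abs(s) < 1.0,
    start_states=[4.0],
)
```

In this toy run the expert is queried only while the state lies outside the safe band, so the number of queries is far smaller than the number of rollout steps.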

Embracing data abundance: BookTest Dataset for Reading Comprehension

Title Embracing data abundance: BookTest Dataset for Reading Comprehension
Authors Ondrej Bajgar, Rudolf Kadlec, Jan Kleindienst
Abstract There is a practically unlimited amount of natural language data available. Still, recent work in text comprehension has focused on datasets which are small relative to current computing possibilities. This article makes a case for the community to move to larger data and, as a step in that direction, proposes the BookTest, a new dataset similar to the popular Children’s Book Test (CBT) but more than 60 times larger. We show that training on the new data improves the accuracy of our Attention-Sum Reader model on the original CBT test data by a much larger margin than many recent attempts to improve the model architecture. On one version of the dataset our ensemble even exceeds the human baseline provided by Facebook. We then show in our own human study that there is still space for further improvement.
Tasks Reading Comprehension
Published 2016-10-04
URL http://arxiv.org/abs/1610.00956v1
PDF http://arxiv.org/pdf/1610.00956v1.pdf
PWC https://paperswithcode.com/paper/embracing-data-abundance-booktest-dataset-for
Repo https://github.com/facebookresearch/ParlAI
Framework pytorch

Collaborative Filtering with User-Item Co-Autoregressive Models

Title Collaborative Filtering with User-Item Co-Autoregressive Models
Authors Chao Du, Chongxuan Li, Yin Zheng, Jun Zhu, Bo Zhang
Abstract Deep neural networks have shown promise in collaborative filtering (CF). However, existing neural approaches are either user-based or item-based, which cannot leverage all the underlying information explicitly. We propose CF-UIcA, a neural co-autoregressive model for CF tasks, which exploits the structural correlation in the domains of both users and items. The co-autoregression allows extra desired properties to be incorporated for different tasks. Furthermore, we develop an efficient stochastic learning algorithm to handle large scale datasets. We evaluate CF-UIcA on two popular benchmarks: MovieLens 1M and Netflix, and achieve state-of-the-art performance in both rating prediction and top-N recommendation tasks, which demonstrates the effectiveness of CF-UIcA.
Tasks
Published 2016-12-21
URL http://arxiv.org/abs/1612.07146v3
PDF http://arxiv.org/pdf/1612.07146v3.pdf
PWC https://paperswithcode.com/paper/collaborative-filtering-with-user-item-co
Repo https://github.com/thu-ml/CF-UIcA
Framework none

Collaborative Filtering with Recurrent Neural Networks

Title Collaborative Filtering with Recurrent Neural Networks
Authors Robin Devooght, Hugues Bersini
Abstract We show that collaborative filtering can be viewed as a sequence prediction problem, and that given this interpretation, recurrent neural networks offer a very competitive approach. In particular we study how the long short-term memory (LSTM) can be applied to collaborative filtering, and how it compares to standard nearest neighbors and matrix factorization methods on movie recommendation. We show that the LSTM is competitive in all aspects, and largely outperforms other methods in terms of item coverage and short term predictions.
Tasks
Published 2016-08-26
URL http://arxiv.org/abs/1608.07400v2
PDF http://arxiv.org/pdf/1608.07400v2.pdf
PWC https://paperswithcode.com/paper/collaborative-filtering-with-recurrent-neural
Repo https://github.com/rdevooght/sequence-based-recommendations
Framework none
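
Viewing collaborative filtering as sequence prediction, as the abstract above describes, amounts to turning each user's time-ordered history into (prefix, next item) training pairs. The sketch below shows only that data preparation step; the function name and the `(timestamp, item_id)` event format are assumptions made for illustration, not taken from the paper or its repository.

```python
def next_item_examples(events):
    """Build next-item prediction pairs from one user's interaction log.

    events: iterable of (timestamp, item_id) tuples (hypothetical format).
    Returns a list of (history_prefix, next_item) pairs.
    """
    # Order items by the time the user consumed them.
    ordered = [item for _, item in sorted(events)]
    # Each prefix of the history is an input; the following item is the target.
    return [(ordered[:i], ordered[i]) for i in range(1, len(ordered))]


pairs = next_item_examples([(3, "c"), (1, "a"), (2, "b")])
```

A sequence model such as an LSTM would then be trained to predict the target item from each prefix.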

Deep Markov Random Field for Image Modeling

Title Deep Markov Random Field for Image Modeling
Authors Zhirong Wu, Dahua Lin, Xiaoou Tang
Abstract Markov Random Fields (MRFs), a formulation widely used in generative image modeling, have long been plagued by the lack of expressive power. This issue is primarily due to the fact that conventional MRF formulations tend to use simplistic factors to capture local patterns. In this paper, we move beyond such limitations, and propose a novel MRF model that uses fully-connected neurons to express the complex interactions among pixels. Through theoretical analysis, we reveal an inherent connection between this model and recurrent neural networks, and thereon derive an approximated feed-forward network that couples multiple RNNs along opposite directions. This formulation combines the expressive power of deep neural networks and the cyclic dependency structure of MRF in a unified model, bringing the modeling capability to a new level. The feed-forward approximation also allows it to be efficiently learned from data. Experimental results on a variety of low-level vision tasks show notable improvement over the state of the art.
Tasks
Published 2016-09-07
URL http://arxiv.org/abs/1609.02036v1
PDF http://arxiv.org/pdf/1609.02036v1.pdf
PWC https://paperswithcode.com/paper/deep-markov-random-field-for-image-modeling
Repo https://github.com/zhirongw/deep-mrf
Framework none

PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors

Title PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors
Authors Vassileios Balntas, Edward Johns, Lilian Tang, Krystian Mikolajczyk
Abstract In this paper we propose a new approach for learning local descriptors for matching image patches. It has recently been demonstrated that descriptors based on convolutional neural networks (CNN) can significantly improve the matching performance. Unfortunately their computational complexity is prohibitive for any practical application. We address this problem and propose a CNN based descriptor with improved matching performance, significantly reduced training and execution time, as well as low dimensionality. We propose to train the network with triplets of patches that include a positive and a negative pair. To that end we introduce a new loss function that exploits the relations within the triplets. We compare our approach to the recently introduced MatchNet and DeepCompare and demonstrate the advantages of our descriptor in terms of performance, memory footprint and speed; i.e., when run on a GPU, the extraction time of our 128-dimensional feature is comparable to the fastest available binary descriptors such as BRIEF and ORB.
Tasks
Published 2016-01-19
URL http://arxiv.org/abs/1601.05030v1
PDF http://arxiv.org/pdf/1601.05030v1.pdf
PWC https://paperswithcode.com/paper/pn-net-conjoined-triple-deep-network-for
Repo https://github.com/vbalnt/pnnet
Framework torch
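
Training on triplets means each example contains an anchor patch, a matching (positive) patch, and a non-matching (negative) patch. The abstract does not spell out the paper's own loss, so the sketch below uses the generic margin-based triplet loss as a stand-in; it is explicitly not the PN-Net loss, and the function names and the margin value are assumptions.

```python
import math


def l2(a, b):
    # Euclidean distance between two descriptor vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)


# Well-separated triplet: positive is close, negative is far.
easy = triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Violating triplet: the "positive" is farther away than the negative.
hard = triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Only the violating triplet contributes a non-zero loss, which is what pushes matching descriptors together and non-matching ones apart.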

Improving patch-based scene text script identification with ensembles of conjoined networks

Title Improving patch-based scene text script identification with ensembles of conjoined networks
Authors Lluis Gomez, Anguelos Nicolaou, Dimosthenis Karatzas
Abstract This paper focuses on the problem of script identification in scene text images. Facing this problem with state of the art CNN classifiers is not straightforward, as they fail to address a key characteristic of scene text instances: their extremely variable aspect ratio. Instead of resizing input images to a fixed aspect ratio as in the typical use of holistic CNN classifiers, we propose here a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class. We describe a novel method based on the use of ensembles of conjoined networks to jointly learn discriminative stroke-parts representations and their relative importance in a patch-based classification scheme. Our experiments with this learning procedure demonstrate state-of-the-art results in two public script identification datasets. In addition, we propose a new public benchmark dataset for the evaluation of multi-lingual scene text end-to-end reading systems. Experiments done in this dataset demonstrate the key role of script identification in a complete end-to-end system that combines our script identification method with a previously published text detector and an off-the-shelf OCR engine.
Tasks Optical Character Recognition
Published 2016-02-24
URL http://arxiv.org/abs/1602.07480v2
PDF http://arxiv.org/pdf/1602.07480v2.pdf
PWC https://paperswithcode.com/paper/improving-patch-based-scene-text-script
Repo https://github.com/lluisgomez/script_identification
Framework none

EpistAid: Interactive Interface for Document Filtering in Evidence-based Health Care

Title EpistAid: Interactive Interface for Document Filtering in Evidence-based Health Care
Authors Ivania Donoso, Denis Parra
Abstract Evidence-based health care (EBHC) is an important practice of medicine which attempts to provide systematic scientific evidence to answer clinical questions. In this context, Epistemonikos (www.epistemonikos.org) is one of the first and most important online systems in the field, providing an interface that supports users in searching and filtering scientific articles for practicing EBHC. The system currently requires a large amount of expert human effort, with close to 500 physicians manually curating articles to be used in the platform. To scale up to the large and continuous volume of data needed to keep the system updated, we introduce EpistAid, an interactive intelligent interface which supports clinicians in the process of curating documents for Epistemonikos within lists of papers called evidence matrices. We introduce the characteristics, design and algorithms of our solution, as well as a prototype implementation and a case study to show how our solution addresses the information overload problem in this area.
Tasks
Published 2016-11-07
URL http://arxiv.org/abs/1611.02119v1
PDF http://arxiv.org/pdf/1611.02119v1.pdf
PWC https://paperswithcode.com/paper/epistaid-interactive-interface-for-document
Repo https://github.com/indonoso/EpisteAid
Framework none

Learning in Quantum Control: High-Dimensional Global Optimization for Noisy Quantum Dynamics

Title Learning in Quantum Control: High-Dimensional Global Optimization for Noisy Quantum Dynamics
Authors Pantita Palittapongarnpim, Peter Wittek, Ehsan Zahedinejad, Shakib Vedaie, Barry C. Sanders
Abstract Quantum control is valuable for various quantum technologies such as high-fidelity gates for universal quantum computing, adaptive quantum-enhanced metrology, and ultra-cold atom manipulation. Although supervised machine learning and reinforcement learning are widely used for optimizing control parameters in classical systems, quantum control for parameter optimization is mainly pursued via gradient-based greedy algorithms. Although the quantum fitness landscape is often compatible with greedy algorithms, sometimes greedy algorithms yield poor results, especially for large-dimensional quantum systems. We employ differential evolution algorithms to circumvent the stagnation problem of non-convex optimization. We improve quantum control fidelity for noisy systems by averaging over the objective function. To reduce computational cost, we introduce heuristics for early termination of runs and for adaptive selection of search subspaces. Our implementation is massively parallel and vectorized to reduce run time even further. We demonstrate our methods with two examples, namely quantum phase estimation and quantum gate design, for which we achieve fidelity and scalability superior to those obtained using greedy algorithms.
Tasks
Published 2016-07-12
URL http://arxiv.org/abs/1607.03428v3
PDF http://arxiv.org/pdf/1607.03428v3.pdf
PWC https://paperswithcode.com/paper/learning-in-quantum-control-high-dimensional
Repo https://github.com/PanPalitta/phase_estimation
Framework none
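
Differential evolution, the optimizer named in the abstract above, is a population-based, gradient-free search. The sketch below is a textbook "rand/1/bin" variant applied to a toy objective; it assumes nothing about the authors' implementation, and it omits the noise averaging, early termination, and subspace selection heuristics they describe. All names and parameter values are illustrative.

```python
import random


def differential_evolution(f, bounds, pop_size=20, generations=100,
                           F=0.8, CR=0.9, seed=0):
    """Minimise f over a box; returns (best_point, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < CR:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            ft = f(trial)
            if ft <= fitness[i]:  # greedy one-to-one replacement
                pop[i], fitness[i] = trial, ft
    best = min(range(pop_size), key=lambda k: fitness[k])
    return pop[best], fitness[best]


def sphere(x):
    # Toy objective with its minimum at the origin.
    return sum(v * v for v in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

For a noisy objective, one simple option in the spirit of the abstract is to pass in a wrapper that averages several evaluations of `f` at each point.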

Deep Variational Information Bottleneck

Title Deep Variational Information Bottleneck
Authors Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy
Abstract We present a variational approximation to the information bottleneck of Tishby et al. (1999). This variational approach allows us to parameterize the information bottleneck model using a neural network and leverage the reparameterization trick for efficient training. We call this method “Deep Variational Information Bottleneck”, or Deep VIB. We show that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack.
Tasks Adversarial Attack
Published 2016-12-01
URL https://arxiv.org/abs/1612.00410v7
PDF https://arxiv.org/pdf/1612.00410v7.pdf
PWC https://paperswithcode.com/paper/deep-variational-information-bottleneck
Repo https://github.com/AliLotfi92/Deep_Variational_Information_Bottlenck
Framework tf
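
The two ingredients the abstract above mentions, a variational bottleneck term and the reparameterization trick, can be sketched in a few lines. This is a minimal illustration that assumes a diagonal Gaussian encoder and a standard normal prior on the latent code; the function names and the default `beta` are hypothetical, and the task term `nll` is taken as given rather than computed by a real decoder.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps with eps ~ N(0, 1), so z is a
    # differentiable function of mu and log_var.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ).
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vib_loss(nll, mu, log_var, beta=1e-3):
    # Task loss plus beta times the compression penalty.
    return nll + beta * kl_to_standard_normal(mu, log_var)


z = reparameterize([0.0, 1.0], [0.0, 0.0], random.Random(0))
loss = vib_loss(nll=2.0, mu=[1.0], log_var=[0.0], beta=0.1)
```

Larger values of `beta` compress the representation more aggressively at the expense of the task term.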

PolyNet: A Pursuit of Structural Diversity in Very Deep Networks

Title PolyNet: A Pursuit of Structural Diversity in Very Deep Networks
Authors Xingcheng Zhang, Zhizhong Li, Chen Change Loy, Dahua Lin
Abstract A number of studies have shown that increasing the depth or width of convolutional networks is a rewarding approach to improve the performance of image recognition. In our study, however, we observed difficulties along both directions. On one hand, the pursuit for very deep networks is met with a diminishing return and increased training difficulty; on the other hand, widening a network would result in a quadratic growth in both computational cost and memory demand. These difficulties motivate us to explore structural diversity in designing deep networks, a new dimension beyond just depth and width. Specifically, we present a new family of modules, namely the PolyInception, which can be flexibly inserted in isolation or in a composition as replacements of different parts of a network. Choosing PolyInception modules with the guidance of architectural efficiency can improve the expressive power while preserving comparable computational cost. The Very Deep PolyNet, designed following this direction, demonstrates substantial improvements over the state-of-the-art on the ILSVRC 2012 benchmark. Compared to Inception-ResNet-v2, it reduces the top-5 validation error on single crops from 4.9% to 4.25%, and that on multi-crops from 3.7% to 3.45%.
Tasks Image Classification
Published 2016-11-17
URL http://arxiv.org/abs/1611.05725v2
PDF http://arxiv.org/pdf/1611.05725v2.pdf
PWC https://paperswithcode.com/paper/polynet-a-pursuit-of-structural-diversity-in
Repo https://github.com/innerlee/Publications
Framework none

LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

Title LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks
Authors Hendrik Strobelt, Sebastian Gehrmann, Hanspeter Pfister, Alexander M. Rush
Abstract Recurrent neural networks, and in particular long short-term memory (LSTM) networks, are a remarkably effective tool for sequence modeling that learn a dense black-box hidden representation of their sequential input. Researchers interested in better understanding these models have studied the changes in hidden state representations over time and noticed some interpretable patterns but also significant noise. In this work, we present LSTMVIS, a visual analysis tool for recurrent neural networks with a focus on understanding these hidden state dynamics. The tool allows users to select a hypothesis input range to focus on local state changes, to match these state changes to similar patterns in a large data set, and to align these results with structural annotations from their domain. We show several use cases of the tool for analyzing specific hidden state properties on datasets containing nesting, phrase structure, and chord progressions, and demonstrate how the tool can be used to isolate patterns for further statistical analysis. We characterize the domain, the different stakeholders, and their goals and tasks.
Tasks
Published 2016-06-23
URL http://arxiv.org/abs/1606.07461v2
PDF http://arxiv.org/pdf/1606.07461v2.pdf
PWC https://paperswithcode.com/paper/lstmvis-a-tool-for-visual-analysis-of-hidden
Repo https://github.com/HendrikStrobelt/LSTMVis
Framework tf

Multimodal Attention for Neural Machine Translation

Title Multimodal Attention for Neural Machine Translation
Authors Ozan Caglayan, Loïc Barrault, Fethi Bougares
Abstract The attention mechanism is an important part of neural machine translation (NMT), where it was reported to produce richer source representations compared to fixed-length encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultaneously focuses on an image and its natural language description for generating a description in another language. We train several variants of our proposed attention mechanism on the Multi30k multilingual image captioning dataset. We show that a dedicated attention for each modality achieves gains of up to 1.6 points in BLEU and METEOR compared to a textual NMT baseline.
Tasks Image Captioning, Machine Translation
Published 2016-09-13
URL http://arxiv.org/abs/1609.03976v1
PDF http://arxiv.org/pdf/1609.03976v1.pdf
PWC https://paperswithcode.com/paper/multimodal-attention-for-neural-machine
Repo https://github.com/lium-lst/nmtpy
Framework none

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

Title DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
Authors Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille
Abstract In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or ‘atrous convolution’, as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed “DeepLab” system sets the new state of the art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
Tasks Semantic Segmentation
Published 2016-06-02
URL http://arxiv.org/abs/1606.00915v2
PDF http://arxiv.org/pdf/1606.00915v2.pdf
PWC https://paperswithcode.com/paper/deeplab-semantic-image-segmentation-with-deep
Repo https://github.com/violin0847/crowdcounting
Framework none
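
Atrous (dilated) convolution, as described in the abstract above, samples the input with gaps of `rate` between kernel taps, so the field of view grows while the number of weights stays fixed. The 1-D, pure-Python sketch below is only meant to make that concrete; the function name is hypothetical and it is not the DeepLab implementation. Like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped).

```python
def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D dilated convolution with `rate - 1` gaps between taps."""
    # Effective field of view is (len(kernel) - 1) * rate + 1 samples.
    span = (len(kernel) - 1) * rate
    out = []
    for i in range(len(signal) - span):
        out.append(sum(k * signal[i + j * rate]
                       for j, k in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]                            # same three weights both times
dense = atrous_conv1d(signal, kernel, rate=1)  # looks at 3 samples
wide = atrous_conv1d(signal, kernel, rate=2)   # looks at 5 samples
```

Running the same three-tap kernel at several rates in parallel is the basic idea behind the atrous spatial pyramid pooling mentioned in the abstract.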

ADAGIO: Fast Data-aware Near-Isometric Linear Embeddings

Title ADAGIO: Fast Data-aware Near-Isometric Linear Embeddings
Authors Jarosław Błasiok, Charalampos E. Tsourakakis
Abstract Many important applications, including signal reconstruction, parameter estimation, and signal processing in a compressed domain, rely on a low-dimensional representation of the dataset that preserves all pairwise distances between the data points and leverages the inherent geometric structure that is typically present. Recently Hedge, Sankaranarayanan, Yin and Baraniuk proposed the first data-aware near-isometric linear embedding which achieves the best of both worlds. However, their method NuMax does not scale to large-scale datasets. Our main contribution is a simple, data-aware, near-isometric linear dimensionality reduction method which significantly outperforms a state-of-the-art method with respect to scalability while achieving high quality near-isometries. Furthermore, our method comes with strong worst-case theoretical guarantees that allow us to guarantee the quality of the obtained near-isometry. We verify experimentally the efficiency of our method on numerous real-world datasets, where we find that our method (<10 secs) is more than 3,000× faster than the state-of-the-art method (>9 hours) on medium scale datasets with 60,000 data points in 784 dimensions. Finally, we use our method as a preprocessing step to increase the computational efficiency of a classification application and for speeding up approximate nearest neighbor queries.
Tasks Dimensionality Reduction
Published 2016-09-17
URL http://arxiv.org/abs/1609.05388v1
PDF http://arxiv.org/pdf/1609.05388v1.pdf
PWC https://paperswithcode.com/paper/adagio-fast-data-aware-near-isometric-linear
Repo https://github.com/tsourolampis/Adagio
Framework none