October 20, 2019

2836 words 14 mins read

Paper Group AWR 261

Behavioral Cloning from Observation. Learning to Drive in a Day. DOOBNet: Deep Object Occlusion Boundary Detection from an Image. Concentrated Differentially Private Gradient Descent with Adaptive per-Iteration Privacy Budget. Sparsity in Deep Neural Networks - An Empirical Investigation with TensorQuant. Context is Everything: Finding Meaning Stat …

Behavioral Cloning from Observation

Title Behavioral Cloning from Observation
Authors Faraz Torabi, Garrett Warnell, Peter Stone
Abstract Humans often learn how to perform tasks via imitation: they observe others perform a task, and then very quickly infer the appropriate actions to take based on their observations. While extending this paradigm to autonomous agents is a well-studied problem in general, there are two particular aspects that have largely been overlooked: (1) that the learning is done from observation only (i.e., without explicit action information), and (2) that the learning is typically done very quickly. In this work, we propose a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO) that aims to provide improved performance with respect to both of these aspects. First, we allow the agent to acquire experience in a self-supervised fashion. This experience is used to develop a model which is then utilized to learn a particular task by observing an expert perform that task without knowledge of the specific actions taken. We experimentally compare BCO to imitation learning methods, including the state-of-the-art, generative adversarial imitation learning (GAIL) technique, and we show comparable task performance in several different simulation domains while exhibiting increased learning speed after expert trajectories become available.
Tasks Imitation Learning
Published 2018-05-04
URL http://arxiv.org/abs/1805.01954v2
PDF http://arxiv.org/pdf/1805.01954v2.pdf
PWC https://paperswithcode.com/paper/behavioral-cloning-from-observation
Repo https://github.com/montaserFath/BCO
Framework pytorch
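
The linked repo is a PyTorch implementation. As a rough, hedged illustration of the two BCO phases (learn an inverse dynamics model from self-supervised experience, then infer the expert's actions from state-only demonstrations and clone them), the sketch below uses random tensors in place of a real environment; the dimensions, network sizes, and training settings are placeholders, not the paper's.

```python
# Hedged sketch of the two BCO phases (hypothetical dimensions and synthetic data,
# not the paper's architectures or hyperparameters).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2

# Phase 1: inverse dynamics model learned from the agent's own experience.
inv_dyn = nn.Sequential(nn.Linear(2 * STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))
# Self-supervised transitions (s_t, a_t, s_{t+1}); random stand-ins here.
s, a, s_next = torch.randn(512, STATE_DIM), torch.randn(512, ACTION_DIM), torch.randn(512, STATE_DIM)
opt = torch.optim.Adam(inv_dyn.parameters(), lr=1e-3)
for _ in range(200):
    pred_a = inv_dyn(torch.cat([s, s_next], dim=1))
    loss = nn.functional.mse_loss(pred_a, a)
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: infer the expert's (unobserved) actions from state-only demonstrations,
# then clone them with a policy network.
expert_s, expert_s_next = torch.randn(256, STATE_DIM), torch.randn(256, STATE_DIM)
with torch.no_grad():
    inferred_a = inv_dyn(torch.cat([expert_s, expert_s_next], dim=1))

policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    loss = nn.functional.mse_loss(policy(expert_s), inferred_a)
    opt.zero_grad(); loss.backward(); opt.step()
```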

Learning to Drive in a Day

Title Learning to Drive in a Day
Authors Alex Kendall, Jeffrey Hawke, David Janz, Przemyslaw Mazur, Daniele Reda, John-Mark Allen, Vinh-Dieu Lam, Alex Bewley, Amar Shah
Abstract We demonstrate the first application of deep reinforcement learning to autonomous driving. From randomly initialised parameters, our model is able to learn a policy for lane following in a handful of training episodes using a single monocular image as input. We provide a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control. We use a continuous, model-free deep reinforcement learning algorithm, with all exploration and optimisation performed on-vehicle. This demonstrates a new framework for autonomous driving which moves away from reliance on defined logical rules, mapping, and direct supervision. We discuss the challenges and opportunities to scale this approach to a broader range of autonomous driving tasks.
Tasks Autonomous Driving
Published 2018-07-01
URL http://arxiv.org/abs/1807.00412v2
PDF http://arxiv.org/pdf/1807.00412v2.pdf
PWC https://paperswithcode.com/paper/learning-to-drive-in-a-day
Repo https://github.com/nautilusPrime/autodrive_ddpg
Framework none
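
The reward the abstract describes is simple to state in code: accumulate distance travelled each step and end the episode when the safety driver takes over. The snippet below is only an illustration of that reward/termination structure; the environment, speed readings, and intervention signal are hypothetical stand-ins, not the on-vehicle setup.

```python
# Illustrative reward/termination logic for "distance travelled without the
# safety driver taking control". Speeds and the takeover flag are placeholders.
import random

def run_episode(policy, max_steps=1000, dt=0.1):
    total_distance = 0.0
    for _ in range(max_steps):
        speed = policy()                      # commanded/measured speed in m/s (stand-in)
        intervened = random.random() < 0.01   # placeholder for a real takeover signal
        if intervened:
            break                             # episode ends, no further reward
        total_distance += speed * dt          # per-step reward = distance covered
    return total_distance

episode_return = run_episode(policy=lambda: 5.0)
print(f"episode return (metres driven): {episode_return:.1f}")
```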

DOOBNet: Deep Object Occlusion Boundary Detection from an Image

Title DOOBNet: Deep Object Occlusion Boundary Detection from an Image
Authors Guoxia Wang, Xiaohui Liang, Frederick W. B. Li
Abstract Object occlusion boundary detection is a fundamental and crucial research problem in computer vision. It is challenging to solve because of the extreme boundary/non-boundary class imbalance encountered when training an object occlusion boundary detector. In this paper, we propose to address this class imbalance by up-weighting the loss contribution of false negative and false positive examples with our novel Attention Loss function. We also propose a unified end-to-end multi-task deep object occlusion boundary detection network (DOOBNet) by sharing convolutional features to simultaneously predict object boundary and occlusion orientation. DOOBNet adopts an encoder-decoder structure with skip connections in order to automatically learn multi-scale and multi-level features. We significantly surpass the state-of-the-art on the PIOD dataset (ODS F-score of .702) and the BSDS ownership dataset (ODS F-score of .555), as well as improving the detection speed to 0.037s per image on the PIOD dataset.
Tasks Boundary Detection
Published 2018-06-11
URL http://arxiv.org/abs/1806.03772v3
PDF http://arxiv.org/pdf/1806.03772v3.pdf
PWC https://paperswithcode.com/paper/doobnet-deep-object-occlusion-boundary
Repo https://github.com/GuoxiaWang/DOOBNet
Framework tf
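
To make the class-imbalance mechanism concrete, the snippet below shows a class-balanced, focal-style binary loss that up-weights hard false positives and false negatives on a sparse boundary map. This is a generic stand-in for that family of losses, not the paper's exact Attention Loss formula; the alpha/gamma values are placeholders.

```python
# Hedged stand-in for loss re-weighting under extreme boundary/non-boundary
# imbalance; NOT the paper's exact Attention Loss, just the general mechanism.
import torch

def imbalance_aware_bce(logits, targets, alpha=0.9, gamma=2.0):
    """logits, targets: tensors of shape (N, 1, H, W); targets in {0, 1}."""
    p = torch.sigmoid(logits)
    pt = targets * p + (1 - targets) * (1 - p)                    # prob. of the true class
    class_weight = targets * alpha + (1 - targets) * (1 - alpha)  # rebalance rare boundary pixels
    focal_weight = (1 - pt) ** gamma                              # emphasise misclassified pixels
    loss = -class_weight * focal_weight * torch.log(pt.clamp(min=1e-6))
    return loss.mean()

logits = torch.randn(2, 1, 64, 64)
targets = (torch.rand(2, 1, 64, 64) > 0.97).float()  # sparse boundary ground truth
print(imbalance_aware_bce(logits, targets).item())
```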

Concentrated Differentially Private Gradient Descent with Adaptive per-Iteration Privacy Budget

Title Concentrated Differentially Private Gradient Descent with Adaptive per-Iteration Privacy Budget
Authors Jaewoo Lee, Daniel Kifer
Abstract Iterative algorithms, like gradient descent, are common tools for solving a variety of problems, such as model fitting. For this reason, there is interest in creating differentially private versions of them. However, their conversion to differentially private algorithms is often naive. For instance, a fixed number of iterations are chosen, the privacy budget is split evenly among them, and at each iteration, parameters are updated with a noisy gradient. In this paper, we show that gradient-based algorithms can be improved by a more careful allocation of privacy budget per iteration. Intuitively, at the beginning of the optimization, gradients are expected to be large, so that they do not need to be measured as accurately. However, as the parameters approach their optimal values, the gradients decrease and hence need to be measured more accurately. We add a basic line-search capability that helps the algorithm decide when more accurate gradient measurements are necessary. Our gradient descent algorithm works with the recently introduced zCDP version of differential privacy. It outperforms prior algorithms for model fitting and is competitive with the state-of-the-art for $(\epsilon,\delta)$-differential privacy, a strictly weaker definition than zCDP.
Tasks
Published 2018-08-28
URL http://arxiv.org/abs/1808.09501v1
PDF http://arxiv.org/pdf/1808.09501v1.pdf
PWC https://paperswithcode.com/paper/concentrated-differentially-private-gradient
Repo https://github.com/ppmlguy/DP-AGD
Framework none
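
The core idea, spending a zCDP budget per iteration and spending more when the noisy gradient stops making progress, can be sketched compactly. Under zCDP, the Gaussian mechanism with L2-sensitivity Δ and noise scale σ costs ρ = Δ²/(2σ²) per release. The doubling rule and toy objective below are illustrative assumptions, not the paper's exact line-search procedure.

```python
# Minimal sketch of spending a zCDP budget adaptively across gradient steps.
# The "increase accuracy when progress stalls" rule is an illustrative assumption.
import numpy as np

def noisy_gradient(grad, clip, rho):
    grad = grad * min(1.0, clip / (np.linalg.norm(grad) + 1e-12))  # bound L2 sensitivity
    sigma = clip * np.sqrt(1.0 / (2.0 * rho))                      # Gaussian mechanism calibrated to rho
    return grad + np.random.normal(0.0, sigma, size=grad.shape)

def dp_gd(loss, grad_fn, theta, total_rho=1.0, rho_step=0.01, clip=1.0, lr=0.1):
    spent = 0.0
    while spent + rho_step <= total_rho:
        g = noisy_gradient(grad_fn(theta), clip, rho_step)
        spent += rho_step
        if loss(theta - lr * g) < loss(theta):   # noisy step still makes progress
            theta = theta - lr * g
        else:                                    # gradient too noisy: measure more accurately next time
            rho_step *= 2.0
    return theta

# Toy quadratic objective (public, for illustration only).
A = np.diag([1.0, 5.0]); b = np.array([1.0, -2.0])
loss = lambda t: 0.5 * t @ A @ t - b @ t
grad = lambda t: A @ t - b
print(dp_gd(loss, grad, theta=np.zeros(2)))
```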

Sparsity in Deep Neural Networks - An Empirical Investigation with TensorQuant

Title Sparsity in Deep Neural Networks - An Empirical Investigation with TensorQuant
Authors Dominik Marek Loroch, Franz-Josef Pfreundt, Norbert Wehn, Janis Keuper
Abstract Deep learning is finding its way into the embedded world with applications such as autonomous driving, smart sensors and augmented reality. However, the computation of deep neural networks is demanding in energy, compute power and memory. Various approaches have been investigated to reduce the necessary resources, one of which is to leverage the sparsity occurring in deep neural networks due to the high levels of redundancy in the network parameters. It has been shown that sparsity can be promoted specifically and the achieved sparsity can be very high. But in many cases the methods are evaluated on rather small topologies, and it is not clear if the results transfer onto deeper topologies. In this paper, the TensorQuant toolbox has been extended to offer a platform to investigate sparsity, especially in deeper models. Several practically relevant topologies for varying classification problem sizes are investigated to show the differences in sparsity for activations, weights and gradients.
Tasks Autonomous Driving
Published 2018-08-27
URL http://arxiv.org/abs/1808.08784v1
PDF http://arxiv.org/pdf/1808.08784v1.pdf
PWC https://paperswithcode.com/paper/sparsity-in-deep-neural-networks-an-empirical
Repo https://github.com/DominikFHG/TensorQuant
Framework tf
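
The kind of measurement the paper reports (how sparse weights, activations, and gradients actually are) is easy to probe in any framework. The snippet below is a framework-agnostic PyTorch illustration with a toy model, not the TensorQuant API.

```python
# Probe the fraction of (near-)zero entries in weights, activations and gradients.
# Generic illustration; not the TensorQuant toolbox interface.
import torch
import torch.nn as nn

def sparsity(t, tol=1e-6):
    return (t.abs() <= tol).float().mean().item()

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))

activations = {}
model[1].register_forward_hook(lambda m, i, o: activations.update(relu=o))  # capture ReLU output

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()

for name, p in model.named_parameters():
    print(f"{name}: weight sparsity {sparsity(p):.3f}, grad sparsity {sparsity(p.grad):.3f}")
print(f"ReLU activation sparsity: {sparsity(activations['relu']):.3f}")
```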

Context is Everything: Finding Meaning Statistically in Semantic Spaces

Title Context is Everything: Finding Meaning Statistically in Semantic Spaces
Authors Eric Zelikman
Abstract This paper introduces Contextual Salience (CoSal), a simple and explicit measure of a word’s importance in context which is a more theoretically natural, practically simpler, and more accurate replacement for tf-idf. CoSal supports very small contexts (20 or more sentences), out-of-context words, and is easy to calculate. A word vector space generated with both bigram phrases and unigram tokens reveals that contextually significant words disproportionately define phrases. This relationship is applied to produce simple weighted bag-of-words sentence embeddings. This model outperforms SkipThought and the best models trained on unordered sentences in most tests in Facebook’s SentEval, beats tf-idf on all available tests, and is generally comparable to the state of the art. This paper also applies CoSal to sentence and document summarization and an improved and context-aware cosine distance. Applying the premise that unexpected words are important, CoSal is presented as a replacement for tf-idf and an intuitive measure of contextual word importance.
Tasks Document Summarization, Sentence Embeddings
Published 2018-03-22
URL http://arxiv.org/abs/1803.08493v5
PDF http://arxiv.org/pdf/1803.08493v5.pdf
PWC https://paperswithcode.com/paper/context-is-everything-finding-meaning
Repo https://github.com/ezelikman/Context-Is-Everything
Framework none
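
A simplified reading of the "unexpected words matter" premise: weight each word by how statistically far its vector lies from the context's mean word vector (a Mahalanobis-style distance under the context covariance), then average word vectors with those weights. The sketch below uses random placeholder embeddings and is an illustration of that idea, not the paper's exact CoSal formulation.

```python
# Hedged sketch: contextual salience as statistical unexpectedness of a word
# vector relative to its context, used to weight a bag-of-words embedding.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "a", "of", "quantum", "entanglement", "teleportation"]
emb = {w: rng.normal(size=50) for w in vocab}              # placeholder word vectors

def cosal_weights(context_words):
    X = np.stack([emb[w] for w in context_words])
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-3 * np.eye(X.shape[1])   # regularised context covariance
    cov_inv = np.linalg.inv(cov)
    return {w: float(np.sqrt((emb[w] - mu) @ cov_inv @ (emb[w] - mu)))
            for w in context_words}

def sentence_embedding(sentence, weights):
    return np.stack([weights[w] * emb[w] for w in sentence]).mean(axis=0)

context = vocab * 5                                         # pretend this is a larger context
w = cosal_weights(context)
print(sorted(w.items(), key=lambda kv: -kv[1])[:3])         # most "unexpected" words
print(sentence_embedding(["quantum", "entanglement", "of", "the"], w)[:5])
```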

Entropy and mutual information in models of deep neural networks

Title Entropy and mutual information in models of deep neural networks
Authors Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová
Abstract We examine a class of deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are three-fold: (i) We show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that weight matrices are independent and orthogonally-invariant. (ii) We extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layer networks with Gaussian random weights, using the recently introduced adaptive interpolation method. (iii) We propose an experiment framework with generative models of synthetic datasets, on which we train deep neural networks with a weight constraint designed so that the assumption in (i) is verified during learning. We study the behavior of entropies and mutual informations throughout learning and conclude that, in the proposed setting, the relationship between compression and generalization remains elusive.
Tasks
Published 2018-05-24
URL http://arxiv.org/abs/1805.09785v2
PDF http://arxiv.org/pdf/1805.09785v2.pdf
PWC https://paperswithcode.com/paper/entropy-and-mutual-information-in-models-of
Repo https://github.com/sphinxteam/dnner
Framework none
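
As a warm-up to the quantities the paper tracks, the single-layer linear-Gaussian case is exactly computable with the textbook Gaussian-channel formula: for T = WX + noise with Gaussian input, I(X; T) = ½ log det(I + WWᵀ/σ²). The snippet below evaluates just that special case; it is not the paper's multi-layer replica/adaptive-interpolation computation.

```python
# Exact mutual information for a single linear-Gaussian layer T = W X + xi,
# X ~ N(0, I), xi ~ N(0, sigma^2 I): I(X; T) = 0.5 * logdet(I + W W^T / sigma^2).
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, sigma = 100, 50, 0.5
W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_out, n_in))  # random weights (illustrative)

_, logdet = np.linalg.slogdet(np.eye(n_out) + (W @ W.T) / sigma**2)
mi_nats = 0.5 * logdet
print(f"I(X; T) = {mi_nats:.2f} nats ({mi_nats / np.log(2):.2f} bits)")
```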

Real Time System for Facial Analysis

Title Real Time System for Facial Analysis
Authors Janne Tommola, Pedram Ghazi, Bishwo Adhikari, Heikki Huttunen
Abstract In this paper we describe the anatomy of a real-time facial analysis system. The system recognizes the age, gender and facial expression of users appearing in front of the camera. All components are based on convolutional neural networks, whose accuracy we study on commonly used training and evaluation sets. A key contribution of the work is the description of the interplay between processing threads for frame grabbing, face detection and the three types of recognition. The Python code for executing the system uses common libraries (Keras/TensorFlow, OpenCV and dlib) and is available for download.
Tasks Face Detection
Published 2018-09-14
URL http://arxiv.org/abs/1809.05474v1
PDF http://arxiv.org/pdf/1809.05474v1.pdf
PWC https://paperswithcode.com/paper/real-time-system-for-facial-analysis
Repo https://github.com/mahehu/TUT-live-age-estimator
Framework tf
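
The thread interplay the abstract highlights can be sketched with a grabber thread that keeps only the freshest frame in a size-1 queue, so detection never lags the camera. The face detector below is OpenCV's built-in Haar cascade used as a stand-in for the paper's CNN detectors, and the recognition networks are omitted; the camera index is an assumption.

```python
# Minimal sketch of the frame-grabber / detector thread interplay.
import queue
import threading
import cv2

frames = queue.Queue(maxsize=1)

def grabber(cap):
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        try:
            frames.get_nowait()          # drop the stale frame, keep the newest
        except queue.Empty:
            pass
        frames.put(frame)

def detector(cascade):
    while True:
        frame = frames.get()
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        print(f"faces detected: {len(boxes)}")   # age/gender/expression nets would run here

cap = cv2.VideoCapture(0)                # camera index 0 is an assumption
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
threading.Thread(target=grabber, args=(cap,), daemon=True).start()
threading.Thread(target=detector, args=(cascade,), daemon=True).start()
threading.Event().wait()                 # keep the main thread alive while the workers run
```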

2.5D Visual Sound

Title 2.5D Visual Sound
Authors Ruohan Gao, Kristen Grauman
Abstract Binaural audio provides a listener with 3D sound sensation, allowing a rich perceptual experience of the scene. However, binaural recordings are scarcely available and require nontrivial expertise and equipment to obtain. We propose to convert common monaural audio into binaural audio by leveraging video. The key idea is that visual frames reveal significant spatial cues that, while explicitly lacking in the accompanying single-channel audio, are strongly linked to it. Our multi-modal approach recovers this link from unlabeled video. We devise a deep convolutional neural network that learns to decode the monaural (single-channel) soundtrack into its binaural counterpart by injecting visual information about object and scene configurations. We call the resulting output 2.5D visual sound—the visual stream helps “lift” the flat single channel audio into spatialized sound. In addition to sound generation, we show the self-supervised representation learned by our network benefits audio-visual source separation. Our video results: http://vision.cs.utexas.edu/projects/2.5D_visual_sound/
Tasks
Published 2018-12-11
URL http://arxiv.org/abs/1812.04204v4
PDF http://arxiv.org/pdf/1812.04204v4.pdf
PWC https://paperswithcode.com/paper/25d-visual-sound
Repo https://github.com/facebookresearch/FAIR-Play
Framework none
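
A hedged sketch of the mono-to-binaural idea: condition on a visual feature, predict the left/right difference signal, and reconstruct both channels as (mono ± difference) / 2. The real model works on spectrograms with a U-Net-style network; the tiny MLP and random tensors below are illustrative stand-ins only.

```python
# Predict the difference channel from mono audio + visual features (stand-in model).
import torch
import torch.nn as nn

AUDIO_DIM, VISUAL_DIM = 256, 512   # hypothetical feature sizes

predict_diff = nn.Sequential(
    nn.Linear(AUDIO_DIM + VISUAL_DIM, 512), nn.ReLU(), nn.Linear(512, AUDIO_DIM)
)

mono = torch.randn(4, AUDIO_DIM)           # mono audio features (stand-in)
visual = torch.randn(4, VISUAL_DIM)        # visual frame features (stand-in)
left_gt, right_gt = torch.randn(4, AUDIO_DIM), torch.randn(4, AUDIO_DIM)

diff_pred = predict_diff(torch.cat([mono, visual], dim=1))
left = (mono + diff_pred) / 2              # reconstruct the two channels
right = (mono - diff_pred) / 2
loss = nn.functional.mse_loss(diff_pred, left_gt - right_gt)  # supervise the difference channel
loss.backward()
```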

Asynchronous Bidirectional Decoding for Neural Machine Translation

Title Asynchronous Bidirectional Decoding for Neural Machine Translation
Authors Xiangwen Zhang, Jinsong Su, Yue Qin, Yang Liu, Rongrong Ji, Hongji Wang
Abstract The dominant neural machine translation (NMT) models apply unified attentional encoder-decoder neural networks for translation. Traditionally, the NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a left-to-right manner, leaving the target-side contexts generated from right to left unexploited during translation. In this paper, we equip the conventional attentional encoder-decoder NMT framework with a backward decoder, in order to explore bidirectional decoding for NMT. Attending to the hidden state sequence produced by the encoder, our backward decoder first learns to generate the target-side hidden state sequence from right to left. Then, the forward decoder performs translation in the forward direction, while in each translation prediction timestep, it simultaneously applies two attention models to consider the source-side and reverse target-side hidden states, respectively. With this new architecture, our model is able to fully exploit source- and target-side contexts to improve translation quality altogether. Experimental results on NIST Chinese-English and WMT English-German translation tasks demonstrate that our model achieves substantial improvements over the conventional NMT by 3.14 and 1.38 BLEU points, respectively. The source code of this work can be obtained from https://github.com/DeepLearnXMU/ABDNMT.
Tasks Machine Translation
Published 2018-01-16
URL http://arxiv.org/abs/1801.05122v2
PDF http://arxiv.org/pdf/1801.05122v2.pdf
PWC https://paperswithcode.com/paper/asynchronous-bidirectional-decoding-for
Repo https://github.com/DeepLearnXMU/ABD-NMT
Framework none
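
The two-attention forward step can be illustrated compactly: at each timestep the forward decoder attends once over the encoder states and once over the hidden states produced by the backward (right-to-left) decoder, then feeds both contexts into its recurrent cell. Dimensions, the dot-product attention, and the GRU cell below are deliberate simplifications of the paper's architecture.

```python
# Simplified forward-decoder step with attention over encoder states and over
# backward-decoder states (illustrative shapes and modules).
import torch
import torch.nn as nn

H = 64                                     # shared hidden size (assumption)

def attend(query, keys):                   # plain dot-product attention
    scores = torch.softmax(keys @ query, dim=0)
    return (scores.unsqueeze(1) * keys).sum(dim=0)

enc_states = torch.randn(20, H)            # source-side encoder states
bwd_states = torch.randn(15, H)            # target-side states from the backward decoder

rnn_cell = nn.GRUCell(input_size=H, hidden_size=H)
combine = nn.Linear(3 * H, H)

h = torch.zeros(H)                         # forward-decoder hidden state
prev_emb = torch.randn(H)                  # embedding of the previously emitted token
for _ in range(5):                         # a few forward decoding steps
    src_ctx = attend(h, enc_states)        # source-side context
    rev_ctx = attend(h, bwd_states)        # reverse target-side context
    inp = combine(torch.cat([prev_emb, src_ctx, rev_ctx]))
    h = rnn_cell(inp.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
    prev_emb = torch.randn(H)              # next token embedding would come from the output layer
```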

Deep Retinex Decomposition for Low-Light Enhancement

Title Deep Retinex Decomposition for Low-Light Enhancement
Authors Chen Wei, Wenjing Wang, Wenhan Yang, Jiaying Liu
Abstract The Retinex model is an effective tool for low-light image enhancement. It assumes that observed images can be decomposed into reflectance and illumination. Most existing Retinex-based methods have carefully designed hand-crafted constraints and parameters for this highly ill-posed decomposition, which may be limited by model capacity when applied in various scenes. In this paper, we collect a LOw-Light dataset (LOL) containing low/normal-light image pairs and propose a deep Retinex-Net learned on this dataset, including a Decom-Net for decomposition and an Enhance-Net for illumination adjustment. In the training process for Decom-Net, there is no ground truth of decomposed reflectance and illumination. The network is learned with only key constraints including the consistent reflectance shared by paired low/normal-light images, and the smoothness of illumination. Based on the decomposition, subsequent lightness enhancement is conducted on illumination by an enhancement network called Enhance-Net, and for joint denoising there is a denoising operation on reflectance. The Retinex-Net is end-to-end trainable, so that the learned decomposition is by nature good for lightness adjustment. Extensive experiments demonstrate that our method not only achieves visually pleasing quality for low-light enhancement but also provides a good representation of image decomposition.
Tasks Denoising, Image Enhancement, Low-Light Image Enhancement
Published 2018-08-14
URL http://arxiv.org/abs/1808.04560v1
PDF http://arxiv.org/pdf/1808.04560v1.pdf
PWC https://paperswithcode.com/paper/deep-retinex-decomposition-for-low-light
Repo https://github.com/weichen582/RetinexNet
Framework tf
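
The Decom-Net constraints named in the abstract translate into three loss terms: reconstruct each input as reflectance times illumination, force the paired low/normal images to share one reflectance, and keep illumination smooth. The sketch below uses random stand-in tensors and plain gradient penalties; the paper's smoothness term is structure-aware (weighted by reflectance gradients), and the loss weights here are placeholders.

```python
# Hedged sketch of Decom-Net-style training constraints (stand-in tensors).
import torch
import torch.nn.functional as F

def grad_xy(t):                                  # simple horizontal/vertical gradients
    return t[..., :, 1:] - t[..., :, :-1], t[..., 1:, :] - t[..., :-1, :]

def decom_loss(low, normal, R_low, I_low, R_normal, I_normal):
    recon = F.l1_loss(R_low * I_low, low) + F.l1_loss(R_normal * I_normal, normal)
    consistency = F.l1_loss(R_low, R_normal)     # paired images share one reflectance
    smooth = sum(g.abs().mean() for I in (I_low, I_normal) for g in grad_xy(I))
    return recon + 0.01 * consistency + 0.1 * smooth   # weights are placeholders

low, normal = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
R_low, R_normal = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
I_low, I_normal = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
print(decom_loss(low, normal, R_low, I_low, R_normal, I_normal).item())
```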

Open3D: A Modern Library for 3D Data Processing

Title Open3D: A Modern Library for 3D Data Processing
Authors Qian-Yi Zhou, Jaesik Park, Vladlen Koltun
Abstract Open3D is an open-source library that supports rapid development of software that deals with 3D data. The Open3D frontend exposes a set of carefully selected data structures and algorithms in both C++ and Python. The backend is highly optimized and is set up for parallelization. Open3D was developed from a clean slate with a small and carefully considered set of dependencies. It can be set up on different platforms and compiled from source with minimal effort. The code is clean, consistently styled, and maintained via a clear code review mechanism. Open3D has been used in a number of published research projects and is actively deployed in the cloud. We welcome contributions from the open-source community.
Tasks
Published 2018-01-30
URL http://arxiv.org/abs/1801.09847v1
PDF http://arxiv.org/pdf/1801.09847v1.pdf
PWC https://paperswithcode.com/paper/open3d-a-modern-library-for-3d-data
Repo https://github.com/IntelVCL/Open3D
Framework none
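
A short usage sketch of the Python frontend: read a point cloud, voxel-downsample it, estimate normals, and visualise. The file path is a placeholder, and the call names assume a recent Open3D release (the API has shifted slightly across versions since the paper).

```python
# Typical Open3D point-cloud pipeline (placeholder input path).
import open3d as o3d

pcd = o3d.io.read_point_cloud("scene.ply")                  # placeholder path
pcd = pcd.voxel_down_sample(voxel_size=0.02)                # thin out the cloud
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
o3d.visualization.draw_geometries([pcd])                    # interactive viewer
```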

Transferring GANs: generating images from limited data

Title Transferring GANs: generating images from limited data
Authors Yaxing Wang, Chenshen Wu, Luis Herranz, Joost van de Weijer, Abel Gonzalez-Garcia, Bogdan Raducanu
Abstract Transferring the knowledge of pretrained networks to new domains by means of finetuning is a widely used practice for applications based on discriminative models. To the best of our knowledge this practice has not been studied within the context of generative deep networks. Therefore, we study domain adaptation applied to image generation with generative adversarial networks. We evaluate several aspects of domain adaptation, including the impact of target domain size, the relative distance between source and target domain, and the initialization of conditional GANs. Our results show that using knowledge from pretrained networks can shorten the convergence time and can significantly improve the quality of the generated images, especially when the target data is limited. We show that these conclusions can also be drawn for conditional GANs even when the pretrained model was trained without conditioning. Our results also suggest that density may be more important than diversity and a dataset with one or few densely sampled classes may be a better source model than more diverse datasets such as ImageNet or Places.
Tasks Domain Adaptation, Image Generation
Published 2018-05-04
URL http://arxiv.org/abs/1805.01677v2
PDF http://arxiv.org/pdf/1805.01677v2.pdf
PWC https://paperswithcode.com/paper/transferring-gans-generating-images-from
Repo https://github.com/WuChenshen/MeRGAN
Framework tf
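
The transfer recipe the abstract studies amounts to initialising both generator and discriminator from source-domain checkpoints and then continuing ordinary adversarial training on the small target set. The model classes, checkpoint paths, and non-saturating GAN loss below are assumptions for illustration, not the paper's exact setup.

```python
# Illustrative GAN finetuning on a limited target dataset after loading
# pretrained source-domain weights (commented placeholder paths).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

# G.load_state_dict(torch.load("source_G.pt"))   # pretrained source-domain weights
# D.load_state_dict(torch.load("source_D.pt"))   # (paths are placeholders)

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCEWithLogitsLoss()

target_data = torch.rand(64, 784)                # the (limited) target dataset, flattened images
for step in range(100):
    fake = G(torch.randn(64, 100))
    # Discriminator update on real target images vs. current fakes.
    d_loss = bce(D(target_data), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator update (non-saturating loss).
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```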

Breaking the Activation Function Bottleneck through Adaptive Parameterization

Title Breaking the Activation Function Bottleneck through Adaptive Parameterization
Authors Sebastian Flennerhag, Hujun Yin, John Keane, Mark Elliot
Abstract Standard neural network architectures are non-linear only by virtue of a simple element-wise activation function, making them both brittle and excessively large. In this paper, we consider methods for making the feed-forward layer more flexible while preserving its basic structure. We develop simple drop-in replacements that learn to adapt their parameterization conditional on the input, thereby increasing statistical efficiency significantly. We present an adaptive LSTM that advances the state of the art for the Penn Treebank and WikiText-2 word-modeling tasks while using fewer parameters and converging in less than half as many iterations.
Tasks
Published 2018-05-22
URL http://arxiv.org/abs/1805.08574v4
PDF http://arxiv.org/pdf/1805.08574v4.pdf
PWC https://paperswithcode.com/paper/breaking-the-activation-function-bottleneck
Repo https://github.com/flennerhag/alstm
Framework pytorch
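
The general idea of input-conditional parameterization can be shown with a feed-forward layer whose effective transformation is modulated per example by a small side network. This illustrates the mechanism only; the paper's adaptive LSTM is considerably more elaborate.

```python
# Sketch of an input-adaptive linear layer: a side network predicts a
# per-example gain and shift that re-parameterize the base transformation.
import torch
import torch.nn as nn

class AdaptiveLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.adapter = nn.Linear(d_in, 2 * d_out)   # predicts per-example (gain, shift)

    def forward(self, x):
        gain, shift = self.adapter(x).chunk(2, dim=-1)
        return (1 + gain) * self.base(x) + shift    # input-dependent re-parameterization

layer = AdaptiveLinear(32, 64)
print(layer(torch.randn(8, 32)).shape)              # torch.Size([8, 64])
```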

Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction

Title Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction
Authors Ningyu Zhang, Shumin Deng, Zhanlin Sun, Xi Chen, Wei Zhang, Huajun Chen
Abstract A capsule is a group of neurons, whose activity vector represents the instantiation parameters of a specific type of entity. In this paper, we explore the capsule networks used for relation extraction in a multi-instance multi-label learning framework and propose a novel neural approach based on capsule networks with attention mechanisms. We evaluate our method with different benchmarks, and it is demonstrated that our method improves the precision of the predicted relations. Particularly, we show that capsule networks improve multiple entity pairs relation extraction.
Tasks Multi-Label Learning, Relation Extraction
Published 2018-12-29
URL http://arxiv.org/abs/1812.11321v1
PDF http://arxiv.org/pdf/1812.11321v1.pdf
PWC https://paperswithcode.com/paper/attention-based-capsule-networks-with-dynamic
Repo https://github.com/WHUNLPLab/Papers-to-read
Framework none
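
For readers unfamiliar with capsules, the routing-by-agreement step the paper builds on looks roughly as follows; the shapes and iteration count are illustrative, and the paper's attention over instances is omitted.

```python
# Compact dynamic routing-by-agreement between capsule layers (illustrative shapes).
import torch

def squash(s, dim=-1, eps=1e-8):
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1 + norm2)) * s / torch.sqrt(norm2 + eps)

def dynamic_routing(u_hat, iterations=3):
    """u_hat: prediction vectors of shape (batch, in_caps, out_caps, out_dim)."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)     # routing logits
    for _ in range(iterations):
        c = torch.softmax(b, dim=2)                            # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)               # weighted sum over input capsules
        v = squash(s)                                          # output capsule vectors
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)           # agreement update
    return v

u_hat = torch.randn(4, 32, 10, 16)      # 32 input capsules -> 10 output capsules of dim 16
print(dynamic_routing(u_hat).shape)     # torch.Size([4, 10, 16])
```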