May 7, 2019

2824 words 14 mins read

Paper Group AWR 68

Robust Estimators in High Dimensions without the Computational Intractability

Title Robust Estimators in High Dimensions without the Computational Intractability
Authors Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, Alistair Stewart
Abstract We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples. Such questions have a rich history spanning statistics, machine learning and theoretical computer science. Even in the most basic settings, the only known approaches are either computationally inefficient or lose dimension-dependent factors in their error guarantees. This raises the following question: Is high-dimensional agnostic distribution learning even possible, algorithmically? In this work, we obtain the first computationally efficient algorithms with dimension-independent error guarantees for agnostically learning several fundamental classes of high-dimensional distributions: (1) a single Gaussian, (2) a product distribution on the hypercube, (3) mixtures of two product distributions (under a natural balancedness condition), and (4) mixtures of spherical Gaussians. Our algorithms achieve error that is independent of the dimension, and in many cases scales nearly-linearly with the fraction of adversarially corrupted samples. Moreover, we develop a general recipe for detecting and correcting corruptions in high dimensions that may be applicable to many other problems.
Tasks
Published 2016-04-21
URL http://arxiv.org/abs/1604.06443v2
PDF http://arxiv.org/pdf/1604.06443v2.pdf
PWC https://paperswithcode.com/paper/robust-estimators-in-high-dimensions-without
Repo https://github.com/hoonose/robust-filter
Framework none
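
The repository implements the paper's filtering approach. As a rough sketch of the core idea for robustly estimating a Gaussian mean (the threshold and the removal rule below are simplified assumptions, not the paper's exact constants):

```python
import numpy as np

def filter_mean(X, eps, slack=9.0, max_iter=50):
    """Sketch of the 'filter' for robust mean estimation: if the top
    eigenvalue of the empirical covariance is abnormally large, corrupted
    points must be responsible, so discard the points with the largest
    projections onto the top eigenvector and repeat."""
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        top_val, top_vec = eigvals[-1], eigvecs[:, -1]
        # For a clean identity-covariance Gaussian the top eigenvalue is
        # near 1; a large excess certifies corruption along top_vec.
        if top_val < 1.0 + slack * eps:
            return mu
        scores = ((X - mu) @ top_vec) ** 2
        X = X[scores < np.quantile(scores, 1.0 - eps)]
    return X.mean(axis=0)
```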

Improved Image Captioning via Policy Gradient optimization of SPIDEr

Title Improved Image Captioning via Policy Gradient optimization of SPIDEr
Authors Siqi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, Kevin Murphy
Abstract Current image captioning methods are usually trained via (penalized) maximum likelihood estimation. However, the log-likelihood score of a caption does not correlate well with human assessments of quality. Standard syntactic evaluation metrics, such as BLEU, METEOR and ROUGE, are also not well correlated. The newer SPICE and CIDEr metrics are better correlated, but have traditionally been hard to optimize for. In this paper, we show how to use a policy gradient (PG) method to directly optimize a linear combination of SPICE and CIDEr (a combination we call SPIDEr): the SPICE score ensures our captions are semantically faithful to the image, while the CIDEr score ensures our captions are syntactically fluent. The PG method we propose improves on the prior MIXER approach, by using Monte Carlo rollouts instead of mixing MLE training with PG. We show empirically that our algorithm leads to easier optimization and improved results compared to MIXER. Finally, we show that using our PG method we can optimize any of the metrics, including the proposed SPIDEr metric, which results in image captions that are strongly preferred by human raters compared to captions generated by the same model but trained to optimize MLE or the COCO metrics.
Tasks Image Captioning
Published 2016-12-01
URL http://arxiv.org/abs/1612.00370v4
PDF http://arxiv.org/pdf/1612.00370v4.pdf
PWC https://paperswithcode.com/paper/improved-image-captioning-via-policy-gradient
Repo https://github.com/peteanderson80/SPICE
Framework none
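
A minimal sketch of the training signal described above: the SPIDEr reward plus a REINFORCE-style update with a baseline. Tensor shapes and the baseline construction are this sketch's assumptions, and the Monte Carlo rollout machinery is omitted:

```python
import torch

def spider(spice, cider):
    # SPIDEr as defined in the paper: an equal-weight combination of
    # SPICE (semantic faithfulness) and CIDEr (syntactic fluency).
    return 0.5 * (spice + cider)

def pg_loss(log_probs, rewards, baselines):
    # REINFORCE with a baseline: log_probs are per-sampled-caption
    # log-probabilities, rewards their SPIDEr scores, baselines the
    # Monte Carlo rollout estimates; all 1-D tensors of equal length.
    advantage = (rewards - baselines).detach()  # no gradient through the reward
    return -(advantage * log_probs).mean()
```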

Tagger: Deep Unsupervised Perceptual Grouping

Title Tagger: Deep Unsupervised Perceptual Grouping
Authors Klaus Greff, Antti Rasmus, Mathias Berglund, Tele Hotloo Hao, Jürgen Schmidhuber, Harri Valpola
Abstract We present a framework for efficient perceptual inference that explicitly reasons about the segmentation of its inputs and features. Rather than being trained for any specific segmentation, our framework learns the grouping process in an unsupervised manner or alongside any supervised task. By enriching the representations of a neural network, we enable it to group the representations of different objects in an iterative manner. By allowing the system to amortize the iterative inference of the groupings, we achieve very fast convergence. In contrast to many other recently proposed methods for addressing multi-object scenes, our system does not assume the inputs to be images and can therefore directly handle other modalities. For multi-digit classification of very cluttered images that require texture segmentation, our method offers improved classification performance over convolutional networks despite being fully connected. Furthermore, we observe that our system greatly improves on the semi-supervised result of a baseline Ladder network on our dataset, indicating that segmentation can also improve sample efficiency.
Tasks
Published 2016-06-21
URL http://arxiv.org/abs/1606.06724v2
PDF http://arxiv.org/pdf/1606.06724v2.pdf
PWC https://paperswithcode.com/paper/tagger-deep-unsupervised-perceptual-grouping
Repo https://github.com/CuriousAI/tagger
Framework none
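
The grouping can be pictured as soft mixture assignments refined over iterations; Tagger amortizes such iterative inference inside a learned (Ladder) network. A toy mixture-style step, purely illustrative of the grouping idea rather than the paper's actual update:

```python
import torch

def grouping_step(x, z, log_pi):
    """One soft-grouping iteration over K candidate groups.
    x: (D,) input, z: (K, D) per-group reconstructions, log_pi: (K, D) logits."""
    ll = -0.5 * (x - z) ** 2 + log_pi   # elementwise log-likelihood per group
    m = ll.softmax(dim=0)               # responsibilities: which group explains what
    z_new = m * x + (1 - m) * z         # move each group toward its share of the input
    return m, z_new
```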

Preserving Color in Neural Artistic Style Transfer

Title Preserving Color in Neural Artistic Style Transfer
Authors Leon A. Gatys, Matthias Bethge, Aaron Hertzmann, Eli Shechtman
Abstract This note presents an extension to the neural artistic style transfer algorithm (Gatys et al.). The original algorithm transforms an image to have the style of another given image. For example, a photograph can be transformed to have the style of a famous painting. Here we address a potential shortcoming of the original method: the algorithm transfers the colors of the original painting, which can alter the appearance of the scene in undesirable ways. We describe simple linear methods for transferring style while preserving colors.
Tasks Style Transfer
Published 2016-06-19
URL http://arxiv.org/abs/1606.05897v1
PDF http://arxiv.org/pdf/1606.05897v1.pdf
PWC https://paperswithcode.com/paper/preserving-color-in-neural-artistic-style
Repo https://github.com/telecombcn-dl/2018-dlai-team5
Framework tf
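
One of the linear methods described matches the style image's color statistics (mean and covariance) to the content image before running style transfer. A sketch using an eigendecomposition square root as the 3x3 linear map (one standard choice; the note also considers alternatives such as a Cholesky-based map):

```python
import numpy as np

def match_color(content, style, eps=1e-5):
    """Remap the style image's colors so their mean and covariance match
    the content image's. Both are float arrays in [0, 1], shape (H, W, 3)."""
    c = content.reshape(-1, 3)
    s = style.reshape(-1, 3)
    mu_c, mu_s = c.mean(0), s.mean(0)
    cov_c = np.cov(c, rowvar=False) + eps * np.eye(3)
    cov_s = np.cov(s, rowvar=False) + eps * np.eye(3)

    def sqrt_psd(m):  # symmetric PSD square root via eigendecomposition
        vals, vecs = np.linalg.eigh(m)
        return vecs @ np.diag(np.sqrt(np.maximum(vals, 0))) @ vecs.T

    # Whiten the style colors, then recolor with the content statistics.
    A = sqrt_psd(cov_c) @ np.linalg.inv(sqrt_psd(cov_s))
    out = (s - mu_s) @ A.T + mu_c
    return np.clip(out, 0.0, 1.0).reshape(style.shape)
```

Style transfer run on the recolored style image then produces a result whose colors stay faithful to the content photograph.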

LSTM based Conversation Models

Title LSTM based Conversation Models
Authors Yi Luan, Yangfeng Ji, Mari Ostendorf
Abstract In this paper, we present a conversational model that incorporates both context and participant role for two-party conversations. Different architectures are explored for integrating participant role and context information into a Long Short-term Memory (LSTM) language model. The conversational model can function as a language model or a language generation model. Experiments on the Ubuntu Dialog Corpus show that our model can capture multiple turn interaction between participants. The proposed method outperforms a traditional LSTM model as measured by language model perplexity and response ranking. Generated responses show characteristic differences between the two participant roles.
Tasks Language Modelling, Text Generation
Published 2016-03-31
URL http://arxiv.org/abs/1603.09457v1
PDF http://arxiv.org/pdf/1603.09457v1.pdf
PWC https://paperswithcode.com/paper/lstm-based-conversation-models
Repo https://github.com/michaelfarrell76/End-To-End-Generative-Dialogue
Framework torch
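
A sketch of one natural architecture in the explored family: concatenate a learned participant-role embedding onto each word embedding before the LSTM. Layer sizes and the concatenation point are this sketch's assumptions:

```python
import torch
import torch.nn as nn

class RoleLSTM(nn.Module):
    """Word-level LSTM language model conditioned on participant role."""
    def __init__(self, vocab, n_roles, d_word=128, d_role=16, d_hidden=256):
        super().__init__()
        self.word = nn.Embedding(vocab, d_word)
        self.role = nn.Embedding(n_roles, d_role)
        self.lstm = nn.LSTM(d_word + d_role, d_hidden, batch_first=True)
        self.out = nn.Linear(d_hidden, vocab)

    def forward(self, words, roles, state=None):
        # words, roles: (batch, time) index tensors of matching shape
        x = torch.cat([self.word(words), self.role(roles)], dim=-1)
        h, state = self.lstm(x, state)
        return self.out(h), state       # next-word logits per position
```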

Learning Physical Intuition of Block Towers by Example

Title Learning Physical Intuition of Block Towers by Example
Authors Adam Lerer, Sam Gross, Rob Fergus
Abstract Wooden blocks are a common toy for infants, allowing them to develop motor skills and gain intuition about the physical behavior of the world. In this paper, we explore the ability of deep feed-forward models to learn such intuitive physics. Using a 3D game engine, we create small towers of wooden blocks whose stability is randomized and render them collapsing (or remaining upright). This data allows us to train large convolutional network models which can accurately predict the outcome, as well as estimate the block trajectories. The models are also able to generalize in two important ways: (i) to new physical scenarios, e.g. towers with an additional block, and (ii) to images of real wooden blocks, where they obtain performance comparable to that of human subjects.
Tasks
Published 2016-03-03
URL http://arxiv.org/abs/1603.01312v1
PDF http://arxiv.org/pdf/1603.01312v1.pdf
PWC https://paperswithcode.com/paper/learning-physical-intuition-of-block-towers
Repo https://github.com/facebook/UETorch
Framework torch

Learning Visual Storylines with Skipping Recurrent Neural Networks

Title Learning Visual Storylines with Skipping Recurrent Neural Networks
Authors Gunnar A. Sigurdsson, Xinlei Chen, Abhinav Gupta
Abstract What does a typical visit to Paris look like? Do people first take photos of the Louvre and then the Eiffel Tower? Can we visually model a temporal event like “Paris Vacation” using current frameworks? In this paper, we explore how we can automatically learn the temporal aspects, or storylines of visual concepts from web data. Previous attempts focus on consecutive image-to-image transitions and are unsuccessful at recovering the long-term underlying story. Our novel Skipping Recurrent Neural Network (S-RNN) model does not attempt to predict each and every data point in the sequence, like classic RNNs. Rather, S-RNN uses a framework that skips through the images in the photo stream to explore the space of all ordered subsets of the albums via an efficient sampling procedure. This approach reduces the negative impact of strong short-term correlations, and recovers the latent story more accurately. We show how our learned storylines can be used to analyze, predict, and summarize photo albums from Flickr. Our experimental results provide strong qualitative and quantitative evidence that S-RNN is significantly better than other candidate methods such as LSTMs on learning long-term correlations and recovering latent storylines. Moreover, we show how storylines can help machines better understand and summarize photo streams by inferring a brief personalized story of each individual album.
Tasks
Published 2016-04-14
URL http://arxiv.org/abs/1604.04279v2
PDF http://arxiv.org/pdf/1604.04279v2.pdf
PWC https://paperswithcode.com/paper/learning-visual-storylines-with-skipping
Repo https://github.com/gsig/srnn
Framework pytorch
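
The skipping idea can be sketched as follows: rather than consuming every photo in order, the model scores all album photos against its hidden state and jumps forward to the best later one. This greedy, forward-only version is a simplification (the actual S-RNN samples ordered subsets), and `cell`/`proj` are this sketch's names:

```python
import torch
import torch.nn as nn

def greedy_storyline(features, cell, proj, k=5):
    """Pick a k-photo storyline from an album (assumes k <= len(features)).
    features: (N, d) per-photo CNN features in temporal order;
    cell: nn.LSTMCell(d, h); proj: nn.Linear(h, d)."""
    h = torch.zeros(1, cell.hidden_size)
    c = torch.zeros(1, cell.hidden_size)
    idx, story = 0, [0]
    x = features[0:1]
    for _ in range(k - 1):
        h, c = cell(x, (h, c))
        scores = features @ proj(h).squeeze(0)  # match every photo to the state
        scores[: idx + 1] = float('-inf')       # only skip forward in time
        idx = int(scores.argmax())
        story.append(idx)
        x = features[idx : idx + 1]
    return story
```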

On the adoption of abductive reasoning for time series interpretation

Title On the adoption of abductive reasoning for time series interpretation
Authors Tomás Teijeiro, Paulo Félix
Abstract Time series interpretation aims to provide an explanation of what is observed in terms of its underlying processes. The present work is based on the assumption that the common classification-based approaches to time series interpretation suffer from a set of inherent weaknesses, whose ultimate cause lies in the monotonic nature of the deductive reasoning paradigm. In this document we propose a new approach to this problem, based on the initial hypothesis that abductive reasoning properly accounts for the human ability to identify and characterize the patterns appearing in a time series. The result of this interpretation is a set of conjectures in the form of observations, organized into an abstraction hierarchy and explaining what has been observed. A knowledge-based framework and a set of algorithms for the interpretation task are provided, implementing a hypothesize-and-test cycle guided by an attentional mechanism. As a representative application domain, interpretation of the electrocardiogram allows us to highlight the strengths of the proposed approach in comparison with traditional classification-based approaches.
Tasks Time Series
Published 2016-09-19
URL http://arxiv.org/abs/1609.05632v3
PDF http://arxiv.org/pdf/1609.05632v3.pdf
PWC https://paperswithcode.com/paper/on-the-adoption-of-abductive-reasoning-for
Repo https://github.com/citiususc/construe
Framework none
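
The hypothesize-and-test cycle can be sketched as a generic abduction loop; the interfaces below (`covers`, `test`, `abduce`) are hypothetical names for this sketch, not the paper's concrete algorithms:

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

@dataclass
class Hypothesis:
    # Hypothetical container: the findings a conjecture explains
    # and a test of its predictions against the series.
    covers: FrozenSet
    test: Callable

def interpret(evidence, abduce) -> List[Hypothesis]:
    """Skeleton of a hypothesize-and-test cycle: attend to an unexplained
    finding, abduce candidate hypotheses for it, keep the first whose
    predictions hold, and repeat until everything is accounted for."""
    explained, accepted = set(), []
    while True:
        pending = [e for e in evidence if e not in explained]
        if not pending:
            return accepted
        focus = pending[0]              # crude stand-in for the attentional mechanism
        for hyp in abduce(focus):       # candidate abstractions for the finding
            if hyp.test(evidence):      # do the hypothesis' predictions hold?
                accepted.append(hyp)
                explained |= hyp.covers
                break
        else:
            explained.add(focus)        # nothing explains it; move on
```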

FractalNet: Ultra-Deep Neural Networks without Residuals

Title FractalNet: Ultra-Deep Neural Networks without Residuals
Authors Gustav Larsson, Michael Maire, Gregory Shakhnarovich
Abstract We introduce a design strategy for neural network macro-architecture based on self-similarity. Repeated application of a simple expansion rule generates deep networks whose structural layouts are precisely truncated fractals. These networks contain interacting subpaths of different lengths, but do not include any pass-through or residual connections; every internal signal is transformed by a filter and nonlinearity before being seen by subsequent layers. In experiments, fractal networks match the excellent performance of standard residual networks on both CIFAR and ImageNet classification tasks, thereby demonstrating that residual representations may not be fundamental to the success of extremely deep convolutional neural networks. Rather, the key may be the ability to transition, during training, from effectively shallow to deep. We note similarities with student-teacher behavior and develop drop-path, a natural extension of dropout, to regularize co-adaptation of subpaths in fractal architectures. Such regularization allows extraction of high-performance fixed-depth subnetworks. Additionally, fractal networks exhibit an anytime property: shallow subnetworks provide a quick answer, while deeper subnetworks, with higher latency, provide a more accurate answer.
Tasks Image Classification
Published 2016-05-24
URL http://arxiv.org/abs/1605.07648v4
PDF http://arxiv.org/pdf/1605.07648v4.pdf
PWC https://paperswithcode.com/paper/fractalnet-ultra-deep-neural-networks-without
Repo https://github.com/snf/keras-fractalnet
Framework none
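
The expansion rule is easy to state recursively: a depth-1 block is a single conv layer, and a depth-C block joins a conv layer with two stacked depth-(C-1) blocks. A sketch with the join as an elementwise mean, as in the paper; drop-path regularization is omitted for brevity:

```python
import torch
import torch.nn as nn

class Fractal(nn.Module):
    """Fractal block: repeated application of the expansion rule."""
    def __init__(self, channels, depth):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU())
        # Long path: two depth-(C-1) blocks in sequence, if any depth remains.
        self.sub = (nn.Sequential(Fractal(channels, depth - 1),
                                  Fractal(channels, depth - 1))
                    if depth > 1 else None)

    def forward(self, x):
        if self.sub is None:
            return self.conv(x)
        # Join: average the short (one conv) and long (recursive) paths.
        return 0.5 * (self.conv(x) + self.sub(x))
```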

Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

Title Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains
Authors David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire
Abstract High-dimensional observations and complex real-world dynamics present major challenges in reinforcement learning for both function approximation and exploration. We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals. And second, we propose an exploration strategy inspired by the principles of state abstraction and information acquisition under uncertainty. We demonstrate the empirical effectiveness of these techniques, first, as a preliminary check, on two standard tasks (Blackjack and $n$-Chain), and then on two much larger and more realistic tasks with high-dimensional observation spaces. Specifically, we introduce two benchmarks built within the game Minecraft where the observations are pixel arrays of the agent’s visual field. A combination of our two algorithmic techniques performs competitively on the standard reinforcement-learning tasks while consistently and substantially outperforming baselines on the two tasks with high-dimensional observation spaces. The new function approximator, exploration strategy, and evaluation benchmarks are each of independent interest in the pursuit of reinforcement-learning methods that scale to real-world domains.
Tasks
Published 2016-03-14
URL http://arxiv.org/abs/1603.04119v1
PDF http://arxiv.org/pdf/1603.04119v1.pdf
PWC https://paperswithcode.com/paper/exploratory-gradient-boosting-for
Repo https://github.com/wattlebirdaz/geql
Framework none
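
The function approximator can be sketched as standard gradient boosting applied to Bellman residuals: each round fits a weak regressor to the current TD errors and adds it to the ensemble. Tree hyperparameters, the feature encoding, and the target construction below are this sketch's assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class BoostedQ:
    """Gradient-boosting-style Q approximator: Q is a shrunken sum of
    weak regressors over state-action features."""
    def __init__(self, gamma=0.99, lr=0.5):
        self.models, self.gamma, self.lr = [], gamma, lr

    def q(self, phi):
        # phi: (N, d) state-action features; returns (N,) Q estimates.
        out = np.zeros(len(phi))
        for m in self.models:
            out += self.lr * m.predict(phi)
        return out

    def fit_round(self, phi, reward, phi_next_best, done):
        # One-step TD targets using the ensemble's own next-state estimate
        # (phi_next_best: features of the greedy next state-action pair).
        target = reward + self.gamma * (1.0 - done) * self.q(phi_next_best)
        residual = target - self.q(phi)
        self.models.append(DecisionTreeRegressor(max_depth=3).fit(phi, residual))
```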

Systematic evaluation of CNN advances on the ImageNet

Title Systematic evaluation of CNN advances on the ImageNet
Authors Dmytro Mishkin, Nikolay Sergievskiy, Jiri Matas
Abstract The paper systematically studies the impact of a range of recent advances in CNN architectures and learning methods on the object categorization (ILSVRC) problem. The evaluation tests the influence of the following architectural choices: non-linearity (ReLU, ELU, maxout, compatibility with batch normalization), pooling variants (stochastic, max, average, mixed), network width, classifier design (convolutional, fully-connected, SPP), image pre-processing, and of learning parameters: learning rate, batch size, cleanliness of the data, etc. The performance gains of the proposed modifications are first tested individually and then in combination. The sum of individual gains is bigger than the observed improvement when all modifications are introduced, but the “deficit” is small, suggesting independence of their benefits. We show that the use of 128x128 pixel images is sufficient to make qualitative conclusions about optimal network structure that hold for the full-size Caffe and VGG nets. The results are obtained an order of magnitude faster than with the standard 224x224 pixel images.
Tasks
Published 2016-06-07
URL http://arxiv.org/abs/1606.02228v2
PDF http://arxiv.org/pdf/1606.02228v2.pdf
PWC https://paperswithcode.com/paper/systematic-evaluation-of-cnn-advances-on-the
Repo https://github.com/ducha-aiki/caffenet-benchmark
Framework none

Generative Visual Manipulation on the Natural Image Manifold

Title Generative Visual Manipulation on the Natural Image Manifold
Authors Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros
Abstract Realistic image manipulation is challenging because it requires modifying the image appearance in a user-controlled way, while preserving the realism of the result. Unless the user has considerable artistic skill, it is easy to “fall off” the manifold of natural images while editing. In this paper, we propose to learn the natural image manifold directly from data using a generative adversarial neural network. We then define a class of image editing operations, and constrain their output to lie on that learned manifold at all times. The model automatically adjusts the output keeping all edits as realistic as possible. All our manipulations are expressed in terms of constrained optimization and are applied in near-real time. We evaluate our algorithm on the task of realistic photo manipulation of shape and color. The presented method can further be used for changing one image to look like another, as well as for generating novel imagery from scratch based on a user’s scribbles.
Tasks
Published 2016-09-12
URL http://arxiv.org/abs/1609.03552v3
PDF http://arxiv.org/pdf/1609.03552v3.pdf
PWC https://paperswithcode.com/paper/generative-visual-manipulation-on-the-natural
Repo https://github.com/junyanz/iGAN
Framework pytorch
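
The core operation is constrained optimization in the generator's latent space: the result stays on the learned manifold because the search is over latent codes rather than pixels. A sketch of projecting a user edit back onto the manifold (`G` is any pretrained generator mapping z to an image; the `z_dim` attribute and the masked loss are this sketch's assumptions):

```python
import torch

def project_to_manifold(G, x_edit, mask, steps=200, lr=0.05):
    """Find a latent code whose generated image matches the user's edit
    inside the edited region (mask == 1), returning an on-manifold result."""
    z = torch.randn(1, G.z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        # Penalize mismatch only where the user actually edited.
        loss = ((mask * (G(z) - x_edit)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return G(z).detach(), z.detach()
```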

Gaussian Error Linear Units (GELUs)

Title Gaussian Error Linear Units (GELUs)
Authors Dan Hendrycks, Kevin Gimpel
Abstract We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU nonlinearity is the expected transformation of a stochastic regularizer which randomly applies the identity or zero map to a neuron’s input. The GELU nonlinearity weights inputs by their magnitude, rather than gates inputs by their sign as in ReLUs. We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks.
Tasks
Published 2016-06-27
URL http://arxiv.org/abs/1606.08415v3
PDF http://arxiv.org/pdf/1606.08415v3.pdf
PWC https://paperswithcode.com/paper/gaussian-error-linear-units-gelus
Repo https://github.com/hendrycks/GELUs
Framework tf
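
The activation itself is one line: GELU(x) = x·Φ(x), where Φ is the standard normal CDF; the paper also gives a fast tanh approximation. Both in PyTorch (modern releases ship nn.GELU, so these are purely illustrative):

```python
import math
import torch

def gelu_exact(x):
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # The paper's fast tanh approximation.
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi)
                                       * (x + 0.044715 * x ** 3)))
```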

A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task

Title A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task
Authors Danqi Chen, Jason Bolton, Christopher D. Manning
Abstract Enabling a computer to understand a document so that it can answer comprehension questions is a central, yet unsolved goal of NLP. A key factor impeding its solution by machine learned systems is the limited availability of human-annotated data. Hermann et al. (2015) seek to solve this problem by creating over a million training examples by pairing CNN and Daily Mail news articles with their summarized bullet points, and show that a neural network can then be trained to give good performance on this task. In this paper, we conduct a thorough examination of this new reading comprehension task. Our primary aim is to understand what depth of language understanding is required to do well on this task. We approach this from one side by doing a careful hand-analysis of a small subset of the problems and from the other by showing that simple, carefully designed systems can obtain accuracies of 73.6% and 76.6% on these two datasets, exceeding current state-of-the-art results by 7-10% and approaching what we believe is the ceiling for performance on this task.
Tasks Reading Comprehension
Published 2016-06-09
URL http://arxiv.org/abs/1606.02858v2
PDF http://arxiv.org/pdf/1606.02858v2.pdf
PWC https://paperswithcode.com/paper/a-thorough-examination-of-the-cnndaily-mail
Repo https://github.com/danqi/rc-cnn-dailymail
Framework none
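
The simple, carefully designed system is an attentive reader with bilinear attention: encode passage and question, score every passage token against a question vector, and aggregate the attention mass over each candidate entity's positions. A sketch (dimensions and the last-state question summary are simplifications of this sketch):

```python
import torch
import torch.nn as nn

class AttentiveReader(nn.Module):
    """Bilinear-attention reader sketch for cloze-style comprehension."""
    def __init__(self, vocab, d=128, h=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.p_enc = nn.LSTM(d, h, batch_first=True, bidirectional=True)
        self.q_enc = nn.LSTM(d, h, batch_first=True, bidirectional=True)
        self.W = nn.Parameter(torch.randn(2 * h, 2 * h) * 0.01)

    def forward(self, passage, question):
        p, _ = self.p_enc(self.emb(passage))   # (B, Tp, 2h) token states
        q, _ = self.q_enc(self.emb(question))  # (B, Tq, 2h)
        q_vec = q[:, -1]                       # crude question summary
        # Bilinear score of every passage token against the question.
        scores = torch.einsum('btd,de,be->bt', p, self.W, q_vec)
        return scores.softmax(dim=1)           # attention over passage tokens
```

At answer time, the attention mass falling on each candidate entity's token positions is summed, and the highest-scoring entity is predicted.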

Syntactically Informed Text Compression with Recurrent Neural Networks

Title Syntactically Informed Text Compression with Recurrent Neural Networks
Authors David Cox
Abstract We present a self-contained system for constructing natural language models for use in text compression. Our system improves upon previous neural network based models by utilizing recent advances in syntactic parsing – Google’s SyntaxNet – to augment character-level recurrent neural networks. RNNs have proven exceptional in modeling sequence data such as text, as their architecture allows for modeling of long-term contextual information.
Tasks
Published 2016-08-08
URL http://arxiv.org/abs/1608.02893v2
PDF http://arxiv.org/pdf/1608.02893v2.pdf
PWC https://paperswithcode.com/paper/syntactically-informed-text-compression-with
Repo https://github.com/davidcox143/rnn-text-compress
Framework none
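
A character-level language model becomes a compressor once its predictive distribution drives an entropy coder: each character then costs roughly -log2 p bits. A sketch of the predictive model, with the syntactic features shown as an optional per-character input (the concatenation point is this sketch's assumption; the paper derives such features from SyntaxNet):

```python
import torch
import torch.nn as nn

class CharLM(nn.Module):
    """Character-level GRU language model, optionally syntax-augmented."""
    def __init__(self, n_chars=256, d=64, h=256, d_syn=0):
        super().__init__()
        self.emb = nn.Embedding(n_chars, d)
        self.rnn = nn.GRU(d + d_syn, h, batch_first=True)
        self.out = nn.Linear(h, n_chars)

    def forward(self, chars, syn=None, state=None):
        x = self.emb(chars)                    # (B, T, d)
        if syn is not None:                    # (B, T, d_syn) parse features
            x = torch.cat([x, syn], dim=-1)
        y, state = self.rnn(x, state)
        # Log-probabilities for the next character; an arithmetic coder
        # would consume these, spending about -log2 p(char) bits per symbol.
        return self.out(y).log_softmax(dim=-1), state
```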