Paper Group AWR 48
Scalable Learning of Non-Decomposable Objectives. ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning. Tree-to-Sequence Attentional Neural Machine Translation. Scribbler: Controlling Deep Image Synthesis with Sketch and Color. Temporal Attention-Gated Model for Robust Sequence Classification. Bottleneck Conditional Density …
Scalable Learning of Non-Decomposable Objectives
Title | Scalable Learning of Non-Decomposable Objectives |
Authors | Elad Eban, Mariano Schain, Alan Mackey, Ariel Gordon, Rif A. Saurous, Gal Elidan |
Abstract | Modern retrieval systems are often driven by an underlying machine learning model. The goal of such systems is to identify and possibly rank the few most relevant items for a given query or context. Thus, such systems are typically evaluated using a ranking-based performance metric such as the area under the precision-recall curve, the $F_\beta$ score, precision at fixed recall, etc. Obviously, it is desirable to train such systems to optimize the metric of interest. In practice, due to the scalability limitations of existing approaches for optimizing such objectives, large-scale retrieval systems are instead trained to maximize classification accuracy, in the hope that performance as measured via the true objective will also be favorable. In this work we present a unified framework that, using straightforward building block bounds, allows for highly scalable optimization of a wide range of ranking-based objectives. We demonstrate the advantage of our approach on several real-life retrieval problems that are significantly larger than those considered in the literature, while achieving substantial improvement in performance over the accuracy-objective baseline. |
Tasks | |
Published | 2016-08-16 |
URL | http://arxiv.org/abs/1608.04802v2 |
PDF | http://arxiv.org/pdf/1608.04802v2.pdf |
PWC | https://paperswithcode.com/paper/scalable-learning-of-non-decomposable |
Repo | https://github.com/tensorflow/models |
Framework | tf |
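The paper's core building block is easy to sketch: replace the non-differentiable true-positive and false-positive counts with hinge-style bounds, then assemble the metric of interest from the bounds. Below is a minimal NumPy illustration of that idea applied to F1 (the authors' actual TensorFlow implementation lives in the linked tensorflow/models repo); the toy scores and the exact surrogate assembly are ours.

```python
import numpy as np

def hinge_bounds(scores, labels):
    """Hinge surrogates for the 0/1 counts: a lower bound on true
    positives and an upper bound on false positives."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    tp_lb = np.sum(1.0 - np.maximum(0.0, 1.0 - pos))  # tp >= tp_lb
    fp_ub = np.sum(np.maximum(0.0, 1.0 + neg))        # fp <= fp_ub
    return max(tp_lb, 0.0), fp_ub

def f1_lower_bound(scores, labels):
    # F1 = 2tp / (tp + fp + P) with fn = P - tp; since F1 is increasing
    # in tp and decreasing in fp, substituting the bounds yields a
    # differentiable surrogate that lower-bounds the true F1.
    P = np.sum(labels == 1)
    tp_lb, fp_ub = hinge_bounds(scores, labels)
    return 2.0 * tp_lb / (tp_lb + fp_ub + P + 1e-12)

scores = np.array([2.1, 0.3, -1.2, 0.8, -0.5])  # model margins
labels = np.array([1, 1, 0, 0, 0])
print(f1_lower_bound(scores, labels))  # ~0.464, a valid lower bound on the true F1 (0.8 here)
```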
ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning
Title | ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning |
Authors | Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, Wojciech Jaśkowski |
Abstract | The recent advances in deep neural networks have led to effective vision-based reinforcement learning methods that have been employed to obtain human-level controllers in Atari 2600 games from pixel data. Atari 2600 games, however, do not resemble real-world tasks since they involve non-realistic 2D environments and the third-person perspective. Here, we propose a novel test-bed platform for reinforcement learning research from raw visual information which employs the first-person perspective in a semi-realistic 3D world. The software, called ViZDoom, is based on the classical first-person shooter video game, Doom. It allows developing bots that play the game using the screen buffer. ViZDoom is lightweight, fast, and highly customizable via a convenient mechanism of user scenarios. In the experimental part, we test the environment by trying to learn bots for two scenarios: a basic move-and-shoot task and a more complex maze-navigation problem. Using convolutional deep neural networks with Q-learning and experience replay, for both scenarios, we were able to train competent bots, which exhibit human-like behaviors. The results confirm the utility of ViZDoom as an AI research platform and imply that visual reinforcement learning in 3D realistic first-person perspective environments is feasible. |
Tasks | Atari Games, FPS Games, Game of Doom, Q-Learning |
Published | 2016-05-06 |
URL | http://arxiv.org/abs/1605.02097v2 |
PDF | http://arxiv.org/pdf/1605.02097v2.pdf |
PWC | https://paperswithcode.com/paper/vizdoom-a-doom-based-ai-research-platform-for |
Repo | https://github.com/chengyu2/vizdoom_rl_community_canberra |
Framework | none |
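For orientation, the typical interaction loop with the ViZDoom Python API looks roughly like the sketch below: the agent reads the screen buffer and issues button presses. The scenario config path and the three-button action set are assumptions for illustration, not part of the paper.

```python
from random import choice
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("scenarios/basic.cfg")  # assumed path to a bundled scenario
game.init()

# Button combinations: [MOVE_LEFT, MOVE_RIGHT, ATTACK]
actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

for episode in range(3):
    game.new_episode()
    while not game.is_episode_finished():
        state = game.get_state()        # screen buffer + game variables
        frame = state.screen_buffer     # raw pixels an agent learns from
        reward = game.make_action(choice(actions))
    print("Episode reward:", game.get_total_reward())
game.close()
```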
Tree-to-Sequence Attentional Neural Machine Translation
Title | Tree-to-Sequence Attentional Neural Machine Translation |
Authors | Akiko Eriguchi, Kazuma Hashimoto, Yoshimasa Tsuruoka |
Abstract | Most of the existing Neural Machine Translation (NMT) models focus on the conversion of sequential data and do not directly use syntactic information. We propose a novel end-to-end syntactic NMT model, extending a sequence-to-sequence model with the source-side phrase structure. Our model has an attention mechanism that enables the decoder to generate a translated word while softly aligning it with phrases as well as words of the source sentence. Experimental results on the WAT’15 English-to-Japanese dataset demonstrate that our proposed model considerably outperforms sequence-to-sequence attentional NMT models and compares favorably with the state-of-the-art tree-to-string SMT system. |
Tasks | Machine Translation |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06075v3 |
PDF | http://arxiv.org/pdf/1603.06075v3.pdf |
PWC | https://paperswithcode.com/paper/tree-to-sequence-attentional-neural-machine |
Repo | https://github.com/tempra28/tree2seq |
Framework | none |
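A stripped-down sketch of the tree-based encoder idea: encode a binarized parse bottom-up and let the decoder attend over phrase vectors as well as word vectors. The tanh combiner below stands in for the paper's Tree-LSTM unit; the weights and the toy parse tree are illustrative, not the model's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(scale=0.1, size=(d, 2 * d))  # hypothetical combiner weights

def encode(node, word_vecs, phrase_vecs):
    """Bottom-up encoding of a binarized parse tree. Leaves are word
    indices; internal nodes are (left, right) pairs. Every phrase vector
    is collected so the decoder can attend over phrases as well as words."""
    if isinstance(node, int):
        return word_vecs[node]
    left = encode(node[0], word_vecs, phrase_vecs)
    right = encode(node[1], word_vecs, phrase_vecs)
    h = np.tanh(W @ np.concatenate([left, right]))  # Tree-LSTM in the paper
    phrase_vecs.append(h)
    return h

words = rng.normal(size=(4, d))  # word-level hidden states
tree = ((0, 1), (2, 3))          # parse of a 4-word sentence
phrases = []
root = encode(tree, words, phrases)
attention_memory = np.vstack([words] + [p[None] for p in phrases])
print(attention_memory.shape)    # (7, 8): words + phrases the decoder attends to
```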
Scribbler: Controlling Deep Image Synthesis with Sketch and Color
Title | Scribbler: Controlling Deep Image Synthesis with Sketch and Color |
Authors | Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays |
Abstract | Recently, there have been several promising methods to generate realistic imagery from deep convolutional networks. These methods sidestep the traditional computer graphics rendering pipeline and instead generate imagery at the pixel level by learning from large collections of photos (e.g. faces or bedrooms). However, these methods are of limited utility because it is difficult for a user to control what the network produces. In this paper, we propose a deep adversarial image synthesis architecture that is conditioned on sketched boundaries and sparse color strokes to generate realistic cars, bedrooms, or faces. We demonstrate a sketch-based image synthesis system which allows users to ‘scribble’ over the sketch to indicate preferred colors for objects. Our network can then generate convincing images that satisfy both the color and the sketch constraints of the user. The network is feed-forward, which allows users to see the effect of their edits in real time. We compare to recent work on sketch-to-image synthesis and show that our approach can generate more realistic, more diverse, and more controllable outputs. The architecture is also effective at user-guided colorization of grayscale images. |
Tasks | Colorization, Image Generation |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00835v2 |
PDF | http://arxiv.org/pdf/1612.00835v2.pdf |
PWC | https://paperswithcode.com/paper/scribbler-controlling-deep-image-synthesis |
Repo | https://github.com/Pingxia/ConvolutionalSketchInversion |
Framework | none |
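A hedged sketch of the conditioning scheme: the generator takes the sketch and the sparse color strokes as extra input channels and maps them feed-forward to an RGB image. Layer sizes below are invented for illustration, and the adversarial discriminator and auxiliary losses used for training are omitted.

```python
import torch
import torch.nn as nn

class SketchColorGenerator(nn.Module):
    """Feed-forward generator conditioned on a sketch channel plus sparse
    color-stroke channels (illustrative sizes, not the paper's exact net)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + 3, 32, 3, stride=2, padding=1), nn.ReLU(),       # encode
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # decode
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),   # RGB out
        )
    def forward(self, sketch, strokes):
        return self.net(torch.cat([sketch, strokes], dim=1))

g = SketchColorGenerator()
sketch = torch.rand(1, 1, 64, 64)    # boundary drawing
strokes = torch.zeros(1, 3, 64, 64)  # mostly-empty color scribbles
strokes[:, 0, 20:24, 20:24] = 1.0    # a red hint
print(g(sketch, strokes).shape)      # torch.Size([1, 3, 64, 64])
```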
Temporal Attention-Gated Model for Robust Sequence Classification
Title | Temporal Attention-Gated Model for Robust Sequence Classification |
Authors | Wenjie Pei, Tadas Baltrušaitis, David M. J. Tax, Louis-Philippe Morency |
Abstract | Typical techniques for sequence classification are designed for well-segmented sequences which have been edited to remove noisy or irrelevant parts. Therefore, such methods cannot be easily applied on noisy sequences expected in real-world applications. In this paper, we present the Temporal Attention-Gated Model (TAGM) which integrates ideas from attention models and gated recurrent networks to better deal with noisy or unsegmented sequences. Specifically, we extend the concept of the attention model to measure the relevance of each observation (time step) of a sequence. We then use a novel gated recurrent network to learn the hidden representation for the final prediction. An important advantage of our approach is interpretability since the temporal attention weights provide a meaningful value for the salience of each time step in the sequence. We demonstrate the merits of our TAGM approach, both for prediction accuracy and interpretability, on three different tasks: spoken digit recognition, text-based sentiment analysis and visual event recognition. |
Tasks | Sentiment Analysis |
Published | 2016-12-01 |
URL | http://arxiv.org/abs/1612.00385v2 |
PDF | http://arxiv.org/pdf/1612.00385v2.pdf |
PWC | https://paperswithcode.com/paper/temporal-attention-gated-model-for-robust |
Repo | https://github.com/wenjiepei/TAGM |
Framework | torch |
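The gating mechanism is compact enough to sketch: a scalar attention weight per time step decides how far the recurrent state moves, so irrelevant steps barely change the representation. The linear salience scorer below replaces the bidirectional RNN the paper uses to produce the weights; all dimensions and weights are toy values.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h, T = 5, 8, 12
W = rng.normal(scale=0.3, size=(d_h, d_in))
U = rng.normal(scale=0.3, size=(d_h, d_h))
v = rng.normal(scale=0.3, size=d_in)   # hypothetical salience scorer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(T, d_in))
a = sigmoid(x @ v)        # scalar relevance per step (a bi-RNN in the paper)
h = np.zeros(d_h)
for t in range(T):
    cand = np.tanh(W @ x[t] + U @ h)
    h = (1.0 - a[t]) * h + a[t] * cand  # gated update: noisy steps are skipped

print(np.round(a, 2))     # interpretable per-step salience weights
print(h.shape)
```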
Bottleneck Conditional Density Estimation
Title | Bottleneck Conditional Density Estimation |
Authors | Rui Shu, Hung H. Bui, Mohammad Ghavamzadeh |
Abstract | We introduce a new framework for training deep generative models for high-dimensional conditional density estimation. The Bottleneck Conditional Density Estimator (BCDE) is a variant of the conditional variational autoencoder (CVAE) that employs layer(s) of stochastic variables as the bottleneck between the input $x$ and target $y$, where both are high-dimensional. Crucially, we propose a new hybrid training method that blends the conditional generative model with a joint generative model. Hybrid blending is the key to effective training of the BCDE, which avoids overfitting and provides a novel mechanism for leveraging unlabeled data. We show that our hybrid training procedure enables models to achieve competitive results in the MNIST quadrant prediction task in the fully-supervised setting, and sets new benchmarks in the semi-supervised regime for MNIST, SVHN, and CelebA. |
Tasks | Density Estimation |
Published | 2016-11-25 |
URL | http://arxiv.org/abs/1611.08568v3 |
PDF | http://arxiv.org/pdf/1611.08568v3.pdf |
PWC | https://paperswithcode.com/paper/bottleneck-conditional-density-estimation |
Repo | https://github.com/ruishu/bcde |
Framework | tf |
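The hybrid-blending idea can be illustrated with stand-in terms: one parameter set is trained on a convex combination of a conditional objective and a joint objective, with unlabeled inputs entering through the marginal term. The quadratic "ELBOs" below are placeholders for illustration only, not the paper's variational bounds over the bottleneck latents.

```python
import numpy as np

rng = np.random.default_rng(0)
x_lab, y_lab = rng.normal(size=8), rng.normal(size=8)  # paired data
x_unl = rng.normal(size=20)                            # unlabeled inputs

def conditional_term(x, y, theta):
    """Stand-in for the CVAE ELBO on log p(y|x)."""
    return -np.sum((y - theta * x) ** 2)

def joint_term(x, y, theta):
    """Stand-in for a joint ELBO on log p(x, y) sharing theta."""
    return -np.sum((y - theta * x) ** 2) - np.sum(x ** 2)

def marginal_term(x):
    """Stand-in for log p(x): the hook through which unlabeled data helps."""
    return -np.sum(x ** 2)

def hybrid_objective(theta, alpha=0.5):
    # Hybrid blending: the same parameters serve the conditional and the
    # joint model, regularizing the former and letting x_unl contribute.
    return (alpha * conditional_term(x_lab, y_lab, theta)
            + (1 - alpha) * (joint_term(x_lab, y_lab, theta) + marginal_term(x_unl)))

print(hybrid_objective(theta=0.3))
```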
Deeply-Fused Nets
Title | Deeply-Fused Nets |
Authors | Jingdong Wang, Zhen Wei, Ting Zhang, Wenjun Zeng |
Abstract | In this paper, we present a novel deep learning approach, deeply-fused nets. The central idea of our approach is deep fusion, i.e., combining the intermediate representations of base networks, where the fused output serves as the input of the remaining part of each base network, and performing such combinations deeply over several intermediate representations. The resulting deeply fused net enjoys several benefits. First, it is able to learn multi-scale representations as it enjoys the benefits of more base networks, which could form the same fused network, other than the initial group of base networks. Second, in our suggested fused net formed by one deep and one shallow base network, the flows of the information from the earlier intermediate layer of the deep base network to the output and from the input to the later intermediate layer of the deep base network are both improved. Last, the deep and shallow base networks are jointly learnt and can benefit from each other. More interestingly, the essential depth of a fused net composed from a deep base network and a shallow base network is reduced because the fused net could be composed from a less deep base network, and thus training the fused net is less difficult than training the initial deep base network. Empirical results demonstrate that our approach achieves superior performance over two closely-related methods, ResNet and Highway, and competitive performance compared to the state of the art. |
Tasks | |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.07716v1 |
PDF | http://arxiv.org/pdf/1605.07716v1.pdf |
PWC | https://paperswithcode.com/paper/deeply-fused-nets |
Repo | https://github.com/homles11/IGCV3 |
Framework | tf |
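Deep fusion reduces to a simple wiring pattern, sketched below with toy linear blocks: sum the intermediate representations of a deep and a shallow branch at each fusion point and feed the fused tensor back into both. Widths and depths are illustrative, not the paper's architectures; the shallow branch's role is to shorten gradient paths to the deep branch's layers.

```python
import torch
import torch.nn as nn

class DeeplyFusedNet(nn.Module):
    """Two base networks (one deep, one shallow) fused by summing their
    intermediate representations at each fusion point."""
    def __init__(self, d=16):
        super().__init__()
        self.deep1 = nn.Sequential(nn.Linear(d, d), nn.ReLU(),
                                   nn.Linear(d, d), nn.ReLU())
        self.shallow1 = nn.Sequential(nn.Linear(d, d), nn.ReLU())
        self.deep2 = nn.Sequential(nn.Linear(d, d), nn.ReLU(),
                                   nn.Linear(d, d), nn.ReLU())
        self.shallow2 = nn.Sequential(nn.Linear(d, d), nn.ReLU())
        self.head = nn.Linear(d, 10)

    def forward(self, x):
        fused = self.deep1(x) + self.shallow1(x)          # first fusion point
        fused = self.deep2(fused) + self.shallow2(fused)  # second fusion point
        return self.head(fused)

print(DeeplyFusedNet()(torch.rand(4, 16)).shape)  # torch.Size([4, 10])
```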
FALDOI: A new minimization strategy for large displacement variational optical flow
Title | FALDOI: A new minimization strategy for large displacement variational optical flow |
Authors | Roberto P. Palomares, Enric Meinhardt-Llopis, Coloma Ballester, Gloria Haro |
Abstract | We propose a large displacement optical flow method that introduces a new strategy to compute a good local minimum of any optical flow energy functional. The method requires a given set of discrete matches, which can be extremely sparse, and an energy functional which locally guides the interpolation from those matches. In particular, the matches are used to guide a structured coordinate-descent of the energy functional around these keypoints. It results in a two-step minimization method at the finest scale which is very robust to the inevitable outliers of the sparse matcher and able to capture large displacements of small objects. Its benefits over other variational methods that also rely on a set of sparse matches are its robustness against very few matches, high levels of noise and outliers. We validate our proposal using several optical flow variational models. The results consistently outperform the coarse-to-fine approaches and achieve good qualitative and quantitative performance on the standard optical flow benchmarks. |
Tasks | Optical Flow Estimation |
Published | 2016-02-29 |
URL | http://arxiv.org/abs/1602.08960v3 |
PDF | http://arxiv.org/pdf/1602.08960v3.pdf |
PWC | https://paperswithcode.com/paper/faldoi-a-new-minimization-strategy-for-large |
Repo | https://github.com/fperezgamonal/faldoi-ipol |
Framework | none |
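A toy rendition of the match-guided densification: seed the flow field at the sparse matches, then sweep the image letting each pixel adopt the neighboring flow candidate with the lowest brightness-constancy cost. The real method minimizes a full variational energy with a structured two-step coordinate descent; everything below, including the cost, is a simplification for illustration.

```python
import numpy as np

def densify_from_matches(I0, I1, matches, n_sweeps=4):
    """Grow a dense flow field outward from sparse (x, y, u, v) matches."""
    H, W = I0.shape
    flow = np.zeros((H, W, 2))
    known = np.zeros((H, W), dtype=bool)
    for (x, y, u, v) in matches:  # sparse seeds guide the minimization
        flow[y, x] = (u, v); known[y, x] = True

    def cost(y, x, uv):  # toy brightness-constancy data term
        yy = int(np.clip(y + uv[1], 0, H - 1))
        xx = int(np.clip(x + uv[0], 0, W - 1))
        return abs(I0[y, x] - I1[yy, xx])

    for _ in range(n_sweeps):
        for y in range(H):
            for x in range(W):
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W and known[ny, nx]:
                        cand = flow[ny, nx]
                        if not known[y, x] or cost(y, x, cand) < cost(y, x, flow[y, x]):
                            flow[y, x] = cand; known[y, x] = True
    return flow

I0 = np.zeros((16, 16)); I0[4:8, 4:8] = 1.0
I1 = np.zeros((16, 16)); I1[4:8, 7:11] = 1.0  # square moved 3 px right
print(densify_from_matches(I0, I1, [(5, 5, 3, 0)])[5, 5])  # [3. 0.]
```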
Gradient Coding
Title | Gradient Coding |
Authors | Rashish Tandon, Qi Lei, Alexandros G. Dimakis, Nikos Karampatziakis |
Abstract | We propose a novel coding theoretic framework for mitigating stragglers in distributed learning. We show how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for Synchronous Gradient Descent. We implement our schemes in Python (using MPI) to run on Amazon EC2, and compare against baseline approaches in terms of running time and generalization error. |
Tasks | |
Published | 2016-12-10 |
URL | http://arxiv.org/abs/1612.03301v2 |
PDF | http://arxiv.org/pdf/1612.03301v2.pdf |
PWC | https://paperswithcode.com/paper/gradient-coding |
Repo | https://github.com/hwang595/ErasureHead |
Framework | pytorch |
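The paper's motivating example is concrete enough to run: with three workers each sending one coded combination of its block gradients, the full gradient sum survives any single straggler. Below is that scheme re-implemented in NumPy (the authors used MPI on EC2); the decoding vectors verify all three straggler patterns.

```python
import numpy as np

# Three workers, tolerating any single straggler. Each worker sends one
# coded combination of the block gradients g1, g2, g3 it holds locally:
#   w1 = g1/2 + g2,   w2 = g2 - g3,   w3 = g1/2 + g3
B = np.array([[0.5, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0,  1.0]])

# For each pair of surviving workers, a combination recovering g1 + g2 + g3.
decode = {frozenset({0, 1}): {0: 2.0, 1: -1.0},
          frozenset({0, 2}): {0: 1.0, 2: 1.0},
          frozenset({1, 2}): {1: 1.0, 2: 2.0}}

rng = np.random.default_rng(0)
g = rng.normal(size=(3, 4))   # per-block gradients (dimension 4)
coded = B @ g                 # what each worker would transmit

for survivors, coeffs in decode.items():
    recovered = sum(a * coded[i] for i, a in coeffs.items())
    assert np.allclose(recovered, g.sum(axis=0))
print("full gradient recovered from any 2 of 3 workers")
```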
Deep Reinforcement Learning for Multi-Domain Dialogue Systems
Title | Deep Reinforcement Learning for Multi-Domain Dialogue Systems |
Authors | Heriberto Cuayáhuitl, Seunghak Yu, Ashley Williamson, Jacob Carse |
Abstract | Standard deep reinforcement learning methods such as Deep Q-Networks (DQN) for multiple tasks (domains) face scalability problems. We propose a method for multi-domain dialogue policy learning, termed NDQN, and apply it to an information-seeking spoken dialogue system in the domains of restaurants and hotels. Experimental results comparing DQN (baseline) against NDQN (proposed) in simulation show that our proposed method exhibits better scalability and is promising for optimising the behaviour of multi-domain dialogue systems. |
Tasks | |
Published | 2016-11-26 |
URL | http://arxiv.org/abs/1611.08675v1 |
PDF | http://arxiv.org/pdf/1611.08675v1.pdf |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-multi-domain |
Repo | https://github.com/cuayahuitl/SimpleDS |
Framework | none |
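The abstract leaves NDQN's internals out, so the sketch below only illustrates the multi-domain setup it addresses: one policy per domain plus a router that picks the domain for the current user turn. The keyword router and Q-table policies are stand-ins of ours, not the paper's networks.

```python
import random

class MultiDomainPolicy:
    """One policy (here a Q-table) per domain, plus a router deciding
    which domain the current user turn belongs to."""
    def __init__(self, domains, actions, eps=0.1):
        self.q = {d: {} for d in domains}
        self.actions, self.eps = actions, eps

    def route(self, user_turn):
        # trivial keyword router standing in for a learned domain tracker
        return "hotels" if "hotel" in user_turn else "restaurants"

    def act(self, domain, state):
        qs = self.q[domain].setdefault(state, {a: 0.0 for a in self.actions})
        if random.random() < self.eps:     # epsilon-greedy exploration
            return random.choice(self.actions)
        return max(qs, key=qs.get)

policy = MultiDomainPolicy(["restaurants", "hotels"],
                           ["request(area)", "inform(price)", "confirm()"])
turn = "i need a cheap hotel near the station"
d = policy.route(turn)
print(d, "->", policy.act(d, "greeting"))
```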
FPNN: Field Probing Neural Networks for 3D Data
Title | FPNN: Field Probing Neural Networks for 3D Data |
Authors | Yangyan Li, Soeren Pirk, Hao Su, Charles R. Qi, Leonidas J. Guibas |
Abstract | Building discriminative representations for 3D data has been an important task in computer graphics and computer vision research. Convolutional Neural Networks (CNNs) have been shown to operate on 2D images with great success for a variety of tasks. Lifting convolution operators to 3D (3DCNNs) seems like a plausible and promising next step. Unfortunately, the computational complexity of 3D CNNs grows cubically with respect to voxel resolution. Moreover, since most 3D geometry representations are boundary based, occupied regions do not increase proportionately with the size of the discretization, resulting in wasted computation. In this work, we represent 3D spaces as volumetric fields, and propose a novel design that employs field probing filters to efficiently extract features from them. Each field probing filter is a set of probing points: sensors that perceive the space. Our learning algorithm optimizes not only the weights associated with the probing points, but also their locations, which deforms the shape of the probing filters and adaptively distributes them in 3D space. The optimized probing points sense the 3D space “intelligently”, rather than operating blindly over the entire domain. We show that field probing is significantly more efficient than 3DCNNs, while providing state-of-the-art performance, on classification tasks for 3D object recognition benchmark datasets. |
Tasks | 3D Object Recognition, Object Recognition |
Published | 2016-05-20 |
URL | http://arxiv.org/abs/1605.06240v3 |
PDF | http://arxiv.org/pdf/1605.06240v3.pdf |
PWC | https://paperswithcode.com/paper/fpnn-field-probing-neural-networks-for-3d |
Repo | https://github.com/yangyanli/FPNN |
Framework | none |
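The field-probing operation itself is tiny: each filter reads the volumetric field only at its probing points and outputs a weighted sum, so both the weights and the point locations can be optimized. The sketch below uses nearest-neighbor sampling on a toy distance field; the paper uses trilinear sampling so that point locations receive gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 32
# Toy distance field of a sphere on an R^3 grid.
ax = np.arange(R)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
field = np.abs(np.sqrt((X - 16) ** 2 + (Y - 16) ** 2 + (Z - 16) ** 2) - 8.0)

n_filters, n_points = 4, 16
points = rng.uniform(0, R - 1, size=(n_filters, n_points, 3))  # learnable locations
weights = rng.normal(size=(n_filters, n_points))               # learnable weights

def probe(field, points, weights):
    """Each filter senses the field only at its probing points and
    outputs a weighted sum of the sampled values."""
    idx = np.clip(np.rint(points).astype(int), 0, field.shape[0] - 1)
    samples = field[idx[..., 0], idx[..., 1], idx[..., 2]]  # (filters, points)
    return (samples * weights).sum(axis=1)

print(probe(field, points, weights))  # one response per probing filter
```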
A User Simulator for Task-Completion Dialogues
Title | A User Simulator for Task-Completion Dialogues |
Authors | Xiujun Li, Zachary C. Lipton, Bhuwan Dhingra, Lihong Li, Jianfeng Gao, Yun-Nung Chen |
Abstract | Despite widespread interest in reinforcement learning for task-oriented dialogue systems, several obstacles can frustrate research and development progress. First, reinforcement learners typically require interaction with the environment, so conventional dialogue corpora cannot be used directly. Second, each task presents specific challenges, requiring a separate corpus of task-specific annotated data. Third, collecting and annotating human-machine or human-human conversations for task-oriented dialogues requires extensive domain knowledge. Because building an appropriate dataset can be both financially costly and time-consuming, one popular approach is to build a user simulator based upon a corpus of example dialogues. Then, one can train reinforcement learning agents in an online fashion as they interact with the simulator. Dialogue agents trained on these simulators can serve as an effective starting point. Once agents master the simulator, they may be deployed in a real environment to interact with humans, and continue to be trained online. To ease empirical algorithmic comparisons in dialogues, this paper introduces a new, publicly available simulation framework, where our simulator, designed for the movie-booking domain, leverages both rules and collected data. The simulator supports two tasks: movie ticket booking and movie seeking. Finally, we demonstrate several agents and detail the procedure to add and test your own agent in the proposed framework. |
Tasks | Task-Oriented Dialogue Systems |
Published | 2016-12-17 |
URL | http://arxiv.org/abs/1612.05688v3 |
PDF | http://arxiv.org/pdf/1612.05688v3.pdf |
PWC | https://paperswithcode.com/paper/a-user-simulator-for-task-completion |
Repo | https://github.com/MiuLab/UserSimulator |
Framework | none |
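A minimal agenda-style simulator conveys the interface an RL agent trains against: the user holds a goal of constraint slots and answers agent requests until the goal is exhausted. The slot names and dialogue acts below are illustrative, not the framework's actual schema.

```python
class MovieUserSimulator:
    """Toy rule-based user simulator in the spirit of the movie-booking
    framework: answer agent requests from a fixed user goal."""
    def __init__(self, goal):
        self.goal = dict(goal)       # e.g. {"moviename": ..., "date": ...}
        self.remaining = dict(goal)  # slots not yet conveyed to the agent

    def respond(self, agent_act):
        kind, slot = agent_act
        if kind == "request" and slot in self.remaining:
            value = self.remaining.pop(slot)
            return ("inform", slot, value)
        if kind == "inform" and not self.remaining:
            return ("thanks", None, None)  # goal satisfied, dialogue ends
        return ("deny", slot, None)

user = MovieUserSimulator({"moviename": "zootopia", "date": "tomorrow",
                           "numberofpeople": "2"})
print(user.respond(("request", "date")))       # ('inform', 'date', 'tomorrow')
print(user.respond(("request", "moviename")))  # ('inform', 'moviename', 'zootopia')
```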
Data Augmentation via Levy Processes
Title | Data Augmentation via Levy Processes |
Authors | Stefan Wager, William Fithian, Percy Liang |
Abstract | If a document is about travel, we may expect that short snippets of the document should also be about travel. We introduce a general framework for incorporating these types of invariances into a discriminative classifier. The framework imagines data as being drawn from a slice of a Levy process. If we slice the Levy process at an earlier point in time, we obtain additional pseudo-examples, which can be used to train the classifier. We show that this scheme has two desirable properties: it preserves the Bayes decision boundary, and it is equivalent to fitting a generative model in the limit where we rewind time back to 0. Our construction captures popular schemes such as Gaussian feature noising and dropout training, as well as admitting new generalizations. |
Tasks | Data Augmentation, Image Augmentation |
Published | 2016-03-21 |
URL | http://arxiv.org/abs/1603.06340v1 |
PDF | http://arxiv.org/pdf/1603.06340v1.pdf |
PWC | https://paperswithcode.com/paper/data-augmentation-via-levy-processes |
Repo | https://github.com/swager/levythin |
Framework | none |
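For count features the construction is concrete: if a bag-of-words vector is read as a Poisson (Levy) process sliced at time 1, binomial thinning samples the slice at an earlier time t, yielding label-preserving pseudo-examples, i.e., shorter "snippets" of the same document. A minimal sketch, with a made-up document vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def levy_thin(counts, t, n_aug=5):
    """Binomial thinning: keep each word occurrence independently with
    probability t, which is an exact sample of the Levy-process slice at
    time t < 1. Thinned vectors become extra training pseudo-examples
    carrying the same label as the original."""
    return rng.binomial(counts, t, size=(n_aug, counts.size))

doc = np.array([3, 0, 1, 5, 2])  # bag-of-words counts for one document
print(levy_thin(doc, t=0.5))     # shorter "snippets" of the same document
```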
node2vec: Scalable Feature Learning for Networks
Title | node2vec: Scalable Feature Learning for Networks |
Authors | Aditya Grover, Jure Leskovec |
Abstract | Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node’s network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks. |
Tasks | Link Prediction, Multi-Label Classification, Node Classification, Representation Learning |
Published | 2016-07-03 |
URL | http://arxiv.org/abs/1607.00653v1 |
PDF | http://arxiv.org/pdf/1607.00653v1.pdf |
PWC | https://paperswithcode.com/paper/node2vec-scalable-feature-learning-for |
Repo | https://github.com/WiktorJ/msnode2vec |
Framework | none |
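The biased walk is the heart of node2vec and fits in a few lines: when walking along edge (t, v), the next node x is weighted 1/p for returning to t, 1 if x is also a neighbor of t (BFS-like), and 1/q otherwise (DFS-like), interpolating between local and exploratory neighborhoods. The walks would then be fed to a skip-gram model to learn the embeddings; the toy graph below is ours.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=0.5):
    """One 2nd-order biased random walk with return parameter p and
    in-out parameter q."""
    walk = [start]
    while len(walk) < length:
        v = walk[-1]
        if len(walk) == 1:                     # first step is unbiased
            walk.append(random.choice(adj[v]))
            continue
        t = walk[-2]
        weights = []
        for x in adj[v]:
            if x == t:
                weights.append(1.0 / p)        # return to previous node
            elif x in adj[t]:
                weights.append(1.0)            # stay near t (BFS-like)
            else:
                weights.append(1.0 / q)        # move outward (DFS-like)
        walk.append(random.choices(adj[v], weights=weights)[0])
    return walk

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
walks = [node2vec_walk(adj, n, length=6) for n in adj for _ in range(2)]
print(walks[0])  # feed walks to a word2vec-style skip-gram for embeddings
```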
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
Title | Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer |
Authors | Sergey Zagoruyko, Nikos Komodakis |
Abstract | Attention plays a critical role in human visual experience. Furthermore, it has recently been demonstrated that attention can also play an important role in the context of applying artificial neural networks to a variety of tasks from fields such as computer vision and NLP. In this work we show that, by properly defining attention for convolutional neural networks, we can actually use this type of information in order to significantly improve the performance of a student CNN network by forcing it to mimic the attention maps of a powerful teacher network. To that end, we propose several novel methods of transferring attention, showing consistent improvement across a variety of datasets and convolutional neural network architectures. Code and models for our experiments are available at https://github.com/szagoruyko/attention-transfer |
Tasks | |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03928v3 |
PDF | http://arxiv.org/pdf/1612.03928v3.pdf |
PWC | https://paperswithcode.com/paper/paying-more-attention-to-attention-improving |
Repo | https://github.com/bharatsau/Darknet2 |
Framework | none |
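The activation-based transfer loss is short enough to state exactly: square the feature map, average over channels, L2-normalize the flattened spatial map, and penalize the student-teacher difference. The sketch below follows that formulation (the authors' own PyTorch code is at the URL in the abstract; the repo linked above is a third-party Darknet port); the random tensors are placeholders for real network activations.

```python
import torch
import torch.nn.functional as F

def attention_map(fm):
    """Spatial attention map of a conv feature map: channel-wise mean of
    squared activations, flattened and L2-normalized."""
    return F.normalize(fm.pow(2).mean(dim=1).flatten(1))

def at_loss(student_fm, teacher_fm):
    # Added to the usual classification loss so the student mimics the
    # teacher's attention maps at chosen layers.
    return (attention_map(student_fm) - attention_map(teacher_fm)).pow(2).mean()

student = torch.rand(4, 64, 16, 16)   # channel counts may differ; only the
teacher = torch.rand(4, 256, 16, 16)  # spatial sizes must match
print(at_loss(student, teacher))
```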