October 20, 2019

2886 words 14 mins read

Paper Group AWR 218

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches. A new dataset and model for learning to understand navigational instructions. Attention-based Adaptive Selection of Operations for Image Restoration in the Presence of Unknown Combined Distortions. DeepVoxels: Learning Persistent 3D Feature Embeddings. Chinese Pinyin Aided …

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

Title Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
Authors Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse
Abstract Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies. Unfortunately, due to the large number of weights, all the examples in a mini-batch typically share the same weight perturbation, thereby limiting the variance reduction effect of large mini-batches. We introduce flipout, an efficient method for decorrelating the gradients within a mini-batch by implicitly sampling pseudo-independent weight perturbations for each example. Empirically, flipout achieves the ideal linear variance reduction for fully connected networks, convolutional networks, and RNNs. We find significant speedups in training neural networks with multiplicative Gaussian perturbations. We show that flipout is effective at regularizing LSTMs, and outperforms previous methods. Flipout also enables us to vectorize evolution strategies: in our experiments, a single GPU with flipout can handle the same throughput as at least 40 CPU cores using existing methods, equivalent to a factor-of-4 cost reduction on Amazon Web Services.
Tasks
Published 2018-03-12
URL http://arxiv.org/abs/1803.04386v2
PDF http://arxiv.org/pdf/1803.04386v2.pdf
PWC https://paperswithcode.com/paper/flipout-efficient-pseudo-independent-weight
Repo https://github.com/wan-ji/Bayesian-deep-learning
Framework tf
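
For intuition, here is a minimal NumPy sketch of the flipout trick for a single fully connected layer, assuming a zero-mean Gaussian weight perturbation with a given standard deviation. The shapes and names are illustrative, and real implementations add more machinery (convolutional variants, KL terms, etc.).

```python
import numpy as np

def flipout_dense(x, w_mean, w_std, rng):
    """Flipout for one dense layer: one sampled perturbation per mini-batch,
    decorrelated across examples by per-example random sign vectors.

    x:      (batch, d_in) inputs
    w_mean: (d_in, d_out) weight means
    w_std:  (d_in, d_out) std of the zero-mean, symmetric weight perturbation
    """
    batch, d_in = x.shape
    d_out = w_mean.shape[1]
    delta_w = rng.standard_normal((d_in, d_out)) * w_std   # shared base perturbation
    s = rng.choice([-1.0, 1.0], size=(batch, d_in))        # per-example input signs
    r = rng.choice([-1.0, 1.0], size=(batch, d_out))       # per-example output signs
    # y_n = x_n W + ((x_n * s_n) dW) * r_n, i.e. example n effectively sees
    # the shared perturbation dW flipped elementwise by s_n r_n^T.
    return x @ w_mean + ((x * s) @ delta_w) * r

rng = np.random.default_rng(0)
y = flipout_dense(rng.standard_normal((32, 8)), np.zeros((8, 4)),
                  0.1 * np.ones((8, 4)), rng)
print(y.shape)  # (32, 4)
```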

A new dataset and model for learning to understand navigational instructions

Title A new dataset and model for learning to understand navigational instructions
Authors Ozan Arkan Can, Deniz Yuret
Abstract In this paper, we present a state-of-the-art model and introduce a new dataset for grounded language learning. Our goal is to develop a model that can learn to follow new instructions given prior instruction-perception-action examples. We based our work on the SAIL dataset which consists of navigational instructions and actions in a maze-like environment. The new model we propose achieves the best results to date on the SAIL dataset by using an improved perceptual component that can represent relative positions of objects. We also analyze the problems with the SAIL dataset regarding its size and balance. We argue that performance on a small, fixed-size dataset is no longer a good measure to differentiate state-of-the-art models. We introduce SAILx, a synthetic dataset generator, and perform experiments where the size and balance of the dataset are controlled.
Tasks
Published 2018-05-21
URL http://arxiv.org/abs/1805.07952v1
PDF http://arxiv.org/pdf/1805.07952v1.pdf
PWC https://paperswithcode.com/paper/a-new-dataset-and-model-for-learning-to
Repo https://github.com/ozanarkancan/SAILx
Framework none

Attention-based Adaptive Selection of Operations for Image Restoration in the Presence of Unknown Combined Distortions

Title Attention-based Adaptive Selection of Operations for Image Restoration in the Presence of Unknown Combined Distortions
Authors Masanori Suganuma, Xing Liu, Takayuki Okatani
Abstract Many studies have been conducted so far on image restoration, the problem of restoring a clean image from its distorted version. There are many different types of distortion which affect image quality. Previous studies have focused on single types of distortion, proposing methods for removing them. However, image quality degrades due to multiple factors in the real world. Thus, depending on applications, e.g., vision for autonomous cars or surveillance cameras, we need to be able to deal with multiple combined distortions with unknown mixture ratios. For this purpose, we propose a simple yet effective layer architecture of neural networks. It performs multiple operations in parallel, which are weighted by an attention mechanism to enable selection of proper operations depending on the input. The layer can be stacked to form a deep network, which is differentiable and thus can be trained in an end-to-end fashion by gradient descent. The experimental results show that the proposed method works better than previous methods by a good margin on tasks of restoring images with multiple combined distortions.
Tasks Image Restoration
Published 2018-12-03
URL http://arxiv.org/abs/1812.00733v2
PDF http://arxiv.org/pdf/1812.00733v2.pdf
PWC https://paperswithcode.com/paper/attention-based-adaptive-selection-of
Repo https://github.com/sg-nm/Operation-wise-attention-network
Framework pytorch
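
A minimal PyTorch sketch of the layer idea described in the abstract: several operations run in parallel and an attention head blends their outputs depending on the input. The specific operations, channel count, and attention head below are illustrative assumptions, not the authors' exact configuration (see the linked repo for that).

```python
import torch
import torch.nn as nn

class OperationWiseAttentionLayer(nn.Module):
    """Run several restoration operations in parallel and blend them with
    input-dependent attention weights, plus a residual connection."""

    def __init__(self, channels: int = 32):
        super().__init__()
        # Parallel "operations" with different receptive fields (illustrative).
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Conv2d(channels, channels, 1),
        ])
        # Attention head: one weight per operation from a global summary of the input.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(self.ops), 1),
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        weights = self.attn(x)                                   # (B, n_ops, 1, 1)
        outs = torch.stack([op(x) for op in self.ops], dim=1)    # (B, n_ops, C, H, W)
        return x + (weights.unsqueeze(2) * outs).sum(dim=1)

layer = OperationWiseAttentionLayer(32)
print(layer(torch.randn(2, 32, 16, 16)).shape)  # torch.Size([2, 32, 16, 16])
```

Because the layer is differentiable end to end, stacking several of these blocks and training by gradient descent requires nothing beyond the usual pipeline.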

DeepVoxels: Learning Persistent 3D Feature Embeddings

Title DeepVoxels: Learning Persistent 3D Feature Embeddings
Authors Vincent Sitzmann, Justus Thies, Felix Heide, Matthias Nießner, Gordon Wetzstein, Michael Zollhöfer
Abstract In this work, we address the lack of 3D understanding of generative neural networks by introducing a persistent 3D feature embedding for view synthesis. To this end, we propose DeepVoxels, a learned representation that encodes the view-dependent appearance of a 3D scene without having to explicitly model its geometry. At its core, our approach is based on a Cartesian 3D grid of persistent embedded features that learn to make use of the underlying 3D scene structure. Our approach combines insights from 3D geometric computer vision with recent advances in learning image-to-image mappings based on adversarial loss functions. DeepVoxels is supervised, without requiring a 3D reconstruction of the scene, using a 2D re-rendering loss and enforces perspective and multi-view geometry in a principled manner. We apply our persistent 3D scene representation to the problem of novel view synthesis demonstrating high-quality results for a variety of challenging scenes.
Tasks 3D Reconstruction, Novel View Synthesis
Published 2018-12-03
URL http://arxiv.org/abs/1812.01024v2
PDF http://arxiv.org/pdf/1812.01024v2.pdf
PWC https://paperswithcode.com/paper/deepvoxels-learning-persistent-3d-feature
Repo https://github.com/vsitzmann/deepvoxels
Framework pytorch

Chinese Pinyin Aided IME, Input What You Have Not Keystroked Yet

Title Chinese Pinyin Aided IME, Input What You Have Not Keystroked Yet
Authors Yafang Huang, Hai Zhao
Abstract A Chinese pinyin input method engine (IME) converts pinyin into characters so that Chinese text can be conveniently entered into a computer through a common keyboard. IMEs rely on their core component, pinyin-to-character conversion (P2C). Chinese IMEs usually predict a list of candidate character sequences based only on the pinyin the user has typed in the current turn. However, Chinese input is a multi-turn online procedure, which can be exploited to further improve the user experience. This paper therefore introduces, for the first time, a sequence-to-sequence model with a gated-attention mechanism for this core task in IMEs. The proposed neural P2C model is trained to encode the previous input utterance as extra context, enabling our IME to predict character sequences even from incomplete pinyin input. Evaluated on several benchmark datasets, our model shows a substantial user experience improvement over traditional models, demonstrating the first engineering practice of building a Chinese aided IME.
Tasks
Published 2018-09-02
URL http://arxiv.org/abs/1809.00329v1
PDF http://arxiv.org/pdf/1809.00329v1.pdf
PWC https://paperswithcode.com/paper/chinese-pinyin-aided-ime-input-what-you-have
Repo https://github.com/YvonneHuang/gaIME
Framework pytorch

Towards Sparse Hierarchical Graph Classifiers

Title Towards Sparse Hierarchical Graph Classifiers
Authors Cătălina Cangea, Petar Veličković, Nikola Jovanović, Thomas Kipf, Pietro Liò
Abstract Recent advances in representation learning on graphs, mainly leveraging graph convolutional networks, have brought a substantial improvement on many graph-based benchmark tasks. While novel approaches to learning node embeddings are highly suitable for node classification and link prediction, their application to graph classification (predicting a single label for the entire graph) remains mostly rudimentary, typically using a single global pooling step to aggregate node features or a hand-designed, fixed heuristic for hierarchical coarsening of the graph structure. An important step towards ameliorating this is differentiable graph coarsening—the ability to reduce the size of the graph in an adaptive, data-dependent manner within a graph neural network pipeline, analogous to image downsampling within CNNs. However, the previous prominent approach to pooling has quadratic memory requirements during training and is therefore not scalable to large graphs. Here we combine several recent advances in graph neural network design to demonstrate that competitive hierarchical graph classification results are possible without sacrificing sparsity. Our results are verified on several established graph classification benchmarks, and highlight an important direction for future research in graph-based neural networks.
Tasks Graph Classification, Link Prediction, Node Classification, Representation Learning
Published 2018-11-03
URL http://arxiv.org/abs/1811.01287v1
PDF http://arxiv.org/pdf/1811.01287v1.pdf
PWC https://paperswithcode.com/paper/towards-sparse-hierarchical-graph-classifiers
Repo https://github.com/HeapHop30/hierarchical-pooling
Framework tf
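
A minimal PyTorch sketch of the sparse top-k graph pooling step (in the style of Graph U-Net pooling) that such sparse hierarchical classifiers build on: keep only the highest-scoring nodes and slice the adjacency, so memory stays linear in the graph size rather than quadratic. The dense adjacency and scoring by a single learned projection vector are illustrative simplifications.

```python
import torch

def topk_graph_pool(x, adj, proj, ratio=0.5):
    """Keep the top ratio*N nodes by a learned score and slice the adjacency.

    x:    (N, F) node features
    adj:  (N, N) dense adjacency (illustrative; sparse formats work analogously)
    proj: (F,)   learnable projection vector producing node scores
    """
    scores = torch.tanh(x @ proj / proj.norm())       # (N,) node scores
    k = max(1, int(ratio * x.size(0)))
    keep = scores.topk(k).indices
    x_pooled = x[keep] * scores[keep].unsqueeze(-1)   # gate features so proj receives gradients
    adj_pooled = adj[keep][:, keep]
    return x_pooled, adj_pooled, keep

x, adj, proj = torch.randn(10, 16), torch.eye(10), torch.randn(16)
x_p, adj_p, idx = topk_graph_pool(x, adj, proj)
print(x_p.shape, adj_p.shape)  # torch.Size([5, 16]) torch.Size([5, 5])
```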

Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing

Title Fully Convolutional Network with Multi-Step Reinforcement Learning for Image Processing
Authors Ryosuke Furuta, Naoto Inoue, Toshihiko Yamasaki
Abstract This paper tackles a new problem setting: reinforcement learning with pixel-wise rewards (pixelRL) for image processing. Since the introduction of the deep Q-network, deep RL has achieved great success. However, the applications of deep RL to image processing are still limited. Therefore, we extend deep RL to pixelRL for various image processing applications. In pixelRL, each pixel has an agent, and the agent changes the pixel value by taking an action. We also propose an effective learning method for pixelRL that significantly improves performance by considering not only the future states of each agent's own pixel but also those of its neighboring pixels. The proposed method can be applied to image processing tasks that require pixel-wise manipulations, where deep RL has never been applied before. We apply the proposed method to three image processing tasks: image denoising, image restoration, and local color enhancement. Our experimental results demonstrate that the proposed method achieves comparable or better performance than state-of-the-art methods based on supervised learning.
Tasks Denoising, Image Denoising, Image Restoration, Local Color Enhancement
Published 2018-11-10
URL http://arxiv.org/abs/1811.04323v2
PDF http://arxiv.org/pdf/1811.04323v2.pdf
PWC https://paperswithcode.com/paper/fully-convolutional-network-with-multi-step
Repo https://github.com/rfuruta/pixelRL
Framework none
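
A minimal NumPy sketch of the pixelRL action step described above: a fully convolutional network predicts per-pixel action values, every pixel greedily takes its best action, and the chosen action modifies that pixel's value. The three-action set here (do nothing, increment, decrement) is an illustrative subset of the paper's action list.

```python
import numpy as np

def apply_pixel_actions(image, q_values, action_effects=(0.0, 1.0 / 255, -1.0 / 255)):
    """Each pixel greedily takes the action with the highest predicted value.

    image:    (H, W) grayscale image with values in [0, 1]
    q_values: (A, H, W) per-pixel action values predicted by a fully convolutional net
    """
    greedy = q_values.argmax(axis=0)                  # (H, W) chosen action per pixel
    deltas = np.asarray(action_effects)[greedy]       # map action index -> value change
    return np.clip(image + deltas, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((4, 4))
q = rng.standard_normal((3, 4, 4))
print(apply_pixel_actions(img, q).shape)  # (4, 4)
```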

Fast Model-Selection through Adapting Design of Experiments Maximizing Information Gain

Title Fast Model-Selection through Adapting Design of Experiments Maximizing Information Gain
Authors Stefano Balietti, Brennan Klein, Christoph Riedl
Abstract To perform model-selection efficiently, we must run informative experiments. Here, we extend a seminal method for designing Bayesian optimal experiments that maximize the information gained from data collected. We introduce two computational improvements that make the procedure tractable: a search algorithm from artificial intelligence and a sampling procedure shrinking the space of possible experiments to evaluate. We collected data for five different experimental designs of a simple imperfect information game and show that experiments optimized for information gain make model-selection possible (and cheaper). We compare the ability of the optimal experimental design to discriminate among competing models against the experimental designs chosen by a “wisdom of experts” prediction experiment. We find that a simple reinforcement learning model best explains human decision-making and that subject behavior is not adequately described by Bayesian Nash equilibrium. Our procedure is general and can be applied iteratively to lab, field and online experiments.
Tasks Decision Making, Model Selection
Published 2018-07-10
URL http://arxiv.org/abs/1807.07024v2
PDF http://arxiv.org/pdf/1807.07024v2.pdf
PWC https://paperswithcode.com/paper/fast-model-selection-through-adapting-design
Repo https://github.com/shakty/optimal-design
Framework none
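
A minimal NumPy sketch of the scoring rule behind Bayesian optimal experimental design as described above: a candidate design is scored by the mutual information between model identity and the predicted outcome, and the design with the highest expected information gain is run. The discrete outcome space and toy numbers are illustrative.

```python
import numpy as np

def expected_information_gain(prior, likelihoods):
    """Mutual information between model identity and experimental outcome.

    prior:       (M,)   prior probability of each candidate model
    likelihoods: (M, Y) P(outcome | model) under one candidate design
    """
    marginal = prior @ likelihoods                            # (Y,) predictive over outcomes
    ratio = likelihoods / np.clip(marginal, 1e-12, None)
    return float(np.sum(prior[:, None] * likelihoods * np.log(np.clip(ratio, 1e-12, None))))

# Choose the design whose predicted outcomes best discriminate among the models.
prior = np.array([0.5, 0.5])
designs = {
    "uninformative": np.array([[0.5, 0.5], [0.5, 0.5]]),
    "discriminating": np.array([[0.9, 0.1], [0.2, 0.8]]),
}
print(max(designs, key=lambda d: expected_information_gain(prior, designs[d])))  # discriminating
```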

Inequity aversion improves cooperation in intertemporal social dilemmas

Title Inequity aversion improves cooperation in intertemporal social dilemmas
Authors Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, Thore Graepel
Abstract Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
Tasks Multi-agent Reinforcement Learning
Published 2018-03-23
URL http://arxiv.org/abs/1803.08884v3
PDF http://arxiv.org/pdf/1803.08884v3.pdf
PWC https://paperswithcode.com/paper/inequity-aversion-improves-cooperation-in
Repo https://github.com/eugenevinitsky/sequential_social_dilemma_games
Framework none
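
A minimal NumPy sketch of Fehr-Schmidt style inequity-averse reward shaping, the mechanism the abstract extends to Markov games: each agent's subjective reward is reduced when it earns less (disadvantageous inequity, weight alpha) or more (advantageous inequity, weight beta) than the others. The paper applies this to temporally smoothed rewards; this sketch uses raw per-step rewards and illustrative weights.

```python
import numpy as np

def inequity_averse_rewards(rewards, alpha=5.0, beta=0.05):
    """Fehr-Schmidt shaping: penalize earning less (alpha) or more (beta) than others.

    rewards: (N,) per-agent rewards at one time step, N >= 2
    """
    r = np.asarray(rewards, dtype=float)
    n = len(r)
    diff = r[None, :] - r[:, None]                         # diff[i, j] = r_j - r_i
    disadvantage = np.clip(diff, 0, None).sum(axis=1)      # sum_j max(r_j - r_i, 0)
    advantage = np.clip(-diff, 0, None).sum(axis=1)        # sum_j max(r_i - r_j, 0)
    return r - alpha * disadvantage / (n - 1) - beta * advantage / (n - 1)

# [0.95, -2.5, -2.5]: unequal rewards hurt everyone, the left-behind agents most.
print(inequity_averse_rewards([1.0, 0.0, 0.0]))
```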

Conversational Recommender System

Title Conversational Recommender System
Authors Yueming Sun, Yi Zhang
Abstract A personalized conversational sales agent could have much commercial potential. E-commerce companies such as Amazon, eBay, JD, and Alibaba are piloting such agents with their users. However, research on this topic is very limited, and existing solutions are based either on single-round ad hoc search engines or on traditional multi-round dialog systems. They usually utilize only the user inputs in the current session, ignoring users’ long-term preferences. On the other hand, it is well known that the sales conversion rate can be greatly improved by recommender systems, which learn user preferences from past purchasing behavior and optimize business-oriented metrics such as conversion rate or expected revenue. In this work, we propose to integrate research in dialog systems and recommender systems into a novel and unified deep reinforcement learning framework to build a personalized conversational recommendation agent that optimizes a per-session utility function.
Tasks Recommendation Systems
Published 2018-06-08
URL https://arxiv.org/abs/1806.03277v1
PDF https://arxiv.org/pdf/1806.03277v1.pdf
PWC https://paperswithcode.com/paper/conversational-recommender-system
Repo https://github.com/xxkkrr/conv_rec_sys
Framework none

Far-HO: A Bilevel Programming Package for Hyperparameter Optimization and Meta-Learning

Title Far-HO: A Bilevel Programming Package for Hyperparameter Optimization and Meta-Learning
Authors Luca Franceschi, Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo, Paolo Frasconi
Abstract In (Franceschi et al., 2018) we proposed a unified mathematical framework, grounded in bilevel programming, that encompasses gradient-based hyperparameter optimization and meta-learning. We formulated an approximate version of the problem in which the inner objective is solved iteratively, and gave sufficient conditions ensuring convergence to the exact problem. In this work we show how to optimize learning rates, automatically weight the loss of single examples, and learn hyper-representations with Far-HO, a software package based on the popular deep learning framework TensorFlow that makes it possible to seamlessly tackle both HO and ML problems.
Tasks Hyperparameter Optimization, Meta-Learning
Published 2018-06-13
URL http://arxiv.org/abs/1806.04941v1
PDF http://arxiv.org/pdf/1806.04941v1.pdf
PWC https://paperswithcode.com/paper/far-ho-a-bilevel-programming-package-for
Repo https://github.com/prolearner/hyper-representation
Framework tf
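
The following is not the Far-HO API; it is a minimal PyTorch sketch of the bilevel idea the package implements: unroll a few inner optimization steps on the training loss, then backpropagate the validation loss through the unrolled trajectory to update a hyperparameter (here, a learning rate). Data, model, and step counts are synthetic and illustrative.

```python
import torch

torch.manual_seed(0)
x_tr, y_tr = torch.randn(64, 3), torch.randn(64, 1)     # synthetic "training" split
x_val, y_val = torch.randn(64, 3), torch.randn(64, 1)   # synthetic "validation" split

log_lr = torch.zeros((), requires_grad=True)             # outer variable: log learning rate
outer_opt = torch.optim.Adam([log_lr], lr=1e-2)

for _ in range(50):                                       # outer (hyperparameter) loop
    w = torch.zeros(3, 1, requires_grad=True)             # inner variable: model weights
    for _ in range(10):                                    # unrolled inner optimization
        inner_loss = ((x_tr @ w - y_tr) ** 2).mean()
        (g,) = torch.autograd.grad(inner_loss, w, create_graph=True)
        w = w - log_lr.exp() * g                           # differentiable SGD step
    val_loss = ((x_val @ w - y_val) ** 2).mean()
    outer_opt.zero_grad()
    val_loss.backward()                                    # hypergradient w.r.t. log_lr
    outer_opt.step()

print(float(log_lr.exp()))                                 # learned inner learning rate
```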

From Principal Subspaces to Principal Components with Linear Autoencoders

Title From Principal Subspaces to Principal Components with Linear Autoencoders
Authors Elad Plaut
Abstract The autoencoder is an effective unsupervised learning model which is widely used in deep learning. It is well known that an autoencoder with a single fully-connected hidden layer, a linear activation function and a squared error cost function trains weights that span the same subspace as the one spanned by the principal component loading vectors, but that they are not identical to the loading vectors. In this paper, we show how to recover the loading vectors from the autoencoder weights.
Tasks Dimensionality Reduction
Published 2018-04-26
URL http://arxiv.org/abs/1804.10253v3
PDF http://arxiv.org/pdf/1804.10253v3.pdf
PWC https://paperswithcode.com/paper/from-principal-subspaces-to-principal
Repo https://github.com/plaut/linear-ae-pca
Framework none
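
As a hedged illustration of the claim above, here is one NumPy route from decoder weights that merely span the principal subspace to actual loading vectors: orthonormalize the decoder columns and diagonalize the data covariance restricted to that subspace. This recovers the same directions as PCA when the autoencoder is at its optimum, but it is not necessarily the closed-form recipe derived in the paper.

```python
import numpy as np

def loading_vectors_from_decoder(x, w_dec):
    """Recover principal directions from a subspace-spanning linear decoder.

    x:     (n, d) centered data
    w_dec: (d, k) decoder weights whose columns span the principal subspace
    """
    q, _ = np.linalg.qr(w_dec)                    # orthonormal basis of the learned subspace
    cov_sub = q.T @ (x.T @ x / len(x)) @ q        # data covariance restricted to the subspace
    eigvals, eigvecs = np.linalg.eigh(cov_sub)
    order = np.argsort(eigvals)[::-1]             # sort by explained variance, descending
    return q @ eigvecs[:, order]                  # (d, k) loading vectors

rng = np.random.default_rng(0)
x = rng.standard_normal((500, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.1])
x -= x.mean(axis=0)
# Pretend these decoder weights came from a trained linear autoencoder:
# a basis of the top-2 principal subspace mixed by an arbitrary invertible matrix.
w_dec = np.linalg.svd(x, full_matrices=False)[2][:2].T @ rng.standard_normal((2, 2))
print(loading_vectors_from_decoder(x, w_dec).shape)  # (5, 2)
```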

A High-Performance CNN Method for Offline Handwritten Chinese Character Recognition and Visualization

Title A High-Performance CNN Method for Offline Handwritten Chinese Character Recognition and Visualization
Authors Pavlo Melnyk, Zhiqiang You, Keqin Li
Abstract Recent research has introduced fast, compact, and efficient convolutional neural networks (CNNs) for offline handwritten Chinese character recognition (HCCR). However, many of these works did not address the problem of network interpretability. We propose a new deep CNN architecture with high recognition performance that is capable of learning deep features for visualization. A special characteristic of our model is its bottleneck layers, which enable us to retain its expressiveness while reducing the number of multiply-accumulate operations and the required storage. We introduce a modification of global weighted average pooling (GWAP) called global weighted output average pooling (GWOAP). This paper demonstrates how these pooling layers allow us to calculate class activation maps (CAMs) that indicate the most relevant input character image regions used by our CNN to identify a certain class. Evaluating on the ICDAR-2013 offline HCCR competition dataset, we show that our model achieves a relative 0.83% error reduction while having 49% fewer parameters and the same computational cost as the current state-of-the-art single-network method trained only on handwritten data. Our solution outperforms even recent residual learning approaches.
Tasks Offline Handwritten Chinese Character Recognition
Published 2018-12-30
URL https://arxiv.org/abs/1812.11489v2
PDF https://arxiv.org/pdf/1812.11489v2.pdf
PWC https://paperswithcode.com/paper/a-high-performance-cnn-method-for-offline
Repo https://github.com/pavlo-melnyk/offline-HCCR
Framework tf
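
A minimal PyTorch sketch of the two visualization-related building blocks mentioned in the abstract: a global weighted average pooling layer with learnable spatial weights, and the standard class activation map computed from the final classifier weights. The exact parameterization (weights shared across channels, softmax normalization) and the GWOAP variant differ in details from the paper; treat this as an assumption-laden illustration.

```python
import torch
import torch.nn as nn

class GlobalWeightedAveragePooling(nn.Module):
    """Average pooling where every spatial position carries a learnable weight
    (shared across channels and softmax-normalized here for simplicity)."""

    def __init__(self, height, width):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(1, 1, height, width))

    def forward(self, x):                              # x: (B, C, H, W)
        w = torch.softmax(self.weight.flatten(), dim=0).view_as(self.weight)
        return (x * w).sum(dim=(2, 3))                 # (B, C)

def class_activation_map(features, fc_weight, class_idx):
    """CAM: weight the final conv feature maps by one class's classifier weights."""
    # features: (C, H, W), fc_weight: (num_classes, C)
    return torch.einsum('c,chw->hw', fc_weight[class_idx], features)

pool = GlobalWeightedAveragePooling(7, 7)
feats = torch.randn(8, 7, 7)                           # one image's final feature maps
print(pool(feats.unsqueeze(0)).shape)                  # torch.Size([1, 8])
print(class_activation_map(feats, torch.randn(10, 8), class_idx=3).shape)  # torch.Size([7, 7])
```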

Dense Information Flow for Neural Machine Translation

Title Dense Information Flow for Neural Machine Translation
Authors Yanyao Shen, Xu Tan, Di He, Tao Qin, Tie-Yan Liu
Abstract Recently, neural machine translation has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework. From the optimization perspective, residual connections are adopted in most of these deep architectures to improve learning for both the encoder and the decoder, and advanced attention connections are applied as well. Inspired by the success of the DenseNet model in computer vision, we propose in this paper a densely connected NMT architecture (DenseNMT) that can be trained more efficiently. The proposed DenseNMT not only uses dense connections to create new features for both the encoder and the decoder, but also uses a dense attention structure to improve attention quality. Our experiments on multiple datasets show that the DenseNMT architecture is more competitive and efficient.
Tasks Machine Translation
Published 2018-06-03
URL http://arxiv.org/abs/1806.00722v2
PDF http://arxiv.org/pdf/1806.00722v2.pdf
PWC https://paperswithcode.com/paper/dense-information-flow-for-neural-machine
Repo https://github.com/yanyao-shen/fairseq
Framework pytorch
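
A minimal PyTorch sketch of the dense-connection idea behind DenseNMT: each encoder layer receives the concatenation of all previous layers' outputs, DenseNet-style, rather than only the previous layer's output. The layer type (a plain linear layer with ReLU), the sizes, and the absence of attention are illustrative simplifications of the actual architecture.

```python
import torch
import torch.nn as nn

class DenselyConnectedEncoder(nn.Module):
    """Each layer consumes the concatenation of all previous layers' outputs."""

    def __init__(self, d_model=64, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(d_model * (i + 1), d_model) for i in range(n_layers)]
        )

    def forward(self, x):                              # x: (B, T, d_model)
        features = [x]
        for layer in self.layers:
            features.append(torch.relu(layer(torch.cat(features, dim=-1))))
        return torch.cat(features, dim=-1)             # densely aggregated representation

enc = DenselyConnectedEncoder()
print(enc(torch.randn(2, 5, 64)).shape)                # torch.Size([2, 5, 320])
```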

Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation

Title Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation
Authors Tsun-Hsuan Wang, Fu-En Wang, Juan-Ting Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun
Abstract We propose a novel plug-and-play (PnP) module that improves depth prediction by taking arbitrary patterns of sparse depths as input. Given any pre-trained depth prediction model, our PnP module updates the intermediate feature map so that the model outputs new depths consistent with the given sparse depths. Our method requires no additional training and can be applied to practical applications such as leveraging both RGB and sparse LiDAR points to robustly estimate a dense depth map. Our approach achieves consistent improvements when applied to various state-of-the-art methods on indoor (i.e., NYU-v2) and outdoor (i.e., KITTI) datasets. Various types of LiDARs are also synthesized in our experiments to verify the general applicability of our PnP module in practice. For the project page, see https://zswang666.github.io/PnP-Depth-Project-Page/
Tasks Depth Estimation
Published 2018-12-20
URL http://arxiv.org/abs/1812.08350v2
PDF http://arxiv.org/pdf/1812.08350v2.pdf
PWC https://paperswithcode.com/paper/plug-and-play-improve-depth-estimation-via
Repo https://github.com/zswang666/PnP-Depth
Framework pytorch
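
A minimal PyTorch sketch of the plug-and-play idea described above: with the pre-trained network frozen, take a few gradient steps on an intermediate feature map so that the decoded depth agrees with the given sparse measurements. Here `decoder` stands in for the tail of any pre-trained depth network that maps the feature map to dense depth; the step count and step size are illustrative.

```python
import torch

def pnp_refine(decoder, features, sparse_depth, mask, n_steps=5, step_size=0.01):
    """Nudge an intermediate feature map so the frozen decoder fits the sparse depths.

    decoder:      callable mapping the feature map to a dense depth prediction
    features:     intermediate feature map produced by the frozen encoder
    sparse_depth: dense tensor holding measurements at the valid positions
    mask:         boolean tensor marking where sparse measurements exist
    """
    z = features.detach().clone().requires_grad_(True)
    for _ in range(n_steps):
        pred = decoder(z)
        loss = ((pred - sparse_depth)[mask] ** 2).mean()   # error only at measured pixels
        (grad,) = torch.autograd.grad(loss, z)
        z = (z - step_size * grad).detach().requires_grad_(True)
    with torch.no_grad():
        return decoder(z)                                  # refined dense depth

# Toy usage with a hypothetical 1x1-conv "decoder":
dec = torch.nn.Conv2d(16, 1, 1)
feat = torch.randn(1, 16, 8, 8)
gt = torch.rand(1, 1, 8, 8)
m = torch.rand(1, 1, 8, 8) > 0.8                           # ~20% sparse measurements
print(pnp_refine(dec, feat, gt, m).shape)                  # torch.Size([1, 1, 8, 8])
```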