Paper Group AWR 241
A bag-to-class divergence approach to multiple-instance learning. Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition. TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service. Hierarchical Neural Story Generation. Deep Learning-Based Channel Estimation. Efficient Lifelong Learning with A-GEM. Disentang …
A bag-to-class divergence approach to multiple-instance learning
Title | A bag-to-class divergence approach to multiple-instance learning |
Authors | Kajsa Møllersen, Jon Yngve Hardeberg, Fred Godtliebsen |
Abstract | In multi-instance (MI) learning, each object (bag) consists of multiple feature vectors (instances), and is most commonly regarded as a set of points in a multidimensional space. A different viewpoint is that the instances are realisations of random vectors with corresponding probability distribution, and that a bag is the distribution, not the realisations. In MI classification, each bag in the training set has a class label, but the instances are unlabelled. By introducing the probability distribution space to bag-level classification problems, dissimilarities between probability distributions (divergences) can be applied. The bag-to-bag Kullback-Leibler information is asymptotically the best classifier, but the typical sparseness of MI training sets is an obstacle. We introduce bag-to-class divergence to MI learning, emphasising the hierarchical nature of the random vectors that makes bags from the same class different. We propose two properties for bag-to-class divergences, and an additional property for sparse training sets. |
Tasks | Multiple Instance Learning |
Published | 2018-03-07 |
URL | http://arxiv.org/abs/1803.02782v2 |
http://arxiv.org/pdf/1803.02782v2.pdf | |
PWC | https://paperswithcode.com/paper/a-bag-to-class-divergence-approach-to |
Repo | https://github.com/kajsam/Bag-to-class-divergence |
Framework | none |
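The paper argues for comparing a bag's instance distribution to a class-level distribution rather than to other individual bags. As an illustration only (not the divergence proposed in the paper), the sketch below estimates a plain bag-to-class Kullback-Leibler divergence with Gaussian kernel density estimates; the toy data, dimensionality, and the choice of KL are assumptions made for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde

def bag_to_class_kl(bag, class_instances, eps=1e-12):
    """Monte Carlo estimate of KL(bag || class) from instance samples.

    bag:             (n_bag, d) instances belonging to one bag
    class_instances: (n_class, d) instances pooled from all training bags of one class
    """
    p_bag = gaussian_kde(bag.T)                 # density estimate of the bag
    p_class = gaussian_kde(class_instances.T)   # density estimate of the class
    # KL(p || q) = E_p[log p(x) - log q(x)], estimated on the bag's own instances
    log_p = np.log(p_bag(bag.T) + eps)
    log_q = np.log(p_class(bag.T) + eps)
    return float(np.mean(log_p - log_q))

# toy usage: assign the bag to the class with the smallest divergence
rng = np.random.default_rng(0)
bag = rng.normal(0.0, 1.0, size=(30, 2))
class_a = rng.normal(0.0, 1.0, size=(300, 2))
class_b = rng.normal(3.0, 1.0, size=(300, 2))
print(bag_to_class_kl(bag, class_a), bag_to_class_kl(bag, class_b))
```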
Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition
Title | Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition |
Authors | Huai-Qian Khor, John See, Raphael C. W. Phan, Weiyao Lin |
Abstract | Facial micro-expression (ME) recognition has posed a huge challenge to researchers for its subtlety in motion and limited databases. Recently, handcrafted techniques have achieved superior performance in micro-expression recognition but at the cost of domain specificity and cumbersome parametric tunings. In this paper, we propose an Enriched Long-term Recurrent Convolutional Network (ELRCN) that first encodes each micro-expression frame into a feature vector through CNN module(s), then predicts the micro-expression by passing the feature vector through a Long Short-term Memory (LSTM) module. The framework contains two different network variants: (1) Channel-wise stacking of input data for spatial enrichment, (2) Feature-wise stacking of features for temporal enrichment. We demonstrate that the proposed approach is able to achieve reasonably good performance, without data augmentation. In addition, we also present ablation studies conducted on the framework and visualizations of what CNN “sees” when predicting the micro-expression classes. |
Tasks | Data Augmentation |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08417v1 |
http://arxiv.org/pdf/1805.08417v1.pdf | |
PWC | https://paperswithcode.com/paper/enriched-long-term-recurrent-convolutional |
Repo | https://github.com/IcedDoggie/Micro-Expression-with-Deep-Learning |
Framework | none |
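The two-stage design described in the abstract (a per-frame CNN encoding followed by an LSTM over the frame sequence) can be sketched as below. This is a minimal stand-in rather than the actual ELRCN: the paper uses pretrained CNN modules and channel-wise/feature-wise stacking for enrichment, while the layer sizes, input shapes, and class count here are assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTMSketch(nn.Module):
    """Encode each frame with a small CNN, then classify the sequence with an LSTM."""
    def __init__(self, num_classes=5, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, clips):                  # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))  # (batch*time, feat_dim)
        feats = feats.view(b, t, -1)
        out, _ = self.lstm(feats)              # (batch, time, 64)
        return self.head(out[:, -1])           # logits from the last time step

logits = CNNLSTMSketch()(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 5])
```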
TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service
Title | TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service |
Authors | Amartya Sanyal, Matt J. Kusner, Adrià Gascón, Varun Kanade |
Abstract | Machine learning methods are widely used for a variety of prediction problems. *Prediction as a service* is a paradigm in which service providers with technological expertise and computational resources may perform predictions for clients. However, data privacy severely restricts the applicability of such services, unless measures to keep client data private (even from the service provider) are designed. Equally important is to minimize the amount of computation and communication required between client and server. Fully homomorphic encryption offers a possible way out, whereby clients may encrypt their data and the server may perform arithmetic computations on the encrypted values. The main drawback of using fully homomorphic encryption is the amount of time required to evaluate large machine learning models on encrypted data. We combine ideas from the machine learning literature, particularly work on binarization and sparsification of neural networks, together with algorithmic tools to speed up and parallelize computation on encrypted data. |
Tasks | |
Published | 2018-06-09 |
URL | http://arxiv.org/abs/1806.03461v1 |
http://arxiv.org/pdf/1806.03461v1.pdf | |
PWC | https://paperswithcode.com/paper/tapas-tricks-to-accelerate-encrypted |
Repo | https://github.com/amartya18x/tapas |
Framework | pytorch |
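TAPAS builds on binarized and sparsified networks because they are cheap to evaluate under encryption. The sketch below shows only the binarization ingredient: a sign quantizer with a straight-through gradient estimator, a standard trick from the binarization literature rather than the paper's full encrypted-inference pipeline.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Deterministic sign binarization with a straight-through gradient."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)              # values in {-1, 0, +1}; zeros are rare in practice

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # pass gradients only where the input lies in [-1, 1]
        return grad_out * (w.abs() <= 1).float()

w = torch.randn(4, 4, requires_grad=True)
wb = BinarizeSTE.apply(w)
wb.sum().backward()
print(wb, w.grad)
```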
Hierarchical Neural Story Generation
Title | Hierarchical Neural Story Generation |
Authors | Angela Fan, Mike Lewis, Yann Dauphin |
Abstract | We explore story generation: creative systems that can build coherent and fluent passages of text about a topic. We collect a large dataset of 300K human-written stories paired with writing prompts from an online forum. Our dataset enables hierarchical story generation, where the model first generates a premise, and then transforms it into a passage of text. We gain further improvements with a novel form of model fusion that improves the relevance of the story to the prompt, and with a new gated multi-scale self-attention mechanism that models long-range context. Experiments show large improvements over strong baselines on both automated and human evaluations. Human judges prefer stories generated by our approach to those from a strong non-hierarchical model by a factor of two to one. |
Tasks | |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.04833v1 |
http://arxiv.org/pdf/1805.04833v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-neural-story-generation |
Repo | https://github.com/pytorch/fairseq |
Framework | pytorch |
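Generation quality in this line of work depends heavily on restricting sampling to the most probable next tokens. Below is a minimal top-k sampling routine of the kind used for decoding; the vocabulary size, k, and temperature are placeholder values, not the paper's settings.

```python
import torch

def top_k_sample(logits, k=10, temperature=0.8):
    """Sample the next token from the k most probable candidates only."""
    logits = logits / temperature
    topk_vals, topk_idx = torch.topk(logits, k)      # keep only the k best logits
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)  # sample within the top k
    return topk_idx.gather(-1, choice)                # map back to vocabulary ids

next_token = top_k_sample(torch.randn(1, 50000), k=10)
print(next_token.shape)  # torch.Size([1, 1])
```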
Deep Learning-Based Channel Estimation
Title | Deep Learning-Based Channel Estimation |
Authors | Mehran Soltani, Vahid Pourahmadi, Ali Mirzaei, Hamid Sheikhzadeh |
Abstract | In this paper, we present a deep learning (DL) algorithm for channel estimation in communication systems. We consider the time-frequency response of a fast fading communication channel as a two-dimensional image. The aim is to find the unknown values of the channel response using some known values at the pilot locations. To this end, a general pipeline using deep image processing techniques, namely image super-resolution (SR) and image restoration (IR), is proposed. This scheme considers the pilot values, taken together, as a low-resolution image and uses an SR network cascaded with a denoising IR network to estimate the channel. Moreover, an implementation of the proposed pipeline is presented. The estimation error shows that the presented algorithm is comparable to the minimum mean square error (MMSE) estimator with full knowledge of the channel statistics, and better than ALMMSE (an approximation to linear MMSE). The results confirm that this pipeline can be used efficiently in channel estimation. |
Tasks | Denoising, Image Restoration, Image Super-Resolution, Super-Resolution |
Published | 2018-10-13 |
URL | http://arxiv.org/abs/1810.05893v4 |
http://arxiv.org/pdf/1810.05893v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-channel-estimation |
Repo | https://github.com/Mehran-Soltani/ChannelNet |
Framework | none |
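The pipeline treats the interpolated pilot grid as a low-resolution image and passes it through a super-resolution stage followed by a denoising stage. The sketch below wires up such a cascade with small SRCNN-style and DnCNN-style blocks; the layer counts, channel sizes, and the 72x14 resource-grid shape are assumptions, not the paper's exact ChannelNet configuration.

```python
import torch
import torch.nn as nn

class SRBlock(nn.Module):
    """SRCNN-style refinement of the interpolated pilot image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 1), nn.ReLU(),
            nn.Conv2d(32, 1, 5, padding=2),
        )
    def forward(self, x):
        return self.net(x)

class DenoiseBlock(nn.Module):
    """DnCNN-style residual denoiser applied after super-resolution."""
    def __init__(self, depth=5):
        super().__init__()
        layers = [nn.Conv2d(1, 64, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(64, 64, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(64, 1, 3, padding=1)]
        self.net = nn.Sequential(*layers)
    def forward(self, x):
        return x - self.net(x)   # predict the noise, then subtract it

# pilot grid already interpolated to the full time-frequency resolution, e.g. 72 x 14
coarse = torch.randn(1, 1, 72, 14)
estimate = DenoiseBlock()(SRBlock()(coarse))
print(estimate.shape)
```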
Efficient Lifelong Learning with A-GEM
Title | Efficient Lifelong Learning with A-GEM |
Authors | Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny |
Abstract | In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong learning approaches in terms of sample complexity and computational and memory cost. Towards this end, we first introduce a new, more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which enjoys the same or even better performance as GEM, while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms including A-GEM can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency. |
Tasks | |
Published | 2018-12-02 |
URL | http://arxiv.org/abs/1812.00420v2 |
http://arxiv.org/pdf/1812.00420v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-lifelong-learning-with-a-gem |
Repo | https://github.com/facebookresearch/agem |
Framework | tf |
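The core of A-GEM is a single gradient projection: if the proposed update conflicts with the average gradient computed on episodic memory, the conflicting component is removed. A minimal sketch on flattened gradients (the real implementation flattens and concatenates all parameter gradients):

```python
import torch

def agem_project(grad, grad_ref):
    """A-GEM projection: when the new-task gradient g points against the reference
    gradient g_ref from episodic memory, project out the conflicting component:

        g_tilde = g - (g . g_ref / g_ref . g_ref) * g_ref   if  g . g_ref < 0
    """
    dot = torch.dot(grad, grad_ref)
    if dot < 0:
        return grad - (dot / torch.dot(grad_ref, grad_ref)) * grad_ref
    return grad

g = torch.randn(10)
g_ref = torch.randn(10)
g_tilde = agem_project(g, g_ref)
print(torch.dot(g_tilde, g_ref) >= -1e-6)  # the update never points against memory
```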
Disentangling by Factorising
Title | Disentangling by Factorising |
Authors | Hyunjik Kim, Andriy Mnih |
Abstract | We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon $\beta$-VAE by providing a better trade-off between disentanglement and reconstruction quality. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them. |
Tasks | |
Published | 2018-02-16 |
URL | https://arxiv.org/abs/1802.05983v3 |
https://arxiv.org/pdf/1802.05983v3.pdf | |
PWC | https://paperswithcode.com/paper/disentangling-by-factorising |
Repo | https://github.com/1Konny/FactorVAE |
Framework | pytorch |
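FactorVAE penalizes the total correlation of the aggregate posterior, estimated with a discriminator via the density-ratio trick against dimension-wise permuted codes. The sketch below shows the permutation step and the two loss terms; the encoder is replaced by random codes and the discriminator and latent sizes are assumptions, so it only illustrates the mechanics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def permute_dims(z):
    """Shuffle each latent dimension independently across the batch, yielding samples
    from the product of marginals q(z_1)...q(z_d)."""
    out = torch.empty_like(z)
    for j in range(z.shape[1]):
        out[:, j] = z[torch.randperm(z.shape[0]), j]
    return out

# density-ratio trick: the discriminator's logit on real codes approximates the
# total correlation log q(z) - log prod_j q(z_j), which is added to the VAE loss
disc = nn.Sequential(nn.Linear(6, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

z = torch.randn(32, 6)                      # stand-in for samples from the encoder
tc_term = disc(z).mean()                    # weighted by gamma in the full objective
disc_loss = -(F.logsigmoid(disc(z.detach())).mean()
              + F.logsigmoid(-disc(permute_dims(z.detach()))).mean())
print(tc_term.item(), disc_loss.item())
```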
Scaled Simplex Representation for Subspace Clustering
Title | Scaled Simplex Representation for Subspace Clustering |
Authors | Jun Xu, Mengyang Yu, Ling Shao, Wangmeng Zuo, Deyu Meng, Lei Zhang, David Zhang |
Abstract | The self-expressive property of data points, i.e., each data point can be linearly represented by the other data points in the same subspace, has proven effective in leading subspace clustering methods. Most self-expressive methods construct a feasible affinity matrix from a coefficient matrix, obtained by solving an optimization problem. However, the negative entries in the coefficient matrix are forced to be positive when constructing the affinity matrix via exponentiation, absolute symmetrization, or squaring operations. This consequently damages the inherent correlations among the data. Besides, the affine constraint used in these methods is not flexible enough for practical applications. To overcome these problems, in this paper, we introduce a scaled simplex representation (SSR) for the subspace clustering problem. Specifically, the non-negative constraint is used to make the coefficient matrix physically meaningful, and the coefficient vector is constrained to sum to a scalar s<1 to make it more discriminative. The proposed SSR-based subspace clustering (SSRSC) model is reformulated as a linear equality-constrained problem, which is solved efficiently under the alternating direction method of multipliers framework. Experiments on benchmark datasets demonstrate that the proposed SSRSC algorithm is very efficient and outperforms state-of-the-art subspace clustering methods on accuracy. The code can be found at https://github.com/csjunxu/SSRSC. |
Tasks | |
Published | 2018-07-26 |
URL | https://arxiv.org/abs/1807.09930v3 |
https://arxiv.org/pdf/1807.09930v3.pdf | |
PWC | https://paperswithcode.com/paper/simplex-representation-for-subspace |
Repo | https://github.com/csjunxu/SSRSC |
Framework | none |
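The defining constraint of the scaled simplex representation is that each coefficient vector is non-negative and sums to a scalar s < 1. As an illustration of that constraint only (not necessarily the update used in the paper's ADMM solver), the sketch below computes the Euclidean projection of a vector onto the scaled simplex:

```python
import numpy as np

def project_scaled_simplex(v, s=0.9):
    """Euclidean projection of v onto {c : c >= 0, sum(c) = s}."""
    u = np.sort(v)[::-1]                 # sort in descending order
    css = np.cumsum(u)
    # largest index rho such that u[rho] + (s - css[rho]) / (rho + 1) > 0
    rho = np.nonzero(u + (s - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (css[rho] - s) / (rho + 1)
    return np.maximum(v - theta, 0.0)

c = project_scaled_simplex(np.array([0.5, -0.2, 0.8, 0.1]), s=0.9)
print(c, c.sum())   # non-negative entries summing to 0.9
```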
Forward-Backward Stochastic Neural Networks: Deep Learning of High-dimensional Partial Differential Equations
Title | Forward-Backward Stochastic Neural Networks: Deep Learning of High-dimensional Partial Differential Equations |
Authors | Maziar Raissi |
Abstract | Classical numerical methods for solving partial differential equations suffer from the curse of dimensionality, mainly due to their reliance on meticulously generated spatio-temporal grids. Inspired by modern deep learning based techniques for solving forward and inverse problems associated with partial differential equations, we circumvent the tyranny of numerical discretization by devising an algorithm that is scalable to high dimensions. In particular, we approximate the unknown solution by a deep neural network, which essentially enables us to benefit from the merits of automatic differentiation. To train the aforementioned neural network we leverage the well-known connection between high-dimensional partial differential equations and forward-backward stochastic differential equations. In fact, independent realizations of a standard Brownian motion will act as training data. We test the effectiveness of our approach for a couple of benchmark problems spanning a number of scientific domains, including the Black-Scholes-Barenblatt and Hamilton-Jacobi-Bellman equations, both in 100 dimensions. |
Tasks | |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07010v1 |
http://arxiv.org/pdf/1804.07010v1.pdf | |
PWC | https://paperswithcode.com/paper/forward-backward-stochastic-neural-networks |
Repo | https://github.com/batuhanguler/Deep-BSDE-Solver |
Framework | pytorch |
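The training signal comes from simulating the forward SDE with Euler steps and requiring the network u(t, x), together with its gradient Z obtained by automatic differentiation, to be consistent with the backward dynamics along each Brownian path. Below is a single-step sketch with toy constant drift, volatility, and generator; the actual loss accumulates such residuals over all time steps and adds the terminal condition, and the architecture here is an assumption.

```python
import torch
import torch.nn as nn

dim, batch, dt = 100, 64, 0.01
net = nn.Sequential(nn.Linear(1 + dim, 256), nn.Tanh(), nn.Linear(256, 1))  # u_theta(t, x)

def u_and_grad(t, x):
    """Evaluate u_theta and obtain Z = grad_x u via automatic differentiation."""
    x = x.detach().requires_grad_(True)
    y = net(torch.cat([t, x], dim=1))
    z = torch.autograd.grad(y.sum(), x, create_graph=True)[0]
    return y, z

# one Euler step of the coupled system with toy drift mu = 0, volatility sigma = 1,
# and generator phi = 0
t = torch.zeros(batch, 1)
x = torch.ones(batch, dim)
dW = dt ** 0.5 * torch.randn(batch, dim)
y, z = u_and_grad(t, x)
x_next = x + 0.0 * dt + 1.0 * dW                              # dX = mu dt + sigma dW
y_step = y + 0.0 * dt + (z * 1.0 * dW).sum(1, keepdim=True)   # dY = phi dt + Z . sigma dW
y_next, _ = u_and_grad(t + dt, x_next)
loss = ((y_next - y_step) ** 2).mean()   # penalize mismatch along the simulated path
loss.backward()
print(loss.item())
```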
A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors
Title | A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors |
Authors | Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora |
Abstract | Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained word vectors and linear regression. This transform can be applied on the fly whenever a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing how the a la carte method requires fewer examples of words in context to learn high-quality embeddings, and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks. |
Tasks | Document Classification, Domain Adaptation, Transfer Learning |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05388v1 |
http://arxiv.org/pdf/1805.05388v1.pdf | |
PWC | https://paperswithcode.com/paper/a-la-carte-embedding-cheap-but-effective |
Repo | https://github.com/NLPrinceton/ALaCarte |
Framework | none |
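The method boils down to a linear map, fit by least squares, from averaged context vectors to the corresponding pretrained word vectors; a new word's embedding is then that map applied to its (possibly single) context average. A toy sketch, with simulated context averages standing in for real corpus statistics:

```python
import numpy as np

# toy pretrained vectors: rows are words, columns are embedding dimensions
rng = np.random.default_rng(0)
vocab_vecs = rng.normal(size=(1000, 50))

# Step 1: for each known word, average the vectors of words seen in its contexts.
# Here the context averages are simulated; in practice they come from a corpus.
context_avgs = vocab_vecs + 0.1 * rng.normal(size=vocab_vecs.shape)

# Step 2: learn the linear map A so that context_avgs @ A approximates vocab_vecs
# (ordinary least squares).
A, *_ = np.linalg.lstsq(context_avgs, vocab_vecs, rcond=None)

# Step 3: embed a new or rare word on the fly from a single usage example by
# averaging its context word vectors and applying A.
new_context = vocab_vecs[[3, 17, 256, 4]].mean(axis=0)
new_embedding = new_context @ A
print(new_embedding.shape)   # (50,)
```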
RUDDER: Return Decomposition for Delayed Rewards
Title | RUDDER: Return Decomposition for Delayed Rewards |
Authors | Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter |
Abstract | We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected immediate reward plus the expected future rewards. The latter are related to bias problems in temporal difference (TD) learning and to high variance problems in Monte Carlo (MC) learning. Both problems are even more severe when rewards are delayed. RUDDER aims at making the expected future rewards zero, which simplifies Q-value estimation to computing the mean of the immediate reward. We propose the following two new concepts to push the expected future rewards toward zero. (i) Reward redistribution that leads to return-equivalent decision processes with the same optimal policies and, when optimal, zero expected future rewards. (ii) Return decomposition via contribution analysis, which transforms the reinforcement learning task into a regression task at which deep learning excels. On artificial tasks with delayed rewards, RUDDER is significantly faster than MC and exponentially faster than Monte Carlo Tree Search (MCTS), TD($\lambda$), and reward shaping approaches. On Atari games, RUDDER on top of a Proximal Policy Optimization (PPO) baseline improves the scores, most prominently on games with delayed rewards. Source code is available at https://github.com/ml-jku/rudder and demonstration videos at https://goo.gl/EQerZV. |
Tasks | Atari Games |
Published | 2018-06-20 |
URL | https://arxiv.org/abs/1806.07857v3 |
https://arxiv.org/pdf/1806.07857v3.pdf | |
PWC | https://paperswithcode.com/paper/rudder-return-decomposition-for-delayed |
Repo | https://github.com/ml-jku/rudder |
Framework | pytorch |
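Return decomposition can be sketched as a regression problem: an LSTM predicts the episode return at every time step, and differences of consecutive predictions serve as redistributed per-step rewards. The sketch below shows that mechanism with made-up trajectories; the actual RUDDER loss and contribution analysis are more elaborate, so treat this as an illustration only.

```python
import torch
import torch.nn as nn

class ReturnPredictor(nn.Module):
    """LSTM that predicts the final return at every time step; differences of
    consecutive predictions act as redistributed rewards."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, traj):                      # traj: (batch, T, obs_dim)
        h, _ = self.lstm(traj)
        return self.head(h).squeeze(-1)           # (batch, T) return predictions

model = ReturnPredictor(obs_dim=8)
traj = torch.randn(4, 20, 8)
episode_return = torch.randn(4)                   # delayed reward observed at the end

pred = model(traj)
loss = ((pred - episode_return[:, None]) ** 2).mean()   # regression on the return
loss.backward()

# contribution analysis: redistribute reward as prediction differences
with torch.no_grad():
    g = model(traj)
    redistributed = torch.cat([g[:, :1], g[:, 1:] - g[:, :-1]], dim=1)
print(redistributed.shape)   # (4, 20); the per-step rewards sum to the predicted return
```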
Ranking for Relevance and Display Preferences in Complex Presentation Layouts
Title | Ranking for Relevance and Display Preferences in Complex Presentation Layouts |
Authors | Harrie Oosterhuis, Maarten de Rijke |
Abstract | Learning to Rank has traditionally considered settings where given the relevance information of objects, the desired order in which to rank the objects is clear. However, with today’s large variety of users and layouts this is not always the case. In this paper, we consider so-called complex ranking settings where it is not clear what should be displayed, that is, what the relevant items are, and how they should be displayed, that is, where the most relevant items should be placed. These ranking settings are complex as they involve both traditional ranking and inferring the best display order. Existing learning to rank methods cannot handle such complex ranking settings as they assume that the display order is known beforehand. To address this gap we introduce a novel Deep Reinforcement Learning method that is capable of learning complex rankings, both the layout and the best ranking given the layout, from weak reward signals. Our proposed method does so by selecting documents and positions sequentially, hence it ranks both the documents and positions, which is why we call it the Double-Rank Model (DRM). Our experiments show that DRM outperforms all existing methods in complex ranking settings, thus it leads to substantial ranking improvements in cases where the display order is not known a priori. |
Tasks | Learning-To-Rank |
Published | 2018-05-07 |
URL | http://arxiv.org/abs/1805.02404v1 |
http://arxiv.org/pdf/1805.02404v1.pdf | |
PWC | https://paperswithcode.com/paper/ranking-for-relevance-and-display-preferences |
Repo | https://github.com/HarrieO/RankingComplexLayouts |
Framework | tf |
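The key idea is to decide documents and display positions jointly by committing to one (document, position) pair at a time. The sketch below does this greedily with random position embeddings, purely to illustrate the sequential placement; the actual DRM scores pairs with a learned policy trained by deep reinforcement learning from weak reward signals.

```python
import torch

def sequential_place(doc_feats, num_positions):
    """Greedy sketch: at each step, score every remaining (document, position) pair
    and commit to the best one, so ranking and layout are decided together."""
    n_docs, d = doc_feats.shape
    pos_embed = torch.randn(num_positions, d)      # stand-in position embeddings
    scores = doc_feats @ pos_embed.T               # (n_docs, num_positions)
    placement = {}
    free_docs, free_pos = set(range(n_docs)), set(range(num_positions))
    for _ in range(min(n_docs, num_positions)):
        best = max(((i, j) for i in free_docs for j in free_pos),
                   key=lambda ij: scores[ij].item())
        placement[best[1]] = best[0]               # position -> document index
        free_docs.discard(best[0])
        free_pos.discard(best[1])
    return placement

print(sequential_place(torch.randn(5, 16), num_positions=3))
```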
Spectral Normalization for Generative Adversarial Networks
Title | Spectral Normalization for Generative Adversarial Networks |
Authors | Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida |
Abstract | One of the challenges in the study of generative adversarial networks is the instability of their training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on the CIFAR10, STL-10, and ILSVRC2012 datasets, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to previous training stabilization techniques. |
Tasks | Image Generation |
Published | 2018-02-16 |
URL | http://arxiv.org/abs/1802.05957v1 |
http://arxiv.org/pdf/1802.05957v1.pdf | |
PWC | https://paperswithcode.com/paper/spectral-normalization-for-generative |
Repo | https://github.com/taki0112/Spectral_Normalization-Tensorflow |
Framework | tf |
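Spectral normalization divides each weight matrix by an estimate of its largest singular value, obtained cheaply with power iteration. A minimal sketch of that normalization is below; in practice one would use the built-in torch.nn.utils.spectral_norm (or the equivalent TensorFlow wrapper in the linked repo), which also persists the power-iteration vector between updates.

```python
import torch
import torch.nn.functional as F

def spectral_normalize(W, n_iter=1):
    """Estimate the largest singular value of W with power iteration and divide
    it out, so the layer's spectral norm (Lipschitz constant) is about 1."""
    u = torch.randn(W.shape[0])
    for _ in range(n_iter):
        v = F.normalize(W.t() @ u, dim=0)   # right singular vector estimate
        u = F.normalize(W @ v, dim=0)       # left singular vector estimate
    sigma = u @ W @ v                       # largest singular value estimate
    return W / sigma

W = torch.randn(64, 128)
W_sn = spectral_normalize(W, n_iter=5)
print(torch.linalg.matrix_norm(W_sn, ord=2))   # close to 1
```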
LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation
Title | LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation |
Authors | Tak-Wai Hui, Xiaoou Tang, Chen Change Loy |
Abstract | FlowNet2, the state-of-the-art convolutional neural network (CNN) for optical flow estimation, requires over 160M parameters to achieve accurate flow estimation. In this paper we present an alternative network that outperforms FlowNet2 on the challenging Sintel final pass and KITTI benchmarks, while being 30 times smaller in model size and 1.36 times faster in running speed. This is made possible by drilling down to architectural details that might have been missed in the current frameworks: (1) We present a more effective flow inference approach at each pyramid level through a lightweight cascaded network. It not only improves flow estimation accuracy through early correction, but also permits seamless incorporation of descriptor matching in our network. (2) We present a novel flow regularization layer to ameliorate the issue of outliers and vague flow boundaries by using a feature-driven local convolution. (3) Our network adopts an effective structure for pyramidal feature extraction and embraces feature warping rather than image warping as practiced in FlowNet2. Our code and trained models are available at https://github.com/twhui/LiteFlowNet . |
Tasks | Optical Flow Estimation |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07036v1 |
http://arxiv.org/pdf/1805.07036v1.pdf | |
PWC | https://paperswithcode.com/paper/liteflownet-a-lightweight-convolutional |
Repo | https://github.com/twhui/LiteFlowNet |
Framework | tf |
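One of the listed ingredients is warping features (rather than images) by the current flow estimate before the next refinement level. A minimal feature-warping sketch using bilinear sampling is below; the tensor shapes are placeholders and this is not LiteFlowNet's exact implementation.

```python
import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """Warp a feature map with a dense flow field (each pixel displaced by flow)."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # (1, 2, H, W)
    coords = grid + flow                                       # displaced coordinates
    # normalize to [-1, 1] for grid_sample, which expects (B, H, W, 2) as (x, y)
    coords[:, 0] = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords[:, 1] = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(feat, coords.permute(0, 2, 3, 1), align_corners=True)

feat = torch.randn(1, 32, 48, 64)
flow = torch.zeros(1, 2, 48, 64)        # zero flow: warping returns the input
print(torch.allclose(warp_features(feat, flow), feat, atol=1e-5))
```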
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Title | LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks |
Authors | Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, Gang Hua |
Abstract | Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has great potential to increase inference speed by leveraging bit-operations, there is still a noticeable gap in terms of prediction accuracy between the quantized model and the full-precision model. To address this gap, we propose to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization. Our method for learning the quantizers applies to both network weights and activations with arbitrary-bit precision, and our quantizers are easy to train. The comprehensive experiments on CIFAR-10 and ImageNet datasets show that our method works consistently well for various network structures such as AlexNet, VGG-Net, GoogLeNet, ResNet, and DenseNet, surpassing previous quantization methods in terms of accuracy by an appreciable margin. Code is available at https://github.com/Microsoft/LQ-Nets. |
Tasks | Quantization |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10029v1 |
http://arxiv.org/pdf/1807.10029v1.pdf | |
PWC | https://paperswithcode.com/paper/lq-nets-learned-quantization-for-highly |
Repo | https://github.com/Microsoft/LQ-Nets |
Framework | tf |
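In LQ-Nets, every quantization level is an inner product between a learned basis vector and a binary code, so the quantizer adapts to the data while staying compatible with bit-operations. The sketch below shows only the quantization mapping for a fixed 2-bit basis with {-1, +1} codes; learning the basis during training (and the activation variant with {0, 1} codes) is omitted, and the example values are assumptions.

```python
import itertools
import torch

def lq_quantize(x, basis):
    """Quantize x onto the levels spanned by a basis: every level is basis . b for a
    binary code b in {-1, +1}^K, and each entry of x snaps to the nearest level."""
    K = basis.numel()
    codes = torch.tensor(list(itertools.product([-1.0, 1.0], repeat=K)))  # (2^K, K)
    levels = codes @ basis                                 # (2^K,) quantization levels
    idx = (x.reshape(-1, 1) - levels).abs().argmin(dim=1)  # nearest level per entry
    return levels[idx].reshape(x.shape), codes[idx]        # quantized values and bit codes

basis = torch.tensor([0.5, 0.25])        # a 2-bit quantizer; the basis is what gets learned
x = torch.randn(3, 3)
xq, bits = lq_quantize(x, basis)
print(xq, bits.shape)                    # values in {-0.75, -0.25, 0.25, 0.75}
```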