Paper Group AWR 241
A bag-to-class divergence approach to multiple-instance learning. Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition. TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service. Hierarchical Neural Story Generation. Deep Learning-Based Channel Estimation. Efficient Lifelong Learning with A-GEM. Disentang …
A bag-to-class divergence approach to multiple-instance learning
Title | A bag-to-class divergence approach to multiple-instance learning |
Authors | Kajsa Møllersen, Jon Yngve Hardeberg, Fred Godtliebsen |
Abstract | In multi-instance (MI) learning, each object (bag) consists of multiple feature vectors (instances), and is most commonly regarded as a set of points in a multidimensional space. A different viewpoint is that the instances are realisations of random vectors with corresponding probability distribution, and that a bag is the distribution, not the realisations. In MI classification, each bag in the training set has a class label, but the instances are unlabelled. By introducing the probability distribution space to bag-level classification problems, dissimilarities between probability distributions (divergences) can be applied. The bag-to-bag Kullback-Leibler information is asymptotically the best classifier, but the typical sparseness of MI training sets is an obstacle. We introduce bag-to-class divergence to MI learning, emphasising the hierarchical nature of the random vectors that makes bags from the same class different. We propose two properties for bag-to-class divergences, and an additional property for sparse training sets. |
Tasks | Multiple Instance Learning |
Published | 2018-03-07 |
URL | http://arxiv.org/abs/1803.02782v2 |
http://arxiv.org/pdf/1803.02782v2.pdf | |
PWC | https://paperswithcode.com/paper/a-bag-to-class-divergence-approach-to |
Repo | https://github.com/kajsam/Bag-to-class-divergence |
Framework | none |
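The paper argues for comparing a bag's instance distribution to a class-level distribution rather than to other individual bags. As an illustration only (not the divergence proposed in the paper), the sketch below estimates a plain bag-to-class Kullback-Leibler divergence with Gaussian kernel density estimates; the toy data, dimensionality, and the choice of KL are assumptions made for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde

def bag_to_class_kl(bag, class_instances, eps=1e-12):
    """Monte Carlo estimate of KL(bag || class) from instance samples.

    bag:             (n_bag, d) instances belonging to one bag
    class_instances: (n_class, d) instances pooled from all training bags of one class
    """
    p_bag = gaussian_kde(bag.T)                 # density estimate of the bag
    p_class = gaussian_kde(class_instances.T)   # density estimate of the class
    # KL(p || q) = E_p[log p(x) - log q(x)], estimated on the bag's own instances
    log_p = np.log(p_bag(bag.T) + eps)
    log_q = np.log(p_class(bag.T) + eps)
    return float(np.mean(log_p - log_q))

# toy usage: assign the bag to the class with the smallest divergence
rng = np.random.default_rng(0)
bag = rng.normal(0.0, 1.0, size=(30, 2))
class_a = rng.normal(0.0, 1.0, size=(300, 2))
class_b = rng.normal(3.0, 1.0, size=(300, 2))
print(bag_to_class_kl(bag, class_a), bag_to_class_kl(bag, class_b))
```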
Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition
Title | Enriched Long-term Recurrent Convolutional Network for Facial Micro-Expression Recognition |
Authors | Huai-Qian Khor, John See, Raphael C. W. Phan, Weiyao Lin |
Abstract | Facial micro-expression (ME) recognition has posed a huge challenge to researchers for its subtlety in motion and limited databases. Recently, handcrafted techniques have achieved superior performance in micro-expression recognition but at the cost of domain specificity and cumbersome parametric tunings. In this paper, we propose an Enriched Long-term Recurrent Convolutional Network (ELRCN) that first encodes each micro-expression frame into a feature vector through CNN module(s), then predicts the micro-expression by passing the feature vector through a Long Short-term Memory (LSTM) module. The framework contains two different network variants: (1) Channel-wise stacking of input data for spatial enrichment, (2) Feature-wise stacking of features for temporal enrichment. We demonstrate that the proposed approach is able to achieve reasonably good performance, without data augmentation. In addition, we also present ablation studies conducted on the framework and visualizations of what CNN “sees” when predicting the micro-expression classes. |
Tasks | Data Augmentation |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08417v1 |
http://arxiv.org/pdf/1805.08417v1.pdf | |
PWC | https://paperswithcode.com/paper/enriched-long-term-recurrent-convolutional |
Repo | https://github.com/IcedDoggie/Micro-Expression-with-Deep-Learning |
Framework | none |
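The two-stage design described in the abstract (a per-frame CNN encoding followed by an LSTM over the frame sequence) can be sketched as below. This is a minimal stand-in rather than the actual ELRCN: the paper uses pretrained CNN modules and channel-wise/feature-wise stacking for enrichment, while the layer sizes, input shapes, and class count here are assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTMSketch(nn.Module):
    """Encode each frame with a small CNN, then classify the sequence with an LSTM."""
    def __init__(self, num_classes=5, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, clips):                  # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))  # (batch*time, feat_dim)
        feats = feats.view(b, t, -1)
        out, _ = self.lstm(feats)              # (batch, time, 64)
        return self.head(out[:, -1])           # logits from the last time step

logits = CNNLSTMSketch()(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 5])
```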
TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service
Title | TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service |
Authors | Amartya Sanyal, Matt J. Kusner, Adrià Gascón, Varun Kanade |
Abstract | Machine learning methods are widely used for a variety of prediction problems. *Prediction as a service* is a paradigm in which service providers with technological expertise and computational resources may perform predictions for clients. However, data privacy severely restricts the applicability of such services, unless measures to keep client data private (even from the service provider) are designed. Equally important is to minimize the amount of computation and communication required between client and server. Fully homomorphic encryption offers a possible way out, whereby clients may encrypt their data and the server may perform arithmetic computations on the encrypted values. The main drawback of using fully homomorphic encryption is the amount of time required to evaluate large machine learning models on encrypted data. We combine ideas from the machine learning literature, particularly work on binarization and sparsification of neural networks, together with algorithmic tools to speed up and parallelize computation on encrypted data. |
Tasks | |
Published | 2018-06-09 |
URL | http://arxiv.org/abs/1806.03461v1 |
http://arxiv.org/pdf/1806.03461v1.pdf | |
PWC | https://paperswithcode.com/paper/tapas-tricks-to-accelerate-encrypted |
Repo | https://github.com/amartya18x/tapas |
Framework | pytorch |
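TAPAS builds on binarized and sparsified networks because they are cheap to evaluate under encryption. The sketch below shows only the binarization ingredient: a sign quantizer with a straight-through gradient estimator, a standard trick from the binarization literature rather than the paper's full encrypted-inference pipeline.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Deterministic sign binarization with a straight-through gradient."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)              # values in {-1, 0, +1}; zeros are rare in practice

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # pass gradients only where the input lies in [-1, 1]
        return grad_out * (w.abs() <= 1).float()

w = torch.randn(4, 4, requires_grad=True)
wb = BinarizeSTE.apply(w)
wb.sum().backward()
print(wb, w.grad)
```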
Hierarchical Neural Story Generation
Title | Hierarchical Neural Story Generation |
Authors | Angela Fan, Mike Lewis, Yann Dauphin |
Abstract | We explore story generation: creative systems that can build coherent and fluent passages of text about a topic. We collect a large dataset of 300K human-written stories paired with writing prompts from an online forum. Our dataset enables hierarchical story generation, where the model first generates a premise, and then transforms it into a passage of text. We gain further improvements with a novel form of model fusion that improves the relevance of the story to the prompt, and with a new gated multi-scale self-attention mechanism that models long-range context. Experiments show large improvements over strong baselines on both automated and human evaluations. Human judges prefer stories generated by our approach to those from a strong non-hierarchical model by a factor of two to one. |
Tasks | |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.04833v1 |
http://arxiv.org/pdf/1805.04833v1.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-neural-story-generation |
Repo | https://github.com/pytorch/fairseq |
Framework | pytorch |
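Generation quality in this line of work depends heavily on restricting sampling to the most probable next tokens. Below is a minimal top-k sampling routine of the kind used for decoding; the vocabulary size, k, and temperature are placeholder values, not the paper's settings.

```python
import torch

def top_k_sample(logits, k=10, temperature=0.8):
    """Sample the next token from the k most probable candidates only."""
    logits = logits / temperature
    topk_vals, topk_idx = torch.topk(logits, k)      # keep only the k best logits
    probs = torch.softmax(topk_vals, dim=-1)
    choice = torch.multinomial(probs, num_samples=1)  # sample within the top k
    return topk_idx.gather(-1, choice)                # map back to vocabulary ids

next_token = top_k_sample(torch.randn(1, 50000), k=10)
print(next_token.shape)  # torch.Size([1, 1])
```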
Deep Learning-Based Channel Estimation
Title | Deep Learning-Based Channel Estimation |
Authors | Mehran Soltani, Vahid Pourahmadi, Ali Mirzaei, Hamid Sheikhzadeh |
Abstract | In this paper, we present a deep learning (DL) algorithm for channel estimation in communication systems. We consider the time-frequency response of a fast fading communication channel as a two-dimensional image. The aim is to find the unknown values of the channel response using some known values at the pilot locations. To this end, a general pipeline using deep image processing techniques, namely image super-resolution (SR) and image restoration (IR), is proposed. This scheme considers the pilot values, taken together, as a low-resolution image and uses an SR network cascaded with a denoising IR network to estimate the channel. Moreover, an implementation of the proposed pipeline is presented. The estimation error shows that the presented algorithm is comparable to the minimum mean square error (MMSE) estimator with full knowledge of the channel statistics, and better than ALMMSE (an approximation to linear MMSE). The results confirm that this pipeline can be used efficiently in channel estimation. |
Tasks | Denoising, Image Restoration, Image Super-Resolution, Super-Resolution |
Published | 2018-10-13 |
URL | http://arxiv.org/abs/1810.05893v4 |
http://arxiv.org/pdf/1810.05893v4.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-based-channel-estimation |
Repo | https://github.com/Mehran-Soltani/ChannelNet |
Framework | none |
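The pipeline treats the interpolated pilot grid as a low-resolution image and passes it through a super-resolution stage followed by a denoising stage. The sketch below wires up such a cascade with small SRCNN-style and DnCNN-style blocks; the layer counts, channel sizes, and the 72x14 resource-grid shape are assumptions, not the paper's exact ChannelNet configuration.

```python
import torch
import torch.nn as nn

class SRBlock(nn.Module):
    """SRCNN-style refinement of the interpolated pilot image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 1), nn.ReLU(),
            nn.Conv2d(32, 1, 5, padding=2),
        )
    def forward(self, x):
        return self.net(x)

class DenoiseBlock(nn.Module):
    """DnCNN-style residual denoiser applied after super-resolution."""
    def __init__(self, depth=5):
        super().__init__()
        layers = [nn.Conv2d(1, 64, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(64, 64, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(64, 1, 3, padding=1)]
        self.net = nn.Sequential(*layers)
    def forward(self, x):
        return x - self.net(x)   # predict the noise, then subtract it

# pilot grid already interpolated to the full time-frequency resolution, e.g. 72 x 14
coarse = torch.randn(1, 1, 72, 14)
estimate = DenoiseBlock()(SRBlock()(coarse))
print(estimate.shape)
```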
Efficient Lifelong Learning with A-GEM
Title | Efficient Lifelong Learning with A-GEM |
Authors | Arslan Chaudhry, Marc’Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny |
Abstract | In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong learning approaches in terms of sample complexity and computational and memory cost. Towards this end, we first introduce a new, more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small and disjoint set of tasks, which is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which enjoys the same or even better performance as GEM, while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms including A-GEM can learn even more quickly if they are provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM has the best trade-off between accuracy and efficiency. |
Tasks | |
Published | 2018-12-02 |
URL | http://arxiv.org/abs/1812.00420v2 |
http://arxiv.org/pdf/1812.00420v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-lifelong-learning-with-a-gem |
Repo | https://github.com/facebookresearch/agem |
Framework | tf |
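The core of A-GEM is a single gradient projection: if the proposed update conflicts with the average gradient computed on episodic memory, the conflicting component is removed. A minimal sketch on flattened gradients (the real implementation flattens and concatenates all parameter gradients):

```python
import torch

def agem_project(grad, grad_ref):
    """A-GEM projection: when the new-task gradient g points against the reference
    gradient g_ref from episodic memory, project out the conflicting component:

        g_tilde = g - (g . g_ref / g_ref . g_ref) * g_ref   if  g . g_ref < 0
    """
    dot = torch.dot(grad, grad_ref)
    if dot < 0:
        return grad - (dot / torch.dot(grad_ref, grad_ref)) * grad_ref
    return grad

g = torch.randn(10)
g_ref = torch.randn(10)
g_tilde = agem_project(g, g_ref)
print(torch.dot(g_tilde, g_ref) >= -1e-6)  # the update never points against memory
```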
Disentangling by Factorising
Title | Disentangling by Factorising |
Authors | Hyunjik Kim, Andriy Mnih |
Abstract | We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon $\beta$-VAE by providing a better trade-off between disentanglement and reconstruction quality. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them. |
Tasks | |
Published | 2018-02-16 |
URL | https://arxiv.org/abs/1802.05983v3 |
https://arxiv.org/pdf/1802.05983v3.pdf | |
PWC | https://paperswithcode.com/paper/disentangling-by-factorising |
Repo | https://github.com/1Konny/FactorVAE |
Framework | pytorch |
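FactorVAE penalizes the total correlation of the aggregate posterior, estimated with a discriminator via the density-ratio trick against dimension-wise permuted codes. The sketch below shows the permutation step and the two loss terms; the encoder is replaced by random codes and the discriminator and latent sizes are assumptions, so it only illustrates the mechanics.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def permute_dims(z):
    """Shuffle each latent dimension independently across the batch, yielding samples
    from the product of marginals q(z_1)...q(z_d)."""
    out = torch.empty_like(z)
    for j in range(z.shape[1]):
        out[:, j] = z[torch.randperm(z.shape[0]), j]
    return out

# density-ratio trick: the discriminator's logit on real codes approximates the
# total correlation log q(z) - log prod_j q(z_j), which is added to the VAE loss
disc = nn.Sequential(nn.Linear(6, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))

z = torch.randn(32, 6)                      # stand-in for samples from the encoder
tc_term = disc(z).mean()                    # weighted by gamma in the full objective
disc_loss = -(F.logsigmoid(disc(z.detach())).mean()
              + F.logsigmoid(-disc(permute_dims(z.detach()))).mean())
print(tc_term.item(), disc_loss.item())
```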
Scaled Simplex Representation for Subspace Clustering
Title | Scaled Simplex Representation for Subspace Clustering |
Authors | Jun Xu, Mengyang Yu, Ling Shao, Wangmeng Zuo, Deyu Meng, Lei Zhang, David Zhang |
Abstract | The self-expressive property of data points, i.e., each data point can be linearly represented by the other data points in the same subspace, has proven effective in leading subspace clustering methods. Most self-expressive methods construct a feasible affinity matrix from a coefficient matrix, obtained by solving an optimization problem. However, the negative entries in the coefficient matrix are forced to be positive when constructing the affinity matrix via exponentiation, absolute symmetrization, or squaring operations. This consequently damages the inherent correlations among the data. Besides, the affine constraint used in these methods is not flexible enough for practical applications. To overcome these problems, in this paper, we introduce a scaled simplex representation (SSR) for the subspace clustering problem. Specifically, the non-negative constraint is used to make the coefficient matrix physically meaningful, and the coefficient vector is constrained to sum to a scalar s<1 to make it more discriminative. The proposed SSR-based subspace clustering (SSRSC) model is reformulated as a linear equality-constrained problem, which is solved efficiently under the alternating direction method of multipliers framework. Experiments on benchmark datasets demonstrate that the proposed SSRSC algorithm is very efficient and outperforms state-of-the-art subspace clustering methods on accuracy. The code can be found at https://github.com/csjunxu/SSRSC. |
Tasks | |
Published | 2018-07-26 |
URL | https://arxiv.org/abs/1807.09930v3 |
https://arxiv.org/pdf/1807.09930v3.pdf | |
PWC | https://paperswithcode.com/paper/simplex-representation-for-subspace |
Repo | https://github.com/csjunxu/SSRSC |
Framework | none |
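The defining constraint of the scaled simplex representation is that each coefficient vector is non-negative and sums to a scalar s < 1. As an illustration of that constraint only (not necessarily the update used in the paper's ADMM solver), the sketch below computes the Euclidean projection of a vector onto the scaled simplex:

```python
import numpy as np

def project_scaled_simplex(v, s=0.9):
    """Euclidean projection of v onto {c : c >= 0, sum(c) = s}."""
    u = np.sort(v)[::-1]                 # sort in descending order
    css = np.cumsum(u)
    # largest index rho such that u[rho] + (s - css[rho]) / (rho + 1) > 0
    rho = np.nonzero(u + (s - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (css[rho] - s) / (rho + 1)
    return np.maximum(v - theta, 0.0)

c = project_scaled_simplex(np.array([0.5, -0.2, 0.8, 0.1]), s=0.9)
print(c, c.sum())   # non-negative entries summing to 0.9
```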
Forward-Backward Stochastic Neural Networks: Deep Learning of High-dimensional Partial Differential Equations
Title | Forward-Backward Stochastic Neural Networks: Deep Learning of High-dimensional Partial Differential Equations |
Authors | Maziar Raissi |
Abstract | Classical numerical methods for solving partial differential equations suffer from the curse of dimensionality, mainly due to their reliance on meticulously generated spatio-temporal grids. Inspired by modern deep learning based techniques for solving forward and inverse problems associated with partial differential equations, we circumvent the tyranny of numerical discretization by devising an algorithm that is scalable to high dimensions. In particular, we approximate the unknown solution by a deep neural network, which essentially enables us to benefit from the merits of automatic differentiation. To train the aforementioned neural network we leverage the well-known connection between high-dimensional partial differential equations and forward-backward stochastic differential equations. In fact, independent realizations of a standard Brownian motion will act as training data. We test the effectiveness of our approach for a couple of benchmark problems spanning a number of scientific domains, including the Black-Scholes-Barenblatt and Hamilton-Jacobi-Bellman equations, both in 100 dimensions. |
Tasks | |
Published | 2018-04-19 |
URL | http://arxiv.org/abs/1804.07010v1 |
http://arxiv.org/pdf/1804.07010v1.pdf | |
PWC | https://paperswithcode.com/paper/forward-backward-stochastic-neural-networks |
Repo | https://github.com/batuhanguler/Deep-BSDE-Solver |
Framework | pytorch |
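The training signal comes from simulating the forward SDE with Euler steps and requiring the network u(t, x), together with its gradient Z obtained by automatic differentiation, to be consistent with the backward dynamics along each Brownian path. Below is a single-step sketch with toy constant drift, volatility, and generator; the actual loss accumulates such residuals over all time steps and adds the terminal condition, and the architecture here is an assumption.

```python
import torch
import torch.nn as nn

dim, batch, dt = 100, 64, 0.01
net = nn.Sequential(nn.Linear(1 + dim, 256), nn.Tanh(), nn.Linear(256, 1))  # u_theta(t, x)

def u_and_grad(t, x):
    """Evaluate u_theta and obtain Z = grad_x u via automatic differentiation."""
    x = x.detach().requires_grad_(True)
    y = net(torch.cat([t, x], dim=1))
    z = torch.autograd.grad(y.sum(), x, create_graph=True)[0]
    return y, z

# one Euler step of the coupled system with toy drift mu = 0, volatility sigma = 1,
# and generator phi = 0
t = torch.zeros(batch, 1)
x = torch.ones(batch, dim)
dW = dt ** 0.5 * torch.randn(batch, dim)
y, z = u_and_grad(t, x)
x_next = x + 0.0 * dt + 1.0 * dW                              # dX = mu dt + sigma dW
y_step = y + 0.0 * dt + (z * 1.0 * dW).sum(1, keepdim=True)   # dY = phi dt + Z . sigma dW
y_next, _ = u_and_grad(t + dt, x_next)
loss = ((y_next - y_step) ** 2).mean()   # penalize mismatch along the simulated path
loss.backward()
print(loss.item())
```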
A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors
Title | A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors |
Authors | Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora |
Abstract | Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained word vectors and linear regression. This transform can be applied on the fly whenever a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing how the a la carte method requires fewer examples of words in context to learn high-quality embeddings, and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks. |
Tasks | Document Classification, Domain Adaptation, Transfer Learning |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05388v1 |
http://arxiv.org/pdf/1805.05388v1.pdf | |
PWC | https://paperswithcode.com/paper/a-la-carte-embedding-cheap-but-effective |
Repo | https://github.com/NLPrinceton/ALaCarte |
Framework | none |
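The method boils down to a linear map, fit by least squares, from averaged context vectors to the corresponding pretrained word vectors; a new word's embedding is then that map applied to its (possibly single) context average. A toy sketch, with simulated context averages standing in for real corpus statistics:

```python
import numpy as np

# toy pretrained vectors: rows are words, columns are embedding dimensions
rng = np.random.default_rng(0)
vocab_vecs = rng.normal(size=(1000, 50))

# Step 1: for each known word, average the vectors of words seen in its contexts.
# Here the context averages are simulated; in practice they come from a corpus.
context_avgs = vocab_vecs + 0.1 * rng.normal(size=vocab_vecs.shape)

# Step 2: learn the linear map A so that context_avgs @ A approximates vocab_vecs
# (ordinary least squares).
A, *_ = np.linalg.lstsq(context_avgs, vocab_vecs, rcond=None)

# Step 3: embed a new or rare word on the fly from a single usage example by
# averaging its context word vectors and applying A.
new_context = vocab_vecs[[3, 17, 256, 4]].mean(axis=0)
new_embedding = new_context @ A
print(new_embedding.shape)   # (50,)
```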
RUDDER: Return Decomposition for Delayed Rewards
Title | RUDDER: Return Decomposition for Delayed Rewards |
Authors | Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter |
Abstract | We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected immediate reward plus the expected future rewards. The latter are related to bias problems in temporal difference (TD) learning and to high variance problems in Monte Carlo (MC) learning. Both problems are even more severe when rewards are delayed. RUDDER aims at making the expected future rewards zero, which simplifies Q-value estimation to computing the mean of the immediate reward. We propose the following two new concepts to push the expected future rewards toward zero. (i) Reward redistribution that leads to return-equivalent decision processes with the same optimal policies and, when optimal, zero expected future rewards. (ii) Return decomposition via contribution analysis, which transforms the reinforcement learning task into a regression task at which deep learning excels. On artificial tasks with delayed rewards, RUDDER is significantly faster than MC and exponentially faster than Monte Carlo Tree Search (MCTS), TD($\lambda$), and reward shaping approaches. On Atari games, RUDDER on top of a Proximal Policy Optimization (PPO) baseline improves the scores, most prominently on games with delayed rewards. Source code is available at https://github.com/ml-jku/rudder and demonstration videos at https://goo.gl/EQerZV. |
Tasks | Atari Games |
Published | 2018-06-20 |
URL | https://arxiv.org/abs/1806.07857v3 |
https://arxiv.org/pdf/1806.07857v3.pdf | |
PWC | https://paperswithcode.com/paper/rudder-return-decomposition-for-delayed |
Repo | https://github.com/ml-jku/rudder |
Framework | pytorch |
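Return decomposition can be sketched as a regression problem: an LSTM predicts the episode return at every time step, and differences of consecutive predictions serve as redistributed per-step rewards. The sketch below shows that mechanism with made-up trajectories; the actual RUDDER loss and contribution analysis are more elaborate, so treat this as an illustration only.

```python
import torch
import torch.nn as nn

class ReturnPredictor(nn.Module):
    """LSTM that predicts the final return at every time step; differences of
    consecutive predictions act as redistributed rewards."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, traj):                      # traj: (batch, T, obs_dim)
        h, _ = self.lstm(traj)
        return self.head(h).squeeze(-1)           # (batch, T) return predictions

model = ReturnPredictor(obs_dim=8)
traj = torch.randn(4, 20, 8)
episode_return = torch.randn(4)                   # delayed reward observed at the end

pred = model(traj)
loss = ((pred - episode_return[:, None]) ** 2).mean()   # regression on the return
loss.backward()

# contribution analysis: redistribute reward as prediction differences
with torch.no_grad():
    g = model(traj)
    redistributed = torch.cat([g[:, :1], g[:, 1:] - g[:, :-1]], dim=1)
print(redistributed.shape)   # (4, 20); the per-step rewards sum to the predicted return
```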
Ranking for Relevance and Display Preferences in Complex Presentation Layouts
Title | Ranking for Relevance and Display Preferences in Complex Presentation Layouts |
Authors | Harrie Oosterhuis, Maarten de Rijke |
Abstract | Learning to Rank has traditionally considered settings where given the relevance information of objects, the desired order in which to rank the objects is clear. However, with today’s large variety of users and layouts this is not always the case. In this paper, we consider so-called complex ranking settings where it is not clear what should be displayed, that is, what the relevant items are, and how they should be displayed, that is, where the most relevant items should be placed. These ranking settings are complex as they involve both traditional ranking and inferring the best display order. Existing learning to rank methods cannot handle such complex ranking settings as they assume that the display order is known beforehand. To address this gap we introduce a novel Deep Reinforcement Learning method that is capable of learning complex rankings, both the layout and the best ranking given the layout, from weak reward signals. Our proposed method does so by selecting documents and positions sequentially, hence it ranks both the documents and positions, which is why we call it the Double-Rank Model (DRM). Our experiments show that DRM outperforms all existing methods in complex ranking settings, thus it leads to substantial ranking improvements in cases where the display order is not known a priori. |
Tasks | Learning-To-Rank |
Published | 2018-05-07 |
URL | http://arxiv.org/abs/1805.02404v1 |
http://arxiv.org/pdf/1805.02404v1.pdf | |
PWC | https://paperswithcode.com/paper/ranking-for-relevance-and-display-preferences |
Repo | https://github.com/HarrieO/RankingComplexLayouts |
Framework | tf |
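The key idea is to decide documents and display positions jointly by committing to one (document, position) pair at a time. The sketch below does this greedily with random position embeddings, purely to illustrate the sequential placement; the actual DRM scores pairs with a learned policy trained by deep reinforcement learning from weak reward signals.

```python
import torch

def sequential_place(doc_feats, num_positions):
    """Greedy sketch: at each step, score every remaining (document, position) pair
    and commit to the best one, so ranking and layout are decided together."""
    n_docs, d = doc_feats.shape
    pos_embed = torch.randn(num_positions, d)      # stand-in position embeddings
    scores = doc_feats @ pos_embed.T               # (n_docs, num_positions)
    placement = {}
    free_docs, free_pos = set(range(n_docs)), set(range(num_positions))
    for _ in range(min(n_docs, num_positions)):
        best = max(((i, j) for i in free_docs for j in free_pos),
                   key=lambda ij: scores[ij].item())
        placement[best[1]] = best[0]               # position -> document index
        free_docs.discard(best[0])
        free_pos.discard(best[1])
    return placement

print(sequential_place(torch.randn(5, 16), num_positions=3))
```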
Spectral Normalization for Generative Adversarial Networks
Title | Spectral Normalization for Generative Adversarial Networks |
Authors | Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida |
Abstract | One of the challenges in the study of generative adversarial networks is the instability of their training. In this paper, we propose a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator. Our new normalization technique is computationally light and easy to incorporate into existing implementations. We tested the efficacy of spectral normalization on the CIFAR10, STL-10, and ILSVRC2012 datasets, and we experimentally confirmed that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to previous training stabilization techniques. |
Tasks | Image Generation |
Published | 2018-02-16 |
URL | http://arxiv.org/abs/1802.05957v1 |
http://arxiv.org/pdf/1802.05957v1.pdf | |
PWC | https://paperswithcode.com/paper/spectral-normalization-for-generative |
Repo | https://github.com/taki0112/Spectral_Normalization-Tensorflow |
Framework | tf |
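Spectral normalization divides each weight matrix by an estimate of its largest singular value, obtained cheaply with power iteration. A minimal sketch of that normalization is below; in practice one would use the built-in torch.nn.utils.spectral_norm (or the equivalent TensorFlow wrapper in the linked repo), which also persists the power-iteration vector between updates.

```python
import torch
import torch.nn.functional as F

def spectral_normalize(W, n_iter=1):
    """Estimate the largest singular value of W with power iteration and divide
    it out, so the layer's spectral norm (Lipschitz constant) is about 1."""
    u = torch.randn(W.shape[0])
    for _ in range(n_iter):
        v = F.normalize(W.t() @ u, dim=0)   # right singular vector estimate
        u = F.normalize(W @ v, dim=0)       # left singular vector estimate
    sigma = u @ W @ v                       # largest singular value estimate
    return W / sigma

W = torch.randn(64, 128)
W_sn = spectral_normalize(W, n_iter=5)
print(torch.linalg.matrix_norm(W_sn, ord=2))   # close to 1
```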
LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation
Title | LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation |
Authors | Tak-Wai Hui, Xiaoou Tang, Chen Change Loy |
Abstract | FlowNet2, the state-of-the-art convolutional neural network (CNN) for optical flow estimation, requires over 160M parameters to achieve accurate flow estimation. In this paper we present an alternative network that outperforms FlowNet2 on the challenging Sintel final pass and KITTI benchmarks, while being 30 times smaller in model size and 1.36 times faster in running speed. This is made possible by drilling down to architectural details that might have been missed in the current frameworks: (1) We present a more effective flow inference approach at each pyramid level through a lightweight cascaded network. It not only improves flow estimation accuracy through early correction, but also permits seamless incorporation of descriptor matching in our network. (2) We present a novel flow regularization layer to ameliorate the issue of outliers and vague flow boundaries by using a feature-driven local convolution. (3) Our network adopts an effective structure for pyramidal feature extraction and embraces feature warping rather than image warping as practiced in FlowNet2. Our code and trained models are available at https://github.com/twhui/LiteFlowNet . |
Tasks | Optical Flow Estimation |
Published | 2018-05-18 |
URL | http://arxiv.org/abs/1805.07036v1 |
http://arxiv.org/pdf/1805.07036v1.pdf | |
PWC | https://paperswithcode.com/paper/liteflownet-a-lightweight-convolutional |
Repo | https://github.com/twhui/LiteFlowNet |
Framework | tf |
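One of the listed ingredients is warping features (rather than images) by the current flow estimate before the next refinement level. A minimal feature-warping sketch using bilinear sampling is below; the tensor shapes are placeholders and this is not LiteFlowNet's exact implementation.

```python
import torch
import torch.nn.functional as F

def warp_features(feat, flow):
    """Warp a feature map with a dense flow field (each pixel displaced by flow)."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # (1, 2, H, W)
    coords = grid + flow                                       # displaced coordinates
    # normalize to [-1, 1] for grid_sample, which expects (B, H, W, 2) as (x, y)
    coords[:, 0] = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords[:, 1] = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(feat, coords.permute(0, 2, 3, 1), align_corners=True)

feat = torch.randn(1, 32, 48, 64)
flow = torch.zeros(1, 2, 48, 64)        # zero flow: warping returns the input
print(torch.allclose(warp_features(feat, flow), feat, atol=1e-5))
```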
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Title | LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks |
Authors | Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, Gang Hua |
Abstract | Although weight and activation quantization is an effective approach for Deep Neural Network (DNN) compression and has great potential to increase inference speed by leveraging bit-operations, there is still a noticeable gap in terms of prediction accuracy between the quantized model and the full-precision model. To address this gap, we propose to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization schemes such as uniform or logarithmic quantization. Our method for learning the quantizers applies to both network weights and activations with arbitrary-bit precision, and our quantizers are easy to train. The comprehensive experiments on CIFAR-10 and ImageNet datasets show that our method works consistently well for various network structures such as AlexNet, VGG-Net, GoogLeNet, ResNet, and DenseNet, surpassing previous quantization methods in terms of accuracy by an appreciable margin. Code is available at https://github.com/Microsoft/LQ-Nets. |
Tasks | Quantization |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10029v1 |
http://arxiv.org/pdf/1807.10029v1.pdf | |
PWC | https://paperswithcode.com/paper/lq-nets-learned-quantization-for-highly |
Repo | https://github.com/Microsoft/LQ-Nets |
Framework | tf |
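In LQ-Nets, every quantization level is an inner product between a learned basis vector and a binary code, so the quantizer adapts to the data while staying compatible with bit-operations. The sketch below shows only the quantization mapping for a fixed 2-bit basis with {-1, +1} codes; learning the basis during training (and the activation variant with {0, 1} codes) is omitted, and the example values are assumptions.

```python
import itertools
import torch

def lq_quantize(x, basis):
    """Quantize x onto the levels spanned by a basis: every level is basis . b for a
    binary code b in {-1, +1}^K, and each entry of x snaps to the nearest level."""
    K = basis.numel()
    codes = torch.tensor(list(itertools.product([-1.0, 1.0], repeat=K)))  # (2^K, K)
    levels = codes @ basis                                 # (2^K,) quantization levels
    idx = (x.reshape(-1, 1) - levels).abs().argmin(dim=1)  # nearest level per entry
    return levels[idx].reshape(x.shape), codes[idx]        # quantized values and bit codes

basis = torch.tensor([0.5, 0.25])        # a 2-bit quantizer; the basis is what gets learned
x = torch.randn(3, 3)
xq, bits = lq_quantize(x, basis)
print(xq, bits.shape)                    # values in {-0.75, -0.25, 0.25, 0.75}
```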