May 7, 2019

Paper Group AWR 4

Discrete Variational Autoencoders. Unified Depth Prediction and Intrinsic Image Decomposition from a Single Image via Joint Convolutional Neural Fields. Off-policy evaluation for slate recommendation. DisturbLabel: Regularizing CNN on the Loss Layer. Superpixel Segmentation Using Gaussian Mixture Model. Knowledge Elicitation via Sequential Probabil …

Discrete Variational Autoencoders


Title	Discrete Variational Autoencoders
Authors	Jason Tyler Rolfe
Abstract	Probabilistic models with discrete latent variables naturally capture datasets composed of discrete classes. However, they are difficult to train efficiently, since backpropagation through discrete variables is generally not possible. We present a novel method to train a class of probabilistic models with discrete latent variables using the variational autoencoder framework, including backpropagation through the discrete latent variables. The associated class of probabilistic models comprises an undirected discrete component and a directed hierarchical continuous component. The discrete component captures the distribution over the disconnected smooth manifolds induced by the continuous component. As a result, this class of models efficiently learns both the class of objects in an image, and their specific realization in pixels, from unsupervised data, and outperforms state-of-the-art methods on the permutation-invariant MNIST, Omniglot, and Caltech-101 Silhouettes datasets.
Tasks	Omniglot
Published	2016-09-07
URL	http://arxiv.org/abs/1609.02200v2
PDF	http://arxiv.org/pdf/1609.02200v2.pdf
PWC	https://paperswithcode.com/paper/discrete-variational-autoencoders
Repo	https://github.com/QuadrantAI/dvae
Framework	tf

Unified Depth Prediction and Intrinsic Image Decomposition from a Single Image via Joint Convolutional Neural Fields


Title	Unified Depth Prediction and Intrinsic Image Decomposition from a Single Image via Joint Convolutional Neural Fields
Authors	Seungryong Kim, Kihong Park, Kwanghoon Sohn, Stephen Lin
Abstract	We present a method for jointly predicting a depth map and intrinsic images from single-image input. The two tasks are formulated in a synergistic manner through a joint conditional random field (CRF) that is solved using a novel convolutional neural network (CNN) architecture, called the joint convolutional neural field (JCNF) model. Tailored to our joint estimation problem, JCNF differs from previous CNNs in its sharing of convolutional activations and layers between networks for each task, its inference in the gradient domain where there exists greater correlation between depth and intrinsic images, and the incorporation of a gradient scale network that learns the confidence of estimated gradients in order to effectively balance them in the solution. This approach is shown to surpass state-of-the-art methods both on single-image depth estimation and on intrinsic image decomposition.
Tasks	Depth Estimation, Intrinsic Image Decomposition
Published	2016-03-21
URL	http://arxiv.org/abs/1603.06359v1
PDF	http://arxiv.org/pdf/1603.06359v1.pdf
PWC	https://paperswithcode.com/paper/unified-depth-prediction-and-intrinsic-image
Repo	https://github.com/seungryong/JCNF
Framework	none

Off-policy evaluation for slate recommendation


Title	Off-policy evaluation for slate recommendation
Authors	Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni
Abstract	This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context—a common scenario in web search, ads, and recommendation. We build on techniques from combinatorial bandits to introduce a new practical estimator that uses logged data to estimate a policy’s performance. A thorough empirical evaluation on real-world data reveals that our estimator is accurate in a variety of settings, including as a subroutine in a learning-to-rank task, where it achieves competitive performance. We derive conditions under which our estimator is unbiased—these conditions are weaker than prior heuristics for slate evaluation—and experimentally demonstrate a smaller bias than parametric approaches, even when these conditions are violated. Finally, our theory and experiments also show exponential savings in the amount of required data compared with general unbiased estimators.
Tasks	Learning-To-Rank
Published	2016-05-16
URL	http://arxiv.org/abs/1605.04812v3
PDF	http://arxiv.org/pdf/1605.04812v3.pdf
PWC	https://paperswithcode.com/paper/off-policy-evaluation-for-slate
Repo	https://github.com/adith387/slates_semisynth_expts
Framework	none

DisturbLabel: Regularizing CNN on the Loss Layer


Title	DisturbLabel: Regularizing CNN on the Loss Layer
Authors	Lingxi Xie, Jingdong Wang, Zhen Wei, Meng Wang, Qi Tian
Abstract	For a long time, we have been combating over-fitting in the CNN training process with model regularization, including weight decay, model averaging, data augmentation, etc. In this paper, we present DisturbLabel, an extremely simple algorithm which randomly replaces a portion of the labels with incorrect values in each iteration. Although it seems weird to intentionally generate incorrect training labels, we show that DisturbLabel prevents the network training from over-fitting by implicitly averaging over exponentially many networks which are trained with different label sets. To the best of our knowledge, DisturbLabel serves as the first work which adds noise on the loss layer. Meanwhile, DisturbLabel cooperates well with Dropout to provide complementary regularization functions. Experiments demonstrate competitive recognition results on several popular image recognition datasets.
Tasks	Data Augmentation
Published	2016-04-30
URL	http://arxiv.org/abs/1605.00055v1
PDF	http://arxiv.org/pdf/1605.00055v1.pdf
PWC	https://paperswithcode.com/paper/disturblabel-regularizing-cnn-on-the-loss
Repo	https://github.com/amirhfarzaneh/DisturbLabel-PyTorch
Framework	pytorch
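
The label-replacement step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `disturb_labels`, the noise rate `alpha`, and the choice to draw the replacement uniformly from all classes (so it may occasionally coincide with the true label) are assumptions made for the example.

```python
import random


def disturb_labels(labels, num_classes, alpha, rng=random):
    """Return a copy of `labels` in which each entry is, with probability
    `alpha`, replaced by a class index drawn uniformly from all classes.

    Assumption: the replacement is uniform over every class, so it can
    happen to equal the original label.
    """
    disturbed = []
    for y in labels:
        if rng.random() < alpha:
            disturbed.append(rng.randrange(num_classes))
        else:
            disturbed.append(y)
    return disturbed


# A fresh disturbed copy would be drawn for every mini-batch / iteration.
rng = random.Random(0)
batch_labels = [0, 1, 2, 3, 4] * 4
noisy_labels = disturb_labels(batch_labels, num_classes=5, alpha=0.1, rng=rng)
```

Because the noise is resampled at every iteration, the original labels are left untouched and only the copy fed to the loss changes.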

Superpixel Segmentation Using Gaussian Mixture Model


Title	Superpixel Segmentation Using Gaussian Mixture Model
Authors	Zhihua Ban, Jianguo Liu, Li Cao
Abstract	Superpixel segmentation algorithms partition an image into perceptually coherent atomic regions by assigning every pixel a superpixel label. These algorithms have been widely used as a preprocessing step in computer vision work, as they can enormously reduce the number of entries that subsequent algorithms must process. In this work, we propose an alternative superpixel segmentation method based on a Gaussian mixture model (GMM) by assuming that each superpixel corresponds to a Gaussian distribution, and that each pixel is generated by first randomly choosing one distribution from several Gaussian distributions which are defined to be related to that pixel, and then drawing the pixel from the selected distribution. Based on this assumption, each pixel is supposed to be drawn from a mixture of Gaussian distributions with unknown parameters (GMM). An algorithm based on the expectation-maximization method is applied to estimate the unknown parameters. Once the unknown parameters are obtained, the superpixel label of a pixel is determined by a posterior probability. The success of applying GMM to superpixel segmentation depends on the two major differences between the traditional GMM-based clustering and the proposed one: data points in our model may be non-identically distributed, and we present an approach to control the shape of the estimated Gaussian functions by adjusting their covariance matrices. Our method is of linear complexity with respect to the number of pixels. The proposed algorithm is inherently parallel and can be sped up by adding simple OpenMP directives to our implementation. According to our experiments, our algorithm outperforms the state-of-the-art superpixel algorithms in accuracy and presents a competitive performance in computational efficiency.
Tasks
Published	2016-12-28
URL	http://arxiv.org/abs/1612.08792v2
PDF	http://arxiv.org/pdf/1612.08792v2.pdf
PWC	https://paperswithcode.com/paper/superpixel-segmentation-using-gaussian
Repo	https://github.com/ahban/GMMSP
Framework	none
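
The final labeling step in the abstract (pick the superpixel with the highest posterior probability among the Gaussians related to a pixel) might look roughly like the sketch below. It assumes diagonal covariances, equal mixing weights, and a hypothetical feature layout of (x, y, intensity); none of these details come from the listing, and the EM parameter estimation is not shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (assumption)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posterior_label(pixel, candidates):
    """Return (label, posteriors) for one pixel.

    `candidates` maps a superpixel id to (mean, var) for the Gaussians
    considered related to this pixel. Equal mixing weights are assumed.
    """
    log_p = {k: log_gaussian_diag(pixel, m, v) for k, (m, v) in candidates.items()}
    top = max(log_p.values())  # subtract the max for numerical stability
    weights = {k: math.exp(lp - top) for k, lp in log_p.items()}
    total = sum(weights.values())
    posteriors = {k: w / total for k, w in weights.items()}
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical pixel feature (x, y, intensity) and two nearby superpixels.
pixel = (10.0, 12.0, 0.8)
candidates = {
    0: ((8.0, 8.0, 0.2), (16.0, 16.0, 0.05)),
    1: ((12.0, 12.0, 0.9), (16.0, 16.0, 0.05)),
}
label, posteriors = posterior_label(pixel, candidates)
```

Restricting each pixel to a small candidate set is what keeps the per-pixel cost constant, in line with the linear complexity claimed in the abstract.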

Knowledge Elicitation via Sequential Probabilistic Inference for High-Dimensional Prediction


Title	Knowledge Elicitation via Sequential Probabilistic Inference for High-Dimensional Prediction
Authors	Pedram Daee, Tomi Peltola, Marta Soare, Samuel Kaski
Abstract	Prediction in a small-sized sample with a large number of covariates, the “small n, large p” problem, is challenging. This setting is encountered in multiple applications, such as precision medicine, where obtaining additional samples can be extremely costly or even impossible, and extensive research effort has recently been dedicated to finding principled solutions for accurate prediction. However, a valuable source of additional information, domain experts, has not yet been efficiently exploited. We formulate knowledge elicitation generally as a probabilistic inference process, where expert knowledge is sequentially queried to improve predictions. In the specific case of sparse linear regression, where we assume the expert has knowledge about the values of the regression coefficients or about the relevance of the features, we propose an algorithm and computational approximation for fast and efficient interaction, which sequentially identifies the most informative features on which to query expert knowledge. Evaluations of our method in experiments with simulated and real users show improved prediction accuracy already with a small effort from the expert.
Tasks
Published	2016-12-10
URL	http://arxiv.org/abs/1612.03328v2
PDF	http://arxiv.org/pdf/1612.03328v2.pdf
PWC	https://paperswithcode.com/paper/knowledge-elicitation-via-sequential
Repo	https://github.com/HIIT/knowledge-elicitation-for-linear-regression
Framework	none

Faster CNNs with Direct Sparse Convolutions and Guided Pruning


Title	Faster CNNs with Direct Sparse Convolutions and Guided Pruning
Authors	Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, Pradeep Dubey
Abstract	Phenomenally successful in practical inference problems, convolutional neural networks (CNN) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, is often large and undesirable. Consequently, various methods have been developed to prune a CNN once it is trained. Nevertheless, the resulting CNNs offer limited benefits. While pruning the fully connected layers reduces a CNN’s size considerably, it does not improve inference speed noticeably as the compute heavy parts lie in convolutions. Pruning CNNs in a way that increases inference speed often imposes specific sparsity structures, thus limiting the achievable sparsity levels. We present a method to realize simultaneously size economy and speed improvement while pruning CNNs. Paramount to our success is an efficient general sparse-with-dense matrix multiplication implementation that is applicable to convolution of feature maps with kernels of arbitrary sparsity patterns. Complementing this, we developed a performance model that predicts sweet spots of sparsity levels for different layers and on different computer architectures. Together, these two allow us to demonstrate 3.1–7.3$\times$ convolution speedups over dense convolution in AlexNet, on Intel Atom, Xeon, and Xeon Phi processors, spanning the spectrum from mobile devices to supercomputers. We also open source our project at https://github.com/IntelLabs/SkimCaffe.
Tasks
Published	2016-08-04
URL	http://arxiv.org/abs/1608.01409v5
PDF	http://arxiv.org/pdf/1608.01409v5.pdf
PWC	https://paperswithcode.com/paper/faster-cnns-with-direct-sparse-convolutions
Repo	https://github.com/IntelLabs/SkimCaffe
Framework	tf
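
To make the "sparse-with-dense matrix multiplication" idea concrete, here is a minimal pure-Python sketch of a compressed sparse row (CSR) matrix multiplied by a dense matrix. It is a generic textbook routine written for illustration, not the optimized kernel from SkimCaffe; the helper names `dense_to_csr` and `csr_matmul_dense` are hypothetical.

```python
def dense_to_csr(matrix):
    """Convert a list-of-lists matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matmul_dense(csr, dense):
    """Multiply a CSR matrix by a dense matrix, touching only the non-zeros."""
    values, col_idx, row_ptr = csr
    n_cols = len(dense[0])
    out = []
    for i in range(len(row_ptr) - 1):
        acc = [0.0] * n_cols
        for p in range(row_ptr[i], row_ptr[i + 1]):
            weight, source_row = values[p], dense[col_idx[p]]
            for j in range(n_cols):
                acc[j] += weight * source_row[j]
        out.append(acc)
    return out


# Pruned (sparse) weights times a dense block of input features.
sparse_weights = dense_to_csr([[0, 2, 0], [1, 0, 3]])
features = [[1, 2], [3, 4], [5, 6]]
result = csr_matmul_dense(sparse_weights, features)
```

The work done is proportional to the number of non-zero weights, which is why the achievable speedup depends on the sparsity level of each layer.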

Variational Neural Discourse Relation Recognizer


Title	Variational Neural Discourse Relation Recognizer
Authors	Biao Zhang, Deyi Xiong, Jinsong Su, Qun Liu, Rongrong Ji, Hong Duan, Min Zhang
Abstract	Implicit discourse relation recognition is a crucial component for automatic discourse-level analysis and natural language understanding. Previous studies exploit discriminative models that are built on either powerful manual features or deep discourse representations. In this paper, instead, we explore generative models and propose a variational neural discourse relation recognizer. We refer to this model as VarNDRR. VarNDRR establishes a directed probabilistic model with a latent continuous variable that generates both a discourse and the relation between the two arguments of the discourse. In order to perform efficient inference and learning, we introduce neural discourse relation models to approximate the prior and posterior distributions of the latent variable, and employ these approximated distributions to optimize a reparameterized variational lower bound. This allows VarNDRR to be trained with standard stochastic gradient methods. Experiments on the benchmark data set show that VarNDRR can achieve comparable results against state-of-the-art baselines without using any manual features.
Tasks
Published	2016-03-12
URL	http://arxiv.org/abs/1603.03876v2
PDF	http://arxiv.org/pdf/1603.03876v2.pdf
PWC	https://paperswithcode.com/paper/variational-neural-discourse-relation
Repo	https://github.com/DeepLearnXMU/VarNDRR
Framework	none
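
The two generic ingredients of a reparameterized variational lower bound with a learned prior can be sketched as follows. The sketch assumes that both the approximate prior and the approximate posterior are diagonal Gaussians parameterized by a mean and a log-variance; that parameterization and the function names are assumptions for illustration, and the networks producing these parameters are omitted.

```python
import math
import random


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), so the sample is a
    deterministic function of (mu, log_var) plus external noise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl


# Hypothetical posterior and prior parameters for a 2-dimensional latent.
posterior_mu, posterior_log_var = [0.3, -0.1], [-1.0, -0.5]
prior_mu, prior_log_var = [0.0, 0.0], [0.0, 0.0]
z = reparameterize(posterior_mu, posterior_log_var, random.Random(0))
kl_term = kl_diag_gaussians(posterior_mu, posterior_log_var, prior_mu, prior_log_var)
```

In a lower bound of this kind, the KL term is subtracted from the expected log-likelihood evaluated at the sampled `z`.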

Towards optimal nonlinearities for sparse recovery using higher-order statistics


Title	Towards optimal nonlinearities for sparse recovery using higher-order statistics
Authors	Steffen Limmer, Sławomir Stańczak
Abstract	We consider machine learning techniques to develop low-latency approximate solutions to a class of inverse problems. More precisely, we use a probabilistic approach for the problem of recovering sparse stochastic signals that are members of the $\ell_p$-balls. In this context, we analyze the Bayesian mean-square-error (MSE) for two types of estimators: (i) a linear estimator and (ii) a structured estimator composed of a linear operator followed by a Cartesian product of univariate nonlinear mappings. By construction, the complexity of the proposed nonlinear estimator is comparable to that of its linear counterpart since the nonlinear mapping can be implemented efficiently in hardware by means of look-up tables (LUTs). The proposed structure lends itself to neural networks and iterative shrinkage/thresholding-type algorithms restricted to a single iterate (e.g. due to imposed hardware or latency constraints). By resorting to an alternating minimization technique, we obtain a sequence of optimized linear operators and nonlinear mappings that converge in the MSE objective. The result is attractive for real-time applications where general iterative and convex optimization methods are infeasible.
Tasks
Published	2016-05-26
URL	http://arxiv.org/abs/1605.08201v2
PDF	http://arxiv.org/pdf/1605.08201v2.pdf
PWC	https://paperswithcode.com/paper/towards-optimal-nonlinearities-for-sparse
Repo	https://github.com/stli/MLSP2016_OptNonlin
Framework	none
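
The structured estimator in the abstract (a linear operator followed by the same univariate nonlinearity on every coordinate, served from a look-up table) can be illustrated with the sketch below. Soft-thresholding is used here only as a familiar stand-in for the nonlinearity, matching a single shrinkage/thresholding iterate; the optimized mappings from the paper, the table range, and the table size are not given in the listing and are assumptions.

```python
def soft_threshold(v, lam):
    """Shrink v toward zero by lam (illustrative choice of nonlinearity)."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def build_lut(fn, lo, hi, size):
    """Tabulate fn on a uniform grid over [lo, hi]."""
    step = (hi - lo) / (size - 1)
    return [fn(lo + i * step) for i in range(size)], lo, step


def lut_apply(lut, v):
    """Evaluate the tabulated function at the grid point nearest to v."""
    table, lo, step = lut
    idx = round((v - lo) / step)
    idx = min(max(idx, 0), len(table) - 1)  # clamp out-of-range inputs
    return table[idx]


def structured_estimate(W, y, lut):
    """Linear operator W applied to y, then the LUT nonlinearity per coordinate."""
    z = [sum(w * yi for w, yi in zip(row, y)) for row in W]
    return [lut_apply(lut, zi) for zi in z]


lut = build_lut(lambda v: soft_threshold(v, 0.1), -2.0, 2.0, 401)
W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical linear operator
x_hat = structured_estimate(W, [0.05, 1.0], lut)
```

The per-coordinate cost is one table read, which is why the nonlinear estimator stays close to the linear one in complexity.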

Amortised MAP Inference for Image Super-resolution


Title	Amortised MAP Inference for Image Super-resolution
Authors	Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi, Ferenc Huszár
Abstract	Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A more desirable approach would employ Maximum a Posteriori (MAP) inference, preferring solutions that always have a high probability under the image prior, and thus appear more plausible. Direct MAP estimation for SR is non-trivial, as it requires us to build a model for the image prior from samples. Furthermore, MAP inference is often performed via optimisation-based iterative algorithms which don’t compare well with the efficiency of neural-network-based alternatives. Here we introduce new methods for amortised MAP inference whereby we calculate the MAP estimate directly using a convolutional neural network. We first introduce a novel neural network architecture that performs a projection to the affine subspace of valid SR solutions, ensuring that the high-resolution output of the network is always consistent with the low-resolution input. We show that, using this architecture, the amortised MAP inference problem reduces to minimising the cross-entropy between two distributions, similar to training generative models. We propose three methods to solve this optimisation problem: (1) Generative Adversarial Networks (GAN), (2) denoiser-guided SR, which backpropagates gradient estimates from denoising to train the network, and (3) a baseline method using a maximum-likelihood-trained image prior. Our experiments show that the GAN-based approach performs best on real image data. Lastly, we establish a connection between GANs and amortised variational inference as in e.g. variational autoencoders.
Tasks	Denoising, Image Super-Resolution, Super-Resolution
Published	2016-10-14
URL	http://arxiv.org/abs/1610.04490v3
PDF	http://arxiv.org/pdf/1610.04490v3.pdf
PWC	https://paperswithcode.com/paper/amortised-map-inference-for-image-super
Repo	https://github.com/Justin-Tan/invariant_reps
Framework	tf
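
The projection onto the affine subspace of valid SR solutions can be illustrated in one dimension. The sketch assumes the downsampling operator is plain block averaging with factor `k` (an assumption for this example); for that operator, the orthogonal projection reduces to adding the low-resolution residual back to every sample of the corresponding block. The function names are hypothetical.

```python
def downsample(x, k):
    """Block-average downsampling by a factor k (len(x) must be a multiple of k)."""
    return [sum(x[i:i + k]) / k for i in range(0, len(x), k)]


def project_to_consistent(x, y, k):
    """Project a high-resolution guess x onto {x : downsample(x, k) == y}.

    For block averaging, the pseudo-inverse correction is the low-resolution
    residual replicated over each block.
    """
    residual = [yi - di for yi, di in zip(y, downsample(x, k))]
    return [xi + residual[i // k] for i, xi in enumerate(x)]


network_output = [1.0, 3.0, 2.0, 6.0]  # hypothetical unconstrained prediction
low_res_input = [3.0, 5.0]
consistent = project_to_consistent(network_output, low_res_input, k=2)
```

Whatever the upstream network predicts, downsampling the projected output reproduces the low-resolution input, which is the consistency property the abstract refers to.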

Derivative Delay Embedding: Online Modeling of Streaming Time Series


Title	Derivative Delay Embedding: Online Modeling of Streaming Time Series
Authors	Zhifei Zhang, Yang Song, Wei Wang, Hairong Qi
Abstract	The staggering amount of streaming time series coming from the real world calls for more efficient and effective online modeling solutions. For time series modeling, most existing works make some unrealistic assumptions, such as that the input data is of fixed length or well aligned, which requires extra effort on segmentation or normalization of the raw streaming data. Although some works claim their approaches to be invariant to data length and misalignment, they are too time-consuming to model a streaming time series in an online manner. We propose a novel and more practical online modeling and classification scheme, DDE-MGM, which does not make any assumptions on the time series while maintaining high efficiency and state-of-the-art performance. The derivative delay embedding (DDE) is developed to incrementally transform time series to the embedding space, where the intrinsic characteristics of the data are preserved as recursive patterns regardless of the stream length and misalignment. Then, a non-parametric Markov geographic model (MGM) is proposed to both model and classify the pattern in an online manner. Experimental results demonstrate the effectiveness and superior classification accuracy of the proposed DDE-MGM in an online setting as compared to the state-of-the-art.
Tasks	Time Series
Published	2016-09-24
URL	http://arxiv.org/abs/1609.07540v1
PDF	http://arxiv.org/pdf/1609.07540v1.pdf
PWC	https://paperswithcode.com/paper/derivative-delay-embedding-online-modeling-of
Repo	https://github.com/ZZUTK/Delay_Embedding
Framework	none
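
An incremental derivative delay embedding can be sketched as a generator that consumes one sample at a time. The details here are assumptions for illustration: the derivative is approximated by a first difference, and the embedding dimension `dim` and delay `delay` are hypothetical parameters. The Markov geographic model that consumes the embedded points is not shown.

```python
from collections import deque


def derivative_delay_embedding(stream, dim=2, delay=1):
    """Yield delay-embedded points of the first difference of `stream`.

    Each point is (d[t], d[t - delay], ..., d[t - (dim - 1) * delay]),
    where d is the first difference of the incoming samples.
    """
    history = deque(maxlen=(dim - 1) * delay + 1)
    previous = None
    for sample in stream:
        if previous is not None:
            history.append(sample - previous)
            if len(history) == history.maxlen:
                # Newest difference first, then progressively older ones.
                yield tuple(history[-1 - j * delay] for j in range(dim))
        previous = sample


points = list(derivative_delay_embedding([0, 1, 3, 6, 10], dim=2, delay=1))
shifted = list(derivative_delay_embedding([5, 6, 8, 11, 15], dim=2, delay=1))
```

Only a bounded window of past differences is kept, so memory does not grow with the stream length, and a constant offset in the signal leaves the embedded points unchanged.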

Response Selection with Topic Clues for Retrieval-based Chatbots


Title	Response Selection with Topic Clues for Retrieval-based Chatbots
Authors	Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou
Abstract	We consider incorporating topic information into message-response matching to boost responses with rich content in retrieval-based chatbots. To this end, we propose a topic-aware convolutional neural tensor network (TACNTN). In TACNTN, matching between a message and a response is not only conducted between a message vector and a response vector generated by convolutional neural networks, but also leverages extra topic information encoded in two topic vectors. The two topic vectors are linear combinations of topic words of the message and the response respectively, where the topic words are obtained from a pre-trained LDA model and their weights are determined by themselves as well as the message vector and the response vector. The message vector, the response vector, and the two topic vectors are fed to neural tensors to calculate a matching score. Empirical study on a public data set and a human annotated data set shows that TACNTN can significantly outperform state-of-the-art methods for message-response matching.
Tasks
Published	2016-04-30
URL	http://arxiv.org/abs/1605.00090v3
PDF	http://arxiv.org/pdf/1605.00090v3.pdf
PWC	https://paperswithcode.com/paper/response-selection-with-topic-clues-for
Repo	https://github.com/MarkWuNLP/TACNTN
Framework	none

Bayesian nonparametric image segmentation using a generalized Swendsen-Wang algorithm


Title	Bayesian nonparametric image segmentation using a generalized Swendsen-Wang algorithm
Authors	Richard Yi Da Xu, Francois Caron, Arnaud Doucet
Abstract	Unsupervised image segmentation aims at clustering the set of pixels of an image into spatially homogeneous regions. We introduce here a class of Bayesian nonparametric models to address this problem. These models are based on a combination of a Potts-like spatial smoothness component and a prior on partitions which is used to control both the number and size of clusters. This class of models is flexible enough to include the standard Potts model and the more recent Potts-Dirichlet Process model (Orbanz, 2008). More importantly, any prior on partitions can be introduced to control the global clustering structure so that it is possible to penalize small or large clusters if necessary. Bayesian computation is carried out using an original generalized Swendsen-Wang algorithm. Experiments demonstrate that our method is competitive in terms of Rand index compared to popular image segmentation methods, such as mean-shift, and recent alternative Bayesian nonparametric models.
Tasks	Semantic Segmentation
Published	2016-02-09
URL	http://arxiv.org/abs/1602.03048v1
PDF	http://arxiv.org/pdf/1602.03048v1.pdf
PWC	https://paperswithcode.com/paper/bayesian-nonparametric-image-segmentation
Repo	https://github.com/roboticcam/publications
Framework	none

VideoLSTM Convolves, Attends and Flows for Action Recognition


Title	VideoLSTM Convolves, Attends and Flows for Action Recognition
Authors	Zhenyang Li, Efstratios Gavves, Mihir Jain, Cees G. M. Snoek
Abstract	We present a new architecture for end-to-end sequence learning of actions in video, which we call VideoLSTM. Rather than adapting the video to the peculiarities of established recurrent or convolutional architectures, we adapt the architecture to fit the requirements of the video medium. Starting from the soft-Attention LSTM, VideoLSTM makes three novel contributions. First, video has a spatial layout. To exploit the spatial correlation we hardwire convolutions in the soft-Attention LSTM architecture. Second, motion not only informs us about the action content, but also better guides the attention towards the relevant spatio-temporal locations. We introduce motion-based attention. And finally, we demonstrate how the attention from VideoLSTM can be used for action localization by relying on just the action class label. Experiments and comparisons on challenging datasets for action classification and localization support our claims.
Tasks	Action Classification, Action Localization, Temporal Action Localization
Published	2016-07-06
URL	http://arxiv.org/abs/1607.01794v1
PDF	http://arxiv.org/pdf/1607.01794v1.pdf
PWC	https://paperswithcode.com/paper/videolstm-convolves-attends-and-flows-for
Repo	https://github.com/zhenyangli/VideoLSTM
Framework	none

Associative Long Short-Term Memory


Title	Associative Long Short-Term Memory
Authors	Ivo Danihelka, Greg Wayne, Benigno Uria, Nal Kalchbrenner, Alex Graves
Abstract	We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters. The system has an associative memory based on complex-valued vectors and is closely related to Holographic Reduced Representations and Long Short-Term Memory networks. Holographic Reduced Representations have limited capacity: as they store more information, each retrieval becomes noisier due to interference. Our system in contrast creates redundant copies of stored information, which enables retrieval with reduced noise. Experiments demonstrate faster learning on multiple memorization tasks.
Tasks
Published	2016-02-09
URL	http://arxiv.org/abs/1602.03032v2
PDF	http://arxiv.org/pdf/1602.03032v2.pdf
PWC	https://paperswithcode.com/paper/associative-long-short-term-memory
Repo	https://github.com/mohammadpz/Associative_LSTM
Framework	none
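
The complex-valued associative memory with redundant copies can be illustrated with a small sketch. It assumes unit-modulus complex keys, binding by element-wise multiplication, retrieval by multiplying with the conjugate key, and one fixed random permutation of the key per copy; these choices and all function names are assumptions made for the example rather than details taken from the listing, and the LSTM gating is omitted.

```python
import cmath
import random


def random_unit_key(n, rng):
    """Key whose entries lie on the complex unit circle."""
    return [cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi)) for _ in range(n)]


def make_permutations(n, copies, rng):
    """One fixed random permutation of key positions per memory copy."""
    perms = []
    for _ in range(copies):
        p = list(range(n))
        rng.shuffle(p)
        perms.append(p)
    return perms


def write(memories, perms, key, value):
    """Add key * value (element-wise) to every copy, using a permuted key."""
    return [
        [m + key[p[i]] * value[i] for i, m in enumerate(mem)]
        for mem, p in zip(memories, perms)
    ]


def read(memories, perms, key):
    """Unbind each copy with the conjugate permuted key and average the results."""
    n = len(key)
    out = [0j] * n
    for mem, p in zip(memories, perms):
        for i in range(n):
            out[i] += key[p[i]].conjugate() * mem[i]
    return [o / len(memories) for o in out]


rng = random.Random(0)
n, copies = 8, 4
perms = make_permutations(n, copies, rng)
memories = [[0j] * n for _ in range(copies)]
key_a, key_b = random_unit_key(n, rng), random_unit_key(n, rng)
value_a = [complex(i, -i) for i in range(n)]
value_b = [complex(1, 1)] * n
memories = write(memories, perms, key_a, value_a)
memories = write(memories, perms, key_b, value_b)
retrieved_a = read(memories, perms, key_a)  # value_a plus averaged interference
```

With a single stored item the retrieval is exact, because a unit-modulus key times its conjugate is 1. With several items, each copy sees a different interference term, so averaging over copies tends to reduce the retrieval noise without adding network parameters.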