Paper Group ANR 569
Generative Knowledge Transfer for Neural Language Models. Inferring Sparsity: Compressed Sensing using Generalized Restricted Boltzmann Machines. Fast Face-swap Using Convolutional Neural Networks. A Factorization Approach to Inertial Affine Structure from Motion. Efficient L1-Norm Principal-Component Analysis via Bit Flipping. Structured Sparse Convolutional Autoencoder. Surveillance Video Parsing with Single Frame Supervision. Recursive Recurrent Nets with Attention Modeling for OCR in the Wild. The Opacity of Backbones. An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application. Video Depth-From-Defocus. Scalable image coding based on epitomes. Estimation of low rank density matrices by Pauli measurements. Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation. Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction.
Generative Knowledge Transfer for Neural Language Models
Title | Generative Knowledge Transfer for Neural Language Models |
Authors | Sungho Shin, Kyuyeon Hwang, Wonyong Sung |
Abstract | In this paper, we propose a generative knowledge transfer technique that trains an RNN-based language model (student network) using text and output probabilities generated from a previously trained RNN (teacher network). The text generation can be conducted by either the teacher or the student network. We can also improve the performance by taking the ensemble of soft labels obtained from multiple teacher networks. This method can be used for privacy-conscious language model adaptation because no user data is directly used for training. In particular, when the soft labels of multiple devices are aggregated via a trusted third party, we can expect very strong privacy protection. |
Tasks | Language Modelling, Text Generation, Transfer Learning |
Published | 2016-08-14 |
URL | http://arxiv.org/abs/1608.04077v3 |
http://arxiv.org/pdf/1608.04077v3.pdf | |
PWC | https://paperswithcode.com/paper/generative-knowledge-transfer-for-neural |
Repo | |
Framework | |
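At its core, the method is knowledge distillation applied to language models: the student RNN is trained on teacher-generated text against the teacher's output distributions (soft labels), optionally ensembled over several teachers. Below is a minimal, hypothetical PyTorch sketch of such a distillation loss; the temperature `T`, the KL formulation, and the simple averaging of teacher probabilities are common distillation choices, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=1.0):
    """KL divergence between teacher and student next-word distributions.

    student_logits, teacher_logits: (batch, seq_len, vocab) tensors produced
    by the student and teacher RNN language models on teacher-generated text.
    """
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # "batchmean" matches the mathematical definition of KL divergence.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

def ensemble_soft_labels(teacher_logit_list, T=1.0):
    """Average the soft labels of several teacher networks."""
    probs = [F.softmax(t / T, dim=-1) for t in teacher_logit_list]
    return torch.stack(probs).mean(dim=0)
```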
Inferring Sparsity: Compressed Sensing using Generalized Restricted Boltzmann Machines
Title | Inferring Sparsity: Compressed Sensing using Generalized Restricted Boltzmann Machines |
Authors | Eric W. Tramel, Andre Manoel, Francesco Caltagirone, Marylou Gabrié, Florent Krzakala |
Abstract | In this work, we consider compressed sensing reconstruction from $M$ measurements of $K$-sparse structured signals which do not possess a writable correlation model. Assuming that a generative statistical model, such as a Boltzmann machine, can be trained in an unsupervised manner on example signals, we demonstrate how this signal model can be used within a Bayesian framework of signal reconstruction. By deriving a message-passing inference for general distribution restricted Boltzmann machines, we are able to integrate these inferred signal models into approximate message passing for compressed sensing reconstruction. Finally, we show for the MNIST dataset that this approach can be very effective, even for $M < K$. |
Tasks | |
Published | 2016-06-13 |
URL | http://arxiv.org/abs/1606.03956v1 |
http://arxiv.org/pdf/1606.03956v1.pdf | |
PWC | https://paperswithcode.com/paper/inferring-sparsity-compressed-sensing-using |
Repo | |
Framework | |
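The paper plugs a trained RBM prior into an approximate message passing (AMP) reconstruction. As a much simpler point of reference, the sketch below reconstructs a sparse signal from M < N linear measurements with ISTA (iterative soft-thresholding), i.e., a hand-coded sparsity prior standing in for the learned RBM prior; all names and parameter values are illustrative, and this is not the paper's algorithm.

```python
import numpy as np

def ista(y, A, lam=0.1, n_iter=500):
    """Iterative soft-thresholding for min_x 0.5*||y - A x||^2 + lam*||x||_1.

    The paper replaces this generic sparsity prior with a trained RBM prior
    inside an AMP iteration; ISTA appears here only as a simple baseline.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # prox of lam*||.||_1
    return x

# Example: K-sparse signal, M < N random Gaussian measurements.
rng = np.random.default_rng(0)
N, M, K = 200, 80, 10
x_true = np.zeros(N); x_true[rng.choice(N, K, replace=False)] = rng.normal(size=K)
A = rng.normal(size=(M, N)) / np.sqrt(M)
x_hat = ista(A @ x_true, A)
```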
Fast Face-swap Using Convolutional Neural Networks
Title | Fast Face-swap Using Convolutional Neural Networks |
Authors | Iryna Korshunova, Wenzhe Shi, Joni Dambre, Lucas Theis |
Abstract | We consider the problem of face swapping in images, where an input identity is transformed into a target identity while preserving pose, facial expression, and lighting. To perform this mapping, we use convolutional neural networks trained to capture the appearance of the target identity from an unstructured collection of his/her photographs. This approach is enabled by framing the face swapping problem in terms of style transfer, where the goal is to render an image in the style of another one. Building on recent advances in this area, we devise a new loss function that enables the network to produce highly photorealistic results. By combining neural networks with simple pre- and post-processing steps, we aim at making face swap work in real-time with no input from the user. |
Tasks | Face Swapping, Style Transfer |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09577v2 |
http://arxiv.org/pdf/1611.09577v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-face-swap-using-convolutional-neural |
Repo | |
Framework | |
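Because the swap is framed as style transfer, training combines content and style terms computed on CNN feature maps. The sketch below shows a generic Gram-matrix style loss of the kind used in neural style transfer, as a rough reference only; the paper's actual loss has additional terms, and `gen_feats`, `content_feats`, `style_feats`, and `style_weight` are placeholder names for features produced by some fixed feature extractor.

```python
import torch

def gram_matrix(feat):
    """feat: (batch, channels, H, W) feature map -> (batch, C, C) Gram matrix."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def content_style_loss(gen_feats, content_feats, style_feats, style_weight=1e3):
    """Sum of per-layer content (feature) and style (Gram) losses."""
    content = sum(torch.mean((g - c) ** 2) for g, c in zip(gen_feats, content_feats))
    style = sum(torch.mean((gram_matrix(g) - gram_matrix(s)) ** 2)
                for g, s in zip(gen_feats, style_feats))
    return content + style_weight * style
```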
A Factorization Approach to Inertial Affine Structure from Motion
Title | A Factorization Approach to Inertial Affine Structure from Motion |
Authors | Roberto Tron |
Abstract | We consider the problem of reconstructing a 3-D scene from a moving camera with high frame rate using the affine projection model. This problem is traditionally known as Affine Structure from Motion (Affine SfM), and can be solved using an elegant low-rank factorization formulation. In this paper, we assume that an accelerometer and gyro are rigidly mounted with the camera, so that synchronized linear acceleration and angular velocity measurements are available together with the image measurements. We extend the standard Affine SfM algorithm to integrate these measurements through the use of image derivatives. |
Tasks | |
Published | 2016-08-09 |
URL | http://arxiv.org/abs/1608.02680v1 |
http://arxiv.org/pdf/1608.02680v1.pdf | |
PWC | https://paperswithcode.com/paper/a-factorization-approach-to-inertial-affine |
Repo | |
Framework | |
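The classical affine SfM factorization that the paper extends stacks all image measurements into one matrix and takes its best rank-3 approximation. A minimal numpy version of that standard step (without the paper's inertial extension) is sketched below, assuming tracked points are already arranged into a 2F-by-P measurement matrix.

```python
import numpy as np

def affine_sfm_factorization(W):
    """Tomasi-Kanade-style rank-3 factorization for affine SfM.

    W: (2F, P) measurement matrix of P points tracked over F frames
       (x and y image coordinates stacked consistently along the rows).
    Returns (M, S): a (2F, 3) motion matrix and a (3, P) shape matrix,
    defined up to an affine ambiguity.
    """
    t = W.mean(axis=1, keepdims=True)       # translate the point centroid to the origin
    W0 = W - t
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])           # motion (camera) factor
    S = np.sqrt(s[:3])[:, None] * Vt[:3]    # structure (shape) factor
    return M, S
```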
Efficient L1-Norm Principal-Component Analysis via Bit Flipping
Title | Efficient L1-Norm Principal-Component Analysis via Bit Flipping |
Authors | Panos P. Markopoulos, Sandipan Kundu, Shubham Chamadia, Dimitris A. Pados |
Abstract | It was shown recently that the $K$ L1-norm principal components (L1-PCs) of a real-valued data matrix $\mathbf X \in \mathbb R^{D \times N}$ ($N$ data samples of $D$ dimensions) can be exactly calculated with cost $\mathcal{O}(2^{NK})$ or, when advantageous, $\mathcal{O}(N^{dK - K + 1})$ where $d=\mathrm{rank}(\mathbf X)$, $K<d$ [1],[2]. In applications where $\mathbf X$ is large (e.g., “big” data of large $N$ and/or “heavy” data of large $d$), these costs are prohibitive. In this work, we present a novel suboptimal algorithm for the calculation of the $K < d$ L1-PCs of $\mathbf X$ of cost $\mathcal{O}(ND\min\{N, D\} + N^2(K^4 + dK^2) + dNK^3)$, which is comparable to that of standard (L2-norm) PC analysis. Our theoretical and experimental studies show that the proposed algorithm calculates the exact optimal L1-PCs with high frequency and achieves higher value in the L1-PC optimization metric than any known alternative algorithm of comparable computational cost. The superiority of the calculated L1-PCs over standard L2-PCs (singular vectors) in characterizing potentially faulty data/measurements is demonstrated with experiments on data dimensionality reduction and disease diagnosis from genomic data. |
Tasks | Dimensionality Reduction |
Published | 2016-10-06 |
URL | http://arxiv.org/abs/1610.01959v1 |
http://arxiv.org/pdf/1610.01959v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-l1-norm-principal-component |
Repo | |
Framework | |
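For $K = 1$ the L1-PC problem can be rewritten over binary vectors: maximize $\|\mathbf X \mathbf b\|_2$ over $\mathbf b \in \{\pm 1\}^N$ and take $\mathbf q = \mathbf X \mathbf b / \|\mathbf X \mathbf b\|_2$. The bit-flipping idea is to greedily flip the single sign that most increases this objective until no flip helps. The sketch below follows that idea for $K = 1$ only; it is an illustrative simplification, not the paper's full algorithm for general $K$.

```python
import numpy as np

def l1_pc_bitflip(X, max_iter=1000, seed=0):
    """Greedy single-bit-flipping search for the first L1 principal component.

    X: (D, N) data matrix. Returns a unit-norm (D,) vector q that (locally)
    maximizes ||X^T q||_1 via the equivalent problem max_{b in {+-1}^N} ||X b||_2.
    """
    rng = np.random.default_rng(seed)
    D, N = X.shape
    b = rng.choice([-1.0, 1.0], size=N)
    for _ in range(max_iter):
        cur = np.linalg.norm(X @ b)
        best_gain, best_n = 0.0, None
        for n in range(N):
            b[n] = -b[n]                      # tentatively flip bit n
            gain = np.linalg.norm(X @ b) - cur
            b[n] = -b[n]                      # undo the flip
            if gain > best_gain:
                best_gain, best_n = gain, n
        if best_n is None:                    # no single flip improves the objective
            break
        b[best_n] = -b[best_n]
    q = X @ b
    return q / np.linalg.norm(q)
```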
Structured Sparse Convolutional Autoencoder
Title | Structured Sparse Convolutional Autoencoder |
Authors | Ehsan Hosseini-Asl |
Abstract | This paper aims to improve feature learning in convolutional networks (ConvNets) by capturing the structure of objects. A new sparsity function is imposed on the extracted feature maps to capture the structure and shape of the learned object, extracting interpretable features that improve prediction performance. The proposed algorithm is based on organizing the activations within and across feature maps by constraining the node activities through $\ell_{2}$ and $\ell_{1}$ normalization in a structured form. |
Tasks | |
Published | 2016-04-17 |
URL | http://arxiv.org/abs/1604.04812v3 |
http://arxiv.org/pdf/1604.04812v3.pdf | |
PWC | https://paperswithcode.com/paper/structured-sparse-convolutional-autoencoder |
Repo | |
Framework | |
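A common way to combine $\ell_2$ grouping within a feature map with $\ell_1$ sparsity across maps is an $\ell_{2,1}$ group-sparsity penalty. The PyTorch snippet below is a generic penalty of that kind, offered only as an assumption-laden illustration of the within/across-map structure; it is not the paper's exact normalization scheme.

```python
import torch

def group_sparsity_penalty(feature_maps, eps=1e-8):
    """Generic l2,1 structured-sparsity penalty on convolutional activations.

    feature_maps: (batch, channels, H, W) activations of one layer.
    The l2 norm groups activations within each feature map; the l1 sum across
    channels encourages whole maps to switch off while active maps stay dense.
    """
    per_map_l2 = torch.sqrt((feature_maps ** 2).sum(dim=(2, 3)) + eps)  # (B, C)
    return per_map_l2.sum(dim=1).mean()
```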
Surveillance Video Parsing with Single Frame Supervision
Title | Surveillance Video Parsing with Single Frame Supervision |
Authors | Si Liu, Changhu Wang, Ruihe Qian, Han Yu, Renda Bao |
Abstract | Surveillance video parsing, which segments the video frames into several labels, e.g., face, pants, left-leg, has wide applications. However, pixel-wise annotation of all frames is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in the training stage. To parse one particular frame, the video segment preceding the frame is jointly considered. SVP (1) roughly parses the frames within the video segment, (2) estimates the optical flow between frames and (3) fuses the rough parsing results warped by optical flow to produce the refined parsing result. The three components of SVP, namely frame parsing, optical flow estimation and temporal fusion, are integrated in an end-to-end manner. Experimental results on two surveillance video datasets show the superiority of SVP over the state of the art. |
Tasks | Optical Flow Estimation |
Published | 2016-11-29 |
URL | http://arxiv.org/abs/1611.09587v1 |
http://arxiv.org/pdf/1611.09587v1.pdf | |
PWC | https://paperswithcode.com/paper/surveillance-video-parsing-with-single-frame |
Repo | |
Framework | |
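Step (3) warps the rough parsing results of earlier frames to the target frame using the estimated flow before fusing them. A hypothetical PyTorch warping helper built on `grid_sample` is sketched below; SVP's actual, learned fusion module is not reproduced, and the backward-flow convention is an assumption.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(parsing, flow):
    """Warp per-pixel label probabilities from a source frame to a target frame.

    parsing: (B, C, H, W) softmax parsing maps of the source frame.
    flow:    (B, 2, H, W) optical flow (dx, dy) in pixels mapping target pixel
             locations to source locations (backward flow).
    """
    B, C, H, W = parsing.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(parsing.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                               # (B, 2, H, W)
    # grid_sample expects coordinates normalized to [-1, 1], x first then y.
    coords_x = 2.0 * coords[:, 0] / (W - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)                # (B, H, W, 2)
    return F.grid_sample(parsing, grid, align_corners=True)
```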
Recursive Recurrent Nets with Attention Modeling for OCR in the Wild
Title | Recursive Recurrent Nets with Attention Modeling for OCR in the Wild |
Authors | Chen-Yu Lee, Simon Osindero |
Abstract | We present recursive recurrent neural networks with attention modeling (R$^2$AM) for lexicon-free optical character recognition in natural scene images. The primary advantages of the proposed method are: (1) use of recursive convolutional neural networks (CNNs), which allow for parametrically efficient and effective image feature extraction; (2) an implicitly learned character-level language model, embodied in a recurrent neural network which avoids the need to use N-grams; and (3) the use of a soft-attention mechanism, allowing the model to selectively exploit image features in a coordinated way, and allowing for end-to-end training within a standard backpropagation framework. We validate our method with state-of-the-art performance on challenging benchmark datasets: Street View Text, IIIT5k, ICDAR and Synth90k. |
Tasks | Language Modelling, Optical Character Recognition |
Published | 2016-03-09 |
URL | http://arxiv.org/abs/1603.03101v1 |
http://arxiv.org/pdf/1603.03101v1.pdf | |
PWC | https://paperswithcode.com/paper/recursive-recurrent-nets-with-attention |
Repo | |
Framework | |
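The soft-attention component lets the character decoder weight image features at every step. A generic additive-attention module over CNN feature columns is sketched below; the layer sizes and how the context vector feeds the recurrent character-level language model are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Soft attention over a sequence of image feature vectors."""

    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.w_feat = nn.Linear(feat_dim, attn_dim, bias=False)
        self.w_hid = nn.Linear(hidden_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, feats, hidden):
        # feats: (B, L, feat_dim) CNN feature columns; hidden: (B, hidden_dim)
        scores = self.v(torch.tanh(self.w_feat(feats) + self.w_hid(hidden).unsqueeze(1)))
        alpha = torch.softmax(scores.squeeze(-1), dim=1)     # (B, L) attention weights
        context = (alpha.unsqueeze(-1) * feats).sum(dim=1)   # (B, feat_dim) context vector
        return context, alpha
```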
The Opacity of Backbones
Title | The Opacity of Backbones |
Authors | Lane A. Hemaspaandra, David E. Narváez |
Abstract | This paper approaches, using structural complexity theory, the question of whether there is a chasm between knowing an object exists and getting one’s hands on the object or its properties. In particular, we study the nontransparency of so-called backbones. A backbone of a boolean formula $F$ is a collection $S$ of its variables for which there is a unique partial assignment $a_S$ such that $F[a_S]$ is satisfiable [MZK+99,WGS03]. We show that, under the widely believed assumption that integer factoring is hard, there exist sets of boolean formulas that have obvious, nontrivial backbones yet finding the values, $a_S$, of those backbones is intractable. We also show that, under the same assumption, there exist sets of boolean formulas that obviously have large backbones yet producing such a backbone $S$ is intractable. Furthermore, we show that if integer factoring is not merely worst-case hard but is frequently hard, as is widely believed, then the frequency of hardness in our two results is not too much less than that frequency. These results hold more generally, namely, in the settings where, respectively, one’s assumption is that P $\neq$ NP $\cap$ coNP or that some problem in NP $\cap$ coNP is frequently hard. |
Tasks | |
Published | 2016-06-11 |
URL | http://arxiv.org/abs/1606.03634v5 |
http://arxiv.org/pdf/1606.03634v5.pdf | |
PWC | https://paperswithcode.com/paper/the-opacity-of-backbones |
Repo | |
Framework | |
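To make the definition concrete: a backbone variable of a satisfiable formula takes the same value in every satisfying assignment. The brute-force check below (exponential, toy formulas only) illustrates the notion; the paper's point is precisely that recovering the backbone values can be intractable even when the backbone's existence is obvious.

```python
from itertools import product

def backbone(cnf, n_vars):
    """Return {var: value} for variables fixed across all satisfying assignments.

    cnf: list of clauses, each a list of ints (positive literal = variable,
    negative = negated variable), variables numbered 1..n_vars.
    Returns None if the formula is unsatisfiable.
    """
    def satisfies(assign):
        return all(any(assign[abs(l)] == (l > 0) for l in clause) for clause in cnf)

    candidates = None
    for bits in product([False, True], repeat=n_vars):
        assign = {v + 1: bits[v] for v in range(n_vars)}
        if satisfies(assign):
            if candidates is None:
                candidates = dict(assign)
            else:                       # keep only variables with a consistent value
                candidates = {v: val for v, val in candidates.items()
                              if assign[v] == val}
    return candidates

# (x1 or x2) and (x1 or not x2) forces x1 = True; x2 is free.
print(backbone([[1, 2], [1, -2]], n_vars=2))   # {1: True}
```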
An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application
Title | An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application |
Authors | Fouad Khan |
Abstract | K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However, the method is known to be highly sensitive to the initial seed selection of cluster centers. K-means++ has been proposed to overcome this problem and has been shown to have better accuracy and computational efficiency than k-means. In many clustering problems, though, such as when classifying georeferenced data for mapping applications, standardization of the clustering methodology, specifically the ability to arrive at the same cluster assignment on every run of the method (i.e., replicability of the methodology), may be of greater significance than any perceived measure of accuracy, especially when the solution is known to be non-unique, as in the case of k-means clustering. Here we propose a simple initial seed selection algorithm for k-means clustering along one attribute that draws initial cluster boundaries along the ‘deepest valleys’, i.e., the greatest gaps in the dataset. Thus, it incorporates a measure to maximize the distance between consecutive cluster centers, which augments the conventional k-means optimization for minimum distance between cluster centers and cluster members. Unlike existing initialization methods, no additional parameters or degrees of freedom are introduced into the clustering algorithm. This improves the replicability of cluster assignments by as much as 100% over k-means and k-means++, virtually reducing the variance over different runs to zero, without introducing any additional parameters to the clustering process. Further, the proposed method is more computationally efficient than k-means++ and, in some cases, more accurate. |
Tasks | |
Published | 2016-04-17 |
URL | http://arxiv.org/abs/1604.04893v1 |
http://arxiv.org/pdf/1604.04893v1.pdf | |
PWC | https://paperswithcode.com/paper/an-initial-seed-selection-algorithm-for-k |
Repo | |
Framework | |
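The proposed initialization for a single attribute is deterministic: sort the values, place the k-1 cluster boundaries at the k-1 largest gaps between consecutive sorted values, and seed each cluster from its segment. A numpy sketch of that idea follows; details such as using segment means (rather than, say, midpoints) as seeds are illustrative choices, and the scikit-learn usage shown in the comments is a hypothetical pipeline.

```python
import numpy as np

def gap_based_seeds(x, k):
    """Deterministic initial seeds for 1-D k-means via the k-1 largest gaps.

    x: (n,) array of values of the clustering attribute.
    Returns k seed values, one per segment delimited by the k-1 largest gaps.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    if k == 1:
        return np.array([xs.mean()])
    gaps = np.diff(xs)                                   # n-1 consecutive gaps
    cut_idx = np.sort(np.argsort(gaps)[-(k - 1):])       # positions of the largest gaps
    segments = np.split(xs, cut_idx + 1)                 # k contiguous segments
    return np.array([seg.mean() for seg in segments])

# Hypothetical usage with scikit-learn's KMeans:
# from sklearn.cluster import KMeans
# seeds = gap_based_seeds(values, k=5).reshape(-1, 1)
# labels = KMeans(n_clusters=5, init=seeds, n_init=1).fit_predict(values.reshape(-1, 1))
```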
Video Depth-From-Defocus
Title | Video Depth-From-Defocus |
Authors | Hyeongwoo Kim, Christian Richardt, Christian Theobalt |
Abstract | Many compelling video post-processing effects, in particular aesthetic focus editing and refocusing effects, are feasible if per-frame depth information is available. Existing computational methods to capture RGB and depth either purposefully modify the optics (coded aperture, light-field imaging), or employ active RGB-D cameras. Since these methods are less practical for users with normal cameras, we present an algorithm to capture all-in-focus RGB-D video of dynamic scenes with an unmodified commodity video camera. Our algorithm turns the often unwanted defocus blur into a valuable signal. The input to our method is a video in which the focus plane is continuously moving back and forth during capture, and thus defocus blur is provoked and strongly visible. This can be achieved by manually turning the focus ring of the lens during recording. The core algorithmic ingredient is a new video-based depth-from-defocus algorithm that computes space-time-coherent depth maps, deblurred all-in-focus video, and the focus distance for each frame. We extensively evaluate our approach, and show that it enables compelling video post-processing effects, such as different types of refocusing. |
Tasks | |
Published | 2016-10-12 |
URL | http://arxiv.org/abs/1610.03782v1 |
http://arxiv.org/pdf/1610.03782v1.pdf | |
PWC | https://paperswithcode.com/paper/video-depth-from-defocus |
Repo | |
Framework | |
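Depth-from-defocus rests on the thin-lens relation between an object's distance, the focus distance, and the resulting blur circle. As a worked reference only (not the paper's estimator), the standard circle-of-confusion formula can be evaluated as follows; all numerical values are made up for illustration.

```python
def circle_of_confusion(f, N, s_focus, s_obj):
    """Diameter of the defocus blur circle on the sensor (thin-lens model).

    f:        focal length (use consistent units throughout, e.g. mm)
    N:        f-number; aperture diameter A = f / N
    s_focus:  distance at which the lens is focused
    s_obj:    distance of the object being imaged
    """
    A = f / N
    return A * abs(s_obj - s_focus) / s_obj * f / (s_focus - f)

# Illustrative numbers: 50 mm lens at f/2, focused at 2 m, object at 3 m.
print(circle_of_confusion(f=50.0, N=2.0, s_focus=2000.0, s_obj=3000.0))  # ~0.21 mm
```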
Scalable image coding based on epitomes
Title | Scalable image coding based on epitomes |
Authors | Martin Alain, Christine Guillemot, Dominique Thoreau, Philippe Guillotel |
Abstract | In this paper, we propose a novel scheme for scalable image coding based on the concept of epitome. An epitome can be seen as a factorized representation of an image. Focusing on spatial scalability, the enhancement layer of the proposed scheme contains only the epitome of the input image. The pixels of the enhancement layer not contained in the epitome are then restored using two approaches inspired by local learning-based super-resolution methods. In the first method, a locally linear embedding model is learned on base layer patches and then applied to the corresponding epitome patches to reconstruct the enhancement layer. The second approach learns linear mappings between pairs of co-located base layer and epitome patches. Experiments have shown that significant improvement of the rate-distortion performance can be achieved compared to an SHVC reference. |
Tasks | Super-Resolution |
Published | 2016-06-28 |
URL | http://arxiv.org/abs/1606.08694v1 |
http://arxiv.org/pdf/1606.08694v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-image-coding-based-on-epitomes |
Repo | |
Framework | |
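The first restoration method fits a locally linear embedding (LLE) model: each base-layer patch is approximated as a weighted combination of its nearest base-layer neighbors, and the same weights are applied to the corresponding epitome patches. A minimal numpy sketch of the LLE weight computation is below; patch extraction, neighbor search, and the regularization constant are assumptions handled elsewhere in a real codec.

```python
import numpy as np

def lle_weights(patch, neighbors, reg=1e-3):
    """Least-squares weights reconstructing `patch` from `neighbors`.

    patch:     (d,) vectorized base-layer patch.
    neighbors: (k, d) its k nearest base-layer patches.
    Weights are constrained to sum to one (standard LLE construction).
    """
    diff = neighbors - patch                              # (k, d) local differences
    G = diff @ diff.T                                     # local Gram matrix
    G += reg * np.trace(G) * np.eye(len(neighbors))       # regularize for stability
    w = np.linalg.solve(G, np.ones(len(neighbors)))
    return w / w.sum()

def reconstruct_enhancement_patch(patch, base_neighbors, epitome_neighbors):
    """Apply weights learned on base-layer patches to the co-located epitome patches."""
    w = lle_weights(patch, base_neighbors)
    return w @ epitome_neighbors                          # restored enhancement-layer patch
```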
Estimation of low rank density matrices by Pauli measurements
Title | Estimation of low rank density matrices by Pauli measurements |
Authors | Dong Xia |
Abstract | Density matrices are positive semi-definite Hermitian matrices with unit trace that describe the states of quantum systems. Many quantum systems of physical interest can be represented as high-dimensional low rank density matrices. A popular problem in quantum state tomography (QST) is to estimate the unknown low rank density matrix of a quantum system by conducting Pauli measurements. Our main contribution is twofold. First, we establish the minimax lower bounds in Schatten $p$-norms with $1\leq p\leq +\infty$ for low rank density matrix estimation by Pauli measurements. In our previous paper, these minimax lower bounds were proved under the trace regression model with Gaussian noise, where the noise was assumed to have common variance. In this paper, we prove these bounds under the binomial observation model, which matches the actual model in QST. Second, we study the Dantzig estimator (DE) for estimating the unknown low rank density matrix under the binomial observation model by using Pauli measurements. In our previous papers, we studied the least squares estimator and the projection estimator, where we proved the optimal convergence rates for the least squares estimator in Schatten $p$-norms with $1\leq p\leq 2$ and, under a stronger condition, the optimal convergence rates for the projection estimator in Schatten $p$-norms with $1\leq p\leq +\infty$. In this paper, we show that the results of these two distinct estimators can be simultaneously obtained by the Dantzig estimator. Moreover, better convergence rates in Schatten norm distances can be proved for the Dantzig estimator under conditions weaker than those needed in previous papers. When the objective function of DE is replaced by the negative von Neumann entropy, we obtain a sharp convergence rate in Kullback-Leibler divergence. |
Tasks | Quantum State Tomography |
Published | 2016-10-16 |
URL | http://arxiv.org/abs/1610.04811v2 |
http://arxiv.org/pdf/1610.04811v2.pdf | |
PWC | https://paperswithcode.com/paper/estimation-of-low-rank-density-matrices-by |
Repo | |
Framework | |
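A Pauli measurement on an n-qubit density matrix amounts to the expectation value tr(ρP) for a tensor product P of the 2×2 Pauli matrices. The numpy snippet below builds such observables and evaluates their noiseless expectations, just to make the measurement model concrete; the estimators analyzed in the paper are not shown.

```python
import numpy as np
from functools import reduce

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_observable(label):
    """Tensor product of Pauli matrices, e.g. 'XZI' for a 3-qubit system."""
    return reduce(np.kron, (PAULI[c] for c in label))

def pauli_expectation(rho, label):
    """tr(rho P) -- the ideal (noiseless) Pauli measurement outcome."""
    return np.real(np.trace(rho @ pauli_observable(label)))

# Example: the rank-1 (pure) 2-qubit state |00><00|.
rho = np.zeros((4, 4), dtype=complex); rho[0, 0] = 1.0
print(pauli_expectation(rho, "ZZ"))   # 1.0
```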
Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation
Title | Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation |
Authors | Yacine Jernite, Anna Choromanska, David Sontag |
Abstract | We consider multi-class classification where the predictor has a hierarchical structure that allows for a very large number of labels both at train and test time. The predictive power of such models can heavily depend on the structure of the tree, and although past work showed how to learn the tree structure, it expected that the feature vectors remained static. We provide a novel algorithm to simultaneously perform representation learning for the input data and learning of the hierarchical predictor. Our approach optimizes an objective function which favors balanced and easily separable multi-way node partitions. We theoretically analyze this objective, showing that it gives rise to a boosting style property and a bound on classification error. We next show how to extend the algorithm to conditional density estimation. We empirically validate both variants of the algorithm on text classification and language modeling, respectively, and show that they compare favorably to common baselines in terms of accuracy and running time. |
Tasks | Density Estimation, Representation Learning, Text Classification |
Published | 2016-10-14 |
URL | http://arxiv.org/abs/1610.04658v2 |
http://arxiv.org/pdf/1610.04658v2.pdf | |
PWC | https://paperswithcode.com/paper/simultaneous-learning-of-trees-and |
Repo | |
Framework | |
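The tree is grown with an objective that rewards node partitions that are both balanced and easily separable across labels. The toy score below is written in the spirit of that idea (it measures how far each label's probability of going left deviates from the overall left probability); it is only an illustrative proxy, not the paper's exact objective, and the binary split is a simplification of the multi-way partitions discussed above.

```python
import numpy as np

def partition_score(go_left, labels):
    """Balance-and-separability score for a candidate binary node partition.

    go_left: (n,) boolean array, True if the example is routed to the left child.
    labels:  (n,) integer class labels.
    Returns 2 * E_y |P(left | y) - P(left)|: high when classes are cleanly
    separated and the split is not degenerate (all-left or all-right).
    """
    p_left = go_left.mean()
    classes, counts = np.unique(labels, return_counts=True)
    p_left_given_y = np.array([go_left[labels == c].mean() for c in classes])
    weights = counts / counts.sum()
    return 2.0 * np.sum(weights * np.abs(p_left_given_y - p_left))
```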
Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction
Title | Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction |
Authors | Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu |
Abstract | Regularized empirical risk minimization (R-ERM) is an important branch of machine learning, since it constrains the capacity of the hypothesis space and guarantees the generalization ability of the learning algorithm. Two classic proximal optimization algorithms, i.e., proximal stochastic gradient descent (ProxSGD) and proximal stochastic coordinate descent (ProxSCD), have been widely used to solve the R-ERM problem. Recently, the variance reduction technique was proposed to improve ProxSGD and ProxSCD, and the corresponding ProxSVRG and ProxSVRCD have better convergence rates. These proximal algorithms with variance reduction have also achieved great success in applications at small and moderate scales. However, in order to solve large-scale R-ERM problems and make more practical impact, parallel versions of these algorithms are sorely needed. In this paper, we propose asynchronous ProxSVRG (Async-ProxSVRG) and asynchronous ProxSVRCD (Async-ProxSVRCD) algorithms, and prove that Async-ProxSVRG can achieve near linear speedup when the training data is sparse, while Async-ProxSVRCD can achieve near linear speedup regardless of the sparsity condition, as long as the number of block partitions is appropriately set. We have conducted experiments on a regularized logistic regression task. The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction. |
Tasks | |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08435v1 |
http://arxiv.org/pdf/1609.08435v1.pdf | |
PWC | https://paperswithcode.com/paper/asynchronous-stochastic-proximal-optimization |
Repo | |
Framework | |
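ProxSVRG alternates a full-gradient snapshot with an inner loop of variance-reduced stochastic steps, each followed by a proximal operator. A serial numpy sketch for l1-regularized logistic regression is below (soft-thresholding is the prox of the l1 norm); the step size, epoch lengths, and function names are illustrative, and the paper's actual contribution, the asynchronous parallel execution of these updates, is not reproduced here.

```python
import numpy as np

def soft_threshold(w, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def prox_svrg(X, y, lam=1e-3, eta=0.1, epochs=20, inner=None, seed=0):
    """Serial ProxSVRG for min_w (1/n) sum_i log(1 + exp(-y_i x_i^T w)) + lam*||w||_1.

    X: (n, d) features; y: (n,) labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    inner = inner or n
    w_snap = np.zeros(d)
    for _ in range(epochs):
        # Full gradient at the snapshot point (the variance-reduction anchor).
        mu = -(y / (1 + np.exp(y * (X @ w_snap)))) @ X / n
        w = w_snap.copy()
        for _ in range(inner):
            i = rng.integers(n)
            grad_i = lambda v: -y[i] * X[i] / (1 + np.exp(y[i] * (X[i] @ v)))
            g = grad_i(w) - grad_i(w_snap) + mu          # variance-reduced gradient
            w = soft_threshold(w - eta * g, eta * lam)   # proximal step
        w_snap = w
    return w_snap
```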