July 29, 2019


Paper Group AWR 105

Two-stream Flow-guided Convolutional Attention Networks for Action Recognition. Deep Voice: Real-time Neural Text-to-Speech. An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery. GP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks. Incorporation of prior knowledge of the signal behavior into the …

Two-stream Flow-guided Convolutional Attention Networks for Action Recognition


Title	Two-stream Flow-guided Convolutional Attention Networks for Action Recognition
Authors	An Tran, Loong-Fah Cheong
Abstract	This paper proposes two-stream flow-guided convolutional attention networks for action recognition in videos. The central idea is that optical flow, when properly compensated for camera motion, can be used to guide attention to the human foreground. We thus develop cross-link layers from the temporal network (trained on optical flow) to the spatial network (trained on RGB frames). These cross-link layers guide the spatial stream to pay more attention to human foreground areas and to be less affected by background clutter. We obtain promising performance with our approach on the UCF101, HMDB51 and Hollywood2 datasets.
Tasks	Action Recognition In Videos, Temporal Action Localization
Published	2017-08-30
URL	http://arxiv.org/abs/1708.09268v1
PDF	http://arxiv.org/pdf/1708.09268v1.pdf
PWC	https://paperswithcode.com/paper/two-stream-flow-guided-convolutional
Repo	https://github.com/antran89/two-stream-fcan
Framework	none
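The cross-link idea can be illustrated with a hand-crafted stand-in. The sketch below is not the paper's learned cross-link layer. It assumes camera motion is roughly the median flow vector, turns the remaining flow magnitude into a [0, 1] spatial map, and multiplies the RGB-stream feature map by it so that static background is suppressed. The function name `flow_guided_attention` and the median-subtraction rule are illustrative assumptions.

```python
import numpy as np


def flow_guided_attention(rgb_features, flow, eps=1e-8):
    """Reweight spatial-stream features with a mask derived from optical flow.

    rgb_features: (C, H, W) feature map from the spatial (RGB) stream.
    flow: (2, H, W) optical flow (dx, dy) at the same H x W resolution.
    """
    # Crude camera-motion compensation (assumption): subtract the median flow
    # vector, treating the dominant motion in the frame as camera motion.
    median = np.median(flow.reshape(2, -1), axis=1).reshape(2, 1, 1)
    residual = flow - median
    magnitude = np.sqrt((residual ** 2).sum(axis=0))  # (H, W)
    # Normalize the residual motion into a [0, 1] attention map.
    attention = magnitude / (magnitude.max() + eps)
    # Broadcast the (H, W) map across all feature channels.
    return rgb_features * attention[None, :, :], attention
```

In the two-stream model the mapping from flow to attention is learned by the cross-link layers; a fixed mask like this one is only meant to build intuition.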

Deep Voice: Real-time Neural Text-to-Speech


Title	Deep Voice: Real-time Neural Text-to-Speech
Authors	Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Andrew Ng, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi
Abstract	We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For the segmentation model, we propose a novel way of performing phoneme boundary detection with deep neural networks using connectionist temporal classification (CTC) loss. For the audio synthesis model, we implement a variant of WaveNet that requires fewer parameters and trains faster than the original. By using a neural network for each component, our system is simpler and more flexible than traditional text-to-speech systems, where each component requires laborious feature engineering and extensive domain expertise. Finally, we show that inference with our system can be performed faster than real time and describe optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x speedups over existing implementations.
Tasks	Boundary Detection, Feature Engineering, Speech Synthesis
Published	2017-02-25
URL	http://arxiv.org/abs/1702.07825v2
PDF	http://arxiv.org/pdf/1702.07825v2.pdf
PWC	https://paperswithcode.com/paper/deep-voice-real-time-neural-text-to-speech
Repo	https://github.com/NVIDIA/nv-wavenet
Framework	pytorch
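Deep Voice's audio synthesis model is described as a WaveNet variant. The core WaveNet building block is a dilated causal convolution, sketched here in NumPy for a single channel. This is a generic illustration, not Deep Voice's reduced-parameter variant or its optimized CPU/GPU inference kernels; `dilated_causal_conv1d` and `receptive_field` are hypothetical helper names.

```python
import numpy as np


def dilated_causal_conv1d(x, weights, dilation):
    """Single-channel dilated causal convolution.

    Computes y[t] = sum_k weights[k] * x[t - k * dilation], treating samples
    before t = 0 as zeros, so y[t] never depends on future inputs.
    """
    kernel_size = len(weights)
    pad = (kernel_size - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for k, w in enumerate(weights):
        # Tap k looks k * dilation samples into the past.
        start = pad - k * dilation
        y += w * padded[start:start + len(x)]
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of a stack of such layers."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

Doubling the dilation at each layer (1, 2, 4, ..., 512) with kernel size 2 gives a receptive field of 1024 samples with only ten layers, which is why WaveNet-style models stack dilations this way.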

An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery


Title	An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery
Authors	Alex Tank, Ian Covert, Nicholas J. Foti, Ali Shojaie, Emily B. Fox
Abstract	While most classical approaches to Granger causality detection rest on linear time series assumptions, many interactions in neuroscience and economics applications are nonlinear. We develop an approach to nonlinear Granger causality detection using multilayer perceptrons in which the input to the network is the past time lags of all series and the output is the future value of a single series. A sufficient condition for Granger non-causality in this setting is that all of the outgoing weights from a series' past lags (part of the network input) to the first hidden layer are zero. For estimation, we use a group lasso penalty to shrink groups of input weights to zero. We also propose a hierarchical penalty for simultaneous Granger causality and lag estimation. We validate our approach on simulated data from both a sparse linear autoregressive model and the sparse and nonlinear Lorenz-96 model.
Tasks	Time Series
Published	2017-11-22
URL	http://arxiv.org/abs/1711.08160v2
PDF	http://arxiv.org/pdf/1711.08160v2.pdf
PWC	https://paperswithcode.com/paper/an-interpretable-and-sparse-neural-network
Repo	https://github.com/christeefy/Novel-Techniques-for-PTR-FD
Framework	none
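The group lasso in the abstract acts on first-layer weights grouped by input series. A common way to optimize such a penalty, assumed here rather than taken from the paper, is proximal gradient descent: after each gradient step, every group is shrunk toward zero, and groups whose norm falls below the threshold become exactly zero. The sketch shows only that proximal step; the series-major column layout and the names `group_lasso_prox`, `step` and `lam` are illustrative.

```python
import numpy as np


def group_lasso_prox(W1, num_series, num_lags, step, lam):
    """Proximal step for a group lasso penalty on first-layer MLP weights.

    W1: (hidden, num_series * num_lags) input-to-hidden weights, with columns
        ordered series-major: [series 0 lags 1..K, series 1 lags 1..K, ...].
    A group shrunk to exactly zero means the corresponding series is
    estimated to be Granger non-causal for the output series.
    """
    W = W1.reshape(W1.shape[0], num_series, num_lags).copy()
    # One Frobenius norm per input series (over hidden units and lags).
    norms = np.sqrt((W ** 2).sum(axis=(0, 2)))
    # Block soft-thresholding: scale each group by max(0, 1 - step*lam/norm).
    scale = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    W *= scale[None, :, None]
    return W.reshape(W1.shape), norms * scale
```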

GP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks


Title	GP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks
Authors	Xing Di, Vishwanath A. Sindagi, Vishal M. Patel
Abstract	Facial landmarks constitute the most compressed representation of faces and are known to preserve information such as pose, gender and facial structure present in the faces. Several works exist that attempt to perform high-level face-related analysis tasks based on landmarks. In contrast, in this work, an attempt is made to tackle the inverse problem of synthesizing faces from their respective landmarks. The primary aim of this work is to demonstrate that information preserved by landmarks (gender in particular) can be further accentuated by leveraging generative models to synthesize corresponding faces. Though the problem is particularly challenging due to its ill-posed nature, we believe that successful synthesis will enable several applications such as boosting performance of high-level face-related tasks using landmark points and performing dataset augmentation. To this end, a novel face-synthesis method known as Gender Preserving Generative Adversarial Network (GP-GAN) that is guided by adversarial loss, perceptual loss and a gender preserving loss is presented. Further, we propose a novel generator sub-network UDeNet for GP-GAN that leverages advantages of U-Net and DenseNet architectures. Extensive experiments and comparison with recent methods are performed to verify the effectiveness of the proposed method.
Tasks	Face Generation
Published	2017-10-03
URL	http://arxiv.org/abs/1710.00962v2
PDF	http://arxiv.org/pdf/1710.00962v2.pdf
PWC	https://paperswithcode.com/paper/gp-gan-gender-preserving-gan-for-synthesizing
Repo	https://github.com/DetionDX/GP-GAN-GenderPreserving-GAN-for-Synthesizing-Faces-from-Landmarks
Framework	pytorch
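GP-GAN's generator is guided by three losses. A minimal sketch of how they might be combined, assuming a non-saturating adversarial term on discriminator outputs, an L2 perceptual term on precomputed feature maps (for example from a pretrained network), and a binary cross-entropy gender term on the output of a gender classifier. The weights `lambda_adv`, `lambda_perc` and `lambda_gender` are hypothetical; the abstract does not give the paper's exact formulation or weighting.

```python
import numpy as np


def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy, with probabilities clipped for stability."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob)))


def generator_loss(d_fake, feat_fake, feat_real, gender_prob, gender_label,
                   lambda_adv=1.0, lambda_perc=1.0, lambda_gender=1.0):
    """Weighted sum of adversarial, perceptual and gender-preserving terms.

    d_fake: discriminator probabilities that generated faces are real.
    feat_fake / feat_real: feature maps of generated and ground-truth faces.
    gender_prob: gender classifier output on generated faces.
    """
    adv = bce(d_fake, np.ones_like(d_fake))  # generator wants D(fake) -> 1
    perceptual = float(np.mean((feat_fake - feat_real) ** 2))
    gender = bce(gender_prob, gender_label)
    total = lambda_adv * adv + lambda_perc * perceptual + lambda_gender * gender
    return total, {"adv": adv, "perceptual": perceptual, "gender": gender}
```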

Incorporation of prior knowledge of the signal behavior into the reconstruction to accelerate the acquisition of MR diffusion data


Title	Incorporation of prior knowledge of the signal behavior into the reconstruction to accelerate the acquisition of MR diffusion data
Authors	Juan F P J Abascal, Manuel Desco, Juan Parra-Robles
Abstract	Diffusion MRI measurements using hyperpolarized gases are generally acquired during a patient breath-hold, which forces a compromise between achievable image resolution, lung coverage and number of b-values. In this work, we propose a novel method that accelerates the acquisition of MR diffusion data by undersampling in both the spatial and b-value dimensions, by incorporating knowledge about the signal decay into the reconstruction (SIDER). SIDER is compared to total variation (TV) reconstruction by assessing their effect on both the recovery of ventilation images and estimated mean alveolar dimensions (MAD). Both methods are assessed by retrospectively undersampling diffusion datasets of normal volunteers and COPD patients (n=8) for acceleration factors between x2 and x10. TV led to large errors and artefacts for acceleration factors equal to or larger than x5. SIDER improved on TV, presenting lower errors and histograms of MAD closer to those obtained from fully sampled data for acceleration factors up to x10. SIDER preserved image quality at all acceleration factors, but images were slightly smoothed and some details were lost at x10. In conclusion, we have developed and validated a novel compressed sensing method for lung MRI and achieved high acceleration factors, which can be used to increase the amount of data acquired during a breath-hold. This methodology is expected to improve the accuracy of estimated lung microstructure dimensions and widen the possibilities of studying lung diseases with MRI.
Tasks
Published	2017-02-09
URL	http://arxiv.org/abs/1702.02743v1
PDF	http://arxiv.org/pdf/1702.02743v1.pdf
PWC	https://paperswithcode.com/paper/incorporation-of-prior-knowledge-of-the
Repo	https://github.com/HGGM-LIM/compressed-sensing-diffusion-lung-MRI
Framework	none
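The abstract does not state SIDER's signal model. As an illustration only, the sketch below assumes a mono-exponential decay across b-values, S(b) = S0 * exp(-b * D), fits S0 and D per voxel with a log-linear least-squares fit, and computes the residual between the data and the model. A residual of this kind is one way a decay prior could enter a reconstruction as a penalty term; the decay model and the function names are assumptions.

```python
import numpy as np


def fit_monoexponential(signal, bvalues):
    """Per-voxel log-linear fit of S(b) = S0 * exp(-b * D) (assumed model).

    signal: (num_b, num_voxels) positive magnitudes; bvalues: (num_b,) floats.
    Returns (S0, D), each with shape (num_voxels,).
    """
    # log S = log S0 - b * D is linear in the unknowns (log S0, D).
    A = np.stack([np.ones_like(bvalues), -bvalues], axis=1)  # (num_b, 2)
    coef, *_ = np.linalg.lstsq(A, np.log(signal), rcond=None)
    return np.exp(coef[0]), coef[1]


def decay_residual(signal, bvalues, s0, D):
    """Difference between measured signal and the fitted decay model."""
    model = s0[None, :] * np.exp(-np.outer(bvalues, D))
    return signal - model
```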

Spectral Graph Convolutions for Population-based Disease Prediction


Title	Spectral Graph Convolutions for Population-based Disease Prediction
Authors	Sarah Parisot, Sofia Ira Ktena, Enzo Ferrante, Matthew Lee, Ricardo Guerrerro Moreno, Ben Glocker, Daniel Rueckert
Abstract	Exploiting the wealth of imaging and non-imaging information for disease prediction tasks requires models capable of representing, at the same time, individual features as well as data associations between subjects from potentially large populations. Graphs provide a natural framework for such tasks, yet previous graph-based approaches focus on pairwise similarities without modelling the subjects’ individual characteristics and features. On the other hand, relying solely on subject-specific imaging feature vectors fails to model the interaction and similarity between subjects, which can reduce performance. In this paper, we introduce the novel concept of Graph Convolutional Networks (GCN) for brain analysis in populations, combining imaging and non-imaging data. We represent populations as a sparse graph whose vertices are associated with image-based feature vectors and whose edges encode phenotypic information. This structure was used to train a GCN model on partially labelled graphs, aiming to infer the classes of unlabelled nodes from the node features and pairwise associations between subjects. We demonstrate the potential of the method on the challenging ADNI and ABIDE databases, as a proof of concept of the benefit from integrating contextual information in classification tasks. This has a clear impact on the quality of the predictions, leading to 69.5% accuracy for ABIDE (outperforming the current state of the art of 66.8%) and 77% for ADNI for prediction of MCI conversion, significantly outperforming standard linear classifiers where only individual features are considered.
Tasks	Disease Prediction
Published	2017-03-08
URL	http://arxiv.org/abs/1703.03020v3
PDF	http://arxiv.org/pdf/1703.03020v3.pdf
PWC	https://paperswithcode.com/paper/spectral-graph-convolutions-for-population
Repo	https://github.com/parisots/population-gcn
Framework	tf
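The abstract describes two ingredients: a population graph whose edges come from phenotypic information, and a graph convolution over it. The sketch below shows both in NumPy under stated assumptions. The edge rule (number of matching phenotype columns times a Gaussian similarity of imaging features) and the `sigma` parameter are hypothetical, and the convolution uses a Chebyshev polynomial approximation of a spectral filter, computing the largest Laplacian eigenvalue exactly for simplicity. Training on partially labelled nodes is not shown.

```python
import numpy as np


def population_adjacency(phenotypes, features, sigma=1.0):
    """Hypothetical edge rule: matching phenotypes x feature similarity.

    phenotypes: (n, p) categorical codes; features: (n, f) imaging features.
    """
    match = (phenotypes[:, None, :] == phenotypes[None, :, :]).sum(axis=2)
    sq_dist = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    A = match * np.exp(-sq_dist / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A


def cheb_graph_conv(X, A, weights):
    """Chebyshev spectral graph convolution: sum_k T_k(L_tilde) @ X @ W_k.

    X: (n, f) node features; A: (n, n) symmetric adjacency;
    weights: list of K arrays, each (f, out).
    """
    n = A.shape[0]
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    # Normalized Laplacian L = I - D^-1/2 A D^-1/2, rescaled to [-1, 1].
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(n)
    # Chebyshev recurrence: T_0 = X, T_1 = L~ X, T_k = 2 L~ T_{k-1} - T_{k-2}.
    Tx_prev, Tx = X, L_tilde @ X
    out = Tx_prev @ weights[0]
    if len(weights) > 1:
        out = out + Tx @ weights[1]
    for k in range(2, len(weights)):
        Tx_prev, Tx = Tx, 2.0 * L_tilde @ Tx - Tx_prev
        out = out + Tx @ weights[k]
    return out
```

With a single weight matrix (K = 1) the layer reduces to X @ W and ignores the graph; each additional Chebyshev term mixes in information from one more hop of neighbouring subjects.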

Learning to Generate Long-term Future via Hierarchical Prediction


Title	Learning to Generate Long-term Future via Hierarchical Prediction
Authors	Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee
Abstract	We propose a hierarchical approach for making long-term predictions of future frames. To avoid inherent compounding errors in recursive pixel-level prediction, we propose to first estimate high-level structure in the input frames, then predict how that structure evolves in the future, and finally construct the future frames from a single observed past frame and the predicted high-level structure, without having to observe any of the pixel-level predictions. Long-term video prediction is difficult to perform by recurrently observing the predicted frames because small errors in pixel space amplify exponentially as predictions are made deeper into the future. Our approach prevents pixel-level error propagation by removing the need to observe the predicted frames. Our model combines an LSTM and analogy-based encoder-decoder convolutional neural networks, which independently predict the video structure and generate the future frames, respectively. In experiments, our model is evaluated on the Human3.6M and Penn Action datasets on the task of long-term pixel-level video prediction of humans performing actions, and it demonstrates significantly better results than the state of the art.
Tasks	Video Prediction
Published	2017-04-19
URL	http://arxiv.org/abs/1704.05831v5
PDF	http://arxiv.org/pdf/1704.05831v5.pdf
PWC	https://paperswithcode.com/paper/learning-to-generate-long-term-future-via
Repo	https://github.com/xcyan/eccv18_mtvae
Framework	tf
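The key property in the abstract is that future frames are produced from one observed frame plus predicted high-level structure, so generated pixels are never fed back. The sketch below illustrates this with analogy-style latent arithmetic: encode the observed frame and add the change in encoded structure (for example, pose). The linear encoders and decoder are toy stand-ins for the learned convolutional networks, the structure predictor (an LSTM in the paper) is replaced by a given list of future poses, and all names are hypothetical.

```python
import numpy as np


def generate_future(frame0, pose0, future_poses, enc_img, enc_pose, dec):
    """Analogy-style frame generation from a single observed frame.

    Each future frame is decoded from enc_img(frame0) shifted by the change in
    high-level structure, enc_pose(p) - enc_pose(pose0). No generated frame is
    ever fed back in, so pixel-level errors cannot compound.
    """
    base = enc_img(frame0)
    ref = enc_pose(pose0)
    return [dec(base + enc_pose(p) - ref) for p in future_poses]


# Toy linear stand-ins (hypothetical) for the learned encoders and decoder.
_rng = np.random.default_rng(0)
W_IMG = _rng.normal(size=(8, 16))
W_POSE = _rng.normal(size=(8, 4))
W_DEC = _rng.normal(size=(16, 8))


def enc_img(x):
    return W_IMG @ x


def enc_pose(p):
    return W_POSE @ p


def dec(z):
    return W_DEC @ z
```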

Efficient Manifold and Subspace Approximations with Spherelets


Title	Efficient Manifold and Subspace Approximations with Spherelets
Authors	Didong Li, Minerva Mukhopadhyay, David B. Dunson
Abstract	Data lying in a high dimensional ambient space are commonly thought to have a much lower intrinsic dimension. In particular, the data may be concentrated near a lower-dimensional subspace or manifold. There is an immense literature focused on approximating the unknown subspace, and in exploiting such approximations in clustering, data compression, and building of predictive models. Most of the literature relies on approximating subspaces using a locally linear, and potentially multiscale, dictionary. In this article, we propose a simple and general alternative, which instead uses pieces of spheres, or spherelets, to locally approximate the unknown subspace. Theory is developed showing that spherelets can produce lower covering numbers and MSEs for many manifolds. We develop spherical principal components analysis (SPCA). Results relative to state-of-the-art competitors show gains in ability to accurately approximate the subspace with fewer components. In addition, unlike most competitors, our approach can be used for data denoising and can efficiently embed new data without retraining. The methods are illustrated with standard toy manifold learning examples, and applications to multiple real data sets.
Tasks	Denoising
Published	2017-06-26
URL	http://arxiv.org/abs/1706.08263v3
PDF	http://arxiv.org/pdf/1706.08263v3.pdf
PWC	https://paperswithcode.com/paper/efficient-manifold-and-subspace
Repo	https://github.com/david-dunson/GeodesicDistance
Framework	none
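To make the spherelet idea concrete, here is a sketch of fitting one local sphere: find the (d+1)-dimensional affine subspace with PCA, fit a sphere inside it by algebraic least squares (the condition ||y - c||^2 = r^2 is linear in c and in r^2 - ||c||^2), and project points radially onto it. This follows the general shape of a spherical PCA but is an assumption, not necessarily the paper's SPCA estimator; `fit_spherelet` and `project_to_spherelet` are hypothetical names.

```python
import numpy as np


def fit_spherelet(X, d):
    """Fit a d-dimensional sphere to local points X with shape (n, D).

    Returns (center, radius, V), where V (D, d+1) spans the fitted subspace.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Top d+1 principal directions span the affine subspace holding the sphere.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[: d + 1].T
    Y = Xc @ V  # local coordinates, (n, d+1)
    # ||y||^2 = 2 c.y + t with t = r^2 - ||c||^2: linear least squares in (c, t).
    A = np.hstack([2.0 * Y, np.ones((len(Y), 1))])
    b = (Y ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    c, t = sol[:-1], sol[-1]
    radius = np.sqrt(t + c @ c)
    center = mean + V @ c
    return center, radius, V


def project_to_spherelet(x, center, radius, V):
    # Project onto the fitted affine subspace, then radially onto the sphere.
    p = center + V @ (V.T @ (x - center))
    direction = p - center
    return center + radius * direction / np.linalg.norm(direction)
```

On noisy data that is far from spherical, t + c @ c can come out negative; a fuller implementation would need to guard against that case.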

full-FORCE: A Target-Based Method for Training Recurrent Networks


Title	full-FORCE: A Target-Based Method for Training Recurrent Networks
Authors	Brian DePasquale, Christopher J. Cueva, Kanaka Rajan, G. Sean Escola, L. F. Abbott
Abstract	Trained recurrent networks are powerful tools for modeling dynamic neural computations. We present a target-based method for modifying the full connectivity matrix of a recurrent network to train it to perform tasks involving temporally complex input/output transformations. The method introduces a second network during training to provide suitable “target” dynamics useful for performing the task. Because it exploits the full recurrent connectivity, the method produces networks that perform tasks with fewer neurons and greater noise robustness than traditional least-squares (FORCE) approaches. In addition, we show how introducing additional input signals into the target-generating network, which act as task hints, greatly extends the range of tasks that can be learned and provides control over the complexity and nature of the dynamics of the trained, task-performing network.
Tasks
Published	2017-10-09
URL	http://arxiv.org/abs/1710.03070v1
PDF	http://arxiv.org/pdf/1710.03070v1.pdf
PWC	https://paperswithcode.com/paper/full-force-a-target-based-method-for-training
Repo	https://github.com/briandepasquale/full-FORCE-demos
Framework	none
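The target-based step can be sketched as a regression: the task-performing network's full recurrent matrix J is fitted so that J @ r(t) matches the recurrent input of the target-generating (driven) network, here assumed to be J_D @ r_D(t) + u_f * f(t) with hint inputs omitted. FORCE-style methods typically update J online with recursive least squares; this sketch swaps in a batch ridge-regression solve for brevity, and all names (`fit_full_connectivity`, `driven_targets`, `alpha`) are hypothetical.

```python
import numpy as np


def driven_targets(J_D, rates_D, u_f, f_out):
    """Recurrent input of the driven network: J_D r_D(t) + u_f f(t).

    J_D: (N, N); rates_D: (N, T); u_f: (N,); f_out: (T,) desired output.
    """
    return J_D @ rates_D + np.outer(u_f, f_out)


def fit_full_connectivity(rates, targets, alpha=1e-3):
    """Ridge solution of min_J ||J @ rates - targets||^2 + alpha ||J||^2.

    rates: (N, T) rates of the task-performing network.
    targets: (N, T) desired total recurrent input for each neuron.
    """
    N = rates.shape[0]
    gram = rates @ rates.T + alpha * np.eye(N)
    # J = targets R^T (R R^T + alpha I)^-1; solve the transposed system.
    return np.linalg.solve(gram, rates @ targets.T).T
```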

Capsule Network Performance on Complex Data


Title	Capsule Network Performance on Complex Data
Authors	Edgar Xi, Selina Bing, Yang Jin
Abstract	In recent years, convolutional neural networks (CNNs) have played an important role in the field of deep learning. Variants of CNNs have proven very successful in classification tasks across different domains. However, CNNs have two big drawbacks: they fail to take into account important spatial hierarchies between features, and they lack rotational invariance. As long as certain key features of an object are present in the test data, CNNs classify the test data as the object, disregarding the features' spatial orientation relative to each other. This causes false positives. The lack of rotational invariance in CNNs can cause the network to incorrectly assign the object another label, causing false negatives. To address this concern, Hinton et al. propose a novel type of neural network using the concept of capsules in a recent paper. With the use of dynamic routing and reconstruction regularization, the capsule network model would be both rotation invariant and spatially aware. The capsule network has shown its potential by achieving a state-of-the-art result of 0.25% test error on MNIST without data augmentation such as rotation and scaling, better than the previous baseline of 0.39%. To further test the application of capsule networks to data with higher dimensionality, we attempt to find the best set of configurations that yields the optimal test error on the CIFAR10 dataset.
Tasks	Data Augmentation
Published	2017-12-10
URL	http://arxiv.org/abs/1712.03480v1
PDF	http://arxiv.org/pdf/1712.03480v1.pdf
PWC	https://paperswithcode.com/paper/capsule-network-performance-on-complex-data
Repo	https://github.com/swkarlekar/summaries
Framework	tf
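The dynamic routing and squashing referred to above (Sabour, Frosst and Hinton, 2017) can be sketched in NumPy. The block assumes the prediction vectors `u_hat` have already been computed by multiplying each input capsule's output by its transformation matrices; it shows the squash nonlinearity and three iterations of routing by agreement. It is an illustrative reimplementation, not the code used in the CIFAR10 experiments.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """v = (|s|^2 / (1 + |s|^2)) * s / |s|: keeps direction, length < 1."""
    sq_norm = (s ** 2).sum(axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, num_iters=3):
    """Routing by agreement.

    u_hat: (num_in, num_out, dim) prediction vectors from each input capsule.
    Returns output capsules v with shape (num_out, dim).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits
    for _ in range(num_iters):
        # Coupling coefficients: softmax over output capsules for each input.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = (c[:, :, None] * u_hat).sum(axis=0)  # (num_out, dim)
        v = squash(s)
        # Raise logits where an input's prediction agrees with the output.
        b = b + (u_hat * v[None, :, :]).sum(axis=2)
    return v
```

Squashing keeps every output vector's length below 1, so the length can be read as the probability that the entity represented by a capsule is present.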

Folded Recurrent Neural Networks for Future Video Prediction


Title	Folded Recurrent Neural Networks for Future Video Prediction
Authors	Marc Oliu, Javier Selva, Sergio Escalera
Abstract	Future video prediction is an ill-posed Computer Vision problem that recently received much attention. Its main challenges are the high variability in video content, the propagation of errors through time, and the non-specificity of the future frames: given a sequence of past frames there is a continuous distribution of possible futures. This work introduces bijective Gated Recurrent Units, a double mapping between the input and output of a GRU layer. This allows for recurrent auto-encoders with state sharing between encoder and decoder, stratifying the sequence representation and helping to prevent capacity problems. We show how with this topology only the encoder or decoder needs to be applied for input encoding and prediction, respectively. This reduces the computational cost and avoids re-encoding the predictions when generating a sequence of frames, mitigating the propagation of errors. Furthermore, it is possible to remove layers from an already trained model, giving insight into the role performed by each layer and making the model more explainable. We evaluate our approach on three video datasets, outperforming state-of-the-art prediction results on MMNIST and UCF101, and obtaining competitive results on KTH with 2 and 3 times less memory usage and computational cost than the best scored approach.
Tasks	Video Prediction
Published	2017-12-01
URL	http://arxiv.org/abs/1712.00311v2
PDF	http://arxiv.org/pdf/1712.00311v2.pdf
PWC	https://paperswithcode.com/paper/folded-recurrent-neural-networks-for-future
Repo	https://github.com/moliusimon/frnn
Framework	tf

Incorporating Prior Information in Compressive Online Robust Principal Component Analysis


Title	Incorporating Prior Information in Compressive Online Robust Principal Component Analysis
Authors	Huynh Van Luong, Nikos Deligiannis, Jurgen Seiler, Soren Forchhammer, Andre Kaup
Abstract	We consider an online version of robust Principal Component Analysis (PCA), which arises naturally in time-varying source separation problems such as video foreground-background separation. This paper proposes a compressive online robust PCA with prior information for recursively separating a sequence of frames into sparse and low-rank components from a small set of measurements. In contrast to conventional batch-based PCA, which processes all the frames directly, the proposed method processes measurements taken from each frame. Moreover, this method can efficiently incorporate multiple sources of prior information, namely previously reconstructed frames, to improve the separation and thereafter update the prior information for the next frame. We exploit this prior information by solving $n\text{-}\ell_{1}$ minimization for incorporating the previous sparse components and using incremental singular value decomposition ($\mathrm{SVD}$) for exploiting the previous low-rank components. We also establish theoretical bounds on the number of measurements required to guarantee successful separation under assumptions of static or slowly-changing low-rank components. Using numerical experiments, we evaluate our bounds and the performance of the proposed algorithm. In addition, we apply the proposed algorithm to online video foreground and background separation from compressive measurements. Experimental results show that the proposed method outperforms the existing methods.
Tasks
Published	2017-01-24
URL	http://arxiv.org/abs/1701.06852v2
PDF	http://arxiv.org/pdf/1701.06852v2.pdf
PWC	https://paperswithcode.com/paper/incorporating-prior-information-in
Repo	https://github.com/huynhlvd/corpca
Framework	none
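The core idea of online robust PCA, splitting each incoming frame into a low-rank background and a sparse foreground using what was learned from earlier frames, can be illustrated with a simplified sketch. This is not CORPCA: it works on fully sampled frames rather than compressive measurements, recomputes an SVD instead of updating it incrementally, and uses plain l1 soft-thresholding instead of n-l1 minimization with previous sparse components as priors. All names and parameters are hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise l1 proximal operator."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def separate_frame(frame, prev_frames, rank, tau, n_iter=10):
    """Split one vectorized frame into low-rank background + sparse foreground.

    frame: (P,) current frame; prev_frames: (P, t) earlier background estimates.
    """
    # Low-rank basis from earlier frames (batch SVD here; an online method
    # would update it incrementally instead).
    U, _, _ = np.linalg.svd(prev_frames, full_matrices=False)
    basis = U[:, :rank]
    foreground = np.zeros_like(frame)
    for _ in range(n_iter):
        # Alternate: project what is not foreground onto the background
        # subspace, then soft-threshold what the background cannot explain.
        background = basis @ (basis.T @ (frame - foreground))
        foreground = soft_threshold(frame - background, tau)
    return background, foreground
```

Note that l1 soft-thresholding shrinks the recovered foreground values by roughly `tau`, which is one reason methods like CORPCA use more refined sparsity priors.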

MetaLDA: a Topic Model that Efficiently Incorporates Meta information


Title	MetaLDA: a Topic Model that Efficiently Incorporates Meta information
Authors	He Zhao, Lan Du, Wray Buntine, Gang Liu
Abstract	Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in the case where the word-occurrence information in the training data is insufficient. In this paper, we present a topic model, called MetaLDA, which is able to leverage either document or word meta information, or both of them jointly. With two data augmentation techniques, we can derive an efficient Gibbs sampling algorithm, which benefits from the fully local conjugacy of the model. Moreover, the algorithm exploits the sparsity of the meta information. Extensive experiments on several real-world datasets demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic quality, particularly in handling sparse texts. In addition, compared with other models using meta information, our model runs significantly faster.
Tasks	Topic Models, Word Embeddings
Published	2017-09-19
URL	http://arxiv.org/abs/1709.06365v1
PDF	http://arxiv.org/pdf/1709.06365v1.pdf
PWC	https://paperswithcode.com/paper/metalda-a-topic-model-that-efficiently
Repo	https://github.com/ethanhezhao/MetaLDA
Framework	none

Topic supervised non-negative matrix factorization


Title	Topic supervised non-negative matrix factorization
Authors	Kelsey MacMillan, James D. Wilson
Abstract	Topic models have been extensively used to organize and interpret the contents of large, unstructured corpora of text documents. Although topic models often perform well on traditional training vs. test set evaluations, it is often the case that the results of a topic model do not align with human interpretation. This interpretability fallacy is largely due to the unsupervised nature of topic models, which prohibits any user guidance on the results of a model. In this paper, we introduce a semi-supervised method called topic supervised non-negative matrix factorization (TS-NMF) that enables the user to provide labeled example documents to promote the discovery of more meaningful semantic structure of a corpus. In this way, the results of TS-NMF better match the intuition and desired labeling of the user. The core of TS-NMF relies on solving a non-convex optimization problem for which we derive an iterative algorithm that is shown to be monotonic and convergent to a local optimum. We demonstrate the practical utility of TS-NMF on the Reuters and PubMed corpora, and find that TS-NMF is especially useful for conceptual or broad topics, where topic key terms are not well understood. Although identifying an optimal latent structure for the data is not a primary objective of the proposed approach, we find that TS-NMF achieves higher weighted Jaccard similarity scores than the contemporary methods, (unsupervised) NMF and latent Dirichlet allocation, at supervision rates as low as 10% to 20%.
Tasks	Topic Models
Published	2017-06-12
URL	http://arxiv.org/abs/1706.05084v2
PDF	http://arxiv.org/pdf/1706.05084v2.pdf
PWC	https://paperswithcode.com/paper/topic-supervised-non-negative-matrix
Repo	https://github.com/Vokturz/tsnmf-sparse
Framework	tf
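One way to express the topic supervision, assumed here, is a 0/1 mask on the document-topic matrix: a labeled document may only load on its labeled topics, while an unlabeled document may load on any topic. The sketch combines that mask with the standard Lee-Seung multiplicative updates for the Frobenius loss. It illustrates masked NMF and is not necessarily the exact TS-NMF objective or update rule from the paper; `ts_nmf` and `label_mask` are hypothetical names.

```python
import numpy as np


def ts_nmf(V, label_mask, n_iter=200, eps=1e-10, seed=0):
    """Masked NMF: V (docs, terms) ~ W (docs, topics) @ H (topics, terms).

    label_mask: (docs, topics) in {0, 1}; rows of labeled documents allow only
    their labeled topics, rows of unlabeled documents are all ones.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    k = label_mask.shape[1]
    W = rng.random((n_docs, k)) * label_mask
    H = rng.random((k, n_terms))
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for ||V - W H||_F^2.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        W *= label_mask  # keep disallowed topic weights at exactly zero
    return W, H
```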

Variational auto-encoding of protein sequences


Title	Variational auto-encoding of protein sequences
Authors	Sam Sinai, Eric Kelsic, George M. Church, Martin A. Nowak
Abstract	Proteins are responsible for the most diverse set of functions in biology. The ability to extract information from protein sequences and to predict the effects of mutations is extremely valuable in many domains of biology and medicine. However, the mapping between protein sequence and function is complex and poorly understood. Here we present an embedding of natural protein sequences using a Variational Auto-Encoder and use it to predict how mutations affect protein function. We use this unsupervised approach to cluster natural variants and learn interactions between sets of positions within a protein. This approach generally performs better than baseline methods that consider no interactions within sequences, and in some cases better than the state-of-the-art approaches that use the inverse-Potts model. This generative model can be used to computationally guide exploration of protein sequence space and to better inform rational and automatic protein design.
Tasks
Published	2017-12-09
URL	http://arxiv.org/abs/1712.03346v3
PDF	http://arxiv.org/pdf/1712.03346v3.pdf
PWC	https://paperswithcode.com/paper/variational-auto-encoding-of-protein
Repo	https://github.com/samsinai/VAE_protein_function
Framework	none
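The pieces a sequence VAE needs can be sketched without a deep-learning framework: one-hot encoding of a protein sequence, the reparameterization trick, and the closed-form KL term of the ELBO against a standard normal prior. The `mutation_score` helper is a hypothetical simplification that compares decoder log-probabilities of mutant and wild-type sequences; a full model would typically compare ELBO estimates instead. The encoder and decoder networks are omitted.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """(len(seq), 20) one-hot matrix for a protein sequence."""
    idx = [AMINO_ACIDS.index(a) for a in seq]
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    x[np.arange(len(seq)), idx] = 1.0
    return x


def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I), differentiable in mu, log_var."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) in closed form."""
    return float(-0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var)))


def mutation_score(log_probs, wild_type, mutant):
    """Hypothetical score: log p(mutant) - log p(wild type) under per-position
    decoder log-probabilities log_probs (L, 20). Positive favors the mutant."""
    return float(np.sum(one_hot(mutant) * log_probs)
                 - np.sum(one_hot(wild_type) * log_probs))
```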