Paper Group AWR 105
Two-stream Flow-guided Convolutional Attention Networks for Action Recognition. Deep Voice: Real-time Neural Text-to-Speech. An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery. GP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks. Incorporation of prior knowledge of the signal behavior into the reconstruction to accelerate the acquisition of MR diffusion data. …
Two-stream Flow-guided Convolutional Attention Networks for Action Recognition
Title | Two-stream Flow-guided Convolutional Attention Networks for Action Recognition |
Authors | An Tran, Loong-Fah Cheong |
Abstract | This paper proposes a two-stream flow-guided convolutional attention network for action recognition in videos. The central idea is that optical flows, when properly compensated for the camera motion, can be used to guide attention to the human foreground. We thus develop cross-link layers from the temporal network (trained on flows) to the spatial network (trained on RGB frames). These cross-link layers guide the spatial-stream to pay more attention to the human foreground areas and be less affected by background clutter. We obtain promising performances with our approach on the UCF101, HMDB51 and Hollywood2 datasets. |
Tasks | Action Recognition In Videos, Temporal Action Localization |
Published | 2017-08-30 |
URL | http://arxiv.org/abs/1708.09268v1 |
http://arxiv.org/pdf/1708.09268v1.pdf | |
PWC | https://paperswithcode.com/paper/two-stream-flow-guided-convolutional |
Repo | https://github.com/antran89/two-stream-fcan |
Framework | none |
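The cross-link idea in the abstract reduces to a small gating module: an attention map computed from the temporal (flow) stream's features multiplicatively gates the spatial stream's features. Below is a minimal sketch, assuming a 1x1 convolution and matching feature-map sizes; the `CrossLink` module and its shapes are illustrative, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CrossLink(nn.Module):
    """Gate spatial-stream features with an attention map computed from
    temporal-stream (optical-flow) features. A sketch of the cross-link
    idea; kernel size and channel counts are assumptions."""
    def __init__(self, flow_channels):
        super().__init__()
        # 1x1 conv collapses flow features to a single-channel attention map
        self.att = nn.Conv2d(flow_channels, 1, kernel_size=1)

    def forward(self, spatial_feat, flow_feat):
        # attention in [0, 1], broadcast over the spatial stream's channels
        a = torch.sigmoid(self.att(flow_feat))
        return spatial_feat * a

# toy usage: both streams produce 14x14 feature maps
spatial = torch.randn(2, 512, 14, 14)
flow = torch.randn(2, 512, 14, 14)
gated = CrossLink(512)(spatial, flow)
print(gated.shape)  # torch.Size([2, 512, 14, 14])
```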
Deep Voice: Real-time Neural Text-to-Speech
Title | Deep Voice: Real-time Neural Text-to-Speech |
Authors | Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Andrew Ng, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi |
Abstract | We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks. Deep Voice lays the groundwork for truly end-to-end neural speech synthesis. The system comprises five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For the segmentation model, we propose a novel way of performing phoneme boundary detection with deep neural networks using connectionist temporal classification (CTC) loss. For the audio synthesis model, we implement a variant of WaveNet that requires fewer parameters and trains faster than the original. By using a neural network for each component, our system is simpler and more flexible than traditional text-to-speech systems, where each component requires laborious feature engineering and extensive domain expertise. Finally, we show that inference with our system can be performed faster than real time and describe optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x speedups over existing implementations. |
Tasks | Boundary Detection, Feature Engineering, Speech Synthesis |
Published | 2017-02-25 |
URL | http://arxiv.org/abs/1702.07825v2 |
http://arxiv.org/pdf/1702.07825v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-voice-real-time-neural-text-to-speech |
Repo | https://github.com/NVIDIA/nv-wavenet |
Framework | pytorch |
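The segmentation model's phoneme boundary detection trains with CTC loss, which PyTorch ships as `nn.CTCLoss`; the sketch below shows only that training setup. The stand-in GRU acoustic model, alphabet size, and sequence lengths are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Toy setup: 40 phonemes plus the CTC blank at index 0 (assumed alphabet).
NUM_PHONEMES = 41
T, B, C = 100, 4, NUM_PHONEMES  # audio frames, batch size, classes

# A stand-in acoustic model; the paper's actual architecture differs.
model = nn.GRU(input_size=20, hidden_size=C)
features = torch.randn(T, B, 20)             # e.g. MFCC-like frames
logits, _ = model(features)
log_probs = logits.log_softmax(dim=-1)       # (T, B, C), as CTCLoss expects

targets = torch.randint(1, C, (B, 30))       # phoneme label sequences
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 30, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(float(loss))
```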
An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery
Title | An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery |
Authors | Alex Tank, Ian Cover, Nicholas J. Foti, Ali Shojaie, Emily B. Fox |
Abstract | While most classical approaches to Granger causality detection repose upon linear time series assumptions, many interactions in neuroscience and economics applications are nonlinear. We develop an approach to nonlinear Granger causality detection using multilayer perceptrons where the input to the network is the past time lags of all series and the output is the future value of a single series. A sufficient condition for Granger non-causality in this setting is that all of the outgoing weights of the input data, the past lags of a series, to the first hidden layer are zero. For estimation, we utilize a group lasso penalty to shrink groups of input weights to zero. We also propose a hierarchical penalty for simultaneous Granger causality and lag estimation. We validate our approach on simulated data from both a sparse linear autoregressive model and the sparse and nonlinear Lorenz-96 model. |
Tasks | Time Series |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08160v2 |
http://arxiv.org/pdf/1711.08160v2.pdf | |
PWC | https://paperswithcode.com/paper/an-interpretable-and-sparse-neural-network |
Repo | https://github.com/christeefy/Novel-Techniques-for-PTR-FD |
Framework | none |
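The sufficient condition in the abstract translates directly into code: group the first hidden layer's input weights by series and penalize each group's norm. A minimal sketch with assumed sizes (`P`, `LAG`, `H`); note that exact zeros require a proximal update rather than plain gradient descent on the penalty, which is where the paper's estimation procedure goes further.

```python
import torch
import torch.nn as nn

P, LAG, H = 5, 3, 16  # number of series, lag order, hidden units (assumed)

class GrangerMLP(nn.Module):
    """MLP predicting one series from the past lags of all P series.
    First-layer weights are grouped by input series so a group lasso
    penalty can zero out entire series (Granger non-causality)."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(P * LAG, H)
        self.fc2 = nn.Linear(H, 1)

    def forward(self, x):            # x: (batch, P * LAG), series-major order
        return self.fc2(torch.relu(self.fc1(x)))

    def group_lasso(self):
        # fc1.weight is (H, P*LAG); the view groups columns by input series
        w = self.fc1.weight.view(H, P, LAG)
        return w.norm(dim=(0, 2)).sum()   # one L2 norm per input series

model = GrangerMLP()
x = torch.randn(32, P * LAG)
y = torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y) + 0.1 * model.group_lasso()
loss.backward()
```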
GP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks
Title | GP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks |
Authors | Xing Di, Vishwanath A. Sindagi, Vishal M. Patel |
Abstract | Facial landmarks constitute the most compressed representation of faces and are known to preserve information such as pose, gender and facial structure present in the faces. Several works exist that attempt to perform high-level face-related analysis tasks based on landmarks. In contrast, in this work, an attempt is made to tackle the inverse problem of synthesizing faces from their respective landmarks. The primary aim of this work is to demonstrate that information preserved by landmarks (gender in particular) can be further accentuated by leveraging generative models to synthesize corresponding faces. Though the problem is particularly challenging due to its ill-posed nature, we believe that successful synthesis will enable several applications such as boosting performance of high-level face-related tasks using landmark points and performing dataset augmentation. To this end, a novel face-synthesis method known as Gender Preserving Generative Adversarial Network (GP-GAN) that is guided by adversarial loss, perceptual loss and a gender preserving loss is presented. Further, we propose a novel generator sub-network UDeNet for GP-GAN that leverages advantages of U-Net and DenseNet architectures. Extensive experiments and comparison with recent methods are performed to verify the effectiveness of the proposed method. |
Tasks | Face Generation |
Published | 2017-10-03 |
URL | http://arxiv.org/abs/1710.00962v2 |
http://arxiv.org/pdf/1710.00962v2.pdf | |
PWC | https://paperswithcode.com/paper/gp-gan-gender-preserving-gan-for-synthesizing |
Repo | https://github.com/DetionDX/GP-GAN-GenderPreserving-GAN-for-Synthesizing-Faces-from-Landmarks |
Framework | pytorch |
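The three-term generator objective is straightforward to assemble. What follows is a toy sketch with stand-in networks for the generator, discriminator, perceptual feature extractor, and gender classifier; none of these match the paper's UDeNet design, and the loss weights are assumptions.

```python
import torch
import torch.nn as nn

# Assumed stand-ins only; the paper's actual networks differ.
G = nn.Sequential(nn.Conv2d(1, 3, 3, padding=1))            # toy generator
D = nn.Sequential(nn.Conv2d(3, 1, 4, stride=2))             # toy discriminator
feat = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1))         # toy perceptual net
gender_clf = nn.Sequential(nn.Flatten(), nn.LazyLinear(1))  # toy gender net

landmarks = torch.randn(2, 1, 64, 64)
real_face = torch.randn(2, 3, 64, 64)
gender = torch.ones(2, 1)  # assumed gender label of the real face

fake = G(landmarks)
bce = nn.functional.binary_cross_entropy_with_logits
d_out = D(fake)
adv = bce(d_out, torch.ones_like(d_out))                    # fool D
perc = nn.functional.l1_loss(feat(fake), feat(real_face))   # feature match
gend = bce(gender_clf(fake), gender)                        # preserve gender
loss_G = adv + 1.0 * perc + 1.0 * gend                      # weights assumed
loss_G.backward()
```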
Incorporation of prior knowledge of the signal behavior into the reconstruction to accelerate the acquisition of MR diffusion data
Title | Incorporation of prior knowledge of the signal behavior into the reconstruction to accelerate the acquisition of MR diffusion data |
Authors | Juan F P J Abascal, Manuel Desco, Juan Parra-Robles |
Abstract | Diffusion MRI measurements using hyperpolarized gases are generally acquired during patient breath hold, which yields a compromise between achievable image resolution, lung coverage and number of b-values. In this work, we propose a novel method that accelerates the acquisition of MR diffusion data by undersampling in both spatial and b-value dimensions, thanks to incorporating knowledge about the signal decay into the reconstruction (SIDER). SIDER is compared to total variation (TV) reconstruction by assessing their effect on both the recovery of ventilation images and estimated mean alveolar dimensions (MAD). Both methods are assessed by retrospectively undersampling diffusion datasets of normal volunteers and COPD patients (n=8) for acceleration factors between x2 and x10. TV led to large errors and artefacts for acceleration factors equal to or larger than x5. SIDER improved on TV, presenting lower errors and histograms of MAD closer to those obtained from fully sampled data for acceleration factors up to x10. SIDER preserved image quality at all acceleration factors but images were slightly smoothed and some details were lost at x10. In conclusion, we have developed and validated a novel compressed sensing method for lung MR imaging and achieved high acceleration factors, which can be used to increase the amount of data acquired during a breath-hold. This methodology is expected to improve the accuracy of estimated lung microstructure dimensions and widen the possibilities of studying lung diseases with MRI. |
Tasks | |
Published | 2017-02-09 |
URL | http://arxiv.org/abs/1702.02743v1 |
http://arxiv.org/pdf/1702.02743v1.pdf | |
PWC | https://paperswithcode.com/paper/incorporation-of-prior-knowledge-of-the |
Repo | https://github.com/HGGM-LIM/compressed-sensing-diffusion-lung-MRI |
Framework | none |
Spectral Graph Convolutions for Population-based Disease Prediction
Title | Spectral Graph Convolutions for Population-based Disease Prediction |
Authors | Sarah Parisot, Sofia Ira Ktena, Enzo Ferrante, Matthew Lee, Ricardo Guerrerro Moreno, Ben Glocker, Daniel Rueckert |
Abstract | Exploiting the wealth of imaging and non-imaging information for disease prediction tasks requires models capable of representing, at the same time, individual features as well as data associations between subjects from potentially large populations. Graphs provide a natural framework for such tasks, yet previous graph-based approaches focus on pairwise similarities without modelling the subjects’ individual characteristics and features. On the other hand, relying solely on subject-specific imaging feature vectors fails to model the interaction and similarity between subjects, which can reduce performance. In this paper, we introduce the novel concept of Graph Convolutional Networks (GCN) for brain analysis in populations, combining imaging and non-imaging data. We represent populations as a sparse graph where its vertices are associated with image-based feature vectors and the edges encode phenotypic information. This structure was used to train a GCN model on partially labelled graphs, aiming to infer the classes of unlabelled nodes from the node features and pairwise associations between subjects. We demonstrate the potential of the method on the challenging ADNI and ABIDE databases, as a proof of concept of the benefit from integrating contextual information in classification tasks. This has a clear impact on the quality of the predictions, leading to 69.5% accuracy for ABIDE (outperforming the current state of the art of 66.8%) and 77% for ADNI for prediction of MCI conversion, significantly outperforming standard linear classifiers where only individual features are considered. |
Tasks | Disease Prediction |
Published | 2017-03-08 |
URL | http://arxiv.org/abs/1703.03020v3 |
http://arxiv.org/pdf/1703.03020v3.pdf | |
PWC | https://paperswithcode.com/paper/spectral-graph-convolutions-for-population |
Repo | https://github.com/parisots/population-gcn |
Framework | tf |
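A single propagation step captures the core operation: node features are smoothed over the normalized, self-looped adjacency before a linear map. The sketch below uses the common first-order form for brevity; the paper's spectral filters are richer (polynomial filters over the graph Laplacian).

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step, H' = ReLU(D^-1/2 A_hat D^-1/2 H W),
    where A_hat adds self-loops. A first-order simplification of the
    spectral filtering used in the paper."""
    A_hat = A + np.eye(A.shape[0])              # self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

# toy population graph: 4 subjects, edges encode phenotypic similarity
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
H = np.random.randn(4, 8)        # per-subject imaging feature vectors
W = np.random.randn(8, 2)        # trainable weights (random here)
print(gcn_layer(A, H, W).shape)  # (4, 2)
```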
Learning to Generate Long-term Future via Hierarchical Prediction
Title | Learning to Generate Long-term Future via Hierarchical Prediction |
Authors | Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee |
Abstract | We propose a hierarchical approach for making long-term predictions of future frames. To avoid inherent compounding errors in recursive pixel-level prediction, we propose to first estimate high-level structure in the input frames, then predict how that structure evolves in the future, and finally by observing a single frame from the past and the predicted high-level structure, we construct the future frames without having to observe any of the pixel-level predictions. Long-term video prediction is difficult to perform by recurrently observing the predicted frames because the small errors in pixel space exponentially amplify as predictions are made deeper into the future. Our approach prevents pixel-level error propagation from happening by removing the need to observe the predicted frames. Our model is built with a combination of LSTM and analogy based encoder-decoder convolutional neural networks, which independently predict the video structure and generate the future frames, respectively. In experiments, our model is evaluated on the Human3.6M and Penn Action datasets on the task of long-term pixel-level video prediction of humans performing actions and demonstrates significantly better results than the state-of-the-art. |
Tasks | Video Prediction |
Published | 2017-04-19 |
URL | http://arxiv.org/abs/1704.05831v5 |
http://arxiv.org/pdf/1704.05831v5.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-generate-long-term-future-via |
Repo | https://github.com/xcyan/eccv18_mtvae |
Framework | tf |
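The hierarchy's first two stages (estimate high-level structure, then predict its evolution) can be caricatured with an LSTM rolling forward in pose space only. A sketch assuming 2-D keypoints; the analogy-based image generator that renders frames from the predicted poses is omitted.

```python
import torch
import torch.nn as nn

K = 13  # number of body keypoints (assumed; Penn Action annotates 13)

class PosePredictor(nn.Module):
    """Predict future 2-D pose keypoints from past poses with an LSTM:
    the high-level structure stage of the hierarchy. Rolling out in pose
    space avoids feeding predicted pixels back into the model."""
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(2 * K, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2 * K)

    def forward(self, past_poses, horizon):
        _, (h, c) = self.lstm(past_poses)         # encode observed poses
        pose = past_poses[:, -1:]                 # start from the last pose
        preds = []
        for _ in range(horizon):                  # roll out in pose space only
            o, (h, c) = self.lstm(pose, (h, c))
            pose = self.out(o)
            preds.append(pose)
        return torch.cat(preds, dim=1)            # (batch, horizon, 2K)

poses = torch.randn(2, 10, 2 * K)
print(PosePredictor()(poses, horizon=5).shape)  # torch.Size([2, 5, 26])
```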
Efficient Manifold and Subspace Approximations with Spherelets
Title | Efficient Manifold and Subspace Approximations with Spherelets |
Authors | Didong Li, Minerva Mukhopadhyay, David B. Dunson |
Abstract | Data lying in a high dimensional ambient space are commonly thought to have a much lower intrinsic dimension. In particular, the data may be concentrated near a lower-dimensional subspace or manifold. There is an immense literature focused on approximating the unknown subspace, and in exploiting such approximations in clustering, data compression, and building of predictive models. Most of the literature relies on approximating subspaces using a locally linear, and potentially multiscale, dictionary. In this article, we propose a simple and general alternative, which instead uses pieces of spheres, or spherelets, to locally approximate the unknown subspace. Theory is developed showing that spherelets can produce lower covering numbers and MSEs for many manifolds. We develop spherical principal components analysis (SPCA). Results relative to state-of-the-art competitors show gains in ability to accurately approximate the subspace with fewer components. In addition, unlike most competitors, our approach can be used for data denoising and can efficiently embed new data without retraining. The methods are illustrated with standard toy manifold learning examples, and applications to multiple real data sets. |
Tasks | Denoising |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08263v3 |
http://arxiv.org/pdf/1706.08263v3.pdf | |
PWC | https://paperswithcode.com/paper/efficient-manifold-and-subspace |
Repo | https://github.com/david-dunson/GeodesicDistance |
Framework | none |
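Fitting a sphere to local data is a least-squares problem, since points at distance r from a center c satisfy a relation that is linear in (c, r^2 - ||c||^2). A sketch of that building block on a toy circle; the paper's SPCA additionally restricts the fit to a principal subspace, which is omitted here.

```python
import numpy as np

def fit_sphere(X):
    """Least-squares sphere fit: points x on a sphere with center c and
    radius r satisfy ||x||^2 = 2 c.x + (r^2 - ||c||^2), linear in (c, b).
    A simplified building block in the spirit of spherelets, not the
    paper's full SPCA procedure."""
    A = np.hstack([2 * X, np.ones((X.shape[0], 1))])
    y = (X ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, y, rcond=None)
    c, b = sol[:-1], sol[-1]
    r = np.sqrt(b + c @ c)
    return c, r

# toy data: noisy points on a circle of radius 2 centered at (1, -1)
theta = np.random.uniform(0, 2 * np.pi, 200)
X = np.c_[1 + 2 * np.cos(theta), -1 + 2 * np.sin(theta)]
X += 0.01 * np.random.randn(*X.shape)
c, r = fit_sphere(X)
print(np.round(c, 2), round(float(r), 2))  # ~[ 1. -1.] 2.0
```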
full-FORCE: A Target-Based Method for Training Recurrent Networks
Title | full-FORCE: A Target-Based Method for Training Recurrent Networks |
Authors | Brian DePasquale, Christopher J. Cueva, Kanaka Rajan, G. Sean Escola, L. F. Abbott |
Abstract | Trained recurrent networks are powerful tools for modeling dynamic neural computations. We present a target-based method for modifying the full connectivity matrix of a recurrent network to train it to perform tasks involving temporally complex input/output transformations. The method introduces a second network during training to provide suitable “target” dynamics useful for performing the task. Because it exploits the full recurrent connectivity, the method produces networks that perform tasks with fewer neurons and greater noise robustness than traditional least-squares (FORCE) approaches. In addition, we show how introducing additional input signals into the target-generating network, which act as task hints, greatly extends the range of tasks that can be learned and provides control over the complexity and nature of the dynamics of the trained, task-performing network. |
Tasks | |
Published | 2017-10-09 |
URL | http://arxiv.org/abs/1710.03070v1 |
http://arxiv.org/pdf/1710.03070v1.pdf | |
PWC | https://paperswithcode.com/paper/full-force-a-target-based-method-for-training |
Repo | https://github.com/briandepasquale/full-FORCE-demos |
Framework | none |
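A compressed caricature of the core idea: drive a target-generating network with the desired output so that its rates define targets for the learner's full recurrent input, then fit the full connectivity matrix by regression. The paper trains online with recursive least squares; the batch ridge regression and all sizes below are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, dt = 100, 2000, 0.01
t = np.arange(T) * dt
f_out = np.sin(2 * np.pi * t)                  # toy task output

# Target-generating network: driven by the desired output so its internal
# dynamics are suitable for the task (hint signals would enter the same way).
J_D = rng.normal(0, 1.5 / np.sqrt(N), (N, N))
u = rng.uniform(-1, 1, N)
x = rng.normal(0, 0.5, N)
R, targets = np.zeros((T, N)), np.zeros((T, N))
for i in range(T):
    r = np.tanh(x)
    drive = J_D @ r + u * f_out[i]             # recurrent input target
    R[i], targets[i] = r, drive
    x = x + dt * (-x + drive)

# Batch ridge regression for the learner's full connectivity J so that
# J r(t) reproduces the target drive (the paper does this online via RLS).
lam = 1e-3
J = np.linalg.solve(R.T @ R + lam * np.eye(N), R.T @ targets).T

# Linear readout mapping rates to the task output, fit the same way
w = np.linalg.solve(R.T @ R + lam * np.eye(N), R.T @ f_out)
print(float(np.corrcoef(R @ w, f_out)[0, 1]))  # ~1.0 on training data
```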
Capsule Network Performance on Complex Data
Title | Capsule Network Performance on Complex Data |
Authors | Edgar Xi, Selina Bing, Yang Jin |
Abstract | In recent years, convolutional neural networks (CNNs) have played an important role in the field of deep learning. Variants of CNNs have proven to be very successful in classification tasks across different domains. However, there are two big drawbacks to CNNs: their failure to take into account important spatial hierarchies between features, and their lack of rotational invariance. As long as certain key features of an object are present in the test data, CNNs classify the test data as the object, disregarding features’ relative spatial orientation to each other. This causes false positives. The lack of rotational invariance in CNNs would cause the network to incorrectly assign the object another label, causing false negatives. To address this concern, Hinton et al. propose a novel type of neural network using the concept of capsules in a recent paper. With the use of dynamic routing and reconstruction regularization, the capsule network model would be both rotation invariant and spatially aware. The capsule network has shown its potential by achieving a state-of-the-art result of 0.25% test error on MNIST without data augmentation such as rotation and scaling, better than the previous baseline of 0.39%. To further test out the application of capsule networks on data with higher dimensionality, we attempt to find the best set of configurations that yield the optimal test error on the CIFAR10 dataset. |
Tasks | Data Augmentation |
Published | 2017-12-10 |
URL | http://arxiv.org/abs/1712.03480v1 |
http://arxiv.org/pdf/1712.03480v1.pdf | |
PWC | https://paperswithcode.com/paper/capsule-network-performance-on-complex-data |
Repo | https://github.com/swkarlekar/summaries |
Framework | tf |
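The dynamic routing the abstract refers to is routing-by-agreement from Sabour et al. (2017): coupling coefficients are iteratively re-weighted by the dot product between each capsule's prediction and the output vector. A minimal sketch with assumed capsule counts.

```python
import torch
import torch.nn.functional as F

def dynamic_routing(u_hat, iters=3):
    """Routing-by-agreement between capsule layers.
    u_hat: prediction vectors, shape (batch, in_caps, out_caps, dim)."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # routing logits
    for _ in range(iters):
        c = F.softmax(b, dim=2)                     # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)    # weighted sum -> (B, out, dim)
        # squash nonlinearity keeps vector norms in [0, 1)
        n2 = (s ** 2).sum(dim=-1, keepdim=True)
        v = (n2 / (1 + n2)) * s / torch.sqrt(n2 + 1e-8)
        # agreement between predictions and outputs updates the logits
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
    return v

u_hat = torch.randn(4, 1152, 10, 16)  # e.g. CapsNet sizes for 28x28 inputs
v = dynamic_routing(u_hat)
print(v.shape)  # torch.Size([4, 10, 16])
```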
Folded Recurrent Neural Networks for Future Video Prediction
Title | Folded Recurrent Neural Networks for Future Video Prediction |
Authors | Marc Oliu, Javier Selva, Sergio Escalera |
Abstract | Future video prediction is an ill-posed Computer Vision problem that recently received much attention. Its main challenges are the high variability in video content, the propagation of errors through time, and the non-specificity of the future frames: given a sequence of past frames there is a continuous distribution of possible futures. This work introduces bijective Gated Recurrent Units, a double mapping between the input and output of a GRU layer. This allows for recurrent auto-encoders with state sharing between encoder and decoder, stratifying the sequence representation and helping to prevent capacity problems. We show how with this topology only the encoder or decoder needs to be applied for input encoding and prediction, respectively. This reduces the computational cost and avoids re-encoding the predictions when generating a sequence of frames, mitigating the propagation of errors. Furthermore, it is possible to remove layers from an already trained model, giving insight into the role performed by each layer and making the model more explainable. We evaluate our approach on three video datasets, outperforming state-of-the-art prediction results on MMNIST and UCF101, and obtaining competitive results on KTH with 2 and 3 times less memory usage and computational cost than the best-scoring approach. |
Tasks | Video Prediction |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00311v2 |
http://arxiv.org/pdf/1712.00311v2.pdf | |
PWC | https://paperswithcode.com/paper/folded-recurrent-neural-networks-for-future |
Repo | https://github.com/moliusimon/frnn |
Framework | tf |
Incorporating Prior Information in Compressive Online Robust Principal Component Analysis
Title | Incorporating Prior Information in Compressive Online Robust Principal Component Analysis |
Authors | Huynh Van Luong, Nikos Deligiannis, Jurgen Seiler, Soren Forchhammer, Andre Kaup |
Abstract | We consider an online version of robust Principal Component Analysis (PCA), which arises naturally in time-varying source separation problems such as video foreground-background separation. This paper proposes a compressive online robust PCA with prior information for recursively separating a sequence of frames into sparse and low-rank components from a small set of measurements. In contrast to conventional batch-based PCA, which processes all the frames directly, the proposed method processes measurements taken from each frame. Moreover, this method can efficiently incorporate multiple sources of prior information, namely previously reconstructed frames, to improve the separation and thereafter update the prior information for the next frame. We utilize this prior information by solving $n\text{-}\ell_{1}$ minimization for incorporating the previous sparse components and using incremental singular value decomposition ($\mathrm{SVD}$) for exploiting the previous low-rank components. We also establish theoretical bounds on the number of measurements required to guarantee successful separation under assumptions of static or slowly-changing low-rank components. Using numerical experiments, we evaluate our bounds and the performance of the proposed algorithm. In addition, we apply the proposed algorithm to online video foreground and background separation from compressive measurements. Experimental results show that the proposed method outperforms the existing methods. |
Tasks | |
Published | 2017-01-24 |
URL | http://arxiv.org/abs/1701.06852v2 |
http://arxiv.org/pdf/1701.06852v2.pdf | |
PWC | https://paperswithcode.com/paper/incorporating-prior-information-in |
Repo | https://github.com/huynhlvd/corpca |
Framework | none |
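For orientation, the batch problem the paper extends is classical robust PCA: decompose a matrix into a low-rank background plus a sparse foreground. The sketch below is plain principal component pursuit via inexact ALM, not the paper's compressive, online, prior-aided method.

```python
import numpy as np

def rpca_pcp(M, n_iter=200):
    """Sparse + low-rank separation by inexact-ALM principal component
    pursuit: the classical batch RPCA that the paper's online,
    compressive, prior-aided method builds on, not the method itself."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / np.abs(M).sum()
    Y = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1 / mu, 0)) @ Vt
        # sparse update: elementwise soft thresholding
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0)
        Y = Y + mu * (M - L - S)
    return L, S

# toy "video": rank-1 background plus sparse foreground
rng = np.random.default_rng(1)
bg = np.outer(rng.normal(size=50), rng.normal(size=40))
fg = (rng.random((50, 40)) < 0.05) * 5.0
L, S = rpca_pcp(bg + fg)
print(round(np.abs(L - bg).max(), 3), round(np.abs(S - fg).max(), 3))
```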
MetaLDA: a Topic Model that Efficiently Incorporates Meta information
Title | MetaLDA: a Topic Model that Efficiently Incorporates Meta information |
Authors | He Zhao, Lan Du, Wray Buntine, Gang Liu |
Abstract | Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in the case where the word-occurrence information in the training data is insufficient. In this paper, we present a topic model, called MetaLDA, which is able to leverage either document or word meta information, or both of them jointly. With two data augmentation techniques, we can derive an efficient Gibbs sampling algorithm, which benefits from the fully local conjugacy of the model. Moreover, the algorithm is favoured by the sparsity of the meta information. Extensive experiments on several real-world datasets demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic quality, particularly in handling sparse texts. In addition, compared with other models using meta information, our model runs significantly faster. |
Tasks | Topic Models, Word Embeddings |
Published | 2017-09-19 |
URL | http://arxiv.org/abs/1709.06365v1 |
http://arxiv.org/pdf/1709.06365v1.pdf | |
PWC | https://paperswithcode.com/paper/metalda-a-topic-model-that-efficiently |
Repo | https://github.com/ethanhezhao/MetaLDA |
Framework | none |
Topic supervised non-negative matrix factorization
Title | Topic supervised non-negative matrix factorization |
Authors | Kelsey MacMillan, James D. Wilson |
Abstract | Topic models have been extensively used to organize and interpret the contents of large, unstructured corpora of text documents. Although topic models often perform well on traditional training vs. test set evaluations, it is often the case that the results of a topic model do not align with human interpretation. This interpretability fallacy is largely due to the unsupervised nature of topic models, which prohibits any user guidance on the results of a model. In this paper, we introduce a semi-supervised method called topic supervised non-negative matrix factorization (TS-NMF) that enables the user to provide labeled example documents to promote the discovery of more meaningful semantic structure of a corpus. In this way, the results of TS-NMF better match the intuition and desired labeling of the user. The core of TS-NMF relies on solving a non-convex optimization problem for which we derive an iterative algorithm that is shown to be monotonic and convergent to a local optimum. We demonstrate the practical utility of TS-NMF on the Reuters and PubMed corpora, and find that TS-NMF is especially useful for conceptual or broad topics, where topic key terms are not well understood. Although identifying an optimal latent structure for the data is not a primary objective of the proposed approach, we find that TS-NMF achieves higher weighted Jaccard similarity scores than the contemporary methods, (unsupervised) NMF and latent Dirichlet allocation, at supervision rates as low as 10% to 20%. |
Tasks | Topic Models |
Published | 2017-06-12 |
URL | http://arxiv.org/abs/1706.05084v2 |
http://arxiv.org/pdf/1706.05084v2.pdf | |
PWC | https://paperswithcode.com/paper/topic-supervised-non-negative-matrix |
Repo | https://github.com/Vokturz/tsnmf-sparse |
Framework | tf |
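One simple reading of topic supervision is a mask on the document-topic matrix: labeled documents may only load on their permitted topics. The sketch below applies that mask inside standard multiplicative NMF updates; the paper's actual algorithm and its convergence analysis differ.

```python
import numpy as np

def ts_nmf(X, L, k, n_iter=300, eps=1e-9):
    """Topic-supervised NMF sketch: multiplicative updates for X ~ W H,
    with the document-topic matrix W masked by a supervision matrix L
    (L[d, t] = 0 forbids topic t for labeled document d, 1 allows it;
    unlabeled rows are all ones). A simplified reading of TS-NMF."""
    rng = np.random.default_rng(0)
    n, m = X.shape
    W = rng.random((n, k)) * L           # respect the mask from the start
    H = rng.random((k, m))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        W *= L                           # re-impose supervision each step
    return W, H

# toy corpus: 6 docs, 8 terms, 2 topics; the first 4 docs carry labels
X = np.abs(np.random.default_rng(1).random((6, 8)))
L = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]], float)
W, H = ts_nmf(X, L, k=2)
print(np.round(W, 2))
```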
Variational auto-encoding of protein sequences
Title | Variational auto-encoding of protein sequences |
Authors | Sam Sinai, Eric Kelsic, George M. Church, Martin A. Nowak |
Abstract | Proteins are responsible for the most diverse set of functions in biology. The ability to extract information from protein sequences and to predict the effects of mutations is extremely valuable in many domains of biology and medicine. However, the mapping between protein sequence and function is complex and poorly understood. Here we present an embedding of natural protein sequences using a Variational Auto-Encoder and use it to predict how mutations affect protein function. We use this unsupervised approach to cluster natural variants and learn interactions between sets of positions within a protein. This approach generally performs better than baseline methods that consider no interactions within sequences, and in some cases better than the state-of-the-art approaches that use the inverse-Potts model. This generative model can be used to computationally guide exploration of protein sequence space and to better inform rational and automatic protein design. |
Tasks | |
Published | 2017-12-09 |
URL | http://arxiv.org/abs/1712.03346v3 |
http://arxiv.org/pdf/1712.03346v3.pdf | |
PWC | https://paperswithcode.com/paper/variational-auto-encoding-of-protein |
Repo | https://github.com/samsinai/VAE_protein_function |
Framework | none |
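The pipeline in the abstract is a standard VAE over one-hot sequences, with the (approximate) ELBO of a mutant relative to the wild type used as a mutation-effect score. A sketch with assumed sizes and layers, not the paper's exact model.

```python
import torch
import torch.nn as nn

L_SEQ, AA, Z = 50, 21, 16  # sequence length, alphabet size, latent dim (assumed)

class ProteinVAE(nn.Module):
    """VAE over one-hot protein sequences. The ELBO of a mutant relative
    to the wild type serves as a proxy for the mutation's effect, in the
    spirit of the paper; sizes and layers here are assumptions."""
    def __init__(self, hidden=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(),
                                 nn.Linear(L_SEQ * AA, hidden), nn.ReLU())
        self.mu, self.logvar = nn.Linear(hidden, Z), nn.Linear(hidden, Z)
        self.dec = nn.Sequential(nn.Linear(Z, hidden), nn.ReLU(),
                                 nn.Linear(hidden, L_SEQ * AA))

    def elbo(self, x):                       # x: (batch, L_SEQ, AA) one-hot
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        logits = self.dec(z).view(-1, L_SEQ, AA)
        # reconstruction term: per-position categorical log-likelihood
        rec = -nn.functional.cross_entropy(
            logits.transpose(1, 2), x.argmax(-1), reduction="none").sum(-1)
        kl = 0.5 * (mu ** 2 + logvar.exp() - 1 - logvar).sum(-1)
        return rec - kl                      # per-sequence ELBO estimate

vae = ProteinVAE()
seqs = nn.functional.one_hot(torch.randint(0, AA, (4, L_SEQ)), AA).float()
loss = -vae.elbo(seqs).mean()
loss.backward()
```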