May 7, 2019

2878 words 14 mins read

Paper Group AWR 53


Hierarchical Multiscale Recurrent Neural Networks

Title Hierarchical Multiscale Recurrent Neural Networks
Authors Junyoung Chung, Sungjin Ahn, Yoshua Bengio
Abstract Learning both hierarchical and temporal representations has been among the long-standing challenges of recurrent neural networks. Multiscale recurrent neural networks have been considered a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of model can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence. In this paper, we propose a novel multiscale approach, called the hierarchical multiscale recurrent neural network, which can capture the latent hierarchical structure in the sequence by encoding the temporal dependencies with different timescales using a novel update mechanism. We show some evidence that our proposed multiscale architecture can discover underlying hierarchical structure in the sequences without using explicit boundary information. We evaluate our proposed model on character-level language modelling and handwriting sequence modelling.
Tasks Language Modelling
Published 2016-09-06
URL http://arxiv.org/abs/1609.01704v7
PDF http://arxiv.org/pdf/1609.01704v7.pdf
PWC https://paperswithcode.com/paper/hierarchical-multiscale-recurrent-neural
Repo https://github.com/kaiu85/hm-rnn
Framework pytorch
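The paper's update mechanism selects one of three operations (COPY, UPDATE, FLUSH) per layer and timestep, driven by learned boundary detectors. A minimal, framework-free sketch of that selection logic, where `h_cand` stands in for the usual recurrent candidate state and `gate` is a hypothetical fixed stand-in for the learned update gate:

```python
import numpy as np

def hmrnn_update(h_prev, h_cand, z_below, z_self_prev, gate=0.5):
    """COPY / UPDATE / FLUSH selection as in the HM-RNN update mechanism.
    z_below: did the layer below detect a boundary at this step?
    z_self_prev: did this layer emit a boundary at the previous step?"""
    if z_self_prev:
        # FLUSH: a summary was passed up, so re-initialize from the candidate
        return h_cand
    if z_below:
        # UPDATE: the segment below just closed; blend old state and candidate
        # (the real model uses a learned LSTM-style gate, not a constant)
        return gate * h_prev + (1 - gate) * h_cand
    # COPY: no boundary anywhere, keep the state unchanged
    return h_prev
```

The COPY path is what lets higher layers tick at coarser timescales: their state literally does not change until a lower-level segment completes.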

Generative Choreography using Deep Learning

Title Generative Choreography using Deep Learning
Authors Luka Crnkovic-Friis, Louise Crnkovic-Friis
Abstract Recent advances in deep learning have enabled the extraction of high-level features from raw sensor data, which has opened up new possibilities in many different fields, including computer-generated choreography. In this paper we present chor-rnn, a system for generating novel choreographic material in the nuanced choreographic language and style of an individual choreographer. It also shows promising results in producing higher-level compositional cohesion, rather than just generating sequences of movement. At the core of chor-rnn is a deep recurrent neural network, trained on raw motion capture data, that can generate new dance sequences for a solo dancer. Chor-rnn can be used for collaborative human-machine choreography or as a creative catalyst, serving as inspiration for a choreographer.
Tasks Motion Capture
Published 2016-05-23
URL http://arxiv.org/abs/1605.06921v1
PDF http://arxiv.org/pdf/1605.06921v1.pdf
PWC https://paperswithcode.com/paper/generative-choreography-using-deep-learning
Repo https://github.com/mariel-pettee/choreography
Framework tf

High-Dimensional Metrics in R

Title High-Dimensional Metrics in R
Authors Victor Chernozhukov, Chris Hansen, Martin Spindler
Abstract The package High-dimensional Metrics (\Rpackage{hdm}) is an evolving collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models. It focuses on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector. Efficient estimators and uniformly valid confidence intervals are provided for regression coefficients on target variables (e.g., a treatment or policy variable) in a high-dimensional approximately sparse regression model, for the average treatment effect (ATE) and average treatment effect on the treated (ATET), as well as for extensions of these parameters to the endogenous setting. Theory-grounded, data-driven methods for selecting the penalization parameter in Lasso regressions under heteroscedastic and non-Gaussian errors are implemented. Moreover, joint/simultaneous confidence intervals for regression coefficients of a high-dimensional sparse regression are implemented, including a joint significance test for Lasso regression. Data sets which have been used in the literature and might be useful for classroom demonstration and for testing new estimators are included. \R and the package \Rpackage{hdm} are open-source software projects and can be freely downloaded from CRAN: \texttt{http://cran.r-project.org}.
Tasks
Published 2016-03-05
URL http://arxiv.org/abs/1603.01700v2
PDF http://arxiv.org/pdf/1603.01700v2.pdf
PWC https://paperswithcode.com/paper/high-dimensional-metrics-in-r
Repo https://github.com/PhilippBach/hdm
Framework none

Effective Quantization Methods for Recurrent Neural Networks

Title Effective Quantization Methods for Recurrent Neural Networks
Authors Qinyao He, He Wen, Shuchang Zhou, Yuxin Wu, Cong Yao, Xinyu Zhou, Yuheng Zou
Abstract Reducing the bit-widths of the weights, activations, and gradients of a neural network can shrink its storage size and memory usage, and also allow for faster training and inference by exploiting bitwise operations. However, previous attempts at quantization of RNNs show considerable performance degradation when using low bit-width weights and activations. In this paper, we propose methods to quantize the structure of gates and interlinks in LSTM and GRU cells. In addition, we propose balanced quantization methods for weights to further reduce performance degradation. Experiments on the PTB and IMDB datasets confirm the effectiveness of our methods, as the performance of our models matches or surpasses the previous state of the art for quantized RNNs.
Tasks Quantization
Published 2016-11-30
URL http://arxiv.org/abs/1611.10176v1
PDF http://arxiv.org/pdf/1611.10176v1.pdf
PWC https://paperswithcode.com/paper/effective-quantization-methods-for-recurrent
Repo https://github.com/qinyao-he/bit-rnn
Framework tf
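As a point of reference for what weight quantization means here, the sketch below is a plain k-bit uniform quantizer with max-abs scaling. The paper's balanced quantization additionally equalizes the weight distribution across quantization bins before this step, which is not reproduced here:

```python
import numpy as np

def quantize_uniform(w, bits=2):
    """Uniformly quantize an array to 2**bits levels spanning [-max|w|, max|w|]."""
    scale = np.max(np.abs(w)) + 1e-12
    levels = 2 ** bits - 1
    # map to [0, levels], round to the nearest integer level, map back
    q = np.round((w / scale + 1) / 2 * levels) / levels * 2 - 1
    return q * scale

w = np.array([-0.8, -0.1, 0.05, 0.7])
wq = quantize_uniform(w, bits=2)   # only 4 distinct values survive
```

With 2 bits, every weight collapses onto one of four representable values, which is exactly the regime where the paper reports naive quantization degrading RNN accuracy.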

CrowdNet: A Deep Convolutional Network for Dense Crowd Counting

Title CrowdNet: A Deep Convolutional Network for Dense Crowd Counting
Authors Lokesh Boominathan, Srinivas S S Kruthiventi, R. Venkatesh Babu
Abstract Our work proposes a novel deep learning framework for estimating crowd density from static images of highly dense crowds. We use a combination of deep and shallow, fully convolutional networks to predict the density map for a given crowd image. Such a combination is used for effectively capturing both the high-level semantic information (face/body detectors) and the low-level features (blob detectors) that are necessary for crowd counting under large scale variations. As most crowd datasets have limited training samples (<100 images) and deep learning based approaches require large amounts of training data, we perform multi-scale data augmentation. Augmenting the training samples in such a manner helps in guiding the CNN to learn scale-invariant representations. Our method is tested on the challenging UCF_CC_50 dataset, and is shown to outperform state-of-the-art methods.
Tasks Crowd Counting, Data Augmentation
Published 2016-08-22
URL http://arxiv.org/abs/1608.06197v1
PDF http://arxiv.org/pdf/1608.06197v1.pdf
PWC https://paperswithcode.com/paper/crowdnet-a-deep-convolutional-network-for
Repo https://github.com/violin0847/crowdcounting
Framework none
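The key output convention in density-map crowd counting is that the count is the integral of the predicted map. A minimal sketch of that, plus a hypothetical stand-in for the deep/shallow fusion (the paper concatenates the two network outputs and maps them to one density map with a 1x1 convolution; a simple average keeps this framework-free):

```python
import numpy as np

def crowd_count(density_map):
    """The estimated head count is the integral (sum) of the density map."""
    return float(density_map.sum())

def fuse(deep_map, shallow_map):
    """Hypothetical fusion stand-in for CrowdNet's learned 1x1-conv combination
    of the deep and shallow density predictions."""
    return 0.5 * (deep_map + shallow_map)

# toy density map with two unit-mass blobs -> count of 2
dm = np.zeros((8, 8))
dm[2, 2] = 1.0
dm[5, 6] = 1.0
```

Because ground-truth maps are built by placing a unit of mass per annotated head, summing the prediction directly yields the count without any detection step.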

PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI

Title PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI
Authors Amir Alansary, Bernhard Kainz, Martin Rajchl, Maria Murgasova, Mellisa Damodaram, David F. A. Lloyd, Alice Davidson, Steven G. McDonagh, Mary Rutherford, Joseph V. Hajnal, Daniel Rueckert
Abstract In this paper we present a novel method for the correction of motion artifacts that are present in fetal Magnetic Resonance Imaging (MRI) scans of the whole uterus. Contrary to current slice-to-volume registration (SVR) methods, which require an inflexible anatomical enclosure of a single investigated organ, the proposed patch-to-volume reconstruction (PVR) approach is able to reconstruct a large field of view of non-rigidly deforming structures. It relaxes rigid motion assumptions by introducing a specific amount of redundant information that is exploited with parallelized patch-wise optimization, super-resolution, and automatic outlier rejection. We further describe and provide an efficient parallel implementation of PVR allowing its execution within reasonable time on commercially available graphics processing units (GPU), enabling its use in clinical practice. We evaluate PVR’s computational overhead compared to standard methods and, in synthetic experiments, observe a reconstruction-accuracy improvement of approximately 30% over conventional SVR in the presence of affine motion artifacts. Furthermore, we have evaluated our method qualitatively and quantitatively on real fetal MRI data subject to maternal breathing and sudden fetal movements. We evaluate peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and cross correlation (CC) with respect to the originally acquired data and provide a method for visual inspection of reconstruction uncertainty. With these experiments we demonstrate successful application of PVR motion compensation to the whole uterus, the human fetus, and the human placenta.
Tasks Motion Compensation, Super-Resolution
Published 2016-11-22
URL http://arxiv.org/abs/1611.07289v2
PDF http://arxiv.org/pdf/1611.07289v2.pdf
PWC https://paperswithcode.com/paper/pvr-patch-to-volume-reconstruction-for-large
Repo https://github.com/bkainz/fetalReconstruction
Framework none
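Of the three reconstruction metrics the paper reports, PSNR is the simplest to state precisely; a minimal sketch, using the reference image's dynamic range as the peak value (implementations differ on this choice):

```python
import numpy as np

def psnr(ref, rec):
    """Peak signal-to-noise ratio in dB between a reference and a reconstruction."""
    ref = ref.astype(float)
    rec = rec.astype(float)
    mse = np.mean((ref - rec) ** 2)
    peak = ref.max() - ref.min()        # dynamic range of the reference
    return np.inf if mse == 0 else 10 * np.log10(peak ** 2 / mse)
```

A perfect reconstruction gives infinite PSNR; a constant 0.1 offset on a unit-range image gives exactly 20 dB, which helps sanity-check an implementation.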

Grad-CAM: Why did you say that?

Title Grad-CAM: Why did you say that?
Authors Ramprasaath R Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
Abstract We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are ‘important’ for predictions – or visual explanations. Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM), uses class-specific gradient information to localize important regions. These localizations are combined with existing pixel-space visualizations to create a novel high-resolution and class-discriminative visualization called Guided Grad-CAM. These methods help better understand CNN-based models, including image captioning and visual question answering (VQA) models. We evaluate our visual explanations by measuring their ability to discriminate between classes, to inspire trust in humans, and their correlation with occlusion maps. Grad-CAM provides a new way to understand CNN-based models. We have released code, an online demo hosted on CloudCV, and a full version of this extended abstract.
Tasks Image Captioning, Visual Question Answering
Published 2016-11-22
URL http://arxiv.org/abs/1611.07450v2
PDF http://arxiv.org/pdf/1611.07450v2.pdf
PWC https://paperswithcode.com/paper/grad-cam-why-did-you-say-that
Repo https://github.com/ramprs/grad-cam
Framework torch
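The Grad-CAM recipe itself is short: global-average-pool the class-score gradients over each feature map to get per-channel weights, take the weighted sum of the maps, and ReLU the result. A numpy sketch operating on precomputed activations and gradients (in practice both come from a framework's autograd):

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: (C, H, W) arrays for one conv layer.
    Returns the (H, W) class-discriminative localization map."""
    weights = gradients.mean(axis=(1, 2))             # per-channel importance, (C,)
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum of maps, (H, W)
    return np.maximum(cam, 0)                         # keep only positive evidence
```

The final ReLU is what makes the map class-discriminative: regions whose features push the class score down are zeroed out rather than shown as (confusing) negative evidence.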

Fast Patch-based Style Transfer of Arbitrary Style

Title Fast Patch-based Style Transfer of Arbitrary Style
Authors Tian Qi Chen, Mark Schmidt
Abstract Artistic style transfer is an image synthesis problem where the content of an image is reproduced with the style of another. Recent works show that a visually appealing style transfer can be achieved by using the hidden activations of a pretrained convolutional neural network. However, existing methods either apply (i) an optimization procedure that works for any style image but is very expensive, or (ii) an efficient feedforward network that only allows a limited number of trained styles. In this work we propose a simpler optimization objective based on local matching that combines the content structure and style textures in a single layer of the pretrained network. We show that our objective has desirable properties such as a simpler optimization landscape, intuitive parameter tuning, and consistent frame-by-frame performance on video. Furthermore, we use 80,000 natural images and 80,000 paintings to train an inverse network that approximates the result of the optimization. This results in a procedure for artistic style transfer that is efficient but also allows arbitrary content and style images.
Tasks Image Generation, Style Transfer
Published 2016-12-13
URL http://arxiv.org/abs/1612.04337v1
PDF http://arxiv.org/pdf/1612.04337v1.pdf
PWC https://paperswithcode.com/paper/fast-patch-based-style-transfer-of-arbitrary
Repo https://github.com/JianqiangRen/AAMS
Framework tf
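The "local matching" at the heart of this objective is a style-swap: each content patch in the feature map is replaced by its most similar style patch under normalized cross-correlation. A small numpy sketch on single-channel 2-D arrays (the paper operates on multi-channel CNN activations and also reconstructs an image from the swapped patches, neither of which is shown here):

```python
import numpy as np

def extract_patches(img, size):
    """All overlapping size x size patches of a 2-D array, flattened to rows."""
    H, W = img.shape
    return np.array([img[i:i + size, j:j + size].ravel()
                     for i in range(H - size + 1)
                     for j in range(W - size + 1)])

def style_swap(content, style, size=3):
    """Replace each content patch with its nearest style patch
    under cosine similarity (normalized cross-correlation)."""
    cp = extract_patches(content, size)
    sp = extract_patches(style, size)
    cn = cp / (np.linalg.norm(cp, axis=1, keepdims=True) + 1e-12)
    sn = sp / (np.linalg.norm(sp, axis=1, keepdims=True) + 1e-12)
    idx = np.argmax(cn @ sn.T, axis=1)   # best style patch per content patch
    return sp[idx]
```

When content and style coincide, every patch matches itself, which is a useful sanity check; with a real style image the swap transplants style texture while the patch grid preserves content layout.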

Structured prediction models for RNN based sequence labeling in clinical text

Title Structured prediction models for RNN based sequence labeling in clinical text
Authors Abhyuday Jagannatha, Hong Yu
Abstract Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves the extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models built on Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
Tasks Named Entity Recognition, Structured Prediction
Published 2016-08-01
URL http://arxiv.org/abs/1608.00612v1
PDF http://arxiv.org/pdf/1608.00612v1.pdf
PWC https://paperswithcode.com/paper/structured-prediction-models-for-rnn-based
Repo https://github.com/abhyudaynj/LSTM-CRF-models
Framework none
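Decoding a linear-chain CRF of the kind extended here means finding the label sequence maximizing the sum of per-position emission scores (from the RNN) and pairwise transition scores. A minimal Viterbi sketch over raw score matrices:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Best label path under a linear-chain CRF.
    emissions: (T, K) per-position label scores from the RNN.
    transitions: (K, K) pairwise potentials, transitions[i, j] = score(i -> j)."""
    T, K = emissions.shape
    score = emissions[0].copy()          # best score ending in each label at t=0
    back = np.zeros((T, K), dtype=int)   # backpointers
    for t in range(1, T):
        total = score[:, None] + transitions + emissions[t]
        back[t] = np.argmax(total, axis=0)
        score = np.max(total, axis=0)
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):        # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

The skip-chain variant in the paper adds longer-range pairwise potentials between repeated tokens, which breaks this exact chain decomposition and is why the authors resort to approximate inference.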

Guided Alignment Training for Topic-Aware Neural Machine Translation

Title Guided Alignment Training for Topic-Aware Neural Machine Translation
Authors Wenhu Chen, Evgeny Matusov, Shahram Khadivi, Jan-Thorsten Peter
Abstract In this paper, we propose an effective way of biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by many unknown words and a large type/token ratio. We also show that meta-data associated with input texts, such as topic or category information, can significantly improve translation quality when used as an additional signal to the decoder part of the network. With both novel features, the BLEU score of the NMT system on a product title set improves from 18.6% to 21.3%. Even larger MT quality gains are obtained through domain adaptation of a general-domain NMT system to e-commerce data. The developed NMT system also performs well on the IWSLT speech translation task, where an ensemble of four variant systems outperforms the phrase-based baseline by 2.1% BLEU absolute.
Tasks Domain Adaptation, Machine Translation, Word Alignment
Published 2016-07-06
URL http://arxiv.org/abs/1607.01628v1
PDF http://arxiv.org/pdf/1607.01628v1.pdf
PWC https://paperswithcode.com/paper/guided-alignment-training-for-topic-aware
Repo https://github.com/wenhuchen/iwslt-2015-de-en-topics
Framework none
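The "guided alignment" bias can be written as an extra cross-entropy term between the decoder's attention weights and a reference alignment distribution from a statistical aligner, added to the usual translation loss. A sketch of that penalty (the weighting against the main loss is a training hyperparameter not shown here):

```python
import numpy as np

def guided_alignment_loss(attn, ref):
    """Cross-entropy between attention and reference alignments.
    attn: (T, S) attention weights, one row per target position (rows sum to 1).
    ref:  (T, S) reference alignment distribution (e.g. one-hot per target word)."""
    return -np.sum(ref * np.log(attn + 1e-12)) / attn.shape[0]
```

When the attention already matches the reference alignment the penalty vanishes, so the term only pushes the model while its attention disagrees with the statistical aligner.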

Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention

Title Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention
Authors Yang Liu, Chengjie Sun, Lei Lin, Xiaolong Wang
Abstract In this paper, we propose a sentence encoding-based model for recognizing text entailment. In our approach, the encoding of a sentence is a two-stage process. First, average pooling is applied over word-level bidirectional LSTM (biLSTM) outputs to generate a first-stage sentence representation. Second, an attention mechanism replaces average pooling over the same sentence to yield better representations. Instead of using the target sentence to attend to words in the source sentence, we use the sentence’s own first-stage representation to attend to the words within it, which we call “Inner-Attention”. Experiments conducted on the Stanford Natural Language Inference (SNLI) Corpus demonstrate the effectiveness of the “Inner-Attention” mechanism. With fewer parameters, our model outperforms the existing best sentence encoding-based approach by a large margin.
Tasks Natural Language Inference
Published 2016-05-30
URL http://arxiv.org/abs/1605.09090v1
PDF http://arxiv.org/pdf/1605.09090v1.pdf
PWC https://paperswithcode.com/paper/learning-natural-language-inference-using
Repo https://github.com/Smerity/keras_snli
Framework none
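The two-stage encoding reduces to: mean-pool the biLSTM outputs, then use that pooled vector as the attention query over the same outputs. A minimal sketch with a plain dot-product score (the paper uses a learned parameterized scoring function, which is omitted here):

```python
import numpy as np

def inner_attention(H):
    """H: (T, d) word-level biLSTM outputs for one sentence.
    Returns an attention-weighted (d,) sentence vector."""
    r = H.mean(axis=0)                   # stage 1: average-pooled representation
    scores = H @ r                       # stage 2: attend the sentence to itself
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                 # softmax over positions
    return alpha @ H                     # weighted sentence vector
```

The point of the construction is that no second sentence is needed to form the query, so premise and hypothesis can be encoded fully independently, as sentence encoding-based SNLI models require.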

Fully-Convolutional Siamese Networks for Object Tracking

Title Fully-Convolutional Siamese Networks for Object Tracking
Authors Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr
Abstract The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object’s appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to exploit the expressive power of deep convolutional networks. However, when the object to track is not known beforehand, it is necessary to perform Stochastic Gradient Descent online to adapt the weights of the network, severely compromising the speed of the system. In this paper we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video. Our tracker operates at frame-rates beyond real-time and, despite its extreme simplicity, achieves state-of-the-art performance in multiple benchmarks.
Tasks Object Detection, Object Tracking
Published 2016-06-30
URL http://arxiv.org/abs/1606.09549v2
PDF http://arxiv.org/pdf/1606.09549v2.pdf
PWC https://paperswithcode.com/paper/fully-convolutional-siamese-networks-for-1
Repo https://github.com/suraj-maniyar/Object-Tracking-SSD300
Framework pytorch
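The fully-convolutional Siamese formulation scores every candidate location at once by cross-correlating the exemplar's embedding with the search region's embedding. A single-channel numpy sketch of that correlation (the real tracker correlates multi-channel CNN feature maps produced by the shared embedding network):

```python
import numpy as np

def xcorr(search, exemplar):
    """Dense similarity map: slide the exemplar feature map over the
    search feature map and record the inner product at each offset."""
    H, W = search.shape
    h, w = exemplar.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[i:i + h, j:j + w] * exemplar)
    return out
```

The argmax of the score map gives the tracked object's new position; computing the whole map in one forward pass is what lets the tracker run beyond real-time.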

Learning Unitary Operators with Help From u(n)

Title Learning Unitary Operators with Help From u(n)
Authors Stephanie L. Hyland, Gunnar Rätsch
Abstract A major challenge in the training of recurrent neural networks is the so-called vanishing or exploding gradient problem. The use of a norm-preserving transition operator can address this issue, but parametrization is challenging. In this work we focus on unitary operators and describe a parametrization using the Lie algebra $\mathfrak{u}(n)$ associated with the Lie group $U(n)$ of $n \times n$ unitary matrices. The exponential map provides a correspondence between these spaces, and allows us to define a unitary matrix using $n^2$ real coefficients relative to a basis of the Lie algebra. The parametrization is closed under additive updates of these coefficients, and thus provides a simple space in which to do gradient descent. We demonstrate the effectiveness of this parametrization on the problem of learning arbitrary unitary operators, comparing to several baselines and outperforming a recently-proposed lower-dimensional parametrization. We additionally use our parametrization to generalize a recently-proposed unitary recurrent neural network to arbitrary unitary matrices, using it to solve standard long-memory tasks.
Tasks
Published 2016-07-17
URL http://arxiv.org/abs/1607.04903v3
PDF http://arxiv.org/pdf/1607.04903v3.pdf
PWC https://paperswithcode.com/paper/learning-unitary-operators-with-help-from-un
Repo https://github.com/ratschlab/uRNN
Framework none
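The exponential-map idea can be checked directly: any skew-Hermitian matrix can be written as $iH$ with $H$ Hermitian, so exponentiating it via the eigendecomposition of $H$ always lands in $U(n)$. A numpy sketch (the paper parametrizes the algebra with $n^2$ real coefficients relative to a fixed basis, which is equivalent to choosing $H$):

```python
import numpy as np

def unitary_from_hermitian(H):
    """exp(iH) for Hermitian H: diagonalize, put the real eigenvalues on
    the unit circle, and recombine. The result is unitary by construction."""
    lam, V = np.linalg.eigh(H)                 # H = V diag(lam) V^H, lam real
    return (V * np.exp(1j * lam)) @ V.conj().T  # V diag(e^{i lam}) V^H
```

Because additive updates to the coefficients stay inside the Lie algebra, plain gradient descent never leaves the space of valid parametrizations, which is the property the paper exploits for training.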

Efficient Metric Learning for the Analysis of Motion Data

Title Efficient Metric Learning for the Analysis of Motion Data
Authors Babak Hosseini, Barbara Hammer
Abstract We investigate metric learning in the context of dynamic time warping (DTW), by far the most popular dissimilarity measure used for the comparison and analysis of motion capture data. While metric learning enables a problem-adapted representation of data, the majority of methods have been proposed for vectorial data only. In this contribution, we extend the popular principle offered by the large margin nearest neighbors learner (LMNN) to DTW by treating the resulting component-wise dissimilarity values as features. We demonstrate that this principle greatly enhances the classification accuracy in several benchmarks. Further, we show that recent auxiliary concepts such as metric regularization can be transferred from the vectorial case to component-wise DTW in a similar way. We illustrate that metric regularization constitutes a crucial prerequisite for the interpretation of the resulting relevance profiles.
Tasks Metric Learning, Motion Capture
Published 2016-10-17
URL http://arxiv.org/abs/1610.05083v3
PDF http://arxiv.org/pdf/1610.05083v3.pdf
PWC https://paperswithcode.com/paper/efficient-metric-learning-for-the-analysis-of
Repo https://github.com/bab-git/dist-LMNN
Framework none
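Since the whole construction builds on DTW dissimilarities, the classic dynamic-programming recurrence is worth stating; a minimal sketch for 1-D sequences (the paper applies it component-wise to multi-dimensional motion capture channels, producing the per-component values that LMNN then reweights):

```python
import numpy as np

def dtw(a, b):
    """Dynamic-time-warping distance between two 1-D sequences:
    D[i, j] = cost(i, j) + min(insert, delete, match)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # skip in a
                                 D[i, j - 1],      # skip in b
                                 D[i - 1, j - 1])  # match
    return D[n, m]
```

Warping absorbs local tempo differences: a sequence compared to a time-stretched copy of itself has distance zero, which is exactly why DTW suits motion data with varying execution speeds.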

Residual Networks of Residual Networks: Multilevel Residual Networks

Title Residual Networks of Residual Networks: Multilevel Residual Networks
Authors Ke Zhang, Miao Sun, Tony X. Han, Xingfang Yuan, Liru Guo, Tao Liu
Abstract A residual-network family with hundreds or even thousands of layers dominates major image recognition tasks, but building a network by simply stacking residual blocks inevitably limits its optimization ability. This paper proposes a novel residual-network architecture, Residual networks of Residual networks (RoR), to further exploit the optimization potential of residual networks. RoR optimizes residual mappings of residual mappings in place of the original residual mapping. In particular, RoR adds level-wise shortcut connections on top of the original residual networks to promote their learning capability. More importantly, RoR can be applied to various kinds of residual networks (ResNets, Pre-ResNets and WRN) and significantly boosts their performance. Our experiments demonstrate the effectiveness and versatility of RoR, where it achieves the best performance among all residual-network-like structures. Our RoR-3-WRN58-4+SD models achieve new state-of-the-art results on CIFAR-10, CIFAR-100 and SVHN, with test errors of 3.77%, 19.73% and 1.59%, respectively. RoR-3 models also achieve state-of-the-art results compared to ResNets on the ImageNet dataset.
Tasks Image Classification
Published 2016-08-09
URL http://arxiv.org/abs/1608.02908v2
PDF http://arxiv.org/pdf/1608.02908v2.pdf
PWC https://paperswithcode.com/paper/residual-networks-of-residual-networks
Repo https://github.com/osmr/imgclsmob
Framework mxnet
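Structurally, RoR just adds shortcuts at coarser levels: around each group of residual blocks, and (in RoR-3) a root shortcut around the whole stack. A structure-only sketch with stand-in block functions (real RoR uses conv blocks and projection shortcuts where shapes change, none of which is modeled here):

```python
import numpy as np

def residual_chain(x, blocks):
    """Plain chain of residual blocks: x -> x + f(x) for each block f."""
    for f in blocks:
        x = x + f(x)
    return x

def ror(x, groups):
    """Residual networks of Residual networks: a level-wise shortcut wraps
    each group of residual blocks, and a root shortcut wraps the stack."""
    root = x
    for blocks in groups:
        x = x + residual_chain(x, blocks)  # group-level shortcut
    return x + root                        # root-level shortcut
```

Even with the block functions zeroed out, the extra shortcut levels multiply the identity signal reaching the output, which illustrates how RoR shortens the effective gradient paths that plain stacking leaves long.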