Paper Group AWR 53
Hierarchical Multiscale Recurrent Neural Networks
Title | Hierarchical Multiscale Recurrent Neural Networks |
Authors | Junyoung Chung, Sungjin Ahn, Yoshua Bengio |
Abstract | Learning both hierarchical and temporal representations has been among the long-standing challenges of recurrent neural networks. Multiscale recurrent neural networks have been considered a promising approach to resolve this issue, yet there has been a lack of empirical evidence showing that this type of model can actually capture the temporal dependencies by discovering the latent hierarchical structure of the sequence. In this paper, we propose a novel multiscale approach, called the hierarchical multiscale recurrent neural network, which can capture the latent hierarchical structure in the sequence by encoding the temporal dependencies with different timescales using a novel update mechanism. We show some evidence that our proposed multiscale architecture can discover underlying hierarchical structure in sequences without using explicit boundary information. We evaluate our proposed model on character-level language modelling and handwriting sequence modelling. |
Tasks | Language Modelling |
Published | 2016-09-06 |
URL | http://arxiv.org/abs/1609.01704v7 |
PDF | http://arxiv.org/pdf/1609.01704v7.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-multiscale-recurrent-neural |
Repo | https://github.com/kaiu85/hm-rnn |
Framework | pytorch |
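A minimal sketch of the paper's three-way update rule (COPY / UPDATE / FLUSH), assuming hard, externally supplied boundary indicators and a plain tanh candidate in place of the full LSTM gating; sizes and wiring are illustrative, not the authors' exact cell.

```python
import torch
import torch.nn as nn

class HMRNNCellSketch(nn.Module):
    """Illustrative HM-RNN layer update: the boundary bits decide whether the
    state is copied, updated from the layer below, or flushed after a segment."""
    def __init__(self, size):
        super().__init__()
        self.cand = nn.Linear(2 * size, size)  # input from below + own state

    def forward(self, h_prev, h_below, z_below, z_prev):
        if z_prev:    # FLUSH: a segment just ended, restart from scratch
            ctx = torch.cat([h_below, torch.zeros_like(h_prev)], dim=-1)
            return torch.tanh(self.cand(ctx))
        if z_below:   # UPDATE: the layer below completed a segment
            return torch.tanh(self.cand(torch.cat([h_below, h_prev], dim=-1)))
        return h_prev  # COPY: nothing new below, keep the state unchanged

cell = HMRNNCellSketch(8)
h = cell(torch.zeros(1, 8), torch.randn(1, 8), z_below=1, z_prev=0)
```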
Generative Choreography using Deep Learning
Title | Generative Choreography using Deep Learning |
Authors | Luka Crnkovic-Friis, Louise Crnkovic-Friis |
Abstract | Recent advances in deep learning have enabled the extraction of high-level features from raw sensor data, which has opened up new possibilities in many different fields, including computer-generated choreography. In this paper we present chor-rnn, a system for generating novel choreographic material in the nuanced choreographic language and style of an individual choreographer. It also shows promising results in producing higher-level compositional cohesion, rather than just generating sequences of movement. At the core of chor-rnn is a deep recurrent neural network, trained on raw motion capture data, that can generate new dance sequences for a solo dancer. Chor-rnn can be used for collaborative human-machine choreography or as a creative catalyst, serving as inspiration for a choreographer. |
Tasks | Motion Capture |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.06921v1 |
PDF | http://arxiv.org/pdf/1605.06921v1.pdf |
PWC | https://paperswithcode.com/paper/generative-choreography-using-deep-learning |
Repo | https://github.com/mariel-pettee/choreography |
Framework | tf |
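A toy autoregressive pose model in the spirit of chor-rnn: a deep LSTM maps the current motion-capture frame to the next one and is rolled out from a seed frame. The skeleton layout (25 joints, 3 coordinates) and plain MSE-style regression head are assumptions for brevity; the paper describes a richer output model.

```python
import torch
import torch.nn as nn

class PoseRNN(nn.Module):
    def __init__(self, n_joints=25, hidden=512):
        super().__init__()
        self.lstm = nn.LSTM(n_joints * 3, hidden, num_layers=3, batch_first=True)
        self.head = nn.Linear(hidden, n_joints * 3)

    def forward(self, frames, state=None):
        out, state = self.lstm(frames, state)
        return self.head(out), state

# Roll out new choreography frame by frame from a single (zero) seed pose.
model = PoseRNN()
frame, state = torch.zeros(1, 1, 75), None
for _ in range(100):
    frame, state = model(frame, state)
```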
High-Dimensional Metrics in R
Title | High-Dimensional Metrics in R |
Authors | Victor Chernozhukov, Chris Hansen, Martin Spindler |
Abstract | The package High-dimensional Metrics (\Rpackage{hdm}) is an evolving collection of statistical methods for estimation and quantification of uncertainty in high-dimensional approximately sparse models. It focuses on providing confidence intervals and significance testing for (possibly many) low-dimensional subcomponents of the high-dimensional parameter vector. Efficient estimators and uniformly valid confidence intervals for regression coefficients on target variables (e.g., a treatment or policy variable) in a high-dimensional approximately sparse regression model, for the average treatment effect (ATE) and the average treatment effect for the treated (ATET), as well as for extensions of these parameters to the endogenous setting, are provided. Theory-grounded, data-driven methods for selecting the penalization parameter in Lasso regressions under heteroscedastic and non-Gaussian errors are implemented. Moreover, joint/simultaneous confidence intervals for regression coefficients of a high-dimensional sparse regression are implemented, including a joint significance test for Lasso regression. Data sets which have been used in the literature and might be useful for classroom demonstration and for testing new estimators are included. \R and the package \Rpackage{hdm} are open-source software projects and can be freely downloaded from CRAN: \texttt{http://cran.r-project.org}. |
Tasks | |
Published | 2016-03-05 |
URL | http://arxiv.org/abs/1603.01700v2 |
PDF | http://arxiv.org/pdf/1603.01700v2.pdf |
PWC | https://paperswithcode.com/paper/high-dimensional-metrics-in-r |
Repo | https://github.com/PhilippBach/hdm |
Framework | none |
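The package itself is R; as a language-neutral illustration, here is a sketch of the double-selection idea behind this style of treatment-effect inference: select controls that predict either the outcome or the treatment via Lasso, then refit OLS of the outcome on the treatment plus the union of selected controls. Cross-validated penalties stand in for hdm's theory-driven penalty rule; the data-generating process is synthetic.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
d = X[:, 0] + rng.normal(size=n)             # treatment depends on a control
y = 0.5 * d + X[:, 0] + rng.normal(size=n)   # true treatment effect is 0.5

sel_y = np.abs(LassoCV(cv=5).fit(X, y).coef_) > 1e-6   # predicts outcome
sel_d = np.abs(LassoCV(cv=5).fit(X, d).coef_) > 1e-6   # predicts treatment
controls = X[:, sel_y | sel_d]                         # union of both sets
ols = LinearRegression().fit(np.column_stack([d, controls]), y)
print("estimated treatment effect:", ols.coef_[0])
```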
Effective Quantization Methods for Recurrent Neural Networks
Title | Effective Quantization Methods for Recurrent Neural Networks |
Authors | Qinyao He, He Wen, Shuchang Zhou, Yuxin Wu, Cong Yao, Xinyu Zhou, Yuheng Zou |
Abstract | Reducing the bit-widths of the weights, activations, and gradients of a neural network can shrink its storage size and memory usage, and also allow for faster training and inference by exploiting bitwise operations. However, previous attempts to quantize RNNs show considerable performance degradation when using low bit-width weights and activations. In this paper, we propose methods to quantize the structure of gates and interlinks in LSTM and GRU cells. In addition, we propose balanced quantization methods for weights to further reduce performance degradation. Experiments on the PTB and IMDB datasets confirm the effectiveness of our methods, as the performance of our models matches or surpasses the previous state of the art for quantized RNNs. |
Tasks | Quantization |
Published | 2016-11-30 |
URL | http://arxiv.org/abs/1611.10176v1 |
PDF | http://arxiv.org/pdf/1611.10176v1.pdf |
PWC | https://paperswithcode.com/paper/effective-quantization-methods-for-recurrent |
Repo | https://github.com/qinyao-he/bit-rnn |
Framework | tf |
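A rough sketch of k-bit uniform weight quantization with a simple balancing step: the weights are median-centered before being snapped to a uniform grid, a crude stand-in for the paper's balanced-distribution adjustment. Bit-width and tensor shape are illustrative.

```python
import numpy as np

def quantize(w, bits=2):
    w = w - np.median(w)                   # balance around zero (assumption)
    scale = np.max(np.abs(w)) + 1e-8
    levels = 2 ** bits - 1
    q = np.round((w / scale * 0.5 + 0.5) * levels) / levels  # snap to [0,1] grid
    return (q - 0.5) * 2 * scale           # map back to the original range

w = np.random.randn(256, 256)
print(np.unique(quantize(w, bits=2)).size)  # at most 2**bits distinct values
```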
CrowdNet: A Deep Convolutional Network for Dense Crowd Counting
Title | CrowdNet: A Deep Convolutional Network for Dense Crowd Counting |
Authors | Lokesh Boominathan, Srinivas S S Kruthiventi, R. Venkatesh Babu |
Abstract | Our work proposes a novel deep learning framework for estimating crowd density from static images of highly dense crowds. We use a combination of deep and shallow, fully convolutional networks to predict the density map for a given crowd image. Such a combination effectively captures both the high-level semantic information (face/body detectors) and the low-level features (blob detectors) that are necessary for crowd counting under large scale variations. As most crowd datasets have limited training samples (<100 images) and deep learning based approaches require large amounts of training data, we perform multi-scale data augmentation. Augmenting the training samples in such a manner helps guide the CNN to learn scale-invariant representations. Our method is tested on the challenging UCF_CC_50 dataset and shown to outperform state-of-the-art methods. |
Tasks | Crowd Counting, Data Augmentation |
Published | 2016-08-22 |
URL | http://arxiv.org/abs/1608.06197v1 |
PDF | http://arxiv.org/pdf/1608.06197v1.pdf |
PWC | https://paperswithcode.com/paper/crowdnet-a-deep-convolutional-network-for |
Repo | https://github.com/violin0847/crowdcounting |
Framework | none |
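A structural sketch of the deep/shallow combination: one column with more depth (high-level, face/body-like evidence), one shallow column (blob-like evidence), concatenated and reduced to a single-channel density map whose integral is the count. Channel counts and depths are illustrative, not the paper's VGG-based configuration.

```python
import torch
import torch.nn as nn

deep = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
)
shallow = nn.Sequential(
    nn.Conv2d(3, 24, 5, padding=2), nn.ReLU(),
    nn.AvgPool2d(2),
    nn.Conv2d(24, 24, 5, padding=2), nn.ReLU(),
)
fuse = nn.Conv2d(128 + 24, 1, 1)     # 1x1 conv -> density map

x = torch.randn(1, 3, 224, 224)
density = fuse(torch.cat([deep(x), shallow(x)], dim=1))
count = density.sum()                # crowd count = integral of the density map
```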
PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI
Title | PVR: Patch-to-Volume Reconstruction for Large Area Motion Correction of Fetal MRI |
Authors | Amir Alansary, Bernhard Kainz, Martin Rajchl, Maria Murgasova, Mellisa Damodaram, David F. A. Lloyd, Alice Davidson, Steven G. McDonagh, Mary Rutherford, Joseph V. Hajnal, Daniel Rueckert |
Abstract | In this paper we present a novel method for the correction of motion artifacts that are present in fetal Magnetic Resonance Imaging (MRI) scans of the whole uterus. Contrary to current slice-to-volume registration (SVR) methods, which require an inflexible anatomical enclosure of a single investigated organ, the proposed patch-to-volume reconstruction (PVR) approach is able to reconstruct a large field of view of non-rigidly deforming structures. It relaxes rigid motion assumptions by introducing a specific amount of redundant information that is exploited with parallelized patch-wise optimization, super-resolution, and automatic outlier rejection. We further describe and provide an efficient parallel implementation of PVR allowing its execution within reasonable time on commercially available graphics processing units (GPUs), enabling its use in clinical practice. We evaluate PVR’s computational overhead compared to standard methods and, in synthetic experiments, observe an improvement in reconstruction accuracy of approximately 30% over conventional SVR in the presence of affine motion artifacts. Furthermore, we have evaluated our method qualitatively and quantitatively on real fetal MRI data subject to maternal breathing and sudden fetal movements. We evaluate peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and cross-correlation (CC) with respect to the originally acquired data and provide a method for visual inspection of reconstruction uncertainty. With these experiments we demonstrate successful application of PVR motion compensation to the whole uterus, the human fetus, and the human placenta. |
Tasks | Motion Compensation, Super-Resolution |
Published | 2016-11-22 |
URL | http://arxiv.org/abs/1611.07289v2 |
PDF | http://arxiv.org/pdf/1611.07289v2.pdf |
PWC | https://paperswithcode.com/paper/pvr-patch-to-volume-reconstruction-for-large |
Repo | https://github.com/bkainz/fetalReconstruction |
Framework | none |
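PVR is evaluated with PSNR, SSIM, and cross-correlation against the acquired data; here is a minimal NumPy version of two of those metrics (SSIM needs windowed statistics and is omitted), assuming intensities normalized to [0, 1] and synthetic volumes in place of MRI data.

```python
import numpy as np

def psnr(ref, rec):
    mse = np.mean((ref - rec) ** 2)
    return 10 * np.log10(1.0 / mse)        # peak value 1.0 by assumption

def cross_correlation(ref, rec):
    a, b = ref - ref.mean(), rec - rec.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

vol_ref = np.random.rand(64, 64, 64)                       # stand-in volume
vol_rec = vol_ref + 0.05 * np.random.randn(64, 64, 64)     # noisy "reconstruction"
print(psnr(vol_ref, vol_rec), cross_correlation(vol_ref, vol_rec))
```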
Grad-CAM: Why did you say that?
Title | Grad-CAM: Why did you say that? |
Authors | Ramprasaath R Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra |
Abstract | We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are ‘important’ for predictions – or visual explanations. Our approach, called Gradient-weighted Class Activation Mapping (Grad-CAM), uses class-specific gradient information to localize important regions. These localizations are combined with existing pixel-space visualizations to create a novel high-resolution and class-discriminative visualization called Guided Grad-CAM. These methods help better understand CNN-based models, including image captioning and visual question answering (VQA) models. We evaluate our visual explanations by measuring their ability to discriminate between classes, to inspire trust in humans, and their correlation with occlusion maps. Grad-CAM provides a new way to understand CNN-based models. We have released code, an online demo hosted on CloudCV, and a full version of this extended abstract. |
Tasks | Image Captioning, Visual Question Answering |
Published | 2016-11-22 |
URL | http://arxiv.org/abs/1611.07450v2 |
PDF | http://arxiv.org/pdf/1611.07450v2.pdf |
PWC | https://paperswithcode.com/paper/grad-cam-why-did-you-say-that |
Repo | https://github.com/ramprs/grad-cam |
Framework | torch |
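Grad-CAM in a few lines of PyTorch: global-average-pool the gradient of the class score over a convolutional feature map to obtain per-channel weights, then take a ReLU of the weighted sum of the maps. The choice of layer (the last VGG-16 conv) follows common practice and is an assumption here; any conv layer can be used.

```python
import torch
import torchvision

model = torchvision.models.vgg16(weights=None).eval()
feats, grads = {}, {}
layer = model.features[28]                       # last conv layer of VGG-16
layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)
score = model(x)[0].max()                        # score of the top class
score.backward()

weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # alpha_k: GAP of gradients
cam = torch.relu((weights * feats["a"]).sum(dim=1))  # class activation map
```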
Fast Patch-based Style Transfer of Arbitrary Style
Title | Fast Patch-based Style Transfer of Arbitrary Style |
Authors | Tian Qi Chen, Mark Schmidt |
Abstract | Artistic style transfer is an image synthesis problem where the content of an image is reproduced with the style of another. Recent works show that a visually appealing style transfer can be achieved by using the hidden activations of a pretrained convolutional neural network. However, existing methods either (i) apply an optimization procedure that works for any style image but is very expensive, or (ii) use an efficient feedforward network that only allows a limited number of trained styles. In this work we propose a simpler optimization objective based on local matching that combines the content structure and style textures in a single layer of the pretrained network. We show that our objective has desirable properties such as a simpler optimization landscape, intuitive parameter tuning, and consistent frame-by-frame performance on video. Furthermore, we use 80,000 natural images and 80,000 paintings to train an inverse network that approximates the result of the optimization. This results in a procedure for artistic style transfer that is efficient but also allows arbitrary content and style images. |
Tasks | Image Generation, Style Transfer |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04337v1 |
PDF | http://arxiv.org/pdf/1612.04337v1.pdf |
PWC | https://paperswithcode.com/paper/fast-patch-based-style-transfer-of-arbitrary |
Repo | https://github.com/JianqiangRen/AAMS |
Framework | tf |
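A sketch of the local-matching ("style swap") step at the heart of the method: every 3x3 patch of the content activations is replaced by its best-matching style patch under normalized cross-correlation. Using the style patches as convolution filters turns the matching into a single conv; overlapping pastes are averaged. Feature sizes are illustrative.

```python
import torch
import torch.nn.functional as F

def style_swap(content, style, k=3):
    # Extract all kxk style patches as conv filters: (n_patches, C, k, k).
    patches = F.unfold(style, k).transpose(1, 2).reshape(-1, style.size(1), k, k)
    norm = patches / (patches.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-8)
    scores = F.conv2d(content, norm)            # NCC match score per style patch
    best = scores.argmax(dim=1)                 # index of the winning patch
    one_hot = F.one_hot(best, patches.size(0)).permute(0, 3, 1, 2).float()
    out = F.conv_transpose2d(one_hot, patches)  # paste the winning patches
    overlap = F.conv_transpose2d(one_hot, torch.ones_like(patches))
    return out / overlap                        # average overlapping pastes

c, s = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(style_swap(c, s).shape)                   # same shape as the content features
```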
Structured prediction models for RNN based sequence labeling in clinical text
Title | Structured prediction models for RNN based sequence labeling in clinical text |
Authors | Abhyuday Jagannatha, Hong Yu |
Abstract | Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models combined with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities. |
Tasks | Named Entity Recognition, Structured Prediction |
Published | 2016-08-01 |
URL | http://arxiv.org/abs/1608.00612v1 |
PDF | http://arxiv.org/pdf/1608.00612v1.pdf |
PWC | https://paperswithcode.com/paper/structured-prediction-models-for-rnn-based |
Repo | https://github.com/abhyudaynj/LSTM-CRF-models |
Framework | none |
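A minimal sketch of linear-chain CRF scoring on top of RNN emissions: a tag sequence scores the sum of its per-token emission scores plus learned pairwise transition potentials between adjacent tags. Training would maximize this score minus the log-partition function (via the forward algorithm), which is omitted here; tag set and lengths are illustrative.

```python
import torch

n_tags, T = 5, 10
emissions = torch.randn(T, n_tags)        # from a biLSTM, one row per token
transitions = torch.randn(n_tags, n_tags) # pairwise potentials between tags
tags = torch.randint(0, n_tags, (T,))     # a candidate tag sequence

score = emissions[torch.arange(T), tags].sum()       # unary (emission) terms
score = score + transitions[tags[:-1], tags[1:]].sum()  # pairwise terms
print(score)
```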
Guided Alignment Training for Topic-Aware Neural Machine Translation
Title | Guided Alignment Training for Topic-Aware Neural Machine Translation |
Authors | Wenhu Chen, Evgeny Matusov, Shahram Khadivi, Jan-Thorsten Peter |
Abstract | In this paper, we propose an effective way of biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by many unknown words and a large type/token ratio. We also show that meta-data associated with input texts, such as topic or category information, can significantly improve translation quality when used as an additional signal to the decoder part of the network. With both novel features, the BLEU score of the NMT system on a product title set improves from 18.6% to 21.3%. Even larger MT quality gains are obtained through domain adaptation of a general-domain NMT system to e-commerce data. The developed NMT system also performs well on the IWSLT speech translation task, where an ensemble of four variant systems outperforms the phrase-based baseline by 2.1% BLEU absolute. |
Tasks | Domain Adaptation, Machine Translation, Word Alignment |
Published | 2016-07-06 |
URL | http://arxiv.org/abs/1607.01628v1 |
PDF | http://arxiv.org/pdf/1607.01628v1.pdf |
PWC | https://paperswithcode.com/paper/guided-alignment-training-for-topic-aware |
Repo | https://github.com/wenhuchen/iwslt-2015-de-en-topics |
Framework | none |
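The guided-alignment idea in a few lines: penalize the decoder's attention matrix A (target x source) for deviating from a reference alignment G produced by a statistical aligner, alongside the usual translation loss. The cross-entropy form, the toy one-to-one alignment, and the weighting factor 0.5 are all illustrative assumptions.

```python
import torch

tgt_len, src_len = 7, 9
A = torch.softmax(torch.randn(tgt_len, src_len), dim=-1)  # model attention
G = torch.zeros(tgt_len, src_len)
G[torch.arange(tgt_len), torch.arange(tgt_len)] = 1.0     # toy 1:1 alignment

nmt_loss = torch.tensor(0.0)  # stands in for the usual token cross-entropy
align_loss = -(G * torch.log(A + 1e-8)).sum() / tgt_len   # alignment cross-entropy
total_loss = nmt_loss + 0.5 * align_loss                  # combined objective
```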
Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention
Title | Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention |
Authors | Yang Liu, Chengjie Sun, Lei Lin, Xiaolong Wang |
Abstract | In this paper, we propose a sentence encoding-based model for recognizing text entailment. In our approach, encoding a sentence is a two-stage process. First, average pooling over word-level bidirectional LSTM (biLSTM) states is used to generate a first-stage sentence representation. Second, an attention mechanism replaces average pooling on the same sentence for a better representation. Instead of using the target sentence to attend to words in the source sentence, we use the sentence’s first-stage representation to attend to words appearing in the sentence itself, which we call “Inner-Attention”. Experiments conducted on the Stanford Natural Language Inference (SNLI) Corpus demonstrate the effectiveness of the “Inner-Attention” mechanism. With fewer parameters, our model outperforms the existing best sentence encoding-based approach by a large margin. |
Tasks | Natural Language Inference |
Published | 2016-05-30 |
URL | http://arxiv.org/abs/1605.09090v1 |
PDF | http://arxiv.org/pdf/1605.09090v1.pdf |
PWC | https://paperswithcode.com/paper/learning-natural-language-inference-using |
Repo | https://github.com/Smerity/keras_snli |
Framework | none |
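A sketch of the two-stage encoding: mean-pool the biLSTM states into a first-stage vector, then let that vector attend over the same states ("Inner-Attention") to produce the final sentence representation. The dot-product attention form and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

T, d = 12, 128
bilstm = nn.LSTM(300, d // 2, bidirectional=True, batch_first=True)
states, _ = bilstm(torch.randn(1, T, 300))           # (1, T, d) word states

stage1 = states.mean(dim=1)                          # stage 1: average pooling
attn = torch.softmax(states @ stage1.unsqueeze(-1), dim=1)  # (1, T, 1) weights
sentence = (attn * states).sum(dim=1)                # stage 2: attentive pooling
```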
Fully-Convolutional Siamese Networks for Object Tracking
Title | Fully-Convolutional Siamese Networks for Object Tracking |
Authors | Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, Philip H. S. Torr |
Abstract | The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object’s appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to exploit the expressive power of deep convolutional networks. However, when the object to track is not known beforehand, it is necessary to perform Stochastic Gradient Descent online to adapt the weights of the network, severely compromising the speed of the system. In this paper we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video. Our tracker operates at frame-rates beyond real-time and, despite its extreme simplicity, achieves state-of-the-art performance in multiple benchmarks. |
Tasks | Object Detection, Object Tracking |
Published | 2016-06-30 |
URL | http://arxiv.org/abs/1606.09549v2 |
PDF | http://arxiv.org/pdf/1606.09549v2.pdf |
PWC | https://paperswithcode.com/paper/fully-convolutional-siamese-networks-for-1 |
Repo | https://github.com/suraj-maniyar/Object-Tracking-SSD300 |
Framework | pytorch |
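The core of the fully-convolutional Siamese tracker in a few lines: one shared convolutional embedding is applied to the exemplar z and the larger search region x, and the two feature maps are cross-correlated; the peak of the response map locates the target. The two-layer embedding is a toy stand-in for the paper's AlexNet-style backbone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

embed = nn.Sequential(nn.Conv2d(3, 32, 3), nn.ReLU(), nn.Conv2d(32, 64, 3))
z = torch.randn(1, 3, 31, 31)      # exemplar (target crop)
x = torch.randn(1, 3, 127, 127)    # search region around the last position

fz, fx = embed(z), embed(x)        # shared weights: a Siamese embedding
response = F.conv2d(fx, fz)        # cross-correlation via convolution
peak = response.flatten().argmax() # peak location -> new target position
```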
Learning Unitary Operators with Help From u(n)
Title | Learning Unitary Operators with Help From u(n) |
Authors | Stephanie L. Hyland, Gunnar Rätsch |
Abstract | A major challenge in the training of recurrent neural networks is the so-called vanishing or exploding gradient problem. The use of a norm-preserving transition operator can address this issue, but parametrization is challenging. In this work we focus on unitary operators and describe a parametrization using the Lie algebra $\mathfrak{u}(n)$ associated with the Lie group $U(n)$ of $n \times n$ unitary matrices. The exponential map provides a correspondence between these spaces, and allows us to define a unitary matrix using $n^2$ real coefficients relative to a basis of the Lie algebra. The parametrization is closed under additive updates of these coefficients, and thus provides a simple space in which to do gradient descent. We demonstrate the effectiveness of this parametrization on the problem of learning arbitrary unitary operators, comparing to several baselines and outperforming a recently-proposed lower-dimensional parametrization. We additionally use our parametrization to generalize a recently-proposed unitary recurrent neural network to arbitrary unitary matrices, using it to solve standard long-memory tasks. |
Tasks | |
Published | 2016-07-17 |
URL | http://arxiv.org/abs/1607.04903v3 |
PDF | http://arxiv.org/pdf/1607.04903v3.pdf |
PWC | https://paperswithcode.com/paper/learning-unitary-operators-with-help-from-un |
Repo | https://github.com/ratschlab/uRNN |
Framework | none |
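A sketch of the parametrization: n^2 real coefficients determine a skew-Hermitian matrix L in the Lie algebra u(n), and the matrix exponential maps it to a unitary matrix, so additive updates on the coefficients never leave the unitary group. The split of the coefficient matrix into antisymmetric/symmetric parts is one convenient basis choice, not necessarily the paper's.

```python
import torch

n = 4
coeffs = torch.randn(n, n)     # the n^2 real parameters

# Real part antisymmetric, imaginary part symmetric => L is skew-Hermitian.
A = torch.triu(coeffs, 1) - torch.triu(coeffs, 1).T
S = torch.tril(coeffs, -1) + torch.tril(coeffs, -1).T + torch.diag(coeffs.diag())
L = torch.complex(A, S)        # an element of the Lie algebra u(n)
U = torch.matrix_exp(L)        # exponential map onto the unitary group U(n)

print(torch.allclose(U @ U.conj().T, torch.eye(n, dtype=U.dtype), atol=1e-5))
```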
Efficient Metric Learning for the Analysis of Motion Data
Title | Efficient Metric Learning for the Analysis of Motion Data |
Authors | Babak Hosseini, Barbara Hammer |
Abstract | We investigate metric learning in the context of dynamic time warping (DTW), by far the most popular dissimilarity measure used for the comparison and analysis of motion capture data. While metric learning enables a problem-adapted representation of data, the majority of methods have been proposed for vectorial data only. In this contribution, we extend the popular principle offered by the large margin nearest neighbors learner (LMNN) to DTW by treating the resulting component-wise dissimilarity values as features. We demonstrate that this principle greatly enhances the classification accuracy in several benchmarks. Further, we show that recent auxiliary concepts such as metric regularization can be transferred from the vectorial case to component-wise DTW in a similar way. We illustrate that metric regularization constitutes a crucial prerequisite for the interpretation of the resulting relevance profiles. |
Tasks | Metric Learning, Motion Capture |
Published | 2016-10-17 |
URL | http://arxiv.org/abs/1610.05083v3 |
PDF | http://arxiv.org/pdf/1610.05083v3.pdf |
PWC | https://paperswithcode.com/paper/efficient-metric-learning-for-the-analysis-of |
Repo | https://github.com/bab-git/dist-LMNN |
Framework | none |
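A sketch of the paper's key move: run DTW separately on every channel of two motion sequences and treat the resulting vector of component-wise dissimilarities as the feature vector on which LMNN-style metric learning then operates. A plain O(T^2) DTW with absolute-difference cost is used; sequence lengths and channel count are illustrative.

```python
import numpy as np

def dtw(a, b):
    T, S = len(a), len(b)
    D = np.full((T + 1, S + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, S + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T, S]

seq_a = np.random.randn(40, 10)   # 40 frames, 10 channels
seq_b = np.random.randn(55, 10)   # different length is fine for DTW
features = np.array([dtw(seq_a[:, k], seq_b[:, k]) for k in range(10)])
print(features.shape)             # one dissimilarity value per channel
```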
Residual Networks of Residual Networks: Multilevel Residual Networks
Title | Residual Networks of Residual Networks: Multilevel Residual Networks |
Authors | Ke Zhang, Miao Sun, Tony X. Han, Xingfang Yuan, Liru Guo, Tao Liu |
Abstract | A family of residual networks with hundreds or even thousands of layers dominates major image recognition tasks, but building a network by simply stacking residual blocks inevitably limits its optimization ability. This paper proposes a novel residual-network architecture, Residual networks of Residual networks (RoR), to further exploit the optimization ability of residual networks. RoR substitutes optimizing residual mappings of residual mappings for optimizing original residual mappings. In particular, RoR adds level-wise shortcut connections upon original residual networks to promote the learning capability of residual networks. More importantly, RoR can be applied to various kinds of residual networks (ResNets, Pre-ResNets and WRN) and significantly boost their performance. Our experiments demonstrate the effectiveness and versatility of RoR, where it achieves the best performance in all residual-network-like structures. Our RoR-3-WRN58-4+SD models achieve new state-of-the-art results on CIFAR-10, CIFAR-100 and SVHN, with test errors of 3.77%, 19.73% and 1.59%, respectively. RoR-3 models also achieve state-of-the-art results compared to ResNets on the ImageNet data set. |
Tasks | Image Classification |
Published | 2016-08-09 |
URL | http://arxiv.org/abs/1608.02908v2 |
PDF | http://arxiv.org/pdf/1608.02908v2.pdf |
PWC | https://paperswithcode.com/paper/residual-networks-of-residual-networks |
Repo | https://github.com/osmr/imgclsmob |
Framework | mxnet |
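A structural sketch of the level-wise shortcuts: an extra shortcut spans a whole group of ordinary residual blocks, so the group learns a residual of a residual. Block design and sizes are illustrative, not the paper's RoR-3/WRN configuration.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.ReLU(),
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c))
    def forward(self, x):
        return torch.relu(x + self.body(x))    # ordinary residual block

class Group(nn.Module):
    def __init__(self, c, n_blocks=4):
        super().__init__()
        self.blocks = nn.Sequential(*[Block(c) for _ in range(n_blocks)])
    def forward(self, x):
        return torch.relu(x + self.blocks(x))  # level-wise shortcut over the group

print(Group(16)(torch.randn(1, 16, 32, 32)).shape)
```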