May 7, 2019

2903 words 14 mins read

Paper Group AWR 37

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies. TGIF: A New Dataset and Benchmark on Animated GIF Description. Video Description using Bidirectional Recurrent Neural Networks. High-Dimensional Regularized Discriminant Analysis. MusicMood: Predicting the mood of music from song lyrics using machine learning. Improved Technique …

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

Title Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
Authors Tal Linzen, Emmanuel Dupoux, Yoav Goldberg
Abstract The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreement in English subject-verb dependencies. We probe the architecture’s grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when sequential and structural information conflicted. The frequency of such errors rose sharply in the language-modeling setting. We conclude that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
Tasks Language Modelling
Published 2016-11-04
URL http://arxiv.org/abs/1611.01368v1
PDF http://arxiv.org/pdf/1611.01368v1.pdf
PWC https://paperswithcode.com/paper/assessing-the-ability-of-lstms-to-learn
Repo https://github.com/icewing1996/bert-syntax
Framework none
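To make the probing setup concrete, here is a minimal sketch (not the authors' code) of the number-prediction objective: an LSTM reads the words preceding the verb and classifies whether the verb should be singular or plural. PyTorch is used for illustration, and the vocabulary size, dimensions, and batch are placeholders.

```python
# Hypothetical sketch of the number-prediction probe: an LSTM reads the prefix
# of the sentence up to (but not including) the verb and predicts its number.
import torch
import torch.nn as nn

class NumberPredictor(nn.Module):
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)   # SINGULAR vs. PLURAL

    def forward(self, prefix_ids):
        # prefix_ids: (batch, seq_len) word indices of the pre-verb context
        h, _ = self.lstm(self.embed(prefix_ids))
        return self.out(h[:, -1])             # logits from the last pre-verb state

model = NumberPredictor(vocab_size=10000)
logits = model(torch.randint(0, 10000, (32, 12)))            # toy batch
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (32,)))
```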

TGIF: A New Dataset and Benchmark on Animated GIF Description

Title TGIF: A New Dataset and Benchmark on Animated GIF Description
Authors Yuncheng Li, Yale Song, Liangliang Cao, Joel Tetreault, Larry Goldberg, Alejandro Jaimes, Jiebo Luo
Abstract With the recent popularity of animated GIFs on social media, there is need for ways to index them with rich metadata. To advance research on animated GIF understanding, we collected a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing. The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips. To ensure a high quality dataset, we developed a series of novel quality controls to validate free-form text input from crowdworkers. We show that there is unambiguous association between visual content and natural language descriptions in our dataset, making it an ideal benchmark for the visual content captioning task. We perform extensive statistical analyses to compare our dataset to existing image and video description datasets. Next, we provide baseline results on the animated GIF description task, using three representative techniques: nearest neighbor, statistical machine translation, and recurrent neural networks. Finally, we show that models fine-tuned from our animated GIF description dataset can be helpful for automatic movie description.
Tasks Image Captioning, Machine Translation, Text Generation, Video Description
Published 2016-04-10
URL http://arxiv.org/abs/1604.02748v2
PDF http://arxiv.org/pdf/1604.02748v2.pdf
PWC https://paperswithcode.com/paper/tgif-a-new-dataset-and-benchmark-on-animated
Repo https://github.com/raingo/TGIF-Release
Framework none
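Of the three baselines mentioned in the abstract, the nearest-neighbour one is easy to sketch: return the caption of the training GIF whose visual features are most similar to the query. The feature extractor and dimensions below are placeholders, not the paper's setup.

```python
# Minimal nearest-neighbour captioning baseline over precomputed GIF features.
import numpy as np

def nn_caption(query_feat, train_feats, train_captions):
    # cosine similarity between the query GIF and every training GIF
    sims = train_feats @ query_feat / (
        np.linalg.norm(train_feats, axis=1) * np.linalg.norm(query_feat) + 1e-8)
    return train_captions[int(np.argmax(sims))]

train_feats = np.random.randn(100, 2048)                # stand-in CNN features
train_captions = [f"caption {i}" for i in range(100)]   # stand-in descriptions
print(nn_caption(np.random.randn(2048), train_feats, train_captions))
```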

Video Description using Bidirectional Recurrent Neural Networks

Title Video Description using Bidirectional Recurrent Neural Networks
Authors Álvaro Peris, Marc Bolaños, Petia Radeva, Francisco Casacuberta
Abstract Although traditionally used in the machine translation field, the encoder-decoder framework has been recently applied for the generation of video and image descriptions. The combination of Convolutional and Recurrent Neural Networks in these models has proven to outperform the previous state of the art, obtaining more accurate video descriptions. In this work we propose pushing further this model by introducing two contributions into the encoding stage. First, producing richer image representations by combining object and location information from Convolutional Neural Networks and second, introducing Bidirectional Recurrent Neural Networks for capturing both forward and backward temporal relationships in the input frames.
Tasks Text Generation, Video Captioning, Video Description
Published 2016-04-12
URL http://arxiv.org/abs/1604.03390v2
PDF http://arxiv.org/pdf/1604.03390v2.pdf
PWC https://paperswithcode.com/paper/video-description-using-bidirectional
Repo https://github.com/lvapeab/ABiViRNet
Framework tf
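The two encoding-stage contributions translate directly into a small module: per-frame object and location features are concatenated and fed to a bidirectional LSTM whose states would then condition the caption decoder. This is a hedged PyTorch sketch with invented dimensions, not the ABiViRNet implementation.

```python
# Sketch of the encoder idea: concatenate object and location CNN features per
# frame, then run a bidirectional LSTM over the frame sequence.
import torch
import torch.nn as nn

class BiVideoEncoder(nn.Module):
    def __init__(self, obj_dim=1024, loc_dim=1024, hidden=512):
        super().__init__()
        self.bilstm = nn.LSTM(obj_dim + loc_dim, hidden,
                              batch_first=True, bidirectional=True)

    def forward(self, obj_feats, loc_feats):
        # obj_feats, loc_feats: (batch, frames, dim) from two pretrained CNNs
        frames = torch.cat([obj_feats, loc_feats], dim=-1)
        states, _ = self.bilstm(frames)        # forward + backward temporal context
        return states                          # would be fed to the caption decoder

enc = BiVideoEncoder()
out = enc(torch.randn(2, 16, 1024), torch.randn(2, 16, 1024))
print(out.shape)                               # (2, 16, 1024) = hidden * 2
```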

High-Dimensional Regularized Discriminant Analysis

Title High-Dimensional Regularized Discriminant Analysis
Authors John A. Ramey, Caleb K. Stein, Phil D. Young, Dean M. Young
Abstract Regularized discriminant analysis (RDA), proposed by Friedman (1989), is a widely popular classifier that lacks interpretability and is impractical for high-dimensional data sets. Here, we present an interpretable and computationally efficient classifier called high-dimensional RDA (HDRDA), designed for the small-sample, high-dimensional setting. For HDRDA, we show that each training observation, regardless of class, contributes to the class covariance matrix, resulting in an interpretable estimator that borrows from the pooled sample covariance matrix. Moreover, we show that HDRDA is equivalent to a classifier in a reduced-feature space with dimension approximately equal to the training sample size. As a result, the matrix operations employed by HDRDA are computationally linear in the number of features, making the classifier well-suited for high-dimensional classification in practice. We demonstrate that HDRDA is often superior to several sparse and regularized classifiers in terms of classification accuracy with three artificial and six real high-dimensional data sets. Also, timing comparisons between our HDRDA implementation in the sparsediscrim R package and the standard RDA formulation in the klaR R package demonstrate that as the number of features increases, the computational runtime of HDRDA is drastically smaller than that of RDA.
Tasks
Published 2016-02-03
URL http://arxiv.org/abs/1602.01182v2
PDF http://arxiv.org/pdf/1602.01182v2.pdf
PWC https://paperswithcode.com/paper/high-dimensional-regularized-discriminant
Repo https://github.com/ramhiser/paper-hdrda
Framework none
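HDRDA builds on Friedman's RDA covariance shrinkage, in which each class covariance is pulled toward the pooled estimate and then toward a scaled identity. The numpy sketch below shows that shrinkage step only, with made-up data; it does not reproduce the reduced-feature-space computations that make HDRDA efficient.

```python
# Sketch of the covariance shrinkage underlying (HD)RDA: each class covariance
# is a convex combination of the class estimate and the pooled estimate.
import numpy as np

def rda_covariance(X_k, pooled_cov, lam, gamma):
    # lam pulls the class covariance toward the pooled covariance,
    # gamma then pulls it toward a scaled identity (Friedman, 1989).
    S_k = np.cov(X_k, rowvar=False)
    S = (1 - lam) * S_k + lam * pooled_cov
    return (1 - gamma) * S + gamma * np.trace(S) / S.shape[0] * np.eye(S.shape[0])

X_k = np.random.randn(12, 50)        # small-n, large-p class sample (placeholder)
pooled = np.eye(50)                  # stand-in for the pooled covariance estimate
print(rda_covariance(X_k, pooled, lam=0.5, gamma=0.1).shape)   # (50, 50)
```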

MusicMood: Predicting the mood of music from song lyrics using machine learning

Title MusicMood: Predicting the mood of music from song lyrics using machine learning
Authors Sebastian Raschka
Abstract Sentiment prediction of contemporary music can have a wide range of applications in modern society, for instance, selecting music for public institutions such as hospitals or restaurants to potentially improve the emotional well-being of personnel, patients, and customers. In this project, a music recommendation system was built on a naive Bayes classifier trained to predict the sentiment of songs based on song lyrics alone. The experimental results show that music corresponding to a happy mood can be detected with high precision based on text features obtained from song lyrics.
Tasks
Published 2016-11-01
URL http://arxiv.org/abs/1611.00138v1
PDF http://arxiv.org/pdf/1611.00138v1.pdf
PWC https://paperswithcode.com/paper/musicmood-predicting-the-mood-of-music-from
Repo https://github.com/rasbt/musicmood
Framework none
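A bag-of-words naive Bayes mood classifier of the kind described above can be sketched in a few lines of scikit-learn; the two-line "training set" is only a placeholder for the lyrics corpus.

```python
# Minimal lyrics-mood classifier: bag-of-words features + multinomial naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

lyrics = ["sunshine and dancing all night", "tears falling in the cold rain"]
moods = ["happy", "sad"]

clf = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
clf.fit(lyrics, moods)
print(clf.predict(["dancing in the sunshine"]))   # -> ['happy']
```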

Improved Techniques for Training GANs

Title Improved Techniques for Training GANs
Authors Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen
Abstract We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, nor do we require the model to be able to learn well without using any labels. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: our model generates MNIST samples that humans cannot distinguish from real data, and CIFAR-10 samples that yield a human error rate of 21.3%. We also present ImageNet samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes.
Tasks Conditional Image Generation, Image Generation, Semi-Supervised Image Classification
Published 2016-06-10
URL http://arxiv.org/abs/1606.03498v1
PDF http://arxiv.org/pdf/1606.03498v1.pdf
PWC https://paperswithcode.com/paper/improved-techniques-for-training-gans
Repo https://github.com/vuanhtu1993/Keras-SRGANs
Framework tf
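One of the paper's training techniques, feature matching, replaces the usual generator objective with the distance between discriminator-feature statistics on real and generated batches. The sketch below assumes a callable d_features exposing an intermediate discriminator layer; it is illustrative, not the released implementation.

```python
# Feature-matching generator loss: match the mean of an intermediate
# discriminator feature on real data (treated as a fixed target) vs. fake data.
import torch

def feature_matching_loss(d_features, real_x, fake_x):
    # d_features: returns intermediate discriminator activations, (batch, feat_dim)
    real_f = d_features(real_x).mean(dim=0).detach()   # target statistic
    fake_f = d_features(fake_x).mean(dim=0)            # gradient flows to generator
    return ((real_f - fake_f) ** 2).sum()
```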

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

Title A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
Authors Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
Abstract Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layers include shortcut connections to lower-level task predictions to reflect linguistic hierarchies. We use a simple regularization term to allow for optimizing all model weights to improve one task’s loss without exhibiting catastrophic interference of the other tasks. Our single end-to-end model obtains state-of-the-art or competitive results on five different tasks from tagging, parsing, relatedness, and entailment tasks.
Tasks Chunking, Multi-Task Learning
Published 2016-11-05
URL http://arxiv.org/abs/1611.01587v5
PDF http://arxiv.org/pdf/1611.01587v5.pdf
PWC https://paperswithcode.com/paper/a-joint-many-task-model-growing-a-neural
Repo https://github.com/rubythonode/joint-many-task-model
Framework tf
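The "growing depth with shortcut connections" idea can be sketched for the lowest two tasks: the chunking layer receives the word embeddings, the POS layer's hidden states, and its soft label predictions. Dimensions and label counts are placeholders, and the paper uses learned label embeddings rather than the raw softmax outputs used here.

```python
# Rough sketch of stacked tasks with shortcut connections (POS -> chunking).
import torch
import torch.nn as nn

class JMTSketch(nn.Module):
    def __init__(self, vocab, emb=100, hid=100, n_pos=45, n_chunk=23):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.pos_lstm = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.pos_out = nn.Linear(2 * hid, n_pos)
        self.chunk_lstm = nn.LSTM(emb + 2 * hid + n_pos, hid,
                                  batch_first=True, bidirectional=True)
        self.chunk_out = nn.Linear(2 * hid, n_chunk)

    def forward(self, words):
        e = self.embed(words)
        h_pos, _ = self.pos_lstm(e)
        pos_logits = self.pos_out(h_pos)
        # shortcut: higher task sees embeddings, lower states and lower predictions
        chunk_in = torch.cat([e, h_pos, pos_logits.softmax(-1)], dim=-1)
        h_chunk, _ = self.chunk_lstm(chunk_in)
        return pos_logits, self.chunk_out(h_chunk)

pos, chunk = JMTSketch(vocab=5000)(torch.randint(0, 5000, (8, 20)))
```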

Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex

Title Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex
Authors Qianli Liao, Tomaso Poggio
Abstract We discuss relations between Residual Networks (ResNet), Recurrent Neural Networks (RNNs) and the primate visual cortex. We begin with the observation that a shallow RNN is exactly equivalent to a very deep ResNet with weight sharing among the layers. A direct implementation of such an RNN, although having orders of magnitude fewer parameters, leads to a performance similar to the corresponding ResNet. We propose 1) a generalization of both RNN and ResNet architectures and 2) the conjecture that a class of moderately deep RNNs is a biologically-plausible model of the ventral stream in visual cortex. We demonstrate the effectiveness of the architectures by testing them on the CIFAR-10 dataset.
Tasks
Published 2016-04-13
URL http://arxiv.org/abs/1604.03640v1
PDF http://arxiv.org/pdf/1604.03640v1.pdf
PWC https://paperswithcode.com/paper/bridging-the-gaps-between-residual-learning
Repo https://github.com/ry/tensorflow-resnet
Framework tf
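The central observation, that a shallow RNN with a shared transition is a very deep ResNet with tied weights, fits in a few lines: unrolling h_{t+1} = h_t + f(h_t) for T steps gives a T-block ResNet whose blocks share parameters. The block below is an illustrative stand-in, not the exact architecture evaluated in the paper.

```python
# Unrolling one residual block with shared weights behaves like a deep ResNet
# whose layers are tied.
import torch
import torch.nn as nn

class SharedResidualRNN(nn.Module):
    def __init__(self, channels=64, steps=5):
        super().__init__()
        self.f = nn.Sequential(                    # the single shared transition
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.steps = steps

    def forward(self, h):
        for _ in range(self.steps):                # h_{t+1} = h_t + f(h_t)
            h = h + self.f(h)
        return h

print(SharedResidualRNN()(torch.randn(1, 64, 32, 32)).shape)
```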

Stick-Breaking Variational Autoencoders

Title Stick-Breaking Variational Autoencoders
Authors Eric Nalisnick, Padhraic Smyth
Abstract We extend Stochastic Gradient Variational Bayes to perform posterior inference for the weights of Stick-Breaking processes. This development allows us to define a Stick-Breaking Variational Autoencoder (SB-VAE), a Bayesian nonparametric version of the variational autoencoder that has a latent representation with stochastic dimensionality. We experimentally demonstrate that the SB-VAE, and a semi-supervised variant, learn highly discriminative latent representations that often outperform the Gaussian VAE’s.
Tasks
Published 2016-05-20
URL http://arxiv.org/abs/1605.06197v3
PDF http://arxiv.org/pdf/1605.06197v3.pdf
PWC https://paperswithcode.com/paper/stick-breaking-variational-autoencoders
Repo https://github.com/enalisnick/stick-breaking_dgms
Framework none
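The latent code of the SB-VAE comes from the stick-breaking construction: posterior samples v_k in (0, 1) (e.g. from a Kumaraswamy distribution) are converted into weights pi_k = v_k * prod_{j<k}(1 - v_j), which sum to at most one and give the representation its stochastic dimensionality. A minimal sketch of that conversion:

```python
# Stick-breaking: turn stick fractions v into weights that sum to at most one.
import torch

def stick_breaking(v):
    # v: (batch, K) stick fractions, e.g. sampled from a Kumaraswamy posterior
    one_minus = torch.cumprod(1 - v, dim=-1)
    pi = v.clone()
    pi[:, 1:] = v[:, 1:] * one_minus[:, :-1]   # pi_k = v_k * prod_{j<k}(1 - v_j)
    return pi

pi = stick_breaking(torch.rand(4, 10))
print(pi.sum(dim=-1))                          # close to, but never above, 1
```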

Mapping Between fMRI Responses to Movies and their Natural Language Annotations

Title Mapping Between fMRI Responses to Movies and their Natural Language Annotations
Authors Kiran Vodrahalli, Po-Hsuan Chen, Yingyu Liang, Christopher Baldassano, Janice Chen, Esther Yong, Christopher Honey, Uri Hasson, Peter Ramadge, Ken Norman, Sanjeev Arora
Abstract Several research groups have shown how to correlate fMRI responses to the meanings of presented stimuli. This paper presents new methods for doing so when only a natural language annotation is available as the description of the stimulus. We study fMRI data gathered from subjects watching an episode of BBC’s Sherlock [1], and learn bidirectional mappings between fMRI responses and natural language representations. We show how to leverage data from multiple subjects watching the same movie to improve the accuracy of the mappings, allowing us to succeed at a scene classification task with 72% accuracy (random guessing would give 4%) and at a scene ranking task with average rank in the top 4% (random guessing would give 50%). The key ingredients are (a) the use of the Shared Response Model (SRM) and its variant SRM-ICA [2, 3] to aggregate fMRI data from multiple subjects, both of which are shown to be superior to standard PCA in producing low-dimensional representations for the tasks in this paper; (b) a sentence embedding technique adapted from the natural language processing (NLP) literature [4] that produces semantic vector representation of the annotations; (c) using previous timestep information in the featurization of the predictor data.
Tasks Scene Classification, Sentence Embedding
Published 2016-10-13
URL http://arxiv.org/abs/1610.03914v3
PDF http://arxiv.org/pdf/1610.03914v3.pdf
PWC https://paperswithcode.com/paper/mapping-between-fmri-responses-to-movies-and
Repo https://github.com/asprout/CPSC490
Framework none
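At its core the mapping step is a pair of regularized linear regressions between the shared (SRM-reduced) fMRI space and the annotation-embedding space. The sketch below uses ridge regression and random stand-in data; the actual dimensions, regularizers, and previous-timestep featurization in the paper differ.

```python
# Bidirectional linear maps between fMRI features and annotation embeddings.
import numpy as np
from sklearn.linear_model import Ridge

T, fmri_dim, text_dim = 500, 20, 100            # placeholder sizes
fmri = np.random.randn(T, fmri_dim)             # stand-in for SRM-reduced fMRI
text = np.random.randn(T, text_dim)             # stand-in for sentence embeddings

fmri_to_text = Ridge(alpha=1.0).fit(fmri, text)
text_to_fmri = Ridge(alpha=1.0).fit(text, fmri)
pred_text = fmri_to_text.predict(fmri)          # basis for scene classification/ranking
```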

Delving into Transferable Adversarial Examples and Black-box Attacks

Title Delving into Transferable Adversarial Examples and Black-box Attacks
Authors Yanpei Liu, Xinyun Chen, Chang Liu, Dawn Song
Abstract An intriguing property of deep neural networks is the existence of adversarial examples, which can transfer among different architectures. These transferable adversarial examples may severely hinder deep neural network-based applications. Previous works mostly study the transferability using small scale datasets. In this work, we are the first to conduct an extensive study of the transferability over large models and a large scale dataset, and we are also the first to study the transferability of targeted adversarial examples with their target labels. We study both non-targeted and targeted adversarial examples, and show that while transferable non-targeted adversarial examples are easy to find, targeted adversarial examples generated using existing approaches almost never transfer with their target labels. Therefore, we propose novel ensemble-based approaches to generating transferable adversarial examples. Using such approaches, we observe a large proportion of targeted adversarial examples that are able to transfer with their target labels for the first time. We also present some geometric studies to help understand the transferable adversarial examples. Finally, we show that the adversarial examples generated using ensemble-based approaches can successfully attack Clarifai.com, which is a black-box image classification system.
Tasks Adversarial Attack, Adversarial Defense, Image Classification
Published 2016-11-08
URL http://arxiv.org/abs/1611.02770v3
PDF http://arxiv.org/pdf/1611.02770v3.pdf
PWC https://paperswithcode.com/paper/delving-into-transferable-adversarial
Repo https://github.com/sunblaze-ucb/transferability-advdnn-pub
Framework tf
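A stripped-down version of the ensemble-based attack: craft a targeted perturbation that decreases the averaged cross-entropy of several white-box models toward the target label, in the hope that it transfers to an unseen black-box model. This single-step, FGSM-style sketch is far simpler than the optimization used in the paper.

```python
# Ensemble targeted perturbation: one signed gradient step against the average
# loss of several white-box models.
import torch
import torch.nn.functional as F

def ensemble_targeted_fgsm(models, x, target, eps=0.03):
    x = x.detach().clone().requires_grad_(True)
    loss = sum(F.cross_entropy(m(x), target) for m in models) / len(models)
    loss.backward()
    # step *down* the averaged loss toward the target class, keep a valid image
    return (x - eps * x.grad.sign()).clamp(0, 1).detach()
```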

Distributed Coordinate Descent for Generalized Linear Models with Regularization

Title Distributed Coordinate Descent for Generalized Linear Models with Regularization
Authors Ilya Trofimov, Alexander Genkin
Abstract Generalized linear model with $L_1$ and $L_2$ regularization is a widely used technique for solving classification, class probability estimation and regression problems. With the numbers of both features and examples growing rapidly in fields like text mining and clickstream data analysis, parallelization and the use of cluster architectures become important. We present a novel algorithm for fitting regularized generalized linear models in a distributed environment. The algorithm splits data between nodes by features, uses coordinate descent on each node and line search to merge results globally. A convergence proof is provided. A modification of the algorithm addresses the slow-node problem. For the important particular case of logistic regression, we empirically compare our program with several state-of-the-art approaches that rely on different algorithmic and data splitting methods. Experiments demonstrate that our approach is scalable and superior when training on large and sparse datasets.
Tasks
Published 2016-11-07
URL http://arxiv.org/abs/1611.02101v2
PDF http://arxiv.org/pdf/1611.02101v2.pdf
PWC https://paperswithcode.com/paper/distributed-coordinate-descent-for
Repo https://github.com/IlyaTrofimov/dlr
Framework none
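The merging step can be illustrated in a toy form: assume each node has already run coordinate descent on its own block of features and proposed a delta (zero outside its block); a backtracking line search on the global regularized loss then combines the proposals. This Python sketch is only a schematic of the idea, not the authors' implementation.

```python
# Toy sketch of feature-split updates merged by a global line search.
import numpy as np

def logistic_loss(w, X, y, l2=1.0):
    # y in {-1, +1}; L2-regularized logistic loss
    return np.logaddexp(0.0, -y * (X @ w)).sum() + 0.5 * l2 * (w @ w)

def merge_deltas(w, deltas, X, y, l2=1.0):
    # deltas: list of per-node update vectors, zero outside each node's block
    direction = np.sum(deltas, axis=0)
    base, step = logistic_loss(w, X, y, l2), 1.0
    while step > 1e-4 and logistic_loss(w + step * direction, X, y, l2) >= base:
        step *= 0.5                    # backtrack until the merged update helps
    return w + step * direction if step > 1e-4 else w
```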

Spatiotemporal Residual Networks for Video Action Recognition

Title Spatiotemporal Residual Networks for Video Action Recognition
Authors Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes
Abstract Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos. Recently, Residual Networks (ResNets) have arisen as a new technique to train extremely deep architectures. In this paper, we introduce spatiotemporal ResNets as a combination of these two approaches. Our novel architecture generalizes ResNets for the spatiotemporal domain by introducing residual connections in two ways. First, we inject residual connections between the appearance and motion pathways of a two-stream architecture to allow spatiotemporal interaction between the two streams. Second, we transform pretrained image ConvNets into spatiotemporal networks by equipping these with learnable convolutional filters that are initialized as temporal residual connections and operate on adjacent feature maps in time. This approach slowly increases the spatiotemporal receptive field as the depth of the model increases and naturally integrates image ConvNet design principles. The whole model is trained end-to-end to allow hierarchical learning of complex spatiotemporal features. We evaluate our novel spatiotemporal ResNet using two widely used action recognition benchmarks where it exceeds the previous state-of-the-art.
Tasks Action Recognition In Videos, Temporal Action Localization
Published 2016-11-07
URL http://arxiv.org/abs/1611.02155v1
PDF http://arxiv.org/pdf/1611.02155v1.pdf
PWC https://paperswithcode.com/paper/spatiotemporal-residual-networks-for-video
Repo https://github.com/feichtenhofer/st-resnet
Framework none
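The two kinds of residual connections can be sketched together: the motion stream is injected additively into the appearance stream, and a temporal 1D convolution, initialised as the identity so that it starts as a pure residual mapping, mixes adjacent frames. Channel counts and the pooled per-frame feature layout below are assumptions, not the paper's exact configuration.

```python
# One spatiotemporal residual connection: cross-stream injection plus a
# temporal filter initialised as an identity mapping.
import torch
import torch.nn as nn

class StreamResidual(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # temporal conv over the frame axis, initialised close to identity
        self.temporal = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, appearance, motion):
        # appearance, motion: (batch, channels, frames) pooled per-frame features
        fused = appearance + motion            # residual cross-stream injection
        return fused + self.temporal(fused)    # temporal residual connection

out = StreamResidual()(torch.randn(2, 256, 10), torch.randn(2, 256, 10))
```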

Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

Title Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
Authors Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine
Abstract Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is their high sample complexity. Batch policy gradient methods offer stable learning, but at the cost of high variance, which often requires large batches. TD-style methods, such as off-policy actor-critic and Q-learning, are more sample-efficient but biased, and often require costly hyperparameter sweeps to stabilize. In this work, we aim to develop methods that combine the stability of policy gradients with the efficiency of off-policy RL. We present Q-Prop, a policy gradient method that uses a Taylor expansion of the off-policy critic as a control variate. Q-Prop is both sample efficient and stable, and effectively combines the benefits of on-policy and off-policy methods. We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation. We show that conservative Q-Prop provides substantial gains in sample efficiency over trust region policy optimization (TRPO) with generalized advantage estimation (GAE), and improves stability over deep deterministic policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods, on OpenAI Gym’s MuJoCo continuous control environments.
Tasks Continuous Control, Policy Gradient Methods, Q-Learning
Published 2016-11-07
URL http://arxiv.org/abs/1611.02247v3
PDF http://arxiv.org/pdf/1611.02247v3.pdf
PWC https://paperswithcode.com/paper/q-prop-sample-efficient-policy-gradient-with
Repo https://github.com/brain-research/mirage-rl
Framework tf
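The control variate at the heart of Q-Prop is the first-order Taylor expansion of the off-policy critic around the deterministic policy mean; it is subtracted from the sampled advantage, and its expectation is added back analytically through the critic's gradient. The heavily simplified sketch below computes only that linearised critic, with q_critic and mu as assumed callables/tensors; it is not the full Q-Prop estimator.

```python
# First-order Taylor expansion of the critic around the policy mean:
# Qbar(s, a) = Q(s, mu) + grad_a Q(s, a)|_{a=mu} . (a - mu)
import torch

def taylor_control_variate(q_critic, state, action, mu):
    mu = mu.detach().requires_grad_(True)
    q_mu = q_critic(state, mu)                                   # shape (batch,)
    grad_q = torch.autograd.grad(q_mu.sum(), mu, create_graph=True)[0]
    return q_mu + ((action - mu) * grad_q).sum(dim=-1)
```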

Fast K-Means with Accurate Bounds

Title Fast K-Means with Accurate Bounds
Authors James Newling, François Fleuret
Abstract We propose a novel accelerated exact k-means algorithm, which performs better than the current state-of-the-art low-dimensional algorithm in 18 of 22 experiments, running up to 3 times faster. We also propose a general improvement of existing state-of-the-art accelerated exact k-means algorithms through better estimates of the distance bounds used to reduce the number of distance calculations, and get a speedup in 36 of 44 experiments, up to 1.8 times faster. We have conducted experiments with our own implementations of existing methods to ensure homogeneous evaluation of performance, and we show that our implementations perform as well or better than existing available implementations. Finally, we propose simplified variants of standard approaches and show that they are faster than their fully-fledged counterparts in 59 of 62 experiments.
Tasks
Published 2016-02-08
URL http://arxiv.org/abs/1602.02514v6
PDF http://arxiv.org/pdf/1602.02514v6.pdf
PWC https://paperswithcode.com/paper/fast-k-means-with-accurate-bounds
Repo https://github.com/idiap/eakmeans
Framework none
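Accelerated exact k-means algorithms of this family avoid distance computations by maintaining per-point upper and lower distance bounds and loosening them by the centre displacements after each update; a point whose upper bound to its assigned centre stays below its lower bound to every other centre provably keeps its assignment. A toy sketch of that bound update (Hamerly-style, one lower bound per point), not the eakmeans implementation:

```python
# Loosen distance bounds by centre movement instead of recomputing distances.
import numpy as np

def update_bounds(upper, lower, assignments, centre_shift):
    # upper[i]: bound on distance from point i to its assigned centre
    # lower[i]: bound on distance from point i to its second-closest centre
    upper = upper + centre_shift[assignments]   # assigned centre may have moved away
    lower = lower - centre_shift.max()          # any other centre may have moved closer
    return upper, lower

upper, lower = update_bounds(np.ones(5), 2 * np.ones(5),
                             np.zeros(5, dtype=int), np.array([0.1, 0.3]))
# points with upper <= lower need no distance recomputation this iteration
```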