May 7, 2019

2903 words 14 mins read

Paper Group AWR 37

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies. TGIF: A New Dataset and Benchmark on Animated GIF Description. Video Description using Bidirectional Recurrent Neural Networks. High-Dimensional Regularized Discriminant Analysis. MusicMood: Predicting the mood of music from song lyrics using machine learning. Improved Technique …

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

Title Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
Authors Tal Linzen, Emmanuel Dupoux, Yoav Goldberg
Abstract The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreement in English subject-verb dependencies. We probe the architecture’s grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when sequential and structural information conflicted. The frequency of such errors rose sharply in the language-modeling setting. We conclude that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.
Tasks Language Modelling
Published 2016-11-04
URL http://arxiv.org/abs/1611.01368v1
PDF http://arxiv.org/pdf/1611.01368v1.pdf
PWC https://paperswithcode.com/paper/assessing-the-ability-of-lstms-to-learn
Repo https://github.com/icewing1996/bert-syntax
Framework none
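To make the probing setup concrete, here is a minimal sketch (not the authors' code) of the number-prediction objective: an LSTM reads the words preceding the verb and classifies whether the verb should be singular or plural. PyTorch is used for illustration, and the vocabulary size, dimensions, and batch are placeholders.

```python
# Hypothetical sketch of the number-prediction probe: an LSTM reads the prefix
# of the sentence up to (but not including) the verb and predicts its number.
import torch
import torch.nn as nn

class NumberPredictor(nn.Module):
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)   # SINGULAR vs. PLURAL

    def forward(self, prefix_ids):
        # prefix_ids: (batch, seq_len) word indices of the pre-verb context
        h, _ = self.lstm(self.embed(prefix_ids))
        return self.out(h[:, -1])             # logits from the last pre-verb state

model = NumberPredictor(vocab_size=10000)
logits = model(torch.randint(0, 10000, (32, 12)))            # toy batch
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (32,)))
```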

TGIF: A New Dataset and Benchmark on Animated GIF Description

Title TGIF: A New Dataset and Benchmark on Animated GIF Description
Authors Yuncheng Li, Yale Song, Liangliang Cao, Joel Tetreault, Larry Goldberg, Alejandro Jaimes, Jiebo Luo
Abstract With the recent popularity of animated GIFs on social media, there is need for ways to index them with rich metadata. To advance research on animated GIF understanding, we collected a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing. The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips. To ensure a high quality dataset, we developed a series of novel quality controls to validate free-form text input from crowdworkers. We show that there is unambiguous association between visual content and natural language descriptions in our dataset, making it an ideal benchmark for the visual content captioning task. We perform extensive statistical analyses to compare our dataset to existing image and video description datasets. Next, we provide baseline results on the animated GIF description task, using three representative techniques: nearest neighbor, statistical machine translation, and recurrent neural networks. Finally, we show that models fine-tuned from our animated GIF description dataset can be helpful for automatic movie description.
Tasks Image Captioning, Machine Translation, Text Generation, Video Description
Published 2016-04-10
URL http://arxiv.org/abs/1604.02748v2
PDF http://arxiv.org/pdf/1604.02748v2.pdf
PWC https://paperswithcode.com/paper/tgif-a-new-dataset-and-benchmark-on-animated
Repo https://github.com/raingo/TGIF-Release
Framework none
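Of the three baselines mentioned in the abstract, the nearest-neighbour one is easy to sketch: return the caption of the training GIF whose visual features are most similar to the query. The feature extractor and dimensions below are placeholders, not the paper's setup.

```python
# Minimal nearest-neighbour captioning baseline over precomputed GIF features.
import numpy as np

def nn_caption(query_feat, train_feats, train_captions):
    # cosine similarity between the query GIF and every training GIF
    sims = train_feats @ query_feat / (
        np.linalg.norm(train_feats, axis=1) * np.linalg.norm(query_feat) + 1e-8)
    return train_captions[int(np.argmax(sims))]

train_feats = np.random.randn(100, 2048)                # stand-in CNN features
train_captions = [f"caption {i}" for i in range(100)]   # stand-in descriptions
print(nn_caption(np.random.randn(2048), train_feats, train_captions))
```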

Video Description using Bidirectional Recurrent Neural Networks

Title Video Description using Bidirectional Recurrent Neural Networks
Authors Álvaro Peris, Marc Bolaños, Petia Radeva, Francisco Casacuberta
Abstract Although traditionally used in the machine translation field, the encoder-decoder framework has been recently applied for the generation of video and image descriptions. The combination of Convolutional and Recurrent Neural Networks in these models has proven to outperform the previous state of the art, obtaining more accurate video descriptions. In this work we propose pushing further this model by introducing two contributions into the encoding stage. First, producing richer image representations by combining object and location information from Convolutional Neural Networks and second, introducing Bidirectional Recurrent Neural Networks for capturing both forward and backward temporal relationships in the input frames.
Tasks Text Generation, Video Captioning, Video Description
Published 2016-04-12
URL http://arxiv.org/abs/1604.03390v2
PDF http://arxiv.org/pdf/1604.03390v2.pdf
PWC https://paperswithcode.com/paper/video-description-using-bidirectional
Repo https://github.com/lvapeab/ABiViRNet
Framework tf
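The two encoding-stage contributions translate directly into a small module: per-frame object and location features are concatenated and fed to a bidirectional LSTM whose states would then condition the caption decoder. This is a hedged PyTorch sketch with invented dimensions, not the ABiViRNet implementation.

```python
# Sketch of the encoder idea: concatenate object and location CNN features per
# frame, then run a bidirectional LSTM over the frame sequence.
import torch
import torch.nn as nn

class BiVideoEncoder(nn.Module):
    def __init__(self, obj_dim=1024, loc_dim=1024, hidden=512):
        super().__init__()
        self.bilstm = nn.LSTM(obj_dim + loc_dim, hidden,
                              batch_first=True, bidirectional=True)

    def forward(self, obj_feats, loc_feats):
        # obj_feats, loc_feats: (batch, frames, dim) from two pretrained CNNs
        frames = torch.cat([obj_feats, loc_feats], dim=-1)
        states, _ = self.bilstm(frames)        # forward + backward temporal context
        return states                          # would be fed to the caption decoder

enc = BiVideoEncoder()
out = enc(torch.randn(2, 16, 1024), torch.randn(2, 16, 1024))
print(out.shape)                               # (2, 16, 1024) = hidden * 2
```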

High-Dimensional Regularized Discriminant Analysis

Title High-Dimensional Regularized Discriminant Analysis
Authors John A. Ramey, Caleb K. Stein, Phil D. Young, Dean M. Young
Abstract Regularized discriminant analysis (RDA), proposed by Friedman (1989), is a widely popular classifier that lacks interpretability and is impractical for high-dimensional data sets. Here, we present an interpretable and computationally efficient classifier called high-dimensional RDA (HDRDA), designed for the small-sample, high-dimensional setting. For HDRDA, we show that each training observation, regardless of class, contributes to the class covariance matrix, resulting in an interpretable estimator that borrows from the pooled sample covariance matrix. Moreover, we show that HDRDA is equivalent to a classifier in a reduced-feature space with dimension approximately equal to the training sample size. As a result, the matrix operations employed by HDRDA are computationally linear in the number of features, making the classifier well-suited for high-dimensional classification in practice. We demonstrate that HDRDA is often superior to several sparse and regularized classifiers in terms of classification accuracy with three artificial and six real high-dimensional data sets. Also, timing comparisons between our HDRDA implementation in the sparsediscrim R package and the standard RDA formulation in the klaR R package demonstrate that as the number of features increases, the computational runtime of HDRDA is drastically smaller than that of RDA.
Tasks
Published 2016-02-03
URL http://arxiv.org/abs/1602.01182v2
PDF http://arxiv.org/pdf/1602.01182v2.pdf
PWC https://paperswithcode.com/paper/high-dimensional-regularized-discriminant
Repo https://github.com/ramhiser/paper-hdrda
Framework none
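HDRDA builds on Friedman's RDA covariance shrinkage, in which each class covariance is pulled toward the pooled estimate and then toward a scaled identity. The numpy sketch below shows that shrinkage step only, with made-up data; it does not reproduce the reduced-feature-space computations that make HDRDA efficient.

```python
# Sketch of the covariance shrinkage underlying (HD)RDA: each class covariance
# is a convex combination of the class estimate and the pooled estimate.
import numpy as np

def rda_covariance(X_k, pooled_cov, lam, gamma):
    # lam pulls the class covariance toward the pooled covariance,
    # gamma then pulls it toward a scaled identity (Friedman, 1989).
    S_k = np.cov(X_k, rowvar=False)
    S = (1 - lam) * S_k + lam * pooled_cov
    return (1 - gamma) * S + gamma * np.trace(S) / S.shape[0] * np.eye(S.shape[0])

X_k = np.random.randn(12, 50)        # small-n, large-p class sample (placeholder)
pooled = np.eye(50)                  # stand-in for the pooled covariance estimate
print(rda_covariance(X_k, pooled, lam=0.5, gamma=0.1).shape)   # (50, 50)
```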

MusicMood: Predicting the mood of music from song lyrics using machine learning

Title MusicMood: Predicting the mood of music from song lyrics using machine learning
Authors Sebastian Raschka
Abstract Sentiment prediction of contemporary music can have a wide range of applications in modern society, for instance, selecting music for public institutions such as hospitals or restaurants to potentially improve the emotional well-being of personnel, patients, and customers. In this project, a music recommendation system was built on a naive Bayes classifier trained to predict the sentiment of songs based on song lyrics alone. The experimental results show that music corresponding to a happy mood can be detected with high precision based on text features obtained from song lyrics.
Tasks
Published 2016-11-01
URL http://arxiv.org/abs/1611.00138v1
PDF http://arxiv.org/pdf/1611.00138v1.pdf
PWC https://paperswithcode.com/paper/musicmood-predicting-the-mood-of-music-from
Repo https://github.com/rasbt/musicmood
Framework none
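A bag-of-words naive Bayes mood classifier of the kind described above can be sketched in a few lines of scikit-learn; the two-line "training set" is only a placeholder for the lyrics corpus.

```python
# Minimal lyrics-mood classifier: bag-of-words features + multinomial naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

lyrics = ["sunshine and dancing all night", "tears falling in the cold rain"]
moods = ["happy", "sad"]

clf = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
clf.fit(lyrics, moods)
print(clf.predict(["dancing in the sunshine"]))   # -> ['happy']
```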

Improved Techniques for Training GANs

Title Improved Techniques for Training GANs
Authors Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen
Abstract We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, nor do we require the model to be able to learn well without using any labels. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: our model generates MNIST samples that humans cannot distinguish from real data, and CIFAR-10 samples that yield a human error rate of 21.3%. We also present ImageNet samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes.
Tasks Conditional Image Generation, Image Generation, Semi-Supervised Image Classification
Published 2016-06-10
URL http://arxiv.org/abs/1606.03498v1
PDF http://arxiv.org/pdf/1606.03498v1.pdf
PWC https://paperswithcode.com/paper/improved-techniques-for-training-gans
Repo https://github.com/vuanhtu1993/Keras-SRGANs
Framework tf
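One of the paper's training techniques, feature matching, replaces the usual generator objective with the distance between discriminator-feature statistics on real and generated batches. The sketch below assumes a callable d_features exposing an intermediate discriminator layer; it is illustrative, not the released implementation.

```python
# Feature-matching generator loss: match the mean of an intermediate
# discriminator feature on real data (treated as a fixed target) vs. fake data.
import torch

def feature_matching_loss(d_features, real_x, fake_x):
    # d_features: returns intermediate discriminator activations, (batch, feat_dim)
    real_f = d_features(real_x).mean(dim=0).detach()   # target statistic
    fake_f = d_features(fake_x).mean(dim=0)            # gradient flows to generator
    return ((real_f - fake_f) ** 2).sum()
```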

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

Title A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
Authors Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
Abstract Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layers include shortcut connections to lower-level task predictions to reflect linguistic hierarchies. We use a simple regularization term to allow for optimizing all model weights to improve one task’s loss without exhibiting catastrophic interference of the other tasks. Our single end-to-end model obtains state-of-the-art or competitive results on five different tasks from tagging, parsing, relatedness, and entailment tasks.
Tasks Chunking, Multi-Task Learning
Published 2016-11-05
URL http://arxiv.org/abs/1611.01587v5
PDF http://arxiv.org/pdf/1611.01587v5.pdf
PWC https://paperswithcode.com/paper/a-joint-many-task-model-growing-a-neural
Repo https://github.com/rubythonode/joint-many-task-model
Framework tf
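The "growing depth with shortcut connections" idea can be sketched for the lowest two tasks: the chunking layer receives the word embeddings, the POS layer's hidden states, and its soft label predictions. Dimensions and label counts are placeholders, and the paper uses learned label embeddings rather than the raw softmax outputs used here.

```python
# Rough sketch of stacked tasks with shortcut connections (POS -> chunking).
import torch
import torch.nn as nn

class JMTSketch(nn.Module):
    def __init__(self, vocab, emb=100, hid=100, n_pos=45, n_chunk=23):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.pos_lstm = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.pos_out = nn.Linear(2 * hid, n_pos)
        self.chunk_lstm = nn.LSTM(emb + 2 * hid + n_pos, hid,
                                  batch_first=True, bidirectional=True)
        self.chunk_out = nn.Linear(2 * hid, n_chunk)

    def forward(self, words):
        e = self.embed(words)
        h_pos, _ = self.pos_lstm(e)
        pos_logits = self.pos_out(h_pos)
        # shortcut: higher task sees embeddings, lower states and lower predictions
        chunk_in = torch.cat([e, h_pos, pos_logits.softmax(-1)], dim=-1)
        h_chunk, _ = self.chunk_lstm(chunk_in)
        return pos_logits, self.chunk_out(h_chunk)

pos, chunk = JMTSketch(vocab=5000)(torch.randint(0, 5000, (8, 20)))
```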

Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex

Title Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex
Authors Qianli Liao, Tomaso Poggio
Abstract We discuss relations between Residual Networks (ResNet), Recurrent Neural Networks (RNNs) and the primate visual cortex. We begin with the observation that a shallow RNN is exactly equivalent to a very deep ResNet with weight sharing among the layers. A direct implementation of such an RNN, although having orders of magnitude fewer parameters, leads to a performance similar to the corresponding ResNet. We propose 1) a generalization of both RNN and ResNet architectures and 2) the conjecture that a class of moderately deep RNNs is a biologically-plausible model of the ventral stream in visual cortex. We demonstrate the effectiveness of the architectures by testing them on the CIFAR-10 dataset.
Tasks
Published 2016-04-13
URL http://arxiv.org/abs/1604.03640v1
PDF http://arxiv.org/pdf/1604.03640v1.pdf
PWC https://paperswithcode.com/paper/bridging-the-gaps-between-residual-learning
Repo https://github.com/ry/tensorflow-resnet
Framework tf
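The central observation, that a shallow RNN with a shared transition is a very deep ResNet with tied weights, fits in a few lines: unrolling h_{t+1} = h_t + f(h_t) for T steps gives a T-block ResNet whose blocks share parameters. The block below is an illustrative stand-in, not the exact architecture evaluated in the paper.

```python
# Unrolling one residual block with shared weights behaves like a deep ResNet
# whose layers are tied.
import torch
import torch.nn as nn

class SharedResidualRNN(nn.Module):
    def __init__(self, channels=64, steps=5):
        super().__init__()
        self.f = nn.Sequential(                    # the single shared transition
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.steps = steps

    def forward(self, h):
        for _ in range(self.steps):                # h_{t+1} = h_t + f(h_t)
            h = h + self.f(h)
        return h

print(SharedResidualRNN()(torch.randn(1, 64, 32, 32)).shape)
```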

Stick-Breaking Variational Autoencoders

Title Stick-Breaking Variational Autoencoders
Authors Eric Nalisnick, Padhraic Smyth
Abstract We extend Stochastic Gradient Variational Bayes to perform posterior inference for the weights of Stick-Breaking processes. This development allows us to define a Stick-Breaking Variational Autoencoder (SB-VAE), a Bayesian nonparametric version of the variational autoencoder that has a latent representation with stochastic dimensionality. We experimentally demonstrate that the SB-VAE, and a semi-supervised variant, learn highly discriminative latent representations that often outperform the Gaussian VAE’s.
Tasks
Published 2016-05-20
URL http://arxiv.org/abs/1605.06197v3
PDF http://arxiv.org/pdf/1605.06197v3.pdf
PWC https://paperswithcode.com/paper/stick-breaking-variational-autoencoders
Repo https://github.com/enalisnick/stick-breaking_dgms
Framework none
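The latent code of the SB-VAE comes from the stick-breaking construction: posterior samples v_k in (0, 1) (e.g. from a Kumaraswamy distribution) are converted into weights pi_k = v_k * prod_{j<k}(1 - v_j), which sum to at most one and give the representation its stochastic dimensionality. A minimal sketch of that conversion:

```python
# Stick-breaking: turn stick fractions v into weights that sum to at most one.
import torch

def stick_breaking(v):
    # v: (batch, K) stick fractions, e.g. sampled from a Kumaraswamy posterior
    one_minus = torch.cumprod(1 - v, dim=-1)
    pi = v.clone()
    pi[:, 1:] = v[:, 1:] * one_minus[:, :-1]   # pi_k = v_k * prod_{j<k}(1 - v_j)
    return pi

pi = stick_breaking(torch.rand(4, 10))
print(pi.sum(dim=-1))                          # close to, but never above, 1
```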

Mapping Between fMRI Responses to Movies and their Natural Language Annotations

Title Mapping Between fMRI Responses to Movies and their Natural Language Annotations
Authors Kiran Vodrahalli, Po-Hsuan Chen, Yingyu Liang, Christopher Baldassano, Janice Chen, Esther Yong, Christopher Honey, Uri Hasson, Peter Ramadge, Ken Norman, Sanjeev Arora
Abstract Several research groups have shown how to correlate fMRI responses to the meanings of presented stimuli. This paper presents new methods for doing so when only a natural language annotation is available as the description of the stimulus. We study fMRI data gathered from subjects watching an episode of BBC’s Sherlock [1], and learn bidirectional mappings between fMRI responses and natural language representations. We show how to leverage data from multiple subjects watching the same movie to improve the accuracy of the mappings, allowing us to succeed at a scene classification task with 72% accuracy (random guessing would give 4%) and at a scene ranking task with average rank in the top 4% (random guessing would give 50%). The key ingredients are (a) the use of the Shared Response Model (SRM) and its variant SRM-ICA [2, 3] to aggregate fMRI data from multiple subjects, both of which are shown to be superior to standard PCA in producing low-dimensional representations for the tasks in this paper; (b) a sentence embedding technique adapted from the natural language processing (NLP) literature [4] that produces semantic vector representation of the annotations; (c) using previous timestep information in the featurization of the predictor data.
Tasks Scene Classification, Sentence Embedding
Published 2016-10-13
URL http://arxiv.org/abs/1610.03914v3
PDF http://arxiv.org/pdf/1610.03914v3.pdf
PWC https://paperswithcode.com/paper/mapping-between-fmri-responses-to-movies-and
Repo https://github.com/asprout/CPSC490
Framework none
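At its core the mapping step is a pair of regularized linear regressions between the shared (SRM-reduced) fMRI space and the annotation-embedding space. The sketch below uses ridge regression and random stand-in data; the actual dimensions, regularizers, and previous-timestep featurization in the paper differ.

```python
# Bidirectional linear maps between fMRI features and annotation embeddings.
import numpy as np
from sklearn.linear_model import Ridge

T, fmri_dim, text_dim = 500, 20, 100            # placeholder sizes
fmri = np.random.randn(T, fmri_dim)             # stand-in for SRM-reduced fMRI
text = np.random.randn(T, text_dim)             # stand-in for sentence embeddings

fmri_to_text = Ridge(alpha=1.0).fit(fmri, text)
text_to_fmri = Ridge(alpha=1.0).fit(text, fmri)
pred_text = fmri_to_text.predict(fmri)          # basis for scene classification/ranking
```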

Delving into Transferable Adversarial Examples and Black-box Attacks

Title Delving into Transferable Adversarial Examples and Black-box Attacks
Authors Yanpei Liu, Xinyun Chen, Chang Liu, Dawn Song
Abstract An intriguing property of deep neural networks is the existence of adversarial examples, which can transfer among different architectures. These transferable adversarial examples may severely hinder deep neural network-based applications. Previous works mostly study the transferability using small scale datasets. In this work, we are the first to conduct an extensive study of the transferability over large models and a large scale dataset, and we are also the first to study the transferability of targeted adversarial examples with their target labels. We study both non-targeted and targeted adversarial examples, and show that while transferable non-targeted adversarial examples are easy to find, targeted adversarial examples generated using existing approaches almost never transfer with their target labels. Therefore, we propose novel ensemble-based approaches to generating transferable adversarial examples. Using such approaches, we observe a large proportion of targeted adversarial examples that are able to transfer with their target labels for the first time. We also present some geometric studies to help understand the transferable adversarial examples. Finally, we show that the adversarial examples generated using ensemble-based approaches can successfully attack Clarifai.com, which is a black-box image classification system.
Tasks Adversarial Attack, Adversarial Defense, Image Classification
Published 2016-11-08
URL http://arxiv.org/abs/1611.02770v3
PDF http://arxiv.org/pdf/1611.02770v3.pdf
PWC https://paperswithcode.com/paper/delving-into-transferable-adversarial
Repo https://github.com/sunblaze-ucb/transferability-advdnn-pub
Framework tf
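A stripped-down version of the ensemble-based attack: craft a targeted perturbation that decreases the averaged cross-entropy of several white-box models toward the target label, in the hope that it transfers to an unseen black-box model. This single-step, FGSM-style sketch is far simpler than the optimization used in the paper.

```python
# Ensemble targeted perturbation: one signed gradient step against the average
# loss of several white-box models.
import torch
import torch.nn.functional as F

def ensemble_targeted_fgsm(models, x, target, eps=0.03):
    x = x.detach().clone().requires_grad_(True)
    loss = sum(F.cross_entropy(m(x), target) for m in models) / len(models)
    loss.backward()
    # step *down* the averaged loss toward the target class, keep a valid image
    return (x - eps * x.grad.sign()).clamp(0, 1).detach()
```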

Distributed Coordinate Descent for Generalized Linear Models with Regularization

Title Distributed Coordinate Descent for Generalized Linear Models with Regularization
Authors Ilya Trofimov, Alexander Genkin
Abstract Generalized linear model with $L_1$ and $L_2$ regularization is a widely used technique for solving classification, class probability estimation and regression problems. With the numbers of both features and examples growing rapidly in fields like text mining and clickstream data analysis, parallelization and the use of cluster architectures become important. We present a novel algorithm for fitting regularized generalized linear models in a distributed environment. The algorithm splits data between nodes by features, uses coordinate descent on each node and line search to merge results globally. A convergence proof is provided. A modification of the algorithm addresses the slow-node problem. For the important particular case of logistic regression, we empirically compare our program with several state-of-the-art approaches that rely on different algorithmic and data splitting methods. Experiments demonstrate that our approach is scalable and superior when training on large and sparse datasets.
Tasks
Published 2016-11-07
URL http://arxiv.org/abs/1611.02101v2
PDF http://arxiv.org/pdf/1611.02101v2.pdf
PWC https://paperswithcode.com/paper/distributed-coordinate-descent-for
Repo https://github.com/IlyaTrofimov/dlr
Framework none
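The merging step can be illustrated in a toy form: assume each node has already run coordinate descent on its own block of features and proposed a delta (zero outside its block); a backtracking line search on the global regularized loss then combines the proposals. This Python sketch is only a schematic of the idea, not the authors' implementation.

```python
# Toy sketch of feature-split updates merged by a global line search.
import numpy as np

def logistic_loss(w, X, y, l2=1.0):
    # y in {-1, +1}; L2-regularized logistic loss
    return np.logaddexp(0.0, -y * (X @ w)).sum() + 0.5 * l2 * (w @ w)

def merge_deltas(w, deltas, X, y, l2=1.0):
    # deltas: list of per-node update vectors, zero outside each node's block
    direction = np.sum(deltas, axis=0)
    base, step = logistic_loss(w, X, y, l2), 1.0
    while step > 1e-4 and logistic_loss(w + step * direction, X, y, l2) >= base:
        step *= 0.5                    # backtrack until the merged update helps
    return w + step * direction if step > 1e-4 else w
```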

Spatiotemporal Residual Networks for Video Action Recognition

Title Spatiotemporal Residual Networks for Video Action Recognition
Authors Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes
Abstract Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos. Recently, Residual Networks (ResNets) have arisen as a new technique to train extremely deep architectures. In this paper, we introduce spatiotemporal ResNets as a combination of these two approaches. Our novel architecture generalizes ResNets for the spatiotemporal domain by introducing residual connections in two ways. First, we inject residual connections between the appearance and motion pathways of a two-stream architecture to allow spatiotemporal interaction between the two streams. Second, we transform pretrained image ConvNets into spatiotemporal networks by equipping these with learnable convolutional filters that are initialized as temporal residual connections and operate on adjacent feature maps in time. This approach slowly increases the spatiotemporal receptive field as the depth of the model increases and naturally integrates image ConvNet design principles. The whole model is trained end-to-end to allow hierarchical learning of complex spatiotemporal features. We evaluate our novel spatiotemporal ResNet using two widely used action recognition benchmarks where it exceeds the previous state-of-the-art.
Tasks Action Recognition In Videos, Temporal Action Localization
Published 2016-11-07
URL http://arxiv.org/abs/1611.02155v1
PDF http://arxiv.org/pdf/1611.02155v1.pdf
PWC https://paperswithcode.com/paper/spatiotemporal-residual-networks-for-video
Repo https://github.com/feichtenhofer/st-resnet
Framework none
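The two kinds of residual connections can be sketched together: the motion stream is injected additively into the appearance stream, and a temporal 1D convolution, initialised as the identity so that it starts as a pure residual mapping, mixes adjacent frames. Channel counts and the pooled per-frame feature layout below are assumptions, not the paper's exact configuration.

```python
# One spatiotemporal residual connection: cross-stream injection plus a
# temporal filter initialised as an identity mapping.
import torch
import torch.nn as nn

class StreamResidual(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # temporal conv over the frame axis, initialised close to identity
        self.temporal = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, appearance, motion):
        # appearance, motion: (batch, channels, frames) pooled per-frame features
        fused = appearance + motion            # residual cross-stream injection
        return fused + self.temporal(fused)    # temporal residual connection

out = StreamResidual()(torch.randn(2, 256, 10), torch.randn(2, 256, 10))
```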

Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

Title Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
Authors Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine
Abstract Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is their high sample complexity. Batch policy gradient methods offer stable learning, but at the cost of high variance, which often requires large batches. TD-style methods, such as off-policy actor-critic and Q-learning, are more sample-efficient but biased, and often require costly hyperparameter sweeps to stabilize. In this work, we aim to develop methods that combine the stability of policy gradients with the efficiency of off-policy RL. We present Q-Prop, a policy gradient method that uses a Taylor expansion of the off-policy critic as a control variate. Q-Prop is both sample efficient and stable, and effectively combines the benefits of on-policy and off-policy methods. We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation. We show that conservative Q-Prop provides substantial gains in sample efficiency over trust region policy optimization (TRPO) with generalized advantage estimation (GAE), and improves stability over deep deterministic policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods, on OpenAI Gym’s MuJoCo continuous control environments.
Tasks Continuous Control, Policy Gradient Methods, Q-Learning
Published 2016-11-07
URL http://arxiv.org/abs/1611.02247v3
PDF http://arxiv.org/pdf/1611.02247v3.pdf
PWC https://paperswithcode.com/paper/q-prop-sample-efficient-policy-gradient-with
Repo https://github.com/brain-research/mirage-rl
Framework tf
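The control variate at the heart of Q-Prop is the first-order Taylor expansion of the off-policy critic around the deterministic policy mean; it is subtracted from the sampled advantage, and its expectation is added back analytically through the critic's gradient. The heavily simplified sketch below computes only that linearised critic, with q_critic and mu as assumed callables/tensors; it is not the full Q-Prop estimator.

```python
# First-order Taylor expansion of the critic around the policy mean:
# Qbar(s, a) = Q(s, mu) + grad_a Q(s, a)|_{a=mu} . (a - mu)
import torch

def taylor_control_variate(q_critic, state, action, mu):
    mu = mu.detach().requires_grad_(True)
    q_mu = q_critic(state, mu)                                   # shape (batch,)
    grad_q = torch.autograd.grad(q_mu.sum(), mu, create_graph=True)[0]
    return q_mu + ((action - mu) * grad_q).sum(dim=-1)
```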

Fast K-Means with Accurate Bounds

Title Fast K-Means with Accurate Bounds
Authors James Newling, François Fleuret
Abstract We propose a novel accelerated exact k-means algorithm, which performs better than the current state-of-the-art low-dimensional algorithm in 18 of 22 experiments, running up to 3 times faster. We also propose a general improvement of existing state-of-the-art accelerated exact k-means algorithms through better estimates of the distance bounds used to reduce the number of distance calculations, and get a speedup in 36 of 44 experiments, up to 1.8 times faster. We have conducted experiments with our own implementations of existing methods to ensure homogeneous evaluation of performance, and we show that our implementations perform as well or better than existing available implementations. Finally, we propose simplified variants of standard approaches and show that they are faster than their fully-fledged counterparts in 59 of 62 experiments.
Tasks
Published 2016-02-08
URL http://arxiv.org/abs/1602.02514v6
PDF http://arxiv.org/pdf/1602.02514v6.pdf
PWC https://paperswithcode.com/paper/fast-k-means-with-accurate-bounds
Repo https://github.com/idiap/eakmeans
Framework none
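Accelerated exact k-means algorithms of this family avoid distance computations by maintaining per-point upper and lower distance bounds and loosening them by the centre displacements after each update; a point whose upper bound to its assigned centre stays below its lower bound to every other centre provably keeps its assignment. A toy sketch of that bound update (Hamerly-style, one lower bound per point), not the eakmeans implementation:

```python
# Loosen distance bounds by centre movement instead of recomputing distances.
import numpy as np

def update_bounds(upper, lower, assignments, centre_shift):
    # upper[i]: bound on distance from point i to its assigned centre
    # lower[i]: bound on distance from point i to its second-closest centre
    upper = upper + centre_shift[assignments]   # assigned centre may have moved away
    lower = lower - centre_shift.max()          # any other centre may have moved closer
    return upper, lower

upper, lower = update_bounds(np.ones(5), 2 * np.ones(5),
                             np.zeros(5, dtype=int), np.array([0.1, 0.3]))
# points with upper <= lower need no distance recomputation this iteration
```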