Paper Group AWR 37
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies. TGIF: A New Dataset and Benchmark on Animated GIF Description. Video Description using Bidirectional Recurrent Neural Networks. High-Dimensional Regularized Discriminant Analysis. MusicMood: Predicting the mood of music from song lyrics using machine learning. Improved Techniques for Training GANs. A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks. Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex. Stick-Breaking Variational Autoencoders. Mapping Between fMRI Responses to Movies and their Natural Language Annotations. Delving into Transferable Adversarial Examples and Black-box Attacks. Distributed Coordinate Descent for Generalized Linear Models with Regularization. Spatiotemporal Residual Networks for Video Action Recognition. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic. Fast K-Means with Accurate Bounds.
Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies
Title | Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies |
Authors | Tal Linzen, Emmanuel Dupoux, Yoav Goldberg |
Abstract | The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreement in English subject-verb dependencies. We probe the architecture’s grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when sequential and structural information conflicted. The frequency of such errors rose sharply in the language-modeling setting. We conclude that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured. |
Tasks | Language Modelling |
Published | 2016-11-04 |
URL | http://arxiv.org/abs/1611.01368v1 |
http://arxiv.org/pdf/1611.01368v1.pdf | |
PWC | https://paperswithcode.com/paper/assessing-the-ability-of-lstms-to-learn |
Repo | https://github.com/icewing1996/bert-syntax |
Framework | none |
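
As a concrete illustration of the supervised number-prediction objective described in the abstract, here is a minimal sketch (not the authors' code): an LSTM reads the words preceding a verb and classifies the verb's number, so an "attractor" noun such as *cabinet* in "the keys to the cabinet ..." can pull it toward the wrong answer. The vocabulary, dimensions, and toy example are illustrative assumptions.

```python
# Hedged sketch of the number-prediction objective: given the words before a
# verb, predict whether that verb should be singular or plural. Toy data only.
import torch
import torch.nn as nn

class NumberPredictor(nn.Module):
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)          # singular vs. plural

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h[:, -1, :])                 # classify from the last hidden state

# Toy prefix with an "attractor" noun of the opposite number:
# "the keys to the cabinet ___"  -> the target verb is plural ("are").
vocab = {w: i for i, w in enumerate(["the", "keys", "to", "cabinet"])}
prefix = torch.tensor([[vocab[w] for w in ["the", "keys", "to", "the", "cabinet"]]])

model = NumberPredictor(vocab_size=len(vocab))
logits = model(prefix)                               # shape (1, 2)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([1]))   # 1 = plural target
loss.backward()
```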
TGIF: A New Dataset and Benchmark on Animated GIF Description
Title | TGIF: A New Dataset and Benchmark on Animated GIF Description |
Authors | Yuncheng Li, Yale Song, Liangliang Cao, Joel Tetreault, Larry Goldberg, Alejandro Jaimes, Jiebo Luo |
Abstract | With the recent popularity of animated GIFs on social media, there is a need for ways to index them with rich metadata. To advance research on animated GIF understanding, we collected a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing. The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips. To ensure a high quality dataset, we developed a series of novel quality controls to validate free-form text input from crowdworkers. We show that there is an unambiguous association between visual content and natural language descriptions in our dataset, making it an ideal benchmark for the visual content captioning task. We perform extensive statistical analyses to compare our dataset to existing image and video description datasets. Next, we provide baseline results on the animated GIF description task, using three representative techniques: nearest neighbor, statistical machine translation, and recurrent neural networks. Finally, we show that models fine-tuned from our animated GIF description dataset can be helpful for automatic movie description. |
Tasks | Image Captioning, Machine Translation, Text Generation, Video Description |
Published | 2016-04-10 |
URL | http://arxiv.org/abs/1604.02748v2 |
http://arxiv.org/pdf/1604.02748v2.pdf | |
PWC | https://paperswithcode.com/paper/tgif-a-new-dataset-and-benchmark-on-animated |
Repo | https://github.com/raingo/TGIF-Release |
Framework | none |
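
The first of the paper's three baselines, nearest neighbor, simply copies the caption of the training GIF whose visual features are closest to the query. A minimal sketch under assumed precomputed features; the feature extractor, data, and captions below are made up.

```python
# Sketch of a nearest-neighbor captioning baseline: reuse the caption of the
# training clip whose (precomputed) visual features are most similar.
import numpy as np

def nearest_neighbor_caption(query_feat, train_feats, train_captions):
    # Cosine similarity between the query and every training clip.
    q = query_feat / np.linalg.norm(query_feat)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    sims = t @ q
    return train_captions[int(np.argmax(sims))]

# Toy example: three "training" GIFs with random CNN features and captions.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(3, 128))
train_captions = ["a cat jumps onto a table",
                  "a man rides a skateboard",
                  "a dog shakes off water"]
query = train_feats[1] + 0.01 * rng.normal(size=128)
print(nearest_neighbor_caption(query, train_feats, train_captions))
```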
Video Description using Bidirectional Recurrent Neural Networks
Title | Video Description using Bidirectional Recurrent Neural Networks |
Authors | Álvaro Peris, Marc Bolaños, Petia Radeva, Francisco Casacuberta |
Abstract | Although traditionally used in the machine translation field, the encoder-decoder framework has been recently applied for the generation of video and image descriptions. The combination of Convolutional and Recurrent Neural Networks in these models has proven to outperform the previous state of the art, obtaining more accurate video descriptions. In this work we propose pushing this model further by introducing two contributions into the encoding stage: first, producing richer image representations by combining object and location information from Convolutional Neural Networks, and second, introducing Bidirectional Recurrent Neural Networks to capture both forward and backward temporal relationships in the input frames. |
Tasks | Text Generation, Video Captioning, Video Description |
Published | 2016-04-12 |
URL | http://arxiv.org/abs/1604.03390v2 |
http://arxiv.org/pdf/1604.03390v2.pdf | |
PWC | https://paperswithcode.com/paper/video-description-using-bidirectional |
Repo | https://github.com/lvapeab/ABiViRNet |
Framework | tf |
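
A hedged sketch of the two encoding contributions: per-frame object and location CNN features are concatenated, and a bidirectional LSTM encodes the frame sequence so each position sees both forward and backward temporal context. Feature dimensions and the random inputs are assumptions; the decoder is omitted.

```python
# Sketch: concatenate object + location CNN features per frame, then encode the
# sequence with a bidirectional LSTM (forward and backward temporal context).
import torch
import torch.nn as nn

object_dim, location_dim, hidden, frames = 1024, 512, 256, 16

obj_feats = torch.randn(1, frames, object_dim)     # e.g. object-recognition CNN features
loc_feats = torch.randn(1, frames, location_dim)   # e.g. scene/location CNN features
frame_feats = torch.cat([obj_feats, loc_feats], dim=-1)

encoder = nn.LSTM(object_dim + location_dim, hidden,
                  batch_first=True, bidirectional=True)
encoded, _ = encoder(frame_feats)                  # (1, frames, 2 * hidden)
# 'encoded' would condition an LSTM decoder that emits the textual description.
print(encoded.shape)
```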
High-Dimensional Regularized Discriminant Analysis
Title | High-Dimensional Regularized Discriminant Analysis |
Authors | John A. Ramey, Caleb K. Stein, Phil D. Young, Dean M. Young |
Abstract | Regularized discriminant analysis (RDA), proposed by Friedman (1989), is a widely popular classifier that lacks interpretability and is impractical for high-dimensional data sets. Here, we present an interpretable and computationally efficient classifier called high-dimensional RDA (HDRDA), designed for the small-sample, high-dimensional setting. For HDRDA, we show that each training observation, regardless of class, contributes to the class covariance matrix, resulting in an interpretable estimator that borrows from the pooled sample covariance matrix. Moreover, we show that HDRDA is equivalent to a classifier in a reduced-feature space with dimension approximately equal to the training sample size. As a result, the matrix operations employed by HDRDA are computationally linear in the number of features, making the classifier well-suited for high-dimensional classification in practice. We demonstrate that HDRDA is often superior to several sparse and regularized classifiers in terms of classification accuracy with three artificial and six real high-dimensional data sets. Also, timing comparisons between our HDRDA implementation in the sparsediscrim R package and the standard RDA formulation in the klaR R package demonstrate that as the number of features increases, the computational runtime of HDRDA is drastically smaller than that of RDA. |
Tasks | |
Published | 2016-02-03 |
URL | http://arxiv.org/abs/1602.01182v2 |
http://arxiv.org/pdf/1602.01182v2.pdf | |
PWC | https://paperswithcode.com/paper/high-dimensional-regularized-discriminant |
Repo | https://github.com/ramhiser/paper-hdrda |
Framework | none |
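
A numpy sketch of the RDA-style covariance shrinkage that HDRDA builds on: each class covariance is blended with the pooled covariance and then shrunk toward a scaled identity, and classification uses the Gaussian discriminant score. This follows Friedman's RDA formulation rather than the paper's exact HDRDA estimator; the data and parameters are toy values.

```python
# Sketch of RDA-style covariance shrinkage (the family HDRDA belongs to):
# blend each class covariance with the pooled covariance, then shrink toward
# a scaled identity. Not the paper's exact HDRDA estimator.
import numpy as np

def rda_covariance(X_k, X_all, lam=0.5, gamma=0.1):
    S_k = np.cov(X_k, rowvar=False)
    S_pooled = np.cov(X_all, rowvar=False)
    S = (1 - lam) * S_k + lam * S_pooled           # borrow from the pooled estimate
    p = S.shape[0]
    return (1 - gamma) * S + gamma * (np.trace(S) / p) * np.eye(p)

def discriminant_score(x, mean_k, cov_k):
    # Gaussian discriminant score (up to the class prior): higher is better.
    diff = x - mean_k
    _, logdet = np.linalg.slogdet(cov_k)
    return -0.5 * (diff @ np.linalg.solve(cov_k, diff) + logdet)

rng = np.random.default_rng(1)
X0 = rng.normal(0.0, 1.0, size=(20, 5))
X1 = rng.normal(1.0, 1.0, size=(20, 5))
X_all = np.vstack([X0, X1])
x = rng.normal(1.0, 1.0, size=5)
scores = [discriminant_score(x, X.mean(axis=0), rda_covariance(X, X_all))
          for X in (X0, X1)]
print("predicted class:", int(np.argmax(scores)))
```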
MusicMood: Predicting the mood of music from song lyrics using machine learning
Title | MusicMood: Predicting the mood of music from song lyrics using machine learning |
Authors | Sebastian Raschka |
Abstract | Sentiment prediction of contemporary music can have a wide range of applications in modern society, for instance, selecting music for public institutions such as hospitals or restaurants to potentially improve the emotional well-being of personnel, patients, and customers, respectively. In this project, a music recommendation system was built upon a naive Bayes classifier trained to predict the sentiment of songs based on song lyrics alone. The experimental results show that music corresponding to a happy mood can be detected with high precision based on text features obtained from song lyrics. |
Tasks | |
Published | 2016-11-01 |
URL | http://arxiv.org/abs/1611.00138v1 |
http://arxiv.org/pdf/1611.00138v1.pdf | |
PWC | https://paperswithcode.com/paper/musicmood-predicting-the-mood-of-music-from |
Repo | https://github.com/rasbt/musicmood |
Framework | none |
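
A minimal sketch of the described pipeline: bag-of-words features over lyrics feeding a multinomial naive Bayes mood classifier. The toy lyrics and labels are invented; the paper's actual dataset and preprocessing differ.

```python
# Sketch: naive Bayes mood classifier over song lyrics (bag-of-words features).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

lyrics = ["sunshine dancing all night long",
          "tears falling in the cold dark rain",
          "we laugh and sing together",
          "lonely heart broken and blue"]
moods = ["happy", "sad", "happy", "sad"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(lyrics, moods)
print(clf.predict(["dancing in the sunshine"]))    # expected: ['happy']
```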
Improved Techniques for Training GANs
Title | Improved Techniques for Training GANs |
Authors | Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen |
Abstract | We present a variety of new architectural features and training procedures that we apply to the generative adversarial networks (GANs) framework. We focus on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic. Unlike most work on generative models, our primary goal is not to train a model that assigns high likelihood to test data, nor do we require the model to be able to learn well without using any labels. Using our new techniques, we achieve state-of-the-art results in semi-supervised classification on MNIST, CIFAR-10 and SVHN. The generated images are of high quality as confirmed by a visual Turing test: our model generates MNIST samples that humans cannot distinguish from real data, and CIFAR-10 samples that yield a human error rate of 21.3%. We also present ImageNet samples with unprecedented resolution and show that our methods enable the model to learn recognizable features of ImageNet classes. |
Tasks | Conditional Image Generation, Image Generation, Semi-Supervised Image Classification |
Published | 2016-06-10 |
URL | http://arxiv.org/abs/1606.03498v1 |
http://arxiv.org/pdf/1606.03498v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-techniques-for-training-gans |
Repo | https://github.com/vuanhtu1993/Keras-SRGANs |
Framework | tf |
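
One of the techniques the paper introduces is feature matching: the generator is trained to match the expected activations of an intermediate discriminator layer on real versus generated data, rather than directly maximizing the discriminator output. A hedged torch sketch with placeholder networks and dimensions.

```python
# Sketch of the feature-matching generator objective from Improved GAN training:
# match mean intermediate discriminator features on real vs. generated batches.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))   # toy generator
D_features = nn.Sequential(nn.Linear(32, 64), nn.ReLU())             # intermediate D layer

real = torch.randn(8, 32)
z = torch.randn(8, 16)
fake = G(z)

f_real = D_features(real).mean(dim=0).detach()   # treat real-batch statistics as a fixed target
f_fake = D_features(fake).mean(dim=0)
feature_matching_loss = ((f_real - f_fake) ** 2).sum()
feature_matching_loss.backward()                 # in practice only G is updated with this loss
```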
A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks
Title | A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks |
Authors | Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher |
Abstract | Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layers include shortcut connections to lower-level task predictions to reflect linguistic hierarchies. We use a simple regularization term to allow for optimizing all model weights to improve one task’s loss without exhibiting catastrophic interference with the other tasks. Our single end-to-end model obtains state-of-the-art or competitive results on five different tasks spanning tagging, parsing, relatedness, and entailment. |
Tasks | Chunking, Multi-Task Learning |
Published | 2016-11-05 |
URL | http://arxiv.org/abs/1611.01587v5 |
http://arxiv.org/pdf/1611.01587v5.pdf | |
PWC | https://paperswithcode.com/paper/a-joint-many-task-model-growing-a-neural |
Repo | https://github.com/rubythonode/joint-many-task-model |
Framework | tf |
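
A minimal sketch of the "simple regularization term" mentioned above (successive regularization): while training a higher-level task, the shared lower-layer parameters are penalized for drifting from a snapshot taken beforehand, limiting catastrophic interference. The δ coefficient, layer, and stand-in task loss are placeholders.

```python
# Sketch of successive regularization: keep shared lower-layer weights close to
# a snapshot taken before training the current (higher-level) task.
import torch
import torch.nn as nn

shared_layer = nn.Linear(100, 100)               # stand-in for shared lower-level parameters
snapshot = [p.detach().clone() for p in shared_layer.parameters()]

def successive_regularization(module, snapshot, delta=1e-2):
    penalty = sum(((p - p_old) ** 2).sum()
                  for p, p_old in zip(module.parameters(), snapshot))
    return delta * penalty

task_loss = shared_layer(torch.randn(4, 100)).pow(2).mean()   # stand-in for a task loss
total_loss = task_loss + successive_regularization(shared_layer, snapshot)
total_loss.backward()
```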
Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex
Title | Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex |
Authors | Qianli Liao, Tomaso Poggio |
Abstract | We discuss relations between Residual Networks (ResNet), Recurrent Neural Networks (RNNs) and the primate visual cortex. We begin with the observation that a shallow RNN is exactly equivalent to a very deep ResNet with weight sharing among the layers. A direct implementation of such an RNN, although it has orders of magnitude fewer parameters, achieves performance similar to the corresponding ResNet. We propose 1) a generalization of both RNN and ResNet architectures and 2) the conjecture that a class of moderately deep RNNs is a biologically-plausible model of the ventral stream in visual cortex. We demonstrate the effectiveness of the architectures by testing them on the CIFAR-10 dataset. |
Tasks | |
Published | 2016-04-13 |
URL | http://arxiv.org/abs/1604.03640v1 |
http://arxiv.org/pdf/1604.03640v1.pdf | |
PWC | https://paperswithcode.com/paper/bridging-the-gaps-between-residual-learning |
Repo | https://github.com/ry/tensorflow-resnet |
Framework | tf |
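
A short numpy sketch of the central observation: unrolling a shallow RNN of the form h ← h + f(h; W) for T steps with a shared W computes the same function as a T-block ResNet whose blocks share weights. Dimensions and the tanh residual branch are assumptions.

```python
# Sketch of the ResNet/RNN correspondence: iterating h <- h + f(h; W) for T steps
# with a shared W is a T-block ResNet with weight sharing across blocks.
import numpy as np

rng = np.random.default_rng(0)
d, T = 8, 5
W = rng.normal(scale=0.1, size=(d, d))   # shared transition weights

def f(h, W):
    return np.tanh(h @ W)                # the shared residual branch

h = rng.normal(size=d)                   # input state
for _ in range(T):                       # "time steps" play the role of "residual blocks"
    h = h + f(h, W)
print(h.shape)
```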
Stick-Breaking Variational Autoencoders
Title | Stick-Breaking Variational Autoencoders |
Authors | Eric Nalisnick, Padhraic Smyth |
Abstract | We extend Stochastic Gradient Variational Bayes to perform posterior inference for the weights of Stick-Breaking processes. This development allows us to define a Stick-Breaking Variational Autoencoder (SB-VAE), a Bayesian nonparametric version of the variational autoencoder that has a latent representation with stochastic dimensionality. We experimentally demonstrate that the SB-VAE, and a semi-supervised variant, learn highly discriminative latent representations that often outperform the Gaussian VAE’s. |
Tasks | |
Published | 2016-05-20 |
URL | http://arxiv.org/abs/1605.06197v3 |
http://arxiv.org/pdf/1605.06197v3.pdf | |
PWC | https://paperswithcode.com/paper/stick-breaking-variational-autoencoders |
Repo | https://github.com/enalisnick/stick-breaking_dgms |
Framework | none |
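
A numpy sketch of the stick-breaking construction underlying the SB-VAE latent space: fractions v_k break off what remains of a unit-length stick, giving weights π_k = v_k ∏_{j<k}(1 − v_j). Beta-distributed fractions are used here for simplicity; the paper samples from a Kumaraswamy posterior for differentiability.

```python
# Sketch of the stick-breaking construction behind the SB-VAE latent space:
# pi_k = v_k * prod_{j<k} (1 - v_j), with fractions v_k in (0, 1).
import numpy as np

def stick_breaking_weights(v):
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

rng = np.random.default_rng(0)
v = rng.beta(1.0, 5.0, size=10)          # Beta fractions; the SB-VAE uses Kumaraswamy samples
pi = stick_breaking_weights(v)
print(pi, pi.sum())                      # weights sum to < 1; the remainder is the unbroken stick
```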
Mapping Between fMRI Responses to Movies and their Natural Language Annotations
Title | Mapping Between fMRI Responses to Movies and their Natural Language Annotations |
Authors | Kiran Vodrahalli, Po-Hsuan Chen, Yingyu Liang, Christopher Baldassano, Janice Chen, Esther Yong, Christopher Honey, Uri Hasson, Peter Ramadge, Ken Norman, Sanjeev Arora |
Abstract | Several research groups have shown how to correlate fMRI responses to the meanings of presented stimuli. This paper presents new methods for doing so when only a natural language annotation is available as the description of the stimulus. We study fMRI data gathered from subjects watching an episode of BBC's Sherlock [1], and learn bidirectional mappings between fMRI responses and natural language representations. We show how to leverage data from multiple subjects watching the same movie to improve the accuracy of the mappings, allowing us to succeed at a scene classification task with 72% accuracy (random guessing would give 4%) and at a scene ranking task with average rank in the top 4% (random guessing would give 50%). The key ingredients are (a) the use of the Shared Response Model (SRM) and its variant SRM-ICA [2, 3] to aggregate fMRI data from multiple subjects, both of which are shown to be superior to standard PCA in producing low-dimensional representations for the tasks in this paper; (b) a sentence embedding technique adapted from the natural language processing (NLP) literature [4] that produces semantic vector representations of the annotations; (c) using previous timestep information in the featurization of the predictor data. |
Tasks | Scene Classification, Sentence Embedding |
Published | 2016-10-13 |
URL | http://arxiv.org/abs/1610.03914v3 |
http://arxiv.org/pdf/1610.03914v3.pdf | |
PWC | https://paperswithcode.com/paper/mapping-between-fmri-responses-to-movies-and |
Repo | https://github.com/asprout/CPSC490 |
Framework | none |
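
A hedged sketch of the bidirectional linear mappings: ridge regressions from (shared-response) fMRI features to sentence-embedding annotations and back. Random matrices stand in for the Sherlock data, and the SRM/SRM-ICA aggregation and time-lag features are omitted.

```python
# Sketch: learn ridge-regression maps between fMRI features and text embeddings
# in both directions (the paper adds SRM/SRM-ICA aggregation and time lags).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
T = 200                                      # time points / annotation segments (stand-in)
fmri = rng.normal(size=(T, 50))              # shared-response fMRI features (stand-in)
text = rng.normal(size=(T, 20))              # sentence-embedding annotations (stand-in)

fmri_to_text = Ridge(alpha=1.0).fit(fmri, text)
text_to_fmri = Ridge(alpha=1.0).fit(text, fmri)

pred_text = fmri_to_text.predict(fmri[:5])   # would be scored by scene ranking/classification
print(pred_text.shape)
```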
Delving into Transferable Adversarial Examples and Black-box Attacks
Title | Delving into Transferable Adversarial Examples and Black-box Attacks |
Authors | Yanpei Liu, Xinyun Chen, Chang Liu, Dawn Song |
Abstract | An intriguing property of deep neural networks is the existence of adversarial examples, which can transfer among different architectures. These transferable adversarial examples may severely hinder deep neural network-based applications. Previous works mostly study transferability using small-scale datasets. In this work, we are the first to conduct an extensive study of the transferability over large models and a large-scale dataset, and we are also the first to study the transferability of targeted adversarial examples with their target labels. We study both non-targeted and targeted adversarial examples, and show that while transferable non-targeted adversarial examples are easy to find, targeted adversarial examples generated using existing approaches almost never transfer with their target labels. Therefore, we propose novel ensemble-based approaches to generating transferable adversarial examples. Using such approaches, we observe a large proportion of targeted adversarial examples that are able to transfer with their target labels for the first time. We also present some geometric studies to help understand transferable adversarial examples. Finally, we show that the adversarial examples generated using ensemble-based approaches can successfully attack Clarifai.com, which is a black-box image classification system. |
Tasks | Adversarial Attack, Adversarial Defense, Image Classification |
Published | 2016-11-08 |
URL | http://arxiv.org/abs/1611.02770v3 |
http://arxiv.org/pdf/1611.02770v3.pdf | |
PWC | https://paperswithcode.com/paper/delving-into-transferable-adversarial |
Repo | https://github.com/sunblaze-ucb/transferability-advdnn-pub |
Framework | tf |
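
A minimal sketch of the ensemble idea: craft a targeted perturbation that lowers the target-class loss averaged over several white-box substitute models, hoping it transfers to an unseen black-box model. Shown here as a single signed-gradient step on toy linear models; the paper uses iterative optimization over large ImageNet networks.

```python
# Sketch of an ensemble-based targeted attack: one signed-gradient step that
# decreases the target-class loss averaged over several substitute models.
import torch
import torch.nn as nn

models = [nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
          for _ in range(2)]                  # toy substitute models
x = torch.randn(1, 32, requires_grad=True)    # stand-in for an image
target = torch.tensor([3])                    # desired (wrong) label
eps = 0.05

loss = sum(nn.CrossEntropyLoss()(m(x), target) for m in models) / len(models)
loss.backward()
x_adv = (x - eps * x.grad.sign()).detach()    # step toward the target class on the ensemble
```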
Distributed Coordinate Descent for Generalized Linear Models with Regularization
Title | Distributed Coordinate Descent for Generalized Linear Models with Regularization |
Authors | Ilya Trofimov, Alexander Genkin |
Abstract | A generalized linear model with $L_1$ and $L_2$ regularization is a widely used technique for solving classification, class probability estimation and regression problems. With the numbers of both features and examples growing rapidly in fields like text mining and clickstream data analysis, parallelization and the use of cluster architectures become important. We present a novel algorithm for fitting regularized generalized linear models in the distributed environment. The algorithm splits data between nodes by features, uses coordinate descent on each node and line search to merge results globally. A convergence proof is provided. A modification of the algorithm addresses the slow node problem. For an important particular case of logistic regression, we empirically compare our program with several state-of-the-art approaches that rely on different algorithmic and data splitting methods. Experiments demonstrate that our approach is scalable and superior when training on large and sparse datasets. |
Tasks | |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02101v2 |
http://arxiv.org/pdf/1611.02101v2.pdf | |
PWC | https://paperswithcode.com/paper/distributed-coordinate-descent-for |
Repo | https://github.com/IlyaTrofimov/dlr |
Framework | none |
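
A single-node sketch of the building block: cyclic coordinate descent for L2-regularized logistic regression, updating one weight at a time with a one-dimensional Newton step. The distributed algorithm splits features across nodes and merges per-node updates with a line search, which is omitted here; the data are random.

```python
# Sketch: cyclic coordinate descent for L2-regularized logistic regression
# (the per-node building block; the paper distributes features across nodes).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d, lam = 200, 10, 0.1
X = rng.normal(size=(n, d))
y = (rng.normal(size=n) > 0).astype(float)
w = np.zeros(d)

for it in range(50):
    for j in range(d):                        # update one coordinate at a time
        p = sigmoid(X @ w)
        grad_j = X[:, j] @ (p - y) / n + lam * w[j]
        hess_j = (X[:, j] ** 2) @ (p * (1 - p)) / n + lam
        w[j] -= grad_j / hess_j               # one-dimensional Newton step
print(np.round(w, 3))
```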
Spatiotemporal Residual Networks for Video Action Recognition
Title | Spatiotemporal Residual Networks for Video Action Recognition |
Authors | Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes |
Abstract | Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos. Recently, Residual Networks (ResNets) have arisen as a new technique to train extremely deep architectures. In this paper, we introduce spatiotemporal ResNets as a combination of these two approaches. Our novel architecture generalizes ResNets for the spatiotemporal domain by introducing residual connections in two ways. First, we inject residual connections between the appearance and motion pathways of a two-stream architecture to allow spatiotemporal interaction between the two streams. Second, we transform pretrained image ConvNets into spatiotemporal networks by equipping these with learnable convolutional filters that are initialized as temporal residual connections and operate on adjacent feature maps in time. This approach slowly increases the spatiotemporal receptive field as the depth of the model increases and naturally integrates image ConvNet design principles. The whole model is trained end-to-end to allow hierarchical learning of complex spatiotemporal features. We evaluate our novel spatiotemporal ResNet using two widely used action recognition benchmarks where it exceeds the previous state-of-the-art. |
Tasks | Action Recognition In Videos, Temporal Action Localization |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02155v1 |
http://arxiv.org/pdf/1611.02155v1.pdf | |
PWC | https://paperswithcode.com/paper/spatiotemporal-residual-networks-for-video |
Repo | https://github.com/feichtenhofer/st-resnet |
Framework | none |
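
A hedged torch sketch of the two residual ideas: (a) motion-stream features injected into the appearance stream through an additive (residual) connection, and (b) a temporal convolution over adjacent frames initialized as an identity so the pretrained image ConvNet's behavior is preserved at the start of training. Channel counts, tensor shapes, and the exact initialization are assumptions.

```python
# Sketch of the two residual connections in spatiotemporal ResNets:
# (a) motion-stream features added residually into the appearance stream,
# (b) a temporal convolution over adjacent frames initialized near identity.
import torch
import torch.nn as nn

B, C, T, H, W = 1, 64, 8, 14, 14
appearance = torch.randn(B, C, T, H, W)      # RGB-stream feature maps
motion = torch.randn(B, C, T, H, W)          # optical-flow-stream feature maps

# (a) cross-stream residual injection
fused = appearance + motion

# (b) temporal conv: kernel 3 over time, 1x1 over space, initialized as identity
tconv = nn.Conv3d(C, C, kernel_size=(3, 1, 1), padding=(1, 0, 0), bias=False)
nn.init.zeros_(tconv.weight)
with torch.no_grad():
    for c in range(C):
        tconv.weight[c, c, 1, 0, 0] = 1.0    # center tap passes each frame through unchanged
out = tconv(fused)                           # starts as identity over time, learns temporal mixing
```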
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
Title | Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic |
Authors | Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine |
Abstract | Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is their high sample complexity. Batch policy gradient methods offer stable learning, but at the cost of high variance, which often requires large batches. TD-style methods, such as off-policy actor-critic and Q-learning, are more sample-efficient but biased, and often require costly hyperparameter sweeps to stabilize. In this work, we aim to develop methods that combine the stability of policy gradients with the efficiency of off-policy RL. We present Q-Prop, a policy gradient method that uses a Taylor expansion of the off-policy critic as a control variate. Q-Prop is both sample efficient and stable, and effectively combines the benefits of on-policy and off-policy methods. We analyze the connection between Q-Prop and existing model-free algorithms, and use control variate theory to derive two variants of Q-Prop with conservative and aggressive adaptation. We show that conservative Q-Prop provides substantial gains in sample efficiency over trust region policy optimization (TRPO) with generalized advantage estimation (GAE), and improves stability over deep deterministic policy gradient (DDPG), the state-of-the-art on-policy and off-policy methods, on OpenAI Gym’s MuJoCo continuous control environments. |
Tasks | Continuous Control, Policy Gradient Methods, Q-Learning |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02247v3 |
http://arxiv.org/pdf/1611.02247v3.pdf | |
PWC | https://paperswithcode.com/paper/q-prop-sample-efficient-policy-gradient-with |
Repo | https://github.com/brain-research/mirage-rl |
Framework | tf |
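
A toy numpy sketch of the Q-Prop control variate for a one-dimensional Gaussian policy: the critic's first-order Taylor expansion is subtracted from the sampled advantage inside the score-function term and its analytic gradient is added back, reducing variance without adding bias. The critic, advantage noise, and policy are stand-ins; the real algorithm wraps this around TRPO/GAE with a DDPG-style critic.

```python
# Toy sketch of the Q-Prop control variate for a 1-D Gaussian policy:
# subtract the critic's Taylor expansion from the sampled advantage inside the
# score-function term, then add its analytic gradient back.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.0                          # Gaussian policy pi(a) = N(mu, sigma^2)
q_w = lambda a: -(a - 1.0) ** 2               # off-policy critic (stand-in)
dq_da = lambda a: -2.0 * (a - 1.0)            # its action gradient

a = rng.normal(mu, sigma, size=10000)         # on-policy actions
adv_hat = q_w(a) + 0.1 * rng.normal(size=a.size)   # noisy Monte Carlo advantage (stand-in)

score = (a - mu) / sigma ** 2                 # d/d_mu log pi(a)
taylor = q_w(mu) + dq_da(mu) * (a - mu)       # first-order expansion of the critic around mu

vanilla_grad = np.mean(score * adv_hat)
qprop_grad = np.mean(score * (adv_hat - taylor)) + dq_da(mu)   # analytic term added back
print(vanilla_grad, qprop_grad)               # both estimate the same gradient; Q-Prop has lower variance
```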
Fast K-Means with Accurate Bounds
Title | Fast K-Means with Accurate Bounds |
Authors | James Newling, François Fleuret |
Abstract | We propose a novel accelerated exact k-means algorithm, which performs better than the current state-of-the-art low-dimensional algorithm in 18 of 22 experiments, running up to 3 times faster. We also propose a general improvement of existing state-of-the-art accelerated exact k-means algorithms through better estimates of the distance bounds used to reduce the number of distance calculations, and get a speedup in 36 of 44 experiments, up to 1.8 times faster. We have conducted experiments with our own implementations of existing methods to ensure homogeneous evaluation of performance, and we show that our implementations perform as well or better than existing available implementations. Finally, we propose simplified variants of standard approaches and show that they are faster than their fully-fledged counterparts in 59 of 62 experiments. |
Tasks | |
Published | 2016-02-08 |
URL | http://arxiv.org/abs/1602.02514v6 |
http://arxiv.org/pdf/1602.02514v6.pdf | |
PWC | https://paperswithcode.com/paper/fast-k-means-with-accurate-bounds |
Repo | https://github.com/idiap/eakmeans |
Framework | none |
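
A simplified sketch of the bound-based pruning behind accelerated exact k-means: each point keeps an upper bound on the distance to its assigned centroid and a lower bound on the distance to any other centroid, and all distance computations for that point are skipped whenever the bounds prove the assignment cannot change. This is a Hamerly-style variant for illustration, not the paper's algorithm.

```python
# Sketch of bound-based pruning in exact k-means (simplified Hamerly-style test):
# skip all distance computations for a point whose upper bound to its own center
# is below its lower bound to every other center.
import numpy as np

def assign_with_bounds(X, centers, assign, upper, lower):
    recomputed = 0
    for i, x in enumerate(X):
        if upper[i] <= lower[i]:
            continue                              # bounds prove the assignment is unchanged
        d = np.linalg.norm(centers - x, axis=1)   # full recomputation only when needed
        order = np.argsort(d)
        assign[i], upper[i], lower[i] = order[0], d[order[0]], d[order[1]]
        recomputed += 1
    return recomputed

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
centers = rng.normal(size=(5, 2))
assign = np.zeros(len(X), dtype=int)
upper = np.full(len(X), np.inf)                   # loose bounds force a first full pass
lower = np.zeros(len(X))
print(assign_with_bounds(X, centers, assign, upper, lower))   # 1000 on the first pass
# After centers move by delta, bounds are loosened (upper += delta, lower -= delta)
# and most points then pass the test without any distance computation.
```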