May 7, 2019

2977 words 14 mins read

Paper Group AWR 16

Asynchronous Temporal Fields for Action Recognition

Title Asynchronous Temporal Fields for Action Recognition
Authors Gunnar A. Sigurdsson, Santosh Divvala, Ali Farhadi, Abhinav Gupta
Abstract Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and necessitates reasoning about the sequence of activities, as well as higher-level constructs such as intentions. But how do we model and reason about these? We propose a fully-connected temporal CRF model for reasoning over various aspects of activities that includes objects, actions, and intentions, where the potentials are predicted by a deep network. End-to-end training of such structured models is a challenging endeavor: for inference and learning we need to construct mini-batches consisting of whole videos, leading to mini-batches with only a few videos. This causes high correlation between data points, leading to a breakdown of the backprop algorithm. To address this challenge, we present an asynchronous variational inference method that allows efficient end-to-end training. Our method achieves a classification mAP of 22.4% on the Charades benchmark, outperforming the state-of-the-art (17.2% mAP), and offers equal gains on the task of temporal localization.
Tasks Action Classification, Temporal Action Localization, Temporal Localization
Published 2016-12-19
URL http://arxiv.org/abs/1612.06371v2
PDF http://arxiv.org/pdf/1612.06371v2.pdf
PWC https://paperswithcode.com/paper/asynchronous-temporal-fields-for-action
Repo https://github.com/gsig/temporal-fields
Framework torch
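
The asynchronous variational inference is the paper's contribution, but the quantity it approximates is a mean-field update in a fully-connected temporal CRF. A minimal numpy sketch of that underlying update, assuming a single label variable per frame and one shared pairwise potential (the actual model also reasons over objects, intentions, and progress, and updates messages asynchronously from a cache across mini-batches):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_field_temporal_crf(unaries, pairwise, n_iters=10):
    """Approximate marginals in a fully-connected temporal CRF.

    unaries:  (T, C) per-frame label scores (predicted by a deep net in the paper)
    pairwise: (C, C) label compatibility between any two frames
    """
    q = softmax(unaries)                      # start from independent per-frame beliefs
    for _ in range(n_iters):
        # each frame receives the summed beliefs of all other frames;
        # the paper computes these messages asynchronously from cached values
        msgs = (q.sum(axis=0, keepdims=True) - q) @ pairwise.T
        q = softmax(unaries + msgs)
    return q
```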

Iterative Alternating Neural Attention for Machine Reading

Title Iterative Alternating Neural Attention for Machine Reading
Authors Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio
Abstract We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document. Unlike previous models, we do not collapse the query into a single vector; instead, we deploy an iterative alternating attention mechanism that allows fine-grained exploration of both the query and the document. Our model outperforms state-of-the-art baselines on standard machine comprehension benchmarks such as CNN news articles and the Children’s Book Test (CBT) dataset.
Tasks Question Answering, Reading Comprehension
Published 2016-06-07
URL http://arxiv.org/abs/1606.02245v4
PDF http://arxiv.org/pdf/1606.02245v4.pdf
PWC https://paperswithcode.com/paper/iterative-alternating-neural-attention-for
Repo https://github.com/zyy1659949090/TensorFlow1
Framework tf
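
For a rough sense of the alternating loop, here is a numpy sketch, assuming pre-encoded query and document tokens; the random bilinear weights, the fixed number of steps, and the tanh state update are placeholders for the learned GRU-based inference of the real model:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def alternating_attention(query_enc, doc_enc, n_steps=8, seed=0):
    """Alternate between a query glimpse and a document glimpse, folding both
    into a running inference state; the final document attention scores
    candidate answer tokens.

    query_enc: (Lq, d) encoded query tokens
    doc_enc:   (Ld, d) encoded document tokens
    """
    d = query_enc.shape[1]
    rng = np.random.default_rng(seed)
    Wq = rng.standard_normal((d, d)) * 0.01            # learned in the real model
    Wd = rng.standard_normal((d, d)) * 0.01
    state = np.zeros(d)
    for _ in range(n_steps):
        q_att = softmax(query_enc @ (Wq @ state))      # (Lq,) attend over the query
        q_glimpse = q_att @ query_enc                  # (d,)
        d_att = softmax(doc_enc @ (Wd @ (state + q_glimpse)))  # (Ld,) then the document
        d_glimpse = d_att @ doc_enc
        state = np.tanh(state + q_glimpse + d_glimpse) # stand-in for the GRU update
    return d_att
```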

Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders

Title Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders
Authors Simon Šuster, Ivan Titov, Gertjan van Noord
Abstract We present an approach to learning multi-sense word embeddings relying both on monolingual and bilingual information. Our model consists of an encoder, which uses monolingual and bilingual context (i.e. a parallel sentence) to choose a sense for a given word, and a decoder which predicts context words based on the chosen sense. The two components are estimated jointly. We observe that the word representations induced from bilingual data outperform the monolingual counterparts across a range of evaluation tasks, even though crosslingual information is not available at test time.
Tasks Word Embeddings
Published 2016-03-30
URL http://arxiv.org/abs/1603.09128v1
PDF http://arxiv.org/pdf/1603.09128v1.pdf
PWC https://paperswithcode.com/paper/bilingual-learning-of-multi-sense-embeddings
Repo https://github.com/rug-compling/bimu
Framework none
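
A sketch of the encoder's discrete choice, which is what makes this a discrete autoencoder: the sense that best matches the pooled monolingual and bilingual context wins, and the decoder (not shown) is then trained skip-gram-style to predict context words from the chosen sense vector. Mean pooling and dot-product scoring are simplifying assumptions:

```python
import numpy as np

def choose_sense(sense_vecs, mono_ctx, bi_ctx):
    """Pick a sense for a pivot word from its monolingual context and the
    aligned words of a parallel sentence.

    sense_vecs: (S, d) candidate sense embeddings of the pivot word
    mono_ctx:   (Nm, d) embeddings of monolingual context words
    bi_ctx:     (Nb, d) embeddings of aligned words (empty at test time,
                matching the abstract's observation that cross-lingual
                information is unavailable then)
    """
    ctx = np.concatenate([mono_ctx, bi_ctx]).mean(axis=0) if len(bi_ctx) \
          else mono_ctx.mean(axis=0)
    return int(np.argmax(sense_vecs @ ctx))   # hard (discrete) sense assignment
```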

Superpixels: An Evaluation of the State-of-the-Art

Title Superpixels: An Evaluation of the State-of-the-Art
Authors David Stutz, Alexander Hermans, Bastian Leibe
Abstract Superpixels group perceptually similar pixels to create visually meaningful entities while heavily reducing the number of primitives for subsequent processing steps. Owing to these properties, superpixel algorithms have received much attention since their naming in 2003. Today, publicly available superpixel algorithms have turned into standard tools in low-level vision. As such, and due to their quick adoption in a wide range of applications, appropriate benchmarks are crucial for algorithm selection and comparison. Until now, the rapidly growing number of algorithms as well as varying experimental setups hindered the development of a unifying benchmark. We present a comprehensive evaluation of 28 state-of-the-art superpixel algorithms utilizing a benchmark focusing on fair comparison and designed to provide new insights relevant for applications. To this end, we explicitly discuss parameter optimization and the importance of strictly enforcing connectivity. Furthermore, by extending well-known metrics, we are able to summarize algorithm performance independent of the number of generated superpixels, thereby overcoming a major limitation of available benchmarks. We also discuss runtime, robustness against noise, blur, and affine transformations, implementation details, and aspects of visual quality. Finally, we present an overall ranking of superpixel algorithms which redefines the state-of-the-art and enables researchers to easily select appropriate algorithms and the corresponding implementations, which are made publicly available as part of our benchmark at davidstutz.de/projects/superpixel-benchmark/.
Tasks
Published 2016-12-06
URL http://arxiv.org/abs/1612.01601v3
PDF http://arxiv.org/pdf/1612.01601v3.pdf
PWC https://paperswithcode.com/paper/superpixels-an-evaluation-of-the-state-of-the
Repo https://github.com/davidstutz/superpixel-benchmark
Framework none
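
One evaluation detail the benchmark stresses, strictly enforcing connectivity, is easy to show in code. A small sketch using scipy, assuming a plain integer label map; without this step, an algorithm that leaves a "superpixel" scattered in several disconnected pieces can look better on some metrics than it should:

```python
import numpy as np
from scipy import ndimage

def enforce_connectivity(labels):
    """Relabel a segmentation so every superpixel is one connected region.

    labels: (H, W) integer superpixel label map
    """
    out = np.zeros_like(labels)
    next_id = 0
    for sp in np.unique(labels):
        comps, n = ndimage.label(labels == sp)   # connected components of this label
        for c in range(1, n + 1):                # each component becomes its own superpixel
            out[comps == c] = next_id
            next_id += 1
    return out
```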

End to End Learning for Self-Driving Cars

Title End to End Learning for Self-Driving Cars
Authors Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, Karol Zieba
Abstract We trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. This end-to-end approach proved surprisingly powerful. With minimal training data from humans, the system learns to drive in traffic on local roads with or without lane markings and on highways. It also operates in areas with unclear visual guidance, such as parking lots and unpaved roads. The system automatically learns internal representations of the necessary processing steps, such as detecting useful road features, with only the human steering angle as the training signal. We never explicitly trained it to detect, for example, the outline of roads. Compared to explicit decomposition of the problem, such as lane marking detection, path planning, and control, our end-to-end system optimizes all processing steps simultaneously. We argue that this will eventually lead to better performance and smaller systems. Better performance will result because the internal components self-optimize to maximize overall system performance, instead of optimizing human-selected intermediate criteria, e.g., lane detection. Such criteria are understandably selected for ease of human interpretation, which doesn’t automatically guarantee maximum system performance. Smaller networks are possible because the system learns to solve the problem with the minimal number of processing steps. We used an NVIDIA DevBox and Torch 7 for training, and an NVIDIA DRIVE(TM) PX self-driving car computer, also running Torch 7, for determining where to drive. The system operates at 30 frames per second (FPS).
Tasks Lane Detection, Self-Driving Cars
Published 2016-04-25
URL http://arxiv.org/abs/1604.07316v1
PDF http://arxiv.org/pdf/1604.07316v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-learning-for-self-driving-cars
Repo https://github.com/PankajKarki/Self-Driving-Car
Framework tf
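
The architecture itself is compact enough to write down. A PyTorch sketch following the layer sizes in the paper's figure (five convolutions, then fully connected layers down to a single steering value); the ReLU nonlinearity is an assumption since the paper does not name its activation, and the normalization layer and data augmentation are omitted:

```python
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    """Maps a 66x200 YUV frame (N, 3, 66, 200) to one steering command."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
            nn.Flatten(),                      # 64 x 1 x 18 = 1152 features
            nn.Linear(1152, 100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),                  # steering command
        )

    def forward(self, x):
        return self.net(x)
```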

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

Title SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Authors Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer
Abstract Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet
Tasks Model Compression
Published 2016-02-24
URL http://arxiv.org/abs/1602.07360v4
PDF http://arxiv.org/pdf/1602.07360v4.pdf
PWC https://paperswithcode.com/paper/squeezenet-alexnet-level-accuracy-with-50x
Repo https://github.com/x5675602/squeezeNet_keras
Framework none
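
The parameter savings come mostly from the Fire module the paper defines: a narrow 1x1 "squeeze" layer feeding parallel 1x1 and 3x3 "expand" layers whose outputs are concatenated. A PyTorch sketch:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet Fire module: squeeze with 1x1 convolutions, then expand
    with a mix of 1x1 and 3x3 convolutions and concatenate the results.
    Keeping the squeeze layer narrow is what cuts the parameter count."""
    def __init__(self, in_ch, squeeze_ch, expand1_ch, expand3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, 1)
        self.expand1 = nn.Conv2d(squeeze_ch, expand1_ch, 1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand3_ch, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1(x)),
                          self.relu(self.expand3(x))], dim=1)

# e.g. the paper's fire2 module: Fire(96, 16, 64, 64)
```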

Dual Learning for Machine Translation

Title Dual Learning for Machine Translation
Authors Yingce Xia, Di He, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma
Abstract While neural machine translation (NMT) has made good progress over the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which can enable an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop and generate informative feedback signals to train the translation models, even without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and another agent to represent the model for the dual task, and then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using policy gradient methods). We call the corresponding approach to neural machine translation \emph{dual-NMT}. Experiments show that dual-NMT works very well on English$\leftrightarrow$French translation; in particular, by learning from monolingual data (with 10% bilingual data for warm start), it achieves accuracy comparable to NMT trained on the full bilingual data for the French-to-English translation task.
Tasks Language Modelling, Machine Translation, Policy Gradient Methods
Published 2016-11-01
URL http://arxiv.org/abs/1611.00179v1
PDF http://arxiv.org/pdf/1611.00179v1.pdf
PWC https://paperswithcode.com/paper/dual-learning-for-machine-translation
Repo https://github.com/NonameAuPlatal/Dual_Learning
Framework none
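
A skeleton of one round of the dual-learning game, with every model replaced by a toy stand-in (all functions below are hypothetical placeholders; the real system samples translations from NMT models and updates both of them by policy gradient with the combined reward, and the linear reward combination mirrors the paper while the particular weight here is illustrative):

```python
def dual_learning_step(translate_ab, translate_ba, lm_b, sentence_a, alpha=0.005):
    """One closed-loop round on a monolingual sentence from language A:
    translate to B, reward fluency under a B language model, translate back,
    and reward reconstruction of the original sentence."""
    mid_b = translate_ab(sentence_a)                 # primal step (sampled in practice)
    back_a = translate_ba(mid_b)                     # dual step
    r_lm = lm_b(mid_b)                               # language-model reward
    r_rec = -sum(a != b for a, b in zip(sentence_a, back_a))  # reconstruction reward
    return alpha * r_lm + (1 - alpha) * r_rec        # signal for REINFORCE updates

# toy stand-ins: "translation" just reverses the token list
reward = dual_learning_step(
    translate_ab=lambda s: s[::-1],
    translate_ba=lambda s: s[::-1],
    lm_b=lambda s: -0.1 * len(s),
    sentence_a="the cat sat".split(),
)
```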

Generating images with recurrent adversarial networks

Title Generating images with recurrent adversarial networks
Authors Daniel Jiwoong Im, Chris Dongjoo Kim, Hui Jiang, Roland Memisevic
Abstract Gatys et al. (2015) showed that optimizing pixels to match features in a convolutional network with respect to reference image features is a way to render images of high visual quality. We show that unrolling this gradient-based optimization yields a recurrent computation that creates images by incrementally adding onto a visual “canvas”. We propose a recurrent generative model inspired by this view, and show that it can be trained using adversarial training to generate very good image samples. We also propose a way to quantitatively compare adversarial networks by having the generators and discriminators of these networks compete against each other.
Tasks
Published 2016-02-16
URL http://arxiv.org/abs/1602.05110v5
PDF http://arxiv.org/pdf/1602.05110v5.pdf
PWC https://paperswithcode.com/paper/generating-images-with-recurrent-adversarial
Repo https://github.com/jiwoongim/GRAN
Framework none
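
The "canvas" view translates directly into code. A PyTorch sketch of the recurrent generator, assuming flat images and fully connected encoder/decoder layers for brevity (the paper's model is convolutional, and all sizes here are illustrative):

```python
import torch
import torch.nn as nn

class RecurrentGenerator(nn.Module):
    """At every step: encode the current canvas, combine it with the noise
    code, decode a residual, and add it onto the canvas."""
    def __init__(self, z_dim=100, img_dim=784, hid=256, steps=5):
        super().__init__()
        self.steps, self.img_dim = steps, img_dim
        self.encode = nn.Sequential(nn.Linear(img_dim, hid), nn.ReLU())
        self.decode = nn.Linear(hid + z_dim, img_dim)

    def forward(self, z):                             # z: (N, z_dim) noise codes
        canvas = torch.zeros(z.size(0), self.img_dim, device=z.device)
        for _ in range(self.steps):
            h = self.encode(canvas)
            canvas = canvas + self.decode(torch.cat([h, z], dim=1))
        return torch.tanh(canvas)                     # final image in [-1, 1]
```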

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Title Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Authors Qiang Liu, Dilin Wang
Abstract We propose a general purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization. Our method iteratively transports a set of particles to match the target distribution, by applying a form of functional gradient descent that minimizes the KL divergence. Empirical studies are performed on various real world models and datasets, on which our method is competitive with existing state-of-the-art methods. The derivation of our method is based on a new theoretical result that connects the derivative of KL divergence under smooth transforms with Stein’s identity and a recently proposed kernelized Stein discrepancy, which is of independent interest.
Tasks Bayesian Inference
Published 2016-08-16
URL https://arxiv.org/abs/1608.04471v3
PDF https://arxiv.org/pdf/1608.04471v3.pdf
PWC https://paperswithcode.com/paper/stein-variational-gradient-descent-a-general
Repo https://github.com/feynmanliang/dist-svgd
Framework pytorch
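
The update rule is simple enough to state in full. A numpy implementation of one SVGD step with an RBF kernel and the median-heuristic bandwidth; the form of the update follows the paper, while the step size and kernel choice are up to the user:

```python
import numpy as np

def svgd_update(x, grad_logp, stepsize=0.1, h=-1.0):
    """One SVGD step on particles x (n, d):
    phi(x_i) = (1/n) * sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    """
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]               # (n, n, d) pairwise differences
    sq = (diff ** 2).sum(axis=-1)                      # squared distances
    if h < 0:
        h = np.median(sq) / np.log(n + 1) + 1e-8       # median heuristic
    k = np.exp(-sq / h)                                # RBF kernel matrix
    grad_k = -(2.0 / h) * (k[:, :, None] * diff).sum(axis=0)   # sum_j grad_{x_j} k
    phi = (k @ grad_logp(x) + grad_k) / n
    return x + stepsize * phi

# example: transport scattered particles toward a standard Gaussian
x = np.random.default_rng(0).standard_normal((100, 2)) + 5.0
for _ in range(500):
    x = svgd_update(x, lambda p: -p)   # grad log N(0, I) at p is -p
```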

Deep Convolution Networks for Compression Artifacts Reduction

Title Deep Convolution Networks for Compression Artifacts Reduction
Authors Ke Yu, Chao Dong, Chen Change Loy, Xiaoou Tang
Abstract Lossy compression introduces complex compression artifacts, particularly blocking artifacts, ringing effects, and blurring. Existing algorithms either focus on removing blocking artifacts and produce blurred output, or restore sharpened images that are accompanied by ringing effects. Inspired by the success of deep convolutional networks (DCNs) on super-resolution, we formulate a compact and efficient network for seamless attenuation of different compression artifacts. To meet the speed requirement of real-world applications, we further accelerate the proposed baseline model by layer decomposition and the joint use of large-stride convolutional and deconvolutional layers. This also leads to a more general CNN framework that has a close relationship with the conventional Multi-Layer Perceptron (MLP). Finally, the modified network achieves a speed-up of 7.5 times with almost no performance loss compared to the baseline model. We also demonstrate that a deeper model can be effectively trained with features learned in a shallow network. Following a similar “easy to hard” idea, we systematically investigate three practical transfer settings and show the effectiveness of transfer learning in low-level vision problems. Our method shows performance superior to the state-of-the-art methods both on benchmark datasets and in a real-world use case.
Tasks Transfer Learning
Published 2016-08-09
URL http://arxiv.org/abs/1608.02778v1
PDF http://arxiv.org/pdf/1608.02778v1.pdf
PWC https://paperswithcode.com/paper/deep-convolution-networks-for-compression
Repo https://github.com/ankitf/artifact_reduction_jpeg
Framework tf
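
A PyTorch sketch of the four-layer baseline, assuming the common reading of the architecture as feature extraction, enhancement, mapping, and reconstruction with 9-7-1-5 filters applied to the luminance channel; the accelerated variant with large-stride conv/deconv layers is omitted:

```python
import torch.nn as nn

class ARCNN(nn.Module):
    """Four-layer artifact-reduction CNN for a single-channel (Y) input."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(inplace=True),   # feature extraction
            nn.Conv2d(64, 32, 7, padding=3), nn.ReLU(inplace=True),  # feature enhancement
            nn.Conv2d(32, 16, 1), nn.ReLU(inplace=True),             # mapping
            nn.Conv2d(16, 1, 5, padding=2),                          # reconstruction
        )

    def forward(self, x):   # x: (N, 1, H, W) compressed luminance
        return self.net(x)
```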

Source-LDA: Enhancing probabilistic topic models using prior knowledge sources

Title Source-LDA: Enhancing probabilistic topic models using prior knowledge sources
Authors Justin Wood, Patrick Tan, Wei Wang, Corey Arnold
Abstract A popular approach to topic modeling involves extracting co-occurring n-grams of a corpus into semantic themes. The set of n-grams in a theme represents an underlying topic, but most topic modeling approaches are not able to label these sets of words with a single n-gram. Such labels are useful for topic identification in summarization systems. This paper introduces a novel approach to labeling a group of n-grams comprising an individual topic. The approach taken is to complement the existing topic distributions over words with a known distribution based on a predefined set of topics. This is done by integrating existing labeled knowledge sources representing known potential topics into the probabilistic topic model. These knowledge sources are translated into a distribution and used to set the hyperparameters of the Dirichlet-generated distribution over words. During inference, these modified distributions guide the convergence of the latent topics to conform with the complementary distributions. This approach ensures that the topic inference process is consistent with existing knowledge. The label assignments from the complementary knowledge sources are then transferred to the latent topics of the corpus. The results show both accurate label assignment to topics and improved topic generation compared to that obtained using various labeling approaches based on Latent Dirichlet Allocation (LDA).
Tasks Topic Models
Published 2016-06-02
URL http://arxiv.org/abs/1606.00577v3
PDF http://arxiv.org/pdf/1606.00577v3.pdf
PWC https://paperswithcode.com/paper/source-lda-enhancing-probabilistic-topic
Repo https://github.com/ucla-scai/Source-LDA
Framework none
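
As described in the abstract, the mechanism amounts to building an asymmetric Dirichlet prior over words for each known topic. A numpy sketch under obvious simplifications (uniform base prior, a single strength knob; the function and parameter names are illustrative and not the repository's API):

```python
import numpy as np

def topic_word_priors(sources, vocab, base_beta=0.01, strength=100.0):
    """Turn labeled knowledge-source word distributions into per-topic
    Dirichlet hyperparameters over the corpus vocabulary; during inference,
    latent topics drawn with these priors converge toward the known topics,
    which then lend them their labels.

    sources: dict label -> dict word -> count
    vocab:   list of corpus words
    """
    ix = {w: i for i, w in enumerate(vocab)}
    beta = np.full((len(sources), len(vocab)), base_beta)
    for t, counts in enumerate(sources.values()):
        total = sum(counts.values())
        for w, c in counts.items():
            if w in ix:                         # boost prior mass on source words
                beta[t, ix[w]] += strength * c / total
    return beta
```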

Recursive Diffeomorphism-Based Regression for Shape Functions

Title Recursive Diffeomorphism-Based Regression for Shape Functions
Authors Jieren Xu, Haizhao Yang, Ingrid Daubechies
Abstract This paper proposes a recursive diffeomorphism-based regression method for the one-dimensional generalized mode decomposition problem, which aims at extracting generalized modes $\alpha_k(t)s_k(2\pi N_k\phi_k(t))$ from their superposition $\sum_{k=1}^K \alpha_k(t)s_k(2\pi N_k\phi_k(t))$. First, a one-dimensional synchrosqueezed transform is applied to estimate instantaneous information, e.g., $\alpha_k(t)$ and $N_k\phi_k(t)$. Second, a novel approach based on diffeomorphisms and nonparametric regression is proposed to estimate the wave shape functions $s_k(t)$. These two methods lead to a framework for the generalized mode decomposition problem under a weak well-separation condition. Numerical examples on synthetic and real data demonstrate the fruitful applications of these methods.
Tasks
Published 2016-10-12
URL http://arxiv.org/abs/1610.03819v2
PDF http://arxiv.org/pdf/1610.03819v2.pdf
PWC https://paperswithcode.com/paper/recursive-diffeomorphism-based-regression-for
Repo https://github.com/HaizhaoYang/DeCom
Framework none
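
Once amplitude and phase are estimated, recovering a shape function is a regression over one period of the warped time axis. A numpy sketch using bin averaging as the nonparametric regressor; the paper's estimator, and its recursive refinement over multiple modes, is more careful than this single-mode illustration:

```python
import numpy as np

def estimate_shape(y, alpha, phase, n_bins=64):
    """Recover s from samples y(t) ~ alpha(t) * s(2*pi*phase(t)), given
    estimated amplitude alpha(t) and total phase N*phi(t) (e.g. from a
    synchrosqueezed transform). Returns s on n_bins points over one period."""
    frac = np.mod(phase, 1.0)                          # position inside one period
    bins = np.minimum((frac * n_bins).astype(int), n_bins - 1)
    s = np.zeros(n_bins)
    for b in range(n_bins):
        m = bins == b
        if m.any():
            s[b] = np.mean(y[m] / alpha[m])            # demodulate, then average
    return s
```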

Learning from the memory of Atari 2600

Title Learning from the memory of Atari 2600
Authors Jakub Sygnowski, Henryk Michalewski
Abstract We train a number of neural networks to play the games Bowling, Breakout, and Seaquest using information stored in the memory of an Atari 2600 video game console. We consider four neural network models which differ in size and architecture: two networks which use only information contained in the RAM, and two mixed networks which use both the information in the RAM and information from the screen. As the benchmark we used the convolutional model from the NIPS deep Q-learning paper and obtained comparable results in all considered games. Quite surprisingly, in the case of Seaquest we were able to train RAM-only agents which behave better than the benchmark screen-only agent. Mixing screen and RAM did not improve performance compared to the screen-only and RAM-only agents.
Tasks Atari Games
Published 2016-05-04
URL http://arxiv.org/abs/1605.01335v1
PDF http://arxiv.org/pdf/1605.01335v1.pdf
PWC https://paperswithcode.com/paper/learning-from-the-memory-of-atari-2600
Repo https://github.com/ulstu/robotics_ml
Framework none
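
A RAM-only agent needs no convolutions at all: the whole console state is 128 bytes. A PyTorch sketch of such a Q-network; the two hidden layers of width 128 match the spirit of the paper's smallest RAM model, though the exact configuration here is an assumption:

```python
import torch
import torch.nn as nn

class RamQNetwork(nn.Module):
    """Q-values for each action from the 128-byte Atari 2600 RAM."""
    def __init__(self, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(128, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, ram):   # ram: (N, 128) bytes scaled to [0, 1]
        return self.net(ram)
```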

A Learned Representation For Artistic Style

Title A Learned Representation For Artistic Style
Authors Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur
Abstract The diversity of painting styles represents a rich visual vocabulary for the construction of an image. The degree to which one may learn and parsimoniously capture this visual vocabulary measures our understanding of the higher-level features of paintings, if not of images in general. In this work we investigate the construction of a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings. We demonstrate that such a network generalizes across a diversity of artistic styles by reducing a painting to a point in an embedding space. Importantly, this model permits a user to explore new painting styles by arbitrarily combining the styles learned from individual paintings. We hope that this work provides a useful step towards building rich models of paintings and offers a window onto the structure of the learned representation of artistic style.
Tasks
Published 2016-10-24
URL http://arxiv.org/abs/1610.07629v5
PDF http://arxiv.org/pdf/1610.07629v5.pdf
PWC https://paperswithcode.com/paper/a-learned-representation-for-artistic-style
Repo https://github.com/KushajveerSingh/SPADE-PyTorch
Framework pytorch
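
The mechanism that lets one network serve many paintings is conditional instance normalization: all convolutional weights are shared, and each style owns only a per-channel scale and shift. A PyTorch sketch of that layer:

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm(nn.Module):
    """Instance normalization whose affine parameters are looked up per style;
    new styles can be explored by interpolating the gamma/beta embeddings."""
    def __init__(self, n_styles, n_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(n_channels, affine=False)
        self.gamma = nn.Embedding(n_styles, n_channels)
        self.beta = nn.Embedding(n_styles, n_channels)
        nn.init.ones_(self.gamma.weight)    # start as the identity transform
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, style_id):   # x: (N, C, H, W); style_id: (N,) long tensor
        g = self.gamma(style_id)[:, :, None, None]
        b = self.beta(style_id)[:, :, None, None]
        return g * self.norm(x) + b
```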

SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations

Title SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations
Authors Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, Harish Karnick
Abstract We present a feature vector formation technique for documents - Sparse Composite Document Vector (SCDV) - which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text representation. In SCDV, word embeddings are clustered to capture multiple semantic contexts in which words occur. They are then chained together to form document topic-vectors that can express complex, multi-topic documents. Through extensive experiments on multi-class and multi-label classification tasks, we outperform the previous state-of-the-art method, NTSG (Liu et al., 2015a). We also show that SCDV embeddings perform well on heterogeneous tasks like topic coherence, context-sensitive learning, and information retrieval. Moreover, we achieve a significant reduction in training and prediction times compared to other representation methods. SCDV achieves the best of both worlds: better performance with lower time and space complexity.
Tasks Information Retrieval, Multi-Label Classification
Published 2016-12-20
URL http://arxiv.org/abs/1612.06778v3
PDF http://arxiv.org/pdf/1612.06778v3.pdf
PWC https://paperswithcode.com/paper/scdv-sparse-composite-document-vectors-using
Repo https://github.com/nyk510/scdv-python
Framework none
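
The pipeline is short enough to sketch end to end with numpy and sklearn; the cluster count and the simple hard sparsity threshold below are illustrative stand-ins for the paper's settings:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def scdv_vectors(docs, word_vecs, idf, n_clusters=60, sparsity=0.04):
    """Sparse Composite Document Vectors: soft-cluster word embeddings with a
    GMM, form idf-weighted word-cluster vectors, average them per document,
    then zero out near-zero entries.

    docs:      list of token lists (each assumed to contain known words)
    word_vecs: dict word -> (d,) embedding
    idf:       dict word -> idf weight
    """
    words = list(word_vecs)
    X = np.stack([word_vecs[w] for w in words])
    p = GaussianMixture(n_clusters, random_state=0).fit(X).predict_proba(X)
    # word-cluster vector: concatenation over clusters of p(k|w) * idf(w) * wv(w)
    wcv = {w: (p[i][:, None] * X[i]).ravel() * idf[w] for i, w in enumerate(words)}
    D = np.stack([np.mean([wcv[w] for w in d if w in wcv], axis=0) for d in docs])
    D[np.abs(D) < sparsity * np.abs(D).max()] = 0.0    # make the vectors sparse
    return D
```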