Paper Group AWR 16
Asynchronous Temporal Fields for Action Recognition
Title | Asynchronous Temporal Fields for Action Recognition |
Authors | Gunnar A. Sigurdsson, Santosh Divvala, Ali Farhadi, Abhinav Gupta |
Abstract | Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and necessitates reasoning about the sequence of activities, as well as the higher-level constructs such as intentions. But how do we model and reason about these? We propose a fully-connected temporal CRF model for reasoning over various aspects of activities that includes objects, actions, and intentions, where the potentials are predicted by a deep network. End-to-end training of such structured models is a challenging endeavor: for inference and learning we need to construct mini-batches consisting of whole videos, leading to mini-batches with only a few videos. This causes high correlation between data points, leading to a breakdown of the backprop algorithm. To address this challenge, we present an asynchronous variational inference method that allows efficient end-to-end training. Our method achieves a classification mAP of 22.4% on the Charades benchmark, outperforming the state-of-the-art (17.2% mAP), and offers equal gains on the task of temporal localization. |
Tasks | Action Classification, Temporal Action Localization, Temporal Localization |
Published | 2016-12-19 |
URL | http://arxiv.org/abs/1612.06371v2 |
http://arxiv.org/pdf/1612.06371v2.pdf | |
PWC | https://paperswithcode.com/paper/asynchronous-temporal-fields-for-action |
Repo | https://github.com/gsig/temporal-fields |
Framework | torch |
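
As a rough illustration of the structured-reasoning component, the sketch below runs mean-field inference over a fully connected temporal CRF, assuming per-frame unary scores from a CNN and a single learned pairwise compatibility matrix (157 classes matches Charades). The paper's asynchronous variational training and its object/intent variables are not reproduced; all names here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_temporal_crf(unary, pairwise, n_iters=10):
    """Mean-field over a fully connected temporal CRF (illustrative sketch).

    unary:    (T, C) per-frame class scores from a CNN (assumed given)
    pairwise: (C, C) compatibility between labels at any two frames
    Returns q: (T, C) approximate per-frame marginals.
    """
    q = softmax(unary)
    for _ in range(n_iters):
        # Message to frame t aggregates the beliefs of all other frames.
        total = q.sum(axis=0, keepdims=True)        # (1, C)
        msg = (total - q) @ pairwise                # exclude self-message
        q = softmax(unary + msg)
    return q

q = mean_field_temporal_crf(np.random.randn(50, 157),
                            0.01 * np.random.randn(157, 157))
```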
Iterative Alternating Neural Attention for Machine Reading
Title | Iterative Alternating Neural Attention for Machine Reading |
Authors | Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio |
Abstract | We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document. Unlike previous models, we do not collapse the query into a single vector, instead we deploy an iterative alternating attention mechanism that allows a fine-grained exploration of both the query and the document. Our model outperforms state-of-the-art baselines in standard machine comprehension benchmarks such as CNN news articles and the Children’s Book Test (CBT) dataset. |
Tasks | Question Answering, Reading Comprehension |
Published | 2016-06-07 |
URL | http://arxiv.org/abs/1606.02245v4 |
http://arxiv.org/pdf/1606.02245v4.pdf | |
PWC | https://paperswithcode.com/paper/iterative-alternating-neural-attention-for |
Repo | https://github.com/zyy1659949090/TensorFlow1 |
Framework | tf |
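
A minimal numpy sketch of the inference loop, assuming the query and document token encodings are given (BiGRU outputs in the paper): each iteration takes an attentive glimpse of the query conditioned on the search state, then a glimpse of the document conditioned on the state and the query glimpse, and updates the state. The paper uses a GRU for the state update; a tanh layer stands in here, and all weight shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_q, n_d = 64, 10, 300                      # hidden size, query/doc lengths
query_enc = rng.standard_normal((n_q, d))      # token encodings (assumed given)
doc_enc   = rng.standard_normal((n_d, d))
Wq = rng.standard_normal((d, d)) * 0.1
Wd = rng.standard_normal((d, 2 * d)) * 0.1
U  = rng.standard_normal((d, 3 * d)) * 0.1
state = np.zeros(d)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(8):                             # fixed number of inference steps
    # 1) query glimpse conditioned on the current search state
    q_t = softmax(query_enc @ (Wq @ state)) @ query_enc
    # 2) document glimpse conditioned on the state and the query glimpse
    att_d = softmax(doc_enc @ (Wd @ np.concatenate([state, q_t])))
    d_t = att_d @ doc_enc
    # 3) evolve the state (a GRU in the paper; tanh stand-in here)
    state = np.tanh(U @ np.concatenate([state, q_t, d_t]))
# att_d over candidate-answer positions yields the answer distribution
```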
Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders
Title | Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders |
Authors | Simon Šuster, Ivan Titov, Gertjan van Noord |
Abstract | We present an approach to learning multi-sense word embeddings relying both on monolingual and bilingual information. Our model consists of an encoder, which uses monolingual and bilingual context (i.e. a parallel sentence) to choose a sense for a given word, and a decoder which predicts context words based on the chosen sense. The two components are estimated jointly. We observe that the word representations induced from bilingual data outperform the monolingual counterparts across a range of evaluation tasks, even though crosslingual information is not available at test time. |
Tasks | Word Embeddings |
Published | 2016-03-30 |
URL | http://arxiv.org/abs/1603.09128v1 |
http://arxiv.org/pdf/1603.09128v1.pdf | |
PWC | https://paperswithcode.com/paper/bilingual-learning-of-multi-sense-embeddings |
Repo | https://github.com/rug-compling/bimu |
Framework | none |
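
The encoder/decoder split can be made concrete with a small sketch, assuming pre-computed context embeddings: the encoder makes a discrete sense choice by scoring each sense vector against pooled monolingual and bilingual context, and the decoder scores context words under the chosen sense, skip-gram style. At test time the bilingual term is simply dropped, matching the abstract; these functions are illustrative, not the authors' implementation.

```python
import numpy as np

def encode_sense(senses, mono_ctx, bi_ctx):
    """Pick a sense for a word from its monolingual and bilingual context.

    senses:   (S, d) candidate sense vectors for the word
    mono_ctx: (m, d) embeddings of surrounding source-language words
    bi_ctx:   (b, d) embeddings of words in the aligned parallel sentence
    """
    ctx = mono_ctx.mean(axis=0) + bi_ctx.mean(axis=0)   # pooled context signal
    return int(np.argmax(senses @ ctx))                 # discrete sense choice

def decoder_logprob(sense_vec, out_embed, ctx_word_ids):
    """Skip-gram style decoder: predict context words from the chosen sense."""
    logits = out_embed @ sense_vec                      # (V,) scores
    m = logits.max()
    logp = logits - m - np.log(np.exp(logits - m).sum())  # log-softmax
    return logp[ctx_word_ids].sum()
```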
Superpixels: An Evaluation of the State-of-the-Art
Title | Superpixels: An Evaluation of the State-of-the-Art |
Authors | David Stutz, Alexander Hermans, Bastian Leibe |
Abstract | Superpixels group perceptually similar pixels to create visually meaningful entities while heavily reducing the number of primitives for subsequent processing steps. Owing to these properties, superpixel algorithms have received much attention since their naming in 2003. Today, publicly available superpixel algorithms have become standard tools in low-level vision. As such, and due to their quick adoption in a wide range of applications, appropriate benchmarks are crucial for algorithm selection and comparison. Until now, the rapidly growing number of algorithms as well as varying experimental setups hindered the development of a unifying benchmark. We present a comprehensive evaluation of 28 state-of-the-art superpixel algorithms utilizing a benchmark focusing on fair comparison and designed to provide new insights relevant for applications. To this end, we explicitly discuss parameter optimization and the importance of strictly enforcing connectivity. Furthermore, by extending well-known metrics, we are able to summarize algorithm performance independent of the number of generated superpixels, thereby overcoming a major limitation of available benchmarks. We also discuss runtime, robustness against noise, blur and affine transformations, implementation details, as well as aspects of visual quality. Finally, we present an overall ranking of superpixel algorithms which redefines the state-of-the-art and enables researchers to easily select appropriate algorithms and the corresponding implementations, which themselves are made publicly available as part of our benchmark at davidstutz.de/projects/superpixel-benchmark/. |
Tasks | |
Published | 2016-12-06 |
URL | http://arxiv.org/abs/1612.01601v3 |
http://arxiv.org/pdf/1612.01601v3.pdf | |
PWC | https://paperswithcode.com/paper/superpixels-an-evaluation-of-the-state-of-the |
Repo | https://github.com/davidstutz/superpixel-benchmark |
Framework | none |
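
One point the benchmark stresses, strictly enforcing connectivity, is easy to make concrete: many algorithms can emit segments that fall apart into several connected components, which silently inflates the superpixel count. A minimal sketch follows; the benchmark's actual post-processing also merges tiny fragments into neighbouring segments, which is omitted here.

```python
import numpy as np
from scipy import ndimage

def enforce_connectivity(labels):
    """Relabel a segmentation so every segment is one connected component
    (each fragment of a split segment receives a fresh id).

    labels: (H, W) integer segmentation from any superpixel algorithm.
    """
    out = np.zeros_like(labels)
    next_id = 0
    for sp in np.unique(labels):
        comps, n = ndimage.label(labels == sp)   # 4-connected components
        for c in range(1, n + 1):
            out[comps == c] = next_id
            next_id += 1
    return out

# The two disconnected '1' pixels below become two separate segments.
seg = enforce_connectivity(np.array([[0, 0, 1],
                                     [1, 0, 0],
                                     [0, 0, 0]]))
```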
End to End Learning for Self-Driving Cars
Title | End to End Learning for Self-Driving Cars |
Authors | Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, Karol Zieba |
Abstract | We trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. This end-to-end approach proved surprisingly powerful. With a minimum of training data from humans, the system learns to drive in traffic on local roads with or without lane markings and on highways. It also operates in areas with unclear visual guidance such as parking lots and unpaved roads. The system automatically learns internal representations of the necessary processing steps, such as detecting useful road features, with only the human steering angle as the training signal. We never explicitly trained it to detect, for example, the outline of roads. Compared to explicit decomposition of the problem, such as lane marking detection, path planning, and control, our end-to-end system optimizes all processing steps simultaneously. We argue that this will eventually lead to better performance and smaller systems. Better performance will result because the internal components self-optimize to maximize overall system performance, instead of optimizing human-selected intermediate criteria, e.g., lane detection. Such criteria are understandably selected for ease of human interpretation, which does not automatically guarantee maximum system performance. Smaller networks are possible because the system learns to solve the problem with the minimal number of processing steps. We used an NVIDIA DevBox and Torch 7 for training, and an NVIDIA DRIVE(TM) PX self-driving car computer, also running Torch 7, for determining where to drive. The system operates at 30 frames per second (FPS). |
Tasks | Lane Detection, Self-Driving Cars |
Published | 2016-04-25 |
URL | http://arxiv.org/abs/1604.07316v1 |
http://arxiv.org/pdf/1604.07316v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-learning-for-self-driving-cars |
Repo | https://github.com/PankajKarki/Self-Driving-Car |
Framework | tf |
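
The conv stack described in the paper (five convolutional layers followed by three fully connected layers on a 66x200 YUV input, predicting the inverse turning radius) translates directly into a few lines of PyTorch. This is a sketch from the paper's figure, with the flattened size computed from the stated strides rather than taken from the figure.

```python
import torch
import torch.nn as nn

class PilotNetSketch(nn.Module):
    """Sketch of the paper's CNN: 66x200 YUV input -> steering command."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),   # 1x18 spatial map remains
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),                         # inverse turning radius
        )

    def forward(self, x):                             # x: (N, 3, 66, 200), normalized
        return self.head(self.features(x))

steering = PilotNetSketch()(torch.randn(1, 3, 66, 200))
```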
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Title | SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size |
Authors | Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer |
Abstract | Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet |
Tasks | Model Compression |
Published | 2016-02-24 |
URL | http://arxiv.org/abs/1602.07360v4 |
http://arxiv.org/pdf/1602.07360v4.pdf | |
PWC | https://paperswithcode.com/paper/squeezenet-alexnet-level-accuracy-with-50x |
Repo | https://github.com/x5675602/squeezeNet_keras |
Framework | none |
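
SqueezeNet's parameter savings come from its Fire module: a 1x1 "squeeze" layer feeding parallel 1x1 and 3x3 "expand" layers whose outputs are concatenated. A PyTorch sketch; the fire2 configuration in the usage line follows the paper's architecture table.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet's Fire module: squeeze with 1x1 convs, then expand with
    parallel 1x1 and 3x3 convs and concatenate along channels."""
    def __init__(self, in_ch, squeeze, expand1x1, expand3x3):
        super().__init__()
        self.squeeze = nn.Sequential(nn.Conv2d(in_ch, squeeze, 1), nn.ReLU())
        self.e1 = nn.Sequential(nn.Conv2d(squeeze, expand1x1, 1), nn.ReLU())
        self.e3 = nn.Sequential(nn.Conv2d(squeeze, expand3x3, 3, padding=1),
                                nn.ReLU())

    def forward(self, x):
        s = self.squeeze(x)
        return torch.cat([self.e1(s), self.e3(s)], dim=1)

# fire2 in the paper: 96 in -> squeeze 16 -> expand 64 + 64 = 128 channels out
y = Fire(96, 16, 64, 64)(torch.randn(1, 96, 55, 55))
```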
Dual Learning for Machine Translation
Title | Dual Learning for Machine Translation |
Authors | Yingce Xia, Di He, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma |
Abstract | While neural machine translation (NMT) has made good progress over the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which enables an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop and generate informative feedback signals to train the translation models, even without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and another agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using policy gradient methods). We call the corresponding approach to neural machine translation \emph{dual-NMT}. Experiments show that dual-NMT works very well on English$\leftrightarrow$French translation; in particular, by learning from monolingual data (with 10% bilingual data for warm start), it achieves accuracy comparable to NMT trained on the full bilingual data for the French-to-English translation task. |
Tasks | Language Modelling, Machine Translation, Policy Gradient Methods |
Published | 2016-11-01 |
URL | http://arxiv.org/abs/1611.00179v1 |
http://arxiv.org/pdf/1611.00179v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-learning-for-machine-translation |
Repo | https://github.com/NonameAuPlatal/Dual_Learning |
Framework | none |
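
A heavily condensed sketch of one round of the dual-learning game on an unlabeled English sentence. `primal`, `dual`, and `lm_fr` are assumed interfaces standing in for the two translation models and the French language model; none of these method names come from a real API.

```python
ALPHA = 0.5  # trade-off between language-model reward and reconstruction reward

def dual_learning_step(primal, dual, lm_fr, sentence_en, k=4):
    """One round of the dual game (illustrative; model objects are assumed)."""
    for fr in primal.sample_translations(sentence_en, n=k):
        # Reward 1: is the intermediate French translation fluent?
        r_lm = lm_fr.logprob(fr)
        # Reward 2: can the dual model reconstruct the original sentence?
        r_rec = dual.logprob(src=fr, tgt=sentence_en)
        reward = ALPHA * r_lm + (1 - ALPHA) * r_rec
        # Policy-gradient update on the primal; likelihood update on the dual.
        primal.reinforce(sentence_en, fr, reward)
        dual.maximize_likelihood(src=fr, tgt=sentence_en)
```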
Generating images with recurrent adversarial networks
Title | Generating images with recurrent adversarial networks |
Authors | Daniel Jiwoong Im, Chris Dongjoo Kim, Hui Jiang, Roland Memisevic |
Abstract | Gatys et al. (2015) showed that optimizing pixels to match features in a convolutional network with respect to reference image features is a way to render images of high visual quality. We show that unrolling this gradient-based optimization yields a recurrent computation that creates images by incrementally adding onto a visual “canvas”. We propose a recurrent generative model inspired by this view, and show that it can be trained using adversarial training to generate very good image samples. We also propose a way to quantitatively compare adversarial networks by having the generators and discriminators of these networks compete against each other. |
Tasks | |
Published | 2016-02-16 |
URL | http://arxiv.org/abs/1602.05110v5 |
http://arxiv.org/pdf/1602.05110v5.pdf | |
PWC | https://paperswithcode.com/paper/generating-images-with-recurrent-adversarial |
Repo | https://github.com/jiwoongim/GRAN |
Framework | none |
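
The core idea, unrolling optimization into a recurrent drawing process, fits in a short PyTorch sketch: an encoder reads the current canvas, a decoder proposes a delta from the noise code and that encoding, and the deltas accumulate into the final image. Sizes and the MLP layers are illustrative; the paper uses convolutional encoders and decoders.

```python
import torch
import torch.nn as nn

class GRANSketch(nn.Module):
    """Recurrent generator in the spirit of the paper: repeatedly read the
    canvas, propose a delta, and sum the deltas into the final sample."""
    def __init__(self, z_dim=100, h_dim=256, img_dim=28 * 28, steps=5):
        super().__init__()
        self.img_dim, self.steps = img_dim, steps
        self.enc = nn.Sequential(nn.Linear(img_dim, h_dim), nn.Tanh())
        self.dec = nn.Sequential(nn.Linear(z_dim + h_dim, h_dim), nn.Tanh(),
                                 nn.Linear(h_dim, img_dim))

    def forward(self, z):
        canvas = torch.zeros(z.size(0), self.img_dim)
        for _ in range(self.steps):
            h = self.enc(canvas)                      # read what is drawn so far
            canvas = canvas + self.dec(torch.cat([z, h], dim=1))
        return torch.tanh(canvas)

samples = GRANSketch()(torch.randn(8, 100))
```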
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Title | Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm |
Authors | Qiang Liu, Dilin Wang |
Abstract | We propose a general purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization. Our method iteratively transports a set of particles to match the target distribution, by applying a form of functional gradient descent that minimizes the KL divergence. Empirical studies are performed on various real world models and datasets, on which our method is competitive with existing state-of-the-art methods. The derivation of our method is based on a new theoretical result that connects the derivative of KL divergence under smooth transforms with Stein’s identity and a recently proposed kernelized Stein discrepancy, which is of independent interest. |
Tasks | Bayesian Inference |
Published | 2016-08-16 |
URL | https://arxiv.org/abs/1608.04471v3 |
https://arxiv.org/pdf/1608.04471v3.pdf | |
PWC | https://paperswithcode.com/paper/stein-variational-gradient-descent-a-general |
Repo | https://github.com/feynmanliang/dist-svgd |
Framework | pytorch |
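
The SVGD update itself is a few lines of numpy. Each particle moves along a kernel-weighted average of the other particles' score functions (driving the set toward high density) plus a repulsive kernel-gradient term (keeping the particles spread out); the bandwidth follows the median heuristic used in the authors' reference code.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def svgd_step(x, grad_logp, stepsize=0.1):
    """One SVGD update. x: (n, d) particles; grad_logp(x) -> (n, d) scores."""
    n = x.shape[0]
    sq = squareform(pdist(x)) ** 2
    h = np.median(sq) / np.log(n + 1) + 1e-8         # median-heuristic bandwidth
    K = np.exp(-sq / h)                              # RBF kernel matrix
    # phi(x_i) = mean_j [ k(x_j, x_i) grad_logp(x_j) + grad_{x_j} k(x_j, x_i) ]
    grad_k = (2.0 / h) * (x * K.sum(1, keepdims=True) - K @ x)
    phi = (K @ grad_logp(x) + grad_k) / n
    return x + stepsize * phi

# Example: particles drift toward a standard 2-D Gaussian.
x = np.random.randn(100, 2) * 3 + 5
for _ in range(500):
    x = svgd_step(x, lambda t: -t)                   # score of N(0, I)
```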
Deep Convolution Networks for Compression Artifacts Reduction
Title | Deep Convolution Networks for Compression Artifacts Reduction |
Authors | Ke Yu, Chao Dong, Chen Change Loy, Xiaoou Tang |
Abstract | Lossy compression introduces complex compression artifacts, particularly blocking artifacts, ringing effects and blurring. Existing algorithms either focus on removing blocking artifacts and produce blurred output, or restore sharpened images that are accompanied by ringing effects. Inspired by the success of deep convolutional networks (DCN) on super-resolution, we formulate a compact and efficient network for seamless attenuation of different compression artifacts. To meet the speed requirement of real-world applications, we further accelerate the proposed baseline model by layer decomposition and joint use of large-stride convolutional and deconvolutional layers. This also leads to a more general CNN framework that has a close relationship with the conventional Multi-Layer Perceptron (MLP). Finally, the modified network achieves a speed-up of 7.5 times with almost no performance loss compared to the baseline model. We also demonstrate that a deeper model can be effectively trained with features learned in a shallow network. Following a similar “easy to hard” idea, we systematically investigate three practical transfer settings and show the effectiveness of transfer learning in low-level vision problems. Our method shows performance superior to the state-of-the-art methods both on benchmark datasets and in a real-world use case. |
Tasks | Transfer Learning |
Published | 2016-08-09 |
URL | http://arxiv.org/abs/1608.02778v1 |
http://arxiv.org/pdf/1608.02778v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-convolution-networks-for-compression |
Repo | https://github.com/ankitf/artifact_reduction_jpeg |
Framework | tf |
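
The baseline model is a small feed-forward stack of convolutions. The sketch below follows our reading of the four-layer baseline (feature extraction, feature enhancement, mapping, reconstruction); the filter sizes should be treated as an assumption, and the accelerated variant with large-stride conv/deconv layers is not shown.

```python
import torch
import torch.nn as nn

# Layer roles per the paper; exact kernel/channel sizes are our assumption.
arcnn = nn.Sequential(
    nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(),   # feature extraction
    nn.Conv2d(64, 32, 7, padding=3), nn.ReLU(),  # feature enhancement
    nn.Conv2d(32, 16, 1), nn.ReLU(),             # mapping
    nn.Conv2d(16, 1, 5, padding=2),              # reconstruction
)
restored = arcnn(torch.randn(1, 1, 64, 64))      # e.g. Y channel of a JPEG image
```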
Source-LDA: Enhancing probabilistic topic models using prior knowledge sources
Title | Source-LDA: Enhancing probabilistic topic models using prior knowledge sources |
Authors | Justin Wood, Patrick Tan, Wei Wang, Corey Arnold |
Abstract | A popular approach to topic modeling involves extracting co-occurring n-grams of a corpus into semantic themes. The set of n-grams in a theme represents an underlying topic, but most topic modeling approaches are not able to label these sets of words with a single n-gram. Such labels are useful for topic identification in summarization systems. This paper introduces a novel approach to labeling a group of n-grams comprising an individual topic. The approach taken is to complement the existing topic distributions over words with a known distribution based on a predefined set of topics. This is done by integrating existing labeled knowledge sources representing known potential topics into the probabilistic topic model. These knowledge sources are translated into a distribution and used to set the hyperparameters of the Dirichlet-generated distribution over words. During inference these modified distributions guide the convergence of the latent topics to conform with the complementary distributions. This approach ensures that the topic inference process is consistent with existing knowledge. The label assignments from the complementary knowledge sources are then transferred to the latent topics of the corpus. The results show both accurate label assignment to topics as well as improved topic generation compared with that obtained using various labeling approaches based on Latent Dirichlet allocation (LDA). |
Tasks | Topic Models |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.00577v3 |
http://arxiv.org/pdf/1606.00577v3.pdf | |
PWC | https://paperswithcode.com/paper/source-lda-enhancing-probabilistic-topic |
Repo | https://github.com/ucla-scai/Source-LDA |
Framework | none |
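
The mechanism in the abstract, translating a knowledge source into a distribution that sets the Dirichlet hyperparameters over words, can be sketched as follows. `base` and `strength` are illustrative knobs, not the paper's notation.

```python
import numpy as np

def source_informed_beta(source_counts, vocab_size, base=0.01, strength=100.0):
    """Per-topic Dirichlet hyperparameters over words, built from a labeled
    knowledge source (e.g., a reference document for the topic).

    source_counts: dict word_id -> count in the knowledge-source document
    Returns a (vocab_size,) vector: a symmetric prior plus a scaled copy of
    the source distribution, pulling sampling toward the known topic.
    """
    beta = np.full(vocab_size, base)
    total = sum(source_counts.values())
    for w, c in source_counts.items():
        beta[w] += strength * c / total
    return beta

# In collapsed Gibbs sampling, this beta replaces the symmetric prior for the
# corresponding labeled topic, and that topic inherits the source's label.
beta_k = source_informed_beta({3: 5, 17: 2, 42: 9}, vocab_size=10_000)
```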
Recursive Diffeomorphism-Based Regression for Shape Functions
Title | Recursive Diffeomorphism-Based Regression for Shape Functions |
Authors | Jieren Xu, Haizhao Yang, Ingrid Daubechies |
Abstract | This paper proposes a recursive diffeomorphism based regression method for one-dimensional generalized mode decomposition problem that aims at extracting generalized modes $\alpha_k(t)s_k(2\pi N_k\phi_k(t))$ from their superposition $\sum_{k=1}^K \alpha_k(t)s_k(2\pi N_k\phi_k(t))$. First, a one-dimensional synchrosqueezed transform is applied to estimate instantaneous information, e.g., $\alpha_k(t)$ and $N_k\phi_k(t)$. Second, a novel approach based on diffeomorphisms and nonparametric regression is proposed to estimate wave shape functions $s_k(t)$. These two methods lead to a framework for the generalized mode decomposition problem under a weak well-separation condition. Numerical examples of synthetic and real data are provided to demonstrate the fruitful applications of these methods. |
Tasks | |
Published | 2016-10-12 |
URL | http://arxiv.org/abs/1610.03819v2 |
http://arxiv.org/pdf/1610.03819v2.pdf | |
PWC | https://paperswithcode.com/paper/recursive-diffeomorphism-based-regression-for |
Repo | https://github.com/HaizhaoYang/DeCom |
Framework | none |
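
The regression step can be illustrated with a simplified sketch: given amplitude and phase estimates from the synchrosqueezed transform, warping by the diffeomorphism amounts to folding the phase modulo one period, after which the shape function is recovered by binned (nonparametric) averaging. The paper's recursive peeling of multiple modes and its careful handling of the warp are omitted.

```python
import numpy as np

def estimate_shape(signal, amp, phase, n_bins=128):
    """Nonparametric shape-function estimate after a diffeomorphic warp.

    signal: samples of alpha(t) * s(2*pi*N*phi(t))
    amp:    estimated alpha(t); phase: estimated N*phi(t). Folding the phase
    mod 1 puts every sample on one period of s, where binned averaging
    recovers the shape function.
    """
    folded = np.mod(phase, 1.0)
    normalized = signal / np.maximum(amp, 1e-8)
    bins = np.minimum((folded * n_bins).astype(int), n_bins - 1)
    shape = np.zeros(n_bins)
    for b in range(n_bins):
        mask = bins == b
        shape[b] = normalized[mask].mean() if mask.any() else 0.0
    return shape  # samples of s over one period

# Toy demo: recover a square-wave shape from a warped, amplitude-modulated mode.
t = np.linspace(0, 1, 4000)
phi = t + 0.1 * np.sin(2 * np.pi * t)
sig = (1 + 0.2 * t) * np.sign(np.sin(2 * np.pi * 10 * phi))
s_hat = estimate_shape(sig, 1 + 0.2 * t, 10 * phi)
```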
Learning from the memory of Atari 2600
Title | Learning from the memory of Atari 2600 |
Authors | Jakub Sygnowski, Henryk Michalewski |
Abstract | We train a number of neural networks to play the games Bowling, Breakout, and Seaquest using information stored in the memory of an Atari 2600 video game console. We consider four models of neural networks which differ in size and architecture: two networks which use only information contained in the RAM, and two mixed networks which use both the information in the RAM and information from the screen. As the benchmark we used the convolutional model proposed at NIPS, and obtained comparable results in all games considered. Quite surprisingly, in the case of Seaquest we were able to train RAM-only agents which behave better than the benchmark screen-only agent. Mixing screen and RAM did not lead to improved performance compared to the screen-only and RAM-only agents. |
Tasks | Atari Games |
Published | 2016-05-04 |
URL | http://arxiv.org/abs/1605.01335v1 |
http://arxiv.org/pdf/1605.01335v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-from-the-memory-of-atari-2600 |
Repo | https://github.com/ulstu/robotics_ml |
Framework | none |
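
Since the Atari 2600 exposes exactly 128 bytes of RAM, a RAM-only agent replaces the convolutional stack with a small dense network over a 128-dimensional state. A sketch with illustrative layer widths, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# RAM-only Q-network: the state is a 128-byte vector instead of screen frames.
q_net = nn.Sequential(
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 18),              # Q-values for the full Atari action set
)
ram = torch.randint(0, 256, (1, 128)).float() / 255.0   # normalized RAM bytes
q_values = q_net(ram)
```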
A Learned Representation For Artistic Style
Title | A Learned Representation For Artistic Style |
Authors | Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur |
Abstract | The diversity of painting styles represents a rich visual vocabulary for the construction of an image. The degree to which one may learn and parsimoniously capture this visual vocabulary measures our understanding of the higher level features of paintings, if not images in general. In this work we investigate the construction of a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings. We demonstrate that such a network generalizes across a diversity of artistic styles by reducing a painting to a point in an embedding space. Importantly, this model permits a user to explore new painting styles by arbitrarily combining the styles learned from individual paintings. We hope that this work provides a useful step towards building rich models of paintings and offers a window on to the structure of the learned representation of artistic style. |
Tasks | |
Published | 2016-10-24 |
URL | http://arxiv.org/abs/1610.07629v5 |
http://arxiv.org/pdf/1610.07629v5.pdf | |
PWC | https://paperswithcode.com/paper/a-learned-representation-for-artistic-style |
Repo | https://github.com/KushajveerSingh/SPADE-PyTorch |
Framework | pytorch |
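
The single scalable network rests on conditional instance normalization: all convolutional weights are shared across styles, and each style owns only a (gamma, beta) pair per channel. A PyTorch sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm(nn.Module):
    """One set of conv weights shared by all styles; a learned per-style
    (gamma, beta) pair selects which style the network renders."""
    def __init__(self, channels, n_styles):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.gamma = nn.Embedding(n_styles, channels)
        self.beta = nn.Embedding(n_styles, channels)
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, style_id):
        g = self.gamma(style_id)[:, :, None, None]   # (N, C, 1, 1)
        b = self.beta(style_id)[:, :, None, None]
        return g * self.norm(x) + b

# Blending two styles amounts to interpolating their (gamma, beta) pairs.
cin = ConditionalInstanceNorm(64, n_styles=32)
y = cin(torch.randn(4, 64, 128, 128), torch.tensor([3, 3, 7, 7]))
```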
SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations
Title | SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations |
Authors | Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, Harish Karnick |
Abstract | We present a feature vector formation technique for documents - Sparse Composite Document Vector (SCDV) - which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text representation. In SCDV, word embeddings are clustered to capture multiple semantic contexts in which words occur. They are then chained together to form document topic-vectors that can express complex, multi-topic documents. Through extensive experiments on multi-class and multi-label classification tasks, we outperform the previous state-of-the-art method, NTSG (Liu et al., 2015a). We also show that SCDV embeddings perform well on heterogeneous tasks like Topic Coherence, context-sensitive Learning and Information Retrieval. Moreover, we achieve a significant reduction in training and prediction times compared to other representation methods. SCDV achieves the best of both worlds: better performance with lower time and space complexity. |
Tasks | Information Retrieval, Multi-Label Classification |
Published | 2016-12-20 |
URL | http://arxiv.org/abs/1612.06778v3 |
http://arxiv.org/pdf/1612.06778v3.pdf | |
PWC | https://paperswithcode.com/paper/scdv-sparse-composite-document-vectors-using |
Repo | https://github.com/nyk510/scdv-python |
Framework | none |
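
The SCDV pipeline can be condensed into a short sketch: soft-cluster word vectors with a Gaussian mixture, build idf-weighted word-topic vectors by concatenating probability-scaled copies of each word vector, average them per document, and hard-threshold small entries for sparsity. The thresholding rule here is a per-document simplification of the paper's corpus-level rule.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def scdv_vectors(docs, word_vecs, idf, k=60, sparsity=0.04):
    """Sketch of SCDV document vectors.

    docs:      list of lists of word ids
    word_vecs: (V, d) word embeddings; idf: (V,) idf weights
    """
    gmm = GaussianMixture(n_components=k).fit(word_vecs)
    p = gmm.predict_proba(word_vecs)                  # (V, k) soft assignments
    # Word-topic vector: concat p(c|w) * w_vec over clusters, scaled by idf.
    wtv = (p[:, :, None] * word_vecs[:, None, :]).reshape(len(word_vecs), -1)
    wtv *= idf[:, None]
    doc_vecs = np.stack([wtv[d].mean(axis=0) for d in docs])
    # Hard-threshold small entries to make the composite vectors sparse.
    t = sparsity * np.abs(doc_vecs).max(axis=1, keepdims=True)
    doc_vecs[np.abs(doc_vecs) < t] = 0.0
    return doc_vecs

rng = np.random.default_rng(0)
vecs = rng.standard_normal((500, 50))
docs = [list(rng.integers(0, 500, size=30)) for _ in range(10)]
X = scdv_vectors(docs, vecs, idf=np.ones(500))
```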