Paper Group AWR 16
Asynchronous Temporal Fields for Action Recognition
Title | Asynchronous Temporal Fields for Action Recognition |
Authors | Gunnar A. Sigurdsson, Santosh Divvala, Ali Farhadi, Abhinav Gupta |
Abstract | Actions are more than just movements and trajectories: we cook to eat and we hold a cup to drink from it. A thorough understanding of videos requires going beyond appearance modeling and necessitates reasoning about the sequence of activities, as well as the higher-level constructs such as intentions. But how do we model and reason about these? We propose a fully-connected temporal CRF model for reasoning over various aspects of activities that includes objects, actions, and intentions, where the potentials are predicted by a deep network. End-to-end training of such structured models is a challenging endeavor: for inference and learning we need to construct mini-batches consisting of whole videos, leading to mini-batches with only a few videos. This causes high correlation between data points, leading to a breakdown of the backprop algorithm. To address this challenge, we present an asynchronous variational inference method that allows efficient end-to-end training. Our method achieves a classification mAP of 22.4% on the Charades benchmark, outperforming the state-of-the-art (17.2% mAP), and offers equal gains on the task of temporal localization. |
Tasks | Action Classification, Temporal Action Localization, Temporal Localization |
Published | 2016-12-19 |
URL | http://arxiv.org/abs/1612.06371v2 |
http://arxiv.org/pdf/1612.06371v2.pdf | |
PWC | https://paperswithcode.com/paper/asynchronous-temporal-fields-for-action |
Repo | https://github.com/gsig/temporal-fields |
Framework | torch |
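
As a rough illustration of the structured-reasoning component, the sketch below runs mean-field inference over a fully connected temporal CRF, assuming per-frame unary scores from a CNN and a single learned pairwise compatibility matrix (157 classes matches Charades). The paper's asynchronous variational training and its object/intent variables are not reproduced; all names here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_temporal_crf(unary, pairwise, n_iters=10):
    """Mean-field over a fully connected temporal CRF (illustrative sketch).

    unary:    (T, C) per-frame class scores from a CNN (assumed given)
    pairwise: (C, C) compatibility between labels at any two frames
    Returns q: (T, C) approximate per-frame marginals.
    """
    q = softmax(unary)
    for _ in range(n_iters):
        # Message to frame t aggregates the beliefs of all other frames.
        total = q.sum(axis=0, keepdims=True)        # (1, C)
        msg = (total - q) @ pairwise                # exclude self-message
        q = softmax(unary + msg)
    return q

q = mean_field_temporal_crf(np.random.randn(50, 157),
                            0.01 * np.random.randn(157, 157))
```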
Iterative Alternating Neural Attention for Machine Reading
Title | Iterative Alternating Neural Attention for Machine Reading |
Authors | Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio |
Abstract | We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document. Unlike previous models, we do not collapse the query into a single vector, instead we deploy an iterative alternating attention mechanism that allows a fine-grained exploration of both the query and the document. Our model outperforms state-of-the-art baselines in standard machine comprehension benchmarks such as CNN news articles and the Children’s Book Test (CBT) dataset. |
Tasks | Question Answering, Reading Comprehension |
Published | 2016-06-07 |
URL | http://arxiv.org/abs/1606.02245v4 |
http://arxiv.org/pdf/1606.02245v4.pdf | |
PWC | https://paperswithcode.com/paper/iterative-alternating-neural-attention-for |
Repo | https://github.com/zyy1659949090/TensorFlow1 |
Framework | tf |
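
A minimal numpy sketch of the inference loop, assuming the query and document token encodings are given (BiGRU outputs in the paper): each iteration takes an attentive glimpse of the query conditioned on the search state, then a glimpse of the document conditioned on the state and the query glimpse, and updates the state. The paper uses a GRU for the state update; a tanh layer stands in here, and all weight shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_q, n_d = 64, 10, 300                      # hidden size, query/doc lengths
query_enc = rng.standard_normal((n_q, d))      # token encodings (assumed given)
doc_enc   = rng.standard_normal((n_d, d))
Wq = rng.standard_normal((d, d)) * 0.1
Wd = rng.standard_normal((d, 2 * d)) * 0.1
U  = rng.standard_normal((d, 3 * d)) * 0.1
state = np.zeros(d)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(8):                             # fixed number of inference steps
    # 1) query glimpse conditioned on the current search state
    q_t = softmax(query_enc @ (Wq @ state)) @ query_enc
    # 2) document glimpse conditioned on the state and the query glimpse
    att_d = softmax(doc_enc @ (Wd @ np.concatenate([state, q_t])))
    d_t = att_d @ doc_enc
    # 3) evolve the state (a GRU in the paper; tanh stand-in here)
    state = np.tanh(U @ np.concatenate([state, q_t, d_t]))
# att_d over candidate-answer positions yields the answer distribution
```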
Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders
Title | Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders |
Authors | Simon Šuster, Ivan Titov, Gertjan van Noord |
Abstract | We present an approach to learning multi-sense word embeddings relying both on monolingual and bilingual information. Our model consists of an encoder, which uses monolingual and bilingual context (i.e. a parallel sentence) to choose a sense for a given word, and a decoder which predicts context words based on the chosen sense. The two components are estimated jointly. We observe that the word representations induced from bilingual data outperform the monolingual counterparts across a range of evaluation tasks, even though crosslingual information is not available at test time. |
Tasks | Word Embeddings |
Published | 2016-03-30 |
URL | http://arxiv.org/abs/1603.09128v1 |
http://arxiv.org/pdf/1603.09128v1.pdf | |
PWC | https://paperswithcode.com/paper/bilingual-learning-of-multi-sense-embeddings |
Repo | https://github.com/rug-compling/bimu |
Framework | none |
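
The encoder/decoder split can be made concrete with a small sketch, assuming pre-computed context embeddings: the encoder makes a discrete sense choice by scoring each sense vector against pooled monolingual and bilingual context, and the decoder scores context words under the chosen sense, skip-gram style. At test time the bilingual term is simply dropped, matching the abstract; these functions are illustrative, not the authors' implementation.

```python
import numpy as np

def encode_sense(senses, mono_ctx, bi_ctx):
    """Pick a sense for a word from its monolingual and bilingual context.

    senses:   (S, d) candidate sense vectors for the word
    mono_ctx: (m, d) embeddings of surrounding source-language words
    bi_ctx:   (b, d) embeddings of words in the aligned parallel sentence
    """
    ctx = mono_ctx.mean(axis=0) + bi_ctx.mean(axis=0)   # pooled context signal
    return int(np.argmax(senses @ ctx))                 # discrete sense choice

def decoder_logprob(sense_vec, out_embed, ctx_word_ids):
    """Skip-gram style decoder: predict context words from the chosen sense."""
    logits = out_embed @ sense_vec                      # (V,) scores
    m = logits.max()
    logp = logits - m - np.log(np.exp(logits - m).sum())  # log-softmax
    return logp[ctx_word_ids].sum()
```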
Superpixels: An Evaluation of the State-of-the-Art
Title | Superpixels: An Evaluation of the State-of-the-Art |
Authors | David Stutz, Alexander Hermans, Bastian Leibe |
Abstract | Superpixels group perceptually similar pixels to create visually meaningful entities while heavily reducing the number of primitives for subsequent processing steps. Owing to these properties, superpixel algorithms have received much attention since their naming in 2003. Today, publicly available superpixel algorithms have become standard tools in low-level vision. As such, and due to their quick adoption in a wide range of applications, appropriate benchmarks are crucial for algorithm selection and comparison. Until now, the rapidly growing number of algorithms as well as varying experimental setups hindered the development of a unifying benchmark. We present a comprehensive evaluation of 28 state-of-the-art superpixel algorithms utilizing a benchmark focusing on fair comparison and designed to provide new insights relevant for applications. To this end, we explicitly discuss parameter optimization and the importance of strictly enforcing connectivity. Furthermore, by extending well-known metrics, we are able to summarize algorithm performance independent of the number of generated superpixels, thereby overcoming a major limitation of available benchmarks. We also discuss runtime, robustness against noise, blur and affine transformations, implementation details, as well as aspects of visual quality. Finally, we present an overall ranking of superpixel algorithms which redefines the state-of-the-art and enables researchers to easily select appropriate algorithms and the corresponding implementations, which themselves are made publicly available as part of our benchmark at davidstutz.de/projects/superpixel-benchmark/. |
Tasks | |
Published | 2016-12-06 |
URL | http://arxiv.org/abs/1612.01601v3 |
http://arxiv.org/pdf/1612.01601v3.pdf | |
PWC | https://paperswithcode.com/paper/superpixels-an-evaluation-of-the-state-of-the |
Repo | https://github.com/davidstutz/superpixel-benchmark |
Framework | none |
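
One point the benchmark stresses, strictly enforcing connectivity, is easy to make concrete: many algorithms can emit segments that fall apart into several connected components, which silently inflates the superpixel count. A minimal sketch follows; the benchmark's actual post-processing also merges tiny fragments into neighbouring segments, which is omitted here.

```python
import numpy as np
from scipy import ndimage

def enforce_connectivity(labels):
    """Relabel a segmentation so every segment is one connected component
    (each fragment of a split segment receives a fresh id).

    labels: (H, W) integer segmentation from any superpixel algorithm.
    """
    out = np.zeros_like(labels)
    next_id = 0
    for sp in np.unique(labels):
        comps, n = ndimage.label(labels == sp)   # 4-connected components
        for c in range(1, n + 1):
            out[comps == c] = next_id
            next_id += 1
    return out

# The two disconnected '1' pixels below become two separate segments.
seg = enforce_connectivity(np.array([[0, 0, 1],
                                     [1, 0, 0],
                                     [0, 0, 0]]))
```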
End to End Learning for Self-Driving Cars
Title | End to End Learning for Self-Driving Cars |
Authors | Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D. Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, Xin Zhang, Jake Zhao, Karol Zieba |
Abstract | We trained a convolutional neural network (CNN) to map raw pixels from a single front-facing camera directly to steering commands. This end-to-end approach proved surprisingly powerful. With a minimum of training data from humans, the system learns to drive in traffic on local roads with or without lane markings and on highways. It also operates in areas with unclear visual guidance such as parking lots and unpaved roads. The system automatically learns internal representations of the necessary processing steps, such as detecting useful road features, with only the human steering angle as the training signal. We never explicitly trained it to detect, for example, the outline of roads. Compared to explicit decomposition of the problem, such as lane marking detection, path planning, and control, our end-to-end system optimizes all processing steps simultaneously. We argue that this will eventually lead to better performance and smaller systems. Better performance will result because the internal components self-optimize to maximize overall system performance, instead of optimizing human-selected intermediate criteria, e.g., lane detection. Such criteria are understandably selected for ease of human interpretation, which does not automatically guarantee maximum system performance. Smaller networks are possible because the system learns to solve the problem with the minimal number of processing steps. We used an NVIDIA DevBox and Torch 7 for training, and an NVIDIA DRIVE(TM) PX self-driving car computer, also running Torch 7, for determining where to drive. The system operates at 30 frames per second (FPS). |
Tasks | Lane Detection, Self-Driving Cars |
Published | 2016-04-25 |
URL | http://arxiv.org/abs/1604.07316v1 |
http://arxiv.org/pdf/1604.07316v1.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-learning-for-self-driving-cars |
Repo | https://github.com/PankajKarki/Self-Driving-Car |
Framework | tf |
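
The conv stack described in the paper (five convolutional layers followed by three fully connected layers on a 66x200 YUV input, predicting the inverse turning radius) translates directly into a few lines of PyTorch. This is a sketch from the paper's figure, with the flattened size computed from the stated strides rather than taken from the figure.

```python
import torch
import torch.nn as nn

class PilotNetSketch(nn.Module):
    """Sketch of the paper's CNN: 66x200 YUV input -> steering command."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 1 * 18, 100), nn.ReLU(),   # 1x18 spatial map remains
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),                         # inverse turning radius
        )

    def forward(self, x):                             # x: (N, 3, 66, 200), normalized
        return self.head(self.features(x))

steering = PilotNetSketch()(torch.randn(1, 3, 66, 200))
```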
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size
Title | SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size |
Authors | Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer |
Abstract | Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet |
Tasks | Model Compression |
Published | 2016-02-24 |
URL | http://arxiv.org/abs/1602.07360v4 |
http://arxiv.org/pdf/1602.07360v4.pdf | |
PWC | https://paperswithcode.com/paper/squeezenet-alexnet-level-accuracy-with-50x |
Repo | https://github.com/x5675602/squeezeNet_keras |
Framework | none |
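
SqueezeNet's parameter savings come from its Fire module: a 1x1 "squeeze" layer feeding parallel 1x1 and 3x3 "expand" layers whose outputs are concatenated. A PyTorch sketch; the fire2 configuration in the usage line follows the paper's architecture table.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet's Fire module: squeeze with 1x1 convs, then expand with
    parallel 1x1 and 3x3 convs and concatenate along channels."""
    def __init__(self, in_ch, squeeze, expand1x1, expand3x3):
        super().__init__()
        self.squeeze = nn.Sequential(nn.Conv2d(in_ch, squeeze, 1), nn.ReLU())
        self.e1 = nn.Sequential(nn.Conv2d(squeeze, expand1x1, 1), nn.ReLU())
        self.e3 = nn.Sequential(nn.Conv2d(squeeze, expand3x3, 3, padding=1),
                                nn.ReLU())

    def forward(self, x):
        s = self.squeeze(x)
        return torch.cat([self.e1(s), self.e3(s)], dim=1)

# fire2 in the paper: 96 in -> squeeze 16 -> expand 64 + 64 = 128 channels out
y = Fire(96, 16, 64, 64)(torch.randn(1, 96, 55, 55))
```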
Dual Learning for Machine Translation
Title | Dual Learning for Machine Translation |
Authors | Yingce Xia, Di He, Tao Qin, Liwei Wang, Nenghai Yu, Tie-Yan Liu, Wei-Ying Ma |
Abstract | While neural machine translation (NMT) has made good progress over the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which enables an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop and generate informative feedback signals to train the translation models, even without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and another agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using policy gradient methods). We call the corresponding approach to neural machine translation \emph{dual-NMT}. Experiments show that dual-NMT works very well on English$\leftrightarrow$French translation; in particular, by learning from monolingual data (with 10% bilingual data for warm start), it achieves accuracy comparable to NMT trained on the full bilingual data for the French-to-English translation task. |
Tasks | Language Modelling, Machine Translation, Policy Gradient Methods |
Published | 2016-11-01 |
URL | http://arxiv.org/abs/1611.00179v1 |
http://arxiv.org/pdf/1611.00179v1.pdf | |
PWC | https://paperswithcode.com/paper/dual-learning-for-machine-translation |
Repo | https://github.com/NonameAuPlatal/Dual_Learning |
Framework | none |
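
A heavily condensed sketch of one round of the dual-learning game on an unlabeled English sentence. `primal`, `dual`, and `lm_fr` are assumed interfaces standing in for the two translation models and the French language model; none of these method names come from a real API.

```python
ALPHA = 0.5  # trade-off between language-model reward and reconstruction reward

def dual_learning_step(primal, dual, lm_fr, sentence_en, k=4):
    """One round of the dual game (illustrative; model objects are assumed)."""
    for fr in primal.sample_translations(sentence_en, n=k):
        # Reward 1: is the intermediate French translation fluent?
        r_lm = lm_fr.logprob(fr)
        # Reward 2: can the dual model reconstruct the original sentence?
        r_rec = dual.logprob(src=fr, tgt=sentence_en)
        reward = ALPHA * r_lm + (1 - ALPHA) * r_rec
        # Policy-gradient update on the primal; likelihood update on the dual.
        primal.reinforce(sentence_en, fr, reward)
        dual.maximize_likelihood(src=fr, tgt=sentence_en)
```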
Generating images with recurrent adversarial networks
Title | Generating images with recurrent adversarial networks |
Authors | Daniel Jiwoong Im, Chris Dongjoo Kim, Hui Jiang, Roland Memisevic |
Abstract | Gatys et al. (2015) showed that optimizing pixels to match features in a convolutional network with respect to reference image features is a way to render images of high visual quality. We show that unrolling this gradient-based optimization yields a recurrent computation that creates images by incrementally adding onto a visual “canvas”. We propose a recurrent generative model inspired by this view, and show that it can be trained using adversarial training to generate very good image samples. We also propose a way to quantitatively compare adversarial networks by having the generators and discriminators of these networks compete against each other. |
Tasks | |
Published | 2016-02-16 |
URL | http://arxiv.org/abs/1602.05110v5 |
http://arxiv.org/pdf/1602.05110v5.pdf | |
PWC | https://paperswithcode.com/paper/generating-images-with-recurrent-adversarial |
Repo | https://github.com/jiwoongim/GRAN |
Framework | none |
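
The core idea, unrolling optimization into a recurrent drawing process, fits in a short PyTorch sketch: an encoder reads the current canvas, a decoder proposes a delta from the noise code and that encoding, and the deltas accumulate into the final image. Sizes and the MLP layers are illustrative; the paper uses convolutional encoders and decoders.

```python
import torch
import torch.nn as nn

class GRANSketch(nn.Module):
    """Recurrent generator in the spirit of the paper: repeatedly read the
    canvas, propose a delta, and sum the deltas into the final sample."""
    def __init__(self, z_dim=100, h_dim=256, img_dim=28 * 28, steps=5):
        super().__init__()
        self.img_dim, self.steps = img_dim, steps
        self.enc = nn.Sequential(nn.Linear(img_dim, h_dim), nn.Tanh())
        self.dec = nn.Sequential(nn.Linear(z_dim + h_dim, h_dim), nn.Tanh(),
                                 nn.Linear(h_dim, img_dim))

    def forward(self, z):
        canvas = torch.zeros(z.size(0), self.img_dim)
        for _ in range(self.steps):
            h = self.enc(canvas)                      # read what is drawn so far
            canvas = canvas + self.dec(torch.cat([z, h], dim=1))
        return torch.tanh(canvas)

samples = GRANSketch()(torch.randn(8, 100))
```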
Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm
Title | Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm |
Authors | Qiang Liu, Dilin Wang |
Abstract | We propose a general purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization. Our method iteratively transports a set of particles to match the target distribution, by applying a form of functional gradient descent that minimizes the KL divergence. Empirical studies are performed on various real world models and datasets, on which our method is competitive with existing state-of-the-art methods. The derivation of our method is based on a new theoretical result that connects the derivative of KL divergence under smooth transforms with Stein’s identity and a recently proposed kernelized Stein discrepancy, which is of independent interest. |
Tasks | Bayesian Inference |
Published | 2016-08-16 |
URL | https://arxiv.org/abs/1608.04471v3 |
https://arxiv.org/pdf/1608.04471v3.pdf | |
PWC | https://paperswithcode.com/paper/stein-variational-gradient-descent-a-general |
Repo | https://github.com/feynmanliang/dist-svgd |
Framework | pytorch |
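
The SVGD update itself is a few lines of numpy. Each particle moves along a kernel-weighted average of the other particles' score functions (driving the set toward high density) plus a repulsive kernel-gradient term (keeping the particles spread out); the bandwidth follows the median heuristic used in the authors' reference code.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def svgd_step(x, grad_logp, stepsize=0.1):
    """One SVGD update. x: (n, d) particles; grad_logp(x) -> (n, d) scores."""
    n = x.shape[0]
    sq = squareform(pdist(x)) ** 2
    h = np.median(sq) / np.log(n + 1) + 1e-8         # median-heuristic bandwidth
    K = np.exp(-sq / h)                              # RBF kernel matrix
    # phi(x_i) = mean_j [ k(x_j, x_i) grad_logp(x_j) + grad_{x_j} k(x_j, x_i) ]
    grad_k = (2.0 / h) * (x * K.sum(1, keepdims=True) - K @ x)
    phi = (K @ grad_logp(x) + grad_k) / n
    return x + stepsize * phi

# Example: particles drift toward a standard 2-D Gaussian.
x = np.random.randn(100, 2) * 3 + 5
for _ in range(500):
    x = svgd_step(x, lambda t: -t)                   # score of N(0, I)
```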
Deep Convolution Networks for Compression Artifacts Reduction
Title | Deep Convolution Networks for Compression Artifacts Reduction |
Authors | Ke Yu, Chao Dong, Chen Change Loy, Xiaoou Tang |
Abstract | Lossy compression introduces complex compression artifacts, particularly blocking artifacts, ringing effects and blurring. Existing algorithms either focus on removing blocking artifacts and produce blurred output, or restore sharpened images that are accompanied by ringing effects. Inspired by the success of deep convolutional networks (DCN) on super-resolution, we formulate a compact and efficient network for seamless attenuation of different compression artifacts. To meet the speed requirement of real-world applications, we further accelerate the proposed baseline model by layer decomposition and joint use of large-stride convolutional and deconvolutional layers. This also leads to a more general CNN framework that has a close relationship with the conventional Multi-Layer Perceptron (MLP). Finally, the modified network achieves a speed-up of 7.5 times with almost no performance loss compared to the baseline model. We also demonstrate that a deeper model can be effectively trained with features learned in a shallow network. Following a similar “easy to hard” idea, we systematically investigate three practical transfer settings and show the effectiveness of transfer learning in low-level vision problems. Our method shows performance superior to the state-of-the-art methods both on benchmark datasets and in a real-world use case. |
Tasks | Transfer Learning |
Published | 2016-08-09 |
URL | http://arxiv.org/abs/1608.02778v1 |
http://arxiv.org/pdf/1608.02778v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-convolution-networks-for-compression |
Repo | https://github.com/ankitf/artifact_reduction_jpeg |
Framework | tf |
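
The baseline model is a small feed-forward stack of convolutions. The sketch below follows our reading of the four-layer baseline (feature extraction, feature enhancement, mapping, reconstruction); the filter sizes should be treated as an assumption, and the accelerated variant with large-stride conv/deconv layers is not shown.

```python
import torch
import torch.nn as nn

# Layer roles per the paper; exact kernel/channel sizes are our assumption.
arcnn = nn.Sequential(
    nn.Conv2d(1, 64, 9, padding=4), nn.ReLU(),   # feature extraction
    nn.Conv2d(64, 32, 7, padding=3), nn.ReLU(),  # feature enhancement
    nn.Conv2d(32, 16, 1), nn.ReLU(),             # mapping
    nn.Conv2d(16, 1, 5, padding=2),              # reconstruction
)
restored = arcnn(torch.randn(1, 1, 64, 64))      # e.g. Y channel of a JPEG image
```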
Source-LDA: Enhancing probabilistic topic models using prior knowledge sources
Title | Source-LDA: Enhancing probabilistic topic models using prior knowledge sources |
Authors | Justin Wood, Patrick Tan, Wei Wang, Corey Arnold |
Abstract | A popular approach to topic modeling involves extracting co-occurring n-grams of a corpus into semantic themes. The set of n-grams in a theme represents an underlying topic, but most topic modeling approaches are not able to label these sets of words with a single n-gram. Such labels are useful for topic identification in summarization systems. This paper introduces a novel approach to labeling a group of n-grams comprising an individual topic. The approach taken is to complement the existing topic distributions over words with a known distribution based on a predefined set of topics. This is done by integrating existing labeled knowledge sources representing known potential topics into the probabilistic topic model. These knowledge sources are translated into a distribution and used to set the hyperparameters of the Dirichlet-generated distribution over words. During inference these modified distributions guide the convergence of the latent topics to conform with the complementary distributions. This approach ensures that the topic inference process is consistent with existing knowledge. The label assignments from the complementary knowledge sources are then transferred to the latent topics of the corpus. The results show both accurate label assignment to topics as well as improved topic generation compared with that obtained using various labeling approaches based on Latent Dirichlet allocation (LDA). |
Tasks | Topic Models |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.00577v3 |
http://arxiv.org/pdf/1606.00577v3.pdf | |
PWC | https://paperswithcode.com/paper/source-lda-enhancing-probabilistic-topic |
Repo | https://github.com/ucla-scai/Source-LDA |
Framework | none |
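
The mechanism in the abstract, translating a knowledge source into a distribution that sets the Dirichlet hyperparameters over words, can be sketched as follows. `base` and `strength` are illustrative knobs, not the paper's notation.

```python
import numpy as np

def source_informed_beta(source_counts, vocab_size, base=0.01, strength=100.0):
    """Per-topic Dirichlet hyperparameters over words, built from a labeled
    knowledge source (e.g., a reference document for the topic).

    source_counts: dict word_id -> count in the knowledge-source document
    Returns a (vocab_size,) vector: a symmetric prior plus a scaled copy of
    the source distribution, pulling sampling toward the known topic.
    """
    beta = np.full(vocab_size, base)
    total = sum(source_counts.values())
    for w, c in source_counts.items():
        beta[w] += strength * c / total
    return beta

# In collapsed Gibbs sampling, this beta replaces the symmetric prior for the
# corresponding labeled topic, and that topic inherits the source's label.
beta_k = source_informed_beta({3: 5, 17: 2, 42: 9}, vocab_size=10_000)
```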
Recursive Diffeomorphism-Based Regression for Shape Functions
Title | Recursive Diffeomorphism-Based Regression for Shape Functions |
Authors | Jieren Xu, Haizhao Yang, Ingrid Daubechies |
Abstract | This paper proposes a recursive diffeomorphism based regression method for one-dimensional generalized mode decomposition problem that aims at extracting generalized modes $\alpha_k(t)s_k(2\pi N_k\phi_k(t))$ from their superposition $\sum_{k=1}^K \alpha_k(t)s_k(2\pi N_k\phi_k(t))$. First, a one-dimensional synchrosqueezed transform is applied to estimate instantaneous information, e.g., $\alpha_k(t)$ and $N_k\phi_k(t)$. Second, a novel approach based on diffeomorphisms and nonparametric regression is proposed to estimate wave shape functions $s_k(t)$. These two methods lead to a framework for the generalized mode decomposition problem under a weak well-separation condition. Numerical examples of synthetic and real data are provided to demonstrate the fruitful applications of these methods. |
Tasks | |
Published | 2016-10-12 |
URL | http://arxiv.org/abs/1610.03819v2 |
http://arxiv.org/pdf/1610.03819v2.pdf | |
PWC | https://paperswithcode.com/paper/recursive-diffeomorphism-based-regression-for |
Repo | https://github.com/HaizhaoYang/DeCom |
Framework | none |
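
The regression step can be illustrated with a simplified sketch: given amplitude and phase estimates from the synchrosqueezed transform, warping by the diffeomorphism amounts to folding the phase modulo one period, after which the shape function is recovered by binned (nonparametric) averaging. The paper's recursive peeling of multiple modes and its careful handling of the warp are omitted.

```python
import numpy as np

def estimate_shape(signal, amp, phase, n_bins=128):
    """Nonparametric shape-function estimate after a diffeomorphic warp.

    signal: samples of alpha(t) * s(2*pi*N*phi(t))
    amp:    estimated alpha(t); phase: estimated N*phi(t). Folding the phase
    mod 1 puts every sample on one period of s, where binned averaging
    recovers the shape function.
    """
    folded = np.mod(phase, 1.0)
    normalized = signal / np.maximum(amp, 1e-8)
    bins = np.minimum((folded * n_bins).astype(int), n_bins - 1)
    shape = np.zeros(n_bins)
    for b in range(n_bins):
        mask = bins == b
        shape[b] = normalized[mask].mean() if mask.any() else 0.0
    return shape  # samples of s over one period

# Toy demo: recover a square-wave shape from a warped, amplitude-modulated mode.
t = np.linspace(0, 1, 4000)
phi = t + 0.1 * np.sin(2 * np.pi * t)
sig = (1 + 0.2 * t) * np.sign(np.sin(2 * np.pi * 10 * phi))
s_hat = estimate_shape(sig, 1 + 0.2 * t, 10 * phi)
```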
Learning from the memory of Atari 2600
Title | Learning from the memory of Atari 2600 |
Authors | Jakub Sygnowski, Henryk Michalewski |
Abstract | We train a number of neural networks to play the games Bowling, Breakout, and Seaquest using information stored in the memory of an Atari 2600 video game console. We consider four models of neural networks which differ in size and architecture: two networks which use only information contained in the RAM, and two mixed networks which use both the information in the RAM and information from the screen. As the benchmark we used the convolutional model proposed at NIPS, and obtained comparable results in all games considered. Quite surprisingly, in the case of Seaquest we were able to train RAM-only agents which behave better than the benchmark screen-only agent. Mixing screen and RAM did not lead to improved performance compared to the screen-only and RAM-only agents. |
Tasks | Atari Games |
Published | 2016-05-04 |
URL | http://arxiv.org/abs/1605.01335v1 |
http://arxiv.org/pdf/1605.01335v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-from-the-memory-of-atari-2600 |
Repo | https://github.com/ulstu/robotics_ml |
Framework | none |
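
Since the Atari 2600 exposes exactly 128 bytes of RAM, a RAM-only agent replaces the convolutional stack with a small dense network over a 128-dimensional state. A sketch with illustrative layer widths, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# RAM-only Q-network: the state is a 128-byte vector instead of screen frames.
q_net = nn.Sequential(
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 18),              # Q-values for the full Atari action set
)
ram = torch.randint(0, 256, (1, 128)).float() / 255.0   # normalized RAM bytes
q_values = q_net(ram)
```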
A Learned Representation For Artistic Style
Title | A Learned Representation For Artistic Style |
Authors | Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur |
Abstract | The diversity of painting styles represents a rich visual vocabulary for the construction of an image. The degree to which one may learn and parsimoniously capture this visual vocabulary measures our understanding of the higher level features of paintings, if not images in general. In this work we investigate the construction of a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings. We demonstrate that such a network generalizes across a diversity of artistic styles by reducing a painting to a point in an embedding space. Importantly, this model permits a user to explore new painting styles by arbitrarily combining the styles learned from individual paintings. We hope that this work provides a useful step towards building rich models of paintings and offers a window on to the structure of the learned representation of artistic style. |
Tasks | |
Published | 2016-10-24 |
URL | http://arxiv.org/abs/1610.07629v5 |
http://arxiv.org/pdf/1610.07629v5.pdf | |
PWC | https://paperswithcode.com/paper/a-learned-representation-for-artistic-style |
Repo | https://github.com/KushajveerSingh/SPADE-PyTorch |
Framework | pytorch |
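
The single scalable network rests on conditional instance normalization: all convolutional weights are shared across styles, and each style owns only a (gamma, beta) pair per channel. A PyTorch sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

class ConditionalInstanceNorm(nn.Module):
    """One set of conv weights shared by all styles; a learned per-style
    (gamma, beta) pair selects which style the network renders."""
    def __init__(self, channels, n_styles):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.gamma = nn.Embedding(n_styles, channels)
        self.beta = nn.Embedding(n_styles, channels)
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, x, style_id):
        g = self.gamma(style_id)[:, :, None, None]   # (N, C, 1, 1)
        b = self.beta(style_id)[:, :, None, None]
        return g * self.norm(x) + b

# Blending two styles amounts to interpolating their (gamma, beta) pairs.
cin = ConditionalInstanceNorm(64, n_styles=32)
y = cin(torch.randn(4, 64, 128, 128), torch.tensor([3, 3, 7, 7]))
```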
SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations
Title | SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations |
Authors | Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, Harish Karnick |
Abstract | We present a feature vector formation technique for documents - Sparse Composite Document Vector (SCDV) - which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text representation. In SCDV, word embeddings are clustered to capture multiple semantic contexts in which words occur. They are then chained together to form document topic-vectors that can express complex, multi-topic documents. Through extensive experiments on multi-class and multi-label classification tasks, we outperform the previous state-of-the-art method, NTSG (Liu et al., 2015a). We also show that SCDV embeddings perform well on heterogeneous tasks like Topic Coherence, context-sensitive Learning and Information Retrieval. Moreover, we achieve a significant reduction in training and prediction times compared to other representation methods. SCDV achieves the best of both worlds: better performance with lower time and space complexity. |
Tasks | Information Retrieval, Multi-Label Classification |
Published | 2016-12-20 |
URL | http://arxiv.org/abs/1612.06778v3 |
http://arxiv.org/pdf/1612.06778v3.pdf | |
PWC | https://paperswithcode.com/paper/scdv-sparse-composite-document-vectors-using |
Repo | https://github.com/nyk510/scdv-python |
Framework | none |
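
The SCDV pipeline can be condensed into a short sketch: soft-cluster word vectors with a Gaussian mixture, build idf-weighted word-topic vectors by concatenating probability-scaled copies of each word vector, average them per document, and hard-threshold small entries for sparsity. The thresholding rule here is a per-document simplification of the paper's corpus-level rule.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def scdv_vectors(docs, word_vecs, idf, k=60, sparsity=0.04):
    """Sketch of SCDV document vectors.

    docs:      list of lists of word ids
    word_vecs: (V, d) word embeddings; idf: (V,) idf weights
    """
    gmm = GaussianMixture(n_components=k).fit(word_vecs)
    p = gmm.predict_proba(word_vecs)                  # (V, k) soft assignments
    # Word-topic vector: concat p(c|w) * w_vec over clusters, scaled by idf.
    wtv = (p[:, :, None] * word_vecs[:, None, :]).reshape(len(word_vecs), -1)
    wtv *= idf[:, None]
    doc_vecs = np.stack([wtv[d].mean(axis=0) for d in docs])
    # Hard-threshold small entries to make the composite vectors sparse.
    t = sparsity * np.abs(doc_vecs).max(axis=1, keepdims=True)
    doc_vecs[np.abs(doc_vecs) < t] = 0.0
    return doc_vecs

rng = np.random.default_rng(0)
vecs = rng.standard_normal((500, 50))
docs = [list(rng.integers(0, 500, size=30)) for _ in range(10)]
X = scdv_vectors(docs, vecs, idf=np.ones(500))
```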