Paper Group AWR 48
Scalable Learning of Non-Decomposable Objectives. ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning. Tree-to-Sequence Attentional Neural Machine Translation. Scribbler: Controlling Deep Image Synthesis with Sketch and Color. Temporal Attention-Gated Model for Robust Sequence Classification. Bottleneck Conditional Density …
Scalable Learning of Non-Decomposable Objectives
Title | Scalable Learning of Non-Decomposable Objectives |
Authors | Elad Eban, Mariano Schain, Alan Mackey, Ariel Gordon, Rif A. Saurous, Gal Elidan |
Abstract | Modern retrieval systems are often driven by an underlying machine learning model. The goal of such systems is to identify and possibly rank the few most relevant items for a given query or context. Thus, such systems are typically evaluated using a ranking-based performance metric such as the area under the precision-recall curve, the $F_\beta$ score, precision at fixed recall, etc. Obviously, it is desirable to train such systems to optimize the metric of interest. In practice, due to the scalability limitations of existing approaches for optimizing such objectives, large-scale retrieval systems are instead trained to maximize classification accuracy, in the hope that performance as measured via the true objective will also be favorable. In this work we present a unified framework that, using straightforward building block bounds, allows for highly scalable optimization of a wide range of ranking-based objectives. We demonstrate the advantage of our approach on several real-life retrieval problems that are significantly larger than those considered in the literature, while achieving substantial improvement in performance over the accuracy-objective baseline. |
Tasks | |
Published | 2016-08-16 |
URL | http://arxiv.org/abs/1608.04802v2 |
PDF | http://arxiv.org/pdf/1608.04802v2.pdf |
PWC | https://paperswithcode.com/paper/scalable-learning-of-non-decomposable |
Repo | https://github.com/tensorflow/models |
Framework | tf |
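The paper's core building block is easy to sketch: replace the non-differentiable true-positive and false-positive counts with hinge-style bounds, then assemble the metric of interest from the bounds. Below is a minimal NumPy illustration of that idea applied to F1 (the authors' actual TensorFlow implementation lives in the linked tensorflow/models repo); the toy scores and the exact surrogate assembly are ours.

```python
import numpy as np

def hinge_bounds(scores, labels):
    """Hinge surrogates for the 0/1 counts: a lower bound on true
    positives and an upper bound on false positives."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    tp_lb = np.sum(1.0 - np.maximum(0.0, 1.0 - pos))  # tp >= tp_lb
    fp_ub = np.sum(np.maximum(0.0, 1.0 + neg))        # fp <= fp_ub
    return max(tp_lb, 0.0), fp_ub

def f1_lower_bound(scores, labels):
    # F1 = 2tp / (tp + fp + P) with fn = P - tp; since F1 is increasing
    # in tp and decreasing in fp, substituting the bounds yields a
    # differentiable surrogate that lower-bounds the true F1.
    P = np.sum(labels == 1)
    tp_lb, fp_ub = hinge_bounds(scores, labels)
    return 2.0 * tp_lb / (tp_lb + fp_ub + P + 1e-12)

scores = np.array([2.1, 0.3, -1.2, 0.8, -0.5])  # model margins
labels = np.array([1, 1, 0, 0, 0])
print(f1_lower_bound(scores, labels))  # ~0.464, a valid lower bound on the true F1 (0.8 here)
```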
ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning
Title | ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning |
Authors | Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, Wojciech Jaśkowski |
Abstract | The recent advances in deep neural networks have led to effective vision-based reinforcement learning methods that have been employed to obtain human-level controllers in Atari 2600 games from pixel data. Atari 2600 games, however, do not resemble real-world tasks since they involve non-realistic 2D environments and the third-person perspective. Here, we propose a novel test-bed platform for reinforcement learning research from raw visual information which employs the first-person perspective in a semi-realistic 3D world. The software, called ViZDoom, is based on the classical first-person shooter video game, Doom. It allows developing bots that play the game using the screen buffer. ViZDoom is lightweight, fast, and highly customizable via a convenient mechanism of user scenarios. In the experimental part, we test the environment by trying to learn bots for two scenarios: a basic move-and-shoot task and a more complex maze-navigation problem. Using convolutional deep neural networks with Q-learning and experience replay, for both scenarios, we were able to train competent bots, which exhibit human-like behaviors. The results confirm the utility of ViZDoom as an AI research platform and imply that visual reinforcement learning in 3D realistic first-person perspective environments is feasible. |
Tasks | Atari Games, FPS Games, Game of Doom, Q-Learning |
Published | 2016-05-06 |
URL | http://arxiv.org/abs/1605.02097v2 |
PDF | http://arxiv.org/pdf/1605.02097v2.pdf |
PWC | https://paperswithcode.com/paper/vizdoom-a-doom-based-ai-research-platform-for |
Repo | https://github.com/chengyu2/vizdoom_rl_community_canberra |
Framework | none |
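For orientation, the typical interaction loop with the ViZDoom Python API looks roughly like the sketch below: the agent reads the screen buffer and issues button presses. The scenario config path and the three-button action set are assumptions for illustration, not part of the paper.

```python
from random import choice
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("scenarios/basic.cfg")  # assumed path to a bundled scenario
game.init()

# Button combinations: [MOVE_LEFT, MOVE_RIGHT, ATTACK]
actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

for episode in range(3):
    game.new_episode()
    while not game.is_episode_finished():
        state = game.get_state()        # screen buffer + game variables
        frame = state.screen_buffer     # raw pixels an agent learns from
        reward = game.make_action(choice(actions))
    print("Episode reward:", game.get_total_reward())
game.close()
```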
Tree-to-Sequence Attentional Neural Machine Translation
Title | Tree-to-Sequence Attentional Neural Machine Translation |
Authors | Akiko Eriguchi, Kazuma Hashimoto, Yoshimasa Tsuruoka |
Abstract | Most of the existing Neural Machine Translation (NMT) models focus on the conversion of sequential data and do not directly use syntactic information. We propose a novel end-to-end syntactic NMT model, extending a sequence-to-sequence model with the source-side phrase structure. Our model has an attention mechanism that enables the decoder to generate a translated word while softly aligning it with phrases as well as words of the source sentence. Experimental results on the WAT’15 English-to-Japanese dataset demonstrate that our proposed model considerably outperforms sequence-to-sequence attentional NMT models and compares favorably with the state-of-the-art tree-to-string SMT system. |
Tasks | Machine Translation |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06075v3 |
PDF | http://arxiv.org/pdf/1603.06075v3.pdf |
PWC | https://paperswithcode.com/paper/tree-to-sequence-attentional-neural-machine |
Repo | https://github.com/tempra28/tree2seq |
Framework | none |
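A stripped-down sketch of the tree-based encoder idea: encode a binarized parse bottom-up and let the decoder attend over phrase vectors as well as word vectors. The tanh combiner below stands in for the paper's Tree-LSTM unit; the weights and the toy parse tree are illustrative, not the model's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = rng.normal(scale=0.1, size=(d, 2 * d))  # hypothetical combiner weights

def encode(node, word_vecs, phrase_vecs):
    """Bottom-up encoding of a binarized parse tree. Leaves are word
    indices; internal nodes are (left, right) pairs. Every phrase vector
    is collected so the decoder can attend over phrases as well as words."""
    if isinstance(node, int):
        return word_vecs[node]
    left = encode(node[0], word_vecs, phrase_vecs)
    right = encode(node[1], word_vecs, phrase_vecs)
    h = np.tanh(W @ np.concatenate([left, right]))  # Tree-LSTM in the paper
    phrase_vecs.append(h)
    return h

words = rng.normal(size=(4, d))  # word-level hidden states
tree = ((0, 1), (2, 3))          # parse of a 4-word sentence
phrases = []
root = encode(tree, words, phrases)
attention_memory = np.vstack([words] + [p[None] for p in phrases])
print(attention_memory.shape)    # (7, 8): words + phrases the decoder attends to
```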
Scribbler: Controlling Deep Image Synthesis with Sketch and Color
Title | Scribbler: Controlling Deep Image Synthesis with Sketch and Color |
Authors | Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays |
Abstract | Recently, there have been several promising methods to generate realistic imagery from deep convolutional networks. These methods sidestep the traditional computer graphics rendering pipeline and instead generate imagery at the pixel level by learning from large collections of photos (e.g. faces or bedrooms). However, these methods are of limited utility because it is difficult for a user to control what the network produces. In this paper, we propose a deep adversarial image synthesis architecture that is conditioned on sketched boundaries and sparse color strokes to generate realistic cars, bedrooms, or faces. We demonstrate a sketch-based image synthesis system which allows users to ‘scribble’ over the sketch to indicate preferred colors for objects. Our network can then generate convincing images that satisfy both the color and the sketch constraints of the user. The network is feed-forward, which allows users to see the effect of their edits in real time. We compare to recent work on sketch-to-image synthesis and show that our approach can generate more realistic, more diverse, and more controllable outputs. The architecture is also effective at user-guided colorization of grayscale images. |
Tasks | Colorization, Image Generation |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00835v2 |
PDF | http://arxiv.org/pdf/1612.00835v2.pdf |
PWC | https://paperswithcode.com/paper/scribbler-controlling-deep-image-synthesis |
Repo | https://github.com/Pingxia/ConvolutionalSketchInversion |
Framework | none |
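A hedged sketch of the conditioning scheme: the generator takes the sketch and the sparse color strokes as extra input channels and maps them feed-forward to an RGB image. Layer sizes below are invented for illustration, and the adversarial discriminator and auxiliary losses used for training are omitted.

```python
import torch
import torch.nn as nn

class SketchColorGenerator(nn.Module):
    """Feed-forward generator conditioned on a sketch channel plus sparse
    color-stroke channels (illustrative sizes, not the paper's exact net)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + 3, 32, 3, stride=2, padding=1), nn.ReLU(),       # encode
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # decode
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),   # RGB out
        )
    def forward(self, sketch, strokes):
        return self.net(torch.cat([sketch, strokes], dim=1))

g = SketchColorGenerator()
sketch = torch.rand(1, 1, 64, 64)    # boundary drawing
strokes = torch.zeros(1, 3, 64, 64)  # mostly-empty color scribbles
strokes[:, 0, 20:24, 20:24] = 1.0    # a red hint
print(g(sketch, strokes).shape)      # torch.Size([1, 3, 64, 64])
```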
Temporal Attention-Gated Model for Robust Sequence Classification
Title | Temporal Attention-Gated Model for Robust Sequence Classification |
Authors | Wenjie Pei, Tadas Baltrušaitis, David M. J. Tax, Louis-Philippe Morency |
Abstract | Typical techniques for sequence classification are designed for well-segmented sequences which have been edited to remove noisy or irrelevant parts. Therefore, such methods cannot be easily applied on noisy sequences expected in real-world applications. In this paper, we present the Temporal Attention-Gated Model (TAGM) which integrates ideas from attention models and gated recurrent networks to better deal with noisy or unsegmented sequences. Specifically, we extend the concept of the attention model to measure the relevance of each observation (time step) of a sequence. We then use a novel gated recurrent network to learn the hidden representation for the final prediction. An important advantage of our approach is interpretability since the temporal attention weights provide a meaningful value for the salience of each time step in the sequence. We demonstrate the merits of our TAGM approach, both for prediction accuracy and interpretability, on three different tasks: spoken digit recognition, text-based sentiment analysis and visual event recognition. |
Tasks | Sentiment Analysis |
Published | 2016-12-01 |
URL | http://arxiv.org/abs/1612.00385v2 |
PDF | http://arxiv.org/pdf/1612.00385v2.pdf |
PWC | https://paperswithcode.com/paper/temporal-attention-gated-model-for-robust |
Repo | https://github.com/wenjiepei/TAGM |
Framework | torch |
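The gating mechanism is compact enough to sketch: a scalar attention weight per time step decides how far the recurrent state moves, so irrelevant steps barely change the representation. The linear salience scorer below replaces the bidirectional RNN the paper uses to produce the weights; all dimensions and weights are toy values.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h, T = 5, 8, 12
W = rng.normal(scale=0.3, size=(d_h, d_in))
U = rng.normal(scale=0.3, size=(d_h, d_h))
v = rng.normal(scale=0.3, size=d_in)   # hypothetical salience scorer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(T, d_in))
a = sigmoid(x @ v)        # scalar relevance per step (a bi-RNN in the paper)
h = np.zeros(d_h)
for t in range(T):
    cand = np.tanh(W @ x[t] + U @ h)
    h = (1.0 - a[t]) * h + a[t] * cand  # gated update: noisy steps are skipped

print(np.round(a, 2))     # interpretable per-step salience weights
print(h.shape)
```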
Bottleneck Conditional Density Estimation
Title | Bottleneck Conditional Density Estimation |
Authors | Rui Shu, Hung H. Bui, Mohammad Ghavamzadeh |
Abstract | We introduce a new framework for training deep generative models for high-dimensional conditional density estimation. The Bottleneck Conditional Density Estimator (BCDE) is a variant of the conditional variational autoencoder (CVAE) that employs layer(s) of stochastic variables as the bottleneck between the input $x$ and target $y$, where both are high-dimensional. Crucially, we propose a new hybrid training method that blends the conditional generative model with a joint generative model. Hybrid blending is the key to effective training of the BCDE, which avoids overfitting and provides a novel mechanism for leveraging unlabeled data. We show that our hybrid training procedure enables models to achieve competitive results in the MNIST quadrant prediction task in the fully-supervised setting, and sets new benchmarks in the semi-supervised regime for MNIST, SVHN, and CelebA. |
Tasks | Density Estimation |
Published | 2016-11-25 |
URL | http://arxiv.org/abs/1611.08568v3 |
PDF | http://arxiv.org/pdf/1611.08568v3.pdf |
PWC | https://paperswithcode.com/paper/bottleneck-conditional-density-estimation |
Repo | https://github.com/ruishu/bcde |
Framework | tf |
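The hybrid-blending idea can be illustrated with stand-in terms: one parameter set is trained on a convex combination of a conditional objective and a joint objective, with unlabeled inputs entering through the marginal term. The quadratic "ELBOs" below are placeholders for illustration only, not the paper's variational bounds over the bottleneck latents.

```python
import numpy as np

rng = np.random.default_rng(0)
x_lab, y_lab = rng.normal(size=8), rng.normal(size=8)  # paired data
x_unl = rng.normal(size=20)                            # unlabeled inputs

def conditional_term(x, y, theta):
    """Stand-in for the CVAE ELBO on log p(y|x)."""
    return -np.sum((y - theta * x) ** 2)

def joint_term(x, y, theta):
    """Stand-in for a joint ELBO on log p(x, y) sharing theta."""
    return -np.sum((y - theta * x) ** 2) - np.sum(x ** 2)

def marginal_term(x):
    """Stand-in for log p(x): the hook through which unlabeled data helps."""
    return -np.sum(x ** 2)

def hybrid_objective(theta, alpha=0.5):
    # Hybrid blending: the same parameters serve the conditional and the
    # joint model, regularizing the former and letting x_unl contribute.
    return (alpha * conditional_term(x_lab, y_lab, theta)
            + (1 - alpha) * (joint_term(x_lab, y_lab, theta) + marginal_term(x_unl)))

print(hybrid_objective(theta=0.3))
```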
Deeply-Fused Nets
Title | Deeply-Fused Nets |
Authors | Jingdong Wang, Zhen Wei, Ting Zhang, Wenjun Zeng |
Abstract | In this paper, we present a novel deep learning approach, deeply-fused nets. The central idea of our approach is deep fusion, i.e., combining the intermediate representations of base networks, where the fused output serves as the input of the remaining part of each base network, and performing such combinations deeply over several intermediate representations. The resulting deeply fused net enjoys several benefits. First, it is able to learn multi-scale representations as it enjoys the benefits of more base networks, which could form the same fused network, other than the initial group of base networks. Second, in our suggested fused net formed by one deep and one shallow base network, the flows of the information from the earlier intermediate layer of the deep base network to the output and from the input to the later intermediate layer of the deep base network are both improved. Last, the deep and shallow base networks are jointly learnt and can benefit from each other. More interestingly, the essential depth of a fused net composed from a deep base network and a shallow base network is reduced because the fused net could be composed from a less deep base network, and thus training the fused net is less difficult than training the initial deep base network. Empirical results demonstrate that our approach achieves superior performance over two closely-related methods, ResNet and Highway, and competitive performance compared to the state of the art. |
Tasks | |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.07716v1 |
PDF | http://arxiv.org/pdf/1605.07716v1.pdf |
PWC | https://paperswithcode.com/paper/deeply-fused-nets |
Repo | https://github.com/homles11/IGCV3 |
Framework | tf |
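Deep fusion reduces to a simple wiring pattern, sketched below with toy linear blocks: sum the intermediate representations of a deep and a shallow branch at each fusion point and feed the fused tensor back into both. Widths and depths are illustrative, not the paper's architectures; the shallow branch's role is to shorten gradient paths to the deep branch's layers.

```python
import torch
import torch.nn as nn

class DeeplyFusedNet(nn.Module):
    """Two base networks (one deep, one shallow) fused by summing their
    intermediate representations at each fusion point."""
    def __init__(self, d=16):
        super().__init__()
        self.deep1 = nn.Sequential(nn.Linear(d, d), nn.ReLU(),
                                   nn.Linear(d, d), nn.ReLU())
        self.shallow1 = nn.Sequential(nn.Linear(d, d), nn.ReLU())
        self.deep2 = nn.Sequential(nn.Linear(d, d), nn.ReLU(),
                                   nn.Linear(d, d), nn.ReLU())
        self.shallow2 = nn.Sequential(nn.Linear(d, d), nn.ReLU())
        self.head = nn.Linear(d, 10)

    def forward(self, x):
        fused = self.deep1(x) + self.shallow1(x)          # first fusion point
        fused = self.deep2(fused) + self.shallow2(fused)  # second fusion point
        return self.head(fused)

print(DeeplyFusedNet()(torch.rand(4, 16)).shape)  # torch.Size([4, 10])
```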
FALDOI: A new minimization strategy for large displacement variational optical flow
Title | FALDOI: A new minimization strategy for large displacement variational optical flow |
Authors | Roberto P. Palomares, Enric Meinhardt-Llopis, Coloma Ballester, Gloria Haro |
Abstract | We propose a large displacement optical flow method that introduces a new strategy to compute a good local minimum of any optical flow energy functional. The method requires a given set of discrete matches, which can be extremely sparse, and an energy functional which locally guides the interpolation from those matches. In particular, the matches are used to guide a structured coordinate-descent of the energy functional around these keypoints. It results in a two-step minimization method at the finest scale which is very robust to the inevitable outliers of the sparse matcher and able to capture large displacements of small objects. Its benefits over other variational methods that also rely on a set of sparse matches are its robustness against very few matches, high levels of noise and outliers. We validate our proposal using several optical flow variational models. The results consistently outperform the coarse-to-fine approaches and achieve good qualitative and quantitative performance on the standard optical flow benchmarks. |
Tasks | Optical Flow Estimation |
Published | 2016-02-29 |
URL | http://arxiv.org/abs/1602.08960v3 |
PDF | http://arxiv.org/pdf/1602.08960v3.pdf |
PWC | https://paperswithcode.com/paper/faldoi-a-new-minimization-strategy-for-large |
Repo | https://github.com/fperezgamonal/faldoi-ipol |
Framework | none |
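A toy rendition of the match-guided densification: seed the flow field at the sparse matches, then sweep the image letting each pixel adopt the neighboring flow candidate with the lowest brightness-constancy cost. The real method minimizes a full variational energy with a structured two-step coordinate descent; everything below, including the cost, is a simplification for illustration.

```python
import numpy as np

def densify_from_matches(I0, I1, matches, n_sweeps=4):
    """Grow a dense flow field outward from sparse (x, y, u, v) matches."""
    H, W = I0.shape
    flow = np.zeros((H, W, 2))
    known = np.zeros((H, W), dtype=bool)
    for (x, y, u, v) in matches:  # sparse seeds guide the minimization
        flow[y, x] = (u, v); known[y, x] = True

    def cost(y, x, uv):  # toy brightness-constancy data term
        yy = int(np.clip(y + uv[1], 0, H - 1))
        xx = int(np.clip(x + uv[0], 0, W - 1))
        return abs(I0[y, x] - I1[yy, xx])

    for _ in range(n_sweeps):
        for y in range(H):
            for x in range(W):
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W and known[ny, nx]:
                        cand = flow[ny, nx]
                        if not known[y, x] or cost(y, x, cand) < cost(y, x, flow[y, x]):
                            flow[y, x] = cand; known[y, x] = True
    return flow

I0 = np.zeros((16, 16)); I0[4:8, 4:8] = 1.0
I1 = np.zeros((16, 16)); I1[4:8, 7:11] = 1.0  # square moved 3 px right
print(densify_from_matches(I0, I1, [(5, 5, 3, 0)])[5, 5])  # [3. 0.]
```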
Gradient Coding
Title | Gradient Coding |
Authors | Rashish Tandon, Qi Lei, Alexandros G. Dimakis, Nikos Karampatziakis |
Abstract | We propose a novel coding theoretic framework for mitigating stragglers in distributed learning. We show how carefully replicating data blocks and coding across gradients can provide tolerance to failures and stragglers for Synchronous Gradient Descent. We implement our schemes in Python (using MPI) to run on Amazon EC2, and compare against baseline approaches in terms of running time and generalization error. |
Tasks | |
Published | 2016-12-10 |
URL | http://arxiv.org/abs/1612.03301v2 |
PDF | http://arxiv.org/pdf/1612.03301v2.pdf |
PWC | https://paperswithcode.com/paper/gradient-coding |
Repo | https://github.com/hwang595/ErasureHead |
Framework | pytorch |
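The paper's motivating example is concrete enough to run: with three workers each sending one coded combination of its block gradients, the full gradient sum survives any single straggler. Below is that scheme re-implemented in NumPy (the authors used MPI on EC2); the decoding vectors verify all three straggler patterns.

```python
import numpy as np

# Three workers, tolerating any single straggler. Each worker sends one
# coded combination of the block gradients g1, g2, g3 it holds locally:
#   w1 = g1/2 + g2,   w2 = g2 - g3,   w3 = g1/2 + g3
B = np.array([[0.5, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0,  1.0]])

# For each pair of surviving workers, a combination recovering g1 + g2 + g3.
decode = {frozenset({0, 1}): {0: 2.0, 1: -1.0},
          frozenset({0, 2}): {0: 1.0, 2: 1.0},
          frozenset({1, 2}): {1: 1.0, 2: 2.0}}

rng = np.random.default_rng(0)
g = rng.normal(size=(3, 4))   # per-block gradients (dimension 4)
coded = B @ g                 # what each worker would transmit

for survivors, coeffs in decode.items():
    recovered = sum(a * coded[i] for i, a in coeffs.items())
    assert np.allclose(recovered, g.sum(axis=0))
print("full gradient recovered from any 2 of 3 workers")
```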
Deep Reinforcement Learning for Multi-Domain Dialogue Systems
Title | Deep Reinforcement Learning for Multi-Domain Dialogue Systems |
Authors | Heriberto Cuayáhuitl, Seunghak Yu, Ashley Williamson, Jacob Carse |
Abstract | Standard deep reinforcement learning methods such as Deep Q-Networks (DQN) for multiple tasks (domains) face scalability problems. We propose a method for multi-domain dialogue policy learning, termed NDQN, and apply it to an information-seeking spoken dialogue system in the domains of restaurants and hotels. Experimental results comparing DQN (baseline) against NDQN (proposed) in simulation show that our proposed method exhibits better scalability and is promising for optimising the behaviour of multi-domain dialogue systems. |
Tasks | |
Published | 2016-11-26 |
URL | http://arxiv.org/abs/1611.08675v1 |
PDF | http://arxiv.org/pdf/1611.08675v1.pdf |
PWC | https://paperswithcode.com/paper/deep-reinforcement-learning-for-multi-domain |
Repo | https://github.com/cuayahuitl/SimpleDS |
Framework | none |
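The abstract leaves NDQN's internals out, so the sketch below only illustrates the multi-domain setup it addresses: one policy per domain plus a router that picks the domain for the current user turn. The keyword router and Q-table policies are stand-ins of ours, not the paper's networks.

```python
import random

class MultiDomainPolicy:
    """One policy (here a Q-table) per domain, plus a router deciding
    which domain the current user turn belongs to."""
    def __init__(self, domains, actions, eps=0.1):
        self.q = {d: {} for d in domains}
        self.actions, self.eps = actions, eps

    def route(self, user_turn):
        # trivial keyword router standing in for a learned domain tracker
        return "hotels" if "hotel" in user_turn else "restaurants"

    def act(self, domain, state):
        qs = self.q[domain].setdefault(state, {a: 0.0 for a in self.actions})
        if random.random() < self.eps:     # epsilon-greedy exploration
            return random.choice(self.actions)
        return max(qs, key=qs.get)

policy = MultiDomainPolicy(["restaurants", "hotels"],
                           ["request(area)", "inform(price)", "confirm()"])
turn = "i need a cheap hotel near the station"
d = policy.route(turn)
print(d, "->", policy.act(d, "greeting"))
```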
FPNN: Field Probing Neural Networks for 3D Data
Title | FPNN: Field Probing Neural Networks for 3D Data |
Authors | Yangyan Li, Soeren Pirk, Hao Su, Charles R. Qi, Leonidas J. Guibas |
Abstract | Building discriminative representations for 3D data has been an important task in computer graphics and computer vision research. Convolutional Neural Networks (CNNs) have been shown to operate on 2D images with great success for a variety of tasks. Lifting convolution operators to 3D (3DCNNs) seems like a plausible and promising next step. Unfortunately, the computational complexity of 3D CNNs grows cubically with respect to voxel resolution. Moreover, since most 3D geometry representations are boundary based, occupied regions do not increase proportionately with the size of the discretization, resulting in wasted computation. In this work, we represent 3D spaces as volumetric fields, and propose a novel design that employs field probing filters to efficiently extract features from them. Each field probing filter is a set of probing points: sensors that perceive the space. Our learning algorithm optimizes not only the weights associated with the probing points, but also their locations, which deforms the shape of the probing filters and adaptively distributes them in 3D space. The optimized probing points sense the 3D space “intelligently”, rather than operating blindly over the entire domain. We show that field probing is significantly more efficient than 3DCNNs, while providing state-of-the-art performance, on classification tasks for 3D object recognition benchmark datasets. |
Tasks | 3D Object Recognition, Object Recognition |
Published | 2016-05-20 |
URL | http://arxiv.org/abs/1605.06240v3 |
PDF | http://arxiv.org/pdf/1605.06240v3.pdf |
PWC | https://paperswithcode.com/paper/fpnn-field-probing-neural-networks-for-3d |
Repo | https://github.com/yangyanli/FPNN |
Framework | none |
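The field-probing operation itself is tiny: each filter reads the volumetric field only at its probing points and outputs a weighted sum, so both the weights and the point locations can be optimized. The sketch below uses nearest-neighbor sampling on a toy distance field; the paper uses trilinear sampling so that point locations receive gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
R = 32
# Toy distance field of a sphere on an R^3 grid.
ax = np.arange(R)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
field = np.abs(np.sqrt((X - 16) ** 2 + (Y - 16) ** 2 + (Z - 16) ** 2) - 8.0)

n_filters, n_points = 4, 16
points = rng.uniform(0, R - 1, size=(n_filters, n_points, 3))  # learnable locations
weights = rng.normal(size=(n_filters, n_points))               # learnable weights

def probe(field, points, weights):
    """Each filter senses the field only at its probing points and
    outputs a weighted sum of the sampled values."""
    idx = np.clip(np.rint(points).astype(int), 0, field.shape[0] - 1)
    samples = field[idx[..., 0], idx[..., 1], idx[..., 2]]  # (filters, points)
    return (samples * weights).sum(axis=1)

print(probe(field, points, weights))  # one response per probing filter
```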
A User Simulator for Task-Completion Dialogues
Title | A User Simulator for Task-Completion Dialogues |
Authors | Xiujun Li, Zachary C. Lipton, Bhuwan Dhingra, Lihong Li, Jianfeng Gao, Yun-Nung Chen |
Abstract | Despite widespread interest in reinforcement learning for task-oriented dialogue systems, several obstacles can frustrate research and development progress. First, reinforcement learners typically require interaction with the environment, so conventional dialogue corpora cannot be used directly. Second, each task presents specific challenges, requiring a separate corpus of task-specific annotated data. Third, collecting and annotating human-machine or human-human conversations for task-oriented dialogues requires extensive domain knowledge. Because building an appropriate dataset can be both financially costly and time-consuming, one popular approach is to build a user simulator based upon a corpus of example dialogues. Then, one can train reinforcement learning agents in an online fashion as they interact with the simulator. Dialogue agents trained on these simulators can serve as an effective starting point. Once agents master the simulator, they may be deployed in a real environment to interact with humans, and continue to be trained online. To ease empirical algorithmic comparisons in dialogues, this paper introduces a new, publicly available simulation framework, where our simulator, designed for the movie-booking domain, leverages both rules and collected data. The simulator supports two tasks: movie ticket booking and movie seeking. Finally, we demonstrate several agents and detail the procedure to add and test your own agent in the proposed framework. |
Tasks | Task-Oriented Dialogue Systems |
Published | 2016-12-17 |
URL | http://arxiv.org/abs/1612.05688v3 |
PDF | http://arxiv.org/pdf/1612.05688v3.pdf |
PWC | https://paperswithcode.com/paper/a-user-simulator-for-task-completion |
Repo | https://github.com/MiuLab/UserSimulator |
Framework | none |
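A minimal agenda-style simulator conveys the interface an RL agent trains against: the user holds a goal of constraint slots and answers agent requests until the goal is exhausted. The slot names and dialogue acts below are illustrative, not the framework's actual schema.

```python
class MovieUserSimulator:
    """Toy rule-based user simulator in the spirit of the movie-booking
    framework: answer agent requests from a fixed user goal."""
    def __init__(self, goal):
        self.goal = dict(goal)       # e.g. {"moviename": ..., "date": ...}
        self.remaining = dict(goal)  # slots not yet conveyed to the agent

    def respond(self, agent_act):
        kind, slot = agent_act
        if kind == "request" and slot in self.remaining:
            value = self.remaining.pop(slot)
            return ("inform", slot, value)
        if kind == "inform" and not self.remaining:
            return ("thanks", None, None)  # goal satisfied, dialogue ends
        return ("deny", slot, None)

user = MovieUserSimulator({"moviename": "zootopia", "date": "tomorrow",
                           "numberofpeople": "2"})
print(user.respond(("request", "date")))       # ('inform', 'date', 'tomorrow')
print(user.respond(("request", "moviename")))  # ('inform', 'moviename', 'zootopia')
```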
Data Augmentation via Levy Processes
Title | Data Augmentation via Levy Processes |
Authors | Stefan Wager, William Fithian, Percy Liang |
Abstract | If a document is about travel, we may expect that short snippets of the document should also be about travel. We introduce a general framework for incorporating these types of invariances into a discriminative classifier. The framework imagines data as being drawn from a slice of a Levy process. If we slice the Levy process at an earlier point in time, we obtain additional pseudo-examples, which can be used to train the classifier. We show that this scheme has two desirable properties: it preserves the Bayes decision boundary, and it is equivalent to fitting a generative model in the limit where we rewind time back to 0. Our construction captures popular schemes such as Gaussian feature noising and dropout training, as well as admitting new generalizations. |
Tasks | Data Augmentation, Image Augmentation |
Published | 2016-03-21 |
URL | http://arxiv.org/abs/1603.06340v1 |
PDF | http://arxiv.org/pdf/1603.06340v1.pdf |
PWC | https://paperswithcode.com/paper/data-augmentation-via-levy-processes |
Repo | https://github.com/swager/levythin |
Framework | none |
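For count features the construction is concrete: if a bag-of-words vector is read as a Poisson (Levy) process sliced at time 1, binomial thinning samples the slice at an earlier time t, yielding label-preserving pseudo-examples, i.e., shorter "snippets" of the same document. A minimal sketch, with a made-up document vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def levy_thin(counts, t, n_aug=5):
    """Binomial thinning: keep each word occurrence independently with
    probability t, which is an exact sample of the Levy-process slice at
    time t < 1. Thinned vectors become extra training pseudo-examples
    carrying the same label as the original."""
    return rng.binomial(counts, t, size=(n_aug, counts.size))

doc = np.array([3, 0, 1, 5, 2])  # bag-of-words counts for one document
print(levy_thin(doc, t=0.5))     # shorter "snippets" of the same document
```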
node2vec: Scalable Feature Learning for Networks
Title | node2vec: Scalable Feature Learning for Networks |
Authors | Aditya Grover, Jure Leskovec |
Abstract | Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node’s network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks. |
Tasks | Link Prediction, Multi-Label Classification, Node Classification, Representation Learning |
Published | 2016-07-03 |
URL | http://arxiv.org/abs/1607.00653v1 |
PDF | http://arxiv.org/pdf/1607.00653v1.pdf |
PWC | https://paperswithcode.com/paper/node2vec-scalable-feature-learning-for |
Repo | https://github.com/WiktorJ/msnode2vec |
Framework | none |
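The biased walk is the heart of node2vec and fits in a few lines: when walking along edge (t, v), the next node x is weighted 1/p for returning to t, 1 if x is also a neighbor of t (BFS-like), and 1/q otherwise (DFS-like), interpolating between local and exploratory neighborhoods. The walks would then be fed to a skip-gram model to learn the embeddings; the toy graph below is ours.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=0.5):
    """One 2nd-order biased random walk with return parameter p and
    in-out parameter q."""
    walk = [start]
    while len(walk) < length:
        v = walk[-1]
        if len(walk) == 1:                     # first step is unbiased
            walk.append(random.choice(adj[v]))
            continue
        t = walk[-2]
        weights = []
        for x in adj[v]:
            if x == t:
                weights.append(1.0 / p)        # return to previous node
            elif x in adj[t]:
                weights.append(1.0)            # stay near t (BFS-like)
            else:
                weights.append(1.0 / q)        # move outward (DFS-like)
        walk.append(random.choices(adj[v], weights=weights)[0])
    return walk

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
walks = [node2vec_walk(adj, n, length=6) for n in adj for _ in range(2)]
print(walks[0])  # feed walks to a word2vec-style skip-gram for embeddings
```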
Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
Title | Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer |
Authors | Sergey Zagoruyko, Nikos Komodakis |
Abstract | Attention plays a critical role in human visual experience. Furthermore, it has recently been demonstrated that attention can also play an important role in the context of applying artificial neural networks to a variety of tasks from fields such as computer vision and NLP. In this work we show that, by properly defining attention for convolutional neural networks, we can actually use this type of information in order to significantly improve the performance of a student CNN network by forcing it to mimic the attention maps of a powerful teacher network. To that end, we propose several novel methods of transferring attention, showing consistent improvement across a variety of datasets and convolutional neural network architectures. Code and models for our experiments are available at https://github.com/szagoruyko/attention-transfer |
Tasks | |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03928v3 |
PDF | http://arxiv.org/pdf/1612.03928v3.pdf |
PWC | https://paperswithcode.com/paper/paying-more-attention-to-attention-improving |
Repo | https://github.com/bharatsau/Darknet2 |
Framework | none |
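The activation-based transfer loss is short enough to state exactly: square the feature map, average over channels, L2-normalize the flattened spatial map, and penalize the student-teacher difference. The sketch below follows that formulation (the authors' own PyTorch code is at the URL in the abstract; the repo linked above is a third-party Darknet port); the random tensors are placeholders for real network activations.

```python
import torch
import torch.nn.functional as F

def attention_map(fm):
    """Spatial attention map of a conv feature map: channel-wise mean of
    squared activations, flattened and L2-normalized."""
    return F.normalize(fm.pow(2).mean(dim=1).flatten(1))

def at_loss(student_fm, teacher_fm):
    # Added to the usual classification loss so the student mimics the
    # teacher's attention maps at chosen layers.
    return (attention_map(student_fm) - attention_map(teacher_fm)).pow(2).mean()

student = torch.rand(4, 64, 16, 16)   # channel counts may differ; only the
teacher = torch.rand(4, 256, 16, 16)  # spatial sizes must match
print(at_loss(student, teacher))
```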