May 7, 2019

2923 words 14 mins read

Paper Group AWR 90



Deep Convolutional Neural Network Design Patterns

Title Deep Convolutional Neural Network Design Patterns
Authors Leslie N. Smith, Nicholay Topin
Abstract Recent research in the deep learning field has produced a plethora of new architectures. At the same time, a growing number of groups are applying deep learning to new applications. Some of these groups are likely to be composed of inexperienced deep learning practitioners who are baffled by the dizzying array of architecture choices and therefore opt to use an older architecture (e.g., AlexNet). Here we attempt to bridge this gap by mining the collective knowledge contained in recent deep learning research to discover underlying principles for designing neural network architectures. In addition, we describe several architectural innovations, including the Fractal of FractalNet network, Stagewise Boosting Networks, and Taylor Series Networks (our Caffe code and prototxt files are available at https://github.com/iPhysicist/CNNDesignPatterns). We hope others are inspired to build on our preliminary work.
Tasks
Published 2016-11-02
URL http://arxiv.org/abs/1611.00847v3
PDF http://arxiv.org/pdf/1611.00847v3.pdf
PWC https://paperswithcode.com/paper/deep-convolutional-neural-network-design
Repo https://github.com/iPhysicist/CNNDesignPatterns
Framework none
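
The repository holds the authors’ Caffe code and prototxt files; as a framework-neutral illustration of the kind of construction the abstract names, here is a minimal pure-Python sketch of the FractalNet expansion rule that “Fractal of FractalNet” builds on (the toy affine “layer” and all names are my own, not the paper’s):

```python
# Sketch of the FractalNet expansion rule: f_1 = base,
# f_{k+1}(x) = mean(f_k(f_k(x)), base(x)). Toy illustration only.
def make_fractal(base, depth):
    """Return a callable implementing `depth` fractal expansions of `base`."""
    if depth == 1:
        return base
    sub = make_fractal(base, depth - 1)
    return lambda x: 0.5 * (sub(sub(x)) + base(x))

f = make_fractal(lambda x: 2 * x + 1, depth=3)  # toy "layer": an affine map
print(f(1.0))  # -> 10.0
```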

Mining Fashion Outfit Composition Using An End-to-End Deep Learning Approach on Set Data

Title Mining Fashion Outfit Composition Using An End-to-End Deep Learning Approach on Set Data
Authors Yuncheng Li, LiangLiang Cao, Jiang Zhu, Jiebo Luo
Abstract Composing fashion outfits involves a deep understanding of fashion standards while incorporating creativity in choosing multiple fashion items (e.g., Jewelry, Bag, Pants, Dress). On fashion websites, popular or high-quality fashion outfits are usually designed by fashion experts and followed by large audiences. In this paper, we propose a machine learning system to compose fashion outfits automatically. The core of the proposed automatic composition system is to score fashion outfit candidates based on their appearance and meta-data. We propose to leverage outfit popularity on fashion-oriented websites to supervise the scoring component. The scoring component is a multi-modal multi-instance deep learning system that evaluates instance aesthetics and set compatibility simultaneously. In order to train and evaluate the proposed composition system, we have collected a large-scale fashion outfit dataset with 195K outfits and 368K fashion items from Polyvore. Although fashion outfit scoring and composition are rather challenging, we have achieved an AUC of 85% for the scoring component and an accuracy of 77% for a constrained composition task.
Tasks
Published 2016-08-10
URL http://arxiv.org/abs/1608.03016v2
PDF http://arxiv.org/pdf/1608.03016v2.pdf
PWC https://paperswithcode.com/paper/mining-fashion-outfit-composition-using-an
Repo https://github.com/xthan/polyvore-dataset
Framework none
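
A hedged sketch of the scoring component's shape: embed each item, pool over the set, and emit one compatibility score. Mean-pooling and the linear head below are my simplifications of the paper's multi-modal multi-instance network:

```python
import numpy as np

rng = np.random.default_rng(0)

def score_outfit(item_features, W_embed, w_score):
    """Embed each item, pool over the set (order-invariant), and emit a
    scalar compatibility score."""
    h = np.tanh(item_features @ W_embed)   # per-item embeddings
    pooled = h.mean(axis=0)                # simple set pooling
    return float(pooled @ w_score)

items = rng.normal(size=(4, 16))           # 4 items, 16-dim visual+meta features
W, w = rng.normal(size=(16, 8)), rng.normal(size=8)
print(score_outfit(items, W, w))
```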

Generic Inference in Latent Gaussian Process Models

Title Generic Inference in Latent Gaussian Process Models
Authors Edwin V. Bonilla, Karl Krauth, Amir Dezfouli
Abstract We develop an automated variational method for inference in models with Gaussian process (GP) priors and general likelihoods. The method supports multiple outputs and multiple latent functions and does not require detailed knowledge of the conditional likelihood, only needing its evaluation as a black-box function. Using a mixture of Gaussians as the variational distribution, we show that the evidence lower bound and its gradients can be estimated efficiently using samples from univariate Gaussian distributions. Furthermore, the method scales to large datasets, which is achieved by using an augmented prior via the inducing-variable approach underpinning most sparse GP approximations, along with parallel computation and stochastic optimization. We evaluate our approach quantitatively and qualitatively with experiments on small, medium-scale, and large datasets, showing its competitiveness under different likelihood models and sparsity levels. On the large-scale experiments involving prediction of airline delays and classification of handwritten digits, we show that our method is on par with the state-of-the-art hard-coded approaches for scalable GP regression and classification.
Tasks Stochastic Optimization
Published 2016-09-02
URL http://arxiv.org/abs/1609.00577v2
PDF http://arxiv.org/pdf/1609.00577v2.pdf
PWC https://paperswithcode.com/paper/generic-inference-in-latent-gaussian-process
Repo https://github.com/Karl-Krauth/Sparse-GP
Framework none
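
The computational core, estimating the expected log-likelihood term of the evidence lower bound from univariate Gaussian samples while treating the likelihood as a black box, fits in a few lines of NumPy (the toy Gaussian likelihood and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_expected_loglik(y, f_mean, f_var, loglik, n_samples=500):
    """Estimate E_{q(f)}[log p(y | f)] with reparameterized samples from
    univariate Gaussians; `loglik` is treated as a black box."""
    eps = rng.standard_normal((n_samples, f_mean.size))
    f = f_mean + np.sqrt(f_var) * eps      # samples of the latent function
    return np.mean([loglik(y, fi).sum() for fi in f])

# toy black-box likelihood: Gaussian observation noise with unit variance
loglik = lambda y, f: -0.5 * ((y - f) ** 2 + np.log(2 * np.pi))
y = np.array([0.3, -1.2])
print(mc_expected_loglik(y, f_mean=np.zeros(2), f_var=np.ones(2), loglik=loglik))
```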

Direct Sparse Odometry

Title Direct Sparse Odometry
Authors Jakob Engel, Vladlen Koltun, Daniel Cremers
Abstract We propose a novel direct sparse visual odometry formulation. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model parameters, including geometry – represented as inverse depth in a reference frame – and camera motion. This is achieved in real time by omitting the smoothness prior used in other direct methods and instead sampling pixels evenly throughout the images. Since our method does not depend on keypoint detectors or descriptors, it can naturally sample pixels from across all image regions that have an intensity gradient, including edges or smooth intensity variations on mostly white walls. The proposed model integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions. We thoroughly evaluate our method on three different datasets comprising several hours of video. The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, in terms of both tracking accuracy and robustness.
Tasks Calibration, Visual Odometry
Published 2016-07-09
URL http://arxiv.org/abs/1607.02565v2
PDF http://arxiv.org/pdf/1607.02565v2.pdf
PWC https://paperswithcode.com/paper/direct-sparse-odometry
Repo https://github.com/muskie82/CNN-DSO
Framework tf
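
One ingredient lends itself to a short sketch: selecting high-gradient pixels evenly across the image rather than relying on keypoint detectors. The grid-based selector below is a simplification of DSO's point selection (cell size and threshold are made-up values):

```python
import numpy as np

def select_pixels(img, cell=8, grad_thresh=5.0):
    """Keep at most one high-gradient pixel per grid cell, so points are
    spread evenly over the image instead of clustering at corners."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    h, w = img.shape
    pts = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            block = mag[i:i + cell, j:j + cell]
            k = np.unravel_index(block.argmax(), block.shape)
            if block[k] > grad_thresh:
                pts.append((i + k[0], j + k[1]))
    return pts

img = np.zeros((32, 32)); img[:, 16:] = 255.0   # a single vertical edge
print(len(select_pixels(img)), "points selected")
```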

Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Title Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Authors Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean
Abstract Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT’s use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google’s Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units (“wordpieces”) for both input and output. This method provides a good balance between the flexibility of “character”-delimited models and the efficiency of “word”-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT’14 English-to-French and English-to-German benchmarks, GNMT achieves results competitive with the state of the art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google’s phrase-based production system.
Tasks Machine Translation
Published 2016-09-26
URL http://arxiv.org/abs/1609.08144v2
PDF http://arxiv.org/pdf/1609.08144v2.pdf
PWC https://paperswithcode.com/paper/googles-neural-machine-translation-system
Repo https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition
Framework tf
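
The beam-rescoring formulas, length normalization plus a coverage penalty over the attention matrix, translate directly into code. The sketch below follows the formulas described in the paper; the attention values and the alpha/beta settings are illustrative (the paper tunes them on a dev set):

```python
import numpy as np

def gnmt_rescore(log_prob, attention, alpha=0.6, beta=0.2):
    """Beam-candidate score with length normalization and coverage penalty;
    attention[i, j] is the weight the j-th target word puts on the i-th
    source word."""
    length = attention.shape[1]
    lp = (5.0 + length) ** alpha / (5.0 + 1.0) ** alpha      # length norm
    cp = beta * np.log(np.minimum(attention.sum(axis=1), 1.0)).sum()
    return log_prob / lp + cp

attn = np.array([[0.7, 0.2],     # source word 0 over 2 target steps
                 [0.3, 0.8]])    # source word 1
print(gnmt_rescore(log_prob=-4.2, attention=attn))
```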

A Compare-Aggregate Model for Matching Text Sequences

Title A Compare-Aggregate Model for Matching Text Sequences
Authors Shuohang Wang, Jing Jiang
Abstract Many NLP tasks, including machine comprehension, answer selection, and text entailment, require comparing sequences. Matching the important units between sequences is key to solving these problems. In this paper, we present a general “compare-aggregate” framework that performs word-level matching followed by aggregation using Convolutional Neural Networks. We particularly focus on the different comparison functions we can use to match two vectors. We use four different datasets to evaluate the model. We find that some simple comparison functions based on element-wise operations can work better than standard neural network and neural tensor network models.
Tasks Answer Selection, Reading Comprehension
Published 2016-11-06
URL http://arxiv.org/abs/1611.01747v1
PDF http://arxiv.org/pdf/1611.01747v1.pdf
PWC https://paperswithcode.com/paper/a-compare-aggregate-model-for-matching-text
Repo https://github.com/shuohangwang/SeqMatchSeq
Framework torch
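
Of the comparison functions studied, SUBMULT+NN (element-wise squared difference and product, concatenated and passed through a ReLU layer) is easy to write out; the dimensions and random weights below are toy stand-ins for trained ones:

```python
import numpy as np

rng = np.random.default_rng(0)

def submult_nn(a, h, W, b):
    """SUBMULT+NN comparison: element-wise squared difference and product,
    concatenated and fed through a ReLU layer."""
    sub = (a - h) * (a - h)
    mult = a * h
    return np.maximum(0.0, W @ np.concatenate([sub, mult]) + b)

d = 6
a, h = rng.normal(size=d), rng.normal(size=d)   # attended vs. target word vector
W, b = rng.normal(size=(d, 2 * d)), np.zeros(d)
print(submult_nn(a, h, W, b))
```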

Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues

Title Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
Authors Bryan A. Plummer, Arun Mallya, Christopher M. Cervantes, Julia Hockenmaier, Svetlana Lazebnik
Abstract This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues. We model the appearance, size, and position of entity bounding boxes, adjectives that contain attribute information, and spatial relationships between pairs of entities connected by verbs or prepositions. Special attention is given to relationships between people and clothing or body part mentions, as they are useful for distinguishing individuals. We automatically learn weights for combining these cues and at test time, perform joint inference over all phrases in a caption. The resulting system produces state of the art performance on phrase localization on the Flickr30k Entities dataset and visual relationship detection on the Stanford VRD dataset.
Tasks
Published 2016-11-21
URL http://arxiv.org/abs/1611.06641v4
PDF http://arxiv.org/pdf/1611.06641v4.pdf
PWC https://paperswithcode.com/paper/phrase-localization-and-visual-relationship
Repo https://github.com/BryanPlummer/pl-clc
Framework none
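
For a single phrase, the learned cue combination reduces to a weighted sum of cue scores per candidate box; the sketch below shows that reduced version with invented numbers, whereas the paper performs joint inference over all phrases in a caption:

```python
import numpy as np

def best_box(candidate_cues, weights):
    """Score each candidate box as a weighted sum of its cue scores and pick
    the argmax (single-phrase simplification of the paper's joint inference)."""
    scores = candidate_cues @ weights
    return int(np.argmax(scores)), scores

cues = np.array([[0.9, 0.2, 0.5],    # box 0: appearance, size, relationship
                 [0.4, 0.8, 0.7]])   # box 1
weights = np.array([0.6, 0.1, 0.3])  # learned cue weights (invented here)
print(best_box(cues, weights))
```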

Review Networks for Caption Generation

Title Review Networks for Caption Generation
Authors Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen
Abstract We propose a novel extension of the encoder-decoder framework, called a review network. The review network is generic and can enhance any existing encoder-decoder model: in this paper, we consider RNN decoders with both CNN and RNN encoders. The review network performs a number of review steps with an attention mechanism on the encoder hidden states, and outputs a thought vector after each review step; the thought vectors are used as the input of the attention mechanism in the decoder. We show that conventional encoder-decoders are a special case of our framework. Empirically, we show that our framework improves over state-of-the-art encoder-decoder systems on the tasks of image captioning and source code captioning.
Tasks Image Captioning
Published 2016-05-25
URL http://arxiv.org/abs/1605.07912v4
PDF http://arxiv.org/pdf/1605.07912v4.pdf
PWC https://paperswithcode.com/paper/review-networks-for-caption-generation
Repo https://github.com/kimiyoung/review_net
Framework none
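
A minimal sketch of the review mechanism: repeated attention over the encoder states, each step emitting a thought vector for the decoder to attend to later. Using the previous thought vector as the attention query is my simplification of the paper's recurrent review units:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def review(enc_states, W, n_steps=3):
    """Run attentive review steps over the encoder states; each step emits a
    thought vector (simplified: the previous thought vector is the query)."""
    thought = enc_states.mean(axis=0)
    thoughts = []
    for _ in range(n_steps):
        weights = softmax(enc_states @ (W @ thought))  # attention over states
        thought = weights @ enc_states                 # weighted summary
        thoughts.append(thought)
    return np.stack(thoughts)          # decoder later attends over these

H = rng.normal(size=(5, 8))            # 5 encoder hidden states, dim 8
print(review(H, np.eye(8)).shape)      # -> (3, 8)
```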

Stance Detection with Bidirectional Conditional Encoding

Title Stance Detection with Bidirectional Conditional Encoding
Authors Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina Bontcheva
Abstract Stance detection is the task of classifying the attitude expressed in a text towards a target such as Hillary Clinton to be “positive”, “negative” or “neutral”. Previous work has assumed that either the target is mentioned in the text or that training data for every target is given. This paper considers the more challenging version of this task, where targets are not always mentioned and no training data is available for the test targets. We experiment with conditional LSTM encoding, which builds a representation of the tweet that is dependent on the target, and demonstrate that it outperforms encoding the tweet and the target independently. Performance is improved further when the conditional model is augmented with bidirectional encoding. We evaluate our approach on the SemEval 2016 Task 6 Twitter Stance Detection corpus, achieving performance second only to a system trained on semi-automatically labelled tweets for the test target. When such weak supervision is added, our approach achieves state-of-the-art results.
Tasks Stance Detection
Published 2016-06-17
URL http://arxiv.org/abs/1606.05464v2
PDF http://arxiv.org/pdf/1606.05464v2.pdf
PWC https://paperswithcode.com/paper/stance-detection-with-bidirectional
Repo https://github.com/sheffieldnlp/stance-conditional
Framework tf
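
The conditioning itself is just a state hand-off: encode the target, then start the tweet encoder from the target's final state. The sketch below uses a plain tanh RNN and a single direction where the paper uses bidirectional LSTMs:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_encode(xs, h0, Wx, Wh):
    """Plain tanh RNN (stand-in for the paper's LSTM); returns the final state."""
    h = h0
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h)
    return h

def conditional_encode(target, tweet, Wx, Wh):
    """Encode the target first, then initialise the tweet encoder with the
    target's final state, making the tweet representation target-dependent."""
    h_target = rnn_encode(target, np.zeros(Wh.shape[0]), Wx, Wh)
    return rnn_encode(tweet, h_target, Wx, Wh)

d_in, d_h = 16, 8
Wx = rng.normal(size=(d_h, d_in)) * 0.1
Wh = rng.normal(size=(d_h, d_h)) * 0.1
target = rng.normal(size=(3, d_in))    # 3 target-word embeddings
tweet = rng.normal(size=(12, d_in))    # 12 tweet-word embeddings
print(conditional_encode(target, tweet, Wx, Wh).shape)   # -> (8,)
```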

Italy goes to Stanford: a collection of CoreNLP modules for Italian

Title Italy goes to Stanford: a collection of CoreNLP modules for Italian
Authors Alessio Palmero Aprosio, Giovanni Moretti
Abstract In this paper we present Tint, an easy-to-use set of fast, accurate and extendable Natural Language Processing modules for Italian. It is based on Stanford CoreNLP and is freely available as standalone software or as a library that can be integrated into an existing project.
Tasks
Published 2016-09-20
URL http://arxiv.org/abs/1609.06204v2
PDF http://arxiv.org/pdf/1609.06204v2.pdf
PWC https://paperswithcode.com/paper/italy-goes-to-stanford-a-collection-of
Repo https://github.com/musixmatchresearch/umberto
Framework pytorch

DeepCoder: Learning to Write Programs

Title DeepCoder: Learning to Write Programs
Authors Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, Daniel Tarlow
Abstract We develop a first line of attack for solving programming competition-style problems from input-output examples using deep learning. The approach is to train a neural network to predict properties of the program that generated the outputs from the inputs. We use the neural network’s predictions to augment search techniques from the programming languages community, including enumerative search and an SMT-based solver. Empirically, we show that our approach leads to an order of magnitude speedup over the strong non-augmented baselines and a Recurrent Neural Network approach, and that we are able to solve problems of difficulty comparable to the simplest problems on programming competition websites.
Tasks Program Synthesis
Published 2016-11-07
URL http://arxiv.org/abs/1611.01989v2
PDF http://arxiv.org/pdf/1611.01989v2.pdf
PWC https://paperswithcode.com/paper/deepcoder-learning-to-write-programs
Repo https://github.com/lcary/io-generation
Framework none
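
The guiding idea, predicting which DSL operations are likely and letting those predictions order an enumerative search, fits in a toy example. The three-operation DSL and the hardcoded "predicted" probabilities below are mine; DeepCoder infers such probabilities from the input-output examples with a neural network:

```python
from itertools import product

# Toy DSL: list-to-list operations only.
OPS = {"sort": sorted,
       "reverse": lambda xs: xs[::-1],
       "tail": lambda xs: xs[1:]}

def run(prog, xs):
    """Apply a sequence of DSL operations to a list."""
    for op in prog:
        xs = OPS[op](xs)
    return list(xs)

def search(examples, op_probs, max_len=2):
    """Enumerate programs, trying high-probability operations first."""
    ranked = sorted(OPS, key=lambda o: -op_probs[o])
    for length in range(1, max_len + 1):
        for prog in product(ranked, repeat=length):
            if all(run(prog, i) == o for i, o in examples):
                return prog
    return None

examples = [([3, 1, 2], [3, 2, 1])]                    # target: sort then reverse
probs = {"sort": 0.8, "reverse": 0.7, "tail": 0.1}     # "neural" predictions
print(search(examples, probs))                          # -> ('sort', 'reverse')
```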

A Kronecker-factored approximate Fisher matrix for convolution layers

Title A Kronecker-factored approximate Fisher matrix for convolution layers
Authors Roger Grosse, James Martens
Abstract Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present Kronecker Factors for Convolution (KFC), a tractable approximation to the Fisher matrix for convolutional networks based on a structured probabilistic model for the distribution over backpropagated derivatives. Similarly to the recently proposed Kronecker-Factored Approximate Curvature (K-FAC), each block of the approximate Fisher matrix decomposes as the Kronecker product of small matrices, allowing for efficient inversion. KFC captures important curvature information while still yielding comparably efficient updates to stochastic gradient descent (SGD). We show that the updates are invariant to commonly used reparameterizations, such as centering of the activations. In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD. Furthermore, it was able to train the networks in 10-20 times fewer iterations than SGD, suggesting its potential applicability in a distributed setting.
Tasks
Published 2016-02-03
URL http://arxiv.org/abs/1602.01407v2
PDF http://arxiv.org/pdf/1602.01407v2.pdf
PWC https://paperswithcode.com/paper/a-kronecker-factored-approximate-fisher
Repo https://github.com/Thrandis/EKFAC-pytorch
Framework pytorch
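
The payoff of the Kronecker factorization is that a Fisher block A ⊗ S never has to be formed explicitly: its inverse applies to a weight-gradient matrix G as S^{-1} G A^{-1}, so only the two small factors are inverted. A NumPy sketch with made-up statistics and an arbitrary damping value:

```python
import numpy as np

def kfac_update(G, A, S, damping=1e-3):
    """Approximate natural-gradient direction for a weight matrix: with the
    Fisher block factored as A (kron) S, its inverse applies as
    S^{-1} G A^{-1}; damping keeps the small factors well-conditioned."""
    A_d = A + damping * np.eye(A.shape[0])
    S_d = S + damping * np.eye(S.shape[0])
    return np.linalg.solve(S_d, G) @ np.linalg.inv(A_d)

rng = np.random.default_rng(0)
a = rng.normal(size=(64, 10))     # layer inputs (activations), batch of 64
g = rng.normal(size=(64, 5))      # backpropagated derivatives
A, S = a.T @ a / 64, g.T @ g / 64 # Kronecker factor statistics
G = rng.normal(size=(5, 10))      # gradient of a 5x10 weight matrix
print(kfac_update(G, A, S).shape) # -> (5, 10)
```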

Stacked Generative Adversarial Networks

Title Stacked Generative Adversarial Networks
Authors Xun Huang, Yixuan Li, Omid Poursaeed, John Hopcroft, Serge Belongie
Abstract In this paper, we propose a novel generative model named Stacked Generative Adversarial Networks (SGAN), which is trained to invert the hierarchical representations of a bottom-up discriminative network. Our model consists of a top-down stack of GANs, each learned to generate lower-level representations conditioned on higher-level representations. A representation discriminator is introduced at each feature hierarchy to encourage the representation manifold of the generator to align with that of the bottom-up discriminative network, leveraging the powerful discriminative representations to guide the generative model. In addition, we introduce a conditional loss that encourages the use of conditional information from the layer above, and a novel entropy loss that maximizes a variational lower bound on the conditional entropy of generator outputs. We first train each stack independently, and then train the whole model end-to-end. Unlike the original GAN that uses a single noise vector to represent all the variations, our SGAN decomposes variations into multiple levels and gradually resolves uncertainties in the top-down generative process. Based on visual inspection, Inception scores and visual Turing test, we demonstrate that SGAN is able to generate images of much higher quality than GANs without stacking.
Tasks Conditional Image Generation
Published 2016-12-13
URL http://arxiv.org/abs/1612.04357v4
PDF http://arxiv.org/pdf/1612.04357v4.pdf
PWC https://paperswithcode.com/paper/stacked-generative-adversarial-networks
Repo https://github.com/xunhuang1995/SGAN
Framework none
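
The structural core, top-down sampling through the stack where each level's generator consumes the representation from above plus fresh noise, can be sketched without the training machinery; the affine toy generators below stand in for the paper's convolutional ones, and no discriminators or losses are shown:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_generator(out_dim, in_dim, noise_dim=4):
    """Toy per-level 'generator': an affine map of [h; z] with tanh."""
    W = rng.normal(size=(out_dim, in_dim + noise_dim)) * 0.5
    return lambda h, z: np.tanh(W @ np.concatenate([h, z]))

def sample_top_down(generators, top_repr, noise_dim=4):
    """SGAN-style sampling: each stack maps the representation from the
    level above, plus fresh noise, to the (larger) level below."""
    h = top_repr
    for G in generators:                  # ordered top -> bottom
        h = G(h, rng.normal(size=noise_dim))
    return h                              # bottom level, e.g. image pixels

Gs = [make_generator(16, 8), make_generator(32, 16)]
print(sample_top_down(Gs, top_repr=rng.normal(size=8)).shape)   # -> (32,)
```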

Inference Compilation and Universal Probabilistic Programming

Title Inference Compilation and Universal Probabilistic Programming
Authors Tuan Anh Le, Atilim Gunes Baydin, Frank Wood
Abstract We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do “compilation of inference” because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.
Tasks Probabilistic Programming
Published 2016-10-31
URL http://arxiv.org/abs/1610.09900v2
PDF http://arxiv.org/pdf/1610.09900v2.pdf
PWC https://paperswithcode.com/paper/inference-compilation-and-universal
Repo https://github.com/pyprob/pyprob
Framework pytorch
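
At test time the compiled network simply plays the role of the proposal in an importance sampler. The sketch below uses a hand-made Gaussian proposal standing in for the trained network (the toy model and all constants are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def importance_sample(prior_logpdf, lik_logpdf, proposal, n=1000):
    """Draw latents from a proposal conditioned on the observation and weight
    them by prior * likelihood / proposal (constants cancel on normalization)."""
    xs, logws = [], []
    for _ in range(n):
        x, logq = proposal()
        xs.append(x)
        logws.append(prior_logpdf(x) + lik_logpdf(x) - logq)
    w = np.exp(np.array(logws) - max(logws))
    return np.array(xs), w / w.sum()

# toy model: x ~ N(0,1), y ~ N(x, 0.5^2), observed y = 1.0
y = 1.0
prior = lambda x: -0.5 * x * x
lik = lambda x: -0.5 * ((y - x) / 0.5) ** 2

def proposal():                     # "compiled" proposal centered near y
    x = rng.normal(loc=y, scale=0.6)
    return x, -0.5 * ((x - y) / 0.6) ** 2 - np.log(0.6)

samples, weights = importance_sample(prior, lik, proposal)
print(np.sum(weights * samples))    # posterior mean estimate (~0.8)
```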

Incremental Robot Learning of New Objects with Fixed Update Time

Title Incremental Robot Learning of New Objects with Fixed Update Time
Authors Raffaello Camoriano, Giulia Pasquale, Carlo Ciliberto, Lorenzo Natale, Lorenzo Rosasco, Giorgio Metta
Abstract We consider object recognition in the context of lifelong learning, where a robotic agent learns to discriminate between a growing number of object classes as it accumulates experience about the environment. We propose an incremental variant of the Regularized Least Squares for Classification (RLSC) algorithm, and exploit its structure to seamlessly add new classes to the learned model. The presented algorithm addresses the problem of having an unbalanced proportion of training examples per class, which occurs when new objects are presented to the system for the first time. We evaluate our algorithm on both a machine learning benchmark dataset and two challenging object recognition tasks in a robotic setting. Empirical evidence shows that our approach achieves comparable or higher classification performance than its batch counterpart when classes are unbalanced, while being significantly faster.
Tasks Active Learning, Object Recognition
Published 2016-05-17
URL http://arxiv.org/abs/1605.05045v3
PDF http://arxiv.org/pdf/1605.05045v3.pdf
PWC https://paperswithcode.com/paper/incremental-robot-learning-of-new-objects
Repo https://github.com/LCSL/incremental_multiclass_RLSC
Framework none
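
A rough sketch of the incremental bookkeeping behind RLSC: accumulate the Gram statistics, grow the label matrix when a brand-new class appears, and re-solve for the weights. The one-hot coding and the naive solve are simplifications; the paper's contribution is a recoding scheme that handles class imbalance with fixed update time:

```python
import numpy as np

class IncrementalRLSC:
    """Minimal incremental regularized least squares classifier sketch."""
    def __init__(self, d, lam=1e-2):
        self.A = lam * np.eye(d)       # accumulates X^T X + lam * I
        self.B = np.zeros((d, 0))      # accumulates X^T Y, one column per class

    def add_example(self, x, label):
        if label >= self.B.shape[1]:   # first time we see this class
            pad = np.zeros((self.B.shape[0], label + 1 - self.B.shape[1]))
            self.B = np.hstack([self.B, pad])
        self.A += np.outer(x, x)
        self.B[:, label] += x          # Y is one-hot, so X^T Y accumulates x

    def predict(self, x):
        W = np.linalg.solve(self.A, self.B)
        return int(np.argmax(x @ W))

clf = IncrementalRLSC(d=2)
clf.add_example(np.array([1.0, 0.0]), 0)
clf.add_example(np.array([0.0, 1.0]), 1)   # a brand-new class, added seamlessly
print(clf.predict(np.array([0.9, 0.1])))   # -> 0
```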