Paper Group AWR 90
Deep Convolutional Neural Network Design Patterns. Mining Fashion Outfit Composition Using An End-to-End Deep Learning Approach on Set Data. Generic Inference in Latent Gaussian Process Models. Direct Sparse Odometry. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. A Compare-Aggregate Model for Matching Text Sequences. Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues. Review Networks for Caption Generation. Stance Detection with Bidirectional Conditional Encoding. Italy goes to Stanford: a collection of CoreNLP modules for Italian. DeepCoder: Learning to Write Programs. A Kronecker-factored approximate Fisher matrix for convolution layers. Stacked Generative Adversarial Networks. Inference Compilation and Universal Probabilistic Programming. Incremental Robot Learning of New Objects with Fixed Update Time.
Deep Convolutional Neural Network Design Patterns
Title | Deep Convolutional Neural Network Design Patterns |
Authors | Leslie N. Smith, Nicholay Topin |
Abstract | Recent research in the deep learning field has produced a plethora of new architectures. At the same time, a growing number of groups are applying deep learning to new applications. Some of these groups are likely to be composed of inexperienced deep learning practitioners who are baffled by the dizzying array of architecture choices and therefore opt to use an older architecture (i.e., Alexnet). Here we attempt to bridge this gap by mining the collective knowledge contained in recent deep learning research to discover underlying principles for designing neural network architectures. In addition, we describe several architectural innovations, including Fractal of FractalNet network, Stagewise Boosting Networks, and Taylor Series Networks (our Caffe code and prototxt files are available at https://github.com/iPhysicist/CNNDesignPatterns). We hope others are inspired to build on our preliminary work. |
Tasks | |
Published | 2016-11-02 |
URL | http://arxiv.org/abs/1611.00847v3 |
http://arxiv.org/pdf/1611.00847v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-convolutional-neural-network-design |
Repo | https://github.com/iPhysicist/CNNDesignPatterns |
Framework | none |
Mining Fashion Outfit Composition Using An End-to-End Deep Learning Approach on Set Data
Title | Mining Fashion Outfit Composition Using An End-to-End Deep Learning Approach on Set Data |
Authors | Yuncheng Li, LiangLiang Cao, Jiang Zhu, Jiebo Luo |
Abstract | Composing fashion outfits involves deep understanding of fashion standards while incorporating creativity for choosing multiple fashion items (e.g., Jewelry, Bag, Pants, Dress). In fashion websites, popular or high-quality fashion outfits are usually designed by fashion experts and followed by large audiences. In this paper, we propose a machine learning system to compose fashion outfits automatically. The core of the proposed automatic composition system is to score fashion outfit candidates based on the appearances and meta-data. We propose to leverage outfit popularity on fashion oriented websites to supervise the scoring component. The scoring component is a multi-modal multi-instance deep learning system that evaluates instance aesthetics and set compatibility simultaneously. In order to train and evaluate the proposed composition system, we have collected a large scale fashion outfit dataset with 195K outfits and 368K fashion items from Polyvore. Although fashion outfit scoring and composition are rather challenging, we have achieved an AUC of 85% for the scoring component, and an accuracy of 77% for a constrained composition task. |
Tasks | |
Published | 2016-08-10 |
URL | http://arxiv.org/abs/1608.03016v2 |
http://arxiv.org/pdf/1608.03016v2.pdf | |
PWC | https://paperswithcode.com/paper/mining-fashion-outfit-composition-using-an |
Repo | https://github.com/xthan/polyvore-dataset |
Framework | none |
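The set-scoring idea above lends itself to a compact sketch: embed each item from its modalities, pool across the outfit so the score is order-invariant, and squash to a popularity-style probability. Everything below (dimensions, fusion by summed projections, mean pooling) is an illustrative assumption, not the paper's exact network:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_item(image_feat, meta_feat, W_img, W_meta):
    """Fuse the two modalities of one fashion item into a shared embedding."""
    return np.tanh(W_img @ image_feat + W_meta @ meta_feat)

def score_outfit(item_embeddings, w_out):
    """Mean-pool item embeddings (permutation-invariant) and score the set."""
    pooled = item_embeddings.mean(axis=0)
    return 1.0 / (1.0 + np.exp(-w_out @ pooled))  # popularity-style probability

# Toy outfit: 4 items, 512-d image features, 32-d meta features.
W_img = rng.normal(size=(64, 512)) * 0.01
W_meta = rng.normal(size=(64, 32)) * 0.1
w_out = rng.normal(size=64)
items = np.stack([embed_item(rng.normal(size=512), rng.normal(size=32),
                             W_img, W_meta) for _ in range(4)])
print(score_outfit(items, w_out))
```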
Generic Inference in Latent Gaussian Process Models
Title | Generic Inference in Latent Gaussian Process Models |
Authors | Edwin V. Bonilla, Karl Krauth, Amir Dezfouli |
Abstract | We develop an automated variational method for inference in models with Gaussian process (GP) priors and general likelihoods. The method supports multiple outputs and multiple latent functions and does not require detailed knowledge of the conditional likelihood, only needing its evaluation as a black-box function. Using a mixture of Gaussians as the variational distribution, we show that the evidence lower bound and its gradients can be estimated efficiently using samples from univariate Gaussian distributions. Furthermore, the method is scalable to large datasets, which is achieved by using an augmented prior via the inducing-variable approach underpinning most sparse GP approximations, along with parallel computation and stochastic optimization. We evaluate our approach quantitatively and qualitatively with experiments on small datasets, medium-scale datasets and large datasets, showing its competitiveness under different likelihood models and sparsity levels. On the large-scale experiments involving prediction of airline delays and classification of handwritten digits, we show that our method is on par with the state-of-the-art hard-coded approaches for scalable GP regression and classification. |
Tasks | Stochastic Optimization |
Published | 2016-09-02 |
URL | http://arxiv.org/abs/1609.00577v2 |
http://arxiv.org/pdf/1609.00577v2.pdf | |
PWC | https://paperswithcode.com/paper/generic-inference-in-latent-gaussian-process |
Repo | https://github.com/Karl-Krauth/Sparse-GP |
Framework | none |
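The key computational trick is that the expected log-likelihood term of the ELBO only needs samples from the univariate marginals of the variational posterior, with the likelihood called as a black box. A minimal numpy sketch for a diagonal-Gaussian q (the paper uses a mixture of Gaussians, and the KL term is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_log_lik(y, m, s, log_lik, n_samples=200):
    """Monte Carlo estimate of E_{q(f)}[log p(y | f)] using only univariate
    Gaussian samples, treating the likelihood as a black box."""
    f_samples = m + s * rng.standard_normal((n_samples, len(m)))  # ~ N(m, s^2)
    return log_lik(y, f_samples).sum(axis=1).mean()

# A black-box Bernoulli likelihood with a logit link (one of many choices).
def bernoulli_log_lik(y, f):
    return y * f - np.log1p(np.exp(f))

y = np.array([1.0, 0.0, 1.0])
m, s = np.zeros(3), np.ones(3)  # marginal means / std-devs of q(f)
print(expected_log_lik(y, m, s, bernoulli_log_lik))
```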
Direct Sparse Odometry
Title | Direct Sparse Odometry |
Authors | Jakob Engel, Vladlen Koltun, Daniel Cremers |
Abstract | We propose a novel direct sparse visual odometry formulation. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model parameters, including geometry – represented as inverse depth in a reference frame – and camera motion. This is achieved in real time by omitting the smoothness prior used in other direct methods and instead sampling pixels evenly throughout the images. Since our method does not depend on keypoint detectors or descriptors, it can naturally sample pixels from across all image regions that have intensity gradient, including edges or smooth intensity variations on mostly white walls. The proposed model integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions. We thoroughly evaluate our method on three different datasets comprising several hours of video. The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness. |
Tasks | Calibration, Visual Odometry |
Published | 2016-07-09 |
URL | http://arxiv.org/abs/1607.02565v2 |
http://arxiv.org/pdf/1607.02565v2.pdf | |
PWC | https://paperswithcode.com/paper/direct-sparse-odometry |
Repo | https://github.com/muskie82/CNN-DSO |
Framework | tf |
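At the heart of the formulation is a photometric residual: back-project a reference pixel at its inverse depth, reproject into the target frame, and compare intensities under an affine brightness transfer. The toy rendering below simplifies heavily (nearest-neighbour lookup instead of bilinear interpolation; full photometric calibration not modelled), and all names are illustrative:

```python
import numpy as np

def photometric_residual(I_ref, I_tgt, p, inv_depth, K, R, t, a=1.0, b=0.0):
    """Photometric error of one reference pixel p reprojected into a target
    frame; a, b model an affine brightness transfer (exposure and vignetting
    are handled more fully by the paper's calibration model)."""
    # Back-project pixel p at its inverse depth, transform, re-project.
    X = np.linalg.inv(K) @ np.array([p[0], p[1], 1.0]) / inv_depth
    x = K @ (R @ X + t)
    u, v = x[:2] / x[2]
    return I_tgt[int(round(v)), int(round(u))] - (a * I_ref[p[1], p[0]] + b)

# Toy frames: identity rotation, zero translation, constant intensities.
I_ref, I_tgt = np.full((10, 10), 0.5), np.full((10, 10), 0.6)
K = np.array([[5.0, 0.0, 5.0], [0.0, 5.0, 5.0], [0.0, 0.0, 1.0]])
print(photometric_residual(I_ref, I_tgt, (4, 4), 0.5, K, np.eye(3), np.zeros(3)))
```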
Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Title | Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation |
Authors | Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean |
Abstract | Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT’s use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google’s Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units (“wordpieces”) for both input and output. This method provides a good balance between the flexibility of “character”-delimited models and the efficiency of “word”-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT’14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google’s phrase-based production system. |
Tasks | Machine Translation |
Published | 2016-09-26 |
URL | http://arxiv.org/abs/1609.08144v2 |
http://arxiv.org/pdf/1609.08144v2.pdf | |
PWC | https://paperswithcode.com/paper/googles-neural-machine-translation-system |
Repo | https://github.com/guillaume-chevalier/LSTM-Human-Activity-Recognition |
Framework | tf |
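The beam-search scoring mentioned in the abstract combines length normalization with a coverage penalty; the paper gives it as s(Y, X) = log P(Y|X) / lp(Y) + cp(X; Y). A small Python rendering of that formula (the alpha = beta = 0.2 defaults below are illustrative, not the paper's tuned values):

```python
import math

def beam_score(log_prob, length, attn, alpha=0.2, beta=0.2):
    """s(Y, X) = log P(Y|X) / lp(Y) + cp(X; Y), with
    lp(Y) = (5 + |Y|)^alpha / (5 + 1)^alpha and
    cp(X; Y) = beta * sum_i log(min(sum_j p_ij, 1.0))."""
    lp = ((5.0 + length) ** alpha) / ((5.0 + 1.0) ** alpha)
    # attn[i][j]: attention weight of target step j on source word i;
    # the penalty rewards hypotheses whose attention covers every source word.
    cp = beta * sum(math.log(min(sum(row), 1.0)) for row in attn)
    return log_prob / lp + cp

# Two source words, two decoding steps; hypothesis log-probability -3.2.
print(beam_score(-3.2, 2, [[0.6, 0.3], [0.4, 0.7]]))
```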
A Compare-Aggregate Model for Matching Text Sequences
Title | A Compare-Aggregate Model for Matching Text Sequences |
Authors | Shuohang Wang, Jing Jiang |
Abstract | Many NLP tasks including machine comprehension, answer selection and text entailment require the comparison between sequences. Matching the important units between sequences is key to solving these problems. In this paper, we present a general “compare-aggregate” framework that performs word-level matching followed by aggregation using Convolutional Neural Networks. We particularly focus on the different comparison functions we can use to match two vectors. We use four different datasets to evaluate the model. We find that some simple comparison functions based on element-wise operations can work better than standard neural network and neural tensor network models. |
Tasks | Answer Selection, Reading Comprehension |
Published | 2016-11-06 |
URL | http://arxiv.org/abs/1611.01747v1 |
http://arxiv.org/pdf/1611.01747v1.pdf | |
PWC | https://paperswithcode.com/paper/a-compare-aggregate-model-for-matching-text |
Repo | https://github.com/shuohangwang/SeqMatchSeq |
Framework | torch |
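The comparison functions the abstract highlights are simple element-wise operations. A minimal sketch of three of the studied variants (SUB, MULT, SUBMULT+NN); dimensions and weight shapes are illustrative:

```python
import numpy as np

def compare_sub(a, h):
    """SUB: element-wise squared difference of attended and word vectors."""
    return (a - h) * (a - h)

def compare_mult(a, h):
    """MULT: element-wise product."""
    return a * h

def compare_submult_nn(a, h, W, b):
    """SUBMULT+NN: both element-wise comparisons fed through one ReLU layer."""
    t = np.concatenate([compare_sub(a, h), compare_mult(a, h)])
    return np.maximum(W @ t + b, 0.0)

rng = np.random.default_rng(0)
a, h = rng.normal(size=50), rng.normal(size=50)
W, b = rng.normal(size=(50, 100)) * 0.1, np.zeros(50)
print(compare_submult_nn(a, h, W, b).shape)  # (50,): one comparison vector,
# which a CNN would then aggregate across all positions of the sequence
```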
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
Title | Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues |
Authors | Bryan A. Plummer, Arun Mallya, Christopher M. Cervantes, Julia Hockenmaier, Svetlana Lazebnik |
Abstract | This paper presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues. We model the appearance, size, and position of entity bounding boxes, adjectives that contain attribute information, and spatial relationships between pairs of entities connected by verbs or prepositions. Special attention is given to relationships between people and clothing or body part mentions, as they are useful for distinguishing individuals. We automatically learn weights for combining these cues and, at test time, perform joint inference over all phrases in a caption. The resulting system produces state-of-the-art performance on phrase localization on the Flickr30k Entities dataset and visual relationship detection on the Stanford VRD dataset. |
Tasks | |
Published | 2016-11-21 |
URL | http://arxiv.org/abs/1611.06641v4 |
http://arxiv.org/pdf/1611.06641v4.pdf | |
PWC | https://paperswithcode.com/paper/phrase-localization-and-visual-relationship |
Repo | https://github.com/BryanPlummer/pl-clc |
Framework | none |
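The cue-combination step reduces, for each (phrase, box) pair, to a weighted sum of per-cue scores. A toy sketch with three invented cues and fixed weights (the paper learns the weights and then runs joint inference across all phrases):

```python
import numpy as np

def phrase_box_score(cue_scores, weights):
    """Weighted combination of single-phrase cue scores for one candidate box."""
    return float(np.dot(weights, cue_scores))

# Toy: choose among three candidate boxes for one phrase using three cues.
cues = np.array([[0.8, 0.1, 0.3],    # appearance score per box
                 [0.5, 0.9, 0.2],    # size
                 [0.4, 0.3, 0.7]])   # position
weights = np.array([0.6, 0.25, 0.15])  # learned in the paper, fixed here
scores = [phrase_box_score(cues[:, b], weights) for b in range(cues.shape[1])]
print(int(np.argmax(scores)))        # box 0 wins for this phrase
```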
Review Networks for Caption Generation
Title | Review Networks for Caption Generation |
Authors | Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen |
Abstract | We propose a novel extension of the encoder-decoder framework, called a review network. The review network is generic and can enhance any existing encoder-decoder model: in this paper, we consider RNN decoders with both CNN and RNN encoders. The review network performs a number of review steps with an attention mechanism on the encoder hidden states, and outputs a thought vector after each review step; the thought vectors are used as the input of the attention mechanism in the decoder. We show that conventional encoder-decoders are a special case of our framework. Empirically, we show that our framework improves over state-of-the-art encoder-decoder systems on the tasks of image captioning and source code captioning. |
Tasks | Image Captioning |
Published | 2016-05-25 |
URL | http://arxiv.org/abs/1605.07912v4 |
http://arxiv.org/pdf/1605.07912v4.pdf | |
PWC | https://paperswithcode.com/paper/review-networks-for-caption-generation |
Repo | https://github.com/kimiyoung/review_net |
Framework | none |
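The reviewer loop is easy to sketch: each review step attends over the encoder states and emits a thought vector, and the stack of thought vectors replaces the raw encoder states as the decoder's attention memory. The dot-product attention and query choice below are one simple option among several the paper explores; all names are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def review(H, f_prev, W_q, n_steps=3):
    """Produce thought vectors by repeatedly attending over encoder states H;
    each step's query is the previous step's output."""
    thoughts = []
    for _ in range(n_steps):
        scores = H @ (W_q @ f_prev)   # dot-product attention over states
        f_prev = softmax(scores) @ H  # attended summary = thought vector
        thoughts.append(f_prev)
    return np.stack(thoughts)         # fed to the decoder's attention

rng = np.random.default_rng(0)
H = rng.normal(size=(7, 32))          # 7 encoder states, 32-d
print(review(H, rng.normal(size=32), rng.normal(size=(32, 32)) * 0.1).shape)
```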
Stance Detection with Bidirectional Conditional Encoding
Title | Stance Detection with Bidirectional Conditional Encoding |
Authors | Isabelle Augenstein, Tim Rocktäschel, Andreas Vlachos, Kalina Bontcheva |
Abstract | Stance detection is the task of classifying the attitude expressed in a text towards a target such as Hillary Clinton to be “positive”, “negative” or “neutral”. Previous work has assumed that either the target is mentioned in the text or that training data for every target is given. This paper considers the more challenging version of this task, where targets are not always mentioned and no training data is available for the test targets. We experiment with conditional LSTM encoding, which builds a representation of the tweet that is dependent on the target, and demonstrate that it outperforms encoding the tweet and the target independently. Performance is improved further when the conditional model is augmented with bidirectional encoding. We evaluate our approach on the SemEval 2016 Task 6 Twitter Stance Detection corpus achieving performance second best only to a system trained on semi-automatically labelled tweets for the test target. When such weak supervision is added, our approach achieves state-of-the-art results. |
Tasks | Stance Detection |
Published | 2016-06-17 |
URL | http://arxiv.org/abs/1606.05464v2 |
http://arxiv.org/pdf/1606.05464v2.pdf | |
PWC | https://paperswithcode.com/paper/stance-detection-with-bidirectional |
Repo | https://github.com/sheffieldnlp/stance-conditional |
Framework | tf |
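Conditional encoding has a simple shape: run an LSTM over the target, then start the tweet LSTM from the target encoder's final state so the tweet representation depends on the target. A minimal PyTorch sketch of the unidirectional variant (passing the full (h, c) state is a simplification of the paper's state handling, and dimensions are illustrative):

```python
import torch
import torch.nn as nn

class ConditionalEncoder(nn.Module):
    """Encode the target first; initialize the tweet LSTM with the target
    encoder's final state (the paper also adds a bidirectional version)."""
    def __init__(self, emb_dim=64, hidden=64):
        super().__init__()
        self.target_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.tweet_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def forward(self, target_emb, tweet_emb):
        _, state = self.target_lstm(target_emb)     # (h_n, c_n) of target
        out, _ = self.tweet_lstm(tweet_emb, state)  # condition tweet on target
        return out[:, -1]                           # final tweet state

enc = ConditionalEncoder()
rep = enc(torch.randn(2, 4, 64), torch.randn(2, 20, 64))  # batch of 2
print(rep.shape)  # torch.Size([2, 64])
```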
Italy goes to Stanford: a collection of CoreNLP modules for Italian
Title | Italy goes to Stanford: a collection of CoreNLP modules for Italian |
Authors | Alessio Palmero Aprosio, Giovanni Moretti |
Abstract | In this paper we present Tint, an easy-to-use set of fast, accurate and extendable Natural Language Processing modules for Italian. It is based on Stanford CoreNLP and is freely available as standalone software or as a library that can be integrated into an existing project. |
Tasks | |
Published | 2016-09-20 |
URL | http://arxiv.org/abs/1609.06204v2 |
http://arxiv.org/pdf/1609.06204v2.pdf | |
PWC | https://paperswithcode.com/paper/italy-goes-to-stanford-a-collection-of |
Repo | https://github.com/musixmatchresearch/umberto |
Framework | pytorch |
DeepCoder: Learning to Write Programs
Title | DeepCoder: Learning to Write Programs |
Authors | Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, Daniel Tarlow |
Abstract | We develop a first line of attack for solving programming competition-style problems from input-output examples using deep learning. The approach is to train a neural network to predict properties of the program that generated the outputs from the inputs. We use the neural network’s predictions to augment search techniques from the programming languages community, including enumerative search and an SMT-based solver. Empirically, we show that our approach leads to an order of magnitude speedup over the strong non-augmented baselines and a Recurrent Neural Network approach, and that we are able to solve problems of difficulty comparable to the simplest problems on programming competition websites. |
Tasks | Program Synthesis |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.01989v2 |
http://arxiv.org/pdf/1611.01989v2.pdf | |
PWC | https://paperswithcode.com/paper/deepcoder-learning-to-write-programs |
Repo | https://github.com/lcary/io-generation |
Framework | none |
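The core mechanic is that predicted primitive probabilities reorder an enumerative search, so programs built from likely primitives are tried first. A toy illustration: the micro-DSL and the probability table below are invented for the example, and DeepCoder's actual DSL, neural predictor and SMT/enumerative backends are far richer:

```python
from itertools import product

# Hypothetical micro-DSL: programs are short pipelines of these primitives.
DSL = {
    "sort":    sorted,
    "reverse": lambda xs: xs[::-1],
    "drop1":   lambda xs: xs[1:],
    "neg":     lambda xs: [-x for x in xs],
}

def guided_search(examples, primitive_probs, max_len=3):
    """Enumerate pipelines, trying primitives the (here: fake) neural model
    considers likely first -- the source of DeepCoder's speedup."""
    order = sorted(DSL, key=lambda p: -primitive_probs[p])
    for length in range(1, max_len + 1):
        for prog in product(order, repeat=length):
            def run(xs, prog=prog):
                for p in prog:
                    xs = DSL[p](xs)
                return list(xs)
            if all(run(i) == o for i, o in examples):
                return prog
    return None

# I/O examples for "reverse, then drop the first element".
examples = [([1, 2, 3], [2, 1]), ([5, 6], [5])]
probs = {"reverse": 0.9, "drop1": 0.8, "sort": 0.2, "neg": 0.1}
print(guided_search(examples, probs))  # ('reverse', 'drop1')
```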
A Kronecker-factored approximate Fisher matrix for convolution layers
Title | A Kronecker-factored approximate Fisher matrix for convolution layers |
Authors | Roger Grosse, James Martens |
Abstract | Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present Kronecker Factors for Convolution (KFC), a tractable approximation to the Fisher matrix for convolutional networks based on a structured probabilistic model for the distribution over backpropagated derivatives. Similarly to the recently proposed Kronecker-Factored Approximate Curvature (K-FAC), each block of the approximate Fisher matrix decomposes as the Kronecker product of small matrices, allowing for efficient inversion. KFC captures important curvature information while still yielding comparably efficient updates to stochastic gradient descent (SGD). We show that the updates are invariant to commonly used reparameterizations, such as centering of the activations. In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD. Furthermore, it was able to train the networks in 10-20 times fewer iterations than SGD, suggesting its potential applicability in a distributed setting. |
Tasks | |
Published | 2016-02-03 |
URL | http://arxiv.org/abs/1602.01407v2 |
http://arxiv.org/pdf/1602.01407v2.pdf | |
PWC | https://paperswithcode.com/paper/a-kronecker-factored-approximate-fisher |
Repo | https://github.com/Thrandis/EKFAC-pytorch |
Framework | pytorch |
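The practical payoff of the Kronecker factorization is that applying the inverse Fisher block A ⊗ G reduces to two small linear solves, since (A ⊗ G)^{-1} vec(∇W) corresponds to G^{-1} ∇W A^{-1}. A numpy sketch with damping; the statistics and shapes are illustrative, not the paper's estimators:

```python
import numpy as np

def kfc_update(grad_W, A, G, damping=1e-3):
    """Approximate natural-gradient step: two small solves stand in for
    inverting one huge Fisher block."""
    A_d = A + damping * np.eye(A.shape[0])  # input-activation second moments
    G_d = G + damping * np.eye(G.shape[0])  # backpropagated-derivative moments
    return np.linalg.solve(G_d, np.linalg.solve(A_d.T, grad_W.T).T)

rng = np.random.default_rng(0)
acts, grads = rng.normal(size=(100, 8)), rng.normal(size=(100, 5))
A = acts.T @ acts / 100        # 8x8 Kronecker factor
G = grads.T @ grads / 100      # 5x5 Kronecker factor
grad_W = grads.T @ acts / 100  # 5x8 layer gradient
print(kfc_update(grad_W, A, G).shape)  # (5, 8) preconditioned update
```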
Stacked Generative Adversarial Networks
Title | Stacked Generative Adversarial Networks |
Authors | Xun Huang, Yixuan Li, Omid Poursaeed, John Hopcroft, Serge Belongie |
Abstract | In this paper, we propose a novel generative model named Stacked Generative Adversarial Networks (SGAN), which is trained to invert the hierarchical representations of a bottom-up discriminative network. Our model consists of a top-down stack of GANs, each learned to generate lower-level representations conditioned on higher-level representations. A representation discriminator is introduced at each feature hierarchy to encourage the representation manifold of the generator to align with that of the bottom-up discriminative network, leveraging the powerful discriminative representations to guide the generative model. In addition, we introduce a conditional loss that encourages the use of conditional information from the layer above, and a novel entropy loss that maximizes a variational lower bound on the conditional entropy of generator outputs. We first train each stack independently, and then train the whole model end-to-end. Unlike the original GAN that uses a single noise vector to represent all the variations, our SGAN decomposes variations into multiple levels and gradually resolves uncertainties in the top-down generative process. Based on visual inspection, Inception scores and visual Turing test, we demonstrate that SGAN is able to generate images of much higher quality than GANs without stacking. |
Tasks | Conditional Image Generation |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04357v4 |
http://arxiv.org/pdf/1612.04357v4.pdf | |
PWC | https://paperswithcode.com/paper/stacked-generative-adversarial-networks |
Repo | https://github.com/xunhuang1995/SGAN |
Framework | none |
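Each stack trains against three losses: an adversarial loss from the representation discriminator, a conditional loss that ties the generated representation back to the one above it, and an entropy loss realized through an auxiliary posterior network. The sketch below uses made-up linear networks and an L2 surrogate for the entropy bound; the paper's architectures and exact loss forms differ:

```python
import torch
import torch.nn as nn

# One SGAN stack (dimensions and nets are illustrative, not the paper's).
G = nn.Sequential(nn.Linear(16 + 8, 32), nn.ReLU(), nn.Linear(32, 16))  # generator
D = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))       # rep. discriminator
E = nn.Linear(16, 16)  # stands in for the pretrained encoder layer above
Q = nn.Linear(16, 8)   # posterior net used by the entropy loss

h_above = torch.randn(4, 16)               # conditioning representation
z = torch.randn(4, 8)                      # per-stack noise
h_hat = G(torch.cat([h_above, z], dim=1))  # generated representation

adv_loss = nn.functional.binary_cross_entropy_with_logits(
    D(h_hat), torch.ones(4, 1))                 # fool the discriminator
cond_loss = (E(h_hat) - h_above).pow(2).mean()  # use the conditioning signal
ent_loss = (Q(h_hat) - z).pow(2).mean()         # entropy-bound surrogate
total = adv_loss + cond_loss + ent_loss
print(total.item())
```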
Inference Compilation and Universal Probabilistic Programming
Title | Inference Compilation and Universal Probabilistic Programming |
Authors | Tuan Anh Le, Atilim Gunes Baydin, Frank Wood |
Abstract | We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do “compilation of inference” because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference. |
Tasks | Probabilistic Programming |
Published | 2016-10-31 |
URL | http://arxiv.org/abs/1610.09900v2 |
http://arxiv.org/pdf/1610.09900v2.pdf | |
PWC | https://paperswithcode.com/paper/inference-compilation-and-universal |
Repo | https://github.com/pyprob/pyprob |
Framework | pytorch |
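The compiled network ultimately serves as an importance-sampling proposal: draw latents from it, weight by prior times likelihood over proposal density. A toy conjugate-Gaussian example, with a hand-made closed-form proposal standing in for the trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def importance_posterior_mean(y, proposal, n=5000):
    """Importance sampling for p(x | y) in the toy model
    x ~ N(0, 1), y | x ~ N(x, 0.5)."""
    mu_q, sigma_q = proposal(y)
    x = mu_q + sigma_q * rng.standard_normal(n)
    log_w = (log_normal(x, 0.0, 1.0)          # prior
             + log_normal(y, x, 0.5)          # likelihood
             - log_normal(x, mu_q, sigma_q))  # proposal density
    w = np.exp(log_w - log_w.max())
    return (w * x).sum() / w.sum()

# A "compiled" proposal would be a neural network mapping y to (mu, sigma);
# here a closed-form stand-in. True posterior mean is 0.8 * y = 0.96.
print(importance_posterior_mean(1.2, lambda y: (0.8 * y, 0.5)))
```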
Incremental Robot Learning of New Objects with Fixed Update Time
Title | Incremental Robot Learning of New Objects with Fixed Update Time |
Authors | Raffaello Camoriano, Giulia Pasquale, Carlo Ciliberto, Lorenzo Natale, Lorenzo Rosasco, Giorgio Metta |
Abstract | We consider object recognition in the context of lifelong learning, where a robotic agent learns to discriminate between a growing number of object classes as it accumulates experience about the environment. We propose an incremental variant of the Regularized Least Squares for Classification (RLSC) algorithm, and exploit its structure to seamlessly add new classes to the learned model. The presented algorithm addresses the problem of having an unbalanced proportion of training examples per class, which occurs when new objects are presented to the system for the first time. We evaluate our algorithm on both a machine learning benchmark dataset and two challenging object recognition tasks in a robotic setting. Empirical evidence shows that our approach achieves comparable or higher classification performance than its batch counterpart when classes are unbalanced, while being significantly faster. |
Tasks | Active Learning, Object Recognition |
Published | 2016-05-17 |
URL | http://arxiv.org/abs/1605.05045v3 |
http://arxiv.org/pdf/1605.05045v3.pdf | |
PWC | https://paperswithcode.com/paper/incremental-robot-learning-of-new-objects |
Repo | https://github.com/LCSL/incremental_multiclass_RLSC |
Framework | none |
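The incremental structure is easy to sketch: keep the sufficient statistics A = XᵀX and B = XᵀY so each new example is a rank-one update, independent of how many examples came before. This is a minimal sketch under that assumption; the paper achieves true fixed update time with recursive factor updates and also adds the class-rebalancing scheme, both omitted here:

```python
import numpy as np

class IncrementalRLSC:
    """Incremental regularized least squares for classification over
    one-hot targets; new classes just mean new columns of B."""
    def __init__(self, d, n_classes, lam=1e-2):
        self.A = lam * np.eye(d)
        self.B = np.zeros((d, n_classes))

    def partial_fit(self, x, y):
        self.A += np.outer(x, x)  # rank-one covariance update
        self.B[:, y] += x

    def predict(self, x):
        W = np.linalg.solve(self.A, self.B)  # solve A W = B
        return int(np.argmax(x @ W))

rng = np.random.default_rng(0)
clf = IncrementalRLSC(d=2, n_classes=2)
for _ in range(200):
    y = int(rng.integers(2))
    clf.partial_fit(rng.normal(size=2) + 2 * y, y)  # class means at 0 and 2
print(clf.predict(np.array([2.1, 1.9])))            # expect class 1
```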