Paper Group AWR 89
Self Paced Deep Learning for Weakly Supervised Object Detection. Variational Inference: A Review for Statisticians. An empirical study on the effects of different types of noise in image classification tasks. Revisiting Distributed Synchronous SGD. RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks. Joint Unsupervised Learning of Deep Representations and Image Clusters. Rationale-Augmented Convolutional Neural Networks for Text Classification. Ex Machina: Personal Attacks Seen at Scale. Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation. Joint Multimodal Learning with Deep Generative Models. Revisiting Semi-Supervised Learning with Graph Embeddings. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. The Event-Camera Dataset and Simulator: Event-based Data for Pose Estimation, Visual Odometry, and SLAM. A Network-based End-to-End Trainable Task-oriented Dialogue System. A Hierarchical Approach for Generating Descriptive Image Paragraphs.
Self Paced Deep Learning for Weakly Supervised Object Detection
Title | Self Paced Deep Learning for Weakly Supervised Object Detection |
Authors | Enver Sangineto, Moin Nabi, Dubravko Culibrk, Nicu Sebe |
Abstract | In a weakly-supervised scenario, object detectors need to be trained using image-level annotation alone. Since bounding-box-level ground truth is not available, most of the solutions proposed so far are based on an iterative, Multiple Instance Learning framework in which the current classifier is used to select the highest-confidence boxes in each image, which are treated as pseudo-ground truth in the next training iteration. However, the errors of an immature classifier can make the process drift, usually introducing many false positives into the training dataset. To alleviate this problem, we propose in this paper a training protocol based on the self-paced learning paradigm. The main idea is to iteratively select a subset of images and boxes that are the most reliable, and use them for training. While in the past few years similar strategies have been adopted for SVMs and other classifiers, we are the first to show that a self-paced approach can be used with deep-network-based classifiers in an end-to-end training pipeline. The method we propose is built on the fully-supervised Fast-RCNN architecture and can be applied to similar architectures which represent the input image as a bag of boxes. We show state-of-the-art results on Pascal VOC 2007, Pascal VOC 2010 and ILSVRC 2013. On ILSVRC 2013 our results based on a low-capacity AlexNet network outperform even those weakly-supervised approaches which are based on much higher-capacity networks. |
Tasks | Multiple Instance Learning, Object Detection, Weakly Supervised Object Detection |
Published | 2016-05-24 |
URL | http://arxiv.org/abs/1605.07651v3 |
PDF | http://arxiv.org/pdf/1605.07651v3.pdf |
PWC | https://paperswithcode.com/paper/self-paced-deep-learning-for-weakly |
Repo | https://github.com/moinnabi/SelfPacedDeepLearning |
Framework | none |
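The self-paced training loop at the heart of the paper is easy to sketch. The snippet below is a minimal illustration of the idea, not the authors' Fast-RCNN pipeline: each round keeps only the most confidently scored image/box pairs as pseudo-ground truth and grows the trusted fraction over time. `score_boxes` and `train` are hypothetical stand-ins for the detector's scoring and update steps.

```python
import numpy as np

def self_paced_rounds(images, score_boxes, train, n_rounds=5, start_frac=0.2):
    """Toy self-paced loop: grow the trusted subset each round.

    images      : list of image ids
    score_boxes : fn(image) -> (best_box, confidence) under the current model
    train       : fn(list of (image, box) pairs) -> updates the model
    """
    for r in range(n_rounds):
        scored = [(img,) + score_boxes(img) for img in images]  # (img, box, conf)
        scored.sort(key=lambda t: t[2], reverse=True)
        # The "pace": start with the most confident fraction, end with everything.
        frac = start_frac + (1.0 - start_frac) * r / max(n_rounds - 1, 1)
        keep = scored[: max(1, int(frac * len(scored)))]
        train([(img, box) for img, box, _ in keep])  # boxes act as pseudo-ground truth
```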
Variational Inference: A Review for Statisticians
Title | Variational Inference: A Review for Statisticians |
Authors | David M. Blei, Alp Kucukelbir, Jon D. McAuliffe |
Abstract | One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms. |
Tasks | Stochastic Optimization |
Published | 2016-01-04 |
URL | http://arxiv.org/abs/1601.00670v9 |
PDF | http://arxiv.org/pdf/1601.00670v9.pdf |
PWC | https://paperswithcode.com/paper/variational-inference-a-review-for |
Repo | https://github.com/taolicheng/Deep-Learning-Learning-Path |
Framework | tf |
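As a worked toy example of the optimization the review describes, the sketch below estimates the ELBO, E_q[log p(x, z) − log q(z)], by Monte Carlo for a Gaussian variational family on a conjugate normal-mean model, and picks the family member that maximizes it. Since KL(q ‖ p(z|x)) = log p(x) − ELBO(q), maximizing the ELBO minimizes the KL divergence to the posterior. The grid search is purely illustrative; the paper's coordinate-ascent and stochastic variants optimize the same objective.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=50)          # data: x_i ~ N(z, 1), with prior z ~ N(0, 10^2)

def elbo(mu_q, sigma_q, n_samples=5000):
    """Monte Carlo ELBO = E_q[log p(x, z) - log q(z)] for q(z) = N(mu_q, sigma_q^2)."""
    z = rng.normal(mu_q, sigma_q, size=n_samples)
    log_joint = norm.logpdf(z, 0.0, 10.0)                     # log prior, per sample
    log_joint += norm.logpdf(x[:, None], z, 1.0).sum(axis=0)  # log likelihood, per sample
    return np.mean(log_joint - norm.logpdf(z, mu_q, sigma_q))

# Crude search over the variational family; the maximizer is the member
# closest to the true posterior in KL divergence.
grid = [(m, s) for m in np.linspace(0, 4, 21) for s in (0.05, 0.1, 0.2, 0.5)]
print(max(grid, key=lambda p: elbo(*p)))
```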
An empirical study on the effects of different types of noise in image classification tasks
Title | An empirical study on the effects of different types of noise in image classification tasks |
Authors | Gabriel B. Paranhos da Costa, Welinton A. Contato, Tiago S. Nazare, João E. S. Batista Neto, Moacir Ponti |
Abstract | Image classification is one of the main research problems in computer vision and machine learning. Since in most real-world image classification applications there is no control over how the images are captured, it is necessary to consider the possibility that these images might be affected by noise (e.g. sensor noise in a low-quality surveillance camera). In this paper we analyse the impact of three different types of noise on descriptors extracted by two widely used feature extraction methods (LBP and HOG) and how denoising the images can help to mitigate this problem. We carry out experiments on two different datasets and consider several types of noise, noise levels, and denoising methods. Our results show that noise can hinder classification performance considerably and make classes harder to separate. Although denoising methods were not able to reach the same performance of the noise-free scenario, they improved classification results for noisy data. |
Tasks | Denoising, Image Classification |
Published | 2016-09-09 |
URL | http://arxiv.org/abs/1609.02781v1 |
PDF | http://arxiv.org/pdf/1609.02781v1.pdf |
PWC | https://paperswithcode.com/paper/an-empirical-study-on-the-effects-of |
Repo | https://github.com/gbpcosta/wvc_2016_noise |
Framework | none |
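For concreteness, here is how two common noise types in such experiments are typically synthesized on images scaled to [0, 1]; the exact noise models and levels used in the paper may differ.

```python
import numpy as np

def add_gaussian_noise(img, sigma=0.1):
    """Additive Gaussian noise (e.g. a crude model of sensor noise)."""
    noisy = img + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_and_pepper(img, amount=0.05):
    """Flip a fraction `amount` of pixels to 0 (pepper) or 1 (salt)."""
    noisy = img.copy()
    mask = np.random.random(img.shape)
    noisy[mask < amount / 2] = 0.0
    noisy[mask > 1 - amount / 2] = 1.0
    return noisy
```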
Revisiting Distributed Synchronous SGD
Title | Revisiting Distributed Synchronous SGD |
Authors | Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, Rafal Jozefowicz |
Abstract | Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony. In contrast, the synchronous approach is often thought to be impractical due to idle time wasted on waiting for straggling workers. We revisit these conventional beliefs in this paper, and examine the weaknesses of both approaches. We demonstrate that a third approach, synchronous optimization with backup workers, can avoid asynchronous noise while mitigating the effect of the worst stragglers. Our approach is empirically validated and shown to converge faster and to reach better test accuracies. |
Tasks | Stochastic Optimization |
Published | 2016-04-04 |
URL | http://arxiv.org/abs/1604.00981v3 |
PDF | http://arxiv.org/pdf/1604.00981v3.pdf |
PWC | https://paperswithcode.com/paper/revisiting-distributed-synchronous-sgd |
Repo | https://github.com/tensorflow/models/tree/master/research/inception |
Framework | tf |
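The backup-workers idea reduces to a simple rule: launch N + b workers per step, aggregate the first N gradients that arrive, and drop the stragglers. A toy simulation of one such step (all names illustrative):

```python
import numpy as np

def sync_step_with_backups(grad_fns, n_needed, delays):
    """One synchronous step with backup workers.

    grad_fns : one gradient function per worker (n_needed + n_backup of them)
    delays   : simulated completion time of each worker
    Averages the gradients of the first n_needed finishers; stragglers are dropped.
    """
    order = np.argsort(delays)          # workers sorted by (simulated) finish time
    first = order[:n_needed]
    grads = [grad_fns[i]() for i in first]
    return np.mean(grads, axis=0)

# Toy usage: 4 gradients needed, 2 backup workers, random straggler delays.
rng = np.random.default_rng(1)
grad_fns = [lambda i=i: np.full(3, float(i)) for i in range(6)]
g = sync_step_with_backups(grad_fns, n_needed=4, delays=rng.exponential(1.0, 6))
```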
RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks
Title | RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks |
Authors | Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilya Kulikov, Ralf Schlüter, Hermann Ney |
Abstract | In this work we release our extensible and easily configurable neural network training software. It provides a rich set of functional layers with a particular focus on efficient training of recurrent neural network topologies on multiple GPUs. The source of the software package is public and freely available for academic research purposes, and it can be used as a framework or as a standalone tool with flexible configuration. The software makes it possible to train state-of-the-art deep bidirectional long short-term memory (LSTM) models on both one-dimensional data, like speech, and two-dimensional data, like handwritten text, and it was used to develop successful submission systems in several evaluation campaigns. |
Tasks | |
Published | 2016-08-02 |
URL | http://arxiv.org/abs/1608.00895v2 |
PDF | http://arxiv.org/pdf/1608.00895v2.pdf |
PWC | https://paperswithcode.com/paper/returnn-the-rwth-extensible-training |
Repo | https://github.com/rwth-i6/returnn |
Framework | tf |
Joint Unsupervised Learning of Deep Representations and Image Clusters
Title | Joint Unsupervised Learning of Deep Representations and Image Clusters |
Authors | Jianwei Yang, Devi Parikh, Dhruv Batra |
Abstract | In this paper, we propose a recurrent framework for Joint Unsupervised LEarning (JULE) of deep representations and image clusters. In our framework, successive operations in a clustering algorithm are expressed as steps in a recurrent process, stacked on top of representations output by a Convolutional Neural Network (CNN). During training, image clusters and representations are updated jointly: image clustering is conducted in the forward pass, while representation learning in the backward pass. Our key idea behind this framework is that good representations are beneficial to image clustering and clustering results provide supervisory signals to representation learning. By integrating two processes into a single model with a unified weighted triplet loss and optimizing it end-to-end, we can obtain not only more powerful representations, but also more precise image clusters. Extensive experiments show that our method outperforms the state-of-the-art on image clustering across a variety of image datasets. Moreover, the learned representations generalize well when transferred to other tasks. |
Tasks | Image Clustering, Representation Learning |
Published | 2016-04-13 |
URL | http://arxiv.org/abs/1604.03628v3 |
PDF | http://arxiv.org/pdf/1604.03628v3.pdf |
PWC | https://paperswithcode.com/paper/joint-unsupervised-learning-of-deep |
Repo | https://github.com/jwyang/JULE-Torch |
Framework | torch |
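One alternation of the JULE idea can be sketched as follows. This is a simplification: the paper unrolls agglomerative merges as a recurrent process and trains end-to-end with a unified weighted triplet loss, whereas here `embed` and `update_embed` are hypothetical stand-ins for the CNN and its training step.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def jule_like_round(features, embed, update_embed, n_clusters):
    """One simplified alternation: cluster in the 'forward pass',
    then use the assignments as supervision for representation learning."""
    z = embed(features)                                   # current deep representations
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(z)
    # Sample (anchor, positive, negative) triplets from the cluster assignments.
    triplets = []
    for c in np.unique(labels):
        idx_in = np.flatnonzero(labels == c)
        idx_out = np.flatnonzero(labels != c)
        if len(idx_in) >= 2 and len(idx_out) >= 1:
            a, p = np.random.choice(idx_in, 2, replace=False)
            n = np.random.choice(idx_out)
            triplets.append((a, p, n))
    update_embed(triplets)                                # representation-learning step
    return labels
```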
Rationale-Augmented Convolutional Neural Networks for Text Classification
Title | Rationale-Augmented Convolutional Neural Networks for Text Classification |
Authors | Ye Zhang, Iain Marshall, Byron C. Wallace |
Abstract | We present a new Convolutional Neural Network (CNN) model for text classification that jointly exploits labels on documents and their component sentences. Specifically, we consider scenarios in which annotators explicitly mark sentences (or snippets) that support their overall document categorization, i.e., they provide rationales. Our model exploits such supervision via a hierarchical approach in which each document is represented by a linear combination of the vector representations of its component sentences. We propose a sentence-level convolutional model that estimates the probability that a given sentence is a rationale, and we then scale the contribution of each sentence to the aggregate document representation in proportion to these estimates. Experiments on five classification datasets that have document labels and associated rationales demonstrate that our approach consistently outperforms strong baselines. Moreover, our model naturally provides explanations for its predictions. |
Tasks | Text Classification |
Published | 2016-05-14 |
URL | http://arxiv.org/abs/1605.04469v3 |
PDF | http://arxiv.org/pdf/1605.04469v3.pdf |
PWC | https://paperswithcode.com/paper/rationale-augmented-convolutional-neural |
Repo | https://github.com/yezhang-xiaofan/Rationale-CNN |
Framework | none |
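The core aggregation step is compact: the document vector is a rationale-probability-weighted combination of its sentence vectors. A minimal sketch (the normalization here is illustrative, not necessarily the paper's exact choice):

```python
import numpy as np

def doc_vector(sentence_vecs, p_rationale):
    """Document = rationale-weighted sum of its sentence vectors.

    sentence_vecs : (n_sentences, d) array from a sentence-level CNN
    p_rationale   : (n_sentences,) estimated probability each sentence is a rationale
    """
    w = np.asarray(p_rationale)
    return (w[:, None] * sentence_vecs).sum(axis=0) / (w.sum() + 1e-8)
```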
Ex Machina: Personal Attacks Seen at Scale
Title | Ex Machina: Personal Attacks Seen at Scale |
Authors | Ellery Wulczyn, Nithum Thain, Lucas Dixon |
Abstract | The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon. However, understanding the prevalence and impact of personal attacks in online platforms at scale remains surprisingly difficult. The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. We present an evaluation method that measures how many aggregated crowd-workers a classifier can approximate. We apply our methodology to English Wikipedia, generating a corpus of over 100k high-quality human-labeled comments and 63M machine-labeled ones from a classifier that is as good as the aggregate of 3 crowd-workers, as measured by the area under the ROC curve and Spearman correlation. Using this corpus of machine-labeled scores, our methodology allows us to explore some of the open questions about the nature of online personal attacks. This reveals that the majority of personal attacks on Wikipedia are not the result of a few malicious users, nor primarily the consequence of allowing anonymous contributions from unregistered users. |
Tasks | |
Published | 2016-10-27 |
URL | http://arxiv.org/abs/1610.08914v2 |
PDF | http://arxiv.org/pdf/1610.08914v2.pdf |
PWC | https://paperswithcode.com/paper/ex-machina-personal-attacks-seen-at-scale |
Repo | https://github.com/canonical-debate-lab/paper |
Framework | none |
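The evaluation idea, comparing a classifier against aggregates of k crowd-workers on the two reported metrics, can be sketched as below. Note that the paper's actual protocol holds annotators out when forming the target aggregate; this toy version skips that and is only meant to show the metrics involved.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

def vs_aggregated_crowd(model_scores, crowd_labels, k):
    """Compare a classifier against the aggregate of k crowd-workers.

    crowd_labels : (n_comments, n_annotators) 0/1 attack judgements
    Returns ROC AUC and Spearman correlation for the model and for a
    k-worker aggregate, both against the full aggregate score.
    """
    target = crowd_labels.mean(axis=1)       # full aggregate score per comment
    sub = crowd_labels[:, :k].mean(axis=1)   # aggregate of k workers (no hold-out here)
    return {
        "model_auc": roc_auc_score(target > 0.5, model_scores),
        "crowd_auc": roc_auc_score(target > 0.5, sub),
        "model_rho": spearmanr(target, model_scores).correlation,
        "crowd_rho": spearmanr(target, sub).correlation,
    }
```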
Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation
Title | Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation |
Authors | Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bowen Zhou, Yoshua Bengio, Aaron Courville |
Abstract | We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens. There are many ways to estimate or learn the high-level coarse tokens, but we argue that a simple extraction procedure is sufficient to capture a wealth of high-level discourse semantics. Such a procedure allows training the multiresolution recurrent neural network by maximizing the exact joint log-likelihood over both sequences. In contrast to the standard log-likelihood objective w.r.t. natural language tokens (word perplexity), optimizing the joint log-likelihood biases the model towards modeling high-level abstractions. We apply the proposed model to the task of dialogue response generation in two challenging domains: the Ubuntu technical support domain, and Twitter conversations. On Ubuntu, the model outperforms competing approaches by a substantial margin, achieving state-of-the-art results according to both automatic evaluation metrics and a human evaluation study. On Twitter, the model appears to generate more relevant and on-topic responses according to automatic evaluation metrics. Finally, our experiments demonstrate that the proposed model is more adept at overcoming the sparsity of natural language and is better able to capture long-term structure. |
Tasks | Dialogue Generation, Text Generation |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.00776v2 |
PDF | http://arxiv.org/pdf/1606.00776v2.pdf |
PWC | https://paperswithcode.com/paper/multiresolution-recurrent-neural-networks-an |
Repo | https://github.com/julianser/hed-dlg-truncated |
Framework | none |
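The objective the abstract describes is simply the exact joint log-likelihood over the two parallel sequences, log p(coarse) + log p(words | coarse), rather than word-level likelihood alone. In sketch form:

```python
import numpy as np

def joint_nll(logp_coarse, logp_words):
    """Joint negative log-likelihood of the two parallel sequences.

    logp_coarse : per-token log-probabilities of the coarse token sequence
    logp_words  : per-token log-probabilities of the natural-language sequence,
                  from a decoder conditioned on the coarse sequence
    Minimizing this maximizes log p(coarse) + log p(words | coarse), which
    biases the model towards the high-level abstractions, unlike word
    perplexity alone.
    """
    return -(np.sum(logp_coarse) + np.sum(logp_words))
```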
Joint Multimodal Learning with Deep Generative Models
Title | Joint Multimodal Learning with Deep Generative Models |
Authors | Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo |
Abstract | We investigate deep generative models that can exchange multiple modalities bi-directionally, e.g., generating images from corresponding texts and vice versa. Recently, some studies handle multiple modalities on deep generative models, such as variational autoencoders (VAEs). However, these models typically assume a fixed, one-way conditional relation between modalities, i.e., we can only generate modalities in one direction. To achieve our objective, we should extract a joint representation that captures high-level concepts among all modalities and through which we can exchange them bi-directionally. As described herein, we propose a joint multimodal variational autoencoder (JMVAE), in which all modalities are independently conditioned on a joint representation. In other words, it models a joint distribution of modalities. Furthermore, to be able to generate missing modalities from the remaining modalities properly, we develop an additional method, JMVAE-kl, that is trained by reducing the divergence between JMVAE's encoder and prepared networks of respective modalities. Our experiments show that our proposed method can obtain appropriate joint representations from multiple modalities and that it can generate and reconstruct them more faithfully than conventional VAEs. We further demonstrate that JMVAE can generate multiple modalities bi-directionally. |
Tasks | |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.01891v1 |
PDF | http://arxiv.org/pdf/1611.01891v1.pdf |
PWC | https://paperswithcode.com/paper/joint-multimodal-learning-with-deep |
Repo | https://github.com/masa-su/Tars |
Framework | pytorch |
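A sketch of the JMVAE-kl objective under stated assumptions: diagonal-Gaussian encoders, and the weighting `alpha` and the exact form of the extra terms are illustrative rather than taken from the paper.

```python
import numpy as np

def kl_diag_gaussians(mu0, logvar0, mu1, logvar1):
    """KL( N(mu0, var0) || N(mu1, var1) ) for diagonal Gaussians."""
    return 0.5 * np.sum(
        logvar1 - logvar0
        + (np.exp(logvar0) + (mu0 - mu1) ** 2) / np.exp(logvar1)
        - 1.0
    )

def jmvae_kl_loss(recon_x, recon_w, q_joint, q_x, q_w, alpha=0.1):
    """Illustrative JMVAE-kl loss for two modalities x and w.

    recon_x, recon_w : negative log-likelihoods of the two modalities,
                       decoded from z ~ q(z | x, w)
    q_joint, q_x, q_w: (mu, logvar) of the joint and unimodal encoders
    The extra terms pull the unimodal encoders q(z|x), q(z|w) towards the
    joint encoder, so a missing modality can be generated from either side.
    """
    prior = (np.zeros_like(q_joint[0]), np.zeros_like(q_joint[1]))
    loss = recon_x + recon_w + kl_diag_gaussians(*q_joint, *prior)
    loss += alpha * (kl_diag_gaussians(*q_joint, *q_x)
                     + kl_diag_gaussians(*q_joint, *q_w))
    return loss
```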
Revisiting Semi-Supervised Learning with Graph Embeddings
Title | Revisiting Semi-Supervised Learning with Graph Embeddings |
Authors | Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov |
Abstract | We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models. |
Tasks | Document Classification, Entity Extraction, Node Classification, Text Classification |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08861v2 |
PDF | http://arxiv.org/pdf/1603.08861v2.pdf |
PWC | https://paperswithcode.com/paper/revisiting-semi-supervised-learning-with |
Repo | https://github.com/tkipf/ica |
Framework | none |
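The joint objective reduces to two terms: a supervised loss on the class label and an unsupervised, skip-gram-like loss on sampled graph contexts. In sketch form (the relative weight `lam` is illustrative):

```python
import numpy as np

def joint_graph_embedding_loss(class_logp, context_logp, lam=1.0):
    """Sketch of the joint objective: predict the label and the graph context.

    class_logp   : log-probability assigned to each labeled instance's true class
    context_logp : log-probabilities of sampled (instance, neighbour) context
                   pairs, as in a skip-gram objective over the graph
    In the inductive variant the embeddings behind both terms are a
    parametric function of the input features, so unseen instances can
    still be classified.
    """
    return -(np.sum(class_logp) + lam * np.sum(context_logp))
```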
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
Title | Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering |
Authors | Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst |
Abstract | In this work, we are interested in generalizing convolutional neural networks (CNNs) from low-dimensional regular grids, where image, video and speech are represented, to high-dimensional irregular domains, such as social networks, brain connectomes or words’ embedding, represented by graphs. We present a formulation of CNNs in the context of spectral graph theory, which provides the necessary mathematical background and efficient numerical schemes to design fast localized convolutional filters on graphs. Importantly, the proposed technique offers the same linear computational complexity and constant learning complexity as classical CNNs, while being universal to any graph structure. Experiments on MNIST and 20NEWS demonstrate the ability of this novel deep learning system to learn local, stationary, and compositional features on graphs. |
Tasks | Node Classification, Skeleton Based Action Recognition |
Published | 2016-06-30 |
URL | http://arxiv.org/abs/1606.09375v3 |
PDF | http://arxiv.org/pdf/1606.09375v3.pdf |
PWC | https://paperswithcode.com/paper/convolutional-neural-networks-on-graphs-with |
Repo | https://github.com/SwissDataScienceCenter/DeepSphere |
Framework | tf |
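The fast localized filtering rests on the Chebyshev recurrence T_0(x) = x, T_1(x) = L̃x, T_k(x) = 2L̃T_{k−1}(x) − T_{k−2}(x), which needs only K sparse matrix-vector products, hence cost linear in the number of edges. A minimal sketch:

```python
import numpy as np
import scipy.sparse as sp

def cheb_filter(L_tilde, x, theta):
    """Apply a K-th order Chebyshev spectral filter to a graph signal.

    L_tilde : rescaled Laplacian 2 L / lmax - I, sparse (n, n)
    x       : graph signal, shape (n,)
    theta   : K + 1 filter coefficients (K >= 1 here)
    """
    t_prev, t_curr = x, L_tilde @ x                    # T_0(x), T_1(x)
    out = theta[0] * t_prev + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2 * (L_tilde @ t_curr) - t_prev   # recurrence
        out = out + theta[k] * t_curr
    return out

# Toy usage on a 3-node path graph, whose Laplacian has lmax = 3.
L = sp.csr_matrix(np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]]))
L_tilde = 2 * L / 3.0 - sp.identity(3)
print(cheb_filter(L_tilde, np.array([1.0, 0.0, 0.0]), theta=[0.5, 0.3, 0.2]))
```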
The Event-Camera Dataset and Simulator: Event-based Data for Pose Estimation, Visual Odometry, and SLAM
Title | The Event-Camera Dataset and Simulator: Event-based Data for Pose Estimation, Visual Odometry, and SLAM |
Authors | Elias Mueggler, Henri Rebecq, Guillermo Gallego, Tobi Delbruck, Davide Scaramuzza |
Abstract | New vision sensors, such as the Dynamic and Active-pixel Vision sensor (DAVIS), incorporate a conventional global-shutter camera and an event-based sensor in the same pixel array. These sensors have great potential for high-speed robotics and computer vision because they allow us to combine the benefits of conventional cameras with those of event-based sensors: low latency, high temporal resolution, and very high dynamic range. However, new algorithms are required to exploit the sensor characteristics and cope with its unconventional output, which consists of a stream of asynchronous brightness changes (called “events”) and synchronous grayscale frames. For this purpose, we present and release a collection of datasets captured with a DAVIS in a variety of synthetic and real environments, which we hope will motivate research on new algorithms for high-speed and high-dynamic-range robotics and computer-vision applications. In addition to global-shutter intensity images and asynchronous events, we provide inertial measurements and ground-truth camera poses from a motion-capture system. The latter allows comparing the pose accuracy of ego-motion estimation algorithms quantitatively. All the data are released both as standard text files and binary files (i.e., rosbag). This paper provides an overview of the available data and describes a simulator that we release open-source to create synthetic event-camera data. |
Tasks | Motion Capture, Motion Estimation, Pose Estimation, Visual Odometry |
Published | 2016-10-26 |
URL | http://arxiv.org/abs/1610.08336v4 |
PDF | http://arxiv.org/pdf/1610.08336v4.pdf |
PWC | https://paperswithcode.com/paper/the-event-camera-dataset-and-simulator-event |
Repo | https://github.com/uzh-rpg/rpg_davis_simulator |
Framework | none |
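A sketch of consuming the released text files, assuming one event per line as `timestamp x y polarity` and the 240×180 DAVIS resolution (check the dataset README; both are assumptions here):

```python
import numpy as np

# Assumed layout: one event per line, "timestamp x y polarity".
events = np.loadtxt("events.txt")
t, x, y, pol = events.T

# Accumulate a 30 ms slice of events into a signed event image:
# +1 per ON event, -1 per OFF event at each pixel.
mask = (t >= t[0]) & (t < t[0] + 0.03)
img = np.zeros((180, 240))                 # assumed DAVIS240 resolution
np.add.at(img, (y[mask].astype(int), x[mask].astype(int)), 2 * pol[mask] - 1)
```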
A Network-based End-to-End Trainable Task-oriented Dialogue System
Title | A Network-based End-to-End Trainable Task-oriented Dialogue System |
Authors | Tsung-Hsien Wen, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, Steve Young |
Abstract | Teaching machines to accomplish tasks by conversing naturally with humans is challenging. Currently, developing task-oriented dialogue systems requires creating multiple components and typically this involves either a large amount of handcrafting, or acquiring costly labelled datasets to solve a statistical learning problem for each component. In this work we introduce a neural network-based text-in, text-out end-to-end trainable goal-oriented dialogue system along with a new way of collecting dialogue data based on a novel pipe-lined Wizard-of-Oz framework. This approach allows us to develop dialogue systems easily and without making too many assumptions about the task at hand. The results show that the model can converse with human subjects naturally whilst helping them to accomplish tasks in a restaurant search domain. |
Tasks | Task-Oriented Dialogue Systems |
Published | 2016-04-15 |
URL | http://arxiv.org/abs/1604.04562v3 |
PDF | http://arxiv.org/pdf/1604.04562v3.pdf |
PWC | https://paperswithcode.com/paper/a-network-based-end-to-end-trainable-task |
Repo | https://github.com/ysglh/Task-Oriented-Dialogue-Dataset-Survey |
Framework | none |
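The text-in, text-out flow can be caricatured in a few lines: per-slot belief distributions are collapsed into database constraints, and the generator conditions on the belief state and the query result. The tiny database and the confidence threshold below are purely illustrative, not the paper's components.

```python
def db_query(belief, database):
    """belief: {slot: {value: prob}}; return entries matching the most likely
    value of each slot the tracker is confident about."""
    constraints = {slot: max(dist, key=dist.get)
                   for slot, dist in belief.items()
                   if max(dist.values()) > 0.5}
    return [row for row in database
            if all(row.get(s) == v for s, v in constraints.items())]

# Toy usage in the restaurant search domain.
database = [{"food": "thai", "area": "north", "name": "Bangkok City"},
            {"food": "french", "area": "centre", "name": "Cote"}]
belief = {"food": {"thai": 0.8, "french": 0.2},
          "area": {"north": 0.6, "centre": 0.4}}
print(db_query(belief, database))  # -> the matching 'thai'/'north' entry
```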
A Hierarchical Approach for Generating Descriptive Image Paragraphs
Title | A Hierarchical Approach for Generating Descriptive Image Paragraphs |
Authors | Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei |
Abstract | Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail. While one new captioning approach, dense captioning, can potentially describe images in finer levels of detail by captioning many regions within an image, it in turn is unable to produce a coherent story for an image. In this paper we overcome these limitations by generating entire paragraphs for describing images, which can tell detailed, unified stories. We develop a model that decomposes both images and paragraphs into their constituent parts, detecting semantic regions in images and using a hierarchical recurrent neural network to reason about language. Linguistic analysis confirms the complexity of the paragraph generation task, and thorough experiments on a new dataset of image and paragraph pairs demonstrate the effectiveness of our approach. |
Tasks | Image Captioning, Image Paragraph Captioning |
Published | 2016-11-20 |
URL | http://arxiv.org/abs/1611.06607v2 |
PDF | http://arxiv.org/pdf/1611.06607v2.pdf |
PWC | https://paperswithcode.com/paper/a-hierarchical-approach-for-generating |
Repo | https://github.com/InnerPeace-Wu/im2p-tensorflow |
Framework | tf |
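The hierarchy the abstract describes has a compact skeleton: region detection, pooling, a sentence-level RNN that emits one topic per sentence (and decides when to stop), and a word-level RNN that decodes each topic into a sentence. All callables below are hypothetical stand-ins, not the authors' code:

```python
def generate_paragraph(image, detect_regions, pool, sentence_rnn, word_rnn,
                       max_sents=6):
    """Sketch of hierarchical paragraph generation.

    detect_regions : image -> region feature vectors (dense-captioning style)
    pool           : region vectors -> pooled representation / initial state
    sentence_rnn   : state -> (topic_vector, next_state, stop_flag)
    word_rnn       : topic_vector -> one decoded sentence (string)
    """
    regions = detect_regions(image)
    state = pool(regions)
    paragraph = []
    for _ in range(max_sents):
        topic, state, stop = sentence_rnn(state)
        if stop:                            # sentence RNN predicts when to stop
            break
        paragraph.append(word_rnn(topic))   # decode one sentence per topic
    return " ".join(paragraph)
```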