May 7, 2019

3095 words 15 mins read

Paper Group AWR 89

Self Paced Deep Learning for Weakly Supervised Object Detection

Title Self Paced Deep Learning for Weakly Supervised Object Detection
Authors Enver Sangineto, Moin Nabi, Dubravko Culibrk, Nicu Sebe
Abstract In a weakly-supervised scenario object detectors need to be trained using image-level annotation alone. Since bounding-box-level ground truth is not available, most of the solutions proposed so far are based on an iterative, Multiple Instance Learning framework in which the current classifier is used to select the highest-confidence boxes in each image, which are treated as pseudo-ground truth in the next training iteration. However, the errors of an immature classifier can make the process drift, usually introducing many false positives in the training dataset. To alleviate this problem, we propose in this paper a training protocol based on the self-paced learning paradigm. The main idea is to iteratively select a subset of images and boxes that are the most reliable, and use them for training. While in the past few years similar strategies have been adopted for SVMs and other classifiers, we are the first to show that a self-paced approach can be used with deep-network-based classifiers in an end-to-end training pipeline. The method we propose is built on the fully-supervised Fast-RCNN architecture and can be applied to similar architectures which represent the input image as a bag of boxes. We show state-of-the-art results on Pascal VOC 2007, Pascal VOC 2010 and ILSVRC 2013. On ILSVRC 2013 our results based on a low-capacity AlexNet network outperform even those weakly-supervised approaches which are based on much higher-capacity networks.
Tasks Multiple Instance Learning, Object Detection, Weakly Supervised Object Detection
Published 2016-05-24
URL http://arxiv.org/abs/1605.07651v3
PDF http://arxiv.org/pdf/1605.07651v3.pdf
PWC https://paperswithcode.com/paper/self-paced-deep-learning-for-weakly
Repo https://github.com/moinnabi/SelfPacedDeepLearning
Framework none
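
The self-paced selection loop is independent of the underlying detector. A minimal sketch of the idea on toy data, with a logistic-regression stand-in for the Fast-RCNN classifier (the `n_easy` schedule and all sizes are illustrative, not the paper's settings):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)  # noisy labels

clf = LogisticRegression().fit(X[:20], y[:20])  # warm start on a small seed set
for it in range(5):
    # Score every sample; distance from 0.5 acts as the confidence.
    conf = np.abs(clf.predict_proba(X)[:, 1] - 0.5)
    order = np.argsort(-conf)            # most confident samples first
    n_easy = 40 + 30 * it                # grow the "easy" pool each iteration
    pred = clf.predict(X)
    # Keep the easiest samples of each predicted class and retrain on the
    # predictions themselves (the pseudo-ground-truth idea from the paper).
    easy = np.concatenate([order[pred[order] == c][: n_easy // 2] for c in (0, 1)])
    clf = LogisticRegression().fit(X[easy], pred[easy])
```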

Variational Inference: A Review for Statisticians

Title Variational Inference: A Review for Statisticians
Authors David M. Blei, Alp Kucukelbir, Jon D. McAuliffe
Abstract One of the core problems of modern statistics is to approximate difficult-to-compute probability densities. This problem is especially important in Bayesian statistics, which frames all inference about unknown quantities as a calculation involving the posterior density. In this paper, we review variational inference (VI), a method from machine learning that approximates probability densities through optimization. VI has been used in many applications and tends to be faster than classical methods, such as Markov chain Monte Carlo sampling. The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. We review the ideas behind mean-field variational inference, discuss the special case of VI applied to exponential family models, present a full example with a Bayesian mixture of Gaussians, and derive a variant that uses stochastic optimization to scale up to massive data. We discuss modern research in VI and highlight important open problems. VI is powerful, but it is not yet well understood. Our hope in writing this paper is to catalyze statistical research on this class of algorithms.
Tasks Stochastic Optimization
Published 2016-01-04
URL http://arxiv.org/abs/1601.00670v9
PDF http://arxiv.org/pdf/1601.00670v9.pdf
PWC https://paperswithcode.com/paper/variational-inference-a-review-for
Repo https://github.com/taolicheng/Deep-Learning-Learning-Path
Framework tf
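
The paper's running example is coordinate-ascent variational inference (CAVI) for a Bayesian mixture of Gaussians. A minimal sketch of the unit-variance case with a N(0, sigma^2) prior on the component means (toy 1-D data; hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 150), rng.normal(3, 1, 150)])  # toy data
K, sigma2 = 2, 10.0  # number of components; prior variance of the means

# Mean-field factors: q(mu_k) = N(m_k, s2_k), q(c_n) = Categorical(phi_n).
m, s2 = rng.normal(size=K), np.ones(K)
for _ in range(50):
    # Update the responsibilities:
    # phi_nk proportional to exp(E[mu_k] x_n - E[mu_k^2] / 2).
    logits = np.outer(x, m) - 0.5 * (s2 + m**2)
    phi = np.exp(logits - logits.max(axis=1, keepdims=True))
    phi /= phi.sum(axis=1, keepdims=True)
    # Update the Gaussian factors over the component means.
    s2 = 1.0 / (1.0 / sigma2 + phi.sum(axis=0))
    m = s2 * (phi * x[:, None]).sum(axis=0)

print(m)  # approximate posterior means, roughly -3 and 3
```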

An empirical study on the effects of different types of noise in image classification tasks

Title An empirical study on the effects of different types of noise in image classification tasks
Authors Gabriel B. Paranhos da Costa, Welinton A. Contato, Tiago S. Nazare, João E. S. Batista Neto, Moacir Ponti
Abstract Image classification is one of the main research problems in computer vision and machine learning. Since in most real-world image classification applications there is no control over how the images are captured, it is necessary to consider the possibility that these images might be affected by noise (e.g. sensor noise in a low-quality surveillance camera). In this paper we analyse the impact of three different types of noise on descriptors extracted by two widely used feature extraction methods (LBP and HOG) and how denoising the images can help to mitigate this problem. We carry out experiments on two different datasets and consider several types of noise, noise levels, and denoising methods. Our results show that noise can hinder classification performance considerably and make classes harder to separate. Although denoising methods were not able to reach the same performance as in the noise-free scenario, they improved classification results for noisy data.
Tasks Denoising, Image Classification
Published 2016-09-09
URL http://arxiv.org/abs/1609.02781v1
PDF http://arxiv.org/pdf/1609.02781v1.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-on-the-effects-of
Repo https://github.com/gbpcosta/wvc_2016_noise
Framework none
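
A minimal sketch of the corruption step the study builds on: add Gaussian and salt-and-pepper noise to an image, then denoise with simple filters (the scipy filters below stand in for the denoising methods compared in the paper; LBP/HOG features would then be extracted from each version):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
img = rng.random((64, 64))  # stand-in for a grayscale image in [0, 1]

# Additive Gaussian noise, as from a low-quality sensor.
noisy_gauss = np.clip(img + rng.normal(0, 0.1, img.shape), 0, 1)

# Salt-and-pepper noise: a small fraction of pixels forced to 0 or 1.
noisy_sp = img.copy()
mask = rng.random(img.shape)
noisy_sp[mask < 0.025] = 0.0
noisy_sp[mask > 0.975] = 1.0

# Simple denoisers (illustrative choices, not the paper's full set).
den_gauss = ndimage.gaussian_filter(noisy_gauss, sigma=1)
den_sp = ndimage.median_filter(noisy_sp, size=3)
```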

Revisiting Distributed Synchronous SGD

Title Revisiting Distributed Synchronous SGD
Authors Jianmin Chen, Xinghao Pan, Rajat Monga, Samy Bengio, Rafal Jozefowicz
Abstract Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony. In contrast, the synchronous approach is often thought to be impractical due to idle time wasted on waiting for straggling workers. We revisit these conventional beliefs in this paper, and examine the weaknesses of both approaches. We demonstrate that a third approach, synchronous optimization with backup workers, can avoid asynchronous noise while mitigating the impact of the worst stragglers. Our approach is empirically validated and shown to converge faster and to reach better test accuracies.
Tasks Stochastic Optimization
Published 2016-04-04
URL http://arxiv.org/abs/1604.00981v3
PDF http://arxiv.org/pdf/1604.00981v3.pdf
PWC https://paperswithcode.com/paper/revisiting-distributed-synchronous-sgd
Repo https://github.com/tensorflow/models/tree/master/research/inception
Framework tf
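
A toy simulation of the core idea: N workers compute gradients with random delays, and the parameter server aggregates only the first N - b to finish, dropping the stragglers. All numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_backup = 10, 2  # keep the first 8 of 10 gradients per step

def simulated_worker(w):
    """Return (finish_time, gradient); occasionally a severe straggler."""
    delay = rng.exponential(1.0) + (5.0 if rng.random() < 0.1 else 0.0)
    grad = 2 * w + rng.normal(0, 0.1, w.shape)  # noisy gradient of f(w) = w^2
    return delay, grad

w = np.array([5.0])
for step in range(100):
    results = sorted((simulated_worker(w) for _ in range(n_workers)),
                     key=lambda r: r[0])
    fastest = results[: n_workers - n_backup]   # drop the b slowest workers
    grad = np.mean([g for _, g in fastest], axis=0)
    w = w - 0.05 * grad                         # one synchronous update
print(w)  # converges toward 0
```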

RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks

Title RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks
Authors Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilya Kulikov, Ralf Schlüter, Hermann Ney
Abstract In this work we release our extensible and easily configurable neural network training software. It provides a rich set of functional layers with a particular focus on efficient training of recurrent neural network topologies on multiple GPUs. The source of the software package is public and freely available for academic research purposes and can be used as a framework or as a standalone tool which supports a flexible configuration. The software allows users to train state-of-the-art deep bidirectional long short-term memory (LSTM) models on both one dimensional data like speech or two dimensional data like handwritten text and was used to develop successful submission systems in several evaluation campaigns.
Tasks
Published 2016-08-02
URL http://arxiv.org/abs/1608.00895v2
PDF http://arxiv.org/pdf/1608.00895v2.pdf
PWC https://paperswithcode.com/paper/returnn-the-rwth-extensible-training
Repo https://github.com/rwth-i6/returnn
Framework tf
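
RETURNN defines models through its own Python-based configuration files, which are not reproduced here. As a rough illustration of the kind of model the abstract describes, a deep bidirectional LSTM over one-dimensional input such as speech frames, here is a generic Keras sketch (this is not RETURNN's API; all dimensions are illustrative):

```python
import tensorflow as tf

num_features, num_classes = 40, 50  # e.g. filterbank features -> output labels
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, num_features)),  # variable-length sequences
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(512, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(512, return_sequences=True)),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```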

Joint Unsupervised Learning of Deep Representations and Image Clusters

Title Joint Unsupervised Learning of Deep Representations and Image Clusters
Authors Jianwei Yang, Devi Parikh, Dhruv Batra
Abstract In this paper, we propose a recurrent framework for Joint Unsupervised LEarning (JULE) of deep representations and image clusters. In our framework, successive operations in a clustering algorithm are expressed as steps in a recurrent process, stacked on top of representations output by a Convolutional Neural Network (CNN). During training, image clusters and representations are updated jointly: image clustering is conducted in the forward pass, while representation learning is conducted in the backward pass. The key idea behind this framework is that good representations are beneficial to image clustering and clustering results provide supervisory signals to representation learning. By integrating two processes into a single model with a unified weighted triplet loss and optimizing it end-to-end, we can obtain not only more powerful representations, but also more precise image clusters. Extensive experiments show that our method outperforms the state-of-the-art on image clustering across a variety of image datasets. Moreover, the learned representations generalize well when transferred to other tasks.
Tasks Image Clustering, Representation Learning
Published 2016-04-13
URL http://arxiv.org/abs/1604.03628v3
PDF http://arxiv.org/pdf/1604.03628v3.pdf
PWC https://paperswithcode.com/paper/joint-unsupervised-learning-of-deep
Repo https://github.com/jwyang/JULE-Torch
Framework torch
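
A minimal sketch of the alternation the abstract describes, with k-means standing in for the paper's agglomerative clustering and a plain triplet loss over cluster assignments (toy network and data; not the exact recurrent formulation or weighted triplet loss):

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

torch.manual_seed(0)
X = torch.randn(256, 32)  # stand-in for image inputs
net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
triplet = nn.TripletMarginLoss(margin=1.0)

for epoch in range(5):
    # Forward pass: cluster the current representations.
    with torch.no_grad():
        feats = net(X)
    labels = torch.as_tensor(KMeans(n_clusters=8, n_init=10).fit_predict(feats.numpy()))
    # Backward pass: the clusters supply the supervisory signal.
    for _ in range(20):
        a = torch.randint(len(X), (64,))              # anchor indices
        same = labels[a][:, None] == labels[None, :]  # same-cluster mask per anchor
        pos = torch.multinomial(same.float(), 1).squeeze(1)
        neg = torch.multinomial((~same).float(), 1).squeeze(1)
        z = net(X)
        loss = triplet(z[a], z[pos], z[neg])
        opt.zero_grad(); loss.backward(); opt.step()
```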

Rationale-Augmented Convolutional Neural Networks for Text Classification

Title Rationale-Augmented Convolutional Neural Networks for Text Classification
Authors Ye Zhang, Iain Marshall, Byron C. Wallace
Abstract We present a new Convolutional Neural Network (CNN) model for text classification that jointly exploits labels on documents and their component sentences. Specifically, we consider scenarios in which annotators explicitly mark sentences (or snippets) that support their overall document categorization, i.e., they provide rationales. Our model exploits such supervision via a hierarchical approach in which each document is represented by a linear combination of the vector representations of its component sentences. We propose a sentence-level convolutional model that estimates the probability that a given sentence is a rationale, and we then scale the contribution of each sentence to the aggregate document representation in proportion to these estimates. Experiments on five classification datasets that have document labels and associated rationales demonstrate that our approach consistently outperforms strong baselines. Moreover, our model naturally provides explanations for its predictions.
Tasks Text Classification
Published 2016-05-14
URL http://arxiv.org/abs/1605.04469v3
PDF http://arxiv.org/pdf/1605.04469v3.pdf
PWC https://paperswithcode.com/paper/rationale-augmented-convolutional-neural
Repo https://github.com/yezhang-xiaofan/Rationale-CNN
Framework none
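
A minimal sketch of the aggregation the abstract describes: score each sentence's probability of being a rationale, then build the document vector as the rationale-weighted sum of sentence vectors (mean-pooled embeddings stand in for the paper's sentence-level CNN; all sizes are illustrative):

```python
import torch
import torch.nn as nn

class RationaleDoc(nn.Module):
    def __init__(self, vocab=5000, dim=64, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rationale = nn.Linear(dim, 1)  # scores P(sentence is a rationale)
        self.clf = nn.Linear(dim, n_classes)

    def forward(self, doc):                 # doc: (n_sentences, sent_len) token ids
        sents = self.emb(doc).mean(dim=1)   # one vector per sentence (stand-in CNN)
        w = torch.sigmoid(self.rationale(sents))     # rationale probabilities
        doc_vec = (w * sents).sum(dim=0) / w.sum()   # weighted sentence sum
        return self.clf(doc_vec), w.squeeze(-1)

model = RationaleDoc()
doc = torch.randint(0, 5000, (6, 12))       # toy document with 6 sentences
logits, rationale_probs = model(doc)
```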

Ex Machina: Personal Attacks Seen at Scale

Title Ex Machina: Personal Attacks Seen at Scale
Authors Ellery Wulczyn, Nithum Thain, Lucas Dixon
Abstract The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon. However, understanding the prevalence and impact of personal attacks in online platforms at scale remains surprisingly difficult. The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. We show an evaluation method for a classifier in terms of the aggregated number of crowd-workers it can approximate. We apply our methodology to English Wikipedia, generating a corpus of over 100k high quality human-labeled comments and 63M machine-labeled ones from a classifier that is as good as the aggregate of 3 crowd-workers, as measured by the area under the ROC curve and Spearman correlation. Using this corpus of machine-labeled scores, our methodology allows us to explore some of the open questions about the nature of online personal attacks. This reveals that the majority of personal attacks on Wikipedia are not the result of a few malicious users, nor primarily the consequence of allowing anonymous contributions from unregistered users.
Tasks
Published 2016-10-27
URL http://arxiv.org/abs/1610.08914v2
PDF http://arxiv.org/pdf/1610.08914v2.pdf
PWC https://paperswithcode.com/paper/ex-machina-personal-attacks-seen-at-scale
Repo https://github.com/canonical-debate-lab/paper
Framework none
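
The evaluation idea (checking how well a classifier approximates the aggregate of k crowd-workers) reduces to comparing model scores against held-out aggregated annotations with the two metrics the abstract names. A sketch on synthetic data:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, k = 1000, 3
attack = rng.random(n)  # latent "attack-ness" of each comment

# Aggregate of k crowd-worker judgements per comment (noisy votes).
votes = rng.random((n, k)) < attack[:, None]
crowd_score = votes.mean(axis=1)

# A hypothetical classifier's scores: the latent value plus noise.
model_score = np.clip(attack + rng.normal(0, 0.2, n), 0, 1)

binary = crowd_score >= 0.5  # majority label for the ROC analysis
corr, _ = spearmanr(model_score, crowd_score)
print("AUC:", roc_auc_score(binary, model_score), "Spearman:", corr)
```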

Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

Title Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation
Authors Iulian Vlad Serban, Tim Klinger, Gerald Tesauro, Kartik Talamadupula, Bowen Zhou, Yoshua Bengio, Aaron Courville
Abstract We introduce the multiresolution recurrent neural network, which extends the sequence-to-sequence framework to model natural language generation as two parallel discrete stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language tokens. There are many ways to estimate or learn the high-level coarse tokens, but we argue that a simple extraction procedure is sufficient to capture a wealth of high-level discourse semantics. Such a procedure allows training the multiresolution recurrent neural network by maximizing the exact joint log-likelihood over both sequences. In contrast to the standard log-likelihood objective w.r.t. natural language tokens (word perplexity), optimizing the joint log-likelihood biases the model towards modeling high-level abstractions. We apply the proposed model to the task of dialogue response generation in two challenging domains: the Ubuntu technical support domain, and Twitter conversations. On Ubuntu, the model outperforms competing approaches by a substantial margin, achieving state-of-the-art results according to both automatic evaluation metrics and a human evaluation study. On Twitter, the model appears to generate more relevant and on-topic responses according to automatic evaluation metrics. Finally, our experiments demonstrate that the proposed model is more adept at overcoming the sparsity of natural language and is better able to capture long-term structure.
Tasks Dialogue Generation, Text Generation
Published 2016-06-02
URL http://arxiv.org/abs/1606.00776v2
PDF http://arxiv.org/pdf/1606.00776v2.pdf
PWC https://paperswithcode.com/paper/multiresolution-recurrent-neural-networks-an
Repo https://github.com/julianser/hed-dlg-truncated
Framework none
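
The "simple extraction procedure" for coarse tokens can be as plain as keeping only the nouns of each utterance; the joint objective is then the sum of the log-likelihoods of the coarse sequence and the natural-language sequence. A toy sketch of the extraction step, assuming NLTK and its tokenizer and tagger data are installed (this is an illustration, not the paper's exact procedure):

```python
import nltk  # requires the punkt and perceptron-tagger data packages

def coarse_sequence(utterance):
    """Keep only nouns as the high-level coarse token sequence."""
    tokens = nltk.word_tokenize(utterance)
    return [w for w, tag in nltk.pos_tag(tokens) if tag.startswith("NN")]

print(coarse_sequence("my wifi driver crashes after the kernel update"))
# e.g. ['wifi', 'driver', 'kernel', 'update']

# Training then minimizes the summed negative log-likelihoods:
# loss = nll(coarse tokens) + nll(natural language tokens)
```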

Joint Multimodal Learning with Deep Generative Models

Title Joint Multimodal Learning with Deep Generative Models
Authors Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo
Abstract We investigate deep generative models that can exchange multiple modalities bi-directionally, e.g., generating images from corresponding texts and vice versa. Recently, some studies handle multiple modalities on deep generative models, such as variational autoencoders (VAEs). However, these models typically assume that modalities are forced to have a conditioned relation, i.e., we can only generate modalities in one direction. To achieve our objective, we should extract a joint representation that captures high-level concepts among all modalities and through which we can exchange them bi-directionally. As described herein, we propose a joint multimodal variational autoencoder (JMVAE), in which all modalities are independently conditioned on joint representation. In other words, it models a joint distribution of modalities. Furthermore, to be able to generate missing modalities from the remaining modalities properly, we develop an additional method, JMVAE-kl, that is trained by reducing the divergence between the JMVAE encoder and prepared networks for the respective modalities. Our experiments show that our proposed method can obtain appropriate joint representation from multiple modalities and that it can generate and reconstruct them more properly than conventional VAEs. We further demonstrate that JMVAE can generate multiple modalities bi-directionally.
Tasks
Published 2016-11-07
URL http://arxiv.org/abs/1611.01891v1
PDF http://arxiv.org/pdf/1611.01891v1.pdf
PWC https://paperswithcode.com/paper/joint-multimodal-learning-with-deep
Repo https://github.com/masa-su/Tars
Framework pytorch
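
A minimal sketch of the JMVAE-kl objective as the abstract describes it: the joint ELBO plus divergence terms that pull the unimodal encoders q(z|x) and q(z|w) toward the joint encoder q(z|x,w). Gaussian encoders are assumed, and `alpha` is an illustrative weight:

```python
import torch

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ), summed over latent dimensions."""
    return 0.5 * (logvar_p - logvar_q
                  + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                  - 1).sum(-1)

def jmvae_kl_loss(recon_x, recon_w, joint, uni_x, uni_w, alpha=0.1):
    """joint, uni_x, uni_w are (mu, logvar) pairs from the three encoders."""
    mu, logvar = joint
    prior = (torch.zeros_like(mu), torch.zeros_like(logvar))
    elbo_kl = gaussian_kl(mu, logvar, *prior)  # KL(q(z|x,w) || p(z))
    # JMVAE-kl terms: keep the unimodal encoders close to the joint one,
    # so a missing modality can be generated from whichever input is present.
    fit_x = gaussian_kl(mu, logvar, *uni_x)
    fit_w = gaussian_kl(mu, logvar, *uni_w)
    return (recon_x + recon_w + elbo_kl + alpha * (fit_x + fit_w)).mean()
```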

Revisiting Semi-Supervised Learning with Graph Embeddings

Title Revisiting Semi-Supervised Learning with Graph Embeddings
Authors Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
Abstract We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models.
Tasks Document Classification, Entity Extraction, Node Classification, Text Classification
Published 2016-03-29
URL http://arxiv.org/abs/1603.08861v2
PDF http://arxiv.org/pdf/1603.08861v2.pdf
PWC https://paperswithcode.com/paper/revisiting-semi-supervised-learning-with
Repo https://github.com/tkipf/ica
Framework none
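
A compact sketch of the joint objective the abstract describes: one embedding per instance, trained both to predict the class label on the labeled subset and to predict graph context (negative sampling over the edge list stands in for the paper's random-walk context; graph and sizes are toys):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n, dim, n_classes = 100, 16, 3
edges = torch.randint(0, n, (2, 400))  # toy graph as an edge list
labels = torch.randint(0, n_classes, (n,))
labeled = torch.arange(20)             # only 20 instances carry labels

emb = nn.Embedding(n, dim)
clf = nn.Linear(dim, n_classes)
opt = torch.optim.Adam(list(emb.parameters()) + list(clf.parameters()), lr=1e-2)

for step in range(200):
    # Supervised term: predict class labels on the labeled instances.
    sup = F.cross_entropy(clf(emb(labeled)), labels[labeled])
    # Context term: linked nodes should score higher than random pairs.
    u, v = edges
    neg = torch.randint(0, n, (edges.shape[1],))
    pos_score = (emb(u) * emb(v)).sum(-1)
    neg_score = (emb(u) * emb(neg)).sum(-1)
    ctx = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score)).mean()
    loss = sup + 0.5 * ctx
    opt.zero_grad(); loss.backward(); opt.step()
```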

Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering

Title Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
Authors Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst
Abstract In this work, we are interested in generalizing convolutional neural networks (CNNs) from low-dimensional regular grids, where image, video and speech are represented, to high-dimensional irregular domains, such as social networks, brain connectomes or word embeddings, represented by graphs. We present a formulation of CNNs in the context of spectral graph theory, which provides the necessary mathematical background and efficient numerical schemes to design fast localized convolutional filters on graphs. Importantly, the proposed technique offers the same linear computational complexity and constant learning complexity as classical CNNs, while being universal to any graph structure. Experiments on MNIST and 20NEWS demonstrate the ability of this novel deep learning system to learn local, stationary, and compositional features on graphs.
Tasks Node Classification, Skeleton Based Action Recognition
Published 2016-06-30
URL http://arxiv.org/abs/1606.09375v3
PDF http://arxiv.org/pdf/1606.09375v3.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-networks-on-graphs-with
Repo https://github.com/SwissDataScienceCenter/DeepSphere
Framework tf
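
The fast localized filters are Chebyshev polynomials of the rescaled graph Laplacian, computed with the recurrence T_k(x) = 2 x T_{k-1}(x) - T_{k-2}(x); each added order widens the filter by one hop. A minimal numpy sketch of one filtering step (random graph and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 10, 3  # number of nodes; polynomial order

# Toy symmetric adjacency and normalized Laplacian L = I - D^-1/2 A D^-1/2.
A = rng.random((n, n)); A = ((A + A.T) > 1.0).astype(float)
np.fill_diagonal(A, 0)
d = np.maximum(A.sum(1), 1e-9)
L = np.eye(n) - A / np.sqrt(np.outer(d, d))

# Rescale the spectrum to [-1, 1]; lambda_max = 2 is a valid bound for L.
L_hat = L - np.eye(n)

x = rng.normal(size=n)      # a signal on the graph nodes
theta = rng.normal(size=K)  # learnable filter coefficients

# Chebyshev recurrence: with a sparse L this costs O(K |E|) per filter.
Tx = [x, L_hat @ x]
for k in range(2, K):
    Tx.append(2 * L_hat @ Tx[-1] - Tx[-2])
y = sum(t * Tk for t, Tk in zip(theta, Tx))  # the filtered signal
```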

The Event-Camera Dataset and Simulator: Event-based Data for Pose Estimation, Visual Odometry, and SLAM

Title The Event-Camera Dataset and Simulator: Event-based Data for Pose Estimation, Visual Odometry, and SLAM
Authors Elias Mueggler, Henri Rebecq, Guillermo Gallego, Tobi Delbruck, Davide Scaramuzza
Abstract New vision sensors, such as the Dynamic and Active-pixel Vision sensor (DAVIS), incorporate a conventional global-shutter camera and an event-based sensor in the same pixel array. These sensors have great potential for high-speed robotics and computer vision because they allow us to combine the benefits of conventional cameras with those of event-based sensors: low latency, high temporal resolution, and very high dynamic range. However, new algorithms are required to exploit the sensor characteristics and cope with its unconventional output, which consists of a stream of asynchronous brightness changes (called “events”) and synchronous grayscale frames. For this purpose, we present and release a collection of datasets captured with a DAVIS in a variety of synthetic and real environments, which we hope will motivate research on new algorithms for high-speed and high-dynamic-range robotics and computer-vision applications. In addition to global-shutter intensity images and asynchronous events, we provide inertial measurements and ground-truth camera poses from a motion-capture system. The latter allows comparing the pose accuracy of ego-motion estimation algorithms quantitatively. All the data are released both as standard text files and binary files (i.e., rosbag). This paper provides an overview of the available data and describes a simulator that we release open-source to create synthetic event-camera data.
Tasks Motion Capture, Motion Estimation, Pose Estimation, Visual Odometry
Published 2016-10-26
URL http://arxiv.org/abs/1610.08336v4
PDF http://arxiv.org/pdf/1610.08336v4.pdf
PWC https://paperswithcode.com/paper/the-event-camera-dataset-and-simulator-event
Repo https://github.com/uzh-rpg/rpg_davis_simulator
Framework none
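
The text-file release stores one event per line. Assuming the `timestamp x y polarity` layout used by this dataset (check the dataset page for the authoritative format), a minimal reader looks like:

```python
import numpy as np

def load_events(path):
    """Parse an events text file with lines: timestamp x y polarity.

    Assumed layout; verify against the dataset's documentation.
    """
    data = np.loadtxt(path)
    return {
        "t": data[:, 0],              # seconds
        "x": data[:, 1].astype(int),  # pixel column
        "y": data[:, 2].astype(int),  # pixel row
        "p": data[:, 3].astype(int),  # polarity: brightness up (1) or down (0)
    }
```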

A Network-based End-to-End Trainable Task-oriented Dialogue System

Title A Network-based End-to-End Trainable Task-oriented Dialogue System
Authors Tsung-Hsien Wen, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, Steve Young
Abstract Teaching machines to accomplish tasks by conversing naturally with humans is challenging. Currently, developing task-oriented dialogue systems requires creating multiple components and typically this involves either a large amount of handcrafting, or acquiring costly labelled datasets to solve a statistical learning problem for each component. In this work we introduce a neural network-based text-in, text-out end-to-end trainable goal-oriented dialogue system along with a new way of collecting dialogue data based on a novel pipe-lined Wizard-of-Oz framework. This approach allows us to develop dialogue systems easily and without making too many assumptions about the task at hand. The results show that the model can converse with human subjects naturally whilst helping them to accomplish tasks in a restaurant search domain.
Tasks Task-Oriented Dialogue Systems
Published 2016-04-15
URL http://arxiv.org/abs/1604.04562v3
PDF http://arxiv.org/pdf/1604.04562v3.pdf
PWC https://paperswithcode.com/paper/a-network-based-end-to-end-trainable-task
Repo https://github.com/ysglh/Task-Oriented-Dialogue-Dataset-Survey
Framework none

A Hierarchical Approach for Generating Descriptive Image Paragraphs

Title A Hierarchical Approach for Generating Descriptive Image Paragraphs
Authors Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei
Abstract Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail. While one new captioning approach, dense captioning, can potentially describe images in finer levels of detail by captioning many regions within an image, it in turn is unable to produce a coherent story for an image. In this paper we overcome these limitations by generating entire paragraphs for describing images, which can tell detailed, unified stories. We develop a model that decomposes both images and paragraphs into their constituent parts, detecting semantic regions in images and using a hierarchical recurrent neural network to reason about language. Linguistic analysis confirms the complexity of the paragraph generation task, and thorough experiments on a new dataset of image and paragraph pairs demonstrate the effectiveness of our approach.
Tasks Image Captioning, Image Paragraph Captioning
Published 2016-11-20
URL http://arxiv.org/abs/1611.06607v2
PDF http://arxiv.org/pdf/1611.06607v2.pdf
PWC https://paperswithcode.com/paper/a-hierarchical-approach-for-generating
Repo https://github.com/InnerPeace-Wu/im2p-tensorflow
Framework tf
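
A minimal sketch of the hierarchy the abstract describes: a sentence-level RNN unrolls over pooled region features to emit one topic vector per sentence, and a word-level RNN decodes each sentence from its topic (toy dimensions; the dense-captioning region detector and the stopping criterion are omitted):

```python
import torch
import torch.nn as nn

class ParagraphDecoder(nn.Module):
    def __init__(self, feat=128, hidden=128, vocab=1000, max_sents=6):
        super().__init__()
        self.max_sents = max_sents
        self.sent_rnn = nn.GRUCell(feat, hidden)  # one step per sentence
        self.topic = nn.Linear(hidden, hidden)
        self.word_rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, regions, words_per_sent=10):
        # regions: (n_regions, feat) from a region detector; pool to one vector.
        pooled = regions.mean(dim=0, keepdim=True)           # (1, feat)
        h = torch.zeros(1, self.sent_rnn.hidden_size)
        sentences = []
        for _ in range(self.max_sents):
            h = self.sent_rnn(pooled, h)                     # next sentence state
            topic = self.topic(h)                            # topic for this sentence
            inp = topic.unsqueeze(1).expand(1, words_per_sent, -1)
            out, _ = self.word_rnn(inp)                      # decode the sentence
            sentences.append(self.out(out))                  # (1, words, vocab) logits
        return sentences

decoder = ParagraphDecoder()
logits = decoder(torch.randn(50, 128))  # 50 detected regions -> 6 sentences
```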