May 7, 2019

2730 words 13 mins read

Paper Group AWR 76

A Deep Learning Approach to Unsupervised Ensemble Learning. Deep Feature Flow for Video Recognition. Memory-augmented Attention Modelling for Videos. Flood-Filling Networks. Cross-stitch Networks for Multi-task Learning. Semantic Understanding of Scenes through the ADE20K Dataset. Supervision via Competition: Robot Adversaries for Learning Tasks. I …

A Deep Learning Approach to Unsupervised Ensemble Learning

Title A Deep Learning Approach to Unsupervised Ensemble Learning
Authors Uri Shaham, Xiuyuan Cheng, Omer Dror, Ariel Jaffe, Boaz Nadler, Joseph Chang, Yuval Kluger
Abstract We show how deep learning methods can be applied in the context of crowdsourcing and unsupervised ensemble learning. First, we prove that the popular model of Dawid and Skene, which assumes that all classifiers are conditionally independent, is equivalent to a Restricted Boltzmann Machine (RBM) with a single hidden node. Hence, under this model, the posterior probabilities of the true labels can instead be estimated via a trained RBM. Next, to address the more general case, where classifiers may strongly violate the conditional independence assumption, we propose to apply an RBM-based Deep Neural Network (DNN). Experimental results on various simulated and real-world datasets demonstrate that our proposed DNN approach outperforms other state-of-the-art methods, in particular when the data violates the conditional independence assumption.
Tasks
Published 2016-02-06
URL http://arxiv.org/abs/1602.02285v1
PDF http://arxiv.org/pdf/1602.02285v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-approach-to-unsupervised
Repo https://github.com/ushaham/RBMpaper
Framework none
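
The paper's central identity is easy to state in code: under the Dawid-Skene model, the posterior of the true label is a logistic function of the classifier votes, exactly as in an RBM with one hidden node. Below is a minimal numpy sketch of that posterior (not the authors' repo code; the weights here are hypothetical, while in the paper they are learned by training the RBM on the unlabeled vote matrix):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_posterior(votes, w, b_hidden):
    """Posterior p(y=1 | votes) under a single-hidden-node RBM.

    votes    : (n_samples, n_classifiers) array of {0, 1} predictions
    w        : (n_classifiers,) visible-to-hidden weights
    b_hidden : scalar hidden bias
    """
    return sigmoid(votes @ w + b_hidden)

# Toy example with hypothetical weights: three classifiers, the first
# two reliable (large positive weight), the third near-random.
votes = np.array([[1, 1, 0], [0, 0, 1]])
w = np.array([2.0, 2.0, 0.1])
print(rbm_posterior(votes, w, b_hidden=-2.0))
```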

Deep Feature Flow for Video Recognition

Title Deep Feature Flow for Video Recognition
Authors Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei
Abstract Deep convolutional neural networks have achieved great success on image recognition tasks. Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos, as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup, as flow computation is relatively fast. End-to-end training of the whole architecture significantly boosts recognition accuracy. Deep feature flow is flexible and general. It is validated on two recent large-scale video datasets and makes a large step towards practical video recognition.
Tasks Video Recognition
Published 2016-11-23
URL http://arxiv.org/abs/1611.07715v2
PDF http://arxiv.org/pdf/1611.07715v2.pdf
PWC https://paperswithcode.com/paper/deep-feature-flow-for-video-recognition
Repo https://github.com/Scalsol/mega.pytorch
Framework pytorch
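
The mechanism is simple enough to sketch: features are computed once on a key frame and warped to the current frame along a flow field, skipping the expensive convolutional sub-network. The numpy toy below uses nearest-neighbor sampling for brevity (the paper uses bilinear warping, and all shapes here are made up):

```python
import numpy as np

def warp_features(key_feat, flow):
    """key_feat: (C, H, W) features from the key frame.
    flow: (2, H, W) displacement (dy, dx) from each current-frame
    position to its corresponding key-frame position."""
    C, H, W = key_feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.round(ys + flow[0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[1]).astype(int), 0, W - 1)
    return key_feat[:, src_y, src_x]

key_feat = np.random.rand(8, 16, 16)           # stand-in for conv features
flow = np.zeros((2, 16, 16)); flow[1] += 1.5   # uniform horizontal motion
cur_feat = warp_features(key_feat, flow)       # cheap: no conv sub-network
```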

Memory-augmented Attention Modelling for Videos

Title Memory-augmented Attention Modelling for Videos
Authors Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli
Abstract We present a method to improve video description generation by modeling higher-order interactions between video frames and described concepts. By storing past visual attention in the video, associated with previously generated words, the system is able to decide what to look at and describe in light of what it has already looked at and described. This enables not only more effective local attention, but also tractable consideration of the video sequence while generating each word. Evaluation on the challenging and popular MSVD and Charades datasets demonstrates that the proposed architecture outperforms previous video description approaches without requiring external temporal video features.
Tasks Video Description
Published 2016-11-07
URL http://arxiv.org/abs/1611.02261v4
PDF http://arxiv.org/pdf/1611.02261v4.pdf
PWC https://paperswithcode.com/paper/memory-augmented-attention-modelling-for
Repo https://github.com/rasoolfa/videocap
Framework torch
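
As a rough illustration of the memory idea (an assumption-laden sketch, not the released Torch code), one can keep a running tally of past attention mass and subtract it from the attention scores, nudging the model toward frames it has not yet described:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend_with_memory(frame_feats, query, memory, w_mem=1.0):
    """frame_feats: (T, d) frame features; query: (d,) decoder state;
    memory: (T,) accumulated past attention mass."""
    scores = frame_feats @ query - w_mem * memory  # discourage re-attending
    attn = softmax(scores)
    context = attn @ frame_feats                   # attended visual summary
    return context, memory + attn                  # update the memory

feats = np.random.rand(10, 16)
memory = np.zeros(10)
for _ in range(3):                                 # one step per generated word
    ctx, memory = attend_with_memory(feats, np.random.rand(16), memory)
```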

Flood-Filling Networks

Title Flood-Filling Networks
Authors Michał Januszewski, Jeremy Maitin-Shepard, Peter Li, Jörgen Kornfeld, Winfried Denk, Viren Jain
Abstract State-of-the-art image segmentation algorithms generally consist of at least two successive and distinct computations: a boundary detection process that uses local image information to classify image locations as boundaries between objects, followed by a pixel grouping step such as watershed or connected components that clusters pixels into segments. Prior work has varied the complexity and approach employed in these two steps, including the incorporation of multi-layer neural networks to perform boundary prediction, and the use of global optimizations during pixel clustering. We propose a unified and end-to-end trainable machine learning approach, flood-filling networks, in which a recurrent 3d convolutional network directly produces individual segments from a raw image. The proposed approach robustly segments images with an unknown and variable number of objects as well as highly variable object sizes. We demonstrate the approach on a challenging 3d image segmentation task, connectomic reconstruction from volume electron microscopy data, on which flood-filling neural networks substantially improve accuracy over other state-of-the-art methods. The proposed approach can replace complex multi-step segmentation pipelines with a single neural network that is learned end-to-end.
Tasks Boundary Detection, Semantic Segmentation
Published 2016-11-01
URL http://arxiv.org/abs/1611.00421v1
PDF http://arxiv.org/pdf/1611.00421v1.pdf
PWC https://paperswithcode.com/paper/flood-filling-networks
Repo https://github.com/Animadversio/FloodFillNetwork-Notes
Framework tf
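
The inference loop is the distinctive part: the network repeatedly sees the raw volume together with its own current object mask and returns a refined mask, growing a single segment from a seed. A toy numpy version (with a hand-written stand-in for the learned 3D conv net; the real model also moves its field of view across the volume) looks like this:

```python
import numpy as np

def flood_fill(volume, seed, predict_fn, n_iters=10):
    """Grow one segment from a seed: each pass, the network sees the raw
    volume plus its own current mask and returns an updated mask."""
    mask = np.zeros_like(volume, dtype=np.float32)
    mask[seed] = 0.95                      # seed the object
    for _ in range(n_iters):
        mask = predict_fn(volume, mask)    # recurrent refinement step
    return mask > 0.5

def toy_predict(volume, mask):
    # Stand-in for the learned 3D conv net: spread probability into
    # bright neighboring voxels along the last axis.
    grown = np.maximum(mask, 0.8 * np.roll(mask, 1, axis=-1))
    grown = np.maximum(grown, 0.8 * np.roll(mask, -1, axis=-1))
    return np.where(volume > 0.5, np.clip(grown, 0, 1), 0.0)

volume = (np.random.rand(4, 8, 8) > 0.3).astype(np.float32)
segment = flood_fill(volume, (2, 4, 4), toy_predict)
```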

Cross-stitch Networks for Multi-task Learning

Title Cross-stitch Networks for Multi-task Learning
Authors Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, Martial Hebert
Abstract Multi-task learning in Convolutional Networks has displayed remarkable success in the field of recognition. This success can be largely attributed to learning shared representations from multiple supervisory tasks. However, existing multi-task approaches rely on enumerating multiple network architectures specific to the tasks at hand, which do not generalize. In this paper, we propose a principled approach to learning shared representations in ConvNets using multi-task learning. Specifically, we propose a new sharing unit: the “cross-stitch” unit. These units combine the activations from multiple networks and can be trained end-to-end. A network with cross-stitch units can learn an optimal combination of shared and task-specific representations. Our proposed method generalizes across multiple tasks and shows dramatically improved performance over baseline methods for categories with few training examples.
Tasks Multi-Task Learning
Published 2016-04-12
URL http://arxiv.org/abs/1604.03539v1
PDF http://arxiv.org/pdf/1604.03539v1.pdf
PWC https://paperswithcode.com/paper/cross-stitch-networks-for-multi-task-learning
Repo https://github.com/lorenmt/mtan
Framework pytorch
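
A cross-stitch unit is concrete enough to write in a few lines: a learned 2x2 matrix linearly mixes the activations of the two task networks at a given layer. A minimal sketch (shapes are hypothetical, not the authors' code):

```python
import numpy as np

def cross_stitch(act_a, act_b, alpha):
    """act_a, act_b: same-shaped activation maps from task A and task B.
    alpha: (2, 2) mixing matrix, learned end-to-end with the networks."""
    mixed_a = alpha[0, 0] * act_a + alpha[0, 1] * act_b
    mixed_b = alpha[1, 0] * act_a + alpha[1, 1] * act_b
    return mixed_a, mixed_b

# Near-identity initialization: mostly task-specific, a little sharing.
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
a, b = cross_stitch(np.random.rand(64, 14, 14),
                    np.random.rand(64, 14, 14), alpha)
```

Initializing alpha near the identity starts training close to two independent networks and lets gradient descent discover how much sharing each layer actually needs.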

Semantic Understanding of Scenes through the ADE20K Dataset

Title Semantic Understanding of Scenes through the ADE20K Dataset
Authors Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba
Abstract Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision. Despite the community’s efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with dense and detailed annotations for scene parsing. In this paper, we introduce and analyze the ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. A generic network design called Cascade Segmentation Module is then proposed to enable the segmentation networks to parse a scene into stuff, objects, and object parts in a cascade. We evaluate the proposed module integrated within two existing semantic segmentation networks, yielding significant improvements for scene parsing. We further show that the scene parsing networks trained on ADE20K can be applied to a wide variety of scenes and objects.
Tasks Scene Parsing, Semantic Segmentation
Published 2016-08-18
URL http://arxiv.org/abs/1608.05442v2
PDF http://arxiv.org/pdf/1608.05442v2.pdf
PWC https://paperswithcode.com/paper/semantic-understanding-of-scenes-through-the
Repo https://github.com/rickyHong/PSPNet-tensorflow-repl
Framework tf

Supervision via Competition: Robot Adversaries for Learning Tasks

Title Supervision via Competition: Robot Adversaries for Learning Tasks
Authors Lerrel Pinto, James Davidson, Abhinav Gupta
Abstract There has been a recent paradigm shift in robotics to data-driven learning for planning and control. Due to the large number of experiences required for training, most of these approaches use a self-supervised paradigm: using sensors to measure success/failure. However, in most cases, these sensors provide weak supervision at best. In this work, we propose an adversarial learning framework that pits an adversary against the robot learning the task. In an effort to defeat the adversary, the original robot learns to perform the task with more robustness, leading to overall improved performance. We show that this adversarial framework forces the robot to learn a better grasping model in order to overcome the adversary. By grasping 82% of presented novel objects compared to 68% without an adversary, we demonstrate the utility of creating adversaries. We also demonstrate through experiments that training robots in an adversarial setting may be a better learning strategy than training with multiple collaborative robots.
Tasks
Published 2016-10-05
URL http://arxiv.org/abs/1610.01685v1
PDF http://arxiv.org/pdf/1610.01685v1.pdf
PWC https://paperswithcode.com/paper/supervision-via-competition-robot-adversaries
Repo https://github.com/hudongrui/2018SU_Frank_Zihan
Framework tf
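
The framework reduces to a minimax game: the grasper is rewarded when the grasp survives, the adversary when it does not. The toy below is a bandit-style caricature with a made-up payoff matrix, not the paper's robot setup, but it shows why an adversary pushes the learner toward the robust choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical payoff: probability that grasp g survives perturbation a.
# Grasp 0 is great undisturbed but fragile; grasp 1 is robust to both.
survive_prob = np.array([[0.9, 0.2],
                         [0.7, 0.7]])
q_grasp = np.zeros(2)    # grasper's running success estimates
q_attack = np.zeros(2)   # adversary's running failure estimates

for t in range(3000):
    # Epsilon-greedy action selection for both players.
    g = q_grasp.argmax() if rng.random() > 0.1 else rng.integers(2)
    a = q_attack.argmax() if rng.random() > 0.1 else rng.integers(2)
    success = rng.random() < survive_prob[g, a]
    q_grasp[g] += 0.05 * (float(success) - q_grasp[g])
    q_attack[a] += 0.05 * (float(not success) - q_attack[a])

print("learned grasp:", q_grasp.argmax())  # should settle on robust grasp 1
```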

Improving Deep Neural Network with Multiple Parametric Exponential Linear Units

Title Improving Deep Neural Network with Multiple Parametric Exponential Linear Units
Authors Yang Li, Chunxiao Fan, Yong Li, Qiong Wu, Yue Ming
Abstract The activation function is crucial to the recent successes of deep neural networks. In this paper, we first propose a new activation function, Multiple Parametric Exponential Linear Units (MPELU), aiming to generalize and unify the rectified and exponential linear units. As the generalized form, MPELU shares the advantages of the Parametric Rectified Linear Unit (PReLU) and the Exponential Linear Unit (ELU), leading to better classification performance and convergence properties. In addition, weight initialization is very important for training very deep networks. Existing methods laid a solid foundation for networks using rectified linear units, but not for exponential linear units. This paper complements the current theory and extends it to a wider range. Specifically, we put forward a way of initialization that enables the training of very deep networks using exponential linear units. Experiments demonstrate that the proposed initialization not only helps the training process but also leads to better generalization performance. Finally, utilizing the proposed activation function and initialization, we present a deep MPELU residual architecture that achieves state-of-the-art performance on the CIFAR-10/100 datasets. The code is available at https://github.com/Coldmooon/Code-for-MPELU.
Tasks
Published 2016-06-01
URL http://arxiv.org/abs/1606.00305v3
PDF http://arxiv.org/pdf/1606.00305v3.pdf
PWC https://paperswithcode.com/paper/improving-deep-neural-network-with-multiple
Repo https://github.com/Coldmooon/Code-for-MPELU
Framework torch
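
MPELU itself is a one-liner. The sketch below follows the form described in the abstract, with the learnable alpha and beta written as plain arguments (in the network they are trained per channel or per layer):

```python
import numpy as np

def mpelu(x, alpha=1.0, beta=1.0):
    """MPELU(x) = x for x > 0, alpha * (exp(beta * x) - 1) otherwise.
    alpha = beta = 1 recovers ELU; letting alpha train gives parametric
    ELU; small beta with alpha*beta fixed approximates PReLU's slope."""
    return np.where(x > 0, x, alpha * (np.exp(beta * x) - 1.0))

x = np.linspace(-3, 3, 7)
print(mpelu(x, alpha=0.5, beta=2.0))
```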

Temporal Convolutional Networks for Action Segmentation and Detection

Title Temporal Convolutional Networks for Action Segmentation and Detection
Authors Colin Lea, Michael D. Flynn, Rene Vidal, Austin Reiter, Gregory D. Hager
Abstract The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal features from video frames and then feeding them into a temporal classifier that captures high-level temporal patterns. We introduce a new class of temporal models, which we call Temporal Convolutional Networks (TCNs), that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection. Our Encoder-Decoder TCN uses pooling and upsampling to efficiently capture long-range temporal patterns, whereas our Dilated TCN uses dilated convolutions. We show that TCNs are capable of capturing action compositions, segment durations, and long-range dependencies, and are over an order of magnitude faster to train than competing LSTM-based Recurrent Neural Networks. We apply these models to three challenging fine-grained datasets and show large improvements over the state of the art.
Tasks Action Segmentation, Skeleton Based Action Recognition
Published 2016-11-16
URL http://arxiv.org/abs/1611.05267v1
PDF http://arxiv.org/pdf/1611.05267v1.pdf
PWC https://paperswithcode.com/paper/temporal-convolutional-networks-for-action
Repo https://github.com/coderSkyChen/Action_Recognition_Zoo
Framework tf
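
The Dilated TCN variant is straightforward to sketch: stacked 1-D convolutions over time with the dilation doubling per layer, so the receptive field grows exponentially with depth. A numpy toy with hypothetical shapes and random weights, purely illustrative:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """x: (T, C_in) per-frame features; w: (k, C_in, C_out) filter.
    Causal zero-padding keeps the output length at T."""
    k = w.shape[0]
    pad = np.zeros(((k - 1) * dilation, x.shape[1]))
    xp = np.vstack([pad, x])
    taps = [xp[i * dilation : i * dilation + len(x)] for i in range(k)]
    return sum(t @ w[i] for i, t in enumerate(taps))

T, C = 100, 16
x = np.random.rand(T, C)
for d in [1, 2, 4, 8]:                          # dilation doubles per layer
    w = np.random.randn(3, x.shape[1], C) * 0.1
    x = np.maximum(dilated_conv1d(x, w, d), 0)  # ReLU
frame_logits = x @ np.random.randn(C, 5)        # 5 action classes per frame
```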

Fathom: Reference Workloads for Modern Deep Learning Methods

Title Fathom: Reference Workloads for Modern Deep Learning Methods
Authors Robert Adolf, Saketh Rama, Brandon Reagen, Gu-Yeon Wei, David Brooks
Abstract Deep learning has been popularized by its recent successes on challenging artificial intelligence problems. One of the reasons for its dominance is also an ongoing challenge: the need for immense amounts of computational power. Hardware architects have responded by proposing a wide array of promising ideas, but to date, the majority of the work has focused on specific algorithms in somewhat narrow application domains. While their specificity does not diminish these approaches, there is a clear need for more flexible solutions. We believe the first step is to examine the characteristics of cutting-edge models from across the deep learning community. Consequently, we have assembled Fathom: a collection of eight archetypal deep learning workloads for study. Each of these models comes from a seminal work in the deep learning community, ranging from the familiar deep convolutional neural network of Krizhevsky et al., to the more exotic memory networks from Facebook’s AI research group. Fathom has been released online, and this paper focuses on understanding the fundamental performance characteristics of each model. We use a set of application-level modeling tools built around the TensorFlow deep learning framework in order to analyze the behavior of the Fathom workloads. We present a breakdown of where time is spent, the similarities between the performance profiles of our models, an analysis of behavior in inference and training, and the effects of parallelism on scaling.
Tasks
Published 2016-08-23
URL http://arxiv.org/abs/1608.06581v1
PDF http://arxiv.org/pdf/1608.06581v1.pdf
PWC https://paperswithcode.com/paper/fathom-reference-workloads-for-modern-deep
Repo https://github.com/rdadolf/fathom
Framework tf

Building an Interpretable Recommender via Loss-Preserving Transformation

Title Building an Interpretable Recommender via Loss-Preserving Transformation
Authors Amit Dhurandhar, Sechan Oh, Marek Petrik
Abstract We propose a method for building an interpretable recommender system for personalizing online content and promotions. Historical data available for the system consists of customer features, provided content (promotions), and user responses. Unlike in a standard multi-class classification setting, misclassification costs depend on both recommended actions and customers. Our method transforms such a data set to a new set which can be used with standard interpretable multi-class classification algorithms. The transformation has the desirable property that minimizing the standard misclassification penalty in this new space is equivalent to minimizing the custom cost function.
Tasks Recommendation Systems
Published 2016-06-19
URL http://arxiv.org/abs/1606.05819v1
PDF http://arxiv.org/pdf/1606.05819v1.pdf
PWC https://paperswithcode.com/paper/building-an-interpretable-recommender-via
Repo https://github.com/dmalagarriga/Interpretable-ML-literature
Framework none
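
The abstract does not spell out the transformation itself, so the sketch below shows a classical cost-sensitive reduction in the same spirit, not necessarily the paper's exact construction: relabel each customer with its least-cost action and weight the example by the stakes of the decision, so a standard weighted classifier can be trained on the result.

```python
import numpy as np

def to_weighted_classification(X, cost):
    """X: (n, d) customer features; cost: (n, k) cost of each action.
    Returns features, target actions, and per-example importance weights
    usable with any classifier that accepts sample weights."""
    y = cost.argmin(axis=1)                    # cheapest action per customer
    w = cost.max(axis=1) - cost.min(axis=1)    # stakes of this decision
    return X, y, w

X = np.random.rand(6, 3)
cost = np.random.rand(6, 4)                    # 4 candidate promotions
X2, y, w = to_weighted_classification(X, cost)
```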

Stochastic Structured Prediction under Bandit Feedback

Title Stochastic Structured Prediction under Bandit Feedback
Authors Artem Sokolov, Julia Kreutzer, Christopher Lo, Stefan Riezler
Abstract Stochastic structured prediction under bandit feedback follows a learning protocol where, on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in the form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analyze them as stochastic first-order methods. We present an experimental evaluation on problems of natural language processing over exponential output spaces, and compare convergence speed across different objectives under the practical criterion of optimal task performance on development data and the optimization-theoretic criterion of minimal squared gradient norm. Best results under both criteria are obtained for a non-convex objective for pairwise preference learning under bandit feedback.
Tasks Structured Prediction
Published 2016-06-02
URL http://arxiv.org/abs/1606.00739v2
PDF http://arxiv.org/pdf/1606.00739v2.pdf
PWC https://paperswithcode.com/paper/stochastic-structured-prediction-under-bandit
Repo https://github.com/juliakreutzer/bandit-cdec
Framework none
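
The learning protocol maps naturally onto a score-function ("expected loss") gradient: sample a structure from a log-linear model, observe only its task loss, and move the weights against the loss times the gradient of the log-probability. A self-contained toy over a tiny candidate set (illustrative, not the paper's cdec-based code):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bandit_step(w, feats, loss_fn, lr=0.1):
    """feats: (n_structures, d) feature vectors of candidate outputs.
    loss_fn(j) -> task loss of sampled structure j (the only feedback)."""
    p = softmax(feats @ w)
    j = rng.choice(len(p), p=p)
    loss = loss_fn(j)                      # partial, per-sample feedback
    grad = loss * (feats[j] - p @ feats)   # loss * grad of log p_w(y_j | x)
    return w - lr * grad

# Toy problem: 4 candidate structures, structure 2 has zero loss.
feats = rng.normal(size=(4, 3))
true_losses = np.array([1.0, 0.8, 0.0, 0.6])
w = np.zeros(3)
for _ in range(500):
    w = bandit_step(w, feats, lambda j: true_losses[j])
print(softmax(feats @ w))   # mass should concentrate on structure 2
```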

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

Title 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions
Authors Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, Thomas Funkhouser
Abstract Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-the-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose a self-supervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions. Experiments show that our descriptor is not only able to match local geometry in new scenes for reconstruction, but also to generalize to different tasks and spatial scales (e.g. instance-level object model alignment for the Amazon Picking Challenge, and mesh surface correspondence). Results show that 3DMatch consistently outperforms other state-of-the-art approaches by a significant margin. Code, data, benchmarks, and pre-trained models are available online at http://3dmatch.cs.princeton.edu
Tasks 3D Reconstruction
Published 2016-03-27
URL http://arxiv.org/abs/1603.08182v3
PDF http://arxiv.org/pdf/1603.08182v3.pdf
PWC https://paperswithcode.com/paper/3dmatch-learning-local-geometric-descriptors
Repo https://github.com/andyzeng/3dmatch-toolbox
Framework none
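
Once patches are embedded, establishing correspondences is ordinary descriptor matching. The snippet below is generic mutual-nearest-neighbor matching, not part of the 3DMatch toolbox, and the descriptor dimension is a placeholder:

```python
import numpy as np

def mutual_nearest_matches(desc_a, desc_b):
    """desc_a: (n, d), desc_b: (m, d) local patch descriptors."""
    dist = np.linalg.norm(desc_a[:, None] - desc_b[None, :], axis=2)
    ab = dist.argmin(axis=1)            # best match in B for each A keypoint
    ba = dist.argmin(axis=0)            # best match in A for each B keypoint
    return [(i, ab[i]) for i in range(len(desc_a)) if ba[ab[i]] == i]

desc_a = np.random.rand(20, 128)                        # placeholder dim
desc_b = desc_a[::-1] + 0.01 * np.random.rand(20, 128)  # noisy permuted copy
print(mutual_nearest_matches(desc_a, desc_b)[:5])       # (i, 19 - i) pairs
```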

Neural Word Segmentation Learning for Chinese

Title Neural Word Segmentation Learning for Chinese
Authors Deng Cai, Hai Zhao
Abstract Most previous approaches to Chinese word segmentation formalize the problem as a character-based sequence labeling task, where only contextual information within fixed-size local windows and simple interactions between adjacent tags can be captured. In this paper, we propose a novel neural framework which thoroughly eliminates context windows and can utilize the complete segmentation history. Our model employs a gated combination neural network over characters to produce distributed representations of word candidates, which are then given to a long short-term memory (LSTM) language scoring model. Experiments on benchmark datasets show that, without the feature engineering that most existing approaches rely on, our models achieve performance competitive with or better than previous state-of-the-art methods.
Tasks Chinese Word Segmentation, Feature Engineering
Published 2016-06-14
URL http://arxiv.org/abs/1606.04300v2
PDF http://arxiv.org/pdf/1606.04300v2.pdf
PWC https://paperswithcode.com/paper/neural-word-segmentation-learning-for-chinese
Repo https://github.com/jcyk/CWS
Framework none
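
The key departure from sequence labeling is scoring whole segmentations at the word level. The toy below enumerates candidate segmentations of a short string and keeps the highest-scoring one; a hand-written lexicon stands in for the paper's learned gated-composition/LSTM scorer, and the real system uses beam search rather than exhaustive enumeration:

```python
def segmentations(s, max_word_len=3):
    """Yield every segmentation of s into words of bounded length."""
    if not s:
        yield []
        return
    for k in range(1, min(max_word_len, len(s)) + 1):
        for rest in segmentations(s[k:], max_word_len):
            yield [s[:k]] + rest

def best_segmentation(s, score_word):
    return max(segmentations(s), key=lambda seg: sum(map(score_word, seg)))

# Toy scorer standing in for the learned LSTM language score.
lexicon = {"中国": 2.0, "人民": 2.0, "中": 0.1, "国": 0.1, "人": 0.1, "民": 0.1}
print(best_segmentation("中国人民", lambda w: lexicon.get(w, -1.0)))
```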

Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks

Title Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks
Authors Ji Young Lee, Franck Dernoncourt
Abstract Recent approaches based on artificial neural networks (ANNs) have shown promising results for short-text classification. However, many short texts occur in sequences (e.g., sentences in a document or utterances in a dialog), and most existing ANN-based systems do not leverage the preceding short texts when classifying a subsequent one. In this work, we present a model based on recurrent neural networks and convolutional neural networks that incorporates the preceding short texts. Our model achieves state-of-the-art results on three different datasets for dialog act prediction.
Tasks Text Classification
Published 2016-03-12
URL http://arxiv.org/abs/1603.03827v1
PDF http://arxiv.org/pdf/1603.03827v1.pdf
PWC https://paperswithcode.com/paper/sequential-short-text-classification-with
Repo https://github.com/Franck-Dernoncourt/naacl2016
Framework none
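
The sequence-aware part can be sketched independently of the encoder: classify utterance t from its own encoding together with the encodings of the previous d utterances. Dimensions below are made up, and simple concatenation stands in for the paper's feed-forward combination of representations:

```python
import numpy as np

def classify_sequence(encodings, W, history=2):
    """encodings: (n_texts, d) vectors from an RNN/CNN short-text encoder.
    W: (d * (history + 1), n_classes) classifier weights."""
    n, d = encodings.shape
    padded = np.vstack([np.zeros((history, d)), encodings])
    logits = []
    for t in range(n):
        ctx = padded[t : t + history + 1].reshape(-1)  # prev texts + current
        logits.append(ctx @ W)
    return np.argmax(logits, axis=1)

enc = np.random.rand(5, 8)         # five utterances in a dialog
W = np.random.randn(8 * 3, 4)      # four dialog-act classes
print(classify_sequence(enc, W))
```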