May 7, 2019

2730 words 13 mins read

Paper Group AWR 76

A Deep Learning Approach to Unsupervised Ensemble Learning. Deep Feature Flow for Video Recognition. Memory-augmented Attention Modelling for Videos. Flood-Filling Networks. Cross-stitch Networks for Multi-task Learning. Semantic Understanding of Scenes through the ADE20K Dataset. Supervision via Competition: Robot Adversaries for Learning Tasks. I …

A Deep Learning Approach to Unsupervised Ensemble Learning

Title A Deep Learning Approach to Unsupervised Ensemble Learning
Authors Uri Shaham, Xiuyuan Cheng, Omer Dror, Ariel Jaffe, Boaz Nadler, Joseph Chang, Yuval Kluger
Abstract We show how deep learning methods can be applied in the context of crowdsourcing and unsupervised ensemble learning. First, we prove that the popular model of Dawid and Skene, which assumes that all classifiers are conditionally independent, is equivalent to a Restricted Boltzmann Machine (RBM) with a single hidden node. Hence, under this model, the posterior probabilities of the true labels can instead be estimated via a trained RBM. Next, to address the more general case, where classifiers may strongly violate the conditional independence assumption, we propose to apply an RBM-based Deep Neural Network (DNN). Experimental results on various simulated and real-world datasets demonstrate that our proposed DNN approach outperforms other state-of-the-art methods, in particular when the data violates the conditional independence assumption.
Tasks
Published 2016-02-06
URL http://arxiv.org/abs/1602.02285v1
PDF http://arxiv.org/pdf/1602.02285v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-approach-to-unsupervised
Repo https://github.com/ushaham/RBMpaper
Framework none
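
The paper's central identity is easy to state in code: under the Dawid-Skene model, the posterior of the true label is a logistic function of the classifier votes, exactly as in an RBM with one hidden node. Below is a minimal numpy sketch of that posterior (not the authors' repo code; the weights here are hypothetical, while in the paper they are learned by training the RBM on the unlabeled vote matrix):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_posterior(votes, w, b_hidden):
    """Posterior p(y=1 | votes) under a single-hidden-node RBM.

    votes    : (n_samples, n_classifiers) array of {0, 1} predictions
    w        : (n_classifiers,) visible-to-hidden weights
    b_hidden : scalar hidden bias
    """
    return sigmoid(votes @ w + b_hidden)

# Toy example with hypothetical weights: three classifiers, the first
# two reliable (large positive weight), the third near-random.
votes = np.array([[1, 1, 0], [0, 0, 1]])
w = np.array([2.0, 2.0, 0.1])
print(rbm_posterior(votes, w, b_hidden=-2.0))
```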

Deep Feature Flow for Video Recognition

Title Deep Feature Flow for Video Recognition
Authors Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei
Abstract Deep convolutional neural networks have achieved great success on image recognition tasks. Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos, as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup, as flow computation is relatively fast. End-to-end training of the whole architecture significantly boosts recognition accuracy. Deep feature flow is flexible and general. It is validated on two recent large-scale video datasets and makes a large step towards practical video recognition.
Tasks Video Recognition
Published 2016-11-23
URL http://arxiv.org/abs/1611.07715v2
PDF http://arxiv.org/pdf/1611.07715v2.pdf
PWC https://paperswithcode.com/paper/deep-feature-flow-for-video-recognition
Repo https://github.com/Scalsol/mega.pytorch
Framework pytorch
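
The mechanism is simple enough to sketch: features are computed once on a key frame and warped to the current frame along a flow field, skipping the expensive convolutional sub-network. The numpy toy below uses nearest-neighbor sampling for brevity (the paper uses bilinear warping, and all shapes here are made up):

```python
import numpy as np

def warp_features(key_feat, flow):
    """key_feat: (C, H, W) features from the key frame.
    flow: (2, H, W) displacement (dy, dx) from each current-frame
    position to its corresponding key-frame position."""
    C, H, W = key_feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.round(ys + flow[0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[1]).astype(int), 0, W - 1)
    return key_feat[:, src_y, src_x]

key_feat = np.random.rand(8, 16, 16)           # stand-in for conv features
flow = np.zeros((2, 16, 16)); flow[1] += 1.5   # uniform horizontal motion
cur_feat = warp_features(key_feat, flow)       # cheap: no conv sub-network
```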

Memory-augmented Attention Modelling for Videos

Title Memory-augmented Attention Modelling for Videos
Authors Rasool Fakoor, Abdel-rahman Mohamed, Margaret Mitchell, Sing Bing Kang, Pushmeet Kohli
Abstract We present a method to improve video description generation by modeling higher-order interactions between video frames and described concepts. By storing past visual attention in the video, associated with previously generated words, the system is able to decide what to look at and describe in light of what it has already looked at and described. This enables not only more effective local attention, but also tractable consideration of the video sequence while generating each word. Evaluation on the challenging and popular MSVD and Charades datasets demonstrates that the proposed architecture outperforms previous video description approaches without requiring external temporal video features.
Tasks Video Description
Published 2016-11-07
URL http://arxiv.org/abs/1611.02261v4
PDF http://arxiv.org/pdf/1611.02261v4.pdf
PWC https://paperswithcode.com/paper/memory-augmented-attention-modelling-for
Repo https://github.com/rasoolfa/videocap
Framework torch
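
As a rough illustration of the memory idea (an assumption-laden sketch, not the released Torch code), one can keep a running tally of past attention mass and subtract it from the attention scores, nudging the model toward frames it has not yet described:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend_with_memory(frame_feats, query, memory, w_mem=1.0):
    """frame_feats: (T, d) frame features; query: (d,) decoder state;
    memory: (T,) accumulated past attention mass."""
    scores = frame_feats @ query - w_mem * memory  # discourage re-attending
    attn = softmax(scores)
    context = attn @ frame_feats                   # attended visual summary
    return context, memory + attn                  # update the memory

feats = np.random.rand(10, 16)
memory = np.zeros(10)
for _ in range(3):                                 # one step per generated word
    ctx, memory = attend_with_memory(feats, np.random.rand(16), memory)
```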

Flood-Filling Networks

Title Flood-Filling Networks
Authors Michał Januszewski, Jeremy Maitin-Shepard, Peter Li, Jörgen Kornfeld, Winfried Denk, Viren Jain
Abstract State-of-the-art image segmentation algorithms generally consist of at least two successive and distinct computations: a boundary detection process that uses local image information to classify image locations as boundaries between objects, followed by a pixel grouping step such as watershed or connected components that clusters pixels into segments. Prior work has varied the complexity and approach employed in these two steps, including the incorporation of multi-layer neural networks to perform boundary prediction, and the use of global optimizations during pixel clustering. We propose a unified and end-to-end trainable machine learning approach, flood-filling networks, in which a recurrent 3d convolutional network directly produces individual segments from a raw image. The proposed approach robustly segments images with an unknown and variable number of objects as well as highly variable object sizes. We demonstrate the approach on a challenging 3d image segmentation task, connectomic reconstruction from volume electron microscopy data, on which flood-filling neural networks substantially improve accuracy over other state-of-the-art methods. The proposed approach can replace complex multi-step segmentation pipelines with a single neural network that is learned end-to-end.
Tasks Boundary Detection, Semantic Segmentation
Published 2016-11-01
URL http://arxiv.org/abs/1611.00421v1
PDF http://arxiv.org/pdf/1611.00421v1.pdf
PWC https://paperswithcode.com/paper/flood-filling-networks
Repo https://github.com/Animadversio/FloodFillNetwork-Notes
Framework tf
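
The inference loop is the distinctive part: the network repeatedly sees the raw volume together with its own current object mask and returns a refined mask, growing a single segment from a seed. A toy numpy version (with a hand-written stand-in for the learned 3D conv net; the real model also moves its field of view across the volume) looks like this:

```python
import numpy as np

def flood_fill(volume, seed, predict_fn, n_iters=10):
    """Grow one segment from a seed: each pass, the network sees the raw
    volume plus its own current mask and returns an updated mask."""
    mask = np.zeros_like(volume, dtype=np.float32)
    mask[seed] = 0.95                      # seed the object
    for _ in range(n_iters):
        mask = predict_fn(volume, mask)    # recurrent refinement step
    return mask > 0.5

def toy_predict(volume, mask):
    # Stand-in for the learned 3D conv net: spread probability into
    # bright neighboring voxels along the last axis.
    grown = np.maximum(mask, 0.8 * np.roll(mask, 1, axis=-1))
    grown = np.maximum(grown, 0.8 * np.roll(mask, -1, axis=-1))
    return np.where(volume > 0.5, np.clip(grown, 0, 1), 0.0)

volume = (np.random.rand(4, 8, 8) > 0.3).astype(np.float32)
segment = flood_fill(volume, (2, 4, 4), toy_predict)
```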

Cross-stitch Networks for Multi-task Learning

Title Cross-stitch Networks for Multi-task Learning
Authors Ishan Misra, Abhinav Shrivastava, Abhinav Gupta, Martial Hebert
Abstract Multi-task learning in Convolutional Networks has displayed remarkable success in the field of recognition. This success can be largely attributed to learning shared representations from multiple supervisory tasks. However, existing multi-task approaches rely on enumerating multiple network architectures specific to the tasks at hand, which do not generalize. In this paper, we propose a principled approach to learning shared representations in ConvNets using multi-task learning. Specifically, we propose a new sharing unit: the “cross-stitch” unit. These units combine the activations from multiple networks and can be trained end-to-end. A network with cross-stitch units can learn an optimal combination of shared and task-specific representations. Our proposed method generalizes across multiple tasks and shows dramatically improved performance over baseline methods for categories with few training examples.
Tasks Multi-Task Learning
Published 2016-04-12
URL http://arxiv.org/abs/1604.03539v1
PDF http://arxiv.org/pdf/1604.03539v1.pdf
PWC https://paperswithcode.com/paper/cross-stitch-networks-for-multi-task-learning
Repo https://github.com/lorenmt/mtan
Framework pytorch
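
A cross-stitch unit is concrete enough to write in a few lines: a learned 2x2 matrix linearly mixes the activations of the two task networks at a given layer. A minimal sketch (shapes are hypothetical, not the authors' code):

```python
import numpy as np

def cross_stitch(act_a, act_b, alpha):
    """act_a, act_b: same-shaped activation maps from task A and task B.
    alpha: (2, 2) mixing matrix, learned end-to-end with the networks."""
    mixed_a = alpha[0, 0] * act_a + alpha[0, 1] * act_b
    mixed_b = alpha[1, 0] * act_a + alpha[1, 1] * act_b
    return mixed_a, mixed_b

# Near-identity initialization: mostly task-specific, a little sharing.
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
a, b = cross_stitch(np.random.rand(64, 14, 14),
                    np.random.rand(64, 14, 14), alpha)
```

Initializing alpha near the identity starts training close to two independent networks and lets gradient descent discover how much sharing each layer actually needs.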

Semantic Understanding of Scenes through the ADE20K Dataset

Title Semantic Understanding of Scenes through the ADE20K Dataset
Authors Bolei Zhou, Hang Zhao, Xavier Puig, Tete Xiao, Sanja Fidler, Adela Barriuso, Antonio Torralba
Abstract Scene parsing, or recognizing and segmenting objects and stuff in an image, is one of the key problems in computer vision. Despite the community’s efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with dense and detailed annotations for scene parsing. In this paper, we introduce and analyze the ADE20K dataset, spanning diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. A generic network design called Cascade Segmentation Module is then proposed to enable the segmentation networks to parse a scene into stuff, objects, and object parts in a cascade. We evaluate the proposed module integrated within two existing semantic segmentation networks, yielding significant improvements for scene parsing. We further show that the scene parsing networks trained on ADE20K can be applied to a wide variety of scenes and objects.
Tasks Scene Parsing, Semantic Segmentation
Published 2016-08-18
URL http://arxiv.org/abs/1608.05442v2
PDF http://arxiv.org/pdf/1608.05442v2.pdf
PWC https://paperswithcode.com/paper/semantic-understanding-of-scenes-through-the
Repo https://github.com/rickyHong/PSPNet-tensorflow-repl
Framework tf

Supervision via Competition: Robot Adversaries for Learning Tasks

Title Supervision via Competition: Robot Adversaries for Learning Tasks
Authors Lerrel Pinto, James Davidson, Abhinav Gupta
Abstract There has been a recent paradigm shift in robotics to data-driven learning for planning and control. Due to the large number of experiences required for training, most of these approaches use a self-supervised paradigm: using sensors to measure success/failure. However, in most cases, these sensors provide weak supervision at best. In this work, we propose an adversarial learning framework that pits an adversary against the robot learning the task. In an effort to defeat the adversary, the original robot learns to perform the task with more robustness, leading to overall improved performance. We show that this adversarial framework forces the robot to learn a better grasping model in order to overcome the adversary. By grasping 82% of presented novel objects compared to 68% without an adversary, we demonstrate the utility of creating adversaries. We also demonstrate through experiments that training robots in an adversarial setting may be a better learning strategy than training with multiple collaborative robots.
Tasks
Published 2016-10-05
URL http://arxiv.org/abs/1610.01685v1
PDF http://arxiv.org/pdf/1610.01685v1.pdf
PWC https://paperswithcode.com/paper/supervision-via-competition-robot-adversaries
Repo https://github.com/hudongrui/2018SU_Frank_Zihan
Framework tf
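
The framework reduces to a minimax game: the grasper is rewarded when the grasp survives, the adversary when it does not. The toy below is a bandit-style caricature with a made-up payoff matrix, not the paper's robot setup, but it shows why an adversary pushes the learner toward the robust choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical payoff: probability that grasp g survives perturbation a.
# Grasp 0 is great undisturbed but fragile; grasp 1 is robust to both.
survive_prob = np.array([[0.9, 0.2],
                         [0.7, 0.7]])
q_grasp = np.zeros(2)    # grasper's running success estimates
q_attack = np.zeros(2)   # adversary's running failure estimates

for t in range(3000):
    # Epsilon-greedy action selection for both players.
    g = q_grasp.argmax() if rng.random() > 0.1 else rng.integers(2)
    a = q_attack.argmax() if rng.random() > 0.1 else rng.integers(2)
    success = rng.random() < survive_prob[g, a]
    q_grasp[g] += 0.05 * (float(success) - q_grasp[g])
    q_attack[a] += 0.05 * (float(not success) - q_attack[a])

print("learned grasp:", q_grasp.argmax())  # should settle on robust grasp 1
```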

Improving Deep Neural Network with Multiple Parametric Exponential Linear Units

Title Improving Deep Neural Network with Multiple Parametric Exponential Linear Units
Authors Yang Li, Chunxiao Fan, Yong Li, Qiong Wu, Yue Ming
Abstract The activation function is crucial to the recent successes of deep neural networks. In this paper, we first propose a new activation function, Multiple Parametric Exponential Linear Units (MPELU), aiming to generalize and unify the rectified and exponential linear units. As the generalized form, MPELU shares the advantages of the Parametric Rectified Linear Unit (PReLU) and the Exponential Linear Unit (ELU), leading to better classification performance and convergence properties. In addition, weight initialization is very important for training very deep networks. Existing methods laid a solid foundation for networks using rectified linear units, but not for exponential linear units. This paper complements the current theory and extends it to a wider range. Specifically, we put forward a way of initialization that enables the training of very deep networks using exponential linear units. Experiments demonstrate that the proposed initialization not only helps the training process but also leads to better generalization performance. Finally, utilizing the proposed activation function and initialization, we present a deep MPELU residual architecture that achieves state-of-the-art performance on the CIFAR-10/100 datasets. The code is available at https://github.com/Coldmooon/Code-for-MPELU.
Tasks
Published 2016-06-01
URL http://arxiv.org/abs/1606.00305v3
PDF http://arxiv.org/pdf/1606.00305v3.pdf
PWC https://paperswithcode.com/paper/improving-deep-neural-network-with-multiple
Repo https://github.com/Coldmooon/Code-for-MPELU
Framework torch
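
MPELU itself is a one-liner. The sketch below follows the form described in the abstract, with the learnable alpha and beta written as plain arguments (in the network they are trained per channel or per layer):

```python
import numpy as np

def mpelu(x, alpha=1.0, beta=1.0):
    """MPELU(x) = x for x > 0, alpha * (exp(beta * x) - 1) otherwise.
    alpha = beta = 1 recovers ELU; letting alpha train gives parametric
    ELU; small beta with alpha*beta fixed approximates PReLU's slope."""
    return np.where(x > 0, x, alpha * (np.exp(beta * x) - 1.0))

x = np.linspace(-3, 3, 7)
print(mpelu(x, alpha=0.5, beta=2.0))
```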

Temporal Convolutional Networks for Action Segmentation and Detection

Title Temporal Convolutional Networks for Action Segmentation and Detection
Authors Colin Lea, Michael D. Flynn, Rene Vidal, Austin Reiter, Gregory D. Hager
Abstract The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal features from video frames and then feeding them into a temporal classifier that captures high-level temporal patterns. We introduce a new class of temporal models, which we call Temporal Convolutional Networks (TCNs), that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection. Our Encoder-Decoder TCN uses pooling and upsampling to efficiently capture long-range temporal patterns, whereas our Dilated TCN uses dilated convolutions. We show that TCNs are capable of capturing action compositions, segment durations, and long-range dependencies, and are over an order of magnitude faster to train than competing LSTM-based Recurrent Neural Networks. We apply these models to three challenging fine-grained datasets and show large improvements over the state of the art.
Tasks Action Segmentation, Skeleton Based Action Recognition
Published 2016-11-16
URL http://arxiv.org/abs/1611.05267v1
PDF http://arxiv.org/pdf/1611.05267v1.pdf
PWC https://paperswithcode.com/paper/temporal-convolutional-networks-for-action
Repo https://github.com/coderSkyChen/Action_Recognition_Zoo
Framework tf
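
The Dilated TCN variant is straightforward to sketch: stacked 1-D convolutions over time with the dilation doubling per layer, so the receptive field grows exponentially with depth. A numpy toy with hypothetical shapes and random weights, purely illustrative:

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """x: (T, C_in) per-frame features; w: (k, C_in, C_out) filter.
    Causal zero-padding keeps the output length at T."""
    k = w.shape[0]
    pad = np.zeros(((k - 1) * dilation, x.shape[1]))
    xp = np.vstack([pad, x])
    taps = [xp[i * dilation : i * dilation + len(x)] for i in range(k)]
    return sum(t @ w[i] for i, t in enumerate(taps))

T, C = 100, 16
x = np.random.rand(T, C)
for d in [1, 2, 4, 8]:                          # dilation doubles per layer
    w = np.random.randn(3, x.shape[1], C) * 0.1
    x = np.maximum(dilated_conv1d(x, w, d), 0)  # ReLU
frame_logits = x @ np.random.randn(C, 5)        # 5 action classes per frame
```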

Fathom: Reference Workloads for Modern Deep Learning Methods

Title Fathom: Reference Workloads for Modern Deep Learning Methods
Authors Robert Adolf, Saketh Rama, Brandon Reagen, Gu-Yeon Wei, David Brooks
Abstract Deep learning has been popularized by its recent successes on challenging artificial intelligence problems. One of the reasons for its dominance is also an ongoing challenge: the need for immense amounts of computational power. Hardware architects have responded by proposing a wide array of promising ideas, but to date, the majority of the work has focused on specific algorithms in somewhat narrow application domains. While their specificity does not diminish these approaches, there is a clear need for more flexible solutions. We believe the first step is to examine the characteristics of cutting-edge models from across the deep learning community. Consequently, we have assembled Fathom: a collection of eight archetypal deep learning workloads for study. Each of these models comes from a seminal work in the deep learning community, ranging from the familiar deep convolutional neural network of Krizhevsky et al., to the more exotic memory networks from Facebook’s AI research group. Fathom has been released online, and this paper focuses on understanding the fundamental performance characteristics of each model. We use a set of application-level modeling tools built around the TensorFlow deep learning framework in order to analyze the behavior of the Fathom workloads. We present a breakdown of where time is spent, the similarities between the performance profiles of our models, an analysis of behavior in inference and training, and the effects of parallelism on scaling.
Tasks
Published 2016-08-23
URL http://arxiv.org/abs/1608.06581v1
PDF http://arxiv.org/pdf/1608.06581v1.pdf
PWC https://paperswithcode.com/paper/fathom-reference-workloads-for-modern-deep
Repo https://github.com/rdadolf/fathom
Framework tf

Building an Interpretable Recommender via Loss-Preserving Transformation

Title Building an Interpretable Recommender via Loss-Preserving Transformation
Authors Amit Dhurandhar, Sechan Oh, Marek Petrik
Abstract We propose a method for building an interpretable recommender system for personalizing online content and promotions. Historical data available for the system consists of customer features, provided content (promotions), and user responses. Unlike in a standard multi-class classification setting, misclassification costs depend on both recommended actions and customers. Our method transforms such a data set to a new set which can be used with standard interpretable multi-class classification algorithms. The transformation has the desirable property that minimizing the standard misclassification penalty in this new space is equivalent to minimizing the custom cost function.
Tasks Recommendation Systems
Published 2016-06-19
URL http://arxiv.org/abs/1606.05819v1
PDF http://arxiv.org/pdf/1606.05819v1.pdf
PWC https://paperswithcode.com/paper/building-an-interpretable-recommender-via
Repo https://github.com/dmalagarriga/Interpretable-ML-literature
Framework none
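
The abstract does not spell out the transformation itself, so the sketch below shows a classical cost-sensitive reduction in the same spirit, not necessarily the paper's exact construction: relabel each customer with its least-cost action and weight the example by the stakes of the decision, so a standard weighted classifier can be trained on the result.

```python
import numpy as np

def to_weighted_classification(X, cost):
    """X: (n, d) customer features; cost: (n, k) cost of each action.
    Returns features, target actions, and per-example importance weights
    usable with any classifier that accepts sample weights."""
    y = cost.argmin(axis=1)                    # cheapest action per customer
    w = cost.max(axis=1) - cost.min(axis=1)    # stakes of this decision
    return X, y, w

X = np.random.rand(6, 3)
cost = np.random.rand(6, 4)                    # 4 candidate promotions
X2, y, w = to_weighted_classification(X, cost)
```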

Stochastic Structured Prediction under Bandit Feedback

Title Stochastic Structured Prediction under Bandit Feedback
Authors Artem Sokolov, Julia Kreutzer, Christopher Lo, Stefan Riezler
Abstract Stochastic structured prediction under bandit feedback follows a learning protocol where, on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in the form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analyze them as stochastic first-order methods. We present an experimental evaluation on problems of natural language processing over exponential output spaces, and compare convergence speed across different objectives under the practical criterion of optimal task performance on development data and the optimization-theoretic criterion of minimal squared gradient norm. Best results under both criteria are obtained for a non-convex objective for pairwise preference learning under bandit feedback.
Tasks Structured Prediction
Published 2016-06-02
URL http://arxiv.org/abs/1606.00739v2
PDF http://arxiv.org/pdf/1606.00739v2.pdf
PWC https://paperswithcode.com/paper/stochastic-structured-prediction-under-bandit
Repo https://github.com/juliakreutzer/bandit-cdec
Framework none
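
The learning protocol maps naturally onto a score-function ("expected loss") gradient: sample a structure from a log-linear model, observe only its task loss, and move the weights against the loss times the gradient of the log-probability. A self-contained toy over a tiny candidate set (illustrative, not the paper's cdec-based code):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bandit_step(w, feats, loss_fn, lr=0.1):
    """feats: (n_structures, d) feature vectors of candidate outputs.
    loss_fn(j) -> task loss of sampled structure j (the only feedback)."""
    p = softmax(feats @ w)
    j = rng.choice(len(p), p=p)
    loss = loss_fn(j)                      # partial, per-sample feedback
    grad = loss * (feats[j] - p @ feats)   # loss * grad of log p_w(y_j | x)
    return w - lr * grad

# Toy problem: 4 candidate structures, structure 2 has zero loss.
feats = rng.normal(size=(4, 3))
true_losses = np.array([1.0, 0.8, 0.0, 0.6])
w = np.zeros(3)
for _ in range(500):
    w = bandit_step(w, feats, lambda j: true_losses[j])
print(softmax(feats @ w))   # mass should concentrate on structure 2
```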

3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions

Title 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions
Authors Andy Zeng, Shuran Song, Matthias Nießner, Matthew Fisher, Jianxiong Xiao, Thomas Funkhouser
Abstract Matching local geometric features on real-world depth images is a challenging task due to the noisy, low-resolution, and incomplete nature of 3D scan data. These difficulties limit the performance of current state-of-the-art methods, which are typically based on histograms over geometric properties. In this paper, we present 3DMatch, a data-driven model that learns a local volumetric patch descriptor for establishing correspondences between partial 3D data. To amass training data for our model, we propose a self-supervised feature learning method that leverages the millions of correspondence labels found in existing RGB-D reconstructions. Experiments show that our descriptor is not only able to match local geometry in new scenes for reconstruction, but also to generalize to different tasks and spatial scales (e.g. instance-level object model alignment for the Amazon Picking Challenge, and mesh surface correspondence). Results show that 3DMatch consistently outperforms other state-of-the-art approaches by a significant margin. Code, data, benchmarks, and pre-trained models are available online at http://3dmatch.cs.princeton.edu
Tasks 3D Reconstruction
Published 2016-03-27
URL http://arxiv.org/abs/1603.08182v3
PDF http://arxiv.org/pdf/1603.08182v3.pdf
PWC https://paperswithcode.com/paper/3dmatch-learning-local-geometric-descriptors
Repo https://github.com/andyzeng/3dmatch-toolbox
Framework none
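
Once patches are embedded, establishing correspondences is ordinary descriptor matching. The snippet below is generic mutual-nearest-neighbor matching, not part of the 3DMatch toolbox, and the descriptor dimension is a placeholder:

```python
import numpy as np

def mutual_nearest_matches(desc_a, desc_b):
    """desc_a: (n, d), desc_b: (m, d) local patch descriptors."""
    dist = np.linalg.norm(desc_a[:, None] - desc_b[None, :], axis=2)
    ab = dist.argmin(axis=1)            # best match in B for each A keypoint
    ba = dist.argmin(axis=0)            # best match in A for each B keypoint
    return [(i, ab[i]) for i in range(len(desc_a)) if ba[ab[i]] == i]

desc_a = np.random.rand(20, 128)                        # placeholder dim
desc_b = desc_a[::-1] + 0.01 * np.random.rand(20, 128)  # noisy permuted copy
print(mutual_nearest_matches(desc_a, desc_b)[:5])       # (i, 19 - i) pairs
```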

Neural Word Segmentation Learning for Chinese

Title Neural Word Segmentation Learning for Chinese
Authors Deng Cai, Hai Zhao
Abstract Most previous approaches to Chinese word segmentation formalize the problem as a character-based sequence labeling task, where only contextual information within fixed-size local windows and simple interactions between adjacent tags can be captured. In this paper, we propose a novel neural framework which thoroughly eliminates context windows and can utilize the complete segmentation history. Our model employs a gated combination neural network over characters to produce distributed representations of word candidates, which are then given to a long short-term memory (LSTM) language scoring model. Experiments on benchmark datasets show that, without the feature engineering that most existing approaches rely on, our models achieve performance competitive with or better than previous state-of-the-art methods.
Tasks Chinese Word Segmentation, Feature Engineering
Published 2016-06-14
URL http://arxiv.org/abs/1606.04300v2
PDF http://arxiv.org/pdf/1606.04300v2.pdf
PWC https://paperswithcode.com/paper/neural-word-segmentation-learning-for-chinese
Repo https://github.com/jcyk/CWS
Framework none
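
The key departure from sequence labeling is scoring whole segmentations at the word level. The toy below enumerates candidate segmentations of a short string and keeps the highest-scoring one; a hand-written lexicon stands in for the paper's learned gated-composition/LSTM scorer, and the real system uses beam search rather than exhaustive enumeration:

```python
def segmentations(s, max_word_len=3):
    """Yield every segmentation of s into words of bounded length."""
    if not s:
        yield []
        return
    for k in range(1, min(max_word_len, len(s)) + 1):
        for rest in segmentations(s[k:], max_word_len):
            yield [s[:k]] + rest

def best_segmentation(s, score_word):
    return max(segmentations(s), key=lambda seg: sum(map(score_word, seg)))

# Toy scorer standing in for the learned LSTM language score.
lexicon = {"中国": 2.0, "人民": 2.0, "中": 0.1, "国": 0.1, "人": 0.1, "民": 0.1}
print(best_segmentation("中国人民", lambda w: lexicon.get(w, -1.0)))
```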

Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks

Title Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks
Authors Ji Young Lee, Franck Dernoncourt
Abstract Recent approaches based on artificial neural networks (ANNs) have shown promising results for short-text classification. However, many short texts occur in sequences (e.g., sentences in a document or utterances in a dialog), and most existing ANN-based systems do not leverage the preceding short texts when classifying a subsequent one. In this work, we present a model based on recurrent neural networks and convolutional neural networks that incorporates the preceding short texts. Our model achieves state-of-the-art results on three different datasets for dialog act prediction.
Tasks Text Classification
Published 2016-03-12
URL http://arxiv.org/abs/1603.03827v1
PDF http://arxiv.org/pdf/1603.03827v1.pdf
PWC https://paperswithcode.com/paper/sequential-short-text-classification-with
Repo https://github.com/Franck-Dernoncourt/naacl2016
Framework none
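
The sequence-aware part can be sketched independently of the encoder: classify utterance t from its own encoding together with the encodings of the previous d utterances. Dimensions below are made up, and simple concatenation stands in for the paper's feed-forward combination of representations:

```python
import numpy as np

def classify_sequence(encodings, W, history=2):
    """encodings: (n_texts, d) vectors from an RNN/CNN short-text encoder.
    W: (d * (history + 1), n_classes) classifier weights."""
    n, d = encodings.shape
    padded = np.vstack([np.zeros((history, d)), encodings])
    logits = []
    for t in range(n):
        ctx = padded[t : t + history + 1].reshape(-1)  # prev texts + current
        logits.append(ctx @ W)
    return np.argmax(logits, axis=1)

enc = np.random.rand(5, 8)         # five utterances in a dialog
W = np.random.randn(8 * 3, 4)      # four dialog-act classes
print(classify_sequence(enc, W))
```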