July 30, 2019

Paper Group AWR 8

Class Rectification Hard Mining for Imbalanced Deep Learning

Title Class Rectification Hard Mining for Imbalanced Deep Learning
Authors Qi Dong, Shaogang Gong, Xiatian Zhu
Abstract Recognising detailed facial or clothing attributes in images of people is a challenging task for computer vision, especially when the training data are both very large in scale and extremely imbalanced across attribute classes. To address this problem, we formulate a novel scheme for batch incremental hard sample mining of minority attribute classes from imbalanced large scale training data. We develop an end-to-end deep learning framework capable of avoiding the dominant effect of majority classes by discovering sparsely sampled boundaries of minority classes. This is made possible by introducing a Class Rectification Loss (CRL) regularising algorithm. We demonstrate the advantages and scalability of CRL over existing state-of-the-art attribute recognition and imbalanced data learning models on two large scale imbalanced benchmark datasets, the CelebA facial attribute dataset and the X-Domain clothing attribute dataset.
Tasks
Published 2017-12-08
URL http://arxiv.org/abs/1712.03162v1
PDF http://arxiv.org/pdf/1712.03162v1.pdf
PWC https://paperswithcode.com/paper/class-rectification-hard-mining-for
Repo https://github.com/JoyLuo/face-attribute-recognition-paper-list
Framework none
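
Since the abstract only names the CRL regularizer, here is a minimal, assumed sketch of the batch-wise hard-mining idea behind it: within each mini-batch, anchor on minority-class samples, pick the hardest positive and negative by embedding distance, and apply a triplet-style margin penalty. All names and the margin value are illustrative, not taken from the authors' code.

```python
import numpy as np

def crl_style_hard_mining_loss(embeddings, labels, minority_classes, margin=0.5):
    """Hedged sketch of batch-wise hard mining around minority-class anchors.

    embeddings: (N, D) features for the current mini-batch.
    labels:     (N,) integer attribute-class labels.
    minority_classes: set of class ids treated as under-represented.
    """
    n = len(labels)
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    loss, n_triplets = 0.0, 0
    for a in range(n):
        if labels[a] not in minority_classes:
            continue  # mine triplets only for minority-class anchors
        pos = np.where((labels == labels[a]) & (np.arange(n) != a))[0]
        neg = np.where(labels != labels[a])[0]
        if len(pos) == 0 or len(neg) == 0:
            continue
        hardest_pos = pos[np.argmax(dists[a, pos])]  # farthest same-class sample
        hardest_neg = neg[np.argmin(dists[a, neg])]  # closest other-class sample
        loss += max(0.0, margin + dists[a, hardest_pos] - dists[a, hardest_neg])
        n_triplets += 1
    return loss / max(n_triplets, 1)
```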

Adaptive Low-Rank Kernel Subspace Clustering

Title Adaptive Low-Rank Kernel Subspace Clustering
Authors Pan Ji, Ian Reid, Ravi Garg, Hongdong Li, Mathieu Salzmann
Abstract In this paper, we present a kernel subspace clustering method that can handle non-linear models. In contrast to recent kernel subspace clustering methods which use predefined kernels, we propose to learn a low-rank kernel matrix, with which mapped data in feature space are not only low-rank but also self-expressive. In this manner, the low-dimensional subspace structures of the (implicitly) mapped data are retained and manifested in the high-dimensional feature space. We evaluate the proposed method extensively on both motion segmentation and image clustering benchmarks, and obtain superior results, outperforming the kernel subspace clustering method that uses standard kernels [Patel 2014] and other state-of-the-art linear subspace clustering methods.
Tasks Image Clustering, Motion Segmentation
Published 2017-07-17
URL http://arxiv.org/abs/1707.04974v4
PDF http://arxiv.org/pdf/1707.04974v4.pdf
PWC https://paperswithcode.com/paper/adaptive-low-rank-kernel-subspace-clustering
Repo https://github.com/panji1990/Low-rank-kernel-subspace-clustering
Framework none
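
For intuition about self-expressiveness in feature space: with a fixed kernel, minimizing ||Phi(X) - Phi(X)C||_F^2 + lam ||C||_F^2 depends on the data only through the kernel matrix K and has the closed form C = (K + lam I)^{-1} K. The sketch below implements that fixed-kernel baseline (the setting the paper improves on by additionally learning a low-rank K); the kernel choice and parameters are illustrative.

```python
import numpy as np

def kernel_self_expression(X, lam=0.1, gamma=1.0):
    """Fixed-kernel self-expressive baseline (not the paper's adaptive
    low-rank kernel learning): solve min_C ||Phi(X) - Phi(X) C||_F^2
    + lam ||C||_F^2, whose closed form is C = (K + lam I)^{-1} K."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # RBF kernel
    C = np.linalg.solve(K + lam * np.eye(len(X)), K)
    affinity = np.abs(C) + np.abs(C.T)  # fed to spectral clustering in practice
    return C, affinity
```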

Optimal deep neural networks for sparse recovery via Laplace techniques

Title Optimal deep neural networks for sparse recovery via Laplace techniques
Authors Steffen Limmer, Slawomir Stanczak
Abstract This paper introduces Laplace techniques for designing a neural network, with the goal of estimating simplex-constrained sparse vectors from compressed measurements. To this end, we recast the problem of MMSE estimation (w.r.t. a pre-defined uniform input distribution) as the problem of computing the centroid of some polytope that results from the intersection of the simplex and an affine subspace determined by the measurements. Owing to the specific structure, it is shown that the centroid can be computed analytically by extending a recent result that facilitates the volume computation of polytopes via Laplace transformations. A main insight of this paper is that the desired volume and centroid computations can be performed by a classical deep neural network comprising threshold functions, rectified linear (ReLU) and rectified polynomial (ReP) activation functions. The proposed construction of a deep neural network for sparse recovery is completely analytic so that time-consuming training procedures are not necessary. Furthermore, we show that the number of layers in our construction is equal to the number of measurements which might enable novel low-latency sparse recovery algorithms for a larger class of signals than that assumed in this paper. To assess the applicability of the proposed uniform input distribution, we showcase the recovery performance on samples that are soft-classification vectors generated by two standard datasets. As both volume and centroid computation are known to be computationally hard, the network width grows exponentially in the worst case. It can be, however, decreased by inducing sparse connectivity in the neural network via a well-suited basis of the affine subspace. Finally, the presented analytical construction may serve as a viable initialization to be further optimized and trained using particular input datasets at hand.
Tasks
Published 2017-09-04
URL http://arxiv.org/abs/1709.01112v2
PDF http://arxiv.org/pdf/1709.01112v2.pdf
PWC https://paperswithcode.com/paper/optimal-deep-neural-networks-for-sparse
Repo https://github.com/stli/CentNet
Framework none
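
The reformulation in the abstract is compact enough to state: under a uniform prior on the simplex, the MMSE estimate given linear measurements is the centroid of the polytope they cut out of the simplex. The notation below is assumed for illustration, not copied from the paper.

```latex
% MMSE estimation as a centroid computation (illustrative notation)
\hat{x}_{\mathrm{MMSE}}(y)
  = \mathbb{E}\left[\, x \mid A x = y,\ x \in \Delta \,\right]
  = \frac{1}{\operatorname{vol}(P_y)} \int_{P_y} x \, \mathrm{d}x,
\qquad
P_y = \Delta \cap \{\, x : A x = y \,\}
```

Here Delta is the probability simplex and A the measurement matrix; the paper evaluates this centroid analytically via Laplace-transform techniques and unrolls the computation into a network with one layer per measurement.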

Automated Scalable Bayesian Inference via Hilbert Coresets

Title Automated Scalable Bayesian Inference via Hilbert Coresets
Authors Trevor Campbell, Tamara Broderick
Abstract The automation of posterior inference in Bayesian data analysis has enabled experts and nonexperts alike to use more sophisticated models, engage in faster exploratory modeling and analysis, and ensure experimental reproducibility. However, standard automated posterior inference algorithms are not tractable at the scale of massive modern datasets, and modifications to make them so are typically model-specific, require expert tuning, and can break theoretical guarantees on inferential quality. Building on the Bayesian coresets framework, this work instead takes advantage of data redundancy to shrink the dataset itself as a preprocessing step, providing fully-automated, scalable Bayesian inference with theoretical guarantees. We begin with an intuitive reformulation of Bayesian coreset construction as sparse vector sum approximation, and demonstrate that its automation and performance-based shortcomings arise from the use of the supremum norm. To address these shortcomings we develop Hilbert coresets, i.e., Bayesian coresets constructed under a norm induced by an inner-product on the log-likelihood function space. We propose two Hilbert coreset construction algorithms—one based on importance sampling, and one based on the Frank-Wolfe algorithm—along with theoretical guarantees on approximation quality as a function of coreset size. Since the exact computation of the proposed inner-products is model-specific, we automate the construction with a random finite-dimensional projection of the log-likelihood functions. The resulting automated coreset construction algorithm is simple to implement, and experiments on a variety of models with real and synthetic datasets show that it provides high-quality posterior approximations and a significant reduction in the computational cost of inference.
Tasks Bayesian Inference
Published 2017-10-13
URL http://arxiv.org/abs/1710.05053v2
PDF http://arxiv.org/pdf/1710.05053v2.pdf
PWC https://paperswithcode.com/paper/automated-scalable-bayesian-inference-via
Repo https://github.com/trevorcampbell/bayesian-coresets
Framework none
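
A hedged sketch of the importance-sampling construction with the random finite-dimensional projection mentioned above: project each point's log-likelihood function onto random parameter draws, sample points with probability proportional to the norm of the projected vector, and reweight so the weighted sum is unbiased. Names and interfaces here are assumptions; the authors' implementation lives in the linked repo.

```python
import numpy as np

def hilbert_coreset_importance_sampling(log_like, data, param_samples, M, rng=None):
    """Sketch of the importance-sampling Hilbert coreset.

    log_like(x, theta): log-likelihood of data point x at parameter theta.
    data:          sequence of N data points.
    param_samples: S parameter draws used for the random projection.
    M:             number of coreset draws.
    Returns weights w (N,), mostly zero, with E[sum_n w_n v_n] = sum_n v_n.
    """
    rng = np.random.default_rng(rng)
    S = len(param_samples)
    # Random finite-dimensional projection of each log-likelihood function.
    V = np.array([[log_like(x, th) for th in param_samples] for x in data])
    V /= np.sqrt(S)
    sigmas = np.maximum(np.linalg.norm(V, axis=1), 1e-12)  # per-point norms
    sigma = sigmas.sum()
    counts = rng.multinomial(M, sigmas / sigma)  # M draws, prob. proportional to norm
    return sigma * counts / (M * sigmas)         # unbiased importance weights
```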

PDE-Net: Learning PDEs from Data

Title PDE-Net: Learning PDEs from Data
Authors Zichao Long, Yiping Lu, Xianzhong Ma, Bin Dong
Abstract In this paper, we present an initial attempt to learn evolution PDEs from data. Inspired by the latest development of neural network designs in deep learning, we propose a new feed-forward deep network, called PDE-Net, to fulfill two objectives at the same time: to accurately predict dynamics of complex systems and to uncover the underlying hidden PDE models. The basic idea of the proposed PDE-Net is to learn differential operators by learning convolution kernels (filters), and apply neural networks or other machine learning methods to approximate the unknown nonlinear responses. Compared with existing approaches, which either assume the form of the nonlinear response is known or fix certain finite difference approximations of differential operators, our approach has the most flexibility by learning both differential operators and the nonlinear responses. A special feature of the proposed PDE-Net is that all filters are properly constrained, which enables us to easily identify the governing PDE models while still maintaining the expressive and predictive power of the network. These constraints are carefully designed by fully exploiting the relation between the orders of differential operators and the orders of sum rules of filters (an important concept originating from wavelet theory). We also discuss relations of the PDE-Net with some existing networks in computer vision such as Network-In-Network (NIN) and Residual Neural Network (ResNet). Numerical experiments show that the PDE-Net has the potential to uncover the hidden PDE of the observed dynamics, and predict the dynamical behavior for a relatively long time, even in a noisy environment.
Tasks
Published 2017-10-26
URL http://arxiv.org/abs/1710.09668v2
PDF http://arxiv.org/pdf/1710.09668v2.pdf
PWC https://paperswithcode.com/paper/pde-net-learning-pdes-from-data
Repo https://github.com/ZichaoLong/aTEAM
Framework pytorch
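
A toy, hedged version of the central mechanism (learning a differential operator as a convolution kernel), stripped of the paper's moment constraints and deep architecture: generate snapshots of the 1-D heat equation with the true Laplacian stencil, then recover the stencil from one snapshot pair by least squares.

```python
import numpy as np

def step(u, k, dt):
    """One forward-Euler step with a 3-point stencil under periodic BCs."""
    return u + dt * (k[0] * np.roll(u, 1) + k[1] * u + k[2] * np.roll(u, -1))

n = 64
dx, dt = 1.0 / n, 1e-5
x = np.arange(n) * dx
lap = np.array([1.0, -2.0, 1.0]) / dx ** 2                 # true Laplacian stencil
u0 = np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)   # multi-frequency state
u1 = step(u0, lap, dt)                                     # "observed" next snapshot

# Recover the stencil: u1 - u0 = dt * [roll(u0,1), u0, roll(u0,-1)] @ k
A = dt * np.stack([np.roll(u0, 1), u0, np.roll(u0, -1)], axis=1)
k_learned, *_ = np.linalg.lstsq(A, u1 - u0, rcond=None)
print(k_learned * dx ** 2)   # approximately [1, -2, 1]
```

PDE-Net's contribution beyond this toy is to constrain the learned filters' moments (sum rules) so each one is identifiable as a specific derivative order, and to stack many such prediction steps into a deep network.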

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation

Title Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation
Authors Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-Hsuan Yang, Erik Learned-Miller, Jan Kautz
Abstract Given two consecutive frames, video interpolation aims at generating intermediate frame(s) to form both spatially and temporally coherent video sequences. While most existing methods focus on single-frame interpolation, we propose an end-to-end convolutional neural network for variable-length multi-frame video interpolation, where the motion interpretation and occlusion reasoning are jointly modeled. We start by computing bi-directional optical flow between the input images using a U-Net architecture. These flows are then linearly combined at each time step to approximate the intermediate bi-directional optical flows. These approximate flows, however, only work well in locally smooth regions and produce artifacts around motion boundaries. To address this shortcoming, we employ another U-Net to refine the approximated flow and also predict soft visibility maps. Finally, the two input images are warped and linearly fused to form each intermediate frame. By applying the visibility maps to the warped images before fusion, we exclude the contribution of occluded pixels to the interpolated intermediate frame to avoid artifacts. Since none of our learned network parameters are time-dependent, our approach is able to produce as many intermediate frames as needed. We use 1,132 240-fps video clips, containing 300K individual video frames, to train our network. Experimental results on several datasets, predicting different numbers of interpolated frames, demonstrate that our approach performs consistently better than existing methods.
Tasks Optical Flow Estimation, Video Frame Interpolation
Published 2017-11-30
URL http://arxiv.org/abs/1712.00080v2
PDF http://arxiv.org/pdf/1712.00080v2.pdf
PWC https://paperswithcode.com/paper/super-slomo-high-quality-estimation-of
Repo https://github.com/susomena/DeepSlowMotion
Framework tf
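
The two arithmetic steps around the U-Nets are simple enough to write out; this hedged numpy sketch follows the flow-combination and visibility-weighted fusion steps described in the abstract, with backward warping left out as a placeholder.

```python
import numpy as np

def intermediate_flows(F01, F10, t):
    """Linear combination of the bi-directional flows F_{0->1}, F_{1->0}
    to approximate the flows from the intermediate time t in (0, 1)."""
    Ft0 = -(1.0 - t) * t * F01 + t * t * F10           # frame t -> frame 0
    Ft1 = (1.0 - t) ** 2 * F01 - t * (1.0 - t) * F10   # frame t -> frame 1
    return Ft0, Ft1

def fuse(I0_warped, I1_warped, V0, V1, t, eps=1e-8):
    """Visibility-weighted fusion of the two warped inputs; V0, V1 are the
    soft visibility maps predicted by the refinement U-Net."""
    Z = (1.0 - t) * V0 + t * V1 + eps
    return ((1.0 - t) * V0 * I0_warped + t * V1 * I1_warped) / Z
```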

Matterport3D: Learning from RGB-D Data in Indoor Environments

Title Matterport3D: Learning from RGB-D Data in Indoor Environments
Authors Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang
Abstract Access to large, diverse RGB-D datasets is critical for training RGB-D scene understanding algorithms. However, existing datasets still cover only a limited number of views or a restricted scale of spaces. In this paper, we introduce Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided with surface reconstructions, camera poses, and 2D and 3D semantic segmentations. The precise global alignment and comprehensive, diverse panoramic set of views over entire buildings enable a variety of supervised and self-supervised computer vision tasks, including keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and region classification.
Tasks Scene Understanding, Semantic Segmentation
Published 2017-09-18
URL http://arxiv.org/abs/1709.06158v1
PDF http://arxiv.org/pdf/1709.06158v1.pdf
PWC https://paperswithcode.com/paper/matterport3d-learning-from-rgb-d-data-in
Repo https://github.com/niessner/Matterport
Framework none

Crowdsourcing Question-Answer Meaning Representations

Title Crowdsourcing Question-Answer Meaning Representations
Authors Julian Michael, Gabriel Stanovsky, Luheng He, Ido Dagan, Luke Zettlemoyer
Abstract We introduce Question-Answer Meaning Representations (QAMRs), which represent the predicate-argument structure of a sentence as a set of question-answer pairs. We also develop a crowdsourcing scheme to show that QAMRs can be labeled with very little training, and gather a dataset with over 5,000 sentences and 100,000 questions. A detailed qualitative analysis demonstrates that the crowd-generated question-answer pairs cover the vast majority of predicate-argument relationships in existing datasets (including PropBank, NomBank, QA-SRL, and AMR) along with many previously under-resourced ones, including implicit arguments and relations. The QAMR data and annotation code are made publicly available to enable future work on how best to model these complex phenomena.
Tasks
Published 2017-11-16
URL http://arxiv.org/abs/1711.05885v1
PDF http://arxiv.org/pdf/1711.05885v1.pdf
PWC https://paperswithcode.com/paper/crowdsourcing-question-answer-meaning
Repo https://github.com/uwnlp/qamr
Framework none
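
For concreteness, a QAMR annotation pairs a sentence with a set of free-form questions whose answers are token spans in that sentence. The record below is a purely illustrative example in an assumed format, not a row of the released dataset.

```python
qamr_example = {
    "sentence": "The company acquired the startup in 2015 .".split(),
    "qa_pairs": [  # answers are (start, end) token spans into the sentence
        {"question": "Who acquired something?", "answer_span": (0, 2)},   # The company
        {"question": "What was acquired?", "answer_span": (3, 5)},        # the startup
        {"question": "When did the acquisition happen?", "answer_span": (6, 7)},  # 2015
    ],
}
```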

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning

Title TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
Authors Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li
Abstract High network communication cost for synchronizing gradients and parameters is the well-known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary gradients to accelerate distributed deep learning in data parallelism. Our approach requires only three numerical levels {-1,0,1}, which can aggressively reduce the communication time. We mathematically prove the convergence of TernGrad under the assumption of a bound on gradients. Guided by the bound, we propose layer-wise ternarizing and gradient clipping to improve its convergence. Our experiments show that applying TernGrad on AlexNet does not incur any accuracy loss and can even improve accuracy. The accuracy loss of GoogLeNet induced by TernGrad is less than 2% on average. Finally, a performance model is proposed to study the scalability of TernGrad. Experiments show significant speed gains for various deep neural networks. Our source code is available.
Tasks
Published 2017-05-22
URL http://arxiv.org/abs/1705.07878v6
PDF http://arxiv.org/pdf/1705.07878v6.pdf
PWC https://paperswithcode.com/paper/terngrad-ternary-gradients-to-reduce
Repo https://github.com/wenwei202/terngrad
Framework tf
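
The ternarization step described above is only a few lines; this hedged sketch includes the per-layer gradient clipping, with the clipping factor chosen arbitrarily here rather than taken from the paper.

```python
import numpy as np

def ternarize(grad, clip_factor=2.5, rng=None):
    """Stochastic ternarization of one layer's gradient (sketch).

    After clipping, each component survives with probability |g_k| / s,
    where s = max |g_k|, so the ternary gradient is unbiased: E[out] = g.
    """
    rng = np.random.default_rng(rng)
    g = np.clip(grad, -clip_factor * grad.std(), clip_factor * grad.std())
    s = np.abs(g).max()
    if s == 0.0:
        return np.zeros_like(g)
    keep = rng.random(g.shape) < np.abs(g) / s  # Bernoulli(|g_k| / s)
    return s * np.sign(g) * keep                # values in {-s, 0, +s}
```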

Progressive Neural Networks for Transfer Learning in Emotion Recognition

Title Progressive Neural Networks for Transfer Learning in Emotion Recognition
Authors John Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost
Abstract Many paralinguistic tasks are closely related and thus representations learned in one domain can be leveraged for another. In this paper, we investigate how knowledge can be transferred between three paralinguistic tasks: speaker, emotion, and gender recognition. Further, we extend this problem to cross-dataset tasks, asking how knowledge captured in one emotion dataset can be transferred to another. We focus on progressive neural networks and compare these networks to the conventional deep learning method of pre-training and fine-tuning. Progressive neural networks provide a way to transfer knowledge and avoid the forgetting effect present when pre-training neural networks on different tasks. Our experiments demonstrate that: (1) emotion recognition can benefit from using representations originally learned for different paralinguistic tasks and (2) transfer learning can effectively leverage additional datasets to improve the performance of emotion recognition systems.
Tasks Emotion Recognition, Transfer Learning
Published 2017-06-10
URL http://arxiv.org/abs/1706.03256v1
PDF http://arxiv.org/pdf/1706.03256v1.pdf
PWC https://paperswithcode.com/paper/progressive-neural-networks-for-transfer
Repo https://github.com/zbyte64/pytorch-dagsearch
Framework pytorch
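
A minimal, hedged sketch of the progressive architecture contrasted with fine-tuning above: the source-task column is frozen, and the target-task column receives its hidden activations through a lateral connection, so source knowledge is reused without being overwritten. The layer shapes and single lateral adapter are illustrative simplifications.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ProgressiveTwoColumn:
    """Forward pass only: column 1 (source task) is frozen after pre-training;
    column 2 (target task) and the lateral adapter U are the trainable parts."""

    def __init__(self, d_in, d_hid, d_out, seed=0):
        r = np.random.default_rng(seed)
        init = lambda *shape: r.normal(0.0, 0.1, shape)
        self.W1a, self.W1b = init(d_in, d_hid), init(d_hid, d_out)  # frozen
        self.W2a, self.W2b = init(d_in, d_hid), init(d_hid, d_out)  # trainable
        self.U = init(d_hid, d_out)  # lateral: column-1 hidden -> column-2 output

    def forward(self, x):
        h1 = relu(x @ self.W1a)             # source column, weights never updated
        h2 = relu(x @ self.W2a)             # target column
        return h2 @ self.W2b + h1 @ self.U  # lateral connection injects h1

net = ProgressiveTwoColumn(d_in=40, d_hid=64, d_out=4)
logits = net.forward(np.zeros((8, 40)))  # (8, 4) target-task outputs
```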

Punny Captions: Witty Wordplay in Image Descriptions

Title Punny Captions: Witty Wordplay in Image Descriptions
Authors Arjun Chandrasekaran, Devi Parikh, Mohit Bansal
Abstract Wit is a form of rich interaction that is often grounded in a specific situation (e.g., a comment in response to an event). In this work, we attempt to build computational models that can produce witty descriptions for a given image. Inspired by a cognitive account of humor appreciation, we employ linguistic wordplay, specifically puns, in image descriptions. We develop two approaches which involve retrieving witty descriptions for a given image from a large corpus of sentences, or generating them via an encoder-decoder neural network architecture. We compare our approach against meaningful baseline approaches via human studies and show substantial improvements. We find that when a human is subject to similar constraints as the model regarding word usage and style, people vote the image descriptions generated by our model to be slightly wittier than human-written witty descriptions. Unsurprisingly, humans are almost always wittier than the model when they are free to choose the vocabulary, style, etc.
Tasks
Published 2017-04-26
URL http://arxiv.org/abs/1704.08224v2
PDF http://arxiv.org/pdf/1704.08224v2.pdf
PWC https://paperswithcode.com/paper/punny-captions-witty-wordplay-in-image
Repo https://github.com/purvaten/punny_captions
Framework tf

Predicting Driver Attention in Critical Situations

Title Predicting Driver Attention in Critical Situations
Authors Ye Xia, Danqing Zhang, Jinkyu Kim, Ken Nakayama, Karl Zipser, David Whitney
Abstract Robust driver attention prediction for critical situations is a challenging computer vision problem, yet essential for autonomous driving. Because critical driving moments are so rare, collecting enough data for these situations is difficult with the conventional in-car data collection protocol of tracking eye movements during driving. Here, we first propose a new in-lab driver attention collection protocol and introduce a new driver attention dataset, the Berkeley DeepDrive Attention (BDD-A) dataset, which is built upon braking event videos selected from a large-scale, crowd-sourced driving video dataset. We further propose the Human Weighted Sampling (HWS) method, which uses human gaze behavior to identify crucial frames of a driving dataset and weights them heavily during model training. With our dataset and HWS, we built a driver attention prediction model that outperforms the state-of-the-art and demonstrates sophisticated behaviors, like attending to crossing pedestrians but not giving false alarms to pedestrians safely walking on the sidewalk. Its prediction results are nearly indistinguishable from ground truth to humans. Although trained only on our in-lab attention data, the model also predicts in-car driver attention data of routine driving with state-of-the-art accuracy. This result not only demonstrates the performance of our model but also proves the validity and usefulness of our dataset and data collection protocol.
Tasks Autonomous Driving, Driver Attention Monitoring
Published 2017-11-17
URL http://arxiv.org/abs/1711.06406v3
PDF http://arxiv.org/pdf/1711.06406v3.pdf
PWC https://paperswithcode.com/paper/predicting-driver-attention-in-critical
Repo https://github.com/pascalxia/driver_attention_prediction
Framework tf
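
The sampling half of HWS reduces to weighted draws; the sketch below assumes a per-frame "crucialness" score is already available (in the paper it is derived from human gaze behavior, which is the part not reproduced here).

```python
import numpy as np

def weighted_frame_batch(crucialness, batch_size, rng=None):
    """Draw training-frame indices with probability proportional to their
    crucialness scores instead of uniformly (hedged sketch of HWS sampling)."""
    rng = np.random.default_rng(rng)
    p = np.asarray(crucialness, dtype=float)
    p /= p.sum()
    return rng.choice(len(p), size=batch_size, replace=True, p=p)
```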

Learning Two-Branch Neural Networks for Image-Text Matching Tasks

Title Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Authors Liwei Wang, Yin Li, Jing Huang, Svetlana Lazebnik
Abstract Image-language matching tasks have recently attracted a lot of attention in the computer vision field. These tasks include image-sentence matching, i.e., given an image query, retrieving relevant sentences and vice versa, and region-phrase matching or visual grounding, i.e., matching a phrase to relevant regions. This paper investigates two-branch neural networks for learning the similarity between these two data modalities. We propose two network structures that produce different output representations. The first one, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. Compared to standard triplet sampling, we perform improved neighborhood sampling that takes neighborhood information into consideration while constructing mini-batches. The second network structure, referred to as a similarity network, fuses the two branches via element-wise product and is trained with regression loss to directly predict a similarity score. Extensive experiments show that our networks achieve high accuracies for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on Flickr30K and MSCOCO datasets.
Tasks Text Matching
Published 2017-04-11
URL http://arxiv.org/abs/1704.03470v4
PDF http://arxiv.org/pdf/1704.03470v4.pdf
PWC https://paperswithcode.com/paper/learning-two-branch-neural-networks-for-image
Repo https://github.com/BryanPlummer/cite
Framework tf
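
A hedged sketch of the embedding network's core objective: a bi-directional max-margin ranking loss over matched image and sentence embeddings in the shared space. The paper's neighborhood sampling and neighborhood constraints are omitted, and the margin value is illustrative.

```python
import numpy as np

def bidirectional_ranking_loss(img_emb, txt_emb, margin=0.2):
    """img_emb, txt_emb: (N, D) L2-normalized embeddings where row i of each
    matrix forms a matching image-sentence pair; other rows act as negatives."""
    sims = img_emb @ txt_emb.T           # (N, N) cosine similarities
    pos = np.diag(sims)                  # similarities of the matching pairs
    cost_i2t = np.maximum(0.0, margin + sims - pos[:, None])  # image -> sentence
    cost_t2i = np.maximum(0.0, margin + sims - pos[None, :])  # sentence -> image
    np.fill_diagonal(cost_i2t, 0.0)      # matching pairs incur no cost
    np.fill_diagonal(cost_t2i, 0.0)
    return (cost_i2t.sum() + cost_t2i.sum()) / len(img_emb)
```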

Field-aware Factorization Machines in a Real-world Online Advertising System

Title Field-aware Factorization Machines in a Real-world Online Advertising System
Authors Yuchin Juan, Damien Lefortier, Olivier Chapelle
Abstract Predicting user response is one of the core machine learning tasks in computational advertising. Field-aware Factorization Machines (FFM) have recently been established as a state-of-the-art method for that problem and in particular won two Kaggle challenges. This paper presents some results from implementing this method in a production system that predicts click-through and conversion rates for display advertising, and shows that the method is not only effective for winning challenges but is also valuable in a real-world prediction system. We also discuss some specific challenges and solutions to reduce the training time, namely the use of an innovative seeding algorithm and a distributed learning mechanism.
Tasks
Published 2017-01-15
URL http://arxiv.org/abs/1701.04099v3
PDF http://arxiv.org/pdf/1701.04099v3.pdf
PWC https://paperswithcode.com/paper/field-aware-factorization-machines-in-a-real
Repo https://github.com/cpapadimitriou/Click-Through-Rate-prediction
Framework none
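
For reference, the FFM model itself (which the paper deploys rather than introduces): every feature keeps one latent vector per field, and the interaction of features j1 and j2 uses feature j1's vector for j2's field and vice versa. A hedged sketch on dense inputs:

```python
import numpy as np

def ffm_score(x, fields, W, bias=0.0, w_lin=None):
    """Field-aware factorization machine score (sketch, dense features).

    x:      (n,) feature values.
    fields: (n,) field index of each feature.
    W:      (n, n_fields, k) latent vectors; W[j, f] is feature j's vector
            used when it interacts with a feature from field f.
    """
    n = len(x)
    s = bias + (0.0 if w_lin is None else float(w_lin @ x))
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            s += (W[j1, fields[j2]] @ W[j2, fields[j1]]) * x[j1] * x[j2]
    return s
```

In a click-through-rate system the score is passed through a sigmoid to yield a probability; the quadratic loop above is for clarity only, since practical implementations exploit the sparsity of x.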

Metadynamics for Training Neural Network Model Chemistries: a Competitive Assessment

Title Metadynamics for Training Neural Network Model Chemistries: a Competitive Assessment
Authors John E. Herr, Kun Yao, Ryker McIntyre, David Toth, John Parkhill
Abstract Neural network (NN) model chemistries (MCs) promise to facilitate the accurate exploration of chemical space and simulation of large reactive systems. One important path to improving these models is to add layers of physical detail, especially long-range forces. At short range, however, these models are data driven and data limited. Little is systematically known about how data should be sampled, and 'test data' chosen randomly from some sampling techniques can provide poor information about generality. If the sampling method is narrow, 'test error' can appear encouragingly tiny while the model fails catastrophically elsewhere. In this manuscript we competitively evaluate two common sampling methods, molecular dynamics (MD) and normal-mode sampling (NMS), and one uncommon alternative, Metadynamics (MetaMD), for preparing training geometries. We show that MD is an inefficient sampling method in the sense that additional samples do not improve generality. We also show that MetaMD is easily implemented in any NNMC software package with a cost that scales linearly with the number of atoms in a sample molecule. MetaMD is a black-box way to ensure samples always reach out to new regions of chemical space, while remaining relevant to chemistry near $k_BT$. It is one cheap tool to address the issue of generalization.
Tasks
Published 2017-12-19
URL http://arxiv.org/abs/1712.07240v1
PDF http://arxiv.org/pdf/1712.07240v1.pdf
PWC https://paperswithcode.com/paper/metadynamics-for-training-neural-network
Repo https://github.com/jparkhill/TensorMol
Framework tf
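
A one-dimensional toy showing why MetaMD keeps producing new geometries: overdamped Langevin dynamics on a double-well potential, with repulsive Gaussian hills periodically deposited at visited positions so already-sampled basins fill in and the walker escapes. All constants are illustrative, not taken from the paper.

```python
import numpy as np

def metadynamics_1d(potential_grad, x0=0.0, steps=5000, dt=1e-3,
                    hill_height=0.1, hill_width=0.3, deposit_every=50,
                    kT=1.0, seed=0):
    """Sketch of 1-D metadynamics: bias hills accumulate where the walker goes."""
    rng = np.random.default_rng(seed)
    centers, traj, x = [], [], x0
    for i in range(steps):
        bias_grad = sum(
            -hill_height * (x - c) / hill_width ** 2
            * np.exp(-0.5 * ((x - c) / hill_width) ** 2)
            for c in centers
        )
        force = -(potential_grad(x) + bias_grad)
        x += dt * force + np.sqrt(2.0 * kT * dt) * rng.normal()  # Langevin step
        if i % deposit_every == 0:
            centers.append(x)  # deposit a repulsive hill at the current state
        traj.append(x)
    return np.array(traj)

# Double well V(x) = (x^2 - 1)^2 with gradient V'(x) = 4 x (x^2 - 1):
traj = metadynamics_1d(lambda x: 4.0 * x * (x ** 2 - 1.0))
```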