July 30, 2019

Paper Group AWR 8

Class Rectification Hard Mining for Imbalanced Deep Learning

Title Class Rectification Hard Mining for Imbalanced Deep Learning
Authors Qi Dong, Shaogang Gong, Xiatian Zhu
Abstract Recognising detailed facial or clothing attributes in images of people is a challenging task for computer vision, especially when the training data are both very large in scale and extremely imbalanced across attribute classes. To address this problem, we formulate a novel scheme for batch incremental hard sample mining of minority attribute classes from imbalanced large scale training data. We develop an end-to-end deep learning framework capable of avoiding the dominant effect of majority classes by discovering sparsely sampled boundaries of minority classes. This is made possible by introducing a Class Rectification Loss (CRL) regularising algorithm. We demonstrate the advantages and scalability of CRL over existing state-of-the-art attribute recognition and imbalanced data learning models on two large scale imbalanced benchmark datasets, the CelebA facial attribute dataset and the X-Domain clothing attribute dataset.
Tasks
Published 2017-12-08
URL http://arxiv.org/abs/1712.03162v1
PDF http://arxiv.org/pdf/1712.03162v1.pdf
PWC https://paperswithcode.com/paper/class-rectification-hard-mining-for
Repo https://github.com/JoyLuo/face-attribute-recognition-paper-list
Framework none
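
Since the abstract only names the CRL regularizer, here is a minimal, assumed sketch of the batch-wise hard-mining idea behind it: within each mini-batch, anchor on minority-class samples, pick the hardest positive and negative by embedding distance, and apply a triplet-style margin penalty. All names and the margin value are illustrative, not taken from the authors' code.

```python
import numpy as np

def crl_style_hard_mining_loss(embeddings, labels, minority_classes, margin=0.5):
    """Hedged sketch of batch-wise hard mining around minority-class anchors.

    embeddings: (N, D) features for the current mini-batch.
    labels:     (N,) integer attribute-class labels.
    minority_classes: set of class ids treated as under-represented.
    """
    n = len(labels)
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    loss, n_triplets = 0.0, 0
    for a in range(n):
        if labels[a] not in minority_classes:
            continue  # mine triplets only for minority-class anchors
        pos = np.where((labels == labels[a]) & (np.arange(n) != a))[0]
        neg = np.where(labels != labels[a])[0]
        if len(pos) == 0 or len(neg) == 0:
            continue
        hardest_pos = pos[np.argmax(dists[a, pos])]  # farthest same-class sample
        hardest_neg = neg[np.argmin(dists[a, neg])]  # closest other-class sample
        loss += max(0.0, margin + dists[a, hardest_pos] - dists[a, hardest_neg])
        n_triplets += 1
    return loss / max(n_triplets, 1)
```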

Adaptive Low-Rank Kernel Subspace Clustering

Title Adaptive Low-Rank Kernel Subspace Clustering
Authors Pan Ji, Ian Reid, Ravi Garg, Hongdong Li, Mathieu Salzmann
Abstract In this paper, we present a kernel subspace clustering method that can handle non-linear models. In contrast to recent kernel subspace clustering methods which use predefined kernels, we propose to learn a low-rank kernel matrix, with which mapped data in feature space are not only low-rank but also self-expressive. In this manner, the low-dimensional subspace structures of the (implicitly) mapped data are retained and manifested in the high-dimensional feature space. We evaluate the proposed method extensively on both motion segmentation and image clustering benchmarks, and obtain superior results, outperforming the kernel subspace clustering method that uses standard kernels [Patel 2014] and other state-of-the-art linear subspace clustering methods.
Tasks Image Clustering, Motion Segmentation
Published 2017-07-17
URL http://arxiv.org/abs/1707.04974v4
PDF http://arxiv.org/pdf/1707.04974v4.pdf
PWC https://paperswithcode.com/paper/adaptive-low-rank-kernel-subspace-clustering
Repo https://github.com/panji1990/Low-rank-kernel-subspace-clustering
Framework none
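
For intuition about self-expressiveness in feature space: with a fixed kernel, minimizing ||Phi(X) - Phi(X)C||_F^2 + lam ||C||_F^2 depends on the data only through the kernel matrix K and has the closed form C = (K + lam I)^{-1} K. The sketch below implements that fixed-kernel baseline (the setting the paper improves on by additionally learning a low-rank K); the kernel choice and parameters are illustrative.

```python
import numpy as np

def kernel_self_expression(X, lam=0.1, gamma=1.0):
    """Fixed-kernel self-expressive baseline (not the paper's adaptive
    low-rank kernel learning): solve min_C ||Phi(X) - Phi(X) C||_F^2
    + lam ||C||_F^2, whose closed form is C = (K + lam I)^{-1} K."""
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # RBF kernel
    C = np.linalg.solve(K + lam * np.eye(len(X)), K)
    affinity = np.abs(C) + np.abs(C.T)  # fed to spectral clustering in practice
    return C, affinity
```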

Optimal deep neural networks for sparse recovery via Laplace techniques

Title Optimal deep neural networks for sparse recovery via Laplace techniques
Authors Steffen Limmer, Slawomir Stanczak
Abstract This paper introduces Laplace techniques for designing a neural network, with the goal of estimating simplex-constrained sparse vectors from compressed measurements. To this end, we recast the problem of MMSE estimation (w.r.t. a pre-defined uniform input distribution) as the problem of computing the centroid of some polytope that results from the intersection of the simplex and an affine subspace determined by the measurements. Owing to the specific structure, it is shown that the centroid can be computed analytically by extending a recent result that facilitates the volume computation of polytopes via Laplace transformations. A main insight of this paper is that the desired volume and centroid computations can be performed by a classical deep neural network comprising threshold functions, rectified linear (ReLU) and rectified polynomial (ReP) activation functions. The proposed construction of a deep neural network for sparse recovery is completely analytic so that time-consuming training procedures are not necessary. Furthermore, we show that the number of layers in our construction is equal to the number of measurements which might enable novel low-latency sparse recovery algorithms for a larger class of signals than that assumed in this paper. To assess the applicability of the proposed uniform input distribution, we showcase the recovery performance on samples that are soft-classification vectors generated by two standard datasets. As both volume and centroid computation are known to be computationally hard, the network width grows exponentially in the worst case. It can be, however, decreased by inducing sparse connectivity in the neural network via a well-suited basis of the affine subspace. Finally, the presented analytical construction may serve as a viable initialization to be further optimized and trained using particular input datasets at hand.
Tasks
Published 2017-09-04
URL http://arxiv.org/abs/1709.01112v2
PDF http://arxiv.org/pdf/1709.01112v2.pdf
PWC https://paperswithcode.com/paper/optimal-deep-neural-networks-for-sparse
Repo https://github.com/stli/CentNet
Framework none
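
The reformulation in the abstract is compact enough to state: under a uniform prior on the simplex, the MMSE estimate given linear measurements is the centroid of the polytope they cut out of the simplex. The notation below is assumed for illustration, not copied from the paper.

```latex
% MMSE estimation as a centroid computation (illustrative notation)
\hat{x}_{\mathrm{MMSE}}(y)
  = \mathbb{E}\left[\, x \mid A x = y,\ x \in \Delta \,\right]
  = \frac{1}{\operatorname{vol}(P_y)} \int_{P_y} x \, \mathrm{d}x,
\qquad
P_y = \Delta \cap \{\, x : A x = y \,\}
```

Here Delta is the probability simplex and A the measurement matrix; the paper evaluates this centroid analytically via Laplace-transform techniques and unrolls the computation into a network with one layer per measurement.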

Automated Scalable Bayesian Inference via Hilbert Coresets

Title Automated Scalable Bayesian Inference via Hilbert Coresets
Authors Trevor Campbell, Tamara Broderick
Abstract The automation of posterior inference in Bayesian data analysis has enabled experts and nonexperts alike to use more sophisticated models, engage in faster exploratory modeling and analysis, and ensure experimental reproducibility. However, standard automated posterior inference algorithms are not tractable at the scale of massive modern datasets, and modifications to make them so are typically model-specific, require expert tuning, and can break theoretical guarantees on inferential quality. Building on the Bayesian coresets framework, this work instead takes advantage of data redundancy to shrink the dataset itself as a preprocessing step, providing fully-automated, scalable Bayesian inference with theoretical guarantees. We begin with an intuitive reformulation of Bayesian coreset construction as sparse vector sum approximation, and demonstrate that its automation and performance-based shortcomings arise from the use of the supremum norm. To address these shortcomings we develop Hilbert coresets, i.e., Bayesian coresets constructed under a norm induced by an inner-product on the log-likelihood function space. We propose two Hilbert coreset construction algorithms—one based on importance sampling, and one based on the Frank-Wolfe algorithm—along with theoretical guarantees on approximation quality as a function of coreset size. Since the exact computation of the proposed inner-products is model-specific, we automate the construction with a random finite-dimensional projection of the log-likelihood functions. The resulting automated coreset construction algorithm is simple to implement, and experiments on a variety of models with real and synthetic datasets show that it provides high-quality posterior approximations and a significant reduction in the computational cost of inference.
Tasks Bayesian Inference
Published 2017-10-13
URL http://arxiv.org/abs/1710.05053v2
PDF http://arxiv.org/pdf/1710.05053v2.pdf
PWC https://paperswithcode.com/paper/automated-scalable-bayesian-inference-via
Repo https://github.com/trevorcampbell/bayesian-coresets
Framework none
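
A hedged sketch of the importance-sampling construction with the random finite-dimensional projection mentioned above: project each point's log-likelihood function onto random parameter draws, sample points with probability proportional to the norm of the projected vector, and reweight so the weighted sum is unbiased. Names and interfaces here are assumptions; the authors' implementation lives in the linked repo.

```python
import numpy as np

def hilbert_coreset_importance_sampling(log_like, data, param_samples, M, rng=None):
    """Sketch of the importance-sampling Hilbert coreset.

    log_like(x, theta): log-likelihood of data point x at parameter theta.
    data:          sequence of N data points.
    param_samples: S parameter draws used for the random projection.
    M:             number of coreset draws.
    Returns weights w (N,), mostly zero, with E[sum_n w_n v_n] = sum_n v_n.
    """
    rng = np.random.default_rng(rng)
    S = len(param_samples)
    # Random finite-dimensional projection of each log-likelihood function.
    V = np.array([[log_like(x, th) for th in param_samples] for x in data])
    V /= np.sqrt(S)
    sigmas = np.maximum(np.linalg.norm(V, axis=1), 1e-12)  # per-point norms
    sigma = sigmas.sum()
    counts = rng.multinomial(M, sigmas / sigma)  # M draws, prob. proportional to norm
    return sigma * counts / (M * sigmas)         # unbiased importance weights
```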

PDE-Net: Learning PDEs from Data

Title PDE-Net: Learning PDEs from Data
Authors Zichao Long, Yiping Lu, Xianzhong Ma, Bin Dong
Abstract In this paper, we present an initial attempt to learn evolution PDEs from data. Inspired by the latest development of neural network designs in deep learning, we propose a new feed-forward deep network, called PDE-Net, to fulfill two objectives at the same time: to accurately predict dynamics of complex systems and to uncover the underlying hidden PDE models. The basic idea of the proposed PDE-Net is to learn differential operators by learning convolution kernels (filters), and apply neural networks or other machine learning methods to approximate the unknown nonlinear responses. Compared with existing approaches, which either assume the form of the nonlinear response is known or fix certain finite difference approximations of differential operators, our approach has the most flexibility by learning both differential operators and the nonlinear responses. A special feature of the proposed PDE-Net is that all filters are properly constrained, which enables us to easily identify the governing PDE models while still maintaining the expressive and predictive power of the network. These constraints are carefully designed by fully exploiting the relation between the orders of differential operators and the orders of sum rules of filters (an important concept originating from wavelet theory). We also discuss relations of the PDE-Net with some existing networks in computer vision such as Network-In-Network (NIN) and Residual Neural Network (ResNet). Numerical experiments show that the PDE-Net has the potential to uncover the hidden PDE of the observed dynamics, and predict the dynamical behavior for a relatively long time, even in a noisy environment.
Tasks
Published 2017-10-26
URL http://arxiv.org/abs/1710.09668v2
PDF http://arxiv.org/pdf/1710.09668v2.pdf
PWC https://paperswithcode.com/paper/pde-net-learning-pdes-from-data
Repo https://github.com/ZichaoLong/aTEAM
Framework pytorch
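
A toy, hedged version of the central mechanism (learning a differential operator as a convolution kernel), stripped of the paper's moment constraints and deep architecture: generate snapshots of the 1-D heat equation with the true Laplacian stencil, then recover the stencil from one snapshot pair by least squares.

```python
import numpy as np

def step(u, k, dt):
    """One forward-Euler step with a 3-point stencil under periodic BCs."""
    return u + dt * (k[0] * np.roll(u, 1) + k[1] * u + k[2] * np.roll(u, -1))

n = 64
dx, dt = 1.0 / n, 1e-5
x = np.arange(n) * dx
lap = np.array([1.0, -2.0, 1.0]) / dx ** 2                 # true Laplacian stencil
u0 = np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)   # multi-frequency state
u1 = step(u0, lap, dt)                                     # "observed" next snapshot

# Recover the stencil: u1 - u0 = dt * [roll(u0,1), u0, roll(u0,-1)] @ k
A = dt * np.stack([np.roll(u0, 1), u0, np.roll(u0, -1)], axis=1)
k_learned, *_ = np.linalg.lstsq(A, u1 - u0, rcond=None)
print(k_learned * dx ** 2)   # approximately [1, -2, 1]
```

PDE-Net's contribution beyond this toy is to constrain the learned filters' moments (sum rules) so each one is identifiable as a specific derivative order, and to stack many such prediction steps into a deep network.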

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation

Title Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation
Authors Huaizu Jiang, Deqing Sun, Varun Jampani, Ming-Hsuan Yang, Erik Learned-Miller, Jan Kautz
Abstract Given two consecutive frames, video interpolation aims at generating intermediate frame(s) to form both spatially and temporally coherent video sequences. While most existing methods focus on single-frame interpolation, we propose an end-to-end convolutional neural network for variable-length multi-frame video interpolation, where the motion interpretation and occlusion reasoning are jointly modeled. We start by computing bi-directional optical flow between the input images using a U-Net architecture. These flows are then linearly combined at each time step to approximate the intermediate bi-directional optical flows. These approximate flows, however, only work well in locally smooth regions and produce artifacts around motion boundaries. To address this shortcoming, we employ another U-Net to refine the approximated flow and also predict soft visibility maps. Finally, the two input images are warped and linearly fused to form each intermediate frame. By applying the visibility maps to the warped images before fusion, we exclude the contribution of occluded pixels to the interpolated intermediate frame to avoid artifacts. Since none of our learned network parameters are time-dependent, our approach is able to produce as many intermediate frames as needed. We use 1,132 240-fps video clips, containing 300K individual video frames, to train our network. Experimental results on several datasets, predicting different numbers of interpolated frames, demonstrate that our approach performs consistently better than existing methods.
Tasks Optical Flow Estimation, Video Frame Interpolation
Published 2017-11-30
URL http://arxiv.org/abs/1712.00080v2
PDF http://arxiv.org/pdf/1712.00080v2.pdf
PWC https://paperswithcode.com/paper/super-slomo-high-quality-estimation-of
Repo https://github.com/susomena/DeepSlowMotion
Framework tf
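
The two arithmetic steps around the U-Nets are simple enough to write out; this hedged numpy sketch follows the flow-combination and visibility-weighted fusion steps described in the abstract, with backward warping left out as a placeholder.

```python
import numpy as np

def intermediate_flows(F01, F10, t):
    """Linear combination of the bi-directional flows F_{0->1}, F_{1->0}
    to approximate the flows from the intermediate time t in (0, 1)."""
    Ft0 = -(1.0 - t) * t * F01 + t * t * F10           # frame t -> frame 0
    Ft1 = (1.0 - t) ** 2 * F01 - t * (1.0 - t) * F10   # frame t -> frame 1
    return Ft0, Ft1

def fuse(I0_warped, I1_warped, V0, V1, t, eps=1e-8):
    """Visibility-weighted fusion of the two warped inputs; V0, V1 are the
    soft visibility maps predicted by the refinement U-Net."""
    Z = (1.0 - t) * V0 + t * V1 + eps
    return ((1.0 - t) * V0 * I0_warped + t * V1 * I1_warped) / Z
```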

Matterport3D: Learning from RGB-D Data in Indoor Environments

Title Matterport3D: Learning from RGB-D Data in Indoor Environments
Authors Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang
Abstract Access to large, diverse RGB-D datasets is critical for training RGB-D scene understanding algorithms. However, existing datasets still cover only a limited number of views or a restricted scale of spaces. In this paper, we introduce Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided with surface reconstructions, camera poses, and 2D and 3D semantic segmentations. The precise global alignment and comprehensive, diverse panoramic set of views over entire buildings enable a variety of supervised and self-supervised computer vision tasks, including keypoint matching, view overlap prediction, normal prediction from color, semantic segmentation, and region classification.
Tasks Scene Understanding, Semantic Segmentation
Published 2017-09-18
URL http://arxiv.org/abs/1709.06158v1
PDF http://arxiv.org/pdf/1709.06158v1.pdf
PWC https://paperswithcode.com/paper/matterport3d-learning-from-rgb-d-data-in
Repo https://github.com/niessner/Matterport
Framework none

Crowdsourcing Question-Answer Meaning Representations

Title Crowdsourcing Question-Answer Meaning Representations
Authors Julian Michael, Gabriel Stanovsky, Luheng He, Ido Dagan, Luke Zettlemoyer
Abstract We introduce Question-Answer Meaning Representations (QAMRs), which represent the predicate-argument structure of a sentence as a set of question-answer pairs. We also develop a crowdsourcing scheme to show that QAMRs can be labeled with very little training, and gather a dataset with over 5,000 sentences and 100,000 questions. A detailed qualitative analysis demonstrates that the crowd-generated question-answer pairs cover the vast majority of predicate-argument relationships in existing datasets (including PropBank, NomBank, QA-SRL, and AMR) along with many previously under-resourced ones, including implicit arguments and relations. The QAMR data and annotation code are made publicly available to enable future work on how best to model these complex phenomena.
Tasks
Published 2017-11-16
URL http://arxiv.org/abs/1711.05885v1
PDF http://arxiv.org/pdf/1711.05885v1.pdf
PWC https://paperswithcode.com/paper/crowdsourcing-question-answer-meaning
Repo https://github.com/uwnlp/qamr
Framework none
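
For concreteness, a QAMR annotation pairs a sentence with a set of free-form questions whose answers are token spans in that sentence. The record below is a purely illustrative example in an assumed format, not a row of the released dataset.

```python
qamr_example = {
    "sentence": "The company acquired the startup in 2015 .".split(),
    "qa_pairs": [  # answers are (start, end) token spans into the sentence
        {"question": "Who acquired something?", "answer_span": (0, 2)},   # The company
        {"question": "What was acquired?", "answer_span": (3, 5)},        # the startup
        {"question": "When did the acquisition happen?", "answer_span": (6, 7)},  # 2015
    ],
}
```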

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning

Title TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
Authors Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li
Abstract High network communication cost for synchronizing gradients and parameters is the well-known bottleneck of distributed training. In this work, we propose TernGrad that uses ternary gradients to accelerate distributed deep learning in data parallelism. Our approach requires only three numerical levels {-1,0,1}, which can aggressively reduce the communication time. We mathematically prove the convergence of TernGrad under the assumption of a bound on gradients. Guided by the bound, we propose layer-wise ternarizing and gradient clipping to improve its convergence. Our experiments show that applying TernGrad on AlexNet does not incur any accuracy loss and can even improve accuracy. The accuracy loss of GoogLeNet induced by TernGrad is less than 2% on average. Finally, a performance model is proposed to study the scalability of TernGrad. Experiments show significant speed gains for various deep neural networks. Our source code is available.
Tasks
Published 2017-05-22
URL http://arxiv.org/abs/1705.07878v6
PDF http://arxiv.org/pdf/1705.07878v6.pdf
PWC https://paperswithcode.com/paper/terngrad-ternary-gradients-to-reduce
Repo https://github.com/wenwei202/terngrad
Framework tf
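
The ternarization step described above is only a few lines; this hedged sketch includes the per-layer gradient clipping, with the clipping factor chosen arbitrarily here rather than taken from the paper.

```python
import numpy as np

def ternarize(grad, clip_factor=2.5, rng=None):
    """Stochastic ternarization of one layer's gradient (sketch).

    After clipping, each component survives with probability |g_k| / s,
    where s = max |g_k|, so the ternary gradient is unbiased: E[out] = g.
    """
    rng = np.random.default_rng(rng)
    g = np.clip(grad, -clip_factor * grad.std(), clip_factor * grad.std())
    s = np.abs(g).max()
    if s == 0.0:
        return np.zeros_like(g)
    keep = rng.random(g.shape) < np.abs(g) / s  # Bernoulli(|g_k| / s)
    return s * np.sign(g) * keep                # values in {-s, 0, +s}
```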

Progressive Neural Networks for Transfer Learning in Emotion Recognition

Title Progressive Neural Networks for Transfer Learning in Emotion Recognition
Authors John Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost
Abstract Many paralinguistic tasks are closely related and thus representations learned in one domain can be leveraged for another. In this paper, we investigate how knowledge can be transferred between three paralinguistic tasks: speaker, emotion, and gender recognition. Further, we extend this problem to cross-dataset tasks, asking how knowledge captured in one emotion dataset can be transferred to another. We focus on progressive neural networks and compare these networks to the conventional deep learning method of pre-training and fine-tuning. Progressive neural networks provide a way to transfer knowledge and avoid the forgetting effect present when pre-training neural networks on different tasks. Our experiments demonstrate that: (1) emotion recognition can benefit from using representations originally learned for different paralinguistic tasks and (2) transfer learning can effectively leverage additional datasets to improve the performance of emotion recognition systems.
Tasks Emotion Recognition, Transfer Learning
Published 2017-06-10
URL http://arxiv.org/abs/1706.03256v1
PDF http://arxiv.org/pdf/1706.03256v1.pdf
PWC https://paperswithcode.com/paper/progressive-neural-networks-for-transfer
Repo https://github.com/zbyte64/pytorch-dagsearch
Framework pytorch
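
A minimal, hedged sketch of the progressive architecture contrasted with fine-tuning above: the source-task column is frozen, and the target-task column receives its hidden activations through a lateral connection, so source knowledge is reused without being overwritten. The layer shapes and single lateral adapter are illustrative simplifications.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class ProgressiveTwoColumn:
    """Forward pass only: column 1 (source task) is frozen after pre-training;
    column 2 (target task) and the lateral adapter U are the trainable parts."""

    def __init__(self, d_in, d_hid, d_out, seed=0):
        r = np.random.default_rng(seed)
        init = lambda *shape: r.normal(0.0, 0.1, shape)
        self.W1a, self.W1b = init(d_in, d_hid), init(d_hid, d_out)  # frozen
        self.W2a, self.W2b = init(d_in, d_hid), init(d_hid, d_out)  # trainable
        self.U = init(d_hid, d_out)  # lateral: column-1 hidden -> column-2 output

    def forward(self, x):
        h1 = relu(x @ self.W1a)             # source column, weights never updated
        h2 = relu(x @ self.W2a)             # target column
        return h2 @ self.W2b + h1 @ self.U  # lateral connection injects h1

net = ProgressiveTwoColumn(d_in=40, d_hid=64, d_out=4)
logits = net.forward(np.zeros((8, 40)))  # (8, 4) target-task outputs
```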

Punny Captions: Witty Wordplay in Image Descriptions

Title Punny Captions: Witty Wordplay in Image Descriptions
Authors Arjun Chandrasekaran, Devi Parikh, Mohit Bansal
Abstract Wit is a form of rich interaction that is often grounded in a specific situation (e.g., a comment in response to an event). In this work, we attempt to build computational models that can produce witty descriptions for a given image. Inspired by a cognitive account of humor appreciation, we employ linguistic wordplay, specifically puns, in image descriptions. We develop two approaches which involve retrieving witty descriptions for a given image from a large corpus of sentences, or generating them via an encoder-decoder neural network architecture. We compare our approach against meaningful baseline approaches via human studies and show substantial improvements. We find that when a human is subject to similar constraints as the model regarding word usage and style, people vote the image descriptions generated by our model to be slightly wittier than human-written witty descriptions. Unsurprisingly, humans are almost always wittier than the model when they are free to choose the vocabulary, style, etc.
Tasks
Published 2017-04-26
URL http://arxiv.org/abs/1704.08224v2
PDF http://arxiv.org/pdf/1704.08224v2.pdf
PWC https://paperswithcode.com/paper/punny-captions-witty-wordplay-in-image
Repo https://github.com/purvaten/punny_captions
Framework tf

Predicting Driver Attention in Critical Situations

Title Predicting Driver Attention in Critical Situations
Authors Ye Xia, Danqing Zhang, Jinkyu Kim, Ken Nakayama, Karl Zipser, David Whitney
Abstract Robust driver attention prediction for critical situations is a challenging computer vision problem, yet essential for autonomous driving. Because critical driving moments are so rare, collecting enough data for these situations is difficult with the conventional in-car data collection protocol of tracking eye movements during driving. Here, we first propose a new in-lab driver attention collection protocol and introduce a new driver attention dataset, the Berkeley DeepDrive Attention (BDD-A) dataset, which is built upon braking event videos selected from a large-scale, crowd-sourced driving video dataset. We further propose the Human Weighted Sampling (HWS) method, which uses human gaze behavior to identify crucial frames of a driving dataset and weights them heavily during model training. With our dataset and HWS, we built a driver attention prediction model that outperforms the state-of-the-art and demonstrates sophisticated behaviors, like attending to crossing pedestrians but not giving false alarms to pedestrians safely walking on the sidewalk. Its prediction results are nearly indistinguishable from ground truth to humans. Although trained only on our in-lab attention data, the model also predicts in-car driver attention data of routine driving with state-of-the-art accuracy. This result not only demonstrates the performance of our model but also proves the validity and usefulness of our dataset and data collection protocol.
Tasks Autonomous Driving, Driver Attention Monitoring
Published 2017-11-17
URL http://arxiv.org/abs/1711.06406v3
PDF http://arxiv.org/pdf/1711.06406v3.pdf
PWC https://paperswithcode.com/paper/predicting-driver-attention-in-critical
Repo https://github.com/pascalxia/driver_attention_prediction
Framework tf
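
The sampling half of HWS reduces to weighted draws; the sketch below assumes a per-frame "crucialness" score is already available (in the paper it is derived from human gaze behavior, which is the part not reproduced here).

```python
import numpy as np

def weighted_frame_batch(crucialness, batch_size, rng=None):
    """Draw training-frame indices with probability proportional to their
    crucialness scores instead of uniformly (hedged sketch of HWS sampling)."""
    rng = np.random.default_rng(rng)
    p = np.asarray(crucialness, dtype=float)
    p /= p.sum()
    return rng.choice(len(p), size=batch_size, replace=True, p=p)
```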

Learning Two-Branch Neural Networks for Image-Text Matching Tasks

Title Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Authors Liwei Wang, Yin Li, Jing Huang, Svetlana Lazebnik
Abstract Image-language matching tasks have recently attracted a lot of attention in the computer vision field. These tasks include image-sentence matching, i.e., given an image query, retrieving relevant sentences and vice versa, and region-phrase matching or visual grounding, i.e., matching a phrase to relevant regions. This paper investigates two-branch neural networks for learning the similarity between these two data modalities. We propose two network structures that produce different output representations. The first one, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. Compared to standard triplet sampling, we perform improved neighborhood sampling that takes neighborhood information into consideration while constructing mini-batches. The second network structure, referred to as a similarity network, fuses the two branches via element-wise product and is trained with regression loss to directly predict a similarity score. Extensive experiments show that our networks achieve high accuracies for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on Flickr30K and MSCOCO datasets.
Tasks Text Matching
Published 2017-04-11
URL http://arxiv.org/abs/1704.03470v4
PDF http://arxiv.org/pdf/1704.03470v4.pdf
PWC https://paperswithcode.com/paper/learning-two-branch-neural-networks-for-image
Repo https://github.com/BryanPlummer/cite
Framework tf
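
A hedged sketch of the embedding network's core objective: a bi-directional max-margin ranking loss over matched image and sentence embeddings in the shared space. The paper's neighborhood sampling and neighborhood constraints are omitted, and the margin value is illustrative.

```python
import numpy as np

def bidirectional_ranking_loss(img_emb, txt_emb, margin=0.2):
    """img_emb, txt_emb: (N, D) L2-normalized embeddings where row i of each
    matrix forms a matching image-sentence pair; other rows act as negatives."""
    sims = img_emb @ txt_emb.T           # (N, N) cosine similarities
    pos = np.diag(sims)                  # similarities of the matching pairs
    cost_i2t = np.maximum(0.0, margin + sims - pos[:, None])  # image -> sentence
    cost_t2i = np.maximum(0.0, margin + sims - pos[None, :])  # sentence -> image
    np.fill_diagonal(cost_i2t, 0.0)      # matching pairs incur no cost
    np.fill_diagonal(cost_t2i, 0.0)
    return (cost_i2t.sum() + cost_t2i.sum()) / len(img_emb)
```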

Field-aware Factorization Machines in a Real-world Online Advertising System

Title Field-aware Factorization Machines in a Real-world Online Advertising System
Authors Yuchin Juan, Damien Lefortier, Olivier Chapelle
Abstract Predicting user response is one of the core machine learning tasks in computational advertising. Field-aware Factorization Machines (FFM) have recently been established as a state-of-the-art method for that problem and in particular won two Kaggle challenges. This paper presents some results from implementing this method in a production system that predicts click-through and conversion rates for display advertising, and shows that the method is not only effective for winning challenges but is also valuable in a real-world prediction system. We also discuss some specific challenges and solutions to reduce the training time, namely the use of an innovative seeding algorithm and a distributed learning mechanism.
Tasks
Published 2017-01-15
URL http://arxiv.org/abs/1701.04099v3
PDF http://arxiv.org/pdf/1701.04099v3.pdf
PWC https://paperswithcode.com/paper/field-aware-factorization-machines-in-a-real
Repo https://github.com/cpapadimitriou/Click-Through-Rate-prediction
Framework none
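
For reference, the FFM model itself (which the paper deploys rather than introduces): every feature keeps one latent vector per field, and the interaction of features j1 and j2 uses feature j1's vector for j2's field and vice versa. A hedged sketch on dense inputs:

```python
import numpy as np

def ffm_score(x, fields, W, bias=0.0, w_lin=None):
    """Field-aware factorization machine score (sketch, dense features).

    x:      (n,) feature values.
    fields: (n,) field index of each feature.
    W:      (n, n_fields, k) latent vectors; W[j, f] is feature j's vector
            used when it interacts with a feature from field f.
    """
    n = len(x)
    s = bias + (0.0 if w_lin is None else float(w_lin @ x))
    for j1 in range(n):
        for j2 in range(j1 + 1, n):
            s += (W[j1, fields[j2]] @ W[j2, fields[j1]]) * x[j1] * x[j2]
    return s
```

In a click-through-rate system the score is passed through a sigmoid to yield a probability; the quadratic loop above is for clarity only, since practical implementations exploit the sparsity of x.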

Metadynamics for Training Neural Network Model Chemistries: a Competitive Assessment

Title Metadynamics for Training Neural Network Model Chemistries: a Competitive Assessment
Authors John E. Herr, Kun Yao, Ryker McIntyre, David Toth, John Parkhill
Abstract Neural network (NN) model chemistries (MCs) promise to facilitate the accurate exploration of chemical space and simulation of large reactive systems. One important path to improving these models is to add layers of physical detail, especially long-range forces. At short range, however, these models are data driven and data limited. Little is systematically known about how data should be sampled, and 'test data' chosen randomly from some sampling techniques can provide poor information about generality. If the sampling method is narrow, 'test error' can appear encouragingly tiny while the model fails catastrophically elsewhere. In this manuscript we competitively evaluate two common sampling methods, molecular dynamics (MD) and normal-mode sampling (NMS), and one uncommon alternative, Metadynamics (MetaMD), for preparing training geometries. We show that MD is an inefficient sampling method in the sense that additional samples do not improve generality. We also show that MetaMD is easily implemented in any NNMC software package with a cost that scales linearly with the number of atoms in a sample molecule. MetaMD is a black-box way to ensure samples always reach out to new regions of chemical space, while remaining relevant to chemistry near $k_BT$. It is one cheap tool to address the issue of generalization.
Tasks
Published 2017-12-19
URL http://arxiv.org/abs/1712.07240v1
PDF http://arxiv.org/pdf/1712.07240v1.pdf
PWC https://paperswithcode.com/paper/metadynamics-for-training-neural-network
Repo https://github.com/jparkhill/TensorMol
Framework tf
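
A one-dimensional toy showing why MetaMD keeps producing new geometries: overdamped Langevin dynamics on a double-well potential, with repulsive Gaussian hills periodically deposited at visited positions so already-sampled basins fill in and the walker escapes. All constants are illustrative, not taken from the paper.

```python
import numpy as np

def metadynamics_1d(potential_grad, x0=0.0, steps=5000, dt=1e-3,
                    hill_height=0.1, hill_width=0.3, deposit_every=50,
                    kT=1.0, seed=0):
    """Sketch of 1-D metadynamics: bias hills accumulate where the walker goes."""
    rng = np.random.default_rng(seed)
    centers, traj, x = [], [], x0
    for i in range(steps):
        bias_grad = sum(
            -hill_height * (x - c) / hill_width ** 2
            * np.exp(-0.5 * ((x - c) / hill_width) ** 2)
            for c in centers
        )
        force = -(potential_grad(x) + bias_grad)
        x += dt * force + np.sqrt(2.0 * kT * dt) * rng.normal()  # Langevin step
        if i % deposit_every == 0:
            centers.append(x)  # deposit a repulsive hill at the current state
        traj.append(x)
    return np.array(traj)

# Double well V(x) = (x^2 - 1)^2 with gradient V'(x) = 4 x (x^2 - 1):
traj = metadynamics_1d(lambda x: 4.0 * x * (x ** 2 - 1.0))
```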