May 7, 2019

Paper Group AWR 3

Deep Impression: Audiovisual Deep Residual Networks for Multimodal Apparent Personality Trait Recognition. Adversarial Feature Learning. Entropy-SGD: Biasing Gradient Descent Into Wide Valleys. Demystifying Fixed k-Nearest Neighbor Information Estimators. DCM Bandits: Learning to Rank with Multiple Clicks. WaveNet: A Generative Model for Raw Audio. …

Deep Impression: Audiovisual Deep Residual Networks for Multimodal Apparent Personality Trait Recognition


Title	Deep Impression: Audiovisual Deep Residual Networks for Multimodal Apparent Personality Trait Recognition
Authors	Yağmur Güçlütürk, Umut Güçlü, Marcel A. J. van Gerven, Rob van Lier
Abstract	Here, we develop an audiovisual deep residual network for multimodal apparent personality trait recognition. The network is trained end-to-end for predicting the Big Five personality traits of people from their videos. That is, the network does not require any feature engineering or visual analysis such as face detection, face landmark alignment or facial expression recognition. Recently, the network won third place in the ChaLearn First Impressions Challenge with a test accuracy of 0.9109.
Tasks	Feature Engineering, Personality Trait Recognition
Published	2016-09-16
URL	http://arxiv.org/abs/1609.05119v1
PDF	http://arxiv.org/pdf/1609.05119v1.pdf
PWC	https://paperswithcode.com/paper/deep-impression-audiovisual-deep-residual
Repo	https://github.com/yagguc/deep_impression
Framework	none
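
The 0.9109 figure is on the challenge's accuracy measure. As we understand it, that measure is one minus the mean absolute error between predicted and annotated trait scores (both in [0, 1]), averaged over test videos and the five traits. The following is a minimal sketch of that computation under this assumption. `challenge_accuracy` and the scores below are hypothetical and do not come from the challenge kit or the paper.

```python
def challenge_accuracy(predictions, targets):
    """Return 1 - mean absolute error over every (video, trait) pair.

    Assumes both arguments are lists of rows, one row per video, each row
    holding five trait scores in [0, 1] in the same trait order.
    """
    total_error = 0.0
    count = 0
    for pred_row, target_row in zip(predictions, targets):
        for pred, target in zip(pred_row, target_row):
            total_error += abs(pred - target)
            count += 1
    return 1.0 - total_error / count


# Hypothetical Big Five scores for two videos.
preds = [[0.50, 0.60, 0.70, 0.40, 0.55], [0.30, 0.45, 0.65, 0.50, 0.60]]
truth = [[0.55, 0.60, 0.60, 0.45, 0.50], [0.35, 0.40, 0.70, 0.50, 0.65]]
print(round(challenge_accuracy(preds, truth), 4))
```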

Adversarial Feature Learning


Title	Adversarial Feature Learning
Authors	Jeff Donahue, Philipp Krähenbühl, Trevor Darrell
Abstract	The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing that the latent space of such generators captures semantic variation in the data distribution. Intuitively, models trained to predict these semantic latent representations given data may serve as useful feature representations for auxiliary problems where semantics are relevant. However, in their existing form, GANs have no means of learning the inverse mapping – projecting data back into the latent space. We propose Bidirectional Generative Adversarial Networks (BiGANs) as a means of learning this inverse mapping, and demonstrate that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, competitive with contemporary approaches to unsupervised and self-supervised feature learning.
Tasks
Published	2016-05-31
URL	http://arxiv.org/abs/1605.09782v7
PDF	http://arxiv.org/pdf/1605.09782v7.pdf
PWC	https://paperswithcode.com/paper/adversarial-feature-learning
Repo	https://github.com/jeffdonahue/bigan
Framework	none
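
In the BiGAN paper, the discriminator scores joint pairs. It should accept (x, E(x)) for real data and reject (G(z), z) for generated samples. This gives the value function V(D, E, G) = E_x[log D(x, E(x))] + E_z[log(1 − D(G(z), z))], which D maximizes and E and G minimize. The sketch below only evaluates a Monte Carlo estimate of V for toy scalar functions; no training is shown. The linear encoder, linear generator and logistic discriminator are hypothetical stand-ins for networks.

```python
import math
import random


def bigan_value(encoder, generator, discriminator, data, latents):
    """Monte Carlo estimate of V(D, E, G) for deterministic E and G."""
    # Real pairs: each data sample paired with its inferred latent code.
    real_term = sum(math.log(discriminator(x, encoder(x))) for x in data) / len(data)
    # Fake pairs: each generated sample paired with the latent it came from.
    fake_term = sum(math.log(1.0 - discriminator(generator(z), z)) for z in latents) / len(latents)
    return real_term + fake_term


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


rng = random.Random(0)
data = [rng.gauss(0.0, 2.0) for _ in range(1000)]     # toy "data" samples
latents = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # samples from p_Z

encoder = lambda x: 0.5 * x                           # hypothetical E: data -> latent
generator = lambda z: 2.0 * z                         # hypothetical G: latent -> data
disc = lambda x, z: sigmoid(0.3 * x - 0.6 * z + 0.1)  # hypothetical D on joint pairs

print(bigan_value(encoder, generator, disc, data, latents))
```

Because this toy E exactly inverts G, both kinds of pair lie on the line z = x/2, and the chosen D gives them identical scores. A discriminator that outputs 1/2 everywhere gives V = −log 4 ≈ −1.386.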

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys


Title	Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Authors	Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina
Abstract	This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage this observation to construct a local-entropy-based objective function that favors well-generalizable solutions lying in large flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in the sharp valleys. Conceptually, our algorithm resembles two nested loops of SGD where we use Langevin dynamics in the inner loop to compute the gradient of the local entropy before each update of the weights. We show that the new objective has a smoother energy landscape and show improved generalization over SGD using uniform stability, under certain assumptions. Our experiments on convolutional and recurrent networks demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of generalization error and training time.
Tasks
Published	2016-11-06
URL	http://arxiv.org/abs/1611.01838v5
PDF	http://arxiv.org/pdf/1611.01838v5.pdf
PWC	https://paperswithcode.com/paper/entropy-sgd-biasing-gradient-descent-into
Repo	https://github.com/ucla-vision/entropy-sgd
Framework	pytorch
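
The nested loops described in the abstract can be sketched for a scalar parameter. The inner loop runs Langevin dynamics on f(x') + (γ/2)(x − x')² and keeps a running average μ of x'. The outer loop then moves x along the local-entropy gradient, x ← x − ηγ(x − μ). This is a minimal sketch that assumes a full-batch gradient of a 1-D toy loss and illustrative hyperparameters (`gamma`, `eta`, `eta_inner`, `alpha`, `noise`). It is not the authors' PyTorch implementation.

```python
import math
import random


def entropy_sgd(grad_f, x0, steps=50, inner_steps=20, gamma=1.0,
                eta=1.0, eta_inner=0.1, alpha=0.75, noise=1e-3, seed=0):
    """Scalar Entropy-SGD sketch: Langevin inner loop, local-entropy outer step."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x_prime, mu = x, x
        for _ in range(inner_steps):
            # Gradient of f(x') + (gamma / 2) * (x - x')**2 with respect to x'.
            dx = grad_f(x_prime) - gamma * (x - x_prime)
            # Langevin step: gradient descent plus scaled Gaussian noise.
            x_prime = x_prime - eta_inner * dx + math.sqrt(eta_inner) * noise * rng.gauss(0.0, 1.0)
            # Exponential moving average approximates the mean of x'.
            mu = (1 - alpha) * mu + alpha * x_prime
        # Outer update along the (negative) local-entropy gradient gamma * (x - mu).
        x = x - eta * gamma * (x - mu)
    return x


# Toy loss f(x) = (x - 3)**2 with gradient 2 * (x - 3).
print(entropy_sgd(lambda x: 2.0 * (x - 3.0), x0=-2.0))
```

For this quadratic, the stationary mean of x' given x is (6 + γx)/(2 + γ), so the outer update has its fixed point at x = 3. The toy only shows the mechanics. The preference for wide valleys appears on non-convex losses.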

Demystifying Fixed k-Nearest Neighbor Information Estimators


Title	Demystifying Fixed k-Nearest Neighbor Information Estimators
Authors	Weihao Gao, Sewoong Oh, Pramod Viswanath
Abstract	Estimating mutual information from i.i.d. samples drawn from an unknown joint density function is a basic statistical problem of broad interest with multitudinous applications. The most popular estimator is one proposed by Kraskov, Stögbauer and Grassberger (KSG) in 2004. It is nonparametric and based on the distances of each sample to its k-th nearest neighboring sample, where k is a fixed small integer. Despite its widespread use (it is part of scientific software packages), theoretical properties of this estimator have been largely unexplored. In this paper we demonstrate that the estimator is consistent and also identify an upper bound on the rate of convergence of the bias as a function of the number of samples. We argue that the superior performance of the KSG estimator stems from a curious “correlation boosting” effect and build on this intuition to modify the KSG estimator in novel ways to construct a superior estimator. As a byproduct of our investigations, we obtain nearly tight rates of convergence of the ℓ2 error of the well-known fixed-k nearest neighbor estimator of differential entropy by Kozachenko and Leonenko.
Tasks
Published	2016-04-11
URL	http://arxiv.org/abs/1604.03006v2
PDF	http://arxiv.org/pdf/1604.03006v2.pdf
PWC	https://paperswithcode.com/paper/demystifying-fixed-k-nearest-neighbor
Repo	https://github.com/wgao9/knnie
Framework	none
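
For reference, the original KSG estimator that the paper analyzes (the first of Kraskov et al.'s two algorithms) computes Î = ψ(k) + ψ(N) − ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩. Here ε_i is the max-norm distance from sample i to its k-th nearest neighbor in the joint space, and n_x and n_y count the samples strictly within ε_i in each marginal. The following is a brute-force O(N²) sketch for scalar X and Y. It assumes continuous data without ties and is not the paper's modified estimator.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def ksg_mutual_information(xs, ys, k=3):
    """KSG (algorithm 1) estimate of I(X; Y) in nats for scalar samples."""
    n = len(xs)
    # Digamma at positive integers: psi(1) = -gamma, psi(m) = psi(m - 1) + 1 / (m - 1).
    psi = [0.0] * (n + 1)
    psi[1] = -EULER_GAMMA
    for m in range(2, n + 1):
        psi[m] = psi[m - 1] + 1.0 / (m - 1)

    marginal_sum = 0.0
    for i in range(n):
        # Max-norm distances in the joint (x, y) space to every other sample.
        joint = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j])) for j in range(n) if j != i
        )
        eps = joint[k - 1]  # distance to the k-th nearest neighbor
        # Marginal neighbor counts strictly closer than eps.
        n_x = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        marginal_sum += psi[n_x + 1] + psi[n_y + 1]

    return psi[k] + psi[n] - marginal_sum / n


rng = random.Random(1)
rho = 0.9
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0) for x in xs]
print(ksg_mutual_information(xs, ys))
```

For Gaussian pairs with correlation ρ, the true value is −½ ln(1 − ρ²), about 0.830 nats at ρ = 0.9. This gives a quick sanity check.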

DCM Bandits: Learning to Rank with Multiple Clicks


Title	DCM Bandits: Learning to Rank with Multiple Clicks
Authors	Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen
Abstract	A search engine recommends to the user a list of web pages. The user examines this list, from the first page to the last, and clicks on all attractive pages until the user is satisfied. This behavior of the user can be described by the dependent click model (DCM). We propose DCM bandits, an online learning variant of the DCM where the goal is to maximize the probability of recommending satisfactory items, such as web pages. The main challenge of our learning problem is that we do not observe which attractive item is satisfactory. We propose a computationally-efficient learning algorithm for solving our problem, dcmKL-UCB; derive gap-dependent upper bounds on its regret under reasonable assumptions; and also prove a matching lower bound up to logarithmic factors. We evaluate our algorithm on synthetic and real-world problems, and show that it performs well even when our model is misspecified. This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.
Tasks	Learning-To-Rank
Published	2016-02-09
URL	http://arxiv.org/abs/1602.03146v2
PDF	http://arxiv.org/pdf/1602.03146v2.pdf
PWC	https://paperswithcode.com/paper/dcm-bandits-learning-to-rank-with-multiple
Repo	https://github.com/wchen408/4803RA
Framework	none
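
As we read the paper, dcmKL-UCB ranks items by KL-UCB upper confidence bounds on their attraction probabilities. Because the learner cannot tell which click satisfied the user, it treats the items up to the last click as examined. The following is a minimal sketch of those pieces. The exploration budget log t, the bisection solver and the `(clicks, pulls)` bookkeeping are our assumptions, and the paper's exact confidence term may differ.

```python
import math


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, iters=60):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log t, found by bisection."""
    if pulls == 0:
        return 1.0  # untried items are ranked first
    budget = math.log(max(t, 2)) / pulls  # assumed exploration budget
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def recommend(stats, t, num_positions):
    """Rank items by KL-UCB index and return the top num_positions item ids."""
    def index(item):
        clicks, pulls = stats[item]
        mean = clicks / pulls if pulls else 0.0
        return kl_ucb_index(mean, pulls, t)
    return sorted(stats, key=index, reverse=True)[:num_positions]


def update(stats, ranked, clicks):
    """Treat items up to the last click (or the whole list if none) as observed."""
    clicked = [pos for pos, c in enumerate(clicks) if c]
    last = clicked[-1] if clicked else len(ranked) - 1
    for pos in range(last + 1):
        item = ranked[pos]
        c, n = stats[item]
        stats[item] = (c + clicks[pos], n + 1)
```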

WaveNet: A Generative Model for Raw Audio


Title	WaveNet: A Generative Model for Raw Audio
Authors	Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu
Abstract	This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Mandarin. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity. When trained to model music, we find that it generates novel and often highly realistic musical fragments. We also show that it can be employed as a discriminative model, returning promising results for phoneme recognition.
Tasks	Audio Generation, Speech Synthesis
Published	2016-09-12
URL	http://arxiv.org/abs/1609.03499v2
PDF	http://arxiv.org/pdf/1609.03499v2.pdf
PWC	https://paperswithcode.com/paper/wavenet-a-generative-model-for-raw-audio
Repo	https://github.com/NVIDIA/nv-wavenet
Framework	pytorch
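
The efficiency claim rests on dilated causal convolutions. Each output depends only on current and past samples, and doubling the dilation per layer makes the receptive field grow exponentially with depth. The WaveNet paper uses dilations 1, 2, 4, ..., 512 with filter width 2, which gives one such block a receptive field of 1024 samples. The following is a minimal pure-Python sketch of a single-channel layer without gating, residual or skip connections. The function names are ours.

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation]; inputs before t = 0 count as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of current and past input samples one output of the stack can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Stack of width-2 layers with doubling dilations, applied to an impulse at t = 0.
signal = [1.0] + [0.0] * 19
out = signal
for d in [1, 2, 4, 8]:
    out = causal_dilated_conv(out, [1.0, 1.0], d)
print(out)  # ones for t < 16 = receptive_field(2, [1, 2, 4, 8]), zeros afterwards
```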

A metric on the space of finite sets of trajectories for evaluation of multi-target tracking algorithms


Title	A metric on the space of finite sets of trajectories for evaluation of multi-target tracking algorithms
Authors	Abu Sajana Rahmathullah, Ángel F. García-Fernández, Lennart Svensson
Abstract	In this paper, we propose a metric on the space of finite sets of trajectories for assessing multi-target tracking algorithms in a mathematically sound way. The metric can be used, e.g., to compare estimates from algorithms with the ground truth. It includes intuitive costs associated with localization, missed and false targets, and track switches. Computing the metric requires solving a multi-dimensional assignment problem, which is NP-hard. Therefore, we also propose a lower bound for the metric, which is also a metric for sets of trajectories and is computable in polynomial time using linear programming (LP). The LP metric can be implemented using the alternating direction method of multipliers such that the complexity scales linearly with the length of the trajectories.
Tasks
Published	2016-05-04
URL	https://arxiv.org/abs/1605.01177v3
PDF	https://arxiv.org/pdf/1605.01177v3.pdf
PWC	https://paperswithcode.com/paper/a-metric-on-the-space-of-finite-sets-of
Repo	https://github.com/Agarciafernandez/MTT
Framework	none

Image Based Camera Localization: an Overview


Title	Image Based Camera Localization: an Overview
Authors	Yihong Wu, Fulin Tang, Heping Li
Abstract	Recently, virtual reality, augmented reality, robotics, autonomous driving, etc. have attracted much attention from both the academic and industrial communities, and image-based camera localization is a key task in all of them. However, there has not been a complete review of image-based camera localization. It is urgent to map this topic to help people enter the field quickly. In this paper, an overview of image-based camera localization is presented. A new and complete classification of image-based camera localization is provided and the related techniques are introduced. Trends for future development are also discussed. The overview should be useful not only to researchers but also to engineers and other interested readers.
Tasks	Autonomous Driving, Camera Localization
Published	2016-10-12
URL	http://arxiv.org/abs/1610.03660v4
PDF	http://arxiv.org/pdf/1610.03660v4.pdf
PWC	https://paperswithcode.com/paper/image-based-camera-localization-an-overview
Repo	https://github.com/obo/lego
Framework	none

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference


Title	FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
Authors	Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers
Abstract	Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 µs latency on the MNIST dataset with 95.8% accuracy, and 21906 image classifications per second with 283 µs latency on the CIFAR-10 and SVHN datasets with respectively 80.1% and 94.9% accuracy. To the best of our knowledge, ours are the fastest classification rates reported to date on these benchmarks.
Tasks
Published	2016-12-01
URL	http://arxiv.org/abs/1612.07119v1
PDF	http://arxiv.org/pdf/1612.07119v1.pdf
PWC	https://paperswithcode.com/paper/finn-a-framework-for-fast-scalable-binarized
Repo	https://github.com/Xilinx/BNN-PYNQ
Framework	none
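
Binarized networks like those FINN targets can replace multiply-accumulate operations with bitwise logic. If +1 is stored as bit 1 and −1 as bit 0, the dot product of two n-element vectors equals 2·popcount(XNOR(a, w)) − n. The sketch below shows that identity, plus a sign activation written as a threshold comparison. FINN folds batch normalization and the activation into such thresholds; the threshold value here is illustrative. This is a software illustration, not FINN's hardware code.

```python
import random


def pack_bits(values):
    """Encode a list of +1/-1 values as an integer bit mask (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two n-element +1/-1 vectors stored as bit masks."""
    mask = (1 << n) - 1
    agreements = bin(~(a_bits ^ w_bits) & mask).count("1")  # XNOR, then popcount
    # Each agreeing position contributes +1 and each disagreeing one -1.
    return 2 * agreements - n


def binarize(dot, threshold=0):
    """Sign-style activation expressed as a threshold comparison."""
    return 1 if dot >= threshold else -1


rng = random.Random(0)
a = [rng.choice([-1, 1]) for _ in range(64)]
w = [rng.choice([-1, 1]) for _ in range(64)]
print(xnor_popcount_dot(pack_bits(a), pack_bits(w), 64), sum(x * y for x, y in zip(a, w)))
```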

Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking


Title	Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking
Authors	Ergys Ristani, Francesco Solera, Roger S. Zou, Rita Cucchiara, Carlo Tomasi
Abstract	To help accelerate progress in multi-target, multi-camera tracking systems, we present (i) a new pair of precision-recall measures of performance that treats errors of all types uniformly and emphasizes correct identification over sources of error; (ii) the largest fully-annotated and calibrated data set to date with more than 2 million frames of 1080p, 60fps video taken by 8 cameras observing more than 2,700 identities over 85 minutes; and (iii) a reference software system as a comparison baseline. We show that (i) our measures properly account for bottom-line identity match performance in the multi-camera setting; (ii) our data set poses realistic challenges to current trackers; and (iii) the performance of our system is comparable to the state of the art.
Tasks
Published	2016-09-06
URL	http://arxiv.org/abs/1609.01775v2
PDF	http://arxiv.org/pdf/1609.01775v2.pdf
PWC	https://paperswithcode.com/paper/performance-measures-and-a-data-set-for-multi
Repo	https://github.com/yxgeee/MMT
Framework	pytorch
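
The precision-recall pair the abstract mentions is identity precision (IDP) and identity recall (IDR), which are summarized by IDF1:

- IDP = IDTP / (IDTP + IDFP)
- IDR = IDTP / (IDTP + IDFN)
- IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN)

The counts come from a one-to-one matching between true and computed identities that minimizes identity errors. That matching step is not shown, and the counts below are hypothetical.

```python
def identity_scores(idtp, idfp, idfn):
    """Return (IDP, IDR, IDF1) from identity true positive, false positive and false negative counts."""
    idp = idtp / (idtp + idfp)
    idr = idtp / (idtp + idfn)
    idf1 = 2 * idtp / (2 * idtp + idfp + idfn)
    return idp, idr, idf1


# Hypothetical frame-level counts after identity matching.
print(identity_scores(idtp=80, idfp=20, idfn=40))
```

IDF1 is the harmonic mean of IDP and IDR, so a tracker cannot score well by trading one off entirely against the other.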

Non Local Spatial and Angular Matching : Enabling higher spatial resolution diffusion MRI datasets through adaptive denoising


Title	Non Local Spatial and Angular Matching : Enabling higher spatial resolution diffusion MRI datasets through adaptive denoising
Authors	Samuel St-Jean, Pierrick Coupé, Maxime Descoteaux
Abstract	Diffusion magnetic resonance imaging datasets suffer from low Signal-to-Noise Ratio, especially at high b-values. Data acquired at high b-values contain relevant information and are now of great interest for microstructural and connectomics studies. High noise levels bias the measurements due to the non-Gaussian nature of the noise, which in turn can lead to a false and biased estimation of the diffusion parameters. Additionally, the usage of in-plane acceleration techniques during the acquisition leads to a spatially varying noise distribution, which depends on the parallel acceleration method implemented on the scanner. This paper proposes a novel diffusion MRI denoising technique that can be used on all existing data, without adding to the scanning time. We first apply a statistical framework to convert the noise to Gaussian distributed noise, effectively removing the bias. We then introduce a spatially and angularly adaptive denoising technique, the Non Local Spatial and Angular Matching (NLSAM) algorithm. Each volume is first decomposed into small overlapping 4D patches to capture the structure of the diffusion data, and a dictionary of atoms is learned on those patches. A local sparse decomposition is then found by bounding the reconstruction error with the local noise variance. We compare against three other state-of-the-art denoising methods and show quantitative local and connectivity results on a synthetic phantom and on an in-vivo high resolution dataset. Overall, our method restores perceptual information, removes the noise bias in common diffusion metrics, restores the coherence of the extracted peaks and improves reproducibility of tractography. Our work paves the way for higher spatial resolution acquisition of diffusion MRI datasets, which could in turn reveal new anatomical details that are not discernible at the spatial resolution currently used by the diffusion MRI community.
Tasks	Denoising
Published	2016-06-23
URL	http://arxiv.org/abs/1606.07239v1
PDF	http://arxiv.org/pdf/1606.07239v1.pdf
PWC	https://paperswithcode.com/paper/non-local-spatial-and-angular-matching
Repo	https://github.com/samuelstjean/nlsam
Framework	none

Richer Convolutional Features for Edge Detection


Title	Richer Convolutional Features for Edge Detection
Authors	Yun Liu, Ming-Ming Cheng, Xiaowei Hu, Kai Wang, Xiang Bai
Abstract	In this paper, we propose an accurate edge detector using richer convolutional features (RCF). Since objects in natural images have various scales and aspect ratios, the rich hierarchical representations automatically learned by CNNs are critical and effective for detecting edges and object boundaries. Moreover, the convolutional features gradually become coarser as the receptive fields increase. Based on these observations, our proposed network architecture makes full use of multiscale and multi-level information to perform the image-to-image edge prediction by combining all of the useful convolutional features into a holistic framework. It is the first attempt to adopt such rich convolutional features in computer vision tasks. Using the VGG16 network, we achieve state-of-the-art results on several available datasets. When evaluating on the well-known BSDS500 benchmark, we achieve an ODS F-measure of .811 while retaining a fast speed (8 FPS). Besides, our fast version of RCF achieves an ODS F-measure of .806 at 30 FPS.
Tasks	Edge Detection
Published	2016-12-07
URL	https://arxiv.org/abs/1612.02103v3
PDF	https://arxiv.org/pdf/1612.02103v3.pdf
PWC	https://paperswithcode.com/paper/richer-convolutional-features-for-edge
Repo	https://github.com/meteorshowers/RCF-pytorch
Framework	pytorch
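
The ODS (optimal dataset scale) F-measure quoted for BSDS500 uses a single threshold on the edge-probability maps for the whole dataset. At each threshold, matched and total pixel counts are summed over all images, F = 2PR/(P + R) is computed, and the best F is reported. The following is a minimal sketch assuming per-image counts are already available. The tolerance-based boundary matching that produces them is not shown, and the counts in the usage example are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def ods(counts_by_threshold):
    """Return (best_threshold, best_F) using one threshold for the whole dataset.

    counts_by_threshold maps a threshold to a list of per-image tuples
    (matched_pred, total_pred, matched_gt, total_gt).
    """
    best_threshold, best_f = None, -1.0
    for threshold, per_image in counts_by_threshold.items():
        matched_pred = sum(c[0] for c in per_image)
        total_pred = sum(c[1] for c in per_image)
        matched_gt = sum(c[2] for c in per_image)
        total_gt = sum(c[3] for c in per_image)
        precision = matched_pred / total_pred if total_pred else 0.0
        recall = matched_gt / total_gt if total_gt else 0.0
        f = f_measure(precision, recall)
        if f > best_f:
            best_threshold, best_f = threshold, f
    return best_threshold, best_f


# Hypothetical counts for two images at two thresholds.
counts = {
    0.3: [(80, 100, 90, 100), (40, 60, 45, 50)],
    0.5: [(70, 80, 70, 100), (35, 40, 35, 50)],
}
print(ods(counts))
```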

End-to-end learning of brain tissue segmentation from imperfect labeling


Title	End-to-end learning of brain tissue segmentation from imperfect labeling
Authors	Alex Fedorov, Jeremy Johnson, Eswar Damaraju, Alexei Ozerin, Vince Calhoun, Sergey Plis
Abstract	Segmenting a structural magnetic resonance imaging (MRI) scan is an important pre-processing step for analytic procedures and subsequent inferences about longitudinal tissue changes. Manual segmentation defines the current gold standard in quality but is prohibitively expensive. Automatic approaches are computationally intensive, incredibly slow at scale, and error-prone because they usually involve many potentially faulty intermediate steps. In order to streamline the segmentation, we introduce a deep learning model that is based on volumetric dilated convolutions, thereby reducing both processing time and errors. Compared to its competitors, the model has a reduced set of parameters and thus is easier to train and much faster to execute. The contrast in performance between the dilated network and its competitors becomes obvious when both are tested on a large dataset of unprocessed human brain volumes. The dilated network consistently outperforms not only another state-of-the-art deep learning approach, the up convolutional network, but also the ground truth on which it was trained. Not only can the incredible speed of our model make large scale analyses much easier, but we also believe it has great potential in a clinical setting where, with little to no substantial delay, a patient and provider can go over test results.
Tasks
Published	2016-12-03
URL	http://arxiv.org/abs/1612.00940v2
PDF	http://arxiv.org/pdf/1612.00940v2.pdf
PWC	https://paperswithcode.com/paper/end-to-end-learning-of-brain-tissue
Repo	https://github.com/Entodi/meshnet-pytorch
Framework	pytorch
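
The parameter and speed argument for dilated convolutions can be checked with simple arithmetic. For stride-1 layers, the receptive field per axis is 1 + Σ (k − 1)·d, while the parameter count of a k×k×k layer does not depend on the dilation d. The following is a minimal sketch. The layer schedule and channel widths are hypothetical and are not the published MeshNet configuration.

```python
def stack_summary(layers):
    """Receptive field (per axis) and parameter count of stacked stride-1 3-D convolutions.

    Each layer is (kernel_size, dilation, in_channels, out_channels); parameters
    are kernel_size**3 * in_channels * out_channels weights plus out_channels biases.
    """
    receptive_field = 1
    params = 0
    for kernel, dilation, c_in, c_out in layers:
        receptive_field += (kernel - 1) * dilation
        params += kernel ** 3 * c_in * c_out + c_out
    return receptive_field, params


# Hypothetical three-layer stacks with and without dilation.
dilated = [(3, 1, 1, 21), (3, 2, 21, 21), (3, 4, 21, 21)]
plain = [(3, 1, 1, 21), (3, 1, 21, 21), (3, 1, 21, 21)]
print(stack_summary(dilated))  # wider receptive field...
print(stack_summary(plain))    # ...for the same number of parameters
```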

STransE: a novel embedding model of entities and relationships in knowledge bases


Title	STransE: a novel embedding model of entities and relationships in knowledge bases
Authors	Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, Mark Johnson
Abstract	Knowledge bases of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge bases are typically incomplete, it is useful to be able to perform link prediction or knowledge base completion, i.e., predict whether a relationship not in the knowledge base is likely to be true. This paper combines insights from several previous link prediction models into a new embedding model STransE that represents each entity as a low-dimensional vector, and each relation by two matrices and a translation vector. STransE is a simple combination of the SE and TransE models, but it obtains better link prediction performance on two benchmark datasets than previous embedding models. Thus, STransE can serve as a new baseline for the more complex models in the link prediction task.
Tasks	Knowledge Base Completion, Link Prediction
Published	2016-06-27
URL	http://arxiv.org/abs/1606.08140v3
PDF	http://arxiv.org/pdf/1606.08140v3.pdf
PWC	https://paperswithcode.com/paper/stranse-a-novel-embedding-model-of-entities
Repo	https://github.com/datquocnguyen/STransE
Framework	none
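
STransE scores a triple (h, r, t) by f_r(h, t) = ‖W_r,1 h + r − W_r,2 t‖ using the ℓ1 or ℓ2 norm. Lower scores indicate more plausible triples. Setting both matrices to the identity recovers TransE, and dropping r recovers the SE form. The following is a minimal pure-Python sketch of the score only, without training or negative sampling. The vectors and matrices are toy values.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def stranse_score(head, tail, w1, w2, rel, norm="l1"):
    """||W1 h + r - W2 t||: lower means the triple (h, r, t) is more plausible."""
    diff = [a + rv - b for a, rv, b in zip(mat_vec(w1, head), rel, mat_vec(w2, tail))]
    if norm == "l1":
        return sum(abs(d) for d in diff)
    return sum(d * d for d in diff) ** 0.5


identity = [[1.0, 0.0], [0.0, 1.0]]
h, t, r = [1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]
print(stranse_score(h, t, identity, identity, r))  # TransE special case: h + r - t
```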

Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data


Title	Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data
Authors	Gil Keren, Björn Schuller
Abstract	Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input. We propose a model that enhances this feature extraction process for the case of sequential data, by feeding patches of the data into a recurrent neural network and using the outputs or hidden states of the recurrent units to compute the extracted features. By doing so, we exploit the fact that a window containing a few frames of the sequential data is a sequence itself and this additional structure might encapsulate valuable information. In addition, we allow for more steps of computation in the feature extraction process, which is potentially beneficial as an affine function followed by a non-linearity can result in too simple features. Using our convolutional recurrent layers we obtain an improvement in performance in two audio classification tasks, compared to traditional convolutional layers. TensorFlow code for the convolutional recurrent layers is publicly available at https://github.com/cruvadom/Convolutional-RNN.
Tasks	Audio Classification
Published	2016-02-18
URL	http://arxiv.org/abs/1602.05875v3
PDF	http://arxiv.org/pdf/1602.05875v3.pdf
PWC	https://paperswithcode.com/paper/convolutional-rnn-an-enhanced-model-for
Repo	https://github.com/cruvadom/Convolutional-RNN
Framework	tf
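
The core idea is to replace each convolution window's affine map and non-linearity with an RNN run over the frames inside the window. The following is a minimal sketch with a scalar tanh cell whose weights are hypothetical. The paper's layers use learned recurrent cells on multi-dimensional frames and may combine outputs differently, so treat this as an illustration of the windowing only.

```python
import math


def elman_step(h, x, w_h=0.5, w_x=1.0, b=0.0):
    """Scalar tanh RNN cell with hypothetical fixed weights."""
    return math.tanh(w_h * h + w_x * x + b)


def conv_rnn_features(sequence, window, stride, cell=elman_step):
    """Slide a window over the sequence and run an RNN over each window's frames.

    The final hidden state of each window serves as that window's feature,
    taking the place of the affine map plus non-linearity of a convolution.
    """
    features = []
    for start in range(0, len(sequence) - window + 1, stride):
        h = 0.0
        for frame in sequence[start:start + window]:
            h = cell(h, frame)
        features.append(h)
    return features


print(conv_rnn_features([0.1, 0.4, -0.3, 0.8, 0.2], window=3, stride=1))
```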