Paper Group AWR 3
Deep Impression: Audiovisual Deep Residual Networks for Multimodal Apparent Personality Trait Recognition
Title | Deep Impression: Audiovisual Deep Residual Networks for Multimodal Apparent Personality Trait Recognition |
Authors | Yağmur Güçlütürk, Umut Güçlü, Marcel A. J. van Gerven, Rob van Lier |
Abstract | Here, we develop an audiovisual deep residual network for multimodal apparent personality trait recognition. The network is trained end-to-end for predicting the Big Five personality traits of people from their videos. That is, the network does not require any feature engineering or visual analysis such as face detection, face landmark alignment or facial expression recognition. Recently, the network won third place in the ChaLearn First Impressions Challenge with a test accuracy of 0.9109. |
Tasks | Feature Engineering, Personality Trait Recognition |
Published | 2016-09-16 |
URL | http://arxiv.org/abs/1609.05119v1 |
http://arxiv.org/pdf/1609.05119v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-impression-audiovisual-deep-residual |
Repo | https://github.com/yagguc/deep_impression |
Framework | none |
Adversarial Feature Learning
Title | Adversarial Feature Learning |
Authors | Jeff Donahue, Philipp Krähenbühl, Trevor Darrell |
Abstract | The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing that the latent space of such generators captures semantic variation in the data distribution. Intuitively, models trained to predict these semantic latent representations given data may serve as useful feature representations for auxiliary problems where semantics are relevant. However, in their existing form, GANs have no means of learning the inverse mapping – projecting data back into the latent space. We propose Bidirectional Generative Adversarial Networks (BiGANs) as a means of learning this inverse mapping, and demonstrate that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, competitive with contemporary approaches to unsupervised and self-supervised feature learning. |
Tasks | |
Published | 2016-05-31 |
URL | http://arxiv.org/abs/1605.09782v7 |
http://arxiv.org/pdf/1605.09782v7.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-feature-learning |
Repo | https://github.com/jeffdonahue/bigan |
Framework | none |
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Title | Entropy-SGD: Biasing Gradient Descent Into Wide Valleys |
Authors | Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina |
Abstract | This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage upon this observation to construct a local-entropy-based objective function that favors well-generalizable solutions lying in large flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in the sharp valleys. Conceptually, our algorithm resembles two nested loops of SGD where we use Langevin dynamics in the inner loop to compute the gradient of the local entropy before each update of the weights. We show that the new objective has a smoother energy landscape and show improved generalization over SGD using uniform stability, under certain assumptions. Our experiments on convolutional and recurrent networks demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of generalization error and training time. |
Tasks | |
Published | 2016-11-06 |
URL | http://arxiv.org/abs/1611.01838v5 |
http://arxiv.org/pdf/1611.01838v5.pdf | |
PWC | https://paperswithcode.com/paper/entropy-sgd-biasing-gradient-descent-into |
Repo | https://github.com/ucla-vision/entropy-sgd |
Framework | pytorch |
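The nested-loop structure described in the abstract can be illustrated with a minimal sketch on a toy one-dimensional loss. This is not the authors' implementation: the loss `grad_f`, the coupling strength `gamma`, the averaging weight `alpha`, and all step sizes and noise scales are assumptions chosen only to make the example run. The inner loop takes noisy (Langevin) gradient steps on the loss plus a quadratic term that ties the iterate to the current weights, and the outer step moves the weights toward the average of those inner iterates.

```python
import math
import random


def grad_f(x):
    """Gradient of a toy 1-D loss f(x) = x**4 / 4 - x**2 / 2 (illustrative only)."""
    return x ** 3 - x


def entropy_sgd_step(x, rng, gamma=1.0, inner_steps=20, inner_lr=0.05,
                     noise=1e-3, alpha=0.75, outer_lr=0.1):
    """One outer update; all hyperparameter names and values are assumptions."""
    x_prime = x   # Langevin iterate, started at the current weights
    mu = x        # running average of the inner iterates
    for _ in range(inner_steps):
        # Gradient of f(x') + (gamma / 2) * (x - x')**2 with respect to x'.
        g = grad_f(x_prime) - gamma * (x - x_prime)
        # Langevin step: gradient descent plus Gaussian noise.
        x_prime += -inner_lr * g + math.sqrt(inner_lr) * noise * rng.gauss(0.0, 1.0)
        mu = (1.0 - alpha) * mu + alpha * x_prime
    # gamma * (x - mu) estimates the gradient of the negative local entropy.
    return x - outer_lr * gamma * (x - mu)


def run_entropy_sgd(x0, steps=200, seed=0):
    """Apply the outer update repeatedly from the starting point x0."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x = entropy_sgd_step(x, rng)
    return x
```

Calling `run_entropy_sgd(2.0)` drives the weight toward the nearby minimum of the toy loss at `x = 1`. The sketch only shows the control flow of the two loops; it says nothing about generalization.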
Demystifying Fixed k-Nearest Neighbor Information Estimators
Title | Demystifying Fixed k-Nearest Neighbor Information Estimators |
Authors | Weihao Gao, Sewoong Oh, Pramod Viswanath |
Abstract | Estimating mutual information from i.i.d. samples drawn from an unknown joint density function is a basic statistical problem of broad interest with multitudinous applications. The most popular estimator is one proposed by Kraskov, Stögbauer, and Grassberger (KSG) in 2004, and is nonparametric and based on the distances of each sample to its $k^{\rm th}$ nearest neighboring sample, where $k$ is a fixed small integer. Despite its widespread use (part of scientific software packages), theoretical properties of this estimator have been largely unexplored. In this paper we demonstrate that the estimator is consistent and also identify an upper bound on the rate of convergence of the bias as a function of number of samples. We argue that the superior performance benefits of the KSG estimator stem from a curious “correlation boosting” effect and build on this intuition to modify the KSG estimator in novel ways to construct a superior estimator. As a byproduct of our investigations, we obtain nearly tight rates of convergence of the $\ell_2$ error of the well known fixed $k$ nearest neighbor estimator of differential entropy by Kozachenko and Leonenko. |
Tasks | |
Published | 2016-04-11 |
URL | http://arxiv.org/abs/1604.03006v2 |
http://arxiv.org/pdf/1604.03006v2.pdf | |
PWC | https://paperswithcode.com/paper/demystifying-fixed-k-nearest-neighbor |
Repo | https://github.com/wgao9/knnie |
Framework | none |
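The fixed-$k$ nearest neighbor entropy estimator mentioned at the end of the abstract can be sketched for one-dimensional data as follows. This is a minimal illustration in a commonly used form of the Kozachenko-Leonenko estimator, not the code from the linked repository; the function names and the default `k=3` are assumptions. The estimate is built from the log-distance of each sample to its $k$-th nearest neighbor, corrected by digamma terms.

```python
import math
import random


def digamma_int(n):
    """Digamma function at a positive integer: psi(n) = -euler_gamma + sum_{j<n} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))


def kl_entropy_1d(samples, k=3):
    """Fixed-k nearest-neighbor estimate of differential entropy (nats) in 1-D."""
    xs = sorted(samples)
    n = len(xs)
    if n <= k:
        raise ValueError("need more than k samples")
    log_dist_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted 1-D data the k nearest neighbors lie within k positions of i.
        lo, hi = max(0, i - k), min(n, i + k + 1)
        dists = sorted(abs(x - xs[j]) for j in range(lo, hi) if j != i)
        log_dist_sum += math.log(dists[k - 1])  # distance to the k-th neighbor
    # log(2) is the log-volume of the unit ball in one dimension.
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + log_dist_sum / n
```

For samples from a standard normal distribution the estimate should approach the true entropy $\frac{1}{2}\log(2\pi e)$ as the sample size grows. Repeated sample values would give a zero distance and break the logarithm, so this sketch assumes continuous data without ties.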
DCM Bandits: Learning to Rank with Multiple Clicks
Title | DCM Bandits: Learning to Rank with Multiple Clicks |
Authors | Sumeet Katariya, Branislav Kveton, Csaba Szepesvári, Zheng Wen |
Abstract | A search engine recommends to the user a list of web pages. The user examines this list, from the first page to the last, and clicks on all attractive pages until the user is satisfied. This behavior of the user can be described by the dependent click model (DCM). We propose DCM bandits, an online learning variant of the DCM where the goal is to maximize the probability of recommending satisfactory items, such as web pages. The main challenge of our learning problem is that we do not observe which attractive item is satisfactory. We propose a computationally-efficient learning algorithm for solving our problem, dcmKL-UCB; derive gap-dependent upper bounds on its regret under reasonable assumptions; and also prove a matching lower bound up to logarithmic factors. We evaluate our algorithm on synthetic and real-world problems, and show that it performs well even when our model is misspecified. This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model. |
Tasks | Learning-To-Rank |
Published | 2016-02-09 |
URL | http://arxiv.org/abs/1602.03146v2 |
http://arxiv.org/pdf/1602.03146v2.pdf | |
PWC | https://paperswithcode.com/paper/dcm-bandits-learning-to-rank-with-multiple |
Repo | https://github.com/wchen408/4803RA |
Framework | none |
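The user behavior described in the abstract can be simulated with a short sketch. The parameter names `attraction` and `termination` are assumptions introduced here: each position has a probability that its item is attractive (and therefore clicked), and a probability that a click at that position satisfies the user, who then stops scanning. The learner would see only the clicks, not the `satisfied` flag.

```python
import random


def simulate_dcm(attraction, termination, rng):
    """Simulate one user scanning a ranked list under a dependent click model.

    attraction[k]: probability that the item at position k is attractive.
    termination[k]: probability that a click at position k satisfies the user.
    Returns (clicked_positions, satisfied).
    """
    clicks = []
    for k, (w, v) in enumerate(zip(attraction, termination)):
        if rng.random() < w:          # the item is attractive, so it is clicked
            clicks.append(k)
            if rng.random() < v:      # the click satisfies the user, who leaves
                return clicks, True
    return clicks, False


def satisfaction_probability(attraction, termination):
    """P(user leaves satisfied) = 1 - prod_k (1 - termination[k] * attraction[k])."""
    p_unsatisfied = 1.0
    for w, v in zip(attraction, termination):
        p_unsatisfied *= 1.0 - v * w
    return 1.0 - p_unsatisfied
```

The closed-form probability assumes that attraction and termination events are independent across positions. The learning algorithm itself (dcmKL-UCB) is not shown here.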
WaveNet: A Generative Model for Raw Audio
Title | WaveNet: A Generative Model for Raw Audio |
Authors | Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu |
Abstract | This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Mandarin. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity. When trained to model music, we find that it generates novel and often highly realistic musical fragments. We also show that it can be employed as a discriminative model, returning promising results for phoneme recognition. |
Tasks | Audio Generation, Speech Synthesis |
Published | 2016-09-12 |
URL | http://arxiv.org/abs/1609.03499v2 |
http://arxiv.org/pdf/1609.03499v2.pdf | |
PWC | https://paperswithcode.com/paper/wavenet-a-generative-model-for-raw-audio |
Repo | https://github.com/NVIDIA/nv-wavenet |
Framework | pytorch |
A metric on the space of finite sets of trajectories for evaluation of multi-target tracking algorithms
Title | A metric on the space of finite sets of trajectories for evaluation of multi-target tracking algorithms |
Authors | Abu Sajana Rahmathullah, Ángel F. García-Fernández, Lennart Svensson |
Abstract | In this paper, we propose a metric on the space of finite sets of trajectories for assessing multi-target tracking algorithms in a mathematically sound way. The metric can be used, e.g., to compare estimates from algorithms with the ground truth. It includes intuitive costs associated to localization, missed and false targets and track switches. The metric computation is based on multi-dimensional assignments, which is an NP hard problem. Therefore, we also propose a lower bound for the metric, which is also a metric for sets of trajectories and is computable in polynomial time using linear programming (LP). The LP metric can be implemented using alternating direction method of multipliers such that the complexity scales linearly with the length of the trajectories. |
Tasks | |
Published | 2016-05-04 |
URL | https://arxiv.org/abs/1605.01177v3 |
https://arxiv.org/pdf/1605.01177v3.pdf | |
PWC | https://paperswithcode.com/paper/a-metric-on-the-space-of-finite-sets-of |
Repo | https://github.com/Agarciafernandez/MTT |
Framework | none |
Image Based Camera Localization: an Overview
Title | Image Based Camera Localization: an Overview |
Authors | Yihong Wu, Fulin Tang, Heping Li |
Abstract | Recently, virtual reality, augmented reality, robotics, autonomous driving, and related fields have attracted much attention from both the academic and industrial communities, and image-based camera localization is a key task in all of them. However, there has not been a complete review of image-based camera localization. It is urgent to map this topic to help people enter the field quickly. In this paper, an overview of image-based camera localization is presented. A new and complete classification of image-based camera localization is provided, and the related techniques are introduced. Trends for future development are also discussed. It will be useful not only to researchers but also to engineers and other interested people. |
Tasks | Autonomous Driving, Camera Localization |
Published | 2016-10-12 |
URL | http://arxiv.org/abs/1610.03660v4 |
http://arxiv.org/pdf/1610.03660v4.pdf | |
PWC | https://paperswithcode.com/paper/image-based-camera-localization-an-overview |
Repo | https://github.com/obo/lego |
Framework | none |
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
Title | FINN: A Framework for Fast, Scalable Binarized Neural Network Inference |
Authors | Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers |
Abstract | Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 μs latency on the MNIST dataset with 95.8% accuracy, and 21906 image classifications per second with 283 μs latency on the CIFAR-10 and SVHN datasets with respectively 80.1% and 94.9% accuracy. To the best of our knowledge, ours are the fastest classification rates reported to date on these benchmarks. |
Tasks | |
Published | 2016-12-01 |
URL | http://arxiv.org/abs/1612.07119v1 |
http://arxiv.org/pdf/1612.07119v1.pdf | |
PWC | https://paperswithcode.com/paper/finn-a-framework-for-fast-scalable-binarized |
Repo | https://github.com/Xilinx/BNN-PYNQ |
Framework | none |
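The abstract does not spell out the arithmetic behind binarized layers. As an illustration only, the sketch below shows a formulation that is common in the binarized-network literature: a dot product over values in {-1, +1} computed with XNOR and a population count. Whether this matches FINN's hardware mapping in detail is an assumption, and the helper names are hypothetical.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int, one bit each (+1 -> 1, -1 -> 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors over {-1, +1} from their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask      # XNOR: 1 where the two signs match
    matches = bin(agree).count("1")        # popcount
    return 2 * matches - n                 # matches minus mismatches
```

Each matching pair of signs contributes +1 and each mismatch contributes -1, so the result equals the ordinary dot product of the two ±1 vectors while using only bitwise operations.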
Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking
Title | Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking |
Authors | Ergys Ristani, Francesco Solera, Roger S. Zou, Rita Cucchiara, Carlo Tomasi |
Abstract | To help accelerate progress in multi-target, multi-camera tracking systems, we present (i) a new pair of precision-recall measures of performance that treats errors of all types uniformly and emphasizes correct identification over sources of error; (ii) the largest fully-annotated and calibrated data set to date with more than 2 million frames of 1080p, 60fps video taken by 8 cameras observing more than 2,700 identities over 85 minutes; and (iii) a reference software system as a comparison baseline. We show that (i) our measures properly account for bottom-line identity match performance in the multi-camera setting; (ii) our data set poses realistic challenges to current trackers; and (iii) the performance of our system is comparable to the state of the art. |
Tasks | |
Published | 2016-09-06 |
URL | http://arxiv.org/abs/1609.01775v2 |
http://arxiv.org/pdf/1609.01775v2.pdf | |
PWC | https://paperswithcode.com/paper/performance-measures-and-a-data-set-for-multi |
Repo | https://github.com/yxgeee/MMT |
Framework | pytorch |
Non Local Spatial and Angular Matching : Enabling higher spatial resolution diffusion MRI datasets through adaptive denoising
Title | Non Local Spatial and Angular Matching : Enabling higher spatial resolution diffusion MRI datasets through adaptive denoising |
Authors | Samuel St-Jean, Pierrick Coupé, Maxime Descoteaux |
Abstract | Diffusion magnetic resonance imaging datasets suffer from low Signal-to-Noise Ratio, especially at high b-values. Acquiring data at high b-values contains relevant information and is now of great interest for microstructural and connectomics studies. High noise levels bias the measurements due to the non-Gaussian nature of the noise, which in turn can lead to a false and biased estimation of the diffusion parameters. Additionally, the usage of in-plane acceleration techniques during the acquisition leads to a spatially varying noise distribution, which depends on the parallel acceleration method implemented on the scanner. This paper proposes a novel diffusion MRI denoising technique that can be used on all existing data, without adding to the scanning time. We first apply a statistical framework to convert the noise to Gaussian distributed noise, effectively removing the bias. We then introduce a spatially and angular adaptive denoising technique, the Non Local Spatial and Angular Matching (NLSAM) algorithm. Each volume is first decomposed in small 4D overlapping patches to capture the structure of the diffusion data and a dictionary of atoms is learned on those patches. A local sparse decomposition is then found by bounding the reconstruction error with the local noise variance. We compare against three other state-of-the-art denoising methods and show quantitative local and connectivity results on a synthetic phantom and on an in-vivo high resolution dataset. Overall, our method restores perceptual information, removes the noise bias in common diffusion metrics, restores the extracted peaks coherence and improves reproducibility of tractography. Our work paves the way for higher spatial resolution acquisition of diffusion MRI datasets, which could in turn reveal new anatomical details that are not discernible at the spatial resolution currently used by the diffusion MRI community. |
Tasks | Denoising |
Published | 2016-06-23 |
URL | http://arxiv.org/abs/1606.07239v1 |
http://arxiv.org/pdf/1606.07239v1.pdf | |
PWC | https://paperswithcode.com/paper/non-local-spatial-and-angular-matching |
Repo | https://github.com/samuelstjean/nlsam |
Framework | none |
Richer Convolutional Features for Edge Detection
Title | Richer Convolutional Features for Edge Detection |
Authors | Yun Liu, Ming-Ming Cheng, Xiaowei Hu, Kai Wang, Xiang Bai |
Abstract | In this paper, we propose an accurate edge detector using richer convolutional features (RCF). Since objects in natural images have various scales and aspect ratios, the rich hierarchical representations automatically learned by CNNs are critical and effective for detecting edges and object boundaries. In addition, the convolutional features gradually become coarser as the receptive fields increase. Based on these observations, our proposed network architecture makes full use of multiscale and multi-level information to perform image-to-image edge prediction by combining all of the useful convolutional features into a holistic framework. It is the first attempt to adopt such rich convolutional features in computer vision tasks. Using the VGG16 network, we achieve state-of-the-art results on several available datasets. When evaluated on the well-known BSDS500 benchmark, we achieve an ODS F-measure of .811 while retaining a fast speed (8 FPS). In addition, our fast version of RCF achieves an ODS F-measure of .806 at 30 FPS. |
Tasks | Edge Detection |
Published | 2016-12-07 |
URL | https://arxiv.org/abs/1612.02103v3 |
https://arxiv.org/pdf/1612.02103v3.pdf | |
PWC | https://paperswithcode.com/paper/richer-convolutional-features-for-edge |
Repo | https://github.com/meteorshowers/RCF-pytorch |
Framework | pytorch |
End-to-end learning of brain tissue segmentation from imperfect labeling
Title | End-to-end learning of brain tissue segmentation from imperfect labeling |
Authors | Alex Fedorov, Jeremy Johnson, Eswar Damaraju, Alexei Ozerin, Vince Calhoun, Sergey Plis |
Abstract | Segmenting a structural magnetic resonance imaging (MRI) scan is an important pre-processing step for analytic procedures and subsequent inferences about longitudinal tissue changes. Manual segmentation defines the current gold standard in quality but is prohibitively expensive. Automatic approaches are computationally intensive, incredibly slow at scale, and error prone due to usually involving many potentially faulty intermediate steps. In order to streamline the segmentation, we introduce a deep learning model that is based on volumetric dilated convolutions, subsequently reducing both processing time and errors. Compared to its competitors, the model has a reduced set of parameters and thus is easier to train and much faster to execute. The contrast in performance between the dilated network and its competitors becomes obvious when both are tested on a large dataset of unprocessed human brain volumes. The dilated network consistently outperforms not only another state-of-the-art deep learning approach, the up convolutional network, but also the ground truth on which it was trained. Not only can the incredible speed of our model make large scale analyses much easier but we also believe it has great potential in a clinical setting where, with little to no substantial delay, a patient and provider can go over test results. |
Tasks | |
Published | 2016-12-03 |
URL | http://arxiv.org/abs/1612.00940v2 |
http://arxiv.org/pdf/1612.00940v2.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-brain-tissue |
Repo | https://github.com/Entodi/meshnet-pytorch |
Framework | pytorch |
STransE: a novel embedding model of entities and relationships in knowledge bases
Title | STransE: a novel embedding model of entities and relationships in knowledge bases |
Authors | Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, Mark Johnson |
Abstract | Knowledge bases of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge bases are typically incomplete, it is useful to be able to perform link prediction or knowledge base completion, i.e., predict whether a relationship not in the knowledge base is likely to be true. This paper combines insights from several previous link prediction models into a new embedding model STransE that represents each entity as a low-dimensional vector, and each relation by two matrices and a translation vector. STransE is a simple combination of the SE and TransE models, but it obtains better link prediction performance on two benchmark datasets than previous embedding models. Thus, STransE can serve as a new baseline for the more complex models in the link prediction task. |
Tasks | Knowledge Base Completion, Link Prediction |
Published | 2016-06-27 |
URL | http://arxiv.org/abs/1606.08140v3 |
http://arxiv.org/pdf/1606.08140v3.pdf | |
PWC | https://paperswithcode.com/paper/stranse-a-novel-embedding-model-of-entities |
Repo | https://github.com/datquocnguyen/STransE |
Framework | none |
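The representation described in the abstract, one vector per entity plus two matrices and a translation vector per relation, suggests a score of the form $\lVert W_{r,1} h + r - W_{r,2} t \rVert$. The sketch below computes that score with plain Python lists. It is an illustration under that assumption, not the code from the linked repository, and the function and parameter names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def stranse_score(head, tail, w1, w2, rel, norm=1):
    """Return ||W1 h + r - W2 t|| in the L1 or L2 norm; lower means more plausible."""
    wh = mat_vec(w1, head)
    wt = mat_vec(w2, tail)
    diff = [a + r - b for a, r, b in zip(wh, rel, wt)]
    if norm == 1:
        return sum(abs(d) for d in diff)
    return sum(d * d for d in diff) ** 0.5
```

With both matrices set to the identity, the score reduces to the TransE form $\lVert h + r - t \rVert$, which is one way to see the model as a combination of SE and TransE.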
Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data
Title | Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data |
Authors | Gil Keren, Björn Schuller |
Abstract | Traditional convolutional layers extract features from patches of data by applying a non-linearity on an affine function of the input. We propose a model that enhances this feature extraction process for the case of sequential data, by feeding patches of the data into a recurrent neural network and using the outputs or hidden states of the recurrent units to compute the extracted features. By doing so, we exploit the fact that a window containing a few frames of the sequential data is a sequence itself and this additional structure might encapsulate valuable information. In addition, we allow for more steps of computation in the feature extraction process, which is potentially beneficial as an affine function followed by a non-linearity can result in too simple features. Using our convolutional recurrent layers we obtain an improvement in performance in two audio classification tasks, compared to traditional convolutional layers. Tensorflow code for the convolutional recurrent layers is publicly available in https://github.com/cruvadom/Convolutional-RNN. |
Tasks | Audio Classification |
Published | 2016-02-18 |
URL | http://arxiv.org/abs/1602.05875v3 |
http://arxiv.org/pdf/1602.05875v3.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-rnn-an-enhanced-model-for |
Repo | https://github.com/cruvadom/Convolutional-RNN |
Framework | tf |
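The idea in the abstract, treating each window of frames as a short sequence and summarizing it with a recurrent network, can be sketched in plain Python. This is not the TensorFlow code from the linked repository. The tanh cell, the choice of the final hidden state as the feature, and all names and sizes are assumptions made for illustration.

```python
import math
import random


def rnn_last_state(frames, w_in, w_rec, bias):
    """Run a tanh RNN over the frames of one window and return the final hidden state."""
    hidden = [0.0] * len(bias)
    for frame in frames:
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[j], frame))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


def conv_rnn_layer(sequence, window, w_in, w_rec, bias, stride=1):
    """Slide a window over the sequence; each window's feature is the RNN's last state."""
    return [
        rnn_last_state(sequence[s:s + window], w_in, w_rec, bias)
        for s in range(0, len(sequence) - window + 1, stride)
    ]


def random_matrix(rows, cols, rng):
    """Small random weight matrix, used only to exercise the sketch."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
```

As with an ordinary convolution, the same weights are shared across all window positions; the difference is that each window is processed by several recurrent steps instead of a single affine map followed by a non-linearity.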