May 7, 2019

2770 words 14 mins read

Paper Group AWR 54

A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract

Title A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract
Authors Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, Korin Richmond
Abstract We present a multilinear statistical model of the human tongue that captures anatomical and tongue pose related shape variations separately. The model is derived from 3D magnetic resonance imaging data of 11 speakers sustaining speech related vocal tract configurations. The extraction is performed using a minimally supervised method that builds on an image segmentation approach and a template fitting technique. Furthermore, it uses image denoising to deal with potentially corrupted data, palate surface information reconstruction to handle palatal tongue contacts, and a bootstrap strategy to refine the obtained shapes. Our evaluation concludes that limiting the degrees of freedom for the anatomical and speech related variations to 5 and 4, respectively, produces a model that can reliably register unknown data while avoiding overfitting effects. Furthermore, we show that it can be used to generate a plausible tongue animation by tracking sparse motion capture data.
Tasks Denoising, Image Denoising, Motion Capture, Semantic Segmentation
Published 2016-12-15
URL http://arxiv.org/abs/1612.05005v5
PDF http://arxiv.org/pdf/1612.05005v5.pdf
PWC https://paperswithcode.com/paper/a-multilinear-tongue-model-derived-from
Repo https://github.com/m2ci-msp/mri-shape-tools
Framework none
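
The abstract describes a Tucker-style multilinear decomposition; the short sketch below shows how such a model, once fitted, reconstructs a mesh from separate anatomy and pose coefficients. It is an illustration with placeholder arrays and mode sizes taken from the abstract (5 anatomy, 4 pose), not the authors' C++ mri-shape-tools pipeline.

```python
# Minimal sketch of a multilinear (Tucker-style) tongue model with placeholder data.
import numpy as np

rng = np.random.default_rng(0)
n_vertices = 1000                                # hypothetical mesh resolution
mean_shape = rng.normal(size=3 * n_vertices)     # stand-in for the fitted mean mesh
core = rng.normal(size=(5, 4, 3 * n_vertices))   # multilinear core tensor (anatomy x pose x coords)

def reconstruct(w_anatomy, w_pose):
    """Shape = mean + core contracted with anatomy and pose coefficients."""
    offset = np.einsum('a,p,apd->d', w_anatomy, w_pose, core)
    return (mean_shape + offset).reshape(n_vertices, 3)

# Example: unit weight on the first mode of each space.
shape = reconstruct(np.eye(5)[0], np.eye(4)[0])
print(shape.shape)   # (1000, 3)
```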

Face Detection with the Faster R-CNN

Title Face Detection with the Faster R-CNN
Authors Huaizu Jiang, Erik Learned-Miller
Abstract The Faster R-CNN has recently demonstrated impressive results on various object detection benchmarks. By training a Faster R-CNN model on the large scale WIDER face dataset, we report state-of-the-art results on two widely used face detection benchmarks, FDDB and the recently released IJB-A.
Tasks Face Detection, Object Detection
Published 2016-06-10
URL http://arxiv.org/abs/1606.03473v1
PDF http://arxiv.org/pdf/1606.03473v1.pdf
PWC https://paperswithcode.com/paper/face-detection-with-the-faster-r-cnn
Repo https://github.com/playerkk/face-py-faster-rcnn
Framework none
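
As a rough illustration of the detection pipeline, the sketch below runs torchvision's Faster R-CNN and keeps high-confidence boxes. It assumes a recent torchvision (the `weights="DEFAULT"` API) and uses the COCO checkpoint as a stand-in, whereas the paper fine-tunes on WIDER FACE; the image path and the 0.8 threshold are placeholders.

```python
# Illustrative only: generic Faster R-CNN inference with torchvision.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("example.jpg").convert("RGB")   # hypothetical input image
with torch.no_grad():
    outputs = model([to_tensor(image)])[0]          # dict with boxes, labels, scores

keep = outputs["scores"] > 0.8                      # keep only confident detections
for box, score in zip(outputs["boxes"][keep], outputs["scores"][keep]):
    print(box.tolist(), float(score))
```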

Can we still avoid automatic face detection?

Title Can we still avoid automatic face detection?
Authors Michael J. Wilber, Vitaly Shmatikov, Serge Belongie
Abstract After decades of study, automatic face detection and recognition systems are now accurate and widespread. Naturally, this means users who wish to avoid automatic recognition are becoming less able to do so. Where do we stand in this cat-and-mouse race? We currently live in a society where everyone carries a camera in their pocket. Many people willfully upload most or all of the pictures they take to social networks which invest heavily in automatic face recognition systems. In this setting, is it still possible for privacy-conscientious users to avoid automatic face detection and recognition? If so, how? Must evasion techniques be obvious to be effective, or are there still simple measures users can take to protect themselves? In this work, we find ways to evade face detection on Facebook, a representative example of a popular social network that uses automatic face detection to enhance its service. We challenge widely-held beliefs about evading face detection: do our old techniques such as blurring the face region or wearing “privacy glasses” still work? We show that in general, state-of-the-art detectors can often find faces even if the subject wears occluding clothing or even if the uploader damages the photo to prevent faces from being detected.
Tasks Face Detection, Face Recognition
Published 2016-02-14
URL https://arxiv.org/abs/1602.04504v2
PDF https://arxiv.org/pdf/1602.04504v2.pdf
PWC https://paperswithcode.com/paper/can-we-still-avoid-automatic-face-detection
Repo https://github.com/cydonia999/Tiny_Faces_in_Tensorflow
Framework tf
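
A toy version of the evasion experiments the abstract alludes to: blur the detected face region and check whether a detector still fires. OpenCV's Haar cascade is only a weak stand-in for the detectors studied in the paper, and the image path is hypothetical.

```python
# Blur-and-redetect check with OpenCV's bundled Haar cascade.
import cv2

img = cv2.imread("photo.jpg")                      # hypothetical input
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

faces_before = detect(img)
blurred = img.copy()
for (x, y, w, h) in faces_before:                  # blur every detected face region
    blurred[y:y+h, x:x+w] = cv2.GaussianBlur(blurred[y:y+h, x:x+w], (51, 51), 0)

print("before:", len(faces_before), "after blur:", len(detect(blurred)))
```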

Clustering with Confidence: Finding Clusters with Statistical Guarantees

Title Clustering with Confidence: Finding Clusters with Statistical Guarantees
Authors Andreas Henelius, Kai Puolamäki, Henrik Boström, Panagiotis Papapetrou
Abstract Clustering is a widely used unsupervised learning method for finding structure in data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the data sample used, or re-running a clustering algorithm that involves a stochastic component, may lead to completely different clusters. There is, hence, a need for techniques that can quantify the instability of the generated clusters. In this study, we propose a technique for quantifying the instability of a clustering solution and for finding robust clusters, termed core clusters, which correspond to clusters where the co-occurrence probability of each data item within a cluster is at least $1 - \alpha$. We demonstrate how solving the core clustering problem is linked to finding the largest maximal cliques in a graph. We show that the method can be used with both clustering and classification algorithms. The proposed method is tested on both simulated and real datasets. The results show that the obtained clusters indeed meet the guarantees on robustness.
Tasks
Published 2016-12-27
URL http://arxiv.org/abs/1612.08714v2
PDF http://arxiv.org/pdf/1612.08714v2.pdf
PWC https://paperswithcode.com/paper/clustering-with-confidence-finding-clusters
Repo https://github.com/bwrc/corecluster-r
Framework none
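
The sketch below, which is not the authors' corecluster-r package, illustrates the core-cluster idea under simple assumptions: estimate pairwise co-occurrence probabilities over repeated stochastic k-means runs, then take maximal cliques in the graph of pairs that co-occur with probability at least 1 − α.

```python
# Core-cluster sketch: co-occurrence over repeated k-means runs + maximal cliques.
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)
n, runs, alpha = len(X), 50, 0.1
co = np.zeros((n, n))

for seed in range(runs):                         # stochastic re-runs of k-means
    labels = KMeans(n_clusters=3, n_init=5, random_state=seed).fit_predict(X)
    co += (labels[:, None] == labels[None, :])
co /= runs                                       # pairwise co-occurrence probabilities

G = nx.Graph()
G.add_edges_from((i, j) for i in range(n) for j in range(i + 1, n)
                 if co[i, j] >= 1 - alpha)
core_clusters = [c for c in nx.find_cliques(G) if len(c) > 1]
print(len(core_clusters), "core clusters")
```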

Real-Time Visual Place Recognition for Personal Localization on a Mobile Device

Title Real-Time Visual Place Recognition for Personal Localization on a Mobile Device
Authors Michał Nowicki, Jan Wietrzykowski, Piotr Skrzypczyński
Abstract The paper presents an approach to indoor personal localization on a mobile device based on visual place recognition. We implemented on a smartphone two state-of-the-art algorithms that are representative of two different approaches to visual place recognition: FAB-MAP, which recognizes places using individual images, and ABLE-M, which utilizes sequences of images. These algorithms are evaluated in environments of different structure, focusing on problems commonly encountered when a mobile device camera is used. The conclusions drawn from this evaluation serve as guidelines for the design of the FastABLE system, which is based on the ABLE-M algorithm but introduces major modifications to the concept of image matching. The improvements radically cut down the processing time and improve scalability, making it possible to localize the user in long image sequences with the limited computing power of a mobile device. The resulting place recognition system compares favorably to both the ABLE-M and the FAB-MAP solutions in the context of real-time personal localization.
Tasks Visual Place Recognition
Published 2016-11-07
URL http://arxiv.org/abs/1611.02061v2
PDF http://arxiv.org/pdf/1611.02061v2.pdf
PWC https://paperswithcode.com/paper/real-time-visual-place-recognition-for
Repo https://github.com/LRMPUT/FastABLE
Framework none
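
A rough sketch of the ABLE-M-style sequence matching that FastABLE builds on: a query subsequence of binary global descriptors is compared against every window of a database sequence by accumulated Hamming distance. The random descriptors stand in for the LDB features used in practice, and none of FastABLE's speed-ups are shown.

```python
# Sequence matching over binary descriptors by accumulated Hamming distance.
import numpy as np

rng = np.random.default_rng(1)
db = rng.integers(0, 2, size=(500, 256), dtype=np.uint8)              # database sequence
noise = (rng.random((20, 256)) < 0.05).astype(np.uint8)
query = db[200:220] ^ noise                                           # noisy query subsequence

def sequence_distances(database, query):
    m = len(query)
    return np.array([
        np.count_nonzero(database[s:s + m] != query)                  # Hamming distance of window
        for s in range(len(database) - m + 1)
    ])

d = sequence_distances(db, query)
print("best match starts at frame", int(np.argmin(d)))                # should be near 200
```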

The Spectral Condition Number Plot for Regularization Parameter Determination

Title The Spectral Condition Number Plot for Regularization Parameter Determination
Authors Carel F. W. Peeters, Mark A. van de Wiel, Wessel N. van Wieringen
Abstract Many modern statistical applications ask for the estimation of a covariance (or precision) matrix in settings where the number of variables is larger than the number of observations. There exists a broad class of ridge-type estimators that employs regularization to cope with the subsequent singularity of the sample covariance matrix. These estimators depend on a penalty parameter whose value can be hard to choose: selection procedures may be computationally infeasible or tenable only for a restricted set of ridge-type estimators. Here we introduce a simple graphical tool, the spectral condition number plot, for informed heuristic penalty parameter selection. The proposed tool is computationally friendly and can be employed for the full class of ridge-type covariance (precision) estimators.
Tasks
Published 2016-08-14
URL http://arxiv.org/abs/1608.04123v1
PDF http://arxiv.org/pdf/1608.04123v1.pdf
PWC https://paperswithcode.com/paper/the-spectral-condition-number-plot-for
Repo https://github.com/CFWP/rags2ridges
Framework none
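
A bare-bones condition number plot, assuming the simple ridge estimate S + λI as a stand-in for the broader class of ridge-type estimators covered by the rags2ridges package:

```python
# Spectral condition number versus ridge penalty for a p > n sample covariance.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))            # p > n: sample covariance is singular
S = np.cov(X, rowvar=False)

lambdas = np.logspace(-4, 2, 60)
conds = [np.linalg.cond(S + lam * np.eye(S.shape[0])) for lam in lambdas]

plt.loglog(lambdas, conds)
plt.xlabel("penalty $\\lambda$")
plt.ylabel("spectral condition number")
plt.title("Condition number plot for penalty selection")
plt.show()
```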

Convolutional Neural Networks using Logarithmic Data Representation

Title Convolutional Neural Networks using Logarithmic Data Representation
Authors Daisuke Miyashita, Edward H. Lee, Boris Murmann
Abstract Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance. To perform this, we take advantage of the fact that the weights and activations in a trained network naturally have non-uniform distributions. Using non-uniform, base-2 logarithmic representation to encode weights, communicate activations, and perform dot-products enables networks to 1) achieve higher classification accuracies than fixed-point at the same resolution and 2) eliminate bulky digital multipliers. Finally, we propose an end-to-end training procedure that uses log representation at 5-bits, which achieves higher final test accuracy than linear at 5-bits.
Tasks
Published 2016-03-03
URL http://arxiv.org/abs/1603.01025v2
PDF http://arxiv.org/pdf/1603.01025v2.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-networks-using
Repo https://github.com/Enderdead/Pytorch_Quantize_impls
Framework pytorch
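
The sketch below illustrates the base-2 logarithmic encoding described in the abstract: each weight becomes a signed power of two whose exponent fits in a few bits, so multiplications reduce to shifts. The bit-width and clipping range are illustrative choices, not the paper's exact configuration.

```python
# Base-2 logarithmic quantization of weights (illustrative settings).
import numpy as np

def log2_quantize(w, bits=3, max_exp=0):
    """Quantize to sign * 2^e, with e stored in `bits` bits at or below `max_exp`."""
    levels = 2 ** bits                             # number of representable exponents
    exp = np.round(np.log2(np.abs(w) + 1e-12))     # nearest power-of-two exponent
    exp = np.clip(exp, max_exp - levels + 1, max_exp)
    return np.sign(w) * np.exp2(exp)

w = np.random.default_rng(0).normal(scale=0.2, size=8)
print(np.round(w, 3))
print(log2_quantize(w))
```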

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

Title Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
Authors Alberto Bietti, Julien Mairal
Abstract Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for example by data augmentation. In such cases, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent method (SGD). In this paper, we introduce a variance reduction approach for these settings when the objective is composite and strongly convex. The convergence rate outperforms SGD with a typically much smaller constant factor, which depends on the variance of gradient estimates only due to perturbations on a single example.
Tasks Data Augmentation, Stochastic Optimization
Published 2016-10-04
URL http://arxiv.org/abs/1610.00970v6
PDF http://arxiv.org/pdf/1610.00970v6.pdf
PWC https://paperswithcode.com/paper/stochastic-optimization-with-variance
Repo https://github.com/albietz/stochs
Framework none
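
The paper's S-MISO algorithm is not reproduced here; instead, the sketch shows the classical finite-sum variance-reduction template (SVRG, on least squares) that such methods extend to objectives with stochastic input perturbations such as data augmentation. Step size and problem sizes are arbitrary.

```python
# SVRG on a least-squares finite sum (illustrates the variance-reduction template only).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 20))
b = A @ rng.normal(size=20) + 0.1 * rng.normal(size=200)
n, d = A.shape

def grad_i(w, i):                        # gradient of the i-th squared residual
    return (A[i] @ w - b[i]) * A[i]

w = np.zeros(d)
lr = 0.005
for epoch in range(30):
    w_snap = w.copy()
    full_grad = (A.T @ (A @ w_snap - b)) / n        # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        w -= lr * (grad_i(w, i) - grad_i(w_snap, i) + full_grad)

print("residual norm:", np.linalg.norm(A @ w - b))
```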

Semantic Image Inpainting with Deep Generative Models

Title Semantic Image Inpainting with Deep Generative Models
Authors Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do
Abstract Semantic image inpainting is a challenging task where large missing regions have to be filled based on the available visual data. Existing methods which extract information from only a single image generally produce unsatisfactory results due to the lack of high level context. In this paper, we propose a novel method for semantic image inpainting, which generates the missing content by conditioning on the available data. Given a trained generative model, we search for the closest encoding of the corrupted image in the latent image manifold using our context and prior losses. This encoding is then passed through the generative model to infer the missing content. In our method, inference is possible irrespective of how the missing content is structured, while the state-of-the-art learning based method requires specific information about the holes in the training phase. Experiments on three datasets show that our method successfully predicts information in large missing regions and achieves pixel-level photorealism, significantly outperforming the state-of-the-art methods.
Tasks Image Inpainting
Published 2016-07-26
URL http://arxiv.org/abs/1607.07539v3
PDF http://arxiv.org/pdf/1607.07539v3.pdf
PWC https://paperswithcode.com/paper/semantic-image-inpainting-with-deep
Repo https://github.com/shravan097/Image-Inpainting
Framework none
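
A condensed sketch of the inference-time search the abstract describes, assuming hypothetical pretrained generator `G` (with a `latent_dim` attribute) and discriminator `D`; the weighting of the context loss and the final Poisson blending step from the paper are omitted.

```python
# Latent-space search for inpainting with a pretrained GAN (G, D assumed given).
import torch

def inpaint(G, D, corrupted, mask, steps=1000, lam=0.003, lr=0.1):
    """corrupted, mask: (1, C, H, W) tensors; mask is 1 on known pixels."""
    z = torch.randn(1, G.latent_dim, requires_grad=True)     # latent_dim is an assumed attribute
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        gen = G(z)
        context = (mask * (gen - corrupted)).abs().sum()      # match known pixels
        prior = lam * torch.log(1 - D(gen) + 1e-8).mean()     # keep G(z) on the image manifold
        (context + prior).backward()
        opt.step()
    with torch.no_grad():
        return mask * corrupted + (1 - mask) * G(z)           # paste generated content into the hole
```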

Variance-based regularization with convex objectives

Title Variance-based regularization with convex objectives
Authors John Duchi, Hongseok Namkoong
Abstract We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds off of techniques for distributionally robust optimization and Owen’s empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.
Tasks Stochastic Optimization
Published 2016-10-08
URL http://arxiv.org/abs/1610.02581v3
PDF http://arxiv.org/pdf/1610.02581v3.pdf
PWC https://paperswithcode.com/paper/variance-based-regularization-with-convex
Repo https://github.com/hsnamkoong/robustopt
Framework none
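
As a toy illustration of what is being traded off, the snippet below evaluates a mean-plus-√(variance/n) criterion for two loss samples; the paper optimizes a convex distributionally robust surrogate of this quantity rather than the plug-in form shown here.

```python
# Plug-in variance-penalized risk for two hypothetical loss distributions.
import numpy as np

def variance_penalized_risk(losses, rho=1.0):
    n = len(losses)
    return losses.mean() + np.sqrt(2 * rho * losses.var(ddof=1) / n)

rng = np.random.default_rng(0)
losses_a = rng.exponential(1.0, size=500)        # lower mean, heavier-tailed losses
losses_b = rng.normal(1.05, 0.1, size=500)       # slightly worse mean, low variance
print(variance_penalized_risk(losses_a), variance_penalized_risk(losses_b))
```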

Structured Inference Networks for Nonlinear State Space Models

Title Structured Inference Networks for Nonlinear State Space Models
Authors Rahul G. Krishnan, Uri Shalit, David Sontag
Abstract Gaussian state space models have been used for decades as generative models of sequential data. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption. We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks. Our learning algorithm simultaneously learns a compiled inference network and the generative model, leveraging a structured variational approximation parameterized by recurrent neural networks to mimic the posterior distribution. We apply the learning algorithm to both synthetic and real-world datasets, demonstrating its scalability and versatility. We find that using the structured approximation to the posterior results in models with significantly higher held-out likelihood.
Tasks Multivariate Time Series Forecasting
Published 2016-09-30
URL http://arxiv.org/abs/1609.09869v2
PDF http://arxiv.org/pdf/1609.09869v2.pdf
PWC https://paperswithcode.com/paper/structured-inference-networks-for-nonlinear
Repo https://github.com/clinicalml/structuredinference
Framework none
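
A heavily compressed sketch of the model family (a deep Markov model): MLP-parameterized Gaussian transitions and emissions, with a backward RNN inference network producing the structured approximate posterior. Sizes and architectures are placeholders, not the paper's configuration.

```python
# Deep Markov model components with an RNN-based structured inference network.
import torch
import torch.nn as nn

z_dim, x_dim, h_dim = 8, 16, 32

transition = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                           nn.Linear(h_dim, 2 * z_dim))      # mean, log-var of z_t | z_{t-1}
emission = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                         nn.Linear(h_dim, x_dim))             # mean of x_t | z_t
encoder_rnn = nn.GRU(x_dim, h_dim, batch_first=True)          # summarizes future observations
posterior_head = nn.Linear(h_dim + z_dim, 2 * z_dim)          # q(z_t | z_{t-1}, x_{t:T})

def posterior_step(h_t, z_prev):
    stats = posterior_head(torch.cat([h_t, z_prev], dim=-1))
    mu, logvar = stats.chunk(2, dim=-1)
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterized sample

x = torch.randn(4, 10, x_dim)                 # batch of 4 sequences, length 10
h, _ = encoder_rnn(torch.flip(x, dims=[1]))   # backward RNN over x
h = torch.flip(h, dims=[1])
z = torch.zeros(4, z_dim)
for t in range(10):
    z = posterior_step(h[:, t], z)            # one sampled latent trajectory
print(z.shape)                                # torch.Size([4, 8])
```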

The Importance of Skip Connections in Biomedical Image Segmentation

Title The Importance of Skip Connections in Biomedical Image Segmentation
Authors Michal Drozdzal, Eugene Vorontsov, Gabriel Chartrand, Samuel Kadoury, Chris Pal
Abstract In this paper, we study the influence of both long and short skip connections on Fully Convolutional Networks (FCN) for biomedical image segmentation. In standard FCNs, only long skip connections are used to skip features from the contracting path to the expanding path in order to recover spatial information lost during downsampling. We extend FCNs by adding short skip connections, similar to those introduced in residual networks, in order to build very deep FCNs (of hundreds of layers). A review of the gradient flow confirms that for a very deep FCN it is beneficial to have both long and short skip connections. Finally, we show that a very deep FCN can achieve near state-of-the-art results on the EM dataset without any further post-processing.
Tasks Semantic Segmentation
Published 2016-08-14
URL http://arxiv.org/abs/1608.04117v2
PDF http://arxiv.org/pdf/1608.04117v2.pdf
PWC https://paperswithcode.com/paper/the-importance-of-skip-connections-in
Repo https://github.com/lauradhatt/Interesting-Reads
Framework none
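
A toy two-level FCN showing the two kinds of skips the paper studies: short (residual) skips inside each block and a long skip carrying encoder features to the decoder. It is not the authors' architecture.

```python
# Toy FCN with short (residual) and long (encoder-to-decoder) skip connections.
import torch
import torch.nn as nn

class ResBlock(nn.Module):                       # short skip: x + F(x)
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return torch.relu(x + self.body(x))

class TinyFCN(nn.Module):
    def __init__(self, in_ch=1, ch=16, n_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), ResBlock(ch))
        self.down = nn.MaxPool2d(2)
        self.mid = ResBlock(ch)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec = nn.Sequential(ResBlock(ch), nn.Conv2d(ch, n_classes, 1))
    def forward(self, x):
        e = self.enc(x)
        d = self.up(self.mid(self.down(e)))
        return self.dec(d + e)                   # long skip: encoder -> decoder

print(TinyFCN()(torch.randn(1, 1, 64, 64)).shape)   # torch.Size([1, 2, 64, 64])
```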

Enriching Word Vectors with Subword Information

Title Enriching Word Vectors with Subword Information
Authors Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov
Abstract Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character $n$-grams. A vector representation is associated with each character $n$-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and it lets us compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.
Tasks Word Embeddings
Published 2016-07-15
URL http://arxiv.org/abs/1607.04606v2
PDF http://arxiv.org/pdf/1607.04606v2.pdf
PWC https://paperswithcode.com/paper/enriching-word-vectors-with-subword
Repo https://github.com/DW-yejing/fasttext4j-jdk6
Framework none
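
A sketch of the subword idea rather than the fastText implementation: a word vector is the sum of hashed character n-gram vectors, so out-of-vocabulary words still get a representation. The n-gram range follows the paper; the bucket count, hash function, and random table are placeholders (fastText also adds a dedicated whole-word vector for in-vocabulary words and uses far more buckets).

```python
# Word vectors as sums of hashed character n-gram vectors.
import numpy as np

dim, buckets = 100, 50_000
rng = np.random.default_rng(0)
ngram_table = rng.normal(scale=0.1, size=(buckets, dim))   # stand-in for trained n-gram vectors

def char_ngrams(word, n_min=3, n_max=6):
    w = f"<{word}>"                                # boundary symbols as in the paper
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word):
    idx = [hash(g) % buckets for g in char_ngrams(word)]   # Python hash as a toy hash
    return ngram_table[idx].sum(axis=0)

v = word_vector("unbelievable")        # works even for out-of-vocabulary words
print(v.shape)                         # (100,)
```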

Wavelet Scattering Regression of Quantum Chemical Energies

Title Wavelet Scattering Regression of Quantum Chemical Energies
Authors Matthew Hirn, Stéphane Mallat, Nicolas Poilvert
Abstract We introduce multiscale invariant dictionaries to estimate quantum chemical energies of organic molecules, from training databases. Molecular energies are invariant to isometric atomic displacements, and are Lipschitz continuous to molecular deformations. Similarly to density functional theory (DFT), the molecule is represented by an electronic density function. A multiscale invariant dictionary is calculated with wavelet scattering invariants. It cascades a first wavelet transform which separates scales, with a second wavelet transform which computes interactions across scales. Sparse scattering regressions give state-of-the-art results over two databases of organic planar molecules. On these databases, the regression error is of the order of the error produced by DFT codes, but at a fraction of the computational cost.
Tasks
Published 2016-05-16
URL http://arxiv.org/abs/1605.04654v3
PDF http://arxiv.org/pdf/1605.04654v3.pdf
PWC https://paperswithcode.com/paper/wavelet-scattering-regression-of-quantum
Repo https://github.com/matthew-hirn/ScatNet-QM-2D
Framework none
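
A sketch of the regression stage under stated assumptions: scattering invariants computed with the kymatio package (a stand-in for the authors' ScatNet-based code) feed a sparse linear model. The inputs are random placeholders, not molecular density maps or energies.

```python
# Sparse regression over 2D scattering invariants (kymatio assumed installed).
import numpy as np
from kymatio.numpy import Scattering2D
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
densities = rng.random((40, 32, 32))            # placeholder 2D "density maps"
energies = rng.normal(size=40)                  # placeholder target energies

scattering = Scattering2D(J=3, shape=(32, 32))  # two-layer scattering up to scale 2^3
features = scattering(densities).reshape(40, -1)

model = LassoCV(cv=5).fit(features, energies)   # sparse scattering regression
print("nonzero coefficients:", np.count_nonzero(model.coef_))
```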

Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

Title Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
Authors Mehdi Noroozi, Paolo Favaro
Abstract In this paper we study the problem of image representation learning without human annotation. By following the principles of self-supervision, we build a convolutional neural network (CNN) that can be trained to solve Jigsaw puzzles as a pretext task, which requires no manual labeling, and then later repurposed to solve object classification and detection. To maintain the compatibility across tasks we introduce the context-free network (CFN), a siamese-ennead CNN. The CFN takes image tiles as input and explicitly limits the receptive field (or context) of its early processing units to one tile at a time. We show that the CFN includes fewer parameters than AlexNet while preserving the same semantic learning capabilities. By training the CFN to solve Jigsaw puzzles, we learn both a feature mapping of object parts and their correct spatial arrangement. Our experimental evaluations show that the learned features capture semantically relevant content. Our proposed method for learning visual representations outperforms state-of-the-art methods in several transfer learning benchmarks.
Tasks Object Classification, Representation Learning, Transfer Learning
Published 2016-03-30
URL http://arxiv.org/abs/1603.09246v3
PDF http://arxiv.org/pdf/1603.09246v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-visual-1
Repo https://github.com/Confusezius/selfsupervised_learning
Framework pytorch
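
A toy version of the jigsaw pretext task: cut an image into a 3×3 grid of tiles, shuffle them according to one of a fixed set of permutations, and train a shared-weight (siamese-style) network to predict which permutation was applied. The backbone is tiny and the permutation set is sampled at random rather than selected for maximal Hamming distance as in the paper.

```python
# Jigsaw pretext task: permute 3x3 tiles and classify the permutation index.
import itertools
import random
import torch
import torch.nn as nn

permutations = random.Random(0).sample(list(itertools.permutations(range(9))), 100)

def make_puzzle(image, perm_index, tile=32):
    """image: (C, 96, 96) tensor -> (9, C, 32, 32) tiles in permuted order."""
    tiles = [image[:, r:r + tile, c:c + tile]
             for r in range(0, 96, tile) for c in range(0, 96, tile)]
    return torch.stack([tiles[i] for i in permutations[perm_index]])

class JigsawNet(nn.Module):
    def __init__(self, n_perm=100):
        super().__init__()
        self.backbone = nn.Sequential(                 # shared across the 9 tiles
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(9 * 32, n_perm)          # classify the permutation
    def forward(self, tiles):                          # tiles: (B, 9, C, H, W)
        b = tiles.shape[0]
        feats = self.backbone(tiles.flatten(0, 1)).reshape(b, -1)
        return self.head(feats)

image = torch.rand(3, 96, 96)
tiles = make_puzzle(image, perm_index=7).unsqueeze(0)   # batch of one puzzle
logits = JigsawNet()(tiles)
print(logits.shape)                                     # torch.Size([1, 100])
```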