May 7, 2019

2770 words 14 mins read

Paper Group AWR 54

A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract

Title A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract
Authors Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, Korin Richmond
Abstract We present a multilinear statistical model of the human tongue that captures anatomical and tongue pose related shape variations separately. The model is derived from 3D magnetic resonance imaging data of 11 speakers sustaining speech related vocal tract configurations. The extraction is performed using a minimally supervised method that builds on an image segmentation approach and a template fitting technique. Furthermore, it uses image denoising to deal with potentially corrupted data, palate surface information reconstruction to handle palatal tongue contacts, and a bootstrap strategy to refine the obtained shapes. Our evaluation concludes that limiting the degrees of freedom for the anatomical and speech related variations to 5 and 4, respectively, produces a model that can reliably register unknown data while avoiding overfitting effects. Furthermore, we show that it can be used to generate a plausible tongue animation by tracking sparse motion capture data.
Tasks Denoising, Image Denoising, Motion Capture, Semantic Segmentation
Published 2016-12-15
URL http://arxiv.org/abs/1612.05005v5
PDF http://arxiv.org/pdf/1612.05005v5.pdf
PWC https://paperswithcode.com/paper/a-multilinear-tongue-model-derived-from
Repo https://github.com/m2ci-msp/mri-shape-tools
Framework none
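
The abstract describes a Tucker-style multilinear decomposition; the short sketch below shows how such a model, once fitted, reconstructs a mesh from separate anatomy and pose coefficients. It is an illustration with placeholder arrays and mode sizes taken from the abstract (5 anatomy, 4 pose), not the authors' C++ mri-shape-tools pipeline.

```python
# Minimal sketch of a multilinear (Tucker-style) tongue model with placeholder data.
import numpy as np

rng = np.random.default_rng(0)
n_vertices = 1000                                # hypothetical mesh resolution
mean_shape = rng.normal(size=3 * n_vertices)     # stand-in for the fitted mean mesh
core = rng.normal(size=(5, 4, 3 * n_vertices))   # multilinear core tensor (anatomy x pose x coords)

def reconstruct(w_anatomy, w_pose):
    """Shape = mean + core contracted with anatomy and pose coefficients."""
    offset = np.einsum('a,p,apd->d', w_anatomy, w_pose, core)
    return (mean_shape + offset).reshape(n_vertices, 3)

# Example: unit weight on the first mode of each space.
shape = reconstruct(np.eye(5)[0], np.eye(4)[0])
print(shape.shape)   # (1000, 3)
```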

Face Detection with the Faster R-CNN

Title Face Detection with the Faster R-CNN
Authors Huaizu Jiang, Erik Learned-Miller
Abstract The Faster R-CNN has recently demonstrated impressive results on various object detection benchmarks. By training a Faster R-CNN model on the large scale WIDER face dataset, we report state-of-the-art results on two widely used face detection benchmarks, FDDB and the recently released IJB-A.
Tasks Face Detection, Object Detection
Published 2016-06-10
URL http://arxiv.org/abs/1606.03473v1
PDF http://arxiv.org/pdf/1606.03473v1.pdf
PWC https://paperswithcode.com/paper/face-detection-with-the-faster-r-cnn
Repo https://github.com/playerkk/face-py-faster-rcnn
Framework none
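
As a rough illustration of the detection pipeline, the sketch below runs torchvision's Faster R-CNN and keeps high-confidence boxes. It assumes a recent torchvision (the `weights="DEFAULT"` API) and uses the COCO checkpoint as a stand-in, whereas the paper fine-tunes on WIDER FACE; the image path and the 0.8 threshold are placeholders.

```python
# Illustrative only: generic Faster R-CNN inference with torchvision.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("example.jpg").convert("RGB")   # hypothetical input image
with torch.no_grad():
    outputs = model([to_tensor(image)])[0]          # dict with boxes, labels, scores

keep = outputs["scores"] > 0.8                      # keep only confident detections
for box, score in zip(outputs["boxes"][keep], outputs["scores"][keep]):
    print(box.tolist(), float(score))
```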

Can we still avoid automatic face detection?

Title Can we still avoid automatic face detection?
Authors Michael J. Wilber, Vitaly Shmatikov, Serge Belongie
Abstract After decades of study, automatic face detection and recognition systems are now accurate and widespread. Naturally, this means users who wish to avoid automatic recognition are becoming less able to do so. Where do we stand in this cat-and-mouse race? We currently live in a society where everyone carries a camera in their pocket. Many people willfully upload most or all of the pictures they take to social networks which invest heavily in automatic face recognition systems. In this setting, is it still possible for privacy-conscientious users to avoid automatic face detection and recognition? If so, how? Must evasion techniques be obvious to be effective, or are there still simple measures users can take to protect themselves? In this work, we find ways to evade face detection on Facebook, a representative example of a popular social network that uses automatic face detection to enhance its service. We challenge widely-held beliefs about evading face detection: do our old techniques such as blurring the face region or wearing “privacy glasses” still work? We show that in general, state-of-the-art detectors can often find faces even if the subject wears occluding clothing or even if the uploader damages the photo to prevent faces from being detected.
Tasks Face Detection, Face Recognition
Published 2016-02-14
URL https://arxiv.org/abs/1602.04504v2
PDF https://arxiv.org/pdf/1602.04504v2.pdf
PWC https://paperswithcode.com/paper/can-we-still-avoid-automatic-face-detection
Repo https://github.com/cydonia999/Tiny_Faces_in_Tensorflow
Framework tf
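
A toy version of the evasion experiments the abstract alludes to: blur the detected face region and check whether a detector still fires. OpenCV's Haar cascade is only a weak stand-in for the detectors studied in the paper, and the image path is hypothetical.

```python
# Blur-and-redetect check with OpenCV's bundled Haar cascade.
import cv2

img = cv2.imread("photo.jpg")                      # hypothetical input
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

faces_before = detect(img)
blurred = img.copy()
for (x, y, w, h) in faces_before:                  # blur every detected face region
    blurred[y:y+h, x:x+w] = cv2.GaussianBlur(blurred[y:y+h, x:x+w], (51, 51), 0)

print("before:", len(faces_before), "after blur:", len(detect(blurred)))
```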

Clustering with Confidence: Finding Clusters with Statistical Guarantees

Title Clustering with Confidence: Finding Clusters with Statistical Guarantees
Authors Andreas Henelius, Kai Puolamäki, Henrik Boström, Panagiotis Papapetrou
Abstract Clustering is a widely used unsupervised learning method for finding structure in data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the data sample used, or re-running a clustering algorithm that involves a stochastic component, may lead to completely different clusters. There is, hence, a need for techniques that can quantify the instability of the generated clusters. In this study, we propose a technique for quantifying the instability of a clustering solution and for finding robust clusters, termed core clusters, which correspond to clusters where the co-occurrence probability of each data item within a cluster is at least $1 - \alpha$. We demonstrate how solving the core clustering problem is linked to finding the largest maximal cliques in a graph. We show that the method can be used with both clustering and classification algorithms. The proposed method is tested on both simulated and real datasets. The results show that the obtained clusters indeed meet the guarantees on robustness.
Tasks
Published 2016-12-27
URL http://arxiv.org/abs/1612.08714v2
PDF http://arxiv.org/pdf/1612.08714v2.pdf
PWC https://paperswithcode.com/paper/clustering-with-confidence-finding-clusters
Repo https://github.com/bwrc/corecluster-r
Framework none
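
The sketch below, which is not the authors' corecluster-r package, illustrates the core-cluster idea under simple assumptions: estimate pairwise co-occurrence probabilities over repeated stochastic k-means runs, then take maximal cliques in the graph of pairs that co-occur with probability at least 1 − α.

```python
# Core-cluster sketch: co-occurrence over repeated k-means runs + maximal cliques.
import numpy as np
import networkx as nx
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)
n, runs, alpha = len(X), 50, 0.1
co = np.zeros((n, n))

for seed in range(runs):                         # stochastic re-runs of k-means
    labels = KMeans(n_clusters=3, n_init=5, random_state=seed).fit_predict(X)
    co += (labels[:, None] == labels[None, :])
co /= runs                                       # pairwise co-occurrence probabilities

G = nx.Graph()
G.add_edges_from((i, j) for i in range(n) for j in range(i + 1, n)
                 if co[i, j] >= 1 - alpha)
core_clusters = [c for c in nx.find_cliques(G) if len(c) > 1]
print(len(core_clusters), "core clusters")
```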

Real-Time Visual Place Recognition for Personal Localization on a Mobile Device

Title Real-Time Visual Place Recognition for Personal Localization on a Mobile Device
Authors Michał Nowicki, Jan Wietrzykowski, Piotr Skrzypczyński
Abstract The paper presents an approach to indoor personal localization on a mobile device based on visual place recognition. We implemented on a smartphone two state-of-the-art algorithms that are representative of two different approaches to visual place recognition: FAB-MAP, which recognizes places using individual images, and ABLE-M, which utilizes sequences of images. These algorithms are evaluated in environments of different structure, focusing on problems commonly encountered when a mobile device camera is used. The conclusions drawn from this evaluation serve as guidelines for the design of the FastABLE system, which is based on the ABLE-M algorithm but introduces major modifications to the concept of image matching. The improvements radically cut down the processing time and improve scalability, making it possible to localize the user in long image sequences with the limited computing power of a mobile device. The resulting place recognition system compares favorably to both the ABLE-M and the FAB-MAP solutions in the context of real-time personal localization.
Tasks Visual Place Recognition
Published 2016-11-07
URL http://arxiv.org/abs/1611.02061v2
PDF http://arxiv.org/pdf/1611.02061v2.pdf
PWC https://paperswithcode.com/paper/real-time-visual-place-recognition-for
Repo https://github.com/LRMPUT/FastABLE
Framework none
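
A rough sketch of the ABLE-M-style sequence matching that FastABLE builds on: a query subsequence of binary global descriptors is compared against every window of a database sequence by accumulated Hamming distance. The random descriptors stand in for the LDB features used in practice, and none of FastABLE's speed-ups are shown.

```python
# Sequence matching over binary descriptors by accumulated Hamming distance.
import numpy as np

rng = np.random.default_rng(1)
db = rng.integers(0, 2, size=(500, 256), dtype=np.uint8)              # database sequence
noise = (rng.random((20, 256)) < 0.05).astype(np.uint8)
query = db[200:220] ^ noise                                           # noisy query subsequence

def sequence_distances(database, query):
    m = len(query)
    return np.array([
        np.count_nonzero(database[s:s + m] != query)                  # Hamming distance of window
        for s in range(len(database) - m + 1)
    ])

d = sequence_distances(db, query)
print("best match starts at frame", int(np.argmin(d)))                # should be near 200
```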

The Spectral Condition Number Plot for Regularization Parameter Determination

Title The Spectral Condition Number Plot for Regularization Parameter Determination
Authors Carel F. W. Peeters, Mark A. van de Wiel, Wessel N. van Wieringen
Abstract Many modern statistical applications ask for the estimation of a covariance (or precision) matrix in settings where the number of variables is larger than the number of observations. There exists a broad class of ridge-type estimators that employs regularization to cope with the subsequent singularity of the sample covariance matrix. These estimators depend on a penalty parameter whose value can be hard to choose: selection procedures may be computationally infeasible or tenable only for a restricted set of ridge-type estimators. Here we introduce a simple graphical tool, the spectral condition number plot, for informed heuristic penalty parameter selection. The proposed tool is computationally friendly and can be employed for the full class of ridge-type covariance (precision) estimators.
Tasks
Published 2016-08-14
URL http://arxiv.org/abs/1608.04123v1
PDF http://arxiv.org/pdf/1608.04123v1.pdf
PWC https://paperswithcode.com/paper/the-spectral-condition-number-plot-for
Repo https://github.com/CFWP/rags2ridges
Framework none
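
A bare-bones condition number plot, assuming the simple ridge estimate S + λI as a stand-in for the broader class of ridge-type estimators covered by the rags2ridges package:

```python
# Spectral condition number versus ridge penalty for a p > n sample covariance.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 100))            # p > n: sample covariance is singular
S = np.cov(X, rowvar=False)

lambdas = np.logspace(-4, 2, 60)
conds = [np.linalg.cond(S + lam * np.eye(S.shape[0])) for lam in lambdas]

plt.loglog(lambdas, conds)
plt.xlabel("penalty $\\lambda$")
plt.ylabel("spectral condition number")
plt.title("Condition number plot for penalty selection")
plt.show()
```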

Convolutional Neural Networks using Logarithmic Data Representation

Title Convolutional Neural Networks using Logarithmic Data Representation
Authors Daisuke Miyashita, Edward H. Lee, Boris Murmann
Abstract Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance. To perform this, we take advantage of the fact that the weights and activations in a trained network naturally have non-uniform distributions. Using non-uniform, base-2 logarithmic representation to encode weights, communicate activations, and perform dot-products enables networks to 1) achieve higher classification accuracies than fixed-point at the same resolution and 2) eliminate bulky digital multipliers. Finally, we propose an end-to-end training procedure that uses log representation at 5-bits, which achieves higher final test accuracy than linear at 5-bits.
Tasks
Published 2016-03-03
URL http://arxiv.org/abs/1603.01025v2
PDF http://arxiv.org/pdf/1603.01025v2.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-networks-using
Repo https://github.com/Enderdead/Pytorch_Quantize_impls
Framework pytorch
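
The sketch below illustrates the base-2 logarithmic encoding described in the abstract: each weight becomes a signed power of two whose exponent fits in a few bits, so multiplications reduce to shifts. The bit-width and clipping range are illustrative choices, not the paper's exact configuration.

```python
# Base-2 logarithmic quantization of weights (illustrative settings).
import numpy as np

def log2_quantize(w, bits=3, max_exp=0):
    """Quantize to sign * 2^e, with e stored in `bits` bits at or below `max_exp`."""
    levels = 2 ** bits                             # number of representable exponents
    exp = np.round(np.log2(np.abs(w) + 1e-12))     # nearest power-of-two exponent
    exp = np.clip(exp, max_exp - levels + 1, max_exp)
    return np.sign(w) * np.exp2(exp)

w = np.random.default_rng(0).normal(scale=0.2, size=8)
print(np.round(w, 3))
print(log2_quantize(w))
```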

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

Title Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
Authors Alberto Bietti, Julien Mairal
Abstract Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for example by data augmentation. In such cases, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent method (SGD). In this paper, we introduce a variance reduction approach for these settings when the objective is composite and strongly convex. The convergence rate outperforms SGD with a typically much smaller constant factor, which depends on the variance of gradient estimates only due to perturbations on a single example.
Tasks Data Augmentation, Stochastic Optimization
Published 2016-10-04
URL http://arxiv.org/abs/1610.00970v6
PDF http://arxiv.org/pdf/1610.00970v6.pdf
PWC https://paperswithcode.com/paper/stochastic-optimization-with-variance
Repo https://github.com/albietz/stochs
Framework none
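
The paper's S-MISO algorithm is not reproduced here; instead, the sketch shows the classical finite-sum variance-reduction template (SVRG, on least squares) that such methods extend to objectives with stochastic input perturbations such as data augmentation. Step size and problem sizes are arbitrary.

```python
# SVRG on a least-squares finite sum (illustrates the variance-reduction template only).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 20))
b = A @ rng.normal(size=20) + 0.1 * rng.normal(size=200)
n, d = A.shape

def grad_i(w, i):                        # gradient of the i-th squared residual
    return (A[i] @ w - b[i]) * A[i]

w = np.zeros(d)
lr = 0.005
for epoch in range(30):
    w_snap = w.copy()
    full_grad = (A.T @ (A @ w_snap - b)) / n        # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        w -= lr * (grad_i(w, i) - grad_i(w_snap, i) + full_grad)

print("residual norm:", np.linalg.norm(A @ w - b))
```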

Semantic Image Inpainting with Deep Generative Models

Title Semantic Image Inpainting with Deep Generative Models
Authors Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do
Abstract Semantic image inpainting is a challenging task where large missing regions have to be filled based on the available visual data. Existing methods which extract information from only a single image generally produce unsatisfactory results due to the lack of high level context. In this paper, we propose a novel method for semantic image inpainting, which generates the missing content by conditioning on the available data. Given a trained generative model, we search for the closest encoding of the corrupted image in the latent image manifold using our context and prior losses. This encoding is then passed through the generative model to infer the missing content. In our method, inference is possible irrespective of how the missing content is structured, while the state-of-the-art learning based method requires specific information about the holes in the training phase. Experiments on three datasets show that our method successfully predicts information in large missing regions and achieves pixel-level photorealism, significantly outperforming the state-of-the-art methods.
Tasks Image Inpainting
Published 2016-07-26
URL http://arxiv.org/abs/1607.07539v3
PDF http://arxiv.org/pdf/1607.07539v3.pdf
PWC https://paperswithcode.com/paper/semantic-image-inpainting-with-deep
Repo https://github.com/shravan097/Image-Inpainting
Framework none
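
A condensed sketch of the inference-time search the abstract describes, assuming hypothetical pretrained generator `G` (with a `latent_dim` attribute) and discriminator `D`; the weighting of the context loss and the final Poisson blending step from the paper are omitted.

```python
# Latent-space search for inpainting with a pretrained GAN (G, D assumed given).
import torch

def inpaint(G, D, corrupted, mask, steps=1000, lam=0.003, lr=0.1):
    """corrupted, mask: (1, C, H, W) tensors; mask is 1 on known pixels."""
    z = torch.randn(1, G.latent_dim, requires_grad=True)     # latent_dim is an assumed attribute
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        gen = G(z)
        context = (mask * (gen - corrupted)).abs().sum()      # match known pixels
        prior = lam * torch.log(1 - D(gen) + 1e-8).mean()     # keep G(z) on the image manifold
        (context + prior).backward()
        opt.step()
    with torch.no_grad():
        return mask * corrupted + (1 - mask) * G(z)           # paste generated content into the hole
```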

Variance-based regularization with convex objectives

Title Variance-based regularization with convex objectives
Authors John Duchi, Hongseok Namkoong
Abstract We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds off of techniques for distributionally robust optimization and Owen’s empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems.
Tasks Stochastic Optimization
Published 2016-10-08
URL http://arxiv.org/abs/1610.02581v3
PDF http://arxiv.org/pdf/1610.02581v3.pdf
PWC https://paperswithcode.com/paper/variance-based-regularization-with-convex
Repo https://github.com/hsnamkoong/robustopt
Framework none
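
As a toy illustration of what is being traded off, the snippet below evaluates a mean-plus-√(variance/n) criterion for two loss samples; the paper optimizes a convex distributionally robust surrogate of this quantity rather than the plug-in form shown here.

```python
# Plug-in variance-penalized risk for two hypothetical loss distributions.
import numpy as np

def variance_penalized_risk(losses, rho=1.0):
    n = len(losses)
    return losses.mean() + np.sqrt(2 * rho * losses.var(ddof=1) / n)

rng = np.random.default_rng(0)
losses_a = rng.exponential(1.0, size=500)        # lower mean, heavier-tailed losses
losses_b = rng.normal(1.05, 0.1, size=500)       # slightly worse mean, low variance
print(variance_penalized_risk(losses_a), variance_penalized_risk(losses_b))
```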

Structured Inference Networks for Nonlinear State Space Models

Title Structured Inference Networks for Nonlinear State Space Models
Authors Rahul G. Krishnan, Uri Shalit, David Sontag
Abstract Gaussian state space models have been used for decades as generative models of sequential data. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption. We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks. Our learning algorithm simultaneously learns a compiled inference network and the generative model, leveraging a structured variational approximation parameterized by recurrent neural networks to mimic the posterior distribution. We apply the learning algorithm to both synthetic and real-world datasets, demonstrating its scalability and versatility. We find that using the structured approximation to the posterior results in models with significantly higher held-out likelihood.
Tasks Multivariate Time Series Forecasting
Published 2016-09-30
URL http://arxiv.org/abs/1609.09869v2
PDF http://arxiv.org/pdf/1609.09869v2.pdf
PWC https://paperswithcode.com/paper/structured-inference-networks-for-nonlinear
Repo https://github.com/clinicalml/structuredinference
Framework none
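
A heavily compressed sketch of the model family (a deep Markov model): MLP-parameterized Gaussian transitions and emissions, with a backward RNN inference network producing the structured approximate posterior. Sizes and architectures are placeholders, not the paper's configuration.

```python
# Deep Markov model components with an RNN-based structured inference network.
import torch
import torch.nn as nn

z_dim, x_dim, h_dim = 8, 16, 32

transition = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                           nn.Linear(h_dim, 2 * z_dim))      # mean, log-var of z_t | z_{t-1}
emission = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                         nn.Linear(h_dim, x_dim))             # mean of x_t | z_t
encoder_rnn = nn.GRU(x_dim, h_dim, batch_first=True)          # summarizes future observations
posterior_head = nn.Linear(h_dim + z_dim, 2 * z_dim)          # q(z_t | z_{t-1}, x_{t:T})

def posterior_step(h_t, z_prev):
    stats = posterior_head(torch.cat([h_t, z_prev], dim=-1))
    mu, logvar = stats.chunk(2, dim=-1)
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterized sample

x = torch.randn(4, 10, x_dim)                 # batch of 4 sequences, length 10
h, _ = encoder_rnn(torch.flip(x, dims=[1]))   # backward RNN over x
h = torch.flip(h, dims=[1])
z = torch.zeros(4, z_dim)
for t in range(10):
    z = posterior_step(h[:, t], z)            # one sampled latent trajectory
print(z.shape)                                # torch.Size([4, 8])
```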

The Importance of Skip Connections in Biomedical Image Segmentation

Title The Importance of Skip Connections in Biomedical Image Segmentation
Authors Michal Drozdzal, Eugene Vorontsov, Gabriel Chartrand, Samuel Kadoury, Chris Pal
Abstract In this paper, we study the influence of both long and short skip connections on Fully Convolutional Networks (FCN) for biomedical image segmentation. In standard FCNs, only long skip connections are used to skip features from the contracting path to the expanding path in order to recover spatial information lost during downsampling. We extend FCNs by adding short skip connections, similar to those introduced in residual networks, in order to build very deep FCNs (of hundreds of layers). A review of the gradient flow confirms that for a very deep FCN it is beneficial to have both long and short skip connections. Finally, we show that a very deep FCN can achieve near state-of-the-art results on the EM dataset without any further post-processing.
Tasks Semantic Segmentation
Published 2016-08-14
URL http://arxiv.org/abs/1608.04117v2
PDF http://arxiv.org/pdf/1608.04117v2.pdf
PWC https://paperswithcode.com/paper/the-importance-of-skip-connections-in
Repo https://github.com/lauradhatt/Interesting-Reads
Framework none
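
A toy two-level FCN showing the two kinds of skips the paper studies: short (residual) skips inside each block and a long skip carrying encoder features to the decoder. It is not the authors' architecture.

```python
# Toy FCN with short (residual) and long (encoder-to-decoder) skip connections.
import torch
import torch.nn as nn

class ResBlock(nn.Module):                       # short skip: x + F(x)
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return torch.relu(x + self.body(x))

class TinyFCN(nn.Module):
    def __init__(self, in_ch=1, ch=16, n_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, ch, 3, padding=1), ResBlock(ch))
        self.down = nn.MaxPool2d(2)
        self.mid = ResBlock(ch)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec = nn.Sequential(ResBlock(ch), nn.Conv2d(ch, n_classes, 1))
    def forward(self, x):
        e = self.enc(x)
        d = self.up(self.mid(self.down(e)))
        return self.dec(d + e)                   # long skip: encoder -> decoder

print(TinyFCN()(torch.randn(1, 1, 64, 64)).shape)   # torch.Size([1, 2, 64, 64])
```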

Enriching Word Vectors with Subword Information

Title Enriching Word Vectors with Subword Information
Authors Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov
Abstract Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character $n$-grams. A vector representation is associated with each character $n$-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and it lets us compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.
Tasks Word Embeddings
Published 2016-07-15
URL http://arxiv.org/abs/1607.04606v2
PDF http://arxiv.org/pdf/1607.04606v2.pdf
PWC https://paperswithcode.com/paper/enriching-word-vectors-with-subword
Repo https://github.com/DW-yejing/fasttext4j-jdk6
Framework none
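
A sketch of the subword idea rather than the fastText implementation: a word vector is the sum of hashed character n-gram vectors, so out-of-vocabulary words still get a representation. The n-gram range follows the paper; the bucket count, hash function, and random table are placeholders (fastText also adds a dedicated whole-word vector for in-vocabulary words and uses far more buckets).

```python
# Word vectors as sums of hashed character n-gram vectors.
import numpy as np

dim, buckets = 100, 50_000
rng = np.random.default_rng(0)
ngram_table = rng.normal(scale=0.1, size=(buckets, dim))   # stand-in for trained n-gram vectors

def char_ngrams(word, n_min=3, n_max=6):
    w = f"<{word}>"                                # boundary symbols as in the paper
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word):
    idx = [hash(g) % buckets for g in char_ngrams(word)]   # Python hash as a toy hash
    return ngram_table[idx].sum(axis=0)

v = word_vector("unbelievable")        # works even for out-of-vocabulary words
print(v.shape)                         # (100,)
```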

Wavelet Scattering Regression of Quantum Chemical Energies

Title Wavelet Scattering Regression of Quantum Chemical Energies
Authors Matthew Hirn, Stéphane Mallat, Nicolas Poilvert
Abstract We introduce multiscale invariant dictionaries to estimate quantum chemical energies of organic molecules, from training databases. Molecular energies are invariant to isometric atomic displacements, and are Lipschitz continuous to molecular deformations. Similarly to density functional theory (DFT), the molecule is represented by an electronic density function. A multiscale invariant dictionary is calculated with wavelet scattering invariants. It cascades a first wavelet transform which separates scales, with a second wavelet transform which computes interactions across scales. Sparse scattering regressions give state-of-the-art results over two databases of organic planar molecules. On these databases, the regression error is of the order of the error produced by DFT codes, but at a fraction of the computational cost.
Tasks
Published 2016-05-16
URL http://arxiv.org/abs/1605.04654v3
PDF http://arxiv.org/pdf/1605.04654v3.pdf
PWC https://paperswithcode.com/paper/wavelet-scattering-regression-of-quantum
Repo https://github.com/matthew-hirn/ScatNet-QM-2D
Framework none
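
A sketch of the regression stage under stated assumptions: scattering invariants computed with the kymatio package (a stand-in for the authors' ScatNet-based code) feed a sparse linear model. The inputs are random placeholders, not molecular density maps or energies.

```python
# Sparse regression over 2D scattering invariants (kymatio assumed installed).
import numpy as np
from kymatio.numpy import Scattering2D
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
densities = rng.random((40, 32, 32))            # placeholder 2D "density maps"
energies = rng.normal(size=40)                  # placeholder target energies

scattering = Scattering2D(J=3, shape=(32, 32))  # two-layer scattering up to scale 2^3
features = scattering(densities).reshape(40, -1)

model = LassoCV(cv=5).fit(features, energies)   # sparse scattering regression
print("nonzero coefficients:", np.count_nonzero(model.coef_))
```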

Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles

Title Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
Authors Mehdi Noroozi, Paolo Favaro
Abstract In this paper we study the problem of image representation learning without human annotation. By following the principles of self-supervision, we build a convolutional neural network (CNN) that can be trained to solve Jigsaw puzzles as a pretext task, which requires no manual labeling, and then later repurposed to solve object classification and detection. To maintain the compatibility across tasks we introduce the context-free network (CFN), a siamese-ennead CNN. The CFN takes image tiles as input and explicitly limits the receptive field (or context) of its early processing units to one tile at a time. We show that the CFN includes fewer parameters than AlexNet while preserving the same semantic learning capabilities. By training the CFN to solve Jigsaw puzzles, we learn both a feature mapping of object parts and their correct spatial arrangement. Our experimental evaluations show that the learned features capture semantically relevant content. Our proposed method for learning visual representations outperforms state-of-the-art methods in several transfer learning benchmarks.
Tasks Object Classification, Representation Learning, Transfer Learning
Published 2016-03-30
URL http://arxiv.org/abs/1603.09246v3
PDF http://arxiv.org/pdf/1603.09246v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-visual-1
Repo https://github.com/Confusezius/selfsupervised_learning
Framework pytorch
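
A toy version of the jigsaw pretext task: cut an image into a 3×3 grid of tiles, shuffle them according to one of a fixed set of permutations, and train a shared-weight (siamese-style) network to predict which permutation was applied. The backbone is tiny and the permutation set is sampled at random rather than selected for maximal Hamming distance as in the paper.

```python
# Jigsaw pretext task: permute 3x3 tiles and classify the permutation index.
import itertools
import random
import torch
import torch.nn as nn

permutations = random.Random(0).sample(list(itertools.permutations(range(9))), 100)

def make_puzzle(image, perm_index, tile=32):
    """image: (C, 96, 96) tensor -> (9, C, 32, 32) tiles in permuted order."""
    tiles = [image[:, r:r + tile, c:c + tile]
             for r in range(0, 96, tile) for c in range(0, 96, tile)]
    return torch.stack([tiles[i] for i in permutations[perm_index]])

class JigsawNet(nn.Module):
    def __init__(self, n_perm=100):
        super().__init__()
        self.backbone = nn.Sequential(                 # shared across the 9 tiles
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(9 * 32, n_perm)          # classify the permutation
    def forward(self, tiles):                          # tiles: (B, 9, C, H, W)
        b = tiles.shape[0]
        feats = self.backbone(tiles.flatten(0, 1)).reshape(b, -1)
        return self.head(feats)

image = torch.rand(3, 96, 96)
tiles = make_puzzle(image, perm_index=7).unsqueeze(0)   # batch of one puzzle
logits = JigsawNet()(tiles)
print(logits.shape)                                     # torch.Size([1, 100])
```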