Paper Group AWR 54
A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract
Title | A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract |
Authors | Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, Korin Richmond |
Abstract | We present a multilinear statistical model of the human tongue that captures anatomical and tongue pose related shape variations separately. The model is derived from 3D magnetic resonance imaging data of 11 speakers sustaining speech related vocal tract configurations. The extraction is performed with a minimally supervised method built on an image segmentation approach and a template fitting technique. Furthermore, it uses image denoising to deal with potentially corrupted data, reconstruction of palate surface information to handle palatal tongue contacts, and a bootstrap strategy to refine the obtained shapes. Our evaluation concludes that limiting the degrees of freedom for the anatomical and speech related variations to 5 and 4, respectively, produces a model that can reliably register unknown data while avoiding overfitting. We also show that the model can be used to generate a plausible tongue animation by tracking sparse motion capture data. |
Tasks | Denoising, Image Denoising, Motion Capture, Semantic Segmentation |
Published | 2016-12-15 |
URL | http://arxiv.org/abs/1612.05005v5 |
PDF | http://arxiv.org/pdf/1612.05005v5.pdf |
PWC | https://paperswithcode.com/paper/a-multilinear-tongue-model-derived-from |
Repo | https://github.com/m2ci-msp/mri-shape-tools |
Framework | none |
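The separation of anatomical and pose variation can be illustrated with a Tucker-style (multilinear) decomposition. The sketch below is a toy construction on random data, not the paper's pipeline: the tensor dimensions, the HOSVD-style truncation, and all variable names are illustrative assumptions; only the mode ranks (5 anatomical, 4 pose) come from the abstract.

```python
import numpy as np

# Toy data tensor: 11 speakers x 12 poses x 90 vertex coordinates
# (pose count and vertex count are made up for illustration).
rng = np.random.default_rng(0)
T = rng.standard_normal((11, 12, 90))

def mode_unfold(T, mode):
    """Unfold tensor T along `mode` into a matrix (mode-n matricization)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def truncated_basis(T, mode, rank):
    """Leading left singular vectors of the mode-n unfolding."""
    U, _, _ = np.linalg.svd(mode_unfold(T, mode), full_matrices=False)
    return U[:, :rank]

U_speaker = truncated_basis(T, 0, 5)   # anatomy: 5 degrees of freedom
U_pose    = truncated_basis(T, 1, 4)   # speech pose: 4 degrees of freedom

# Core tensor: project the data onto both truncated bases.
core = np.einsum('spv,sa,pb->abv', T, U_speaker, U_pose)

# A tongue shape is reconstructed from one speaker weight vector
# and one pose weight vector.
shape = np.einsum('abv,a,b->v', core, U_speaker[3], U_pose[2])
print(core.shape, shape.shape)   # → (5, 4, 90) (90,)
```

Registering unknown data then amounts to finding the speaker and pose weight vectors whose reconstruction best fits the observed surface.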
Face Detection with the Faster R-CNN
Title | Face Detection with the Faster R-CNN |
Authors | Huaizu Jiang, Erik Learned-Miller |
Abstract | The Faster R-CNN has recently demonstrated impressive results on various object detection benchmarks. By training a Faster R-CNN model on the large scale WIDER face dataset, we report state-of-the-art results on two widely used face detection benchmarks, FDDB and the recently released IJB-A. |
Tasks | Face Detection, Object Detection |
Published | 2016-06-10 |
URL | http://arxiv.org/abs/1606.03473v1 |
PDF | http://arxiv.org/pdf/1606.03473v1.pdf |
PWC | https://paperswithcode.com/paper/face-detection-with-the-faster-r-cnn |
Repo | https://github.com/playerkk/face-py-faster-rcnn |
Framework | none |
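The output side of a Faster R-CNN detector prunes overlapping candidate boxes with greedy non-maximum suppression. A minimal pure-Python sketch of that step (the box coordinates, scores, and IoU threshold below are illustrative assumptions, not values from the paper):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.3):
    """Greedy non-maximum suppression: keep the highest-scoring boxes,
    dropping any box that overlaps an already-kept box by > thresh IoU."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes  = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the two near-duplicate boxes collapse
```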
Can we still avoid automatic face detection?
Title | Can we still avoid automatic face detection? |
Authors | Michael J. Wilber, Vitaly Shmatikov, Serge Belongie |
Abstract | After decades of study, automatic face detection and recognition systems are now accurate and widespread. Naturally, this means users who wish to avoid automatic recognition are becoming less able to do so. Where do we stand in this cat-and-mouse race? We currently live in a society where everyone carries a camera in their pocket. Many people willfully upload most or all of the pictures they take to social networks, which invest heavily in automatic face recognition systems. In this setting, is it still possible for privacy-conscientious users to avoid automatic face detection and recognition? If so, how? Must evasion techniques be obvious to be effective, or are there still simple measures users can take to protect themselves? In this work, we find ways to evade face detection on Facebook, a representative example of a popular social network that uses automatic face detection to enhance its service. We challenge widely-held beliefs about evading face detection: do our old techniques such as blurring the face region or wearing “privacy glasses” still work? We show that state-of-the-art detectors can often find faces even if the subject wears occluding clothing or the uploader damages the photo to prevent faces from being detected. |
Tasks | Face Detection, Face Recognition |
Published | 2016-02-14 |
URL | https://arxiv.org/abs/1602.04504v2 |
PDF | https://arxiv.org/pdf/1602.04504v2.pdf |
PWC | https://paperswithcode.com/paper/can-we-still-avoid-automatic-face-detection |
Repo | https://github.com/cydonia999/Tiny_Faces_in_Tensorflow |
Framework | tf |
Clustering with Confidence: Finding Clusters with Statistical Guarantees
Title | Clustering with Confidence: Finding Clusters with Statistical Guarantees |
Authors | Andreas Henelius, Kai Puolamäki, Henrik Boström, Panagiotis Papapetrou |
Abstract | Clustering is a widely used unsupervised learning method for finding structure in data. However, the resulting clusters are typically presented without any guarantees on their robustness; slightly changing the data sample used, or re-running a clustering algorithm involving some stochastic component, may lead to completely different clusters. There is, hence, a need for techniques that can quantify the instability of the generated clusters. In this study, we propose a technique for quantifying the instability of a clustering solution and for finding robust clusters, termed core clusters, which correspond to clusters where the co-occurrence probability of each data item within a cluster is at least $1 - \alpha$. We demonstrate how solving the core clustering problem is linked to finding the largest maximal cliques in a graph. We show that the method can be used with both clustering and classification algorithms. The proposed method is tested on both simulated and real datasets. The results show that the obtained clusters indeed meet the guarantees on robustness. |
Tasks | |
Published | 2016-12-27 |
URL | http://arxiv.org/abs/1612.08714v2 |
PDF | http://arxiv.org/pdf/1612.08714v2.pdf |
PWC | https://paperswithcode.com/paper/clustering-with-confidence-finding-clusters |
Repo | https://github.com/bwrc/corecluster-r |
Framework | none |
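The core-cluster idea can be sketched end to end on toy data: estimate pairwise co-occurrence probabilities from repeated clustering runs, connect pairs that co-occur with probability at least 1 - α, and read off the largest maximal cliques of the resulting graph. The run labels below are hand-made stand-ins for stochastic reruns of a clustering algorithm, and α is an illustrative choice:

```python
from itertools import combinations

# Each row is one clustering run's label assignment for 6 items.
runs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
]
n = len(runs[0])
alpha = 0.4  # require pairwise co-occurrence probability >= 1 - alpha

# Pairwise co-occurrence probability across runs.
co = {(i, j): sum(r[i] == r[j] for r in runs) / len(runs)
      for i, j in combinations(range(n), 2)}

# Graph with an edge whenever a pair co-occurs often enough.
adj = {i: set() for i in range(n)}
for (i, j), p in co.items():
    if p >= 1 - alpha:
        adj[i].add(j); adj[j].add(i)

def bron_kerbosch(R, P, X, out):
    """Enumerate maximal cliques of the graph `adj` (no pivoting)."""
    if not P and not X:
        out.append(sorted(R))
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], out)
        P.remove(v); X.add(v)

cliques = []
bron_kerbosch(set(), set(range(n)), set(), cliques)
core = max(cliques, key=len)   # one of the largest maximal cliques
print(sorted(map(tuple, cliques)))  # → [(0, 1, 2), (3, 4, 5)]
```

Items 3 and 5 land in the same cluster in only 2 of 3 runs (probability 2/3 ≥ 0.6), so they still join the second core cluster at this α; tightening α would split it.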
Real-Time Visual Place Recognition for Personal Localization on a Mobile Device
Title | Real-Time Visual Place Recognition for Personal Localization on a Mobile Device |
Authors | Michał Nowicki, Jan Wietrzykowski, Piotr Skrzypczyński |
Abstract | The paper presents an approach to indoor personal localization on a mobile device based on visual place recognition. We implemented on a smartphone two state-of-the-art algorithms that are representative of two different approaches to visual place recognition: FAB-MAP, which recognizes places using individual images, and ABLE-M, which utilizes sequences of images. These algorithms are evaluated in environments of different structure, focusing on problems commonly encountered when a mobile device camera is used. The conclusions drawn from this evaluation guided the design of the FastABLE system, which is based on the ABLE-M algorithm but introduces major modifications to the concept of image matching. The improvements radically cut down the processing time and improve scalability, making it possible to localize the user in long image sequences with the limited computing power of a mobile device. The resulting place recognition system compares favorably to both the ABLE-M and the FAB-MAP solutions in the context of real-time personal localization. |
Tasks | Visual Place Recognition |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02061v2 |
PDF | http://arxiv.org/pdf/1611.02061v2.pdf |
PWC | https://paperswithcode.com/paper/real-time-visual-place-recognition-for |
Repo | https://github.com/LRMPUT/FastABLE |
Framework | none |
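ABLE-M-style sequence matching reduces to comparing binary global descriptors by summed Hamming distance over sliding windows of the reference sequence. A toy sketch (descriptor size, window length, and noise level are illustrative assumptions; real systems compute binary descriptors such as LDB on downsampled frames, and FastABLE's contribution is precisely to reuse work across overlapping windows instead of recomputing each from scratch):

```python
import numpy as np

rng = np.random.default_rng(1)
ref = rng.integers(0, 2, size=(100, 256), dtype=np.uint8)  # reference tour
query = ref[40:48].copy()                                  # revisited place
query ^= (rng.random(query.shape) < 0.05)                  # ~5% bit noise

def best_match(ref, query):
    """Index of the reference window with minimal summed Hamming distance."""
    w = len(query)
    costs = [np.sum(ref[i:i + w] != query) for i in range(len(ref) - w + 1)]
    return int(np.argmin(costs))

print(best_match(ref, query))  # recovers the revisited position, 40
```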
The Spectral Condition Number Plot for Regularization Parameter Determination
Title | The Spectral Condition Number Plot for Regularization Parameter Determination |
Authors | Carel F. W. Peeters, Mark A. van de Wiel, Wessel N. van Wieringen |
Abstract | Many modern statistical applications ask for the estimation of a covariance (or precision) matrix in settings where the number of variables is larger than the number of observations. There exists a broad class of ridge-type estimators that employs regularization to cope with the subsequent singularity of the sample covariance matrix. These estimators depend on a penalty parameter whose value can be hard to choose: selection procedures may be computationally infeasible or tenable only for a restricted set of ridge-type estimators. Here we introduce a simple graphical tool, the spectral condition number plot, for informed heuristic penalty parameter selection. The proposed tool is computationally friendly and can be employed for the full class of ridge-type covariance (precision) estimators. |
Tasks | |
Published | 2016-08-14 |
URL | http://arxiv.org/abs/1608.04123v1 |
PDF | http://arxiv.org/pdf/1608.04123v1.pdf |
PWC | https://paperswithcode.com/paper/the-spectral-condition-number-plot-for |
Repo | https://github.com/CFWP/rags2ridges |
Framework | none |
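For the archetypal ridge estimator S(λ) = S + λI (one member of the class the paper covers), the curve behind the condition number plot is direct to compute. The simulated p > n data and the penalty grid below are illustrative assumptions; in the plot one looks for the penalty where the curve levels off:

```python
import numpy as np

# Simulate p > n data, so the sample covariance S is singular.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)

# Spectral condition number of S + lambda * I over a penalty grid.
lambdas = np.logspace(-4, 2, 25)
conds = [np.linalg.cond(S + lam * np.eye(p)) for lam in lambdas]

# The condition number falls as the penalty grows; plotting
# conds against lambdas (log-log) gives the proposed diagnostic.
print(conds[0] > conds[-1])  # → True
```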
Convolutional Neural Networks using Logarithmic Data Representation
Title | Convolutional Neural Networks using Logarithmic Data Representation |
Authors | Daisuke Miyashita, Edward H. Lee, Boris Murmann |
Abstract | Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance. To perform this, we take advantage of the fact that the weights and activations in a trained network naturally have non-uniform distributions. Using non-uniform, base-2 logarithmic representation to encode weights, communicate activations, and perform dot-products enables networks to 1) achieve higher classification accuracies than fixed-point at the same resolution and 2) eliminate bulky digital multipliers. Finally, we propose an end-to-end training procedure that uses log representation at 5-bits, which achieves higher final test accuracy than linear at 5-bits. |
Tasks | |
Published | 2016-03-03 |
URL | http://arxiv.org/abs/1603.01025v2 |
PDF | http://arxiv.org/pdf/1603.01025v2.pdf |
PWC | https://paperswithcode.com/paper/convolutional-neural-networks-using |
Repo | https://github.com/Enderdead/Pytorch_Quantize_impls |
Framework | pytorch |
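The weight encoding can be sketched in a few lines: each weight becomes a signed power of two, so multiplying by it reduces to a bit shift in hardware. The exponent range below (3 exponent bits, full-scale exponent 0) is an illustrative assumption, not the paper's exact quantizer:

```python
import numpy as np

def log2_quantize(w, fsr=0, bits=3):
    """Quantize to signed powers of two: sign(w) * 2^round(log2|w|),
    with the exponent clipped to a small range below the full-scale
    exponent `fsr`. The epsilon guards log2(0)."""
    sign = np.sign(w)
    exp = np.round(np.log2(np.abs(w) + 1e-12))
    exp = np.clip(exp, fsr - 2**bits + 1, fsr)
    return sign * 2.0 ** exp

w = np.array([0.9, -0.3, 0.12, -0.04])
print(log2_quantize(w))  # powers of two: 1, -0.25, 0.125, -0.03125
```

Because trained weights and activations are concentrated near zero, the non-uniform spacing of powers of two wastes less resolution than fixed-point at the same bit width.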
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
Title | Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure |
Authors | Alberto Bietti, Julien Mairal |
Abstract | Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for example by data augmentation. In such cases, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent method (SGD). In this paper, we introduce a variance reduction approach for these settings when the objective is composite and strongly convex. The convergence rate outperforms SGD with a typically much smaller constant factor, which depends on the variance of gradient estimates only due to perturbations on a single example. |
Tasks | Data Augmentation, Stochastic Optimization |
Published | 2016-10-04 |
URL | http://arxiv.org/abs/1610.00970v6 |
PDF | http://arxiv.org/pdf/1610.00970v6.pdf |
PWC | https://paperswithcode.com/paper/stochastic-optimization-with-variance |
Repo | https://github.com/albietz/stochs |
Framework | none |
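The finite-sum baseline that such methods extend is SVRG-style variance reduction. The sketch below runs that baseline on a toy strongly convex ridge least-squares problem; the paper's actual algorithm additionally handles random perturbations of each example (e.g. data augmentation), which this sketch omits, and all problem sizes and step sizes are illustrative:

```python
import numpy as np

# Minimize (1/n) sum_i (a_i . x - b_i)^2 / 2 + (mu/2) ||x||^2.
rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true
mu = 0.1

def grad_i(x, i):
    """Gradient of the i-th component function."""
    return A[i] * (A[i] @ x - b[i]) + mu * x

def full_grad(x):
    return A.T @ (A @ x - b) / n + mu * x

x = np.zeros(d)
step = 0.02
for epoch in range(30):
    x_ref, g_ref = x.copy(), full_grad(x)   # full-gradient snapshot
    for _ in range(n):
        i = int(rng.integers(n))
        # Variance-reduced gradient estimate: unbiased, with variance
        # shrinking as x approaches the snapshot and the optimum.
        g = grad_i(x, i) - grad_i(x_ref, i) + g_ref
        x -= step * g

print(np.linalg.norm(full_grad(x)))  # near-stationary point
```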
Semantic Image Inpainting with Deep Generative Models
Title | Semantic Image Inpainting with Deep Generative Models |
Authors | Raymond A. Yeh, Chen Chen, Teck Yian Lim, Alexander G. Schwing, Mark Hasegawa-Johnson, Minh N. Do |
Abstract | Semantic image inpainting is a challenging task where large missing regions have to be filled based on the available visual data. Existing methods which extract information from only a single image generally produce unsatisfactory results due to the lack of high level context. In this paper, we propose a novel method for semantic image inpainting, which generates the missing content by conditioning on the available data. Given a trained generative model, we search for the closest encoding of the corrupted image in the latent image manifold using our context and prior losses. This encoding is then passed through the generative model to infer the missing content. In our method, inference is possible irrespective of how the missing content is structured, while the state-of-the-art learning based method requires specific information about the holes in the training phase. Experiments on three datasets show that our method successfully predicts information in large missing regions and achieves pixel-level photorealism, significantly outperforming the state-of-the-art methods. |
Tasks | Image Inpainting |
Published | 2016-07-26 |
URL | http://arxiv.org/abs/1607.07539v3 |
PDF | http://arxiv.org/pdf/1607.07539v3.pdf |
PWC | https://paperswithcode.com/paper/semantic-image-inpainting-with-deep |
Repo | https://github.com/shravan097/Image-Inpainting |
Framework | none |
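The inference procedure can be sketched with a toy *linear* "generator" G(z) = Wz, so the gradient of the masked context loss is transparent. The real method backpropagates through a trained GAN generator and adds a discriminator-based prior loss; both are omitted here, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_z = 64, 8
W = rng.standard_normal((d_img, d_z))   # toy linear "generator"
z_true = rng.standard_normal(d_z)
y = W @ z_true                          # ground-truth "image"
mask = np.ones(d_img)
mask[20:40] = 0                         # missing region (unobserved)

# Search the latent space for z minimizing the context loss
# || mask * (G(z) - y) ||^2 by gradient descent.
z = np.zeros(d_z)
for _ in range(500):
    residual = mask * (W @ z - y)       # loss gradient at the output ...
    z -= 0.01 * (W.T @ residual)        # ... pulled back through G

inpainted = W @ z                       # G(z) fills the hole
print(np.abs(inpainted[20:40] - y[20:40]).max())  # small recovery error
```

Because inference only needs the mask at test time, the missing region's shape can be arbitrary, which is the advantage the abstract highlights over methods that must see the hole structure during training.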
Variance-based regularization with convex objectives
Title | Variance-based regularization with convex objectives |
Authors | John Duchi, Hongseok Namkoong |
Abstract | We develop an approach to risk minimization and stochastic optimization that provides a convex surrogate for variance, allowing near-optimal and computationally efficient trading between approximation and estimation error. Our approach builds on techniques for distributionally robust optimization and Owen’s empirical likelihood, and we provide a number of finite-sample and asymptotic results characterizing the theoretical performance of the estimator. In particular, we show that our procedure comes with certificates of optimality, achieving (in some scenarios) faster rates of convergence than empirical risk minimization by virtue of automatically balancing bias and variance. We give corroborating empirical evidence showing that in practice, the estimator indeed trades between variance and absolute performance on a training sample, improving out-of-sample (test) performance over standard empirical risk minimization for a number of classification problems. |
Tasks | Stochastic Optimization |
Published | 2016-10-08 |
URL | http://arxiv.org/abs/1610.02581v3 |
PDF | http://arxiv.org/pdf/1610.02581v3.pdf |
PWC | https://paperswithcode.com/paper/variance-based-regularization-with-convex |
Repo | https://github.com/hsnamkoong/robustopt |
Framework | none |
Structured Inference Networks for Nonlinear State Space Models
Title | Structured Inference Networks for Nonlinear State Space Models |
Authors | Rahul G. Krishnan, Uri Shalit, David Sontag |
Abstract | Gaussian state space models have been used for decades as generative models of sequential data. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption. We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks. Our learning algorithm simultaneously learns a compiled inference network and the generative model, leveraging a structured variational approximation parameterized by recurrent neural networks to mimic the posterior distribution. We apply the learning algorithm to both synthetic and real-world datasets, demonstrating its scalability and versatility. We find that using the structured approximation to the posterior results in models with significantly higher held-out likelihood. |
Tasks | Multivariate Time Series Forecasting |
Published | 2016-09-30 |
URL | http://arxiv.org/abs/1609.09869v2 |
PDF | http://arxiv.org/pdf/1609.09869v2.pdf |
PWC | https://paperswithcode.com/paper/structured-inference-networks-for-nonlinear |
Repo | https://github.com/clinicalml/structuredinference |
Framework | none |
The Importance of Skip Connections in Biomedical Image Segmentation
Title | The Importance of Skip Connections in Biomedical Image Segmentation |
Authors | Michal Drozdzal, Eugene Vorontsov, Gabriel Chartrand, Samuel Kadoury, Chris Pal |
Abstract | In this paper, we study the influence of both long and short skip connections on Fully Convolutional Networks (FCN) for biomedical image segmentation. In standard FCNs, only long skip connections are used to skip features from the contracting path to the expanding path in order to recover spatial information lost during downsampling. We extend FCNs by adding short skip connections, similar to those introduced in residual networks, in order to build very deep FCNs (of hundreds of layers). A review of the gradient flow confirms that for a very deep FCN it is beneficial to have both long and short skip connections. Finally, we show that a very deep FCN can achieve near state-of-the-art results on the EM dataset without any further post-processing. |
Tasks | Semantic Segmentation |
Published | 2016-08-14 |
URL | http://arxiv.org/abs/1608.04117v2 |
PDF | http://arxiv.org/pdf/1608.04117v2.pdf |
PWC | https://paperswithcode.com/paper/the-importance-of-skip-connections-in |
Repo | https://github.com/lauradhatt/Interesting-Reads |
Framework | none |
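The wiring of the two skip types can be shown with a minimal numpy forward pass: a short skip adds a block's input to its output (residual style), while a long skip carries a contracting-path feature map across the bottleneck by concatenation. The "layers" are toy per-pixel linear maps purely to show shapes and connectivity, not a real FCN:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0)

def block(x, Wa, Wb):
    """Residual block: two layers plus a short (additive) skip."""
    h = relu(x @ Wa)
    return relu(h @ Wb + x)   # short skip: identity added to F(x)

C = 16
Ws = [rng.standard_normal((C, C)) * 0.1 for _ in range(4)]
x = rng.standard_normal((32, C))          # 32 "pixels", C channels

enc = block(x, Ws[0], Ws[1])              # contracting-path feature
mid = block(enc, Ws[2], Ws[3])            # bottleneck
dec = np.concatenate([mid, enc], axis=1)  # long skip: concatenation
print(dec.shape)  # → (32, 32): channels doubled by the long skip
```

The short skips give gradients an identity path through each block (enabling hundreds of layers), while the long skip restores spatial detail to the expanding path.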
Enriching Word Vectors with Subword Information
Title | Enriching Word Vectors with Subword Information |
Authors | Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov |
Abstract | Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character $n$-grams. A vector representation is associated to each character $n$-gram; words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and lets us compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks. |
Tasks | Word Embeddings |
Published | 2016-07-15 |
URL | http://arxiv.org/abs/1607.04606v2 |
PDF | http://arxiv.org/pdf/1607.04606v2.pdf |
PWC | https://paperswithcode.com/paper/enriching-word-vectors-with-subword |
Repo | https://github.com/DW-yejing/fasttext4j-jdk6 |
Framework | none |
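The subword decomposition is easy to reproduce: a word is wrapped in boundary markers and split into character n-grams, with the whole marked word kept as one extra unit. The n-gram range 3-6 is fastText's usual default; the function name is a hypothetical stand-in:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word with boundary markers, fastText-style.
    '<' and '>' mark word boundaries; the whole marked word is kept as
    one additional feature, so 'her' in '<where>' differs from '<her>'."""
    w = f"<{word}>"
    grams = [w[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(w) - n + 1)]
    return grams + [w]

print(char_ngrams("where", 3, 3))
# → ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

A word's vector is the sum of its n-gram vectors, so an out-of-vocabulary word still receives a representation from subword units shared with training words.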
Wavelet Scattering Regression of Quantum Chemical Energies
Title | Wavelet Scattering Regression of Quantum Chemical Energies |
Authors | Matthew Hirn, Stéphane Mallat, Nicolas Poilvert |
Abstract | We introduce multiscale invariant dictionaries to estimate quantum chemical energies of organic molecules from training databases. Molecular energies are invariant to isometric atomic displacements, and are Lipschitz continuous to molecular deformations. Similarly to density functional theory (DFT), the molecule is represented by an electronic density function. A multiscale invariant dictionary is calculated with wavelet scattering invariants. It cascades a first wavelet transform which separates scales, with a second wavelet transform which computes interactions across scales. Sparse scattering regressions give state-of-the-art results over two databases of organic planar molecules. On these databases, the regression error is of the order of the error produced by DFT codes, but at a fraction of the computational cost. |
Tasks | |
Published | 2016-05-16 |
URL | http://arxiv.org/abs/1605.04654v3 |
PDF | http://arxiv.org/pdf/1605.04654v3.pdf |
PWC | https://paperswithcode.com/paper/wavelet-scattering-regression-of-quantum |
Repo | https://github.com/matthew-hirn/ScatNet-QM-2D |
Framework | none |
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
Title | Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles |
Authors | Mehdi Noroozi, Paolo Favaro |
Abstract | In this paper we study the problem of image representation learning without human annotation. By following the principles of self-supervision, we build a convolutional neural network (CNN) that can be trained to solve Jigsaw puzzles as a pretext task, which requires no manual labeling, and then later repurposed to solve object classification and detection. To maintain the compatibility across tasks we introduce the context-free network (CFN), a siamese-ennead CNN. The CFN takes image tiles as input and explicitly limits the receptive field (or context) of its early processing units to one tile at a time. We show that the CFN includes fewer parameters than AlexNet while preserving the same semantic learning capabilities. By training the CFN to solve Jigsaw puzzles, we learn both a feature mapping of object parts as well as their correct spatial arrangement. Our experimental evaluations show that the learned features capture semantically relevant content. Our proposed method for learning visual representations outperforms state-of-the-art methods in several transfer learning benchmarks. |
Tasks | Object Classification, Representation Learning, Transfer Learning |
Published | 2016-03-30 |
URL | http://arxiv.org/abs/1603.09246v3 |
PDF | http://arxiv.org/pdf/1603.09246v3.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-visual-1 |
Repo | https://github.com/Confusezius/selfsupervised_learning |
Framework | pytorch |
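The data preparation for the pretext task can be sketched directly: cut the image into a 3×3 grid of tiles, shuffle them with one permutation drawn from a fixed permutation set, and use that permutation's index as the classification label. The 10-permutation set below is a toy stand-in (the paper selects a much larger set of maximally distinct permutations), and the image is random:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((225, 225, 3))  # toy square "image"

def make_tiles(img, grid=3):
    """Cut a square image into grid x grid equal tiles, row-major."""
    h = img.shape[0] // grid
    return [img[r*h:(r+1)*h, c*h:(c+1)*h]
            for r in range(grid) for c in range(grid)]

# Fixed permutation set; its index space is the classification target.
perm_set = [tuple(rng.permutation(9)) for _ in range(10)]

def jigsaw_example(img):
    """Return (shuffled tiles, permutation index) for the pretext task."""
    label = int(rng.integers(len(perm_set)))
    tiles = make_tiles(img)
    return [tiles[i] for i in perm_set[label]], label

tiles, label = jigsaw_example(image)
print(len(tiles), tiles[0].shape)  # → 9 (75, 75, 3)
```

Each tile is fed to one branch of the siamese-ennead CFN, and the network is trained to predict `label`, i.e. to recover the tiles' correct spatial arrangement.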