Paper Group AWR 140
Learning to Estimate 3D Hand Pose from Single RGB Images
Title | Learning to Estimate 3D Hand Pose from Single RGB Images |
Authors | Christian Zimmermann, Thomas Brox |
Abstract | Low-cost consumer depth cameras and deep learning have enabled reasonable 3D hand pose estimation from single depth images. In this paper, we present an approach that estimates 3D hand pose from regular RGB images. This task has far more ambiguities due to the missing depth information. To this end, we propose a deep network that learns a network-implicit 3D articulation prior. Together with detected keypoints in the images, this network yields good estimates of the 3D pose. We introduce a large-scale 3D hand pose dataset based on synthetic hand models for training the involved networks. Experiments on a variety of test sets, including one on sign language recognition, demonstrate the feasibility of 3D hand pose estimation on single color images. |
Tasks | Hand Pose Estimation, Pose Estimation, Sign Language Recognition |
Published | 2017-05-03 |
URL | http://arxiv.org/abs/1705.01389v3 |
PDF | http://arxiv.org/pdf/1705.01389v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-estimate-3d-hand-pose-from-single |
Repo | https://github.com/theerapatkitti/hand_mask_rcnn |
Framework | tf |
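The lifting network predicts coordinates in a canonical frame that is invariant to where the hand sits in space and how large it is. Below is a minimal NumPy sketch of such a normalization; the function name, root index, and reference-bone choice are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

def normalize_keypoints(xyz, root=0, ref_bone=(0, 1)):
    """Make 3D hand keypoints translation- and scale-invariant.

    xyz      : (21, 3) array of keypoint coordinates.
    root     : index of the keypoint used as the origin (e.g., the palm).
    ref_bone : pair of keypoint indices whose distance defines unit scale.
    """
    rel = xyz - xyz[root]                         # translation invariance
    scale = np.linalg.norm(xyz[ref_bone[0]] - xyz[ref_bone[1]])
    return rel / scale                            # scale invariance

# toy usage: a random "hand" with 21 keypoints
kp = np.random.randn(21, 3)
canon = normalize_keypoints(kp)
print(canon.shape)  # (21, 3), with the root keypoint at the origin
```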
Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search
Title | Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search |
Authors | Rémi Pautrat, Konstantinos Chatzilygeroudis, Jean-Baptiste Mouret |
Abstract | One of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot. In this paper, we are interested in situations for which several priors exist but we do not know in advance which one best fits the current situation. We tackle this problem by introducing a novel acquisition function, called Most Likely Expected Improvement (MLEI), that combines the likelihood of the priors and the expected improvement. We evaluate this new acquisition function on a transfer learning task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has to learn to walk on flat ground and on stairs, with priors corresponding to different stairs and different kinds of damages. Our results show that MLEI effectively identifies and exploits the priors, even when there is no obvious match between the current situation and the priors. |
Tasks | Legged Robots, Transfer Learning |
Published | 2017-09-20 |
URL | http://arxiv.org/abs/1709.06919v2 |
PDF | http://arxiv.org/pdf/1709.06919v2.pdf |
PWC | https://paperswithcode.com/paper/bayesian-optimization-with-automatic-prior |
Repo | https://github.com/resibots/pautrat_2018_mlei |
Framework | none |
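To make the acquisition-function idea concrete, here is a toy NumPy/SciPy sketch in the spirit of MLEI: each candidate prior serves as a GP mean, the GP marginal likelihood measures how well that prior explains the observations gathered so far, and expected improvement is weighted by that likelihood. The exact weighting and all names here are assumptions for illustration; the paper defines the precise MLEI formula.

```python
import numpy as np
from scipy.stats import norm

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_fit(x_obs, y_obs, x_query, mean_fn, noise=1e-3):
    """GP posterior under a given prior mean, plus the log marginal
    likelihood (up to a constant) of the observations under that prior."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    L = np.linalg.cholesky(K)
    r = y_obs - mean_fn(x_obs)                    # residual w.r.t. prior mean
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    ks = rbf(x_obs, x_query)
    mu = mean_fn(x_query) + ks.T @ alpha
    v = np.linalg.solve(L, ks)
    sd = np.sqrt(np.maximum(1.0 - np.sum(v * v, axis=0), 1e-12))
    loglik = -0.5 * r @ alpha - np.log(np.diag(L)).sum()
    return mu, sd, loglik

def expected_improvement(mu, sd, y_best):
    z = (mu - y_best) / sd
    return (mu - y_best) * norm.cdf(z) + sd * norm.pdf(z)

priors = [np.sin, lambda x: -np.sin(x)]          # two candidate prior means
x_obs = np.array([0.5, 1.5, 2.5])
y_obs = np.sin(x_obs) + 0.01                     # data agree with prior 0
x_query = np.linspace(0.0, np.pi, 50)

scores = []
for mean_fn in priors:
    mu, sd, ll = gp_fit(x_obs, y_obs, x_query, mean_fn)
    scores.append(expected_improvement(mu, sd, y_obs.max()) * np.exp(ll))
best = int(np.argmax([s.max() for s in scores]))
print(best, x_query[scores[best].argmax()])      # prior 0 wins here
```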
Pseudo-extended Markov chain Monte Carlo
Title | Pseudo-extended Markov chain Monte Carlo |
Authors | Christopher Nemeth, Fredrik Lindsten, Maurizio Filippone, James Hensman |
Abstract | Sampling from posterior distributions using Markov chain Monte Carlo (MCMC) methods can require an excessive number of iterations, particularly when the posterior is multi-modal, as the MCMC sampler can become trapped in a local mode for a large number of iterations. In this paper, we introduce the pseudo-extended MCMC method as a simple approach for improving the mixing of the MCMC sampler for multi-modal posterior distributions. The pseudo-extended method augments the state-space of the posterior using pseudo-samples as auxiliary variables. On the extended space, the modes of the posterior are connected, which allows the MCMC sampler to easily move between well-separated posterior modes. We demonstrate that the pseudo-extended approach delivers improved MCMC sampling over the Hamiltonian Monte Carlo algorithm on multi-modal posteriors, including Boltzmann machines and models with sparsity-inducing priors. |
Tasks | |
Published | 2017-08-17 |
URL | https://arxiv.org/abs/1708.05239v3 |
PDF | https://arxiv.org/pdf/1708.05239v3.pdf |
PWC | https://paperswithcode.com/paper/pseudo-extended-markov-chain-monte-carlo |
Repo | https://github.com/chris-nemeth/pseudo-extended-mcmc-code |
Framework | none |
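The construction is concrete enough to state in code: with an instrumental distribution q (here a tempered version of the target, q ∝ γ^β), the extended target over N pseudo-samples is proportional to ∏_i q(x_i) · (1/N) Σ_i γ(x_i)/q(x_i). Below is a toy random-walk Metropolis illustration on a bimodal 1D density; the paper uses Hamiltonian Monte Carlo, and β, N = 2, and the step size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gamma(x):
    """Unnormalized log density of a well-separated bimodal target."""
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

beta = 0.3   # instrumental distribution: q = gamma^beta (tempered target)

def log_extended(xs):
    """Pseudo-extended target over N pseudo-samples:
    prod_i q(x_i) * (1/N) sum_i gamma(x_i) / q(x_i)."""
    lg = log_gamma(xs)
    lq = beta * lg
    return lq.sum() + np.logaddexp.reduce(lg - lq) - np.log(len(xs))

N, steps = 2, 20000
xs = rng.normal(size=N)
lp = log_extended(xs)
chain = []
for _ in range(steps):
    prop = xs + rng.normal(size=N)               # random-walk proposal
    lp_prop = log_extended(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        xs, lp = prop, lp_prop
    chain.append(xs.copy())

draws = np.concatenate(chain[2000:])
# both modes are visited; expectations under the original target would
# additionally weight each pseudo-sample by gamma(x_i) / q(x_i)
print(np.mean(draws < 0).round(2), np.mean(draws > 0).round(2))
```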
Do Deep Neural Networks Suffer from Crowding?
Title | Do Deep Neural Networks Suffer from Crowding? |
Authors | Anna Volokitin, Gemma Roig, Tomaso Poggio |
Abstract | Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, called flankers, are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks for object recognition. We analyze both standard deep convolutional neural networks (DCNNs) and a new variant of DCNNs that is 1) multi-scale and 2) has convolution filters whose size changes with eccentricity, i.e., with distance from the center of fixation. Such networks, which we call eccentricity-dependent, are a computational model of the feedforward path of the primate visual cortex. Our results reveal that the eccentricity-dependent model, trained on target objects in isolation, can recognize such targets in the presence of flankers, if the targets are near the center of the image, whereas DCNNs cannot. Also, for all tested networks, when trained on targets in isolation, we find that recognition accuracy of the networks decreases the closer the flankers are to the target and the more flankers there are. We find that visual similarity between the target and flankers also plays a role and that pooling in early layers of the network leads to more crowding. Additionally, we show that incorporating the flankers into the images of the training set does not improve performance with crowding. |
Tasks | Object Recognition |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08616v1 |
PDF | http://arxiv.org/pdf/1706.08616v1.pdf |
PWC | https://paperswithcode.com/paper/do-deep-neural-networks-suffer-from-crowding |
Repo | https://github.com/voanna/eccentricity |
Framework | tf |
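The eccentricity-dependent idea can be mimicked without varying filter sizes: keep the filters fixed and feed the network center crops of growing extent, all resampled to one resolution, so the fovea is seen sharply and the periphery coarsely. A minimal NumPy sketch of that input construction, with illustrative sizes and scale factors (not the paper's exact architecture):

```python
import numpy as np

def average_pool(img, k):
    """Downsample a 2D image by an integer factor k via average pooling."""
    h, w = img.shape
    return img[: h - h % k, : w - w % k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def foveated_scales(img, out=32, n_scales=3):
    """Center crops of growing extent, all resampled to out x out pixels:
    the fovea keeps full resolution, the periphery is seen coarsely.
    The image must be at least out * 2**(n_scales - 1) pixels per side."""
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    crops = []
    for s in range(n_scales):
        half = (out * 2 ** s) // 2
        crop = img[cy - half: cy + half, cx - half: cx + half]
        crops.append(average_pool(crop, 2 ** s))
    return np.stack(crops)          # shape: (n_scales, out, out)

img = np.random.rand(128, 128)
print(foveated_scales(img).shape)   # (3, 32, 32)
```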
SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning
Title | SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning |
Authors | Xiaojun Xu, Chang Liu, Dawn Song |
Abstract | Synthesizing SQL queries from natural language is a long-standing open problem and has been attracting considerable interest recently. Toward solving the problem, the de facto approach is to employ a sequence-to-sequence-style model. Such an approach necessarily requires the SQL queries to be serialized. Since the same SQL query may have multiple equivalent serializations, training a sequence-to-sequence-style model is sensitive to which serialization is chosen. This phenomenon is documented as the “order-matters” problem. Existing state-of-the-art approaches rely on reinforcement learning to reward the decoder when it generates any of the equivalent serializations. However, we observe that the improvement from reinforcement learning is limited. In this paper, we propose a novel approach, i.e., SQLNet, to fundamentally solve this problem by avoiding the sequence-to-sequence structure when the order does not matter. In particular, we employ a sketch-based approach where the sketch contains a dependency graph so that one prediction can be done by taking into consideration only the previous predictions that it depends on. In addition, we propose a sequence-to-set model as well as the column attention mechanism to synthesize the query based on the sketch. By combining all these novel techniques, we show that SQLNet can outperform the prior art by 9% to 13% on the WikiSQL task. |
Tasks | Text-To-Sql |
Published | 2017-11-13 |
URL | http://arxiv.org/abs/1711.04436v1 |
PDF | http://arxiv.org/pdf/1711.04436v1.pdf |
PWC | https://paperswithcode.com/paper/sqlnet-generating-structured-queries-from |
Repo | https://github.com/Baidi96/text2sql |
Framework | pytorch |
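Two of the ingredients are simple to write down: column attention weights the question tokens by their relevance to a given column name, and the sequence-to-set prediction scores each column independently with a sigmoid so that no ordering over columns is ever imposed. A NumPy sketch with illustrative shapes and parameter names (not the paper's exact parameterization):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

d, L = 8, 6                              # embedding dim, question length
rng = np.random.default_rng(0)
H_q = rng.normal(size=(L, d))            # question token encodings
e_col = rng.normal(size=d)               # one column-name encoding
W_att = rng.normal(size=(d, d))          # attention bilinear form
u_out = rng.normal(size=2 * d)           # scoring vector

# column attention: weight question tokens by relevance to this column
att = softmax(H_q @ W_att @ e_col)       # (L,)
ctx = att @ H_q                          # column-conditioned question summary

# sequence-to-set: an independent "is this column in the WHERE clause?"
# probability per column, so equivalent serializations never matter
p_col = sigmoid(u_out @ np.concatenate([ctx, e_col]))
print(att.round(2), float(p_col))
```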
Fine-tuning CNN Image Retrieval with No Human Annotation
Title | Fine-tuning CNN Image Retrieval with No Human Annotation |
Authors | Filip Radenović, Giorgos Tolias, Ondřej Chum |
Abstract | Image descriptors based on activations of Convolutional Neural Networks (CNNs) have become dominant in image retrieval due to their discriminative power, compactness of representation, and search efficiency. Training CNNs, whether from scratch or by fine-tuning, requires a large amount of annotated data, where a high quality of annotation is often crucial. In this work, we propose to fine-tune CNNs for image retrieval on a large collection of unordered images in a fully automated manner. Reconstructed 3D models obtained by the state-of-the-art retrieval and structure-from-motion methods guide the selection of the training data. We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval. A CNN descriptor whitening discriminatively learned from the same training data outperforms the commonly used PCA whitening. We propose a novel trainable Generalized-Mean (GeM) pooling layer that generalizes max and average pooling and show that it boosts retrieval performance. Applying the proposed method to the VGG network achieves state-of-the-art performance on the standard benchmarks: the Oxford Buildings, Paris, and Holidays datasets. |
Tasks | Image Retrieval |
Published | 2017-11-03 |
URL | http://arxiv.org/abs/1711.02512v2 |
PDF | http://arxiv.org/pdf/1711.02512v2.pdf |
PWC | https://paperswithcode.com/paper/fine-tuning-cnn-image-retrieval-with-no-human |
Repo | https://github.com/jandaldrop/landmark-recognition-challenge |
Framework | tf |
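The GeM layer itself is a one-liner: raise the activations to a power p, average spatially, and take the p-th root, so that p = 1 gives average pooling and large p approaches max pooling (in the paper, p is learned by back-propagation). A NumPy sketch:

```python
import numpy as np

def gem_pool(x, p=3.0, eps=1e-6):
    """Generalized-mean pooling of a CNN feature map over its spatial grid.

    x : (C, H, W) non-negative activations (e.g., after ReLU).
    p = 1 recovers average pooling; large p approaches max pooling.
    In the paper, p is a trainable parameter learned by back-propagation.
    """
    x = np.clip(x, eps, None)
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)

feat = np.random.rand(4, 7, 7)
print(np.allclose(gem_pool(feat, p=1.0), feat.mean(axis=(1, 2))))                # True
print(np.allclose(gem_pool(feat, p=1000.0), feat.max(axis=(1, 2)), atol=1e-2))  # ~True
```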
Improving OCR Accuracy on Early Printed Books by utilizing Cross Fold Training and Voting
Title | Improving OCR Accuracy on Early Printed Books by utilizing Cross Fold Training and Voting |
Authors | Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe |
Abstract | In this paper we introduce a method that significantly reduces the character error rates for OCR text obtained from OCRopus models trained on early printed books. The method uses a combination of cross fold training and confidence-based voting. After allocating the available ground truth to different subsets, several training processes are performed, each resulting in a specific OCR model. The outputs of these models are then combined by voting to determine the final text, taking the recognized characters, their alternatives, and the confidence values assigned to each character into consideration. Experiments on seven early printed books show that the proposed method considerably outperforms the standard approach, reducing the number of errors by 50% and more. |
Tasks | Optical Character Recognition |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09670v1 |
PDF | http://arxiv.org/pdf/1711.09670v1.pdf |
PWC | https://paperswithcode.com/paper/improving-ocr-accuracy-on-early-printed-books-2 |
Repo | https://github.com/chreul/mptv |
Framework | tf |
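The voting step can be sketched compactly: for every text position, sum the confidence of each candidate character across the fold-trained models and keep the best-scoring character. The toy below assumes the model outputs are already aligned position by position, which the real method has to establish first; all names are illustrative.

```python
from collections import defaultdict

def vote_characters(candidates):
    """Confidence-weighted voting over aligned OCR outputs.

    candidates : one list per model; each list holds, for every text
    position, that model's (character, confidence) alternatives.
    Returns the string obtained by summing confidences per character.
    """
    out = []
    for position in zip(*candidates):          # same position, all models
        tally = defaultdict(float)
        for alternatives in position:
            for ch, conf in alternatives:
                tally[ch] += conf
        out.append(max(tally, key=tally.get))
    return "".join(out)

model_a = [[("h", 0.9)], [("a", 0.5), ("o", 0.4)], [("t", 0.8)]]
model_b = [[("h", 0.8)], [("o", 0.7)], [("t", 0.9)]]
model_c = [[("b", 0.6), ("h", 0.3)], [("o", 0.6)], [("l", 0.5), ("t", 0.4)]]
print(vote_characters([model_a, model_b, model_c]))  # "hot"
```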
NIMA: Neural Image Assessment
Title | NIMA: Neural Image Assessment |
Authors | Hossein Talebi, Peyman Milanfar |
Abstract | Automatically learned quality assessment for images has recently become a hot topic due to its usefulness in a wide variety of applications, such as evaluating image capture pipelines, storage techniques, and sharing media. Despite the subjective nature of this problem, most existing methods only predict the mean opinion score provided by datasets such as AVA [1] and TID2013 [2]. Our approach differs from others in that we predict the distribution of human opinion scores using a convolutional neural network. Our architecture also has the advantage of being significantly simpler than other methods with comparable performance. Our proposed approach relies on the success (and retraining) of proven, state-of-the-art deep object recognition networks. Our resulting network can be used not only to score images reliably and with high correlation to human perception, but also to assist with adaptation and optimization of photo editing/enhancement algorithms in a photographic pipeline. All this is done without the need for a “golden” reference image, consequently allowing for single-image, semantic- and perceptually-aware, no-reference quality assessment. |
Tasks | Aesthetics Quality Assessment, Image Quality Assessment |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05424v2 |
PDF | http://arxiv.org/pdf/1709.05424v2.pdf |
PWC | https://paperswithcode.com/paper/nima-neural-image-assessment |
Repo | https://github.com/titu1994/neural-image-assessment |
Framework | tf |
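Both the training loss and the final score follow directly from predicting a distribution over the ratings 1-10: training minimizes the (squared, r = 2) earth mover's distance between the predicted and ground-truth distributions, and the quality score is the mean of the predicted distribution. A NumPy sketch with made-up distributions:

```python
import numpy as np

def emd_loss(p, q, r=2):
    """Earth mover's distance between two distributions over the
    ordered ratings 1..10, computed from the CDF difference."""
    cdf_diff = np.cumsum(p) - np.cumsum(q)
    return np.mean(np.abs(cdf_diff) ** r) ** (1.0 / r)

def mean_and_std(p):
    """Collapse a predicted rating distribution to a scalar quality score."""
    s = np.arange(1, len(p) + 1)
    mean = np.sum(s * p)
    std = np.sqrt(np.sum(((s - mean) ** 2) * p))
    return mean, std

pred = np.array([.02, .03, .05, .10, .20, .25, .18, .10, .05, .02])
gt   = np.array([.01, .02, .04, .08, .18, .27, .20, .12, .06, .02])
print(emd_loss(pred, gt), mean_and_std(pred))
```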
Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler
Title | Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler |
Authors | Alexander Terenin, Måns Magnusson, Leif Jonsson, David Draper |
Abstract | Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either don’t converge to the correct posterior or require storage of large dense matrices in memory. We present a novel sampler that overcomes both problems, and we show that this sampler is faster, both empirically and theoretically, than previous Gibbs samplers for LDA. We do so by employing a novel Pólya-urn-based approximation in the sparse partially collapsed sampler for LDA. We prove that the approximation error vanishes with data size, making our algorithm asymptotically exact, a property of importance for large-scale topic models. In addition, we show, via an explicit example, that – contrary to popular belief in the topic modeling literature – partially collapsed samplers can be more efficient than fully collapsed samplers. We conclude by comparing the performance of our algorithm with that of other approaches on well-known corpora. |
Tasks | Topic Models |
Published | 2017-04-12 |
URL | http://arxiv.org/abs/1704.03581v6 |
PDF | http://arxiv.org/pdf/1704.03581v6.pdf |
PWC | https://paperswithcode.com/paper/polya-urn-latent-dirichlet-allocation-a |
Repo | https://github.com/lejon/PartiallyCollapsedLDA |
Framework | none |
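For context, the fully collapsed Gibbs sampler that this line of work builds on resamples each token's topic indicator with probability proportional to (n_dk + α)(n_kw + β)/(n_k + Vβ); its strictly sequential inner loop is exactly what makes naive parallelization problematic. A minimal NumPy sketch of that baseline (not the paper's Pólya-urn sampler):

```python
import numpy as np

def gibbs_sweep(z, docs, n_dk, n_kw, n_k, alpha, beta, rng):
    """One sweep of fully collapsed Gibbs sampling for LDA."""
    K, V = n_kw.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                       # remove current assignment
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = rng.choice(K, p=p / p.sum())  # resample topic indicator
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [3, 4, 3], [0, 4, 2]]   # tokens as word ids, V = 5
K, V = 2, 5
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
n_dk = np.zeros((len(docs), K)); n_kw = np.zeros((K, V)); n_k = np.zeros(K)
for d, doc in enumerate(docs):                # initialize count matrices
    for i, w in enumerate(doc):
        n_dk[d, z[d][i]] += 1; n_kw[z[d][i], w] += 1; n_k[z[d][i]] += 1
for _ in range(50):
    gibbs_sweep(z, docs, n_dk, n_kw, n_k, alpha=0.5, beta=0.1, rng=rng)
print(n_kw)                                   # per-topic word counts
```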
Weighted Transformer Network for Machine Translation
Title | Weighted Transformer Network for Machine Translation |
Authors | Karim Ahmed, Nitish Shirish Keskar, Richard Socher |
Abstract | State-of-the-art results on neural machine translation often use attentional sequence-to-sequence models with some form of convolution or recursion. Vaswani et al. (2017) propose a new architecture that avoids recurrence and convolution completely. Instead, it uses only self-attention and feed-forward layers. While the proposed architecture achieves state-of-the-art results on several machine translation tasks, it requires a large number of parameters and training iterations to converge. We propose Weighted Transformer, a Transformer with modified attention layers, which not only outperforms the baseline network in BLEU score but also converges 15-40% faster. Specifically, we replace the multi-head attention by multiple self-attention branches that the model learns to combine during the training process. Our model improves the state-of-the-art performance by 0.5 BLEU points on the WMT 2014 English-to-German translation task and by 0.4 on the English-to-French translation task. |
Tasks | Machine Translation |
Published | 2017-11-06 |
URL | http://arxiv.org/abs/1711.02132v1 |
PDF | http://arxiv.org/pdf/1711.02132v1.pdf |
PWC | https://paperswithcode.com/paper/weighted-transformer-network-for-machine |
Repo | https://github.com/duyvuleo/Transformer-DyNet |
Framework | tf |
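The branch mechanics can be sketched in a few lines: each branch is an attention head with its own projections, and the branch outputs are combined through learned non-negative weights that sum to one, instead of plain concatenation. The sketch below is simplified (the paper learns two sets of simplex-constrained weights, for the attention and feed-forward contributions); all shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

rng = np.random.default_rng(0)
T, d, M = 5, 16, 4                      # sequence length, model dim, branches
dk = d // M
x = rng.normal(size=(T, d))

# per-branch projections, as in multi-head attention
Wq = rng.normal(size=(M, d, dk)); Wk = rng.normal(size=(M, d, dk))
Wv = rng.normal(size=(M, d, dk)); Wo = rng.normal(size=(M, dk, d))

# learned branch weights, normalized to sum to one (here via softmax
# over free parameters; the paper constrains them to a simplex)
kappa = softmax(rng.normal(size=M), axis=0)

out = np.zeros((T, d))
for m in range(M):
    branch = attention(x @ Wq[m], x @ Wk[m], x @ Wv[m]) @ Wo[m]
    out += kappa[m] * branch            # weighted sum replaces concatenation
print(out.shape)  # (5, 16)
```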
ConvSCCS: convolutional self-controlled case series model for lagged adverse event detection
Title | ConvSCCS: convolutional self-controlled case series model for lagged adverse event detection |
Authors | Maryan Morel, Emmanuel Bacry, Stéphane Gaïffas, Agathe Guilloux, Fanny Leroy |
Abstract | With the increased availability of large databases of electronic health records (EHRs) comes the chance of enhancing health risk screening. Most post-marketing detection of adverse drug reactions (ADRs) relies on physicians’ spontaneous reports, leading to under-reporting. To take up this challenge, we develop a scalable model to estimate the effect of multiple longitudinal features (drug exposures) on a rare longitudinal outcome. Our procedure is based on a conditional Poisson model, also known as the self-controlled case series (SCCS). We model the intensity of outcomes using a convolution between exposures and step functions, which are penalized using a combination of group-Lasso and total-variation penalties. This approach does not require the specification of precise risk periods and allows several exposures to be studied jointly in the same model. We illustrate that this approach improves the state of the art for the estimation of relative risks, both on simulations and on a cohort of diabetic patients extracted from the large French national health insurance database (SNIIRAM), a SQL database built around medical reimbursements of more than 65 million people. This work has been done in the context of a research partnership between Ecole Polytechnique and CNAMTS (in charge of SNIIRAM). |
Tasks | |
Published | 2017-12-21 |
URL | http://arxiv.org/abs/1712.08243v2 |
PDF | http://arxiv.org/pdf/1712.08243v2.pdf |
PWC | https://paperswithcode.com/paper/convsccs-convolutional-self-controlled-case |
Repo | https://github.com/MaryanMorel/ConvSCCS |
Framework | none |
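The modeling core is a discrete convolution: each drug's 0/1 exposure series is convolved with a drug-specific step function of lagged effects, and the results enter a log-linear Poisson intensity; a total-variation term then favors piecewise-constant risk profiles. A NumPy sketch with illustrative series (the group-Lasso part and the conditional likelihood are omitted):

```python
import numpy as np

def intensity(exposures, step_coeffs):
    """Log-linear Poisson intensity from lagged exposure effects: each
    drug's 0/1 exposure series is convolved with its step function."""
    T = exposures.shape[1]
    log_lam = np.zeros(T)
    for expo, theta in zip(exposures, step_coeffs):
        log_lam += np.convolve(expo, theta)[:T]   # lagged risk after exposure
    return np.exp(log_lam)

def tv_penalty(step_coeffs):
    """Total-variation term: favors piecewise-constant risk profiles."""
    return sum(np.abs(np.diff(theta)).sum() for theta in step_coeffs)

exposures = np.array([[0, 1, 1, 0, 0, 0, 0, 0],     # drug A
                      [0, 0, 0, 0, 1, 0, 0, 0]])    # drug B
coeffs = [np.array([0.8, 0.8, 0.2]),                # risk persists ~3 periods
          np.array([0.1, 0.1, 0.1])]
print(intensity(exposures, coeffs).round(2), tv_penalty(coeffs))
```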
Learning optimal wavelet bases using a neural network approach
Title | Learning optimal wavelet bases using a neural network approach |
Authors | Andreas Søgaard |
Abstract | A novel method for learning optimal, orthonormal wavelet bases for representing 1- and 2D signals, based on parallels between the wavelet transform and fully connected artificial neural networks, is described. The structural similarities between these two concepts are reviewed and combined into a “wavenet”, allowing for the direct learning of optimal wavelet filter coefficients through stochastic gradient descent with back-propagation over ensembles of training inputs, where conditions on the filter coefficients for constituting orthonormal wavelet bases are cast as quadratic regularisation terms. We describe the practical implementation of this method, and study its performance for high-energy physics collision events for QCD $2 \to 2$ processes. It is shown that an optimal solution is found, even in a high-dimensional search space, and the implications of the result are discussed. |
Tasks | |
Published | 2017-03-25 |
URL | http://arxiv.org/abs/1706.03041v2 |
PDF | http://arxiv.org/pdf/1706.03041v2.pdf |
PWC | https://paperswithcode.com/paper/learning-optimal-wavelet-bases-using-a-neural |
Repo | https://github.com/asogaard/Wavenet |
Framework | none |
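The quadratic regularisation terms mentioned here are the classical orthonormality conditions on a low-pass filter: the taps must sum to √2, and the filter must be orthogonal to its even shifts. A NumPy sketch of such a penalty, checked against the Haar and Daubechies-4 filters (the exact penalty weighting in the paper may differ):

```python
import numpy as np

def orthonormality_penalty(a):
    """Quadratic penalty that is zero iff the filter taps satisfy the
    standard conditions for an orthonormal wavelet basis:
       sum_k a_k = sqrt(2),   sum_k a_k a_{k+2m} = delta_{m,0}."""
    n = len(a)
    pen = (a.sum() - np.sqrt(2.0)) ** 2
    for m in range(n // 2):
        shifted = np.dot(a[: n - 2 * m], a[2 * m:])
        pen += (shifted - (1.0 if m == 0 else 0.0)) ** 2
    return pen

haar = np.array([1.0, 1.0]) / np.sqrt(2.0)            # Haar low-pass filter
db2 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
                3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
print(orthonormality_penalty(haar), orthonormality_penalty(db2))  # ~0, ~0
random_taps = np.random.randn(4)
print(orthonormality_penalty(random_taps) > 0)         # True (generically)
```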
Merge or Not? Learning to Group Faces via Imitation Learning
Title | Merge or Not? Learning to Group Faces via Imitation Learning |
Authors | Yue He, Kaidi Cao, Cheng Li, Chen Change Loy |
Abstract | Given a large number of unlabeled face images, face grouping aims at clustering the images into the individual identities present in the data. This task remains a challenging problem despite the remarkable capability of deep learning approaches in learning face representations. In particular, grouping results can still be egregious given profile faces and a large number of uninteresting faces and noisy detections. Often, a user needs to correct the erroneous grouping manually. In this study, we formulate a novel face grouping framework that learns a clustering strategy from ground-truth simulated behavior. This is achieved through imitation learning (a.k.a. apprenticeship learning or learning by watching) via inverse reinforcement learning (IRL). In contrast to existing clustering approaches that group instances by similarity, our framework makes sequential decisions, dynamically deciding when to merge two face instances/groups, driven by short- and long-term rewards. Extensive experiments on three benchmark datasets show that our framework outperforms unsupervised and supervised baselines. |
Tasks | Imitation Learning |
Published | 2017-07-13 |
URL | http://arxiv.org/abs/1707.03986v1 |
PDF | http://arxiv.org/pdf/1707.03986v1.pdf |
PWC | https://paperswithcode.com/paper/merge-or-not-learning-to-group-faces-via |
Repo | https://github.com/bj80heyue/Learning-to-Group |
Framework | none |
RACE: Large-scale ReAding Comprehension Dataset From Examinations
Title | RACE: Large-scale ReAding Comprehension Dataset From Examinations |
Authors | Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy |
Abstract | We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from English exams for Chinese middle and high school students aged 12 to 18, RACE consists of nearly 28,000 passages and nearly 100,000 questions generated by human experts (English instructors), and covers a variety of topics carefully designed to evaluate the students’ ability in understanding and reasoning. In particular, the proportion of questions that require reasoning is much larger in RACE than in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of state-of-the-art models (43%) and ceiling human performance (95%). We hope this new dataset can serve as a valuable resource for research and evaluation in machine comprehension. The dataset is freely available at http://www.cs.cmu.edu/~glai1/data/race/ and the code is available at https://github.com/qizhex/RACE_AR_baselines. |
Tasks | Reading Comprehension |
Published | 2017-04-15 |
URL | http://arxiv.org/abs/1704.04683v5 |
PDF | http://arxiv.org/pdf/1704.04683v5.pdf |
PWC | https://paperswithcode.com/paper/race-large-scale-reading-comprehension |
Repo | https://github.com/artiom-zayats/docqa_squad |
Framework | none |
Dynamic Bernoulli Embeddings for Language Evolution
Title | Dynamic Bernoulli Embeddings for Language Evolution |
Authors | Maja Rudolph, David Blei |
Abstract | Word embeddings are a powerful approach for unsupervised analysis of language. Recently, Rudolph et al. (2016) developed exponential family embeddings, which cast word embeddings in a probabilistic framework. Here, we develop dynamic embeddings, building on exponential family embeddings to capture how the meanings of words change over time. We use dynamic embeddings to analyze three large collections of historical texts: the U.S. Senate speeches from 1858 to 2009, the history of computer science ACM abstracts from 1951 to 2014, and machine learning papers on the arXiv from 2007 to 2015. We find dynamic embeddings provide better fits than classical embeddings and capture interesting patterns about how language changes. |
Tasks | Word Embeddings |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08052v1 |
PDF | http://arxiv.org/pdf/1703.08052v1.pdf |
PWC | https://paperswithcode.com/paper/dynamic-bernoulli-embeddings-for-language |
Repo | https://github.com/mariru/dynamic_bernoulli_embeddings |
Framework | tf |
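The model has two parts that compose cleanly: a Gaussian random-walk prior ties each time slice's word embeddings to the previous slice, and a Bernoulli likelihood scores each observed word against the sum of its context vectors. A NumPy sketch of the resulting (unnormalized) log joint, with illustrative shapes and names; actual training uses stochastic gradients with negative sampling.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def log_joint(rho, alpha, positives, sigma_drift=0.1):
    """Unnormalized log joint of a dynamic Bernoulli embedding model.

    rho       : (T, V, d) per-time-slice word embeddings
    alpha     : (V, d) context vectors, shared across time
    positives : list of (time, word, context_words) observations
    """
    # Gaussian random-walk prior ties each time slice to the previous one
    drift = rho[1:] - rho[:-1]
    lp = -0.5 * np.sum(drift ** 2) / sigma_drift ** 2
    # Bernoulli likelihood of each observed word given its context
    for t, w, ctx in positives:
        eta = rho[t, w] @ alpha[ctx].sum(axis=0)
        lp += np.log(sigmoid(eta))
    return lp   # real training also scores negative samples

rng = np.random.default_rng(0)
T, V, d = 3, 10, 4
rho = rng.normal(scale=0.1, size=(T, V, d))
alpha = rng.normal(scale=0.1, size=(V, d))
data = [(0, 2, [1, 3]), (1, 2, [4, 5]), (2, 2, [4, 7])]
print(log_joint(rho, alpha, data))
```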