Paper Group AWR 140
Learning to Estimate 3D Hand Pose from Single RGB Images
Title | Learning to Estimate 3D Hand Pose from Single RGB Images |
Authors | Christian Zimmermann, Thomas Brox |
Abstract | Low-cost consumer depth cameras and deep learning have enabled reasonable 3D hand pose estimation from single depth images. In this paper, we present an approach that estimates 3D hand pose from regular RGB images. This task has far more ambiguities due to the missing depth information. To this end, we propose a deep network that learns a network-implicit 3D articulation prior. Together with detected keypoints in the images, this network yields good estimates of the 3D pose. We introduce a large-scale 3D hand pose dataset based on synthetic hand models for training the involved networks. Experiments on a variety of test sets, including one on sign language recognition, demonstrate the feasibility of 3D hand pose estimation on single color images. |
Tasks | Hand Pose Estimation, Pose Estimation, Sign Language Recognition |
Published | 2017-05-03 |
URL | http://arxiv.org/abs/1705.01389v3 |
PDF | http://arxiv.org/pdf/1705.01389v3.pdf |
PWC | https://paperswithcode.com/paper/learning-to-estimate-3d-hand-pose-from-single |
Repo | https://github.com/theerapatkitti/hand_mask_rcnn |
Framework | tf |
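The lifting network predicts coordinates in a canonical frame that is invariant to where the hand sits in space and how large it is. Below is a minimal NumPy sketch of such a normalization; the function name, root index, and reference-bone choice are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

def normalize_keypoints(xyz, root=0, ref_bone=(0, 1)):
    """Make 3D hand keypoints translation- and scale-invariant.

    xyz      : (21, 3) array of keypoint coordinates.
    root     : index of the keypoint used as the origin (e.g., the palm).
    ref_bone : pair of keypoint indices whose distance defines unit scale.
    """
    rel = xyz - xyz[root]                         # translation invariance
    scale = np.linalg.norm(xyz[ref_bone[0]] - xyz[ref_bone[1]])
    return rel / scale                            # scale invariance

# toy usage: a random "hand" with 21 keypoints
kp = np.random.randn(21, 3)
canon = normalize_keypoints(kp)
print(canon.shape)  # (21, 3), with the root keypoint at the origin
```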
Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search
Title | Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search |
Authors | Rémi Pautrat, Konstantinos Chatzilygeroudis, Jean-Baptiste Mouret |
Abstract | One of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot. In this paper, we are interested in situations for which several priors exist but we do not know in advance which one best fits the current situation. We tackle this problem by introducing a novel acquisition function, called Most Likely Expected Improvement (MLEI), that combines the likelihood of the priors and the expected improvement. We evaluate this new acquisition function on a transfer learning task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has to learn to walk on flat ground and on stairs, with priors corresponding to different stairs and different kinds of damages. Our results show that MLEI effectively identifies and exploits the priors, even when there is no obvious match between the current situation and the priors. |
Tasks | Legged Robots, Transfer Learning |
Published | 2017-09-20 |
URL | http://arxiv.org/abs/1709.06919v2 |
PDF | http://arxiv.org/pdf/1709.06919v2.pdf |
PWC | https://paperswithcode.com/paper/bayesian-optimization-with-automatic-prior |
Repo | https://github.com/resibots/pautrat_2018_mlei |
Framework | none |
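To make the acquisition-function idea concrete, here is a toy NumPy/SciPy sketch in the spirit of MLEI: each candidate prior serves as a GP mean, the GP marginal likelihood measures how well that prior explains the observations gathered so far, and expected improvement is weighted by that likelihood. The exact weighting and all names here are assumptions for illustration; the paper defines the precise MLEI formula.

```python
import numpy as np
from scipy.stats import norm

def rbf(a, b, ls=1.0):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_fit(x_obs, y_obs, x_query, mean_fn, noise=1e-3):
    """GP posterior under a given prior mean, plus the log marginal
    likelihood (up to a constant) of the observations under that prior."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    L = np.linalg.cholesky(K)
    r = y_obs - mean_fn(x_obs)                    # residual w.r.t. prior mean
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    ks = rbf(x_obs, x_query)
    mu = mean_fn(x_query) + ks.T @ alpha
    v = np.linalg.solve(L, ks)
    sd = np.sqrt(np.maximum(1.0 - np.sum(v * v, axis=0), 1e-12))
    loglik = -0.5 * r @ alpha - np.log(np.diag(L)).sum()
    return mu, sd, loglik

def expected_improvement(mu, sd, y_best):
    z = (mu - y_best) / sd
    return (mu - y_best) * norm.cdf(z) + sd * norm.pdf(z)

priors = [np.sin, lambda x: -np.sin(x)]          # two candidate prior means
x_obs = np.array([0.5, 1.5, 2.5])
y_obs = np.sin(x_obs) + 0.01                     # data agree with prior 0
x_query = np.linspace(0.0, np.pi, 50)

scores = []
for mean_fn in priors:
    mu, sd, ll = gp_fit(x_obs, y_obs, x_query, mean_fn)
    scores.append(expected_improvement(mu, sd, y_obs.max()) * np.exp(ll))
best = int(np.argmax([s.max() for s in scores]))
print(best, x_query[scores[best].argmax()])      # prior 0 wins here
```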
Pseudo-extended Markov chain Monte Carlo
Title | Pseudo-extended Markov chain Monte Carlo |
Authors | Christopher Nemeth, Fredrik Lindsten, Maurizio Filippone, James Hensman |
Abstract | Sampling from posterior distributions using Markov chain Monte Carlo (MCMC) methods can require an excessive number of iterations, particularly when the posterior is multi-modal, as the MCMC sampler can become trapped in a local mode for a large number of iterations. In this paper, we introduce the pseudo-extended MCMC method as a simple approach for improving the mixing of the MCMC sampler for multi-modal posterior distributions. The pseudo-extended method augments the state-space of the posterior using pseudo-samples as auxiliary variables. On the extended space, the modes of the posterior are connected, which allows the MCMC sampler to easily move between well-separated posterior modes. We demonstrate that the pseudo-extended approach delivers improved MCMC sampling over the Hamiltonian Monte Carlo algorithm on multi-modal posteriors, including Boltzmann machines and models with sparsity-inducing priors. |
Tasks | |
Published | 2017-08-17 |
URL | https://arxiv.org/abs/1708.05239v3 |
PDF | https://arxiv.org/pdf/1708.05239v3.pdf |
PWC | https://paperswithcode.com/paper/pseudo-extended-markov-chain-monte-carlo |
Repo | https://github.com/chris-nemeth/pseudo-extended-mcmc-code |
Framework | none |
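The construction is concrete enough to state in code: with an instrumental distribution q (here a tempered version of the target, q ∝ γ^β), the extended target over N pseudo-samples is proportional to ∏_i q(x_i) · (1/N) Σ_i γ(x_i)/q(x_i). Below is a toy random-walk Metropolis illustration on a bimodal 1D density; the paper uses Hamiltonian Monte Carlo, and β, N = 2, and the step size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_gamma(x):
    """Unnormalized log density of a well-separated bimodal target."""
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

beta = 0.3   # instrumental distribution: q = gamma^beta (tempered target)

def log_extended(xs):
    """Pseudo-extended target over N pseudo-samples:
    prod_i q(x_i) * (1/N) sum_i gamma(x_i) / q(x_i)."""
    lg = log_gamma(xs)
    lq = beta * lg
    return lq.sum() + np.logaddexp.reduce(lg - lq) - np.log(len(xs))

N, steps = 2, 20000
xs = rng.normal(size=N)
lp = log_extended(xs)
chain = []
for _ in range(steps):
    prop = xs + rng.normal(size=N)               # random-walk proposal
    lp_prop = log_extended(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        xs, lp = prop, lp_prop
    chain.append(xs.copy())

draws = np.concatenate(chain[2000:])
# both modes are visited; expectations under the original target would
# additionally weight each pseudo-sample by gamma(x_i) / q(x_i)
print(np.mean(draws < 0).round(2), np.mean(draws > 0).round(2))
```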
Do Deep Neural Networks Suffer from Crowding?
Title | Do Deep Neural Networks Suffer from Crowding? |
Authors | Anna Volokitin, Gemma Roig, Tomaso Poggio |
Abstract | Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, called flankers, are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks for object recognition. We analyze both standard deep convolutional neural networks (DCNNs) and a new variant of DCNNs that is 1) multi-scale and 2) has convolution filters whose size changes with eccentricity, i.e., with distance from the center of fixation. Such networks, which we call eccentricity-dependent, are a computational model of the feedforward path of the primate visual cortex. Our results reveal that the eccentricity-dependent model, trained on target objects in isolation, can recognize such targets in the presence of flankers, if the targets are near the center of the image, whereas DCNNs cannot. Also, for all tested networks, when trained on targets in isolation, we find that recognition accuracy of the networks decreases the closer the flankers are to the target and the more flankers there are. We find that visual similarity between the target and flankers also plays a role and that pooling in early layers of the network leads to more crowding. Additionally, we show that incorporating the flankers into the images of the training set does not improve performance with crowding. |
Tasks | Object Recognition |
Published | 2017-06-26 |
URL | http://arxiv.org/abs/1706.08616v1 |
PDF | http://arxiv.org/pdf/1706.08616v1.pdf |
PWC | https://paperswithcode.com/paper/do-deep-neural-networks-suffer-from-crowding |
Repo | https://github.com/voanna/eccentricity |
Framework | tf |
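The eccentricity-dependent idea can be mimicked without varying filter sizes: keep the filters fixed and feed the network center crops of growing extent, all resampled to one resolution, so the fovea is seen sharply and the periphery coarsely. A minimal NumPy sketch of that input construction, with illustrative sizes and scale factors (not the paper's exact architecture):

```python
import numpy as np

def average_pool(img, k):
    """Downsample a 2D image by an integer factor k via average pooling."""
    h, w = img.shape
    return img[: h - h % k, : w - w % k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def foveated_scales(img, out=32, n_scales=3):
    """Center crops of growing extent, all resampled to out x out pixels:
    the fovea keeps full resolution, the periphery is seen coarsely.
    The image must be at least out * 2**(n_scales - 1) pixels per side."""
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    crops = []
    for s in range(n_scales):
        half = (out * 2 ** s) // 2
        crop = img[cy - half: cy + half, cx - half: cx + half]
        crops.append(average_pool(crop, 2 ** s))
    return np.stack(crops)          # shape: (n_scales, out, out)

img = np.random.rand(128, 128)
print(foveated_scales(img).shape)   # (3, 32, 32)
```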
SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning
Title | SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning |
Authors | Xiaojun Xu, Chang Liu, Dawn Song |
Abstract | Synthesizing SQL queries from natural language is a long-standing open problem and has been attracting considerable interest recently. Toward solving the problem, the de facto approach is to employ a sequence-to-sequence-style model. Such an approach necessarily requires the SQL queries to be serialized. Since the same SQL query may have multiple equivalent serializations, training a sequence-to-sequence-style model is sensitive to which serialization is chosen. This phenomenon is documented as the “order-matters” problem. Existing state-of-the-art approaches rely on reinforcement learning to reward the decoder when it generates any of the equivalent serializations. However, we observe that the improvement from reinforcement learning is limited. In this paper, we propose a novel approach, i.e., SQLNet, to fundamentally solve this problem by avoiding the sequence-to-sequence structure when the order does not matter. In particular, we employ a sketch-based approach where the sketch contains a dependency graph so that one prediction can be done by taking into consideration only the previous predictions that it depends on. In addition, we propose a sequence-to-set model as well as the column attention mechanism to synthesize the query based on the sketch. By combining all these novel techniques, we show that SQLNet can outperform the prior art by 9% to 13% on the WikiSQL task. |
Tasks | Text-To-Sql |
Published | 2017-11-13 |
URL | http://arxiv.org/abs/1711.04436v1 |
PDF | http://arxiv.org/pdf/1711.04436v1.pdf |
PWC | https://paperswithcode.com/paper/sqlnet-generating-structured-queries-from |
Repo | https://github.com/Baidi96/text2sql |
Framework | pytorch |
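Two of the ingredients are simple to write down: column attention weights the question tokens by their relevance to a given column name, and the sequence-to-set prediction scores each column independently with a sigmoid so that no ordering over columns is ever imposed. A NumPy sketch with illustrative shapes and parameter names (not the paper's exact parameterization):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

d, L = 8, 6                              # embedding dim, question length
rng = np.random.default_rng(0)
H_q = rng.normal(size=(L, d))            # question token encodings
e_col = rng.normal(size=d)               # one column-name encoding
W_att = rng.normal(size=(d, d))          # attention bilinear form
u_out = rng.normal(size=2 * d)           # scoring vector

# column attention: weight question tokens by relevance to this column
att = softmax(H_q @ W_att @ e_col)       # (L,)
ctx = att @ H_q                          # column-conditioned question summary

# sequence-to-set: an independent "is this column in the WHERE clause?"
# probability per column, so equivalent serializations never matter
p_col = sigmoid(u_out @ np.concatenate([ctx, e_col]))
print(att.round(2), float(p_col))
```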
Fine-tuning CNN Image Retrieval with No Human Annotation
Title | Fine-tuning CNN Image Retrieval with No Human Annotation |
Authors | Filip Radenović, Giorgos Tolias, Ondřej Chum |
Abstract | Image descriptors based on activations of Convolutional Neural Networks (CNNs) have become dominant in image retrieval due to their discriminative power, compactness of representation, and search efficiency. Training CNNs, whether from scratch or by fine-tuning, requires a large amount of annotated data, where a high quality of annotation is often crucial. In this work, we propose to fine-tune CNNs for image retrieval on a large collection of unordered images in a fully automated manner. Reconstructed 3D models obtained by the state-of-the-art retrieval and structure-from-motion methods guide the selection of the training data. We show that both hard-positive and hard-negative examples, selected by exploiting the geometry and the camera positions available from the 3D models, enhance the performance of particular-object retrieval. A CNN descriptor whitening discriminatively learned from the same training data outperforms the commonly used PCA whitening. We propose a novel trainable Generalized-Mean (GeM) pooling layer that generalizes max and average pooling and show that it boosts retrieval performance. Applying the proposed method to the VGG network achieves state-of-the-art performance on the standard benchmarks: the Oxford Buildings, Paris, and Holidays datasets. |
Tasks | Image Retrieval |
Published | 2017-11-03 |
URL | http://arxiv.org/abs/1711.02512v2 |
PDF | http://arxiv.org/pdf/1711.02512v2.pdf |
PWC | https://paperswithcode.com/paper/fine-tuning-cnn-image-retrieval-with-no-human |
Repo | https://github.com/jandaldrop/landmark-recognition-challenge |
Framework | tf |
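The GeM layer itself is a one-liner: raise the activations to a power p, average spatially, and take the p-th root, so that p = 1 gives average pooling and large p approaches max pooling (in the paper, p is learned by back-propagation). A NumPy sketch:

```python
import numpy as np

def gem_pool(x, p=3.0, eps=1e-6):
    """Generalized-mean pooling of a CNN feature map over its spatial grid.

    x : (C, H, W) non-negative activations (e.g., after ReLU).
    p = 1 recovers average pooling; large p approaches max pooling.
    In the paper, p is a trainable parameter learned by back-propagation.
    """
    x = np.clip(x, eps, None)
    return np.mean(x ** p, axis=(1, 2)) ** (1.0 / p)

feat = np.random.rand(4, 7, 7)
print(np.allclose(gem_pool(feat, p=1.0), feat.mean(axis=(1, 2))))                # True
print(np.allclose(gem_pool(feat, p=1000.0), feat.max(axis=(1, 2)), atol=1e-2))  # ~True
```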
Improving OCR Accuracy on Early Printed Books by utilizing Cross Fold Training and Voting
Title | Improving OCR Accuracy on Early Printed Books by utilizing Cross Fold Training and Voting |
Authors | Christian Reul, Uwe Springmann, Christoph Wick, Frank Puppe |
Abstract | In this paper we introduce a method that significantly reduces the character error rates for OCR text obtained from OCRopus models trained on early printed books. The method uses a combination of cross fold training and confidence-based voting. After allocating the available ground truth to different subsets, several training processes are performed, each resulting in a specific OCR model. The outputs of these models are then combined by voting to determine the final text, taking the recognized characters, their alternatives, and the confidence values assigned to each character into consideration. Experiments on seven early printed books show that the proposed method considerably outperforms the standard approach, reducing the number of errors by 50% and more. |
Tasks | Optical Character Recognition |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09670v1 |
PDF | http://arxiv.org/pdf/1711.09670v1.pdf |
PWC | https://paperswithcode.com/paper/improving-ocr-accuracy-on-early-printed-books-2 |
Repo | https://github.com/chreul/mptv |
Framework | tf |
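The voting step can be sketched compactly: for every text position, sum the confidence of each candidate character across the fold-trained models and keep the best-scoring character. The toy below assumes the model outputs are already aligned position by position, which the real method has to establish first; all names are illustrative.

```python
from collections import defaultdict

def vote_characters(candidates):
    """Confidence-weighted voting over aligned OCR outputs.

    candidates : one list per model; each list holds, for every text
    position, that model's (character, confidence) alternatives.
    Returns the string obtained by summing confidences per character.
    """
    out = []
    for position in zip(*candidates):          # same position, all models
        tally = defaultdict(float)
        for alternatives in position:
            for ch, conf in alternatives:
                tally[ch] += conf
        out.append(max(tally, key=tally.get))
    return "".join(out)

model_a = [[("h", 0.9)], [("a", 0.5), ("o", 0.4)], [("t", 0.8)]]
model_b = [[("h", 0.8)], [("o", 0.7)], [("t", 0.9)]]
model_c = [[("b", 0.6), ("h", 0.3)], [("o", 0.6)], [("l", 0.5), ("t", 0.4)]]
print(vote_characters([model_a, model_b, model_c]))  # "hot"
```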
NIMA: Neural Image Assessment
Title | NIMA: Neural Image Assessment |
Authors | Hossein Talebi, Peyman Milanfar |
Abstract | Automatically learned quality assessment for images has recently become a hot topic due to its usefulness in a wide variety of applications, such as evaluating image capture pipelines, storage techniques, and sharing media. Despite the subjective nature of this problem, most existing methods only predict the mean opinion score provided by datasets such as AVA [1] and TID2013 [2]. Our approach differs from others in that we predict the distribution of human opinion scores using a convolutional neural network. Our architecture also has the advantage of being significantly simpler than other methods with comparable performance. Our proposed approach relies on the success (and retraining) of proven, state-of-the-art deep object recognition networks. Our resulting network can be used not only to score images reliably and with high correlation to human perception, but also to assist with adaptation and optimization of photo editing/enhancement algorithms in a photographic pipeline. All this is done without the need for a “golden” reference image, consequently allowing for single-image, semantic- and perceptually-aware, no-reference quality assessment. |
Tasks | Aesthetics Quality Assessment, Image Quality Assessment |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05424v2 |
PDF | http://arxiv.org/pdf/1709.05424v2.pdf |
PWC | https://paperswithcode.com/paper/nima-neural-image-assessment |
Repo | https://github.com/titu1994/neural-image-assessment |
Framework | tf |
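Both the training loss and the final score follow directly from predicting a distribution over the ratings 1-10: training minimizes the (squared, r = 2) earth mover's distance between the predicted and ground-truth distributions, and the quality score is the mean of the predicted distribution. A NumPy sketch with made-up distributions:

```python
import numpy as np

def emd_loss(p, q, r=2):
    """Earth mover's distance between two distributions over the
    ordered ratings 1..10, computed from the CDF difference."""
    cdf_diff = np.cumsum(p) - np.cumsum(q)
    return np.mean(np.abs(cdf_diff) ** r) ** (1.0 / r)

def mean_and_std(p):
    """Collapse a predicted rating distribution to a scalar quality score."""
    s = np.arange(1, len(p) + 1)
    mean = np.sum(s * p)
    std = np.sqrt(np.sum(((s - mean) ** 2) * p))
    return mean, std

pred = np.array([.02, .03, .05, .10, .20, .25, .18, .10, .05, .02])
gt   = np.array([.01, .02, .04, .08, .18, .27, .20, .12, .06, .02])
print(emd_loss(pred, gt), mean_and_std(pred))
```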
Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler
Title | Polya Urn Latent Dirichlet Allocation: a doubly sparse massively parallel sampler |
Authors | Alexander Terenin, Måns Magnusson, Leif Jonsson, David Draper |
Abstract | Latent Dirichlet Allocation (LDA) is a topic model widely used in natural language processing and machine learning. Most approaches to training the model rely on iterative algorithms, which makes it difficult to run LDA on big corpora that are best analyzed in parallel and distributed computational environments. Indeed, current approaches to parallel inference either don’t converge to the correct posterior or require storage of large dense matrices in memory. We present a novel sampler that overcomes both problems, and we show that this sampler is faster, both empirically and theoretically, than previous Gibbs samplers for LDA. We do so by employing a novel Pólya-urn-based approximation in the sparse partially collapsed sampler for LDA. We prove that the approximation error vanishes with data size, making our algorithm asymptotically exact, a property of importance for large-scale topic models. In addition, we show, via an explicit example, that – contrary to popular belief in the topic modeling literature – partially collapsed samplers can be more efficient than fully collapsed samplers. We conclude by comparing the performance of our algorithm with that of other approaches on well-known corpora. |
Tasks | Topic Models |
Published | 2017-04-12 |
URL | http://arxiv.org/abs/1704.03581v6 |
PDF | http://arxiv.org/pdf/1704.03581v6.pdf |
PWC | https://paperswithcode.com/paper/polya-urn-latent-dirichlet-allocation-a |
Repo | https://github.com/lejon/PartiallyCollapsedLDA |
Framework | none |
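For context, the fully collapsed Gibbs sampler that this line of work builds on resamples each token's topic indicator with probability proportional to (n_dk + α)(n_kw + β)/(n_k + Vβ); its strictly sequential inner loop is exactly what makes naive parallelization problematic. A minimal NumPy sketch of that baseline (not the paper's Pólya-urn sampler):

```python
import numpy as np

def gibbs_sweep(z, docs, n_dk, n_kw, n_k, alpha, beta, rng):
    """One sweep of fully collapsed Gibbs sampling for LDA."""
    K, V = n_kw.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                       # remove current assignment
            n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
            p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
            k = rng.choice(K, p=p / p.sum())  # resample topic indicator
            z[d][i] = k
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [3, 4, 3], [0, 4, 2]]   # tokens as word ids, V = 5
K, V = 2, 5
z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
n_dk = np.zeros((len(docs), K)); n_kw = np.zeros((K, V)); n_k = np.zeros(K)
for d, doc in enumerate(docs):                # initialize count matrices
    for i, w in enumerate(doc):
        n_dk[d, z[d][i]] += 1; n_kw[z[d][i], w] += 1; n_k[z[d][i]] += 1
for _ in range(50):
    gibbs_sweep(z, docs, n_dk, n_kw, n_k, alpha=0.5, beta=0.1, rng=rng)
print(n_kw)                                   # per-topic word counts
```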
Weighted Transformer Network for Machine Translation
Title | Weighted Transformer Network for Machine Translation |
Authors | Karim Ahmed, Nitish Shirish Keskar, Richard Socher |
Abstract | State-of-the-art results on neural machine translation often use attentional sequence-to-sequence models with some form of convolution or recursion. Vaswani et al. (2017) propose a new architecture that avoids recurrence and convolution completely. Instead, it uses only self-attention and feed-forward layers. While the proposed architecture achieves state-of-the-art results on several machine translation tasks, it requires a large number of parameters and training iterations to converge. We propose Weighted Transformer, a Transformer with modified attention layers, which not only outperforms the baseline network in BLEU score but also converges 15-40% faster. Specifically, we replace the multi-head attention by multiple self-attention branches that the model learns to combine during the training process. Our model improves the state-of-the-art performance by 0.5 BLEU points on the WMT 2014 English-to-German translation task and by 0.4 on the English-to-French translation task. |
Tasks | Machine Translation |
Published | 2017-11-06 |
URL | http://arxiv.org/abs/1711.02132v1 |
PDF | http://arxiv.org/pdf/1711.02132v1.pdf |
PWC | https://paperswithcode.com/paper/weighted-transformer-network-for-machine |
Repo | https://github.com/duyvuleo/Transformer-DyNet |
Framework | tf |
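The branch mechanics can be sketched in a few lines: each branch is an attention head with its own projections, and the branch outputs are combined through learned non-negative weights that sum to one, instead of plain concatenation. The sketch below is simplified (the paper learns two sets of simplex-constrained weights, for the attention and feed-forward contributions); all shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

rng = np.random.default_rng(0)
T, d, M = 5, 16, 4                      # sequence length, model dim, branches
dk = d // M
x = rng.normal(size=(T, d))

# per-branch projections, as in multi-head attention
Wq = rng.normal(size=(M, d, dk)); Wk = rng.normal(size=(M, d, dk))
Wv = rng.normal(size=(M, d, dk)); Wo = rng.normal(size=(M, dk, d))

# learned branch weights, normalized to sum to one (here via softmax
# over free parameters; the paper constrains them to a simplex)
kappa = softmax(rng.normal(size=M), axis=0)

out = np.zeros((T, d))
for m in range(M):
    branch = attention(x @ Wq[m], x @ Wk[m], x @ Wv[m]) @ Wo[m]
    out += kappa[m] * branch            # weighted sum replaces concatenation
print(out.shape)  # (5, 16)
```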
ConvSCCS: convolutional self-controlled case series model for lagged adverse event detection
Title | ConvSCCS: convolutional self-controlled case series model for lagged adverse event detection |
Authors | Maryan Morel, Emmanuel Bacry, Stéphane Gaïffas, Agathe Guilloux, Fanny Leroy |
Abstract | With the increased availability of large databases of electronic health records (EHRs) comes the chance of enhancing health risk screening. Most post-marketing detection of adverse drug reactions (ADRs) relies on physicians’ spontaneous reports, leading to under-reporting. To take up this challenge, we develop a scalable model to estimate the effect of multiple longitudinal features (drug exposures) on a rare longitudinal outcome. Our procedure is based on a conditional Poisson model, also known as the self-controlled case series (SCCS). We model the intensity of outcomes using a convolution between exposures and step functions, which are penalized using a combination of group-Lasso and total-variation penalties. This approach does not require the specification of precise risk periods and allows several exposures to be studied jointly in the same model. We illustrate that this approach improves the state of the art for the estimation of relative risks, both on simulations and on a cohort of diabetic patients extracted from the large French national health insurance database (SNIIRAM), a SQL database built around medical reimbursements of more than 65 million people. This work has been done in the context of a research partnership between Ecole Polytechnique and CNAMTS (in charge of SNIIRAM). |
Tasks | |
Published | 2017-12-21 |
URL | http://arxiv.org/abs/1712.08243v2 |
PDF | http://arxiv.org/pdf/1712.08243v2.pdf |
PWC | https://paperswithcode.com/paper/convsccs-convolutional-self-controlled-case |
Repo | https://github.com/MaryanMorel/ConvSCCS |
Framework | none |
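The modeling core is a discrete convolution: each drug's 0/1 exposure series is convolved with a drug-specific step function of lagged effects, and the results enter a log-linear Poisson intensity; a total-variation term then favors piecewise-constant risk profiles. A NumPy sketch with illustrative series (the group-Lasso part and the conditional likelihood are omitted):

```python
import numpy as np

def intensity(exposures, step_coeffs):
    """Log-linear Poisson intensity from lagged exposure effects: each
    drug's 0/1 exposure series is convolved with its step function."""
    T = exposures.shape[1]
    log_lam = np.zeros(T)
    for expo, theta in zip(exposures, step_coeffs):
        log_lam += np.convolve(expo, theta)[:T]   # lagged risk after exposure
    return np.exp(log_lam)

def tv_penalty(step_coeffs):
    """Total-variation term: favors piecewise-constant risk profiles."""
    return sum(np.abs(np.diff(theta)).sum() for theta in step_coeffs)

exposures = np.array([[0, 1, 1, 0, 0, 0, 0, 0],     # drug A
                      [0, 0, 0, 0, 1, 0, 0, 0]])    # drug B
coeffs = [np.array([0.8, 0.8, 0.2]),                # risk persists ~3 periods
          np.array([0.1, 0.1, 0.1])]
print(intensity(exposures, coeffs).round(2), tv_penalty(coeffs))
```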
Learning optimal wavelet bases using a neural network approach
Title | Learning optimal wavelet bases using a neural network approach |
Authors | Andreas Søgaard |
Abstract | A novel method for learning optimal, orthonormal wavelet bases for representing 1- and 2D signals, based on parallels between the wavelet transform and fully connected artificial neural networks, is described. The structural similarities between these two concepts are reviewed and combined into a “wavenet”, allowing for the direct learning of optimal wavelet filter coefficients through stochastic gradient descent with back-propagation over ensembles of training inputs, where conditions on the filter coefficients for constituting orthonormal wavelet bases are cast as quadratic regularisation terms. We describe the practical implementation of this method, and study its performance for high-energy physics collision events for QCD $2 \to 2$ processes. It is shown that an optimal solution is found, even in a high-dimensional search space, and the implications of the result are discussed. |
Tasks | |
Published | 2017-03-25 |
URL | http://arxiv.org/abs/1706.03041v2 |
PDF | http://arxiv.org/pdf/1706.03041v2.pdf |
PWC | https://paperswithcode.com/paper/learning-optimal-wavelet-bases-using-a-neural |
Repo | https://github.com/asogaard/Wavenet |
Framework | none |
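The quadratic regularisation terms mentioned here are the classical orthonormality conditions on a low-pass filter: the taps must sum to √2, and the filter must be orthogonal to its even shifts. A NumPy sketch of such a penalty, checked against the Haar and Daubechies-4 filters (the exact penalty weighting in the paper may differ):

```python
import numpy as np

def orthonormality_penalty(a):
    """Quadratic penalty that is zero iff the filter taps satisfy the
    standard conditions for an orthonormal wavelet basis:
       sum_k a_k = sqrt(2),   sum_k a_k a_{k+2m} = delta_{m,0}."""
    n = len(a)
    pen = (a.sum() - np.sqrt(2.0)) ** 2
    for m in range(n // 2):
        shifted = np.dot(a[: n - 2 * m], a[2 * m:])
        pen += (shifted - (1.0 if m == 0 else 0.0)) ** 2
    return pen

haar = np.array([1.0, 1.0]) / np.sqrt(2.0)            # Haar low-pass filter
db2 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
                3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
print(orthonormality_penalty(haar), orthonormality_penalty(db2))  # ~0, ~0
random_taps = np.random.randn(4)
print(orthonormality_penalty(random_taps) > 0)         # True (generically)
```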
Merge or Not? Learning to Group Faces via Imitation Learning
Title | Merge or Not? Learning to Group Faces via Imitation Learning |
Authors | Yue He, Kaidi Cao, Cheng Li, Chen Change Loy |
Abstract | Given a large number of unlabeled face images, face grouping aims at clustering the images into the individual identities present in the data. This task remains a challenging problem despite the remarkable capability of deep learning approaches in learning face representations. In particular, grouping results can still be egregious given profile faces and a large number of uninteresting faces and noisy detections. Often, a user needs to correct the erroneous grouping manually. In this study, we formulate a novel face grouping framework that learns a clustering strategy from ground-truth simulated behavior. This is achieved through imitation learning (a.k.a. apprenticeship learning or learning by watching) via inverse reinforcement learning (IRL). In contrast to existing clustering approaches that group instances by similarity, our framework makes sequential decisions, dynamically deciding when to merge two face instances/groups, driven by short- and long-term rewards. Extensive experiments on three benchmark datasets show that our framework outperforms unsupervised and supervised baselines. |
Tasks | Imitation Learning |
Published | 2017-07-13 |
URL | http://arxiv.org/abs/1707.03986v1 |
PDF | http://arxiv.org/pdf/1707.03986v1.pdf |
PWC | https://paperswithcode.com/paper/merge-or-not-learning-to-group-faces-via |
Repo | https://github.com/bj80heyue/Learning-to-Group |
Framework | none |
RACE: Large-scale ReAding Comprehension Dataset From Examinations
Title | RACE: Large-scale ReAding Comprehension Dataset From Examinations |
Authors | Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard Hovy |
Abstract | We present RACE, a new dataset for benchmark evaluation of methods in the reading comprehension task. Collected from English exams for Chinese middle and high school students aged 12 to 18, RACE consists of nearly 28,000 passages and nearly 100,000 questions generated by human experts (English instructors), and covers a variety of topics carefully designed to evaluate the students’ ability in understanding and reasoning. In particular, the proportion of questions that require reasoning is much larger in RACE than in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of state-of-the-art models (43%) and ceiling human performance (95%). We hope this new dataset can serve as a valuable resource for research and evaluation in machine comprehension. The dataset is freely available at http://www.cs.cmu.edu/~glai1/data/race/ and the code is available at https://github.com/qizhex/RACE_AR_baselines. |
Tasks | Reading Comprehension |
Published | 2017-04-15 |
URL | http://arxiv.org/abs/1704.04683v5 |
PDF | http://arxiv.org/pdf/1704.04683v5.pdf |
PWC | https://paperswithcode.com/paper/race-large-scale-reading-comprehension |
Repo | https://github.com/artiom-zayats/docqa_squad |
Framework | none |
Dynamic Bernoulli Embeddings for Language Evolution
Title | Dynamic Bernoulli Embeddings for Language Evolution |
Authors | Maja Rudolph, David Blei |
Abstract | Word embeddings are a powerful approach for unsupervised analysis of language. Recently, Rudolph et al. (2016) developed exponential family embeddings, which cast word embeddings in a probabilistic framework. Here, we develop dynamic embeddings, building on exponential family embeddings to capture how the meanings of words change over time. We use dynamic embeddings to analyze three large collections of historical texts: the U.S. Senate speeches from 1858 to 2009, the history of computer science ACM abstracts from 1951 to 2014, and machine learning papers on the arXiv from 2007 to 2015. We find dynamic embeddings provide better fits than classical embeddings and capture interesting patterns about how language changes. |
Tasks | Word Embeddings |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08052v1 |
PDF | http://arxiv.org/pdf/1703.08052v1.pdf |
PWC | https://paperswithcode.com/paper/dynamic-bernoulli-embeddings-for-language |
Repo | https://github.com/mariru/dynamic_bernoulli_embeddings |
Framework | tf |
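The model has two parts that compose cleanly: a Gaussian random-walk prior ties each time slice's word embeddings to the previous slice, and a Bernoulli likelihood scores each observed word against the sum of its context vectors. A NumPy sketch of the resulting (unnormalized) log joint, with illustrative shapes and names; actual training uses stochastic gradients with negative sampling.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def log_joint(rho, alpha, positives, sigma_drift=0.1):
    """Unnormalized log joint of a dynamic Bernoulli embedding model.

    rho       : (T, V, d) per-time-slice word embeddings
    alpha     : (V, d) context vectors, shared across time
    positives : list of (time, word, context_words) observations
    """
    # Gaussian random-walk prior ties each time slice to the previous one
    drift = rho[1:] - rho[:-1]
    lp = -0.5 * np.sum(drift ** 2) / sigma_drift ** 2
    # Bernoulli likelihood of each observed word given its context
    for t, w, ctx in positives:
        eta = rho[t, w] @ alpha[ctx].sum(axis=0)
        lp += np.log(sigmoid(eta))
    return lp   # real training also scores negative samples

rng = np.random.default_rng(0)
T, V, d = 3, 10, 4
rho = rng.normal(scale=0.1, size=(T, V, d))
alpha = rng.normal(scale=0.1, size=(V, d))
data = [(0, 2, [1, 3]), (1, 2, [4, 5]), (2, 2, [4, 7])]
print(log_joint(rho, alpha, data))
```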