May 7, 2019

Paper Group AWR 56

Estimating the Number of Clusters via Normalized Cluster Instability. Inference Networks for Sequential Monte Carlo in Graphical Models. Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence. Image-to-Markup Generation with Coarse-to-Fine Attention. Deep Learning with Eigenvalue Decay Regularizer. Gated-Attention Readers fo …

Estimating the Number of Clusters via Normalized Cluster Instability

Title Estimating the Number of Clusters via Normalized Cluster Instability
Authors Jonas M. B. Haslbeck, Dirk U. Wulff
Abstract We improve current instability-based methods for the selection of the number of clusters $k$ in cluster analysis by developing a normalized cluster instability measure that corrects for the distribution of cluster sizes, a previously unaccounted driver of cluster instability. We show that our normalized instability measure outperforms current instability-based measures across the whole sequence of possible $k$ and especially overcomes limitations in the context of large $k$. We also compare, for the first time, model-based and model-free approaches to determine cluster-instability and find their performance to be comparable. We make our method available in the R-package cstab.
Tasks
Published 2016-08-26
URL http://arxiv.org/abs/1608.07494v4
PDF http://arxiv.org/pdf/1608.07494v4.pdf
PWC https://paperswithcode.com/paper/estimating-the-number-of-clusters-via
Repo https://github.com/cran/cstab
Framework none

Inference Networks for Sequential Monte Carlo in Graphical Models

Title Inference Networks for Sequential Monte Carlo in Graphical Models
Authors Brooks Paige, Frank Wood
Abstract We introduce a new approach for amortizing inference in directed graphical models by learning heuristic approximations to stochastic inverses, designed specifically for use as proposal distributions in sequential Monte Carlo methods. We describe a procedure for constructing and learning a structured neural network which represents an inverse factorization of the graphical model, resulting in a conditional density estimator that takes as input particular values of the observed random variables, and returns an approximation to the distribution of the latent variables. This recognition model can be learned offline, independent from any particular dataset, prior to performing inference. The output of these networks can be used as automatically-learned high-quality proposal distributions to accelerate sequential Monte Carlo across a diverse range of problem settings.
Tasks
Published 2016-02-22
URL http://arxiv.org/abs/1602.06701v2
PDF http://arxiv.org/pdf/1602.06701v2.pdf
PWC https://paperswithcode.com/paper/inference-networks-for-sequential-monte-carlo
Repo https://github.com/tbrx/compiled-inference
Framework pytorch

Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence

Title Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence
Authors Emilio Jorge, Mikael Kågebäck, Fredrik D. Johansson, Emil Gustavsson
Abstract Acquiring your first language is an incredible feat and not easily duplicated. Learning to communicate using nothing but a few pictureless books, a corpus, would likely be impossible even for humans. Nevertheless, this is the dominating approach in most natural language processing today. As an alternative, we propose the use of situated interactions between agents as a driving force for communication, and the framework of Deep Recurrent Q-Networks for evolving a shared language grounded in the provided environment. We task the agents with interactive image search in the form of the game Guess Who?. The images from the game provide a non-trivial environment for the agents to discuss and a natural grounding for the concepts they decide to encode in their communication. Our experiments show that the agents learn not only to encode physical concepts in their words, i.e. grounding, but also that the agents learn to hold a multi-step dialogue remembering the state of the dialogue from step to step.
Tasks Image Retrieval
Published 2016-11-10
URL http://arxiv.org/abs/1611.03218v4
PDF http://arxiv.org/pdf/1611.03218v4.pdf
PWC https://paperswithcode.com/paper/learning-to-play-guess-who-and-inventing-a
Repo https://github.com/emiliojorge/Inventing-a-Grounded-Language
Framework none

Image-to-Markup Generation with Coarse-to-Fine Attention

Title Image-to-Markup Generation with Coarse-to-Fine Attention
Authors Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush
Abstract We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism. Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. We show that unlike neural OCR techniques using CTC-based models, attention-based approaches can tackle this non-standard OCR task. Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data. To reduce the inference complexity associated with the attention-based approaches, we introduce a new coarse-to-fine attention layer that selects a support region before applying attention.
Tasks Optical Character Recognition
Published 2016-09-16
URL http://arxiv.org/abs/1609.04938v2
PDF http://arxiv.org/pdf/1609.04938v2.pdf
PWC https://paperswithcode.com/paper/image-to-markup-generation-with-coarse-to
Repo https://github.com/harvardnlp/im2markup
Framework torch

Deep Learning with Eigenvalue Decay Regularizer

Title Deep Learning with Eigenvalue Decay Regularizer
Authors Oswaldo Ludwig
Abstract This paper extends our previous work on regularization of neural networks using Eigenvalue Decay by employing a soft approximation of the dominant eigenvalue in order to enable the calculation of its derivatives in relation to the synaptic weights, and therefore the application of back-propagation, which is a primary demand for deep learning. Moreover, we extend our previous theoretical analysis to deep neural networks and multiclass classification problems. Our method is implemented as an additional regularizer in Keras, a modular neural networks library written in Python, and evaluated in the benchmark data sets Reuters Newswire Topics Classification, IMDB database for binary sentiment classification, MNIST database of handwritten digits and CIFAR-10 data set for image classification.
Tasks Image Classification, Sentiment Analysis
Published 2016-04-24
URL http://arxiv.org/abs/1604.06985v3
PDF http://arxiv.org/pdf/1604.06985v3.pdf
PWC https://paperswithcode.com/paper/deep-learning-with-eigenvalue-decay
Repo https://github.com/oswaldoludwig/Eigenvalue-Decay-Regularizer-for-Keras
Framework tf

Gated-Attention Readers for Text Comprehension

Title Gated-Attention Readers for Text Comprehension
Authors Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
Abstract In this paper we study the problem of answering cloze-style questions over documents. Our model, the Gated-Attention (GA) Reader, integrates a multi-hop architecture with a novel attention mechanism, which is based on multiplicative interactions between the query embedding and the intermediate states of a recurrent neural network document reader. This enables the reader to build query-specific representations of tokens in the document for accurate answer selection. The GA Reader obtains state-of-the-art results on three benchmarks for this task: the CNN & Daily Mail news stories and the Who Did What dataset. The effectiveness of multiplicative interaction is demonstrated by an ablation study, and by comparing to alternative compositional operators for implementing the gated-attention. The code is available at https://github.com/bdhingra/ga-reader.
Tasks Answer Selection, Open-Domain Question Answering, Question Answering, Reading Comprehension
Published 2016-06-05
URL http://arxiv.org/abs/1606.01549v3
PDF http://arxiv.org/pdf/1606.01549v3.pdf
PWC https://paperswithcode.com/paper/gated-attention-readers-for-text
Repo https://github.com/aartika/experiment1
Framework tf
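
The multiplicative interaction described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the softmax attention over query tokens, and the plain-list vector representation are assumptions made for illustration, using only the Python standard library. Each document token attends over the query tokens, forms a token-specific query summary, and is then gated by element-wise multiplication with that summary.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gated_attention(doc_states, query_states):
    """Gate each document token state by a query summary (illustrative sketch).

    doc_states:   list of token vectors from the document reader
    query_states: list of token vectors from the query encoder
    """
    gated = []
    for d in doc_states:
        # Attention weights of this document token over the query tokens.
        weights = softmax([dot(q, d) for q in query_states])
        # Token-specific query summary: weighted sum of query vectors.
        summary = [
            sum(w * q[k] for w, q in zip(weights, query_states))
            for k in range(len(d))
        ]
        # Multiplicative (element-wise) interaction.
        gated.append([dk * sk for dk, sk in zip(d, summary)])
    return gated


# Toy input: two document tokens, one query token of all ones.
doc = [[1.0, 0.0], [0.0, 2.0]]
query = [[1.0, 1.0]]
gated = gated_attention(doc, query)
```

With a single all-ones query token the summary is all ones, so the gate leaves the document states unchanged; richer queries rescale each dimension per token.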

Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks

Title Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks
Authors Daniel Lévy, Arzav Jain
Abstract Mammography is the most widely used method to screen for breast cancer. Because of its mostly manual nature, variability in mass appearance, and low signal-to-noise ratio, a significant number of breast masses are missed or misdiagnosed. In this work, we present how Convolutional Neural Networks can be used to directly classify pre-segmented breast masses in mammograms as benign or malignant, using a combination of transfer learning, careful pre-processing and data augmentation to overcome limited training data. We achieve state-of-the-art results on the DDSM dataset, surpassing human performance, and show interpretability of our model.
Tasks Data Augmentation, Transfer Learning
Published 2016-12-02
URL http://arxiv.org/abs/1612.00542v1
PDF http://arxiv.org/pdf/1612.00542v1.pdf
PWC https://paperswithcode.com/paper/breast-mass-classification-from-mammograms
Repo https://github.com/Clawton92/Classification_mammograms_cnn
Framework none

Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks

Title Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks
Authors Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan
Abstract Collecting well-annotated image datasets to train modern machine learning algorithms is prohibitively expensive for many tasks. One appealing alternative is rendering synthetic data where ground-truth annotations are generated automatically. Unfortunately, models trained purely on rendered images often fail to generalize to real images. To address this shortcoming, prior work introduced unsupervised domain adaptation algorithms that attempt to map representations between the two domains or learn to extract features that are domain-invariant. In this work, we present a new approach that learns, in an unsupervised manner, a transformation in the pixel space from one domain to the other. Our generative adversarial network (GAN)-based method adapts source-domain images to appear as if drawn from the target domain. Our approach not only produces plausible samples, but also outperforms the state-of-the-art on a number of unsupervised domain adaptation scenarios by large margins. Finally, we demonstrate that the adaptation process generalizes to object classes unseen during training.
Tasks Domain Adaptation, Unsupervised Domain Adaptation
Published 2016-12-16
URL http://arxiv.org/abs/1612.05424v2
PDF http://arxiv.org/pdf/1612.05424v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-pixel-level-domain-adaptation
Repo https://github.com/tensorflow/models/tree/master/research/domain_adaptation
Framework tf

RMPE: Regional Multi-person Pose Estimation

Title RMPE: Regional Multi-person Pose Estimation
Authors Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu
Abstract Multi-person pose estimation in the wild is challenging. Although state-of-the-art human detectors have demonstrated good performance, small errors in localization and recognition are inevitable. These errors can cause failures for a single-person pose estimator (SPPE), especially for methods that solely depend on human detection results. In this paper, we propose a novel regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes. Our framework consists of three components: Symmetric Spatial Transformer Network (SSTN), Parametric Pose Non-Maximum-Suppression (NMS), and Pose-Guided Proposals Generator (PGPG). Our method is able to handle inaccurate bounding boxes and redundant detections, allowing it to achieve a 17% increase in mAP over the state-of-the-art methods on the MPII (multi person) dataset. Our model and source code are publicly available.
Tasks Human Detection, Multi-Person Pose Estimation, Pose Estimation
Published 2016-12-01
URL http://arxiv.org/abs/1612.00137v5
PDF http://arxiv.org/pdf/1612.00137v5.pdf
PWC https://paperswithcode.com/paper/rmpe-regional-multi-person-pose-estimation
Repo https://github.com/Fangyh09/pose_nms
Framework none

Designing Neural Network Architectures using Reinforcement Learning

Title Designing Neural Network Architectures using Reinforcement Learning
Authors Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar
Abstract At present, designing convolutional neural network (CNN) architectures requires both human expertise and labor. New architectures are handcrafted by careful experimentation or modified from a handful of existing networks. We introduce MetaQNN, a meta-modeling algorithm based on reinforcement learning to automatically generate high-performing CNN architectures for a given learning task. The learning agent is trained to sequentially choose CNN layers using $Q$-learning with an $\epsilon$-greedy exploration strategy and experience replay. The agent explores a large but finite space of possible architectures and iteratively discovers designs with improved performance on the learning task. On image classification benchmarks, the agent-designed networks (consisting of only standard convolution, pooling, and fully-connected layers) beat existing networks designed with the same layer types and are competitive against the state-of-the-art methods that use more complex layer types. We also outperform existing meta-modeling approaches for network design on image classification tasks.
Tasks Image Classification, Q-Learning
Published 2016-11-07
URL http://arxiv.org/abs/1611.02167v3
PDF http://arxiv.org/pdf/1611.02167v3.pdf
PWC https://paperswithcode.com/paper/designing-neural-network-architectures-using
Repo https://github.com/SAGNIKMJR/MetaQNN_ImageGenerationVCAE_PyTorch
Framework pytorch
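
The two ingredients named in the abstract, $Q$-learning and $\epsilon$-greedy exploration, can be sketched in tabular form. This is a generic illustration rather than the MetaQNN implementation: the state and action names ("input", "conv", "pool", "fc"), the reward value, and the hyperparameters are hypothetical, and experience replay is omitted for brevity.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values[a])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=1.0):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    # A state with no outgoing actions is treated as terminal (value 0).
    next_best = max(q[next_state].values()) if q.get(next_state) else 0.0
    target = reward + gamma * next_best
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical toy search space: each state is the last layer chosen so far.
q = {
    "input": {"conv": 0.0, "pool": 0.0},
    "conv": {"fc": 0.0},
    "pool": {"fc": 0.0},
}
rng = random.Random(0)

# Pretend choosing "conv" after "input" led to a validation accuracy of 0.8.
new_value = q_update(q, "input", "conv", reward=0.8, next_state="conv", alpha=0.5)

# With epsilon = 0 the agent always exploits its current estimates.
greedy_action = epsilon_greedy(q["input"], epsilon=0.0, rng=rng)
```

Lowering epsilon over time shifts the agent from exploring random layer choices to exploiting the layer sequences that have scored well so far.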

Variational Boosting: Iteratively Refining Posterior Approximations

Title Variational Boosting: Iteratively Refining Posterior Approximations
Authors Andrew C. Miller, Nicholas Foti, Ryan P. Adams
Abstract We propose a black-box variational inference method to approximate intractable distributions with an increasingly rich approximating class. Our method, termed variational boosting, iteratively refines an existing variational approximation by solving a sequence of optimization problems, allowing the practitioner to trade computation time for accuracy. We show how to expand the variational approximating class by incorporating additional covariance structure and by introducing new components to form a mixture. We apply variational boosting to synthetic and real statistical models, and show that resulting posterior inferences compare favorably to existing posterior approximation algorithms in both accuracy and efficiency.
Tasks
Published 2016-11-20
URL http://arxiv.org/abs/1611.06585v2
PDF http://arxiv.org/pdf/1611.06585v2.pdf
PWC https://paperswithcode.com/paper/variational-boosting-iteratively-refining
Repo https://github.com/andymiller/vboost
Framework none

Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

Title Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
Authors Sheng Wang, Siqi Sun, Zhen Li, Renyu Zhang, Jinbo Xu
Abstract Recently exciting progress has been made on protein contact prediction, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual networks. This deep neural network allows us to model very complex sequence-contact relationships as well as long-range inter-contact correlation. Our method greatly outperforms existing contact prediction methods and leads to much more accurate contact-assisted protein folding. Tested on three datasets of 579 proteins, the average top L long-range prediction accuracy obtained by our method, the representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints can yield correct folds (i.e., TMscore>0.6) for 203 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 proteins, respectively. Further, our contact-assisted models have much better quality than template-based models. Using our predicted contacts as restraints, we can (ab initio) fold 208 of the 398 membrane proteins with TMscore>0.5. By contrast, when the training proteins of our method are used as templates, homology modeling can only do so for 10 of them. One interesting finding is that even if we do not train our prediction models with any membrane proteins, our method works very well on membrane protein prediction. Finally, in the recent blind CAMEO benchmark our method successfully folded 5 test proteins with a novel fold.
Tasks
Published 2016-09-02
URL http://arxiv.org/abs/1609.00680v6
PDF http://arxiv.org/pdf/1609.00680v6.pdf
PWC https://paperswithcode.com/paper/accurate-de-novo-prediction-of-protein
Repo https://github.com/j3xugit/RaptorX-Contact
Framework none

Monte Carlo Structured SVI for Two-Level Non-Conjugate Models

Title Monte Carlo Structured SVI for Two-Level Non-Conjugate Models
Authors Rishit Sheth, Roni Khardon
Abstract The stochastic variational inference (SVI) paradigm, which combines variational inference, natural gradients, and stochastic updates, was recently proposed for large-scale data analysis in conjugate Bayesian models and demonstrated to be effective in several problems. This paper studies a family of Bayesian latent variable models with two levels of hidden variables but without any conjugacy requirements, making several contributions in this context. The first is observing that SVI, with an improved structured variational approximation, is applicable under more general conditions than previously thought with the only requirement being that the approximating variational distribution be in the same family as the prior. The resulting approach, Monte Carlo Structured SVI (MC-SSVI), significantly extends the scope of SVI, enabling large-scale learning in non-conjugate models. For models with latent Gaussian variables we propose a hybrid algorithm, using both standard and natural gradients, which is shown to improve stability and convergence. Applications in mixed effects models, sparse Gaussian processes, probabilistic matrix factorization and correlated topic models demonstrate the generality of the approach and the advantages of the proposed algorithms.
Tasks Gaussian Processes, Latent Variable Models, Topic Models
Published 2016-12-12
URL http://arxiv.org/abs/1612.03957v3
PDF http://arxiv.org/pdf/1612.03957v3.pdf
PWC https://paperswithcode.com/paper/monte-carlo-structured-svi-for-two-level-non
Repo https://github.com/KaikaiZhao/Sparse-Variational-Inference-for-Generalized-Gaussian-Process-Models---Tutorial
Framework none

S3Pool: Pooling with Stochastic Spatial Sampling

Title S3Pool: Pooling with Stochastic Spatial Sampling
Authors Shuangfei Zhai, Hui Wu, Abhishek Kumar, Yu Cheng, Yongxi Lu, Zhongfei Zhang, Rogerio Feris
Abstract Feature pooling layers (e.g., max pooling) in convolutional neural networks (CNNs) serve the dual purpose of providing increasingly abstract representations as well as yielding computational savings in subsequent convolutional layers. We view the pooling operation in CNNs as a two-step procedure: first, a pooling window (e.g., $2\times 2$) slides over the feature map with stride one which leaves the spatial resolution intact, and second, downsampling is performed by selecting one pixel from each non-overlapping pooling window in an often uniform and deterministic (e.g., top-left) manner. Our starting point in this work is the observation that this regularly spaced downsampling arising from non-overlapping windows, although intuitive from a signal processing perspective (which has the goal of signal reconstruction), is not necessarily optimal for learning (where the goal is to generalize). We study this aspect and propose a novel pooling strategy with stochastic spatial sampling (S3Pool), where the regular downsampling is replaced by a more general stochastic version. We observe that this general stochasticity acts as a strong regularizer, and can also be seen as doing implicit data augmentation by introducing distortions in the feature maps. We further introduce a mechanism to control the amount of distortion to suit different datasets and architectures. To demonstrate the effectiveness of the proposed approach, we perform extensive experiments on several popular image classification benchmarks, observing excellent improvements over baseline models. Experimental code is available at https://github.com/Shuangfei/s3pool.
Tasks Data Augmentation, Image Classification
Published 2016-11-16
URL http://arxiv.org/abs/1611.05138v1
PDF http://arxiv.org/pdf/1611.05138v1.pdf
PWC https://paperswithcode.com/paper/s3pool-pooling-with-stochastic-spatial
Repo https://github.com/Shuangfei/s3pool
Framework none
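
The two-step view of pooling in the abstract can be sketched as follows. This is a simplified illustration and not the code from the linked repository: the function names are hypothetical, the feature map is a plain nested list, and the sampling rule (one random row and one random column from each block of k) is an assumption standing in for the paper's exact scheme.

```python
import random


def max_pool_stride1(x, k=2):
    """Step 1: max over each k x k window with stride 1.

    Windows are clipped at the bottom/right edge, so the output keeps
    the input's height and width.
    """
    h, w = len(x), len(x[0])
    return [
        [
            max(
                x[a][b]
                for a in range(i, min(i + k, h))
                for b in range(j, min(j + k, w))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def stochastic_downsample(x, rng, k=2):
    """Step 2 (assumed rule): keep one random row and column per block of k."""
    h, w = len(x), len(x[0])
    rows = [rng.randrange(i, min(i + k, h)) for i in range(0, h, k)]
    cols = [rng.randrange(j, min(j + k, w)) for j in range(0, w, k)]
    return [[x[r][c] for c in cols] for r in rows]


feature = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool_stride1(feature)          # still 4 x 4
out = stochastic_downsample(pooled, random.Random(0))  # 2 x 2
```

Always choosing the first row and column of each block instead of a random one recovers ordinary non-overlapping max pooling, which is the deterministic case the abstract contrasts against.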

Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions

Title Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions
Authors Fabio Carrara, Andrea Esuli, Tiziano Fagni, Fabrizio Falchi, Alejandro Moreo Fernández
Abstract In this paper we tackle the problem of image search when the query is a short textual description of the image the user is looking for. We choose to implement the actual search process as a similarity search in a visual feature space, by learning to translate a textual query into a visual representation. Searching in the visual feature space has the advantage that any update to the translation model does not require reprocessing the typically huge image collection on which the search is performed. We propose Text2Vis, a neural network that generates a visual representation, in the visual feature space of the fc6-fc7 layers of ImageNet, from a short descriptive text. Text2Vis optimizes two loss functions, using a stochastic loss-selection method. A visual-focused loss is aimed at learning the actual text-to-visual feature mapping, while a text-focused loss is aimed at modeling the higher-level semantic concepts expressed in language and countering the overfit on non-relevant visual components of the visual loss. We report preliminary results on the MS-COCO dataset.
Tasks Cross-Modal Information Retrieval, Cross-Modal Retrieval, Image Retrieval
Published 2016-06-23
URL http://arxiv.org/abs/1606.07287v1
PDF http://arxiv.org/pdf/1606.07287v1.pdf
PWC https://paperswithcode.com/paper/picture-it-in-your-mind-generating-high-level
Repo https://github.com/AlexMoreo/tensorflow-Text2Vis
Framework tf