May 7, 2019

Paper Group AWR 56

Estimating the Number of Clusters via Normalized Cluster Instability. Inference Networks for Sequential Monte Carlo in Graphical Models. Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence. Image-to-Markup Generation with Coarse-to-Fine Attention. Deep Learning with Eigenvalue Decay Regularizer. Gated-Attention Readers fo …

Estimating the Number of Clusters via Normalized Cluster Instability


Title	Estimating the Number of Clusters via Normalized Cluster Instability
Authors	Jonas M. B. Haslbeck, Dirk U. Wulff
Abstract	We improve current instability-based methods for the selection of the number of clusters $k$ in cluster analysis by developing a normalized cluster instability measure that corrects for the distribution of cluster sizes, a previously unaccounted driver of cluster instability. We show that our normalized instability measure outperforms current instability-based measures across the whole sequence of possible $k$ and especially overcomes limitations in the context of large $k$. We also compare, for the first time, model-based and model-free approaches to determine cluster-instability and find their performance to be comparable. We make our method available in the R-package cstab.
Tasks
Published	2016-08-26
URL	http://arxiv.org/abs/1608.07494v4
PDF	http://arxiv.org/pdf/1608.07494v4.pdf
PWC	https://paperswithcode.com/paper/estimating-the-number-of-clusters-via
Repo	https://github.com/cran/cstab
Framework	none

Inference Networks for Sequential Monte Carlo in Graphical Models


Title	Inference Networks for Sequential Monte Carlo in Graphical Models
Authors	Brooks Paige, Frank Wood
Abstract	We introduce a new approach for amortizing inference in directed graphical models by learning heuristic approximations to stochastic inverses, designed specifically for use as proposal distributions in sequential Monte Carlo methods. We describe a procedure for constructing and learning a structured neural network which represents an inverse factorization of the graphical model, resulting in a conditional density estimator that takes as input particular values of the observed random variables, and returns an approximation to the distribution of the latent variables. This recognition model can be learned offline, independent from any particular dataset, prior to performing inference. The output of these networks can be used as automatically-learned high-quality proposal distributions to accelerate sequential Monte Carlo across a diverse range of problem settings.
Tasks
Published	2016-02-22
URL	http://arxiv.org/abs/1602.06701v2
PDF	http://arxiv.org/pdf/1602.06701v2.pdf
PWC	https://paperswithcode.com/paper/inference-networks-for-sequential-monte-carlo
Repo	https://github.com/tbrx/compiled-inference
Framework	pytorch

Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence


Title	Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence
Authors	Emilio Jorge, Mikael Kågebäck, Fredrik D. Johansson, Emil Gustavsson
Abstract	Acquiring your first language is an incredible feat and not easily duplicated. Learning to communicate using nothing but a few pictureless books, a corpus, would likely be impossible even for humans. Nevertheless, this is the dominating approach in most natural language processing today. As an alternative, we propose the use of situated interactions between agents as a driving force for communication, and the framework of Deep Recurrent Q-Networks for evolving a shared language grounded in the provided environment. We task the agents with interactive image search in the form of the game Guess Who?. The images from the game provide a non-trivial environment for the agents to discuss and a natural grounding for the concepts they decide to encode in their communication. Our experiments show that the agents learn not only to encode physical concepts in their words, i.e. grounding, but also that the agents learn to hold a multi-step dialogue remembering the state of the dialogue from step to step.
Tasks	Image Retrieval
Published	2016-11-10
URL	http://arxiv.org/abs/1611.03218v4
PDF	http://arxiv.org/pdf/1611.03218v4.pdf
PWC	https://paperswithcode.com/paper/learning-to-play-guess-who-and-inventing-a
Repo	https://github.com/emiliojorge/Inventing-a-Grounded-Language
Framework	none

Image-to-Markup Generation with Coarse-to-Fine Attention


Title	Image-to-Markup Generation with Coarse-to-Fine Attention
Authors	Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush
Abstract	We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism. Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. We show that unlike neural OCR techniques using CTC-based models, attention-based approaches can tackle this non-standard OCR task. Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data. To reduce the inference complexity associated with the attention-based approaches, we introduce a new coarse-to-fine attention layer that selects a support region before applying attention.
Tasks	Optical Character Recognition
Published	2016-09-16
URL	http://arxiv.org/abs/1609.04938v2
PDF	http://arxiv.org/pdf/1609.04938v2.pdf
PWC	https://paperswithcode.com/paper/image-to-markup-generation-with-coarse-to
Repo	https://github.com/harvardnlp/im2markup
Framework	torch

Deep Learning with Eigenvalue Decay Regularizer


Title	Deep Learning with Eigenvalue Decay Regularizer
Authors	Oswaldo Ludwig
Abstract	This paper extends our previous work on regularization of neural networks using Eigenvalue Decay by employing a soft approximation of the dominant eigenvalue in order to enable the calculation of its derivatives in relation to the synaptic weights, and therefore the application of back-propagation, which is a primary demand for deep learning. Moreover, we extend our previous theoretical analysis to deep neural networks and multiclass classification problems. Our method is implemented as an additional regularizer in Keras, a modular neural networks library written in Python, and evaluated in the benchmark data sets Reuters Newswire Topics Classification, IMDB database for binary sentiment classification, MNIST database of handwritten digits and CIFAR-10 data set for image classification.
Tasks	Image Classification, Sentiment Analysis
Published	2016-04-24
URL	http://arxiv.org/abs/1604.06985v3
PDF	http://arxiv.org/pdf/1604.06985v3.pdf
PWC	https://paperswithcode.com/paper/deep-learning-with-eigenvalue-decay
Repo	https://github.com/oswaldoludwig/Eigenvalue-Decay-Regularizer-for-Keras
Framework	tf

Gated-Attention Readers for Text Comprehension


Title	Gated-Attention Readers for Text Comprehension
Authors	Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
Abstract	In this paper we study the problem of answering cloze-style questions over documents. Our model, the Gated-Attention (GA) Reader, integrates a multi-hop architecture with a novel attention mechanism, which is based on multiplicative interactions between the query embedding and the intermediate states of a recurrent neural network document reader. This enables the reader to build query-specific representations of tokens in the document for accurate answer selection. The GA Reader obtains state-of-the-art results on three benchmarks for this task: the CNN & Daily Mail news stories and the Who Did What dataset. The effectiveness of multiplicative interaction is demonstrated by an ablation study, and by comparing to alternative compositional operators for implementing the gated-attention. The code is available at https://github.com/bdhingra/ga-reader.
Tasks	Answer Selection, Open-Domain Question Answering, Question Answering, Reading Comprehension
Published	2016-06-05
URL	http://arxiv.org/abs/1606.01549v3
PDF	http://arxiv.org/pdf/1606.01549v3.pdf
PWC	https://paperswithcode.com/paper/gated-attention-readers-for-text
Repo	https://github.com/aartika/experiment1
Framework	tf
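
The multiplicative interaction described in the abstract can be illustrated with a small sketch. It assumes that each document token state attends over the query token states with a softmax over dot products and is then gated element-wise by the resulting query summary. The function names and the plain-list representation are illustrative choices, not taken from the authors' code.

```python
import math

def softmax(values):
    # Numerically stable softmax over a list of floats.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def gated_attention(doc_states, query_states):
    """Gate each document token state by a token-specific query summary.

    doc_states: list of vectors, one per document token (hypothetical layout).
    query_states: list of vectors of the same dimension, one per query token.
    """
    gated = []
    for d in doc_states:
        # Attention of this document token over all query tokens.
        scores = [sum(di * qi for di, qi in zip(d, q)) for q in query_states]
        weights = softmax(scores)
        # Query summary specific to this token.
        q_tilde = [sum(w * q[j] for w, q in zip(weights, query_states))
                   for j in range(len(d))]
        # Multiplicative (element-wise) gating.
        gated.append([di * qj for di, qj in zip(d, q_tilde)])
    return gated
```

In a multi-hop reader, a step like this would be applied between recurrent layers so that later layers see query-specific token representations.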

Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks


Title	Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks
Authors	Daniel Lévy, Arzav Jain
Abstract	Mammography is the most widely used method to screen for breast cancer. Because of its mostly manual nature, variability in mass appearance, and low signal-to-noise ratio, a significant number of breast masses are missed or misdiagnosed. In this work, we present how Convolutional Neural Networks can be used to directly classify pre-segmented breast masses in mammograms as benign or malignant, using a combination of transfer learning, careful pre-processing and data augmentation to overcome limited training data. We achieve state-of-the-art results on the DDSM dataset, surpassing human performance, and show interpretability of our model.
Tasks	Data Augmentation, Transfer Learning
Published	2016-12-02
URL	http://arxiv.org/abs/1612.00542v1
PDF	http://arxiv.org/pdf/1612.00542v1.pdf
PWC	https://paperswithcode.com/paper/breast-mass-classification-from-mammograms
Repo	https://github.com/Clawton92/Classification_mammograms_cnn
Framework	none

Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks


Title	Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks
Authors	Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan
Abstract	Collecting well-annotated image datasets to train modern machine learning algorithms is prohibitively expensive for many tasks. One appealing alternative is rendering synthetic data where ground-truth annotations are generated automatically. Unfortunately, models trained purely on rendered images often fail to generalize to real images. To address this shortcoming, prior work introduced unsupervised domain adaptation algorithms that attempt to map representations between the two domains or learn to extract features that are domain-invariant. In this work, we present a new approach that learns, in an unsupervised manner, a transformation in the pixel space from one domain to the other. Our generative adversarial network (GAN)-based method adapts source-domain images to appear as if drawn from the target domain. Our approach not only produces plausible samples, but also outperforms the state-of-the-art on a number of unsupervised domain adaptation scenarios by large margins. Finally, we demonstrate that the adaptation process generalizes to object classes unseen during training.
Tasks	Domain Adaptation, Unsupervised Domain Adaptation
Published	2016-12-16
URL	http://arxiv.org/abs/1612.05424v2
PDF	http://arxiv.org/pdf/1612.05424v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-pixel-level-domain-adaptation
Repo	https://github.com/tensorflow/models/tree/master/research/domain_adaptation
Framework	tf

RMPE: Regional Multi-person Pose Estimation


Title	RMPE: Regional Multi-person Pose Estimation
Authors	Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu
Abstract	Multi-person pose estimation in the wild is challenging. Although state-of-the-art human detectors have demonstrated good performance, small errors in localization and recognition are inevitable. These errors can cause failures for a single-person pose estimator (SPPE), especially for methods that solely depend on human detection results. In this paper, we propose a novel regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes. Our framework consists of three components: Symmetric Spatial Transformer Network (SSTN), Parametric Pose Non-Maximum-Suppression (NMS), and Pose-Guided Proposals Generator (PGPG). Our method is able to handle inaccurate bounding boxes and redundant detections, allowing it to achieve a 17% increase in mAP over the state-of-the-art methods on the MPII (multi person) dataset. Our model and source code are publicly available.
Tasks	Human Detection, Multi-Person Pose Estimation, Pose Estimation
Published	2016-12-01
URL	http://arxiv.org/abs/1612.00137v5
PDF	http://arxiv.org/pdf/1612.00137v5.pdf
PWC	https://paperswithcode.com/paper/rmpe-regional-multi-person-pose-estimation
Repo	https://github.com/Fangyh09/pose_nms
Framework	none

Designing Neural Network Architectures using Reinforcement Learning


Title	Designing Neural Network Architectures using Reinforcement Learning
Authors	Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar
Abstract	At present, designing convolutional neural network (CNN) architectures requires both human expertise and labor. New architectures are handcrafted by careful experimentation or modified from a handful of existing networks. We introduce MetaQNN, a meta-modeling algorithm based on reinforcement learning to automatically generate high-performing CNN architectures for a given learning task. The learning agent is trained to sequentially choose CNN layers using $Q$-learning with an $\epsilon$-greedy exploration strategy and experience replay. The agent explores a large but finite space of possible architectures and iteratively discovers designs with improved performance on the learning task. On image classification benchmarks, the agent-designed networks (consisting of only standard convolution, pooling, and fully-connected layers) beat existing networks designed with the same layer types and are competitive against the state-of-the-art methods that use more complex layer types. We also outperform existing meta-modeling approaches for network design on image classification tasks.
Tasks	Image Classification, Q-Learning
Published	2016-11-07
URL	http://arxiv.org/abs/1611.02167v3
PDF	http://arxiv.org/pdf/1611.02167v3.pdf
PWC	https://paperswithcode.com/paper/designing-neural-network-architectures-using
Repo	https://github.com/SAGNIKMJR/MetaQNN_ImageGenerationVCAE_PyTorch
Framework	pytorch
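
The two ingredients named in the abstract, $\epsilon$-greedy action selection and the $Q$-learning update, can be sketched on a toy problem. The state and action names below ("start", "conv", "pool") and the dictionary-based Q-table are hypothetical and only stand in for the layer-selection space; this is not the MetaQNN implementation, and experience replay is omitted.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-valued one.

    q_values: dict mapping action name -> current Q estimate.
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=1.0):
    """One tabular Q-learning update; terminal states have no actions."""
    next_actions = q.get(next_state)
    next_best = max(next_actions.values()) if next_actions else 0.0
    target = reward + gamma * next_best
    q[state][action] += alpha * (target - q[state][action])

# Toy usage: the "reward" stands in for validation accuracy of a sampled network.
q_table = {"start": {"conv": 0.0, "pool": 0.0}, "conv": {}, "pool": {}}
rng = random.Random(0)
action = epsilon_greedy(q_table["start"], epsilon=1.0, rng=rng)
q_update(q_table, "start", action, reward=0.7, next_state=action)
```

Starting with a large epsilon and lowering it over time shifts the agent from exploring the architecture space to exploiting the layer choices that scored well.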

Variational Boosting: Iteratively Refining Posterior Approximations


Title	Variational Boosting: Iteratively Refining Posterior Approximations
Authors	Andrew C. Miller, Nicholas Foti, Ryan P. Adams
Abstract	We propose a black-box variational inference method to approximate intractable distributions with an increasingly rich approximating class. Our method, termed variational boosting, iteratively refines an existing variational approximation by solving a sequence of optimization problems, allowing the practitioner to trade computation time for accuracy. We show how to expand the variational approximating class by incorporating additional covariance structure and by introducing new components to form a mixture. We apply variational boosting to synthetic and real statistical models, and show that resulting posterior inferences compare favorably to existing posterior approximation algorithms in both accuracy and efficiency.
Tasks
Published	2016-11-20
URL	http://arxiv.org/abs/1611.06585v2
PDF	http://arxiv.org/pdf/1611.06585v2.pdf
PWC	https://paperswithcode.com/paper/variational-boosting-iteratively-refining
Repo	https://github.com/andymiller/vboost
Framework	none

Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model


Title	Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
Authors	Sheng Wang, Siqi Sun, Zhen Li, Renyu Zhang, Jinbo Xu
Abstract	Recently exciting progress has been made on protein contact prediction, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual networks. This deep neural network allows us to model very complex sequence-contact relationship as well as long-range inter-contact correlation. Our method greatly outperforms existing contact prediction methods and leads to much more accurate contact-assisted protein folding. Tested on three datasets of 579 proteins, the average top L long-range prediction accuracy obtained by our method, the representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints can yield correct folds (i.e., TMscore>0.6) for 203 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 proteins, respectively. Further, our contact-assisted models have much better quality than template-based models. Using our predicted contacts as restraints, we can (ab initio) fold 208 of the 398 membrane proteins with TMscore>0.5. By contrast, when the training proteins of our method are used as templates, homology modeling can only do so for 10 of them. One interesting finding is that even if we do not train our prediction models with any membrane proteins, our method works very well on membrane protein prediction. Finally, in the recent blind CAMEO benchmark our method successfully folded 5 test proteins with a novel fold.
Tasks
Published	2016-09-02
URL	http://arxiv.org/abs/1609.00680v6
PDF	http://arxiv.org/pdf/1609.00680v6.pdf
PWC	https://paperswithcode.com/paper/accurate-de-novo-prediction-of-protein
Repo	https://github.com/j3xugit/RaptorX-Contact
Framework	none
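
The "top L" and "top L/10" long-range accuracies quoted in the abstract can be read as a precision-at-k metric. The sketch below assumes the common convention that a residue pair is long-range when its sequence separation is at least 24; that threshold, the function name, and the data layout are assumptions for illustration and are not stated in the abstract.

```python
def top_k_accuracy(scores, true_contacts, k, min_separation=24):
    """Fraction of the k highest-scoring long-range pairs that are true contacts.

    scores: dict mapping (i, j) residue index pairs -> predicted contact score.
    true_contacts: set of (i, j) pairs that are in contact in the native structure.
    k: number of top predictions to evaluate (e.g. L or L // 10 for length L).
    min_separation: assumed long-range cutoff in sequence positions.
    """
    candidates = [(score, pair) for pair, score in scores.items()
                  if abs(pair[0] - pair[1]) >= min_separation]
    candidates.sort(key=lambda item: item[0], reverse=True)
    top_pairs = [pair for _, pair in candidates[:k]]
    if not top_pairs:
        return 0.0
    hits = sum(1 for pair in top_pairs if pair in true_contacts)
    return hits / len(top_pairs)
```

For a protein of length L, calling the function with `k=L` and `k=L // 10` would give the two kinds of figures reported above.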

Monte Carlo Structured SVI for Two-Level Non-Conjugate Models


Title	Monte Carlo Structured SVI for Two-Level Non-Conjugate Models
Authors	Rishit Sheth, Roni Khardon
Abstract	The stochastic variational inference (SVI) paradigm, which combines variational inference, natural gradients, and stochastic updates, was recently proposed for large-scale data analysis in conjugate Bayesian models and demonstrated to be effective in several problems. This paper studies a family of Bayesian latent variable models with two levels of hidden variables but without any conjugacy requirements, making several contributions in this context. The first is observing that SVI, with an improved structured variational approximation, is applicable under more general conditions than previously thought with the only requirement being that the approximating variational distribution be in the same family as the prior. The resulting approach, Monte Carlo Structured SVI (MC-SSVI), significantly extends the scope of SVI, enabling large-scale learning in non-conjugate models. For models with latent Gaussian variables we propose a hybrid algorithm, using both standard and natural gradients, which is shown to improve stability and convergence. Applications in mixed effects models, sparse Gaussian processes, probabilistic matrix factorization and correlated topic models demonstrate the generality of the approach and the advantages of the proposed algorithms.
Tasks	Gaussian Processes, Latent Variable Models, Topic Models
Published	2016-12-12
URL	http://arxiv.org/abs/1612.03957v3
PDF	http://arxiv.org/pdf/1612.03957v3.pdf
PWC	https://paperswithcode.com/paper/monte-carlo-structured-svi-for-two-level-non
Repo	https://github.com/KaikaiZhao/Sparse-Variational-Inference-for-Generalized-Gaussian-Process-Models---Tutorial
Framework	none

S3Pool: Pooling with Stochastic Spatial Sampling


Title	S3Pool: Pooling with Stochastic Spatial Sampling
Authors	Shuangfei Zhai, Hui Wu, Abhishek Kumar, Yu Cheng, Yongxi Lu, Zhongfei Zhang, Rogerio Feris
Abstract	Feature pooling layers (e.g., max pooling) in convolutional neural networks (CNNs) serve the dual purpose of providing increasingly abstract representations as well as yielding computational savings in subsequent convolutional layers. We view the pooling operation in CNNs as a two-step procedure: first, a pooling window (e.g., $2\times 2$) slides over the feature map with stride one which leaves the spatial resolution intact, and second, downsampling is performed by selecting one pixel from each non-overlapping pooling window in an often uniform and deterministic (e.g., top-left) manner. Our starting point in this work is the observation that this regularly spaced downsampling arising from non-overlapping windows, although intuitive from a signal processing perspective (which has the goal of signal reconstruction), is not necessarily optimal for learning (where the goal is to generalize). We study this aspect and propose a novel pooling strategy with stochastic spatial sampling (S3Pool), where the regular downsampling is replaced by a more general stochastic version. We observe that this general stochasticity acts as a strong regularizer, and can also be seen as doing implicit data augmentation by introducing distortions in the feature maps. We further introduce a mechanism to control the amount of distortion to suit different datasets and architectures. To demonstrate the effectiveness of the proposed approach, we perform extensive experiments on several popular image classification benchmarks, observing excellent improvements over baseline models. Experimental code is available at https://github.com/Shuangfei/s3pool.
Tasks	Data Augmentation, Image Classification
Published	2016-11-16
URL	http://arxiv.org/abs/1611.05138v1
PDF	http://arxiv.org/pdf/1611.05138v1.pdf
PWC	https://paperswithcode.com/paper/s3pool-pooling-with-stochastic-spatial
Repo	https://github.com/Shuangfei/s3pool
Framework	none
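
The two-step view of pooling in the abstract can be sketched on a single 2D feature map. The first function applies stride-one max pooling that keeps the spatial resolution (windows are clipped at the border, which is an assumption). The second replaces the fixed top-left selection with a random row and a random column drawn from each strip of width `stride`. This is one simple way to randomize the downsampling step, not necessarily the exact sampling scheme of the paper, and all names are illustrative.

```python
import random

def max_pool_stride1(x, k=2):
    # Step 1: k x k max pooling with stride one; output has the same size as x.
    h, w = len(x), len(x[0])
    return [[max(x[a][b]
                 for a in range(i, min(i + k, h))
                 for b in range(j, min(j + k, w)))
             for j in range(w)]
            for i in range(h)]

def stochastic_downsample(x, stride=2, rng=None):
    # Step 2: pick one random row per horizontal strip and one random column
    # per vertical strip, instead of always taking the top-left position.
    rng = rng or random.Random()
    h, w = len(x), len(x[0])
    rows = [rng.randrange(i, min(i + stride, h)) for i in range(0, h, stride)]
    cols = [rng.randrange(j, min(j + stride, w)) for j in range(0, w, stride)]
    return [[x[r][c] for c in cols] for r in rows]

def s3pool(x, k=2, stride=2, rng=None):
    return stochastic_downsample(max_pool_stride1(x, k), stride, rng)
```

At test time a deterministic replacement for the random step (for example average pooling) would typically be used; that choice is outside what the abstract specifies.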

Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions


Title	Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions
Authors	Fabio Carrara, Andrea Esuli, Tiziano Fagni, Fabrizio Falchi, Alejandro Moreo Fernández
Abstract	In this paper we tackle the problem of image search when the query is a short textual description of the image the user is looking for. We choose to implement the actual search process as a similarity search in a visual feature space, by learning to translate a textual query into a visual representation. Searching in the visual feature space has the advantage that any update to the translation model does not require reprocessing the typically huge image collection on which the search is performed. We propose Text2Vis, a neural network that generates a visual representation, in the visual feature space of the fc6-fc7 layers of ImageNet, from a short descriptive text. Text2Vis optimizes two loss functions, using a stochastic loss-selection method. A visual-focused loss is aimed at learning the actual text-to-visual feature mapping, while a text-focused loss is aimed at modeling the higher-level semantic concepts expressed in language and countering the overfit on non-relevant visual components of the visual loss. We report preliminary results on the MS-COCO dataset.
Tasks	Cross-Modal Information Retrieval, Cross-Modal Retrieval, Image Retrieval
Published	2016-06-23
URL	http://arxiv.org/abs/1606.07287v1
PDF	http://arxiv.org/pdf/1606.07287v1.pdf
PWC	https://paperswithcode.com/paper/picture-it-in-your-mind-generating-high-level
Repo	https://github.com/AlexMoreo/tensorflow-Text2Vis
Framework	tf