Paper Group AWR 56
Estimating the Number of Clusters via Normalized Cluster Instability. Inference Networks for Sequential Monte Carlo in Graphical Models. Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence. Image-to-Markup Generation with Coarse-to-Fine Attention. Deep Learning with Eigenvalue Decay Regularizer. Gated-Attention Readers fo …
Estimating the Number of Clusters via Normalized Cluster Instability
Title | Estimating the Number of Clusters via Normalized Cluster Instability |
Authors | Jonas M. B. Haslbeck, Dirk U. Wulff |
Abstract | We improve current instability-based methods for the selection of the number of clusters $k$ in cluster analysis by developing a normalized cluster instability measure that corrects for the distribution of cluster sizes, a previously unaccounted-for driver of cluster instability. We show that our normalized instability measure outperforms current instability-based measures across the whole sequence of possible $k$ and especially overcomes limitations in the context of large $k$. We also compare, for the first time, model-based and model-free approaches to determine cluster instability and find their performance to be comparable. We make our method available in the R package cstab. |
Tasks | |
Published | 2016-08-26 |
URL | http://arxiv.org/abs/1608.07494v4 |
http://arxiv.org/pdf/1608.07494v4.pdf | |
PWC | https://paperswithcode.com/paper/estimating-the-number-of-clusters-via |
Repo | https://github.com/cran/cstab |
Framework | none |
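Instability-based selection of $k$ rests on comparing clusterings of resampled data. A minimal sketch of that comparison, assuming two label vectors over the same points: count the point pairs whose co-membership (same cluster vs. different clusters) differs between the two clusterings. The helper name `pair_disagreement` is hypothetical, and the cluster-size normalization that is this paper's contribution is not implemented here.

```python
from itertools import combinations


def pair_disagreement(labels_a, labels_b):
    """Fraction of point pairs co-clustered in one labeling but not in the other.

    The value is invariant to renaming cluster labels, which matters because
    cluster IDs are arbitrary across clusterings of resampled data.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 0.0
    disagree = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagree / len(pairs)


# Same partition with swapped label names: no disagreement.
print(pair_disagreement([0, 0, 1, 1], [1, 1, 0, 0]))  # 0.0
# Moving one point changes co-membership for 3 of the 6 pairs.
print(pair_disagreement([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.5
```

In an instability procedure this quantity would be averaged over many pairs of clusterings for each candidate $k$; the abstract's point is that the raw value also depends on the distribution of cluster sizes, which the normalized measure corrects for.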
Inference Networks for Sequential Monte Carlo in Graphical Models
Title | Inference Networks for Sequential Monte Carlo in Graphical Models |
Authors | Brooks Paige, Frank Wood |
Abstract | We introduce a new approach for amortizing inference in directed graphical models by learning heuristic approximations to stochastic inverses, designed specifically for use as proposal distributions in sequential Monte Carlo methods. We describe a procedure for constructing and learning a structured neural network which represents an inverse factorization of the graphical model, resulting in a conditional density estimator that takes as input particular values of the observed random variables, and returns an approximation to the distribution of the latent variables. This recognition model can be learned offline, independent of any particular dataset, prior to performing inference. The output of these networks can be used as automatically-learned high-quality proposal distributions to accelerate sequential Monte Carlo across a diverse range of problem settings. |
Tasks | |
Published | 2016-02-22 |
URL | http://arxiv.org/abs/1602.06701v2 |
http://arxiv.org/pdf/1602.06701v2.pdf | |
PWC | https://paperswithcode.com/paper/inference-networks-for-sequential-monte-carlo |
Repo | https://github.com/tbrx/compiled-inference |
Framework | pytorch |
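The benefit of a good learned proposal shows up in the importance weights. A minimal sketch, assuming a toy conjugate model (prior $x \sim \mathcal{N}(0,1)$, likelihood $y \mid x \sim \mathcal{N}(x,1)$) and a fixed Gaussian proposal standing in for an inference network's output: self-normalized importance sampling with weights $p(x, y)/q(x \mid y)$. The function `proposal_params` is hypothetical; in the paper's setting a trained network would supply these values.

```python
import math
import random


def log_normal(x, mean, std):
    """Log density of N(mean, std^2) at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def proposal_params(y):
    # Hypothetical stand-in for an inference network's output q(x | y).
    return 0.5 * y, 1.0


def posterior_mean_is(y, num_samples, rng):
    """Self-normalized importance sampling estimate of E[x | y]."""
    mean_q, std_q = proposal_params(y)
    xs, log_ws = [], []
    for _ in range(num_samples):
        x = rng.gauss(mean_q, std_q)
        log_joint = log_normal(x, 0.0, 1.0) + log_normal(y, x, 1.0)
        log_ws.append(log_joint - log_normal(x, mean_q, std_q))
        xs.append(x)
    # Subtract the max log-weight before exponentiating for numerical stability.
    m = max(log_ws)
    ws = [math.exp(lw - m) for lw in log_ws]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)


# The exact posterior here is N(y/2, 1/2), so the estimate should be near 0.5.
estimate = posterior_mean_is(1.0, 20000, random.Random(0))
```

In a full SMC sweep the same weight computation is applied as each particle is extended one latent variable at a time; a proposal closer to the true posterior yields more uniform weights and fewer wasted particles.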
Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence
Title | Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence |
Authors | Emilio Jorge, Mikael Kågebäck, Fredrik D. Johansson, Emil Gustavsson |
Abstract | Acquiring your first language is an incredible feat and not easily duplicated. Learning to communicate using nothing but a few pictureless books, a corpus, would likely be impossible even for humans. Nevertheless, this is the dominant approach in most natural language processing today. As an alternative, we propose the use of situated interactions between agents as a driving force for communication, and the framework of Deep Recurrent Q-Networks for evolving a shared language grounded in the provided environment. We task the agents with interactive image search in the form of the game Guess Who?. The images from the game provide a non-trivial environment for the agents to discuss and a natural grounding for the concepts they decide to encode in their communication. Our experiments show that the agents learn not only to encode physical concepts in their words, i.e., grounding, but also to hold a multi-step dialogue, remembering the state of the dialogue from step to step. |
Tasks | Image Retrieval |
Published | 2016-11-10 |
URL | http://arxiv.org/abs/1611.03218v4 |
http://arxiv.org/pdf/1611.03218v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-play-guess-who-and-inventing-a |
Repo | https://github.com/emiliojorge/Inventing-a-Grounded-Language |
Framework | none |
Image-to-Markup Generation with Coarse-to-Fine Attention
Title | Image-to-Markup Generation with Coarse-to-Fine Attention |
Authors | Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, Alexander M. Rush |
Abstract | We present a neural encoder-decoder model to convert images into presentational markup based on a scalable coarse-to-fine attention mechanism. Our method is evaluated in the context of image-to-LaTeX generation, and we introduce a new dataset of real-world rendered mathematical expressions paired with LaTeX markup. We show that unlike neural OCR techniques using CTC-based models, attention-based approaches can tackle this non-standard OCR task. Our approach outperforms classical mathematical OCR systems by a large margin on in-domain rendered data, and, with pretraining, also performs well on out-of-domain handwritten data. To reduce the inference complexity associated with the attention-based approaches, we introduce a new coarse-to-fine attention layer that selects a support region before applying attention. |
Tasks | Optical Character Recognition |
Published | 2016-09-16 |
URL | http://arxiv.org/abs/1609.04938v2 |
http://arxiv.org/pdf/1609.04938v2.pdf | |
PWC | https://paperswithcode.com/paper/image-to-markup-generation-with-coarse-to |
Repo | https://github.com/harvardnlp/im2markup |
Framework | torch |
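The coarse-to-fine idea can be illustrated in a few lines. A simplified sketch, assuming the encoder features are already grouped into coarse regions and that the support region is chosen by a hard argmax over coarse scores (the paper's actual selection mechanism and scoring function may differ): fine-grained softmax attention is then computed only inside the selected region, so the cost of the fine step scales with the region size rather than the whole image. The function names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_to_fine_attention(coarse_scores, fine_scores):
    """coarse_scores[r] scores region r; fine_scores[r] lists the scores of the cells in region r.

    Returns the selected region index and one attention weight per fine cell,
    with zero weight everywhere outside the selected region.
    """
    selected = max(range(len(coarse_scores)), key=lambda r: coarse_scores[r])
    weights = []
    for r, cells in enumerate(fine_scores):
        if r == selected:
            weights.append(softmax(cells))  # fine attention only inside the support region
        else:
            weights.append([0.0] * len(cells))
    return selected, weights


selected, weights = coarse_to_fine_attention(
    [0.1, 2.0, -1.0], [[1.0, 2.0], [0.5, 0.5, 3.0], [4.0]]
)
```

A hard argmax is not differentiable, so training the region selection end to end requires additional machinery that this sketch does not cover.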
Deep Learning with Eigenvalue Decay Regularizer
Title | Deep Learning with Eigenvalue Decay Regularizer |
Authors | Oswaldo Ludwig |
Abstract | This paper extends our previous work on regularization of neural networks using Eigenvalue Decay by employing a soft approximation of the dominant eigenvalue in order to enable the calculation of its derivatives with respect to the synaptic weights, and therefore the application of back-propagation, which is a primary demand for deep learning. Moreover, we extend our previous theoretical analysis to deep neural networks and multiclass classification problems. Our method is implemented as an additional regularizer in Keras, a modular neural networks library written in Python, and evaluated on the benchmark datasets Reuters Newswire Topics Classification, the IMDB database for binary sentiment classification, the MNIST database of handwritten digits, and the CIFAR-10 dataset for image classification. |
Tasks | Image Classification, Sentiment Analysis |
Published | 2016-04-24 |
URL | http://arxiv.org/abs/1604.06985v3 |
http://arxiv.org/pdf/1604.06985v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-with-eigenvalue-decay |
Repo | https://github.com/oswaldoludwig/Eigenvalue-Decay-Regularizer-for-Keras |
Framework | tf |
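Eigenvalue Decay needs an estimate of a dominant eigenvalue. As an illustration only, assuming the penalized quantity is the dominant eigenvalue of $W^\top W$ for a layer's weight matrix $W$, the sketch below estimates it by power iteration, a standard technique swapped in here; it is not necessarily the soft approximation the paper derives, and the function names, iteration count, and `coeff` are assumptions.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dominant_eigenvalue_wtw(w, iterations=100):
    """Power-iteration estimate of the largest eigenvalue of W^T W."""
    wt = transpose(w)
    v = [1.0] * len(w[0])
    for _ in range(iterations):
        u = matvec(wt, matvec(w, v))  # u = W^T W v
        norm = sum(x * x for x in u) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in u]
    # Rayleigh quotient v^T (W^T W) v with ||v|| = 1, which equals ||W v||^2.
    return sum(x * x for x in matvec(w, v))


def eigenvalue_decay_penalty(w, coeff):
    # Hypothetical regularization term: coeff * lambda_max(W^T W).
    return coeff * dominant_eigenvalue_wtw(w)
```

For example, for $W = \mathrm{diag}(2, 1)$ the estimate converges to 4. Training requires the penalty to be differentiable with respect to $W$, which is why the paper introduces a soft approximation; differentiating through a fixed number of power-iteration steps with an autodiff framework would be a design choice of this sketch, not the paper's method.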
Gated-Attention Readers for Text Comprehension
Title | Gated-Attention Readers for Text Comprehension |
Authors | Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov |
Abstract | In this paper we study the problem of answering cloze-style questions over documents. Our model, the Gated-Attention (GA) Reader, integrates a multi-hop architecture with a novel attention mechanism, which is based on multiplicative interactions between the query embedding and the intermediate states of a recurrent neural network document reader. This enables the reader to build query-specific representations of tokens in the document for accurate answer selection. The GA Reader obtains state-of-the-art results on three benchmarks for this task: the CNN & Daily Mail news stories and the Who Did What dataset. The effectiveness of multiplicative interaction is demonstrated by an ablation study, and by comparing to alternative compositional operators for implementing the gated-attention. The code is available at https://github.com/bdhingra/ga-reader. |
Tasks | Answer Selection, Open-Domain Question Answering, Question Answering, Reading Comprehension |
Published | 2016-06-05 |
URL | http://arxiv.org/abs/1606.01549v3 |
http://arxiv.org/pdf/1606.01549v3.pdf | |
PWC | https://paperswithcode.com/paper/gated-attention-readers-for-text |
Repo | https://github.com/aartika/experiment1 |
Framework | tf |
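The multiplicative interaction can be written out directly. A minimal sketch of one gating step, assuming token vectors are plain Python lists of equal dimension: for each document token $d_i$, attention weights $\alpha_i = \mathrm{softmax}(Q^\top d_i)$ over the query tokens give a token-specific query summary $\tilde{q}_i = Q\alpha_i$, which gates the document token element-wise, $x_i = d_i \odot \tilde{q}_i$. The recurrent encoders and the multiple hops of the full reader are omitted, and the function names are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_attention(doc_tokens, query_tokens):
    """doc_tokens: list of d_i vectors; query_tokens: list of q_j vectors (same dimension)."""
    gated = []
    for d in doc_tokens:
        # Attention of this document token over all query tokens.
        alpha = softmax([dot(q, d) for q in query_tokens])
        # Token-specific query summary: q_tilde = sum_j alpha_j * q_j.
        q_tilde = [sum(a * q[k] for a, q in zip(alpha, query_tokens)) for k in range(len(d))]
        # Multiplicative gating of the document token.
        gated.append([dk * qk for dk, qk in zip(d, q_tilde)])
    return gated
```

With a single query token the attention is trivially 1, so each document token is simply multiplied element-wise by that query vector; with several query tokens, each document token is gated by its own mixture of them.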
Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks
Title | Breast Mass Classification from Mammograms using Deep Convolutional Neural Networks |
Authors | Daniel Lévy, Arzav Jain |
Abstract | Mammography is the most widely used method to screen breast cancer. Because of its mostly manual nature, variability in mass appearance, and low signal-to-noise ratio, a significant number of breast masses are missed or misdiagnosed. In this work, we present how Convolutional Neural Networks can be used to directly classify pre-segmented breast masses in mammograms as benign or malignant, using a combination of transfer learning, careful pre-processing and data augmentation to overcome limited training data. We achieve state-of-the-art results on the DDSM dataset, surpassing human performance, and show interpretability of our model. |
Tasks | Data Augmentation, Transfer Learning |
Published | 2016-12-02 |
URL | http://arxiv.org/abs/1612.00542v1 |
http://arxiv.org/pdf/1612.00542v1.pdf | |
PWC | https://paperswithcode.com/paper/breast-mass-classification-from-mammograms |
Repo | https://github.com/Clawton92/Classification_mammograms_cnn |
Framework | none |
Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks
Title | Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks |
Authors | Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan |
Abstract | Collecting well-annotated image datasets to train modern machine learning algorithms is prohibitively expensive for many tasks. One appealing alternative is rendering synthetic data where ground-truth annotations are generated automatically. Unfortunately, models trained purely on rendered images often fail to generalize to real images. To address this shortcoming, prior work introduced unsupervised domain adaptation algorithms that attempt to map representations between the two domains or learn to extract features that are domain-invariant. In this work, we present a new approach that learns, in an unsupervised manner, a transformation in the pixel space from one domain to the other. Our generative adversarial network (GAN)-based method adapts source-domain images to appear as if drawn from the target domain. Our approach not only produces plausible samples, but also outperforms the state-of-the-art on a number of unsupervised domain adaptation scenarios by large margins. Finally, we demonstrate that the adaptation process generalizes to object classes unseen during training. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2016-12-16 |
URL | http://arxiv.org/abs/1612.05424v2 |
http://arxiv.org/pdf/1612.05424v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-pixel-level-domain-adaptation |
Repo | https://github.com/tensorflow/models/tree/master/research/domain_adaptation |
Framework | tf |
RMPE: Regional Multi-person Pose Estimation
Title | RMPE: Regional Multi-person Pose Estimation |
Authors | Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu |
Abstract | Multi-person pose estimation in the wild is challenging. Although state-of-the-art human detectors have demonstrated good performance, small errors in localization and recognition are inevitable. These errors can cause failures for a single-person pose estimator (SPPE), especially for methods that solely depend on human detection results. In this paper, we propose a novel regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes. Our framework consists of three components: Symmetric Spatial Transformer Network (SSTN), Parametric Pose Non-Maximum-Suppression (NMS), and Pose-Guided Proposals Generator (PGPG). Our method is able to handle inaccurate bounding boxes and redundant detections, allowing it to achieve a 17% increase in mAP over the state-of-the-art methods on the MPII (multi-person) dataset. Our model and source code are publicly available. |
Tasks | Human Detection, Multi-Person Pose Estimation, Pose Estimation |
Published | 2016-12-01 |
URL | http://arxiv.org/abs/1612.00137v5 |
http://arxiv.org/pdf/1612.00137v5.pdf | |
PWC | https://paperswithcode.com/paper/rmpe-regional-multi-person-pose-estimation |
Repo | https://github.com/Fangyh09/pose_nms |
Framework | none |
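For context, Parametric Pose NMS addresses the same problem as ordinary non-maximum suppression: removing redundant detections. The sketch below is standard greedy box NMS with intersection-over-union, swapped in for illustration; it is not the paper's pose-based criterion, and the 0.5 threshold is an assumption.

```python
def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any already-kept box.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping boxes (0, 0, 10, 10) and (1, 1, 11, 11) have IoU 81/119, so only the higher-scoring one survives. RMPE's variant, as its name indicates, makes this redundancy decision on poses rather than on boxes.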
Designing Neural Network Architectures using Reinforcement Learning
Title | Designing Neural Network Architectures using Reinforcement Learning |
Authors | Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar |
Abstract | At present, designing convolutional neural network (CNN) architectures requires both human expertise and labor. New architectures are handcrafted by careful experimentation or modified from a handful of existing networks. We introduce MetaQNN, a meta-modeling algorithm based on reinforcement learning to automatically generate high-performing CNN architectures for a given learning task. The learning agent is trained to sequentially choose CNN layers using $Q$-learning with an $\epsilon$-greedy exploration strategy and experience replay. The agent explores a large but finite space of possible architectures and iteratively discovers designs with improved performance on the learning task. On image classification benchmarks, the agent-designed networks (consisting of only standard convolution, pooling, and fully-connected layers) beat existing networks designed with the same layer types and are competitive against the state-of-the-art methods that use more complex layer types. We also outperform existing meta-modeling approaches for network design on image classification tasks. |
Tasks | Image Classification, Q-Learning |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02167v3 |
http://arxiv.org/pdf/1611.02167v3.pdf | |
PWC | https://paperswithcode.com/paper/designing-neural-network-architectures-using |
Repo | https://github.com/SAGNIKMJR/MetaQNN_ImageGenerationVCAE_PyTorch |
Framework | pytorch |
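The agent's learning rule is tabular $Q$-learning. A minimal sketch of one update and of $\epsilon$-greedy action selection, assuming states and actions are hashable descriptions of layer choices; the learning rate `alpha`, discount `gamma`, and the state and action names are illustrative parameters, not values taken from the paper. Experience replay amounts to re-applying `q_update` to stored transitions.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions, alpha, gamma):
    """Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma * max_a' Q(s',a')]."""
    # A terminal state has no next actions, so its future value is 0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * (reward + gamma * best_next)


q = defaultdict(float)  # unseen (state, action) pairs start at 0
q_update(q, "input", "conv3x3", 1.0, "terminate", [], alpha=0.5, gamma=1.0)
```

Annealing `epsilon` from 1 toward 0 over training shifts the agent from exploring random architectures to exploiting the layer choices it has learned to value.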
Variational Boosting: Iteratively Refining Posterior Approximations
Title | Variational Boosting: Iteratively Refining Posterior Approximations |
Authors | Andrew C. Miller, Nicholas Foti, Ryan P. Adams |
Abstract | We propose a black-box variational inference method to approximate intractable distributions with an increasingly rich approximating class. Our method, termed variational boosting, iteratively refines an existing variational approximation by solving a sequence of optimization problems, allowing the practitioner to trade computation time for accuracy. We show how to expand the variational approximating class by incorporating additional covariance structure and by introducing new components to form a mixture. We apply variational boosting to synthetic and real statistical models, and show that resulting posterior inferences compare favorably to existing posterior approximation algorithms in both accuracy and efficiency. |
Tasks | |
Published | 2016-11-20 |
URL | http://arxiv.org/abs/1611.06585v2 |
http://arxiv.org/pdf/1611.06585v2.pdf | |
PWC | https://paperswithcode.com/paper/variational-boosting-iteratively-refining |
Repo | https://github.com/andymiller/vboost |
Framework | none |
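Adding a mixture component is one of the two ways the abstract describes to enrich the approximating class. A minimal one-dimensional sketch, assuming Gaussian components: the refined approximation is $q_{\text{new}}(x) = (1-\rho)\,q_{\text{old}}(x) + \rho\,c(x)$ with mixing weight $\rho \in [0,1]$. How the paper optimizes the new component and $\rho$ is not reproduced, and the names are illustrative.

```python
import math


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def mixture_pdf(x, components):
    """components: list of (weight, mean, std); weights are assumed to sum to 1."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def add_component(components, rho, mean, std):
    """Shrink existing weights by (1 - rho) and append a new component with weight rho."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return [((1 - rho) * w, m, s) for w, m, s in components] + [(rho, mean, std)]


q = [(1.0, 0.0, 1.0)]             # initial single-Gaussian approximation
q = add_component(q, 0.3, 3.0, 0.5)  # one boosting step adds a second component
```

Because $\rho = 0$ recovers the previous approximation exactly, each refinement step can only match or improve the best objective value found so far, which is what lets the practitioner trade computation time for accuracy.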
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
Title | Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model |
Authors | Sheng Wang, Siqi Sun, Zhen Li, Renyu Zhang, Jinbo Xu |
Abstract | Recently, exciting progress has been made on protein contact prediction, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual networks. This deep neural network allows us to model very complex sequence-contact relationships as well as long-range inter-contact correlation. Our method greatly outperforms existing contact prediction methods and leads to much more accurate contact-assisted protein folding. Tested on three datasets of 579 proteins, the average top L long-range prediction accuracy obtained by our method, the representative EC method CCMpred, and the CASP11 winner MetaPSICOV is 0.47, 0.21, and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred, and MetaPSICOV is 0.77, 0.47, and 0.59, respectively. Ab initio folding using our predicted contacts as restraints can yield correct folds (i.e., TMscore>0.6) for 203 test proteins, while folding with MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 proteins, respectively. Further, our contact-assisted models have much better quality than template-based models. Using our predicted contacts as restraints, we can (ab initio) fold 208 of the 398 membrane proteins with TMscore>0.5. By contrast, when the training proteins of our method are used as templates, homology modeling can only do so for 10 of them. One interesting finding is that even though we do not train our prediction models with any membrane proteins, our method works very well on membrane protein prediction. Finally, in the recent blind CAMEO benchmark, our method successfully folded 5 test proteins with a novel fold. |
Tasks | |
Published | 2016-09-02 |
URL | http://arxiv.org/abs/1609.00680v6 |
http://arxiv.org/pdf/1609.00680v6.pdf | |
PWC | https://paperswithcode.com/paper/accurate-de-novo-prediction-of-protein |
Repo | https://github.com/j3xugit/RaptorX-Contact |
Framework | none |
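The top-$L$ and top-$L/10$ figures are precisions over the highest-scoring predicted residue pairs, where $L$ is the sequence length. A minimal sketch of that metric, assuming the common convention that a long-range pair has a sequence separation of at least 24 residues; the exact contact definition and tie handling used in the paper are not reproduced, and the function name is hypothetical.

```python
def top_long_range_accuracy(scores, true_contacts, length, fraction=1, min_sep=24):
    """Precision of the top (length // fraction) long-range predictions.

    scores: dict mapping residue pairs (i, j), i < j, to a predicted contact probability.
    true_contacts: set of residue pairs (i, j), i < j, that are in contact.
    """
    # Keep only long-range pairs, then rank them by predicted probability.
    candidates = [(p, pair) for pair, p in scores.items() if pair[1] - pair[0] >= min_sep]
    candidates.sort(reverse=True)
    top = candidates[: max(1, length // fraction)]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in true_contacts)
    return hits / len(top)
```

Calling it with `fraction=10` gives the top-$L/10$ variant; that smaller set contains only the most confident predictions, which is consistent with every method in the abstract scoring higher on top-$L/10$ than on top-$L$.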
Monte Carlo Structured SVI for Two-Level Non-Conjugate Models
Title | Monte Carlo Structured SVI for Two-Level Non-Conjugate Models |
Authors | Rishit Sheth, Roni Khardon |
Abstract | The stochastic variational inference (SVI) paradigm, which combines variational inference, natural gradients, and stochastic updates, was recently proposed for large-scale data analysis in conjugate Bayesian models and demonstrated to be effective in several problems. This paper studies a family of Bayesian latent variable models with two levels of hidden variables but without any conjugacy requirements, making several contributions in this context. The first is observing that SVI, with an improved structured variational approximation, is applicable under more general conditions than previously thought, with the only requirement being that the approximating variational distribution be in the same family as the prior. The resulting approach, Monte Carlo Structured SVI (MC-SSVI), significantly extends the scope of SVI, enabling large-scale learning in non-conjugate models. For models with latent Gaussian variables we propose a hybrid algorithm, using both standard and natural gradients, which is shown to improve stability and convergence. Applications in mixed effects models, sparse Gaussian processes, probabilistic matrix factorization and correlated topic models demonstrate the generality of the approach and the advantages of the proposed algorithms. |
Tasks | Gaussian Processes, Latent Variable Models, Topic Models |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03957v3 |
http://arxiv.org/pdf/1612.03957v3.pdf | |
PWC | https://paperswithcode.com/paper/monte-carlo-structured-svi-for-two-level-non |
Repo | https://github.com/KaikaiZhao/Sparse-Variational-Inference-for-Generalized-Gaussian-Process-Models---Tutorial |
Framework | none |
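At the core of SVI is a noisy natural-gradient step on the global variational parameters. A minimal sketch of that step, assuming the Robbins-Monro step size $\rho_t = (t + \tau)^{-\kappa}$ commonly used with SVI and an intermediate estimate `lambda_hat` computed from a minibatch by the caller; the Monte Carlo estimation and the hybrid standard/natural-gradient scheme of MC-SSVI are not shown, and the default `tau` and `kappa` are assumptions.

```python
def step_size(t, tau=1.0, kappa=0.7):
    """Robbins-Monro schedule; kappa in (0.5, 1] makes sum(rho) diverge and sum(rho^2) converge."""
    return (t + tau) ** (-kappa)


def svi_update(lam, lambda_hat, rho):
    """Natural-gradient step written as (1 - rho) * lambda + rho * lambda_hat."""
    return [(1 - rho) * l + rho * lh for l, lh in zip(lam, lambda_hat)]


lam = [0.0, 0.0]
for t in range(3):
    # In practice lambda_hat is recomputed from a fresh minibatch at every step.
    lam = svi_update(lam, [2.0, -1.0], step_size(t))
```

The decaying step size averages out minibatch noise over time, while the divergent sum of steps still lets the parameters move arbitrarily far from their initialization.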
S3Pool: Pooling with Stochastic Spatial Sampling
Title | S3Pool: Pooling with Stochastic Spatial Sampling |
Authors | Shuangfei Zhai, Hui Wu, Abhishek Kumar, Yu Cheng, Yongxi Lu, Zhongfei Zhang, Rogerio Feris |
Abstract | Feature pooling layers (e.g., max pooling) in convolutional neural networks (CNNs) serve the dual purpose of providing increasingly abstract representations as well as yielding computational savings in subsequent convolutional layers. We view the pooling operation in CNNs as a two-step procedure: first, a pooling window (e.g., $2\times 2$) slides over the feature map with stride one which leaves the spatial resolution intact, and second, downsampling is performed by selecting one pixel from each non-overlapping pooling window in an often uniform and deterministic (e.g., top-left) manner. Our starting point in this work is the observation that this regularly spaced downsampling arising from non-overlapping windows, although intuitive from a signal processing perspective (which has the goal of signal reconstruction), is not necessarily optimal for \emph{learning} (where the goal is to generalize). We study this aspect and propose a novel pooling strategy with stochastic spatial sampling (S3Pool), where the regular downsampling is replaced by a more general stochastic version. We observe that this general stochasticity acts as a strong regularizer, and can also be seen as doing implicit data augmentation by introducing distortions in the feature maps. We further introduce a mechanism to control the amount of distortion to suit different datasets and architectures. To demonstrate the effectiveness of the proposed approach, we perform extensive experiments on several popular image classification benchmarks, observing excellent improvements over baseline models. Experimental code is available at https://github.com/Shuangfei/s3pool. |
Tasks | Data Augmentation, Image Classification |
Published | 2016-11-16 |
URL | http://arxiv.org/abs/1611.05138v1 |
http://arxiv.org/pdf/1611.05138v1.pdf | |
PWC | https://paperswithcode.com/paper/s3pool-pooling-with-stochastic-spatial |
Repo | https://github.com/Shuangfei/s3pool |
Framework | none |
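The two-step view of pooling in the abstract can be sketched directly. A simplified reading, assuming a single-channel feature map stored as nested lists: a 2×2 max-pooling window is applied with stride one (edges handled by clamping, an assumption), and then, within each block of `grid` rows and, independently, each block of `grid` columns, `grid // stride` indices are kept, chosen uniformly at random and kept in order. With `grid` equal to `stride`, each block contributes one random row or column instead of always the top-left one. The function names and edge handling are hypothetical and may differ from the released code.

```python
import random


def max_pool_stride1(x, window=2):
    """Max pooling with stride one; windows at the border are clamped to the map."""
    h, w = len(x), len(x[0])
    return [
        [
            max(x[min(i + di, h - 1)][min(j + dj, w - 1)]
                for di in range(window) for dj in range(window))
            for j in range(w)
        ]
        for i in range(h)
    ]


def sample_indices(n, grid, stride, rng):
    """Keep grid // stride random indices (in order) from each block of `grid` indices."""
    keep = []
    for start in range(0, n, grid):
        block = list(range(start, min(start + grid, n)))
        count = max(1, len(block) // stride)
        keep.extend(sorted(rng.sample(block, count)))
    return keep


def s3pool(x, grid, stride, rng):
    pooled = max_pool_stride1(x)
    rows = sample_indices(len(pooled), grid, stride, rng)
    cols = sample_indices(len(pooled[0]), grid, stride, rng)
    return [[pooled[r][c] for c in cols] for r in rows]


x = [[4 * i + j for j in range(4)] for i in range(4)]
out = s3pool(x, grid=4, stride=2, rng=random.Random(1))  # 4x4 map -> 2x2 map
```

Larger `grid` values allow the kept rows and columns to drift further from a regular lattice, which corresponds to the abstract's mechanism for controlling the amount of distortion.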
Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions
Title | Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions |
Authors | Fabio Carrara, Andrea Esuli, Tiziano Fagni, Fabrizio Falchi, Alejandro Moreo Fernández |
Abstract | In this paper we tackle the problem of image search when the query is a short textual description of the image the user is looking for. We choose to implement the actual search process as a similarity search in a visual feature space, by learning to translate a textual query into a visual representation. Searching in the visual feature space has the advantage that any update to the translation model does not require reprocessing the (typically huge) image collection on which the search is performed. We propose Text2Vis, a neural network that generates a visual representation, in the visual feature space of the fc6-fc7 layers of ImageNet, from a short descriptive text. Text2Vis optimizes two loss functions, using a stochastic loss-selection method. A visual-focused loss is aimed at learning the actual text-to-visual feature mapping, while a text-focused loss is aimed at modeling the higher-level semantic concepts expressed in language and countering the overfitting of the visual loss to non-relevant visual components. We report preliminary results on the MS-COCO dataset. |
Tasks | Cross-Modal Information Retrieval, Cross-Modal Retrieval, Image Retrieval |
Published | 2016-06-23 |
URL | http://arxiv.org/abs/1606.07287v1 |
http://arxiv.org/pdf/1606.07287v1.pdf | |
PWC | https://paperswithcode.com/paper/picture-it-in-your-mind-generating-high-level |
Repo | https://github.com/AlexMoreo/tensorflow-Text2Vis |
Framework | tf |
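Once a text query has been translated into the visual feature space, retrieval reduces to a nearest-neighbour search over precomputed image features, which is why the image collection never needs reprocessing when the translation model changes. A minimal sketch, assuming cosine similarity and an in-memory dictionary of features; the similarity measure is not specified in this abstract, and `predicted` stands in for a Text2Vis output vector.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rank_images(predicted, image_features, top_k=5):
    """Return image ids sorted by similarity to the predicted visual vector."""
    scored = sorted(
        image_features.items(), key=lambda kv: cosine(predicted, kv[1]), reverse=True
    )
    return [image_id for image_id, _ in scored[:top_k]]
```

Only `predicted` depends on the text model; `image_features` is computed once from the image collection, so retraining the translation network leaves the index untouched.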