May 7, 2019

Paper Group AWR 81

Generative Adversarial Networks as Variational Training of Energy Based Models. Learning Representations for Counterfactual Inference. An Empirical Study of Language CNN for Image Captioning. Text-guided Attention Model for Image Captioning. IM2CAD. RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised View …

Generative Adversarial Networks as Variational Training of Energy Based Models

Title Generative Adversarial Networks as Variational Training of Energy Based Models
Authors Shuangfei Zhai, Yu Cheng, Rogerio Feris, Zhongfei Zhang
Abstract In this paper, we study deep generative models for effective unsupervised learning. We propose VGAN, which works by minimizing a variational lower bound of the negative log likelihood (NLL) of an energy based model (EBM), where the model density $p(\mathbf{x})$ is approximated by a variational distribution $q(\mathbf{x})$ that is easy to sample from. The training of VGAN takes a two-step procedure: given $p(\mathbf{x})$, $q(\mathbf{x})$ is updated to maximize the lower bound; $p(\mathbf{x})$ is then updated one step with samples drawn from $q(\mathbf{x})$ to decrease the lower bound. VGAN is inspired by generative adversarial networks (GANs), where $p(\mathbf{x})$ corresponds to the discriminator and $q(\mathbf{x})$ corresponds to the generator, but with several notable differences. We hence name our model variational GANs (VGANs). VGAN provides a practical solution to training deep EBMs in high dimensional space by eliminating the need for MCMC sampling. From this view, we are also able to identify causes of the difficulty of training GANs and propose viable solutions. (Experimental code is available at https://github.com/Shuangfei/vgan)
Tasks
Published 2016-11-06
URL http://arxiv.org/abs/1611.01799v1
PDF http://arxiv.org/pdf/1611.01799v1.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-networks-as
Repo https://github.com/Shuangfei/vgan
Framework none
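
The two-step procedure maps naturally onto a GAN-style training loop. Below is a minimal PyTorch sketch of the alternation under assumed toy MLP networks, with the entropy term of the bound omitted; it illustrates the idea rather than the paper's exact setup.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the paper's networks (assumed: 784-dim data, 64-dim noise).
E = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))   # energy of p(x)
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784))  # sampler for q(x)
opt_E = torch.optim.Adam(E.parameters(), lr=1e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)

def train_step(x_real):
    # Step 1: update q(x) to tighten the bound, i.e. minimize E_q[E(x)]
    # (the entropy term H(q) is omitted in this sketch).
    loss_G = E(G(torch.randn(x_real.size(0), 64))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

    # Step 2: update p(x) one step with samples from q(x) to decrease the bound:
    # push energy down on real data, up on generated samples.
    x_fake = G(torch.randn(x_real.size(0), 64)).detach()
    loss_E = E(x_real).mean() - E(x_fake).mean()
    opt_E.zero_grad(); loss_E.backward(); opt_E.step()
```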

Learning Representations for Counterfactual Inference

Title Learning Representations for Counterfactual Inference
Authors Fredrik D. Johansson, Uri Shalit, David Sontag
Abstract Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, “Would this patient have lower blood sugar had she received a different medication?” We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art.
Tasks Causal Inference, Counterfactual Inference, Domain Adaptation, Representation Learning
Published 2016-05-12
URL http://arxiv.org/abs/1605.03661v3
PDF http://arxiv.org/pdf/1605.03661v3.pdf
PWC https://paperswithcode.com/paper/learning-representations-for-counterfactual
Repo https://github.com/inzouzouwetrust/BSTATS
Framework tf
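
The framework pairs a factual prediction loss with a penalty that balances treated and control units in representation space. A hedged PyTorch sketch, using a simple mean-difference surrogate for the discrepancy term (the paper's penalty and architectures differ):

```python
import torch
import torch.nn as nn

# Assumed shapes: x (B, 25) covariates, t (B, 1) treatment in {0, 1}, y (B, 1) outcome.
phi = nn.Sequential(nn.Linear(25, 64), nn.ReLU(), nn.Linear(64, 64))  # representation
head = nn.Linear(64 + 1, 1)                                           # outcome from (phi(x), t)

def loss(x, t, y, alpha=1.0):
    r = phi(x)
    factual = ((head(torch.cat([r, t], dim=1)) - y) ** 2).mean()  # fit observed outcomes
    # Balance surrogate: distance between mean representations of treated and
    # control groups (assumes the batch contains both groups).
    treated = t.squeeze(1) > 0.5
    balance = (r[treated].mean(0) - r[~treated].mean(0)).pow(2).sum()
    return factual + alpha * balance
```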

An Empirical Study of Language CNN for Image Captioning

Title An Empirical Study of Language CNN for Image Captioning
Authors Jiuxiang Gu, Gang Wang, Jianfei Cai, Tsuhan Chen
Abstract Language models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a language CNN model which is suited to statistical language modeling tasks and shows competitive performance in image captioning. In contrast to previous models, which predict the next word based on one previous word and a hidden state, our language CNN is fed with all the previous words and can model the long-range dependencies of the history words, which are critical for image captioning. The effectiveness of our approach is validated on two datasets: MS COCO and Flickr30K. Our extensive experimental results show that our method outperforms vanilla recurrent neural network based language models and is competitive with the state-of-the-art methods.
Tasks Image Captioning, Language Modelling
Published 2016-12-21
URL http://arxiv.org/abs/1612.07086v3
PDF http://arxiv.org/pdf/1612.07086v3.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-of-language-cnn-for-image
Repo https://github.com/showkeyjar/chinese_im2text.pytorch
Framework pytorch
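
The architectural point is that the next word is conditioned on the entire word history through convolutions rather than a single recurrent state. A minimal sketch of that idea, with illustrative sizes (the paper stacks more layers and couples the CNN with a recurrent image-conditioned decoder):

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10000, 256  # illustrative sizes

class LanguageCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.Sequential(
            nn.Conv1d(emb_dim, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.out = nn.Linear(256, vocab_size)

    def forward(self, history):                  # (batch, t) ids of ALL previous words
        h = self.embed(history).transpose(1, 2)  # (batch, emb_dim, t)
        h = self.convs(h).max(dim=2).values      # pool over the full history
        return self.out(h)                       # logits for the next word
```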

Text-guided Attention Model for Image Captioning

Title Text-guided Attention Model for Image Captioning
Authors Jonghwan Mun, Minsu Cho, Bohyung Han
Abstract Visual attention plays an important role in understanding images and has proven effective for generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns to drive visual attention using associated captions. For this model, we propose an exemplar-based learning approach that retrieves captions associated with each image from the training data and uses them to learn attention on visual features. Our attention model makes it possible to describe detailed scene states by effectively distinguishing small or easily confusable objects. We validate our model on the MS-COCO Captioning benchmark and achieve state-of-the-art performance on standard metrics.
Tasks Image Captioning
Published 2016-12-12
URL http://arxiv.org/abs/1612.03557v1
PDF http://arxiv.org/pdf/1612.03557v1.pdf
PWC https://paperswithcode.com/paper/text-guided-attention-model-for-image
Repo https://github.com/vikramnitin9/nnfl
Framework tf
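
At its core, the model scores image regions against an encoding of a caption retrieved for the input image. A minimal sketch of such text-guided attention, assuming precomputed region features and a caption encoding of matching width:

```python
import torch
import torch.nn as nn

d = 512  # assumed shared feature width

class TextGuidedAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.w_v = nn.Linear(d, d)   # projects visual region features
        self.w_t = nn.Linear(d, d)   # projects the retrieved-caption encoding
        self.score = nn.Linear(d, 1)

    def forward(self, regions, guide):  # regions: (B, R, d), guide: (B, d)
        e = torch.tanh(self.w_v(regions) + self.w_t(guide).unsqueeze(1))
        a = torch.softmax(self.score(e).squeeze(-1), dim=1)  # (B, R) text-guided weights
        return (a.unsqueeze(-1) * regions).sum(1)            # attended visual feature
```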

IM2CAD

Title IM2CAD
Authors Hamid Izadinia, Qi Shan, Steven M. Seitz
Abstract Given a single photo of a room and a large database of furniture CAD models, our goal is to reconstruct a scene that is as similar as possible to the scene depicted in the photograph, and composed of objects drawn from the database. We present a completely automatic system to address this IM2CAD problem that produces high quality results on challenging imagery from interior home design and remodeling websites. Our approach iteratively optimizes the placement and scale of objects in the room to best match scene renderings to the input photo, using image comparison metrics trained via deep convolutional neural nets. By operating jointly on the full scene at once, we account for inter-object occlusions. We also show the applicability of our method on standard scene understanding benchmarks, where we obtain significant improvements.
Tasks Scene Understanding
Published 2016-08-18
URL http://arxiv.org/abs/1608.05137v2
PDF http://arxiv.org/pdf/1608.05137v2.pdf
PWC https://paperswithcode.com/paper/im2cad
Repo https://github.com/BenjaminPoilve/Deep-Learning-ressources
Framework tf
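
The optimization reads as a render-and-compare loop. A high-level sketch, where `render`, `cnn_distance` and `propose_perturbation` are hypothetical stand-ins for the paper's renderer, learned image metric and proposal scheme:

```python
import random

def optimize_scene(photo, scene, iters=1000):
    # cnn_distance: image comparison metric trained via deep convolutional nets.
    best = cnn_distance(render(scene), photo)
    for _ in range(iters):
        obj = random.choice(scene.objects)
        old_pos, old_scale = obj.position, obj.scale
        obj.position, obj.scale = propose_perturbation(obj)  # jitter placement/scale
        d = cnn_distance(render(scene), photo)  # full-scene render handles occlusion
        if d < best:
            best = d                                      # keep the improving move
        else:
            obj.position, obj.scale = old_pos, old_scale  # revert
    return scene
```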

RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints

Title RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints
Authors Asako Kanezaki, Yasuyuki Matsushita, Yoshifumi Nishida
Abstract We propose a Convolutional Neural Network (CNN)-based model “RotationNet,” which takes multi-view images of an object as input and jointly estimates its pose and object category. Unlike previous approaches that use known viewpoint labels for training, our method treats the viewpoint labels as latent variables, which are learned in an unsupervised manner during training using an unaligned object dataset. RotationNet is designed to use only a partial set of multi-view images for inference, and this property makes it useful in practical scenarios where only partial views are available. Moreover, our pose alignment strategy enables one to obtain view-specific feature representations shared across classes, which is important for maintaining high accuracy in both object categorization and pose estimation. The effectiveness of RotationNet is demonstrated by its superior performance over state-of-the-art methods for 3D object classification on the 10- and 40-class ModelNet datasets. We also show that RotationNet, even when trained without known poses, achieves state-of-the-art performance on an object pose estimation dataset. The code is available at https://github.com/kanezaki/rotationnet
Tasks 3D Object Classification, Object Classification, Pose Estimation
Published 2016-03-20
URL http://arxiv.org/abs/1603.06208v4
PDF http://arxiv.org/pdf/1603.06208v4.pdf
PWC https://paperswithcode.com/paper/rotationnet-joint-object-categorization-and
Repo https://github.com/kanezaki/rotationnet
Framework none
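
Because viewpoint labels are latent, inference searches over candidate view alignments and keeps the one that scores best. A sketch of that step, assuming a `per_view_logits` tensor produced by a shared per-view CNN (the paper's scoring and "incorrect view" class are simplified away):

```python
import torch

def categorize(per_view_logits):
    # per_view_logits: (V, num_views, num_classes) -- for each of the V observed
    # images, the class scores under each candidate viewpoint assignment.
    V, num_views, _ = per_view_logits.shape
    best_score, best_class, best_shift = None, None, 0
    for shift in range(num_views):  # candidate rotation of the viewpoint ids
        idx = [(v + shift) % num_views for v in range(V)]
        score = sum(per_view_logits[v, idx[v]] for v in range(V))  # (num_classes,)
        s, c = score.max(dim=0)
        if best_score is None or s > best_score:
            best_score, best_class, best_shift = s, c, shift
    return best_class, best_shift  # category and estimated pose alignment
```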

Learning Recurrent Span Representations for Extractive Question Answering

Title Learning Recurrent Span Representations for Extractive Question Answering
Authors Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, Jonathan Berant
Abstract The reading comprehension task, in which a system answers questions about a given evidence document, is a central problem in natural language understanding. Recent formulations of this task have typically focused on answer selection from a set of candidates pre-defined manually or through the use of an external NLP pipeline. However, Rajpurkar et al. (2016) recently released the SQuAD dataset, in which the answers can be arbitrary strings from the supplied text. In this paper, we focus on this answer extraction task, presenting a novel model architecture that efficiently builds fixed-length representations of all spans in the evidence document with a recurrent network. We show that scoring explicit span representations significantly improves performance over other approaches that factor the prediction into separate predictions about words or start and end markers. Our approach improves upon the best published results of Wang & Jiang (2016) by 5% and decreases the error of Rajpurkar et al.’s baseline by > 50%.
Tasks Answer Selection, Question Answering, Reading Comprehension
Published 2016-11-04
URL http://arxiv.org/abs/1611.01436v2
PDF http://arxiv.org/pdf/1611.01436v2.pdf
PWC https://paperswithcode.com/paper/learning-recurrent-span-representations-for
Repo https://github.com/asadovsky/nn
Framework tf
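
The central trick is representing every candidate span with a fixed-length vector built from the recurrent states at its endpoints, then scoring spans directly. A minimal sketch (dimensions and the span-length cap are illustrative):

```python
import torch
import torch.nn as nn

d, max_len = 256, 30  # assumed hidden size and maximum span length
lstm = nn.LSTM(300, d, bidirectional=True, batch_first=True)
scorer = nn.Sequential(nn.Linear(4 * d, d), nn.ReLU(), nn.Linear(d, 1))

def span_scores(word_vecs):     # (1, T, 300) passage word vectors
    h, _ = lstm(word_vecs)      # (1, T, 2d) recurrent states over the passage
    T = h.size(1)
    spans, reps = [], []
    for i in range(T):
        for j in range(i, min(i + max_len, T)):
            spans.append((i, j))
            reps.append(torch.cat([h[0, i], h[0, j]]))  # fixed-length span vector
    return spans, scorer(torch.stack(reps)).squeeze(-1)  # one score per span
```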

Holophrasm: a neural Automated Theorem Prover for higher-order logic

Title Holophrasm: a neural Automated Theorem Prover for higher-order logic
Authors Daniel Whalen
Abstract I propose a system for Automated Theorem Proving in higher-order logic using deep learning and eschewing hand-constructed features. Holophrasm exploits the formalism of the Metamath language and explores partial proof trees using a neural-network-augmented bandit algorithm and a sequence-to-sequence model for action enumeration. The system proves 14% of its test theorems from Metamath’s set.mm module.
Tasks Automated Theorem Proving
Published 2016-08-08
URL http://arxiv.org/abs/1608.02644v2
PDF http://arxiv.org/pdf/1608.02644v2.pdf
PWC https://paperswithcode.com/paper/holophrasm-a-neural-automated-theorem-prover
Repo https://github.com/justin941208/SPIA-Project
Framework none
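
The bandit component can be pictured as a UCB-style rule for choosing which branch of the partial proof tree to expand next. A toy sketch, with node fields (`visits`, `value_sum`, `children`) and the exploration constant assumed for illustration, not the paper's exact rule:

```python
import math

def select_child(node, c=1.0):
    def ucb(child):
        exploit = child.value_sum / max(child.visits, 1)  # value estimate from the network
        explore = c * math.sqrt(math.log(node.visits + 1) / (child.visits + 1))
        return exploit + explore
    return max(node.children, key=ucb)  # branch to expand next
```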

Multi-Perspective Context Matching for Machine Comprehension

Title Multi-Perspective Context Matching for Machine Comprehension
Authors Zhiguo Wang, Haitao Mi, Wael Hamza, Radu Florian
Abstract Previous machine comprehension (MC) datasets are either too small to train end-to-end deep learning models, or not difficult enough to evaluate the ability of current MC techniques. The newly released SQuAD dataset alleviates these limitations, and gives us a chance to develop more realistic MC models. Based on this dataset, we propose a Multi-Perspective Context Matching (MPCM) model, which is an end-to-end system that directly predicts the answer beginning and ending points in a passage. Our model first adjusts each word-embedding vector in the passage by multiplying a relevancy weight computed against the question. Then, we encode the question and weighted passage using bi-directional LSTMs. For each point in the passage, our model matches the context of this point against the encoded question from multiple perspectives and produces a matching vector. Given those matched vectors, we employ another bi-directional LSTM to aggregate all the information and predict the beginning and ending points. Experimental results on the test set of SQuAD show that our model achieves a competitive result on the leaderboard.
Tasks Question Answering, Reading Comprehension
Published 2016-12-13
URL http://arxiv.org/abs/1612.04211v1
PDF http://arxiv.org/pdf/1612.04211v1.pdf
PWC https://paperswithcode.com/paper/multi-perspective-context-matching-for
Repo https://github.com/bloomsburyai/question-generation
Framework tf
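
Two pieces of the pipeline are easy to isolate: the relevancy re-weighting of passage embeddings and the multi-perspective cosine matching. A hedged sketch of both (dimensions are illustrative, and the paper defines several matching strategies beyond this one):

```python
import torch
import torch.nn.functional as F

def relevancy_weight(passage, question):  # passage: (P, d), question: (Q, d)
    sims = F.cosine_similarity(passage.unsqueeze(1), question.unsqueeze(0), dim=-1)
    # Weight each passage word by its best cosine match against the question.
    return passage * sims.max(dim=1).values.unsqueeze(1)

def multi_perspective_match(v1, v2, W):   # v1, v2: (d,), W: (L, d) perspective weights
    # Each row of W re-scales the dimensions before a cosine comparison.
    return F.cosine_similarity(W * v1, W * v2, dim=-1)  # (L,) matching vector
```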

Bidirectional Attention Flow for Machine Comprehension

Title Bidirectional Attention Flow for Machine Comprehension
Authors Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi
Abstract Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query. Recently, attention mechanisms have been successfully extended to MC. Typically these methods use attention to focus on a small portion of the context and summarize it with a fixed-size vector, couple attentions temporally, and/or often form a uni-directional attention. In this paper we introduce the Bi-Directional Attention Flow (BIDAF) network, a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization. Our experimental evaluations show that our model achieves state-of-the-art results on the Stanford Question Answering Dataset (SQuAD) and the CNN/DailyMail cloze test.
Tasks Open-Domain Question Answering, Question Answering, Reading Comprehension
Published 2016-11-05
URL http://arxiv.org/abs/1611.01603v6
PDF http://arxiv.org/pdf/1611.01603v6.pdf
PWC https://paperswithcode.com/paper/bidirectional-attention-flow-for-machine
Repo https://github.com/ghus75/Question_Answering
Framework none
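
The core of BIDAF is a similarity matrix that drives attention in both directions without summarizing the context into one vector. A minimal sketch using a plain dot-product similarity (the paper uses a trainable trilinear function):

```python
import torch
import torch.nn.functional as F

def bidaf_attention(H, U):          # H: (T, d) context, U: (J, d) query
    S = H @ U.t()                   # (T, J) similarity matrix
    a = F.softmax(S, dim=1)         # context-to-query attention weights
    U_tilde = a @ U                 # (T, d) attended query for every context word
    b = F.softmax(S.max(dim=1).values, dim=0)  # query-to-context weights
    h_tilde = (b.unsqueeze(1) * H).sum(0, keepdim=True).expand_as(H)
    # Query-aware context representation, no early summarization of H itself.
    return torch.cat([H, U_tilde, H * U_tilde, H * h_tilde], dim=1)  # (T, 4d)
```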

Dynamic Coattention Networks For Question Answering

Title Dynamic Coattention Networks For Question Answering
Authors Caiming Xiong, Victor Zhong, Richard Socher
Abstract Several deep learning models have been proposed for question answering. However, due to their single-pass nature, they have no way to recover from local maxima corresponding to incorrect answers. To address this problem, we introduce the Dynamic Coattention Network (DCN) for question answering. The DCN first fuses co-dependent representations of the question and the document in order to focus on relevant parts of both. Then a dynamic pointing decoder iterates over potential answer spans. This iterative procedure enables the model to recover from initial local maxima corresponding to incorrect answers. On the Stanford question answering dataset, a single DCN model improves the previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains 80.4% F1.
Tasks Question Answering
Published 2016-11-05
URL http://arxiv.org/abs/1611.01604v4
PDF http://arxiv.org/pdf/1611.01604v4.pdf
PWC https://paperswithcode.com/paper/dynamic-coattention-networks-for-question
Repo https://github.com/lmn-extracts/dcn_plus
Framework tf
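
The coattention encoder is compact enough to sketch: a single affinity matrix yields attention in both directions, and the two summaries are fused. The dynamic pointing decoder that iterates over spans is omitted here, and document/question encodings are assumed given:

```python
import torch
import torch.nn.functional as F

def coattention(D, Q):              # D: (m, d) document, Q: (n, d) question encodings
    L = D @ Q.t()                   # (m, n) affinity matrix
    A_Q = F.softmax(L, dim=0)       # attention over document positions, per question word
    A_D = F.softmax(L, dim=1)       # attention over question positions, per document word
    C_Q = A_Q.t() @ D               # (n, d) document summaries for the question
    C_D = A_D @ torch.cat([Q, C_Q], dim=1)  # (m, 2d) co-dependent representation
    return torch.cat([D, C_D], dim=1)       # fused input for the downstream decoder
```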

DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment

Title DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment
Authors Chang Liu, Yu Cao, Yan Luo, Guanling Chen, Vinod Vokkarane, Yunsheng Ma
Abstract Worldwide, in 2014, more than 1.9 billion adults, 18 years and older, were overweight. Of these, over 600 million were obese. Accurately documenting dietary caloric intake is crucial to manage weight loss, but also presents challenges because most of the current methods for dietary assessment must rely on memory to recall foods eaten. The ultimate goal of our research is to develop computer-aided technical solutions to enhance and improve the accuracy of current measurements of dietary intake. Our proposed system in this paper aims to improve the accuracy of dietary assessment by analyzing the food images captured by mobile devices (e.g., smartphones). The key technical innovation in this paper is the deep learning-based food image recognition algorithms. Substantial research has demonstrated that digital imaging accurately estimates dietary intake in many environments and it has many advantages over other methods. However, how to derive the food information (e.g., food type and portion size) from food images effectively and efficiently remains a challenging and open research problem. We propose a new Convolutional Neural Network (CNN)-based food image recognition algorithm to address this problem. We applied our proposed approach to two real-world food image data sets (UEC-256 and Food-101) and achieved impressive results. To the best of our knowledge, these results outperformed all other reported work using these two data sets. Our experiments have demonstrated that the proposed approach is a promising solution for addressing the food image recognition problem. Our future work includes further improving the performance of the algorithms and integrating our system into a real-world mobile and cloud computing-based system to enhance the accuracy of current measurements of dietary intake.
Tasks Fine-Grained Image Recognition
Published 2016-06-17
URL http://arxiv.org/abs/1606.05675v1
PDF http://arxiv.org/pdf/1606.05675v1.pdf
PWC https://paperswithcode.com/paper/deepfood-deep-learning-based-food-image
Repo https://github.com/deercoder/DeepFood
Framework none
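
In practice, the approach amounts to fine-tuning a pretrained CNN on food images. An illustrative setup (assumptions: a ResNet-50 backbone where the paper builds on Inception-style networks; the torchvision >= 0.13 weights API; the Food-101 class count):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V1")  # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 101)   # replace head for 101 food classes
# Fine-tune end-to-end with cross-entropy on the food image dataset.
```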

Long Short-Term Memory-Networks for Machine Reading

Title Long Short-Term Memory-Networks for Machine Reading
Authors Jianpeng Cheng, Li Dong, Mirella Lapata
Abstract In this paper we address the question of how to render sequence-level networks better at handling structured input. We propose a machine reading simulator which processes text incrementally from left to right and performs shallow reasoning with memory and attention. The reader extends the Long Short-Term Memory architecture with a memory network in place of a single memory cell. This enables adaptive memory usage during recurrence with neural attention, offering a way to weakly induce relations among tokens. The system is initially designed to process a single sequence but we also demonstrate how to integrate it with an encoder-decoder architecture. Experiments on language modeling, sentiment analysis, and natural language inference show that our model matches or outperforms the state of the art.
Tasks Language Modelling, Natural Language Inference, Reading Comprehension, Sentiment Analysis
Published 2016-01-25
URL http://arxiv.org/abs/1601.06733v7
PDF http://arxiv.org/pdf/1601.06733v7.pdf
PWC https://paperswithcode.com/paper/long-short-term-memory-networks-for-machine
Repo https://github.com/JRC1995/Abstractive-Summarization
Framework tf
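
The memory-network replacement for the cell boils down to attending over all previous hidden states at each step instead of carrying a single summary. A simplified sketch of that intra-attention read (the gating and write path of the full architecture are omitted):

```python
import torch
import torch.nn as nn

class IntraAttention(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w_h = nn.Linear(d, d)   # scores past hidden states
        self.w_x = nn.Linear(d, d)   # scores the current input
        self.v = nn.Linear(d, 1)

    def forward(self, x_t, h_hist):  # x_t: (B, d), h_hist: (B, t, d) ALL past states
        e = torch.tanh(self.w_h(h_hist) + self.w_x(x_t).unsqueeze(1))
        a = torch.softmax(self.v(e).squeeze(-1), dim=1)  # weights over the history
        return (a.unsqueeze(-1) * h_hist).sum(1)         # adaptive memory read
```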

Modal-set estimation with an application to clustering

Title Modal-set estimation with an application to clustering
Authors Heinrich Jiang, Samory Kpotufe
Abstract We present a first procedure that can estimate – with statistical consistency guarantees – any local-maxima of a density, under benign distributional conditions. The procedure estimates all such local maxima, or $\textit{modal-sets}$, of any bounded shape or dimension, including usual point-modes. In practice, modal-sets can arise as dense low-dimensional structures in noisy data, and more generally serve to better model the rich variety of locally-high-density structures in data. The procedure is then shown to be competitive on clustering applications, and moreover is quite stable to a wide range of settings of its tuning parameter.
Tasks
Published 2016-06-13
URL http://arxiv.org/abs/1606.04166v1
PDF http://arxiv.org/pdf/1606.04166v1.pdf
PWC https://paperswithcode.com/paper/modal-set-estimation-with-an-application-to
Repo https://github.com/hhjiang/mcores
Framework none
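
A toy version of the underlying intuition: estimate density with k-NN distances and keep points whose density is near-maximal within their own neighborhood. This is only a caricature of the paper's procedure, which builds modal-sets from level sets with consistency guarantees:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def modal_points(X, k=20, tol=0.95):
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    dist, idx = nn.kneighbors(X)
    density = 1.0 / (dist[:, -1] + 1e-12)  # k-NN density estimate, up to constants
    # A point is "modal" if its density is within tol of the best in its neighborhood.
    return np.where(density >= tol * density[idx].max(axis=1))[0]
```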

maskSLIC: Regional Superpixel Generation with Application to Local Pathology Characterisation in Medical Images

Title maskSLIC: Regional Superpixel Generation with Application to Local Pathology Characterisation in Medical Images
Authors Benjamin Irving
Abstract Supervoxel methods such as Simple Linear Iterative Clustering (SLIC) are an effective technique for partitioning an image or volume into locally similar regions, and are a common building block for the development of detection, segmentation and analysis methods. We introduce maskSLIC, an extension of SLIC to create supervoxels within regions-of-interest, and demonstrate, on examples from 2 to 4 dimensions, that maskSLIC overcomes issues that affect SLIC within an irregular mask. We highlight the benefits of this method through examples, and show that it is able to better represent underlying tumour subregions and achieves significantly better results than SLIC on the BRATS 2013 brain tumour challenge data (p=0.001), outperforming SLIC on 18/20 scans. Finally, we show an application of this method to the analysis of functional tumour subregions and demonstrate that it is more effective than voxel clustering.
Tasks
Published 2016-06-30
URL http://arxiv.org/abs/1606.09518v2
PDF http://arxiv.org/pdf/1606.09518v2.pdf
PWC https://paperswithcode.com/paper/maskslic-regional-superpixel-generation-with
Repo https://github.com/benjaminirving/maskSLIC
Framework none
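
The `mask` option of scikit-image's `slic` follows this regional idea, which makes the method easy to try. A short sketch assuming scikit-image >= 0.19 (for the `channel_axis` keyword) and a placeholder image in place of real medical data:

```python
import numpy as np
from skimage.segmentation import slic

image = np.random.rand(128, 128)          # placeholder grayscale image
mask = np.zeros((128, 128), dtype=bool)
mask[32:96, 32:96] = True                 # an irregular ROI in real use

segments = slic(image, n_segments=50, mask=mask, channel_axis=None)
# Superpixels are generated only inside the mask; outside pixels keep label 0.
```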