May 7, 2019

Paper Group AWR 81

Generative Adversarial Networks as Variational Training of Energy Based Models. Learning Representations for Counterfactual Inference. An Empirical Study of Language CNN for Image Captioning. Text-guided Attention Model for Image Captioning. IM2CAD. RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised View …

Generative Adversarial Networks as Variational Training of Energy Based Models

Title Generative Adversarial Networks as Variational Training of Energy Based Models
Authors Shuangfei Zhai, Yu Cheng, Rogerio Feris, Zhongfei Zhang
Abstract In this paper, we study deep generative models for effective unsupervised learning. We propose VGAN, which works by minimizing a variational lower bound of the negative log likelihood (NLL) of an energy based model (EBM), where the model density $p(\mathbf{x})$ is approximated by a variational distribution $q(\mathbf{x})$ that is easy to sample from. The training of VGAN takes a two-step procedure: given $p(\mathbf{x})$, $q(\mathbf{x})$ is updated to maximize the lower bound; $p(\mathbf{x})$ is then updated one step with samples drawn from $q(\mathbf{x})$ to decrease the lower bound. VGAN is inspired by generative adversarial networks (GANs), where $p(\mathbf{x})$ corresponds to the discriminator and $q(\mathbf{x})$ corresponds to the generator, but with several notable differences. We hence name our model variational GANs (VGANs). VGAN provides a practical solution to training deep EBMs in high dimensional space by eliminating the need for MCMC sampling. From this view, we are also able to identify causes of the difficulty of training GANs and propose viable solutions. (Experimental code is available at https://github.com/Shuangfei/vgan)
Tasks
Published 2016-11-06
URL http://arxiv.org/abs/1611.01799v1
PDF http://arxiv.org/pdf/1611.01799v1.pdf
PWC https://paperswithcode.com/paper/generative-adversarial-networks-as
Repo https://github.com/Shuangfei/vgan
Framework none
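
The two-step procedure maps naturally onto a GAN-style training loop. Below is a minimal PyTorch sketch of the alternation under assumed toy MLP networks, with the entropy term of the bound omitted; it illustrates the idea rather than the paper's exact setup.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the paper's networks (assumed: 784-dim data, 64-dim noise).
E = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))   # energy of p(x)
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784))  # sampler for q(x)
opt_E = torch.optim.Adam(E.parameters(), lr=1e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)

def train_step(x_real):
    # Step 1: update q(x) to tighten the bound, i.e. minimize E_q[E(x)]
    # (the entropy term H(q) is omitted in this sketch).
    loss_G = E(G(torch.randn(x_real.size(0), 64))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

    # Step 2: update p(x) one step with samples from q(x) to decrease the bound:
    # push energy down on real data, up on generated samples.
    x_fake = G(torch.randn(x_real.size(0), 64)).detach()
    loss_E = E(x_real).mean() - E(x_fake).mean()
    opt_E.zero_grad(); loss_E.backward(); opt_E.step()
```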

Learning Representations for Counterfactual Inference

Title Learning Representations for Counterfactual Inference
Authors Fredrik D. Johansson, Uri Shalit, David Sontag
Abstract Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, “Would this patient have lower blood sugar had she received a different medication?” We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art.
Tasks Causal Inference, Counterfactual Inference, Domain Adaptation, Representation Learning
Published 2016-05-12
URL http://arxiv.org/abs/1605.03661v3
PDF http://arxiv.org/pdf/1605.03661v3.pdf
PWC https://paperswithcode.com/paper/learning-representations-for-counterfactual
Repo https://github.com/inzouzouwetrust/BSTATS
Framework tf
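
The framework pairs a factual prediction loss with a penalty that balances treated and control units in representation space. A hedged PyTorch sketch, using a simple mean-difference surrogate for the discrepancy term (the paper's penalty and architectures differ):

```python
import torch
import torch.nn as nn

# Assumed shapes: x (B, 25) covariates, t (B, 1) treatment in {0, 1}, y (B, 1) outcome.
phi = nn.Sequential(nn.Linear(25, 64), nn.ReLU(), nn.Linear(64, 64))  # representation
head = nn.Linear(64 + 1, 1)                                           # outcome from (phi(x), t)

def loss(x, t, y, alpha=1.0):
    r = phi(x)
    factual = ((head(torch.cat([r, t], dim=1)) - y) ** 2).mean()  # fit observed outcomes
    # Balance surrogate: distance between mean representations of treated and
    # control groups (assumes the batch contains both groups).
    treated = t.squeeze(1) > 0.5
    balance = (r[treated].mean(0) - r[~treated].mean(0)).pow(2).sum()
    return factual + alpha * balance
```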

An Empirical Study of Language CNN for Image Captioning

Title An Empirical Study of Language CNN for Image Captioning
Authors Jiuxiang Gu, Gang Wang, Jianfei Cai, Tsuhan Chen
Abstract Language models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a language CNN model which is suited to statistical language modeling tasks and shows competitive performance in image captioning. In contrast to previous models, which predict the next word based on one previous word and a hidden state, our language CNN is fed with all the previous words and can model the long-range dependencies of the history words, which are critical for image captioning. The effectiveness of our approach is validated on two datasets: MS COCO and Flickr30K. Our extensive experimental results show that our method outperforms vanilla recurrent neural network based language models and is competitive with the state-of-the-art methods.
Tasks Image Captioning, Language Modelling
Published 2016-12-21
URL http://arxiv.org/abs/1612.07086v3
PDF http://arxiv.org/pdf/1612.07086v3.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-of-language-cnn-for-image
Repo https://github.com/showkeyjar/chinese_im2text.pytorch
Framework pytorch
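
The architectural point is that the next word is conditioned on the entire word history through convolutions rather than a single recurrent state. A minimal sketch of that idea, with illustrative sizes (the paper stacks more layers and couples the CNN with a recurrent image-conditioned decoder):

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10000, 256  # illustrative sizes

class LanguageCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.Sequential(
            nn.Conv1d(emb_dim, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.out = nn.Linear(256, vocab_size)

    def forward(self, history):                  # (batch, t) ids of ALL previous words
        h = self.embed(history).transpose(1, 2)  # (batch, emb_dim, t)
        h = self.convs(h).max(dim=2).values      # pool over the full history
        return self.out(h)                       # logits for the next word
```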

Text-guided Attention Model for Image Captioning

Title Text-guided Attention Model for Image Captioning
Authors Jonghwan Mun, Minsu Cho, Bohyung Han
Abstract Visual attention plays an important role in understanding images and has proven effective for generating natural language descriptions of images. On the other hand, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns to drive visual attention using associated captions. For this model, we propose an exemplar-based learning approach that retrieves captions associated with each image from the training data and uses them to learn attention on visual features. Our attention model makes it possible to describe detailed scene states by effectively distinguishing small or easily confusable objects. We validate our model on the MS-COCO Captioning benchmark and achieve state-of-the-art performance on standard metrics.
Tasks Image Captioning
Published 2016-12-12
URL http://arxiv.org/abs/1612.03557v1
PDF http://arxiv.org/pdf/1612.03557v1.pdf
PWC https://paperswithcode.com/paper/text-guided-attention-model-for-image
Repo https://github.com/vikramnitin9/nnfl
Framework tf
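
At its core, the model scores image regions against an encoding of a caption retrieved for the input image. A minimal sketch of such text-guided attention, assuming precomputed region features and a caption encoding of matching width:

```python
import torch
import torch.nn as nn

d = 512  # assumed shared feature width

class TextGuidedAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.w_v = nn.Linear(d, d)   # projects visual region features
        self.w_t = nn.Linear(d, d)   # projects the retrieved-caption encoding
        self.score = nn.Linear(d, 1)

    def forward(self, regions, guide):  # regions: (B, R, d), guide: (B, d)
        e = torch.tanh(self.w_v(regions) + self.w_t(guide).unsqueeze(1))
        a = torch.softmax(self.score(e).squeeze(-1), dim=1)  # (B, R) text-guided weights
        return (a.unsqueeze(-1) * regions).sum(1)            # attended visual feature
```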

IM2CAD

Title IM2CAD
Authors Hamid Izadinia, Qi Shan, Steven M. Seitz
Abstract Given a single photo of a room and a large database of furniture CAD models, our goal is to reconstruct a scene that is as similar as possible to the scene depicted in the photograph, and composed of objects drawn from the database. We present a completely automatic system to address this IM2CAD problem that produces high quality results on challenging imagery from interior home design and remodeling websites. Our approach iteratively optimizes the placement and scale of objects in the room to best match scene renderings to the input photo, using image comparison metrics trained via deep convolutional neural nets. By operating jointly on the full scene at once, we account for inter-object occlusions. We also show the applicability of our method on standard scene understanding benchmarks, where we obtain significant improvements.
Tasks Scene Understanding
Published 2016-08-18
URL http://arxiv.org/abs/1608.05137v2
PDF http://arxiv.org/pdf/1608.05137v2.pdf
PWC https://paperswithcode.com/paper/im2cad
Repo https://github.com/BenjaminPoilve/Deep-Learning-ressources
Framework tf
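
The optimization reads as a render-and-compare loop. A high-level sketch, where `render`, `cnn_distance` and `propose_perturbation` are hypothetical stand-ins for the paper's renderer, learned image metric and proposal scheme:

```python
import random

def optimize_scene(photo, scene, iters=1000):
    # cnn_distance: image comparison metric trained via deep convolutional nets.
    best = cnn_distance(render(scene), photo)
    for _ in range(iters):
        obj = random.choice(scene.objects)
        old_pos, old_scale = obj.position, obj.scale
        obj.position, obj.scale = propose_perturbation(obj)  # jitter placement/scale
        d = cnn_distance(render(scene), photo)  # full-scene render handles occlusion
        if d < best:
            best = d                                      # keep the improving move
        else:
            obj.position, obj.scale = old_pos, old_scale  # revert
    return scene
```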

RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints

Title RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints
Authors Asako Kanezaki, Yasuyuki Matsushita, Yoshifumi Nishida
Abstract We propose a Convolutional Neural Network (CNN)-based model “RotationNet,” which takes multi-view images of an object as input and jointly estimates its pose and object category. Unlike previous approaches that use known viewpoint labels for training, our method treats the viewpoint labels as latent variables, which are learned in an unsupervised manner during training using an unaligned object dataset. RotationNet is designed to use only a partial set of multi-view images for inference, and this property makes it useful in practical scenarios where only partial views are available. Moreover, our pose alignment strategy enables one to obtain view-specific feature representations shared across classes, which is important for maintaining high accuracy in both object categorization and pose estimation. The effectiveness of RotationNet is demonstrated by its superior performance over state-of-the-art methods for 3D object classification on the 10- and 40-class ModelNet datasets. We also show that RotationNet, even when trained without known poses, achieves state-of-the-art performance on an object pose estimation dataset. The code is available at https://github.com/kanezaki/rotationnet
Tasks 3D Object Classification, Object Classification, Pose Estimation
Published 2016-03-20
URL http://arxiv.org/abs/1603.06208v4
PDF http://arxiv.org/pdf/1603.06208v4.pdf
PWC https://paperswithcode.com/paper/rotationnet-joint-object-categorization-and
Repo https://github.com/kanezaki/rotationnet
Framework none
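
Because viewpoint labels are latent, inference searches over candidate view alignments and keeps the one that scores best. A sketch of that step, assuming a `per_view_logits` tensor produced by a shared per-view CNN (the paper's scoring and "incorrect view" class are simplified away):

```python
import torch

def categorize(per_view_logits):
    # per_view_logits: (V, num_views, num_classes) -- for each of the V observed
    # images, the class scores under each candidate viewpoint assignment.
    V, num_views, _ = per_view_logits.shape
    best_score, best_class, best_shift = None, None, 0
    for shift in range(num_views):  # candidate rotation of the viewpoint ids
        idx = [(v + shift) % num_views for v in range(V)]
        score = sum(per_view_logits[v, idx[v]] for v in range(V))  # (num_classes,)
        s, c = score.max(dim=0)
        if best_score is None or s > best_score:
            best_score, best_class, best_shift = s, c, shift
    return best_class, best_shift  # category and estimated pose alignment
```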

Learning Recurrent Span Representations for Extractive Question Answering

Title Learning Recurrent Span Representations for Extractive Question Answering
Authors Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, Jonathan Berant
Abstract The reading comprehension task, in which a system answers questions about a given evidence document, is a central problem in natural language understanding. Recent formulations of this task have typically focused on answer selection from a set of candidates pre-defined manually or through the use of an external NLP pipeline. However, Rajpurkar et al. (2016) recently released the SQuAD dataset, in which the answers can be arbitrary strings from the supplied text. In this paper, we focus on this answer extraction task, presenting a novel model architecture that efficiently builds fixed-length representations of all spans in the evidence document with a recurrent network. We show that scoring explicit span representations significantly improves performance over other approaches that factor the prediction into separate predictions about words or start and end markers. Our approach improves upon the best published results of Wang & Jiang (2016) by 5% and decreases the error of Rajpurkar et al.’s baseline by > 50%.
Tasks Answer Selection, Question Answering, Reading Comprehension
Published 2016-11-04
URL http://arxiv.org/abs/1611.01436v2
PDF http://arxiv.org/pdf/1611.01436v2.pdf
PWC https://paperswithcode.com/paper/learning-recurrent-span-representations-for
Repo https://github.com/asadovsky/nn
Framework tf
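
The central trick is representing every candidate span with a fixed-length vector built from the recurrent states at its endpoints, then scoring spans directly. A minimal sketch (dimensions and the span-length cap are illustrative):

```python
import torch
import torch.nn as nn

d, max_len = 256, 30  # assumed hidden size and maximum span length
lstm = nn.LSTM(300, d, bidirectional=True, batch_first=True)
scorer = nn.Sequential(nn.Linear(4 * d, d), nn.ReLU(), nn.Linear(d, 1))

def span_scores(word_vecs):     # (1, T, 300) passage word vectors
    h, _ = lstm(word_vecs)      # (1, T, 2d) recurrent states over the passage
    T = h.size(1)
    spans, reps = [], []
    for i in range(T):
        for j in range(i, min(i + max_len, T)):
            spans.append((i, j))
            reps.append(torch.cat([h[0, i], h[0, j]]))  # fixed-length span vector
    return spans, scorer(torch.stack(reps)).squeeze(-1)  # one score per span
```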

Holophrasm: a neural Automated Theorem Prover for higher-order logic

Title Holophrasm: a neural Automated Theorem Prover for higher-order logic
Authors Daniel Whalen
Abstract I propose a system for Automated Theorem Proving in higher-order logic using deep learning and eschewing hand-constructed features. Holophrasm exploits the formalism of the Metamath language and explores partial proof trees using a neural-network-augmented bandit algorithm and a sequence-to-sequence model for action enumeration. The system proves 14% of its test theorems from Metamath’s set.mm module.
Tasks Automated Theorem Proving
Published 2016-08-08
URL http://arxiv.org/abs/1608.02644v2
PDF http://arxiv.org/pdf/1608.02644v2.pdf
PWC https://paperswithcode.com/paper/holophrasm-a-neural-automated-theorem-prover
Repo https://github.com/justin941208/SPIA-Project
Framework none
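
The bandit component can be pictured as a UCB-style rule for choosing which branch of the partial proof tree to expand next. A toy sketch, with node fields (`visits`, `value_sum`, `children`) and the exploration constant assumed for illustration, not the paper's exact rule:

```python
import math

def select_child(node, c=1.0):
    def ucb(child):
        exploit = child.value_sum / max(child.visits, 1)  # value estimate from the network
        explore = c * math.sqrt(math.log(node.visits + 1) / (child.visits + 1))
        return exploit + explore
    return max(node.children, key=ucb)  # branch to expand next
```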

Multi-Perspective Context Matching for Machine Comprehension

Title Multi-Perspective Context Matching for Machine Comprehension
Authors Zhiguo Wang, Haitao Mi, Wael Hamza, Radu Florian
Abstract Previous machine comprehension (MC) datasets are either too small to train end-to-end deep learning models, or not difficult enough to evaluate the ability of current MC techniques. The newly released SQuAD dataset alleviates these limitations, and gives us a chance to develop more realistic MC models. Based on this dataset, we propose a Multi-Perspective Context Matching (MPCM) model, which is an end-to-end system that directly predicts the answer beginning and ending points in a passage. Our model first adjusts each word-embedding vector in the passage by multiplying a relevancy weight computed against the question. Then, we encode the question and weighted passage using bi-directional LSTMs. For each point in the passage, our model matches the context of this point against the encoded question from multiple perspectives and produces a matching vector. Given those matched vectors, we employ another bi-directional LSTM to aggregate all the information and predict the beginning and ending points. Experimental results on the test set of SQuAD show that our model achieves a competitive result on the leaderboard.
Tasks Question Answering, Reading Comprehension
Published 2016-12-13
URL http://arxiv.org/abs/1612.04211v1
PDF http://arxiv.org/pdf/1612.04211v1.pdf
PWC https://paperswithcode.com/paper/multi-perspective-context-matching-for
Repo https://github.com/bloomsburyai/question-generation
Framework tf
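
Two pieces of the pipeline are easy to isolate: the relevancy re-weighting of passage embeddings and the multi-perspective cosine matching. A hedged sketch of both (dimensions are illustrative, and the paper defines several matching strategies beyond this one):

```python
import torch
import torch.nn.functional as F

def relevancy_weight(passage, question):  # passage: (P, d), question: (Q, d)
    sims = F.cosine_similarity(passage.unsqueeze(1), question.unsqueeze(0), dim=-1)
    # Weight each passage word by its best cosine match against the question.
    return passage * sims.max(dim=1).values.unsqueeze(1)

def multi_perspective_match(v1, v2, W):   # v1, v2: (d,), W: (L, d) perspective weights
    # Each row of W re-scales the dimensions before a cosine comparison.
    return F.cosine_similarity(W * v1, W * v2, dim=-1)  # (L,) matching vector
```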

Bidirectional Attention Flow for Machine Comprehension

Title Bidirectional Attention Flow for Machine Comprehension
Authors Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi
Abstract Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query. Recently, attention mechanisms have been successfully extended to MC. Typically these methods use attention to focus on a small portion of the context and summarize it with a fixed-size vector, couple attentions temporally, and/or often form a uni-directional attention. In this paper we introduce the Bi-Directional Attention Flow (BIDAF) network, a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization. Our experimental evaluations show that our model achieves state-of-the-art results on the Stanford Question Answering Dataset (SQuAD) and the CNN/DailyMail cloze test.
Tasks Open-Domain Question Answering, Question Answering, Reading Comprehension
Published 2016-11-05
URL http://arxiv.org/abs/1611.01603v6
PDF http://arxiv.org/pdf/1611.01603v6.pdf
PWC https://paperswithcode.com/paper/bidirectional-attention-flow-for-machine
Repo https://github.com/ghus75/Question_Answering
Framework none
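
The core of BIDAF is a similarity matrix that drives attention in both directions without summarizing the context into one vector. A minimal sketch using a plain dot-product similarity (the paper uses a trainable trilinear function):

```python
import torch
import torch.nn.functional as F

def bidaf_attention(H, U):          # H: (T, d) context, U: (J, d) query
    S = H @ U.t()                   # (T, J) similarity matrix
    a = F.softmax(S, dim=1)         # context-to-query attention weights
    U_tilde = a @ U                 # (T, d) attended query for every context word
    b = F.softmax(S.max(dim=1).values, dim=0)  # query-to-context weights
    h_tilde = (b.unsqueeze(1) * H).sum(0, keepdim=True).expand_as(H)
    # Query-aware context representation, no early summarization of H itself.
    return torch.cat([H, U_tilde, H * U_tilde, H * h_tilde], dim=1)  # (T, 4d)
```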

Dynamic Coattention Networks For Question Answering

Title Dynamic Coattention Networks For Question Answering
Authors Caiming Xiong, Victor Zhong, Richard Socher
Abstract Several deep learning models have been proposed for question answering. However, due to their single-pass nature, they have no way to recover from local maxima corresponding to incorrect answers. To address this problem, we introduce the Dynamic Coattention Network (DCN) for question answering. The DCN first fuses co-dependent representations of the question and the document in order to focus on relevant parts of both. Then a dynamic pointing decoder iterates over potential answer spans. This iterative procedure enables the model to recover from initial local maxima corresponding to incorrect answers. On the Stanford question answering dataset, a single DCN model improves the previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains 80.4% F1.
Tasks Question Answering
Published 2016-11-05
URL http://arxiv.org/abs/1611.01604v4
PDF http://arxiv.org/pdf/1611.01604v4.pdf
PWC https://paperswithcode.com/paper/dynamic-coattention-networks-for-question
Repo https://github.com/lmn-extracts/dcn_plus
Framework tf
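
The coattention encoder is compact enough to sketch: a single affinity matrix yields attention in both directions, and the two summaries are fused. The dynamic pointing decoder that iterates over spans is omitted here, and document/question encodings are assumed given:

```python
import torch
import torch.nn.functional as F

def coattention(D, Q):              # D: (m, d) document, Q: (n, d) question encodings
    L = D @ Q.t()                   # (m, n) affinity matrix
    A_Q = F.softmax(L, dim=0)       # attention over document positions, per question word
    A_D = F.softmax(L, dim=1)       # attention over question positions, per document word
    C_Q = A_Q.t() @ D               # (n, d) document summaries for the question
    C_D = A_D @ torch.cat([Q, C_Q], dim=1)  # (m, 2d) co-dependent representation
    return torch.cat([D, C_D], dim=1)       # fused input for the downstream decoder
```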

DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment

Title DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment
Authors Chang Liu, Yu Cao, Yan Luo, Guanling Chen, Vinod Vokkarane, Yunsheng Ma
Abstract Worldwide, in 2014, more than 1.9 billion adults, 18 years and older, were overweight. Of these, over 600 million were obese. Accurately documenting dietary caloric intake is crucial to manage weight loss, but also presents challenges because most of the current methods for dietary assessment must rely on memory to recall foods eaten. The ultimate goal of our research is to develop computer-aided technical solutions to enhance and improve the accuracy of current measurements of dietary intake. Our proposed system in this paper aims to improve the accuracy of dietary assessment by analyzing the food images captured by mobile devices (e.g., smartphones). The key technical innovation in this paper is the deep learning-based food image recognition algorithms. Substantial research has demonstrated that digital imaging accurately estimates dietary intake in many environments and it has many advantages over other methods. However, how to derive the food information (e.g., food type and portion size) from food images effectively and efficiently remains a challenging and open research problem. We propose a new Convolutional Neural Network (CNN)-based food image recognition algorithm to address this problem. We applied our proposed approach to two real-world food image data sets (UEC-256 and Food-101) and achieved impressive results. To the best of our knowledge, these results outperformed all other reported work using these two data sets. Our experiments have demonstrated that the proposed approach is a promising solution for addressing the food image recognition problem. Our future work includes further improving the performance of the algorithms and integrating our system into a real-world mobile and cloud computing-based system to enhance the accuracy of current measurements of dietary intake.
Tasks Fine-Grained Image Recognition
Published 2016-06-17
URL http://arxiv.org/abs/1606.05675v1
PDF http://arxiv.org/pdf/1606.05675v1.pdf
PWC https://paperswithcode.com/paper/deepfood-deep-learning-based-food-image
Repo https://github.com/deercoder/DeepFood
Framework none
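
In practice, the approach amounts to fine-tuning a pretrained CNN on food images. An illustrative setup (assumptions: a ResNet-50 backbone where the paper builds on Inception-style networks; the torchvision >= 0.13 weights API; the Food-101 class count):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V1")  # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 101)   # replace head for 101 food classes
# Fine-tune end-to-end with cross-entropy on the food image dataset.
```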

Long Short-Term Memory-Networks for Machine Reading

Title Long Short-Term Memory-Networks for Machine Reading
Authors Jianpeng Cheng, Li Dong, Mirella Lapata
Abstract In this paper we address the question of how to render sequence-level networks better at handling structured input. We propose a machine reading simulator which processes text incrementally from left to right and performs shallow reasoning with memory and attention. The reader extends the Long Short-Term Memory architecture with a memory network in place of a single memory cell. This enables adaptive memory usage during recurrence with neural attention, offering a way to weakly induce relations among tokens. The system is initially designed to process a single sequence but we also demonstrate how to integrate it with an encoder-decoder architecture. Experiments on language modeling, sentiment analysis, and natural language inference show that our model matches or outperforms the state of the art.
Tasks Language Modelling, Natural Language Inference, Reading Comprehension, Sentiment Analysis
Published 2016-01-25
URL http://arxiv.org/abs/1601.06733v7
PDF http://arxiv.org/pdf/1601.06733v7.pdf
PWC https://paperswithcode.com/paper/long-short-term-memory-networks-for-machine
Repo https://github.com/JRC1995/Abstractive-Summarization
Framework tf
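
The memory-network replacement for the cell boils down to attending over all previous hidden states at each step instead of carrying a single summary. A simplified sketch of that intra-attention read (the gating and write path of the full architecture are omitted):

```python
import torch
import torch.nn as nn

class IntraAttention(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w_h = nn.Linear(d, d)   # scores past hidden states
        self.w_x = nn.Linear(d, d)   # scores the current input
        self.v = nn.Linear(d, 1)

    def forward(self, x_t, h_hist):  # x_t: (B, d), h_hist: (B, t, d) ALL past states
        e = torch.tanh(self.w_h(h_hist) + self.w_x(x_t).unsqueeze(1))
        a = torch.softmax(self.v(e).squeeze(-1), dim=1)  # weights over the history
        return (a.unsqueeze(-1) * h_hist).sum(1)         # adaptive memory read
```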

Modal-set estimation with an application to clustering

Title Modal-set estimation with an application to clustering
Authors Heinrich Jiang, Samory Kpotufe
Abstract We present a first procedure that can estimate – with statistical consistency guarantees – any local-maxima of a density, under benign distributional conditions. The procedure estimates all such local maxima, or $\textit{modal-sets}$, of any bounded shape or dimension, including usual point-modes. In practice, modal-sets can arise as dense low-dimensional structures in noisy data, and more generally serve to better model the rich variety of locally-high-density structures in data. The procedure is then shown to be competitive on clustering applications, and moreover is quite stable to a wide range of settings of its tuning parameter.
Tasks
Published 2016-06-13
URL http://arxiv.org/abs/1606.04166v1
PDF http://arxiv.org/pdf/1606.04166v1.pdf
PWC https://paperswithcode.com/paper/modal-set-estimation-with-an-application-to
Repo https://github.com/hhjiang/mcores
Framework none
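
A toy version of the underlying intuition: estimate density with k-NN distances and keep points whose density is near-maximal within their own neighborhood. This is only a caricature of the paper's procedure, which builds modal-sets from level sets with consistency guarantees:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def modal_points(X, k=20, tol=0.95):
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    dist, idx = nn.kneighbors(X)
    density = 1.0 / (dist[:, -1] + 1e-12)  # k-NN density estimate, up to constants
    # A point is "modal" if its density is within tol of the best in its neighborhood.
    return np.where(density >= tol * density[idx].max(axis=1))[0]
```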

maskSLIC: Regional Superpixel Generation with Application to Local Pathology Characterisation in Medical Images

Title maskSLIC: Regional Superpixel Generation with Application to Local Pathology Characterisation in Medical Images
Authors Benjamin Irving
Abstract Supervoxel methods such as Simple Linear Iterative Clustering (SLIC) are an effective technique for partitioning an image or volume into locally similar regions, and are a common building block for the development of detection, segmentation and analysis methods. We introduce maskSLIC, an extension of SLIC to create supervoxels within regions-of-interest, and demonstrate, on examples from 2 to 4 dimensions, that maskSLIC overcomes issues that affect SLIC within an irregular mask. We highlight the benefits of this method through examples, and show that it is able to better represent underlying tumour subregions and achieves significantly better results than SLIC on the BRATS 2013 brain tumour challenge data (p=0.001), outperforming SLIC on 18/20 scans. Finally, we show an application of this method to the analysis of functional tumour subregions and demonstrate that it is more effective than voxel clustering.
Tasks
Published 2016-06-30
URL http://arxiv.org/abs/1606.09518v2
PDF http://arxiv.org/pdf/1606.09518v2.pdf
PWC https://paperswithcode.com/paper/maskslic-regional-superpixel-generation-with
Repo https://github.com/benjaminirving/maskSLIC
Framework none
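
The `mask` option of scikit-image's `slic` follows this regional idea, which makes the method easy to try. A short sketch assuming scikit-image >= 0.19 (for the `channel_axis` keyword) and a placeholder image in place of real medical data:

```python
import numpy as np
from skimage.segmentation import slic

image = np.random.rand(128, 128)          # placeholder grayscale image
mask = np.zeros((128, 128), dtype=bool)
mask[32:96, 32:96] = True                 # an irregular ROI in real use

segments = slic(image, n_segments=50, mask=mask, channel_axis=None)
# Superpixels are generated only inside the mask; outside pixels keep label 0.
```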