Paper Group AWR 81
Generative Adversarial Networks as Variational Training of Energy Based Models
Title | Generative Adversarial Networks as Variational Training of Energy Based Models |
Authors | Shuangfei Zhai, Yu Cheng, Rogerio Feris, Zhongfei Zhang |
Abstract | In this paper, we study deep generative models for effective unsupervised learning. We propose VGAN, which works by minimizing a variational lower bound of the negative log likelihood (NLL) of an energy based model (EBM), where the model density $p(\mathbf{x})$ is approximated by a variational distribution $q(\mathbf{x})$ that is easy to sample from. The training of VGAN takes a two-step procedure: given $p(\mathbf{x})$, $q(\mathbf{x})$ is updated to maximize the lower bound; $p(\mathbf{x})$ is then updated one step with samples drawn from $q(\mathbf{x})$ to decrease the lower bound. VGAN is inspired by the generative adversarial networks (GANs), where $p(\mathbf{x})$ corresponds to the discriminator and $q(\mathbf{x})$ corresponds to the generator, but with several notable differences. We hence name our model variational GANs (VGANs). VGAN provides a practical solution to training deep EBMs in high dimensional space by eliminating the need for MCMC sampling. From this view, we are also able to identify causes of the difficulty of training GANs and propose viable solutions. (Experimental code is available at https://github.com/Shuangfei/vgan) |
Tasks | |
Published | 2016-11-06 |
URL | http://arxiv.org/abs/1611.01799v1 |
PDF | http://arxiv.org/pdf/1611.01799v1.pdf |
PWC | https://paperswithcode.com/paper/generative-adversarial-networks-as |
Repo | https://github.com/Shuangfei/vgan |
Framework | none |
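The two-step procedure in the abstract maps onto a GAN-style training loop. Below is a minimal PyTorch sketch of one such step, under the stated assumptions that the network shapes are illustrative and that the entropy term of $q(\mathbf{x})$ in the variational bound is omitted (the paper discusses how to handle it); this is not the authors' implementation, which lives in the linked repo.

```python
import torch
import torch.nn as nn

E = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1))   # energy E(x)
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784))  # sampler for q(x)
opt_E = torch.optim.Adam(E.parameters(), lr=1e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)

def vgan_step(x_real):                    # x_real: (batch, 784)
    # Step 1: update q to maximize the bound, pushing the energy of its
    # samples down. (The entropy term of q is dropped in this sketch.)
    z = torch.randn(x_real.size(0), 64)
    g_loss = E(G(z)).mean()
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    # Step 2: update the EBM to decrease the bound: low energy on data,
    # high energy on samples drawn from q.
    fake = G(torch.randn(x_real.size(0), 64)).detach()
    e_loss = E(x_real).mean() - E(fake).mean()
    opt_E.zero_grad(); e_loss.backward(); opt_E.step()
```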
Learning Representations for Counterfactual Inference
Title | Learning Representations for Counterfactual Inference |
Authors | Fredrik D. Johansson, Uri Shalit, David Sontag |
Abstract | Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, "Would this patient have lower blood sugar had she received a different medication?" We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art. |
Tasks | Causal Inference, Counterfactual Inference, Domain Adaptation, Representation Learning |
Published | 2016-05-12 |
URL | http://arxiv.org/abs/1605.03661v3 |
PDF | http://arxiv.org/pdf/1605.03661v3.pdf |
PWC | https://paperswithcode.com/paper/learning-representations-for-counterfactual |
Repo | https://github.com/inzouzouwetrust/BSTATS |
Framework | tf |
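A hedged sketch of the core idea: learn a representation under which treated and control units look similar, while predicting factual outcomes from the representation plus the treatment indicator. The network shapes and the simple mean-difference discrepancy below are stand-ins for the paper's choices, not the authors' code.

```python
import torch
import torch.nn as nn

phi = nn.Sequential(nn.Linear(25, 64), nn.ReLU(), nn.Linear(64, 64))   # representation
head = nn.Sequential(nn.Linear(64 + 1, 64), nn.ReLU(), nn.Linear(64, 1))

def cfr_loss(x, t, y, alpha=1.0):
    """x: (B, 25) covariates, t: (B, 1) treatment indicator, y: (B, 1) outcome."""
    r = phi(x)
    y_hat = head(torch.cat([r, t], dim=1))
    factual = ((y_hat - y) ** 2).mean()                 # factual prediction error
    treated = t.squeeze(1) > 0.5
    # Penalize imbalance between treated/control representation distributions
    # (a crude mean-difference stand-in for the paper's discrepancy term).
    disc = (r[treated].mean(0) - r[~treated].mean(0)).norm()
    return factual + alpha * disc
```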
An Empirical Study of Language CNN for Image Captioning
Title | An Empirical Study of Language CNN for Image Captioning |
Authors | Jiuxiang Gu, Gang Wang, Jianfei Cai, Tsuhan Chen |
Abstract | Language models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a language CNN model which is suitable for statistical language modeling tasks and shows competitive performance in image captioning. In contrast to previous models, which predict the next word based on one previous word and a hidden state, our language CNN is fed with all the previous words and can model the long-range dependencies of history words, which are critical for image captioning. The effectiveness of our approach is validated on two datasets, MS COCO and Flickr30K. Our extensive experimental results show that our method outperforms vanilla recurrent neural network based language models and is competitive with the state-of-the-art methods. |
Tasks | Image Captioning, Language Modelling |
Published | 2016-12-21 |
URL | http://arxiv.org/abs/1612.07086v3 |
PDF | http://arxiv.org/pdf/1612.07086v3.pdf |
PWC | https://paperswithcode.com/paper/an-empirical-study-of-language-cnn-for-image |
Repo | https://github.com/showkeyjar/chinese_im2text.pytorch |
Framework | pytorch |
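The key architectural point, conditioning next-word prediction on all previous words through convolutions rather than a single recurrent state, can be sketched as follows. Layer counts, kernel sizes, and the mean-pooling are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class LanguageCNN(nn.Module):
    def __init__(self, vocab=10000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.convs = nn.Sequential(        # stacked temporal convolutions
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.out = nn.Linear(dim, vocab)

    def forward(self, history_ids):                   # (batch, history_len)
        h = self.embed(history_ids).transpose(1, 2)   # (batch, dim, history_len)
        h = self.convs(h).mean(dim=2)                 # pool over ALL history words
        return self.out(h)                            # next-word logits
```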
Text-guided Attention Model for Image Captioning
Title | Text-guided Attention Model for Image Captioning |
Authors | Jonghwan Mun, Minsu Cho, Bohyung Han |
Abstract | Visual attention plays an important role in understanding images and has proven effective for generating natural language descriptions of them. Conversely, recent studies show that language associated with an image can steer visual attention in the scene during our cognitive process. Inspired by this, we introduce a text-guided attention model for image captioning, which learns to drive visual attention using associated captions. For this model, we propose an exemplar-based learning approach that retrieves from the training data the captions associated with each image and uses them to learn attention on visual features. Our attention model enables the system to describe a detailed state of scenes by distinguishing small or confusable objects effectively. We validate our model on the MS-COCO Captioning benchmark and achieve state-of-the-art performance on standard metrics. |
Tasks | Image Captioning |
Published | 2016-12-12 |
URL | http://arxiv.org/abs/1612.03557v1 |
PDF | http://arxiv.org/pdf/1612.03557v1.pdf |
PWC | https://paperswithcode.com/paper/text-guided-attention-model-for-image |
Repo | https://github.com/vikramnitin9/nnfl |
Framework | tf |
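A rough sketch of text-guided attention: the embedding of a retrieved exemplar caption steers the weighting of image region features. The bilinear scorer is an assumed stand-in for the paper's scoring function.

```python
import torch
import torch.nn as nn

class TextGuidedAttention(nn.Module):
    def __init__(self, vis_dim=512, txt_dim=300):
        super().__init__()
        self.score = nn.Bilinear(vis_dim, txt_dim, 1)   # assumed scoring form

    def forward(self, regions, guide):
        """regions: (B, R, vis_dim) region features; guide: (B, txt_dim) caption embedding."""
        B, R, _ = regions.shape
        g = guide.unsqueeze(1).expand(B, R, guide.size(-1)).contiguous()
        w = torch.softmax(self.score(regions, g).squeeze(-1), dim=1)   # (B, R)
        return (w.unsqueeze(-1) * regions).sum(dim=1)   # attended visual feature
```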
IM2CAD
Title | IM2CAD |
Authors | Hamid Izadinia, Qi Shan, Steven M. Seitz |
Abstract | Given a single photo of a room and a large database of furniture CAD models, our goal is to reconstruct a scene that is as similar as possible to the scene depicted in the photograph, and composed of objects drawn from the database. We present a completely automatic system to address this IM2CAD problem that produces high quality results on challenging imagery from interior home design and remodeling websites. Our approach iteratively optimizes the placement and scale of objects in the room to best match scene renderings to the input photo, using image comparison metrics trained via deep convolutional neural nets. By operating jointly on the full scene at once, we account for inter-object occlusions. We also show the applicability of our method in standard scene understanding benchmarks where we obtain significant improvement. |
Tasks | Scene Understanding |
Published | 2016-08-18 |
URL | http://arxiv.org/abs/1608.05137v2 |
PDF | http://arxiv.org/pdf/1608.05137v2.pdf |
PWC | https://paperswithcode.com/paper/im2cad |
Repo | https://github.com/BenjaminPoilve/Deep-Learning-ressources |
Framework | tf |
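The render-and-compare loop can be illustrated with a toy local search. Both `render_scene` and `feature_distance` below are stubs (the real system rasterizes CAD models and compares learned CNN features), so this is a shape-of-the-algorithm sketch only, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

def render_scene(placements):
    # Stub renderer: a real system would rasterize CAD models here.
    img = np.zeros((64, 64))
    for x, y, s in placements:
        img[int(y) % 64, int(x) % 64] = s
    return img

def feature_distance(a, b):
    return float(((a - b) ** 2).mean())   # stand-in for a learned CNN metric

def optimize_placements(photo, placements, iters=200, step=2.0):
    best = feature_distance(render_scene(placements), photo)
    for _ in range(iters):
        i = rng.integers(len(placements))
        cand = [list(p) for p in placements]
        cand[i] = [v + rng.normal(0, step) for v in cand[i]]   # perturb one object
        d = feature_distance(render_scene(cand), photo)
        if d < best:            # keep the move only if the render matches better
            placements, best = cand, d
    return placements
```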
RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints
Title | RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints |
Authors | Asako Kanezaki, Yasuyuki Matsushita, Yoshifumi Nishida |
Abstract | We propose a Convolutional Neural Network (CNN)-based model, “RotationNet,” which takes multi-view images of an object as input and jointly estimates its pose and object category. Unlike previous approaches that use known viewpoint labels for training, our method treats the viewpoint labels as latent variables, which are learned in an unsupervised manner during training using an unaligned object dataset. RotationNet is designed to use only a partial set of multi-view images for inference, and this property makes it useful in practical scenarios where only partial views are available. Moreover, our pose alignment strategy enables one to obtain view-specific feature representations shared across classes, which is important for maintaining high accuracy in both object categorization and pose estimation. The effectiveness of RotationNet is demonstrated by its superior performance over state-of-the-art methods for 3D object classification on the 10- and 40-class ModelNet datasets. We also show that RotationNet, even trained without known poses, achieves state-of-the-art performance on an object pose estimation dataset. The code is available at https://github.com/kanezaki/rotationnet |
Tasks | 3D Object Classification, Object Classification, Pose Estimation |
Published | 2016-03-20 |
URL | http://arxiv.org/abs/1603.06208v4 |
PDF | http://arxiv.org/pdf/1603.06208v4.pdf |
PWC | https://paperswithcode.com/paper/rotationnet-joint-object-categorization-and |
Repo | https://github.com/kanezaki/rotationnet |
Framework | none |
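Because viewpoint labels are latent, inference jointly searches over category and view alignment. A simplified sketch, assuming a fixed ring of V viewpoints so that candidate alignments are cyclic shifts (the paper also uses an extra "incorrect view" class and richer camera setups, omitted here):

```python
import numpy as np

def rotationnet_classify(view_logprobs):
    """view_logprobs: (V, V, C) array of log P(category | image v, assumed viewpoint j)."""
    V, _, C = view_logprobs.shape
    best = (-np.inf, None, None)
    for shift in range(V):                       # candidate viewpoint alignment
        # Hypothesis: image v was taken from viewpoint (v + shift) mod V.
        score = sum(view_logprobs[v, (v + shift) % V] for v in range(V))   # (C,)
        c = int(score.argmax())
        if score[c] > best[0]:
            best = (score[c], c, shift)
    return best   # (joint log-score, predicted category, viewpoint alignment)
```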
Learning Recurrent Span Representations for Extractive Question Answering
Title | Learning Recurrent Span Representations for Extractive Question Answering |
Authors | Kenton Lee, Shimi Salant, Tom Kwiatkowski, Ankur Parikh, Dipanjan Das, Jonathan Berant |
Abstract | The reading comprehension task, which asks questions about a given evidence document, is a central problem in natural language understanding. Recent formulations of this task have typically focused on answer selection from a set of candidates pre-defined manually or through the use of an external NLP pipeline. However, Rajpurkar et al. (2016) recently released the SQuAD dataset, in which the answers can be arbitrary strings from the supplied text. In this paper, we focus on this answer extraction task, presenting a novel model architecture that efficiently builds fixed-length representations of all spans in the evidence document with a recurrent network. We show that scoring explicit span representations significantly improves performance over other approaches that factor the prediction into separate predictions about words or start and end markers. Our approach improves upon the best published results of Wang & Jiang (2016) by 5% and decreases the error of Rajpurkar et al.'s baseline by > 50%. |
Tasks | Answer Selection, Question Answering, Reading Comprehension |
Published | 2016-11-04 |
URL | http://arxiv.org/abs/1611.01436v2 |
PDF | http://arxiv.org/pdf/1611.01436v2.pdf |
PWC | https://paperswithcode.com/paper/learning-recurrent-span-representations-for |
Repo | https://github.com/asadovsky/nn |
Framework | tf |
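The core of the approach, scoring explicit span representations, can be sketched as below. Representing a span by concatenated BiLSTM endpoint states and scoring it with a small feed-forward net are simplifications of the paper's recurrent span representations, not the exact model.

```python
import torch
import torch.nn as nn

class SpanScorer(nn.Module):
    def __init__(self, dim=256, max_len=30):
        super().__init__()
        self.max_len = max_len   # cap span length to keep enumeration tractable
        self.ffnn = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, h):
        """h: (T, dim) BiLSTM states over the passage. Returns all spans and their log-probs."""
        T = h.size(0)
        spans, feats = [], []
        for i in range(T):
            for j in range(i, min(i + self.max_len, T)):
                spans.append((i, j))
                feats.append(torch.cat([h[i], h[j]]))   # endpoint representation
        scores = self.ffnn(torch.stack(feats)).squeeze(-1)
        return spans, torch.log_softmax(scores, dim=0)  # distribution over spans
```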
Holophrasm: a neural Automated Theorem Prover for higher-order logic
Title | Holophrasm: a neural Automated Theorem Prover for higher-order logic |
Authors | Daniel Whalen |
Abstract | I propose a system for Automated Theorem Proving in higher-order logic using deep learning and eschewing hand-constructed features. Holophrasm exploits the formalism of the Metamath language and explores partial proof trees using a neural-network-augmented bandit algorithm and a sequence-to-sequence model for action enumeration. The system proves 14% of its test theorems from Metamath’s set.mm module. |
Tasks | Automated Theorem Proving |
Published | 2016-08-08 |
URL | http://arxiv.org/abs/1608.02644v2 |
PDF | http://arxiv.org/pdf/1608.02644v2.pdf |
PWC | https://paperswithcode.com/paper/holophrasm-a-neural-automated-theorem-prover |
Repo | https://github.com/justin941208/SPIA-Project |
Framework | none |
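The bandit-style exploration of partial proof trees can be illustrated with a generic UCB selection rule. In the real system the value estimates come from a payoff network and the candidate actions from a sequence-to-sequence model; both are omitted here, so this is only a sketch of the search skeleton.

```python
import math

class ProofNode:
    def __init__(self, state):
        self.state = state        # a partial proof tree / goal
        self.children = []        # candidate proof steps (from action enumeration)
        self.visits = 0
        self.value = 0.0          # accumulated payoff (neural estimate + outcomes)

def ucb_select(node, c=1.4):
    """Pick the child balancing exploitation of high payoff and exploration."""
    def score(ch):
        if ch.visits == 0:
            return float("inf")   # always try unvisited proof steps first
        exploit = ch.value / ch.visits
        explore = c * math.sqrt(math.log(node.visits) / ch.visits)
        return exploit + explore
    return max(node.children, key=score)
```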
Multi-Perspective Context Matching for Machine Comprehension
Title | Multi-Perspective Context Matching for Machine Comprehension |
Authors | Zhiguo Wang, Haitao Mi, Wael Hamza, Radu Florian |
Abstract | Previous machine comprehension (MC) datasets are either too small to train end-to-end deep learning models, or not difficult enough to evaluate the ability of current MC techniques. The newly released SQuAD dataset alleviates these limitations, and gives us a chance to develop more realistic MC models. Based on this dataset, we propose a Multi-Perspective Context Matching (MPCM) model, which is an end-to-end system that directly predicts the answer beginning and ending points in a passage. Our model first adjusts each word-embedding vector in the passage by multiplying a relevancy weight computed against the question. Then, we encode the question and weighted passage by using bi-directional LSTMs. For each point in the passage, our model matches the context of this point against the encoded question from multiple perspectives and produces a matching vector. Given those matched vectors, we employ another bi-directional LSTM to aggregate all the information and predict the beginning and ending points. Experimental results on the SQuAD test set show that our model achieves a competitive result on the leaderboard. |
Tasks | Question Answering, Reading Comprehension |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04211v1 |
PDF | http://arxiv.org/pdf/1612.04211v1.pdf |
PWC | https://paperswithcode.com/paper/multi-perspective-context-matching-for |
Repo | https://github.com/bloomsburyai/question-generation |
Framework | tf |
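The multi-perspective matching operation itself is compact: each perspective reweights the dimensions of two vectors with a learned vector before taking a cosine similarity. A sketch (dimensions and perspective count are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiPerspectiveMatch(nn.Module):
    def __init__(self, dim=100, perspectives=20):
        super().__init__()
        self.W = nn.Parameter(torch.randn(perspectives, dim))   # one weight vector per perspective

    def forward(self, v1, v2):
        """v1, v2: (dim,) context and question vectors. Returns (perspectives,) match values."""
        a = self.W * v1   # elementwise reweighting per perspective
        b = self.W * v2
        return F.cosine_similarity(a, b, dim=1)   # one cosine per perspective
```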
Bidirectional Attention Flow for Machine Comprehension
Title | Bidirectional Attention Flow for Machine Comprehension |
Authors | Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi |
Abstract | Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query. Recently, attention mechanisms have been successfully extended to MC. Typically these methods use attention to focus on a small portion of the context and summarize it with a fixed-size vector, couple attentions temporally, and/or often form a uni-directional attention. In this paper we introduce the Bi-Directional Attention Flow (BIDAF) network, a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization. Our experimental evaluations show that our model achieves state-of-the-art results on the Stanford Question Answering Dataset (SQuAD) and the CNN/DailyMail cloze test. |
Tasks | Open-Domain Question Answering, Question Answering, Reading Comprehension |
Published | 2016-11-05 |
URL | http://arxiv.org/abs/1611.01603v6 |
PDF | http://arxiv.org/pdf/1611.01603v6.pdf |
PWC | https://paperswithcode.com/paper/bidirectional-attention-flow-for-machine |
Repo | https://github.com/ghus75/Question_Answering |
Framework | none |
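The attention-flow layer follows directly from the description: a similarity matrix between context and query drives context-to-query and query-to-context attention, and the results are concatenated without early summarization. A minimal unbatched PyTorch rendering (sizes illustrative):

```python
import torch
import torch.nn as nn

class AttentionFlow(nn.Module):
    def __init__(self, dim=200):
        super().__init__()
        self.w = nn.Linear(3 * dim, 1, bias=False)   # similarity on [h; u; h*u]

    def forward(self, H, U):
        """H: (T, d) context states, U: (J, d) query states."""
        T, J = H.size(0), U.size(0)
        Hx = H.unsqueeze(1).expand(T, J, -1)
        Ux = U.unsqueeze(0).expand(T, J, -1)
        S = self.w(torch.cat([Hx, Ux, Hx * Ux], dim=-1)).squeeze(-1)   # (T, J)
        U_tilde = torch.softmax(S, dim=1) @ U                  # context-to-query
        b = torch.softmax(S.max(dim=1).values, dim=0)          # query-to-context
        H_tilde = (b.unsqueeze(-1) * H).sum(0).expand_as(H)
        # Query-aware context: no summarization into a single fixed vector.
        return torch.cat([H, U_tilde, H * U_tilde, H * H_tilde], dim=-1)   # (T, 4d)
```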
Dynamic Coattention Networks For Question Answering
Title | Dynamic Coattention Networks For Question Answering |
Authors | Caiming Xiong, Victor Zhong, Richard Socher |
Abstract | Several deep learning models have been proposed for question answering. However, due to their single-pass nature, they have no way to recover from local maxima corresponding to incorrect answers. To address this problem, we introduce the Dynamic Coattention Network (DCN) for question answering. The DCN first fuses co-dependent representations of the question and the document in order to focus on relevant parts of both. Then a dynamic pointing decoder iterates over potential answer spans. This iterative procedure enables the model to recover from initial local maxima corresponding to incorrect answers. On the Stanford question answering dataset, a single DCN model improves the previous state of the art from 71.0% F1 to 75.9%, while a DCN ensemble obtains 80.4% F1. |
Tasks | Question Answering |
Published | 2016-11-05 |
URL | http://arxiv.org/abs/1611.01604v4 |
PDF | http://arxiv.org/pdf/1611.01604v4.pdf |
PWC | https://paperswithcode.com/paper/dynamic-coattention-networks-for-question |
Repo | https://github.com/lmn-extracts/dcn_plus |
Framework | tf |
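The coattention fusion is a few matrix products. A minimal unbatched sketch following the description above; the dynamic pointing decoder that iterates over answer spans is omitted.

```python
import torch

def coattention(D, Q):
    """D: (m, d) document states, Q: (n, d) question states."""
    L = D @ Q.T                          # (m, n) affinity matrix
    A_q = torch.softmax(L, dim=0)        # attention over document, per question word
    A_d = torch.softmax(L, dim=1)        # attention over question, per document word
    C_q = A_q.T @ D                      # (n, d) document summaries for the question
    C_d = A_d @ torch.cat([Q, C_q], 1)   # (m, 2d) question + second-level summaries
    return torch.cat([D, C_d], dim=1)    # (m, 3d) co-dependent representation
```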
DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment
Title | DeepFood: Deep Learning-Based Food Image Recognition for Computer-Aided Dietary Assessment |
Authors | Chang Liu, Yu Cao, Yan Luo, Guanling Chen, Vinod Vokkarane, Yunsheng Ma |
Abstract | Worldwide, in 2014, more than 1.9 billion adults, 18 years and older, were overweight. Of these, over 600 million were obese. Accurately documenting dietary caloric intake is crucial to managing weight loss, but it also presents challenges because most current methods for dietary assessment rely on memory to recall foods eaten. The ultimate goal of our research is to develop computer-aided technical solutions to enhance and improve the accuracy of current measurements of dietary intake. Our proposed system in this paper aims to improve the accuracy of dietary assessment by analyzing food images captured by mobile devices (e.g., smartphones). The key technical innovation in this paper is the deep learning-based food image recognition algorithm. Substantial research has demonstrated that digital imaging accurately estimates dietary intake in many environments and has many advantages over other methods. However, deriving food information (e.g., food type and portion size) from food images effectively and efficiently remains a challenging and open research problem. We propose a new Convolutional Neural Network (CNN)-based food image recognition algorithm to address this problem. We applied our proposed approach to two real-world food image data sets (UEC-256 and Food-101) and achieved impressive results. To the best of our knowledge, these results outperform all other reported work using these two data sets. Our experiments have demonstrated that the proposed approach is a promising solution for addressing the food image recognition problem. Our future work includes further improving the performance of the algorithms and integrating our system into a real-world mobile and cloud computing-based system to enhance the accuracy of current measurements of dietary intake. |
Tasks | Fine-Grained Image Recognition |
Published | 2016-06-17 |
URL | http://arxiv.org/abs/1606.05675v1 |
PDF | http://arxiv.org/pdf/1606.05675v1.pdf |
PWC | https://paperswithcode.com/paper/deepfood-deep-learning-based-food-image |
Repo | https://github.com/deercoder/DeepFood |
Framework | none |
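At heart the recipe is transfer learning for fine-grained classification. A hedged sketch: the paper builds on an Inception-style network, so the ResNet-18 below is only a convenient stand-in, assuming a recent torchvision.

```python
import torch.nn as nn
from torchvision import models

def build_food_classifier(num_classes=101):   # e.g. 101 classes for Food-101
    # Start from ImageNet-pretrained weights, then replace the head and
    # fine-tune on food images (UEC-256 or Food-101).
    net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net
```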
Long Short-Term Memory-Networks for Machine Reading
Title | Long Short-Term Memory-Networks for Machine Reading |
Authors | Jianpeng Cheng, Li Dong, Mirella Lapata |
Abstract | In this paper we address the question of how to render sequence-level networks better at handling structured input. We propose a machine reading simulator which processes text incrementally from left to right and performs shallow reasoning with memory and attention. The reader extends the Long Short-Term Memory architecture with a memory network in place of a single memory cell. This enables adaptive memory usage during recurrence with neural attention, offering a way to weakly induce relations among tokens. The system is initially designed to process a single sequence but we also demonstrate how to integrate it with an encoder-decoder architecture. Experiments on language modeling, sentiment analysis, and natural language inference show that our model matches or outperforms the state of the art. |
Tasks | Language Modelling, Natural Language Inference, Reading Comprehension, Sentiment Analysis |
Published | 2016-01-25 |
URL | http://arxiv.org/abs/1601.06733v7 |
PDF | http://arxiv.org/pdf/1601.06733v7.pdf |
PWC | https://paperswithcode.com/paper/long-short-term-memory-networks-for-machine |
Repo | https://github.com/JRC1995/Abstractive-Summarization |
Framework | tf |
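The memory-network twist can be sketched as intra-attention over a tape of all previous hidden states, replacing the single recurrent summary. The additive scoring below is an assumed form, and the full LSTMN also maintains a tape of cell states, omitted here.

```python
import torch
import torch.nn as nn

class IntraAttention(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # transforms the current input
        self.k = nn.Linear(dim, dim)   # transforms the memory tape
        self.v = nn.Linear(dim, 1)

    def forward(self, x_t, tape):
        """x_t: (dim,) current input; tape: (t, dim) all previous hidden states."""
        scores = self.v(torch.tanh(self.k(tape) + self.q(x_t))).squeeze(-1)   # (t,)
        alpha = torch.softmax(scores, dim=0)
        return alpha @ tape   # adaptive summary of the tape, conditions the next step
```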
Modal-set estimation with an application to clustering
Title | Modal-set estimation with an application to clustering |
Authors | Heinrich Jiang, Samory Kpotufe |
Abstract | We present a first procedure that can estimate, with statistical consistency guarantees, the local maxima of a density under benign distributional conditions. The procedure estimates all such local maxima, or $\textit{modal-sets}$, of any bounded shape or dimension, including usual point-modes. In practice, modal-sets can arise as dense low-dimensional structures in noisy data, and more generally serve to better model the rich variety of locally-high-density structures in data. The procedure is then shown to be competitive on clustering applications, and moreover is quite stable over a wide range of settings of its tuning parameter. |
Tasks | |
Published | 2016-06-13 |
URL | http://arxiv.org/abs/1606.04166v1 |
PDF | http://arxiv.org/pdf/1606.04166v1.pdf |
PWC | https://paperswithcode.com/paper/modal-set-estimation-with-an-application-to |
Repo | https://github.com/hhjiang/mcores |
Framework | none |
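A toy version of the density-based intuition, not the paper's consistent estimator: estimate density with k-NN radii and keep points whose estimated density is maximal within their own neighborhood, yielding candidate members of modal-sets.

```python
import numpy as np
from scipy.spatial import cKDTree

def modal_candidates(X, k=10):
    """X: (n, d) data. Returns points lying in locally-highest-density regions."""
    tree = cKDTree(X)
    dists, idx = tree.query(X, k=k + 1)        # each point plus its k nearest neighbors
    density = 1.0 / (dists[:, -1] + 1e-12)     # k-NN density estimate (up to scale)
    keep = [i for i in range(len(X))
            if density[i] >= density[idx[i]].max()]   # locally maximal density
    return X[keep]
```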
maskSLIC: Regional Superpixel Generation with Application to Local Pathology Characterisation in Medical Images
Title | maskSLIC: Regional Superpixel Generation with Application to Local Pathology Characterisation in Medical Images |
Authors | Benjamin Irving |
Abstract | Supervoxel methods such as Simple Linear Iterative Clustering (SLIC) are an effective technique for partitioning an image or volume into locally similar regions, and are a common building block for the development of detection, segmentation and analysis methods. We introduce maskSLIC, an extension of SLIC that creates supervoxels within regions of interest, and demonstrate, on examples from two to four dimensions, that maskSLIC overcomes issues that affect SLIC within an irregular mask. We highlight the benefits of this method through examples, and show that it is able to better represent underlying tumour subregions and achieves significantly better results than SLIC on the BRATS 2013 brain tumour challenge data (p=0.001), outperforming SLIC on 18/20 scans. Finally, we show an application of this method to the analysis of functional tumour subregions and demonstrate that it is more effective than voxel clustering. |
Tasks | |
Published | 2016-06-30 |
URL | http://arxiv.org/abs/1606.09518v2 |
PDF | http://arxiv.org/pdf/1606.09518v2.pdf |
PWC | https://paperswithcode.com/paper/maskslic-regional-superpixel-generation-with |
Repo | https://github.com/benjaminirving/maskSLIC |
Framework | none |
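maskSLIC has since been upstreamed into scikit-image as the `mask` argument of `slic` (available in recent releases), so the method can be tried in a few lines:

```python
import numpy as np
from skimage.segmentation import slic

image = np.random.rand(128, 128, 3)       # stand-in for a medical image slice
mask = np.zeros((128, 128), dtype=bool)
mask[32:96, 32:96] = True                 # in practice an irregular region of interest

# Superpixels are generated only inside the mask; pixels outside are labeled 0.
segments = slic(image, n_segments=50, mask=mask)
```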