October 21, 2019

3052 words 15 mins read

Paper Group AWR 165

Generative Code Modeling with Graphs. Evidence Transfer for Improving Clustering Tasks Using External Categorical Evidence. Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs. Quantitative analysis of patch-based fully convolutional neural networks for tissue segmentation on brain magnetic resonance imaging. FutureGAN: Anticipating the F …

Generative Code Modeling with Graphs

Title Generative Code Modeling with Graphs
Authors Marc Brockschmidt, Miltiadis Allamanis, Alexander L. Gaunt, Oleksandr Polozov
Abstract Generative models for source code are an interesting structured prediction problem, requiring reasoning about both hard syntactic and semantic constraints as well as about natural, likely programs. We present a novel model for this problem that uses a graph to represent the intermediate state of the generated output. The generative procedure interleaves grammar-driven expansion steps with graph augmentation and neural message passing steps. An experimental evaluation shows that our new model can generate semantically meaningful expressions, outperforming a range of strong baselines.
Tasks Structured Prediction
Published 2018-05-22
URL http://arxiv.org/abs/1805.08490v2
PDF http://arxiv.org/pdf/1805.08490v2.pdf
PWC https://paperswithcode.com/paper/generative-code-modeling-with-graphs
Repo https://github.com/Microsoft/graph-based-code-modelling
Framework tf
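
The interleaved expand-then-score loop the abstract describes can be caricatured with a toy grammar. This is an illustrative sketch only: the paper scores productions with a graph neural network over the partial program, whereas the fixed SCORES table and expansion budget below are invented stand-ins.

```python
# Toy sketch of grammar-driven expansion: repeatedly replace the leftmost
# nonterminal using a production chosen by a scoring function. The paper
# chooses productions with message passing over a graph of the partial
# AST; here a fixed score table stands in for the learned model.

GRAMMAR = {
    "Expr": [["Expr", "+", "Term"], ["Term"]],
    "Term": [["x"], ["1"]],
}
SCORES = {  # stand-in for the learned model's production scores
    ("Expr", 0): 0.6, ("Expr", 1): 0.4,
    ("Term", 0): 0.7, ("Term", 1): 0.3,
}

def expand(symbols, budget=3):
    out = list(symbols)
    while True:
        nts = [i for i, s in enumerate(out) if s in GRAMMAR]
        if not nts:
            return out  # no nonterminals left: expression is complete
        i = nts[0]
        sym = out[i]
        rules = GRAMMAR[sym]
        if budget > 0:
            best = max(range(len(rules)), key=lambda r: SCORES[(sym, r)])
        else:
            # Budget spent: pick the rule introducing fewest nonterminals
            # so the expansion terminates.
            best = min(range(len(rules)),
                       key=lambda r: sum(s in GRAMMAR for s in rules[r]))
        budget -= 1
        out[i:i + 1] = rules[best]

tokens = expand(["Expr"])
assert all(t not in GRAMMAR for t in tokens)  # fully expanded to terminals
```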

Evidence Transfer for Improving Clustering Tasks Using External Categorical Evidence

Title Evidence Transfer for Improving Clustering Tasks Using External Categorical Evidence
Authors Athanasios Davvetas, Iraklis A. Klampanos, Vangelis Karkaletsis
Abstract In this paper we introduce evidence transfer for clustering, a deep learning method that can incrementally manipulate the latent representations of an autoencoder, according to external categorical evidence, in order to improve a clustering outcome. By evidence transfer we define the process by which the categorical outcome of an external, auxiliary task is exploited to improve a primary task, in this case representation learning for clustering. Our proposed method makes no assumptions regarding the categorical evidence presented, nor the structure of the latent space. We compare our method against the baseline solution by performing k-means clustering before and after its deployment. Experiments with three different kinds of evidence show that our method effectively manipulates the latent representations when introduced with real corresponding evidence, while remaining robust when presented with low quality evidence.
Tasks Representation Learning
Published 2018-11-09
URL http://arxiv.org/abs/1811.03909v2
PDF http://arxiv.org/pdf/1811.03909v2.pdf
PWC https://paperswithcode.com/paper/evidence-transfer-for-improving-clustering
Repo https://github.com/davidath/evitrac
Framework tf

Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

Title Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs
Authors Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, Andrew Gordon Wilson
Abstract The loss functions of deep neural networks are complex and their geometric properties are not well understood. We show that the optima of these complex loss functions are in fact connected by simple curves over which training and test accuracy are nearly constant. We introduce a training procedure to discover these high-accuracy pathways between modes. Inspired by this new geometric insight, we also propose a new ensembling method entitled Fast Geometric Ensembling (FGE). Using FGE we can train high-performing ensembles in the time required to train a single model. We achieve improved performance compared to the recent state-of-the-art Snapshot Ensembles, on CIFAR-10, CIFAR-100, and ImageNet.
Tasks
Published 2018-02-27
URL http://arxiv.org/abs/1802.10026v4
PDF http://arxiv.org/pdf/1802.10026v4.pdf
PWC https://paperswithcode.com/paper/loss-surfaces-mode-connectivity-and-fast
Repo https://github.com/chandansharma02/Deep_Learning
Framework pytorch
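
The "simple curves" connecting optima can be made concrete: the paper parameterizes such a path as, for example, a quadratic Bezier curve between two trained weight vectors with a learned bend point. In the sketch below the toy vectors are invented and the bend point is fixed rather than trained, purely to show the curve's endpoint property.

```python
# Toy sketch of the quadratic Bezier curve used to connect two trained
# weight vectors w1 and w2 through a "bend" point theta. phi(0) = w1 and
# phi(1) = w2, so the curve's endpoints are exactly the two modes; in the
# paper, theta is trained so loss stays low everywhere along the curve.

def bezier_point(w1, w2, theta, t):
    """Point on the quadratic Bezier curve phi(t) between two modes."""
    return [
        (1 - t) ** 2 * a + 2 * t * (1 - t) * b + t ** 2 * c
        for a, b, c in zip(w1, theta, w2)
    ]

# Two "modes" (trained weight vectors) and a bend point in 3-D.
w1 = [0.0, 0.0, 1.0]
w2 = [1.0, 1.0, 0.0]
theta = [0.5, 2.0, 0.5]  # in practice, learned to keep loss low on the path

midpoint = bezier_point(w1, w2, theta, 0.5)
assert bezier_point(w1, w2, theta, 0.0) == w1  # curve starts at mode 1
assert bezier_point(w1, w2, theta, 1.0) == w2  # curve ends at mode 2
```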

Quantitative analysis of patch-based fully convolutional neural networks for tissue segmentation on brain magnetic resonance imaging

Title Quantitative analysis of patch-based fully convolutional neural networks for tissue segmentation on brain magnetic resonance imaging
Authors Jose Bernal, Kaisar Kushibar, Mariano Cabezas, Sergi Valverde, Arnau Oliver, Xavier Lladó
Abstract Accurate brain tissue segmentation in Magnetic Resonance Imaging (MRI) has attracted the attention of medical doctors and researchers since variations in tissue volume help in diagnosing and monitoring neurological diseases. Several proposals have been designed throughout the years comprising conventional machine learning strategies as well as convolutional neural networks (CNN) approaches. In particular, in this paper, we analyse a sub-group of deep learning methods producing dense predictions. This branch, referred to in the literature as Fully CNN (FCNN), is of interest as these architectures can process an input volume in less time than CNNs and local spatial dependencies may be encoded since several voxels are classified at once. Our study focuses on understanding architectural strengths and weaknesses of literature-like approaches. Hence, we implement eight FCNN architectures inspired by robust state-of-the-art methods on brain segmentation related tasks. We evaluate them using the IBSR18, MICCAI2012 and iSeg2017 datasets as they contain infant and adult data and exhibit varied voxel spacing, image quality, number of scans and available imaging modalities. The discussion is driven in three directions: comparison between 2D and 3D approaches, the importance of multiple modalities and overlapping as a sampling strategy for training and testing models. To encourage other researchers to explore the evaluation framework, a public version is accessible to download from our research website.
Tasks Brain Segmentation
Published 2018-01-19
URL http://arxiv.org/abs/1801.06457v2
PDF http://arxiv.org/pdf/1801.06457v2.pdf
PWC https://paperswithcode.com/paper/quantitative-analysis-of-patch-based-fully
Repo https://github.com/NIC-VICOROB/tissue_segmentation_comparison
Framework tf

FutureGAN: Anticipating the Future Frames of Video Sequences using Spatio-Temporal 3d Convolutions in Progressively Growing GANs

Title FutureGAN: Anticipating the Future Frames of Video Sequences using Spatio-Temporal 3d Convolutions in Progressively Growing GANs
Authors Sandra Aigner, Marco Körner
Abstract We introduce a new encoder-decoder GAN model, FutureGAN, that predicts future frames of a video sequence conditioned on a sequence of past frames. During training, the networks solely receive the raw pixel values as an input, without relying on additional constraints or dataset specific conditions. To capture both the spatial and temporal components of a video sequence, spatio-temporal 3d convolutions are used in all encoder and decoder modules. Further, we utilize concepts of the existing progressively growing GAN (PGGAN) that achieves high-quality results on generating high-resolution single images. The FutureGAN model extends this concept to the complex task of video prediction. We conducted experiments on three different datasets, MovingMNIST, KTH Action, and Cityscapes. Our results show that the model learned representations to transform the information of an input sequence into a plausible future sequence effectively for all three datasets. The main advantage of the FutureGAN framework is that it is applicable to various different datasets without additional changes, whilst achieving stable results that are competitive to the state-of-the-art in video prediction. Our code is available at https://github.com/TUM-LMF/FutureGAN.
Tasks Video Prediction
Published 2018-10-02
URL http://arxiv.org/abs/1810.01325v2
PDF http://arxiv.org/pdf/1810.01325v2.pdf
PWC https://paperswithcode.com/paper/futuregan-anticipating-the-future-frames-of
Repo https://github.com/TUM-LMF/FutureGAN
Framework pytorch

Learning User Preferences and Understanding Calendar Contexts for Event Scheduling

Title Learning User Preferences and Understanding Calendar Contexts for Event Scheduling
Authors Donghyeon Kim, Jinhyuk Lee, Donghee Choi, Jaehoon Choi, Jaewoo Kang
Abstract With online calendar services gaining popularity worldwide, calendar data has become one of the richest context sources for understanding human behavior. However, event scheduling is still time-consuming even with the development of online calendars. Although machine learning based event scheduling models have automated scheduling processes to some extent, they often fail to understand subtle user preferences and complex calendar contexts with event titles written in natural language. In this paper, we propose Neural Event Scheduling Assistant (NESA) which learns user preferences and understands calendar contexts, directly from raw online calendars for fully automated and highly effective event scheduling. We leverage over 593K calendar events for NESA to learn scheduling personal events, and we further utilize NESA for multi-attendee event scheduling. NESA successfully incorporates deep neural networks such as Bidirectional Long Short-Term Memory, Convolutional Neural Network, and Highway Network for learning the preferences of each user and understanding calendar context based on natural language. The experimental results show that NESA significantly outperforms previous baseline models in terms of various evaluation metrics on both personal and multi-attendee event scheduling tasks. Our qualitative analysis demonstrates the effectiveness of each layer in NESA and learned user preferences.
Tasks
Published 2018-09-05
URL http://arxiv.org/abs/1809.01316v2
PDF http://arxiv.org/pdf/1809.01316v2.pdf
PWC https://paperswithcode.com/paper/learning-user-preferences-and-understanding
Repo https://github.com/donghyeonk/nesa
Framework pytorch

An empirical study on the names of points of interest and their changes with geographic distance

Title An empirical study on the names of points of interest and their changes with geographic distance
Authors Yingjie Hu, Krzysztof Janowicz
Abstract While Points Of Interest (POIs), such as restaurants, hotels, and barber shops, are part of urban areas irrespective of their specific locations, the names of these POIs often reveal valuable information related to local culture, landmarks, influential families, figures, events, and so on. Place names have long been studied by geographers, e.g., to understand their origins and relations to family names. However, there is a lack of large-scale empirical studies that examine the localness of place names and their changes with geographic distance. In addition to enhancing our understanding of the coherence of geographic regions, such empirical studies are also significant for geographic information retrieval where they can inform computational models and improve the accuracy of place name disambiguation. In this work, we conduct an empirical study based on 112,071 POIs in seven US metropolitan areas extracted from an open Yelp dataset. We propose to adopt term frequency and inverse document frequency in geographic contexts to identify local terms used in POI names and to analyze their usages across different POI types. Our results show an uneven usage of local terms across POI types, which is highly consistent among different geographic regions. We also examine the decaying effect of POI name similarity with the increase of distance among POIs. While our analysis focuses on urban POI names, the presented methods can be generalized to other place types as well, such as mountain peaks and streets.
Tasks Information Retrieval
Published 2018-06-21
URL http://arxiv.org/abs/1806.08040v1
PDF http://arxiv.org/pdf/1806.08040v1.pdf
PWC https://paperswithcode.com/paper/an-empirical-study-on-the-names-of-points-of
Repo https://github.com/YingjieHu/POI_Name
Framework none
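
The proposed use of term frequency and inverse document frequency in geographic contexts can be sketched by treating each metropolitan area as one "document" made of its POI-name terms. The toy regions and terms below are invented for illustration.

```python
import math
from collections import Counter

# Each metro area is one "document": the bag of terms appearing in its
# POI names. Terms used everywhere (e.g. "cafe") get zero IDF; terms
# local to one region (e.g. "bayou") score highly there.
regions = {
    "houston": ["bayou", "grill", "cafe", "bayou"],
    "boston": ["chowder", "cafe", "grill"],
    "seattle": ["espresso", "cafe", "grill"],
}

def tf_idf(term, region):
    counts = Counter(regions[region])
    tf = counts[term] / sum(counts.values())
    df = sum(1 for terms in regions.values() if term in terms)
    idf = math.log(len(regions) / df)
    return tf * idf

# "bayou" is local to Houston, so it outscores the ubiquitous "cafe".
assert tf_idf("bayou", "houston") > tf_idf("cafe", "houston")
assert tf_idf("cafe", "boston") == 0.0  # appears in every region: log(3/3)=0
```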

Learning to Search in Long Documents Using Document Structure

Title Learning to Search in Long Documents Using Document Structure
Authors Mor Geva, Jonathan Berant
Abstract Reading comprehension models are based on recurrent neural networks that sequentially process the document tokens. As interest turns to answering more complex questions over longer documents, sequential reading of large portions of text becomes a substantial bottleneck. Inspired by how humans use document structure, we propose a novel framework for reading comprehension. We represent documents as trees, and model an agent that learns to interleave quick navigation through the document tree with more expensive answer extraction. To encourage exploration of the document tree, we propose a new algorithm, based on Deep Q-Network (DQN), which strategically samples tree nodes at training time. Empirically we find our algorithm improves question answering performance compared to DQN and a strong information-retrieval (IR) baseline, and that ensembling our model with the IR baseline results in further gains in performance.
Tasks Information Retrieval, Question Answering, Reading Comprehension
Published 2018-06-09
URL http://arxiv.org/abs/1806.03529v2
PDF http://arxiv.org/pdf/1806.03529v2.pdf
PWC https://paperswithcode.com/paper/learning-to-search-in-long-documents-using
Repo https://github.com/mega002/DocQN
Framework tf

Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search

Title Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search
Authors Jinfeng Rao, Wei Yang, Yuhao Zhang, Ferhan Ture, Jimmy Lin
Abstract Despite substantial interest in applications of neural networks to information retrieval, neural ranking models have only been applied to standard ad hoc retrieval tasks over web pages and newswire documents. This paper proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network), a novel neural ranking model specifically designed for ranking short social media posts. We identify document length, informal language, and heterogeneous relevance signals as features that distinguish documents in our domain, and present a model specifically designed with these characteristics in mind. Our model uses hierarchical convolutional layers to learn latent semantic soft-match relevance signals at the character, word, and phrase levels. A pooling-based similarity measurement layer integrates evidence from multiple types of matches between the query, the social media post, as well as URLs contained in the post. Extensive experiments using Twitter data from the TREC Microblog Tracks 2011–2014 show that our model significantly outperforms prior feature-based as well as existing neural ranking models. To the best of our knowledge, this paper presents the first substantial work tackling search over social media posts using neural ranking models.
Tasks Information Retrieval
Published 2018-05-21
URL https://arxiv.org/abs/1805.08159v2
PDF https://arxiv.org/pdf/1805.08159v2.pdf
PWC https://paperswithcode.com/paper/multi-perspective-relevance-matching-with
Repo https://github.com/Jeffyrao/neural-tweet-search
Framework tf

Cross-lingual Document Retrieval using Regularized Wasserstein Distance

Title Cross-lingual Document Retrieval using Regularized Wasserstein Distance
Authors Georgios Balikas, Charlotte Laclau, Ievgen Redko, Massih-Reza Amini
Abstract Many information retrieval algorithms rely on the notion of a good distance that allows to efficiently compare objects of different nature. Recently, a new promising metric called Word Mover’s Distance was proposed to measure the divergence between text passages. In this paper, we demonstrate that this metric can be extended to incorporate term-weighting schemes and provide more accurate and computationally efficient matching between documents using entropic regularization. We evaluate the benefits of both extensions in the task of cross-lingual document retrieval (CLDR). Our experimental results on eight CLDR problems suggest that the proposed methods achieve remarkable improvements in terms of Mean Reciprocal Rank compared to several baselines.
Tasks Information Retrieval
Published 2018-05-11
URL http://arxiv.org/abs/1805.04437v1
PDF http://arxiv.org/pdf/1805.04437v1.pdf
PWC https://paperswithcode.com/paper/cross-lingual-document-retrieval-using
Repo https://github.com/balikasg/WassersteinRetrieval
Framework none

Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers

Title Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
Authors Jianbo Ye, Xin Lu, Zhe Lin, James Z. Wang
Abstract Model pruning has become a useful technique that improves the computational efficiency of deep learning, making it possible to deploy solutions in resource-limited scenarios. A widely-used practice in relevant work assumes that a smaller-norm parameter or feature plays a less informative role at the inference time. In this paper, we propose a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that does not critically rely on this assumption. Instead, it focuses on direct simplification of the channel-to-channel computation graph of a CNN without needing to perform the computationally difficult and not-always-useful task of making high-dimensional tensors of CNN structured sparse. Our approach takes two stages: first to adopt an end-to-end stochastic training method that eventually forces the outputs of some channels to be constant, and then to prune those constant channels from the original neural network by adjusting the biases of their impacting layers such that the resulting compact model can be quickly fine-tuned. Our approach is mathematically appealing from an optimization perspective and easy to reproduce. We evaluate our approach on several image learning benchmarks and demonstrate its interesting aspects and competitive performance.
Tasks
Published 2018-02-01
URL http://arxiv.org/abs/1802.00124v2
PDF http://arxiv.org/pdf/1802.00124v2.pdf
PWC https://paperswithcode.com/paper/rethinking-the-smaller-norm-less-informative
Repo https://github.com/jack-willturner/batchnorm-pruning
Framework pytorch
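
The second stage, pruning constant channels by adjusting the biases of the layers they feed, can be illustrated on a plain linear layer. This is a simplified stand-in for the paper's CNN setting, with invented variable names: if an input to a layer is stuck at a constant c, its contribution W[:, i] * c folds exactly into the bias, and the column can be dropped.

```python
# Sketch of the pruning step: channels whose outputs became constant
# during training are removed, and their fixed contribution is folded
# into the next layer's bias, so the compact model computes the same
# function on the remaining inputs.

def forward(W, b, x):
    """Plain affine layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def prune_constant_inputs(W, b, const):
    """const: dict {input_index: constant_value}. Returns compact W', b'
    and the list of surviving input indices."""
    new_b = [bi + sum(row[i] * c for i, c in const.items())
             for row, bi in zip(W, b)]
    keep = [j for j in range(len(W[0])) if j not in const]
    new_W = [[row[j] for j in keep] for row in W]
    return new_W, new_b, keep

W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [0.1, 0.2]
x = [7.0, 0.5, 9.0]          # input channel 1 is stuck at 0.5
Wp, bp, keep = prune_constant_inputs(W, b, {1: 0.5})
xp = [x[j] for j in keep]    # compact input without the constant channel
assert all(abs(u - v) < 1e-9
           for u, v in zip(forward(W, b, x), forward(Wp, bp, xp)))
```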

Distillation Techniques for Pseudo-rehearsal Based Incremental Learning

Title Distillation Techniques for Pseudo-rehearsal Based Incremental Learning
Authors Haseeb Shah, Khurram Javed, Faisal Shafait
Abstract The ability to learn from incrementally arriving data is essential for any life-long learning system. However, standard deep neural networks forget the knowledge about the old tasks, a phenomenon called catastrophic forgetting, when trained on incrementally arriving data. We discuss the biases in current Generative Adversarial Networks (GAN) based approaches that learn the classifier by knowledge distillation from previously trained classifiers. These biases cause the trained classifier to perform poorly. We propose an approach to remove these biases by distilling knowledge from the classifier of AC-GAN. Experiments on MNIST and CIFAR10 show that this method is comparable to current state-of-the-art rehearsal-based approaches. The code for this paper is available at https://bit.ly/incremental-learning
Tasks
Published 2018-07-08
URL http://arxiv.org/abs/1807.02799v3
PDF http://arxiv.org/pdf/1807.02799v3.pdf
PWC https://paperswithcode.com/paper/distillation-techniques-for-pseudo-rehearsal
Repo https://github.com/haseebs/Pseudo-rehearsal-Incremental-Learning
Framework pytorch
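
The knowledge-distillation term such approaches build on is a cross-entropy between temperature-softened teacher and student output distributions. A generic sketch of that term, not the paper's AC-GAN-specific code:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax; higher T flattens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions,
    i.e. the classic knowledge-distillation objective."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]
matched = distillation_loss(teacher, teacher)   # student mimics teacher
off = distillation_loss([0.2, 1.0, 3.0], teacher)  # student disagrees
assert matched < off  # loss is smallest when the student matches the teacher
```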

The Weighted Kendall and High-order Kernels for Permutations

Title The Weighted Kendall and High-order Kernels for Permutations
Authors Yunlong Jiao, Jean-Philippe Vert
Abstract We propose new positive definite kernels for permutations. First we introduce a weighted version of the Kendall kernel, which allows to weight unequally the contributions of different item pairs in the permutations depending on their ranks. Like the Kendall kernel, we show that the weighted version is invariant to relabeling of items and can be computed efficiently in $O(n \ln(n))$ operations, where $n$ is the number of items in the permutation. Second, we propose a supervised approach to learn the weights by jointly optimizing them with the function estimated by a kernel machine. Third, while the Kendall kernel considers pairwise comparison between items, we extend it by considering higher-order comparisons among tuples of items and show that the supervised approach of learning the weights can be systematically generalized to higher-order permutation kernels.
Tasks
Published 2018-02-23
URL http://arxiv.org/abs/1802.08526v2
PDF http://arxiv.org/pdf/1802.08526v2.pdf
PWC https://paperswithcode.com/paper/the-weighted-kendall-and-high-order-kernels
Repo https://github.com/YunlongJiao/weightedkendall
Framework none
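
The plain (unweighted) Kendall kernel counts concordant minus discordant item pairs between two rankings. The naive O(n^2) sketch below just illustrates the quantity; the paper's algorithm computes it in O(n ln(n)), and the weighted variant multiplies each pair's contribution by rank-dependent weights.

```python
from itertools import combinations

def kendall_kernel(x, y):
    """Kendall kernel: (concordant - discordant pairs) / (n choose 2)
    between two rankings x and y of the same n items."""
    n = len(x)
    total = 0
    for i, j in combinations(range(n), 2):
        sx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the pair order in x
        sy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the pair order in y
        total += sx * sy
    return total / (n * (n - 1) / 2)

# A weighted variant would scale each pair's term by a weight depending
# on the items' ranks; constant weights recover the plain kernel.
a = [1, 2, 3, 4]
assert kendall_kernel(a, a) == 1.0          # identical rankings
assert kendall_kernel(a, a[::-1]) == -1.0   # fully reversed rankings
```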

Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions

Title Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions
Authors Sjoerd van Steenkiste, Michael Chang, Klaus Greff, Jürgen Schmidhuber
Abstract Common-sense physical reasoning is an essential ingredient for any intelligent agent operating in the real-world. For example, it can be used to simulate the environment, or to infer the state of parts of the world that are currently unobserved. In order to match real-world conditions this causal knowledge must be learned without access to supervised data. To address this problem we present a novel method that learns to discover objects and model their physical interactions from raw visual images in a purely \emph{unsupervised} fashion. It incorporates prior knowledge about the compositional nature of human perception to factor interactions between object-pairs and learn efficiently. On videos of bouncing balls we show the superior modelling capabilities of our method compared to other unsupervised neural approaches that do not incorporate such prior knowledge. We demonstrate its ability to handle occlusion and show that it can extrapolate learned knowledge to scenes with different numbers of objects.
Tasks Common Sense Reasoning
Published 2018-02-28
URL http://arxiv.org/abs/1802.10353v1
PDF http://arxiv.org/pdf/1802.10353v1.pdf
PWC https://paperswithcode.com/paper/relational-neural-expectation-maximization
Repo https://github.com/sjoerdvansteenkiste/Relational-NEM
Framework tf

Cold-Start Aware User and Product Attention for Sentiment Classification

Title Cold-Start Aware User and Product Attention for Sentiment Classification
Authors Reinald Kim Amplayo, Jihyeok Kim, Sua Sung, Seung-won Hwang
Abstract The use of user/product information in sentiment analysis is important, especially for cold-start users/products, whose number of reviews is very limited. However, current models do not deal with the cold-start problem which is typical in review websites. In this paper, we present Hybrid Contextualized Sentiment Classifier (HCSC), which contains two modules: (1) a fast word encoder that returns word vectors embedded with short and long range dependency features; and (2) Cold-Start Aware Attention (CSAA), an attention mechanism that considers the existence of cold-start problem when attentively pooling the encoded word vectors. HCSC introduces shared vectors that are constructed from similar users/products, and are used when the original distinct vectors do not have sufficient information (i.e. cold-start). This is decided by a frequency-guided selective gate vector. Our experiments show that in terms of RMSE, HCSC performs significantly better when compared with state-of-the-art models on famous datasets, despite having less complexity, and thus can be trained much faster. More importantly, our model performs significantly better than previous models when the training data is sparse and has cold-start problems.
Tasks Sentiment Analysis
Published 2018-06-14
URL http://arxiv.org/abs/1806.05507v1
PDF http://arxiv.org/pdf/1806.05507v1.pdf
PWC https://paperswithcode.com/paper/cold-start-aware-user-and-product-attention
Repo https://github.com/rktamplayo/HCSC
Framework tf
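
The frequency-guided selective gate can be sketched as blending a user's distinct vector with the shared vector built from similar users, gated by how many reviews the user has. The sigmoid-of-review-count gate below is an invented stand-in for the paper's learned gate, purely to show the cold-start behavior.

```python
import math

def gated_user_vector(distinct, shared, n_reviews, k=10.0):
    """Frequency-guided gate (sketch): few reviews -> lean on the shared
    vector from similar users; many reviews -> the user's own vector."""
    g = 1.0 / (1.0 + math.exp(-(n_reviews - k)))  # sigmoid gate in (0, 1)
    return [g * d + (1 - g) * s for d, s in zip(distinct, shared)]

distinct = [1.0, 0.0]  # the user's own (possibly uninformative) vector
shared = [0.0, 1.0]    # vector pooled from similar users/products
cold = gated_user_vector(distinct, shared, n_reviews=0)
warm = gated_user_vector(distinct, shared, n_reviews=100)
assert cold[1] > cold[0]   # cold-start user: mostly the shared vector
assert warm[0] > warm[1]   # frequent user: mostly the distinct vector
```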