October 16, 2019

2798 words 14 mins read

Paper Group NAWR 17

Learning Representations Specialized in Spatial Knowledge: Leveraging Language and Vision. An End-to-End Deep Learning Architecture for Graph Classification. Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures. Efficient Gradient-Free Variational Inference using Policy Search. Dialog-to-Action: Conve …

Learning Representations Specialized in Spatial Knowledge: Leveraging Language and Vision

Title Learning Representations Specialized in Spatial Knowledge: Leveraging Language and Vision
Authors Guillem Collell, Marie-Francine Moens
Abstract Spatial understanding is crucial in many real-world problems, yet little progress has been made towards building representations that capture spatial knowledge. Here, we move one step forward in this direction and learn such representations by leveraging a task consisting of predicting continuous 2D spatial arrangements of objects given object-relationship-object instances (e.g., “cat under chair”) and a simple neural network model that learns the task from annotated images. We show that the model succeeds in this task and, furthermore, that it is capable of predicting correct spatial arrangements for unseen objects if either CNN features or word embeddings of the objects are provided. The differences between visual and linguistic features are discussed. Next, to evaluate the spatial representations learned in the previous task, we introduce a task and a dataset consisting of crowdsourced human ratings of spatial similarity for object pairs. We find that both CNN (convolutional neural network) features and word embeddings predict human judgments of similarity well and that these vectors can be further specialized in spatial knowledge if we update them when training the model that predicts spatial arrangements of objects. Overall, this paper paves the way towards building distributed spatial representations, contributing to the understanding of spatial expressions in language.
Tasks Dependency Parsing, Robot Navigation, Semantic Textual Similarity, Sentiment Analysis, Word Embeddings
Published 2018-01-01
URL https://www.aclweb.org/anthology/Q18-1010/
PDF https://www.aclweb.org/anthology/Q18-1010
PWC https://paperswithcode.com/paper/learning-representations-specialized-in
Repo https://github.com/gcollell/spatial-representations
Framework none
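
The prediction task the paper trains on is simple to picture: features of the two objects and the relation go in, a continuous 2D arrangement comes out. Below is a minimal sketch of such a model, assuming pre-computed feature vectors; the module name, layer sizes, and output convention (an (x, y) position of the target object relative to the reference) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SpatialArrangementNet(nn.Module):
    """Sketch: map (object, relation, object) feature vectors to a
    continuous 2D arrangement, here the target object's position
    relative to the reference object. Layer sizes are illustrative."""

    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # predicted (x, y) of the target object
        )

    def forward(self, obj1, rel, obj2):
        # obj1/rel/obj2: (batch, feat_dim) CNN features or word embeddings
        return self.mlp(torch.cat([obj1, rel, obj2], dim=-1))
```

Because the inputs are just feature vectors, swapping CNN features for word embeddings (the comparison the paper runs) changes nothing structurally.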

An End-to-End Deep Learning Architecture for Graph Classification

Title An End-to-End Deep Learning Architecture for Graph Classification
Authors Muhan Zhang, Zhicheng Cui, Marion Neumann, Yixin Chen
Abstract Neural networks are typically designed to deal with data in tensor forms. In this paper, we propose a novel neural network architecture accepting graphs of arbitrary structure. Given a dataset containing graphs in the form of (G,y) where G is a graph and y is its class, we aim to develop neural networks that read the graphs directly and learn a classification function. There are two main challenges: 1) how to extract useful features characterizing the rich information encoded in a graph for classification purpose, and 2) how to sequentially read a graph in a meaningful and consistent order. To address the first challenge, we design a localized graph convolution model and show its connection with two graph kernels. To address the second challenge, we design a novel SortPooling layer which sorts graph vertices in a consistent order so that traditional neural networks can be trained on the graphs. Experiments on benchmark graph classification datasets demonstrate that the proposed architecture achieves highly competitive performance with state-of-the-art graph kernels and other graph neural network methods. Moreover, the architecture allows end-to-end gradient-based training with original graphs, without the need to first transform graphs into vectors.
Tasks Graph Classification
Published 2018-04-29
URL https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewPaper/17146
PDF https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/17146/16755
PWC https://paperswithcode.com/paper/an-end-to-end-deep-learning-architecture-for
Repo https://github.com/muhanzhang/DGCNN
Framework torch
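
The SortPooling idea is compact enough to sketch. The version below, assuming a single graph's node-feature matrix, sorts vertices by their last feature channel (which the paper ties to a consistent, WL-like ordering) and truncates or zero-pads to a fixed k so a standard 1D CNN can consume the result; shapes and the padding choice are illustrative, not the repo's code.

```python
import torch

def sort_pooling(node_feats: torch.Tensor, k: int) -> torch.Tensor:
    """Sketch of a SortPooling-style layer for one graph.

    node_feats: (num_nodes, channels)
    returns:    (k, channels), sorted by the last channel and
                truncated or zero-padded to exactly k rows.
    """
    order = torch.argsort(node_feats[:, -1], descending=True)
    sorted_feats = node_feats[order]
    n, c = sorted_feats.shape
    if n >= k:
        return sorted_feats[:k]
    pad = torch.zeros(k - n, c, dtype=node_feats.dtype)
    return torch.cat([sorted_feats, pad], dim=0)
```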

Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures

Title Sequicity: Simplifying Task-oriented Dialogue Systems with Single Sequence-to-Sequence Architectures
Authors Wenqiang Lei, Xisen Jin, Min-Yen Kan, Zhaochun Ren, Xiangnan He, Dawei Yin
Abstract Existing solutions to task-oriented dialogue systems follow pipeline designs, which introduce architectural complexity and fragility. We propose a novel, holistic, extendable framework based on a single sequence-to-sequence (seq2seq) model which can be optimized with supervised or reinforcement learning. A key contribution is that we design text spans named belief spans to track dialogue beliefs, allowing task-oriented dialogue systems to be modeled in a seq2seq way. Based on this, we propose a simplistic Two Stage CopyNet instantiation which demonstrates good scalability: it significantly reduces model complexity in terms of number of parameters and training time by an order of magnitude. It significantly outperforms state-of-the-art pipeline-based methods on large datasets and retains a satisfactory entity match rate on out-of-vocabulary (OOV) cases where pipeline-designed competitors totally fail.
Tasks Task-Oriented Dialogue Systems
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-1133/
PDF https://www.aclweb.org/anthology/P18-1133
PWC https://paperswithcode.com/paper/sequicity-simplifying-task-oriented-dialogue
Repo https://github.com/WING-NUS/sequicity
Framework pytorch
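
The belief-span trick is essentially a serialization format, sketched below. The separator tokens and the exact slices of context that enter the source sequence are illustrative assumptions, not the paper's vocabulary; the point is that the decoder must emit the updated belief span before the (delexicalized) response, so one seq2seq model does both state tracking and generation.

```python
def make_turn_example(prev_bspan, prev_response, user_utterance,
                      new_bspan, response):
    """Sketch of Sequicity-style turn serialization: read the previous
    belief span plus the current utterance; decode the updated belief
    span first, then the delexicalized response. Separator tokens are
    illustrative."""
    source = f"{prev_bspan} </b> {prev_response} </r> {user_utterance}"
    target = f"{new_bspan} </b> {response}"
    return source, target

src, tgt = make_turn_example(
    prev_bspan="italian ; centre",
    prev_response="there are several restaurants . any price range ?",
    user_utterance="cheap please",
    new_bspan="italian ; centre ; cheap",
    response="<name_SLOT> is a cheap italian place in the centre .",
)
```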

Efficient Gradient-Free Variational Inference using Policy Search

Title Efficient Gradient-Free Variational Inference using Policy Search
Authors Oleg Arenz, Gerhard Neumann, Mingjun Zhong
Abstract Inference from complex distributions is a common problem in machine learning that many Bayesian methods must solve. We propose an efficient, gradient-free method for learning general GMM approximations of multimodal distributions based on recent insights from stochastic search methods. Our method establishes information-geometric trust regions to ensure efficient exploration of the sampling space and stability of the GMM updates, allowing for efficient estimation of multivariate Gaussian variational distributions. For GMMs, we apply a variational lower bound to decompose the learning objective into sub-problems given by learning the individual mixture components and the coefficients. The number of mixture components is adapted online to allow for arbitrarily exact approximations. We demonstrate on several domains that we can learn significantly better approximations than competing variational inference methods and that the quality of samples drawn from our approximations is on par with samples created by state-of-the-art MCMC samplers that require significantly more computational resources.
Tasks Efficient Exploration
Published 2018-07-01
URL https://icml.cc/Conferences/2018/Schedule?showEvent=2482
PDF http://proceedings.mlr.press/v80/arenz18a/arenz18a.pdf
PWC https://paperswithcode.com/paper/efficient-gradient-free-variational-inference
Repo https://github.com/OlegArenz/VIPS
Framework none
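
The "gradient-free" part means the method only ever needs evaluations of the unnormalized target log-density. As a rough illustration of the objective being maximized, here is a Monte Carlo estimate of a GMM's variational lower bound; the actual method (VIPS) decomposes this per mixture component and optimizes each under an information-geometric trust region, which this sketch does not attempt.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_elbo_estimate(weights, means, covs, log_p_tilde, n_samples=1000,
                      rng=np.random.default_rng(0)):
    """Monte Carlo estimate of E_q[log p~(x) - log q(x)] for a GMM q.
    Only log-density evaluations of the target are required, i.e. the
    objective is gradient-free with respect to the target."""
    comps = rng.choice(len(weights), size=n_samples, p=weights)
    xs = np.stack([rng.multivariate_normal(means[o], covs[o]) for o in comps])
    log_q = np.log(sum(
        w * multivariate_normal.pdf(xs, m, c)
        for w, m, c in zip(weights, means, covs)))
    return np.mean(log_p_tilde(xs) - log_q)
```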

Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base

Title Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base
Authors Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, Jian Yin
Abstract We present an approach to map utterances in conversation to logical forms, which will be executed on a large-scale knowledge base. To handle enormous ellipsis phenomena in conversation, we introduce dialog memory management to manipulate historical entities, predicates, and logical forms when inferring the logical form of current utterances. Dialog memory management is embodied in a generative model, in which a logical form is interpreted in a top-down manner following a small and flexible grammar. We learn the model from denotations without explicit annotation of logical forms, and evaluate it on a large-scale dataset consisting of 200K dialogs over 12.8M entities. Results verify the benefits of modeling dialog memory, and show that our semantic parsing-based approach outperforms a memory network based encoder-decoder model by a huge margin.
Tasks Question Answering, Semantic Parsing
Published 2018-12-01
URL http://papers.nips.cc/paper/7558-dialog-to-action-conversational-question-answering-over-a-large-scale-knowledge-base
PDF http://papers.nips.cc/paper/7558-dialog-to-action-conversational-question-answering-over-a-large-scale-knowledge-base.pdf
PWC https://paperswithcode.com/paper/dialog-to-action-conversational-question
Repo https://github.com/guoday/Dialog-to-Action
Framework tf
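
The generative model can be pictured as top-down expansion of a small grammar whose leaves may be copied from a dialog memory, which is how ellipsis gets resolved. The grammar, memory layout, and `choose` interface below are illustrative stand-ins for the learned decoder, not the paper's actual action set.

```python
import re

GRAMMAR = {  # toy production rules, expanded top-down
    "start": ["find(set)", "count(set)"],
    "set": ["triple(entity, predicate)", "union(set, set)"],
}

MEMORY = {  # filled from earlier turns; copying from it resolves ellipsis
    "entity": ["Barack_Obama"],
    "predicate": ["spouse_of"],
}

def expand(symbol, choose):
    """Replace nonterminals recursively; leaf slots (entity/predicate)
    are instantiated by copying from dialog memory."""
    if symbol in MEMORY:
        return choose(symbol, MEMORY[symbol])
    form = choose(symbol, GRAMMAR[symbol])
    return re.sub(r"\b(start|set|entity|predicate)\b",
                  lambda m: expand(m.group(1), choose), form)

logical_form = expand("start", lambda symbol, options: options[0])
# -> "find(triple(Barack_Obama, spouse_of))"
```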

A Position-aware Bidirectional Attention Network for Aspect-level Sentiment Analysis

Title A Position-aware Bidirectional Attention Network for Aspect-level Sentiment Analysis
Authors Shuqin Gu, Lipeng Zhang, Yuexian Hou, Yin Song
Abstract Aspect-level sentiment analysis aims to distinguish the sentiment polarity of each specific aspect term in a given sentence. Both industry and academia have realized the importance of the relationship between aspect term and sentence, and made attempts to model the relationship by designing a series of attention models. However, most existing methods usually neglect the fact that the position information is also crucial for identifying the sentiment polarity of the aspect term. When an aspect term occurs in a sentence, its neighboring words should be given more attention than other words with long distance. Therefore, we propose a position-aware bidirectional attention network (PBAN) based on bidirectional GRU. PBAN not only concentrates on the position information of aspect terms, but also mutually models the relation between aspect term and sentence by employing bidirectional attention mechanism. The experimental results on SemEval 2014 Datasets demonstrate the effectiveness of our proposed PBAN model.
Tasks Aspect-Based Sentiment Analysis, Feature Engineering
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1066/
PDF https://www.aclweb.org/anthology/C18-1066
PWC https://paperswithcode.com/paper/a-position-aware-bidirectional-attention
Repo https://github.com/hiyouga/PBAN-PyTorch
Framework pytorch
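
The position intuition, that words near the aspect term deserve more attention, can be sketched as a weighting applied to the word embeddings before the bidirectional attention layers. The linear decay below is an illustrative choice, not the paper's exact position encoding.

```python
import numpy as np

def position_weighted(embeddings: np.ndarray, aspect_idx: int) -> np.ndarray:
    """Down-weight words by their distance to the aspect term.

    embeddings: (sentence_len, dim) word vectors
    aspect_idx: index of the aspect term in the sentence
    """
    n = embeddings.shape[0]
    dist = np.abs(np.arange(n) - aspect_idx)
    weights = 1.0 - dist / n          # 1 at the aspect, decaying with distance
    return embeddings * weights[:, None]
```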

Lyrics Segmentation: Textual Macrostructure Detection using Convolutions

Title Lyrics Segmentation: Textual Macrostructure Detection using Convolutions
Authors Michael Fell, Yaroslav Nechaev, Elena Cabrio, Fabien Gandon
Abstract Lyrics contain repeated patterns that are correlated with the repetitions found in the music they accompany. Repetitions in song texts have been shown to enable lyrics segmentation, a fundamental prerequisite of automatically detecting the building blocks (e.g. chorus, verse) of a song text. In this article we improve on the state-of-the-art in lyrics segmentation by applying a convolutional neural network to the task, and experiment with novel features as a step towards deeper macrostructure detection of lyrics.
Tasks Information Retrieval, Music Information Retrieval
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1174/
PDF https://www.aclweb.org/anthology/C18-1174
PWC https://paperswithcode.com/paper/lyrics-segmentation-textual-macrostructure
Repo https://github.com/TuringTrain/lyrics_segmentation
Framework tf
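
A standard way to expose the repetition structure a CNN can segment is a line-by-line self-similarity matrix: repeated sections such as choruses show up as bright blocks. The sketch below uses character-bigram overlap as an illustrative similarity measure; the paper experiments with richer features.

```python
import numpy as np

def line_self_similarity(lines):
    """Build an (n_lines, n_lines) self-similarity matrix from
    normalized character-bigram overlap between lyric lines."""
    def bigrams(s):
        s = s.lower()
        return {s[i:i + 2] for i in range(len(s) - 1)}

    grams = [bigrams(l) for l in lines]
    n = len(lines)
    ssm = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            union = grams[i] | grams[j]
            ssm[i, j] = len(grams[i] & grams[j]) / len(union) if union else 0.0
    return ssm  # bright blocks along repeats hint at segment boundaries
```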

Transfer Learning for Entity Recognition of Novel Classes

Title Transfer Learning for Entity Recognition of Novel Classes
Authors Juan Diego Rodriguez, Adam Caldwell, Alexander Liu
Abstract In this reproduction paper, we replicate and extend several past studies on transfer learning for entity recognition. In particular, we are interested in entity recognition problems where the class labels in the source and target domains are different. Our work is the first direct comparison of these previously published approaches in this problem setting. In addition, we perform experiments on seven new source/target corpus pairs, nearly doubling the total number of corpus pairs that have been studied in all past work combined. Our results empirically demonstrate when each of the published approaches tends to do well. In particular, simpler approaches often work best when there is very little labeled target data, while neural transfer approaches tend to do better when there is more labeled target data.
Tasks Entity Extraction, Named Entity Recognition, Sentiment Analysis, Transfer Learning
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1168/
PDF https://www.aclweb.org/anthology/C18-1168
PWC https://paperswithcode.com/paper/transfer-learning-for-entity-recognition-of
Repo https://github.com/ciads-ut/transfer-learning-ner
Framework none
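
One simple neural-transfer recipe in this setting, where source and target label sets differ, is to keep the source tagger's encoder weights and re-initialize only the output layer for the new classes. A minimal sketch; the `output` attribute is a hypothetical name for illustration, not the paper's code.

```python
import copy
import torch.nn as nn

def transfer_tagger(source_model: nn.Module, num_target_labels: int,
                    hidden_dim: int) -> nn.Module:
    """Reuse the source tagger's embeddings and encoder; replace the
    output layer because the target label set is different."""
    target = copy.deepcopy(source_model)   # keep learned encoder weights
    target.output = nn.Linear(hidden_dim, num_target_labels)  # fresh head
    return target
```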

Tree Structured Dirichlet Processes for Hierarchical Morphological Segmentation

Title Tree Structured Dirichlet Processes for Hierarchical Morphological Segmentation
Authors Burcu Can, Suresh Manandhar
Abstract This article presents a probabilistic hierarchical clustering model for morphological segmentation. In contrast to existing approaches to morphology learning, our method allows learning hierarchical organization of word morphology as a collection of tree structured paradigms. The model is fully unsupervised and based on the hierarchical Dirichlet process. Tree hierarchies are learned along with the corresponding morphological paradigms simultaneously. Our model is evaluated on Morpho Challenge and shows competitive performance when compared to state-of-the-art unsupervised morphological segmentation systems. Although we apply this model for morphological segmentation, the model itself can also be used for hierarchical clustering of other types of data.
Tasks Information Retrieval, Machine Translation, Question Answering
Published 2018-06-01
URL https://www.aclweb.org/anthology/J18-2005/
PDF https://www.aclweb.org/anthology/J18-2005
PWC https://paperswithcode.com/paper/tree-structured-dirichlet-processes-for
Repo https://github.com/burcu-can/TreeStructuredDP
Framework none

Two Multilingual Corpora Extracted from the Tenders Electronic Daily for Machine Learning and Machine Translation Applications.

Title Two Multilingual Corpora Extracted from the Tenders Electronic Daily for Machine Learning and Machine Translation Applications.
Authors Oussama Ahmia, Nicolas Béchet, Pierre-François Marteau
Abstract
Tasks Information Retrieval, Machine Translation
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1583/
PDF https://www.aclweb.org/anthology/L18-1583
PWC https://paperswithcode.com/paper/two-multilingual-corpora-extracted-from-the
Repo https://github.com/oussamaahmia/TED-dataset
Framework none

A New Method of Region Embedding for Text Classification

Title A New Method of Region Embedding for Text Classification
Authors Chao Qiao, Bo Huang, Guocheng Niu, Daren Li, Daxiang Dong, Wei He, Dianhai Yu, Hua Wu
Abstract Representing a text as a bag of properly identified “phrases” and using that representation to process the text has proved useful. The key question is how to identify the phrases and represent them. The traditional method of utilizing n-grams can be regarded as an approximation of this approach. Such a method can suffer from data sparsity, however, particularly when the n-grams are long. In this paper, we propose a new method of learning and utilizing task-specific distributed representations of n-grams, referred to as “region embeddings”. Without loss of generality we address text classification. We specifically propose two models for region embeddings. In our models, the representation of a word has two parts: the embedding of the word itself, and a weighting matrix that interacts with the local context, referred to as the local context unit. The region embeddings are learned and used in the classification task as parameters of the neural network classifier. Experimental results show that our proposed method outperforms existing methods in text classification on several benchmark datasets. The results also indicate that our method can indeed capture the salient phrasal expressions in the texts.
Tasks Text Classification
Published 2018-01-01
URL https://openreview.net/forum?id=BkSDMA36Z
PDF https://openreview.net/pdf?id=BkSDMA36Z
PWC https://paperswithcode.com/paper/a-new-method-of-region-embedding-for-text
Repo https://github.com/text-representation/local-context-unit
Framework tf
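
The local context unit can be sketched directly: the center word's d × (2c+1) matrix gates each neighbor's embedding element-wise, and max pooling over the region yields the region embedding. Array layout and boundary handling below are illustrative, not the repo's implementation.

```python
import numpy as np

def region_embedding(embeds, context_units, center, c=2):
    """Sketch of a word-context region embedding.

    embeds:        (sentence_len, d) word embeddings
    context_units: (sentence_len, d, 2c+1) per-word local context units
    center:        index of the region's center word
    """
    d = embeds.shape[1]
    projected = np.full((2 * c + 1, d), -np.inf)  # -inf rows vanish in the max
    for offset in range(-c, c + 1):
        pos = center + offset
        if 0 <= pos < embeds.shape[0]:
            # gate the neighbor's embedding with the matching column
            projected[offset + c] = (
                context_units[center][:, offset + c] * embeds[pos])
    return projected.max(axis=0)  # max pooling over the region
```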

A Compressed Sensing View of Unsupervised Text Embeddings, Bag-of-n-Grams, and LSTMs

Title A Compressed Sensing View of Unsupervised Text Embeddings, Bag-of-n-Grams, and LSTMs
Authors Sanjeev Arora, Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli
Abstract Low-dimensional vector embeddings, computed using LSTMs or simpler techniques, are a popular approach for capturing the “meaning” of text and a form of unsupervised learning useful for downstream tasks. However, their power is not theoretically understood. The current paper derives formal understanding by looking at the subcase of linear embedding schemes. Using the theory of compressed sensing we show that representations combining the constituent word vectors are essentially information-preserving linear measurements of Bag-of-n-Grams (BonG) representations of text. This leads to a new theoretical result about LSTMs: low-dimensional embeddings derived from a low-memory LSTM are provably at least as powerful on classification tasks, up to small error, as a linear classifier over BonG vectors, a result that extensive empirical work has thus far been unable to show. Our experiments support these theoretical findings and establish strong, simple, and unsupervised baselines on standard benchmarks that in some cases are state of the art among word-level methods. We also show a surprising new property of embeddings such as GloVe and word2vec: they form a good sensing matrix for text that is more efficient than random matrices, the standard sparse recovery tool, which may explain why they lead to better representations in practice.
Tasks
Published 2018-01-01
URL https://openreview.net/forum?id=B1e5ef-C-
PDF https://openreview.net/pdf?id=B1e5ef-C-
PWC https://paperswithcode.com/paper/a-compressed-sensing-view-of-unsupervised
Repo https://github.com/NLPrinceton/sparse_recovery
Framework none
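
The paper's central observation is easy to verify numerically: summing (or averaging) word vectors is exactly a linear measurement Ax of the bag-of-words vector x, with the word vectors as the columns of the sensing matrix A, which is what lets compressed-sensing theory apply. A toy check, with random vectors standing in for GloVe/word2vec:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "mat"]
d, V = 8, len(vocab)
A = rng.normal(size=(d, V))      # columns = word vectors (the sensing matrix)

tokens = ["the", "cat", "sat"]
bow = np.zeros(V)                # Bag-of-Words counts (the n=1 case of BonG)
for t in tokens:
    bow[vocab.index(t)] += 1

sum_embedding = sum(A[:, vocab.index(t)] for t in tokens)
assert np.allclose(sum_embedding, A @ bow)  # the embedding IS a linear measurement
```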

Automated learning of templates for data-to-text generation: comparing rule-based, statistical and neural methods

Title Automated learning of templates for data-to-text generation: comparing rule-based, statistical and neural methods
Authors Chris van der Lee, Emiel Krahmer, Sander Wubben
Abstract The current study investigated novel techniques and methods for trainable approaches to data-to-text generation. Neural Machine Translation was explored for the conversion from data to text as well as the addition of extra templatization steps of the data input and text output in the conversion process. Evaluation using BLEU did not find the Neural Machine Translation technique to perform any better compared to rule-based or Statistical Machine Translation, and the templatization method seemed to perform similarly or sometimes worse compared to direct data-to-text conversion. However, the human evaluation metrics indicated that Neural Machine Translation yielded the highest quality output and that the templatization method was able to increase text quality in multiple situations.
Tasks Data-to-Text Generation, Machine Translation, Text Generation
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-6504/
PDF https://www.aclweb.org/anthology/W18-6504
PWC https://paperswithcode.com/paper/automated-learning-of-templates-for-data-to
Repo https://github.com/TallChris91/Automated-Template-Learning
Framework none
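
The templatization step amounts to delexicalizing entities before training the data-to-text converter and re-filling the slots after generation. A minimal sketch, with illustrative slot names and syntax:

```python
def templatize(text, entities):
    """Replace entity values with slots before training."""
    for slot, value in entities.items():
        text = text.replace(value, f"<{slot}>")
    return text

def fill(template, entities):
    """Fill slots back in after generation."""
    for slot, value in entities.items():
        template = template.replace(f"<{slot}>", value)
    return template

ents = {"home_team": "Ajax", "score": "3-1"}
tpl = templatize("Ajax won the match 3-1.", ents)
# tpl == "<home_team> won the match <score>."
print(fill(tpl, ents))  # "Ajax won the match 3-1."
```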

Sub-word information in pre-trained biomedical word representations: evaluation and hyper-parameter optimization

Title Sub-word information in pre-trained biomedical word representations: evaluation and hyper-parameter optimization
Authors Dieter Galea, Ivan Laponogov, Kirill Veselkov
Abstract Word2vec embeddings are limited to computing vectors for in-vocabulary terms and do not take into account sub-word information. Character-based representations, such as fastText, mitigate such limitations. We optimize and compare these representations for the biomedical domain. fastText was found to consistently outperform word2vec in named entity recognition tasks for entities such as chemicals and genes. This is likely due to gained information from computed out-of-vocabulary term vectors, as well as the word compositionality of such entities. Contrastingly, performance varied on intrinsic datasets. Optimal hyper-parameters were intrinsic dataset-dependent, likely due to differences in term types distributions. This indicates embeddings should be chosen based on the task at hand. We therefore provide a number of optimized hyper-parameter sets and pre-trained word2vec and fastText models, available on \url{https://github.com/dterg/bionlp-embed}.
Tasks Feature Engineering, Named Entity Recognition, Word Embeddings
Published 2018-07-01
URL https://www.aclweb.org/anthology/W18-2307/
PDF https://www.aclweb.org/anthology/W18-2307
PWC https://paperswithcode.com/paper/sub-word-information-in-pre-trained
Repo https://github.com/dterg/bionlp-embed
Framework none
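
The reason fastText can produce vectors for out-of-vocabulary biomedical terms is its sub-word decomposition: a bracketed word is split into character n-grams, and a term's vector is composed from the learned n-gram vectors. A sketch, with a plain dict standing in for the trained n-gram table (real fastText additionally hashes n-grams and includes the whole-word token):

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """fastText-style decomposition of a bracketed word into n-grams."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def oov_vector(word, ngram_vectors):
    """Compose a vector for an unseen term from whichever of its
    n-grams were learned during training."""
    vecs = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    return np.mean(vecs, axis=0) if vecs else None
```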

LSQ++: Lower running time and higher recall in multi-codebook quantization

Title LSQ++: Lower running time and higher recall in multi-codebook quantization
Authors Julieta Martinez, Shobhit Zakhmi, Holger H. Hoos, James J. Little
Abstract Multi-codebook quantization (MCQ) is the task of expressing a set of vectors as accurately as possible in terms of discrete entries in multiple bases. Work in MCQ is heavily focused on lowering quantization error, thereby improving distance estimation and recall on benchmarks of visual descriptors at a fixed memory budget. However, recent studies and methods in this area are hard to compare against each other, because they use different datasets, different protocols, and, perhaps most importantly, different computational budgets. In this work, we first benchmark a series of MCQ baselines on an equal footing and provide an analysis of their recall-vs-runtime performance. We observe that local search quantization (LSQ) is in practice much faster than its competitors, but is not the most accurate method in all cases. We then introduce two novel improvements that render LSQ (i) more accurate and (ii) faster. These improvements are easy to implement, and define a new state of the art in MCQ.
Tasks Quantization
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Julieta_Martinez_LSQ_lower_runtime_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Julieta_Martinez_LSQ_lower_runtime_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/lsq-lower-running-time-and-higher-recall-in
Repo https://github.com/una-dinosauria/Rayuela.jl
Framework none
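
For readers new to MCQ, the setting is compact: each database vector is stored as M small integers, reconstructed as the sum of one entry per codebook, and methods compete on the resulting quantization error and recall. A minimal NumPy illustration (production code uses per-codebook lookup tables rather than explicit reconstruction):

```python
import numpy as np

def mcq_reconstruct(codebooks, codes):
    """Sum the selected entry from each codebook.

    codebooks: (M, K, d) float; codes: (N, M) ints in [0, K).
    returns:   (N, d) reconstructed vectors.
    """
    M = codebooks.shape[0]
    return codebooks[np.arange(M), codes].sum(axis=1)

def recall_at_1(query, codebooks, codes, true_idx):
    """Reconstruction-based nearest-neighbor search, as used when
    benchmarking recall at a fixed memory budget."""
    recon = mcq_reconstruct(codebooks, codes)
    dists = np.sum((recon - query) ** 2, axis=1)
    return int(np.argmin(dists) == true_idx)
```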