July 26, 2019


Paper Group NANR 175

CKY-based Convolutional Attention for Neural Machine Translation. Fractional Langevin Monte Carlo: Exploring Levy Driven Stochastic Differential Equations for MCMC. Lower bounds on the robustness to adversarial perturbations. Integer Linear Programming formulations in Natural Language Processing. TakeLab at SemEval-2017 Task 6: #RankingHumorIn4Pag …

CKY-based Convolutional Attention for Neural Machine Translation


Title	CKY-based Convolutional Attention for Neural Machine Translation
Authors	Taiki Watanabe, Akihiro Tamura, Takashi Ninomiya
Abstract	This paper proposes a new attention mechanism for neural machine translation (NMT) based on convolutional neural networks (CNNs), inspired by the CKY algorithm. The proposed attention represents every possible combination of source words (e.g., phrases and structures) through CNNs, imitating the CKY table of the algorithm. NMT incorporating the proposed attention decodes a target sentence on the basis of the attention scores of the hidden states of the CNNs. The proposed attention enables NMT to capture alignments from the underlying structures of a source sentence without sentence parsing. Evaluations on the Asian Scientific Paper Excerpt Corpus (ASPEC) English-Japanese translation task show that the proposed attention improves BLEU by 0.66 points.
Tasks	Machine Translation
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-2001/
PDF	https://www.aclweb.org/anthology/I17-2001
PWC	https://paperswithcode.com/paper/cky-based-convolutional-attention-for-neural
Repo
Framework
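The sketch below illustrates the CKY-table idea under one plausible reading of the abstract, and it makes several assumptions. It stacks width-2 convolutions over a triangular table. The cell at layer d and position i represents the source span from word i to word i+d. Attention then runs over every cell rather than over single words only. The `combine` function (an elementwise tanh of the sum) is a hypothetical stand-in for the paper's learned CNN, and the dot-product scoring is also assumed. This is not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]

def combine(left: Vector, right: Vector) -> Vector:
    # Hypothetical stand-in for a learned width-2 convolution: tanh(left + right).
    return [math.tanh(a + b) for a, b in zip(left, right)]

def build_cky_table(words: List[Vector]) -> List[List[Vector]]:
    # table[d][i] represents the span words[i .. i + d] (inclusive).
    table = [list(words)]
    for _ in range(1, len(words)):
        prev = table[-1]
        # Adjacent cells (i .. i+d-1) and (i+1 .. i+d) together cover (i .. i+d).
        table.append([combine(prev[i], prev[i + 1]) for i in range(len(prev) - 1)])
    return table

def span_attention(query: Vector, table: List[List[Vector]]
                   ) -> Tuple[List[Tuple[int, int, float]], Vector]:
    # Flatten every cell to (start, end, vector) and score it with a dot product.
    cells = [(i, i + d, vec) for d, row in enumerate(table) for i, vec in enumerate(row)]
    scores = [sum(q * x for q, x in zip(query, vec)) for _, _, vec in cells]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of all span representations.
    context = [sum(w * vec[k] for w, (_, _, vec) in zip(weights, cells))
               for k in range(len(query))]
    return [(start, end, w) for (start, end, _), w in zip(cells, weights)], context

# Usage with made-up 2-dimensional word vectors.
words = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5], [-0.2, 0.4]]
table = build_cky_table(words)
span_weights, context = span_attention([1.0, 0.5], table)
```

With n source words the table holds n(n+1)/2 cells, so the attention covers every contiguous span. This mirrors the cells of a CKY chart.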

Fractional Langevin Monte Carlo: Exploring Levy Driven Stochastic Differential Equations for MCMC


Title	Fractional Langevin Monte Carlo: Exploring Levy Driven Stochastic Differential Equations for MCMC
Authors	Umut Şimşekli
Abstract	Along with the recent advances in scalable Markov Chain Monte Carlo methods, sampling techniques that are based on Langevin diffusions have started receiving increasing attention. These so-called Langevin Monte Carlo (LMC) methods are based on diffusions driven by a Brownian motion, which gives rise to Gaussian proposal distributions in the resulting algorithms. Even though these approaches have proven successful in many applications, their performance can be limited by the light-tailed nature of the Gaussian proposals. In this study, we extend classical LMC and develop a novel Fractional LMC (FLMC) framework that is based on a family of heavy-tailed distributions, called alpha-stable Levy distributions. As opposed to classical approaches, the proposed approach can make large jumps while targeting the correct distribution, which would be beneficial for efficient exploration of the state space. We develop novel computational methods that scale to large problems, and we provide a formal convergence analysis of the proposed scheme. Our experiments support our theory: FLMC can provide superior performance in multi-modal settings, improved convergence rates, and robustness to algorithm parameters.
Tasks	Efficient Exploration
Published	2017-08-01
URL	https://icml.cc/Conferences/2017/Schedule?showEvent=462
PDF	http://proceedings.mlr.press/v70/simsekli17a/simsekli17a.pdf
PWC	https://paperswithcode.com/paper/fractional-langevin-monte-carlo-exploring
Repo
Framework
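The abstract contrasts Gaussian-driven LMC with heavy-tailed alpha-stable noise. The sketch below shows the two ingredients separately:

- a classical unadjusted Langevin (LMC) step with Gaussian noise;
- a Chambers-Mallows-Stuck sampler for standard symmetric alpha-stable variates. At alpha = 2 this sampler reduces to a Gaussian with variance 2.

It does not implement FLMC. Simply replacing the Gaussian noise in the LMC update with alpha-stable noise does not by itself target the correct distribution. The paper develops a modified scheme for that, which is not reproduced here. The function names are hypothetical.

```python
import math
import random
from typing import Callable, List

def symmetric_stable(alpha: float, rng: random.Random) -> float:
    # Chambers-Mallows-Stuck method for a standard symmetric (beta = 0)
    # alpha-stable variate, 0 < alpha <= 2.
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return math.tan(v)  # Cauchy case
    return (math.sin(alpha * v) / math.cos(v) ** (1 / alpha)) * (
        math.cos((1 - alpha) * v) / w
    ) ** ((1 - alpha) / alpha)

def lmc_chain(grad_log_pi: Callable[[float], float], x0: float, step: float,
              n_steps: int, rng: random.Random) -> List[float]:
    # Classical (unadjusted) Langevin Monte Carlo driven by Gaussian (Brownian) noise.
    x, samples = x0, []
    for _ in range(n_steps):
        x = x + step * grad_log_pi(x) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

rng = random.Random(0)
chain = lmc_chain(lambda x: -x, 0.0, 0.1, 50_000, rng)  # target: standard normal
gauss_like = [symmetric_stable(2.0, rng) for _ in range(50_000)]  # N(0, 2)
heavy = [symmetric_stable(1.2, rng) for _ in range(50_000)]      # heavy-tailed
```

Samples with alpha = 1.2 regularly land far from the origin, while alpha = 2 samples essentially never exceed a magnitude of 10. That is the "large jumps" property the abstract refers to.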

Lower bounds on the robustness to adversarial perturbations


Title	Lower bounds on the robustness to adversarial perturbations
Authors	Jonathan Peck, Joris Roels, Bart Goossens, Yvan Saeys
Abstract	The input-output mappings learned by state-of-the-art neural networks are significantly discontinuous. It is possible to cause a neural network used for image recognition to misclassify its input by applying very specific, barely perceptible perturbations to the input, called adversarial perturbations. Many hypotheses have been proposed to explain the existence of these peculiar samples, as have several methods to mitigate them. A proven explanation remains elusive, however. In this work, we take steps towards a formal characterization of adversarial perturbations by deriving lower bounds on the magnitudes of perturbations necessary to change the classification of neural networks. The bounds are experimentally verified on the MNIST and CIFAR-10 data sets.
Tasks
Published	2017-12-01
URL	http://papers.nips.cc/paper/6682-lower-bounds-on-the-robustness-to-adversarial-perturbations
PDF	http://papers.nips.cc/paper/6682-lower-bounds-on-the-robustness-to-adversarial-perturbations.pdf
PWC	https://paperswithcode.com/paper/lower-bounds-on-the-robustness-to-adversarial
Repo
Framework
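The paper's bounds concern deep networks. As a hedged illustration of the quantity being bounded, the sketch below handles only the simplest case: an affine binary classifier f(x) = w·x + b. For such a classifier, the smallest L2 perturbation that changes the predicted sign is exactly |f(x)| / ||w||, the distance from x to the decision hyperplane. The function names are hypothetical, and this is not the paper's derivation.

```python
import math
from typing import List

def decision(w: List[float], b: float, x: List[float]) -> float:
    # Affine score f(x) = w . x + b; its sign is the predicted class.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def min_l2_perturbation(w: List[float], b: float, x: List[float]) -> float:
    # Distance from x to the hyperplane f(x) = 0: no smaller L2 perturbation flips the sign.
    return abs(decision(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))

def perturb_toward_boundary(w: List[float], b: float, x: List[float],
                            distance: float) -> List[float]:
    # Move x by `distance` along the unit normal of the hyperplane, toward the boundary.
    norm = math.sqrt(sum(wi * wi for wi in w))
    sign = 1.0 if decision(w, b, x) > 0 else -1.0
    return [xi - sign * distance * wi / norm for xi, wi in zip(x, w)]

w, b, x = [3.0, 4.0], -5.0, [2.0, 1.0]
r = min_l2_perturbation(w, b, x)  # 1.0
```

Perturbing by slightly more than `r` flips the prediction, and perturbing by slightly less does not. For a network the boundary is not a single hyperplane, which is why bounds like the paper's are needed.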

Integer Linear Programming formulations in Natural Language Processing


Title	Integer Linear Programming formulations in Natural Language Processing
Authors	Dan Roth, Vivek Srikumar
Abstract	Making decisions in natural language processing problems often involves assigning values to sets of interdependent variables, where the expressive dependency structure can influence, or even dictate, what assignments are possible. This setting includes a broad range of structured prediction problems such as semantic role labeling, named entity and relation recognition, co-reference resolution, dependency parsing and semantic parsing. The setting is also appropriate for cases that may require making global decisions that involve multiple components, possibly pre-designed or pre-learned, as in event recognition and analysis, summarization, paraphrasing, textual entailment and question answering. In all these cases, it is natural to formulate the decision problem as a constrained optimization problem, with an objective function composed of learned models, subject to domain- or problem-specific constraints. Over the last few years, starting with a couple of papers by Roth & Yih (2004, 2005), dozens of papers have used the Integer Linear Programming (ILP) formulation developed there, including several award-winning papers (e.g., Martins, Smith, & Xing, 2009; Koo, Rush, Collins, Jaakkola, & Sontag, 2010; Berant, Dagan, & Goldberger, 2011). This tutorial will present the key ingredients of ILP formulations of natural language processing problems. It aims to guide readers through the key modeling steps, explain the learning and inference paradigms, and illustrate these with examples from the literature.
We will cover a range of topics, from the theoretical foundations of learning and inference with ILP models, to practical modeling guides, to software packages and applications. The goal of this tutorial is to introduce the computational framework to the broader ACL community, motivate it as a generic framework for learning and inference in global NLP decision problems, present some of the key theoretical and practical issues involved, and survey some of its existing applications as a way to promote further development of the framework and additional applications. We will also make connections with some of the "hot" topics in current NLP research and show how they can be used within the general framework proposed here. The tutorial will thus be useful for many of the senior and junior researchers who have an interest in global decision problems in NLP, providing a concise overview of recent perspectives and research results.
Tasks	Dependency Parsing, Natural Language Inference, Question Answering, Semantic Parsing, Semantic Role Labeling, Structured Prediction
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-5005/
PDF	https://www.aclweb.org/anthology/E17-5005
PWC	https://paperswithcode.com/paper/integer-linear-programming-formulations-in
Repo
Framework
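To make the constrained-optimization view concrete, here is a toy instance under hypothetical scores. It has two entity mentions (PER or LOC) and one relation (lives_in or none). These are encoded as binary indicator variables with linear constraints:

- exactly one label per item;
- lives_in requires a PER subject and a LOC object.

A real system would pass this to an ILP solver. Here, exhaustive enumeration stands in for the solver, because the problem has only 2^6 assignments. The variable names and scores are made up for illustration.

```python
from itertools import product
from typing import Dict, Tuple

VARIABLES = ["e1_PER", "e1_LOC", "e2_PER", "e2_LOC", "r_lives_in", "r_none"]
# Hypothetical local model scores (e.g., from independently trained classifiers).
SCORES = {"e1_PER": 0.6, "e1_LOC": 0.4, "e2_PER": 0.55, "e2_LOC": 0.45,
          "r_lives_in": 0.9, "r_none": 0.1}

def feasible(v: Dict[str, int]) -> bool:
    # Linear constraints: exactly one label per item ...
    one_hot = (v["e1_PER"] + v["e1_LOC"] == 1
               and v["e2_PER"] + v["e2_LOC"] == 1
               and v["r_lives_in"] + v["r_none"] == 1)
    # ... and lives_in(e1, e2) implies e1 = PER and e2 = LOC.
    return one_hot and v["r_lives_in"] <= v["e1_PER"] and v["r_lives_in"] <= v["e2_LOC"]

def objective(v: Dict[str, int]) -> float:
    # Linear objective: sum of scores of the indicators that are switched on.
    return sum(SCORES[name] * v[name] for name in VARIABLES)

def solve() -> Tuple[Dict[str, int], float]:
    # Brute-force stand-in for an ILP solver: enumerate all 0/1 assignments.
    candidates = (dict(zip(VARIABLES, bits))
                  for bits in product((0, 1), repeat=len(VARIABLES)))
    best = max((v for v in candidates if feasible(v)), key=objective)
    return best, objective(best)

best, value = solve()
chosen = sorted(name for name, on in best.items() if on)  # ['e1_PER', 'e2_LOC', 'r_lives_in']
```

Taking each local argmax independently would pick e1 = PER, e2 = PER and lives_in, which violates the constraint. The global solution instead relabels e2 as LOC, because the relation score outweighs the small entity-score difference.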

TakeLab at SemEval-2017 Task 6: #RankingHumorIn4Pages


Title	TakeLab at SemEval-2017 Task 6: #RankingHumorIn4Pages
Authors	Marin Kukovačec, Juraj Malenica, Ivan Mršić, Antonio Šajatović, Domagoj Alagić, Jan Šnajder
Abstract	This paper describes our system for humor ranking in tweets within SemEval 2017 Task 6: #HashtagWars (6A and 6B). For both subtasks, we use an off-the-shelf gradient boosting model built on a rich set of features, handcrafted to provide the model with the external knowledge needed to better predict the humor in the text. The features capture various cultural references and specific humor patterns. Our system ranked 2nd (officially 7th) among 10 submissions on Subtask A and 2nd among 9 submissions on Subtask B.
Tasks	Common Sense Reasoning, Humor Detection
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2066/
PDF	https://www.aclweb.org/anthology/S17-2066
PWC	https://paperswithcode.com/paper/takelab-at-semeval-2017-task-6
Repo
Framework

Structured Prediction via Learning to Search under Bandit Feedback


Title	Structured Prediction via Learning to Search under Bandit Feedback
Authors	Amr Sharaf, Hal Daumé III
Abstract	We present an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output. It then observes feedback for that output and no others. We consider two cases: a pure bandit setting in which it observes only a loss, and a more fine-grained setting in which it observes a loss for every action. We find that the fine-grained feedback is necessary for strong empirical performance, because it allows for a robust variance-reduction strategy. We empirically compare a number of different algorithms and exploration methods and show the efficacy of our algorithm, BLS, on sequence labeling and dependency parsing tasks.
Tasks	Active Learning, Dependency Parsing, Structured Prediction
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4304/
PDF	https://www.aclweb.org/anthology/W17-4304
PWC	https://paperswithcode.com/paper/structured-prediction-via-learning-to-search
Repo
Framework

Spatial Language Understanding with Multimodal Graphs using Declarative Learning based Programming


Title	Spatial Language Understanding with Multimodal Graphs using Declarative Learning based Programming
Authors	Parisa Kordjamshidi, Taher Rahgooy, Umar Manzoor
Abstract	This work addresses spatial role labeling (SpRL), a previously formalized semantic evaluation task that aims to extract formal spatial meaning from text. Here, we report the results of initial efforts towards exploiting visual information in the form of images to help spatial language understanding. We discuss how to design new models in the framework of declarative learning-based programming (DeLBP). The DeLBP framework facilitates combining modalities and representing various data in a unified graph. The learning and inference models exploit the structure of the unified graph, as well as global first-order domain constraints beyond the data, to predict the semantics, which form a structured meaning representation of the spatial context. Continuous representations are used to relate the various elements of the graph originating from different modalities. We improve on the state-of-the-art results for SpRL.
Tasks	Image Captioning, Image Retrieval, Question Answering, Structured Prediction, Visual Question Answering
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4306/
PDF	https://www.aclweb.org/anthology/W17-4306
PWC	https://paperswithcode.com/paper/spatial-language-understanding-with
Repo
Framework

Boosting Information Extraction Systems with Character-level Neural Networks and Free Noisy Supervision


Title	Boosting Information Extraction Systems with Character-level Neural Networks and Free Noisy Supervision
Authors	Philipp Meerkamp, Zhengyi Zhou
Abstract	We present an architecture to boost the precision of existing information extraction systems. This is achieved by augmenting the existing parser, which may be constraint-based or hybrid statistical, with a character-level neural network. Our architecture combines the ability of constraint-based or hybrid extraction systems to easily incorporate domain knowledge with the ability of deep neural networks to leverage large amounts of data to learn complex features. The network is trained using a measure of consistency between extracted data and existing databases as a form of cheap, noisy supervision. Our architecture requires neither large-scale manual annotation nor a system rewrite. It has led to large precision improvements over an existing, highly tuned production information extraction system used at Bloomberg LP for financial language text.
Tasks	Structured Prediction, Time Series
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4307/
PDF	https://www.aclweb.org/anthology/W17-4307
PWC	https://paperswithcode.com/paper/boosting-information-extraction-systems-with
Repo
Framework

Learning Kernels over Strings using Gaussian Processes


Title	Learning Kernels over Strings using Gaussian Processes
Authors	Daniel Beck, Trevor Cohn
Abstract	Non-contiguous word sequences are widely known to be important in modelling natural language. However, they are not explicitly encoded in common text representations. In this work we propose a model for text processing using string kernels, capable of flexibly representing non-contiguous sequences. Specifically, we derive a vectorised version of the string kernel algorithm and its gradients, allowing efficient hyperparameter optimisation as part of a Gaussian Process framework. Experiments on synthetic data and text regression for emotion analysis show the promise of this technique.
Tasks	Emotion Recognition, Gaussian Processes
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-2012/
PDF	https://www.aclweb.org/anthology/I17-2012
PWC	https://paperswithcode.com/paper/learning-kernels-over-strings-using-gaussian
Repo
Framework
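As a hedged sketch of what a string kernel over non-contiguous sequences computes, here is the classic gap-weighted subsequence string kernel (Lodhi et al., 2002) with a single decay parameter `lam`. It uses the kernel's standard dynamic-programming recursion. Non-contiguous matches are allowed, but each is penalised by `lam` raised to the length it spans in each string. This is the plain recursive form, not the paper's vectorised version or its gradients. The function names are hypothetical.

```python
import math
from functools import lru_cache

def subsequence_kernel(s: str, t: str, n: int, lam: float) -> float:
    """Gap-weighted subsequence kernel K_n(s, t) with decay lam."""

    @lru_cache(maxsize=None)
    def k_prime(i: int, p: int, q: int) -> float:
        # Auxiliary K'_i over the prefixes s[:p] and t[:q].
        if i == 0:
            return 1.0
        if min(p, q) < i:
            return 0.0
        x = s[p - 1]
        total = lam * k_prime(i, p - 1, q)
        for j in range(1, q + 1):
            if t[j - 1] == x:
                total += k_prime(i - 1, p - 1, j - 1) * lam ** (q - j + 2)
        return total

    @lru_cache(maxsize=None)
    def k(p: int, q: int) -> float:
        # K_n over the prefixes s[:p] and t[:q].
        if min(p, q) < n:
            return 0.0
        x = s[p - 1]
        total = k(p - 1, q)
        for j in range(1, q + 1):
            if t[j - 1] == x:
                total += k_prime(n - 1, p - 1, j - 1) * lam ** 2
        return total

    return k(len(s), len(t))

def normalized_kernel(s: str, t: str, n: int, lam: float) -> float:
    # Cosine-style normalisation so that every string has unit self-similarity.
    return subsequence_kernel(s, t, n, lam) / math.sqrt(
        subsequence_kernel(s, s, n, lam) * subsequence_kernel(t, t, n, lam))

value = subsequence_kernel("cat", "cart", 2, 0.5)  # lam**4 + lam**5 + lam**7 = 0.1015625
```

The shared subsequences of length 2 are "ca" (contiguous in both strings), "at" (gapped in "cart") and "ct" (gapped in both). Larger gaps are penalised more heavily.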

Substring Frequency Features for Segmentation of Japanese Katakana Words with Unlabeled Corpora


Title	Substring Frequency Features for Segmentation of Japanese Katakana Words with Unlabeled Corpora
Authors	Yoshinari Fujinuma, Alvin Grissom II
Abstract	Word segmentation is crucial in natural language processing tasks for unsegmented languages. In Japanese, many out-of-vocabulary words appear in the phonetic syllabary katakana, making segmentation more difficult due to the lack of clues found in mixed script settings. In this paper, we propose a straightforward approach based on a variant of tf-idf and apply it to the problem of word segmentation in Japanese. Even though our method uses only an unlabeled corpus, experimental results show that it achieves performance comparable to existing methods that use manually labeled corpora. Furthermore, it improves the performance of simple word segmentation models trained on a manually labeled corpus.
Tasks	Information Retrieval, Machine Translation
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-2013/
PDF	https://www.aclweb.org/anthology/I17-2013
PWC	https://paperswithcode.com/paper/substring-frequency-features-for-segmentation
Repo
Framework

Recall is the Proper Evaluation Metric for Word Segmentation


Title	Recall is the Proper Evaluation Metric for Word Segmentation
Authors	Yan Shao, Christian Hardmeier, Joakim Nivre
Abstract	We extensively analyse the correlations and drawbacks of conventionally employed evaluation metrics for word segmentation. Unlike in standard information retrieval, precision favours under-splitting systems and therefore can be misleading in word segmentation. Overall, based on both theoretical and experimental analysis, we propose that precision should be excluded from the standard evaluation metrics and that the evaluation score obtained by using only recall is sufficient and better correlated with the performance of word segmentation systems.
Tasks	Information Retrieval, Machine Translation, Part-Of-Speech Tagging
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-2015/
PDF	https://www.aclweb.org/anthology/I17-2015
PWC	https://paperswithcode.com/paper/recall-is-the-proper-evaluation-metric-for
Repo
Framework
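Here is a small worked example of the metric behaviour described above. The segmentations are toy examples, not from the paper. A predicted word counts as correct only when its character span exactly matches a gold word. One system over-splits a single gold word, and the other under-splits (merges) two gold words, so each makes exactly one mistake. Yet precision ranks the under-splitting system higher, while recall ranks the over-splitting one higher.

```python
from typing import List, Set, Tuple

def spans(words: List[str]) -> Set[Tuple[int, int]]:
    # Convert a segmentation into (start, end) character offsets.
    out, start = set(), 0
    for w in words:
        out.add((start, start + len(w)))
        start += len(w)
    return out

def precision_recall(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    if "".join(gold) != "".join(pred):
        raise ValueError("segmentations must cover the same text")
    correct = len(spans(gold) & spans(pred))
    return correct / len(pred), correct / len(gold)

gold = ["ab", "cd", "e", "f"]
under = ["ab", "cd", "ef"]         # merges "e" + "f"
over = ["a", "b", "cd", "e", "f"]  # splits "ab"
p_under, r_under = precision_recall(gold, under)  # precision 2/3, recall 1/2
p_over, r_over = precision_recall(gold, over)     # precision 3/5, recall 3/4
```

Under-splitting lowers precision's denominator, because it produces fewer predicted words. This is one way precision can reward it even when it is no more accurate.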

Low-Resource Named Entity Recognition with Cross-lingual, Character-Level Neural Conditional Random Fields


Title	Low-Resource Named Entity Recognition with Cross-lingual, Character-Level Neural Conditional Random Fields
Authors	Ryan Cotterell, Kevin Duh
Abstract	Low-resource named entity recognition is still an open problem in NLP. Most state-of-the-art systems require tens of thousands of annotated sentences in order to obtain high performance. However, for most of the world's languages it is infeasible to obtain such annotation. In this paper, we present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource languages and low-resource languages jointly. Learning character representations for multiple related languages allows knowledge transfer from the high-resource languages to the low-resource ones, improving F1 by up to 9.8 points.
Tasks	Cross-Lingual Transfer, Named Entity Recognition, Transfer Learning
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-2016/
PDF	https://www.aclweb.org/anthology/I17-2016
PWC	https://paperswithcode.com/paper/low-resource-named-entity-recognition-with
Repo
Framework
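Decoding in a linear-chain CRF, the model family named in the abstract, selects the highest-scoring tag sequence under transition constraints. The sketch below is generic first-order Viterbi decoding over precomputed scores, using made-up numbers. In the paper, the emission scores would come from a character-level neural network trained jointly across languages, which is not modeled here. Disallowed BIO transitions (O to I, and starting with I) receive a score of minus infinity.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")

def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[str, Dict[str, float]],
            start: Dict[str, float], tags: List[str]) -> Tuple[List[str], float]:
    # score[i][tag]: best total score of any tag sequence ending in `tag` at position i.
    score = [{t: start[t] + emissions[0][t] for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(emissions)):
        score.append({})
        back.append({})
        for cur in tags:
            best_prev = max(tags, key=lambda p: score[i - 1][p] + transitions[p][cur])
            score[i][cur] = score[i - 1][best_prev] + transitions[best_prev][cur] + emissions[i][cur]
            back[i][cur] = best_prev
    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: score[-1][t])
    path = [last]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1], score[-1][last]

tags = ["O", "B", "I"]
# Hypothetical per-token emission scores for a 3-token input.
emissions = [{"O": 1.0, "B": 0.5, "I": 1.2},
             {"O": 0.2, "B": 0.1, "I": 1.0},
             {"O": 1.0, "B": 0.0, "I": 0.3}]
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I"] = NEG_INF              # I may not follow O
start = {"O": 0.0, "B": 0.0, "I": NEG_INF}   # a sequence may not start with I
path, best = viterbi(emissions, transitions, start, tags)  # ['B', 'I', 'O'], 2.5
```

Picking each token's best tag independently would yield I, I, O, which is not a valid BIO sequence. The structured decoder returns the best valid sequence instead.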

Are Manually Prepared Affective Lexicons Really Useful for Sentiment Analysis


Title	Are Manually Prepared Affective Lexicons Really Useful for Sentiment Analysis
Authors	Minglei Li, Qin Lu, Yunfei Long
Abstract	In this paper, we investigate the effectiveness of different affective lexicons through sentiment analysis of phrases. We examine how phrases can be represented through manually prepared lexicons, lexicons extended using computational methods, or word embeddings. Comparative studies show that word embeddings learned with an unsupervised distributional method outperform manually prepared lexicons, regardless of the affective model used in the lexicons. Our conclusion is that although different affective lexicons are cognitively backed by theories, they do not show any advantage over automatically obtained word embeddings.
Tasks	Semantic Textual Similarity, Sentiment Analysis
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-2025/
PDF	https://www.aclweb.org/anthology/I17-2025
PWC	https://paperswithcode.com/paper/are-manually-prepared-affective-lexicons
Repo
Framework

Can Discourse Relations be Identified Incrementally?


Title	Can Discourse Relations be Identified Incrementally?
Authors	Frances Yung, Hiroshi Noji, Yuji Matsumoto
Abstract	Humans process language word by word and construct partial linguistic structures on the fly before the end of the sentence is perceived. Inspired by this cognitive ability, researchers have proposed incremental algorithms for natural language processing tasks, which have demonstrated promising performance. For discourse relation (DR) parsing, however, it is not yet clear to what extent humans can recognize DRs incrementally, because the latent 'nodes' of discourse structure can span clauses and sentences. To answer this question, this work investigates incrementality in discourse processing based on a corpus annotated with DR signals. We find that DRs are predominantly signaled at the boundary between the two constituent discourse units. The findings complement existing psycholinguistic theories on expectation in discourse processing and provide direction for incremental discourse parsing.
Tasks	Language Modelling, Semantic Role Labeling, Speech Recognition
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-2027/
PDF	https://www.aclweb.org/anthology/I17-2027
PWC	https://paperswithcode.com/paper/can-discourse-relations-be-identified
Repo
Framework

Hybrid Grammars for Parsing of Discontinuous Phrase Structures and Non-Projective Dependency Structures


Title	Hybrid Grammars for Parsing of Discontinuous Phrase Structures and Non-Projective Dependency Structures
Authors	Kilian Gebhardt, Mark-Jan Nederhof, Heiko Vogler
Abstract	We explore the concept of hybrid grammars, which formalize and generalize a range of existing frameworks for dealing with discontinuous syntactic structures. Covered are both discontinuous phrase structures and non-projective dependency structures. Technically, hybrid grammars are related to synchronous grammars, where one grammar component generates linear structures and another generates hierarchical structures. Coupling the lexical elements of both components yields discontinuous structures. Several types of hybrid grammars are characterized. We also discuss grammar induction from treebanks. The main advantage over existing frameworks is the ability of hybrid grammars to separate the discontinuity of the desired structures from the time complexity of parsing. This permits exploration of a large variety of parsing algorithms for discontinuous structures, with different properties. This is confirmed by the reported experimental results, which show a wide variety of running time, accuracy, and frequency of parse failures.
Tasks
Published	2017-09-01
URL	https://www.aclweb.org/anthology/J17-3001/
PDF	https://www.aclweb.org/anthology/J17-3001
PWC	https://paperswithcode.com/paper/hybrid-grammars-for-parsing-of-discontinuous
Repo
Framework