May 5, 2019

Paper Group NANR 135

The Power of Language Music: Arabic Lemmatization through Patterns


Title	The Power of Language Music: Arabic Lemmatization through Patterns
Authors	Mohammed Attia, Ayah Zirikly, Mona Diab
Abstract	The interaction between roots and patterns in Arabic has intrigued lexicographers and morphologists for centuries. While roots provide the consonantal building blocks, patterns provide the syllabic vocalic moulds. While roots provide abstract semantic classes, patterns realize these classes in specific instances. In this way both roots and patterns are indispensable for understanding the derivational, morphological and, to some extent, the cognitive aspects of the Arabic language. In this paper we perform lemmatization (a high-level lexical processing) without relying on a lookup dictionary. We use a hybrid approach that consists of a machine learning classifier to predict the lemma pattern for a given stem, and mapping rules to convert stems to their respective lemmas with the vocalization defined by the pattern.
Tasks	Information Retrieval, Lemmatization
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-5306/
PDF	https://www.aclweb.org/anthology/W16-5306
PWC	https://paperswithcode.com/paper/the-power-of-language-music-arabic
Repo
Framework

Effects of Communicative Pressures on Novice L2 Learners’ Use of Optional Formal Devices


Title	Effects of Communicative Pressures on Novice L2 Learners’ Use of Optional Formal Devices
Authors	Yoav Binoun, Francesca Delogu, Clayton Greenberg, Mindaugas Mozuraitis, Matthew Crocker
Abstract
Tasks	Language Modelling
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-2009/
PDF	https://www.aclweb.org/anthology/N16-2009
PWC	https://paperswithcode.com/paper/effects-of-communicative-pressures-on-novice
Repo
Framework

Scaling a Natural Language Generation System


Title	Scaling a Natural Language Generation System
Authors	Jonathan Pfeil, Soumya Ray
Abstract
Tasks	Text Generation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-1109/
PDF	https://www.aclweb.org/anthology/P16-1109
PWC	https://paperswithcode.com/paper/scaling-a-natural-language-generation-system
Repo
Framework

Best of Both Worlds: Making Word Sense Embeddings Interpretable


Title	Best of Both Worlds: Making Word Sense Embeddings Interpretable
Authors	Alexander Panchenko
Abstract	Word sense embeddings represent a word sense as a low-dimensional numeric vector. While this representation is potentially useful for NLP applications, its interpretability is inherently limited. We propose a simple technique that improves interpretability of sense vectors by mapping them to synsets of a lexical resource. Our experiments with AdaGram sense embeddings and BabelNet synsets show that it is possible to retrieve synsets that correspond to automatically learned sense vectors with Precision of 0.87, Recall of 0.42 and AUC of 0.78.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1421/
PDF	https://www.aclweb.org/anthology/L16-1421
PWC	https://paperswithcode.com/paper/best-of-both-worlds-making-word-sense
Repo
Framework
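
As a rough illustration of the mapping idea described in the abstract above, the sketch below links a sense vector to its most similar synset vector by cosine similarity and keeps the link only when the similarity clears a threshold. The toy vectors, the synset names, the `map_sense_to_synset` helper and the 0.5 threshold are all hypothetical, and the paper's actual procedure with AdaGram and BabelNet may well differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_sense_to_synset(sense_vec, synset_vecs, threshold=0.5):
    """Return (synset_id, similarity) for the closest synset.

    The id is None when even the best match falls below the threshold
    (hypothetical cut-off, chosen only for illustration).
    """
    best_id, best_sim = None, -1.0
    for synset_id, vec in synset_vecs.items():
        sim = cosine(sense_vec, vec)
        if sim > best_sim:
            best_id, best_sim = synset_id, sim
    if best_sim < threshold:
        return None, best_sim
    return best_id, best_sim


# Toy data: two made-up synset vectors and one made-up sense vector.
synsets = {"bank_river": [1.0, 0.0, 0.1], "bank_finance": [0.0, 1.0, 0.2]}
match, score = map_sense_to_synset([0.1, 0.9, 0.3], synsets)
print(match, round(score, 3))
```

Raising the threshold trades recall for precision: fewer sense vectors receive a synset, but the links that remain are more likely to be correct.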

The hunvec framework for NN-CRF-based sequential tagging


Title	The hunvec framework for NN-CRF-based sequential tagging
Authors	Katalin Pajkossy, Attila Zséder
Abstract	In this work we present the open source hunvec framework for sequential tagging, built upon Theano and Pylearn2. The underlying statistical model, which connects linear CRF-s with neural networks, was used by Collobert and co-workers, and several other researchers. For demonstrating the flexibility of our tool, we describe a set of experiments on part-of-speech and named-entity-recognition tasks, using English and Hungarian datasets, where we modify both model and training parameters, and illustrate the usage of custom features. Model parameters we experiment with affect the vectorial word representations used by the model; we apply different word vector initializations, defined by Word2vec and GloVe embeddings and enrich the representation of words by vectors assigned trigram features. We extend training methods by using their regularized (l2 and dropout) version. When testing our framework on a Hungarian named entity corpus, we find that its performance reaches the best published results on this dataset, with no need for language-specific feature engineering. Our code is available at http://github.com/zseder/hunvec
Tasks	Feature Engineering, Named Entity Recognition
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1678/
PDF	https://www.aclweb.org/anthology/L16-1678
PWC	https://paperswithcode.com/paper/the-hunvec-framework-for-nn-crf-based
Repo
Framework

Learning Word Meta-Embeddings


Title	Learning Word Meta-Embeddings
Authors	Wenpeng Yin, Hinrich Schütze
Abstract
Tasks	Dependency Parsing, Dimensionality Reduction, Machine Translation, Part-Of-Speech Tagging, Word Embeddings
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-1128/
PDF	https://www.aclweb.org/anthology/P16-1128
PWC	https://paperswithcode.com/paper/learning-word-meta-embeddings
Repo
Framework

Time-Independent and Language-Independent Extraction of Multiword Expressions From Twitter


Title	Time-Independent and Language-Independent Extraction of Multiword Expressions From Twitter
Authors	Nikhil Londhe, Rohini Srihari, Vishrawas Gopalakrishnan
Abstract	Multiword Expressions (MWEs) are crucial lexico-semantic units in any language. However, most work on MWEs has been focused on standard monolingual corpora. In this work, we examine MWE usage on Twitter - an inherently multilingual medium with an extremely short average text length that is often replete with grammatical errors. In this work we present a new graph based, language agnostic method for automatically extracting MWEs from tweets. We show how our method outperforms standard Association Measures. We also present a novel unsupervised evaluation technique to ascertain the accuracy of MWE extraction.
Tasks	Sentiment Analysis
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1214/
PDF	https://www.aclweb.org/anthology/C16-1214
PWC	https://paperswithcode.com/paper/time-independent-and-language-independent
Repo
Framework
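
The abstract above compares against standard association measures. For context, the sketch below computes pointwise mutual information (PMI), one widely used association measure, over adjacent word pairs. It illustrates only the kind of baseline mentioned, not the paper's graph-based method; the `bigram_pmi` helper and the toy sentence are assumptions made for this example.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Score each adjacent word pair with PMI = log2(p(x, y) / (p(x) * p(y)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy input standing in for tokenized tweets.
tokens = "new york is big and new york is busy but the city is new".split()
scores = bigram_pmi(tokens)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(value, 2))
```

PMI is known to overrate pairs built from rare words, which is one reason such measures are usually combined with a minimum frequency cut-off.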

Feature based Sentiment Analysis using a Domain Ontology


Title	Feature based Sentiment Analysis using a Domain Ontology
Authors	Neha Yadav, C Ravindranath Chowdary
Abstract
Tasks	Opinion Mining, Sentiment Analysis, Word Sense Disambiguation
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-6312/
PDF	https://www.aclweb.org/anthology/W16-6312
PWC	https://paperswithcode.com/paper/feature-based-sentiment-analysis-using-a
Repo
Framework

Learning Knowledge Base Inference with Neural Theorem Provers


Title	Learning Knowledge Base Inference with Neural Theorem Provers
Authors	Tim Rocktäschel, Sebastian Riedel
Abstract
Tasks
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-1309/
PDF	https://www.aclweb.org/anthology/W16-1309
PWC	https://paperswithcode.com/paper/learning-knowledge-base-inference-with-neural
Repo
Framework

A Corpus of Argument Networks: Using Graph Properties to Analyse Divisive Issues


Title	A Corpus of Argument Networks: Using Graph Properties to Analyse Divisive Issues
Authors	Barbara Konat, John Lawrence, Joonsuk Park, Katarzyna Budzynska, Chris Reed
Abstract	Governments are increasingly utilising online platforms in order to engage with, and ascertain the opinions of, their citizens. Whilst policy makers could potentially benefit from such enormous feedback from society, they first face the challenge of making sense out of the large volumes of data produced. This creates a demand for tools and technologies which will enable governments to quickly and thoroughly digest the points being made and to respond accordingly. By determining the argumentative and dialogical structures contained within a debate, we are able to determine the issues which are divisive and those which attract agreement. This paper proposes a method of graph-based analytics which uses properties of graphs representing networks of arguments pro- & con- in order to automatically analyse issues which divide citizens about new regulations. By future application of the most recent advances in argument mining, the results reported here will have a chance to scale up to enable sense-making of the vast amount of feedback received from citizens on directions that policy should take.
Tasks	Argument Mining
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1617/
PDF	https://www.aclweb.org/anthology/L16-1617
PWC	https://paperswithcode.com/paper/a-corpus-of-argument-networks-using-graph
Repo
Framework

Towards a Convex HMM Surrogate for Word Alignment


Title	Towards a Convex HMM Surrogate for Word Alignment
Authors	Andrei Simion, Michael Collins, Cliff Stein
Abstract
Tasks	Machine Translation, Word Alignment
Published	2016-11-01
URL	https://www.aclweb.org/anthology/D16-1051/
PDF	https://www.aclweb.org/anthology/D16-1051
PWC	https://paperswithcode.com/paper/towards-a-convex-hmm-surrogate-for-word
Repo
Framework

Corpus for Children’s Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade)


Title	Corpus for Children’s Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade)
Authors	Kay Berkling
Abstract	This paper describes the collection of the H1 Corpus of children's weekly writing over the course of 3 months in 2nd and 3rd grades, aged 7-11. The texts were collected within the normal classroom setting by the teacher. Texts of children whose parents signed the permission to donate the texts to science were collected and transcribed. The corpus consists of the elicitation techniques, an overview of the data collected and the transcriptions of the texts both with and without spelling errors, aligned on a word by word basis, as well as the scanned in texts. The corpus is available for research via Linguistic Data Consortium (LDC). Researchers are strongly encouraged to make additional annotations and improvements and return it to the public domain via LDC.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1510/
PDF	https://www.aclweb.org/anthology/L16-1510
PWC	https://paperswithcode.com/paper/corpus-for-childrens-writing-with-enhanced
Repo
Framework

Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex)


Title	Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex)
Authors
Abstract
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-3800/
PDF	https://www.aclweb.org/anthology/W16-3800
PWC	https://paperswithcode.com/paper/proceedings-of-the-workshop-on-grammar-and
Repo
Framework

Korean Language Resources for Everyone


Title	Korean Language Resources for Everyone
Authors	Jungyeul Park, Jeen-Pyo Hong, Jeong-Won Cha
Abstract
Tasks	Machine Translation, Morphological Analysis, Part-Of-Speech Tagging
Published	2016-10-01
URL	https://www.aclweb.org/anthology/Y16-2002/
PDF	https://www.aclweb.org/anthology/Y16-2002
PWC	https://paperswithcode.com/paper/korean-language-resources-for-everyone
Repo
Framework

Experiments in Idiom Recognition


Title	Experiments in Idiom Recognition
Authors	Jing Peng, Anna Feldman
Abstract	Some expressions can be ambiguous between idiomatic and literal interpretations depending on the context they occur in, e.g., 'sales hit the roof' vs. 'hit the roof of the car'. We present a novel method of classifying whether a given instance is literal or idiomatic, focusing on verb-noun constructions. We report state-of-the-art results on this task using an approach based on the hypothesis that the distributions of the contexts of the idiomatic phrases will be different from the contexts of the literal usages. We measure contexts by using projections of the words into vector space. For comparison, we implement Fazly et al. (2009)'s, Sporleder and Li (2009)'s, and Li and Sporleder (2010b)'s methods and apply them to our data. We provide experimental results validating the proposed techniques.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1259/
PDF	https://www.aclweb.org/anthology/C16-1259
PWC	https://paperswithcode.com/paper/experiments-in-idiom-recognition
Repo
Framework