Paper Group NANR 135
The Power of Language Music: Arabic Lemmatization through Patterns
Title | The Power of Language Music: Arabic Lemmatization through Patterns |
Authors | Mohammed Attia, Ayah Zirikly, Mona Diab |
Abstract | The interaction between roots and patterns in Arabic has intrigued lexicographers and morphologists for centuries. While roots provide the consonantal building blocks, patterns provide the syllabic vocalic moulds. While roots provide abstract semantic classes, patterns realize these classes in specific instances. In this way both roots and patterns are indispensable for understanding the derivational, morphological and, to some extent, the cognitive aspects of the Arabic language. In this paper we perform lemmatization (a high-level lexical processing) without relying on a lookup dictionary. We use a hybrid approach that consists of a machine learning classifier to predict the lemma pattern for a given stem, and mapping rules to convert stems to their respective lemmas with the vocalization defined by the pattern. |
Tasks | Information Retrieval, Lemmatization |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5306/ |
PWC | https://paperswithcode.com/paper/the-power-of-language-music-arabic |
Repo | |
Framework | |
Effects of Communicative Pressures on Novice L2 Learners’ Use of Optional Formal Devices
Title | Effects of Communicative Pressures on Novice L2 Learners’ Use of Optional Formal Devices |
Authors | Yoav Binoun, Francesca Delogu, Clayton Greenberg, Mindaugas Mozuraitis, Matthew Crocker |
Abstract | |
Tasks | Language Modelling |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-2009/ |
PWC | https://paperswithcode.com/paper/effects-of-communicative-pressures-on-novice |
Repo | |
Framework | |
Scaling a Natural Language Generation System
Title | Scaling a Natural Language Generation System |
Authors | Jonathan Pfeil, Soumya Ray |
Abstract | |
Tasks | Text Generation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1109/ |
PWC | https://paperswithcode.com/paper/scaling-a-natural-language-generation-system |
Repo | |
Framework | |
Best of Both Worlds: Making Word Sense Embeddings Interpretable
Title | Best of Both Worlds: Making Word Sense Embeddings Interpretable |
Authors | Alexander Panchenko |
Abstract | Word sense embeddings represent a word sense as a low-dimensional numeric vector. While this representation is potentially useful for NLP applications, its interpretability is inherently limited. We propose a simple technique that improves interpretability of sense vectors by mapping them to synsets of a lexical resource. Our experiments with AdaGram sense embeddings and BabelNet synsets show that it is possible to retrieve synsets that correspond to automatically learned sense vectors with Precision of 0.87, Recall of 0.42 and AUC of 0.78. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1421/ |
PWC | https://paperswithcode.com/paper/best-of-both-worlds-making-word-sense |
Repo | |
Framework | |
The hunvec framework for NN-CRF-based sequential tagging
Title | The hunvec framework for NN-CRF-based sequential tagging |
Authors | Katalin Pajkossy, Attila Zséder |
Abstract | In this work we present the open source hunvec framework for sequential tagging, built upon Theano and Pylearn2. The underlying statistical model, which connects linear CRF-s with neural networks, was used by Collobert and co-workers, and several other researchers. For demonstrating the flexibility of our tool, we describe a set of experiments on part-of-speech and named-entity-recognition tasks, using English and Hungarian datasets, where we modify both model and training parameters, and illustrate the usage of custom features. Model parameters we experiment with affect the vectorial word representations used by the model; we apply different word vector initializations, defined by Word2vec and GloVe embeddings and enrich the representation of words by vectors assigned trigram features. We extend training methods by using their regularized (l2 and dropout) version. When testing our framework on a Hungarian named entity corpus, we find that its performance reaches the best published results on this dataset, with no need for language-specific feature engineering. Our code is available at http://github.com/zseder/hunvec |
Tasks | Feature Engineering, Named Entity Recognition |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1678/ |
PWC | https://paperswithcode.com/paper/the-hunvec-framework-for-nn-crf-based |
Repo | |
Framework | |
Learning Word Meta-Embeddings
Title | Learning Word Meta-Embeddings |
Authors | Wenpeng Yin, Hinrich Schütze |
Abstract | |
Tasks | Dependency Parsing, Dimensionality Reduction, Machine Translation, Part-Of-Speech Tagging, Word Embeddings |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1128/ |
PWC | https://paperswithcode.com/paper/learning-word-meta-embeddings |
Repo | |
Framework | |
Time-Independent and Language-Independent Extraction of Multiword Expressions From Twitter
Title | Time-Independent and Language-Independent Extraction of Multiword Expressions From Twitter |
Authors | Nikhil Londhe, Rohini Srihari, Vishrawas Gopalakrishnan |
Abstract | Multiword Expressions (MWEs) are crucial lexico-semantic units in any language. However, most work on MWEs has been focused on standard monolingual corpora. In this work, we examine MWE usage on Twitter - an inherently multilingual medium with an extremely short average text length that is often replete with grammatical errors. In this work we present a new graph based, language agnostic method for automatically extracting MWEs from tweets. We show how our method outperforms standard Association Measures. We also present a novel unsupervised evaluation technique to ascertain the accuracy of MWE extraction. |
Tasks | Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1214/ |
PWC | https://paperswithcode.com/paper/time-independent-and-language-independent |
Repo | |
Framework | |
Feature based Sentiment Analysis using a Domain Ontology
Title | Feature based Sentiment Analysis using a Domain Ontology |
Authors | Neha Yadav, C Ravindranath Chowdary |
Abstract | |
Tasks | Opinion Mining, Sentiment Analysis, Word Sense Disambiguation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-6312/ |
PWC | https://paperswithcode.com/paper/feature-based-sentiment-analysis-using-a |
Repo | |
Framework | |
Learning Knowledge Base Inference with Neural Theorem Provers
Title | Learning Knowledge Base Inference with Neural Theorem Provers |
Authors | Tim Rocktäschel, Sebastian Riedel |
Abstract | |
Tasks | |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-1309/ |
PWC | https://paperswithcode.com/paper/learning-knowledge-base-inference-with-neural |
Repo | |
Framework | |
A Corpus of Argument Networks: Using Graph Properties to Analyse Divisive Issues
Title | A Corpus of Argument Networks: Using Graph Properties to Analyse Divisive Issues |
Authors | Barbara Konat, John Lawrence, Joonsuk Park, Katarzyna Budzynska, Chris Reed |
Abstract | Governments are increasingly utilising online platforms in order to engage with, and ascertain the opinions of, their citizens. Whilst policy makers could potentially benefit from such enormous feedback from society, they first face the challenge of making sense out of the large volumes of data produced. This creates a demand for tools and technologies which will enable governments to quickly and thoroughly digest the points being made and to respond accordingly. By determining the argumentative and dialogical structures contained within a debate, we are able to determine the issues which are divisive and those which attract agreement. This paper proposes a method of graph-based analytics which uses properties of graphs representing networks of arguments pro- & con- in order to automatically analyse issues which divide citizens about new regulations. By future application of the most recent advances in argument mining, the results reported here will have a chance to scale up to enable sense-making of the vast amount of feedback received from citizens on directions that policy should take. |
Tasks | Argument Mining |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1617/ |
PWC | https://paperswithcode.com/paper/a-corpus-of-argument-networks-using-graph |
Repo | |
Framework | |
Towards a Convex HMM Surrogate for Word Alignment
Title | Towards a Convex HMM Surrogate for Word Alignment |
Authors | Andrei Simion, Michael Collins, Cliff Stein |
Abstract | |
Tasks | Machine Translation, Word Alignment |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1051/ |
PWC | https://paperswithcode.com/paper/towards-a-convex-hmm-surrogate-for-word |
Repo | |
Framework | |
Corpus for Children’s Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade)
Title | Corpus for Children’s Writing with Enhanced Output for Specific Spelling Patterns (2nd and 3rd Grade) |
Authors | Kay Berkling |
Abstract | This paper describes the collection of the H1 Corpus of children’s weekly writing over the course of 3 months in 2nd and 3rd grades, aged 7-11. The texts were collected within the normal classroom setting by the teacher. Texts of children whose parents signed the permission to donate the texts to science were collected and transcribed. The corpus consists of the elicitation techniques, an overview of the data collected and the transcriptions of the texts both with and without spelling errors, aligned on a word by word basis, as well as the scanned in texts. The corpus is available for research via Linguistic Data Consortium (LDC). Researchers are strongly encouraged to make additional annotations and improvements and return it to the public domain via LDC. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1510/ |
PWC | https://paperswithcode.com/paper/corpus-for-childrens-writing-with-enhanced |
Repo | |
Framework | |
Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex)
Title | Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex) |
Authors | |
Abstract | |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3800/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-workshop-on-grammar-and |
Repo | |
Framework | |
Korean Language Resources for Everyone
Title | Korean Language Resources for Everyone |
Authors | Jungyeul Park, Jeen-Pyo Hong, Jeong-Won Cha |
Abstract | |
Tasks | Machine Translation, Morphological Analysis, Part-Of-Speech Tagging |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/Y16-2002/ |
PWC | https://paperswithcode.com/paper/korean-language-resources-for-everyone |
Repo | |
Framework | |
Experiments in Idiom Recognition
Title | Experiments in Idiom Recognition |
Authors | Jing Peng, Anna Feldman |
Abstract | Some expressions can be ambiguous between idiomatic and literal interpretations depending on the context they occur in, e.g., ‘sales hit the roof’ vs. ‘hit the roof of the car’. We present a novel method of classifying whether a given instance is literal or idiomatic, focusing on verb-noun constructions. We report state-of-the-art results on this task using an approach based on the hypothesis that the distributions of the contexts of the idiomatic phrases will be different from the contexts of the literal usages. We measure contexts by using projections of the words into vector space. For comparison, we implement Fazly et al. (2009)’s, Sporleder and Li (2009)’s, and Li and Sporleder (2010b)’s methods and apply them to our data. We provide experimental results validating the proposed techniques. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1259/ |
PWC | https://paperswithcode.com/paper/experiments-in-idiom-recognition |
Repo | |
Framework | |