Paper Group NANR 34
Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning
Title | Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning |
Authors | |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/K16-1000/ |
https://www.aclweb.org/anthology/K16-1000 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-20th-signll-conference-on |
Repo | |
Framework | |
Fine-Grained Chinese Discourse Relation Labelling
Title | Fine-Grained Chinese Discourse Relation Labelling |
Authors | Huan-Yuan Chen, Wan-Shan Liao, Hen-Hsen Huang, Hsin-Hsi Chen |
Abstract | This paper explores several aspects together for a fine-grained Chinese discourse analysis. We deal with the issues of ambiguous discourse markers, ambiguous marker linkings, and more than one discourse marker. A universal feature representation is proposed. The pair-once postulation, cross-discourse-unit-first rule and word-pair-marker-first rule select a set of discourse markers from ambiguous linkings. Marker-Sum feature considers total contribution of markers and Marker-Preference feature captures the probability distribution of discourse functions of a representative marker by using preference rule. The HIT Chinese discourse relation treebank (HIT-CDTB) is used to evaluate the proposed models. The 25-way classifier achieves 0.57 micro-averaged F-score. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1164/ |
https://www.aclweb.org/anthology/L16-1164 | |
PWC | https://paperswithcode.com/paper/fine-grained-chinese-discourse-relation |
Repo | |
Framework | |
TopoText: Interactive Digital Mapping of Literary Text
Title | TopoText: Interactive Digital Mapping of Literary Text |
Authors | Randa El Khatib, Julia El Zini, David Wrisley, Mohamad Jaber, Shady Elbassuoni |
Abstract | We demonstrate TopoText, an interactive tool for digital mapping of literary text. TopoText takes as input a literary piece of text such as a novel or a biography article and automatically extracts all place names in the text. The identified places are then geoparsed and displayed on an interactive map. TopoText calculates the number of times a place was mentioned in the text, which is then reflected on the map allowing the end-user to grasp the importance of the different places within the text. It also displays the most frequent words mentioned within a specified proximity of a place name in context or across the entire text. This can also be faceted according to part of speech tags. Finally, TopoText keeps the human in the loop by allowing the end-user to disambiguate places and to provide specific place annotations. All extracted information such as geolocations, place frequencies, as well as all user-provided annotations can be automatically exported as a CSV file that can be imported later by the same user or other users. |
Tasks | Part-Of-Speech Tagging |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2040/ |
https://www.aclweb.org/anthology/C16-2040 | |
PWC | https://paperswithcode.com/paper/topotext-interactive-digital-mapping-of |
Repo | |
Framework | |
The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources
Title | The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources |
Authors | Georg Rehm |
Abstract | Language Resources (LRs) are an essential ingredient of current approaches in Linguistics, Computational Linguistics, Language Technology and related fields. LRs are collections of spoken or written language data, typically annotated with linguistic analysis information. Different types of LRs exist, for example, corpora, ontologies, lexicons, collections of spoken language data (audio), or collections that also include video (multimedia, multimodal). Often, LRs are distributed with specific tools, documentation, manuals or research publications. The different phases that involve creating and distributing an LR can be conceptualised as a life cycle. While the idea of handling the LR production and maintenance process in terms of a life cycle has been brought up quite some time ago, a best practice model or common approach can still be considered a research gap. This article wants to help fill this gap by proposing an initial version of a generic Language Resource Life Cycle that can be used to inform, direct, control and evaluate LR research and development activities (including description, management, production, validation and evaluation workflows). |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1388/ |
https://www.aclweb.org/anthology/L16-1388 | |
PWC | https://paperswithcode.com/paper/the-language-resource-life-cycle-towards-a |
Repo | |
Framework | |
Identifying Referenced Text in Scientific Publications by Summarisation and Classification Techniques
Title | Identifying Referenced Text in Scientific Publications by Summarisation and Classification Techniques |
Authors | Stefan Klampfl, Andi Rexha, Roman Kern |
Abstract | |
Tasks | Document Summarization, Information Retrieval |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-1514/ |
https://www.aclweb.org/anthology/W16-1514 | |
PWC | https://paperswithcode.com/paper/identifying-referenced-text-in-scientific |
Repo | |
Framework | |
Efficient techniques for parsing with tree automata
Title | Efficient techniques for parsing with tree automata |
Authors | Jonas Groschwitz, Alexander Koller, Mark Johnson |
Abstract | |
Tasks | Machine Translation, Semantic Parsing |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-1192/ |
https://www.aclweb.org/anthology/P16-1192 | |
PWC | https://paperswithcode.com/paper/efficient-techniques-for-parsing-with-tree |
Repo | |
Framework | |
Towards Building a Political Protest Database to Explain Changes in the Welfare State
Title | Towards Building a Political Protest Database to Explain Changes in the Welfare State |
Authors | Çağıl Sönmez, Arzucan Özgür, Erdem Yörük |
Abstract | |
Tasks | Time Series |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2113/ |
https://www.aclweb.org/anthology/W16-2113 | |
PWC | https://paperswithcode.com/paper/towards-building-a-political-protest-database |
Repo | |
Framework | |
Neural Attention for Learning to Rank Questions in Community Question Answering
Title | Neural Attention for Learning to Rank Questions in Community Question Answering |
Authors | Salvatore Romeo, Giovanni Da San Martino, Alberto Barrón-Cedeño, Alessandro Moschitti, Yonatan Belinkov, Wei-Ning Hsu, Yu Zhang, Mitra Mohtarami, James Glass |
Abstract | In real-world data, e.g., from Web forums, text is often contaminated with redundant or irrelevant content, which leads to introducing noise in machine learning algorithms. In this paper, we apply Long Short-Term Memory networks with an attention mechanism, which can select important parts of text for the task of similar question retrieval from community Question Answering (cQA) forums. In particular, we use the attention weights for both selecting entire sentences and their subparts, i.e., word/chunk, from shallow syntactic trees. More interestingly, we apply tree kernels to the filtered text representations, thus exploiting the implicit features of the subtree space for learning question reranking. Our results show that the attention-based pruning allows for achieving the top position in the cQA challenge of SemEval 2016, with a relatively large gap from the other participants while greatly decreasing running time. |
Tasks | Community Question Answering, Learning-To-Rank, Natural Language Inference, Question Answering |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1163/ |
https://www.aclweb.org/anthology/C16-1163 | |
PWC | https://paperswithcode.com/paper/neural-attention-for-learning-to-rank |
Repo | |
Framework | |
Cooperative Graphical Models
Title | Cooperative Graphical Models |
Authors | Josip Djolonga, Stefanie Jegelka, Sebastian Tschiatschek, Andreas Krause |
Abstract | We study a rich family of distributions that capture variable interactions significantly more expressive than those representable with low-treewidth or pairwise graphical models, or log-supermodular models. We call these cooperative graphical models. Yet, this family retains structure, which we carefully exploit for efficient inference techniques. Our algorithms combine the polyhedral structure of submodular functions in new ways with variational inference methods to obtain both lower and upper bounds on the partition function. While our fully convex upper bound is minimized as an SDP or via tree-reweighted belief propagation, our lower bound is tightened via belief propagation or mean-field algorithms. The resulting algorithms are easy to implement and, as our experiments show, effectively obtain good bounds and marginals for synthetic and real-world examples. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6122-cooperative-graphical-models |
http://papers.nips.cc/paper/6122-cooperative-graphical-models.pdf | |
PWC | https://paperswithcode.com/paper/cooperative-graphical-models |
Repo | |
Framework | |
Rude waiter but mouthwatering pastries! An exploratory study into Dutch Aspect-Based Sentiment Analysis
Title | Rude waiter but mouthwatering pastries! An exploratory study into Dutch Aspect-Based Sentiment Analysis |
Authors | Orphée De Clercq, Véronique Hoste |
Abstract | The fine-grained task of automatically detecting all sentiment expressions within a given document and the aspects to which they refer is known as aspect-based sentiment analysis. In this paper we present the first full aspect-based sentiment analysis pipeline for Dutch and apply it to customer reviews. To this purpose, we collected reviews from two different domains, i.e. restaurant and smartphone reviews. Both corpora have been manually annotated using newly developed guidelines that comply with standard practices in the field. For our experimental pipeline we perceive aspect-based sentiment analysis as a task consisting of three main subtasks which have to be tackled incrementally: aspect term extraction, aspect category classification and polarity classification. First experiments on our Dutch restaurant corpus reveal that this is indeed a feasible approach that yields promising results. |
Tasks | Aspect-Based Sentiment Analysis, Sentiment Analysis |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1465/ |
https://www.aclweb.org/anthology/L16-1465 | |
PWC | https://paperswithcode.com/paper/rude-waiter-but-mouthwatering-pastries-an |
Repo | |
Framework | |
Substring-based unsupervised transliteration with phonetic and contextual knowledge
Title | Substring-based unsupervised transliteration with phonetic and contextual knowledge |
Authors | Anoop Kunchukuttan, Pushpak Bhattacharyya, Mitesh M. Khapra |
Abstract | |
Tasks | Information Retrieval, Machine Translation, Transliteration |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/K16-1027/ |
https://www.aclweb.org/anthology/K16-1027 | |
PWC | https://paperswithcode.com/paper/substring-based-unsupervised-transliteration |
Repo | |
Framework | |
The Gavagai Living Lexicon
Title | The Gavagai Living Lexicon |
Authors | Magnus Sahlgren, Amaru Cuba Gyllensten, Fredrik Espinoza, Ola Hamfors, Jussi Karlgren, Fredrik Olsson, Per Persson, Akshay Viswanathan, Anders Holst |
Abstract | This paper presents the Gavagai Living Lexicon, which is an online distributional semantic model currently available in 20 different languages. We describe the underlying distributional semantic model, and how we have solved some of the challenges in applying such a model to large amounts of streaming data. We also describe the architecture of our implementation, and discuss how we deal with continuous quality assurance of the lexicon. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1053/ |
https://www.aclweb.org/anthology/L16-1053 | |
PWC | https://paperswithcode.com/paper/the-gavagai-living-lexicon |
Repo | |
Framework | |
Odin’s Runes: A Rule Language for Information Extraction
Title | Odin’s Runes: A Rule Language for Information Extraction |
Authors | Marco A. Valenzuela-Escárcega, Gus Hahn-Powell, Mihai Surdeanu |
Abstract | Odin is an information extraction framework that applies cascades of finite state automata over both surface text and syntactic dependency graphs. Support for syntactic patterns allows us to concisely define relations that are otherwise difficult to express in languages such as Common Pattern Specification Language (CPSL), which are currently limited to shallow linguistic features. The interaction of lexical and syntactic automata provides robustness and flexibility when writing extraction rules. This paper describes Odin’s declarative language for writing these cascaded automata. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1050/ |
https://www.aclweb.org/anthology/L16-1050 | |
PWC | https://paperswithcode.com/paper/odins-runes-a-rule-language-for-information |
Repo | |
Framework | |
Automatic Detection of Arabicized Berber and Arabic Varieties
Title | Automatic Detection of Arabicized Berber and Arabic Varieties |
Authors | Wafia Adouane, Nasredine Semmar, Richard Johansson, Victoria Bobicev |
Abstract | Automatic Language Identification (ALI) is the detection of the natural language of an input text by a machine. It is the first necessary step to do any language-dependent natural language processing task. Various methods have been successfully applied to a wide range of languages, and the state-of-the-art automatic language identifiers are mainly based on character n-gram models trained on huge corpora. However, there are many languages which are not yet automatically processed, for instance minority and informal languages. Many of these languages are only spoken and do not exist in a written format. Social media platforms and new technologies have facilitated the emergence of written format for these spoken languages based on pronunciation. The latter are not well represented on the Web, commonly referred to as under-resourced languages, and the current available ALI tools fail to properly recognize them. In this paper, we revisit the problem of ALI with the focus on Arabicized Berber and dialectal Arabic short texts. We introduce new resources and evaluate the existing methods. The results show that machine learning models combined with lexicons are well suited for detecting Arabicized Berber and different Arabic varieties and distinguishing between them, giving a macro-average F-score of 92.94%. |
Tasks | Language Identification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4809/ |
https://www.aclweb.org/anthology/W16-4809 | |
PWC | https://paperswithcode.com/paper/automatic-detection-of-arabicized-berber-and |
Repo | |
Framework | |
The Gun Violence Database: A new task and data set for NLP
Title | The Gun Violence Database: A new task and data set for NLP |
Authors | Ellie Pavlick, Heng Ji, Xiaoman Pan, Chris Callison-Burch |
Abstract | |
Tasks | Coreference Resolution, Relation Extraction |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1106/ |
https://www.aclweb.org/anthology/D16-1106 | |
PWC | https://paperswithcode.com/paper/the-gun-violence-database-a-new-task-and-data |
Repo | |
Framework | |