Paper Group NANR 39
Croatian Error-Annotated Corpus of Non-Professional Written Language
Title | Croatian Error-Annotated Corpus of Non-Professional Written Language |
Authors | Vanja Štefanec, Nikola Ljubešić, Jelena Kuvač Kraljević |
Abstract | In this paper, the authors present the Croatian corpus of non-professional written language. The corpus consists of two subcorpora: a clinical subcorpus of written texts produced by speakers with various types of language disorders, and a subcorpus of texts produced by healthy speakers. Together with its multiple levels of annotation, this structure supports several lines of research. The authors present the corpus structure, describe the sampling methodology, explain the levels of annotation, and give basic statistics. Using data from the corpus, existing language technologies for Croatian are adapted for implementation in a platform that facilitates text production for speakers with language disorders. In this respect, several analyses of the corpus data and a basic evaluation of the developed technologies are presented. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1513/ |
PWC | https://paperswithcode.com/paper/croatian-error-annotated-corpus-of-non |
Repo | |
Framework | |
Cognitively Motivated Distributional Representations of Meaning
Title | Cognitively Motivated Distributional Representations of Meaning |
Authors | Elias Iosif, Spiros Georgiladakis, Alexandros Potamianos |
Abstract | Although meaning is at the core of human cognition, state-of-the-art distributional semantic models (DSMs) are often agnostic to the findings in the area of semantic cognition. In this work, we present a novel type of DSM motivated by the dual-processing cognitive perspective, which is triggered by lexico-semantic activations in short-term human memory. The proposed model is shown to outperform state-of-the-art models at computing semantic similarity between words. We also investigate the fusion of different types of DSMs, achieving results comparable to or better than the state of the art. The corpora used, along with a set of tools and large repositories of vectorial word representations, are made publicly available for four languages (English, German, Italian, and Greek). |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1195/ |
PWC | https://paperswithcode.com/paper/cognitively-motivated-distributional |
Repo | |
Framework | |
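For readers unfamiliar with DSMs, the sketch below shows the generic baseline the abstract contrasts against: each word is represented by counts of the words around it, and similarity is the cosine between those count vectors. It is a minimal sketch under stated assumptions (a toy corpus, a symmetric window of 2 tokens, raw counts without weighting); it is not the cognitively motivated model proposed in the paper, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words appearing within +/- `window` tokens (hypothetical helper)."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            context = vectors.setdefault(word, Counter())
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    context[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (an assumption, not data from the paper).
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "she reads a book",
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))
print(round(cosine(vecs["cat"], vecs["book"]), 3))
```

Words that share contexts ("cat" and "dog" both occur near "the", "drinks", "chases") score higher than words that share none ("cat" and "book").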
Improving Argument Overlap for Proposition-Based Summarisation
Title | Improving Argument Overlap for Proposition-Based Summarisation |
Authors | Yimai Fang, Simone Teufel |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-2078/ |
PWC | https://paperswithcode.com/paper/improving-argument-overlap-for-proposition |
Repo | |
Framework | |
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Title | Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC) |
Authors | |
Abstract | |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4100/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-workshop-on-computational-7 |
Repo | |
Framework | |
Abstractive Sentence Summarization with Attentive Recurrent Neural Networks
Title | Abstractive Sentence Summarization with Attentive Recurrent Neural Networks |
Authors | Sumit Chopra, Michael Auli, Alexander M. Rush |
Abstract | |
Tasks | Abstractive Sentence Summarization, Language Modelling, Machine Translation, Text Summarization |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1012/ |
PWC | https://paperswithcode.com/paper/abstractive-sentence-summarization-with |
Repo | |
Framework | |
A Semi-Supervised Approach for Gender Identification
Title | A Semi-Supervised Approach for Gender Identification |
Authors | Juan Soler, Leo Wanner |
Abstract | In most research on Author Profiling, large quantities of correctly labeled data are used to train the models. However, this does not reflect the reality of forensic scenarios: in practical linguistic forensic investigations, the resources available to profile the author of a text are usually scarce. To account for this, we implemented a Semi-Supervised Learning variant of the k-nearest neighbors algorithm that uses small sets of labeled data and a larger amount of unlabeled data to classify the authors of texts by gender (man vs. woman). We describe the enriched KNN algorithm and show that the use of unlabeled instances improves the accuracy of our gender identification model. We also present a feature set that allows the model to work with a very small number of instances, reaching accuracies higher than 70% with only 113 training instances. Finally, we show that the algorithm performs well on publicly available data. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1204/ |
PWC | https://paperswithcode.com/paper/a-semi-supervised-approach-for-gender |
Repo | |
Framework | |
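The abstract does not spell out how the enriched KNN uses unlabeled data. One common semi-supervised scheme is self-training: repeatedly pseudo-label the unlabeled instance the model is most confident about and add it to the training set. The sketch below is a minimal self-training KNN under stated assumptions (points are numeric feature tuples, "most confident" means closest to an already labeled point, Euclidean distance, majority vote); it is not necessarily the authors' variant, and all names and data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(labeled, point, k=3):
    """Majority label among the k labeled points nearest to `point`."""
    neighbours = sorted(labeled, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def self_training_knn(labeled, unlabeled, k=3):
    """Pseudo-label unlabeled points one at a time, nearest-to-labeled-set first."""
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        # Assumed confidence criterion: distance to the closest labeled point.
        best = min(pool, key=lambda p: min(dist(p, x) for x, _ in labeled))
        labeled.append((best, knn_predict(labeled, best, k)))
        pool.remove(best)
    return labeled


# Hypothetical 2-D feature vectors; labels "a"/"b" stand in for the two classes.
seed = [((0.0, 0.0), "a"), ((0.2, 0.1), "a"), ((5.0, 5.0), "b"), ((5.1, 4.8), "b")]
extra = [(0.4, 0.3), (4.7, 5.2), (0.8, 0.5), (4.2, 4.4)]
model = self_training_knn(seed, extra, k=3)
print(knn_predict(model, (1.0, 1.0)))
print(knn_predict(model, (4.0, 4.0)))
```

With only two labeled seeds per class, k=3 would otherwise always pull in a vote from the far class; the pseudo-labeled points fill in each cluster so that all three neighbours come from the right side.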
Psycholinguistic Features for Deceptive Role Detection in Werewolf
Title | Psycholinguistic Features for Deceptive Role Detection in Werewolf |
Authors | Codruta Girlea, Roxana Girju, Eyal Amir |
Abstract | |
Tasks | Deception Detection |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1047/ |
PWC | https://paperswithcode.com/paper/psycholinguistic-features-for-deceptive-role |
Repo | |
Framework | |
Minimally Supervised Number Normalization
Title | Minimally Supervised Number Normalization |
Authors | Kyle Gorman, Richard Sproat |
Abstract | We propose two models for verbalizing numbers, a key component in speech recognition and synthesis systems. The first model uses an end-to-end recurrent neural network. The second model, drawing inspiration from the linguistics literature, uses finite-state transducers constructed with a minimal amount of training data. While both models achieve near-perfect performance, the latter model can be trained using several orders of magnitude less data than the former, making it particularly useful for low-resource languages. |
Tasks | Speech Recognition, Speech Synthesis, Text-To-Speech Synthesis |
Published | 2016-01-01 |
URL | https://www.aclweb.org/anthology/Q16-1036/ |
PWC | https://paperswithcode.com/paper/minimally-supervised-number-normalization |
Repo | |
Framework | |
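To make the task concrete: number verbalization maps a digit string such as "2016" to its spoken form. The sketch below does this for English cardinals with hand-written rules, purely to illustrate the input/output mapping. It is neither of the paper's models (the end-to-end RNN or the finite-state transducers induced from minimal data). The assumptions are American-style output without "and", hyphenated tens, and a range of 0 to 999,999; the function names are hypothetical.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]


def verbalize_below_1000(n):
    """Spell out 0 <= n < 1000 as English words."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens] + ("-" + ONES[ones] if ones else ""))
    elif rest or not words:
        # Emit "zero" only when nothing else has been said.
        words.append(ONES[rest])
    return " ".join(words)


def verbalize(n):
    """Spell out a non-negative integer below one million."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0 <= n < 1,000,000 is supported")
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        words.append(verbalize_below_1000(thousands) + " thousand")
    if rest or not thousands:
        words.append(verbalize_below_1000(rest))
    return " ".join(words)


for number in [0, 97, 305, 2016, 1000]:
    print(number, "->", verbalize(number))
```

Even this small fragment needs many language-specific rules, which is why learning the mapping from little data, as the abstract describes, matters for low-resource languages.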
Transition-Based Dependency Parsing with Heuristic Backtracking
Title | Transition-Based Dependency Parsing with Heuristic Backtracking |
Authors | Jacob Buckman, Miguel Ballesteros, Chris Dyer |
Abstract | |
Tasks | Dependency Parsing, Transition-Based Dependency Parsing |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1254/ |
PWC | https://paperswithcode.com/paper/transition-based-dependency-parsing-with-3 |
Repo | |
Framework | |
Cross-lingual Wikification Using Multilingual Embeddings
Title | Cross-lingual Wikification Using Multilingual Embeddings |
Authors | Chen-Tse Tsai, Dan Roth |
Abstract | |
Tasks | Entity Linking, Knowledge Base Population, Machine Translation |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1072/ |
PWC | https://paperswithcode.com/paper/cross-lingual-wikification-using-multilingual |
Repo | |
Framework | |
Deconstructing Complex Search Tasks: a Bayesian Nonparametric Approach for Extracting Sub-tasks
Title | Deconstructing Complex Search Tasks: a Bayesian Nonparametric Approach for Extracting Sub-tasks |
Authors | Rishabh Mehrotra, Prasanta Bhattacharya, Emine Yilmaz |
Abstract | |
Tasks | Recommendation Systems, Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1073/ |
PWC | https://paperswithcode.com/paper/deconstructing-complex-search-tasks-a |
Repo | |
Framework | |
Discovering Potential Terminological Relationships from Twitter’s Timed Content
Title | Discovering Potential Terminological Relationships from Twitter’s Timed Content |
Authors | Mohammad Daoud, Daoud Daoud |
Abstract | This paper presents a method to discover possible terminological relationships from tweets. We match the histories of terms (their frequency patterns); similar histories indicate a possible relationship between terms. For example, if two terms (t1, t2) appeared frequently on Twitter on particular days, and their frequencies are similar over a period of time, then t1 and t2 may be related. Maintaining a standard terminological repository with up-to-date relationships can be difficult, especially in a dynamic domain such as social media, where thousands of new terms (neologisms) are coined every day. We therefore propose to construct a raw repository of lexical units with unconfirmed relationships. We tested our method on time-sensitive Arabic terms used by the online Arabic community on Twitter. We draw relationships between these terms by matching their similar frequency patterns (timelines), using dynamic time warping as the similarity measure. For evaluation, we selected 630 possible terms (which we call preterms) and matched the similarity of these terms over a period of 30 days. Around 270 correct relationships were discovered, with a precision of 0.61. These relationships were extracted without considering the textual context of the terms. |
Tasks | Information Retrieval, Machine Translation |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-5319/ |
PWC | https://paperswithcode.com/paper/discovering-potential-terminological |
Repo | |
Framework | |
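The matching step described above compares daily frequency timelines with dynamic time warping (DTW), which tolerates small shifts in time that a point-by-point comparison would penalize. The sketch below implements classic DTW with an absolute-difference cost and then keeps term pairs whose distance falls under a threshold. It is a minimal sketch under stated assumptions: raw daily counts without normalization, a hypothetical threshold of 5, and made-up terms "t1", "t2", "t3"; the paper's exact normalization and cut-off are not given in the abstract.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a[i-1] repeats against b
                                    cost[i][j - 1],      # b[j-1] repeats against a
                                    cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m]


def related_pairs(timelines, threshold):
    """Return (term, term, distance) for every pair within `threshold` under DTW."""
    terms = sorted(timelines)
    pairs = []
    for x in range(len(terms)):
        for y in range(x + 1, len(terms)):
            d = dtw_distance(timelines[terms[x]], timelines[terms[y]])
            if d <= threshold:
                pairs.append((terms[x], terms[y], d))
    return pairs


# Hypothetical daily counts: t2 is t1 shifted by one day; t3 is flat.
timelines = {
    "t1": [0, 1, 5, 9, 4, 1, 0],
    "t2": [0, 0, 1, 5, 9, 4, 1],
    "t3": [7, 7, 7, 7, 7, 7, 7],
}
print(related_pairs(timelines, threshold=5))
```

Here t1 and t2 differ by one day of lag, so DTW aligns their peaks and reports a distance of only 1, while the flat t3 stays far from both.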
Acquisition of semantic relations between terms: how far can we get with standard NLP tools?
Title | Acquisition of semantic relations between terms: how far can we get with standard NLP tools? |
Authors | Ina Roesiger, Julia Bettinger, Johannes Schäfer, Michael Dorna, Ulrich Heid |
Abstract | The extraction of data exemplifying relations between terms can make use, at least to a large extent, of techniques that are similar to those used in standard hybrid term candidate extraction, namely basic corpus analysis tools (e.g. tagging, lemmatization, parsing), as well as morphological analysis of complex words (compounds and derived items). In this article, we discuss the use of such techniques for the extraction of raw material for a description of relations between terms, and we provide internal evaluation data for the devices developed. We claim that user-generated content is a rich source of term variation through paraphrasing and reformulation, and that these provide relational data at the same time as term variants. Germanic languages with their rich word formation morphology may be particularly good candidates for the approach advocated here. |
Tasks | Coreference Resolution, Lemmatization, Morphological Analysis, Text Classification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4706/ |
PWC | https://paperswithcode.com/paper/acquisition-of-semantic-relations-between |
Repo | |
Framework | |
Collecting Language Resources for the Latvian e-Government Machine Translation Platform
Title | Collecting Language Resources for the Latvian e-Government Machine Translation Platform |
Authors | Roberts Rozis, Andrejs Vasiļjevs, Raivis Skadiņš |
Abstract | This paper describes the corpus collection effort for building large machine translation systems for the Latvian e-Government platform. We describe the requirements for the corpora, the selection and assessment of data sources, the collection of public corpora, and the creation of new corpora from miscellaneous sources. The methodology, tools, and assessment methods are also presented, along with the results achieved, the challenges faced, and the conclusions drawn. Several approaches to addressing data scarcity are discussed. We summarize the volume of the obtained corpora and provide quality metrics for MT systems trained on this data. The resulting MT systems for English-Latvian, Latvian-English, and Latvian-Russian are integrated into the Latvian e-service portal and are freely available on the website HUGO.LV. This paper can serve as guidance for similar activities initiated in other countries, particularly in the context of the European Language Resource Coordination action. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1202/ |
PWC | https://paperswithcode.com/paper/collecting-language-resources-for-the-latvian |
Repo | |
Framework | |
Edit Categories and Editor Role Identification in Wikipedia
Title | Edit Categories and Editor Role Identification in Wikipedia |
Authors | Diyi Yang, Aaron Halfaker, Robert Kraut, Eduard Hovy |
Abstract | In this work, we introduce a corpus for categorizing edit types in Wikipedia. This fine-grained taxonomy of edit types enables us to differentiate editing actions and identify editor roles in Wikipedia based on their low-level edit types. To do this, we first created an annotated corpus of 1,996 edits obtained from 953 article revisions and built machine-learning models to automatically identify the edit categories associated with each edit. Building on this automated measurement of edit types, we then applied a graphical model analogous to Latent Dirichlet Allocation to uncover latent roles in editors' edit histories. Applying this technique revealed eight different roles that editors play, such as Social Networker and Substantive Expert. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1206/ |
PWC | https://paperswithcode.com/paper/edit-categories-and-editor-role |
Repo | |
Framework | |