May 5, 2019


Paper Group NANR 39

Croatian Error-Annotated Corpus of Non-Professional Written Language. Cognitively Motivated Distributional Representations of Meaning. Improving Argument Overlap for Proposition-Based Summarisation. Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC). Abstractive Sentence Summarization with Attentive Recurrent …

Croatian Error-Annotated Corpus of Non-Professional Written Language


Title	Croatian Error-Annotated Corpus of Non-Professional Written Language
Authors	Vanja Štefanec, Nikola Ljubešić, Jelena Kuvač Kraljević
Abstract	The authors present the Croatian corpus of non-professional written language. It consists of two subcorpora: a clinical subcorpus of written texts produced by speakers with various types of language disorders, and a healthy speakers subcorpus. Together with its levels of annotation, this structure offers an opportunity for different lines of research. The authors present the corpus structure, describe the sampling methodology, explain the levels of annotation, and give some very basic statistics. On the basis of data from the corpus, existing language technologies for Croatian are adapted in order to be implemented in a platform facilitating text production for speakers with language disorders. In this respect, several analyses of the corpus data and a basic evaluation of the developed technologies are presented.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1513/
PDF	https://www.aclweb.org/anthology/L16-1513
PWC	https://paperswithcode.com/paper/croatian-error-annotated-corpus-of-non
Repo
Framework

Cognitively Motivated Distributional Representations of Meaning


Title	Cognitively Motivated Distributional Representations of Meaning
Authors	Elias Iosif, Spiros Georgiladakis, Alexandros Potamianos
Abstract	Although meaning is at the core of human cognition, state-of-the-art distributional semantic models (DSMs) are often agnostic to the findings in the area of semantic cognition. In this work, we present a novel type of DSM motivated by the dual-processing cognitive perspective that is triggered by lexico-semantic activations in the short-term human memory. The proposed model is shown to perform better than state-of-the-art models for computing semantic similarity between words. The fusion of different types of DSMs is also investigated, achieving results that are comparable to or better than the state-of-the-art. The corpora used, along with a set of tools and large repositories of vectorial word representations, are made publicly available for four languages (English, German, Italian, and Greek).
Tasks	Semantic Similarity, Semantic Textual Similarity
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1195/
PDF	https://www.aclweb.org/anthology/L16-1195
PWC	https://paperswithcode.com/paper/cognitively-motivated-distributional
Repo
Framework

Improving Argument Overlap for Proposition-Based Summarisation


Title	Improving Argument Overlap for Proposition-Based Summarisation
Authors	Yimai Fang, Simone Teufel
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-2078/
PDF	https://www.aclweb.org/anthology/P16-2078
PWC	https://paperswithcode.com/paper/improving-argument-overlap-for-proposition
Repo
Framework

Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)


Title	Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Authors
Abstract
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4100/
PDF	https://www.aclweb.org/anthology/W16-4100
PWC	https://paperswithcode.com/paper/proceedings-of-the-workshop-on-computational-7
Repo
Framework

Abstractive Sentence Summarization with Attentive Recurrent Neural Networks


Title	Abstractive Sentence Summarization with Attentive Recurrent Neural Networks
Authors	Sumit Chopra, Michael Auli, Alexander M. Rush
Abstract
Tasks	Abstractive Sentence Summarization, Language Modelling, Machine Translation, Text Summarization
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1012/
PDF	https://www.aclweb.org/anthology/N16-1012
PWC	https://paperswithcode.com/paper/abstractive-sentence-summarization-with
Repo
Framework

A Semi-Supervised Approach for Gender Identification


Title	A Semi-Supervised Approach for Gender Identification
Authors	Juan Soler, Leo Wanner
Abstract	In most of the research studies on Author Profiling, large quantities of correctly labeled data are used to train the models. However, this does not reflect the reality in forensic scenarios: in practical linguistic forensic investigations, the resources that are available to profile the author of a text are usually scarce. To account for this, we implemented a Semi-Supervised Learning variant of the k nearest neighbors algorithm that uses small sets of labeled data and a larger amount of unlabeled data to classify the authors of texts by gender (man vs woman). We describe the enriched KNN algorithm and show that the use of unlabeled instances improves the accuracy of our gender identification model. We also present a feature set that facilitates the use of a very small number of instances, reaching accuracies higher than 70% with only 113 instances to train the model. The algorithm is also shown to perform well using publicly available data.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1204/
PDF	https://www.aclweb.org/anthology/L16-1204
PWC	https://paperswithcode.com/paper/a-semi-supervised-approach-for-gender
Repo
Framework
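
The abstract above does not spell out how the enriched KNN algorithm uses unlabeled data, so the following is only a minimal sketch of one common semi-supervised scheme, self-training, and should not be read as the authors' method. The function names, the `min_share` confidence threshold, and the squared Euclidean distance are all assumptions made for illustration.

```python
from collections import Counter


def knn_predict(labeled, x, k=3):
    """Return (label, vote_share) from the k nearest labeled points.

    `labeled` is a list of (feature_tuple, label) pairs; distance is
    squared Euclidean (an assumption, not taken from the paper).
    """
    nearest = sorted(
        labeled,
        key=lambda p: sum((u - v) ** 2 for u, v in zip(p[0], x)),
    )[:k]
    label, votes = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label, votes / len(nearest)


def self_training_knn(labeled, unlabeled, k=3, min_share=1.0, max_rounds=10):
    """Grow the labeled set with confidently classified unlabeled points.

    Each round, every remaining unlabeled point is classified against the
    current labeled set; points whose winning vote share reaches
    `min_share` are added with their predicted label. Stops when no point
    qualifies or after `max_rounds` rounds.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        confident, rest = [], []
        for x in pool:
            label, share = knn_predict(labeled, x, k)
            (confident if share >= min_share else rest).append((x, label))
        if not confident:
            break
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return labeled
```

With a handful of labeled points per class and a larger unlabeled pool, `self_training_knn` returns an enlarged training set that `knn_predict` can then use for new texts.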

Psycholinguistic Features for Deceptive Role Detection in Werewolf


Title	Psycholinguistic Features for Deceptive Role Detection in Werewolf
Authors	Codruta Girlea, Roxana Girju, Eyal Amir
Abstract
Tasks	Deception Detection
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1047/
PDF	https://www.aclweb.org/anthology/N16-1047
PWC	https://paperswithcode.com/paper/psycholinguistic-features-for-deceptive-role
Repo
Framework

Minimally Supervised Number Normalization


Title	Minimally Supervised Number Normalization
Authors	Kyle Gorman, Richard Sproat
Abstract	We propose two models for verbalizing numbers, a key component in speech recognition and synthesis systems. The first model uses an end-to-end recurrent neural network. The second model, drawing inspiration from the linguistics literature, uses finite-state transducers constructed with a minimal amount of training data. While both models achieve near-perfect performance, the latter model can be trained using several orders of magnitude less data than the former, making it particularly useful for low-resource languages.
Tasks	Speech Recognition, Speech Synthesis, Text-To-Speech Synthesis
Published	2016-01-01
URL	https://www.aclweb.org/anthology/Q16-1036/
PDF	https://www.aclweb.org/anthology/Q16-1036
PWC	https://paperswithcode.com/paper/minimally-supervised-number-normalization
Repo
Framework

Transition-Based Dependency Parsing with Heuristic Backtracking


Title	Transition-Based Dependency Parsing with Heuristic Backtracking
Authors	Jacob Buckman, Miguel Ballesteros, Chris Dyer
Abstract
Tasks	Dependency Parsing, Transition-Based Dependency Parsing
Published	2016-11-01
URL	https://www.aclweb.org/anthology/D16-1254/
PDF	https://www.aclweb.org/anthology/D16-1254
PWC	https://paperswithcode.com/paper/transition-based-dependency-parsing-with-3
Repo
Framework

Cross-lingual Wikification Using Multilingual Embeddings


Title	Cross-lingual Wikification Using Multilingual Embeddings
Authors	Chen-Tse Tsai, Dan Roth
Abstract
Tasks	Entity Linking, Knowledge Base Population, Machine Translation
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1072/
PDF	https://www.aclweb.org/anthology/N16-1072
PWC	https://paperswithcode.com/paper/cross-lingual-wikification-using-multilingual
Repo
Framework

Deconstructing Complex Search Tasks: a Bayesian Nonparametric Approach for Extracting Sub-tasks


Title	Deconstructing Complex Search Tasks: a Bayesian Nonparametric Approach for Extracting Sub-tasks
Authors	Rishabh Mehrotra, Prasanta Bhattacharya, Emine Yilmaz
Abstract
Tasks	Recommendation Systems, Word Embeddings
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1073/
PDF	https://www.aclweb.org/anthology/N16-1073
PWC	https://paperswithcode.com/paper/deconstructing-complex-search-tasks-a
Repo
Framework

Discovering Potential Terminological Relationships from Twitter’s Timed Content


Title	Discovering Potential Terminological Relationships from Twitter’s Timed Content
Authors	Mohammad Daoud, Daoud Daoud
Abstract	This paper presents a method to discover possible terminological relationships from tweets. We match the histories of terms (frequency patterns). Similar history indicates a possible relationship between terms. For example, if two terms (t1, t2) appeared frequently in Twitter on particular days, and there is a 'similarity' in the frequencies over a period of time, then t1 and t2 can be related. Maintaining a standard terminological repository with updated relationships can be difficult, especially in a dynamic domain such as social media where thousands of new terms (neology) are coined every day. So we propose to construct a raw repository of lexical units with unconfirmed relationships. We tested our method on time-sensitive Arabic terms used by the online Arabic community of Twitter. We draw relationships between these terms by matching their similar frequency patterns (timelines). We use dynamic time warping as a similarity measure. For evaluation, we have selected 630 possible terms (we call them preterms) and we matched the similarity of these terms over a period of 30 days. Around 270 correct relationships were discovered with a precision of 0.61. These relationships were extracted without considering the textual context of the term.
Tasks	Information Retrieval, Machine Translation
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-5319/
PDF	https://www.aclweb.org/anthology/W16-5319
PWC	https://paperswithcode.com/paper/discovering-potential-terminological
Repo
Framework
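
The abstract above names dynamic time warping (DTW) as the similarity measure between term timelines. As a rough illustration, here is a sketch of the textbook DTW distance between two daily-frequency sequences. The function name, the absolute-difference local cost, and the sample timelines are assumptions; the paper's actual cost function, normalization, and thresholds are not given in the abstract.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost (an assumption) and runs in
    O(len(a) * len(b)) time. A lower value means more similar timelines.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats the match with b[j-1]
                cost[i][j - 1],      # b[j-1] repeats the match with a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical daily frequency counts for two terms.
t1 = [1, 2, 3]
t2 = [1, 1, 2, 3]  # same shape, with one day stretched
print(dtw_distance(t1, t2))
```

Because DTW allows one sequence to stretch against the other, two terms whose frequency peaks are slightly shifted in time can still score as similar, which a day-by-day comparison would miss.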

Acquisition of semantic relations between terms: how far can we get with standard NLP tools?


Title	Acquisition of semantic relations between terms: how far can we get with standard NLP tools?
Authors	Ina Roesiger, Julia Bettinger, Johannes Schäfer, Michael Dorna, Ulrich Heid
Abstract	The extraction of data exemplifying relations between terms can make use, at least to a large extent, of techniques that are similar to those used in standard hybrid term candidate extraction, namely basic corpus analysis tools (e.g. tagging, lemmatization, parsing), as well as morphological analysis of complex words (compounds and derived items). In this article, we discuss the use of such techniques for the extraction of raw material for a description of relations between terms, and we provide internal evaluation data for the devices developed. We claim that user-generated content is a rich source of term variation through paraphrasing and reformulation, and that these provide relational data at the same time as term variants. Germanic languages with their rich word formation morphology may be particularly good candidates for the approach advocated here.
Tasks	Coreference Resolution, Lemmatization, Morphological Analysis, Text Classification
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4706/
PDF	https://www.aclweb.org/anthology/W16-4706
PWC	https://paperswithcode.com/paper/acquisition-of-semantic-relations-between
Repo
Framework

Collecting Language Resources for the Latvian e-Government Machine Translation Platform


Title	Collecting Language Resources for the Latvian e-Government Machine Translation Platform
Authors	Roberts Rozis, Andrejs Vasiļjevs, Raivis Skadiņš
Abstract	This paper describes corpora collection activity for building large machine translation systems for the Latvian e-Government platform. We describe requirements for corpora, selection and assessment of data sources, collection of the public corpora and creation of new corpora from miscellaneous sources. Methodology, tools and assessment methods are also presented along with the results achieved, challenges faced and conclusions made. Several approaches to address data scarcity are discussed. We summarize the volume of obtained corpora and provide quality metrics of MT systems trained on this data. The resulting MT systems for English-Latvian, Latvian-English and Latvian-Russian are integrated in the Latvian e-service portal and are freely available on the website HUGO.LV. This paper can serve as guidance for similar activities initiated in other countries, particularly in the context of the European Language Resource Coordination action.
Tasks	Machine Translation
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1202/
PDF	https://www.aclweb.org/anthology/L16-1202
PWC	https://paperswithcode.com/paper/collecting-language-resources-for-the-latvian
Repo
Framework

Edit Categories and Editor Role Identification in Wikipedia


Title	Edit Categories and Editor Role Identification in Wikipedia
Authors	Diyi Yang, Aaron Halfaker, Robert Kraut, Eduard Hovy
Abstract	In this work, we introduced a corpus for categorizing edit types in Wikipedia. This fine-grained taxonomy of edit types enables us to differentiate editing actions and find editor roles in Wikipedia based on their low-level edit types. To do this, we first created an annotated corpus based on 1,996 edits obtained from 953 article revisions and built machine-learning models to automatically identify the edit categories associated with edits. Building on this automated measurement of edit types, we then applied a graphical model analogous to Latent Dirichlet Allocation to uncover the latent roles in editors' edit histories. Applying this technique revealed eight different roles editors play, such as Social Networker, Substantive Expert, etc.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1206/
PDF	https://www.aclweb.org/anthology/L16-1206
PWC	https://paperswithcode.com/paper/edit-categories-and-editor-role
Repo
Framework