May 5, 2019

Paper Group NANR 39

Croatian Error-Annotated Corpus of Non-Professional Written Language

Title Croatian Error-Annotated Corpus of Non-Professional Written Language
Authors Vanja Štefanec, Nikola Ljubešić, Jelena Kuvač Kraljević
Abstract This paper presents the Croatian corpus of non-professional written language. The corpus consists of two subcorpora: a clinical subcorpus of written texts produced by speakers with various types of language disorders, and a healthy speakers subcorpus. Its structure and its levels of annotation offer an opportunity for different lines of research. The authors present the corpus structure, describe the sampling methodology, explain the levels of annotation, and give some very basic statistics. On the basis of data from the corpus, existing language technologies for Croatian are adapted for implementation in a platform that facilitates text production for speakers with language disorders. In this respect, several analyses of the corpus data and a basic evaluation of the developed technologies are presented.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1513/
PDF https://www.aclweb.org/anthology/L16-1513
PWC https://paperswithcode.com/paper/croatian-error-annotated-corpus-of-non
Repo
Framework

Cognitively Motivated Distributional Representations of Meaning

Title Cognitively Motivated Distributional Representations of Meaning
Authors Elias Iosif, Spiros Georgiladakis, Alexandros Potamianos
Abstract Although meaning is at the core of human cognition, state-of-the-art distributional semantic models (DSMs) are often agnostic to the findings in the area of semantic cognition. In this work, we present a novel type of DSM motivated by the dual-processing cognitive perspective that is triggered by lexico-semantic activations in the short-term human memory. The proposed model is shown to perform better than state-of-the-art models for computing semantic similarity between words. The fusion of different types of DSMs is also investigated, achieving results that are comparable to or better than the state of the art. The corpora used, along with a set of tools and large repositories of vectorial word representations, are made publicly available for four languages (English, German, Italian, and Greek).
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1195/
PDF https://www.aclweb.org/anthology/L16-1195
PWC https://paperswithcode.com/paper/cognitively-motivated-distributional
Repo
Framework

Improving Argument Overlap for Proposition-Based Summarisation

Title Improving Argument Overlap for Proposition-Based Summarisation
Authors Yimai Fang, Simone Teufel
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-2078/
PDF https://www.aclweb.org/anthology/P16-2078
PWC https://paperswithcode.com/paper/improving-argument-overlap-for-proposition
Repo
Framework

Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Title Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
Authors
Abstract
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4100/
PDF https://www.aclweb.org/anthology/W16-4100
PWC https://paperswithcode.com/paper/proceedings-of-the-workshop-on-computational-7
Repo
Framework

Abstractive Sentence Summarization with Attentive Recurrent Neural Networks

Title Abstractive Sentence Summarization with Attentive Recurrent Neural Networks
Authors Sumit Chopra, Michael Auli, Alexander M. Rush
Abstract
Tasks Abstractive Sentence Summarization, Language Modelling, Machine Translation, Text Summarization
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1012/
PDF https://www.aclweb.org/anthology/N16-1012
PWC https://paperswithcode.com/paper/abstractive-sentence-summarization-with
Repo
Framework

A Semi-Supervised Approach for Gender Identification

Title A Semi-Supervised Approach for Gender Identification
Authors Juan Soler, Leo Wanner
Abstract In most of the research studies on Author Profiling, large quantities of correctly labeled data are used to train the models. However, this does not reflect the reality in forensic scenarios: in practical linguistic forensic investigations, the resources that are available to profile the author of a text are usually scarce. To account for this fact, we implemented a Semi-Supervised Learning variant of the k nearest neighbors algorithm that uses small sets of labeled data and a larger amount of unlabeled data to classify the authors of texts by gender (man vs woman). We describe the enriched KNN algorithm and show that the use of unlabeled instances improves the accuracy of our gender identification model. We also present a feature set that facilitates the use of a very small number of instances, reaching accuracies higher than 70% with only 113 instances to train the model. It is also shown that the algorithm performs well using publicly available data.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1204/
PDF https://www.aclweb.org/anthology/L16-1204
PWC https://paperswithcode.com/paper/a-semi-supervised-approach-for-gender
Repo
Framework
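
The abstract above describes a semi-supervised k nearest neighbors variant that draws on unlabeled data. The paper's enriched KNN algorithm is not specified here, so the following is only a minimal sketch of one common semi-supervised scheme (self-training), not the authors' method. The function names, the Euclidean distance, and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import Counter


def knn_predict(labeled, x, k=3):
    """Return (label, vote share) from the k nearest labeled points.

    `labeled` is a list of (feature_tuple, label) pairs; distance is
    squared Euclidean (an assumption, not taken from the paper).
    """
    nearest = sorted(
        labeled,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def self_train(labeled, unlabeled, k=3, threshold=1.0):
    """Self-training: repeatedly move confidently classified unlabeled
    points into the labeled set until no confident point remains."""
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        scored = [(x, *knn_predict(labeled, x, k)) for x in pool]
        confident = [(x, lab) for x, lab, share in scored if share >= threshold]
        if not confident:
            break  # nothing passes the threshold; stop to avoid looping forever
        labeled.extend(confident)
        pool = [x for x, _, share in scored if share < threshold]
    return labeled


# Hypothetical toy data: two well-separated clusters with labels "a" and "b".
seed = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((5.0, 5.1), "b"),
]
extra = [(0.2, 0.2), (4.8, 4.9)]
enlarged = self_train(seed, extra, k=3)
```

With `threshold=1.0`, a point is only adopted when all k neighbors agree, which keeps label noise low at the cost of leaving ambiguous points unlabeled.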

Psycholinguistic Features for Deceptive Role Detection in Werewolf

Title Psycholinguistic Features for Deceptive Role Detection in Werewolf
Authors Codruta Girlea, Roxana Girju, Eyal Amir
Abstract
Tasks Deception Detection
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1047/
PDF https://www.aclweb.org/anthology/N16-1047
PWC https://paperswithcode.com/paper/psycholinguistic-features-for-deceptive-role
Repo
Framework

Minimally Supervised Number Normalization

Title Minimally Supervised Number Normalization
Authors Kyle Gorman, Richard Sproat
Abstract We propose two models for verbalizing numbers, a key component in speech recognition and synthesis systems. The first model uses an end-to-end recurrent neural network. The second model, drawing inspiration from the linguistics literature, uses finite-state transducers constructed with a minimal amount of training data. While both models achieve near-perfect performance, the latter model can be trained using several orders of magnitude less data than the former, making it particularly useful for low-resource languages.
Tasks Speech Recognition, Speech Synthesis, Text-To-Speech Synthesis
Published 2016-01-01
URL https://www.aclweb.org/anthology/Q16-1036/
PDF https://www.aclweb.org/anthology/Q16-1036
PWC https://paperswithcode.com/paper/minimally-supervised-number-normalization
Repo
Framework

Transition-Based Dependency Parsing with Heuristic Backtracking

Title Transition-Based Dependency Parsing with Heuristic Backtracking
Authors Jacob Buckman, Miguel Ballesteros, Chris Dyer
Abstract
Tasks Dependency Parsing, Transition-Based Dependency Parsing
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1254/
PDF https://www.aclweb.org/anthology/D16-1254
PWC https://paperswithcode.com/paper/transition-based-dependency-parsing-with-3
Repo
Framework

Cross-lingual Wikification Using Multilingual Embeddings

Title Cross-lingual Wikification Using Multilingual Embeddings
Authors Chen-Tse Tsai, Dan Roth
Abstract
Tasks Entity Linking, Knowledge Base Population, Machine Translation
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1072/
PDF https://www.aclweb.org/anthology/N16-1072
PWC https://paperswithcode.com/paper/cross-lingual-wikification-using-multilingual
Repo
Framework

Deconstructing Complex Search Tasks: a Bayesian Nonparametric Approach for Extracting Sub-tasks

Title Deconstructing Complex Search Tasks: a Bayesian Nonparametric Approach for Extracting Sub-tasks
Authors Rishabh Mehrotra, Prasanta Bhattacharya, Emine Yilmaz
Abstract
Tasks Recommendation Systems, Word Embeddings
Published 2016-06-01
URL https://www.aclweb.org/anthology/N16-1073/
PDF https://www.aclweb.org/anthology/N16-1073
PWC https://paperswithcode.com/paper/deconstructing-complex-search-tasks-a
Repo
Framework

Discovering Potential Terminological Relationships from Twitter’s Timed Content

Title Discovering Potential Terminological Relationships from Twitter’s Timed Content
Authors Mohammad Daoud, Daoud Daoud
Abstract This paper presents a method to discover possible terminological relationships from tweets. We match the histories of terms (frequency patterns). Similar history indicates a possible relationship between terms. For example, if two terms (t1, t2) appeared frequently on Twitter on particular days, and there is a 'similarity' in the frequencies over a period of time, then t1 and t2 can be related. Maintaining a standard terminological repository with updated relationships can be difficult, especially in a dynamic domain such as social media where thousands of new terms (neology) are coined every day. So we propose to construct a raw repository of lexical units with unconfirmed relationships. We have tested our method on time-sensitive Arabic terms used by the online Arabic community of Twitter. We draw relationships between these terms by matching their similar frequency patterns (timelines). We use dynamic time warping as a similarity measure. For evaluation, we have selected 630 possible terms (we call them preterms) and we matched the similarity of these terms over a period of 30 days. Around 270 correct relationships were discovered with a precision of 0.61. These relationships were extracted without considering the textual context of the term.
Tasks Information Retrieval, Machine Translation
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-5319/
PDF https://www.aclweb.org/anthology/W16-5319
PWC https://paperswithcode.com/paper/discovering-potential-terminological
Repo
Framework
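
The abstract above names dynamic time warping (DTW) as the similarity measure between term frequency timelines. As a rough illustration, here is a minimal sketch of the textbook DTW distance between two daily frequency series. The absolute-difference local cost, the absence of a warping window, and the sample timelines are assumptions; the paper's exact configuration is not given here.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric series.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step
    may advance in `a`, in `b`, or in both.
    """
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical daily frequency timelines for two terms: the same spike,
# shifted by one day. DTW aligns them at zero cost.
t1 = [0, 1, 2, 1, 0]
t2 = [0, 0, 1, 2, 1, 0]
shifted = dtw_distance(t1, t2)
```

A smaller distance means more similar timelines, so a threshold on this value (a further assumption) would be one way to propose candidate relationships between terms.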

Acquisition of semantic relations between terms: how far can we get with standard NLP tools?

Title Acquisition of semantic relations between terms: how far can we get with standard NLP tools?
Authors Ina Roesiger, Julia Bettinger, Johannes Schäfer, Michael Dorna, Ulrich Heid
Abstract The extraction of data exemplifying relations between terms can make use, at least to a large extent, of techniques that are similar to those used in standard hybrid term candidate extraction, namely basic corpus analysis tools (e.g. tagging, lemmatization, parsing), as well as morphological analysis of complex words (compounds and derived items). In this article, we discuss the use of such techniques for the extraction of raw material for a description of relations between terms, and we provide internal evaluation data for the devices developed. We claim that user-generated content is a rich source of term variation through paraphrasing and reformulation, and that these provide relational data at the same time as term variants. Germanic languages with their rich word formation morphology may be particularly good candidates for the approach advocated here.
Tasks Coreference Resolution, Lemmatization, Morphological Analysis, Text Classification
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4706/
PDF https://www.aclweb.org/anthology/W16-4706
PWC https://paperswithcode.com/paper/acquisition-of-semantic-relations-between
Repo
Framework

Collecting Language Resources for the Latvian e-Government Machine Translation Platform

Title Collecting Language Resources for the Latvian e-Government Machine Translation Platform
Authors Roberts Rozis, Andrejs Vasiļjevs, Raivis Skadiņš
Abstract This paper describes corpora collection activity for building large machine translation systems for the Latvian e-Government platform. We describe requirements for corpora, selection and assessment of data sources, collection of the public corpora and creation of new corpora from miscellaneous sources. Methodology, tools and assessment methods are also presented along with the results achieved, challenges faced and conclusions made. Several approaches to address data scarcity are discussed. We summarize the volume of obtained corpora and provide quality metrics of MT systems trained on this data. The resulting MT systems for English-Latvian, Latvian-English and Latvian-Russian are integrated in the Latvian e-service portal and are freely available on the website HUGO.LV. This paper can serve as guidance for similar activities initiated in other countries, particularly in the context of the European Language Resource Coordination action.
Tasks Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1202/
PDF https://www.aclweb.org/anthology/L16-1202
PWC https://paperswithcode.com/paper/collecting-language-resources-for-the-latvian
Repo
Framework

Edit Categories and Editor Role Identification in Wikipedia

Title Edit Categories and Editor Role Identification in Wikipedia
Authors Diyi Yang, Aaron Halfaker, Robert Kraut, Eduard Hovy
Abstract In this work, we introduced a corpus for categorizing edit types in Wikipedia. This fine-grained taxonomy of edit types enables us to differentiate editing actions and find editor roles in Wikipedia based on their low-level edit types. To do this, we first created an annotated corpus based on 1,996 edits obtained from 953 article revisions and built machine-learning models to automatically identify the edit categories associated with edits. Building on this automated measurement of edit types, we then applied a graphical model analogous to Latent Dirichlet Allocation to uncover the latent roles in editors' edit histories. Applying this technique revealed eight different roles editors play, such as Social Networker, Substantive Expert, etc.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1206/
PDF https://www.aclweb.org/anthology/L16-1206
PWC https://paperswithcode.com/paper/edit-categories-and-editor-role
Repo
Framework