May 5, 2019

Paper Group NANR 9

Top a Splitter: Using Distributional Semantics for Improving Compound Splitting. Creating Resources for Dialectal Arabic from a Single Annotation: A Case Study on Egyptian and Levantine. Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art. Operational Assessment of Keyword Search on Oral Hi …

Top a Splitter: Using Distributional Semantics for Improving Compound Splitting

Title Top a Splitter: Using Distributional Semantics for Improving Compound Splitting
Authors Patrick Ziering, Stefan Müller, Lonneke van der Plas
Abstract
Tasks Machine Translation, Semantic Textual Similarity
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-1807/
PDF https://www.aclweb.org/anthology/W16-1807
PWC https://paperswithcode.com/paper/top-a-splitter-using-distributional-semantics
Repo
Framework

Creating Resources for Dialectal Arabic from a Single Annotation: A Case Study on Egyptian and Levantine

Title Creating Resources for Dialectal Arabic from a Single Annotation: A Case Study on Egyptian and Levantine
Authors Ramy Eskander, Nizar Habash, Owen Rambow, Arfath Pasha
Abstract Arabic dialects present a special problem for natural language processing because there are few resources, they have no standard orthography, and have not been studied much. However, as more and more written dialectal Arabic is found in social media, NLP for Arabic dialects becomes an important goal. We present a methodology for creating a morphological analyzer and a morphological tagger for dialectal Arabic, and we illustrate it on Egyptian and Levantine Arabic. To our knowledge, these are the first analyzer and tagger for Levantine.
Tasks Morphological Analysis
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1326/
PDF https://www.aclweb.org/anthology/C16-1326
PWC https://paperswithcode.com/paper/creating-resources-for-dialectal-arabic-from
Repo
Framework

Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art

Title Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art
Authors Steffen Eger, Rüdiger Gleim, Alexander Mehler
Abstract This paper relates to the challenge of morphological tagging and lemmatization in morphologically rich languages by example of German and Latin. We focus on the question of what a practitioner can expect when using state-of-the-art solutions out of the box. Moreover, we contrast these with old(er) methods and implementations for POS tagging. We examine to what degree recent efforts in tagger development are reflected by improved accuracies, and at what cost, in terms of training and processing time. We also conduct in-domain vs. out-domain evaluation. Out-domain evaluations are particularly insightful because the distribution of the data which is being tagged by a user will typically differ from the distribution on which the tagger has been trained. Furthermore, two lemmatization techniques are evaluated. Finally, we compare pipeline tagging vs. a tagging approach that acknowledges dependencies between inflectional categories.
Tasks Lemmatization, Morphological Tagging
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1239/
PDF https://www.aclweb.org/anthology/L16-1239
PWC https://paperswithcode.com/paper/lemmatization-and-morphological-tagging-in
Repo
Framework

Operational Assessment of Keyword Search on Oral History

Title Operational Assessment of Keyword Search on Oral History
Authors Elizabeth Salesky, Jessica Ray, Wade Shen
Abstract This project assesses the resources necessary to make oral history searchable by means of automatic speech recognition (ASR). There are many inherent challenges in applying ASR to conversational speech: smaller training set sizes and varying demographics, among others. We assess the impact of dataset size, word error rate and term-weighted value on human search capability through an information retrieval task on Mechanical Turk. We use English oral history data collected by StoryCorps, a national organization that provides all people with the opportunity to record, share and preserve their stories, and control for a variety of demographics including age, gender, birthplace, and dialect on four different training set sizes. We show comparable search performance using a standard speech recognition system as with hand-transcribed data, which is promising for increased accessibility of conversational speech and oral history archives.
Tasks Information Retrieval, Speech Recognition
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1049/
PDF https://www.aclweb.org/anthology/L16-1049
PWC https://paperswithcode.com/paper/operational-assessment-of-keyword-search-on
Repo
Framework

Data, tools and resources for mining social media drug chatter

Title Data, tools and resources for mining social media drug chatter
Authors Abeed Sarker, Graciela Gonzalez
Abstract Social media has emerged as a crucial resource for obtaining population-based signals for various public health monitoring and surveillance tasks, such as pharmacovigilance. There is an abundance of knowledge hidden within social media data, and the volume is growing. Drug-related chatter on social media can include user-generated information that can provide insights into public health problems such as abuse, adverse reactions, long-term effects, and multi-drug interactions. Our objective in this paper is to present to the biomedical natural language processing, data science, and public health communities data sets (annotated and unannotated), tools and resources that we have collected and created from social media. The data we present was collected from Twitter using the generic and brand names of drugs as keywords, along with their common misspellings. Following the collection of the data, annotation guidelines were created over several iterations, which detail important aspects of social media data annotation and can be used by future researchers for developing similar data sets. The annotation guidelines were followed to prepare data sets for text classification, information extraction and normalization. In this paper, we discuss the preparation of these guidelines, outline the data sets prepared, and present an overview of our state-of-the-art systems for data collection, supervised classification, and information extraction. In addition to the development of supervised systems for classification and extraction, we developed and released unlabeled data and language models. We discuss the potential uses of these language models in data mining and the large volumes of unlabeled data from which they were generated. We believe that the summaries and repositories we present here of our data, annotation guidelines, models, and tools will be beneficial to the research community as a single-point entry for all these resources, and will promote further research in this area.
Tasks Epidemiology, Text Classification
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-5111/
PDF https://www.aclweb.org/anthology/W16-5111
PWC https://paperswithcode.com/paper/data-tools-and-resources-for-mining-social
Repo
Framework

Crowdsourcing Salient Information from News and Tweets

Title Crowdsourcing Salient Information from News and Tweets
Authors Oana Inel, Tommaso Caselli, Lora Aroyo
Abstract The increasing streams of information pose challenges to both humans and machines. On the one hand, humans need to identify relevant information and consume only the information that lies at their interests. On the other hand, machines need to understand the information that is published in online data streams and generate concise and meaningful overviews. We consider events as prime factors to query for information and generate meaningful context. The focus of this paper is to acquire empirical insights for identifying salience features in tweets and news about a target event, i.e., the event of "whaling". We first derive a methodology to identify such features by building up a knowledge space of the event enriched with relevant phrases, sentiments and ranked by their novelty. We applied this methodology on tweets and we have performed preliminary work towards adapting it to news articles. Our results show that crowdsourcing text relevance, sentiments and novelty (1) can be a main step in identifying salient information, and (2) provides a deeper and more precise understanding of the data at hand compared to state-of-the-art approaches.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1625/
PDF https://www.aclweb.org/anthology/L16-1625
PWC https://paperswithcode.com/paper/crowdsourcing-salient-information-from-news
Repo
Framework

Metrics for Evaluation of Word-level Machine Translation Quality Estimation

Title Metrics for Evaluation of Word-level Machine Translation Quality Estimation
Authors Varvara Logacheva, Michal Lukasik, Lucia Specia
Abstract
Tasks Machine Translation
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-2095/
PDF https://www.aclweb.org/anthology/P16-2095
PWC https://paperswithcode.com/paper/metrics-for-evaluation-of-word-level-machine
Repo
Framework

Parallel Speech Corpora of Japanese Dialects

Title Parallel Speech Corpora of Japanese Dialects
Authors Koichiro Yoshino, Naoki Hirayama, Shinsuke Mori, Fumihiko Takahashi, Katsutoshi Itoyama, Hiroshi G. Okuno
Abstract
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1737/
PDF https://www.aclweb.org/anthology/L16-1737
PWC https://paperswithcode.com/paper/parallel-speech-corpora-of-japanese-dialects
Repo
Framework

Detection of Text Reuse in French Medical Corpora

Title Detection of Text Reuse in French Medical Corpora
Authors Eva D'hondt, Cyril Grouin, Aurélie Névéol, Efstathios Stamatatos, Pierre Zweigenbaum
Abstract Electronic Health Records (EHRs) are increasingly available in modern health care institutions either through the direct creation of electronic documents in hospitals' health information systems, or through the digitization of historical paper records. Each EHR creation method yields the need for sophisticated text reuse detection tools in order to prepare the EHR collections for efficient secondary use relying on Natural Language Processing methods. Herein, we address the detection of two types of text reuse in French EHRs: 1) the detection of updated versions of the same document and 2) the detection of document duplicates that still bear surface differences due to OCR or de-identification processing. We present a robust text reuse detection method to automatically identify redundant document pairs in two French EHR corpora that achieves an overall macro F-measure of 0.68 and 0.60, respectively and correctly identifies all redundant document pairs of interest.
Tasks Optical Character Recognition
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-5112/
PDF https://www.aclweb.org/anthology/W16-5112
PWC https://paperswithcode.com/paper/detection-of-text-reuse-in-french-medical
Repo
Framework

Learning to Generate Textual Data

Title Learning to Generate Textual Data
Authors Guillaume Bouchard, Pontus Stenetorp, Sebastian Riedel
Abstract
Tasks Recommendation Systems, Transfer Learning
Published 2016-11-01
URL https://www.aclweb.org/anthology/D16-1167/
PDF https://www.aclweb.org/anthology/D16-1167
PWC https://paperswithcode.com/paper/learning-to-generate-textual-data
Repo
Framework

Building a Corpus for Japanese Wikification with Fine-Grained Entity Classes

Title Building a Corpus for Japanese Wikification with Fine-Grained Entity Classes
Authors Davaajav Jargalsaikhan, Naoaki Okazaki, Koji Matsuda, Kentaro Inui
Abstract
Tasks Coreference Resolution, Entity Linking, Information Retrieval, Knowledge Base Population, Question Answering
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-3021/
PDF https://www.aclweb.org/anthology/P16-3021
PWC https://paperswithcode.com/paper/building-a-corpus-for-japanese-wikification
Repo
Framework

Suggestion Mining from Opinionated Text

Title Suggestion Mining from Opinionated Text
Authors Sapna Negi
Abstract
Tasks Opinion Mining, Sentence Classification
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-3018/
PDF https://www.aclweb.org/anthology/P16-3018
PWC https://paperswithcode.com/paper/suggestion-mining-from-opinionated-text
Repo
Framework

Inspire at SemEval-2016 Task 2: Interpretable Semantic Textual Similarity Alignment based on Answer Set Programming

Title Inspire at SemEval-2016 Task 2: Interpretable Semantic Textual Similarity Alignment based on Answer Set Programming
Authors Mishal Kazmi, Peter Schüller
Abstract
Tasks Chunking, Semantic Textual Similarity
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1171/
PDF https://www.aclweb.org/anthology/S16-1171
PWC https://paperswithcode.com/paper/inspire-at-semeval-2016-task-2-interpretable
Repo
Framework

Learning Additive Exponential Family Graphical Models via $\ell_{2,1}$-norm Regularized M-Estimation

Title Learning Additive Exponential Family Graphical Models via $\ell_{2,1}$-norm Regularized M-Estimation
Authors Xiaotong Yuan, Ping Li, Tong Zhang, Qingshan Liu, Guangcan Liu
Abstract We investigate a subclass of exponential family graphical models of which the sufficient statistics are defined by arbitrary additive forms. We propose two $\ell_{2,1}$-norm regularized maximum likelihood estimators to learn the model parameters from i.i.d. samples. The first one is a joint MLE estimator which estimates all the parameters simultaneously. The second one is a node-wise conditional MLE estimator which estimates the parameters for each node individually. For both estimators, statistical analysis shows that under mild conditions the extra flexibility gained by the additive exponential family models comes at almost no cost of statistical efficiency. A Monte-Carlo approximation method is developed to efficiently optimize the proposed estimators. The advantages of our estimators over Gaussian graphical models and Nonparanormal estimators are demonstrated on synthetic and real data sets.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6106-learning-additive-exponential-family-graphical-models-via-ell_21-norm-regularized-m-estimation
PDF http://papers.nips.cc/paper/6106-learning-additive-exponential-family-graphical-models-via-ell_21-norm-regularized-m-estimation.pdf
PWC https://paperswithcode.com/paper/learning-additive-exponential-family
Repo
Framework

TGB at SemEval-2016 Task 5: Multi-Lingual Constraint System for Aspect Based Sentiment Analysis

Title TGB at SemEval-2016 Task 5: Multi-Lingual Constraint System for Aspect Based Sentiment Analysis
Authors Fatih Samet Çetin, Ezgi Yıldırım, Can Özbey, Gülşen Eryiğit
Abstract
Tasks Aspect-Based Sentiment Analysis, Opinion Mining, Sentiment Analysis
Published 2016-06-01
URL https://www.aclweb.org/anthology/S16-1054/
PDF https://www.aclweb.org/anthology/S16-1054
PWC https://paperswithcode.com/paper/tgb-at-semeval-2016-task-5-multi-lingual
Repo
Framework