May 5, 2019


Paper Group NANR 9


Top a Splitter: Using Distributional Semantics for Improving Compound Splitting


Title	Top a Splitter: Using Distributional Semantics for Improving Compound Splitting
Authors	Patrick Ziering, Stefan Müller, Lonneke van der Plas
Abstract
Tasks	Machine Translation, Semantic Textual Similarity
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-1807/
PDF	https://www.aclweb.org/anthology/W16-1807
PWC	https://paperswithcode.com/paper/top-a-splitter-using-distributional-semantics
Repo
Framework

Creating Resources for Dialectal Arabic from a Single Annotation: A Case Study on Egyptian and Levantine


Title	Creating Resources for Dialectal Arabic from a Single Annotation: A Case Study on Egyptian and Levantine
Authors	Ramy Eskander, Nizar Habash, Owen Rambow, Arfath Pasha
Abstract	Arabic dialects present a special problem for natural language processing because there are few resources, they have no standard orthography, and have not been studied much. However, as more and more written dialectal Arabic is found in social media, NLP for Arabic dialects becomes an important goal. We present a methodology for creating a morphological analyzer and a morphological tagger for dialectal Arabic, and we illustrate it on Egyptian and Levantine Arabic. To our knowledge, these are the first analyzer and tagger for Levantine.
Tasks	Morphological Analysis
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1326/
PDF	https://www.aclweb.org/anthology/C16-1326
PWC	https://paperswithcode.com/paper/creating-resources-for-dialectal-arabic-from
Repo
Framework

Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art


Title	Lemmatization and Morphological Tagging in German and Latin: A Comparison and a Survey of the State-of-the-art
Authors	Steffen Eger, Rüdiger Gleim, Alexander Mehler
Abstract	This paper relates to the challenge of morphological tagging and lemmatization in morphologically rich languages by example of German and Latin. We focus on the question what a practitioner can expect when using state-of-the-art solutions out of the box. Moreover, we contrast these with old(er) methods and implementations for POS tagging. We examine to what degree recent efforts in tagger development are reflected by improved accuracies, and at what cost, in terms of training and processing time. We also conduct in-domain vs. out-domain evaluation. Out-domain evaluations are particularly insightful because the distribution of the data which is being tagged by a user will typically differ from the distribution on which the tagger has been trained. Furthermore, two lemmatization techniques are evaluated. Finally, we compare pipeline tagging vs. a tagging approach that acknowledges dependencies between inflectional categories.
Tasks	Lemmatization, Morphological Tagging
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1239/
PDF	https://www.aclweb.org/anthology/L16-1239
PWC	https://paperswithcode.com/paper/lemmatization-and-morphological-tagging-in
Repo
Framework

Operational Assessment of Keyword Search on Oral History


Title	Operational Assessment of Keyword Search on Oral History
Authors	Elizabeth Salesky, Jessica Ray, Wade Shen
Abstract	This project assesses the resources necessary to make oral history searchable by means of automatic speech recognition (ASR). There are many inherent challenges in applying ASR to conversational speech: smaller training set sizes and varying demographics, among others. We assess the impact of dataset size, word error rate and term-weighted value on human search capability through an information retrieval task on Mechanical Turk. We use English oral history data collected by StoryCorps, a national organization that provides all people with the opportunity to record, share and preserve their stories, and control for a variety of demographics including age, gender, birthplace, and dialect on four different training set sizes. We show comparable search performance using a standard speech recognition system as with hand-transcribed data, which is promising for increased accessibility of conversational speech and oral history archives.
Tasks	Information Retrieval, Speech Recognition
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1049/
PDF	https://www.aclweb.org/anthology/L16-1049
PWC	https://paperswithcode.com/paper/operational-assessment-of-keyword-search-on
Repo
Framework

Data, tools and resources for mining social media drug chatter


Title	Data, tools and resources for mining social media drug chatter
Authors	Abeed Sarker, Graciela Gonzalez
Abstract	Social media has emerged into a crucial resource for obtaining population-based signals for various public health monitoring and surveillance tasks, such as pharmacovigilance. There is an abundance of knowledge hidden within social media data, and the volume is growing. Drug-related chatter on social media can include user-generated information that can provide insights into public health problems such as abuse, adverse reactions, long-term effects, and multi-drug interactions. Our objective in this paper is to present to the biomedical natural language processing, data science, and public health communities data sets (annotated and unannotated), tools and resources that we have collected and created from social media. The data we present was collected from Twitter using the generic and brand names of drugs as keywords, along with their common misspellings. Following the collection of the data, annotation guidelines were created over several iterations, which detail important aspects of social media data annotation and can be used by future researchers for developing similar data sets. The annotation guidelines were followed to prepare data sets for text classification, information extraction and normalization. In this paper, we discuss the preparation of these guidelines, outline the data sets prepared, and present an overview of our state-of-the-art systems for data collection, supervised classification, and information extraction. In addition to the development of supervised systems for classification and extraction, we developed and released unlabeled data and language models. We discuss the potential uses of these language models in data mining and the large volumes of unlabeled data from which they were generated. We believe that the summaries and repositories we present here of our data, annotation guidelines, models, and tools will be beneficial to the research community as a single-point entry for all these resources, and will promote further research in this area.
Tasks	Epidemiology, Text Classification
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-5111/
PDF	https://www.aclweb.org/anthology/W16-5111
PWC	https://paperswithcode.com/paper/data-tools-and-resources-for-mining-social
Repo
Framework

Crowdsourcing Salient Information from News and Tweets


Title	Crowdsourcing Salient Information from News and Tweets
Authors	Oana Inel, Tommaso Caselli, Lora Aroyo
Abstract	The increasing streams of information pose challenges to both humans and machines. On the one hand, humans need to identify relevant information and consume only the information that lies at their interests. On the other hand, machines need to understand the information that is published in online data streams and generate concise and meaningful overviews. We consider events as prime factors to query for information and generate meaningful context. The focus of this paper is to acquire empirical insights for identifying salience features in tweets and news about a target event, i.e., the event of "whaling". We first derive a methodology to identify such features by building up a knowledge space of the event enriched with relevant phrases, sentiments and ranked by their novelty. We applied this methodology on tweets and we have performed preliminary work towards adapting it to news articles. Our results show that crowdsourcing text relevance, sentiments and novelty (1) can be a main step in identifying salient information, and (2) provides a deeper and more precise understanding of the data at hand compared to state-of-the-art approaches.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1625/
PDF	https://www.aclweb.org/anthology/L16-1625
PWC	https://paperswithcode.com/paper/crowdsourcing-salient-information-from-news
Repo
Framework

Metrics for Evaluation of Word-level Machine Translation Quality Estimation


Title	Metrics for Evaluation of Word-level Machine Translation Quality Estimation
Authors	Varvara Logacheva, Michal Lukasik, Lucia Specia
Abstract
Tasks	Machine Translation
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-2095/
PDF	https://www.aclweb.org/anthology/P16-2095
PWC	https://paperswithcode.com/paper/metrics-for-evaluation-of-word-level-machine
Repo
Framework

Parallel Speech Corpora of Japanese Dialects


Title	Parallel Speech Corpora of Japanese Dialects
Authors	Koichiro Yoshino, Naoki Hirayama, Shinsuke Mori, Fumihiko Takahashi, Katsutoshi Itoyama, Hiroshi G. Okuno
Abstract
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1737/
PDF	https://www.aclweb.org/anthology/L16-1737
PWC	https://paperswithcode.com/paper/parallel-speech-corpora-of-japanese-dialects
Repo
Framework

Detection of Text Reuse in French Medical Corpora


Title	Detection of Text Reuse in French Medical Corpora
Authors	Eva D'hondt, Cyril Grouin, Aurélie Névéol, Efstathios Stamatatos, Pierre Zweigenbaum
Abstract	Electronic Health Records (EHRs) are increasingly available in modern health care institutions either through the direct creation of electronic documents in hospitals' health information systems, or through the digitization of historical paper records. Each EHR creation method yields the need for sophisticated text reuse detection tools in order to prepare the EHR collections for efficient secondary use relying on Natural Language Processing methods. Herein, we address the detection of two types of text reuse in French EHRs: 1) the detection of updated versions of the same document and 2) the detection of document duplicates that still bear surface differences due to OCR or de-identification processing. We present a robust text reuse detection method to automatically identify redundant document pairs in two French EHR corpora that achieves an overall macro F-measure of 0.68 and 0.60, respectively and correctly identifies all redundant document pairs of interest.
Tasks	Optical Character Recognition
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-5112/
PDF	https://www.aclweb.org/anthology/W16-5112
PWC	https://paperswithcode.com/paper/detection-of-text-reuse-in-french-medical
Repo
Framework

Learning to Generate Textual Data


Title	Learning to Generate Textual Data
Authors	Guillaume Bouchard, Pontus Stenetorp, Sebastian Riedel
Abstract
Tasks	Recommendation Systems, Transfer Learning
Published	2016-11-01
URL	https://www.aclweb.org/anthology/D16-1167/
PDF	https://www.aclweb.org/anthology/D16-1167
PWC	https://paperswithcode.com/paper/learning-to-generate-textual-data
Repo
Framework

Building a Corpus for Japanese Wikification with Fine-Grained Entity Classes


Title	Building a Corpus for Japanese Wikification with Fine-Grained Entity Classes
Authors	Davaajav Jargalsaikhan, Naoaki Okazaki, Koji Matsuda, Kentaro Inui
Abstract
Tasks	Coreference Resolution, Entity Linking, Information Retrieval, Knowledge Base Population, Question Answering
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-3021/
PDF	https://www.aclweb.org/anthology/P16-3021
PWC	https://paperswithcode.com/paper/building-a-corpus-for-japanese-wikification
Repo
Framework

Suggestion Mining from Opinionated Text


Title	Suggestion Mining from Opinionated Text
Authors	Sapna Negi
Abstract
Tasks	Opinion Mining, Sentence Classification
Published	2016-08-01
URL	https://www.aclweb.org/anthology/P16-3018/
PDF	https://www.aclweb.org/anthology/P16-3018
PWC	https://paperswithcode.com/paper/suggestion-mining-from-opinionated-text
Repo
Framework

Inspire at SemEval-2016 Task 2: Interpretable Semantic Textual Similarity Alignment based on Answer Set Programming


Title	Inspire at SemEval-2016 Task 2: Interpretable Semantic Textual Similarity Alignment based on Answer Set Programming
Authors	Mishal Kazmi, Peter Schüller
Abstract
Tasks	Chunking, Semantic Textual Similarity
Published	2016-06-01
URL	https://www.aclweb.org/anthology/S16-1171/
PDF	https://www.aclweb.org/anthology/S16-1171
PWC	https://paperswithcode.com/paper/inspire-at-semeval-2016-task-2-interpretable
Repo
Framework

Learning Additive Exponential Family Graphical Models via $\ell_{2,1}$-norm Regularized M-Estimation


Title	Learning Additive Exponential Family Graphical Models via $\ell_{2,1}$-norm Regularized M-Estimation
Authors	Xiaotong Yuan, Ping Li, Tong Zhang, Qingshan Liu, Guangcan Liu
Abstract	We investigate a subclass of exponential family graphical models of which the sufficient statistics are defined by arbitrary additive forms. We propose two $\ell_{2,1}$-norm regularized maximum likelihood estimators to learn the model parameters from i.i.d. samples. The first one is a joint MLE estimator which estimates all the parameters simultaneously. The second one is a node-wise conditional MLE estimator which estimates the parameters for each node individually. For both estimators, statistical analysis shows that under mild conditions the extra flexibility gained by the additive exponential family models comes at almost no cost of statistical efficiency. A Monte-Carlo approximation method is developed to efficiently optimize the proposed estimators. The advantages of our estimators over Gaussian graphical models and Nonparanormal estimators are demonstrated on synthetic and real data sets.
Tasks
Published	2016-12-01
URL	http://papers.nips.cc/paper/6106-learning-additive-exponential-family-graphical-models-via-ell_21-norm-regularized-m-estimation
PDF	http://papers.nips.cc/paper/6106-learning-additive-exponential-family-graphical-models-via-ell_21-norm-regularized-m-estimation.pdf
PWC	https://paperswithcode.com/paper/learning-additive-exponential-family
Repo
Framework

TGB at SemEval-2016 Task 5: Multi-Lingual Constraint System for Aspect Based Sentiment Analysis


Title	TGB at SemEval-2016 Task 5: Multi-Lingual Constraint System for Aspect Based Sentiment Analysis
Authors	Fatih Samet Çetin, Ezgi Yıldırım, Can Özbey, Gülşen Eryiğit
Abstract
Tasks	Aspect-Based Sentiment Analysis, Opinion Mining, Sentiment Analysis
Published	2016-06-01
URL	https://www.aclweb.org/anthology/S16-1054/
PDF	https://www.aclweb.org/anthology/S16-1054
PWC	https://paperswithcode.com/paper/tgb-at-semeval-2016-task-5-multi-lingual
Repo
Framework