Paper Group NANR 202
Classifying ASR Transcriptions According to Arabic Dialect
Title | Classifying ASR Transcriptions According to Arabic Dialect |
Authors | Abualsoud Hanani, Aziz Qaroush, Stephen Taylor |
Abstract | We describe several systems for identifying short samples of Arabic dialects. The systems were prepared for the shared task of the 2016 DSL Workshop. Our best system, an SVM using character tri-gram features, achieved an accuracy on the test data for the task of 0.4279, compared to a baseline of 0.20 for chance guesses or 0.2279 if we had always chosen the same most frequent class in the test set. This compares with the results of the team with the best weighted F1 score, which was an accuracy of 0.5117. The team entries seem to fall into cohorts, with all the teams in a cohort within a standard-deviation of each other, and our three entries are in the third cohort, which is about seven standard deviations from the top. |
Tasks | Language Modelling |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4817/ |
PWC | https://paperswithcode.com/paper/classifying-asr-transcriptions-according-to |
Repo | |
Framework | |
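The abstract above names character tri-gram features as the input to the best-performing SVM. As a rough illustration only (not the authors' implementation), overlapping character n-grams can be counted as follows; the function name `char_ngrams` and the sample string are assumptions made for this sketch.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in `text` (hypothetical helper)."""
    # A string shorter than n yields an empty range, hence an empty Counter.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Illustrative input only; the shared task used transcribed Arabic speech.
features = char_ngrams("salaam")
```

Counts like these would typically be turned into a sparse vector per sample before being passed to a classifier.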
Correcting Errors in a Treebank Based on Tree Mining
Title | Correcting Errors in a Treebank Based on Tree Mining |
Authors | Kanta Suzuki, Yoshihide Kato, Shigeki Matsubara |
Abstract | This paper provides a new method to correct annotation errors in a treebank. The previous error correction method constructs a pseudo parallel corpus where incorrect partial parse trees are paired with correct ones, and extracts error correction rules from the parallel corpus. By applying these rules to a treebank, the method corrects errors. However, this method does not achieve wide coverage of error correction. To achieve wide coverage, our method adopts a different approach. In our method, we consider that an infrequent pattern which can be transformed to a frequent one is an annotation error pattern. Based on a tree mining technique, our method seeks such infrequent tree patterns, and constructs error correction rules each of which consists of an infrequent pattern and a corresponding frequent pattern. We conducted an experiment using the Penn Treebank. We obtained 1,987 rules which are not constructed by the previous method, and the rules achieved good precision. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1244/ |
PWC | https://paperswithcode.com/paper/correcting-errors-in-a-treebank-based-on-tree |
Repo | |
Framework | |
UTA DLNLP at SemEval-2016 Task 12: Deep Learning Based Natural Language Processing System for Clinical Information Identification from Clinical Notes and Pathology Reports
Title | UTA DLNLP at SemEval-2016 Task 12: Deep Learning Based Natural Language Processing System for Clinical Information Identification from Clinical Notes and Pathology Reports |
Authors | Peng Li, Heng Huang |
Abstract | |
Tasks | Information Retrieval, Language Modelling, Machine Translation, Named Entity Recognition, Paraphrase Identification, Question Answering, Representation Learning, Semantic Role Labeling |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1197/ |
PWC | https://paperswithcode.com/paper/uta-dlnlp-at-semeval-2016-task-12-deep |
Repo | |
Framework | |
Dealing with word-internal modification and spelling variation in data-driven lemmatization
Title | Dealing with word-internal modification and spelling variation in data-driven lemmatization |
Authors | Fabian Barteld, Ingrid Schröder, Heike Zinsmeister |
Abstract | |
Tasks | Information Retrieval, Lemmatization |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2106/ |
PWC | https://paperswithcode.com/paper/dealing-with-word-internal-modification-and |
Repo | |
Framework | |
EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis
Title | EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis |
Authors | David Vilares, Miguel A. Alonso, Carlos Gómez-Rodríguez |
Abstract | Code-switching texts are those that contain terms in two or more different languages, and they appear increasingly often in social media. The aim of this paper is to provide a resource to the research community to evaluate the performance of sentiment classification techniques on this complex multilingual environment, proposing an English-Spanish corpus of tweets with code-switching (EN-ES-CS CORPUS). The tweets are labeled according to two well-known criteria used for this purpose: SentiStrength and a trinary scale (positive, neutral and negative categories). Preliminary work on the resource is already done, providing a set of baselines for the research community. |
Tasks | Sentiment Analysis |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1655/ |
PWC | https://paperswithcode.com/paper/en-es-cs-an-english-spanish-code-switching |
Repo | |
Framework | |
WordForce: Visualizing Controversial Words in Debates
Title | WordForce: Visualizing Controversial Words in Debates |
Authors | Wei-Fan Chen, Fang-Yu Lin, Lun-Wei Ku |
Abstract | This paper presents WordForce, a system powered by the state of the art neural network model to visualize the learned user-dependent word embeddings from each post according to the post content and its engaged users. It generates the scatter plots to show the force of a word, i.e., whether the semantics of word embeddings from posts of different stances are clearly separated from the aspect of this controversial word. In addition, WordForce provides the dispersion and the distance of word embeddings from posts of different stance groups, and proposes the most controversial words accordingly to show clues to what people argue about in a debate. |
Tasks | Sentiment Analysis, Word Embeddings |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2057/ |
PWC | https://paperswithcode.com/paper/wordforce-visualizing-controversial-words-in |
Repo | |
Framework | |
Classifying Emotions in Customer Support Dialogues in Social Media
Title | Classifying Emotions in Customer Support Dialogues in Social Media |
Authors | Jonathan Herzig, Guy Feigenblat, Michal Shmueli-Scheuer, David Konopnicki, Anat Rafaeli, Daniel Altman, David Spivak |
Abstract | |
Tasks | |
Published | 2016-09-01 |
URL | https://www.aclweb.org/anthology/W16-3609/ |
PWC | https://paperswithcode.com/paper/classifying-emotions-in-customer-support |
Repo | |
Framework | |
Universal Dependencies for Persian
Title | Universal Dependencies for Persian |
Authors | Mojgan Seraji, Filip Ginter, Joakim Nivre |
Abstract | The Persian Universal Dependency Treebank (Persian UD) is a recent effort of treebanking Persian with Universal Dependencies (UD), an ongoing project that designs unified and cross-linguistically valid grammatical representations including part-of-speech tags, morphological features, and dependency relations. The Persian UD is the converted version of the Uppsala Persian Dependency Treebank (UPDT) to the universal dependencies framework and consists of nearly 6,000 sentences and 152,871 word tokens with an average sentence length of 25 words. In addition to the universal dependencies syntactic annotation guidelines, the two treebanks differ in tokenization. All words containing unsegmented clitics (pronominal and copula clitics) annotated with complex labels in the UPDT have been separated from the clitics and appear with distinct labels in the Persian UD. The treebank has its original syntactic annotation scheme based on Stanford Typed Dependencies. In this paper, we present the approaches taken in the development of the Persian UD. |
Tasks | Tokenization |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1374/ |
PWC | https://paperswithcode.com/paper/universal-dependencies-for-persian |
Repo | |
Framework | |
Retrieval Term Prediction Using Deep Learning Methods
Title | Retrieval Term Prediction Using Deep Learning Methods |
Authors | Qing Ma, Ibuki Tanigawa, Masaki Murata |
Abstract | |
Tasks | Chunking, Denoising, Information Retrieval, Machine Translation, Named Entity Recognition, Part-Of-Speech Tagging, Semantic Role Labeling, Speech Recognition |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/Y16-3001/ |
PWC | https://paperswithcode.com/paper/retrieval-term-prediction-using-deep-learning |
Repo | |
Framework | |
Empirical comparison of dependency conversions for RST discourse trees
Title | Empirical comparison of dependency conversions for RST discourse trees |
Authors | Katsuhiko Hayashi, Tsutomu Hirao, Masaaki Nagata |
Abstract | |
Tasks | Text Summarization |
Published | 2016-09-01 |
URL | https://www.aclweb.org/anthology/W16-3616/ |
PWC | https://paperswithcode.com/paper/empirical-comparison-of-dependency |
Repo | |
Framework | |
Extracting PDTB Discourse Relations from Student Essays
Title | Extracting PDTB Discourse Relations from Student Essays |
Authors | Kate Forbes-Riley, Fan Zhang, Diane Litman |
Abstract | |
Tasks | |
Published | 2016-09-01 |
URL | https://www.aclweb.org/anthology/W16-3615/ |
PWC | https://paperswithcode.com/paper/extracting-pdtb-discourse-relations-from |
Repo | |
Framework | |
Automatic Semantic Classification of German Preposition Types: Comparing Hard and Soft Clustering Approaches across Features
Title | Automatic Semantic Classification of German Preposition Types: Comparing Hard and Soft Clustering Approaches across Features |
Authors | Maximilian Köper, Sabine Schulte im Walde |
Abstract | |
Tasks | Machine Translation, Word Sense Disambiguation |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-2042/ |
PWC | https://paperswithcode.com/paper/automatic-semantic-classification-of-german |
Repo | |
Framework | |
Automatic parsing as an efficient pre-annotation tool for historical texts
Title | Automatic parsing as an efficient pre-annotation tool for historical texts |
Authors | Hanne Martine Eckhoff, Aleksandrs Berdičevskis |
Abstract | Historical treebanks tend to be manually annotated, which is not surprising, since state-of-the-art parsers are not accurate enough to ensure high-quality annotation for historical texts. We test whether automatic parsing can be an efficient pre-annotation tool for Old East Slavic texts. We use the TOROT treebank from the PROIEL treebank family. We convert the PROIEL format to the CONLL format and use MaltParser to create syntactic pre-annotation. Using the most conservative evaluation method, which takes into account PROIEL-specific features, MaltParser by itself yields 0.845 unlabelled attachment score, 0.779 labelled attachment score and 0.741 secondary dependency accuracy (note, though, that the test set comes from a relatively simple genre and contains rather short sentences). Experiments with human annotators show that preparsing, if limited to sentences where no changes to word or sentence boundaries are required, increases their annotation rate. For experienced annotators, the speed gain varies from 5.80% to 16.57%, for inexperienced annotators from 14.61% to 32.17% (using conservative estimates). There are no strong reliable differences in the annotation accuracy, which means that there is no reason to suspect that using preparsing might lower the final annotation quality. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4009/ |
PWC | https://paperswithcode.com/paper/automatic-parsing-as-an-efficient-pre |
Repo | |
Framework | |
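The abstract above reports unlabelled and labelled attachment scores. As a minimal sketch of how these two metrics are commonly computed (it does not reproduce the paper's conservative, PROIEL-specific evaluation or its secondary dependency accuracy), assume each token is represented by a `(head, label)` pair; the function name `attachment_scores` and the sample data are hypothetical.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for two parallel lists of (head, label) pairs."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal in length")
    # UAS: share of tokens whose predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS: share of tokens whose head and dependency label both match.
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / len(gold), full_hits / len(gold)


# Invented four-token example: one wrong label, one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "amod")]
uas, las = attachment_scores(gold, pred)
```

Because a token counts toward the labelled score only when its head is also correct, the labelled score can never exceed the unlabelled one, which matches the ordering of the two figures in the abstract.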
Fast and Easy Short Answer Grading with High Accuracy
Title | Fast and Easy Short Answer Grading with High Accuracy |
Authors | Md Arafat Sultan, Cristobal Salazar, Tamara Sumner |
Abstract | |
Tasks | Semantic Textual Similarity |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1123/ |
PWC | https://paperswithcode.com/paper/fast-and-easy-short-answer-grading-with-high |
Repo | |
Framework | |
Semantic classifications for detection of verb metaphors
Title | Semantic classifications for detection of verb metaphors |
Authors | Beata Beigman Klebanov, Chee Wee Leong, E. Dario Gutierrez, Ekaterina Shutova, Michael Flor |
Abstract | |
Tasks | Topic Models |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-2017/ |
PWC | https://paperswithcode.com/paper/semantic-classifications-for-detection-of |
Repo | |
Framework | |