Paper Group NANR 55
Proceedings of the Workshop Human-Informed Translation and Interpreting Technology
Title | Proceedings of the Workshop Human-Informed Translation and Interpreting Technology |
Authors | Irina Temnikova, Constantin Orasan, Gloria Corpas Pastor, Stephan Vogel |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/papers/W17-7900/w17-7900 |
https://www.aclweb.org/anthology/W17-7900 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-workshop-human-informed |
Repo | |
Framework | |
Word Re-Embedding via Manifold Dimensionality Retention
Title | Word Re-Embedding via Manifold Dimensionality Retention |
Authors | Souleiman Hasan, Edward Curry |
Abstract | Word embeddings seek to recover a Euclidean metric space by mapping words into vectors, starting from word co-occurrences in a corpus. Word embeddings may underestimate the similarity between nearby words, and overestimate it between distant words in the Euclidean metric space. In this paper, we re-embed pre-trained word embeddings with a stage of manifold learning which retains dimensionality. We show that this approach is theoretically founded in the metric recovery paradigm, and empirically show that it can improve on state-of-the-art embeddings in word similarity tasks by 0.5 to 5.0 percentage points, depending on the original space. |
Tasks | Machine Translation, Named Entity Recognition, Part-Of-Speech Tagging, Word Embeddings |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1033/ |
https://www.aclweb.org/anthology/D17-1033 | |
PWC | https://paperswithcode.com/paper/word-re-embedding-via-manifold-dimensionality |
Repo | |
Framework | |
Improving the Character Ngram Model for the DSL Task with BM25 Weighting and Less Frequently Used Feature Sets
Title | Improving the Character Ngram Model for the DSL Task with BM25 Weighting and Less Frequently Used Feature Sets |
Authors | Yves Bestgen |
Abstract | This paper describes the system developed by the Centre for English Corpus Linguistics (CECL) to discriminate between similar languages, language varieties and dialects. Based on an SVM with character and POS-tag n-grams as features and the BM25 weighting scheme, it achieved 92.7% accuracy in the Discriminating between Similar Languages (DSL) task, ranking first among eleven systems but with a lead over the next three teams of only 0.2%. A simpler version of the system ranked second in the German Dialect Identification (GDI) task thanks to several ad hoc postprocessing steps. Complementary analyses carried out by a cross-validation procedure suggest that the BM25 weighting scheme could be competitive in this type of task, at least in comparison with the sublinear TF-IDF. POS-tag n-grams also improved the system performance. |
Tasks | Language Identification |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1214/ |
https://www.aclweb.org/anthology/W17-1214 | |
PWC | https://paperswithcode.com/paper/improving-the-character-ngram-model-for-the |
Repo | |
Framework | |
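The abstract above names BM25 weighting of character n-grams but does not state which BM25 variant or parameters were used. The following is a minimal sketch under stated assumptions: the standard Okapi BM25 term weight with a smoothed, non-negative IDF, the conventional defaults `k1=1.2` and `b=0.75`, and hypothetical helper names (`char_ngrams`, `bm25_weights`). It is not the CECL system, only an illustration of how such a weighting could be computed before feeding features to a classifier.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bm25_weights(docs, n=3, k1=1.2, b=0.75):
    """Return one {ngram: BM25 weight} dict per document.

    k1 and b are conventional defaults, assumed here for illustration.
    """
    if not docs:
        return []
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    lengths = [sum(c.values()) for c in counts]
    # Fall back to 1.0 so that very short documents do not divide by zero.
    avg_len = sum(lengths) / len(docs) or 1.0
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for c in counts for g in c)
    total = len(docs)

    weighted = []
    for c, length in zip(counts, lengths):
        # Length normalisation term shared by all n-grams of this document.
        norm = k1 * (1 - b + b * length / avg_len)
        doc_weights = {}
        for g, tf in c.items():
            idf = math.log(1 + (total - df[g] + 0.5) / (df[g] + 0.5))
            doc_weights[g] = idf * tf * (k1 + 1) / (tf + norm)
        weighted.append(doc_weights)
    return weighted


if __name__ == "__main__":
    for doc_weights in bm25_weights(["abcabc", "abcxyz"], n=3):
        print(doc_weights)
```

In this toy input, the trigram `abc` occurs in both documents and therefore receives a lower weight than trigrams unique to one document, even where its raw count is higher. That saturation and rarity weighting is the behaviour that distinguishes BM25 from plain term counts.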
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Title | Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers) |
Authors | |
Abstract | |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-2000/ |
https://www.aclweb.org/anthology/I17-2000 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-eighth-international-joint-1 |
Repo | |
Framework | |
Towards Abstractive Multi-Document Summarization Using Submodular Function-Based Framework, Sentence Compression and Merging
Title | Towards Abstractive Multi-Document Summarization Using Submodular Function-Based Framework, Sentence Compression and Merging |
Authors | Yllias Chali, Moin Tanvee, Mir Tafseer Nayeem |
Abstract | We propose a submodular function-based summarization system which integrates three important measures, namely importance, coverage, and non-redundancy, to detect the important sentences for the summary. We design monotone and submodular functions which allow us to apply an efficient and scalable greedy algorithm to obtain informative and well-covered summaries. In addition, we integrate two abstraction-based methods, namely sentence compression and merging, for generating an abstractive sentence set. We design our summarization models for both generic and query-focused summarization. Experimental results on the DUC-2004 and DUC-2007 datasets show that our generic and query-focused summarizers have outperformed the state-of-the-art summarization systems in terms of ROUGE-1 and ROUGE-2 recall and F-measure. |
Tasks | Abstractive Text Summarization, Document Summarization, Multi-Document Summarization, Sentence Compression, Text Generation |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-2071/ |
https://www.aclweb.org/anthology/I17-2071 | |
PWC | https://paperswithcode.com/paper/towards-abstractive-multi-document |
Repo | |
Framework | |
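The abstract above relies on greedy maximization of a monotone submodular function. As a rough illustration of that general technique, the sketch below uses a deliberately simplified objective, the number of distinct words covered by the selected sentences, under a word budget. This objective, the whitespace tokenization, and the name `greedy_summary` are assumptions made for the example; the paper's own importance, coverage, and non-redundancy functions are not given in the abstract and are not reproduced here.

```python
def greedy_summary(sentences, budget):
    """Greedily select sentences that maximise word coverage within a word budget.

    Coverage (count of distinct covered words) is monotone and submodular,
    so each step simply adds the sentence with the largest marginal gain.
    """
    tokenized = [s.lower().split() for s in sentences]
    covered = set()   # words covered by the summary so far
    chosen = []       # indices of selected sentences
    used = 0          # words spent from the budget
    remaining = set(range(len(sentences)))

    while remaining:
        best, best_gain = None, 0
        for i in sorted(remaining):
            cost = len(tokenized[i])
            if used + cost > budget:
                continue  # sentence does not fit in the remaining budget
            gain = len(set(tokenized[i]) - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break  # nothing fits, or nothing adds new words
        chosen.append(best)
        covered |= set(tokenized[best])
        used += len(tokenized[best])
        remaining.discard(best)

    # Return the selection in original document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    docs = ["the cat sat", "the cat sat down", "dogs bark loudly"]
    print(greedy_summary(docs, budget=7))
```

With the three toy sentences and a budget of seven words, the first sentence is skipped because every word it contains is already covered by the second, which is how a coverage objective discourages redundancy without a separate penalty term.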
A corpus-based study on synesthesia in Korean ordinary language
Title | A corpus-based study on synesthesia in Korean ordinary language |
Authors | Charmhun Jo |
Abstract | |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/Y17-1034/ |
https://www.aclweb.org/anthology/Y17-1034 | |
PWC | https://paperswithcode.com/paper/a-corpus-based-study-on-synesthesia-in-korean |
Repo | |
Framework | |
Intrusions of Masbate Lexicon in Local Bilingual Tabloid
Title | Intrusions of Masbate Lexicon in Local Bilingual Tabloid |
Authors | Cecilia Genuino, Romualdo Mabuan |
Abstract | |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/Y17-1035/ |
https://www.aclweb.org/anthology/Y17-1035 | |
PWC | https://paperswithcode.com/paper/intrusions-of-masbate-lexicon-in-local |
Repo | |
Framework | |
Tweet Extraction for News Production Considering Unreality
Title | Tweet Extraction for News Production Considering Unreality |
Authors | Yuka Takei, Taro Miyazaki, Ichiro Yamada, Jun Goto |
Abstract | |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/Y17-1049/ |
https://www.aclweb.org/anthology/Y17-1049 | |
PWC | https://paperswithcode.com/paper/tweet-extraction-for-news-production |
Repo | |
Framework | |
A Dataset and Classifier for Recognizing Social Media English
Title | A Dataset and Classifier for Recognizing Social Media English |
Authors | Su Lin Blodgett, Johnny Wei, Brendan O'Connor |
Abstract | While language identification works well on standard texts, it performs much worse on social media language, in particular dialectal language, even for English. First, to support work on English language identification, we contribute a new dataset of tweets annotated for English versus non-English, with attention to ambiguity, code-switching, and automatic generation issues. It is randomly sampled from all public messages, avoiding biases towards pre-existing language classifiers. Second, we find that a demographic language model (which identifies messages with language similar to that used by several U.S. ethnic populations on Twitter) can be used to improve English language identification performance when combined with a traditional supervised language identifier. It increases recall with almost no loss of precision, including, surprisingly, for English messages written by non-U.S. authors. Our dataset and identifier ensemble are available online. |
Tasks | Language Identification, Language Modelling |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4408/ |
https://www.aclweb.org/anthology/W17-4408 | |
PWC | https://paperswithcode.com/paper/a-dataset-and-classifier-for-recognizing |
Repo | |
Framework | |
完全基於類神經網路之語音合成系統初步研究 (A Preliminary Study on Fully Neural Network-based Speech Synthesis System) [In Chinese]
Title | 完全基於類神經網路之語音合成系統初步研究 (A Preliminary Study on Fully Neural Network-based Speech Synthesis System) [In Chinese] |
Authors | Shu-Han Liao, Ya-Bo Chai, Yuan-Fu Liao |
Abstract | |
Tasks | Speech Synthesis |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/O17-1021/ |
https://www.aclweb.org/anthology/O17-1021 | |
PWC | https://paperswithcode.com/paper/aa-ao14eccc2e-a1eae3ac3cac-c-a-preliminary |
Repo | |
Framework | |
Automatic Morpheme Segmentation and Labeling in Universal Dependencies Resources
Title | Automatic Morpheme Segmentation and Labeling in Universal Dependencies Resources |
Authors | Miikka Silfverberg, Mans Hulden |
Abstract | |
Tasks | Semantic Textual Similarity, Word Embeddings |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0418/ |
https://www.aclweb.org/anthology/W17-0418 | |
PWC | https://paperswithcode.com/paper/automatic-morpheme-segmentation-and-labeling |
Repo | |
Framework | |
Role-Preserving Redaction of Medical Records to Enable Ontology-Driven Processing
Title | Role-Preserving Redaction of Medical Records to Enable Ontology-Driven Processing |
Authors | Seth Polsley, Atif Tahir, Muppala Raju, Akintayo Akinleye, Duane Steward |
Abstract | Electronic medical records (EMR) have largely replaced hand-written patient files in healthcare. The growing pool of EMR data presents a significant resource in medical research, but the U.S. Health Insurance Portability and Accountability Act (HIPAA) mandates redacting medical records before performing any analysis on the same. This process complicates obtaining medical data and can remove much useful information from the record. As part of a larger project involving ontology-driven medical processing, we employ a method of recognizing protected health information (PHI) that maps to ontological terms. We then use the relationships defined in the ontology to redact medical texts so that roles and semantics of terms are retained without compromising anonymity. The method is evaluated by clinical experts on several hundred medical documents, achieving up to a 98.8% F-score, and has already shown promise for retaining semantic information in later processing. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2324/ |
https://www.aclweb.org/anthology/W17-2324 | |
PWC | https://paperswithcode.com/paper/role-preserving-redaction-of-medical-records |
Repo | |
Framework | |
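The idea of redacting text while keeping the role of each removed term can be illustrated with a very small sketch. Everything in it is hypothetical: the lexicon entries, the role labels, and the function name `redact` are invented for the example, and a flat dictionary lookup stands in for the ontology-based recognition that the abstract describes but does not specify.

```python
import re

# Hypothetical toy lexicon mapping PHI surface forms to role labels.
# A real system would derive these from an ontology, not a hand-written dict.
ROLE_LEXICON = {
    "Dr. Alice Smith": "PHYSICIAN",
    "John Doe": "PATIENT",
    "Mercy Hospital": "FACILITY",
}


def redact(text, lexicon=ROLE_LEXICON):
    """Replace each known PHI string with a bracketed role label."""
    if not lexicon:
        return text
    # Try longer entries first so overlapping entries prefer the longest match.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: "[" + lexicon[m.group(0)] + "]", text)


if __name__ == "__main__":
    note = "John Doe was seen by Dr. Alice Smith at Mercy Hospital."
    print(redact(note))
```

The redacted sentence no longer names anyone, yet a downstream component can still tell that a patient was seen by a physician at a facility, which is the kind of semantic information the abstract says plain redaction tends to remove.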
Deep Neural Network based system for solving Arithmetic Word problems
Title | Deep Neural Network based system for solving Arithmetic Word problems |
Authors | Purvanshi Mehta, Pruthwik Mishra, Vinayak Athavale, Manish Shrivastava, Dipti Sharma |
Abstract | This paper presents DILTON, a system which solves simple arithmetic word problems. DILTON uses a deep neural network-based model to solve math word problems. DILTON divides the question into two parts: worldstate and query. The worldstate and the query are processed separately in two different networks and, finally, the networks are merged to predict the final operation. We report the first deep learning approach for the prediction of operation between two numbers. DILTON learns to predict operations with 88.81% accuracy in a corpus of primary school questions. |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-3017/ |
https://www.aclweb.org/anthology/I17-3017 | |
PWC | https://paperswithcode.com/paper/deep-neural-network-based-system-for-solving |
Repo | |
Framework | |
Non-parametric Structured Output Networks
Title | Non-parametric Structured Output Networks |
Authors | Andreas Lehrmann, Leonid Sigal |
Abstract | Deep neural networks (DNNs) and probabilistic graphical models (PGMs) are the two main tools for statistical modeling. While DNNs provide the ability to model rich and complex relationships between input and output variables, PGMs provide the ability to encode dependencies among the output variables themselves. End-to-end training methods for models with structured graphical dependencies on top of neural predictions have recently emerged as a principled way of combining these two paradigms. While these models have proven to be powerful in discriminative settings with discrete outputs, extensions to structured continuous spaces, as well as performing efficient inference in these spaces, are lacking. We propose non-parametric structured output networks (NSON), a modular approach that cleanly separates a non-parametric, structured posterior representation from a discriminative inference scheme but allows joint end-to-end training of both components. Our experiments evaluate the ability of NSONs to capture structured posterior densities (modeling) and to compute complex statistics of those densities (inference). We compare our model to output spaces of varying expressiveness and popular variational and sampling-based inference algorithms. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/7009-non-parametric-structured-output-networks |
http://papers.nips.cc/paper/7009-non-parametric-structured-output-networks.pdf | |
PWC | https://paperswithcode.com/paper/non-parametric-structured-output-networks |
Repo | |
Framework | |
Author-aware Aspect Topic Sentiment Model to Retrieve Supporting Opinions from Reviews
Title | Author-aware Aspect Topic Sentiment Model to Retrieve Supporting Opinions from Reviews |
Authors | Lahari Poddar, Wynne Hsu, Mong Li Lee |
Abstract | User-generated content about products and services in the form of reviews is often diverse and even contradictory. This makes it difficult for users to know if an opinion in a review is prevalent or biased. We study the problem of searching for supporting opinions in the context of reviews. We propose a framework called SURF that first identifies opinions expressed in a review, and then finds similar opinions from other reviews. We design a novel probabilistic graphical model that captures opinions as a combination of aspect, topic and sentiment dimensions, takes into account the preferences of individual authors, as well as the quality of the entity under review, and encodes the flow of thoughts in a review by constraining the aspect distribution dynamically among successive review segments. We derive a similarity measure that considers both lexical and semantic similarity to find supporting opinions. Experiments on TripAdvisor hotel reviews and Yelp restaurant reviews show that our model outperforms existing methods for modeling opinions, and the proposed framework is effective in finding supporting opinions. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1049/ |
https://www.aclweb.org/anthology/D17-1049 | |
PWC | https://paperswithcode.com/paper/author-aware-aspect-topic-sentiment-model-to |
Repo | |
Framework | |