Paper Group NANR 128
Improving POS Tagging in Old Spanish Using TEITOK. Persian-Spanish Low-Resource Statistical Machine Translation Through English as Pivot Language. The Labeled Segmentation of Printed Books. Detecting spelling variants in non-standard texts. Estonian Copular and Existential Constructions as an UD Annotation Problem. Churn Identification in Microblog …
Improving POS Tagging in Old Spanish Using TEITOK
Title | Improving POS Tagging in Old Spanish Using TEITOK |
Authors | Maarten Janssen, Josep Ausensi, Josep Fontana |
Abstract | |
Tasks | |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0502/ |
https://www.aclweb.org/anthology/W17-0502 | |
PWC | https://paperswithcode.com/paper/improving-pos-tagging-in-old-spanish-using |
Repo | |
Framework | |
Persian-Spanish Low-Resource Statistical Machine Translation Through English as Pivot Language
Title | Persian-Spanish Low-Resource Statistical Machine Translation Through English as Pivot Language |
Authors | Benyamin Ahmadnia, Javier Serrano, Gholamreza Haffari |
Abstract | This paper focuses exclusively on the pivot language technique, in which a bridging language is used to improve the quality of Persian-Spanish low-resource Statistical Machine Translation (SMT). Here English is the bridging language: the Persian-English SMT system is combined with the English-Spanish one, so that the relatively large corpora available for each pair can support the Persian-Spanish pairing. Our results indicate that the pivot language technique outperforms the direct SMT processes currently in use between Persian and Spanish. Furthermore, we investigate the sentence translation pivot strategy and the phrase translation pivot strategy in turn, and demonstrate that, in the context of the Persian-Spanish SMT system, phrase-level pivoting outperforms sentence-level pivoting. Finally, we suggest a method called the combination model, in which the standard direct model and the best triangulation pivoting model are blended in order to reach a high-quality translation. |
Tasks | Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1004/ |
https://doi.org/10.26615/978-954-452-049-6_004 | |
PWC | https://paperswithcode.com/paper/persian-spanish-low-resource-statistical |
Repo | |
Framework | |
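The phrase-level pivoting mentioned in the Persian-Spanish abstract above is commonly realized by triangulating two phrase tables through the shared pivot phrases. The following is a minimal sketch of that general idea, not the authors' implementation: the function name `triangulate`, the dictionary layout, and all probabilities are assumptions made for illustration.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Combine two phrase tables through shared pivot phrases.

    src_pivot[src][pivot] holds p(pivot | src).
    pivot_tgt[pivot][tgt] holds p(tgt | pivot).
    The result holds, for each src and tgt, the sum over pivot phrases
    of p(pivot | src) * p(tgt | pivot).
    """
    src_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt_given_pivot in pivot_tgt.get(pivot, {}).items():
                src_tgt[src][tgt] += p_pivot_given_src * p_tgt_given_pivot
    return {src: dict(tgts) for src, tgts in src_tgt.items()}


# Toy tables with invented probabilities (hypothetical, not from the paper).
fa_en = {"ketab": {"book": 0.9, "volume": 0.1}}
en_es = {
    "book": {"libro": 0.8, "reservar": 0.2},
    "volume": {"volumen": 0.7, "tomo": 0.3},
}
fa_es = triangulate(fa_en, en_es)
```

In this toy setup, `fa_es["ketab"]` gives "libro" the largest share of the probability mass, because the pivot phrase "book" dominates the Persian-English table.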
The Labeled Segmentation of Printed Books
Title | The Labeled Segmentation of Printed Books |
Authors | Lara McConnaughey, Jennifer Dai, David Bamman |
Abstract | We introduce the task of book structure labeling: segmenting and assigning a fixed category (such as Table of Contents, Preface, Index) to the document structure of printed books. We manually annotate the page-level structural categories for a large dataset totaling 294,816 pages in 1,055 books evenly sampled from 1750-1922, and present empirical results comparing the performance of several classes of models. The best-performing model, a bidirectional LSTM with rich features, achieves an overall accuracy of 95.8 and a class-balanced macro F-score of 71.4. |
Tasks | Optical Character Recognition |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1077/ |
https://www.aclweb.org/anthology/D17-1077 | |
PWC | https://paperswithcode.com/paper/the-labeled-segmentation-of-printed-books |
Repo | |
Framework | |
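The gap between the overall accuracy (95.8) and the class-balanced macro F-score (71.4) reported in the book segmentation abstract above is typical of data with a dominant class. The sketch below illustrates how the two metrics can diverge; the labels and counts are invented for illustration and are not taken from the paper's dataset.

```python
def accuracy(gold, pred):
    """Fraction of positions where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented toy data: 18 "body" pages, 2 "index" pages, and a model
# that always predicts the majority class.
gold = ["body"] * 18 + ["index"] * 2
pred = ["body"] * 20

acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
```

Here the majority-class predictor is right on 18 of 20 pages, yet its macro F1 is far lower because the rare "index" class contributes an F1 of zero to the average.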
Detecting spelling variants in non-standard texts
Title | Detecting spelling variants in non-standard texts |
Authors | Fabian Barteld |
Abstract | Spelling variation in non-standard language, e.g. computer-mediated communication and historical texts, is usually treated as a deviation from a standard spelling, e.g. 2mr as a non-standard spelling of tomorrow. Consequently, in normalization (the standard approach to dealing with spelling variation), so-called non-standard words are mapped to their corresponding standard words. However, there is not always a corresponding standard word. This can be the case for single types (like emoticons in computer-mediated communication) or for a complete language, e.g. texts from historical languages that did not develop a standard variety. The approach presented in this thesis proposal deals with spelling variation in the absence of reference to a standard. The task is to detect pairs of types that are variants of the same morphological word. An approach for spelling-variant detection is presented, where pairs of potential spelling variants are generated with Levenshtein distance and subsequently filtered by supervised machine learning. The approach is evaluated on historical Low German texts. Finally, further perspectives are discussed. |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-4002/ |
https://www.aclweb.org/anthology/E17-4002 | |
PWC | https://paperswithcode.com/paper/detecting-spelling-variants-in-non-standard |
Repo | |
Framework | |
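The first stage described in the spelling-variant abstract above, generating candidate pairs with Levenshtein distance, can be sketched as follows. This is an illustrative sketch only: the threshold `max_dist`, the function names, and the example word forms are assumptions, and the supervised filtering stage is not shown.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def candidate_pairs(types, max_dist=1):
    """Return all pairs of distinct types within max_dist edits of each other."""
    unique = sorted(set(types))
    return [(a, b) for a, b in combinations(unique, 2)
            if levenshtein(a, b) <= max_dist]


# Illustrative word forms (assumed for the example, not from the paper's corpus).
pairs = candidate_pairs(["unde", "vnde", "und", "got"])
```

Comparing all pairs is quadratic in the number of types, so a real corpus would likely need an index or blocking step; the sketch leaves that out for brevity.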
Estonian Copular and Existential Constructions as an UD Annotation Problem
Title | Estonian Copular and Existential Constructions as an UD Annotation Problem |
Authors | Kadri Muischnek, Kaili Müürisep |
Abstract | |
Tasks | |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0410/ |
https://www.aclweb.org/anthology/W17-0410 | |
PWC | https://paperswithcode.com/paper/estonian-copular-and-existential |
Repo | |
Framework | |
Churn Identification in Microblogs using Convolutional Neural Networks with Structured Logical Knowledge
Title | Churn Identification in Microblogs using Convolutional Neural Networks with Structured Logical Knowledge |
Authors | Mourad Gridach, Hatem Haddad, Hala Mulki |
Abstract | For brands, gaining a new customer is more expensive than keeping an existing one. Therefore, the ability to keep customers in a brand is becoming more challenging these days. Churn happens when a customer leaves a brand for a competitor. Most of the previous work considers the problem of churn prediction using Call Detail Records (CDRs). In this paper, we use micro-posts to classify customers as churny or non-churny. We explore the power of convolutional neural networks (CNNs), since they have achieved state-of-the-art results in various computer vision and NLP applications. However, end-to-end models have limitations, such as the need for a large amount of labeled data and their lack of interpretability. We investigate the use of CNNs augmented with structured logic rules to overcome or reduce these issues. We developed our system, called Churn_teacher, by using an iterative distillation method that transfers the knowledge, extracted using just the combination of three logic rules, directly into the weights of the DNNs. Furthermore, we used weight normalization to speed up the training of our convolutional neural networks. Experimental results showed that with just these three rules, we were able to obtain state-of-the-art results on a publicly available Twitter dataset about three Telecom brands. |
Tasks | Language Modelling, Machine Translation, Sentiment Analysis, Speech Recognition |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4403/ |
https://www.aclweb.org/anthology/W17-4403 | |
PWC | https://paperswithcode.com/paper/churn-identification-in-microblogs-using |
Repo | |
Framework | |
Bootstrapping for Numerical Open IE
Title | Bootstrapping for Numerical Open IE |
Authors | Swarnadeep Saha, Harinder Pal, Mausam |
Abstract | We design and release BONIE, the first open numerical relation extractor, for extracting Open IE tuples where one of the arguments is a number or a quantity-unit phrase. BONIE uses bootstrapping to learn the specific dependency patterns that express numerical relations in a sentence. BONIE's novelty lies in task-specific customizations, such as inferring implicit relations, which are clear due to context such as units (e.g., 'square kilometers' suggests area, even if the word 'area' is missing in the sentence). BONIE obtains 1.5x yield and 15 point precision gain on numerical facts over a state-of-the-art Open IE system. |
Tasks | Open Information Extraction |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-2050/ |
https://www.aclweb.org/anthology/P17-2050 | |
PWC | https://paperswithcode.com/paper/bootstrapping-for-numerical-open-ie |
Repo | |
Framework | |
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts
Title | Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts |
Authors | |
Abstract | |
Tasks | |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-5000/ |
https://www.aclweb.org/anthology/P17-5000 | |
PWC | https://paperswithcode.com/paper/proceedings-of-acl-2017-tutorial-abstracts |
Repo | |
Framework | |
Inter-Annotator Agreement in Sentiment Analysis: Machine Learning Perspective
Title | Inter-Annotator Agreement in Sentiment Analysis: Machine Learning Perspective |
Authors | Victoria Bobicev, Marina Sokolova |
Abstract | Manual text annotation is an essential part of Big Text analytics. Although annotators work with limited parts of data sets, their results are extrapolated by automated text classification and affect the final classification results. Reliability of annotations and adequacy of assigned labels are especially important in the case of sentiment annotations. In the current study we examine inter-annotator agreement in multi-class, multi-label sentiment annotation of messages. We used several annotation agreement measures, as well as statistical analysis and Machine Learning to assess the resulting annotations. |
Tasks | Sentiment Analysis, Text Classification |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1015/ |
https://doi.org/10.26615/978-954-452-049-6_015 | |
PWC | https://paperswithcode.com/paper/inter-annotator-agreement-in-sentiment |
Repo | |
Framework | |
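The inter-annotator agreement abstract above does not name the agreement measures used. As one widely used example, Cohen's kappa for two annotators can be sketched as below; the choice of measure, the function name, and the toy labels are assumptions, and this single-label form does not cover the multi-label setting the abstract mentions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who each assign one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented toy annotations (not from the paper).
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

With these toy labels the annotators agree on three of four items, but half of that agreement is expected by chance, so kappa is noticeably lower than the raw agreement rate.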
A Crowdsourcing Approach for Annotating Causal Relation Instances in Wikipedia
Title | A Crowdsourcing Approach for Annotating Causal Relation Instances in Wikipedia |
Authors | Kazuaki Hanawa, Akira Sasaki, Naoaki Okazaki, Kentaro Inui |
Abstract | |
Tasks | Named Entity Recognition, Question Answering, Stance Detection |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/Y17-1045/ |
https://www.aclweb.org/anthology/Y17-1045 | |
PWC | https://paperswithcode.com/paper/a-crowdsourcing-approach-for-annotating |
Repo | |
Framework | |
Extracting a Lexicon of Discourse Connectives in Czech from an Annotated Corpus
Title | Extracting a Lexicon of Discourse Connectives in Czech from an Annotated Corpus |
Authors | Pavlína Synková, Magdaléna Rysová, Lucie Poláková, Jiří Mírovský |
Abstract | |
Tasks | Machine Translation, Text Generation |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/Y17-1032/ |
https://www.aclweb.org/anthology/Y17-1032 | |
PWC | https://paperswithcode.com/paper/extracting-a-lexicon-of-discourse-connectives |
Repo | |
Framework | |
Standard and nonstandard lexicon in aviation English: A corpus linguistic study
Title | Standard and nonstandard lexicon in aviation English: A corpus linguistic study |
Authors | Ramsey Ferrer, Jollene Empinado, Eloisa Marie Calico, Jan Yharie Floro |
Abstract | |
Tasks | |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/Y17-1010/ |
https://www.aclweb.org/anthology/Y17-1010 | |
PWC | https://paperswithcode.com/paper/standard-and-nonstandard-lexicon-in-aviation |
Repo | |
Framework | |
Constructing narrative using a generative model and continuous action policies
Title | Constructing narrative using a generative model and continuous action policies |
Authors | Emmanouil Theofanis Chourdakis, Joshua Reiss |
Abstract | |
Tasks | Paraphrase Identification, Q-Learning, Text Generation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-3905/ |
https://www.aclweb.org/anthology/W17-3905 | |
PWC | https://paperswithcode.com/paper/constructing-narrative-using-a-generative |
Repo | |
Framework | |
On the order of Words in Italian: a Study on Genre vs Complexity
Title | On the order of Words in Italian: a Study on Genre vs Complexity |
Authors | Dominique Brunato, Felice Dell'Orletta |
Abstract | |
Tasks | Dependency Parsing |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-6505/ |
https://www.aclweb.org/anthology/W17-6505 | |
PWC | https://paperswithcode.com/paper/on-the-order-of-words-in-italian-a-study-on |
Repo | |
Framework | |
A Dictionary-Based Comparison of Autobiographies by People and Murderous Monsters
Title | A Dictionary-Based Comparison of Autobiographies by People and Murderous Monsters |
Authors | Micah Iserman, Molly Ireland |
Abstract | |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/papers/W17-3109/w17-3109 |
https://www.aclweb.org/anthology/W17-3109 | |
PWC | https://paperswithcode.com/paper/a-dictionary-based-comparison-of |
Repo | |
Framework | |