January 24, 2020


Paper Group NANR 155

The Strength of the Weakest Supervision: Topic Classification Using Class Labels. Detecting Clitics Related Orthographic Errors in Turkish. DIVINE: A Generative Adversarial Imitation Learning Framework for Knowledge Graph Reasoning. An Alternative Deep Feature Approach to Line Level Keyword Spotting. The TALP-UPC Machine Translation Systems for WMT …

The Strength of the Weakest Supervision: Topic Classification Using Class Labels


Title	The Strength of the Weakest Supervision: Topic Classification Using Class Labels
Authors	Jiatong Li, Kai Zheng, Hua Xu, Qiaozhu Mei, Yue Wang
Abstract	When developing topic classifiers for real-world applications, we begin by defining a set of meaningful topic labels. Ideally, an intelligent classifier can understand these labels right away and start classifying documents. Indeed, a human can confidently tell if an article is about science, politics, sports, or none of the above, after knowing just the class labels. We study the problem of training an initial topic classifier using only class labels. We investigate existing techniques for solving this problem and propose a simple but effective approach. Experiments on a variety of topic classification data sets show that learning from class labels can save significant initial labeling effort, essentially providing a "free" warm start to the topic classifier.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/N19-3004/
PDF	https://www.aclweb.org/anthology/N19-3004
PWC	https://paperswithcode.com/paper/the-strength-of-the-weakest-supervision-topic
Repo
Framework
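
One common way to get a classifier from class labels alone is to score each document by its similarity to the label names (often called dataless classification). The sketch below uses bag-of-words cosine similarity against short label descriptions; the descriptions and function names are hypothetical, and this illustrates the general idea only, not the approach proposed in the paper.

```python
import math
from collections import Counter


def bow(text):
    # Lowercased whitespace tokenization into a bag of words.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify_with_labels(doc, label_descriptions):
    """Return the label whose (hypothetical) description is most similar to doc."""
    d = bow(doc)
    scores = {label: cosine(d, bow(desc)) for label, desc in label_descriptions.items()}
    return max(scores, key=scores.get)


labels = {
    "science": "science research experiment",
    "politics": "politics election government",
    "sports": "sports game team",
}
print(classify_with_labels("The team won the game after extra time", labels))  # prints: sports
```

Predictions made this way could then act as initial training labels, i.e. a warm start, for an ordinary supervised classifier.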


Detecting Clitics Related Orthographic Errors in Turkish


Title	Detecting Clitics Related Orthographic Errors in Turkish
Authors	Ugurcan Arikan, Onur Gungor, Suzan Uskudarli
Abstract	For the spell correction task, vocabulary-based methods have been replaced with methods that take morphological and grammar rules into account. However, such tools are fairly immature and, worse, nonexistent for many low-resource languages. Checking only whether a word is well-formed with respect to the morphological rules of a language may produce false negatives due to the ambiguity resulting from the presence of numerous homophonic words. In this work, we propose an approach to detect and correct the "de/da" clitic errors in Turkish text. Our model is a neural sequence tagger trained with a synthetically constructed dataset consisting of positive and negative samples. The model's performance on this dataset is presented for different word embedding configurations. The model achieved an F1 score of 86.67% on the synthetically constructed dataset. We also evaluated the model on a manually curated dataset of challenging samples, where it outperformed other spelling correctors with 71% accuracy, compared to 34% for the second best (Google Docs).
Tasks
Published	2019-09-01
URL	https://www.aclweb.org/anthology/R19-1009/
PDF	https://www.aclweb.org/anthology/R19-1009
PWC	https://paperswithcode.com/paper/detecting-clitics-related-orthographic-errors
Repo
Framework
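
The abstract mentions a synthetically constructed dataset of positive and negative samples. A minimal sketch of one way such negatives could be produced, assuming tokenized, correctly written sentences: attach a standalone de/da clitic to the preceding word (an error of this kind turns "Ben de geldim" into "Bende geldim") and tag the corrupted token. The function and the tag names `ERR` and `O` are hypothetical; the paper's actual generation procedure and tag set may differ, and the reverse error (splitting off a locative suffix, as in "evde") is not covered here.

```python
# Standalone clitic forms (hypothetical minimal set for this sketch).
CLITICS = {"de", "da"}


def make_negative(tokens):
    """Attach the first standalone de/da clitic to the preceding word.

    Returns (corrupted_tokens, tags), where the merged token is tagged 'ERR'
    and every other token 'O', or None if the sentence has no clitic.
    """
    for i in range(1, len(tokens)):
        if tokens[i].lower() in CLITICS:
            merged = tokens[i - 1] + tokens[i]
            corrupted = tokens[: i - 1] + [merged] + tokens[i + 1 :]
            tags = ["O"] * len(corrupted)
            tags[i - 1] = "ERR"
            return corrupted, tags
    return None


positive = ["Ben", "de", "geldim"]
print(make_negative(positive))  # prints: (['Bende', 'geldim'], ['ERR', 'O'])
```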

DIVINE: A Generative Adversarial Imitation Learning Framework for Knowledge Graph Reasoning


Title	DIVINE: A Generative Adversarial Imitation Learning Framework for Knowledge Graph Reasoning
Authors	Ruiping Li, Xiang Cheng
Abstract	Knowledge graphs (KGs) often suffer from sparseness and incompleteness. Knowledge graph reasoning provides a feasible way to address such problems. Recent studies on knowledge graph reasoning have shown that reinforcement learning (RL) based methods can provide state-of-the-art performance. However, existing RL-based methods require numerous trials for path-finding and rely heavily on meticulous reward engineering to fit specific datasets, which is inefficient and laborious when applied to fast-evolving KGs. To this end, in this paper, we present DIVINE, a novel plug-and-play framework based on generative adversarial imitation learning for enhancing existing RL-based methods. DIVINE guides the path-finding process, and learns reasoning policies and reward functions self-adaptively through imitating the demonstrations automatically sampled from KGs. Experimental results on two benchmark datasets show that our framework improves the performance of existing RL-based methods while eliminating extra reward engineering.
Tasks	Imitation Learning, Knowledge Graphs
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-1266/
PDF	https://www.aclweb.org/anthology/D19-1266
PWC	https://paperswithcode.com/paper/divine-a-generative-adversarial-imitation
Repo
Framework

An Alternative Deep Feature Approach to Line Level Keyword Spotting


Title	An Alternative Deep Feature Approach to Line Level Keyword Spotting
Authors	George Retsinas, Georgios Louloudis, Nikolaos Stamatopoulos, Giorgos Sfikas, Basilis Gatos
Abstract	Keyword spotting (KWS) is defined as the problem of detecting all instances of a given word, provided by the user either as a query word image (Query-by-Example, QbE) or a query word string (Query-by-String, QbS) in a body of digitized documents. Keyword detection is typically preceded by a preprocessing step where the text is segmented into text lines (line-level KWS). Methods following this paradigm are dominated by handwritten text recognition (HTR)-based approaches, which are computationally expensive at test time; furthermore, they typically cannot handle image queries (QbE). In this work, we propose a time- and storage-efficient, deep feature-based approach that enables both image and textual search. Three distinct components, all modeled as neural networks, are combined: normalization, feature extraction, and representation of image and textual input in a common space. These components, although designed for word-level image representations, collaborate to achieve an efficient line-level keyword spotting system. The experimental results indicate that the proposed system is on par with state-of-the-art KWS methods.
Tasks	Keyword Spotting
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Retsinas_An_Alternative_Deep_Feature_Approach_to_Line_Level_Keyword_Spotting_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Retsinas_An_Alternative_Deep_Feature_Approach_to_Line_Level_Keyword_Spotting_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/an-alternative-deep-feature-approach-to-line
Repo
Framework

The TALP-UPC Machine Translation Systems for WMT19 News Translation Task: Pivoting Techniques for Low Resource MT


Title	The TALP-UPC Machine Translation Systems for WMT19 News Translation Task: Pivoting Techniques for Low Resource MT
Authors	Noe Casas, José A. R. Fonollosa, Carlos Escolano, Christine Basta, Marta R. Costa-jussà
Abstract	In this article, we describe the TALP-UPC research group participation in the WMT19 news translation shared task for Kazakh-English. Given the low amount of parallel training data, we resort to using Russian as pivot language, training subword-based statistical translation systems for Russian-Kazakh and Russian-English that were then used to create two synthetic pseudo-parallel corpora for Kazakh-English and English-Kazakh respectively. Finally, a self-attention model based on the decoder part of the Transformer architecture was trained on the two pseudo-parallel corpora.
Tasks	Machine Translation
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5311/
PDF	https://www.aclweb.org/anthology/W19-5311
PWC	https://paperswithcode.com/paper/the-talp-upc-machine-translation-systems-for-1
Repo
Framework
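
A minimal sketch of the pivoting step, assuming a corpus of (Russian, X) sentence pairs and some Russian-to-Y translation function: the Russian side is machine-translated, yielding synthetic (Y, X) pairs for training a Y-X system. `pivot_corpus` and the toy dictionary translator are hypothetical stand-ins; the paper used trained subword-based statistical systems, not a word lookup.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]


def pivot_corpus(ru_x_pairs: List[Pair], ru_to_y: Callable[[str], str]) -> List[Pair]:
    """Replace the Russian side of each pair with its translation into Y."""
    return [(ru_to_y(ru), x) for ru, x in ru_x_pairs]


# Toy Russian-to-Kazakh "translator" for illustration only.
toy_ru_kk = {"привет": "сәлем"}
ru_en = [("привет", "hello")]
kk_en = pivot_corpus(ru_en, lambda s: toy_ru_kk.get(s, s))
print(kk_en)  # prints: [('сәлем', 'hello')]
```

Keeping the human-written side as X and the synthetic side as Y means the real text can serve as the target side when the pseudo-parallel corpus is used to train a Y-to-X model.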

Quality and Coverage: The AFRL Submission to the WMT19 Parallel Corpus Filtering for Low-Resource Conditions Task


Title	Quality and Coverage: The AFRL Submission to the WMT19 Parallel Corpus Filtering for Low-Resource Conditions Task
Authors	Grant Erdmann, Jeremy Gwinnup
Abstract	The WMT19 Parallel Corpus Filtering for Low-Resource Conditions Task aims to test various methods of filtering noisy parallel corpora to make them useful for training machine translation systems. This year the noisy corpora are the relatively low-resource language pairs of Nepali-English and Sinhala-English. This paper describes the Air Force Research Laboratory (AFRL) submissions, including preprocessing methods and scoring metrics. Numerical results indicate a benefit over the baseline and show the relative benefits of different options.
Tasks	Machine Translation
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5436/
PDF	https://www.aclweb.org/anthology/W19-5436
PWC	https://paperswithcode.com/paper/quality-and-coverage-the-afrl-submission-to
Repo
Framework

Exploiting Edge Features for Graph Neural Networks


Title	Exploiting Edge Features for Graph Neural Networks
Authors	Liyu Gong, Qiang Cheng
Abstract	Edge features contain important information about graphs. However, current state-of-the-art neural network models designed for graph learning, e.g., graph convolutional networks (GCN) and graph attention networks (GAT), inadequately utilize edge features, especially multi-dimensional edge features. In this paper, we build a new framework for a family of new graph neural network models that can more fully exploit edge features, including those of undirected or multi-dimensional edges. The proposed framework can consolidate current graph neural network models, e.g., GCN and GAT. The proposed framework and new models have the following novelties: First, we propose to use doubly stochastic normalization of graph edge features instead of the row or symmetric normalization commonly used in current graph neural networks. Second, we construct new formulas for the operations in each individual layer so that they can handle multi-dimensional edge features. Third, for the proposed new framework, edge features are adaptive across network layers. As a result, our proposed new framework and new models are able to exploit a rich source of graph edge information. We apply our new models to graph node classification on several citation networks, whole graph classification, and regression on several molecular datasets. Compared with the current state-of-the-art methods, i.e., GCNs and GAT, our models obtain better performance, which testifies to the importance of exploiting edge features in graph neural networks.
Tasks	Graph Classification, Node Classification
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Gong_Exploiting_Edge_Features_for_Graph_Neural_Networks_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Gong_Exploiting_Edge_Features_for_Graph_Neural_Networks_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/exploiting-edge-features-for-graph-neural
Repo
Framework
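
A doubly stochastic matrix has every row and every column summing to 1. The paper defines its own normalization formula, which is not reproduced here; as a hedged illustration of the property itself, the standard Sinkhorn-Knopp iteration below alternately normalizes rows and columns of one channel of a non-negative edge-feature matrix. It assumes no all-zero rows or columns (for example, after adding self-loops).

```python
def sinkhorn(matrix, iters=100, tol=1e-9):
    """Sinkhorn-Knopp: rescale a positive square matrix toward doubly stochastic."""
    m = [list(map(float, row)) for row in matrix]
    n = len(m)
    for _ in range(iters):
        # Normalize each row to sum to 1.
        for i in range(n):
            s = sum(m[i])
            if s == 0:
                raise ValueError("all-zero row")
            m[i] = [v / s for v in m[i]]
        # Normalize each column to sum to 1.
        for j in range(n):
            s = sum(m[i][j] for i in range(n))
            if s == 0:
                raise ValueError("all-zero column")
            for i in range(n):
                m[i][j] /= s
        # Columns now sum to 1; stop once rows do too.
        if all(abs(sum(row) - 1.0) < tol for row in m):
            break
    return m


edges = [[1.0, 2.0], [3.0, 4.0]]  # one channel of a toy edge-feature matrix
print(sinkhorn(edges))
```

Unlike row normalization, which only makes outgoing weights sum to 1, this constrains incoming weights as well.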

Sleep stage classification from heart-rate variability using long short-term memory neural networks


Title	Sleep stage classification from heart-rate variability using long short-term memory neural networks
Authors	Mustafa Radha, Pedro Fonseca, Arnaud Moreau, Marco Ross, Andreas Cerny, Peter Anderer, Xi Long, Ronald M. Aarts
Abstract	Automated sleep stage classification using heart rate variability (HRV) may provide an ergonomic and low-cost alternative to gold standard polysomnography, creating possibilities for unobtrusive home-based sleep monitoring. Current methods, however, are limited in their ability to take into account long-term sleep architectural patterns. A long short-term memory (LSTM) network is proposed as a solution to model long-term cardiac sleep architecture information and validated on a comprehensive data set (292 participants, 584 nights, 541,214 annotated 30 s sleep segments) comprising a wide range of ages and pathological profiles, annotated according to the Rechtschaffen and Kales (R&K) annotation standard. It is shown that the model outperforms state-of-the-art approaches which were often limited to non-temporal or short-term recurrent classifiers. The model achieves a Cohen's κ of 0.61 ± 0.15 and accuracy of 77.00 ± 8.90% across the entire database. Further analysis revealed that the performance for individuals aged 50 years and older may decline. These results demonstrate the merit of deep temporal modelling using a diverse data set and advance the state-of-the-art for HRV-based sleep stage classification. Further research is warranted into individuals over the age of 50 as performance tends to worsen in this sub-population.
Tasks	Electrocardiography (ECG), Heart Rate Variability, Sleep Stage Detection
Published	2019-10-02
URL	https://doi.org/10.1038/s41598-019-49703-y
PDF	https://www.nature.com/articles/s41598-019-49703-y.pdf
PWC	https://paperswithcode.com/paper/sleep-stage-classification-from-heart-rate
Repo
Framework
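
The reported Cohen's κ corrects raw agreement for agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (accuracy) and p_e is the chance agreement computed from the two label distributions. A small sketch with hypothetical stage labels and toy data, not taken from the paper's dataset:

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa between reference and predicted label sequences."""
    n = len(y_true)
    # Observed agreement: fraction of epochs with identical labels.
    p_o = sum(a == b for a, b in zip(y_true, y_pred)) / n
    # Chance agreement: product of marginal label frequencies, summed over labels.
    ct, cp = Counter(y_true), Counter(y_pred)
    p_e = sum(ct[k] * cp[k] for k in ct) / (n * n)
    return (p_o - p_e) / (1 - p_e)


reference = ["Wake", "Wake", "Light", "Light", "REM", "REM"]
predicted = ["Wake", "Light", "Light", "Light", "REM", "Wake"]
print(cohens_kappa(reference, predicted))  # approximately 0.5
```

Here accuracy is 4/6 and chance agreement is 1/3, so κ is 0.5, noticeably lower than the raw accuracy.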

NLP Whack-A-Mole: Challenges in Cross-Domain Temporal Expression Extraction


Title	NLP Whack-A-Mole: Challenges in Cross-Domain Temporal Expression Extraction
Authors	Amy Olex, Luke Maffey, Bridget McInnes
Abstract	Incorporating domain knowledge is vital in building successful natural language processing (NLP) applications. Often, cross-domain application of a tool results in poor performance because the tool does not account for domain-specific attributes. The clinical domain is challenging in this respect due to specialized medical terms and nomenclature, shorthand notation, fragmented text, and a variety of writing styles used by different medical units. Temporal resolution is an NLP task that, in general, is domain-agnostic because temporal information is represented using a limited lexicon. However, domain-specific aspects of temporal resolution are present in clinical texts. Here we explore parsing issues that arose when running our system, a tool built on Newswire text, on clinical notes in the THYME corpus. Many parsing issues were straightforward to correct; however, a few code changes resulted in a cascading series of parsing errors that had to be resolved before an improvement in performance was observed, revealing the complexity of temporal resolution and rule-based parsing. Our system now outperforms current state-of-the-art systems on the THYME corpus with little change in its performance on Newswire texts.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/N19-1369/
PDF	https://www.aclweb.org/anthology/N19-1369
PWC	https://paperswithcode.com/paper/nlp-whack-a-mole-challenges-in-cross-domain
Repo
Framework

AI_Blues at FinSBD Shared Task: CRF-based Sentence Boundary Detection in PDF Noisy Text in the Financial Domain


Title	AI_Blues at FinSBD Shared Task: CRF-based Sentence Boundary Detection in PDF Noisy Text in the Financial Domain
Authors	Ditty Mathew, Chinnappa Guggilla
Abstract
Tasks	Boundary Detection
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5522/
PDF	https://www.aclweb.org/anthology/W19-5522
PWC	https://paperswithcode.com/paper/ai_blues-at-finsbd-shared-task-crf-based
Repo
Framework

Community lexical access for an endangered polysynthetic language: An electronic dictionary for St. Lawrence Island Yupik


Title	Community lexical access for an endangered polysynthetic language: An electronic dictionary for St. Lawrence Island Yupik
Authors	Benjamin Hunt, Emily Chen, Sylvia L.R. Schreiner, Lane Schwartz
Abstract	In this paper, we introduce a morphologically-aware electronic dictionary for St. Lawrence Island Yupik, an endangered language of the Bering Strait region. Implemented using HTML, JavaScript, and CSS, the dictionary is set in an uncluttered interface and permits users to search in Yupik or in English for Yupik root words and Yupik derivational suffixes. For each matching result, our electronic dictionary presents the user with the corresponding entry from the Badten (2008) Yupik-English paper dictionary. Because Yupik is a polysynthetic language, handling of multimorphemic word forms is critical. If a user searches for an inflected Yupik word form, we perform a morphological analysis and return entries for the root word and for any derivational suffixes present in the word. This electronic dictionary should serve as a valuable resource not only for all students and speakers of Yupik, but also for field linguists working towards documentation and conservation of the language.
Tasks	Morphological Analysis
Published	2019-06-01
URL	https://www.aclweb.org/anthology/N19-4021/
PDF	https://www.aclweb.org/anthology/N19-4021
PWC	https://paperswithcode.com/paper/community-lexical-access-for-an-endangered
Repo
Framework

Neural and Linear Pipeline Approaches to Cross-lingual Morphological Analysis


Title	Neural and Linear Pipeline Approaches to Cross-lingual Morphological Analysis
Authors	Çağrı Çöltekin, Jeremy Barnes
Abstract	This paper describes the Tübingen-Oslo team's participation in the cross-lingual morphological analysis task in the VarDial 2019 evaluation campaign. We participated in the shared task with a standard neural network model. Our model achieved analysis F1-scores of 31.48 and 23.67 on test languages Karachay-Balkar (Turkic) and Sardinian (Romance) respectively. The scores are comparable to the scores obtained by the other participants in both language families, and the analysis score on the Romance data set was also the best result obtained in the shared task. Besides describing the system used in our shared task participation, we describe another, simpler, model based on linear classifiers, and present further analyses using both models. Our analyses, besides revealing some of the difficult cases, also confirm that the usefulness of a source language in this task is highly correlated with the similarity of source and target languages.
Tasks	Morphological Analysis
Published	2019-06-01
URL	https://www.aclweb.org/anthology/W19-1416/
PDF	https://www.aclweb.org/anthology/W19-1416
PWC	https://paperswithcode.com/paper/neural-and-linear-pipeline-approaches-to
Repo
Framework

Measuring the perceptual availability of phonological features during language acquisition using unsupervised binary stochastic autoencoders


Title	Measuring the perceptual availability of phonological features during language acquisition using unsupervised binary stochastic autoencoders
Authors	Cory Shain, Micha Elsner
Abstract	In this paper, we deploy binary stochastic neural autoencoder networks as models of infant language learning in two typologically unrelated languages (Xitsonga and English). We show that the drive to model auditory percepts leads to latent clusters that partially align with theory-driven phonemic categories. We further evaluate the degree to which theory-driven phonological features are encoded in the latent bit patterns, finding that some (e.g. [±approximant]) are well represented by the network in both languages, while others (e.g. [±spread glottis]) are less so. Together, these findings suggest that many reliable cues to phonemic structure are immediately available to infants from bottom-up perceptual characteristics alone, but that these cues must eventually be supplemented by top-down lexical and phonotactic information to achieve adult-like phone discrimination. Our results also suggest differences in degree of perceptual availability between features, yielding testable predictions as to which features might depend more or less heavily on top-down cues during child language acquisition.
Tasks	Language Acquisition
Published	2019-06-01
URL	https://www.aclweb.org/anthology/N19-1007/
PDF	https://www.aclweb.org/anthology/N19-1007
PWC	https://paperswithcode.com/paper/measuring-the-perceptual-availability-of
Repo
Framework
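
In a binary stochastic layer, each latent unit outputs a bit rather than a real value, so every input maps to a bit pattern like the ones analyzed in the abstract. A minimal sketch of the forward sampling step, assuming each bit is drawn as 1 with probability sigmoid(a) for its pre-activation a; the function names are hypothetical, and training such units usually needs a gradient estimator (for example, straight-through), which is not shown.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_bits(pre_activations, rng):
    """Binary stochastic units: each bit is 1 with probability sigmoid(a)."""
    return [1 if rng.random() < sigmoid(a) else 0 for a in pre_activations]


rng = random.Random(0)
code = sample_bits([2.0, -2.0, 0.0, 5.0], rng)  # one sampled latent bit pattern
print(code)
```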

¿Es un plátano? Exploring the Application of a Physically Grounded Language Acquisition System to Spanish


Title	¿Es un plátano? Exploring the Application of a Physically Grounded Language Acquisition System to Spanish
Authors	Caroline Kery, Francis Ferraro, Cynthia Matuszek
Abstract	In this paper we describe a multilingual grounded language learning system adapted from an English-only system. This system learns the meaning of words used in crowd-sourced descriptions by grounding them in the physical representations of the objects they are describing. Our work presents a framework to compare the performance of the system when applied to a new language and to identify modifications necessary to attain equal performance, with the goal of enhancing the ability of robots to learn language from a more diverse range of people. We then demonstrate this system with Spanish, through first analyzing the performance of translated Spanish, and then extending this analysis to a new corpus of crowd-sourced Spanish language data. We find that with small modifications, the system is able to learn color, object, and shape words with comparable performance between languages.
Tasks	Language Acquisition
Published	2019-06-01
URL	https://www.aclweb.org/anthology/W19-1602/
PDF	https://www.aclweb.org/anthology/W19-1602
PWC	https://paperswithcode.com/paper/es-un-platano-exploring-the-application-of-a
Repo
Framework

The Role of Utterance Boundaries and Word Frequencies for Part-of-speech Learning in Brazilian Portuguese Through Distributional Analysis


Title	The Role of Utterance Boundaries and Word Frequencies for Part-of-speech Learning in Brazilian Portuguese Through Distributional Analysis
Authors	Pablo Picasso Feliciano de Faria
Abstract	In this study, we address the problem of part-of-speech (or syntactic category) learning during language acquisition through distributional analysis of utterances. A model based on Redington et al.'s (1998) distributional learner is used to investigate the informativeness of distributional information in Brazilian Portuguese (BP). The data provided to the learner come from two publicly available corpora of child-directed speech. We present preliminary results from two experiments. The first one investigates the effects of different assumptions about utterance boundaries when presenting the input data to the learner. The second experiment compares the learner's performance when counting contextual words' frequencies versus just acknowledging their co-occurrence with a given target word. In general, our results indicate that explicit boundaries are more informative, frequencies are important, and that distributional information is useful to the child as a source of categorial information. These results are in accordance with Redington et al.'s findings for English.
Tasks	Language Acquisition
Published	2019-06-01
URL	https://www.aclweb.org/anthology/W19-2917/
PDF	https://www.aclweb.org/anthology/W19-2917
PWC	https://paperswithcode.com/paper/the-role-of-utterance-boundaries-and-word
Repo
Framework
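
A minimal sketch of the two experimental contrasts, under simplifying assumptions: each target word gets a vector of counts of its immediate left and right context words, either as raw frequencies or as binary co-occurrence, and vectors are compared with cosine similarity. Redington et al.'s learner used a wider context window and a different similarity measure, so this is an illustration of the idea, not the model used in the paper; all names and toy utterances are hypothetical.

```python
import math
from collections import Counter


def context_vectors(utterances, targets, context_words, binary=False):
    """Map each target word to counts of its left/right neighbours.

    Each utterance is processed separately, so contexts never cross an
    utterance boundary; concatenating all utterances into one list would
    simulate input without explicit boundaries.
    """
    vecs = {t: Counter() for t in targets}
    for utt in utterances:
        for i, w in enumerate(utt):
            if w not in vecs:
                continue
            if i > 0 and utt[i - 1] in context_words:
                vecs[w][("L", utt[i - 1])] += 1
            if i + 1 < len(utt) and utt[i + 1] in context_words:
                vecs[w][("R", utt[i + 1])] += 1
    if binary:
        # Keep only whether each context occurred, not how often.
        vecs = {t: Counter({k: 1 for k in v}) for t, v in vecs.items()}
    return vecs


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


utts = [["the", "dog", "runs"], ["the", "dog", "runs"],
        ["the", "cat", "runs"], ["a", "dog", "sleeps"]]
ctx = {"the", "a", "runs", "sleeps"}
freq = context_vectors(utts, {"dog", "cat"}, ctx)
seen = context_vectors(utts, {"dog", "cat"}, ctx, binary=True)
print(cosine(freq["dog"], freq["cat"]), cosine(seen["dog"], seen["cat"]))
```

Words whose context vectors are highly similar (here "dog" and "cat") would be grouped into the same candidate category.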