July 26, 2019

2082 words 10 mins read

Paper Group NANR 111

A Preliminary Study of Croatian Lexical Substitution

Title A Preliminary Study of Croatian Lexical Substitution
Authors Domagoj Alagić, Jan Šnajder
Abstract Lexical substitution is the task of determining a meaning-preserving replacement for a word in context. We report on a preliminary study of this task for the Croatian language on a small-scale lexical sample dataset, manually annotated using three different annotation schemes. We compare the annotations, analyze the inter-annotator agreement, and observe a number of interesting language-specific details in the obtained lexical substitutes. Furthermore, we apply a recently proposed, dependency-based lexical substitution model to our dataset. The model achieves a P@3 score of 0.35, which indicates the difficulty of the task.
Tasks Information Retrieval, Machine Translation, Word Sense Disambiguation
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1403/
PDF https://www.aclweb.org/anthology/W17-1403
PWC https://paperswithcode.com/paper/a-preliminary-study-of-croatian-lexical
Repo
Framework
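
The P@3 figure quoted in the abstract is precision-at-3 over ranked substitute lists. A minimal sketch of the metric (function and variable names are mine, not the paper's):

```python
def precision_at_k(predicted, gold, k=3):
    """Fraction of the top-k predicted substitutes that appear in the
    gold substitute set for one instance."""
    top_k = predicted[:k]
    if not top_k:
        return 0.0
    return sum(1 for s in top_k if s in gold) / len(top_k)

def mean_precision_at_k(instances, k=3):
    """instances: list of (ranked_predictions, gold_substitute_set) pairs;
    returns the macro-average P@k over all instances."""
    scores = [precision_at_k(preds, gold, k) for preds, gold in instances]
    return sum(scores) / len(scores)
```

For example, a prediction list whose top 3 items contain 2 gold substitutes scores 2/3 on that instance.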

Lexicon Induction for Spoken Rusyn – Challenges and Results

Title Lexicon Induction for Spoken Rusyn – Challenges and Results
Authors Achim Rabus, Yves Scherrer
Abstract This paper reports on challenges and results in developing NLP resources for spoken Rusyn. As a Slavic minority language, Rusyn has no NLP resources of its own to make use of. We propose to build a morphosyntactic dictionary for Rusyn, combining existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish. We adapt these resources to Rusyn by using vowel-sensitive Levenshtein distance, hand-written language-specific transformation rules, and combinations of the two. Compared to an exact match baseline, we increase the coverage of the resulting morphological dictionary by up to 77.4% relative (42.9% absolute), which results in a tagging recall increased by 11.6% relative (9.1% absolute). Our research confirms and expands the results of previous studies showing the efficiency of using NLP resources from neighboring languages for low-resourced languages.
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1405/
PDF https://www.aclweb.org/anthology/W17-1405
PWC https://paperswithcode.com/paper/lexicon-induction-for-spoken-rusyn-a
Repo
Framework
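
The vowel-sensitive Levenshtein distance mentioned above can be sketched as a standard edit distance whose substitution cost is discounted when both characters are vowels; the vowel inventory and cost value below are illustrative assumptions, not the paper's exact settings:

```python
# Illustrative Latin + Cyrillic vowel inventory (an assumption, not the
# paper's actual character set).
VOWELS = set("aeiouyаеиіоуыэюя")

def vowel_sensitive_levenshtein(a, b, vowel_sub_cost=0.5):
    """Levenshtein distance where substituting one vowel for another is
    cheaper than other edits, reflecting the regular vowel
    correspondences between closely related Slavic languages."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                sub = 0.0
            elif a[i - 1] in VOWELS and b[j - 1] in VOWELS:
                sub = vowel_sub_cost  # discounted vowel-for-vowel swap
            else:
                sub = 1.0
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + sub)  # (cheap) substitution
    return d[m][n]
```

Under this cost scheme, cognates that differ only in vowels rank closer than forms differing in consonants, which is the point of the adaptation.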

A New Theory for Matrix Completion

Title A New Theory for Matrix Completion
Authors Guangcan Liu, Qingshan Liu, Xiaotong Yuan
Abstract Prevalent matrix completion theories rely on an assumption that the locations of the missing data are distributed uniformly and randomly (i.e., uniform sampling). Nevertheless, the reason for observations being missing often depends on the unseen observations themselves, and thus the missing data in practice usually occurs in a nonuniform and deterministic fashion rather than randomly. To break through the limits of random sampling, this paper introduces a new hypothesis called \emph{isomeric condition}, which is provably weaker than the assumption of uniform sampling and arguably holds even when the missing data is placed irregularly. Equipped with this new tool, we prove a series of theorems for missing data recovery and matrix completion. In particular, we prove that the exact solutions that identify the target matrix are included as critical points by the commonly used nonconvex programs. Unlike the existing theories for nonconvex matrix completion, which are built upon the same condition as convex programs, our theory shows that nonconvex programs have the potential to work with a much weaker condition. Compared to the existing studies on nonuniform sampling, our setup is more general.
Tasks Matrix Completion
Published 2017-12-01
URL http://papers.nips.cc/paper/6680-a-new-theory-for-matrix-completion
PDF http://papers.nips.cc/paper/6680-a-new-theory-for-matrix-completion.pdf
PWC https://paperswithcode.com/paper/a-new-theory-for-matrix-completion
Repo
Framework
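
A toy sketch of the kind of nonconvex program the paper analyzes: recovering a rank-1 matrix by gradient descent on the factorized objective, with a deterministic, nonuniform set of observed entries. This illustrates the setting only; it does not implement the isomeric condition or any of the paper's proofs:

```python
import random

def complete_rank1(M, mask, steps=3000, lr=0.02):
    """Fit M ~ u v^T from the entries where mask[i][j] is True, by
    per-entry gradient descent on the nonconvex factorized objective
    sum over observed (i, j) of (u_i * v_j - M_ij)^2, then fill in the
    missing entries from the learned factors."""
    m, n = len(M), len(M[0])
    rng = random.Random(0)
    u = [rng.uniform(0.5, 1.5) for _ in range(m)]
    v = [rng.uniform(0.5, 1.5) for _ in range(n)]
    for _ in range(steps):
        for i in range(m):
            for j in range(n):
                if mask[i][j]:
                    err = u[i] * v[j] - M[i][j]
                    u[i] -= lr * err * v[j]
                    v[j] -= lr * err * u[i]
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]
```

With a true rank-1 matrix such as [[3, 4, 5], [6, 8, 10]] and one entry hidden deterministically (not at random), the observed entries still pin down the factors, so the hidden entry is recovered.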

Identifying dialects with textual and acoustic cues

Title Identifying dialects with textual and acoustic cues
Authors Abualsoud Hanani, Aziz Qaroush, Stephen Taylor
Abstract We describe several systems for identifying short samples of Arabic or Swiss-German dialects, which were prepared for the shared task of the 2017 DSL Workshop (Zampieri et al., 2017). The Arabic data comprises both text and acoustic files, and our best run combined both. The Swiss-German data is text-only. Coincidentally, our best runs achieved an accuracy of nearly 63% on both the Swiss-German and Arabic dialects tasks.
Tasks Speech Recognition
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1211/
PDF https://www.aclweb.org/anthology/W17-1211
PWC https://paperswithcode.com/paper/identifying-dialects-with-textual-and
Repo
Framework
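
A common text-only baseline for this kind of dialect identification is nearest-profile matching over character n-grams; the sketch below is a generic illustration of that idea, not the systems described in the paper:

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Character n-gram counts, with padding spaces so word boundaries
    contribute n-grams too."""
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(c1, c2):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(c1[g] * c2[g] for g in set(c1) & set(c2))
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0

def classify(sample, profiles):
    """profiles: dict mapping dialect label -> aggregated n-gram Counter
    built from that dialect's training text; returns the closest label."""
    sample_ngrams = char_ngrams(sample)
    return max(profiles, key=lambda label: cosine(sample_ngrams, profiles[label]))
```

Profiles are simply the pooled n-gram counts of each dialect's training data; a test sample is assigned to the most similar profile.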

Language-Independent Named Entity Analysis Using Parallel Projection and Rule-Based Disambiguation

Title Language-Independent Named Entity Analysis Using Parallel Projection and Rule-Based Disambiguation
Authors James Mayfield, Paul McNamee, Cash Costello
Abstract The 2017 shared task at the Balto-Slavic NLP workshop requires identifying coarse-grained named entities in seven languages, identifying each entity's base form, and clustering name mentions across the multilingual set of documents. The fact that no training data is provided to systems for building supervised classifiers further adds to the complexity. To complete the task we first use publicly available parallel texts to project named entity recognition capability from English to each evaluation language. We ignore entirely the subtask of identifying non-inflected forms of names. Finally, we create cross-document entity identifiers by clustering named mentions using a procedure-based approach.
Tasks Named Entity Recognition
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1414/
PDF https://www.aclweb.org/anthology/W17-1414
PWC https://paperswithcode.com/paper/language-independent-named-entity-analysis
Repo
Framework

Real-Time Keyword Extraction from Conversations

Title Real-Time Keyword Extraction from Conversations
Authors Polykarpos Meladianos, Antoine Tixier, Ioannis Nikolentzos, Michalis Vazirgiannis
Abstract We introduce a novel method to extract keywords from meeting speech in real-time. Our approach builds on the graph-of-words representation of text and leverages the k-core decomposition algorithm and properties of submodular functions. We outperform multiple baselines in a real-time scenario emulated from the AMI and ICSI meeting corpora. Evaluation is conducted against both extractive and abstractive gold standards using two standard performance metrics and a newer one based on word embeddings.
Tasks Keyword Extraction, Speech Recognition, Word Embeddings
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2074/
PDF https://www.aclweb.org/anthology/E17-2074
PWC https://paperswithcode.com/paper/real-time-keyword-extraction-from
Repo
Framework
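
The graph-of-words plus k-core pipeline described above can be sketched as follows (a simplified batch version; the paper's method is real-time and also exploits submodularity, which this sketch omits):

```python
from collections import defaultdict

def graph_of_words(tokens, window=4):
    """Undirected co-occurrence graph: each word is a node; an edge links
    two words occurring within `window` tokens of each other."""
    adj = defaultdict(set)
    for i, w in enumerate(tokens):
        for u in tokens[i + 1:i + window]:
            if u != w:
                adj[w].add(u)
                adj[u].add(w)
    return adj

def core_numbers(adj):
    """Classic peeling algorithm: repeatedly remove a minimum-degree node;
    its core number is the largest minimum degree seen so far."""
    adj = {v: set(ns) for v, ns in adj.items()}
    core, k = {}, 0
    while adj:
        v = min(adj, key=lambda x: len(adj[x]))
        k = max(k, len(adj[v]))
        core[v] = k
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return core

def keywords(tokens, window=4):
    """Return the words in the main core (maximum k-core), i.e. the
    densest region of the co-occurrence graph."""
    core = core_numbers(graph_of_words(tokens, window))
    kmax = max(core.values())
    return {w for w, c in core.items() if c == kmax}
```

Words surviving into the main core are the most densely interconnected ones, which is the intuition behind using k-core decomposition for keyword extraction.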

Multilabel Classification with Group Testing and Codes

Title Multilabel Classification with Group Testing and Codes
Authors Shashanka Ubaru, Arya Mazumdar
Abstract In recent years, the multiclass and multilabel classification problems we encounter in many applications involve a very large number of classes ($10^3$–$10^6$). However, each instance belongs to only one or a few classes, i.e., the label vectors are sparse. In this work, we propose a novel approach based on group testing to solve such large multilabel classification problems with sparse label vectors. We describe various group testing constructions, and advocate the use of concatenated Reed-Solomon codes and unbalanced bipartite expander graphs for extreme classification problems. The proposed approach has several advantages theoretically and practically over existing popular methods. Our method operates on the binary alphabet and can utilize the well-established binary classifiers for learning. The error correction capabilities of the codes are leveraged for the first time in the learning problem to correct prediction errors. Even if a linearly growing number of classifiers mis-classify, these errors are fully corrected. We establish Hamming loss error bounds for the approach. More importantly, our method utilizes a simple prediction algorithm and does not require matrix inversion or solving optimization problems, which makes the algorithm very inexpensive. Numerical experiments with various datasets illustrate the superior performance of our method.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=713
PDF http://proceedings.mlr.press/v70/ubaru17a/ubaru17a.pdf
PWC https://paperswithcode.com/paper/multilabel-classification-with-group-testing
Repo
Framework
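
The core reduction can be sketched as follows: a group-testing design maps a sparse label set to a handful of binary "pooled" problems, and a simple decoder recovers the active classes. The toy design below is hand-picked for illustration; the paper advocates concatenated Reed-Solomon codes and expander graphs, whose error-correction capability this sketch omits:

```python
def encode(labels, groups):
    """One bit per test: a test fires iff it pools at least one active
    label. Each bit defines one binary classification problem, so L
    classes are handled with only len(groups) binary classifiers."""
    return [int(bool(g & labels)) for g in groups]

def decode(bits, groups, num_classes):
    """Classic group-testing decoder: predict a class active iff it is
    pooled by at least one test and every test pooling it fired."""
    return {c for c in range(num_classes)
            if any(c in g for g in groups)
            and all(bits[j] for j, g in enumerate(groups) if c in g)}
```

At test time the binary classifiers predict the bit vector directly from the features, and the decoder (preceded by code-based error correction in the paper) maps it back to a label set without any matrix inversion or optimization.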

IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

Title IWCS 2017 — 12th International Conference on Computational Semantics — Short papers
Authors
Abstract
Tasks
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-6900/
PDF https://www.aclweb.org/anthology/W17-6900
PWC https://paperswithcode.com/paper/iwcs-2017-a-12th-international-conference-on
Repo
Framework

Learning to Compose Spatial Relations with Grounded Neural Language Models

Title Learning to Compose Spatial Relations with Grounded Neural Language Models
Authors Mehdi Ghanimifard, Simon Dobnik
Abstract
Tasks Semantic Textual Similarity, Word Embeddings
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-6808/
PDF https://www.aclweb.org/anthology/W17-6808
PWC https://paperswithcode.com/paper/learning-to-compose-spatial-relations-with
Repo
Framework

Improving Polish Mention Detection with Valency Dictionary

Title Improving Polish Mention Detection with Valency Dictionary
Authors Maciej Ogrodniczuk, Bartłomiej Nitoń
Abstract This paper presents the results of an experiment integrating information from a valency dictionary of Polish into a mention detection system. Two types of information are acquired: positions of syntactic schemata for nominal and verbal constructs, and secondary prepositions present in schemata. The syntactic schemata are used to prevent (for verbal realizations) or encourage (for nominal groups) constructing mentions from phrases filling multiple schema positions; the secondary prepositions are used to filter out artificial mentions created from their nominal components. Mention detection is evaluated against the manual annotation of the Polish Coreference Corpus in two settings: taking into account only mention heads, or exact borders.
Tasks Coreference Resolution
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1503/
PDF https://www.aclweb.org/anthology/W17-1503
PWC https://paperswithcode.com/paper/improving-polish-mention-detection-with
Repo
Framework

Comprehensive annotation of cross-linguistic variation in tense and aspect categories

Title Comprehensive annotation of cross-linguistic variation in tense and aspect categories
Authors Mark-Matthias Zymla
Abstract
Tasks
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-6817/
PDF https://www.aclweb.org/anthology/W17-6817
PWC https://paperswithcode.com/paper/comprehensive-annotation-of-cross-linguistic
Repo
Framework

A Google-Proof Collection of French Winograd Schemas

Title A Google-Proof Collection of French Winograd Schemas
Authors Pascal Amsili, Olga Seminck
Abstract This article presents the first collection of French Winograd Schemas. Winograd Schemas are anaphora resolution problems that can only be resolved with extensive world knowledge. For this reason, the Winograd Schema Challenge has been proposed as an alternative to the Turing Test. A very important feature of Winograd Schemas is that it should be impossible to resolve them with statistical information about word co-occurrences: they should be Google-proof. We propose a measure of Google-proofness based on Mutual Information, and demonstrate the method on our collection of French Winograd Schemas.
Tasks Coreference Resolution
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1504/
PDF https://www.aclweb.org/anthology/W17-1504
PWC https://paperswithcode.com/paper/a-google-proof-collection-of-french-winograd
Repo
Framework
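
A check in the spirit of the proposed Google-proofness measure can be sketched with pointwise mutual information; the exact formulation is the paper's, and the counts below would in practice come from a large corpus or web hit counts:

```python
import math

def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information from raw co-occurrence counts over a
    corpus of n events: log [ p(x, y) / (p(x) p(y)) ]."""
    return math.log((c_xy * n) / (c_x * c_y))

def pmi_gap(c_correct_ctx, c_wrong_ctx, c_correct, c_wrong, c_ctx, n):
    """Gap between the PMI of the correct answer with the schema's
    discriminating context and the PMI of the wrong answer with it. A
    schema is Google-proof in spirit when this gap is near zero, i.e.
    raw co-occurrence statistics do not favour the correct answer."""
    return (pmi(c_correct_ctx, c_correct, c_ctx, n)
            - pmi(c_wrong_ctx, c_wrong, c_ctx, n))
```

A schema whose correct answer co-occurs with the discriminating context far more often than the wrong answer does (large positive gap) could be solved by word statistics alone and would fail the Google-proofness test.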

Neural Machine Translation: Basics, Practical Aspects and Recent Trends

Title Neural Machine Translation: Basics, Practical Aspects and Recent Trends
Authors Fabien Cromieres, Toshiaki Nakazawa, Raj Dabre
Abstract Machine Translation (MT) is a sub-field of NLP which has experienced a number of paradigm shifts since its inception. Up until 2014, Phrase Based Statistical Machine Translation (PBSMT) approaches used to be the state of the art. In late 2014, Neural Machine Translation (NMT) was introduced and was proven to outperform all PBSMT approaches by a significant margin. Since then, the NMT approaches have undergone several transformations which have pushed the state of the art even further. This tutorial is primarily aimed at researchers who are either interested in or are fairly new to the world of NMT and want to obtain a deep understanding of NMT fundamentals. Because it will also cover the latest developments in NMT, it should also be useful to attendees with some experience in NMT.
Tasks Image Captioning, Machine Translation
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-5004/
PDF https://www.aclweb.org/anthology/I17-5004
PWC https://paperswithcode.com/paper/neural-machine-translation-basics-practical
Repo
Framework

Sentiment Analysis of Arabic Tweets Using Semantic Resources

Title Sentiment Analysis of Arabic Tweets Using Semantic Resources
Authors Lamia Al-Horaibi, Muhammad Badruddin Khan
Abstract Sentiment analysis has grown to be one of the most active research areas in natural language processing and text mining. Many researchers have investigated sentiment analysis and opinion mining from different classification approaches. However, limited research has been conducted on Arabic sentiment analysis as compared to the English language. In this paper, we have proposed and implemented a technique for Twitter Arabic sentiment analysis consisting of a semantic approach and Arabic linguistic features. Hence, we introduced a mechanism for preprocessing Arabic tweets, and used a semantic approach as the methodology for sentiment classification. We also proposed a classification technique which uses both Arabic and English sentiment lexicons to classify Arabic tweets into three sentiment categories (positive, negative, or neutral). Our experiments show that many issues were encountered when we used the Arabic SentiWordNet facility to classify Arabic tweets directly; these issues are basically related to Arabic text processing. The Arabic lexicons and Arabic tools must be improved or built from scratch in order to improve Arabic sentiment analysis using the semantic approach. The improvements in results obtained with our enhanced Arabic lexicons and amended Arabic tools demonstrate this need.
Tasks Arabic Sentiment Analysis, Opinion Mining, Sentiment Analysis
Published 2017-01-30
URL http://www.ijcis.info/Vol13N1/Vol13N1PP9-14.pdf
PDF http://www.ijcis.info/Vol13N1/Vol13N1PP9-14.pdf
PWC https://paperswithcode.com/paper/sentiment-analysis-of-arabic-tweets-using
Repo
Framework
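
A minimal sketch of a lexicon-based classifier of the kind described, with a toy negation rule; the lexicons, negator list, and preprocessing are placeholders, not the paper's enhanced resources:

```python
def lexicon_sentiment(tokens, pos_lexicon, neg_lexicon, negators=frozenset({"not"})):
    """Score a tokenized tweet against sentiment lexicons, flipping the
    polarity of the word that follows a negator, and classify it as
    positive, negative, or neutral."""
    score, flip = 0, False
    for tok in tokens:
        if tok in negators:
            flip = True  # flip the polarity of the next sentiment-bearing word
            continue
        delta = (1 if tok in pos_lexicon else 0) - (1 if tok in neg_lexicon else 0)
        if flip and delta:
            delta, flip = -delta, False
        score += delta
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The paper's pipeline adds Arabic-specific preprocessing and combines Arabic and English lexicons; this sketch only shows the scoring-and-thresholding skeleton such approaches share.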

These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution

Title These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution
Authors Corina Koolen, Andreas van Cranenburgh
Abstract Stylometric and text categorization results show that author gender can be discerned in texts with relatively high accuracy. However, it is difficult to explain what gives rise to these results and there are many possible confounding factors, such as the domain, genre, and target audience of a text. More fundamentally, such classification efforts risk invoking stereotyping and essentialism. We explore this issue in two datasets of Dutch literary novels, using commonly used descriptive (LIWC, topic modeling) and predictive (machine learning) methods. Our results show the importance of controlling for variables in the corpus and we argue for taking care not to overgeneralize from the results.
Tasks Text Categorization
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1602/
PDF https://www.aclweb.org/anthology/W17-1602
PWC https://paperswithcode.com/paper/these-are-not-the-stereotypes-you-are-looking
Repo
Framework