July 26, 2019

2082 words 10 mins read

Paper Group NANR 111

A Preliminary Study of Croatian Lexical Substitution

Title A Preliminary Study of Croatian Lexical Substitution
Authors Domagoj Alagić, Jan Šnajder
Abstract Lexical substitution is the task of determining a meaning-preserving replacement for a word in context. We report on a preliminary study of this task for the Croatian language on a small-scale lexical sample dataset, manually annotated using three different annotation schemes. We compare the annotations, analyze the inter-annotator agreement, and observe a number of interesting language-specific details in the obtained lexical substitutes. Furthermore, we apply a recently proposed, dependency-based lexical substitution model to our dataset. The model achieves a P@3 score of 0.35, which indicates the difficulty of the task.
Tasks Information Retrieval, Machine Translation, Word Sense Disambiguation
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1403/
PDF https://www.aclweb.org/anthology/W17-1403
PWC https://paperswithcode.com/paper/a-preliminary-study-of-croatian-lexical
Repo
Framework
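
The P@3 figure quoted in the abstract is precision-at-3 over ranked substitute lists. A minimal sketch of the metric (function and variable names are mine, not the paper's):

```python
def precision_at_k(predicted, gold, k=3):
    """Fraction of the top-k predicted substitutes that appear in the
    gold substitute set for one instance."""
    top_k = predicted[:k]
    if not top_k:
        return 0.0
    return sum(1 for s in top_k if s in gold) / len(top_k)

def mean_precision_at_k(instances, k=3):
    """instances: list of (ranked_predictions, gold_substitute_set) pairs;
    returns the macro-average P@k over all instances."""
    scores = [precision_at_k(preds, gold, k) for preds, gold in instances]
    return sum(scores) / len(scores)
```

For example, a prediction list whose top 3 items contain 2 gold substitutes scores 2/3 on that instance.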

Lexicon Induction for Spoken Rusyn – Challenges and Results

Title Lexicon Induction for Spoken Rusyn – Challenges and Results
Authors Achim Rabus, Yves Scherrer
Abstract This paper reports on challenges and results in developing NLP resources for spoken Rusyn. As a Slavic minority language, Rusyn has no NLP resources of its own to make use of. We propose to build a morphosyntactic dictionary for Rusyn, combining existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish. We adapt these resources to Rusyn by using vowel-sensitive Levenshtein distance, hand-written language-specific transformation rules, and combinations of the two. Compared to an exact match baseline, we increase the coverage of the resulting morphological dictionary by up to 77.4% relative (42.9% absolute), which results in a tagging recall increased by 11.6% relative (9.1% absolute). Our research confirms and expands the results of previous studies showing the efficiency of using NLP resources from neighboring languages for low-resourced languages.
Tasks
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1405/
PDF https://www.aclweb.org/anthology/W17-1405
PWC https://paperswithcode.com/paper/lexicon-induction-for-spoken-rusyn-a
Repo
Framework
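
The vowel-sensitive Levenshtein distance mentioned above can be sketched as a standard edit distance whose substitution cost is discounted when both characters are vowels; the vowel inventory and cost value below are illustrative assumptions, not the paper's exact settings:

```python
# Illustrative Latin + Cyrillic vowel inventory (an assumption, not the
# paper's actual character set).
VOWELS = set("aeiouyаеиіоуыэюя")

def vowel_sensitive_levenshtein(a, b, vowel_sub_cost=0.5):
    """Levenshtein distance where substituting one vowel for another is
    cheaper than other edits, reflecting the regular vowel
    correspondences between closely related Slavic languages."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                sub = 0.0
            elif a[i - 1] in VOWELS and b[j - 1] in VOWELS:
                sub = vowel_sub_cost  # discounted vowel-for-vowel swap
            else:
                sub = 1.0
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + sub)  # (cheap) substitution
    return d[m][n]
```

Under this cost scheme, cognates that differ only in vowels rank closer than forms differing in consonants, which is the point of the adaptation.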

A New Theory for Matrix Completion

Title A New Theory for Matrix Completion
Authors Guangcan Liu, Qingshan Liu, Xiaotong Yuan
Abstract Prevalent matrix completion theories rely on an assumption that the locations of the missing data are distributed uniformly and randomly (i.e., uniform sampling). Nevertheless, the reason for observations being missing often depends on the unseen observations themselves, and thus the missing data in practice usually occurs in a nonuniform and deterministic fashion rather than randomly. To break through the limits of random sampling, this paper introduces a new hypothesis called \emph{isomeric condition}, which is provably weaker than the assumption of uniform sampling and arguably holds even when the missing data is placed irregularly. Equipped with this new tool, we prove a series of theorems for missing data recovery and matrix completion. In particular, we prove that the exact solutions that identify the target matrix are included as critical points by the commonly used nonconvex programs. Unlike the existing theories for nonconvex matrix completion, which are built upon the same condition as convex programs, our theory shows that nonconvex programs have the potential to work with a much weaker condition. Compared to the existing studies on nonuniform sampling, our setup is more general.
Tasks Matrix Completion
Published 2017-12-01
URL http://papers.nips.cc/paper/6680-a-new-theory-for-matrix-completion
PDF http://papers.nips.cc/paper/6680-a-new-theory-for-matrix-completion.pdf
PWC https://paperswithcode.com/paper/a-new-theory-for-matrix-completion
Repo
Framework
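
A toy sketch of the kind of nonconvex program the paper analyzes: recovering a rank-1 matrix by gradient descent on the factorized objective, with a deterministic, nonuniform set of observed entries. This illustrates the setting only; it does not implement the isomeric condition or any of the paper's proofs:

```python
import random

def complete_rank1(M, mask, steps=3000, lr=0.02):
    """Fit M ~ u v^T from the entries where mask[i][j] is True, by
    per-entry gradient descent on the nonconvex factorized objective
    sum over observed (i, j) of (u_i * v_j - M_ij)^2, then fill in the
    missing entries from the learned factors."""
    m, n = len(M), len(M[0])
    rng = random.Random(0)
    u = [rng.uniform(0.5, 1.5) for _ in range(m)]
    v = [rng.uniform(0.5, 1.5) for _ in range(n)]
    for _ in range(steps):
        for i in range(m):
            for j in range(n):
                if mask[i][j]:
                    err = u[i] * v[j] - M[i][j]
                    u[i] -= lr * err * v[j]
                    v[j] -= lr * err * u[i]
    return [[u[i] * v[j] for j in range(n)] for i in range(m)]
```

With a true rank-1 matrix such as [[3, 4, 5], [6, 8, 10]] and one entry hidden deterministically (not at random), the observed entries still pin down the factors, so the hidden entry is recovered.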

Identifying dialects with textual and acoustic cues

Title Identifying dialects with textual and acoustic cues
Authors Abualsoud Hanani, Aziz Qaroush, Stephen Taylor
Abstract We describe several systems for identifying short samples of Arabic or Swiss-German dialects, which were prepared for the shared task of the 2017 DSL Workshop (Zampieri et al., 2017). The Arabic data comprises both text and acoustic files, and our best run combined both. The Swiss-German data is text-only. Coincidentally, our best runs achieved an accuracy of nearly 63% on both the Swiss-German and Arabic dialects tasks.
Tasks Speech Recognition
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1211/
PDF https://www.aclweb.org/anthology/W17-1211
PWC https://paperswithcode.com/paper/identifying-dialects-with-textual-and
Repo
Framework
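
A common text-only baseline for this kind of dialect identification is nearest-profile matching over character n-grams; the sketch below is a generic illustration of that idea, not the systems described in the paper:

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Character n-gram counts, with padding spaces so word boundaries
    contribute n-grams too."""
    text = f" {text.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(c1, c2):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(c1[g] * c2[g] for g in set(c1) & set(c2))
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0

def classify(sample, profiles):
    """profiles: dict mapping dialect label -> aggregated n-gram Counter
    built from that dialect's training text; returns the closest label."""
    sample_ngrams = char_ngrams(sample)
    return max(profiles, key=lambda label: cosine(sample_ngrams, profiles[label]))
```

Profiles are simply the pooled n-gram counts of each dialect's training data; a test sample is assigned to the most similar profile.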

Language-Independent Named Entity Analysis Using Parallel Projection and Rule-Based Disambiguation

Title Language-Independent Named Entity Analysis Using Parallel Projection and Rule-Based Disambiguation
Authors James Mayfield, Paul McNamee, Cash Costello
Abstract The 2017 shared task at the Balto-Slavic NLP workshop requires identifying coarse-grained named entities in seven languages, identifying each entity's base form, and clustering name mentions across the multilingual set of documents. The fact that no training data is provided to systems for building supervised classifiers further adds to the complexity. To complete the task we first use publicly available parallel texts to project named entity recognition capability from English to each evaluation language. We ignore entirely the subtask of identifying non-inflected forms of names. Finally, we create cross-document entity identifiers by clustering named mentions using a procedure-based approach.
Tasks Named Entity Recognition
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1414/
PDF https://www.aclweb.org/anthology/W17-1414
PWC https://paperswithcode.com/paper/language-independent-named-entity-analysis
Repo
Framework

Real-Time Keyword Extraction from Conversations

Title Real-Time Keyword Extraction from Conversations
Authors Polykarpos Meladianos, Antoine Tixier, Ioannis Nikolentzos, Michalis Vazirgiannis
Abstract We introduce a novel method to extract keywords from meeting speech in real-time. Our approach builds on the graph-of-words representation of text and leverages the k-core decomposition algorithm and properties of submodular functions. We outperform multiple baselines in a real-time scenario emulated from the AMI and ICSI meeting corpora. Evaluation is conducted against both extractive and abstractive gold standards using two standard performance metrics and a newer one based on word embeddings.
Tasks Keyword Extraction, Speech Recognition, Word Embeddings
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-2074/
PDF https://www.aclweb.org/anthology/E17-2074
PWC https://paperswithcode.com/paper/real-time-keyword-extraction-from
Repo
Framework
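
The graph-of-words plus k-core pipeline described above can be sketched as follows (a simplified batch version; the paper's method is real-time and also exploits submodularity, which this sketch omits):

```python
from collections import defaultdict

def graph_of_words(tokens, window=4):
    """Undirected co-occurrence graph: each word is a node; an edge links
    two words occurring within `window` tokens of each other."""
    adj = defaultdict(set)
    for i, w in enumerate(tokens):
        for u in tokens[i + 1:i + window]:
            if u != w:
                adj[w].add(u)
                adj[u].add(w)
    return adj

def core_numbers(adj):
    """Classic peeling algorithm: repeatedly remove a minimum-degree node;
    its core number is the largest minimum degree seen so far."""
    adj = {v: set(ns) for v, ns in adj.items()}
    core, k = {}, 0
    while adj:
        v = min(adj, key=lambda x: len(adj[x]))
        k = max(k, len(adj[v]))
        core[v] = k
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return core

def keywords(tokens, window=4):
    """Return the words in the main core (maximum k-core), i.e. the
    densest region of the co-occurrence graph."""
    core = core_numbers(graph_of_words(tokens, window))
    kmax = max(core.values())
    return {w for w, c in core.items() if c == kmax}
```

Words surviving into the main core are the most densely interconnected ones, which is the intuition behind using k-core decomposition for keyword extraction.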

Multilabel Classification with Group Testing and Codes

Title Multilabel Classification with Group Testing and Codes
Authors Shashanka Ubaru, Arya Mazumdar
Abstract In recent years, the multiclass and multilabel classification problems we encounter in many applications involve a very large number of classes ($10^3$–$10^6$). However, each instance belongs to only one or a few classes, i.e., the label vectors are sparse. In this work, we propose a novel approach based on group testing to solve such large multilabel classification problems with sparse label vectors. We describe various group testing constructions, and advocate the use of concatenated Reed-Solomon codes and unbalanced bipartite expander graphs for extreme classification problems. The proposed approach has several advantages theoretically and practically over existing popular methods. Our method operates on the binary alphabet and can utilize the well-established binary classifiers for learning. The error correction capabilities of the codes are leveraged for the first time in the learning problem to correct prediction errors. Even if a linearly growing number of classifiers mis-classify, these errors are fully corrected. We establish Hamming loss error bounds for the approach. More importantly, our method utilizes a simple prediction algorithm and does not require matrix inversion or solving optimization problems, which makes the algorithm very inexpensive. Numerical experiments with various datasets illustrate the superior performance of our method.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=713
PDF http://proceedings.mlr.press/v70/ubaru17a/ubaru17a.pdf
PWC https://paperswithcode.com/paper/multilabel-classification-with-group-testing
Repo
Framework
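
The core reduction can be sketched as follows: a group-testing design maps a sparse label set to a handful of binary "pooled" problems, and a simple decoder recovers the active classes. The toy design below is hand-picked for illustration; the paper advocates concatenated Reed-Solomon codes and expander graphs, whose error-correction capability this sketch omits:

```python
def encode(labels, groups):
    """One bit per test: a test fires iff it pools at least one active
    label. Each bit defines one binary classification problem, so L
    classes are handled with only len(groups) binary classifiers."""
    return [int(bool(g & labels)) for g in groups]

def decode(bits, groups, num_classes):
    """Classic group-testing decoder: predict a class active iff it is
    pooled by at least one test and every test pooling it fired."""
    return {c for c in range(num_classes)
            if any(c in g for g in groups)
            and all(bits[j] for j, g in enumerate(groups) if c in g)}
```

At test time the binary classifiers predict the bit vector directly from the features, and the decoder (preceded by code-based error correction in the paper) maps it back to a label set without any matrix inversion or optimization.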

IWCS 2017 — 12th International Conference on Computational Semantics — Short papers

Title IWCS 2017 — 12th International Conference on Computational Semantics — Short papers
Authors
Abstract
Tasks
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-6900/
PDF https://www.aclweb.org/anthology/W17-6900
PWC https://paperswithcode.com/paper/iwcs-2017-a-12th-international-conference-on
Repo
Framework

Learning to Compose Spatial Relations with Grounded Neural Language Models

Title Learning to Compose Spatial Relations with Grounded Neural Language Models
Authors Mehdi Ghanimifard, Simon Dobnik
Abstract
Tasks Semantic Textual Similarity, Word Embeddings
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-6808/
PDF https://www.aclweb.org/anthology/W17-6808
PWC https://paperswithcode.com/paper/learning-to-compose-spatial-relations-with
Repo
Framework

Improving Polish Mention Detection with Valency Dictionary

Title Improving Polish Mention Detection with Valency Dictionary
Authors Maciej Ogrodniczuk, Bartłomiej Nitoń
Abstract This paper presents the results of an experiment integrating information from a valency dictionary of Polish into a mention detection system. Two types of information are acquired: positions of syntactic schemata for nominal and verbal constructs, and secondary prepositions present in schemata. The syntactic schemata are used to prevent (for verbal realizations) or encourage (for nominal groups) constructing mentions from phrases filling multiple schema positions; the secondary prepositions are used to filter out artificial mentions created from their nominal components. Mention detection is evaluated against the manual annotation of the Polish Coreference Corpus in two settings: taking into account only mention heads, or exact borders.
Tasks Coreference Resolution
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1503/
PDF https://www.aclweb.org/anthology/W17-1503
PWC https://paperswithcode.com/paper/improving-polish-mention-detection-with
Repo
Framework

Comprehensive annotation of cross-linguistic variation in tense and aspect categories

Title Comprehensive annotation of cross-linguistic variation in tense and aspect categories
Authors Mark-Matthias Zymla
Abstract
Tasks
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-6817/
PDF https://www.aclweb.org/anthology/W17-6817
PWC https://paperswithcode.com/paper/comprehensive-annotation-of-cross-linguistic
Repo
Framework

A Google-Proof Collection of French Winograd Schemas

Title A Google-Proof Collection of French Winograd Schemas
Authors Pascal Amsili, Olga Seminck
Abstract This article presents the first collection of French Winograd Schemas. Winograd Schemas are anaphora resolution problems that can only be resolved with extensive world knowledge. For this reason, the Winograd Schema Challenge has been proposed as an alternative to the Turing Test. A very important feature of Winograd Schemas is that it should be impossible to resolve them with statistical information about word co-occurrences: they should be Google-proof. We propose a measure of Google-proofness based on Mutual Information, and demonstrate the method on our collection of French Winograd Schemas.
Tasks Coreference Resolution
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1504/
PDF https://www.aclweb.org/anthology/W17-1504
PWC https://paperswithcode.com/paper/a-google-proof-collection-of-french-winograd
Repo
Framework
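
A check in the spirit of the proposed Google-proofness measure can be sketched with pointwise mutual information; the exact formulation is the paper's, and the counts below would in practice come from a large corpus or web hit counts:

```python
import math

def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information from raw co-occurrence counts over a
    corpus of n events: log [ p(x, y) / (p(x) p(y)) ]."""
    return math.log((c_xy * n) / (c_x * c_y))

def pmi_gap(c_correct_ctx, c_wrong_ctx, c_correct, c_wrong, c_ctx, n):
    """Gap between the PMI of the correct answer with the schema's
    discriminating context and the PMI of the wrong answer with it. A
    schema is Google-proof in spirit when this gap is near zero, i.e.
    raw co-occurrence statistics do not favour the correct answer."""
    return (pmi(c_correct_ctx, c_correct, c_ctx, n)
            - pmi(c_wrong_ctx, c_wrong, c_ctx, n))
```

A schema whose correct answer co-occurs with the discriminating context far more often than the wrong answer does (large positive gap) could be solved by word statistics alone and would fail the Google-proofness test.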

Neural Machine Translation: Basics, Practical Aspects and Recent Trends

Title Neural Machine Translation: Basics, Practical Aspects and Recent Trends
Authors Fabien Cromieres, Toshiaki Nakazawa, Raj Dabre
Abstract Machine Translation (MT) is a sub-field of NLP which has experienced a number of paradigm shifts since its inception. Up until 2014, Phrase Based Statistical Machine Translation (PBSMT) approaches used to be the state of the art. In late 2014, Neural Machine Translation (NMT) was introduced and was proven to outperform all PBSMT approaches by a significant margin. Since then, the NMT approaches have undergone several transformations which have pushed the state of the art even further. This tutorial is primarily aimed at researchers who are either interested in or are fairly new to the world of NMT and want to obtain a deep understanding of NMT fundamentals. Because it will also cover the latest developments in NMT, it should also be useful to attendees with some experience in NMT.
Tasks Image Captioning, Machine Translation
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-5004/
PDF https://www.aclweb.org/anthology/I17-5004
PWC https://paperswithcode.com/paper/neural-machine-translation-basics-practical
Repo
Framework

Sentiment Analysis of Arabic Tweets Using Semantic Resources

Title Sentiment Analysis of Arabic Tweets Using Semantic Resources
Authors Lamia Al-Horaibi, Muhammad Badruddin Khan
Abstract Sentiment analysis has grown to be one of the most active research areas in natural language processing and text mining. Many researchers have investigated sentiment analysis and opinion mining from different classification approaches. However, limited research has been conducted on Arabic sentiment analysis as compared to the English language. In this paper, we have proposed and implemented a technique for Twitter Arabic sentiment analysis consisting of a semantic approach and Arabic linguistic features. Hence, we introduced a mechanism for preprocessing Arabic tweets, and used a semantic approach as the methodology for sentiment classification. We also proposed a classification technique which uses both Arabic and English sentiment lexicons to classify Arabic tweets into three sentiment categories (positive, negative, or neutral). Our experiments show that many issues were encountered when we used the Arabic SentiWordNet facility to classify Arabic tweets directly; these issues are basically related to Arabic text processing. The Arabic lexicons and Arabic tools must be improved or built from scratch in order to improve Arabic sentiment analysis using the semantic approach. The improvements in results obtained with our enhanced Arabic lexicons and amended Arabic tools demonstrate this need.
Tasks Arabic Sentiment Analysis, Opinion Mining, Sentiment Analysis
Published 2017-01-30
URL http://www.ijcis.info/Vol13N1/Vol13N1PP9-14.pdf
PDF http://www.ijcis.info/Vol13N1/Vol13N1PP9-14.pdf
PWC https://paperswithcode.com/paper/sentiment-analysis-of-arabic-tweets-using
Repo
Framework
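
A minimal sketch of a lexicon-based classifier of the kind described, with a toy negation rule; the lexicons, negator list, and preprocessing are placeholders, not the paper's enhanced resources:

```python
def lexicon_sentiment(tokens, pos_lexicon, neg_lexicon, negators=frozenset({"not"})):
    """Score a tokenized tweet against sentiment lexicons, flipping the
    polarity of the word that follows a negator, and classify it as
    positive, negative, or neutral."""
    score, flip = 0, False
    for tok in tokens:
        if tok in negators:
            flip = True  # flip the polarity of the next sentiment-bearing word
            continue
        delta = (1 if tok in pos_lexicon else 0) - (1 if tok in neg_lexicon else 0)
        if flip and delta:
            delta, flip = -delta, False
        score += delta
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

The paper's pipeline adds Arabic-specific preprocessing and combines Arabic and English lexicons; this sketch only shows the scoring-and-thresholding skeleton such approaches share.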

These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution

Title These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution
Authors Corina Koolen, Andreas van Cranenburgh
Abstract Stylometric and text categorization results show that author gender can be discerned in texts with relatively high accuracy. However, it is difficult to explain what gives rise to these results and there are many possible confounding factors, such as the domain, genre, and target audience of a text. More fundamentally, such classification efforts risk invoking stereotyping and essentialism. We explore this issue in two datasets of Dutch literary novels, using commonly used descriptive (LIWC, topic modeling) and predictive (machine learning) methods. Our results show the importance of controlling for variables in the corpus and we argue for taking care not to overgeneralize from the results.
Tasks Text Categorization
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1602/
PDF https://www.aclweb.org/anthology/W17-1602
PWC https://paperswithcode.com/paper/these-are-not-the-stereotypes-you-are-looking
Repo
Framework