Paper Group NANR 111
A Preliminary Study of Croatian Lexical Substitution
Title | A Preliminary Study of Croatian Lexical Substitution |
Authors | Domagoj Alagić, Jan Šnajder |
Abstract | Lexical substitution is a task of determining a meaning-preserving replacement for a word in context. We report on a preliminary study of this task for the Croatian language on a small-scale lexical sample dataset, manually annotated using three different annotation schemes. We compare the annotations, analyze the inter-annotator agreement, and observe a number of interesting language specific details in the obtained lexical substitutes. Furthermore, we apply a recently-proposed, dependency-based lexical substitution model to our dataset. The model achieves a P@3 score of 0.35, which indicates the difficulty of the task. |
Tasks | Information Retrieval, Machine Translation, Word Sense Disambiguation |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1403/ |
https://www.aclweb.org/anthology/W17-1403 | |
PWC | https://paperswithcode.com/paper/a-preliminary-study-of-croatian-lexical |
Repo | |
Framework | |
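The abstract above reports a P@3 score without defining it. The following is a minimal sketch of the common precision-at-k definition, assuming the model returns a ranked list of candidate substitutes and the annotators provide a set of gold substitutes. The function name and the placeholder words are hypothetical, and the paper's exact scoring may differ.

```python
def precision_at_k(ranked_candidates, gold_substitutes, k=3):
    """Return the fraction of the top-k ranked candidates found in the gold set."""
    if k <= 0:
        raise ValueError("k must be positive")
    gold = set(gold_substitutes)
    top_k = ranked_candidates[:k]
    hits = sum(1 for candidate in top_k if candidate in gold)
    # Dividing by k (not by len(top_k)) penalizes lists shorter than k.
    return hits / k


# Hypothetical example: one of the top three candidates is a gold substitute.
ranked = ["cand_a", "cand_b", "cand_c", "cand_d"]
gold = {"cand_a", "cand_x"}
score = precision_at_k(ranked, gold, k=3)
```

Averaging this value over all target instances gives a dataset-level score comparable in form to the one reported.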
Lexicon Induction for Spoken Rusyn – Challenges and Results
Title | Lexicon Induction for Spoken Rusyn – Challenges and Results |
Authors | Achim Rabus, Yves Scherrer |
Abstract | This paper reports on challenges and results in developing NLP resources for spoken Rusyn. Being a Slavic minority language, Rusyn does not have any resources to make use of. We propose to build a morphosyntactic dictionary for Rusyn, combining existing resources from the etymologically close Slavic languages Russian, Ukrainian, Slovak, and Polish. We adapt these resources to Rusyn by using vowel-sensitive Levenshtein distance, hand-written language-specific transformation rules, and combinations of the two. Compared to an exact match baseline, we increase the coverage of the resulting morphological dictionary by up to 77.4% relative (42.9% absolute), which results in a tagging recall increased by 11.6% relative (9.1% absolute). Our research confirms and expands the results of previous studies showing the efficiency of using NLP resources from neighboring languages for low-resourced languages. |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1405/ |
https://www.aclweb.org/anthology/W17-1405 | |
PWC | https://paperswithcode.com/paper/lexicon-induction-for-spoken-rusyn-a |
Repo | |
Framework | |
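The abstract above mentions a vowel-sensitive Levenshtein distance but does not give its cost scheme. The sketch below shows one plausible reading, under the assumption that substituting one vowel for another is cheaper than any other edit. The vowel inventory, the cost of 0.5, and the function name are illustrative assumptions, not the authors' settings.

```python
# Assumed vowel inventory (Latin script only, for illustration).
VOWELS = set("aeiouy")


def vowel_sensitive_levenshtein(source, target, vowel_sub_cost=0.5):
    """Edit distance in which vowel-for-vowel substitutions cost less than 1."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = [float(j) for j in range(len(target) + 1)]
    for i, s_char in enumerate(source, start=1):
        current = [float(i)]
        for j, t_char in enumerate(target, start=1):
            if s_char == t_char:
                sub_cost = 0.0
            elif s_char in VOWELS and t_char in VOWELS:
                sub_cost = vowel_sub_cost  # assumed reduced cost
            else:
                sub_cost = 1.0
            current.append(min(
                previous[j] + 1.0,           # deletion
                current[j - 1] + 1.0,        # insertion
                previous[j - 1] + sub_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]
```

With these assumed costs, two word forms that differ only in a vowel are closer than two that differ in a consonant, which is the kind of preference a matching step between related languages might want.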
A New Theory for Matrix Completion
Title | A New Theory for Matrix Completion |
Authors | Guangcan Liu, Qingshan Liu, Xiaotong Yuan |
Abstract | Prevalent matrix completion theories rely on an assumption that the locations of the missing data are distributed uniformly and randomly (i.e., uniform sampling). Nevertheless, the reason for observations being missing often depends on the unseen observations themselves, and thus the missing data in practice usually occurs in a nonuniform and deterministic fashion rather than randomly. To break through the limits of random sampling, this paper introduces a new hypothesis called the isomeric condition, which is provably weaker than the assumption of uniform sampling and arguably holds even when the missing data is placed irregularly. Equipped with this new tool, we prove a series of theorems for missing data recovery and matrix completion. In particular, we prove that the exact solutions that identify the target matrix are included as critical points by the commonly used nonconvex programs. Unlike the existing theories for nonconvex matrix completion, which are built upon the same condition as convex programs, our theory shows that nonconvex programs have the potential to work with a much weaker condition. Compared to the existing studies on nonuniform sampling, our setup is more general. |
Tasks | Matrix Completion |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6680-a-new-theory-for-matrix-completion |
http://papers.nips.cc/paper/6680-a-new-theory-for-matrix-completion.pdf | |
PWC | https://paperswithcode.com/paper/a-new-theory-for-matrix-completion |
Repo | |
Framework | |
Identifying dialects with textual and acoustic cues
Title | Identifying dialects with textual and acoustic cues |
Authors | Abualsoud Hanani, Aziz Qaroush, Stephen Taylor |
Abstract | We describe several systems for identifying short samples of Arabic or Swiss-German dialects, which were prepared for the shared task of the 2017 DSL Workshop (Zampieri et al., 2017). The Arabic data comprises both text and acoustic files, and our best run combined both. The Swiss-German data is text-only. Coincidentally, our best runs achieved an accuracy of nearly 63% on both the Swiss-German and Arabic dialects tasks. |
Tasks | Speech Recognition |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1211/ |
https://www.aclweb.org/anthology/W17-1211 | |
PWC | https://paperswithcode.com/paper/identifying-dialects-with-textual-and |
Repo | |
Framework | |
Language-Independent Named Entity Analysis Using Parallel Projection and Rule-Based Disambiguation
Title | Language-Independent Named Entity Analysis Using Parallel Projection and Rule-Based Disambiguation |
Authors | James Mayfield, Paul McNamee, Cash Costello |
Abstract | The 2017 shared task at the Balto-Slavic NLP workshop requires identifying coarse-grained named entities in seven languages, identifying each entity's base form, and clustering name mentions across the multilingual set of documents. The fact that no training data is provided to systems for building supervised classifiers further adds to the complexity. To complete the task we first use publicly available parallel texts to project named entity recognition capability from English to each evaluation language. We ignore entirely the subtask of identifying non-inflected forms of names. Finally, we create cross-document entity identifiers by clustering named mentions using a procedure-based approach. |
Tasks | Named Entity Recognition |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1414/ |
https://www.aclweb.org/anthology/W17-1414 | |
PWC | https://paperswithcode.com/paper/language-independent-named-entity-analysis |
Repo | |
Framework | |
Real-Time Keyword Extraction from Conversations
Title | Real-Time Keyword Extraction from Conversations |
Authors | Polykarpos Meladianos, Antoine Tixier, Ioannis Nikolentzos, Michalis Vazirgiannis |
Abstract | We introduce a novel method to extract keywords from meeting speech in real-time. Our approach builds on the graph-of-words representation of text and leverages the k-core decomposition algorithm and properties of submodular functions. We outperform multiple baselines in a real-time scenario emulated from the AMI and ICSI meeting corpora. Evaluation is conducted against both extractive and abstractive gold standard using two standard performance metrics and a newer one based on word embeddings. |
Tasks | Keyword Extraction, Speech Recognition, Word Embeddings |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-2074/ |
https://www.aclweb.org/anthology/E17-2074 | |
PWC | https://paperswithcode.com/paper/real-time-keyword-extraction-from |
Repo | |
Framework | |
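The abstract above builds on a graph-of-words representation and k-core decomposition. The sketch below illustrates those two ingredients only, assuming an undirected, unweighted graph in which terms are linked when they co-occur within a sliding window. The window size, function names, and toy tokens are assumptions, and the submodularity-based part of the method is not covered.

```python
from collections import defaultdict


def build_graph_of_words(tokens, window=3):
    """Link each token to the distinct tokens that follow it within the window."""
    graph = defaultdict(set)
    for i, token in enumerate(tokens):
        graph[token]  # make sure every token appears as a node
        for neighbor in tokens[i + 1:i + window]:
            if neighbor != token:
                graph[token].add(neighbor)
                graph[neighbor].add(token)
    return graph


def core_numbers(graph):
    """Compute each node's core number by repeatedly peeling a minimum-degree node."""
    degrees = {node: len(neighbors) for node, neighbors in graph.items()}
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degrees.get)
        k = max(k, degrees[node])  # core numbers never decrease while peeling
        core[node] = k
        remaining.remove(node)
        for neighbor in graph[node]:
            if neighbor in remaining:
                degrees[neighbor] -= 1
    return core


# Toy input: "a", "b", "c" form a triangle; "d" hangs off "c".
tokens = ["a", "b", "c", "a", "b", "c", "d"]
cores = core_numbers(build_graph_of_words(tokens, window=2))
max_core = max(cores.values())
main_core_terms = sorted(term for term, c in cores.items() if c == max_core)
```

Terms in the highest core are densely interconnected, so a simple keyword heuristic is to return `main_core_terms`. The minimum search makes this version quadratic in the number of nodes, which is acceptable for a sketch but not for large graphs.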
Multilabel Classification with Group Testing and Codes
Title | Multilabel Classification with Group Testing and Codes |
Authors | Shashanka Ubaru, Arya Mazumdar |
Abstract | In recent years, the multiclass and multilabel classification problems we encounter in many applications have a very large number of classes ($10^3$–$10^6$). However, each instance belongs to only one or a few classes, i.e., the label vectors are sparse. In this work, we propose a novel approach based on group testing to solve such large multilabel classification problems with sparse label vectors. We describe various group testing constructions, and advocate the use of concatenated Reed Solomon codes and unbalanced bipartite expander graphs for extreme classification problems. The proposed approach has several advantages theoretically and practically over existing popular methods. Our method operates on the binary alphabet and can utilize the well-established binary classifiers for learning. The error correction capabilities of the codes are leveraged for the first time in the learning problem to correct prediction errors. Even if a linearly growing number of classifiers mis-classify, these errors are fully corrected. We establish Hamming loss error bounds for the approach. More importantly, our method utilizes a simple prediction algorithm and does not require matrix inversion or solving optimization problems, making the algorithm very inexpensive. Numerical experiments with various datasets illustrate the superior performance of our method. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=713 |
http://proceedings.mlr.press/v70/ubaru17a/ubaru17a.pdf | |
PWC | https://paperswithcode.com/paper/multilabel-classification-with-group-testing |
Repo | |
Framework | |
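The abstract above reduces a sparse label vector to a shorter binary vector via group testing. The sketch below illustrates the general idea, assuming a Boolean OR encoding and a simple decoder that keeps a label unless too many of its tests come back negative. The tiny test matrix, the `tolerance` parameter, and the function names are illustrative assumptions, and the paper's code constructions (Reed Solomon codes, expander graphs) are not reproduced.

```python
def encode(test_matrix, labels):
    """Each test outcome is the Boolean OR of the labels included in that test."""
    return [
        int(any(row[j] and labels[j] for j in range(len(labels))))
        for row in test_matrix
    ]


def decode(test_matrix, outcomes, tolerance=0):
    """Keep label j if at most `tolerance` of its tests are negative."""
    num_labels = len(test_matrix[0])
    decoded = []
    for j in range(num_labels):
        # A negative test that contains label j is evidence against label j.
        violations = sum(
            1 for row, outcome in zip(test_matrix, outcomes)
            if row[j] and not outcome
        )
        decoded.append(int(violations <= tolerance))
    return decoded


# Assumed toy matrix: 4 tests (rows) over 4 labels (columns); no column is
# contained in another, so a single active label is recoverable.
TEST_MATRIX = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
labels = [0, 1, 0, 0]
outcomes = encode(TEST_MATRIX, labels)
recovered = decode(TEST_MATRIX, outcomes)
```

In a learning setting, one binary classifier per test would predict the entries of `outcomes`. A nonzero `tolerance` only helps with a matrix that has enough redundancy, which this four-test toy matrix does not.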
IWCS 2017 — 12th International Conference on Computational Semantics — Short papers
Title | IWCS 2017 — 12th International Conference on Computational Semantics — Short papers |
Authors | |
Abstract | |
Tasks | |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-6900/ |
https://www.aclweb.org/anthology/W17-6900 | |
PWC | https://paperswithcode.com/paper/iwcs-2017-a-12th-international-conference-on |
Repo | |
Framework | |
Learning to Compose Spatial Relations with Grounded Neural Language Models
Title | Learning to Compose Spatial Relations with Grounded Neural Language Models |
Authors | Mehdi Ghanimifard, Simon Dobnik |
Abstract | |
Tasks | Semantic Textual Similarity, Word Embeddings |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-6808/ |
https://www.aclweb.org/anthology/W17-6808 | |
PWC | https://paperswithcode.com/paper/learning-to-compose-spatial-relations-with |
Repo | |
Framework | |
Improving Polish Mention Detection with Valency Dictionary
Title | Improving Polish Mention Detection with Valency Dictionary |
Authors | Maciej Ogrodniczuk, Bartłomiej Nitoń |
Abstract | This paper presents results of an experiment integrating information from a valency dictionary of Polish into a mention detection system. Two types of information are acquired: positions of syntactic schemata for nominal and verbal constructs, and secondary prepositions present in schemata. The syntactic schemata are used to prevent (for verbal realizations) or encourage (for nominal groups) constructing mentions from phrases filling multiple schema positions, while the secondary prepositions are used to filter out artificial mentions created from their nominal components. Mention detection is evaluated against the manual annotation of the Polish Coreference Corpus in two settings: taking into account only mention heads or exact borders. |
Tasks | Coreference Resolution |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1503/ |
https://www.aclweb.org/anthology/W17-1503 | |
PWC | https://paperswithcode.com/paper/improving-polish-mention-detection-with |
Repo | |
Framework | |
Comprehensive annotation of cross-linguistic variation in tense and aspect categories
Title | Comprehensive annotation of cross-linguistic variation in tense and aspect categories |
Authors | Mark-Matthias Zymla |
Abstract | |
Tasks | |
Published | 2017-01-01 |
URL | https://www.aclweb.org/anthology/W17-6817/ |
https://www.aclweb.org/anthology/W17-6817 | |
PWC | https://paperswithcode.com/paper/comprehensive-annotation-of-cross-linguistic |
Repo | |
Framework | |
A Google-Proof Collection of French Winograd Schemas
Title | A Google-Proof Collection of French Winograd Schemas |
Authors | Pascal Amsili, Olga Seminck |
Abstract | This article presents the first collection of French Winograd Schemas. Winograd Schemas form anaphora resolution problems that can only be resolved with extensive world knowledge. For this reason the Winograd Schema Challenge has been proposed as an alternative to the Turing Test. A very important feature of Winograd Schemas is that it should be impossible to resolve them with statistical information about word co-occurrences: they should be Google-proof. We propose a measure of Google-proofness based on Mutual Information, and demonstrate the method on our collection of French Winograd Schemas. |
Tasks | Coreference Resolution |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1504/ |
https://www.aclweb.org/anthology/W17-1504 | |
PWC | https://paperswithcode.com/paper/a-google-proof-collection-of-french-winograd |
Repo | |
Framework | |
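The abstract above proposes a Google-proofness measure based on Mutual Information without giving the formula. The sketch below computes pointwise mutual information from co-occurrence counts, which is one common way to quantify word association. The function name and the counts are hypothetical, and the authors' actual measure may be defined differently.

```python
import math


def pointwise_mutual_information(count_xy, count_x, count_y, total):
    """PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated from raw counts."""
    if min(count_xy, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts: a key word co-occurring with two candidate antecedents.
pmi_first = pointwise_mutual_information(40, 100, 100, 1000)
pmi_second = pointwise_mutual_information(10, 100, 100, 1000)
association_gap = abs(pmi_first - pmi_second)
```

Under this reading, a schema would look harder to solve from co-occurrence statistics alone when `association_gap` is small, since neither candidate is favored by its association with the key word.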
Neural Machine Translation: Basics, Practical Aspects and Recent Trends
Title | Neural Machine Translation: Basics, Practical Aspects and Recent Trends |
Authors | Fabien Cromieres, Toshiaki Nakazawa, Raj Dabre |
Abstract | Machine Translation (MT) is a sub-field of NLP which has experienced a number of paradigm shifts since its inception. Up until 2014, Phrase Based Statistical Machine Translation (PBSMT) approaches used to be the state of the art. In late 2014, Neural Machine Translation (NMT) was introduced and was proven to outperform all PBSMT approaches by a significant margin. Since then, the NMT approaches have undergone several transformations which have pushed the state of the art even further. This tutorial is primarily aimed at researchers who are either interested in or are fairly new to the world of NMT and want to obtain a deep understanding of NMT fundamentals. Because it will also cover the latest developments in NMT, it should also be useful to attendees with some experience in NMT. |
Tasks | Image Captioning, Machine Translation |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-5004/ |
https://www.aclweb.org/anthology/I17-5004 | |
PWC | https://paperswithcode.com/paper/neural-machine-translation-basics-practical |
Repo | |
Framework | |
Sentiment Analysis of Arabic Tweets Using Semantic Resources
Title | Sentiment Analysis of Arabic Tweets Using Semantic Resources |
Authors | Lamia Al-Horaibi, Muhammad Badruddin Khan |
Abstract | Sentiment analysis has grown to be one of the most active research areas in natural language processing and text mining. Many researchers have investigated sentiment analysis and opinion mining from different classification approaches. However, limited research is conducted on Arabic sentiment analysis as compared to the English language. In this paper, we have proposed and implemented a technique for Twitter Arabic sentiment analysis consisting of a semantic approach and Arabic linguistic features. Hence, we introduced a mechanism for preprocessing Arabic tweets, and for the methodology of sentiment classification we used a semantic approach. Also, we proposed a technique of classification which uses both Arabic and English sentiment lexicons to classify the Arabic tweets into three sentiment categories (positive, negative, or neutral). Our experiments show that many issues were encountered when we used the Arabic SentiWordNet facility to classify Arabic tweets directly; these issues are basically related to Arabic text processing. The Arabic lexicons and Arabic tools must be improved or built from scratch in order to improve Arabic sentiment analysis using the semantic approach. The improvement in results, which is due to our contribution in the form of enhanced Arabic lexicons and amended Arabic tools, demonstrates this need. |
Tasks | Arabic Sentiment Analysis, Opinion Mining, Sentiment Analysis |
Published | 2017-01-30 |
URL | http://www.ijcis.info/Vol13N1/Vol13N1PP9-14.pdf |
http://www.ijcis.info/Vol13N1/Vol13N1PP9-14.pdf | |
PWC | https://paperswithcode.com/paper/sentiment-analysis-of-arabic-tweets-using |
Repo | |
Framework | |
These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution
Title | These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution |
Authors | Corina Koolen, Andreas van Cranenburgh |
Abstract | Stylometric and text categorization results show that author gender can be discerned in texts with relatively high accuracy. However, it is difficult to explain what gives rise to these results and there are many possible confounding factors, such as the domain, genre, and target audience of a text. More fundamentally, such classification efforts risk invoking stereotyping and essentialism. We explore this issue in two datasets of Dutch literary novels, using commonly used descriptive (LIWC, topic modeling) and predictive (machine learning) methods. Our results show the importance of controlling for variables in the corpus and we argue for taking care not to overgeneralize from the results. |
Tasks | Text Categorization |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1602/ |
https://www.aclweb.org/anthology/W17-1602 | |
PWC | https://paperswithcode.com/paper/these-are-not-the-stereotypes-you-are-looking |
Repo | |
Framework | |