July 26, 2019

Paper Group NANR 119

On orthogonality and learning RNNs with long term dependencies


Title	On orthogonality and learning RNNs with long term dependencies
Authors	Eugene Vorontsov, Chiheb Trabelsi, Samuel Kadoury, Chris Pal
Abstract	It is well known that it is challenging to train deep neural networks and recurrent neural networks for tasks that exhibit long term dependencies. The vanishing or exploding gradient problem is a well known issue associated with these challenges. One approach to addressing vanishing and exploding gradients is to use either soft or hard constraints on weight matrices so as to encourage or enforce orthogonality. Orthogonal matrices preserve gradient norm during backpropagation and may therefore be a desirable property. This paper explores issues with optimization convergence, speed and gradient stability when encouraging or enforcing orthogonality. To perform this analysis, we propose a weight matrix factorization and parameterization strategy through which we can bound matrix norms and therein control the degree of expansivity induced during backpropagation. We find that hard constraints on orthogonality can negatively affect the speed of convergence and model performance.
Tasks
Published	2017-08-01
URL	https://icml.cc/Conferences/2017/Schedule?showEvent=740
PDF	http://proceedings.mlr.press/v70/vorontsov17a/vorontsov17a.pdf
PWC	https://paperswithcode.com/paper/on-orthogonality-and-learning-rnns-with-long
Repo
Framework
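
The abstract above contrasts soft and hard orthogonality constraints. As a minimal sketch, assuming a soft penalty of the form ||W^T W - I||_F^2 (the function name and the toy matrices are hypothetical and not taken from the paper's code), the snippet below computes that penalty and checks that an orthogonal matrix leaves a vector's norm unchanged:

```python
import numpy as np

def soft_orthogonality_penalty(W):
    """Squared Frobenius norm of (W^T W - I); zero when W has orthonormal columns."""
    n = W.shape[1]
    gram = W.T @ W
    return float(np.sum((gram - np.eye(n)) ** 2))

# Build an orthogonal matrix from the QR decomposition of a random matrix.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# Multiplying a stand-in gradient vector by Q^T should preserve its norm.
g = rng.standard_normal(4)
norm_ratio = np.linalg.norm(Q.T @ g) / np.linalg.norm(g)
```

A penalty like this would typically be added to the training loss with a small weight; a hard constraint would instead keep the matrix exactly orthogonal throughout training.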

Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages


Title	Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages
Authors
Abstract
Tasks
Published	2017-03-01
URL	https://www.aclweb.org/anthology/W17-0100/
PDF	https://www.aclweb.org/anthology/W17-0100
PWC	https://paperswithcode.com/paper/proceedings-of-the-2nd-workshop-on-the-use-of
Repo
Framework

Document retrieval and question answering in medical documents. A large-scale corpus challenge.


Title	Document retrieval and question answering in medical documents. A large-scale corpus challenge.
Authors	Curea Eric
Abstract	Whenever employed on large datasets, information retrieval works by isolating a subset of documents from the larger dataset and then proceeding with low-level processing of the text. This is usually carried out by means of adding index-terms to each document in the collection. In this paper we deal with automatic document classification and index-term detection applied on large-scale medical corpora. In our methodology we employ a linear classifier and we test our results on the BioASQ training corpus, which is a collection of 12 million MeSH-indexed medical abstracts. We cover term-indexing, result retrieval, and result ranking based on distributed word representations.
Tasks	Document Classification, Information Retrieval, Question Answering
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-8001/
PDF	https://doi.org/10.26615/978-954-452-044-1_001
PWC	https://paperswithcode.com/paper/document-retrieval-and-question-answering-in
Repo
Framework

Rule-based Machine translation from English to Finnish


Title	Rule-based Machine translation from English to Finnish
Authors	Arvi Hurskainen, Jörg Tiedemann
Abstract
Tasks	Machine Translation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4731/
PDF	https://www.aclweb.org/anthology/W17-4731
PWC	https://paperswithcode.com/paper/rule-based-machine-translation-from-english
Repo
Framework

EUDAMU at SemEval-2017 Task 11: Action Ranking and Type Matching for End-User Development


Title	EUDAMU at SemEval-2017 Task 11: Action Ranking and Type Matching for End-User Development
Authors	Marek Kubis, Paweł Skórzewski, Tomasz Ziętkiewicz
Abstract	The paper describes a system for end-user development using natural language. Our approach uses a ranking model to identify the actions to be executed followed by reference and parameter matching models to select parameter values that should be set for the given commands. We discuss the results of evaluation and possible improvements for future work.
Tasks	Action Detection, Tokenization
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2175/
PDF	https://www.aclweb.org/anthology/S17-2175
PWC	https://paperswithcode.com/paper/eudamu-at-semeval-2017-task-11-action-ranking
Repo
Framework

The Effect of Translationese on Tuning for Statistical Machine Translation


Title	The Effect of Translationese on Tuning for Statistical Machine Translation
Authors	Sara Stymne
Abstract
Tasks	Language Modelling, Machine Translation, Text Classification
Published	2017-05-01
URL	https://www.aclweb.org/anthology/W17-0230/
PDF	https://www.aclweb.org/anthology/W17-0230
PWC	https://paperswithcode.com/paper/the-effect-of-translationese-on-tuning-for
Repo
Framework

Evaluating Feature Extraction Methods for Knowledge-based Biomedical Word Sense Disambiguation


Title	Evaluating Feature Extraction Methods for Knowledge-based Biomedical Word Sense Disambiguation
Authors	Sam Henry, Clint Cuffy, Bridget McInnes
Abstract	In this paper, we present an analysis of feature extraction methods via dimensionality reduction for the task of biomedical Word Sense Disambiguation (WSD). We modify the vector representations in the 2-MRD WSD algorithm, and evaluate four dimensionality reduction methods: Word Embeddings using Continuous Bag of Words and Skip Gram, Singular Value Decomposition (SVD), and Principal Component Analysis (PCA). We also evaluate the effects of vector size on the performance of each of these methods. Results are evaluated on five standard evaluation datasets (Abbrev.100, Abbrev.200, Abbrev.300, NLM-WSD, and MSH-WSD). We find that vector sizes of 100 are sufficient for all techniques except SVD, for which a vector size of 1500 is preferred. We also show that SVD performs on par with Word Embeddings for all but one dataset.
Tasks	Dimensionality Reduction, Information Retrieval, Question Answering, Word Embeddings, Word Sense Disambiguation
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-2334/
PDF	https://www.aclweb.org/anthology/W17-2334
PWC	https://paperswithcode.com/paper/evaluating-feature-extraction-methods-for
Repo
Framework
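
One of the reduction methods named in the abstract above is SVD. The sketch below is an illustration only: it truncates the SVD of a tiny made-up matrix to k dimensions. The function name `reduce_with_svd` and the toy data are hypothetical, and the paper's 2-MRD pipeline is not reproduced here.

```python
import numpy as np

def reduce_with_svd(X, k):
    """Project the rows of X onto the top-k singular directions."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the left singular vectors by their singular values.
    return U[:, :k] * s[:k]

# Toy "word by context" count matrix; rows 0 and 2 are identical on purpose.
X = np.array([[2.0, 0.0, 1.0, 0.0],
              [0.0, 3.0, 0.0, 1.0],
              [2.0, 0.0, 1.0, 0.0]])

reduced = reduce_with_svd(X, 2)
```

Unlike PCA, this sketch does not center the columns before decomposing, which is one common way the two methods differ in practice.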

Chinese Answer Extraction Based on POS Tree and Genetic Algorithm


Title	Chinese Answer Extraction Based on POS Tree and Genetic Algorithm
Authors	Shuihua Li, Xiaoming Zhang, Zhoujun Li
Abstract	Answer extraction is the most important part of a Chinese web-based question answering system. In order to enhance the robustness and adaptability of answer extraction to new domains and eliminate the influence of the incomplete and noisy search snippets, we propose two new answer extraction methods. We utilize text patterns to generate Part-of-Speech (POS) patterns. In addition, a method is proposed to construct a POS tree by using these POS patterns. The POS tree is useful for candidate answer extraction in web-based question answering. To retrieve an efficient POS tree, the similarities between questions are used to select the question-answer pairs whose questions are similar to the unanswered question. Then, the POS tree is improved based on these question-answer pairs. In order to rank these candidate answers, the weights of the leaf nodes of the POS tree are calculated using a heuristic method. Moreover, the Genetic Algorithm (GA) is used to train the weights. The experimental results of 10-fold cross-validation show that the weighted POS tree trained by GA can improve the accuracy of answer extraction.
Tasks	Information Retrieval, Question Answering
Published	2017-12-01
URL	https://www.aclweb.org/anthology/W17-6004/
PDF	https://www.aclweb.org/anthology/W17-6004
PWC	https://paperswithcode.com/paper/chinese-answer-extraction-based-on-pos-tree
Repo
Framework

Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications


Title	Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications
Authors
Abstract
Tasks
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1900/
PDF	https://www.aclweb.org/anthology/W17-1900
PWC	https://paperswithcode.com/paper/proceedings-of-the-1st-workshop-on-sense
Repo
Framework

Arabic Diacritization: Stats, Rules, and Hacks


Title	Arabic Diacritization: Stats, Rules, and Hacks
Authors	Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali
Abstract	In this paper, we present a new and fast state-of-the-art Arabic diacritizer that guesses the diacritics of words and then their case endings. We employ a Viterbi decoder at word-level with back-off to stem, morphological patterns, and transliteration and sequence labeling based diacritization of named entities. For case endings, we use Support Vector Machine (SVM) based ranking coupled with morphological patterns and linguistic rules to properly guess case endings. We achieve a low word level diacritization error of 3.29% and 12.77% without and with case endings respectively on a new multi-genre free of copyright test set. We are making the diacritizer available for free for research purposes.
Tasks	Part-Of-Speech Tagging, Transliteration, Word Sense Disambiguation
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1302/
PDF	https://www.aclweb.org/anthology/W17-1302
PWC	https://paperswithcode.com/paper/arabic-diacritization-stats-rules-and-hacks
Repo
Framework
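
The abstract above mentions a word-level Viterbi decoder. The following is a generic sketch of Viterbi decoding over per-position candidate lists, not the authors' system: the candidate labels, the probabilities, and the function signature are all made up for illustration, and the back-off steps described in the abstract are omitted.

```python
import math

def viterbi(candidates, emit_logp, trans_logp):
    """Return the highest-scoring label sequence.

    candidates: list of candidate-label lists, one per position.
    emit_logp(i, label) and trans_logp(prev, label) return log-probabilities.
    """
    # best maps each label at the current position to (score, path so far).
    best = {c: (emit_logp(0, c), [c]) for c in candidates[0]}
    for i in range(1, len(candidates)):
        new_best = {}
        for c in candidates[i]:
            score, path = max(
                ((s + trans_logp(p, c), prev_path) for p, (s, prev_path) in best.items()),
                key=lambda t: t[0],
            )
            new_best[c] = (score + emit_logp(i, c), path + [c])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]

# Toy two-word input with two hypothetical candidate forms per word.
toy_candidates = [["a1", "a2"], ["b1", "b2"]]
toy_trans = {("a1", "b1"): 0.1, ("a1", "b2"): 0.2,
             ("a2", "b1"): 0.9, ("a2", "b2"): 0.3}

best_path = viterbi(
    toy_candidates,
    emit_logp=lambda i, label: math.log(0.5),
    trans_logp=lambda prev, label: math.log(toy_trans[(prev, label)]),
)
```

Working in log space keeps the scores additive and avoids underflow on longer sequences.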

Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017


Title	Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Authors	Ruslan Mitkov, Galia Angelova
Abstract
Tasks
Published	2017-09-01
URL	https://www.aclweb.org/anthology/papers/R17-1000/r17-1000
PDF	https://www.aclweb.org/anthology/R17-1000
PWC	https://paperswithcode.com/paper/proceedings-of-the-international-conference
Repo
Framework

JU CSE NLP @ SemEval 2017 Task 7: Employing Rules to Detect and Interpret English Puns


Title	JU CSE NLP @ SemEval 2017 Task 7: Employing Rules to Detect and Interpret English Puns
Authors	Aniket Pramanick, Dipankar Das
Abstract	System description. Implementation of HMM and Cyclic Dependency Network.
Tasks	Word Sense Disambiguation
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2073/
PDF	https://www.aclweb.org/anthology/S17-2073
PWC	https://paperswithcode.com/paper/ju-cse-nlp-semeval-2017-task-7-employing
Repo
Framework

Adapting Pre-trained Word Embeddings For Use In Medical Coding


Title	Adapting Pre-trained Word Embeddings For Use In Medical Coding
Authors	Kevin Patel, Divya Patel, Mansi Golakiya, Pushpak Bhattacharyya, Nilesh Birari
Abstract	Word embeddings are a crucial component in modern NLP. Pre-trained embeddings released by different groups have been a major reason for their popularity. However, they are trained on generic corpora, which limits their direct use for domain specific tasks. In this paper, we propose a method to add task specific information to pre-trained word embeddings. Such information can improve their utility. We add information from medical coding data, as well as the first level from the hierarchy of ICD-10 medical code set to different pre-trained word embeddings. We adapt CBOW algorithm from the word2vec package for our purpose. We evaluated our approach on five different pre-trained word embeddings. Both the original word embeddings, and their modified versions (the ones with added information) were used for automated review of medical coding. The modified word embeddings give an improvement in f-score by 1% on the 5-fold evaluation on a private medical claims dataset. Our results show that adding extra information is possible and beneficial for the task at hand.
Tasks	Word Embeddings
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-2338/
PDF	https://www.aclweb.org/anthology/W17-2338
PWC	https://paperswithcode.com/paper/adapting-pre-trained-word-embeddings-for-use
Repo
Framework

Random Permutation Online Isotonic Regression


Title	Random Permutation Online Isotonic Regression
Authors	Wojciech Kotlowski, Wouter M. Koolen, Alan Malek
Abstract	We revisit isotonic regression on linear orders, the problem of fitting monotonic functions to best explain the data, in an online setting. It was previously shown that online isotonic regression is unlearnable in a fully adversarial model, which led to its study in the fixed design model. Here, we instead develop the more practical random permutation model. We show that the regret is bounded above by the excess leave-one-out loss for which we develop efficient algorithms and matching lower bounds. We also analyze the class of simple and popular forward algorithms and recommend where to look for algorithms for online isotonic regression on partial orders.
Tasks
Published	2017-12-01
URL	http://papers.nips.cc/paper/7006-random-permutation-online-isotonic-regression
PDF	http://papers.nips.cc/paper/7006-random-permutation-online-isotonic-regression.pdf
PWC	https://paperswithcode.com/paper/random-permutation-online-isotonic-regression
Repo
Framework
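
For readers unfamiliar with the underlying problem in the abstract above, the sketch below fits a non-decreasing sequence to data in the ordinary offline, least-squares setting using the pool-adjacent-violators approach. It is an illustration only and does not implement the online or random permutation algorithms the paper studies; the function name is hypothetical.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    # Expand each block back to one fitted value per input point.
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

fit = isotonic_fit([1, 3, 2, 4])  # the 3 and 2 are pooled to their mean
```

Each violating pair is replaced by the mean of its pooled block, which is what makes the result the closest monotone sequence in squared error.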

Assessing the performance of Olelo, a real-time biomedical question answering application


Title	Assessing the performance of Olelo, a real-time biomedical question answering application
Authors	Mariana Neves, Fabian Eckert, Hendrik Folkerts, Matthias Uflacker
Abstract	Question answering (QA) can support physicians and biomedical researchers to find answers to their questions in the scientific literature. Such systems process large collections of documents in real time and include many natural language processing (NLP) procedures. We recently developed Olelo, a QA system for biomedicine which includes various NLP components, such as question processing, document and passage retrieval, answer processing and multi-document summarization. In this work, we present an evaluation of our system on the fifth BioASQ challenge. We participated with the current state of the application and with an extension based on semantic role labeling that we are currently investigating. In addition to the BioASQ evaluation, we compared our system to other on-line biomedical QA systems in terms of the response time and the quality of the answers.
Tasks	Document Summarization, Information Retrieval, Multi-Document Summarization, Question Answering, Semantic Role Labeling
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-2344/
PDF	https://www.aclweb.org/anthology/W17-2344
PWC	https://paperswithcode.com/paper/assessing-the-performance-of-olelo-a-real
Repo
Framework