Paper Group NANR 119
On orthogonality and learning RNNs with long term dependencies
Title | On orthogonality and learning RNNs with long term dependencies |
Authors | Eugene Vorontsov, Chiheb Trabelsi, Samuel Kadoury, Chris Pal |
Abstract | It is well known that it is challenging to train deep neural networks and recurrent neural networks for tasks that exhibit long term dependencies. The vanishing or exploding gradient problem is a well known issue associated with these challenges. One approach to addressing vanishing and exploding gradients is to use either soft or hard constraints on weight matrices so as to encourage or enforce orthogonality. Orthogonal matrices preserve gradient norm during backpropagation and may therefore be a desirable property. This paper explores issues with optimization convergence, speed and gradient stability when encouraging or enforcing orthogonality. To perform this analysis, we propose a weight matrix factorization and parameterization strategy through which we can bound matrix norms and therein control the degree of expansivity induced during backpropagation. We find that hard constraints on orthogonality can negatively affect the speed of convergence and model performance. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=740 |
http://proceedings.mlr.press/v70/vorontsov17a/vorontsov17a.pdf | |
PWC | https://paperswithcode.com/paper/on-orthogonality-and-learning-rnns-with-long |
Repo | |
Framework | |
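The abstract above contrasts soft and hard orthogonality constraints and notes that orthogonal matrices preserve gradient norm. As a minimal sketch only, the snippet below illustrates the generic soft penalty, the squared Frobenius norm of W^T W - I, together with the norm-preservation property. It is not the paper's factorization or parameterization strategy; the helper names (`orthogonality_penalty`, `apply_matrix`) and the 2x2 example matrices are assumptions chosen for illustration.

```python
import math


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def orthogonality_penalty(w):
    """Soft constraint: squared Frobenius norm of W^T W - I (zero iff W is orthogonal)."""
    gram = matmul(transpose(w), w)
    n = len(gram)
    return sum(
        (gram[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(n)
        for j in range(n)
    )


def apply_matrix(w, v):
    """Matrix-vector product, standing in for one step of backpropagation through W."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


# Hypothetical example matrices: a rotation (orthogonal) and a scaled copy (expansive).
theta = 0.3
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
scaled = [[1.5 * x for x in row] for row in rotation]

grad = [3.0, 4.0]  # stand-in gradient vector with norm 5
norm_after_rotation = norm(apply_matrix(rotation, grad))  # norm is preserved
norm_after_scaled = norm(apply_matrix(scaled, grad))      # norm grows by a factor of 1.5
```

In a training loop, a penalty of this kind would typically be added to the loss with a small weight, whereas a hard constraint would keep the penalty at exactly zero throughout training.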
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages
Title | Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages |
Authors | |
Abstract | |
Tasks | |
Published | 2017-03-01 |
URL | https://www.aclweb.org/anthology/W17-0100/ |
https://www.aclweb.org/anthology/W17-0100 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-2nd-workshop-on-the-use-of |
Repo | |
Framework | |
Document retrieval and question answering in medical documents. A large-scale corpus challenge.
Title | Document retrieval and question answering in medical documents. A large-scale corpus challenge. |
Authors | Curea Eric |
Abstract | When applied to large datasets, information retrieval works by isolating a subset of documents from the larger collection and then proceeding with low-level processing of the text. This is usually carried out by adding index terms to each document in the collection. In this paper we deal with automatic document classification and index-term detection applied to large-scale medical corpora. In our methodology we employ a linear classifier and test our results on the BioASQ training corpus, a collection of 12 million MeSH-indexed medical abstracts. We cover term indexing, result retrieval, and result ranking based on distributed word representations. |
Tasks | Document Classification, Information Retrieval, Question Answering |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-8001/ |
https://doi.org/10.26615/978-954-452-044-1_001 | |
PWC | https://paperswithcode.com/paper/document-retrieval-and-question-answering-in |
Repo | |
Framework | |
Rule-based Machine translation from English to Finnish
Title | Rule-based Machine translation from English to Finnish |
Authors | Arvi Hurskainen, Jörg Tiedemann |
Abstract | |
Tasks | Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4731/ |
https://www.aclweb.org/anthology/W17-4731 | |
PWC | https://paperswithcode.com/paper/rule-based-machine-translation-from-english |
Repo | |
Framework | |
EUDAMU at SemEval-2017 Task 11: Action Ranking and Type Matching for End-User Development
Title | EUDAMU at SemEval-2017 Task 11: Action Ranking and Type Matching for End-User Development |
Authors | Marek Kubis, Paweł Skórzewski, Tomasz Ziętkiewicz |
Abstract | The paper describes a system for end-user development using natural language. Our approach uses a ranking model to identify the actions to be executed followed by reference and parameter matching models to select parameter values that should be set for the given commands. We discuss the results of evaluation and possible improvements for future work. |
Tasks | Action Detection, Tokenization |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2175/ |
https://www.aclweb.org/anthology/S17-2175 | |
PWC | https://paperswithcode.com/paper/eudamu-at-semeval-2017-task-11-action-ranking |
Repo | |
Framework | |
The Effect of Translationese on Tuning for Statistical Machine Translation
Title | The Effect of Translationese on Tuning for Statistical Machine Translation |
Authors | Sara Stymne |
Abstract | |
Tasks | Language Modelling, Machine Translation, Text Classification |
Published | 2017-05-01 |
URL | https://www.aclweb.org/anthology/W17-0230/ |
https://www.aclweb.org/anthology/W17-0230 | |
PWC | https://paperswithcode.com/paper/the-effect-of-translationese-on-tuning-for |
Repo | |
Framework | |
Evaluating Feature Extraction Methods for Knowledge-based Biomedical Word Sense Disambiguation
Title | Evaluating Feature Extraction Methods for Knowledge-based Biomedical Word Sense Disambiguation |
Authors | Sam Henry, Clint Cuffy, Bridget McInnes |
Abstract | In this paper, we present an analysis of feature extraction methods via dimensionality reduction for the task of biomedical Word Sense Disambiguation (WSD). We modify the vector representations in the 2-MRD WSD algorithm, and evaluate four dimensionality reduction methods: Word Embeddings using Continuous Bag of Words and Skip Gram, Singular Value Decomposition (SVD), and Principal Component Analysis (PCA). We also evaluate the effects of vector size on the performance of each of these methods. Results are evaluated on five standard evaluation datasets (Abbrev.100, Abbrev.200, Abbrev.300, NLM-WSD, and MSH-WSD). We find that vector sizes of 100 are sufficient for all techniques except SVD, for which a vector size of 1500 is preferred. We also show that SVD performs on par with Word Embeddings for all but one dataset. |
Tasks | Dimensionality Reduction, Information Retrieval, Question Answering, Word Embeddings, Word Sense Disambiguation |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2334/ |
https://www.aclweb.org/anthology/W17-2334 | |
PWC | https://paperswithcode.com/paper/evaluating-feature-extraction-methods-for |
Repo | |
Framework | |
Chinese Answer Extraction Based on POS Tree and Genetic Algorithm
Title | Chinese Answer Extraction Based on POS Tree and Genetic Algorithm |
Authors | Shuihua Li, Xiaoming Zhang, Zhoujun Li |
Abstract | Answer extraction is the most important part of a Chinese web-based question answering system. In order to enhance the robustness and adaptability of answer extraction to new domains and eliminate the influence of incomplete and noisy search snippets, we propose two new answer extraction methods. We utilize text patterns to generate Part-of-Speech (POS) patterns. In addition, a method is proposed to construct a POS tree by using these POS patterns. The POS tree is useful for candidate answer extraction in web-based question answering. To retrieve an efficient POS tree, the similarities between questions are used to select the question-answer pairs whose questions are similar to the unanswered question. Then, the POS tree is improved based on these question-answer pairs. In order to rank the candidate answers, the weights of the leaf nodes of the POS tree are calculated using a heuristic method. Moreover, a Genetic Algorithm (GA) is used to train the weights. The experimental results of 10-fold cross-validation show that the weighted POS tree trained by GA can improve the accuracy of answer extraction. |
Tasks | Information Retrieval, Question Answering |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/W17-6004/ |
https://www.aclweb.org/anthology/W17-6004 | |
PWC | https://paperswithcode.com/paper/chinese-answer-extraction-based-on-pos-tree |
Repo | |
Framework | |
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications
Title | Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications |
Authors | |
Abstract | |
Tasks | |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1900/ |
https://www.aclweb.org/anthology/W17-1900 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-1st-workshop-on-sense |
Repo | |
Framework | |
Arabic Diacritization: Stats, Rules, and Hacks
Title | Arabic Diacritization: Stats, Rules, and Hacks |
Authors | Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali |
Abstract | In this paper, we present a new and fast state-of-the-art Arabic diacritizer that guesses the diacritics of words and then their case endings. We employ a Viterbi decoder at word-level with back-off to stem, morphological patterns, and transliteration and sequence labeling based diacritization of named entities. For case endings, we use Support Vector Machine (SVM) based ranking coupled with morphological patterns and linguistic rules to properly guess case endings. We achieve a low word level diacritization error of 3.29% and 12.77% without and with case endings respectively on a new multi-genre free of copyright test set. We are making the diacritizer available for free for research purposes. |
Tasks | Part-Of-Speech Tagging, Transliteration, Word Sense Disambiguation |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/W17-1302/ |
https://www.aclweb.org/anthology/W17-1302 | |
PWC | https://paperswithcode.com/paper/arabic-diacritization-stats-rules-and-hacks |
Repo | |
Framework | |
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Title | Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017 |
Authors | Ruslan Mitkov, Galia Angelova |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/papers/R17-1000/r17-1000 |
https://www.aclweb.org/anthology/R17-1000 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-international-conference |
Repo | |
Framework | |
JU CSE NLP @ SemEval 2017 Task 7: Employing Rules to Detect and Interpret English Puns
Title | JU CSE NLP @ SemEval 2017 Task 7: Employing Rules to Detect and Interpret English Puns |
Authors | Aniket Pramanick, Dipankar Das |
Abstract | System description. Implementation of HMM and Cyclic Dependency Network. |
Tasks | Word Sense Disambiguation |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2073/ |
https://www.aclweb.org/anthology/S17-2073 | |
PWC | https://paperswithcode.com/paper/ju-cse-nlp-semeval-2017-task-7-employing |
Repo | |
Framework | |
Adapting Pre-trained Word Embeddings For Use In Medical Coding
Title | Adapting Pre-trained Word Embeddings For Use In Medical Coding |
Authors | Kevin Patel, Divya Patel, Mansi Golakiya, Pushpak Bhattacharyya, Nilesh Birari |
Abstract | Word embeddings are a crucial component in modern NLP. Pre-trained embeddings released by different groups have been a major reason for their popularity. However, they are trained on generic corpora, which limits their direct use for domain-specific tasks. In this paper, we propose a method to add task-specific information to pre-trained word embeddings. Such information can improve their utility. We add information from medical coding data, as well as the first level of the hierarchy of the ICD-10 medical code set, to different pre-trained word embeddings. We adapt the CBOW algorithm from the word2vec package for our purpose. We evaluated our approach on five different pre-trained word embeddings. Both the original word embeddings and their modified versions (the ones with added information) were used for automated review of medical coding. The modified word embeddings give an improvement in F-score of 1% on the 5-fold evaluation on a private medical claims dataset. Our results show that adding extra information is possible and beneficial for the task at hand. |
Tasks | Word Embeddings |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2338/ |
https://www.aclweb.org/anthology/W17-2338 | |
PWC | https://paperswithcode.com/paper/adapting-pre-trained-word-embeddings-for-use |
Repo | |
Framework | |
Random Permutation Online Isotonic Regression
Title | Random Permutation Online Isotonic Regression |
Authors | Wojciech Kotlowski, Wouter M. Koolen, Alan Malek |
Abstract | We revisit isotonic regression on linear orders, the problem of fitting monotonic functions to best explain the data, in an online setting. It was previously shown that online isotonic regression is unlearnable in a fully adversarial model, which led to its study in the fixed design model. Here, we instead develop the more practical random permutation model. We show that the regret is bounded above by the excess leave-one-out loss for which we develop efficient algorithms and matching lower bounds. We also analyze the class of simple and popular forward algorithms and recommend where to look for algorithms for online isotonic regression on partial orders. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/7006-random-permutation-online-isotonic-regression |
http://papers.nips.cc/paper/7006-random-permutation-online-isotonic-regression.pdf | |
PWC | https://paperswithcode.com/paper/random-permutation-online-isotonic-regression |
Repo | |
Framework | |
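The abstract above describes isotonic regression on linear orders as fitting a monotonic function to best explain the data. As a minimal sketch of that underlying offline problem only, the snippet below computes the least-squares non-decreasing fit with the standard pool-adjacent-violators procedure. It does not implement the online, random-permutation, or leave-one-out algorithms studied in the paper; the function name `isotonic_fit` and the sample inputs are assumptions for illustration.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators.

    Each block stores [sum, count]; adjacent blocks whose means violate
    monotonicity are merged, and every point in a block receives the block mean.
    """
    blocks = []
    for value in values:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Hypothetical labels listed in the order of their positions on the line.
fit_mixed = isotonic_fit([1, 3, 2, 4])   # the violating pair (3, 2) is pooled to 2.5
fit_decreasing = isotonic_fit([3, 2, 1]) # fully decreasing input collapses to its mean
```

The fit is piecewise constant: every maximal run of points that would otherwise break monotonicity is replaced by its average.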
Assessing the performance of Olelo, a real-time biomedical question answering application
Title | Assessing the performance of Olelo, a real-time biomedical question answering application |
Authors | Mariana Neves, Fabian Eckert, Hendrik Folkerts, Matthias Uflacker |
Abstract | Question answering (QA) can help physicians and biomedical researchers find answers to their questions in the scientific literature. Such systems process large collections of documents in real time and include many natural language processing (NLP) procedures. We recently developed Olelo, a QA system for biomedicine which includes various NLP components, such as question processing, document and passage retrieval, answer processing and multi-document summarization. In this work, we present an evaluation of our system on the fifth BioASQ challenge. We participated with the current state of the application and with an extension based on semantic role labeling that we are currently investigating. In addition to the BioASQ evaluation, we compared our system to other on-line biomedical QA systems in terms of the response time and the quality of the answers. |
Tasks | Document Summarization, Information Retrieval, Multi-Document Summarization, Question Answering, Semantic Role Labeling |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2344/ |
https://www.aclweb.org/anthology/W17-2344 | |
PWC | https://paperswithcode.com/paper/assessing-the-performance-of-olelo-a-real |
Repo | |
Framework | |