Paper Group NANR 193
Identifying Outlier Arms in Multi-Armed Bandit. Wasserstein Generative Adversarial Networks. Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings. Classifying Semantic Clause Types: Modeling Context and Genre Characteristics with Recurrent Neural Networks and Attention. A Method to Generate a Machine-Lab …
Identifying Outlier Arms in Multi-Armed Bandit
Title | Identifying Outlier Arms in Multi-Armed Bandit |
Authors | Honglei Zhuang, Chi Wang, Yifan Wang |
Abstract | We study a novel problem lying at the intersection of two areas: multi-armed bandit and outlier detection. Multi-armed bandit is a useful tool to model the process of incrementally collecting data for multiple objects in a decision space. Outlier detection is a powerful method to narrow down the attention to a few objects after the data for them are collected. However, no one has studied how to detect outlier objects while incrementally collecting data for them, which is necessary when data collection is expensive. We formalize this problem as identifying outlier arms in a multi-armed bandit. We propose two sampling strategies with theoretical guarantee, and analyze their sampling efficiency. Our experimental results on both synthetic and real data show that our solution saves 70-99% of data collection cost from baseline while having nearly perfect accuracy. |
Tasks | Outlier Detection |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/7105-identifying-outlier-arms-in-multi-armed-bandit |
http://papers.nips.cc/paper/7105-identifying-outlier-arms-in-multi-armed-bandit.pdf | |
PWC | https://paperswithcode.com/paper/identifying-outlier-arms-in-multi-armed |
Repo | |
Framework | |
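The problem setting in the abstract above can be illustrated with a naive baseline. This is a minimal sketch, not the authors' sampling strategies: it assumes an arm counts as an outlier when its estimated mean reward exceeds the mean of all arm estimates by more than `k` standard deviations, and it samples every arm equally (round-robin). The names `make_bernoulli_arms`, `find_outlier_arms`, `pulls_per_arm`, and `k` are hypothetical.

```python
import random
import statistics


def make_bernoulli_arms(true_means, seed=0):
    """Return a pull(arm) function drawing 0/1 rewards (hypothetical helper)."""
    rng = random.Random(seed)

    def pull(arm):
        return 1.0 if rng.random() < true_means[arm] else 0.0

    return pull


def find_outlier_arms(pull, n_arms, pulls_per_arm=200, k=2.0):
    """Round-robin baseline: sample each arm equally, then flag arms whose
    estimated mean exceeds mean + k * std of all estimated means.
    The k-sigma threshold is an assumption made for this sketch."""
    totals = [0.0] * n_arms
    for _ in range(pulls_per_arm):
        for arm in range(n_arms):
            totals[arm] += pull(arm)
    means = [total / pulls_per_arm for total in totals]
    threshold = statistics.mean(means) + k * statistics.pstdev(means)
    return [arm for arm, m in enumerate(means) if m > threshold]


if __name__ == "__main__":
    pull = make_bernoulli_arms([0.1] * 9 + [0.9])
    print(find_outlier_arms(pull, n_arms=10))
```

With nine arms at reward probability 0.1 and one at 0.9, the sketch is expected to flag only the last arm. It spends the same budget on every arm, which is the kind of data collection cost an adaptive strategy would aim to reduce.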
Wasserstein Generative Adversarial Networks
Title | Wasserstein Generative Adversarial Networks |
Authors | Martin Arjovsky, Soumith Chintala, Léon Bottou |
Abstract | We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to different distances between distributions. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=799 |
http://proceedings.mlr.press/v70/arjovsky17a/arjovsky17a.pdf | |
PWC | https://paperswithcode.com/paper/wasserstein-generative-adversarial-networks |
Repo | |
Framework | |
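As a small illustration of the distance the paper's title refers to, the sketch below computes the Wasserstein-1 distance between two one-dimensional samples of equal size, where it reduces to the mean absolute difference between sorted values. This is not the WGAN training procedure, which the abstract does not detail; the function name `wasserstein_1d` is hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In one dimension the optimal transport plan matches sorted values,
    so the distance is the mean absolute difference of the sorted samples.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


if __name__ == "__main__":
    # Shifting every point by 1 moves each unit of mass a distance of 1.
    print(wasserstein_1d([0, 1, 2], [1, 2, 3]))
```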
Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings
Title | Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings |
Authors | Benjamin Marie, Atsushi Fujita |
Abstract | We propose a new method for extracting pseudo-parallel sentences from a pair of large monolingual corpora, without relying on any document-level information. Our method first exploits word embeddings in order to efficiently evaluate trillions of candidate sentence pairs and then a classifier to find the most reliable ones. We report significant improvements in domain adaptation for statistical machine translation when using a translation model trained on the sentence pairs extracted from in-domain monolingual corpora. |
Tasks | Domain Adaptation, Information Retrieval, Machine Translation, Word Embeddings |
Published | 2017-07-01 |
URL | https://www.aclweb.org/anthology/P17-2062/ |
https://www.aclweb.org/anthology/P17-2062 | |
PWC | https://paperswithcode.com/paper/efficient-extraction-of-pseudo-parallel |
Repo | |
Framework | |
Classifying Semantic Clause Types: Modeling Context and Genre Characteristics with Recurrent Neural Networks and Attention
Title | Classifying Semantic Clause Types: Modeling Context and Genre Characteristics with Recurrent Neural Networks and Attention |
Authors | Maria Becker, Michael Staniek, Vivi Nastase, Alexis Palmer, Anette Frank |
Abstract | Detecting aspectual properties of clauses in the form of situation entity types has been shown to depend on a combination of syntactic-semantic and contextual features. We explore this task in a deep-learning framework, where tuned word representations capture lexical, syntactic and semantic features. We introduce an attention mechanism that pinpoints relevant context not only for the current instance, but also for the larger context. Apart from implicitly capturing task relevant features, the advantage of our neural model is that it avoids the need to reproduce linguistic features for other languages and is thus more easily transferable. We present experiments for English and German that achieve competitive performance. We present a novel take on modeling and exploiting genre information and showcase the adaptation of our system from one language to another. |
Tasks | Feature Engineering, Language Modelling, Relation Classification, Sentence Classification, Word Embeddings |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-1027/ |
https://www.aclweb.org/anthology/S17-1027 | |
PWC | https://paperswithcode.com/paper/classifying-semantic-clause-types-modeling |
Repo | |
Framework | |
A Method to Generate a Machine-Labeled Data for Biomedical Named Entity Recognition with Various Sub-Domains
Title | A Method to Generate a Machine-Labeled Data for Biomedical Named Entity Recognition with Various Sub-Domains |
Authors | Juae Kim, Sunjae Kwon, Youngjoong Ko, Jungyun Seo |
Abstract | Biomedical Named Entity (NE) recognition is a core technique for various works in the biomedical domain. In previous studies, using machine learning algorithm shows better performance than dictionary-based and rule-based approaches because there are too many terminological variations of biomedical NEs and new biomedical NEs are constantly generated. To achieve the high performance with a machine-learning algorithm, good-quality corpora are required. However, it is difficult to obtain the good-quality corpora because annotating a biomedical corpus for machine-learning is extremely time-consuming and costly. In addition, most previous corpora are insufficient for high-level tasks because they cannot cover various domains. Therefore, we propose a method for generating a large amount of machine-labeled data that covers various domains. To generate a large amount of machine-labeled data, firstly we generate an initial machine-labeled data by using a chunker and MetaMap. The chunker is developed to extract only biomedical NEs with manually annotated data. MetaMap is used to annotate the category of biomedical NE. Then we apply the self-training approach to bootstrap the performance of initial machine-labeled data. In our experiments, the biomedical NE recognition system that is trained with our proposed machine-labeled data achieves much high performance. As a result, our system outperforms biomedical NE recognition system that using MetaMap only with 26.03%p improvements on F1-score. |
Tasks | Named Entity Recognition, Question Answering |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/W17-5807/ |
https://www.aclweb.org/anthology/W17-5807 | |
PWC | https://paperswithcode.com/paper/a-method-to-generate-a-machine-labeled-data |
Repo | |
Framework | |
Universal Dependencies for Portuguese
Title | Universal Dependencies for Portuguese |
Authors | Alexandre Rademaker, Fabricio Chalub, Livy Real, Cláudia Freitas, Eckhard Bick, Valeria de Paiva |
Abstract | |
Tasks | Dependency Parsing |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-6523/ |
https://www.aclweb.org/anthology/W17-6523 | |
PWC | https://paperswithcode.com/paper/universal-dependencies-for-portuguese |
Repo | |
Framework | |
'Fighting' or 'Conflict'? An Approach to Revealing Concepts of Terms in Political Discourse
Title | 'Fighting' or 'Conflict'? An Approach to Revealing Concepts of Terms in Political Discourse |
Authors | Linyuan Tang, Kyo Kageura |
Abstract | Previous work on the epistemology of fact-checking indicated the dilemma between the needs of binary answers for the public and ambiguity of political discussion. Determining concepts represented by terms in political discourse can be considered as a Word-Sense Disambiguation (WSD) task. The analysis of political discourse, however, requires identifying precise concepts of terms from relatively small data. This work attempts to provide a basic framework for revealing concepts of terms in political discourse with explicit contextual information. The framework consists of three parts: 1) extracting important terms, 2) generating concordance for each term with stipulative definitions and explanations, and 3) agglomerating similar information of the term by hierarchical clustering. Utterances made by Prime Minister Abe Shinzo in the Diet of Japan are used to examine our framework. Importantly, we revealed the conceptual inconsistency of the term Sonritsu-kiki-jitai. The framework was proved to work, but only for a small number of terms due to lack of explicit contextual information. |
Tasks | Word Sense Disambiguation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4216/ |
https://www.aclweb.org/anthology/W17-4216 | |
PWC | https://paperswithcode.com/paper/fighting-or-conflict-an-approach-to-revealing |
Repo | |
Framework | |
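The third step of the framework described above, grouping similar concordance lines by hierarchical clustering, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: similarity is taken to be Jaccard overlap of whitespace tokens, merging uses average linkage, and the names `jaccard`, `agglomerate`, and `min_similarity` are hypothetical. The example sentences are invented.

```python
def jaccard(a, b):
    """Jaccard similarity between the whitespace token sets of two strings."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def agglomerate(texts, min_similarity=0.3):
    """Bottom-up clustering: repeatedly merge the two most similar clusters
    (average linkage) until no pair reaches min_similarity.
    Returns clusters as lists of indices into texts."""
    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > 1:
        best = None  # (score, index of first cluster, index of second)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sims = [jaccard(texts[i], texts[j])
                        for i in clusters[x] for j in clusters[y]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, x, y)
        if best[0] < min_similarity:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


if __name__ == "__main__":
    lines = [
        "armed attack threatens national survival",
        "armed attack threatens survival",
        "budget for regional agriculture",
        "regional agriculture budget",
    ]
    print(agglomerate(lines))  # [[0, 1], [2, 3]]
```

A real pipeline would likely replace token overlap with a richer similarity measure; the merge loop stays the same.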
Connecting people digitally - a semantic web based approach to linking heterogeneous data sets
Title | Connecting people digitally - a semantic web based approach to linking heterogeneous data sets |
Authors | Katalin Lejtovicz, Amelie Dorn |
Abstract | In this paper we present a semantic enrichment approach for linking two distinct data sets: the ÖBL (Austrian Biographical Dictionary) and the DBÖ (Database of Bavarian Dialects in Austria). Although the data sets are different in their content and in the structuring of data, they contain similar common "entities" such as names of persons. Here we describe the semantic enrichment process of how these data sets can be inter-linked through URIs (Uniform Resource Identifiers) taking person names as a concrete example. Moreover, we also point to societal benefits of applying such semantic enrichment methods in order to open and connect our resources to various services. |
Tasks | Entity Linking, Word Sense Disambiguation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-7801/ |
https://doi.org/10.26615/978-954-452-040-3_001 | |
PWC | https://paperswithcode.com/paper/connecting-people-digitally-a-semantic-web |
Repo | |
Framework | |