July 26, 2019

Paper Group NANR 193

Identifying Outlier Arms in Multi-Armed Bandit. Wasserstein Generative Adversarial Networks. Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings. Classifying Semantic Clause Types: Modeling Context and Genre Characteristics with Recurrent Neural Networks and Attention. A Method to Generate a Machine-Lab …

Identifying Outlier Arms in Multi-Armed Bandit


Title	Identifying Outlier Arms in Multi-Armed Bandit
Authors	Honglei Zhuang, Chi Wang, Yifan Wang
Abstract	We study a novel problem lying at the intersection of two areas: multi-armed bandit and outlier detection. Multi-armed bandit is a useful tool to model the process of incrementally collecting data for multiple objects in a decision space. Outlier detection is a powerful method to narrow down the attention to a few objects after the data for them are collected. However, no one has studied how to detect outlier objects while incrementally collecting data for them, which is necessary when data collection is expensive. We formalize this problem as identifying outlier arms in a multi-armed bandit. We propose two sampling strategies with theoretical guarantee, and analyze their sampling efficiency. Our experimental results on both synthetic and real data show that our solution saves 70-99% of data collection cost from baseline while having nearly perfect accuracy.
Tasks	Outlier Detection
Published	2017-12-01
URL	http://papers.nips.cc/paper/7105-identifying-outlier-arms-in-multi-armed-bandit
PDF	http://papers.nips.cc/paper/7105-identifying-outlier-arms-in-multi-armed-bandit.pdf
PWC	https://paperswithcode.com/paper/identifying-outlier-arms-in-multi-armed
Repo
Framework
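
A minimal sketch of the problem setting described in the abstract above, not the authors' algorithm. It assumes (the abstract does not state the criterion) that an arm counts as an outlier when its estimated mean reward exceeds the mean of all arms' estimated means by more than k standard deviations, and it samples arms in plain round-robin order. The names `find_outlier_arms`, `pulls_per_arm`, `k` and `bernoulli` are hypothetical.

```python
import random
import statistics


def find_outlier_arms(arms, pulls_per_arm=200, k=2.0, seed=0):
    """Round-robin sampling followed by a k-sigma test on the estimated means.

    `arms` is a list of callables; each takes a random.Random and returns one
    reward. The k-sigma criterion is an assumption made for illustration.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(arms)
    # Pull every arm once per round so all arms get the same number of samples.
    for _ in range(pulls_per_arm):
        for i, pull in enumerate(arms):
            totals[i] += pull(rng)
    means = [t / pulls_per_arm for t in totals]
    # Threshold: mean of the estimated means plus k population standard deviations.
    threshold = statistics.mean(means) + k * statistics.pstdev(means)
    return [i for i, m in enumerate(means) if m > threshold]


def bernoulli(p):
    """Return an arm that pays 1 with probability p and 0 otherwise."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


# Nine ordinary arms and one arm with a much higher expected reward.
arms = [bernoulli(0.1) for _ in range(9)] + [bernoulli(0.9)]
outliers = find_outlier_arms(arms)
```

Uniform sampling like this spends the same budget on every arm. The abstract reports that the proposed sampling strategies cut data collection cost by 70-99% relative to a baseline, which is the motivation for sampling adaptively instead.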

Wasserstein Generative Adversarial Networks


Title	Wasserstein Generative Adversarial Networks
Authors	Martin Arjovsky, Soumith Chintala, Léon Bottou
Abstract	We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to different distances between distributions.
Tasks
Published	2017-08-01
URL	https://icml.cc/Conferences/2017/Schedule?showEvent=799
PDF	http://proceedings.mlr.press/v70/arjovsky17a/arjovsky17a.pdf
PWC	https://paperswithcode.com/paper/wasserstein-generative-adversarial-networks
Repo
Framework
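
For intuition about the "distances between distributions" mentioned in the abstract, the sketch below computes the Wasserstein-1 (earth mover's) distance between two equal-sized one-dimensional samples. It is not WGAN training, which relies on a learned critic network; the function name `wasserstein_1d` is hypothetical. For one-dimensional samples of equal size, the distance reduces to the mean absolute difference between the sorted values.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two empirical 1-D samples of equal size."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    # In one dimension the optimal transport plan matches sorted values in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


real = [0.0, 1.0, 2.0, 3.0]
shifted = [x + 5.0 for x in real]
distance = wasserstein_1d(real, shifted)
```

Shifting every point by 5 yields a distance of 5, and the distance of a sample to itself is 0.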

Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings


Title	Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings
Authors	Benjamin Marie, Atsushi Fujita
Abstract	We propose a new method for extracting pseudo-parallel sentences from a pair of large monolingual corpora, without relying on any document-level information. Our method first exploits word embeddings in order to efficiently evaluate trillions of candidate sentence pairs and then a classifier to find the most reliable ones. We report significant improvements in domain adaptation for statistical machine translation when using a translation model trained on the sentence pairs extracted from in-domain monolingual corpora.
Tasks	Domain Adaptation, Information Retrieval, Machine Translation, Word Embeddings
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-2062/
PDF	https://www.aclweb.org/anthology/P17-2062
PWC	https://paperswithcode.com/paper/efficient-extraction-of-pseudo-parallel
Repo
Framework
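
The candidate-scoring idea in the abstract can be illustrated with a toy example. The abstract does not say how the word embeddings are combined, so this sketch assumes averaged word vectors in a shared bilingual space and cosine similarity; the embedding values and the names `sentence_vector`, `cosine` and `best_match` are invented for illustration.

```python
import math

# Toy 2-D "bilingual" word embeddings; all values are made up for illustration.
EMBEDDINGS = {
    "cat": (1.0, 0.0), "chat": (0.9, 0.1),
    "dog": (0.0, 1.0), "chien": (0.1, 0.9),
    "sleeps": (0.7, 0.7), "dort": (0.7, 0.6),
}


def sentence_vector(sentence):
    """Average the embeddings of the known words in a sentence."""
    vectors = [EMBEDDINGS[w] for w in sentence.split() if w in EMBEDDINGS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def best_match(source, candidates):
    """Return the candidate sentence whose vector is closest to the source."""
    src = sentence_vector(source)
    return max(candidates, key=lambda c: cosine(src, sentence_vector(c)))


match = best_match("cat sleeps", ["chien dort", "chat dort"])
```

The method in the abstract additionally applies a classifier to keep only the most reliable pairs; that step is not shown here.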

Classifying Semantic Clause Types: Modeling Context and Genre Characteristics with Recurrent Neural Networks and Attention


Title	Classifying Semantic Clause Types: Modeling Context and Genre Characteristics with Recurrent Neural Networks and Attention
Authors	Maria Becker, Michael Staniek, Vivi Nastase, Alexis Palmer, Anette Frank
Abstract	Detecting aspectual properties of clauses in the form of situation entity types has been shown to depend on a combination of syntactic-semantic and contextual features. We explore this task in a deep-learning framework, where tuned word representations capture lexical, syntactic and semantic features. We introduce an attention mechanism that pinpoints relevant context not only for the current instance, but also for the larger context. Apart from implicitly capturing task relevant features, the advantage of our neural model is that it avoids the need to reproduce linguistic features for other languages and is thus more easily transferable. We present experiments for English and German that achieve competitive performance. We present a novel take on modeling and exploiting genre information and showcase the adaptation of our system from one language to another.
Tasks	Feature Engineering, Language Modelling, Relation Classification, Sentence Classification, Word Embeddings
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-1027/
PDF	https://www.aclweb.org/anthology/S17-1027
PWC	https://paperswithcode.com/paper/classifying-semantic-clause-types-modeling
Repo
Framework

A Method to Generate a Machine-Labeled Data for Biomedical Named Entity Recognition with Various Sub-Domains


Title	A Method to Generate a Machine-Labeled Data for Biomedical Named Entity Recognition with Various Sub-Domains
Authors	Juae Kim, Sunjae Kwon, Youngjoong Ko, Jungyun Seo
Abstract	Biomedical Named Entity (NE) recognition is a core technique for various works in the biomedical domain. In previous studies, using machine learning algorithm shows better performance than dictionary-based and rule-based approaches because there are too many terminological variations of biomedical NEs and new biomedical NEs are constantly generated. To achieve the high performance with a machine-learning algorithm, good-quality corpora are required. However, it is difficult to obtain the good-quality corpora because annotating a biomedical corpus for machine-learning is extremely time-consuming and costly. In addition, most previous corpora are insufficient for high-level tasks because they cannot cover various domains. Therefore, we propose a method for generating a large amount of machine-labeled data that covers various domains. To generate a large amount of machine-labeled data, firstly we generate an initial machine-labeled data by using a chunker and MetaMap. The chunker is developed to extract only biomedical NEs with manually annotated data. MetaMap is used to annotate the category of biomedical NE. Then we apply the self-training approach to bootstrap the performance of initial machine-labeled data. In our experiments, the biomedical NE recognition system that is trained with our proposed machine-labeled data achieves much high performance. As a result, our system outperforms biomedical NE recognition system that using MetaMap only with 26.03%p improvements on F1-score.
Tasks	Named Entity Recognition, Question Answering
Published	2017-11-01
URL	https://www.aclweb.org/anthology/W17-5807/
PDF	https://www.aclweb.org/anthology/W17-5807
PWC	https://paperswithcode.com/paper/a-method-to-generate-a-machine-labeled-data
Repo
Framework

Universal Dependencies for Portuguese


Title	Universal Dependencies for Portuguese
Authors	Alexandre Rademaker, Fabricio Chalub, Livy Real, Cláudia Freitas, Eckhard Bick, Valeria de Paiva
Abstract
Tasks	Dependency Parsing
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-6523/
PDF	https://www.aclweb.org/anthology/W17-6523
PWC	https://paperswithcode.com/paper/universal-dependencies-for-portuguese
Repo
Framework

‘Fighting’ or ‘Conflict’? An Approach to Revealing Concepts of Terms in Political Discourse


Title	‘Fighting’ or ‘Conflict’? An Approach to Revealing Concepts of Terms in Political Discourse
Authors	Linyuan Tang, Kyo Kageura
Abstract	Previous work on the epistemology of fact-checking indicated the dilemma between the needs of binary answers for the public and ambiguity of political discussion. Determining concepts represented by terms in political discourse can be considered as a Word-Sense Disambiguation (WSD) task. The analysis of political discourse, however, requires identifying precise concepts of terms from relatively small data. This work attempts to provide a basic framework for revealing concepts of terms in political discourse with explicit contextual information. The framework consists of three parts: 1) extracting important terms, 2) generating concordance for each term with stipulative definitions and explanations, and 3) agglomerating similar information of the term by hierarchical clustering. Utterances made by Prime Minister Abe Shinzo in the Diet of Japan are used to examine our framework. Importantly, we revealed the conceptual inconsistency of the term Sonritsu-kiki-jitai. The framework was proved to work, but only for a small number of terms due to lack of explicit contextual information.
Tasks	Word Sense Disambiguation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4216/
PDF	https://www.aclweb.org/anthology/W17-4216
PWC	https://paperswithcode.com/paper/fighting-or-conflict-an-approach-to-revealing
Repo
Framework
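
Step 3 of the framework in the abstract, agglomerating similar information by hierarchical clustering, can be sketched as follows. The abstract does not name a similarity measure or linkage, so this sketch assumes Jaccard distance over bags of words and single linkage; the function names, the `max_distance` cut-off and the example contexts are invented placeholders unrelated to the paper's data.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of words."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def agglomerate(contexts, max_distance=0.7):
    """Single-linkage agglomerative clustering over bag-of-words contexts.

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than `max_distance`. Returns clusters as sorted index lists.
    """
    bags = [set(c.lower().split()) for c in contexts]
    clusters = [[i] for i in range(len(bags))]

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(jaccard_distance(bags[i], bags[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        pairs = [(linkage(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        dist, a, b = min(pairs)
        if dist > max_distance:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Placeholder contexts for one hypothetical term.
contexts = [
    "budget increase for public schools",
    "budget increase for public hospitals",
    "trade talks with neighbouring countries",
    "trade talks with partner countries",
]
clusters = agglomerate(contexts)
```

Contexts that end up in different clusters are candidates for distinct usages of the same term, which is the kind of inconsistency the abstract describes looking for.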

Connecting people digitally - a semantic web based approach to linking heterogeneous data sets


Title	Connecting people digitally - a semantic web based approach to linking heterogeneous data sets
Authors	Katalin Lejtovicz, Amelie Dorn
Abstract	In this paper we present a semantic enrichment approach for linking two distinct data sets: the ÖBL (Austrian Biographical Dictionary) and the DBÖ (Database of Bavarian Dialects in Austria). Although the data sets are different in their content and in the structuring of data, they contain similar common “entities” such as names of persons. Here we describe the semantic enrichment process of how these data sets can be inter-linked through URIs (Uniform Resource Identifiers) taking person names as a concrete example. Moreover, we also point to societal benefits of applying such semantic enrichment methods in order to open and connect our resources to various services.
Tasks	Entity Linking, Word Sense Disambiguation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-7801/
PDF	https://doi.org/10.26615/978-954-452-040-3_001
PWC	https://paperswithcode.com/paper/connecting-people-digitally-a-semantic-web
Repo
Framework
