July 26, 2019

Paper Group NANR 193

Identifying Outlier Arms in Multi-Armed Bandit

Title Identifying Outlier Arms in Multi-Armed Bandit
Authors Honglei Zhuang, Chi Wang, Yifan Wang
Abstract We study a novel problem lying at the intersection of two areas: multi-armed bandit and outlier detection. Multi-armed bandit is a useful tool to model the process of incrementally collecting data for multiple objects in a decision space. Outlier detection is a powerful method to narrow down the attention to a few objects after the data for them are collected. However, no one has studied how to detect outlier objects while incrementally collecting data for them, which is necessary when data collection is expensive. We formalize this problem as identifying outlier arms in a multi-armed bandit. We propose two sampling strategies with theoretical guarantee, and analyze their sampling efficiency. Our experimental results on both synthetic and real data show that our solution saves 70-99% of data collection cost from baseline while having nearly perfect accuracy.
Tasks Outlier Detection
Published 2017-12-01
URL http://papers.nips.cc/paper/7105-identifying-outlier-arms-in-multi-armed-bandit
PDF http://papers.nips.cc/paper/7105-identifying-outlier-arms-in-multi-armed-bandit.pdf
PWC https://paperswithcode.com/paper/identifying-outlier-arms-in-multi-armed
Repo
Framework
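
As a rough illustration of the problem setting only, the sketch below samples every arm the same number of times and then flags arms whose estimated mean reward is unusually high. The k-sigma threshold (mean plus k standard deviations of the estimates), the Bernoulli arms, and the names `find_outlier_arms` and `pull` are assumptions made for this example; this is a naive uniform-sampling sketch, not either of the authors' sampling strategies.

```python
import random
import statistics

def find_outlier_arms(pull, n_arms, pulls_per_arm=2000, k=1.5):
    """Flag arms whose estimated mean exceeds mean + k * std of all estimates."""
    # Naive uniform sampling: every arm gets the same budget (assumed for this sketch).
    estimates = [
        sum(pull(arm) for _ in range(pulls_per_arm)) / pulls_per_arm
        for arm in range(n_arms)
    ]
    # Assumed k-sigma outlier rule over the estimated means.
    threshold = statistics.mean(estimates) + k * statistics.pstdev(estimates)
    outliers = [arm for arm, est in enumerate(estimates) if est > threshold]
    return outliers, estimates

# Hypothetical Bernoulli arms; arm 4 has a much higher expected reward.
rng = random.Random(0)
true_means = [0.10, 0.12, 0.11, 0.09, 0.90]

def pull(arm):
    """Return a reward of 1.0 with the arm's (hidden) success probability."""
    return 1.0 if rng.random() < true_means[arm] else 0.0

outliers, estimates = find_outlier_arms(pull, len(true_means))
print(outliers)
```

Uniform sampling spends as many pulls on clearly ordinary arms as on borderline ones, which is the kind of data collection cost an adaptive strategy would try to reduce.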

Wasserstein Generative Adversarial Networks

Title Wasserstein Generative Adversarial Networks
Authors Martin Arjovsky, Soumith Chintala, Léon Bottou
Abstract We introduce a new algorithm named WGAN, an alternative to traditional GAN training. In this new model, we show that we can improve the stability of learning, get rid of problems like mode collapse, and provide meaningful learning curves useful for debugging and hyperparameter searches. Furthermore, we show that the corresponding optimization problem is sound, and provide extensive theoretical work highlighting the deep connections to different distances between distributions.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=799
PDF http://proceedings.mlr.press/v70/arjovsky17a/arjovsky17a.pdf
PWC https://paperswithcode.com/paper/wasserstein-generative-adversarial-networks
Repo
Framework
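
The abstract refers to distances between distributions; the one WGAN is named after is the Wasserstein distance. The sketch below computes it for two equal-size one-dimensional samples, where it reduces to the mean absolute difference between sorted values. It illustrates the distance only, not the WGAN training procedure, and the function name `wasserstein_1d` is made up for this example.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal transport plan pairs the sorted samples in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

real = [0.0, 1.0, 2.0, 3.0]
shifted = [v + 5.0 for v in real]
print(wasserstein_1d(real, shifted))  # 5.0
```

The distance grows linearly with the shift between the two samples, even when they do not overlap at all.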

Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings

Title Efficient Extraction of Pseudo-Parallel Sentences from Raw Monolingual Data Using Word Embeddings
Authors Benjamin Marie, Atsushi Fujita
Abstract We propose a new method for extracting pseudo-parallel sentences from a pair of large monolingual corpora, without relying on any document-level information. Our method first exploits word embeddings in order to efficiently evaluate trillions of candidate sentence pairs and then a classifier to find the most reliable ones. We report significant improvements in domain adaptation for statistical machine translation when using a translation model trained on the sentence pairs extracted from in-domain monolingual corpora.
Tasks Domain Adaptation, Information Retrieval, Machine Translation, Word Embeddings
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2062/
PDF https://www.aclweb.org/anthology/P17-2062
PWC https://paperswithcode.com/paper/efficient-extraction-of-pseudo-parallel
Repo
Framework
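
A minimal sketch of scoring candidate sentence pairs with word embeddings, assuming both languages share one embedding space and that a sentence is represented by the average of its word vectors. The toy vectors, the brute-force pairing, and the function names are assumptions for illustration; the paper's efficient search over trillions of pairs and its classifier stage are not reproduced here.

```python
import math

def sentence_vector(tokens, embeddings):
    """Average the embeddings of known tokens; None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_matches(source_sents, target_sents, embeddings):
    """Return (source_index, target_index, score) for each source's closest target."""
    targets = [(j, sentence_vector(t, embeddings)) for j, t in enumerate(target_sents)]
    matches = []
    for i, sent in enumerate(source_sents):
        vec = sentence_vector(sent, embeddings)
        scored = [(cosine(vec, tv), j) for j, tv in targets if vec and tv]
        if scored:
            score, j = max(scored)
            matches.append((i, j, score))
    return matches

# Hypothetical 2-D vectors in a shared English-French space.
toy = {
    "cat": [1.0, 0.0], "chat": [0.9, 0.1],
    "dog": [0.0, 1.0], "chien": [0.1, 0.9],
    "sleeps": [0.5, 0.5], "dort": [0.5, 0.5],
}
english = [["cat", "sleeps"], ["dog", "sleeps"]]
french = [["chien", "dort"], ["chat", "dort"]]
matches = best_matches(english, french, toy)
print([(i, j) for i, j, _ in matches])
```

Comparing every pair, as this sketch does, scales with the product of the corpus sizes, so it only works for small inputs.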

Classifying Semantic Clause Types: Modeling Context and Genre Characteristics with Recurrent Neural Networks and Attention

Title Classifying Semantic Clause Types: Modeling Context and Genre Characteristics with Recurrent Neural Networks and Attention
Authors Maria Becker, Michael Staniek, Vivi Nastase, Alexis Palmer, Anette Frank
Abstract Detecting aspectual properties of clauses in the form of situation entity types has been shown to depend on a combination of syntactic-semantic and contextual features. We explore this task in a deep-learning framework, where tuned word representations capture lexical, syntactic and semantic features. We introduce an attention mechanism that pinpoints relevant context not only for the current instance, but also for the larger context. Apart from implicitly capturing task relevant features, the advantage of our neural model is that it avoids the need to reproduce linguistic features for other languages and is thus more easily transferable. We present experiments for English and German that achieve competitive performance. We present a novel take on modeling and exploiting genre information and showcase the adaptation of our system from one language to another.
Tasks Feature Engineering, Language Modelling, Relation Classification, Sentence Classification, Word Embeddings
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-1027/
PDF https://www.aclweb.org/anthology/S17-1027
PWC https://paperswithcode.com/paper/classifying-semantic-clause-types-modeling
Repo
Framework

A Method to Generate a Machine-Labeled Data for Biomedical Named Entity Recognition with Various Sub-Domains

Title A Method to Generate a Machine-Labeled Data for Biomedical Named Entity Recognition with Various Sub-Domains
Authors Juae Kim, Sunjae Kwon, Youngjoong Ko, Jungyun Seo
Abstract Biomedical Named Entity (NE) recognition is a core technique for various works in the biomedical domain. In previous studies, machine learning algorithms have shown better performance than dictionary-based and rule-based approaches because there are too many terminological variations of biomedical NEs and new biomedical NEs are constantly generated. To achieve high performance with a machine-learning algorithm, good-quality corpora are required. However, it is difficult to obtain good-quality corpora because annotating a biomedical corpus for machine learning is extremely time-consuming and costly. In addition, most previous corpora are insufficient for high-level tasks because they cannot cover various domains. Therefore, we propose a method for generating a large amount of machine-labeled data that covers various domains. To generate a large amount of machine-labeled data, we first generate initial machine-labeled data by using a chunker and MetaMap. The chunker is developed to extract only biomedical NEs with manually annotated data. MetaMap is used to annotate the category of each biomedical NE. Then we apply the self-training approach to bootstrap the performance of the initial machine-labeled data. In our experiments, the biomedical NE recognition system that is trained with our proposed machine-labeled data achieves much higher performance. As a result, our system outperforms a biomedical NE recognition system that uses only MetaMap, with a 26.03%p improvement in F1-score.
Tasks Named Entity Recognition, Question Answering
Published 2017-11-01
URL https://www.aclweb.org/anthology/W17-5807/
PDF https://www.aclweb.org/anthology/W17-5807
PWC https://paperswithcode.com/paper/a-method-to-generate-a-machine-labeled-data
Repo
Framework

Universal Dependencies for Portuguese

Title Universal Dependencies for Portuguese
Authors Alexandre Rademaker, Fabricio Chalub, Livy Real, Cláudia Freitas, Eckhard Bick, Valeria de Paiva
Abstract
Tasks Dependency Parsing
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-6523/
PDF https://www.aclweb.org/anthology/W17-6523
PWC https://paperswithcode.com/paper/universal-dependencies-for-portuguese
Repo
Framework

'Fighting' or 'Conflict'? An Approach to Revealing Concepts of Terms in Political Discourse

Title 'Fighting' or 'Conflict'? An Approach to Revealing Concepts of Terms in Political Discourse
Authors Linyuan Tang, Kyo Kageura
Abstract Previous work on the epistemology of fact-checking indicated the dilemma between the needs of binary answers for the public and ambiguity of political discussion. Determining concepts represented by terms in political discourse can be considered as a Word-Sense Disambiguation (WSD) task. The analysis of political discourse, however, requires identifying precise concepts of terms from relatively small data. This work attempts to provide a basic framework for revealing concepts of terms in political discourse with explicit contextual information. The framework consists of three parts: 1) extracting important terms, 2) generating concordance for each term with stipulative definitions and explanations, and 3) agglomerating similar information of the term by hierarchical clustering. Utterances made by Prime Minister Abe Shinzo in the Diet of Japan are used to examine our framework. Importantly, we revealed the conceptual inconsistency of the term Sonritsu-kiki-jitai. The framework was proved to work, but only for a small number of terms due to lack of explicit contextual information.
Tasks Word Sense Disambiguation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4216/
PDF https://www.aclweb.org/anthology/W17-4216
PWC https://paperswithcode.com/paper/fighting-or-conflict-an-approach-to-revealing
Repo
Framework
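
The third step of the framework described in the abstract, grouping similar concordance lines by hierarchical clustering, could be sketched as follows. The Jaccard token-overlap similarity, average linkage, the `min_similarity` stopping threshold, and the English example lines for the term "bank" are all assumptions chosen for illustration; the abstract does not specify the similarity measure or linkage the authors used.

```python
def jaccard(a, b):
    """Token-set overlap between two lines; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def agglomerate(lines, min_similarity=0.3):
    """Average-linkage agglomerative clustering of text lines by token overlap."""
    token_sets = [set(line.lower().split()) for line in lines]
    clusters = [[i] for i in range(len(lines))]

    def linkage(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(jaccard(token_sets[i], token_sets[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the most similar pair of clusters.
        score, a, b = max(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if score < min_similarity:
            break  # remaining clusters are too dissimilar to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical concordance lines for the term "bank".
lines = [
    "the bank raised its interest rate",
    "the bank lowered its interest rate",
    "we walked along the river bank",
    "we sat along the river bank",
]
print(agglomerate(lines))  # [[0, 1], [2, 3]]
```

Each resulting cluster groups the contexts in which the term appears to carry one sense, which is the kind of evidence the framework uses to expose conceptual inconsistency.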

Connecting people digitally - a semantic web based approach to linking heterogeneous data sets

Title Connecting people digitally - a semantic web based approach to linking heterogeneous data sets
Authors Katalin Lejtovicz, Amelie Dorn
Abstract In this paper we present a semantic enrichment approach for linking two distinct data sets: the ÖBL (Austrian Biographical Dictionary) and the DBÖ (Database of Bavarian Dialects in Austria). Although the data sets are different in their content and in the structuring of data, they contain similar common "entities" such as names of persons. Here we describe the semantic enrichment process of how these data sets can be inter-linked through URIs (Uniform Resource Identifiers) taking person names as a concrete example. Moreover, we also point to societal benefits of applying such semantic enrichment methods in order to open and connect our resources to various services.
Tasks Entity Linking, Word Sense Disambiguation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-7801/
PDF https://doi.org/10.26615/978-954-452-040-3_001
PWC https://paperswithcode.com/paper/connecting-people-digitally-a-semantic-web
Repo
Framework