July 26, 2019

Paper Group NANR 82

Improving the Naturalness and Expressivity of Language Generation for Spanish

Title Improving the Naturalness and Expressivity of Language Generation for Spanish
Authors Cristina Barros, Dimitra Gkatzia, Elena Lloret
Abstract We present a flexible Natural Language Generation approach for Spanish, focused on the surface realisation stage, which integrates an inflection module in order to improve the naturalness and expressivity of the generated language. This inflection module inflects the verbs using an ensemble of trainable algorithms, whereas the other types of words (e.g. nouns, determiners, etc.) are inflected using hand-crafted rules. We show that our approach achieves 2% higher accuracy than two state-of-the-art inflection generation approaches. Furthermore, our proposed approach also predicts an extra feature: the inflection of the imperative mood, which was not taken into account by previous work. We also present a user evaluation, where we demonstrate that the proposed method significantly improves the perceived naturalness of the generated language.
Tasks Text Generation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-3505/
PDF https://www.aclweb.org/anthology/W17-3505
PWC https://paperswithcode.com/paper/improving-the-naturalness-and-expressivity-of
Repo
Framework

NMT or SMT: Case Study of a Narrow-domain English-Latvian Post-editing Project

Title NMT or SMT: Case Study of a Narrow-domain English-Latvian Post-editing Project
Authors Inguna Skadiņa, Mārcis Pinnis
Abstract The recent technological shift in machine translation from statistical machine translation (SMT) to neural machine translation (NMT) raises the question of the strengths and weaknesses of NMT. In this paper, we present an analysis of NMT and SMT systems' outputs from narrow-domain English-Latvian MT systems that were trained on a rather small amount of data. We analyze post-edits produced by professional translators and manually annotated errors in these outputs. Analysis of post-edits allowed us to conclude that both approaches are comparably successful, allowing for an increase in translators' productivity, with the NMT system showing slightly worse results. Through the analysis of annotated errors, we found that NMT translations are more fluent than SMT translations. However, errors related to accuracy, especially mistranslation and omission errors, occur more often in NMT outputs. Word form errors, which reflect the morphological richness of Latvian, are frequent for both systems, but slightly fewer in NMT outputs.
Tasks Machine Translation
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1038/
PDF https://www.aclweb.org/anthology/I17-1038
PWC https://paperswithcode.com/paper/nmt-or-smt-case-study-of-a-narrow-domain
Repo
Framework

Analysis and Optimization of Graph Decompositions by Lifted Multicuts

Title Analysis and Optimization of Graph Decompositions by Lifted Multicuts
Authors Andrea Horňáková, Jan-Hendrik Lange, Bjoern Andres
Abstract We study the set of all decompositions (clusterings) of a graph through its characterization as a set of lifted multicuts. This leads us to practically relevant insights related to the definition of classes of decompositions by must-join and must-cut constraints and related to the comparison of clusterings by metrics. To find optimal decompositions defined by minimum cost lifted multicuts, we establish some properties of some facets of lifted multicut polytopes, define efficient separation procedures and apply these in a branch-and-cut algorithm.
Tasks
Published 2017-08-01
URL https://icml.cc/Conferences/2017/Schedule?showEvent=476
PDF http://proceedings.mlr.press/v70/hornakova17a/hornakova17a.pdf
PWC https://paperswithcode.com/paper/analysis-and-optimization-of-graph
Repo
Framework
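
The relationship between a decomposition, its lifted multicut, and must-join / must-cut constraints can be illustrated with a minimal Python sketch. It is not the authors' branch-and-cut algorithm. It assumes a decomposition is given as a node labeling in which every label class induces a connected subgraph (this is not checked), and the names `lifted_multicut` and `satisfies_constraints` are hypothetical helpers introduced only for illustration.

```python
def lifted_multicut(labels, edges, lifted_edges):
    """Return every pair (graph edge or additional lifted pair) whose
    endpoints lie in different components of the decomposition.

    Assumption: `labels` maps each node to a component id, and each
    component induces a connected subgraph of the original graph.
    """
    all_pairs = list(edges) + list(lifted_edges)
    return {(u, v) for (u, v) in all_pairs if labels[u] != labels[v]}


def satisfies_constraints(labels, must_join, must_cut):
    """Check that must-join pairs share a component and must-cut pairs do not."""
    joined = all(labels[u] == labels[v] for u, v in must_join)
    separated = all(labels[u] != labels[v] for u, v in must_cut)
    return joined and separated


# Toy path graph a-b-c-d split into {a, b} and {c, d}.
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
lifted_edges = [("a", "d")]  # a non-adjacent pair added by lifting

cut = lifted_multicut(labels, edges, lifted_edges)
ok = satisfies_constraints(labels, must_join=[("a", "b")], must_cut=[("a", "d")])
```

In this toy case the cut contains the graph edge `("b", "c")` and the lifted pair `("a", "d")`, which shows how lifted pairs make "these two nodes are in different clusters" explicit even when the nodes are not adjacent.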

User-initiated Sub-dialogues in State-of-the-art Dialogue Systems

Title User-initiated Sub-dialogues in State-of-the-art Dialogue Systems
Authors Staffan Larsson
Abstract We test state-of-the-art dialogue systems for their behaviour in response to user-initiated sub-dialogues, i.e. interactions where a system question is responded to with a question or request from the user, who thus initiates a sub-dialogue. We look at sub-dialogues both within a single app (where the sub-dialogue concerns another topic in the original domain) and across apps (where the sub-dialogue concerns a different domain). The overall conclusion of the tests is that none of the systems can be said to deal appropriately with user-initiated sub-dialogues.
Tasks Dialogue Management, Spoken Dialogue Systems
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-5503/
PDF https://www.aclweb.org/anthology/W17-5503
PWC https://paperswithcode.com/paper/user-initiated-sub-dialogues-in-state-of-the
Repo
Framework

Stylistic Variation in Television Dialogue for Natural Language Generation

Title Stylistic Variation in Television Dialogue for Natural Language Generation
Authors Grace Lin, Marilyn Walker
Abstract Conversation is a critical component of storytelling, where key information is often revealed by what a character says and how they say it. We focus on the issue of character voice and build stylistic models with linguistic features related to natural language generation decisions. Using a dialogue corpus of the television series The Big Bang Theory, we apply content analysis to extract relevant linguistic features to build character-based stylistic models, and we test the model fit through a user perceptual experiment with Amazon's Mechanical Turk. The results are encouraging in that human subjects tend to perceive the generated utterances as being more similar to the character they are modeled on than to another random character.
Tasks Language Modelling, Text Generation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4911/
PDF https://www.aclweb.org/anthology/W17-4911
PWC https://paperswithcode.com/paper/stylistic-variation-in-television-dialogue
Repo
Framework

Results of the fifth edition of the BioASQ Challenge

Title Results of the fifth edition of the BioASQ Challenge
Authors Anastasios Nentidis, Konstantinos Bougiatiotis, Anastasia Krithara, Georgios Paliouras, Ioannis Kakadiaris
Abstract The goal of the BioASQ challenge is to engage researchers in creating cutting-edge biomedical information systems. Specifically, it aims at the promotion of systems and methodologies that are able to deal with a plethora of different tasks in the biomedical domain. This is achieved through the organization of challenges. The fifth challenge consisted of three tasks: semantic indexing, question answering and a new task on information extraction. In total, 29 teams with more than 95 systems participated in the challenge. Overall, as in previous years, the best systems were able to outperform the strong baselines. This suggests that state-of-the-art systems are continuously improving, pushing the frontier of research.
Tasks Information Retrieval, Question Answering
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-2306/
PDF https://www.aclweb.org/anthology/W17-2306
PWC https://paperswithcode.com/paper/results-of-the-fifth-edition-of-the-bioasq
Repo
Framework

Graph Based Sentiment Aggregation using ConceptNet Ontology

Title Graph Based Sentiment Aggregation using ConceptNet Ontology
Authors Srikanth Tamilselvam, Seema Nagar, Abhijit Mishra, Kuntal Dey
Abstract The sentiment aggregation problem involves analyzing the sentiment of a user towards various aspects/features of a product, and meaningfully assimilating the pragmatic significance of these features/aspects from an opinionated text. The current paper addresses the sentiment aggregation problem by assigning weights to each aspect appearing in the user-generated content, which are proportionate to the strategic importance of the aspect in the pragmatic domain. The novelty of this paper is in computing the pragmatic significance (weight) of each aspect, using graph centrality measures (applied on domain-specific ontology graphs extracted from ConceptNet), and deeply ingraining these weights while aggregating the sentiments from opinionated text. We experiment over multiple real-life product review data. Our system consistently outperforms the state of the art, by as much as an F-score of 20.39% in one case.
Tasks Sentiment Analysis
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-1053/
PDF https://www.aclweb.org/anthology/I17-1053
PWC https://paperswithcode.com/paper/graph-based-sentiment-aggregation-using
Repo
Framework
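
The idea of weighting aspect-level sentiment by graph centrality can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's system: the abstract does not say which centrality measure is used, so the sketch assumes plain degree centrality, a hand-written toy graph stands in for an ontology graph extracted from ConceptNet, and the function names and sentiment scores are hypothetical.

```python
def degree_centrality(edges):
    """Degree centrality of each node in an undirected graph.

    Assumption: `edges` has no duplicates or self-loops and the graph
    has at least two nodes.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    return {node: deg / (n - 1) for node, deg in degree.items()}


def aggregate_sentiment(aspect_scores, centrality):
    """Centrality-weighted average of per-aspect sentiment scores."""
    total_weight = sum(centrality.get(a, 0.0) for a in aspect_scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(centrality.get(a, 0.0) * s for a, s in aspect_scores.items())
    return weighted / total_weight


# Hypothetical toy ontology graph for a phone review domain.
edges = [
    ("phone", "battery"),
    ("phone", "screen"),
    ("phone", "camera"),
    ("camera", "lens"),
]
centrality = degree_centrality(edges)

# Hypothetical aspect sentiments in [-1, 1] taken from one review.
aspect_scores = {"battery": -1.0, "camera": 1.0}
overall = aggregate_sentiment(aspect_scores, centrality)
```

Here `camera` has a higher degree than `battery`, so the positive camera sentiment outweighs the negative battery sentiment and the aggregate comes out positive, whereas an unweighted average would be zero.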

Requirements for Conceptual Representations of Explanations and How Reasoning Systems Can Serve Them

Title Requirements for Conceptual Representations of Explanations and How Reasoning Systems Can Serve Them
Authors Helmut Horacek
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-3703/
PDF https://www.aclweb.org/anthology/W17-3703
PWC https://paperswithcode.com/paper/requirements-for-conceptual-representations
Repo
Framework

Exploring the Impact of Pragmatic Phenomena on Irony Detection in Tweets: A Multilingual Corpus Study

Title Exploring the Impact of Pragmatic Phenomena on Irony Detection in Tweets: A Multilingual Corpus Study
Authors Jihen Karoui, Farah Benamara, Véronique Moriceau, Viviana Patti, Cristina Bosco, Nathalie Aussenac-Gilles
Abstract This paper provides a linguistic and pragmatic analysis of the phenomenon of irony in order to represent how Twitter's users exploit irony devices within their communication strategies for generating textual content. We aim to measure the impact of a wide range of pragmatic phenomena in the interpretation of irony, and to investigate how these phenomena interact with contexts local to the tweet. Informed by linguistic theories, we propose for the first time a multi-layered annotation schema for irony and its application to a corpus of French, English and Italian tweets. We detail each layer, explore their interactions, and discuss our results from both a qualitative and a quantitative perspective.
Tasks Sentiment Analysis
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-1025/
PDF https://www.aclweb.org/anthology/E17-1025
PWC https://paperswithcode.com/paper/exploring-the-impact-of-pragmatic-phenomena
Repo
Framework

Comparing Machine Translation and Human Translation: A Case Study

Title Comparing Machine Translation and Human Translation: A Case Study
Authors Lars Ahrenberg
Abstract As machine translation technology improves, comparisons to human performance are often made in quite general and exaggerated terms. Thus, it is important to be able to account for differences accurately. This paper reports a simple, descriptive scheme for comparing translations and applies it to two translations of a British opinion article published in March 2017. One is a human translation (HT) into Swedish, and the other a machine translation (MT). While the comparison is limited to one text, the results are indicative of current limitations in MT.
Tasks Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-7903/
PDF https://doi.org/10.26615/978-954-452-042-7_003
PWC https://paperswithcode.com/paper/comparing-machine-translation-and-human
Repo
Framework

TransBank: Metadata as the Missing Link between NLP and Traditional Translation Studies

Title TransBank: Metadata as the Missing Link between NLP and Traditional Translation Studies
Authors Michael Ustaszewski, Andy Stauder
Abstract Despite the growing importance of data in translation, there is no data repository that meets the requirements of the translation industry and academia alike. Therefore, we plan to develop a freely available, multilingual and expandable bank of translations and their source texts aligned at the sentence level. Special emphasis will be placed on the labelling of metadata that precisely describe the relations between translated texts and their originals. This metadata-centric approach gives users the opportunity to compile and download custom corpora on demand. Such a general-purpose data repository may help to bridge the gap between translation theory and the language industry, including translation technology providers and NLP.
Tasks Machine Translation
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-7904/
PDF https://doi.org/10.26615/978-954-452-042-7_004
PWC https://paperswithcode.com/paper/transbank-metadata-as-the-missing-link
Repo
Framework

Sparse Embedded k-Means Clustering

Title Sparse Embedded k-Means Clustering
Authors Weiwei Liu, Xiaobo Shen, Ivor Tsang
Abstract The $k$-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its applications in broad domains. Researchers have successfully addressed these obstacles with dimensionality reduction methods. Recently, [1] develop a state-of-the-art random projection (RP) method for faster $k$-means clustering. Their method delivers many improvements over other dimensionality reduction methods. For example, compared to the advanced singular value decomposition based feature extraction approach, [1] reduce the running time by a factor of $\min\{n,d\}\epsilon^2 \log(d)/k$ for data matrix $X \in \mathbb{R}^{n\times d}$ with $n$ data points and $d$ features, while losing only a factor of one in approximation accuracy. Unfortunately, they still require $\mathcal{O}(\frac{ndk}{\epsilon^2 \log(d)})$ for matrix multiplication and this cost will be prohibitive for large values of $n$ and $d$. To break this bottleneck, we carefully build a sparse embedded $k$-means clustering algorithm which requires $\mathcal{O}(\mathrm{nnz}(X))$ ($\mathrm{nnz}(X)$ denotes the number of non-zeros in $X$) for fast matrix multiplication. Moreover, our proposed algorithm improves on [1]'s results for approximation accuracy by a factor of one. Our empirical studies corroborate our theoretical findings, and demonstrate that our approach is able to significantly accelerate $k$-means clustering, while achieving satisfactory clustering performance.
Tasks Dimensionality Reduction
Published 2017-12-01
URL http://papers.nips.cc/paper/6924-sparse-embedded-k-means-clustering
PDF http://papers.nips.cc/paper/6924-sparse-embedded-k-means-clustering.pdf
PWC https://paperswithcode.com/paper/sparse-embedded-k-means-clustering
Repo
Framework
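
To illustrate why a sparse embedding can be applied in time proportional to $\mathrm{nnz}(X)$, here is a minimal Python sketch. The abstract does not spell out the embedding construction, so the sketch assumes a CountSketch-style matrix in which each of the $d$ input features is mapped to one of $m$ output coordinates with a random sign. The function names, the dict-based sparse row format, and the parameter `m` are assumptions made for this example, not the authors' implementation.

```python
import random


def make_sparse_embedding(d, m, seed=0):
    """Draw a d x m embedding with exactly one +/-1 entry per input feature.

    Stored implicitly as a target bucket and a sign for each feature.
    (Assumed CountSketch-style construction.)
    """
    rng = random.Random(seed)
    buckets = [rng.randrange(m) for _ in range(d)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    return buckets, signs


def embed_rows(rows, buckets, signs, m):
    """Project sparse rows ({feature_index: value}) down to m dimensions.

    Each non-zero entry is touched exactly once, so the total work is
    proportional to the number of non-zeros plus n * m for the output.
    """
    embedded = []
    for row in rows:
        z = [0.0] * m
        for j, value in row.items():
            z[buckets[j]] += signs[j] * value
        embedded.append(z)
    return embedded


# Two sparse data points with d = 6 features, embedded into m = 3 dimensions.
buckets, signs = make_sparse_embedding(d=6, m=3, seed=1)
rows = [{0: 2.0, 4: -1.0}, {5: 3.0}]
Z = embed_rows(rows, buckets, signs, m=3)
```

Any standard $k$-means routine can then be run on `Z` instead of the original data. The cost of the projection step depends on the number of stored non-zeros rather than on $n \cdot d$, which is the property the abstract highlights.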

One model per entity: using hundreds of machine learning models to recognize and normalize biomedical names in text

Title One model per entity: using hundreds of machine learning models to recognize and normalize biomedical names in text
Authors Victor Bellon, Raul Rodriguez-Esteban
Abstract We explored a new approach to named entity recognition based on hundreds of machine learning models, each trained to distinguish a single entity, and showed its application to gene name identification (GNI). The rationale for our approach, which we named "one model per entity" (OMPE), was that increasing the number of models would make the learning task easier for each individual model. Our training strategy leveraged freely available database annotations instead of manually annotated corpora. While its performance in our proof-of-concept was disappointing, we believe that there is enough room for improvement that such approaches could reach competitive performance while eliminating the need to create costly training corpora.
Tasks Domain Adaptation, Named Entity Recognition
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-8007/
PDF https://doi.org/10.26615/978-954-452-044-1_007
PWC https://paperswithcode.com/paper/one-model-per-entity-using-hundreds-of
Repo
Framework

State Aware Imitation Learning

Title State Aware Imitation Learning
Authors Yannick Schroecker, Charles L. Isbell
Abstract Imitation learning is the study of learning how to act given a set of demonstrations provided by a human expert. It is intuitively apparent that learning to take optimal actions is a simpler undertaking in situations that are similar to the ones shown by the teacher. However, imitation learning approaches do not tend to use this insight directly. In this paper, we introduce State Aware Imitation Learning (SAIL), an imitation learning algorithm that allows an agent to learn how to remain in states where it can confidently take the correct action and how to recover if it is led astray. Key to this algorithm is a gradient learned using a temporal difference update rule which leads the agent to prefer states similar to the demonstrated states. We show that estimating a linear approximation of this gradient yields similar theoretical guarantees to online temporal difference learning approaches and empirically show that SAIL can effectively be used for imitation learning in continuous domains with non-linear function approximators used for both the policy representation and the gradient estimate.
Tasks Imitation Learning
Published 2017-12-01
URL http://papers.nips.cc/paper/6884-state-aware-imitation-learning
PDF http://papers.nips.cc/paper/6884-state-aware-imitation-learning.pdf
PWC https://paperswithcode.com/paper/state-aware-imitation-learning
Repo
Framework

Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

Title Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
Authors
Abstract
Tasks
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4200/
PDF https://www.aclweb.org/anthology/W17-4200
PWC https://paperswithcode.com/paper/proceedings-of-the-2017-emnlp-workshop
Repo
Framework