July 26, 2019


Paper Group NANR 82


Improving the Naturalness and Expressivity of Language Generation for Spanish


Title	Improving the Naturalness and Expressivity of Language Generation for Spanish
Authors	Cristina Barros, Dimitra Gkatzia, Elena Lloret
Abstract	We present a flexible Natural Language Generation approach for Spanish, focused on the surface realisation stage, which integrates an inflection module in order to improve the naturalness and expressivity of the generated language. This inflection module inflects the verbs using an ensemble of trainable algorithms whereas the other types of words (e.g. nouns, determiners, etc.) are inflected using hand-crafted rules. We show that our approach achieves 2% higher accuracy than two state-of-the-art inflection generation approaches. Furthermore, our proposed approach also predicts an extra feature: the inflection of the imperative mood, which was not taken into account by previous work. We also present a user evaluation, where we demonstrate that the proposed method significantly improves the perceived naturalness of the generated language.
Tasks	Text Generation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-3505/
PDF	https://www.aclweb.org/anthology/W17-3505
PWC	https://paperswithcode.com/paper/improving-the-naturalness-and-expressivity-of
Repo
Framework

NMT or SMT: Case Study of a Narrow-domain English-Latvian Post-editing Project


Title	NMT or SMT: Case Study of a Narrow-domain English-Latvian Post-editing Project
Authors	Inguna Skadiņa, Mārcis Pinnis
Abstract	The recent technological shift in machine translation from statistical machine translation (SMT) to neural machine translation (NMT) raises the question of the strengths and weaknesses of NMT. In this paper, we present an analysis of NMT and SMT systems' outputs from narrow domain English-Latvian MT systems that were trained on a rather small amount of data. We analyze post-edits produced by professional translators and manually annotated errors in these outputs. Analysis of post-edits allowed us to conclude that both approaches are comparably successful, allowing for an increase in translators' productivity, with the NMT system showing slightly worse results. Through the analysis of annotated errors, we found that NMT translations are more fluent than SMT translations. However, errors related to accuracy, especially, mistranslation and omission errors, occur more often in NMT outputs. The word form errors, that characterize the morphological richness of Latvian, are frequent for both systems, but slightly fewer in NMT outputs.
Tasks	Machine Translation
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-1038/
PDF	https://www.aclweb.org/anthology/I17-1038
PWC	https://paperswithcode.com/paper/nmt-or-smt-case-study-of-a-narrow-domain
Repo
Framework

Analysis and Optimization of Graph Decompositions by Lifted Multicuts


Title	Analysis and Optimization of Graph Decompositions by Lifted Multicuts
Authors	Andrea Horňáková, Jan-Hendrik Lange, Bjoern Andres
Abstract	We study the set of all decompositions (clusterings) of a graph through its characterization as a set of lifted multicuts. This leads us to practically relevant insights related to the definition of classes of decompositions by must-join and must-cut constraints and related to the comparison of clusterings by metrics. To find optimal decompositions defined by minimum cost lifted multicuts, we establish some properties of some facets of lifted multicut polytopes, define efficient separation procedures and apply these in a branch-and-cut algorithm.
Tasks
Published	2017-08-01
URL	https://icml.cc/Conferences/2017/Schedule?showEvent=476
PDF	http://proceedings.mlr.press/v70/hornakova17a/hornakova17a.pdf
PWC	https://paperswithcode.com/paper/analysis-and-optimization-of-graph
Repo
Framework

User-initiated Sub-dialogues in State-of-the-art Dialogue Systems


Title	User-initiated Sub-dialogues in State-of-the-art Dialogue Systems
Authors	Staffan Larsson
Abstract	We test state of the art dialogue systems for their behaviour in response to user-initiated sub-dialogues, i.e. interactions where a system question is responded to with a question or request from the user, who thus initiates a sub-dialogue. We look at sub-dialogues both within a single app (where the sub-dialogue concerns another topic in the original domain) and across apps (where the sub-dialogue concerns a different domain). The overall conclusion of the tests is that none of the systems can be said to deal appropriately with user-initiated sub-dialogues.
Tasks	Dialogue Management, Spoken Dialogue Systems
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-5503/
PDF	https://www.aclweb.org/anthology/W17-5503
PWC	https://paperswithcode.com/paper/user-initiated-sub-dialogues-in-state-of-the
Repo
Framework

Stylistic Variation in Television Dialogue for Natural Language Generation


Title	Stylistic Variation in Television Dialogue for Natural Language Generation
Authors	Grace Lin, Marilyn Walker
Abstract	Conversation is a critical component of storytelling, where key information is often revealed by what/how a character says it. We focus on the issue of character voice and build stylistic models with linguistic features related to natural language generation decisions. Using a dialogue corpus of the television series, The Big Bang Theory, we apply content analysis to extract relevant linguistic features to build character-based stylistic models, and we test the model-fit through a user perceptual experiment with Amazon's Mechanical Turk. The results are encouraging in that human subjects tend to perceive the generated utterances as being more similar to the character they are modeled on, than to another random character.
Tasks	Language Modelling, Text Generation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4911/
PDF	https://www.aclweb.org/anthology/W17-4911
PWC	https://paperswithcode.com/paper/stylistic-variation-in-television-dialogue
Repo
Framework

Results of the fifth edition of the BioASQ Challenge


Title	Results of the fifth edition of the BioASQ Challenge
Authors	Anastasios Nentidis, Konstantinos Bougiatiotis, Anastasia Krithara, Georgios Paliouras, Ioannis Kakadiaris
Abstract	The goal of the BioASQ challenge is to engage researchers in creating cutting-edge biomedical information systems. Specifically, it aims at the promotion of systems and methodologies that are able to deal with a plethora of different tasks in the biomedical domain. This is achieved through the organization of challenges. The fifth challenge consisted of three tasks: semantic indexing, question answering and a new task on information extraction. In total, 29 teams with more than 95 systems participated in the challenge. Overall, as in previous years, the best systems were able to outperform the strong baselines. This suggests that state-of-the-art systems are continuously improving, pushing the frontier of research.
Tasks	Information Retrieval, Question Answering
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-2306/
PDF	https://www.aclweb.org/anthology/W17-2306
PWC	https://paperswithcode.com/paper/results-of-the-fifth-edition-of-the-bioasq
Repo
Framework

Graph Based Sentiment Aggregation using ConceptNet Ontology


Title	Graph Based Sentiment Aggregation using ConceptNet Ontology
Authors	Srikanth Tamilselvam, Seema Nagar, Abhijit Mishra, Kuntal Dey
Abstract	The sentiment aggregation problem accounts for analyzing the sentiment of a user towards various aspects/features of a product, and meaningfully assimilating the pragmatic significance of these features/aspects from an opinionated text. The current paper addresses the sentiment aggregation problem, by assigning weights to each aspect appearing in the user-generated content, that are proportionate to the strategic importance of the aspect in the pragmatic domain. The novelty of this paper is in computing the pragmatic significance (weight) of each aspect, using graph centrality measures (applied on domain specific ontology-graphs extracted from ConceptNet), and deeply ingraining these weights while aggregating the sentiments from opinionated text. We experiment over multiple real-life product review data. Our system consistently outperforms the state of the art - by as much as an F-score of 20.39% in one case.
Tasks	Sentiment Analysis
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-1053/
PDF	https://www.aclweb.org/anthology/I17-1053
PWC	https://paperswithcode.com/paper/graph-based-sentiment-aggregation-using
Repo
Framework
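
The weighting idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: degree centrality stands in for whichever centrality measures the paper uses, the tiny aspect graph is invented for illustration rather than extracted from ConceptNet, and the function names `degree_centrality` and `aggregate_sentiment` are hypothetical.

```python
def degree_centrality(edges):
    """Return degree centrality for every node of an undirected graph.

    Assumption: degree centrality is used here only as a simple stand-in
    for the graph centrality measures mentioned in the abstract.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    denom = max(len(neighbours) - 1, 1)  # avoid division by zero
    return {node: len(adj) / denom for node, adj in neighbours.items()}


def aggregate_sentiment(aspect_sentiment, weights):
    """Centrality-weighted average of per-aspect sentiment scores.

    Aspects missing from the graph get weight 0 and are ignored.
    """
    total = sum(weights.get(a, 0.0) for a in aspect_sentiment)
    if total == 0:
        return 0.0
    weighted = sum(weights.get(a, 0.0) * s for a, s in aspect_sentiment.items())
    return weighted / total


# Hypothetical aspect graph for a phone review domain.
edges = [("phone", "battery"), ("phone", "screen"),
         ("phone", "camera"), ("camera", "lens")]
weights = degree_centrality(edges)

# A review that praises the camera (+1) and criticises the lens (-1).
score = aggregate_sentiment({"camera": 1.0, "lens": -1.0}, weights)
print(round(score, 3))
```

Because "camera" has more connections than "lens" in this toy graph, its positive sentiment outweighs the negative one and the aggregate stays positive, whereas an unweighted average would cancel to zero.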

Requirements for Conceptual Representations of Explanations and How Reasoning Systems Can Serve Them


Title	Requirements for Conceptual Representations of Explanations and How Reasoning Systems Can Serve Them
Authors	Helmut Horacek
Abstract
Tasks
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-3703/
PDF	https://www.aclweb.org/anthology/W17-3703
PWC	https://paperswithcode.com/paper/requirements-for-conceptual-representations
Repo
Framework

Exploring the Impact of Pragmatic Phenomena on Irony Detection in Tweets: A Multilingual Corpus Study


Title	Exploring the Impact of Pragmatic Phenomena on Irony Detection in Tweets: A Multilingual Corpus Study
Authors	Jihen Karoui, Farah Benamara, Véronique Moriceau, Viviana Patti, Cristina Bosco, Nathalie Aussenac-Gilles
Abstract	This paper provides a linguistic and pragmatic analysis of the phenomenon of irony in order to represent how Twitter's users exploit irony devices within their communication strategies for generating textual contents. We aim to measure the impact of a wide range of pragmatic phenomena in the interpretation of irony, and to investigate how these phenomena interact with contexts local to the tweet. Informed by linguistic theories, we propose for the first time a multi-layered annotation schema for irony and its application to a corpus of French, English and Italian tweets. We detail each layer, explore their interactions, and discuss our results according to a qualitative and quantitative perspective.
Tasks	Sentiment Analysis
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-1025/
PDF	https://www.aclweb.org/anthology/E17-1025
PWC	https://paperswithcode.com/paper/exploring-the-impact-of-pragmatic-phenomena
Repo
Framework

Comparing Machine Translation and Human Translation: A Case Study


Title	Comparing Machine Translation and Human Translation: A Case Study
Authors	Lars Ahrenberg
Abstract	As machine translation technology improves, comparisons to human performance are often made in quite general and exaggerated terms. Thus, it is important to be able to account for differences accurately. This paper reports a simple, descriptive scheme for comparing translations and applies it to two translations of a British opinion article published in March 2017. One is a human translation (HT) into Swedish, and the other a machine translation (MT). While the comparison is limited to one text, the results are indicative of current limitations in MT.
Tasks	Machine Translation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-7903/
PDF	https://doi.org/10.26615/978-954-452-042-7_003
PWC	https://paperswithcode.com/paper/comparing-machine-translation-and-human
Repo
Framework

TransBank: Metadata as the Missing Link between NLP and Traditional Translation Studies


Title	TransBank: Metadata as the Missing Link between NLP and Traditional Translation Studies
Authors	Michael Ustaszewski, Andy Stauder
Abstract	Despite the growing importance of data in translation, there is no data repository that meets the requirements of the translation industry and academia alike. Therefore, we plan to develop a freely available, multilingual and expandable bank of translations and their source texts aligned at the sentence level. Special emphasis will be placed on the labelling of metadata that precisely describe the relations between translated texts and their originals. This metadata-centric approach gives users the opportunity to compile and download custom corpora on demand. Such a general-purpose data repository may help to bridge the gap between translation theory and the language industry, including translation technology providers and NLP.
Tasks	Machine Translation
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-7904/
PDF	https://doi.org/10.26615/978-954-452-042-7_004
PWC	https://paperswithcode.com/paper/transbank-metadata-as-the-missing-link
Repo
Framework

Sparse Embedded k-Means Clustering


Title	Sparse Embedded k-Means Clustering
Authors	Weiwei Liu, Xiaobo Shen, Ivor Tsang
Abstract	The $k$-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its applications in broad domains. Researchers have successfully addressed these obstacles with dimensionality reduction methods. Recently, [1] develop a state-of-the-art random projection (RP) method for faster $k$-means clustering. Their method delivers many improvements over other dimensionality reduction methods. For example, compared to the advanced singular value decomposition based feature extraction approach, [1] reduce the running time by a factor of $\min\{n,d\}\epsilon^2 \log(d)/k$ for data matrix $X \in \mathbb{R}^{n\times d}$ with $n$ data points and $d$ features, while losing only a factor of one in approximation accuracy. Unfortunately, they still require $\mathcal{O}(\frac{ndk}{\epsilon^2 \log(d)})$ for matrix multiplication and this cost will be prohibitive for large values of $n$ and $d$. To break this bottleneck, we carefully build a sparse embedded $k$-means clustering algorithm which requires $\mathcal{O}(\mathrm{nnz}(X))$ ($\mathrm{nnz}(X)$ denotes the number of non-zeros in $X$) for fast matrix multiplication. Moreover, our proposed algorithm improves on [1]'s results for approximation accuracy by a factor of one. Our empirical studies corroborate our theoretical findings, and demonstrate that our approach is able to significantly accelerate $k$-means clustering, while achieving satisfactory clustering performance.
Tasks	Dimensionality Reduction
Published	2017-12-01
URL	http://papers.nips.cc/paper/6924-sparse-embedded-k-means-clustering
PDF	http://papers.nips.cc/paper/6924-sparse-embedded-k-means-clustering.pdf
PWC	https://paperswithcode.com/paper/sparse-embedded-k-means-clustering
Repo
Framework
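
To make the cost claim in the abstract above concrete, here is a minimal sketch of clustering after a sparse embedding. It assumes a count-sketch-style projection in which each input feature is mapped to one target dimension with a random sign; whether this matches the paper's exact construction is an assumption, and the helper names `sparse_embed` and `kmeans` are hypothetical. The projection touches each stored non-zero exactly once, which is where a cost proportional to nnz(X) comes from.

```python
import random


def sparse_embed(rows, d, m, seed=0):
    """Project sparse rows (dicts: column -> value) from d to m dimensions.

    Each input column j goes to a single bucket with a random sign, so the
    loop does one update per stored non-zero of the input.
    """
    rng = random.Random(seed)
    bucket = [rng.randrange(m) for _ in range(d)]
    sign = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    embedded = []
    for row in rows:
        z = [0.0] * m
        for j, value in row.items():
            z[bucket[j]] += sign[j] * value
        embedded.append(z)
    return embedded


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on dense points (lists of floats)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy sparse data: 4 points in 1000 dimensions, reduced to 8 before clustering.
X = [{3: 1.0, 40: 2.0}, {3: 1.1, 40: 2.1}, {700: 5.0}, {700: 5.2, 901: 0.1}]
Z = sparse_embed(X, d=1000, m=8)
labels, _ = kmeans(Z, k=2)
```

The sketch runs k-means on the reduced points `Z` instead of the original 1000-dimensional rows; it makes no claim about the approximation guarantees stated in the abstract.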

One model per entity: using hundreds of machine learning models to recognize and normalize biomedical names in text


Title	One model per entity: using hundreds of machine learning models to recognize and normalize biomedical names in text
Authors	Victor Bellon, Raul Rodriguez-Esteban
Abstract	We explored a new approach to named entity recognition based on hundreds of machine learning models, each trained to distinguish a single entity, and showed its application to gene name identification (GNI). The rationale for our approach, which we named "one model per entity" (OMPE), was that increasing the number of models would make the learning task easier for each individual model. Our training strategy leveraged freely-available database annotations instead of manually-annotated corpora. While its performance in our proof-of-concept was disappointing, we believe that there is enough room for improvement that such approaches could reach competitive performance while eliminating the cost of creating costly training corpora.
Tasks	Domain Adaptation, Named Entity Recognition
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-8007/
PDF	https://doi.org/10.26615/978-954-452-044-1_007
PWC	https://paperswithcode.com/paper/one-model-per-entity-using-hundreds-of
Repo
Framework

State Aware Imitation Learning


Title	State Aware Imitation Learning
Authors	Yannick Schroecker, Charles L. Isbell
Abstract	Imitation learning is the study of learning how to act given a set of demonstrations provided by a human expert. It is intuitively apparent that learning to take optimal actions is a simpler undertaking in situations that are similar to the ones shown by the teacher. However, imitation learning approaches do not tend to use this insight directly. In this paper, we introduce State Aware Imitation Learning (SAIL), an imitation learning algorithm that allows an agent to learn how to remain in states where it can confidently take the correct action and how to recover if it is led astray. Key to this algorithm is a gradient learned using a temporal difference update rule which leads the agent to prefer states similar to the demonstrated states. We show that estimating a linear approximation of this gradient yields similar theoretical guarantees to online temporal difference learning approaches and empirically show that SAIL can effectively be used for imitation learning in continuous domains with non-linear function approximators used for both the policy representation and the gradient estimate.
Tasks	Imitation Learning
Published	2017-12-01
URL	http://papers.nips.cc/paper/6884-state-aware-imitation-learning
PDF	http://papers.nips.cc/paper/6884-state-aware-imitation-learning.pdf
PWC	https://paperswithcode.com/paper/state-aware-imitation-learning
Repo
Framework

Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism


Title	Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
Authors
Abstract
Tasks
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4200/
PDF	https://www.aclweb.org/anthology/W17-4200
PWC	https://paperswithcode.com/paper/proceedings-of-the-2017-emnlp-workshop
Repo
Framework