Paper Group NANR 82
Improving the Naturalness and Expressivity of Language Generation for Spanish
Title | Improving the Naturalness and Expressivity of Language Generation for Spanish |
Authors | Cristina Barros, Dimitra Gkatzia, Elena Lloret |
Abstract | We present a flexible Natural Language Generation approach for Spanish, focused on the surface realisation stage, which integrates an inflection module in order to improve the naturalness and expressivity of the generated language. This inflection module inflects the verbs using an ensemble of trainable algorithms whereas the other types of words (e.g. nouns, determiners, etc.) are inflected using hand-crafted rules. We show that our approach achieves 2% higher accuracy than two state-of-the-art inflection generation approaches. Furthermore, our proposed approach also predicts an extra feature: the inflection of the imperative mood, which was not taken into account by previous work. We also present a user evaluation, where we demonstrate that the proposed method significantly improves the perceived naturalness of the generated language. |
Tasks | Text Generation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-3505/ |
https://www.aclweb.org/anthology/W17-3505 | |
PWC | https://paperswithcode.com/paper/improving-the-naturalness-and-expressivity-of |
Repo | |
Framework | |
NMT or SMT: Case Study of a Narrow-domain English-Latvian Post-editing Project
Title | NMT or SMT: Case Study of a Narrow-domain English-Latvian Post-editing Project |
Authors | Inguna Skadiņa, Mārcis Pinnis |
Abstract | The recent technological shift in machine translation from statistical machine translation (SMT) to neural machine translation (NMT) raises the question of the strengths and weaknesses of NMT. In this paper, we present an analysis of NMT and SMT systems' outputs from narrow domain English-Latvian MT systems that were trained on a rather small amount of data. We analyze post-edits produced by professional translators and manually annotated errors in these outputs. Analysis of post-edits allowed us to conclude that both approaches are comparably successful, allowing for an increase in translators' productivity, with the NMT system showing slightly worse results. Through the analysis of annotated errors, we found that NMT translations are more fluent than SMT translations. However, errors related to accuracy, especially mistranslation and omission errors, occur more often in NMT outputs. The word form errors that characterize the morphological richness of Latvian are frequent for both systems, but slightly fewer in NMT outputs. |
Tasks | Machine Translation |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1038/ |
https://www.aclweb.org/anthology/I17-1038 | |
PWC | https://paperswithcode.com/paper/nmt-or-smt-case-study-of-a-narrow-domain |
Repo | |
Framework | |
Analysis and Optimization of Graph Decompositions by Lifted Multicuts
Title | Analysis and Optimization of Graph Decompositions by Lifted Multicuts |
Authors | Andrea Horňáková, Jan-Hendrik Lange, Bjoern Andres |
Abstract | We study the set of all decompositions (clusterings) of a graph through its characterization as a set of lifted multicuts. This leads us to practically relevant insights related to the definition of classes of decompositions by must-join and must-cut constraints and related to the comparison of clusterings by metrics. To find optimal decompositions defined by minimum cost lifted multicuts, we establish some properties of some facets of lifted multicut polytopes, define efficient separation procedures and apply these in a branch-and-cut algorithm. |
Tasks | |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=476 |
http://proceedings.mlr.press/v70/hornakova17a/hornakova17a.pdf | |
PWC | https://paperswithcode.com/paper/analysis-and-optimization-of-graph |
Repo | |
Framework | |
User-initiated Sub-dialogues in State-of-the-art Dialogue Systems
Title | User-initiated Sub-dialogues in State-of-the-art Dialogue Systems |
Authors | Staffan Larsson |
Abstract | We test state-of-the-art dialogue systems for their behaviour in response to user-initiated sub-dialogues, i.e. interactions where a system question is responded to with a question or request from the user, who thus initiates a sub-dialogue. We look at sub-dialogues both within a single app (where the sub-dialogue concerns another topic in the original domain) and across apps (where the sub-dialogue concerns a different domain). The overall conclusion of the tests is that none of the systems can be said to deal appropriately with user-initiated sub-dialogues. |
Tasks | Dialogue Management, Spoken Dialogue Systems |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-5503/ |
https://www.aclweb.org/anthology/W17-5503 | |
PWC | https://paperswithcode.com/paper/user-initiated-sub-dialogues-in-state-of-the |
Repo | |
Framework | |
Stylistic Variation in Television Dialogue for Natural Language Generation
Title | Stylistic Variation in Television Dialogue for Natural Language Generation |
Authors | Grace Lin, Marilyn Walker |
Abstract | Conversation is a critical component of storytelling, where key information is often revealed by what/how a character says it. We focus on the issue of character voice and build stylistic models with linguistic features related to natural language generation decisions. Using a dialogue corpus of the television series The Big Bang Theory, we apply content analysis to extract relevant linguistic features to build character-based stylistic models, and we test the model fit through a user perceptual experiment with Amazon's Mechanical Turk. The results are encouraging in that human subjects tend to perceive the generated utterances as being more similar to the character they are modeled on than to another random character. |
Tasks | Language Modelling, Text Generation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4911/ |
https://www.aclweb.org/anthology/W17-4911 | |
PWC | https://paperswithcode.com/paper/stylistic-variation-in-television-dialogue |
Repo | |
Framework | |
Results of the fifth edition of the BioASQ Challenge
Title | Results of the fifth edition of the BioASQ Challenge |
Authors | Anastasios Nentidis, Konstantinos Bougiatiotis, Anastasia Krithara, Georgios Paliouras, Ioannis Kakadiaris |
Abstract | The goal of the BioASQ challenge is to engage researchers in creating cutting-edge biomedical information systems. Specifically, it aims at the promotion of systems and methodologies that are able to deal with a plethora of different tasks in the biomedical domain. This is achieved through the organization of challenges. The fifth challenge consisted of three tasks: semantic indexing, question answering and a new task on information extraction. In total, 29 teams with more than 95 systems participated in the challenge. Overall, as in previous years, the best systems were able to outperform the strong baselines. This suggests that state-of-the-art systems are continuously improving, pushing the frontier of research. |
Tasks | Information Retrieval, Question Answering |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-2306/ |
https://www.aclweb.org/anthology/W17-2306 | |
PWC | https://paperswithcode.com/paper/results-of-the-fifth-edition-of-the-bioasq |
Repo | |
Framework | |
Graph Based Sentiment Aggregation using ConceptNet Ontology
Title | Graph Based Sentiment Aggregation using ConceptNet Ontology |
Authors | Srikanth Tamilselvam, Seema Nagar, Abhijit Mishra, Kuntal Dey |
Abstract | The sentiment aggregation problem accounts for analyzing the sentiment of a user towards various aspects/features of a product, and meaningfully assimilating the pragmatic significance of these features/aspects from an opinionated text. The current paper addresses the sentiment aggregation problem by assigning weights to each aspect appearing in the user-generated content that are proportionate to the strategic importance of the aspect in the pragmatic domain. The novelty of this paper is in computing the pragmatic significance (weight) of each aspect, using graph centrality measures (applied on domain-specific ontology graphs extracted from ConceptNet), and deeply ingraining these weights while aggregating the sentiments from opinionated text. We experiment over multiple real-life product review data. Our system consistently outperforms the state of the art, by as much as an F-score of 20.39% in one case. |
Tasks | Sentiment Analysis |
Published | 2017-11-01 |
URL | https://www.aclweb.org/anthology/I17-1053/ |
https://www.aclweb.org/anthology/I17-1053 | |
PWC | https://paperswithcode.com/paper/graph-based-sentiment-aggregation-using |
Repo | |
Framework | |
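The abstract above describes weighting each aspect by a graph centrality score and then aggregating aspect-level sentiments with those weights. The following is a minimal sketch of that idea, not the authors' implementation: it assumes degree centrality as the centrality measure (the abstract does not say which measures are used), a toy hand-written edge list in place of a ConceptNet-derived ontology graph, and a normalised weighted mean as the aggregation rule. The function names `degree_centrality` and `aggregate_sentiment` are hypothetical.

```python
def degree_centrality(edges):
    """Degree centrality of each node in an undirected graph given as (u, v) pairs."""
    nodes = {n for edge in edges for n in edge}
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Normalise by the maximum possible degree, n - 1 (guard against a single node).
    denom = max(len(nodes) - 1, 1)
    return {n: degree[n] / denom for n in nodes}


def aggregate_sentiment(aspect_scores, weights):
    """Weighted mean of per-aspect sentiment scores; aspects without a weight count as 0."""
    total = sum(weights.get(aspect, 0.0) for aspect in aspect_scores)
    if total == 0:
        return 0.0
    weighted = sum(weights.get(a, 0.0) * s for a, s in aspect_scores.items())
    return weighted / total


# Toy ontology graph (an assumption for illustration, not taken from ConceptNet).
edges = [("phone", "battery"), ("phone", "screen"),
         ("phone", "camera"), ("camera", "lens")]
weights = degree_centrality(edges)

# Hypothetical per-aspect sentiment scores in [-1, 1] extracted from one review.
overall = aggregate_sentiment({"battery": -1.0, "camera": 1.0}, weights)
print(overall)
```

In this toy graph "camera" has more neighbours than "battery", so its positive score carries more weight in the aggregate than the negative score for "battery".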
Requirements for Conceptual Representations of Explanations and How Reasoning Systems Can Serve Them
Title | Requirements for Conceptual Representations of Explanations and How Reasoning Systems Can Serve Them |
Authors | Helmut Horacek |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-3703/ |
https://www.aclweb.org/anthology/W17-3703 | |
PWC | https://paperswithcode.com/paper/requirements-for-conceptual-representations |
Repo | |
Framework | |
Exploring the Impact of Pragmatic Phenomena on Irony Detection in Tweets: A Multilingual Corpus Study
Title | Exploring the Impact of Pragmatic Phenomena on Irony Detection in Tweets: A Multilingual Corpus Study |
Authors | Jihen Karoui, Farah Benamara, Véronique Moriceau, Viviana Patti, Cristina Bosco, Nathalie Aussenac-Gilles |
Abstract | This paper provides a linguistic and pragmatic analysis of the phenomenon of irony in order to represent how Twitter's users exploit irony devices within their communication strategies for generating textual contents. We aim to measure the impact of a wide range of pragmatic phenomena in the interpretation of irony, and to investigate how these phenomena interact with contexts local to the tweet. Informed by linguistic theories, we propose for the first time a multi-layered annotation schema for irony and its application to a corpus of French, English and Italian tweets. We detail each layer, explore their interactions, and discuss our results according to a qualitative and quantitative perspective. |
Tasks | Sentiment Analysis |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1025/ |
https://www.aclweb.org/anthology/E17-1025 | |
PWC | https://paperswithcode.com/paper/exploring-the-impact-of-pragmatic-phenomena |
Repo | |
Framework | |
Comparing Machine Translation and Human Translation: A Case Study
Title | Comparing Machine Translation and Human Translation: A Case Study |
Authors | Lars Ahrenberg |
Abstract | As machine translation technology improves, comparisons to human performance are often made in quite general and exaggerated terms. Thus, it is important to be able to account for differences accurately. This paper reports a simple, descriptive scheme for comparing translations and applies it to two translations of a British opinion article published in March 2017. One is a human translation (HT) into Swedish, and the other a machine translation (MT). While the comparison is limited to one text, the results are indicative of current limitations in MT. |
Tasks | Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-7903/ |
https://doi.org/10.26615/978-954-452-042-7_003 | |
PWC | https://paperswithcode.com/paper/comparing-machine-translation-and-human |
Repo | |
Framework | |
TransBank: Metadata as the Missing Link between NLP and Traditional Translation Studies
Title | TransBank: Metadata as the Missing Link between NLP and Traditional Translation Studies |
Authors | Michael Ustaszewski, Andy Stauder |
Abstract | Despite the growing importance of data in translation, there is no data repository that equally meets the requirements of translation industry and academia alike. Therefore, we plan to develop a freely available, multilingual and expandable bank of translations and their source texts aligned at the sentence level. Special emphasis will be placed on the labelling of metadata that precisely describe the relations between translated texts and their originals. This metadata-centric approach gives users the opportunity to compile and download custom corpora on demand. Such a general-purpose data repository may help to bridge the gap between translation theory and the language industry, including translation technology providers and NLP. |
Tasks | Machine Translation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-7904/ |
https://doi.org/10.26615/978-954-452-042-7_004 | |
PWC | https://paperswithcode.com/paper/transbank-metadata-as-the-missing-link |
Repo | |
Framework | |
Sparse Embedded k-Means Clustering
Title | Sparse Embedded k-Means Clustering |
Authors | Weiwei Liu, Xiaobo Shen, Ivor Tsang |
Abstract | The $k$-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its applications in broad domains. Researchers have successfully addressed these obstacles with dimensionality reduction methods. Recently, [1] develop a state-of-the-art random projection (RP) method for faster $k$-means clustering. Their method delivers many improvements over other dimensionality reduction methods. For example, compared to the advanced singular value decomposition based feature extraction approach, [1] reduce the running time by a factor of $\min\{n,d\}\epsilon^2 \log(d)/k$ for data matrix $X \in \mathbb{R}^{n\times d}$ with $n$ data points and $d$ features, while losing only a factor of one in approximation accuracy. Unfortunately, they still require $\mathcal{O}(\frac{ndk}{\epsilon^2 \log(d)})$ for matrix multiplication and this cost will be prohibitive for large values of $n$ and $d$. To break this bottleneck, we carefully build a sparse embedded $k$-means clustering algorithm which requires $\mathcal{O}(\mathrm{nnz}(X))$ ($\mathrm{nnz}(X)$ denotes the number of non-zeros in $X$) for fast matrix multiplication. Moreover, our proposed algorithm improves on [1]'s results for approximation accuracy by a factor of one. Our empirical studies corroborate our theoretical findings, and demonstrate that our approach is able to significantly accelerate $k$-means clustering, while achieving satisfactory clustering performance. |
Tasks | Dimensionality Reduction |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6924-sparse-embedded-k-means-clustering |
http://papers.nips.cc/paper/6924-sparse-embedded-k-means-clustering.pdf | |
PWC | https://paperswithcode.com/paper/sparse-embedded-k-means-clustering |
Repo | |
Framework | |
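The abstract above states that the embedding step costs $\mathcal{O}(\mathrm{nnz}(X))$ but does not spell out the construction. One common way to obtain that cost is a count-sketch-style sparse embedding, in which every input feature is mapped to a single output coordinate with a random sign, so each non-zero entry of $X$ is touched exactly once. The sketch below illustrates that general technique under those assumptions; it is not claimed to be the paper's exact matrix, and the name `sparse_embed` and the dict-per-row data layout are hypothetical.

```python
import random


def sparse_embed(rows, d, m, seed=0):
    """Project sparse rows from d to m dimensions with one signed bucket per feature.

    rows: list of dicts mapping feature index -> non-zero value.
    The work done is proportional to the number of stored non-zeros.
    """
    rng = random.Random(seed)
    bucket = [rng.randrange(m) for _ in range(d)]        # target coordinate of feature j
    sign = [rng.choice((-1.0, 1.0)) for _ in range(d)]   # random sign of feature j
    embedded = []
    for row in rows:
        y = [0.0] * m
        for j, value in row.items():
            # Each non-zero contributes to exactly one output coordinate.
            y[bucket[j]] += sign[j] * value
        embedded.append(y)
    return embedded


# Three sparse data points with d = 5 features, embedded into m = 3 dimensions.
rows = [{0: 1.0, 3: 2.0}, {}, {2: -1.5}]
Y = sparse_embed(rows, d=5, m=3)
print(Y)
```

The embedded rows `Y` would then be passed to any standard $k$-means routine in place of the original data.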
One model per entity: using hundreds of machine learning models to recognize and normalize biomedical names in text
Title | One model per entity: using hundreds of machine learning models to recognize and normalize biomedical names in text |
Authors | Victor Bellon, Raul Rodriguez-Esteban |
Abstract | We explored a new approach to named entity recognition based on hundreds of machine learning models, each trained to distinguish a single entity, and showed its application to gene name identification (GNI). The rationale for our approach, which we named "one model per entity" (OMPE), was that increasing the number of models would make the learning task easier for each individual model. Our training strategy leveraged freely-available database annotations instead of manually-annotated corpora. While its performance in our proof-of-concept was disappointing, we believe that there is enough room for improvement that such approaches could reach competitive performance while eliminating the cost of creating costly training corpora. |
Tasks | Domain Adaptation, Named Entity Recognition |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-8007/ |
https://doi.org/10.26615/978-954-452-044-1_007 | |
PWC | https://paperswithcode.com/paper/one-model-per-entity-using-hundreds-of |
Repo | |
Framework | |
State Aware Imitation Learning
Title | State Aware Imitation Learning |
Authors | Yannick Schroecker, Charles L. Isbell |
Abstract | Imitation learning is the study of learning how to act given a set of demonstrations provided by a human expert. It is intuitively apparent that learning to take optimal actions is a simpler undertaking in situations that are similar to the ones shown by the teacher. However, imitation learning approaches do not tend to use this insight directly. In this paper, we introduce State Aware Imitation Learning (SAIL), an imitation learning algorithm that allows an agent to learn how to remain in states where it can confidently take the correct action and how to recover if it is led astray. Key to this algorithm is a gradient learned using a temporal difference update rule which leads the agent to prefer states similar to the demonstrated states. We show that estimating a linear approximation of this gradient yields similar theoretical guarantees to online temporal difference learning approaches and empirically show that SAIL can effectively be used for imitation learning in continuous domains with non-linear function approximators used for both the policy representation and the gradient estimate. |
Tasks | Imitation Learning |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6884-state-aware-imitation-learning |
http://papers.nips.cc/paper/6884-state-aware-imitation-learning.pdf | |
PWC | https://paperswithcode.com/paper/state-aware-imitation-learning |
Repo | |
Framework | |
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism
Title | Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism |
Authors | |
Abstract | |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-4200/ |
https://www.aclweb.org/anthology/W17-4200 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-2017-emnlp-workshop |
Repo | |
Framework | |