October 15, 2019

Paper Group NANR 150

Findings of the WMT 2018 Shared Task on Automatic Post-Editing

Title Findings of the WMT 2018 Shared Task on Automatic Post-Editing
Authors Rajen Chatterjee, Matteo Negri, Raphael Rubino, Marco Turchi
Abstract We present the results from the fourth round of the WMT shared task on MT Automatic Post-Editing. The task consists in automatically correcting the output of a "black-box" machine translation system by learning from human corrections. Keeping the same general evaluation setting of the three previous rounds, this year we focused on one language pair (English-German) and on domain-specific data (Information Technology), with MT outputs produced by two different paradigms: phrase-based (PBSMT) and neural (NMT). Five teams submitted respectively 11 runs for the PBSMT subtask and 10 runs for the NMT subtask. In the former subtask, characterized by original translations of lower quality, top results achieved impressive improvements, up to -6.24 TER and +9.53 BLEU points over the baseline "do-nothing" system. The NMT subtask proved to be more challenging due to the higher quality of the original translations and the availability of less training data. In this case, top results show smaller improvements up to -0.38 TER and +0.8 BLEU points.
Tasks Automatic Post-Editing, Machine Translation
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6452/
PDF https://www.aclweb.org/anthology/W18-6452
PWC https://paperswithcode.com/paper/findings-of-the-wmt-2018-shared-task-on
Repo
Framework

Evaluating Textual Representations through Image Generation

Title Evaluating Textual Representations through Image Generation
Authors Graham Spinks, Marie-Francine Moens
Abstract We present a methodology for determining the quality of textual representations through the ability to generate images from them. Continuous representations of textual input are ubiquitous in modern Natural Language Processing techniques either at the core of machine learning algorithms or as the by-product at any given layer of a neural network. While current techniques to evaluate such representations focus on their performance on particular tasks, they don't provide a clear understanding of the level of informational detail that is stored within them, especially their ability to represent spatial information. The central premise of this paper is that visual inspection or analysis is the most convenient method to quickly and accurately determine information content. Through the use of text-to-image neural networks, we propose a new technique to compare the quality of textual representations by visualizing their information content. The method is illustrated on a medical dataset where the correct representation of spatial information and shorthands are of particular importance. For four different well-known textual representations, we show with a quantitative analysis that some representations are consistently able to deliver higher quality visualizations of the information content. Additionally, we show that the quantitative analysis technique correlates with the judgment of a human expert evaluator in terms of alignment.
Tasks Image Generation
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-5405/
PDF https://www.aclweb.org/anthology/W18-5405
PWC https://paperswithcode.com/paper/evaluating-textual-representations-through
Repo
Framework

On Training Classifiers for Linking Event Templates

Title On Training Classifiers for Linking Event Templates
Authors Jakub Piskorski, Fredi Šarić, Vanni Zavarella, Martin Atkinson
Abstract The paper reports on exploring various machine learning techniques and a range of textual and meta-data features to train classifiers for linking related event templates automatically extracted from online news. With the best model using textual features only we achieved 94.7% (92.9%) F1 score on GOLD (SILVER) dataset. These figures were further improved to 98.6% (GOLD) and 97% (SILVER) F1 score by adding meta-data features, mainly thanks to the strong discriminatory power of automatically extracted geographical information related to events.
Tasks
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-4309/
PDF https://www.aclweb.org/anthology/W18-4309
PWC https://paperswithcode.com/paper/on-training-classifiers-for-linking-event
Repo
Framework

Predictive power of word surprisal for reading times is a linear function of language model quality

Title Predictive power of word surprisal for reading times is a linear function of language model quality
Authors Adam Goodkind, Klinton Bicknell
Abstract
Tasks Language Modelling
Published 2018-01-01
URL https://www.aclweb.org/anthology/W18-0102/
PDF https://www.aclweb.org/anthology/W18-0102
PWC https://paperswithcode.com/paper/predictive-power-of-word-surprisal-for
Repo
Framework

Dynamic encoding of structural uncertainty in gradient symbols

Title Dynamic encoding of structural uncertainty in gradient symbols
Authors Pyeong Whan Cho, Matthew Goldrick, Richard L. Lewis, Paul Smolensky
Abstract
Tasks
Published 2018-01-01
URL https://www.aclweb.org/anthology/W18-0103/
PDF https://www.aclweb.org/anthology/W18-0103
PWC https://paperswithcode.com/paper/dynamic-encoding-of-structural-uncertainty-in
Repo
Framework

Prefix Lexicalization of Synchronous CFGs using Synchronous TAG

Title Prefix Lexicalization of Synchronous CFGs using Synchronous TAG
Authors Logan Born, Anoop Sarkar
Abstract We show that an epsilon-free, chain-free synchronous context-free grammar (SCFG) can be converted into a weakly equivalent synchronous tree-adjoining grammar (STAG) which is prefix lexicalized. This transformation at most doubles the grammar's rank and cubes its size, but we show that in practice the size increase is only quadratic. Our results extend Greibach normal form from CFGs to SCFGs and prove new formal properties about SCFG, a formalism with many applications in natural language processing.
Tasks Machine Translation
Published 2018-07-01
URL https://www.aclweb.org/anthology/P18-1107/
PDF https://www.aclweb.org/anthology/P18-1107
PWC https://paperswithcode.com/paper/prefix-lexicalization-of-synchronous-cfgs
Repo
Framework

Coordinate Structures in Universal Dependencies for Head-final Languages

Title Coordinate Structures in Universal Dependencies for Head-final Languages
Authors Hiroshi Kanayama, Na-Rae Han, Masayuki Asahara, Jena D. Hwang, Yusuke Miyao, Jinho D. Choi, Yuji Matsumoto
Abstract This paper discusses the representation of coordinate structures in the Universal Dependencies framework for two head-final languages, Japanese and Korean. UD applies a strict principle that makes the head of coordination the left-most conjunct. However, the guideline may produce syntactic trees which are difficult to accept in head-final languages. This paper describes the status in the current Japanese and Korean corpora and proposes alternative designs suitable for these languages.
Tasks
Published 2018-11-01
URL https://www.aclweb.org/anthology/W18-6009/
PDF https://www.aclweb.org/anthology/W18-6009
PWC https://paperswithcode.com/paper/coordinate-structures-in-universal
Repo
Framework

EmotionX-AR: CNN-DCNN autoencoder based Emotion Classifier

Title EmotionX-AR: CNN-DCNN autoencoder based Emotion Classifier
Authors Sopan Khosla
Abstract In this paper, we model emotions in the EmotionLines dataset using a convolutional-deconvolutional autoencoder (CNN-DCNN) framework. We show that adding a joint reconstruction loss improves performance. Quantitative evaluation with a jointly trained network, augmented with linguistic features, reports the best accuracies for emotion prediction, namely joy, sadness, anger, and neutral emotion in text.
Tasks Emotion Classification, Emotion Recognition, Sentiment Analysis
Published 2018-07-01
URL https://www.aclweb.org/anthology/W18-3507/
PDF https://www.aclweb.org/anthology/W18-3507
PWC https://paperswithcode.com/paper/emotionx-ar-cnn-dcnn-autoencoder-based
Repo
Framework

Meteor++: Incorporating Copy Knowledge into Machine Translation Evaluation

Title Meteor++: Incorporating Copy Knowledge into Machine Translation Evaluation
Authors Yinuo Guo, Chong Ruan, Junfeng Hu
Abstract In machine translation evaluation, a good candidate translation can be regarded as a paraphrase of the reference. We notice that some words are always copied during paraphrasing, which we call copy knowledge. Considering the stability of such knowledge, a good candidate translation should contain all of these words that appear in the reference sentence. Therefore, in this participation in the WMT'2018 metrics shared task we introduce a simple statistical method for copy knowledge extraction, and incorporate it into the Meteor metric, resulting in a new machine translation metric, Meteor++. Our experiments show that Meteor++ can nicely integrate copy knowledge and improve the performance significantly on WMT17 and WMT15 evaluation sets.
Tasks Machine Translation, Text Generation
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6454/
PDF https://www.aclweb.org/anthology/W18-6454
PWC https://paperswithcode.com/paper/meteor-incorporating-copy-knowledge-into
Repo
Framework

HeLI-based Experiments in Discriminating Between Dutch and Flemish Subtitles

Title HeLI-based Experiments in Discriminating Between Dutch and Flemish Subtitles
Authors Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén
Abstract This paper presents the experiments and results obtained by the SUKI team in the Discriminating between Dutch and Flemish in Subtitles shared task of the VarDial 2018 Evaluation Campaign. Our best submission was ranked 8th, obtaining a macro F1-score of 0.61. Our best results were produced by a language identifier implementing the HeLI method without any modifications. We describe, in addition to the best method we used, some of the experiments we did with unsupervised clustering.
Tasks Language Identification, Text Categorization
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-3915/
PDF https://www.aclweb.org/anthology/W18-3915
PWC https://paperswithcode.com/paper/heli-based-experiments-in-discriminating
Repo
Framework

Writing Mentor: Self-Regulated Writing Feedback for Struggling Writers

Title Writing Mentor: Self-Regulated Writing Feedback for Struggling Writers
Authors Nitin Madnani, Jill Burstein, Norbert Elliot, Beata Beigman Klebanov, Diane Napolitano, Slava Andreyev, Maxwell Schwartz
Abstract Writing Mentor is a free Google Docs add-on designed to provide feedback to struggling writers and help them improve their writing in a self-paced and self-regulated fashion. Writing Mentor uses natural language processing (NLP) methods and resources to generate feedback in terms of features that research into post-secondary struggling writers has classified as developmental (Burstein et al., 2016b). These features span many writing sub-constructs (use of sources, claims, and evidence; topic development; coherence; and knowledge of English conventions). Preliminary analysis indicates that users have a largely positive impression of Writing Mentor in terms of usability and potential impact on their writing.
Tasks
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-2025/
PDF https://www.aclweb.org/anthology/C18-2025
PWC https://paperswithcode.com/paper/writing-mentor-self-regulated-writing
Repo
Framework

IRIT at TRAC 2018

Title IRIT at TRAC 2018
Authors Faneva Ramiandrisoa, Josiane Mothe
Abstract This paper describes the participation of the IRIT team in the TRAC 2018 shared task on Aggression Identification, and more precisely in the English-language shared task. The following three methods were used: a) a combination of machine learning techniques that relies on a set of features and document/text vectorization, b) a Convolutional Neural Network (CNN) and c) a combination of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The best results were obtained with method (a) on the English test data from Facebook, where our method ranked sixteenth out of thirty teams, and with method (c) on the English test data from other social media, where we ranked fifteenth out of thirty.
Tasks
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-4403/
PDF https://www.aclweb.org/anthology/W18-4403
PWC https://paperswithcode.com/paper/irit-at-trac-2018
Repo
Framework

Lean Multiclass Crowdsourcing

Title Lean Multiclass Crowdsourcing
Authors Grant Van Horn, Steve Branson, Scott Loarie, Serge Belongie, Pietro Perona
Abstract We introduce a method for efficiently crowdsourcing multiclass annotations in challenging, real world image datasets. Our method is designed to minimize the number of human annotations that are necessary to achieve a desired level of confidence on class labels. It is based on combining models of worker behavior with computer vision. Our method is general: it can handle a large number of classes, worker labels that come from a taxonomy rather than a flat list, and can model the dependence of labels when workers can see a history of previous annotations. Our method may be used as a drop-in replacement for the majority vote algorithms used in online crowdsourcing services that aggregate multiple human annotations into a final consolidated label. In experiments conducted on two real-life applications we find that our method can reduce the number of required annotations by as much as a factor of 5.4 and can reduce the residual annotation error by up to 90% when compared with majority voting. Furthermore, the online risk estimates of the models may be used to sort the annotated collection and minimize subsequent expert review effort.
Tasks
Published 2018-06-01
URL http://openaccess.thecvf.com/content_cvpr_2018/html/Van_Horn_Lean_Multiclass_Crowdsourcing_CVPR_2018_paper.html
PDF http://openaccess.thecvf.com/content_cvpr_2018/papers/Van_Horn_Lean_Multiclass_Crowdsourcing_CVPR_2018_paper.pdf
PWC https://paperswithcode.com/paper/lean-multiclass-crowdsourcing
Repo
Framework
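
For context, the majority-vote aggregation that the abstract above uses as its point of comparison can be sketched as follows. This is only the baseline, not the authors' method; the function name and the tie-breaking rule (first label seen wins) are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Aggregate worker labels for one item into a single label.

    Ties are broken in favour of the label that appears first in the
    input list (an assumption; real services may break ties differently).
    """
    if not labels:
        raise ValueError("need at least one worker label")
    counts = Counter(labels)
    best = max(counts.values())
    # Walk the labels in input order so the tie-break is deterministic.
    for label in labels:
        if counts[label] == best:
            return label
```

A call such as `majority_vote(["cat", "dog", "cat"])` returns the label chosen by most workers. The sketch treats every worker as equally reliable, which is the limitation that models of worker behavior aim to address.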

Fully Connected Neural Network with Advance Preprocessor to Identify Aggression over Facebook and Twitter

Title Fully Connected Neural Network with Advance Preprocessor to Identify Aggression over Facebook and Twitter
Authors Kashyap Raiyani, Teresa Gonçalves, Paulo Quaresma, Vitor Beires Nogueira
Abstract The paper presents the different methodologies developed and tested and discusses their results, with the goal of identifying the best possible method for the aggression identification problem in social media.
Tasks Hate Speech Detection, Language Identification, Text Classification
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-4404/
PDF https://www.aclweb.org/anthology/W18-4404
PWC https://paperswithcode.com/paper/fully-connected-neural-network-with-advance
Repo
Framework

ITER: Improving Translation Edit Rate through Optimizable Edit Costs

Title ITER: Improving Translation Edit Rate through Optimizable Edit Costs
Authors Joybrata Panja, Sudip Kumar Naskar
Abstract The paper presents our participation in the WMT 2018 Metrics Shared Task. We propose an improved version of Translation Edit/Error Rate (TER). In addition to including the basic edit operations in TER, namely insertion, deletion, substitution and shift, our metric also allows stem matching, optimizable edit costs and better normalization so as to correlate better with human judgement scores. The proposed metric shows much higher correlation with human judgments than TER.
Tasks Machine Translation
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6455/
PDF https://www.aclweb.org/anthology/W18-6455
PWC https://paperswithcode.com/paper/iter-improving-translation-edit-rate-through
Repo
Framework
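
To make the idea of configurable edit costs concrete, here is a minimal sketch of a word-level edit rate with per-operation costs, normalized by reference length. It is not the ITER metric: it omits the shift operation and stem matching mentioned in the abstract, and the function name, parameter names, and default costs are all hypothetical.

```python
def edit_rate(hypothesis, reference, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Weighted word-level edit distance divided by the reference length.

    Covers insertion, deletion and substitution only; shifts and stem
    matching are deliberately left out of this sketch.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum cost of turning hyp[:i] into ref[:j]
    dp = [[0.0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(1, len(hyp) + 1):
        dp[i][0] = dp[i - 1][0] + del_cost  # delete extra hypothesis words
    for j in range(1, len(ref) + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost  # insert missing reference words

    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            match = 0.0 if hyp[i - 1] == ref[j - 1] else sub_cost
            dp[i][j] = min(
                dp[i - 1][j] + del_cost,   # delete hyp[i-1]
                dp[i][j - 1] + ins_cost,   # insert ref[j-1]
                dp[i - 1][j - 1] + match,  # keep or substitute
            )
    return dp[len(hyp)][len(ref)] / len(ref)
```

With unit costs this reduces to a shift-free word error rate; making the three costs tunable is one simple way to fit a metric of this kind to human judgment scores.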