October 15, 2019


Paper Group NANR 135

Konbitzul: an MWE-specific database for Spanish-Basque. Handling Big Data and Sensitive Data Using EUDAT’s Generic Execution Framework and the WebLicht Workflow Engine.. Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies. Utilization of Nganasan digital resources: a statistical approach to vowe …

Konbitzul: an MWE-specific database for Spanish-Basque

Title Konbitzul: an MWE-specific database for Spanish-Basque
Authors Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
Abstract
Tasks Machine Translation
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1397/
PDF https://www.aclweb.org/anthology/L18-1397
PWC https://paperswithcode.com/paper/konbitzul-an-mwe-specific-database-for
Repo
Framework

Handling Big Data and Sensitive Data Using EUDAT’s Generic Execution Framework and the WebLicht Workflow Engine.

Title Handling Big Data and Sensitive Data Using EUDAT’s Generic Execution Framework and the WebLicht Workflow Engine.
Authors Claus Zinn, Wei Qui, Marie Hinrichs, Emanuel Dima, Alexandr Chernov
Abstract
Tasks
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1285/
PDF https://www.aclweb.org/anthology/L18-1285
PWC https://paperswithcode.com/paper/handling-big-data-and-sensitive-data-using
Repo
Framework

Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies

Title Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies
Authors Taraka Rama, Søren Wichmann
Abstract Bayesian linguistic phylogenies are standardly based on cognate matrices for words referring to a fixed set of meanings, typically around 100-200. To this day there has not been any empirical investigation into which datasize is optimal. Here we determine, across a set of language families, the optimal number of meanings required for the best performance in Bayesian phylogenetic inference. We rank meanings by stability, infer phylogenetic trees using first the most stable meaning, then the two most stable meanings, and so on, computing the quartet distance of the resulting tree to the tree proposed by language family experts at each step of datasize increase. When a gold standard tree is not available we propose to instead compute the quartet distance between the tree based on the n most stable meanings and the one based on the n + 1 most stable meanings, increasing n from 1 to N − 1, where N is the total number of meanings. The assumption here is that the value of n for which the quartet distance begins to stabilize is also the value at which the quality of the tree ceases to improve. We show that this assumption is borne out. The results of the two methods vary across families, and the optimal number of meanings appears to correlate with the number of languages under consideration.
Tasks Bayesian Inference
Published 2018-08-01
URL https://www.aclweb.org/anthology/C18-1134/
PDF https://www.aclweb.org/anthology/C18-1134
PWC https://paperswithcode.com/paper/towards-identifying-the-optimal-datasize-for
Repo
Framework
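
The quartet distance used in the abstract above can be illustrated with a brute-force sketch. It is not the authors' implementation: it assumes each tree is given as a set of non-trivial splits (one side of each internal branch, as a `frozenset` of leaf names), and the function names and the five-leaf example trees are hypothetical.

```python
from itertools import combinations


def quartet_topology(splits, quartet):
    """Return the pairing of four leaves induced by a tree's splits, or None.

    `splits` holds one side of each non-trivial bipartition of the tree
    (an assumed input format, chosen for illustration).
    """
    for side in splits:
        inside = [leaf for leaf in quartet if leaf in side]
        if len(inside) == 2:
            outside = [leaf for leaf in quartet if leaf not in side]
            return frozenset([frozenset(inside), frozenset(outside)])
    return None  # no split separates the quartet 2-2: unresolved (star) quartet


def quartet_distance(splits_a, splits_b, leaves):
    """Fraction of leaf quartets whose topology differs between two trees."""
    quartets = list(combinations(sorted(leaves), 4))
    differing = sum(
        quartet_topology(splits_a, q) != quartet_topology(splits_b, q)
        for q in quartets
    )
    return differing / len(quartets)


# Hypothetical example: ((A,B),C,(D,E)) versus ((A,C),B,(D,E)).
leaves = {"A", "B", "C", "D", "E"}
tree_1 = {frozenset("AB"), frozenset("DE")}
tree_2 = {frozenset("AC"), frozenset("DE")}
print(quartet_distance(tree_1, tree_2, leaves))
```

Enumerating all quartets costs O(n^4) in the number of leaves, so this is only suitable for small trees. When no gold standard tree exists, the same function could be applied to the trees built from the n and n + 1 most stable meanings.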

Utilization of Nganasan digital resources: a statistical approach to vowel harmony

Title Utilization of Nganasan digital resources: a statistical approach to vowel harmony
Authors László Fejes
Abstract
Tasks
Published 2018-01-01
URL https://www.aclweb.org/anthology/W18-0211/
PDF https://www.aclweb.org/anthology/W18-0211
PWC https://paperswithcode.com/paper/utilization-of-nganasan-digital-resources-a
Repo
Framework

An OpenNMT Model to Arabic Broken Plurals

Title An OpenNMT Model to Arabic Broken Plurals
Authors Elsayed Issa
Abstract Arabic Broken Plurals show an interesting phenomenon in Arabic morphology as they are formed by shifting the consonants of the syllables into different syllable patterns, and subsequently, the pattern of the word changes. The present paper, therefore, attempts to look at Arabic broken plurals from the perspective of neural networks by implementing an OpenNMT experiment to better understand and interpret the behavior of these plurals, especially when it comes to L2 acquisition. The results show that the model is successful in predicting the Arabic template. However, it fails to predict certain consonants such as the emphatics and the gutturals. This reinforces the fact that these consonants or sounds are the most difficult for L2 learners to acquire.
Tasks Language Acquisition
Published 2018-08-01
URL https://www.aclweb.org/anthology/W18-4103/
PDF https://www.aclweb.org/anthology/W18-4103
PWC https://paperswithcode.com/paper/an-opennmt-model-to-arabic-broken-plurals
Repo
Framework

A Hybrid Learning Scheme for Chinese Word Embedding

Title A Hybrid Learning Scheme for Chinese Word Embedding
Authors Wenfan Chen, Weiguo Sheng
Abstract To improve word embedding, subword information has been widely employed in state-of-the-art methods. These methods can be classified as either compositional or predictive models. In this paper, we propose a hybrid learning scheme, which integrates compositional and predictive models for word embedding. Such a scheme can take advantage of both models, thus effectively learning word embeddings. The proposed scheme has been applied to learn word representations for Chinese. Our results show that the proposed scheme can significantly improve the performance of word embedding in terms of analogical reasoning and is robust to the size of training data.
Tasks Language Modelling, Machine Translation, Question Answering, Representation Learning, Word Embeddings
Published 2018-07-01
URL https://www.aclweb.org/anthology/W18-3011/
PDF https://www.aclweb.org/anthology/W18-3011
PWC https://paperswithcode.com/paper/a-hybrid-learning-scheme-for-chinese-word
Repo
Framework

Improved Structure from Motion Using Fiducial Marker Matching

Title Improved Structure from Motion Using Fiducial Marker Matching
Authors Joseph DeGol, Timothy Bretl, Derek Hoiem
Abstract In this paper, we present an incremental structure from motion (SfM) algorithm that significantly outperforms existing algorithms when fiducial markers are present in the scene, and that matches the performance of existing algorithms when no markers are present. Our algorithm uses markers to limit potential incorrect image matches, change the order in which images are added to the reconstruction, and enforce new bundle adjustment constraints. To validate our algorithm, we introduce a new dataset with 16 image collections of large indoor scenes with challenging characteristics (e.g., blank hallways, glass facades, brick walls) and with markers placed throughout. We show that our algorithm produces complete, accurate reconstructions on all 16 image collections, most of which cause other algorithms to fail. Further, by selectively masking fiducial markers, we show that the presence of even a small number of markers can improve the results of our algorithm.
Tasks
Published 2018-09-01
URL http://openaccess.thecvf.com/content_ECCV_2018/html/Joseph_DeGol_Improved_Structure_from_ECCV_2018_paper.html
PDF http://openaccess.thecvf.com/content_ECCV_2018/papers/Joseph_DeGol_Improved_Structure_from_ECCV_2018_paper.pdf
PWC https://paperswithcode.com/paper/improved-structure-from-motion-using-fiducial
Repo
Framework

UTFPR at WMT 2018: Minimalistic Supervised Corpora Filtering for Machine Translation

Title UTFPR at WMT 2018: Minimalistic Supervised Corpora Filtering for Machine Translation
Authors Gustavo Paetzold
Abstract We present the UTFPR systems at the WMT 2018 parallel corpus filtering task. Our supervised approach discerns between good and bad translations by training classic binary classification models over an artificially produced binary classification dataset derived from a high-quality translation set, and a minimalistic set of 6 semantic distance features that rely only on easy-to-gather resources. We rank translations by their probability for the "good" label. Our results show that logistic regression pairs best with our approach, yielding more consistent results throughout the different settings evaluated.
Tasks Language Modelling, Machine Translation
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6483/
PDF https://www.aclweb.org/anthology/W18-6483
PWC https://paperswithcode.com/paper/utfpr-at-wmt-2018-minimalistic-supervised
Repo
Framework
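
Ranking translations by their probability for the "good" label, as described in the abstract above, can be sketched as follows. This is a minimal illustration and not the UTFPR system: the weights, bias, two-feature vectors, and function names are all hypothetical, whereas the paper trains its models on 6 semantic distance features.

```python
from math import exp


def good_probability(features, weights, bias):
    """Logistic-regression probability of the "good" label for one sentence pair."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))


def rank_pairs(pairs, weights, bias):
    """Sort (name, features) pairs from most to least likely to be "good"."""
    return sorted(
        pairs,
        key=lambda pair: good_probability(pair[1], weights, bias),
        reverse=True,
    )


# Hypothetical feature vectors and parameters, for illustration only.
pairs = [("noisy pair", [0.1, 0.2]), ("clean pair", [0.9, 0.8])]
ranked = rank_pairs(pairs, weights=[1.0, 1.0], bias=-1.0)
print([name for name, _ in ranked])
```

A filtering step would then keep the top-ranked pairs up to whatever size budget the task sets.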

Findings of the 2018 Conference on Machine Translation (WMT18)

Title Findings of the 2018 Conference on Machine Translation (WMT18)
Authors Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Philipp Koehn, Christof Monz
Abstract This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018. Participants were asked to build machine translation systems for any of 7 language pairs in both directions, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. This year, we also opened up the task to additional test sets to probe specific aspects of translation.
Tasks Automatic Post-Editing, Machine Translation, Multimodal Machine Translation
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6401/
PDF https://www.aclweb.org/anthology/W18-6401
PWC https://paperswithcode.com/paper/findings-of-the-2018-conference-on-machine
Repo
Framework

Keep It or Not: Word Level Quality Estimation for Post-Editing

Title Keep It or Not: Word Level Quality Estimation for Post-Editing
Authors Prasenjit Basu, Santanu Pal, Sudip Kumar Naskar
Abstract The paper presents our participation in the WMT 2018 shared task on word level quality estimation (QE) of machine translated (MT) text, i.e., to predict whether a word in MT output for a given source context is correctly translated and hence should be retained in the post-edited translation (PE), or not. To perform the QE task, we measure the similarity of the source context of the target MT word with the context for which the word is retained in PE in the training data. This is achieved in two different ways, using a Bag-of-Words (BoW) model and a Document-to-Vector (Doc2Vec) model. In the BoW model, we compute the cosine similarity while in the Doc2Vec model we consider the Doc2Vec similarity. By applying the Kneedle algorithm on the F1mult vs. similarity score plot, we derive the threshold based on which OK/BAD decisions are taken for the MT words. Experimental results revealed that the Doc2Vec model performs better than the BoW model on the word level QE task.
Tasks Language Modelling, Machine Translation
Published 2018-10-01
URL https://www.aclweb.org/anthology/W18-6457/
PDF https://www.aclweb.org/anthology/W18-6457
PWC https://paperswithcode.com/paper/keep-it-or-not-word-level-quality-estimation
Repo
Framework
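
The Bag-of-Words variant described in the abstract above compares two contexts by cosine similarity and then thresholds the score. The sketch below shows only that step, under stated assumptions: contexts are pre-tokenized lists, the function names are hypothetical, and the threshold is passed in by hand rather than derived with the Kneedle algorithm as in the paper.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the bag-of-words vectors of two token lists."""
    bag_a, bag_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(bag_a[t] * bag_b[t] for t in bag_a.keys() & bag_b.keys())
    norm_a = sqrt(sum(count * count for count in bag_a.values()))
    norm_b = sqrt(sum(count * count for count in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty context has no direction to compare
    return dot / (norm_a * norm_b)


def label_word(source_context, retained_context, threshold):
    """Tag an MT word OK if its source context resembles a context where it was kept."""
    score = cosine_similarity(source_context, retained_context)
    return "OK" if score >= threshold else "BAD"


# Hypothetical contexts and threshold, for illustration only.
print(label_word(["the", "red", "house"], ["the", "red", "car"], threshold=0.5))
```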

Are we experiencing the Golden Age of Automatic Post-Editing?

Title Are we experiencing the Golden Age of Automatic Post-Editing?
Authors Marcin Junczys-Dowmunt
Abstract
Tasks Automatic Post-Editing
Published 2018-03-01
URL https://www.aclweb.org/anthology/W18-2105/
PDF https://www.aclweb.org/anthology/W18-2105
PWC https://paperswithcode.com/paper/are-we-experiencing-the-golden-age-of
Repo
Framework

Challenges in Adaptive Neural Machine Translation

Title Challenges in Adaptive Neural Machine Translation
Authors Marcello Federico
Abstract
Tasks Automatic Post-Editing, Machine Translation
Published 2018-03-01
URL https://www.aclweb.org/anthology/W18-2106/
PDF https://www.aclweb.org/anthology/W18-2106
PWC https://paperswithcode.com/paper/challenges-in-adaptive-neural-machine
Repo
Framework

Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models

Title Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models
Authors Yining Wang, Xi Chen, Yuan Zhou
Abstract In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection-based algorithm achieves an item-independent regret bound of O(sqrt(T log log T)), which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.
Tasks
Published 2018-12-01
URL http://papers.nips.cc/paper/7573-near-optimal-policies-for-dynamic-multinomial-logit-assortment-selection-models
PDF http://papers.nips.cc/paper/7573-near-optimal-policies-for-dynamic-multinomial-logit-assortment-selection-models.pdf
PWC https://paperswithcode.com/paper/near-optimal-policies-for-dynamic-multinomial
Repo
Framework
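
The trisection idea named in the abstract above can be illustrated in its simplest, noiseless form: repeatedly discard a third of an interval when maximizing a unimodal function. This sketch is not the paper's algorithm, which works with noisy bandit feedback and adaptive confidence parameters; the function name, tolerance, and test function here are assumptions.

```python
def trisection_maximize(f, lo, hi, tol=1e-6):
    """Locate the maximizer of a unimodal function f on [lo, hi] by trisection."""
    while hi - lo > tol:
        third = (hi - lo) / 3
        m1, m2 = lo + third, hi - third
        if f(m1) < f(m2):
            lo = m1  # the maximizer cannot lie in [lo, m1)
        else:
            hi = m2  # the maximizer cannot lie in (m2, hi]
    return (lo + hi) / 2


# Hypothetical unimodal objective with its peak at x = 2.
best = trisection_maximize(lambda x: -(x - 2.0) ** 2, 0.0, 5.0)
print(round(best, 4))
```

Each iteration shrinks the interval to two thirds of its length, so the number of evaluations grows only logarithmically in the desired precision.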

Simulated+Unsupervised Learning With Adaptive Data Generation and Bidirectional Mappings

Title Simulated+Unsupervised Learning With Adaptive Data Generation and Bidirectional Mappings
Authors Kangwook Lee, Hoon Kim, Changho Suh
Abstract Collecting a large dataset with high-quality annotations is expensive and time-consuming. Recently, Shrivastava et al. (2017) proposed Simulated+Unsupervised (S+U) learning: it first learns a mapping from synthetic data to real data, translates a large amount of labeled synthetic data into data that resemble real data, and then trains a learning model on the translated data. Bousmalis et al. (2017) proposed a similar framework that jointly trains a translation mapping and a learning model. While these algorithms have been shown to achieve state-of-the-art performance on various tasks, there may be room for improvement, as they do not fully leverage the flexibility of the data simulation process and consider only the forward (synthetic to real) mapping. Inspired by this limitation, we propose a new S+U learning algorithm, which fully leverages the flexibility of data simulators and bidirectional mappings between synthetic data and real data. We show that our approach achieves improved performance on the gaze estimation task, outperforming Shrivastava et al. (2017).
Tasks Gaze Estimation
Published 2018-01-01
URL https://openreview.net/forum?id=SkHDoG-Cb
PDF https://openreview.net/pdf?id=SkHDoG-Cb
PWC https://paperswithcode.com/paper/simulatedunsupervised-learning-with-adaptive
Repo
Framework

Sentence Level Temporality Detection using an Implicit Time-sensed Resource

Title Sentence Level Temporality Detection using an Implicit Time-sensed Resource
Authors Sabyasachi Kamila, Asif Ekbal, Pushpak Bhattacharyya
Abstract
Tasks Information Retrieval
Published 2018-05-01
URL https://www.aclweb.org/anthology/L18-1049/
PDF https://www.aclweb.org/anthology/L18-1049
PWC https://paperswithcode.com/paper/sentence-level-temporality-detection-using-an
Repo
Framework