October 15, 2019

Paper Group NANR 135

Konbitzul: an MWE-specific database for Spanish-Basque. Handling Big Data and Sensitive Data Using EUDAT’s Generic Execution Framework and the WebLicht Workflow Engine. Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies. Utilization of Nganasan digital resources: a statistical approach to vowe …

Konbitzul: an MWE-specific database for Spanish-Basque


Title	Konbitzul: an MWE-specific database for Spanish-Basque
Authors	Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
Abstract
Tasks	Machine Translation
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1397/
PDF	https://www.aclweb.org/anthology/L18-1397
PWC	https://paperswithcode.com/paper/konbitzul-an-mwe-specific-database-for
Repo
Framework

Handling Big Data and Sensitive Data Using EUDAT’s Generic Execution Framework and the WebLicht Workflow Engine.


Title	Handling Big Data and Sensitive Data Using EUDAT’s Generic Execution Framework and the WebLicht Workflow Engine.
Authors	Claus Zinn, Wei Qui, Marie Hinrichs, Emanuel Dima, Alexandr Chernov
Abstract
Tasks
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1285/
PDF	https://www.aclweb.org/anthology/L18-1285
PWC	https://paperswithcode.com/paper/handling-big-data-and-sensitive-data-using
Repo
Framework

Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies


Title	Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies
Authors	Taraka Rama, Søren Wichmann
Abstract	Bayesian linguistic phylogenies are standardly based on cognate matrices for words referring to a fixed set of meanings, typically around 100-200. To this day there has not been any empirical investigation into which datasize is optimal. Here we determine, across a set of language families, the optimal number of meanings required for the best performance in Bayesian phylogenetic inference. We rank meanings by stability, infer phylogenetic trees using first the most stable meaning, then the two most stable meanings, and so on, computing the quartet distance of the resulting tree to the tree proposed by language family experts at each step of datasize increase. When a gold standard tree is not available we propose to instead compute the quartet distance between the tree based on the n most stable meanings and the one based on the n + 1 most stable meanings, increasing n from 1 to N - 1, where N is the total number of meanings. The assumption here is that the value of n for which the quartet distance begins to stabilize is also the value at which the quality of the tree ceases to improve. We show that this assumption is borne out. The results of the two methods vary across families, and the optimal number of meanings appears to correlate with the number of languages under consideration.
Tasks	Bayesian Inference
Published	2018-08-01
URL	https://www.aclweb.org/anthology/C18-1134/
PDF	https://www.aclweb.org/anthology/C18-1134
PWC	https://paperswithcode.com/paper/towards-identifying-the-optimal-datasize-for
Repo
Framework
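
The stopping idea in the abstract above (watch the quartet distance between successive trees and note where it settles) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `first_stable_n`, the tolerance `tol`, and the `window` length are all assumptions, as is the reading of "stabilize" as "stays at or below a tolerance for several consecutive steps". Computing the quartet distances themselves is out of scope here; the input list is made-up data.

```python
from typing import Optional, Sequence


def first_stable_n(
    distances: Sequence[float], tol: float = 0.01, window: int = 3
) -> Optional[int]:
    """Return the smallest n (1-based) from which `window` consecutive
    distances all lie at or below `tol`, or None if that never happens.

    distances[i] is assumed to hold the quartet distance between the tree
    built from the i + 1 most stable meanings and the tree built from the
    i + 2 most stable meanings.
    """
    for start in range(len(distances) - window + 1):
        if all(d <= tol for d in distances[start:start + window]):
            return start + 1  # convert 0-based index to n
    return None


# Made-up distances for illustration only; not results from the paper.
example = [0.40, 0.22, 0.09, 0.008, 0.006, 0.007, 0.005]
print(first_stable_n(example))
```

A larger `window` makes the rule more conservative, since a single small distance followed by a jump is then not mistaken for stabilization.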

Utilization of Nganasan digital resources: a statistical approach to vowel harmony


Title	Utilization of Nganasan digital resources: a statistical approach to vowel harmony
Authors	László Fejes
Abstract
Tasks
Published	2018-01-01
URL	https://www.aclweb.org/anthology/W18-0211/
PDF	https://www.aclweb.org/anthology/W18-0211
PWC	https://paperswithcode.com/paper/utilization-of-nganasan-digital-resources-a
Repo
Framework

An OpenNMT Model to Arabic Broken Plurals


Title	An OpenNMT Model to Arabic Broken Plurals
Authors	Elsayed Issa
Abstract	Arabic Broken Plurals show an interesting phenomenon in Arabic morphology as they are formed by shifting the consonants of the syllables into different syllable patterns, and subsequently, the pattern of the word changes. The present paper, therefore, attempts to look at Arabic broken plurals from the perspective of neural networks by implementing an OpenNMT experiment to better understand and interpret the behavior of these plurals, especially when it comes to L2 acquisition. The results show that the model is successful in predicting the Arabic template. However, it fails to predict certain consonants such as the emphatics and the gutturals. This reinforces the fact that these consonants or sounds are the most difficult for L2 learners to acquire.
Tasks	Language Acquisition
Published	2018-08-01
URL	https://www.aclweb.org/anthology/W18-4103/
PDF	https://www.aclweb.org/anthology/W18-4103
PWC	https://paperswithcode.com/paper/an-opennmt-model-to-arabic-broken-plurals
Repo
Framework

A Hybrid Learning Scheme for Chinese Word Embedding


Title	A Hybrid Learning Scheme for Chinese Word Embedding
Authors	Wenfan Chen, Weiguo Sheng
Abstract	To improve word embedding, subword information has been widely employed in state-of-the-art methods. These methods can be classified as either compositional or predictive models. In this paper, we propose a hybrid learning scheme, which integrates compositional and predictive models for word embedding. Such a scheme can take advantage of both models, thus effectively learning word embeddings. The proposed scheme has been applied to learn word representations for Chinese. Our results show that the proposed scheme can significantly improve the performance of word embedding in terms of analogical reasoning and is robust to the size of training data.
Tasks	Language Modelling, Machine Translation, Question Answering, Representation Learning, Word Embeddings
Published	2018-07-01
URL	https://www.aclweb.org/anthology/W18-3011/
PDF	https://www.aclweb.org/anthology/W18-3011
PWC	https://paperswithcode.com/paper/a-hybrid-learning-scheme-for-chinese-word
Repo
Framework

Improved Structure from Motion Using Fiducial Marker Matching


Title	Improved Structure from Motion Using Fiducial Marker Matching
Authors	Joseph DeGol, Timothy Bretl, Derek Hoiem
Abstract	In this paper, we present an incremental structure from motion (SfM) algorithm that significantly outperforms existing algorithms when fiducial markers are present in the scene, and that matches the performance of existing algorithms when no markers are present. Our algorithm uses markers to limit potential incorrect image matches, change the order in which images are added to the reconstruction, and enforce new bundle adjustment constraints. To validate our algorithm, we introduce a new dataset with 16 image collections of large indoor scenes with challenging characteristics (e.g., blank hallways, glass facades, brick walls) and with markers placed throughout. We show that our algorithm produces complete, accurate reconstructions on all 16 image collections, most of which cause other algorithms to fail. Further, by selectively masking fiducial markers, we show that the presence of even a small number of markers can improve the results of our algorithm.
Tasks
Published	2018-09-01
URL	http://openaccess.thecvf.com/content_ECCV_2018/html/Joseph_DeGol_Improved_Structure_from_ECCV_2018_paper.html
PDF	http://openaccess.thecvf.com/content_ECCV_2018/papers/Joseph_DeGol_Improved_Structure_from_ECCV_2018_paper.pdf
PWC	https://paperswithcode.com/paper/improved-structure-from-motion-using-fiducial
Repo
Framework

UTFPR at WMT 2018: Minimalistic Supervised Corpora Filtering for Machine Translation


Title	UTFPR at WMT 2018: Minimalistic Supervised Corpora Filtering for Machine Translation
Authors	Gustavo Paetzold
Abstract	We present the UTFPR systems at the WMT 2018 parallel corpus filtering task. Our supervised approach discerns between good and bad translations by training classic binary classification models over an artificially produced binary classification dataset derived from a high-quality translation set, and a minimalistic set of 6 semantic distance features that rely only on easy-to-gather resources. We rank translations by their probability for the "good" label. Our results show that logistic regression pairs best with our approach, yielding more consistent results throughout the different settings evaluated.
Tasks	Language Modelling, Machine Translation
Published	2018-10-01
URL	https://www.aclweb.org/anthology/W18-6483/
PDF	https://www.aclweb.org/anthology/W18-6483
PWC	https://paperswithcode.com/paper/utfpr-at-wmt-2018-minimalistic-supervised
Repo
Framework
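
The ranking step described in the abstract above (score each translation pair by its probability for the "good" label, then sort) might look roughly like the sketch below. It covers only the scoring and ranking, not training. Everything concrete in it is an assumption for illustration: the function names, the weights and bias, and the use of two toy features instead of the paper's six semantic distance features, which the abstract does not specify.

```python
import math
from typing import List, Sequence, Tuple


def good_probability(
    features: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Logistic-regression probability of the "good" label for one pair."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank_pairs(
    pairs: Sequence[Tuple[str, Sequence[float]]],
    weights: Sequence[float],
    bias: float,
) -> List[Tuple[str, float]]:
    """Return (pair_id, probability) tuples, most likely "good" first."""
    scored = [(pid, good_probability(f, weights, bias)) for pid, f in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical pairs, features, and model parameters (not from the paper).
toy_pairs = [("p1", [0.9, 0.8]), ("p2", [0.1, 0.2]), ("p3", [0.5, 0.5])]
for pair_id, prob in rank_pairs(toy_pairs, weights=[2.0, 1.0], bias=-1.5):
    print(pair_id, round(prob, 3))
```

Filtering a corpus then amounts to keeping the top of this ranking up to whatever size budget the task sets.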

Findings of the 2018 Conference on Machine Translation (WMT18)


Title	Findings of the 2018 Conference on Machine Translation (WMT18)
Authors	Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Philipp Koehn, Christof Monz
Abstract	This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018. Participants were asked to build machine translation systems for any of 7 language pairs in both directions, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. This year, we also opened up the task to additional test sets to probe specific aspects of translation.
Tasks	Automatic Post-Editing, Machine Translation, Multimodal Machine Translation
Published	2018-10-01
URL	https://www.aclweb.org/anthology/W18-6401/
PDF	https://www.aclweb.org/anthology/W18-6401
PWC	https://paperswithcode.com/paper/findings-of-the-2018-conference-on-machine
Repo
Framework

Keep It or Not: Word Level Quality Estimation for Post-Editing


Title	Keep It or Not: Word Level Quality Estimation for Post-Editing
Authors	Prasenjit Basu, Santanu Pal, Sudip Kumar Naskar
Abstract	The paper presents our participation in the WMT 2018 shared task on word level quality estimation (QE) of machine translated (MT) text, i.e., to predict whether a word in MT output for a given source context is correctly translated and hence should be retained in the post-edited translation (PE), or not. To perform the QE task, we measure the similarity of the source context of the target MT word with the context for which the word is retained in PE in the training data. This is achieved in two different ways, using a Bag-of-Words (BoW) model and a Document-to-Vector (Doc2Vec) model. In the BoW model, we compute the cosine similarity while in the Doc2Vec model we consider the Doc2Vec similarity. By applying the Kneedle algorithm on the F1mult vs. similarity score plot, we derive the threshold based on which OK/BAD decisions are taken for the MT words. Experimental results revealed that the Doc2Vec model performs better than the BoW model on the word level QE task.
Tasks	Language Modelling, Machine Translation
Published	2018-10-01
URL	https://www.aclweb.org/anthology/W18-6457/
PDF	https://www.aclweb.org/anthology/W18-6457
PWC	https://paperswithcode.com/paper/keep-it-or-not-word-level-quality-estimation
Repo
Framework
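
As a rough illustration of the BoW variant described in the abstract above, the sketch below computes the cosine similarity between two token lists treated as bags of words and turns it into an OK/BAD label with a threshold. The function names and the threshold value of 0.5 are assumptions; in the paper the threshold is derived with the Kneedle algorithm, which is not reproduced here.

```python
import math
from collections import Counter
from typing import Sequence


def cosine_bow(a: Sequence[str], b: Sequence[str]) -> float:
    """Cosine similarity between two token sequences as bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[tok] * cb[tok] for tok in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty context has no defined direction
    return dot / (norm_a * norm_b)


def label_word(similarity: float, threshold: float = 0.5) -> str:
    """Map a similarity score to OK/BAD; the threshold is a placeholder."""
    return "OK" if similarity >= threshold else "BAD"


# Toy contexts for illustration only.
current_context = "the cat sat on the mat".split()
training_context = "the cat lay on the mat".split()
sim = cosine_bow(current_context, training_context)
print(round(sim, 3), label_word(sim))
```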

Are we experiencing the Golden Age of Automatic Post-Editing?


Title	Are we experiencing the Golden Age of Automatic Post-Editing?
Authors	Marcin Junczys-Dowmunt
Abstract
Tasks	Automatic Post-Editing
Published	2018-03-01
URL	https://www.aclweb.org/anthology/W18-2105/
PDF	https://www.aclweb.org/anthology/W18-2105
PWC	https://paperswithcode.com/paper/are-we-experiencing-the-golden-age-of
Repo
Framework

Challenges in Adaptive Neural Machine Translation


Title	Challenges in Adaptive Neural Machine Translation
Authors	Marcello Federico
Abstract
Tasks	Automatic Post-Editing, Machine Translation
Published	2018-03-01
URL	https://www.aclweb.org/anthology/W18-2106/
PDF	https://www.aclweb.org/anthology/W18-2106
PWC	https://paperswithcode.com/paper/challenges-in-adaptive-neural-machine
Repo
Framework

Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models


Title	Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models
Authors	Yining Wang, Xi Chen, Yuan Zhou
Abstract	In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of O(sqrt(T log log T)), which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems.
Tasks
Published	2018-12-01
URL	http://papers.nips.cc/paper/7573-near-optimal-policies-for-dynamic-multinomial-logit-assortment-selection-models
PDF	http://papers.nips.cc/paper/7573-near-optimal-policies-for-dynamic-multinomial-logit-assortment-selection-models.pdf
PWC	https://paperswithcode.com/paper/near-optimal-policies-for-dynamic-multinomial
Repo
Framework
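
For readers unfamiliar with trisection, the sketch below shows the basic noise-free version: locating the maximiser of a unimodal function by repeatedly discarding a third of the interval. It is not the paper's algorithm, which works with noisy revenue observations and adaptive confidence parameters; the function name `trisection_max` and the quadratic test function are assumptions chosen for illustration.

```python
from typing import Callable


def trisection_max(
    f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-6
) -> float:
    """Approximate the maximiser of a unimodal function f on [lo, hi]."""
    while hi - lo > tol:
        third = (hi - lo) / 3.0
        m1 = lo + third
        m2 = hi - third
        if f(m1) < f(m2):
            lo = m1  # the maximum cannot lie to the left of m1
        else:
            hi = m2  # the maximum cannot lie to the right of m2
    return (lo + hi) / 2.0


# Hypothetical unimodal objective with its peak at x = 0.3.
peak = trisection_max(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
print(round(peak, 4))
```

Each iteration shrinks the search interval to two thirds of its length, so the number of function evaluations grows only logarithmically in 1/tol.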

Simulated+Unsupervised Learning With Adaptive Data Generation and Bidirectional Mappings


Title	Simulated+Unsupervised Learning With Adaptive Data Generation and Bidirectional Mappings
Authors	Kangwook Lee, Hoon Kim, Changho Suh
Abstract	Collecting a large dataset with high quality annotations is expensive and time-consuming. Recently, Shrivastava et al. (2017) proposed Simulated+Unsupervised (S+U) learning: it first learns a mapping from synthetic data to real data, translates a large amount of labeled synthetic data into data that resembles real data, and then trains a learning model on the translated data. Bousmalis et al. (2017) proposed a similar framework that jointly trains a translation mapping and a learning model. While these algorithms are shown to achieve state-of-the-art performance on various tasks, there may be room for improvement, as they do not fully leverage the flexibility of the data simulation process and consider only the forward (synthetic to real) mapping. Inspired by this limitation, we propose a new S+U learning algorithm, which fully leverages the flexibility of data simulators and bidirectional mappings between synthetic data and real data. We show that our approach achieves improved performance on the gaze estimation task, outperforming Shrivastava et al. (2017).
Tasks	Gaze Estimation
Published	2018-01-01
URL	https://openreview.net/forum?id=SkHDoG-Cb
PDF	https://openreview.net/pdf?id=SkHDoG-Cb
PWC	https://paperswithcode.com/paper/simulatedunsupervised-learning-with-adaptive
Repo
Framework

Sentence Level Temporality Detection using an Implicit Time-sensed Resource


Title	Sentence Level Temporality Detection using an Implicit Time-sensed Resource
Authors	Sabyasachi Kamila, Asif Ekbal, Pushpak Bhattacharyya
Abstract
Tasks	Information Retrieval
Published	2018-05-01
URL	https://www.aclweb.org/anthology/L18-1049/
PDF	https://www.aclweb.org/anthology/L18-1049
PWC	https://paperswithcode.com/paper/sentence-level-temporality-detection-using-an
Repo
Framework