Paper Group NANR 135
Konbitzul: an MWE-specific database for Spanish-Basque
Title | Konbitzul: an MWE-specific database for Spanish-Basque |
Authors | Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola |
Abstract | |
Tasks | Machine Translation |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1397/ |
https://www.aclweb.org/anthology/L18-1397 | |
PWC | https://paperswithcode.com/paper/konbitzul-an-mwe-specific-database-for |
Repo | |
Framework | |
Handling Big Data and Sensitive Data Using EUDAT’s Generic Execution Framework and the WebLicht Workflow Engine.
Title | Handling Big Data and Sensitive Data Using EUDAT’s Generic Execution Framework and the WebLicht Workflow Engine. |
Authors | Claus Zinn, Wei Qui, Marie Hinrichs, Emanuel Dima, Alexandr Chernov |
Abstract | |
Tasks | |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1285/ |
https://www.aclweb.org/anthology/L18-1285 | |
PWC | https://paperswithcode.com/paper/handling-big-data-and-sensitive-data-using |
Repo | |
Framework | |
Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies
Title | Towards identifying the optimal datasize for lexically-based Bayesian inference of linguistic phylogenies |
Authors | Taraka Rama, Søren Wichmann |
Abstract | Bayesian linguistic phylogenies are standardly based on cognate matrices for words referring to a fixed set of meanings, typically around 100-200. To this day there has not been any empirical investigation into which datasize is optimal. Here we determine, across a set of language families, the optimal number of meanings required for the best performance in Bayesian phylogenetic inference. We rank meanings by stability, infer phylogenetic trees using first the most stable meaning, then the two most stable meanings, and so on, computing the quartet distance of the resulting tree to the tree proposed by language family experts at each step of datasize increase. When a gold standard tree is not available, we propose to instead compute the quartet distance between the tree based on the n most stable meanings and the one based on the n + 1 most stable meanings, increasing n from 1 to N − 1, where N is the total number of meanings. The assumption here is that the value of n at which the quartet distance begins to stabilize is also the value at which the quality of the tree ceases to improve. We show that this assumption is borne out. The results of the two methods vary across families, and the optimal number of meanings appears to correlate with the number of languages under consideration. |
Tasks | Bayesian Inference |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1134/ |
https://www.aclweb.org/anthology/C18-1134 | |
PWC | https://paperswithcode.com/paper/towards-identifying-the-optimal-datasize-for |
Repo | |
Framework | |
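The stabilization criterion described in the abstract above can be illustrated with a small sketch. It assumes the quartet distances between consecutive trees have already been computed and are passed in as a plain list; the function name `first_stable_index`, the tolerance `tol`, and the `window` length are hypothetical choices for illustration, not values or a procedure taken from the paper.

```python
from typing import Optional, Sequence


def first_stable_index(
    distances: Sequence[float], tol: float = 0.05, window: int = 3
) -> Optional[int]:
    """Return the first index from which `window` consecutive distances are <= tol.

    distances[i] is assumed to hold the quartet distance between the tree built
    from the (i + 1) most stable meanings and the tree built from one more.
    Both `tol` and `window` are illustrative assumptions.
    """
    for i in range(len(distances) - window + 1):
        if all(d <= tol for d in distances[i:i + window]):
            return i
    return None  # the sequence never stabilizes under this rule


# Toy distances that shrink as meanings are added (made-up numbers).
toy = [0.5, 0.3, 0.2, 0.04, 0.03, 0.02, 0.01]
print(first_stable_index(toy))
```

A windowed rule is used here so that a single small distance followed by a jump is not mistaken for stabilization; other stopping rules would fit the description equally well.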
Utilization of Nganasan digital resources: a statistical approach to vowel harmony
Title | Utilization of Nganasan digital resources: a statistical approach to vowel harmony |
Authors | László Fejes |
Abstract | |
Tasks | |
Published | 2018-01-01 |
URL | https://www.aclweb.org/anthology/W18-0211/ |
https://www.aclweb.org/anthology/W18-0211 | |
PWC | https://paperswithcode.com/paper/utilization-of-nganasan-digital-resources-a |
Repo | |
Framework | |
An OpenNMT Model to Arabic Broken Plurals
Title | An OpenNMT Model to Arabic Broken Plurals |
Authors | Elsayed Issa |
Abstract | Arabic Broken Plurals show an interesting phenomenon in Arabic morphology as they are formed by shifting the consonants of the syllables into different syllable patterns, and subsequently, the pattern of the word changes. The present paper, therefore, attempts to look at Arabic broken plurals from the perspective of neural networks by implementing an OpenNMT experiment to better understand and interpret the behavior of these plurals, especially when it comes to L2 acquisition. The results show that the model is successful in predicting the Arabic template. However, it fails to predict certain consonants such as the emphatics and the gutturals. This reinforces the fact that these consonants or sounds are the most difficult for L2 learners to acquire. |
Tasks | Language Acquisition |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/W18-4103/ |
https://www.aclweb.org/anthology/W18-4103 | |
PWC | https://paperswithcode.com/paper/an-opennmt-model-to-arabic-broken-plurals |
Repo | |
Framework | |
A Hybrid Learning Scheme for Chinese Word Embedding
Title | A Hybrid Learning Scheme for Chinese Word Embedding |
Authors | Wenfan Chen, Weiguo Sheng |
Abstract | To improve word embedding, subword information has been widely employed in state-of-the-art methods. These methods can be classified as either compositional or predictive models. In this paper, we propose a hybrid learning scheme, which integrates compositional and predictive models for word embedding. Such a scheme can take advantage of both models, thus effectively learning word embeddings. The proposed scheme has been applied to learn word representations for Chinese. Our results show that the proposed scheme can significantly improve the performance of word embedding in terms of analogical reasoning and is robust to the size of the training data. |
Tasks | Language Modelling, Machine Translation, Question Answering, Representation Learning, Word Embeddings |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-3011/ |
https://www.aclweb.org/anthology/W18-3011 | |
PWC | https://paperswithcode.com/paper/a-hybrid-learning-scheme-for-chinese-word |
Repo | |
Framework | |
Improved Structure from Motion Using Fiducial Marker Matching
Title | Improved Structure from Motion Using Fiducial Marker Matching |
Authors | Joseph DeGol, Timothy Bretl, Derek Hoiem |
Abstract | In this paper, we present an incremental structure from motion (SfM) algorithm that significantly outperforms existing algorithms when fiducial markers are present in the scene, and that matches the performance of existing algorithms when no markers are present. Our algorithm uses markers to limit potential incorrect image matches, change the order in which images are added to the reconstruction, and enforce new bundle adjustment constraints. To validate our algorithm, we introduce a new dataset with 16 image collections of large indoor scenes with challenging characteristics (e.g., blank hallways, glass facades, brick walls) and with markers placed throughout. We show that our algorithm produces complete, accurate reconstructions on all 16 image collections, most of which cause other algorithms to fail. Further, by selectively masking fiducial markers, we show that the presence of even a small number of markers can improve the results of our algorithm. |
Tasks | |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Joseph_DeGol_Improved_Structure_from_ECCV_2018_paper.html |
http://openaccess.thecvf.com/content_ECCV_2018/papers/Joseph_DeGol_Improved_Structure_from_ECCV_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/improved-structure-from-motion-using-fiducial |
Repo | |
Framework | |
UTFPR at WMT 2018: Minimalistic Supervised Corpora Filtering for Machine Translation
Title | UTFPR at WMT 2018: Minimalistic Supervised Corpora Filtering for Machine Translation |
Authors | Gustavo Paetzold |
Abstract | We present the UTFPR systems at the WMT 2018 parallel corpus filtering task. Our supervised approach discerns between good and bad translations by training classic binary classification models over an artificially produced binary classification dataset derived from a high-quality translation set, and a minimalistic set of 6 semantic distance features that rely only on easy-to-gather resources. We rank translations by their probability for the "good" label. Our results show that logistic regression pairs best with our approach, yielding more consistent results throughout the different settings evaluated. |
Tasks | Language Modelling, Machine Translation |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6483/ |
https://www.aclweb.org/anthology/W18-6483 | |
PWC | https://paperswithcode.com/paper/utfpr-at-wmt-2018-minimalistic-supervised |
Repo | |
Framework | |
Findings of the 2018 Conference on Machine Translation (WMT18)
Title | Findings of the 2018 Conference on Machine Translation (WMT18) |
Authors | Ondřej Bojar, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Philipp Koehn, Christof Monz |
Abstract | This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2018. Participants were asked to build machine translation systems for any of 7 language pairs in both directions, to be evaluated on a test set of news stories. The main metric for this task is human judgment of translation quality. This year, we also opened up the task to additional test sets to probe specific aspects of translation. |
Tasks | Automatic Post-Editing, Machine Translation, Multimodal Machine Translation |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6401/ |
https://www.aclweb.org/anthology/W18-6401 | |
PWC | https://paperswithcode.com/paper/findings-of-the-2018-conference-on-machine |
Repo | |
Framework | |
Keep It or Not: Word Level Quality Estimation for Post-Editing
Title | Keep It or Not: Word Level Quality Estimation for Post-Editing |
Authors | Prasenjit Basu, Santanu Pal, Sudip Kumar Naskar |
Abstract | The paper presents our participation in the WMT 2018 shared task on word level quality estimation (QE) of machine translated (MT) text, i.e., to predict whether a word in MT output for a given source context is correctly translated and hence should be retained in the post-edited translation (PE), or not. To perform the QE task, we measure the similarity of the source context of the target MT word with the context for which the word is retained in PE in the training data. This is achieved in two different ways, using a Bag-of-Words (BoW) model and a Document-to-Vector (Doc2Vec) model. In the BoW model, we compute the cosine similarity, while in the Doc2Vec model we consider the Doc2Vec similarity. By applying the Kneedle algorithm on the F1mult vs. similarity score plot, we derive the threshold based on which OK/BAD decisions are taken for the MT words. Experimental results revealed that the Doc2Vec model performs better than the BoW model on the word level QE task. |
Tasks | Language Modelling, Machine Translation |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-6457/ |
https://www.aclweb.org/anthology/W18-6457 | |
PWC | https://paperswithcode.com/paper/keep-it-or-not-word-level-quality-estimation |
Repo | |
Framework | |
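As a rough illustration of the bag-of-words cosine similarity mentioned in the abstract above, here is a minimal sketch using only the Python standard library. The whitespace tokenization, the lowercasing, and the function name `bow_cosine` are assumptions made for this example; the paper's actual preprocessing and context windows are not specified in the listing.

```python
import math
from collections import Counter


def bow_cosine(context_a: str, context_b: str) -> float:
    """Cosine similarity between two contexts represented as word-count vectors."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    va = Counter(context_a.lower().split())
    vb = Counter(context_b.lower().split())
    # Only words present in both contexts contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty context has no direction to compare
    return dot / (norm_a * norm_b)


print(bow_cosine("the cat sat", "the cat ran"))
```

In a setting like the one described, a score of this kind would then be compared against a threshold to make the OK/BAD decision; the threshold selection step (Kneedle) is not sketched here.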
Are we experiencing the Golden Age of Automatic Post-Editing?
Title | Are we experiencing the Golden Age of Automatic Post-Editing? |
Authors | Marcin Junczys-Dowmunt |
Abstract | |
Tasks | Automatic Post-Editing |
Published | 2018-03-01 |
URL | https://www.aclweb.org/anthology/W18-2105/ |
https://www.aclweb.org/anthology/W18-2105 | |
PWC | https://paperswithcode.com/paper/are-we-experiencing-the-golden-age-of |
Repo | |
Framework | |
Challenges in Adaptive Neural Machine Translation
Title | Challenges in Adaptive Neural Machine Translation |
Authors | Marcello Federico |
Abstract | |
Tasks | Automatic Post-Editing, Machine Translation |
Published | 2018-03-01 |
URL | https://www.aclweb.org/anthology/W18-2106/ |
https://www.aclweb.org/anthology/W18-2106 | |
PWC | https://paperswithcode.com/paper/challenges-in-adaptive-neural-machine |
Repo | |
Framework | |
Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models
Title | Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models |
Authors | Yining Wang, Xi Chen, Yuan Zhou |
Abstract | In this paper we consider the dynamic assortment selection problem under an uncapacitated multinomial-logit (MNL) model. By carefully analyzing a revenue potential function, we show that a trisection based algorithm achieves an item-independent regret bound of O(sqrt(T log log T)), which matches information theoretical lower bounds up to iterated logarithmic terms. Our proof technique draws tools from the unimodal/convex bandit literature as well as adaptive confidence parameters in minimax multi-armed bandit problems. |
Tasks | |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/7573-near-optimal-policies-for-dynamic-multinomial-logit-assortment-selection-models |
http://papers.nips.cc/paper/7573-near-optimal-policies-for-dynamic-multinomial-logit-assortment-selection-models.pdf | |
PWC | https://paperswithcode.com/paper/near-optimal-policies-for-dynamic-multinomial |
Repo | |
Framework | |
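The abstract above refers to a trisection based algorithm. The sketch below shows only the basic, noise-free trisection idea for maximizing a unimodal function on an interval; it is not the paper's bandit algorithm, which works from noisy revenue observations and uses adaptive confidence parameters that are omitted here. The function name `trisection_max` and the tolerance are hypothetical.

```python
from typing import Callable


def trisection_max(
    f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-6
) -> float:
    """Approximate the maximizer of a unimodal function f on [lo, hi]."""
    while hi - lo > tol:
        third = (hi - lo) / 3.0
        m1 = lo + third
        m2 = hi - third
        if f(m1) < f(m2):
            lo = m1  # the maximizer cannot lie in [lo, m1)
        else:
            hi = m2  # the maximizer cannot lie in (m2, hi]
    return (lo + hi) / 2.0


# Toy unimodal objective with its peak at x = 2 (illustrative only).
print(trisection_max(lambda x: -(x - 2.0) ** 2, 0.0, 5.0))
```

Each iteration discards one third of the interval, so the search width shrinks geometrically; with noisy evaluations, the comparison of f(m1) and f(m2) would need repeated sampling before a third can safely be discarded.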
Simulated+Unsupervised Learning With Adaptive Data Generation and Bidirectional Mappings
Title | Simulated+Unsupervised Learning With Adaptive Data Generation and Bidirectional Mappings |
Authors | Kangwook Lee, Hoon Kim, Changho Suh |
Abstract | Collecting a large dataset with high quality annotations is expensive and time-consuming. Recently, Shrivastava et al. (2017) proposed Simulated+Unsupervised (S+U) learning: it first learns a mapping from synthetic data to real data, translates a large amount of labeled synthetic data into data that resemble real data, and then trains a learning model on the translated data. Bousmalis et al. (2017) propose a similar framework that jointly trains a translation mapping and a learning model. While these algorithms are shown to achieve state-of-the-art performance on various tasks, there may be room for improvement, as they do not fully leverage the flexibility of the data simulation process and consider only the forward (synthetic to real) mapping. Inspired by this limitation, we propose a new S+U learning algorithm, which fully leverages the flexibility of data simulators and bidirectional mappings between synthetic data and real data. We show that our approach achieves improved performance on the gaze estimation task, outperforming Shrivastava et al. (2017). |
Tasks | Gaze Estimation |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=SkHDoG-Cb |
https://openreview.net/pdf?id=SkHDoG-Cb | |
PWC | https://paperswithcode.com/paper/simulatedunsupervised-learning-with-adaptive |
Repo | |
Framework | |
Sentence Level Temporality Detection using an Implicit Time-sensed Resource
Title | Sentence Level Temporality Detection using an Implicit Time-sensed Resource |
Authors | Sabyasachi Kamila, Asif Ekbal, Pushpak Bhattacharyya |
Abstract | |
Tasks | Information Retrieval |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1049/ |
https://www.aclweb.org/anthology/L18-1049 | |
PWC | https://paperswithcode.com/paper/sentence-level-temporality-detection-using-an |
Repo | |
Framework | |