Paper Group NANR 120
Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation
Title | Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation |
Authors | Dušan Variš, Ondřej Bojar |
Abstract | This work presents our ongoing research on unsupervised pretraining in neural machine translation (NMT). In our method, we initialize the weights of the encoder and decoder with two language models trained on monolingual data, and then fine-tune the model on parallel data using Elastic Weight Consolidation (EWC) to avoid forgetting the original language modeling task. We compare regularization by EWC with previous work that regularizes via language modeling objectives. The positive result is that using EWC with the decoder achieves BLEU scores similar to the previous work, while the model converges 2-3 times faster and does not require the original unlabeled training data during the fine-tuning stage. On the other hand, regularization by EWC is less effective when the original and new tasks are not closely related: we show that initializing the bidirectional NMT encoder with a left-to-right language model, and forcing the model to remember the original left-to-right language modeling task, limits the encoder's capacity to learn from the whole bidirectional context. (A schematic EWC sketch follows this entry.) |
Tasks | Language Modelling, Machine Translation |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-2017/ |
https://www.aclweb.org/anthology/P19-2017 | |
PWC | https://paperswithcode.com/paper/unsupervised-pretraining-for-neural-machine |
Repo | |
Framework | |
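The EWC penalty described in the abstract admits a compact implementation. Below is a minimal PyTorch sketch, not the authors' code: `fisher` (diagonal Fisher estimates from the language modeling task), `theta_star` (the pretrained weights), and `ewc_lambda` are illustrative names and settings.

```python
import torch

def ewc_penalty(model, fisher, theta_star, ewc_lambda=1.0):
    """Quadratic penalty that keeps fine-tuned weights near the pretrained ones."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - theta_star[name]).pow(2)).sum()
    return 0.5 * ewc_lambda * penalty

# During fine-tuning on parallel data (illustrative usage):
#   loss = translation_loss + ewc_penalty(model, fisher, theta_star)
```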
Bayesian Deep Learning via Stochastic Gradient MCMC with a Stochastic Approximation Adaptation
Title | Bayesian Deep Learning via Stochastic Gradient MCMC with a Stochastic Approximation Adaptation |
Authors | Wei Deng, Xiao Zhang, Faming Liang, Guang Lin |
Abstract | We propose a robust Bayesian deep learning algorithm to infer complex posteriors with latent variables. Inspired by dropout, a popular tool for regularization and model ensembling, we assign sparse priors to the weights of deep neural networks (DNNs) in order to achieve automatic “dropout” and avoid over-fitting. By alternately sampling from the posterior distribution through stochastic gradient Markov chain Monte Carlo (SG-MCMC) and optimizing the latent variables via stochastic approximation (SA), the trajectory of the target weights is proven to converge to the true posterior distribution conditioned on the optimal latent variables. This ensures stronger regularization of the over-fitted parameter space and more accurate uncertainty quantification of the decisive variables. Simulations on large-p-small-n regressions showcase the robustness of the method when applied to models with latent variables. Additionally, its application to convolutional neural networks (CNNs) leads to state-of-the-art performance on the MNIST and Fashion-MNIST datasets and improved resistance to adversarial attacks. (A sketch of the alternating scheme follows this entry.) |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=S1grRoR9tQ |
https://openreview.net/pdf?id=S1grRoR9tQ | |
PWC | https://paperswithcode.com/paper/bayesian-deep-learning-via-stochastic |
Repo | |
Framework | |
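A minimal sketch of the alternating scheme the abstract describes: an SGLD step on the network weights, followed by a stochastic-approximation update of the latent variables. All names (`grad_log_posterior`, `latent_target`, `a_t`) are illustrative assumptions, not the paper's code.

```python
import torch

def sgld_step(params, grad_log_post, lr):
    """One SGLD update: a gradient step plus Gaussian noise with variance 2*lr."""
    with torch.no_grad():
        for p, g in zip(params, grad_log_post):
            p.add_(lr * g + torch.randn_like(p) * (2.0 * lr) ** 0.5)

# Alternating loop (schematic):
# for t in range(T):
#     g = grad_log_posterior(params, latent, minibatch)   # weights given latent vars
#     sgld_step(params, g, lr)
#     latent = latent + a_t * (latent_target(params) - latent)  # SA update, a_t -> 0
```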
SOURCE: SOURce-Conditional Elmo-style Model for Machine Translation Quality Estimation
Title | SOURCE: SOURce-Conditional Elmo-style Model for Machine Translation Quality Estimation |
Authors | Junpei Zhou, Zhisong Zhang, Zecong Hu |
Abstract | Quality estimation (QE) of machine translation (MT) systems is a task of growing importance. It reduces the cost of post-editing, allowing machine-translated text to be used in formal settings. In this work, we describe our submission to the WMT 2019 sentence-level QE task. We mainly explore the use of pre-trained translation models in QE and adopt a bi-directional translation-like strategy. The strategy is similar to ELMo, but additionally conditions on source sentences. Experiments on the WMT QE dataset show that our strategy, which makes the pre-training slightly harder, brings improvements for QE. In the WMT 2019 QE task, our system ranked second on the En-De NMT dataset and third on the En-Ru NMT dataset. (A schematic sketch of the feature idea follows this entry.) |
Tasks | Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5411/ |
https://www.aclweb.org/anthology/W19-5411 | |
PWC | https://paperswithcode.com/paper/source-source-conditional-elmo-style-model |
Repo | |
Framework | |
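A schematic of the source-conditional, ELMo-style idea: two translation models read the same source, one decoding the target left-to-right and the other right-to-left, and their per-token log-probabilities become QE features. The `token_logprobs` interface is hypothetical; this is not the submitted system.

```python
def qe_features(src_tokens, tgt_tokens, fwd_model, bwd_model):
    """Per-token features from two source-conditioned directional models."""
    fwd = fwd_model.token_logprobs(src_tokens, tgt_tokens)              # p(y_t | y_<t, x)
    bwd = bwd_model.token_logprobs(src_tokens, tgt_tokens[::-1])[::-1]  # p(y_t | y_>t, x)
    return list(zip(fwd, bwd))  # fed to a sentence-level quality regressor
```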
Bayesian Learning for Neural Dependency Parsing
Title | Bayesian Learning for Neural Dependency Parsing |
Authors | Ehsan Shareghi, Yingzhen Li, Yi Zhu, Roi Reichart, Anna Korhonen |
Abstract | While neural dependency parsers provide state-of-the-art accuracy for several languages, they still rely on large amounts of costly labeled training data. We demonstrate that in the small-data regime, where uncertainty around parameter estimation and model prediction matters the most, Bayesian neural modeling is very effective. To overcome the computational and statistical costs of the approximate inference step in this framework, we use an efficient sampling procedure via stochastic gradient Langevin dynamics to generate samples from the approximated posterior. Moreover, we show that our Bayesian neural parser can be further improved when integrated into a multi-task parsing and POS tagging framework designed to minimize task interference via an adversarial procedure. When trained and tested on 6 languages with fewer than 5k training instances, our parser consistently outperforms the strong BiLSTM baseline (Kiperwasser and Goldberg, 2016). Compared with the biaffine parser (Dozat et al., 2017), our model achieves an improvement of up to 3% for Vietnamese and Irish, while our multi-task model achieves an improvement of up to 9% across five languages: Farsi, Russian, Turkish, Vietnamese, and Irish. (A posterior-averaging sketch follows this entry.) |
Tasks | Dependency Parsing |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1354/ |
https://www.aclweb.org/anthology/N19-1354 | |
PWC | https://paperswithcode.com/paper/bayesian-learning-for-neural-dependency |
Repo | |
Framework | |
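One way to use the SGLD samples the abstract mentions is posterior averaging at prediction time. A sketch under assumed interfaces (`score_arcs` is a hypothetical method, not the paper's API):

```python
import torch

def bayesian_arc_scores(parser, posterior_samples, sentence):
    """Average arc scores over weight snapshots collected along the SGLD trajectory."""
    scores = []
    for state_dict in posterior_samples:
        parser.load_state_dict(state_dict)
        with torch.no_grad():
            scores.append(parser.score_arcs(sentence))  # hypothetical scoring method
    return torch.stack(scores).mean(dim=0)  # decode (e.g., MST) on the averaged scores
```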
Typological Features for Multilingual Delexicalised Dependency Parsing
Title | Typological Features for Multilingual Delexicalised Dependency Parsing |
Authors | Manon Scholivet, Franck Dary, Alexis Nasr, Benoit Favre, Carlos Ramisch |
Abstract | The existence of universal models to describe the syntax of languages has been debated for decades. The availability of resources such as the Universal Dependencies treebanks and the World Atlas of Language Structures makes it possible to study the plausibility of universal grammar from the perspective of dependency parsing. Our work investigates the use of high-level language descriptions, in the form of typological features, for multilingual dependency parsing. Our experiments on multilingual parsing for 40 languages show that typological information can indeed guide parsers to share information between similar languages, beyond simple language identification. (A sketch of typology-aware input features follows this entry.) |
Tasks | Dependency Parsing, Language Identification |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1393/ |
https://www.aclweb.org/anthology/N19-1393 | |
PWC | https://paperswithcode.com/paper/typological-features-for-multilingual |
Repo | |
Framework | |
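A minimal sketch of how typological (e.g., WALS) features can enter a delexicalised parser: a fixed per-language vector is projected and concatenated to the POS-based token representations. Dimensions and names are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class TypologyAwareInput(nn.Module):
    """Delexicalised token input: POS embeddings plus a shared typology vector."""
    def __init__(self, n_pos_tags, pos_dim=32, typo_dim=16):
        super().__init__()
        self.pos_emb = nn.Embedding(n_pos_tags, pos_dim)  # delexicalised: POS only
        self.typo_proj = nn.Linear(typo_dim, typo_dim)

    def forward(self, pos_ids, typo_vec):
        pos = self.pos_emb(pos_ids)                         # (seq, pos_dim)
        typo = self.typo_proj(typo_vec).expand(pos.size(0), -1)  # same vector per token
        return torch.cat([pos, typo], dim=-1)               # shared across languages
```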
Latent Structure Models for Natural Language Processing
Title | Latent Structure Models for Natural Language Processing |
Authors | André F. T. Martins, Tsvetomila Mihaylova, Nikita Nangia, Vlad Niculae |
Abstract | Latent structure models are a powerful tool for modeling compositional data, discovering linguistic structure, and building NLP pipelines. They are appealing for two main reasons: they allow incorporating structural bias during training, leading to more accurate models; and they allow discovering hidden linguistic structure, which provides better interpretability. This tutorial will cover recent advances in discrete latent structure models. We discuss their motivation, potential, and limitations, then explore in detail three strategies for designing such models: gradient approximation, reinforcement learning, and end-to-end differentiable methods. We highlight connections among all these methods, enumerating their strengths and weaknesses. The models we present and analyze have been applied to a wide variety of NLP tasks, including sentiment analysis, natural language inference, language modeling, machine translation, and semantic parsing. Examples and evaluation will be covered throughout. After attending the tutorial, a practitioner will be better informed about which method is best suited for their problem. (A sketch of one differentiable strategy follows this entry.) |
Tasks | Language Modelling, Machine Translation, Natural Language Inference, Semantic Parsing, Sentiment Analysis |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-4001/ |
https://www.aclweb.org/anthology/P19-4001 | |
PWC | https://paperswithcode.com/paper/latent-structure-models-for-natural-language |
Repo | |
Framework | |
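As an example of the end-to-end differentiable strategy the tutorial covers, here is a straight-through Gumbel-softmax sketch: a discrete structure choice in the forward pass, a soft relaxation for gradients in the backward pass. A minimal illustration, not tutorial material.

```python
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits, tau=1.0):
    """Discrete one-hot sample forward, soft Gumbel-softmax gradient backward."""
    y_soft = F.gumbel_softmax(logits, tau=tau, hard=False)
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
    return y_hard + (y_soft - y_soft.detach())  # straight-through estimator
```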
GCN-Sem at SemEval-2019 Task 1: Semantic Parsing using Graph Convolutional and Recurrent Neural Networks
Title | GCN-Sem at SemEval-2019 Task 1: Semantic Parsing using Graph Convolutional and Recurrent Neural Networks |
Authors | Shiva Taslimipoor, Omid Rohanian, Sara Može |
Abstract | This paper describes the system submitted to SemEval 2019 shared task 1, “Cross-lingual Semantic Parsing with UCCA”. We rely on the semantic dependency parse trees provided in the shared task, which are converted from the original UCCA files, and model the task as tagging. The aim is to predict the graph structure of the output along with the types of relations among the nodes. Our proposed neural architecture is composed of graph convolution and BiLSTM components. The layers of the system share their weights while predicting dependency links and semantic labels. The system is applied to the CoNLL-U format of the input data and is best suited for semantic dependency parsing. (An architecture sketch follows this entry.) |
Tasks | Dependency Parsing, Semantic Dependency Parsing, Semantic Parsing |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2014/ |
https://www.aclweb.org/anthology/S19-2014 | |
PWC | https://paperswithcode.com/paper/gcn-sem-at-semeval-2019-task-1-semantic |
Repo | |
Framework | |
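A schematic of the architecture family the abstract names: a graph-convolution layer over a normalized adjacency matrix composed with a BiLSTM tagger. Layer sizes are illustrative, not the submitted system; a second head over the same shared encoder would predict links alongside labels.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (seq, dim); adj: (seq, seq) normalized adjacency matrix
        return torch.relu(adj @ self.linear(h))

class GCNTagger(nn.Module):
    def __init__(self, dim, n_labels):
        super().__init__()
        self.gcn = GCNLayer(dim)
        self.bilstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.out = nn.Linear(dim, n_labels)  # label head over the shared encoder

    def forward(self, h, adj):
        h = self.gcn(h, adj)
        h, _ = self.bilstm(h.unsqueeze(0))
        return self.out(h.squeeze(0))
```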
Extraction of Message Sequence Charts from Narrative History Text
Title | Extraction of Message Sequence Charts from Narrative History Text |
Authors | Girish Palshikar, Sachin Pawar, Sangameshwar Patil, Swapnil Hingmire, Nitin Ramrakhiyani, Harsimran Bedi, Pushpak Bhattacharyya, Vasudeva Varma |
Abstract | In this paper, we advocate the use of Message Sequence Charts (MSCs) as a knowledge representation to capture and visualize multi-actor interactions and their temporal ordering. We propose algorithms to automatically extract an MSC from a history narrative. For a given narrative, we first identify verbs which indicate interactions, and then use dependency parsing and Semantic Role Labelling-based approaches to identify senders (initiating actors) and receivers (other actors involved) for these interaction verbs. As a final step in MSC extraction, we employ a state-of-the-art algorithm to temporally re-order these interactions. Our evaluation on multiple publicly available narratives shows improvements over four baselines. (A pipeline sketch follows this entry.) |
Tasks | Dependency Parsing |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/W19-2404/ |
https://www.aclweb.org/anthology/W19-2404 | |
PWC | https://paperswithcode.com/paper/extraction-of-message-sequence-charts-from-1 |
Repo | |
Framework | |
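The three extraction steps named in the abstract map naturally onto a pipeline. The sketch below uses hypothetical helper functions (`split_sentences`, `interaction_verbs`, `srl_agent`, `srl_patient`, `temporal_reorder`); it illustrates the flow, not the authors' implementation.

```python
def extract_msc(narrative):
    """Extract (sender, verb, receiver) interactions and order them in time."""
    interactions = []
    for sent in split_sentences(narrative):
        for verb in interaction_verbs(sent):       # step 1: interaction verbs
            sender = srl_agent(sent, verb)         # step 2a: SRL/dependency-based sender
            receiver = srl_patient(sent, verb)     # step 2b: other actors involved
            if sender and receiver:
                interactions.append((sender, verb, receiver))
    return temporal_reorder(interactions)          # step 3: temporal re-ordering
```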
Saarland at MRP 2019: Compositional parsing across all graphbanks
Title | Saarland at MRP 2019: Compositional parsing across all graphbanks |
Authors | Lucia Donatelli, Meaghan Fowlie, Jonas Groschwitz, Alexander Koller, Matthias Lindemann, Mario Mina, Pia Weißenhorn |
Abstract | We describe the Saarland University submission to the shared task on Cross-Framework Meaning Representation Parsing (MRP) at the 2019 Conference on Computational Natural Language Learning (CoNLL). |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/K19-2006/ |
https://www.aclweb.org/anthology/K19-2006 | |
PWC | https://paperswithcode.com/paper/saarland-at-mrp-2019-compositional-parsing |
Repo | |
Framework | |
What Correspondences Reveal About Unknown Camera and Motion Models?
Title | What Correspondences Reveal About Unknown Camera and Motion Models? |
Authors | Thomas Probst, Ajad Chhatkuli, Danda Pani Paudel, Luc Van Gool |
Abstract | In two-view geometry, camera models and motion types are used as key knowledge, along with image point correspondences, to solve several key problems of 3D vision. Problems such as Structure-from-Motion (SfM) and camera self-calibration are tackled under the assumptions of a specific camera projection model and motion type. However, these key assumptions may not always be justified; i.e., we may often know neither the camera model nor the motion type beforehand. In that context, one can extract only the point correspondences between images. From such correspondences, recovering the two-view relationship, expressed by the unknown camera model and motion type, remains an unsolved problem. In this paper, we tackle this problem in two steps. First, we propose a method that computes the correct two-view relationship in the presence of noise and outliers. Second, we study different possibilities to disambiguate the obtained relationship into a camera model and motion type. Through extensive experiments on both synthetic and real data, we verify our theory and assumptions in practical settings. (A model-selection sketch follows this entry.) |
Tasks | Calibration |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Probst_What_Correspondences_Reveal_About_Unknown_Camera_and_Motion_Models_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Probst_What_Correspondences_Reveal_About_Unknown_Camera_and_Motion_Models_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/what-correspondences-reveal-about-unknown |
Repo | |
Framework | |
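The first step, selecting a well-supported two-view relationship from noisy correspondences, can be illustrated with off-the-shelf robust fitting. This is a simplified stand-in for the paper's method: it compares only two candidate relationships by their RANSAC inlier counts. `pts1`/`pts2` are assumed Nx2 float32 arrays.

```python
import cv2

def best_two_view_model(pts1, pts2, thresh=1.0):
    """Score candidate two-view relationships by robust inlier support."""
    candidates = {}
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, thresh, 0.999)
    if mask is not None:
        candidates["fundamental"] = int(mask.sum())
    H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, thresh)
    if mask is not None:
        candidates["homography"] = int(mask.sum())
    return max(candidates, key=candidates.get)  # keep the best-supported relationship
```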
Nested Dithered Quantization for Communication Reduction in Distributed Training
Title | Nested Dithered Quantization for Communication Reduction in Distributed Training |
Authors | Afshin Abdi, Faramarz Fekri |
Abstract | In distributed training, the communication cost of transmitting the gradients or parameters of a deep model is a major bottleneck in scaling up the number of processing nodes. To address this issue, we propose dithered quantization for the transmission of stochastic gradients and show that training with Dithered Quantized Stochastic Gradients (DQSG) is similar to training with unquantized SGs perturbed by independent bounded uniform noise, in contrast to other quantization methods, where the perturbation depends on the gradients, complicating the convergence analysis. We study the convergence of training algorithms using DQSG and the trade-off between the number of quantization levels and the training time. Next, we observe that there is a correlation among the SGs computed by workers that can be utilized to further reduce the communication overhead without any performance loss. Hence, we develop a simple yet effective quantization scheme, nested dithered quantized SG (NDQSG), that can reduce the communication significantly without requiring the workers to communicate extra information to each other. We prove that although NDQSG requires significantly fewer bits, it can achieve the same quantization variance bound as DQSG. Our simulation results confirm the effectiveness of training using DQSG and NDQSG in reducing the communication bits or the convergence time compared to existing methods, without sacrificing the accuracy of the trained model. (A dithered-quantization sketch follows this entry.) |
Tasks | Quantization |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rJxMM2C5K7 |
https://openreview.net/pdf?id=rJxMM2C5K7 | |
PWC | https://paperswithcode.com/paper/nested-dithered-quantization-for |
Repo | |
Framework | |
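The core of single-level subtractive dithered quantization is easy to sketch: a pseudo-random dither shared between sender and receiver makes the quantization error independent of the gradient, and the receiver subtracts the dither after dequantizing. This illustrates the general technique, not the paper's nested scheme.

```python
import numpy as np

def dithered_quantize(g, step, seed):
    """Subtractive dithered quantization of a gradient array g with lattice `step`."""
    rng = np.random.default_rng(seed)             # seed shared by sender and receiver
    u = rng.uniform(-step / 2, step / 2, g.shape)
    q = step * np.round((g + u) / step)           # transmit the integer indices q/step
    return q - u                                  # receiver removes the dither
```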
Incremental Object Learning From Contiguous Views
Title | Incremental Object Learning From Contiguous Views |
Authors | Stefan Stojanov, Samarth Mishra, Ngoc Anh Thai, Nikhil Dhanda, Ahmad Humayun, Chen Yu, Linda B. Smith, James M. Rehg |
Abstract | In this work, we present CRIB (Continual Recognition Inspired by Babies), a synthetic incremental object learning environment that can produce data that models visual imagery produced by object exploration in early infancy. CRIB is coupled with a new 3D object dataset, Toys-200, that contains 200 unique toy-like object instances, and is also compatible with existing 3D datasets. Through extensive empirical evaluation of state-of-the-art incremental learning algorithms, we find the novel empirical result that repetition can significantly ameliorate the effects of catastrophic forgetting. Furthermore, we find that in certain cases repetition allows for performance approaching that of batch learning algorithms. Finally, we propose an unsupervised incremental learning task with intriguing baseline results. |
Tasks | |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Stojanov_Incremental_Object_Learning_From_Contiguous_Views_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Stojanov_Incremental_Object_Learning_From_Contiguous_Views_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/incremental-object-learning-from-contiguous |
Repo | |
Framework | |
A convolutional neural network approach to detect congestive heart failure
Title | A convolutional neural network approach to detect congestive heart failure |
Authors | Mihaela Porumb, Ernesto Iadanza, Sebastiano Massaro, Leandro Pecchia |
Abstract | Congestive Heart Failure (CHF) is a severe pathophysiological condition associated with high prevalence, high mortality rates, and sustained healthcare costs, and therefore demands efficient methods for its detection. Although recent research has provided methods focused on advanced signal processing and machine learning, the potential of applying Convolutional Neural Network (CNN) approaches to the automatic detection of CHF has been largely overlooked thus far. This study addresses this gap by presenting a CNN model that accurately identifies CHF on the basis of a single raw electrocardiogram (ECG) heartbeat, in contrast to existing methods, which are typically grounded in Heart Rate Variability. We trained and tested the model on publicly available ECG datasets, comprising a total of 490,505 heartbeats, to achieve 100% CHF detection accuracy. Importantly, the model also identifies the heartbeat sequences and ECG morphological characteristics which are class-discriminative and thus prominent for CHF detection. Overall, our contribution substantially advances the current methodology for detecting CHF and caters to clinical practitioners' needs by providing an accurate and fully transparent tool to support decisions concerning CHF detection. (A 1D-CNN sketch follows this entry.) |
Tasks | Congestive Heart Failure detection, Electrocardiography (ECG), Heartbeat Classification, Heart Rate Variability |
Published | 2019-09-03 |
URL | https://doi.org/10.1016/j.bspc.2019.101597 |
https://www.sciencedirect.com/science/article/pii/S1746809419301776/pdfft?md5=ca17956e278efdd4a39ec925adfa2b16&pid=1-s2.0-S1746809419301776-main.pdf | |
PWC | https://paperswithcode.com/paper/a-convolutional-neural-network-approach-to |
Repo | |
Framework | |
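A minimal 1D-CNN sketch for classifying a single raw heartbeat, in the spirit of the abstract; the layer sizes and the 188-sample input length are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class HeartbeatCNN(nn.Module):
    """Binary CHF classifier over one raw ECG heartbeat (illustrative sizes)."""
    def __init__(self, n_samples=188, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(32 * (n_samples // 4), n_classes)

    def forward(self, x):            # x: (batch, 1, n_samples) raw heartbeat
        h = self.features(x)
        return self.classifier(h.flatten(1))
```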
Proceedings of the 6th Workshop on Asian Translation
Title | Proceedings of the 6th Workshop on Asian Translation |
Authors | |
Abstract | |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5200/ |
https://www.aclweb.org/anthology/D19-5200 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-6th-workshop-on-asian |
Repo | |
Framework | |
Limiting Extrapolation in Linear Approximate Value Iteration
Title | Limiting Extrapolation in Linear Approximate Value Iteration |
Authors | Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill |
Abstract | We study linear approximate value iteration (LAVI) with a generative model. While linear models may accurately represent the optimal value function using few parameters, several empirical and theoretical studies show that the combination of least-squares projection with the Bellman operator may be expansive, leading LAVI to amplify errors over iterations and eventually diverge. We introduce an algorithm that approximates value functions by combining Q-values estimated at a set of *anchor* states. Our algorithm aims to balance the generalization and compactness of linear methods with the small error amplification typical of interpolation methods. We prove that if the features at any state can be represented as a convex combination of features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially), and our method achieves a polynomial sample complexity bound in the horizon and the number of anchor points. These findings are confirmed in preliminary simulations on a number of simple problems where a traditional least-squares LAVI method diverges. (An anchor-interpolation sketch follows this entry.) |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8799-limiting-extrapolation-in-linear-approximate-value-iteration |
http://papers.nips.cc/paper/8799-limiting-extrapolation-in-linear-approximate-value-iteration.pdf | |
PWC | https://paperswithcode.com/paper/limiting-extrapolation-in-linear-approximate |
Repo | |
Framework | |
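The anchor-point idea can be sketched as interpolation: express a state's features as an (approximately) convex combination of anchor features and interpolate the anchor Q-values. The clipped least-squares weights below are a crude surrogate for a proper constrained solve, and all names are illustrative rather than the paper's algorithm.

```python
import numpy as np

def interpolated_value(phi_state, phi_anchors, q_anchors):
    """Interpolate anchor Q-values with approximately convex feature weights.

    phi_state: (d,) features of the query state
    phi_anchors: (K, d) features of the K anchor states
    q_anchors: (K,) Q-value estimates at the anchors
    """
    # Solve phi_state ~ w @ phi_anchors, then project crudely onto the simplex.
    w, *_ = np.linalg.lstsq(phi_anchors.T, phi_state, rcond=None)
    w = np.clip(w, 0.0, None)
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))
    return w @ q_anchors  # interpolation keeps iteration errors from being amplified
```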