Paper Group NANR 120
Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation
Title | Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation |
Authors | Dušan Variš, Ondřej Bojar |
Abstract | This work presents our ongoing research on unsupervised pretraining in neural machine translation (NMT). In our method, we initialize the weights of the encoder and decoder with two language models trained on monolingual data, and then fine-tune the model on parallel data using Elastic Weight Consolidation (EWC) to avoid forgetting the original language modeling task. We compare regularization by EWC with previous work that regularizes via language modeling objectives. The positive result is that using EWC with the decoder achieves BLEU scores similar to the previous work, while the model converges 2-3 times faster and does not require the original unlabeled training data during the fine-tuning stage. On the other hand, regularization by EWC is less effective when the original and new tasks are not closely related: we show that initializing the bidirectional NMT encoder with a left-to-right language model, and forcing the model to remember the original left-to-right language modeling task, limits the encoder's capacity to learn from the whole bidirectional context. (A schematic EWC sketch follows this entry.) |
Tasks | Language Modelling, Machine Translation |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-2017/ |
https://www.aclweb.org/anthology/P19-2017 | |
PWC | https://paperswithcode.com/paper/unsupervised-pretraining-for-neural-machine |
Repo | |
Framework | |
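The EWC penalty described in the abstract admits a compact implementation. Below is a minimal PyTorch sketch, not the authors' code: `fisher` (diagonal Fisher estimates from the language modeling task), `theta_star` (the pretrained weights), and `ewc_lambda` are illustrative names and settings.

```python
import torch

def ewc_penalty(model, fisher, theta_star, ewc_lambda=1.0):
    """Quadratic penalty that keeps fine-tuned weights near the pretrained ones."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - theta_star[name]).pow(2)).sum()
    return 0.5 * ewc_lambda * penalty

# During fine-tuning on parallel data (illustrative usage):
#   loss = translation_loss + ewc_penalty(model, fisher, theta_star)
```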
Bayesian Deep Learning via Stochastic Gradient MCMC with a Stochastic Approximation Adaptation
Title | Bayesian Deep Learning via Stochastic Gradient MCMC with a Stochastic Approximation Adaptation |
Authors | Wei Deng, Xiao Zhang, Faming Liang, Guang Lin |
Abstract | We propose a robust Bayesian deep learning algorithm to infer complex posteriors with latent variables. Inspired by dropout, a popular tool for regularization and model ensembling, we assign sparse priors to the weights of deep neural networks (DNNs) in order to achieve automatic “dropout” and avoid over-fitting. By alternately sampling from the posterior distribution through stochastic gradient Markov chain Monte Carlo (SG-MCMC) and optimizing the latent variables via stochastic approximation (SA), the trajectory of the target weights is proven to converge to the true posterior distribution conditioned on the optimal latent variables. This ensures stronger regularization of the over-fitted parameter space and more accurate uncertainty quantification of the decisive variables. Simulations on large-p-small-n regressions showcase the robustness of the method when applied to models with latent variables. Additionally, its application to convolutional neural networks (CNNs) leads to state-of-the-art performance on the MNIST and Fashion-MNIST datasets and improved resistance to adversarial attacks. (A sketch of the alternating scheme follows this entry.) |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=S1grRoR9tQ |
https://openreview.net/pdf?id=S1grRoR9tQ | |
PWC | https://paperswithcode.com/paper/bayesian-deep-learning-via-stochastic |
Repo | |
Framework | |
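A minimal sketch of the alternating scheme the abstract describes: an SGLD step on the network weights, followed by a stochastic-approximation update of the latent variables. All names (`grad_log_posterior`, `latent_target`, `a_t`) are illustrative assumptions, not the paper's code.

```python
import torch

def sgld_step(params, grad_log_post, lr):
    """One SGLD update: a gradient step plus Gaussian noise with variance 2*lr."""
    with torch.no_grad():
        for p, g in zip(params, grad_log_post):
            p.add_(lr * g + torch.randn_like(p) * (2.0 * lr) ** 0.5)

# Alternating loop (schematic):
# for t in range(T):
#     g = grad_log_posterior(params, latent, minibatch)   # weights given latent vars
#     sgld_step(params, g, lr)
#     latent = latent + a_t * (latent_target(params) - latent)  # SA update, a_t -> 0
```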
SOURCE: SOURce-Conditional Elmo-style Model for Machine Translation Quality Estimation
Title | SOURCE: SOURce-Conditional Elmo-style Model for Machine Translation Quality Estimation |
Authors | Junpei Zhou, Zhisong Zhang, Zecong Hu |
Abstract | Quality estimation (QE) of machine translation (MT) systems is a task of growing importance. It reduces the cost of post-editing, allowing machine-translated text to be used in formal settings. In this work, we describe our submission to the WMT 2019 sentence-level QE task. We mainly explore the use of pre-trained translation models in QE and adopt a bi-directional translation-like strategy. The strategy is similar to ELMo, but additionally conditions on source sentences. Experiments on the WMT QE dataset show that our strategy, which makes the pre-training slightly harder, brings improvements for QE. In the WMT 2019 QE task, our system ranked second on the En-De NMT dataset and third on the En-Ru NMT dataset. (A schematic sketch of the feature idea follows this entry.) |
Tasks | Machine Translation |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-5411/ |
https://www.aclweb.org/anthology/W19-5411 | |
PWC | https://paperswithcode.com/paper/source-source-conditional-elmo-style-model |
Repo | |
Framework | |
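A schematic of the source-conditional, ELMo-style idea: two translation models read the same source, one decoding the target left-to-right and the other right-to-left, and their per-token log-probabilities become QE features. The `token_logprobs` interface is hypothetical; this is not the submitted system.

```python
def qe_features(src_tokens, tgt_tokens, fwd_model, bwd_model):
    """Per-token features from two source-conditioned directional models."""
    fwd = fwd_model.token_logprobs(src_tokens, tgt_tokens)              # p(y_t | y_<t, x)
    bwd = bwd_model.token_logprobs(src_tokens, tgt_tokens[::-1])[::-1]  # p(y_t | y_>t, x)
    return list(zip(fwd, bwd))  # fed to a sentence-level quality regressor
```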
Bayesian Learning for Neural Dependency Parsing
Title | Bayesian Learning for Neural Dependency Parsing |
Authors | Ehsan Shareghi, Yingzhen Li, Yi Zhu, Roi Reichart, Anna Korhonen |
Abstract | While neural dependency parsers provide state-of-the-art accuracy for several languages, they still rely on large amounts of costly labeled training data. We demonstrate that in the small-data regime, where uncertainty around parameter estimation and model prediction matters the most, Bayesian neural modeling is very effective. To overcome the computational and statistical costs of the approximate inference step in this framework, we use an efficient sampling procedure via stochastic gradient Langevin dynamics to generate samples from the approximated posterior. Moreover, we show that our Bayesian neural parser can be further improved when integrated into a multi-task parsing and POS tagging framework designed to minimize task interference via an adversarial procedure. When trained and tested on 6 languages with fewer than 5k training instances, our parser consistently outperforms the strong BiLSTM baseline (Kiperwasser and Goldberg, 2016). Compared with the biaffine parser (Dozat et al., 2017), our model achieves an improvement of up to 3% for Vietnamese and Irish, while our multi-task model achieves an improvement of up to 9% across five languages: Farsi, Russian, Turkish, Vietnamese, and Irish. (A posterior-averaging sketch follows this entry.) |
Tasks | Dependency Parsing |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1354/ |
https://www.aclweb.org/anthology/N19-1354 | |
PWC | https://paperswithcode.com/paper/bayesian-learning-for-neural-dependency |
Repo | |
Framework | |
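One way to use the SGLD samples the abstract mentions is posterior averaging at prediction time. A sketch under assumed interfaces (`score_arcs` is a hypothetical method, not the paper's API):

```python
import torch

def bayesian_arc_scores(parser, posterior_samples, sentence):
    """Average arc scores over weight snapshots collected along the SGLD trajectory."""
    scores = []
    for state_dict in posterior_samples:
        parser.load_state_dict(state_dict)
        with torch.no_grad():
            scores.append(parser.score_arcs(sentence))  # hypothetical scoring method
    return torch.stack(scores).mean(dim=0)  # decode (e.g., MST) on the averaged scores
```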
Typological Features for Multilingual Delexicalised Dependency Parsing
Title | Typological Features for Multilingual Delexicalised Dependency Parsing |
Authors | Manon Scholivet, Franck Dary, Alexis Nasr, Benoit Favre, Carlos Ramisch |
Abstract | The existence of universal models to describe the syntax of languages has been debated for decades. The availability of resources such as the Universal Dependencies treebanks and the World Atlas of Language Structures makes it possible to study the plausibility of universal grammar from the perspective of dependency parsing. Our work investigates the use of high-level language descriptions, in the form of typological features, for multilingual dependency parsing. Our experiments on multilingual parsing for 40 languages show that typological information can indeed guide parsers to share information between similar languages, beyond simple language identification. (A sketch of typology-aware input features follows this entry.) |
Tasks | Dependency Parsing, Language Identification |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1393/ |
https://www.aclweb.org/anthology/N19-1393 | |
PWC | https://paperswithcode.com/paper/typological-features-for-multilingual |
Repo | |
Framework | |
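A minimal sketch of how typological (e.g., WALS) features can enter a delexicalised parser: a fixed per-language vector is projected and concatenated to the POS-based token representations. Dimensions and names are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class TypologyAwareInput(nn.Module):
    """Delexicalised token input: POS embeddings plus a shared typology vector."""
    def __init__(self, n_pos_tags, pos_dim=32, typo_dim=16):
        super().__init__()
        self.pos_emb = nn.Embedding(n_pos_tags, pos_dim)  # delexicalised: POS only
        self.typo_proj = nn.Linear(typo_dim, typo_dim)

    def forward(self, pos_ids, typo_vec):
        pos = self.pos_emb(pos_ids)                         # (seq, pos_dim)
        typo = self.typo_proj(typo_vec).expand(pos.size(0), -1)  # same vector per token
        return torch.cat([pos, typo], dim=-1)               # shared across languages
```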
Latent Structure Models for Natural Language Processing
Title | Latent Structure Models for Natural Language Processing |
Authors | André F. T. Martins, Tsvetomila Mihaylova, Nikita Nangia, Vlad Niculae |
Abstract | Latent structure models are a powerful tool for modeling compositional data, discovering linguistic structure, and building NLP pipelines. They are appealing for two main reasons: they allow incorporating structural bias during training, leading to more accurate models; and they allow discovering hidden linguistic structure, which provides better interpretability. This tutorial will cover recent advances in discrete latent structure models. We discuss their motivation, potential, and limitations, then explore in detail three strategies for designing such models: gradient approximation, reinforcement learning, and end-to-end differentiable methods. We highlight connections among all these methods, enumerating their strengths and weaknesses. The models we present and analyze have been applied to a wide variety of NLP tasks, including sentiment analysis, natural language inference, language modeling, machine translation, and semantic parsing. Examples and evaluation will be covered throughout. After attending the tutorial, a practitioner will be better informed about which method is best suited for their problem. (A sketch of one differentiable strategy follows this entry.) |
Tasks | Language Modelling, Machine Translation, Natural Language Inference, Semantic Parsing, Sentiment Analysis |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-4001/ |
https://www.aclweb.org/anthology/P19-4001 | |
PWC | https://paperswithcode.com/paper/latent-structure-models-for-natural-language |
Repo | |
Framework | |
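As an example of the end-to-end differentiable strategy the tutorial covers, here is a straight-through Gumbel-softmax sketch: a discrete structure choice in the forward pass, a soft relaxation for gradients in the backward pass. A minimal illustration, not tutorial material.

```python
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits, tau=1.0):
    """Discrete one-hot sample forward, soft Gumbel-softmax gradient backward."""
    y_soft = F.gumbel_softmax(logits, tau=tau, hard=False)
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
    return y_hard + (y_soft - y_soft.detach())  # straight-through estimator
```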
GCN-Sem at SemEval-2019 Task 1: Semantic Parsing using Graph Convolutional and Recurrent Neural Networks
Title | GCN-Sem at SemEval-2019 Task 1: Semantic Parsing using Graph Convolutional and Recurrent Neural Networks |
Authors | Shiva Taslimipoor, Omid Rohanian, Sara Može |
Abstract | This paper describes the system submitted to SemEval 2019 shared task 1, “Cross-lingual Semantic Parsing with UCCA”. We rely on the semantic dependency parse trees provided in the shared task, which are converted from the original UCCA files, and model the task as tagging. The aim is to predict the graph structure of the output along with the types of relations among the nodes. Our proposed neural architecture is composed of graph convolution and BiLSTM components. The layers of the system share their weights while predicting dependency links and semantic labels. The system is applied to the CoNLL-U format of the input data and is best suited for semantic dependency parsing. (An architecture sketch follows this entry.) |
Tasks | Dependency Parsing, Semantic Dependency Parsing, Semantic Parsing |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2014/ |
https://www.aclweb.org/anthology/S19-2014 | |
PWC | https://paperswithcode.com/paper/gcn-sem-at-semeval-2019-task-1-semantic |
Repo | |
Framework | |
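A schematic of the architecture family the abstract names: a graph-convolution layer over a normalized adjacency matrix composed with a BiLSTM tagger. Layer sizes are illustrative, not the submitted system; a second head over the same shared encoder would predict links alongside labels.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (seq, dim); adj: (seq, seq) normalized adjacency matrix
        return torch.relu(adj @ self.linear(h))

class GCNTagger(nn.Module):
    def __init__(self, dim, n_labels):
        super().__init__()
        self.gcn = GCNLayer(dim)
        self.bilstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.out = nn.Linear(dim, n_labels)  # label head over the shared encoder

    def forward(self, h, adj):
        h = self.gcn(h, adj)
        h, _ = self.bilstm(h.unsqueeze(0))
        return self.out(h.squeeze(0))
```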
Extraction of Message Sequence Charts from Narrative History Text
Title | Extraction of Message Sequence Charts from Narrative History Text |
Authors | Girish Palshikar, Sachin Pawar, Sangameshwar Patil, Swapnil Hingmire, Nitin Ramrakhiyani, Harsimran Bedi, Pushpak Bhattacharyya, Vasudeva Varma |
Abstract | In this paper, we advocate the use of Message Sequence Charts (MSCs) as a knowledge representation to capture and visualize multi-actor interactions and their temporal ordering. We propose algorithms to automatically extract an MSC from a history narrative. For a given narrative, we first identify verbs which indicate interactions, and then use dependency parsing and Semantic Role Labelling-based approaches to identify senders (initiating actors) and receivers (other actors involved) for these interaction verbs. As a final step in MSC extraction, we employ a state-of-the-art algorithm to temporally re-order these interactions. Our evaluation on multiple publicly available narratives shows improvements over four baselines. (A pipeline sketch follows this entry.) |
Tasks | Dependency Parsing |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/W19-2404/ |
https://www.aclweb.org/anthology/W19-2404 | |
PWC | https://paperswithcode.com/paper/extraction-of-message-sequence-charts-from-1 |
Repo | |
Framework | |
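The three extraction steps named in the abstract map naturally onto a pipeline. The sketch below uses hypothetical helper functions (`split_sentences`, `interaction_verbs`, `srl_agent`, `srl_patient`, `temporal_reorder`); it illustrates the flow, not the authors' implementation.

```python
def extract_msc(narrative):
    """Extract (sender, verb, receiver) interactions and order them in time."""
    interactions = []
    for sent in split_sentences(narrative):
        for verb in interaction_verbs(sent):       # step 1: interaction verbs
            sender = srl_agent(sent, verb)         # step 2a: SRL/dependency-based sender
            receiver = srl_patient(sent, verb)     # step 2b: other actors involved
            if sender and receiver:
                interactions.append((sender, verb, receiver))
    return temporal_reorder(interactions)          # step 3: temporal re-ordering
```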
Saarland at MRP 2019: Compositional parsing across all graphbanks
Title | Saarland at MRP 2019: Compositional parsing across all graphbanks |
Authors | Lucia Donatelli, Meaghan Fowlie, Jonas Groschwitz, Alexander Koller, Matthias Lindemann, Mario Mina, Pia Weißenhorn |
Abstract | We describe the Saarland University submission to the shared task on Cross-Framework Meaning Representation Parsing (MRP) at the 2019 Conference on Computational Natural Language Learning (CoNLL). |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/K19-2006/ |
https://www.aclweb.org/anthology/K19-2006 | |
PWC | https://paperswithcode.com/paper/saarland-at-mrp-2019-compositional-parsing |
Repo | |
Framework | |
What Correspondences Reveal About Unknown Camera and Motion Models?
Title | What Correspondences Reveal About Unknown Camera and Motion Models? |
Authors | Thomas Probst, Ajad Chhatkuli, Danda Pani Paudel, Luc Van Gool |
Abstract | In two-view geometry, camera models and motion types are used as key knowledge, along with image point correspondences, to solve several key problems of 3D vision. Problems such as Structure-from-Motion (SfM) and camera self-calibration are tackled under the assumptions of a specific camera projection model and motion type. However, these key assumptions may not always be justified; i.e., we may often know neither the camera model nor the motion type beforehand. In that context, one can extract only the point correspondences between images. From such correspondences, recovering the two-view relationship, expressed by the unknown camera model and motion type, remains an unsolved problem. In this paper, we tackle this problem in two steps. First, we propose a method that computes the correct two-view relationship in the presence of noise and outliers. Second, we study different possibilities to disambiguate the obtained relationship into a camera model and motion type. Through extensive experiments on both synthetic and real data, we verify our theory and assumptions in practical settings. (A model-selection sketch follows this entry.) |
Tasks | Calibration |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Probst_What_Correspondences_Reveal_About_Unknown_Camera_and_Motion_Models_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Probst_What_Correspondences_Reveal_About_Unknown_Camera_and_Motion_Models_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/what-correspondences-reveal-about-unknown |
Repo | |
Framework | |
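The first step, selecting a well-supported two-view relationship from noisy correspondences, can be illustrated with off-the-shelf robust fitting. This is a simplified stand-in for the paper's method: it compares only two candidate relationships by their RANSAC inlier counts. `pts1`/`pts2` are assumed Nx2 float32 arrays.

```python
import cv2

def best_two_view_model(pts1, pts2, thresh=1.0):
    """Score candidate two-view relationships by robust inlier support."""
    candidates = {}
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, thresh, 0.999)
    if mask is not None:
        candidates["fundamental"] = int(mask.sum())
    H, mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, thresh)
    if mask is not None:
        candidates["homography"] = int(mask.sum())
    return max(candidates, key=candidates.get)  # keep the best-supported relationship
```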
Nested Dithered Quantization for Communication Reduction in Distributed Training
Title | Nested Dithered Quantization for Communication Reduction in Distributed Training |
Authors | Afshin Abdi, Faramarz Fekri |
Abstract | In distributed training, the communication cost of transmitting the gradients or parameters of a deep model is a major bottleneck in scaling up the number of processing nodes. To address this issue, we propose dithered quantization for the transmission of stochastic gradients and show that training with Dithered Quantized Stochastic Gradients (DQSG) is similar to training with unquantized SGs perturbed by independent bounded uniform noise, in contrast to other quantization methods, where the perturbation depends on the gradients, complicating the convergence analysis. We study the convergence of training algorithms using DQSG and the trade-off between the number of quantization levels and the training time. Next, we observe that there is a correlation among the SGs computed by workers that can be utilized to further reduce the communication overhead without any performance loss. Hence, we develop a simple yet effective quantization scheme, nested dithered quantized SG (NDQSG), that can reduce the communication significantly without requiring the workers to communicate extra information to each other. We prove that although NDQSG requires significantly fewer bits, it can achieve the same quantization variance bound as DQSG. Our simulation results confirm the effectiveness of training using DQSG and NDQSG in reducing the communication bits or the convergence time compared to existing methods, without sacrificing the accuracy of the trained model. (A dithered-quantization sketch follows this entry.) |
Tasks | Quantization |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=rJxMM2C5K7 |
https://openreview.net/pdf?id=rJxMM2C5K7 | |
PWC | https://paperswithcode.com/paper/nested-dithered-quantization-for |
Repo | |
Framework | |
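The core of single-level subtractive dithered quantization is easy to sketch: a pseudo-random dither shared between sender and receiver makes the quantization error independent of the gradient, and the receiver subtracts the dither after dequantizing. This illustrates the general technique, not the paper's nested scheme.

```python
import numpy as np

def dithered_quantize(g, step, seed):
    """Subtractive dithered quantization of a gradient array g with lattice `step`."""
    rng = np.random.default_rng(seed)             # seed shared by sender and receiver
    u = rng.uniform(-step / 2, step / 2, g.shape)
    q = step * np.round((g + u) / step)           # transmit the integer indices q/step
    return q - u                                  # receiver removes the dither
```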
Incremental Object Learning From Contiguous Views
Title | Incremental Object Learning From Contiguous Views |
Authors | Stefan Stojanov, Samarth Mishra, Ngoc Anh Thai, Nikhil Dhanda, Ahmad Humayun, Chen Yu, Linda B. Smith, James M. Rehg |
Abstract | In this work, we present CRIB (Continual Recognition Inspired by Babies), a synthetic incremental object learning environment that can produce data that models visual imagery produced by object exploration in early infancy. CRIB is coupled with a new 3D object dataset, Toys-200, that contains 200 unique toy-like object instances, and is also compatible with existing 3D datasets. Through extensive empirical evaluation of state-of-the-art incremental learning algorithms, we find the novel empirical result that repetition can significantly ameliorate the effects of catastrophic forgetting. Furthermore, we find that in certain cases repetition allows for performance approaching that of batch learning algorithms. Finally, we propose an unsupervised incremental learning task with intriguing baseline results. |
Tasks | |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Stojanov_Incremental_Object_Learning_From_Contiguous_Views_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Stojanov_Incremental_Object_Learning_From_Contiguous_Views_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/incremental-object-learning-from-contiguous |
Repo | |
Framework | |
A convolutional neural network approach to detect congestive heart failure
Title | A convolutional neural network approach to detect congestive heart failure |
Authors | Mihaela Porumb, Ernesto Iadanza, Sebastiano Massaro, Leandro Pecchia |
Abstract | Congestive Heart Failure (CHF) is a severe pathophysiological condition associated with high prevalence, high mortality rates, and sustained healthcare costs, and therefore demands efficient methods for its detection. Although recent research has provided methods focused on advanced signal processing and machine learning, the potential of applying Convolutional Neural Network (CNN) approaches to the automatic detection of CHF has been largely overlooked thus far. This study addresses this gap by presenting a CNN model that accurately identifies CHF on the basis of a single raw electrocardiogram (ECG) heartbeat, in contrast to existing methods, which are typically grounded in Heart Rate Variability. We trained and tested the model on publicly available ECG datasets, comprising a total of 490,505 heartbeats, to achieve 100% CHF detection accuracy. Importantly, the model also identifies the heartbeat sequences and ECG morphological characteristics which are class-discriminative and thus prominent for CHF detection. Overall, our contribution substantially advances the current methodology for detecting CHF and caters to clinical practitioners' needs by providing an accurate and fully transparent tool to support decisions concerning CHF detection. (A 1D-CNN sketch follows this entry.) |
Tasks | Congestive Heart Failure detection, Electrocardiography (ECG), Heartbeat Classification, Heart Rate Variability |
Published | 2019-09-03 |
URL | https://doi.org/10.1016/j.bspc.2019.101597 |
https://www.sciencedirect.com/science/article/pii/S1746809419301776/pdfft?md5=ca17956e278efdd4a39ec925adfa2b16&pid=1-s2.0-S1746809419301776-main.pdf | |
PWC | https://paperswithcode.com/paper/a-convolutional-neural-network-approach-to |
Repo | |
Framework | |
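A minimal 1D-CNN sketch for classifying a single raw heartbeat, in the spirit of the abstract; the layer sizes and the 188-sample input length are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class HeartbeatCNN(nn.Module):
    """Binary CHF classifier over one raw ECG heartbeat (illustrative sizes)."""
    def __init__(self, n_samples=188, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(32 * (n_samples // 4), n_classes)

    def forward(self, x):            # x: (batch, 1, n_samples) raw heartbeat
        h = self.features(x)
        return self.classifier(h.flatten(1))
```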
Proceedings of the 6th Workshop on Asian Translation
Title | Proceedings of the 6th Workshop on Asian Translation |
Authors | |
Abstract | |
Tasks | |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5200/ |
https://www.aclweb.org/anthology/D19-5200 | |
PWC | https://paperswithcode.com/paper/proceedings-of-the-6th-workshop-on-asian |
Repo | |
Framework | |
Limiting Extrapolation in Linear Approximate Value Iteration
Title | Limiting Extrapolation in Linear Approximate Value Iteration |
Authors | Andrea Zanette, Alessandro Lazaric, Mykel J. Kochenderfer, Emma Brunskill |
Abstract | We study linear approximate value iteration (LAVI) with a generative model. While linear models may accurately represent the optimal value function using few parameters, several empirical and theoretical studies show that the combination of least-squares projection with the Bellman operator may be expansive, leading LAVI to amplify errors over iterations and eventually diverge. We introduce an algorithm that approximates value functions by combining Q-values estimated at a set of *anchor* states. Our algorithm aims to balance the generalization and compactness of linear methods with the small error amplification typical of interpolation methods. We prove that if the features at any state can be represented as a convex combination of features at the anchor points, then errors are propagated linearly over iterations (instead of exponentially), and our method achieves a polynomial sample complexity bound in the horizon and the number of anchor points. These findings are confirmed in preliminary simulations on a number of simple problems where a traditional least-squares LAVI method diverges. (An anchor-interpolation sketch follows this entry.) |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8799-limiting-extrapolation-in-linear-approximate-value-iteration |
http://papers.nips.cc/paper/8799-limiting-extrapolation-in-linear-approximate-value-iteration.pdf | |
PWC | https://paperswithcode.com/paper/limiting-extrapolation-in-linear-approximate |
Repo | |
Framework | |
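The anchor-point idea can be sketched as interpolation: express a state's features as an (approximately) convex combination of anchor features and interpolate the anchor Q-values. The clipped least-squares weights below are a crude surrogate for a proper constrained solve, and all names are illustrative rather than the paper's algorithm.

```python
import numpy as np

def interpolated_value(phi_state, phi_anchors, q_anchors):
    """Interpolate anchor Q-values with approximately convex feature weights.

    phi_state: (d,) features of the query state
    phi_anchors: (K, d) features of the K anchor states
    q_anchors: (K,) Q-value estimates at the anchors
    """
    # Solve phi_state ~ w @ phi_anchors, then project crudely onto the simplex.
    w, *_ = np.linalg.lstsq(phi_anchors.T, phi_state, rcond=None)
    w = np.clip(w, 0.0, None)
    w = w / w.sum() if w.sum() > 0 else np.full(len(w), 1.0 / len(w))
    return w @ q_anchors  # interpolation keeps iteration errors from being amplified
```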