Paper Group NANR 159
Naive Regularizers for Low-Resource Neural Machine Translation
Title | Naive Regularizers for Low-Resource Neural Machine Translation |
Authors | Meriem Beloucif, Ana Valeria Gonzalez, Marcel Bollmann, Anders Søgaard |
Abstract | Neural machine translation models have little inductive bias, which can be a disadvantage in low-resource scenarios. Neural models have to be trained on large amounts of data and have been shown to perform poorly when only limited data is available. We show that using naive regularization methods, based on sentence length, punctuation and word frequencies, to penalize translations that are very different from the input sentences, consistently improves the translation quality across multiple low-resource languages. We experiment with 12 language pairs, varying the training data size between 17k to 230k sentence pairs. Our best regularizer achieves an average increase of 1.5 BLEU score and 1.0 TER score across all the language pairs. For example, we achieve a BLEU score of 26.70 on the IWSLT15 English–Vietnamese translation task simply by using relative differences in punctuation as a regularizer. |
Tasks | Low-Resource Neural Machine Translation, Machine Translation |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1013/ |
https://www.aclweb.org/anthology/R19-1013 | |
PWC | https://paperswithcode.com/paper/naive-regularizers-for-low-resource-neural |
Repo | |
Framework | |
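The abstract above mentions "relative differences in punctuation" as a regularizer but does not give a formula. The following is a minimal sketch of one way such a penalty could be computed, not the authors' implementation: the function names, the normalization by the larger count, and the `weight` parameter are all assumptions made for illustration.

```python
import string


def punctuation_count(sentence: str) -> int:
    """Count ASCII punctuation characters (assumption: ASCII-only is enough here)."""
    return sum(1 for ch in sentence if ch in string.punctuation)


def relative_punctuation_difference(source: str, hypothesis: str) -> float:
    """Hypothetical penalty in [0, 1]: 0 when both sides carry the same amount
    of punctuation, 1 when only one side has any."""
    src = punctuation_count(source)
    hyp = punctuation_count(hypothesis)
    denom = max(src, hyp)
    if denom == 0:
        return 0.0  # neither sentence has punctuation, so nothing to penalize
    return abs(src - hyp) / denom


def regularized_score(nll: float, source: str, hypothesis: str,
                      weight: float = 0.1) -> float:
    """Add the weighted penalty to a model score such as a negative
    log-likelihood. The additive form and the default weight are assumptions."""
    return nll + weight * relative_punctuation_difference(source, hypothesis)
```

For instance, `relative_punctuation_difference("Hello, world!", "Xin chào thế giới!")` compares two punctuation marks against one. Because the penalty is computed on decoded strings, this sketch is only suitable for scoring or reranking hypotheses; how the paper integrates the signal into training is not stated in the abstract.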
A Multi-Task Learning Framework for Extracting Bacteria Biotope Information
Title | A Multi-Task Learning Framework for Extracting Bacteria Biotope Information |
Authors | Qi Zhang, Chao Liu, Ying Chi, Xuansong Xie, Xiansheng Hua |
Abstract | This paper presents a novel transfer multi-task learning method for Bacteria Biotope rel+ner task at BioNLP-OST 2019. To alleviate the data deficiency problem in domain-specific information extraction, we use BERT (Bidirectional Encoder Representations from Transformers) and pre-train it using mask language models and next sentence prediction on both general corpus and medical corpus like PubMed. In fine-tuning stage, we fine-tune the relation extraction layer and mention recognition layer designed by us on the top of BERT to extract mentions and relations simultaneously. The evaluation results show that our method achieves the best performance on all metrics (including slot error rate, precision and recall) in the Bacteria Biotope rel+ner subtask. |
Tasks | Multi-Task Learning, Relation Extraction |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5716/ |
https://www.aclweb.org/anthology/D19-5716 | |
PWC | https://paperswithcode.com/paper/a-multi-task-learning-framework-for-2 |
Repo | |
Framework | |
Corpus of usage examples: What is it good for?
Title | Corpus of usage examples: What is it good for? |
Authors | Timofey Arkhangelskiy |
Abstract | |
Tasks | |
Published | 2019-02-01 |
URL | https://www.aclweb.org/anthology/W19-6008/ |
https://www.aclweb.org/anthology/W19-6008 | |
PWC | https://paperswithcode.com/paper/corpus-of-usage-examples-what-is-it-good-for |
Repo | |
Framework | |
Solving Vision Problems via Filtering
Title | Solving Vision Problems via Filtering |
Authors | Sean I. Young, Aous T. Naman, Bernd Girod, David Taubman |
Abstract | We propose a new, filtering approach for solving a large number of regularized inverse problems commonly found in computer vision. Traditionally, such problems are solved by finding the solution to the system of equations that expresses the first-order optimality conditions of the problem. This can be slow if the system of equations is dense due to the use of nonlocal regularization, necessitating iterative solvers such as successive over-relaxation or conjugate gradients. In this paper, we show that similar solutions can be obtained more easily via filtering, obviating the need to solve a potentially dense system of equations using slow iterative methods. Our filtered solutions are very similar to the true ones, but often up to 10 times faster to compute. |
Tasks | |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Young_Solving_Vision_Problems_via_Filtering_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Young_Solving_Vision_Problems_via_Filtering_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/solving-vision-problems-via-filtering |
Repo | |
Framework | |
A Modular Tool for Automatic Summarization
Title | A Modular Tool for Automatic Summarization |
Authors | Valentin Nyzam, Aurélien Bossard |
Abstract | This paper introduces the first fine-grained modular tool for automatic summarization. Open source and written in Java, it is designed to be as straightforward as possible for end-users. Its modular architecture is meant to ease its maintenance and the development and integration of new modules. We hope that it will ease the work of researchers in automatic summarization by providing a reliable baseline for future works as well as an easy way to evaluate methods on different corpora. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-3030/ |
https://www.aclweb.org/anthology/P19-3030 | |
PWC | https://paperswithcode.com/paper/a-modular-tool-for-automatic-summarization |
Repo | |
Framework | |
Comparing Pipelined and Integrated Approaches to Dialectal Arabic Neural Machine Translation
Title | Comparing Pipelined and Integrated Approaches to Dialectal Arabic Neural Machine Translation |
Authors | Pamela Shapiro, Kevin Duh |
Abstract | When translating diglossic languages such as Arabic, situations may arise where we would like to translate a text but do not know which dialect it is. A traditional approach to this problem is to design dialect identification systems and dialect-specific machine translation systems. However, under the recent paradigm of neural machine translation, shared multi-dialectal systems have become a natural alternative. Here we explore under which conditions it is beneficial to perform dialect identification for Arabic neural machine translation versus using a general system for all dialects. |
Tasks | Machine Translation |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/W19-1424/ |
https://www.aclweb.org/anthology/W19-1424 | |
PWC | https://paperswithcode.com/paper/comparing-pipelined-and-integrated-approaches |
Repo | |
Framework | |
Variational Bayesian Phylogenetic Inference
Title | Variational Bayesian Phylogenetic Inference |
Authors | Cheng Zhang, Frederick A. Matsen IV |
Abstract | Bayesian phylogenetic inference is currently done via Markov chain Monte Carlo with simple mechanisms for proposing new states, which hinders exploration efficiency and often requires long runs to deliver accurate posterior estimates. In this paper we present an alternative approach: a variational framework for Bayesian phylogenetic analysis. We approximate the true posterior using an expressive graphical model for tree distributions, called a subsplit Bayesian network, together with appropriate branch length distributions. We train the variational approximation via stochastic gradient ascent and adopt multi-sample based gradient estimators for different latent variables separately to handle the composite latent space of phylogenetic models. We show that our structured variational approximations are flexible enough to provide comparable posterior estimation to MCMC, while requiring less computation due to a more efficient tree exploration mechanism enabled by variational inference. Moreover, the variational approximations can be readily used for further statistical analysis such as marginal likelihood estimation for model comparison via importance sampling. Experiments on both synthetic data and real data Bayesian phylogenetic inference problems demonstrate the effectiveness and efficiency of our methods. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=SJVmjjR9FX |
https://openreview.net/pdf?id=SJVmjjR9FX | |
PWC | https://paperswithcode.com/paper/variational-bayesian-phylogenetic-inference |
Repo | |
Framework | |
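The abstract above notes that a fitted variational approximation can be reused for marginal likelihood estimation via importance sampling. The sketch below illustrates only that general idea on a toy one-dimensional Gaussian model, not on phylogenetic trees or subsplit Bayesian networks: the model (prior z ~ N(0, 1), likelihood x | z ~ N(z, 1)), the proposal parameters, and all function names are assumptions chosen so that the exact answer, x ~ N(0, 2), is available for comparison.

```python
import math
import random


def log_normal_pdf(x: float, mean: float, std: float) -> float:
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def log_marginal_likelihood_is(x: float, q_mean: float, q_std: float,
                               num_samples: int = 20000, seed: int = 0) -> float:
    """Estimate log p(x) as the log of the average importance weight
    p(x, z) / q(z), with z drawn from the proposal q = N(q_mean, q_std^2)."""
    rng = random.Random(seed)
    log_weights = []
    for _ in range(num_samples):
        z = rng.gauss(q_mean, q_std)
        # Toy joint: prior N(0, 1) on z times likelihood N(z, 1) on x.
        log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
        log_weights.append(log_joint - log_normal_pdf(z, q_mean, q_std))
    # Log-sum-exp for numerical stability when averaging the weights.
    m = max(log_weights)
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights) / num_samples)


def exact_log_marginal(x: float) -> float:
    """Closed form for the toy model: marginally, x ~ N(0, 2)."""
    return log_normal_pdf(x, 0.0, math.sqrt(2.0))
```

Calling `log_marginal_likelihood_is(1.0, 0.4, 0.8)` and comparing against `exact_log_marginal(1.0)` shows how close the estimate gets when the proposal is near the true posterior N(0.5, 0.5). The proposal here is deliberately slightly wider than the posterior, which keeps the importance weights bounded.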
Empirically Characterizing Overparameterization Impact on Convergence
Title | Empirically Characterizing Overparameterization Impact on Convergence |
Authors | Newsha Ardalani, Joel Hestness, Gregory Diamos |
Abstract | A long-held conventional wisdom states that larger models train more slowly when using gradient descent. This work challenges this widely-held belief, showing that larger models can potentially train faster despite the increasing computational requirements of each training step. In particular, we study the effect of network structure (depth and width) on halting time and show that larger models—wider models in particular—take fewer training steps to converge. We design simple experiments to quantitatively characterize the effect of overparametrization on weight space traversal. Results show that halting time improves when growing model’s width for three different applications, and the improvement comes from each factor: The distance from initialized weights to converged weights shrinks with a power-law-like relationship, the average step size grows with a power-law-like relationship, and gradient vectors become more aligned with each other during traversal. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=S1lPShAqFm |
https://openreview.net/pdf?id=S1lPShAqFm | |
PWC | https://paperswithcode.com/paper/empirically-characterizing |
Repo | |
Framework | |
Identifying Adverse Drug Events Mentions in Tweets Using Attentive, Collocated, and Aggregated Medical Representation
Title | Identifying Adverse Drug Events Mentions in Tweets Using Attentive, Collocated, and Aggregated Medical Representation |
Authors | Xinyan Zhao, Deahan Yu, V.G. Vinod Vydiswaran |
Abstract | Identifying mentions of medical concepts in social media is challenging because of high variability in free text. In this paper, we propose a novel neural network architecture, the Collocated LSTM with Attentive Pooling and Aggregated representation (CLAPA), that integrates a bidirectional LSTM model with attention and pooling strategy and utilizes the collocation information from training data to improve the representation of medical concepts. The collocation and aggregation layers improve the model performance on the task of identifying mentions of adverse drug events (ADE) in tweets. Using the dataset made available as part of the workshop shared task, we show that careful selection of neighborhood contexts can help uncover useful local information and improve the overall medical concept representation. |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-3209/ |
https://www.aclweb.org/anthology/W19-3209 | |
PWC | https://paperswithcode.com/paper/identifying-adverse-drug-events-mentions-in |
Repo | |
Framework | |
NLPRL at WAT2019: Transformer-based Tamil – English Indic Task Neural Machine Translation System
Title | NLPRL at WAT2019: Transformer-based Tamil – English Indic Task Neural Machine Translation System |
Authors | Amit Kumar, Anil Kumar Singh |
Abstract | This paper describes the Machine Translation system for Tamil-English Indic Task organized at WAT 2019. We use Transformer-based architecture for Neural Machine Translation. |
Tasks | Machine Translation |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5222/ |
https://www.aclweb.org/anthology/D19-5222 | |
PWC | https://paperswithcode.com/paper/nlprl-at-wat2019-transformer-based-tamil |
Repo | |
Framework | |
Combining Discourse Markers and Cross-lingual Embeddings for Synonym–Antonym Classification
Title | Combining Discourse Markers and Cross-lingual Embeddings for Synonym–Antonym Classification |
Authors | Michael Roth, Shyam Upadhyay |
Abstract | It is well-known that distributional semantic approaches have difficulty in distinguishing between synonyms and antonyms (Grefenstette, 1992; Padó and Lapata, 2003). Recent work has shown that supervision available in English for this task (e.g., lexical resources) can be transferred to other languages via cross-lingual word embeddings. However, this kind of transfer misses monolingual distributional information available in a target language, such as contrast relations that are indicative of antonymy (e.g. hot … while … cold). In this work, we improve the transfer by exploiting monolingual information, expressed in the form of co-occurrences with discourse markers that convey contrast. Our approach makes use of less than a dozen markers, which can easily be obtained for many languages. Compared to a baseline using only cross-lingual embeddings, we show absolute improvements of 4–10% F1-score in Vietnamese and Hindi. |
Tasks | Word Embeddings |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1390/ |
https://www.aclweb.org/anthology/N19-1390 | |
PWC | https://paperswithcode.com/paper/combining-discourse-markers-and-cross-lingual |
Repo | |
Framework | |
ZQM at SemEval-2019 Task9: A Single Layer CNN Based on Pre-trained Model for Suggestion Mining
Title | ZQM at SemEval-2019 Task9: A Single Layer CNN Based on Pre-trained Model for Suggestion Mining |
Authors | Qimin Zhou, Zhengxin Zhang, Hao Wu, Linmao Wang |
Abstract | This paper describes our system that competed at SemEval 2019 Task 9 - SubTask A: "Suggestion Mining from Online Reviews and Forums". Our system fuses the convolutional neural network and the latest BERT model to conduct suggestion mining. In our system, the input of convolutional neural network is the embedding vectors which are drawn from the pre-trained BERT model. And to enhance the effectiveness of the whole system, the pre-trained BERT model is fine-tuned by provided datasets before the procedure of embedding vectors extraction. Empirical results show the effectiveness of our model which obtained 9th position out of 34 teams with F1 score equals to 0.715. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2226/ |
https://www.aclweb.org/anthology/S19-2226 | |
PWC | https://paperswithcode.com/paper/zqm-at-semeval-2019-task9-a-single-layer-cnn |
Repo | |
Framework | |
One format to rule them all – The emtsv pipeline for Hungarian
Title | One format to rule them all – The emtsv pipeline for Hungarian |
Authors | Balázs Indig, Bálint Sass, Eszter Simon, Iván Mittelholcz, Noémi Vadász, Márton Makrai |
Abstract | We present a more efficient version of the e-magyar NLP pipeline for Hungarian called emtsv. It integrates Hungarian NLP tools in a framework whose individual modules can be developed or replaced independently and allows new ones to be added. The design also allows convenient investigation and manual correction of the data flow from one module to another. The improvements we publish include effective communication between the modules and support of the use of individual modules both in the chain and standing alone. Our goals are accomplished using extended tsv (tab separated values) files, a simple, uniform, generic and self-documenting input/output format. Our vision is maintaining the system for a long time and making it easier for external developers to fit their own modules into the system, thus sharing existing competencies in the field of processing Hungarian, a mid-resourced language. The source code is available under LGPL 3.0 license at https://github.com/dlt-rilmta/emtsv . |
Tasks | |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-4018/ |
https://www.aclweb.org/anthology/W19-4018 | |
PWC | https://paperswithcode.com/paper/one-format-to-rule-them-all-the-emtsv |
Repo | |
Framework | |
Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy
Title | Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy |
Authors | Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang |
Abstract | Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning. However, due to nonconvexity, the global convergence of PPO and TRPO remains less understood, which separates theory from practice. In this paper, we prove that a variant of PPO and TRPO equipped with overparametrized neural networks converges to the globally optimal policy at a sublinear rate. The key to our analysis is the global convergence of infinite-dimensional mirror descent under a notion of one-point monotonicity, where the gradient and iterate are instantiated by neural networks. In particular, the desirable representation power and optimization geometry induced by the overparametrization of such neural networks allow them to accurately approximate the infinite-dimensional gradient and iterate. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9242-neural-trust-regionproximal-policy-optimization-attains-globally-optimal-policy |
http://papers.nips.cc/paper/9242-neural-trust-regionproximal-policy-optimization-attains-globally-optimal-policy.pdf | |
PWC | https://paperswithcode.com/paper/neural-trust-regionproximal-policy |
Repo | |
Framework | |
Learning data-derived privacy preserving representations from information metrics
Title | Learning data-derived privacy preserving representations from information metrics |
Authors | Martin Bertran, Natalia Martinez, Afroditi Papadaki, Qiang Qiu, Miguel Rodrigues, Guillermo Sapiro |
Abstract | It is clear that users should own and control their data and privacy. Utility providers are also becoming more interested in guaranteeing data privacy. Therefore, users and providers can and should collaborate in privacy protecting challenges, and this paper addresses this new paradigm. We propose a framework where the user controls what characteristics of the data they want to share (utility) and what they want to keep private (secret), without necessarily asking the utility provider to change its existing machine learning algorithms. We first analyze the space of privacy-preserving representations and derive natural information-theoretic bounds on the utility-privacy trade-off when disclosing a sanitized version of the data X. We present explicit learning architectures to learn privacy-preserving representations that approach this bound in a data-driven fashion. We describe important use-case scenarios where the utility providers are willing to collaborate with the sanitization process. We study space-preserving transformations where the utility provider can use the same algorithm on original and sanitized data, a critical and novel attribute to help service providers accommodate varying privacy requirements with a single set of utility algorithms. We illustrate this framework through the implementation of three use cases; subject-within-subject, where we tackle the problem of having a face identity detector that works only on a consenting subset of users, an important application, for example, for mobile devices activated by face recognition; gender-and-subject, where we preserve facial verification while hiding the gender attribute for users who choose to do so; and emotion-and-gender, where we hide independent variables, as is the case of hiding gender while preserving emotion detection. |
Tasks | Face Recognition |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=SJe2so0qF7 |
https://openreview.net/pdf?id=SJe2so0qF7 | |
PWC | https://paperswithcode.com/paper/learning-data-derived-privacy-preserving |
Repo | |
Framework | |