January 24, 2020

Paper Group NANR 147

IMSurReal: IMS at the Surface Realization Shared Task 2019. Topic-Guided Variational Auto-Encoder for Text Generation. Proceedings of the Natural Legal Language Processing Workshop 2019. The Concordia NLG Surface Realizer at SRST 2019. The DipInfoUniTo Realizer at SRST’19: Learning to Rank and Deep Morphology Prediction for Multilingual Surface Rea …

IMSurReal: IMS at the Surface Realization Shared Task 2019

Title IMSurReal: IMS at the Surface Realization Shared Task 2019
Authors Xiang Yu, Agnieszka Falenska, Marina Haid, Ngoc Thang Vu, Jonas Kuhn
Abstract We introduce the IMS contribution to the Surface Realization Shared Task 2019. Our submission achieves the state-of-the-art performance without using any external resources. The system takes a pipeline approach consisting of five steps: linearization, completion, inflection, contraction, and detokenization. We compare the performance of our linearization algorithm with two external baselines and report results for each step in the pipeline. Furthermore, we perform detailed error analysis revealing correlation between word order freedom and difficulty of the linearization task.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6306/
PDF https://www.aclweb.org/anthology/D19-6306
PWC https://paperswithcode.com/paper/imsurreal-ims-at-the-surface-realization
Repo
Framework

Topic-Guided Variational Auto-Encoder for Text Generation

Title Topic-Guided Variational Auto-Encoder for Text Generation
Authors Wenlin Wang, Zhe Gan, Hongteng Xu, Ruiyi Zhang, Guoyin Wang, Dinghan Shen, Changyou Chen, Lawrence Carin
Abstract We propose a topic-guided variational auto-encoder (TGVAE) model for text generation. Distinct from existing variational auto-encoder (VAE) based approaches, which assume a simple Gaussian prior for latent code, our model specifies the prior as a Gaussian mixture model (GMM) parametrized by a neural topic module. Each mixture component corresponds to a latent topic, which provides a guidance to generate sentences under the topic. The neural topic module and the VAE-based neural sequence module in our model are learned jointly. In particular, a sequence of invertible Householder transformations is applied to endow the approximate posterior of the latent code with high flexibility during the model inference. Experimental results show that our TGVAE outperforms its competitors on both unconditional and conditional text generation, which can also generate semantically-meaningful sentences with various topics.
Tasks Text Generation
Published 2019-06-01
URL https://www.aclweb.org/anthology/N19-1015/
PDF https://www.aclweb.org/anthology/N19-1015
PWC https://paperswithcode.com/paper/topic-guided-variational-auto-encoder-for
Repo
Framework
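The abstract mentions a sequence of invertible Householder transformations applied to the latent code. The following is a minimal NumPy sketch of that building block only, not the authors' implementation; the function names `householder` and `householder_flow` and the reflection vectors are hypothetical, and in the model the reflection vectors would be produced by a network rather than supplied by hand.

```python
import numpy as np

def householder(z, v):
    """Reflect z across the hyperplane orthogonal to v: (I - 2 v v^T / ||v||^2) z."""
    v = v / np.linalg.norm(v)          # normalise so the reflection is well defined
    return z - 2.0 * v * (v @ z)

def householder_flow(z, vs):
    """Apply a sequence of Householder reflections (hypothetical helper)."""
    for v in vs:
        z = householder(z, v)
    return z

rng = np.random.default_rng(0)
z0 = rng.normal(size=4)                       # a sample latent code
vs = [rng.normal(size=4) for _ in range(3)]   # illustrative reflection vectors
zK = householder_flow(z0, vs)
```

Each reflection is orthogonal, so it preserves the norm of `z` and contributes no log-determinant term to the density, which is what makes such a flow cheap to evaluate.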

Proceedings of the Natural Legal Language Processing Workshop 2019

Title Proceedings of the Natural Legal Language Processing Workshop 2019
Authors
Abstract
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-2200/
PDF https://www.aclweb.org/anthology/W19-2200
PWC https://paperswithcode.com/paper/proceedings-of-the-natural-legal-language
Repo
Framework

The Concordia NLG Surface Realizer at SRST 2019

Title The Concordia NLG Surface Realizer at SRST 2019
Authors Farhood Farahnak, Laya Rafiee, Leila Kosseim, Thomas Fevens
Abstract This paper presents the model we developed for the shallow track of the 2019 NLG Surface Realization Shared Task. The model reconstructs sentences whose word order and word inflections were removed. We divided the problem into two sub-problems: reordering and inflecting. For the purpose of reordering, we used a pointer network integrated with a transformer model as its encoder-decoder modules. In order to generate the inflected forms of tokens, a Feed Forward Neural Network was employed.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6308/
PDF https://www.aclweb.org/anthology/D19-6308
PWC https://paperswithcode.com/paper/the-concordia-nlg-surface-realizer-at-srst
Repo
Framework

The DipInfoUniTo Realizer at SRST’19: Learning to Rank and Deep Morphology Prediction for Multilingual Surface Realization

Title The DipInfoUniTo Realizer at SRST’19: Learning to Rank and Deep Morphology Prediction for Multilingual Surface Realization
Authors Alessandro Mazzei, Valerio Basile
Abstract We describe the system presented at the SR’19 shared task by the DipInfoUnito team. Our approach is based on supervised machine learning. In particular, we divide the SR task into two independent subtasks, namely word order prediction and morphology inflection prediction. Two neural networks with different architectures run on the same input structure, each producing a partial output which is recombined in the final step in order to produce the predicted surface form. This work is a direct successor of the architecture presented at SR’19.
Tasks Learning-To-Rank
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6311/
PDF https://www.aclweb.org/anthology/D19-6311
PWC https://paperswithcode.com/paper/the-dipinfounito-realizer-at-srst19-learning
Repo
Framework

DepDist: Surface realization via regex and learned dependency-distance tolerance

Title DepDist: Surface realization via regex and learned dependency-distance tolerance
Authors William Dyer
Abstract This paper describes a method of inflecting and linearizing a lemmatized dependency tree by: (1) determining a regular expression and substitution to describe each productive wordform rule; (2) learning the dependency distance tolerance for each head-dependent pair, resulting in an edge-weighted directed acyclic graph (DAG); and (3) topologically sorting the DAG into a surface realization based on edge weight. The method’s output for 11 languages across 18 treebanks is competitive with the other submissions to the Second Multilingual Surface Realization Shared Task (SR ’19).
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6303/
PDF https://www.aclweb.org/anthology/D19-6303
PWC https://paperswithcode.com/paper/depdist-surface-realization-via-regex-and
Repo
Framework
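Steps (1) and (3) of the DepDist abstract can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the paper's system: the plural rule, the helper names `inflect` and `linearize`, and the example precedence graph are all hypothetical, and the sketch ignores learned edge weights and simply sorts an unweighted DAG. It needs Python 3.9 or later for `graphlib`.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical wordform rule: a regex plus substitution for one English plural pattern.
PLURAL_RULE = (re.compile(r"(?<=[^aeiou])y$"), "ies")

def inflect(lemma, rule):
    """Apply a (pattern, replacement) wordform rule to a lemma."""
    pattern, replacement = rule
    return pattern.sub(replacement, lemma)

def linearize(precedence):
    """Topologically sort a DAG given as {word: set of words that must precede it}."""
    return list(TopologicalSorter(precedence).static_order())

# Illustrative precedence constraints for a four-word sentence.
order = linearize({"dog": {"the"}, "chases": {"dog"}, "cats": {"chases"}})
```

Here `inflect("city", PLURAL_RULE)` yields `"cities"` while `inflect("day", PLURAL_RULE)` leaves the lemma unchanged, and `order` is the single ordering consistent with the chain of constraints. A weighted variant would need a rule for choosing among several valid orderings, which this sketch does not attempt.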

Social Web Observatory: An entity-driven, holistic information summarization platform across sources

Title Social Web Observatory: An entity-driven, holistic information summarization platform across sources
Authors Leonidas Tsekouras, Georgios Petasis, Aris Kosmopoulos
Abstract The Social Web Observatory is an entity-driven, sentiment-aware, event summarization web platform, combining various methods and tools to overview trends across social media and news sources in Greek. SWO crawls, clusters and summarizes information following an entity-centric view of text streams, allowing to monitor the public sentiment towards a specific person, organization or other entity. In this paper, we overview the platform, outline the analysis pipeline and describe a user study aimed to quantify the usefulness of the system and especially the meaningfulness and coherence of discovered events.
Tasks
Published 2019-09-01
URL https://www.aclweb.org/anthology/W19-8907/
PDF https://www.aclweb.org/anthology/W19-8907
PWC https://paperswithcode.com/paper/social-web-observatory-an-entity-driven
Repo
Framework

Joint Semantic and Distributional Word Representations with Multi-Graph Embeddings

Title Joint Semantic and Distributional Word Representations with Multi-Graph Embeddings
Authors Pierre Daix-Moreux, Matthias Gallé
Abstract Word embeddings continue to be of great use for NLP researchers and practitioners due to their training speed and easiness of use and distribution. Prior work has shown that the representation of those words can be improved by the use of semantic knowledge-bases. In this paper we propose a novel way of combining those knowledge-bases while the lexical information of co-occurrences of words remains. It is conceptually clear, as it consists in mapping both distributional and semantic information into a multi-graph and modifying existing node embeddings techniques to compute word representations. Our experiments show improved results compared to vanilla word embeddings, retrofitting and concatenation techniques using the same information, on a variety of data-sets of word similarities.
Tasks Word Embeddings
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5314/
PDF https://www.aclweb.org/anthology/D19-5314
PWC https://paperswithcode.com/paper/joint-semantic-and-distributional-word
Repo
Framework

Scalable Methods for Annotating Legal-Decision Corpora

Title Scalable Methods for Annotating Legal-Decision Corpora
Authors Lisa Ferro, John Aberdeen, Karl Branting, Craig Pfeifer, Alexander Yeh, Amartya Chakraborty
Abstract Recent research has demonstrated that judicial and administrative decisions can be predicted by machine-learning models trained on prior decisions. However, to have any practical application, these predictions must be explainable, which in turn requires modeling a rich set of features. Such approaches face a roadblock if the knowledge engineering required to create these features is not scalable. We present an approach to developing a feature-rich corpus of administrative rulings about domain name disputes, an approach which leverages a small amount of manual annotation and prototypical patterns present in the case documents to automatically extend feature labels to the entire corpus. To demonstrate the feasibility of this approach, we report results from systems trained on this dataset.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-2202/
PDF https://www.aclweb.org/anthology/W19-2202
PWC https://paperswithcode.com/paper/scalable-methods-for-annotating-legal
Repo
Framework

The Rationality of Semantic Change

Title The Rationality of Semantic Change
Authors Omer Korat
Abstract This study investigates the mutual effects over time of semantically related function words on each other’s distribution over syntactic environments. Words that can have the same meaning are observed to have opposite trends of change in frequency across different syntactic structures which correspond to the shared meaning. This phenomenon is demonstrated to have a rational basis: it increases communicative efficiency by prioritizing words differently in the environments on which they compete.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-4719/
PDF https://www.aclweb.org/anthology/W19-4719
PWC https://paperswithcode.com/paper/the-rationality-of-semantic-change
Repo
Framework

Correcting Whitespace Errors in Digitized Historical Texts

Title Correcting Whitespace Errors in Digitized Historical Texts
Authors Sandeep Soni, Lauren Klein, Jacob Eisenstein
Abstract Whitespace errors are common to digitized archives. This paper describes a lightweight unsupervised technique for recovering the original whitespace. Our approach is based on count statistics from Google n-grams, which are converted into a likelihood ratio test computed from interpolated trigram and bigram probabilities. To evaluate this approach, we annotate a small corpus of whitespace errors in a digitized corpus of newspapers from the 19th century United States. Our technique identifies and corrects most whitespace errors while introducing a minimal amount of oversegmentation: it achieves 77% recall at a false positive rate of less than 1%, and 91% recall at a false positive rate of less than 3%.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-2513/
PDF https://www.aclweb.org/anthology/W19-2513
PWC https://paperswithcode.com/paper/correcting-whitespace-errors-in-digitized
Repo
Framework
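The likelihood-ratio idea in the whitespace abstract can be sketched in Python as follows. This is a deliberately simplified stand-in, not the paper's method: it compares a smoothed bigram probability for a candidate split against a smoothed unigram probability for the unsplit token, with no trigram interpolation, and every count, name and threshold below is made up for illustration rather than taken from Google n-grams.

```python
import math

# Toy counts standing in for n-gram statistics; all numbers are invented.
UNIGRAMS = {"of": 900, "the": 1500, "ofthe": 1, "other": 80}
BIGRAMS = {("of", "the"): 400}
TOTAL = 10_000
VOCAB = len(UNIGRAMS)

def log_p_unigram(word, alpha=1.0):
    """Add-alpha smoothed log-probability of a single token."""
    return math.log((UNIGRAMS.get(word, 0) + alpha) / (TOTAL + alpha * VOCAB))

def log_p_bigram(left, right, alpha=1.0):
    """Add-alpha smoothed log-probability of a two-token sequence."""
    return math.log((BIGRAMS.get((left, right), 0) + alpha) / (TOTAL + alpha * VOCAB ** 2))

def best_split(token, threshold=0.0):
    """Return the split whose log-likelihood ratio exceeds the threshold, else None."""
    best, best_ratio = None, threshold
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        ratio = log_p_bigram(left, right) - log_p_unigram(token)
        if ratio > best_ratio:
            best, best_ratio = (left, right), ratio
    return best
```

With these toy counts, `best_split("ofthe")` proposes `("of", "the")` and `best_split("other")` returns `None`. Raising `threshold` makes the test more conservative, which is the general mechanism for trading recall against false positives.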

Eigencharacter: An Embedding of Chinese Character Orthography

Title Eigencharacter: An Embedding of Chinese Character Orthography
Authors Yu-Hsiang Tseng, Shu-Kai Hsieh
Abstract Chinese characters are unique in their logographic nature, which inherently encodes world knowledge through thousands of years of evolution. This paper proposes an embedding approach, namely eigencharacter (EC) space, which helps NLP applications easily access the knowledge encoded in Chinese orthography. These EC representations are automatically extracted, encode both structural and radical information, and easily integrate with other computational models. We built EC representations of 5,000 Chinese characters, investigated orthography knowledge encoded in ECs, and demonstrated how these ECs identified visually similar characters with both structural and radical information.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-6404/
PDF https://www.aclweb.org/anthology/D19-6404
PWC https://paperswithcode.com/paper/eigencharacter-an-embedding-of-chinese
Repo
Framework

Soft Q-Learning with Mutual-Information Regularization

Title Soft Q-Learning with Mutual-Information Regularization
Authors Jordi Grau-Moya, Felix Leibfried, Peter Vrancx
Abstract We propose a reinforcement learning (RL) algorithm that uses mutual-information regularization to optimize a prior action distribution for better performance and exploration. Entropy-based regularization has previously been shown to improve both exploration and robustness in challenging sequential decision-making tasks. It does so by encouraging policies to put probability mass on all actions. However, entropy regularization might be undesirable when actions have significantly different importance. In this paper, we propose a theoretically motivated framework that dynamically weights the importance of actions by using the mutual-information. In particular, we express the RL problem as an inference problem where the prior probability distribution over actions is subject to optimization. We show that the prior optimization introduces a mutual-information regularizer in the RL objective. This regularizer encourages the policy to be close to a non-uniform distribution that assigns higher probability mass to more important actions. We empirically demonstrate that our method significantly improves over entropy regularization methods and unregularized methods.
Tasks Decision Making, Q-Learning
Published 2019-05-01
URL https://openreview.net/forum?id=HyEtjoCqFX
PDF https://openreview.net/pdf?id=HyEtjoCqFX
PWC https://paperswithcode.com/paper/soft-q-learning-with-mutual-information
Repo
Framework
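One common way to write soft Q-learning with a learnable action prior is sketched below in NumPy. It is an illustration under stated assumptions rather than the authors' algorithm: the function names, the inverse temperature `beta`, the state weights `mu` and the Q-table are hypothetical, and the sketch relies only on the standard facts that the prior-weighted soft policy is proportional to `prior(a) * exp(beta * Q(s, a))` and that the prior minimising the expected KL term is the marginal action distribution.

```python
import numpy as np

def soft_value(q, prior, beta):
    """V(s) = (1/beta) * log sum_a prior(a) * exp(beta * Q(s, a)), computed stably."""
    z = beta * q + np.log(prior)
    m = z.max(axis=-1, keepdims=True)
    return (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))) / beta

def policy(q, prior, beta):
    """pi(a|s) proportional to prior(a) * exp(beta * Q(s, a))."""
    z = beta * q + np.log(prior)
    z = z - z.max(axis=-1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

def update_prior(pi, state_weights):
    """Marginal action distribution: prior(a) = sum_s mu(s) * pi(a|s)."""
    return state_weights @ pi

q = np.array([[1.0, 0.0, 0.0], [2.0, 0.5, 0.0]])  # illustrative Q-table: 2 states, 3 actions
prior = np.full(3, 1.0 / 3.0)                     # start from a uniform prior
mu = np.array([0.5, 0.5])                         # assumed state visitation weights
pi = policy(q, prior, beta=2.0)
new_prior = update_prior(pi, mu)
```

Starting from a uniform prior this reduces to ordinary entropy-regularized soft Q-learning; after the update, the prior places more mass on actions that the policy favours across states, which is the non-uniform reference distribution the abstract describes.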

Hierarchical User and Item Representation with Three-Tier Attention for Recommendation

Title Hierarchical User and Item Representation with Three-Tier Attention for Recommendation
Authors Chuhan Wu, Fangzhao Wu, Junxin Liu, Yongfeng Huang
Abstract Utilizing reviews to learn user and item representations is useful for recommender systems. Existing methods usually merge all reviews from the same user or for the same item into a long document. However, different reviews, sentences and even words usually have different informativeness for modeling users and items. In this paper, we propose a hierarchical user and item representation model with three-tier attention to learn user and item representations from reviews for recommendation. Our model contains three major components, i.e., a sentence encoder to learn sentence representations from words, a review encoder to learn review representations from sentences, and a user/item encoder to learn user/item representations from reviews. In addition, we incorporate a three-tier attention network in our model to select important words, sentences and reviews. Besides, we combine the user and item representations learned from the reviews with user and item embeddings based on IDs as the final representations to capture the latent factors of individual users and items. Extensive experiments on four benchmark datasets validate the effectiveness of our approach.
Tasks Recommendation Systems
Published 2019-06-01
URL https://www.aclweb.org/anthology/N19-1180/
PDF https://www.aclweb.org/anthology/N19-1180
PWC https://paperswithcode.com/paper/hierarchical-user-and-item-representation
Repo
Framework
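The three-tier structure described in the abstract (words to sentences, sentences to reviews, reviews to a user or item) can be sketched with plain attention pooling in NumPy. This is a simplified illustration, not the paper's model: it uses dot-product attention with hypothetical query vectors `q_word`, `q_sent` and `q_review`, omits the learned encoders and the ID-based embeddings, and feeds in random vectors in place of real word representations.

```python
import numpy as np

def attention_pool(vectors, query):
    """Softmax-weighted average of the rows of `vectors`, scored against `query`."""
    scores = vectors @ query
    scores = scores - scores.max()            # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ vectors, weights

def encode_user(reviews, q_word, q_sent, q_review):
    """reviews: list of reviews; each review is a list of (n_words, d) arrays."""
    review_vecs = []
    for review in reviews:
        # Tier 1: pool words into one vector per sentence.
        sent_vecs = np.stack([attention_pool(s, q_word)[0] for s in review])
        # Tier 2: pool sentences into one vector per review.
        review_vecs.append(attention_pool(sent_vecs, q_sent)[0])
    # Tier 3: pool reviews into a single user (or item) vector.
    return attention_pool(np.stack(review_vecs), q_review)[0]

rng = np.random.default_rng(0)
d = 8
reviews = [[rng.normal(size=(4, d)), rng.normal(size=(3, d))], [rng.normal(size=(5, d))]]
user_vec = encode_user(reviews, rng.normal(size=d), rng.normal(size=d), rng.normal(size=d))
```

The same pooling function is reused at every tier; only the query vector changes, so each level can learn to weight its inputs differently.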

Improving Neural Machine Translation Robustness via Data Augmentation: Beyond Back-Translation

Title Improving Neural Machine Translation Robustness via Data Augmentation: Beyond Back-Translation
Authors Zhenhao Li, Lucia Specia
Abstract Neural Machine Translation (NMT) models have been proved strong when translating clean texts, but they are very sensitive to noise in the input. Improving NMT models robustness can be seen as a form of “domain” adaption to noise. The recently created Machine Translation on Noisy Text task corpus provides noisy-clean parallel data for a few language pairs, but this data is very limited in size and diversity. The state-of-the-art approaches are heavily dependent on large volumes of back-translated data. This paper has two main contributions: Firstly, we propose new data augmentation methods to extend limited noisy data and further improve NMT robustness to noise while keeping the models small. Secondly, we explore the effect of utilizing noise from external data in the form of speech transcripts and show that it could help robustness.
Tasks Data Augmentation, Domain Adaptation, Machine Translation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5543/
PDF https://www.aclweb.org/anthology/D19-5543
PWC https://paperswithcode.com/paper/improving-neural-machine-translation-1
Repo
Framework