Paper Group NANR 129
Rule-based vs. Neural Net Approaches to Semantic Textual Similarity. MEMD: A Diversity-Promoting Learning Framework for Short-Text Conversation. The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions. LREMap, a Song of Resources and Evaluation. Building Literary Corpora for Computational Literary Analysis - …
Rule-based vs. Neural Net Approaches to Semantic Textual Similarity
Title | Rule-based vs. Neural Net Approaches to Semantic Textual Similarity |
Authors | Linrui Zhang, Dan Moldovan |
Abstract | This paper presents a neural net approach to determining Semantic Textual Similarity (STS) using attention-based bidirectional Long Short-Term Memory networks (Bi-LSTM). To date, most traditional STS systems have been rule-based, built on top of extensive use of linguistic features and resources. In this paper, we present an end-to-end attention-based Bi-LSTM neural network system that takes only word-level features, without expensive feature engineering or the use of external resources. By comparing its performance with traditional rule-based systems on the SemEval-2012 benchmark, we assess the limitations and strengths of neural net systems relative to rule-based systems for Semantic Textual Similarity. |
Tasks | Feature Engineering, Semantic Textual Similarity, Sentence Pair Modeling |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/W18-3803/ |
PWC | https://paperswithcode.com/paper/rule-based-vs-neural-net-approaches-to |
Repo | |
Framework | |
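The architecture named in the abstract above is straightforward to sketch. Below is a minimal PyTorch version of an attention-based Bi-LSTM sentence-pair scorer, assuming standard additive attention and a [u, v, |u-v|, u*v] pair-combination layer; all module names and dimensions are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class AttnBiLSTMEncoder(nn.Module):
    """Encode a sentence as an attention-weighted sum of Bi-LSTM states."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)          # scores each time step

    def forward(self, tokens):                        # tokens: (batch, seq)
        h, _ = self.lstm(self.emb(tokens))            # (batch, seq, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)        # attention over time steps
        return (w * h).sum(dim=1)                     # (batch, 2*hidden)

class STSModel(nn.Module):
    """Regress a similarity score from two encoded sentences."""
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.enc = AttnBiLSTMEncoder(vocab_size, hidden=hidden)
        self.out = nn.Linear(8 * hidden, 1)           # [u, v, |u-v|, u*v]

    def forward(self, s1, s2):
        u, v = self.enc(s1), self.enc(s2)
        feats = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
        return self.out(feats).squeeze(-1)            # e.g. the 0-5 STS scale

model = STSModel(vocab_size=10000)
s1 = torch.randint(0, 10000, (2, 12))                 # two toy sentence pairs
s2 = torch.randint(0, 10000, (2, 12))
print(model(s1, s2).shape)                            # torch.Size([2])
```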
MEMD: A Diversity-Promoting Learning Framework for Short-Text Conversation
Title | MEMD: A Diversity-Promoting Learning Framework for Short-Text Conversation |
Authors | Meng Zou, Xihan Li, Haokun Liu, Zhihong Deng |
Abstract | Neural encoder-decoder models have been widely applied to conversational response generation, a hot research topic in recent years. However, conventional neural encoder-decoder models tend to generate commonplace responses like "I don't know" regardless of the input. In this paper, we analyze this problem from a new perspective: latent vectors. Based on this analysis, we propose an easy-to-extend learning framework named MEMD (Multi-Encoder to Multi-Decoder), in which an auxiliary encoder and an auxiliary decoder are introduced to provide necessary training guidance without resorting to extra data or complicating the network's inner structure. Experimental results demonstrate that our method effectively improves the quality of generated responses according to automatic metrics and human evaluations, yielding more diverse and smoother replies. |
Tasks | Conversational Response Generation, Short-Text Conversation |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1109/ |
PWC | https://paperswithcode.com/paper/memd-a-diversity-promoting-learning-framework |
Repo | |
Framework | |
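The abstract names the components (an auxiliary encoder and an auxiliary decoder guiding a standard encoder-decoder) but not their wiring. Below is a toy PyTorch sketch of one plausible multi-encoder/multi-decoder objective, where the auxiliary path reconstructs the query from the response and the two latent vectors are pulled together; the specific auxiliary losses are guesses for illustration, not the paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Enc(nn.Module):
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
    def forward(self, x):
        _, h = self.gru(self.emb(x))
        return h.squeeze(0)                       # (batch, dim) latent vector

class Dec(nn.Module):
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)
    def nll(self, z, y):                          # teacher-forced NLL of y given z
        h, _ = self.gru(self.emb(y[:, :-1]), z.unsqueeze(0))
        logits = self.out(h)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               y[:, 1:].reshape(-1))

def memd_style_loss(enc, dec, aux_enc, aux_dec, query, response):
    z = enc(query)                                # main path: query -> response
    loss = dec.nll(z, response)
    z_resp = aux_enc(response)                    # auxiliary view of the response
    loss = loss + aux_dec.nll(z_resp, query)      # reconstruct query from response
    loss = loss + F.mse_loss(z, z_resp)           # align the two latent vectors
    return loss

vocab = 1000
q = torch.randint(0, vocab, (4, 9))               # toy query batch
r = torch.randint(0, vocab, (4, 7))               # toy response batch
print(memd_style_loss(Enc(vocab), Dec(vocab), Enc(vocab), Dec(vocab), q, r))
```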
The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions
Title | The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions |
Authors | Salvatore Giorgi, Daniel Preoţiuc-Pietro, Anneke Buffone, Daniel Rieman, Lyle Ungar, H. Andrew Schwartz |
Abstract | Nowcasting based on social media text promises to provide unobtrusive and near real-time predictions of community-level outcomes. These outcomes typically concern people, but the data is often aggregated without regard to users in the Twitter populations of each community. This paper describes a simple yet effective method for building community-level models using Twitter language aggregated by user. Results on four different U.S. county-level tasks, spanning demographic, health, and psychological outcomes, show large and consistent improvements in prediction accuracies (e.g. from Pearson r=.73 to .82 for median income prediction, or r=.37 to .47 for life satisfaction prediction) over the standard approach of aggregating all tweets. We make our aggregated and anonymized community-level data, derived from 37 billion tweets – over 1 billion of which were mapped to counties – available for research. |
Tasks | |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/D18-1148/ |
PWC | https://paperswithcode.com/paper/the-remarkable-benefit-of-user-level |
Repo | |
Framework | |
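The core idea (aggregate within users before aggregating to the community, so every person counts once) fits in a few lines. A toy pandas sketch contrasting the two orders of aggregation; the feature name and the numbers are made up for illustration.

```python
import pandas as pd

# Toy data: one row per tweet with a user id, county id, and a lexical feature.
tweets = pd.DataFrame({
    "county": ["A", "A", "A", "B", "B"],
    "user":   ["u1", "u1", "u2", "u3", "u3"],
    "happy":  [1.0, 0.0, 1.0, 0.0, 0.0],   # e.g. relative frequency of a word
})

# Standard approach: average features over all tweets in the county,
# so prolific users dominate the county representation.
tweet_level = tweets.groupby("county")["happy"].mean()

# User-level aggregation (the paper's method): first average within each
# user, then average users within the county, weighting people equally.
user_means = tweets.groupby(["county", "user"])["happy"].mean()
user_level = user_means.groupby(level="county").mean()

print(tweet_level)  # county A: 0.667 (u1's two tweets count twice)
print(user_level)   # county A: 0.75  (u1 and u2 count once each)
```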
LREMap, a Song of Resources and Evaluation
Title | LREMap, a Song of Resources and Evaluation |
Authors | Riccardo Del Gratta, Sara Goggi, Gabriella Pardelli, Nicoletta Calzolari |
Abstract | |
Tasks | |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1203/ |
PWC | https://paperswithcode.com/paper/lremap-a-song-of-resources-and-evaluation |
Repo | |
Framework | |
Building Literary Corpora for Computational Literary Analysis - A Prototype to Bridge the Gap between CL and DH
Title | Building Literary Corpora for Computational Literary Analysis - A Prototype to Bridge the Gap between CL and DH |
Authors | Andrew Frank, Christine Ivanovic |
Abstract | |
Tasks | |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1128/ |
PWC | https://paperswithcode.com/paper/building-literary-corpora-for-computational |
Repo | |
Framework | |
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Title | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) |
Authors | |
Abstract | |
Tasks | |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/N18-1000/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-2018-conference-of-the-5 |
Repo | |
Framework | |
A Lexicon-Based Supervised Attention Model for Neural Sentiment Analysis
Title | A Lexicon-Based Supervised Attention Model for Neural Sentiment Analysis |
Authors | Yicheng Zou, Tao Gui, Qi Zhang, Xuanjing Huang |
Abstract | Attention mechanisms have been leveraged for sentiment classification tasks because not all words have the same importance. However, most existing attention models do not take full advantage of sentiment lexicons, which provide rich sentiment information and play a critical role in sentiment analysis. To address this, we propose a novel lexicon-based supervised attention model (LBSA), which allows a recurrent neural network to focus on sentiment content, thus generating sentiment-informative representations. Compared with general attention models, our model has better interpretability and less noise. Experimental results on three large-scale sentiment classification datasets show that the proposed method outperforms previous methods. |
Tasks | Sentiment Analysis |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1074/ |
PWC | https://paperswithcode.com/paper/a-lexicon-based-supervised-attention-model |
Repo | |
Framework | |
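The supervision the abstract describes (steering attention toward lexicon words) can be expressed as an auxiliary loss on the attention distribution. A minimal PyTorch sketch of one straightforward variant, a KL term between the model's attention weights and a distribution derived from lexicon scores; LBSA's exact form of supervision may differ.

```python
import torch
import torch.nn.functional as F

def lexicon_attention_loss(attn_weights, lexicon_scores, eps=1e-8):
    """Encourage attention to land on lexicon-flagged sentiment words.

    attn_weights:   (batch, seq) softmax output of the attention layer
    lexicon_scores: (batch, seq) non-negative per-token sentiment strength
                    looked up from a sentiment lexicon (0 for absent words)
    Returns a KL term to add to the classification loss.
    """
    target = lexicon_scores + eps
    target = target / target.sum(dim=1, keepdim=True)  # normalize to a distribution
    return F.kl_div((attn_weights + eps).log(), target, reduction="batchmean")

w = torch.softmax(torch.randn(2, 6), dim=1)       # stand-in attention weights
s = torch.tensor([[0., 2., 0., 0., 1., 0.],       # lexicon hits on a few tokens
                  [1., 0., 0., 0., 0., 0.]])
print(lexicon_attention_loss(w, s))
# total_loss = classification_loss + lambda_attn * lexicon_attention_loss(w, s)
```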
Cooperative Denoising for Distantly Supervised Relation Extraction
Title | Cooperative Denoising for Distantly Supervised Relation Extraction |
Authors | Kai Lei, Daoyuan Chen, Yaliang Li, Nan Du, Min Yang, Wei Fan, Ying Shen |
Abstract | Distantly supervised relation extraction greatly reduces the human effort needed to extract relational facts from unstructured texts. However, it suffers from the noisy labeling problem, which can degrade its performance. Meanwhile, the useful information expressed in knowledge graphs is still underutilized in state-of-the-art methods for distantly supervised relation extraction. In light of these challenges, we propose CORD, a novel COopeRative Denoising framework, which consists of two base networks leveraging a text corpus and a knowledge graph respectively, and a cooperative module involving their mutual learning via adaptive bi-directional knowledge distillation and dynamic ensembling with noise-varying instances. Experimental results on a real-world dataset demonstrate that the proposed method reduces noisy labels and achieves a substantial improvement over state-of-the-art methods. |
Tasks | Denoising, Information Retrieval, Question Answering, Relation Extraction |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1036/ |
PWC | https://paperswithcode.com/paper/cooperative-denoising-for-distantly |
Repo | |
Framework | |
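The mutual learning the abstract mentions is a form of bi-directional knowledge distillation between the two base networks. A minimal PyTorch sketch of that general recipe; CORD's adaptive weighting and dynamic ensembling are replaced by a fixed alpha here for clarity.

```python
import torch
import torch.nn.functional as F

def mutual_distillation_losses(logits_text, logits_kg, labels, temp=2.0, alpha=0.5):
    """Each network fits the labels and also matches the other's softened
    predictions (bi-directional knowledge distillation). The fixed alpha
    stands in for the paper's adaptive weighting."""
    log_p_text = F.log_softmax(logits_text / temp, dim=-1)
    log_p_kg = F.log_softmax(logits_kg / temp, dim=-1)
    ce_text = F.cross_entropy(logits_text, labels)
    ce_kg = F.cross_entropy(logits_kg, labels)
    # Detach the "teacher" side so each KD term only trains one network.
    kd_text = F.kl_div(log_p_text, log_p_kg.exp().detach(),
                       reduction="batchmean") * temp ** 2
    kd_kg = F.kl_div(log_p_kg, log_p_text.exp().detach(),
                     reduction="batchmean") * temp ** 2
    return ce_text + alpha * kd_text, ce_kg + alpha * kd_kg

loss_text, loss_kg = mutual_distillation_losses(
    torch.randn(8, 5), torch.randn(8, 5), torch.randint(0, 5, (8,)))
print(loss_text.item(), loss_kg.item())
```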
Word Affect Intensities
Title | Word Affect Intensities |
Authors | Saif Mohammad |
Abstract | |
Tasks | Emotion Recognition, Sentiment Analysis, Text Generation |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1027/ |
PWC | https://paperswithcode.com/paper/word-affect-intensities-1 |
Repo | |
Framework | |
Constrained Interacting Submodular Groupings
Title | Constrained Interacting Submodular Groupings |
Authors | Andrew Cotter, Mahdi Milani Fard, Seungil You, Maya Gupta, Jeff Bilmes |
Abstract | We introduce the problem of grouping a finite ground set into blocks, where each block is a subset of the ground set and where: (i) the blocks are individually highly valued by a submodular function (both robustly and in the average case) while satisfying block-specific matroid constraints; and (ii) block scores interact so that blocks are jointly scored highly, thus making the blocks mutually non-redundant. Submodular functions are good models of information and diversity; thus, the above can be seen as grouping the ground set into matroid-constrained blocks that are both intra- and inter-diverse. Potential applications include forming ensembles of classification/regression models, partitioning data for parallel processing, and summarization. In the non-robust case, we reduce the problem to non-monotone submodular maximization subject to multiple matroid constraints. In the mixed robust/average case, we offer a bi-criterion guarantee for a polynomial-time deterministic algorithm and a probabilistic guarantee for a randomized algorithm, as long as the involved submodular functions (including the inter-block interaction terms) are monotone. We close with a case study in which we use these algorithms to find high-quality, diverse ensembles of classifiers, showing good results. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2129 |
PDF | http://proceedings.mlr.press/v80/cotter18a/cotter18a.pdf |
PWC | https://paperswithcode.com/paper/constrained-interacting-submodular-groupings |
Repo | |
Framework | |
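As background for the reduction the abstract mentions, here is the plain greedy baseline for monotone submodular maximization under a simple partition-style constraint (each block capped in size); the paper's robust, interacting, and non-monotone variants go well beyond this sketch.

```python
import math

def greedy_blocks(ground, num_blocks, capacity, block_value):
    """Greedily grow blocks under a per-block cardinality cap, maximizing a
    monotone submodular block_value(set) -> float. The classic greedy gives
    a 1/2 guarantee for monotone objectives under a matroid constraint."""
    blocks = [set() for _ in range(num_blocks)]
    remaining = set(ground)
    while remaining:
        best = None  # (marginal gain, item, block index)
        for item in remaining:
            for b, blk in enumerate(blocks):
                if len(blk) >= capacity:
                    continue
                gain = block_value(blk | {item}) - block_value(blk)
                if best is None or gain > best[0]:
                    best = (gain, item, b)
        if best is None or best[0] <= 0:
            break
        _, item, b = best
        blocks[b].add(item)
        remaining.remove(item)
    return blocks

# Example: sqrt-of-size is monotone submodular, so items spread out evenly.
print(greedy_blocks(range(6), num_blocks=2, capacity=3,
                    block_value=lambda s: math.sqrt(len(s))))
```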
Universal Sentence Encoder for English
Title | Universal Sentence Encoder for English |
Authors | Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, Ray Kurzweil |
Abstract | We present easy-to-use TensorFlow Hub sentence embedding models with good task transfer performance. Model variants allow for trade-offs between accuracy and compute resources. We report the relationship between model complexity, resources, and transfer performance. Comparisons are made with baselines that use no transfer learning and with baselines that incorporate word-level transfer. Transfer learning using sentence-level embeddings is shown to outperform models without transfer learning and often those that use only word-level transfer. We show good transfer task performance with minimal training data and obtain encouraging results on word embedding association tests (WEAT) of model bias. |
Tasks | Multi-Task Learning, Sentence Embedding, Sentence Embeddings, Tokenization, Transfer Learning |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/D18-2029/ |
PWC | https://paperswithcode.com/paper/universal-sentence-encoder-for-english |
Repo | |
Framework | |
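Since the models ship on TensorFlow Hub, usage really is a few lines. A sketch assuming TensorFlow 2 and the tensorflow_hub package are installed; the module URL below is the published English USE variant.

```python
import numpy as np
import tensorflow_hub as hub

# Load a published USE variant (the Transformer and DAN versions trade
# accuracy against compute, as the abstract notes).
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = ["The quick brown fox.", "A fast auburn fox.", "Stock markets fell."]
vectors = embed(sentences).numpy()            # (3, 512) sentence embeddings

def cos(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(vectors[0], vectors[1]))  # paraphrases: high similarity
print(cos(vectors[0], vectors[2]))  # unrelated sentences: low similarity
```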
DropMax: Adaptive Stochastic Softmax
Title | DropMax: Adaptive Stochastic Softmax |
Authors | Hae Beom Lee, Juho Lee, Eunho Yang, Sung Ju Hwang |
Abstract | We propose DropMax, a stochastic version of the softmax classifier which, at each iteration, drops non-target classes with some probability for each instance. Specifically, we overlay binary masking variables over class output probabilities, which are learned based on the input via regularized variational inference. This stochastic regularization has the effect of building an ensemble classifier out of a combinatorial number of classifiers with different decision boundaries. Moreover, learning the dropout probabilities for non-target classes on each instance allows the classifier to focus more on classification against the most confusing classes. We validate our model on multiple public classification datasets, on which it obtains improved accuracy over the regular softmax classifier and other baselines. Further analysis of the learned dropout masks shows that our model indeed selects confusing classes more often when it performs classification. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=Sy4c-3xRW |
PDF | https://openreview.net/pdf?id=Sy4c-3xRW |
PWC | https://paperswithcode.com/paper/dropmax-adaptive-stochastic-softmax |
Repo | |
Framework | |
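The mechanism is easy to state in code: sample a Bernoulli mask over classes, force the target class to survive, and softmax over what remains. A minimal PyTorch sketch with a fixed keep probability; the paper instead learns per-instance keep probabilities via regularized variational inference.

```python
import torch
import torch.nn.functional as F

def dropmax_loss(logits, targets, keep_prob):
    """DropMax-style stochastic softmax: randomly drop non-target classes
    before the softmax, always keeping the target class. keep_prob is a
    fixed (num_classes,) tensor here, a simplification of the paper."""
    batch, num_classes = logits.shape
    mask = torch.bernoulli(keep_prob.expand(batch, num_classes))
    mask.scatter_(1, targets.unsqueeze(1), 1.0)   # never drop the true class
    masked = logits.masked_fill(mask == 0, float("-inf"))
    return F.cross_entropy(masked, targets)       # softmax over surviving classes

logits = torch.randn(4, 10)
targets = torch.tensor([0, 3, 7, 9])
print(dropmax_loss(logits, targets, keep_prob=torch.full((10,), 0.5)))
```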
Cross-corpus Native Language Identification via Statistical Embedding
Title | Cross-corpus Native Language Identification via Statistical Embedding |
Authors | Francisco Rangel, Paolo Rosso, Julian Brooke, Alexandra Uitdenbogerd |
Abstract | In this paper, we approach the task of native language identification in a realistic cross-corpus scenario where a model is trained on available data and has to predict the native language from data of a different corpus. The motivation behind this study is to investigate native language identification in the Australian academic scenario, where a majority of students come from China, Indonesia, and Arabic-speaking nations. We propose a statistical embedding representation that yields a significant improvement over common single-layer state-of-the-art approaches when identifying Chinese, Arabic, and Indonesian in a cross-corpus scenario. The proposed approach is shown to be competitive even when the data is scarce and imbalanced. |
Tasks | Language Identification, Native Language Identification |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-1605/ |
PWC | https://paperswithcode.com/paper/cross-corpus-native-language-identification |
Repo | |
Framework | |
TCAV: Relative concept importance testing with Linear Concept Activation Vectors
Title | TCAV: Relative concept importance testing with Linear Concept Activation Vectors |
Authors | Been Kim, Justin Gilmer, Martin Wattenberg, Fernanda Viégas |
Abstract | Despite neural networks' high performance, the lack of interpretability has been the main bottleneck for their safe usage in practice. In domains with high stakes (e.g., medical diagnosis), gaining insights into the network is critical for gaining trust and being adopted. One way to improve the interpretability of a neural network is to explain the importance of a particular concept (e.g., gender) in prediction. This is useful for explaining the reasoning behind the network's predictions and for revealing any biases the network may have. This work aims to provide quantitative answers to the relative importance of concepts of interest via concept activation vectors (CAVs). In particular, this framework enables non-machine-learning experts to express concepts of interest and test hypotheses using examples (e.g., a set of pictures that illustrate the concept). We show that a CAV can be learned from a relatively small set of examples. Testing with CAVs can, for example, answer whether a particular concept (e.g., gender) is more important in predicting a given class (e.g., doctor) than another set of concepts. Interpreting with CAVs does not require any retraining or modification of the network. We show that many levels of meaningful concepts are learned (e.g., color, texture, objects, a person's occupation), and we present CAV's empirical deepdream, where we maximize an activation using a set of example pictures. We show how various insights can be gained from relative importance testing with CAVs. |
Tasks | Medical Diagnosis |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=S1viikbCW |
PDF | https://openreview.net/pdf?id=S1viikbCW |
PWC | https://paperswithcode.com/paper/tcav-relative-concept-importance-testing-with |
Repo | |
Framework | |
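The two steps the abstract describes, learning a CAV and testing relative importance, reduce to a linear classifier and a sign test on directional derivatives. A toy NumPy/scikit-learn sketch in which random arrays stand in for real layer activations and class-logit gradients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_cav(concept_acts, random_acts):
    """Learn a Concept Activation Vector: the normal of a linear classifier
    separating activations of concept examples from random examples at a
    chosen layer."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(grads, cav):
    """Fraction of class examples whose class-logit gradient (w.r.t. the
    same layer's activations) points along the CAV direction."""
    return float(np.mean(grads @ cav > 0))

rng = np.random.default_rng(0)
cav = learn_cav(rng.normal(1, 1, (50, 8)),    # toy "concept" activations
                rng.normal(0, 1, (50, 8)))    # toy "random" activations
print(tcav_score(rng.normal(0.5, 1, (100, 8)), cav))
```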
Improving Neural Machine Translation by Incorporating Hierarchical Subword Features
Title | Improving Neural Machine Translation by Incorporating Hierarchical Subword Features |
Authors | Makoto Morishita, Jun Suzuki, Masaaki Nagata |
Abstract | This paper focuses on subword-based Neural Machine Translation (NMT). We hypothesize that the appropriate subword units can differ for the following three modules (layers) of the NMT model: (1) the encoder embedding layer, (2) the decoder embedding layer, and (3) the decoder output layer. We observe that the subword units of Sennrich et al. (2016) have the property that a large vocabulary is a superset of a small vocabulary, and we modify the NMT model to enable the incorporation of several different subword units in a single embedding layer. We refer to these small subword features as hierarchical subword features. To empirically investigate our hypothesis, we compare the performance of several different subword units and hierarchical subword features for both the encoder and decoder embedding layers. We confirm that incorporating hierarchical subword features in the encoder consistently improves BLEU scores on the IWSLT evaluation datasets. |
Tasks | Machine Translation |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1052/ |
PWC | https://paperswithcode.com/paper/improving-neural-machine-translation-by |
Repo | |
Framework | |
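One way to read "several different subword units in a single embedding layer" is to sum embeddings of the same position segmented at several BPE merge levels. A minimal PyTorch sketch under that reading; real segmentation (e.g. subword-nmt at different merge counts) and the paper's exact alignment across levels are assumed away here.

```python
import torch
import torch.nn as nn

class HierarchicalSubwordEmbedding(nn.Module):
    """Sum embeddings of the same token sequence segmented at several BPE
    merge levels, exploiting the abstract's observation that a large BPE
    vocabulary is a superset of a small one. Ids per level are assumed to
    arrive pre-aligned, a simplification of the paper's setup."""
    def __init__(self, vocab_sizes, dim=256):
        super().__init__()
        self.tables = nn.ModuleList(nn.Embedding(v, dim) for v in vocab_sizes)

    def forward(self, ids_per_level):        # list of (batch, seq) id tensors
        return sum(t(ids) for t, ids in zip(self.tables, ids_per_level))

emb = HierarchicalSubwordEmbedding([16000, 32000])   # e.g. 16k and 32k merges
x = [torch.randint(0, 16000, (2, 7)), torch.randint(0, 32000, (2, 7))]
print(emb(x).shape)                                  # torch.Size([2, 7, 256])
```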