Paper Group NANR 129
Rule-based vs. Neural Net Approaches to Semantic Textual Similarity. MEMD: A Diversity-Promoting Learning Framework for Short-Text Conversation. The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions. LREMap, a Song of Resources and Evaluation. Building Literary Corpora for Computational Literary Analysis - …
Rule-based vs. Neural Net Approaches to Semantic Textual Similarity
Title | Rule-based vs. Neural Net Approaches to Semantic Textual Similarity |
Authors | Linrui Zhang, Dan Moldovan |
Abstract | This paper presents a neural net approach to determining Semantic Textual Similarity (STS) using attention-based bidirectional Long Short-Term Memory networks (Bi-LSTM). To date, most traditional STS systems have been rule-based, built on top of extensive use of linguistic features and resources. In this paper, we present an end-to-end attention-based Bi-LSTM neural network system that takes only word-level features, without expensive feature engineering or the use of external resources. By comparing its performance with traditional rule-based systems on the SemEval-2012 benchmark, we assess the limitations and strengths of neural net systems relative to rule-based systems for Semantic Textual Similarity. |
Tasks | Feature Engineering, Semantic Textual Similarity, Sentence Pair Modeling |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/W18-3803/ |
PWC | https://paperswithcode.com/paper/rule-based-vs-neural-net-approaches-to |
Repo | |
Framework | |
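The architecture named in the abstract above is straightforward to sketch. Below is a minimal PyTorch version of an attention-based Bi-LSTM sentence-pair scorer, assuming standard additive attention and a [u, v, |u-v|, u*v] pair-combination layer; all module names and dimensions are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class AttnBiLSTMEncoder(nn.Module):
    """Encode a sentence as an attention-weighted sum of Bi-LSTM states."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)          # scores each time step

    def forward(self, tokens):                        # tokens: (batch, seq)
        h, _ = self.lstm(self.emb(tokens))            # (batch, seq, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)        # attention over time steps
        return (w * h).sum(dim=1)                     # (batch, 2*hidden)

class STSModel(nn.Module):
    """Regress a similarity score from two encoded sentences."""
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.enc = AttnBiLSTMEncoder(vocab_size, hidden=hidden)
        self.out = nn.Linear(8 * hidden, 1)           # [u, v, |u-v|, u*v]

    def forward(self, s1, s2):
        u, v = self.enc(s1), self.enc(s2)
        feats = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
        return self.out(feats).squeeze(-1)            # e.g. the 0-5 STS scale

model = STSModel(vocab_size=10000)
s1 = torch.randint(0, 10000, (2, 12))                 # two toy sentence pairs
s2 = torch.randint(0, 10000, (2, 12))
print(model(s1, s2).shape)                            # torch.Size([2])
```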
MEMD: A Diversity-Promoting Learning Framework for Short-Text Conversation
Title | MEMD: A Diversity-Promoting Learning Framework for Short-Text Conversation |
Authors | Meng Zou, Xihan Li, Haokun Liu, Zhihong Deng |
Abstract | Neural encoder-decoder models have been widely applied to conversational response generation, a hot research topic in recent years. However, conventional neural encoder-decoder models tend to generate commonplace responses like "I don't know" regardless of the input. In this paper, we analyze this problem from a new perspective: latent vectors. Based on this analysis, we propose an easy-to-extend learning framework named MEMD (Multi-Encoder to Multi-Decoder), in which an auxiliary encoder and an auxiliary decoder are introduced to provide necessary training guidance without resorting to extra data or complicating the network's inner structure. Experimental results demonstrate that our method effectively improves the quality of generated responses according to automatic metrics and human evaluations, yielding more diverse and smoother replies. |
Tasks | Conversational Response Generation, Short-Text Conversation |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1109/ |
PWC | https://paperswithcode.com/paper/memd-a-diversity-promoting-learning-framework |
Repo | |
Framework | |
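The abstract names the components (an auxiliary encoder and an auxiliary decoder guiding a standard encoder-decoder) but not their wiring. Below is a toy PyTorch sketch of one plausible multi-encoder/multi-decoder objective, where the auxiliary path reconstructs the query from the response and the two latent vectors are pulled together; the specific auxiliary losses are guesses for illustration, not the paper's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Enc(nn.Module):
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
    def forward(self, x):
        _, h = self.gru(self.emb(x))
        return h.squeeze(0)                       # (batch, dim) latent vector

class Dec(nn.Module):
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)
    def nll(self, z, y):                          # teacher-forced NLL of y given z
        h, _ = self.gru(self.emb(y[:, :-1]), z.unsqueeze(0))
        logits = self.out(h)
        return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               y[:, 1:].reshape(-1))

def memd_style_loss(enc, dec, aux_enc, aux_dec, query, response):
    z = enc(query)                                # main path: query -> response
    loss = dec.nll(z, response)
    z_resp = aux_enc(response)                    # auxiliary view of the response
    loss = loss + aux_dec.nll(z_resp, query)      # reconstruct query from response
    loss = loss + F.mse_loss(z, z_resp)           # align the two latent vectors
    return loss

vocab = 1000
q = torch.randint(0, vocab, (4, 9))               # toy query batch
r = torch.randint(0, vocab, (4, 7))               # toy response batch
print(memd_style_loss(Enc(vocab), Dec(vocab), Enc(vocab), Dec(vocab), q, r))
```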
The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions
Title | The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions |
Authors | Salvatore Giorgi, Daniel Preoţiuc-Pietro, Anneke Buffone, Daniel Rieman, Lyle Ungar, H. Andrew Schwartz |
Abstract | Nowcasting based on social media text promises to provide unobtrusive and near real-time predictions of community-level outcomes. These outcomes typically concern people, but the data is often aggregated without regard to users in the Twitter populations of each community. This paper describes a simple yet effective method for building community-level models using Twitter language aggregated by user. Results on four different U.S. county-level tasks, spanning demographic, health, and psychological outcomes, show large and consistent improvements in prediction accuracies (e.g. from Pearson r=.73 to .82 for median income prediction, or r=.37 to .47 for life satisfaction prediction) over the standard approach of aggregating all tweets. We make our aggregated and anonymized community-level data, derived from 37 billion tweets – over 1 billion of which were mapped to counties – available for research. |
Tasks | |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/D18-1148/ |
PWC | https://paperswithcode.com/paper/the-remarkable-benefit-of-user-level |
Repo | |
Framework | |
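The core idea (aggregate within users before aggregating to the community, so every person counts once) fits in a few lines. A toy pandas sketch contrasting the two orders of aggregation; the feature name and the numbers are made up for illustration.

```python
import pandas as pd

# Toy data: one row per tweet with a user id, county id, and a lexical feature.
tweets = pd.DataFrame({
    "county": ["A", "A", "A", "B", "B"],
    "user":   ["u1", "u1", "u2", "u3", "u3"],
    "happy":  [1.0, 0.0, 1.0, 0.0, 0.0],   # e.g. relative frequency of a word
})

# Standard approach: average features over all tweets in the county,
# so prolific users dominate the county representation.
tweet_level = tweets.groupby("county")["happy"].mean()

# User-level aggregation (the paper's method): first average within each
# user, then average users within the county, weighting people equally.
user_means = tweets.groupby(["county", "user"])["happy"].mean()
user_level = user_means.groupby(level="county").mean()

print(tweet_level)  # county A: 0.667 (u1's two tweets count twice)
print(user_level)   # county A: 0.75  (u1 and u2 count once each)
```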
LREMap, a Song of Resources and Evaluation
Title | LREMap, a Song of Resources and Evaluation |
Authors | Riccardo Del Gratta, Sara Goggi, Gabriella Pardelli, Nicoletta Calzolari |
Abstract | |
Tasks | |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1203/ |
PWC | https://paperswithcode.com/paper/lremap-a-song-of-resources-and-evaluation |
Repo | |
Framework | |
Building Literary Corpora for Computational Literary Analysis - A Prototype to Bridge the Gap between CL and DH
Title | Building Literary Corpora for Computational Literary Analysis - A Prototype to Bridge the Gap between CL and DH |
Authors | Andrew Frank, Christine Ivanovic |
Abstract | |
Tasks | |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1128/ |
PWC | https://paperswithcode.com/paper/building-literary-corpora-for-computational |
Repo | |
Framework | |
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Title | Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) |
Authors | |
Abstract | |
Tasks | |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/N18-1000/ |
PWC | https://paperswithcode.com/paper/proceedings-of-the-2018-conference-of-the-5 |
Repo | |
Framework | |
A Lexicon-Based Supervised Attention Model for Neural Sentiment Analysis
Title | A Lexicon-Based Supervised Attention Model for Neural Sentiment Analysis |
Authors | Yicheng Zou, Tao Gui, Qi Zhang, Xuanjing Huang |
Abstract | Attention mechanisms have been leveraged for sentiment classification tasks because not all words have the same importance. However, most existing attention models do not take full advantage of sentiment lexicons, which provide rich sentiment information and play a critical role in sentiment analysis. To address this, we propose a novel lexicon-based supervised attention model (LBSA), which allows a recurrent neural network to focus on sentiment content, thus generating sentiment-informative representations. Compared with general attention models, our model has better interpretability and less noise. Experimental results on three large-scale sentiment classification datasets show that the proposed method outperforms previous methods. |
Tasks | Sentiment Analysis |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1074/ |
PWC | https://paperswithcode.com/paper/a-lexicon-based-supervised-attention-model |
Repo | |
Framework | |
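The supervision the abstract describes (steering attention toward lexicon words) can be expressed as an auxiliary loss on the attention distribution. A minimal PyTorch sketch of one straightforward variant, a KL term between the model's attention weights and a distribution derived from lexicon scores; LBSA's exact form of supervision may differ.

```python
import torch
import torch.nn.functional as F

def lexicon_attention_loss(attn_weights, lexicon_scores, eps=1e-8):
    """Encourage attention to land on lexicon-flagged sentiment words.

    attn_weights:   (batch, seq) softmax output of the attention layer
    lexicon_scores: (batch, seq) non-negative per-token sentiment strength
                    looked up from a sentiment lexicon (0 for absent words)
    Returns a KL term to add to the classification loss.
    """
    target = lexicon_scores + eps
    target = target / target.sum(dim=1, keepdim=True)  # normalize to a distribution
    return F.kl_div((attn_weights + eps).log(), target, reduction="batchmean")

w = torch.softmax(torch.randn(2, 6), dim=1)       # stand-in attention weights
s = torch.tensor([[0., 2., 0., 0., 1., 0.],       # lexicon hits on a few tokens
                  [1., 0., 0., 0., 0., 0.]])
print(lexicon_attention_loss(w, s))
# total_loss = classification_loss + lambda_attn * lexicon_attention_loss(w, s)
```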
Cooperative Denoising for Distantly Supervised Relation Extraction
Title | Cooperative Denoising for Distantly Supervised Relation Extraction |
Authors | Kai Lei, Daoyuan Chen, Yaliang Li, Nan Du, Min Yang, Wei Fan, Ying Shen |
Abstract | Distantly supervised relation extraction greatly reduces the human effort needed to extract relational facts from unstructured texts. However, it suffers from the noisy labeling problem, which can degrade its performance. Meanwhile, the useful information expressed in knowledge graphs is still underutilized in state-of-the-art methods for distantly supervised relation extraction. In light of these challenges, we propose CORD, a novel COopeRative Denoising framework, which consists of two base networks leveraging a text corpus and a knowledge graph respectively, and a cooperative module involving their mutual learning via adaptive bi-directional knowledge distillation and dynamic ensembling with noise-varying instances. Experimental results on a real-world dataset demonstrate that the proposed method reduces noisy labels and achieves a substantial improvement over state-of-the-art methods. |
Tasks | Denoising, Information Retrieval, Question Answering, Relation Extraction |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1036/ |
PWC | https://paperswithcode.com/paper/cooperative-denoising-for-distantly |
Repo | |
Framework | |
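The mutual learning the abstract mentions is a form of bi-directional knowledge distillation between the two base networks. A minimal PyTorch sketch of that general recipe; CORD's adaptive weighting and dynamic ensembling are replaced by a fixed alpha here for clarity.

```python
import torch
import torch.nn.functional as F

def mutual_distillation_losses(logits_text, logits_kg, labels, temp=2.0, alpha=0.5):
    """Each network fits the labels and also matches the other's softened
    predictions (bi-directional knowledge distillation). The fixed alpha
    stands in for the paper's adaptive weighting."""
    log_p_text = F.log_softmax(logits_text / temp, dim=-1)
    log_p_kg = F.log_softmax(logits_kg / temp, dim=-1)
    ce_text = F.cross_entropy(logits_text, labels)
    ce_kg = F.cross_entropy(logits_kg, labels)
    # Detach the "teacher" side so each KD term only trains one network.
    kd_text = F.kl_div(log_p_text, log_p_kg.exp().detach(),
                       reduction="batchmean") * temp ** 2
    kd_kg = F.kl_div(log_p_kg, log_p_text.exp().detach(),
                     reduction="batchmean") * temp ** 2
    return ce_text + alpha * kd_text, ce_kg + alpha * kd_kg

loss_text, loss_kg = mutual_distillation_losses(
    torch.randn(8, 5), torch.randn(8, 5), torch.randint(0, 5, (8,)))
print(loss_text.item(), loss_kg.item())
```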
Word Affect Intensities
Title | Word Affect Intensities |
Authors | Saif Mohammad |
Abstract | |
Tasks | Emotion Recognition, Sentiment Analysis, Text Generation |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1027/ |
PWC | https://paperswithcode.com/paper/word-affect-intensities-1 |
Repo | |
Framework | |
Constrained Interacting Submodular Groupings
Title | Constrained Interacting Submodular Groupings |
Authors | Andrew Cotter, Mahdi Milani Fard, Seungil You, Maya Gupta, Jeff Bilmes |
Abstract | We introduce the problem of grouping a finite ground set into blocks, where each block is a subset of the ground set and where: (i) the blocks are individually highly valued by a submodular function (both robustly and in the average case) while satisfying block-specific matroid constraints; and (ii) block scores interact so that blocks are jointly scored highly, thus making the blocks mutually non-redundant. Submodular functions are good models of information and diversity; thus, the above can be seen as grouping the ground set into matroid-constrained blocks that are both intra- and inter-diverse. Potential applications include forming ensembles of classification/regression models, partitioning data for parallel processing, and summarization. In the non-robust case, we reduce the problem to non-monotone submodular maximization subject to multiple matroid constraints. In the mixed robust/average case, we offer a bi-criterion guarantee for a polynomial-time deterministic algorithm and a probabilistic guarantee for a randomized algorithm, as long as the involved submodular functions (including the inter-block interaction terms) are monotone. We close with a case study in which we use these algorithms to find high-quality, diverse ensembles of classifiers, showing good results. |
Tasks | |
Published | 2018-07-01 |
URL | https://icml.cc/Conferences/2018/Schedule?showEvent=2129 |
PDF | http://proceedings.mlr.press/v80/cotter18a/cotter18a.pdf |
PWC | https://paperswithcode.com/paper/constrained-interacting-submodular-groupings |
Repo | |
Framework | |
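As background for the reduction the abstract mentions, here is the plain greedy baseline for monotone submodular maximization under a simple partition-style constraint (each block capped in size); the paper's robust, interacting, and non-monotone variants go well beyond this sketch.

```python
import math

def greedy_blocks(ground, num_blocks, capacity, block_value):
    """Greedily grow blocks under a per-block cardinality cap, maximizing a
    monotone submodular block_value(set) -> float. The classic greedy gives
    a 1/2 guarantee for monotone objectives under a matroid constraint."""
    blocks = [set() for _ in range(num_blocks)]
    remaining = set(ground)
    while remaining:
        best = None  # (marginal gain, item, block index)
        for item in remaining:
            for b, blk in enumerate(blocks):
                if len(blk) >= capacity:
                    continue
                gain = block_value(blk | {item}) - block_value(blk)
                if best is None or gain > best[0]:
                    best = (gain, item, b)
        if best is None or best[0] <= 0:
            break
        _, item, b = best
        blocks[b].add(item)
        remaining.remove(item)
    return blocks

# Example: sqrt-of-size is monotone submodular, so items spread out evenly.
print(greedy_blocks(range(6), num_blocks=2, capacity=3,
                    block_value=lambda s: math.sqrt(len(s))))
```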
Universal Sentence Encoder for English
Title | Universal Sentence Encoder for English |
Authors | Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, Ray Kurzweil |
Abstract | We present easy-to-use TensorFlow Hub sentence embedding models with good task transfer performance. Model variants allow for trade-offs between accuracy and compute resources. We report the relationship between model complexity, resources, and transfer performance. Comparisons are made with baselines that use no transfer learning and with baselines that incorporate word-level transfer. Transfer learning using sentence-level embeddings is shown to outperform models without transfer learning and often those that use only word-level transfer. We show good transfer task performance with minimal training data and obtain encouraging results on word embedding association tests (WEAT) of model bias. |
Tasks | Multi-Task Learning, Sentence Embedding, Sentence Embeddings, Tokenization, Transfer Learning |
Published | 2018-11-01 |
URL | https://www.aclweb.org/anthology/D18-2029/ |
PWC | https://paperswithcode.com/paper/universal-sentence-encoder-for-english |
Repo | |
Framework | |
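Since the models ship on TensorFlow Hub, usage really is a few lines. A sketch assuming TensorFlow 2 and the tensorflow_hub package are installed; the module URL below is the published English USE variant.

```python
import numpy as np
import tensorflow_hub as hub

# Load a published USE variant (the Transformer and DAN versions trade
# accuracy against compute, as the abstract notes).
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = ["The quick brown fox.", "A fast auburn fox.", "Stock markets fell."]
vectors = embed(sentences).numpy()            # (3, 512) sentence embeddings

def cos(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(vectors[0], vectors[1]))  # paraphrases: high similarity
print(cos(vectors[0], vectors[2]))  # unrelated sentences: low similarity
```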
DropMax: Adaptive Stochastic Softmax
Title | DropMax: Adaptive Stochastic Softmax |
Authors | Hae Beom Lee, Juho Lee, Eunho Yang, Sung Ju Hwang |
Abstract | We propose DropMax, a stochastic version of the softmax classifier which, at each iteration, drops non-target classes with some probability for each instance. Specifically, we overlay binary masking variables over class output probabilities, which are learned based on the input via regularized variational inference. This stochastic regularization has the effect of building an ensemble classifier out of a combinatorial number of classifiers with different decision boundaries. Moreover, learning the dropout probabilities for non-target classes on each instance allows the classifier to focus more on classification against the most confusing classes. We validate our model on multiple public classification datasets, on which it obtains improved accuracy over the regular softmax classifier and other baselines. Further analysis of the learned dropout masks shows that our model indeed selects confusing classes more often when it performs classification. |
Tasks | |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=Sy4c-3xRW |
PDF | https://openreview.net/pdf?id=Sy4c-3xRW |
PWC | https://paperswithcode.com/paper/dropmax-adaptive-stochastic-softmax |
Repo | |
Framework | |
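The mechanism is easy to state in code: sample a Bernoulli mask over classes, force the target class to survive, and softmax over what remains. A minimal PyTorch sketch with a fixed keep probability; the paper instead learns per-instance keep probabilities via regularized variational inference.

```python
import torch
import torch.nn.functional as F

def dropmax_loss(logits, targets, keep_prob):
    """DropMax-style stochastic softmax: randomly drop non-target classes
    before the softmax, always keeping the target class. keep_prob is a
    fixed (num_classes,) tensor here, a simplification of the paper."""
    batch, num_classes = logits.shape
    mask = torch.bernoulli(keep_prob.expand(batch, num_classes))
    mask.scatter_(1, targets.unsqueeze(1), 1.0)   # never drop the true class
    masked = logits.masked_fill(mask == 0, float("-inf"))
    return F.cross_entropy(masked, targets)       # softmax over surviving classes

logits = torch.randn(4, 10)
targets = torch.tensor([0, 3, 7, 9])
print(dropmax_loss(logits, targets, keep_prob=torch.full((10,), 0.5)))
```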
Cross-corpus Native Language Identification via Statistical Embedding
Title | Cross-corpus Native Language Identification via Statistical Embedding |
Authors | Francisco Rangel, Paolo Rosso, Julian Brooke, Alexandra Uitdenbogerd |
Abstract | In this paper, we approach the task of native language identification in a realistic cross-corpus scenario where a model is trained on available data and has to predict the native language from data of a different corpus. The motivation behind this study is to investigate native language identification in the Australian academic scenario, where a majority of students come from China, Indonesia, and Arabic-speaking nations. We propose a statistical embedding representation that yields a significant improvement over common single-layer state-of-the-art approaches when identifying Chinese, Arabic, and Indonesian in a cross-corpus scenario. The proposed approach is shown to be competitive even when the data is scarce and imbalanced. |
Tasks | Language Identification, Native Language Identification |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-1605/ |
PWC | https://paperswithcode.com/paper/cross-corpus-native-language-identification |
Repo | |
Framework | |
TCAV: Relative concept importance testing with Linear Concept Activation Vectors
Title | TCAV: Relative concept importance testing with Linear Concept Activation Vectors |
Authors | Been Kim, Justin Gilmer, Martin Wattenberg, Fernanda Viégas |
Abstract | Despite neural networks' high performance, the lack of interpretability has been the main bottleneck for their safe usage in practice. In domains with high stakes (e.g., medical diagnosis), gaining insights into the network is critical for gaining trust and being adopted. One way to improve the interpretability of a neural network is to explain the importance of a particular concept (e.g., gender) in prediction. This is useful for explaining the reasoning behind the network's predictions and for revealing any biases the network may have. This work aims to provide quantitative answers to the relative importance of concepts of interest via concept activation vectors (CAVs). In particular, this framework enables non-machine-learning experts to express concepts of interest and test hypotheses using examples (e.g., a set of pictures that illustrate the concept). We show that a CAV can be learned from a relatively small set of examples. Testing with CAVs can, for example, answer whether a particular concept (e.g., gender) is more important in predicting a given class (e.g., doctor) than another set of concepts. Interpreting with CAVs does not require any retraining or modification of the network. We show that many levels of meaningful concepts are learned (e.g., color, texture, objects, a person's occupation), and we present CAV's empirical deepdream, where we maximize an activation using a set of example pictures. We show how various insights can be gained from relative importance testing with CAVs. |
Tasks | Medical Diagnosis |
Published | 2018-01-01 |
URL | https://openreview.net/forum?id=S1viikbCW |
PDF | https://openreview.net/pdf?id=S1viikbCW |
PWC | https://paperswithcode.com/paper/tcav-relative-concept-importance-testing-with |
Repo | |
Framework | |
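The two steps the abstract describes, learning a CAV and testing relative importance, reduce to a linear classifier and a sign test on directional derivatives. A toy NumPy/scikit-learn sketch in which random arrays stand in for real layer activations and class-logit gradients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_cav(concept_acts, random_acts):
    """Learn a Concept Activation Vector: the normal of a linear classifier
    separating activations of concept examples from random examples at a
    chosen layer."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]
    return v / np.linalg.norm(v)

def tcav_score(grads, cav):
    """Fraction of class examples whose class-logit gradient (w.r.t. the
    same layer's activations) points along the CAV direction."""
    return float(np.mean(grads @ cav > 0))

rng = np.random.default_rng(0)
cav = learn_cav(rng.normal(1, 1, (50, 8)),    # toy "concept" activations
                rng.normal(0, 1, (50, 8)))    # toy "random" activations
print(tcav_score(rng.normal(0.5, 1, (100, 8)), cav))
```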
Improving Neural Machine Translation by Incorporating Hierarchical Subword Features
Title | Improving Neural Machine Translation by Incorporating Hierarchical Subword Features |
Authors | Makoto Morishita, Jun Suzuki, Masaaki Nagata |
Abstract | This paper focuses on subword-based Neural Machine Translation (NMT). We hypothesize that the appropriate subword units can differ for the following three modules (layers) of the NMT model: (1) the encoder embedding layer, (2) the decoder embedding layer, and (3) the decoder output layer. We observe that the subword units of Sennrich et al. (2016) have the property that a large vocabulary is a superset of a small vocabulary, and we modify the NMT model to enable the incorporation of several different subword units in a single embedding layer. We refer to these small subword features as hierarchical subword features. To empirically investigate our hypothesis, we compare the performance of several different subword units and hierarchical subword features for both the encoder and decoder embedding layers. We confirm that incorporating hierarchical subword features in the encoder consistently improves BLEU scores on the IWSLT evaluation datasets. |
Tasks | Machine Translation |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1052/ |
PWC | https://paperswithcode.com/paper/improving-neural-machine-translation-by |
Repo | |
Framework | |
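One way to read "several different subword units in a single embedding layer" is to sum embeddings of the same position segmented at several BPE merge levels. A minimal PyTorch sketch under that reading; real segmentation (e.g. subword-nmt at different merge counts) and the paper's exact alignment across levels are assumed away here.

```python
import torch
import torch.nn as nn

class HierarchicalSubwordEmbedding(nn.Module):
    """Sum embeddings of the same token sequence segmented at several BPE
    merge levels, exploiting the abstract's observation that a large BPE
    vocabulary is a superset of a small one. Ids per level are assumed to
    arrive pre-aligned, a simplification of the paper's setup."""
    def __init__(self, vocab_sizes, dim=256):
        super().__init__()
        self.tables = nn.ModuleList(nn.Embedding(v, dim) for v in vocab_sizes)

    def forward(self, ids_per_level):        # list of (batch, seq) id tensors
        return sum(t(ids) for t, ids in zip(self.tables, ids_per_level))

emb = HierarchicalSubwordEmbedding([16000, 32000])   # e.g. 16k and 32k merges
x = [torch.randint(0, 16000, (2, 7)), torch.randint(0, 32000, (2, 7))]
print(emb(x).shape)                                  # torch.Size([2, 7, 256])
```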