Paper Group NANR 145
A working, non-trivial, topically indifferent NLG System for 17 languages. Exploiting Morphological Regularities in Distributional Word Representations. iSurvive: An Interpretable, Event-time Prediction Model for mHealth. Czech Dataset for Semantic Similarity and Relatedness. Co-reference Resolution in Tamil Text. An Empirical Bayes Approach to Opt …
A working, non-trivial, topically indifferent NLG System for 17 languages
Title | A working, non-trivial, topically indifferent NLG System for 17 languages |
Authors | Robert Weißgraeber, Andreas Madsack |
Abstract | A fully fledged, practical working application of a rule-based NLG system is presented that can create non-trivial, human-sounding narrative from structured data, in any language and for any topic. |
Tasks | Text Generation |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-3524/ |
https://www.aclweb.org/anthology/W17-3524 | |
PWC | https://paperswithcode.com/paper/a-working-non-trivial-topically-indifferent |
Repo | |
Framework | |
Exploiting Morphological Regularities in Distributional Word Representations
Title | Exploiting Morphological Regularities in Distributional Word Representations |
Authors | Arihant Gupta, Syed Sarfaraz Akhtar, Avijit Vajpayee, Arjit Srivastava, Madan Gopal Jhanwar, Manish Shrivastava |
Abstract | We present an unsupervised, language-agnostic approach for exploiting morphological regularities present in high-dimensional vector spaces. We propose a novel method for generating embeddings of words from their morphological variants using morphological transformation operators. We evaluate this approach on the MSR word analogy test set with an accuracy of 85%, which is 12% higher than the previous best-known system. |
Tasks | Chunking, Document Classification, Question Answering, Word Embeddings |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/D17-1028/ |
https://www.aclweb.org/anthology/D17-1028 | |
PWC | https://paperswithcode.com/paper/exploiting-morphological-regularities-in |
Repo | |
Framework | |
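For readers unfamiliar with word analogy test sets, the following minimal sketch shows the standard vector-offset baseline for answering "a is to b as c is to ?". It is not the transformation-operator method proposed in the paper. The function names and the two-dimensional toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset baseline."""
    # The morphological relation is approximated by the offset b - a,
    # which is then applied to c.
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    # The three query words are excluded from the candidate answers.
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# Hypothetical toy embeddings (assumption: real embeddings have hundreds
# of dimensions and are learned from a corpus).
toy = {
    "walk": [1.0, 0.0],
    "walked": [1.0, 1.0],
    "jump": [2.0, 0.1],
    "jumped": [2.0, 1.1],
    "table": [-1.0, 0.2],
}

answer = solve_analogy(toy, "walk", "walked", "jump")
```

Accuracy on an analogy test set is then the fraction of questions for which the returned word matches the expected one.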
iSurvive: An Interpretable, Event-time Prediction Model for mHealth
Title | iSurvive: An Interpretable, Event-time Prediction Model for mHealth |
Authors | Walter H. Dempsey, Alexander Moreno, Christy K. Scott, Michael L. Dennis, David H. Gustafson, Susan A. Murphy, James M. Rehg |
Abstract | An important mobile health (mHealth) task is the use of multimodal data, such as sensor streams and self-report, to construct interpretable time-to-event predictions of, for example, lapse to alcohol or illicit drug use. Interpretability of the prediction model is important for acceptance and adoption by domain scientists, enabling model outputs and parameters to inform theory and guide intervention design. Temporal latent state models are therefore attractive, and so we adopt the continuous time hidden Markov model (CT-HMM) due to its ability to describe irregular arrival times of event data. Standard CT-HMMs, however, are not specialized for predicting the time to a future event, the key variable for mHealth interventions. Also, standard emission models lack a sufficiently rich structure to describe multimodal data and incorporate domain knowledge. We present iSurvive, an extension of classical survival analysis to a CT-HMM. We present a parameter learning method for GLM emissions and survival model fitting, and present promising results on both synthetic data and an mHealth drug use dataset. |
Tasks | Survival Analysis |
Published | 2017-08-01 |
URL | https://icml.cc/Conferences/2017/Schedule?showEvent=732 |
http://proceedings.mlr.press/v70/dempsey17a/dempsey17a.pdf | |
PWC | https://paperswithcode.com/paper/isurvive-an-interpretable-event-time |
Repo | |
Framework | |
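The abstract motivates continuous-time models by their ability to handle irregular arrival times. As a minimal sketch of that idea, and not of iSurvive itself, the code below computes the transition matrix of a two-state continuous-time Markov chain for an arbitrary elapsed time, using the well-known closed form for the two-state case. The function name, the rates, and the observation gaps are hypothetical.

```python
import math


def two_state_transition_matrix(rate_01, rate_10, elapsed):
    """Transition probabilities of a two-state continuous-time Markov chain.

    rate_01 is the rate of leaving state 0 for state 1, rate_10 the reverse.
    For two states, the matrix exponential of the generator has a closed form.
    """
    total = rate_01 + rate_10
    decay = math.exp(-total * elapsed)
    p01 = rate_01 / total * (1.0 - decay)
    p10 = rate_10 / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Hypothetical irregular gaps between observations (for example, in days).
gaps = [0.5, 3.0, 0.1]
matrices = [two_state_transition_matrix(0.2, 0.1, gap) for gap in gaps]
```

Each gap yields its own transition matrix, which is why irregular sampling needs no resampling onto a fixed time grid. Models with more than two latent states would typically use a general matrix exponential instead.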
Czech Dataset for Semantic Similarity and Relatedness
Title | Czech Dataset for Semantic Similarity and Relatedness |
Authors | Miloslav Konopík, Ondřej Pražák, David Steinberger |
Abstract | This paper introduces a Czech dataset for semantic similarity and semantic relatedness. The dataset contains word pairs with hand annotated scores that indicate the semantic similarity and semantic relatedness of the words. The dataset contains 953 word pairs compiled from 9 different sources. It contains words and their contexts taken from real text corpora including extra examples when the words are ambiguous. The dataset is annotated by 5 independent annotators. The average Spearman correlation coefficient of the annotation agreement is $r = 0.81$. We provide reference evaluation experiments with several methods for computing semantic similarity and relatedness. |
Tasks | Semantic Similarity, Semantic Textual Similarity |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/R17-1053/ |
https://doi.org/10.26615/978-954-452-049-6_053 | |
PWC | https://paperswithcode.com/paper/czech-dataset-for-semantic-similarity-and |
Repo | |
Framework | |
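The agreement figure in the abstract is an average Spearman correlation. One plausible reading, assumed here and not confirmed by the abstract, is the mean over all annotator pairs. The sketch below computes that quantity with tie-aware ranks. The function names and the annotator scores are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_pairwise_spearman(annotations):
    """Average Spearman correlation over all pairs of annotators."""
    pairs = list(combinations(annotations, 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Hypothetical scores from three annotators for the same four word pairs.
scores = [
    [0.1, 0.4, 0.5, 0.9],
    [0.2, 0.3, 0.7, 0.8],
    [0.9, 0.5, 0.4, 0.1],
]
agreement = mean_pairwise_spearman(scores)
```

With five annotators, as in the dataset, the average would run over ten annotator pairs.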
Co-reference Resolution in Tamil Text
Title | Co-reference Resolution in Tamil Text |
Authors | Vijay Sundar Ram, Sobha Lalitha Devi |
Abstract | |
Tasks | |
Published | 2017-12-01 |
URL | https://www.aclweb.org/anthology/W17-7548/ |
https://www.aclweb.org/anthology/W17-7548 | |
PWC | https://paperswithcode.com/paper/co-reference-resolution-in-tamil-text |
Repo | |
Framework | |
An Empirical Bayes Approach to Optimizing Machine Learning Algorithms
Title | An Empirical Bayes Approach to Optimizing Machine Learning Algorithms |
Authors | James McInerney |
Abstract | There is rapidly growing interest in using Bayesian optimization to tune model and inference hyperparameters for machine learning algorithms that take a long time to run. For example, Spearmint is a popular software package for selecting the optimal number of layers and learning rate in neural networks. But given that there is uncertainty about which hyperparameters give the best predictive performance, and given that fitting a model for each choice of hyperparameters is costly, it is arguably wasteful to “throw away” all but the best result, as per Bayesian optimization. A related issue is the danger of overfitting the validation data when optimizing many hyperparameters. In this paper, we consider an alternative approach that uses more samples from the hyperparameter selection procedure to average over the uncertainty in model hyperparameters. The resulting approach, empirical Bayes for hyperparameter averaging (EB-Hyp) predicts held-out data better than Bayesian optimization in two experiments on latent Dirichlet allocation and deep latent Gaussian models. EB-Hyp suggests a simpler approach to evaluating and deploying machine learning algorithms that does not require a separate validation data set and hyperparameter selection procedure. |
Tasks | |
Published | 2017-12-01 |
URL | http://papers.nips.cc/paper/6864-an-empirical-bayes-approach-to-optimizing-machine-learning-algorithms |
http://papers.nips.cc/paper/6864-an-empirical-bayes-approach-to-optimizing-machine-learning-algorithms.pdf | |
PWC | https://paperswithcode.com/paper/an-empirical-bayes-approach-to-optimizing |
Repo | |
Framework | |
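The contrast between keeping only the best hyperparameter setting and averaging over several can be sketched as follows. This is a simplified illustration, not the EB-Hyp procedure from the paper: it assumes each candidate setting comes with a log-score (for example, an estimated log marginal likelihood) and a prediction, and weights the predictions by normalized exponentiated scores. All names and numbers are hypothetical.

```python
import math


def averaging_weights(log_scores):
    """Normalized weights proportional to exp(log_score), computed stably."""
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def averaged_prediction(predictions, log_scores):
    """Weighted average of the predictions from all candidate settings."""
    weights = averaging_weights(log_scores)
    return sum(w * p for w, p in zip(weights, predictions))


def best_only_prediction(predictions, log_scores):
    """Prediction of the single best-scoring setting; the rest are discarded."""
    best = max(range(len(log_scores)), key=lambda i: log_scores[i])
    return predictions[best]


# Hypothetical log-scores and predictions for three hyperparameter settings.
log_scores = [-10.0, -10.5, -14.0]
predictions = [0.80, 0.60, 0.10]

selected = best_only_prediction(predictions, log_scores)
averaged = averaged_prediction(predictions, log_scores)
```

When two settings score almost equally well, as the first two do here, the averaged prediction reflects both instead of committing to one.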
Out-of-domain FrameNet Semantic Role Labeling
Title | Out-of-domain FrameNet Semantic Role Labeling |
Authors | Silvana Hartmann, Ilia Kuznetsov, Teresa Martin, Iryna Gurevych |
Abstract | Domain dependence of NLP systems is one of the major obstacles to their application in large-scale text analysis, also restricting the applicability of FrameNet semantic role labeling (SRL) systems. Yet, current FrameNet SRL systems are still only evaluated on a single in-domain test set. For the first time, we study the domain dependence of FrameNet SRL on a wide range of benchmark sets. We create a novel test set for FrameNet SRL based on user-generated web text and find that the major bottleneck for out-of-domain FrameNet SRL is the frame identification step. To address this problem, we develop a simple, yet efficient system based on distributed word representations. Our system closely approaches the state-of-the-art in-domain while outperforming the best available frame identification system out-of-domain. We publish our system and test data for research purposes. |
Tasks | Semantic Role Labeling |
Published | 2017-04-01 |
URL | https://www.aclweb.org/anthology/E17-1045/ |
https://www.aclweb.org/anthology/E17-1045 | |
PWC | https://paperswithcode.com/paper/out-of-domain-framenet-semantic-role-labeling |
Repo | |
Framework | |
Vectors for Counterspeech on Twitter
Title | Vectors for Counterspeech on Twitter |
Authors | Lucas Wright, Derek Ruths, Kelly P Dillon, Haji Mohammad Saleem, Susan Benesch |
Abstract | A study of conversations on Twitter found that some arguments between strangers led to favorable change in discourse and even in attitudes. The authors propose that such exchanges can be usefully distinguished according to whether individuals or groups take part on each side, since the opportunity for a constructive exchange of views seems to vary accordingly. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-3009/ |
https://www.aclweb.org/anthology/W17-3009 | |
PWC | https://paperswithcode.com/paper/vectors-for-counterspeech-on-twitter |
Repo | |
Framework | |
Parsing for Grammatical Relations via Graph Merging
Title | Parsing for Grammatical Relations via Graph Merging |
Authors | Weiwei Sun, Yantao Du, Xiaojun Wan |
Abstract | This paper is concerned with building deep grammatical relation (GR) analysis using a data-driven approach. To deal with this problem, we propose graph merging, a new perspective for building flexible dependency graphs: constructing complex graphs via constructing simple subgraphs. We discuss two key problems in this perspective: (1) how to decompose a complex graph into simple subgraphs, and (2) how to combine subgraphs into a coherent complex graph. Experiments demonstrate the effectiveness of graph merging. Our parser reaches state-of-the-art performance and is significantly better than two transition-based parsers. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/K17-1005/ |
https://www.aclweb.org/anthology/K17-1005 | |
PWC | https://paperswithcode.com/paper/parsing-for-grammatical-relations-via-graph |
Repo | |
Framework | |
Rephrasing Profanity in Chinese Text
Title | Rephrasing Profanity in Chinese Text |
Authors | Hui-Po Su, Zhen-Jie Huang, Hao-Tsung Chang, Chuan-Jie Lin |
Abstract | This paper proposes a system that can detect and rephrase profanity in Chinese text. Rather than just masking detected profanity, we want to revise the input sentence by using inoffensive words while keeping the original meaning. Twenty-nine such rephrasing rules were devised after observing sentences on real-world social websites. The overall accuracy of the proposed system is 85.56%. |
Tasks | |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/W17-3003/ |
https://www.aclweb.org/anthology/W17-3003 | |
PWC | https://paperswithcode.com/paper/rephrasing-profanity-in-chinese-text |
Repo | |
Framework | |
PurdueNLP at SemEval-2017 Task 1: Predicting Semantic Textual Similarity with Paraphrase and Event Embeddings
Title | PurdueNLP at SemEval-2017 Task 1: Predicting Semantic Textual Similarity with Paraphrase and Event Embeddings |
Authors | I-Ta Lee, Mahak Goindani, Chang Li, Di Jin, Kristen Marie Johnson, Xiao Zhang, Maria Leonor Pacheco, Dan Goldwasser |
Abstract | This paper describes our proposed solution for SemEval 2017 Task 1: Semantic Textual Similarity (Daniel Cer and Specia, 2017). The task aims at measuring the degree of equivalence between sentences given in English. Performance is evaluated by computing Pearson correlation scores between the predicted scores and human judgements. Our proposed system consists of two subsystems and one regression model for predicting STS scores. The two subsystems are designed to learn Paraphrase and Event Embeddings that bring paraphrasing characteristics and sentence structures into our system. The regression model combines these embeddings to make the final predictions. The experimental results show that our system achieves a Pearson correlation score of 0.8 on this task. |
Tasks | Question Answering, Semantic Textual Similarity, Word Embeddings |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2029/ |
https://www.aclweb.org/anthology/S17-2029 | |
PWC | https://paperswithcode.com/paper/purduenlp-at-semeval-2017-task-1-predicting |
Repo | |
Framework | |
The Agreement Measure γcat a Complement to γ Focused on Categorization of a Continuum
Title | The Agreement Measure γcat a Complement to γ Focused on Categorization of a Continuum |
Authors | Yann Mathet |
Abstract | Agreement on unitizing, where several annotators freely put units of various sizes and categories on a continuum, is difficult to assess because of the simultaneous discrepancies in positioning and categorizing. The recent agreement measure γ offers an overall solution that simultaneously takes into account positions and categories. In this article, I propose the additional coefficient γcat, which complements γ by assessing the agreement on categorization of a continuum, putting aside positional discrepancies. When applied to pure categorization (with predefined units), γcat behaves the same way as the famous dedicated Krippendorff's α, even with missing values, which proves its consistency. A variation of γcat is also proposed that provides an in-depth assessment of categorizing for each individual category. The entire family of γ coefficients is implemented in free software. |
Tasks | |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/J17-3006/ |
https://www.aclweb.org/anthology/J17-3006 | |
PWC | https://paperswithcode.com/paper/the-agreement-measure-i3cat-a-complement-to |
Repo | |
Framework | |
Annotating Italian Social Media Texts in Universal Dependencies
Title | Annotating Italian Social Media Texts in Universal Dependencies |
Authors | Manuela Sanguinetti, Cristina Bosco, Alessandro Mazzei, Alberto Lavelli, Fabio Tamburini |
Abstract | |
Tasks | Opinion Mining, Sentiment Analysis |
Published | 2017-09-01 |
URL | https://www.aclweb.org/anthology/W17-6526/ |
https://www.aclweb.org/anthology/W17-6526 | |
PWC | https://paperswithcode.com/paper/annotating-italian-social-media-texts-in |
Repo | |
Framework | |
OMAM at SemEval-2017 Task 4: Evaluation of English State-of-the-Art Sentiment Analysis Models for Arabic and a New Topic-based Model
Title | OMAM at SemEval-2017 Task 4: Evaluation of English State-of-the-Art Sentiment Analysis Models for Arabic and a New Topic-based Model |
Authors | Ramy Baly, Gilbert Badaro, Ali Hamdi, Rawan Moukalled, Rita Aoun, Georges El-Khoury, Ahmad Al Sallab, Hazem Hajj, Nizar Habash, Khaled Shaban, Wassim El-Hajj |
Abstract | While sentiment analysis in English has achieved significant progress, it remains a challenging task in Arabic given the rich morphology of the language. It becomes more challenging when applied to Twitter data that comes with additional sources of noise including dialects, misspellings, grammatical mistakes, code switching and the use of non-textual objects to express sentiments. This paper describes the “OMAM” systems that we developed as part of SemEval-2017 task 4. We evaluate English state-of-the-art methods on Arabic tweets for subtask A. As for the remaining subtasks, we introduce a topic-based approach that accounts for topic specificities by predicting topics or domains of upcoming tweets, and then using this information to predict their sentiment. Results indicate that applying the English state-of-the-art method to Arabic has achieved solid results without significant enhancements. Furthermore, the topic-based method ranked 1st in subtasks C and E, and 2nd in subtask D. |
Tasks | Opinion Mining, Sentiment Analysis |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2099/ |
https://www.aclweb.org/anthology/S17-2099 | |
PWC | https://paperswithcode.com/paper/omam-at-semeval-2017-task-4-evaluation-of |
Repo | |
Framework | |
OMAM at SemEval-2017 Task 4: English Sentiment Analysis with Conditional Random Fields
Title | OMAM at SemEval-2017 Task 4: English Sentiment Analysis with Conditional Random Fields |
Authors | Chukwuyem Onyibe, Nizar Habash |
Abstract | We describe a supervised system that uses optimized Conditional Random Fields and lexical features to predict the sentiment of a tweet. The system was submitted to the English version of all subtasks in SemEval-2017 Task 4. |
Tasks | Opinion Mining, Sentiment Analysis, Stance Detection |
Published | 2017-08-01 |
URL | https://www.aclweb.org/anthology/S17-2111/ |
https://www.aclweb.org/anthology/S17-2111 | |
PWC | https://paperswithcode.com/paper/omam-at-semeval-2017-task-4-english-sentiment |
Repo | |
Framework | |