Paper Group NANR 125
Building an annotated dataset of app store reviews with Appraisal features in English and Spanish. RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. A prototype finite-state morphological analyser for Chukchi. Building a Morphological Treebank for German from a Linguistic Database. Robust Subspace Approximation in a Stream. Towards Realistic Predictors. Autonomous Sub-domain Modeling for Dialogue Policy with Hierarchical Deep Reinforcement Learning. Coding Structures and Actions with the COSTA Scheme in Medical Conversations. Biomedical Event Extraction Using Convolutional Neural Networks and Dependency Parsing. Constraining MGbank: Agreement, L-Selection and Supertagging in Minimalist Grammars. Natural Language Generation for Polysynthetic Languages: Language Teaching and Learning Software for Kanyen'kéha (Mohawk). Sentiment Classification towards Question-Answering with Hierarchical Matching Network. Analytic Expressions for Probabilistic Moments of PL-DNN With Gaussian Input. Measuring Frame Instance Relatedness. UniMelb at SemEval-2018 Task 12: Generative Implication using LSTMs, Siamese Networks and Semantic Representations with Synonym Fuzzing.
Building an annotated dataset of app store reviews with Appraisal features in English and Spanish
Title | Building an annotated dataset of app store reviews with Appraisal features in English and Spanish |
Authors | Natalia Mora, Julia Lavid-López |
Abstract | This paper describes the creation and annotation of a dataset consisting of 250 English and Spanish app store reviews from Google's Play Store with Appraisal features. Appraisal is one of the most influential linguistic frameworks for the analysis of evaluation and opinion in discourse due to its insightful descriptive features. However, it has not been extensively applied in NLP in spite of its potential for the classification of the subjective content of these reviews. We describe the dataset, the annotation scheme and guidelines, the agreement studies, the annotation results and their impact on the characterisation of this genre. |
Tasks | |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/W18-1103/ |
https://www.aclweb.org/anthology/W18-1103 | |
PWC | https://paperswithcode.com/paper/building-an-annotated-dataset-of-app-store |
Repo | |
Framework | |
RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian
Title | RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian |
Authors | Anna Rogers, Alexey Romanov, Anna Rumshisky, Svitlana Volkova, Mikhail Gronas, Alex Gribov |
Abstract | This paper presents RuSentiment, a new dataset for sentiment analysis of social media posts in Russian, and a new set of comprehensive annotation guidelines that are extensible to other languages. RuSentiment is currently the largest in its class for Russian, with 31,185 posts annotated with a Fleiss' kappa of 0.58 (3 annotations per post). To diversify the dataset, 6,950 posts were pre-selected with an active learning-style strategy. We report baseline classification results, and we also release the best-performing embeddings trained on 3.2B tokens of Russian VKontakte posts. |
Tasks | Active Learning, Sentiment Analysis, Word Embeddings |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/C18-1064/ |
https://www.aclweb.org/anthology/C18-1064 | |
PWC | https://paperswithcode.com/paper/rusentiment-an-enriched-sentiment-analysis |
Repo | |
Framework | |
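The abstract above reports agreement as a Fleiss' kappa of 0.58 with 3 annotations per post. As a reference point only, and not code from the paper, here is a minimal sketch of how Fleiss' kappa is computed when every item receives the same number of ratings; the toy label matrix is invented for illustration.

```python
import numpy as np

def fleiss_kappa(counts):
    """counts[i, j] = number of raters who assigned category j to item i.
    Every row must sum to the same number of raters (here, 3 per post)."""
    n_raters = counts.sum(axis=1)[0]
    # Observed agreement: mean proportion of agreeing rater pairs per item.
    p_i = (counts * (counts - 1)).sum(axis=1) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the overall category proportions.
    p_j = counts.sum(axis=0) / counts.sum()
    p_e = (p_j ** 2).sum()
    return (p_bar - p_e) / (1.0 - p_e)

# Toy matrix: 5 posts, 3 annotators each, labels {negative, neutral, positive}.
toy = np.array([[3, 0, 0],
                [2, 1, 0],
                [0, 3, 0],
                [1, 1, 1],
                [0, 0, 3]])
print(round(float(fleiss_kappa(toy)), 3))
```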
A prototype finite-state morphological analyser for Chukchi
Title | A prototype finite-state morphological analyser for Chukchi |
Authors | Vasilisa Andriyanets, Francis Tyers |
Abstract | In this article we describe the application of finite-state transducers to the morphological and phonological systems of Chukchi, a polysynthetic language spoken in the north of the Russian Federation. The language exhibits progressive and regressive vowel harmony, productive incorporation and extensive circumfixing. To implement the analyser we use the well-known Helsinki Finite-State Toolkit (HFST). The resulting model covers the majority of the morphological and phonological processes. A brief evaluation carried out on publicly available corpora shows that the coverage of the transducer is between 53% and 76%. An error evaluation of 100 randomly selected corpus tokens that were not covered by the analyser shows that most of the morphological processes are covered and that the majority of errors are caused by a limited stem lexicon. |
Tasks | |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/W18-4804/ |
https://www.aclweb.org/anthology/W18-4804 | |
PWC | https://paperswithcode.com/paper/a-prototype-finite-state-morphological |
Repo | |
Framework | |
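The coverage figures quoted above (between 53% and 76%) refer to naive coverage: the share of corpus tokens for which the transducer returns at least one analysis. A minimal sketch of that evaluation style is shown below; the `analyse` callable and the toy lexicon are stand-ins, not the paper's HFST transducer.

```python
def coverage(tokens, analyse):
    """Naive coverage: fraction of tokens receiving at least one analysis.

    `analyse` is assumed to map a surface token to a list of analyses
    (e.g. a thin wrapper around an HFST lookup); an empty list means the
    token is uncovered, typically because of a missing stem."""
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if analyse(t))
    return covered / len(tokens)

# Toy stand-in for the transducer: only two "stems" are in the lexicon.
toy_lexicon = {"stemA": ["stemA<n><abs><sg>"], "stemB": ["stemB<v><pres>"]}
toy_analyse = lambda token: toy_lexicon.get(token, [])

print(coverage(["stemA", "stemB", "unknown"], toy_analyse))  # 2 of 3 covered
```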
Building a Morphological Treebank for German from a Linguistic Database
Title | Building a Morphological Treebank for German from a Linguistic Database |
Authors | Petra Steiner, Josef Ruppenhofer |
Abstract | |
Tasks | Morphological Analysis |
Published | 2018-05-01 |
URL | https://www.aclweb.org/anthology/L18-1613/ |
https://www.aclweb.org/anthology/L18-1613 | |
PWC | https://paperswithcode.com/paper/building-a-morphological-treebank-for-german |
Repo | |
Framework | |
Robust Subspace Approximation in a Stream
Title | Robust Subspace Approximation in a Stream |
Authors | Roie Levin, Anish Prasad Sevekari, David Woodruff |
Abstract | We study robust subspace estimation in the streaming and distributed settings. Given a set of n data points $\{a_i\}_{i=1}^n$ in $\mathbb{R}^d$ and an integer k, we wish to find a linear subspace S of dimension k for which $\sum_i M(\mathrm{dist}(S, a_i))$ is minimized, where $\mathrm{dist}(S, x) := \min_{y \in S} \lVert x - y \rVert_2$ and $M(\cdot)$ is some loss function. When M is the identity function, S gives a subspace that is more robust to outliers than that provided by the truncated SVD. Though the problem is NP-hard, it is approximable within a $(1+\epsilon)$ factor in polynomial time when k and $\epsilon$ are constant. We give the first sublinear approximation algorithm for this problem in the turnstile streaming and arbitrary partition distributed models, achieving the same time guarantees as in the offline case. Our algorithm is the first based entirely on oblivious dimensionality reduction, and significantly simplifies prior methods for this problem, which held in neither the streaming nor distributed models. |
Tasks | Dimensionality Reduction |
Published | 2018-12-01 |
URL | http://papers.nips.cc/paper/8267-robust-subspace-approximation-in-a-stream |
http://papers.nips.cc/paper/8267-robust-subspace-approximation-in-a-stream.pdf | |
PWC | https://paperswithcode.com/paper/robust-subspace-approximation-in-a-stream |
Repo | |
Framework | |
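To make the objective above concrete, the sketch below evaluates, for a fixed candidate subspace, the robust objective $\sum_i \mathrm{dist}(S, a_i)$ (M the identity) next to the squared objective minimized by the truncated SVD. It only illustrates why the identity loss is less dominated by outliers; it is not the paper's streaming algorithm.

```python
import numpy as np

def residual_norms(A, U):
    """dist(S, a_i) = ||a_i - U U^T a_i||_2 for each row a_i of A,
    where the orthonormal columns of U span the subspace S."""
    return np.linalg.norm(A - A @ U @ U.T, axis=1)

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))
A[:3] *= 50                       # a few gross outliers

# Candidate k-dimensional subspace (k = 2) from the truncated SVD of A.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
U = Vt[:2].T

d = residual_norms(A, U)
print("robust objective  sum_i dist(S, a_i)   =", round(d.sum(), 2))
print("SVD objective     sum_i dist(S, a_i)^2 =", round((d ** 2).sum(), 2))
```

Because the squared objective grows quadratically in each residual, the few outlier rows dominate it far more than they dominate the robust sum.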
Towards Realistic Predictors
Title | Towards Realistic Predictors |
Authors | Pei Wang, Nuno Vasconcelos |
Abstract | A new class of predictors, denoted realistic predictors, is defined. These are predictors that, like humans, assess the difficulty of examples, refuse to work on those deemed too hard, but guarantee good performance on the ones they do operate on. In this paper, we consider a particular case: realistic classifiers. The central problem in realistic classification, the design of an inductive predictor of hardness scores, is considered. It is argued that this should be a predictor independent of the classifier itself, but tuned to it, and learned without explicit supervision, so as to learn from its mistakes. A new architecture is proposed to accomplish these goals by complementing the classifier with an auxiliary hardness prediction network (HP-Net). Sharing the same inputs as the classifier, the HP-Net outputs hardness scores that are fed to the classifier as loss weights. In turn, the classifier's output is fed to the HP-Net through a newly defined loss, a variant of the cross-entropy loss. The two networks are trained jointly in an adversarial way where, as the classifier learns to improve its predictions, the HP-Net refines its hardness scores. Given the learned hardness predictor, a simple implementation of realistic classifiers is proposed by rejecting examples with large scores. Experimental results not only provide evidence in support of the effectiveness of the proposed architecture and the learned hardness predictor, but also show that the realistic classifier always improves performance on the examples that it accepts to classify, performing better on these examples than an equivalent non-realistic classifier. Together, these results make it possible for realistic classifiers to guarantee good performance. |
Tasks | |
Published | 2018-09-01 |
URL | http://openaccess.thecvf.com/content_ECCV_2018/html/Pei_Wang_Towards_Realistic_Predictors_ECCV_2018_paper.html |
http://openaccess.thecvf.com/content_ECCV_2018/papers/Pei_Wang_Towards_Realistic_Predictors_ECCV_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/towards-realistic-predictors |
Repo | |
Framework | |
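Two mechanisms from the abstract above lend themselves to a tiny sketch: hardness scores used as per-example loss weights during training, and rejection of examples whose predicted hardness exceeds a threshold at test time. The code below is a heavily simplified illustration of those two ideas, not the paper's architecture or training procedure; the layer sizes and the threshold are invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Linear(128, 10)                                   # stand-in head
hp_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

x = torch.randn(32, 128)                   # toy batch of features
y = torch.randint(0, 10, (32,))

# The HP-Net shares the classifier's input and emits a hardness score per
# example, which here weights the classification loss.
hardness = torch.sigmoid(hp_net(x)).squeeze(1)
per_example_loss = F.cross_entropy(classifier(x), y, reduction="none")
weighted_loss = (hardness * per_example_loss).mean()

# A "realistic" classifier rejects examples it deems too hard.
threshold = 0.7                            # invented for this sketch
accepted = hardness < threshold
predictions = classifier(x[accepted]).argmax(dim=1)
print(f"accepted {int(accepted.sum())} of {len(x)} examples")
```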
Autonomous Sub-domain Modeling for Dialogue Policy with Hierarchical Deep Reinforcement Learning
Title | Autonomous Sub-domain Modeling for Dialogue Policy with Hierarchical Deep Reinforcement Learning |
Authors | Giovanni Yoko Kristianto, Huiwen Zhang, Bin Tong, Makoto Iwayama, Yoshiyuki Kobayashi |
Abstract | Solving composite tasks, which consist of several inherent sub-tasks, remains a challenge in the research area of dialogue. Current studies have tackled this issue by manually decomposing the composite tasks into several sub-domains. However, much human effort is inevitable. This paper proposes a dialogue framework that autonomously models meaningful sub-domains and learns the policy over them. Our experiments show that our framework outperforms the baseline without sub-domains by 11% in terms of success rate, and is competitive with the framework using manually defined sub-domains. |
Tasks | Hierarchical Reinforcement Learning |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/W18-5702/ |
https://www.aclweb.org/anthology/W18-5702 | |
PWC | https://paperswithcode.com/paper/autonomous-sub-domain-modeling-for-dialogue |
Repo | |
Framework | |
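As a generic illustration of the two-level structure described above (not the paper's model), the sketch below shows a meta-policy that selects a sub-domain and a sub-policy that selects a dialogue action within it, both as tabular epsilon-greedy choices; the sub-domain and action names are invented.

```python
import random
from collections import defaultdict

EPSILON = 0.1
meta_q = defaultdict(float)   # Q(state, sub_domain)
sub_q = defaultdict(float)    # Q((sub_domain, state), action)

SUB_DOMAINS = ["request_slots", "confirm_booking"]       # invented
ACTIONS = {"request_slots": ["ask_date", "ask_party_size"],
           "confirm_booking": ["confirm", "offer_alternative"]}

def epsilon_greedy(q_table, keys):
    if random.random() < EPSILON:
        return random.choice(keys)
    return max(keys, key=lambda k: q_table[k])

def act(state):
    # Level 1: the meta-policy hands control to one sub-domain.
    sub = epsilon_greedy(meta_q, [(state, d) for d in SUB_DOMAINS])[1]
    # Level 2: the sub-policy picks an action inside that sub-domain.
    action = epsilon_greedy(sub_q, [((sub, state), a) for a in ACTIONS[sub]])[1]
    return sub, action

print(act("user_wants_table"))
```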
Coding Structures and Actions with the COSTA Scheme in Medical Conversations
Title | Coding Structures and Actions with the COSTA Scheme in Medical Conversations |
Authors | Nan Wang, Yan Song, Fei Xia |
Abstract | This paper describes the COSTA scheme for coding structures and actions in conversation. Informed by Conversation Analysis, the scheme introduces an innovative method for marking multi-layer structural organization of conversation and a structure-informed taxonomy of actions. In addition, we create a corpus of naturally occurring medical conversations, containing 318 video-recorded and manually transcribed pediatric consultations. Based on the annotated corpus, we investigate 1) treatment decision-making process in medical conversations, and 2) effects of physician-caregiver communication behaviors on antibiotic over-prescribing. Although the COSTA annotation scheme is developed based on data from the task-specific domain of pediatric consultations, it can be easily extended to apply to more general domains and other languages. |
Tasks | Decision Making |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-2309/ |
https://www.aclweb.org/anthology/W18-2309 | |
PWC | https://paperswithcode.com/paper/coding-structures-and-actions-with-the-costa |
Repo | |
Framework | |
Biomedical Event Extraction Using Convolutional Neural Networks and Dependency Parsing
Title | Biomedical Event Extraction Using Convolutional Neural Networks and Dependency Parsing |
Authors | Jari Björne, Tapio Salakoski |
Abstract | Event and relation extraction are central tasks in biomedical text mining. Where relation extraction concerns the detection of semantic connections between pairs of entities, event extraction expands this concept with the addition of trigger words, multiple arguments and nested events, in order to more accurately model the diversity of natural language. In this work we develop a convolutional neural network that can be used for both event and relation extraction. We use a linear representation of the input text, where information is encoded with various vector space embeddings. Most notably, we encode the parse graph into this linear space using dependency path embeddings. We integrate our neural network into the open source Turku Event Extraction System (TEES) framework. Using this system, our machine learning model can be easily applied to a large set of corpora from e.g. the BioNLP, DDI Extraction and BioCreative shared tasks. We evaluate our system on 12 different event, relation and NER corpora, showing good generalizability to many tasks and achieving improved performance on several corpora. |
Tasks | Dependency Parsing, Relation Extraction, Semantic Parsing, Word Embeddings |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/W18-2311/ |
https://www.aclweb.org/anthology/W18-2311 | |
PWC | https://paperswithcode.com/paper/biomedical-event-extraction-using |
Repo | |
Framework | |
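The classifier described above runs a convolutional network over a linear representation of the text in which each token position concatenates several embeddings, notably dependency-path embeddings. The sketch below is a minimal, generic version of that input layout, not the TEES implementation; all dimensions are invented.

```python
import torch
import torch.nn as nn

class TokenCNN(nn.Module):
    """Toy classifier over a token sequence where each position concatenates
    a word embedding and a dependency-path embedding (dimensions invented)."""

    def __init__(self, word_dim=100, path_dim=25, n_filters=64, n_classes=5):
        super().__init__()
        self.conv = nn.Conv1d(word_dim + path_dim, n_filters,
                              kernel_size=3, padding=1)
        self.out = nn.Linear(n_filters, n_classes)

    def forward(self, word_emb, path_emb):
        x = torch.cat([word_emb, path_emb], dim=-1)    # (batch, seq, dim)
        x = torch.relu(self.conv(x.transpose(1, 2)))   # (batch, filters, seq)
        x = x.max(dim=2).values                        # global max pooling
        return self.out(x)

model = TokenCNN()
logits = model(torch.randn(8, 20, 100), torch.randn(8, 20, 25))
print(logits.shape)  # torch.Size([8, 5])
```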
Constraining MGbank: Agreement, L-Selection and Supertagging in Minimalist Grammars
Title | Constraining MGbank: Agreement, L-Selection and Supertagging in Minimalist Grammars |
Authors | John Torr |
Abstract | This paper reports on two strategies that have been implemented for improving the efficiency and precision of wide-coverage Minimalist Grammar (MG) parsing. The first extends the formalism presented in Torr and Stabler (2016) with a mechanism for enforcing fine-grained selectional restrictions and agreements. The second is a method for factoring computationally costly null heads out from bottom-up MG parsing; this has the additional benefit of rendering the formalism fully compatible for the first time with highly efficient Markovian supertaggers. These techniques aided in the task of generating MGbank, the first wide-coverage corpus of Minimalist Grammar derivation trees. |
Tasks | |
Published | 2018-07-01 |
URL | https://www.aclweb.org/anthology/P18-1055/ |
https://www.aclweb.org/anthology/P18-1055 | |
PWC | https://paperswithcode.com/paper/constraining-mgbank-agreement-l-selection-and |
Repo | |
Framework | |
Natural Language Generation for Polysynthetic Languages: Language Teaching and Learning Software for Kanyen'kéha (Mohawk)
Title | Natural Language Generation for Polysynthetic Languages: Language Teaching and Learning Software for Kanyen'kéha (Mohawk) |
Authors | Greg Lessard, Nathan Brinklow, Michael Levison |
Abstract | Kanyen'kéha (in English, Mohawk) is an Iroquoian language spoken primarily in Eastern Canada (Ontario, Québec). Classified as endangered, it has only a small number of speakers and very few younger native speakers. Consequently, teachers and courses, teaching materials and software are urgently needed. In the case of software, the polysynthetic nature of Kanyen'kéha means that the number of possible combinations grows exponentially and soon surpasses attempts to capture variant forms by hand. It is in this context that we describe an attempt to produce language teaching materials based on a generative approach. A natural language generation environment (ivi/Vinci) embedded in a web environment (VinciLingua) makes it possible to produce, by rule, variant forms of indefinite complexity. These may be used as models to explore, or as materials to which learners respond. Generated materials may take the form of written text, oral utterances, or images; responses may be typed on a keyboard, gestural (using a mouse) or, to a limited extent, oral. The software also provides complex orthographic, morphological and syntactic analysis of learner productions. We describe the trajectory of development of materials for a suite of four courses on Kanyen'kéha, the first of which will be taught in the fall of 2018. |
Tasks | Text Generation |
Published | 2018-08-01 |
URL | https://www.aclweb.org/anthology/W18-4805/ |
https://www.aclweb.org/anthology/W18-4805 | |
PWC | https://paperswithcode.com/paper/natural-language-generation-for-polysynthetic |
Repo | |
Framework | |
Sentiment Classification towards Question-Answering with Hierarchical Matching Network
Title | Sentiment Classification towards Question-Answering with Hierarchical Matching Network |
Authors | Chenlin Shen, Changlong Sun, Jingjing Wang, Yangyang Kang, Shoushan Li, Xiaozhong Liu, Luo Si, Min Zhang, Guodong Zhou |
Abstract | In an e-commerce environment, a user-oriented question-answering (QA) text pair can carry rich sentiment information. In this study, we propose a novel task and method to address QA sentiment analysis. In particular, we create a high-quality annotated corpus with specially designed annotation guidelines for QA-style sentiment classification. On this basis, we propose a three-stage hierarchical matching network to explore deep sentiment information in a QA text pair. First, we segment both the question and answer text into sentences and construct a number of [Q-sentence, A-sentence] units in each QA text pair. Then, by leveraging a QA bidirectional matching layer, the proposed approach can learn the matching vectors of each [Q-sentence, A-sentence] unit. Finally, we characterize the importance of the generated matching vectors via a self-matching attention layer. Experimental results, compared against a number of state-of-the-art baselines, demonstrate the effectiveness of the proposed approach for QA-style sentiment classification. |
Tasks | Opinion Mining, Question Answering, Sentiment Analysis |
Published | 2018-10-01 |
URL | https://www.aclweb.org/anthology/D18-1401/ |
https://www.aclweb.org/anthology/D18-1401 | |
PWC | https://paperswithcode.com/paper/sentiment-classification-towards-question |
Repo | |
Framework | |
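The pipeline described above has three stages: build [Q-sentence, A-sentence] units, match each unit, and aggregate the matches with self-matching attention. The sketch below is a heavily simplified, non-learned analogue of that pipeline (cosine matching and a softmax over unit scores), not the paper's network; the sentence encoder is a deterministic stand-in.

```python
import numpy as np

def sentences(text):
    return [s.strip() for s in text.split(".") if s.strip()]

def encode(sentence, dim=16):
    """Stand-in sentence encoder: deterministic pseudo-random vectors.
    The paper learns sentence representations instead."""
    rng = np.random.default_rng(sum(ord(c) for c in sentence) % 2**32)
    return rng.normal(size=dim)

def qa_match_score(question, answer):
    # Stage 1: all [Q-sentence, A-sentence] units.
    units = [(q, a) for q in sentences(question) for a in sentences(answer)]
    # Stage 2 (simplified): match each unit by cosine similarity.
    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    scores = np.array([cosine(encode(q), encode(a)) for q, a in units])
    # Stage 3 (simplified): attention-style softmax weighting of unit scores.
    weights = np.exp(scores) / np.exp(scores).sum()
    return float((weights * scores).sum())

print(qa_match_score("Is the battery life good. Does it overheat.",
                     "The battery lasts two days. It never gets hot."))
```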
Analytic Expressions for Probabilistic Moments of PL-DNN With Gaussian Input
Title | Analytic Expressions for Probabilistic Moments of PL-DNN With Gaussian Input |
Authors | Adel Bibi, Modar Alfadly, Bernard Ghanem |
Abstract | The outstanding performance of deep neural networks (DNNs), for the visual recognition task in particular, has been demonstrated on several large-scale benchmarks. This performance has immensely strengthened the line of research that aims to understand and analyze the driving reasons behind the effectiveness of these networks. One important aspect of this analysis has recently gained much attention, namely the reaction of a DNN to noisy input. This has spawned research on developing adversarial input attacks as well as training strategies that make DNNs more robust against these attacks. To this end, we derive in this paper exact analytic expressions for the first and second moments (mean and variance) of a small piecewise linear (PL) network (Affine, ReLU, Affine) subject to general Gaussian input. We experimentally show that these expressions are tight under simple linearizations of deeper PL-DNNs, especially popular architectures in the literature (e.g. LeNet and AlexNet). Extensive experiments on image classification show that these expressions can be used to study the behaviour of the output mean of the logits for each class, the interclass confusion and the pixel-level spatial noise sensitivity of the network. Moreover, we show how these expressions can be used to systematically construct targeted and non-targeted adversarial attacks. |
Tasks | Image Classification |
Published | 2018-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2018/html/Bibi_Analytic_Expressions_for_CVPR_2018_paper.html |
http://openaccess.thecvf.com/content_cvpr_2018/papers/Bibi_Analytic_Expressions_for_CVPR_2018_paper.pdf | |
PWC | https://paperswithcode.com/paper/analytic-expressions-for-probabilistic |
Repo | |
Framework | |
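As a small, self-contained reference for the kind of closed form involved (not the paper's expressions, which cover a full Affine-ReLU-Affine network under multivariate Gaussian input): for a scalar Gaussian $z \sim \mathcal{N}(\mu, \sigma^2)$, the well-known moments of $\mathrm{ReLU}(z)$ are $\mathbb{E}[\mathrm{ReLU}(z)] = \mu\,\Phi(\mu/\sigma) + \sigma\,\phi(\mu/\sigma)$ and $\mathbb{E}[\mathrm{ReLU}(z)^2] = (\mu^2 + \sigma^2)\,\Phi(\mu/\sigma) + \mu\sigma\,\phi(\mu/\sigma)$. The sketch below checks these against Monte Carlo sampling.

```python
import numpy as np
from scipy.stats import norm

def relu_moments(mu, sigma):
    """Exact mean and variance of ReLU(z) for z ~ N(mu, sigma^2)."""
    a = mu / sigma
    mean = mu * norm.cdf(a) + sigma * norm.pdf(a)
    second = (mu**2 + sigma**2) * norm.cdf(a) + mu * sigma * norm.pdf(a)
    return mean, second - mean**2

mu, sigma = 0.5, 2.0
z = np.random.default_rng(0).normal(mu, sigma, size=1_000_000)
r = np.maximum(z, 0.0)
print(relu_moments(mu, sigma))   # analytic mean and variance
print(r.mean(), r.var())         # Monte Carlo check
```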
Measuring Frame Instance Relatedness
Title | Measuring Frame Instance Relatedness |
Authors | Valerio Basile, Roque Lopez Condori, Elena Cabrio |
Abstract | Frame semantics is a well-established framework to represent the meaning of natural language in computational terms. In this work, we aim to propose a quantitative measure of relatedness between pairs of frame instances. We test our method on a dataset of sentence pairs, highlighting the correlation between our metric and human judgments of semantic similarity. Furthermore, we propose an application of our measure for clustering frame instances to extract prototypical knowledge from natural language. |
Tasks | Reading Comprehension, Semantic Similarity, Semantic Textual Similarity, Sentiment Analysis |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-2029/ |
https://www.aclweb.org/anthology/S18-2029 | |
PWC | https://paperswithcode.com/paper/measuring-frame-instance-relatedness |
Repo | |
Framework | |
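As a toy illustration of what a relatedness score between frame instances could look like (this is not the measure proposed in the paper), the sketch below represents each instance as a frame name plus its frame-element fillers and scores a pair by a weighted combination of frame identity and filler overlap; the frames and the weight are chosen for illustration.

```python
def relatedness(inst_a, inst_b, frame_weight=0.5):
    """Toy relatedness between two frame instances, each given as
    (frame_name, {frame_element: filler, ...}); the weighting is invented."""
    frame_score = 1.0 if inst_a[0] == inst_b[0] else 0.0
    fillers_a, fillers_b = set(inst_a[1].values()), set(inst_b[1].values())
    union = fillers_a | fillers_b
    filler_score = len(fillers_a & fillers_b) / len(union) if union else 0.0
    return frame_weight * frame_score + (1 - frame_weight) * filler_score

a = ("Commerce_buy", {"Buyer": "Mary", "Goods": "a car"})
b = ("Commerce_buy", {"Buyer": "John", "Goods": "a car"})
print(round(relatedness(a, b), 2))  # 0.67: same frame, one shared filler
```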
UniMelb at SemEval-2018 Task 12: Generative Implication using LSTMs, Siamese Networks and Semantic Representations with Synonym Fuzzing
Title | UniMelb at SemEval-2018 Task 12: Generative Implication using LSTMs, Siamese Networks and Semantic Representations with Synonym Fuzzing |
Authors | Anirudh Joshi, Tim Baldwin, Richard O. Sinnott, Cecile Paris |
Abstract | This paper describes a warrant classification system for SemEval 2018 Task 12, which attempts to learn semantic representations of reasons, claims and warrants. The system consists of 3 stacked LSTMs: one for the reason, one for the claim, and one shared Siamese network for the 2 candidate warrants. Our main contribution is to force the embeddings into a shared feature space using vector operations, semantic similarity classification, Siamese networks, and multi-task learning. In doing so, we learn a form of generative implication, encoding implication interrelationships between reasons, claims, and the associated correct and incorrect warrants. We further augment the limited task data by utilizing WordNet synonym "fuzzing". When applied to SemEval 2018 Task 12, our system performs well on the development data, and officially ranked 8th among 21 teams. |
Tasks | Multi-Task Learning, Semantic Similarity, Semantic Textual Similarity, Word Embeddings |
Published | 2018-06-01 |
URL | https://www.aclweb.org/anthology/S18-1190/ |
https://www.aclweb.org/anthology/S18-1190 | |
PWC | https://paperswithcode.com/paper/unimelb-at-semeval-2018-task-12-generative |
Repo | |
Framework | |
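The data augmentation step described above, WordNet synonym "fuzzing", replaces words with WordNet synonyms to enlarge the limited training set. The sketch below illustrates that generic idea with NLTK's WordNet interface; it is not the authors' implementation, and the replacement probability is invented.

```python
import random
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")

def synonym_fuzz(tokens, p=0.3, seed=0):
    """Randomly replace tokens with a WordNet synonym (any synset lemma).
    A generic sketch of synonym-based augmentation, not the paper's code."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        lemmas = {l.name().replace("_", " ")
                  for s in wordnet.synsets(tok) for l in s.lemmas()} - {tok}
        out.append(rng.choice(sorted(lemmas)) if lemmas and rng.random() < p
                   else tok)
    return out

print(synonym_fuzz("the company raised prices to increase profit".split()))
```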