Paper Group NANR 98
A Global Analysis of Emoji Usage
Title | A Global Analysis of Emoji Usage |
Authors | Nikola Ljubešić, Darja Fišer |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2610/ |
https://www.aclweb.org/anthology/W16-2610 | |
PWC | https://paperswithcode.com/paper/a-global-analysis-of-emoji-usage |
Repo | |
Framework | |
Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik
Title | Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik |
Authors | Patrick Littell, Kartik Goyal, David R. Mortensen, Alexa Little, Chris Dyer, Lori Levin |
Abstract | This paper describes our construction of named-entity recognition (NER) systems in two Western Iranian languages, Sorani Kurdish and Tajik, as a part of a pilot study of "Linguistic Rapid Response" to potential emergency humanitarian relief situations. In the absence of large annotated corpora, parallel corpora, treebanks, bilingual lexica, etc., we found the following to be effective: exploiting distributional regularities in monolingual data, projecting information across closely related languages, and utilizing human linguist judgments. We show promising results on both a four-month exercise in Sorani and a two-day exercise in Tajik, achieved with minimal annotation costs. |
Tasks | Named Entity Recognition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1095/ |
https://www.aclweb.org/anthology/C16-1095 | |
PWC | https://paperswithcode.com/paper/named-entity-recognition-for-linguistic-rapid |
Repo | |
Framework | |
CDE-IIITH at SemEval-2016 Task 12: Extraction of Temporal Information from Clinical documents using Machine Learning techniques
Title | CDE-IIITH at SemEval-2016 Task 12: Extraction of Temporal Information from Clinical documents using Machine Learning techniques |
Authors | Veera Raghavendra Chikka |
Abstract | |
Tasks | Relation Extraction, Temporal Information Extraction |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1192/ |
https://www.aclweb.org/anthology/S16-1192 | |
PWC | https://paperswithcode.com/paper/cde-iiith-at-semeval-2016-task-12-extraction |
Repo | |
Framework | |
Sentiment Analysis of Tweets in Three Indian Languages
Title | Sentiment Analysis of Tweets in Three Indian Languages |
Authors | Shanta Phani, Shibamouli Lahiri, Arindam Biswas |
Abstract | In this paper, we describe the results of sentiment analysis on tweets in three Indian languages: Bengali, Hindi, and Tamil. We used the recently released SAIL dataset (Patra et al., 2015), and obtained state-of-the-art results in all three languages. Our features are simple, robust, scalable, and language-independent. Further, we show that these simple features provide better results than more complex and language-specific features, in two separate classification tasks. Detailed feature analysis and error analysis have been reported, along with learning curves for Hindi and Bengali. |
Tasks | Opinion Mining, Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3710/ |
https://www.aclweb.org/anthology/W16-3710 | |
PWC | https://paperswithcode.com/paper/sentiment-analysis-of-tweets-in-three-indian |
Repo | |
Framework | |
A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora
Title | A Framework for Automatic Acquisition of Croatian and Serbian Verb Aspect from Corpora |
Authors | Tanja Samardžić, Maja Miličević |
Abstract | Verb aspect is a grammatical and lexical category that encodes temporal unfolding and duration of events described by verbs. It is a potentially interesting source of information for various computational tasks, but has so far not been studied in much depth from the perspective of automatic processing. Slavic languages are particularly interesting in this respect, as they encode aspect through complex and not entirely consistent lexical derivations involving prefixation and suffixation. Focusing on Croatian and Serbian, in this paper we propose a novel framework for automatic classification of their verb types into a number of fine-grained aspectual classes based on the observable morphology of verb forms. In addition, we provide a set of around 2000 verbs classified based on our framework. This set can be used for linguistic research as well as for testing automatic classification on a larger scale. With minor adjustments the approach is also applicable to other Slavic languages. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1728/ |
https://www.aclweb.org/anthology/L16-1728 | |
PWC | https://paperswithcode.com/paper/a-framework-for-automatic-acquisition-of |
Repo | |
Framework | |
Deep Alternative Neural Network: Exploring Contexts as Early as Possible for Action Recognition
Title | Deep Alternative Neural Network: Exploring Contexts as Early as Possible for Action Recognition |
Authors | Jinzhuo Wang, Wenmin Wang, Xiongtao Chen, Ronggang Wang, Wen Gao |
Abstract | Contexts are crucial for action recognition in video. Current methods often mine contexts after extracting hierarchical local features and focus on their high-order encodings. This paper instead explores contexts as early as possible and leverages their evolutions for action recognition. In particular, we introduce a novel architecture called deep alternative neural network (DANN) stacking alternative layers. Each alternative layer consists of a volumetric convolutional layer followed by a recurrent layer. The former acts as local feature learner while the latter is used to collect contexts. Compared with feed-forward neural networks, DANN learns contexts of local features from the very beginning. This setting helps to preserve hierarchical context evolutions which we show are essential to recognize similar actions. Besides, we present an adaptive method to determine the temporal size for network input based on optical flow energy, and develop a volumetric pyramid pooling layer to deal with input clips of arbitrary sizes. We demonstrate the advantages of DANN on two benchmarks HMDB51 and UCF101 and report competitive or superior results to the state-of-the-art. |
Tasks | Optical Flow Estimation, Temporal Action Localization |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6335-deep-alternative-neural-network-exploring-contexts-as-early-as-possible-for-action-recognition |
http://papers.nips.cc/paper/6335-deep-alternative-neural-network-exploring-contexts-as-early-as-possible-for-action-recognition.pdf | |
PWC | https://paperswithcode.com/paper/deep-alternative-neural-network-exploring |
Repo | |
Framework | |
Multi-source named entity typing for social media
Title | Multi-source named entity typing for social media |
Authors | Reuth Vexler, Einat Minkov |
Abstract | |
Tasks | Entity Linking, Entity Typing, Named Entity Recognition, Question Answering, Relation Extraction |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2702/ |
https://www.aclweb.org/anthology/W16-2702 | |
PWC | https://paperswithcode.com/paper/multi-source-named-entity-typing-for-social |
Repo | |
Framework | |
Intra-Sentential Subject Zero Anaphora Resolution using Multi-Column Convolutional Neural Network
Title | Intra-Sentential Subject Zero Anaphora Resolution using Multi-Column Convolutional Neural Network |
Authors | Ryu Iida, Kentaro Torisawa, Jong-Hoon Oh, Canasai Kruengkrai, Julien Kloetzer |
Abstract | |
Tasks | Machine Translation |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/D16-1132/ |
https://www.aclweb.org/anthology/D16-1132 | |
PWC | https://paperswithcode.com/paper/intra-sentential-subject-zero-anaphora |
Repo | |
Framework | |
Broad Twitter Corpus: A Diverse Named Entity Recognition Resource
Title | Broad Twitter Corpus: A Diverse Named Entity Recognition Resource |
Authors | Leon Derczynski, Kalina Bontcheva, Ian Roberts |
Abstract | One of the main obstacles, hampering method development and comparative evaluation of named entity recognition in social media, is the lack of a sizeable, diverse, high quality annotated corpus, analogous to the CoNLL'2003 news dataset. For instance, the biggest Ritter tweet corpus is only 45,000 tokens, a mere 15% the size of CoNLL'2003. Another major shortcoming is the lack of temporal, geographic, and author diversity. This paper introduces the Broad Twitter Corpus (BTC), which is not only significantly bigger, but sampled across different regions, temporal periods, and types of Twitter users. The gold-standard named entity annotations are made by a combination of NLP experts and crowd workers, which enables us to harness crowd recall while maintaining high quality. We also measure the entity drift observed in our dataset (i.e. how entity representation varies over time), and compare to newswire. The corpus is released openly, including source text and intermediate annotations. |
Tasks | Named Entity Recognition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1111/ |
https://www.aclweb.org/anthology/C16-1111 | |
PWC | https://paperswithcode.com/paper/broad-twitter-corpus-a-diverse-named-entity |
Repo | |
Framework | |
Neural Attention Model for Classification of Sentences that Support Promoting/Suppressing Relationship
Title | Neural Attention Model for Classification of Sentences that Support Promoting/Suppressing Relationship |
Authors | Yuta Koreeda, Toshihiko Yanase, Kohsuke Yanai, Misa Sato, Yoshiki Niwa |
Abstract | |
Tasks | Argument Mining, Aspect-Based Sentiment Analysis, Decision Making, Sentiment Analysis |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2809/ |
https://www.aclweb.org/anthology/W16-2809 | |
PWC | https://paperswithcode.com/paper/neural-attention-model-for-classification-of |
Repo | |
Framework | |
Unshared Task at the 3rd Workshop on Argument Mining: Perspective Based Local Agreement and Disagreement in Online Debate
Title | Unshared Task at the 3rd Workshop on Argument Mining: Perspective Based Local Agreement and Disagreement in Online Debate |
Authors | Chantal van Son, Tommaso Caselli, Antske Fokkens, Isa Maks, Roser Morante, Lora Aroyo, Piek Vossen |
Abstract | |
Tasks | Argument Mining |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2819/ |
https://www.aclweb.org/anthology/W16-2819 | |
PWC | https://paperswithcode.com/paper/unshared-task-at-the-3rd-workshop-on-argument |
Repo | |
Framework | |
Inferring Implicit Causal Relationships in Biomedical Literature
Title | Inferring Implicit Causal Relationships in Biomedical Literature |
Authors | Halil Kilicoglu |
Abstract | |
Tasks | Drug Discovery, Named Entity Recognition |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2906/ |
https://www.aclweb.org/anthology/W16-2906 | |
PWC | https://paperswithcode.com/paper/inferring-implicit-causal-relationships-in |
Repo | |
Framework | |
Collecting and Exploring Everyday Language for Predicting Psycholinguistic Properties of Words
Title | Collecting and Exploring Everyday Language for Predicting Psycholinguistic Properties of Words |
Authors | Gustavo Paetzold, Lucia Specia |
Abstract | Exploring language usage through frequency analysis in large corpora is a defining feature in most recent work in corpus and computational linguistics. From a psycholinguistic perspective, however, the corpora used in these contributions are often not representative of language usage: they are either domain-specific, limited in size, or extracted from unreliable sources. In an effort to address this limitation, we introduce SubIMDB, a corpus of everyday language spoken text we created which contains over 225 million words. The corpus was extracted from 38,102 subtitles of family, comedy and children movies and series, and is the first sizeable structured corpus of subtitles made available. Our experiments show that word frequency norms extracted from this corpus are more effective than those from well-known norms such as Kucera-Francis, HAL and SUBTLEXus in predicting various psycholinguistic properties of words, such as lexical decision times, familiarity, age of acquisition and simplicity. We also provide evidence that contradict the long-standing assumption that the ideal size for a corpus can be determined solely based on how well its word frequencies correlate with lexical decision times. |
Tasks | Text Simplification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1157/ |
https://www.aclweb.org/anthology/C16-1157 | |
PWC | https://paperswithcode.com/paper/collecting-and-exploring-everyday-language |
Repo | |
Framework | |
什麼時候「認真就輸了」?——語料庫中「認真」一詞的語意變化(Do We Lose When Being Serious? —Change in Meaning of the Word "Renzen(認真)" in Corpora)
Title | 什麼時候「認真就輸了」?——語料庫中「認真」一詞的語意變化(Do We Lose When Being Serious? —Change in Meaning of the Word "Renzen(認真)" in Corpora) |
Authors | Pei-Yi Chen, Siaw-Fong Chung |
Abstract | |
Tasks | |
Published | 2016-10-01 |
URL | https://www.aclweb.org/anthology/O16-1008/ |
https://www.aclweb.org/anthology/O16-1008 | |
PWC | https://paperswithcode.com/paper/aeo14aaeacae14-aoai14aaeaaoa-aeacaa-eceaeado |
Repo | |
Framework | |
Automatic Biomedical Term Polysemy Detection
Title | Automatic Biomedical Term Polysemy Detection |
Authors | Juan Antonio Lossio-Ventura, Clement Jonquet, Mathieu Roche, Maguelonne Teisseire |
Abstract | Polysemy is the capacity for a word to have multiple meanings. Polysemy detection is a first step for Word Sense Induction (WSI), which allows to find different meanings for a term. The polysemy detection is also important for information extraction (IE) systems. In addition, the polysemy detection is important for building/enriching terminologies and ontologies. In this paper, we present a novel approach to detect if a biomedical term is polysemic, with the long term goal of enriching biomedical ontologies. This approach is based on the extraction of new features. In this context we propose to extract features following two manners: (i) extracted directly from the text dataset, and (ii) from an induced graph. Our method obtains an Accuracy and F-Measure of 0.978. |
Tasks | Word Sense Induction |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1266/ |
https://www.aclweb.org/anthology/L16-1266 | |
PWC | https://paperswithcode.com/paper/automatic-biomedical-term-polysemy-detection |
Repo | |
Framework | |