Paper Group NANR 182
Detecting Sentence Boundaries in Sanskrit Texts. Effects of Semantic Relatedness between Setups and Punchlines in Twitter Hashtag Games. beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data. Automatically Extracting Topical Components for a Response-to-Text Writing Assessment. Vocal Pathologies Detection and Mispronounced Phonemes …
Detecting Sentence Boundaries in Sanskrit Texts
Title | Detecting Sentence Boundaries in Sanskrit Texts |
Authors | Oliver Hellwig |
Abstract | The paper applies a deep recurrent neural network to the task of sentence boundary detection in Sanskrit, an important, yet underresourced ancient Indian language. The deep learning approach improves the F scores set by a metrical baseline and by a Conditional Random Field classifier by more than 10%. |
Tasks | Boundary Detection |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1028/ |
https://www.aclweb.org/anthology/C16-1028 | |
PWC | https://paperswithcode.com/paper/detecting-sentence-boundaries-in-sanskrit |
Repo | |
Framework | |
Effects of Semantic Relatedness between Setups and Punchlines in Twitter Hashtag Games
Title | Effects of Semantic Relatedness between Setups and Punchlines in Twitter Hashtag Games |
Authors | Andrew Cattle, Xiaojuan Ma |
Abstract | This paper explores humour recognition for Twitter-based hashtag games. Given their popularity, frequency, and relatively formulaic nature, these games make a good target for computational humour research and can leverage Twitter likes and retweets as humour judgments. In this work, we use pair-wise relative humour judgments to examine several measures of semantic relatedness between setups and punchlines on a hashtag game corpus we collected and annotated. Results show that perplexity, Normalized Google Distance, and free-word association-based features are all useful in identifying "funnier" hashtag game responses. In fact, we provide empirical evidence that funnier punchlines tend to be more obscure, although more obscure punchlines are not necessarily rated funnier. Furthermore, the asymmetric nature of free-word association features allows us to see that while punchlines should be harder to predict given a setup, they should also be relatively easy to understand in context. |
Tasks | Language Acquisition, Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4308/ |
https://www.aclweb.org/anthology/W16-4308 | |
PWC | https://paperswithcode.com/paper/effects-of-semantic-relatedness-between |
Repo | |
Framework | |
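The abstract above names Normalized Google Distance as one of its relatedness measures. As a rough illustration only, the standard definition of that distance can be sketched in Python as below. The function name, the argument names, and the choice to return infinity when any count is zero are assumptions made for this sketch; the abstract does not say where the paper's counts come from or how the feature is built.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from hit counts (hypothetical helper, not the paper's code).

    fx, fy: number of documents containing term x and term y respectively
    fxy:    number of documents containing both terms
    n:      total number of documents indexed
    """
    # Assumed convention for this sketch: terms that never occur, or never
    # co-occur, are treated as infinitely distant.
    if min(fx, fy, fxy) <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

Smaller values mean the two terms co-occur more often relative to how often each occurs on its own; a value of 0 means they always appear together.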
beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data
Title | beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data |
Authors | Valentina Zantedeschi, Rémi Emonet, Marc Sebban |
Abstract | During the past few years, the machine learning community has paid attention to developing new methods for learning from weakly labeled data. This field covers different settings like semi-supervised learning, learning with label proportions, multi-instance learning, noise-tolerant learning, etc. This paper presents a generic framework to deal with these weakly labeled scenarios. We introduce the beta-risk as a generalized formulation of the standard empirical risk based on surrogate margin-based loss functions. This risk allows us to express the reliability of the labels and to derive different kinds of learning algorithms. We specifically focus on SVMs and propose a soft margin beta-svm algorithm which behaves better than the state of the art. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6156-beta-risk-a-new-surrogate-risk-for-learning-from-weakly-labeled-data |
http://papers.nips.cc/paper/6156-beta-risk-a-new-surrogate-risk-for-learning-from-weakly-labeled-data.pdf | |
PWC | https://paperswithcode.com/paper/beta-risk-a-new-surrogate-risk-for-learning |
Repo | |
Framework | |
Automatically Extracting Topical Components for a Response-to-Text Writing Assessment
Title | Automatically Extracting Topical Components for a Response-to-Text Writing Assessment |
Authors | Zahra Rahimi, Diane Litman |
Abstract | |
Tasks | Semantic Textual Similarity |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/W16-0532/ |
https://www.aclweb.org/anthology/W16-0532 | |
PWC | https://paperswithcode.com/paper/automatically-extracting-topical-components |
Repo | |
Framework | |
Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech
Title | Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech |
Authors | Naim Terbeh, Mounir Zrigui |
Abstract | We propose in this work a novel acoustic phonetic study for Arabic people suffering from language disabilities and non-native learners of Arabic language to classify Arabic continuous speech to pathological or healthy and to identify phonemes that pose pronunciation problems (case of pathological speeches). The main idea can be summarized in comparing between the phonetic model reference to Arabic spoken language and that proper to concerned speaker. For this task, we use techniques of automatic speech processing like forced alignment and artificial neural network (ANN) (Basheer, 2000). Based on a test corpus containing 100 speech sequences, recorded by different speakers (healthy/pathological speeches and native/foreign speakers), we attain 97% as classification rate. Algorithms used in identifying phonemes that pose pronunciation problems show high efficiency: we attain an identification rate of 100%. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1334/ |
https://www.aclweb.org/anthology/L16-1334 | |
PWC | https://paperswithcode.com/paper/vocal-pathologies-detection-and-mispronounced |
Repo | |
Framework | |
Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian
Title | Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian |
Authors | Chenchen Ding, Masao Utiyama, Eiichiro Sumita |
Abstract | This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation on raw parallel data from Asian Language Treebank. The cross-lingual similarity is investigated and demonstrated on metrics of correspondence and order of tokens, based on several standard statistical machine translation techniques. The similarity shown in this study suggests a possibility on harmonious annotation and processing of the language pairs in future development. |
Tasks | Machine Translation, Word Alignment |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4614/ |
https://www.aclweb.org/anthology/W16-4614 | |
PWC | https://paperswithcode.com/paper/similar-southeast-asian-languages-corpus |
Repo | |
Framework | |
Building a dictionary of lexical variants for phenotype descriptors
Title | Building a dictionary of lexical variants for phenotype descriptors |
Authors | Simon Kocbek, Tudor Groza |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-2925/ |
https://www.aclweb.org/anthology/W16-2925 | |
PWC | https://paperswithcode.com/paper/building-a-dictionary-of-lexical-variants-for |
Repo | |
Framework | |
Optimization for Statistical Machine Translation: A Survey
Title | Optimization for Statistical Machine Translation: A Survey |
Authors | Graham Neubig, Taro Watanabe |
Abstract | |
Tasks | Machine Translation |
Published | 2016-03-01 |
URL | https://www.aclweb.org/anthology/J16-1001/ |
https://www.aclweb.org/anthology/J16-1001 | |
PWC | https://paperswithcode.com/paper/optimization-for-statistical-machine |
Repo | |
Framework | |
Online Learning for Statistical Machine Translation
Title | Online Learning for Statistical Machine Translation |
Authors | Daniel Ortiz-Martínez |
Abstract | |
Tasks | Machine Translation |
Published | 2016-03-01 |
URL | https://www.aclweb.org/anthology/J16-1004/ |
https://www.aclweb.org/anthology/J16-1004 | |
PWC | https://paperswithcode.com/paper/online-learning-for-statistical-machine |
Repo | |
Framework | |
Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?
Title | Polarity Lexicon Building: to what Extent Is the Manual Effort Worth? |
Authors | Iñaki San Vicente, Xabier Saralegi |
Abstract | Polarity lexicons are a basic resource for analyzing the sentiments and opinions expressed in texts in an automated way. This paper explores three methods to construct polarity lexicons: translating existing lexicons from other languages, extracting polarity lexicons from corpora, and annotating sentiments Lexical Knowledge Bases. Each of these methods requires a different degree of human effort. We evaluate how much manual effort is needed and to what extent that effort pays in terms of performance improvement. Experiment setup includes generating lexicons for Basque, and evaluating them against gold standard datasets in different domains. Results show that extracting polarity lexicons from corpora is the best solution for achieving a good performance with reasonable human effort. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1149/ |
https://www.aclweb.org/anthology/L16-1149 | |
PWC | https://paperswithcode.com/paper/polarity-lexicon-building-to-what-extent-is |
Repo | |
Framework | |
When to Plummet and When to Soar: Corpus Based Verb Selection for Natural Language Generation
Title | When to Plummet and When to Soar: Corpus Based Verb Selection for Natural Language Generation |
Authors | Charese Smiley, Vassilis Plachouras, Frank Schilder, Hiroko Bretz, Jochen Leidner, Dezhao Song |
Abstract | |
Tasks | Text Generation |
Published | 2016-09-01 |
URL | https://www.aclweb.org/anthology/W16-6606/ |
https://www.aclweb.org/anthology/W16-6606 | |
PWC | https://paperswithcode.com/paper/when-to-plummet-and-when-to-soar-corpus-based |
Repo | |
Framework | |
Improving PAC Exploration Using the Median Of Means
Title | Improving PAC Exploration Using the Median Of Means |
Authors | Jason Pazis, Ronald E. Parr, Jonathan P. How |
Abstract | We present the first application of the median of means in a PAC exploration algorithm for MDPs. Using the median of means allows us to significantly reduce the dependence of our bounds on the range of values that the value function can take, while introducing a dependence on the (potentially much smaller) variance of the Bellman operator. Additionally, our algorithm is the first algorithm with PAC bounds that can be applied to MDPs with unbounded rewards. |
Tasks | |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6577-improving-pac-exploration-using-the-median-of-means |
http://papers.nips.cc/paper/6577-improving-pac-exploration-using-the-median-of-means.pdf | |
PWC | https://paperswithcode.com/paper/improving-pac-exploration-using-the-median-of |
Repo | |
Framework | |
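For readers unfamiliar with the estimator named in the abstract above, here is a minimal sketch of the generic median-of-means estimator in Python. It illustrates only the basic statistical idea, not the paper's PAC exploration algorithm. The function name, the `num_groups` and `seed` parameters, and the round-robin grouping are assumptions made for this sketch.

```python
import random
import statistics


def median_of_means(samples, num_groups, seed=0):
    """Estimate a mean as the median of per-group sample means.

    Hypothetical helper for illustration: samples are shuffled, dealt into
    num_groups groups round-robin, and the median of the group means is
    returned.
    """
    if num_groups < 1 or num_groups > len(samples):
        raise ValueError("num_groups must be between 1 and len(samples)")
    shuffled = list(samples)
    # A fixed seed keeps the grouping reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    groups = [shuffled[i::num_groups] for i in range(num_groups)]
    return statistics.median(statistics.mean(group) for group in groups)
```

With a single group the estimator reduces to the ordinary sample mean. With several groups, one extreme sample can distort at most one group mean, so the median is largely unaffected by it.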
Towards Automatically Classifying Depressive Symptoms from Twitter Data for Population Health
Title | Towards Automatically Classifying Depressive Symptoms from Twitter Data for Population Health |
Authors | Danielle L. Mowery, Albert Park, Craig Bryan, Mike Conway |
Abstract | Major depressive disorder, a debilitating and burdensome disease experienced by individuals worldwide, can be defined by several depressive symptoms (e.g., anhedonia (inability to feel pleasure), depressed mood, difficulty concentrating, etc.). Individuals often discuss their experiences with depression symptoms on public social media platforms like Twitter, providing a potentially useful data source for monitoring population-level mental health risk factors. In a step towards developing an automated method to estimate the prevalence of symptoms associated with major depressive disorder over time in the United States using Twitter, we developed classifiers for discerning whether a Twitter tweet represents no evidence of depression or evidence of depression. If there was evidence of depression, we then classified whether the tweet contained a depressive symptom and if so, which of three subtypes: depressed mood, disturbed sleep, or fatigue or loss of energy. We observed that the most accurate classifiers could predict classes with high-to-moderate F1-score performances for no evidence of depression (85), evidence of depression (52), and depressive symptoms (49). We report moderate F1-scores for depressive symptoms ranging from 75 (fatigue or loss of energy) to 43 (disturbed sleep) to 35 (depressed mood). Our work demonstrates baseline approaches for automatically encoding Twitter data with granular depressive symptoms associated with major depressive disorder. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4320/ |
https://www.aclweb.org/anthology/W16-4320 | |
PWC | https://paperswithcode.com/paper/towards-automatically-classifying-depressive |
Repo | |
Framework | |
Sampling for Bayesian Program Learning
Title | Sampling for Bayesian Program Learning |
Authors | Kevin Ellis, Armando Solar-Lezama, Josh Tenenbaum |
Abstract | Towards learning programs from data, we introduce the problem of sampling programs from posterior distributions conditioned on that data. Within this setting, we propose an algorithm that uses a symbolic solver to efficiently sample programs. The proposal combines constraint-based program synthesis with sampling via random parity constraints. We give theoretical guarantees on how well the samples approximate the true posterior, and have empirical results showing the algorithm is efficient in practice, evaluating our approach on 22 program learning problems in the domains of text editing and computer-aided programming. |
Tasks | Program Synthesis |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6082-sampling-for-bayesian-program-learning |
http://papers.nips.cc/paper/6082-sampling-for-bayesian-program-learning.pdf | |
PWC | https://paperswithcode.com/paper/sampling-for-bayesian-program-learning |
Repo | |
Framework | |
South African National Centre for Digital Language Resources
Title | South African National Centre for Digital Language Resources |
Authors | Justus Roux |
Abstract | This presentation introduces the imminent establishment of a new language resource infrastructure focusing on languages spoken in Southern Africa, with an eventual aim to become a hub for digital language resources within Sub-Saharan Africa. The Constitution of South Africa makes provision for 11 official languages all with equal status. The current language Resource Management Agency will be merged with the new Centre, which will have a wider focus than that of data acquisition, management and distribution. The Centre will entertain two main programs: Digitisation and Digital Humanities. The digitisation program will focus on the systematic digitisation of relevant text, speech and multi-modal data across the official languages. Relevancy will be determined by a Scientific Advisory Board. This will take place on a continuous basis through specified projects allocated to national members of the Centre, as well as through open-calls aimed at the academic as well as local communities. The digital resources will be managed and distributed through a dedicated web-based portal. The development of the Digital Humanities program will entail extensive academic support for projects implementing digital language based data. The Centre will function as an enabling research infrastructure primarily supported by national government and hosted by the North-West University. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1391/ |
https://www.aclweb.org/anthology/L16-1391 | |
PWC | https://paperswithcode.com/paper/south-african-national-centre-for-digital |
Repo | |
Framework | |