May 4, 2019

1822 words 9 mins read

Paper Group NANR 182



Detecting Sentence Boundaries in Sanskrit Texts

Title Detecting Sentence Boundaries in Sanskrit Texts
Authors Oliver Hellwig
Abstract The paper applies a deep recurrent neural network to the task of sentence boundary detection in Sanskrit, an important, yet under-resourced ancient Indian language. The deep learning approach improves the F-scores set by a metrical baseline and by a Conditional Random Field classifier by more than 10%.
Tasks Boundary Detection
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-1028/
PDF https://www.aclweb.org/anthology/C16-1028
PWC https://paperswithcode.com/paper/detecting-sentence-boundaries-in-sanskrit
Repo
Framework

Effects of Semantic Relatedness between Setups and Punchlines in Twitter Hashtag Games

Title Effects of Semantic Relatedness between Setups and Punchlines in Twitter Hashtag Games
Authors Andrew Cattle, Xiaojuan Ma
Abstract This paper explores humour recognition for Twitter-based hashtag games. Given their popularity, frequency, and relatively formulaic nature, these games make a good target for computational humour research and can leverage Twitter likes and retweets as humour judgments. In this work, we use pair-wise relative humour judgments to examine several measures of semantic relatedness between setups and punchlines on a hashtag game corpus we collected and annotated. Results show that perplexity, Normalized Google Distance, and free-word association-based features are all useful in identifying "funnier" hashtag game responses. In fact, we provide empirical evidence that funnier punchlines tend to be more obscure, although more obscure punchlines are not necessarily rated funnier. Furthermore, the asymmetric nature of free-word association features allows us to see that while punchlines should be harder to predict given a setup, they should also be relatively easy to understand in context.
Tasks Language Acquisition, Sentiment Analysis
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4308/
PDF https://www.aclweb.org/anthology/W16-4308
PWC https://paperswithcode.com/paper/effects-of-semantic-relatedness-between
Repo
Framework
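The Normalized Google Distance feature mentioned above has a well-known closed form due to Cilibrasi and Vitányi: the distance between two terms is derived from how often each occurs alone versus together in an indexed collection. A minimal sketch of computing it from hit counts (the counts and collection size below are illustrative, not from the paper):

```python
import math

def normalized_google_distance(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance from hit counts.

    fx, fy: counts for terms x and y alone; fxy: joint count;
    n: total number of indexed pages. Smaller values mean the
    terms are more closely related.
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Two terms that always co-occur have distance 0; the distance
# grows as their joint count shrinks relative to their solo counts.
d_close = normalized_google_distance(1000, 1000, 1000, 10**6)   # 0.0
d_far = normalized_google_distance(1000, 1000, 10, 10**6)
```

In the paper's setting, a *larger* distance between setup and punchline terms would signal a more obscure (and often funnier) response.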

beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data

Title beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data
Authors Valentina Zantedeschi, Rémi Emonet, Marc Sebban
Abstract During the past few years, the machine learning community has paid attention to developing new methods for learning from weakly labeled data. This field covers different settings like semi-supervised learning, learning with label proportions, multi-instance learning, noise-tolerant learning, etc. This paper presents a generic framework to deal with these weakly labeled scenarios. We introduce the beta-risk as a generalized formulation of the standard empirical risk based on surrogate margin-based loss functions. This risk allows us to express the reliability on the labels and to derive different kinds of learning algorithms. We specifically focus on SVMs and propose a soft margin beta-svm algorithm which behaves better than the state of the art.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6156-beta-risk-a-new-surrogate-risk-for-learning-from-weakly-labeled-data
PDF http://papers.nips.cc/paper/6156-beta-risk-a-new-surrogate-risk-for-learning-from-weakly-labeled-data.pdf
PWC https://paperswithcode.com/paper/beta-risk-a-new-surrogate-risk-for-learning
Repo
Framework
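As a rough illustration of the idea, the beta-risk can be read as an empirical risk in which each example carries per-label reliability weights. The sketch below is one plausible instantiation with the hinge loss in the binary case, not the authors' exact formulation; the weighting scheme is an assumption for illustration:

```python
from typing import Callable, Sequence

def hinge(margin: float) -> float:
    """Standard margin-based surrogate loss."""
    return max(0.0, 1.0 - margin)

def beta_risk(scores: Sequence[float],
              betas: Sequence[dict],
              loss: Callable[[float], float] = hinge) -> float:
    """Weakly supervised risk: each example i carries weights
    beta[k] expressing how much we trust candidate label k in
    {-1, +1}. With beta concentrated on the true label this
    reduces to the ordinary empirical surrogate risk."""
    total = 0.0
    for f_x, beta in zip(scores, betas):
        total += sum(b * loss(k * f_x) for k, b in beta.items())
    return total / len(scores)
```

With fully reliable labels (`beta = {+1: 1.0}`) the expression collapses to the usual hinge risk; spreading mass across both labels encodes label uncertainty directly in the objective.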

Automatically Extracting Topical Components for a Response-to-Text Writing Assessment

Title Automatically Extracting Topical Components for a Response-to-Text Writing Assessment
Authors Zahra Rahimi, Diane Litman
Abstract
Tasks Semantic Textual Similarity
Published 2016-06-01
URL https://www.aclweb.org/anthology/W16-0532/
PDF https://www.aclweb.org/anthology/W16-0532
PWC https://paperswithcode.com/paper/automatically-extracting-topical-components
Repo
Framework

Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech

Title Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech
Authors Naim Terbeh, Mounir Zrigui
Abstract We propose in this work a novel acoustic-phonetic study for Arabic people suffering from language disabilities and non-native learners of Arabic, to classify Arabic continuous speech as pathological or healthy and to identify phonemes that pose pronunciation problems (in the case of pathological speech). The main idea is to compare the reference phonetic model of spoken Arabic with that of the speaker concerned. For this task, we use techniques of automatic speech processing like forced alignment and artificial neural networks (ANN) (Basheer, 2000). Based on a test corpus containing 100 speech sequences, recorded by different speakers (healthy/pathological speech and native/foreign speakers), we attain a classification rate of 97%. Algorithms used in identifying phonemes that pose pronunciation problems show high efficiency: we attain an identification rate of 100%.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1334/
PDF https://www.aclweb.org/anthology/L16-1334
PWC https://paperswithcode.com/paper/vocal-pathologies-detection-and-mispronounced
Repo
Framework
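The paper's pipeline relies on forced alignment and an ANN; the sketch below only illustrates the final comparison step, with `difflib` standing in for a real aligner and the phoneme sequences being hypothetical examples:

```python
from difflib import SequenceMatcher

def mispronounced_phonemes(reference: list, produced: list) -> list:
    """Align the reference phoneme sequence against the sequence
    decoded from the speaker and collect the reference phonemes
    that were substituted or deleted."""
    sm = SequenceMatcher(a=reference, b=produced, autojunk=False)
    problems = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op in ("replace", "delete"):
            problems.extend(reference[i1:i2])
    return problems

# Hypothetical example: the speaker realises /l/ as /r/.
mispronounced_phonemes(["s", "a", "l", "a", "m"],
                       ["s", "a", "r", "a", "m"])   # ["l"]
```

In practice the `produced` sequence would come from forced alignment of the speaker's audio against an acoustic model, and the per-phoneme error counts would feed the classifier.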

Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian

Title Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian
Authors Chenchen Ding, Masao Utiyama, Eiichiro Sumita
Abstract This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation of raw parallel data from the Asian Language Treebank. The cross-lingual similarity is investigated and demonstrated with metrics of token correspondence and token order, based on several standard statistical machine translation techniques. The similarity shown in this study suggests the possibility of harmonised annotation and processing of these language pairs in future development.
Tasks Machine Translation, Word Alignment
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4614/
PDF https://www.aclweb.org/anthology/W16-4614
PWC https://paperswithcode.com/paper/similar-southeast-asian-languages-corpus
Repo
Framework
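One standard way to quantify "order of tokens" across a word alignment is Kendall's tau over the aligned target positions; the metric choice here is illustrative, not necessarily the exact one used in the paper:

```python
def kendall_tau(alignment: list) -> float:
    """Kendall's tau over a word alignment, given as the target
    position of each source token (assumed one-to-one). 1.0 means
    the target preserves the source order exactly; -1.0 means it
    is fully reversed."""
    n = len(alignment)
    pairs = n * (n - 1) // 2
    concordant = sum(
        1 for i in range(n) for j in range(i + 1, n)
        if alignment[i] < alignment[j]
    )
    return 2.0 * concordant / pairs - 1.0

kendall_tau([0, 1, 2, 3])   # 1.0: monotone alignment
kendall_tau([3, 2, 1, 0])   # -1.0: fully inverted
```

For closely related pairs like Thai-Laotian, alignments tend to be close to monotone, so tau stays near 1.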

Building a dictionary of lexical variants for phenotype descriptors

Title Building a dictionary of lexical variants for phenotype descriptors
Authors Simon Kocbek, Tudor Groza
Abstract
Tasks
Published 2016-08-01
URL https://www.aclweb.org/anthology/W16-2925/
PDF https://www.aclweb.org/anthology/W16-2925
PWC https://paperswithcode.com/paper/building-a-dictionary-of-lexical-variants-for
Repo
Framework

Optimization for Statistical Machine Translation: A Survey

Title Optimization for Statistical Machine Translation: A Survey
Authors Graham Neubig, Taro Watanabe
Abstract
Tasks Machine Translation
Published 2016-03-01
URL https://www.aclweb.org/anthology/J16-1001/
PDF https://www.aclweb.org/anthology/J16-1001
PWC https://paperswithcode.com/paper/optimization-for-statistical-machine
Repo
Framework

Online Learning for Statistical Machine Translation

Title Online Learning for Statistical Machine Translation
Authors Daniel Ortiz-Martínez
Abstract
Tasks Machine Translation
Published 2016-03-01
URL https://www.aclweb.org/anthology/J16-1004/
PDF https://www.aclweb.org/anthology/J16-1004
PWC https://paperswithcode.com/paper/online-learning-for-statistical-machine
Repo
Framework

Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?

Title Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?
Authors Iñaki San Vicente, Xabier Saralegi
Abstract Polarity lexicons are a basic resource for analyzing the sentiments and opinions expressed in texts in an automated way. This paper explores three methods to construct polarity lexicons: translating existing lexicons from other languages, extracting polarity lexicons from corpora, and annotating sentiments in Lexical Knowledge Bases. Each of these methods requires a different degree of human effort. We evaluate how much manual effort is needed and to what extent that effort pays off in terms of performance improvement. The experimental setup includes generating lexicons for Basque and evaluating them against gold standard datasets in different domains. Results show that extracting polarity lexicons from corpora is the best solution for achieving good performance with reasonable human effort.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1149/
PDF https://www.aclweb.org/anthology/L16-1149
PWC https://paperswithcode.com/paper/polarity-lexicon-building-to-what-extent-is
Repo
Framework
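A common recipe for the corpus-based method the abstract favours is to score candidate words by their co-occurrence with a small set of seed words of known polarity. The sketch below is a minimal, hypothetical version of that recipe (the seed sets, smoothing, and the Basque example words are illustrative, not taken from the paper):

```python
import math
from collections import Counter

def polarity_from_corpus(sentences, pos_seeds, neg_seeds):
    """Score each word by how often it co-occurs (within a
    sentence) with positive vs. negative seed words. Positive
    scores suggest positive polarity, negative scores the
    opposite; add-one smoothing avoids division by zero."""
    pos_co, neg_co, freq = Counter(), Counter(), Counter()
    for sent in sentences:
        words = set(sent)
        for w in words:
            freq[w] += 1
            if words & pos_seeds - {w}:
                pos_co[w] += 1
            if words & neg_seeds - {w}:
                neg_co[w] += 1
    return {w: math.log((pos_co[w] + 1) / (neg_co[w] + 1))
            for w in freq}

# Toy corpus; "ona"/"txarra" are Basque for good/bad, used here
# purely as illustrative tokens.
scores = polarity_from_corpus(
    [["good", "ona"], ["bad", "txarra"], ["good", "ona"]],
    pos_seeds={"good"}, neg_seeds={"bad"})
```

A real system would use PMI over a large corpus and threshold the scores before adding entries to the lexicon.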

When to Plummet and When to Soar: Corpus Based Verb Selection for Natural Language Generation

Title When to Plummet and When to Soar: Corpus Based Verb Selection for Natural Language Generation
Authors Charese Smiley, Vassilis Plachouras, Frank Schilder, Hiroko Bretz, Jochen Leidner, Dezhao Song
Abstract
Tasks Text Generation
Published 2016-09-01
URL https://www.aclweb.org/anthology/W16-6606/
PDF https://www.aclweb.org/anthology/W16-6606
PWC https://paperswithcode.com/paper/when-to-plummet-and-when-to-soar-corpus-based
Repo
Framework

Improving PAC Exploration Using the Median Of Means

Title Improving PAC Exploration Using the Median Of Means
Authors Jason Pazis, Ronald E. Parr, Jonathan P. How
Abstract We present the first application of the median of means in a PAC exploration algorithm for MDPs. Using the median of means allows us to significantly reduce the dependence of our bounds on the range of values that the value function can take, while introducing a dependence on the (potentially much smaller) variance of the Bellman operator. Additionally, our algorithm is the first algorithm with PAC bounds that can be applied to MDPs with unbounded rewards.
Tasks
Published 2016-12-01
URL http://papers.nips.cc/paper/6577-improving-pac-exploration-using-the-median-of-means
PDF http://papers.nips.cc/paper/6577-improving-pac-exploration-using-the-median-of-means.pdf
PWC https://paperswithcode.com/paper/improving-pac-exploration-using-the-median-of
Repo
Framework
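The median-of-means estimator at the core of this result is simple to state: split the samples into k equal blocks, average each block, and take the median of the block means. A minimal sketch (the sample values are illustrative):

```python
import statistics

def median_of_means(samples: list, k: int) -> float:
    """Robust mean estimate: median of k block means. A few
    extreme samples can corrupt at most a few blocks, so the
    median of the block means stays close to the true mean."""
    m = len(samples) // k
    block_means = [sum(samples[i * m:(i + 1) * m]) / m
                   for i in range(k)]
    return statistics.median(block_means)

# One outlier wrecks the plain mean (~167.5) but not the
# median of means, which stays at 1.0.
median_of_means([1, 1, 1, 1, 1, 1000], k=3)
```

This robustness to heavy tails is what lets the bounds depend on the variance of the Bellman operator rather than on the full range of the value function.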

Towards Automatically Classifying Depressive Symptoms from Twitter Data for Population Health

Title Towards Automatically Classifying Depressive Symptoms from Twitter Data for Population Health
Authors Danielle L. Mowery, Albert Park, Craig Bryan, Mike Conway
Abstract Major depressive disorder, a debilitating and burdensome disease experienced by individuals worldwide, can be defined by several depressive symptoms (e.g., anhedonia (inability to feel pleasure), depressed mood, difficulty concentrating, etc.). Individuals often discuss their experiences with depression symptoms on public social media platforms like Twitter, providing a potentially useful data source for monitoring population-level mental health risk factors. In a step towards developing an automated method to estimate the prevalence of symptoms associated with major depressive disorder over time in the United States using Twitter, we developed classifiers for discerning whether a Twitter tweet represents no evidence of depression or evidence of depression. If there was evidence of depression, we then classified whether the tweet contained a depressive symptom and if so, which of three subtypes: depressed mood, disturbed sleep, or fatigue or loss of energy. We observed that the most accurate classifiers could predict classes with high-to-moderate F1-score performances for no evidence of depression (85), evidence of depression (52), and depressive symptoms (49). We report moderate F1-scores for depressive symptoms ranging from 75 (fatigue or loss of energy) to 43 (disturbed sleep) to 35 (depressed mood). Our work demonstrates baseline approaches for automatically encoding Twitter data with granular depressive symptoms associated with major depressive disorder.
Tasks
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4320/
PDF https://www.aclweb.org/anthology/W16-4320
PWC https://paperswithcode.com/paper/towards-automatically-classifying-depressive
Repo
Framework

Sampling for Bayesian Program Learning

Title Sampling for Bayesian Program Learning
Authors Kevin Ellis, Armando Solar-Lezama, Josh Tenenbaum
Abstract Towards learning programs from data, we introduce the problem of sampling programs from posterior distributions conditioned on that data. Within this setting, we propose an algorithm that uses a symbolic solver to efficiently sample programs. The proposal combines constraint-based program synthesis with sampling via random parity constraints. We give theoretical guarantees on how well the samples approximate the true posterior, and have empirical results showing the algorithm is efficient in practice, evaluating our approach on 22 program learning problems in the domains of text editing and computer-aided programming.
Tasks Program Synthesis
Published 2016-12-01
URL http://papers.nips.cc/paper/6082-sampling-for-bayesian-program-learning
PDF http://papers.nips.cc/paper/6082-sampling-for-bayesian-program-learning.pdf
PWC https://paperswithcode.com/paper/sampling-for-bayesian-program-learning
Repo
Framework

South African National Centre for Digital Language Resources

Title South African National Centre for Digital Language Resources
Authors Justus Roux
Abstract This presentation introduces the imminent establishment of a new language resource infrastructure focusing on languages spoken in Southern Africa, with an eventual aim to become a hub for digital language resources within Sub-Saharan Africa. The Constitution of South Africa makes provision for 11 official languages, all with equal status. The current Language Resource Management Agency will be merged with the new Centre, which will have a wider focus than data acquisition, management and distribution. The Centre will run two main programs: Digitisation and Digital Humanities. The digitisation program will focus on the systematic digitisation of relevant text, speech and multi-modal data across the official languages. Relevancy will be determined by a Scientific Advisory Board. This will take place on a continuous basis through specified projects allocated to national members of the Centre, as well as through open calls aimed at academic and local communities. The digital resources will be managed and distributed through a dedicated web-based portal. The development of the Digital Humanities program will entail extensive academic support for projects implementing digital language-based data. The Centre will function as an enabling research infrastructure primarily supported by national government and hosted by the North-West University.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1391/
PDF https://www.aclweb.org/anthology/L16-1391
PWC https://paperswithcode.com/paper/south-african-national-centre-for-digital
Repo
Framework