May 4, 2019

Paper Group NANR 182

Detecting Sentence Boundaries in Sanskrit Texts. Effects of Semantic Relatedness between Setups and Punchlines in Twitter Hashtag Games. beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data. Automatically Extracting Topical Components for a Response-to-Text Writing Assessment. Vocal Pathologies Detection and Mispronounced Phonemes …

Detecting Sentence Boundaries in Sanskrit Texts


Title	Detecting Sentence Boundaries in Sanskrit Texts
Authors	Oliver Hellwig
Abstract	The paper applies a deep recurrent neural network to the task of sentence boundary detection in Sanskrit, an important, yet underresourced ancient Indian language. The deep learning approach improves the F scores set by a metrical baseline and by a Conditional Random Field classifier by more than 10%.
Tasks	Boundary Detection
Published	2016-12-01
URL	https://www.aclweb.org/anthology/C16-1028/
PDF	https://www.aclweb.org/anthology/C16-1028
PWC	https://paperswithcode.com/paper/detecting-sentence-boundaries-in-sanskrit
Repo
Framework

Effects of Semantic Relatedness between Setups and Punchlines in Twitter Hashtag Games


Title	Effects of Semantic Relatedness between Setups and Punchlines in Twitter Hashtag Games
Authors	Andrew Cattle, Xiaojuan Ma
Abstract	This paper explores humour recognition for Twitter-based hashtag games. Given their popularity, frequency, and relatively formulaic nature, these games make a good target for computational humour research and can leverage Twitter likes and retweets as humour judgments. In this work, we use pair-wise relative humour judgments to examine several measures of semantic relatedness between setups and punchlines on a hashtag game corpus we collected and annotated. Results show that perplexity, Normalized Google Distance, and free-word association-based features are all useful in identifying "funnier" hashtag game responses. In fact, we provide empirical evidence that funnier punchlines tend to be more obscure, although more obscure punchlines are not necessarily rated funnier. Furthermore, the asymmetric nature of free-word association features allows us to see that while punchlines should be harder to predict given a setup, they should also be relatively easy to understand in context.
Tasks	Language Acquisition, Sentiment Analysis
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4308/
PDF	https://www.aclweb.org/anthology/W16-4308
PWC	https://paperswithcode.com/paper/effects-of-semantic-relatedness-between
Repo
Framework
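
The abstract above names Normalized Google Distance (NGD) as one of its relatedness measures. As a rough illustration only, and not the authors' code, the standard NGD definition can be sketched from raw occurrence counts. The function name, parameter names, and the counts in the usage lines are all hypothetical; how the paper obtained its counts is not stated here.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from occurrence counts (illustrative sketch).

    fx, fy -- number of documents containing term x, term y
    fxy    -- number of documents containing both terms
    n      -- total number of documents indexed (assumed known)
    """
    # Terms that never occur, or never co-occur, are treated as maximally distant.
    if min(fx, fy, fxy) <= 0:
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    if denominator <= 0:
        raise ValueError("n must exceed the smaller of the two term counts")
    return (max(log_fx, log_fy) - log_fxy) / denominator


# Hypothetical counts: terms that always co-occur have distance 0.
identical = normalized_google_distance(100, 100, 100, 10_000)
# Hypothetical counts: rarer co-occurrence gives a larger distance.
looser = normalized_google_distance(100, 100, 10, 10_000)
```

A smaller value means the two terms tend to appear together; under this sketch a setup and punchline pair with a large NGD would count as more "obscure".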

beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data


Title	beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data
Authors	Valentina Zantedeschi, Rémi Emonet, Marc Sebban
Abstract	During the past few years, the machine learning community has paid attention to developing new methods for learning from weakly labeled data. This field covers different settings like semi-supervised learning, learning with label proportions, multi-instance learning, noise-tolerant learning, etc. This paper presents a generic framework to deal with these weakly labeled scenarios. We introduce the beta-risk as a generalized formulation of the standard empirical risk based on surrogate margin-based loss functions. This risk allows us to express the reliability of the labels and to derive different kinds of learning algorithms. We specifically focus on SVMs and propose a soft margin beta-svm algorithm which behaves better than the state of the art.
Tasks
Published	2016-12-01
URL	http://papers.nips.cc/paper/6156-beta-risk-a-new-surrogate-risk-for-learning-from-weakly-labeled-data
PDF	http://papers.nips.cc/paper/6156-beta-risk-a-new-surrogate-risk-for-learning-from-weakly-labeled-data.pdf
PWC	https://paperswithcode.com/paper/beta-risk-a-new-surrogate-risk-for-learning
Repo
Framework

Automatically Extracting Topical Components for a Response-to-Text Writing Assessment


Title	Automatically Extracting Topical Components for a Response-to-Text Writing Assessment
Authors	Zahra Rahimi, Diane Litman
Abstract
Tasks	Semantic Textual Similarity
Published	2016-06-01
URL	https://www.aclweb.org/anthology/W16-0532/
PDF	https://www.aclweb.org/anthology/W16-0532
PWC	https://paperswithcode.com/paper/automatically-extracting-topical-components
Repo
Framework

Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech


Title	Vocal Pathologies Detection and Mispronounced Phonemes Identification: Case of Arabic Continuous Speech
Authors	Naim Terbeh, Mounir Zrigui
Abstract	We propose in this work a novel acoustic-phonetic study for Arabic speakers suffering from language disabilities and for non-native learners of Arabic, in order to classify Arabic continuous speech as pathological or healthy and to identify phonemes that pose pronunciation problems (in the case of pathological speech). The main idea is to compare the reference phonetic model of spoken Arabic with the model of the speaker concerned. For this task, we use automatic speech processing techniques such as forced alignment and artificial neural networks (ANN) (Basheer, 2000). Based on a test corpus containing 100 speech sequences, recorded by different speakers (healthy/pathological speech and native/foreign speakers), we attain a classification rate of 97%. The algorithms used to identify phonemes that pose pronunciation problems show high efficiency: we attain an identification rate of 100%.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1334/
PDF	https://www.aclweb.org/anthology/L16-1334
PWC	https://paperswithcode.com/paper/vocal-pathologies-detection-and-mispronounced
Repo
Framework

Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian


Title	Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian
Authors	Chenchen Ding, Masao Utiyama, Eiichiro Sumita
Abstract	This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation of raw parallel data from the Asian Language Treebank. The cross-lingual similarity is investigated and demonstrated on metrics of correspondence and order of tokens, based on several standard statistical machine translation techniques. The similarity shown in this study suggests the possibility of harmonious annotation and processing of the language pairs in future development.
Tasks	Machine Translation, Word Alignment
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4614/
PDF	https://www.aclweb.org/anthology/W16-4614
PWC	https://paperswithcode.com/paper/similar-southeast-asian-languages-corpus
Repo
Framework

Building a dictionary of lexical variants for phenotype descriptors


Title	Building a dictionary of lexical variants for phenotype descriptors
Authors	Simon Kocbek, Tudor Groza
Abstract
Tasks
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-2925/
PDF	https://www.aclweb.org/anthology/W16-2925
PWC	https://paperswithcode.com/paper/building-a-dictionary-of-lexical-variants-for
Repo
Framework

Optimization for Statistical Machine Translation: A Survey


Title	Optimization for Statistical Machine Translation: A Survey
Authors	Graham Neubig, Taro Watanabe
Abstract
Tasks	Machine Translation
Published	2016-03-01
URL	https://www.aclweb.org/anthology/J16-1001/
PDF	https://www.aclweb.org/anthology/J16-1001
PWC	https://paperswithcode.com/paper/optimization-for-statistical-machine
Repo
Framework

Online Learning for Statistical Machine Translation


Title	Online Learning for Statistical Machine Translation
Authors	Daniel Ortiz-Martínez
Abstract
Tasks	Machine Translation
Published	2016-03-01
URL	https://www.aclweb.org/anthology/J16-1004/
PDF	https://www.aclweb.org/anthology/J16-1004
PWC	https://paperswithcode.com/paper/online-learning-for-statistical-machine
Repo
Framework

Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?


Title	Polarity Lexicon Building: to what Extent Is the Manual Effort Worth?
Authors	Iñaki San Vicente, Xabier Saralegi
Abstract	Polarity lexicons are a basic resource for analyzing the sentiments and opinions expressed in texts in an automated way. This paper explores three methods to construct polarity lexicons: translating existing lexicons from other languages, extracting polarity lexicons from corpora, and annotating sentiments in Lexical Knowledge Bases. Each of these methods requires a different degree of human effort. We evaluate how much manual effort is needed and to what extent that effort pays off in terms of performance improvement. The experimental setup includes generating lexicons for Basque and evaluating them against gold standard datasets in different domains. Results show that extracting polarity lexicons from corpora is the best solution for achieving good performance with reasonable human effort.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1149/
PDF	https://www.aclweb.org/anthology/L16-1149
PWC	https://paperswithcode.com/paper/polarity-lexicon-building-to-what-extent-is
Repo
Framework

When to Plummet and When to Soar: Corpus Based Verb Selection for Natural Language Generation


Title	When to Plummet and When to Soar: Corpus Based Verb Selection for Natural Language Generation
Authors	Charese Smiley, Vassilis Plachouras, Frank Schilder, Hiroko Bretz, Jochen Leidner, Dezhao Song
Abstract
Tasks	Text Generation
Published	2016-09-01
URL	https://www.aclweb.org/anthology/W16-6606/
PDF	https://www.aclweb.org/anthology/W16-6606
PWC	https://paperswithcode.com/paper/when-to-plummet-and-when-to-soar-corpus-based
Repo
Framework

Improving PAC Exploration Using the Median Of Means


Title	Improving PAC Exploration Using the Median Of Means
Authors	Jason Pazis, Ronald E. Parr, Jonathan P. How
Abstract	We present the first application of the median of means in a PAC exploration algorithm for MDPs. Using the median of means allows us to significantly reduce the dependence of our bounds on the range of values that the value function can take, while introducing a dependence on the (potentially much smaller) variance of the Bellman operator. Additionally, our algorithm is the first algorithm with PAC bounds that can be applied to MDPs with unbounded rewards.
Tasks
Published	2016-12-01
URL	http://papers.nips.cc/paper/6577-improving-pac-exploration-using-the-median-of-means
PDF	http://papers.nips.cc/paper/6577-improving-pac-exploration-using-the-median-of-means.pdf
PWC	https://paperswithcode.com/paper/improving-pac-exploration-using-the-median-of
Repo
Framework
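
The median of means mentioned in the abstract above is a general robust estimator of a mean. The sketch below shows only that estimator, not the paper's PAC exploration algorithm; the function name, the choice to drop leftover samples, and the sample values are assumptions made for illustration.

```python
import statistics


def median_of_means(samples, num_groups):
    """Split samples into equal-sized groups, average each, return the median.

    Leftover samples that do not fill a complete group are dropped
    (a simplification chosen for this sketch).
    """
    if num_groups < 1 or num_groups > len(samples):
        raise ValueError("num_groups must be between 1 and len(samples)")
    group_size = len(samples) // num_groups
    group_means = [
        statistics.fmean(samples[i * group_size:(i + 1) * group_size])
        for i in range(num_groups)
    ]
    return statistics.median(group_means)


# Hypothetical data with one extreme outlier.
data = [1, 1, 1, 1, 1, 1000]
robust = median_of_means(data, num_groups=3)   # group means: 1, 1, 500.5
plain = statistics.fmean(data)                 # pulled upward by the outlier
```

Because a single outlier can corrupt at most one group mean, the median of the group means stays near the bulk of the data, while the plain mean does not.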

Towards Automatically Classifying Depressive Symptoms from Twitter Data for Population Health


Title	Towards Automatically Classifying Depressive Symptoms from Twitter Data for Population Health
Authors	Danielle L. Mowery, Albert Park, Craig Bryan, Mike Conway
Abstract	Major depressive disorder, a debilitating and burdensome disease experienced by individuals worldwide, can be defined by several depressive symptoms (e.g., anhedonia (inability to feel pleasure), depressed mood, difficulty concentrating, etc.). Individuals often discuss their experiences with depression symptoms on public social media platforms like Twitter, providing a potentially useful data source for monitoring population-level mental health risk factors. In a step towards developing an automated method to estimate the prevalence of symptoms associated with major depressive disorder over time in the United States using Twitter, we developed classifiers for discerning whether a tweet represents no evidence of depression or evidence of depression. If there was evidence of depression, we then classified whether the tweet contained a depressive symptom and if so, which of three subtypes: depressed mood, disturbed sleep, or fatigue or loss of energy. We observed that the most accurate classifiers could predict classes with high-to-moderate F1-score performances for no evidence of depression (85), evidence of depression (52), and depressive symptoms (49). We report moderate F1-scores for depressive symptoms ranging from 75 (fatigue or loss of energy) to 43 (disturbed sleep) to 35 (depressed mood). Our work demonstrates baseline approaches for automatically encoding Twitter data with granular depressive symptoms associated with major depressive disorder.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4320/
PDF	https://www.aclweb.org/anthology/W16-4320
PWC	https://paperswithcode.com/paper/towards-automatically-classifying-depressive
Repo
Framework

Sampling for Bayesian Program Learning


Title	Sampling for Bayesian Program Learning
Authors	Kevin Ellis, Armando Solar-Lezama, Josh Tenenbaum
Abstract	Towards learning programs from data, we introduce the problem of sampling programs from posterior distributions conditioned on that data. Within this setting, we propose an algorithm that uses a symbolic solver to efficiently sample programs. The proposal combines constraint-based program synthesis with sampling via random parity constraints. We give theoretical guarantees on how well the samples approximate the true posterior, and have empirical results showing the algorithm is efficient in practice, evaluating our approach on 22 program learning problems in the domains of text editing and computer-aided programming.
Tasks	Program Synthesis
Published	2016-12-01
URL	http://papers.nips.cc/paper/6082-sampling-for-bayesian-program-learning
PDF	http://papers.nips.cc/paper/6082-sampling-for-bayesian-program-learning.pdf
PWC	https://paperswithcode.com/paper/sampling-for-bayesian-program-learning
Repo
Framework
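
The abstract above describes sampling via random parity constraints. The toy sketch below illustrates only that general idea: random XOR constraints over a bit encoding carve the solution set into smaller cells, and a sample is drawn from the surviving cell. It enumerates bitstrings by brute force instead of calling a symbolic solver, and every name in it (including the stand-in `is_solution` predicate) is hypothetical rather than taken from the paper.

```python
import itertools
import random


def random_parity_constraint(num_bits, rng):
    """Pick a random subset of bit positions and a random target parity."""
    mask = [rng.random() < 0.5 for _ in range(num_bits)]
    target = rng.randrange(2)
    return mask, target


def satisfies(bits, constraint):
    """True if the XOR of the selected bits equals the target parity."""
    mask, target = constraint
    return sum(b for b, selected in zip(bits, mask) if selected) % 2 == target


def sample_with_parity(is_solution, num_bits, num_constraints, rng):
    """Return one solution that survives the random constraints, or None."""
    constraints = [
        random_parity_constraint(num_bits, rng) for _ in range(num_constraints)
    ]
    # Brute-force stand-in for a solver query (assumption for this sketch).
    survivors = [
        bits
        for bits in itertools.product((0, 1), repeat=num_bits)
        if is_solution(bits) and all(satisfies(bits, c) for c in constraints)
    ]
    return rng.choice(survivors) if survivors else None


# Hypothetical "program space": 6-bit encodings with at least two bits set.
def is_solution(bits):
    return sum(bits) >= 2


rng = random.Random(0)
draws = [sample_with_parity(is_solution, 6, 2, rng) for _ in range(20)]
```

Each added constraint removes roughly half of the remaining candidates on average, so the number of constraints trades off cell size against the chance that no solution survives (in which case the sketch returns `None`).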

South African National Centre for Digital Language Resources


Title	South African National Centre for Digital Language Resources
Authors	Justus Roux
Abstract	This presentation introduces the imminent establishment of a new language resource infrastructure focusing on languages spoken in Southern Africa, with an eventual aim to become a hub for digital language resources within Sub-Saharan Africa. The Constitution of South Africa makes provision for 11 official languages all with equal status. The current language Resource Management Agency will be merged with the new Centre, which will have a wider focus than that of data acquisition, management and distribution. The Centre will entertain two main programs: Digitisation and Digital Humanities. The digitisation program will focus on the systematic digitisation of relevant text, speech and multi-modal data across the official languages. Relevancy will be determined by a Scientific Advisory Board. This will take place on a continuous basis through specified projects allocated to national members of the Centre, as well as through open-calls aimed at the academic as well as local communities. The digital resources will be managed and distributed through a dedicated web-based portal. The development of the Digital Humanities program will entail extensive academic support for projects implementing digital language based data. The Centre will function as an enabling research infrastructure primarily supported by national government and hosted by the North-West University.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1391/
PDF	https://www.aclweb.org/anthology/L16-1391
PWC	https://paperswithcode.com/paper/south-african-national-centre-for-digital
Repo
Framework