July 26, 2019

Paper Group NANR 49

Rule-Based Translation of Spanish Verb-Noun Combinations into Basque

Title Rule-Based Translation of Spanish Verb-Noun Combinations into Basque
Authors Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
Abstract This paper presents a method to improve the translation of Verb-Noun Combinations (VNCs) in a rule-based Machine Translation (MT) system for Spanish-Basque. Linguistic information about a set of VNCs is gathered from the public database Konbitzul and integrated into the MT system, leading to improved BLEU, NIST and TER scores, as well as clearly better results according to human evaluators.
Tasks Machine Translation
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1720/
PDF https://www.aclweb.org/anthology/W17-1720
PWC https://paperswithcode.com/paper/rule-based-translation-of-spanish-verb-noun

A Generative Model of Phonotactics

Title A Generative Model of Phonotactics
Authors Richard Futrell, Adam Albright, Peter Graff, Timothy J. O'Donnell
Abstract We present a probabilistic model of phonotactics, the set of well-formed phoneme sequences in a language. Unlike most computational models of phonotactics (Hayes and Wilson, 2008; Goldsmith and Riggle, 2012), we take a fully generative approach, modeling a process where forms are built up out of subparts by phonologically-informed structure building operations. We learn an inventory of subparts by applying stochastic memoization (Johnson et al., 2007; Goodman et al., 2008) to a generative process for phonemes structured as an and-or graph, based on concepts of feature hierarchy from generative phonology (Clements, 1985; Dresher, 2009). Subparts are combined in a way that allows tier-based feature interactions. We evaluate our models' ability to capture phonotactic distributions in the lexicons of 14 languages drawn from the WOLEX corpus (Graff, 2012). Our full model robustly assigns higher probabilities to held-out forms than a sophisticated N-gram model for all languages. We also present novel analyses that probe model behavior in more detail.
Published 2017-01-01
URL https://www.aclweb.org/anthology/Q17-1006/
PDF https://www.aclweb.org/anthology/Q17-1006
PWC https://paperswithcode.com/paper/a-generative-model-of-phonotactics

Developing a web-based workbook for English supporting the interaction of students and teachers

Title Developing a web-based workbook for English supporting the interaction of students and teachers
Authors Björn Rudzewitz, Ramon Ziai, Kordula De Kuthy, Detmar Meurers
Tasks Language Acquisition
Published 2017-05-01
URL https://www.aclweb.org/anthology/W17-0305/
PDF https://www.aclweb.org/anthology/W17-0305
PWC https://paperswithcode.com/paper/developing-a-web-based-workbook-for-english

Communication with Robots using Multilayer Recurrent Networks

Title Communication with Robots using Multilayer Recurrent Networks
Authors Bedřich Pišl, David Mareček
Abstract In this paper, we describe an improvement on the task of giving instructions to robots in a simulated block world using unrestricted natural language commands.
Tasks Tokenization
Published 2017-08-01
URL https://www.aclweb.org/anthology/W17-2806/
PDF https://www.aclweb.org/anthology/W17-2806
PWC https://paperswithcode.com/paper/communication-with-robots-using-multilayer

Identifying 1950s American Jazz Musicians: Fine-Grained IsA Extraction via Modifier Composition

Title Identifying 1950s American Jazz Musicians: Fine-Grained IsA Extraction via Modifier Composition
Authors Ellie Pavlick, Marius Paşca
Abstract We present a method for populating fine-grained classes (e.g., "1950s American jazz musicians") with instances (e.g., Charles Mingus). While state-of-the-art methods tend to treat class labels as single lexical units, the proposed method considers each of the individual modifiers in the class label relative to the head. An evaluation on the task of reconstructing Wikipedia category pages demonstrates a >10 point increase in AUC over a strong baseline relying on widely-used Hearst patterns.
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-1192/
PDF https://www.aclweb.org/anthology/P17-1192
PWC https://paperswithcode.com/paper/identifying-1950s-american-jazz-musicians

Classifying Temporal Relations by Bidirectional LSTM over Dependency Paths

Title Classifying Temporal Relations by Bidirectional LSTM over Dependency Paths
Authors Fei Cheng, Yusuke Miyao
Abstract Temporal relation classification is becoming an active research field. Many methods have been proposed, but most of them focus on extracting features from external resources. Less attention has been paid to a significant advance in a closely related task: relation extraction. In this work, we borrow a state-of-the-art method from relation extraction by adopting bidirectional long short-term memory (Bi-LSTM) along dependency paths (DP). We make a "common root" assumption to extend DP representations to cross-sentence links. In the final comparison to two state-of-the-art systems on TimeBank-Dense, our model achieves comparable performance without using external knowledge or manually annotated attributes of entities (class, tense, polarity, etc.).
Tasks Question Answering, Relation Classification, Relation Extraction
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2001/
PDF https://www.aclweb.org/anthology/P17-2001
PWC https://paperswithcode.com/paper/classifying-temporal-relations-by
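
One possible reading of the "common root" assumption in the abstract above is that two sentences are joined under a shared virtual root, so that a dependency path can connect tokens in different sentences. The sketch below illustrates that reading only; the function names, the `<ROOT>` symbol, and the head-array encoding (head index per token, -1 for the sentence root) are assumptions for illustration and are not taken from the paper.

```python
def path_to_root(heads, index):
    """Return token indices from `index` up to the sentence root (head == -1)."""
    path = [index]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def cross_sentence_path(tokens_a, heads_a, i, tokens_b, heads_b, j, root="<ROOT>"):
    """Join two sentences under a hypothetical shared root and return the
    token path from token i of sentence A to token j of sentence B."""
    up = [tokens_a[k] for k in path_to_root(heads_a, i)]
    down = [tokens_b[k] for k in path_to_root(heads_b, j)]
    # Climb from the first token to its root, cross the shared root,
    # then descend to the second token.
    return up + [root] + down[::-1]


# Toy parses (assumed, hand-written): "left" and "arrived" are the sentence roots.
tokens_a, heads_a = ["He", "left", "early"], [1, -1, 1]
tokens_b, heads_b = ["She", "arrived", "later"], [1, -1, 1]

path = cross_sentence_path(tokens_a, heads_a, 2, tokens_b, heads_b, 2)
```

A sequence like `path` is the kind of input a Bi-LSTM over dependency paths would consume; how the paper encodes each step is not stated in the abstract.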

A KL-LUCB algorithm for Large-Scale Crowdsourcing

Title A KL-LUCB algorithm for Large-Scale Crowdsourcing
Authors Ervin Tanczos, Robert Nowak, Bob Mankoff
Abstract This paper focuses on best-arm identification in multi-armed bandits with bounded rewards. We develop an algorithm that is a fusion of lil-UCB and KL-LUCB, offering the best qualities of the two algorithms in one method. This is achieved by proving a novel anytime confidence bound for the mean of bounded distributions, which is the analogue of the LIL-type bounds recently developed for sub-Gaussian distributions. We corroborate our theoretical results with numerical experiments based on the New Yorker Cartoon Caption Contest.
Tasks Multi-Armed Bandits
Published 2017-12-01
URL http://papers.nips.cc/paper/7171-a-kl-lucb-algorithm-for-large-scale-crowdsourcing
PDF http://papers.nips.cc/paper/7171-a-kl-lucb-algorithm-for-large-scale-crowdsourcing.pdf
PWC https://paperswithcode.com/paper/a-kl-lucb-algorithm-for-large-scale
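
To give a feel for why KL-based confidence bounds are attractive for bounded rewards, here is a minimal sketch of a Bernoulli KL upper confidence bound computed by bisection, next to a Hoeffding-style bound with the same exploration budget. It is not the anytime bound proved in the paper: the `budget` argument is a hypothetical stand-in for whatever exploration term an algorithm supplies, and all function names are illustrative.

```python
import math


def bernoulli_kl(p, q):
    """KL(Ber(p) || Ber(q)), with clipping to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_upper_bound(mean, pulls, budget, tol=1e-9):
    """Largest q >= mean with pulls * KL(mean, q) <= budget, by bisection."""
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid
    return lo


def hoeffding_upper_bound(mean, pulls, budget):
    """Hoeffding-style bound using the same budget, capped at 1."""
    return min(1.0, mean + math.sqrt(budget / (2 * pulls)))


kl_ucb = kl_upper_bound(0.9, 50, 3.0)
hoeffding_ucb = hoeffding_upper_bound(0.9, 50, 3.0)
```

By Pinsker's inequality, KL(p, q) >= 2(p - q)^2, so for the same budget the KL bound is never looser than the Hoeffding one, and it is noticeably tighter when the empirical mean is close to 0 or 1.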

Ultra-Concise Multi-genre Summarisation of Web2.0: towards Intelligent Content Generation

Title Ultra-Concise Multi-genre Summarisation of Web2.0: towards Intelligent Content Generation
Authors Elena Lloret, Ester Boldrini, Patricio Martínez-Barco, Manuel Palomar
Abstract Electronic word of mouth has become the most powerful communication channel thanks to the wide use of social media. Our research proposes an approach towards the production of automatic ultra-concise summaries from multiple Web 2.0 sources. We exploit user-generated content from reviews and microblogs in different domains, and compile and analyse four types of ultra-concise summaries: a) positive information; b) negative information; c) both; or d) objective information. The appropriateness and usefulness of our model is demonstrated by its successful results and great potential in real-life applications, representing a relevant advance over state-of-the-art approaches.
Tasks Information Retrieval, Opinion Mining
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1006/
PDF https://www.aclweb.org/anthology/W17-1006
PWC https://paperswithcode.com/paper/ultra-concise-multi-genre-summarisation-of

IIIT-H at IJCNLP-2017 Task 4: Customer Feedback Analysis using Machine Learning and Neural Network Approaches

Title IIIT-H at IJCNLP-2017 Task 4: Customer Feedback Analysis using Machine Learning and Neural Network Approaches
Authors Prathyusha Danda, Pruthwik Mishra, Silpa Kanneganti, Soujanya Lanka
Abstract The IJCNLP 2017 shared task on Customer Feedback Analysis focuses on classifying customer feedback into one of a predefined set of categories or classes. In this paper, we describe our approach to this problem and the results on four languages, i.e. English, French, Japanese and Spanish. Our system implemented a bidirectional LSTM (Graves and Schmidhuber, 2005) using pre-trained GloVe (Pennington et al., 2014) and fastText (Joulin et al., 2016) embeddings, and an SVM (Cortes and Vapnik, 1995) with TF-IDF vectors for classifying the feedback data, which is described in the later sections. We also tried different machine learning techniques and compared the results in this paper. Out of the 12 participating teams, our systems obtained exact accuracy scores of 0.65, 0.86, 0.70 and 0.56 in English, Spanish, French and Japanese respectively. We observed that our systems perform better than the baseline systems in three languages, while we match the baseline accuracy for Japanese on our submitted systems. We noticed significant improvements in Japanese in later experiments, matching the highest performing system that was submitted in the shared task, which we will discuss in this paper.
Published 2017-12-01
URL https://www.aclweb.org/anthology/I17-4026/
PDF https://www.aclweb.org/anthology/I17-4026
PWC https://paperswithcode.com/paper/iiit-h-at-ijcnlp-2017-task-4-customer

Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter

Title Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter
Authors Svitlana Volkova, Kyle Shaffer, Jin Yea Jang, Nathan Hodas
Abstract Pew research polls report 62 percent of U.S. adults get news on social media (Gottfried and Shearer, 2016). In a December poll, 64 percent of U.S. adults said that "made-up news" has caused a "great deal of confusion" about the facts of current events (Barthel et al., 2016). Fabricated stories in social media, ranging from deliberate propaganda to hoaxes and satire, contribute to this confusion in addition to having serious effects on global stability. In this work we build predictive models to classify 130 thousand news posts as suspicious or verified, and predict four sub-types of suspicious news: satire, hoaxes, clickbait and propaganda. We show that neural network models trained on tweet content and social network interactions outperform lexical models. Unlike previous work on deception detection, we find that adding syntax and grammar features to our models does not improve performance. Incorporating linguistic features improves classification results; however, social interaction features are most informative for finer-grained separation between four types of suspicious news posts.
Tasks Deception Detection
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2102/
PDF https://www.aclweb.org/anthology/P17-2102
PWC https://paperswithcode.com/paper/separating-facts-from-fiction-linguistic

Speech- and Text-driven Features for Automated Scoring of English Speaking Tasks

Title Speech- and Text-driven Features for Automated Scoring of English Speaking Tasks
Authors Anastassia Loukina, Nitin Madnani, Aoife Cahill
Abstract We consider the automatic scoring of a task for which both the content of the response as well as its spoken fluency are important. We combine features from a text-only content scoring system originally designed for written responses with several categories of acoustic features. Although adding any single category of acoustic features to the text-only system on its own does not significantly improve performance, adding all acoustic features together does yield a small but significant improvement. These results are consistent for responses to open-ended questions and to questions focused on some given source material.
Tasks Speech Recognition
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-4609/
PDF https://www.aclweb.org/anthology/W17-4609
PWC https://paperswithcode.com/paper/speech-and-text-driven-features-for-automated

Taking into account Inter-sentence Similarity for Update Summarization

Title Taking into account Inter-sentence Similarity for Update Summarization
Authors Maâli Mnasri, Gaël de Chalendar, Olivier Ferret
Abstract Following Gillick and Favre (2009), a lot of work about extractive summarization has modeled this task by associating two contrary constraints: one aims at maximizing the coverage of the summary with respect to its information content while the other represents its size limit. In this context, the notion of redundancy is only implicitly taken into account. In this article, we extend the framework defined by Gillick and Favre (2009) by examining how and to what extent integrating semantic sentence similarity into an update summarization system can improve its results. We show more precisely the impact of this strategy through evaluations performed on DUC 2007 and TAC 2008 and 2009 datasets.
Tasks Document Summarization, Multi-Document Summarization, Natural Language Inference, Topic Models
Published 2017-11-01
URL https://www.aclweb.org/anthology/I17-2035/
PDF https://www.aclweb.org/anthology/I17-2035
PWC https://paperswithcode.com/paper/taking-into-account-inter-sentence-similarity
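
The two contrary constraints described in the abstract above (maximize covered content, respect a size limit) can be illustrated with a toy concept-coverage selector. This is only a sketch under simplifying assumptions: it uses single words as concepts and exhaustive search instead of an integer linear program, and the sentences, weights, and function name are made up for illustration. It does not include the semantic sentence similarity that the paper adds.

```python
from itertools import combinations


def best_summary(sentences, concept_weights, max_words):
    """Exhaustively pick the sentence subset that maximizes the total weight
    of distinct covered concepts while staying within a word budget."""
    best_score, best_subset = 0.0, ()
    for r in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), r):
            length = sum(len(sentences[i].split()) for i in subset)
            if length > max_words:
                continue  # size constraint violated
            covered = set()
            for i in subset:
                covered.update(
                    w for w in sentences[i].lower().split() if w in concept_weights
                )
            # Each concept counts once, however many sentences mention it.
            score = sum(concept_weights[c] for c in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return [sentences[i] for i in best_subset], best_score


sentences = [
    "the storm hit the coast",
    "the storm struck the coast",
    "thousands lost power",
]
weights = {"storm": 3, "coast": 2, "power": 2, "thousands": 1}

summary, score = best_summary(sentences, weights, max_words=8)
```

Because each concept is counted once, selecting both near-duplicate storm sentences gains nothing, which is the sense in which redundancy is handled only implicitly in this kind of formulation.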

Toward a Web-based Speech Corpus for Algerian Dialectal Arabic Varieties

Title Toward a Web-based Speech Corpus for Algerian Dialectal Arabic Varieties
Authors Soumia Bougrine, Aicha Chorana, Abdallah Lakhdari, Hadda Cherroun
Abstract The success of machine learning for automatic speech processing has raised the need for large-scale datasets. However, collecting such data is often a challenging task, as it requires a significant investment of time and money. In this paper, we devise a recipe for building large-scale speech corpora by harnessing Web resources, namely YouTube, other social media, online radio and TV. We illustrate our methodology by building KALAM'DZ, an Arabic spoken corpus dedicated to Algerian dialectal varieties. The preliminary version of our dataset covers all major Algerian dialects. In addition, we make sure that this material takes into account numerous aspects that foster its richness. In fact, we have targeted various speech topics. Some automatic and manual annotations are provided. They gather useful information related to the speakers and sub-dialect information at the utterance level. Our corpus encompasses the 8 major Algerian Arabic sub-dialects with 4881 speakers and more than 104.4 hours segmented in utterances of at least 6 s.
Tasks Speech Recognition, Speech Synthesis
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1317/
PDF https://www.aclweb.org/anthology/W17-1317
PWC https://paperswithcode.com/paper/toward-a-web-based-speech-corpus-for-algerian

Revising the METU-Sabancı Turkish Treebank: An Exercise in Surface-Syntactic Annotation of Agglutinative Languages

Title Revising the METU-Sabancı Turkish Treebank: An Exercise in Surface-Syntactic Annotation of Agglutinative Languages
Authors Alicia Burga, Alp Öktem, Leo Wanner
Tasks Language Modelling
Published 2017-09-01
URL https://www.aclweb.org/anthology/W17-6506/
PDF https://www.aclweb.org/anthology/W17-6506
PWC https://paperswithcode.com/paper/revising-the-metu-sabanca-turkish-treebank-an

Towards Never Ending Language Learning for Morphologically Rich Languages

Title Towards Never Ending Language Learning for Morphologically Rich Languages
Authors Kseniya Buraya, Lidia Pivovarova, Sergey Budkov, Andrey Filchenkov
Abstract This work deals with ontology learning from unstructured Russian text. We implement one of the components of the Never Ending Language Learner and introduce algorithm extensions designed to capture the specifics of a morphologically rich, free-word-order language. We demonstrate that this method can be successfully applied to Russian data. In addition, we perform several experiments comparing different settings of the training process. We demonstrate that using morphological features significantly improves the system's precision, while using seed patterns helps to improve coverage.
Tasks Information Retrieval
Published 2017-04-01
URL https://www.aclweb.org/anthology/W17-1417/
PDF https://www.aclweb.org/anthology/W17-1417
PWC https://paperswithcode.com/paper/towards-never-ending-language-learning-for