July 26, 2019

Paper Group NANR 49

Rule-Based Translation of Spanish Verb-Noun Combinations into Basque


Title	Rule-Based Translation of Spanish Verb-Noun Combinations into Basque
Authors	Uxoa Iñurrieta, Itziar Aduriz, Arantza Díaz de Ilarraza, Gorka Labaka, Kepa Sarasola
Abstract	This paper presents a method to improve the translation of Verb-Noun Combinations (VNCs) in a rule-based Machine Translation (MT) system for Spanish-Basque. Linguistic information about a set of VNCs is gathered from the public database Konbitzul, and it is integrated into the MT system, leading to an improvement in BLEU, NIST and TER scores, as well as the results being evidently better according to human evaluators.
Tasks	Machine Translation
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1720/
PDF	https://www.aclweb.org/anthology/W17-1720
PWC	https://paperswithcode.com/paper/rule-based-translation-of-spanish-verb-noun
Repo
Framework

A Generative Model of Phonotactics


Title	A Generative Model of Phonotactics
Authors	Richard Futrell, Adam Albright, Peter Graff, Timothy J. O'Donnell
Abstract	We present a probabilistic model of phonotactics, the set of well-formed phoneme sequences in a language. Unlike most computational models of phonotactics (Hayes and Wilson, 2008; Goldsmith and Riggle, 2012), we take a fully generative approach, modeling a process where forms are built up out of subparts by phonologically-informed structure building operations. We learn an inventory of subparts by applying stochastic memoization (Johnson et al., 2007; Goodman et al., 2008) to a generative process for phonemes structured as an and-or graph, based on concepts of feature hierarchy from generative phonology (Clements, 1985; Dresher, 2009). Subparts are combined in a way that allows tier-based feature interactions. We evaluate our models' ability to capture phonotactic distributions in the lexicons of 14 languages drawn from the WOLEX corpus (Graff, 2012). Our full model robustly assigns higher probabilities to held-out forms than a sophisticated N-gram model for all languages. We also present novel analyses that probe model behavior in more detail.
Tasks
Published	2017-01-01
URL	https://www.aclweb.org/anthology/Q17-1006/
PDF	https://www.aclweb.org/anthology/Q17-1006
PWC	https://paperswithcode.com/paper/a-generative-model-of-phonotactics
Repo
Framework

Developing a web-based workbook for English supporting the interaction of students and teachers


Title	Developing a web-based workbook for English supporting the interaction of students and teachers
Authors	Björn Rudzewitz, Ramon Ziai, Kordula De Kuthy, Detmar Meurers
Abstract
Tasks	Language Acquisition
Published	2017-05-01
URL	https://www.aclweb.org/anthology/W17-0305/
PDF	https://www.aclweb.org/anthology/W17-0305
PWC	https://paperswithcode.com/paper/developing-a-web-based-workbook-for-english
Repo
Framework

Communication with Robots using Multilayer Recurrent Networks


Title	Communication with Robots using Multilayer Recurrent Networks
Authors	Bedřich Pišl, David Mareček
Abstract	In this paper, we describe an improvement on the task of giving instructions to robots in a simulated block world using unrestricted natural language commands.
Tasks	Tokenization
Published	2017-08-01
URL	https://www.aclweb.org/anthology/W17-2806/
PDF	https://www.aclweb.org/anthology/W17-2806
PWC	https://paperswithcode.com/paper/communication-with-robots-using-multilayer
Repo
Framework

Identifying 1950s American Jazz Musicians: Fine-Grained IsA Extraction via Modifier Composition


Title	Identifying 1950s American Jazz Musicians: Fine-Grained IsA Extraction via Modifier Composition
Authors	Ellie Pavlick, Marius Paşca
Abstract	We present a method for populating fine-grained classes (e.g., "1950s American jazz musicians") with instances (e.g., Charles Mingus). While state-of-the-art methods tend to treat class labels as single lexical units, the proposed method considers each of the individual modifiers in the class label relative to the head. An evaluation on the task of reconstructing Wikipedia category pages demonstrates a >10 point increase in AUC, over a strong baseline relying on widely-used Hearst patterns.
Tasks
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-1192/
PDF	https://www.aclweb.org/anthology/P17-1192
PWC	https://paperswithcode.com/paper/identifying-1950s-american-jazz-musicians
Repo
Framework

Classifying Temporal Relations by Bidirectional LSTM over Dependency Paths


Title	Classifying Temporal Relations by Bidirectional LSTM over Dependency Paths
Authors	Fei Cheng, Yusuke Miyao
Abstract	Temporal relation classification is becoming an active research field. Lots of methods have been proposed, while most of them focus on extracting features from external resources. Less attention has been paid to a significant advance in a closely related task: relation extraction. In this work, we borrow a state-of-the-art method in relation extraction by adopting bidirectional long short-term memory (Bi-LSTM) along dependency paths (DP). We make a "common root" assumption to extend DP representations of cross-sentence links. In the final comparison to two state-of-the-art systems on TimeBank-Dense, our model achieves comparable performance, without using external knowledge, as well as manually annotated attributes of entities (class, tense, polarity, etc.).
Tasks	Question Answering, Relation Classification, Relation Extraction
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-2001/
PDF	https://www.aclweb.org/anthology/P17-2001
PWC	https://paperswithcode.com/paper/classifying-temporal-relations-by
Repo
Framework

A KL-LUCB algorithm for Large-Scale Crowdsourcing


Title	A KL-LUCB algorithm for Large-Scale Crowdsourcing
Authors	Ervin Tanczos, Robert Nowak, Bob Mankoff
Abstract	This paper focuses on best-arm identification in multi-armed bandits with bounded rewards. We develop an algorithm that is a fusion of lil-UCB and KL-LUCB, offering the best qualities of the two algorithms in one method. This is achieved by proving a novel anytime confidence bound for the mean of bounded distributions, which is the analogue of the LIL-type bounds recently developed for sub-Gaussian distributions. We corroborate our theoretical results with numerical experiments based on the New Yorker Cartoon Caption Contest.
Tasks	Multi-Armed Bandits
Published	2017-12-01
URL	http://papers.nips.cc/paper/7171-a-kl-lucb-algorithm-for-large-scale-crowdsourcing
PDF	http://papers.nips.cc/paper/7171-a-kl-lucb-algorithm-for-large-scale-crowdsourcing.pdf
PWC	https://paperswithcode.com/paper/a-kl-lucb-algorithm-for-large-scale
Repo
Framework

Ultra-Concise Multi-genre Summarisation of Web2.0: towards Intelligent Content Generation


Title	Ultra-Concise Multi-genre Summarisation of Web2.0: towards Intelligent Content Generation
Authors	Elena Lloret, Ester Boldrini, Patricio Martínez-Barco, Manuel Palomar
Abstract	The electronic Word of Mouth has become the most powerful communication channel thanks to the wide usage of the Social Media. Our research proposes an approach towards the production of automatic ultra-concise summaries from multiple Web 2.0 sources. We exploit user-generated content from reviews and microblogs in different domains, and compile and analyse four types of ultra-concise summaries: a) positive information, b) negative information, c) both, or d) objective information. The appropriateness and usefulness of our model is demonstrated by its successful results and great potential in real-life applications, thus meaning a relevant advancement of the state-of-the-art approaches.
Tasks	Information Retrieval, Opinion Mining
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1006/
PDF	https://www.aclweb.org/anthology/W17-1006
PWC	https://paperswithcode.com/paper/ultra-concise-multi-genre-summarisation-of
Repo
Framework

IIIT-H at IJCNLP-2017 Task 4: Customer Feedback Analysis using Machine Learning and Neural Network Approaches


Title	IIIT-H at IJCNLP-2017 Task 4: Customer Feedback Analysis using Machine Learning and Neural Network Approaches
Authors	Prathyusha Danda, Pruthwik Mishra, Silpa Kanneganti, Soujanya Lanka
Abstract	The IJCNLP 2017 shared task on Customer Feedback Analysis focuses on classifying customer feedback into one of a predefined set of categories or classes. In this paper, we describe our approach to this problem and the results on four languages, i.e. English, French, Japanese and Spanish. Our system implemented a bidirectional LSTM (Graves and Schmidhuber, 2005) using pre-trained glove (Pennington et al., 2014) and fastText (Joulin et al., 2016) embeddings, and SVM (Cortes and Vapnik, 1995) with TF-IDF vectors for classifying the feedback data which is described in the later sections. We also tried different machine learning techniques and compared the results in this paper. Out of the 12 participating teams, our systems obtained 0.65, 0.86, 0.70 and 0.56 exact accuracy score in English, Spanish, French and Japanese respectively. We observed that our systems perform better than the baseline systems in three languages while we match the baseline accuracy for Japanese on our submitted systems. We noticed significant improvements in Japanese in later experiments, matching the highest performing system that was submitted in the shared task, which we will discuss in this paper.
Tasks
Published	2017-12-01
URL	https://www.aclweb.org/anthology/I17-4026/
PDF	https://www.aclweb.org/anthology/I17-4026
PWC	https://paperswithcode.com/paper/iiit-h-at-ijcnlp-2017-task-4-customer
Repo
Framework

Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter


Title	Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter
Authors	Svitlana Volkova, Kyle Shaffer, Jin Yea Jang, Nathan Hodas
Abstract	Pew research polls report 62 percent of U.S. adults get news on social media (Gottfried and Shearer, 2016). In a December poll, 64 percent of U.S. adults said that "made-up news" has caused a "great deal of confusion" about the facts of current events (Barthel et al., 2016). Fabricated stories in social media, ranging from deliberate propaganda to hoaxes and satire, contribute to this confusion in addition to having serious effects on global stability. In this work we build predictive models to classify 130 thousand news posts as suspicious or verified, and predict four sub-types of suspicious news – satire, hoaxes, clickbait and propaganda. We show that neural network models trained on tweet content and social network interactions outperform lexical models. Unlike previous work on deception detection, we find that adding syntax and grammar features to our models does not improve performance. Incorporating linguistic features improves classification results, however, social interaction features are most informative for finer-grained separation between four types of suspicious news posts.
Tasks	Deception Detection
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-2102/
PDF	https://www.aclweb.org/anthology/P17-2102
PWC	https://paperswithcode.com/paper/separating-facts-from-fiction-linguistic
Repo
Framework

Speech- and Text-driven Features for Automated Scoring of English Speaking Tasks


Title	Speech- and Text-driven Features for Automated Scoring of English Speaking Tasks
Authors	Anastassia Loukina, Nitin Madnani, Aoife Cahill
Abstract	We consider the automatic scoring of a task for which both the content of the response as well its spoken fluency are important. We combine features from a text-only content scoring system originally designed for written responses with several categories of acoustic features. Although adding any single category of acoustic features to the text-only system on its own does not significantly improve performance, adding all acoustic features together does yield a small but significant improvement. These results are consistent for responses to open-ended questions and to questions focused on some given source material.
Tasks	Speech Recognition
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-4609/
PDF	https://www.aclweb.org/anthology/W17-4609
PWC	https://paperswithcode.com/paper/speech-and-text-driven-features-for-automated
Repo
Framework

Taking into account Inter-sentence Similarity for Update Summarization


Title	Taking into account Inter-sentence Similarity for Update Summarization
Authors	Maâli Mnasri, Gaël de Chalendar, Olivier Ferret
Abstract	Following Gillick and Favre (2009), a lot of work about extractive summarization has modeled this task by associating two contrary constraints: one aims at maximizing the coverage of the summary with respect to its information content while the other represents its size limit. In this context, the notion of redundancy is only implicitly taken into account. In this article, we extend the framework defined by Gillick and Favre (2009) by examining how and to what extent integrating semantic sentence similarity into an update summarization system can improve its results. We show more precisely the impact of this strategy through evaluations performed on DUC 2007 and TAC 2008 and 2009 datasets.
Tasks	Document Summarization, Multi-Document Summarization, Natural Language Inference, Topic Models
Published	2017-11-01
URL	https://www.aclweb.org/anthology/I17-2035/
PDF	https://www.aclweb.org/anthology/I17-2035
PWC	https://paperswithcode.com/paper/taking-into-account-inter-sentence-similarity
Repo
Framework

Toward a Web-based Speech Corpus for Algerian Dialectal Arabic Varieties


Title	Toward a Web-based Speech Corpus for Algerian Dialectal Arabic Varieties
Authors	Soumia Bougrine, Aicha Chorana, Abdallah Lakhdari, Hadda Cherroun
Abstract	The success of machine learning for automatic speech processing has raised the need for large scale datasets. However, collecting such data is often a challenging task as it implies significant investment involving time and money cost. In this paper, we devise a recipe for building large-scale Speech Corpora by harnessing Web resources namely YouTube, other Social Media, Online Radio and TV. We illustrate our methodology by building KALAM'DZ, An Arabic Spoken corpus dedicated to Algerian dialectal varieties. The preliminary version of our dataset covers all major Algerian dialects. In addition, we make sure that this material takes into account numerous aspects that foster its richness. In fact, we have targeted various speech topics. Some automatic and manual annotations are provided. They gather useful information related to the speakers and sub-dialect information at the utterance level. Our corpus encompasses the 8 major Algerian Arabic sub-dialects with 4881 speakers and more than 104.4 hours segmented in utterances of at least 6 s.
Tasks	Speech Recognition, Speech Synthesis
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1317/
PDF	https://www.aclweb.org/anthology/W17-1317
PWC	https://paperswithcode.com/paper/toward-a-web-based-speech-corpus-for-algerian
Repo
Framework

Revising the METU-Sabancı Turkish Treebank: An Exercise in Surface-Syntactic Annotation of Agglutinative Languages


Title	Revising the METU-Sabancı Turkish Treebank: An Exercise in Surface-Syntactic Annotation of Agglutinative Languages
Authors	Alicia Burga, Alp Öktem, Leo Wanner
Abstract
Tasks	Language Modelling
Published	2017-09-01
URL	https://www.aclweb.org/anthology/W17-6506/
PDF	https://www.aclweb.org/anthology/W17-6506
PWC	https://paperswithcode.com/paper/revising-the-metu-sabanca-turkish-treebank-an
Repo
Framework

Towards Never Ending Language Learning for Morphologically Rich Languages


Title	Towards Never Ending Language Learning for Morphologically Rich Languages
Authors	Kseniya Buraya, Lidia Pivovarova, Sergey Budkov, Andrey Filchenkov
Abstract	This work deals with ontology learning from unstructured Russian text. We implement one of the components of the Never Ending Language Learner and introduce algorithm extensions aimed at capturing the specifics of a morphologically rich, free-word-order language. We demonstrate that this method can be successfully applied to Russian data. In addition, we perform several experiments comparing different settings of the training process. We demonstrate that using morphological features significantly improves the system's precision, while using seed patterns helps to improve coverage.
Tasks	Information Retrieval
Published	2017-04-01
URL	https://www.aclweb.org/anthology/W17-1417/
PDF	https://www.aclweb.org/anthology/W17-1417
PWC	https://paperswithcode.com/paper/towards-never-ending-language-learning-for
Repo
Framework