May 5, 2019

Paper Group NANR 47

The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database

Title The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database
Authors Emiel van Miltenburg, Benjamin Timmermans, Lora Aroyo
Abstract This paper presents a collection of annotations (tags or keywords) for a set of 2,133 environmental sounds taken from the Freesound database (www.freesound.org). The annotations are acquired through an open-ended crowd-labeling task, in which participants were asked to provide keywords for each of three sounds. The main goal of this study is to find out (i) whether it is feasible to collect keywords for a large collection of sounds through crowdsourcing, and (ii) how people talk about sounds, and what information they can infer from hearing a sound in isolation. Our main finding is that it is not only feasible to perform crowd-labeling for a large collection of sounds, it is also very useful to highlight different aspects of the sounds that authors may fail to mention. Our data is freely available, and can be used to ground semantic models, improve search in audio databases, and to study the language of sound.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1337/
PDF https://www.aclweb.org/anthology/L16-1337
PWC https://paperswithcode.com/paper/the-vu-sound-corpus-adding-more-fine-grained
Repo
Framework
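The VU Sound Corpus abstract describes open-ended crowd-labeling, where several participants supply free-text keywords for the same sound. The minimal Python sketch below shows one way such keywords might be merged for a single sound. The input format, the lowercase normalization, and the `min_votes` threshold are assumptions for illustration; the paper's actual aggregation procedure may differ.

```python
from collections import Counter

def aggregate_keywords(responses, min_votes=2):
    """Merge free-text keywords that several annotators gave for one sound.

    `responses` is a list of keyword lists, one per annotator (assumed format).
    Returns (keyword, count) pairs mentioned by at least `min_votes` annotators,
    most frequent first.
    """
    counts = Counter()
    for keywords in responses:
        # Normalize, drop empty strings, and count each keyword once per annotator.
        counts.update({k.strip().lower() for k in keywords if k.strip()})
    return [(k, c) for k, c in counts.most_common() if c >= min_votes]

responses = [["Dog", "bark"], ["dog", "barking", "outside"], ["dog ", "bark"]]
print(aggregate_keywords(responses))  # [('dog', 3), ('bark', 2)]
```

No stemming is applied, so "bark" and "barking" are counted separately.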

A New Feature Selection Technique Combined with ELM Feature Space for Text Classification

Title A New Feature Selection Technique Combined with ELM Feature Space for Text Classification
Authors Rajendra Kumar Roul, Pranav Rai
Abstract
Tasks Feature Selection, Text Categorization, Text Classification
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-6335/
PDF https://www.aclweb.org/anthology/W16-6335
PWC https://paperswithcode.com/paper/a-new-feature-selection-technique-combined
Repo
Framework

AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis

Title AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis
Authors Kevin El Haddad, Hüseyin Çakmak, Stéphane Dupont, Thierry Dutoit
Abstract It has been shown that adding expressivity and emotional expressions to an agent's communication systems would improve the interaction quality between this agent and a human user. In this paper we present a multimodal database of affect bursts, which are very short non-verbal expressions with facial, vocal, and gestural components that are highly synchronized and triggered by an identifiable event. This database contains motion capture and audio data of affect bursts representing disgust, startle and surprise recorded at three different levels of arousal each. This database is to be used for synthesis purposes in order to generate affect bursts of these emotions on a continuous arousal level scale.
Tasks Motion Capture
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1345/
PDF https://www.aclweb.org/anthology/L16-1345
PWC https://paperswithcode.com/paper/avab-dbs-an-audio-visual-affect-bursts
Repo
Framework

A Language-Independent Neural Network for Event Detection

Title A Language-Independent Neural Network for Event Detection
Authors Xiaocheng Feng, Lifu Huang, Duyu Tang, Heng Ji, Bing Qin, Ting Liu
Abstract
Tasks Feature Engineering
Published 2016-08-01
URL https://www.aclweb.org/anthology/P16-2011/
PDF https://www.aclweb.org/anthology/P16-2011
PWC https://paperswithcode.com/paper/a-language-independent-neural-network-for
Repo
Framework

Learning to Make Inferences in a Semantic Parsing Task

Title Learning to Make Inferences in a Semantic Parsing Task
Authors Kyle Richardson, Jonas Kuhn
Abstract We introduce a new approach to training a semantic parser that uses textual entailment judgements as supervision. These judgements are based on high-level inferences about whether the meaning of one sentence follows from another. When applied to an existing semantic parsing task, they prove to be a useful tool for revealing semantic distinctions and background knowledge not captured in the target representations. This information is used to improve the quality of the semantic representations being learned and to acquire generic knowledge for reasoning. Experiments are done on the benchmark Sportscaster corpus (Chen and Mooney, 2008), and a novel RTE-inspired inference dataset is introduced. On this new dataset our method strongly outperforms several strong baselines. Separately, we obtain state-of-the-art results on the original Sportscaster semantic parsing task.
Tasks Machine Translation, Natural Language Inference, Question Answering, Semantic Parsing
Published 2016-01-01
URL https://www.aclweb.org/anthology/Q16-1012/
PDF https://www.aclweb.org/anthology/Q16-1012
PWC https://paperswithcode.com/paper/learning-to-make-inferences-in-a-semantic
Repo
Framework

ASPEC: Asian Scientific Paper Excerpt Corpus

Title ASPEC: Asian Scientific Paper Excerpt Corpus
Authors Toshiaki Nakazawa, Manabu Yaguchi, Kiyotaka Uchimoto, Masao Utiyama, Eiichiro Sumita, Sadao Kurohashi, Hitoshi Isahara
Abstract In this paper, we describe the details of the ASPEC (Asian Scientific Paper Excerpt Corpus), which is the first large parallel corpus in the scientific paper domain. ASPEC was constructed in the Japanese-Chinese machine translation project conducted between 2006 and 2010 using the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English scientific paper abstract corpus of approximately 3 million parallel sentences (ASPEC-JE) and a Chinese-Japanese scientific paper excerpt corpus of approximately 0.68 million parallel sentences (ASPEC-JC). ASPEC is used as the official dataset for the machine translation evaluation workshop WAT (Workshop on Asian Translation).
Tasks Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1350/
PDF https://www.aclweb.org/anthology/L16-1350
PWC https://paperswithcode.com/paper/aspec-asian-scientific-paper-excerpt-corpus
Repo
Framework

ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool

Title ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool
Authors Xiaofeng Wu, Jinhua Du, Qun Liu, Andy Way
Abstract This paper presents ProphetMT, a tree-based SMT-driven Controlled Language (CL) authoring and post-editing tool. ProphetMT employs the source-side rules in a translation model and provides them as auto-suggestions to users. Accordingly, one might say that users are writing in a Controlled Language that is understood by the computer. ProphetMT also allows users to easily attach structural information as they compose content. When a specific rule is selected, a partial translation is promptly generated on-the-fly with the help of the structural information. Our experiments conducted on English-to-Chinese show that our proposed ProphetMT system can not only better regularise an author's writing behaviour, but also significantly improve translation fluency which is vital to reduce the post-editing time. Additionally, when the writing and translation process is over, ProphetMT can provide an effective colour scheme to further improve the productivity of post-editors by explicitly featuring the relations between the source and target rules.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1352/
PDF https://www.aclweb.org/anthology/L16-1352
PWC https://paperswithcode.com/paper/prophetmt-a-tree-based-smt-driven-controlled
Repo
Framework

A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation

Title A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation
Authors Marlies van der Wees, Arianna Bisazza, Christof Monz
Abstract A major challenge for statistical machine translation (SMT) of Arabic-to-English user-generated text is the prevalence of text written in Arabizi, or Romanized Arabic. When facing such texts, a translation system trained on conventional Arabic-English data will suffer from extremely low model coverage. In addition, Arabizi is not regulated by any official standardization and therefore highly ambiguous, which prevents rule-based approaches from achieving good translation results. In this paper, we improve Arabizi-to-English machine translation by presenting a simple but effective Arabizi-to-Arabic transliteration pipeline that does not require knowledge by experts or native Arabic speakers. We incorporate this pipeline into a phrase-based SMT system, and show that translation quality after automatically transliterating Arabizi to Arabic yields results that are comparable to those achieved after human transliteration.
Tasks Machine Translation, Transliteration
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-3908/
PDF https://www.aclweb.org/anthology/W16-3908
PWC https://paperswithcode.com/paper/a-simple-but-effective-approach-to-improve
Repo
Framework
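The Arabizi abstract notes that Romanized Arabic follows no official standard and is highly ambiguous. The toy Python sketch below illustrates that ambiguity: each Latin letter or digit maps to one or more Arabic candidates, every combination is expanded, and a lexicon filters the results. This is not the authors' pipeline. `CHAR_MAP` is a small hypothetical subset (the digit conventions 2, 3, 5, 7 are common in Arabizi), the lexicon is a placeholder, and digraphs such as "sh" or "kh" are not handled.

```python
from itertools import product

# Hypothetical, partial character table: each Latin letter or digit maps to
# one or more Arabic candidates ("" = a short vowel that Arabic script omits).
CHAR_MAP = {
    "2": ["ء"], "3": ["ع"], "5": ["خ"], "7": ["ح"],
    "a": ["ا", ""], "b": ["ب"], "h": ["ه"], "k": ["ك"], "l": ["ل"],
    "m": ["م"], "n": ["ن"], "r": ["ر"], "s": ["س", "ص"], "t": ["ت", "ط"],
}

def candidates(token):
    """Enumerate every Arabic-script spelling the table allows for one token."""
    options = [CHAR_MAP.get(ch, [ch]) for ch in token.lower()]
    return {"".join(parts) for parts in product(*options)}

def transliterate(token, lexicon):
    """Return the candidates that appear in the lexicon (sorted), or []."""
    return sorted(candidates(token) & lexicon)

lexicon = {"مرحبا"}  # placeholder; a real system would use a large word list
print(transliterate("mar7aba", lexicon))  # ['مرحبا']
```

The number of candidates grows multiplicatively with every ambiguous character, which is one reason a purely rule-based mapping struggles without a lexicon or language model to choose among them.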

Sarcasm Detection : Building a Contextual Hierarchy

Title Sarcasm Detection : Building a Contextual Hierarchy
Authors Taradheesh Bali, Navjyoti Singh
Abstract The conundrum of understanding and classifying sarcasm has been dealt with by the traditional theorists as an analysis of a sarcastic utterance and the ironic situation that surrounds it. The problem with such an approach is that it is too narrow, as it is unable to sufficiently utilize the two indispensable agents in making such an utterance, viz. the speaker and the listener. It undermines the necessary context required to comprehend a sarcastic utterance. In this paper, we propose a novel approach towards understanding sarcasm in terms of the existing knowledge hierarchy between the two participants, which forms the basis of the context that both agents share. The difference in relationship of the speaker of the sarcastic utterance and the disparate audience found on social media, such as Twitter, is also captured. We then apply our model to a corpus of tweets to achieve significant results and, consequently, shed light on the subjective nature of context, which is contingent on the relation between the speaker and the listener.
Tasks Lexical Analysis, Sarcasm Detection, Sentiment Analysis
Published 2016-12-01
URL https://www.aclweb.org/anthology/W16-4313/
PDF https://www.aclweb.org/anthology/W16-4313
PWC https://paperswithcode.com/paper/sarcasm-detection-building-a-contextual
Repo
Framework

Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects

Title Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects
Authors Diana Bogantes, Eric Rodríguez, Alejandro Arauco, Alejandro Rodríguez, Agata Savary
Abstract This paper describes a pilot study in lexical encoding of multi-word expressions (MWEs) in 4 Latin American dialects of Spanish: Costa Rican, Colombian, Mexican and Peruvian. We describe the variability of MWE usage across dialects. We adapt an existing data model to a dialect-aware encoding, so as to represent dialect-related specificities, while avoiding redundancy of the data common for all dialects. A dozen linguistic properties of MWEs can be expressed in this model, both on the level of a whole MWE and of its individual components. We describe the resulting lexical resource containing several dozen MWEs in four dialects and we propose a method for constructing a web corpus as a support for crowdsourcing examples of MWE occurrences. The resource is available under an open license and paves the way towards a large-scale dialect-aware language resource construction, which should prove useful in both traditional and novel NLP applications.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1358/
PDF https://www.aclweb.org/anthology/L16-1358
PWC https://paperswithcode.com/paper/towards-lexical-encoding-of-multi-word
Repo
Framework
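The dialect-aware encoding in the MWE abstract stores data common to all dialects once and records only dialect-specific differences separately. A minimal Python sketch of that idea follows; `MWEEntry`, its fields, the dialect codes, and the property values are hypothetical and are not taken from the resource's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class MWEEntry:
    """Hypothetical dialect-aware entry for one multi-word expression (MWE).

    `common` holds properties shared by every dialect; `dialects` maps a dialect
    code to only those properties that differ, so shared data is stored once.
    """
    lemma: str
    common: dict = field(default_factory=dict)
    dialects: dict = field(default_factory=dict)

    def view(self, dialect):
        """Return the effective properties for one dialect; overrides win."""
        merged = dict(self.common)
        merged.update(self.dialects.get(dialect, {}))
        return merged

# Illustrative values only, not taken from the resource described above.
entry = MWEEntry(
    lemma="tomar el pelo",
    common={"category": "verbal", "head": "tomar"},
    dialects={"CR": {"register": "informal"}},
)
print(entry.view("CR"))  # {'category': 'verbal', 'head': 'tomar', 'register': 'informal'}
print(entry.view("MX"))  # {'category': 'verbal', 'head': 'tomar'}
```

A dialect with no overrides simply inherits the common properties, which is what avoids duplicating them for every dialect.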

The Trials and Tribulations of Predicting Post-Editing Productivity

Title The Trials and Tribulations of Predicting Post-Editing Productivity
Authors Lena Marg
Abstract While an increasing number of (automatic) metrics is available to assess the linguistic quality of machine translations, their interpretation remains cryptic to many users, specifically in the translation community. They are clearly useful for indicating certain overarching trends, but say little about actual improvements for translation buyers or post-editors. However, these metrics are commonly referenced when discussing pricing and models, both with translation buyers and service providers. With the aim of focusing on automatic metrics that are easier to understand for non-research users, we identified Edit Distance (or Post-Edit Distance) as a good fit. While Edit Distance as such does not express cognitive effort or time spent editing machine translation suggestions, we found that it correlates strongly with the productivity tests we performed, for various language pairs and domains. This paper aims to analyse Edit Distance and productivity data on a segment level based on data gathered over some years. Drawing from these findings, we want to then explore how Edit Distance could help in predicting productivity on new content. Some further analysis is proposed, with findings to be presented at the conference.
Tasks Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1004/
PDF https://www.aclweb.org/anthology/L16-1004
PWC https://paperswithcode.com/paper/the-trials-and-tribulations-of-predicting
Repo
Framework
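The post-editing abstract relies on Edit Distance between machine translation output and its post-edited version. The Python sketch below shows one common formulation: word-level Levenshtein distance normalized by the longer of the two segments, so the score falls between 0.0 (untouched) and 1.0. Whitespace tokenization and this normalization are assumptions; the paper's exact definition may differ, and character-level variants are also in use.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute, or match at cost 0
        prev = curr
    return prev[-1]

def post_edit_distance(mt, post_edited):
    """Word-level edit distance normalized by the longer segment (0.0 = no edits)."""
    mt_tokens, pe_tokens = mt.split(), post_edited.split()
    longest = max(len(mt_tokens), len(pe_tokens))
    if longest == 0:
        return 0.0
    return edit_distance(mt_tokens, pe_tokens) / longest

print(round(post_edit_distance("the cat sat on mat", "the cat sat on the mat"), 3))  # 0.167
```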

Capturing Chat: Annotation and Tools for Multiparty Casual Conversation.

Title Capturing Chat: Annotation and Tools for Multiparty Casual Conversation.
Authors Emer Gilmartin, Nick Campbell
Abstract Casual multiparty conversation is an understudied but very common genre of spoken interaction, whose analysis presents a number of challenges in terms of data scarcity and annotation. We describe the annotation process used on the d64 and DANS multimodal corpora of multiparty casual talk, which have been manually segmented, transcribed, annotated for laughter and disfluencies, and aligned using the Penn Aligner. We also describe a visualization tool, STAVE, developed during the annotation process, which allows long stretches of talk or indeed entire conversations to be viewed, aiding preliminary identification of features and patterns worthy of analysis. It is hoped that this tool will be of use to other researchers working in this field.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1705/
PDF https://www.aclweb.org/anthology/L16-1705
PWC https://paperswithcode.com/paper/capturing-chat-annotation-and-tools-for
Repo
Framework

Demonstrating Ambient Search: Implicit Document Retrieval for Speech Streams

Title Demonstrating Ambient Search: Implicit Document Retrieval for Speech Streams
Authors Benjamin Milde, Jonas Wacker, Stefan Radomski, Max Mühlhäuser, Chris Biemann
Abstract In this demonstration paper we describe Ambient Search, a system that displays and retrieves documents in real time based on speech input. The system operates continuously in ambient mode, i.e. it generates speech transcriptions and identifies main keywords and keyphrases, while also querying its index to display relevant documents without explicit query. Without user intervention, the results are dynamically updated; users can choose to interact with the system at any time, employing a conversation protocol that is enriched with the ambient information gathered continuously. Our evaluation shows that Ambient Search outperforms another implicit speech-based information retrieval system. Ambient Search is available as open-source software.
Tasks Information Retrieval, Keyword Extraction, Speech Recognition
Published 2016-12-01
URL https://www.aclweb.org/anthology/C16-2049/
PDF https://www.aclweb.org/anthology/C16-2049
PWC https://paperswithcode.com/paper/demonstrating-ambient-search-implicit
Repo
Framework
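Ambient Search builds queries implicitly from a running speech transcript. The Python sketch below shows one simple way to do that: keep a sliding window of recognized words and rank them by TF-IDF against a background collection. `AmbientQueryBuilder`, its parameters, and the scoring are assumptions for illustration, not the system's actual keyword extractor.

```python
import math
from collections import Counter, deque

class AmbientQueryBuilder:
    """Hypothetical sketch: derive implicit queries from a stream of transcribed words.

    Keeps the last `window` non-stopword tokens and ranks them by term frequency
    in the window times inverse document frequency from a background collection.
    """

    def __init__(self, doc_freq, num_docs, window=50, top_k=3, stopwords=()):
        self.doc_freq = doc_freq            # word -> number of background docs containing it
        self.num_docs = num_docs            # size of the background collection
        self.words = deque(maxlen=window)   # oldest words fall out automatically
        self.top_k = top_k
        self.stopwords = set(stopwords)

    def add(self, transcript_chunk):
        """Append newly recognized words from the speech recognizer."""
        for word in transcript_chunk.lower().split():
            if word not in self.stopwords:
                self.words.append(word)

    def query(self):
        """Return the top-k keywords for the current window (ties broken alphabetically)."""
        tf = Counter(self.words)

        def score(word):
            # Smoothed IDF: words unseen in the background collection get the highest weight.
            idf = math.log((self.num_docs + 1) / (self.doc_freq.get(word, 0) + 1))
            return tf[word] * idf

        return sorted(tf, key=lambda w: (-score(w), w))[: self.top_k]

builder = AmbientQueryBuilder(
    doc_freq={"neural": 20, "network": 30, "speech": 25, "recognizes": 200},
    num_docs=1000,
    stopwords=("the",),
)
builder.add("The neural network recognizes speech")
builder.add("the network")
print(builder.query())  # ['network', 'neural', 'speech']
```

The returned keywords would then be sent to a document index as an implicit query each time the window changes.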

corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora

Title corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora
Authors Stephan Druskat, Volker Gast, Thomas Krause, Florian Zipser
Abstract This paper introduces an open source, interoperable generic software tool set catering for the entire workflow of creation, migration, annotation, query and analysis of multi-layer linguistic corpora. It consists of four components: Salt, a graph-based meta model and API for linguistic data, the common data model for the rest of the tool set; Pepper, a conversion tool and platform for linguistic data that can be used to convert many different linguistic formats into each other; Atomic, an extensible, platform-independent multi-layer desktop annotation software for linguistic corpora; ANNIS, a search and visualization architecture for multi-layer linguistic corpora with many different visualizations and a powerful native query language. The set was designed to solve the following issues in a multi-layer corpus workflow: Lossless data transition between tools through a common data model generic enough to allow for a potentially unlimited number of different types of annotation, conversion capabilities for different linguistic formats to cater for the processing of data from different sources and/or with existing annotations, a high level of extensibility to enhance the sustainability of the whole tool set, analysis capabilities encompassing corpus and annotation query alongside multi-faceted visualizations of all annotation layers.
Tasks
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1711/
PDF https://www.aclweb.org/anthology/L16-1711
PWC https://paperswithcode.com/paper/corpus-toolsorg-an-interoperable-generic
Repo
Framework

SubCo: A Learner Translation Corpus of Human and Machine Subtitles

Title SubCo: A Learner Translation Corpus of Human and Machine Subtitles
Authors José Manuel Martínez Martínez, Mihaela Vela
Abstract In this paper, we present a freely available corpus of human and automatic translations of subtitles. The corpus comprises the original English subtitles (SRC), both human (HT) and machine translations (MT) into German, as well as post-editions (PE) of the MT output. HT and MT are annotated with errors. Moreover, human evaluation is included in HT, MT, and PE. Such a corpus is a valuable resource for both human and machine translation communities, enabling the direct comparison (in terms of errors and evaluation) between human and machine translations and post-edited machine translations.
Tasks Machine Translation
Published 2016-05-01
URL https://www.aclweb.org/anthology/L16-1357/
PDF https://www.aclweb.org/anthology/L16-1357
PWC https://paperswithcode.com/paper/subco-a-learner-translation-corpus-of-human
Repo
Framework