Paper Group NANR 47
The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database. A New Feature Selection Technique Combined with ELM Feature Space for Text Classification. AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis. A Language-Independent Neural Network for Event Detection. Learning to Make Inferences in a Semantic Parsing Task …
The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database
Title | The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database |
Authors | Emiel van Miltenburg, Benjamin Timmermans, Lora Aroyo |
Abstract | This paper presents a collection of annotations (tags or keywords) for a set of 2,133 environmental sounds taken from the Freesound database (www.freesound.org). The annotations are acquired through an open-ended crowd-labeling task, in which participants were asked to provide keywords for each of three sounds. The main goal of this study is to find out (i) whether it is feasible to collect keywords for a large collection of sounds through crowdsourcing, and (ii) how people talk about sounds, and what information they can infer from hearing a sound in isolation. Our main finding is that it is not only feasible to perform crowd-labeling for a large collection of sounds, it is also very useful to highlight different aspects of the sounds that authors may fail to mention. Our data is freely available, and can be used to ground semantic models, improve search in audio databases, and to study the language of sound. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1337/ |
PWC | https://paperswithcode.com/paper/the-vu-sound-corpus-adding-more-fine-grained |
Repo | |
Framework | |
A New Feature Selection Technique Combined with ELM Feature Space for Text Classification
Title | A New Feature Selection Technique Combined with ELM Feature Space for Text Classification |
Authors | Rajendra Kumar Roul, Pranav Rai |
Abstract | |
Tasks | Feature Selection, Text Categorization, Text Classification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-6335/ |
PWC | https://paperswithcode.com/paper/a-new-feature-selection-technique-combined |
Repo | |
Framework | |
AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis
Title | AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis |
Authors | Kevin El Haddad, Hüseyin Çakmak, Stéphane Dupont, Thierry Dutoit
Abstract | It has been shown that adding expressivity and emotional expressions to an agent's communication systems would improve the interaction quality between this agent and a human user. In this paper we present a multimodal database of affect bursts, which are very short non-verbal expressions with facial, vocal, and gestural components that are highly synchronized and triggered by an identifiable event. This database contains motion capture and audio data of affect bursts representing disgust, startle and surprise recorded at three different levels of arousal each. This database is to be used for synthesis purposes in order to generate affect bursts of these emotions on a continuous arousal level scale. |
Tasks | Motion Capture |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1345/ |
PWC | https://paperswithcode.com/paper/avab-dbs-an-audio-visual-affect-bursts |
Repo | |
Framework | |
A Language-Independent Neural Network for Event Detection
Title | A Language-Independent Neural Network for Event Detection |
Authors | Xiaocheng Feng, Lifu Huang, Duyu Tang, Heng Ji, Bing Qin, Ting Liu |
Abstract | |
Tasks | Feature Engineering |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-2011/ |
PWC | https://paperswithcode.com/paper/a-language-independent-neural-network-for |
Repo | |
Framework | |
Learning to Make Inferences in a Semantic Parsing Task
Title | Learning to Make Inferences in a Semantic Parsing Task |
Authors | Kyle Richardson, Jonas Kuhn |
Abstract | We introduce a new approach to training a semantic parser that uses textual entailment judgements as supervision. These judgements are based on high-level inferences about whether the meaning of one sentence follows from another. When applied to an existing semantic parsing task, they prove to be a useful tool for revealing semantic distinctions and background knowledge not captured in the target representations. This information is used to improve the quality of the semantic representations being learned and to acquire generic knowledge for reasoning. Experiments are done on the benchmark Sportscaster corpus (Chen and Mooney, 2008), and a novel RTE-inspired inference dataset is introduced. On this new dataset our method substantially outperforms several strong baselines. Separately, we obtain state-of-the-art results on the original Sportscaster semantic parsing task. |
Tasks | Machine Translation, Natural Language Inference, Question Answering, Semantic Parsing |
Published | 2016-01-01 |
URL | https://www.aclweb.org/anthology/Q16-1012/ |
PWC | https://paperswithcode.com/paper/learning-to-make-inferences-in-a-semantic |
Repo | |
Framework | |
ASPEC: Asian Scientific Paper Excerpt Corpus
Title | ASPEC: Asian Scientific Paper Excerpt Corpus |
Authors | Toshiaki Nakazawa, Manabu Yaguchi, Kiyotaka Uchimoto, Masao Utiyama, Eiichiro Sumita, Sadao Kurohashi, Hitoshi Isahara |
Abstract | In this paper, we describe the details of the ASPEC (Asian Scientific Paper Excerpt Corpus), which is the first large-scale parallel corpus in the scientific paper domain. ASPEC was constructed in the Japanese-Chinese machine translation project conducted between 2006 and 2010 using the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English scientific paper abstract corpus of approximately 3 million parallel sentences (ASPEC-JE) and a Chinese-Japanese scientific paper excerpt corpus of approximately 0.68 million parallel sentences (ASPEC-JC). ASPEC is used as the official dataset for the machine translation evaluation workshop WAT (Workshop on Asian Translation). |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1350/ |
PWC | https://paperswithcode.com/paper/aspec-asian-scientific-paper-excerpt-corpus |
Repo | |
Framework | |
ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool
Title | ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool |
Authors | Xiaofeng Wu, Jinhua Du, Qun Liu, Andy Way |
Abstract | This paper presents ProphetMT, a tree-based SMT-driven Controlled Language (CL) authoring and post-editing tool. ProphetMT employs the source-side rules in a translation model and provides them as auto-suggestions to users. Accordingly, one might say that users are writing in a Controlled Language that is understood by the computer. ProphetMT also allows users to easily attach structural information as they compose content. When a specific rule is selected, a partial translation is promptly generated on-the-fly with the help of the structural information. Our experiments conducted on English-to-Chinese show that our proposed ProphetMT system can not only better regularise an author's writing behaviour, but also significantly improve translation fluency which is vital to reduce the post-editing time. Additionally, when the writing and translation process is over, ProphetMT can provide an effective colour scheme to further improve the productivity of post-editors by explicitly featuring the relations between the source and target rules. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1352/ |
PWC | https://paperswithcode.com/paper/prophetmt-a-tree-based-smt-driven-controlled |
Repo | |
Framework | |
A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation
Title | A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation |
Authors | Marlies van der Wees, Arianna Bisazza, Christof Monz |
Abstract | A major challenge for statistical machine translation (SMT) of Arabic-to-English user-generated text is the prevalence of text written in Arabizi, or Romanized Arabic. When facing such texts, a translation system trained on conventional Arabic-English data will suffer from extremely low model coverage. In addition, Arabizi is not regulated by any official standardization and therefore highly ambiguous, which prevents rule-based approaches from achieving good translation results. In this paper, we improve Arabizi-to-English machine translation by presenting a simple but effective Arabizi-to-Arabic transliteration pipeline that does not require knowledge by experts or native Arabic speakers. We incorporate this pipeline into a phrase-based SMT system, and show that translation quality after automatically transliterating Arabizi to Arabic yields results that are comparable to those achieved after human transliteration. |
Tasks | Machine Translation, Transliteration |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3908/ |
PWC | https://paperswithcode.com/paper/a-simple-but-effective-approach-to-improve |
Repo | |
Framework | |
Sarcasm Detection : Building a Contextual Hierarchy
Title | Sarcasm Detection : Building a Contextual Hierarchy |
Authors | Taradheesh Bali, Navjyoti Singh |
Abstract | The conundrum of understanding and classifying sarcasm has been dealt with by the traditional theorists as an analysis of a sarcastic utterance and the ironic situation that surrounds it. The problem with such an approach is that it is too narrow, as it is unable to sufficiently utilize the two indispensable agents in making such an utterance, viz. the speaker and the listener. It undermines the necessary context required to comprehend a sarcastic utterance. In this paper, we propose a novel approach towards understanding sarcasm in terms of the existing knowledge hierarchy between the two participants, which forms the basis of the context that both agents share. The difference in relationship between the speaker of the sarcastic utterance and the disparate audience found on social media, such as Twitter, is also captured. We then apply our model to a corpus of tweets to achieve significant results and, consequently, shed light on the subjective nature of context, which is contingent on the relation between the speaker and the listener. |
Tasks | Lexical Analysis, Sarcasm Detection, Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4313/ |
PWC | https://paperswithcode.com/paper/sarcasm-detection-building-a-contextual |
Repo | |
Framework | |
Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects
Title | Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects |
Authors | Diana Bogantes, Eric Rodríguez, Alejandro Arauco, Alejandro Rodríguez, Agata Savary
Abstract | This paper describes a pilot study in lexical encoding of multi-word expressions (MWEs) in 4 Latin American dialects of Spanish: Costa Rican, Colombian, Mexican and Peruvian. We describe the variability of MWE usage across dialects. We adapt an existing data model to a dialect-aware encoding, so as to represent dialect-related specificities, while avoiding redundancy of the data common to all dialects. A dozen linguistic properties of MWEs can be expressed in this model, both at the level of a whole MWE and of its individual components. We describe the resulting lexical resource containing several dozen MWEs in four dialects and we propose a method for constructing a web corpus as a support for crowdsourcing examples of MWE occurrences. The resource is available under an open license and paves the way towards large-scale dialect-aware language resource construction, which should prove useful in both traditional and novel NLP applications. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1358/ |
PWC | https://paperswithcode.com/paper/towards-lexical-encoding-of-multi-word |
Repo | |
Framework | |
The Trials and Tribulations of Predicting Post-Editing Productivity
Title | The Trials and Tribulations of Predicting Post-Editing Productivity |
Authors | Lena Marg |
Abstract | While an increasing number of (automatic) metrics is available to assess the linguistic quality of machine translations, their interpretation remains cryptic to many users, specifically in the translation community. They are clearly useful for indicating certain overarching trends, but say little about actual improvements for translation buyers or post-editors. However, these metrics are commonly referenced when discussing pricing and models, both with translation buyers and service providers. With the aim of focusing on automatic metrics that are easier to understand for non-research users, we identified Edit Distance (or Post-Edit Distance) as a good fit. While Edit Distance as such does not express cognitive effort or time spent editing machine translation suggestions, we found that it correlates strongly with the productivity tests we performed, across various language pairs and domains. This paper analyses Edit Distance and productivity data at the segment level, based on data gathered over several years. Drawing on these findings, we then explore how Edit Distance could help in predicting productivity on new content. Some further analysis is proposed, with findings to be presented at the conference. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1004/ |
PWC | https://paperswithcode.com/paper/the-trials-and-tribulations-of-predicting |
Repo | |
Framework | |
Capturing Chat: Annotation and Tools for Multiparty Casual Conversation.
Title | Capturing Chat: Annotation and Tools for Multiparty Casual Conversation. |
Authors | Emer Gilmartin, Nick Campbell |
Abstract | Casual multiparty conversation is an understudied but very common genre of spoken interaction, whose analysis presents a number of challenges in terms of data scarcity and annotation. We describe the annotation process used on the d64 and DANS multimodal corpora of multiparty casual talk, which have been manually segmented, transcribed, annotated for laughter and disfluencies, and aligned using the Penn Aligner. We also describe a visualization tool, STAVE, developed during the annotation process, which allows long stretches of talk or indeed entire conversations to be viewed, aiding preliminary identification of features and patterns worthy of analysis. It is hoped that this tool will be of use to other researchers working in this field. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1705/ |
PWC | https://paperswithcode.com/paper/capturing-chat-annotation-and-tools-for |
Repo | |
Framework | |
Demonstrating Ambient Search: Implicit Document Retrieval for Speech Streams
Title | Demonstrating Ambient Search: Implicit Document Retrieval for Speech Streams |
Authors | Benjamin Milde, Jonas Wacker, Stefan Radomski, Max Mühlhäuser, Chris Biemann
Abstract | In this demonstration paper we describe Ambient Search, a system that displays and retrieves documents in real time based on speech input. The system operates continuously in ambient mode, i.e. it generates speech transcriptions and identifies main keywords and keyphrases, while also querying its index to display relevant documents without an explicit query. Without user intervention, the results are dynamically updated; users can choose to interact with the system at any time, employing a conversation protocol that is enriched with the ambient information gathered continuously. Our evaluation shows that Ambient Search outperforms another implicit speech-based information retrieval system. Ambient Search is available as open source software. |
Tasks | Information Retrieval, Keyword Extraction, Speech Recognition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2049/ |
PWC | https://paperswithcode.com/paper/demonstrating-ambient-search-implicit |
Repo | |
Framework | |
corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora
Title | corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora |
Authors | Stephan Druskat, Volker Gast, Thomas Krause, Florian Zipser |
Abstract | This paper introduces an open source, interoperable generic software tool set catering for the entire workflow of creation, migration, annotation, query and analysis of multi-layer linguistic corpora. It consists of four components: Salt, a graph-based meta model and API for linguistic data, the common data model for the rest of the tool set; Pepper, a conversion tool and platform for linguistic data that can be used to convert many different linguistic formats into each other; Atomic, an extensible, platform-independent multi-layer desktop annotation software for linguistic corpora; ANNIS, a search and visualization architecture for multi-layer linguistic corpora with many different visualizations and a powerful native query language. The set was designed to solve the following issues in a multi-layer corpus workflow: lossless data transition between tools through a common data model generic enough to allow for a potentially unlimited number of different types of annotation; conversion capabilities for different linguistic formats to cater for the processing of data from different sources and/or with existing annotations; a high level of extensibility to enhance the sustainability of the whole tool set; and analysis capabilities encompassing corpus and annotation query alongside multi-faceted visualizations of all annotation layers. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1711/ |
PWC | https://paperswithcode.com/paper/corpus-toolsorg-an-interoperable-generic |
Repo | |
Framework | |
SubCo: A Learner Translation Corpus of Human and Machine Subtitles
Title | SubCo: A Learner Translation Corpus of Human and Machine Subtitles |
Authors | José Manuel Martínez Martínez, Mihaela Vela
Abstract | In this paper, we present a freely available corpus of human and automatic translations of subtitles. The corpus comprises the original English subtitles (SRC), both human (HT) and machine translations (MT) into German, as well as post-editions (PE) of the MT output. HT and MT are annotated with errors. Moreover, human evaluation is included in HT, MT, and PE. Such a corpus is a valuable resource for both the human and machine translation communities, enabling the direct comparison – in terms of errors and evaluation – between human translations, machine translations, and post-edited machine translations. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1357/ |
PWC | https://paperswithcode.com/paper/subco-a-learner-translation-corpus-of-human |
Repo | |
Framework | |