Paper Group NANR 47
The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database. A New Feature Selection Technique Combined with ELM Feature Space for Text Classification. AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis. A Language-Independent Neural Network for Event Detection. Learning to Make Inferences in a Semantic Parsing Task …
The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database
Title | The VU Sound Corpus: Adding More Fine-grained Annotations to the Freesound Database |
Authors | Emiel van Miltenburg, Benjamin Timmermans, Lora Aroyo |
Abstract | This paper presents a collection of annotations (tags or keywords) for a set of 2,133 environmental sounds taken from the Freesound database (www.freesound.org). The annotations are acquired through an open-ended crowd-labeling task, in which participants were asked to provide keywords for each of three sounds. The main goal of this study is to find out (i) whether it is feasible to collect keywords for a large collection of sounds through crowdsourcing, and (ii) how people talk about sounds, and what information they can infer from hearing a sound in isolation. Our main finding is that it is not only feasible to perform crowd-labeling for a large collection of sounds, it is also very useful to highlight different aspects of the sounds that authors may fail to mention. Our data is freely available, and can be used to ground semantic models, improve search in audio databases, and to study the language of sound. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1337/ |
PWC | https://paperswithcode.com/paper/the-vu-sound-corpus-adding-more-fine-grained |
Repo | |
Framework | |
A New Feature Selection Technique Combined with ELM Feature Space for Text Classification
Title | A New Feature Selection Technique Combined with ELM Feature Space for Text Classification |
Authors | Rajendra Kumar Roul, Pranav Rai |
Abstract | |
Tasks | Feature Selection, Text Categorization, Text Classification |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-6335/ |
PWC | https://paperswithcode.com/paper/a-new-feature-selection-technique-combined |
Repo | |
Framework | |
AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis
Title | AVAB-DBS: an Audio-Visual Affect Bursts Database for Synthesis |
Authors | Kevin El Haddad, Hüseyin Çakmak, Stéphane Dupont, Thierry Dutoit
Abstract | It has been shown that adding expressivity and emotional expressions to an agent's communication systems would improve the interaction quality between this agent and a human user. In this paper we present a multimodal database of affect bursts, which are very short non-verbal expressions with facial, vocal, and gestural components that are highly synchronized and triggered by an identifiable event. This database contains motion capture and audio data of affect bursts representing disgust, startle and surprise recorded at three different levels of arousal each. This database is to be used for synthesis purposes in order to generate affect bursts of these emotions on a continuous arousal level scale. |
Tasks | Motion Capture |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1345/ |
PWC | https://paperswithcode.com/paper/avab-dbs-an-audio-visual-affect-bursts |
Repo | |
Framework | |
A Language-Independent Neural Network for Event Detection
Title | A Language-Independent Neural Network for Event Detection |
Authors | Xiaocheng Feng, Lifu Huang, Duyu Tang, Heng Ji, Bing Qin, Ting Liu |
Abstract | |
Tasks | Feature Engineering |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/P16-2011/ |
PWC | https://paperswithcode.com/paper/a-language-independent-neural-network-for |
Repo | |
Framework | |
Learning to Make Inferences in a Semantic Parsing Task
Title | Learning to Make Inferences in a Semantic Parsing Task |
Authors | Kyle Richardson, Jonas Kuhn |
Abstract | We introduce a new approach to training a semantic parser that uses textual entailment judgements as supervision. These judgements are based on high-level inferences about whether the meaning of one sentence follows from another. When applied to an existing semantic parsing task, they prove to be a useful tool for revealing semantic distinctions and background knowledge not captured in the target representations. This information is used to improve the quality of the semantic representations being learned and to acquire generic knowledge for reasoning. Experiments are done on the benchmark Sportscaster corpus (Chen and Mooney, 2008), and a novel RTE-inspired inference dataset is introduced. On this new dataset our method substantially outperforms several strong baselines. Separately, we obtain state-of-the-art results on the original Sportscaster semantic parsing task. |
Tasks | Machine Translation, Natural Language Inference, Question Answering, Semantic Parsing |
Published | 2016-01-01 |
URL | https://www.aclweb.org/anthology/Q16-1012/ |
PWC | https://paperswithcode.com/paper/learning-to-make-inferences-in-a-semantic |
Repo | |
Framework | |
ASPEC: Asian Scientific Paper Excerpt Corpus
Title | ASPEC: Asian Scientific Paper Excerpt Corpus |
Authors | Toshiaki Nakazawa, Manabu Yaguchi, Kiyotaka Uchimoto, Masao Utiyama, Eiichiro Sumita, Sadao Kurohashi, Hitoshi Isahara |
Abstract | In this paper, we describe the details of the ASPEC (Asian Scientific Paper Excerpt Corpus), which is the first large-scale parallel corpus in the scientific paper domain. ASPEC was constructed in the Japanese-Chinese machine translation project conducted between 2006 and 2010 using the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English scientific paper abstract corpus of approximately 3 million parallel sentences (ASPEC-JE) and a Chinese-Japanese scientific paper excerpt corpus of approximately 0.68 million parallel sentences (ASPEC-JC). ASPEC is used as the official dataset for the machine translation evaluation workshop WAT (Workshop on Asian Translation). |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1350/ |
PWC | https://paperswithcode.com/paper/aspec-asian-scientific-paper-excerpt-corpus |
Repo | |
Framework | |
ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool
Title | ProphetMT: A Tree-based SMT-driven Controlled Language Authoring/Post-Editing Tool |
Authors | Xiaofeng Wu, Jinhua Du, Qun Liu, Andy Way |
Abstract | This paper presents ProphetMT, a tree-based SMT-driven Controlled Language (CL) authoring and post-editing tool. ProphetMT employs the source-side rules in a translation model and provides them as auto-suggestions to users. Accordingly, one might say that users are writing in a Controlled Language that is understood by the computer. ProphetMT also allows users to easily attach structural information as they compose content. When a specific rule is selected, a partial translation is promptly generated on-the-fly with the help of the structural information. Our experiments conducted on English-to-Chinese show that our proposed ProphetMT system can not only better regularise an author's writing behaviour, but also significantly improve translation fluency which is vital to reduce the post-editing time. Additionally, when the writing and translation process is over, ProphetMT can provide an effective colour scheme to further improve the productivity of post-editors by explicitly featuring the relations between the source and target rules. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1352/ |
PWC | https://paperswithcode.com/paper/prophetmt-a-tree-based-smt-driven-controlled |
Repo | |
Framework | |
A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation
Title | A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation |
Authors | Marlies van der Wees, Arianna Bisazza, Christof Monz |
Abstract | A major challenge for statistical machine translation (SMT) of Arabic-to-English user-generated text is the prevalence of text written in Arabizi, or Romanized Arabic. When facing such texts, a translation system trained on conventional Arabic-English data will suffer from extremely low model coverage. In addition, Arabizi is not regulated by any official standardization and therefore highly ambiguous, which prevents rule-based approaches from achieving good translation results. In this paper, we improve Arabizi-to-English machine translation by presenting a simple but effective Arabizi-to-Arabic transliteration pipeline that does not require knowledge by experts or native Arabic speakers. We incorporate this pipeline into a phrase-based SMT system, and show that translation quality after automatically transliterating Arabizi to Arabic yields results that are comparable to those achieved after human transliteration. |
Tasks | Machine Translation, Transliteration |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-3908/ |
PWC | https://paperswithcode.com/paper/a-simple-but-effective-approach-to-improve |
Repo | |
Framework | |
Sarcasm Detection : Building a Contextual Hierarchy
Title | Sarcasm Detection : Building a Contextual Hierarchy |
Authors | Taradheesh Bali, Navjyoti Singh |
Abstract | The conundrum of understanding and classifying sarcasm has been dealt with by the traditional theorists as an analysis of a sarcastic utterance and the ironic situation that surrounds it. The problem with such an approach is that it is too narrow, as it is unable to sufficiently utilize the two indispensable agents in making such an utterance, viz. the speaker and the listener. It undermines the necessary context required to comprehend a sarcastic utterance. In this paper, we propose a novel approach towards understanding sarcasm in terms of the existing knowledge hierarchy between the two participants, which forms the basis of the context that both agents share. The difference in relationship between the speaker of the sarcastic utterance and the disparate audience found on social media, such as Twitter, is also captured. We then apply our model to a corpus of tweets to achieve significant results and, consequently, shed light on the subjective nature of context, which is contingent on the relation between the speaker and the listener. |
Tasks | Lexical Analysis, Sarcasm Detection, Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4313/ |
PWC | https://paperswithcode.com/paper/sarcasm-detection-building-a-contextual |
Repo | |
Framework | |
Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects
Title | Towards Lexical Encoding of Multi-Word Expressions in Spanish Dialects |
Authors | Diana Bogantes, Eric Rodríguez, Alejandro Arauco, Alejandro Rodríguez, Agata Savary
Abstract | This paper describes a pilot study in lexical encoding of multi-word expressions (MWEs) in 4 Latin American dialects of Spanish: Costa Rican, Colombian, Mexican and Peruvian. We describe the variability of MWE usage across dialects. We adapt an existing data model to a dialect-aware encoding, so as to represent dialect-related specificities, while avoiding redundancy of the data common to all dialects. A dozen linguistic properties of MWEs can be expressed in this model, both at the level of a whole MWE and of its individual components. We describe the resulting lexical resource containing several dozen MWEs in four dialects and we propose a method for constructing a web corpus as a support for crowdsourcing examples of MWE occurrences. The resource is available under an open license and paves the way towards large-scale dialect-aware language resource construction, which should prove useful in both traditional and novel NLP applications. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1358/ |
PWC | https://paperswithcode.com/paper/towards-lexical-encoding-of-multi-word |
Repo | |
Framework | |
The Trials and Tribulations of Predicting Post-Editing Productivity
Title | The Trials and Tribulations of Predicting Post-Editing Productivity |
Authors | Lena Marg |
Abstract | While an increasing number of (automatic) metrics is available to assess the linguistic quality of machine translations, their interpretation remains cryptic to many users, specifically in the translation community. They are clearly useful for indicating certain overarching trends, but say little about actual improvements for translation buyers or post-editors. However, these metrics are commonly referenced when discussing pricing and models, both with translation buyers and service providers. With the aim of focusing on automatic metrics that are easier to understand for non-research users, we identified Edit Distance (or Post-Edit Distance) as a good fit. While Edit Distance as such does not express cognitive effort or time spent editing machine translation suggestions, we found that it correlates strongly with the productivity tests we performed, across various language pairs and domains. This paper analyses Edit Distance and productivity data at the segment level, based on data gathered over several years. Drawing on these findings, we then explore how Edit Distance could help in predicting productivity on new content. Some further analysis is proposed, with findings to be presented at the conference. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1004/ |
PWC | https://paperswithcode.com/paper/the-trials-and-tribulations-of-predicting |
Repo | |
Framework | |
Capturing Chat: Annotation and Tools for Multiparty Casual Conversation.
Title | Capturing Chat: Annotation and Tools for Multiparty Casual Conversation. |
Authors | Emer Gilmartin, Nick Campbell |
Abstract | Casual multiparty conversation is an understudied but very common genre of spoken interaction, whose analysis presents a number of challenges in terms of data scarcity and annotation. We describe the annotation process used on the d64 and DANS multimodal corpora of multiparty casual talk, which have been manually segmented, transcribed, annotated for laughter and disfluencies, and aligned using the Penn Aligner. We also describe a visualization tool, STAVE, developed during the annotation process, which allows long stretches of talk or indeed entire conversations to be viewed, aiding preliminary identification of features and patterns worthy of analysis. It is hoped that this tool will be of use to other researchers working in this field. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1705/ |
PWC | https://paperswithcode.com/paper/capturing-chat-annotation-and-tools-for |
Repo | |
Framework | |
Demonstrating Ambient Search: Implicit Document Retrieval for Speech Streams
Title | Demonstrating Ambient Search: Implicit Document Retrieval for Speech Streams |
Authors | Benjamin Milde, Jonas Wacker, Stefan Radomski, Max Mühlhäuser, Chris Biemann
Abstract | In this demonstration paper we describe Ambient Search, a system that displays and retrieves documents in real time based on speech input. The system operates continuously in ambient mode, i.e. it generates speech transcriptions and identifies main keywords and keyphrases, while also querying its index to display relevant documents without an explicit query. Without user intervention, the results are dynamically updated; users can choose to interact with the system at any time, employing a conversation protocol that is enriched with the ambient information gathered continuously. Our evaluation shows that Ambient Search outperforms another implicit speech-based information retrieval system. Ambient Search is available as open source software. |
Tasks | Information Retrieval, Keyword Extraction, Speech Recognition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-2049/ |
PWC | https://paperswithcode.com/paper/demonstrating-ambient-search-implicit |
Repo | |
Framework | |
corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora
Title | corpus-tools.org: An Interoperable Generic Software Tool Set for Multi-layer Linguistic Corpora |
Authors | Stephan Druskat, Volker Gast, Thomas Krause, Florian Zipser |
Abstract | This paper introduces an open source, interoperable generic software tool set catering for the entire workflow of creation, migration, annotation, query and analysis of multi-layer linguistic corpora. It consists of four components: Salt, a graph-based meta model and API for linguistic data, the common data model for the rest of the tool set; Pepper, a conversion tool and platform for linguistic data that can be used to convert many different linguistic formats into each other; Atomic, an extensible, platform-independent multi-layer desktop annotation software for linguistic corpora; ANNIS, a search and visualization architecture for multi-layer linguistic corpora with many different visualizations and a powerful native query language. The set was designed to solve the following issues in a multi-layer corpus workflow: lossless data transition between tools through a common data model generic enough to allow for a potentially unlimited number of different types of annotation; conversion capabilities for different linguistic formats to cater for the processing of data from different sources and/or with existing annotations; a high level of extensibility to enhance the sustainability of the whole tool set; and analysis capabilities encompassing corpus and annotation query alongside multi-faceted visualizations of all annotation layers. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1711/ |
PWC | https://paperswithcode.com/paper/corpus-toolsorg-an-interoperable-generic |
Repo | |
Framework | |
SubCo: A Learner Translation Corpus of Human and Machine Subtitles
Title | SubCo: A Learner Translation Corpus of Human and Machine Subtitles |
Authors | José Manuel Martínez Martínez, Mihaela Vela
Abstract | In this paper, we present a freely available corpus of human and automatic translations of subtitles. The corpus comprises the original English subtitles (SRC), both human (HT) and machine translations (MT) into German, as well as post-editions (PE) of the MT output. HT and MT are annotated with errors. Moreover, human evaluation is included in HT, MT, and PE. Such a corpus is a valuable resource for both the human and machine translation communities, enabling the direct comparison – in terms of errors and evaluation – between human translations, machine translations, and post-edited machine translations. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1357/ |
PWC | https://paperswithcode.com/paper/subco-a-learner-translation-corpus-of-human |
Repo | |
Framework | |