May 5, 2019


Paper Group NANR 65


Symmetric Patterns and Coordinations: Fast and Enhanced Representations of Verbs and Adjectives


Title	Symmetric Patterns and Coordinations: Fast and Enhanced Representations of Verbs and Adjectives
Authors	Roy Schwartz, Roi Reichart, Ari Rappoport
Abstract
Tasks	Word Embeddings
Published	2016-06-01
URL	https://www.aclweb.org/anthology/N16-1060/
PDF	https://www.aclweb.org/anthology/N16-1060
PWC	https://paperswithcode.com/paper/symmetric-patterns-and-coordinations-fast-and
Repo
Framework

A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)


Title	A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)
Authors	Franco Salvetti, John B. Lowe, James H. Martin
Abstract	We present an approach to creating corpora for use in detecting deception in text, including a discussion of the challenges peculiar to this task. Our approach is based on soliciting several types of reviews from writers and was implemented using Amazon Mechanical Turk. We describe the multi-dimensional corpus of reviews built using this approach, available free of charge from LDC as the Boulder Lies and Truth Corpus (BLT-C). Challenges for both corpus creation and deception detection include the facts that human performance on the task is typically at chance, that the signal is faint, that paid writers such as turkers are sometimes deceptive, and that deception is a complex human behavior: manifestations of deception depend on details of the domain, intrinsic properties of the deceiver (such as education, linguistic competence, and the nature of the intention), and specifics of the deceptive act (e.g., lying vs. fabricating). To overcome the inherent lack of ground truth, we have developed a set of semi-automatic techniques to ensure corpus validity. We present some preliminary results on the task of deception detection which suggest that the BLT-C is an improvement in the quality of resources available for this task.
Tasks	Deception Detection
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1558/
PDF	https://www.aclweb.org/anthology/L16-1558
PWC	https://paperswithcode.com/paper/a-tangled-web-the-faint-signals-of-deception
Repo
Framework

Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research


Title	Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research
Authors	Susanne Haaf
Abstract	This paper poses the question of how linguistic corpus-based research may be enriched by exploiting conceptual text structures and layout as provided via TEI annotation. Examples of possible areas of research and usage scenarios are provided based on the German historical corpus of the Deutsches Textarchiv (DTA) project, which has been consistently tagged according to the TEI Guidelines, more specifically to the DTA ›Base Format‹ (DTABf). The paper shows that by including TEI-XML structuring in corpus-based analyses, significant results can be observed for different linguistic phenomena, e.g. the development of conceptual text structures themselves, the syntactic embedding of terms in certain conceptual text structures, and phenomena of language change which become obvious via the layout of a text. The exemplary study carried out here shows some of the potential of exploiting TEI annotation for linguistic research, which might be kept in mind when making design decisions for new corpora as well as when working with existing TEI corpora.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1692/
PDF	https://www.aclweb.org/anthology/L16-1692
PWC	https://paperswithcode.com/paper/corpus-analysis-based-on-structural-phenomena
Repo
Framework

Tools and Instruments for Building and Querying Diachronic Computational Lexica


Title	Tools and Instruments for Building and Querying Diachronic Computational Lexica
Authors	Fahad Khan, Andrea Bellandi, Monica Monachini
Abstract	This article describes work on enabling the addition of temporal information to senses of words in linguistic linked open data lexica based on the lemonDia model. Our contribution in this article is twofold. On the one hand, we demonstrate how lemonDia enables the querying of diachronic lexical datasets using OWL-oriented Semantic Web based technologies. On the other hand, we present a preliminary version of an interactive interface intended to help users in creating lexical datasets that model meaning change over time.
Tasks
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4022/
PDF	https://www.aclweb.org/anthology/W16-4022
PWC	https://paperswithcode.com/paper/tools-and-instruments-for-building-and
Repo
Framework

Designing a Speech Corpus for the Development and Evaluation of Dictation Systems in Latvian


Title	Designing a Speech Corpus for the Development and Evaluation of Dictation Systems in Latvian
Authors	Mārcis Pinnis, Askars Salimbajevs, Ilze Auziņa
Abstract	In this paper the authors present a speech corpus designed and created for the development and evaluation of dictation systems in Latvian. The corpus consists of over nine hours of orthographically annotated speech from 30 different speakers. The corpus features spoken commands that are common in dictation systems for text editors. The corpus is evaluated in an automatic speech recognition (ASR) scenario. Evaluation results in an ASR dictation scenario show that adding the corpus to the acoustic model training data, in combination with language model adaptation, decreases the WER by up to 41.36% relative (16.83% absolute) compared to a baseline system without language model adaptation. The contribution of acoustic data augmentation is 12.57% relative (3.43% absolute).
Tasks	Data Augmentation, Language Modelling, Speech Recognition
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1124/
PDF	https://www.aclweb.org/anthology/L16-1124
PWC	https://paperswithcode.com/paper/designing-a-speech-corpus-for-the-development
Repo
Framework

Analyzing Impact, Trend, and Diffusion of Knowledge associated with Neoplasms Research


Title	Analyzing Impact, Trend, and Diffusion of Knowledge associated with Neoplasms Research
Authors	Min Song
Abstract	Cancer (a.k.a. neoplasms in a broader sense) is one of the leading causes of death worldwide, and its incidence is expected to increase. In response to this critical societal need, the cancer research community has made rigorous attempts to develop treatments for cancer. Accordingly, we observe a surge in the sheer volume of research products and outcomes related to neoplasms. In this talk, we introduce the notion of entitymetrics to provide a new lens for understanding the impact, trend, and diffusion of knowledge associated with neoplasms research. To this end, we collected over two million records from PubMed, the most popular search engine in the medical domain. Coupled with text mining techniques including named entity recognition, sentence boundary detection, and approximate string matching, entitymetrics enables us to analyze knowledge diffusion, impact, and trend at various knowledge entity units, such as bio-entity, organization, and country. At the end of the talk, future applications and possible directions of entitymetrics will be discussed.
Tasks	Boundary Detection, Named Entity Recognition
Published	2016-12-01
URL	https://www.aclweb.org/anthology/W16-4701/
PDF	https://www.aclweb.org/anthology/W16-4701
PWC	https://paperswithcode.com/paper/analyzing-impact-trend-and-diffusion-of
Repo
Framework

Markov Logic Networks for Text Mining: A Qualitative and Empirical Comparison with Integer Linear Programming


Title	Markov Logic Networks for Text Mining: A Qualitative and Empirical Comparison with Integer Linear Programming
Authors	Luis Gerardo Mojica de la Vega, Vincent Ng
Abstract	Joint inference approaches such as Integer Linear Programming (ILP) and Markov Logic Networks (MLNs) have recently been successfully applied to many natural language processing (NLP) tasks, often outperforming their pipeline counterparts. However, MLNs are arguably much less popular among NLP researchers than ILP. While NLP researchers who desire to employ these joint inference frameworks do not necessarily have to understand their theoretical underpinnings, it is imperative that they understand which of them should be applied under what circumstances. With the goal of helping NLP researchers better understand the relative strengths and weaknesses of MLNs and ILP, we compare them along different dimensions of interest, such as expressiveness, ease of use, scalability, and performance. To our knowledge, this is the first systematic comparison of ILP and MLNs on an NLP task.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1695/
PDF	https://www.aclweb.org/anthology/L16-1695
PWC	https://paperswithcode.com/paper/markov-logic-networks-for-text-mining-a
Repo
Framework

Summarizing Behaviours: An Experiment on the Annotation of Call-Centre Conversations


Title	Summarizing Behaviours: An Experiment on the Annotation of Call-Centre Conversations
Authors	Morena Danieli, Balamurali A R, Evgeny Stepanov, Benoit Favre, Frederic Bechet, Giuseppe Riccardi
Abstract	Annotating and predicting behavioural aspects in conversations is becoming critical in the conversational analytics industry. In this paper we look into inter-annotator agreement of agent behaviour dimensions on two call-centre corpora. We find that the task can be annotated consistently over time, but that subjectivity issues impact the quality of the annotation. We suggest reformulating some of the annotated dimensions in order to improve agreement.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1701/
PDF	https://www.aclweb.org/anthology/L16-1701
PWC	https://paperswithcode.com/paper/summarizing-behaviours-an-experiment-on-the
Repo
Framework

Survey of Conversational Behavior: Towards the Design of a Balanced Corpus of Everyday Japanese Conversation


Title	Survey of Conversational Behavior: Towards the Design of a Balanced Corpus of Everyday Japanese Conversation
Authors	Hanae Koiso, Tomoyuki Tsuchiya, Ryoko Watanabe, Daisuke Yokomori, Masao Aizawa, Yasuharu Den
Abstract	In 2016, we set about building a large-scale corpus of everyday Japanese conversation: a collection of conversations embedded in naturally occurring activities in daily life. We will collect more than 200 hours of recordings over six years, publishing the corpus in 2022. To construct such a huge corpus, we have conducted a pilot project, one of whose purposes is to establish a corpus design for collecting various kinds of everyday conversations in a balanced manner. For this purpose, we conducted a survey of everyday conversational behavior with about 250 adults, in order to reveal how diverse our everyday conversational behavior is and to build an empirical foundation for corpus design. The questionnaire asked when, where, for how long, and with whom informants engaged in conversations, and in what kind of activity. We found that ordinary conversations show the following tendencies: i) they mainly consist of chats, business talks, and consultations; ii) in general, the number of participants is small and the duration of the conversation is short; iii) many conversations are conducted in private places such as homes, as well as in public places such as offices and schools; and iv) some questionnaire items are related to each other. This paper gives an overview of this survey study and then discusses how to design a large-scale corpus of everyday Japanese conversation on this basis.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1702/
PDF	https://www.aclweb.org/anthology/L16-1702
PWC	https://paperswithcode.com/paper/survey-of-conversational-behavior-towards-the
Repo
Framework

Lin|gu|is|tik: Building the Linguist’s Pathway to Bibliographies, Libraries, Language Resources and Linked Open Data


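Title	Lin|gu|is|tik: Building the Linguist’s Pathway to Bibliographies, Libraries, Language Resources and Linked Open Data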
Authors	Christian Chiarcos, Christian Fäth, Heike Renner-Westermann, Frank Abromeit, Vanya Dimitrova
Abstract	This paper introduces a novel research tool for the field of linguistics: The Linguistik web portal provides a virtual library which offers scientific information on every linguistic subject. It comprises selected internet sources and databases as well as catalogues for linguistic literature, and addresses an interdisciplinary audience. The virtual library is the most recent outcome of the Special Subject Collection Linguistics of the German Research Foundation (DFG), and also integrates the knowledge accumulated in the Bibliography of Linguistic Literature. In addition to the portal, we describe long-term goals and prospects with a special focus on ongoing efforts regarding an extension towards integrating language resources and Linguistic Linked Open Data.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1707/
PDF	https://www.aclweb.org/anthology/L16-1707
PWC	https://paperswithcode.com/paper/linguistik-building-the-linguists-pathway-to
Repo
Framework

A Web Tool for Building Parallel Corpora of Spoken and Sign Languages


Title	A Web Tool for Building Parallel Corpora of Spoken and Sign Languages
Authors	Alex Becker, Fabio Kepler, Sara Candeias
Abstract	In this paper we describe our work on building an online tool for manually annotating texts in any spoken language with SignWriting in any sign language. The existence of such a tool will allow the creation of parallel corpora between spoken and sign languages that can be used to bootstrap the creation of efficient tools for the Deaf community. As an example, a parallel corpus between English and American Sign Language could be used for training machine learning models for automatic translation between the two languages. This kind of tool must be designed to ease the task of human annotators, not only by being easy to use, but also by giving smart suggestions as the annotation progresses, in order to save time and effort. By building a collaborative, online, easy-to-use annotation tool for building parallel corpora between spoken and sign languages, we aim to help develop proper resources for sign languages that can then be used in the state-of-the-art models currently used in tools for spoken languages. There are several issues and difficulties in creating this kind of resource, and our tool already deals with some of them, such as adequate text representation of a sign and many-to-many alignments between words and signs.
Tasks
Published	2016-05-01
URL	https://www.aclweb.org/anthology/L16-1229/
PDF	https://www.aclweb.org/anthology/L16-1229
PWC	https://paperswithcode.com/paper/a-web-tool-for-building-parallel-corpora-of
Repo
Framework

The Clinical Panel: Leveraging Psychological Expertise During NLP Research


Title	The Clinical Panel: Leveraging Psychological Expertise During NLP Research
Authors	Glen Coppersmith, Kristy Hollingshead, H. Andrew Schwartz, Molly Ireland, Rebecca Resnik, Kate Loveys, April Foreman, Loring Ingraham
Abstract
Tasks
Published	2016-11-01
URL	https://www.aclweb.org/anthology/papers/W16-5617/w16-5617
PDF	https://www.aclweb.org/anthology/W16-5617
PWC	https://paperswithcode.com/paper/the-clinical-panel-leveraging-psychological
Repo
Framework

Bayesian Optimization with Robust Bayesian Neural Networks


Title	Bayesian Optimization with Robust Bayesian Neural Networks
Authors	Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, Frank Hutter
Abstract	Bayesian optimization is a prominent method for optimizing expensive-to-evaluate black-box functions that is widely applied to tuning the hyperparameters of machine learning algorithms. Despite its successes, the prototypical Bayesian optimization approach, using Gaussian process models, does not scale well to either many hyperparameters or many function evaluations. Attacking this lack of scalability and flexibility is thus one of the key challenges of the field. We present a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible. We obtain scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness we improve via a scale adaptation. Experiments including multi-task Bayesian optimization with 21 tasks, parallel optimization of deep neural networks and deep reinforcement learning show the power and flexibility of this approach.
Tasks	Hyperparameter Optimization
Published	2016-12-01
URL	http://papers.nips.cc/paper/6117-bayesian-optimization-with-robust-bayesian-neural-networks
PDF	http://papers.nips.cc/paper/6117-bayesian-optimization-with-robust-bayesian-neural-networks.pdf
PWC	https://paperswithcode.com/paper/bayesian-optimization-with-robust-bayesian
Repo
Framework

Parameterized context windows in Random Indexing


Title	Parameterized context windows in Random Indexing
Authors	Tobias Norlund, David Nilsson, Magnus Sahlgren
Abstract
Tasks	Representation Learning, Sentiment Analysis, Word Embeddings
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-1619/
PDF	https://www.aclweb.org/anthology/W16-1619
PWC	https://paperswithcode.com/paper/parameterized-context-windows-in-random
Repo
Framework

Focus Annotation of Task-based Data: Establishing the Quality of Crowd Annotation


Title	Focus Annotation of Task-based Data: Establishing the Quality of Crowd Annotation
Authors	Kordula De Kuthy, Ramon Ziai, Detmar Meurers
Abstract
Tasks	Reading Comprehension
Published	2016-08-01
URL	https://www.aclweb.org/anthology/W16-1713/
PDF	https://www.aclweb.org/anthology/W16-1713
PWC	https://paperswithcode.com/paper/focus-annotation-of-task-based-data
Repo
Framework