Paper Group NANR 65
Symmetric Patterns and Coordinations: Fast and Enhanced Representations of Verbs and Adjectives. A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C). Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research. Tools and Instruments for Building and Querying Diach …
Symmetric Patterns and Coordinations: Fast and Enhanced Representations of Verbs and Adjectives
Title | Symmetric Patterns and Coordinations: Fast and Enhanced Representations of Verbs and Adjectives |
Authors | Roy Schwartz, Roi Reichart, Ari Rappoport |
Abstract | |
Tasks | Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1060/ |
PDF | https://www.aclweb.org/anthology/N16-1060 |
PWC | https://paperswithcode.com/paper/symmetric-patterns-and-coordinations-fast-and |
Repo | |
Framework | |
A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C)
Title | A Tangled Web: The Faint Signals of Deception in Text - Boulder Lies and Truth Corpus (BLT-C) |
Authors | Franco Salvetti, John B. Lowe, James H. Martin |
Abstract | We present an approach to creating corpora for use in detecting deception in text, including a discussion of the challenges peculiar to this task. Our approach is based on soliciting several types of reviews from writers and was implemented using Amazon Mechanical Turk. We describe the multi-dimensional corpus of reviews built using this approach, available free of charge from LDC as the Boulder Lies and Truth Corpus (BLT-C). Challenges for both corpus creation and deception detection include the facts that human performance on the task is typically at chance, that the signal is faint, that paid writers such as turkers are sometimes deceptive, and that deception is a complex human behavior; manifestations of deception depend on details of domain, intrinsic properties of the deceiver (such as education, linguistic competence, and the nature of the intention), and specifics of the deceptive act (e.g., lying vs. fabricating). To overcome the inherent lack of ground truth, we have developed a set of semi-automatic techniques to ensure corpus validity. We present some preliminary results on the task of deception detection which suggest that the BLT-C is an improvement in the quality of resources available for this task. |
Tasks | Deception Detection |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1558/ |
PDF | https://www.aclweb.org/anthology/L16-1558 |
PWC | https://paperswithcode.com/paper/a-tangled-web-the-faint-signals-of-deception |
Repo | |
Framework | |
Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research
Title | Corpus Analysis based on Structural Phenomena in Texts: Exploiting TEI Encoding for Linguistic Research |
Authors | Susanne Haaf |
Abstract | This paper asks how linguistic corpus-based research may be enriched by exploiting conceptual text structures and layout as provided via TEI annotation. Examples of possible research areas and usage scenarios are provided based on the German historical corpus of the Deutsches Textarchiv (DTA) project, which has been consistently tagged according to the TEI Guidelines, more specifically to the DTA ›Base Format‹ (DTABf). The paper shows that by including TEI-XML structuring in corpus-based analyses, significant patterns can be observed for different linguistic phenomena, e.g. the development of conceptual text structures themselves, the syntactic embedding of terms in certain conceptual text structures, and phenomena of language change that become evident via the layout of a text. The exemplary study carried out here shows some of the potential of exploiting TEI annotation for linguistic research, which might be kept in mind when making design decisions for new corpora as well as when working with existing TEI corpora. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1692/ |
PDF | https://www.aclweb.org/anthology/L16-1692 |
PWC | https://paperswithcode.com/paper/corpus-analysis-based-on-structural-phenomena |
Repo | |
Framework | |
Tools and Instruments for Building and Querying Diachronic Computational Lexica
Title | Tools and Instruments for Building and Querying Diachronic Computational Lexica |
Authors | Fahad Khan, Andrea Bellandi, Monica Monachini |
Abstract | This article describes work on enabling the addition of temporal information to senses of words in linguistic linked open data lexica based on the lemonDia model. Our contribution in this article is twofold. On the one hand, we demonstrate how lemonDia enables the querying of diachronic lexical datasets using OWL-oriented Semantic Web based technologies. On the other hand, we present a preliminary version of an interactive interface intended to help users in creating lexical datasets that model meaning change over time. |
Tasks | |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4022/ |
PDF | https://www.aclweb.org/anthology/W16-4022 |
PWC | https://paperswithcode.com/paper/tools-and-instruments-for-building-and |
Repo | |
Framework | |
Designing a Speech Corpus for the Development and Evaluation of Dictation Systems in Latvian
Title | Designing a Speech Corpus for the Development and Evaluation of Dictation Systems in Latvian |
Authors | Mārcis Pinnis, Askars Salimbajevs, Ilze Auziņa |
Abstract | In this paper the authors present a speech corpus designed and created for the development and evaluation of dictation systems in Latvian. The corpus consists of over nine hours of orthographically annotated speech from 30 different speakers and features spoken commands that are common in dictation systems for text editors. The corpus is evaluated in an automatic speech recognition (ASR) dictation scenario. The results show that adding the corpus to the acoustic model training data, in combination with language model adaptation, decreases the WER by up to 41.36% relative (16.83% absolute) compared to a baseline system without language model adaptation. Acoustic data augmentation contributes a relative improvement of 12.57% (3.43% absolute). |
Tasks | Data Augmentation, Language Modelling, Speech Recognition |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1124/ |
PDF | https://www.aclweb.org/anthology/L16-1124 |
PWC | https://paperswithcode.com/paper/designing-a-speech-corpus-for-the-development |
Repo | |
Framework | |
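The Latvian dictation abstract reports each gain both as a relative and as an absolute WER reduction. The two are linked through the baseline WER (relative = absolute / baseline), so one can be recovered from the other. The sketch below shows that arithmetic; the helper names are hypothetical, and the baseline it derives is back-calculated from the reported percentages under the assumption that both numbers refer to the same reference system. It is not a figure stated by the authors.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: (baseline - new) / baseline * 100."""
    return (baseline_wer - new_wer) / baseline_wer * 100.0


def implied_baseline(absolute_pct: float, relative_pct: float) -> float:
    """Back-calculate the baseline WER (in %) from an absolute and a relative reduction.

    absolute = baseline - new and relative = absolute / baseline,
    hence baseline = absolute / (relative / 100).
    """
    return absolute_pct / (relative_pct / 100.0)


if __name__ == "__main__":
    # Reported: 16.83 points absolute corresponds to 41.36% relative.
    base = implied_baseline(16.83, 41.36)
    adapted = base - 16.83
    print(f"implied baseline WER: {base:.2f}%")
    print(f"implied adapted WER:  {adapted:.2f}%")
    # Consistency check: recomputing the relative reduction recovers 41.36%.
    print(f"relative reduction:   {relative_reduction(base, adapted):.2f}%")
```

Run as a script, this prints an implied baseline of about 40.69% WER. The same helper applied to the augmentation figures (3.43 absolute, 12.57% relative) gives a different implied reference, which is expected if that contribution was measured against a different system.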
Analyzing Impact, Trend, and Diffusion of Knowledge associated with Neoplasms Research
Title | Analyzing Impact, Trend, and Diffusion of Knowledge associated with Neoplasms Research |
Authors | Min Song |
Abstract | Cancer (a.k.a. neoplasms, in a broader sense) is one of the leading causes of death worldwide, and its incidence is expected to rise. In response to this critical societal need, the cancer research community has made rigorous attempts to develop treatments for cancer. Accordingly, we observe a surge in the sheer volume of research products and outcomes related to neoplasms. In this talk, we introduce the notion of entitymetrics to provide a new lens for understanding the impact, trend, and diffusion of knowledge associated with neoplasms research. To this end, we collected over two million records from PubMed, the most popular search engine in the medical domain. Coupled with text mining techniques including named entity recognition, sentence boundary detection, and approximate string matching, entitymetrics enables us to analyze knowledge diffusion, impact, and trend at various knowledge entity units, such as bio-entity, organization, and country. At the end of the talk, future applications and possible directions of entitymetrics will be discussed. |
Tasks | Boundary Detection, Named Entity Recognition |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/W16-4701/ |
PDF | https://www.aclweb.org/anthology/W16-4701 |
PWC | https://paperswithcode.com/paper/analyzing-impact-trend-and-diffusion-of |
Repo | |
Framework | |
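The entitymetrics abstract lists approximate string matching among the steps that turn noisy mentions (for example, organization names in PubMed affiliations) into entity units that can be counted. The talk does not say which matcher was used, so this sketch swaps in Python's standard-library `difflib.get_close_matches`; the canonical list, the cutoff of 0.8, and the helper names are all hypothetical.

```python
from collections import Counter
from difflib import get_close_matches
from typing import List, Optional

# Hypothetical canonical entity list; a real pipeline would build this from NER output.
CANONICAL = ["National Cancer Institute", "Mayo Clinic", "Karolinska Institute"]


def normalize(mention: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a raw mention to its closest canonical name, or None if nothing is close enough."""
    matches = get_close_matches(mention, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def count_entities(mentions: List[str]) -> Counter:
    """Tally mentions per canonical entity, dropping mentions that match nothing."""
    counts: Counter = Counter()
    for mention in mentions:
        canonical = normalize(mention)
        if canonical is not None:
            counts[canonical] += 1
    return counts
```

For instance, `count_entities(["Natl Cancer Institute", "National Cancer Institute"])` folds both spellings into one entity. Counts of this kind, aggregated by year or country, are the sort of input an impact or diffusion analysis would start from.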
Markov Logic Networks for Text Mining: A Qualitative and Empirical Comparison with Integer Linear Programming
Title | Markov Logic Networks for Text Mining: A Qualitative and Empirical Comparison with Integer Linear Programming |
Authors | Luis Gerardo Mojica de la Vega, Vincent Ng |
Abstract | Joint inference approaches such as Integer Linear Programming (ILP) and Markov Logic Networks (MLNs) have recently been successfully applied to many natural language processing (NLP) tasks, often outperforming their pipeline counterparts. However, MLNs are arguably much less popular among NLP researchers than ILP. While NLP researchers who wish to employ these joint inference frameworks do not necessarily need to understand their theoretical underpinnings, it is imperative that they understand which of them should be applied under what circumstances. With the goal of helping NLP researchers better understand the relative strengths and weaknesses of MLNs and ILP, we compare them along several dimensions of interest, such as expressiveness, ease of use, scalability, and performance. To our knowledge, this is the first systematic comparison of ILP and MLNs on an NLP task. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1695/ |
PDF | https://www.aclweb.org/anthology/L16-1695 |
PWC | https://paperswithcode.com/paper/markov-logic-networks-for-text-mining-a |
Repo | |
Framework | |
Summarizing Behaviours: An Experiment on the Annotation of Call-Centre Conversations
Title | Summarizing Behaviours: An Experiment on the Annotation of Call-Centre Conversations |
Authors | Morena Danieli, Balamurali A R, Evgeny Stepanov, Benoit Favre, Frederic Bechet, Giuseppe Riccardi |
Abstract | Annotating and predicting behavioural aspects of conversations is becoming critical in the conversational analytics industry. In this paper we examine the inter-annotator agreement of agent behaviour dimensions on two call-centre corpora. We find that the task can be annotated consistently over time, but that subjectivity issues affect the quality of the annotation. We suggest reformulating some of the annotated dimensions in order to improve agreement. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1701/ |
PDF | https://www.aclweb.org/anthology/L16-1701 |
PWC | https://paperswithcode.com/paper/summarizing-behaviours-an-experiment-on-the |
Repo | |
Framework | |
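The call-centre abstract measures inter-annotator agreement on behaviour dimensions but does not name the coefficient in this summary. Cohen's kappa is a common choice for two annotators and is assumed here: it corrects raw agreement for the agreement expected by chance from each annotator's label distribution. The label names below are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is chance agreement from the two annotators' marginal label counts.
    """
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # and we return 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for one behaviour dimension on four calls.
annotator_1 = ["polite", "polite", "rude", "polite"]
annotator_2 = ["polite", "rude", "rude", "polite"]
```

Here the annotators agree on 3 of 4 calls (p_o = 0.75), but chance agreement is already 0.5, so kappa is 0.5. This gap between raw and chance-corrected agreement is why subjective dimensions can look more reliable than they are.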
Survey of Conversational Behavior: Towards the Design of a Balanced Corpus of Everyday Japanese Conversation
Title | Survey of Conversational Behavior: Towards the Design of a Balanced Corpus of Everyday Japanese Conversation |
Authors | Hanae Koiso, Tomoyuki Tsuchiya, Ryoko Watanabe, Daisuke Yokomori, Masao Aizawa, Yasuharu Den |
Abstract | In 2016, we began building a large-scale corpus of everyday Japanese conversation: a collection of conversations embedded in naturally occurring activities in daily life. We will collect more than 200 hours of recordings over six years, publishing the corpus in 2022. To construct such a large corpus, we have conducted a pilot project, one of whose purposes is to establish a corpus design for collecting various kinds of everyday conversations in a balanced manner. For this purpose, we surveyed the everyday conversational behavior of about 250 adults, in order to reveal how diverse our everyday conversational behavior is and to build an empirical foundation for corpus design. The questionnaire asked when, where, for how long, with whom, and during what kind of activity informants engaged in conversations. We found that ordinary conversations show the following tendencies: i) they mainly consist of chats, business talks, and consultations; ii) in general, the number of participants is small and the conversations are short; iii) many conversations take place in private places such as homes, as well as in public places such as offices and schools; and iv) some questionnaire items are related to each other. This paper gives an overview of this survey study and then discusses how to design a large-scale corpus of everyday Japanese conversation on this basis. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1702/ |
PDF | https://www.aclweb.org/anthology/L16-1702 |
PWC | https://paperswithcode.com/paper/survey-of-conversational-behavior-towards-the |
Repo | |
Framework | |
Lin|gu|is|tik: Building the Linguist’s Pathway to Bibliographies, Libraries, Language Resources and Linked Open Data
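Title | Lin|gu|is|tik: Building the Linguist’s Pathway to Bibliographies, Libraries, Language Resources and Linked Open Data |
Authors | Christian Chiarcos, Christian Fäth, Heike Renner-Westermann, Frank Abromeit, Vanya Dimitrova |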
Abstract | This paper introduces a novel research tool for the field of linguistics: The Linguistik web portal provides a virtual library which offers scientific information on every linguistic subject. It comprises selected internet sources and databases as well as catalogues for linguistic literature, and addresses an interdisciplinary audience. The virtual library is the most recent outcome of the Special Subject Collection Linguistics of the German Research Foundation (DFG), and also integrates the knowledge accumulated in the Bibliography of Linguistic Literature. In addition to the portal, we describe long-term goals and prospects with a special focus on ongoing efforts regarding an extension towards integrating language resources and Linguistic Linked Open Data. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1707/ |
PDF | https://www.aclweb.org/anthology/L16-1707 |
PWC | https://paperswithcode.com/paper/linguistik-building-the-linguists-pathway-to |
Repo | |
Framework | |
A Web Tool for Building Parallel Corpora of Spoken and Sign Languages
Title | A Web Tool for Building Parallel Corpora of Spoken and Sign Languages |
Authors | Alex Becker, Fabio Kepler, Sara Candeias |
Abstract | In this paper we describe our work on building an online tool for manually annotating texts in any spoken language with SignWriting in any sign language. Such a tool will allow the creation of parallel corpora between spoken and sign languages that can be used to bootstrap the creation of efficient tools for the Deaf community. For example, a parallel corpus between English and American Sign Language could be used to train machine learning models for automatic translation between the two languages. This kind of tool must be designed to ease the task of human annotators, not only by being easy to use but also by giving smart suggestions as the annotation progresses, in order to save time and effort. By building a collaborative, online, easy-to-use annotation tool for parallel corpora between spoken and sign languages, we aim to help develop proper resources for sign languages that can then be used in the state-of-the-art models currently used in tools for spoken languages. Creating this kind of resource raises several issues and difficulties, and our tool already deals with some of them, such as an adequate text representation of a sign and many-to-many alignments between words and signs. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1229/ |
PDF | https://www.aclweb.org/anthology/L16-1229 |
PWC | https://paperswithcode.com/paper/a-web-tool-for-building-parallel-corpora-of |
Repo | |
Framework | |
The Clinical Panel: Leveraging Psychological Expertise During NLP Research
Title | The Clinical Panel: Leveraging Psychological Expertise During NLP Research |
Authors | Glen Coppersmith, Kristy Hollingshead, H. Andrew Schwartz, Molly Ireland, Rebecca Resnik, Kate Loveys, April Foreman, Loring Ingraham |
Abstract | |
Tasks | |
Published | 2016-11-01 |
URL | https://www.aclweb.org/anthology/papers/W16-5617/w16-5617 |
PDF | https://www.aclweb.org/anthology/W16-5617 |
PWC | https://paperswithcode.com/paper/the-clinical-panel-leveraging-psychological |
Repo | |
Framework | |
Bayesian Optimization with Robust Bayesian Neural Networks
Title | Bayesian Optimization with Robust Bayesian Neural Networks |
Authors | Jost Tobias Springenberg, Aaron Klein, Stefan Falkner, Frank Hutter |
Abstract | Bayesian optimization is a prominent method for optimizing expensive-to-evaluate black-box functions that is widely applied to tuning the hyperparameters of machine learning algorithms. Despite its successes, the prototypical Bayesian optimization approach, which uses Gaussian process models, does not scale well to either many hyperparameters or many function evaluations. Addressing this lack of scalability and flexibility is thus one of the key challenges of the field. We present a general approach for using flexible parametric models (neural networks) for Bayesian optimization, staying as close to a truly Bayesian treatment as possible. We obtain scalability through stochastic gradient Hamiltonian Monte Carlo, whose robustness we improve via a scale adaptation. Experiments including multi-task Bayesian optimization with 21 tasks, parallel optimization of deep neural networks, and deep reinforcement learning show the power and flexibility of this approach. |
Tasks | Hyperparameter Optimization |
Published | 2016-12-01 |
URL | http://papers.nips.cc/paper/6117-bayesian-optimization-with-robust-bayesian-neural-networks |
PDF | http://papers.nips.cc/paper/6117-bayesian-optimization-with-robust-bayesian-neural-networks.pdf |
PWC | https://paperswithcode.com/paper/bayesian-optimization-with-robust-bayesian |
Repo | |
Framework | |
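In the Bayesian optimization abstract, the neural-network surrogate is sampled with stochastic gradient Hamiltonian Monte Carlo. Whatever the sampler, the surrogate typically reaches the acquisition function through a predictive mean and variance. This sketch assumes we already have one prediction per posterior sample at each candidate point (the SGHMC sampler itself is not shown). It scores candidates with expected improvement for minimization, a common acquisition function assumed here rather than taken from the paper. The `noise_var` term and all helper names are hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def predictive_moments(sample_preds: Sequence[float], noise_var: float = 0.0) -> Tuple[float, float]:
    """Mean and variance of the predictive distribution from per-sample network outputs.

    The variance is the spread across posterior samples plus an optional
    (hypothetical) homoscedastic observation-noise term.
    """
    mu = fmean(sample_preds)
    var = pvariance(sample_preds, mu) + noise_var
    return mu, var


def expected_improvement(mu: float, var: float, best_y: float) -> float:
    """EI for minimization: E[max(best_y - f, 0)] with f ~ N(mu, var)."""
    sigma = math.sqrt(var)
    if sigma == 0.0:
        return max(best_y - mu, 0.0)
    z = (best_y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best_y - mu) * cdf + sigma * pdf


def pick_next(candidates: Sequence[str],
              preds_per_candidate: Sequence[Sequence[float]],
              best_y: float) -> str:
    """Return the candidate whose sampled predictions give the highest EI."""
    scores = [expected_improvement(*predictive_moments(p), best_y) for p in preds_per_candidate]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best_index]
```

Averaging over posterior samples, rather than using a single trained network, is what lets the acquisition function trade off a low predicted loss against high uncertainty. Without that spread, EI would degenerate to picking the lowest point prediction.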
Parameterized context windows in Random Indexing
Title | Parameterized context windows in Random Indexing |
Authors | Tobias Norlund, David Nilsson, Magnus Sahlgren |
Abstract | |
Tasks | Representation Learning, Sentiment Analysis, Word Embeddings |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1619/ |
PDF | https://www.aclweb.org/anthology/W16-1619 |
PWC | https://paperswithcode.com/paper/parameterized-context-windows-in-random |
Repo | |
Framework | |
Focus Annotation of Task-based Data: Establishing the Quality of Crowd Annotation
Title | Focus Annotation of Task-based Data: Establishing the Quality of Crowd Annotation |
Authors | Kordula De Kuthy, Ramon Ziai, Detmar Meurers |
Abstract | |
Tasks | Reading Comprehension |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1713/ |
PDF | https://www.aclweb.org/anthology/W16-1713 |
PWC | https://paperswithcode.com/paper/focus-annotation-of-task-based-data |
Repo | |
Framework | |