Paper Group NANR 57
Lexical Variability and Compositionality: Investigating Idiomaticity with Distributional Semantic Models
Title | Lexical Variability and Compositionality: Investigating Idiomaticity with Distributional Semantic Models |
Authors | Marco Silvio Giuseppe Senaldi, Gianluca E. Lebani, Alessandro Lenci |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1803/ |
PWC | https://paperswithcode.com/paper/lexical-variability-and-compositionality |
Repo | |
Framework | |
Monitoring Disease Outbreak Events on the Web Using Text-mining Approach and Domain Expert Knowledge
Title | Monitoring Disease Outbreak Events on the Web Using Text-mining Approach and Domain Expert Knowledge |
Authors | Elena Arsevska, Mathieu Roche, Sylvain Falala, Renaud Lancelot, David Chavernac, Pascal Hendrikx, Barbara Dufour |
Abstract | Timeliness and precision in detecting infectious animal disease outbreaks from information published on the web are crucial for preventing their spread. We propose a generic method to enrich and extend the use of different expressions as queries in order to improve the acquisition of relevant disease-related pages on the web. Our method combines a text-mining approach, which extracts terms from corpora of relevant disease outbreak documents, with domain expert elicitation (Delphi method), which proposes expressions and selects relevant combinations of the terms obtained with text mining. In this paper we evaluated the performance, as queries, of expressions obtained with text mining and validated by a domain expert, and of expressions proposed by a panel of 21 domain experts. We used African swine fever as an infectious animal disease model. As queries, the expressions obtained with text mining outperformed the expressions proposed by domain experts; however, the domain experts proposed expressions that were not extracted automatically. Our method is simple to conduct and flexible enough to be adapted to any other infectious animal disease, and even to the public health domain. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1543/ |
PWC | https://paperswithcode.com/paper/monitoring-disease-outbreak-events-on-the-web |
Repo | |
Framework | |
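The query-enrichment step this entry describes, combining extracted terms into candidate web queries, can be sketched as follows. The term lists and the boolean AND-query format are illustrative assumptions, not taken from the paper.

```python
from itertools import product

def build_queries(disease_terms, sign_terms):
    """Pair disease/host terms with clinical-sign terms to form boolean
    web-search queries, in the spirit of the paper's combination of
    text-mined terms (the term lists here are made up)."""
    return [f'"{d}" AND "{s}"' for d, s in product(disease_terms, sign_terms)]

queries = build_queries(["African swine fever", "wild boar"],
                        ["hemorrhagic fever", "sudden death"])
```

Each combination would then be issued as a separate search query and the retrieved pages scored for relevance.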
Annotating and Detecting Medical Events in Clinical Notes
Title | Annotating and Detecting Medical Events in Clinical Notes |
Authors | Prescott Klassen, Fei Xia, Meliha Yetisgen |
Abstract | Early detection and treatment of diseases that onset after a patient is admitted to a hospital, such as pneumonia, is critical to improving and reducing costs in healthcare. Previous studies (Tepper et al., 2013) showed that change-of-state events in clinical notes could be important cues for phenotype detection. In this paper, we extend the annotation schema proposed in (Klassen et al., 2014) to mark change-of-state events, diagnosis events, coordination, and negation. After completing the annotation, we build NLP systems to automatically identify named entities and medical events, which yield f-scores of 94.7% and 91.8%, respectively. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1545/ |
PWC | https://paperswithcode.com/paper/annotating-and-detecting-medical-events-in |
Repo | |
Framework | |
Speech Synthesis of Code-Mixed Text
Title | Speech Synthesis of Code-Mixed Text |
Authors | Sunayana Sitaram, Alan W Black |
Abstract | Most Text to Speech (TTS) systems today assume that the input text is in a single language and is written in the same language that the text needs to be synthesized in. However, in bilingual and multilingual communities, code mixing or code switching occurs in speech, in which speakers switch between languages in the same utterance. Due to the popularity of social media, we now see code-mixing even in text in these multilingual communities. TTS systems capable of synthesizing such text need to be able to handle text that is written in multiple languages and scripts. Code-mixed text poses many challenges to TTS systems, such as language identification, spelling normalization and pronunciation modeling. In this work, we describe a preliminary framework for synthesizing code-mixed text. We carry out experiments on synthesizing code-mixed Hindi and English text. We find that there is a significant user preference for TTS systems that can correctly identify and pronounce words in different languages. |
Tasks | Language Identification, Speech Synthesis |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1546/ |
PWC | https://paperswithcode.com/paper/speech-synthesis-of-code-mixed-text |
Repo | |
Framework | |
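One of the challenges this abstract names, word-level language identification in code-mixed text, can be approximated for Hindi-English by a naive script-based heuristic. This is a common baseline, not the paper's method: any Devanagari codepoint marks a token as Hindi, and Latin-script tokens default to English.

```python
def token_language(token):
    """Naive script-based language ID for Hindi-English code-mixed text:
    any Devanagari character (U+0900..U+097F) marks the token as Hindi
    ('hi'); everything else defaults to English ('en')."""
    if any("\u0900" <= ch <= "\u097F" for ch in token):
        return "hi"
    return "en"

# Mixed utterance: romanized Hindi, English, and Devanagari side by side.
tags = [token_language(t) for t in "mera phone खराब hai".split()]
```

Note that the romanized Hindi words ("mera", "hai") are misclassified as English by this heuristic, which is exactly why the abstract treats language identification of code-mixed text as a nontrivial problem.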
CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis
Title | CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis |
Authors | Nick Campbell |
Abstract | This paper reports the preservation of an old speech synthesis website as a corpus. CHATR was a revolutionary technique developed in the mid-1990s for concatenative speech synthesis. The method has since become the standard for high-quality speech output by computer, although much current research is devoted to parametric or hybrid methods that employ smaller amounts of data and can be tuned more easily to individual voices. The system was first reported in 1994 and the website was functional in 1996. The ATR labs where this system was invented no longer exist, but the website has been preserved as a corpus containing 1537 samples of synthesised speech from that period (118 MB in aiff format) in 211 pages under various finely interrelated themes. The corpus can be accessed from www.speech-data.jp as well as www.tcd-fastnet.com, where the original code and samples are now being maintained. |
Tasks | Speech Synthesis |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1548/ |
PWC | https://paperswithcode.com/paper/chatr-the-corpus-a-20-year-old-archive-of |
Repo | |
Framework | |
Multimodal Resources for Human-Robot Communication Modelling
Title | Multimodal Resources for Human-Robot Communication Modelling |
Authors | Stavroula-Evita Fotinea, Eleni Efthimiou, Maria Koutsombogera, Athanasia-Lida Dimou, Theodore Goulas, Kyriaki Vasilaki |
Abstract | This paper reports on work related to the modelling of Human-Robot Communication on the basis of multimodal and multisensory human behaviour analysis. A primary focus in this framework of analysis is the definition of semantics of human actions in interaction, their capture and their representation in terms of behavioural patterns that, in turn, feed a multimodal human-robot communication system. Semantic analysis encompasses both oral and sign languages, as well as both verbal and non-verbal communicative signals, to achieve effective, natural interaction between elderly users with mild walking and cognitive impairments and an assistive robotic platform. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1551/ |
PWC | https://paperswithcode.com/paper/multimodal-resources-for-human-robot |
Repo | |
Framework | |
A Multimodal Motion-Captured Corpus of Matched and Mismatched Extravert-Introvert Conversational Pairs
Title | A Multimodal Motion-Captured Corpus of Matched and Mismatched Extravert-Introvert Conversational Pairs |
Authors | Jackson Tolins, Kris Liu, Yingying Wang, Jean E. Fox Tree, Marilyn Walker, Michael Neff |
Abstract | This paper presents a new corpus, the Personality Dyads Corpus, consisting of multimodal data for three conversations between three personality-matched, two-person dyads (a total of 9 separate dialogues). Participants were selected from a larger sample to be 0.8 of a standard deviation above or below the mean on the Big-Five Personality extraversion scale, to produce an Extravert-Extravert dyad, an Introvert-Introvert dyad, and an Extravert-Introvert dyad. Each pair carried out conversations for three different tasks. The conversations were recorded using optical motion capture for the body and data gloves for the hands. Dyads{'} speech was transcribed and the gestural and postural behavior was annotated with ANVIL. The released corpus includes personality profiles, ANVIL files containing speech transcriptions and the gestural annotations, and BVH files containing body and hand motion in 3D. |
Tasks | Motion Capture |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1553/ |
PWC | https://paperswithcode.com/paper/a-multimodal-motion-captured-corpus-of |
Repo | |
Framework | |
A Hybrid Deep Learning Architecture for Sentiment Analysis
Title | A Hybrid Deep Learning Architecture for Sentiment Analysis |
Authors | Md Shad Akhtar, Ayush Kumar, Asif Ekbal, Pushpak Bhattacharyya |
Abstract | In this paper, we propose a novel hybrid deep learning architecture which is highly efficient for sentiment analysis in resource-poor languages. We learn sentiment-embedded vectors from a Convolutional Neural Network (CNN). These are augmented with a set of optimized features selected through a multi-objective optimization (MOO) framework. The sentiment-augmented optimized vector obtained at the end is used to train an SVM for sentiment classification. We evaluate our proposed approach for coarse-grained (i.e. sentence-level) as well as fine-grained (i.e. aspect-level) sentiment analysis on four Hindi datasets covering varying domains. To show that our proposed method is generic in nature, we also evaluate it on two benchmark English datasets. Evaluation shows that the results of the proposed method are consistent across all the datasets and often outperform state-of-the-art systems. To the best of our knowledge, this is the very first attempt where such a deep learning model is used for less-resourced languages such as Hindi. |
Tasks | Aspect-Based Sentiment Analysis, Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1047/ |
PWC | https://paperswithcode.com/paper/a-hybrid-deep-learning-architecture-for |
Repo | |
Framework | |
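The fusion step described in this abstract, concatenating CNN-learned sentiment vectors with MOO-selected features before SVM training, amounts to joining two feature matrices column-wise. A minimal sketch with simulated inputs; the dimensions and random data are placeholders, not the paper's actual features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two feature sources the abstract names: sentiment
# vectors learned by a CNN, and hand-crafted features selected via
# multi-objective optimization (both simulated; sizes are arbitrary).
cnn_sentiment_vectors = rng.normal(size=(100, 64))
moo_selected_features = rng.normal(size=(100, 20))

# The fused per-sentence representation that would be fed to the SVM.
fused = np.hstack([cnn_sentiment_vectors, moo_selected_features])
```

In the paper's pipeline this fused matrix, paired with sentiment labels, would then train the SVM classifier.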
Design and Development of the MERLIN Learner Corpus Platform
Title | Design and Development of the MERLIN Learner Corpus Platform |
Authors | Verena Lyding, Karin Schöne |
Abstract | In this paper, we report on the design and development of an online search platform for the MERLIN corpus of learner texts in Czech, German and Italian. It was created in the context of the MERLIN project, which aims at empirically illustrating features of the Common European Framework of Reference (CEFR) for evaluating language competences based on authentic learner text productions compiled into a learner corpus. Furthermore, the project aims at providing access to the corpus through a search interface adapted to the needs of multifaceted target groups involved with language learning and teaching. This article starts by providing a brief overview on the project ambition, the data resource and its intended target groups. Subsequently, the main focus of the article is on the design and development process of the platform, which is carried out in a user-centred fashion. The paper presents the user studies carried out to collect requirements, details the resulting decisions concerning the platform design and its implementation, and reports on the evaluation of the platform prototype and final adjustments. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1392/ |
PWC | https://paperswithcode.com/paper/design-and-development-of-the-merlin-learner |
Repo | |
Framework | |
Lexical Coherence Graph Modeling Using Word Embeddings
Title | Lexical Coherence Graph Modeling Using Word Embeddings |
Authors | Mohsen Mesgar, Michael Strube |
Abstract | |
Tasks | Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1167/ |
PWC | https://paperswithcode.com/paper/lexical-coherence-graph-modeling-using-word |
Repo | |
Framework | |
Proximal Riemannian Pursuit for Large-Scale Trace-Norm Minimization
Title | Proximal Riemannian Pursuit for Large-Scale Trace-Norm Minimization |
Authors | Mingkui Tan, Shijie Xiao, Junbin Gao, Dong Xu, Anton van den Hengel, Qinfeng Shi |
Abstract | Trace-norm regularization plays an important role in many areas such as computer vision and machine learning. When solving general large-scale trace-norm regularized problems, existing methods may be computationally expensive due to many high-dimensional truncated singular value decompositions (SVDs) or to unawareness of matrix ranks. In this paper, we propose a proximal Riemannian pursuit (PRP) paradigm which addresses a sequence of trace-norm regularized subproblems defined on nonlinear matrix varieties. To address the subproblem, we extend the proximal gradient method on vector spaces to nonlinear matrix varieties, in which the SVDs of intermediate solutions are maintained by cheap low-rank QR decompositions, therefore making the proposed method more scalable. Empirical studies on several tasks, such as matrix completion and low-rank-representation-based subspace clustering, demonstrate the competitive performance of the proposed paradigms over existing methods. |
Tasks | Matrix Completion |
Published | 2016-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2016/html/Tan_Proximal_Riemannian_Pursuit_CVPR_2016_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2016/papers/Tan_Proximal_Riemannian_Pursuit_CVPR_2016_paper.pdf |
PWC | https://paperswithcode.com/paper/proximal-riemannian-pursuit-for-large-scale |
Repo | |
Framework | |
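The proximal operator of the trace norm, the building block any proximal method for trace-norm regularization (including the subproblems above) relies on, is singular value thresholding. A minimal dense-matrix sketch; the paper's actual contribution, maintaining SVDs via cheap low-rank QR updates on matrix varieties, is not reproduced here.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the proximal operator of
    tau * ||X||_* at X. Shrinks each singular value by tau and
    clips at zero, which also reduces the rank of the result."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

Each proximal-gradient iteration would apply `svt` to a gradient step, which is exactly the high-dimensional SVD cost the paper's QR-based variety updates are designed to avoid.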
Amrita_CEN at SemEval-2016 Task 1: Semantic Relation from Word Embeddings in Higher Dimension
Title | Amrita_CEN at SemEval-2016 Task 1: Semantic Relation from Word Embeddings in Higher Dimension |
Authors | Barathi Ganesh HB, Anand Kumar M, Soman KP |
Abstract | |
Tasks | Information Retrieval, Machine Translation, Question Answering, Reading Comprehension, Semantic Textual Similarity, Sentence Embedding, Text Summarization, Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1109/ |
PWC | https://paperswithcode.com/paper/amrita_cen-at-semeval-2016-task-1-semantic |
Repo | |
Framework | |
Retrieving Occurrences of Grammatical Constructions
Title | Retrieving Occurrences of Grammatical Constructions |
Authors | Anna Ehrlemark, Richard Johansson, Benjamin Lyngfelt |
Abstract | Finding authentic examples of grammatical constructions is central in constructionist approaches to linguistics, language processing, and second language learning. In this paper, we address this problem as an information retrieval (IR) task. To facilitate research in this area, we built a benchmark collection by annotating the occurrences of six constructions in a Swedish corpus. Furthermore, we implemented a simple and flexible retrieval system for finding construction occurrences, in which the user specifies a ranking function using lexical-semantic similarities (lexicon-based or distributional). The system was evaluated using standard IR metrics on the new benchmark, and we saw that lexical-semantic rerankers improve significantly over a purely surface-oriented system, but must be carefully tailored to each individual construction. |
Tasks | Information Retrieval |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1078/ |
PWC | https://paperswithcode.com/paper/retrieving-occurrences-of-grammatical |
Repo | |
Framework | |
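The retrieval system above lets the user specify a ranking function based on lexical-semantic similarities. With the query and the candidate hits represented as vectors (a distributional-similarity stand-in; the vectors below are invented), reranking reduces to sorting by cosine similarity:

```python
import numpy as np

def rerank(query_vec, candidate_vecs):
    """Return candidate indices sorted by descending cosine similarity
    to the query vector."""
    def cosine(u, v):
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    scores = [cosine(query_vec, v) for v in candidate_vecs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])

order = rerank(np.array([1.0, 0.0]),
               [np.array([0.0, 1.0]), np.array([1.0, 0.1])])
```

The paper's finding that rerankers must be tailored per construction would correspond here to choosing different vector representations or similarity functions per construction.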
Finding Alternative Translations in a Large Corpus of Movie Subtitles
Title | Finding Alternative Translations in a Large Corpus of Movie Subtitles |
Authors | Jörg Tiedemann |
Abstract | OpenSubtitles.org provides a large collection of user contributed subtitles in various languages for movies and TV programs. Subtitle translations are valuable resources for cross-lingual studies and machine translation research. A less explored feature of the collection is the inclusion of alternative translations, which can be very useful for training paraphrase systems or collecting multi-reference test suites for machine translation. However, differences in translation may also be due to misspellings, incomplete or corrupt data files, or wrongly aligned subtitles. This paper reports our efforts in recognising and classifying alternative subtitle translations with language independent techniques. We use time-based alignment with lexical re-synchronisation techniques and BLEU score filters and sort alternative translations into categories using edit distance metrics and heuristic rules. Our approach produces large numbers of sentence-aligned translation alternatives for over 50 languages provided via the OPUS corpus collection. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1559/ |
PWC | https://paperswithcode.com/paper/finding-alternative-translations-in-a-large |
Repo | |
Framework | |
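The sorting of alternative subtitle translations "using edit distance metrics and heuristic rules" can be sketched as follows. The normalized-distance threshold and the category names are illustrative assumptions, not the paper's actual rules.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def classify_pair(s1, s2, near=0.2):
    """Sort a pair of aligned subtitle lines into rough categories by
    length-normalized edit distance (the threshold is a made-up example:
    small distances suggest spelling variants or minor corrections,
    large ones genuine alternative translations)."""
    if s1 == s2:
        return "identical"
    d = levenshtein(s1, s2) / max(len(s1), len(s2))
    return "minor-variant" if d <= near else "alternative"
```

Pairs labelled "alternative" are the interesting ones for paraphrase training and multi-reference test suites; "minor-variant" pairs are more likely misspellings or corrupt duplicates.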
Exploiting a Large Strongly Comparable Corpus
Title | Exploiting a Large Strongly Comparable Corpus |
Authors | Thierry Etchegoyhen, Andoni Azpeitia, Naiara Pérez |
Abstract | This article describes a large comparable corpus for Basque and Spanish and the methods employed to build a parallel resource from the original data. The EITB corpus, a strongly comparable corpus in the news domain, is to be shared with the research community, as an aid for the development and testing of methods in comparable corpora exploitation, and as basis for the improvement of data-driven machine translation systems for this language pair. Competing approaches were explored for the alignment of comparable segments in the corpus, resulting in the design of a simple method which outperformed a state-of-the-art method on the corpus test sets. The method we present is highly portable, computationally efficient, and significantly reduces deployment work, a welcome result for the exploitation of comparable corpora. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1560/ |
PWC | https://paperswithcode.com/paper/exploiting-a-large-strongly-comparable-corpus |
Repo | |
Framework | |
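A simple lexical-overlap aligner illustrates the kind of comparable-segment alignment this abstract evaluates. This greedy token-set Jaccard sketch is a hypothetical stand-in for the paper's method; a real Basque-Spanish aligner would first map tokens through a bilingual lexicon before comparing sets.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between two token sets."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def align_segments(src, tgt, threshold=0.5):
    """Greedily pair each source segment with its highest-Jaccard
    target segment, keeping only pairs above a similarity threshold
    (the threshold value is an arbitrary illustration)."""
    pairs = []
    for i, s in enumerate(src):
        scores = [jaccard(s.split(), t.split()) for t in tgt]
        best = max(range(len(tgt)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((i, best))
    return pairs

pairs = align_segments(["the cat sat", "a dog ran"],
                       ["a dog ran fast", "the cat sat down"])
```

The aligned pairs would then feed data-driven machine translation training, as the abstract describes for the EITB corpus.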