Paper Group NANR 57
Lexical Variability and Compositionality: Investigating Idiomaticity with Distributional Semantic Models
Title | Lexical Variability and Compositionality: Investigating Idiomaticity with Distributional Semantic Models |
Authors | Marco Silvio Giuseppe Senaldi, Gianluca E. Lebani, Alessandro Lenci |
Abstract | |
Tasks | |
Published | 2016-08-01 |
URL | https://www.aclweb.org/anthology/W16-1803/ |
PWC | https://paperswithcode.com/paper/lexical-variability-and-compositionality |
Repo | |
Framework | |
Monitoring Disease Outbreak Events on the Web Using Text-mining Approach and Domain Expert Knowledge
Title | Monitoring Disease Outbreak Events on the Web Using Text-mining Approach and Domain Expert Knowledge |
Authors | Elena Arsevska, Mathieu Roche, Sylvain Falala, Renaud Lancelot, David Chavernac, Pascal Hendrikx, Barbara Dufour |
Abstract | Timeliness and precision in detecting infectious animal disease outbreaks from information published on the web are crucial for preventing their spread. We propose a generic method to enrich and extend the use of different expressions as queries in order to improve the acquisition of relevant disease-related pages on the web. Our method combines a text-mining approach, which extracts terms from corpora of relevant disease outbreak documents, with domain expert elicitation (Delphi method), which proposes expressions and selects relevant combinations of the terms obtained with text mining. In this paper we evaluated the performance, as queries, of expressions obtained with text mining and validated by a domain expert, and of expressions proposed by a panel of 21 domain experts. We used African swine fever as an infectious animal disease model. As queries, the expressions obtained with text mining outperformed the expressions proposed by domain experts; however, the domain experts proposed expressions that were not extracted automatically. Our method is simple to conduct and flexible enough to be adapted to any other infectious animal disease, and even to the public health domain. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1543/ |
PWC | https://paperswithcode.com/paper/monitoring-disease-outbreak-events-on-the-web |
Repo | |
Framework | |
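The query-enrichment step this entry describes, combining extracted terms into candidate web queries, can be sketched as follows. The term lists and the boolean AND-query format are illustrative assumptions, not taken from the paper.

```python
from itertools import product

def build_queries(disease_terms, sign_terms):
    """Pair disease/host terms with clinical-sign terms to form boolean
    web-search queries, in the spirit of the paper's combination of
    text-mined terms (the term lists here are made up)."""
    return [f'"{d}" AND "{s}"' for d, s in product(disease_terms, sign_terms)]

queries = build_queries(["African swine fever", "wild boar"],
                        ["hemorrhagic fever", "sudden death"])
```

Each combination would then be issued as a separate search query and the retrieved pages scored for relevance.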
Annotating and Detecting Medical Events in Clinical Notes
Title | Annotating and Detecting Medical Events in Clinical Notes |
Authors | Prescott Klassen, Fei Xia, Meliha Yetisgen |
Abstract | Early detection and treatment of diseases that onset after a patient is admitted to a hospital, such as pneumonia, is critical to improving and reducing costs in healthcare. Previous studies (Tepper et al., 2013) showed that change-of-state events in clinical notes could be important cues for phenotype detection. In this paper, we extend the annotation schema proposed in (Klassen et al., 2014) to mark change-of-state events, diagnosis events, coordination, and negation. After completing the annotation, we build NLP systems to automatically identify named entities and medical events, which yield f-scores of 94.7% and 91.8%, respectively. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1545/ |
PWC | https://paperswithcode.com/paper/annotating-and-detecting-medical-events-in |
Repo | |
Framework | |
Speech Synthesis of Code-Mixed Text
Title | Speech Synthesis of Code-Mixed Text |
Authors | Sunayana Sitaram, Alan W Black |
Abstract | Most Text to Speech (TTS) systems today assume that the input text is in a single language and is written in the same language that the text needs to be synthesized in. However, in bilingual and multilingual communities, code mixing or code switching occurs in speech, in which speakers switch between languages in the same utterance. Due to the popularity of social media, we now see code-mixing even in text in these multilingual communities. TTS systems capable of synthesizing such text need to be able to handle text that is written in multiple languages and scripts. Code-mixed text poses many challenges to TTS systems, such as language identification, spelling normalization and pronunciation modeling. In this work, we describe a preliminary framework for synthesizing code-mixed text. We carry out experiments on synthesizing code-mixed Hindi and English text. We find that there is a significant user preference for TTS systems that can correctly identify and pronounce words in different languages. |
Tasks | Language Identification, Speech Synthesis |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1546/ |
PWC | https://paperswithcode.com/paper/speech-synthesis-of-code-mixed-text |
Repo | |
Framework | |
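One of the challenges this abstract names, word-level language identification in code-mixed text, can be approximated for Hindi-English by a naive script-based heuristic. This is a common baseline, not the paper's method: any Devanagari codepoint marks a token as Hindi, and Latin-script tokens default to English.

```python
def token_language(token):
    """Naive script-based language ID for Hindi-English code-mixed text:
    any Devanagari character (U+0900..U+097F) marks the token as Hindi
    ('hi'); everything else defaults to English ('en')."""
    if any("\u0900" <= ch <= "\u097F" for ch in token):
        return "hi"
    return "en"

# Mixed utterance: romanized Hindi, English, and Devanagari side by side.
tags = [token_language(t) for t in "mera phone खराब hai".split()]
```

Note that the romanized Hindi words ("mera", "hai") are misclassified as English by this heuristic, which is exactly why the abstract treats language identification of code-mixed text as a nontrivial problem.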
CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis
Title | CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis |
Authors | Nick Campbell |
Abstract | This paper reports the preservation of an old speech synthesis website as a corpus. CHATR was a revolutionary technique developed in the mid-1990s for concatenative speech synthesis. The method has since become the standard for high-quality speech output by computer, although much current research is devoted to parametric or hybrid methods that employ smaller amounts of data and can be tuned more easily to individual voices. The system was first reported in 1994 and the website was functional in 1996. The ATR labs where this system was invented no longer exist, but the website has been preserved as a corpus containing 1537 samples of synthesised speech from that period (118 MB in aiff format) in 211 pages under various finely interrelated themes. The corpus can be accessed from www.speech-data.jp as well as www.tcd-fastnet.com, where the original code and samples are now being maintained. |
Tasks | Speech Synthesis |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1548/ |
PWC | https://paperswithcode.com/paper/chatr-the-corpus-a-20-year-old-archive-of |
Repo | |
Framework | |
Multimodal Resources for Human-Robot Communication Modelling
Title | Multimodal Resources for Human-Robot Communication Modelling |
Authors | Stavroula-Evita Fotinea, Eleni Efthimiou, Maria Koutsombogera, Athanasia-Lida Dimou, Theodore Goulas, Kyriaki Vasilaki |
Abstract | This paper reports on work related to the modelling of Human-Robot Communication on the basis of multimodal and multisensory human behaviour analysis. A primary focus in this framework of analysis is the definition of semantics of human actions in interaction, their capture and their representation in terms of behavioural patterns that, in turn, feed a multimodal human-robot communication system. Semantic analysis encompasses both oral and sign languages, as well as both verbal and non-verbal communicative signals, to achieve effective, natural interaction between elderly users with mild walking and cognitive impairments and an assistive robotic platform. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1551/ |
PWC | https://paperswithcode.com/paper/multimodal-resources-for-human-robot |
Repo | |
Framework | |
A Multimodal Motion-Captured Corpus of Matched and Mismatched Extravert-Introvert Conversational Pairs
Title | A Multimodal Motion-Captured Corpus of Matched and Mismatched Extravert-Introvert Conversational Pairs |
Authors | Jackson Tolins, Kris Liu, Yingying Wang, Jean E. Fox Tree, Marilyn Walker, Michael Neff |
Abstract | This paper presents a new corpus, the Personality Dyads Corpus, consisting of multimodal data for three conversations between three personality-matched, two-person dyads (a total of 9 separate dialogues). Participants were selected from a larger sample to be 0.8 of a standard deviation above or below the mean on the Big-Five Personality extraversion scale, to produce an Extravert-Extravert dyad, an Introvert-Introvert dyad, and an Extravert-Introvert dyad. Each pair carried out conversations for three different tasks. The conversations were recorded using optical motion capture for the body and data gloves for the hands. Dyads{'} speech was transcribed and the gestural and postural behavior was annotated with ANVIL. The released corpus includes personality profiles, ANVIL files containing speech transcriptions and the gestural annotations, and BVH files containing body and hand motion in 3D. |
Tasks | Motion Capture |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1553/ |
PWC | https://paperswithcode.com/paper/a-multimodal-motion-captured-corpus-of |
Repo | |
Framework | |
A Hybrid Deep Learning Architecture for Sentiment Analysis
Title | A Hybrid Deep Learning Architecture for Sentiment Analysis |
Authors | Md Shad Akhtar, Ayush Kumar, Asif Ekbal, Pushpak Bhattacharyya |
Abstract | In this paper, we propose a novel hybrid deep learning architecture which is highly efficient for sentiment analysis in resource-poor languages. We learn sentiment-embedded vectors from a Convolutional Neural Network (CNN). These are augmented with a set of optimized features selected through a multi-objective optimization (MOO) framework. The sentiment-augmented optimized vector obtained at the end is used to train an SVM for sentiment classification. We evaluate our proposed approach for coarse-grained (i.e. sentence-level) as well as fine-grained (i.e. aspect-level) sentiment analysis on four Hindi datasets covering varying domains. To show that our proposed method is generic in nature, we also evaluate it on two benchmark English datasets. Evaluation shows that the results of the proposed method are consistent across all the datasets and often outperform state-of-the-art systems. To the best of our knowledge, this is the very first attempt where such a deep learning model is used for less-resourced languages such as Hindi. |
Tasks | Aspect-Based Sentiment Analysis, Sentiment Analysis |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1047/ |
PWC | https://paperswithcode.com/paper/a-hybrid-deep-learning-architecture-for |
Repo | |
Framework | |
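The fusion step described in this abstract, concatenating CNN-learned sentiment vectors with MOO-selected features before SVM training, amounts to joining two feature matrices column-wise. A minimal sketch with simulated inputs; the dimensions and random data are placeholders, not the paper's actual features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two feature sources the abstract names: sentiment
# vectors learned by a CNN, and hand-crafted features selected via
# multi-objective optimization (both simulated; sizes are arbitrary).
cnn_sentiment_vectors = rng.normal(size=(100, 64))
moo_selected_features = rng.normal(size=(100, 20))

# The fused per-sentence representation that would be fed to the SVM.
fused = np.hstack([cnn_sentiment_vectors, moo_selected_features])
```

In the paper's pipeline this fused matrix, paired with sentiment labels, would then train the SVM classifier.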
Design and Development of the MERLIN Learner Corpus Platform
Title | Design and Development of the MERLIN Learner Corpus Platform |
Authors | Verena Lyding, Karin Schöne |
Abstract | In this paper, we report on the design and development of an online search platform for the MERLIN corpus of learner texts in Czech, German and Italian. It was created in the context of the MERLIN project, which aims at empirically illustrating features of the Common European Framework of Reference (CEFR) for evaluating language competences based on authentic learner text productions compiled into a learner corpus. Furthermore, the project aims at providing access to the corpus through a search interface adapted to the needs of multifaceted target groups involved with language learning and teaching. This article starts by providing a brief overview on the project ambition, the data resource and its intended target groups. Subsequently, the main focus of the article is on the design and development process of the platform, which is carried out in a user-centred fashion. The paper presents the user studies carried out to collect requirements, details the resulting decisions concerning the platform design and its implementation, and reports on the evaluation of the platform prototype and final adjustments. |
Tasks | |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1392/ |
PWC | https://paperswithcode.com/paper/design-and-development-of-the-merlin-learner |
Repo | |
Framework | |
Lexical Coherence Graph Modeling Using Word Embeddings
Title | Lexical Coherence Graph Modeling Using Word Embeddings |
Authors | Mohsen Mesgar, Michael Strube |
Abstract | |
Tasks | Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/N16-1167/ |
PWC | https://paperswithcode.com/paper/lexical-coherence-graph-modeling-using-word |
Repo | |
Framework | |
Proximal Riemannian Pursuit for Large-Scale Trace-Norm Minimization
Title | Proximal Riemannian Pursuit for Large-Scale Trace-Norm Minimization |
Authors | Mingkui Tan, Shijie Xiao, Junbin Gao, Dong Xu, Anton van den Hengel, Qinfeng Shi |
Abstract | Trace-norm regularization plays an important role in many areas such as computer vision and machine learning. When solving general large-scale trace-norm regularized problems, existing methods may be computationally expensive due to many high-dimensional truncated singular value decompositions (SVDs) or to unawareness of matrix ranks. In this paper, we propose a proximal Riemannian pursuit (PRP) paradigm which addresses a sequence of trace-norm regularized subproblems defined on nonlinear matrix varieties. To address the subproblem, we extend the proximal gradient method on vector spaces to nonlinear matrix varieties, in which the SVDs of intermediate solutions are maintained by cheap low-rank QR decompositions, therefore making the proposed method more scalable. Empirical studies on several tasks, such as matrix completion and low-rank-representation-based subspace clustering, demonstrate the competitive performance of the proposed paradigms over existing methods. |
Tasks | Matrix Completion |
Published | 2016-06-01 |
URL | http://openaccess.thecvf.com/content_cvpr_2016/html/Tan_Proximal_Riemannian_Pursuit_CVPR_2016_paper.html |
PDF | http://openaccess.thecvf.com/content_cvpr_2016/papers/Tan_Proximal_Riemannian_Pursuit_CVPR_2016_paper.pdf |
PWC | https://paperswithcode.com/paper/proximal-riemannian-pursuit-for-large-scale |
Repo | |
Framework | |
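The proximal operator of the trace norm, the building block any proximal method for trace-norm regularization (including the subproblems above) relies on, is singular value thresholding. A minimal dense-matrix sketch; the paper's actual contribution, maintaining SVDs via cheap low-rank QR updates on matrix varieties, is not reproduced here.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the proximal operator of
    tau * ||X||_* at X. Shrinks each singular value by tau and
    clips at zero, which also reduces the rank of the result."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

Each proximal-gradient iteration would apply `svt` to a gradient step, which is exactly the high-dimensional SVD cost the paper's QR-based variety updates are designed to avoid.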
Amrita_CEN at SemEval-2016 Task 1: Semantic Relation from Word Embeddings in Higher Dimension
Title | Amrita_CEN at SemEval-2016 Task 1: Semantic Relation from Word Embeddings in Higher Dimension |
Authors | Barathi Ganesh HB, Anand Kumar M, Soman KP |
Abstract | |
Tasks | Information Retrieval, Machine Translation, Question Answering, Reading Comprehension, Semantic Textual Similarity, Sentence Embedding, Text Summarization, Word Embeddings |
Published | 2016-06-01 |
URL | https://www.aclweb.org/anthology/S16-1109/ |
PWC | https://paperswithcode.com/paper/amrita_cen-at-semeval-2016-task-1-semantic |
Repo | |
Framework | |
Retrieving Occurrences of Grammatical Constructions
Title | Retrieving Occurrences of Grammatical Constructions |
Authors | Anna Ehrlemark, Richard Johansson, Benjamin Lyngfelt |
Abstract | Finding authentic examples of grammatical constructions is central in constructionist approaches to linguistics, language processing, and second language learning. In this paper, we address this problem as an information retrieval (IR) task. To facilitate research in this area, we built a benchmark collection by annotating the occurrences of six constructions in a Swedish corpus. Furthermore, we implemented a simple and flexible retrieval system for finding construction occurrences, in which the user specifies a ranking function using lexical-semantic similarities (lexicon-based or distributional). The system was evaluated using standard IR metrics on the new benchmark, and we saw that lexical-semantic rerankers improve significantly over a purely surface-oriented system, but must be carefully tailored to each individual construction. |
Tasks | Information Retrieval |
Published | 2016-12-01 |
URL | https://www.aclweb.org/anthology/C16-1078/ |
PWC | https://paperswithcode.com/paper/retrieving-occurrences-of-grammatical |
Repo | |
Framework | |
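The retrieval system above lets the user specify a ranking function based on lexical-semantic similarities. With the query and the candidate hits represented as vectors (a distributional-similarity stand-in; the vectors below are invented), reranking reduces to sorting by cosine similarity:

```python
import numpy as np

def rerank(query_vec, candidate_vecs):
    """Return candidate indices sorted by descending cosine similarity
    to the query vector."""
    def cosine(u, v):
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    scores = [cosine(query_vec, v) for v in candidate_vecs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])

order = rerank(np.array([1.0, 0.0]),
               [np.array([0.0, 1.0]), np.array([1.0, 0.1])])
```

The paper's finding that rerankers must be tailored per construction would correspond here to choosing different vector representations or similarity functions per construction.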
Finding Alternative Translations in a Large Corpus of Movie Subtitles
Title | Finding Alternative Translations in a Large Corpus of Movie Subtitles |
Authors | Jörg Tiedemann |
Abstract | OpenSubtitles.org provides a large collection of user contributed subtitles in various languages for movies and TV programs. Subtitle translations are valuable resources for cross-lingual studies and machine translation research. A less explored feature of the collection is the inclusion of alternative translations, which can be very useful for training paraphrase systems or collecting multi-reference test suites for machine translation. However, differences in translation may also be due to misspellings, incomplete or corrupt data files, or wrongly aligned subtitles. This paper reports our efforts in recognising and classifying alternative subtitle translations with language independent techniques. We use time-based alignment with lexical re-synchronisation techniques and BLEU score filters and sort alternative translations into categories using edit distance metrics and heuristic rules. Our approach produces large numbers of sentence-aligned translation alternatives for over 50 languages provided via the OPUS corpus collection. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1559/ |
PWC | https://paperswithcode.com/paper/finding-alternative-translations-in-a-large |
Repo | |
Framework | |
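The sorting of alternative subtitle translations "using edit distance metrics and heuristic rules" can be sketched as follows. The normalized-distance threshold and the category names are illustrative assumptions, not the paper's actual rules.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def classify_pair(s1, s2, near=0.2):
    """Sort a pair of aligned subtitle lines into rough categories by
    length-normalized edit distance (the threshold is a made-up example:
    small distances suggest spelling variants or minor corrections,
    large ones genuine alternative translations)."""
    if s1 == s2:
        return "identical"
    d = levenshtein(s1, s2) / max(len(s1), len(s2))
    return "minor-variant" if d <= near else "alternative"
```

Pairs labelled "alternative" are the interesting ones for paraphrase training and multi-reference test suites; "minor-variant" pairs are more likely misspellings or corrupt duplicates.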
Exploiting a Large Strongly Comparable Corpus
Title | Exploiting a Large Strongly Comparable Corpus |
Authors | Thierry Etchegoyhen, Andoni Azpeitia, Naiara Pérez |
Abstract | This article describes a large comparable corpus for Basque and Spanish and the methods employed to build a parallel resource from the original data. The EITB corpus, a strongly comparable corpus in the news domain, is to be shared with the research community, as an aid for the development and testing of methods in comparable corpora exploitation, and as basis for the improvement of data-driven machine translation systems for this language pair. Competing approaches were explored for the alignment of comparable segments in the corpus, resulting in the design of a simple method which outperformed a state-of-the-art method on the corpus test sets. The method we present is highly portable, computationally efficient, and significantly reduces deployment work, a welcome result for the exploitation of comparable corpora. |
Tasks | Machine Translation |
Published | 2016-05-01 |
URL | https://www.aclweb.org/anthology/L16-1560/ |
PWC | https://paperswithcode.com/paper/exploiting-a-large-strongly-comparable-corpus |
Repo | |
Framework | |
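A simple lexical-overlap aligner illustrates the kind of comparable-segment alignment this abstract evaluates. This greedy token-set Jaccard sketch is a hypothetical stand-in for the paper's method; a real Basque-Spanish aligner would first map tokens through a bilingual lexicon before comparing sets.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard similarity between two token sets."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def align_segments(src, tgt, threshold=0.5):
    """Greedily pair each source segment with its highest-Jaccard
    target segment, keeping only pairs above a similarity threshold
    (the threshold value is an arbitrary illustration)."""
    pairs = []
    for i, s in enumerate(src):
        scores = [jaccard(s.split(), t.split()) for t in tgt]
        best = max(range(len(tgt)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((i, best))
    return pairs

pairs = align_segments(["the cat sat", "a dog ran"],
                       ["a dog ran fast", "the cat sat down"])
```

The aligned pairs would then feed data-driven machine translation training, as the abstract describes for the EITB corpus.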