July 26, 2019

Paper Group NANR 93

Enriching Complex Networks with Word Embeddings for Detecting Mild Cognitive Impairment from Speech Transcripts

Title Enriching Complex Networks with Word Embeddings for Detecting Mild Cognitive Impairment from Speech Transcripts
Authors Leandro Santos, Edilson Anselmo Corrêa Júnior, Osvaldo Oliveira Jr, Diego Amancio, Letícia Mansur, Sandra Aluísio
Abstract Mild Cognitive Impairment (MCI) is a mental disorder difficult to diagnose. Linguistic features, mainly from parsers, have been used to detect MCI, but this is not suitable for large-scale assessments. MCI disfluencies produce non-grammatical speech that requires manual or high precision automatic correction of transcripts. In this paper, we modeled transcripts into complex networks and enriched them with word embedding (CNE) to better represent short texts produced in neuropsychological assessments. The network measurements were applied with well-known classifiers to automatically identify MCI in transcripts, in a binary classification task. A comparison was made with the performance of traditional approaches using Bag of Words (BoW) and linguistic features for three datasets: DementiaBank in English, and Cinderella and Arizona-Battery in Portuguese. Overall, CNE provided higher accuracy than using only complex networks, while Support Vector Machine was superior to other classifiers. CNE provided the highest accuracies for DementiaBank and Cinderella, but BoW was more efficient for the Arizona-Battery dataset probably owing to its short narratives. The approach using linguistic features yielded higher accuracy if the transcriptions of the Cinderella dataset were manually revised. Taken together, the results indicate that complex networks enriched with embedding is promising for detecting MCI in large-scale assessments.
Tasks Word Embeddings
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-1118/
PDF https://www.aclweb.org/anthology/P17-1118
PWC https://paperswithcode.com/paper/enriching-complex-networks-with-word-1
Repo
Framework

Temporal Orientation of Tweets for Predicting Income of Users

Title Temporal Orientation of Tweets for Predicting Income of Users
Authors Mohammed Hasanuzzaman, Sabyasachi Kamila, Mandeep Kaur, Sriparna Saha, Asif Ekbal
Abstract Automatically estimating a user's socio-economic profile from their language use in social media can significantly help social science research and various downstream applications ranging from business to politics. The current paper presents the first study where user cognitive structure is used to build a predictive model of income. In particular, we first develop a classifier using a weakly supervised learning framework to automatically time-tag tweets as past, present, or future. We quantify a user's overall temporal orientation based on their distribution of tweets, and use it to build a predictive model of income. Our analysis uncovers a correlation between future temporal orientation and income. Finally, we measure the predictive power of future temporal orientation on income by performing regression.
Tasks
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2104/
PDF https://www.aclweb.org/anthology/P17-2104
PWC https://paperswithcode.com/paper/temporal-orientation-of-tweets-for-predicting
Repo
Framework
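
As a rough illustration of the aggregation step described in the abstract above, a user's temporal orientation can be summarized as the share of their tweets tagged past, present, or future. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `temporal_orientation` and the label strings are hypothetical, and the time-tagging classifier is taken as given.

```python
from collections import Counter

# Assumed label set, mirroring the three classes named in the abstract.
LABELS = ("past", "present", "future")


def temporal_orientation(tweet_labels):
    """Return the fraction of a user's tweets in each temporal class.

    tweet_labels: iterable of strings drawn from LABELS (hypothetical
    output of a time-tagging classifier, one label per tweet).
    """
    counts = Counter(tweet_labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No tagged tweets: return zeros instead of dividing by zero.
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


if __name__ == "__main__":
    labels = ["past", "future", "future", "present"]
    print(temporal_orientation(labels))
```

The resulting proportions (for example the "future" share) could then serve as input features to a regression on income, which is the kind of use the abstract describes.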

Adversarial Adaptation of Synthetic or Stale Data

Title Adversarial Adaptation of Synthetic or Stale Data
Authors Young-Bum Kim, Karl Stratos, Dongchan Kim
Abstract Two types of data shift common in practice are 1. transferring from synthetic data to live user data (a deployment shift), and 2. transferring from stale data to current data (a temporal shift). Both cause a distribution mismatch between training and evaluation, leading to a model that overfits the flawed training data and performs poorly on the test data. We propose a solution to this mismatch problem by framing it as domain adaptation, treating the flawed training dataset as a source domain and the evaluation dataset as a target domain. To this end, we use and build on several recent advances in neural domain adaptation such as adversarial training (Ganin et al., 2016) and domain separation network (Bousmalis et al., 2016), proposing a new effective adversarial training scheme. In both supervised and unsupervised adaptation scenarios, our approach yields clear improvement over strong baselines.
Tasks Domain Adaptation, Spoken Language Understanding
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-1119/
PDF https://www.aclweb.org/anthology/P17-1119
PWC https://paperswithcode.com/paper/adversarial-adaptation-of-synthetic-or-stale
Repo
Framework

An Approach to Extract Product Features from Chinese Consumer Reviews and Establish Product Feature Structure Tree

Title An Approach to Extract Product Features from Chinese Consumer Reviews and Establish Product Feature Structure Tree
Authors Xinsheng Xu, Jing Lin, Ying Xiao, Jianzhe Yu
Abstract
Tasks
Published 2017-06-01
URL https://www.aclweb.org/anthology/O17-2003/
PDF https://www.aclweb.org/anthology/O17-2003
PWC https://paperswithcode.com/paper/an-approach-to-extract-product-features-from
Repo
Framework

語音文件檢索使用類神經網路技術 (On the Use of Neural Network Modeling Techniques for Spoken Document Retrieval) [In Chinese]

Title 語音文件檢索使用類神經網路技術 (On the Use of Neural Network Modeling Techniques for Spoken Document Retrieval) [In Chinese]
Authors Tien-Hong Lo, Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen
Abstract
Tasks
Published 2017-12-01
URL https://www.aclweb.org/anthology/O17-3002/
PDF https://www.aclweb.org/anthology/O17-3002
PWC https://paperswithcode.com/paper/eae3aac-a12c-eccc2e-e-on-the-use-of-neural
Repo
Framework

Joint Unsupervised Learning of Semantic Representation of Words and Roles in Dependency Trees

Title Joint Unsupervised Learning of Semantic Representation of Words and Roles in Dependency Trees
Authors Michal Konkol
Abstract In this paper, we introduce WoRel, a model that jointly learns word embeddings and a semantic representation of word relations. The model learns from plain text sentences and their dependency parse trees. The word embeddings produced by WoRel outperform Skip-Gram and GloVe in word similarity and syntactical word analogy tasks and have comparable results on word relatedness and semantic word analogy tasks. We show that the semantic representation of relations enables us to express the meaning of phrases and is a promising research direction for semantics at the sentence level.
Tasks Named Entity Recognition, Question Answering, Sentiment Analysis, Word Embeddings
Published 2017-09-01
URL https://www.aclweb.org/anthology/R17-1052/
PDF https://doi.org/10.26615/978-954-452-049-6_052
PWC https://paperswithcode.com/paper/joint-unsupervised-learning-of-semantic
Repo
Framework

A computationally-assisted procedure for discovering poetic organization within oral tradition

Title A computationally-assisted procedure for discovering poetic organization within oral tradition
Authors David Meyer
Abstract
Tasks
Published 2017-03-01
URL https://www.aclweb.org/anthology/W17-0113/
PDF https://www.aclweb.org/anthology/W17-0113
PWC https://paperswithcode.com/paper/a-computationally-assisted-procedure-for
Repo
Framework

On the Distribution of Lexical Features at Multiple Levels of Analysis

Title On the Distribution of Lexical Features at Multiple Levels of Analysis
Authors Fatemeh Almodaresi, Lyle Ungar, Vivek Kulkarni, Mohsen Zakeri, Salvatore Giorgi, H. Andrew Schwartz
Abstract Natural language processing has increasingly moved from modeling documents and words toward studying the people behind the language. This move to working with data at the user or community level has presented the field with different characteristics of linguistic data. In this paper, we empirically characterize various lexical distributions at different levels of analysis, showing that, while most features are decidedly sparse and non-normal at the message-level (as with traditional NLP), they follow the central limit theorem to become much more Log-normal or even Normal at the user- and county-levels. Finally, we demonstrate that modeling lexical features for the correct level of analysis leads to marked improvements in common social scientific prediction tasks.
Tasks Document Classification, Sentiment Analysis
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2013/
PDF https://www.aclweb.org/anthology/P17-2013
PWC https://paperswithcode.com/paper/on-the-distribution-of-lexical-features-at
Repo
Framework

On the Challenges of Translating NLP Research into Commercial Products

Title On the Challenges of Translating NLP Research into Commercial Products
Authors Daniel Dahlmeier
Abstract This paper highlights challenges in industrial research related to translating research in natural language processing into commercial products. While the interest in natural language processing from industry is significant, the transfer of research to commercial products is non-trivial and its challenges are often unknown to or underestimated by many researchers. I discuss current obstacles and provide suggestions for increasing the chances for translating research to commercial success based on my experience in industrial research.
Tasks
Published 2017-07-01
URL https://www.aclweb.org/anthology/P17-2015/
PDF https://www.aclweb.org/anthology/P17-2015
PWC https://paperswithcode.com/paper/on-the-challenges-of-translating-nlp-research
Repo
Framework

A Web-Based Interactive Tool for Creating, Inspecting, Editing, and Publishing Etymological Datasets

Title A Web-Based Interactive Tool for Creating, Inspecting, Editing, and Publishing Etymological Datasets
Authors Johann-Mattis List
Abstract The paper presents the Etymological DICtionary ediTOR (EDICTOR), a free, interactive, web-based tool designed to aid historical linguists in creating, editing, analysing, and publishing etymological datasets. The EDICTOR offers interactive solutions for important tasks in historical linguistics, including facilitated input and segmentation of phonetic transcriptions, quantitative and qualitative analyses of phonetic and morphological data, enhanced interfaces for cognate class assignment and multiple word alignment, and automated evaluation of regular sound correspondences. As a web-based tool written in JavaScript, the EDICTOR can be used in standard web browsers across all major platforms.
Tasks Word Alignment
Published 2017-04-01
URL https://www.aclweb.org/anthology/E17-3003/
PDF https://www.aclweb.org/anthology/E17-3003
PWC https://paperswithcode.com/paper/a-web-based-interactive-tool-for-creating
Repo
Framework

There’s no ‘Count or Predict’ but task-based selection for distributional models

Title There’s no ‘Count or Predict’ but task-based selection for distributional models
Authors Martin Riedl, Chris Biemann
Abstract
Tasks
Published 2017-01-01
URL https://www.aclweb.org/anthology/W17-6933/
PDF https://www.aclweb.org/anthology/W17-6933
PWC https://paperswithcode.com/paper/theres-no-count-or-predict-but-task-based
Repo
Framework

Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks

Title Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks
Authors Monojit Choudhury, Kalika Bali, Sunayana Sitaram, Ashutosh Baheti
Abstract
Tasks Language Identification, Language Modelling
Published 2017-12-01
URL https://www.aclweb.org/anthology/W17-7509/
PDF https://www.aclweb.org/anthology/W17-7509
PWC https://paperswithcode.com/paper/curriculum-design-for-code-switching
Repo
Framework

Multi-modal Summarization for Asynchronous Collection of Text, Image, Audio and Video

Title Multi-modal Summarization for Asynchronous Collection of Text, Image, Audio and Video
Authors Haoran Li, Junnan Zhu, Cong Ma, Jiajun Zhang, Chengqing Zong
Abstract The rapid increase of the multimedia data over the Internet necessitates multi-modal summarization from collections of text, image, audio and video. In this work, we propose an extractive Multi-modal Summarization (MMS) method which can automatically generate a textual summary given a set of documents, images, audios and videos related to a specific topic. The key idea is to bridge the semantic gaps between multi-modal contents. For audio information, we design an approach to selectively use its transcription. For vision information, we learn joint representations of texts and images using a neural network. Finally, all the multi-modal aspects are considered to generate the textual summary by maximizing the salience, non-redundancy, readability and coverage through budgeted optimization of submodular functions. We further introduce an MMS corpus in English and Chinese. The experimental results on this dataset demonstrate that our method outperforms other competitive baseline methods.
Tasks Document Summarization, Speech Recognition, Video Summarization
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1114/
PDF https://www.aclweb.org/anthology/D17-1114
PWC https://paperswithcode.com/paper/multi-modal-summarization-for-asynchronous
Repo
Framework
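
The abstract above mentions budgeted optimization of submodular functions. The sketch below shows one simple form of that idea: a cost-scaled greedy heuristic that picks sentences to maximize the number of distinct concepts covered (a monotone submodular objective) under a word budget. It is an illustration only; the coverage objective, the input format, and the function name `greedy_budgeted_summary` are assumptions and stand in for the paper's actual objective, which combines salience, non-redundancy, readability and coverage.

```python
def greedy_budgeted_summary(sentences, budget):
    """Greedily select sentences under a total word budget.

    sentences: list of (text, concepts) pairs, where concepts is a set of
               hashable items the sentence covers (hypothetical format).
    budget:    maximum total length of the summary, in words.
    """
    selected, covered, used = [], set(), 0
    remaining = list(range(len(sentences)))
    while remaining:
        best, best_ratio = None, 0.0
        for i in remaining:
            text, concepts = sentences[i]
            cost = len(text.split())
            if cost == 0 or used + cost > budget:
                continue  # empty sentence or does not fit in the budget
            # Marginal gain: concepts this sentence adds to the summary.
            gain = len(concepts - covered)
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing fits, or nothing adds new coverage
        text, concepts = sentences[best]
        selected.append(text)
        covered |= concepts
        used += len(text.split())
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    candidates = [("a b c", {1, 2}), ("d e", {2, 3}), ("f", {1})]
    print(greedy_budgeted_summary(candidates, budget=3))
```

Dividing the gain by the sentence length keeps long sentences from dominating the selection; this plain ratio rule is a heuristic and is shown here without any approximation guarantee.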

Idiom-Aware Compositional Distributed Semantics

Title Idiom-Aware Compositional Distributed Semantics
Authors Pengfei Liu, Kaiyu Qian, Xipeng Qiu, Xuanjing Huang
Abstract Idioms are peculiar linguistic constructions that impose great challenges for representing the semantics of language, especially in current prevailing end-to-end neural models, which assume that the semantics of a phrase or sentence can be literally composed from its constitutive words. In this paper, we propose an idiom-aware distributed semantic model to build representation of sentences on the basis of understanding their contained idioms. Our models are grounded in the literal-first psycholinguistic hypothesis, which can adaptively learn semantic compositionality of a phrase literally or idiomatically. To better evaluate our models, we also construct an idiom-enriched sentiment classification dataset with considerable scale and abundant peculiarities of idioms. The qualitative and quantitative experimental analyses demonstrate the efficacy of our models.
Tasks Machine Translation, Sentiment Analysis, Text Classification
Published 2017-09-01
URL https://www.aclweb.org/anthology/D17-1124/
PDF https://www.aclweb.org/anthology/D17-1124
PWC https://paperswithcode.com/paper/idiom-aware-compositional-distributed
Repo
Framework

QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings

Title QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings
Authors Fanqing Meng, Wenpeng Lu, Yuteng Zhang, Jinyong Cheng, Yuehan Du, Shuwang Han
Abstract This paper reports the details of our submissions in the task 1 of SemEval 2017. This task aims at assessing the semantic textual similarity of two sentences or texts. We submit three unsupervised systems based on word embeddings. The differences between these runs are the various preprocessing on evaluation data. The best performance of these systems on the evaluation of Pearson correlation is 0.6887. Unsurprisingly, results of our runs demonstrate that data preprocessing, such as tokenization, lemmatization, extraction of content words and removing stop words, is helpful and plays a significant role in improving the performance of models.
Tasks Lemmatization, Semantic Textual Similarity, Tokenization, Word Embeddings
Published 2017-08-01
URL https://www.aclweb.org/anthology/S17-2020/
PDF https://www.aclweb.org/anthology/S17-2020
PWC https://paperswithcode.com/paper/qlut-at-semeval-2017-task-1-semantic-textual
Repo
Framework
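
To make the idea of an unsupervised, embedding-based similarity score concrete, here is a minimal sketch of a common baseline: average the word vectors of each sentence after stop-word removal, then take the cosine of the two averages. This is not necessarily the QLUT system; the toy embedding table, the whitespace tokenizer, and the function names are assumptions chosen for illustration.

```python
import math


def sentence_vector(tokens, embeddings):
    """Average the vectors of tokens found in the embedding table."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None  # no known words in the sentence
    dim = len(vecs[0])
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def sts_score(s1, s2, embeddings, stop_words=frozenset()):
    """Similarity of two sentences via averaged word embeddings."""
    # Preprocessing: lowercase, whitespace tokenization, stop-word removal.
    t1 = [w for w in s1.lower().split() if w not in stop_words]
    t2 = [w for w in s2.lower().split() if w not in stop_words]
    v1 = sentence_vector(t1, embeddings)
    v2 = sentence_vector(t2, embeddings)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, purely illustrative.
    toy = {"cat": [1.0, 0.0], "dog": [1.0, 0.0], "car": [0.0, 1.0]}
    print(sts_score("The cat", "A dog", toy, stop_words={"the", "a"}))
```

System scores of this kind are typically compared with human similarity judgments using Pearson correlation, the evaluation measure the abstract reports.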