July 26, 2019

Paper Group NANR 93

Enriching Complex Networks with Word Embeddings for Detecting Mild Cognitive Impairment from Speech Transcripts


Title	Enriching Complex Networks with Word Embeddings for Detecting Mild Cognitive Impairment from Speech Transcripts
Authors	Leandro Santos, Edilson Anselmo Corrêa Júnior, Osvaldo Oliveira Jr, Diego Amancio, Letícia Mansur, Sandra Aluísio
Abstract	Mild Cognitive Impairment (MCI) is a mental disorder difficult to diagnose. Linguistic features, mainly from parsers, have been used to detect MCI, but this is not suitable for large-scale assessments. MCI disfluencies produce non-grammatical speech that requires manual or high precision automatic correction of transcripts. In this paper, we modeled transcripts into complex networks and enriched them with word embedding (CNE) to better represent short texts produced in neuropsychological assessments. The network measurements were applied with well-known classifiers to automatically identify MCI in transcripts, in a binary classification task. A comparison was made with the performance of traditional approaches using Bag of Words (BoW) and linguistic features for three datasets: DementiaBank in English, and Cinderella and Arizona-Battery in Portuguese. Overall, CNE provided higher accuracy than using only complex networks, while Support Vector Machine was superior to other classifiers. CNE provided the highest accuracies for DementiaBank and Cinderella, but BoW was more efficient for the Arizona-Battery dataset probably owing to its short narratives. The approach using linguistic features yielded higher accuracy if the transcriptions of the Cinderella dataset were manually revised. Taken together, the results indicate that complex networks enriched with embedding is promising for detecting MCI in large-scale assessments.
Tasks	Word Embeddings
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-1118/
PDF	https://www.aclweb.org/anthology/P17-1118
PWC	https://paperswithcode.com/paper/enriching-complex-networks-with-word-1
Repo
Framework

Temporal Orientation of Tweets for Predicting Income of Users


Title	Temporal Orientation of Tweets for Predicting Income of Users
Authors	Mohammed Hasanuzzaman, Sabyasachi Kamila, Mandeep Kaur, Sriparna Saha, Asif Ekbal
Abstract	Automatically estimating a user’s socio-economic profile from their language use in social media can significantly help social science research and various downstream applications ranging from business to politics. The current paper presents the first study where user cognitive structure is used to build a predictive model of income. In particular, we first develop a classifier using a weakly supervised learning framework to automatically time-tag tweets as past, present, or future. We quantify a user’s overall temporal orientation based on their distribution of tweets, and use it to build a predictive model of income. Our analysis uncovers a correlation between future temporal orientation and income. Finally, we measure the predictive power of future temporal orientation on income by performing regression.
Tasks
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-2104/
PDF	https://www.aclweb.org/anthology/P17-2104
PWC	https://paperswithcode.com/paper/temporal-orientation-of-tweets-for-predicting
Repo
Framework
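
The abstract above quantifies a user's temporal orientation from the distribution of their time-tagged tweets. A minimal sketch of that step follows, assuming the past/present/future labels have already been produced by some classifier. The function name `temporal_orientation` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter

# The three temporal classes named in the abstract.
LABELS = ("past", "present", "future")


def temporal_orientation(tweet_labels):
    """Return the fraction of a user's tweets in each temporal class.

    `tweet_labels` is an iterable of strings drawn from LABELS; labels
    outside LABELS are ignored (an assumption made for this sketch).
    """
    counts = Counter(tweet_labels)
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        # No usable tweets: report a zero share for every class.
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


# Hypothetical labels for one user's tweets.
print(temporal_orientation(["past", "future", "future", "present"]))
```

The resulting per-user shares (for example, the `future` share) could then serve as input features to a regression model of income, which is the role the abstract gives to future temporal orientation.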

Adversarial Adaptation of Synthetic or Stale Data


Title	Adversarial Adaptation of Synthetic or Stale Data
Authors	Young-Bum Kim, Karl Stratos, Dongchan Kim
Abstract	Two types of data shift common in practice are 1. transferring from synthetic data to live user data (a deployment shift), and 2. transferring from stale data to current data (a temporal shift). Both cause a distribution mismatch between training and evaluation, leading to a model that overfits the flawed training data and performs poorly on the test data. We propose a solution to this mismatch problem by framing it as domain adaptation, treating the flawed training dataset as a source domain and the evaluation dataset as a target domain. To this end, we use and build on several recent advances in neural domain adaptation such as adversarial training (Ganin et al., 2016) and domain separation network (Bousmalis et al., 2016), proposing a new effective adversarial training scheme. In both supervised and unsupervised adaptation scenarios, our approach yields clear improvement over strong baselines.
Tasks	Domain Adaptation, Spoken Language Understanding
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-1119/
PDF	https://www.aclweb.org/anthology/P17-1119
PWC	https://paperswithcode.com/paper/adversarial-adaptation-of-synthetic-or-stale
Repo
Framework

An Approach to Extract Product Features from Chinese Consumer Reviews and Establish Product Feature Structure Tree


Title	An Approach to Extract Product Features from Chinese Consumer Reviews and Establish Product Feature Structure Tree
Authors	Xinsheng Xu, Jing Lin, Ying Xiao, Jianzhe Yu
Abstract
Tasks
Published	2017-06-01
URL	https://www.aclweb.org/anthology/O17-2003/
PDF	https://www.aclweb.org/anthology/O17-2003
PWC	https://paperswithcode.com/paper/an-approach-to-extract-product-features-from
Repo
Framework

語音文件檢索使用類神經網路技術 (On the Use of Neural Network Modeling Techniques for Spoken Document Retrieval) [In Chinese]


Title	語音文件檢索使用類神經網路技術 (On the Use of Neural Network Modeling Techniques for Spoken Document Retrieval) [In Chinese]
Authors	Tien-Hong Lo, Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen
Abstract
Tasks
Published	2017-12-01
URL	https://www.aclweb.org/anthology/O17-3002/
PDF	https://www.aclweb.org/anthology/O17-3002
PWC	https://paperswithcode.com/paper/eae3aac-a12c-eccc2e-e-on-the-use-of-neural
Repo
Framework

Joint Unsupervised Learning of Semantic Representation of Words and Roles in Dependency Trees


Title	Joint Unsupervised Learning of Semantic Representation of Words and Roles in Dependency Trees
Authors	Michal Konkol
Abstract	In this paper, we introduce WoRel, a model that jointly learns word embeddings and a semantic representation of word relations. The model learns from plain text sentences and their dependency parse trees. The word embeddings produced by WoRel outperform Skip-Gram and GloVe in word similarity and syntactical word analogy tasks and have comparable results on word relatedness and semantic word analogy tasks. We show that the semantic representation of relations enables us to express the meaning of phrases and is a promising research direction for semantics at the sentence level.
Tasks	Named Entity Recognition, Question Answering, Sentiment Analysis, Word Embeddings
Published	2017-09-01
URL	https://www.aclweb.org/anthology/R17-1052/
PDF	https://doi.org/10.26615/978-954-452-049-6_052
PWC	https://paperswithcode.com/paper/joint-unsupervised-learning-of-semantic
Repo
Framework

A computationally-assisted procedure for discovering poetic organization within oral tradition


Title	A computationally-assisted procedure for discovering poetic organization within oral tradition
Authors	David Meyer
Abstract
Tasks
Published	2017-03-01
URL	https://www.aclweb.org/anthology/W17-0113/
PDF	https://www.aclweb.org/anthology/W17-0113
PWC	https://paperswithcode.com/paper/a-computationally-assisted-procedure-for
Repo
Framework

On the Distribution of Lexical Features at Multiple Levels of Analysis


Title	On the Distribution of Lexical Features at Multiple Levels of Analysis
Authors	Fatemeh Almodaresi, Lyle Ungar, Vivek Kulkarni, Mohsen Zakeri, Salvatore Giorgi, H. Andrew Schwartz
Abstract	Natural language processing has increasingly moved from modeling documents and words toward studying the people behind the language. This move to working with data at the user or community level has presented the field with different characteristics of linguistic data. In this paper, we empirically characterize various lexical distributions at different levels of analysis, showing that, while most features are decidedly sparse and non-normal at the message-level (as with traditional NLP), they follow the central limit theorem to become much more Log-normal or even Normal at the user- and county-levels. Finally, we demonstrate that modeling lexical features for the correct level of analysis leads to marked improvements in common social scientific prediction tasks.
Tasks	Document Classification, Sentiment Analysis
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-2013/
PDF	https://www.aclweb.org/anthology/P17-2013
PWC	https://paperswithcode.com/paper/on-the-distribution-of-lexical-features-at
Repo
Framework

On the Challenges of Translating NLP Research into Commercial Products


Title	On the Challenges of Translating NLP Research into Commercial Products
Authors	Daniel Dahlmeier
Abstract	This paper highlights challenges in industrial research related to translating research in natural language processing into commercial products. While the interest in natural language processing from industry is significant, the transfer of research to commercial products is non-trivial and its challenges are often unknown to or underestimated by many researchers. I discuss current obstacles and provide suggestions for increasing the chances for translating research to commercial success based on my experience in industrial research.
Tasks
Published	2017-07-01
URL	https://www.aclweb.org/anthology/P17-2015/
PDF	https://www.aclweb.org/anthology/P17-2015
PWC	https://paperswithcode.com/paper/on-the-challenges-of-translating-nlp-research
Repo
Framework

A Web-Based Interactive Tool for Creating, Inspecting, Editing, and Publishing Etymological Datasets


Title	A Web-Based Interactive Tool for Creating, Inspecting, Editing, and Publishing Etymological Datasets
Authors	Johann-Mattis List
Abstract	The paper presents the Etymological DICtionary ediTOR (EDICTOR), a free, interactive, web-based tool designed to aid historical linguists in creating, editing, analysing, and publishing etymological datasets. The EDICTOR offers interactive solutions for important tasks in historical linguistics, including facilitated input and segmentation of phonetic transcriptions, quantitative and qualitative analyses of phonetic and morphological data, enhanced interfaces for cognate class assignment and multiple word alignment, and automated evaluation of regular sound correspondences. As a web-based tool written in JavaScript, the EDICTOR can be used in standard web browsers across all major platforms.
Tasks	Word Alignment
Published	2017-04-01
URL	https://www.aclweb.org/anthology/E17-3003/
PDF	https://www.aclweb.org/anthology/E17-3003
PWC	https://paperswithcode.com/paper/a-web-based-interactive-tool-for-creating
Repo
Framework

There’s no ‘Count or Predict’ but task-based selection for distributional models


Title	There’s no ‘Count or Predict’ but task-based selection for distributional models
Authors	Martin Riedl, Chris Biemann
Abstract
Tasks
Published	2017-01-01
URL	https://www.aclweb.org/anthology/W17-6933/
PDF	https://www.aclweb.org/anthology/W17-6933
PWC	https://paperswithcode.com/paper/theres-no-count-or-predict-but-task-based
Repo
Framework

Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks


Title	Curriculum Design for Code-switching: Experiments with Language Identification and Language Modeling with Deep Neural Networks
Authors	Monojit Choudhury, Kalika Bali, Sunayana Sitaram, Ashutosh Baheti
Abstract
Tasks	Language Identification, Language Modelling
Published	2017-12-01
URL	https://www.aclweb.org/anthology/W17-7509/
PDF	https://www.aclweb.org/anthology/W17-7509
PWC	https://paperswithcode.com/paper/curriculum-design-for-code-switching
Repo
Framework

Multi-modal Summarization for Asynchronous Collection of Text, Image, Audio and Video


Title	Multi-modal Summarization for Asynchronous Collection of Text, Image, Audio and Video
Authors	Haoran Li, Junnan Zhu, Cong Ma, Jiajun Zhang, Chengqing Zong
Abstract	The rapid increase of the multimedia data over the Internet necessitates multi-modal summarization from collections of text, image, audio and video. In this work, we propose an extractive Multi-modal Summarization (MMS) method which can automatically generate a textual summary given a set of documents, images, audios and videos related to a specific topic. The key idea is to bridge the semantic gaps between multi-modal contents. For audio information, we design an approach to selectively use its transcription. For vision information, we learn joint representations of texts and images using a neural network. Finally, all the multi-modal aspects are considered to generate the textual summary by maximizing the salience, non-redundancy, readability and coverage through budgeted optimization of submodular functions. We further introduce an MMS corpus in English and Chinese. The experimental results on this dataset demonstrate that our method outperforms other competitive baseline methods.
Tasks	Document Summarization, Speech Recognition, Video Summarization
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-1114/
PDF	https://www.aclweb.org/anthology/D17-1114
PWC	https://paperswithcode.com/paper/multi-modal-summarization-for-asynchronous
Repo
Framework

Idiom-Aware Compositional Distributed Semantics


Title	Idiom-Aware Compositional Distributed Semantics
Authors	Pengfei Liu, Kaiyu Qian, Xipeng Qiu, Xuanjing Huang
Abstract	Idioms are peculiar linguistic constructions that impose great challenges for representing the semantics of language, especially in current prevailing end-to-end neural models, which assume that the semantics of a phrase or sentence can be literally composed from its constitutive words. In this paper, we propose an idiom-aware distributed semantic model to build representation of sentences on the basis of understanding their contained idioms. Our models are grounded in the literal-first psycholinguistic hypothesis, which can adaptively learn semantic compositionality of a phrase literally or idiomatically. To better evaluate our models, we also construct an idiom-enriched sentiment classification dataset with considerable scale and abundant peculiarities of idioms. The qualitative and quantitative experimental analyses demonstrate the efficacy of our models.
Tasks	Machine Translation, Sentiment Analysis, Text Classification
Published	2017-09-01
URL	https://www.aclweb.org/anthology/D17-1124/
PDF	https://www.aclweb.org/anthology/D17-1124
PWC	https://paperswithcode.com/paper/idiom-aware-compositional-distributed
Repo
Framework

QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings


Title	QLUT at SemEval-2017 Task 1: Semantic Textual Similarity Based on Word Embeddings
Authors	Fanqing Meng, Wenpeng Lu, Yuteng Zhang, Jinyong Cheng, Yuehan Du, Shuwang Han
Abstract	This paper reports the details of our submissions in the task 1 of SemEval 2017. This task aims at assessing the semantic textual similarity of two sentences or texts. We submit three unsupervised systems based on word embeddings. The differences between these runs are the various preprocessing on evaluation data. The best performance of these systems on the evaluation of Pearson correlation is 0.6887. Unsurprisingly, results of our runs demonstrate that data preprocessing, such as tokenization, lemmatization, extraction of content words and removing stop words, is helpful and plays a significant role in improving the performance of models.
Tasks	Lemmatization, Semantic Textual Similarity, Tokenization, Word Embeddings
Published	2017-08-01
URL	https://www.aclweb.org/anthology/S17-2020/
PDF	https://www.aclweb.org/anthology/S17-2020
PWC	https://paperswithcode.com/paper/qlut-at-semeval-2017-task-1-semantic-textual
Repo
Framework
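
The QLUT abstract above describes unsupervised similarity systems built on word embeddings, with preprocessing such as stop-word removal. The sketch below shows one common unsupervised baseline of that kind: average the word vectors of each sentence and compare them by cosine similarity. This is an assumption for illustration, not the QLUT system itself; the toy embeddings, the stop-word list, and all function names are invented here.

```python
import math

# Toy 2-dimensional embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
}

# A tiny stop-word list, also an assumption of this sketch.
STOP_WORDS = {"a", "the", "is"}


def sentence_vector(sentence, embeddings=TOY_EMBEDDINGS):
    """Average the embeddings of the non-stop-word tokens in a sentence."""
    tokens = [t for t in sentence.lower().split() if t not in STOP_WORDS]
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # No known content words in this sentence.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(s1, s2):
    """Score two sentences; returns 0.0 if either has no known content words."""
    v1, v2 = sentence_vector(s1), sentence_vector(s2)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


print(similarity("the cat", "the dog") > similarity("the cat", "the car"))
```

In an evaluation like the one the abstract reports, scores of this kind would be compared against human similarity judgments using Pearson correlation.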