January 25, 2020


Paper Group NANR 61

Medical Word Embeddings for Spanish: Development and Evaluation. Learning nonlinear level sets for dimensionality reduction in function approximation. Improving Sentence Representations with Multi-view Frameworks. Automatically Generating Psychiatric Case Notes From Digital Transcripts of Doctor-Patient Conversations. Gaussian Affinity for Max-Marg …

Medical Word Embeddings for Spanish: Development and Evaluation


Title	Medical Word Embeddings for Spanish: Development and Evaluation
Authors	Felipe Soares, Marta Villegas, Aitor Gonzalez-Agirre, Martin Krallinger, Jordi Armengol-Estapé
Abstract	Word embeddings are representations of words in a dense vector space. Although they are not recent phenomena in Natural Language Processing (NLP), they have gained momentum after the recent developments of neural methods and Word2Vec. Regarding their applications in medical and clinical NLP, they are invaluable resources when training in-domain named entity recognition systems, classifiers or taggers, for instance. Thus, the development of tailored word embeddings for medical NLP is of great interest. However, we identified a gap in the literature which we aim to fill in this paper: the availability of embeddings for medical NLP in Spanish, as well as a standardized form of intrinsic evaluation. Since most work has been done for English, some established datasets for intrinsic evaluation are already available. In this paper, we show the steps we employed to adapt such datasets for the first time to Spanish, of particular relevance due to the considerable volume of EHRs in this language, as well as the creation of in-domain medical word embeddings for Spanish using the state-of-the-art FastText model. We performed intrinsic evaluation with our adapted datasets, as well as extrinsic evaluation with a named entity recognition system, using a general-domain embedding as a baseline. Both experiments proved that our embeddings are suitable for use in medical NLP in the Spanish language, and are more accurate than general-domain ones.
Tasks	Named Entity Recognition, Word Embeddings
Published	2019-06-01
URL	https://www.aclweb.org/anthology/W19-1916/
PDF	https://www.aclweb.org/anthology/W19-1916
PWC	https://paperswithcode.com/paper/medical-word-embeddings-for-spanish
Repo
Framework
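
Intrinsic evaluation of word embeddings usually comes down to checking whether vector similarity agrees with human judgments of word relatedness. The following is a minimal sketch of that idea, not the authors' evaluation code: the three-dimensional vectors and the word pairs are hypothetical stand-ins for real FastText embeddings and for an adapted Spanish similarity dataset.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
toy_embeddings = {
    "fiebre": [0.9, 0.1, 0.0],
    "temperatura": [0.8, 0.3, 0.1],
    "fractura": [0.0, 0.2, 0.9],
}

def rank_pairs(embeddings, pairs):
    """Sort word pairs from most to least similar under the embedding."""
    scored = [(cosine_similarity(embeddings[a], embeddings[b]), a, b)
              for a, b in pairs]
    return sorted(scored, reverse=True)

ranking = rank_pairs(
    toy_embeddings,
    [("fiebre", "fractura"), ("fiebre", "temperatura")],
)
for score, a, b in ranking:
    print(f"{a} / {b}: {score:.3f}")
```

In a real evaluation, the resulting ranking would be compared against human-assigned similarity scores, typically with a rank correlation coefficient.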

Learning nonlinear level sets for dimensionality reduction in function approximation


Title	Learning nonlinear level sets for dimensionality reduction in function approximation
Authors	Guannan Zhang, Jiaxin Zhang, Jacob Hinkle
Abstract	We developed a Nonlinear Level-set Learning (NLL) method for dimensionality reduction in high-dimensional function approximation with small data. This work is motivated by a variety of design tasks in real-world engineering applications, where practitioners would replace their computationally intensive physical models (e.g., high-resolution fluid simulators) with fast-to-evaluate predictive machine learning models, so as to accelerate the engineering design processes. There are two major challenges in constructing such predictive models: (a) high-dimensional inputs (e.g., many independent design parameters) and (b) small training data, generated by running extremely time-consuming simulations. Thus, reducing the input dimension is critical to alleviate the over-fitting issue caused by data insufficiency. Existing methods, including sliced inverse regression and active subspace approaches, reduce the input dimension by learning a linear coordinate transformation; our main contribution is to extend the transformation approach to a nonlinear regime. Specifically, we exploit reversible networks (RevNets) to learn nonlinear level sets of a high-dimensional function and parameterize its level sets in low-dimensional spaces. A new loss function was designed to utilize samples of the target functions’ gradient to encourage the transformed function to be sensitive to only a few transformed coordinates. The NLL approach is demonstrated by applying it to three 2D functions and two 20D functions for showing the improved approximation accuracy with the use of nonlinear transformation, as well as to an 8D composite material design problem for optimizing the buckling-resistance performance of composite shells of rocket inter-stages.
Tasks	Dimensionality Reduction
Published	2019-12-01
URL	http://papers.nips.cc/paper/9478-learning-nonlinear-level-sets-for-dimensionality-reduction-in-function-approximation
PDF	http://papers.nips.cc/paper/9478-learning-nonlinear-level-sets-for-dimensionality-reduction-in-function-approximation.pdf
PWC	https://paperswithcode.com/paper/learning-nonlinear-level-sets-for
Repo
Framework
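
The abstract contrasts NLL with linear methods such as active subspaces. As a point of reference, here is a minimal sketch of the linear baseline only (not the RevNet-based NLL method): it averages outer products of sampled gradients and extracts the dominant eigenvector by power iteration. The target function, the vector `a`, and the sample count are assumptions chosen for illustration.

```python
import math
import random

def gradient_covariance(gradients):
    """Average outer product of the sampled gradients: C = mean(g g^T)."""
    dim = len(gradients[0])
    cov = [[0.0] * dim for _ in range(dim)]
    for g in gradients:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += g[i] * g[j] / len(gradients)
    return cov

def dominant_direction(matrix, iterations=200):
    """Power iteration for the eigenvector with the largest eigenvalue."""
    dim = len(matrix)
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical target f(x) = (a . x)^2, which varies only along direction a.
a = [3.0, 4.0, 0.0]

def grad_f(x):
    """Analytic gradient of f: 2 (a . x) a."""
    s = sum(ai * xi for ai, xi in zip(a, x))
    return [2.0 * s * ai for ai in a]

rng = random.Random(0)
samples = [[rng.uniform(-1.0, 1.0) for _ in a] for _ in range(50)]
direction = dominant_direction(gradient_covariance([grad_f(x) for x in samples]))
print(direction)
```

Because every gradient of this toy function is parallel to `a`, the recovered direction is `a` normalized to unit length. A function whose level sets are curved has no such single direction, which is the case the nonlinear transformation is meant to address.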

Improving Sentence Representations with Multi-view Frameworks


Title	Improving Sentence Representations with Multi-view Frameworks
Authors	Shuai Tang, Virginia R. de Sa
Abstract	Multi-view learning can provide self-supervision when different views are available of the same data. The distributional hypothesis provides another form of useful self-supervision from adjacent sentences, which are plentiful in large unlabelled corpora. Motivated by the asymmetry in the two hemispheres of the human brain as well as the observation that different learning architectures tend to emphasise different aspects of sentence meaning, we present two multi-view frameworks for learning sentence representations in an unsupervised fashion. One framework uses a generative objective and the other a discriminative one. In both frameworks, the final representation is an ensemble of two views, in which one view encodes the input sentence with a Recurrent Neural Network (RNN), and the other view encodes it with a simple linear model. We show that, after learning, the vectors produced by our multi-view frameworks provide improved representations over their single-view learnt counterparts, and the combination of different views gives representational improvement over each view and demonstrates solid transferability on standard downstream tasks.
Tasks	Multi-View Learning
Published	2019-05-01
URL	https://openreview.net/forum?id=S1xzyhR9Y7
PDF	https://openreview.net/pdf?id=S1xzyhR9Y7
PWC	https://paperswithcode.com/paper/improving-sentence-representations-with-multi-1
Repo
Framework

Automatically Generating Psychiatric Case Notes From Digital Transcripts of Doctor-Patient Conversations


Title	Automatically Generating Psychiatric Case Notes From Digital Transcripts of Doctor-Patient Conversations
Authors	Nazmul Kazi, Indika Kahanda
Abstract	Electronic health records (EHRs) are notorious for reducing the face-to-face time with patients while increasing the screen-time for clinicians leading to burnout. This is especially problematic for psychiatry care in which maintaining consistent eye-contact and non-verbal cues are just as important as the spoken words. In this ongoing work, we explore the feasibility of automatically generating psychiatric EHR case notes from digital transcripts of doctor-patient conversation using a two-step approach: (1) predicting semantic topics for segments of transcripts using supervised machine learning, and (2) generating formal text of those segments using natural language processing. Through a series of preliminary experimental results obtained through a collection of synthetic and real-life transcripts, we demonstrate the viability of this approach.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/W19-1918/
PDF	https://www.aclweb.org/anthology/W19-1918
PWC	https://paperswithcode.com/paper/automatically-generating-psychiatric-case
Repo
Framework

Gaussian Affinity for Max-Margin Class Imbalanced Learning


Title	Gaussian Affinity for Max-Margin Class Imbalanced Learning
Authors	Munawar Hayat, Salman Khan, Syed Waqas Zamir, Jianbing Shen, Ling Shao
Abstract	Real-world object classes appear in imbalanced ratios. This poses a significant challenge for classifiers which get biased towards frequent classes. We hypothesize that improving the generalization capability of a classifier should improve learning on imbalanced datasets. Here, we introduce the first hybrid loss function that jointly performs classification and clustering in a single formulation. Our approach is based on an 'affinity measure' in Euclidean space that leads to the following benefits: (1) direct enforcement of maximum margin constraints on classification boundaries, (2) a tractable way to ensure uniformly spaced and equidistant cluster centers, (3) flexibility to learn multiple class prototypes to support diversity and discriminability in feature space. Our extensive experiments demonstrate the significant performance improvements on visual classification and verification tasks on multiple imbalanced datasets. The proposed loss can easily be plugged in any deep architecture as a differentiable block and demonstrates robustness against different levels of data imbalance and corrupted labels.
Tasks
Published	2019-10-01
URL	http://openaccess.thecvf.com/content_ICCV_2019/html/Hayat_Gaussian_Affinity_for_Max-Margin_Class_Imbalanced_Learning_ICCV_2019_paper.html
PDF	http://openaccess.thecvf.com/content_ICCV_2019/papers/Hayat_Gaussian_Affinity_for_Max-Margin_Class_Imbalanced_Learning_ICCV_2019_paper.pdf
PWC	https://paperswithcode.com/paper/gaussian-affinity-for-max-margin-class
Repo
Framework
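
To make the idea of an affinity measure in Euclidean space concrete, the sketch below scores a feature vector against class prototypes with a Gaussian of the squared distance and predicts the closest class. It illustrates the general idea only and is not the paper's loss function: the exact form `exp(-d^2 / sigma)`, the `sigma` parameter, and the prototype values are assumptions.

```python
import math

def gaussian_affinity(feature, prototype, sigma=1.0):
    """exp(-||f - w||^2 / sigma): 1 when f equals w, decaying with distance."""
    sq_dist = sum((f - w) ** 2 for f, w in zip(feature, prototype))
    return math.exp(-sq_dist / sigma)

def predict(feature, prototypes, sigma=1.0):
    """Return the label of the prototype with the highest affinity."""
    return max(
        prototypes,
        key=lambda label: gaussian_affinity(feature, prototypes[label], sigma),
    )

# Hypothetical two-class setup with one prototype per class.
prototypes = {"frequent": [0.0, 0.0], "rare": [3.0, 3.0]}
print(predict([2.5, 2.0], prototypes))
```

Because the affinity depends only on distance to each prototype, the decision is driven by cluster geometry in feature space and not by how many training examples each class contributed.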

The "Jump and Stay" Method to Discover Proper Verb Centered Constructions in Corpus Lattices


Title	The "Jump and Stay" Method to Discover Proper Verb Centered Constructions in Corpus Lattices
Authors	Bálint Sass
Abstract	The research presented here is based on the theoretical model of corpus lattices. We implemented this as an effective data structure, and developed an algorithm based on this structure to discover essential verbal expressions from corpus data. The idea behind the algorithm is the "jump and stay" principle, which tells us that our target expressions will be found at such places in the lattice where the value of a suitable function (defined on the vertex set of the corpus lattice) significantly increases (jumps) and then remains the same (stays). We evaluated our method on Hungarian data. Evaluation shows that about 75% of the obtained expressions are correct, actual errors are rare. Thus, this paper is 1. a proof of concept concerning the corpus lattice model, opening the way to investigate this structure further through our implementation; and 2. a proof of concept of the "jump and stay" idea and the algorithm itself, opening the way to apply it further, e.g. for other languages.
Tasks
Published	2019-09-01
URL	https://www.aclweb.org/anthology/R19-1124/
PDF	https://www.aclweb.org/anthology/R19-1124
PWC	https://paperswithcode.com/paper/the-jump-and-stay-method-to-discover-proper
Repo
Framework

Entrepreneurial Growth: Challenges to Young Omani Entrepreneurs


Title	Entrepreneurial Growth: Challenges to Young Omani Entrepreneurs
Authors	Dr. Subrahmanian Muthuraman, Dr. Mohammed Al Haziazi
Abstract	Entrepreneurship plays an important role in economic prosperity and social stability in many developed countries. Entrepreneurship has been adopted as a strategy to promote economic activities among young people. There is a growing interest in understanding the various challenges of youth entrepreneurship. The main purpose of this study was to assess the challenges in entrepreneurial growth of young entrepreneurs in the Sultanate of Oman. This research set out to investigate the obstacles that young people encounter when setting up their businesses; the current obstacles that prevent the expansion of their entrepreneurial ventures; as well as the prospects for youth entrepreneurship development in this community. The study employed a descriptive survey research type and used a convenience sampling technique to collect data. A standardized questionnaire was used as an instrument to collect data to establish the perceptions of 52 young Omani entrepreneurs. This paper is significant in that it brings insights on challenges for entrepreneurship in Oman. The importance of stimulating the entrepreneurial spirit, values, and attitudes of young people and encouraging innovative business start-ups while fostering a more entrepreneur-friendly culture must be translated into actual and effective policy actions in Oman. We consider that supporting youth entrepreneurship must be a priority for Oman.
Tasks
Published	2019-07-30
URL	https://ijbassnet.com/publication/247/details
PDF	https://ijbassnet.com/storage/app/publications/5d4017815386711564481409.pdf
PWC	https://paperswithcode.com/paper/entrepreneurial-growth-challenges-to-young
Repo
Framework

The Design and Construction of the Corpus of China English


Title	The Design and Construction of the Corpus of China English
Authors	Lixin Xia, Yun Xia
Abstract	The paper describes the development of a corpus of an English variety, i.e. China English, in order to provide a linguistic resource for researchers in the field of China English. The Corpus of China English (CCE) was built with due consideration given to its representativeness and authenticity. It was composed of more than 13,962,102 tokens in 15,333 texts evenly divided between the following four genres: newspapers, magazines, fiction and academic writings. The texts cover a wide range of domains, such as news, financial, politics, environment, social, culture, technology, sports, education, philosophy, literary, etc. It is a helpful resource for research on China English, computational linguistics, natural language processing, corpus linguistics and English language education.
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/papers/W/W19/W19-3613/
PDF	https://www.aclweb.org/anthology/W19-3613
PWC	https://paperswithcode.com/paper/the-design-and-construction-of-the-corpus-of
Repo
Framework

Nurse care activity recognition challenge: summary and results


Title	Nurse care activity recognition challenge: summary and results
Authors
Abstract	Although activity recognition has been studied for a long time now, research and applications have focused on physical activity recognition. Even if many application domains require the recognition of more complex activities, research on such activities has attracted less attention. One reason for this gap is the lack of datasets to evaluate and compare different methods. To promote research in such scenarios, we organized the Open Lab Nursing Activity Recognition Challenge focusing on the recognition of complex activities related to the nursing domain. The nursing domain is one of the domains that can benefit enormously from activity recognition but has not been researched due to lack of datasets. The competition used the CARE-COM Nurse Care Activity Dataset, featuring 7 activities performed by 8 subjects in a controlled environment with accelerometer sensors, motion capture, and an indoor location sensor. In this paper, we summarize the results of the competition.
Tasks	Activity Recognition, Motion Capture, Multimodal Activity Recognition
Published	2019-09-09
URL	https://doi.org/10.1145/3341162.3345577
PDF	http://delivery.acm.org/10.1145/3350000/3345577/p746-lago.pdf
PWC	https://paperswithcode.com/paper/nurse-care-activity-recognition-challenge
Repo
Framework

Variational Structured Semantic Inference for Diverse Image Captioning


Title	Variational Structured Semantic Inference for Diverse Image Captioning
Authors	Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, Yan Wang
Abstract	Despite the exciting progress in image captioning, generating diverse captions for a given image remains an open problem. Existing methods typically apply generative models such as Variational Auto-Encoder to diversify the captions, which however neglect two key factors of diverse expression, i.e., the lexical diversity and the syntactic diversity. To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema. VSSI-cap mainly innovates in a novel structure, i.e., Variational Multi-modal Inferring tree (termed VarMI-tree). In particular, conditioned on the visual-textual features from the encoder, the VarMI-tree models the lexical and syntactic diversities by inferring their latent variables (with variations) in an approximate posterior inference guided by a visual semantic prior. Then, a reconstruction loss and the posterior-prior KL-divergence are jointly estimated to optimize the VSSI-cap model. Finally, diverse captions are generated upon the visual features and the latent variables from this structured encoder-inferer-decoder model. Experiments on the benchmark dataset show that the proposed VSSI-cap achieves significant improvements over the state of the art.
Tasks	Image Captioning
Published	2019-12-01
URL	http://papers.nips.cc/paper/8468-variational-structured-semantic-inference-for-diverse-image-captioning
PDF	http://papers.nips.cc/paper/8468-variational-structured-semantic-inference-for-diverse-image-captioning.pdf
PWC	https://paperswithcode.com/paper/variational-structured-semantic-inference-for
Repo
Framework
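
The posterior-prior KL-divergence term mentioned in the abstract has a closed form when both distributions are Gaussian with diagonal covariance, which is the common choice in variational auto-encoders. The sketch below computes that closed form; treating the VSSI-cap posterior and prior as diagonal Gaussians is an assumption made here for illustration, and the function name is hypothetical.

```python
import math

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for diagonal Gaussians, summed over dimensions.

    Per dimension:
        log(sigma_p / sigma_q)
        + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2)
        - 1/2
    """
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += (
            math.log(sp / sq)
            + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2)
            - 0.5
        )
    return total

# Identical distributions have zero divergence.
print(kl_diag_gaussians([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))
# Shifting the prior mean by 1 in one dimension (unit variances).
print(kl_diag_gaussians([0.0], [1.0], [1.0], [1.0]))
```

In a variational objective this term is added to the reconstruction loss, so the learned posterior is pulled toward the prior while still having to explain the data.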

Fermi at SemEval-2019 Task 4: The sarah-jane-smith Hyperpartisan News Detector


Title	Fermi at SemEval-2019 Task 4: The sarah-jane-smith Hyperpartisan News Detector
Authors	Nikhil Chakravartula, Vijayasaradhi Indurthi, Bakhtiyar Syed
Abstract	This paper describes our system (Fermi) for Task 4: Hyper-partisan News detection of SemEval-2019. We use simple text classification algorithms by transforming the input features to a reduced feature set. We aim to find the right number of features useful for efficient classification and explore multiple training models to evaluate the performance of these text classification algorithms. Our team - Fermi's model achieved an accuracy of 59.10% and an F1 score of 69.5% on the official test data set. In this paper, we provide a detailed description of the approach as well as the results obtained in the task.
Tasks	Text Classification
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2163/
PDF	https://www.aclweb.org/anthology/S19-2163
PWC	https://paperswithcode.com/paper/fermi-at-semeval-2019-task-4-the-sarah-jane
Repo
Framework
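
An F1 score well above accuracy, as reported in the abstract, is typical of a binary classifier that labels most inputs as positive: recall is high, while many negatives are misclassified. The sketch below shows the arithmetic on a hypothetical confusion matrix; the counts are invented for illustration and are not the system's actual results.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical counts on a balanced test set of 200 articles.
tp, fp, fn, tn = 90, 70, 10, 30
print(f"accuracy = {accuracy(tp, fp, fn, tn):.3f}")
print(f"F1       = {f1_score(tp, fp, fn):.3f}")
```

With these counts the classifier finds 90 of 100 positives but wrongly flags 70 of 100 negatives, so F1 for the positive class ends up higher than overall accuracy.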

UdS-DFKI Participation at WMT 2019: Low-Resource (en-gu) and Coreference-Aware (en-de) Systems


Title	UdS-DFKI Participation at WMT 2019: Low-Resource (en-gu) and Coreference-Aware (en-de) Systems
Authors	Cristina España-Bonet, Dana Ruiter
Abstract	This paper describes the UdS-DFKI submission to the WMT2019 news translation task for Gujarati–English (low-resourced pair) and German–English (document-level evaluation). Our systems rely on the on-line extraction of parallel sentences from comparable corpora for the first scenario and on the inclusion of coreference-related information in the training data in the second one.
Tasks
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-5315/
PDF	https://www.aclweb.org/anthology/W19-5315
PWC	https://paperswithcode.com/paper/uds-dfki-participation-at-wmt-2019-low
Repo
Framework

Tweet Classification without the Tweet: An Empirical Examination of User versus Document Attributes


Title	Tweet Classification without the Tweet: An Empirical Examination of User versus Document Attributes
Authors	Veronica Lynn, Salvatore Giorgi, Niranjan Balasubramanian, H. Andrew Schwartz
Abstract	NLP naturally puts a primary focus on leveraging document language, occasionally considering user attributes as supplemental. However, as we tackle more social scientific tasks, it is possible user attributes might be of primary importance and the document supplemental. Here, we systematically investigate the predictive power of user-level features alone versus document-level features for document-level tasks. We first show user attributes can sometimes carry more task-related information than the document itself. For example, a tweet-level stance detection model using only 13 user-level attributes (i.e. features that did not depend on the specific tweet) was able to obtain a higher F1 than the top-performing SemEval participant. We then consider multiple tasks and a wider range of user attributes, showing the performance of strong document-only models can often be improved (as in stance, sentiment, and sarcasm) with user attributes, particularly benefiting tasks with stable "trait-like" outcomes (e.g. stance) most relative to frequently changing "state-like" outcomes (e.g. sentiment). These results not only support the growing work on integrating user factors into predictive systems, but also suggest that some of our NLP tasks might be better cast primarily as user-level (or human) tasks.
Tasks	Stance Detection
Published	2019-06-01
URL	https://www.aclweb.org/anthology/W19-2103/
PDF	https://www.aclweb.org/anthology/W19-2103
PWC	https://paperswithcode.com/paper/tweet-classification-without-the-tweet-an
Repo
Framework

Stance Classification, Outcome Prediction, and Impact Assessment: NLP Tasks for Studying Group Decision-Making


Title	Stance Classification, Outcome Prediction, and Impact Assessment: NLP Tasks for Studying Group Decision-Making
Authors	Elijah Mayfield, Alan Black
Abstract	In group decision-making, the nuanced process of conflict and resolution that leads to consensus formation is closely tied to the quality of decisions made. Behavioral scientists rarely have rich access to process variables, though, as unstructured discussion transcripts are difficult to analyze. Here, we define ways for NLP researchers to contribute to the study of groups and teams. We introduce three tasks alongside a large new corpus of over 400,000 group debates on Wikipedia. We describe the tasks and their importance, then provide baselines showing that BERT contextualized word embeddings consistently outperform other language representations.
Tasks	Decision Making, Word Embeddings
Published	2019-06-01
URL	https://www.aclweb.org/anthology/W19-2108/
PDF	https://www.aclweb.org/anthology/W19-2108
PWC	https://paperswithcode.com/paper/stance-classification-outcome-prediction-and
Repo
Framework

A Sociolinguistic Study of Online Echo Chambers on Twitter


Title	A Sociolinguistic Study of Online Echo Chambers on Twitter
Authors	Nikita Duseja, Harsh Jhamtani
Abstract	Online social media platforms such as Facebook and Twitter are increasingly facing criticism for polarization of users. One particular aspect which has caught the attention of various critics is the presence of users in echo chambers - a situation wherein users are exposed mostly to the opinions which are in sync with their own views. In this paper, we perform a sociolinguistic study by comparing the tweets of users in echo chambers with the tweets of users not in echo chambers with similar levels of polarity on a broad topic. Specifically, we carry out a comparative analysis of tweet structure, lexical choices, and focus issues, and provide possible explanations for the results.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/W19-2109/
PDF	https://www.aclweb.org/anthology/W19-2109
PWC	https://paperswithcode.com/paper/a-sociolinguistic-study-of-online-echo
Repo
Framework