February 1, 2020


Paper Group AWR 325


Re-Ranking Words to Improve Interpretability of Automatically Generated Topics


Title	Re-Ranking Words to Improve Interpretability of Automatically Generated Topics
Authors	Areej Alokaili, Nikolaos Aletras, Mark Stevenson
Abstract	Topic models, such as LDA, are widely used in Natural Language Processing. Making their output interpretable is an important area of research with applications to areas such as the enhancement of exploratory search interfaces and the development of interpretable machine learning models. Conventionally, topics are represented by their n most probable words; however, these representations are often difficult for humans to interpret. This paper explores the re-ranking of topic words to generate more interpretable topic representations. A range of approaches are compared and evaluated in two experiments. The first uses crowdworkers to associate topics represented by different word rankings with related documents. The second experiment is an automatic approach based on a document retrieval task applied to multiple domains. Results in both experiments demonstrate that re-ranking words improves topic interpretability and that the most effective re-ranking schemes were those which combine information about the importance of words both within topics and their relative frequency in the entire corpus. In addition, close correlation between the results of the two evaluation approaches suggests that the automatic method proposed here could be used to evaluate re-ranking methods without the need for human judgements.
Tasks	Interpretable Machine Learning
Published	2019-03-29
URL	http://arxiv.org/abs/1903.12542v1
PDF	http://arxiv.org/pdf/1903.12542v1.pdf
PWC	https://paperswithcode.com/paper/re-ranking-words-to-improve-interpretability
Repo	https://github.com/areejokaili/topic_reranking
Framework	none
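
The idea of combining a word's importance within a topic with its relative frequency in the corpus can be sketched as follows. This is a minimal illustration, not one of the schemes evaluated in the paper: the ratio score, the function name rerank_topic_words, and the toy probabilities and counts are all assumptions made for the example.

```python
def rerank_topic_words(topic_word_probs, corpus_word_counts, top_n=10):
    """Re-rank one topic's words by p(word | topic) / corpus relative frequency.

    The ratio is a hypothetical scoring rule chosen for illustration only.
    """
    total = sum(corpus_word_counts.values())
    scores = {}
    for word, prob in topic_word_probs.items():
        rel_freq = corpus_word_counts.get(word, 0) / total
        if rel_freq > 0:  # skip words never seen in the corpus counts
            scores[word] = prob / rel_freq
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Toy data (made up): "the" is probable in the topic but common everywhere.
topic = {"the": 0.20, "model": 0.10, "topic": 0.08, "lda": 0.05}
corpus = {"the": 5000, "model": 300, "topic": 100, "lda": 40, "cat": 4560}

baseline = sorted(topic, key=topic.get, reverse=True)  # n most probable words
reranked = rerank_topic_words(topic, corpus)
```

With this toy data the baseline ranking starts with the generic word "the", while the re-ranked list promotes words that are distinctive for the topic.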

Medical device surveillance with electronic health records


Title	Medical device surveillance with electronic health records
Authors	Alison Callahan, Jason A Fries, Christopher Ré, James I Huddleston III, Nicholas J Giori, Scott Delp, Nigam H Shah
Abstract	Post-market medical device surveillance is a challenge facing manufacturers, regulatory agencies, and health care providers. Electronic health records are valuable sources of real world evidence to assess device safety and track device-related patient outcomes over time. However, distilling this evidence remains challenging, as information is fractured across clinical notes and structured records. Modern machine learning methods for machine reading promise to unlock increasingly complex information from text, but face barriers due to their reliance on large and expensive hand-labeled training sets. To address these challenges, we developed and validated state-of-the-art deep learning methods that identify patient outcomes from clinical notes without requiring hand-labeled training data. Using hip replacements as a test case, our methods accurately extracted implant details and reports of complications and pain from electronic health records with up to 96.3% precision, 98.5% recall, and 97.4% F1, improved classification performance by 12.7-53.0% over rule-based methods, and detected over 6 times as many complication events compared to using structured data alone. Using these events to assess complication-free survivorship of different implant systems, we found significant variation between implants, including for risk of revision surgery, which could not be detected using coded data alone. Patients with revision surgeries had more hip pain mentions in the post-hip replacement, pre-revision period compared to patients with no evidence of revision surgery (mean hip pain mentions 4.97 vs. 3.23; t = 5.14; p < 0.001). Some implant models were associated with higher or lower rates of hip pain mentions. Our methods complement existing surveillance mechanisms by requiring orders of magnitude less hand-labeled training data, offering a scalable solution for national medical device surveillance.
Tasks	Reading Comprehension
Published	2019-04-03
URL	http://arxiv.org/abs/1904.07640v1
PDF	http://arxiv.org/pdf/1904.07640v1.pdf
PWC	https://paperswithcode.com/paper/190407640
Repo	https://github.com/som-shahlab/ehr-rwe
Framework	pytorch

Go-Explore: a New Approach for Hard-Exploration Problems


Title	Go-Explore: a New Approach for Hard-Exploration Problems
Authors	Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune
Abstract	A grand challenge in reinforcement learning is intelligent exploration, especially when rewards are sparse or deceptive. Two Atari games serve as benchmarks for such hard-exploration domains: Montezuma’s Revenge and Pitfall. On both games, current RL algorithms perform poorly, even those with intrinsic motivation, which is the dominant method to improve performance on hard-exploration domains. To address this shortfall, we introduce a new algorithm called Go-Explore. It exploits the following principles: (1) remember previously visited states, (2) first return to a promising state (without exploration), then explore from it, and (3) solve simulated environments through any available means (including by introducing determinism), then robustify via imitation learning. The combined effect of these principles is a dramatic performance improvement on hard-exploration problems. On Montezuma’s Revenge, Go-Explore scores a mean of over 43k points, almost 4 times the previous state of the art. Go-Explore can also harness human-provided domain knowledge and, when augmented with it, scores a mean of over 650k points on Montezuma’s Revenge. Its max performance of nearly 18 million surpasses the human world record, meeting even the strictest definition of “superhuman” performance. On Pitfall, Go-Explore with domain knowledge is the first algorithm to score above zero. Its mean score of almost 60k points exceeds expert human performance. Because Go-Explore produces high-performing demonstrations automatically and cheaply, it also outperforms imitation learning work where humans provide solution demonstrations. Go-Explore opens up many new research directions into improving it and weaving its insights into current RL algorithms. It may also enable progress on previously unsolvable hard-exploration problems in many domains, especially those that harness a simulator during training (e.g. robotics).
Tasks	Atari Games, Imitation Learning, Montezuma’s Revenge
Published	2019-01-30
URL	https://arxiv.org/abs/1901.10995v2
PDF	https://arxiv.org/pdf/1901.10995v2.pdf
PWC	https://paperswithcode.com/paper/go-explore-a-new-approach-for-hard
Repo	https://github.com/DanieleGravina/divergence-and-quality-diversity
Framework	none
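
Principles (1) and (2) from the abstract, remembering visited states and returning to one before exploring from it, can be sketched on a toy problem. Everything below is an assumption made for illustration: the one-dimensional corridor environment, the uniform choice of which remembered cell to return to, and the names step, explore and replay. The imitation-learning robustification phase is not shown.

```python
import random

GOAL = 20  # hypothetical corridor length for this toy example


def step(pos, action):
    """Deterministic toy environment: move left (-1) or right (+1), clipped to [0, GOAL]."""
    return max(0, min(GOAL, pos + action))


def explore(max_iterations=20000, explore_steps=5, seed=0):
    rng = random.Random(seed)
    # (1) Remember visited states: each cell maps to the shortest
    # action sequence found so far that reaches it from the start.
    archive = {0: []}
    for _ in range(max_iterations):
        if GOAL in archive:
            break
        # (2) First return to a remembered cell without exploring. The state
        # here is just the position, so restoring it is trivial. Uniform
        # selection is a simplification assumed for this sketch.
        cell = rng.choice(sorted(archive))
        pos, actions = cell, list(archive[cell])
        # Then explore from that cell with random actions.
        for _ in range(explore_steps):
            action = rng.choice((-1, 1))
            pos = step(pos, action)
            actions.append(action)
            if pos not in archive or len(actions) < len(archive[pos]):
                archive[pos] = list(actions)
    return archive


def replay(actions):
    """Replay an action sequence from the start state and return the final position."""
    pos = 0
    for action in actions:
        pos = step(pos, action)
    return pos


archive = explore()
```

Because the toy environment is deterministic, every stored action sequence can be replayed from the start to reach its cell, which is what makes the "return first, then explore" step cheap.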

Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation


Title	Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation
Authors	Hongwei Wang, Fuzheng Zhang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo
Abstract	Collaborative filtering often suffers from sparsity and cold start problems in real recommendation scenarios; therefore, researchers and engineers usually use side information to address these issues and improve the performance of recommender systems. In this paper, we consider knowledge graphs as the source of side information. We propose MKR, a Multi-task feature learning approach for Knowledge graph enhanced Recommendation. MKR is a deep end-to-end framework that utilizes a knowledge graph embedding task to assist the recommendation task. The two tasks are associated by cross&compress units, which automatically share latent features and learn high-order interactions between items in recommender systems and entities in the knowledge graph. We prove that cross&compress units have sufficient capability of polynomial approximation, and show that MKR is a generalized framework over several representative methods of recommender systems and multi-task learning. Through extensive experiments on real-world datasets, we demonstrate that MKR achieves substantial gains in movie, book, music, and news recommendation, over state-of-the-art baselines. MKR is also shown to be able to maintain a decent performance even if user-item interactions are sparse.
Tasks	Graph Embedding, Knowledge Graph Embedding, Knowledge Graphs, Multi-Task Learning, Recommendation Systems
Published	2019-01-23
URL	http://arxiv.org/abs/1901.08907v1
PDF	http://arxiv.org/pdf/1901.08907v1.pdf
PWC	https://paperswithcode.com/paper/multi-task-feature-learning-for-knowledge
Repo	https://github.com/hwwang55/MKR
Framework	tf

Is a Single Embedding Enough? Learning Node Representations that Capture Multiple Social Contexts


Title	Is a Single Embedding Enough? Learning Node Representations that Capture Multiple Social Contexts
Authors	Alessandro Epasto, Bryan Perozzi
Abstract	Recent interest in graph embedding methods has focused on learning a single representation for each node in the graph. But can nodes really be best described by a single vector representation? In this work, we propose a method for learning multiple representations of the nodes in a graph (e.g., the users of a social network). Based on a principled decomposition of the ego-network, each representation encodes the role of the node in a different local community in which the nodes participate. These representations allow for improved reconstruction of the nuanced relationships that occur in the graph – a phenomenon that we illustrate through state-of-the-art results on link prediction tasks on a variety of graphs, reducing the error by up to 90%. In addition, we show that these embeddings allow for effective visual analysis of the learned community structure.
Tasks	Graph Embedding, Link Prediction
Published	2019-05-06
URL	https://arxiv.org/abs/1905.02138v1
PDF	https://arxiv.org/pdf/1905.02138v1.pdf
PWC	https://paperswithcode.com/paper/is-a-single-embedding-enough-learning-node
Repo	https://github.com/benedekrozemberczki/Splitter
Framework	pytorch

Pairwise Comparisons with Flexible Time-Dynamics


Title	Pairwise Comparisons with Flexible Time-Dynamics
Authors	Lucas Maystre, Victor Kristof, Matthias Grossglauser
Abstract	Inspired by applications in sports where the skill of players or teams competing against each other varies over time, we propose a probabilistic model of pairwise-comparison outcomes that can capture a wide range of time dynamics. We achieve this by replacing the static parameters of a class of popular pairwise-comparison models by continuous-time Gaussian processes; the covariance function of these processes enables expressive dynamics. We develop an efficient inference algorithm that computes an approximate Bayesian posterior distribution. Despite the flexibility of our model, our inference algorithm requires only a few linear-time iterations over the data and can take advantage of modern multiprocessor computer architectures. We apply our model to several historical databases of sports outcomes and find that our approach outperforms competing approaches in terms of predictive performance, scales to millions of observations, and generates compelling visualizations that help in understanding and interpreting the data.
Tasks	Bayesian Inference, Gaussian Processes
Published	2019-03-18
URL	https://arxiv.org/abs/1903.07746v2
PDF	https://arxiv.org/pdf/1903.07746v2.pdf
PWC	https://paperswithcode.com/paper/linear-time-inference-for-pairwise
Repo	https://github.com/lucasmaystre/kickscore-kdd19
Framework	none
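
The substitution described in the abstract, a static skill parameter replaced by a Gaussian process over time, can be written out for one common pairwise-comparison model. The choice of the Bradley-Terry model and the symbols s_i, k_i and t are assumptions made for this sketch; the abstract refers only to "a class of popular pairwise-comparison models".

```latex
% Static Bradley-Terry model (assumed here as the example model):
% each competitor i has a fixed skill s_i.
\[
  p(i \succ j) = \frac{1}{1 + \exp\bigl(-(s_i - s_j)\bigr)}
\]

% Dynamic variant: each skill becomes a function of time with a
% Gaussian-process prior; the covariance function k_i sets the dynamics.
\[
  s_i(t) \sim \mathcal{GP}\bigl(0,\, k_i(t, t')\bigr),
  \qquad
  p(i \succ j \mid t) = \frac{1}{1 + \exp\bigl(-(s_i(t) - s_j(t))\bigr)}
\]
```

Under this reading, a constant covariance function recovers the static model, while smoother or rougher covariance functions give skills that drift slowly or change quickly.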

The Natural Selection of Words: Finding the Features of Fitness


Title	The Natural Selection of Words: Finding the Features of Fitness
Authors	Peter D. Turney, Saif M. Mohammad
Abstract	We introduce a dataset for studying the evolution of words, constructed from WordNet and the Google Books Ngram Corpus. The dataset tracks the evolution of 4,000 synonym sets (synsets), containing 9,000 English words, from 1800 AD to 2000 AD. We present a supervised learning algorithm that is able to predict the future leader of a synset: the word in the synset that will have the highest frequency. The algorithm uses features based on a word’s length, the characters in the word, and the historical frequencies of the word. It can predict change of leadership (including the identity of the new leader) fifty years in the future, with an F-score considerably above random guessing. Analysis of the learned models provides insight into the causes of change in the leader of a synset. The algorithm confirms observations linguists have made, such as the trend to replace the -ise suffix with -ize, the rivalry between the -ity and -ness suffixes, and the struggle between economy (shorter words are easier to remember and to write) and clarity (longer words are more distinctive and less likely to be confused with one another). The results indicate that integration of the Google Books Ngram Corpus with WordNet has significant potential for improving our understanding of how language evolves.
Tasks
Published	2019-08-19
URL	https://arxiv.org/abs/1908.07013v1
PDF	https://arxiv.org/pdf/1908.07013v1.pdf
PWC	https://paperswithcode.com/paper/the-natural-selection-of-words-finding-the
Repo	https://github.com/pdturney/natural-selection-of-words
Framework	none

Stein’s Lemma for the Reparameterization Trick with Exponential Family Mixtures


Title	Stein’s Lemma for the Reparameterization Trick with Exponential Family Mixtures
Authors	Wu Lin, Mohammad Emtiyaz Khan, Mark Schmidt
Abstract	Stein’s method (Stein, 1973; 1981) is a powerful tool for statistical applications, and has had a significant impact in machine learning. Stein’s lemma plays an essential role in Stein’s method. Previous applications of Stein’s lemma either required strong technical assumptions or were limited to Gaussian distributions with restricted covariance structures. In this work, we extend Stein’s lemma to exponential-family mixture distributions including Gaussian distributions with full covariance structures. Our generalization enables us to establish a connection between Stein’s lemma and the reparameterization trick to derive gradients of expectations of a large class of functions under weak assumptions. Using this connection, we can derive many new reparameterizable gradient-identities that go beyond the reach of existing works. For example, we give gradient identities when expectation is taken with respect to Student’s t-distribution, skew Gaussian, exponentially modified Gaussian, and normal inverse Gaussian.
Tasks
Published	2019-10-29
URL	https://arxiv.org/abs/1910.13398v1
PDF	https://arxiv.org/pdf/1910.13398v1.pdf
PWC	https://paperswithcode.com/paper/191013398
Repo	https://github.com/yorkerlin/VB-MixEF
Framework	none
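
For orientation, the classical Gaussian case of the connection mentioned in the abstract can be written as below. This is the standard textbook form, not the paper's extension to exponential-family mixtures, and it assumes h is differentiable with the expectations finite.

```latex
% Stein's lemma for x ~ N(mu, Sigma):
\[
  \mathbb{E}\bigl[(x - \mu)\, h(x)\bigr]
  = \Sigma\, \mathbb{E}\bigl[\nabla_x h(x)\bigr]
\]

% Score-function form of the gradient with respect to the mean:
\[
  \nabla_\mu\, \mathbb{E}\bigl[h(x)\bigr]
  = \mathbb{E}\bigl[\Sigma^{-1}(x - \mu)\, h(x)\bigr]
\]

% Substituting Stein's lemma gives the reparameterization-style gradient,
% which matches differentiating h(mu + L eps) with eps ~ N(0, I), Sigma = L L^T:
\[
  \nabla_\mu\, \mathbb{E}\bigl[h(x)\bigr]
  = \Sigma^{-1} \Sigma\, \mathbb{E}\bigl[\nabla_x h(x)\bigr]
  = \mathbb{E}\bigl[\nabla_x h(x)\bigr]
\]
```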

Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models


Title	Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models
Authors	Oren Melamud, Chaitanya Shivade
Abstract	Large-scale clinical data is invaluable to driving many computational scientific advances today. However, understandable concerns regarding patient privacy hinder the open dissemination of such data and give rise to suboptimal siloed research. De-identification methods attempt to address these concerns but were shown to be susceptible to adversarial attacks. In this work, we focus on the vast amounts of unstructured natural language data stored in clinical notes and propose to automatically generate synthetic clinical notes that are more amenable to sharing using generative models trained on real de-identified records. To evaluate the merit of such notes, we measure both their privacy preservation properties as well as utility in training clinical NLP models. Experiments using neural language models yield notes whose utility is close to that of the real ones in some clinical NLP tasks, yet leave ample room for future improvements.
Tasks
Published	2019-05-16
URL	https://arxiv.org/abs/1905.07002v2
PDF	https://arxiv.org/pdf/1905.07002v2.pdf
PWC	https://paperswithcode.com/paper/towards-automatic-generation-of-shareable
Repo	https://github.com/orenmel/synth-clinical-notes
Framework	pytorch

Graph Dynamical Networks for Unsupervised Learning of Atomic Scale Dynamics in Materials


Title	Graph Dynamical Networks for Unsupervised Learning of Atomic Scale Dynamics in Materials
Authors	Tian Xie, Arthur France-Lanord, Yanming Wang, Yang Shao-Horn, Jeffrey C. Grossman
Abstract	Understanding the dynamical processes that govern the performance of functional materials is essential for the design of next generation materials to tackle global energy and environmental challenges. Many of these processes involve the dynamics of individual atoms or small molecules in condensed phases, e.g. lithium ions in electrolytes, water molecules in membranes, molten atoms at interfaces, etc., which are difficult to understand due to the complexity of local environments. In this work, we develop graph dynamical networks, an unsupervised learning approach for understanding atomic scale dynamics in arbitrary phases and environments from molecular dynamics simulations. We show that important dynamical information can be learned for various multi-component amorphous material systems, which is difficult to obtain otherwise. With the large amounts of molecular dynamics data generated every day in nearly every aspect of materials design, this approach provides a broadly useful, automated tool to understand atomic scale dynamics in material systems.
Tasks
Published	2019-02-18
URL	https://arxiv.org/abs/1902.06836v2
PDF	https://arxiv.org/pdf/1902.06836v2.pdf
PWC	https://paperswithcode.com/paper/graph-dynamical-networks-unsupervised
Repo	https://github.com/txie-93/gdynet
Framework	tf

GraphTER: Unsupervised Learning of Graph Transformation Equivariant Representations via Auto-Encoding Node-wise Transformations


Title	GraphTER: Unsupervised Learning of Graph Transformation Equivariant Representations via Auto-Encoding Node-wise Transformations
Authors	Xiang Gao, Wei Hu, Guo-Jun Qi
Abstract	Recent advances in Graph Convolutional Neural Networks (GCNNs) have shown their efficiency for non-Euclidean data on graphs, which often require a large amount of labeled data with high cost. It is thus critical to learn graph feature representations in an unsupervised manner in practice. To this end, we propose a novel method for unsupervised learning of Graph Transformation Equivariant Representations (GraphTER), aiming to capture intrinsic patterns of graph structure under both global and local transformations. Specifically, we sample different groups of nodes from a graph and then transform them node-wise isotropically or anisotropically. Then, we self-train a representation encoder to capture the graph structures by reconstructing these node-wise transformations from the feature representations of the original and transformed graphs. In experiments, we apply the learned GraphTER to graphs of 3D point cloud data, and results on point cloud segmentation/classification show that GraphTER significantly outperforms state-of-the-art unsupervised approaches and pushes greatly closer towards the upper bound set by the fully supervised counterparts. The code is available at: https://github.com/gyshgx868/graph-ter.
Tasks
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08142v2
PDF	https://arxiv.org/pdf/1911.08142v2.pdf
PWC	https://paperswithcode.com/paper/graphter-unsupervised-learning-of-graph
Repo	https://github.com/gyshgx868/graph-ter
Framework	pytorch

MINA: Multilevel Knowledge-Guided Attention for Modeling Electrocardiography Signals


Title	MINA: Multilevel Knowledge-Guided Attention for Modeling Electrocardiography Signals
Authors	Shenda Hong, Cao Xiao, Tengfei Ma, Hongyan Li, Jimeng Sun
Abstract	Electrocardiography (ECG) signals are commonly used to diagnose various cardiac abnormalities. Recently, deep learning models have shown initial success in modeling ECG data; however, they are mostly black boxes and thus lack the interpretability needed for clinical usage. In this work, we propose MultIlevel kNowledge-guided Attention networks (MINA) that predict heart diseases from ECG signals with intuitive explanations aligned with medical knowledge. By extracting multilevel (beat-, rhythm- and frequency-level) domain knowledge features separately, MINA combines the medical knowledge and ECG data via a multilevel attention model, making the learned models highly interpretable. Our experiments showed MINA achieved PR-AUC 0.9436 (outperforming the best baseline by 5.51%) on a real-world ECG dataset. Finally, MINA also demonstrated robust performance and strong interpretability against signal distortion and noise contamination.
Tasks	Electrocardiography (ECG)
Published	2019-05-27
URL	https://arxiv.org/abs/1905.11333v3
PDF	https://arxiv.org/pdf/1905.11333v3.pdf
PWC	https://paperswithcode.com/paper/mina-multilevel-knowledge-guided-attention
Repo	https://github.com/hsd1503/MINA
Framework	pytorch

Distilling Structured Knowledge into Embeddings for Explainable and Accurate Recommendation


Title	Distilling Structured Knowledge into Embeddings for Explainable and Accurate Recommendation
Authors	Yuan Zhang, Xiaoran Xu, Hanning Zhou, Yan Zhang
Abstract	Recently, the embedding-based recommendation models (e.g., matrix factorization and deep models) have been prevalent in both academia and industry due to their effectiveness and flexibility. However, they also have such intrinsic limitations as lacking explainability and suffering from data sparsity. In this paper, we propose an end-to-end joint learning framework to get around these limitations without introducing any extra overhead by distilling structured knowledge from a differentiable path-based recommendation model. Through extensive experiments, we show that our proposed framework can achieve state-of-the-art recommendation performance and meanwhile provide interpretable recommendation reasons.
Tasks
Published	2019-12-18
URL	https://arxiv.org/abs/1912.08422v1
PDF	https://arxiv.org/pdf/1912.08422v1.pdf
PWC	https://paperswithcode.com/paper/distilling-structured-knowledge-into
Repo	https://github.com/yuan-pku/Distilling-Structured-Knowledge-into-Embeddings-for-Explainable-and-Accurate-Recommendation
Framework	none

Facial age estimation by deep residual decision making


Title	Facial age estimation by deep residual decision making
Authors	Shichao Li, Kwang-Ting Cheng
Abstract	Residual representation learning simplifies the optimization problem of learning complex functions and has been widely used by traditional convolutional neural networks. However, it has not been applied to deep neural decision forest (NDF). In this paper we incorporate residual learning into NDF and the resulting model achieves state-of-the-art level accuracy on three public age estimation benchmarks while requiring less memory and computation. We further employ a gradient-based technique to visualize the decision-making process of NDF and understand how it is influenced by facial image inputs. The code and pre-trained models will be available at https://github.com/Nicholasli1995/VisualizingNDF.
Tasks	Age Estimation, Decision Making, Representation Learning
Published	2019-08-28
URL	https://arxiv.org/abs/1908.10737v1
PDF	https://arxiv.org/pdf/1908.10737v1.pdf
PWC	https://paperswithcode.com/paper/facial-age-estimation-by-deep-residual
Repo	https://github.com/Nicholasli1995/VisualizingNDF
Framework	pytorch

Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks


Title	Lightweight and Efficient Neural Natural Language Processing with Quaternion Networks
Authors	Yi Tay, Aston Zhang, Luu Anh Tuan, Jinfeng Rao, Shuai Zhang, Shuohang Wang, Jie Fu, Siu Cheung Hui
Abstract	Many state-of-the-art neural models for NLP are heavily parameterized and thus memory inefficient. This paper proposes a series of lightweight and memory efficient neural architectures for a potpourri of natural language processing (NLP) tasks. To this end, our models exploit computation using Quaternion algebra and hypercomplex spaces, enabling not only expressive inter-component interactions but also significantly (75%) reduced parameter size due to fewer degrees of freedom in the Hamilton product. We propose Quaternion variants of models, giving rise to new architectures such as the Quaternion attention Model and Quaternion Transformer. Extensive experiments on a battery of NLP tasks demonstrate the utility of proposed Quaternion-inspired models, enabling up to 75% reduction in parameter size without significant loss in performance.
Tasks
Published	2019-06-11
URL	https://arxiv.org/abs/1906.04393v1
PDF	https://arxiv.org/pdf/1906.04393v1.pdf
PWC	https://paperswithcode.com/paper/lightweight-and-efficient-neural-natural
Repo	https://github.com/vanzytay/QuaternionTransformers
Framework	tf
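
The 75% figure in the abstract follows from how the Hamilton product reuses weights, and a small sketch makes the arithmetic concrete. The function names below are hypothetical, quaternions are represented as plain (r, i, j, k) tuples, and biases are ignored in the parameter counts.

```python
def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (r, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


def dense_params(n_in, n_out):
    """Weights in a real-valued dense layer (biases ignored)."""
    return n_in * n_out


def quaternion_params(n_in, n_out):
    """Weights in a quaternion layer over the same number of real features.

    n_in / 4 quaternion inputs map to n_out / 4 quaternion outputs, and each
    quaternion weight holds 4 real numbers that the Hamilton product shares
    across all 16 input-output component pairs.
    """
    assert n_in % 4 == 0 and n_out % 4 == 0
    return 4 * (n_in // 4) * (n_out // 4)


i, j = (0, 1, 0, 0), (0, 0, 1, 0)
k = hamilton_product(i, j)
reduction = 1 - quaternion_params(512, 512) / dense_params(512, 512)
```

Here a 512-to-512 layer needs 262144 real weights in the dense case and 65536 in the quaternion case, which is the 75% reduction the abstract cites.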