January 25, 2020

2959 words 14 mins read

Paper Group NAWR 15

Heuristic Authorship Obfuscation. Combining Structured and Free-text Electronic Medical Record Data for Real-time Clinical Decision Support. Automated Pyramid Summarization Evaluation. Human-level Protein Localization with Convolutional Neural Networks. Bayesian Inference Semantics: A Modelling System and A Test Suite. HistoSegNet: Semantic Segment …

Heuristic Authorship Obfuscation

Title Heuristic Authorship Obfuscation
Authors Janek Bevendorff, Martin Potthast, Matthias Hagen, Benno Stein
Abstract Authorship verification is the task of determining whether two texts were written by the same author. We deal with the adversarial task, called authorship obfuscation: preventing verification by altering a to-be-obfuscated text. Our new obfuscation approach (1) models writing style difference as the Jensen-Shannon distance between the character n-gram distributions of texts, and (2) manipulates an author's subconsciously encoded writing style in a sophisticated manner using heuristic search. To obfuscate, we analyze the huge space of textual variants for a paraphrased version of the to-be-obfuscated text that has a sufficient Jensen-Shannon distance at minimal costs in terms of text quality. We analyze, quantify, and illustrate the rationale of this approach, define paraphrasing operators, derive obfuscation thresholds, and develop an effective obfuscation framework. Our authorship obfuscation approach defeats state-of-the-art verification approaches, including unmasking and compression models, while keeping text changes at a minimum.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1104/
PDF https://www.aclweb.org/anthology/P19-1104
PWC https://paperswithcode.com/paper/heuristic-authorship-obfuscation
Repo https://github.com/webis-de/acl-19
Framework none
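The style-difference measure at the core of this paper, the Jensen-Shannon distance between character n-gram distributions, is easy to sketch. The snippet below is a minimal illustration, not the authors' implementation; the n-gram order and the use of scipy's `jensenshannon` are assumptions.

```python
from collections import Counter
from scipy.spatial.distance import jensenshannon

def char_ngram_dist(text, n=3):
    """Relative frequencies of character n-grams in a text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def style_distance(text_a, text_b, n=3):
    """Jensen-Shannon distance between the char n-gram distributions of two texts."""
    da, db = char_ngram_dist(text_a, n), char_ngram_dist(text_b, n)
    vocab = sorted(set(da) | set(db))
    p = [da.get(g, 0.0) for g in vocab]
    q = [db.get(g, 0.0) for g in vocab]
    return jensenshannon(p, q, base=2)  # 0 = identical style profile, 1 = maximally different

print(style_distance("the quick brown fox", "the quick brown fox jumps"))
```

The obfuscator then searches the space of paraphrases for a variant whose distance to the original exceeds a threshold while changing the text as little as possible.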

Combining Structured and Free-text Electronic Medical Record Data for Real-time Clinical Decision Support

Title Combining Structured and Free-text Electronic Medical Record Data for Real-time Clinical Decision Support
Authors Emilia Apostolova, Tony Wang, Tim Tschampel, Ioannis Koutroulis, Tom Velez
Abstract The goal of this work is to utilize Electronic Medical Record (EMR) data for real-time Clinical Decision Support (CDS). We present a deep learning approach to combining, in real time, available diagnosis codes (ICD codes) and free-text notes: Patient Context Vectors. Patient Context Vectors are created by averaging ICD code embeddings, and by predicting the same from free-text notes via a Convolutional Neural Network. The Patient Context Vectors are then simply appended to the available structured data (vital signs and lab results) to build prediction models for a specific condition. Experiments on predicting ARDS, a rare and complex condition, demonstrate the utility of Patient Context Vectors as a means of summarizing the patient history and overall condition, and show that they significantly improve the prediction model results.
Tasks
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-5007/
PDF https://www.aclweb.org/anthology/W19-5007
PWC https://paperswithcode.com/paper/combining-structured-and-free-text-electronic
Repo https://github.com/ema-/patient-context-vectors
Framework none
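The core construction, averaging ICD code embeddings into a fixed-size patient vector and appending it to structured features, can be sketched in a few lines. The embedding table, codes, and vital-sign values below are placeholders, not the authors' data or trained embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder: in the paper the code embeddings are learned from EMR data; here they are random.
icd_embeddings = {code: rng.normal(size=50) for code in ["J80", "A41.9", "I10", "E11.9"]}

def patient_context_vector(icd_codes):
    """Average the embeddings of a patient's ICD codes into one fixed-size vector."""
    vecs = [icd_embeddings[c] for c in icd_codes if c in icd_embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

# Append the context vector to structured data (vital signs, lab results)
structured = np.array([98.6, 120.0, 80.0, 14.2])  # hypothetical vitals/labs
features = np.concatenate([structured, patient_context_vector(["J80", "I10"])])
print(features.shape)  # one input row for a downstream ARDS prediction model
```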

Automated Pyramid Summarization Evaluation

Title Automated Pyramid Summarization Evaluation
Authors Yanjun Gao, Chen Sun, Rebecca J. Passonneau
Abstract Pyramid evaluation was developed to assess the content of paragraph-length summaries of source texts. A pyramid lists the distinct units of content found in several reference summaries, weights content units by how many reference summaries they occur in, and produces three scores based on the weighted content of new summaries. We present an automated method that is more efficient, more transparent, and more complete than previous automated pyramid methods. It is tested on a new dataset of student summaries and on historical NIST data from extractive summarizers.
Tasks
Published 2019-11-01
URL https://www.aclweb.org/anthology/K19-1038/
PDF https://www.aclweb.org/anthology/K19-1038
PWC https://paperswithcode.com/paper/automated-pyramid-summarization-evaluation
Repo https://github.com/serenayj/PyrEval
Framework none
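A toy version of pyramid scoring makes the description concrete: content units are weighted by how many reference summaries contain them, and a candidate summary is scored by the weights of the units it covers. This is a simplified sketch with hand-annotated content units, not the PyrEval pipeline, which also extracts and matches the units automatically.

```python
from collections import Counter

# Content units each reference summary expresses (annotated by hand, for illustration)
reference_scus = [
    {"dam collapsed", "village flooded", "aid arrived"},
    {"dam collapsed", "village flooded"},
    {"dam collapsed", "aid arrived"},
]
weights = Counter(scu for ref in reference_scus for scu in ref)

def raw_pyramid_score(summary_scus):
    """Sum of pyramid weights of the content units the candidate summary expresses."""
    return sum(weights[scu] for scu in summary_scus if scu in weights)

candidate = {"dam collapsed", "aid arrived"}
# Best attainable raw score with the same number of content units
max_score = sum(w for _, w in weights.most_common(len(candidate)))
print(raw_pyramid_score(candidate) / max_score)  # normalized pyramid-style score
```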

Human-level Protein Localization with Convolutional Neural Networks

Title Human-level Protein Localization with Convolutional Neural Networks
Authors Elisabeth Rumetshofer, Markus Hofmarcher, Clemens Röhrl, Sepp Hochreiter, Günter Klambauer
Abstract Localizing a specific protein in a human cell is essential for understanding cellular functions and biological processes of underlying diseases. A promising, low-cost, and time-efficient biotechnology for localizing proteins is high-throughput fluorescence microscopy imaging (HTI). This imaging technique stains the protein of interest in a cell with fluorescent antibodies and subsequently takes a microscopic image. Together with images of other stained proteins or cell organelles and the annotation by the Human Protein Atlas project, these images provide a rich source of information on the protein location which can be utilized by computational methods. It is yet unclear how precise such methods are and whether they can compete with human experts. We here focus on deep learning image analysis methods and, in particular, on Convolutional Neural Networks (CNNs) since they showed overwhelming success across different imaging tasks. We propose a novel CNN architecture “GapNet-PL” that has been designed to tackle the characteristics of HTI data and uses global averages of filters at different abstraction levels. We present the largest comparison of CNN architectures including GapNet-PL for protein localization in HTI images of human cells. GapNet-PL outperforms all other competing methods and reaches close to perfect localization in all 13 tasks with an average AUC of 98% and F1 score of 78%. On a separate test set the performance of GapNet-PL was compared with three human experts and 25 scholars. GapNet-PL achieved an accuracy of 91%, significantly (p-value 1.1e−6) outperforming the best human expert with an accuracy of 72%.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=ryl5khRcKm
PDF https://openreview.net/pdf?id=ryl5khRcKm
PWC https://paperswithcode.com/paper/human-level-protein-localization-with
Repo https://github.com/ml-jku/gapnet-pl
Framework tf
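The architectural idea, concatenating global averages of feature maps taken at several depths and feeding them to a classifier, can be sketched in Keras. Filter counts, strides, and input shape below are illustrative assumptions, not the published GapNet-PL configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def gap_net(input_shape=(512, 512, 4), n_classes=13):
    """Toy GAP-style network: global average pooling at several abstraction levels."""
    inp = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, strides=2, activation="relu")(inp)
    gap1 = layers.GlobalAveragePooling2D()(x)   # low-level features
    x = layers.Conv2D(64, 3, strides=2, activation="relu")(x)
    gap2 = layers.GlobalAveragePooling2D()(x)   # mid-level features
    x = layers.Conv2D(128, 3, strides=2, activation="relu")(x)
    gap3 = layers.GlobalAveragePooling2D()(x)   # high-level features
    merged = layers.Concatenate()([gap1, gap2, gap3])
    out = layers.Dense(n_classes, activation="sigmoid")(merged)  # multi-label localization
    return tf.keras.Model(inp, out)

gap_net().summary()
```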

Bayesian Inference Semantics: A Modelling System and A Test Suite

Title Bayesian Inference Semantics: A Modelling System and A Test Suite
Authors Jean-Philippe Bernardy, Rasmus Blanck, Stergios Chatzikyriakidis, Shalom Lappin, Aleksandre Maskharashvili
Abstract We present BIS, a Bayesian Inference Semantics, for probabilistic reasoning in natural language. The current system is based on the framework of Bernardy et al. (2018), but departs from it in important respects. BIS makes use of Bayesian learning for inferring a hypothesis from premises. This involves estimating the probability of the hypothesis, given the data supplied by the premises of an argument. It uses a syntactic parser to generate typed syntactic structures that serve as input to a model generation system. Sentences are interpreted compositionally to probabilistic programs, and the corresponding truth values are estimated using sampling methods. BIS successfully deals with various probabilistic semantic phenomena, including frequency adverbs, generalised quantifiers, generics, and vague predicates. It performs well on a number of interesting probabilistic reasoning tasks. It also sustains most classically valid inferences (instantiation, de Morgan's laws, etc.). To test BIS we have built an experimental test suite with examples of a range of probabilistic and classical inference patterns.
Tasks Bayesian Inference
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-1029/
PDF https://www.aclweb.org/anthology/S19-1029
PWC https://paperswithcode.com/paper/bayesian-inference-semantics-a-modelling
Repo https://github.com/GU-CLASP/bbclm2019
Framework none

HistoSegNet: Semantic Segmentation of Histological Tissue Type in Whole Slide Images

Title HistoSegNet: Semantic Segmentation of Histological Tissue Type in Whole Slide Images
Authors Lyndon Chan, Mahdi S. Hosseini, Corwyn Rowsell, Konstantinos N. Plataniotis, Savvas Damaskinos
Abstract In digital pathology, tissue slides are scanned into Whole Slide Images (WSI) and pathologists first screen for diagnostically-relevant Regions of Interest (ROIs) before reviewing them. Screening for ROIs is a tedious and time-consuming visual recognition task which can be exhausting. The cognitive workload could be reduced by developing a visual aid to narrow down the visual search area by highlighting (or segmenting) regions of diagnostic relevance, enabling pathologists to spend more time diagnosing relevant ROIs. In this paper, we propose HistoSegNet, a method for semantic segmentation of histological tissue type (HTT). Using the HTT-annotated Atlas of Digital Pathology (ADP) database, we train a Convolutional Neural Network on the patch annotations, infer Gradient-Weighted Class Activation Maps, average overlapping predictions, and post-process the segmentation with a fully-connected Conditional Random Field. Our method outperforms more complicated weakly-supervised semantic segmentation methods and can generalize to other datasets without retraining.
Tasks Medical Image Segmentation, Semantic Segmentation, Weakly-Supervised Semantic Segmentation
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Chan_HistoSegNet_Semantic_Segmentation_of_Histological_Tissue_Type_in_Whole_Slide_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Chan_HistoSegNet_Semantic_Segmentation_of_Histological_Tissue_Type_in_Whole_Slide_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/histosegnet-semantic-segmentation-of
Repo https://github.com/lyndonchan/hsn_v1
Framework tf
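One pipeline step that is easy to illustrate is averaging overlapping patch-level activation maps back into a whole-slide map. The patch size, stride, and array shapes below are assumptions, and the Grad-CAM and CRF stages of HistoSegNet are omitted.

```python
import numpy as np

def stitch_patch_maps(patch_maps, coords, slide_hw, patch=224):
    """Average overlapping per-patch activation maps into one slide-level map."""
    h, w = slide_hw
    acc = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for m, (y, x) in zip(patch_maps, coords):
        acc[y:y + patch, x:x + patch] += m
        cnt[y:y + patch, x:x + patch] += 1
    return acc / np.maximum(cnt, 1)  # regions covered by several patches get averaged

maps = [np.random.rand(224, 224) for _ in range(4)]
coords = [(0, 0), (0, 112), (112, 0), (112, 112)]  # patches with 50% overlap
print(stitch_patch_maps(maps, coords, (336, 336)).shape)
```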

Fast Concept Mention Grouping for Concept Map-based Multi-Document Summarization

Title Fast Concept Mention Grouping for Concept Map-based Multi-Document Summarization
Authors Tobias Falke, Iryna Gurevych
Abstract Concept map-based multi-document summarization has recently been proposed as a variant of the traditional summarization task with graph-structured summaries. As shown by previous work, the grouping of coreferent concept mentions across documents is a crucial subtask of it. However, while the current state-of-the-art method suggested a new grouping method that was shown to improve the summary quality, its use of pairwise comparisons leads to polynomial runtime complexity that prohibits the application to large document collections. In this paper, we propose two alternative grouping techniques based on locality sensitive hashing, approximate nearest neighbor search and a fast clustering algorithm. They exhibit linear and log-linear runtime complexity, making them much more scalable. We report experimental results that confirm the improved runtime behavior while also showing that the quality of the summary concept maps remains comparable.
Tasks Document Summarization, Multi-Document Summarization
Published 2019-06-01
URL https://www.aclweb.org/anthology/N19-1074/
PDF https://www.aclweb.org/anthology/N19-1074
PWC https://paperswithcode.com/paper/fast-concept-mention-grouping-for-concept-map
Repo https://github.com/UKPLab/naacl2019-cmaps-lshcw
Framework none
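The scalable grouping idea, hashing mention embeddings with random hyperplanes so that only mentions landing in the same bucket need to be compared, can be sketched as below. The embedding dimensionality and number of hyperplanes are arbitrary choices, and the real system adds approximate nearest neighbor search and a fast clustering step on top.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(42)
dim, n_planes = 100, 12
planes = rng.normal(size=(n_planes, dim))  # random hyperplanes for signed projections

def lsh_key(vec):
    """Bit signature: which side of each random hyperplane the vector falls on."""
    return tuple((planes @ vec > 0).astype(int))

def group_mentions(mention_vecs):
    """Group mention embeddings that share an LSH bucket (candidate coreferent mentions)."""
    buckets = defaultdict(list)
    for i, v in enumerate(mention_vecs):
        buckets[lsh_key(v)].append(i)
    return list(buckets.values())

mentions = rng.normal(size=(1000, dim))
print(len(group_mentions(mentions)))  # far fewer comparisons than all ~500k pairs
```

Because each mention is hashed once, the grouping runs in time linear in the number of mentions rather than quadratic, which is the scalability gain the paper reports.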

Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency

Title Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency
Authors Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che
Abstract We address the problem of adversarial attacks on text classification, which is rarely studied compared to attacks on image classification. The challenge of this task is to generate adversarial examples that maintain lexical correctness, grammatical correctness and semantic similarity. Based on the synonym substitution strategy, we introduce a new word replacement order determined by both the word saliency and the classification probability, and propose a greedy algorithm called probability weighted word saliency (PWWS) for text adversarial attack. Experiments on three popular datasets using convolutional as well as LSTM models show that PWWS reduces the classification accuracy to the greatest extent while keeping a very low word substitution rate. A human evaluation study shows that our generated adversarial examples maintain semantic similarity well and are hard for humans to perceive. Performing adversarial training using our perturbed datasets improves the robustness of the models. Finally, our method also exhibits good transferability of the generated adversarial examples.
Tasks Adversarial Attack, Image Classification, Semantic Similarity, Semantic Textual Similarity, Text Classification
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1103/
PDF https://www.aclweb.org/anthology/P19-1103
PWC https://paperswithcode.com/paper/generating-natural-language-adversarial-2
Repo https://github.com/JHL-HUST/PWWS
Framework none
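The replacement-order heuristic is the interesting part: a word's priority combines its saliency (how much the true-class probability drops when the word is blanked out) with the probability drop achieved by its best synonym. The sketch below assumes a `classify` function returning class probabilities and a `synonyms` lookup; both are placeholders, not the paper's WordNet-based setup.

```python
import numpy as np

def word_saliency(classify, words, true_label, mask="<unk>"):
    """Drop in true-class probability when each word is replaced by a mask token."""
    base = classify(" ".join(words))[true_label]
    sal = []
    for i in range(len(words)):
        masked = words[:i] + [mask] + words[i + 1:]
        sal.append(base - classify(" ".join(masked))[true_label])
    return np.array(sal)

def pwws_order(classify, words, true_label, synonyms):
    """PWWS-style priority: softmax(saliency) * probability drop of each word's best synonym."""
    sal = np.exp(word_saliency(classify, words, true_label))
    sal /= sal.sum()                      # softmax-normalized word saliency
    base = classify(" ".join(words))[true_label]
    gains = []
    for i, w in enumerate(words):
        best = 0.0
        for s in synonyms.get(w, []):
            swapped = words[:i] + [s] + words[i + 1:]
            best = max(best, base - classify(" ".join(swapped))[true_label])
        gains.append(best)
    return np.argsort(-sal * np.array(gains))  # replace words greedily in this order
```

Words are then substituted greedily in descending priority until the classifier's prediction flips, which keeps the number of changed words low.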

FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP

Title FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP
Authors Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, Roland Vollgraf
Abstract We present FLAIR, an NLP framework designed to facilitate training and distribution of state-of-the-art sequence labeling, text classification and language models. The core idea of the framework is to present a simple, unified interface for conceptually very different types of word and document embeddings. This effectively hides all embedding-specific engineering complexity and allows researchers to “mix and match” various embeddings with little effort. The framework also implements standard model training and hyperparameter selection routines, as well as a data fetching module that can download publicly available NLP datasets and convert them into data structures for quick set up of experiments. Finally, FLAIR also ships with a “model zoo” of pre-trained models to allow researchers to use state-of-the-art NLP models in their applications. This paper gives an overview of the framework and its functionality. The framework is available on GitHub at https://github.com/zalandoresearch/flair .
Tasks Text Classification
Published 2019-06-01
URL https://www.aclweb.org/anthology/N19-4010/
PDF https://www.aclweb.org/anthology/N19-4010
PWC https://paperswithcode.com/paper/flair-an-easy-to-use-framework-for-state-of
Repo https://github.com/zalandoresearch/flair
Framework pytorch
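Since the framework itself is the paper's subject, a short usage example is in order. This follows the pre-trained tagging example from the FLAIR documentation of that period; model names and the exact API may have changed in later releases.

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load a pre-trained named entity recognition model from the "model zoo"
tagger = SequenceTagger.load("ner")

# Tag a sentence and inspect the predicted entity spans
sentence = Sentence("George Washington went to Washington .")
tagger.predict(sentence)
print(sentence.to_tagged_string())
```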

Generalizing Unmasking for Short Texts

Title Generalizing Unmasking for Short Texts
Authors Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast
Abstract Authorship verification is the problem of inferring whether two texts were written by the same author. For this task, unmasking is one of the most robust approaches as of today with the major shortcoming of only being applicable to book-length texts. In this paper, we present a generalized unmasking approach which allows for authorship verification of texts as short as four printed pages with very high precision at an adjustable recall tradeoff. Our generalized approach therefore reduces the required material by orders of magnitude, making unmasking applicable to authorship cases of more practical proportions. The new approach is on par with other state-of-the-art techniques that are optimized for texts of this length: it achieves accuracies of 75–80%, while also allowing for easy adjustment to forensic scenarios that require higher levels of confidence in the classification.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/N19-1068/
PDF https://www.aclweb.org/anthology/N19-1068
PWC https://paperswithcode.com/paper/generalizing-unmasking-for-short-texts
Repo https://github.com/webis-de/NAACL-19
Framework none
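Classical unmasking, which this paper generalizes, is simple to sketch: split each text into chunks, train a linear classifier to tell the two texts apart, repeatedly remove the most discriminative features, and watch how fast cross-validation accuracy degrades (same-author pairs degrade quickly). The sketch below assumes scikit-learn and bag-of-words chunk matrices; the paper's contribution, making the curve reliable for texts of only a few pages, is not shown.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def unmasking_curve(X_a, X_b, rounds=10, drop_per_round=3):
    """Accuracy curve from iteratively removing the most discriminative features.

    X_a, X_b: chunk-by-feature frequency matrices for the two texts being compared.
    """
    X = np.vstack([X_a, X_b])
    y = np.array([0] * len(X_a) + [1] * len(X_b))
    keep = np.arange(X.shape[1])
    curve = []
    for _ in range(rounds):
        clf = LinearSVC().fit(X[:, keep], y)
        curve.append(cross_val_score(LinearSVC(), X[:, keep], y, cv=5).mean())
        order = np.argsort(clf.coef_[0])
        drop = np.concatenate([order[:drop_per_round], order[-drop_per_round:]])
        keep = np.delete(keep, drop)   # remove the strongest positive and negative features
    return curve
```

A steeply falling curve suggests the two texts differ only in a handful of superficial features, which is the signature of a same-author pair.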

A Normative Theory for Causal Inference and Bayes Factor Computation in Neural Circuits

Title A Normative Theory for Causal Inference and Bayes Factor Computation in Neural Circuits
Authors Wenhao Zhang, Si Wu, Brent Doiron, Tai Sing Lee
Abstract This study provides a normative theory for how Bayesian causal inference can be implemented in neural circuits. In both cognitive processes such as causal reasoning and perceptual inference such as cue integration, the nervous system needs to choose different models representing the underlying causal structures when making inferences on external stimuli. In multisensory processing, for example, the nervous system has to choose whether to integrate or segregate inputs from different sensory modalities to infer the sensory stimuli, based on whether the inputs are from the same or different sources. Making this choice is a model selection problem requiring the computation of the Bayes factor, the ratio of likelihoods between the integration and the segregation models. In this paper, we consider causal inference in multisensory processing and propose a novel generative model based on neural population codes that takes into account both stimulus feature and stimulus reliability in the inference. In the case of circular variables such as heading direction, our normative theory yields an analytical solution for computing the Bayes factor, with a clear geometric interpretation, which can be implemented by simple additive mechanisms with neural population code. Numerical simulation shows that the tunings of the neurons computing the Bayes factor are consistent with the “opposite neurons” discovered in the dorsal medial superior temporal (MSTd) and ventral intraparietal (VIP) areas for visual-vestibular processing. This study illuminates a potential neural mechanism for causal inference in the brain.
Tasks Causal Inference, Model Selection
Published 2019-12-01
URL http://papers.nips.cc/paper/8636-a-normative-theory-for-causal-inference-and-bayes-factor-computation-in-neural-circuits
PDF http://papers.nips.cc/paper/8636-a-normative-theory-for-causal-inference-and-bayes-factor-computation-in-neural-circuits.pdf
PWC https://paperswithcode.com/paper/a-normative-theory-for-causal-inference-and
Repo https://github.com/wenhao-z/Bayes_factor_Opposite_neuron
Framework none
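The quantity at the center of the paper is the Bayes factor between a common-cause (integration) model and a separate-cause (segregation) model. The paper derives an analytic, population-code implementation for circular variables; the snippet below is only a generic numerical illustration of that ratio, using Gaussian likelihoods, a Gaussian prior, and made-up parameters.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
sigma, prior_sd = 2.0, 10.0        # assumed sensory noise and prior width
x_vis, x_vest = 4.0, 5.0           # visual and vestibular cues (e.g., heading in degrees)

s = rng.normal(0, prior_sd, size=200_000)            # Monte Carlo samples from the prior
p_common = np.mean(norm.pdf(x_vis, s, sigma) * norm.pdf(x_vest, s, sigma))
p_separate = np.mean(norm.pdf(x_vis, s, sigma)) * np.mean(norm.pdf(x_vest, s, sigma))

bayes_factor = p_common / p_separate
print(bayes_factor)  # > 1 favours integrating the two cues; < 1 favours segregating them
```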

Modeling the Induced Action Alternation and the Caused-Motion Construction with Tree Adjoining Grammar (TAG) and Semantic Frames

Title Modeling the Induced Action Alternation and the Caused-Motion Construction with Tree Adjoining Grammar (TAG) and Semantic Frames
Authors Esther Seyffarth
Abstract
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/W19-1003/
PDF https://www.aclweb.org/anthology/W19-1003
PWC https://paperswithcode.com/paper/modeling-the-induced-action-alternation-and
Repo https://github.com/ojahnn/caused-motion-xmg
Framework none

Crystal Graph Neural Networks for Data Mining in Materials Science

Title Crystal Graph Neural Networks for Data Mining in Materials Science
Authors Takenori Yamamoto
Abstract Machine learning methods have been employed for materials prediction in various ways. It has recently been proposed that a crystalline material is represented by a multigraph called a crystal graph. Convolutional neural networks adapted to those graphs have successfully predicted bulk properties of materials with the use of equilibrium bond distances as spatial information. An investigation into graph neural networks for small molecules has recently shown that the no-distance model performs almost as well as the distance model. This paper proposes crystal graph neural networks (CGNNs) that use no bond distances, and introduces a scale-invariant graph coordinator that makes up crystal graphs for the CGNN models to be trained on the dataset based on a theoretical materials database. The CGNN models predict the bulk properties such as formation energy, unit cell volume, band gap, and total magnetization for every testing material, and the average errors are less than the corresponding ones of the database. The predicted band gaps and total magnetizations are used for the metal-insulator and nonmagnet-magnet binary classifications, which result in success. This paper presents discussions about high-throughput screening of candidate materials with the use of the predicted formation energies, and also about the future progress of materials data mining on the basis of the CGNN architectures.
Tasks Band Gap, Formation Energy, Materials Screening, Total Magnetization
Published 2019-05-27
URL https://www.researchgate.net/publication/333667001_Crystal_Graph_Neural_Networks_for_Data_Mining_in_Materials_Science
PDF https://storage.googleapis.com/rimcs_cgnn/cgnn_matsci_May_27_2019.pdf
PWC https://paperswithcode.com/paper/crystal-graph-neural-networks-for-data-mining
Repo https://github.com/Tony-Y/cgnn
Framework pytorch

DeeplyTough: Learning Structural Comparison of Protein Binding Sites

Title DeeplyTough: Learning Structural Comparison of Protein Binding Sites
Authors Martin Simonovsky, Joshua Meyers
Abstract Protein binding site comparison (pocket matching) is of importance in drug discovery. Identification of similar binding sites can help guide efforts for hit finding, understanding polypharmacology and characterization of protein function. The design of pocket matching methods has traditionally involved much intuition, and has employed a broad variety of algorithms and representations of the input protein structures. We regard the high heterogeneity of past work and the recent availability of large-scale benchmarks as an indicator that a data-driven approach may provide a new perspective. We propose DeeplyTough, a convolutional neural network that encodes a three-dimensional representation of protein binding sites into descriptor vectors that may be compared efficiently in an alignment-free manner by computing pairwise Euclidean distances. The network is trained with supervision: (i) to provide similar pockets with similar descriptors, (ii) to separate the descriptors of dissimilar pockets by a minimum margin, and (iii) to achieve robustness to nuisance variations. We evaluate our method using three large-scale benchmark datasets, on which it demonstrates excellent performance for held-out data coming from the training distribution and competitive performance when the trained network is required to generalize to datasets constructed independently.
Tasks Drug Discovery
Published 2019-04-05
URL https://www.biorxiv.org/content/10.1101/600304v1
PDF https://www.biorxiv.org/content/biorxiv/early/2019/04/05/600304.full.pdf
PWC https://paperswithcode.com/paper/deeplytough-learning-structural-comparison-of
Repo https://github.com/BenevolentAI/DeeplyTough
Framework pytorch
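The alignment-free comparison step is straightforward once descriptors exist: each binding site is encoded as a fixed-length vector, and sites are compared by pairwise Euclidean distance. The descriptors below are random stand-ins for the trained network's output, used only to show the comparison step.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(7)
# Stand-ins for descriptor vectors the trained CNN would produce for each binding site
descriptors = {name: rng.normal(size=128) for name in ["pocket_A", "pocket_B", "pocket_C"]}

names = list(descriptors)
X = np.stack([descriptors[n] for n in names])
dist = cdist(X, X)   # pairwise Euclidean distances, no structural alignment needed

# Rank candidate pockets by similarity to a query pocket
query = "pocket_A"
order = np.argsort(dist[names.index(query)])
print([names[i] for i in order])  # query first, then its most similar pockets
```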

Wide-Coverage Neural A* Parsing for Minimalist Grammars

Title Wide-Coverage Neural A* Parsing for Minimalist Grammars
Authors John Torr, Milos Stanojevic, Mark Steedman, Shay B. Cohen
Abstract Minimalist Grammars (Stabler, 1997) are a computationally oriented and rigorous formalisation of many aspects of Chomsky's (1995) Minimalist Program. This paper presents the first ever application of this formalism to the task of realistic wide-coverage parsing. The parser uses a linguistically expressive yet highly constrained grammar, together with an adaptation of the A* search algorithm currently used in CCG parsing (Lewis and Steedman, 2014; Lewis et al., 2016), with supertag probabilities provided by a bi-LSTM neural network supertagger trained on MGbank, a corpus of MG derivation trees. We report on some promising initial experimental results for overall dependency recovery as well as on the recovery of certain unbounded long distance dependencies. Finally, although like other MG parsers, ours has a high order polynomial worst case time complexity, we show that in practice its expected time complexity is cubic in the length of the sentence. The parser is publicly available.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1238/
PDF https://www.aclweb.org/anthology/P19-1238
PWC https://paperswithcode.com/paper/wide-coverage-neural-a-parsing-for-minimalist
Repo https://github.com/mgparsing/astar_mg_parser
Framework tf