February 1, 2020

3232 words 16 mins read

Paper Group AWR 254



Assessing Reliability and Challenges of Uncertainty Estimations for Medical Image Segmentation

Title Assessing Reliability and Challenges of Uncertainty Estimations for Medical Image Segmentation
Authors Alain Jungo, Mauricio Reyes
Abstract Despite the recent improvements in overall accuracy, deep learning systems still exhibit low levels of robustness. Detecting possible failures is critical for a successful clinical integration of these systems, where each data point corresponds to an individual patient. Uncertainty measures are a promising direction to improve failure detection, since they provide a measure of a system’s confidence. Although many uncertainty estimation methods have been proposed for deep learning, little is known about their benefits and current challenges for medical image segmentation. Therefore, we report results of evaluating common voxel-wise uncertainty measures with respect to their reliability and limitations on two medical image segmentation datasets. Results show that current uncertainty methods perform similarly and that, although they are well-calibrated at the dataset level, they tend to be miscalibrated at the subject level. The reliability of uncertainty estimates is therefore compromised, highlighting the importance of developing subject-wise uncertainty estimation. Additionally, among the benchmarked methods, we found auxiliary networks to be a valid alternative to common uncertainty methods, since they can be applied to any previously trained segmentation model.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2019-07-07
URL https://arxiv.org/abs/1907.03338v2
PDF https://arxiv.org/pdf/1907.03338v2.pdf
PWC https://paperswithcode.com/paper/assessing-reliability-and-challenges-of
Repo https://github.com/alainjungo/reliability-challenges-uncertainty
Framework pytorch
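
A minimal sketch of one of the voxel-wise measures such a benchmark covers: predictive entropy from Monte Carlo dropout. The segmentation network is a stand-in for any model containing dropout layers; the sample count and the batch-norm caveat are assumptions, not the paper's exact protocol.

```python
import torch
import torch.nn as nn

def mc_dropout_uncertainty(model: nn.Module, image: torch.Tensor, n_samples: int = 20):
    """Voxel-wise predictive entropy from Monte Carlo dropout samples."""
    model.train()  # keeps dropout active; in practice also freeze batch-norm stats
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(image), dim=1)
                             for _ in range(n_samples)])           # (S, B, C, ...)
    mean_p = probs.mean(dim=0)
    entropy = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(dim=1)  # (B, ...)
    return mean_p, entropy
```

The entropy map is high where the stochastic forward passes disagree, which is exactly the per-voxel confidence signal whose subject-level calibration the paper scrutinises.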

Sameness Attracts, Novelty Disturbs, but Outliers Flourish in Fanfiction Online

Title Sameness Attracts, Novelty Disturbs, but Outliers Flourish in Fanfiction Online
Authors Elise Jing, Simon DeDeo, Yong-Yeol Ahn
Abstract The nature of what people enjoy is not just a central question for the creative industry; it is a driving force of cultural evolution. It is widely believed that successful cultural products balance novelty and conventionality: they provide something familiar but at least somewhat divergent from what has come before, occupying a satisfying middle ground between “more of the same” and “too strange”. We test this belief using a large dataset of over half a million works of fanfiction from the website Archive of Our Own (AO3), looking at how the recognition a work receives varies with its novelty. We quantify novelty through a term-based language model and a topic model, in the context of existing works within the same fandom. Contrary to the balance theory, we find that the lowest-novelty works are the most popular and that popularity declines monotonically with novelty. A few exceptions can be found: extremely popular works that are among the highest-novelty within their fandom. Taken together, our findings not only challenge the traditional theory of the hedonic value of novelty, they invert it: people prefer the least novel things, are repelled by the middle ground, and have an occasional enthusiasm for extreme outliers. This suggests that cultural evolution must work against inertia, the appetite people have to continually reconsume the familiar, and may resemble a punctuated equilibrium rather than a smooth evolution.
Tasks Language Modelling
Published 2019-04-16
URL http://arxiv.org/abs/1904.07741v1
PDF http://arxiv.org/pdf/1904.07741v1.pdf
PWC https://paperswithcode.com/paper/sameness-attracts-novelty-disturbs-but
Repo https://github.com/yzjing/ao3
Framework none
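
A hedged sketch of the term-based novelty idea: score a new work by its cross-entropy under a smoothed unigram model fit on earlier works in the same fandom. Tokenization, the smoothing constant `alpha`, and per-token averaging are assumptions; the paper additionally uses a topic model.

```python
import math
from collections import Counter

def novelty(new_work, prior_works, alpha=0.1):
    """Average negative log-probability of a work's tokens under a smoothed
    unigram model of earlier works in the same fandom; higher = more novel."""
    counts = Counter(tok for work in prior_works for tok in work)
    vocab = set(counts) | set(new_work)
    total = sum(counts.values()) + alpha * len(vocab)
    return -sum(math.log((counts[tok] + alpha) / total)
                for tok in new_work) / len(new_work)
```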

Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing

Title Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing
Authors M. Scetbon, G. Varoquaux
Abstract Are two sets of observations drawn from the same distribution? This problem is a two-sample test. Kernel methods offer many appealing properties; indeed, state-of-the-art approaches use the $L^2$ distance between kernel-based distribution representatives to derive their test statistics. Here, we show that $L^p$ distances (with $p\geq 1$) between these distribution representatives give metrics on the space of distributions that are well-behaved for detecting differences between distributions, as they metrize weak convergence. Moreover, for analytic kernels, we show that the $L^1$ geometry gives improved testing power for scalable computational procedures. Specifically, we derive a finite-dimensional approximation of the metric given as the $\ell_1$ norm of a vector which captures differences of expectations of analytic functions evaluated at spatial locations or frequencies (i.e., features). The features can be chosen to maximize the differences between the distributions and give interpretable indications of how they differ. Using an $\ell_1$ norm gives better detection because differences between representatives are dense when analytic kernels are used (non-zero almost everywhere). The tests are consistent, while much faster than state-of-the-art quadratic-time kernel-based tests. Experiments on artificial and real-world problems demonstrate a better power/time tradeoff than the state of the art based on $\ell_2$ norms, and in some cases better outright power than even the most expensive quadratic-time tests.
Tasks
Published 2019-09-19
URL https://arxiv.org/abs/1909.09264v2
PDF https://arxiv.org/pdf/1909.09264v2.pdf
PWC https://paperswithcode.com/paper/comparing-distributions-ell_1-geometry
Repo https://github.com/meyerscetbon/l1_two_sample_test
Framework none
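
A rough sketch of the finite-dimensional statistic: the $\ell_1$ norm of the difference of empirical mean embeddings evaluated at random frequencies (random Fourier features of a Gaussian kernel). Fixed random frequencies and the bandwidth `gamma` are simplifying assumptions; the paper optimizes the features and calibrates the test threshold against a null distribution.

```python
import numpy as np

def l1_rff_statistic(X, Y, n_features=10, gamma=1.0, seed=0):
    """l1 distance between empirical mean embeddings of X and Y under
    random Fourier features of the Gaussian kernel exp(-gamma ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    phi = lambda Z: np.sqrt(2.0 / n_features) * np.cos(Z @ W + b)
    return np.abs(phi(X).mean(axis=0) - phi(Y).mean(axis=0)).sum()
```

The statistic concentrates near zero when X and Y share a distribution and grows with any difference the chosen features can detect.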

Denoising Auto-encoding Priors in Undecimated Wavelet Domain for MR Image Reconstruction

Title Denoising Auto-encoding Priors in Undecimated Wavelet Domain for MR Image Reconstruction
Authors Siyuan Wang, Junjie Lv, Yuanyuan Hu, Dong Liang, Minghui Zhang, Qiegen Liu
Abstract Compressive sensing is an impressive approach for fast MRI. It aims to reconstruct an MR image from only a few under-sampled measurements in k-space, enhancing the efficiency of data acquisition. In this study, we propose to learn priors based on the undecimated wavelet transform, together with an iterative image reconstruction algorithm. At the prior-learning stage, transformed feature images obtained by the undecimated wavelet transform are stacked as the input of a denoising autoencoder (DAE) network. The highly redundant, multi-scale input exposes correlations among feature images across channels, which allows a robust network-driven prior. At the iterative reconstruction stage, the transformed DAE prior is incorporated into the classical iterative procedure by means of a proximal gradient algorithm. Experimental comparisons on different sampling trajectories and ratios validate the strong potential of the presented algorithm.
Tasks Compressive Sensing, Denoising, Image Reconstruction
Published 2019-09-03
URL https://arxiv.org/abs/1909.01108v2
PDF https://arxiv.org/pdf/1909.01108v2.pdf
PWC https://paperswithcode.com/paper/denoising-auto-encoding-priors-in-undecimated
Repo https://github.com/yqx7150/WDAEPRec
Framework none
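
A schematic of the reconstruction loop, with the learned wavelet-domain DAE prior abstracted into a generic `denoise` callable: a gradient step on the k-space data-fidelity term followed by a proximal step that applies the prior. The step size, iteration count, and FFT conventions are assumptions, not the paper's tuned settings.

```python
import numpy as np

def pg_recon(y, mask, denoise, n_iters=50, step=1.0):
    """Proximal-gradient sketch: y is the zero-filled under-sampled k-space,
    mask the boolean sampling pattern, denoise a callable standing in for
    the learned wavelet-domain DAE prior."""
    x = np.fft.ifft2(y)                                  # zero-filled start
    for _ in range(n_iters):
        grad = np.fft.ifft2(mask * np.fft.fft2(x) - y)   # grad of ||M F x - y||^2 / 2
        x = denoise(x - step * grad)                     # gradient step, then prior
    return x
```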

Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems

Title Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems
Authors Geoffrey Roeder, Paul K. Grant, Andrew Phillips, Neil Dalchau, Edward Meeds
Abstract We introduce a flexible, scalable Bayesian inference framework for nonlinear dynamical systems characterised by distinct and hierarchical variability at the individual, group, and population levels. Our model class is a generalisation of nonlinear mixed-effects (NLME) dynamical systems, the statistical workhorse for many experimental sciences. We cast parameter inference as stochastic optimisation of an end-to-end differentiable, block-conditional variational autoencoder. We specify the dynamics of the data-generating process as an ordinary differential equation (ODE) such that both the ODE and its solver are fully differentiable. This model class is highly flexible: the ODE right-hand sides can be a mixture of user-prescribed or “white-box” sub-components and neural network or “black-box” sub-components. Using stochastic optimisation, our amortised inference algorithm could seamlessly scale up to massive data collection pipelines (common in labs with robotic automation). Finally, our framework supports interpretability with respect to the underlying dynamics, as well as predictive generalization to unseen combinations of group components (also called “zero-shot” learning). We empirically validate our method by predicting the dynamic behaviour of bacteria that were genetically engineered to function as biosensors. Our implementation of the framework, the dataset, and all code to reproduce the experimental results are available at https://www.github.com/Microsoft/vi-hds .
Tasks Bayesian Inference, Zero-Shot Learning
Published 2019-05-28
URL https://arxiv.org/abs/1905.12090v2
PDF https://arxiv.org/pdf/1905.12090v2.pdf
PWC https://paperswithcode.com/paper/efficient-amortised-bayesian-inference-for
Repo https://github.com/Microsoft/vi-hds
Framework tf
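
A toy sketch of the amortised-inference pattern only, not the paper's model: an encoder proposes a distribution over the parameter of a scalar ODE, and a differentiable Euler loop stands in for the ODE solver so an ELBO-style loss can be optimised end to end. All layer sizes, the ODE, and the solver are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyODEVAE(nn.Module):
    """Encoder amortises inference over theta in dx/dt = -theta * x."""

    def __init__(self, n_steps: int = 20, dt: float = 0.1):
        super().__init__()
        self.n_steps, self.dt = n_steps, dt
        self.encoder = nn.Sequential(nn.Linear(n_steps, 32), nn.ReLU(), nn.Linear(32, 2))

    def solve(self, x0, theta):
        xs = [x0]
        for _ in range(self.n_steps - 1):
            xs.append(xs[-1] + self.dt * (-theta * xs[-1]))  # differentiable Euler step
        return torch.cat(xs, dim=-1)                         # (batch, n_steps)

    def forward(self, traj):                                 # traj: (batch, n_steps)
        mu, logvar = self.encoder(traj).chunk(2, dim=-1)
        theta = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = self.solve(traj[:, :1], F.softplus(theta))   # keep theta positive
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return ((recon - traj) ** 2).sum(-1) + kl            # per-sample loss
```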

CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

Title CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information
Authors Shikhar Vashishth, Prince Jain, Partha Talukdar
Abstract Open Information Extraction (OpenIE) methods extract (noun phrase, relation phrase, noun phrase) triples from text, resulting in the construction of large Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in such Open KBs are not canonicalized, leading to the storage of redundant and ambiguous facts. Recent research has posed canonicalization of Open KBs as clustering over manually-defined feature spaces. Manual feature engineering is expensive and often sub-optimal. In order to overcome this challenge, we propose Canonicalization using Embeddings and Side Information (CESI) - a novel approach which performs canonicalization over learned embeddings of Open KBs. CESI extends recent advances in KB embedding by incorporating relevant NP and relation phrase side information in a principled manner. Through extensive experiments on multiple real-world datasets, we demonstrate CESI’s effectiveness.
Tasks Feature Engineering, Open Information Extraction, Open Knowledge Graph Canonicalization
Published 2019-02-01
URL http://arxiv.org/abs/1902.00172v1
PDF http://arxiv.org/pdf/1902.00172v1.pdf
PWC https://paperswithcode.com/paper/cesi-canonicalizing-open-knowledge-bases
Repo https://github.com/malllabiisc/cesi
Framework none
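
A hedged sketch of the clustering step: agglomerative clustering over phrase embeddings with a cosine-distance cutoff. CESI learns the embeddings jointly with side information; here pre-computed phrase vectors and the `threshold` are stand-in assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def canonicalize(phrases, embeddings, threshold=0.3):
    """Group noun phrases whose embeddings fall within a cosine-distance
    cutoff under complete-linkage agglomerative clustering."""
    Z = linkage(embeddings, method="complete", metric="cosine")
    labels = fcluster(Z, t=threshold, criterion="distance")
    clusters = {}
    for phrase, label in zip(phrases, labels):
        clusters.setdefault(label, []).append(phrase)
    return list(clusters.values())  # each inner list is one canonical cluster
```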

Simple and Effective Text Matching with Richer Alignment Features

Title Simple and Effective Text Matching with Richer Alignment Features
Authors Runqi Yang, Jianhai Zhang, Xing Gao, Feng Ji, Haiqing Chen
Abstract In this paper, we present a fast and strong neural approach for general-purpose text matching applications. We explore what is sufficient to build a fast and well-performing text matching model, and propose to keep three key features available for inter-sequence alignment: original point-wise features, previously aligned features, and contextual features, while simplifying all the remaining components. We conduct experiments on four well-studied benchmark datasets across the tasks of natural language inference, paraphrase identification and answer selection. The performance of our model is on par with the state of the art on all datasets, with many fewer parameters, and inference is at least 6 times faster compared with similarly performing models.
Tasks Answer Selection, Natural Language Inference, Paraphrase Identification, Text Matching
Published 2019-08-01
URL https://arxiv.org/abs/1908.00300v1
PDF https://arxiv.org/pdf/1908.00300v1.pdf
PWC https://paperswithcode.com/paper/simple-and-effective-text-matching-with-1
Repo https://github.com/hitvoice/RE2
Framework tf
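
A small sketch of the alignment-and-fusion pattern behind such models: dot-product soft alignment between the two sequences, then concatenation of original, aligned, difference, and element-wise-product features. This shows the generic mechanism only, not the paper's exact blocks, which also carry residual and contextual features across stacked layers.

```python
import torch

def align(a, b):
    """Dot-product soft alignment between sequences a, b of shape (B, L, D)."""
    scores = torch.bmm(a, b.transpose(1, 2))                 # (B, La, Lb)
    a_aligned = torch.bmm(torch.softmax(scores, dim=2), b)   # b summarised per a_i
    b_aligned = torch.bmm(torch.softmax(scores, dim=1).transpose(1, 2), a)
    return a_aligned, b_aligned

def fuse(x, x_aligned):
    """Richer comparison features: original, aligned, difference, product."""
    return torch.cat([x, x_aligned, x - x_aligned, x * x_aligned], dim=-1)
```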

Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models

Title Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models
Authors Giannis Daras, Augustus Odena, Han Zhang, Alexandros G. Dimakis
Abstract We introduce a new local sparse attention layer that preserves two-dimensional geometry and locality. We show that by simply replacing the dense attention layer of SAGAN with our construction, we obtain significant improvements in FID, Inception score, and visual quality. The FID score improves from $18.65$ to $15.94$ on ImageNet, keeping all other parameters the same. The sparse attention patterns that we propose for our new layer are designed using a novel information-theoretic criterion that uses information flow graphs. We also present a novel way to invert Generative Adversarial Networks with attention. Our method extracts a saliency map from the attention layer of the discriminator, which we use to construct a new loss function for the inversion. This allows us to visualize the newly introduced attention heads and show that they indeed capture interesting aspects of the two-dimensional geometry of real images.
Tasks Conditional Image Generation, Deep Attention, Image Generation
Published 2019-11-27
URL https://arxiv.org/abs/1911.12287v2
PDF https://arxiv.org/pdf/1911.12287v2.pdf
PWC https://paperswithcode.com/paper/your-local-gan-designing-two-dimensional
Repo https://github.com/giannisdaras/ylg
Framework tf
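
A minimal sketch of the two-dimensional locality idea: a boolean attention mask over flattened image positions that permits attention only within a small 2-D neighbourhood. The paper's actual patterns are designed with an information-flow-graph criterion; the Chebyshev radius used here is a simplifying assumption.

```python
import torch

def local_2d_mask(height: int, width: int, radius: int = 1) -> torch.Tensor:
    """Boolean (H*W, H*W) mask: True where a query position may attend to a
    key position, i.e. where the two grid positions are within `radius`
    of each other in Chebyshev distance."""
    ys, xs = torch.meshgrid(torch.arange(height), torch.arange(width), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=1)          # (H*W, 2)
    dist = (coords[:, None, :] - coords[None, :, :]).abs().amax(dim=-1)
    return dist <= radius
```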

Inductive Relation Prediction by Subgraph Reasoning

Title Inductive Relation Prediction by Subgraph Reasoning
Authors Komal K. Teru, Etienne Denis, William L. Hamilton
Abstract The dominant paradigm for relation prediction in knowledge graphs involves learning and operating on latent representations (i.e., embeddings) of entities and relations. However, these embedding-based methods do not explicitly capture the compositional logical rules underlying the knowledge graph, and they are limited to the transductive setting, where the full set of entities must be known during training. Here, we propose a graph neural network based relation prediction framework, GraIL, that reasons over local subgraph structures and has a strong inductive bias to learn entity-independent relational semantics. Unlike embedding-based models, GraIL is naturally inductive and can generalize to unseen entities and graphs after training. We provide theoretical proof and strong empirical evidence that GraIL can represent a useful subset of first-order logic and show that GraIL outperforms existing rule-induction baselines in the inductive setting. We also demonstrate significant gains obtained by ensembling GraIL with various knowledge graph embedding methods in the transductive setting, highlighting the complementary inductive bias of our method.
Tasks Graph Embedding, Knowledge Graph Completion, Knowledge Graph Embedding, Knowledge Graphs, Question Answering, Relational Reasoning
Published 2019-11-16
URL https://arxiv.org/abs/1911.06962v2
PDF https://arxiv.org/pdf/1911.06962v2.pdf
PWC https://paperswithcode.com/paper/inductive-relation-prediction-on-knowledge
Repo https://github.com/muhanzhang/IGMC
Framework pytorch
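
A hedged sketch of the subgraph step GraIL-style reasoning starts from: extract the k-hop enclosing subgraph around a candidate (head, tail) pair and label every node by its distances to the two endpoints, a feature that is entity-independent by construction. This uses networkx; the GNN scoring on top is omitted.

```python
import networkx as nx

def enclosing_subgraph(G, head, tail, k=2):
    """k-hop enclosing subgraph around (head, tail), with each node labelled
    by its (distance-to-head, distance-to-tail) pair."""
    d_head = nx.single_source_shortest_path_length(G, head, cutoff=k)
    d_tail = nx.single_source_shortest_path_length(G, tail, cutoff=k)
    nodes = set(d_head) & set(d_tail)            # nodes close to both endpoints
    labels = {n: (d_head[n], d_tail[n]) for n in nodes}
    return G.subgraph(nodes).copy(), labels
```

Because the labels depend only on graph distances, the same model applies unchanged to entities never seen during training, which is what makes the approach inductive.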

NERO: A Neural Rule Grounding Framework for Label-Efficient Relation Extraction

Title NERO: A Neural Rule Grounding Framework for Label-Efficient Relation Extraction
Authors Wenxuan Zhou, Hongtao Lin, Bill Yuchen Lin, Ziqi Wang, Junyi Du, Leonardo Neves, Xiang Ren
Abstract Deep neural models for relation extraction tend to be less reliable when perfectly labeled data is limited, despite their success in label-sufficient scenarios. Instead of seeking more instance-level labels from human annotators, here we propose to annotate frequent surface patterns to form labeling rules. These rules can be automatically mined from large text corpora and generalized via a soft rule matching mechanism. Prior works use labeling rules in an exact-matching fashion, which inherently limits the coverage of sentence matching and results in a low-recall issue. In this paper, we present a neural approach to ground rules for RE, named NERO, which jointly learns a relation extraction module and a soft matching module. One can employ any neural relation extraction model as the instantiation of the RE module. The soft matching module learns to match rules with semantically similar sentences such that raw corpora can be automatically labeled and leveraged by the RE module (with much better coverage) as augmented supervision, in addition to the exactly matched sentences. Extensive experiments and analysis on two public and widely used datasets demonstrate the effectiveness of the proposed NERO framework, compared with both rule-based and semi-supervised methods. Through user studies, we find that the time for a human to annotate rules and sentences is similar (0.30 vs. 0.35 min per label). In particular, NERO’s performance using 270 rules is comparable to models trained on 3,000 labeled sentences, yielding a 9.5x speedup. Moreover, NERO can predict unseen relations at test time and provide interpretable predictions. We release our code to the community for future research.
Tasks Relation Extraction
Published 2019-09-05
URL https://arxiv.org/abs/1909.02177v4
PDF https://arxiv.org/pdf/1909.02177v4.pdf
PWC https://paperswithcode.com/paper/neural-rule-grounding-for-low-resource
Repo https://github.com/INK-USC/REGD
Framework tf
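
A minimal sketch of the soft matching idea: pseudo-label an unmatched sentence with the relation of its most similar rule when the similarity clears a threshold. The encoders, cosine similarity, and `threshold` are assumptions; NERO learns the matcher jointly with the RE module rather than thresholding a fixed similarity.

```python
import torch
import torch.nn.functional as F

def soft_match(sentence_emb, rule_embs, rule_labels, threshold=0.8):
    """Return (pseudo_label, score) for one sentence embedding given a bank
    of rule embeddings; (None, score) when no rule is similar enough."""
    sims = F.cosine_similarity(sentence_emb.unsqueeze(0), rule_embs, dim=-1)
    score, idx = sims.max(dim=0)
    if score.item() < threshold:
        return None, score.item()
    return rule_labels[idx.item()], score.item()
```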

Fairness for Robust Log Loss Classification

Title Fairness for Robust Log Loss Classification
Authors Ashkan Rezaei, Rizal Fathony, Omid Memarrast, Brian Ziebart
Abstract Developing classification methods with high accuracy that also avoid unfair treatment of different groups has become increasingly important for data-driven decision making in social applications. Following the first principles of distributional robustness, we derive a new classifier that incorporates fairness criteria into its worst-case logarithmic loss minimization. This construction takes the form of a minimax game and produces a parametric exponential family conditional distribution that resembles truncated logistic regression. We demonstrate the advantages of our approach on three benchmark fairness datasets.
Tasks Decision Making
Published 2019-03-10
URL https://arxiv.org/abs/1903.03910v3
PDF https://arxiv.org/pdf/1903.03910v3.pdf
PWC https://paperswithcode.com/paper/fair-logistic-regression-an-adversarial
Repo https://github.com/arezae4/fair-logloss-classification
Framework none
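
Not the paper's minimax construction, but a simpler illustration of the same trade-off: logistic regression whose log loss is penalised by a demographic-parity gap between two groups. The penalty form and the weight `lam` are assumptions introduced for this sketch.

```python
import torch
import torch.nn.functional as F

def fair_logreg(X, y, group, lam=1.0, epochs=200, lr=0.1):
    """Logistic regression with a demographic-parity penalty.

    X     : (n, d) float tensor of features
    y     : (n,) float tensor of 0/1 labels
    group : (n,) tensor of 0/1 protected-group membership
    """
    w = torch.zeros(X.shape[1], requires_grad=True)
    opt = torch.optim.SGD([w], lr=lr)
    for _ in range(epochs):
        p = torch.sigmoid(X @ w)
        gap = (p[group == 0].mean() - p[group == 1].mean()).abs()
        loss = F.binary_cross_entropy(p, y) + lam * gap
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```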

LipschitzLR: Using theoretically computed adaptive learning rates for fast convergence

Title LipschitzLR: Using theoretically computed adaptive learning rates for fast convergence
Authors Rahul Yedida, Snehanshu Saha
Abstract Optimizing deep neural networks is largely thought to be an empirical process, requiring manual tuning of several hyper-parameters, such as learning rate, weight decay, and dropout rate. Arguably, the learning rate is the most important of these to tune, and this has gained more attention in recent works. In this paper, we propose a novel method to compute the learning rate for training deep neural networks with stochastic gradient descent. We first derive a theoretical framework to compute learning rates dynamically based on the Lipschitz constant of the loss function. We then extend this framework to other commonly used optimization algorithms, such as gradient descent with momentum and Adam. We run an extensive set of experiments that demonstrate the efficacy of our approach on popular architectures and datasets, and show that commonly used learning rates are an order of magnitude smaller than the ideal value.
Tasks
Published 2019-02-20
URL https://arxiv.org/abs/1902.07399v3
PDF https://arxiv.org/pdf/1902.07399v3.pdf
PWC https://paperswithcode.com/paper/a-novel-adaptive-learning-rate-scheduler-for
Repo https://github.com/yrahul3910/symnet
Framework none
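
A hedged empirical sketch of the core idea: estimate the Lipschitz constant L of the loss gradient around the current weights and set the learning rate to 1/L, the classical step size for L-smooth objectives. The random-probe estimator below is an assumption; the paper derives L analytically for common losses and architectures.

```python
import torch

def lipschitz_lr(loss_fn, w: torch.Tensor, n_probes: int = 10, eps: float = 1e-3) -> float:
    """Learning rate 1/L, with L estimated as the largest observed ratio
    ||grad(w + delta) - grad(w)|| / ||delta|| over random probes delta."""
    w = w.detach().requires_grad_(True)
    g0 = torch.autograd.grad(loss_fn(w), w)[0]
    L = 0.0
    for _ in range(n_probes):
        delta = eps * torch.randn_like(w)
        w2 = (w + delta).detach().requires_grad_(True)
        g2 = torch.autograd.grad(loss_fn(w2), w2)[0]
        L = max(L, ((g2 - g0).norm() / delta.norm()).item())
    return 1.0 / L
```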

Speeding up Word Mover’s Distance and its variants via properties of distances between embeddings

Title Speeding up Word Mover’s Distance and its variants via properties of distances between embeddings
Authors Matheus Werner, Eduardo Laber
Abstract The Word Mover’s Distance (WMD) proposed in Kusner et al. [ICML, 2015] is a distance between documents that takes advantage of semantic relations among words that are captured by their embeddings. This distance proved to be quite effective, obtaining state-of-the-art error rates for classification tasks, but also impractical for large collections/documents due to its computational complexity. To circumvent this problem, variants of WMD have been proposed. Among them, the Relaxed Word Mover’s Distance (RWMD) is one of the most successful due to its simplicity, its effectiveness, and its fast implementations. Relying on assumptions that are supported by empirical properties of the distances between embeddings, we propose an approach to speed up both WMD and RWMD. Experiments over 10 datasets suggest that our approach leads to a significant speed-up in document classification tasks while maintaining the same error rates.
Tasks Document Classification
Published 2019-12-01
URL https://arxiv.org/abs/1912.00509v1
PDF https://arxiv.org/pdf/1912.00509v1.pdf
PWC https://paperswithcode.com/paper/speeding-up-word-movers-distance-and-its
Repo https://github.com/matwerner/fast-wmd
Framework none
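
For context, a sketch of the RWMD baseline the paper accelerates: relax one side of WMD's transport constraints so every word moves wholly to its nearest counterpart in the other document, giving a cheap lower bound on WMD. The nBOW weights and Euclidean word distances follow the usual conventions.

```python
import numpy as np

def rwmd(weights_a, emb_a, weights_b, emb_b):
    """Relaxed Word Mover's Distance between two documents given their
    nBOW weights (each summing to 1) and word-embedding matrices."""
    D = np.linalg.norm(emb_a[:, None, :] - emb_b[None, :, :], axis=-1)
    cost_ab = (weights_a * D.min(axis=1)).sum()   # each word of A to nearest in B
    cost_ba = (weights_b * D.min(axis=0)).sum()   # each word of B to nearest in A
    return max(cost_ab, cost_ba)                  # the tighter of the two bounds
```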

Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning

Title Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning
Authors Sihui Luo, Xinchao Wang, Gongfan Fang, Yao Hu, Dapeng Tao, Mingli Song
Abstract An increasing number of well-trained deep networks have been released online by researchers and developers, enabling the community to reuse them in a plug-and-play way without accessing the training annotations. However, due to the large number of network variants, such publicly available trained models are often of different architectures, each tailored to a specific task or dataset. In this paper, we study a deep-model reusing task, where we are given as input pre-trained networks of heterogeneous architectures specializing in distinct tasks, as teacher models. We aim to learn a multitalented and lightweight student model that is able to grasp the integrated knowledge from all such heterogeneous-structure teachers, again without accessing any human annotation. To this end, we propose a common feature learning scheme, in which the features of all teachers are transformed into a common space and the student is enforced to imitate them all, so as to amalgamate the intact knowledge. We test the proposed approach on a list of benchmarks and demonstrate that the learned student achieves very promising performance, superior to that of the teachers in their specialized tasks.
Tasks
Published 2019-06-24
URL https://arxiv.org/abs/1906.10546v1
PDF https://arxiv.org/pdf/1906.10546v1.pdf
PWC https://paperswithcode.com/paper/knowledge-amalgamation-from-heterogeneous
Repo https://github.com/VainF/CommonFeatureLearning
Framework pytorch
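
A minimal sketch of the common-space idea: small linear adaptors project each teacher's features and the student's features into one shared space, where the student imitates every teacher. The linear adaptors, the shared dimension, and the plain MSE imitation loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonSpace(nn.Module):
    """Project student and heterogeneous teacher features into one shared
    space and train the student to imitate every teacher there."""

    def __init__(self, student_dim, teacher_dims, common_dim=128):
        super().__init__()
        self.student_proj = nn.Linear(student_dim, common_dim)
        self.teacher_projs = nn.ModuleList(nn.Linear(d, common_dim) for d in teacher_dims)

    def forward(self, student_feat, teacher_feats):
        s = self.student_proj(student_feat)
        return sum(F.mse_loss(s, proj(t))
                   for proj, t in zip(self.teacher_projs, teacher_feats))
```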

Leveraging 2-hop Distant Supervision from Table Entity Pairs for Relation Extraction

Title Leveraging 2-hop Distant Supervision from Table Entity Pairs for Relation Extraction
Authors Xiang Deng, Huan Sun
Abstract Distant supervision (DS) has been widely used to automatically construct (noisy) labeled data for relation extraction (RE). Given two entities, distant supervision exploits sentences that directly mention them for predicting their semantic relation. We refer to this strategy as 1-hop DS, which unfortunately may not work well for long-tail entities with few supporting sentences. In this paper, we introduce a new strategy named 2-hop DS to enhance distantly supervised RE, based on the observation that there exist a large number of relational tables on the Web which contain entity pairs that share common relations. We refer to such entity pairs as anchors for each other, and collect all sentences that mention the anchor entity pairs of a given target entity pair to help relation prediction. We develop a new neural RE method, REDS2, in the multi-instance learning paradigm, which adopts a hierarchical model structure to fuse information from 1-hop DS and 2-hop DS. Extensive experimental results on a benchmark dataset show that REDS2 can consistently outperform various baselines across different settings by a substantial margin.
Tasks Relation Extraction
Published 2019-09-13
URL https://arxiv.org/abs/1909.06007v1
PDF https://arxiv.org/pdf/1909.06007v1.pdf
PWC https://paperswithcode.com/paper/leveraging-2-hop-distant-supervision-from
Repo https://github.com/sunlab-osu/REDS2
Framework pytorch
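
A hedged sketch of the hierarchical fusion: selective attention summarises the 1-hop and 2-hop sentence bags separately, and a learned gate mixes the two bag vectors before relation classification. The gating mechanism and dimensions describe the general shape of such a model, not REDS2's exact architecture.

```python
import torch
import torch.nn as nn

class TwoHopFusion(nn.Module):
    """Attend over each sentence bag, then gate-mix the two bag vectors."""

    def __init__(self, dim, n_relations):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))
        self.gate = nn.Linear(2 * dim, 1)
        self.classifier = nn.Linear(dim, n_relations)

    def attend(self, bag):                           # bag: (n_sentences, dim)
        weights = torch.softmax(bag @ self.query, dim=0)
        return weights @ bag                         # (dim,)

    def forward(self, bag_1hop, bag_2hop):
        v1, v2 = self.attend(bag_1hop), self.attend(bag_2hop)
        g = torch.sigmoid(self.gate(torch.cat([v1, v2])))
        return self.classifier(g * v1 + (1 - g) * v2)  # relation logits
```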