April 3, 2020

Paper Group AWR 61

Face Recognition: Too Bias, or Not Too Bias?. GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation. Estimating the number and effect sizes of non-null hypotheses. Refinement of Unsupervised Cross-Lingual Word Embeddings. Electricity Theft Detection with self-attention. A Physiology-Driven Computational Model for Post-Cardiac Ar …

Face Recognition: Too Bias, or Not Too Bias?


Title	Face Recognition: Too Bias, or Not Too Bias?
Authors	Joseph P Robinson, Gennady Livitz, Yann Henon, Can Qin, Yun Fu, Samson Timoner
Abstract	We reveal critical insights into problems of bias in state-of-the-art facial recognition (FR) systems using a novel Balanced Faces In the Wild (BFW) dataset: data balanced for gender and ethnic groups. We show variations in the optimal scoring threshold for face-pairs across different subgroups. Thus, the conventional approach of learning a global threshold for all pairs results in performance gaps among subgroups. By learning subgroup-specific thresholds, we not only mitigate problems in performance gaps but also show a notable boost in the overall performance. Furthermore, we do a human evaluation to measure the bias in humans, which supports the hypothesis that such a bias exists in human perception. For the BFW database, source code, and more, visit github.com/visionjo/facerec-bias-bfw.
Tasks	Face Recognition
Published	2020-02-16
URL	https://arxiv.org/abs/2002.06483v1
PDF	https://arxiv.org/pdf/2002.06483v1.pdf
PWC	https://paperswithcode.com/paper/face-recognition-too-bias-or-not-too-bias
Repo	https://github.com/visionjo/facerec-bias-bfw
Framework	none
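
The contrast between a single global threshold and subgroup-specific thresholds can be illustrated with a small sketch. Everything below is hypothetical: the subgroup names, the toy similarity scores, and the accuracy-maximizing selection rule are assumptions for illustration and are not taken from the BFW code base.

```python
from collections import defaultdict


def best_threshold(scores, labels):
    """Return (threshold, accuracy) for the candidate that maximizes accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        # A pair is predicted "same identity" when its score reaches the threshold.
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


def subgroup_thresholds(pairs):
    """pairs: iterable of (subgroup, similarity_score, is_genuine)."""
    by_group = defaultdict(list)
    for group, score, label in pairs:
        by_group[group].append((score, label))
    result = {}
    for group, items in by_group.items():
        scores = [s for s, _ in items]
        labels = [y for _, y in items]
        result[group] = best_threshold(scores, labels)[0]
    return result


# Hypothetical toy data: subgroup "B" produces systematically higher scores.
toy_pairs = [
    ("A", 0.30, 0), ("A", 0.40, 0), ("A", 0.55, 1), ("A", 0.70, 1),
    ("B", 0.50, 0), ("B", 0.60, 0), ("B", 0.75, 1), ("B", 0.90, 1),
]
per_group = subgroup_thresholds(toy_pairs)
global_t, global_acc = best_threshold(
    [s for _, s, _ in toy_pairs], [y for _, _, y in toy_pairs]
)
```

On this toy data each subgroup can be separated perfectly by its own threshold, while no single global threshold classifies all eight pairs correctly.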

GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation


Title	GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation
Authors	Nikhil Goyal, Harsh Vardhan Jain, Sayan Ranu
Abstract	Graph generative models have been extensively studied in the data mining literature. While traditional techniques are based on generating structures that adhere to a pre-decided distribution, recent techniques have shifted towards learning this distribution directly from the data. While learning-based approaches have imparted significant improvement in quality, some limitations remain to be addressed. First, learning graph distributions introduces additional computational overhead, which limits their scalability to large graph databases. Second, many techniques only learn the structure and do not address the need to also learn node and edge labels, which encode important semantic information and influence the structure itself. Third, existing techniques often incorporate domain-specific rules and lack generalizability. Fourth, the experimentation of existing techniques is not comprehensive enough due to either using weak evaluation metrics or focusing primarily on synthetic or small datasets. In this work, we develop a domain-agnostic technique called GraphGen to overcome all of these limitations. GraphGen converts graphs to sequences using minimum DFS codes. Minimum DFS codes are canonical labels and capture the graph structure precisely along with the label information. The complex joint distributions between structure and semantic labels are learned through a novel LSTM architecture. Extensive experiments on million-sized, real graph datasets show GraphGen to be 4 times faster on average than state-of-the-art techniques while being significantly better in quality across a comprehensive set of 11 different metrics. Our code is released at https://github.com/idea-iitd/graphgen.
Tasks	Graph Generation
Published	2020-01-22
URL	https://arxiv.org/abs/2001.08184v1
PDF	https://arxiv.org/pdf/2001.08184v1.pdf
PWC	https://paperswithcode.com/paper/graphgen-a-scalable-approach-to-domain
Repo	https://github.com/idea-iitd/graphgen
Framework	pytorch

Estimating the number and effect sizes of non-null hypotheses


Title	Estimating the number and effect sizes of non-null hypotheses
Authors	Jennifer Brennan, Ramya Korlakai Vinayak, Kevin Jamieson
Abstract	We study the problem of estimating the distribution of effect sizes (the mean of the test statistic under the alternative hypothesis) in a multiple testing setting. Knowing this distribution allows us to calculate the power (one minus the type II error rate) of any experimental design. We show that it is possible to estimate this distribution using an inexpensive pilot experiment, which takes significantly fewer samples than would be required by an experiment that identified the discoveries. Our estimator can be used to guarantee the number of discoveries that will be made using a given experimental design in a future experiment. We prove that this simple and computationally efficient estimator enjoys a number of favorable theoretical properties, and demonstrate its effectiveness on data from a gene knockout experiment on influenza inhibition in Drosophila.
Tasks
Published	2020-02-17
URL	https://arxiv.org/abs/2002.07297v1
PDF	https://arxiv.org/pdf/2002.07297v1.pdf
PWC	https://paperswithcode.com/paper/estimating-the-number-and-effect-sizes-of-non
Repo	https://github.com/jenniferbrennan/CountingDiscoveries
Framework	none
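
To make the link between effect sizes and power concrete, here is a minimal sketch for the textbook case of a one-sided z-test with unit-variance observations. It is not the paper's estimator: the function names are hypothetical, the effect sizes are assumed to be known, and no multiple-testing correction is applied.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided z-test with n unit-variance samples.

    The test rejects when the standardized sample mean exceeds the
    (1 - alpha) quantile of the standard normal distribution.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(critical - effect_size * n ** 0.5)


def expected_discoveries(effect_sizes, n, alpha=0.05):
    """Expected number of rejections among hypotheses with the given effect sizes."""
    return sum(z_test_power(mu, n, alpha) for mu in effect_sizes)
```

With an effect size of zero the power reduces to the significance level, and for any positive effect size it grows with the number of samples per hypothesis.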


Refinement of Unsupervised Cross-Lingual Word Embeddings


Title	Refinement of Unsupervised Cross-Lingual Word Embeddings
Authors	Magdalena Biesialska, Marta R. Costa-jussà
Abstract	Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages by making it possible to learn multilingual word representations even without using any direct bilingual signal. The lion’s share of the methods are projection-based approaches that map pre-trained embeddings into a shared latent space. These methods are mostly based on the orthogonal transformation, which assumes language vector spaces to be isomorphic. However, this criterion does not necessarily hold, especially for morphologically-rich languages. In this paper, we propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings. The proposed model moves vectors of words and their corresponding translations closer to each other and enforces length- and center-invariance, thus allowing better alignment of cross-lingual embeddings. The experimental results demonstrate the effectiveness of our approach, as in most cases it outperforms state-of-the-art methods in a bilingual lexicon induction task.
Tasks	Word Embeddings
Published	2020-02-21
URL	https://arxiv.org/abs/2002.09213v1
PDF	https://arxiv.org/pdf/2002.09213v1.pdf
PWC	https://paperswithcode.com/paper/refinement-of-unsupervised-cross-lingual-word
Repo	https://github.com/artetxem/vecmap
Framework	none
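
One common way to obtain length- and center-invariance is to normalize the embeddings before alignment. The sketch below is an assumption about what such preprocessing can look like (unit-length scaling, mean centering, and a final rescaling); it is not the authors' refinement procedure, and the function names are hypothetical.

```python
import math


def unit_length(vectors):
    """Scale every vector to Euclidean norm 1 (zero vectors are left unchanged)."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / norm for x in v] if norm > 0 else list(v))
    return out


def mean_center(vectors):
    """Subtract the per-dimension mean so the embedding cloud is centered at the origin."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]


def normalize_embeddings(vectors):
    """Unit-length scaling, then centering, then unit-length scaling again."""
    return unit_length(mean_center(unit_length(vectors)))
```

Note that the final rescaling restores unit norms, so the mean is only exactly zero after the centering step, not necessarily after the last step.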

Electricity Theft Detection with self-attention


Title	Electricity Theft Detection with self-attention
Authors	Paulo Finardi, Israel Campiotti, Gustavo Plensack, Rafael Derradi de Souza, Rodrigo Nogueira, Gustavo Pinheiro, Roberto Lotufo
Abstract	In this work we propose a novel self-attention mechanism model to address electricity theft detection on an imbalanced, realistic dataset of daily electricity consumption provided by the State Grid Corporation of China. Our key contribution is the introduction of a multi-head self-attention mechanism concatenated with dilated convolutions and unified by a convolution of kernel size 1. Moreover, we introduce a binary input channel (Binary Mask) to identify the position of the missing values, allowing the network to learn how to deal with these values. Our model achieves an AUC of 0.926, an improvement of more than 17% over the previous baseline work. The code is available on GitHub at https://github.com/neuralmind-ai/electricity-theft-detection-with-self-attention.
Tasks
Published	2020-02-14
URL	https://arxiv.org/abs/2002.06219v1
PDF	https://arxiv.org/pdf/2002.06219v1.pdf
PWC	https://paperswithcode.com/paper/electricity-theft-detection-with-self
Repo	https://github.com/neuralmind-ai/electricity-theft-detection-with-self-attention
Framework	pytorch
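
The Binary Mask idea can be sketched in a few lines: the consumption series becomes a two-channel input in which the second channel flags missing readings. The conventions below (1.0 marks a missing value, missing readings are filled with 0.0) and the function name are assumptions, not details confirmed by the abstract.

```python
import math


def with_binary_mask(series, fill_value=0.0):
    """Return [values, mask] where mask[i] is 1.0 if series[i] was missing."""
    values, mask = [], []
    for x in series:
        # Treat both None and NaN as missing readings.
        missing = x is None or (isinstance(x, float) and math.isnan(x))
        values.append(fill_value if missing else float(x))
        mask.append(1.0 if missing else 0.0)
    return [values, mask]


# Hypothetical daily readings with two gaps.
channels = with_binary_mask([1.5, None, float("nan"), 2])
```

The mask lets a downstream network distinguish a genuine zero reading from a filled-in gap.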

A Physiology-Driven Computational Model for Post-Cardiac Arrest Outcome Prediction


Title	A Physiology-Driven Computational Model for Post-Cardiac Arrest Outcome Prediction
Authors	Han B. Kim, Hieu Nguyen, Qingchu Jin, Sharmila Tamby, Tatiana Gelaf Romer, Eric Sung, Ran Liu, Joseph Greenstein, Jose I. Suarez, Christian Storm, Raimond Winslow, Robert D. Stevens
Abstract	Patients resuscitated from cardiac arrest (CA) face a high risk of neurological disability and death; however, pragmatic methods are lacking for accurate and reliable prognostication. The aim of this study was to build computational models to predict post-CA outcome by leveraging high-dimensional patient data available early after admission to the intensive care unit (ICU). We hypothesized that model performance could be enhanced by integrating physiological time series (PTS) data and by training machine learning (ML) classifiers. We compared three models integrating features extracted from the electronic health records (EHR) alone, features derived from PTS collected in the first 24 hours after ICU admission (PTS24), and models integrating PTS24 and EHR. Outcomes of interest were survival and neurological outcome at ICU discharge. Combined EHR-PTS24 models had higher discrimination (area under the receiver operating characteristic curve [AUC]) than models that used either EHR or PTS24 alone, for the prediction of survival (AUC 0.85, 0.80 and 0.68 respectively) and neurological outcome (0.87, 0.83 and 0.78). The best ML classifier achieved higher discrimination than the reference logistic regression model (APACHE III) for survival (AUC 0.85 vs 0.70) and neurological outcome prediction (AUC 0.87 vs 0.75). Feature analysis revealed previously unknown factors to be associated with post-CA recovery. Results attest to the effectiveness of ML models for post-CA predictive modeling and suggest that PTS recorded in the very early phase after resuscitation encode short-term outcome probabilities.
Tasks	Time Series
Published	2020-02-09
URL	https://arxiv.org/abs/2002.03309v2
PDF	https://arxiv.org/pdf/2002.03309v2.pdf
PWC	https://paperswithcode.com/paper/a-physiology-driven-computational-model-for
Repo	https://github.com/benfulcher/hctsa
Framework	none
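
Since the comparison above rests on AUC, a short sketch of the metric may help. It uses the standard rank interpretation: the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and outcomes.
toy_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; sorting-based implementations are preferable for large cohorts.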

Efficient Intent Detection with Dual Sentence Encoders


Title	Efficient Intent Detection with Dual Sentence Encoders
Authors	Iñigo Casanueva, Tadas Temčinas, Daniela Gerz, Matthew Henderson, Ivan Vulić
Abstract	Building conversational systems in new domains and with added functionality requires resource-efficient models that work under low-data regimes (i.e., in few-shot setups). Motivated by these requirements, we introduce intent detection methods backed by pretrained dual sentence encoders such as USE and ConveRT. We demonstrate the usefulness and wide applicability of the proposed intent detectors, showing that: 1) they outperform intent detectors based on fine-tuning the full BERT-Large model or using BERT as a fixed black-box encoder on three diverse intent detection data sets; 2) the gains are especially pronounced in few-shot setups (i.e., with only 10 or 30 annotated examples per intent); 3) our intent detectors can be trained in a matter of minutes on a single CPU; and 4) they are stable across different hyperparameter settings. In hope of facilitating and democratizing research focused on intention detection, we release our code, as well as a new challenging single-domain intent detection dataset comprising 13,083 annotated examples over 77 intents.
Tasks	Intent Detection
Published	2020-03-10
URL	https://arxiv.org/abs/2003.04807v1
PDF	https://arxiv.org/pdf/2003.04807v1.pdf
PWC	https://paperswithcode.com/paper/efficient-intent-detection-with-dual-sentence
Repo	https://github.com/PolyAI-LDN/task-specific-datasets
Framework	none

BiOnt: Deep Learning using Multiple Biomedical Ontologies for Relation Extraction


Title	BiOnt: Deep Learning using Multiple Biomedical Ontologies for Relation Extraction
Authors	Diana Sousa, Francisco M. Couto
Abstract	Successful biomedical relation extraction can provide evidence to researchers and clinicians about possible unknown associations between biomedical entities, advancing the current knowledge we have about those entities and their inherent mechanisms. Most biomedical relation extraction systems do not resort to external sources of knowledge, such as domain-specific ontologies. However, using deep learning methods, along with biomedical ontologies, has been recently shown to effectively advance the biomedical relation extraction field. To perform relation extraction, our deep learning system, BiOnt, employs four types of biomedical ontologies, namely, the Gene Ontology, the Human Phenotype Ontology, the Human Disease Ontology, and the Chemical Entities of Biological Interest, regarding gene-products, phenotypes, diseases, and chemical compounds, respectively. We tested our system with three data sets that represent three different types of relations of biomedical entities. BiOnt achieved, in F-score, an improvement of 4.93 percentage points for drug-drug interactions (DDI corpus), 4.99 percentage points for phenotype-gene relations (PGR corpus), and 2.21 percentage points for chemical-induced disease relations (BC5CDR corpus), relative to the state of the art. The code supporting this system is available at https://github.com/lasigeBioTM/BiONT.
Tasks	Relation Extraction
Published	2020-01-20
URL	https://arxiv.org/abs/2001.07139v1
PDF	https://arxiv.org/pdf/2001.07139v1.pdf
PWC	https://paperswithcode.com/paper/biont-deep-learning-using-multiple-biomedical
Repo	https://github.com/lasigeBioTM/BiONT
Framework	none

Reformer: The Efficient Transformer


Title	Reformer: The Efficient Transformer
Authors	Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya
Abstract	Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O($L^2$) to O($L\log L$), where $L$ is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of $N$ times, where $N$ is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
Tasks	Language Modelling
Published	2020-01-13
URL	https://arxiv.org/abs/2001.04451v2
PDF	https://arxiv.org/pdf/2001.04451v2.pdf
PWC	https://paperswithcode.com/paper/reformer-the-efficient-transformer-1
Repo	https://github.com/lucidrains/reformer-pytorch
Framework	pytorch
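
The reason reversible residual layers avoid storing per-layer activations is that a layer's inputs can be recomputed exactly from its outputs. The sketch below shows the standard reversible coupling on plain Python lists; the sub-functions f and g are hypothetical stand-ins for the attention and feed-forward blocks, and nothing here is taken from the linked repository.

```python
import math


def rev_forward(x1, x2, f, g):
    """Reversible block: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def rev_inverse(y1, y2, f, g):
    """Recover the inputs from the outputs by undoing the two additions in reverse order."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


def f(v):
    # Hypothetical stand-in for the attention sub-layer.
    return [math.tanh(t) for t in v]


def g(v):
    # Hypothetical stand-in for the feed-forward sub-layer.
    return [0.5 * t * t for t in v]


x1, x2 = [0.2, -1.0], [1.5, 0.3]
y1, y2 = rev_forward(x1, x2, f, g)
r1, r2 = rev_inverse(y1, y2, f, g)
```

Neither f nor g needs to be invertible; only the additive coupling is undone, which is why the inputs are recovered up to floating-point rounding.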

Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information


Title	Post-Estimation Smoothing: A Simple Baseline for Learning with Side Information
Authors	Esther Rolf, Michael I. Jordan, Benjamin Recht
Abstract	Observational data are often accompanied by natural structural indices, such as time stamps or geographic locations, which are meaningful to prediction tasks but are often discarded. We leverage semantically meaningful indexing data while ensuring robustness to potentially uninformative or misleading indices. We propose a post-estimation smoothing operator as a fast and effective method for incorporating structural index data into prediction. Because the smoothing step is separate from the original predictor, it applies to a broad class of machine learning tasks, with no need to retrain models. Our theoretical analysis details simple conditions under which post-estimation smoothing will improve accuracy over that of the original predictor. Our experiments on large scale spatial and temporal datasets highlight the speed and accuracy of post-estimation smoothing in practice. Together, these results illuminate a novel way to consider and incorporate the natural structure of index variables in machine learning.
Tasks
Published	2020-03-12
URL	https://arxiv.org/abs/2003.05955v1
PDF	https://arxiv.org/pdf/2003.05955v1.pdf
PWC	https://paperswithcode.com/paper/post-estimation-smoothing-a-simple-baseline
Repo	https://github.com/estherrolf/p-es
Framework	none
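
A post-estimation smoothing step can be sketched as a weighted average of an existing predictor's outputs, with weights that decay with distance in the index variable (for example, time stamps). The Gaussian kernel, the bandwidth parameter, and the function name below are assumptions chosen for illustration; the operator analyzed in the paper may differ.

```python
import math


def smooth_predictions(preds, index, bandwidth):
    """Replace each prediction by a Gaussian-kernel weighted average over the index."""
    smoothed = []
    for t_i in index:
        weights = [
            math.exp(-0.5 * ((t_i - t_j) / bandwidth) ** 2) for t_j in index
        ]
        total = sum(weights)
        smoothed.append(sum(w * p for w, p in zip(weights, preds)) / total)
    return smoothed


# Hypothetical noisy predictions observed at consecutive time stamps.
raw = [1.0, 3.0, 1.0, 3.0]
times = [0.0, 1.0, 2.0, 3.0]
smoothed = smooth_predictions(raw, times, bandwidth=1.0)
```

Because the step only consumes predictions and index values, the underlying model does not need to be retrained. A very small bandwidth leaves the predictions essentially unchanged, which is one way to guard against an uninformative index.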

Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells


Title	Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells
Authors	Gengchen Mai, Krzysztof Janowicz, Bo Yan, Rui Zhu, Ling Cai, Ni Lao
Abstract	Unsupervised text encoding models have recently fueled substantial progress in NLP. The key idea is to use neural networks to convert words in texts to vector space representations based on word positions in a sentence and their contexts, which are suitable for end-to-end training of downstream tasks. We see a strikingly similar situation in spatial analysis, which focuses on incorporating both absolute positions and spatial contexts of geographic objects such as POIs into models. A general-purpose representation model for space is valuable for a multitude of tasks. However, no such general model exists to date beyond simply applying discretization or feed-forward nets to coordinates, and little effort has been put into jointly modeling distributions with vastly different characteristics, which commonly emerge from GIS data. Meanwhile, Nobel Prize-winning neuroscience research shows that grid cells in mammals provide a multi-scale periodic representation that functions as a metric for location encoding and is critical for recognizing places and for path-integration. Therefore, we propose a representation learning model called Space2Vec to encode the absolute positions and spatial relationships of places. We conduct experiments on two real-world geographic datasets for two different tasks: 1) predicting types of POIs given their positions and context, 2) image classification leveraging their geo-locations. Results show that because of its multi-scale representations, Space2Vec outperforms well-established ML approaches such as RBF kernels, multi-layer feed-forward nets, and tile embedding approaches for location modeling and image classification tasks. Detailed analysis shows that all baselines can handle distributions well at one scale at most but perform poorly at other scales. In contrast, Space2Vec’s multi-scale representation can handle distributions at different scales.
Tasks	Image Classification, Representation Learning
Published	2020-02-16
URL	https://arxiv.org/abs/2003.00824v1
PDF	https://arxiv.org/pdf/2003.00824v1.pdf
PWC	https://paperswithcode.com/paper/multi-scale-representation-learning-for-1
Repo	https://github.com/gengchenmai/space2vec
Framework	pytorch
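
The idea of a multi-scale periodic representation can be illustrated with sinusoidal features computed at geometrically spaced wavelengths. This is a simplified sketch under assumptions: it encodes each coordinate axis separately, the wavelength range and the function name are hypothetical, and it omits the learned layers of the actual Space2Vec model.

```python
import math


def multiscale_encode(x, y, num_scales=4, min_wavelength=1.0, max_wavelength=1000.0):
    """Encode a 2D position as sin/cos features at several spatial scales."""
    features = []
    ratio = max_wavelength / min_wavelength
    for s in range(num_scales):
        # Wavelengths grow geometrically from min_wavelength to max_wavelength.
        frac = s / (num_scales - 1) if num_scales > 1 else 0.0
        wavelength = min_wavelength * ratio ** frac
        for coord in (x, y):
            angle = 2.0 * math.pi * coord / wavelength
            features.extend([math.sin(angle), math.cos(angle)])
    return features


# Hypothetical planar coordinates in arbitrary distance units.
vec = multiscale_encode(12.5, 340.0)
```

Short wavelengths separate nearby points, while long wavelengths vary slowly and capture coarse location, so the concatenated vector carries information at several scales at once.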

Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning


Title	Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning
Authors	Arsenii Ashukha, Alexander Lyzhov, Dmitry Molchanov, Dmitry Vetrov
Abstract	Uncertainty estimation and ensembling methods go hand-in-hand. Uncertainty estimation is one of the main benchmarks for assessment of ensembling performance. At the same time, deep learning ensembles have provided state-of-the-art results in uncertainty estimation. In this work, we focus on in-domain uncertainty for image classification. We explore the standards for its quantification and point out pitfalls of existing metrics. Avoiding these pitfalls, we perform a broad study of different ensembling techniques. To provide more insight in this study, we introduce the deep ensemble equivalent score (DEE) and show that many sophisticated ensembling techniques are equivalent to an ensemble of only a few independently trained networks in terms of test performance.
Tasks	Image Classification
Published	2020-02-15
URL	https://arxiv.org/abs/2002.06470v1
PDF	https://arxiv.org/pdf/2002.06470v1.pdf
PWC	https://paperswithcode.com/paper/pitfalls-of-in-domain-uncertainty-estimation-1
Repo	https://github.com/bayesgroup/pytorch-ensembles
Framework	pytorch
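
As background for the ensembling discussion, the sketch below shows the usual way a deep ensemble forms a prediction: the class probabilities of the members are averaged. The function names and toy probabilities are hypothetical, and the DEE score itself is not implemented here.

```python
import math


def ensemble_average(member_probs):
    """Average the class-probability vectors predicted by the ensemble members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def log_likelihood(probs, true_class):
    """Log-probability assigned to the correct class."""
    return math.log(probs[true_class])


# Hypothetical predictions of two independently trained networks for one image.
members = [[0.9, 0.1], [0.5, 0.5]]
avg = ensemble_average(members)
ens_ll = log_likelihood(avg, 0)
mean_member_ll = sum(log_likelihood(p, 0) for p in members) / len(members)
```

By Jensen's inequality the log-likelihood of the averaged prediction is never lower than the mean log-likelihood of the individual members.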

Capsules with Inverted Dot-Product Attention Routing


Title	Capsules with Inverted Dot-Product Attention Routing
Authors	Yao-Hung Hubert Tsai, Nitish Srivastava, Hanlin Goh, Ruslan Salakhutdinov
Abstract	We introduce a new routing algorithm for capsule networks, in which a child capsule is routed to a parent based only on agreement between the parent’s state and the child’s vote. The new mechanism 1) designs routing via inverted dot-product attention; 2) imposes Layer Normalization as normalization; and 3) replaces sequential iterative routing with concurrent iterative routing. When compared to previously proposed routing algorithms, our method improves performance on benchmark datasets such as CIFAR-10 and CIFAR-100, and it performs on par with a powerful CNN (ResNet-18) with 4x fewer parameters. On a different task of recognizing digits from overlaid digit images, the proposed capsule model performs favorably against CNNs given the same number of layers and neurons per layer. We believe that our work raises the possibility of applying capsule networks to complex real-world tasks. Our code is publicly available at: https://github.com/apple/ml-capsules-inverted-attention-routing. An alternative implementation is available at: https://github.com/yaohungt/Capsules-Inverted-Attention-Routing/blob/master/README.md
Tasks	Image Classification
Published	2020-02-12
URL	https://arxiv.org/abs/2002.04764v2
PDF	https://arxiv.org/pdf/2002.04764v2.pdf
PWC	https://paperswithcode.com/paper/capsules-with-inverted-dot-product-attention-1
Repo	https://github.com/yaohungt/Capsules-Inverted-Attention-Routing
Framework	pytorch
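
A single routing step based on agreement between a parent's state and a child's vote can be sketched as follows. This is a simplified reading of the abstract under assumptions: capsule poses are plain vectors, the routing coefficients of each child are normalized with a softmax over the parents, and Layer Normalization and concurrent routing across layers are omitted. All names are hypothetical and do not mirror the linked repositories.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def routing_step(parents, votes):
    """parents[j]: pose of parent j; votes[i][j]: vote of child i for parent j."""
    n_children, n_parents = len(votes), len(parents)
    # Agreement is the dot product between a parent's state and the child's vote;
    # each child distributes its routing weight across the parents.
    coeffs = [
        softmax([dot(parents[j], votes[i][j]) for j in range(n_parents)])
        for i in range(n_children)
    ]
    new_parents = []
    for j in range(n_parents):
        dim = len(parents[j])
        new_parents.append([
            sum(coeffs[i][j] * votes[i][j][d] for i in range(n_children))
            for d in range(dim)
        ])
    return coeffs, new_parents


# Hypothetical toy setup: two children, two parents, 2D poses.
parents = [[1.0, 0.0], [0.0, 1.0]]
votes = [
    [[1.0, 0.0], [0.0, 0.0]],  # child 0 agrees with parent 0
    [[0.0, 0.0], [0.0, 1.0]],  # child 1 agrees with parent 1
]
coeffs, updated = routing_step(parents, votes)
```

Each child ends up routing most of its weight to the parent whose current state matches its vote.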

Explaining Explanations: Axiomatic Feature Interactions for Deep Networks


Title	Explaining Explanations: Axiomatic Feature Interactions for Deep Networks
Authors	Joseph D. Janizek, Pascal Sturmfels, Su-In Lee
Abstract	Recent work has shown great promise in explaining neural network behavior. In particular, feature attribution methods explain which features were most important to a model’s prediction on a given input. However, for many tasks, simply knowing which features were important to a model’s prediction may not provide enough insight to understand model behavior. The interactions between features within the model may better help us understand not only the model, but also why certain features are more important than others. In this work we present Integrated Hessians, an extension of Integrated Gradients that explains pairwise feature interactions in neural networks. Integrated Hessians overcomes several theoretical limitations of previous methods to explain interactions, and unlike such previous methods is not limited to a specific architecture or class of neural network. We apply Integrated Hessians on a variety of neural networks trained on language data, biological data, astronomy data, and medical data and gain new insight into model behavior in each domain. Code available at https://github.com/suinleelab/path_explain
Tasks
Published	2020-02-10
URL	https://arxiv.org/abs/2002.04138v2
PDF	https://arxiv.org/pdf/2002.04138v2.pdf
PWC	https://paperswithcode.com/paper/explaining-explanations-axiomatic-feature
Repo	https://github.com/suinleelab/path_explain
Framework	tf
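
Integrated Hessians builds on Integrated Gradients, so a small sketch of the latter may help. The code below approximates Integrated Gradients with a midpoint Riemann sum and finite-difference gradients for an arbitrary scalar function; it does not compute pairwise interactions and does not use the path_explain API. The function names and the toy model are hypothetical.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=50):
    """Attribute f(x) - f(baseline) to the input features along a straight path."""
    attributions = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        for i in range(len(x)):
            attributions[i] += grad[i] * (x[i] - baseline[i]) / steps
    return attributions


def toy_model(v):
    # Hypothetical model with one interaction term and one additive term.
    return v[0] * v[1] + v[2]


x = [2.0, 3.0, 1.0]
baseline = [0.0, 0.0, 0.0]
attr = integrated_gradients(toy_model, x, baseline)
```

The attributions sum to the difference between the model output at the input and at the baseline (the completeness property). The interaction term's contribution is split between the first two features, which is the kind of effect a pairwise interaction method aims to expose.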

Subspace Capsule Network


Title	Subspace Capsule Network
Authors	Marzieh Edraki, Nazanin Rahnavard, Mubarak Shah
Abstract	Convolutional neural networks (CNNs) have become a key asset to most fields in AI. Despite their successful performance, CNNs suffer from a major drawback. They fail to capture the hierarchy of spatial relations among different parts of an entity. As a remedy to this problem, the idea of capsules was proposed by Hinton. In this paper, we propose the SubSpace Capsule Network (SCN) that exploits the idea of capsule networks to model possible variations in the appearance or implicitly defined properties of an entity through a group of capsule subspaces instead of simply grouping neurons to create capsules. A capsule is created by projecting an input feature vector from a lower layer onto the capsule subspace using a learnable transformation. This transformation finds the degree of alignment of the input with the properties modeled by the capsule subspace. We show that SCN is a general capsule network that can successfully be applied to both discriminative and generative models without incurring computational overhead compared to CNN during test time. The effectiveness of SCN is evaluated through a comprehensive set of experiments on supervised image classification, semi-supervised image classification and high-resolution image generation tasks using the generative adversarial network (GAN) framework. SCN significantly improves the performance of the baseline models in all 3 tasks.
Tasks	Image Classification, Image Generation, Semi-Supervised Image Classification
Published	2020-02-07
URL	https://arxiv.org/abs/2002.02924v1
PDF	https://arxiv.org/pdf/2002.02924v1.pdf
PWC	https://paperswithcode.com/paper/subspace-capsule-network
Repo	https://github.com/MarziEd/SubSpace-Capsule-Network
Framework	tf