April 3, 2020

2856 words 14 mins read

Paper Group AWR 57

Fast Distance-based Anomaly Detection in Images Using an Inception-like Autoencoder. Learning Unbiased Representations via Mutual Information Backpropagation. A Physical Embedding Model for Knowledge Graphs. GridMask Data Augmentation. An Analysis on the Learning Rules of the Skip-Gram Model. Smart Induction for Isabelle/HOL (System Description). A …

Fast Distance-based Anomaly Detection in Images Using an Inception-like Autoencoder

Title Fast Distance-based Anomaly Detection in Images Using an Inception-like Autoencoder
Authors Natasa Sarafijanovic-Djukic, Jesse Davis
Abstract The goal of anomaly detection is to identify examples that deviate from normal or expected behavior. We tackle this problem for images. We consider a two-phase approach. First, using normal examples, a convolutional autoencoder (CAE) is trained to extract a low-dimensional representation of the images. Here, we propose a novel architectural choice when designing the CAE, an Inception-like CAE. It combines convolutional filters of different kernel sizes and uses a Global Average Pooling (GAP) operation to extract the representations from the CAE’s bottleneck layer. Second, we employ a distance-based anomaly detector in the low-dimensional space of the learned representation for the images. However, instead of computing the exact distance, we compute an approximate distance using product quantization. This alleviates the high memory and prediction-time costs of distance-based anomaly detectors. We compare our proposed approach to a number of baselines and state-of-the-art methods on four image datasets, and we find that our approach yields improved predictive performance.
Tasks Anomaly Detection, Quantization
Published 2020-03-12
URL https://arxiv.org/abs/2003.08731v1
PDF https://arxiv.org/pdf/2003.08731v1.pdf
PWC https://paperswithcode.com/paper/fast-distance-based-anomaly-detection-in
Repo https://github.com/natasasdj/anomalyDetection
Framework none
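
As a rough illustration of the second phase described in the abstract (scoring images by their approximate distance to the normal training representations via product quantization), the following sketch uses plain NumPy and scikit-learn k-means. It is not the authors’ implementation; the sub-space count, codebook size, and the assumption of precomputed CAE embeddings are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq(train_feats, n_subspaces=4, n_centroids=256):
    """Split the feature space into subspaces and learn one codebook per subspace."""
    d = train_feats.shape[1]
    sub_dim = d // n_subspaces
    codebooks = []
    for s in range(n_subspaces):
        block = train_feats[:, s * sub_dim:(s + 1) * sub_dim]
        codebooks.append(KMeans(n_clusters=n_centroids, n_init=4).fit(block))
    return codebooks, sub_dim

def encode_pq(feats, codebooks, sub_dim):
    """Replace each subvector by the index of its nearest centroid."""
    codes = [cb.predict(feats[:, s * sub_dim:(s + 1) * sub_dim])
             for s, cb in enumerate(codebooks)]
    return np.stack(codes, axis=1)                       # (n, n_subspaces)

def anomaly_score(query, codes, codebooks, sub_dim):
    """Asymmetric distance: the query stays exact, database points are quantized.
    The anomaly score is the distance to the nearest (quantized) training point."""
    # per subspace: distance from the query subvector to every centroid
    tables = [np.linalg.norm(cb.cluster_centers_ -
                             query[s * sub_dim:(s + 1) * sub_dim], axis=1) ** 2
              for s, cb in enumerate(codebooks)]
    dists = sum(tables[s][codes[:, s]] for s in range(len(codebooks)))
    return np.sqrt(dists.min())
```

In practice a library such as faiss provides product quantization directly; the point here is only how small per-subspace codebooks replace exact distance computations over the full training set.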

Learning Unbiased Representations via Mutual Information Backpropagation

Title Learning Unbiased Representations via Mutual Information Backpropagation
Authors Ruggero Ragonesi, Riccardo Volpi, Jacopo Cavazza, Vittorio Murino
Abstract We are interested in learning data-driven representations that can generalize well, even when trained on inherently biased data. In particular, we face the case where some attributes (bias) of the data, if learned by the model, can severely compromise its generalization properties. We tackle this problem through the lens of information theory, leveraging recent findings for a differentiable estimation of mutual information. We propose a novel end-to-end optimization strategy, which simultaneously estimates and minimizes the mutual information between the learned representation and the data attributes. When applied to standard benchmarks, our model shows comparable or superior classification performance with respect to state-of-the-art approaches. Moreover, our method is general enough to be applicable to the problem of “algorithmic fairness”, with competitive results.
Tasks
Published 2020-03-13
URL https://arxiv.org/abs/2003.06430v1
PDF https://arxiv.org/pdf/2003.06430v1.pdf
PWC https://paperswithcode.com/paper/learning-unbiased-representations-via-mutual
Repo https://github.com/rugrag/learn-unbiased
Framework tf
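
The end-to-end strategy in the abstract — simultaneously estimating and minimizing the mutual information between the representation and the bias attribute — can be sketched with a MINE-style (Donsker–Varadhan) critic in PyTorch. This is a minimal sketch under assumed network shapes, loss weights, and update schedule, not the paper’s exact training procedure:

```python
import math
import torch
import torch.nn as nn

class MINECritic(nn.Module):
    """Scores (representation, bias-attribute) pairs; used to estimate mutual information."""
    def __init__(self, z_dim, b_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + b_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, z, b):
        return self.net(torch.cat([z, b], dim=1))

def mi_lower_bound(critic, z, b):
    """Donsker-Varadhan bound: E[T(z,b)] - log E[exp(T(z,b'))], with b' shuffled across the batch."""
    joint = critic(z, b).mean()
    b_shuffled = b[torch.randperm(b.size(0))]
    marginal = torch.logsumexp(critic(z, b_shuffled).squeeze(1), dim=0) - math.log(b.size(0))
    return joint - marginal

# Inside the training loop (sketch):
#   z = encoder(x)                              # learned representation
#   task_loss = criterion(classifier(z), y)
#   mi = mi_lower_bound(critic, z, bias_onehot)
#   encoder/classifier step: minimize task_loss + lam * mi
#   critic step: maximize mi (minimize -mi) to keep the estimate tight
```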

A Physical Embedding Model for Knowledge Graphs

Title A Physical Embedding Model for Knowledge Graphs
Authors Caglar Demir, Axel-Cyrille Ngonga Ngomo
Abstract Knowledge graph embedding methods learn continuous vector representations for entities in knowledge graphs and have been used successfully in a large number of applications. We present a novel and scalable paradigm for the computation of knowledge graph embeddings, which we dub PYKE. Our approach combines a physical model based on Hooke’s law and its inverse with ideas from simulated annealing to compute embeddings for knowledge graphs efficiently. We prove that PYKE achieves linear space complexity. While the time complexity for the initialization of our approach is quadratic, the time complexity of each of its iterations is linear in the size of the input knowledge graph. Hence, PYKE’s overall runtime is close to linear. Consequently, our approach easily scales up to knowledge graphs containing millions of triples. We evaluate our approach against six state-of-the-art embedding approaches on the DrugBank and DBpedia datasets in two series of experiments. The first series shows that the cluster purity achieved by PYKE is up to 26% (absolute) better than that of the state of the art. In addition, PYKE is more than 22 times faster than existing embedding solutions in the best case. The results of our second series of experiments show that PYKE is up to 23% (absolute) better than the state of the art on the task of type prediction while maintaining its superior scalability. Our implementation and results are open-source and are available at http://github.com/dice-group/PYKE.
Tasks Graph Embedding, Knowledge Graph Embedding, Knowledge Graph Embeddings, Knowledge Graphs
Published 2020-01-21
URL https://arxiv.org/abs/2001.07418v1
PDF https://arxiv.org/pdf/2001.07418v1.pdf
PWC https://paperswithcode.com/paper/a-physical-embedding-model-for-knowledge
Repo https://github.com/dice-group/PYKE
Framework none
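
A toy version of the physical intuition — Hooke’s-law attraction between related entities, inverse-law repulsion between unrelated ones, with an annealed step size — might look as follows. The neighbourhood sets, force constants, and decay schedule are made up for illustration and do not reproduce the published PYKE algorithm:

```python
import numpy as np

def pyke_like_update(emb, attract, repel, lr=0.1, omega=0.45, decay=0.95, iters=50):
    """emb: (n_entities, dim) array; attract/repel: dicts mapping an entity id to
    the ids it should be pulled towards / pushed away from."""
    for _ in range(iters):
        new = emb.copy()
        for i in range(len(emb)):
            # Hooke's law: attraction grows with distance to related entities
            for j in attract.get(i, ()):
                new[i] += lr * (emb[j] - emb[i])
            # inverse law: repulsion shrinks with distance to unrelated entities
            for j in repel.get(i, ()):
                diff = emb[i] - emb[j]
                new[i] += lr * omega * diff / (np.linalg.norm(diff) ** 2 + 1e-8)
        emb = new
        lr *= decay          # simulated-annealing-style step-size decay
    return emb
```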

GridMask Data Augmentation

Title GridMask Data Augmentation
Authors Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia
Abstract We propose a novel data augmentation method, GridMask, in this paper. It utilizes information removal to achieve state-of-the-art results in a variety of computer vision tasks. We analyze the requirements of information dropping. We then show the limitations of existing information-dropping algorithms and propose our structured method, which is simple yet very effective. It is based on the deletion of regions of the input image. Our extensive experiments show that our method outperforms the latest AutoAugment, which is far more computationally expensive due to its use of reinforcement learning to find the best policies. On the ImageNet dataset for recognition, on COCO2017 for object detection, and on the Cityscapes dataset for semantic segmentation, our method notably improves performance over baselines. The extensive experiments demonstrate the effectiveness and generality of the new method.
Tasks Data Augmentation, Object Detection, Semantic Segmentation
Published 2020-01-13
URL https://arxiv.org/abs/2001.04086v2
PDF https://arxiv.org/pdf/2001.04086v2.pdf
PWC https://paperswithcode.com/paper/gridmask-data-augmentation
Repo https://github.com/akuxcw/GridMask
Framework pytorch
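
For intuition, a minimal re-implementation of the structured deletion step — dropping pixels on a regular grid rather than in random blobs — is given below. It omits the rotation and keep-probability scheduling used in the paper and repo, and the default parameters are assumptions:

```python
import numpy as np

def gridmask(image, d=32, ratio=0.5):
    """Zero out a regular grid of square regions.
    d      -- grid period in pixels
    ratio  -- fraction of each period that is kept (the rest is dropped)"""
    h, w = image.shape[:2]
    keep = int(d * ratio)
    mask = np.ones((h, w), dtype=image.dtype)
    for y in range(0, h, d):
        for x in range(0, w, d):
            # drop the top-left (d - keep) x (d - keep) corner of each grid cell
            mask[y:y + d - keep, x:x + d - keep] = 0
    return image * mask[..., None] if image.ndim == 3 else image * mask
```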

An Analysis on the Learning Rules of the Skip-Gram Model

Title An Analysis on the Learning Rules of the Skip-Gram Model
Authors Canlin Zhang, Xiuwen Liu, Daniel Bis
Abstract To improve the generalization of the representations for natural language processing tasks, words are commonly represented using vectors, where distances among the vectors are related to the similarity of the words. While word2vec, the state-of-the-art implementation of the skip-gram model, is widely used and improves the performance of many natural language processing tasks, its mechanism is not yet well understood. In this work, we derive the learning rules for the skip-gram model and establish their close relationship to competitive learning. In addition, we provide the global optimal solution constraints for the skip-gram model and validate them by experimental results.
Tasks
Published 2020-03-18
URL https://arxiv.org/abs/2003.08489v1
PDF https://arxiv.org/pdf/2003.08489v1.pdf
PWC https://paperswithcode.com/paper/an-analysis-on-the-learning-rules-of-the-skip
Repo https://github.com/canlinzhang/IJCNN-2019-paper
Framework none
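
For readers who want the derivation’s starting point, the softmax skip-gram objective and its gradient-descent learning rules, in the usual textbook notation (input vectors $v_w$, output vectors $u_c$, vocabulary $V$), are:

```latex
% Softmax skip-gram: probability of a context word c given a center word w
p(c \mid w) = \frac{\exp(u_c^\top v_w)}{\sum_{c' \in V} \exp(u_{c'}^\top v_w)},
\qquad
J = -\log p(c \mid w)

% Gradient-descent learning rules (learning rate \eta, applied as \theta \leftarrow \theta - \eta \, \partial J / \partial \theta)
\frac{\partial J}{\partial v_w} = -u_c + \sum_{c' \in V} p(c' \mid w)\, u_{c'},
\qquad
\frac{\partial J}{\partial u_{c'}} = \bigl(p(c' \mid w) - \mathbb{1}[c' = c]\bigr)\, v_w
```

The paper analyzes these update rules and relates them to competitive learning; the notation above is the standard one and may differ from the paper’s.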

Smart Induction for Isabelle/HOL (System Description)

Title Smart Induction for Isabelle/HOL (System Description)
Authors Yutaka Nagashima
Abstract Proof assistants offer tactics to facilitate inductive proofs. However, it still requires human ingenuity to decide what arguments to pass to those induction tactics. To automate this process, we present smart_induct for Isabelle/HOL. Given an inductive problem in any problem domain, smart_induct lists promising arguments for the induct tactic without relying on a search. Our evaluation demonstrates that smart_induct produces valuable recommendations across problem domains.
Tasks
Published 2020-01-27
URL https://arxiv.org/abs/2001.10834v1
PDF https://arxiv.org/pdf/2001.10834v1.pdf
PWC https://paperswithcode.com/paper/smart-induction-for-isabellehol-system
Repo https://github.com/data61/PSL
Framework none

Additive Tree Ensembles: Reasoning About Potential Instances

Title Additive Tree Ensembles: Reasoning About Potential Instances
Authors Laurens Devos, Wannes Meert, Jesse Davis
Abstract Imagine being able to ask questions to a black box model such as “Which adversarial examples exist?”, “Does a specific attribute have a disproportionate effect on the model’s prediction?” or “What kind of predictions are possible for a partially described example?” This last question is particularly important if your partial description does not correspond to any observed example in your data, as it provides insight into how the model will extrapolate to unseen data. These capabilities would be extremely helpful, as they would allow a user to better understand the model’s behavior, particularly as it relates to issues such as robustness, fairness, and bias. In this paper, we propose such an approach for an ensemble of trees. Since, in general, this task is intractable, we present a strategy that (1) can prune part of the input space given the question asked, to simplify the problem; and (2) follows a divide-and-conquer approach that is incremental, can always return some answers, and indicates which parts of the input domain are still uncertain. The usefulness of our approach is shown on a diverse set of use cases.
Tasks
Published 2020-01-31
URL https://arxiv.org/abs/2001.11905v1
PDF https://arxiv.org/pdf/2001.11905v1.pdf
PWC https://paperswithcode.com/paper/additive-tree-ensembles-reasoning-about
Repo https://github.com/laudv/treeck
Framework none
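
A concrete, simplified instance of reasoning about partially described examples is bounding an additive ensemble’s output over a box of per-feature intervals. The sketch below is a sound interval propagation meant only to convey the flavour of the pruning step, not the paper’s divide-and-conquer algorithm; the tree data structure is a placeholder:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None   # None for leaves
    threshold: float = 0.0
    left: "Node" = None             # branch taken when x[feature] < threshold
    right: "Node" = None
    value: float = 0.0              # leaf prediction

def tree_bounds(node, box):
    """box: list of (lo, hi) intervals per feature. Returns (min, max) of the tree over the box."""
    if node.feature is None:
        return node.value, node.value
    lo, hi = box[node.feature]
    results = []
    if lo < node.threshold:                      # left branch reachable
        left_box = box.copy()
        left_box[node.feature] = (lo, min(hi, node.threshold))
        results.append(tree_bounds(node.left, left_box))
    if hi >= node.threshold:                     # right branch reachable
        right_box = box.copy()
        right_box[node.feature] = (max(lo, node.threshold), hi)
        results.append(tree_bounds(node.right, right_box))
    return min(r[0] for r in results), max(r[1] for r in results)

def ensemble_bounds(trees, box):
    """Additive ensemble: sum the per-tree bounds (sound, possibly loose)."""
    lows, highs = zip(*(tree_bounds(t, box) for t in trees))
    return sum(lows), sum(highs)
```

Such bounds let you discard regions of the input space where, for example, no prediction above a threshold is possible, which is the kind of pruning the abstract alludes to.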

Novelty-Prepared Few-Shot Classification

Title Novelty-Prepared Few-Shot Classification
Authors Chao Wang, Ruo-Ze Liu, Han-Jia Ye, Yang Yu
Abstract Few-shot classification algorithms can alleviate the data scarcity issue, which is vital in many real-world problems, by adopting models pre-trained on abundant data from other domains. However, the pre-training process is commonly unaware of the future adaptation to other concept classes. We show that a classically fully trained feature extractor can leave little embedding space for unseen classes, which keeps the model from fitting the new classes well. In this work, we propose a novelty-prepared loss function, called self-compacting softmax loss (SSL), for few-shot classification. SSL prevents the full occupancy of the embedding space, so the model is better prepared to learn new classes. In experiments on the CUB-200-2011 and mini-ImageNet datasets, we show that SSL leads to significant improvements over state-of-the-art performance. This work may shed some light on considering model capacity for few-shot classification tasks.
Tasks
Published 2020-03-01
URL https://arxiv.org/abs/2003.00497v1
PDF https://arxiv.org/pdf/2003.00497v1.pdf
PWC https://paperswithcode.com/paper/novelty-prepared-few-shot-classification
Repo https://github.com/polixir/algo-SSL
Framework pytorch

Nearest Neighbor Dirichlet Process

Title Nearest Neighbor Dirichlet Process
Authors Shounak Chattopadhyay, Antik Chakraborty, David B. Dunson
Abstract There is a rich literature on Bayesian nonparametric methods for unknown densities. The most popular approach relies on Dirichlet process mixture models. These models characterize the unknown density as a kernel convolution with an unknown almost surely discrete mixing measure, which is given a Dirichlet process prior. Such models are very flexible and have good performance in many settings, but posterior computation relies on Markov chain Monte Carlo algorithms that can be complex and inefficient. As a simple and general alternative, we propose a class of nearest neighbor-Dirichlet processes. The approach starts by grouping the data into neighborhoods based on standard algorithms. Within each neighborhood, the density is characterized via a Bayesian parametric model, such as a Gaussian with unknown parameters. Assigning a Dirichlet prior to the weights on these local kernels, we obtain a simple pseudo-posterior for the weights and kernel parameters. A simple and embarrassingly parallel Monte Carlo algorithm is proposed to sample from the resulting pseudo-posterior for the unknown density. Desirable asymptotic properties are shown, and the methods are evaluated in simulation studies and applied to a motivating dataset in the context of classification.
Tasks
Published 2020-03-17
URL https://arxiv.org/abs/2003.07953v1
PDF https://arxiv.org/pdf/2003.07953v1.pdf
PWC https://paperswithcode.com/paper/nearest-neighbor-dirichlet-process
Repo https://github.com/shounakchattopadhyay/NN-DP
Framework none
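
The construction sketched in the abstract — nearest-neighbour neighbourhoods, a simple Gaussian kernel per neighbourhood, and Dirichlet-distributed weights over the local kernels — can be approximated in a few lines. The sketch below fixes the kernel parameters to plug-in per-neighbourhood estimates instead of sampling them from the pseudo-posterior, so it only conveys the general shape of the method:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nn_dp_density(x_train, x_eval, k=10, alpha=1.0, n_draws=200, rng=None):
    """Monte Carlo density estimate at x_eval from local Gaussian kernels with
    Dirichlet-weighted mixture weights (embarrassingly parallel over draws)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = x_train.shape
    # neighbourhood of each training point (including itself)
    _, idx = NearestNeighbors(n_neighbors=k).fit(x_train).kneighbors(x_train)
    mus = x_train[idx].mean(axis=1)                       # (n, d)
    sigmas = x_train[idx].std(axis=1) + 1e-6              # (n, d), diagonal Gaussians
    # log density of each local kernel at each query point: (n, m)
    diff = (x_eval[None, :, :] - mus[:, None, :]) / sigmas[:, None, :]
    log_kern = (-0.5 * (diff ** 2).sum(-1)
                - np.log(sigmas).sum(-1, keepdims=True)
                - 0.5 * d * np.log(2 * np.pi))
    kern = np.exp(log_kern)
    # average the mixture over Dirichlet draws of the weights
    weights = rng.dirichlet(np.full(n, alpha), size=n_draws)   # (n_draws, n)
    return (weights @ kern).mean(axis=0)                       # density at each x_eval
```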

Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming

Title Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming
Authors Mostafa ElAraby, Guy Wolf, Margarida Carvalho
Abstract We introduce a novel approach to optimize the architecture of deep neural networks by identifying critical neurons and removing non-critical ones. The proposed approach utilizes a mixed integer programming (MIP) formulation of neural models which includes a continuous importance score computed for each neuron in the network. The optimization in the MIP solver minimizes the number of critical neurons (i.e., those with a high importance score) that need to be kept for maintaining the overall accuracy of the model. Further, the proposed formulation generalizes the recently considered lottery ticket optimization by identifying multiple “lucky” sub-networks, resulting in optimized architectures that not only perform well on a single dataset but also generalize across multiple ones upon retraining of network weights. Finally, the proposed framework provides a significant improvement in the scalability of automatic sparsification of deep network architectures compared to previous attempts. We validate the performance and generalizability of our approach on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, using three different neural networks: LeNet-5 and two fully connected ReLU models.
Tasks
Published 2020-02-17
URL https://arxiv.org/abs/2002.07259v1
PDF https://arxiv.org/pdf/2002.07259v1.pdf
PWC https://paperswithcode.com/paper/identifying-critical-neurons-in-ann
Repo https://github.com/chair-dsgt/mip-for-ann
Framework pytorch
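
The abstract does not spell out the MIP, but its overall shape — one binary keep/prune variable per neuron, an objective that minimizes the number of retained neurons, and constraints that preserve enough importance to maintain accuracy — can be illustrated with a deliberately simplified toy model in PuLP. The per-layer importance-budget constraint below is an assumption for illustration, not the paper’s formulation:

```python
import pulp

def select_critical_neurons(importance, keep_fraction=0.9):
    """importance: {layer_name: [score per neuron]}. Returns the neurons to keep.
    Toy MIP: minimize the number of kept neurons subject to each layer retaining
    a fraction of its total importance score."""
    prob = pulp.LpProblem("neuron_selection", pulp.LpMinimize)
    keep = {(l, i): pulp.LpVariable(f"keep_{l}_{i}", cat="Binary")
            for l, scores in importance.items() for i in range(len(scores))}
    # objective: as few neurons as possible
    prob += pulp.lpSum(keep.values())
    # per-layer constraint: retained importance >= keep_fraction of the total
    for l, scores in importance.items():
        prob += (pulp.lpSum(scores[i] * keep[(l, i)] for i in range(len(scores)))
                 >= keep_fraction * sum(scores))
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {l: [i for i in range(len(s)) if keep[(l, i)].value() > 0.5]
            for l, s in importance.items()}
```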

NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search

Title NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search
Authors Arber Zela, Julien Siems, Frank Hutter
Abstract One-shot neural architecture search (NAS) has played a crucial role in making NAS methods computationally feasible in practice. Nevertheless, there is still a lack of understanding of how these weight-sharing algorithms exactly work, due to the many factors controlling the dynamics of the process. In order to allow a scientific study of these components, we introduce a general framework for one-shot NAS that can be instantiated to many recently introduced variants, and introduce a general benchmarking framework that draws on the recent large-scale tabular benchmark NAS-Bench-101 for cheap anytime evaluations of one-shot NAS methods. To showcase the framework, we compare several state-of-the-art one-shot NAS methods, examine how sensitive they are to their hyperparameters and how they can be improved by tuning their hyperparameters, and compare their performance to that of blackbox optimizers for NAS-Bench-101.
Tasks Neural Architecture Search
Published 2020-01-28
URL https://arxiv.org/abs/2001.10422v1
PDF https://arxiv.org/pdf/2001.10422v1.pdf
PWC https://paperswithcode.com/paper/nas-bench-1shot1-benchmarking-and-dissecting-1
Repo https://github.com/automl/nasbench-1shot1
Framework pytorch

DeepLENS: Deep Learning for Entity Summarization

Title DeepLENS: Deep Learning for Entity Summarization
Authors Qingxia Liu, Gong Cheng, Yuzhong Qu
Abstract Entity summarization has been a prominent task over knowledge graphs. While existing methods are mainly unsupervised, we present DeepLENS, a simple yet effective deep learning model where we exploit textual semantics for encoding triples and we score each candidate triple based on its interdependence on other triples. DeepLENS significantly outperformed existing methods on a public benchmark.
Tasks Knowledge Graphs
Published 2020-03-08
URL https://arxiv.org/abs/2003.03736v1
PDF https://arxiv.org/pdf/2003.03736v1.pdf
PWC https://paperswithcode.com/paper/deeplens-deep-learning-for-entity
Repo https://github.com/nju-websoft/DeepLENS
Framework tf

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

Title Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Authors Lenaic Chizat, Francis Bach
Abstract Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. We show that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions. In the presence of hidden low-dimensional structures, the resulting margin is independent of the ambient dimension, which leads to strong generalization bounds. In contrast, training only the output layer implicitly solves a kernel support vector machine, which a priori does not enjoy such adaptivity. Our analysis of training is non-quantitative in terms of running time, but we prove computational guarantees in simplified settings by showing equivalences with online mirror descent. Finally, numerical experiments suggest that our analysis describes the practical behavior of two-layer neural networks with ReLU activation well and confirm the statistical benefits of this implicit bias.
Tasks
Published 2020-02-11
URL https://arxiv.org/abs/2002.04486v3
PDF https://arxiv.org/pdf/2002.04486v3.pdf
PWC https://paperswithcode.com/paper/implicit-bias-of-gradient-descent-for-wide
Repo https://github.com/lchizat/2020-implicit-bias-wide-2NN
Framework none
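
To fix notation for readers, the finite-width version of the setting analyzed here is a two-layer network with homogeneous activations trained on the logistic loss; this is the standard formulation, not copied from the paper:

```latex
% Two-layer network with m hidden units and a homogeneous activation \sigma (e.g. ReLU)
f(x; \theta) = \frac{1}{m} \sum_{j=1}^{m} a_j \, \sigma(w_j^\top x),
\qquad \theta = (a_j, w_j)_{j=1}^{m}

% Logistic (cross-entropy) loss on a dataset (x_i, y_i) with labels y_i \in \{-1, +1\}
L(\theta) = \frac{1}{n} \sum_{i=1}^{n} \log\bigl(1 + e^{-y_i f(x_i; \theta)}\bigr)
```

The paper studies the gradient flow on this loss as the width m tends to infinity and characterizes its limit as a max-margin classifier in a non-Hilbertian function space.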

Learning Dynamic Knowledge Graphs to Generalize on Text-Based Games

Title Learning Dynamic Knowledge Graphs to Generalize on Text-Based Games
Authors Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, William L. Hamilton
Abstract Playing text-based games requires skill in processing natural language and in planning. Although a key goal for agents solving this task is to generalize across multiple games, most previous work has either focused on solving a single game or has tackled generalization with rule-based heuristics. In this work, we investigate how structured information in the form of a knowledge graph (KG) can facilitate effective planning and generalization. We introduce a novel transformer-based sequence-to-sequence model that constructs a “belief” KG from raw text observations of the environment, dynamically updating this belief graph at every game step as it receives new observations. To train this model to build useful graph representations, we introduce and analyze a set of graph-related pre-training tasks. We demonstrate empirically that KG-based representations from our model help agents to converge faster to better policies for multiple text-based games, and further, enable stronger zero-shot performance on unseen games. Experiments on unseen games show that our best agent outperforms text-based baselines by 21.6%.
Tasks Knowledge Graphs
Published 2020-02-21
URL https://arxiv.org/abs/2002.09127v1
PDF https://arxiv.org/pdf/2002.09127v1.pdf
PWC https://paperswithcode.com/paper/learning-dynamic-knowledge-graphs-to
Repo https://github.com/xingdi-eric-yuan/GATA-public
Framework pytorch

HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery

Title HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery
Authors Michel Deudon, Alfredo Kalaitzis, Israel Goytom, Md Rifat Arefin, Zhichao Lin, Kris Sankaran, Vincent Michalski, Samira E. Kahou, Julien Cornebise, Yoshua Bengio
Abstract Generative deep learning has sparked a new wave of Super-Resolution (SR) algorithms that enhance single images with impressive aesthetic results, albeit with imaginary details. Multi-frame Super-Resolution (MFSR) offers a more grounded approach to this ill-posed problem, by conditioning on multiple low-resolution views. This is important for satellite monitoring of human impact on the planet – from deforestation to human rights violations – which depends on reliable imagery. To this end, we present HighRes-net, the first deep learning approach to MFSR that learns its sub-tasks in an end-to-end fashion: (i) co-registration, (ii) fusion, (iii) up-sampling, and (iv) registration-at-the-loss. Co-registration of low-resolution views is learned implicitly through a reference-frame channel, with no explicit registration mechanism. We learn a global fusion operator that is applied recursively on an arbitrary number of low-resolution pairs. We introduce a registered loss, by learning to align the SR output to the ground truth through ShiftNet. We show that by learning deep representations of multiple views, we can super-resolve low-resolution signals and enhance Earth Observation data at scale. Our approach recently topped the European Space Agency’s MFSR competition on real-world satellite imagery.
Tasks De-aliasing, Image Registration, Multi-Frame Super-Resolution, Super-Resolution
Published 2020-02-15
URL https://arxiv.org/abs/2002.06460v1
PDF https://arxiv.org/pdf/2002.06460v1.pdf
PWC https://paperswithcode.com/paper/highres-net-recursive-fusion-for-multi-frame
Repo https://github.com/ElementAI/HighRes-net
Framework pytorch
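
The recursive pairwise fusion described in the abstract — encode each low-resolution view jointly with a shared reference frame, then repeatedly fuse pairs until one hidden state remains — can be sketched in PyTorch as below. The layer sizes, the choice of the median frame as reference, and the padding of odd-sized sets are assumptions; registration (ShiftNet) and the final up-sampling decoder are omitted:

```python
import torch
import torch.nn as nn

class PairwiseFusion(nn.Module):
    """Fuses two encoded views into one; applied recursively to any number of views."""
    def __init__(self, ch=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.fuse = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())

    def forward(self, lr_views):
        # lr_views: (batch, n_views, H, W); the reference is the per-pixel median of the views
        ref = lr_views.median(dim=1, keepdim=True).values
        # encode each view jointly with the shared reference frame (implicit co-registration)
        states = [self.encode(torch.cat([lr_views[:, i:i + 1], ref], dim=1))
                  for i in range(lr_views.size(1))]
        # recursively fuse pairs until a single hidden state remains
        while len(states) > 1:
            if len(states) % 2 == 1:           # pad odd sets by duplicating the last state
                states.append(states[-1])
            states = [self.fuse(torch.cat([states[i], states[i + 1]], dim=1))
                      for i in range(0, len(states), 2)]
        return states[0]                       # to be decoded/up-sampled into the SR image
```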