Paper Group AWR 57
Fast Distance-based Anomaly Detection in Images Using an Inception-like Autoencoder. Learning Unbiased Representations via Mutual Information Backpropagation. A Physical Embedding Model for Knowledge Graphs. GridMask Data Augmentation. An Analysis on the Learning Rules of the Skip-Gram Model. Smart Induction for Isabelle/HOL (System Description). A …
Fast Distance-based Anomaly Detection in Images Using an Inception-like Autoencoder
Title | Fast Distance-based Anomaly Detection in Images Using an Inception-like Autoencoder |
Authors | Natasa Sarafijanovic-Djukic, Jesse Davis |
Abstract | The goal of anomaly detection is to identify examples that deviate from normal or expected behavior. We tackle this problem for images. We consider a two-phase approach. First, using normal examples, a convolutional autoencoder (CAE) is trained to extract a low-dimensional representation of the images. Here, we propose a novel architectural choice when designing the CAE, an Inception-like CAE. It combines convolutional filters of different kernel sizes and it uses a Global Average Pooling (GAP) operation to extract the representations from the CAE’s bottleneck layer. Second, we employ a distance-based anomaly detector in the low-dimensional space of the learned representation for the images. However, instead of computing the exact distance, we compute an approximate distance using product quantization. This alleviates the high memory and prediction-time costs of distance-based anomaly detectors. We compare our proposed approach to a number of baselines and state-of-the-art methods on four image datasets, and we find that our approach yields improved predictive performance. |
Tasks | Anomaly Detection, Quantization |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.08731v1 |
https://arxiv.org/pdf/2003.08731v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-distance-based-anomaly-detection-in |
Repo | https://github.com/natasasdj/anomalyDetection |
Framework | none |
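To make the product-quantization step above concrete, here is a minimal NumPy/scikit-learn sketch of approximate distance computation. It is illustrative only, not the paper's exact pipeline; the sub-vector count and codebook size are arbitrary choices.

```python
# Minimal product-quantization sketch; illustrative, not the paper's pipeline.
import numpy as np
from sklearn.cluster import KMeans

def train_pq(X, n_subvectors=4, n_centroids=16):
    """Split each vector into sub-vectors and learn a codebook per sub-space."""
    d = X.shape[1] // n_subvectors
    return [KMeans(n_clusters=n_centroids, n_init=4).fit(X[:, m*d:(m+1)*d]).cluster_centers_
            for m in range(n_subvectors)]

def encode(X, codebooks):
    """Replace every sub-vector by the index of its nearest centroid."""
    d = codebooks[0].shape[1]
    codes = np.empty((X.shape[0], len(codebooks)), dtype=np.int32)
    for m, cb in enumerate(codebooks):
        dists = ((X[:, m*d:(m+1)*d, None] - cb.T[None]) ** 2).sum(axis=1)
        codes[:, m] = dists.argmin(axis=1)
    return codes

def approx_distances(q, codes, codebooks):
    """Asymmetric distance: exact query sub-vectors vs. quantized database."""
    d = codebooks[0].shape[1]
    # Precompute squared distances from each query sub-vector to every centroid.
    tables = [((q[m*d:(m+1)*d] - cb) ** 2).sum(axis=1) for m, cb in enumerate(codebooks)]
    return sum(tables[m][codes[:, m]] for m in range(len(codebooks)))
```

An anomaly score for a query `q` is then, for example, its smallest approximate distance to the stored normal examples: `approx_distances(q, codes, codebooks).min()`.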
Learning Unbiased Representations via Mutual Information Backpropagation
Title | Learning Unbiased Representations via Mutual Information Backpropagation |
Authors | Ruggero Ragonesi, Riccardo Volpi, Jacopo Cavazza, Vittorio Murino |
Abstract | We are interested in learning data-driven representations that can generalize well, even when trained on inherently biased data. In particular, we face the case where some attributes (bias) of the data, if learned by the model, can severely compromise its generalization properties. We tackle this problem through the lens of information theory, leveraging recent findings for a differentiable estimation of mutual information. We propose a novel end-to-end optimization strategy, which simultaneously estimates and minimizes the mutual information between the learned representation and the data attributes. When applied to standard benchmarks, our model shows comparable or superior classification performance with respect to state-of-the-art approaches. Moreover, our method is general enough to be applicable to the problem of “algorithmic fairness”, with competitive results. |
Tasks | |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06430v1 |
https://arxiv.org/pdf/2003.06430v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-unbiased-representations-via-mutual |
Repo | https://github.com/rugrag/learn-unbiased |
Framework | tf |
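A hedged sketch of the general recipe the abstract describes, using a MINE-style Donsker-Varadhan bound as the differentiable MI estimator; network sizes, names, and the alternating schedule below are illustrative assumptions, not the paper's exact training procedure.

```python
# Adversarial MI minimization sketch (MINE-style); illustrative only.
import math
import torch
import torch.nn as nn

class StatNet(nn.Module):
    """Statistics network T(z, b) for the Donsker-Varadhan bound on I(Z; B)."""
    def __init__(self, z_dim, b_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(z_dim + b_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, z, b):
        return self.net(torch.cat([z, b], dim=1)).squeeze(-1)

def mi_lower_bound(T, z, b):
    """MINE-style minibatch estimate of I(Z; B) between codes z and bias b."""
    joint = T(z, b).mean()                    # expectation under the joint
    b_perm = b[torch.randperm(b.size(0))]     # shuffling emulates the marginals
    marginal = torch.logsumexp(T(z, b_perm), dim=0) - math.log(b.size(0))
    return joint - marginal

# Alternating updates: the estimator's optimizer ascends this bound, while the
# encoder's optimizer descends task_loss + lam * mi_lower_bound(T, z, b),
# driving I(Z; B) toward zero so the representation sheds the bias attribute.
```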
A Physical Embedding Model for Knowledge Graphs
Title | A Physical Embedding Model for Knowledge Graphs |
Authors | Caglar Demir, Axel-Cyrille Ngonga Ngomo |
Abstract | Knowledge graph embedding methods learn continuous vector representations for entities in knowledge graphs and have been used successfully in a large number of applications. We present a novel and scalable paradigm for the computation of knowledge graph embeddings, which we dub PYKE. Our approach combines a physical model based on Hooke’s law and its inverse with ideas from simulated annealing to compute embeddings for knowledge graphs efficiently. We prove that PYKE achieves a linear space complexity. While the time complexity for the initialization of our approach is quadratic, the time complexity of each of its iterations is linear in the size of the input knowledge graph. Hence, PYKE’s overall runtime is close to linear. Consequently, our approach easily scales up to knowledge graphs containing millions of triples. We evaluate our approach against six state-of-the-art embedding approaches on the DrugBank and DBpedia datasets in two series of experiments. The first series shows that the cluster purity achieved by PYKE is up to 26% (absolute) better than that of the state of the art. In addition, PYKE is more than 22 times faster than existing embedding solutions in the best case. The results of our second series of experiments show that PYKE is up to 23% (absolute) better than the state of the art on the task of type prediction while maintaining its superior scalability. Our implementation and results are open-source and are available at http://github.com/dice-group/PYKE. |
Tasks | Graph Embedding, Knowledge Graph Embedding, Knowledge Graph Embeddings, Knowledge Graphs |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07418v1 |
https://arxiv.org/pdf/2001.07418v1.pdf | |
PWC | https://paperswithcode.com/paper/a-physical-embedding-model-for-knowledge |
Repo | https://github.com/dice-group/PYKE |
Framework | none |
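A toy illustration of the physical intuition (Hooke-style attraction between connected entities, repulsion between a sample of unconnected ones, with an annealed step size). This is not PYKE's actual update rule or data structures; all parameters here are made up.

```python
# Toy spring-embedding sketch of the Hooke's-law idea; not PYKE itself.
import numpy as np

def toy_embed(n_nodes, edges, dim=16, iters=100, omega=0.5, decay=0.95, seed=0):
    rng = np.random.default_rng(seed)
    E = rng.normal(size=(n_nodes, dim))
    neighbors = {i: set() for i in range(n_nodes)}
    for u, v in edges:
        neighbors[u].add(v); neighbors[v].add(u)
    step = omega
    for _ in range(iters):
        for u in range(n_nodes):
            for v in neighbors[u]:                   # springs pull connected nodes
                E[u] += step * (E[v] - E[u])
            for v in rng.choice(n_nodes, size=3):    # sampled repulsion elsewhere
                if v != u and v not in neighbors[u]:
                    E[u] -= step * (E[v] - E[u])
        step *= decay                                # annealing-style cooling
    return E
```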
GridMask Data Augmentation
Title | GridMask Data Augmentation |
Authors | Pengguang Chen, Shu Liu, Hengshuang Zhao, Jiaya Jia |
Abstract | We propose a novel data augmentation method, “GridMask”, in this paper. It utilizes information removal to achieve state-of-the-art results in a variety of computer vision tasks. We analyze the requirements of information dropping, then show the limitations of existing information-dropping algorithms and propose our structured method, which is simple yet very effective. It is based on the deletion of regions of the input image. Our extensive experiments show that our method outperforms the latest AutoAugment, which is far more computationally expensive due to its use of reinforcement learning to find the best policies. On ImageNet classification, COCO2017 object detection, and Cityscapes semantic segmentation, our method notably improves performance over the baselines. The extensive experiments manifest the effectiveness and generality of the new method. |
Tasks | Data Augmentation, Object Detection, Semantic Segmentation |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04086v2 |
https://arxiv.org/pdf/2001.04086v2.pdf | |
PWC | https://paperswithcode.com/paper/gridmask-data-augmentation |
Repo | https://github.com/akuxcw/GridMask |
Framework | pytorch |
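A hedged re-implementation of the core mask generation: a regular grid of square dropped regions with a random grid offset. The parameter conventions below (unit size `d`, drop `ratio`) are illustrative and may differ from the official repo.

```python
# GridMask-style mask generation sketch; conventions may differ from the repo.
import numpy as np

def gridmask(h, w, d=32, ratio=0.5, rng=None):
    rng = rng or np.random.default_rng()
    mask = np.ones((h, w), dtype=np.float32)
    l = int(d * ratio)                          # side length of each dropped square
    off_y, off_x = rng.integers(0, d, size=2)   # random phase of the grid
    for y in range(off_y - d, h, d):
        for x in range(off_x - d, w, d):
            mask[max(y, 0):max(y + l, 0), max(x, 0):max(x + l, 0)] = 0.0
    return mask

# Usage: multiply the image by the mask, e.g.
# img_aug = img * gridmask(*img.shape[:2])[..., None]
```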
An Analysis on the Learning Rules of the Skip-Gram Model
Title | An Analysis on the Learning Rules of the Skip-Gram Model |
Authors | Canlin Zhang, Xiuwen Liu, Daniel Bis |
Abstract | To improve the generalization of the representations for natural language processing tasks, words are commonly represented using vectors, where distances among the vectors are related to the similarity of the words. While word2vec, the state-of-the-art implementation of the skip-gram model, is widely used and improves the performance of many natural language processing tasks, its mechanism is not yet well understood. In this work, we derive the learning rules for the skip-gram model and establish their close relationship to competitive learning. In addition, we provide the global optimal solution constraints for the skip-gram model and validate them by experimental results. |
Tasks | |
Published | 2020-03-18 |
URL | https://arxiv.org/abs/2003.08489v1 |
https://arxiv.org/pdf/2003.08489v1.pdf | |
PWC | https://paperswithcode.com/paper/an-analysis-on-the-learning-rules-of-the-skip |
Repo | https://github.com/canlinzhang/IJCNN-2019-paper |
Framework | none |
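To make the competitive-learning reading concrete, here is a standard skip-gram-with-negative-sampling (SGNS) update in NumPy. This is the textbook word2vec step, shown for context rather than the paper's derivation itself.

```python
# One SGNS gradient step for a (center, context) pair plus sampled negatives.
import numpy as np

def sgns_step(W_in, W_out, center, context, neg_ids, lr=0.025):
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    v = W_in[center]
    grad_v = np.zeros_like(v)
    for idx, label in [(context, 1.0)] + [(n, 0.0) for n in neg_ids]:
        u = W_out[idx]
        g = sigmoid(v @ u) - label      # prediction error for this word
        grad_v += g * u
        W_out[idx] -= lr * g * v        # output vectors compete for the context
    W_in[center] -= lr * grad_v
```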
Smart Induction for Isabelle/HOL (System Description)
Title | Smart Induction for Isabelle/HOL (System Description) |
Authors | Yutaka Nagashima |
Abstract | Proof assistants offer tactics to facilitate inductive proofs. However, it still requires human ingenuity to decide what arguments to pass to those induction tactics. To automate this process, we present smart_induct for Isabelle/HOL. Given an inductive problem in any problem domain, smart_induct lists promising arguments for the induct tactic without relying on a search. Our evaluation demonstrates that smart_induct produces valuable recommendations across problem domains. |
Tasks | |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.10834v1 |
https://arxiv.org/pdf/2001.10834v1.pdf | |
PWC | https://paperswithcode.com/paper/smart-induction-for-isabellehol-system |
Repo | https://github.com/data61/PSL |
Framework | none |
Additive Tree Ensembles: Reasoning About Potential Instances
Title | Additive Tree Ensembles: Reasoning About Potential Instances |
Authors | Laurens Devos, Wannes Meert, Jesse Davis |
Abstract | Imagine being able to ask a black-box model questions such as “Which adversarial examples exist?”, “Does a specific attribute have a disproportionate effect on the model’s prediction?”, or “What kind of predictions are possible for a partially described example?” This last question is particularly important if your partial description does not correspond to any observed example in your data, as it provides insight into how the model will extrapolate to unseen data. These capabilities would be extremely helpful, as they would allow a user to better understand the model’s behavior, particularly as it relates to issues such as robustness, fairness, and bias. In this paper, we propose such an approach for an ensemble of trees. Since, in general, this task is intractable, we present a strategy that (1) prunes part of the input space, given the question asked, to simplify the problem; and (2) follows a divide-and-conquer approach that is incremental, can always return some answers, and indicates which parts of the input domain are still uncertain. The usefulness of our approach is shown on a diverse set of use cases. |
Tasks | |
Published | 2020-01-31 |
URL | https://arxiv.org/abs/2001.11905v1 |
https://arxiv.org/pdf/2001.11905v1.pdf | |
PWC | https://paperswithcode.com/paper/additive-tree-ensembles-reasoning-about |
Repo | https://github.com/laudv/treeck |
Framework | none |
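A toy version of reasoning about potential instances: given a partial description as a box of feature intervals, compute the attainable output range of an additive tree ensemble. This is far simpler than the paper's pruning plus divide-and-conquer search; the tree encoding below is an assumption for illustration.

```python
# Output bounds of an additive tree ensemble over a box of feature intervals.
def tree_bounds(node, box):
    """node: ('leaf', value) or ('split', feat, threshold, left, right).
    box: dict feat -> (lo, hi). Returns (min, max) leaf value over the box."""
    if node[0] == 'leaf':
        return node[1], node[1]
    _, f, t, left, right = node
    lo, hi = box.get(f, (float('-inf'), float('inf')))
    bounds = []
    if lo < t:    # left branch (x[f] < t) is reachable within the box
        bounds.append(tree_bounds(left, {**box, f: (lo, min(hi, t))}))
    if hi >= t:   # right branch is reachable within the box
        bounds.append(tree_bounds(right, {**box, f: (max(lo, t), hi)}))
    return min(b[0] for b in bounds), max(b[1] for b in bounds)

def ensemble_bounds(trees, box):  # additive ensemble: sum the per-tree ranges
    per_tree = [tree_bounds(t, box) for t in trees]
    return sum(lo for lo, _ in per_tree), sum(hi for _, hi in per_tree)
```

Summing per-tree ranges is sound but loose, since the trees share features and cannot realize all extremes simultaneously; tightening such bounds is exactly where a cleverer search like the paper's pays off.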
Novelty-Prepared Few-Shot Classification
Title | Novelty-Prepared Few-Shot Classification |
Authors | Chao Wang, Ruo-Ze Liu, Han-Jia Ye, Yang Yu |
Abstract | Few-shot classification algorithms can alleviate the data scarcity issue, which is vital in many real-world problems, by adopting models pre-trained on abundant data from other domains. However, the pre-training process is commonly unaware of the future adaptation to other concept classes. We show that a classically fully trained feature extractor can leave little embedding space for unseen classes, which keeps the model from fitting the new classes well. In this work, we propose a novelty-prepared loss function, called self-compacting softmax loss (SSL), for few-shot classification. SSL prevents the full occupancy of the embedding space, so the model is better prepared to learn new classes. In experiments on the CUB-200-2011 and mini-ImageNet datasets, we show that SSL leads to a significant improvement over state-of-the-art performance. This work may shed some light on considering model capacity for few-shot classification tasks. |
Tasks | |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00497v1 |
https://arxiv.org/pdf/2003.00497v1.pdf | |
PWC | https://paperswithcode.com/paper/novelty-prepared-few-shot-classification |
Repo | https://github.com/polixir/algo-SSL |
Framework | pytorch |
Nearest Neighbor Dirichlet Process
Title | Nearest Neighbor Dirichlet Process |
Authors | Shounak Chattopadhyay, Antik Chakraborty, David B. Dunson |
Abstract | There is a rich literature on Bayesian nonparametric methods for unknown densities. The most popular approach relies on Dirichlet process mixture models. These models characterize the unknown density as a kernel convolution with an unknown almost surely discrete mixing measure, which is given a Dirichlet process prior. Such models are very flexible and have good performance in many settings, but posterior computation relies on Markov chain Monte Carlo algorithms that can be complex and inefficient. As a simple and general alternative, we propose a class of nearest-neighbor Dirichlet processes. The approach starts by grouping the data into neighborhoods based on standard algorithms. Within each neighborhood, the density is characterized via a Bayesian parametric model, such as a Gaussian with unknown parameters. Assigning a Dirichlet prior to the weights on these local kernels, we obtain a simple pseudo-posterior for the weights and kernel parameters. A simple and embarrassingly parallel Monte Carlo algorithm is proposed to sample from the resulting pseudo-posterior for the unknown density. Desirable asymptotic properties are shown, and the methods are evaluated in simulation studies and applied to a motivating dataset in the context of classification. |
Tasks | |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07953v1 |
https://arxiv.org/pdf/2003.07953v1.pdf | |
PWC | https://paperswithcode.com/paper/nearest-neighbor-dirichlet-process |
Repo | https://github.com/shounakchattopadhyay/NN-DP |
Framework | none |
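A hedged one-dimensional sketch of the idea: fit a Gaussian in each k-nearest-neighbor neighborhood, place Dirichlet weights on the local kernels, and draw densities from the pseudo-posterior. It is simplified relative to the paper (kernel parameters are fixed at neighborhood estimates rather than sampled).

```python
# Simplified 1-D nearest-neighbor Dirichlet process density draws.
import numpy as np
from scipy.stats import norm

def nndp_density_draws(x, grid, k=10, n_draws=50, alpha=1.0, rng=None):
    rng = rng or np.random.default_rng()
    n = len(x)
    mus, sds = [], []
    for i in range(n):
        nbrs = x[np.argsort(np.abs(x - x[i]))[:k]]   # k-NN neighborhood of x_i
        mus.append(nbrs.mean()); sds.append(nbrs.std() + 1e-6)
    kernels = np.array([norm.pdf(grid, m, s) for m, s in zip(mus, sds)])
    # Each draw: Dirichlet weights on the local kernels -> one density on grid.
    return np.array([rng.dirichlet(np.full(n, alpha)) @ kernels
                     for _ in range(n_draws)])
```

The draws are trivially parallel (each is an independent Dirichlet sample), which is the "embarrassingly parallel" property the abstract highlights.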
Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming
Title | Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming |
Authors | Mostafa ElAraby, Guy Wolf, Margarida Carvalho |
Abstract | We introduce a novel approach to optimize the architecture of deep neural networks by identifying critical neurons and removing non-critical ones. The proposed approach utilizes a mixed integer programming (MIP) formulation of neural models which includes a continuous importance score computed for each neuron in the network. The optimization in the MIP solver minimizes the number of critical neurons (i.e., those with a high importance score) that need to be kept for maintaining the overall accuracy of the model. Further, the proposed formulation generalizes the recently considered lottery ticket optimization by identifying multiple “lucky” sub-networks, resulting in optimized architectures that not only perform well on a single dataset, but also generalize across multiple ones upon retraining of the network weights. Finally, the proposed framework provides a significant improvement in the scalability of automatic sparsification of deep network architectures compared to previous attempts. We validate the performance and generalizability of our approach on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, using three different neural networks: LeNet-5 and two ReLU fully connected models. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07259v1 |
https://arxiv.org/pdf/2002.07259v1.pdf | |
PWC | https://paperswithcode.com/paper/identifying-critical-neurons-in-ann |
Repo | https://github.com/chair-dsgt/mip-for-ann |
Framework | pytorch |
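A drastically simplified, knapsack-style toy of the pruning decision, written with PuLP: remove as many neurons as possible while retaining a fraction of the total importance mass. The paper's actual MIP encodes the ReLU network itself with per-neuron importance scores; this proxy only conveys the flavor of selecting neurons with a solver.

```python
# Toy MIP proxy for importance-based pruning (requires the pulp package).
import pulp

def prune(scores, keep_fraction=0.9):
    """scores: per-neuron importance values. Returns indices of neurons to drop."""
    n = len(scores)
    prob = pulp.LpProblem("prune", pulp.LpMaximize)
    drop = [pulp.LpVariable(f"drop_{i}", cat="Binary") for i in range(n)]
    prob += pulp.lpSum(drop)  # objective: remove as many neurons as possible
    # Constraint: kept neurons must retain keep_fraction of total importance.
    prob += pulp.lpSum(s * (1 - d) for s, d in zip(scores, drop)) \
            >= keep_fraction * sum(scores)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [i for i, d in enumerate(drop) if pulp.value(d) > 0.5]
```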
NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search
Title | NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search |
Authors | Arber Zela, Julien Siems, Frank Hutter |
Abstract | One-shot neural architecture search (NAS) has played a crucial role in making NAS methods computationally feasible in practice. Nevertheless, there is still a lack of understanding of how these weight-sharing algorithms actually work, due to the many factors controlling the dynamics of the process. In order to allow a scientific study of these components, we introduce a general framework for one-shot NAS that can be instantiated to many recently introduced variants, and we introduce a general benchmarking framework that draws on the recent large-scale tabular benchmark NAS-Bench-101 for cheap anytime evaluations of one-shot NAS methods. To showcase the framework, we compare several state-of-the-art one-shot NAS methods, examine how sensitive they are to their hyperparameters and how they can be improved by tuning their hyperparameters, and compare their performance to that of blackbox optimizers for NAS-Bench-101. |
Tasks | Neural Architecture Search |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10422v1 |
https://arxiv.org/pdf/2001.10422v1.pdf | |
PWC | https://paperswithcode.com/paper/nas-bench-1shot1-benchmarking-and-dissecting-1 |
Repo | https://github.com/automl/nasbench-1shot1 |
Framework | pytorch |
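The cheap anytime evaluations rest on querying the NAS-Bench-101 table. The snippet below is modeled on the NAS-Bench-101 README (github.com/google-research/nasbench); the file path and the specific cell are placeholders, so treat it as indicative rather than verified against the NAS-Bench-1Shot1 code.

```python
# Indicative NAS-Bench-101 query; path and cell spec are placeholders.
from nasbench import api

nasbench = api.NASBench('nasbench_only108.tfrecord')  # downloaded record file
spec = api.ModelSpec(
    # Adjacency matrix of the cell DAG (node 0 = input, node 6 = output).
    matrix=[[0, 1, 1, 0, 0, 0, 0],
            [0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 1, 0, 0],
            [0, 0, 0, 0, 0, 1, 0],
            [0, 0, 0, 0, 0, 1, 0],
            [0, 0, 0, 0, 0, 0, 1],
            [0, 0, 0, 0, 0, 0, 0]],
    ops=['input', 'conv3x3-bn-relu', 'conv1x1-bn-relu', 'maxpool3x3',
         'conv3x3-bn-relu', 'conv3x3-bn-relu', 'output'])
data = nasbench.query(spec)  # tabulated accuracies and training time
```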
DeepLENS: Deep Learning for Entity Summarization
Title | DeepLENS: Deep Learning for Entity Summarization |
Authors | Qingxia Liu, Gong Cheng, Yuzhong Qu |
Abstract | Entity summarization has been a prominent task over knowledge graphs. While existing methods are mainly unsupervised, we present DeepLENS, a simple yet effective deep learning model where we exploit textual semantics for encoding triples and we score each candidate triple based on its interdependence with other triples. DeepLENS significantly outperformed existing methods on a public benchmark. |
Tasks | Knowledge Graphs |
Published | 2020-03-08 |
URL | https://arxiv.org/abs/2003.03736v1 |
https://arxiv.org/pdf/2003.03736v1.pdf | |
PWC | https://paperswithcode.com/paper/deeplens-deep-learning-for-entity |
Repo | https://github.com/nju-websoft/DeepLENS |
Framework | tf |
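A hedged sketch of scoring a candidate triple by its textual interdependence with the entity's other triples. Here `embed` is a hypothetical sentence encoder (any model mapping text to a vector), and the similarity-based aggregation is illustrative rather than DeepLENS's actual architecture.

```python
# Interdependence-style triple scoring sketch; `embed` is a stand-in encoder.
import numpy as np

def score_triples(triples, embed):
    """triples: list of (subject, predicate, object) strings."""
    vecs = np.array([embed(f"{s} {p} {o}") for s, p, o in triples])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T                 # pairwise textual similarity
    np.fill_diagonal(sims, 0.0)
    return sims.mean(axis=1)             # triples central to the rest score high

# top-k summary: [triples[i] for i in np.argsort(-score_triples(triples, embed))[:k]]
```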
Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Title | Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss |
Authors | Lenaic Chizat, Francis Bach |
Abstract | Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. We show that the limits of the gradient flow on exponentially tailed losses can be fully characterized as a max-margin classifier in a certain non-Hilbertian space of functions. In the presence of hidden low-dimensional structures, the resulting margin is independent of the ambient dimension, which leads to strong generalization bounds. In contrast, training only the output layer implicitly solves a kernel support vector machine, which a priori does not enjoy such adaptivity. Our analysis of training is non-quantitative in terms of running time, but we prove computational guarantees in simplified settings by showing equivalences with online mirror descent. Finally, numerical experiments suggest that our analysis accurately describes the practical behavior of two-layer neural networks with ReLU activation and confirm the statistical benefits of this implicit bias. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04486v3 |
https://arxiv.org/pdf/2002.04486v3.pdf | |
PWC | https://paperswithcode.com/paper/implicit-bias-of-gradient-descent-for-wide |
Repo | https://github.com/lchizat/2020-implicit-bias-wide-2NN |
Framework | none |
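The central characterization can be restated, hedged from the setting the abstract describes, as a max-margin problem over the unit ball of a non-Hilbertian variation-norm space (often written $\mathcal{F}_1$ in this literature); consult the paper for the precise norm and assumptions.

```latex
% Hedged restatement of the gradient-flow limit (up to normalization):
% a max-margin classifier over the variation-norm ball, not a kernel RKHS ball.
f^\star \in \arg\max_{\|f\|_{\mathcal{F}_1} \le 1} \;\; \min_{1 \le i \le n} \; y_i \, f(x_i)
```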
Learning Dynamic Knowledge Graphs to Generalize on Text-Based Games
Title | Learning Dynamic Knowledge Graphs to Generalize on Text-Based Games |
Authors | Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, William L. Hamilton |
Abstract | Playing text-based games requires skill in processing natural language and in planning. Although a key goal for agents solving this task is to generalize across multiple games, most previous work has either focused on solving a single game or has tackled generalization with rule-based heuristics. In this work, we investigate how structured information in the form of a knowledge graph (KG) can facilitate effective planning and generalization. We introduce a novel transformer-based sequence-to-sequence model that constructs a “belief” KG from raw text observations of the environment, dynamically updating this belief graph at every game step as it receives new observations. To train this model to build useful graph representations, we introduce and analyze a set of graph-related pre-training tasks. We demonstrate empirically that KG-based representations from our model help agents to converge faster to better policies for multiple text-based games, and further, enable stronger zero-shot performance on unseen games. Experiments on unseen games show that our best agent outperforms text-based baselines by 21.6%. |
Tasks | Knowledge Graphs |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09127v1 |
https://arxiv.org/pdf/2002.09127v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-dynamic-knowledge-graphs-to |
Repo | https://github.com/xingdi-eric-yuan/GATA-public |
Framework | pytorch |
HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery
Title | HighRes-net: Recursive Fusion for Multi-Frame Super-Resolution of Satellite Imagery |
Authors | Michel Deudon, Alfredo Kalaitzis, Israel Goytom, Md Rifat Arefin, Zhichao Lin, Kris Sankaran, Vincent Michalski, Samira E. Kahou, Julien Cornebise, Yoshua Bengio |
Abstract | Generative deep learning has sparked a new wave of Super-Resolution (SR) algorithms that enhance single images with impressive aesthetic results, albeit with imaginary details. Multi-frame Super-Resolution (MFSR) offers a more grounded approach to the ill-posed problem, by conditioning on multiple low-resolution views. This is important for satellite monitoring of human impact on the planet, from deforestation to human rights violations, which depends on reliable imagery. To this end, we present HighRes-net, the first deep learning approach to MFSR that learns its sub-tasks in an end-to-end fashion: (i) co-registration, (ii) fusion, (iii) up-sampling, and (iv) registration-at-the-loss. Co-registration of low-resolution views is learned implicitly through a reference-frame channel, with no explicit registration mechanism. We learn a global fusion operator that is applied recursively on an arbitrary number of low-resolution pairs. We introduce a registered loss, by learning to align the SR output to a ground-truth through ShiftNet. We show that by learning deep representations of multiple views, we can super-resolve low-resolution signals and enhance Earth Observation data at scale. Our approach recently topped the European Space Agency’s MFSR competition on real-world satellite imagery. |
Tasks | De-aliasing, Image Registration, Multi-Frame Super-Resolution, Super-Resolution |
Published | 2020-02-15 |
URL | https://arxiv.org/abs/2002.06460v1 |
https://arxiv.org/pdf/2002.06460v1.pdf | |
PWC | https://paperswithcode.com/paper/highres-net-recursive-fusion-for-multi-frame |
Repo | https://github.com/ElementAI/HighRes-net |
Framework | pytorch |
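A hedged PyTorch sketch of the recursive pairwise fusion idea: a shared fusion block is applied to pairs of encoded low-resolution views until one representation remains. Layer sizes are illustrative, and the full model's co-registration, ShiftNet registered loss, and up-sampling stages are omitted.

```python
# Recursive pairwise fusion sketch in the spirit of HighRes-net; illustrative.
import torch
import torch.nn as nn

class RecursiveFusion(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.fuse = nn.Sequential(            # one shared operator for every pair
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.PReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.PReLU())

    def forward(self, views):                 # views: (batch, n_views, ch, h, w)
        while views.size(1) > 1:
            if views.size(1) % 2:             # odd count: duplicate the last view
                views = torch.cat([views, views[:, -1:]], dim=1)
            a, b = views[:, 0::2], views[:, 1::2]
            pair = torch.cat([a, b], dim=2)   # stack each pair channel-wise
            bsz, k, c2, h, w = pair.shape
            fused = self.fuse(pair.reshape(bsz * k, c2, h, w))
            views = fused.reshape(bsz, k, -1, h, w)
        return views[:, 0]                    # single fused representation
```

Because the same block is reused at every level, the operator handles an arbitrary number of input views, matching the "applied recursively on an arbitrary number of low-resolution pairs" claim in the abstract.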