Paper Group NANR 135
Learning Compact Reward for Image Captioning
Title | Learning Compact Reward for Image Captioning |
Authors | Anonymous |
Abstract | Adversarial learning has shown its advances in generating natural and diverse descriptions in image captioning. However, the learned reward of existing adversarial methods is vague and ill-defined due to the reward ambiguity problem. In this paper, we propose a refined Adversarial Inverse Reinforcement Learning (rAIRL) method to handle the reward ambiguity problem by disentangling reward for each word in a sentence, as well as achieve stable adversarial training by refining the loss function to shift the stationary point towards Nash equilibrium. In addition, we introduce a conditional term in the loss function to mitigate mode collapse and to increase the diversity of the generated descriptions. Our experiments on MS COCO show that our method can learn compact reward for image captioning. |
Tasks | Image Captioning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Syl5o2EFPB |
PDF | https://openreview.net/pdf?id=Syl5o2EFPB |
PWC | https://paperswithcode.com/paper/learning-compact-reward-for-image-captioning |
Repo | |
Framework | |
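The per-word reward the abstract describes can be made concrete with a small sketch. In AIRL-style training, the reward recovered from a discriminator D is log D − log(1 − D); a hypothetical per-token version of this (an illustration, not the paper's exact rAIRL formulation) scores each word of a caption separately:

```python
# Hypothetical per-word AIRL-style reward; `d_probs` is assumed to hold a
# discriminator's per-token probabilities for a batch of captions. This is an
# illustrative sketch, not the rAIRL release.
import torch

def per_word_reward(d_probs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """r_t = log D(w_t) - log(1 - D(w_t)), one reward per word."""
    return torch.log(d_probs + eps) - torch.log(1 - d_probs + eps)
```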
Neural Network Out-of-Distribution Detection for Regression Tasks
Title | Neural Network Out-of-Distribution Detection for Regression Tasks |
Authors | Anonymous |
Abstract | Neural network out-of-distribution (OOD) detection aims to identify when a model is unable to generalize to new inputs, either due to covariate shift or anomalous data. Most existing OOD methods only apply to classification tasks, as they assume a discrete set of possible predictions. In this paper, we propose a method for neural network OOD detection that can be applied to regression problems. We demonstrate that the hidden features for in-distribution data can be described by a highly concentrated, low dimensional distribution. Therefore, we can model these in-distribution features with an extremely simple generative model, such as a Gaussian mixture model (GMM) with 4 or fewer components. We demonstrate on several real-world benchmark data sets that GMM-based feature detection achieves state-of-the-art OOD detection results on several regression tasks. Moreover, this approach is simple to implement and computationally efficient. |
Tasks | Out-of-Distribution Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxsUySFwr |
PDF | https://openreview.net/pdf?id=ryxsUySFwr |
PWC | https://paperswithcode.com/paper/neural-network-out-of-distribution-detection |
Repo | |
Framework | |
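The recipe in the abstract, modeling in-distribution hidden features with a small GMM and flagging low-likelihood inputs, is simple enough to sketch. Everything here (the `features` extractor interface, the component count) is an assumption, not the authors' code:

```python
# Minimal sketch of GMM-based feature OOD scoring, assuming `features(X)`
# returns the network's hidden activations as a NumPy array of shape (N, d).
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_feature_gmm(features, X_train, n_components=4):
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    gmm.fit(features(X_train))          # fit only on in-distribution data
    return gmm

def ood_score(gmm, features, X):
    # Low log-likelihood under the in-distribution GMM => likely OOD.
    return -gmm.score_samples(features(X))
```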
A Dynamic Approach to Accelerate Deep Learning Training
Title | A Dynamic Approach to Accelerate Deep Learning Training |
Authors | Anonymous |
Abstract | Mixed-precision arithmetic, combining both single- and half-precision operands in the same operation, has been successfully applied to train deep neural networks. Despite the advantages of mixed-precision arithmetic in terms of reducing the need for key resources like memory bandwidth or register file size, it has a limited capacity for diminishing computing costs and requires 32 bits to represent its output operands. This paper proposes two approaches that replace mixed-precision with half-precision arithmetic during a large portion of training. The first approach achieves accuracy slightly below the state-of-the-art while using half-precision arithmetic during more than 99% of training. The second approach reaches the same accuracy as the state-of-the-art by dynamically switching between half- and mixed-precision arithmetic during training; it uses half-precision during more than 94% of the training process. This paper is the first to demonstrate that half-precision can be used for a very large portion of DNN training while still reaching state-of-the-art accuracy. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJefGpEtDB |
PDF | https://openreview.net/pdf?id=SJefGpEtDB |
PWC | https://paperswithcode.com/paper/a-dynamic-approach-to-accelerate-deep |
Repo | |
Framework | |
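A hypothetical sketch of the dynamic switching idea in PyTorch follows. The trigger used here (a loss spike) stands in for the paper's actual switching policy, which is not reproduced; treat every detail as an assumption:

```python
# Sketch: train in pure half precision, fall back to mixed precision
# (autocast + fp32 master weights) if the loss diverges.
import torch

def train_dynamic_precision(model, loader, optimizer, loss_fn, spike=2.0):
    """Assumes a stateless optimizer (plain SGD) so dtype switches are safe."""
    scaler = torch.cuda.amp.GradScaler()   # used only in the mixed-precision phase
    mixed = False                          # start in pure half precision
    model.cuda().half()
    prev = None
    for x, y in loader:
        x, y = x.cuda(), y.cuda()
        optimizer.zero_grad()
        if mixed:
            with torch.cuda.amp.autocast():    # fp16 ops over fp32 master weights
                loss = loss_fn(model(x), y)
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
        else:
            loss = loss_fn(model(x.half()), y)  # everything in half precision
            loss.backward()
            optimizer.step()
        # Switch to mixed precision if the half-precision loss spikes.
        if not mixed and prev is not None and loss.item() > spike * prev:
            mixed = True
            model.float()
        prev = loss.item()
```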
Inducing Stronger Object Representations in Deep Visual Trackers
Title | Inducing Stronger Object Representations in Deep Visual Trackers |
Authors | Anonymous |
Abstract | Fully convolutional deep correlation networks are integral components of state-of-the-art approaches to single object visual tracking. It is commonly assumed that these networks perform tracking by detection, matching features of the object instance with features of the entire frame. Strong architectural priors and conditioning on the object representation are thought to encourage this tracking strategy. Despite these strong priors, we show that deep trackers often default to “tracking-by-saliency” detection, without relying on the object instance representation. Our analysis shows that, despite being a useful prior, saliency detection can prevent the emergence of more robust tracking strategies in deep networks. This leads us to introduce an auxiliary detection task that encourages more discriminative object representations and improves tracking performance. |
Tasks | Saliency Detection, Visual Tracking |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BygfiAEtwS |
PDF | https://openreview.net/pdf?id=BygfiAEtwS |
PWC | https://paperswithcode.com/paper/inducing-stronger-object-representations-in |
Repo | |
Framework | |
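A minimal sketch of the auxiliary-detection idea, assuming a shared backbone with separate tracking and detection heads; the module names and the weighting `lam` are illustrative, not the paper's architecture:

```python
import torch.nn as nn

class TrackerWithAuxDetection(nn.Module):
    def __init__(self, backbone, track_head, detect_head):
        super().__init__()
        self.backbone = backbone
        self.track_head = track_head
        self.detect_head = detect_head

    def forward(self, frame, template_feat):
        feat = self.backbone(frame)
        # The tracking head is conditioned on the object template; the
        # detection head sees only frame features, pushing them to stay
        # instance-discriminative rather than purely salient.
        return self.track_head(feat, template_feat), self.detect_head(feat)

def joint_loss(track_loss, detect_loss, lam=0.5):
    return track_loss + lam * detect_loss
```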
Improving Confident-Classifiers For Out-of-distribution Detection
Title | Improving Confident-Classifiers For Out-of-distribution Detection |
Authors | Anonymous |
Abstract | Discriminatively trained neural classifiers can be trusted only when the input data comes from the training distribution (in-distribution). Therefore, detecting out-of-distribution (OOD) samples is very important to avoid classification errors. In the context of OOD detection for image classification, one recent approach proposes training a classifier called a “confident-classifier” by minimizing the standard cross-entropy loss on in-distribution samples and minimizing the KL divergence between the predictive distribution of OOD samples in the low-density “boundary” of the in-distribution and the uniform distribution (maximizing the entropy of the outputs). Thus, samples can be detected as OOD if they have low confidence or high entropy. In this paper, we analyze this setting both theoretically and experimentally. We also propose a novel algorithm to generate the “boundary” OOD samples used to train a classifier with an explicit “reject” class for OOD samples. We compare our approach against several recent classifier-based OOD detectors, including confident-classifiers, on the MNIST and Fashion-MNIST datasets. Overall, the proposed approach consistently performs better than the others across most of the experiments. |
Tasks | Image Classification, Out-of-Distribution Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJeIGkBKPS |
PDF | https://openreview.net/pdf?id=rJeIGkBKPS |
PWC | https://paperswithcode.com/paper/improving-confident-classifiers-for-out-of |
Repo | |
Framework | |
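The confident-classifier objective described in the abstract combines cross-entropy on in-distribution data with a KL-to-uniform term on OOD samples. A hedged sketch, where the weight `beta` and the tensor shapes are assumptions:

```python
# KL(p || uniform) = log k - H(p); minimizing it drives OOD predictions
# toward the uniform distribution (maximum entropy).
import math
import torch.nn.functional as F

def confident_classifier_loss(logits_in, targets, logits_ood, beta=1.0):
    ce = F.cross_entropy(logits_in, targets)
    log_p = F.log_softmax(logits_ood, dim=1)
    k = logits_ood.size(1)
    kl_to_uniform = (log_p.exp() * (log_p + math.log(k))).sum(dim=1).mean()
    return ce + beta * kl_to_uniform
```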
Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells
Title | Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells |
Authors | Anonymous |
Abstract | Unsupervised text encoding models have recently fueled substantial progress in Natural Language Processing (NLP). The key idea is to use neural networks to convert words in texts to vector space representations (embeddings) based on word positions in a sentence and their contexts. We see a strikingly similar situation in spatial analysis, which focuses on incorporating both absolute positions and spatial contexts of geographic objects such as Points of Interest (POIs) into models. A general space encoding method is valuable for a multitude of tasks such as POI search, land use classification, point-based spatial interpolation, and location-aware image classification. However, no such general model exists to date beyond simply applying discretization or feed-forward nets to coordinates, and little effort has been put into jointly modeling distributions with vastly different characteristics, which commonly emerge from GIS data. Meanwhile, Nobel Prize-winning neuroscience research shows that grid cells in mammals provide a multi-scale periodic representation that functions as a metric for encoding space and is critical for recognizing places and for path integration. Inspired by this research, we propose a representation learning model called Space2vec to encode the absolute positions and spatial relationships of places. We conduct experiments on real-world geographic data and predict the types of POIs at given positions based on 1) their locations and 2) nearby POIs. Results show that, because of its multi-scale representations, Space2vec outperforms well-established ML approaches such as RBF kernels, multi-layer feed-forward nets, and tile embedding approaches. |
Tasks | Image Classification, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJljdh4KDH |
PDF | https://openreview.net/pdf?id=rJljdh4KDH |
PWC | https://paperswithcode.com/paper/multi-scale-representation-learning-for |
Repo | |
Framework | |
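A minimal sketch of a multi-scale periodic position encoder in the spirit of the grid-cell representation the abstract describes; the wavelength grid and output layout are illustrative assumptions, not the exact Space2vec architecture:

```python
# Encode 2-D coordinates with sinusoids at geometrically spaced wavelengths,
# giving each location a multi-scale periodic signature.
import numpy as np

def multiscale_position_encoding(xy, n_scales=16, min_lambda=1.0, max_lambda=10000.0):
    xy = np.asarray(xy, dtype=np.float64)          # shape (N, 2)
    lambdas = min_lambda * (max_lambda / min_lambda) ** (
        np.arange(n_scales) / max(n_scales - 1, 1))
    feats = []
    for lam in lambdas:
        phase = 2 * np.pi * xy / lam               # (N, 2) per scale
        feats += [np.sin(phase), np.cos(phase)]
    return np.concatenate(feats, axis=1)           # (N, 4 * n_scales)
```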
Semantics Preserving Adversarial Attacks
Title | Semantics Preserving Adversarial Attacks |
Authors | Anonymous |
Abstract | While progress has been made in crafting visually imperceptible adversarial examples, constructing semantically meaningful ones remains a challenge. In this paper, we propose a framework to generate semantics preserving adversarial examples. First, we present a manifold learning method to capture the semantics of the inputs. The motivating principle is to learn the low-dimensional geometric summaries of the inputs via statistical inference. Then, we perturb the elements of the learned manifold using the Gram-Schmidt process to induce the perturbed elements to remain in the manifold. To produce adversarial examples, we propose an efficient algorithm whereby we leverage the semantics of the inputs as a source of knowledge upon which we impose adversarial constraints. We apply our approach on toy data, images and text, and show its effectiveness in producing semantics preserving adversarial examples which evade existing defenses against adversarial attacks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJx4O34YvS |
PDF | https://openreview.net/pdf?id=SJx4O34YvS |
PWC | https://paperswithcode.com/paper/semantics-preserving-adversarial-attacks |
Repo | |
Framework | |
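The "keep the perturbation on the manifold" step can be illustrated with a projection. This is illustrative only: the basis `B` is assumed to come from the manifold learning step, and QR factorization (numerically equivalent to Gram-Schmidt orthogonalization) stands in for the paper's full procedure:

```python
import numpy as np

def project_onto_manifold(delta, B):
    """Project perturbation `delta` onto the subspace spanned by columns of B."""
    Q, _ = np.linalg.qr(B)        # orthonormal basis for the manifold directions
    return Q @ (Q.T @ delta)      # component of `delta` that stays on-manifold
```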
FNNP: Fast Neural Network Pruning Using Adaptive Batch Normalization
Title | FNNP: Fast Neural Network Pruning Using Adaptive Batch Normalization |
Authors | Anonymous |
Abstract | Finding the computationally redundant part of a trained Deep Neural Network (DNN) is the key question that pruning algorithms target. Many algorithms try to predict the model performance of pruned sub-nets by introducing various evaluation methods, but these are either inaccurate or too complicated for general application. In this work, we present a pruning method called Fast Neural Network Pruning (FNNP), in which a simple yet efficient evaluation component called ABN-based evaluation is applied to unveil a strong correlation between different pruned DNN structures and their final settled accuracy. This strong correlation allows us to quickly spot the pruned candidates with the highest potential accuracy without actually fine-tuning them. FNNP does not require any extra regularization or supervision added to a common DNN training pipeline, yet it can achieve better accuracy than many carefully designed pruning methods. In experiments pruning MobileNet V1 and ResNet-50, FNNP outperforms all compared methods by up to 3.8%. Even in the more challenging experiments of pruning the compact MobileNet V1 model, FNNP achieves the highest accuracy of 70.7% with an overall 50% of operations (FLOPs) pruned. All accuracy figures are Top-1 ImageNet classification accuracy. Source code and models are accessible to the open-source community. |
Tasks | Network Pruning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJeUPlrYvr |
PDF | https://openreview.net/pdf?id=rJeUPlrYvr |
PWC | https://paperswithcode.com/paper/fnnp-fast-neural-network-pruning-using |
Repo | |
Framework | |
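A hedged sketch of ABN-style candidate evaluation: re-estimate BatchNorm statistics on a few training batches, then rank pruned candidates by validation accuracy without fine-tuning. The batch counts and loader interfaces are assumptions:

```python
import torch

@torch.no_grad()
def adaptive_bn_score(pruned_model, train_loader, val_loader, n_batches=50):
    pruned_model.train()                     # train mode: BN running stats update
    for i, (x, _) in enumerate(train_loader):
        if i >= n_batches:
            break
        pruned_model(x.cuda())               # forward only, refreshes BN stats
    pruned_model.eval()
    correct = total = 0
    for x, y in val_loader:
        pred = pruned_model(x.cuda()).argmax(dim=1)
        correct += (pred == y.cuda()).sum().item()
        total += y.numel()
    return correct / total                   # proxy for post-fine-tune accuracy
```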
Handwritten Amharic Character Recognition System Using Convolutional Neural Networks
Title | Handwritten Amharic Character Recognition System Using Convolutional Neural Networks |
Authors | Anonymous |
Abstract | Amharic is an official language of the federal government of the Federal Democratic Republic of Ethiopia. Accordingly, there is a bulk of handwritten Amharic documents available in libraries, information centres, museums, and offices. Digitizing these documents makes it possible to harness already available language technologies for local information needs and developments. Converting these documents has many advantages, including (i) preserving and transferring the history of the country, (ii) saving storage space, (iii) proper handling of documents, and (iv) enhanced retrieval of information through the internet and other applications. Handwritten Amharic character recognition is a challenging task due to the inconsistency of a writer, variability in the writing styles of different writers, the relatively large number of characters in the script, high interclass similarity, structural complexity, and degradation of documents due to different reasons. To recognize handwritten Amharic characters, we use a novel method based on deep neural networks, which have recently shown exceptional performance in various pattern recognition and machine learning applications but have not been attempted for the Ethiopic script. The CNN model is trained and tested on our database, which contains 132,500 samples of handwritten Amharic characters. Common machine learning methods usually apply a combination of a feature extractor and a trainable classifier. The use of a CNN leads to significant improvements over different machine-learning classification algorithms. Our proposed CNN model achieves an accuracy of 91.83% on training data and 90.47% on validation data. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJxHcgStwr |
PDF | https://openreview.net/pdf?id=rJxHcgStwr |
PWC | https://paperswithcode.com/paper/handwritten-amharic-character-recognition |
Repo | |
Framework | |
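For context, a generic small CNN for character classification of the kind the abstract describes. The 28x28 grayscale input, layer sizes, and class count are illustrative stand-ins, not the authors' architecture (Amharic has on the order of a few hundred glyphs):

```python
import torch.nn as nn

def make_char_cnn(n_classes=265):
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
        nn.Flatten(),
        nn.Linear(64 * 7 * 7, 256), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(256, n_classes),
    )
```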
Learn Interpretable Word Embeddings Efficiently with von Mises-Fisher Distribution
Title | Learn Interpretable Word Embeddings Efficiently with von Mises-Fisher Distribution |
Authors | Anonymous |
Abstract | Word embeddings play a key role in various natural language processing tasks. However, the dominant word embedding models do not explain what information is carried in the resulting embeddings. To generate interpretable word embeddings, we replace each word vector with a probability density distribution. The insight here is that if we regularize the mixture distribution of all words to be uniform, we can prove that the inner product between word embeddings represents the point-wise mutual information between words. Moreover, our model can also handle polysemy: each word’s probability density distribution generates different vectors for its various meanings. We evaluate our model on several word similarity tasks; results show that our model consistently outperforms the dominant models on these tasks. |
Tasks | Word Embeddings |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bke02gHYwB |
PDF | https://openreview.net/pdf?id=Bke02gHYwB |
PWC | https://paperswithcode.com/paper/learn-interpretable-word-embeddings |
Repo | |
Framework | |
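For reference, the von Mises-Fisher log-density on the unit sphere, the distribution the model above places over each word. A minimal sketch assuming a unit-norm mean direction `mu` and concentration `kappa` (not the training code):

```python
import numpy as np
from scipy.special import ive

def vmf_log_density(x, mu, kappa):
    """log vMF(x; mu, kappa) for unit vectors x, mu in R^d."""
    d = mu.shape[0]
    # Normalizer C_d(k) = k^(d/2-1) / ((2*pi)^(d/2) * I_{d/2-1}(k)); `ive` is
    # the exponentially scaled Bessel function, so log I_v(k) = log ive(v, k) + k.
    log_c = ((d / 2 - 1) * np.log(kappa)
             - (d / 2) * np.log(2 * np.pi)
             - (np.log(ive(d / 2 - 1, kappa)) + kappa))
    return log_c + kappa * float(x @ mu)
```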
Stablizing Adversarial Invariance Induction by Discriminator Matching
Title | Stablizing Adversarial Invariance Induction by Discriminator Matching |
Authors | Anonymous |
Abstract | Incorporating a desired invariance into representation learning is a key challenge in many situations, e.g., for domain generalization and privacy/fairness constraints. Adversarial invariance induction (AII) has shown its power for this purpose: it maximizes a proxy of the conditional entropy between representations and attributes via adversarial training between an attribute discriminator and a feature extractor. However, the practical behavior of AII is still unclear, as the previous analysis assumes the optimality of the attribute classifier, which rarely holds in practice. This paper first analyzes the practical behavior of AII both theoretically and empirically, showing that AII has a theoretical difficulty, as it maximizes a variational {\em upper} bound of the actual conditional entropy, and that AII catastrophically fails to induce invariance even in simple cases, as suggested by these theoretical findings. We then argue that a simple modification to AII can significantly stabilize the adversarial induction framework and achieve better invariant representations. Our modification is based on a property of conditional entropy: it is maximized if and only if the divergence between all pairs of marginal distributions over $z$ for different attributes is minimized. The proposed method, {\em invariance induction by discriminator matching}, modifies the AII objective to explicitly meet the divergence minimization requirement by defining a proxy for the divergence using the attribute discriminator. Empirical validation on a toy dataset and four real-world datasets (related to applications of user anonymization and domain generalization) reveals that the proposed method provides superior performance when inducing invariance for nuisance factors. |
Tasks | Domain Generalization, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1lFa3EFwB |
PDF | https://openreview.net/pdf?id=B1lFa3EFwB |
PWC | https://paperswithcode.com/paper/stablizing-adversarial-invariance-induction |
Repo | |
Framework | |
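To make the setting concrete, here is a generic adversarial-invariance scaffold of the kind the paper analyzes: the discriminator D learns to predict the attribute from z, and the encoder maximizes the entropy of D's prediction. This is the baseline-AII flavor of objective, not the paper's exact discriminator-matching proxy; shapes and the weight `lam` are assumptions:

```python
import torch.nn.functional as F

def discriminator_step(D, z, attr):
    # Train D to predict the attribute from a detached representation.
    return F.cross_entropy(D(z.detach()), attr)

def encoder_step(z, task_logits, targets, D, lam=1.0):
    task_loss = F.cross_entropy(task_logits, targets)
    log_p = F.log_softmax(D(z), dim=1)
    entropy = -(log_p.exp() * log_p).sum(dim=1).mean()
    return task_loss - lam * entropy   # encoder maximizes attribute entropy of D(z)
```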
Self-Imitation Learning via Trajectory-Conditioned Policy for Hard-Exploration Tasks
Title | Self-Imitation Learning via Trajectory-Conditioned Policy for Hard-Exploration Tasks |
Authors | Anonymous |
Abstract | Imitation learning from human-expert demonstrations has been shown to be greatly helpful for challenging reinforcement learning problems with sparse environment rewards. However, it is very difficult to achieve similar success without relying on expert demonstrations. Recent works on self-imitation learning showed that imitating the agent’s own past good experience could indirectly drive exploration in some environments, but these methods often lead to sub-optimal and myopic behavior. To address this issue, we argue that exploration in diverse directions by imitating diverse trajectories, instead of focusing on limited good trajectories, is more desirable for the hard-exploration tasks. We propose a new method of learning a trajectory-conditioned policy to imitate diverse trajectories from the agent’s own past experiences and show that such self-imitation helps avoid myopic behavior and increases the chance of finding a globally optimal solution for hard-exploration tasks, especially when there are misleading rewards. Our method significantly outperforms existing self-imitation learning and count-based exploration methods on various hard-exploration tasks with local optima. In particular, we report a state-of-the-art score of more than 20,000 points on Montezuma’s Revenge without using expert demonstrations or resetting to arbitrary states. |
Tasks | Imitation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Byg5KyHYwr |
PDF | https://openreview.net/pdf?id=Byg5KyHYwr |
PWC | https://paperswithcode.com/paper/self-imitation-learning-via-trajectory |
Repo | |
Framework | |
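An illustrative sketch of one way to keep imitation targets diverse, roughly in the spirit of the method above: store the best trajectory per discretized state cell and sample demonstrations uniformly over cells. The cell key and buffer policy are assumptions, not the paper's exact algorithm:

```python
import random

class DiverseTrajectoryBuffer:
    def __init__(self):
        self.best = {}                          # cell -> (return, trajectory)

    def add(self, trajectory, ret, cell):
        # Keep only the highest-return trajectory reaching each cell.
        if cell not in self.best or ret > self.best[cell][0]:
            self.best[cell] = (ret, trajectory)

    def sample_demonstration(self):
        # Sample uniformly over cells, not over returns: imitating only the
        # single best trajectory is what leads to myopic behavior.
        return random.choice(list(self.best.values()))[1]
```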
Adaptive Neural Connections for Sparsity Learning
Title | Adaptive Neural Connections for Sparsity Learning |
Authors | Prakhar Kaushik, Alex Gain, Hava Siegelmann |
Abstract | Sparsity learning aims to decrease the computational and memory costs of large deep neural networks (DNNs) via pruning neural connections while simultaneously retaining high accuracy. A large body of work has developed sparsity learning approaches, with recent large-scale experiments showing that two main methods, magnitude pruning and Variational Dropout (VD), achieve similar state-of-the-art results for classification tasks. We propose Adaptive Neural Connections (ANC), a method for explicitly parameterizing fine-grained neuron-to-neuron connections via adjacency matrices at each layer that are learned through backpropagation. Explicitly parameterizing neuron-to-neuron connections confers two primary advantages: 1. Sparsity can be explicitly optimized for via norm-based regularization on the adjacency matrices; and 2. When combined with VD (which we term ANC-VD), the adjacencies can be interpreted as learned weight importance parameters, which we hypothesize leads to improved convergence for VD. Experiments with ResNet18 show that architectures augmented with ANC outperform their vanilla counterparts. |
Tasks | Model Compression, Network Pruning, Sparse Learning |
Published | 2020-03-05 |
URL | http://openaccess.thecvf.com/content_WACV_2020/html/Gain_Adaptive_Neural_Connections_for_Sparsity_Learning_WACV_2020_paper.html |
PDF | http://openaccess.thecvf.com/content_WACV_2020/papers/Gain_Adaptive_Neural_Connections_for_Sparsity_Learning_WACV_2020_paper.pdf |
PWC | https://paperswithcode.com/paper/adaptive-neural-connections-for-sparsity |
Repo | |
Framework | |
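A minimal sketch of an ANC-style layer: a dense layer whose neuron-to-neuron connections are gated by a learned adjacency matrix, with an L1 penalty on the adjacencies to encourage sparsity. The exact parameterization is an assumption based on the abstract:

```python
import torch
import torch.nn as nn

class ANCLinear(nn.Linear):
    def __init__(self, in_features, out_features):
        super().__init__(in_features, out_features)
        # One learned gate per neuron-to-neuron connection.
        self.adjacency = nn.Parameter(torch.ones(out_features, in_features))

    def forward(self, x):
        return nn.functional.linear(x, self.weight * self.adjacency, self.bias)

    def sparsity_penalty(self):
        return self.adjacency.abs().sum()   # L1 norm drives connections to zero
```

In training, the penalty would be added to the task loss, e.g. `loss = task_loss + alpha * sum(m.sparsity_penalty() for m in anc_layers)` for some illustrative weight `alpha`.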
Out-of-Distribution Detection Using Layerwise Uncertainty in Deep Neural Networks
Title | Out-of-Distribution Detection Using Layerwise Uncertainty in Deep Neural Networks |
Authors | Hirono Okamoto, Masahiro Suzuki, Yutaka Matsuo |
Abstract | In this paper, we tackle the problem of detecting samples that are not drawn from the training distribution, i.e., out-of-distribution (OOD) samples, in classification. Many previous studies have attempted to solve this problem by regarding samples with low classification confidence as OOD examples using deep neural networks (DNNs). However, on difficult datasets or models with low classification ability, these methods incorrectly regard in-distribution samples close to the decision boundary as OOD samples. This problem arises because their approaches use only the features close to the output layer and disregard the uncertainty of the features. Therefore, we propose a method that extracts the uncertainties of features in each layer of DNNs using a reparameterization trick and combines them. In experiments, our method outperforms the existing methods by a large margin, achieving state-of-the-art detection performance on several datasets and classification models. For example, our method increases the AUROC score of prior work (83.8%) to 99.8% in DenseNet on the CIFAR-100 and Tiny-ImageNet datasets. |
Tasks | Out-of-Distribution Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rklVOnNtwH |
PDF | https://openreview.net/pdf?id=rklVOnNtwH |
PWC | https://paperswithcode.com/paper/out-of-distribution-detection-using-layerwise |
Repo | |
Framework | |
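A hedged sketch of combining per-layer feature uncertainties: each layer emits a mean and log-variance (the reparameterization trick), and the OOD score sums the layers' variances. The layer interface and the summation rule are assumptions, not the authors' exact combination:

```python
import torch

def layerwise_uncertainty_score(layers, x):
    """`layers` is a list of modules returning (mu, logvar) per layer."""
    score = 0.0
    h = x
    for layer in layers:
        mu, logvar = layer(h)
        std = (0.5 * logvar).exp()
        h = mu + std * torch.randn_like(std)   # reparameterized sample
        # Accumulate each layer's mean per-sample variance.
        score = score + logvar.exp().mean(dim=tuple(range(1, mu.dim())))
    return score                               # higher total variance => OOD
```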
Verification of Generative-Model-Based Visual Transformations
Title | Verification of Generative-Model-Based Visual Transformations |
Authors | Anonymous |
Abstract | Generative networks are promising models for specifying visual transformations. Unfortunately, certification of generative models is challenging, as one needs to capture sufficient non-convexity so as to produce precise bounds on the output. Existing verification methods either fail to scale to generative networks or do not capture enough non-convexity. In this work, we present a new verifier, called ApproxLine, that can certify non-trivial properties of generative networks. ApproxLine performs both deterministic and probabilistic abstract interpretation and captures infinite sets of outputs of generative networks. We show that ApproxLine can verify interesting interpolations in the network’s latent space. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJxRMlrtPH |
PDF | https://openreview.net/pdf?id=HJxRMlrtPH |
PWC | https://paperswithcode.com/paper/verification-of-generative-model-based-visual |
Repo | |
Framework | |
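To make the verification problem concrete, here is a naive sampling baseline, not ApproxLine itself: it checks a property along a latent-space interpolation at finitely many points. Unlike the abstract interpretation the paper performs, sampling can miss violations between samples, so it proves nothing; all names here are illustrative:

```python
import numpy as np

def sample_check(generator, classifier, z0, z1, holds, n=100):
    for t in np.linspace(0.0, 1.0, n):
        x = generator((1 - t) * z0 + t * z1)    # point on the latent line
        if not holds(classifier(x)):
            return False                        # counterexample at this sample
    return True                                 # held on all samples (no proof)
```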