Paper Group NANR 9
On Need for Topology-Aware Generative Models for Manifold-Based Defenses. Deep k-NN for Noisy Labels. Visual Interpretability Alone Helps Adversarial Robustness. Learning to Rank Learning Curves. Towards Controllable and Interpretable Face Completion via Structure-Aware and Frequency-Oriented Attentive GANs. An Empirical and Comparative Analysis of …
On Need for Topology-Aware Generative Models for Manifold-Based Defenses
Title | On Need for Topology-Aware Generative Models for Manifold-Based Defenses |
Authors | Anonymous |
Abstract | ML algorithms and models, especially deep neural networks (DNNs), have shown significant promise in several areas. However, researchers have recently demonstrated that ML algorithms, especially DNNs, are vulnerable to adversarial examples (slightly perturbed samples that cause misclassification). The existence of adversarial examples has hindered the deployment of ML algorithms in safety-critical sectors, such as security. Several defenses against adversarial examples exist in the literature. One important class of defenses is manifold-based defenses, where a sample is “pulled back” onto the data manifold before being classified. These defenses rely on the manifold assumption (data lie on a manifold of lower dimension than the input space) and use a generative model to approximate the input distribution. This paper asks the following question: do the generative models used in manifold-based defenses need to be topology-aware? Our paper suggests the answer is yes. We provide theoretical and empirical evidence to support our claim. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1lF_CEYwS |
https://openreview.net/pdf?id=r1lF_CEYwS | |
PWC | https://paperswithcode.com/paper/on-need-for-topology-aware-generative-models |
Repo | |
Framework | |
Deep k-NN for Noisy Labels
Title | Deep k-NN for Noisy Labels |
Authors | Anonymous |
Abstract | Modern machine learning models are often trained on examples with noisy labels that hurt performance and are hard to identify. In this paper, we provide an empirical study showing that a simple $k$-nearest neighbor-based filtering approach on the logit layer of a preliminary model can remove mislabeled training data and produce more accurate models than some recently proposed methods. We also provide new statistical guarantees on its efficacy. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BklIxyHKDr |
https://openreview.net/pdf?id=BklIxyHKDr | |
PWC | https://paperswithcode.com/paper/deep-k-nn-for-noisy-labels |
Repo | |
Framework | |
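The abstract above describes filtering mislabeled training data with a $k$-NN applied to the logit layer of a preliminary model. The snippet below is a minimal sketch of that idea using scikit-learn; the logit extraction step, the neighbor count, and the majority-agreement threshold are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_label_filter(logits, labels, k=10, min_agreement=0.5):
    """Keep examples whose label agrees with at least `min_agreement` of their
    k nearest neighbors in logit space (excluding the point itself)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(logits)
    _, idx = nn.kneighbors(logits)            # idx[:, 0] is the point itself
    neighbor_labels = labels[idx[:, 1:]]      # shape (n, k)
    agreement = (neighbor_labels == labels[:, None]).mean(axis=1)
    return agreement >= min_agreement         # boolean mask of examples to keep

# Usage: the logits would come from a preliminary model trained on the noisy data.
logits = np.random.randn(1000, 10)            # placeholder logits
labels = np.random.randint(0, 10, size=1000)
keep = knn_label_filter(logits, labels, k=10)
clean_logits, clean_labels = logits[keep], labels[keep]
```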
Visual Interpretability Alone Helps Adversarial Robustness
Title | Visual Interpretability Alone Helps Adversarial Robustness |
Authors | Anonymous |
Abstract | Recent works have empirically shown that there exist adversarial examples that can be hidden from neural network interpretability, and interpretability is itself susceptible to adversarial attacks. In this paper, we theoretically show that with the correct measurement of interpretation, it is actually difficult to hide adversarial examples, as confirmed by experiments on MNIST, CIFAR-10 and Restricted ImageNet. Spurred by that, we develop a novel defensive scheme built only on robust interpretation (without resorting to adversarial loss minimization). We show that our defense achieves similar classification robustness to state-of-the-art robust training methods while attaining higher interpretation robustness under various settings of adversarial attacks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hyes70EYDB |
https://openreview.net/pdf?id=Hyes70EYDB | |
PWC | https://paperswithcode.com/paper/visual-interpretability-alone-helps |
Repo | |
Framework | |
Learning to Rank Learning Curves
Title | Learning to Rank Learning Curves |
Authors | Anonymous |
Abstract | Many automated machine learning methods, such as those for hyperparameter and neural architecture optimization, are computationally expensive because they involve training many different model configurations. In this work, we present a new method that saves computational budget by terminating poor configurations early on in the training. In contrast to existing methods, we consider this task as a ranking and transfer learning problem. We qualitatively show that by optimizing a pairwise ranking loss and leveraging learning curves from other data sets, our model is able to effectively rank learning curves without having to observe many or very long learning curves. We further demonstrate that our method can be used to accelerate a neural architecture search by a factor of up to 100 without a significant performance degradation of the discovered architecture. In further experiments we analyze the quality of ranking, the influence of different model components as well as the predictive behavior of the model. |
Tasks | Learning-To-Rank, Neural Architecture Search, Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxAHgSYDB |
https://openreview.net/pdf?id=BJxAHgSYDB | |
PWC | https://paperswithcode.com/paper/learning-to-rank-learning-curves |
Repo | |
Framework | |
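The entry above frames early termination of poor configurations as ranking partial learning curves with a pairwise loss. Below is a minimal PyTorch sketch of such a pairwise (RankNet-style logistic) ranking loss over learning-curve encodings; the LSTM encoder and all hyperparameters are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CurveScorer(nn.Module):
    """Scores a partial learning curve; a higher score should mean a better
    final performance. The LSTM encoder is an assumption, not the paper's model."""
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, curves):                  # curves: (batch, steps, 1)
        _, (h, _) = self.rnn(curves)
        return self.head(h[-1]).squeeze(-1)     # (batch,)

def pairwise_ranking_loss(score_a, score_b, a_better):
    """Logistic pairwise loss: push score_a above score_b when a_better is 1."""
    sign = a_better.float() * 2 - 1             # +1 if a should rank higher, else -1
    return torch.nn.functional.softplus(-sign * (score_a - score_b)).mean()

scorer = CurveScorer()
curves_a = torch.rand(16, 20, 1)                # 16 partial curves, 20 epochs each
curves_b = torch.rand(16, 20, 1)
a_better = torch.randint(0, 2, (16,))           # which configuration ends up better
loss = pairwise_ranking_loss(scorer(curves_a), scorer(curves_b), a_better)
loss.backward()
```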
Towards Controllable and Interpretable Face Completion via Structure-Aware and Frequency-Oriented Attentive GANs
Title | Towards Controllable and Interpretable Face Completion via Structure-Aware and Frequency-Oriented Attentive GANs |
Authors | Anonymous |
Abstract | Face completion is a challenging conditional image synthesis task. This paper proposes controllable, interpretable, and fast high-resolution face completion by learning generative adversarial networks (GANs) progressively from low resolution to high resolution. We present structure-aware and frequency-oriented attentive GANs. The proposed structure-aware component leverages off-the-shelf facial landmark detectors and provides a simple yet effective method of integrating the detected landmarks into generative learning. It facilitates facial expression transfer together with facial attribute control, and helps regularize structural consistency in progressive training. The proposed frequency-oriented attentive module (FOAM) encourages GANs to attend only to finer details in the coarse-to-fine progressive training, thus enabling progressive attention to face structures. The learned FOAMs show a strong pattern of switching their attention from low-frequency to high-frequency signals. In experiments, the proposed method is tested on the CelebA-HQ benchmark. Experimental results show that our approach outperforms state-of-the-art face completion methods. The proposed method is also fast, with a mean inference time of 0.54 seconds for images at 1024x1024 resolution (using a Titan Xp GPU). |
Tasks | Facial Inpainting, Image Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxUkTVYvH |
https://openreview.net/pdf?id=ryxUkTVYvH | |
PWC | https://paperswithcode.com/paper/towards-controllable-and-interpretable-face |
Repo | |
Framework | |
An Empirical and Comparative Analysis of Data Valuation with Scalable Algorithms
Title | An Empirical and Comparative Analysis of Data Valuation with Scalable Algorithms |
Authors | Anonymous |
Abstract | This paper focuses on valuating training data for supervised learning tasks and studies the Shapley value, a data value notion originating in cooperative game theory. The Shapley value defines a unique value distribution scheme that satisfies a set of appealing properties desired of a data value notion. However, the Shapley value requires exponential complexity to calculate exactly. Existing approximation algorithms, although achieving great improvement over the exact algorithm, rely on retraining models multiple times and thus remain limited when applied to larger-scale learning tasks and real-world datasets. In this work, we develop a simple and efficient algorithm to estimate the Shapley value with complexity independent of the model size. The key idea is to approximate the model with a $K$-nearest neighbor ($K$NN) classifier, which has a locality structure that leads to efficient Shapley value calculation. We evaluate the utility of the values produced by the $K$NN proxies in various settings, including label noise correction, watermark detection, data summarization, active data acquisition, and domain adaptation. Extensive experiments demonstrate that our algorithm achieves at least comparable utility to the values produced by existing algorithms while offering significant efficiency improvements. Moreover, we theoretically analyze the Shapley value and justify its advantage over the leave-one-out error as a data value measure. |
Tasks | Data Summarization, Domain Adaptation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SygBIxSFDS |
https://openreview.net/pdf?id=SygBIxSFDS | |
PWC | https://paperswithcode.com/paper/an-empirical-and-comparative-analysis-of-data |
Repo | |
Framework | |
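The locality structure mentioned above is what makes Shapley values tractable for a $K$NN proxy: for a single test point, the values of all training points can be computed by one sort plus a linear-time recursion. The sketch below illustrates that recursion as it is commonly stated for exact $K$NN Shapley values; treat the exact constants and conventions as illustrative rather than as the paper's definitive algorithm.

```python
import numpy as np

def knn_shapley_single(x_train, y_train, x_test, y_test, K=5):
    """Shapley value of each training point for one test point under a K-NN
    proxy model, via a closed-form recursion over points sorted by distance.
    (A hedged sketch; see the paper for the exact derivation and conventions.)"""
    N = len(y_train)
    order = np.argsort(np.linalg.norm(x_train - x_test, axis=1))  # nearest first
    match = (y_train[order] == y_test).astype(float)
    s = np.zeros(N)
    s[N - 1] = match[N - 1] / N                                   # farthest point
    for i in range(N - 2, -1, -1):                                # walk inward
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / K * min(K, i + 1) / (i + 1)
    values = np.zeros(N)
    values[order] = s                                             # back to original indexing
    return values

x_train = np.random.randn(200, 8)
y_train = np.random.randint(0, 3, size=200)
values = knn_shapley_single(x_train, y_train, np.random.randn(8), 1, K=5)
```

Averaging these per-test-point values over a validation set gives a single value per training example, which can then be used for tasks such as label noise correction or data summarization.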
AdamT: A Stochastic Optimization with Trend Correction Scheme
Title | AdamT: A Stochastic Optimization with Trend Correction Scheme |
Authors | Anonymous |
Abstract | Adam-type optimizers, a class of adaptive moment estimation methods with an exponential moving average scheme, have been successfully used in many applications of deep learning. Such methods are appealing for their capability on large-scale sparse datasets; on top of that, they are computationally efficient and insensitive to hyper-parameter settings. In this paper, we present a new framework for adapting Adam-type methods, namely AdamT. Instead of applying a simple exponentially weighted average, AdamT also includes trend information when updating the parameters with the adaptive step size and gradients. The newly added term is expected to efficiently capture non-horizontal moving patterns on the cost surface and thus converge more rapidly. We show empirically the importance of the trend component, with AdamT consistently outperforming the conventional Adam method in both convex and non-convex settings. |
Tasks | Stochastic Optimization |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Sklw_kHtPH |
https://openreview.net/pdf?id=Sklw_kHtPH | |
PWC | https://paperswithcode.com/paper/adamt-a-stochastic-optimization-with-trend |
Repo | |
Framework | |
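To make the "trend correction" idea concrete, the sketch below adds a Holt-style (level plus trend) smoothing term to Adam's first-moment estimate. This is only one plausible reading of the abstract, written as a toy NumPy loop; the exact AdamT update rule, bias corrections, and hyperparameters may differ from what is shown here.

```python
import numpy as np

def adam_with_trend_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                         gamma=0.9, eps=1e-8):
    """One step of an Adam-style update with a Holt-style trend term added to
    the first-moment estimate. A hedged illustration of 'trend correction',
    not the paper's exact AdamT update rule."""
    m, v, trend, t = state
    t += 1
    m_new = beta1 * m + (1 - beta1) * grad             # level (Adam first moment)
    trend = gamma * trend + (1 - gamma) * (m_new - m)  # smoothed change in the level
    v = beta2 * v + (1 - beta2) * grad ** 2            # Adam second moment
    m_hat = (m_new + trend) / (1 - beta1 ** t)         # level + trend, bias-corrected
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m_new, v, trend, t)

theta = np.zeros(3)
state = (np.zeros(3), np.zeros(3), np.zeros(3), 0)
for _ in range(100):
    grad = 2 * (theta - np.array([1.0, -2.0, 0.5]))    # gradient of a toy quadratic
    theta, state = adam_with_trend_step(theta, grad, state)
```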
Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems
Title | Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems |
Authors | Anonymous |
Abstract | Recently, several studies have proven the global convergence and generalization abilities of the gradient descent method for two-layer ReLU networks. Most of these studies, with a few exceptions, focused on regression problems with the squared loss function, and the importance of the positivity of the neural tangent kernel has been pointed out. However, the performance of gradient descent on classification problems using the logistic loss function has not been well studied, and further investigation of this problem structure is possible. In this work, we demonstrate that a separability assumption using a neural tangent model is more reasonable than the positivity condition of the neural tangent kernel, and we provide a refined convergence analysis of gradient descent for two-layer networks with smooth activations. A remarkable point of our result is that our convergence and generalization bounds have much better dependence on the network width than related studies. Consequently, our theory significantly enlarges the class of over-parameterized networks with provable generalization ability with respect to the network width, while most studies require much higher over-parameterization. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJg641BKPH |
https://openreview.net/pdf?id=BJg641BKPH | |
PWC | https://paperswithcode.com/paper/gradient-descent-can-learn-less-over |
Repo | |
Framework | |
BERTScore: Evaluating Text Generation with BERT
Title | BERTScore: Evaluating Text Generation with BERT |
Authors | Anonymous |
Abstract | We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection performance than existing metrics. Finally, we use an adversarial paraphrase detection task and show that BERTScore is more robust to challenging examples compared to existing metrics. |
Tasks | Image Captioning, Machine Translation, Model Selection, Text Generation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkeHuCVFDr |
https://openreview.net/pdf?id=SkeHuCVFDr | |
PWC | https://paperswithcode.com/paper/bertscore-evaluating-text-generation-with-1 |
Repo | |
Framework | |
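BERTScore's core computation, as described above, is a greedy matching of contextual token embeddings via cosine similarity. The sketch below shows that matching in NumPy, given precomputed embeddings; obtaining the embeddings from a pretrained encoder, idf importance weighting, and baseline rescaling are omitted.

```python
import numpy as np

def bert_score(cand_emb, ref_emb):
    """Greedy-matching precision/recall/F1 over cosine similarities between
    candidate and reference token embeddings (importance weighting omitted)."""
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T                     # (num_cand_tokens, num_ref_tokens)
    precision = sim.max(axis=1).mean()     # each candidate token to its best match
    recall = sim.max(axis=0).mean()        # each reference token to its best match
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Embeddings would normally come from a pretrained contextual encoder such as BERT.
cand_emb = np.random.randn(7, 768)         # 7 candidate tokens
ref_emb = np.random.randn(9, 768)          # 9 reference tokens
p, r, f1 = bert_score(cand_emb, ref_emb)
```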
Deep Expectation-Maximization in Hidden Markov Models via Simultaneous Perturbation Stochastic Approximation
Title | Deep Expectation-Maximization in Hidden Markov Models via Simultaneous Perturbation Stochastic Approximation |
Authors | Anonymous |
Abstract | We propose a novel method to estimate the parameters of a collection of Hidden Markov Models (HMM), each of which corresponds to a set of known features. The observation sequence of an individual HMM is noisy and/or insufficient, making parameter estimation solely based on its corresponding observation sequence a challenging problem. The key idea is to combine the classical Expectation-Maximization (EM) algorithm with a neural network, while these two are jointly trained in an end-to-end fashion, mapping the HMM features to its parameters and effectively fusing the information across different HMMs. In order to address the numerical difficulty in computing the gradient of the EM iteration, simultaneous perturbation stochastic approximation (SPSA) is employed to approximate the gradient. We also provide a rigorous proof that the approximated gradient due to SPSA converges to the true gradient almost surely. The efficacy of the proposed method is demonstrated on synthetic data as well as a real-world e-Commerce dataset. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkxGAREYwB |
https://openreview.net/pdf?id=BkxGAREYwB | |
PWC | https://paperswithcode.com/paper/deep-expectation-maximization-in-hidden |
Repo | |
Framework | |
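The SPSA estimator used above to differentiate through the EM iteration is standard: perturb all coordinates simultaneously with a random sign vector and use just two function evaluations, independent of the parameter dimension. A minimal sketch follows; the step size and the toy objective are placeholders, not the paper's setup.

```python
import numpy as np

def spsa_gradient(f, theta, c=1e-2, rng=np.random):
    """Simultaneous perturbation stochastic approximation of the gradient of f:
    two evaluations of f at theta +/- c*delta with a random +/-1 perturbation,
    regardless of the dimension of theta."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)    # Rademacher perturbation
    f_plus = f(theta + c * delta)
    f_minus = f(theta - c * delta)
    return (f_plus - f_minus) / (2 * c * delta)           # elementwise /(2c*delta_i)

# Toy check against a known gradient.
f = lambda x: np.sum(x ** 2)
theta = np.array([1.0, -2.0, 0.5])
g_hat = spsa_gradient(f, theta)    # first-order unbiased estimate of 2*theta
```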
Query-efficient Meta Attack to Deep Neural Networks
Title | Query-efficient Meta Attack to Deep Neural Networks |
Authors | Anonymous |
Abstract | Black-box attack methods aim to infer suitable attack patterns for targeted DNN models using only the models' output feedback to the corresponding input queries. However, due to the lack of priors and inefficiency in leveraging the query and feedback information, existing methods are mostly query-intensive in obtaining effective attack patterns. In this work, we propose a meta attack approach that is capable of attacking a targeted model with far fewer queries. Its high query-efficiency stems from effectively using meta-learning to learn a generalizable prior abstraction from previously observed attack patterns and exploiting such a prior to help infer attack patterns from only a few queries and outputs. Extensive experiments on MNIST, CIFAR10 and tiny-ImageNet demonstrate that our meta-attack method can remarkably reduce the number of model queries without sacrificing attack performance. Besides, the obtained meta attacker is not restricted to a particular model but can be easily used, with fast adaptation, to attack a variety of models. Our code will be released to the public. |
Tasks | Meta-Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Skxd6gSYDS |
https://openreview.net/pdf?id=Skxd6gSYDS | |
PWC | https://paperswithcode.com/paper/query-efficient-meta-attack-to-deep-neural-1 |
Repo | |
Framework | |
Learning Similarity Metrics for Numerical Simulations
Title | Learning Similarity Metrics for Numerical Simulations |
Authors | Anonymous |
Abstract | We propose a novel approach to compute a stable and generalizing metric (LNSM) with convolutional neural networks (CNNs) to compare field data from a variety of numerical simulation sources. Our method employs a Siamese network architecture that is motivated by the mathematical properties of a metric and is known to work well for finding similarities in other data modalities. We leverage a controllable data generation setup with partial differential equation (PDE) solvers to create increasingly different outputs from a reference simulation. In addition, the data generation allows for adjusting the difficulty of the resulting learning task. A central component of our learned metric is a specialized loss function that introduces knowledge about the correlation between individual data samples into the training process. To demonstrate that the proposed approach outperforms existing simple metrics for vector spaces as well as other learned, image-based metrics, we evaluate the different methods on a large range of test data. Additionally, we analyze the generalization benefits of the proposed correlation loss and the impact of an adjustable training data difficulty. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hyl9ahVFwH |
https://openreview.net/pdf?id=Hyl9ahVFwH | |
PWC | https://paperswithcode.com/paper/learning-similarity-metrics-for-numerical |
Repo | |
Framework | |
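As a rough illustration of the Siamese-metric idea above, the sketch below pairs a tiny shared encoder with a batch-level correlation loss (one minus the Pearson correlation between predicted and ground-truth distances). Both the encoder and the specific loss form are assumptions standing in for the paper's LNSM architecture and correlation loss.

```python
import torch
import torch.nn as nn

class SiameseMetric(nn.Module):
    """Hedged sketch of a Siamese CNN metric: a shared encoder and a distance
    between embeddings. The encoder below is a placeholder, not the LNSM model."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(16 * 16, 32))

    def forward(self, a, b):
        return torch.norm(self.encoder(a) - self.encoder(b), dim=1)

def correlation_loss(pred_dist, true_dist):
    """Encourage predicted distances to correlate with the ground-truth ordering
    within a batch: 1 - Pearson correlation (an assumed proxy for the paper's loss)."""
    p = pred_dist - pred_dist.mean()
    t = true_dist - true_dist.mean()
    return 1 - (p * t).sum() / (p.norm() * t.norm() + 1e-8)

model = SiameseMetric()
a, b = torch.rand(8, 1, 32, 32), torch.rand(8, 1, 32, 32)
true_dist = torch.linspace(0, 1, 8)     # increasingly different from the reference
loss = correlation_loss(model(a, b), true_dist)
loss.backward()
```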
Learning from Rules Generalizing Labeled Exemplars
Title | Learning from Rules Generalizing Labeled Exemplars |
Authors | Anonymous |
Abstract | In many applications labeled data is not readily available and needs to be collected via painstaking human supervision. We propose a rule-exemplar model for collecting human supervision that combines the scalability of rules with the quality of instance labels. The supervision is coupled such that it is both natural for humans and synergistic for learning. We propose a training algorithm that jointly denoises rules via latent coverage variables and trains the model through a soft implication loss over the coverage and label variables. Empirical evaluation on five different tasks shows that (1) our algorithm is more accurate than several existing methods of learning from a mix of clean and noisy supervision, and (2) the coupled rule-exemplar supervision is effective in denoising rules. |
Tasks | Denoising |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkeuexBtDr |
https://openreview.net/pdf?id=SkeuexBtDr | |
PWC | https://paperswithcode.com/paper/learning-from-rules-generalizing-labeled |
Repo | |
Framework | |
Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search
Title | Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search |
Authors | Anonymous |
Abstract | Monte Carlo Tree Search (MCTS) algorithms have achieved great success on many challenging benchmarks (e.g., Computer Go). However, they generally require a large number of rollouts, making their application to planning costly. Furthermore, it is extremely challenging to parallelize MCTS due to its inherently sequential nature: each rollout heavily relies on the statistics (e.g., node visitation counts) estimated from previous simulations to achieve an effective exploration-exploitation tradeoff. In spite of these difficulties, we develop an algorithm, P-UCT, that effectively parallelizes MCTS, achieving linear speedup and exhibiting negligible performance loss as the number of workers increases. The key idea in P-UCT is a set of statistics that we introduce to track the number of ongoing yet incomplete simulation queries (termed unobserved samples). These statistics are used to modify the UCT tree policy in the selection step in a principled manner, retaining an effective exploration-exploitation tradeoff while we parallelize the most time-consuming expansion and simulation steps. Experimental results on a proprietary benchmark and the public Atari game benchmark demonstrate the near-optimal linear speedup and the superior performance of P-UCT compared to existing techniques. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJlQtJSKDB |
https://openreview.net/pdf?id=BJlQtJSKDB | |
PWC | https://paperswithcode.com/paper/watch-the-unobserved-a-simple-approach-to |
Repo | |
Framework | |
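One way to picture the "unobserved samples" statistic described above is to fold the count of ongoing simulations into the visit counts used by the UCT selection rule, so that parallel workers are steered away from nodes already being heavily simulated. The sketch below shows that adjusted score; it is a hedged reading of the idea, and the paper's exact statistics and correction may differ.

```python
import math

def uct_score(q_value, n_child, n_parent, o_child, o_parent, c=1.41):
    """UCT selection score where ongoing (unobserved) simulation counts `o_*`
    are added to the completed visit counts. A hedged sketch, not P-UCT's
    exact formula."""
    n_child_eff = n_child + o_child
    n_parent_eff = n_parent + o_parent
    if n_child_eff == 0:
        return float("inf")                    # always try unvisited children first
    exploration = c * math.sqrt(math.log(n_parent_eff) / n_child_eff)
    return q_value + exploration

# Selection step: pick the child maximizing the adjusted score.
children = [
    {"q": 0.6, "n": 40, "o": 3},
    {"q": 0.5, "n": 10, "o": 12},              # many workers already queued here
]
n_parent = sum(ch["n"] for ch in children)
o_parent = sum(ch["o"] for ch in children)
best = max(children,
           key=lambda ch: uct_score(ch["q"], ch["n"], n_parent, ch["o"], o_parent))
```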
WORD SEQUENCE PREDICTION FOR AMHARIC LANGUAGE
Title | WORD SEQUENCE PREDICTION FOR AMHARIC LANGUAGE |
Authors | Anonymous |
Abstract | Word prediction is the task of guessing which word comes next based on the current context, and it is the main focus of this study. Even though Amharic is spoken by a large population, little significant work has been done on this topic. In this study, an Amharic word sequence prediction model is developed using machine learning. We use statistical methods based on a Hidden Markov Model, incorporating detailed part-of-speech tags and user profiling (adaptation). One motivation for this research is to overcome the challenges posed by inflected languages. Word sequence prediction is a challenging task for inflected languages (Gustavii & Pettersson, 2003; Seyyed & Assi, 2005). These languages are morphologically rich and have enormous numbers of word forms; that is, a single word can take many different forms. As Amharic is morphologically rich, it shares this problem (Tessema, 2014). This makes word prediction much more difficult and results in poor performance. Previous research used a dictionary approach with no consideration of context information. For this reason, storing all word forms in a dictionary will not solve the problem as it does for English and other less inflected languages. Therefore, we introduce two models, a tags-and-words model and a linear-interpolation model, that use part-of-speech tag information in addition to word n-grams in order to maximize the likelihood of syntactically appropriate suggestions. The statistics included in the systems vary from single-word frequencies to part-of-speech tag n-grams. We describe a combined statistical and lexical word prediction system and develop Amharic bigram and trigram language models for training. The overall study followed the Design Science Research Methodology (DSRM). |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkgqmyrYDH |
https://openreview.net/pdf?id=HkgqmyrYDH | |
PWC | https://paperswithcode.com/paper/word-sequence-prediction-for-amharic-language |
Repo | |
Framework | |
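To illustrate the linear-interpolation idea described above, the sketch below combines a word-bigram model with a POS-tag-bigram model (previous tag, likely next tags, words carrying those tags) using a single mixing weight. The data structures, the interpolation weight, and the back-off behavior are illustrative assumptions; the paper's exact models may differ.

```python
from collections import Counter, defaultdict

class InterpolatedPredictor:
    """Hedged sketch of a next-word predictor that linearly interpolates a
    word-bigram model with a POS-tag-bigram model."""
    def __init__(self, lam=0.7):
        self.lam = lam
        self.word_bigrams = defaultdict(Counter)   # prev word -> next-word counts
        self.tag_bigrams = defaultdict(Counter)    # prev tag  -> next-tag counts
        self.tag_words = defaultdict(Counter)      # tag       -> word counts

    def train(self, tagged_sentences):             # [[(word, tag), ...], ...]
        for sent in tagged_sentences:
            for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
                self.word_bigrams[w1][w2] += 1
                self.tag_bigrams[t1][t2] += 1
                self.tag_words[t2][w2] += 1

    def predict(self, prev_word, prev_tag, top_n=3):
        scores = Counter()
        word_counts = self.word_bigrams[prev_word]
        total_w = sum(word_counts.values()) or 1
        for w, c in word_counts.items():           # word-bigram component
            scores[w] += self.lam * c / total_w
        tag_counts = self.tag_bigrams[prev_tag]
        total_t = sum(tag_counts.values()) or 1
        for tag, tc in tag_counts.items():          # tag-bigram component
            wcounts = self.tag_words[tag]
            total_tw = sum(wcounts.values()) or 1
            for w, c in wcounts.items():
                scores[w] += (1 - self.lam) * (tc / total_t) * (c / total_tw)
        return [w for w, _ in scores.most_common(top_n)]

model = InterpolatedPredictor()
model.train([[("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")]])
print(model.predict("the", "DET"))
```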