Paper Group NANR 97
Deep Orientation Uncertainty Learning based on a Bingham Loss
Title | Deep Orientation Uncertainty Learning based on a Bingham Loss |
Authors | Anonymous |
Abstract | Reasoning about uncertain orientations is one of the core problems in many perception tasks such as object pose estimation or motion estimation. In these scenarios, poor illumination conditions, sensor limitations, or appearance invariance may result in highly uncertain estimates. In this work, we propose a novel learning based representation for orientation uncertainty. Characterizing uncertainty over unit quaternions with the Bingham distribution allows us to formulate a loss that naturally captures the antipodal symmetry of the representation. We discuss the interpretability of the learned distribution parameters and demonstrate the feasibility of our approach on several challenging real-world pose estimation tasks involving uncertain orientations. |
Tasks | Motion Estimation, Pose Estimation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryloogSKDS |
https://openreview.net/pdf?id=ryloogSKDS | |
PWC | https://paperswithcode.com/paper/deep-orientation-uncertainty-learning-based |
Repo | |
Framework | |
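The antipodal symmetry mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard Bingham parameterization, in which the unnormalized log-density of a unit vector q is the quadratic form q^T M diag(z) M^T q with an orthogonal matrix M and non-positive concentrations z. The function name and the numeric values are hypothetical and chosen only for illustration; this is not the loss defined in the paper, and the normalization constant is omitted.

```python
import numpy as np

def bingham_unnormalized_log_density(q, M, z):
    """Unnormalized Bingham log-density q^T M diag(z) M^T q for a unit vector q.

    Illustrative only: the normalization constant is omitted.
    """
    q = np.asarray(q, dtype=float)
    A = M @ np.diag(z) @ M.T  # symmetric parameter matrix
    return float(q @ A @ q)

rng = np.random.default_rng(0)

# Hypothetical parameters: an orthogonal M (from a QR decomposition) and
# non-positive concentrations z with the largest entry fixed at zero.
M, _ = np.linalg.qr(rng.normal(size=(4, 4)))
z = np.array([-10.0, -5.0, -1.0, 0.0])

# A random unit quaternion q and its antipode -q describe the same rotation.
q = rng.normal(size=4)
q /= np.linalg.norm(q)

log_p_pos = bingham_unnormalized_log_density(q, M, z)
log_p_neg = bingham_unnormalized_log_density(-q, M, z)
```

Because the log-density is quadratic in q, the two values agree, so a loss built on it does not need to choose between the two quaternions that represent one rotation.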
Learning Disentangled Representations for CounterFactual Regression
Title | Learning Disentangled Representations for CounterFactual Regression |
Authors | Anonymous |
Abstract | We consider the challenge of estimating treatment effects from observational data, and point out that, in general, only some factors based on the observed covariates X contribute to selection of the treatment T, and only some to determining the outcomes Y. We model this by considering three underlying sources of {X, T, Y} and show that explicitly modeling these sources offers insight that guides the design of models that better handle selection bias. This paper is an attempt to conceptualize this line of thought and provide a path to explore it further. In this work, we propose an algorithm to (1) identify disentangled representations of the above-mentioned underlying factors from any given observational dataset D and (2) leverage this knowledge to reduce, as well as account for, the negative impact of selection bias on estimating the treatment effects from D. Our empirical results show that the proposed method (i) achieves state-of-the-art performance in both individual- and population-based evaluation measures and (ii) is highly robust under various data-generating scenarios. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HkxBJT4YvB |
https://openreview.net/pdf?id=HkxBJT4YvB | |
PWC | https://paperswithcode.com/paper/learning-disentangled-representations-for-1 |
Repo | |
Framework | |
Unrestricted Adversarial Examples via Semantic Manipulation
Title | Unrestricted Adversarial Examples via Semantic Manipulation |
Authors | Anonymous |
Abstract | Machine learning models, especially deep neural networks (DNNs), have been shown to be vulnerable to adversarial examples, which are carefully crafted samples with a small-magnitude perturbation. Such adversarial perturbations are usually restricted by bounding their $\mathcal{L}_p$ norm such that they are imperceptible, and thus many current defenses can exploit this property to reduce their adversarial impact. In this paper, we instead introduce “unrestricted” perturbations that manipulate semantically meaningful image-based visual descriptors – color and texture – in order to generate effective and photorealistic adversarial examples. We show that these semantically aware perturbations are effective against JPEG compression, feature squeezing, and adversarially trained models. We also show that the proposed methods can effectively be applied to both image classification and image captioning tasks on complex datasets such as ImageNet and MSCOCO. In addition, we conduct comprehensive user studies to show that our generated semantic adversarial examples are photorealistic to humans despite large-magnitude perturbations when compared to other attacks. |
Tasks | Image Captioning, Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Sye_OgHFwH |
https://openreview.net/pdf?id=Sye_OgHFwH | |
PWC | https://paperswithcode.com/paper/unrestricted-adversarial-examples-via |
Repo | |
Framework | |
Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue
Title | Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue |
Authors | Anonymous |
Abstract | Knowledge-grounded dialogue is the task of generating an informative response based on both discourse context and external knowledge. To better model knowledge selection in multi-turn knowledge-grounded dialogue, we propose a sequential latent variable model as the first approach to this matter. The model, named sequential knowledge transformer (SKT), can keep track of the prior and posterior distributions over knowledge; as a result, it can not only reduce the ambiguity caused by the diversity of knowledge selection in conversation but also better leverage the response information for a proper choice of knowledge. Our experimental results show that the proposed model improves the knowledge selection accuracy and subsequently the performance of utterance generation. We achieve new state-of-the-art performance on Wizard of Wikipedia (Dinan et al., 2019), one of the largest and most challenging benchmarks. We further validate the effectiveness of our model over existing conversation methods on another knowledge-based dialogue dataset, Holl-E (Moghe et al., 2018). |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hke0K1HKwr |
https://openreview.net/pdf?id=Hke0K1HKwr | |
PWC | https://paperswithcode.com/paper/sequential-latent-knowledge-selection-for |
Repo | |
Framework | |
Winning Privately: The Differentially Private Lottery Ticket Mechanism
Title | Winning Privately: The Differentially Private Lottery Ticket Mechanism |
Authors | Anonymous |
Abstract | We propose the differentially private lottery ticket mechanism (DPLTM), an end-to-end differentially private training paradigm based on the lottery ticket hypothesis. Using “high-quality winners”, selected via our custom score function, DPLTM significantly outperforms the state of the art. We show that DPLTM converges faster, allowing for early stopping with reduced privacy budget consumption. We further show that the tickets from DPLTM are transferable across datasets, domains, and architectures. Our extensive evaluation on several public datasets provides evidence for our claims. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1e5YC4KPS |
https://openreview.net/pdf?id=S1e5YC4KPS | |
PWC | https://paperswithcode.com/paper/winning-privately-the-differentially-private |
Repo | |
Framework | |
Certifying Neural Network Audio Classifiers
Title | Certifying Neural Network Audio Classifiers |
Authors | Anonymous |
Abstract | We present the first end-to-end verifier of audio classifiers. Compared to existing methods, our approach enables analysis of both the entire audio processing stage and recurrent neural network architectures (e.g., LSTM). The audio processing is verified using novel convex relaxations tailored to feature extraction operations used in audio (e.g., Fast Fourier Transform), while recurrent architectures are certified via a novel binary relaxation for the recurrent unit update. We show the verifier scales to large networks while computing significantly tighter bounds than existing methods for common audio classification benchmarks: on the challenging Google Speech Commands dataset we certify 95% more inputs than the interval approximation (the only prior scalable method), for a perturbation of -90dB. |
Tasks | Audio Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJxkvlBtwH |
https://openreview.net/pdf?id=HJxkvlBtwH | |
PWC | https://paperswithcode.com/paper/certifying-neural-network-audio-classifiers |
Repo | |
Framework | |
Wasserstein-Bounded Generative Adversarial Networks
Title | Wasserstein-Bounded Generative Adversarial Networks |
Authors | Peng Zhou, Bingbing Ni, Lingxi Xie, Xiaopeng Zhang, Hang Wang, Cong Geng, Qi Tian |
Abstract | In the field of Generative Adversarial Networks (GANs), how to design a stable training strategy remains an open problem. Wasserstein GANs have substantially improved stability over the original GANs by introducing the Wasserstein distance, but they remain unstable and are prone to a variety of failure modes. In this paper, we present a general framework named Wasserstein-Bounded GAN (WBGAN), which improves a large family of WGAN-based approaches by simply adding an upper-bound constraint to the Wasserstein term. Furthermore, we show that WBGAN can reasonably measure the difference between distributions that have almost no intersection. Experiments demonstrate that WBGAN can stabilize as well as accelerate convergence in the training processes of a series of WGAN-based variants. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkxgrAVFwH |
https://openreview.net/pdf?id=BkxgrAVFwH | |
PWC | https://paperswithcode.com/paper/wasserstein-bounded-generative-adversarial |
Repo | |
Framework | |
Efficient Bi-Directional Verification of ReLU Networks via Quadratic Programming
Title | Efficient Bi-Directional Verification of ReLU Networks via Quadratic Programming |
Authors | Anonymous |
Abstract | Neural networks are known to be sensitive to adversarial perturbations. To investigate this undesired behavior we consider the problem of computing the distance to the decision boundary (DtDB) from a given sample for a deep NN classifier. In this work we present an iterative procedure where in each step we solve a convex quadratic programming (QP) task. Solving the single initial QP already results in a lower bound on the DtDB and can be used as a robustness certificate of the classifier around a given sample. In contrast to currently known approaches our method also provides upper bounds used as a measure of quality for the certificate. We show that our approach provides better or competitive results in comparison with a wide range of existing techniques. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Bkx4AJSFvB |
https://openreview.net/pdf?id=Bkx4AJSFvB | |
PWC | https://paperswithcode.com/paper/efficient-bi-directional-verification-of-relu |
Repo | |
Framework | |
Do Deep Neural Networks for Segmentation Understand Insideness?
Title | Do Deep Neural Networks for Segmentation Understand Insideness? |
Authors | Anonymous |
Abstract | Image segmentation aims at grouping pixels that belong to the same object or region. At the heart of image segmentation lies the problem of determining whether a pixel is inside or outside a region, which we denote as the “insideness” problem. Many Deep Neural Network (DNN) variants excel in segmentation benchmarks, but regarding insideness, they have not been well visualized or understood: What representations do DNNs use to address the long-range relationships of insideness? How do architectural choices affect the learning of these representations? In this paper, we take the reductionist approach by analyzing DNNs solving the insideness problem in isolation, i.e. determining the inside of closed (Jordan) curves. We demonstrate analytically that state-of-the-art feed-forward and recurrent architectures can implement solutions of the insideness problem for any given curve. Yet, only recurrent networks could learn these general solutions when the training enforced a specific “routine” capable of breaking down the long-range relationships. Our results highlight the need for new training strategies that decompose the learning into appropriate stages, and that lead to the general class of solutions necessary for DNNs to understand insideness. |
Tasks | Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryevtyHtPr |
https://openreview.net/pdf?id=ryevtyHtPr | |
PWC | https://paperswithcode.com/paper/do-deep-neural-networks-for-segmentation |
Repo | |
Framework | |
Learning the Arrow of Time for Problems in Reinforcement Learning
Title | Learning the Arrow of Time for Problems in Reinforcement Learning |
Authors | Anonymous |
Abstract | We humans have an innate understanding of the asymmetric progression of time, which we use to efficiently and safely perceive and manipulate our environment. Drawing inspiration from that, we approach the problem of learning an arrow of time in a Markov (Decision) Process. We illustrate how a learned arrow of time can capture salient information about the environment, which in turn can be used to measure reachability, detect side effects, and obtain an intrinsic reward signal. Finally, we propose a simple yet effective algorithm to parameterize the problem at hand and learn an arrow of time with a function approximator (here, a deep neural network). Our empirical results span a selection of discrete and continuous environments, and demonstrate for a class of stochastic processes that the learned arrow of time agrees reasonably well with a well-known notion of an arrow of time due to Jordan, Kinderlehrer and Otto (1998). |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rylJkpEtwS |
https://openreview.net/pdf?id=rylJkpEtwS | |
PWC | https://paperswithcode.com/paper/learning-the-arrow-of-time-for-problems-in |
Repo | |
Framework | |
Pre-training as Batch Meta Reinforcement Learning with tiMe
Title | Pre-training as Batch Meta Reinforcement Learning with tiMe |
Authors | Anonymous |
Abstract | Pre-training is transformative in supervised learning: a large network trained with large and existing datasets can be used as an initialization when learning a new task. Such initialization speeds up convergence and leads to higher performance. In this paper, we seek to understand what the formalization for pre-training from only existing and observational data in Reinforcement Learning (RL) is and whether it is possible. We formulate the setting as Batch Meta Reinforcement Learning. We identify MDP mis-identification to be a central challenge and motivate it with theoretical analysis. Combining ideas from Batch RL and Meta RL, we propose tiMe, which learns a distillation of multiple value functions and MDP embeddings from only existing data. In challenging control tasks and without fine-tuning on unseen MDPs, tiMe is competitive with a state-of-the-art model-free RL method trained with hundreds of thousands of environment interactions. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=S1elRa4twS |
https://openreview.net/pdf?id=S1elRa4twS | |
PWC | https://paperswithcode.com/paper/pre-training-as-batch-meta-reinforcement-1 |
Repo | |
Framework | |
Fairness with Wasserstein Adversarial Networks
Title | Fairness with Wasserstein Adversarial Networks |
Authors | Anonymous |
Abstract | Quantifying, enforcing and implementing fairness has emerged as a major topic in machine learning. We investigate these questions in the context of deep learning. Our main algorithmic and theoretical tool is the computational estimation of similarities between probability distributions, “à la Wasserstein”, using adversarial networks. This idea is flexible enough to investigate different fairness-constrained learning tasks, which we model by specifying properties of the underlying data generative process. The first setting considers bias in the generative model which should be filtered out. The second model is related to the presence of nuisance variables in the observations producing an unwanted bias for the learning task. For both models, we devise a learning algorithm based on approximation of Wasserstein distances using adversarial networks. We provide formal arguments describing the fairness-enforcing properties of these algorithms in relation to the underlying fairness generative processes. Finally, we perform experiments, both on synthetic and real-world data, to demonstrate empirically the superiority of our approach compared to state-of-the-art fairness algorithms as well as concurrent GAN-type adversarial architectures based on Jensen divergence. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BkeGPJrtwB |
https://openreview.net/pdf?id=BkeGPJrtwB | |
PWC | https://paperswithcode.com/paper/fairness-with-wasserstein-adversarial |
Repo | |
Framework | |
Generalization through Memorization: Nearest Neighbor Language Models
Title | Generalization through Memorization: Nearest Neighbor Language Models |
Authors | Anonymous |
Abstract | We introduce $k$NN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a $k$-nearest neighbors ($k$NN) model. The nearest neighbors are computed according to distance in the pre-trained LM embedding space, and can be drawn from any text collection, including the original LM training data. Applying this transformation to a strong Wikitext-103 LM, with neighbors drawn from the original training set, our $k$NN-LM achieves a new state-of-the-art perplexity of 15.79 – a 2.9 point improvement with no additional training. We also show that this approach has implications for efficiently scaling up to larger training sets and allows for effective domain adaptation, by simply varying the nearest neighbor datastore, again without further training. Qualitatively, the model is particularly helpful in predicting rare patterns, such as factual knowledge. Together, these results strongly suggest that learning similarity between sequences of text is easier than predicting the next word, and that nearest neighbor search is an effective approach for language modeling in the long tail. |
Tasks | Domain Adaptation, Language Modelling |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HklBjCEKvH |
https://openreview.net/pdf?id=HklBjCEKvH | |
PWC | https://paperswithcode.com/paper/generalization-through-memorization-nearest-1 |
Repo | |
Framework | |
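The linear interpolation described in the abstract above can be sketched as follows. The abstract only states that the LM is interpolated with a kNN model based on distances in the LM embedding space; the distance-to-weight rule used here (a softmax over negative distances, summed per target token), the function names, and the toy numbers are assumptions made for illustration, not the paper's confirmed implementation.

```python
import numpy as np

def knn_distribution(distances, neighbor_targets, vocab_size):
    """Turn retrieved neighbors into a distribution over next tokens.

    Assumption: each neighbor is weighted by a softmax over negative
    distances, and weights of neighbors sharing a target token are summed.
    """
    distances = np.asarray(distances, dtype=float)
    weights = np.exp(-(distances - distances.min()))  # shift for stability
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, neighbor_targets, weights)  # accumulate per target id
    return p_knn

def interpolate(p_lm, p_knn, lam):
    """Linear interpolation: lam * p_knn + (1 - lam) * p_lm."""
    return lam * p_knn + (1.0 - lam) * np.asarray(p_lm, dtype=float)

# Toy example with a hypothetical 3-token vocabulary and 3 retrieved neighbors.
p_lm = np.array([0.5, 0.3, 0.2])
p_knn = knn_distribution([1.0, 2.0, 1.5], [2, 2, 0], vocab_size=3)
p_mix = interpolate(p_lm, p_knn, lam=0.25)
```

Since both inputs are probability distributions and the weights lam and 1 - lam sum to one, the mixture is again a distribution. Swapping the datastore that supplies the neighbors changes p_knn without retraining the LM, which is the property the abstract relies on for domain adaptation.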
Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$
Title | Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$ |
Authors | Anonymous |
Abstract | In recent years several adversarial attacks and defenses have been proposed. Often seemingly robust models turn out to be non-robust when more sophisticated attacks are used. One way out of this dilemma is to provide provable robustness guarantees. While provably robust models for specific $l_p$-perturbation models have been developed, we show that they do not come with any guarantee against other $l_q$-perturbations. We propose a new regularization scheme, MMR-Universal, for ReLU networks which enforces robustness with respect to $l_1$- and $l_\infty$-perturbations and show how that leads to the first provably robust models with respect to any $l_p$-norm for $p\geq 1$. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rklk_ySYPB |
https://openreview.net/pdf?id=rklk_ySYPB | |
PWC | https://paperswithcode.com/paper/provable-robustness-against-all-adversarial-1 |
Repo | |
Framework | |
Measuring Compositional Generalization: A Comprehensive Method on Realistic Data
Title | Measuring Compositional Generalization: A Comprehensive Method on Realistic Data |
Authors | Anonymous |
Abstract | State-of-the-art machine learning methods exhibit limited compositional generalization. At the same time, there is a lack of realistic benchmarks that comprehensively measure this ability, which makes it challenging to find and evaluate improvements. We introduce a novel method to systematically construct such benchmarks by maximizing compound divergence while guaranteeing a small atom divergence between train and test sets, and we quantitatively compare this method to other approaches for creating compositional generalization benchmarks. We present a large and realistic natural language question answering dataset that is constructed according to this method, and we use it to analyze the compositional generalization ability of three machine learning architectures. We find that they fail to generalize compositionally and that there is a surprisingly strong negative correlation between compound divergence and accuracy. We also demonstrate how our method can be used to create new compositionality benchmarks on top of the existing SCAN dataset, which confirms these findings. |
Tasks | Question Answering |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SygcCnNKwr |
https://openreview.net/pdf?id=SygcCnNKwr | |
PWC | https://paperswithcode.com/paper/measuring-compositional-generalization-a |
Repo | |
Framework | |