April 1, 2020


Paper Group NANR 124

Extreme Value k-means Clustering. Disentangling Style and Content in Anime Illustrations. Effects of Linguistic Labels on Learned Visual Representations in Convolutional Neural Networks: Labels matter!. Learning from Imperfect Annotations: An End-to-End Approach. Fuzzing-Based Hard-Label Black-Box Attacks Against Machine Learning Models. SGD with H …

Extreme Value k-means Clustering


Title	Extreme Value k-means Clustering
Authors	Anonymous
Abstract	Clustering is the central task in unsupervised learning and data mining. k-means is one of the most widely used clustering algorithms. Unfortunately, it is generally non-trivial to extend k-means to cluster data points beyond Gaussian distributions, particularly clusters with non-convex shapes (Beliakov & King, 2006). To this end, we, for the first time, introduce Extreme Value Theory (EVT) to improve the clustering ability of k-means. In particular, the Euclidean space is transformed by EVT into a novel probability space, denoted the extreme value space. We thus propose a novel algorithm called Extreme Value k-means (EV k-means), including GEV k-means and GPD k-means. In addition, we introduce tricks to accelerate Euclidean distance computation, improving the computational efficiency of classical k-means. Furthermore, our EV k-means is extended to an online version, i.e., online Extreme Value k-means, which uses Mini Batch k-means to cluster streaming data. Extensive experiments are conducted to validate our EV k-means and online EV k-means on synthetic and real datasets. Experimental results show that our algorithms significantly outperform competitors in most cases.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=r1lfga4KvS
PDF	https://openreview.net/pdf?id=r1lfga4KvS
PWC	https://paperswithcode.com/paper/extreme-value-k-means-clustering
Repo
Framework

Disentangling Style and Content in Anime Illustrations


Title	Disentangling Style and Content in Anime Illustrations
Authors	Anonymous
Abstract	Existing methods for AI-generated artworks still struggle with generating high-quality stylized content, where high-level semantics are preserved, or separating fine-grained styles from various artists. We propose a novel Generative Adversarial Disentanglement Network which can disentangle two complementary factors of variation when only one of them is labelled in general, and fully decompose complex anime illustrations into style and content in particular. Training such a model is challenging, since given a style, various content data may exist but not the other way round. Our approach is divided into two stages, one that encodes an input image into a style-independent content, and one based on a dual-conditional generator. We demonstrate the ability to generate high-fidelity anime portraits with a fixed content and a large variety of styles from over a thousand artists, and vice versa, using a single end-to-end network and with applications in style transfer. We show this unique capability as well as superior output to the current state-of-the-art.
Tasks	Style Transfer
Published	2020-01-01
URL	https://openreview.net/forum?id=BJe4V1HFPr
PDF	https://openreview.net/pdf?id=BJe4V1HFPr
PWC	https://paperswithcode.com/paper/disentangling-style-and-content-in-anime-1
Repo
Framework

Effects of Linguistic Labels on Learned Visual Representations in Convolutional Neural Networks: Labels matter!


Title	Effects of Linguistic Labels on Learned Visual Representations in Convolutional Neural Networks: Labels matter!
Authors	Anonymous
Abstract	We investigated the changes in visual representations learnt by CNNs when using different linguistic labels (e.g., trained with basic-level labels only, superordinate-level only, or both at the same time) and how they compare to human behavior when asked to select which of three images is most different. We compared CNNs with identical architecture and input, differing only in what labels were used to supervise the training. The results showed that in the absence of labels, the models learn very little of the categorical structure that is often assumed to be in the input. Training with superordinate labels (vehicle, tool, etc.) is most helpful in allowing the models to match human categorization, implying that human representations used in odd-one-out tasks are highly modulated by semantic information not obviously present in the visual input.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=r1xH5xHYwH
PDF	https://openreview.net/pdf?id=r1xH5xHYwH
PWC	https://paperswithcode.com/paper/effects-of-linguistic-labels-on-learned
Repo
Framework

Learning from Imperfect Annotations: An End-to-End Approach


Title	Learning from Imperfect Annotations: An End-to-End Approach
Authors	Anonymous
Abstract	Many machine learning systems today are trained on large amounts of human-annotated data. Annotation tasks that require a high level of competency make data acquisition expensive, while the resulting labels are often subjective, inconsistent, and may contain a variety of human biases. To improve data quality, practitioners often need to collect multiple annotations per example and aggregate them before training models. Such a multi-stage approach results in redundant annotations and may often produce imperfect “ground truth” labels that limit the potential of training supervised machine learning models. We propose a new end-to-end framework that enables us to: (i) merge the aggregation step with model training, thus allowing deep learning systems to learn to predict ground truth estimates directly from the available data, and (ii) model difficulties of examples and learn representations of the annotators that allow us to estimate and take into account their competencies. Our approach is general and has many applications, including training more accurate models on crowdsourced data, ensemble learning, as well as classifier accuracy estimation from unlabeled data. We conduct an extensive experimental evaluation of our method on 5 crowdsourcing datasets of varied difficulty and show accuracy gains of up to 25% over the current state-of-the-art approaches for aggregating annotations, as well as significant reductions in the required annotation redundancy.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=rJlVdREKDS
PDF	https://openreview.net/pdf?id=rJlVdREKDS
PWC	https://paperswithcode.com/paper/learning-from-imperfect-annotations-an-end-to
Repo
Framework
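
The multi-stage baseline that the abstract above contrasts itself with (collect several annotations per example, aggregate them, then train) can be illustrated with plain majority voting. This is only a sketch of that baseline, not the paper's end-to-end method; the function name `majority_vote` and the toy data are hypothetical.

```python
from collections import Counter

def majority_vote(annotations):
    """Aggregate several labels per example into one label by majority vote.

    `annotations` maps an example id to the list of labels given by
    different annotators. Ties go to the label that was seen first.
    """
    return {
        example: Counter(labels).most_common(1)[0][0]
        for example, labels in annotations.items()
    }

# Toy crowdsourced data (hypothetical): three annotators per image.
annotations = {
    "img_1": ["cat", "cat", "dog"],
    "img_2": ["dog", "dog", "dog"],
}
labels = majority_vote(annotations)
```

Majority voting treats every annotator as equally reliable, which is the limitation the abstract's point (ii) on annotator competencies is aimed at.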

Fuzzing-Based Hard-Label Black-Box Attacks Against Machine Learning Models


Title	Fuzzing-Based Hard-Label Black-Box Attacks Against Machine Learning Models
Authors	Anonymous
Abstract	Machine learning models are known to be vulnerable to adversarial examples. Based on different levels of knowledge that attackers have about the models, adversarial example generation methods can be categorized into white-box and black-box attacks. We study the most realistic attacks, hard-label black-box attacks, where attackers have only query access to a model and only the final predicted labels are available. The main limitation of the existing hard-label black-box attacks is that they need a large number of model queries, making them inefficient and even infeasible in practice. Inspired by the very successful fuzz testing approach in traditional software testing and computer security domains, we propose fuzzing-based hard-label black-box attacks against machine learning models. We design an AdvFuzzer to explore multiple paths between a source image and a guidance image, and design a LocalFuzzer to explore the nearby space around a given input for identifying potential adversarial examples. We demonstrate that our fuzzing attacks are feasible and effective in generating successful adversarial examples with a significantly reduced number of model queries and L0 distance. More interestingly, supplied with a successful adversarial example as a seed, LocalFuzzer can immediately generate more successful adversarial examples even with smaller L2 distance from the source example, indicating that LocalFuzzer itself can be an independent and useful tool to augment many adversarial example generation algorithms.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=BklYhxBYwH
PDF	https://openreview.net/pdf?id=BklYhxBYwH
PWC	https://paperswithcode.com/paper/fuzzing-based-hard-label-black-box-attacks
Repo
Framework

SGD with Hardness Weighted Sampling for Distributionally Robust Deep Learning


Title	SGD with Hardness Weighted Sampling for Distributionally Robust Deep Learning
Authors	Anonymous
Abstract	Distributionally Robust Optimization (DRO) has been proposed as an alternative to Empirical Risk Minimization (ERM) in order to account for potential biases in the training data distribution. However, its use in deep learning has been severely restricted due to the relative inefficiency of the optimizers available for DRO compared to the widespread Stochastic Gradient Descent (SGD) based optimizers for deep learning with ERM. In this work, we demonstrate that SGD with hardness weighted sampling is a principled and efficient optimization method for DRO in machine learning and is particularly suited to the context of deep learning. Similar to a hard example mining strategy in essence and in practice, the proposed algorithm is straightforward to implement and computationally as efficient as SGD-based optimizers used for deep learning. It only requires adding a softmax layer and maintaining a history of the loss values for each training example to compute adaptive sampling probabilities. In contrast to typical ad hoc hard mining approaches, and exploiting recent theoretical results in deep learning optimization, we prove the convergence of our DRO algorithm for over-parameterized deep learning networks with ReLU activation and a finite number of layers and parameters. Preliminary results demonstrate the feasibility and usefulness of our approach.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=SyglyANFDr
PDF	https://openreview.net/pdf?id=SyglyANFDr
PWC	https://paperswithcode.com/paper/sgd-with-hardness-weighted-sampling-for
Repo
Framework
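
The abstract above states that the sampler needs only a softmax and a stored loss value per training example. A minimal sketch of that idea follows, under stated assumptions: the temperature-like parameter `beta`, sampling with replacement, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random

def hardness_weighted_probs(loss_history, beta=1.0):
    """Softmax over the stored per-example losses.

    Examples with a higher recorded loss get a higher sampling probability.
    `beta` is an assumed scaling factor; beta = 0 gives uniform sampling.
    """
    m = max(loss_history)  # subtract the max for numerical stability
    exps = [math.exp(beta * (loss - m)) for loss in loss_history]
    total = sum(exps)
    return [e / total for e in exps]

def sample_batch(loss_history, batch_size, beta=1.0, rng=None):
    """Draw example indices (with replacement) according to the softmax."""
    rng = rng or random.Random(0)
    probs = hardness_weighted_probs(loss_history, beta)
    return rng.choices(range(len(loss_history)), weights=probs, k=batch_size)

# Last recorded loss of four training examples (toy values).
loss_history = [0.1, 2.0, 0.5, 3.0]
probs = hardness_weighted_probs(loss_history)
batch = sample_batch(loss_history, batch_size=2)
```

In a training loop, the entries of `loss_history` for the sampled indices would be overwritten with the fresh losses after each step, so the sampling distribution adapts as the model improves.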

Curvature-based Robustness Certificates against Adversarial Examples


Title	Curvature-based Robustness Certificates against Adversarial Examples
Authors	Anonymous
Abstract	A robustness certificate against adversarial examples is the minimum distance of a given input to the decision boundary of the classifier (or its lower bound). For any perturbation of the input with a magnitude smaller than the certificate value, the classification output will provably remain unchanged. Computing exact robustness certificates for deep classifiers is difficult in general since it requires solving a non-convex optimization. In this paper, we provide computationally-efficient robustness certificates for deep classifiers with differentiable activation functions in two steps. First, we show that if the eigenvalues of the Hessian of the network (curvatures of the network) are bounded, we can compute a robustness certificate in the $l_2$ norm efficiently using convex optimization. Second, we derive a computationally-efficient differentiable upper bound on the curvature of a deep network. We also use the curvature bound as a regularization term during the training of the network to boost its certified robustness against adversarial examples. Putting these results together leads to our proposed Curvature-based Robustness Certificate (CRC) and Curvature-based Robust Training (CRT). Our numerical results show that CRC outperforms CROWN’s certificate by an order of magnitude while CRT leads to higher certified accuracy compared to standard adversarial training and TRADES.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Skgq1ANFDB
PDF	https://openreview.net/pdf?id=Skgq1ANFDB
PWC	https://paperswithcode.com/paper/curvature-based-robustness-certificates
Repo
Framework
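
To make the definition of a certificate in the abstract above concrete, here is the one case where it has a simple closed form: a binary linear classifier sign(w·x + b), whose $l_2$ distance to the decision boundary is |w·x + b| / ||w||. This is a textbook illustration only, not the paper's curvature-based method for deep networks; the function name and the toy numbers are hypothetical.

```python
import math

def linear_certificate(w, b, x):
    """Exact l2 distance from x to the hyperplane w.x + b = 0.

    Any perturbation of x with l2 norm below this value cannot change
    the sign of w.x + b, i.e. cannot change the predicted class.
    """
    margin = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(margin) / norm

w, b = [3.0, 4.0], -5.0
x = [3.0, 4.0]
radius = linear_certificate(w, b, x)

# Worst-case perturbation: move straight toward the boundary, but by
# slightly less than the certified radius.
step = 0.99 * radius
x_perturbed = [xi - step * wi / 5.0 for xi, wi in zip(x, w)]  # ||w|| = 5
margin_after = sum(wi * xi for wi, xi in zip(w, x_perturbed)) + b
```

For deep classifiers no such closed form exists, which is why the abstract resorts to bounds on the curvature.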

Dimensional Reweighting Graph Convolution Networks


Title	Dimensional Reweighting Graph Convolution Networks
Authors	Anonymous
Abstract	In this paper, we propose a method named Dimensional reweighting Graph Convolutional Networks (DrGCNs), to tackle the problem of variance between dimensional information in the node representations of GCNs. We prove that DrGCNs can reduce the variance of the node representations by connecting our problem to the theory of the mean field. However, in practice, we find that the degree to which DrGCNs help varies considerably across datasets. We revisit the problem and develop a new measure K to quantify the effect. This measure guides when we should use dimensional reweighting in GCNs and how much it can help. Moreover, it offers insights to explain the improvement obtained by the proposed DrGCNs. The dimensional reweighting block is lightweight and flexible enough to be built on most GCN variants. Carefully designed experiments, including several fixes on duplicates, information leaks, and wrong labels of the well-known node classification benchmark datasets, demonstrate the superior performance of DrGCNs over the existing state-of-the-art approaches. Significant improvements can also be observed on a large-scale industrial dataset.
Tasks	Node Classification
Published	2020-01-01
URL	https://openreview.net/forum?id=SJeLO34KwS
PDF	https://openreview.net/pdf?id=SJeLO34KwS
PWC	https://paperswithcode.com/paper/dimensional-reweighting-graph-convolution
Repo
Framework

The Differentiable Cross-Entropy Method


Title	The Differentiable Cross-Entropy Method
Authors	Anonymous
Abstract	We study the Cross-Entropy Method (CEM) for the non-convex optimization of a continuous and parameterized objective function and introduce a differentiable variant (DCEM) that enables us to differentiate the output of CEM with respect to the objective function’s parameters. In the machine learning setting this brings CEM inside of the end-to-end learning pipeline in cases where this has otherwise been impossible. We show applications in a synthetic energy-based structured prediction task and in non-convex continuous control. In the control setting we show on the simulated cheetah and walker tasks that we can embed their optimal action sequences with DCEM and then use policy optimization to fine-tune components of the controller as a step towards combining model-based and model-free RL.
Tasks	Continuous Control, Structured Prediction
Published	2020-01-01
URL	https://openreview.net/forum?id=HJluEeHKwH
PDF	https://openreview.net/pdf?id=HJluEeHKwH
PWC	https://paperswithcode.com/paper/the-differentiable-cross-entropy-method
Repo
Framework
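
For readers unfamiliar with the method the abstract above builds on, the following is a minimal sketch of the standard (non-differentiable) Cross-Entropy Method with a 1-D Gaussian sampling distribution. It is not the DCEM variant the paper introduces; the function name, hyperparameters, and toy objective are all hypothetical choices.

```python
import random
import statistics

def cem_minimize(f, mu=0.0, sigma=5.0, n_samples=50, n_elite=10,
                 n_iters=30, seed=0):
    """Minimize a scalar function f with the vanilla Cross-Entropy Method.

    Each iteration samples candidates from N(mu, sigma^2), keeps the
    n_elite candidates with the lowest objective value, and refits the
    Gaussian to those elites.
    """
    rng = random.Random(seed)
    for _ in range(n_iters):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        elite = sorted(samples, key=f)[:n_elite]
        mu = statistics.mean(elite)
        # Small floor keeps sigma strictly positive for the next draw.
        sigma = statistics.pstdev(elite) + 1e-6
    return mu

# Toy objective with its minimum at x = 3.
best = cem_minimize(lambda x: (x - 3.0) ** 2)
```

The hard top-k selection in `sorted(...)[:n_elite]` is the step that blocks gradients from flowing through the procedure, which is the obstacle a differentiable variant has to address.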

CopyCAT: Taking Control of Neural Policies with Constant Attacks


Title	CopyCAT: Taking Control of Neural Policies with Constant Attacks
Authors	Anonymous
Abstract	We propose a new perspective on adversarial attacks against deep reinforcement learning agents. Our main contribution is CopyCAT, a targeted attack able to consistently lure an agent into following an outsider’s policy. It is pre-computed, therefore fast at inference time, and could thus be usable in a real-time scenario. We show its effectiveness on Atari 2600 games in the novel read-only setting. In this setting, the adversary cannot directly modify the agent’s state (its representation of the environment) but can only attack the agent’s observation (its perception of the environment). Directly modifying the agent’s state would require write access to the agent’s inner workings, and we argue that this assumption is too strong in realistic settings.
Tasks	Atari Games
Published	2020-01-01
URL	https://openreview.net/forum?id=SyxoygBKwB
PDF	https://openreview.net/pdf?id=SyxoygBKwB
PWC	https://paperswithcode.com/paper/copycat-taking-control-of-neural-policies
Repo
Framework

Data Annealing Transfer learning Procedure for Informal Language Understanding Tasks


Title	Data Annealing Transfer learning Procedure for Informal Language Understanding Tasks
Authors	Anonymous
Abstract	There are many applications for informal language understanding tasks in the real world. However, because informal language understanding tasks suffer more from data noise than formal ones, there is a huge performance gap between formal and informal language understanding tasks. The recent pre-trained models that improved the performance of formal language understanding tasks did not improve performance on informal language as much. Although the formal tasks and informal tasks are similar in purpose, their language models significantly differ from each other. We propose a data annealing transfer learning procedure to bridge the performance gap on informal natural language understanding tasks. In the data annealing procedure, the training set contains mainly formal text data at first; then we gradually increase the proportion of the informal text data during the training process. We validate the data annealing procedure on three natural language understanding tasks: named entity recognition (NER), part-of-speech (POS) tagging, and chunking with two popular neural network models, LSTM and BERT. When BERT is fine-tuned with our learning procedure, it outperforms all the state-of-the-art models on the three informal tasks.
Tasks	Chunking, Named Entity Recognition, Part-Of-Speech Tagging, Transfer Learning
Published	2020-01-01
URL	https://openreview.net/forum?id=HJlys1BtwB
PDF	https://openreview.net/pdf?id=HJlys1BtwB
PWC	https://paperswithcode.com/paper/data-annealing-transfer-learning-procedure
Repo
Framework
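
The annealing schedule described in the abstract above (mostly formal text at first, then a growing share of informal text) might look roughly like the sketch below. The exponential decay, the values of `initial_ratio` and `decay`, and every function name here are assumptions made for illustration; the abstract does not specify the actual schedule.

```python
import random

def formal_ratio(step, initial_ratio=0.9, decay=0.95):
    """Assumed schedule: share of formal examples decays exponentially."""
    return initial_ratio * decay ** step

def build_batch(formal, informal, step, batch_size, rng):
    """Mix formal and informal examples according to the current ratio."""
    n_formal = round(batch_size * formal_ratio(step))
    n_informal = batch_size - n_formal
    # choices() samples with replacement, so small pools are fine.
    return (rng.choices(formal, k=n_formal)
            + rng.choices(informal, k=n_informal))

# Toy corpora (hypothetical sentence ids).
formal = [f"formal_{i}" for i in range(100)]
informal = [f"informal_{i}" for i in range(100)]

rng = random.Random(0)
batch_start = build_batch(formal, informal, step=0, batch_size=10, rng=rng)
batch_late = build_batch(formal, informal, step=50, batch_size=10, rng=rng)
```

Early batches are dominated by formal examples and late batches by informal ones, matching the direction of the procedure described in the abstract.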

A Gradient-Based Approach to Neural Networks Structure Learning


Title	A Gradient-Based Approach to Neural Networks Structure Learning
Authors	Anonymous
Abstract	Designing the architecture of deep neural networks (DNNs) requires human expertise and is a cumbersome task. One approach to automating this task has been to treat DNN architecture parameters such as the number of layers, the number of neurons per layer, or the activation function of each layer as hyper-parameters, and to use an external method for optimizing them. Here we propose a novel neural network model, called Farfalle Neural Network, in which important architecture features such as the number of neurons in each layer and the wiring among the neurons are automatically learned during the training process. We show that the proposed model can replace a stack of dense layers, which is used as a part of many DNN architectures. It can achieve higher accuracy using significantly fewer parameters.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=Bye-sxHFwB
PDF	https://openreview.net/pdf?id=Bye-sxHFwB
PWC	https://paperswithcode.com/paper/a-gradient-based-approach-to-neural-networks
Repo
Framework

NPTC-net: Narrow-Band Parallel Transport Convolutional Neural Network on Point Clouds


Title	NPTC-net: Narrow-Band Parallel Transport Convolutional Neural Network on Point Clouds
Authors	Anonymous
Abstract	Convolution plays a crucial role in various applications in signal and image processing, analysis and recognition. It is also the main building block of convolutional neural networks (CNNs). Designing appropriate convolutional neural networks on manifold-structured point clouds can carry recent advances of CNNs over to analyzing and processing point cloud data. However, one of the major challenges is to define a proper way to “sweep” filters through the point cloud as a natural generalization of the planar convolution and to reflect the point cloud’s geometry at the same time. In this paper, we consider generalizing convolution by adapting parallel transport on the point cloud. Inspired by a triangulated-surface-based method [DBLP:journals/corr/abs-1805-07857], we propose the Narrow-Band Parallel Transport Convolution (NPTC) using a specifically defined connection on a voxelized narrow-band approximation of point cloud data. With that, we further propose a deep convolutional neural network based on NPTC (called NPTC-net) for point cloud classification and segmentation. Comprehensive experiments show that the proposed NPTC-net achieves similar or better results than current state-of-the-art methods on point cloud classification and segmentation.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=SJl9PTNYDS
PDF	https://openreview.net/pdf?id=SJl9PTNYDS
PWC	https://paperswithcode.com/paper/nptc-net-narrow-band-parallel-transport-1
Repo
Framework

Self-supervised Training of Proposal-based Segmentation via Background Prediction


Title	Self-supervised Training of Proposal-based Segmentation via Background Prediction
Authors	Anonymous
Abstract	While supervised object detection and segmentation methods achieve impressive accuracy, they generalize poorly to images whose appearance significantly differs from the data they have been trained on. To address this in scenarios where annotating data is prohibitively expensive, we introduce a self-supervised approach to detection and segmentation, able to work with monocular images captured with a moving camera. At the heart of our approach lie the observations that object segmentation and background reconstruction are linked tasks, and that, for structured scenes, background regions can be re-synthesized from their surroundings, whereas regions depicting the object cannot. We encode this intuition as a self-supervised loss function that we exploit to train a proposal-based segmentation network. To account for the discrete nature of the proposals, we develop a Monte Carlo-based training strategy that allows the algorithm to explore the large space of object proposals. We apply our method to human detection and segmentation in images that visually depart from those of standard benchmarks, achieving competitive results compared to the few existing self-supervised methods and approaching the accuracy of supervised ones that exploit large annotated datasets.
Tasks	Human Detection, Object Detection, Semantic Segmentation
Published	2020-01-01
URL	https://openreview.net/forum?id=BJxSWeSYPB
PDF	https://openreview.net/pdf?id=BJxSWeSYPB
PWC	https://paperswithcode.com/paper/self-supervised-training-of-proposal-based-1
Repo
Framework

Toward Understanding Generalization of Over-parameterized Deep ReLU network trained with SGD in Student-teacher Setting


Title	Toward Understanding Generalization of Over-parameterized Deep ReLU network trained with SGD in Student-teacher Setting
Authors	Anonymous
Abstract	To analyze deep ReLU networks, we adopt a student-teacher setting in which an over-parameterized student network learns from the output of a fixed teacher network of the same depth, with Stochastic Gradient Descent (SGD). Our contributions are two-fold. First, we prove that when the gradient is zero (or bounded above by a small constant) at every data point in training, a situation called the interpolation setting, there exists a many-to-one alignment between student and teacher nodes in the lowest layer under mild conditions. This suggests that generalization to unseen data is achievable, even though the same condition often leads to zero training error. Second, analysis of noisy recovery and training dynamics in a 2-layer network shows that strong teacher nodes (with large fan-out weights) are learned first and subtle teacher nodes are left unlearned until a late stage of training. As a result, it could take a long time to converge to these small-gradient critical points. Our analysis shows that over-parameterization plays two roles: (1) it is a necessary condition for alignment to happen at the critical points, and (2) in training dynamics, it helps student nodes cover more teacher nodes with fewer iterations. Both improve generalization. Experiments support our findings.
Tasks
Published	2020-01-01
URL	https://openreview.net/forum?id=HJgcw0Etwr
PDF	https://openreview.net/pdf?id=HJgcw0Etwr
PWC	https://paperswithcode.com/paper/toward-understanding-generalization-of-over
Repo
Framework