Paper Group NANR 107
PowerSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization. Certified Defenses for Adversarial Patches. Mixture-of-Experts Variational Autoencoder for clustering and generating from similarity-based representations. Training Interpretable Convolutional Neural Networks towards Class-specific Filters. All SMILES Var …
PowerSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization
Title | PowerSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization |
Authors | Anonymous |
Abstract | In this paper, we propose a novel technique for improving the stochastic gradient descent (SGD) method to train deep networks, which we term \emph{PowerSGD}. The proposed PowerSGD method simply raises the stochastic gradient to a certain power $\gamma\in[0,1]$ at each iteration and introduces only one additional parameter, namely, the power exponent $\gamma$ (when $\gamma=1$, PowerSGD reduces to SGD). We further propose PowerSGD with momentum, which we term \emph{PowerSGDM}, and provide convergence rate analysis for both PowerSGD and PowerSGDM. Experiments are conducted on popular deep learning models and benchmark datasets. Empirical results show that the proposed PowerSGD and PowerSGDM obtain faster initial training speed than adaptive gradient methods, generalization ability comparable to SGD, and improved robustness to hyper-parameter selection and vanishing gradients. PowerSGD is essentially a gradient modifier via a nonlinear transformation. As such, it is orthogonal and complementary to other techniques for accelerating gradient-based optimization. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJlqoTEtDB |
https://openreview.net/pdf?id=rJlqoTEtDB | |
PWC | https://paperswithcode.com/paper/powersgd-powered-stochastic-gradient-descent |
Repo | |
Framework | |
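The update rule described in the abstract lends itself to a compact sketch. Below is a minimal NumPy illustration of a PowerSGD-style step, under the assumption that the power is applied elementwise as sign(g)·|g|^γ (so γ = 1 recovers plain SGD); the hyper-parameter values and the toy quadratic loss are illustrative, not taken from the paper.

```python
# Minimal sketch of a PowerSGD-style update (assumption: the power is applied
# elementwise as sign(g) * |g|**gamma, reducing to plain SGD when gamma = 1).
import numpy as np

def powersgd_step(w, grad, lr=0.01, gamma=0.5):
    """One PowerSGD-style parameter update."""
    powered = np.sign(grad) * np.abs(grad) ** gamma  # nonlinear gradient transform
    return w - lr * powered

def powersgdm_step(w, grad, velocity, lr=0.01, gamma=0.5, momentum=0.9):
    """PowerSGDM: the same transform combined with heavy-ball momentum."""
    powered = np.sign(grad) * np.abs(grad) ** gamma
    velocity = momentum * velocity + powered
    return w - lr * velocity, velocity

# Toy usage on the quadratic loss 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([2.0, -3.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = powersgdm_step(w, w, v)
print(w)  # approaches the minimizer at the origin
```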
Certified Defenses for Adversarial Patches
Title | Certified Defenses for Adversarial Patches |
Authors | Anonymous |
Abstract | Adversarial patch attacks were recently recognized as the most practical threat model against real-world computer vision systems. Most published defenses against patch attacks are based on preprocessing input images to mitigate adversarial noise. The first contribution of this paper is a set of experiments demonstrating that such defense strategies can easily be broken by white-box adversaries. Motivated by this finding, we present an extension of certified defense algorithms and propose significantly faster variants for robust training against patch attacks. Finally, we experiment with different patch shapes for testing, and observe that robustness to such attacks transfers surprisingly well. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HyeaSkrYPH |
https://openreview.net/pdf?id=HyeaSkrYPH | |
PWC | https://paperswithcode.com/paper/certified-defenses-for-adversarial-patches |
Repo | |
Framework | |
Mixture-of-Experts Variational Autoencoder for clustering and generating from similarity-based representations
Title | Mixture-of-Experts Variational Autoencoder for clustering and generating from similarity-based representations |
Authors | Anonymous |
Abstract | Clustering high-dimensional data, such as images or biological measurements, is a long-standing problem and has been studied extensively. Recently, Deep Clustering gained popularity due to the non-linearity of neural networks, which allows for flexibility in fitting the specific peculiarities of complex data. Here we introduce the Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model. The model can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency. MoE-Sim-VAE is based on a Variational Autoencoder (VAE), where the decoder consists of a Mixture-of-Experts (MoE) architecture. This specific architecture allows for various modes of the data to be automatically learned by means of the experts. Additionally, we encourage the latent representation of our model to follow a Gaussian mixture distribution and to accurately represent the similarities between the data points. We assess the performance of our model on synthetic data, the MNIST benchmark data set, and a challenging real-world task of defining cell subpopulations from mass cytometry (CyTOF) measurements on hundreds of different datasets. MoE-Sim-VAE exhibits superior clustering performance on all these tasks in comparison to the baselines and we show that the MoE architecture in the decoder reduces the computational cost of sampling specific data modes with high fidelity. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJgs8TVtvr |
https://openreview.net/pdf?id=SJgs8TVtvr | |
PWC | https://paperswithcode.com/paper/mixture-of-experts-variational-autoencoder-1 |
Repo | |
Framework | |
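A rough sketch of the mixture-of-experts decoder idea follows: a gating network assigns each latent code to experts, and each expert is a small decoder intended to cover one data mode. The layer sizes, number of experts, and soft gating below are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a mixture-of-experts VAE decoder in the spirit of MoE-Sim-VAE.
import torch
import torch.nn as nn

class MoEDecoder(nn.Module):
    def __init__(self, latent_dim=10, data_dim=784, n_experts=5, hidden=128):
        super().__init__()
        self.gate = nn.Linear(latent_dim, n_experts)  # which expert(s) handle this latent code
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, data_dim), nn.Sigmoid())
            for _ in range(n_experts)
        ])

    def forward(self, z):
        weights = torch.softmax(self.gate(z), dim=-1)            # (B, n_experts)
        outputs = torch.stack([e(z) for e in self.experts], 1)   # (B, n_experts, D)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)      # mode-weighted reconstruction

z = torch.randn(4, 10)
print(MoEDecoder()(z).shape)  # torch.Size([4, 784])
```

Sampling a specific data mode then amounts to drawing z from the corresponding Gaussian mixture component and letting the gate route it to the matching expert.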
Training Interpretable Convolutional Neural Networks towards Class-specific Filters
Title | Training Interpretable Convolutional Neural Networks towards Class-specific Filters |
Authors | Anonymous |
Abstract | Convolutional neural networks (CNNs) have often been treated as “black boxes” and successfully used in a range of tasks. However, CNNs still suffer from the problem of filter ambiguity – an intricate many-to-many mapping relationship between filters and features, which undermines the models’ interpretability. To interpret CNNs, most existing works attempt to explain a pre-trained model while neglecting to reduce the filter ambiguity behind it. To this end, we propose a simple but effective strategy for training interpretable CNNs. Specifically, we propose a novel Label Sensitive Gate (LSG) structure that enables the model to learn disentangled filters in a supervised manner, in which redundant channels are periodically shut down as they flow through a learnable gate that varies with the input labels. To reduce redundant filters during training, LSG is constrained with a sparsity regularization. In this way, the training strategy focuses each filter’s attention on just one or a few classes, making it class-specific. Extensive experiments demonstrate the strong performance of our method in generating sparse and highly label-related representations of the input. Moreover, compared to the standard training strategy, our model displays less redundancy and stronger interpretability. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1ltnp4KwS |
https://openreview.net/pdf?id=r1ltnp4KwS | |
PWC | https://paperswithcode.com/paper/training-interpretable-convolutional-neural |
Repo | |
Framework | |
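The gating mechanism can be sketched as a per-class channel mask with a sparsity penalty. The gate parameterization and penalty below are assumptions for illustration, not the paper's exact LSG design.

```python
# Minimal sketch of a label-sensitive gating idea: a learnable per-class gate
# masks convolutional channels during training, and an L1-style penalty
# encourages sparse (class-specific) gates.
import torch
import torch.nn as nn

class LabelSensitiveGate(nn.Module):
    def __init__(self, num_classes, num_channels):
        super().__init__()
        # One gate vector per class, squashed by a sigmoid to lie in (0, 1).
        self.logits = nn.Parameter(torch.zeros(num_classes, num_channels))

    def forward(self, features, labels):
        gates = torch.sigmoid(self.logits[labels])      # (B, C) gate per example
        gated = features * gates[:, :, None, None]      # shut down channels per label
        sparsity = gates.abs().mean()                   # sparsity regularizer on the gates
        return gated, sparsity

feats = torch.randn(8, 64, 14, 14)
labels = torch.randint(0, 10, (8,))
gate = LabelSensitiveGate(num_classes=10, num_channels=64)
out, reg = gate(feats, labels)
# total_loss = task_loss + lambda_sparsity * reg   (lambda_sparsity is a hyper-parameter)
```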
All SMILES Variational Autoencoder for Molecular Property Prediction and Optimization
Title | All SMILES Variational Autoencoder for Molecular Property Prediction and Optimization |
Authors | Anonymous |
Abstract | Variational autoencoders (VAEs) defined over SMILES string and graph-based representations of molecules promise to improve the optimization of molecular properties, thereby revolutionizing the pharmaceuticals and materials industries. However, these VAEs are hindered by the non-unique nature of SMILES strings and the computational cost of graph convolutions. To efficiently pass messages along all paths through the molecular graph, we encode multiple SMILES strings of a single molecule using a set of stacked recurrent neural networks, harmonizing hidden representations of each atom between SMILES representations, and use attentional pooling to build a final fixed-length latent representation. By then decoding to a disjoint set of SMILES strings of the molecule, our All SMILES VAE learns an almost bijective mapping between molecules and latent representations near the high-probability-mass subspace of the prior. Our SMILES-derived but molecule-based latent representations significantly surpass the state-of-the-art in a variety of fully- and semi-supervised property regression and molecular property optimization tasks. |
Tasks | Molecular Property Prediction |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rkxUfANKwB |
https://openreview.net/pdf?id=rkxUfANKwB | |
PWC | https://paperswithcode.com/paper/all-smiles-variational-autoencoder-for |
Repo | |
Framework | |
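One concrete ingredient named in the abstract is attentional pooling of per-atom hidden states into a fixed-length latent. The sketch below shows a single-query attention pool feeding a Gaussian latent; the dimensions and this particular attention form are illustrative assumptions, not the paper's exact encoder.

```python
# Minimal sketch of attentional pooling over per-token hidden states, building
# a fixed-length VAE latent from an RNN encoding of a SMILES string.
import torch
import torch.nn as nn

class AttentionalPool(nn.Module):
    def __init__(self, hidden_dim=64, latent_dim=32):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)                   # scalar attention score per token
        self.to_latent = nn.Linear(hidden_dim, 2 * latent_dim)  # mean and log-variance

    def forward(self, hidden_states):
        # hidden_states: (B, T, H) from a recurrent encoder over SMILES tokens.
        weights = torch.softmax(self.score(hidden_states), dim=1)  # (B, T, 1)
        pooled = (weights * hidden_states).sum(dim=1)              # (B, H)
        mu, logvar = self.to_latent(pooled).chunk(2, dim=-1)
        return mu, logvar

h = torch.randn(4, 40, 64)                               # e.g. RNN outputs for 40 SMILES tokens
mu, logvar = AttentionalPool()(h)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterized latent sample
```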
Improving the Generalization of Visual Navigation Policies using Invariance Regularization
Title | Improving the Generalization of Visual Navigation Policies using Invariance Regularization |
Authors | Michel Aractingi, Christopher Dance, Julien Perez, Tomi Silander |
Abstract | Training agents to operate in one environment often yields overfitted models that are unable to generalize to changes in that environment. However, due to the numerous variations that can occur in the real world, an agent must often be robust in order to be useful. This has not been the case for agents trained with reinforcement learning (RL) algorithms. In this paper, we investigate the overfitting of RL agents to the training environments in visual navigation tasks. Our experiments show that deep RL agents can overfit even when trained on multiple environments simultaneously. We propose a regularization method which combines RL with supervised learning by adding a term to the RL objective that encourages the invariance of a policy to variations in the observations that ought not to affect the action taken. The results of this method, called invariance regularization, show an improvement in the generalization of policies to environments not seen during training. |
Tasks | Visual Navigation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=B1xtFpVtvB |
https://openreview.net/pdf?id=B1xtFpVtvB | |
PWC | https://paperswithcode.com/paper/improving-the-generalization-of-visual |
Repo | |
Framework | |
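The added objective term can be sketched directly: the policy's outputs on an observation and on a visually altered version of it are pushed together. The MSE form and the synthetic perturbation below are illustrative assumptions; the paper's exact regularizer and the way perturbed observations are obtained may differ.

```python
# Minimal sketch of an invariance-regularized policy loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

policy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4))  # toy policy over 4 discrete actions

def invariance_regularizer(policy, obs, perturbed_obs):
    """Penalize differences between policy outputs on paired observations."""
    logits = policy(obs)
    logits_perturbed = policy(perturbed_obs)
    return F.mse_loss(logits_perturbed, logits.detach())

obs = torch.rand(8, 3, 32, 32)
perturbed = (obs + 0.1 * torch.randn_like(obs)).clamp(0, 1)  # stand-in for a visual variation
reg = invariance_regularizer(policy, obs, perturbed)
# total_loss = rl_loss + beta * reg   (beta weights the invariance term)
```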
PatchFormer: A neural architecture for self-supervised representation learning on images
Title | PatchFormer: A neural architecture for self-supervised representation learning on images |
Authors | Anonymous |
Abstract | Learning rich representations from predictive learning without labels has been a longstanding challenge in the field of machine learning. Generative pre-training has so far not been as successful as contrastive methods at modeling representations of raw images. In this paper, we propose a neural architecture for self-supervised representation learning on raw images, called the PatchFormer, which learns to model spatial dependencies across patches in a raw image. Our method learns to model the conditional probability distribution of missing patches given the context of surrounding patches. We evaluate the utility of the learned representations by fine-tuning the pre-trained model on low-data-regime classification tasks. Specifically, we benchmark our model on semi-supervised ImageNet classification, which has recently become a popular benchmark for semi-supervised and self-supervised learning methods. Our model achieves 30.3% and 65.5% top-1 accuracy when trained using only 1% and 10% of the ImageNet labels, respectively, showing the promise of generative pre-training methods. |
Tasks | Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJg1lxrYwS |
https://openreview.net/pdf?id=SJg1lxrYwS | |
PWC | https://paperswithcode.com/paper/patchformer-a-neural-architecture-for-self |
Repo | |
Framework | |
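The pre-training task can be sketched as masked-patch reconstruction: split an image into patches, hide some, and train a model to predict the missing patches from the visible context. The patch size, masking rate, and the small transformer encoder below are illustrative assumptions, not the PatchFormer architecture itself.

```python
# Minimal sketch of masked-patch generative pre-training.
import torch
import torch.nn as nn

def patchify(images, patch=8):
    # (B, C, H, W) -> (B, num_patches, C * patch * patch)
    b, c, h, w = images.shape
    x = images.unfold(2, patch, patch).unfold(3, patch, patch)
    return x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch * patch)

class PatchPredictor(nn.Module):
    def __init__(self, patch_dim=3 * 8 * 8, d_model=128):
        super().__init__()
        self.embed = nn.Linear(patch_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, patch_dim)

    def forward(self, patches, mask):
        tokens = self.embed(patches) * (~mask).unsqueeze(-1)  # zero out the hidden patches
        return self.head(self.encoder(tokens))

images = torch.rand(2, 3, 32, 32)
patches = patchify(images)                       # (2, 16, 192)
mask = torch.rand(2, 16) < 0.5                   # hide roughly half of the patches
pred = PatchPredictor()(patches, mask)
loss = ((pred - patches) ** 2)[mask].mean()      # reconstruct only the hidden patches
```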
Learning from Positive and Unlabeled Data with Adversarial Training
Title | Learning from Positive and Unlabeled Data with Adversarial Training |
Authors | Anonymous |
Abstract | Positive-unlabeled (PU) learning learns a binary classifier using only positive and unlabeled examples, without labeled negative examples. This paper shows that the GAN (Generative Adversarial Networks) style of adversarial training is well suited to PU learning. A GAN learns a generator to generate data (e.g., images) to fool a discriminator which tries to determine whether the generated data belong to a (positive) training class. PU learning is similar and can naturally be cast as trying to identify (not generate) likely positive data from the unlabeled set (U), also in order to fool a discriminator that determines whether the identified data are indeed positive (P). A direct adaptation of GAN for PU learning does not produce a strong classifier. This paper proposes a more effective method called Predictive Adversarial Networks (PAN), using a new objective function based on KL-divergence, which performs much better. Empirical evaluation using both image and text data shows the effectiveness of PAN. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HygPjlrYvB |
https://openreview.net/pdf?id=HygPjlrYvB | |
PWC | https://paperswithcode.com/paper/learning-from-positive-and-unlabeled-data-2 |
Repo | |
Framework | |
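The adversarial setup can be sketched structurally: a classifier scores unlabeled examples as likely positives, and a discriminator tries to tell labeled positives from those selections. The KL-divergence-based objective of PAN is not reproduced here; the sketch below uses weighted binary cross-entropy purely as a placeholder, so it shows the training roles rather than the paper's actual loss.

```python
# Structural sketch of adversarial PU learning: classifier selects likely
# positives from U, discriminator tries to separate them from labeled P.
import torch
import torch.nn as nn

dim = 20
classifier = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
discriminator = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
bce = nn.BCELoss(reduction="none")

positives = torch.randn(32, dim) + 1.0   # toy positive set P
unlabeled = torch.randn(32, dim)         # toy unlabeled set U (a mix of positives and negatives)

# Discriminator step: real positives vs. unlabeled points, the latter weighted
# by how strongly the classifier currently selects them as "likely positive".
weights = classifier(unlabeled).detach()
d_p, d_u = discriminator(positives), discriminator(unlabeled)
d_loss = bce(d_p, torch.ones_like(d_p)).mean() + (weights * bce(d_u, torch.zeros_like(d_u))).mean()

# Classifier step: make the selected unlabeled points look positive to D.
weights = classifier(unlabeled)
d_u = discriminator(unlabeled)
c_loss = (weights * bce(d_u, torch.ones_like(d_u))).mean()
```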
Stagnant zone segmentation with U-net
Title | Stagnant zone segmentation with U-net |
Authors | Selam Waktola, Laurent Babout, Krzysztof Grudzien |
Abstract | Monitoring silo discharge for industrial or research applications depends on computerized segmentation of different image regions, such as stagnant and flowing zones, which is a challenging task. X-ray Computed Tomography (CT) is a powerful non-destructive technique for obtaining cross-sectional images of a 3D object based on X-ray absorption. Compared to other imaging techniques, CT is well suited to investigating granular flow phenomena and segmenting the stagnant zone. However, manual segmentation is tedious and error-prone, so automatic and precise strategies are required. In the present work, a U-net architecture is used to segment the stagnant zone during the silo discharging process. The proposed image segmentation method provides fast and effective results by exploiting a convolutional neural network, achieving an accuracy of 97 percent. |
Tasks | Computed Tomography (CT), Semantic Segmentation |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1eH9hNtwr |
https://openreview.net/pdf?id=H1eH9hNtwr | |
PWC | https://paperswithcode.com/paper/stagnant-zone-segmentation-with-u-net |
Repo | |
Framework | |
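For reference, the overall setup amounts to training an encoder-decoder network with skip connections on CT slices against binary stagnant-zone masks. The tiny U-Net below is a generic sketch; channel counts, depth, and the training configuration of the paper are not specified here.

```python
# Minimal sketch of a small U-Net for binary (stagnant vs. non-stagnant) segmentation.
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1, self.enc2 = block(1, 16), block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = block(32, 16)
        self.head = nn.Conv2d(16, 1, 1)          # per-pixel logit for the stagnant zone

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.head(d)

slices = torch.rand(2, 1, 64, 64)                        # toy grayscale CT slices
masks = (torch.rand(2, 1, 64, 64) > 0.5).float()         # toy binary ground-truth masks
logits = TinyUNet()(slices)
loss = nn.BCEWithLogitsLoss()(logits, masks)
```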
Towards a Deep Network Architecture for Structured Smoothness
Title | Towards a Deep Network Architecture for Structured Smoothness |
Authors | Anonymous |
Abstract | We propose the Fixed Grouping Layer (FGL); a novel feedforward layer designed to incorporate the inductive bias of structured smoothness into a deep learning model. FGL achieves this goal by connecting nodes across layers based on spatial similarity. The use of structured smoothness, as implemented by FGL, is motivated by applications to structured spatial data, which is, in turn, motivated by domain knowledge. The proposed model architecture outperforms conventional neural network architectures across a variety of simulated and real datasets with structured smoothness. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hklr204Fvr |
https://openreview.net/pdf?id=Hklr204Fvr | |
PWC | https://paperswithcode.com/paper/towards-a-deep-network-architecture-for |
Repo | |
Framework | |
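The core idea of connecting units across layers according to spatial structure can be sketched as a linear layer whose weights are masked by a fixed group-membership matrix. The grouping by 1-D coordinate below is an illustrative assumption; the paper derives groups from spatial similarity in the data.

```python
# Minimal sketch of a fixed-grouping-style layer: each output unit is connected
# only to the input features assigned to its spatial group.
import torch
import torch.nn as nn

class FixedGroupingLayer(nn.Module):
    def __init__(self, group_ids, num_groups):
        super().__init__()
        in_dim = len(group_ids)
        mask = torch.zeros(num_groups, in_dim)
        mask[group_ids, torch.arange(in_dim)] = 1.0   # fixed, data-derived connectivity pattern
        self.register_buffer("mask", mask)
        self.weight = nn.Parameter(torch.randn(num_groups, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(num_groups))

    def forward(self, x):
        return x @ (self.weight * self.mask).t() + self.bias

# 100 input features laid out on a line, partitioned into 10 contiguous groups.
coords = torch.arange(100)
group_ids = coords // 10
layer = FixedGroupingLayer(group_ids, num_groups=10)
out = layer(torch.randn(4, 100))    # (4, 10): one unit per spatial group
```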
Adaptive Generation of Unrestricted Adversarial Inputs
Title | Adaptive Generation of Unrestricted Adversarial Inputs |
Authors | Anonymous |
Abstract | Neural networks are vulnerable to adversarially-constructed perturbations of their inputs. Most research so far has considered perturbations of a fixed magnitude under some $l_p$ norm. Although studying these attacks is valuable, there has been increasing interest in the construction of—and robustness to—unrestricted attacks, which are not constrained to a small and rather artificial subset of all possible adversarial inputs. We introduce a novel algorithm for generating such unrestricted adversarial inputs which, unlike prior work, is adaptive: it is able to tune its attacks to the classifier being targeted. It also offers a 400–2,000× speedup over the existing state of the art. We demonstrate our approach by generating unrestricted adversarial inputs that fool classifiers robust to perturbation-based attacks. We also show that, by virtue of being adaptive and unrestricted, our attack is able to bypass adversarial training against it. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=rJg46kHYwH |
https://openreview.net/pdf?id=rJg46kHYwH | |
PWC | https://paperswithcode.com/paper/adaptive-generation-of-unrestricted |
Repo | |
Framework | |
A Functional Characterization of Randomly Initialized Gradient Descent in Deep ReLU Networks
Title | A Functional Characterization of Randomly Initialized Gradient Descent in Deep ReLU Networks |
Authors | Anonymous |
Abstract | Despite their popularity and successes, deep neural networks are poorly understood theoretically and are often treated as ‘black box’ systems. A functional view of these networks gives us a useful new lens with which to understand them. It allows us to theoretically or experimentally probe properties of these networks, including the effect of standard initializations, the value of depth, the underlying loss surface, and the origins of generalization. One key result is that generalization arises from smoothness of the functional approximation, combined with a flat initial approximation. This smoothness increases with the number of units, explaining why massively overparameterized networks continue to generalize well. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJl9PRVKDS |
https://openreview.net/pdf?id=BJl9PRVKDS | |
PWC | https://paperswithcode.com/paper/a-functional-characterization-of-randomly |
Repo | |
Framework | |
A closer look at network resolution for efficient network design
Title | A closer look at network resolution for efficient network design |
Authors | Anonymous |
Abstract | There is growing interest in designing lightweight neural networks for mobile and embedded vision applications. Previous works typically reduce computations from the structure level. For example, group convolution based methods reduce computations by factorizing a vanilla convolution into depth-wise and point-wise convolutions. Pruning based methods prune redundant connections in the network structure. In this paper, we explore the importance of network input for achieving optimal accuracy-efficiency trade-off. Reducing input scale is a simple yet effective way to reduce computational cost. It does not require careful network module design, specific hardware optimization and network retraining after pruning. Moreover, different input scales contain different representations to learn. We propose a framework to mutually learn from different input resolutions and network widths. With the shared knowledge, our framework is able to find better width-resolution balance and capture multi-scale representations. It achieves consistently better ImageNet top-1 accuracy over US-Net under different computation constraints, and outperforms the best compound scale model of EfficientNet by 1.5%. The superiority of our framework is also validated on COCO object detection and instance segmentation as well as transfer learning. |
Tasks | Instance Segmentation, Object Detection, Semantic Segmentation, Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=H1x-pANtDB |
https://openreview.net/pdf?id=H1x-pANtDB | |
PWC | https://paperswithcode.com/paper/a-closer-look-at-network-resolution-for-1 |
Repo | |
Framework | |
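The mutual-learning idea can be sketched as follows: the full-resolution forward pass is trained on the labels, and lower-resolution passes of the same network are trained to match its (detached) predictions. The chosen resolutions, the toy model, and the KL-based distillation term are illustrative assumptions; the paper's full framework also varies network width.

```python
# Minimal sketch of mutual learning across input resolutions.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for a CNN classifier that accepts any input resolution.
model = nn.Sequential(nn.AdaptiveAvgPool2d(8), nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
images, labels = torch.rand(16, 3, 224, 224), torch.randint(0, 10, (16,))

full_logits = model(images)
loss = F.cross_entropy(full_logits, labels)              # supervised loss at full resolution
soft_targets = F.softmax(full_logits.detach(), dim=1)
for res in (192, 160, 128):                              # sub-resolution passes
    small = F.interpolate(images, size=res, mode="bilinear", align_corners=False)
    logp = F.log_softmax(model(small), dim=1)
    loss = loss + F.kl_div(logp, soft_targets, reduction="batchmean")  # match full-resolution predictions
loss.backward()
```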
Match prediction from group comparison data using neural networks
Title | Match prediction from group comparison data using neural networks |
Authors | Anonymous |
Abstract | We explore the match prediction problem, where one seeks to estimate the likelihood that a group of M items is preferred over another, based on partial group comparison data. Challenges arise in practice. As existing state-of-the-art algorithms are tailored to certain statistical models, the best algorithm differs across distinct scenarios. Worse yet, we have no prior knowledge of the underlying model for a given scenario. This calls for a unified approach that can be universally applied to a wide range of scenarios and achieve consistently high performance. To this end, we incorporate deep learning architectures that reflect the key structural features which most state-of-the-art algorithms, some of which are optimal in certain settings, share in common. This enables us to infer the hidden models underlying a given dataset, which govern in-group interactions and statistical patterns of comparisons, and hence to devise the best algorithm tailored to the dataset at hand. Through extensive experiments on synthetic and real-world datasets, we evaluate our framework in comparison to state-of-the-art algorithms. Our framework consistently leads to the best performance across all datasets in terms of cross-entropy loss and prediction accuracy, while the state-of-the-art algorithms suffer from inconsistent performance across datasets. Furthermore, we show that our framework can easily be extended to attain satisfactory performance in rank aggregation tasks, suggesting that it can be adapted to other tasks as well. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=BJxYUaVtPB |
https://openreview.net/pdf?id=BJxYUaVtPB | |
PWC | https://paperswithcode.com/paper/match-prediction-from-group-comparison-data |
Repo | |
Framework | |
Adversarial Training with Voronoi Constraints
Title | Adversarial Training with Voronoi Constraints |
Authors | Anonymous |
Abstract | Adversarial examples are a pervasive phenomenon of machine learning models where seemingly imperceptible perturbations to the input lead to misclassifications for otherwise statistically accurate models. Adversarial training, one of the most successful empirical defenses to adversarial examples, refers to training on adversarial examples generated within a geometric constraint set. The most commonly used geometric constraint is an $L_p$-ball of radius $\epsilon$ in some norm. We introduce adversarial training with Voronoi constraints, which replaces the $L_p$-ball constraint with the Voronoi cell for each point in the training set. We show that adversarial training with Voronoi constraints produces robust models which significantly improve over the state-of-the-art on MNIST and are competitive on CIFAR-10. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJeb9xSYwB |
https://openreview.net/pdf?id=HJeb9xSYwB | |
PWC | https://paperswithcode.com/paper/adversarial-training-with-voronoi-constraints-1 |
Repo | |
Framework | |
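The geometric constraint can be sketched concretely: after each attack step, the perturbed point is kept only if it remains in the Voronoi cell of its source example, i.e. closer to that example than to any other training point. The accept/reject check and FGSM-style step below are illustrative simplifications; the paper's attack may instead project onto the cell.

```python
# Minimal sketch of a Voronoi-constrained attack step for adversarial training.
import numpy as np

def in_voronoi_cell(x_adv, x_source, other_points):
    """True if x_adv is closer to x_source than to every other training point."""
    d_source = np.linalg.norm(x_adv - x_source)
    d_others = np.linalg.norm(other_points - x_adv, axis=1)
    return d_source <= d_others.min() + 1e-12

def voronoi_constrained_step(x_adv, x_source, grad, other_points, step=0.1):
    candidate = x_adv + step * np.sign(grad)     # FGSM-style ascent step
    return candidate if in_voronoi_cell(candidate, x_source, other_points) else x_adv

train = np.random.randn(100, 2)                  # toy training set
x = train[0]                                     # source example being attacked
others = np.delete(train, 0, axis=0)
x_adv = x.copy()
for _ in range(10):
    fake_grad = np.random.randn(2)               # stand-in for a loss gradient
    x_adv = voronoi_constrained_step(x_adv, x, fake_grad, others, step=0.05)
```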