Paper Group NANR 32
Shape Features Improve General Model Robustness
Title | Shape Features Improve General Model Robustness |
Authors | Anonymous |
Abstract | Recent studies show that convolutional neural networks (CNNs) are vulnerable under various settings, including adversarial examples, backdoor attacks, and distribution shifting. Motivated by the findings that the human visual system pays more attention to global structure (e.g., shape) for recognition while CNNs are biased towards local texture features in images, we propose a unified framework EdgeGANRob based on robust edge features to improve the robustness of CNNs in general, which first explicitly extracts shape/structure features from a given image and then reconstructs a new image by refilling the texture information with a trained generative adversarial network (GAN). In addition, to reduce the sensitivity of the edge detection algorithm to adversarial perturbation, we propose a robust edge detection approach, Robust Canny, based on the vanilla Canny algorithm. To gain more insights, we also compare EdgeGANRob with its simplified backbone procedure EdgeNetRob, which performs learning tasks directly on the extracted robust edge features. We find that EdgeNetRob can boost model robustness significantly but at the cost of clean model accuracy. EdgeGANRob, on the other hand, improves clean model accuracy compared with EdgeNetRob without losing the robustness benefits introduced by EdgeNetRob. Extensive experiments show that EdgeGANRob is resilient in different learning tasks under diverse settings. |
Tasks | Edge Detection |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJlPZlStwS |
https://openreview.net/pdf?id=SJlPZlStwS | |
PWC | https://paperswithcode.com/paper/shape-features-improve-general-model |
Repo | |
Framework | |
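A minimal sketch of the edge-then-refill pipeline the abstract describes, assuming a hypothetical pre-trained generator `refill_gan` and classifier `classifier`; the smoothing/quantization details of Robust Canny are not reproduced here, only plain Canny on a blurred image.

```python
import cv2
import numpy as np
import torch

def edge_refill_classify(image_bgr, refill_gan, classifier, low=100, high=200):
    """Sketch of an EdgeGANRob-style inference path: extract edges, refill texture, classify.

    `refill_gan` and `classifier` are hypothetical pre-trained torch modules; the
    paper's Robust Canny variant is approximated by vanilla cv2.Canny on a blurred image.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)           # mild smoothing before edge detection
    edges = cv2.Canny(gray, low, high).astype(np.float32) / 255.0

    edge_tensor = torch.from_numpy(edges)[None, None]  # shape (1, 1, H, W)
    with torch.no_grad():
        refilled = refill_gan(edge_tensor)             # GAN refills texture from the edge map
        logits = classifier(refilled)                  # classify the reconstructed image
    return logits.argmax(dim=1)
```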
Deep Network classification by Scattering and Homotopy dictionary learning
Title | Deep Network classification by Scattering and Homotopy dictionary learning |
Authors | Anonymous |
Abstract | We introduce a sparse scattering deep convolutional neural network, which provides a simple model to analyze properties of deep representation learning for classification. Learning a single dictionary matrix with a classifier yields a higher classification accuracy than AlexNet over the ImageNet ILSVRC2012 dataset. The network first applies a scattering transform which linearizes variabilities due to geometric transformations such as translations and small deformations. A sparse l1 dictionary coding reduces intra-class variability while preserving class separation through projections over unions of linear spaces. It is implemented in a deep convolutional network with a homotopy algorithm having an exponential convergence. A convergence proof is given in a general framework including ALISTA. Classification results are analyzed over ImageNet. |
Tasks | Dictionary Learning, Representation Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SJxWS64FwH |
https://openreview.net/pdf?id=SJxWS64FwH | |
PWC | https://paperswithcode.com/paper/deep-network-classification-by-scattering-and-1 |
Repo | |
Framework | |
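The sparse l1 dictionary coding step can be illustrated with a plain ISTA iteration; the paper's homotopy algorithm (related to ALISTA) converges exponentially, so the numpy sketch below only shows the generic proximal update that it accelerates. The dictionary `D` and regularization `lam` are illustrative.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_sparse_code(x, D, lam=0.1, n_iter=100):
    """Baseline ISTA for  min_z 0.5*||x - D z||^2 + lam*||z||_1.

    This is the generic sparse-coding iteration; the homotopy scheme in the
    paper solves the same problem with an exponential convergence rate.
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        z = soft_threshold(z - grad / L, lam / L)
    return z
```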
HUBERT Untangles BERT to Improve Transfer across NLP Tasks
Title | HUBERT Untangles BERT to Improve Transfer across NLP Tasks |
Authors | Anonymous |
Abstract | We introduce HUBERT which combines the structured-representational power of Tensor-Product Representations (TPRs) and BERT, a pre-trained bidirectional transformer language model. We validate the effectiveness of our model on the GLUE benchmark and HANS dataset. We also show that there is shared structure between different NLP datasets which HUBERT, but not BERT, is able to learn and leverage. Extensive transfer-learning experiments are conducted to confirm this proposition. |
Tasks | Language Modelling, Transfer Learning |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJxnM1rFvr |
https://openreview.net/pdf?id=HJxnM1rFvr | |
PWC | https://paperswithcode.com/paper/hubert-untangles-bert-to-improve-transfer |
Repo | |
Framework | |
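A minimal sketch of a Tensor-Product Representation layer of the kind HUBERT places on top of BERT token states: each token is decomposed into a filler and a role vector and bound with an outer product. The dimensions and the way roles are predicted here are assumptions, not the paper's exact head.

```python
import torch
import torch.nn as nn

class TPRLayer(nn.Module):
    """Bind per-token fillers and roles into a tensor-product representation.

    Generic TPR binding sketch: filler and role are linear projections of the
    token state, and the sentence TPR is the sum of their outer products.
    """
    def __init__(self, hidden=768, d_filler=32, d_role=16):
        super().__init__()
        self.filler = nn.Linear(hidden, d_filler)
        self.role = nn.Linear(hidden, d_role)

    def forward(self, token_states):                 # (batch, seq, hidden), e.g. BERT outputs
        f = self.filler(token_states)                # (batch, seq, d_filler)
        r = self.role(token_states)                  # (batch, seq, d_role)
        tpr = torch.einsum('bsf,bsr->bfr', f, r)     # sum of outer products over tokens
        return tpr.flatten(start_dim=1)              # (batch, d_filler * d_role)
```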
Deeper Insights into Weight Sharing in Neural Architecture Search
Title | Deeper Insights into Weight Sharing in Neural Architecture Search |
Authors | Anonymous |
Abstract | With the success of deep neural networks, Neural Architecture Search (NAS) as a way of automatic model design has attracted wide attention. As training every child model from scratch is very time-consuming, recent works leverage weight-sharing to speed up the model evaluation procedure. These approaches greatly reduce computation by maintaining a single copy of weights in the super-net and sharing them among all child models. However, weight-sharing has no theoretical guarantee and its impact has not been well studied before. In this paper, we conduct comprehensive experiments to reveal the impact of weight-sharing: (1) The best-performing models from different runs, or even from consecutive epochs within the same run, vary significantly; (2) Even with high variance, we can extract valuable information from training the super-net with shared weights; (3) Interference between child models is a major factor behind the high variance; (4) Properly reducing the degree of weight sharing can effectively reduce variance and improve performance. |
Tasks | Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryxmrpNtvH |
https://openreview.net/pdf?id=ryxmrpNtvH | |
PWC | https://paperswithcode.com/paper/deeper-insights-into-weight-sharing-in-neural |
Repo | |
Framework | |
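A toy sketch of the kind of weight-sharing the paper studies: a child model is just a choice of operation per super-net cell, so evaluating a child reuses the super-net weights rather than training from scratch. The tiny search space below is illustrative only.

```python
import random
import torch.nn as nn

class SharedCell(nn.Module):
    """One super-net cell whose candidate ops share their weights across all child models."""
    def __init__(self, channels=16):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])

    def forward(self, x, choice):
        return self.ops[choice](x)          # a child model is just an index per cell

def sample_child(num_cells, num_ops=3):
    return [random.randrange(num_ops) for _ in range(num_cells)]

# Evaluating a child never copies weights: different children index into the same
# ModuleList, which is exactly the interference between child models that the
# paper identifies as a major source of variance.
```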
DyNet: Dynamic Convolution for Accelerating Convolution Neural Networks
Title | DyNet: Dynamic Convolution for Accelerating Convolution Neural Networks |
Authors | Anonymous |
Abstract | The convolution operator is the core of convolutional neural networks (CNNs) and accounts for most of their computation cost. To make CNNs more efficient, many methods have been proposed to either design lightweight networks or compress models. Although some efficient network structures have been proposed, such as MobileNet or ShuffleNet, we find that there still exists redundant information between convolution kernels. To address this issue, we propose a novel dynamic convolution method named DyNet in this paper, which can adaptively generate convolution kernels based on image contents. To demonstrate its effectiveness, we apply DyNet to multiple state-of-the-art CNNs. The experimental results show that DyNet reduces the computation cost remarkably while keeping performance nearly unchanged. Specifically, for ShuffleNetV2 (1.0), MobileNetV2 (1.0), ResNet18 and ResNet50, DyNet reduces FLOPs by 40.0%, 56.7%, 68.2% and 72.4% respectively, while the Top-1 accuracy on ImageNet changes by only +1.0%, -0.27%, -0.6% and -0.08%. Meanwhile, DyNet further accelerates the inference speed of MobileNetV2 (1.0), ResNet18 and ResNet50 by 1.87x, 1.32x and 1.48x respectively on a CPU platform. To verify the scalability, we also apply DyNet to a segmentation task; the results show that DyNet reduces FLOPs by 69.3% while maintaining the Mean IoU. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyeZIkrKwS |
https://openreview.net/pdf?id=SyeZIkrKwS | |
PWC | https://paperswithcode.com/paper/dynet-dynamic-convolution-for-accelerating |
Repo | |
Framework | |
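A sketch of content-adaptive convolution in the spirit of DyNet: a small coefficient-prediction branch pools the input and outputs mixing weights over a bank of kernels, and the fused kernel is applied as a single convolution. The kernel-bank size and coefficient head below are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Content-adaptive convolution: per-sample mixture of K fixed kernels."""
    def __init__(self, in_ch, out_ch, k=3, num_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_kernels, out_ch, in_ch, k, k) * 0.01)
        self.coef = nn.Sequential(                      # coefficient prediction from image content
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, num_kernels), nn.Softmax(dim=1))
        self.pad = k // 2

    def forward(self, x):                               # x: (B, in_ch, H, W)
        coef = self.coef(x)                             # (B, num_kernels)
        fused = torch.einsum('bk,koihw->boihw', coef, self.weight)   # one kernel per sample
        outs = [F.conv2d(xi[None], wi, padding=self.pad) for xi, wi in zip(x, fused)]
        return torch.cat(outs, dim=0)
```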
A Simple Geometric Proof for the Benefit of Depth in ReLU Networks
Title | A Simple Geometric Proof for the Benefit of Depth in ReLU Networks |
Authors | Anonymous |
Abstract | We present a simple proof for the benefit of depth in multi-layer feedforward networks with rectified activations ("depth separation"). Specifically, we present a sequence of classification problems f_i such that (a) for any fixed-depth rectified network, we can find an index m such that problems with index > m require exponential network width to fully represent the function f_m; and (b) for any problem f_m in the family, we present a concrete neural network with linear depth and bounded width that fully represents it. While there are several previous works showing similar results, our proof uses substantially simpler tools and techniques, and should be accessible to undergraduate students in computer science and people with similar backgrounds. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SkxEWgStDr |
https://openreview.net/pdf?id=SkxEWgStDr | |
PWC | https://paperswithcode.com/paper/a-simple-geometric-proof-for-the-benefit-of |
Repo | |
Framework | |
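The flavour of a depth-separation argument can be seen with the classical triangle-map example (a standard illustration, not necessarily the paper's construction): composing a two-ReLU "tent" with itself d times yields a function with 2^d linear pieces, while representing that many pieces with a single hidden layer requires width proportional to the number of pieces.

```python
import numpy as np

def tent(x):
    """Two-ReLU 'tent' map on [0, 1]: tent(x) = 2*relu(x) - 4*relu(x - 0.5)."""
    relu = lambda z: np.maximum(z, 0.0)
    return 2 * relu(x) - 4 * relu(x - 0.5)

def deep_tent(x, depth):
    """Composing the tent map `depth` times gives 2**depth linear pieces,
    i.e. complexity exponential in depth but with constant width per layer."""
    for _ in range(depth):
        x = tent(x)
    return x

xs = np.linspace(0, 1, 10001)
for d in (1, 3, 6):
    # Count slope sign changes as a proxy for the number of linear pieces.
    pieces = 1 + np.count_nonzero(np.diff(np.sign(np.diff(deep_tent(xs, d)))))
    print(d, pieces)    # roughly 2, 8, 64
```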
Towards Fast Adaptation of Neural Architectures with Meta Learning
Title | Towards Fast Adaptation of Neural Architectures with Meta Learning |
Authors | Anonymous |
Abstract | Recently, Neural Architecture Search (NAS) has been successfully applied to multiple artificial intelligence areas and shows better performance than hand-designed networks. However, existing NAS methods only target a specific task. Most of them do well in searching an architecture for a single task but struggle with multiple datasets or multiple tasks. Generally, the architecture for a new task is either searched from scratch, which is neither efficient nor flexible enough for practical application scenarios, or borrowed from one searched on other tasks, which might not be optimal. In order to tackle the transferability of NAS and enable fast adaptation of neural architectures, we propose a novel Transferable Neural Architecture Search method based on meta-learning, termed T-NAS. T-NAS learns a meta-architecture that is able to adapt to a new task quickly through a few gradient steps, which makes the transferred architecture suitable for the specific task. Extensive experiments show that T-NAS achieves state-of-the-art performance in few-shot learning and comparable performance in supervised learning, but with 50x less search cost, which demonstrates the effectiveness of our method. |
Tasks | Few-Shot Learning, Meta-Learning, Neural Architecture Search |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1eowANFvr |
https://openreview.net/pdf?id=r1eowANFvr | |
PWC | https://paperswithcode.com/paper/towards-fast-adaptation-of-neural |
Repo | |
Framework | |
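A schematic of the MAML-style adaptation the abstract describes: a meta-learned architecture parameter `alpha` is adapted to a new task with a few gradient steps. The single-tensor parameterization and the proxy loss below are placeholders, not T-NAS's actual objective.

```python
import torch

def adapt_architecture(alpha_meta, task_loss_fn, steps=3, lr=0.01):
    """Few-step adaptation of a meta-learned architecture parameter `alpha_meta`.

    `task_loss_fn(alpha)` is a hypothetical differentiable proxy for the
    validation loss of the architecture encoded by `alpha` on the new task.
    """
    alpha = alpha_meta.clone().requires_grad_(True)
    for _ in range(steps):
        loss = task_loss_fn(alpha)
        grad, = torch.autograd.grad(loss, alpha)
        alpha = (alpha - lr * grad).detach().requires_grad_(True)
    return alpha.detach()

# Usage with a toy quadratic proxy loss (illustrative only):
alpha_meta = torch.zeros(4, 3)                  # 4 edges, 3 candidate ops per edge
target = torch.randn(4, 3)
adapted = adapt_architecture(alpha_meta, lambda a: ((a - target) ** 2).mean())
```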
Higher-Order Function Networks for Learning Composable 3D Object Representations
Title | Higher-Order Function Networks for Learning Composable 3D Object Representations |
Authors | Anonymous |
Abstract | We present a new approach to 3D object representation where the geometry of an object is encoded directly into the weights and biases of a second ‘mapping’ network. This mapping network can be used to reconstruct an object by applying its encoded transformation to points randomly sampled from a simple geometric space, such as the unit sphere. Next, we extend this concept to enable the composition of multiple mapping functions. This capability provides a method for mixing features of different objects through function composition in a latent function space. Our experiments examine the effectiveness of our method on a subset of the ShapeNet dataset. We find that this representation can reconstruct objects with accuracy equal to or exceeding state-of-the-art methods with orders of magnitude fewer parameters. Our smallest reconstruction network has only about 7000 parameters and shows reconstruction quality on par with state-of-the-art object representation architectures with millions of parameters. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJgfDREKDB |
https://openreview.net/pdf?id=HJgfDREKDB | |
PWC | https://paperswithcode.com/paper/higher-order-function-networks-for-learning-1 |
Repo | |
Framework | |
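A compact sketch of the higher-order idea: an encoder emits the weights and biases of a small "mapping" network, which is then applied to points sampled from the unit sphere; composing two decoded mappings mixes features of two objects. All layer sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

D_IN, D_HID = 3, 64   # mapping net: 3 -> 64 -> 3 (illustrative sizes)
N_MAP_PARAMS = (D_IN * D_HID + D_HID) + (D_HID * D_IN + D_IN)

class Encoder(nn.Module):
    """Hypothetical encoder: consumes an observation of the object (here a point
    cloud) and emits the weights and biases of the mapping network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(1024 * 3, 256),
                                 nn.ReLU(), nn.Linear(256, N_MAP_PARAMS))

    def forward(self, pts):                              # pts: (B, 1024, 3)
        return self.net(pts)

def apply_mapping(theta, x):
    """Run the decoded mapping network on sphere samples x: (B, N, 3)."""
    b = theta.shape[0]
    i = 0
    w1 = theta[:, i:i + D_IN * D_HID].view(b, D_IN, D_HID); i += D_IN * D_HID
    b1 = theta[:, i:i + D_HID].view(b, 1, D_HID);           i += D_HID
    w2 = theta[:, i:i + D_HID * D_IN].view(b, D_HID, D_IN); i += D_HID * D_IN
    b2 = theta[:, i:].view(b, 1, D_IN)
    h = torch.relu(x @ w1 + b1)
    return h @ w2 + b2                                    # reconstructed points, (B, N, 3)

# Composing two objects' mappings (function composition in the latent function space):
#   y = apply_mapping(theta_a, apply_mapping(theta_b, sphere_samples))
```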
Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards
Title | Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards |
Authors | Anonymous |
Abstract | While recent progress in deep reinforcement learning has enabled robots to learn complex behaviors, tasks with long horizons and sparse rewards remain an ongoing challenge. In this work, we propose an effective reward shaping method through predictive coding to tackle sparse reward problems. By learning predictive representations offline and using these representations for reward shaping, we gain access to reward signals that understand the structure and dynamics of the environment. In particular, our method achieves better learning by providing reward signals that 1) capture environment dynamics, 2) emphasize the features most useful for learning, and 3) resist noise in learned representations through reward accumulation. We demonstrate the usefulness of this approach in domains ranging from robotic manipulation to navigation, and we show that reward signals produced through predictive coding are as effective for learning as hand-crafted rewards. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=Hkxi2gHYvH |
https://openreview.net/pdf?id=Hkxi2gHYvH | |
PWC | https://paperswithcode.com/paper/predictive-coding-for-boosting-deep |
Repo | |
Framework | |
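One way to realize reward shaping with a learned representation, sketched below under the assumption of a potential-based form; the paper's specific predictive-coding objective and accumulation scheme are not reproduced, and `encoder` and `goal_emb` are hypothetical.

```python
import numpy as np

def shaped_reward(r_env, s, s_next, encoder, goal_emb, gamma=0.99):
    """Potential-based shaping on top of a sparse environment reward.

    `encoder` is a hypothetical representation learned offline (e.g. with a
    predictive-coding objective); the potential is the negative distance of
    the encoded state to an encoded goal, so shaping preserves optimal policies.
    """
    phi = lambda state: -np.linalg.norm(encoder(state) - goal_emb)
    return r_env + gamma * phi(s_next) - phi(s)
```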
Representation Quality Explain Adversarial Attacks
Title | Representation Quality Explain Adversarial Attacks |
Authors | Anonymous |
Abstract | Neural networks have been shown to be vulnerable to adversarial samples: slightly perturbed input images are able to change the classification of accurate models, showing that the learned representation is not as good as previously thought. To aid the development of better neural networks, it is important to evaluate to what extent current neural networks' representations capture the existing features. Here we propose a way to evaluate the representation quality of neural networks using a novel type of zero-shot test, entitled Raw Zero-Shot. The main idea lies in the fact that some features are present in unknown classes and that unknown classes can be defined as a combination of previously learned features without representation bias (a bias towards representations that map only the current set of input-outputs and their boundary). To evaluate the soft-labels of unknown classes, two metrics are proposed. One is based on clustering validation techniques (the Davies-Bouldin Index) and the other is based on the soft-label distance to a given correct soft-label. Experiments show that these metrics are in accordance with robustness to adversarial attacks and might serve as guidance for building better models, as well as being usable in loss functions to create new types of neural networks. Interestingly, the results suggest that dynamic routing networks such as CapsNet have better representations, while current deeper DNNs trade off representation quality for accuracy. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SklfY6EFDH |
https://openreview.net/pdf?id=SklfY6EFDH | |
PWC | https://paperswithcode.com/paper/representation-quality-explain-adversarial |
Repo | |
Framework | |
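A sketch of the clustering-validation flavour of the proposed metric: collect the soft-labels (softmax outputs) that a trained classifier assigns to samples from classes it never saw, and score how well they cluster by true unknown class with the Davies-Bouldin index (lower is better). The exact normalization used in the paper is not reproduced.

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

def raw_zero_shot_dbi(model_softmax, unknown_images, unknown_labels):
    """Davies-Bouldin index over soft-labels of unknown-class samples.

    `model_softmax(x)` is assumed to return the softmax vector of a classifier
    trained *without* the unknown classes; `unknown_labels` are the held-out
    class ids, used only to group the soft-labels into clusters.
    """
    soft_labels = np.stack([model_softmax(x) for x in unknown_images])
    return davies_bouldin_score(soft_labels, unknown_labels)
```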
Learning by shaking: Computing policy gradients by physical forward-propagation
Title | Learning by shaking: Computing policy gradients by physical forward-propagation |
Authors | Anonymous |
Abstract | Model-free and model-based reinforcement learning are two ends of a spectrum. Learning a good policy without a dynamic model can be prohibitively expensive. Learning the dynamic model of a system can reduce the cost of learning the policy, but it can also introduce bias if it is not accurate. We propose a middle ground where instead of the transition model, the sensitivity of the trajectories with respect to the perturbation (shaking) of the parameters is learned. This allows us to predict the local behavior of the physical system around a set of nominal policies without knowing the actual model. We assay our method on a custom-built physical robot in extensive experiments and show the feasibility of the approach in practice. We investigate potential challenges when applying our method to physical systems and propose solutions to each of them. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=r1gfweBFPB |
https://openreview.net/pdf?id=r1gfweBFPB | |
PWC | https://paperswithcode.com/paper/learning-by-shaking-computing-policy |
Repo | |
Framework | |
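The core "shaking" idea, estimating how returns respond to small perturbations of the policy parameters instead of learning a transition model, can be sketched with a finite-difference sensitivity estimate; the rollout interface and the way the paper fits a sensitivity model are assumptions.

```python
import numpy as np

def trajectory_sensitivity(rollout_return, theta, sigma=0.05, n_dirs=16, seed=0):
    """Finite-difference estimate of d(return)/d(theta) around a nominal policy.

    `rollout_return(theta)` is a hypothetical function that runs the physical
    system with parameters `theta` and returns the episode return; here the
    local sensitivity is estimated by perturbing ('shaking') the parameters in
    random directions rather than by learning a dynamics model.
    """
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(n_dirs):
        d = rng.standard_normal(theta.shape)
        grad += (rollout_return(theta + sigma * d) -
                 rollout_return(theta - sigma * d)) / (2 * sigma) * d
    return grad / n_dirs
```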
Learning Latent State Spaces for Planning through Reward Prediction
Title | Learning Latent State Spaces for Planning through Reward Prediction |
Authors | Anonymous |
Abstract | Model-based reinforcement learning methods typically learn models for high-dimensional state spaces by aiming to reconstruct and predict the original observations. However, drawing inspiration from model-free reinforcement learning, we propose learning a latent dynamics model directly from rewards. In this work, we introduce a model-based planning framework which learns a latent reward prediction model and then plans in the latent state space. The latent representation is learned exclusively from multi-step reward prediction, which we show to be the only information necessary for successful planning. With this framework, we are able to benefit from the concise model-free representation while still enjoying the data efficiency of model-based algorithms. We demonstrate our framework in multi-pendulum and multi-cheetah environments where several pendulums or cheetahs are shown to the agent but only one of them produces rewards. In these environments, it is important for the agent to construct a concise latent representation that filters out irrelevant observations. We find that our method can successfully learn an accurate latent reward prediction model in the presence of irrelevant information, where existing model-based methods fail. Planning in the learned latent state space shows strong performance and high sample efficiency over model-free and model-based baselines. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ByxJjlHKwr |
https://openreview.net/pdf?id=ByxJjlHKwr | |
PWC | https://paperswithcode.com/paper/learning-latent-state-spaces-for-planning |
Repo | |
Framework | |
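A compact sketch of the training signal described above: an encoder, a latent dynamics model, and a reward head trained purely on multi-step reward prediction, with no observation reconstruction. Module sizes and the action interface are illustrative; planning (e.g. random shooting or CEM over action sequences scored by predicted rewards) is not shown.

```python
import torch
import torch.nn as nn

class LatentRewardModel(nn.Module):
    """Encoder + latent dynamics + reward head, trained only to predict rewards."""
    def __init__(self, obs_dim, act_dim, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent))
        self.dyn = nn.Sequential(nn.Linear(latent + act_dim, 64), nn.ReLU(), nn.Linear(64, latent))
        self.rew = nn.Linear(latent, 1)

    def multi_step_reward_loss(self, obs0, actions, rewards):
        """obs0: (B, obs_dim); actions: (B, T, act_dim); rewards: (B, T)."""
        z = self.enc(obs0)
        loss = 0.0
        for t in range(actions.shape[1]):
            z = self.dyn(torch.cat([z, actions[:, t]], dim=-1))   # roll the latent forward
            loss = loss + (self.rew(z).squeeze(-1) - rewards[:, t]).pow(2).mean()
        return loss / actions.shape[1]
```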
DeepAGREL: Biologically plausible deep learning via direct reinforcement
Title | DeepAGREL: Biologically plausible deep learning via direct reinforcement |
Authors | Anonymous |
Abstract | While much recent work has focused on biologically plausible variants of error-backpropagation, learning in the brain seems to mostly adhere to a reinforcement learning paradigm; biologically plausible neural reinforcement learning frameworks, however, have been limited to shallow networks learning from compact and abstract sensory representations. Here, we show that it is possible to generalize such approaches to deep networks with an arbitrary number of layers. We demonstrate the learning scheme - DeepAGREL - on classical and hard image-classification benchmarks requiring deep networks, namely MNIST, CIFAR10, and CIFAR100, cast as direct reward tasks, for deep fully connected, convolutional, and locally connected architectures. We show that for these tasks, DeepAGREL achieves an accuracy equal to that of supervised error-backpropagation, and that the trial-and-error nature of such learning imposes only a very limited cost in terms of training time. Thus, our results provide new insights into how deep learning may be implemented in the brain. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=ryl4-pEKvB |
https://openreview.net/pdf?id=ryl4-pEKvB | |
PWC | https://paperswithcode.com/paper/deepagrel-biologically-plausible-deep |
Repo | |
Framework | |
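A heavily simplified sketch of the trial-and-error setting: the network stochastically commits to one class, receives a scalar reward, and only the selected output unit is updated with a reward-prediction-error signal. This is a generic reward-modulated update for the output layer only, not DeepAGREL's full deep credit-assignment scheme (which gates hidden-layer updates through feedback connections).

```python
import numpy as np

def reward_modulated_output_update(h, W, label, lr=0.1, rng=np.random.default_rng(0)):
    """One trial-and-error update of an output weight matrix W (classes x hidden).

    h is the hidden activity for the current input, `label` the correct class.
    Only the chosen unit's weights are updated, scaled by the reward prediction error.
    """
    logits = W @ h
    p = np.exp(logits - logits.max()); p /= p.sum()
    action = rng.choice(len(p), p=p)              # stochastic action selection
    reward = 1.0 if action == label else 0.0      # direct reward task
    rpe = reward - p[action]                      # reward prediction error
    W[action] += lr * rpe * h                     # update only the selected output unit
    return W, reward
```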
CrossNorm: On Normalization for Off-Policy Reinforcement Learning
Title | CrossNorm: On Normalization for Off-Policy Reinforcement Learning |
Authors | Anonymous |
Abstract | Off-policy temporal difference (TD) methods are a powerful class of reinforcement learning (RL) algorithms. Intriguingly, deep off-policy TD algorithms are not commonly used in combination with feature normalization techniques, despite positive effects of normalization in other domains. We show that naive application of existing normalization techniques is indeed not effective, but that well-designed normalization improves optimization stability and removes the necessity of target networks. In particular, we introduce a normalization based on a mixture of on- and off-policy transitions, which we call cross-normalization. It can be regarded as an extension of batch normalization that re-centers data for two different distributions, as present in off-policy learning. Applied to DDPG and TD3, cross-normalization improves over the state of the art across a range of MuJoCo benchmark tasks. |
Tasks | |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=SyeMblBtwr |
https://openreview.net/pdf?id=SyeMblBtwr | |
PWC | https://paperswithcode.com/paper/crossnorm-on-normalization-for-off-policy |
Repo | |
Framework | |
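One plausible reading of cross-normalization, sketched below: normalization statistics are computed over a concatenation of an on-policy and an off-policy (replay) batch, so both transition distributions are re-centered together. The mixing ratio and where this sits inside the critic are assumptions.

```python
import torch

def cross_normalize(x_on, x_off, eps=1e-5):
    """Normalize features of on-policy and off-policy batches with shared statistics.

    Mean and variance are estimated on the concatenated batch (a mixture of the
    two transition distributions) and then applied to each batch separately.
    """
    mixed = torch.cat([x_on, x_off], dim=0)
    mean = mixed.mean(dim=0, keepdim=True)
    var = mixed.var(dim=0, unbiased=False, keepdim=True)
    norm = lambda x: (x - mean) / torch.sqrt(var + eps)
    return norm(x_on), norm(x_off)
```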
Imbalanced Classification via Adversarial Minority Over-sampling
Title | Imbalanced Classification via Adversarial Minority Over-sampling |
Authors | Anonymous |
Abstract | In most real-world scenarios, training datasets are highly class-imbalanced, and deep neural networks struggle to generalize to a balanced testing criterion. In this paper, we explore a novel yet simple way to alleviate this issue by synthesizing less-frequent classes with adversarial examples of other classes. Surprisingly, we find that this counter-intuitive method can effectively learn generalizable features of minority classes by transferring and leveraging the diversity of the majority information. Our experimental results on various types of class-imbalanced datasets in image classification and natural language processing show that the proposed method not only improves the generalization of minority classes significantly compared to other re-sampling or re-weighting methods, but also surpasses other state-of-the-art methods for class-imbalanced classification. |
Tasks | Image Classification |
Published | 2020-01-01 |
URL | https://openreview.net/forum?id=HJxaC1rKDS |
https://openreview.net/pdf?id=HJxaC1rKDS | |
PWC | https://paperswithcode.com/paper/imbalanced-classification-via-adversarial |
Repo | |
Framework | |
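A sketch of the over-sampling step the abstract describes: a majority-class sample is perturbed with a few targeted gradient steps toward a minority class, and the perturbed sample is added to the training set with the minority label. The classifier interface, step size, and number of steps are assumptions.

```python
import torch
import torch.nn.functional as F

def synthesize_minority(model, x_major, minority_class, steps=10, step_size=0.01):
    """Generate a synthetic minority sample as a targeted adversarial example
    of a majority-class input (sketch of adversarial minority over-sampling).

    `x_major` is a single input with batch dimension, shape (1, ...); `model`
    returns class logits. The result is labeled as `minority_class`.
    """
    x = x_major.clone().detach().requires_grad_(True)
    target = torch.tensor([minority_class])
    for _ in range(steps):
        loss = F.cross_entropy(model(x), target)        # push the prediction toward the minority class
        grad, = torch.autograd.grad(loss, x)
        x = (x - step_size * grad.sign()).detach().requires_grad_(True)
    return x.detach(), minority_class                    # new training sample with minority label
```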