Paper Group ANR 1644
When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks
Title | When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks |
Authors | Minghao Guo, Yuzhe Yang, Rui Xu, Ziwei Liu, Dahua Lin |
Abstract | Recent advances in adversarial attacks uncover the intrinsic vulnerability of modern deep neural networks. Since then, extensive efforts have been devoted to enhancing the robustness of deep networks via specialized learning algorithms and loss functions. In this work, we take an architectural perspective and investigate the patterns of network architectures that are resilient to adversarial attacks. To obtain the large number of networks needed for this study, we adopt one-shot neural architecture search, training a large network once and then finetuning the sub-networks sampled from it. The sampled architectures together with the accuracies they achieve provide a rich basis for our study. Our “robust architecture Odyssey” reveals several valuable observations: 1) densely connected patterns result in improved robustness; 2) under a computational budget, adding convolution operations to direct connection edges is effective; 3) the flow of solution procedure (FSP) matrix is a good indicator of network robustness. Based on these observations, we discover a family of robust architectures (RobNets). On various datasets, including CIFAR, SVHN, Tiny-ImageNet, and ImageNet, RobNets exhibit superior robustness performance to other widely used architectures. Notably, RobNets substantially improve the robust accuracy (~5% absolute gains) under both white-box and black-box attacks, even with fewer parameters. Code is available at https://github.com/gmh14/RobNets. |
Tasks | Neural Architecture Search |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10695v3 |
https://arxiv.org/pdf/1911.10695v3.pdf | |
PWC | https://paperswithcode.com/paper/when-nas-meets-robustness-in-search-of-robust |
Repo | |
Framework | |
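The abstract's third observation refers to the flow of solution procedure (FSP) matrix. As a rough illustration only, the sketch below computes an FSP-style matrix as the spatially averaged product between the channels of two feature maps that share the same height and width. The function name `fsp_matrix` and the toy inputs are hypothetical, and the abstract does not say how RobNets applies the matrix, so this should not be read as the authors' implementation.

```python
def fsp_matrix(feat_a, feat_b):
    """FSP-style matrix between two feature maps (illustrative sketch).

    feat_a: list of C1 channels, each an H x W nested list.
    feat_b: list of C2 channels with the same H x W.
    Returns a C1 x C2 matrix of spatially averaged channel products.
    """
    h, w = len(feat_a[0]), len(feat_a[0][0])
    result = []
    for ch_a in feat_a:
        row = []
        for ch_b in feat_b:
            total = sum(ch_a[y][x] * ch_b[y][x]
                        for y in range(h) for x in range(w))
            row.append(total / (h * w))
        result.append(row)
    return result


# Toy inputs (hypothetical): 2 channels vs. 1 channel on a 2x2 grid.
a = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
b = [[[1.0, 1.0], [1.0, 1.0]]]
print(fsp_matrix(a, b))  # [[0.5], [2.0]]
```

Each entry summarizes how strongly one channel of the first map co-activates with one channel of the second, averaged over all spatial positions.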
Clause-Wise and Recursive Decoding for Complex and Cross-Domain Text-to-SQL Generation
Title | Clause-Wise and Recursive Decoding for Complex and Cross-Domain Text-to-SQL Generation |
Authors | Dongjun Lee |
Abstract | Most deep learning approaches for text-to-SQL generation are limited to the WikiSQL dataset, which only supports very simple queries over a single table. We focus on the Spider dataset, a complex and cross-domain text-to-SQL task, which includes complex queries over multiple tables. In this paper, we propose a SQL clause-wise decoding neural architecture with a self-attention based database schema encoder to address the Spider task. Each of the clause-specific decoders consists of a set of sub-modules, which is defined by the syntax of each clause. Additionally, our model works recursively to support nested queries. When evaluated on the Spider dataset, our approach achieves 4.6% and 9.8% accuracy gain in the test and dev sets, respectively. In addition, we show that our model is significantly more effective at predicting complex and nested queries than previous work. |
Tasks | Text-To-Sql |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08835v2 |
https://arxiv.org/pdf/1904.08835v2.pdf | |
PWC | https://paperswithcode.com/paper/recursive-and-clause-wise-decoding-for |
Repo | |
Framework | |
Spatio-Temporal Attention Pooling for Audio Scene Classification
Title | Spatio-Temporal Attention Pooling for Audio Scene Classification |
Authors | Huy Phan, Oliver Y. Chén, Lam Pham, Philipp Koch, Maarten De Vos, Ian McLoughlin, Alfred Mertins |
Abstract | Acoustic scenes are rich and redundant in their content. In this work, we present a spatio-temporal attention pooling layer coupled with a convolutional recurrent neural network to learn from patterns that are discriminative while suppressing those that are irrelevant for acoustic scene classification. The convolutional layers in this network learn invariant features from time-frequency input. The bidirectional recurrent layers are then able to encode the temporal dynamics of the resulting convolutional features. Afterwards, a two-dimensional attention mask is formed via the outer product of the spatial and temporal attention vectors learned from two designated attention layers to weigh and pool the recurrent output into a final feature vector for classification. The network is trained with between-class examples generated from between-class data augmentation. Experiments demonstrate that the proposed method not only outperforms a strong convolutional neural network baseline but also sets new state-of-the-art performance on the LITIS Rouen dataset. |
Tasks | Acoustic Scene Classification, Data Augmentation, Scene Classification |
Published | 2019-04-06 |
URL | https://arxiv.org/abs/1904.03543v2 |
https://arxiv.org/pdf/1904.03543v2.pdf | |
PWC | https://paperswithcode.com/paper/spatio-temporal-attention-pooling-for-audio |
Repo | |
Framework | |
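The pooling step described in this abstract can be sketched in a few lines. This is a minimal illustration, assuming the recurrent output is a T x F matrix (time steps by feature dimensions), that both attention vectors are softmax-normalized, and that the weighted output is summed over time to obtain the final feature vector. In the paper the attention scores come from learned layers; here they are plain inputs, and the names `softmax` and `attention_pool` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(outputs, temporal_scores, spatial_scores):
    """Pool a T x F recurrent output with a 2-D attention mask.

    The mask is the outer product of the temporal (length T) and
    spatial (length F) attention vectors. Summing over time is an
    assumption of this sketch.
    """
    a_t = softmax(temporal_scores)
    a_s = softmax(spatial_scores)
    mask = [[t * s for s in a_s] for t in a_t]  # T x F outer product
    n_steps, n_feat = len(outputs), len(outputs[0])
    return [sum(mask[t][f] * outputs[t][f] for t in range(n_steps))
            for f in range(n_feat)]


# Toy input (hypothetical): 2 time steps, 2 features, uniform attention.
pooled = attention_pool([[4.0, 8.0], [0.0, 4.0]], [0.0, 0.0], [0.0, 0.0])
print(pooled)
```

With uniform scores every mask entry equals 1/(T*F), so the result reduces to a scaled sum over time; non-uniform scores shift weight toward particular time steps and feature dimensions.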
A Robust Non-Linear and Feature-Selection Image Fusion Theory
Title | A Robust Non-Linear and Feature-Selection Image Fusion Theory |
Authors | Aiqing Fang, Xinbo Zhao, Yanning Zhang |
Abstract | The human visual perception system has strong robustness in image fusion. This robustness is based on the human visual perception system’s characteristics of feature selection and non-linear fusion of different features. In order to simulate the human visual perception mechanism in image fusion tasks, we propose a multi-source image fusion framework that combines illuminance factors and attention mechanisms. The framework effectively combines traditional image features and modern deep learning features. First, we perform multi-scale decomposition of multi-source images. Then, the visual saliency map and the deep feature map are combined with the illuminance fusion factor to perform high-low frequency nonlinear fusion. Next, the characteristics of high and low frequency fusion are selected through the channel attention network to obtain the final fusion map. By simulating the nonlinear characteristics and selection characteristics of the human visual perception system in image fusion, the fused image is more in line with the human visual perception mechanism. Finally, we validate our fusion framework on public datasets of infrared and visible images, medical images and multi-focus images. The experimental results demonstrate the superiority of our fusion framework over state-of-the-art methods in visual quality, objective fusion metrics and robustness. |
Tasks | Feature Selection |
Published | 2019-12-23 |
URL | https://arxiv.org/abs/1912.10738v1 |
https://arxiv.org/pdf/1912.10738v1.pdf | |
PWC | https://paperswithcode.com/paper/a-robust-non-linear-and-feature-selection |
Repo | |
Framework | |
C-3PO: Cyclic-Three-Phase Optimization for Human-Robot Motion Retargeting based on Reinforcement Learning
Title | C-3PO: Cyclic-Three-Phase Optimization for Human-Robot Motion Retargeting based on Reinforcement Learning |
Authors | Taewoo Kim, Joo-Haeng Lee |
Abstract | Motion retargeting between heterogeneous polymorphs with different sizes and kinematic configurations requires a comprehensive knowledge of (inverse) kinematics. Moreover, it is non-trivial to provide a kinematic independent general solution. In this study, we developed a cyclic three-phase optimization method based on deep reinforcement learning for human-robot motion retargeting. The motion retargeting learning is performed using refined data in a latent space by the cyclic and filtering paths of our method. In addition, the human-in-the-loop based three-phase approach provides a framework for the improvement of the motion retargeting policy by both quantitative and qualitative manners. Using the proposed C-3PO method, we were successfully able to learn the motion retargeting skill between the human skeleton and motion of the multiple robots such as NAO, Pepper, Baxter and C-3PO. |
Tasks | |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1909.11303v3 |
https://arxiv.org/pdf/1909.11303v3.pdf | |
PWC | https://paperswithcode.com/paper/c-3po-cyclic-three-phase-optimization-for |
Repo | |
Framework | |
Pareto-optimal data compression for binary classification tasks
Title | Pareto-optimal data compression for binary classification tasks |
Authors | Max Tegmark, Tailin Wu |
Abstract | The goal of lossy data compression is to reduce the storage cost of a data set $X$ while retaining as much information as possible about something ($Y$) that you care about. For example, what aspects of an image $X$ contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping $X\to Z\equiv f(X)$ that maximizes the mutual information $I(Z,Y)$ while the entropy $H(Z)$ is kept below some fixed threshold. We present a method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable $X$ (an image, say) drawn from a class $Y\in\{1,…,n\}$ can be distilled into a vector $W=f(X)\in \mathbb{R}^{n-1}$ losslessly, so that $I(W,Y)=I(X,Y)$; for example, for a binary classification task of cats and dogs, each image $X$ is mapped into a single real number $W$ retaining all information that helps distinguish cats from dogs. For the $n=2$ case of binary classification, we then show how $W$ can be further compressed into a discrete variable $Z=g_\beta(W)\in\{1,…,m_\beta\}$ by binning $W$ into $m_\beta$ bins, in such a way that varying the parameter $\beta$ sweeps out the full Pareto frontier, solving a generalization of the Discrete Information Bottleneck (DIB) problem. We argue that the most interesting points on this frontier are “corners” maximizing $I(Z,Y)$ for a fixed number of bins $m=2,3,…$, which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm. |
Tasks | Image Clustering, Multiobjective Optimization |
Published | 2019-08-23 |
URL | https://arxiv.org/abs/1908.08961v2 |
https://arxiv.org/pdf/1908.08961v2.pdf | |
PWC | https://paperswithcode.com/paper/pareto-optimal-data-compression-for-binary |
Repo | |
Framework | |
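The two quantities traded off on the Pareto frontier in this abstract, the entropy $H(Z)$ and the mutual information $I(Z,Y)$, can be estimated for any given binning of $W$. The sketch below is only an illustration under stated assumptions: it takes labelled samples `(w, y)` and a sorted list of bin edges, assigns each $W$ to a bin, and computes both quantities in bits from empirical frequencies. It does not search for the optimal bin edges, which is the contribution of the paper, and the names `entropy_bits` and `bin_and_score` are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter


def entropy_bits(counts, n):
    """Shannon entropy in bits of an empirical distribution."""
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def bin_and_score(samples, thresholds):
    """Bin W by sorted thresholds and return (H(Z), I(Z;Y)) in bits.

    samples: list of (w, y) pairs; thresholds: sorted bin edges.
    Uses the identity I(Z;Y) = H(Z) + H(Y) - H(Z,Y).
    """
    n = len(samples)
    joint = Counter((bisect_right(thresholds, w), y) for w, y in samples)
    z_counts, y_counts = Counter(), Counter()
    for (z_bin, label), c in joint.items():
        z_counts[z_bin] += c
        y_counts[label] += c
    h_z = entropy_bits(z_counts.values(), n)
    h_y = entropy_bits(y_counts.values(), n)
    h_zy = entropy_bits(joint.values(), n)
    return h_z, h_z + h_y - h_zy


# Toy data (hypothetical): W separates the two classes perfectly.
data = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
print(bin_and_score(data, [0.5]))  # two bins
print(bin_and_score(data, []))     # a single bin keeps no class information
```

Sweeping the number and position of the thresholds and plotting the resulting pairs gives an empirical picture of the entropy versus class-information tradeoff.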
Release Strategies and the Social Impacts of Language Models
Title | Release Strategies and the Social Impacts of Language Models |
Authors | Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, Jasmine Wang |
Abstract | Large language models have a range of beneficial uses: they can assist in prose, poetry, and programming; analyze dataset biases; and more. However, their flexibility and generative capabilities also raise misuse concerns. This report discusses OpenAI’s work related to the release of its GPT-2 language model. It discusses staged release, which allows time between model releases to conduct risk and benefit analyses as model sizes increased. It also discusses ongoing partnership-based research and provides recommendations for better coordination and responsible publication in AI. |
Tasks | Language Modelling |
Published | 2019-08-24 |
URL | https://arxiv.org/abs/1908.09203v2 |
https://arxiv.org/pdf/1908.09203v2.pdf | |
PWC | https://paperswithcode.com/paper/release-strategies-and-the-social-impacts-of |
Repo | |
Framework | |
A Single Multi-Task Deep Neural Network with Post-Processing for Object Detection with Reasoning and Robotic Grasp Detection
Title | A Single Multi-Task Deep Neural Network with Post-Processing for Object Detection with Reasoning and Robotic Grasp Detection |
Authors | Dongwon Park, Yonghyeok Seo, Dongju Shin, Jaesik Choi, Se Young Chun |
Abstract | Recently, robotic grasp detection (GD) and object detection (OD) with reasoning have been investigated using deep neural networks (DNNs). There have been works that combine these tasks using separate networks so that robots can deal with situations of grasping specific target objects in cluttered, stacked, complex piles of novel objects from a single RGB-D camera. We propose a single multi-task DNN that yields the information on GD, OD and relationship reasoning among objects with a simple post-processing. Our proposed methods yielded state-of-the-art performance with accuracies of 98.6% and 74.2% and computation speeds of 33 and 62 frames per second on the VMRD and Cornell datasets, respectively. Our methods also yielded a 95.3% grasp success rate for single novel object grasping with a 4-axis robot arm and an 86.7% grasp success rate in cluttered novel objects with a Baxter robot. |
Tasks | Object Detection |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.07050v1 |
https://arxiv.org/pdf/1909.07050v1.pdf | |
PWC | https://paperswithcode.com/paper/a-single-multi-task-deep-neural-network-with |
Repo | |
Framework | |
Neuromorphic In-Memory Computing Framework using Memtransistor Cross-bar based Support Vector Machines
Title | Neuromorphic In-Memory Computing Framework using Memtransistor Cross-bar based Support Vector Machines |
Authors | P. Kumar, A. R. Nair, O. Chatterjee, T. Paul, A. Ghosh, S. Chakrabartty, C. S. Thakur |
Abstract | This paper presents a novel framework for designing support vector machines (SVMs), which does not impose restriction on the SVM kernel to be positive-definite and allows the user to define memory constraint in terms of fixed template vectors. This makes the framework scalable and enables its implementation for low-power, high-density and memory constrained embedded application. An efficient hardware implementation of the same is also discussed, which utilizes novel low power memtransistor based cross-bar architecture, and is robust to device mismatch and randomness. We used memtransistor measurement data, and showed that the designed SVMs can achieve classification accuracy comparable to traditional SVMs on both synthetic and real-world benchmark datasets. This framework would be beneficial for design of SVM based wake-up systems for internet of things (IoTs) and edge devices where memtransistors can be used to optimize system’s energy-efficiency and perform in-memory matrix-vector multiplication (MVM). |
Tasks | |
Published | 2019-03-29 |
URL | https://arxiv.org/abs/1903.12330v2 |
https://arxiv.org/pdf/1903.12330v2.pdf | |
PWC | https://paperswithcode.com/paper/neuromorphic-in-memory-computing-framework |
Repo | |
Framework | |
A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control
Title | A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control |
Authors | Yuguang Yang |
Abstract | Deep reinforcement learning for high dimensional, hierarchical control tasks usually requires the use of complex neural networks as functional approximators, which can lead to inefficiency, instability and even divergence in the training process. Here, we introduce stacked deep Q learning (SDQL), a flexible modularized deep reinforcement learning architecture that can find the optimal control policy of control tasks consisting of multiple linear stages in a stable and efficient way. SDQL exploits the linear stage structure by approximating the Q function via a collection of deep Q sub-networks stacked along an axis marking the stage-wise progress of the whole task. By back-propagating the learned state values from later stages to earlier stages, all sub-networks co-adapt to maximize the total reward of the whole task, although each sub-network is responsible for learning the optimal control policy for its own stage. This modularized architecture offers considerable flexibility in terms of environment and policy modeling, as it allows choices of different state spaces, action spaces, reward structures, and Q networks for each stage. Further, the backward stage-wise training procedure of SDQL offers additional transparency, stability, and flexibility to the training process, thus facilitating model fine-tuning and hyper-parameter search. We demonstrate that SDQL is capable of learning competitive strategies for problems with characteristics of high-dimensional state space, heterogeneous action space (both discrete and continuous), multiple scales, and sparse and delayed rewards. |
Tasks | Q-Learning |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10684v1 |
https://arxiv.org/pdf/1911.10684v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-reinforcement-learning-architecture |
Repo | |
Framework | |
Single Image Deraining: From Model-Based to Data-Driven and Beyond
Title | Single Image Deraining: From Model-Based to Data-Driven and Beyond |
Authors | Wenhan Yang, Robby T. Tan, Shiqi Wang, Yuming Fang, Jiaying Liu |
Abstract | The goal of single-image deraining is to restore the rain-free background scenes of an image degraded by rain streaks and rain accumulation. The early single-image deraining methods employ a cost function, where various priors are developed to represent the properties of rain and background layers. Since 2017, single-image deraining methods have entered a deep-learning era and exploit various types of networks, e.g., convolutional neural networks, recurrent neural networks, and generative adversarial networks, demonstrating impressive performance. Given the current rapid development, in this paper, we provide a comprehensive survey of deraining methods over the last decade. We summarize the rain appearance models, and discuss two categories of deraining approaches: model-based and data-driven approaches. For the former, we organize the literature based on their basic models and priors. For the latter, we discuss developed ideas related to architectures, constraints, loss functions, and training datasets. We present milestones of single-image deraining methods, review a broad selection of previous works in different categories, and provide insights on the historical development route from the model-based to data-driven methods. We also summarize performance comparisons quantitatively and qualitatively. Beyond discussing the technicality of deraining methods, we also discuss the future directions. |
Tasks | Rain Removal, Single Image Deraining |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07150v2 |
https://arxiv.org/pdf/1912.07150v2.pdf | |
PWC | https://paperswithcode.com/paper/single-image-deraining-from-model-based-to |
Repo | |
Framework | |
Fair and Unbiased Algorithmic Decision Making: Current State and Future Challenges
Title | Fair and Unbiased Algorithmic Decision Making: Current State and Future Challenges |
Authors | Songül Tolan |
Abstract | Machine learning algorithms are now frequently used in sensitive contexts that substantially affect the course of human lives, such as credit lending or criminal justice. This is driven by the idea that ‘objective’ machines base their decisions solely on facts and remain unaffected by human cognitive biases, discriminatory tendencies or emotions. Yet, there is overwhelming evidence showing that algorithms can inherit or even perpetuate human biases in their decision making when they are based on data that contains biased human decisions. This has led to a call for fairness-aware machine learning. However, fairness is a complex concept which is also reflected in the attempts to formalize fairness for algorithmic decision making. Statistical formalizations of fairness lead to a long list of criteria that are each flawed (or harmful even) in different contexts. Moreover, inherent tradeoffs in these criteria make it impossible to unify them in one general framework. Thus, fairness constraints in algorithms have to be specific to the domains to which the algorithms are applied. In the future, research in algorithmic decision making systems should be aware of data and developer biases and add a focus on transparency to facilitate regular fairness audits. |
Tasks | Decision Making |
Published | 2019-01-15 |
URL | http://arxiv.org/abs/1901.04730v1 |
http://arxiv.org/pdf/1901.04730v1.pdf | |
PWC | https://paperswithcode.com/paper/fair-and-unbiased-algorithmic-decision-making |
Repo | |
Framework | |
BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding
Title | BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding |
Authors | Gencer Sumbul, Marcela Charfuelan, Begüm Demir, Volker Markl |
Abstract | This paper presents the BigEarthNet that is a new large-scale multi-label Sentinel-2 benchmark archive. The BigEarthNet consists of 590,326 Sentinel-2 image patches, each of which is a section of i) 120x120 pixels for 10m bands; ii) 60x60 pixels for 20m bands; and iii) 20x20 pixels for 60m bands. Unlike most of the existing archives, each image patch is annotated by multiple land-cover classes (i.e., multi-labels) that are provided from the CORINE Land Cover database of the year 2018 (CLC 2018). The BigEarthNet is significantly larger than the existing archives in remote sensing (RS) and thus is much more convenient to be used as a training source in the context of deep learning. This paper first addresses the limitations of the existing archives and then describes the properties of the BigEarthNet. Experimental results obtained in the framework of RS image scene classification problems show that a shallow Convolutional Neural Network (CNN) architecture trained on the BigEarthNet provides much higher accuracy compared to a state-of-the-art CNN model pre-trained on the ImageNet (which is a very popular large-scale benchmark archive in computer vision). The BigEarthNet opens up promising directions to advance operational RS applications and research in massive Sentinel-2 image archives. |
Tasks | Scene Classification |
Published | 2019-02-16 |
URL | https://arxiv.org/abs/1902.06148v3 |
https://arxiv.org/pdf/1902.06148v3.pdf | |
PWC | https://paperswithcode.com/paper/bigearthnet-a-large-scale-benchmark-archive |
Repo | |
Framework | |
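The three patch sizes quoted in this abstract line up once the pixel count is multiplied by the band resolution. The short check below does that arithmetic; the conclusion that every band covers the same ground footprint is inferred from the numbers in the abstract, not stated there, and the variable names are hypothetical.

```python
# Pixels per side for each Sentinel-2 band resolution (metres per pixel),
# as quoted in the abstract.
patch_pixels = {10: 120, 20: 60, 60: 20}

# Ground extent per side in metres: pixels times metres per pixel.
footprint_m = {res: res * px for res, px in patch_pixels.items()}
print(footprint_m)  # {10: 1200, 20: 1200, 60: 1200}
```

Each patch therefore spans 1200 m per side regardless of the band group, which is why the pixel counts shrink as the resolution coarsens.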
A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification
Title | A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification |
Authors | Mahya Ameryan, Lambert Schomaker |
Abstract | In recent years, long short-term memory neural networks (LSTMs) have been applied quite successfully to problems in handwritten text recognition. However, their strength is more located in handling sequences of variable length than in handling geometric variability of the image patterns. Furthermore, the best results for LSTMs are often based on large-scale training of an ensemble of network instances. In this paper, an end-to-end convolutional LSTM Neural Network is used to handle both geometric variation and sequence variability. We show that high performances can be reached on a common benchmark set by using proper data augmentation for just five such networks using a proper coding scheme and a proper voting scheme. The networks have similar architectures (Convolutional Neural Network (CNN): five layers, bidirectional LSTM (BiLSTM): three layers followed by a connectionist temporal classification (CTC) processing step). The approach assumes differently-scaled input images and different feature map sizes. Two datasets are used for evaluation of the performance of our algorithm: A standard benchmark RIMES dataset (French), and a historical handwritten dataset KdK (Dutch). Final performance obtained for the word-recognition test of RIMES was 96.6%, a clear improvement over other state-of-the-art approaches. On the KdK dataset, our approach also shows good results. The proposed approach is deployed in the Monk search engine for historical-handwriting collections. |
Tasks | Data Augmentation |
Published | 2019-12-06 |
URL | https://arxiv.org/abs/1912.03223v1 |
https://arxiv.org/pdf/1912.03223v1.pdf | |
PWC | https://paperswithcode.com/paper/a-limited-size-ensemble-of-homogeneous |
Repo | |
Framework | |
Robust Online Model Adaptation by Extended Kalman Filter with Exponential Moving Average and Dynamic Multi-Epoch Strategy
Title | Robust Online Model Adaptation by Extended Kalman Filter with Exponential Moving Average and Dynamic Multi-Epoch Strategy |
Authors | Abulikemu Abuduweili, Changliu Liu |
Abstract | High fidelity behavior prediction of intelligent agents is critical in many applications. However, the prediction model trained on the training set may not generalize to the testing set due to domain shift and time variance. The challenge motivates the adoption of online adaptation algorithms to update prediction models in real-time to improve the prediction performance. Inspired by Extended Kalman Filter (EKF), this paper introduces a series of online adaptation methods, which are applicable to neural network-based models. A base adaptation algorithm Modified EKF with forgetting factor (MEKF$_\lambda$) is introduced first, followed by exponential moving average filtering techniques. Then this paper introduces a dynamic multi-epoch update strategy to effectively utilize samples received in real time. With all these extensions, we propose a robust online adaptation algorithm: MEKF with Exponential Moving Average and Dynamic Multi-Epoch strategy (MEKF$_{\text{EMA-DME}}$). The proposed algorithm outperforms existing methods as demonstrated in experiments. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.01790v2 |
https://arxiv.org/pdf/1912.01790v2.pdf | |
PWC | https://paperswithcode.com/paper/robust-online-model-adaptation-by-extended |
Repo | |
Framework | |
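To make the two ingredients named in this abstract more concrete, a forgetting factor and an exponential moving average, here is a deliberately simplified sketch. It is not the paper's MEKF algorithm: it uses scalar recursive least squares with forgetting factor `lam` (a linear relative of the EKF parameter update) to adapt a single parameter `theta` in the assumed model `y = theta * x`, then smooths the estimates with an EMA of weight `mu`. All names, initial values, and the toy data stream are hypothetical.

```python
def rls_step(theta, p, x, y, lam=0.98):
    """One recursive least-squares update with forgetting factor lam."""
    gain = p * x / (lam + x * p * x)
    theta = theta + gain * (y - theta * x)  # correct by the prediction error
    p = (p - gain * x * p) / lam            # dividing by lam discounts old data
    return theta, p


def adapt(stream, lam=0.98, mu=0.9):
    """Adapt theta online and keep an exponential moving average of it."""
    theta, p = 0.0, 100.0  # hypothetical initial estimate and covariance
    theta_ema = None
    for x, y in stream:
        theta, p = rls_step(theta, p, x, y, lam)
        if theta_ema is None:
            theta_ema = theta
        else:
            theta_ema = mu * theta_ema + (1 - mu) * theta
    return theta, theta_ema


# Noise-free toy stream generated with a true parameter of 2.0.
stream = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0] * 7]
theta, theta_ema = adapt(stream)
print(round(theta, 3), round(theta_ema, 3))
```

A forgetting factor below 1 lets the estimate track a drifting parameter, while the moving average trades some responsiveness for smoother estimates.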