January 25, 2020


Paper Group ANR 1644

When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks. Clause-Wise and Recursive Decoding for Complex and Cross-Domain Text-to-SQL Generation. Spatio-Temporal Attention Pooling for Audio Scene Classification. A Robust Non-Linear and Feature-Selection Image Fusion Theory. C-3PO: Cyclic-Three-Phase Optimization for …

When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks


Title	When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks
Authors	Minghao Guo, Yuzhe Yang, Rui Xu, Ziwei Liu, Dahua Lin
Abstract	Recent advances in adversarial attacks uncover the intrinsic vulnerability of modern deep neural networks. Since then, extensive efforts have been devoted to enhancing the robustness of deep networks via specialized learning algorithms and loss functions. In this work, we take an architectural perspective and investigate the patterns of network architectures that are resilient to adversarial attacks. To obtain the large number of networks needed for this study, we adopt one-shot neural architecture search, training a large network once and then finetuning the sub-networks sampled from it. The sampled architectures together with the accuracies they achieve provide a rich basis for our study. Our “robust architecture Odyssey” reveals several valuable observations: 1) densely connected patterns result in improved robustness; 2) under a computational budget, adding convolution operations to direct connection edges is effective; 3) the flow of solution procedure (FSP) matrix is a good indicator of network robustness. Based on these observations, we discover a family of robust architectures (RobNets). On various datasets, including CIFAR, SVHN, Tiny-ImageNet, and ImageNet, RobNets exhibit superior robustness performance to other widely used architectures. Notably, RobNets substantially improve the robust accuracy (~5% absolute gains) under both white-box and black-box attacks, even with fewer parameters. Code is available at https://github.com/gmh14/RobNets.
Tasks	Neural Architecture Search
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10695v3
PDF	https://arxiv.org/pdf/1911.10695v3.pdf
PWC	https://paperswithcode.com/paper/when-nas-meets-robustness-in-search-of-robust
Repo
Framework
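The abstract above names the flow of solution procedure (FSP) matrix as a robustness indicator without defining it. As a rough sketch, assuming the usual definition from the knowledge-distillation literature (a spatially averaged inner product between the channels of two feature maps with the same spatial size), it could be computed as below. The function names and the squared Frobenius comparison between clean and adversarial FSP matrices are illustrative assumptions, not taken from the RobNets code.

```python
def fsp_matrix(feat_in, feat_out):
    """Return the m x n FSP matrix of two feature maps.

    feat_in is h x w x m and feat_out is h x w x n (nested lists);
    both must share the same spatial size h x w.
    """
    h, w = len(feat_in), len(feat_in[0])
    m, n = len(feat_in[0][0]), len(feat_out[0][0])
    return [
        [
            # Average the channel product over every spatial position.
            sum(feat_in[s][t][i] * feat_out[s][t][j]
                for s in range(h) for t in range(w)) / (h * w)
            for j in range(n)
        ]
        for i in range(m)
    ]


def squared_frobenius_gap(g_clean, g_adv):
    """Hypothetical comparison: squared Frobenius distance of two FSP matrices."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(g_clean, g_adv)
        for a, b in zip(row_a, row_b)
    )
```

Under this reading, a larger gap between the FSP matrices of a clean input and its adversarial counterpart would suggest that the perturbation changes the flow of features through the cell more strongly.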

Clause-Wise and Recursive Decoding for Complex and Cross-Domain Text-to-SQL Generation


Title	Clause-Wise and Recursive Decoding for Complex and Cross-Domain Text-to-SQL Generation
Authors	Dongjun Lee
Abstract	Most deep learning approaches for text-to-SQL generation are limited to the WikiSQL dataset, which only supports very simple queries over a single table. We focus on the Spider dataset, a complex and cross-domain text-to-SQL task, which includes complex queries over multiple tables. In this paper, we propose a SQL clause-wise decoding neural architecture with a self-attention based database schema encoder to address the Spider task. Each of the clause-specific decoders consists of a set of sub-modules, which are defined by the syntax of each clause. Additionally, our model works recursively to support nested queries. When evaluated on the Spider dataset, our approach achieves accuracy gains of 4.6% and 9.8% on the test and dev sets, respectively. In addition, we show that our model is significantly more effective at predicting complex and nested queries than previous work.
Tasks	Text-To-Sql
Published	2019-04-18
URL	https://arxiv.org/abs/1904.08835v2
PDF	https://arxiv.org/pdf/1904.08835v2.pdf
PWC	https://paperswithcode.com/paper/recursive-and-clause-wise-decoding-for
Repo
Framework

Spatio-Temporal Attention Pooling for Audio Scene Classification


Title	Spatio-Temporal Attention Pooling for Audio Scene Classification
Authors	Huy Phan, Oliver Y. Chén, Lam Pham, Philipp Koch, Maarten De Vos, Ian McLoughlin, Alfred Mertins
Abstract	Acoustic scenes are rich and redundant in their content. In this work, we present a spatio-temporal attention pooling layer coupled with a convolutional recurrent neural network to learn from patterns that are discriminative while suppressing those that are irrelevant for acoustic scene classification. The convolutional layers in this network learn invariant features from time-frequency input. The bidirectional recurrent layers are then able to encode the temporal dynamics of the resulting convolutional features. Afterwards, a two-dimensional attention mask is formed via the outer product of the spatial and temporal attention vectors learned from two designated attention layers to weigh and pool the recurrent output into a final feature vector for classification. The network is trained with between-class examples generated from between-class data augmentation. Experiments demonstrate that the proposed method not only outperforms a strong convolutional neural network baseline but also sets new state-of-the-art performance on the LITIS Rouen dataset.
Tasks	Acoustic Scene Classification, Data Augmentation, Scene Classification
Published	2019-04-06
URL	https://arxiv.org/abs/1904.03543v2
PDF	https://arxiv.org/pdf/1904.03543v2.pdf
PWC	https://paperswithcode.com/paper/spatio-temporal-attention-pooling-for-audio
Repo
Framework
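The pooling step described in the abstract above (a two-dimensional mask formed as the outer product of a temporal and a spatial attention vector, then used to weigh and pool the recurrent output) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the softmax normalization, the choice to sum over time to get one value per feature dimension, and all names are assumptions, and the attention scores are passed in rather than learned.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(recurrent_output, temporal_scores, spatial_scores):
    """Pool a T x D recurrent output into a length-D feature vector.

    temporal_scores has one raw score per time step (length T) and
    spatial_scores has one raw score per feature dimension (length D).
    """
    temporal = softmax(temporal_scores)
    spatial = softmax(spatial_scores)
    # Outer product of the two attention vectors gives the T x D mask.
    mask = [[a_t * a_s for a_s in spatial] for a_t in temporal]
    return [
        sum(mask[t][d] * recurrent_output[t][d] for t in range(len(temporal)))
        for d in range(len(spatial))
    ]
```

With equal scores everywhere the mask is uniform, so the result is a scaled average over time; unequal scores shift weight toward the time steps and feature dimensions with higher attention.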

A Robust Non-Linear and Feature-Selection Image Fusion Theory


Title	A Robust Non-Linear and Feature-Selection Image Fusion Theory
Authors	Aiqing Fang, Xinbo Zhao, Yanning Zhang
Abstract	The human visual perception system is highly robust in image fusion. This robustness is based on the human visual perception system’s characteristics of feature selection and non-linear fusion of different features. In order to simulate the human visual perception mechanism in image fusion tasks, we propose a multi-source image fusion framework that combines illuminance factors and attention mechanisms. The framework effectively combines traditional image features and modern deep learning features. First, we perform multi-scale decomposition of multi-source images. Then, the visual saliency map and the deep feature map are combined with the illuminance fusion factor to perform high-low frequency nonlinear fusion. Next, the high- and low-frequency fusion features are selected through the channel attention network to obtain the final fusion map. By simulating the nonlinear characteristics and selection characteristics of the human visual perception system in image fusion, the fused image is more in line with the human visual perception mechanism. Finally, we validate our fusion framework on public datasets of infrared and visible images, medical images and multi-focus images. The experimental results demonstrate the superiority of our fusion framework over state-of-the-art methods in visual quality, objective fusion metrics and robustness.
Tasks	Feature Selection
Published	2019-12-23
URL	https://arxiv.org/abs/1912.10738v1
PDF	https://arxiv.org/pdf/1912.10738v1.pdf
PWC	https://paperswithcode.com/paper/a-robust-non-linear-and-feature-selection
Repo
Framework

C-3PO: Cyclic-Three-Phase Optimization for Human-Robot Motion Retargeting based on Reinforcement Learning


Title	C-3PO: Cyclic-Three-Phase Optimization for Human-Robot Motion Retargeting based on Reinforcement Learning
Authors	Taewoo Kim, Joo-Haeng Lee
Abstract	Motion retargeting between heterogeneous polymorphs with different sizes and kinematic configurations requires a comprehensive knowledge of (inverse) kinematics. Moreover, it is non-trivial to provide a kinematics-independent general solution. In this study, we developed a cyclic three-phase optimization method based on deep reinforcement learning for human-robot motion retargeting. The motion retargeting learning is performed using refined data in a latent space by the cyclic and filtering paths of our method. In addition, the human-in-the-loop based three-phase approach provides a framework for improving the motion retargeting policy both quantitatively and qualitatively. Using the proposed C-3PO method, we successfully learned the motion retargeting skill between the human skeleton and the motions of multiple robots such as NAO, Pepper, Baxter and C-3PO.
Tasks
Published	2019-09-25
URL	https://arxiv.org/abs/1909.11303v3
PDF	https://arxiv.org/pdf/1909.11303v3.pdf
PWC	https://paperswithcode.com/paper/c-3po-cyclic-three-phase-optimization-for
Repo
Framework

Pareto-optimal data compression for binary classification tasks


Title	Pareto-optimal data compression for binary classification tasks
Authors	Max Tegmark, Tailin Wu
Abstract	The goal of lossy data compression is to reduce the storage cost of a data set $X$ while retaining as much information as possible about something ($Y$) that you care about. For example, what aspects of an image $X$ contain the most information about whether it depicts a cat? Mathematically, this corresponds to finding a mapping $X\to Z\equiv f(X)$ that maximizes the mutual information $I(Z,Y)$ while the entropy $H(Z)$ is kept below some fixed threshold. We present a method for mapping out the Pareto frontier for classification tasks, reflecting the tradeoff between retained entropy and class information. We first show how a random variable $X$ (an image, say) drawn from a class $Y\in\{1,\ldots,n\}$ can be distilled into a vector $W=f(X)\in \mathbb{R}^{n-1}$ losslessly, so that $I(W,Y)=I(X,Y)$; for example, for a binary classification task of cats and dogs, each image $X$ is mapped into a single real number $W$ retaining all information that helps distinguish cats from dogs. For the $n=2$ case of binary classification, we then show how $W$ can be further compressed into a discrete variable $Z=g_\beta(W)\in\{1,\ldots,m_\beta\}$ by binning $W$ into $m_\beta$ bins, in such a way that varying the parameter $\beta$ sweeps out the full Pareto frontier, solving a generalization of the Discrete Information Bottleneck (DIB) problem. We argue that the most interesting points on this frontier are “corners” maximizing $I(Z,Y)$ for a fixed number of bins $m=2,3,\ldots$, which can conveniently be found without multiobjective optimization. We apply this method to the CIFAR-10, MNIST and Fashion-MNIST datasets, illustrating how it can be interpreted as an information-theoretically optimal image clustering algorithm.
Tasks	Image Clustering, Multiobjective Optimization
Published	2019-08-23
URL	https://arxiv.org/abs/1908.08961v2
PDF	https://arxiv.org/pdf/1908.08961v2.pdf
PWC	https://paperswithcode.com/paper/pareto-optimal-data-compression-for-binary
Repo
Framework
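To make the binning idea in the abstract above concrete, here is a small sketch for the simplest "corner", $m=2$: it estimates $I(Z,Y)$ from samples and brute-forces the single threshold on a scalar $W$ that maximizes it. This is an assumption-laden illustration, not the paper's procedure: the empirical plug-in estimator, the midpoint candidate thresholds, and all function names are hypothetical, and the entropy side of the tradeoff is not computed.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(Z;Y) in bits from a list of (z, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    pz = Counter(z for z, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2(c * n / (pz[z] * py[y]))
        for (z, y), c in joint.items()
    )


def bin_index(w, thresholds):
    """Bin of w: the number of thresholds that w meets or exceeds."""
    return sum(w >= t for t in thresholds)


def best_two_bin_split(samples):
    """Return (threshold, I(Z;Y)) for the best single cut of W into two bins.

    samples is a list of (w, y) pairs; candidate cuts are midpoints
    between consecutive distinct values of w.
    """
    values = sorted({w for w, _ in samples})
    if len(values) < 2:
        raise ValueError("need at least two distinct values of w")
    scored = [
        (mutual_information([(bin_index(w, [t]), y) for w, y in samples]), t)
        for t in ((a + b) / 2 for a, b in zip(values, values[1:]))
    ]
    info, threshold = max(scored)
    return threshold, info
```

For perfectly separable binary data, the best cut recovers the full one bit of class information; more bins would require searching over several thresholds at once.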


Release Strategies and the Social Impacts of Language Models


Title	Release Strategies and the Social Impacts of Language Models
Authors	Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, Jasmine Wang
Abstract	Large language models have a range of beneficial uses: they can assist in prose, poetry, and programming; analyze dataset biases; and more. However, their flexibility and generative capabilities also raise misuse concerns. This report discusses OpenAI’s work related to the release of its GPT-2 language model. It discusses staged release, which allows time between model releases to conduct risk and benefit analyses as model sizes increased. It also discusses ongoing partnership-based research and provides recommendations for better coordination and responsible publication in AI.
Tasks	Language Modelling
Published	2019-08-24
URL	https://arxiv.org/abs/1908.09203v2
PDF	https://arxiv.org/pdf/1908.09203v2.pdf
PWC	https://paperswithcode.com/paper/release-strategies-and-the-social-impacts-of
Repo
Framework

A Single Multi-Task Deep Neural Network with Post-Processing for Object Detection with Reasoning and Robotic Grasp Detection


Title	A Single Multi-Task Deep Neural Network with Post-Processing for Object Detection with Reasoning and Robotic Grasp Detection
Authors	Dongwon Park, Yonghyeok Seo, Dongju Shin, Jaesik Choi, Se Young Chun
Abstract	Recently, robotic grasp detection (GD) and object detection (OD) with reasoning have been investigated using deep neural networks (DNNs). Prior works have combined these tasks using separate networks so that robots can deal with situations of grasping specific target objects in cluttered, stacked, complex piles of novel objects from a single RGB-D camera. We propose a single multi-task DNN that yields information on GD, OD and relationship reasoning among objects with a simple post-processing. Our proposed methods yielded state-of-the-art performance with accuracies of 98.6% and 74.2% and computation speeds of 33 and 62 frames per second on the VMRD and Cornell datasets, respectively. Our methods also yielded a 95.3% grasp success rate for single novel object grasping with a 4-axis robot arm and an 86.7% grasp success rate in cluttered novel objects with a Baxter robot.
Tasks	Object Detection
Published	2019-09-16
URL	https://arxiv.org/abs/1909.07050v1
PDF	https://arxiv.org/pdf/1909.07050v1.pdf
PWC	https://paperswithcode.com/paper/a-single-multi-task-deep-neural-network-with
Repo
Framework

Neuromorphic In-Memory Computing Framework using Memtransistor Cross-bar based Support Vector Machines


Title	Neuromorphic In-Memory Computing Framework using Memtransistor Cross-bar based Support Vector Machines
Authors	P. Kumar, A. R. Nair, O. Chatterjee, T. Paul, A. Ghosh, S. Chakrabartty, C. S. Thakur
Abstract	This paper presents a novel framework for designing support vector machines (SVMs), which does not require the SVM kernel to be positive-definite and allows the user to define a memory constraint in terms of fixed template vectors. This makes the framework scalable and enables its implementation for low-power, high-density and memory-constrained embedded applications. An efficient hardware implementation of the same is also discussed, which utilizes a novel low-power memtransistor-based cross-bar architecture, and is robust to device mismatch and randomness. We used memtransistor measurement data, and showed that the designed SVMs can achieve classification accuracy comparable to traditional SVMs on both synthetic and real-world benchmark datasets. This framework would be beneficial for the design of SVM-based wake-up systems for internet of things (IoT) and edge devices, where memtransistors can be used to optimize the system’s energy efficiency and perform in-memory matrix-vector multiplication (MVM).
Tasks
Published	2019-03-29
URL	https://arxiv.org/abs/1903.12330v2
PDF	https://arxiv.org/pdf/1903.12330v2.pdf
PWC	https://paperswithcode.com/paper/neuromorphic-in-memory-computing-framework
Repo
Framework

A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control


Title	A Deep Reinforcement Learning Architecture for Multi-stage Optimal Control
Authors	Yuguang Yang
Abstract	Deep reinforcement learning for high dimensional, hierarchical control tasks usually requires the use of complex neural networks as functional approximators, which can lead to inefficiency, instability and even divergence in the training process. Here, we introduce stacked deep Q learning (SDQL), a flexible modularized deep reinforcement learning architecture that can find optimal control policies for control tasks consisting of multiple linear stages in a stable and efficient way. SDQL exploits the linear stage structure by approximating the Q function via a collection of deep Q sub-networks stacked along an axis marking the stage-wise progress of the whole task. By back-propagating the learned state values from later stages to earlier stages, all sub-networks co-adapt to maximize the total reward of the whole task, although each sub-network is responsible for learning the optimal control policy for its own stage. This modularized architecture offers considerable flexibility in terms of environment and policy modeling, as it allows choices of different state spaces, action spaces, reward structures, and Q networks for each stage. Further, the backward stage-wise training procedure of SDQL offers additional transparency, stability, and flexibility to the training process, thus facilitating model fine-tuning and hyper-parameter search. We demonstrate that SDQL is capable of learning competitive strategies for problems with characteristics of high-dimensional state space, heterogeneous action space (both discrete and continuous), multiple scales, and sparse and delayed rewards.
Tasks	Q-Learning
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10684v1
PDF	https://arxiv.org/pdf/1911.10684v1.pdf
PWC	https://paperswithcode.com/paper/a-deep-reinforcement-learning-architecture
Repo
Framework

Single Image Deraining: From Model-Based to Data-Driven and Beyond


Title	Single Image Deraining: From Model-Based to Data-Driven and Beyond
Authors	Wenhan Yang, Robby T. Tan, Shiqi Wang, Yuming Fang, Jiaying Liu
Abstract	The goal of single-image deraining is to restore the rain-free background scenes of an image degraded by rain streaks and rain accumulation. Early single-image deraining methods employ a cost function, where various priors are developed to represent the properties of rain and background layers. Since 2017, single-image deraining methods have entered a deep-learning era, exploiting various types of networks, e.g., convolutional neural networks, recurrent neural networks, and generative adversarial networks, and demonstrating impressive performance. Given the current rapid development, in this paper, we provide a comprehensive survey of deraining methods over the last decade. We summarize the rain appearance models, and discuss two categories of deraining approaches: model-based and data-driven approaches. For the former, we organize the literature based on their basic models and priors. For the latter, we discuss developed ideas related to architectures, constraints, loss functions, and training datasets. We present milestones of single-image deraining methods, review a broad selection of previous works in different categories, and provide insights on the historical development route from the model-based to data-driven methods. We also summarize performance comparisons quantitatively and qualitatively. Beyond discussing the technicality of deraining methods, we also discuss the future directions.
Tasks	Rain Removal, Single Image Deraining
Published	2019-12-16
URL	https://arxiv.org/abs/1912.07150v2
PDF	https://arxiv.org/pdf/1912.07150v2.pdf
PWC	https://paperswithcode.com/paper/single-image-deraining-from-model-based-to
Repo
Framework

Fair and Unbiased Algorithmic Decision Making: Current State and Future Challenges


Title	Fair and Unbiased Algorithmic Decision Making: Current State and Future Challenges
Authors	Songül Tolan
Abstract	Machine learning algorithms are now frequently used in sensitive contexts that substantially affect the course of human lives, such as credit lending or criminal justice. This is driven by the idea that ‘objective’ machines base their decisions solely on facts and remain unaffected by human cognitive biases, discriminatory tendencies or emotions. Yet, there is overwhelming evidence showing that algorithms can inherit or even perpetuate human biases in their decision making when they are based on data that contains biased human decisions. This has led to a call for fairness-aware machine learning. However, fairness is a complex concept which is also reflected in the attempts to formalize fairness for algorithmic decision making. Statistical formalizations of fairness lead to a long list of criteria that are each flawed (or harmful even) in different contexts. Moreover, inherent tradeoffs in these criteria make it impossible to unify them in one general framework. Thus, fairness constraints in algorithms have to be specific to the domains to which the algorithms are applied. In the future, research in algorithmic decision making systems should be aware of data and developer biases and add a focus on transparency to facilitate regular fairness audits.
Tasks	Decision Making
Published	2019-01-15
URL	http://arxiv.org/abs/1901.04730v1
PDF	http://arxiv.org/pdf/1901.04730v1.pdf
PWC	https://paperswithcode.com/paper/fair-and-unbiased-algorithmic-decision-making
Repo
Framework

BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding


Title	BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding
Authors	Gencer Sumbul, Marcela Charfuelan, Begüm Demir, Volker Markl
Abstract	This paper presents the BigEarthNet, a new large-scale multi-label Sentinel-2 benchmark archive. The BigEarthNet consists of 590,326 Sentinel-2 image patches, each of which is a section of i) 120x120 pixels for 10m bands; ii) 60x60 pixels for 20m bands; and iii) 20x20 pixels for 60m bands. Unlike most of the existing archives, each image patch is annotated by multiple land-cover classes (i.e., multi-labels) that are provided from the CORINE Land Cover database of the year 2018 (CLC 2018). The BigEarthNet is significantly larger than the existing archives in remote sensing (RS) and thus is much better suited for use as a training source in the context of deep learning. This paper first addresses the limitations of the existing archives and then describes the properties of the BigEarthNet. Experimental results obtained in the framework of RS image scene classification problems show that a shallow Convolutional Neural Network (CNN) architecture trained on the BigEarthNet provides much higher accuracy compared to a state-of-the-art CNN model pre-trained on the ImageNet (which is a very popular large-scale benchmark archive in computer vision). The BigEarthNet opens up promising directions to advance operational RS applications and research in massive Sentinel-2 image archives.
Tasks	Scene Classification
Published	2019-02-16
URL	https://arxiv.org/abs/1902.06148v3
PDF	https://arxiv.org/pdf/1902.06148v3.pdf
PWC	https://paperswithcode.com/paper/bigearthnet-a-large-scale-benchmark-archive
Repo
Framework
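The three patch sizes quoted in the abstract above are consistent with each other: multiplying the pixel count by the band resolution gives the same ground footprint in every case. The short check below works through that arithmetic; the variable and function names are illustrative only.

```python
# Band resolution in metres -> patch side length in pixels (from the abstract).
PATCH_LAYOUT = {10: 120, 20: 60, 60: 20}


def ground_extent_m(resolution_m, side_px):
    """Side length on the ground, in metres, covered by one patch."""
    return resolution_m * side_px


extents = {res: ground_extent_m(res, px) for res, px in PATCH_LAYOUT.items()}

for res, extent in extents.items():
    print(f"{res} m bands: {PATCH_LAYOUT[res]} px per side, {extent} m per side")
```

Every band group therefore describes the same 1200 m by 1200 m area, just sampled at a different resolution.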

A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification


Title	A limited-size ensemble of homogeneous CNN/LSTMs for high-performance word classification
Authors	Mahya Ameryan, Lambert Schomaker
Abstract	In recent years, long short-term memory neural networks (LSTMs) have been applied quite successfully to problems in handwritten text recognition. However, their strength lies more in handling sequences of variable length than in handling geometric variability of the image patterns. Furthermore, the best results for LSTMs are often based on large-scale training of an ensemble of network instances. In this paper, an end-to-end convolutional LSTM Neural Network is used to handle both geometric variation and sequence variability. We show that high performance can be reached on a common benchmark set by using proper data augmentation for just five such networks using a proper coding scheme and a proper voting scheme. The networks have similar architectures (Convolutional Neural Network (CNN): five layers, bidirectional LSTM (BiLSTM): three layers followed by a connectionist temporal classification (CTC) processing step). The approach assumes differently-scaled input images and different feature map sizes. Two datasets are used for evaluation of the performance of our algorithm: a standard benchmark RIMES dataset (French), and a historical handwritten dataset KdK (Dutch). Final performance obtained for the word-recognition test of RIMES was 96.6%, a clear improvement over other state-of-the-art approaches. On the KdK dataset, our approach also shows good results. The proposed approach is deployed in the Monk search engine for historical-handwriting collections.
Tasks	Data Augmentation
Published	2019-12-06
URL	https://arxiv.org/abs/1912.03223v1
PDF	https://arxiv.org/pdf/1912.03223v1.pdf
PWC	https://paperswithcode.com/paper/a-limited-size-ensemble-of-homogeneous
Repo
Framework

Robust Online Model Adaptation by Extended Kalman Filter with Exponential Moving Average and Dynamic Multi-Epoch Strategy


Title	Robust Online Model Adaptation by Extended Kalman Filter with Exponential Moving Average and Dynamic Multi-Epoch Strategy
Authors	Abulikemu Abuduweili, Changliu Liu
Abstract	High fidelity behavior prediction of intelligent agents is critical in many applications. However, the prediction model trained on the training set may not generalize to the testing set due to domain shift and time variance. The challenge motivates the adoption of online adaptation algorithms to update prediction models in real-time to improve the prediction performance. Inspired by Extended Kalman Filter (EKF), this paper introduces a series of online adaptation methods, which are applicable to neural network-based models. A base adaptation algorithm Modified EKF with forgetting factor (MEKF$_\lambda$) is introduced first, followed by exponential moving average filtering techniques. Then this paper introduces a dynamic multi-epoch update strategy to effectively utilize samples received in real time. With all these extensions, we propose a robust online adaptation algorithm: MEKF with Exponential Moving Average and Dynamic Multi-Epoch strategy (MEKF$_{\text{EMA-DME}}$). The proposed algorithm outperforms existing methods as demonstrated in experiments.
Tasks
Published	2019-12-04
URL	https://arxiv.org/abs/1912.01790v2
PDF	https://arxiv.org/pdf/1912.01790v2.pdf
PWC	https://paperswithcode.com/paper/robust-online-model-adaptation-by-extended
Repo
Framework