April 3, 2020

# Paper Group AWR 27

Discrete Action On-Policy Learning with Action-Value Critic. TAdam: A Robust Stochastic Gradient Optimizer. Random Smoothing Might be Unable to Certify $\ell_\infty$ Robustness for High-Dimensional Images. To Share or Not To Share: A Comprehensive Appraisal of Weight-Sharing. Negative Margin Matters: Understanding Margin in Few-shot Classification. …

#### Discrete Action On-Policy Learning with Action-Value Critic

Title Discrete Action On-Policy Learning with Action-Value Critic
Authors Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou
Abstract Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy gradient based deep RL algorithms efficiently. To effectively operate in multidimensional discrete action spaces, we construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation. We follow rigorous statistical analysis to design how to generate and combine these correlated actions, and how to sparsify the gradients by shutting down the contributions from certain dimensions. These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques. We demonstrate these properties on OpenAI Gym benchmark tasks, and illustrate how discretizing the action space could benefit the exploration phase and hence facilitate convergence to a better local optimal solution thanks to the flexibility of discrete policy.
Published 2020-02-10
URL https://arxiv.org/abs/2002.03534v2
PDF https://arxiv.org/pdf/2002.03534v2.pdf
PWC https://paperswithcode.com/paper/discrete-action-on-policy-learning-with
Repo https://github.com/yuguangyue/CARSM
Framework tf
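
The abstract's point that complexity grows exponentially with the action-space dimension can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper names, the choice of 11 bins, and the assumption that the policy factorizes across dimensions are not taken from the paper or its repository.

```python
from typing import List


def joint_action_count(bins_per_dim: int, num_dims: int) -> int:
    """Number of joint actions when every dimension has `bins_per_dim` choices."""
    return bins_per_dim ** num_dims


def factorized_logit_count(bins_per_dim: int, num_dims: int) -> int:
    """Logits needed if the policy factorizes across dimensions (assumption of this sketch)."""
    return bins_per_dim * num_dims


def discretize(low: float, high: float, bins: int) -> List[float]:
    """Evenly spaced action values covering the interval [low, high]."""
    step = (high - low) / (bins - 1)
    return [low + i * step for i in range(bins)]


if __name__ == "__main__":
    # Hypothetical setting: a 6-dimensional action, 11 bins per dimension.
    print(joint_action_count(11, 6))      # size of the joint action set
    print(factorized_logit_count(11, 6))  # outputs of a factorized policy head
    print(discretize(-1.0, 1.0, 5))       # one discretized dimension
```

Under these assumptions the joint set grows as a power of the dimension, while a per-dimension policy head grows only linearly.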

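#### TAdam: A Robust Stochastic Gradient Optimizer

Title TAdam: A Robust Stochastic Gradient Optimizer
Authors Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Kenji Sugimoto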
Abstract Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in the robotics domain. To perform well even with such noise, we expect them to be able to detect outliers and discard them when needed. We therefore propose a new stochastic gradient optimization method, whose robustness is directly built in the algorithm, using the robust student-t distribution as its core idea. Adam, the popular optimization method, is modified with our method and the resultant optimizer, so-called TAdam, is shown to effectively outperform Adam in terms of robustness against noise on diverse tasks, ranging from regression and classification to reinforcement learning problems. The implementation of our algorithm can be found at https://github.com/Mahoumaru/TAdam.git
Published 2020-02-29
URL https://arxiv.org/abs/2003.00179v2
PDF https://arxiv.org/pdf/2003.00179v2.pdf
Framework pytorch
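
To give a feel for how a student-t distribution can make a running average robust, here is a minimal sketch of a weighted mean whose weights shrink for samples far from the current estimate. It is not the TAdam update: it leaves out Adam's exponential decay and second-moment estimate, and the function names, the fixed variance, and the degrees-of-freedom value are assumptions made for illustration.

```python
from typing import Sequence


def student_t_weight(sample: float, mean: float, var: float,
                     dof: float = 1.0, eps: float = 1e-8) -> float:
    """Weight (dof + 1) / (dof + squared normalized distance) for a 1-D sample."""
    dist2 = (sample - mean) ** 2 / (var + eps)
    return (dof + 1.0) / (dof + dist2)


def robust_running_mean(samples: Sequence[float], var: float = 1.0,
                        dof: float = 1.0) -> float:
    """Running weighted mean in which outliers receive small student-t weights."""
    mean = 0.0
    weight_sum = 0.0
    for g in samples:
        if weight_sum == 0.0:
            w = 1.0  # first sample: nothing to compare against yet
        else:
            w = student_t_weight(g, mean, var, dof)
        mean = (weight_sum * mean + w * g) / (weight_sum + w)
        weight_sum += w
    return mean


if __name__ == "__main__":
    # Hypothetical gradient stream: nine clean values and one large outlier.
    stream = [1.0] * 9 + [100.0]
    print(sum(stream) / len(stream))    # plain mean, pulled towards the outlier
    print(robust_running_mean(stream))  # weighted mean, stays close to 1
```

In this toy stream the outlier receives a weight several orders of magnitude smaller than the clean samples, so it barely moves the estimate.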

#### Random Smoothing Might be Unable to Certify $\ell_\infty$ Robustness for High-Dimensional Images

Title Random Smoothing Might be Unable to Certify $\ell_\infty$ Robustness for High-Dimensional Images
Authors Avrim Blum, Travis Dick, Naren Manoj, Hongyang Zhang
Abstract We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the $\ell_p$ ball of radius $\epsilon$ when $p>2$. Although random smoothing has been well understood for the $\ell_2$ case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for the case of $p>2$. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the $\ell_\infty$ threat model. In this work, we show that any noise distribution $\mathcal{D}$ over $\mathbb{R}^d$ that provides $\ell_p$ robustness for all base classifiers with $p>2$ must satisfy $\mathbb{E}\eta_i^2=\Omega(d^{1-2/p}\epsilon^2(1-\delta)/\delta^2)$ for 99% of the features (pixels) of vector $\eta\sim\mathcal{D}$, where $\epsilon$ is the robust radius and $\delta$ is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in $[0,255]$, the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.
Published 2020-02-10
URL https://arxiv.org/abs/2002.03517v3
PDF https://arxiv.org/pdf/2002.03517v3.pdf
PWC https://paperswithcode.com/paper/random-smoothing-might-be-unable-to-certify
Framework pytorch
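
The bound in the abstract can be read off numerically. The sketch below evaluates only the scaling term $d^{1-2/p}\epsilon^2(1-\delta)/\delta^2$; the constant hidden inside $\Omega(\cdot)$ is not stated in the abstract and is ignored here, and the function name and example values are assumptions.

```python
import math


def variance_lower_bound_scale(d: int, eps: float, delta: float,
                               p: float = math.inf) -> float:
    """Return d^(1 - 2/p) * eps^2 * (1 - delta) / delta^2, without the Omega constant."""
    exponent = 1.0 - 2.0 / p  # 2 / inf evaluates to 0.0, so the exponent is 1 for l_inf
    return d ** exponent * eps ** 2 * (1.0 - delta) / delta ** 2


if __name__ == "__main__":
    # Hypothetical image sizes: 3x32x32 and 3x224x224 inputs, same eps and delta.
    small = variance_lower_bound_scale(3 * 32 * 32, eps=1.0, delta=0.5)
    large = variance_lower_bound_scale(3 * 224 * 224, eps=1.0, delta=0.5)
    print(large / small)  # for l_inf the term grows linearly with the dimension d
```

For $p=\infty$ the exponent $1-2/p$ equals 1, so the required per-pixel variance scales linearly in $d$. Setting $p=2$ in the same expression makes the factor $d^{1-2/p}$ equal to 1, which removes the dependence on dimension.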

#### To Share or Not To Share: A Comprehensive Appraisal of Weight-Sharing

Title To Share or Not To Share: A Comprehensive Appraisal of Weight-Sharing
Authors Aloïs Pourchot, Alexis Ducarouge, Olivier Sigaud
Abstract Weight-sharing (WS) has recently emerged as a paradigm to accelerate the automated search for efficient neural architectures, a process dubbed Neural Architecture Search (NAS). Although very appealing, this framework is not without drawbacks and several works have started to question its capabilities on small hand-crafted benchmarks. In this paper, we take advantage of the NASBench-101 dataset to challenge the efficiency of WS on a representative search space. By comparing a SOTA WS approach to a plain random search we show that, despite decent correlations between evaluations using weight-sharing and standalone ones, WS is only rarely helpful to NAS. We highlight in particular the reliance of the benefits on the search space itself.
Published 2020-02-11
URL https://arxiv.org/abs/2002.04289v1
PDF https://arxiv.org/pdf/2002.04289v1.pdf
PWC https://paperswithcode.com/paper/to-share-or-not-to-share-a-comprehensive
Repo https://github.com/apourchot/to_share_or_not_to_share
Framework pytorch

#### Negative Margin Matters: Understanding Margin in Few-shot Classification

Title Negative Margin Matters: Understanding Margin in Few-shot Classification
Authors Bin Liu, Yue Cao, Yutong Lin, Qi Li, Zheng Zhang, Mingsheng Long, Han Hu
Abstract This paper introduces a negative margin loss to metric learning based few-shot learning methods. The negative margin loss significantly outperforms regular softmax loss, and achieves state-of-the-art accuracy on three standard few-shot classification benchmarks with few bells and whistles. These results are contrary to the common practice in the metric learning field, that the margin is zero or positive. To understand why the negative margin loss performs well for the few-shot classification, we analyze the discriminability of learned features w.r.t different margins for training and novel classes, both empirically and theoretically. We find that although negative margin reduces the feature discriminability for training classes, it may also avoid falsely mapping samples of the same novel class to multiple peaks or clusters, and thus benefit the discrimination of novel classes. Code is available at https://github.com/bl0/negative-margin.few-shot.
Tasks Few-Shot Image Classification, Few-Shot Learning, Metric Learning
Published 2020-03-26
URL https://arxiv.org/abs/2003.12060v1
PDF https://arxiv.org/pdf/2003.12060v1.pdf
PWC https://paperswithcode.com/paper/negative-margin-matters-understanding-margin
Repo https://github.com/bl0/negative-margin.few-shot
Framework pytorch
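
As a rough illustration of what a margin in a softmax loss does, the sketch below subtracts a margin from the cosine similarity of the true class before a scaled softmax. It is a generic margin-softmax sketch, not code from the linked repository; the function names, the scale of 10 and the margin of -0.3 are assumptions.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def margin_softmax_loss(feature: Sequence[float],
                        class_weights: Sequence[Sequence[float]],
                        label: int, margin: float = -0.3,
                        scale: float = 10.0) -> float:
    """Cross-entropy over scaled cosine logits, with `margin` subtracted for the true class."""
    logits = []
    for j, w in enumerate(class_weights):
        sim = cosine(feature, w)
        if j == label:
            sim -= margin  # a negative margin raises the true-class logit
        logits.append(scale * sim)
    top = max(logits)  # subtract the maximum for a numerically stable log-sum-exp
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_z - logits[label]


if __name__ == "__main__":
    feature = [1.0, 0.0]
    weights = [[1.0, 0.0], [0.0, 1.0]]
    for m in (-0.3, 0.0, 0.3):
        print(m, margin_softmax_loss(feature, weights, label=0, margin=m))
```

For a fixed sample, a negative margin yields a smaller loss than a zero or positive one, so training puts less pressure on separating the training classes.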

#### Monocular Depth Prediction Through Continuous 3D Loss

Title Monocular Depth Prediction Through Continuous 3D Loss
Authors Minghan Zhu, Maani Ghaffari, Yuanxin Zhong, Pingping Lu, Zhong Cao, Ryan M. Eustice, Huei Peng
Abstract This paper reports a new continuous 3D loss function for learning depth from monocular images. The dense depth prediction from a monocular image is supervised using sparse LIDAR points, exploiting available data from camera-LIDAR sensor suites during training. Currently, no accurate and affordable range sensor is available. Stereo cameras and LIDARs measure depth either inaccurately or sparsely and at high cost. In contrast to the current point-to-point loss evaluation approach, the proposed 3D loss treats point clouds as continuous objects; and therefore, it overcomes the lack of dense ground truth depth due to the sparsity of LIDAR measurements. Experimental evaluations show that the proposed method achieves accurate depth measurement with consistent 3D geometric structures through a monocular camera.
Published 2020-03-21
URL https://arxiv.org/abs/2003.09763v1
PDF https://arxiv.org/pdf/2003.09763v1.pdf
PWC https://paperswithcode.com/paper/monocular-depth-prediction-through-continuous
Repo https://github.com/minghanz/DepthC3D
Framework pytorch

#### Identification of Chimera using Machine Learning

Title Identification of Chimera using Machine Learning
Authors M. A. Ganaie, Saptarshi Ghosh, Naveen Mendola, M Tanveer, Sarika Jalan
Abstract Coupled dynamics on the network models have been tremendously helpful in getting insight into complex spatiotemporal dynamical patterns of a wide variety of large-scale real-world complex systems. Chimera, a state of coexistence of incoherence and coherence, is one such pattern arising in identically coupled oscillators, which has recently drawn tremendous attention due to its peculiar nature and wide applicability, especially in neuroscience. The identification of chimera is a challenging problem due to ambiguity in its appearance. We present a distinctive approach to identify and characterize the chimera state using machine learning techniques, namely random forest, oblique random forests via multi-surface proximal support vector machines (MPRaF-T, P, N) and sparse pre-trained / auto-encoder based random vector functional link neural network (RVFL-AE). We demonstrate high accuracy in identifying the coherent, incoherent and chimera states from given spatial profiles. We validate this approach for different time-continuous and time-discrete coupled dynamics on networks. This work provides a direction for employing machine learning techniques to identify dynamical patterns arising due to the interaction among non-linear units on large-scale, and for characterizing complex spatio-temporal phenomena in real-world systems for various applications.
Published 2020-01-16
URL https://arxiv.org/abs/2001.08985v1
PDF https://arxiv.org/pdf/2001.08985v1.pdf
PWC https://paperswithcode.com/paper/identification-of-chimera-using-machine
Repo https://github.com/complex-systems-lab/Project_Chimera_ML
Framework none

#### Exposing Backdoors in Robust Machine Learning Models

Title Exposing Backdoors in Robust Machine Learning Models
Authors Ezekiel Soremekun, Sakshi Udeshi, Sudipta Chattopadhyay, Andreas Zeller
Abstract The introduction of robust optimisation has pushed the state-of-the-art in defending against adversarial attacks. However, the behaviour of such optimisation has not been studied in the light of a fundamentally different class of attacks called backdoors. In this paper, we demonstrate that adversarially robust models are susceptible to backdoor attacks. Subsequently, we observe that backdoors are reflected in the feature representation of such models. Then, this is leveraged to detect backdoor-infected models. Specifically, we use feature clustering to effectively detect backdoor-infected robust Deep Neural Networks (DNNs). In our evaluation of major classification tasks, our approach effectively detects robust DNNs infected with backdoors. Our investigation reveals that salient features of adversarially robust DNNs break the stealthy nature of backdoor attacks.
Published 2020-02-25
URL https://arxiv.org/abs/2003.00865v1
PDF https://arxiv.org/pdf/2003.00865v1.pdf
PWC https://paperswithcode.com/paper/exposing-backdoors-in-robust-machine-learning
Repo https://github.com/sakshiudeshi/Expose-Robust-Backdoors
Framework pytorch

#### Learning distributed representations of graphs with Geo2DR

Title Learning distributed representations of graphs with Geo2DR
Authors Paul Scherer, Pietro Lio
Abstract We present Geo2DR, a Python library for unsupervised learning on graph-structured data using discrete substructure patterns and neural language models. It contains efficient implementations of popular graph decomposition algorithms and neural language models in PyTorch which are combined to learn representations using the distributive hypothesis. Furthermore, Geo2DR comes with general data processing and loading methods which can bring substantial speed-up in the training of the neural language models. Through this we provide a unified set of tools and design methodology to quickly construct systems capable of learning distributed representations of graphs. This is useful for replication of existing methods, modification, or even creation of novel systems. This work serves to present the Geo2DR library and perform a comprehensive comparative analysis of existing methods re-implemented using Geo2DR across several widely used graph classification benchmarks. We show a high reproducibility of results in published methods and interoperability with other libraries useful for distributive language modelling.
Published 2020-03-12
URL https://arxiv.org/abs/2003.05926v2
PDF https://arxiv.org/pdf/2003.05926v2.pdf
PWC https://paperswithcode.com/paper/learning-distributed-representations-of-3
Repo https://github.com/paulmorio/geo2dr
Framework pytorch

#### Hierarchical Conditional Relation Networks for Video Question Answering

Title Hierarchical Conditional Relation Networks for Video Question Answering
Authors Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran
Abstract Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts. We introduce a general-purpose reusable neural unit called Conditional Relation Network (CRN) that serves as a building block to construct more sophisticated structures for representation and reasoning over video. CRN takes as input an array of tensorial objects and a conditioning feature, and computes an array of encoded output objects. Model building becomes a simple exercise of replication, rearrangement and stacking of these reusable units for diverse modalities and contextual information. This design thus supports high-order relational and multi-step reasoning. The resulting architecture for VideoQA is a CRN hierarchy whose branches represent sub-videos or clips, all sharing the same question as the contextual condition. Our evaluations on well-known datasets achieved new SoTA results, demonstrating the impact of building a general-purpose reasoning unit on complex domains such as VideoQA.
Published 2020-02-25
URL https://arxiv.org/abs/2002.10698v3
PDF https://arxiv.org/pdf/2002.10698v3.pdf
PWC https://paperswithcode.com/paper/hierarchical-conditional-relation-networks
Repo https://github.com/thaolmk54/hcrn-videoqa
Framework pytorch

#### FusionLane: Multi-Sensor Fusion for Lane Marking Semantic Segmentation Using Deep Neural Networks

Title FusionLane: Multi-Sensor Fusion for Lane Marking Semantic Segmentation Using Deep Neural Networks
Authors Ruochen Yin, Biao Yu, Huapeng Wu, Yutao Song, Runxin Niu
Abstract It is a crucial step to achieve effective semantic segmentation of lane marking during the construction of the lane level high-precision map. In recent years, many image semantic segmentation methods have been proposed. These methods mainly focus on the image from the camera; due to the limitation of the sensor itself, the accurate three-dimensional spatial position of the lane marking cannot be obtained, so the demand for the lane level high-precision map construction cannot be met. This paper proposes a lane marking semantic segmentation method based on LIDAR and camera fusion deep neural network. Different from other methods, in order to obtain accurate position information of the segmentation results, the semantic segmentation object of this paper is a bird’s eye view converted from a LIDAR point cloud instead of an image captured by a camera. This method first uses the deeplabv3+ [1] network to segment the image captured by the camera, and the segmentation result is merged with the point clouds collected by the LIDAR as the input of the proposed network. In this neural network, we also add a long short-term memory (LSTM) structure to assist the network for semantic segmentation of lane markings by using the time series information. The experiments on more than 14,000 image datasets which we have manually labeled and expanded have shown the proposed method has better performance on the semantic segmentation of the point cloud bird’s eye view. Therefore, the automation of high-precision map construction can be significantly improved. Our code is available at https://github.com/rolandying/FusionLane.
Tasks Semantic Segmentation, Sensor Fusion, Time Series
Published 2020-03-09
URL https://arxiv.org/abs/2003.04404v1
PDF https://arxiv.org/pdf/2003.04404v1.pdf
PWC https://paperswithcode.com/paper/fusionlane-multi-sensor-fusion-for-lane
Repo https://github.com/rolandying/FusionLane
Framework tf
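
#### Causal structure learning from time series: Large regression coefficients may predict causal links better in practice than small p-values

Title Causal structure learning from time series: Large regression coefficients may predict causal links better in practice than small p-values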
Authors Sebastian Weichwald, Martin E Jakobsen, Phillip B Mogensen, Lasse Petersen, Nikolaj Thams, Gherardo Varando
Abstract In this article, we describe the algorithms for causal structure learning from time series data that won the Causality 4 Climate competition at the Conference on Neural Information Processing Systems 2019 (NeurIPS). We examine how our combination of established ideas achieves competitive performance on semi-realistic and realistic time series data exhibiting common challenges in real-world Earth sciences data. In particular, we discuss a) a rationale for leveraging linear methods to identify causal links in non-linear systems, b) a simulation-backed explanation as to why large regression coefficients may predict causal links better in practice than small p-values and thus why normalising the data may sometimes hinder causal structure learning. For benchmark usage, we provide implementations at https://github.com/sweichwald/tidybench and detail the algorithms here. We propose the presented competition-proven methods for baseline benchmark comparisons to guide the development of novel algorithms for structure learning from time series.
Published 2020-02-21
URL https://arxiv.org/abs/2002.09573v1
PDF https://arxiv.org/pdf/2002.09573v1.pdf
PWC https://paperswithcode.com/paper/causal-structure-learning-from-time-series
Repo https://github.com/sweichwald/tidybench
Framework none
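
The idea of scoring candidate links by the size of lagged regression coefficients can be sketched on a toy system. The code below is not one of the tidybench implementations: the two-variable simulated process, its coefficients, and the function names are assumptions chosen so that the example runs with the standard library alone.

```python
import random
from typing import List, Tuple


def simulate(n_steps: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Toy process in which x0 drives x1 with a lag of one step, but not vice versa."""
    rng = random.Random(seed)
    x0, x1 = 0.0, 0.0
    series = []
    for _ in range(n_steps):
        new_x0 = 0.5 * x0 + rng.gauss(0.0, 1.0)
        new_x1 = 0.8 * x0 + 0.5 * x1 + rng.gauss(0.0, 1.0)
        x0, x1 = new_x0, new_x1
        series.append((x0, x1))
    return series


def lagged_coefficients(series: List[Tuple[float, float]]) -> List[List[float]]:
    """Least-squares fit of x_t on x_{t-1}; returns coef[target][source]."""
    past, now = series[:-1], series[1:]
    # Entries of the 2x2 Gram matrix of the lagged predictors.
    s00 = sum(a * a for a, _ in past)
    s01 = sum(a * b for a, b in past)
    s11 = sum(b * b for _, b in past)
    det = s00 * s11 - s01 * s01
    coef = []
    for target in (0, 1):
        r0 = sum(p[0] * n[target] for p, n in zip(past, now))
        r1 = sum(p[1] * n[target] for p, n in zip(past, now))
        # Solve the normal equations with the closed-form 2x2 inverse.
        coef.append([(s11 * r0 - s01 * r1) / det, (s00 * r1 - s01 * r0) / det])
    return coef


if __name__ == "__main__":
    coef = lagged_coefficients(simulate(5000))
    # Score each cross-variable link by the absolute value of its coefficient.
    print("x0 -> x1:", abs(coef[1][0]))
    print("x1 -> x0:", abs(coef[0][1]))
```

On this simulated data the true link x0 -> x1 gets a much larger score than the absent link x1 -> x0. The regression is run on the raw series, without normalising the variables first.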

#### Speech2Phone: A Multilingual and Text Independent Speaker Identification Model

Title Speech2Phone: A Multilingual and Text Independent Speaker Identification Model
Authors Edresson Casanova, Arnaldo Candido Junior, Christopher Shulby, Hamilton Pereira da Silva, Pedro Luiz de Paula Filho, Alessandro Ferreira Cordeiro, Victor de Oliveira Guedes, Sandra Maria Aluisio
Abstract Voice recognition is an area with a wide application potential. Speaker identification is useful in several voice recognition tasks, as seen in voice-based authentication, transcription systems and intelligent personal assistants. Some tasks benefit from open-set models which can handle new speakers without the need of retraining. Audio embeddings for speaker identification is a proposal to solve this issue. However, choosing a suitable model is a difficult task, especially when the training resources are scarce. Besides, it is not always clear whether embeddings are as good as more traditional methods. In this work, we propose the Speech2Phone and compare several embedding models for open-set speaker identification, as well as traditional closed-set models. The models were investigated in the scenario of small datasets, which makes them more applicable to languages in which data scarceness is an issue. The results show that embeddings generated by artificial neural networks are competitive when compared to classical approaches for the task. Considering a testing dataset composed of 20 speakers, the best models reach accuracies of 100% and 76.96% for closed and open set scenarios, respectively. Results suggest that the models can perform language independent speaker identification. Among the tested models, a fully connected one, here presented as Speech2Phone, led to the highest accuracy. Furthermore, the models were tested for different languages showing that the knowledge learned was successfully transferred for close and distant languages to Portuguese (in terms of vocabulary). Finally, the models can scale and can handle more speakers than they were trained for, identifying 150% more speakers while still maintaining 55% accuracy.
Published 2020-02-25
URL https://arxiv.org/abs/2002.11213v1
PDF https://arxiv.org/pdf/2002.11213v1.pdf
PWC https://paperswithcode.com/paper/speech2phone-a-multilingual-and-text
Repo https://github.com/Edresson/Speech2Phone
Framework none

#### pymoo: Multi-objective Optimization in Python

Title pymoo: Multi-objective Optimization in Python
Authors Julian Blank, Kalyanmoy Deb
Abstract Python has become the programming language of choice for research and industry projects related to data science, machine learning, and deep learning. Since optimization is an inherent part of these research fields, more optimization related frameworks have arisen in the past few years. Only a few of them support optimization of multiple conflicting objectives at a time, but do not provide comprehensive tools for a complete multi-objective optimization task. To address this issue, we have developed pymoo, a multi-objective optimization framework in Python. We provide a guide to getting started with our framework by demonstrating the implementation of an exemplary constrained multi-objective optimization scenario. Moreover, we give a high-level overview of the architecture of pymoo to show its capabilities followed by an explanation of each module and its corresponding sub-modules. The implementations in our framework are customizable and algorithms can be modified/extended by supplying custom operators. Moreover, a variety of single, multi and many-objective test problems are provided and gradients can be retrieved by automatic differentiation out of the box. Also, pymoo addresses practical needs, such as the parallelization of function evaluations, methods to visualize low and high-dimensional spaces, and tools for multi-criteria decision making. For more information about pymoo, readers are encouraged to visit: https://pymoo.org
Published 2020-01-22
URL https://arxiv.org/abs/2002.04504v1
PDF https://arxiv.org/pdf/2002.04504v1.pdf
PWC https://paperswithcode.com/paper/pymoo-multi-objective-optimization-in-python
Repo https://github.com/msu-coinlab/pymoo
Framework none
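
To show what "multiple conflicting objectives" means in practice, here is a small plain-Python sketch of Pareto dominance for minimisation problems. It deliberately does not use pymoo's API; the function names and the sample points are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (the Pareto set of the sample)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (cost, error) pairs; both objectives are to be minimised.
    candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
    print(non_dominated(candidates))
```

With conflicting objectives no single candidate wins on both axes, so the result is a set of trade-offs.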

#### Benchmarking Graph Neural Networks

Title Benchmarking Graph Neural Networks
Authors Vijay Prakash Dwivedi, Chaitanya K. Joshi, Thomas Laurent, Yoshua Bengio, Xavier Bresson
Abstract Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. They have been successfully applied to a myriad of domains including chemistry, physics, social sciences, knowledge graphs, recommendation, and neuroscience. As the field grows, it becomes critical to identify the architectures and key mechanisms which generalize across graph sizes, enabling us to tackle larger, more complex datasets and domains. Unfortunately, it has been increasingly difficult to gauge the effectiveness of new GNNs and compare models in the absence of a standardized benchmark with consistent experimental settings and large datasets. In this paper, we propose a reproducible GNN benchmarking framework, with the facility for researchers to add new datasets and models conveniently. We apply this benchmarking framework to novel medium-scale graph datasets from mathematical modeling, computer vision, chemistry and combinatorial problems to establish key operations when designing effective GNNs. Precisely, graph convolutions, anisotropic diffusion, residual connections and normalization layers are universal building blocks for developing robust and scalable GNNs.