April 3, 2020

Paper Group AWR 27

Discrete Action On-Policy Learning with Action-Value Critic. TAdam: A Robust Stochastic Gradient Optimizer. Random Smoothing Might be Unable to Certify $\ell_\infty$ Robustness for High-Dimensional Images. To Share or Not To Share: A Comprehensive Appraisal of Weight-Sharing. Negative Margin Matters: Understanding Margin in Few-shot Classification. …

Discrete Action On-Policy Learning with Action-Value Critic


Title	Discrete Action On-Policy Learning with Action-Value Critic
Authors	Yuguang Yue, Yunhao Tang, Mingzhang Yin, Mingyuan Zhou
Abstract	Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy gradient based deep RL algorithms efficiently. To effectively operate in multidimensional discrete action spaces, we construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation. We follow rigorous statistical analysis to design how to generate and combine these correlated actions, and how to sparsify the gradients by shutting down the contributions from certain dimensions. These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques. We demonstrate these properties on OpenAI Gym benchmark tasks, and illustrate how discretizing the action space could benefit the exploration phase and hence facilitate convergence to a better local optimal solution thanks to the flexibility of discrete policy.
Tasks
Published	2020-02-10
URL	https://arxiv.org/abs/2002.03534v2
PDF	https://arxiv.org/pdf/2002.03534v2.pdf
PWC	https://paperswithcode.com/paper/discrete-action-on-policy-learning-with
Repo	https://github.com/yuguangyue/CARSM
Framework	tf
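
To make the exponential growth mentioned in the abstract concrete, the minimal sketch below compares the size of a joint discrete action space with the number of outputs needed when each action dimension is handled separately. The bin and dimension counts and the helper names (`joint_action_count`, `factorized_output_count`, `discretize`) are illustrative assumptions, not values or code from the paper or its repository.

```python
def joint_action_count(bins_per_dim: int, num_dims: int) -> int:
    """Number of distinct joint actions when every dimension has the same bin count."""
    return bins_per_dim ** num_dims


def factorized_output_count(bins_per_dim: int, num_dims: int) -> int:
    """Number of logits needed if each dimension gets its own categorical distribution."""
    return bins_per_dim * num_dims


def discretize(low: float, high: float, bins_per_dim: int) -> list:
    """Evenly spaced action values covering the interval [low, high]."""
    step = (high - low) / (bins_per_dim - 1)
    return [low + i * step for i in range(bins_per_dim)]


if __name__ == "__main__":
    # Hypothetical setting: 6 action dimensions, 11 bins each.
    print(joint_action_count(11, 6))       # size of the joint action space
    print(factorized_output_count(11, 6))  # outputs for a per-dimension policy
    print(discretize(-1.0, 1.0, 5))        # candidate values for one dimension
```

The joint count grows as a power of the number of dimensions, while the per-dimension count grows only linearly, which is why enumerating joint actions quickly becomes impractical.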

TAdam: A Robust Stochastic Gradient Optimizer


Title	TAdam: A Robust Stochastic Gradient Optimizer
Authors	Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Kenji Sugimoto
Abstract	Machine learning algorithms aim to find patterns from observations, which may include some noise, especially in the robotics domain. To perform well even with such noise, we expect them to be able to detect outliers and discard them when needed. We therefore propose a new stochastic gradient optimization method, whose robustness is directly built in the algorithm, using the robust student-t distribution as its core idea. Adam, the popular optimization method, is modified with our method and the resultant optimizer, so-called TAdam, is shown to effectively outperform Adam in terms of robustness against noise on diverse tasks, ranging from regression and classification to reinforcement learning problems. The implementation of our algorithm can be found at https://github.com/Mahoumaru/TAdam.git
Tasks
Published	2020-02-29
URL	https://arxiv.org/abs/2003.00179v2
PDF	https://arxiv.org/pdf/2003.00179v2.pdf
PWC	https://paperswithcode.com/paper/tadam-a-robust-stochastic-gradient-optimizer
Repo	https://github.com/Mahoumaru/TAdam
Framework	pytorch
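
The sketch below illustrates why a Student-t model is robust to outliers. It does not reproduce the TAdam update rule, which is not given in the abstract. It estimates a location parameter by iterative reweighting, where each sample gets the standard Student-t weight `(nu + 1) / (nu + z**2)` so that far-away samples contribute little. The function name, the fixed `scale`, and the sample values are assumptions made for illustration.

```python
def student_t_mean(xs, nu=1.0, scale=1.0, iters=50):
    """Robust location estimate via iteratively reweighted averaging.

    Each sample is weighted by (nu + 1) / (nu + z^2), where z is its
    standardized distance from the current estimate, so outliers are
    strongly down-weighted. The scale is kept fixed for simplicity.
    """
    mu = sorted(xs)[len(xs) // 2]  # start from a median-like value
    for _ in range(iters):
        weights = [(nu + 1.0) / (nu + ((x - mu) / scale) ** 2) for x in xs]
        mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return mu


if __name__ == "__main__":
    # Hypothetical noisy observations with a single gross outlier.
    samples = [1.0, 1.1, 0.9, 1.05, 0.95, 100.0]
    print(sum(samples) / len(samples))  # plain mean, pulled toward the outlier
    print(student_t_mean(samples))      # stays close to the bulk of the data
```

An optimizer built on this idea would apply a comparable down-weighting to gradient samples that deviate strongly from the running estimate; the exact form used by TAdam should be taken from the paper or the linked repository.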

Random Smoothing Might be Unable to Certify $\ell_\infty$ Robustness for High-Dimensional Images


Title	Random Smoothing Might be Unable to Certify $\ell_\infty$ Robustness for High-Dimensional Images
Authors	Avrim Blum, Travis Dick, Naren Manoj, Hongyang Zhang
Abstract	We show a hardness result for random smoothing to achieve certified adversarial robustness against attacks in the $\ell_p$ ball of radius $\epsilon$ when $p>2$. Although random smoothing has been well understood for the $\ell_2$ case using the Gaussian distribution, much remains unknown concerning the existence of a noise distribution that works for the case of $p>2$. This has been posed as an open problem by Cohen et al. (2019) and includes many significant paradigms such as the $\ell_\infty$ threat model. In this work, we show that any noise distribution $\mathcal{D}$ over $\mathbb{R}^d$ that provides $\ell_p$ robustness for all base classifiers with $p>2$ must satisfy $\mathbb{E}\eta_i^2=\Omega(d^{1-2/p}\epsilon^2(1-\delta)/\delta^2)$ for 99% of the features (pixels) of vector $\eta\sim\mathcal{D}$, where $\epsilon$ is the robust radius and $\delta$ is the score gap between the highest-scored class and the runner-up. Therefore, for high-dimensional images with pixel values bounded in $[0,255]$, the required noise will eventually dominate the useful information in the images, leading to trivial smoothed classifiers.
Tasks
Published	2020-02-10
URL	https://arxiv.org/abs/2002.03517v3
PDF	https://arxiv.org/pdf/2002.03517v3.pdf
PWC	https://paperswithcode.com/paper/random-smoothing-might-be-unable-to-certify
Repo	https://github.com/hongyanz/TRADES-smoothing
Framework	pytorch
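
As a rough worked example of the bound in the abstract, the sketch below evaluates the scaling term $d^{1-2/p}\epsilon^2(1-\delta)/\delta^2$. The constant hidden by the $\Omega(\cdot)$ notation is not known from the abstract, so the result indicates only how the required per-pixel noise variance grows, not its actual value. The function name and the example inputs (an image of 3 x 224 x 224 pixels, $\epsilon = 8$ on the $[0,255]$ scale, $\delta = 0.5$) are assumptions chosen for illustration.

```python
import math


def noise_variance_scale(d, eps, delta, p=math.inf):
    """Scaling term d^(1 - 2/p) * eps^2 * (1 - delta) / delta^2 from the bound.

    The constant factor hidden in the Omega notation is omitted.
    """
    exponent = 1.0 if math.isinf(p) else 1.0 - 2.0 / p
    return d ** exponent * eps ** 2 * (1.0 - delta) / delta ** 2


if __name__ == "__main__":
    d = 3 * 224 * 224  # hypothetical input dimension
    for p in (2, 4, math.inf):
        scale = noise_variance_scale(d, eps=8.0, delta=0.5, p=p)
        print(p, scale, math.sqrt(scale))  # variance scale and matching std scale
```

For $p=2$ the term does not depend on $d$, whereas for $p=\infty$ it grows linearly in $d$ while pixel values stay bounded by 255, which is the mismatch the abstract describes.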


To Share or Not To Share: A Comprehensive Appraisal of Weight-Sharing


Title	To Share or Not To Share: A Comprehensive Appraisal of Weight-Sharing
Authors	Aloïs Pourchot, Alexis Ducarouge, Olivier Sigaud
Abstract	Weight-sharing (WS) has recently emerged as a paradigm to accelerate the automated search for efficient neural architectures, a process dubbed Neural Architecture Search (NAS). Although very appealing, this framework is not without drawbacks and several works have started to question its capabilities on small hand-crafted benchmarks. In this paper, we take advantage of the NASBench-101 dataset to challenge the efficiency of WS on a representative search space. By comparing a SOTA WS approach to a plain random search we show that, despite decent correlations between evaluations using weight-sharing and standalone ones, WS is only rarely helpful to NAS. We highlight in particular the reliance of the benefits on the search space itself.
Tasks	Neural Architecture Search
Published	2020-02-11
URL	https://arxiv.org/abs/2002.04289v1
PDF	https://arxiv.org/pdf/2002.04289v1.pdf
PWC	https://paperswithcode.com/paper/to-share-or-not-to-share-a-comprehensive
Repo	https://github.com/apourchot/to_share_or_not_to_share
Framework	pytorch

Negative Margin Matters: Understanding Margin in Few-shot Classification


Title	Negative Margin Matters: Understanding Margin in Few-shot Classification
Authors	Bin Liu, Yue Cao, Yutong Lin, Qi Li, Zheng Zhang, Mingsheng Long, Han Hu
Abstract	This paper introduces a negative margin loss to metric learning based few-shot learning methods. The negative margin loss significantly outperforms regular softmax loss, and achieves state-of-the-art accuracy on three standard few-shot classification benchmarks with few bells and whistles. These results are contrary to the common practice in the metric learning field, that the margin is zero or positive. To understand why the negative margin loss performs well for the few-shot classification, we analyze the discriminability of learned features w.r.t different margins for training and novel classes, both empirically and theoretically. We find that although negative margin reduces the feature discriminability for training classes, it may also avoid falsely mapping samples of the same novel class to multiple peaks or clusters, and thus benefit the discrimination of novel classes. Code is available at https://github.com/bl0/negative-margin.few-shot.
Tasks	Few-Shot Image Classification, Few-Shot Learning, Metric Learning
Published	2020-03-26
URL	https://arxiv.org/abs/2003.12060v1
PDF	https://arxiv.org/pdf/2003.12060v1.pdf
PWC	https://paperswithcode.com/paper/negative-margin-matters-understanding-margin
Repo	https://github.com/bl0/negative-margin.few-shot
Framework	pytorch
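
To show what a margin in a softmax loss does, the sketch below implements a generic additive-margin cosine softmax in plain Python and evaluates it for negative, zero, and positive margins. The paper's exact loss is not stated in the abstract, so this formulation, the `scale` factor, and the example similarities are assumptions rather than the authors' implementation.

```python
import math


def margin_softmax_loss(cosines, label, margin, scale=10.0):
    """Cross-entropy over scaled cosine similarities with an additive margin.

    The margin is subtracted from the true-class similarity only, so a
    negative margin raises the true-class logit instead of lowering it.
    """
    logits = [
        scale * (c - margin) if i == label else scale * c
        for i, c in enumerate(cosines)
    ]
    top = max(logits)  # subtract the maximum for numerical stability
    log_partition = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_partition - logits[label]


if __name__ == "__main__":
    # Hypothetical cosine similarities between one sample and three class prototypes.
    similarities = [0.6, 0.5, 0.1]
    for m in (-0.3, 0.0, 0.3):
        print(m, margin_softmax_loss(similarities, label=0, margin=m))
```

With a negative margin the same sample incurs a smaller loss, so training presses less hard to separate the training classes, which matches the reduced training-class discriminability described in the abstract.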

Monocular Depth Prediction Through Continuous 3D Loss


Title	Monocular Depth Prediction Through Continuous 3D Loss
Authors	Minghan Zhu, Maani Ghaffari, Yuanxin Zhong, Pingping Lu, Zhong Cao, Ryan M. Eustice, Huei Peng
Abstract	This paper reports a new continuous 3D loss function for learning depth from monocular images. The dense depth prediction from a monocular image is supervised using sparse LIDAR points, exploiting available data from camera-LIDAR sensor suites during training. Currently, accurate and affordable range sensor is not available. Stereo cameras and LIDARs measure depth either inaccurately or sparsely/costly. In contrast to the current point-to-point loss evaluation approach, the proposed 3D loss treats point clouds as continuous objects; and therefore, it overcomes the lack of dense ground truth depth due to the sparsity of LIDAR measurements. Experimental evaluations show that the proposed method achieves accurate depth measurement with consistent 3D geometric structures through a monocular camera.
Tasks	Depth Estimation
Published	2020-03-21
URL	https://arxiv.org/abs/2003.09763v1
PDF	https://arxiv.org/pdf/2003.09763v1.pdf
PWC	https://paperswithcode.com/paper/monocular-depth-prediction-through-continuous
Repo	https://github.com/minghanz/DepthC3D
Framework	pytorch

Identification of Chimera using Machine Learning


Title	Identification of Chimera using Machine Learning
Authors	M. A. Ganaie, Saptarshi Ghosh, Naveen Mendola, M Tanveer, Sarika Jalan
Abstract	Coupled dynamics on the network models have been tremendously helpful in getting insight into complex spatiotemporal dynamical patterns of a wide variety of large-scale real-world complex systems. Chimera, a state of coexistence of incoherence and coherence, is one such pattern arising in identically coupled oscillators, which has recently drawn tremendous attention due to its peculiar nature and wide applicability, especially in neuroscience. The identification of chimera is a challenging problem due to ambiguity in its appearance. We present a distinctive approach to identify and characterize the chimera state using machine learning techniques, namely random forest, oblique random forests via multi-surface proximal support vector machines (MPRaF-T, P, N) and sparse pre-trained / auto-encoder based random vector functional link neural network (RVFL-AE). We demonstrate high accuracy in identifying the coherent, incoherent and chimera states from given spatial profiles. We validate this approach for different time-continuous and time-discrete coupled dynamics on networks. This work provides a direction for employing machine learning techniques to identify dynamical patterns arising due to the interaction among non-linear units on large-scale, and for characterizing complex spatio-temporal phenomena in real-world systems for various applications.
Tasks
Published	2020-01-16
URL	https://arxiv.org/abs/2001.08985v1
PDF	https://arxiv.org/pdf/2001.08985v1.pdf
PWC	https://paperswithcode.com/paper/identification-of-chimera-using-machine
Repo	https://github.com/complex-systems-lab/Project_Chimera_ML
Framework	none

Exposing Backdoors in Robust Machine Learning Models


Title	Exposing Backdoors in Robust Machine Learning Models
Authors	Ezekiel Soremekun, Sakshi Udeshi, Sudipta Chattopadhyay, Andreas Zeller
Abstract	The introduction of robust optimisation has pushed the state-of-the-art in defending against adversarial attacks. However, the behaviour of such optimisation has not been studied in the light of a fundamentally different class of attacks called backdoors. In this paper, we demonstrate that adversarially robust models are susceptible to backdoor attacks. Subsequently, we observe that backdoors are reflected in the feature representation of such models. Then, this is leveraged to detect backdoor-infected models. Specifically, we use feature clustering to effectively detect backdoor-infected robust Deep Neural Networks (DNNs). In our evaluation of major classification tasks, our approach effectively detects robust DNNs infected with backdoors. Our investigation reveals that salient features of adversarially robust DNNs break the stealthy nature of backdoor attacks.
Tasks
Published	2020-02-25
URL	https://arxiv.org/abs/2003.00865v1
PDF	https://arxiv.org/pdf/2003.00865v1.pdf
PWC	https://paperswithcode.com/paper/exposing-backdoors-in-robust-machine-learning
Repo	https://github.com/sakshiudeshi/Expose-Robust-Backdoors
Framework	pytorch

Learning distributed representations of graphs with Geo2DR


Title	Learning distributed representations of graphs with Geo2DR
Authors	Paul Scherer, Pietro Lio
Abstract	We present Geo2DR, a Python library for unsupervised learning on graph-structured data using discrete substructure patterns and neural language models. It contains efficient implementations of popular graph decomposition algorithms and neural language models in PyTorch which are combined to learn representations using the distributive hypothesis. Furthermore, Geo2DR comes with general data processing and loading methods which can bring substantial speed-up in the training of the neural language models. Through this we provide a unified set of tools and design methodology to quickly construct systems capable of learning distributed representations of graphs. This is useful for replication of existing methods, modification, or even creation of novel systems. This work serves to present the Geo2DR library and perform a comprehensive comparative analysis of existing methods re-implemented using Geo2DR across several widely used graph classification benchmarks. We show a high reproducibility of results in published methods and interoperability with other libraries useful for distributive language modelling.
Tasks	Graph Classification, Language Modelling
Published	2020-03-12
URL	https://arxiv.org/abs/2003.05926v2
PDF	https://arxiv.org/pdf/2003.05926v2.pdf
PWC	https://paperswithcode.com/paper/learning-distributed-representations-of-3
Repo	https://github.com/paulmorio/geo2dr
Framework	pytorch

Hierarchical Conditional Relation Networks for Video Question Answering


Title	Hierarchical Conditional Relation Networks for Video Question Answering
Authors	Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran
Abstract	Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts. We introduce a general-purpose reusable neural unit called Conditional Relation Network (CRN) that serves as a building block to construct more sophisticated structures for representation and reasoning over video. CRN takes as input an array of tensorial objects and a conditioning feature, and computes an array of encoded output objects. Model building becomes a simple exercise of replication, rearrangement and stacking of these reusable units for diverse modalities and contextual information. This design thus supports high-order relational and multi-step reasoning. The resulting architecture for VideoQA is a CRN hierarchy whose branches represent sub-videos or clips, all sharing the same question as the contextual condition. Our evaluations on well-known datasets achieved new SoTA results, demonstrating the impact of building a general-purpose reasoning unit on complex domains such as VideoQA.
Tasks	Question Answering, Video Question Answering
Published	2020-02-25
URL	https://arxiv.org/abs/2002.10698v3
PDF	https://arxiv.org/pdf/2002.10698v3.pdf
PWC	https://paperswithcode.com/paper/hierarchical-conditional-relation-networks
Repo	https://github.com/thaolmk54/hcrn-videoqa
Framework	pytorch

FusionLane: Multi-Sensor Fusion for Lane Marking Semantic Segmentation Using Deep Neural Networks


Title	FusionLane: Multi-Sensor Fusion for Lane Marking Semantic Segmentation Using Deep Neural Networks
Authors	Ruochen Yin, Biao Yu, Huapeng Wu, Yutao Song, Runxin Niu
Abstract	It is a crucial step to achieve effective semantic segmentation of lane marking during the construction of the lane level high-precision map. In recent years, many image semantic segmentation methods have been proposed. These methods mainly focus on the image from the camera; due to the limitation of the sensor itself, the accurate three-dimensional spatial position of the lane marking cannot be obtained, so the demand for the lane level high-precision map construction cannot be met. This paper proposes a lane marking semantic segmentation method based on LIDAR and camera fusion deep neural network. Different from other methods, in order to obtain accurate position information of the segmentation results, the semantic segmentation object of this paper is a bird’s eye view converted from a LIDAR point cloud instead of an image captured by a camera. This method first uses the deeplabv3+ network to segment the image captured by the camera, and the segmentation result is merged with the point clouds collected by the LIDAR as the input of the proposed network. In this neural network, we also add a long short-term memory (LSTM) structure to assist the network for semantic segmentation of lane markings by using the time series information. The experiments on more than 14,000 image datasets which we have manually labeled and expanded have shown the proposed method has better performance on the semantic segmentation of the point cloud bird’s eye view. Therefore, the automation of high-precision map construction can be significantly improved. Our code is available at https://github.com/rolandying/FusionLane.
Tasks	Semantic Segmentation, Sensor Fusion, Time Series
Published	2020-03-09
URL	https://arxiv.org/abs/2003.04404v1
PDF	https://arxiv.org/pdf/2003.04404v1.pdf
PWC	https://paperswithcode.com/paper/fusionlane-multi-sensor-fusion-for-lane
Repo	https://github.com/rolandying/FusionLane
Framework	tf

Causal structure learning from time series: Large regression coefficients may predict causal links better in practice than small p-values


Title	Causal structure learning from time series: Large regression coefficients may predict causal links better in practice than small p-values
Authors	Sebastian Weichwald, Martin E Jakobsen, Phillip B Mogensen, Lasse Petersen, Nikolaj Thams, Gherardo Varando
Abstract	In this article, we describe the algorithms for causal structure learning from time series data that won the Causality 4 Climate competition at the Conference on Neural Information Processing Systems 2019 (NeurIPS). We examine how our combination of established ideas achieves competitive performance on semi-realistic and realistic time series data exhibiting common challenges in real-world Earth sciences data. In particular, we discuss a) a rationale for leveraging linear methods to identify causal links in non-linear systems, b) a simulation-backed explanation as to why large regression coefficients may predict causal links better in practice than small p-values and thus why normalising the data may sometimes hinder causal structure learning. For benchmark usage, we provide implementations at https://github.com/sweichwald/tidybench and detail the algorithms here. We propose the presented competition-proven methods for baseline benchmark comparisons to guide the development of novel algorithms for structure learning from time series.
Tasks	Time Series
Published	2020-02-21
URL	https://arxiv.org/abs/2002.09573v1
PDF	https://arxiv.org/pdf/2002.09573v1.pdf
PWC	https://paperswithcode.com/paper/causal-structure-learning-from-time-series
Repo	https://github.com/sweichwald/tidybench
Framework	none
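
The idea of scoring candidate causal links by the size of lagged regression coefficients can be sketched on a toy system. The example below simulates two series in which only the first drives the second, fits a two-variable VAR(1) model by ordinary least squares, and uses the absolute coefficients as link scores. This is not the competition-winning code from tidybench; the simulation parameters and the function names `simulate` and `var1_coefficients` are assumptions made for illustration.

```python
import random


def simulate(n, seed=0):
    """Toy system: x0 is autoregressive, and x1 is driven by lagged x0 only."""
    rng = random.Random(seed)
    x0, x1 = [0.0], [0.0]
    for _ in range(n - 1):
        prev0 = x0[-1]
        x0.append(0.5 * prev0 + rng.gauss(0.0, 1.0))
        x1.append(0.8 * prev0 + rng.gauss(0.0, 0.5))
    return x0, x1


def var1_coefficients(x0, x1):
    """OLS fit of a two-variable VAR(1) model without intercept.

    Returns coef[target][source], the weight of source at time t-1
    in the regression of target at time t.
    """
    p0, p1 = x0[:-1], x1[:-1]  # lagged predictors
    s00 = sum(a * a for a in p0)
    s01 = sum(a * b for a, b in zip(p0, p1))
    s11 = sum(b * b for b in p1)
    det = s00 * s11 - s01 * s01
    coef = []
    for target in (x0[1:], x1[1:]):
        c0 = sum(a * y for a, y in zip(p0, target))
        c1 = sum(b * y for b, y in zip(p1, target))
        # Solve the 2x2 normal equations in closed form.
        coef.append([(s11 * c0 - s01 * c1) / det, (s00 * c1 - s01 * c0) / det])
    return coef


if __name__ == "__main__":
    series0, series1 = simulate(2000)
    coef = var1_coefficients(series0, series1)
    print("score x0 -> x1:", abs(coef[1][0]))
    print("score x1 -> x0:", abs(coef[0][1]))
```

Ranking links by coefficient magnitude depends on the scale of the variables, which is consistent with the abstract's remark that normalising the data may sometimes hinder causal structure learning.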

Speech2Phone: A Multilingual and Text Independent Speaker Identification Model


Title	Speech2Phone: A Multilingual and Text Independent Speaker Identification Model
Authors	Edresson Casanova, Arnaldo Candido Junior, Christopher Shulby, Hamilton Pereira da Silva, Pedro Luiz de Paula Filho, Alessandro Ferreira Cordeiro, Victor de Oliveira Guedes, Sandra Maria Aluisio
Abstract	Voice recognition is an area with a wide application potential. Speaker identification is useful in several voice recognition tasks, as seen in voice-based authentication, transcription systems and intelligent personal assistants. Some tasks benefit from open-set models which can handle new speakers without the need of retraining. Audio embeddings for speaker identification is a proposal to solve this issue. However, choosing a suitable model is a difficult task, especially when the training resources are scarce. Besides, it is not always clear whether embeddings are as good as more traditional methods. In this work, we propose the Speech2Phone and compare several embedding models for open-set speaker identification, as well as traditional closed-set models. The models were investigated in the scenario of small datasets, which makes them more applicable to languages in which data scarceness is an issue. The results show that embeddings generated by artificial neural networks are competitive when compared to classical approaches for the task. Considering a testing dataset composed of 20 speakers, the best models reach accuracies of 100% and 76.96% for closed and open set scenarios, respectively. Results suggest that the models can perform language independent speaker identification. Among the tested models, a fully connected one, here presented as Speech2Phone, led to the higher accuracy. Furthermore, the models were tested for different languages showing that the knowledge learned was successfully transferred for close and distant languages to Portuguese (in terms of vocabulary). Finally, the models can scale and can handle more speakers than they were trained for, identifying 150% more speakers while still maintaining 55% accuracy.
Tasks	Speaker Identification
Published	2020-02-25
URL	https://arxiv.org/abs/2002.11213v1
PDF	https://arxiv.org/pdf/2002.11213v1.pdf
PWC	https://paperswithcode.com/paper/speech2phone-a-multilingual-and-text
Repo	https://github.com/Edresson/Speech2Phone
Framework	none
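
The open-set setting described in the abstract can be sketched as nearest-neighbour matching over speaker embeddings: enrolling a new speaker only adds one reference vector, with no retraining. The embedding model itself is not shown, and the vectors, speaker names, and the functions `cosine` and `identify` are hypothetical stand-ins, not part of the Speech2Phone code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(embedding, enrolled):
    """Return the enrolled speaker whose reference embedding is most similar."""
    return max(enrolled, key=lambda name: cosine(embedding, enrolled[name]))


if __name__ == "__main__":
    # Hypothetical reference embeddings; a real system would compute them from audio.
    enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
    print(identify([0.9, 0.1, 0.0], enrolled))

    # Adding a speaker requires only a new reference vector.
    enrolled["carol"] = [0.0, 0.0, 1.0]
    print(identify([0.1, 0.0, 0.95], enrolled))
```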

pymoo: Multi-objective Optimization in Python


Title	pymoo: Multi-objective Optimization in Python
Authors	Julian Blank, Kalyanmoy Deb
Abstract	Python has become the programming language of choice for research and industry projects related to data science, machine learning, and deep learning. Since optimization is an inherent part of these research fields, more optimization related frameworks have arisen in the past few years. Only a few of them support optimization of multiple conflicting objectives at a time, but do not provide comprehensive tools for a complete multi-objective optimization task. To address this issue, we have developed pymoo, a multi-objective optimization framework in Python. We provide a guide to getting started with our framework by demonstrating the implementation of an exemplary constrained multi-objective optimization scenario. Moreover, we give a high-level overview of the architecture of pymoo to show its capabilities followed by an explanation of each module and its corresponding sub-modules. The implementations in our framework are customizable and algorithms can be modified/extended by supplying custom operators. Moreover, a variety of single, multi and many-objective test problems are provided and gradients can be retrieved by automatic differentiation out of the box. Also, pymoo addresses practical needs, such as the parallelization of function evaluations, methods to visualize low and high-dimensional spaces, and tools for multi-criteria decision making. For more information about pymoo, readers are encouraged to visit: https://pymoo.org
Tasks	Decision Making
Published	2020-01-22
URL	https://arxiv.org/abs/2002.04504v1
PDF	https://arxiv.org/pdf/2002.04504v1.pdf
PWC	https://paperswithcode.com/paper/pymoo-multi-objective-optimization-in-python
Repo	https://github.com/msu-coinlab/pymoo
Framework	none
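
For readers new to multi-objective optimization, the library-free sketch below shows the core notion of conflicting objectives: filtering a set of candidate solutions down to its non-dominated (Pareto-optimal) subset under minimization. It deliberately does not use or imitate the pymoo API, and the function names and sample points are assumptions for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical objective vectors (f1, f2), both to be minimized.
    candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
    print(pareto_front(candidates))
```

This pairwise filter takes quadratic time in the number of points, which is adequate for small sets; dedicated frameworks provide faster non-dominated sorting along with the algorithms, visualization, and decision-making tools the abstract lists.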

Benchmarking Graph Neural Networks


Title	Benchmarking Graph Neural Networks
Authors	Vijay Prakash Dwivedi, Chaitanya K. Joshi, Thomas Laurent, Yoshua Bengio, Xavier Bresson
Abstract	Graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. They have been successfully applied to a myriad of domains including chemistry, physics, social sciences, knowledge graphs, recommendation, and neuroscience. As the field grows, it becomes critical to identify the architectures and key mechanisms which generalize across graph sizes, enabling us to tackle larger, more complex datasets and domains. Unfortunately, it has been increasingly difficult to gauge the effectiveness of new GNNs and compare models in the absence of a standardized benchmark with consistent experimental settings and large datasets. In this paper, we propose a reproducible GNN benchmarking framework, with the facility for researchers to add new datasets and models conveniently. We apply this benchmarking framework to novel medium-scale graph datasets from mathematical modeling, computer vision, chemistry and combinatorial problems to establish key operations when designing effective GNNs. Precisely, graph convolutions, anisotropic diffusion, residual connections and normalization layers are universal building blocks for developing robust and scalable GNNs.
Tasks	Knowledge Graphs
Published	2020-03-02
URL	https://arxiv.org/abs/2003.00982v1
PDF	https://arxiv.org/pdf/2003.00982v1.pdf
PWC	https://paperswithcode.com/paper/benchmarking-graph-neural-networks
Repo	https://github.com/dmlc/dgl
Framework	pytorch