Paper Group AWR 237
Contents:
Star-Transformer
Estimating Solar Irradiance Using Sky Imagers
Multi-Objective Reinforced Evolution in Mobile Neural Architecture Search
Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach
Self-supervised Scale Equivariant Network for Weakly Supervised Semantic Segmentation
Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation
Sparse Reduced-Rank Regression for Simultaneous Rank and Variable Selection via Manifold Optimization
Training Agents using Upside-Down Reinforcement Learning
Urban Sound Tagging using Convolutional Neural Networks
Multi-scale Dynamic Graph Convolutional Network for Hyperspectral Image Classification
Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization
NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization
3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume Normalization
Extending Monocular Visual Odometry to Stereo Camera Systems by Scale Optimization
Understanding Isomorphism Bias in Graph Data Sets
Star-Transformer
Title | Star-Transformer |
Authors | Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang |
Abstract | Although the Transformer has achieved great success on many NLP tasks, its heavy structure with fully-connected attention leads to a dependence on large amounts of training data. In this paper, we present Star-Transformer, a lightweight alternative obtained by careful sparsification. To reduce model complexity, we replace the fully-connected structure with a star-shaped topology, in which every pair of non-adjacent nodes is connected through a shared relay node. Complexity is thus reduced from quadratic to linear, while the capacity to capture both local composition and long-range dependencies is preserved. Experiments on four tasks (22 datasets) show that Star-Transformer achieves significant improvements over the standard Transformer on modestly sized datasets. |
Tasks | Named Entity Recognition, Natural Language Inference, Sentiment Analysis, Text Classification |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09113v2 |
PDF | http://arxiv.org/pdf/1902.09113v2.pdf |
PWC | https://paperswithcode.com/paper/star-transformer |
Repo | https://github.com/fastnlp/fastNLP |
Framework | pytorch |
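To make the topology concrete, here is a minimal sketch (plain numpy, not code from the fastNLP repo) of one Star-Transformer update round: each token attends only to its ring neighbours plus the shared relay node, and the relay attends to all tokens, giving linear rather than quadratic cost. The function names and the single-head, unprojected attention are simplifying assumptions.

```python
import numpy as np

def attention(q, k, v):
    # q: (d,), k/v: (m, d) -> softmax-weighted sum over the m candidates
    scores = k @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ v

def star_update(h, relay, radius=1):
    """One round of sparse updates: each token attends only to its ring
    neighbours and the shared relay; the relay attends to every token."""
    n = len(h)
    new_h = np.empty_like(h)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        ctx = np.vstack([h[lo:hi], relay[None, :]])   # local ring + relay
        new_h[i] = attention(h[i], ctx, ctx)          # O(radius) per token
    new_relay = attention(relay, new_h, new_h)        # relay sees all tokens
    return new_h, new_relay

h = np.random.randn(10, 16); relay = h.mean(0)
h, relay = star_update(h, relay)
```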
Estimating Solar Irradiance Using Sky Imagers
Title | Estimating Solar Irradiance Using Sky Imagers |
Authors | Soumyabrata Dev, Florian M. Savoy, Yee Hui Lee, Stefan Winkler |
Abstract | Ground-based whole-sky cameras are now extensively used for localized monitoring of clouds. They capture hemispherical images of the sky at regular intervals using a fisheye lens. In this paper, we propose a framework for estimating solar irradiance from the pictures taken by such imagers. Unlike pyranometer measurements, sky images contain information about cloud coverage and can be used to derive cloud movement. Accurately estimating solar irradiance from these images alone is thus a first step towards short-term forecasting of solar energy generation based on cloud movement. We derive and validate our model using pyranometers co-located with our whole-sky imagers. We achieve better performance in estimating solar irradiance, and in particular its short-term variations, than other related methods based on ground-based observations. |
Tasks | |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.04981v1 |
PDF | https://arxiv.org/pdf/1910.04981v1.pdf |
PWC | https://paperswithcode.com/paper/estimating-solar-irradiance-using-sky-imagers |
Repo | https://github.com/Soumyabrata/estimate-solar-irradiance |
Framework | none |
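As an illustration of the kind of pipeline such a framework involves (not the paper's fitted model), the sketch below segments clouds with the common red-blue ratio and attenuates a clear-sky irradiance value by the resulting coverage; the threshold and the linear attenuation factor are assumptions chosen for demonstration.

```python
import numpy as np

def cloud_coverage(rgb, threshold=0.05):
    """Fraction of pixels classified as cloud via the (R-B)/(R+B) ratio."""
    r = rgb[..., 0].astype(float)
    b = rgb[..., 2].astype(float)
    ratio = (r - b) / (r + b + 1e-6)   # clear sky is blue -> negative ratio;
    return float((ratio > threshold).mean())  # clouds are whiter -> near 0 or above

def estimate_irradiance(rgb, clear_sky_wm2):
    cov = cloud_coverage(rgb)
    return clear_sky_wm2 * (1.0 - 0.75 * cov)  # hypothetical attenuation factor

sky = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)   # stand-in image
print(estimate_irradiance(sky, clear_sky_wm2=900.0))
```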
Multi-Objective Reinforced Evolution in Mobile Neural Architecture Search
Title | Multi-Objective Reinforced Evolution in Mobile Neural Architecture Search |
Authors | Xiangxiang Chu, Bo Zhang, Ruijun Xu, Hailong Ma |
Abstract | Fabricating neural models for a wide range of mobile devices demands a specific network design due to highly constrained resources. Both evolutionary algorithms (EA) and reinforcement learning (RL) methods have been applied to neural architecture search. However, these approaches usually concentrate on a single objective, such as the error rate of image classification, and fail to harness the benefits of both paradigms. In this paper, we present a new multi-objective algorithm called MoreMNAS (Multi-Objective Reinforced Evolution in Mobile Neural Architecture Search) that leverages the virtues of both EA and RL. In particular, we incorporate a variant of the multi-objective genetic algorithm NSGA-II, in which the search space is composed of various cells so that crossover and mutation can be performed at the cell level. Moreover, reinforced control is mixed with the natural mutation process to regulate arbitrary mutations, maintaining a delicate balance between exploration and exploitation. Therefore, our method not only prevents the searched models from degrading during the evolution process but also makes better use of learned knowledge. Our experiments in the super-resolution (SR) domain deliver models that rival some state-of-the-art methods with fewer FLOPS. |
Tasks | Image Classification, Neural Architecture Search, Super-Resolution |
Published | 2019-01-04 |
URL | http://arxiv.org/abs/1901.01074v3 |
PDF | http://arxiv.org/pdf/1901.01074v3.pdf |
PWC | https://paperswithcode.com/paper/multi-objective-reinforced-evolution-in |
Repo | https://github.com/moremnas/MoreMNAS |
Framework | tf |
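The multi-objective core of NSGA-II-style search is non-dominated (Pareto) filtering over competing objectives such as error and FLOPS. The toy sketch below shows generic Pareto-front extraction, not the MoreMNAS implementation:

```python
def pareto_front(candidates):
    """candidates: list of (error, flops) tuples; keep the non-dominated ones."""
    front = []
    for a in candidates:
        # b dominates a if b is no worse in both objectives and better in one
        dominated = any(
            b[0] <= a[0] and b[1] <= a[1] and (b[0] < a[0] or b[1] < a[1])
            for b in candidates
        )
        if not dominated:
            front.append(a)
    return front

models = [(0.10, 600), (0.12, 300), (0.09, 900), (0.12, 500)]
print(pareto_front(models))   # drops (0.12, 500), dominated by (0.12, 300)
```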
Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach
Title | Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach |
Authors | Bingfeng Zhang, Jimin Xiao, Yunchao Wei, Mingjie Sun, Kaizhu Huang |
Abstract | Weakly supervised semantic segmentation is a challenging task, as it takes only image-level information as supervision for training but produces pixel-level predictions at test time. To address it, most recent state-of-the-art approaches adopt two-step solutions: 1) learn to generate pseudo pixel-level masks, and 2) train semantic segmentation networks (FCNs) on the pseudo masks. However, two-step solutions usually employ many bells and whistles to produce high-quality pseudo masks, making this kind of method complicated and inelegant. In this work, we harness image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network that learns to predict segmentation maps. Concretely, we first leverage an image-classification branch to generate class activation maps for the annotated categories, which are further pruned into confident yet tiny object/background regions. These reliable regions then serve directly as ground-truth labels for a parallel segmentation branch, where a newly designed dense energy loss function is adopted for optimization. Despite its apparent simplicity, our one-step solution achieves competitive mIoU scores (val: 62.6, test: 62.9) on Pascal VOC compared with two-step state-of-the-art methods. By extending our one-step method to two steps, we obtain a new state-of-the-art performance on Pascal VOC (val: 66.3, test: 66.5). |
Tasks | Image Classification, Semantic Segmentation, Weakly-Supervised Semantic Segmentation |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08039v1 |
PDF | https://arxiv.org/pdf/1911.08039v1.pdf |
PWC | https://paperswithcode.com/paper/reliability-does-matter-an-end-to-end-weakly |
Repo | https://github.com/zbf1991/RRM |
Framework | pytorch |
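The "reliable region" idea can be illustrated in a few lines: threshold a class activation map into confident foreground, confident background, and an ignored uncertain band that receives no supervision. The thresholds below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def reliable_pseudo_labels(cam, fg_thresh=0.7, bg_thresh=0.1, ignore=255):
    """cam: (H, W) class activation map scaled to [0, 1] for one annotated class."""
    labels = np.full(cam.shape, ignore, dtype=np.uint8)  # default: no supervision
    labels[cam >= fg_thresh] = 1   # confident object region
    labels[cam <= bg_thresh] = 0   # confident background region
    return labels                  # everything in between stays `ignore`

cam = np.random.rand(4, 4)
print(reliable_pseudo_labels(cam))
```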
Self-supervised Scale Equivariant Network for Weakly Supervised Semantic Segmentation
Title | Self-supervised Scale Equivariant Network for Weakly Supervised Semantic Segmentation |
Authors | Yude Wang, Jie Zhang, Meina Kan, Shiguang Shan, Xilin Chen |
Abstract | Weakly supervised semantic segmentation has attracted much research interest in recent years, given its advantage of low labeling cost. Most advanced algorithms follow the design principle of expanding and constraining the seed regions from class activation maps (CAM). As is well known, conventional CAM tends to be incomplete or over-activated due to weak supervision. Fortunately, semantic segmentation has a characteristic of spatial-transformation equivariance, which can provide self-supervision signals to aid weakly supervised learning. This work explores the advantages of scale-equivariant constraints for CAM generation, formulated as a self-supervised scale equivariant network (SSENet). Specifically, a novel scale-equivariant regularization is designed to ensure consistency of the CAMs produced from the same input image at different resolutions, guiding the whole network to learn more accurate class activations. The regularized CAM can be embedded in most recent advanced weakly supervised semantic segmentation frameworks. Extensive experiments on the PASCAL VOC 2012 dataset demonstrate that our method achieves state-of-the-art performance, both quantitatively and qualitatively, for weakly supervised semantic segmentation. Code has been made available. |
Tasks | Semantic Segmentation, Weakly-Supervised Semantic Segmentation |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.03714v1 |
PDF | https://arxiv.org/pdf/1909.03714v1.pdf |
PWC | https://paperswithcode.com/paper/self-supervised-scale-equivariant-network-for |
Repo | https://github.com/YudeWang/SSENet-pytorch |
Framework | pytorch |
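A minimal sketch of such a scale-equivariance consistency term is shown below: compute the CAM at full and reduced resolution, resize to a common size, and penalize the disagreement. The `model` interface (returning per-class CAMs) and the L1 penalty are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def scale_equivariant_loss(model, image, scale=0.5):
    """image: (B, 3, H, W); model(image) assumed to return (B, C, H, W) CAMs."""
    cam_full = model(image)
    small = F.interpolate(image, scale_factor=scale,
                          mode='bilinear', align_corners=False)
    cam_small = model(small)
    # bring the low-resolution CAM back to full resolution before comparing
    cam_up = F.interpolate(cam_small, size=cam_full.shape[-2:],
                           mode='bilinear', align_corners=False)
    return F.l1_loss(cam_up, cam_full)   # consistency across resolutions
```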
Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation
Title | Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation |
Authors | Yu Zeng, Yunzhi Zhuge, Huchuan Lu, Lihe Zhang |
Abstract | Existing weakly supervised semantic segmentation (WSSS) methods usually utilize the results of pre-trained saliency detection (SD) models without explicitly modeling the connections between the two tasks, which is not the most efficient configuration. Here we propose a unified multi-task learning framework that jointly solves WSSS and SD using a single network, i.e., a saliency and segmentation network (SSNet). SSNet consists of a segmentation network (SN) and a saliency aggregation module (SAM). For an input image, SN generates the segmentation result, and SAM predicts the saliency of each category and aggregates the segmentation masks of all categories into a saliency map. The proposed network is trained end-to-end with image-level category labels and class-agnostic pixel-level saliency labels. Experiments on the PASCAL VOC 2012 segmentation dataset and four saliency benchmark datasets show that the performance of our method compares favorably against state-of-the-art weakly supervised segmentation methods and fully supervised saliency detection methods. |
Tasks | Multi-Task Learning, Saliency Detection, Semantic Segmentation, Weakly-Supervised Semantic Segmentation |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.04161v1 |
PDF | https://arxiv.org/pdf/1909.04161v1.pdf |
PWC | https://paperswithcode.com/paper/joint-learning-of-saliency-detection-and |
Repo | https://github.com/zengxianyu/jsws |
Framework | none |
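The saliency aggregation module can be caricatured as a weighted sum of per-category segmentation masks, with the weights given by predicted category saliencies. The shapes and the sigmoid weighting in this sketch are illustrative assumptions:

```python
import numpy as np

def aggregate_saliency(masks, category_saliency):
    """masks: (C, H, W) soft segmentation masks; category_saliency: (C,) scores."""
    w = 1.0 / (1.0 + np.exp(-category_saliency))      # per-class saliency in (0, 1)
    sal = np.tensordot(w, masks, axes=1)              # weighted sum over classes
    return np.clip(sal, 0.0, 1.0)

masks = np.random.rand(3, 8, 8); scores = np.array([2.0, -1.0, 0.5])
print(aggregate_saliency(masks, scores).shape)        # (8, 8)
```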
Sparse Reduced-Rank Regression for Simultaneous Rank and Variable Selection via Manifold Optimization
Title | Sparse Reduced-Rank Regression for Simultaneous Rank and Variable Selection via Manifold Optimization |
Authors | Kohei Yoshikawa, Shuichi Kawano |
Abstract | We consider the problem of constructing a reduced-rank regression model whose coefficient parameter is represented as a singular value decomposition with sparse singular vectors. The traditional estimation procedure for the coefficient parameter often fails when the true rank of the parameter is high. To overcome this issue, we develop an estimation algorithm with rank and variable selection via sparse regularization and manifold optimization, which enables us to obtain an accurate estimation of the coefficient parameter even if the true rank of the coefficient parameter is high. Using sparse regularization, we can also select an optimal value of the rank. We conduct Monte Carlo experiments and real data analysis to illustrate the effectiveness of our proposed method. |
Tasks | |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.05083v2 |
PDF | https://arxiv.org/pdf/1910.05083v2.pdf |
PWC | https://paperswithcode.com/paper/sparse-reduced-rank-regression-for |
Repo | https://github.com/yoshikawa-kohei/RVSManOpt |
Framework | none |
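For context, the sketch below shows the classical reduced-rank regression baseline (an OLS fit followed by an SVD-based rank projection). The paper's contribution, sparse regularization with manifold optimization on the singular vectors, sits on top of this model and is not reproduced here.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    C_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)   # full-rank OLS coefficients
    _, _, Vt = np.linalg.svd(X @ C_ols, full_matrices=False)
    P = Vt[:rank].T @ Vt[:rank]                     # projector onto top-rank subspace
    return C_ols @ P                                # rank-constrained coefficients

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
C_true = np.outer(rng.normal(size=5), rng.normal(size=4))   # rank-1 ground truth
Y = X @ C_true + 0.1 * rng.normal(size=(100, 4))
C_hat = reduced_rank_regression(X, Y, rank=1)
print(np.linalg.matrix_rank(C_hat))   # 1
```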
Training Agents using Upside-Down Reinforcement Learning
Title | Training Agents using Upside-Down Reinforcement Learning |
Authors | Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber |
Abstract | Traditional Reinforcement Learning (RL) algorithms either predict rewards with value functions or maximize them using policy search. We study an alternative, Upside-Down Reinforcement Learning (Upside-Down RL or UDRL), which solves RL problems primarily using supervised learning techniques. Many of its main principles are outlined in a companion report [34]. Here we present the first concrete implementation of UDRL and demonstrate its feasibility on certain episodic learning problems. Experimental results show that its performance can be surprisingly competitive with, and even exceed, that of traditional baseline algorithms developed over decades of research. |
Tasks | |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02877v1 |
PDF | https://arxiv.org/pdf/1912.02877v1.pdf |
PWC | https://paperswithcode.com/paper/training-agents-using-upside-down |
Repo | https://github.com/parthchadha/upsideDownRL |
Framework | pytorch |
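The core UDRL recipe is to learn a behavior function mapping (state, desired return, desired horizon) to an action, trained by ordinary supervised learning on commands that actually occurred in past episodes. The tiny network and command encoding below are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class BehaviorFunction(nn.Module):
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2, 64), nn.ReLU(),   # +2: return & horizon command
            nn.Linear(64, n_actions),
        )

    def forward(self, state, desired_return, desired_horizon):
        cmd = torch.stack([desired_return, desired_horizon], dim=-1)
        return self.net(torch.cat([state, cmd], dim=-1))   # action logits

# Supervised step on replayed transitions: the command is what actually happened.
policy = BehaviorFunction(state_dim=4, n_actions=2)
state = torch.randn(8, 4)
ret, hor = torch.rand(8) * 10, torch.rand(8) * 100
taken_actions = torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(policy(state, ret, hor), taken_actions)
loss.backward()
```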
Urban Sound Tagging using Convolutional Neural Networks
Title | Urban Sound Tagging using Convolutional Neural Networks |
Authors | Sainath Adapa |
Abstract | In this paper, we propose a framework for environmental sound classification in a low-data context (fewer than 100 labeled examples per class). We show that using pre-trained image classification models together with data augmentation techniques yields higher performance than alternative approaches. We applied this system to the Urban Sound Tagging task, part of the DCASE 2019 challenge, whose objective was to label different sources of noise from raw audio data. A modified form of MobileNetV2, a convolutional neural network (CNN) model, was trained to classify coarse and fine tags jointly. The proposed model uses log-scaled Mel-spectrograms as the representation format for the audio data. Mixup, random erasing, scaling, and shifting are used as data augmentation techniques. A second model that uses scaled labels was built to account for human error in the annotations. The proposed model achieved first rank on the leaderboard, with Micro-AUPRC values of 0.751 and 0.860 on fine and coarse tags, respectively. |
Tasks | Data Augmentation, Environmental Sound Classification, Image Classification |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12699v1 |
PDF | https://arxiv.org/pdf/1909.12699v1.pdf |
PWC | https://paperswithcode.com/paper/urban-sound-tagging-using-convolutional |
Repo | https://github.com/sainathadapa/urban-sound-tagging |
Framework | pytorch |
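Of the augmentations listed, mixup is the least standard for audio tagging; a generic sketch (not the repo's exact code) is below, blending two log-mel spectrograms and their multi-label targets with a Beta-distributed coefficient:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Convexly combine two examples and their label vectors."""
    lam = np.random.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

spec1, spec2 = np.random.rand(64, 128), np.random.rand(64, 128)  # log-mel stand-ins
tags1, tags2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])        # multi-label targets
x, y = mixup(spec1, tags1, spec2, tags2)
```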
Multi-scale Dynamic Graph Convolutional Network for Hyperspectral Image Classification
Title | Multi-scale Dynamic Graph Convolutional Network for Hyperspectral Image Classification |
Authors | Sheng Wan, Chen Gong, Ping Zhong, Bo Du, Lefei Zhang, Jian Yang |
Abstract | The Convolutional Neural Network (CNN) has demonstrated an impressive ability to represent hyperspectral images and to achieve promising results in hyperspectral image classification. However, traditional CNN models can only convolve over regular square image regions with fixed size and weights, so they cannot universally adapt to distinct local regions with various object distributions and geometric appearances. Their classification performance therefore still leaves room for improvement, especially at class boundaries. To alleviate this shortcoming, we employ the recently proposed Graph Convolutional Network (GCN) for hyperspectral image classification, as it can perform convolution on arbitrarily structured non-Euclidean data and is applicable to irregular image regions represented by graph topological information. Unlike commonly used GCN models, which operate on a fixed graph, we allow the graph to be dynamically updated along with the graph convolution process, so that the two steps benefit from each other and gradually produce discriminative embedded features as well as a refined graph. Moreover, to comprehensively exploit the multi-scale information inherent in hyperspectral images, we establish multiple input graphs with different neighborhood scales, extensively exploiting the diversified spectral-spatial correlations at multiple scales. Our method is therefore termed the ‘Multi-scale Dynamic Graph Convolutional Network’ (MDGCN). Experimental results on three typical benchmark datasets firmly demonstrate the superiority of MDGCN over other state-of-the-art methods, both qualitatively and quantitatively. |
Tasks | Hyperspectral Image Classification, Image Classification |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.06133v1 |
PDF | https://arxiv.org/pdf/1905.06133v1.pdf |
PWC | https://paperswithcode.com/paper/multi-scale-dynamic-graph-convolutional |
Repo | https://github.com/LEAP-WS/MDGCN |
Framework | tf |
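The "dynamic graph" idea alternates two steps: a normalized graph-convolution update of the node embeddings, then a rebuild of the adjacency from the new embeddings' similarities. The Gaussian-kernel affinity and dense graph in this toy sketch are illustrative choices, not MDGCN's exact construction:

```python
import numpy as np

def gcn_step(A, H, W):
    A_hat = A + np.eye(len(A))                    # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(1))
    return np.maximum(D_inv @ A_hat @ H @ W, 0)   # ReLU(D^-1 A H W)

def update_graph(H, sigma=1.0):
    d2 = ((H[:, None, :] - H[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))         # dense affinity graph

H = np.random.rand(6, 8); A = update_graph(H); W = np.random.rand(8, 8)
for _ in range(2):                                # alternate convolution and graph update
    H = gcn_step(A, H, W)
    A = update_graph(H)
```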
Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization
Title | Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorization |
Authors | Eldad Meller, Alexander Finkelstein, Uri Almog, Mark Grobman |
Abstract | Quantization of neural networks has become common practice, driven by the need for efficient implementations of deep neural networks on embedded devices. In this paper, we exploit an oft-overlooked degree of freedom in most networks: for a given layer, individual output channels can be scaled by any factor provided that the corresponding weights of the next layer are inversely scaled. A given network therefore has many factorizations that change its weights without changing its function. We present a conceptually simple and easy-to-implement method that uses this property, and we show that proper factorizations significantly decrease the degradation caused by quantization. We show improvements on a wide variety of networks and achieve state-of-the-art degradation results for MobileNets. While our focus is on quantization, this type of factorization is applicable to other domains such as network pruning, neural network regularization, and network interpretability. |
Tasks | Network Pruning, Quantization |
Published | 2019-02-05 |
URL | http://arxiv.org/abs/1902.01917v1 |
PDF | http://arxiv.org/pdf/1902.01917v1.pdf |
PWC | https://paperswithcode.com/paper/same-same-but-different-recovering-neural |
Repo | https://github.com/Adamdad/Samesame |
Framework | tf |
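The degree of freedom the paper exploits is easy to verify numerically: with a positively homogeneous activation such as ReLU, scaling a layer's output channels and inversely scaling the next layer's matching input weights leaves the network function unchanged. A small sketch (generic, not the paper's equalization procedure):

```python
import numpy as np

def equalize(W1, b1, W2, scale):
    """W1: (out, in) of layer i; W2: (out2, out) of layer i+1; scale > 0 per channel."""
    W1s = W1 * scale[:, None]          # scale the output channels of layer i
    b1s = b1 * scale
    W2s = W2 / scale[None, :]          # inverse scale on the consumer side
    return W1s, b1s, W2s

W1, b1 = np.random.randn(4, 3), np.random.randn(4)
W2 = np.random.randn(2, 4)
x = np.random.randn(3)
s = np.array([0.5, 2.0, 1.0, 4.0])
W1s, b1s, W2s = equalize(W1, b1, W2, s)
y  = W2  @ np.maximum(W1  @ x + b1 , 0)
ys = W2s @ np.maximum(W1s @ x + b1s, 0)
print(np.allclose(y, ys))              # True: the network function is unchanged
```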
NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization
Title | NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization |
Authors | Ali Ramezani-Kebrya, Fartash Faghri, Daniel M. Roy |
Abstract | As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel. Alistarh et al. (2017) describe two variants of data-parallel SGD that quantize and encode gradients to lessen communication costs. For the first variant, QSGD, they provide strong theoretical guarantees. For the second variant, which we call QSGDinf, they demonstrate impressive empirical gains for distributed training of large neural networks. Building on their work, we propose an alternative scheme for quantizing gradients and show that it yields stronger theoretical guarantees than exist for QSGD while matching the empirical performance of QSGDinf. |
Tasks | Quantization |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.06077v1 |
PDF | https://arxiv.org/pdf/1908.06077v1.pdf |
PWC | https://paperswithcode.com/paper/nuqsgd-improved-communication-efficiency-for |
Repo | https://github.com/fartashf/nuqsgd |
Framework | pytorch |
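The nonuniform-levels idea can be sketched as stochastic (unbiased) rounding of normalized gradient magnitudes onto exponentially spaced levels; the exact level set and encoding in the paper differ from this illustration:

```python
import numpy as np

def quantize(v, num_levels=4):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v.copy()
    # exponentially spaced levels, e.g. [0, 1/8, 1/4, 1/2, 1] for num_levels=4
    levels = np.array([0.0] + [2.0 ** -(num_levels - 1 - i)
                               for i in range(num_levels)])
    r = np.abs(v) / norm                       # each magnitude lands in [0, 1]
    out = np.empty_like(v)
    for i, ri in enumerate(r):
        hi = min(np.searchsorted(levels, ri, side='right'), len(levels) - 1)
        lo = hi - 1
        p = (ri - levels[lo]) / (levels[hi] - levels[lo])  # unbiased rounding prob.
        q = levels[hi] if np.random.rand() < p else levels[lo]
        out[i] = np.sign(v[i]) * norm * q
    return out

g = np.random.randn(8)
print(quantize(g))
```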
3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume Normalization
Title | 3D LiDAR and Stereo Fusion using Stereo Matching Network with Conditional Cost Volume Normalization |
Authors | Tsun-Hsuan Wang, Hou-Ning Hu, Chieh Hubert Lin, Yi-Hsuan Tsai, Wei-Chen Chiu, Min Sun |
Abstract | The complementary characteristics of active and passive depth sensing motivate fusing a LiDAR sensor with a stereo camera for improved depth perception. Instead of directly fusing the depths estimated from the LiDAR and stereo modalities, we take advantage of a stereo matching network with two enhancements that exploit the LiDAR information: Input Fusion and Conditional Cost Volume Normalization (CCVNorm). The proposed framework is generic and integrates closely with the cost volume component commonly used in stereo matching neural networks. We experimentally verify the efficacy and robustness of our method on the KITTI Stereo and Depth Completion datasets, obtaining favorable performance against various fusion strategies. Moreover, we demonstrate that, with a hierarchical extension of CCVNorm, the proposed method adds only slight overhead to the stereo matching network in terms of computation time and model size. For the project page, see https://zswang666.github.io/Stereo-LiDAR-CCVNorm-Project-Page/ |
Tasks | Depth Completion, Stereo Matching, Stereo Matching Hand |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.02917v1 |
PDF | http://arxiv.org/pdf/1904.02917v1.pdf |
PWC | https://paperswithcode.com/paper/3d-lidar-and-stereo-fusion-using-stereo |
Repo | https://github.com/zswang666/Stereo-LiDAR-CCVNorm |
Framework | pytorch |
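Conditional normalization of a cost volume can be sketched as: normalize the volume's features, then apply per-pixel scale and shift predicted from the LiDAR disparity map. The tensor shapes and the 1x1-conv conditioning nets below are assumptions for illustration, not the paper's exact hierarchical design:

```python
import torch
import torch.nn as nn

class CCVNormSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gamma = nn.Conv2d(1, channels, kernel_size=1)  # scale from LiDAR
        self.beta = nn.Conv2d(1, channels, kernel_size=1)   # shift from LiDAR

    def forward(self, cost, lidar):
        # cost: (B, C, D, H, W) cost volume; lidar: (B, 1, H, W) sparse disparity
        mean = cost.mean(dim=(2, 3, 4), keepdim=True)
        var = ((cost - mean) ** 2).mean(dim=(2, 3, 4), keepdim=True)
        normed = (cost - mean) / (var + 1e-5).sqrt()
        g = self.gamma(lidar).unsqueeze(2)   # (B, C, 1, H, W), broadcast over D
        b = self.beta(lidar).unsqueeze(2)
        return g * normed + b

cv = torch.randn(1, 8, 16, 32, 32); ld = torch.randn(1, 1, 32, 32)
print(CCVNormSketch(8)(cv, ld).shape)   # torch.Size([1, 8, 16, 32, 32])
```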
Extending Monocular Visual Odometry to Stereo Camera Systems by Scale Optimization
Title | Extending Monocular Visual Odometry to Stereo Camera Systems by Scale Optimization |
Authors | Jiawei Mo, Junaed Sattar |
Abstract | This paper proposes a novel approach for extending monocular visual odometry to a stereo camera system. The proposed method uses an additional camera to accurately estimate and optimize the scale of the monocular visual odometry, rather than triangulating 3D points from stereo matching. Specifically, the 3D points generated by the monocular visual odometry are projected onto the other camera of the stereo pair, and the scale is recovered and optimized by directly minimizing the photometric error. The method is computationally efficient, adding minimal overhead to the stereo vision system compared to straightforward stereo matching, and is robust to repetitive textures. Additionally, direct scale optimization enables stereo visual odometry to be based purely on the direct method. Extensive evaluation on public datasets (e.g., KITTI) and in outdoor environments (both terrestrial and underwater) demonstrates the accuracy and efficiency of the stereo visual odometry approach extended by scale optimization, as well as its robustness in environments with challenging textures. |
Tasks | Monocular Visual Odometry, Stereo Matching, Stereo Matching Hand, Visual Odometry |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12723v3 |
PDF | https://arxiv.org/pdf/1905.12723v3.pdf |
PWC | https://paperswithcode.com/paper/extending-monocular-visual-odometry-to-stereo |
Repo | https://github.com/jiawei-mo/scale_optimization |
Framework | none |
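Conceptually, scale recovery reduces to a one-dimensional problem: project the monocular 3D points into the second camera at candidate scales and keep the scale minimizing photometric error. The coarse grid search below is a didactic stand-in for the direct optimization used in practice; `K`, the extrinsics, and the point/intensity inputs are placeholders:

```python
import numpy as np

def photometric_error(points, intensities, K, R, t, image, scale):
    err = 0.0
    for p, i_ref in zip(points, intensities):
        pc = R @ (scale * p) + t              # scaled point in the other camera
        u, v, w = K @ pc                      # pinhole projection
        x, y = int(u / w), int(v / w)
        if 0 <= y < image.shape[0] and 0 <= x < image.shape[1]:
            err += (float(image[y, x]) - i_ref) ** 2
    return err

def recover_scale(points, intensities, K, R, t, image):
    scales = np.linspace(0.5, 2.0, 50)        # coarse 1-D search over scale
    errors = [photometric_error(points, intensities, K, R, t, image, s)
              for s in scales]
    return scales[int(np.argmin(errors))]

K = np.array([[500.0, 0, 64], [0, 500.0, 64], [0, 0, 1]])   # toy intrinsics
R, t = np.eye(3), np.array([0.5, 0.0, 0.0])                 # hypothetical baseline
image = np.random.rand(128, 128)
points = np.random.rand(20, 3) + np.array([0, 0, 5.0])      # points in front of camera
intens = np.random.rand(20)
print(recover_scale(points, intens, K, R, t, image))
```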
Understanding Isomorphism Bias in Graph Data Sets
Title | Understanding Isomorphism Bias in Graph Data Sets |
Authors | Sergei Ivanov, Sergei Sviridov, Evgeny Burnaev |
Abstract | Recent years have seen a rapid increase in classification methods for graph-structured data. In both graph kernels and graph neural networks, one implicit assumption of successful state-of-the-art models has been that incorporating graph isomorphism features into the architecture leads to better empirical performance. However, as we discover in this work, commonly used data sets for graph classification contain repeated instances, which cause the problem of isomorphism bias, i.e., artificially increased model accuracy from memorizing target information in the training set. This prevents fair comparison of algorithms and raises questions about the validity of the obtained results. We analyze 54 data sets previously used extensively for graph-related tasks, check them for isomorphism bias, give a set of recommendations for machine learning practitioners to properly set up their models, and open-source new data sets for future experiments. |
Tasks | Graph Classification |
Published | 2019-10-26 |
URL | https://arxiv.org/abs/1910.12091v2 |
PDF | https://arxiv.org/pdf/1910.12091v2.pdf |
PWC | https://paperswithcode.com/paper/understanding-isomorphism-bias-in-graph-data |
Repo | https://github.com/nd7141/graph_datasets |
Framework | pytorch |
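One practical way to hunt for the repeated instances the paper describes is to bucket graphs by a Weisfeiler-Lehman hash (available in networkx >= 2.5) and inspect collisions; note that equal WL hashes are necessary but not sufficient for isomorphism, so this only flags candidates:

```python
import networkx as nx
from collections import defaultdict

def find_candidate_duplicates(graphs):
    """Group indices of graphs sharing a WL hash; each bucket merits a closer check."""
    buckets = defaultdict(list)
    for idx, g in enumerate(graphs):
        buckets[nx.weisfeiler_lehman_graph_hash(g)].append(idx)
    return [ids for ids in buckets.values() if len(ids) > 1]

graphs = [nx.path_graph(4), nx.cycle_graph(4), nx.path_graph(4)]
print(find_candidate_duplicates(graphs))   # [[0, 2]]
```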