April 2, 2020

3475 words · 17 min read

Paper Group ANR 137

Unsupervised Denoising for Satellite Imagery using Wavelet Subband CycleGAN

Title Unsupervised Denoising for Satellite Imagery using Wavelet Subband CycleGAN
Authors Joonyoung Song, Jae-Heon Jeong, Dae-Soon Park, Hyun-Ho Kim, Doo-Chun Seo, Jong Chul Ye
Abstract Multi-spectral satellite imaging sensors acquire various spectral band images such as red (R), green (G), blue (B), near-infrared (N), etc. Thanks to the unique spectroscopic property of each spectral band with respect to the objects on the ground, multi-spectral satellite imagery can be used for various geological survey applications. Unfortunately, image artifacts from imaging sensor noises often affect the quality of scenes and have negative impacts on the applications of satellite imagery. Recently, deep learning approaches have been extensively explored for the removal of noises in satellite imagery. Most deep learning denoising methods, however, follow a supervised learning scheme, which requires matched noisy image and clean image pairs that are difficult to collect in real situations. In this paper, we propose a novel unsupervised multispectral denoising method for satellite imagery using a wavelet subband cycle-consistent adversarial network (WavCycleGAN). The proposed method is based on an unsupervised learning scheme using adversarial loss and cycle-consistency loss to overcome the lack of paired data. Moreover, in contrast to the standard image-domain cycleGAN, we introduce a wavelet subband domain learning scheme for effective denoising without sacrificing high-frequency components such as edges and detail information. Experimental results for the removal of vertical stripe and wave noises in satellite imaging sensors demonstrate that the proposed method effectively removes noises and preserves important high-frequency features of satellite images.
Tasks Denoising
Published 2020-02-23
URL https://arxiv.org/abs/2002.09847v1
PDF https://arxiv.org/pdf/2002.09847v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-denoising-for-satellite-imagery
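The wavelet subband idea can be made concrete with a one-level 2D Haar transform. This is a minimal sketch, not the paper's code, and the LH/HL naming follows one common convention: the image splits into a low-frequency approximation band and three detail bands, and directional artifacts such as vertical stripes concentrate in a single detail band, which is where a subband-domain cycleGAN would operate.

```python
def haar2d(img):
    """One-level 2D Haar transform: split an image (2D list of floats,
    even height/width) into LL (approximation) and LH/HL/HH detail bands."""
    h, w = len(img), len(img[0])
    LL = [[0.0] * (w // 2) for _ in range(h // 2)]
    LH = [[0.0] * (w // 2) for _ in range(h // 2)]
    HL = [[0.0] * (w // 2) for _ in range(h // 2)]
    HH = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            LL[i // 2][j // 2] = (a + b + c + d) / 4.0  # low-pass in both axes
            LH[i // 2][j // 2] = (a - b + c - d) / 4.0  # column differences: responds to vertical stripes
            HL[i // 2][j // 2] = (a + b - c - d) / 4.0  # row differences: responds to horizontal structure
            HH[i // 2][j // 2] = (a - b - c + d) / 4.0  # diagonal detail
    return LL, LH, HL, HH
```

On a vertically striped patch the stripe energy lands entirely in one detail band, illustrating why learning in the subband domain can target stripe noise without touching the other bands.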

Multi-scale domain-adversarial multiple-instance CNN for cancer subtype classification with non-annotated histopathological images

Title Multi-scale domain-adversarial multiple-instance CNN for cancer subtype classification with non-annotated histopathological images
Authors Noriaki Hashimoto, Daisuke Fukushima, Ryoichi Koga, Yusuke Takagi, Kaho Ko, Kei Kohno, Masato Nakaguro, Shigeo Nakamura, Hidekata Hontani, Ichiro Takeuchi
Abstract We propose a new method for cancer subtype classification from histopathological images, which can automatically detect tumor-specific features in a given whole slide image (WSI). The cancer subtype should be classified by referring to a WSI, i.e., a large image (typically 40,000x40,000 pixels) of an entire pathological tissue slide, which consists of cancer and non-cancer portions. One difficulty in constructing cancer subtype classifiers comes from the high cost of annotating WSIs; without annotation, we have to construct the tumor region detector without knowing true labels. Furthermore, both global and local image features must be extracted from the WSI by changing the magnifications of the image. In addition, the image features should be stably detected across the variety of staining among hospitals and specimens. In this paper, we develop a new CNN-based cancer subtype classification method by effectively combining multiple-instance, domain-adversarial, and multi-scale learning frameworks that can overcome these practical difficulties. When the proposed method was applied to malignant lymphoma subtype classification of 196 cases collected from multiple hospitals, the classification performance was significantly better than the standard CNN or other conventional methods, and the accuracy compared favorably to that of standard pathologists. In addition, we confirmed by immunostaining and expert pathologists’ visual inspection that the tumor regions were correctly detected.
Published 2020-01-06
URL https://arxiv.org/abs/2001.01599v1
PDF https://arxiv.org/pdf/2001.01599v1.pdf
PWC https://paperswithcode.com/paper/multi-scale-domain-adversarial-multiple
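The multiple-instance setting treats each WSI as a bag of patch-level instances with only a slide-level label. A minimal sketch of attention-style MIL pooling is below; the paper's model learns attention weights from patch features, whereas here, as a simplified stand-in, the weights come from a softmax over the patch scores themselves.

```python
import math

def mil_attention_pool(instance_scores):
    """Attention-style MIL pooling: weight each patch score by a softmax
    over the scores, so the most tumor-like patches dominate the
    slide-level prediction. Returns (pooled score, attention weights)."""
    m = max(instance_scores)                       # subtract max for numerical stability
    weights = [math.exp(s - m) for s in instance_scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    pooled = sum(w * s for w, s in zip(weights, instance_scores))
    return pooled, weights
```

With uniform scores the attention is uniform; a single strongly positive patch pulls the slide-level score toward it, which is the behavior that lets a bag-labeled classifier localize tumor regions.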

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

Title Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks
Authors Francesco Croce, Matthias Hein
Abstract The field of defense strategies against adversarial attacks has grown significantly over recent years, but progress is hampered because the evaluation of adversarial defenses is often insufficient and thus gives a wrong impression of robustness. Many promising defenses could be broken later on, making it difficult to identify the state of the art. Frequent pitfalls in evaluation are improper tuning of the attacks’ hyperparameters, and gradient obfuscation or masking. In this paper we first propose two extensions of the PGD attack overcoming failures due to suboptimal step size and problems of the objective function. We then combine our novel attacks with two complementary existing ones to form a parameter-free, computationally affordable and user-independent ensemble of attacks to test adversarial robustness. We apply our ensemble to over 40 models from papers published at recent top machine learning and computer vision venues. In all except one of the cases we achieve lower robust test accuracy than reported in these papers, often by more than 10%, identifying several broken defenses.
Published 2020-03-03
URL https://arxiv.org/abs/2003.01690v1
PDF https://arxiv.org/pdf/2003.01690v1.pdf
PWC https://paperswithcode.com/paper/reliable-evaluation-of-adversarial-robustness
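The PGD attack the paper extends alternates a signed gradient step with a projection back into the allowed perturbation ball. The sketch below is plain PGD on a scalar input with a user-supplied gradient function, purely illustrative; the paper's Auto-PGD additionally adapts the step size from the loss trajectory, which this sketch does not do.

```python
def pgd_linf(grad_fn, x0, eps, steps=40, step=None):
    """Projected gradient ascent in an L-infinity ball of radius eps
    around x0: take signed-gradient steps to increase the loss, then
    project back so |x - x0| <= eps always holds."""
    if step is None:
        step = 2 * eps / steps            # a common fixed-step heuristic
    x = x0
    for _ in range(steps):
        g = grad_fn(x)
        x = x + step * (1.0 if g >= 0 else -1.0)  # sign-of-gradient step
        x = max(x0 - eps, min(x0 + eps, x))       # project into the eps-ball
    return x
```

For a loss increasing in x, the iterate walks to the boundary of the ball and stays there, which is the behavior a robustness evaluation relies on.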

A hybrid model based on deep LSTM for predicting high-dimensional chaotic systems

Title A hybrid model based on deep LSTM for predicting high-dimensional chaotic systems
Authors Youming Lei, Jian Hu, Jianpeng Ding
Abstract We propose a hybrid method combining the deep long short-term memory (LSTM) model with an inexact empirical model of dynamical systems to predict high-dimensional chaotic systems. The deep hierarchy is encoded into the LSTM by superimposing multiple recurrent neural network layers, and the hybrid model is trained with the Adam optimization algorithm. Statistical results for the Mackey-Glass system and the Kuramoto-Sivashinsky system are obtained under the criteria of root mean square error (RMSE) and anomaly correlation coefficient (ACC) using the single-layer LSTM, the multi-layer LSTM, and the corresponding hybrid method, respectively. The numerical results show that the proposed method can effectively avoid the rapid divergence of the multi-layer LSTM model when reconstructing chaotic attractors, and demonstrate the feasibility of combining deep learning based on the gradient descent method with the empirical model.
Published 2020-01-21
URL https://arxiv.org/abs/2002.00799v1
PDF https://arxiv.org/pdf/2002.00799v1.pdf
PWC https://paperswithcode.com/paper/a-hybrid-model-based-on-deep-lstm-for
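The two evaluation criteria are standard and easy to state. A small sketch: RMSE measures pointwise error, while ACC is the Pearson correlation of anomalies, i.e., deviations from each series' mean (in forecasting practice the anomaly is usually taken against a climatological mean; using the series mean here is a simplification).

```python
import math

def rmse(pred, true):
    """Root mean square error between a forecast and a reference series."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def acc(pred, true):
    """Anomaly correlation coefficient: Pearson correlation of the
    anomalies (deviations from each series' own mean)."""
    mp = sum(pred) / len(pred)
    mt = sum(true) / len(true)
    ap = [p - mp for p in pred]
    at = [t - mt for t in true]
    num = sum(x * y for x, y in zip(ap, at))
    den = math.sqrt(sum(x * x for x in ap) * sum(y * y for y in at))
    return num / den
```

A forecast that is a positive rescaling of the truth has ACC 1 even when its RMSE is large, which is why the two criteria are reported together.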

Tuning-free Plug-and-Play Proximal Algorithm for Inverse Imaging Problems

Title Tuning-free Plug-and-Play Proximal Algorithm for Inverse Imaging Problems
Authors Kaixuan Wei, Angelica Aviles-Rivero, Jingwei Liang, Ying Fu, Carola-Bibiane Schönlieb, Hua Huang
Abstract Plug-and-play (PnP) is a non-convex framework that combines ADMM or other proximal algorithms with advanced denoiser priors. Recently, PnP has achieved great empirical success, especially with the integration of deep learning-based denoisers. However, a key problem of PnP-based approaches is that they require manual parameter tweaking, which is necessary to obtain high-quality results across images with large discrepancies in imaging conditions and scene content. In this work, we present a tuning-free PnP proximal algorithm, which can automatically determine the internal parameters, including the penalty parameter, the denoising strength and the terminal time. A key part of our approach is to develop a policy network for automatic search of parameters, which can be effectively learned via mixed model-free and model-based deep reinforcement learning. We demonstrate, through numerical and visual experiments, that the learned policy can customize different parameters for different states, and is often more efficient and effective than existing handcrafted criteria. Moreover, we discuss the practical considerations of the plugged denoisers, which together with our learned policy yield state-of-the-art results. This holds on both linear and nonlinear exemplary inverse imaging problems, and in particular, we show promising results on compressed sensing MRI and phase retrieval.
Tasks Denoising
Published 2020-02-22
URL https://arxiv.org/abs/2002.09611v1
PDF https://arxiv.org/pdf/2002.09611v1.pdf
PWC https://paperswithcode.com/paper/tuning-free-plug-and-play-proximal-algorithm
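The PnP recipe itself is short: alternate a data-fidelity proximal step with a call to an arbitrary denoiser. The sketch below uses half-quadratic splitting (a close cousin of PnP-ADMM) with an identity forward operator and a moving-average stand-in denoiser; the penalty `rho` and iteration count are exactly the internal parameters the paper's learned policy would set automatically.

```python
def pnp_hqs(y, denoiser, rho=1.0, iters=10):
    """Plug-and-play half-quadratic splitting for a denoising-type
    inverse problem with identity forward operator: alternate a
    closed-form data-fit step with a plugged-in denoiser."""
    x = list(y)
    for _ in range(iters):
        # data-fidelity proximal step (closed form since the operator is identity)
        z = [(yi + rho * xi) / (1.0 + rho) for yi, xi in zip(y, x)]
        # prior step: any off-the-shelf denoiser can be plugged in here
        x = denoiser(z)
    return x

def moving_average_denoiser(z):
    """Stand-in denoiser: 3-tap moving average with edge replication."""
    padded = [z[0]] + list(z) + [z[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(z))]
```

Swapping `moving_average_denoiser` for a learned CNN denoiser is the "deep learning-based denoiser integration" the abstract refers to; the loop itself does not change.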

SD-GAN: Structural and Denoising GAN reveals facial parts under occlusion

Title SD-GAN: Structural and Denoising GAN reveals facial parts under occlusion
Authors Samik Banerjee, Sukhendu Das
Abstract Certain facial parts are salient (unique) in appearance and substantially contribute to the holistic recognition of a subject. Occlusion of these salient parts deteriorates the performance of face recognition algorithms. In this paper, we propose a generative model to reconstruct the missing parts of the face which are under occlusion. The proposed generative model (SD-GAN) reconstructs a face preserving the illumination variation and identity of the face. A novel adversarial training algorithm has been designed for a bimodal mutually exclusive Generative Adversarial Network (GAN) model, for faster convergence. A novel adversarial “structural” loss function is also proposed, comprising two components: a holistic and a local loss, characterized by SSIM and patch-wise MSE. Ablation studies on real and synthetically occluded face datasets reveal that our proposed technique outperforms the competing methods by a considerable margin, even for boosting the performance of face recognition.
Tasks Denoising, Face Recognition
Published 2020-02-19
URL https://arxiv.org/abs/2002.08448v1
PDF https://arxiv.org/pdf/2002.08448v1.pdf
PWC https://paperswithcode.com/paper/sd-gan-structural-and-denoising-gan-reveals
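The two-component structural loss can be sketched directly from the abstract: a holistic term from SSIM and a local term from patch-wise MSE. This is an illustrative reconstruction, not the paper's code: the SSIM here is computed over a single global window (real SSIM averages local windows), and the mixing weight `alpha` is a hypothetical parameter.

```python
def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM between two equal-length image vectors in [0, 1];
    full SSIM averages this statistic over sliding local windows."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx * mx + my * my + c1) * (vx + vy + c2)))

def structural_loss(x, y, patch=4, alpha=0.5):
    """Hybrid loss per the abstract: holistic (1 - SSIM) term plus a
    local patch-wise MSE term, mixed by a hypothetical weight alpha."""
    holistic = 1.0 - ssim_global(x, y)
    local, n_patches = 0.0, 0
    for i in range(0, len(x), patch):
        px, py = x[i:i + patch], y[i:i + patch]
        local += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
        n_patches += 1
    return alpha * holistic + (1 - alpha) * local / n_patches
```

Identical images give zero loss; the SSIM term penalizes structural disagreement while the patch MSE term keeps local intensities close.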

Deep Transform and Metric Learning Network: Wedding Deep Dictionary Learning and Neural Networks

Title Deep Transform and Metric Learning Network: Wedding Deep Dictionary Learning and Neural Networks
Authors Wen Tang, Emilie Chouzenoux, Jean-Christophe Pesquet, Hamid Krim
Abstract On account of its many successes in inference tasks and denoising applications, Dictionary Learning (DL) and its related sparse optimization problems have garnered a lot of research interest. While most solutions have focused on single-layer dictionaries, the recently proposed Deep DL (DDL) methods have also fallen short on a number of issues. We propose herein a novel DDL approach where each DL layer can be formulated as a combination of one linear layer and a Recurrent Neural Network (RNN). The RNN is shown to flexibly account for the layer-associated and learned metric. Our proposed work unveils new insights into Neural Networks and DDL and provides a new, efficient and competitive approach to jointly learn a deep transform and a metric for inference applications. Extensive experiments are carried out to demonstrate that the proposed method can outperform not only existing DDL methods but also state-of-the-art generic CNNs.
Tasks Denoising, Dictionary Learning, Metric Learning
Published 2020-02-18
URL https://arxiv.org/abs/2002.07898v1
PDF https://arxiv.org/pdf/2002.07898v1.pdf
PWC https://paperswithcode.com/paper/deep-transform-and-metric-learning-network
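The "one linear layer plus an RNN" reading of a dictionary-learning layer is easiest to see through ISTA, the classic sparse-coding iteration: the recurrence x ← soft(Wy + Sx) is literally a recurrent cell applied repeatedly. The sketch below uses fixed toy values for W, S, and the threshold; in LISTA-style networks (and, per the abstract, in this paper's formulation) these quantities are learned.

```python
def soft_threshold(v, lam):
    """Proximal operator of the L1 norm: elementwise soft-thresholding."""
    return [max(abs(x) - lam, 0.0) * (1.0 if x > 0 else -1.0) for x in v]

def ista_unrolled(y, W, S, lam, iters=20):
    """Sparse coding by unrolled ISTA. The recurrence x <- soft(W y + S x)
    is one linear (feed-forward) map W y plus a recurrent map S x through
    a fixed nonlinearity -- i.e., a linear layer followed by an RNN cell."""
    Wy = [sum(wij * yj for wij, yj in zip(row, y)) for row in W]
    x = [0.0] * len(W)
    for _ in range(iters):
        pre = [wy + sum(sij * xj for sij, xj in zip(row, x))
               for wy, row in zip(Wy, S)]
        x = soft_threshold(pre, lam)
    return x
```

With a 1x1 "dictionary" the fixed point is the soft-thresholded input, which shows the shrinkage behavior the learned metric would shape per layer.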

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

Title PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning
Authors Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren
Abstract With the emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability are being transferred to these devices. However, executing the inference of Deep Neural Networks (DNNs) is still challenging given their high computation and storage demands, especially if real-time performance with high accuracy is needed. Weight pruning of DNNs has been proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained and accurate but not hardware friendly; structured pruning is coarse-grained and hardware-efficient but incurs higher accuracy loss. In this paper, we introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency. In other words, our method achieves the best of both worlds, and is desirable across the theory/algorithm, compiler, and hardware levels. The proposed PatDNN is an end-to-end framework to efficiently execute DNNs on mobile devices with the help of a novel model compression technique (pattern-based pruning based on an extended ADMM solution framework) and a set of thorough architecture-aware compiler- and code generation-based optimizations (filter kernel reordering, compressed weight storage, register load redundancy elimination, and parameter auto-tuning). Evaluation results demonstrate that PatDNN outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow Lite, TVM, and Alibaba Mobile Neural Network, with speedups of up to 44.5x, 11.4x, and 7.1x, respectively, with no accuracy compromise. Real-time inference of representative large-scale DNNs (e.g., VGG-16, ResNet-50) can be achieved on mobile devices.
Tasks Code Generation, Model Compression
Published 2020-01-01
URL https://arxiv.org/abs/2001.00138v4
PDF https://arxiv.org/pdf/2001.00138v4.pdf
PWC https://paperswithcode.com/paper/patdnn-achieving-real-time-dnn-execution-on
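The core idea of pattern-based pruning can be sketched in a few lines: every 3x3 convolution kernel is restricted to one mask from a small shared library of patterns, so pruning is fine-grained inside each kernel yet regular enough for the compiler to exploit. The pattern library and magnitude-based selection below are illustrative, not PatDNN's actual pattern set or its ADMM-based training.

```python
def apply_pattern_pruning(kernel, patterns):
    """Pattern-based pruning of one 3x3 kernel: pick the mask from the
    pattern library that retains the most L1 weight magnitude, then zero
    every entry outside that mask."""
    best, best_score = None, -1.0
    for pat in patterns:
        score = sum(abs(kernel[i][j])
                    for i in range(3) for j in range(3) if pat[i][j])
        if score > best_score:
            best, best_score = pat, score
    return [[kernel[i][j] * best[i][j] for j in range(3)] for i in range(3)]

# Illustrative 4-entry patterns; real libraries are chosen so every kernel
# in a layer shares a handful of masks, enabling compiler-level regularity.
PATTERNS = [
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
]
```

Because every kernel ends up with one of a few known sparsity shapes, loads and multiplies can be reordered and hard-coded at compile time, which is how the fine-grained sparsity is made hardware-friendly.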

Source Printer Identification from Document Images Acquired using Smartphone

Title Source Printer Identification from Document Images Acquired using Smartphone
Authors Sharad Joshi, Suraj Saxena, Nitin Khanna
Abstract Vast volumes of printed documents continue to be used for various important as well as trivial applications. Such applications often rely on the information provided in the form of printed text documents whose integrity verification poses a challenge due to time constraints and lack of resources. Source printer identification provides essential information about the origin and integrity of a printed document in a fast and cost-effective manner. Even when fraudulent documents are identified, information about their origin can help stop future frauds. If a smartphone camera replaces the scanner for the document acquisition process, document forensics would be more economical, user-friendly, and even faster in many applications where remote and distributed analysis is beneficial. Building on existing methods, we propose to learn a single CNN model from the fusion of letter images and their printer-specific noise residuals. In the absence of any publicly available dataset, we created a new dataset consisting of 2250 document images of text documents printed by eighteen printers and acquired by a smartphone camera at five acquisition settings. The proposed method achieves 98.42% document classification accuracy using images of letter ‘e’ under a 5x2 cross-validation approach. Further, when tested using about half a million letters of all types, it achieves 90.33% and 98.01% letter and document classification accuracies, respectively, thus highlighting the ability to learn a discriminative model without dependence on a single letter type. Also, classification accuracies are encouraging under various acquisition settings, including low illumination and change in angle between the document and camera planes.
Tasks Document Classification
Published 2020-03-27
URL https://arxiv.org/abs/2003.12602v1
PDF https://arxiv.org/pdf/2003.12602v1.pdf
PWC https://paperswithcode.com/paper/source-printer-identification-from-document
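A printer-specific noise residual is, in its simplest form, the difference between a letter image and a denoised copy of itself, so that the printer's systematic high-frequency fingerprint survives while the letter shape is suppressed. The sketch below uses a 3x3 mean filter as a stand-in denoiser; the paper's residual extraction may differ.

```python
def noise_residual(img):
    """Noise residual of a grayscale image (2D list): subtract a smoothed
    copy (3x3 mean filter with edge replication) from the original, leaving
    mostly sensor/printer noise rather than image content."""
    h, w = len(img), len(img[0])
    res = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)  # replicate edges
                    jj = min(max(j + dj, 0), w - 1)
                    acc += img[ii][jj]
            res[i][j] = img[i][j] - acc / 9.0
    return res
```

Smooth regions produce near-zero residuals, so what remains is dominated by the fine-grained texture a CNN can associate with a specific printer.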

Identifying At-Risk K-12 Students in Multimodal Online Environments: A Machine Learning Approach

Title Identifying At-Risk K-12 Students in Multimodal Online Environments: A Machine Learning Approach
Authors Hang Li, Wenbiao Ding, Songfan Yang, Zitao Liu
Abstract With the rapid emergence of K-12 online learning platforms, a new era of education has been opened up. By offering more affordable and personalized courses compared to in-person classrooms, K-12 online tutoring is pushing the boundaries of education to the general public. It is crucial to have a dropout warning framework to preemptively identify K-12 students who are at risk of dropping out of the online courses. Prior researchers have focused on predicting dropout in Massive Open Online Courses (MOOCs), which often deliver higher education, i.e., graduate-level courses at top institutions. However, few studies have focused on developing a machine learning approach for students in K-12 online courses. The dropout prediction scenarios are significantly different between MOOC-based learning and K-12 online tutoring in many aspects such as environmental modalities, learning goals, online behaviors, etc. In this paper, we develop a machine learning framework to conduct accurate at-risk student identification specialized for K-12 multimodal online environments. Our approach considers both online and offline factors around K-12 students and aims at solving the challenging problems of (1) multiple modalities, i.e., K-12 online environments involve interactions from different modalities such as video, voice, etc.; (2) length variability, i.e., students with different lengths of learning history; (3) time sensitivity, i.e., the dropout likelihood changes with time; and (4) data imbalance, i.e., fewer than 20% of K-12 students will choose to drop out of the class. We conduct a wide range of offline and online experiments to demonstrate the effectiveness of our approach. In our offline experiments, we show that our method improves the dropout prediction performance when compared to state-of-the-art baselines on a real-world educational data set.
Published 2020-03-21
URL https://arxiv.org/abs/2003.09670v1
PDF https://arxiv.org/pdf/2003.09670v1.pdf
PWC https://paperswithcode.com/paper/identifying-at-risk-k-12-students-in
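Of the four challenges, data imbalance has the most standard remedy: reweight the training loss by inverse class frequency so the rare dropout class is not drowned out. The helper below is a generic sketch of that idea, not the paper's specific mechanism.

```python
def balanced_class_weights(labels):
    """Inverse-frequency class weights: with k classes and n samples,
    class c with count n_c gets weight n / (k * n_c), so rarer classes
    (e.g. the <20% of students who drop out) weigh more in the loss."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    k = len(counts)
    return {y: n / (k * c) for y, c in counts.items()}
```

With an 80/20 split the minority class receives four times the weight of the majority class, and a perfectly balanced dataset yields weight 1.0 for every class.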

Lyceum: An efficient and scalable ecosystem for robot learning

Title Lyceum: An efficient and scalable ecosystem for robot learning
Authors Colin Summers, Kendall Lowrey, Aravind Rajeswaran, Siddhartha Srinivasa, Emanuel Todorov
Abstract We introduce Lyceum, a high-performance computational ecosystem for robot learning. Lyceum is built on top of the Julia programming language and the MuJoCo physics simulator, combining the ease-of-use of a high-level programming language with the performance of native C. In addition, Lyceum has a straightforward API to support parallel computation across multiple cores and machines. Overall, depending on the complexity of the environment, Lyceum is 5-30x faster compared to other popular abstractions like OpenAI’s Gym and DeepMind’s dm-control. This substantially reduces training time for various reinforcement learning algorithms; and is also fast enough to support real-time model predictive control through MuJoCo. The code, tutorials, and demonstration videos can be found at: www.lyceum.ml.
Published 2020-01-21
URL https://arxiv.org/abs/2001.07343v1
PDF https://arxiv.org/pdf/2001.07343v1.pdf
PWC https://paperswithcode.com/paper/lyceum-an-efficient-and-scalable-ecosystem-1

Cross-GCN: Enhancing Graph Convolutional Network with $k$-Order Feature Interactions

Title Cross-GCN: Enhancing Graph Convolutional Network with $k$-Order Feature Interactions
Authors Fuli Feng, Xiangnan He, Hanwang Zhang, Tat-Seng Chua
Abstract Graph Convolutional Network (GCN) is an emerging technique that performs learning and reasoning on graph data. It performs feature learning on the graph structure by aggregating the features of the neighbor nodes to obtain the embedding of each target node. Owing to its strong representation power, recent research shows that GCN achieves state-of-the-art performance on several tasks such as recommendation and linked document classification. Despite its effectiveness, we argue that existing designs of GCN forgo modeling cross features, making GCN less effective for tasks or data where cross features are important. Although a neural network can approximate any continuous function, including the multiplication operator for modeling feature crosses, it can be rather inefficient to do so (i.e., wasting many parameters at the risk of overfitting) if there is no explicit design. To this end, we design a new operator named Cross-feature Graph Convolution, which explicitly models arbitrary-order cross features with complexity linear in the feature dimension and order size. We term our proposed architecture Cross-GCN, and conduct experiments on three graphs to validate its effectiveness. Extensive analysis validates the utility of explicitly modeling cross features in GCN, especially for feature learning at lower layers.
Tasks Document Classification
Published 2020-03-05
URL https://arxiv.org/abs/2003.02587v1
PDF https://arxiv.org/pdf/2003.02587v1.pdf
PWC https://paperswithcode.com/paper/cross-gcn-enhancing-graph-convolutional
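The idea of an explicit feature cross inside a graph convolution can be sketched as follows: aggregate neighbor features as usual, then add a second-order term formed by the elementwise product of two linear transforms of the aggregate, which mixes feature dimensions multiplicatively. This is a generic degree-2 cross in the spirit of the abstract, not the paper's exact operator; W1 and W2 stand in for learned weight matrices.

```python
def matvec(M, v):
    """Matrix-vector product for lists of lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def cross_graph_conv(adj, feats, W1, W2):
    """One cross-feature graph convolution layer (sketch): mean-aggregate
    each node's neighborhood (including itself), then output the
    first-order term W1 h plus the degree-2 cross (W1 h) * (W2 h)."""
    n = len(feats)
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]] + [i]  # self-loop
        h = [sum(feats[j][k] for j in nbrs) / len(nbrs)
             for k in range(len(feats[0]))]
        first = matvec(W1, h)
        cross = [a * b for a, b in zip(matvec(W1, h), matvec(W2, h))]
        out.append([f + c for f, c in zip(first, cross)])
    return out
```

Because the cross term is a product of two linear maps of the same aggregate, its parameter count stays linear in the feature dimension rather than quadratic, which is the efficiency argument the abstract makes.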

HandAugment: A Simple Data Augmentation Method for Depth-Based 3D Hand Pose Estimation

Title HandAugment: A Simple Data Augmentation Method for Depth-Based 3D Hand Pose Estimation
Authors Zhaohui Zhang, Shipeng Xie, Mingxiu Chen, Haichao Zhu
Abstract Hand pose estimation from 3D depth images has been explored widely using various techniques in the field of computer vision. Although deep learning based methods have greatly improved performance in recent years, this problem remains unsolved due to the lack of large datasets, like ImageNet, and of effective data synthesis methods. In this paper, we propose HandAugment, a method to synthesize image data to augment the training process of the neural networks. Our method has two main parts: First, we propose a scheme of two-stage neural networks. This scheme makes the neural networks focus on the hand regions and thus improves performance. Second, we introduce a simple and effective method to synthesize data by combining real and synthetic images in the image space. Finally, we show that our method achieves first place in the task of depth-based 3D hand pose estimation in the HANDS 2019 challenge.
Tasks Data Augmentation, Hand Pose Estimation, Pose Estimation
Published 2020-01-03
URL https://arxiv.org/abs/2001.00702v2
PDF https://arxiv.org/pdf/2001.00702v2.pdf
PWC https://paperswithcode.com/paper/handaugment-a-simple-data-augmentation-for
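Combining a real and a synthetic depth image "in the image space" can be illustrated by per-pixel compositing: at each pixel keep the nearer (smaller, nonzero) depth, so whichever surface is in front wins. This is a plausible depth-compositing rule for illustration only, not necessarily HandAugment's exact combination method.

```python
def merge_depth(real, synth, background=0.0):
    """Composite two depth images (2D lists) pixelwise: keep the nearer
    nonzero depth at each pixel; zero encodes background/no-measurement."""
    out = []
    for real_row, synth_row in zip(real, synth):
        row = []
        for r, s in zip(real_row, synth_row):
            vals = [v for v in (r, s) if v > 0]  # drop background pixels
            row.append(min(vals) if vals else background)
        out.append(row)
    return out
```

Pixels where only the synthetic hand has depth are filled from the synthetic image, which is how pasting a rendered hand into a real capture yields a new training sample.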

Adaptive Group Sparse Regularization for Continual Learning

Title Adaptive Group Sparse Regularization for Continual Learning
Authors Sangwon Jung, Hongjoon Ahn, Sungmin Cha, Taesup Moon
Abstract We propose a novel regularization-based continual learning method, dubbed Adaptive Group Sparsity based Continual Learning (AGS-CL), using two group sparsity-based penalties. Our method selectively employs the two penalties when learning each node based on its importance, which is adaptively updated after learning each new task. By utilizing the proximal gradient descent method for learning, the exact sparsity and freezing of the model are guaranteed, and thus the learner can explicitly control the model capacity as learning continues. Furthermore, as a critical detail, we re-initialize the weights associated with unimportant nodes after learning each task in order to prevent the negative transfer that causes catastrophic forgetting and to facilitate efficient learning of new tasks. Through extensive experiments, we show that our AGS-CL uses much less additional memory space for storing the regularization parameters, and significantly outperforms several state-of-the-art baselines on representative continual learning benchmarks for both supervised and reinforcement learning tasks.
Tasks Continual Learning
Published 2020-03-30
URL https://arxiv.org/abs/2003.13726v1
PDF https://arxiv.org/pdf/2003.13726v1.pdf
PWC https://paperswithcode.com/paper/adaptive-group-sparse-regularization-for
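The "exact sparsity" guarantee comes from the proximal step of a group-lasso penalty: each group of weights (e.g. all weights of one node) shrinks toward zero together and is zeroed exactly once its norm falls below the threshold. A minimal sketch of that proximal operator, with a single shared threshold standing in for the paper's adaptive, importance-dependent penalties:

```python
import math

def group_soft_threshold(groups, lam):
    """Proximal operator of a group-lasso penalty: for each group g,
    output 0 if ||g|| <= lam (node pruned exactly), else shrink the whole
    group by the factor (1 - lam / ||g||)."""
    out = []
    for g in groups:
        norm = math.sqrt(sum(w * w for w in g))
        if norm <= lam:
            out.append([0.0] * len(g))          # exact zero: capacity freed
        else:
            scale = 1.0 - lam / norm
            out.append([scale * w for w in g])  # group-wise shrinkage
    return out
```

Because zeroing happens at the group level rather than per weight, whole nodes are pruned or kept, which is what lets the learner explicitly manage capacity across tasks.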

An Overview of Distance and Similarity Functions for Structured Data

Title An Overview of Distance and Similarity Functions for Structured Data
Authors Santiago Ontañón
Abstract The notions of distance and similarity play a key role in many machine learning approaches, and artificial intelligence (AI) in general, since they can serve as an organizing principle by which individuals classify objects, form concepts and make generalizations. While distance functions for propositional representations have been thoroughly studied, work on distance functions for structured representations, such as graphs, frames or logical clauses, has been carried out in different communities and is much less understood. Specifically, a significant amount of work that requires the use of a distance or similarity function for structured representations of data usually employs ad-hoc functions for specific applications. Therefore, the goal of this paper is to provide an overview of this work to identify connections between the work carried out in different areas and point out directions for future work.
Published 2020-02-18
URL https://arxiv.org/abs/2002.07420v1
PDF https://arxiv.org/pdf/2002.07420v1.pdf
PWC https://paperswithcode.com/paper/an-overview-of-distance-and-similarity
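As one concrete instance of the ad-hoc structured-data similarities the survey catalogs, here is the Jaccard similarity over edge sets, a simple structural comparison for graphs whose nodes carry identifiers. It ignores node attributes and graph isomorphism entirely, which is exactly the kind of application-specific simplification the paper argues deserves a more unified treatment.

```python
def jaccard_graph_similarity(edges_a, edges_b):
    """Jaccard similarity of two graphs represented by edge lists of
    hashable (u, v) pairs: |intersection| / |union| of the edge sets.
    Two empty graphs are defined to be identical (similarity 1)."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

The measure is 1 for identical edge sets and 0 for disjoint ones, and it scales to large graphs, but it is blind to relabeled nodes, illustrating why structured representations push practitioners toward richer (and costlier) distances such as edit distances or refinement-based kernels.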