April 2, 2020

Paper Group ANR 364

On the Convex Behavior of Deep Neural Networks in Relation to the Layers’ Width. Genetic Programming for Evolving a Front of Interpretable Models for Data Visualisation. AutoFCL: Automatically Tuning Fully Connected Layers for Handling Small Dataset. Stochastic Online Optimization using Kalman Recursion. ADAMT: A Stochastic Optimization with Trend …

On the Convex Behavior of Deep Neural Networks in Relation to the Layers’ Width


Title	On the Convex Behavior of Deep Neural Networks in Relation to the Layers’ Width
Authors	Etai Littwin, Lior Wolf
Abstract	The Hessian of neural networks can be decomposed into a sum of two matrices: (i) the positive semidefinite generalized Gauss-Newton matrix G, and (ii) the matrix H containing negative eigenvalues. We observe that for wider networks, minimizing the loss with the gradient descent optimization maneuvers through surfaces of positive curvatures at the start and end of training, and close to zero curvatures in between. In other words, it seems that during crucial parts of the training process, the Hessian in wide networks is dominated by the component G. To explain this phenomenon, we show that when initialized using common methodologies, the gradients of over-parameterized networks are approximately orthogonal to H, such that the curvature of the loss surface is strictly positive in the direction of the gradient.
Tasks
Published	2020-01-14
URL	https://arxiv.org/abs/2001.04878v1
PDF	https://arxiv.org/pdf/2001.04878v1.pdf
PWC	https://paperswithcode.com/paper/on-the-convex-behavior-of-deep-neural
Repo
Framework
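
For reference, the decomposition named in the abstract is usually written as follows. This is the standard Gauss-Newton split for a loss $\ell$ applied to network outputs $f(\theta)$; the symbols $J$, $\ell$, $f_k$, and $g$ are notation assumed here, not taken from the paper.

```latex
\nabla^2_\theta L
  = \underbrace{J^\top \left(\nabla^2_f \ell\right) J}_{G}
  + \underbrace{\sum_k \frac{\partial \ell}{\partial f_k}\, \nabla^2_\theta f_k}_{H},
\qquad J = \frac{\partial f}{\partial \theta}.

% Curvature along the gradient direction g = \nabla_\theta L:
g^\top \left(\nabla^2_\theta L\right) g = g^\top G\, g + g^\top H\, g .
```

When $\ell$ is convex in the outputs, $G$ is positive semidefinite. If $g^\top H g \approx 0$, which is the approximate orthogonality the abstract describes, the curvature along $g$ is then approximately $g^\top G g \ge 0$.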

Genetic Programming for Evolving a Front of Interpretable Models for Data Visualisation


Title	Genetic Programming for Evolving a Front of Interpretable Models for Data Visualisation
Authors	Andrew Lensen, Bing Xue, Mengjie Zhang
Abstract	Data visualisation is a key tool in data mining for understanding big datasets. Many visualisation methods have been proposed, including the well-regarded state-of-the-art method t-Distributed Stochastic Neighbour Embedding. However, the most powerful visualisation methods have a significant limitation: the manner in which they create their visualisation from the original features of the dataset is completely opaque. Many domains require an understanding of the data in terms of the original features; there is hence a need for powerful visualisation methods which use understandable models. In this work, we propose a genetic programming approach named GP-tSNE for evolving interpretable mappings from a dataset to high-quality visualisations. A multi-objective approach is designed that produces a variety of visualisations in a single run which give different trade-offs between visual quality and model complexity. Testing against baseline methods on a variety of datasets shows the clear potential of GP-tSNE to allow deeper insight into data than that provided by existing visualisation methods. We further highlight the benefits of a multi-objective approach through an in-depth analysis of a candidate front, which shows how multiple models can
Tasks
Published	2020-01-27
URL	https://arxiv.org/abs/2001.09578v1
PDF	https://arxiv.org/pdf/2001.09578v1.pdf
PWC	https://paperswithcode.com/paper/genetic-programming-for-evolving-a-front-of
Repo
Framework

AutoFCL: Automatically Tuning Fully Connected Layers for Handling Small Dataset


Title	AutoFCL: Automatically Tuning Fully Connected Layers for Handling Small Dataset
Authors	S. H. Shabbeer Basha, Sravan Kumar Vinakota, Shiv Ram Dubey, Viswanath Pulabaigari, Snehasis Mukherjee
Abstract	Deep Convolutional Neural Networks (CNN) have evolved as popular machine learning models for image classification during the past few years, due to their ability to learn the problem-specific features directly from the input images. The success of deep learning models solicits architecture engineering rather than hand-engineering the features. However, designing a state-of-the-art CNN for a given task remains non-trivial and challenging, especially when the training set is small. To address this problem, transfer learning has become a widely adopted technique. While transferring the learned knowledge from one task to another, fine-tuning with the target-dependent Fully Connected (FC) layers generally produces better results on the target task. In this paper, the proposed AutoFCL model attempts to learn the structure of the FC layers of a CNN automatically using Bayesian optimization. To evaluate the performance of the proposed AutoFCL, we utilize five pre-trained CNN models: VGG-16, ResNet, DenseNet, MobileNet, and NASNetMobile. The experiments are conducted on three benchmark datasets, namely CalTech-101, Oxford-102 Flowers, and UC Merced Land Use datasets. Fine-tuning the newly learned (target-dependent) FC layers leads to state-of-the-art performance, according to the experiments carried out in this research. The proposed AutoFCL method outperforms the existing methods on the CalTech-101 and Oxford-102 Flowers datasets, achieving accuracies of 94.38% and 98.89%, respectively. On the UC Merced Land Use dataset, our method achieves comparable performance with 96.83% accuracy.
Tasks	Image Classification, Transfer Learning
Published	2020-01-22
URL	https://arxiv.org/abs/2001.11951v3
PDF	https://arxiv.org/pdf/2001.11951v3.pdf
PWC	https://paperswithcode.com/paper/autofcl-automatically-tuning-fully-connected
Repo
Framework

Stochastic Online Optimization using Kalman Recursion


Title	Stochastic Online Optimization using Kalman Recursion
Authors	Joseph de Vilmarest, Olivier Wintenberger
Abstract	We study the Extended Kalman Filter (EKF) in constant dynamics, offering a Bayesian perspective on stochastic optimization. We obtain high-probability bounds on the cumulative excess risk in an unconstrained setting. The unconstrained challenge is tackled through a two-phase analysis. First, for linear and logistic regressions, we prove that the algorithm enters a local phase where the estimate stays in a small region around the optimum. We provide explicit high-probability bounds on this convergence time. Second, for generalized linear regressions, we provide a martingale analysis of the excess risk in the local phase, improving existing ones in bounded stochastic optimization. The EKF appears as a parameter-free O(d^2) online algorithm that optimally solves some unconstrained optimization problems.
Tasks	Stochastic Optimization
Published	2020-02-10
URL	https://arxiv.org/abs/2002.03636v1
PDF	https://arxiv.org/pdf/2002.03636v1.pdf
PWC	https://paperswithcode.com/paper/stochastic-online-optimization-using-kalman
Repo
Framework
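
A minimal sketch of the kind of recursion the abstract refers to, for logistic regression with labels in {0, 1}. It follows the textbook extended Kalman filter update with a static state (constant dynamics); the function names, the identity initial covariance, and the synthetic data are assumptions for illustration, not the authors' code.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ekf_logistic_step(theta, P, x, y):
    """One EKF update for a static state theta with observation y in {0, 1}.

    P is a symmetric d x d matrix stored as a list of lists.
    Each step costs O(d^2) arithmetic operations.
    """
    d = len(theta)
    p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
    alpha = p * (1.0 - p)  # observation noise variance and Jacobian scale
    Px = [sum(P[i][j] * x[j] for j in range(d)) for i in range(d)]
    denom = 1.0 + alpha * sum(xi * pxi for xi, pxi in zip(x, Px))
    # Rank-one covariance downdate: P - alpha * (P x)(P x)^T / denom
    new_P = [[P[i][j] - alpha * Px[i] * Px[j] / denom for j in range(d)]
             for i in range(d)]
    # State update: theta + P x (y - p) / denom
    new_theta = [theta[i] + Px[i] * (y - p) / denom for i in range(d)]
    return new_theta, new_P


def run_demo(n_steps, seed):
    """Stream synthetic data (hypothetical true parameter) through the filter."""
    rng = random.Random(seed)
    true_theta = [1.5, -2.0]  # illustrative value only
    theta = [0.0, 0.0]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n_steps):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        prob = sigmoid(sum(t * xi for t, xi in zip(true_theta, x)))
        y = 1 if rng.random() < prob else 0
        theta, P = ekf_logistic_step(theta, P, x, y)
    return theta, P
```

No step size appears anywhere in the update, which is the sense in which the abstract calls the algorithm parameter-free; the only choice made here is the initial covariance.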

ADAMT: A Stochastic Optimization with Trend Correction Scheme


Title	ADAMT: A Stochastic Optimization with Trend Correction Scheme
Authors	Bingxin Zhou, Xuebin Zheng, Junbin Gao
Abstract	Adam-type optimizers, as a class of adaptive moment estimation methods with the exponential moving average scheme, have been successfully used in many applications of deep learning. Such methods are appealing for their capability on large-scale sparse datasets with high computational efficiency. In this paper, we present a new framework for adapting Adam-type methods, namely AdamT. Instead of applying a simple exponential weighted average, AdamT also includes the trend information when updating the parameters with the adaptive step size and gradients. The additional terms promise an efficient movement on the complex cost surface, and thus the loss would converge more rapidly. We show empirically the importance of adding the trend component, where AdamT consistently outperforms the vanilla Adam method with state-of-the-art models on several classical real-world datasets.
Tasks	Stochastic Optimization
Published	2020-01-17
URL	https://arxiv.org/abs/2001.06130v1
PDF	https://arxiv.org/pdf/2001.06130v1.pdf
PWC	https://paperswithcode.com/paper/adamt-a-stochastic-optimization-with-trend-1
Repo
Framework
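
The abstract contrasts a plain exponentially weighted average with one that also carries trend information. As a rough illustration of that difference only, the sketch below compares a plain exponential moving average with Holt's linear trend smoothing, a standard trend-corrected smoother. Whether AdamT's update takes exactly this form is not stated in the abstract; the function names and smoothing constants are assumptions.

```python
def ema(series, beta):
    """Plain exponential moving average, initialised at the first value."""
    level = series[0]
    out = [level]
    for x in series[1:]:
        level = beta * x + (1.0 - beta) * level
        out.append(level)
    return out


def holt_linear(series, beta, gamma):
    """Holt's linear trend smoothing (needs at least two points).

    level tracks the smoothed value, trend tracks its smoothed slope.
    """
    level = series[0]
    trend = series[1] - series[0]
    out = [level]
    for x in series[1:]:
        prev_level = level
        # The forecast prev_level + trend replaces the bare prev_level of an EMA.
        level = beta * x + (1.0 - beta) * (prev_level + trend)
        trend = gamma * (level - prev_level) + (1.0 - gamma) * trend
        out.append(level)
    return out


ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
plain = ema(ramp, beta=0.5)
trended = holt_linear(ramp, beta=0.5, gamma=0.5)
```

On a steadily increasing series the plain average lags behind the latest value, while the trend-corrected smoother follows it; that lag is the effect a trend term is meant to remove.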

A Survey on Knowledge Graph-Based Recommender Systems


Title	A Survey on Knowledge Graph-Based Recommender Systems
Authors	Qingyu Guo, Fuzhen Zhuang, Chuan Qin, Hengshu Zhu, Xing Xie, Hui Xiong, Qing He
Abstract	To solve the information explosion problem and enhance user experience in various online applications, recommender systems have been developed to model users' preferences. Although numerous efforts have been made toward more personalized recommendations, recommender systems still suffer from several challenges, such as data sparsity and cold start. In recent years, generating recommendations with the knowledge graph as side information has attracted considerable interest. Such an approach can not only alleviate the abovementioned issues for a more accurate recommendation, but also provide explanations for recommended items. In this paper, we conduct a systematic survey of knowledge graph-based recommender systems. We collect recently published papers in this field and summarize them from two perspectives. On the one hand, we investigate the proposed algorithms by focusing on how the papers utilize the knowledge graph for accurate and explainable recommendation. On the other hand, we introduce datasets used in these works. Finally, we propose several potential research directions in this field.
Tasks	Recommendation Systems
Published	2020-02-28
URL	https://arxiv.org/abs/2003.00911v1
PDF	https://arxiv.org/pdf/2003.00911v1.pdf
PWC	https://paperswithcode.com/paper/a-survey-on-knowledge-graph-based-recommender
Repo
Framework

Temporal-adaptive Hierarchical Reinforcement Learning


Title	Temporal-adaptive Hierarchical Reinforcement Learning
Authors	Wen-Ji Zhou, Yang Yu
Abstract	Hierarchical reinforcement learning (HRL) helps address large-scale and sparse reward issues in reinforcement learning. In HRL, the policy model has an inner representation structured in levels. With this structure, the reinforcement learning task is expected to be decomposed into corresponding levels with sub-tasks, and thus the learning can be more efficient. In HRL, although it is intuitive that a high-level policy only needs to make macro decisions at a low frequency, the exact frequency is hard to determine. Previous HRL approaches often employed a fixed-time skip strategy or learned a terminal condition without taking the context into account, which not only requires manual adjustment but also sacrifices some decision granularity. In this paper, we propose the temporal-adaptive hierarchical policy learning (TEMPLE) structure, which uses a temporal gate to adaptively control the high-level policy decision frequency. We train the TEMPLE structure with PPO and test its performance in a range of environments including 2-D rooms, Mujoco tasks, and Atari games. The results show that the TEMPLE structure can lead to improved performance in these environments with a sequential adaptive high-level control.
Tasks	Atari Games, Hierarchical Reinforcement Learning
Published	2020-02-06
URL	https://arxiv.org/abs/2002.02080v1
PDF	https://arxiv.org/pdf/2002.02080v1.pdf
PWC	https://paperswithcode.com/paper/temporal-adaptive-hierarchical-reinforcement
Repo
Framework

Input Perturbation: A New Paradigm between Central and Local Differential Privacy


Title	Input Perturbation: A New Paradigm between Central and Local Differential Privacy
Authors	Yilin Kang, Yong Liu, Ben Niu, Xinyi Tong, Likun Zhang, Weiping Wang
Abstract	Traditionally, there are two models of differential privacy: the central model and the local model. The central model focuses on the machine learning model and the local model focuses on the training data. In this paper, we study the input perturbation method in differentially private empirical risk minimization (DP-ERM), preserving privacy of the central model. By adding noise to the original training data and training with the "perturbed data", we achieve $(\epsilon, \delta)$-differential privacy on the final model, along with some kind of privacy on the original data. We observe that there is an interesting connection between the local model and the central model: the perturbation on the original data causes the perturbation on the gradient, and finally the model parameters. This observation means that our method builds a bridge between the local and central models, protecting the data, the gradient and the model simultaneously, which makes it superior to previous central methods. Detailed theoretical analysis and experiments show that our method achieves almost the same (or even better) performance as some of the best previous central methods with more protections on privacy, which is an attractive result. Moreover, we extend our method to a more general case: the loss function satisfies the Polyak-Lojasiewicz condition, which is more general than strong convexity, the constraint on the loss function in most previous work.
Tasks
Published	2020-02-20
URL	https://arxiv.org/abs/2002.08570v1
PDF	https://arxiv.org/pdf/2002.08570v1.pdf
PWC	https://paperswithcode.com/paper/input-perturbation-a-new-paradigm-between
Repo
Framework
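
The idea of training on perturbed inputs can be sketched as follows. This is an illustration under stated assumptions, not the paper's mechanism: the noise scale `sigma` is a placeholder that is not calibrated to any $(\epsilon, \delta)$ budget, only the features are perturbed, and the linear model, squared loss, and function names are all hypothetical choices.

```python
import random


def perturb_inputs(rows, sigma, rng):
    """Return a copy of rows with independent N(0, sigma^2) noise on every feature."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in rows]


def squared_loss_gradient(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]


def train_linear(rows, targets, lr, epochs):
    """Per-sample gradient descent; it only ever sees the rows it is given."""
    w = [0.0] * len(rows[0])
    for _ in range(epochs):
        for x, y in zip(rows, targets):
            g = squared_loss_gradient(w, x, y)
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


rng = random.Random(0)
clean = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [2.0, -1.0, 1.0]

# Noise is added once, before training; the trainer never touches the clean rows.
noisy = perturb_inputs(clean, sigma=0.1, rng=rng)  # sigma is a placeholder value
w_from_noisy = train_linear(noisy, targets, lr=0.1, epochs=200)
```

Because the gradient is computed from the perturbed features, the noise carries through to every gradient and from there to the learned weights, which is the chain the abstract describes.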

Developing a Multilingual Annotated Corpus of Misogyny and Aggression


Title	Developing a Multilingual Annotated Corpus of Misogyny and Aggression
Authors	Shiladitya Bhattacharya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Atul Kr. Ojha
Abstract	In this paper, we discuss the development of a multilingual annotated corpus of misogyny and aggression in Indian English, Hindi, and Indian Bangla as part of a project on studying and automatically identifying misogyny and communalism on social media (the ComMA Project). The dataset is collected from comments on YouTube videos and currently contains a total of over 20,000 comments. The comments are annotated at two levels - aggression (overtly aggressive, covertly aggressive, and non-aggressive) and misogyny (gendered and non-gendered). We describe the process of data collection, the tagset used for annotation, and issues and challenges faced during the process of annotation. Finally, we discuss the results of the baseline experiments conducted to develop a classifier for misogyny in the three languages.
Tasks
Published	2020-03-16
URL	https://arxiv.org/abs/2003.07428v1
PDF	https://arxiv.org/pdf/2003.07428v1.pdf
PWC	https://paperswithcode.com/paper/developing-a-multilingual-annotated-corpus-of
Repo
Framework

Novel Radiomic Feature for Survival Prediction of Lung Cancer Patients using Low-Dose CBCT Images


Title	Novel Radiomic Feature for Survival Prediction of Lung Cancer Patients using Low-Dose CBCT Images
Authors	Bijju Kranthi Veduruparthi, Jayanta Mukherjee, Partha Pratim Das, Moses Arunsingh, Raj Kumar Shrimali, Sriram Prasath, Soumendranath Ray, Sanjay Chatterjee
Abstract	Prediction of survivability in a patient for tumor progression is useful to estimate the effectiveness of a treatment protocol. In our work, we present a model to take into account the heterogeneous nature of a tumor to predict survival. The tumor heterogeneity is measured in terms of its mass by combining information regarding the radiodensity obtained in images with the gross tumor volume (GTV). We propose a novel feature called Tumor Mass within a GTV (TMG), which improves the prediction of survivability compared to existing models that use GTV. Weekly variation in TMG of a patient is computed from the image data and also estimated from a cell survivability model. The parameters obtained from the cell survivability model are indicative of changes in TMG over the treatment period. We use these parameters along with other patient metadata to perform survival analysis and regression. Cox's Proportional Hazard survival regression was performed using these data. Significant improvement in the average concordance index from 0.47 to 0.64 was observed when TMG is used in the model instead of GTV. The experiments show that there is a difference in the treatment response in responsive and non-responsive patients and that the proposed method can be used to predict patient survivability.
Tasks	Survival Analysis
Published	2020-03-07
URL	https://arxiv.org/abs/2003.03537v1
PDF	https://arxiv.org/pdf/2003.03537v1.pdf
PWC	https://paperswithcode.com/paper/novel-radiomic-feature-for-survival
Repo
Framework
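
The abstract reports results as a concordance index. As a reminder of what that number measures, here is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' code; the function name and the toy data are assumptions, and tied event times are simply treated as not comparable.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    times        observed follow-up time per patient
    events       1 if the event was observed, 0 if censored
    risk_scores  model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.3, 0.3, 0.3, 0.3])
```

A value of 0.5 corresponds to a ranking no better than chance and 1.0 to a perfect ranking, so the reported 0.47 sits slightly below chance level and 0.64 above it.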

Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements


Title	Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements
Authors	Kai Shu, Suhang Wang, Dongwon Lee, Huan Liu
Abstract	In recent years, disinformation, including fake news, has become a global phenomenon due to its explosive growth, particularly on social media. The wide spread of disinformation and fake news can cause detrimental societal effects. Despite the recent progress in detecting disinformation and fake news, it is still non-trivial due to its complexity, diversity, multi-modality, and costs of fact-checking or annotation. The goal of this chapter is to pave the way for appreciating the challenges and advancements via: (1) introducing the types of information disorder on social media and examining their differences and connections; (2) describing important and emerging tasks to combat disinformation for characterization, detection and attribution; and (3) discussing a weak supervision approach to detect disinformation with limited labeled data. We then provide an overview of the chapters in this book that represent the recent advancements in three related parts: (1) user engagements in the dissemination of information disorder; (2) techniques on detecting and mitigating disinformation; and (3) trending issues such as ethics, blockchain, clickbaits, etc. We hope this book will be a convenient entry point for researchers, practitioners, and students to understand the problems and challenges, learn state-of-the-art solutions for their specific needs, and quickly identify new research problems in their domains.
Tasks
Published	2020-01-02
URL	https://arxiv.org/abs/2001.00623v1
PDF	https://arxiv.org/pdf/2001.00623v1.pdf
PWC	https://paperswithcode.com/paper/mining-disinformation-and-fake-news-concepts
Repo
Framework

PANDA: Prototypical Unsupervised Domain Adaptation


Title	PANDA: Prototypical Unsupervised Domain Adaptation
Authors	Dapeng Hu, Jian Liang, Qibin Hou, Hanshu Yan, Yunpeng Chen, Shuicheng Yan, Jiashi Feng
Abstract	Previous adversarial domain alignment methods for unsupervised domain adaptation (UDA) pursue conditional domain alignment via intermediate pseudo labels. However, these pseudo labels are generated by independent instances without considering the global data structure and tend to be noisy, making them unreliable for adversarial domain adaptation. Compared with pseudo labels, prototypes are more reliable to represent the data structure resistant to the domain shift since they are summarized over all the relevant instances. In this work, we attempt to calibrate the noisy pseudo labels with prototypes. Specifically, we first obtain a reliable prototypical representation for each instance by multiplying the soft instance predictions with the global prototypes. Based on the prototypical representation, we propose a novel Prototypical Adversarial Learning (PAL) scheme and exploit it to align both feature representations and intermediate prototypes across domains. Besides, with the intermediate prototypes as a proxy, we further minimize the intra-class variance in the target domain to adaptively improve the pseudo labels. Integrating the three objectives, we develop a unified framework termed PrototypicAl uNsupervised Domain Adaptation (PANDA) for UDA. Experiments show that PANDA achieves state-of-the-art or competitive results on multiple UDA benchmarks including both object recognition and semantic segmentation tasks.
Tasks	Domain Adaptation, Object Recognition, Semantic Segmentation, Unsupervised Domain Adaptation
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13274v1
PDF	https://arxiv.org/pdf/2003.13274v1.pdf
PWC	https://paperswithcode.com/paper/panda-prototypical-unsupervised-domain
Repo
Framework
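
The step of "multiplying the soft instance predictions with the global prototypes" can be sketched in a few lines. The abstract does not say how the prototypes themselves are computed, so the prediction-weighted class means used below are an assumption, as are the function names and the toy data.

```python
def class_prototypes(features, probs):
    """One prototype per class: the prediction-weighted mean of all features.

    features  list of N feature vectors (length D each)
    probs     list of N soft predictions (length K each, rows sum to 1)
    Returns a K x D list of lists. (Assumed construction, see lead-in.)
    """
    n_classes = len(probs[0])
    dim = len(features[0])
    protos = []
    for k in range(n_classes):
        weight = sum(p[k] for p in probs)
        total = [0.0] * dim
        for f, p in zip(features, probs):
            for d in range(dim):
                total[d] += p[k] * f[d]
        # Leave an all-zero prototype if no instance puts mass on class k.
        protos.append([v / weight for v in total] if weight > 0 else total)
    return protos


def prototypical_representation(prob, protos):
    """Soft prediction (length K) times the K x D prototype matrix."""
    dim = len(protos[0])
    return [sum(prob[k] * protos[k][d] for k in range(len(protos)))
            for d in range(dim)]


# Toy example: two confident instances define two prototypes.
features = [[0.0, 0.0], [2.0, 2.0]]
probs = [[1.0, 0.0], [0.0, 1.0]]
protos = class_prototypes(features, probs)
rep = prototypical_representation([0.5, 0.5], protos)
```

An instance whose prediction is split evenly between the two classes lands halfway between the two prototypes, so its representation depends on all instances that shaped the prototypes rather than on its own feature alone.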

Memory Aggregation Networks for Efficient Interactive Video Object Segmentation


Title	Memory Aggregation Networks for Efficient Interactive Video Object Segmentation
Authors	Jiaxu Miao, Yunchao Wei, Yi Yang
Abstract	Interactive video object segmentation (iVOS) aims at efficiently harvesting high-quality segmentation masks of the target object in a video with user interactions. Most previous state-of-the-art methods tackle iVOS with two independent networks for conducting user interaction and temporal propagation, respectively, leading to inefficiencies during the inference stage. In this work, we propose a unified framework, named Memory Aggregation Networks (MA-Net), to address the challenging iVOS in a more efficient way. Our MA-Net integrates the interaction and the propagation operations into a single network, which significantly promotes the efficiency of iVOS in the scheme of multi-round interactions. More importantly, we propose a simple yet effective memory aggregation mechanism to record the informative knowledge from the previous interaction rounds, greatly improving the robustness in discovering challenging objects of interest. We conduct extensive experiments on the validation set of the DAVIS Challenge 2018 benchmark. In particular, our MA-Net achieves a J@60 score of 76.1% without any bells and whistles, outperforming the state-of-the-art methods by more than 2.7%.
Tasks	Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13246v1
PDF	https://arxiv.org/pdf/2003.13246v1.pdf
PWC	https://paperswithcode.com/paper/memory-aggregation-networks-for-efficient
Repo
Framework

Methods to Recover Unknown Processes in Partial Differential Equations Using Data


Title	Methods to Recover Unknown Processes in Partial Differential Equations Using Data
Authors	Zhen Chen, Kailiang Wu, Dongbin Xiu
Abstract	We study the problem of identifying unknown processes embedded in a time-dependent partial differential equation (PDE) using observational data, with an application to advection-diffusion type PDEs. We first conduct theoretical analysis and derive conditions to ensure the solvability of the problem. We then present a set of numerical approaches, including a Galerkin type algorithm and a collocation type algorithm. Analysis of the algorithms is presented, along with their implementation details. The Galerkin algorithm is more suitable for practical situations, particularly those with noisy data, as it avoids using derivative/gradient data. Various numerical examples are then presented to demonstrate the performance and properties of the numerical methods.
Tasks
Published	2020-03-05
URL	https://arxiv.org/abs/2003.02387v1
PDF	https://arxiv.org/pdf/2003.02387v1.pdf
PWC	https://paperswithcode.com/paper/methods-to-recover-unknown-processes-in
Repo
Framework

Deep Learning in Multi-organ Segmentation


Title	Deep Learning in Multi-organ Segmentation
Authors	Yang Lei, Yabo Fu, Tonghe Wang, Richard L. J. Qiu, Walter J. Curran, Tian Liu, Xiaofeng Yang
Abstract	This paper presents a review of deep learning (DL) in multi-organ segmentation. We summarized the latest DL-based methods for medical image segmentation and applications. These methods were classified into six categories according to their network design. For each category, we listed the surveyed works, highlighted important contributions and identified specific challenges. Following the detailed review of each category, we briefly discussed its achievements, shortcomings and future potentials. We provided a comprehensive comparison among DL-based methods for thoracic and head & neck multi-organ segmentation using benchmark datasets, including the 2017 AAPM Thoracic Auto-segmentation Challenge datasets and 2015 MICCAI Head Neck Auto-Segmentation Challenge datasets.
Tasks	Medical Image Segmentation, Semantic Segmentation
Published	2020-01-28
URL	https://arxiv.org/abs/2001.10619v1
PDF	https://arxiv.org/pdf/2001.10619v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-in-multi-organ-segmentation
Repo
Framework