Paper Group ANR 364
On the Convex Behavior of Deep Neural Networks in Relation to the Layers’ Width. Genetic Programming for Evolving a Front of Interpretable Models for Data Visualisation. AutoFCL: Automatically Tuning Fully Connected Layers for Handling Small Dataset. Stochastic Online Optimization using Kalman Recursion. ADAMT: A Stochastic Optimization with Trend …
On the Convex Behavior of Deep Neural Networks in Relation to the Layers’ Width
Title | On the Convex Behavior of Deep Neural Networks in Relation to the Layers’ Width |
Authors | Etai Littwin, Lior Wolf |
Abstract | The Hessian of neural networks can be decomposed into a sum of two matrices: (i) the positive semidefinite generalized Gauss-Newton matrix G, and (ii) the matrix H containing negative eigenvalues. We observe that for wider networks, minimizing the loss with the gradient descent optimization maneuvers through surfaces of positive curvatures at the start and end of training, and close to zero curvatures in between. In other words, it seems that during crucial parts of the training process, the Hessian in wide networks is dominated by the component G. To explain this phenomenon, we show that when initialized using common methodologies, the gradients of over-parameterized networks are approximately orthogonal to H, such that the curvature of the loss surface is strictly positive in the direction of the gradient. |
Tasks | |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.04878v1 |
https://arxiv.org/pdf/2001.04878v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-convex-behavior-of-deep-neural |
Repo | |
Framework | |
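The decomposition described in the abstract above can be written out as a worked equation. This is the standard Gauss-Newton split for a loss $\ell$ applied to a network output $f(\theta)$. The symbols $\ell$, $f$, $J$, $g$ and $\theta$ are notation assumed here for illustration and are not taken from the paper.

```latex
\nabla_\theta^2\, \ell\big(f(\theta)\big)
  = \underbrace{J^\top \big(\nabla_f^2 \ell\big)\, J}_{G}
  \;+\; \underbrace{\sum_k \frac{\partial \ell}{\partial f_k}\, \nabla_\theta^2 f_k}_{H},
\qquad J = \frac{\partial f}{\partial \theta}.

% Curvature along the gradient direction g = \nabla_\theta \ell:
g^\top \big(G + H\big)\, g = g^\top G\, g + g^\top H\, g,
\qquad g^\top G\, g \ge 0 \ \text{ when } \ell \text{ is convex in } f.
```

Under this notation, if $g^\top H g \approx 0$, the curvature along the gradient is governed by the non-negative term $g^\top G g$, which is the behaviour the abstract reports for wide networks.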
Genetic Programming for Evolving a Front of Interpretable Models for Data Visualisation
Title | Genetic Programming for Evolving a Front of Interpretable Models for Data Visualisation |
Authors | Andrew Lensen, Bing Xue, Mengjie Zhang |
Abstract | Data visualisation is a key tool in data mining for understanding big datasets. Many visualisation methods have been proposed, including the well-regarded state-of-the-art method t-Distributed Stochastic Neighbour Embedding. However, the most powerful visualisation methods have a significant limitation: the manner in which they create their visualisation from the original features of the dataset is completely opaque. Many domains require an understanding of the data in terms of the original features; there is hence a need for powerful visualisation methods which use understandable models. In this work, we propose a genetic programming approach named GP-tSNE for evolving interpretable mappings from a dataset to high-quality visualisations. A multi-objective approach is designed that produces a variety of visualisations in a single run which give different trade-offs between visual quality and model complexity. Testing against baseline methods on a variety of datasets shows the clear potential of GP-tSNE to allow deeper insight into data than that provided by existing visualisation methods. We further highlight the benefits of a multi-objective approach through an in-depth analysis of a candidate front, which shows how multiple models can |
Tasks | |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.09578v1 |
https://arxiv.org/pdf/2001.09578v1.pdf | |
PWC | https://paperswithcode.com/paper/genetic-programming-for-evolving-a-front-of |
Repo | |
Framework | |
AutoFCL: Automatically Tuning Fully Connected Layers for Handling Small Dataset
Title | AutoFCL: Automatically Tuning Fully Connected Layers for Handling Small Dataset |
Authors | S. H. Shabbeer Basha, Sravan Kumar Vinakota, Shiv Ram Dubey, Viswanath Pulabaigari, Snehasis Mukherjee |
Abstract | Deep Convolutional Neural Networks (CNN) have evolved as popular machine learning models for image classification during the past few years, due to their ability to learn the problem-specific features directly from the input images. The success of deep learning models solicits architecture engineering rather than hand-engineering the features. However, designing a state-of-the-art CNN for a given task remains non-trivial and challenging, especially when training data is scarce. To address this problem, transfer learning has been widely adopted. While transferring the learned knowledge from one task to another, fine-tuning with the target-dependent Fully Connected (FC) layers generally produces better results over the target task. In this paper, the proposed AutoFCL model attempts to learn the structure of FC layers of a CNN automatically using Bayesian optimization. To evaluate the performance of the proposed AutoFCL, we utilize five pre-trained CNN models: VGG-16, ResNet, DenseNet, MobileNet, and NASNetMobile. The experiments are conducted on three benchmark datasets, namely CalTech-101, Oxford-102 Flowers, and UC Merced Land Use datasets. Fine-tuning the newly learned (target-dependent) FC layers leads to state-of-the-art performance, according to the experiments carried out in this research. The proposed AutoFCL method outperforms the existing methods over CalTech-101 and Oxford-102 Flowers datasets by achieving the accuracy of 94.38% and 98.89%, respectively. However, our method achieves comparable performance on the UC Merced Land Use dataset with 96.83% accuracy. |
Tasks | Image Classification, Transfer Learning |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.11951v3 |
https://arxiv.org/pdf/2001.11951v3.pdf | |
PWC | https://paperswithcode.com/paper/autofcl-automatically-tuning-fully-connected |
Repo | |
Framework | |
Stochastic Online Optimization using Kalman Recursion
Title | Stochastic Online Optimization using Kalman Recursion |
Authors | Joseph de Vilmarest, Olivier Wintenberger |
Abstract | We study the Extended Kalman Filter in constant dynamics, offering a Bayesian perspective of stochastic optimization. We obtain high probability bounds on the cumulative excess risk in an unconstrained setting. The unconstrained challenge is tackled through a two-phase analysis. First, for linear and logistic regressions, we prove that the algorithm enters a local phase where the estimate stays in a small region around the optimum. We provide explicit bounds with high probability on this convergence time. Second, for generalized linear regressions, we provide a martingale analysis of the excess risk in the local phase, improving existing ones in bounded stochastic optimization. The EKF appears as a parameter-free O(d^2) online algorithm that optimally solves some unconstrained optimization problems. |
Tasks | Stochastic Optimization |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03636v1 |
https://arxiv.org/pdf/2002.03636v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-online-optimization-using-kalman |
Repo | |
Framework | |
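To make the O(d^2) online recursion mentioned in the abstract above more concrete, here is a minimal sketch of a static Extended Kalman Filter step for logistic regression. It is the textbook EKF recursion for a constant state, not necessarily the exact form analysed in the paper. The function names, the identity initialisation of `P`, the label convention y in {0, 1}, and the synthetic data in `run_demo` are all assumptions made for illustration.

```python
import math
import random


def ekf_logistic_step(theta, P, x, y):
    """One static-EKF update for logistic regression with label y in {0, 1}.

    theta -- current parameter estimate (list of d floats)
    P     -- current d x d matrix (list of lists), assumed symmetric
    """
    d = len(theta)
    z = sum(t * xi for t, xi in zip(theta, x))
    z = max(-30.0, min(30.0, z))           # clamp to avoid overflow in exp
    p = 1.0 / (1.0 + math.exp(-z))         # predicted probability
    alpha = p * (1.0 - p)                  # local curvature of the logistic loss
    Px = [sum(P[i][j] * x[j] for j in range(d)) for i in range(d)]
    denom = 1.0 + alpha * sum(xi * pxi for xi, pxi in zip(x, Px))
    # Rank-one (Sherman-Morrison) update of P, costing O(d^2) per sample.
    P_new = [[P[i][j] - alpha * Px[i] * Px[j] / denom for j in range(d)]
             for i in range(d)]
    # Gain is P_new @ x; the estimate moves along it by the prediction error.
    gain = [sum(P_new[i][j] * x[j] for j in range(d)) for i in range(d)]
    theta_new = [t + k * (y - p) for t, k in zip(theta, gain)]
    return theta_new, P_new


def run_demo(n=3000, seed=0):
    """Stream synthetic samples from a hypothetical true parameter [1, -2]."""
    rng = random.Random(seed)
    true_theta = [1.0, -2.0]
    theta, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        z = sum(t * xi for t, xi in zip(true_theta, x))
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0
        theta, P = ekf_logistic_step(theta, P, x, y)
    return theta
```

No step size appears anywhere in the update, which is one way to read the abstract's "parameter-free" description; the only choice made here is the initial `P`.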
ADAMT: A Stochastic Optimization with Trend Correction Scheme
Title | ADAMT: A Stochastic Optimization with Trend Correction Scheme |
Authors | Bingxin Zhou, Xuebin Zheng, Junbin Gao |
Abstract | Adam-type optimizers, as a class of adaptive moment estimation methods with the exponential moving average scheme, have been successfully used in many applications of deep learning. Such methods are appealing for their capability on large-scale sparse datasets with high computational efficiency. In this paper, we present a new framework for adapting Adam-type methods, namely AdamT. Instead of applying a simple exponential weighted average, AdamT also includes the trend information when updating the parameters with the adaptive step size and gradients. The additional terms promise an efficient movement on the complex cost surface, and thus the loss would converge more rapidly. We show empirically the importance of adding the trend component, where AdamT consistently outperforms the vanilla Adam method with state-of-the-art models on several classical real-world datasets. |
Tasks | Stochastic Optimization |
Published | 2020-01-17 |
URL | https://arxiv.org/abs/2001.06130v1 |
https://arxiv.org/pdf/2001.06130v1.pdf | |
PWC | https://paperswithcode.com/paper/adamt-a-stochastic-optimization-with-trend-1 |
Repo | |
Framework | |
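The abstract above contrasts a simple exponential weighted average with one that also carries trend information, but it does not give the update rule. The sketch below is therefore only a generic illustration of trend-corrected smoothing (Holt's linear method) next to a plain exponential moving average. It is not the AdamT update itself, and the function names and the `beta` and `gamma` values are hypothetical.

```python
def ema(values, beta=0.9):
    """Plain exponential moving average, started at zero, no bias correction."""
    m, out = 0.0, []
    for g in values:
        m = beta * m + (1.0 - beta) * g
        out.append(m)
    return out


def holt_linear(values, beta=0.9, gamma=0.9):
    """Holt's linear method: a smoothed level plus a smoothed trend term."""
    level, trend = values[0], 0.0
    out = [level]
    for g in values[1:]:
        prev = level
        # The level extrapolates along the current trend before blending in g.
        level = beta * (prev + trend) + (1.0 - beta) * g
        # The trend is itself an exponential average of level increments.
        trend = gamma * trend + (1.0 - gamma) * (level - prev)
        out.append(level)
    return out


# A steadily drifting signal: the plain average lags behind, while the
# trend-corrected level catches up with the ramp.
ramp = [float(t) for t in range(200)]
lag_ema = abs(ramp[-1] - ema(ramp)[-1])
lag_holt = abs(ramp[-1] - holt_linear(ramp)[-1])
```

On a signal with a persistent drift, the plain average settles at a fixed lag behind the input, while the trend term removes most of that lag.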
A Survey on Knowledge Graph-Based Recommender Systems
Title | A Survey on Knowledge Graph-Based Recommender Systems |
Authors | Qingyu Guo, Fuzhen Zhuang, Chuan Qin, Hengshu Zhu, Xing Xie, Hui Xiong, Qing He |
Abstract | To solve the information explosion problem and enhance user experience in various online applications, recommender systems have been developed to model users' preferences. Although numerous efforts have been made toward more personalized recommendations, recommender systems still suffer from several challenges, such as data sparsity and cold start. In recent years, generating recommendations with the knowledge graph as side information has attracted considerable interest. Such an approach can not only alleviate the abovementioned issues for a more accurate recommendation, but also provide explanations for recommended items. In this paper, we conduct a systematic survey of knowledge graph-based recommender systems. We collect recently published papers in this field and summarize them from two perspectives. On the one hand, we investigate the proposed algorithms by focusing on how the papers utilize the knowledge graph for accurate and explainable recommendation. On the other hand, we introduce datasets used in these works. Finally, we propose several potential research directions in this field. |
Tasks | Recommendation Systems |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2003.00911v1 |
https://arxiv.org/pdf/2003.00911v1.pdf | |
PWC | https://paperswithcode.com/paper/a-survey-on-knowledge-graph-based-recommender |
Repo | |
Framework | |
Temporal-adaptive Hierarchical Reinforcement Learning
Title | Temporal-adaptive Hierarchical Reinforcement Learning |
Authors | Wen-Ji Zhou, Yang Yu |
Abstract | Hierarchical reinforcement learning (HRL) helps address large-scale and sparse reward issues in reinforcement learning. In HRL, the policy model has an inner representation structured in levels. With this structure, the reinforcement learning task is expected to be decomposed into corresponding levels with sub-tasks, and thus the learning can be more efficient. In HRL, although it is intuitive that a high-level policy only needs to make macro decisions at a low frequency, the exact frequency is hard to determine. Previous HRL approaches often employed a fixed-time skip strategy or learned a terminal condition without taking account of the context, which not only requires manual adjustments but also sacrifices some decision granularity. In this paper, we propose the temporal-adaptive hierarchical policy learning (TEMPLE) structure, which uses a temporal gate to adaptively control the high-level policy decision frequency. We train the TEMPLE structure with PPO and test its performance in a range of environments including 2-D rooms, Mujoco tasks, and Atari games. The results show that the TEMPLE structure can lead to improved performance in these environments with a sequential adaptive high-level control. |
Tasks | Atari Games, Hierarchical Reinforcement Learning |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02080v1 |
https://arxiv.org/pdf/2002.02080v1.pdf | |
PWC | https://paperswithcode.com/paper/temporal-adaptive-hierarchical-reinforcement |
Repo | |
Framework | |
Input Perturbation: A New Paradigm between Central and Local Differential Privacy
Title | Input Perturbation: A New Paradigm between Central and Local Differential Privacy |
Authors | Yilin Kang, Yong Liu, Ben Niu, Xinyi Tong, Likun Zhang, Weiping Wang |
Abstract | Traditionally, there are two models of differential privacy: the central model and the local model. The central model focuses on the machine learning model and the local model focuses on the training data. In this paper, we study the input perturbation method in differentially private empirical risk minimization (DP-ERM), preserving privacy of the central model. By adding noise to the original training data and training with the 'perturbed data', we achieve (ε, δ)-differential privacy on the final model, along with some kind of privacy on the original data. We observe that there is an interesting connection between the local model and the central model: the perturbation on the original data causes the perturbation on the gradient, and finally the model parameters. This observation means that our method builds a bridge between local and central model, protecting the data, the gradient and the model simultaneously, which is superior to previous central methods. Detailed theoretical analysis and experiments show that our method achieves almost the same (or even better) performance as some of the best previous central methods with more protections on privacy, which is an attractive result. Moreover, we extend our method to a more general case: the loss function satisfies the Polyak-Lojasiewicz condition, which is more general than strong convexity, the constraint on the loss function in most previous work. |
Tasks | |
Published | 2020-02-20 |
URL | https://arxiv.org/abs/2002.08570v1 |
https://arxiv.org/pdf/2002.08570v1.pdf | |
PWC | https://paperswithcode.com/paper/input-perturbation-a-new-paradigm-between |
Repo | |
Framework | |
Developing a Multilingual Annotated Corpus of Misogyny and Aggression
Title | Developing a Multilingual Annotated Corpus of Misogyny and Aggression |
Authors | Shiladitya Bhattacharya, Siddharth Singh, Ritesh Kumar, Akanksha Bansal, Akash Bhagat, Yogesh Dawer, Bornini Lahiri, Atul Kr. Ojha |
Abstract | In this paper, we discuss the development of a multilingual annotated corpus of misogyny and aggression in Indian English, Hindi, and Indian Bangla as part of a project on studying and automatically identifying misogyny and communalism on social media (the ComMA Project). The dataset is collected from comments on YouTube videos and currently contains a total of over 20,000 comments. The comments are annotated at two levels - aggression (overtly aggressive, covertly aggressive, and non-aggressive) and misogyny (gendered and non-gendered). We describe the process of data collection, the tagset used for annotation, and issues and challenges faced during the process of annotation. Finally, we discuss the results of the baseline experiments conducted to develop a classifier for misogyny in the three languages. |
Tasks | |
Published | 2020-03-16 |
URL | https://arxiv.org/abs/2003.07428v1 |
https://arxiv.org/pdf/2003.07428v1.pdf | |
PWC | https://paperswithcode.com/paper/developing-a-multilingual-annotated-corpus-of |
Repo | |
Framework | |
Novel Radiomic Feature for Survival Prediction of Lung Cancer Patients using Low-Dose CBCT Images
Title | Novel Radiomic Feature for Survival Prediction of Lung Cancer Patients using Low-Dose CBCT Images |
Authors | Bijju Kranthi Veduruparthi, Jayanta Mukherjee, Partha Pratim Das, Moses Arunsingh, Raj Kumar Shrimali, Sriram Prasath, Soumendranath Ray, Sanjay Chatterjee |
Abstract | Prediction of survivability in a patient for tumor progression is useful to estimate the effectiveness of a treatment protocol. In our work, we present a model to take into account the heterogeneous nature of a tumor to predict survival. The tumor heterogeneity is measured in terms of its mass by combining information regarding the radiodensity obtained in images with the gross tumor volume (GTV). We propose a novel feature called Tumor Mass within a GTV (TMG), that improves the prediction of survivability, compared to existing models which use GTV. Weekly variation in TMG of a patient is computed from the image data and also estimated from a cell survivability model. The parameters obtained from the cell survivability model are indicative of changes in TMG over the treatment period. We use these parameters along with other patient metadata to perform survival analysis and regression. Cox's Proportional Hazard survival regression was performed using these data. Significant improvement in the average concordance index from 0.47 to 0.64 was observed when TMG is used in the model instead of GTV. The experiments show that there is a difference in the treatment response in responsive and non-responsive patients and that the proposed method can be used to predict patient survivability. |
Tasks | Survival Analysis |
Published | 2020-03-07 |
URL | https://arxiv.org/abs/2003.03537v1 |
https://arxiv.org/pdf/2003.03537v1.pdf | |
PWC | https://paperswithcode.com/paper/novel-radiomic-feature-for-survival |
Repo | |
Framework | |
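The abstract above reports its result as a concordance index (0.47 versus 0.64). As a reading aid, here is a minimal sketch of how Harrell's concordance index is commonly computed for right-censored survival data. It is a simplified version that skips pairs with tied times, the function and argument names are hypothetical, and it is not claimed to match the paper's evaluation code.

```python
def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    times  -- observed follow-up time per patient
    events -- 1 if the event was observed, 0 if the patient was censored
    risks  -- predicted risk score (higher means shorter expected survival)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event strictly
            # before j's follow-up time (j may be censored or not).
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.3, 0.1])
reversed_ = concordance_index(times, events, [0.1, 0.3, 0.7, 0.9])
constant = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 0.5 corresponds to a score that carries no ordering information, so under this definition 0.47 is at chance level and 0.64 is above it.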
Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements
Title | Mining Disinformation and Fake News: Concepts, Methods, and Recent Advancements |
Authors | Kai Shu, Suhang Wang, Dongwon Lee, Huan Liu |
Abstract | In recent years, disinformation, including fake news, has become a global phenomenon due to its explosive growth, particularly on social media. The wide spread of disinformation and fake news can cause detrimental societal effects. Despite the recent progress in detecting disinformation and fake news, it is still non-trivial due to its complexity, diversity, multi-modality, and costs of fact-checking or annotation. The goal of this chapter is to pave the way for appreciating the challenges and advancements via: (1) introducing the types of information disorder on social media and examining their differences and connections; (2) describing important and emerging tasks to combat disinformation for characterization, detection and attribution; and (3) discussing a weak supervision approach to detect disinformation with limited labeled data. We then provide an overview of the chapters in this book that represent the recent advancements in three related parts: (1) user engagements in the dissemination of information disorder; (2) techniques on detecting and mitigating disinformation; and (3) trending issues such as ethics, blockchain, clickbaits, etc. We hope this book will be a convenient entry point for researchers, practitioners, and students to understand the problems and challenges, learn state-of-the-art solutions for their specific needs, and quickly identify new research problems in their domains. |
Tasks | |
Published | 2020-01-02 |
URL | https://arxiv.org/abs/2001.00623v1 |
https://arxiv.org/pdf/2001.00623v1.pdf | |
PWC | https://paperswithcode.com/paper/mining-disinformation-and-fake-news-concepts |
Repo | |
Framework | |
PANDA: Prototypical Unsupervised Domain Adaptation
Title | PANDA: Prototypical Unsupervised Domain Adaptation |
Authors | Dapeng Hu, Jian Liang, Qibin Hou, Hanshu Yan, Yunpeng Chen, Shuicheng Yan, Jiashi Feng |
Abstract | Previous adversarial domain alignment methods for unsupervised domain adaptation (UDA) pursue conditional domain alignment via intermediate pseudo labels. However, these pseudo labels are generated by independent instances without considering the global data structure and tend to be noisy, making them unreliable for adversarial domain adaptation. Compared with pseudo labels, prototypes are more reliable to represent the data structure resistant to the domain shift since they are summarized over all the relevant instances. In this work, we attempt to calibrate the noisy pseudo labels with prototypes. Specifically, we first obtain a reliable prototypical representation for each instance by multiplying the soft instance predictions with the global prototypes. Based on the prototypical representation, we propose a novel Prototypical Adversarial Learning (PAL) scheme and exploit it to align both feature representations and intermediate prototypes across domains. Besides, with the intermediate prototypes as a proxy, we further minimize the intra-class variance in the target domain to adaptively improve the pseudo labels. Integrating the three objectives, we develop a unified framework termed PrototypicAl uNsupervised Domain Adaptation (PANDA) for UDA. Experiments show that PANDA achieves state-of-the-art or competitive results on multiple UDA benchmarks including both object recognition and semantic segmentation tasks. |
Tasks | Domain Adaptation, Object Recognition, Semantic Segmentation, Unsupervised Domain Adaptation |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13274v1 |
https://arxiv.org/pdf/2003.13274v1.pdf | |
PWC | https://paperswithcode.com/paper/panda-prototypical-unsupervised-domain |
Repo | |
Framework | |
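The abstract above describes obtaining a prototypical representation "by multiplying the soft instance predictions with the global prototypes". A minimal sketch of that single step follows, assuming the soft prediction is a softmax over class logits and that there is one prototype vector per class. How the prototypes themselves are computed is not covered, and the function names and data layout are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prototypical_representation(logits, prototypes):
    """Weight each class prototype by the instance's soft prediction and sum.

    logits     -- one score per class for a single instance
    prototypes -- one feature vector per class (list of equal-length lists)
    """
    probs = softmax(logits)
    dim = len(prototypes[0])
    return [sum(p * proto[k] for p, proto in zip(probs, prototypes))
            for k in range(dim)]


# Two classes with hypothetical 2-D prototypes.
protos = [[1.0, 0.0], [0.0, 1.0]]
uncertain = prototypical_representation([0.0, 0.0], protos)
confident = prototypical_representation([100.0, 0.0], protos)
```

An instance with a confident prediction maps close to its class prototype, while an uncertain one lands between prototypes.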
Memory Aggregation Networks for Efficient Interactive Video Object Segmentation
Title | Memory Aggregation Networks for Efficient Interactive Video Object Segmentation |
Authors | Jiaxu Miao, Yunchao Wei, Yi Yang |
Abstract | Interactive video object segmentation (iVOS) aims at efficiently harvesting high-quality segmentation masks of the target object in a video with user interactions. Most previous state-of-the-art methods tackle iVOS with two independent networks for conducting user interaction and temporal propagation, respectively, leading to inefficiencies during the inference stage. In this work, we propose a unified framework, named Memory Aggregation Networks (MA-Net), to address the challenging iVOS in a more efficient way. Our MA-Net integrates the interaction and the propagation operations into a single network, which significantly promotes the efficiency of iVOS in the scheme of multi-round interactions. More importantly, we propose a simple yet effective memory aggregation mechanism to record the informative knowledge from the previous interaction rounds, greatly improving the robustness in discovering challenging objects of interest. We conduct extensive experiments on the validation set of the DAVIS Challenge 2018 benchmark. In particular, our MA-Net achieves a J@60 score of 76.1% without any bells and whistles, outperforming the state-of-the-art methods by more than 2.7%. |
Tasks | Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2020-03-30 |
URL | https://arxiv.org/abs/2003.13246v1 |
https://arxiv.org/pdf/2003.13246v1.pdf | |
PWC | https://paperswithcode.com/paper/memory-aggregation-networks-for-efficient |
Repo | |
Framework | |
Methods to Recover Unknown Processes in Partial Differential Equations Using Data
Title | Methods to Recover Unknown Processes in Partial Differential Equations Using Data |
Authors | Zhen Chen, Kailiang Wu, Dongbin Xiu |
Abstract | We study the problem of identifying unknown processes embedded in a time-dependent partial differential equation (PDE) using observational data, with an application to advection-diffusion type PDE. We first conduct theoretical analysis and derive conditions to ensure the solvability of the problem. We then present a set of numerical approaches, including a Galerkin type algorithm and a collocation type algorithm. Analysis of the algorithms is presented, along with their implementation details. The Galerkin algorithm is more suitable for practical situations, particularly those with noisy data, as it avoids using derivative/gradient data. Various numerical examples are then presented to demonstrate the performance and properties of the numerical methods. |
Tasks | |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02387v1 |
https://arxiv.org/pdf/2003.02387v1.pdf | |
PWC | https://paperswithcode.com/paper/methods-to-recover-unknown-processes-in |
Repo | |
Framework | |
Deep Learning in Multi-organ Segmentation
Title | Deep Learning in Multi-organ Segmentation |
Authors | Yang Lei, Yabo Fu, Tonghe Wang, Richard L. J. Qiu, Walter J. Curran, Tian Liu, Xiaofeng Yang |
Abstract | This paper presents a review of deep learning (DL) in multi-organ segmentation. We summarized the latest DL-based methods for medical image segmentation and applications. These methods were classified into six categories according to their network design. For each category, we listed the surveyed works, highlighted important contributions and identified specific challenges. Following the detailed review of each category, we briefly discussed its achievements, shortcomings and future potential. We provided a comprehensive comparison among DL-based methods for thoracic and head & neck multi-organ segmentation using benchmark datasets, including the 2017 AAPM Thoracic Auto-segmentation Challenge datasets and 2015 MICCAI Head Neck Auto-Segmentation Challenge datasets. |
Tasks | Medical Image Segmentation, Semantic Segmentation |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10619v1 |
https://arxiv.org/pdf/2001.10619v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-in-multi-organ-segmentation |
Repo | |
Framework | |