April 2, 2020


Paper Group ANR 357

Exploring the Efficacy of Transfer Learning in Mining Image-Based Software Artifacts. The Mathematical Structure of Integrated Information Theory. How neural networks find generalizable solutions: Self-tuned annealing in deep learning. Reasoning About Generalization via Conditional Mutual Information. On Approximation Capabilities of ReLU Activatio …

Exploring the Efficacy of Transfer Learning in Mining Image-Based Software Artifacts


Title	Exploring the Efficacy of Transfer Learning in Mining Image-Based Software Artifacts
Authors	Natalie Best, Jordan Ott, Erik Linstead
Abstract	Transfer learning allows us to train deep architectures requiring a large number of learned parameters, even if the amount of available data is limited, by leveraging existing models previously trained for another task. Here we explore the applicability of transfer learning utilizing models pre-trained on non-software engineering data applied to the problem of classifying software UML diagrams. Our experimental results show training reacts positively to transfer learning as related to sample size, even though the pre-trained model was not exposed to training instances from the software domain. We contrast the transferred network with other networks to show its advantage on different sized training sets, which indicates that transfer learning is as effective as custom deep architectures when large amounts of training data are not available.
Tasks	Transfer Learning
Published	2020-03-03
URL	https://arxiv.org/abs/2003.01627v1
PDF	https://arxiv.org/pdf/2003.01627v1.pdf
PWC	https://paperswithcode.com/paper/exploring-the-efficacy-of-transfer-learning
Repo
Framework

The Mathematical Structure of Integrated Information Theory


Title	The Mathematical Structure of Integrated Information Theory
Authors	Johannes Kleiner, Sean Tull
Abstract	Integrated Information Theory is one of the leading models of consciousness. It aims to describe both the quality and quantity of the conscious experience of a physical system, such as the brain, in a particular state. In this contribution, we propound the mathematical structure of the theory, separating the essentials from auxiliary formal tools. We provide a definition of a generalized IIT which has IIT 3.0 of Tononi et al., as well as the Quantum IIT introduced by Zanardi et al., as special cases. This provides an axiomatic definition of the theory which may serve as the starting point for future formal investigations and as an introduction suitable for researchers with a formal background.
Tasks
Published	2020-02-18
URL	https://arxiv.org/abs/2002.07655v1
PDF	https://arxiv.org/pdf/2002.07655v1.pdf
PWC	https://paperswithcode.com/paper/the-mathematical-structure-of-integrated
Repo
Framework

How neural networks find generalizable solutions: Self-tuned annealing in deep learning


Title	How neural networks find generalizable solutions: Self-tuned annealing in deep learning
Authors	Yu Feng, Yuhai Tu
Abstract	Despite the tremendous success of the Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions in the high-dimensional weight space. By analyzing the learning dynamics and loss function landscape, we discover a robust inverse relation between the weight variance and the landscape flatness (inverse of curvature) for all SGD-based learning algorithms. To explain the inverse variance-flatness relation, we develop a random landscape theory, which shows that the SGD noise strength (effective temperature) depends inversely on the landscape flatness. Our study indicates that SGD attains a self-tuned landscape-dependent annealing strategy to find generalizable solutions at the flat minima of the landscape. Finally, we demonstrate how these new theoretical insights lead to more efficient algorithms, e.g., for avoiding catastrophic forgetting.
Tasks
Published	2020-01-06
URL	https://arxiv.org/abs/2001.01678v1
PDF	https://arxiv.org/pdf/2001.01678v1.pdf
PWC	https://paperswithcode.com/paper/how-neural-networks-find-generalizable
Repo
Framework

Reasoning About Generalization via Conditional Mutual Information


Title	Reasoning About Generalization via Conditional Mutual Information
Authors	Thomas Steinke, Lydia Zakynthinou
Abstract	We provide an information-theoretic framework for studying the generalization properties of machine learning algorithms. Our framework ties together existing approaches, including uniform convergence bounds and recent methods for adaptive data analysis. Specifically, we use Conditional Mutual Information (CMI) to quantify how well the input (i.e., the training data) can be recognized given the output (i.e., the trained model) of the learning algorithm. We show that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods. We then show that bounded CMI implies various forms of generalization.
Tasks
Published	2020-01-24
URL	https://arxiv.org/abs/2001.09122v2
PDF	https://arxiv.org/pdf/2001.09122v2.pdf
PWC	https://paperswithcode.com/paper/reasoning-about-generalization-via
Repo
Framework
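
For readers who want the central quantity written out, the following is a paraphrased sketch of how CMI is usually set up for an algorithm $A$ and a data distribution $\mathcal{D}$. The symbols ($\tilde{Z}$, $S$, $\tilde{Z}_S$) are this note's choice and should be checked against the paper before being relied on.

```latex
% Paraphrased setup; the notation is an assumption, not quoted from the paper.
% \tilde{Z}: a "supersample" of n pairs of data points drawn i.i.d. from D.
% S: n independent fair bits selecting one point from each pair.
\[
\tilde{Z} \in \mathcal{Z}^{n \times 2}, \qquad
S \sim \mathrm{Uniform}\bigl(\{0,1\}^n\bigr), \qquad
(\tilde{Z}_S)_i = \tilde{Z}_{i,\,S_i+1},
\]
% CMI measures how much the output reveals about which points were selected,
% given the full supersample.
\[
\mathrm{CMI}_{\mathcal{D}}(A) \;=\; I\bigl(A(\tilde{Z}_S);\, S \,\bigm|\, \tilde{Z}\bigr),
\qquad
0 \;\le\; \mathrm{CMI}_{\mathcal{D}}(A) \;\le\; n \log 2 .
\]
```

The upper bound follows because $S$ carries only $n$ bits of entropy, which is what makes the quantity finite even when the ordinary mutual information between input and output is not.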

On Approximation Capabilities of ReLU Activation and Softmax Output Layer in Neural Networks


Title	On Approximation Capabilities of ReLU Activation and Softmax Output Layer in Neural Networks
Authors	Behnam Asadi, Hui Jiang
Abstract	In this paper, we have extended the well-established universal approximator theory to neural networks that use the unbounded ReLU activation function and a nonlinear softmax output layer. We have proved that a sufficiently large neural network using the ReLU activation function can approximate any function in $L^1$ up to any arbitrary precision. Moreover, our theoretical results have shown that a large enough neural network using a nonlinear softmax output layer can also approximate any indicator function in $L^1$, which is equivalent to mutually-exclusive class labels in any realistic multiple-class pattern classification problems. To the best of our knowledge, this work is the first theoretical justification for using the softmax output layers in neural networks for pattern classification.
Tasks
Published	2020-02-10
URL	https://arxiv.org/abs/2002.04060v1
PDF	https://arxiv.org/pdf/2002.04060v1.pdf
PWC	https://paperswithcode.com/paper/on-approximation-capabilities-of-relu
Repo
Framework
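
The link between a softmax output and an indicator (one-hot) target can be illustrated numerically. The sketch below is not the paper's proof; it only shows, with made-up logits, that scaling the logits drives the softmax output toward the indicator of the largest entry. The helper name `scaled_softmax` is hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_softmax(logits, scale):
    """Softmax of the logits multiplied by a positive scale (hypothetical helper)."""
    return softmax([scale * v for v in logits])


# Illustrative logits only; class 0 has the largest value.
logits = [2.0, 1.0, 0.5]
for scale in (1, 10, 100):
    probs = scaled_softmax(logits, scale)
    print(scale, [round(p, 4) for p in probs])
```

As the scale grows, the probability mass concentrates on class 0, which is the sense in which a softmax layer can get arbitrarily close to a mutually exclusive class label.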

Towards Automatic Bayesian Optimization: A first step involving acquisition functions


Title	Towards Automatic Bayesian Optimization: A first step involving acquisition functions
Authors	Eduardo C. Garrido Merchán, Luis C. Jariego Pérez
Abstract	Bayesian Optimization is the state-of-the-art technique for the optimization of black boxes, i.e., functions for which we have access to neither the analytical expression nor its gradients, which are expensive to evaluate, and whose evaluation is noisy. The most popular application of Bayesian optimization is the automatic hyperparameter tuning of machine learning algorithms, where we obtain the best configuration of machine learning algorithms by optimizing the estimation of the generalization error of these algorithms. Despite being applied with success, Bayesian optimization methodologies also have hyperparameters that need to be configured, such as the probabilistic surrogate model or the acquisition function used. A bad decision over the configuration of these hyperparameters implies obtaining bad quality results. Typically, these hyperparameters are tuned by making assumptions about the objective function that we want to evaluate, but there are scenarios where we do not have any prior information about the objective function. In this paper, we propose a first attempt at automatic Bayesian optimization by exploring several heuristics that automatically tune the acquisition function of Bayesian optimization. We illustrate the effectiveness of these heuristics in a set of benchmark problems and a hyperparameter tuning problem of a machine learning algorithm.
Tasks
Published	2020-03-21
URL	https://arxiv.org/abs/2003.09643v1
PDF	https://arxiv.org/pdf/2003.09643v1.pdf
PWC	https://paperswithcode.com/paper/towards-automatic-bayesian-optimization-a
Repo
Framework
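
To make "acquisition function" concrete, here is a minimal sketch of Expected Improvement (EI), one widely used acquisition function for minimization. It is not claimed to be one of the paper's heuristics; the candidate names and posterior values below are invented for illustration.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best_so_far):
    """EI for minimization, given a Gaussian surrogate posterior N(mean, std^2)."""
    if std <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(best_so_far - mean, 0.0)
    z = (best_so_far - mean) / std
    return (best_so_far - mean) * normal_cdf(z) + std * normal_pdf(z)


# Hypothetical surrogate posteriors (mean, std) at three candidate points.
candidates = {"x1": (0.9, 0.05), "x2": (1.2, 0.6), "x3": (1.0, 0.0)}
best_so_far = 1.0

scores = {name: expected_improvement(m, s, best_so_far)
          for name, (m, s) in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

In this toy setting the uncertain candidate `x2` scores highest even though its predicted mean is worse than `x1`, which shows how the choice of acquisition function shapes the exploration and exploitation trade-off.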


Explainable Agents Through Social Cues: A Review


Title	Explainable Agents Through Social Cues: A Review
Authors	Sebastian Wallkotter, Silvia Tulli, Ginevra Castellano, Ana Paiva, Mohamed Chetouani
Abstract	How to provide explanations has experienced a surge of interest in Human-Robot Interaction (HRI) over the last three years. In HRI this is known as explainability, expressivity, transparency or sometimes legibility, and the challenge for embodied agents is that they offer a unique array of modalities to communicate this information thanks to their embodiment. Responding to this surge of interest, we review the existing literature in explainability and organize it by (1) providing an overview of existing definitions, (2) showing how explainability is implemented and how it exploits different modalities, and (3) showing how the impact of explainability is measured. Additionally, we present a list of open questions and challenges that highlight areas that require further investigation by the community. This provides the interested scholar with an overview of the current state-of-the-art.
Tasks
Published	2020-03-11
URL	https://arxiv.org/abs/2003.05251v1
PDF	https://arxiv.org/pdf/2003.05251v1.pdf
PWC	https://paperswithcode.com/paper/explainable-agents-through-social-cues-a
Repo
Framework

Random smooth gray value transformations for cross modality learning with gray value invariant networks


Title	Random smooth gray value transformations for cross modality learning with gray value invariant networks
Authors	Nikolas Lessmann, Bram van Ginneken
Abstract	Random transformations are commonly used for augmentation of the training data with the goal of reducing the uniformity of the training samples. These transformations normally aim at variations that can be expected in images from the same modality. Here, we propose a simple method for transforming the gray values of an image with the goal of reducing cross modality differences. This approach enables segmentation of the lumbar vertebral bodies in CT images using a network trained exclusively with MR images. The source code is made available at https://github.com/nlessmann/rsgt
Tasks
Published	2020-03-13
URL	https://arxiv.org/abs/2003.06158v1
PDF	https://arxiv.org/pdf/2003.06158v1.pdf
PWC	https://paperswithcode.com/paper/random-smooth-gray-value-transformations-for
Repo
Framework
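
The idea of a random smooth gray value transformation can be sketched as follows. This is an independent illustration and is not derived from the authors' code (which is linked in the abstract): it assumes gray values normalized to [0, 1], draws random target values at a few evenly spaced control points, and interpolates between them with a cosine blend so the mapping has no kinks. The function name and the `num_control_points` parameter are hypothetical.

```python
import math
import random


def random_smooth_transform(num_control_points=5, rng=None):
    """Return a random smooth mapping from [0, 1] to [0, 1] (illustrative only)."""
    if num_control_points < 2:
        raise ValueError("need at least two control points")
    rng = rng or random.Random()
    # Random output gray values at evenly spaced input gray values.
    values = [rng.random() for _ in range(num_control_points)]

    def transform(g):
        g = min(max(g, 0.0), 1.0)              # clamp to the valid range
        pos = g * (num_control_points - 1)     # position in control-point units
        i = min(int(pos), num_control_points - 2)
        t = pos - i
        w = (1.0 - math.cos(math.pi * t)) / 2.0  # cosine blend, zero slope at both ends
        return (1.0 - w) * values[i] + w * values[i + 1]

    return transform


# Apply one random transform to a tiny made-up "image" of normalized gray values.
transform = random_smooth_transform(rng=random.Random(0))
image = [[0.0, 0.25], [0.5, 1.0]]
augmented = [[transform(g) for g in row] for row in image]
print(augmented)
```

Because the mapping is not required to be monotonic, a new transform drawn per training sample can invert or reorder intensity ranges, which is the kind of variation a network would need to tolerate to become less dependent on absolute gray values.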

REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums


Title	REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums
Authors	Joobin Gharibshah, Evangelos E. Papalexakis, Michalis Faloutsos
Abstract	How can we extract useful information from a security forum? We focus on identifying threads of interest to a security professional: (a) alerts of worrisome events, such as attacks, (b) offering of malicious services and products, (c) hacking information to perform malicious acts, and (d) useful security-related experiences. The analysis of security forums is in its infancy despite several promising recent works. Novel approaches are needed to address the challenges in this domain: (a) the difficulty in specifying the “topics” of interest efficiently, and (b) the unstructured and informal nature of the text. We propose REST, a systematic methodology to: (a) identify threads of interest based on a possibly incomplete bag of words, and (b) classify them into one of the four classes above. The key novelty of the work is a multi-step weighted embedding approach: we project words, threads and classes in appropriate embedding spaces and establish relevance and similarity there. We evaluate our method with real data from three security forums with a total of 164K posts and 21K threads. First, REST is robust to the initial keyword selection: it can extend the user-provided keyword set and thus recover from missing keywords. Second, REST categorizes the threads into the classes of interest with superior accuracy compared to five other methods: REST exhibits an accuracy between 63.3% and 76.9%. We see our approach as a first step for harnessing the wealth of information of online forums in a user-friendly way, since the user can loosely specify her keywords of interest.
Tasks
Published	2020-01-08
URL	https://arxiv.org/abs/2001.02660v2
PDF	https://arxiv.org/pdf/2001.02660v2.pdf
PWC	https://paperswithcode.com/paper/rest-a-thread-embedding-approach-for
Repo
Framework

Emotions Don’t Lie: A Deepfake Detection Method using Audio-Visual Affective Cues


Title	Emotions Don’t Lie: A Deepfake Detection Method using Audio-Visual Affective Cues
Authors	Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha
Abstract	We present a learning-based multimodal method for detecting real and deepfake videos. To maximize information for learning, we extract and analyze the similarity between the two audio and visual modalities from within the same video. Additionally, we extract and compare affective cues corresponding to emotion from the two modalities within a video to infer whether the input video is “real” or “fake”. We propose a deep learning network, inspired by the Siamese network architecture and the triplet loss. To validate our model, we report the AUC metric on two large-scale, audio-visual deepfake detection datasets, DeepFake-TIMIT Dataset and DFDC. We compare our approach with several SOTA deepfake detection methods and report per-video AUC of 84.4% on the DFDC and 96.6% on the DF-TIMIT datasets, respectively.
Tasks	DeepFake Detection, Face Swapping
Published	2020-03-14
URL	https://arxiv.org/abs/2003.06711v2
PDF	https://arxiv.org/pdf/2003.06711v2.pdf
PWC	https://paperswithcode.com/paper/emotions-dont-lie-a-deepfake-detection-method
Repo
Framework
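
The triplet loss mentioned in the abstract can be sketched in a few lines. This is the generic textbook form with Euclidean distance, not the paper's exact objective; some formulations use squared distances, and the margin value here is an arbitrary assumption.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull the positive closer than the negative by a margin."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Toy 2-D embeddings, invented for illustration.
anchor = [0.0, 0.0]
close = [0.0, 1.0]
far = [3.0, 4.0]

print(triplet_loss(anchor, close, far))  # well-separated triplet
print(triplet_loss(anchor, far, close))  # violated triplet
```

The loss is zero once the positive sits closer to the anchor than the negative by at least the margin, so training effort concentrates on triplets that still violate that ordering.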

Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests


Title	Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests
Authors	Xiao Xu, Fang Dong, Yanghua Li, Shaojian He, Xin Li
Abstract	A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to characterize the phenomenon that users’ preferences towards different items vary differently over time. In the disjoint payoff model, the reward of playing an arm is determined by an arm-specific preference vector, which is piecewise-stationary with asynchronous and distinct changes across different arms. An efficient learning algorithm that is adaptive to abrupt reward changes is proposed and theoretical regret analysis is provided to show that a sublinear scaling of regret in the time length $T$ is achieved. The algorithm is further extended to a more general setting with hybrid payoffs where the reward of playing an arm is determined by both an arm-specific preference vector and a joint coefficient vector shared by all arms. Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings.
Tasks	Recommendation Systems
Published	2020-02-29
URL	https://arxiv.org/abs/2003.00359v1
PDF	https://arxiv.org/pdf/2003.00359v1.pdf
PWC	https://paperswithcode.com/paper/contextual-bandit-based-personalized
Repo
Framework
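
The two payoff models can be written out as below. The linear form and the symbols follow the common contextual-bandit convention (context features $x_{t,a}$ and $z_{t,a}$, arm-specific vector $\theta_{a,t}$, shared vector $\beta$) and are an assumption of this note; the paper's own notation may differ.

```latex
% Disjoint payoff: each arm a has its own preference vector, which is
% piecewise-stationary and may change at different times for different arms.
\[
\mathbb{E}\left[ r_{t,a} \mid x_{t,a} \right] = x_{t,a}^{\top} \theta_{a,t},
\qquad \theta_{a,t} \ \text{piecewise constant in } t .
\]
% Hybrid payoff: a coefficient vector shared by all arms is added.
\[
\mathbb{E}\left[ r_{t,a} \mid x_{t,a}, z_{t,a} \right]
  = z_{t,a}^{\top} \beta + x_{t,a}^{\top} \theta_{a,t} .
\]
% Sublinear regret in the time length T means the average regret vanishes.
\[
\lim_{T \to \infty} \frac{R(T)}{T} = 0 .
\]
```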

Understanding Generalization in Deep Learning via Tensor Methods


Title	Understanding Generalization in Deep Learning via Tensor Methods
Authors	Jingling Li, Yanchao Sun, Jiahao Su, Taiji Suzuki, Furong Huang
Abstract	Deep neural networks generalize well on unseen data though the number of parameters often far exceeds the number of training examples. Recently proposed complexity measures have provided insights to understanding the generalizability in neural networks from perspectives of PAC-Bayes, robustness, overparametrization, compression and so on. In this work, we advance the understanding of the relations between the network’s architecture and its generalizability from the compression perspective. Using tensor analysis, we propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks; thus, in practice, our generalization bound outperforms the previous compression-based ones, especially for neural networks using tensors as their weight kernels (e.g. CNNs). Moreover, these intuitive measurements provide further insights into designing neural network architectures with properties favorable for better/guaranteed generalizability. Our experimental results demonstrate that through the proposed measurable properties, our generalization error bound matches the trend of the test error well. Our theoretical analysis further provides justifications for the empirical success and limitations of some widely-used tensor-based compression approaches. We also discover the improvements to the compressibility and robustness of current neural networks when incorporating tensor operations via our proposed layer-wise structure.
Tasks
Published	2020-01-14
URL	https://arxiv.org/abs/2001.05070v1
PDF	https://arxiv.org/pdf/2001.05070v1.pdf
PWC	https://paperswithcode.com/paper/understanding-generalization-in-deep-learning
Repo
Framework

MapLUR: Exploring a new Paradigm for Estimating Air Pollution using Deep Learning on Map Images


Title	MapLUR: Exploring a new Paradigm for Estimating Air Pollution using Deep Learning on Map Images
Authors	Michael Steininger, Konstantin Kobs, Albin Zehe, Florian Lautenschlager, Martin Becker, Andreas Hotho
Abstract	Land-use regression (LUR) models are important for the assessment of air pollution concentrations in areas without measurement stations. While many such models exist, they often use manually constructed features based on restricted, locally available data. Thus, they are typically hard to reproduce and challenging to adapt to areas beyond those they have been developed for. In this paper, we advocate a paradigm shift for LUR models: We propose the Data-driven, Open, Global (DOG) paradigm that entails models based on purely data-driven approaches using only openly and globally available data. Progress within this paradigm will alleviate the need for experts to adapt models to the local characteristics of the available data sources and thus facilitate the generalizability of air pollution models to new areas on a global scale. In order to illustrate the feasibility of the DOG paradigm for LUR, we introduce a deep learning model called MapLUR. It is based on a convolutional neural network architecture and is trained exclusively on globally and openly available map data without requiring manual feature engineering. We compare our model to state-of-the-art baselines like linear regression, random forests and multi-layer perceptrons using a large data set of modeled $\text{NO}_2$ concentrations in Central London. Our results show that MapLUR significantly outperforms these approaches even though they are provided with manually tailored features. Furthermore, we illustrate that the automatic feature extraction inherent to models based on the DOG paradigm can learn features that are readily interpretable and closely resemble those commonly used in traditional LUR approaches.
Tasks	Feature Engineering
Published	2020-02-18
URL	https://arxiv.org/abs/2002.07493v1
PDF	https://arxiv.org/pdf/2002.07493v1.pdf
PWC	https://paperswithcode.com/paper/maplur-exploring-a-new-paradigm-for
Repo
Framework

OccuSeg: Occupancy-aware 3D Instance Segmentation


Title	OccuSeg: Occupancy-aware 3D Instance Segmentation
Authors	Lei Han, Tian Zheng, Lan Xu, Lu Fang
Abstract	3D instance segmentation, with a variety of applications in robotics and augmented reality, is in great demand these days. Unlike 2D images that are projective observations of the environment, 3D models provide metric reconstruction of the scenes without occlusion or scale ambiguity. In this paper, we define “3D occupancy size” as the number of voxels occupied by each instance. It has the advantage of robustness in prediction, on which basis OccuSeg, an occupancy-aware 3D instance segmentation scheme, is proposed. Our multi-task learning produces both occupancy signal and embedding representations, where the training of spatial and feature embeddings varies with their difference in scale-awareness. Our clustering scheme benefits from the reliable comparison between the predicted occupancy size and the clustered occupancy size, which encourages hard samples to be correctly clustered and avoids over-segmentation. The proposed approach achieves state-of-the-art performance on 3 real-world datasets, i.e. ScanNetV2, S3DIS and SceneNN, while maintaining high efficiency.
Tasks	3D Instance Segmentation, Instance Segmentation, Multi-Task Learning, Semantic Segmentation
Published	2020-03-14
URL	https://arxiv.org/abs/2003.06537v1
PDF	https://arxiv.org/pdf/2003.06537v1.pdf
PWC	https://paperswithcode.com/paper/occuseg-occupancy-aware-3d-instance
Repo
Framework
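
The "3D occupancy size" defined in the abstract, the number of voxels occupied by each instance, can be computed from a labeled point cloud as sketched below. The function name, the voxel size, and the toy points are assumptions for illustration; this is not the OccuSeg pipeline itself.

```python
import math
from collections import Counter


def occupancy_sizes(points, instance_ids, voxel_size):
    """Count the distinct voxels occupied by each instance."""
    occupied = set()
    for (x, y, z), inst in zip(points, instance_ids):
        voxel = (math.floor(x / voxel_size),
                 math.floor(y / voxel_size),
                 math.floor(z / voxel_size))
        occupied.add((inst, voxel))  # the set removes repeated hits on one voxel
    return Counter(inst for inst, _ in occupied)


# Toy point cloud: three points for instance 1, one point for instance 2.
points = [(0.01, 0.01, 0.01), (0.02, 0.02, 0.02), (0.11, 0.0, 0.0), (0.5, 0.5, 0.5)]
instance_ids = [1, 1, 1, 2]

sizes = occupancy_sizes(points, instance_ids, voxel_size=0.1)
print(dict(sizes))
```

Two of the points in instance 1 fall into the same voxel, so the occupancy size counts voxels rather than points, which makes it independent of sampling density.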

Bi-Directional Attention for Joint Instance and Semantic Segmentation in Point Clouds


Title	Bi-Directional Attention for Joint Instance and Semantic Segmentation in Point Clouds
Authors	Guangnan Wu, Zhiyi Pan, Peng Jiang, Changhe Tu
Abstract	Instance segmentation in point clouds is one of the most fine-grained ways to understand the 3D scene. Due to its close relationship to semantic segmentation, many works approach these two tasks simultaneously and leverage the benefits of multi-task learning. However, most of them only considered simple strategies such as element-wise feature fusion, which may not lead to mutual promotion. In this work, we build a Bi-Directional Attention module on backbone neural networks for 3D point cloud perception, which uses a similarity matrix measured from features for one task to help aggregate non-local information for the other task, avoiding the potential feature exclusion and task conflict. From comprehensive experiments and ablation studies on the S3DIS dataset and the PartNet dataset, the superiority of our method is verified. Moreover, the mechanism of how the bi-directional attention module helps joint instance and semantic segmentation is also analyzed.
Tasks	Instance Segmentation, Multi-Task Learning, Semantic Segmentation
Published	2020-03-11
URL	https://arxiv.org/abs/2003.05420v1
PDF	https://arxiv.org/pdf/2003.05420v1.pdf
PWC	https://paperswithcode.com/paper/bi-directional-attention-for-joint-instance
Repo
Framework
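
The cross-task aggregation described in the abstract can be sketched roughly as follows: a similarity matrix computed from one task's per-point features supplies the weights for averaging the other task's features, and the same is done in the opposite direction. Dot-product similarity, row-wise softmax, and all function names here are assumptions; the paper's module may be built differently.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(guide_feats, value_feats):
    """Aggregate value_feats using similarities measured on guide_feats."""
    out = []
    dim = len(value_feats[0])
    for gi in guide_feats:
        # Similarity of point i to every point, measured in the guiding task's space.
        sims = [sum(a * b for a, b in zip(gi, gj)) for gj in guide_feats]
        weights = softmax(sims)
        out.append([sum(w * v[d] for w, v in zip(weights, value_feats))
                    for d in range(dim)])
    return out


def bidirectional_attention(semantic_feats, instance_feats):
    """Each task's similarity matrix aggregates the other task's features."""
    new_instance = attend(semantic_feats, instance_feats)
    new_semantic = attend(instance_feats, semantic_feats)
    return new_semantic, new_instance


# Two points with identical semantic features but different instance features.
semantic = [[1.0, 0.0], [1.0, 0.0]]
instance = [[2.0], [4.0]]
new_semantic, new_instance = bidirectional_attention(semantic, instance)
print(new_semantic, new_instance)
```

Because the two points are semantically identical, their instance features are averaged equally, which is the non-local sharing of information that the similarity matrix is meant to provide.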