Paper Group ANR 357
Exploring the Efficacy of Transfer Learning in Mining Image-Based Software Artifacts
Title | Exploring the Efficacy of Transfer Learning in Mining Image-Based Software Artifacts |
Authors | Natalie Best, Jordan Ott, Erik Linstead |
Abstract | Transfer learning allows us to train deep architectures requiring a large number of learned parameters, even if the amount of available data is limited, by leveraging existing models previously trained for another task. Here we explore the applicability of transfer learning utilizing models pre-trained on non-software engineering data applied to the problem of classifying software UML diagrams. Our experimental results show training reacts positively to transfer learning as related to sample size, even though the pre-trained model was not exposed to training instances from the software domain. We contrast the transferred network with other networks to show its advantage on different-sized training sets, which indicates that transfer learning is as effective as custom deep architectures when large amounts of training data are not available. |
Tasks | Transfer Learning |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01627v1 |
https://arxiv.org/pdf/2003.01627v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-the-efficacy-of-transfer-learning |
Repo | |
Framework | |
The Mathematical Structure of Integrated Information Theory
Title | The Mathematical Structure of Integrated Information Theory |
Authors | Johannes Kleiner, Sean Tull |
Abstract | Integrated Information Theory is one of the leading models of consciousness. It aims to describe both the quality and quantity of the conscious experience of a physical system, such as the brain, in a particular state. In this contribution, we propound the mathematical structure of the theory, separating the essentials from auxiliary formal tools. We provide a definition of a generalized IIT which has IIT 3.0 of Tononi et al., as well as the Quantum IIT introduced by Zanardi et al., as special cases. This provides an axiomatic definition of the theory which may serve as the starting point for future formal investigations and as an introduction suitable for researchers with a formal background. |
Tasks | |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07655v1 |
https://arxiv.org/pdf/2002.07655v1.pdf | |
PWC | https://paperswithcode.com/paper/the-mathematical-structure-of-integrated |
Repo | |
Framework | |
How neural networks find generalizable solutions: Self-tuned annealing in deep learning
Title | How neural networks find generalizable solutions: Self-tuned annealing in deep learning |
Authors | Yu Feng, Yuhai Tu |
Abstract | Despite the tremendous success of the Stochastic Gradient Descent (SGD) algorithm in deep learning, little is known about how SGD finds generalizable solutions in the high-dimensional weight space. By analyzing the learning dynamics and loss function landscape, we discover a robust inverse relation between the weight variance and the landscape flatness (inverse of curvature) for all SGD-based learning algorithms. To explain the inverse variance-flatness relation, we develop a random landscape theory, which shows that the SGD noise strength (effective temperature) depends inversely on the landscape flatness. Our study indicates that SGD attains a self-tuned landscape-dependent annealing strategy to find generalizable solutions at the flat minima of the landscape. Finally, we demonstrate how these new theoretical insights lead to more efficient algorithms, e.g., for avoiding catastrophic forgetting. |
Tasks | |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01678v1 |
https://arxiv.org/pdf/2001.01678v1.pdf | |
PWC | https://paperswithcode.com/paper/how-neural-networks-find-generalizable |
Repo | |
Framework | |
Reasoning About Generalization via Conditional Mutual Information
Title | Reasoning About Generalization via Conditional Mutual Information |
Authors | Thomas Steinke, Lydia Zakynthinou |
Abstract | We provide an information-theoretic framework for studying the generalization properties of machine learning algorithms. Our framework ties together existing approaches, including uniform convergence bounds and recent methods for adaptive data analysis. Specifically, we use Conditional Mutual Information (CMI) to quantify how well the input (i.e., the training data) can be recognized given the output (i.e., the trained model) of the learning algorithm. We show that bounds on CMI can be obtained from VC dimension, compression schemes, differential privacy, and other methods. We then show that bounded CMI implies various forms of generalization. |
Tasks | |
Published | 2020-01-24 |
URL | https://arxiv.org/abs/2001.09122v2 |
https://arxiv.org/pdf/2001.09122v2.pdf | |
PWC | https://paperswithcode.com/paper/reasoning-about-generalization-via |
Repo | |
Framework | |
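For readers unfamiliar with the quantity named in the abstract above, the standard textbook definition of conditional mutual information can be written as follows. This is a generic statement only: the symbols $X$, $Y$ and $Z$ are placeholders introduced here, the abstract does not say which variable the paper conditions on, and the entropy form on the second line assumes discrete random variables.

```latex
\begin{align}
I(X;Y \mid Z)
  &= \mathbb{E}_{Z}\!\left[ D_{\mathrm{KL}}\!\left( P_{X,Y \mid Z} \,\middle\|\, P_{X \mid Z} \otimes P_{Y \mid Z} \right) \right] \\
  &= H(X \mid Z) - H(X \mid Y, Z)
\end{align}
```

In the abstract's terms, $X$ would play the role of the input (the training data) and $Y$ the output (the trained model); a small value means the output reveals little about which input produced it.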
On Approximation Capabilities of ReLU Activation and Softmax Output Layer in Neural Networks
Title | On Approximation Capabilities of ReLU Activation and Softmax Output Layer in Neural Networks |
Authors | Behnam Asadi, Hui Jiang |
Abstract | In this paper, we have extended the well-established universal approximator theory to neural networks that use the unbounded ReLU activation function and a nonlinear softmax output layer. We have proved that a sufficiently large neural network using the ReLU activation function can approximate any function in $L^1$ up to any arbitrary precision. Moreover, our theoretical results have shown that a large enough neural network using a nonlinear softmax output layer can also approximate any indicator function in $L^1$, which is equivalent to mutually-exclusive class labels in any realistic multiple-class pattern classification problems. To the best of our knowledge, this work is the first theoretical justification for using the softmax output layers in neural networks for pattern classification. |
Tasks | |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.04060v1 |
https://arxiv.org/pdf/2002.04060v1.pdf | |
PWC | https://paperswithcode.com/paper/on-approximation-capabilities-of-relu |
Repo | |
Framework | |
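As a reminder of the two building blocks the abstract above refers to, here is a minimal sketch of the ReLU activation and the softmax output layer in plain Python. It only illustrates the standard definitions; the function names and the example numbers are made up for illustration and do not come from the paper.

```python
import math


def relu(x):
    """ReLU activation: max(0, x). It is unbounded above."""
    return max(0.0, x)


def softmax(logits):
    """Map a list of real-valued scores to a probability vector."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative values only.
hidden = [relu(v) for v in [-1.5, 0.0, 2.0]]
probs = softmax([2.0, 1.0, 0.1])

# The class with the largest probability approximates a one-hot
# (indicator) label over mutually exclusive classes.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

The softmax output always sums to one, which is why it is a natural fit for mutually exclusive class labels.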
Towards Automatic Bayesian Optimization: A first step involving acquisition functions
Title | Towards Automatic Bayesian Optimization: A first step involving acquisition functions |
Authors | Eduardo C. Garrido Merchán, Luis C. Jariego Pérez |
Abstract | Bayesian Optimization is the state-of-the-art technique for the optimization of black boxes, i.e., functions for which we have access to neither the analytical expression nor its gradients, which are expensive to evaluate, and whose evaluation is noisy. The most popular application of Bayesian optimization is the automatic hyperparameter tuning of machine learning algorithms, where we obtain the best configuration of machine learning algorithms by optimizing the estimation of the generalization error of these algorithms. Despite being applied with success, Bayesian optimization methodologies also have hyperparameters that need to be configured, such as the probabilistic surrogate model or the acquisition function used. A poor choice of these hyperparameters leads to poor-quality results. Typically, these hyperparameters are tuned by making assumptions about the objective function that we want to evaluate, but there are scenarios where we do not have any prior information about the objective function. In this paper, we propose a first attempt at automatic Bayesian optimization by exploring several heuristics that automatically tune the acquisition function of Bayesian optimization. We illustrate the effectiveness of these heuristics on a set of benchmark problems and a hyperparameter tuning problem of a machine learning algorithm. |
Tasks | |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09643v1 |
https://arxiv.org/pdf/2003.09643v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-automatic-bayesian-optimization-a |
Repo | |
Framework | |
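To make the term "acquisition function" in the abstract above concrete, the sketch below implements one widely used choice, expected improvement (EI), for a minimization problem with a Gaussian posterior. It is not the paper's heuristics, which the abstract does not describe; the candidate names and posterior values are hypothetical.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, best):
    """EI for minimization.

    mu, sigma: posterior mean and standard deviation at a candidate point.
    best: lowest objective value observed so far.
    """
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * normal_cdf(z) + sigma * normal_pdf(z)


# Hypothetical surrogate predictions: name -> (mean, standard deviation).
candidates = {"a": (0.50, 0.10), "b": (0.60, 0.50)}
best_so_far = 0.55

scores = {
    name: expected_improvement(mu, sigma, best_so_far)
    for name, (mu, sigma) in candidates.items()
}
next_point = max(scores, key=scores.get)
```

Here candidate "b" has a worse predicted mean than "a" but much higher uncertainty, so EI prefers it. Other acquisition functions weigh this trade-off differently, which is why choosing one is itself a hyperparameter.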
Explainable Agents Through Social Cues: A Review
Title | Explainable Agents Through Social Cues: A Review |
Authors | Sebastian Wallkotter, Silvia Tulli, Ginevra Castellano, Ana Paiva, Mohamed Chetouani |
Abstract | How to provide explanations has experienced a surge of interest in Human-Robot Interaction (HRI) over the last three years. In HRI this is known as explainability, expressivity, transparency or sometimes legibility, and the challenge for embodied agents is that they offer a unique array of modalities to communicate this information thanks to their embodiment. Responding to this surge of interest, we review the existing literature in explainability and organize it by (1) providing an overview of existing definitions, (2) showing how explainability is implemented and how it exploits different modalities, and (3) showing how the impact of explainability is measured. Additionally, we present a list of open questions and challenges that highlight areas that require further investigation by the community. This provides the interested scholar with an overview of the current state of the art. |
Tasks | |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05251v1 |
https://arxiv.org/pdf/2003.05251v1.pdf | |
PWC | https://paperswithcode.com/paper/explainable-agents-through-social-cues-a |
Repo | |
Framework | |
Random smooth gray value transformations for cross modality learning with gray value invariant networks
Title | Random smooth gray value transformations for cross modality learning with gray value invariant networks |
Authors | Nikolas Lessmann, Bram van Ginneken |
Abstract | Random transformations are commonly used for augmentation of the training data with the goal of reducing the uniformity of the training samples. These transformations normally aim at variations that can be expected in images from the same modality. Here, we propose a simple method for transforming the gray values of an image with the goal of reducing cross modality differences. This approach enables segmentation of the lumbar vertebral bodies in CT images using a network trained exclusively with MR images. The source code is made available at https://github.com/nlessmann/rsgt |
Tasks | |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06158v1 |
https://arxiv.org/pdf/2003.06158v1.pdf | |
PWC | https://paperswithcode.com/paper/random-smooth-gray-value-transformations-for |
Repo | |
Framework | |
REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums
Title | REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums |
Authors | Joobin Gharibshah, Evangelos E. Papalexakis, Michalis Faloutsos |
Abstract | How can we extract useful information from a security forum? We focus on identifying threads of interest to a security professional: (a) alerts of worrisome events, such as attacks, (b) offering of malicious services and products, (c) hacking information to perform malicious acts, and (d) useful security-related experiences. The analysis of security forums is in its infancy despite several promising recent works. Novel approaches are needed to address the challenges in this domain: (a) the difficulty in specifying the “topics” of interest efficiently, and (b) the unstructured and informal nature of the text. We propose REST, a systematic methodology to: (a) identify threads of interest based on a possibly incomplete bag of words, and (b) classify them into one of the four classes above. The key novelty of the work is a multi-step weighted embedding approach: we project words, threads and classes in appropriate embedding spaces and establish relevance and similarity there. We evaluate our method with real data from three security forums with a total of 164K posts and 21K threads. First, REST is robust to the initial keyword selection: it can extend the user-provided keyword set and thus recover from missing keywords. Second, REST categorizes the threads into the classes of interest with superior accuracy compared to five other methods: REST exhibits an accuracy between 63.3% and 76.9%. We see our approach as a first step for harnessing the wealth of information of online forums in a user-friendly way, since the user can loosely specify her keywords of interest. |
Tasks | |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.02660v2 |
https://arxiv.org/pdf/2001.02660v2.pdf | |
PWC | https://paperswithcode.com/paper/rest-a-thread-embedding-approach-for |
Repo | |
Framework | |
Emotions Don’t Lie: A Deepfake Detection Method using Audio-Visual Affective Cues
Title | Emotions Don’t Lie: A Deepfake Detection Method using Audio-Visual Affective Cues |
Authors | Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha |
Abstract | We present a learning-based multimodal method for detecting real and deepfake videos. To maximize information for learning, we extract and analyze the similarity between the two audio and visual modalities from within the same video. Additionally, we extract and compare affective cues corresponding to emotion from the two modalities within a video to infer whether the input video is “real” or “fake”. We propose a deep learning network, inspired by the Siamese network architecture and the triplet loss. To validate our model, we report the AUC metric on two large-scale, audio-visual deepfake detection datasets, DeepFake-TIMIT Dataset and DFDC. We compare our approach with several SOTA deepfake detection methods and report per-video AUC of 84.4% on the DFDC and 96.6% on the DF-TIMIT datasets, respectively. |
Tasks | DeepFake Detection, Face Swapping |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.06711v2 |
https://arxiv.org/pdf/2003.06711v2.pdf | |
PWC | https://paperswithcode.com/paper/emotions-dont-lie-a-deepfake-detection-method |
Repo | |
Framework | |
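The abstract above mentions the triplet loss. A minimal sketch of the standard form is shown below, assuming Euclidean distance between embedding vectors; some formulations use squared distances instead. The function names, margin and example vectors are illustrative, and how the paper builds its triplets from audio and visual features is not stated in the abstract.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss.

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Illustrative embeddings only.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
violating = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In the first call the negative is already far away, so the loss vanishes; in the second the roles are swapped and the loss is positive, which would push the embeddings apart during training.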
Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests
Title | Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests |
Authors | Xiao Xu, Fang Dong, Yanghua Li, Shaojian He, Xin Li |
Abstract | A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to characterize the phenomenon that users’ preferences towards different items vary differently over time. In the disjoint payoff model, the reward of playing an arm is determined by an arm-specific preference vector, which is piecewise-stationary with asynchronous and distinct changes across different arms. An efficient learning algorithm that is adaptive to abrupt reward changes is proposed and theoretical regret analysis is provided to show that a sublinear scaling of regret in the time length $T$ is achieved. The algorithm is further extended to a more general setting with hybrid payoffs where the reward of playing an arm is determined by both an arm-specific preference vector and a joint coefficient vector shared by all arms. Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings. |
Tasks | Recommendation Systems |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00359v1 |
https://arxiv.org/pdf/2003.00359v1.pdf | |
PWC | https://paperswithcode.com/paper/contextual-bandit-based-personalized |
Repo | |
Framework | |
Understanding Generalization in Deep Learning via Tensor Methods
Title | Understanding Generalization in Deep Learning via Tensor Methods |
Authors | Jingling Li, Yanchao Sun, Jiahao Su, Taiji Suzuki, Furong Huang |
Abstract | Deep neural networks generalize well on unseen data even though the number of parameters often far exceeds the number of training examples. Recently proposed complexity measures have provided insights into the generalizability of neural networks from the perspectives of PAC-Bayes, robustness, overparametrization, compression and so on. In this work, we advance the understanding of the relations between the network’s architecture and its generalizability from the compression perspective. Using tensor analysis, we propose a series of intuitive, data-dependent and easily-measurable properties that tightly characterize the compressibility and generalizability of neural networks; thus, in practice, our generalization bound outperforms the previous compression-based ones, especially for neural networks using tensors as their weight kernels (e.g. CNNs). Moreover, these intuitive measurements provide further insights into designing neural network architectures with properties favorable for better/guaranteed generalizability. Our experimental results demonstrate that through the proposed measurable properties, our generalization error bound matches the trend of the test error well. Our theoretical analysis further provides justifications for the empirical success and limitations of some widely-used tensor-based compression approaches. We also discover the improvements to the compressibility and robustness of current neural networks when incorporating tensor operations via our proposed layer-wise structure. |
Tasks | |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.05070v1 |
https://arxiv.org/pdf/2001.05070v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-generalization-in-deep-learning |
Repo | |
Framework | |
MapLUR: Exploring a new Paradigm for Estimating Air Pollution using Deep Learning on Map Images
Title | MapLUR: Exploring a new Paradigm for Estimating Air Pollution using Deep Learning on Map Images |
Authors | Michael Steininger, Konstantin Kobs, Albin Zehe, Florian Lautenschlager, Martin Becker, Andreas Hotho |
Abstract | Land-use regression (LUR) models are important for the assessment of air pollution concentrations in areas without measurement stations. While many such models exist, they often use manually constructed features based on restricted, locally available data. Thus, they are typically hard to reproduce and challenging to adapt to areas beyond those they have been developed for. In this paper, we advocate a paradigm shift for LUR models: We propose the Data-driven, Open, Global (DOG) paradigm that entails models based on purely data-driven approaches using only openly and globally available data. Progress within this paradigm will alleviate the need for experts to adapt models to the local characteristics of the available data sources and thus facilitate the generalizability of air pollution models to new areas on a global scale. In order to illustrate the feasibility of the DOG paradigm for LUR, we introduce a deep learning model called MapLUR. It is based on a convolutional neural network architecture and is trained exclusively on globally and openly available map data without requiring manual feature engineering. We compare our model to state-of-the-art baselines like linear regression, random forests and multi-layer perceptrons using a large data set of modeled $\text{NO}_2$ concentrations in Central London. Our results show that MapLUR significantly outperforms these approaches even though they are provided with manually tailored features. Furthermore, we illustrate that the automatic feature extraction inherent to models based on the DOG paradigm can learn features that are readily interpretable and closely resemble those commonly used in traditional LUR approaches. |
Tasks | Feature Engineering |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.07493v1 |
https://arxiv.org/pdf/2002.07493v1.pdf | |
PWC | https://paperswithcode.com/paper/maplur-exploring-a-new-paradigm-for |
Repo | |
Framework | |
OccuSeg: Occupancy-aware 3D Instance Segmentation
Title | OccuSeg: Occupancy-aware 3D Instance Segmentation |
Authors | Lei Han, Tian Zheng, Lan Xu, Lu Fang |
Abstract | 3D instance segmentation, with a variety of applications in robotics and augmented reality, is in high demand these days. Unlike 2D images that are projective observations of the environment, 3D models provide metric reconstruction of the scenes without occlusion or scale ambiguity. In this paper, we define “3D occupancy size” as the number of voxels occupied by each instance. It has the advantage of being robust to predict, and on this basis we propose OccuSeg, an occupancy-aware 3D instance segmentation scheme. Our multi-task learning produces both occupancy signal and embedding representations, where the training of spatial and feature embeddings varies with their difference in scale awareness. Our clustering scheme benefits from the reliable comparison between the predicted occupancy size and the clustered occupancy size, which encourages hard samples to be correctly clustered and avoids over-segmentation. The proposed approach achieves state-of-the-art performance on 3 real-world datasets, i.e. ScanNetV2, S3DIS and SceneNN, while maintaining high efficiency. |
Tasks | 3D Instance Segmentation, Instance Segmentation, Multi-Task Learning, Semantic Segmentation |
Published | 2020-03-14 |
URL | https://arxiv.org/abs/2003.06537v1 |
https://arxiv.org/pdf/2003.06537v1.pdf | |
PWC | https://paperswithcode.com/paper/occuseg-occupancy-aware-3d-instance |
Repo | |
Framework | |
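The "3D occupancy size" defined in the abstract above, the number of voxels occupied by each instance, can be computed from a per-voxel instance labeling as in the sketch below. The label layout, the background value of -1 and the function name are assumptions made for illustration; the paper's network predicts this quantity rather than counting it from ground truth.

```python
from collections import Counter


def occupancy_sizes(voxel_instance_ids, background=-1):
    """Count the voxels occupied by each instance.

    voxel_instance_ids: one instance id per occupied voxel.
    background: id used for voxels that belong to no instance (assumed).
    """
    return Counter(i for i in voxel_instance_ids if i != background)


# Illustrative labeling: two instances plus one background voxel.
labels = [0, 0, 1, -1, 1, 1]
sizes = occupancy_sizes(labels)
```

Comparing such a count for a candidate cluster with the size predicted for it gives a simple signal for whether the cluster is too small (over-segmented) or too large.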
Bi-Directional Attention for Joint Instance and Semantic Segmentation in Point Clouds
Title | Bi-Directional Attention for Joint Instance and Semantic Segmentation in Point Clouds |
Authors | Guangnan Wu, Zhiyi Pan, Peng Jiang, Changhe Tu |
Abstract | Instance segmentation in point clouds is one of the most fine-grained ways to understand the 3D scene. Due to its close relationship to semantic segmentation, many works approach these two tasks simultaneously and leverage the benefits of multi-task learning. However, most of them only consider simple strategies such as element-wise feature fusion, which may not lead to mutual promotion. In this work, we build a Bi-Directional Attention module on backbone neural networks for 3D point cloud perception, which uses a similarity matrix measured from the features of one task to help aggregate non-local information for the other task, avoiding potential feature exclusion and task conflict. Comprehensive experiments and ablation studies on the S3DIS dataset and the PartNet dataset verify the superiority of our method. Moreover, the mechanism by which the Bi-Directional Attention module helps joint instance and semantic segmentation is also analyzed. |
Tasks | Instance Segmentation, Multi-Task Learning, Semantic Segmentation |
Published | 2020-03-11 |
URL | https://arxiv.org/abs/2003.05420v1 |
https://arxiv.org/pdf/2003.05420v1.pdf | |
PWC | https://paperswithcode.com/paper/bi-directional-attention-for-joint-instance |
Repo | |
Framework | |