October 19, 2019

3611 words 17 mins read

Paper Group ANR 126

Arbitrary Style Transfer with Deep Feature Reshuffle. An analytic theory of generalization dynamics and transfer learning in deep linear networks. Online Off-policy Prediction. Artificial Intelligence for Diabetes Case Management: The Intersection of Physical and Mental Health. A Comparative Study of Neural Network Models for Sentence Classification …

Arbitrary Style Transfer with Deep Feature Reshuffle

Title Arbitrary Style Transfer with Deep Feature Reshuffle
Authors Shuyang Gu, Congliang Chen, Jing Liao, Lu Yuan
Abstract This paper introduces a novel method by reshuffling deep features (i.e., permuting the spatial locations of a feature map) of the style image for arbitrary style transfer. We theoretically prove that our new style loss based on reshuffle connects both global and local style losses respectively used by most parametric and non-parametric neural style transfer methods. This simple idea can effectively address the challenging issues in existing style transfer methods. On one hand, it can avoid distortions in local style patterns and allow semantic-level transfer, compared with neural parametric methods. On the other hand, it can preserve a globally similar appearance to the style image and avoid wash-out artifacts, compared with neural non-parametric methods. Based on the proposed loss, we also present a progressive feature-domain optimization approach. The experiments show that our method is widely applicable to various styles and produces better quality than existing methods.
Tasks Style Transfer
Published 2018-05-10
URL http://arxiv.org/abs/1805.04103v4
PDF http://arxiv.org/pdf/1805.04103v4.pdf
PWC https://paperswithcode.com/paper/arbitrary-style-transfer-with-deep-feature
Repo
Framework
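
A minimal NumPy sketch of the core reshuffle idea described above: the spatial locations of a style feature map are permuted so that each content location receives a similar style feature vector. The greedy cosine-similarity matching, the function name, and the feature shapes are illustrative assumptions, not the paper's actual procedure (which optimizes a reshuffle-based loss in the feature domain).

```python
import numpy as np

def reshuffle_style_features(content_feat, style_feat):
    """Permute the spatial locations of a style feature map so that each
    content location receives the most similar style feature vector.
    Feature maps are (C, H, W); cosine-similarity matching is an
    illustrative choice, not necessarily the paper's."""
    C, H, W = content_feat.shape
    c = content_feat.reshape(C, -1)          # (C, H*W) content vectors
    s = style_feat.reshape(C, -1)            # (C, H*W) style vectors

    # Normalize columns so dot products are cosine similarities.
    c_n = c / (np.linalg.norm(c, axis=0, keepdims=True) + 1e-8)
    s_n = s / (np.linalg.norm(s, axis=0, keepdims=True) + 1e-8)
    sim = c_n.T @ s_n                        # (H*W, H*W) similarity matrix

    # Greedy one-to-one assignment: each style location is used once,
    # so the result is a permutation (reshuffle) of spatial locations.
    assigned = np.full(c.shape[1], -1, dtype=int)
    used = np.zeros(s.shape[1], dtype=bool)
    for idx in np.argsort(-sim, axis=None):
        i, j = np.unravel_index(idx, sim.shape)
        if assigned[i] == -1 and not used[j]:
            assigned[i], used[j] = j, True

    return s[:, assigned].reshape(C, H, W)

# Toy usage with random "VGG-like" feature maps.
rng = np.random.default_rng(0)
content = rng.standard_normal((64, 8, 8)).astype(np.float32)
style = rng.standard_normal((64, 8, 8)).astype(np.float32)
print(reshuffle_style_features(content, style).shape)  # (64, 8, 8)
```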

An analytic theory of generalization dynamics and transfer learning in deep linear networks

Title An analytic theory of generalization dynamics and transfer learning in deep linear networks
Authors Andrew K. Lampinen, Surya Ganguli
Abstract Much attention has been devoted recently to the generalization puzzle in deep learning: large, deep networks can generalize well, but existing theories bounding generalization error are exceedingly loose and thus cannot explain this striking performance. Furthermore, a major hope is that knowledge may transfer across tasks, so that multi-task learning can improve generalization on individual tasks. However, we lack analytic theories that can quantitatively predict how the degree of knowledge transfer depends on the relationship between the tasks. We develop an analytic theory of the nonlinear dynamics of generalization in deep linear networks, both within and across tasks. In particular, our theory provides analytic solutions for the training and testing error of deep networks as a function of training time, number of examples, network size and initialization, and the task structure and SNR. Our theory reveals that deep networks progressively learn the most important task structure first, so that generalization error at the early stopping time primarily depends on task structure and is independent of network size. This suggests that any tight bound on generalization error must take into account task structure, and explains observations about real data being learned faster than random data. Intriguingly, our theory also reveals the existence of a learning algorithm that provably outperforms neural network training through gradient descent. Finally, for transfer learning, our theory reveals that knowledge transfer depends sensitively, but computably, on the SNRs and input feature alignments of pairs of tasks.
Tasks Multi-Task Learning, Transfer Learning
Published 2018-09-27
URL http://arxiv.org/abs/1809.10374v2
PDF http://arxiv.org/pdf/1809.10374v2.pdf
PWC https://paperswithcode.com/paper/an-analytic-theory-of-generalization-dynamics
Repo
Framework
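
A toy NumPy simulation in the spirit of the theory above: a two-layer deep *linear* network is trained by gradient descent on noisy data generated from a low-rank teacher, and train/test error are printed over time to illustrate the structure-first learning dynamics. All sizes, the noise level, and the learning rate are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Teacher: a low-rank linear map y = W_teacher x + noise ("task structure").
d_in, d_hidden, d_out, n_train, n_test = 30, 30, 10, 100, 2000
W_teacher = rng.standard_normal((d_out, 3)) @ rng.standard_normal((3, d_in)) / np.sqrt(d_in)

def make_data(n, noise_std=0.3):
    X = rng.standard_normal((n, d_in))
    Y = X @ W_teacher.T + noise_std * rng.standard_normal((n, d_out))
    return X, Y

X_tr, Y_tr = make_data(n_train)
X_te, Y_te = make_data(n_test)

# Student: two-layer *linear* network y = W2 W1 x, small random initialization.
W1 = 1e-3 * rng.standard_normal((d_hidden, d_in))
W2 = 1e-3 * rng.standard_normal((d_out, d_hidden))
lr = 0.05 / n_train

for step in range(2001):
    err = X_tr @ (W2 @ W1).T - Y_tr
    grad_W = err.T @ X_tr                   # gradient w.r.t. the product W2 W1
    W2 -= lr * grad_W @ W1.T
    W1 -= lr * W2.T @ grad_W
    if step % 400 == 0:
        test_mse = np.mean((X_te @ (W2 @ W1).T - Y_te) ** 2)
        print(f"step {step:5d}  train {np.mean(err ** 2):.3f}  test {test_mse:.3f}")
```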

Online Off-policy Prediction

Title Online Off-policy Prediction
Authors Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White
Abstract This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving, represented as a value function. However, the behavior used to select actions and generate the behavior data might be different from the one used to define the predictions, and thus the samples are generated off-policy. The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems but remained an open algorithmic challenge for decades. The issue lies with the temporal difference (TD) learning update at the heart of most prediction algorithms: combining bootstrapping, off-policy sampling and function approximation may cause the value estimate to diverge. A breakthrough came with the development of a new objective function that admitted stochastic gradient descent variants of TD. Since then, many sound online off-policy prediction algorithms have been developed, but there has been limited empirical work investigating the relative merits of all the variants. This paper aims to fill these empirical gaps and provide clarity on the key ideas behind each method. We summarize the large body of literature on off-policy learning, focusing on (1) methods that use computation linear in the number of features and are convergent under off-policy sampling, and (2) other methods which have proven useful with non-fixed, nonlinear function approximation. We provide an empirical study of off-policy prediction methods in two challenging microworlds. We report each method’s parameter sensitivity, empirical convergence rate, and final performance, providing new insights that should enable practitioners to successfully extend these new methods to large-scale applications. [Abridged abstract]
Tasks
Published 2018-11-06
URL http://arxiv.org/abs/1811.02597v1
PDF http://arxiv.org/pdf/1811.02597v1.pdf
PWC https://paperswithcode.com/paper/online-off-policy-prediction
Repo
Framework
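
A hedged sketch of one representative algorithm from the linear-complexity, convergent family surveyed above: off-policy TDC (linear TD with gradient correction), which maintains a second weight vector and uses per-step importance sampling ratios. The toy chain MDP, features, and step sizes are assumptions made only so the snippet runs.

```python
import numpy as np

rng = np.random.default_rng(2)

n_states, n_features, gamma = 5, 3, 0.9
phi = rng.standard_normal((n_states, n_features))        # fixed linear features

def behavior_policy(s):   return int(rng.integers(2))     # uniform over 2 actions
def target_prob(s, a):    return 0.9 if a == 0 else 0.1   # target prefers action 0
def behavior_prob(s, a):  return 0.5

def step(s, a):
    """Toy chain dynamics: action 0 moves right, action 1 moves left."""
    s_next = min(s + 1, n_states - 1) if a == 0 else max(s - 1, 0)
    return (1.0 if s_next == n_states - 1 else 0.0), s_next

# TDC: two weight vectors, both O(number of features) per step.
w = np.zeros(n_features)   # value-function weights
h = np.zeros(n_features)   # auxiliary weights (gradient correction)
alpha, beta = 0.01, 0.05

s = 0
for t in range(50_000):
    a = behavior_policy(s)
    rho = target_prob(s, a) / behavior_prob(s, a)          # importance sampling ratio
    r, s_next = step(s, a)
    x, x_next = phi[s], phi[s_next]
    delta = r + gamma * x_next @ w - x @ w                 # TD error
    w += alpha * rho * (delta * x - gamma * (x @ h) * x_next)
    h += beta * (rho * delta - x @ h) * x
    s = s_next

print("learned value weights:", np.round(w, 3))
```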

Artificial Intelligence for Diabetes Case Management: The Intersection of Physical and Mental Health

Title Artificial Intelligence for Diabetes Case Management: The Intersection of Physical and Mental Health
Authors Casey C. Bennett
Abstract Diabetes is a major public health problem in the United States, affecting roughly 30 million people. Diabetes complications, along with the mental health comorbidities that often co-occur with them, are major drivers of high healthcare costs, poor outcomes, and reduced treatment adherence in diabetes. Here, we evaluate in a large state-wide population whether we can use artificial intelligence (AI) techniques to identify clusters of patient trajectories within the broader diabetes population in order to create cost-effective, narrowly-focused case management intervention strategies to reduce the development of complications. This approach combined data from: 1) claims, 2) case management notes, and 3) social determinants of health from ~300,000 real patients between 2014 and 2016. We categorized complications as five types: Cardiovascular, Neuropathy, Ophthalmic, Renal, and Other. Modeling was performed combining a variety of machine learning algorithms, including supervised classification, unsupervised clustering, natural language processing of unstructured care notes, and feature engineering. The results showed that we can predict the development of diabetes complications roughly 83.5% of the time using claims data or social determinants of health data. They also showed we can reveal meaningful clusters in the patient population related to complications and mental health that can be used to build a cost-effective screening program, reducing the number of patients to be screened by 85%. This study outlines the creation of an AI framework to develop protocols to better address the mental health comorbidities that lead to complication development in the diabetes population. Future work is described that outlines potential lines of research and the need to better address the ‘people side’ of the equation.
Tasks Feature Engineering
Published 2018-10-06
URL https://arxiv.org/abs/1810.03044v3
PDF https://arxiv.org/pdf/1810.03044v3.pdf
PWC https://paperswithcode.com/paper/artificial-intelligence-for-diabetes-case
Repo
Framework
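
A sketch, under loudly hypothetical assumptions, of how a pipeline of the kind described above might be wired together with scikit-learn: structured claims/SDOH columns are combined with TF-IDF features from free-text case management notes to predict a complication label. The column names, toy data, and model choice are invented for illustration and are not the study's actual protocol.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline

# Hypothetical schema: structured claims/SDOH fields plus free-text notes.
df = pd.DataFrame({
    "age": [54, 67, 45, 71],
    "num_claims": [12, 30, 5, 22],
    "sdoh_deprivation_index": [0.2, 0.8, 0.1, 0.6],
    "case_notes": [
        "patient reports low mood and missed insulin doses",
        "renal function declining, needs nephrology referral",
        "doing well, adherent to medication",
        "stable labs, no new complaints at follow-up",
    ],
    "complication_within_1yr": [1, 1, 0, 0],   # toy labels
})

features = ColumnTransformer([
    ("structured", "passthrough", ["age", "num_claims", "sdoh_deprivation_index"]),
    ("notes", TfidfVectorizer(min_df=1), "case_notes"),   # NLP of care notes
])

model = Pipeline([("features", features),
                  ("clf", GradientBoostingClassifier(random_state=0))])

X = df.drop(columns="complication_within_1yr")
model.fit(X, df["complication_within_1yr"])
print(model.predict_proba(X)[:, 1])   # toy risk scores, illustration only
```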

A Comparative Study of Neural Network Models for Sentence Classification

Title A Comparative Study of Neural Network Models for Sentence Classification
Authors Phuong Le-Hong, Anh-Cuong Le
Abstract This paper presents an extensive comparative study of four neural network models, including feed-forward networks, convolutional networks, recurrent networks and long short-term memory networks, on two sentence classification datasets of English and Vietnamese text. We show that on the English dataset, the convolutional network models without any feature engineering outperform some competitive sentence classifiers with rich hand-crafted linguistic features. We demonstrate that the GloVe word embeddings are consistently better than both Skip-gram word embeddings and word count vectors. We also show the superiority of convolutional neural network models on a Vietnamese newspaper sentence dataset over strong baseline models. Our experimental results suggest some good practices for applying neural network models in sentence classification.
Tasks Feature Engineering, Sentence Classification, Word Embeddings
Published 2018-10-03
URL http://arxiv.org/abs/1810.01656v1
PDF http://arxiv.org/pdf/1810.01656v1.pdf
PWC https://paperswithcode.com/paper/a-comparative-study-of-neural-network-models
Repo
Framework
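
A minimal PyTorch sketch of the convolutional sentence classifier family compared above (a Kim-style CNN over word embeddings). The vocabulary size, filter widths, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    """Kim-style CNN: embed tokens, apply parallel 1-D convolutions of several
    widths, max-pool over time, and classify the concatenated features."""
    def __init__(self, vocab_size=10_000, emb_dim=100, n_classes=2,
                 filter_widths=(3, 4, 5), n_filters=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, w) for w in filter_widths])
        self.fc = nn.Linear(n_filters * len(filter_widths), n_classes)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)     # (batch, emb_dim, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))      # (batch, n_classes)

# Toy forward pass on a batch of two 20-token "sentences".
model = SentenceCNN()
tokens = torch.randint(1, 10_000, (2, 20))
print(model(tokens).shape)   # torch.Size([2, 2])
```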

Generating Titles for Web Tables

Title Generating Titles for Web Tables
Authors Braden Hancock, Hongrae Lee, Cong Yu
Abstract Descriptive titles provide crucial context for interpreting tables that are extracted from web pages and are a key component of table-based web applications. Prior approaches have attempted to produce titles by selecting existing text snippets associated with the table. These approaches, however, are limited by their dependence on suitable titles existing a priori. In our user study, we observe that the relevant information for the title tends to be scattered across the page, and often (more than 80% of the time) does not appear verbatim anywhere in the page. We propose instead the application of a sequence-to-sequence neural network model as a more generalizable means of generating high-quality titles. This is accomplished by extracting many text snippets that have potentially relevant information to the table, encoding them into an input sequence, and using both copy and generation mechanisms in the decoder to balance relevance and readability of the generated title. We validate this approach with human evaluation on sample web tables and report that while sequence models with only a copy mechanism or only a generation mechanism are easily outperformed by simple selection-based baselines, the model with both capabilities outperforms them all, approaching the quality of crowdsourced titles while training on fewer than ten thousand examples. To the best of our knowledge, the proposed technique is the first to consider text generation methods for table titles and establishes a new state of the art.
Tasks Text Generation
Published 2018-06-30
URL https://arxiv.org/abs/1807.00099v2
PDF https://arxiv.org/pdf/1807.00099v2.pdf
PWC https://paperswithcode.com/paper/title-generation-for-web-tables
Repo
Framework
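
A hedged PyTorch sketch of the copy-plus-generation mixture at a single decoding step, in the style of a pointer-generator decoder: the final word distribution blends a vocabulary softmax with attention weights scattered onto source-token ids. Tensor names and sizes are assumptions for illustration; the paper's exact decoder may differ.

```python
import torch

def copy_generate_step(p_vocab, attention, src_token_ids, p_gen, vocab_size):
    """Blend a generation distribution over the vocabulary with a copy
    distribution defined by attention over the source snippet tokens.

    p_vocab:        (batch, vocab_size) softmax over the output vocabulary
    attention:      (batch, src_len)    attention weights over source tokens
    src_token_ids:  (batch, src_len)    vocabulary ids of source tokens
    p_gen:          (batch, 1)          probability of generating vs. copying
    """
    copy_dist = torch.zeros(p_vocab.size(0), vocab_size)
    copy_dist.scatter_add_(1, src_token_ids, attention)   # sum attention per id
    return p_gen * p_vocab + (1.0 - p_gen) * copy_dist

# Toy example: batch of 1, vocabulary of 10 words, source of 4 tokens.
vocab_size = 10
p_vocab = torch.softmax(torch.randn(1, vocab_size), dim=1)
attention = torch.softmax(torch.randn(1, 4), dim=1)
src_token_ids = torch.tensor([[2, 7, 7, 3]])
p_gen = torch.tensor([[0.6]])
p_final = copy_generate_step(p_vocab, attention, src_token_ids, p_gen, vocab_size)
print(p_final.sum())   # ~1.0: still a valid distribution
```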

Monitoring spatial sustainable development: Semi-automated analysis of satellite and aerial images for energy transition and sustainability indicators

Title Monitoring spatial sustainable development: Semi-automated analysis of satellite and aerial images for energy transition and sustainability indicators
Authors R. L. Curier, T. J. A. De Jong, Katharina Strauch, Katharina Cramer, Natalie Rosenski, Clara Schartner, M. Debusschere, Hannah Ziemons, Deniz Iren, Stefano Bromuri
Abstract Solar panels are installed by a large and growing number of households due to the convenience of having cheap and renewable energy to power household appliances. In contrast to other energy sources, solar installations are highly decentralized, spread over hundreds of thousands of locations; globally, more than 25% of solar photovoltaic (PV) installations are decentralized. The effect of the rapid energy transition from a carbon-based economy to a green economy is, however, still very difficult to quantify. In fact, the rapid adoption of solar panels by households is difficult to track, with local registries missing a large number of the newly built solar panels. This makes assessing the impact of renewable energies an impossible task. Although models of the output of a region exist, they are often black-box estimations. This project’s aim is twofold: first, to automate the process of extracting the locations of solar panels from aerial or satellite images and, second, to produce a map of solar panels along with statistics on their number. Further, this project takes place within a wider framework that investigates how official statistics can benefit from new digital data sources. At project completion, a method for detecting solar panels from aerial images via machine learning will be developed, and the methodology initially developed for BE, DE and NL will be standardized for application to other EU countries. In practice, machine learning techniques are used to identify solar panels in satellite and aerial images for the province of Limburg (NL), Flanders (BE) and North Rhine-Westphalia (DE).
Tasks
Published 2018-10-11
URL http://arxiv.org/abs/1810.04881v1
PDF http://arxiv.org/pdf/1810.04881v1.pdf
PWC https://paperswithcode.com/paper/monitoring-spatial-sustainable-development
Repo
Framework

Automatic Inspection of Utility Scale Solar Power Plants using Deep Learning

Title Automatic Inspection of Utility Scale Solar Power Plants using Deep Learning
Authors Alekh Karkada Ashok, Chandan G, Adithya Bhat, Kausthubh Karnataki, Ganesh Shankar
Abstract Solar energy has the potential to become the backbone energy source for the world. Utility-scale solar power plants (more than 50 MW) can have more than 100K individual solar modules and be spread over more than 200 acres of land. Traditional methods of monitoring each module become too costly at utility scale. We demonstrate an alternative that uses recent advances in deep learning to automatically analyze drone footage, and show that it can be a quick and reliable alternative. We show that it can save huge amounts of power and hugely impact the developing world.
Tasks
Published 2018-12-20
URL http://arxiv.org/abs/1902.04132v1
PDF http://arxiv.org/pdf/1902.04132v1.pdf
PWC https://paperswithcode.com/paper/automatic-inspection-of-utility-scale-solar
Repo
Framework

MPNA: A Massively-Parallel Neural Array Accelerator with Dataflow Optimization for Convolutional Neural Networks

Title MPNA: A Massively-Parallel Neural Array Accelerator with Dataflow Optimization for Convolutional Neural Networks
Authors Muhammad Abdullah Hanif, Rachmad Vidya Wicaksana Putra, Muhammad Tanvir, Rehan Hafiz, Semeen Rehman, Muhammad Shafique
Abstract The state-of-the-art accelerators for Convolutional Neural Networks (CNNs) typically focus on accelerating only the convolutional layers and give little priority to the fully-connected layers. Hence, they lack a synergistic optimization of the hardware architecture and diverse dataflows for the complete CNN design, which could provide a higher potential for performance/energy efficiency. Towards this, we propose a novel Massively-Parallel Neural Array (MPNA) accelerator that integrates two heterogeneous systolic arrays and respective highly-optimized dataflow patterns to jointly accelerate both the convolutional (CONV) and the fully-connected (FC) layers. Besides fully exploiting the available off-chip memory bandwidth, these optimized dataflows enable high data reuse of all the data types (i.e., weights, input and output activations), thereby enabling our MPNA to achieve high energy savings. We synthesized our MPNA architecture using the ASIC design flow for a 28nm technology and performed functional and timing validation using multiple real-world complex CNNs. MPNA achieves 149.7 GOPS/W at 280 MHz and consumes 239 mW. Experimental results show that our MPNA architecture provides 1.7x overall performance improvement compared to a state-of-the-art accelerator, and 51% energy saving compared to the baseline architecture.
Tasks
Published 2018-10-30
URL http://arxiv.org/abs/1810.12910v1
PDF http://arxiv.org/pdf/1810.12910v1.pdf
PWC https://paperswithcode.com/paper/mpna-a-massively-parallel-neural-array
Repo
Framework
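
A toy Python model of the wavefront timing in an output-stationary systolic array computing a matrix product, included only to illustrate the kind of dataflow such accelerators exploit; it is not a model of the MPNA architecture or its two heterogeneous arrays.

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-level toy model of an output-stationary systolic array.

    PE (i, j) holds C[i, j]; operands A[i, k] and B[k, j] reach it at
    cycle i + j + k, so partial sums accumulate in a diagonal wavefront."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for cycle in range(M + N + K - 2):        # enough cycles to drain the array
        for i in range(M):
            for j in range(N):
                k = cycle - i - j             # operand pair arriving this cycle
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C

A = np.arange(6, dtype=np.float64).reshape(2, 3)
B = np.arange(12, dtype=np.float64).reshape(3, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
print(systolic_matmul(A, B))
```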

Probabilistic Bisection with Spatial Metamodels

Title Probabilistic Bisection with Spatial Metamodels
Authors Sergio Rodriguez, Mike Ludkovski
Abstract The Probabilistic Bisection Algorithm (PBA) performs root finding based on knowledge acquired from noisy oracle responses. We consider the generalized PBA setting (G-PBA) where the statistical distribution of the oracle is unknown and location-dependent, so that model inference and Bayesian knowledge updating must be performed simultaneously. To this end, we propose to leverage the spatial structure of a typical oracle by constructing a statistical surrogate for the underlying logistic regression step. We investigate several non-parametric surrogates, including Binomial Gaussian Processes (B-GP), Polynomial, Kernel, and Spline Logistic Regression. In parallel, we develop sampling policies that adaptively balance learning the oracle distribution and learning the root. One of our proposals mimics active learning with B-GPs and provides a novel look-ahead predictive variance formula. The resulting gains of our Spatial PBA algorithm relative to earlier G-PBA models are illustrated with synthetic examples and a challenging stochastic root finding problem from Bermudan option pricing.
Tasks Active Learning, Gaussian Processes
Published 2018-06-30
URL http://arxiv.org/abs/1807.00095v1
PDF http://arxiv.org/pdf/1807.00095v1.pdf
PWC https://paperswithcode.com/paper/probabilistic-bisection-with-spatial
Repo
Framework
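
A minimal sketch of the standard PBA that this paper generalizes: maintain a discretized posterior over the root location, query at the posterior median, and reweight the posterior according to the noisy oracle's sign, assuming a known response accuracy p. In the G-PBA setting above, p is unknown and location-dependent, which is what the spatial surrogates estimate; the grid, p, and query budget below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def pba(oracle, p=0.7, n_queries=200, grid_size=2001):
    """Standard probabilistic bisection on [0, 1] with a known oracle
    accuracy p > 0.5. Returns the posterior median as the root estimate."""
    grid = np.linspace(0.0, 1.0, grid_size)
    density = np.full(grid_size, 1.0 / grid_size)     # uniform prior on the root

    for _ in range(n_queries):
        cdf = np.cumsum(density)
        x = grid[np.searchsorted(cdf, 0.5)]           # query the posterior median
        sign = oracle(x)                              # noisy sign of the root side
        # If the oracle says the root is to the right of x, upweight points > x.
        right = grid > x
        weights = np.where(right == (sign > 0), p, 1.0 - p)
        density = density * weights
        density /= density.sum()

    cdf = np.cumsum(density)
    return grid[np.searchsorted(cdf, 0.5)]

true_root = 0.37
def noisy_oracle(x, p=0.7):
    """Reports the side of the root relative to x, correct with probability p."""
    correct = np.sign(true_root - x) or 1.0
    return correct if rng.random() < p else -correct

print(pba(noisy_oracle))   # close to 0.37
```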

Ising distribution as a latent variable model

Title Ising distribution as a latent variable model
Authors Adrien Wohrer
Abstract During the past decades, the Ising distribution has attracted interest in many applied disciplines, as the maximum entropy distribution associated to any set of correlated binary (‘spin’) variables with observed means and covariances. However, numerically speaking, the Ising distribution is impractical, so alternative models are often preferred to handle correlated binary data. One popular alternative, especially in the life sciences, is the Cox distribution (or the closely related dichotomized Gaussian distribution and log-normal Cox point process), where the spins are generated independently conditioned on the drawing of a latent variable with a multivariate normal distribution. This article explores the conditions for a principled replacement of the Ising distribution by a Cox distribution. It shows that the Ising distribution itself can be treated as a latent variable model, and it explores when this latent variable has a quasi-normal distribution. A variational approach to this question reveals a formal link with classic mean-field methods, especially Opper and Winther’s adaptive TAP approximation. This link is confirmed by weak coupling (Plefka) expansions of the different approximations and then by numerical tests. Overall, this study suggests that an Ising distribution can be replaced by a Cox distribution in practical applications, precisely when its parameters lie in the ‘mean-field domain’.
Tasks
Published 2018-03-07
URL https://arxiv.org/abs/1803.02598v4
PDF https://arxiv.org/pdf/1803.02598v4.pdf
PWC https://paperswithcode.com/paper/the-ising-distribution-as-a-latent-variable
Repo
Framework
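
A small NumPy sketch of the Cox / dichotomized-Gaussian alternative discussed above: correlated binary spins are produced by thresholding a draw from a latent multivariate normal, which makes sampling trivial compared with an Ising model. The latent mean and covariance below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_dichotomized_gaussian(mu, Sigma, n_samples):
    """Draw correlated binary (+/-1) spins by thresholding a latent
    multivariate normal: s_i = sign(z_i), with z ~ N(mu, Sigma)."""
    z = rng.multivariate_normal(mu, Sigma, size=n_samples)
    return np.where(z > 0, 1, -1)

# Latent mean and covariance chosen only for illustration.
mu = np.array([0.3, -0.1, 0.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])

spins = sample_dichotomized_gaussian(mu, Sigma, n_samples=100_000)
print("means:      ", spins.mean(axis=0).round(3))
print("covariances:", np.round(np.cov(spins.T), 3))
```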

Perturbation Analysis of Learning Algorithms: A Unifying Perspective on Generation of Adversarial Examples

Title Perturbation Analysis of Learning Algorithms: A Unifying Perspective on Generation of Adversarial Examples
Authors Emilio Rafael Balda, Arash Behboodi, Rudolf Mathar
Abstract Despite the tremendous success of deep neural networks in various learning problems, it has been observed that adding an intentionally designed adversarial perturbation to inputs of these architectures leads to erroneous classification with high confidence in the prediction. In this work, we propose a general framework based on the perturbation analysis of learning algorithms which consists of convex programming and is able to recover many current adversarial attacks as special cases. The framework can be used to propose novel attacks against learning algorithms for classification and regression tasks under various new constraints, with closed-form solutions in many instances. In particular, we derive new attacks against classification algorithms which are shown to achieve performance comparable to notable existing attacks. The framework is then used to generate adversarial perturbations for regression tasks, which include single-pixel and single-subset attacks. By applying this method to autoencoding and image colorization tasks, it is shown that adversarial perturbations can effectively perturb the output of regression tasks as well.
Tasks Colorization
Published 2018-12-15
URL http://arxiv.org/abs/1812.07385v1
PDF http://arxiv.org/pdf/1812.07385v1.pdf
PWC https://paperswithcode.com/paper/perturbation-analysis-of-learning-algorithms
Repo
Framework
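
A hedged PyTorch sketch of one well-known attack that such frameworks recover as a special case, the fast gradient sign method (FGSM); the toy model and epsilon are assumptions, and this is not the paper's general convex-programming formulation.

```python
import torch
import torch.nn as nn

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast gradient sign method: perturb the input in the direction of the
    sign of the loss gradient, an L-infinity-bounded first-order attack."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

# Toy classifier and data, just to exercise the attack.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(4, 1, 28, 28)
y = torch.randint(0, 10, (4,))
x_adv = fgsm_attack(model, x, y)
print((x_adv - x).abs().max())   # <= epsilon
```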

Safe Element Screening for Submodular Function Minimization

Title Safe Element Screening for Submodular Function Minimization
Authors Weizhong Zhang, Bin Hong, Lin Ma, Wei Liu, Tong Zhang
Abstract Submodular functions are discrete analogs of convex functions, which have applications in various fields, including machine learning and computer vision. However, in large-scale applications, solving Submodular Function Minimization (SFM) problems remains challenging. In this paper, we make the first attempt to extend the emerging technique named screening in large-scale sparse learning to SFM for accelerating its optimization process. We first conduct a careful study of the relationships between SFM and the corresponding convex proximal problems, as well as the accurate primal optimum estimation of the proximal problems. Relying on this study, we subsequently propose a novel safe screening method to quickly identify the elements guaranteed to be included (we refer to them as active) or excluded (inactive) in the final optimal solution of SFM during the optimization process. By removing the inactive elements and fixing the active ones, the problem size can be dramatically reduced, leading to great savings in the computational cost without sacrificing any accuracy. To the best of our knowledge, the proposed method is the first screening method in the fields of SFM and even combinatorial optimization, thus pointing out a new direction for accelerating SFM algorithms. Experimental results on both synthetic and real datasets demonstrate the significant speedups gained by our approach.
Tasks Combinatorial Optimization, Sparse Learning
Published 2018-05-22
URL http://arxiv.org/abs/1805.08527v4
PDF http://arxiv.org/pdf/1805.08527v4.pdf
PWC https://paperswithcode.com/paper/safe-element-screening-for-submodular
Repo
Framework

A flexible model for training action localization with varying levels of supervision

Title A flexible model for training action localization with varying levels of supervision
Authors Guilhem Chéron, Jean-Baptiste Alayrac, Ivan Laptev, Cordelia Schmid
Abstract Spatio-temporal action detection in videos is typically addressed in a fully-supervised setup with manual annotation of training videos required at every frame. Since such annotation is extremely tedious and prohibits scalability, there is a clear need to minimize the amount of manual supervision. In this work we propose a unifying framework that can handle and combine varying types of less-demanding weak supervision. Our model is based on discriminative clustering and integrates different types of supervision as constraints on the optimization. We investigate applications of such a model to training setups with alternative supervisory signals ranging from video-level class labels to the full per-frame annotation of action bounding boxes. Experiments on the challenging UCF101-24 and DALY datasets demonstrate competitive performance of our method at a fraction of the supervision used by previous methods. The flexibility of our model enables joint learning from data with different levels of annotation. Experimental results demonstrate a significant gain by adding a few fully supervised examples to otherwise weakly labeled videos.
Tasks Action Detection, Action Localization
Published 2018-06-29
URL http://arxiv.org/abs/1806.11328v2
PDF http://arxiv.org/pdf/1806.11328v2.pdf
PWC https://paperswithcode.com/paper/a-flexible-model-for-training-action
Repo
Framework

Exploiting Edge Features in Graph Neural Networks

Title Exploiting Edge Features in Graph Neural Networks
Authors Liyu Gong, Qiang Cheng
Abstract Edge features contain important information about graphs. However, current state-of-the-art neural network models designed for graph learning, e.g. graph convolutional networks (GCN) and graph attention networks (GAT), do not adequately utilize edge features, especially multi-dimensional edge features. In this paper, we build a new framework for a family of new graph neural network models that can more sufficiently exploit edge features, including those of undirected or multi-dimensional edges. The proposed framework can consolidate current graph neural network models, e.g. graph convolutional networks (GCN) and graph attention networks (GAT). The proposed framework and new models have the following novelties: First, we propose to use doubly stochastic normalization of graph edge features instead of the row or symmetric normalization approaches commonly used in current graph neural networks. Second, we construct new formulas for the operations in each individual layer so that they can handle multi-dimensional edge features. Third, in the proposed new framework, edge features are adaptive across network layers. As a result, our proposed new framework and new models can exploit a rich source of graph information. We apply our new models to graph node classification on several citation networks, whole-graph classification, and regression on several molecular datasets. Compared with the current state-of-the-art methods, i.e. GCN and GAT, our models obtain better performance, which testifies to the importance of exploiting edge features in graph neural networks.
Tasks Graph Classification, Node Classification
Published 2018-09-07
URL http://arxiv.org/abs/1809.02709v2
PDF http://arxiv.org/pdf/1809.02709v2.pdf
PWC https://paperswithcode.com/paper/exploiting-edge-features-in-graph-neural
Repo
Framework
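
A minimal NumPy sketch of doubly stochastic normalization of a non-negative edge-weight matrix via Sinkhorn-style alternating row/column scaling; the iteration count and the small epsilon are illustrative assumptions, and the paper's exact normalization construction may differ.

```python
import numpy as np

def sinkhorn_normalize(E, n_iters=50, eps=1e-8):
    """Approximately doubly stochastic normalization of a non-negative
    edge-weight matrix: alternately rescale rows and columns until both
    sums are close to 1 (Sinkhorn-Knopp iteration)."""
    P = E.astype(np.float64) + eps          # keep entries strictly positive
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)   # row normalization
        P /= P.sum(axis=0, keepdims=True)   # column normalization
    return P

# Toy symmetric edge-weight matrix (e.g., one channel of edge features).
E = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 3.0],
              [0.0, 3.0, 1.0]])
P = sinkhorn_normalize(E)
print(P.sum(axis=0).round(4), P.sum(axis=1).round(4))   # both close to [1, 1, 1]
```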