Paper Group ANR 126
Arbitrary Style Transfer with Deep Feature Reshuffle
Title | Arbitrary Style Transfer with Deep Feature Reshuffle |
Authors | Shuyang Gu, Congliang Chen, Jing Liao, Lu Yuan |
Abstract | This paper introduces a novel method by reshuffling deep features (i.e., permuting the spatial locations of a feature map) of the style image for arbitrary style transfer. We theoretically prove that our new style loss based on reshuffle connects both global and local style losses respectively used by most parametric and non-parametric neural style transfer methods. This simple idea can effectively address the challenging issues in existing style transfer methods. On the one hand, it can avoid distortions in local style patterns, and allow semantic-level transfer, compared with neural parametric methods. On the other hand, it can preserve globally similar appearance to the style image, and avoid wash-out artifacts, compared with neural non-parametric methods. Based on the proposed loss, we also present a progressive feature-domain optimization approach. The experiments show that our method is widely applicable to various styles, and produces better quality than existing methods. |
Tasks | Style Transfer |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.04103v4 |
PDF | http://arxiv.org/pdf/1805.04103v4.pdf |
PWC | https://paperswithcode.com/paper/arbitrary-style-transfer-with-deep-feature |
Repo | |
Framework | |
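The abstract above describes reshuffling as permuting the spatial locations of a feature map. As a minimal sketch of why such a permutation is compatible with global (Gram-matrix) style statistics, the snippet below checks that the Gram matrix of a toy feature map is unchanged by a spatial reshuffle. The names `gram` and `reshuffle`, the toy sizes, and the integer features are illustrative assumptions; this is not the paper's loss or code, and it covers only the global half of the connection the abstract mentions.

```python
import random

def gram(features):
    """Gram matrix of a C x N feature map (C channels, N spatial positions)."""
    c = len(features)
    return [[sum(a * b for a, b in zip(features[i], features[j]))
             for j in range(c)] for i in range(c)]

def reshuffle(features, perm):
    """Apply the same spatial permutation to every channel (hypothetical helper)."""
    return [[row[p] for p in perm] for row in features]

rng = random.Random(0)
C, N = 3, 8  # toy sizes, chosen only for illustration

# Integer features keep the arithmetic exact, so equality can be checked directly.
feats = [[rng.randint(0, 9) for _ in range(N)] for _ in range(C)]

perm = list(range(N))
rng.shuffle(perm)
shuffled = reshuffle(feats, perm)

# Channel-wise inner products do not depend on the order of spatial positions.
same_gram = gram(feats) == gram(shuffled)
print("Gram matrix preserved under reshuffle:", same_gram)
```

Because each Gram entry is a sum over spatial positions, reordering those positions cannot change it, which is why a reshuffled style feature map keeps the same global statistics.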
An analytic theory of generalization dynamics and transfer learning in deep linear networks
Title | An analytic theory of generalization dynamics and transfer learning in deep linear networks |
Authors | Andrew K. Lampinen, Surya Ganguli |
Abstract | Much attention has been devoted recently to the generalization puzzle in deep learning: large, deep networks can generalize well, but existing theories bounding generalization error are exceedingly loose, and thus cannot explain this striking performance. Furthermore, a major hope is that knowledge may transfer across tasks, so that multi-task learning can improve generalization on individual tasks. However, we lack analytic theories that can quantitatively predict how the degree of knowledge transfer depends on the relationship between the tasks. We develop an analytic theory of the nonlinear dynamics of generalization in deep linear networks, both within and across tasks. In particular, our theory provides analytic solutions to the training and testing error of deep networks as a function of training time, number of examples, network size and initialization, and the task structure and SNR. Our theory reveals that deep networks progressively learn the most important task structure first, so that generalization error at the early stopping time primarily depends on task structure and is independent of network size. This suggests any tight bound on generalization error must take into account task structure, and explains observations about real data being learned faster than random data. Intriguingly, our theory also reveals the existence of a learning algorithm that provably outperforms neural network training through gradient descent. Finally, for transfer learning, our theory reveals that knowledge transfer depends sensitively, but computably, on the SNRs and input feature alignments of pairs of tasks. |
Tasks | Multi-Task Learning, Transfer Learning |
Published | 2018-09-27 |
URL | http://arxiv.org/abs/1809.10374v2 |
PDF | http://arxiv.org/pdf/1809.10374v2.pdf |
PWC | https://paperswithcode.com/paper/an-analytic-theory-of-generalization-dynamics |
Repo | |
Framework | |
Online Off-policy Prediction
Title | Online Off-policy Prediction |
Authors | Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White |
Abstract | This paper investigates the problem of online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The predictions made by the agent are contingent on a particular way of behaving, represented as a value function. However, the behavior used to select actions and generate the behavior data might be different from the one used to define the predictions, and thus the samples are generated off-policy. The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems but remained an open algorithmic challenge for decades. The issue lies with the temporal difference (TD) learning update at the heart of most prediction algorithms: combining bootstrapping, off-policy sampling and function approximation may cause the value estimate to diverge. A breakthrough came with the development of a new objective function that admitted stochastic gradient descent variants of TD. Since then, many sound online off-policy prediction algorithms have been developed, but there has been limited empirical work investigating the relative merits of all the variants. This paper aims to fill these empirical gaps and provide clarity on the key ideas behind each method. We summarize the large body of literature on off-policy learning, focusing on (1) methods that use computation linear in the number of features and are convergent under off-policy sampling, and (2) other methods which have proven useful with non-fixed, nonlinear function approximation. We provide an empirical study of off-policy prediction methods in two challenging microworlds. We report each method’s parameter sensitivity, empirical convergence rate, and final performance, providing new insights that should enable practitioners to successfully extend these new methods to large-scale applications. [Abridged abstract] |
Tasks | |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02597v1 |
PDF | http://arxiv.org/pdf/1811.02597v1.pdf |
PWC | https://paperswithcode.com/paper/online-off-policy-prediction |
Repo | |
Framework | |
Artificial Intelligence for Diabetes Case Management: The Intersection of Physical and Mental Health
Title | Artificial Intelligence for Diabetes Case Management: The Intersection of Physical and Mental Health |
Authors | Casey C. Bennett |
Abstract | Diabetes is a major public health problem in the United States, affecting roughly 30 million people. Diabetes complications, along with the mental health comorbidities that often co-occur with them, are major drivers of high healthcare costs, poor outcomes, and reduced treatment adherence in diabetes. Here, we evaluate in a large state-wide population whether we can use artificial intelligence (AI) techniques to identify clusters of patient trajectories within the broader diabetes population in order to create cost-effective, narrowly-focused case management intervention strategies to reduce development of complications. This approach combined data from: 1) claims, 2) case management notes, and 3) social determinants of health from ~300,000 real patients between 2014 and 2016. We categorized complications as five types: Cardiovascular, Neuropathy, Ophthalmic, Renal, and Other. Modeling was performed combining a variety of machine learning algorithms, including supervised classification, unsupervised clustering, natural language processing of unstructured care notes, and feature engineering. The results showed that we can predict development of diabetes complications roughly 83.5% of the time using claims data or social determinants of health data. They also showed we can reveal meaningful clusters in the patient population related to complications and mental health that can be used to design a cost-effective screening program, reducing the number of patients to be screened by 85%. This study outlines creation of an AI framework to develop protocols to better address mental health comorbidities that lead to complications development in the diabetes population. Future work is described that outlines potential lines of research and the need for better addressing the ‘people side’ of the equation. |
Tasks | Feature Engineering |
Published | 2018-10-06 |
URL | https://arxiv.org/abs/1810.03044v3 |
PDF | https://arxiv.org/pdf/1810.03044v3.pdf |
PWC | https://paperswithcode.com/paper/artificial-intelligence-for-diabetes-case |
Repo | |
Framework | |
A Comparative Study of Neural Network Models for Sentence Classification
Title | A Comparative Study of Neural Network Models for Sentence Classification |
Authors | Phuong Le-Hong, Anh-Cuong Le |
Abstract | This paper presents an extensive comparative study of four neural network models, including feed-forward networks, convolutional networks, recurrent networks and long short-term memory networks, on two sentence classification datasets of English and Vietnamese text. We show that on the English dataset, the convolutional network models without any feature engineering outperform some competitive sentence classifiers with rich hand-crafted linguistic features. We demonstrate that the GloVe word embeddings are consistently better than both Skip-gram word embeddings and word count vectors. We also show the superiority of convolutional neural network models on a Vietnamese newspaper sentence dataset over strong baseline models. Our experimental results suggest some good practices for applying neural network models in sentence classification. |
Tasks | Feature Engineering, Sentence Classification, Word Embeddings |
Published | 2018-10-03 |
URL | http://arxiv.org/abs/1810.01656v1 |
PDF | http://arxiv.org/pdf/1810.01656v1.pdf |
PWC | https://paperswithcode.com/paper/a-comparative-study-of-neural-network-models |
Repo | |
Framework | |
Generating Titles for Web Tables
Title | Generating Titles for Web Tables |
Authors | Braden Hancock, Hongrae Lee, Cong Yu |
Abstract | Descriptive titles provide crucial context for interpreting tables that are extracted from web pages and are a key component of table-based web applications. Prior approaches have attempted to produce titles by selecting existing text snippets associated with the table. These approaches, however, are limited by their dependence on suitable titles existing a priori. In our user study, we observe that the relevant information for the title tends to be scattered across the page, and often (more than 80% of the time) does not appear verbatim anywhere in the page. We propose instead the application of a sequence-to-sequence neural network model as a more generalizable means of generating high-quality titles. This is accomplished by extracting many text snippets that have potentially relevant information to the table, encoding them into an input sequence, and using both copy and generation mechanisms in the decoder to balance relevance and readability of the generated title. We validate this approach with human evaluation on sample web tables and report that while sequence models with only a copy mechanism or only a generation mechanism are easily outperformed by simple selection-based baselines, the model with both capabilities outperforms them all, approaching the quality of crowdsourced titles while training on fewer than ten thousand examples. To the best of our knowledge, the proposed technique is the first to consider text generation methods for table titles and establishes a new state of the art. |
Tasks | Text Generation |
Published | 2018-06-30 |
URL | https://arxiv.org/abs/1807.00099v2 |
PDF | https://arxiv.org/pdf/1807.00099v2.pdf |
PWC | https://paperswithcode.com/paper/title-generation-for-web-tables |
Repo | |
Framework | |
Monitoring spatial sustainable development: Semi-automated analysis of satellite and aerial images for energy transition and sustainability indicators
Title | Monitoring spatial sustainable development: Semi-automated analysis of satellite and aerial images for energy transition and sustainability indicators |
Authors | R. L. Curier, T. J. A. De Jong, Katharina Strauch, Katharina Cramer, Natalie Rosenski, Clara Schartner, M. Debusschere, Hannah Ziemons, Deniz Iren, Stefano Bromuri |
Abstract | Solar panels are installed by a large and growing number of households due to the convenience of having cheap and renewable energy to power house appliances. In contrast to other energy sources, solar installations are highly decentralized and spread over hundreds of thousands of locations. On a global level, more than 25% of solar photovoltaic (PV) installations were decentralized. The effect of the quick energy transition from a carbon-based economy to a green economy is, however, still very difficult to quantify. As a matter of fact, the quick adoption of solar panels by households is difficult to track, with local registries that miss a large number of the newly built solar panels. This makes assessing the impact of renewable energies an impossible task. Although models of the output of a region exist, they are often black box estimations. This project’s aim is twofold: first, automate the process to extract the location of solar panels from aerial or satellite images and second, produce a map of solar panels along with statistics on the number of solar panels. Further, this project takes place in a wider framework which investigates how official statistics can benefit from new digital data sources. At project completion, a method for detecting solar panels from aerial images via machine learning will be developed and the methodology initially developed for BE, DE and NL will be standardized for application to other EU countries. In practice, machine learning techniques are used to identify solar panels in satellite and aerial images for the province of Limburg (NL), Flanders (BE) and North Rhine-Westphalia (DE). |
Tasks | |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.04881v1 |
PDF | http://arxiv.org/pdf/1810.04881v1.pdf |
PWC | https://paperswithcode.com/paper/monitoring-spatial-sustainable-development |
Repo | |
Framework | |
Automatic Inspection of Utility Scale Solar Power Plants using Deep Learning
Title | Automatic Inspection of Utility Scale Solar Power Plants using Deep Learning |
Authors | Alekh Karkada Ashok, Chandan G, Adithya Bhat, Kausthubh Karnataki, Ganesh Shankar |
Abstract | Solar energy has the potential to become the backbone energy source for the world. Utility scale solar power plants (more than 50 MW) could have more than 100K individual solar modules and be spread over more than 200 acres of land. Traditional methods of monitoring each module become too costly at the utility scale. We demonstrate an alternative using the recent advances in deep learning to automatically analyze drone footage. We show that this can be a quick and reliable alternative. We show that it can save huge amounts of power and hugely impact the developing world. |
Tasks | |
Published | 2018-12-20 |
URL | http://arxiv.org/abs/1902.04132v1 |
PDF | http://arxiv.org/pdf/1902.04132v1.pdf |
PWC | https://paperswithcode.com/paper/automatic-inspection-of-utility-scale-solar |
Repo | |
Framework | |
MPNA: A Massively-Parallel Neural Array Accelerator with Dataflow Optimization for Convolutional Neural Networks
Title | MPNA: A Massively-Parallel Neural Array Accelerator with Dataflow Optimization for Convolutional Neural Networks |
Authors | Muhammad Abdullah Hanif, Rachmad Vidya Wicaksana Putra, Muhammad Tanvir, Rehan Hafiz, Semeen Rehman, Muhammad Shafique |
Abstract | The state-of-the-art accelerators for Convolutional Neural Networks (CNNs) typically focus on accelerating only the convolutional layers, but do not prioritize the fully-connected layers much. Hence, they lack a synergistic optimization of the hardware architecture and diverse dataflows for the complete CNN design, which can provide a higher potential for performance/energy efficiency. Towards this, we propose a novel Massively-Parallel Neural Array (MPNA) accelerator that integrates two heterogeneous systolic arrays and respective highly-optimized dataflow patterns to jointly accelerate both the convolutional (CONV) and the fully-connected (FC) layers. Besides fully exploiting the available off-chip memory bandwidth, these optimized dataflows enable high data-reuse of all the data types (i.e., weights, input and output activations), and thereby enable our MPNA to achieve high energy savings. We synthesized our MPNA architecture using the ASIC design flow for a 28nm technology, and performed functional and timing validation using multiple real-world complex CNNs. MPNA achieves 149.7GOPS/W at 280MHz and consumes 239mW. Experimental results show that our MPNA architecture provides a 1.7x overall performance improvement compared to a state-of-the-art accelerator, and a 51% energy saving compared to the baseline architecture. |
Tasks | |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12910v1 |
PDF | http://arxiv.org/pdf/1810.12910v1.pdf |
PWC | https://paperswithcode.com/paper/mpna-a-massively-parallel-neural-array |
Repo | |
Framework | |
Probabilistic Bisection with Spatial Metamodels
Title | Probabilistic Bisection with Spatial Metamodels |
Authors | Sergio Rodriguez, Mike Ludkovski |
Abstract | The Probabilistic Bisection Algorithm (PBA) performs root finding based on knowledge acquired from noisy oracle responses. We consider the generalized PBA setting (G-PBA) where the statistical distribution of the oracle is unknown and location-dependent, so that model inference and Bayesian knowledge updating must be performed simultaneously. To this end, we propose to leverage the spatial structure of a typical oracle by constructing a statistical surrogate for the underlying logistic regression step. We investigate several non-parametric surrogates, including Binomial Gaussian Processes (B-GP), Polynomial, Kernel, and Spline Logistic Regression. In parallel, we develop sampling policies that adaptively balance learning the oracle distribution and learning the root. One of our proposals mimics active learning with B-GPs and provides a novel look-ahead predictive variance formula. The resulting gains of our Spatial PBA algorithm relative to earlier G-PBA models are illustrated with synthetic examples and a challenging stochastic root finding problem from Bermudan option pricing. |
Tasks | Active Learning, Gaussian Processes |
Published | 2018-06-30 |
URL | http://arxiv.org/abs/1807.00095v1 |
PDF | http://arxiv.org/pdf/1807.00095v1.pdf |
PWC | https://paperswithcode.com/paper/probabilistic-bisection-with-spatial |
Repo | |
Framework | |
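For readers unfamiliar with the baseline that the abstract above generalizes, here is a minimal sketch of the classic probabilistic bisection update, assuming the oracle's accuracy `p` is known and constant. That is exactly the assumption the G-PBA setting drops, so this is not the paper's Spatial PBA. The grid discretization, function names, and parameter values are illustrative assumptions.

```python
import random

def posterior_median(xs, density):
    """Smallest grid point whose cumulative posterior mass reaches 0.5."""
    cum = 0.0
    for x, d in zip(xs, density):
        cum += d
        if cum >= 0.5:
            return x
    return xs[-1]

def probabilistic_bisection(oracle, p, n_steps, grid_size=1000):
    """Classic PBA on [0, 1] with a known oracle accuracy p in (0.5, 1).

    oracle(x) returns +1 if it claims the root lies to the right of x and
    -1 otherwise; each claim is assumed correct with probability p.
    """
    xs = [(i + 0.5) / grid_size for i in range(grid_size)]
    density = [1.0 / grid_size] * grid_size  # uniform prior over grid cells
    for _ in range(n_steps):
        query = posterior_median(xs, density)  # always query at the median
        answer = oracle(query)
        # Bayes update: weight p on the side the oracle points to, 1 - p on the other.
        for i, x in enumerate(xs):
            on_right = x > query
            agrees = (answer == 1) == on_right
            density[i] *= p if agrees else (1.0 - p)
        total = sum(density)
        density = [d / total for d in density]
    return posterior_median(xs, density)

# Hypothetical noisy oracle for a root at 0.3 that answers correctly 80% of the time.
rng = random.Random(1)
ROOT, ACCURACY = 0.3, 0.8

def noisy_oracle(x):
    truth = 1 if ROOT > x else -1
    return truth if rng.random() < ACCURACY else -truth

estimate = probabilistic_bisection(noisy_oracle, p=ACCURACY, n_steps=200)
print("estimated root:", estimate)
```

Querying at the posterior median is what makes each noisy answer maximally informative when `p` is known; with an unknown, location-dependent `p`, both the update weights and the query rule would have to be learned, which is the problem the abstract addresses.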
Ising distribution as a latent variable model
Title | Ising distribution as a latent variable model |
Authors | Adrien Wohrer |
Abstract | During the past decades, the Ising distribution has attracted interest in many applied disciplines, as the maximum entropy distribution associated with any set of correlated binary ('spin') variables with observed means and covariances. However, numerically speaking, the Ising distribution is impractical, so alternative models are often preferred to handle correlated binary data. One popular alternative, especially in life sciences, is the Cox distribution (or the closely related dichotomized Gaussian distribution and log-normal Cox point process), where the spins are generated independently conditioned on the drawing of a latent variable with a multivariate normal distribution. This article explores the conditions for a principled replacement of the Ising distribution by a Cox distribution. It shows that the Ising distribution itself can be treated as a latent variable model, and it explores when this latent variable has a quasi-normal distribution. A variational approach to this question reveals a formal link with classic mean-field methods, especially Opper and Winther's adaptive TAP approximation. This link is confirmed by weak coupling (Plefka) expansions of the different approximations and then by numerical tests. Overall, this study suggests that an Ising distribution can be replaced by a Cox distribution in practical applications, precisely when its parameters lie in the 'mean-field domain'. |
Tasks | |
Published | 2018-03-07 |
URL | https://arxiv.org/abs/1803.02598v4 |
PDF | https://arxiv.org/pdf/1803.02598v4.pdf |
PWC | https://paperswithcode.com/paper/the-ising-distribution-as-a-latent-variable |
Repo | |
Framework | |
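For reference, the Ising distribution mentioned in the abstract above is usually written in the following standard form. The symbols $h_i$ (fields), $J_{ij}$ (couplings), and the spin convention $s_i \in \{-1, +1\}$ are conventional notation assumed here, not taken from the listing; the paper may use a different convention.

```latex
P(\mathbf{s}) = \frac{1}{Z(\mathbf{h}, \mathbf{J})}
  \exp\!\Big( \sum_{i} h_i s_i + \sum_{i<j} J_{ij}\, s_i s_j \Big),
\qquad
Z(\mathbf{h}, \mathbf{J}) = \sum_{\mathbf{s} \in \{-1,+1\}^N}
  \exp\!\Big( \sum_{i} h_i s_i + \sum_{i<j} J_{ij}\, s_i s_j \Big)
```

The sum in $Z$ runs over all $2^N$ spin configurations, which is the source of the numerical impracticality the abstract refers to.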
Perturbation Analysis of Learning Algorithms: A Unifying Perspective on Generation of Adversarial Examples
Title | Perturbation Analysis of Learning Algorithms: A Unifying Perspective on Generation of Adversarial Examples |
Authors | Emilio Rafael Balda, Arash Behboodi, Rudolf Mathar |
Abstract | Despite the tremendous success of deep neural networks in various learning problems, it has been observed that adding an intentionally designed adversarial perturbation to inputs of these architectures leads to erroneous classification with high confidence in the prediction. In this work, we propose a general framework based on the perturbation analysis of learning algorithms which consists of convex programming and is able to recover many current adversarial attacks as special cases. The framework can be used to propose novel attacks against learning algorithms for classification and regression tasks under various new constraints with closed form solutions in many instances. In particular, we derive new attacks against classification algorithms which are shown to achieve comparable performance to notable existing attacks. The framework is then used to generate adversarial perturbations for regression tasks which include single pixel and single subset attacks. By applying this method to autoencoding and image colorization tasks, it is shown that adversarial perturbations can effectively perturb the output of regression tasks as well. |
Tasks | Colorization |
Published | 2018-12-15 |
URL | http://arxiv.org/abs/1812.07385v1 |
PDF | http://arxiv.org/pdf/1812.07385v1.pdf |
PWC | https://paperswithcode.com/paper/perturbation-analysis-of-learning-algorithms |
Repo | |
Framework | |
Safe Element Screening for Submodular Function Minimization
Title | Safe Element Screening for Submodular Function Minimization |
Authors | Weizhong Zhang, Bin Hong, Lin Ma, Wei Liu, Tong Zhang |
Abstract | Submodular functions are discrete analogs of convex functions, which have applications in various fields, including machine learning and computer vision. However, in large-scale applications, solving Submodular Function Minimization (SFM) problems remains challenging. In this paper, we make the first attempt to extend the emerging technique named screening in large-scale sparse learning to SFM for accelerating its optimization process. We first conduct a careful study of the relationships between SFM and the corresponding convex proximal problems, as well as the accurate primal optimum estimation of the proximal problems. Relying on this study, we subsequently propose a novel safe screening method to quickly identify the elements guaranteed to be included (we refer to them as active) or excluded (inactive) in the final optimal solution of SFM during the optimization process. By removing the inactive elements and fixing the active ones, the problem size can be dramatically reduced, leading to great savings in the computational cost without sacrificing any accuracy. To the best of our knowledge, the proposed method is the first screening method in the fields of SFM and even combinatorial optimization, thus pointing out a new direction for accelerating SFM algorithms. Experimental results on both synthetic and real datasets demonstrate the significant speedups gained by our approach. |
Tasks | Combinatorial Optimization, Sparse Learning |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08527v4 |
PDF | http://arxiv.org/pdf/1805.08527v4.pdf |
PWC | https://paperswithcode.com/paper/safe-element-screening-for-submodular |
Repo | |
Framework | |
A flexible model for training action localization with varying levels of supervision
Title | A flexible model for training action localization with varying levels of supervision |
Authors | Guilhem Chéron, Jean-Baptiste Alayrac, Ivan Laptev, Cordelia Schmid |
Abstract | Spatio-temporal action detection in videos is typically addressed in a fully-supervised setup with manual annotation of training videos required at every frame. Since such annotation is extremely tedious and prohibits scalability, there is a clear need to minimize the amount of manual supervision. In this work we propose a unifying framework that can handle and combine varying types of less-demanding weak supervision. Our model is based on discriminative clustering and integrates different types of supervision as constraints on the optimization. We investigate applications of such a model to training setups with alternative supervisory signals ranging from video-level class labels to the full per-frame annotation of action bounding boxes. Experiments on the challenging UCF101-24 and DALY datasets demonstrate competitive performance of our method at a fraction of supervision used by previous methods. The flexibility of our model enables joint learning from data with different levels of annotation. Experimental results demonstrate a significant gain by adding a few fully supervised examples to otherwise weakly labeled videos. |
Tasks | Action Detection, Action Localization |
Published | 2018-06-29 |
URL | http://arxiv.org/abs/1806.11328v2 |
PDF | http://arxiv.org/pdf/1806.11328v2.pdf |
PWC | https://paperswithcode.com/paper/a-flexible-model-for-training-action |
Repo | |
Framework | |
Exploiting Edge Features in Graph Neural Networks
Title | Exploiting Edge Features in Graph Neural Networks |
Authors | Liyu Gong, Qiang Cheng |
Abstract | Edge features contain important information about graphs. However, current state-of-the-art neural network models designed for graph learning, e.g. graph convolutional networks (GCN) and graph attention networks (GAT), do not adequately utilize edge features, especially multi-dimensional edge features. In this paper, we build a new framework for a family of new graph neural network models that can more sufficiently exploit edge features, including those of undirected or multi-dimensional edges. The proposed framework can consolidate current graph neural network models, e.g. graph convolutional networks (GCN) and graph attention networks (GAT). The proposed framework and new models have the following novelties: First, we propose to use doubly stochastic normalization of graph edge features instead of the commonly used row or symmetric normalization approaches used in current graph neural networks. Second, we construct new formulas for the operations in each individual layer so that they can handle multi-dimensional edge features. Third, for the proposed new framework, edge features are adaptive across network layers. As a result, our proposed new framework and new models can exploit a rich source of graph information. We apply our new models to graph node classification on several citation networks, whole graph classification, and regression on several molecular datasets. Compared with the current state-of-the-art methods, i.e. GCNs and GAT, our models obtain better performance, which testifies to the importance of exploiting edge features in graph neural networks. |
Tasks | Graph Classification, Node Classification |
Published | 2018-09-07 |
URL | http://arxiv.org/abs/1809.02709v2 |
PDF | http://arxiv.org/pdf/1809.02709v2.pdf |
PWC | https://paperswithcode.com/paper/exploiting-edge-features-in-graph-neural |
Repo | |
Framework | |
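The abstract above contrasts doubly stochastic normalization with row normalization but does not give the formula used. As a rough illustration of what "doubly stochastic" means for a matrix of edge weights, the sketch below uses Sinkhorn-Knopp iteration, a standard technique swapped in here for illustration; the paper's own normalization may differ. The function name, iteration count, and example matrix are assumptions.

```python
def sinkhorn_normalize(matrix, n_iters=200):
    """Scale a square matrix with strictly positive entries so that every
    row and every column sums to (approximately) 1, by alternating row and
    column normalization (Sinkhorn-Knopp iteration)."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    for _ in range(n_iters):
        # Row step: make each row sum to 1.
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        # Column step: make each column sum to 1.
        for j in range(n):
            col_sum = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= col_sum
    return m

# Hypothetical 3-node graph with positive edge weights (one scalar feature per edge).
edge_weights = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 10.0],
]
normalized = sinkhorn_normalize(edge_weights)
row_sums = [sum(row) for row in normalized]
col_sums = [sum(normalized[i][j] for i in range(3)) for j in range(3)]
print("row sums:", row_sums)
print("column sums:", col_sums)
```

Row normalization alone only fixes the row sums, so a node with many strong incoming edges can still dominate; constraining both rows and columns removes that asymmetry. Multi-dimensional edge features would need this applied per feature channel, which the sketch does not cover.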