Paper Group ANR 29
Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
Title | Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks |
Authors | Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan |
Abstract | A fundamental question in modern machine learning is how deep neural networks can generalize. We address this question using 1) an equivalence between training infinitely wide neural networks and performing kernel regression with a deterministic kernel called the Neural Tangent Kernel (NTK) (Jacot et al. 2018), and 2) theoretical tools from statistical physics. We derive analytical expressions for learning curves for kernel regression, and use them to evaluate how the test loss of a trained neural network depends on the number of samples. Our approach allows us not only to compute the total test risk but also its decomposition into contributions from the different spectral components of the kernel. Complementary to recent results showing that during gradient descent neural networks fit low-frequency components first, we identify a new type of frequency principle: as the size of the training set grows, kernel machines and neural networks begin to fit successively higher frequency modes of the target function. We verify our theory with simulations of kernel regression and of training wide artificial neural networks. |
Tasks | |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02561v2 |
https://arxiv.org/pdf/2002.02561v2.pdf | |
PWC | https://paperswithcode.com/paper/spectrum-dependent-learning-curves-in-kernel |
Repo | |
Framework | |
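The frequency principle above is easy to probe empirically. The sketch below (our illustration, not the authors' code) traces a kernel-regression learning curve on a toy target mixing a low- and a high-frequency mode; as the training set grows, the test error drops in stages as successively higher modes are captured. The target function, kernel width, and ridge parameter are all illustrative choices.

```python
# Minimal learning-curve sketch for kernel regression on a toy target.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

def target(x):
    # Toy target mixing a low- and a high-frequency mode.
    return np.sin(2 * np.pi * x) + 0.3 * np.sin(10 * np.pi * x)

x_test = np.linspace(0, 1, 500)[:, None]
y_test = target(x_test.ravel())

for n in [5, 10, 20, 40, 80, 160]:
    x_train = rng.uniform(0, 1, size=(n, 1))
    y_train = target(x_train.ravel())
    model = KernelRidge(kernel="rbf", gamma=20.0, alpha=1e-6)
    model.fit(x_train, y_train)
    mse = np.mean((model.predict(x_test) - y_test) ** 2)
    print(f"n={n:4d}  test MSE={mse:.4f}")
```

At small n only the sin(2πx) mode is captured; the 10πx mode is fit only once the training set is large enough, which is the spectral picture the paper formalizes.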
Lipschitz standardization for robust multivariate learning
Title | Lipschitz standardization for robust multivariate learning |
Authors | Adrián Javaloy, Isabel Valera |
Abstract | Current trends in machine learning rely on out-of-the-box gradient-based approaches. With the aim of mitigating numerical errors and improving the convergence of the learning process, a common empirical practice is to standardize or normalize the data. However, there is a lack of theoretical analysis of why and when these methods improve the learning process. In this work, we first study these methods in the context of black-box variational inference, specifically analyzing the effect that scaling the data has on the smoothness of the optimization landscape. Our analysis shows that no general rule can decide which of the existing data scaling methods will improve the learning process, or even whether any of them will. Second, we highlight the issues that arise when dealing with multivariate data, due to the discrepancy in smoothness of the likelihood functions for different variables, and the inability to scale discrete data. Finally, we propose a novel Lipschitz standardization, and its extension for discrete data, which overcomes the aforementioned limitations. Specifically, as backed by our experiments, Lipschitz standardization i) favors fairer learning across the different variables in the data; and ii) results in faster and more accurate learning. |
Tasks | |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11369v1 |
https://arxiv.org/pdf/2002.11369v1.pdf | |
PWC | https://paperswithcode.com/paper/lipschitz-standardization-for-robust |
Repo | |
Framework | |
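As a rough illustration of the motivation (not the paper's algorithm), the sketch below shows how, for a plain squared loss, the gradient's Lipschitz constant along each coordinate grows with that variable's second moment, so rescaling by an estimate of the per-coordinate curvature equalizes smoothness across variables.

```python
# For L(w) = E[(w.x - y)^2], the Hessian diagonal is 2 E[x_j^2]: badly scaled
# variables dominate the smoothness of the landscape. Equalizing curvature
# (rather than variance) is the spirit of Lipschitz standardization; this is an
# illustration of the motivation, not the paper's actual method.
import numpy as np

rng = np.random.default_rng(0)
X = np.stack([rng.normal(0, 1, 2000),        # well-scaled variable
              rng.normal(0, 100, 2000)], 1)  # badly scaled variable

# Per-coordinate gradient-Lipschitz proxy (diagonal of the squared-loss Hessian).
curvature = 2 * np.mean(X**2, axis=0)
print("per-coordinate Lipschitz proxies:", curvature)

# Rescale so every coordinate has the same estimated smoothness.
X_lip = X / np.sqrt(curvature / curvature.min())
print("after rescaling:", 2 * np.mean(X_lip**2, axis=0))
```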
Distilling portable Generative Adversarial Networks for Image Translation
Title | Distilling portable Generative Adversarial Networks for Image Translation |
Authors | Hanting Chen, Yunhe Wang, Han Shu, Changyuan Wen, Chunjing Xu, Boxin Shi, Chao Xu, Chang Xu |
Abstract | Although Generative Adversarial Networks (GANs) have been widely used in various image-to-image translation tasks, they can hardly be applied on mobile devices due to their heavy computation and storage cost. Traditional network compression methods focus on visual recognition tasks and rarely deal with generation tasks. Inspired by knowledge distillation, a student generator with fewer parameters is trained by inheriting the low-level and high-level information from the original heavy teacher generator. To promote the capability of the student generator, we include a student discriminator to measure the distances between real images and the images generated by the student and teacher generators. An adversarial learning process is therefore established to optimize the student generator and student discriminator. Qualitative and quantitative experiments on benchmark datasets demonstrate that the proposed method can learn portable generative models with strong performance. |
Tasks | Image-to-Image Translation |
Published | 2020-03-07 |
URL | https://arxiv.org/abs/2003.03519v1 |
https://arxiv.org/pdf/2003.03519v1.pdf | |
PWC | https://paperswithcode.com/paper/distilling-portable-generative-adversarial |
Repo | |
Framework | |
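The adversarial distillation setup described in the abstract can be summarized in a few lines. The PyTorch sketch below is a hedged reading of that setup: `student_g`, `teacher_g`, and `student_d` are placeholder modules, the L1 imitation term and hinge adversarial loss are common choices rather than the paper's exact objectives, and `lam` is a hypothetical weighting.

```python
# One plausible instantiation of GAN distillation: the student imitates the
# teacher's outputs while a student discriminator pushes student images toward
# the real/teacher distribution. Not the paper's architecture or exact losses.
import torch
import torch.nn.functional as F

def student_generator_loss(student_g, teacher_g, student_d, x, lam=10.0):
    fake_s = student_g(x)
    with torch.no_grad():
        fake_t = teacher_g(x)                  # teacher acts as a fixed target
    distill = F.l1_loss(fake_s, fake_t)        # inherit the teacher's outputs
    adv = -student_d(fake_s).mean()            # fool the student discriminator
    return adv + lam * distill

def student_discriminator_loss(student_d, student_g, teacher_g, x, real):
    with torch.no_grad():
        fake_s, fake_t = student_g(x), teacher_g(x)
    # Hinge loss: real and teacher images count as "real", student images as "fake".
    return (F.relu(1 - student_d(real)).mean()
            + F.relu(1 - student_d(fake_t)).mean()
            + F.relu(1 + student_d(fake_s)).mean())
```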
Identification of AC Networks via Online Learning
Title | Identification of AC Networks via Online Learning |
Authors | Emanuele Fabbiani, Pulkit Nahata, Giuseppe De Nicolao, Giancarlo Ferrari-Trecate |
Abstract | The increasing integration of intermittent renewable generation in power networks calls for novel planning and control methodologies, which hinge on detailed knowledge of the grid. However, reliable information concerning the system topology and parameters may be missing or outdated for temporally varying AC networks. This paper proposes an online learning procedure to estimate the admittance matrix of an AC network capturing topological information and line parameters. We start off by providing a recursive identification algorithm that exploits phasor measurements of voltages and currents. With the goal of accelerating convergence, we subsequently complement our base algorithm with a design-of-experiment procedure, which maximizes the information content of data at each step by computing optimal voltage excitations. Our approach improves on existing techniques and its effectiveness is substantiated by numerical studies on a 6-bus AC network. |
Tasks | |
Published | 2020-03-13 |
URL | https://arxiv.org/abs/2003.06210v1 |
https://arxiv.org/pdf/2003.06210v1.pdf | |
PWC | https://paperswithcode.com/paper/identification-of-ac-networks-via-online |
Repo | |
Framework | |
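The recursive identification step can be illustrated with a small simulation. In the sketch below (our reading, not the authors' code), current and voltage phasors obey i = Y v, and a complex-valued recursive least-squares update tracks the admittance matrix; the design-of-experiment step that computes optimal voltage excitations is omitted.

```python
# Recursive least squares for the bus admittance matrix from phasor data i = Y v.
import numpy as np

rng = np.random.default_rng(0)
n = 6                                    # 6-bus network, as in the paper's study
Y_true = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
Y_true = (Y_true + Y_true.T) / 2         # bus admittance matrices are symmetric

Y_hat = np.zeros((n, n), dtype=complex)
P = 1e3 * np.eye(n, dtype=complex)       # RLS "covariance" of the regressors

for _ in range(500):
    v = rng.normal(size=n) + 1j * rng.normal(size=n)          # voltage phasors
    i = Y_true @ v + 0.01 * (rng.normal(size=n) + 1j * rng.normal(size=n))
    k = P @ v / (1.0 + v.conj() @ P @ v)                      # RLS gain
    Y_hat += np.outer(i - Y_hat @ v, k.conj())                # correction step
    P -= np.outer(k, v.conj() @ P)

print("relative error:", np.linalg.norm(Y_hat - Y_true) / np.linalg.norm(Y_true))
```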
Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows
Title | Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows |
Authors | Gene Ryan Yoo, Houman Owhadi |
Abstract | We introduce a new regularization method for Artificial Neural Networks (ANNs) based on Kernel Flows (KFs). KFs were introduced as a method for kernel selection in regression/kriging based on the minimization of the loss of accuracy incurred by halving the number of interpolation points in random batches of the dataset. Writing $f_\theta(x) = \big(f^{(n)}_{\theta_n}\circ f^{(n-1)}_{\theta_{n-1}} \circ \dots \circ f^{(1)}_{\theta_1}\big)(x)$ for the functional representation of the compositional structure of the ANN, the inner-layer outputs $h^{(i)}(x) = \big(f^{(i)}_{\theta_i}\circ f^{(i-1)}_{\theta_{i-1}} \circ \dots \circ f^{(1)}_{\theta_1}\big)(x)$ define a hierarchy of feature maps and kernels $k^{(i)}(x,x') = \exp\big(-\gamma_i \|h^{(i)}(x)-h^{(i)}(x')\|_2^2\big)$. When combined with a batch of the dataset, these kernels produce KF losses $e_2^{(i)}$ (the $L^2$ regression error incurred by using a random half of the batch to predict the other half), which depend on the parameters of the inner layers $\theta_1,\ldots,\theta_i$ (and on $\gamma_i$). The proposed method simply consists of aggregating a subset of these KF losses with a classical output loss. We test the proposed method on CNNs and WRNs without altering their structure or output classifier and report reduced test errors, decreased generalization gaps, and increased robustness to distribution shift without significant increase in computational complexity. We suspect that these results might be explained by the fact that while conventional training only employs a linear functional (a generalized moment) of the empirical distribution defined by the dataset and can be prone to trapping in the Neural Tangent Kernel regime (under over-parameterization), the proposed loss function (defined as a nonlinear functional of the empirical distribution) effectively trains the underlying kernel defined by the CNN beyond regressing the data with that kernel. |
Tasks | |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.08335v1 |
https://arxiv.org/pdf/2002.08335v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-regularization-and-direct-training-of |
Repo | |
Framework | |
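The KF loss $e_2^{(i)}$ is concrete enough to sketch directly from the definitions above. In the NumPy snippet below, `h` stands for the output of some inner layer on a batch and `y` for the targets; the snippet forms the Gaussian kernel $k^{(i)}$ and evaluates the half-versus-half regression error. Shapes and the regularizer are illustrative.

```python
# KF loss e2: L2 error of predicting a random half of a batch from the other
# half by kernel regression with the Gaussian kernel on inner-layer features.
import numpy as np

def kf_loss(h, y, gamma, reg=1e-8, rng=np.random.default_rng(0)):
    """h: (batch, dim) inner-layer outputs; y: (batch, outputs) targets."""
    b = len(h)
    idx = rng.permutation(b)
    half, rest = idx[: b // 2], idx[b // 2:]
    sq = np.sum((h[:, None, :] - h[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq)                       # k(x, x') from the abstract
    # Kernel regression: predict the held-out half from the kept half.
    alpha = np.linalg.solve(K[np.ix_(half, half)] + reg * np.eye(len(half)),
                            y[half])
    pred = K[np.ix_(rest, half)] @ alpha
    return np.sum((pred - y[rest]) ** 2)          # the L2 regression error e2
```

In the proposed method, a subset of these losses (one per inner layer, with its own $\gamma_i$) is simply added to the usual output loss.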
Getting Fairness Right: Towards a Toolbox for Practitioners
Title | Getting Fairness Right: Towards a Toolbox for Practitioners |
Authors | Boris Ruf, Chaouki Boutharouite, Marcin Detyniecki |
Abstract | The potential risk of AI systems unintentionally embedding and reproducing bias has attracted the attention of machine learning practitioners and society at large. As policy makers move to set standards for algorithms and AI techniques, the issue of how to refine existing regulation, in order to enforce that decisions made by automated systems are fair and non-discriminatory, is again critical. Meanwhile, researchers have demonstrated that the various existing metrics for fairness are statistically mutually exclusive and that the right choice mostly depends on the use case and the definition of fairness. Recognizing that the solutions for implementing fair AI are not purely mathematical but require commitment from the stakeholders to define the desired nature of fairness, this paper proposes to draft a toolbox which helps practitioners to ensure fair AI practices. Based on the nature of the application and the available training data, but also on legal requirements and ethical, philosophical and cultural dimensions, the toolbox aims to identify the most appropriate fairness objective. This approach attempts to structure the complex landscape of fairness metrics and therefore makes the different available options more accessible to non-technical people. In the proven absence of a silver-bullet solution for fair AI, this toolbox intends to produce the fairest AI systems possible with respect to their local context. |
Tasks | |
Published | 2020-03-15 |
URL | https://arxiv.org/abs/2003.06920v1 |
https://arxiv.org/pdf/2003.06920v1.pdf | |
PWC | https://paperswithcode.com/paper/getting-fairness-right-towards-a-toolbox-for |
Repo | |
Framework | |
Algorithmic Fairness from a Non-ideal Perspective
Title | Algorithmic Fairness from a Non-ideal Perspective |
Authors | Sina Fazelpour, Zachary C. Lipton |
Abstract | Inspired by recent breakthroughs in predictive modeling, practitioners in both industry and government have turned to machine learning with hopes of operationalizing predictions to drive automated decisions. Unfortunately, many social desiderata concerning consequential decisions, such as justice or fairness, have no natural formulation within a purely predictive framework. In efforts to mitigate these problems, researchers have proposed a variety of metrics for quantifying deviations from various statistical parities that we might expect to observe in a fair world and offered a variety of algorithms in attempts to satisfy subsets of these parities or to trade off the degree to which they are satisfied against utility. In this paper, we connect this approach to \emph{fair machine learning} to the literature on ideal and non-ideal methodological approaches in political philosophy. The ideal approach requires positing the principles according to which a just world would operate. In the most straightforward application of ideal theory, one supports a proposed policy by arguing that it closes a discrepancy between the real and the perfectly just world. However, by failing to account for the mechanisms by which our non-ideal world arose, the responsibilities of various decision-makers, and the impacts of proposed policies, naive applications of ideal thinking can lead to misguided interventions. In this paper, we demonstrate a connection between the fair machine learning literature and the ideal approach in political philosophy, and argue that the increasingly apparent shortcomings of proposed fair machine learning algorithms reflect broader troubles faced by the ideal approach. We conclude with a critical discussion of the harms of misguided solutions, a reinterpretation of impossibility results, and directions for future research. |
Tasks | |
Published | 2020-01-08 |
URL | https://arxiv.org/abs/2001.09773v1 |
https://arxiv.org/pdf/2001.09773v1.pdf | |
PWC | https://paperswithcode.com/paper/algorithmic-fairness-from-a-non-ideal |
Repo | |
Framework | |
CBIR using features derived by Deep Learning
Title | CBIR using features derived by Deep Learning |
Authors | Subhadip Maji, Smarajit Bose |
Abstract | In a Content-Based Image Retrieval (CBIR) system, the task is to retrieve similar images from a large database given a query image. The usual procedure is to extract some useful features from the query image and retrieve images which have a similar set of features. For this purpose, a suitable similarity measure is chosen, and images with high similarity scores are retrieved. Naturally, the choice of these features plays a very important role in the success of such a system, and high-level features are required to reduce the semantic gap. In this paper, we propose to use features derived from a pre-trained deep convolutional network trained on a large image classification problem. This approach appears to produce vastly superior results for a variety of databases and outperforms many contemporary CBIR systems. We analyse the retrieval time of the method and also propose a pre-clustering of the database based on the above-mentioned features, which yields comparable results in a much shorter time in most cases. |
Tasks | Content-Based Image Retrieval, Image Classification, Image Retrieval |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.07877v1 |
https://arxiv.org/pdf/2002.07877v1.pdf | |
PWC | https://paperswithcode.com/paper/cbir-using-features-derived-by-deep-learning |
Repo | |
Framework | |
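A hedged sketch of the described pipeline: embed every image with a network pre-trained for large-scale classification, then rank the database by cosine similarity to the query embedding. The backbone, pooling layer, and input preprocessing below are illustrative stand-ins; the paper's exact feature layer may differ.

```python
# CBIR with pre-trained CNN features and cosine-similarity ranking.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()        # keep the 2048-d pooled features
backbone.eval()

@torch.no_grad()
def embed(batch):                        # batch: (N, 3, 224, 224), normalized
    return F.normalize(backbone(batch), dim=1)

def retrieve(query, database_feats, k=5):
    sims = database_feats @ embed(query).T       # cosine similarity (unit norms)
    return sims.squeeze(1).topk(k).indices       # indices of the top-k images
```

The proposed pre-clustering would additionally group `database_feats` (e.g., with k-means) so that only the query's nearest cluster needs to be scanned.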
Spectral neighbor joining for reconstruction of latent tree models
Title | Spectral neighbor joining for reconstruction of latent tree models |
Authors | Ariel Jaffe, Noah Amsel, Boaz Nadler, Joseph T. Chang, Yuval Kluger |
Abstract | A key assumption in multiple scientific applications is that the distribution of observed data can be modeled by a latent tree graphical model. An important example is phylogenetics, where the tree models the evolutionary lineages of various organisms. Given a set of independent realizations of the random variables at the leaves of the tree, a common task is to infer the underlying tree topology. In this work we develop Spectral Neighbor Joining (SNJ), a novel method to recover latent tree graphical models. In contrast to distance based methods, SNJ is based on a spectral measure of similarity between all pairs of observed variables. We prove that SNJ is consistent, and derive a sufficient condition for correct tree recovery from an estimated similarity matrix. Combining this condition with a concentration of measure result on the similarity matrix, we bound the number of samples required to recover the tree with high probability. We illustrate via extensive simulations that SNJ requires fewer samples to accurately recover trees in regimes where the tree contains a large number of leaves or long edges. We provide theoretical support for this observation by analyzing the model of a perfect binary tree. |
Tasks | |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12547v1 |
https://arxiv.org/pdf/2002.12547v1.pdf | |
PWC | https://paperswithcode.com/paper/spectral-neighbor-joining-for-reconstruction |
Repo | |
Framework | |
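Illustrative sketch only: SNJ's key object is a similarity matrix between all pairs of observed leaf variables, computed from the spectrum of their empirical pairwise statistics. Below, a product of top singular values of the empirical joint-distribution matrix is a plausible stand-in similarity, followed by standard neighbor joining on the induced distances via Biopython; the paper's actual spectral criterion and recovery guarantee differ.

```python
# Stand-in "spectral" similarity + neighbor joining; not the paper's algorithm.
import numpy as np
from Bio.Phylo.TreeConstruction import DistanceMatrix, DistanceTreeConstructor

def spectral_similarity(xi, xj, k):
    # Empirical joint distribution of two k-state leaf variables.
    P = np.zeros((k, k))
    for a, b in zip(xi, xj):
        P[a, b] += 1
    P /= len(xi)
    return np.prod(np.linalg.svd(P, compute_uv=False)[:2])  # top-2 singular values

def snj_like_tree(X, k, names):
    n = X.shape[0]                       # X: (leaves, samples), states in 0..k-1
    dist = [[0.0] * (i + 1) for i in range(n)]
    for i in range(n):
        for j in range(i):
            s = spectral_similarity(X[i], X[j], k)
            dist[i][j] = -np.log(max(s, 1e-12))  # similarity -> additive distance
    return DistanceTreeConstructor().nj(DistanceMatrix(names, dist))
```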
BCNet: Learning Body and Cloth Shape from A Single Image
Title | BCNet: Learning Body and Cloth Shape from A Single Image |
Authors | Boyi Jiang, Juyong Zhang, Yang Hong, Jinhao Luo, Ligang Liu, Hujun Bao |
Abstract | In this paper, we consider the problem of automatically reconstructing both garment and body shapes from a single near-front-view RGB image. To this end, we propose a layered garment representation on top of SMPL and make the skinning weights of the garment independent of the body mesh, which significantly improves the expressive ability of our garment model. Compared with existing methods, our method supports more garment categories, such as skirts, and recovers more accurate garment geometry. To train our model, we construct two large-scale datasets with ground-truth body and garment geometries as well as paired color images. Compared with single-mesh or non-parametric representations, our method achieves more flexible control with separate meshes, making applications like re-posing, garment transfer, and garment texture mapping possible. |
Tasks | |
Published | 2020-04-01 |
URL | https://arxiv.org/abs/2004.00214v1 |
https://arxiv.org/pdf/2004.00214v1.pdf | |
PWC | https://paperswithcode.com/paper/bcnet-learning-body-and-cloth-shape-from-a |
Repo | |
Framework | |
ARDA: Automatic Relational Data Augmentation for Machine Learning
Title | ARDA: Automatic Relational Data Augmentation for Machine Learning |
Authors | Nadiia Chepurko, Ryan Marcus, Emanuel Zgraggen, Raul Castro Fernandez, Tim Kraska, David Karger |
Abstract | Automatic machine learning (AML) is a family of techniques to automate the process of training predictive models, aiming to both improve performance and make machine learning more accessible. While many recent works have focused on aspects of the machine learning pipeline like model selection, hyperparameter tuning, and feature selection, relatively few works have focused on automatic data augmentation. Automatic data augmentation involves finding new features relevant to the user's predictive task with minimal "human-in-the-loop" involvement. We present ARDA, an end-to-end system that takes as input a dataset and a data repository, and outputs an augmented data set such that training a predictive model on this augmented dataset results in improved performance. Our system has two distinct components: (1) a framework to search and join data with the input data, based on various attributes of the input, and (2) an efficient feature selection algorithm that prunes out noisy or irrelevant features from the resulting join. We perform an extensive empirical evaluation of different system components and benchmark our feature selection algorithm on real-world datasets. |
Tasks | Data Augmentation, Feature Selection, Model Selection |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09758v1 |
https://arxiv.org/pdf/2003.09758v1.pdf | |
PWC | https://paperswithcode.com/paper/arda-automatic-relational-data-augmentation |
Repo | |
Framework | |
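The two components map naturally onto a short pipeline. The sketch below is our illustration under stated assumptions: candidate tables share a join key named `key`, and a random-forest importance threshold stands in for the paper's feature selection algorithm.

```python
# Join candidate tables onto the input data, then prune noisy features.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def augment_and_select(base: pd.DataFrame, repo: list[pd.DataFrame],
                       key: str, target: str):
    joined = base
    for table in repo:                       # (1) join on a shared key attribute
        if key in table.columns:
            joined = joined.merge(table, on=key, how="left")
    X = joined.drop(columns=[target, key]).fillna(0)
    y = joined[target]
    probe = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    keep = X.columns[probe.feature_importances_ >
                     probe.feature_importances_.mean()]  # (2) prune weak features
    score = cross_val_score(RandomForestClassifier(random_state=0),
                            X[keep], y, cv=5).mean()
    return joined[list(keep) + [target]], score
```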
Computer Aided Diagnosis for Spitzoid lesions classification using Artificial Intelligence techniques
Title | Computer Aided Diagnosis for Spitzoid lesions classification using Artificial Intelligence techniques |
Authors | Abir Belaala, Labib Sadek, Noureddine Zerhouni, Christine Devalland |
Abstract | Spitzoid lesions may be broadly categorized into Spitz Nevus, Atypical Spitz Tumors, and Spitz Melanomas. Classifying a lesion precisely as an Atypical Spitz Tumor (AST) is challenging and often requires the integration of clinical, histological, and immunohistochemical features to differentiate ASTs from regular Spitz nevi and malignant Spitz melanomas. This paper tests several artificial intelligence techniques in order to build a computer-aided diagnosis system, using a three-phase approach. In Phase I, collected data are preprocessed with a Synthetic Minority Oversampling TEchnique (SMOTE)-based method to treat the class-imbalance problem. A feature selection mechanism using a genetic algorithm (GA) is then applied in Phase II. Finally, in Phase III, ten-fold cross-validation is used to compare the performance of seven machine-learning algorithms for classification. The SMOTE-Multilayer Perceptron with 14 GA-selected features shows the highest classification accuracy (0.98), a sensitivity of 0.99, and a specificity of 0.98, outperforming the other Spitzoid lesion classification algorithms. |
Tasks | Feature Selection |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04745v1 |
https://arxiv.org/pdf/2003.04745v1.pdf | |
PWC | https://paperswithcode.com/paper/computer-aided-diagnosis-for-spitzoid-lesions |
Repo | |
Framework | |
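A sketch of the three-phase pipeline, with SMOTE oversampling (Phase I), a simple stand-in for the GA-based feature selection of Phase II, and ten-fold cross-validation of an MLP (Phase III). The dataset and the selector below are placeholders: the paper uses its own clinical data and a genetic algorithm to pick the 14 features.

```python
# Phase I-III pipeline sketch with imbalanced-learn and scikit-learn.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=30, weights=[0.9, 0.1],
                           random_state=0)      # imbalanced stand-in dataset

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),           # Phase I: rebalance classes
    ("select", SelectKBest(f_classif, k=14)),   # Phase II: stand-in for the GA
    ("mlp", MLPClassifier(max_iter=1000, random_state=0)),  # Phase III model
])
print("10-fold accuracy:", cross_val_score(pipe, X, y, cv=10).mean())
```

Using imblearn's `Pipeline` ensures SMOTE is applied only to the training folds inside cross-validation, avoiding leakage of synthetic samples into the test folds.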
Short-Term Forecasting of CO2 Emission Intensity in Power Grids by Machine Learning
Title | Short-Term Forecasting of CO2 Emission Intensity in Power Grids by Machine Learning |
Authors | Kenneth Leerbeck, Peder Bacher, Rune Junker, Goran Goranović, Olivier Corradi, Razgar Ebrahimy, Anna Tveit, Henrik Madsen |
Abstract | A machine learning algorithm is developed to forecast the CO2 emission intensities in electrical power grids in the Danish bidding zone DK2, distinguishing between average and marginal emissions. The analysis was done on a data set comprising a large number (473) of explanatory variables, such as power production, demand, import, and weather conditions, collected from selected neighboring zones. The number was reduced to fewer than 50 using both LASSO (a penalized linear regression analysis) and a forward feature selection algorithm. Three linear regression models that capture different aspects of the data (non-linearities, coupling of variables, etc.) were created and combined into a final model using a softmax-weighted average. Cross-validation is performed for debiasing, and an autoregressive integrated moving average (ARIMA) model is implemented to correct the residuals, making the final model a variant with exogenous inputs (ARIMAX). The forecasts, with corresponding uncertainties, are given for two time horizons, below and above six hours. Marginal emissions turned out to be independent of any conditions in the DK2 zone, suggesting that the marginal generators are located in the neighbouring zones. The developed methodology can be applied to any bidding zone in the European electricity network without requiring detailed knowledge about the zone. |
Tasks | Feature Selection |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.05740v1 |
https://arxiv.org/pdf/2003.05740v1.pdf | |
PWC | https://paperswithcode.com/paper/short-term-forecasting-of-co2-emission |
Repo | |
Framework | |
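A hedged sketch of two stages from the abstract: LASSO to shrink hundreds of explanatory variables to a small set, then an ARIMA model with those exogenous inputs (ARIMAX) on the target series. The data, model orders, and the softmax combination of models are simplified or omitted here.

```python
# LASSO variable screening followed by ARIMAX on the target series.
import numpy as np
from sklearn.linear_model import LassoCV
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
T, p = 1000, 100                       # stand-in for the 473 explanatory variables
X = rng.normal(size=(T, p))
y = X[:, :5] @ rng.normal(size=5) + 0.5 * rng.normal(size=T)  # sparse signal

lasso = LassoCV(cv=5).fit(X, y)        # penalized regression screens variables
keep = np.flatnonzero(lasso.coef_)     # surviving explanatory variables
print(f"kept {keep.size} of {p} variables")

arimax = ARIMA(y, exog=X[:, keep], order=(1, 0, 1)).fit()
# Placeholder future regressors; in practice these are themselves forecasts.
forecast = arimax.forecast(steps=6, exog=X[-6:, keep])
```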
Quantum Bandits
Title | Quantum Bandits |
Authors | Balthazar Casalé, Giuseppe Di Molfetta, Hachem Kadri, Liva Ralaivola |
Abstract | We consider the quantum version of the bandit problem known as {\em best arm identification} (BAI). We first propose a quantum modeling of the BAI problem, which assumes that both the learning agent and the environment are quantum; we then propose an algorithm based on quantum amplitude amplification to solve BAI. We formally analyze the behavior of the algorithm on all instances of the problem and we show, in particular, that it is able to get the optimal solution quadratically faster than what is known to hold in the classical case. |
Tasks | |
Published | 2020-02-15 |
URL | https://arxiv.org/abs/2002.06395v1 |
https://arxiv.org/pdf/2002.06395v1.pdf | |
PWC | https://paperswithcode.com/paper/quantum-bandits |
Repo | |
Framework | |
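A classical NumPy simulation of the amplitude-amplification primitive the paper builds on: Grover-style iterations boost the amplitude of a marked item (here, the best arm) so it is found in O(sqrt(N)) oracle calls instead of O(N), which is the source of the quadratic speedup. This shows only the generic primitive, not the paper's BAI algorithm.

```python
# Amplitude amplification on a state vector over N arms.
import numpy as np

N, best = 64, 17                       # number of arms, index of the best arm
state = np.ones(N) / np.sqrt(N)        # uniform superposition over arms

iterations = int(np.round(np.pi / 4 * np.sqrt(N)))   # near-optimal Grover count
for _ in range(iterations):
    state[best] *= -1                              # oracle: flip the marked arm
    state = 2 * state.mean() - state               # diffusion about the mean
print(f"after {iterations} iterations, P(best) = {state[best]**2:.3f}")
```

With N = 64 arms, roughly six iterations already concentrate almost all probability mass on the marked arm, versus the ~N samples a naive classical scan would need.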
Memory-Loss is Fundamental for Stability and Distinguishes the Echo State Property Threshold in Reservoir Computing & Beyond
Title | Memory-Loss is Fundamental for Stability and Distinguishes the Echo State Property Threshold in Reservoir Computing & Beyond |
Authors | G Manjunath |
Abstract | Reservoir computing, a highly successful neuromorphic computing scheme used to filter, predict, and classify temporal inputs, has entered an era of microchips for several other engineering and biological applications. A basis for reservoir computing is memory loss, or the echo state property. It is an open problem how the design parameters of a reservoir can be optimized to maximize the reservoir's freedom to map an input robustly and yet have close-by variants of the input represented differently in the reservoir. We present a framework to analyze stability under input and parameter perturbations and reach a surprising, fundamental conclusion: the echo state property is \emph{equivalent} to robustness to input in any nonlinear recurrent neural network, whether or not it falls within the ambit of reservoir computing. Further, backed by these theoretical conclusions, we define and find the difficult-to-describe \emph{input-specific} edge-of-criticality, or echo state property threshold, which marks the boundary between parameter-related stability and instability. |
Tasks | |
Published | 2020-01-03 |
URL | https://arxiv.org/abs/2001.00766v1 |
https://arxiv.org/pdf/2001.00766v1.pdf | |
PWC | https://paperswithcode.com/paper/memory-loss-is-fundamental-for-stability-and |
Repo | |
Framework | |
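A small NumPy sketch of the property under discussion (our illustration, not the paper's framework): an echo state network exhibits memory loss if reservoir states driven by the same input from different initial conditions converge, washing out the initial state. Scaling the recurrent weights moves the network across the stability threshold; the spectral-radius scaling below is a common heuristic for the stable regime, not a proof of the property.

```python
# Memory-loss (echo state property) check: do two trajectories with different
# initial states but identical inputs converge?
import numpy as np

rng = np.random.default_rng(0)
n = 100
W_in = rng.normal(size=n)
W = rng.normal(size=(n, n))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9 (heuristic)

def run(x0, u):
    x = x0
    for u_t in u:                                 # simple tanh reservoir update
        x = np.tanh(W @ x + W_in * u_t)
    return x

u = rng.normal(size=500)                          # one shared input sequence
xa = run(rng.normal(size=n), u)                   # two different initial states
xb = run(rng.normal(size=n), u)
print("final state gap:", np.linalg.norm(xa - xb))  # ~0 => echo state property
```

Repeating this with the weights scaled past the input-specific threshold makes the gap stay large, which is the instability side of the boundary the paper characterizes.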