January 27, 2020

3155 words 15 mins read

Paper Group ANR 1187

Identifying and Correcting Label Bias in Machine Learning

Title Identifying and Correcting Label Bias in Machine Learning
Authors Heinrich Jiang, Ofir Nachum
Abstract Datasets often contain biases which unfairly disadvantage certain groups, and classifiers trained on such datasets can inherit these biases. In this paper, we provide a mathematical formulation of how this bias can arise. We do so by assuming the existence of underlying, unknown, and unbiased labels which are overwritten by an agent who intends to provide accurate labels but may have biases against certain groups. Despite the fact that we only observe the biased labels, we are able to show that the bias may nevertheless be corrected by re-weighting the data points without changing the labels. We show, with theoretical guarantees, that training on the re-weighted dataset corresponds to training on the unobserved but unbiased labels, thus leading to an unbiased machine learning classifier. Our procedure is fast and robust and can be used with virtually any learning algorithm. We evaluate on a number of standard machine learning fairness datasets and a variety of fairness notions, finding that our method outperforms standard approaches in achieving fair classification.
Tasks
Published 2019-01-15
URL http://arxiv.org/abs/1901.04966v1
PDF http://arxiv.org/pdf/1901.04966v1.pdf
PWC https://paperswithcode.com/paper/identifying-and-correcting-label-bias-in
Repo
Framework
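
The reweighting idea can be sketched compactly. The following is a minimal illustration, not the authors' code: it assumes a binary label, a binary protected group, demographic parity as the fairness notion, and a single multiplier updated by dual ascent; the paper handles general fairness constraints and comes with theoretical guarantees this sketch does not.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fair_reweight(X, y, group, rounds=20, eta=1.0):
    """Hypothetical simplification of label-bias correction by reweighting:
    train, measure the demographic-parity gap, update one multiplier,
    and reweight the positively labeled points accordingly."""
    lam = 0.0
    w = np.ones(len(y))
    clf = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        clf.fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        # Positive-prediction rate difference between the two groups.
        gap = pred[group == 1].mean() - pred[group == 0].mean()
        lam -= eta * gap  # dual ascent on the constraint violation
        # Upweight positives in the under-served group, downweight the other.
        w = np.where(y == 1, np.exp(lam * np.where(group == 1, 1.0, -1.0)), 1.0)
    return clf, w
```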

Question Generation from Paragraphs: A Tale of Two Hierarchical Models

Title Question Generation from Paragraphs: A Tale of Two Hierarchical Models
Authors Vishwajeet Kumar, Raktim Chaki, Sai Teja Talluri, Ganesh Ramakrishnan, Yuan-Fang Li, Gholamreza Haffari
Abstract Automatic question generation from paragraphs is an important and challenging problem, particularly due to the long context from paragraphs. In this paper, we propose and study two hierarchical models for the task of question generation from paragraphs. Specifically, we propose (a) a novel hierarchical BiLSTM model with selective attention and (b) a novel hierarchical Transformer architecture, both of which learn hierarchical representations of paragraphs. We model a paragraph in terms of its constituent sentences, and a sentence in terms of its constituent words. While the introduction of the attention mechanism benefits the hierarchical BiLSTM model, the hierarchical Transformer, with its inherent attention and positional encoding mechanisms, also performs better than the flat Transformer model. We conducted an empirical evaluation on the widely used SQuAD and MS MARCO datasets using standard metrics. The results demonstrate the overall effectiveness of the hierarchical models over their flat counterparts. Qualitatively, our hierarchical models are able to generate fluent and relevant questions.
Tasks Question Generation
Published 2019-11-08
URL https://arxiv.org/abs/1911.03407v1
PDF https://arxiv.org/pdf/1911.03407v1.pdf
PWC https://paperswithcode.com/paper/question-generation-from-paragraphs-a-tale-of-1
Repo
Framework
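
As a hypothetical sketch of the hierarchical encoding both models share (words composed into sentence vectors, sentence vectors composed into a paragraph representation), here is a BiLSTM version; the selective attention mechanism and the question decoder are omitted, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class HierarchicalParagraphEncoder(nn.Module):
    """Encode a paragraph as sentences-of-words: a word-level BiLSTM
    produces one vector per sentence, and a sentence-level BiLSTM
    contextualizes those vectors. Illustrative only; the paper's models
    add selective attention and a decoder on top."""
    def __init__(self, vocab_size, emb_dim=128, hid=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.word_rnn = nn.LSTM(emb_dim, hid, bidirectional=True, batch_first=True)
        self.sent_rnn = nn.LSTM(2 * hid, hid, bidirectional=True, batch_first=True)

    def forward(self, tokens):
        # tokens: (n_sentences, max_words) word ids for one paragraph
        emb = self.emb(tokens)                   # (S, W, E)
        word_states, _ = self.word_rnn(emb)      # (S, W, 2H)
        sent_vecs = word_states.mean(dim=1)      # (S, 2H): mean-pool over words
        para_states, _ = self.sent_rnn(sent_vecs.unsqueeze(0))
        return para_states.squeeze(0)            # (S, 2H): contextual sentence reps
```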

Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

Title Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks
Authors Ziwei Ji, Matus Telgarsky
Abstract Recent theoretical work has guaranteed that overparameterized networks trained by gradient descent achieve arbitrarily low training error, and sometimes even low test error. The required width, however, is always polynomial in at least one of the sample size $n$, the (inverse) target error $1/\epsilon$, and the (inverse) failure probability $1/\delta$. This work shows that $\widetilde{\Theta}(1/\epsilon)$ iterations of gradient descent with $\widetilde{\Omega}(1/\epsilon^2)$ training examples on two-layer ReLU networks of any width exceeding $\mathrm{polylog}(n,1/\epsilon,1/\delta)$ suffice to achieve a test misclassification error of $\epsilon$. We also prove that stochastic gradient descent can achieve $\epsilon$ test error with polylogarithmic width and $\widetilde{\Theta}(1/\epsilon)$ samples. The analysis relies upon the separation margin of the limiting kernel, which is guaranteed positive, can distinguish between true labels and random labels, and can give a tight sample-complexity analysis in the infinite-width setting.
Tasks
Published 2019-09-26
URL https://arxiv.org/abs/1909.12292v4
PDF https://arxiv.org/pdf/1909.12292v4.pdf
PWC https://paperswithcode.com/paper/polylogarithmic-width-suffices-for-gradient
Repo
Framework
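
Collected in one display, the rates claimed in the abstract read (our rearrangement, same content):

```latex
% Gradient descent on two-layer ReLU networks, as stated in the abstract:
\[
  \text{width} \;\ge\; \mathrm{polylog}(n, 1/\epsilon, 1/\delta), \quad
  \widetilde{\Theta}(1/\epsilon) \text{ iterations}, \quad
  \widetilde{\Omega}(1/\epsilon^{2}) \text{ samples}
  \;\Longrightarrow\; \Pr[\text{misclassification}] \;\le\; \epsilon .
\]
```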

Profile-Based Privacy for Locally Private Computations

Title Profile-Based Privacy for Locally Private Computations
Authors Joseph Geumlek, Kamalika Chaudhuri
Abstract Differential privacy has emerged as a gold standard in privacy-preserving data analysis. A popular variant is local differential privacy, where the data holder is the trusted curator. A major barrier towards wider adoption of this model, however, is that it offers a poor privacy-utility tradeoff. In this work, we address this problem by introducing a new variant of local privacy called profile-based privacy. The central idea is that the problem setting comes with a graph G of data-generating distributions, whose edges encode sensitive pairs of distributions that should be made indistinguishable. This provides higher utility because, unlike local differential privacy, we no longer need to make every pair of private values in the domain indistinguishable, and instead only protect the identity of the underlying distribution. We establish privacy properties of the profile-based privacy definition, such as post-processing invariance and graceful composition. Finally, we provide mechanisms that are private in this framework and show via simulations that they achieve higher utility than the corresponding local differential privacy mechanisms.
Tasks
Published 2019-01-21
URL https://arxiv.org/abs/1903.09084v2
PDF https://arxiv.org/pdf/1903.09084v2.pdf
PWC https://paperswithcode.com/paper/profile-based-privacy-for-locally-private
Repo
Framework
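
Reading the abstract, the definition seems to take the following shape; this formalization is our paraphrase, not quoted from the paper. Vertices of $G$ are data-generating distributions, and only pairs joined by an edge must be made indistinguishable:

```latex
% Our paraphrase: a randomized mechanism M is (G, epsilon)-profile-private
% if, for every edge (P, Q) of G and every measurable output set S,
\[
  \Pr_{X \sim P}\!\left[ M(X) \in S \right]
  \;\le\; e^{\varepsilon}\, \Pr_{X \sim Q}\!\left[ M(X) \in S \right].
\]
```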

On the clustering of correlated random variables

Title On the clustering of correlated random variables
Authors Zenon Gniazdowski, Dawid Kaliszewski
Abstract In this work, the possibility of clustering correlated random variables was examined, both on the basis of their mutual similarity and of their similarity to the principal components. The k-means algorithm and spectral algorithms were used for clustering. For the spectral methods, the similarity matrix was either the matrix of a relation established at a given level of correlation or the matrix of coefficients of determination. For four different data sets, different ways of measuring the dissimilarity of variables were analyzed, as was the impact of the choice of initial points on the efficiency of the k-means algorithm.
Tasks
Published 2019-09-07
URL https://arxiv.org/abs/1909.03332v1
PDF https://arxiv.org/pdf/1909.03332v1.pdf
PWC https://paperswithcode.com/paper/on-the-clustering-of-correlated-random
Repo
Framework
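
One of the pipelines the abstract describes is easy to sketch: use the squared correlation (the coefficient of determination) as the similarity between variables and hand it to spectral clustering. A minimal sketch; the cluster count and the synthetic data are illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_variables(X, n_clusters=3):
    """Cluster the columns of X (the variables) by mutual correlation.
    Similarity = r^2, the coefficient of determination between variables."""
    r = np.corrcoef(X, rowvar=False)      # (p, p) correlation matrix
    similarity = r ** 2                    # r^2 lies in [0, 1]
    model = SpectralClustering(n_clusters=n_clusters,
                               affinity="precomputed", random_state=0)
    return model.fit_predict(similarity)

# Example: 6 variables forming two correlated blocks of 3.
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(2, 500))
X = np.column_stack([z1 + 0.1 * rng.normal(size=500) for _ in range(3)] +
                    [z2 + 0.1 * rng.normal(size=500) for _ in range(3)])
print(cluster_variables(X, n_clusters=2))  # e.g. [0 0 0 1 1 1]
```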

Managing Machine Learning Workflow Components

Title Managing Machine Learning Workflow Components
Authors Marcio Moreno, Vítor Lourenço, Sandro Rama Fiorini, Polyana Costa, Rafael Brandão, Daniel Civitarese, Renato Cerqueira
Abstract Machine Learning Workflows (MLWfs) have become an essential and disruptive approach to problem-solving across several industries. However, developing MLWfs can be complicated, time-consuming, and error-prone. To handle this problem, in this paper we introduce machine learning workflow management (MLWfM) as a technique to aid the development and reuse of MLWfs and their components through three aspects: representation, execution, and creation. More precisely, we discuss our approach to structuring MLWf components and their metadata to aid the retrieval and reuse of components in new MLWfs, and we consider the execution of these components within a tool. A hybrid knowledge representation, called Hyperknowledge, frames our methodology, supporting all three aspects of MLWfM. To validate our approach, we show a practical use case in the Oil & Gas industry.
Tasks
Published 2019-12-10
URL https://arxiv.org/abs/1912.05665v1
PDF https://arxiv.org/pdf/1912.05665v1.pdf
PWC https://paperswithcode.com/paper/managing-machine-learning-workflow-components
Repo
Framework
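
The abstract does not give a concrete schema, so the following is purely hypothetical: one way a workflow component's metadata (task, inputs, outputs, framework) could be structured so that components can be indexed and retrieved for reuse. The paper's Hyperknowledge representation is considerably richer than this.

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowComponent:
    """Hypothetical metadata record for one MLWf component."""
    name: str
    task: str                                   # e.g. "seismic-facies-classification"
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    framework: str = ""                         # e.g. "tensorflow"

def find_reusable(catalog, needed_output):
    """Retrieve components whose outputs match what a new workflow needs."""
    return [c for c in catalog if needed_output in c.outputs]
```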

Automated Detection of Left Ventricle in Arterial Input Function Images for Inline Perfusion Mapping using Deep Learning: A study of 15,000 Patients

Title Automated Detection of Left Ventricle in Arterial Input Function Images for Inline Perfusion Mapping using Deep Learning: A study of 15,000 Patients
Authors Hui Xue, Ethan Tseng, Kristopher D Knott, Tushar Kotecha, Louise Brown, Sven Plein, Marianna Fontana, James C Moon, Peter Kellman
Abstract Quantification of myocardial perfusion has the potential to improve detection of regional and global flow reduction. Significant effort has been made to automate the workflow, where one essential step is arterial input function (AIF) extraction. Since failure here invalidates quantification, high accuracy is required. For this purpose, this study presents a robust AIF detection method based on a convolutional neural network (CNN). CNN models were trained on an assembled set of 25,027 scans (N=12,984 patients) from three hospitals and seven scanners. A test set of 5,721 scans (N=2,805 patients) was used to evaluate model performance. The 2D+T AIF time series was input to the CNN. Two variations were investigated: (a) two classes (2CS), background and foreground (LV mask); (b) three classes (3CS), background, foreground LV, and RV. The final model was deployed on MR scanners via the Gadgetron InlineAI; loading the model on the scanner took ~340 ms and applying it took ~180 ms. The 3CS model successfully detected the LV in 99.98% of all test cases (1 failure out of 5,721 cases). The mean Dice ratio for 3CS was 0.87+/-0.08, with 92.0% of all test cases having a Dice ratio >0.75, while the 2CS model gave a lower Dice ratio of 0.82+/-0.22 (P<1e-5). Extracted AIF signals using the CNN were further compared to manual ground truth for foot time, peak time, first-pass duration, peak value, and area under the curve; no significant differences were found for any feature (P>0.2). This study proposed, validated, and deployed a robust CNN solution to detect the LV for extraction of the AIF signal used in fully automated perfusion flow mapping. A very large data cohort was assembled, and the resulting models were deployed to MR scanners for fully inline AI in clinical hospitals.
Tasks Time Series
Published 2019-10-16
URL https://arxiv.org/abs/1910.07122v1
PDF https://arxiv.org/pdf/1910.07122v1.pdf
PWC https://paperswithcode.com/paper/automated-detection-of-left-ventricle-in
Repo
Framework
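
Since the reported accuracy hinges on the Dice ratio between predicted and ground-truth LV masks, a reference implementation of that standard metric (not specific to this paper) may be useful:

```python
import numpy as np

def dice_ratio(pred_mask, true_mask, eps=1e-8):
    """Dice overlap between two binary masks: 2|A n B| / (|A| + |B|)."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    inter = np.logical_and(pred, true).sum()
    return 2.0 * inter / (pred.sum() + true.sum() + eps)
```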

Learning with Sets in Multiple Instance Regression Applied to Remote Sensing

Title Learning with Sets in Multiple Instance Regression Applied to Remote Sensing
Authors Thomas Uriot
Abstract In this paper, we propose a novel approach to tackle the multiple instance regression (MIR) problem. This problem arises when the data is a collection of bags, where each bag is made of multiple instances corresponding to the same unique real-valued label. Our goal is to train a regression model which maps the instances of an unseen bag to its unique label. This MIR setting is common in remote sensing applications, where there is high variability in the measurements and low geographical variability in the quantity being estimated. Our approach, in contrast to most competing methods, does not make the assumption that there exists a prime instance responsible for the label in each bag. Instead, we treat each bag as a set (i.e., an unordered collection) of instances and learn to map each bag to its unique label by using all the instances in each bag. This is done by implementing an order-invariant operation characterized by a particular type of attention mechanism. This method is very flexible, as it does not require domain knowledge nor make any assumptions about the distribution of the instances within each bag. We test our algorithm on five real-world datasets and outperform the previous state of the art on three of them. In addition, we augment our feature space by adding the moments of each feature for each bag as extra features, and show that while the first few moments lead to higher accuracy, there are diminishing returns.
Tasks
Published 2019-03-18
URL https://arxiv.org/abs/1903.07745v3
PDF https://arxiv.org/pdf/1903.07745v3.pdf
PWC https://paperswithcode.com/paper/learning-with-sets-in-multiple-instance
Repo
Framework
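
A sketch of the order-invariant aggregation the abstract describes: attention weights are computed per instance and the bag representation is their weighted sum, so permuting the instances leaves the prediction unchanged. Layer sizes are illustrative and the paper's exact attention parameterization may differ.

```python
import torch
import torch.nn as nn

class AttentionMIR(nn.Module):
    """Map a bag (set) of instances to one real-valued label via an
    order-invariant attention pooling. Illustrative sketch."""
    def __init__(self, in_dim, hid=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
        self.att = nn.Linear(hid, 1)           # one score per instance
        self.head = nn.Linear(hid, 1)          # bag-level regressor

    def forward(self, bag):                    # bag: (n_instances, in_dim)
        h = self.phi(bag)                      # (n, hid)
        a = torch.softmax(self.att(h), dim=0)  # (n, 1) attention weights
        z = (a * h).sum(dim=0)                 # permutation-invariant pooling
        return self.head(z).squeeze(-1)        # scalar bag label
```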

Competing Ratio Loss for Discriminative Multi-class Image Classification

Title Competing Ratio Loss for Discriminative Multi-class Image Classification
Authors Ke Zhang, Xinsheng Wang, Yurong Guo, Dongliang Chang, Zhenbing Zhao, Zhanyu Ma, Tony X. Han
Abstract The development of deep convolutional neural network architecture is critical to improving image classification performance. Many image classification studies use deep convolutional neural networks and focus on modifying the network structure to improve performance. In contrast, our study focuses on loss function design. The cross-entropy loss (CEL) has been widely used for training deep convolutional neural networks for multi-class classification. Although CEL has been successfully applied to several image classification tasks, it only focuses on the posterior probability of the correct class. For this reason, the negative log likelihood ratio loss (NLLR) was proposed to better differentiate between the correct class and the competing incorrect ones. However, during training the value of NLLR can change sign, which severely affects its convergence. Our proposed competing ratio loss (CRL) calculates the posterior probability ratio between the correct class and the competing incorrect classes to further enlarge the probability difference between the correct and incorrect classes. We add hyperparameters to CRL to ensure that its value remains positive and that the backpropagation update size is suitable for fast convergence. To demonstrate the performance of CRL, we conducted experiments on general image classification tasks (CIFAR10/100, SVHN, ImageNet), fine-grained image classification tasks (CUB200-2011 and Stanford Car), and the challenging face age estimation task (using Adience). Experimental results show the effectiveness and robustness of the proposed loss function across different deep convolutional neural network architectures and image classification tasks.
Tasks Age Estimation, Fine-Grained Image Classification, Image Classification
Published 2019-12-25
URL https://arxiv.org/abs/1912.11642v1
PDF https://arxiv.org/pdf/1912.11642v1.pdf
PWC https://paperswithcode.com/paper/competing-ratio-loss-for-discriminative-multi-1
Repo
Framework
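
The abstract specifies the ingredients (a ratio of the correct-class posterior to the competing mass, plus hyperparameters that keep the loss positive) but not the exact formula, so the following parameterization is our guess, not the paper's:

```python
import torch

def competing_ratio_loss(logits, target, alpha=2.0, beta=1.0):
    """Sketch of a ratio-style loss: penalize log((alpha + beta * competing
    mass / p_correct)); alpha > 1 keeps the value strictly positive.
    The paper's exact form may differ."""
    p = torch.softmax(logits, dim=1)
    p_correct = p.gather(1, target.unsqueeze(1)).squeeze(1)
    p_compete = 1.0 - p_correct                   # mass on incorrect classes
    ratio = p_compete / p_correct.clamp_min(1e-8)
    return torch.log(alpha + beta * ratio).mean()
```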

Analysis of the Optimization Landscapes for Overcomplete Representation Learning

Title Analysis of the Optimization Landscapes for Overcomplete Representation Learning
Authors Qing Qu, Yuexiang Zhai, Xiao Li, Yuqian Zhang, Zhihui Zhu
Abstract We study nonconvex optimization landscapes for learning overcomplete representations, including learning (i) sparsely used overcomplete dictionaries and (ii) convolutional dictionaries, where these unsupervised learning problems find many applications in high-dimensional data analysis. Despite the empirical success of simple nonconvex algorithms, theoretical justifications of why these methods work so well are far from satisfactory. In this work, we show these problems can be formulated as $\ell^4$-norm optimization problems with spherical constraints and study the geometric properties of their nonconvex optimization landscapes. For both problems, we show the nonconvex objectives have benign geometric structures – every local minimizer is close to one of the target solutions and every saddle point exhibits negative curvature – either over the entire space or within a sufficiently large region. This discovery ensures that local search algorithms (such as Riemannian gradient descent) with simple initializations approximately find the target solutions. Finally, numerical experiments justify our theoretical discoveries.
Tasks Representation Learning
Published 2019-12-05
URL https://arxiv.org/abs/1912.02427v2
PDF https://arxiv.org/pdf/1912.02427v2.pdf
PWC https://paperswithcode.com/paper/analysis-of-the-optimization-landscapes-for
Repo
Framework
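
The $\ell^4$ formulation suggests a simple first-order scheme: maximize $\|Y^\top q\|_4^4$ over the unit sphere by projected gradient ascent. A minimal sketch of that problem class (the paper analyzes the landscape that makes such local search work, rather than prescribing this exact loop; step size and initialization are illustrative):

```python
import numpy as np

def l4_sphere_ascent(Y, steps=200, lr=0.1, seed=0):
    """Projected gradient ascent for  max ||Y^T q||_4^4  s.t. ||q|| = 1.
    Y is a (d, n) data matrix. Sketch only, not the authors' algorithm."""
    rng = np.random.default_rng(seed)
    q = rng.normal(size=Y.shape[0])
    q /= np.linalg.norm(q)
    for _ in range(steps):
        c = Y.T @ q                    # correlations with each column (n,)
        grad = 4.0 * Y @ (c ** 3)      # gradient of the l4^4 objective
        q = q + lr * grad
        q /= np.linalg.norm(q)         # project back onto the sphere
    return q
```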

Improving Gradient Estimation in Evolutionary Strategies With Past Descent Directions

Title Improving Gradient Estimation in Evolutionary Strategies With Past Descent Directions
Authors Florian Meier, Asier Mujika, Marcelo Matheus Gauy, Angelika Steger
Abstract Evolutionary Strategies (ES) are known to be an effective black-box optimization technique for deep neural networks when the true gradients cannot be computed, such as in Reinforcement Learning. We continue a recent line of research that uses surrogate gradients to improve the gradient estimation of ES, and we propose a novel method to optimally incorporate surrogate gradient information. Our approach, unlike previous work, needs no information about the quality of the surrogate gradients and is always guaranteed to find a descent direction that is better than the surrogate gradient. This allows us to iteratively use the previous gradient estimate as the surrogate gradient for the current search point. We theoretically prove that this yields fast convergence to the true gradient for linear functions, and we show, under simplifying assumptions, that it significantly improves gradient estimates for general functions. Finally, we evaluate our approach empirically on MNIST and on reinforcement learning tasks, and show that it considerably improves the gradient estimation of ES at no extra computational cost.
Tasks
Published 2019-10-11
URL https://arxiv.org/abs/1910.05268v1
PDF https://arxiv.org/pdf/1910.05268v1.pdf
PWC https://paperswithcode.com/paper/improving-gradient-estimation-in-evolutionary-1
Repo
Framework
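
A simplified sketch of the underlying idea: spend one antithetic probe pair on the surrogate direction (e.g. the previous step's estimate) and the remaining pairs on random directions. The paper derives an optimal combination of the directions; the plain average below is our simplification.

```python
import numpy as np

def es_grad_with_surrogate(f, x, surrogate, n_random=8, sigma=0.01, rng=None):
    """Antithetic ES gradient estimate of f at x that includes the
    (normalized) surrogate as one of the probe directions. Sketch only."""
    rng = rng if rng is not None else np.random.default_rng()
    dirs = [surrogate / (np.linalg.norm(surrogate) + 1e-12)]
    dirs += [rng.normal(size=x.shape) / np.sqrt(x.size) for _ in range(n_random)]
    g = np.zeros_like(x)
    for u in dirs:
        # Central finite difference along direction u.
        g += (f(x + sigma * u) - f(x - sigma * u)) / (2 * sigma) * u
    return g / len(dirs)
```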

Introducing Graph Smoothness Loss for Training Deep Learning Architectures

Title Introducing Graph Smoothness Loss for Training Deep Learning Architectures
Authors Myriam Bontonou, Carlos Lassance, Ghouthi Boukli Hacene, Vincent Gripon, Jian Tang, Antonio Ortega
Abstract We introduce a novel loss function for training deep learning architectures to perform classification. It consists of minimizing the smoothness of label signals on similarity graphs built at the output of the architecture. Equivalently, it can be seen as maximizing the distances between the network's output representations of training inputs from distinct classes. As such, only distances between pairs of examples in distinct classes are taken into account, and the training does not prevent inputs from the same class from being mapped to distant locations in the output domain. We show that this loss leads to classification performance similar to that of architectures trained with the classical cross-entropy, while offering interesting degrees of freedom and properties. We also demonstrate the value of the proposed loss in increasing the robustness of trained architectures to deviations of the inputs.
Tasks
Published 2019-05-01
URL http://arxiv.org/abs/1905.00301v1
PDF http://arxiv.org/pdf/1905.00301v1.pdf
PWC https://paperswithcode.com/paper/introducing-graph-smoothness-loss-for
Repo
Framework
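
A sketch of the loss as the abstract describes it: build a similarity graph on the batch outputs and sum the label-signal smoothness, to which only cross-class pairs contribute (same-class pairs have identical one-hot labels). The Gaussian kernel is our illustrative choice of similarity, not necessarily the paper's.

```python
import torch

def graph_smoothness_loss(outputs, labels, gamma=1.0):
    """Smoothness of the one-hot label signal on a similarity graph built
    from network outputs. Minimizing it shrinks the similarity (i.e. grows
    the distance) between outputs of different classes. Sketch only."""
    d2 = torch.cdist(outputs, outputs) ** 2                  # pairwise sq. dists
    w = torch.exp(-gamma * d2)                               # similarity weights
    diff_class = (labels.unsqueeze(0) != labels.unsqueeze(1)).float()
    return (w * diff_class).sum() / outputs.shape[0]
```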

Data consistency networks for (calibration-less) accelerated parallel MR image reconstruction

Title Data consistency networks for (calibration-less) accelerated parallel MR image reconstruction
Authors Jo Schlemper, Jinming Duan, Cheng Ouyang, Chen Qin, Jose Caballero, Joseph V. Hajnal, Daniel Rueckert
Abstract We present simple reconstruction networks for multi-coil data by extending the deep cascade of CNNs and exploiting the data consistency layer. In particular, we propose two variants, where one is inspired by POCSENSE and the other is calibration-less. We show that the proposed approaches are competitive relative to the state of the art, both quantitatively and qualitatively.
Tasks Calibration, Image Reconstruction
Published 2019-09-25
URL https://arxiv.org/abs/1909.11795v1
PDF https://arxiv.org/pdf/1909.11795v1.pdf
PWC https://paperswithcode.com/paper/data-consistency-networks-for-calibration
Repo
Framework
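
The data consistency step itself is standard and easy to state: replace the CNN's predicted k-space at acquired locations with the measured samples. A single-coil sketch (the paper works with multi-coil data and coil sensitivities, which this omits):

```python
import numpy as np

def data_consistency(x_cnn, k_measured, mask):
    """Enforce consistency with acquired k-space samples.
    x_cnn:      current image estimate from the CNN (2D complex array)
    k_measured: measured k-space, valid where sampled
    mask:       boolean sampling mask, True where k-space was acquired"""
    k_pred = np.fft.fft2(x_cnn)
    k_dc = np.where(mask, k_measured, k_pred)   # keep measured samples as-is
    return np.fft.ifft2(k_dc)
```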

Recovering low-rank structure from multiple networks with unknown edge distributions

Title Recovering low-rank structure from multiple networks with unknown edge distributions
Authors Keith Levin, Asad Lodhia, Elizaveta Levina
Abstract In increasingly many settings, particularly in neuroimaging, data sets consist of multiple samples from a population of networks, with vertices aligned across networks. For example, fMRI studies yield graphs whose vertices correspond to brain regions, which are the same across subjects. We consider the setting where we observe a sample of networks whose adjacency matrices have a shared low-rank expectation, but edge-level noise distributions may vary from one network to another. We show that so long as edge noise is sub-gamma distributed in each network, the shared low-rank structure can be recovered accurately using an eigenvalue truncation of a weighted network average. We also explore the extent to which edge-level errors influence estimation and downstream inference tasks. The proposed approach is illustrated on synthetic networks and on an fMRI study of schizophrenia.
Tasks
Published 2019-06-13
URL https://arxiv.org/abs/1906.07265v1
PDF https://arxiv.org/pdf/1906.07265v1.pdf
PWC https://paperswithcode.com/paper/recovering-low-rank-structure-from-multiple
Repo
Framework
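
The estimator the abstract describes reduces to a few lines: average the adjacency matrices and truncate the eigendecomposition. Uniform weights are the illustrative default here; the paper derives the weighting from the edge noise levels.

```python
import numpy as np

def low_rank_from_networks(adjs, rank, weights=None):
    """Estimate shared low-rank structure from a list of symmetric
    adjacency matrices via eigenvalue truncation of a weighted average."""
    weights = np.ones(len(adjs)) / len(adjs) if weights is None else weights
    A_bar = sum(w * A for w, A in zip(weights, adjs))
    vals, vecs = np.linalg.eigh(A_bar)
    idx = np.argsort(np.abs(vals))[::-1][:rank]   # top-|eigenvalue| components
    return vecs[:, idx] @ np.diag(vals[idx]) @ vecs[:, idx].T
```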

Approximation capability of neural networks on spaces of probability measures and tree-structured domains

Title Approximation capability of neural networks on spaces of probability measures and tree-structured domains
Authors Tomas Pevny, Vojtech Kovarik
Abstract This paper extends the proof of density of neural networks in the space of continuous (or even measurable) functions on Euclidean spaces to functions on compact sets of probability measures. In doing so, the work parallels more than decade-old results on the mean-map embedding of probability measures in reproducing kernel Hilbert spaces. The work has wide practical consequences for multi-instance learning, where it theoretically justifies some recently proposed constructions. The result is then extended to Cartesian products, yielding a universal approximation theorem for tree-structured domains, which naturally occur in data-exchange formats like JSON, XML, YAML, AVRO, and ProtoBuffer. This has important practical implications, as it enables the automatic creation of neural network architectures for processing structured data (in the spirit of AutoML), as demonstrated by an accompanying library for the JSON format.
Tasks AutoML
Published 2019-06-03
URL https://arxiv.org/abs/1906.00764v1
PDF https://arxiv.org/pdf/1906.00764v1.pdf
PWC https://paperswithcode.com/paper/190600764
Repo
Framework
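
The multi-instance construction the result justifies can be sketched as an embed-pool-map network: embed each instance, mean-pool (the empirical analogue of a mean-map embedding of the bag's distribution), then apply a second network. Nesting the same pattern handles tree-structured data such as parsed JSON. Sizes are illustrative.

```python
import torch
import torch.nn as nn

class BagNetwork(nn.Module):
    """Embed instances, mean-pool, then transform. Nesting BagNetworks
    gives models for tree-structured data. Illustrative sketch."""
    def __init__(self, in_dim, hid=64, out_dim=16):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hid, out_dim), nn.ReLU())

    def forward(self, bag):                      # bag: (n_instances, in_dim)
        return self.rho(self.phi(bag).mean(dim=0))   # empirical mean embedding
```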