April 2, 2020

Paper Group ANR 361

Balancedness and Alignment are Unlikely in Linear Neural Networks

Title Balancedness and Alignment are Unlikely in Linear Neural Networks
Authors Adityanarayanan Radhakrishnan, Eshaan Nichani, Daniel Bernstein, Caroline Uhler
Abstract We study the invariance properties of alignment in linear neural networks under gradient descent. Alignment of weight matrices is a form of implicit regularization, and previous works have studied this phenomenon in fully connected networks with 1-dimensional outputs. In such networks, we prove that there exists an initialization such that adjacent layers remain aligned throughout training under any real-valued loss function. We then define alignment for fully connected networks with multidimensional outputs and prove that it generally cannot be an invariant for such networks under the squared loss. Moreover, we characterize the datasets under which alignment is possible. We then analyze networks with layer constraints such as convolutional networks. In particular, we prove that gradient descent is equivalent to projected gradient descent, and show that alignment is impossible given sufficiently large datasets. Importantly, since our definition of alignment is a relaxation of balancedness, our negative results extend to this property.
Tasks
Published 2020-03-13
URL https://arxiv.org/abs/2003.06340v1
PDF https://arxiv.org/pdf/2003.06340v1.pdf
PWC https://paperswithcode.com/paper/balancedness-and-alignment-are-unlikely-in
Repo
Framework

A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels

Title A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels
Authors Lorenz Braun, Sotirios Nikas, Chen Song, Vincent Heuveline, Holger Fröning
Abstract Characterizing compute kernel execution behavior on GPUs for efficient task scheduling is a nontrivial task. We address this with a simple model enabling portable and fast predictions among different GPUs using only hardware-independent features. This model is built based on random forests using 189 individual compute kernels from benchmarks such as Parboil, Rodinia, Polybench-GPU and SHOC. Evaluation of the model performance using cross-validation yields a median Mean Average Percentage Error (MAPE) of [13.45%, 44.56%] and [1.81%, 2.91%] for time and power prediction, respectively, on five different GPUs, while latency for a single prediction varies between 0.1 and 0.2 seconds.
Tasks
Published 2020-01-20
URL https://arxiv.org/abs/2001.07104v1
PDF https://arxiv.org/pdf/2001.07104v1.pdf
PWC https://paperswithcode.com/paper/a-simple-model-for-portable-and-fast
Repo
Framework
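
The evaluation metric in the abstract above can be illustrated with a minimal sketch. It assumes MAPE has its usual definition (the mean of absolute relative errors, in percent) and that the reported figure is the median over cross-validation folds; the fold data below is hypothetical and not taken from the paper.

```python
from statistics import median


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed standard definition)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical (measured, predicted) execution times for three folds.
folds = [
    ([10.0, 20.0, 40.0], [11.0, 18.0, 44.0]),
    ([100.0, 200.0], [80.0, 240.0]),
    ([5.0], [3.5]),
]

per_fold = [mape(actual, predicted) for actual, predicted in folds]
median_mape = median(per_fold)
print(per_fold, median_mape)
```

Taking the median rather than the mean over folds keeps a single badly predicted fold from dominating the summary.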

Linearly Constrained Neural Networks

Title Linearly Constrained Neural Networks
Authors Johannes Hendriks, Carl Jidling, Adrian Wills, Thomas Schön
Abstract We present an approach to designing neural network based models that will explicitly satisfy known linear constraints. To achieve this, the target function is modelled as a linear transformation of an underlying function. This transformation is chosen such that any prediction of the target function is guaranteed to satisfy the constraints and can be determined from known physics or, more generally, by following a constructive procedure that was previously presented for Gaussian processes. The approach is demonstrated on simulated and real-data examples.
Tasks Gaussian Processes
Published 2020-02-05
URL https://arxiv.org/abs/2002.01600v1
PDF https://arxiv.org/pdf/2002.01600v1.pdf
PWC https://paperswithcode.com/paper/linearly-constrained-neural-networks
Repo
Framework
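
The idea of modelling the target as a linear transformation of an underlying function can be sketched for one concrete constraint. The example below assumes a 2D divergence-free constraint, chosen purely for illustration; `potential` is a hypothetical stand-in for the neural network, and the transformation maps a scalar g to the field (dg/dy, -dg/dx), whose divergence vanishes for any g.

```python
import math


def potential(x, y):
    # Hypothetical stand-in for the underlying function (a neural network in the paper).
    return math.sin(x) * math.cos(2.0 * y) + 0.5 * x * y


def field(x, y, h=1e-3):
    """Apply the linear transformation g -> (dg/dy, -dg/dx) by central differences."""
    dg_dx = (potential(x + h, y) - potential(x - h, y)) / (2.0 * h)
    dg_dy = (potential(x, y + h) - potential(x, y - h)) / (2.0 * h)
    return dg_dy, -dg_dx


def divergence(x, y, h=1e-3):
    """Numerical divergence of the constructed field, using the same step size."""
    dfx_dx = (field(x + h, y, h)[0] - field(x - h, y, h)[0]) / (2.0 * h)
    dfy_dy = (field(x, y + h, h)[1] - field(x, y - h, h)[1]) / (2.0 * h)
    return dfx_dx + dfy_dy


print(divergence(0.3, -1.2))
```

Because the constraint is satisfied by construction, changing `potential` (for example, by training it) leaves the divergence at zero up to floating-point error.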

Sample Complexity Result for Multi-category Classifiers of Bounded Variation

Title Sample Complexity Result for Multi-category Classifiers of Bounded Variation
Authors Khadija Musayeva
Abstract We control the probability of the uniform deviation between empirical and generalization performances of multi-category classifiers by an empirical $L_1$-norm covering number when these performances are defined on the basis of the truncated hinge loss function. The only assumption made on the functions implemented by multi-category classifiers is that they are of bounded variation (BV). For such classifiers, we derive the sample size estimate sufficient for the mentioned performances to be close with high probability. In particular, we are interested in the dependency of this estimate on the number $C$ of classes. To this end, first, we upper bound the scale-sensitive version of the VC-dimension, the fat-shattering dimension of sets of BV functions defined on $\mathbb{R}^d$, which gives an $O(1/\epsilon^d)$ bound as the scale $\epsilon$ goes to zero. Secondly, we provide a sharper decomposition result for the fat-shattering dimension in terms of $C$, which for sets of BV functions gives an improvement from $O(C^{d/2+1})$ to $O(C\ln^2(C))$. This improvement then propagates to the sample complexity estimate.
Tasks
Published 2020-03-20
URL https://arxiv.org/abs/2003.09176v1
PDF https://arxiv.org/pdf/2003.09176v1.pdf
PWC https://paperswithcode.com/paper/sample-complexity-result-for-multi-category
Repo
Framework

Communication-Efficient Massive UAV Online Path Control: Federated Learning Meets Mean-Field Game Theory

Title Communication-Efficient Massive UAV Online Path Control: Federated Learning Meets Mean-Field Game Theory
Authors Hamid Shiri, Jihong Park, Mehdi Bennis
Abstract This paper investigates the control of a massive population of UAVs such as drones. The straightforward method of controlling UAVs by considering the interactions among them to form a flock requires a huge amount of inter-UAV communication, which is impossible to implement in real-time applications. One method of control is to apply the mean-field game (MFG) framework, which substantially reduces communications among the UAVs. However, to realize this framework, powerful processors are required to obtain the control laws at different UAVs. This requirement limits the usage of the MFG framework for real-time applications such as massive UAV control. Thus, a function approximator based on neural networks (NN) is utilized to approximate the solutions of Hamilton-Jacobi-Bellman (HJB) and Fokker-Planck-Kolmogorov (FPK) equations. Nevertheless, using an approximate solution can violate the conditions for convergence of the MFG framework. Therefore, the federated learning (FL) approach, which can share the model parameters of NNs at drones, is proposed with NN based MFG to satisfy the required conditions. The stability analysis of the NN based MFG approach is presented and the performance of the proposed FL-MFG is elaborated by the simulations.
Tasks
Published 2020-03-09
URL https://arxiv.org/abs/2003.04451v1
PDF https://arxiv.org/pdf/2003.04451v1.pdf
PWC https://paperswithcode.com/paper/communication-efficient-massive-uav-online
Repo
Framework

Heavy-tailed Representations, Text Polarity Classification & Data Augmentation

Title Heavy-tailed Representations, Text Polarity Classification & Data Augmentation
Authors Hamid Jalalzai, Pierre Colombo, Chloé Clavel, Eric Gaussier, Giovanna Varni, Emmanuel Vignon, Anne Sabourin
Abstract The dominant approaches to text representation in natural language rely on learning embeddings on massive corpora which have convenient properties such as compositionality and distance preservation. In this paper, we develop a novel method to learn a heavy-tailed embedding with desirable regularity properties regarding the distributional tails, which allows one to analyze the points far away from the distribution bulk using the framework of multivariate extreme value theory. In particular, a classifier dedicated to the tails of the proposed embedding is obtained whose performance exceeds that of the baseline. This classifier exhibits a scale invariance property which we leverage by introducing a novel text generation method for label preserving dataset augmentation. Numerical experiments on synthetic and real text data demonstrate the relevance of the proposed framework and confirm that this method generates meaningful sentences with a controllable attribute, e.g. positive or negative sentiment.
Tasks Data Augmentation, Text Generation
Published 2020-03-25
URL https://arxiv.org/abs/2003.11593v1
PDF https://arxiv.org/pdf/2003.11593v1.pdf
PWC https://paperswithcode.com/paper/heavy-tailed-representations-text-polarity
Repo
Framework

VMRFANet: View-Specific Multi-Receptive Field Attention Network for Person Re-identification

Title VMRFANet: View-Specific Multi-Receptive Field Attention Network for Person Re-identification
Authors Honglong Cai, Yuedong Fang, Zhiguan Wang, Tingchun Yeh, Jinxing Cheng
Abstract Person re-identification (re-ID) aims to retrieve the same person across different cameras. In practice, it remains a challenging task due to background clutter, variations in body poses and view conditions, inaccurate bounding box detection, etc. To tackle these issues, in this paper, we propose a novel multi-receptive field attention (MRFA) module that utilizes filters of various sizes to help the network focus on informative pixels. In addition, we present a view-specific mechanism that guides the attention module to handle the variation of view conditions. Moreover, we introduce a Gaussian horizontal random cropping/padding method which further improves the robustness of our proposed network. Comprehensive experiments demonstrate the effectiveness of each component. Our method achieves 95.5% / 88.1% in rank-1 / mAP on Market-1501, 88.9% / 80.0% on DukeMTMC-reID, 81.1% / 78.8% on the CUHK03 labeled dataset and 78.9% / 75.3% on the CUHK03 detected dataset, outperforming current state-of-the-art methods.
Tasks Person Re-Identification
Published 2020-01-21
URL https://arxiv.org/abs/2001.07354v1
PDF https://arxiv.org/pdf/2001.07354v1.pdf
PWC https://paperswithcode.com/paper/vmrfanetview-specific-multi-receptive-field
Repo
Framework

A Unified View of Label Shift Estimation

Title A Unified View of Label Shift Estimation
Authors Saurabh Garg, Yifan Wu, Sivaraman Balakrishnan, Zachary C. Lipton
Abstract Label shift describes the setting where although the label distribution might change between the source and target domains, the class-conditional probabilities (of data given a label) do not. There are two dominant approaches for estimating the label marginal. BBSE, a moment-matching approach based on confusion matrices, is provably consistent and provides interpretable error bounds. However, a maximum likelihood estimation approach, which we call MLLS, dominates empirically. In this paper, we present a unified view of the two methods and the first theoretical characterization of the likelihood-based estimator. Our contributions include (i) conditions for consistency of MLLS, which include calibration of the classifier and a confusion matrix invertibility condition that BBSE also requires; (ii) a unified view of the methods, casting the confusion matrix as roughly equivalent to MLLS for a particular choice of calibration method; and (iii) a decomposition of MLLS’s finite-sample error into terms reflecting the impacts of miscalibration and estimation error. Our analysis attributes BBSE’s statistical inefficiency to a loss of information due to coarse calibration. We support our findings with experiments on both synthetic data and the MNIST and CIFAR10 image recognition datasets.
Tasks Calibration
Published 2020-03-17
URL https://arxiv.org/abs/2003.07554v1
PDF https://arxiv.org/pdf/2003.07554v1.pdf
PWC https://paperswithcode.com/paper/a-unified-view-of-label-shift-estimation
Repo
Framework
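
The moment-matching estimator mentioned in the abstract above can be sketched as follows. This is a minimal illustration assuming the standard BBSE formulation: build the joint confusion matrix of (prediction, label) on labeled source data, compute the distribution of predictions on unlabeled target data, and solve the resulting linear system for the per-class importance weights. The function names and the toy data are hypothetical, and hard (argmax) predictions are assumed.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("confusion matrix is numerically singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def bbse_weights(source_labels, source_preds, target_preds, num_classes):
    """Estimate w[y] = q(y) / p(y) from hard predictions (BBSE-style sketch)."""
    confusion = [[0.0] * num_classes for _ in range(num_classes)]
    for y, y_hat in zip(source_labels, source_preds):
        confusion[y_hat][y] += 1.0 / len(source_labels)  # joint p(pred, label)
    mu = [0.0] * num_classes
    for y_hat in target_preds:
        mu[y_hat] += 1.0 / len(target_preds)  # q(pred)
    return solve_linear(confusion, mu)


# Toy data: balanced source, target shifted to 20% / 80%.
source_labels = [0] * 50 + [1] * 50
source_preds = [0] * 40 + [1] * 10 + [0] * 5 + [1] * 45
target_preds = [0] * 24 + [1] * 76
weights = bbse_weights(source_labels, source_preds, target_preds, 2)
print(weights)
```

The singularity check corresponds to the confusion matrix invertibility condition that the abstract says both BBSE and MLLS require.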

Generating Natural Language Adversarial Examples on a Large Scale with Generative Models

Title Generating Natural Language Adversarial Examples on a Large Scale with Generative Models
Authors Yankun Ren, Jianbin Lin, Siliang Tang, Jun Zhou, Shuang Yang, Yuan Qi, Xiang Ren
Abstract Today, text classification models are widely used. However, these classifiers are found to be easily fooled by adversarial examples. Fortunately, standard attacking methods generate adversarial texts in a pair-wise way, that is, an adversarial text can only be created from a real-world text by replacing a few words. In many applications, these texts are limited in number, so their corresponding adversarial examples are often not diverse enough and sometimes hard to read, and thus can be easily detected by humans and cannot create chaos at a large scale. In this paper, we propose an end-to-end solution to efficiently generate adversarial texts from scratch using generative models, which are not restricted to perturbing the given texts. We call it unrestricted adversarial text generation. Specifically, we train a conditional variational autoencoder (VAE) with an additional adversarial loss to guide the generation of adversarial examples. Moreover, to improve the validity of adversarial texts, we utilize discriminators and the training framework of generative adversarial networks (GANs) to make adversarial texts consistent with real data. Experimental results on sentiment analysis demonstrate the scalability and efficiency of our method. It can attack text classification models with a higher success rate than existing methods, while providing acceptable quality for humans.
Tasks Adversarial Text, Sentiment Analysis, Text Classification, Text Generation
Published 2020-03-10
URL https://arxiv.org/abs/2003.10388v1
PDF https://arxiv.org/pdf/2003.10388v1.pdf
PWC https://paperswithcode.com/paper/generating-natural-language-adversarial-3
Repo
Framework

Optimal estimation of high-dimensional Gaussian mixtures

Title Optimal estimation of high-dimensional Gaussian mixtures
Authors Natalie Doss, Yihong Wu, Pengkun Yang, Harrison H. Zhou
Abstract This paper studies the optimal rate of estimation in a finite Gaussian location mixture model in high dimensions without separation conditions. We assume that the number of components $k$ is bounded and that the centers lie in a ball of bounded radius, while allowing the dimension $d$ to be as large as the sample size $n$. Extending the one-dimensional result of Heinrich and Kahn \cite{HK2015}, we show that the minimax rate of estimating the mixing distribution in Wasserstein distance is $\Theta((d/n)^{1/4} + n^{-1/(4k-2)})$, achieved by an estimator computable in time $O(nd^2+n^{5/4})$. Furthermore, we show that the mixture density can be estimated at the optimal parametric rate $\Theta(\sqrt{d/n})$ in Hellinger distance; however, no computationally efficient algorithm is known to achieve the optimal rate. Both the theoretical and methodological development rely on a careful application of the method of moments. Central to our results is the observation that the information geometry of finite Gaussian mixtures is characterized by the moment tensors of the mixing distribution, whose low-rank structure can be exploited to obtain a sharp local entropy bound.
Tasks
Published 2020-02-14
URL https://arxiv.org/abs/2002.05818v1
PDF https://arxiv.org/pdf/2002.05818v1.pdf
PWC https://paperswithcode.com/paper/optimal-estimation-of-high-dimensional
Repo
Framework

A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation

Title A Neural Topical Expansion Framework for Unstructured Persona-oriented Dialogue Generation
Authors Minghong Xu, Piji Li, Haoran Yang, Pengjie Ren, Zhaochun Ren, Zhumin Chen, Jun Ma
Abstract Unstructured Persona-oriented Dialogue Systems (UPDS) have been shown to be effective in generating persona consistent responses by utilizing predefined natural language user persona descriptions (e.g., “I am a vegan”). However, the predefined user persona descriptions are usually short and limited to only a few descriptive words, which makes it hard to correlate them with the dialogues. As a result, existing methods either fail to use the persona descriptions or use them improperly when generating persona consistent responses. To address this, we propose a neural topical expansion framework, namely Persona Exploration and Exploitation (PEE), which is able to extend the predefined user persona description with semantically correlated content before utilizing it to generate dialogue responses. PEE consists of two main modules: persona exploration and persona exploitation. The former learns to extend the predefined user persona description by mining and correlating with an existing dialogue corpus using a variational auto-encoder (VAE) based topic model. The latter learns to generate persona consistent responses by utilizing the predefined and extended user persona description. In order to make persona exploitation learn to utilize the user persona description more properly, we also introduce two persona-oriented loss functions: Persona-oriented Matching (P-Match) loss and Persona-oriented Bag-of-Words (P-BoWs) loss, which respectively supervise persona selection in the encoder and decoder. Experimental results show that our approach outperforms state-of-the-art baselines in terms of both automatic and human evaluations.
Tasks Dialogue Generation
Published 2020-02-06
URL https://arxiv.org/abs/2002.02153v1
PDF https://arxiv.org/pdf/2002.02153v1.pdf
PWC https://paperswithcode.com/paper/a-neural-topical-expansion-framework-for
Repo
Framework

Failout: Achieving Failure-Resilient Inference in Distributed Neural Networks

Title Failout: Achieving Failure-Resilient Inference in Distributed Neural Networks
Authors Ashkan Yousefpour, Brian Q. Nguyen, Siddartha Devic, Guanhua Wang, Aboudy Kreidieh, Hans Lobel, Alexandre M. Bayen, Jason P. Jue
Abstract When a neural network is partitioned and distributed across physical nodes, failure of physical nodes causes the failure of the neural units that are placed on those nodes, which results in a significant performance drop. Current approaches focus on resiliency of training in distributed neural networks. However, resiliency of inference in distributed neural networks is less explored. We introduce ResiliNet, a scheme for making inference in distributed neural networks resilient to physical node failures. ResiliNet combines two concepts to provide resiliency: skip connection in residual neural networks, and a novel technique called failout, which is introduced in this paper. Failout simulates physical node failure conditions during training using dropout, and is specifically designed to improve the resiliency of distributed neural networks. The results of the experiments and ablation studies using three datasets confirm the ability of ResiliNet to provide inference resiliency for distributed neural networks.
Tasks
Published 2020-02-18
URL https://arxiv.org/abs/2002.07386v1
PDF https://arxiv.org/pdf/2002.07386v1.pdf
PWC https://paperswithcode.com/paper/failout-achieving-failure-resilient-inference
Repo
Framework
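
The failout idea described in the abstract above (simulating physical node failures during training with dropout) might look roughly like the sketch below. The grouping of units by hosting node, the per-node failure probability, and the absence of any rescaling are assumptions made for illustration; the paper's actual procedure may differ.

```python
import random


def failout(activations, node_of_unit, failure_prob, rng):
    """Zero every activation hosted on a physical node sampled as failed.

    Unlike ordinary dropout, units are not dropped independently: all units
    placed on the same node fail together.
    """
    nodes = sorted(set(node_of_unit))
    failed = {node for node in nodes if rng.random() < failure_prob}
    masked = [0.0 if node in failed else value
              for value, node in zip(activations, node_of_unit)]
    return masked, failed


rng = random.Random(0)
activations = [0.7, 1.2, 0.3, 2.1, 0.9, 0.4]
node_of_unit = [0, 0, 1, 1, 2, 2]  # hypothetical placement of units on 3 nodes
masked, failed = failout(activations, node_of_unit, 0.3, rng)
print(failed, masked)
```

At inference time a real node failure produces the same pattern of zeroed activations, which is why training under this mask can make the network more tolerant of it.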

MCMLSD: A Probabilistic Algorithm and Evaluation Framework for Line Segment Detection

Title MCMLSD: A Probabilistic Algorithm and Evaluation Framework for Line Segment Detection
Authors James H. Elder, Emilio J. Almazàn, Yiming Qian, Ron Tal
Abstract Traditional approaches to line segment detection typically involve perceptual grouping in the image domain and/or global accumulation in the Hough domain. Here we propose a probabilistic algorithm that merges the advantages of both approaches. In the first stage, lines are detected using a global probabilistic Hough approach. In the second stage, each detected line is analyzed in the image domain to localize the line segments that generated the peak in the Hough map. By limiting search to a line, the distribution of segments over the sequence of points on the line can be modeled as a Markov chain, and a probabilistically optimal labelling can be computed exactly using a standard dynamic programming algorithm, in linear time. The Markov assumption also leads to an intuitive ranking method that uses the local marginal posterior probabilities to estimate the expected number of correctly labelled points on a segment. To assess the resulting Markov Chain Marginal Line Segment Detector (MCMLSD) we develop and apply a novel quantitative evaluation methodology that controls for under- and over-segmentation. Evaluation on the YorkUrbanDB and Wireframe datasets shows that the proposed MCMLSD method outperforms prior traditional approaches, as well as more recent deep learning methods.
Tasks Line Segment Detection
Published 2020-01-06
URL https://arxiv.org/abs/2001.01788v1
PDF https://arxiv.org/pdf/2001.01788v1.pdf
PWC https://paperswithcode.com/paper/mcmlsd-a-probabilistic-algorithm-and
Repo
Framework
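
The exact labelling step described in the abstract above can be sketched with the Viterbi algorithm on a two-state Markov chain, where state 1 means "on a segment" and state 0 means "off". This is a generic dynamic programming sketch; the likelihoods, prior, and transition probabilities below are hypothetical and are not the paper's model.

```python
import math


def viterbi_binary(log_lik, log_prior, log_trans):
    """Most probable 0/1 labelling of points along a line, in linear time.

    log_lik[t][s]  : log-likelihood of the observation at point t under state s
    log_prior[s]   : log-probability of state s at the first point
    log_trans[p][s]: log-probability of moving from state p to state s
    """
    if not log_lik:
        return []
    score = [log_prior[s] + log_lik[0][s] for s in (0, 1)]
    back = []
    for t in range(1, len(log_lik)):
        new_score, pointers = [], []
        for s in (0, 1):
            best_prev = max((0, 1), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + log_lik[t][s])
        score = new_score
        back.append(pointers)
    state = max((0, 1), key=lambda s: score[s])
    labels = [state]
    for pointers in reversed(back):  # trace the best path backwards
        state = pointers[state]
        labels.append(state)
    labels.reverse()
    return labels


def logs(values):
    return [math.log(v) for v in values]


prior = logs([0.5, 0.5])
trans = [logs([0.9, 0.1]), logs([0.1, 0.9])]  # sticky states
# Likelihoods as (off, on); the third point weakly favours "off".
noisy = [logs(p) for p in [(0.2, 0.8), (0.2, 0.8), (0.6, 0.4), (0.2, 0.8), (0.2, 0.8)]]
print(viterbi_binary(noisy, prior, trans))
```

The sticky transition matrix is what lets the chain bridge a single weakly contradicting point rather than splitting the segment there.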

Detecting Symmetries with Neural Networks

Title Detecting Symmetries with Neural Networks
Authors Sven Krippendorf, Marc Syvaeri
Abstract Identifying symmetries in data sets is generally difficult, but knowledge about them is crucial for efficient data handling. Here we present a method by which neural networks can be used to identify symmetries. We make extensive use of the structure in the embedding layer of the neural network, which allows us to identify whether a symmetry is present and to identify orbits of the symmetry in the input. To determine which continuous or discrete symmetry group is present, we analyse the invariant orbits in the input. We present examples based on rotation groups $SO(n)$ and the unitary group $SU(2)$. Further, we find that this method is useful for the classification of complete intersection Calabi-Yau manifolds, where it is crucial to identify discrete symmetries on the input space. For this example we present a novel data representation in terms of graphs.
Tasks
Published 2020-03-30
URL https://arxiv.org/abs/2003.13679v1
PDF https://arxiv.org/pdf/2003.13679v1.pdf
PWC https://paperswithcode.com/paper/detecting-symmetries-with-neural-networks
Repo
Framework

Prediction with Spatio-temporal Point Processes with Self Organizing Decision Trees

Title Prediction with Spatio-temporal Point Processes with Self Organizing Decision Trees
Authors Oguzhan Karaahmetoglu, Suleyman Serdar Kozat
Abstract We study the spatio-temporal prediction problem, which has attracted the attention of many researchers due to its critical real-life applications. In particular, we introduce a novel approach to this problem. Our approach is based on the Hawkes process, which is a non-stationary and self-exciting point process. We extend the formulations of a standard point process model that can represent time-series data to represent spatio-temporal data. We model the data as nonstationary in time and space. Furthermore, we partition the spatial region we are working on into subregions via an adaptive decision tree and model the source statistics in each subregion with individual but mutually interacting point processes. We also provide a gradient based joint optimization algorithm for the point process and decision tree parameters. Thus, we introduce a model that can jointly infer the source statistics and an adaptive partitioning of the spatial region. Finally, we provide experimental results on real-life data, which show significant improvement due to space adaptation and joint optimization compared to standard well-known methods in the literature.
Tasks Point Processes, Time Series
Published 2020-03-07
URL https://arxiv.org/abs/2003.03657v1
PDF https://arxiv.org/pdf/2003.03657v1.pdf
PWC https://paperswithcode.com/paper/prediction-with-spatio-temporal-point
Repo
Framework
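
The self-exciting behaviour of the Hawkes process mentioned in the abstract above can be sketched with its conditional intensity. The example assumes the common purely temporal form with an exponential kernel, lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i)); the paper's spatio-temporal, tree-partitioned model is more elaborate, and the parameter values here are hypothetical.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity at time t given the events strictly before t.

    mu    : baseline rate
    alpha : jump in intensity caused by each event
    beta  : exponential decay rate of that excitation
    """
    excitation = sum(alpha * math.exp(-beta * (t - t_i))
                     for t_i in event_times if t_i < t)
    return mu + excitation


events = [1.0, 1.5, 4.0]
for t in (0.5, 2.0, 3.5, 4.1):
    print(t, hawkes_intensity(t, events, mu=0.2, alpha=0.5, beta=1.0))
```

Each event raises the intensity, which then decays back toward the baseline, so clustered events make further events more likely for a short time.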