April 1, 2020

Paper Group ANR 405

Improving the Backpropagation Algorithm with Consequentialism Weight Updates over Mini-Batches

Title Improving the Backpropagation Algorithm with Consequentialism Weight Updates over Mini-Batches
Authors Naeem Paeedeh, Kamaledin Ghiasi-Shirazi
Abstract Least mean squares (LMS) is a particular case of the backpropagation (BP) algorithm applied to single-layer neural networks with the mean squared error (MSE) loss. One drawback of the LMS is that the instantaneous weight update is proportional to the square of the norm of the input vector. The normalized least mean squares (NLMS) algorithm amends this drawback by dividing the weight changes by the square of the norm of the input vector. The affine projection algorithm (APA) extended the NLMS algorithm to update the weights over a batch of recently seen samples. However, the application of NLMS and APA has been limited to single-layer networks and adaptive filters. In this paper, we consider a virtual target for each neuron of a multi-layer neural network and show that the BP algorithm is equivalent to training the weights of each layer using these virtual targets and the LMS algorithm. We also introduce a consequentialism interpretation of the NLMS and the APA algorithms that justifies their use in multi-layer neural networks. Given any optimization algorithm based on the BP over mini-batches, we propose a novel consequentialism method for updating the weights. Consequently, our proposed weight update can be applied both to plain stochastic gradient descent (SGD) and to momentum methods like RMSProp, Adam, and NAG. These ideas helped us to update the weights more carefully, in such a way that minimization of the loss for one sample of the mini-batch does not interfere with other samples in that mini-batch. Our experiments show the usefulness of the proposed method in optimizing deep neural network architectures.
Published 2020-03-11
URL https://arxiv.org/abs/2003.05164v1
PDF https://arxiv.org/pdf/2003.05164v1.pdf
PWC https://paperswithcode.com/paper/improving-the-backpropagation-algorithm-with
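
The contrast between LMS and NLMS described in the abstract can be illustrated with a minimal sketch for a single linear neuron. This is not the paper's consequentialism update for multi-layer networks; the function names (`lms_step`, `nlms_step`), the learning rate, and the toy data are assumptions chosen only for illustration. With a plain LMS step, the change in the prediction for the sample scales with the squared norm of the input; dividing by that squared norm removes the dependence.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def lms_step(w, x, d, lr):
    """One LMS update: w <- w + lr * e * x, where e = d - w.x."""
    e = d - dot(w, x)
    return [wi + lr * e * xi for wi, xi in zip(w, x)]


def nlms_step(w, x, d, lr, eps=1e-12):
    """One NLMS update: the LMS step divided by the squared input norm."""
    e = d - dot(w, x)
    scale = lr / (dot(x, x) + eps)  # eps guards against a zero input vector
    return [wi + scale * e * xi for wi, xi in zip(w, x)]


# Hypothetical toy sample: the input has squared norm 25.
w0 = [0.0, 0.0]
x = [3.0, 4.0]
d = 1.0

w_lms = lms_step(w0, x, d, lr=1.0)
w_nlms = nlms_step(w0, x, d, lr=1.0)

# Residual error on the same sample after one step.
lms_error_after = d - dot(w_lms, x)    # overshoots: e * (1 - lr * ||x||^2)
nlms_error_after = d - dot(w_nlms, x)  # approximately zero when lr = 1
```

In this toy case the LMS step overshoots the target by a factor set by the input norm, while the NLMS step with a unit learning rate lands on the target for that sample.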

Efficient Memory Management for Deep Neural Net Inference

Title Efficient Memory Management for Deep Neural Net Inference
Authors Yury Pisarchyk, Juhyun Lee
Abstract While deep neural net inference was considered a task for servers only, latest advances in technology allow the task of inference to be moved to mobile and embedded devices, desired for various reasons ranging from latency to privacy. These devices are not only limited by their compute power and battery, but also by their inferior physical memory and cache, and thus, an efficient memory manager becomes a crucial component for deep neural net inference at the edge. We explore various strategies to smartly share memory buffers among intermediate tensors in deep neural nets. Employing these can result in up to 11% smaller memory footprint than the state of the art.
Published 2020-01-10
URL https://arxiv.org/abs/2001.03288v3
PDF https://arxiv.org/pdf/2001.03288v3.pdf
PWC https://paperswithcode.com/paper/efficient-memory-management-for-deep-neural
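
The idea of sharing memory buffers among intermediate tensors can be sketched with a generic greedy heuristic. This is an assumption-laden illustration, not necessarily one of the strategies evaluated in the paper: each tensor is described by a hypothetical tuple of name, first and last operator index at which it is live, and size in bytes, and two tensors may share a buffer only if their lifetimes do not overlap.

```python
def share_buffers(tensors):
    """Greedily assign tensors to shared buffers.

    tensors: list of (name, first_op, last_op, size_bytes).
    Tensors are visited from largest to smallest, so the first tensor
    placed in a buffer determines that buffer's size.
    """
    buffers = []  # each buffer: {"size": int, "tensors": [(name, first, last)]}
    for name, first, last, size in sorted(tensors, key=lambda t: -t[3]):
        for buf in buffers:
            # Reuse the buffer only if this tensor's lifetime is disjoint
            # from the lifetime of every tensor already stored there.
            if all(last < f or l < first for _, f, l in buf["tensors"]):
                buf["tensors"].append((name, first, last))
                break
        else:
            buffers.append({"size": size, "tensors": [(name, first, last)]})
    return buffers


# Hypothetical chain of four intermediate tensors.
tensors = [
    ("a", 0, 1, 100),
    ("b", 1, 2, 80),
    ("c", 2, 3, 100),
    ("d", 3, 4, 40),
]

buffers = share_buffers(tensors)
shared_total = sum(buf["size"] for buf in buffers)
naive_total = sum(size for _, _, _, size in tensors)
```

For this chain the heuristic places `a` and `c` in one buffer and `b` and `d` in another, so the shared footprint is smaller than allocating every tensor separately.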

Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology

Title Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology
Authors Stefan Studer, Thanh Binh Bui, Christian Drescher, Alexander Hanuschkin, Ludwig Winkler, Steven Peters, Klaus-Robert Mueller
Abstract We propose a process model for the development of machine learning applications. It guides machine learning practitioners and project organizations from industry and academia with a checklist of tasks that spans the complete project life-cycle, ranging from the very first idea to the continuous maintenance of any machine learning application. With each task, we propose quality assurance methodology that is drawn from practical experience and scientific literature and that has proven to be general and stable enough to be included in best practices. We expand on CRISP-DM, a data mining process model that enjoys strong industry support but fails to address machine-learning-specific tasks.
Published 2020-03-11
URL https://arxiv.org/abs/2003.05155v1
PDF https://arxiv.org/pdf/2003.05155v1.pdf
PWC https://paperswithcode.com/paper/towards-crisp-mlq-a-machine-learning-process

TCM-ICP: Transformation Compatibility Measure for Registering Multiple LIDAR Scans

Title TCM-ICP: Transformation Compatibility Measure for Registering Multiple LIDAR Scans
Authors Aby Thomas, Adarsh Sunilkumar, Shankar Shylesh, Aby Abahai T., Subhasree Methirumangalath, Dong Chen, Jiju Peethambaran
Abstract Rigid registration of multi-view and multi-platform LiDAR scans is a fundamental problem in 3D mapping, robotic navigation, and large-scale urban modeling applications. Data acquisition with LiDAR sensors involves scanning multiple areas from different points of view, thus generating partially overlapping point clouds of the real world scenes. Traditionally, the ICP (Iterative Closest Point) algorithm is used to register the acquired point clouds together to form a unique point cloud that captures the scanned real world scene. Conventional ICP faces local minima issues and often needs a coarse initial alignment to converge to the optimum. In this work, we present an algorithm for registering multiple, overlapping LiDAR scans. We introduce a geometric metric called Transformation Compatibility Measure (TCM) which aids in choosing the most similar point clouds for registration in each iteration of the algorithm. The LiDAR scan most similar to the reference LiDAR scan is then transformed using the simplex technique. An optimization of the transformation using gradient descent and simulated annealing techniques is then applied to improve the resulting registration. We evaluate the proposed algorithm on four different real world scenes, and experimental results show that the registration performance of the proposed method is comparable or superior to the traditionally used registration methods. Further, the algorithm achieves superior registration results even when dealing with outliers.
Published 2020-01-04
URL https://arxiv.org/abs/2001.01129v2
PDF https://arxiv.org/pdf/2001.01129v2.pdf
PWC https://paperswithcode.com/paper/tcm-icp-transformation-compatibility-measure

Decoupling Learning Rates Using Empirical Bayes Priors

Title Decoupling Learning Rates Using Empirical Bayes Priors
Authors Sareh Nabi, Houssam Nassif, Joseph Hong, Hamed Mamani, Guido Imbens
Abstract In this work, we propose an Empirical Bayes approach to decouple the learning rates of first order and second order features (or any other feature grouping) in a Generalized Linear Model. Such needs arise in small-batch or low-traffic use-cases. As the first order features are likely to have a more pronounced effect on the outcome, focusing on learning first order weights first is likely to improve performance and convergence time. Our Empirical Bayes method clamps features in each group together and uses the observed data for the deployed model to empirically compute a hierarchical prior in hindsight. We apply our method to a standard classification setting, as well as a contextual bandit setting in an Amazon production system. Both during simulations and live experiments, our method shows marked improvements, especially in cases of small traffic. Our findings are promising, as optimizing over sparse data is often a challenge. Furthermore, our approach can be applied to any problem instance modeled as a Bayesian framework.
Published 2020-02-04
URL https://arxiv.org/abs/2002.01129v1
PDF https://arxiv.org/pdf/2002.01129v1.pdf
PWC https://paperswithcode.com/paper/decoupling-learning-rates-using-empirical

Approximating Trajectory Constraints with Machine Learning – Microgrid Islanding with Frequency Constraints

Title Approximating Trajectory Constraints with Machine Learning – Microgrid Islanding with Frequency Constraints
Authors Yichen Zhang, Chen Chen, Guodong Liu, Tianqi Hong, Feng Qiu
Abstract In this paper, we introduce a deep learning aided constraint encoding method to tackle the frequency-constraint microgrid scheduling problem. The nonlinear function between system operating condition and frequency nadir is approximated by using a neural network, which admits an exact mixed-integer formulation (MIP). This formulation is then integrated with the scheduling problem to encode the frequency constraint. With the stronger representation power of the neural network, the resulting commands can ensure adequate frequency response in a realistic setting in addition to islanding success. The proposed method is validated on a modified 33-node system. Successful islanding with a secure response is simulated under the scheduled commands using a detailed three-phase model in Simulink. The advantages of our model are particularly remarkable when the inertia emulation functions from wind turbine generators are considered.
Published 2020-01-16
URL https://arxiv.org/abs/2001.05775v2
PDF https://arxiv.org/pdf/2001.05775v2.pdf
PWC https://paperswithcode.com/paper/approximating-trajectory-constraints-with

Simple and Scalable Epistemic Uncertainty Estimation Using a Single Deep Deterministic Neural Network

Title Simple and Scalable Epistemic Uncertainty Estimation Using a Single Deep Deterministic Neural Network
Authors Joost van Amersfoort, Lewis Smith, Yee Whye Teh, Yarin Gal
Abstract We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass. Our approach, deterministic uncertainty quantification (DUQ), builds upon ideas of RBF networks. We scale training in these with a novel loss function and centroid updating scheme. By enforcing detectability of changes in the input using a gradient penalty, we are able to reliably detect out of distribution data. Our uncertainty quantification scales well to large datasets, and using a single model, we improve upon or match Deep Ensembles on notable difficult dataset pairs such as FashionMNIST vs. MNIST, and CIFAR-10 vs. SVHN, while maintaining competitive accuracy.
Published 2020-03-04
URL https://arxiv.org/abs/2003.02037v1
PDF https://arxiv.org/pdf/2003.02037v1.pdf
PWC https://paperswithcode.com/paper/simple-and-scalable-epistemic-uncertainty

FlexServe: Deployment of PyTorch Models as Flexible REST Endpoints

Title FlexServe: Deployment of PyTorch Models as Flexible REST Endpoints
Authors Edward Verenich, Alvaro Velasquez, M. G. Sarwar Murshed, Faraz Hussain
Abstract The integration of artificial intelligence capabilities into modern software systems is increasingly being simplified through the use of cloud-based machine learning services and representational state transfer architecture design. However, insufficient information regarding underlying model provenance and the lack of control over model evolution serve as an impediment to the more widespread adoption of these services in many operational environments which have strict security requirements. Furthermore, tools such as TensorFlow Serving allow models to be deployed as RESTful endpoints, but require error-prone transformations for PyTorch models, as these use dynamic computational graphs. This is in contrast to the static computational graphs of TensorFlow. To enable rapid deployments of PyTorch models without intermediate transformations, we have developed FlexServe, a simple library to deploy multi-model ensembles with flexible batching.
Published 2020-02-29
URL https://arxiv.org/abs/2003.01538v1
PDF https://arxiv.org/pdf/2003.01538v1.pdf
PWC https://paperswithcode.com/paper/flexserve-deployment-of-pytorch-models-as

Towards a Computer Vision Particle Flow

Title Towards a Computer Vision Particle Flow
Authors Francesco Armando Di Bello, Sanmay Ganguly, Eilam Gross, Marumi Kado, Michael Pitt, Jonathan Shlomi, Lorenzo Santi
Abstract In high energy physics experiments, Particle Flow (PFlow) algorithms are designed to reach optimal calorimeter reconstruction and jet energy resolution. A computer vision approach to PFlow reconstruction using deep Neural Network techniques based on Convolutional layers (cPFlow) is proposed. The algorithm is trained to learn, from calorimeter and charged particle track images, to distinguish the calorimeter energy deposits from neutral and charged particles in a non-trivial context, where the energy originated by a $\pi^{+}$ and a $\pi^{0}$ is overlapping within calorimeter clusters. The performance of the cPFlow and a traditional parametrized PFlow (pPFlow) algorithm are compared. The cPFlow provides a precise reconstruction of the neutral and charged energy in the calorimeter and therefore outperforms the more traditional pPFlow algorithm in both energy response and position resolution.
Published 2020-03-19
URL https://arxiv.org/abs/2003.08863v1
PDF https://arxiv.org/pdf/2003.08863v1.pdf
PWC https://paperswithcode.com/paper/towards-a-computer-vision-particle-flow

SCALE-Net: Scalable Vehicle Trajectory Prediction Network under Random Number of Interacting Vehicles via Edge-enhanced Graph Convolutional Neural Network

Title SCALE-Net: Scalable Vehicle Trajectory Prediction Network under Random Number of Interacting Vehicles via Edge-enhanced Graph Convolutional Neural Network
Authors Hyeongseok Jeon, Junwon Choi, Dongsuk Kum
Abstract Predicting the future trajectory of surrounding vehicles in a randomly varying traffic level is one of the most challenging problems in developing an autonomous vehicle. Since the number of interacting vehicles is not pre-defined, the prediction network has to be scalable with respect to the vehicle number in order to guarantee consistency in terms of both accuracy and computational load. In this paper, the first fully scalable trajectory prediction network, SCALE-Net, is proposed that can ensure both higher prediction performance and consistent computational load regardless of the number of surrounding vehicles. The SCALE-Net employs the Edge-enhanced Graph Convolutional Neural Network (EGCN) for the inter-vehicular interaction embedding network. Since the proposed EGCN is inherently scalable with respect to the graph node (an agent in this study), the model can be operated independently from the total number of vehicles considered. We evaluated the scalability of the SCALE-Net on the publicly available NGSIM datasets by comparing variations in computation time and prediction accuracy per single driving scene with respect to the varying vehicle number. The experimental test shows that both computation time and prediction performance of the SCALE-Net consistently outperform those of previous models regardless of the level of traffic complexity.
Tasks Trajectory Prediction
Published 2020-02-28
URL https://arxiv.org/abs/2002.12609v1
PDF https://arxiv.org/pdf/2002.12609v1.pdf
PWC https://paperswithcode.com/paper/scale-net-scalable-vehicle-trajectory

Impact of Data Quality on Deep Neural Network Training

Title Impact of Data Quality on Deep Neural Network Training
Authors Subrata Goswami
Abstract It is well known that data is critical for training neural networks. A lot has been written about the quantities of data required to train networks well. However, there are not many publications on how data quality affects the convergence of such networks. There is a dearth of information on what is considered good data (for the task). This empirical experimental study explores some impacts of data quality. Specific results in the paper show how simple changes can have an impact on Mean Average Precision (mAP).
Published 2020-01-20
URL https://arxiv.org/abs/2002.03732v1
PDF https://arxiv.org/pdf/2002.03732v1.pdf
PWC https://paperswithcode.com/paper/impact-of-data-quality-on-deep-neural-network

Online Tensor-Based Learning for Multi-Way Data

Title Online Tensor-Based Learning for Multi-Way Data
Authors Ali Anaissi, Basem Suleiman, Seid Miad Zandavi
Abstract The online analysis of multi-way data stored in a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times \dots \times I_N}$ has become an essential tool for capturing the underlying structures and extracting the sensitive features which can be used to learn a predictive model. However, data distributions often evolve with time and a current predictive model may not be sufficiently representative in the future. Therefore, incrementally updating the tensor-based features and model coefficients is required in such situations. A new efficient tensor-based feature extraction method, named NeSGD, is proposed for online CANDECOMP/PARAFAC (CP) decomposition. According to the new features obtained from the resultant matrices of NeSGD, a new criterion is triggered for the update process of the online predictive model. Experimental evaluation in the field of structural health monitoring using laboratory-based and real-life structural datasets shows that our methods provide more accurate results compared with existing online tensor analysis and model learning. The results showed that the proposed methods significantly improved the classification error rates, were able to assimilate the changes in the positive data distribution over time, and maintained a high predictive accuracy in all case studies.
Published 2020-03-10
URL https://arxiv.org/abs/2003.04497v1
PDF https://arxiv.org/pdf/2003.04497v1.pdf
PWC https://paperswithcode.com/paper/online-tensor-based-learning-for-multi-way
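
For readers unfamiliar with the CP model that the abstract builds on, the following minimal sketch reconstructs a three-way tensor from its factor matrices, using the standard definition $x_{ijk} = \sum_r a_{ir} b_{jr} c_{kr}$. It shows only the model itself, not the NeSGD algorithm or its online update; the function name `cp_reconstruct` and the small factor matrices are hypothetical.

```python
def cp_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from CP factor matrices.

    A is I x R, B is J x R, C is K x R (lists of rows).
    Returns X with X[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r].
    """
    rank = len(A[0])
    return [
        [
            [sum(a[r] * b[r] * c[r] for r in range(rank)) for c in C]
            for b in B
        ]
        for a in A
    ]


# Hypothetical rank-2 factors for a 2 x 2 x 2 tensor.
A = [[1, 0], [0, 1]]
B = [[1, 2], [3, 4]]
C = [[1, 1], [0, 2]]

X = cp_reconstruct(A, B, C)
```

An online method would update the rows of such factor matrices as new slices of data arrive, rather than refitting the whole decomposition from scratch.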

Exocentric to Egocentric Image Generation via Parallel Generative Adversarial Network

Title Exocentric to Egocentric Image Generation via Parallel Generative Adversarial Network
Authors Gaowen Liu, Hao Tang, Hugo Latapie, Yan Yan
Abstract Cross-view image generation has been recently proposed to generate images of one view from another dramatically different view. In this paper, we investigate exocentric (third-person) view to egocentric (first-person) view image generation. This is a challenging task since egocentric view sometimes is remarkably different from exocentric view. Thus, transforming the appearances across the two views is a non-trivial task. To this end, we propose a novel Parallel Generative Adversarial Network (P-GAN) with a novel cross-cycle loss to learn the shared information for generating egocentric images from exocentric view. We also incorporate a novel contextual feature loss in the learning procedure to capture the contextual information in images. Extensive experiments on the Exo-Ego datasets show that our model outperforms the state-of-the-art approaches.
Tasks Image Generation
Published 2020-02-08
URL https://arxiv.org/abs/2002.03219v1
PDF https://arxiv.org/pdf/2002.03219v1.pdf
PWC https://paperswithcode.com/paper/exocentric-to-egocentric-image-generation-via

Adversarial Code Learning for Image Generation

Title Adversarial Code Learning for Image Generation
Authors Jiangbo Yuan, Bing Wu, Wanying Ding, Qing Ping, Zhendong Yu
Abstract We introduce the “adversarial code learning” (ACL) module that improves overall image generation performance for several types of deep models. Instead of performing a posterior distribution modeling in the pixel spaces of generators, ACLs aim to jointly learn a latent code with another image encoder/inference net, with a prior noise as its input. We conduct the learning in an adversarial learning process, which bears a close resemblance to the original GAN but again shifts the learning from image spaces to prior and latent code spaces. ACL is a portable module that brings much more flexibility and many more possibilities to generative model designs. First, it allows flexibility to convert non-generative models like Autoencoders and standard classification models to decent generative models. Second, it enhances existing GANs’ performance by generating meaningful codes and images from any part of the prior. We have incorporated our ACL module with the aforementioned frameworks and have performed experiments on synthetic, MNIST, CIFAR-10, and CelebA datasets. Our models have achieved significant improvements, which demonstrate their generality for image generation tasks.
Tasks Image Generation
Published 2020-01-30
URL https://arxiv.org/abs/2001.11539v1
PDF https://arxiv.org/pdf/2001.11539v1.pdf
PWC https://paperswithcode.com/paper/adversarial-code-learning-for-image

On the Optimization Dynamics of Wide Hypernetworks

Title On the Optimization Dynamics of Wide Hypernetworks
Authors Etai Littwin, Tomer Galanti, Lior Wolf
Abstract Recent results in the theoretical study of deep learning have shown that the optimization dynamics of wide neural networks exhibit a surprisingly simple behaviour. In this work, we study the optimization dynamics of hypernetworks, which are architectures in which a learned meta-network produces the weights of a task-specific primary network. Hypernetworks have been demonstrated repeatedly to obtain state of the art results. However, their theoretical understanding is still lacking. As can be expected, the optimization process of multiplicative models is much more complicated than optimizing standard ReLU networks. It is shown that for an infinitely wide neural network with a gating layer, the cost function cannot be accurately approximated by its first-order Taylor approximation. Specifically, for a fixed sized primary network of depth H, the first H terms of the Taylor approximation of the cost function are non-zero, even when the meta-network is infinitely wide. However, for infinitely wide meta and primary networks, the learning dynamics is determined by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters, and the kernel of this process is given by the Hadamard product of the kernels induced by the meta and primary networks. As part of our study, we partially solve an open problem suggested by Dyer & Gur-Ari (2020) and show that the convergence rate of the r-th order term of the Taylor expansion of the cost function, along the optimization trajectories of SGD, is n^{1-r}, where n is the width of the learned neural network, improving upon the n^{-1} bound suggested by the conjecture of Dyer & Gur-Ari, while matching their empirical observations.
Published 2020-03-27
URL https://arxiv.org/abs/2003.12193v2
PDF https://arxiv.org/pdf/2003.12193v2.pdf
PWC https://paperswithcode.com/paper/on-the-optimization-dynamics-of-wide