October 18, 2019

2930 words 14 mins read

Paper Group ANR 586

Accelerated Gossip via Stochastic Heavy Ball Method. A Mean-Field Optimal Control Formulation of Deep Learning. Tübingen-Oslo system: Linear regression works the best at Predicting Current and Future Psychological Health from Childhood Essays in the CLPsych 2018 Shared Task. The Research of the Real-time Detection and Recognition of Targets in Stre …

Accelerated Gossip via Stochastic Heavy Ball Method

Title Accelerated Gossip via Stochastic Heavy Ball Method
Authors Nicolas Loizou, Peter Richtárik
Abstract In this paper we show how the stochastic heavy ball method (SHB) – a popular method for solving stochastic convex and non-convex optimization problems – operates as a randomized gossip algorithm. In particular, we focus on two special cases of SHB: the Randomized Kaczmarz method with momentum and its block variant. Building upon a recent framework for the design and analysis of randomized gossip algorithms [Loizou & Richtárik, 2016], we interpret the distributed nature of the proposed methods. We present novel protocols for solving the average consensus problem where in each step all nodes of the network update their values but only a subset of them exchange their private values. We also present numerical experiments on popular wireless sensor network topologies that show the benefits of our protocols.
Tasks
Published 2018-09-23
URL http://arxiv.org/abs/1809.08657v1
PDF http://arxiv.org/pdf/1809.08657v1.pdf
PWC https://paperswithcode.com/paper/accelerated-gossip-via-stochastic-heavy-ball
Repo
Framework
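
The heavy ball gossip update is easy to prototype. Below is a minimal sketch, assuming a ring network and a hand-picked momentum parameter `beta`: the step is plain pairwise averaging (the Randomized Kaczmarz step for consensus) plus a momentum correction, so all nodes update while only one pair exchanges values.

```python
# Minimal sketch: randomized pairwise gossip with heavy ball momentum.
# Topology, momentum value, and step count are illustrative assumptions.
import numpy as np

def gossip_shb(x0, edges, beta=0.4, steps=3000, seed=0):
    """Average consensus: average one random edge's endpoints per step,
    then apply a momentum correction that updates every node."""
    rng = np.random.default_rng(seed)
    x, x_prev = x0.astype(float).copy(), x0.astype(float).copy()
    for _ in range(steps):
        i, j = edges[rng.integers(len(edges))]
        x_new = x.copy()
        x_new[i] = x_new[j] = 0.5 * (x[i] + x[j])  # plain gossip (Kaczmarz) step
        x_new += beta * (x - x_prev)               # heavy ball momentum term
        x_prev, x = x, x_new
    return x

# Ring network with 20 nodes; all values should approach the global mean.
n = 20
edges = [(i, (i + 1) % n) for i in range(n)]
x0 = np.random.default_rng(1).normal(size=n)
print(x0.mean(), gossip_shb(x0, edges)[:3])
```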

A Mean-Field Optimal Control Formulation of Deep Learning

Title A Mean-Field Optimal Control Formulation of Deep Learning
Authors Weinan E, Jiequn Han, Qianxiao Li
Abstract Recent work linking deep neural networks and dynamical systems opened up new avenues to analyze deep learning. In particular, it is observed that new insights can be obtained by recasting deep learning as an optimal control problem on difference or differential equations. However, the mathematical aspects of such a formulation have not been systematically explored. This paper introduces the mathematical formulation of the population risk minimization problem in deep learning as a mean-field optimal control problem. Mirroring the development of classical optimal control, we state and prove optimality conditions of both the Hamilton-Jacobi-Bellman type and the Pontryagin type. These mean-field results reflect the probabilistic nature of the learning problem. In addition, by appealing to the mean-field Pontryagin’s maximum principle, we establish some quantitative relationships between population and empirical learning problems. This serves to establish a mathematical foundation for investigating the algorithmic and theoretical connections between optimal control and deep learning.
Tasks
Published 2018-07-03
URL http://arxiv.org/abs/1807.01083v1
PDF http://arxiv.org/pdf/1807.01083v1.pdf
PWC https://paperswithcode.com/paper/a-mean-field-optimal-control-formulation-of
Repo
Framework
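
To make the formulation concrete, here is a schematic statement of the mean-field optimal control problem in the spirit of the paper. The notation is a hedged paraphrase ($f$ for the network dynamics, $\Phi$ for the terminal loss, $L$ for the running cost); the precise function classes and regularity conditions are in the paper.

```latex
% Schematic mean-field optimal control formulation of population risk
% minimization (notation paraphrased; see the paper for exact conditions):
\inf_{\theta \in L^{\infty}([0,T];\,\Theta)}
  J(\theta) \;=\; \mathbb{E}_{(x_0,\, y) \sim \mu}
  \left[ \Phi(x_T, y) + \int_{0}^{T} L(x_t, \theta_t)\, dt \right]
\quad \text{subject to} \quad
  \dot{x}_t = f(x_t, \theta_t), \qquad x_0 \sim \mu .
```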

Tübingen-Oslo system: Linear regression works the best at Predicting Current and Future Psychological Health from Childhood Essays in the CLPsych 2018 Shared Task

Title Tübingen-Oslo system: Linear regression works the best at Predicting Current and Future Psychological Health from Childhood Essays in the CLPsych 2018 Shared Task
Authors Çağrı Çöltekin, Taraka Rama
Abstract This paper describes our efforts in predicting current and future psychological health from childhood essays within the scope of the CLPsych-2018 Shared Task. We experimented with a number of different models, including recurrent and convolutional networks, Poisson regression, support vector regression, and L1 and L2 regularized linear regression. We obtained the best results on the training/development data with L2 regularized linear regression (ridge regression), which also achieved the best scores on the main metrics in the official testing for task A (predicting psychological health from essays written at the age of 11 years) and task B (predicting later psychological health from essays written at the age of 11).
Tasks
Published 2018-09-13
URL http://arxiv.org/abs/1809.04838v1
PDF http://arxiv.org/pdf/1809.04838v1.pdf
PWC https://paperswithcode.com/paper/tubingen-oslo-system-linear-regression-works
Repo
Framework
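
The winning configuration is, at heart, just ridge regression over text features. A minimal sketch follows; the feature extraction, hyperparameters, essays, and scores below are illustrative assumptions, not the shared-task setup.

```python
# Hedged sketch: L2-regularized linear (ridge) regression over simple
# bag-of-words features of essays. All data and settings are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

essays = ["I went to the seaside with my family ...",
          "When I grow up I want to be a pilot ..."]
scores = [2.3, 1.1]   # hypothetical psychological-health scores

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # word + bigram features
    Ridge(alpha=1.0),                                # L2 regularization
)
model.fit(essays, scores)
print(model.predict(["Today we played football in the park ..."]))
```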

The Research of the Real-time Detection and Recognition of Targets in Streetscape Videos

Title The Research of the Real-time Detection and Recognition of Targets in Streetscape Videos
Authors Liu Jian-min
Abstract This study proposes a method for the real-time detection and recognition of targets in streetscape videos. The proposed method is based on separation confidence computation and scale synthesis optimization. We use the proposed method to detect and recognize targets in streetscape videos with high frame rates and high definition. Furthermore, we experimentally demonstrate that the accuracy and robustness of our proposed method are superior to those of conventional methods.
Tasks
Published 2018-06-11
URL http://arxiv.org/abs/1806.04070v1
PDF http://arxiv.org/pdf/1806.04070v1.pdf
PWC https://paperswithcode.com/paper/the-research-of-the-real-time-detection-and
Repo
Framework

Avoiding overfitting of multilayer perceptrons by training derivatives

Title Avoiding overfitting of multilayer perceptrons by training derivatives
Authors V. I. Avrutskiy
Abstract Resistance to overfitting is observed for neural networks trained with an extended backpropagation algorithm. In addition to target values, its cost function uses derivatives of those up to the $4^{\mathrm{th}}$ order. For common applications of neural networks, high order derivatives are not readily available, so simpler cases are considered: training a network to approximate an analytical function inside 2D and 5D domains, and solving the Poisson equation inside a 2D circle. For function approximation, the cost is a sum of squared differences between output and target as well as their derivatives with respect to the input. Differential equations are usually solved by putting a multilayer perceptron in place of the unknown function and training its weights so that the equation holds within some margin of error. The commonly used cost is the equation’s squared residual; the added terms are squared derivatives of said residual with respect to the independent variables. To investigate overfitting, the cost is minimized for points of regular grids with various spacing, and its root mean is compared with its value on a much denser test set. Fully connected perceptrons with six hidden layers and $2\cdot10^{4}$, $1\cdot10^{6}$ and $5\cdot10^{6}$ weights in total are trained with Rprop until the cost changes by less than 10% over the last 1000 epochs, or until the $10000^{\mathrm{th}}$ epoch is reached. Training the network with $5\cdot10^{6}$ weights to represent a simple 2D function using 10 points with 8 extra derivatives each produces a test-to-train cost ratio of $1.5$, whereas for classical backpropagation in comparable conditions this ratio is $2\cdot10^{4}$.
Tasks
Published 2018-02-28
URL http://arxiv.org/abs/1802.10301v1
PDF http://arxiv.org/pdf/1802.10301v1.pdf
PWC https://paperswithcode.com/paper/avoiding-overfitting-of-multilayer
Repo
Framework
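
The extended cost is straightforward to reproduce with automatic differentiation. Below is a minimal PyTorch sketch, assuming a small 2D function-approximation setup and only first-order input derivatives (the paper goes up to fourth order); architecture, optimizer, and grid are illustrative.

```python
# Hedged sketch: train an MLP on function values *and* first-order input
# derivatives (a Sobolev-style cost). All settings are illustrative.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)

def target(x):                                # analytic function to fit
    return torch.sin(x[:, :1]) * torch.cos(x[:, 1:])

x = (torch.rand(100, 2) * 2 - 1).requires_grad_(True)  # coarse training grid
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(2000):
    y, yt = net(x), target(x)
    # derivatives of prediction and target with respect to the inputs
    gy = torch.autograd.grad(y.sum(), x, create_graph=True)[0]
    gt = torch.autograd.grad(yt.sum(), x, retain_graph=True)[0]
    loss = ((y - yt) ** 2).mean() + ((gy - gt) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```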

Hierarchical Clustering with Prior Knowledge

Title Hierarchical Clustering with Prior Knowledge
Authors Xiaofei Ma, Satya Dhavala
Abstract Hierarchical clustering is a class of algorithms that seeks to build a hierarchy of clusters. It has been the dominant approach to constructing embedded classification schemes since it outputs dendrograms, which capture the hierarchical relationships among members at all levels of granularity simultaneously. Being greedy in the algorithmic sense, a hierarchical clustering partitions data at every step solely based on a similarity/dissimilarity measure. The clustering results often depend not only on the distribution of the underlying data but also on the choice of dissimilarity measure and clustering algorithm. In this paper, we propose a method to incorporate prior domain knowledge about entity relationships into hierarchical clustering. Specifically, we use a distance function in ultrametric space to encode the external ontological information. We show that popular linkage-based algorithms can faithfully recover the encoded structure. Similar to some regularized machine learning techniques, we add this distance as a penalty term to the original pairwise distance to regulate the final structure of the dendrogram. As a case study, we apply this method to real data in building a customer-behavior-based product taxonomy for an Amazon service, leveraging the information from a larger Amazon-wide browse structure. The method is useful when one wants to leverage relational information from external sources, or when the data used to generate the distance matrix is noisy and sparse. Our work falls in the category of semi-supervised or constrained clustering.
Tasks
Published 2018-06-09
URL http://arxiv.org/abs/1806.03432v3
PDF http://arxiv.org/pdf/1806.03432v3.pdf
PWC https://paperswithcode.com/paper/hierarchical-clustering-with-prior-knowledge
Repo
Framework
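
In code, the "prior as penalty" idea can be as simple as adding a two-level ultrametric, derived from known groups, to the observed distances before linkage. A minimal sketch, assuming toy data, a known three-group ontology, and an illustrative penalty weight `lam`:

```python
# Hedged sketch: encode prior group structure as an ultrametric distance and
# add it as a penalty to observed distances before standard linkage.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
labels = np.array([0, 0, 1, 1, 2, 2])      # prior ontology: three known groups

d_obs = squareform(pdist(X))               # observed dissimilarities
# simple two-level ultrametric: small within groups, large across groups
d_prior = np.where(labels[:, None] == labels[None, :], 0.5, 2.0)
np.fill_diagonal(d_prior, 0.0)

lam = 1.0                                  # penalty weight (assumption)
d_total = d_obs + lam * d_prior
Z = linkage(squareform(d_total), method="average")
print(Z)                                   # dendrogram merge history
```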

Probabilistic Deep Learning using Random Sum-Product Networks

Title Probabilistic Deep Learning using Random Sum-Product Networks
Authors Robert Peharz, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Kristian Kersting, Zoubin Ghahramani
Abstract The need for consistent treatment of uncertainty has recently triggered increased interest in probabilistic deep learning methods. However, most current approaches have severe limitations when it comes to inference, since many of these models do not even permit evaluation of exact data likelihoods. Sum-product networks (SPNs), on the other hand, are an excellent architecture in that regard, as they allow efficient evaluation of likelihoods, as well as arbitrary marginalization and conditioning tasks. Nevertheless, SPNs have not been fully explored as serious deep learning models, likely due to their special structural requirements, which complicate learning. In this paper, we make a drastic simplification and use random SPN structures which are trained in a “classical deep learning manner”, i.e. employing automatic differentiation, SGD, and GPU support. The resulting models, called RAT-SPNs, yield prediction results comparable to deep neural networks, while still being interpretable as generative models and maintaining well-calibrated uncertainties. This property makes them highly robust under missing input features and enables them to naturally detect outliers and peculiar samples.
Tasks
Published 2018-06-05
URL http://arxiv.org/abs/1806.01910v2
PDF http://arxiv.org/pdf/1806.01910v2.pdf
PWC https://paperswithcode.com/paper/probabilistic-deep-learning-using-random-sum
Repo
Framework
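
The spirit of the construction — random structure, trained with SGD on exact log-likelihood — can be shown with a deliberately tiny stand-in: one sum node over products of Gaussian leaves on a randomly permuted scope. This is a toy approximation, not the RAT-SPN construction itself.

```python
# Toy stand-in for a randomly structured SPN trained "the deep learning way":
# one sum node (softmax weights) over K product nodes of Gaussian leaves on a
# randomly permuted scope. Illustrative only, not the RAT-SPN construction.
import torch

torch.manual_seed(0)
D, K = 6, 8                                   # input dims, product nodes
perm = torch.randperm(D)                      # random scope assignment
mu = torch.nn.Parameter(torch.randn(K, D))
log_sigma = torch.nn.Parameter(torch.zeros(K, D))
logits = torch.nn.Parameter(torch.zeros(K))

def log_prob(x):                              # x: (N, D)
    xs = x[:, perm].unsqueeze(1)              # (N, 1, D)
    leaf = torch.distributions.Normal(mu, log_sigma.exp()).log_prob(xs)
    prod = leaf.sum(-1)                       # product node = sum of log-probs
    w = torch.log_softmax(logits, dim=0)      # sum-node weights
    return torch.logsumexp(prod + w, dim=-1)  # exact log-likelihood

opt = torch.optim.Adam([mu, log_sigma, logits], lr=1e-2)
data = torch.randn(512, D) + 2.0              # synthetic training data
for _ in range(500):
    loss = -log_prob(data).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```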

When Does Stochastic Gradient Algorithm Work Well?

Title When Does Stochastic Gradient Algorithm Work Well?
Authors Lam M. Nguyen, Nam H. Nguyen, Dzung T. Phan, Jayant R. Kalagnanam, Katya Scheinberg
Abstract In this paper, we consider a general stochastic optimization problem which is often at the core of supervised learning, such as deep learning and linear classification. We consider a standard stochastic gradient descent (SGD) method with a fixed, large step size and propose a novel assumption on the objective function, under which this method has the improved convergence rates (to a neighborhood of the optimal solutions). We then empirically demonstrate that these assumptions hold for logistic regression and standard deep neural networks on classical data sets. Thus our analysis helps to explain when efficient behavior can be expected from the SGD method in training classification models and deep neural networks.
Tasks Stochastic Optimization
Published 2018-01-18
URL http://arxiv.org/abs/1801.06159v2
PDF http://arxiv.org/pdf/1801.06159v2.pdf
PWC https://paperswithcode.com/paper/when-does-stochastic-gradient-algorithm-work
Repo
Framework
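
For reference, the object of study is just vanilla SGD with a fixed, fairly large step size. A minimal sketch on synthetic logistic regression (step size and data are illustrative assumptions):

```python
# Hedged sketch: plain SGD with a fixed, large step size on logistic
# regression, the setting the analysis above targets. Data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

w = np.zeros(d)
eta = 0.5                                    # fixed large step size
for epoch in range(20):
    for i in rng.permutation(n):
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))  # sigmoid prediction
        w -= eta * (p - y[i]) * X[i]         # stochastic gradient step
acc = (((X @ w) > 0) == y.astype(bool)).mean()
print(acc)
```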

BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism

Title BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism
Authors Nicolas Weber, Florian Schmidt, Mathias Niepert, Felipe Huici
Abstract Neural network frameworks such as PyTorch and TensorFlow are the workhorses of numerous machine learning applications ranging from object recognition to machine translation. While these frameworks are versatile and straightforward to use, the training of and inference in deep neural networks is resource (energy, compute, and memory) intensive. In contrast to recent works focusing on algorithmic enhancements, we introduce BrainSlug, a framework that transparently accelerates neural network workloads by changing the default layer-by-layer processing to a depth-first approach, reducing the amount of data required by the computations and thus improving the performance of the available hardware caches. BrainSlug achieves performance improvements of up to 41.1% on CPUs and 35.7% on GPUs. These optimizations come at zero cost to the user as they do not require hardware changes and only need tiny adjustments to the software.
Tasks Machine Translation, Object Recognition
Published 2018-04-23
URL http://arxiv.org/abs/1804.08378v1
PDF http://arxiv.org/pdf/1804.08378v1.pdf
PWC https://paperswithcode.com/paper/brainslug-transparent-acceleration-of-deep
Repo
Framework
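
The depth-first idea is easiest to see on element-wise layers, where pushing one cache-sized tile through the whole stack is exactly equivalent to layer-by-layer processing but touches each tile only once. A toy sketch (not the BrainSlug implementation; the tile size and layers are assumptions):

```python
# Toy sketch of depth-first vs. layer-by-layer processing. Element-wise
# "layers" can legally be fused per tile, improving cache locality.
import numpy as np

layers = [lambda t: np.maximum(t, 0),        # stand-ins for cheap,
          lambda t: t * 2.0,                 # element-wise layers
          lambda t: t + 1.0]

def layer_by_layer(x):
    for f in layers:                         # whole tensor per layer:
        x = f(x)                             # poor cache reuse on big inputs
    return x

def depth_first(x, tile=4096):
    out = np.empty_like(x)
    flat, oflat = x.ravel(), out.ravel()
    for s in range(0, flat.size, tile):      # one cache-resident tile...
        t = flat[s:s + tile]
        for f in layers:                     # ...through all layers at once
            t = f(t)
        oflat[s:s + tile] = t
    return out

x = np.random.default_rng(0).normal(size=(1 << 20,))
assert np.allclose(layer_by_layer(x), depth_first(x))
```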

Parameterized Synthetic Image Data Set for Fisheye Lens

Title Parameterized Synthetic Image Data Set for Fisheye Lens
Authors Zhen Chen, Anthimos Georgiadis
Abstract Based on different projection geometries, a fisheye image can be presented as a parameterized non-rectilinear image. Deep neural networks (DNNs) are one solution for extracting parameters that describe fisheye image features. However, a large number of images is required to train a reasonable prediction model for a DNN. In this paper, we propose to extend the scale of the training dataset using parameterized synthetic images. This effectively boosts the diversity of images and avoids the data-scale limitation. To simulate different viewing angles and distances, we adopt controllable, parameterized projection processes for the transformation. The reliability of the proposed method is demonstrated by testing on images captured by our fisheye camera. The synthetic dataset is the first that can be extended to a large-scale labeled fisheye image dataset. It is accessible via: http://www2.leuphana.de/misl/fisheye-data-set/.
Tasks
Published 2018-11-12
URL http://arxiv.org/abs/1811.04627v1
PDF http://arxiv.org/pdf/1811.04627v1.pdf
PWC https://paperswithcode.com/paper/parameterized-synthetic-image-data-set-for
Repo
Framework
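
One way such parameterized synthetic fisheye images can be generated is by warping rectilinear images under a chosen projection model. A minimal sketch using the equidistant model r = f·θ; the specific model and parameters are assumptions, not necessarily the paper's.

```python
# Hedged sketch: coordinates for warping a rectilinear image into a
# synthetic fisheye view under the equidistant model (r = f * theta).
import numpy as np

def fisheye_warp_coords(h, w, f):
    """For each output (fisheye) pixel, return the source (x, y) in the
    rectilinear image, with the projection centred on the image."""
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    cx, cy = (w - 1) / 2, (h - 1) / 2
    dx, dy = xx - cx, yy - cy
    r = np.hypot(dx, dy)                        # radius in fisheye image
    theta = r / f                               # equidistant: r = f * theta
    r_src = f * np.tan(np.clip(theta, 0, 1.4))  # rectilinear: r = f * tan(theta)
    scale = np.where(r > 0, r_src / np.maximum(r, 1e-9), 1.0)
    return cx + dx * scale, cy + dy * scale

xs, ys = fisheye_warp_coords(480, 640, f=300.0)
print(xs.shape, ys.shape)   # sample these coords from the source image
```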

Feature Preserving and Uniformity-controllable Point Cloud Simplification on Graph

Title Feature Preserving and Uniformity-controllable Point Cloud Simplification on Graph
Authors Junkun Qi, Wei Hu, Zongming Guo
Abstract With the development of 3D sensing technologies, point clouds have attracted increasing attention in a variety of applications for 3D object representation, such as autonomous driving, 3D immersive tele-presence and heritage reconstruction. However, it is challenging to process large-scale point clouds in terms of both computation time and storage due to the tremendous amounts of data. Hence, we propose a point cloud simplification algorithm, aiming to strike a balance between preserving sharp features and keeping uniform density during resampling. In particular, leveraging graph spectral processing, we represent irregular point clouds naturally on graphs, and propose concise formulations of feature preservation and density uniformity based on graph filters. The problem of point cloud simplification is finally formulated as a trade-off between the two factors and efficiently solved by our proposed algorithm. Experimental results demonstrate the superiority of our method, as well as its efficient application in point cloud registration.
Tasks Autonomous Driving, Point Cloud Registration
Published 2018-12-29
URL http://arxiv.org/abs/1812.11383v1
PDF http://arxiv.org/pdf/1812.11383v1.pdf
PWC https://paperswithcode.com/paper/feature-preserving-and-uniformity
Repo
Framework
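
A toy approximation of the trade-off: score each point by a local high-pass (Laplacian-like) response on a kNN graph for feature preservation, blend with a uniform term for density control, then resample. This is a sketch of the idea, not the paper's exact graph-filter formulation; `alpha` is an illustrative trade-off weight.

```python
# Hedged sketch: feature-vs-uniformity trade-off for point cloud resampling.
import numpy as np
from scipy.spatial import cKDTree

def simplify(points, keep, k=8, alpha=0.7, seed=0):
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)            # kNN graph (self + k)
    neigh_mean = points[idx[:, 1:]].mean(axis=1)
    feature = np.linalg.norm(points - neigh_mean, axis=1)  # ~Laplacian response
    feature = feature / (feature.sum() + 1e-12)
    uniform = np.full(len(points), 1.0 / len(points))
    prob = alpha * feature + (1 - alpha) * uniform  # trade-off weighting
    rng = np.random.default_rng(seed)
    sel = rng.choice(len(points), size=keep, replace=False, p=prob)
    return points[sel]

pts = np.random.default_rng(1).normal(size=(5000, 3))
print(simplify(pts, keep=500).shape)
```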

Synthesized Texture Quality Assessment via Multi-scale Spatial and Statistical Texture Attributes of Image and Gradient Magnitude Coefficients

Title Synthesized Texture Quality Assessment via Multi-scale Spatial and Statistical Texture Attributes of Image and Gradient Magnitude Coefficients
Authors S. Alireza Golestaneh, Lina Karam
Abstract Perceptual quality assessment for synthesized textures is a challenging task. In this paper, we propose a training-free reduced-reference (RR) objective quality assessment method that quantifies the perceived quality of synthesized textures. The proposed reduced-reference synthesized texture quality assessment metric is based on measuring the spatial and statistical attributes of the texture image using both image- and gradient-based wavelet coefficients at multiple scales. Performance evaluations on two synthesized texture databases demonstrate that our proposed RR synthesized texture quality metric significantly outperforms both full-reference and RR state-of-the-art quality metrics in predicting the perceived visual quality of the synthesized textures.
Tasks
Published 2018-04-21
URL http://arxiv.org/abs/1804.08020v2
PDF http://arxiv.org/pdf/1804.08020v2.pdf
PWC https://paperswithcode.com/paper/synthesized-texture-quality-assessment-via
Repo
Framework
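
A hedged sketch of the reduced-reference recipe: extract per-scale statistics of wavelet subbands for both the image and its gradient magnitude, then score a synthesized texture by its distance to the reference statistics. The specific wavelet, statistics, and distance below are assumptions, not the paper's exact metric.

```python
# Hedged sketch: reduced-reference texture quality via multi-scale wavelet
# statistics of the image and its gradient magnitude.
import numpy as np
import pywt

def texture_stats(img, levels=3):
    grad = np.hypot(*np.gradient(img.astype(float)))
    feats = []
    for channel in (img.astype(float), grad):
        coeffs = pywt.wavedec2(channel, "db2", level=levels)
        for detail in coeffs[1:]:              # (cH, cV, cD) per scale
            for band in detail:
                feats += [band.mean(), band.std(), np.abs(band).mean()]
    return np.array(feats)

def rr_quality(reference, synthesized):
    a, b = texture_stats(reference), texture_stats(synthesized)
    return np.linalg.norm(a - b)               # lower = closer to reference

ref = np.random.default_rng(0).random((128, 128))
syn = ref + 0.1 * np.random.default_rng(1).random((128, 128))
print(rr_quality(ref, syn))
```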

Guiding Neural Machine Translation with Retrieved Translation Pieces

Title Guiding Neural Machine Translation with Retrieved Translation Pieces
Authors Jingyi Zhang, Masao Utiyama, Eiichro Sumita, Graham Neubig, Satoshi Nakamura
Abstract One of the difficulties of neural machine translation (NMT) is the recall and appropriate translation of low-frequency words or phrases. In this paper, we propose a simple, fast, and effective method for recalling previously seen translation examples and incorporating them into the NMT decoding process. Specifically, for an input sentence, we use a search engine to retrieve sentence pairs whose source sides are similar to the input sentence, and then collect $n$-grams that are both in the retrieved target sentences and aligned with words that match in the source sentences, which we call “translation pieces”. We compute pseudo-probabilities for each retrieved sentence based on similarities between the input sentence and the retrieved source sentences, and use these to weight the retrieved translation pieces. Finally, an existing NMT model is used to translate the input sentence, with an additional bonus given to outputs that contain the collected translation pieces. We show our method improves NMT translation results by up to 6 BLEU points on three narrow-domain translation tasks where repetitiveness of the target sentences is particularly salient. It also adds little to the translation time, and compares favorably to another retrieval-based method with respect to accuracy, speed, and simplicity of implementation.
Tasks Machine Translation
Published 2018-04-07
URL http://arxiv.org/abs/1804.02559v1
PDF http://arxiv.org/pdf/1804.02559v1.pdf
PWC https://paperswithcode.com/paper/guiding-neural-machine-translation-with
Repo
Framework
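
The "translation pieces" bookkeeping is simple to sketch: collect n-grams from retrieved target sentences, weight each by the best source-side similarity supporting it, and add a bonus to candidates that contain them. Retrieval and the word-alignment filtering the paper uses are elided here; this is an illustrative simplification.

```python
# Hedged sketch: collect weighted "translation pieces" and score candidates.
from collections import defaultdict

def ngrams(tokens, n_max=4):
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def collect_pieces(retrieved):
    """retrieved: list of (source_similarity, target_tokens) pairs."""
    pieces = defaultdict(float)
    for sim, tgt in retrieved:
        for g in ngrams(tgt):
            pieces[g] = max(pieces[g], sim)    # keep best supporting score
    return pieces

def bonus(candidate_tokens, pieces, weight=1.0):
    return weight * sum(pieces.get(g, 0.0) for g in ngrams(candidate_tokens))

retrieved = [(0.9, "the contract enters into force today".split()),
             (0.4, "the treaty enters into effect".split())]
pieces = collect_pieces(retrieved)
print(bonus("the contract enters into force tomorrow".split(), pieces))
```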

Towards safe deep learning: accurately quantifying biomarker uncertainty in neural network predictions

Title Towards safe deep learning: accurately quantifying biomarker uncertainty in neural network predictions
Authors Zach Eaton-Rosen, Felix Bragman, Sotirios Bisdas, Sebastien Ourselin, M. Jorge Cardoso
Abstract Automated medical image segmentation, specifically using deep learning, has shown outstanding performance in semantic segmentation tasks. However, these methods rarely quantify their uncertainty, which may lead to errors in downstream analysis. In this work we propose to use Bayesian neural networks to quantify uncertainty within the domain of semantic segmentation. We also propose a method to convert voxel-wise segmentation uncertainty into volumetric uncertainty, and calibrate the accuracy and reliability of confidence intervals of derived measurements. When applied to a tumour volume estimation application, we demonstrate that by using such modelling of uncertainty, deep learning systems can be made to report volume estimates with well-calibrated error-bars, making them safer for clinical use. We also show that the uncertainty estimates extrapolate to unseen data, and that the confidence intervals are robust in the presence of artificial noise. This could be used to provide a form of quality control and quality assurance, and may permit further adoption of deep learning tools in the clinic.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2018-06-22
URL http://arxiv.org/abs/1806.08640v1
PDF http://arxiv.org/pdf/1806.08640v1.pdf
PWC https://paperswithcode.com/paper/towards-safe-deep-learning-accurately
Repo
Framework
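
The voxel-to-volume conversion can be sketched with Monte Carlo samples: draw segmentation samples (e.g., from stochastic forward passes of a Bayesian network), convert each to a volume, and read off percentile error bars. The probabilities and voxel size below are simulated stand-ins.

```python
# Hedged sketch: voxel-wise uncertainty -> volumetric confidence interval
# via Monte Carlo segmentation samples.
import numpy as np

rng = np.random.default_rng(0)
voxel_volume_ml = 0.001                         # 1 mm^3 voxels (assumption)

# T stochastic passes -> T binary segmentations of a (64, 64, 64) scan
T = 50
prob = rng.beta(2, 5, size=(64, 64, 64))        # stand-in voxel probabilities
samples = rng.random((T,) + prob.shape) < prob  # one draw per pass

volumes = samples.reshape(T, -1).sum(axis=1) * voxel_volume_ml
lo, hi = np.percentile(volumes, [2.5, 97.5])
print(f"volume = {volumes.mean():.1f} ml, 95% CI [{lo:.1f}, {hi:.1f}]")
```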

A Learning Theory in Linear Systems under Compositional Models

Title A Learning Theory in Linear Systems under Compositional Models
Authors Se Un Park
Abstract We present a learning theory for the training of a linear system operator with an input compositional variable, and propose a Bayesian inversion method for inferring the unknown variable from the output of a noisy linear system. We assume that we have partial or even no knowledge of the operator but have training data of inputs and outputs. A compositional variable satisfies the constraints that the elements of the variable are all non-negative and sum to unity. We quantify the uncertainty in the trained operator and present the convergence rates of training in explicit form for several interesting cases under stochastic compositional models. The trained linear operator, with a covariance matrix estimated from the training set of pairs of ground-truth input and noisy output data, is further used to evaluate the posterior uncertainty of the solution. This posterior uncertainty clearly demonstrates uncertainty propagation from noisy training data and addresses possible mismatch between the true operator and the estimated one in the final solution.
Tasks
Published 2018-06-29
URL http://arxiv.org/abs/1807.00084v1
PDF http://arxiv.org/pdf/1807.00084v1.pdf
PWC https://paperswithcode.com/paper/a-learning-theory-in-linear-systems-under
Repo
Framework
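
A minimal sketch of the pipeline under simplifying assumptions: estimate the operator by least squares from training pairs, then invert a new noisy output under the compositional constraints via NNLS plus renormalization, a crude stand-in for the paper's Bayesian inversion.

```python
# Hedged sketch: train a linear operator from data, then invert a noisy
# output under compositional constraints (non-negative, sums to one).
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
m, d, n_train = 8, 5, 200
A_true = rng.normal(size=(m, d))

X = rng.dirichlet(np.ones(d), size=n_train)      # compositional inputs
Y = X @ A_true.T + 0.05 * rng.normal(size=(n_train, m))
B, *_ = np.linalg.lstsq(X, Y, rcond=None)        # solve X @ B = Y
A_hat = B.T                                      # trained operator (m x d)

x_true = rng.dirichlet(np.ones(d))
y_obs = A_true @ x_true + 0.05 * rng.normal(size=m)
x_hat, _ = nnls(A_hat, y_obs)                    # non-negativity constraint
x_hat = x_hat / x_hat.sum()                      # sum-to-one constraint
print(np.round(x_true, 3), np.round(x_hat, 3))
```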