July 29, 2019

3226 words 16 mins read

Paper Group AWR 124

DeepVel: deep learning for the estimation of horizontal velocities at the solar surface. Deep Residual Learning for Small-Footprint Keyword Spotting. A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models. On Convergence and Stability of GANs. Developing Bug-Free Machine Learning Systems With Formal Mathematics. …

DeepVel: deep learning for the estimation of horizontal velocities at the solar surface

Title DeepVel: deep learning for the estimation of horizontal velocities at the solar surface
Authors A. Asensio Ramos, I. S. Requerey, N. Vitas
Abstract Many phenomena taking place in the solar photosphere are controlled by plasma motions. Although the line-of-sight component of the velocity can be estimated using the Doppler effect, we do not have direct spectroscopic access to the components that are perpendicular to the line of sight. These components are typically estimated using methods based on local correlation tracking. We have designed DeepVel, an end-to-end deep neural network that estimates the velocity at every pixel, at every time step, and at three different heights in the atmosphere from just two consecutive continuum images. We compare DeepVel with local correlation tracking and find that they give very similar results in the time- and spatially-averaged cases. We use the network to study the evolution in height of the horizontal velocity field in fragmenting granules, supporting the buoyancy-braking mechanism for the formation of intergranular lanes in these granules. We also show that DeepVel can capture very small vortices, so that we can potentially extend the scaling cascade of vortices to very small sizes and durations.
Tasks
Published 2017-03-15
URL http://arxiv.org/abs/1703.05128v2
PDF http://arxiv.org/pdf/1703.05128v2.pdf
PWC https://paperswithcode.com/paper/deepvel-deep-learning-for-the-estimation-of
Repo https://github.com/aasensio/deepvel
Framework tf
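
Since the entry carries no code, here is a minimal PyTorch sketch of the input/output contract the abstract describes: two consecutive continuum frames in, per-pixel horizontal velocity components out at three heights. The layer sizes are illustrative assumptions, not the paper's architecture (the linked repo uses TensorFlow/Keras).

```python
# Hedged sketch, not the paper's network: maps a stacked frame pair to
# (vx, vy) at each pixel for three atmospheric heights.
import torch
import torch.nn as nn

class DeepVelSketch(nn.Module):
    def __init__(self, n_heights=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 64, kernel_size=3, padding=1), nn.ReLU(),   # 2 channels: frames t, t+1
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2 * n_heights, kernel_size=1),             # (vx, vy) per height
        )

    def forward(self, frame_pair):           # (batch, 2, H, W)
        return self.net(frame_pair)          # (batch, 2*n_heights, H, W)

velocities = DeepVelSketch()(torch.randn(1, 2, 64, 64))
```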

Deep Residual Learning for Small-Footprint Keyword Spotting

Title Deep Residual Learning for Small-Footprint Keyword Spotting
Authors Raphael Tang, Jimmy Lin
Abstract We explore the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently-released Google Speech Commands Dataset as our benchmark. Our best residual network (ResNet) implementation significantly outperforms Google’s previous convolutional neural networks in terms of accuracy. By varying model depth and width, we can achieve compact models that also outperform previous small-footprint variants. To our knowledge, we are the first to examine these approaches for keyword spotting, and our results establish an open-source state-of-the-art reference to support the development of future speech-based interfaces.
Tasks Keyword Spotting, Small-Footprint Keyword Spotting
Published 2017-10-28
URL http://arxiv.org/abs/1710.10361v2
PDF http://arxiv.org/pdf/1710.10361v2.pdf
PWC https://paperswithcode.com/paper/deep-residual-learning-for-small-footprint
Repo https://github.com/castorini/honk
Framework pytorch
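
The core ingredient, a residual block with dilated convolutions, is compact enough to sketch. The channel count and dilation below are illustrative stand-ins, not the exact res* configurations from the paper:

```python
# Residual block with dilated 3x3 convolutions and an identity shortcut.
import torch
import torch.nn as nn

class DilatedResBlock(nn.Module):
    def __init__(self, channels=45, dilation=2):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(x + y)   # identity shortcut

# e.g. (batch, channels, time frames, mel bins) after an initial conv to 45 channels
out = DilatedResBlock()(torch.randn(1, 45, 101, 40))
```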

A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models

Title A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models
Authors Youssef Oualil, Dietrich Klakow
Abstract Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of output-layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (B-NCE) approach to alleviate this problem. This is achieved by reducing the vocabulary, at each time step, to the target words in the batch and then replacing the softmax by the noise contrastive estimation approach, where these words play the role of targets and noise samples at the same time. In doing so, the proposed approach can be fully formulated and implemented using optimal dense matrix operations. Applying B-NCE to train different NNLMs on the Large Text Compression Benchmark (LTCB) and the One Billion Word Benchmark (OBWB) shows a significant reduction of the training time with no noticeable degradation of the models' performance. This paper also presents a new baseline comparative study of different standard NNLMs on the large OBWB on a single Titan-X GPU.
Tasks
Published 2017-08-20
URL http://arxiv.org/abs/1708.05997v2
PDF http://arxiv.org/pdf/1708.05997v2.pdf
PWC https://paperswithcode.com/paper/a-batch-noise-contrastive-estimation-approach
Repo https://github.com/Stonesjtu/Pytorch-NCE
Framework pytorch
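
A rough sketch of the batch-reduction trick: score each hidden state only against the unique target words of the current batch, so every target simultaneously serves as a noise sample for the other rows. The shapes, names, and the cross-entropy stand-in for the full NCE objective are assumptions, not the authors' implementation:

```python
# Batch-reduced output layer: one dense matmul instead of a full softmax.
import torch

def batch_nce_logits(hidden, output_emb, output_bias, targets):
    """hidden: (B, d); output_emb: (V, d); output_bias: (V,); targets: (B,) word ids."""
    vocab_ids, local = torch.unique(targets, return_inverse=True)
    W = output_emb[vocab_ids]          # (V', d) output layer reduced to in-batch words
    b = output_bias[vocab_ids]         # (V',)
    logits = hidden @ W.t() + b        # (B, V') dense matmul, no full-vocabulary softmax
    return logits, local               # 'local' gives each row's target column

B, d, V = 32, 128, 10000
logits, local = batch_nce_logits(torch.randn(B, d), torch.randn(V, d),
                                 torch.zeros(V), torch.randint(0, V, (B,)))
# Softmax over the reduced vocabulary, used here as a stand-in for the NCE loss.
loss = torch.nn.functional.cross_entropy(logits, local)
```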

On Convergence and Stability of GANs

Title On Convergence and Stability of GANs
Authors Naveen Kodali, Jacob Abernethy, James Hays, Zsolt Kira
Abstract We propose studying GAN training dynamics as regret minimization, which is in contrast to the popular view that there is consistent minimization of a divergence between real and generated distributions. We analyze the convergence of GAN training from this new point of view to understand why mode collapse happens. We hypothesize that undesirable local equilibria in this non-convex game are responsible for mode collapse. We observe that these local equilibria often exhibit sharp gradients of the discriminator function around some real data points. We demonstrate that these degenerate local equilibria can be avoided with a gradient penalty scheme called DRAGAN. We show that DRAGAN enables faster training, achieves improved stability with fewer mode collapses, and leads to generator networks with better modeling performance across a variety of architectures and objective functions.
Tasks
Published 2017-05-19
URL http://arxiv.org/abs/1705.07215v5
PDF http://arxiv.org/pdf/1705.07215v5.pdf
PWC https://paperswithcode.com/paper/on-convergence-and-stability-of-gans
Repo https://github.com/eriklindernoren/PyTorch-GAN
Framework pytorch
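
The gradient penalty around real data points fits in a few lines of PyTorch. The perturbation scale and penalty coefficient below follow common DRAGAN implementations and should be treated as assumptions rather than the paper's exact recipe:

```python
# DRAGAN-style penalty: push discriminator gradient norms toward 1 at
# random perturbations of real samples.
import torch

def dragan_penalty(discriminator, real, lambda_=10.0):
    alpha = torch.rand(real.size(0), *([1] * (real.dim() - 1)))       # per-sample mix
    perturbed = real + alpha * 0.5 * real.std() * torch.rand_like(real)
    perturbed.requires_grad_(True)
    scores = discriminator(perturbed)
    grads, = torch.autograd.grad(scores.sum(), perturbed, create_graph=True)
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return lambda_ * ((grad_norm - 1.0) ** 2).mean()

# Usage: add `dragan_penalty(D, real_batch)` to the discriminator loss.
```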

Developing Bug-Free Machine Learning Systems With Formal Mathematics

Title Developing Bug-Free Machine Learning Systems With Formal Mathematics
Authors Daniel Selsam, Percy Liang, David L. Dill
Abstract Noisy data, non-convex objectives, model misspecification, and numerical instability can all cause undesired behaviors in machine learning systems. As a result, detecting actual implementation errors can be extremely difficult. We demonstrate a methodology in which developers use an interactive proof assistant both to implement their system and to state a formal theorem defining what it means for their system to be correct. The process of proving this theorem interactively in the proof assistant exposes all implementation errors, since any error in the program would cause the proof to fail. As a case study, we implement a new system, Certigrad, for optimizing over stochastic computation graphs, and we generate a formal (i.e. machine-checkable) proof that the gradients sampled by the system are unbiased estimates of the true mathematical gradients. We train a variational autoencoder using Certigrad and find the performance comparable to training the same model in TensorFlow.
Tasks
Published 2017-06-26
URL http://arxiv.org/abs/1706.08605v1
PDF http://arxiv.org/pdf/1706.08605v1.pdf
PWC https://paperswithcode.com/paper/developing-bug-free-machine-learning-systems
Repo https://github.com/dselsam/certigrad
Framework tf
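
Certigrad is developed in the Lean proof assistant. As a deliberately trivial toy illustration of the workflow (not Certigrad's actual code), the implementation and its correctness theorem live side by side, and the file fails to check unless the implementation is right:

```lean
-- Lean 4 sketch (recent toolchain, omega tactic in core): any bug in
-- `double` would make the proof below fail to check.
def double (n : Nat) : Nat := n + n

theorem double_correct (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```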

Understanding Hidden Memories of Recurrent Neural Networks

Title Understanding Hidden Memories of Recurrent Neural Networks
Authors Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, Huamin Qu
Abstract Recurrent neural networks (RNNs) have been successfully applied to various natural language processing (NLP) tasks and achieved better results than conventional methods. However, the lack of understanding of the mechanisms behind their effectiveness limits further improvements to their architectures. In this paper, we present a visual analytics method for understanding and comparing RNN models for NLP tasks. We propose a technique to explain the function of individual hidden state units based on their expected response to input texts. We then co-cluster hidden state units and words based on the expected response and visualize co-clustering results as memory chips and word clouds to provide more structured knowledge on RNNs’ hidden states. We also propose a glyph-based sequence visualization based on aggregate information to analyze the behavior of an RNN’s hidden state at the sentence level. The usability and effectiveness of our method are demonstrated through case studies and reviews from domain experts.
Tasks
Published 2017-10-30
URL http://arxiv.org/abs/1710.10777v1
PDF http://arxiv.org/pdf/1710.10777v1.pdf
PWC https://paperswithcode.com/paper/understanding-hidden-memories-of-recurrent
Repo https://github.com/myaooo/RNNVis
Framework tf
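
As a hedged illustration of the co-clustering step only, the sketch below co-clusters a word-by-unit matrix of expected responses using scikit-learn's spectral co-clustering; the paper's own algorithm is not reproduced, and the data here are random stand-in values:

```python
# Co-cluster words and hidden units from an expected-response matrix.
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)
n_words, n_units = 200, 64
expected_response = np.abs(rng.normal(size=(n_words, n_units)))  # stand-in values

model = SpectralCoclustering(n_clusters=5, random_state=0)
model.fit(expected_response)
word_clusters = model.row_labels_      # word clouds in the paper's visualization
unit_clusters = model.column_labels_   # "memory chips" of hidden units
```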

Optimizing colormaps with consideration for color vision deficiency to enable accurate interpretation of scientific data

Title Optimizing colormaps with consideration for color vision deficiency to enable accurate interpretation of scientific data
Authors Jamie R. Nuñez, Christopher R. Anderton, Ryan S. Renslow
Abstract Color vision deficiency (CVD) affects more than 4% of the population and leads to a different visual perception of colors. Though this has been known for decades, colormaps with many colors across the visual spectra are often used to represent data, leading to the potential for misinterpretation or difficulty with interpretation by someone with this deficiency. Until the creation of the module presented here, there were no colormaps mathematically optimized for CVD using modern color appearance models. While there have been some attempts to make aesthetically pleasing or subjectively tolerable colormaps for those with CVD, our goal was to make optimized colormaps for the most accurate perception of scientific data by as many viewers as possible. We developed a Python module, cmaputil, to create CVD-optimized colormaps, which imports colormaps and modifies them to be perceptually uniform in CVD-safe colorspace while linearizing and maximizing the brightness range. The module is made available to the science community to enable others to easily create their own CVD-optimized colormaps. Here, we present an example CVD-optimized colormap created with this module that is optimized for viewing by those without a CVD as well as those with red-green colorblindness. This colormap, cividis, enables nearly-identical visual-data interpretation to both groups, is perceptually uniform in hue and brightness, and increases in brightness linearly.
Tasks
Published 2017-11-29
URL http://arxiv.org/abs/1712.01662v3
PDF http://arxiv.org/pdf/1712.01662v3.pdf
PWC https://paperswithcode.com/paper/optimizing-colormaps-with-consideration-for
Repo https://github.com/pnnl/cmaputil
Framework none
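
The colormap this work produced, cividis, has shipped with matplotlib since version 2.2, so using it requires no extra dependency:

```python
# Render data with the CVD-optimized cividis colormap built into matplotlib.
import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(32, 32)
plt.imshow(data, cmap='cividis')
plt.colorbar()
plt.show()
```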

Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks

Title Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks
Authors Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, Tat-Seng Chua
Abstract Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating second-order feature interactions. Despite its effectiveness, FM can be hindered by its modelling of all feature interactions with the same weight, as not all feature interactions are equally useful and predictive. For example, interactions with useless features may even introduce noise and adversely degrade performance. In this work, we improve FM by discriminating the importance of different feature interactions. We propose a novel model named Attentional Factorization Machine (AFM), which learns the importance of each feature interaction from data via a neural attention network. Extensive experiments on two real-world datasets demonstrate the effectiveness of AFM. Empirically, AFM is shown to better FM on the regression task with an 8.6% relative improvement, and it consistently outperforms the state-of-the-art deep learning methods Wide&Deep and DeepCross with a much simpler structure and fewer model parameters. Our implementation of AFM is publicly available at: https://github.com/hexiangnan/attentional_factorization_machine
Tasks
Published 2017-08-15
URL http://arxiv.org/abs/1708.04617v1
PDF http://arxiv.org/pdf/1708.04617v1.pdf
PWC https://paperswithcode.com/paper/attentional-factorization-machines-learning
Repo https://github.com/hexiangnan/attentional_factorization_machine
Framework tf
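
A minimal PyTorch sketch of the AFM scoring path: element-wise products of embedding pairs, reweighted by attention from a small MLP. Embedding and attention sizes are illustrative, not the paper's settings, and the global bias term is omitted:

```python
# Attentional Factorization Machine, reduced to its essentials.
import torch
import torch.nn as nn

class AFMSketch(nn.Module):
    def __init__(self, n_features, k=16, attn_dim=8):
        super().__init__()
        self.emb = nn.Embedding(n_features, k)      # interaction embeddings
        self.linear = nn.Embedding(n_features, 1)   # first-order terms
        self.attn = nn.Sequential(nn.Linear(k, attn_dim), nn.ReLU(), nn.Linear(attn_dim, 1))
        self.p = nn.Linear(k, 1, bias=False)        # projection of the attended sum

    def forward(self, x):                           # x: (batch, n_fields) feature ids
        v = self.emb(x)                                       # (B, F, k)
        i, j = torch.triu_indices(v.size(1), v.size(1), offset=1)
        pairwise = v[:, i] * v[:, j]                          # (B, P, k) element-wise products
        a = torch.softmax(self.attn(pairwise), dim=1)         # (B, P, 1) pair importances
        return self.linear(x).sum(1) + self.p((a * pairwise).sum(1))  # (B, 1)

score = AFMSketch(1000)(torch.randint(0, 1000, (4, 10)))
```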

Large sample analysis of the median heuristic

Title Large sample analysis of the median heuristic
Authors Damien Garreau, Wittawat Jitkrittum, Motonobu Kanagawa
Abstract In kernel methods, the median heuristic has been widely used as a way of setting the bandwidth of RBF kernels. While its empirical performance makes it a safe choice under many circumstances, there is little theoretical understanding of why this is the case. Our aim in this paper is to advance our understanding of the median heuristic by focusing on the setting of the kernel two-sample test. We collect new findings that may be of interest for both theoreticians and practitioners. In theory, we provide a convergence analysis that shows the asymptotic normality of the bandwidth chosen by the median heuristic in the setting of the kernel two-sample test. Systematic empirical investigations are also conducted in simple settings, comparing the performance obtained with bandwidths chosen by the median heuristic against those chosen by maximization of test power.
Tasks
Published 2017-07-23
URL http://arxiv.org/abs/1707.07269v3
PDF http://arxiv.org/pdf/1707.07269v3.pdf
PWC https://paperswithcode.com/paper/large-sample-analysis-of-the-median-heuristic
Repo https://github.com/mmontana/ECG-heartbeat-classification
Framework none
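
The heuristic under analysis is essentially a one-liner: pool the two samples and set the RBF bandwidth to the median of their pairwise distances. Conventions vary (squared vs. unsquared distances, extra factors of 2); the version below is one common choice:

```python
# Median heuristic for the RBF kernel bandwidth in a two-sample setting.
import numpy as np
from scipy.spatial.distance import pdist

def median_heuristic(X, Y):
    Z = np.vstack([X, Y])        # pool the two samples
    return np.median(pdist(Z))   # median pairwise Euclidean distance

rng = np.random.default_rng(0)
sigma = median_heuristic(rng.normal(size=(100, 3)),
                         rng.normal(loc=1.0, size=(100, 3)))
```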

On Structured Prediction Theory with Calibrated Convex Surrogate Losses

Title On Structured Prediction Theory with Calibrated Convex Surrogate Losses
Authors Anton Osokin, Francis Bach, Simon Lacoste-Julien
Abstract We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called “calibration function” relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes on the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for general structured prediction.
Tasks Calibration, Structured Prediction
Published 2017-03-07
URL http://arxiv.org/abs/1703.02403v4
PDF http://arxiv.org/pdf/1703.02403v4.pdf
PWC https://paperswithcode.com/paper/on-structured-prediction-theory-with
Repo https://github.com/aosokin/consistentSurrogates_derivations
Framework none
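
For orientation, the standard role of a calibration function $H$ in this kind of analysis (generic consistency-analysis notation, which may differ from the paper's exact symbols) is to transfer surrogate guarantees to the task loss:

```latex
% Excess-risk transfer via the calibration function H, for surrogate \Phi
% and task loss L:
H\bigl(\mathcal{R}_L(f) - \mathcal{R}_L^{*}\bigr) \;\le\; \mathcal{R}_{\Phi}(f) - \mathcal{R}_{\Phi}^{*}
```

So driving the excess surrogate risk to zero controls the excess task risk whenever $H$ is strictly positive away from zero; a nearly flat $H$, as the paper argues for the 0-1 loss in exponentially large output spaces, makes the transfer weak.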

Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining

Title Benchmarking Relief-Based Feature Selection Methods for Bioinformatics Data Mining
Authors Ryan J. Urbanowicz, Randal S. Olson, Peter Schmitt, Melissa Meeker, Jason H. Moore
Abstract Modern biomedical data mining requires feature selection methods that can (1) be applied to large-scale feature spaces (e.g. 'omics' data), (2) function in noisy problems, (3) detect complex patterns of association (e.g. gene-gene interactions), (4) be flexibly adapted to various problem domains and data types (e.g. genetic variants, gene expression, and clinical data), and (5) be computationally tractable. To that end, this work examines a set of filter-style feature selection algorithms inspired by the 'Relief' algorithm, i.e. Relief-Based algorithms (RBAs). We implement and expand these RBAs in an open source framework called ReBATE (Relief-Based Algorithm Training Environment). We apply a comprehensive genetic simulation study comparing existing RBAs, a proposed RBA called MultiSURF, and other established feature selection methods, over a variety of problems. The results of this study (1) support the assertion that RBAs are particularly flexible, efficient, and powerful feature selection methods that differentiate relevant features having univariate, multivariate, epistatic, or heterogeneous associations, (2) confirm the efficacy of expansions for classification vs. regression, discrete vs. continuous features, missing data, multiple classes, or class imbalance, (3) identify previously unknown limitations of specific RBAs, and (4) suggest that while MultiSURF* performs best for explicitly identifying pure 2-way interactions, MultiSURF yields the most reliable feature selection performance across a wide range of problem types.
Tasks Feature Selection
Published 2017-11-22
URL http://arxiv.org/abs/1711.08477v2
PDF http://arxiv.org/pdf/1711.08477v2.pdf
PWC https://paperswithcode.com/paper/benchmarking-relief-based-feature-selection
Repo https://github.com/EpistasisLab/ReBATE
Framework none
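
The Relief core is simple enough to sketch from scratch: features that differ on nearest misses gain weight, and features that differ on nearest hits lose it. This toy version (Manhattan distances, binary classes) is illustrative only; ReBATE provides the production implementations:

```python
# Toy Relief: weight features by nearest-hit vs. nearest-miss differences.
import numpy as np

def relief(X, y, n_iter, rng):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)   # Manhattan distance to sample i
        dists[i] = np.inf                      # exclude the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dists, np.inf))    # nearest same-class neighbor
        miss = np.argmin(np.where(~same, dists, np.inf))  # nearest other-class neighbor
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

rng = np.random.default_rng(0)
X = rng.random((200, 8))
y = (X[:, 0] > 0.5).astype(int)    # only feature 0 is informative
print(relief(X, y, 100, rng))      # feature 0 should get the largest weight
```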

A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent

Title A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent
Authors Ben London
Abstract We study the generalization error of randomized learning algorithms – focusing on stochastic gradient descent (SGD) – using a novel combination of PAC-Bayes and algorithmic stability. Importantly, our generalization bounds hold for all posterior distributions on an algorithm’s random hyperparameters, including distributions that depend on the training data. This inspires an adaptive sampling algorithm for SGD that optimizes the posterior at runtime. We analyze this algorithm in the context of our generalization bounds and evaluate it on a benchmark dataset. Our experiments demonstrate that adaptive sampling can reduce empirical risk faster than uniform sampling while also improving out-of-sample accuracy.
Tasks
Published 2017-09-19
URL http://arxiv.org/abs/1709.06617v4
PDF http://arxiv.org/pdf/1709.06617v4.pdf
PWC https://paperswithcode.com/paper/a-pac-bayesian-analysis-of-randomized
Repo https://github.com/Zymrael/PAC-Adasampling
Framework pytorch
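
As a loose sketch of the adaptive-sampling idea (not the paper's exact update rule), one can maintain a posterior over example indices and tilt it toward currently hard examples between SGD steps:

```python
# Skeleton of non-uniform example sampling for SGD; the tilt rule is a
# placeholder for the paper's posterior optimization.
import numpy as np

rng = np.random.default_rng(0)
n, eta = 1000, 0.5
losses = rng.random(n)            # stand-in for current per-example losses

weights = np.exp(eta * losses)    # exponential tilt toward hard examples
weights /= weights.sum()          # posterior over example indices

for _ in range(100):              # SGD loop skeleton
    i = rng.choice(n, p=weights)
    # ... gradient step on example i; refresh losses[i]; re-tilt weights ...
```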

Diving into the shallows: a computational perspective on large-scale shallow learning

Title Diving into the shallows: a computational perspective on large-scale shallow learning
Authors Siyuan Ma, Mikhail Belkin
Abstract In this paper we first identify a basic limitation in gradient descent-based optimization methods when used in conjunction with smooth kernels. An analysis based on the spectral properties of the kernel demonstrates that only a vanishingly small portion of the function space is reachable after a polynomial number of gradient descent iterations. This lack of approximating power drastically limits gradient descent for a fixed computational budget, leading to serious over-regularization/underfitting. The issue is purely algorithmic, persisting even in the limit of infinite data. To address this shortcoming in practice, we introduce EigenPro iteration, based on a preconditioning scheme using a small number of approximately computed eigenvectors. It can also be viewed as learning a new kernel optimized for gradient descent. It turns out that injecting this small (computationally inexpensive and SGD-compatible) amount of approximate second-order information leads to major improvements in convergence. For large data, this translates into a significant performance boost over standard kernel methods. In particular, we are able to consistently match or improve the state-of-the-art results recently reported in the literature with a small fraction of their computational budget. Finally, we feel that these results show a need for a broader computational perspective on modern large-scale learning to complement more traditional statistical and convergence analyses. In particular, many phenomena of large-scale high-dimensional inference are best understood in terms of optimization on infinite-dimensional Hilbert spaces, where standard algorithms can sometimes have properties at odds with finite-dimensional intuition. A systematic analysis concentrating on the approximation power of such algorithms within a budget of computation may lead to progress both in theory and practice.
Tasks
Published 2017-03-30
URL http://arxiv.org/abs/1703.10622v2
PDF http://arxiv.org/pdf/1703.10622v2.pdf
PWC https://paperswithcode.com/paper/diving-into-the-shallows-a-computational
Repo https://github.com/EigenPro/EigenPro-matlab
Framework none
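
A numpy sketch of the preconditioning idea under stated assumptions: damp the kernel matrix's top-k eigendirections so a much larger step size becomes stable. The exact damping and step-size rules in the paper (and its stochastic variant) differ in detail:

```python
# EigenPro-flavored preconditioned gradient descent for kernel regression.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 10
X = rng.normal(size=(n, 5))
y = rng.normal(size=n)
K = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))   # RBF kernel matrix, bandwidth 1

vals, vecs = np.linalg.eigh(K)
lam, Q = vals[-k:], vecs[:, -k:]   # top-k eigenpairs (ascending order from eigh)
tau = vals[-k - 1]                 # damp top modes down to the (k+1)-th eigenvalue

alpha = np.zeros(n)
lr = 1.0 / tau                     # far larger step than plain GD would tolerate
for _ in range(100):
    grad = K @ alpha - y                           # functional gradient at the data
    grad -= Q @ ((1 - tau / lam) * (Q.T @ grad))   # shrink the top-k components
    alpha -= lr * grad
```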

Accelerated Stochastic Power Iteration

Title Accelerated Stochastic Power Iteration
Authors Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu
Abstract Principal component analysis (PCA) is one of the most powerful tools in machine learning. The simplest method for PCA, the power iteration, requires $\mathcal O(1/\Delta)$ full-data passes to recover the principal component of a matrix with eigen-gap $\Delta$. Lanczos, a significantly more complex method, achieves an accelerated rate of $\mathcal O(1/\sqrt{\Delta})$ passes. Modern applications, however, motivate methods that only ingest a subset of available data, known as the stochastic setting. In the online stochastic setting, simple algorithms like Oja’s iteration achieve the optimal sample complexity $\mathcal O(\sigma^2/\Delta^2)$. Unfortunately, they are fully sequential, and also require $\mathcal O(\sigma^2/\Delta^2)$ iterations, far from the $\mathcal O(1/\sqrt{\Delta})$ rate of Lanczos. We propose a simple variant of the power iteration with an added momentum term that achieves both the optimal sample and iteration complexity. In the full-pass setting, standard analysis shows that momentum achieves the accelerated rate, $\mathcal O(1/\sqrt{\Delta})$. We demonstrate empirically that naively applying momentum to a stochastic method does not result in acceleration. We perform a novel, tight variance analysis that reveals the “breaking-point variance” beyond which this acceleration does not occur. By combining this insight with modern variance reduction techniques, we construct stochastic PCA algorithms, for the online and offline setting, that achieve an accelerated iteration complexity $\mathcal O(1/\sqrt{\Delta})$. Due to the embarrassingly parallel nature of our methods, this acceleration translates directly to wall-clock time if deployed in a parallel environment. Our approach is very general, and applies to many non-convex optimization problems that can now be accelerated using the same technique.
Tasks Dimensionality Reduction
Published 2017-07-10
URL http://arxiv.org/abs/1707.02670v1
PDF http://arxiv.org/pdf/1707.02670v1.pdf
PWC https://paperswithcode.com/paper/accelerated-stochastic-power-iteration
Repo https://github.com/cgh2797/PCA_SVD_accelerate
Framework none
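
The deterministic, full-pass form of the momentum power iteration is a two-term recurrence. In practice the momentum coefficient must be estimated; here it is set from the true second eigenvalue purely for illustration:

```python
# Momentum power iteration: x_{t+1} = A x_t - beta * x_{t-1}.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 50))
A = A @ A.T                                    # symmetric PSD test matrix
beta = 0.25 * np.linalg.eigvalsh(A)[-2] ** 2   # lambda_2^2 / 4, estimated in practice

x_prev = np.zeros(50)
x = rng.normal(size=50)
x /= np.linalg.norm(x)
for _ in range(100):
    x, x_prev = A @ x - beta * x_prev, x
    nrm = np.linalg.norm(x)
    x, x_prev = x / nrm, x_prev / nrm          # rescale both iterates together

top_eigvec = x                                 # approximates the leading eigenvector
```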

Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks

Title Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks
Authors Sijie Yan, Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang
Abstract Fashion landmarks are functional key points defined on clothes, such as corners of the neckline, hemline, and cuff. They have recently been introduced as an effective visual representation for fashion image understanding. However, detecting fashion landmarks is challenging due to background clutter, human poses, and scales. To remove the above variations, previous works usually assumed that bounding boxes of clothes are provided in training and test as additional annotations, which are expensive to obtain and inapplicable in practice. This work addresses unconstrained fashion landmark detection, where clothing bounding boxes are not provided in either training or test. To this end, we present a novel Deep LAndmark Network (DLAN), where bounding boxes and landmarks are jointly estimated and trained iteratively in an end-to-end manner. DLAN contains two dedicated modules, including a Selective Dilated Convolution for handling scale discrepancies, and a Hierarchical Recurrent Spatial Transformer for handling background clutter. To evaluate DLAN, we present a large-scale fashion landmark dataset, namely Unconstrained Landmark Database (ULD), consisting of 30K images. Statistics show that ULD is more challenging than existing datasets in terms of image scales, background clutter, and human poses. Extensive experiments demonstrate the effectiveness of DLAN over the state-of-the-art methods. DLAN also exhibits excellent generalization across different clothing categories and modalities, making it extremely suitable for real-world fashion analysis.
Tasks
Published 2017-08-07
URL http://arxiv.org/abs/1708.02044v1
PDF http://arxiv.org/pdf/1708.02044v1.pdf
PWC https://paperswithcode.com/paper/unconstrained-fashion-landmark-detection-via
Repo https://github.com/shumming/GLE_FLD
Framework pytorch
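
A hedged sketch of what a "selective" dilated convolution could look like: parallel branches at several dilation rates with a learned soft selection over them. The gating design below is an assumption for illustration, not the paper's exact module:

```python
# Parallel dilated branches with a learned soft selection over rates.
import torch
import torch.nn as nn

class SelectiveDilatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, len(rates), 1),
                                  nn.Softmax(dim=1))

    def forward(self, x):
        g = self.gate(x)                                        # (B, n_rates, 1, 1)
        outs = torch.stack([b(x) for b in self.branches], 1)    # (B, n_rates, C, H, W)
        return (g.unsqueeze(2) * outs).sum(1)                   # soft selection over rates

y = SelectiveDilatedConv(16, 32)(torch.randn(1, 16, 56, 56))
```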