January 30, 2020

3127 words 15 mins read

Paper Group ANR 475

Least-squares Optimal Relative Planar Motion for Vehicle-mounted Cameras

Title Least-squares Optimal Relative Planar Motion for Vehicle-mounted Cameras
Authors Levente Hajder, Daniel Barath
Abstract A new closed-form solver is proposed that minimizes the algebraic error optimally, in the least-squares sense, to estimate the relative planar motion of two calibrated cameras. The main objective is to solve the over-determined case, i.e., when a larger-than-minimal sample of point correspondences is given, thus estimating the motion from at least three correspondences. The algorithm requires the camera movement to be constrained to a plane, e.g., mounted on a vehicle, and the image plane to be orthogonal to the ground. The solver obtains the motion parameters as the roots of a sixth-degree polynomial. It is validated both in synthetic experiments and on publicly available real-world datasets that the proposed solver leads to results superior to the state of the art in terms of geometric accuracy, with no noticeable deterioration in processing time.
Tasks
Published 2019-12-13
URL https://arxiv.org/abs/1912.06464v1
PDF https://arxiv.org/pdf/1912.06464v1.pdf
PWC https://paperswithcode.com/paper/least-squares-optimal-relative-planar-motion
Repo
Framework
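
The paper's closed-form polynomial solver is not reproduced here; the sketch below only illustrates the least-squares objective it optimizes, under the assumption that planar motion is parameterized by a rotation angle theta about the vertical axis and a translation direction angle phi in the ground plane, and minimizes it with a generic numerical optimizer instead of the sixth-degree polynomial.

```python
import numpy as np
from scipy.optimize import minimize

def planar_pose(theta, phi):
    """Rotation theta about the vertical (y) axis and a unit translation
    at angle phi in the ground plane: the planar-motion constraint."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    t = np.array([np.cos(phi), 0.0, np.sin(phi)])
    return R, t

def algebraic_error(params, x1, x2):
    """Least-squares epipolar cost sum_i (x2_i^T E x1_i)^2 with E = [t]_x R."""
    R, t = planar_pose(*params)
    t_x = np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])
    r = np.einsum('ij,jk,ik->i', x2, t_x @ R, x1)
    return np.sum(r ** 2)

# Synthetic over-determined sample: 20 correspondences (the minimal case is 3).
rng = np.random.default_rng(0)
R, t = planar_pose(0.3, 1.1)                       # ground-truth motion
P1 = rng.uniform([-1, -1, 4], [1, 1, 8], (20, 3))  # 3D points in cam-1 frame
P2 = P1 @ R.T + t                                  # same points in cam-2 frame
x1, x2 = P1 / P1[:, 2:3], P2 / P2[:, 2:3]          # normalized image points

est = minimize(algebraic_error, x0=[0.0, 0.5], args=(x1, x2))
print(est.x)  # recovers (0.3, 1.1) up to the usual sign ambiguities
```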

A Soft STAPLE Algorithm Combined with Anatomical Knowledge

Title A Soft STAPLE Algorithm Combined with Anatomical Knowledge
Authors Eytan Kats, Jacob Goldberger, Hayit Greenspan
Abstract Supervised machine learning algorithms, especially in the medical domain, are affected by considerable ambiguity in expert markings. In this study we address the case where the experts' opinion is obtained as a distribution over the possible values. We propose a soft version of the STAPLE algorithm for fusing experts' markings that can handle soft values. The algorithm was applied to obtain a consensus from soft Multiple Sclerosis (MS) segmentation masks. Soft MS segmentations are constructed from manual binary delineations by including lesion-surrounding voxels in the segmentation mask with a reduced confidence weight. We suggest that these voxels contain additional anatomical information about the lesion structure. The fused masks are utilized as ground truth masks to train a Fully Convolutional Neural Network (FCNN). The proposed method was evaluated on the MICCAI 2016 challenge dataset and yields an improved precision-recall tradeoff and a higher average Dice similarity coefficient.
Tasks
Published 2019-10-26
URL https://arxiv.org/abs/1910.12077v1
PDF https://arxiv.org/pdf/1910.12077v1.pdf
PWC https://paperswithcode.com/paper/a-soft-staple-algorithm-combined-with
Repo
Framework
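
The soft-mask construction described in the abstract is straightforward to sketch. Below is a minimal version, assuming a single binary delineation and a uniform reduced confidence weight for the surrounding voxels; the actual weighting scheme and the soft STAPLE fusion of several experts' masks are more involved.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def soften_mask(binary_mask, boundary_weight=0.5, iterations=1):
    """Turn a binary lesion delineation into a soft mask by adding the
    lesion-surrounding voxels with a reduced confidence weight. The
    uniform boundary_weight is our assumption for illustration."""
    dilated = binary_dilation(binary_mask, iterations=iterations)
    soft = binary_mask.astype(float)
    soft[dilated & ~binary_mask] = boundary_weight
    return soft

# Toy 2D example: a 3x3 lesion inside a 7x7 slice.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
print(soften_mask(mask, boundary_weight=0.4))
```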

Risk-Averse Explore-Then-Commit Algorithms for Finite-Time Bandits

Title Risk-Averse Explore-Then-Commit Algorithms for Finite-Time Bandits
Authors Ali Yekkehkhany, Ebrahim Arian, Mohammad Hajiesmaili, Rakesh Nagi
Abstract In this paper, we study multi-armed bandit problems in the explore-then-commit setting. In our proposed setting, the goal is to identify the best arm after a pure experimentation (exploration) phase and exploit it once or a given finite number of times. We observe that although the arm with the highest expected reward is the most desirable objective for infinitely many exploitations, it is not necessarily the arm that is most likely to yield the highest reward in a single or a finite number of exploitations. Alternatively, we advocate the idea of risk aversion, where the objective is to compete against the arm with the best risk-return trade-off. We then propose two algorithms whose objective is to select the arm that is most likely to yield the highest reward. Using a new notion of finite-time exploitation regret, we derive an upper bound on the minimum number of experiments before commitment that guarantees an upper bound on the regret. Compared with existing risk-averse bandit algorithms, our algorithms do not rely on hyper-parameters, resulting in more robust behavior in practice, which is verified by numerical evaluation.
Tasks
Published 2019-04-30
URL https://arxiv.org/abs/1904.13387v3
PDF https://arxiv.org/pdf/1904.13387v3.pdf
PWC https://paperswithcode.com/paper/risk-averse-explore-then-commit-algorithms
Repo
Framework
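
The paper's algorithms are not reproduced here, but the selection objective (choosing the arm most likely to yield the highest single-draw reward) can be sketched with a plug-in estimate. The resampling-based win-probability estimate below is our assumption, not the paper's procedure.

```python
import numpy as np

def explore_then_commit(arms, m, n_sim=10_000, seed=0):
    """Pull each arm m times (pure exploration), then commit to the arm
    estimated to be most likely to yield the highest reward in a single
    draw, via resampling of the observed rewards."""
    rng = np.random.default_rng(seed)
    K = len(arms)
    samples = np.array([[arm(rng) for _ in range(m)] for arm in arms])  # (K, m)
    idx = rng.integers(0, m, size=(n_sim, K))            # one draw per arm/sim
    sims = np.take_along_axis(samples, idx.T, axis=1).T  # (n_sim, K)
    win_freq = np.bincount(sims.argmax(axis=1), minlength=K) / n_sim
    return int(win_freq.argmax()), win_freq

# Arm 0 has the higher mean (1.0) via rare big payoffs; arm 1 pays 0.9 always.
arms = [lambda r: 10.0 * (r.random() < 0.1),   # wins only 10% of single draws
        lambda r: 0.9]                         # wins 90% of single draws
best, p = explore_then_commit(arms, m=50)
print(best, p)  # commits to the safe arm despite its lower expected reward
```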

Benanza: Automatic $μ$Benchmark Generation to Compute “Lower-bound” Latency and Inform Optimizations of Deep Learning Models on GPUs

Title Benanza: Automatic $μ$Benchmark Generation to Compute “Lower-bound” Latency and Inform Optimizations of Deep Learning Models on GPUs
Authors Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu
Abstract As Deep Learning (DL) models have been increasingly used in latency-sensitive applications, there has been a growing interest in improving their response time. An important avenue for such improvement is to profile the execution of these models and characterize their performance to identify possible optimization opportunities. However, the current profiling tools lack the highly desired abilities to characterize ideal performance, identify sources of inefficiency, and quantify the benefits of potential optimizations. Such deficiencies have led to slow characterization/optimization cycles that cannot keep up with the fast pace at which new DL models are introduced. We propose Benanza, a sustainable and extensible benchmarking and analysis design that speeds up the characterization/optimization cycle of DL models on GPUs. Benanza consists of four major components: a model processor that parses models into an internal representation, a configurable benchmark generator that automatically generates micro-benchmarks given a set of models, a database of benchmark results, and an analyzer that computes the “lower-bound” latency of DL models using the benchmark data and informs optimizations of model execution. The “lower-bound” latency metric estimates the ideal model execution on a GPU system and serves as the basis for identifying optimization opportunities in frameworks or system libraries. We used Benanza to evaluate 30 ONNX models in MXNet, ONNX Runtime, and PyTorch on 7 GPUs ranging from Kepler to the latest Turing, and identified optimizations in parallel layer execution, cuDNN convolution algorithm selection, framework inefficiency, layer fusion, and using Tensor Cores.
Tasks
Published 2019-11-16
URL https://arxiv.org/abs/1911.06922v3
PDF https://arxiv.org/pdf/1911.06922v3.pdf
PWC https://paperswithcode.com/paper/benanza-automatic-ubenchmark-generation-to
Repo
Framework
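
The “lower-bound” latency metric itself is simple to illustrate: sum each layer's best-case latency from a benchmark database and compare against the measured end-to-end time. The layer list and the per-layer numbers below are hypothetical stand-ins for Benanza's model processor and benchmark database.

```python
# Per-layer latencies as they would come from isolated micro-benchmarks
# (hypothetical values; keys pair an operator with a shape signature).
bench_db_ms = {
    ("Conv", "3x3x64"): 0.42,
    ("Relu", "64"): 0.03,
    ("Gemm", "1000x512"): 0.21,
}
model_layers = [("Conv", "3x3x64"), ("Relu", "64"), ("Gemm", "1000x512")]

# Ideal execution: every layer runs at its benchmarked best, back to back.
lower_bound = sum(bench_db_ms[layer] for layer in model_layers)
measured_end_to_end = 1.05   # ms, hypothetical framework measurement

print(f"lower bound {lower_bound:.2f} ms, measured {measured_end_to_end:.2f} ms")
print(f"optimization headroom: {measured_end_to_end - lower_bound:.2f} ms")
```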

Preparing Lessons: Improve Knowledge Distillation with Better Supervision

Title Preparing Lessons: Improve Knowledge Distillation with Better Supervision
Authors Tiancheng Wen, Shenqi Lai, Xueming Qian
Abstract Knowledge distillation (KD) is widely used to train a compact model under the supervision of a larger model, which can effectively improve performance. Previous methods mainly focus on two aspects: 1) training the student to mimic the representation space of the teacher; 2) training the model progressively or adding extra modules such as a discriminator. Knowledge from the teacher is useful, but it is not always correct when compared with the ground truth. Besides, overly uncertain supervision also degrades the result. We introduce two novel approaches, Knowledge Adjustment (KA) and Dynamic Temperature Distillation (DTD), to penalize bad supervision and improve the student model. Experiments on CIFAR-100, CINIC-10 and Tiny ImageNet show that our methods achieve encouraging performance compared with state-of-the-art methods. When combined with other KD-based methods, performance is further improved.
Tasks
Published 2019-11-18
URL https://arxiv.org/abs/1911.07471v1
PDF https://arxiv.org/pdf/1911.07471v1.pdf
PWC https://paperswithcode.com/paper/preparing-lessons-improve-knowledge
Repo
Framework
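
As a rough sketch of how penalizing bad supervision can look in code, the snippet below combines a standard temperature-scaled KD loss with a simple Knowledge Adjustment step that reorders the teacher's logits whenever its top-1 prediction contradicts the ground truth. This is one plausible reading of KA, not the paper's exact rule, and DTD (per-sample temperatures) is omitted.

```python
import torch
import torch.nn.functional as F

def knowledge_adjustment(teacher_logits, labels):
    """If the teacher's top-1 prediction disagrees with the ground truth,
    swap the two logits so the correct class ranks first (our assumed
    adjustment rule for illustration)."""
    adjusted = teacher_logits.clone()
    top1 = adjusted.argmax(dim=1)
    rows = (top1 != labels).nonzero(as_tuple=True)[0]
    tmp = adjusted[rows, top1[rows]].clone()
    adjusted[rows, top1[rows]] = adjusted[rows, labels[rows]]
    adjusted[rows, labels[rows]] = tmp
    return adjusted

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard KD objective (soft KL term + hard CE term) with
    KA-corrected teacher targets."""
    teacher_logits = knowledge_adjustment(teacher_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

s = torch.randn(8, 100, requires_grad=True)   # student logits (CIFAR-100-like)
t = torch.randn(8, 100)                       # teacher logits
y = torch.randint(0, 100, (8,))
kd_loss(s, t, y).backward()                   # gradients flow to the student
```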

Learning Interpretable Models Using an Oracle

Title Learning Interpretable Models Using an Oracle
Authors Abhishek Ghose, Balaraman Ravindran
Abstract As Machine Learning (ML) becomes pervasive in various real-world systems, the need for models to be interpretable or explainable has increased. We focus on interpretability, noting that models often need to be constrained in size to be considered understandable; e.g., a decision tree of depth 5 is easier to interpret than one of depth 50. This suggests a trade-off between interpretability and accuracy. We propose a technique to minimize this trade-off. Our strategy is to first learn a powerful, possibly black-box, probabilistic model on the data, which we refer to as the oracle. We use this to adaptively sample the training dataset to present data for our model of interest to learn from. Determining the sampling strategy is formulated as an optimization problem that, independent of the dimensionality of the data, uses only seven variables. We empirically show that this often significantly increases the accuracy of our model. Our technique is model agnostic, in that both the interpretable model and the oracle may come from any model family. Results are presented using multiple real-world datasets, with Linear Probability Models and Decision Trees as interpretable models and Gradient Boosted Models and Random Forests as oracles. Additionally, we discuss an interesting example of using a sentence-embedding based text classifier as an oracle to improve the accuracy of a term-frequency based bag-of-words linear classifier.
Tasks Sentence Embedding
Published 2019-06-17
URL https://arxiv.org/abs/1906.06852v1
PDF https://arxiv.org/pdf/1906.06852v1.pdf
PWC https://paperswithcode.com/paper/learning-interpretable-models-using-an-oracle
Repo
Framework
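
A much-simplified sketch of the oracle-guided idea, assuming the sampling strategy is reduced to weighting training points by the oracle's confidence in their labels (the paper instead optimizes a seven-variable sampling strategy):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The oracle: a powerful, possibly black-box probabilistic model.
oracle = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Resample the training set with probabilities tied to oracle confidence,
# a crude stand-in for the paper's optimized sampling strategy.
conf = oracle.predict_proba(X_tr)[np.arange(len(y_tr)), y_tr]
idx = np.random.default_rng(0).choice(len(X_tr), size=len(X_tr),
                                      p=conf / conf.sum())

# Small (depth-5) trees: one trained plainly, one on the oracle-guided sample.
baseline = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
guided = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr[idx],
                                                                 y_tr[idx])
print(baseline.score(X_te, y_te), guided.score(X_te, y_te))
```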

A Practical Solution for SAR Despeckling with Only Single Speckled Images

Title A Practical Solution for SAR Despeckling with Only Single Speckled Images
Authors Ye Yuan, Jianguo Sun, Jian Guan, Pengming Feng, Yanxia Wu
Abstract In this letter, we address the synthetic aperture radar (SAR) despeckling problem, requiring neither clean (speckle-free) SAR images nor independent speckled image pairs from the same scene, and propose a practical solution for SAR despeckling (PSD). Firstly, to generate speckled-to-speckled (S2S) image pairs from the same scene when only single speckled SAR images are available, an adversarial learning framework is designed. Then, the S2S SAR image pairs are employed to train a modified despeckling Nested-UNet model using the Noise2Noise (N2N) strategy. Moreover, an iterative version of the PSD method (PSDi) is also proposed. The performance of the proposed methods is demonstrated on both synthetic speckled and real SAR data. The SAR block-matching 3-D algorithm (SAR-BM3D) and the SAR dilated residual network (SAR-DRN) are used for visual and quantitative comparison. Experimental results show that the proposed methods achieve a good tradeoff between speckle suppression and edge preservation.
Tasks
Published 2019-12-13
URL https://arxiv.org/abs/1912.06295v1
PDF https://arxiv.org/pdf/1912.06295v1.pdf
PWC https://paperswithcode.com/paper/a-practical-solution-for-sar-despeckling-with
Repo
Framework
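
The Noise2Noise training step at the core of PSD is easy to sketch once speckled-to-speckled pairs are available. The tiny network below is a placeholder for the paper's despeckling Nested-UNet, and the random tensors stand in for S2S pairs from the adversarially trained generator.

```python
import torch
import torch.nn as nn

# N2N training: the network maps one speckled view of a scene to another
# speckled view of the same scene; no clean target is ever used.
model = nn.Sequential(                       # tiny placeholder network
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def n2n_step(speckled_a, speckled_b):
    """One Noise2Noise step on an S2S pair (a -> b)."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(speckled_a), speckled_b)
    loss.backward()
    opt.step()
    return loss.item()

a, b = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)  # stand-in S2S pair
print(n2n_step(a, b))
```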

Towards Lingua Franca Named Entity Recognition with BERT

Title Towards Lingua Franca Named Entity Recognition with BERT
Authors Taesun Moon, Parul Awasthy, Jian Ni, Radu Florian
Abstract Information extraction is an important task in NLP, enabling the automatic extraction of data for relational database filling. Historically, research and data were produced for English text, followed in subsequent years by datasets in Arabic, Chinese (ACE/OntoNotes), Dutch, Spanish, German (CoNLL evaluations), and many others. The natural tendency has been to treat each language as a different dataset and build optimized models for each. In this paper we investigate a single Named Entity Recognition model, based on multilingual BERT, that is trained jointly on many languages simultaneously and is able to decode these languages with better accuracy than models trained on only one language. To improve the initial model, we study the use of regularization strategies such as multitask learning and partial gradient updates. In addition to being a single model that can tackle multiple languages (including code switching), the model can be used to make zero-shot predictions on a new language, even one for which training data is not available, out of the box. The results show that this model not only performs competitively with monolingual models, but also achieves state-of-the-art results on the CoNLL02 Dutch and Spanish datasets and the OntoNotes Arabic and Chinese datasets. Moreover, it performs reasonably well on unseen languages, achieving state-of-the-art zero-shot results on three CoNLL languages.
Tasks Named Entity Recognition
Published 2019-11-19
URL https://arxiv.org/abs/1912.01389v2
PDF https://arxiv.org/pdf/1912.01389v2.pdf
PWC https://paperswithcode.com/paper/towards-lingua-franca-named-entity
Repo
Framework
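
A sketch of the single-model setup using the Hugging Face transformers API. The multilingual checkpoint is real; the nine-label CoNLL-style tag set and the idea of simply interleaving batches from each language corpus are our assumptions about the training loop.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

name = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=9)

# Joint multilingual training would interleave batches from every language
# corpus; here one batch mixes French and Spanish sentences.
batch = tok(["IBM est basée à Armonk .",
             "IBM tiene su sede en Armonk ."],
            return_tensors="pt", padding=True)
with torch.no_grad():
    preds = model(**batch).logits.argmax(-1)   # per-token tag ids (untrained)
print(preds)
```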

Semi-Parametric Efficient Policy Learning with Continuous Actions

Title Semi-Parametric Efficient Policy Learning with Continuous Actions
Authors Mert Demirer, Vasilis Syrgkanis, Greg Lewis, Victor Chernozhukov
Abstract We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach where the value function takes a known parametric form in the treatment, but we are agnostic about how it depends on the observed contexts. We propose a doubly robust off-policy estimate for this setting and show that off-policy optimization based on this estimate is robust to estimation errors of the policy function or the regression model. Our results also apply if the model does not satisfy our semi-parametric form; in that case, we measure regret in terms of the best projection of the true value function onto this functional space. Our work extends prior approaches to policy optimization from observational data that considered only discrete actions. We provide an experimental evaluation of our method on a synthetic data example motivated by optimal personalized pricing and costly resource allocation.
Tasks
Published 2019-05-24
URL https://arxiv.org/abs/1905.10116v2
PDF https://arxiv.org/pdf/1905.10116v2.pdf
PWC https://paperswithcode.com/paper/semi-parametric-efficient-policy-learning
Repo
Framework
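
The general shape of a doubly robust estimate for continuous actions can be sketched as a regression term plus a kernel-smoothed importance-weighted residual. This is a simplified variant for illustration only; the paper instead exploits the known parametric form of the value function in the treatment.

```python
import numpy as np

def dr_value_estimate(x, a, y, policy, g_hat, pdf_hat, h=0.1):
    """Doubly robust off-policy value for a continuous-action policy,
    in a simplified kernel-smoothed form. g_hat(x, a): outcome regression;
    pdf_hat(a, x): estimated logging-policy density; policy(x): the action
    the target policy would take. All names here are illustrative."""
    a_pi = policy(x)
    direct = g_hat(x, a_pi)                             # regression term
    k = np.exp(-0.5 * ((a - a_pi) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    correction = k / pdf_hat(a, x) * (y - g_hat(x, a))  # weighted residual
    return np.mean(direct + correction)

# Toy check: true value function is x * a, logging actions are N(0, 1),
# target policy always plays a = 1, so the value is E[x] = 0.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
a = rng.normal(size=1000)
y = x * a + rng.normal(scale=0.1, size=1000)
est = dr_value_estimate(
    x, a, y,
    policy=lambda x: np.ones_like(x),
    g_hat=lambda x, a: x * a,                                    # oracle fit
    pdf_hat=lambda a, x: np.exp(-a ** 2 / 2) / np.sqrt(2 * np.pi))
print(est)   # close to 0
```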

ODE Analysis of Stochastic Gradient Methods with Optimism and Anchoring for Minimax Problems and GANs

Title ODE Analysis of Stochastic Gradient Methods with Optimism and Anchoring for Minimax Problems and GANs
Authors Ernest K. Ryu, Kun Yuan, Wotao Yin
Abstract Despite remarkable empirical success, the training dynamics of generative adversarial networks (GANs), which involve solving a minimax game using stochastic gradients, are still poorly understood. In this work, we analyze last-iterate convergence of simultaneous gradient descent (simGD) and its variants under the assumption of convex-concavity, guided by a continuous-time analysis with differential equations. First, we show that simGD, as is, converges with stochastic sub-gradients under strict convexity in the primal variable. Second, we generalize optimistic simGD to accommodate an optimism rate separate from the learning rate and show its convergence with full gradients. Finally, we present anchored simGD, a new method, and show its convergence with stochastic subgradients.
Tasks
Published 2019-05-26
URL https://arxiv.org/abs/1905.10899v2
PDF https://arxiv.org/pdf/1905.10899v2.pdf
PWC https://paperswithcode.com/paper/ode-analysis-of-stochastic-gradient-methods
Repo
Framework
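
The first result, that plain simGD converges under strict convexity in the primal variable but not on a purely bilinear game, is easy to see numerically. A minimal sketch follows; optimism and anchoring, the paper's remedies for the bilinear case, are not implemented here.

```python
import numpy as np

def simgd(alpha, lr=0.1, steps=500):
    """Simultaneous gradient descent/ascent on
    f(u, v) = (alpha/2) u^2 + u v - (alpha/2) v^2,
    which is strictly convex in u (and strictly concave in v) iff alpha > 0.
    Returns the distance of the last iterate from the saddle point (0, 0)."""
    u, v = 1.0, 1.0
    for _ in range(steps):
        gu = alpha * u + v        # df/du: descent step for the min player
        gv = u - alpha * v        # df/dv: ascent step for the max player
        u, v = u - lr * gu, v + lr * gv
    return np.hypot(u, v)

print(simgd(alpha=0.5))  # strict primal convexity: last iterate -> 0
print(simgd(alpha=0.0))  # purely bilinear: plain simGD spirals outward
```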

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

Title NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units
Authors Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, Minsoo Rhu
Abstract To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are being widely utilized to accelerate deep learning algorithms. Similar to how GPUs evolved from a slave device into a mainstream processor architecture, it is likely that NPUs will become first-class citizens in this fast-evolving heterogeneous architecture space. This paper makes a case for enabling address translation in NPUs to decouple the virtual and physical memory address spaces. Through a careful data-driven application characterization study, we root-cause several limitations of prior GPU-centric address translation schemes and propose a memory management unit (MMU) that is tailored for NPUs. Compared to an oracular MMU design point, our proposal incurs only an average 0.06% performance overhead.
Tasks
Published 2019-11-15
URL https://arxiv.org/abs/1911.06859v1
PDF https://arxiv.org/pdf/1911.06859v1.pdf
PWC https://paperswithcode.com/paper/neummu-architectural-support-for-efficient
Repo
Framework
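
For readers unfamiliar with address translation, the snippet below shows the textbook mechanism in miniature: virtual page numbers are mapped to physical frames through a page table, with a TLB caching recent translations. This is a generic illustration of what an MMU provides, not NeuMMU's NPU-tailored design.

```python
PAGE = 4096
page_table = {0: 7, 1: 3, 2: 9}   # virtual page number -> physical frame
tlb = {}                          # cache of recently used translations

def translate(vaddr):
    """Translate a virtual address to a physical one."""
    vpn, offset = divmod(vaddr, PAGE)
    if vpn in tlb:                # TLB hit: no page-table walk needed
        frame = tlb[vpn]
    else:                         # TLB miss: walk the page table, fill TLB
        frame = page_table[vpn]
        tlb[vpn] = frame
    return frame * PAGE + offset

print(hex(translate(0x1234)))     # vpn 1 -> frame 3, i.e. 0x3234
```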

Distributed Optimization for Over-Parameterized Learning

Title Distributed Optimization for Over-Parameterized Learning
Authors Chi Zhang, Qianxiao Li
Abstract Distributed optimization often consists of two updating phases: local optimization and inter-node communication. Conventional approaches require working nodes to communicate with the server every one or few iterations to guarantee convergence. In this paper, we establish a completely different conclusion: each node can perform an arbitrary number of local optimization steps before communication. Moreover, we show that more local updating can reduce the overall communication, even for an infinite number of steps, where each node is free to update its local model to near-optimality before exchanging information. The extra assumption we make is that the optimal sets of the local loss functions have a non-empty intersection, which is inspired by the over-parameterization phenomenon in large-scale optimization and deep learning. Our theoretical findings are confirmed by both distributed convex optimization and deep learning experiments.
Tasks Distributed Optimization
Published 2019-06-14
URL https://arxiv.org/abs/1906.06205v1
PDF https://arxiv.org/pdf/1906.06205v1.pdf
PWC https://paperswithcode.com/paper/distributed-optimization-for-over
Repo
Framework
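
The setting is easy to reproduce in miniature: two nodes whose local losses have optimal sets (two lines in R^2) intersecting at a single point. Each node runs an arbitrary number of local gradient steps to near-optimality before averaging, and the averaged iterate still converges to the common optimum. The toy losses below are our construction.

```python
import numpy as np

# Node i has loss f_i(w) = (a_i . w - b_i)^2 / 2; the optimal sets are the
# lines w0 + w1 = 2 and w0 - w1 = 0, intersecting only at w* = (1, 1).
A = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]
b = [2.0, 0.0]

def local_opt(w, a_i, b_i, lr=0.1, local_steps=200):
    """Many local gradient steps with no communication: effectively drives
    the local loss to near-optimality before the next averaging round."""
    for _ in range(local_steps):
        w = w - lr * (a_i @ w - b_i) * a_i
    return w

w = np.array([5.0, -3.0])
for _ in range(30):                # communication rounds: average local models
    w = np.mean([local_opt(w, a_i, b_i) for a_i, b_i in zip(A, b)], axis=0)
print(w)                           # converges to the common optimum [1, 1]
```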

Classical Information Theory of Networks

Title Classical Information Theory of Networks
Authors Filippo Radicchi, Dmitri Krioukov, Harrison Hartle, Ginestra Bianconi
Abstract Heterogeneity is among the most important features characterizing real-world networks. Empirical evidence in support of this fact is unquestionable. Existing theoretical frameworks justify heterogeneity in networks as a convenient way to enhance desirable systemic features, such as robustness, synchronizability and navigability. Information theory is one of the most fundamental theoretical frameworks of network science and machine learning. However, the current information theory frameworks for understanding networks, based on maximum entropy network ensembles, are not able to explain the emergence of heterogeneity in complex networks. Here, we fill this knowledge gap by developing a classical information-theoretic framework for networks based on finding a trade-off between the information content of a compressed representation of the ensemble and the information content of the actual network ensemble. We show that among all degree distributions that can be used to generate random networks, the one emerging from the principle of maximum entropy is a power law. We also study spatially embedded networks, finding that the interactions between nodes naturally lead to nonuniform distributions of points in space. The pertinent features of real-world air transportation networks are well described by the proposed framework.
Tasks
Published 2019-08-10
URL https://arxiv.org/abs/1908.03811v3
PDF https://arxiv.org/pdf/1908.03811v3.pdf
PWC https://paperswithcode.com/paper/classical-information-theory-of-networks
Repo
Framework

Lie Group Auto-Encoder

Title Lie Group Auto-Encoder
Authors Liyu Gong, Qiang Cheng
Abstract In this paper, we propose an auto-encoder based generative neural network model whose encoder compresses the inputs into vectors in the tangent space of a special Lie group manifold: upper triangular positive definite affine transform matrices (UTDATs). UTDATs are representations of Gaussian distributions and can straightforwardly generate Gaussian distributed samples. Therefore, the encoder is trained together with a decoder (generator) which takes Gaussian distributed latent vectors as input. Compared with related generative models such as the variational auto-encoder, the proposed model incorporates information on the geometric properties of Gaussian distributions. As a special case, we derive an exponential mapping layer for diagonal Gaussian UTDATs which eliminates the matrix exponential operator required by the general exponential mapping in Lie group theory. Moreover, we derive an intrinsic loss for the UTDAT Lie group which can be calculated as an l-2 loss in the tangent space. Furthermore, inspired by Lie group theory, we propose to use the Lie algebra vectors rather than the raw parameters (e.g. mean) of Gaussian distributions as compressed representations of the original inputs. Experimental results verify the effectiveness of the proposed generative model and the benefits gained from the Lie group structural information of UTDATs.
Tasks
Published 2019-01-28
URL http://arxiv.org/abs/1901.09970v1
PDF http://arxiv.org/pdf/1901.09970v1.pdf
PWC https://paperswithcode.com/paper/lie-group-auto-encoder
Repo
Framework
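
For the diagonal case mentioned in the abstract, each coordinate of the UTDAT is an element of the one-dimensional affine group, whose matrix exponential has a scalar closed form: exp([[a, u], [0, 0]]) = [[e^a, u(e^a - 1)/a], [0, 1]]. The sketch below applies it coordinate-wise; identifying the result with the Gaussian's (sigma, mu) is our reading of the paper's diagonal case.

```python
import numpy as np

def exp_map_diag_utdat(a, u, eps=1e-8):
    """Coordinate-wise exponential map for diagonal-Gaussian UTDATs.
    Each coordinate lives in the 1-D affine group, so the matrix
    exponential reduces to scalars: sigma = e^a, mu = u (e^a - 1) / a,
    with mu -> u in the limit a -> 0. No matrix exponential is needed."""
    sigma = np.exp(a)
    safe_a = np.where(np.abs(a) > eps, a, 1.0)   # avoid division by zero
    mu = np.where(np.abs(a) > eps, u * (sigma - 1.0) / safe_a, u)
    return sigma, mu

# A latent Lie-algebra vector (a, u) decodes to Gaussian parameters, from
# which a sample is drawn by reparameterization:
a = np.array([0.0, 0.5, -1.0])    # log-scale directions
u = np.array([1.0, 2.0, 0.5])     # translation directions
sigma, mu = exp_map_diag_utdat(a, u)
z = mu + sigma * np.random.default_rng(0).standard_normal(3)
print(sigma, mu, z)
```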

Analytical Derivatives for Differentiable Renderer: 3D Pose Estimation by Silhouette Consistency

Title Analytical Derivatives for Differentiable Renderer: 3D Pose Estimation by Silhouette Consistency
Authors Zaiqiang Wu, Wei Jiang
Abstract Differentiable rendering is widely used in optimization-based 3D reconstruction, which requires gradients from differentiable operations for gradient-based optimization. Existing differentiable renderers obtain the gradients of rendering via numerical techniques, which are of low accuracy and efficiency. Motivated by this fact, a differentiable mesh renderer with analytical gradients is proposed. The main obstacle to making rasterization-based rendering differentiable is the discrete sampling operation. To make rasterization differentiable, the pixel intensity is defined as a double integral over the pixel area, and the integral is approximated by anti-aliasing with an average filter. The analytical gradients with respect to the vertex coordinates can then be derived from the continuous definition of pixel intensity. To demonstrate the effectiveness and efficiency of the proposed differentiable renderer, experiments on 3D pose estimation using only multi-viewpoint silhouettes were conducted. The experimental results show that 3D pose estimation without 3D and 2D joint supervision is capable of producing competitive results both qualitatively and quantitatively. The results also show that the proposed differentiable renderer achieves higher accuracy and efficiency than previous differentiable renderers.
Tasks 3D Pose Estimation, 3D Reconstruction, Pose Estimation
Published 2019-06-19
URL https://arxiv.org/abs/1906.07870v1
PDF https://arxiv.org/pdf/1906.07870v1.pdf
PWC https://paperswithcode.com/paper/analytical-derivatives-for-differentiable
Repo
Framework
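
A one-dimensional analogue shows why averaging over the pixel footprint makes rasterization differentiable: pixel intensity becomes a continuous function of the edge position, with a closed-form derivative instead of a numerical difference. The actual renderer handles 2D triangles and occlusion; this sketch is only the core idea.

```python
import numpy as np

def pixel_intensity(e, x_left, w=1.0):
    """Box-filtered intensity: fraction of the pixel [x_left, x_left + w]
    covered by the region x < e (a step edge averaged over the pixel)."""
    return np.clip((e - x_left) / w, 0.0, 1.0)

def d_intensity_de(e, x_left, w=1.0):
    """Analytical gradient: 1/w while the edge crosses the pixel, else 0."""
    return np.where((x_left < e) & (e < x_left + w), 1.0 / w, 0.0)

e = 0.3   # edge position inside the pixel [0, 1)
num = (pixel_intensity(e + 1e-6, 0.0)
       - pixel_intensity(e - 1e-6, 0.0)) / 2e-6   # numerical check
print(pixel_intensity(e, 0.0), d_intensity_de(e, 0.0), num)  # 0.3, 1.0, ~1.0
```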