April 3, 2020

Paper Group ANR 48

Diagonal Preconditioning: Theory and Algorithms. BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method. Toward Cross-Domain Speech Recognition with End-to-End Models. Unity Style Transfer for Person Re-Identification. Recommendation on a Budget: Column Space Recovery from Partially Observed Entries with Random …

Diagonal Preconditioning: Theory and Algorithms

Title Diagonal Preconditioning: Theory and Algorithms
Authors Zhaonan Qu, Yinyu Ye, Zhengyuan Zhou
Abstract Diagonal preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the design or Hessian matrix it is applied to, thereby speeding up convergence. However, rigorous analyses of how well various diagonal preconditioning procedures improve the condition number of the preconditioned matrix and how that translates into improvements in optimization are rare. In this paper, we first provide an analysis of a popular diagonal preconditioning technique based on column standard deviation and its effect on the condition number using random matrix theory. Then we identify a class of design matrices whose condition numbers can be reduced significantly by this procedure. We then study the problem of optimal diagonal preconditioning to improve the condition number of any full-rank matrix and provide a bisection algorithm and a potential reduction algorithm with $O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration consists of an SDP feasibility problem and a Newton update using the Nesterov-Todd direction, respectively. Finally, we extend the optimal diagonal preconditioning algorithm to an adaptive setting and compare its empirical performance at reducing the condition number and speeding up convergence for regression and classification problems with that of another adaptive preconditioning technique, namely batch normalization, that is essential in training machine learning models.
Tasks Causal Inference
Published 2020-03-17
URL https://arxiv.org/abs/2003.07545v2
PDF https://arxiv.org/pdf/2003.07545v2.pdf
PWC https://paperswithcode.com/paper/multi-action-offline-policy-learning-with
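
As a rough illustration of the column-standard-deviation preconditioning analyzed in the abstract, the following minimal sketch scales each column of a small design matrix by its sample standard deviation and compares the condition number of the Gram matrix $X^T X$ before and after. The two-column synthetic data, the helper names, and the choice to scale without centering are all assumptions made for this example; this is not the authors' code, and the optimal preconditioning algorithms from the paper are not shown.

```python
import math
import random
import statistics

def gram_2col(rows):
    """Return entries (a, b, c) of the 2x2 Gram matrix X^T X = [[a, b], [b, c]]."""
    a = sum(r[0] * r[0] for r in rows)
    b = sum(r[0] * r[1] for r in rows)
    c = sum(r[1] * r[1] for r in rows)
    return a, b, c

def cond_sym_2x2(a, b, c):
    """Condition number (largest / smallest eigenvalue) of a symmetric
    positive definite 2x2 matrix [[a, b], [b, c]], via the closed form."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return (mean + radius) / (mean - radius)

def scale_columns_by_std(rows):
    """Divide each column by its sample standard deviation (no centering)."""
    stds = [statistics.stdev(col) for col in zip(*rows)]
    return [[v / s for v, s in zip(r, stds)] for r in rows]

rng = random.Random(0)
# Hypothetical design matrix: two independent columns on very different scales.
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 100.0)] for _ in range(500)]

kappa_before = cond_sym_2x2(*gram_2col(X))
kappa_after = cond_sym_2x2(*gram_2col(scale_columns_by_std(X)))
print(f"condition number before: {kappa_before:.1f}, after: {kappa_after:.3f}")
```

In this toy setting the ill-conditioning comes entirely from mismatched column scales, which is the favourable case for a diagonal preconditioner; strongly correlated columns would not be helped in the same way.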

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method

Title BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method
Authors Xiaolong Ma, Zhengang Li, Yifan Gong, Tianyun Zhang, Wei Niu, Zheng Zhan, Pu Zhao, Jian Tang, Xue Lin, Bin Ren, Yanzhi Wang
Abstract Accelerating DNN execution on various resource-limited computing platforms has been a long-standing problem. Prior works utilize l1-based group lasso or dynamic regularization such as ADMM to perform structured pruning on DNN models to leverage the parallel computing architectures. However, both the pruning dimensions and the pruning methods lack universality, which leads to degraded performance and limited applicability. To solve the problem, we propose a new block-based pruning framework that comprises a general and flexible structured pruning dimension as well as a powerful and efficient reweighted regularization method. Our framework is universal and can be applied to both CNNs and RNNs, implying complete support for the two major kinds of computation-intensive layers (i.e., CONV and FC layers). To complete all aspects of the pruning-for-acceleration task, we also integrate compiler-based code optimization into our framework that can perform DNN inference in a real-time manner. To the best of our knowledge, this is the first time that a weight pruning framework achieves universal coverage for both CNNs and RNNs with real-time mobile acceleration and no accuracy compromise.
Published 2020-01-23
URL https://arxiv.org/abs/2001.08357v2
PDF https://arxiv.org/pdf/2001.08357v2.pdf
PWC https://paperswithcode.com/paper/blk-rew-a-unified-block-based-dnn-pruning

Toward Cross-Domain Speech Recognition with End-to-End Models

Title Toward Cross-Domain Speech Recognition with End-to-End Models
Authors Thai-Son Nguyen, Sebastian Stüker, Alex Waibel
Abstract In the area of multi-domain speech recognition, research in the past focused on hybrid acoustic models to build cross-domain and domain-invariant speech recognition systems. In this paper, we empirically examine the difference in behavior between hybrid acoustic models and neural end-to-end systems when mixing acoustic training data from several domains. For these experiments we composed a multi-domain dataset from public sources, with the different domains in the corpus covering a wide variety of topics and acoustic conditions such as telephone conversations, lectures, read speech and broadcast news. We show that for the hybrid models, supplying additional training data from other domains with mismatched acoustic conditions does not increase the performance on specific domains. However, our end-to-end models optimized with a sequence-based criterion generalize better than the hybrid models on diverse domains. In terms of word-error-rate performance, our experimental acoustic-to-word and attention-based models trained on the multi-domain dataset reach the performance of domain-specific long short-term memory (LSTM) hybrid models, thus resulting in multi-domain speech recognition systems that do not suffer in performance over domain-specific ones. Moreover, the use of neural end-to-end models eliminates the need for domain-adapted language models during recognition, which is a great advantage when the input domain is unknown.
Tasks Speech Recognition
Published 2020-03-09
URL https://arxiv.org/abs/2003.04194v1
PDF https://arxiv.org/pdf/2003.04194v1.pdf
PWC https://paperswithcode.com/paper/toward-cross-domain-speech-recognition-with

Unity Style Transfer for Person Re-Identification

Title Unity Style Transfer for Person Re-Identification
Authors Chong Liu, Xiaojun Chang, Yi-Dong Shen
Abstract Style variation has been a major challenge for person re-identification, which aims to match the same pedestrians across different cameras. Existing works attempted to address this problem with camera-invariant descriptor subspace learning. However, there will be more image artifacts when the difference between the images taken by different cameras is larger. To solve this problem, we propose a UnityStyle adaption method, which can smooth the style disparities within the same camera and across different cameras. Specifically, we first create UnityGAN to learn the style changes between cameras, producing shape-stable style-unity images for each camera, which are called UnityStyle images. Meanwhile, we use UnityStyle images to eliminate style differences between different images, which makes a better match between query and gallery. Then, we apply the proposed method to Re-ID models, expecting to obtain more style-robust depth features for querying. We conduct extensive experiments on widely used benchmark datasets to evaluate the performance of the proposed framework, the results of which confirm the superiority of the proposed model.
Tasks Person Re-Identification, Style Transfer
Published 2020-03-04
URL https://arxiv.org/abs/2003.02068v1
PDF https://arxiv.org/pdf/2003.02068v1.pdf
PWC https://paperswithcode.com/paper/unity-style-transfer-for-person-re

Recommendation on a Budget: Column Space Recovery from Partially Observed Entries with Random or Active Sampling

Title Recommendation on a Budget: Column Space Recovery from Partially Observed Entries with Random or Active Sampling
Authors C. Kim, M. Bayati
Abstract We analyze alternating minimization for column space recovery of a partially observed, approximately low rank matrix with a growing number of columns and a fixed budget of observations per column. In this work, we prove that if the budget is greater than the rank of the matrix, column space recovery succeeds – as the number of columns grows, the estimate from alternating minimization converges to the true column space with probability tending to one. From our proof techniques, we naturally formulate an active sampling strategy for choosing entries of a column that is theoretically and empirically (on synthetic and real data) better than the commonly studied uniformly random sampling strategy.
Published 2020-02-26
URL https://arxiv.org/abs/2002.11589v1
PDF https://arxiv.org/pdf/2002.11589v1.pdf
PWC https://paperswithcode.com/paper/recommendation-on-a-budget-column-space

Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

Title Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding
Authors Hongseok Namkoong, Ramtin Keramati, Steve Yadlowsky, Emma Brunskill
Abstract When observed decisions depend only on observed features, off-policy policy evaluation (OPE) methods for sequential decision making problems can estimate the performance of evaluation policies before deploying them. This assumption is frequently violated due to unobserved confounders, unrecorded variables that impact both the decisions and their outcomes. We assess robustness of OPE methods under unobserved confounding by developing worst-case bounds on the performance of an evaluation policy. When unobserved confounders can affect every decision in an episode, we demonstrate that even small amounts of per-decision confounding can heavily bias OPE methods. Fortunately, in a number of important settings found in healthcare, policy-making, operations, and technology, unobserved confounders may primarily affect only one of the many decisions made. Under this less pessimistic model of one-decision confounding, we propose an efficient loss-minimization-based procedure for computing worst-case bounds, and prove its statistical consistency. On two simulated healthcare examples—management of sepsis patients and developmental interventions for autistic children—where this is a reasonable model of confounding, we demonstrate that our method invalidates non-robust results and provides meaningful certificates of robustness, allowing reliable selection of policies even under unobserved confounding.
Tasks Decision Making
Published 2020-03-12
URL https://arxiv.org/abs/2003.05623v1
PDF https://arxiv.org/pdf/2003.05623v1.pdf
PWC https://paperswithcode.com/paper/off-policy-policy-evaluation-for-sequential

In-Domain GAN Inversion for Real Image Editing

Title In-Domain GAN Inversion for Real Image Editing
Authors Jiapeng Zhu, Yujun Shen, Deli Zhao, Bolei Zhou
Abstract Recent work has shown that a variety of controllable semantics emerges in the latent space of the Generative Adversarial Networks (GANs) when being trained to synthesize images. However, it is difficult to use these learned semantics for real image editing. A common practice of feeding a real image to a trained GAN generator is to invert it back to a latent code. However, we find that existing inversion methods typically focus on reconstructing the target image by pixel values yet fail to land the inverted code in the semantic domain of the original latent space. As a result, the reconstructed image cannot support semantic editing well through varying the latent code. To solve this problem, we propose an in-domain GAN inversion approach, which not only faithfully reconstructs the input image but also ensures that the inverted code is semantically meaningful for editing. We first learn a novel domain-guided encoder to project any given image to the native latent space of GANs. We then propose a domain-regularized optimization by involving the encoder as a regularizer to fine-tune the code produced by the encoder, which better recovers the target image. Extensive experiments suggest that our inversion method achieves satisfying real image reconstruction and more importantly facilitates various image editing tasks, such as image interpolation and semantic manipulation, significantly outperforming state-of-the-art methods.
Tasks Image Reconstruction
Published 2020-03-31
URL https://arxiv.org/abs/2004.00049v1
PDF https://arxiv.org/pdf/2004.00049v1.pdf
PWC https://paperswithcode.com/paper/in-domain-gan-inversion-for-real-image

CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus

Title CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus
Authors Florian Kluger, Eric Brachmann, Hanno Ackermann, Carsten Rother, Michael Ying Yang, Bodo Rosenhahn
Abstract We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements. Applications include finding multiple vanishing points in man-made scenes, fitting planes to architectural imagery, or estimating multiple rigid motions within the same sequence. In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data. A neural network conditioned on previously detected models guides a RANSAC estimator to different subsets of all measurements, thereby finding model instances one after another. We train our method supervised as well as self-supervised. For supervised training of the search strategy, we contribute a new dataset for vanishing point estimation. Leveraging this dataset, the proposed algorithm is superior with respect to other robust estimators as well as to designated vanishing point estimation algorithms. For self-supervised learning of the search, we evaluate the proposed algorithm on multi-homography estimation and demonstrate an accuracy that is superior to state-of-the-art methods.
Tasks Homography Estimation
Published 2020-01-08
URL https://arxiv.org/abs/2001.02643v3
PDF https://arxiv.org/pdf/2001.02643v3.pdf
PWC https://paperswithcode.com/paper/consac-robust-multi-model-fitting-by

Supervised Enhanced Soft Subspace Clustering (SESSC) for TSK Fuzzy Classifiers

Title Supervised Enhanced Soft Subspace Clustering (SESSC) for TSK Fuzzy Classifiers
Authors Yuqi Cui, Huidong Wang, Dongrui Wu
Abstract Fuzzy c-means based clustering algorithms are frequently used for Takagi-Sugeno-Kang (TSK) fuzzy classifier antecedent parameter estimation. One rule is initialized from each cluster. However, most of these clustering algorithms are unsupervised, which wastes valuable label information in the training data. This paper proposes a supervised enhanced soft subspace clustering (SESSC) algorithm, which considers simultaneously the within-cluster compactness, between-cluster separation, and label information in clustering. It can effectively deal with high-dimensional data, be used as a classifier alone, or be integrated into a TSK fuzzy classifier to further improve its performance. Experiments on nine UCI datasets from various application domains demonstrated that SESSC based initialization outperformed other clustering approaches, especially when the number of rules is small.
Published 2020-02-27
URL https://arxiv.org/abs/2002.12404v1
PDF https://arxiv.org/pdf/2002.12404v1.pdf
PWC https://paperswithcode.com/paper/supervised-enhanced-soft-subspace-clustering

Cross-conformal e-prediction

Title Cross-conformal e-prediction
Authors Vladimir Vovk
Abstract This note discusses a simple modification of cross-conformal prediction inspired by recent work on e-values. The precursor of conformal prediction developed in the 1990s by Gammerman, Vapnik, and Vovk was also based on e-values and is called conformal e-prediction in this note. Replacing e-values by p-values led to conformal prediction, which has important advantages over conformal e-prediction without obvious disadvantages. The situation with cross-conformal prediction is, however, different: whereas for cross-conformal prediction validity is only an empirical fact (and can be broken with excessive randomization), this note draws the reader’s attention to the obvious fact that cross-conformal e-prediction enjoys a guaranteed property of validity.
Published 2020-01-16
URL https://arxiv.org/abs/2001.05989v1
PDF https://arxiv.org/pdf/2001.05989v1.pdf
PWC https://paperswithcode.com/paper/cross-conformal-e-prediction
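
A short sketch of why averaging preserves validity for e-values, assuming (this is not spelled out in the abstract) that the cross-conformal e-predictor averages the e-values obtained from $K$ folds. The symbols $E_1, \dots, E_K$ and $\epsilon$ are illustrative names introduced here: each $E_k$ is a nonnegative random variable with expectation at most 1 under the null hypothesis, and $\epsilon \in (0, 1)$ is a significance level.

$$\mathbb{E}\Big[\frac{1}{K}\sum_{k=1}^{K} E_k\Big] = \frac{1}{K}\sum_{k=1}^{K}\mathbb{E}[E_k] \le 1, \qquad \mathbb{P}\Big(\frac{1}{K}\sum_{k=1}^{K} E_k \ge \frac{1}{\epsilon}\Big) \le \epsilon$$

The first relation is linearity of expectation and requires no independence between folds; the second follows from Markov's inequality. An average of dependent p-values, by contrast, is not in general a p-value, which is consistent with the abstract's remark that validity of ordinary cross-conformal prediction is only an empirical fact.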

Machine learning based non-Newtonian fluid model with molecular fidelity

Title Machine learning based non-Newtonian fluid model with molecular fidelity
Authors Huan Lei, Lei Wu, Weinan E
Abstract We introduce a machine-learning-based framework for constructing a continuum non-Newtonian fluid dynamics model directly from a micro-scale description. A polymer solution is used as an example to demonstrate the essential ideas. To faithfully retain molecular fidelity, we establish a micro-macro correspondence via a set of encoders for the micro-scale polymer configurations and their macro-scale counterparts, a set of nonlinear conformation tensors. The dynamics of these conformation tensors can be derived from the micro-scale model and the relevant terms can be parametrized using machine learning. The final model, named the deep non-Newtonian model (DeePN$^2$), takes the form of conventional non-Newtonian fluid dynamics models, with a new form of the objective tensor derivative. Numerical results demonstrate the accuracy of DeePN$^2$.
Published 2020-03-07
URL https://arxiv.org/abs/2003.03672v1
PDF https://arxiv.org/pdf/2003.03672v1.pdf
PWC https://paperswithcode.com/paper/machine-learning-based-non-newtonian-fluid

Fair Learning with Private Demographic Data

Title Fair Learning with Private Demographic Data
Authors Hussein Mozannar, Mesrob I. Ohannessian, Nathan Srebro
Abstract Sensitive attributes such as race are rarely available to learners in real world settings as their collection is often restricted by laws and regulations. We give a scheme that allows individuals to release their sensitive information privately while still allowing any downstream entity to learn non-discriminatory predictors. We show how to adapt non-discriminatory learners to work with privatized protected attributes giving theoretical guarantees on performance. Finally, we highlight how the methodology could apply to learning fair predictors in settings where protected attributes are only available for a subset of the data.
Published 2020-02-26
URL https://arxiv.org/abs/2002.11651v1
PDF https://arxiv.org/pdf/2002.11651v1.pdf
PWC https://paperswithcode.com/paper/fair-learning-with-private-demographic-data

Stochastic Flows and Geometric Optimization on the Orthogonal Group

Title Stochastic Flows and Geometric Optimization on the Orthogonal Group
Authors Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani
Abstract We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$. We theoretically and experimentally demonstrate that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, normalizing flows and metric learning. We show an intriguing connection between efficient stochastic optimization on the orthogonal group and graph theory (e.g. matching problem, partition functions over graphs, graph-coloring). We leverage the theory of Lie groups and provide theoretical results for the designed class of algorithms. We demonstrate broad applicability of our methods by showing strong performance on the seemingly unrelated tasks of learning world models to obtain stable policies for the most difficult $\mathrm{Humanoid}$ agent from $\mathrm{OpenAI}$ $\mathrm{Gym}$ and improving convolutional neural networks.
Tasks Metric Learning, Stochastic Optimization
Published 2020-03-30
URL https://arxiv.org/abs/2003.13563v1
PDF https://arxiv.org/pdf/2003.13563v1.pdf
PWC https://paperswithcode.com/paper/stochastic-flows-and-geometric-optimization

Faster ILOD: Incremental Learning for Object Detectors based on Faster RCNN

Title Faster ILOD: Incremental Learning for Object Detectors based on Faster RCNN
Authors Can Peng, Kun Zhao, Brian C. Lovell
Abstract The human vision and perception system is inherently incremental where new knowledge is continually learned over time whilst existing knowledge is retained. On the other hand, deep learning networks are ill-equipped for incremental learning. When a well-trained network is adapted to new categories, its performance on the old categories will dramatically degrade. To address this problem, incremental learning methods have been explored to preserve the old knowledge of deep learning models. However, the state-of-the-art incremental object detector employs an external fixed region proposal method that increases overall computation time and reduces accuracy compared to object detectors such as Faster RCNN that use trainable Region Proposal Networks (RPNs). The purpose of this paper is to design an efficient end-to-end incremental object detector using knowledge distillation for object detectors based on RPNs. We first evaluate and analyze the performance of an RPN-based detector with classic distillation towards incremental detection tasks. Then, we introduce multi-network adaptive distillation that properly retains knowledge from the old categories when fine-tuning the model for a new task. Experiments on the benchmark datasets, PASCAL VOC and COCO, demonstrate that the proposed incremental detector is more accurate as well as being 13 times faster than the baseline detector.
Published 2020-03-09
URL https://arxiv.org/abs/2003.03901v1
PDF https://arxiv.org/pdf/2003.03901v1.pdf
PWC https://paperswithcode.com/paper/faster-ilod-incremental-learning-for-object

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

Title A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
Authors Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich
Abstract Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per-iteration cost, data locality, and their communication-efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and which have been developed separately in various communities. Our algorithmic framework covers local SGD updates and synchronous and pairwise gossip updates on adaptive network topology. We derive universal convergence rates for smooth (convex and non-convex) problems and the rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as cooperative SGD and federated averaging (local SGD).
Tasks Stochastic Optimization
Published 2020-03-23
URL https://arxiv.org/abs/2003.10422v1
PDF https://arxiv.org/pdf/2003.10422v1.pdf
PWC https://paperswithcode.com/paper/a-unified-theory-of-decentralized-sgd-with
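
To make the combination of local SGD updates and gossip averaging mentioned in the abstract concrete, here is a toy sketch under stated assumptions: five workers on a fixed ring, scalar quadratic local objectives $0.5\,(x - b_i)^2$, Gaussian gradient noise, and uniform neighbour weights of 1/3. The function names, step size, and topology are hypothetical choices for illustration and are not taken from the paper, whose framework also covers changing topologies and pairwise gossip.

```python
import random

def ring_gossip(values):
    """One synchronous gossip round on a ring: each worker averages its value
    with those of its two neighbours (weights 1/3 each, doubly stochastic)."""
    n = len(values)
    return [(values[i - 1] + values[i] + values[(i + 1) % n]) / 3.0 for i in range(n)]

def decentralized_sgd(targets, rounds=300, local_steps=2, lr=0.1, noise_std=0.01, seed=0):
    """Worker i minimizes 0.5 * (x - targets[i])**2 using noisy gradients,
    taking `local_steps` local updates between gossip rounds."""
    rng = random.Random(seed)
    x = [0.0] * len(targets)
    for _ in range(rounds):
        for _ in range(local_steps):
            # Local stochastic gradient step: gradient is (x_i - b_i) plus noise.
            x = [xi - lr * ((xi - b) + rng.gauss(0.0, noise_std))
                 for xi, b in zip(x, targets)]
        # Communication step: mix with ring neighbours.
        x = ring_gossip(x)
    return x

targets = [1.0, 2.0, 3.0, 4.0, 10.0]  # heterogeneous (non-iid) local optima
x = decentralized_sgd(targets)
print("workers:", [round(v, 3) for v in x])
print("minimizer of the summed objective:", sum(targets) / len(targets))
```

Because the mixing weights are doubly stochastic, gossip leaves the network average unchanged, so the average of the workers' iterates behaves like a single SGD run on the summed objective; individual workers stay spread around it, with the spread depending on the step size and data heterogeneity.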