October 20, 2019

3220 words 16 mins read

Paper Group ANR 6

Markov Properties of Discrete Determinantal Point Processes. Semi-Supervised Translation with MMD Networks. Two Stream 3D Semantic Scene Completion. FSNet: An Identity-Aware Generative Model for Image-based Face Swapping. Practical Issues of Action-conditioned Next Image Prediction. Convergence Rate of Block-Coordinate Maximization Burer-Monteiro M …

Markov Properties of Discrete Determinantal Point Processes

Title Markov Properties of Discrete Determinantal Point Processes
Authors Kayvan Sadeghi, Alessandro Rinaldo
Abstract Determinantal point processes (DPPs) are probabilistic models of repulsion. When used to represent the occurrence of random subsets of a finite base set, DPPs make it possible to model global negative associations in a mathematically elegant and direct way. Discrete DPPs have become popular and computationally tractable models for machine learning tasks that require the selection of diverse objects, and have been successfully applied to numerous real-life problems. Despite their popularity, the statistical properties of such models have not been adequately explored. In this note, we derive the Markov properties of discrete DPPs and show how they can be expressed using graphical models.
Tasks Point Processes
Published 2018-10-04
URL http://arxiv.org/abs/1810.02294v2
PDF http://arxiv.org/pdf/1810.02294v2.pdf
PWC https://paperswithcode.com/paper/markov-properties-of-discrete-determinantal
Repo
Framework
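
The global negative association the abstract mentions is easy to see numerically. Below is a small illustrative sketch (not from the paper) using a marginal kernel K, for which the inclusion probability of a subset S is the determinant of the corresponding principal submatrix:

```python
import numpy as np

# Toy marginal kernel K for a DPP over a base set of 4 items.
rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
L = B @ B.T                                  # a PSD L-ensemble kernel
K = L @ np.linalg.inv(L + np.eye(4))         # marginal kernel: P(S in Y) = det(K_S)

def inclusion_prob(K, S):
    """Probability that all items in S appear in the random subset Y."""
    S = list(S)
    return float(np.linalg.det(K[np.ix_(S, S)]))

p1 = inclusion_prob(K, [0])
p2 = inclusion_prob(K, [1])
p12 = inclusion_prob(K, [0, 1])
# Negative association: two items co-occur less often than independently.
print(p12 <= p1 * p2)
```

For a symmetric kernel this inequality follows directly from det(K_S) = K_ii K_jj - K_ij^2.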

Semi-Supervised Translation with MMD Networks

Title Semi-Supervised Translation with MMD Networks
Authors Mark Hamilton
Abstract This work aims to improve semi-supervised learning in a neural network architecture by introducing a hybrid supervised and unsupervised cost function. The unsupervised component is trained using a differentiable estimator of the Maximum Mean Discrepancy (MMD) distance between the network output and the target dataset. We introduce the notion of an $n$-channel network and several methods to improve the performance of these networks based on supervised pre-initialization and multi-scale kernels. This work investigates the effectiveness of these methods on language translation, where very few quality translations are known \textit{a priori}. We also present a thorough investigation of the hyper-parameter space of this method on synthetic data.
Tasks
Published 2018-10-28
URL http://arxiv.org/abs/1810.11906v1
PDF http://arxiv.org/pdf/1810.11906v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-translation-with-mmd-networks
Repo
Framework
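
As a rough sketch of the unsupervised component, here is a plain (biased, V-statistic) estimator of squared MMD with an RBF kernel; the paper's differentiable, network-based setup and multi-scale kernels are not reproduced here:

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=1.0):
    """Biased estimator of squared MMD between samples X and Y (RBF kernel)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    # Squared norm of the difference of empirical kernel mean embeddings.
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd2_rbf(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2_rbf(rng.normal(size=(200, 2)), rng.normal(3.0, 1.0, size=(200, 2)))
print(same < diff)  # mismatched distributions give a larger MMD
```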

Two Stream 3D Semantic Scene Completion

Title Two Stream 3D Semantic Scene Completion
Authors Martin Garbade, Yueh-Tung Chen, Johann Sawatzky, Juergen Gall
Abstract Inferring the 3D geometry and the semantic meaning of occluded surfaces is a very challenging task. Recently, a first end-to-end learning approach was proposed that completes a scene from a single depth image. That approach voxelizes the scene and predicts, for each voxel, whether it is occupied and, if so, its semantic class label. In this work, we propose a two-stream approach that leverages depth information together with semantic information inferred from the RGB image. The approach constructs an incomplete 3D semantic tensor, using a compact three-channel encoding for the inferred semantic information, and uses a 3D CNN to infer the complete 3D semantic tensor. In our experimental evaluation, we show that the proposed two-stream approach substantially outperforms the state of the art for semantic scene completion.
Tasks
Published 2018-04-10
URL https://arxiv.org/abs/1804.03550v4
PDF https://arxiv.org/pdf/1804.03550v4.pdf
PWC https://paperswithcode.com/paper/two-stream-3d-semantic-scene-completion
Repo
Framework

FSNet: An Identity-Aware Generative Model for Image-based Face Swapping

Title FSNet: An Identity-Aware Generative Model for Image-based Face Swapping
Authors Ryota Natsume, Tatsuya Yatagawa, Shigeo Morishima
Abstract This paper presents FSNet, a deep generative model for image-based face swapping. Traditionally, face-swapping methods are based on three-dimensional morphable models (3DMMs): facial textures are exchanged between the estimated three-dimensional (3D) geometries of two images of different individuals. However, estimating 3D geometry under varying lighting conditions with a 3DMM remains difficult. We instead represent the face region with a latent variable computed by the proposed deep neural network (DNN), rather than with facial textures. The proposed DNN synthesizes a face-swapped image from the latent variable of the face region and another image of the non-face region. The method requires no 3DMM fitting; it performs face swapping simply by feeding two face images to the network. Consequently, our DNN-based face swapping performs better than previous approaches on challenging inputs with different face orientations and lighting conditions. Through several experiments, we demonstrate that the proposed method performs face swapping more stably than the state-of-the-art method, with results of comparable quality.
Tasks Face Swapping
Published 2018-11-30
URL http://arxiv.org/abs/1811.12666v1
PDF http://arxiv.org/pdf/1811.12666v1.pdf
PWC https://paperswithcode.com/paper/fsnet-an-identity-aware-generative-model-for
Repo
Framework

Practical Issues of Action-conditioned Next Image Prediction

Title Practical Issues of Action-conditioned Next Image Prediction
Authors Donglai Zhu, Hao Chen, Hengshuai Yao, Masoud Nosrati, Peyman Yadmellat, Yunfei Zhang
Abstract The problem of action-conditioned image prediction is to predict the expected next frame given the current camera frame observed by the robot and an action selected by the robot. We provide the first comparison of two recent popular models, focusing on image prediction for cars. Our major finding is that action tiling encoding is the most important factor behind the remarkable performance of the CDNA model. We present a lightweight model based on action tiling encoding, with a single-decoder feedforward architecture, the same as in [action_video_prediction_honglak]. On a real driving dataset, the CDNA model achieves ${0.3986} \times 10^{-3}$ MSE and ${0.9846}$ Structural SIMilarity (SSIM) with a network of about {\bfseries ${12.6}$ million} parameters. With a small network of fewer than {\bfseries ${1}$ million} parameters, our new model achieves performance comparable to CDNA, at ${0.3613} \times 10^{-3}$ MSE and ${0.9633}$ SSIM. Our model requires less memory, is more computationally efficient, and is well suited for use inside self-driving vehicles.
Tasks
Published 2018-02-08
URL http://arxiv.org/abs/1802.02975v1
PDF http://arxiv.org/pdf/1802.02975v1.pdf
PWC https://paperswithcode.com/paper/practical-issues-of-action-conditioned-next
Repo
Framework
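
The "action tiling encoding" the authors single out can be sketched simply: the action vector is broadcast across every spatial position of a feature map and concatenated channel-wise. Shapes and names below are illustrative, not the paper's code:

```python
import numpy as np

def tile_action(feat, action):
    """Tile an action vector over the spatial grid of a feature map and
    concatenate it along the channel axis.
    feat: (H, W, C); action: (A,) -> returns (H, W, C + A)."""
    H, W, _ = feat.shape
    tiled = np.broadcast_to(action, (H, W, action.shape[0]))
    return np.concatenate([feat, tiled], axis=-1)

feat = np.zeros((8, 8, 16))
action = np.array([0.5, -1.0])   # e.g. a steering and an acceleration command
out = tile_action(feat, action)
print(out.shape)  # (8, 8, 18)
```

This way every spatial location of the decoder sees the chosen action directly.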

Convergence Rate of Block-Coordinate Maximization Burer-Monteiro Method for Solving Large SDPs

Title Convergence Rate of Block-Coordinate Maximization Burer-Monteiro Method for Solving Large SDPs
Authors Murat A. Erdogdu, Asuman Ozdaglar, Pablo A. Parrilo, Nuri Denizcan Vanli
Abstract Semidefinite programs (SDPs) with diagonal constraints arise in many optimization problems, such as Max-Cut, community detection, and group synchronization. Although SDPs can be solved to arbitrary precision in polynomial time, generic convex solvers do not scale well with the dimension of the problem. To address this issue, Burer and Monteiro proposed to reduce the dimension of the problem via a low-rank factorization and to solve the resulting non-convex problem instead. In this paper, we present coordinate-ascent-based methods to solve this non-convex problem with provable convergence guarantees. More specifically, we prove that the block-coordinate maximization algorithm applied to the non-convex Burer-Monteiro formulation globally converges to a first-order stationary point at a sublinear rate without any assumptions on the problem. We further show that this algorithm converges linearly around a local maximum provided that the objective function exhibits quadratic decay. We establish that this condition generically holds when the rank of the factorization is sufficiently large. Furthermore, by incorporating the Lanczos method into block-coordinate maximization, we propose an algorithm that is guaranteed to return a solution providing a $1-O(1/r)$ approximation to the original SDP without any assumptions, where $r$ is the rank of the factorization. This approximation ratio is known to be optimal (up to constants) under the unique games conjecture, and we can explicitly quantify the number of iterations needed to obtain such a solution.
Tasks Community Detection
Published 2018-07-12
URL https://arxiv.org/abs/1807.04428v2
PDF https://arxiv.org/pdf/1807.04428v2.pdf
PWC https://paperswithcode.com/paper/convergence-rate-of-block-coordinate
Repo
Framework
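
The block-coordinate update has a convenient closed form: with unit-norm rows v_i of the factor V, each row's subproblem over the unit sphere is solved by normalizing its gradient block. A toy sketch (not the paper's code) for maximizing -<A, V V^T> subject to unit diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 20, 6                                  # n variables, rank-r factorization
A = rng.normal(size=(n, n)); A = (A + A.T) / 2
np.fill_diagonal(A, 0)
V = rng.normal(size=(n, r))
V /= np.linalg.norm(V, axis=1, keepdims=True)  # unit rows: diag(V V^T) = 1

def objective(V):
    # Burer-Monteiro objective for the SDP  max -<A, X>, diag(X) = 1
    return -np.trace(A @ (V @ V.T))

before = objective(V)
for _ in range(50):                           # block-coordinate maximization sweeps
    for i in range(n):
        g = -A[i] @ V                         # gradient block for row i
        nrm = np.linalg.norm(g)
        if nrm > 1e-12:
            V[i] = g / nrm                    # closed-form maximizer on the sphere
after = objective(V)
print(after >= before)  # each row update can only increase the objective
```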

Pairwise Covariates-adjusted Block Model for Community Detection

Title Pairwise Covariates-adjusted Block Model for Community Detection
Authors Sihan Huang, Yang Feng
Abstract One of the most fundamental problems in the study of networks is community detection. The stochastic block model (SBM) is a widely used model for network data, for which various estimation methods have been developed, together with community detection consistency results. However, the SBM is restricted by the strong assumption that all nodes in the same community are stochastically equivalent, which may not be suitable for practical applications. We introduce the pairwise covariates-adjusted stochastic block model (PCABM), a generalization of the SBM that incorporates pairwise covariate information. We study the maximum likelihood estimates of the coefficients for the covariates as well as the community assignments, and show that both are consistent under suitable sparsity conditions. Spectral clustering with adjustment (SCWA) is introduced to solve PCABM efficiently. Under certain conditions, we derive the error bound of community estimation under SCWA and show that it is community detection consistent. PCABM compares favorably with the SBM and the degree-corrected stochastic block model (DCBM) across a wide range of simulated and real networks when covariate information is available.
Tasks Community Detection
Published 2018-07-10
URL http://arxiv.org/abs/1807.03469v1
PDF http://arxiv.org/pdf/1807.03469v1.pdf
PWC https://paperswithcode.com/paper/pairwise-covariates-adjusted-block-model-for
Repo
Framework
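
For intuition, the spectral step that SCWA builds on can be sketched on a plain two-block SBM; the pairwise-covariate adjustment itself (correcting edge weights by the estimated covariate effect before the eigendecomposition) is omitted from this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
z = np.repeat([0, 1], n // 2)                       # ground-truth communities
P = np.where(z[:, None] == z[None, :], 0.5, 0.1)    # within- vs between-block prob
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T                      # symmetric, no self-loops

vals, vecs = np.linalg.eigh(A)
u = vecs[:, -2]                                     # second leading eigenvector
labels = (u > 0).astype(int)                        # sign split instead of k-means
acc = max((labels == z).mean(), (labels != z).mean())  # agreement up to label swap
print(acc)
```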

Whole-Slide Mitosis Detection in H&E Breast Histology Using PHH3 as a Reference to Train Distilled Stain-Invariant Convolutional Networks

Title Whole-Slide Mitosis Detection in H&E Breast Histology Using PHH3 as a Reference to Train Distilled Stain-Invariant Convolutional Networks
Authors David Tellez, Maschenka Balkenhol, Irene Otte-Holler, Rob van de Loo, Rob Vogels, Peter Bult, Carla Wauters, Willem Vreuls, Suzanne Mol, Nico Karssemeijer, Geert Litjens, Jeroen van der Laak, Francesco Ciompi
Abstract Manual counting of mitotic tumor cells in tissue sections constitutes one of the strongest prognostic markers for breast cancer. This procedure, however, is time-consuming and error-prone. We developed a method to automatically detect mitotic figures in breast cancer tissue sections based on convolutional neural networks (CNNs). Application of CNNs to hematoxylin and eosin (H&E) stained histological tissue sections is hampered by: (1) noisy and expensive reference standards established by pathologists, (2) lack of generalization due to staining variation across laboratories, and (3) high computational requirements needed to process gigapixel whole-slide images (WSIs). In this paper, we present a method to train and evaluate CNNs to specifically solve these issues in the context of mitosis detection in breast cancer WSIs. First, by combining image analysis of mitotic activity in phosphohistone-H3 (PHH3) restained slides and registration, we built a reference standard for mitosis detection in entire H&E WSIs requiring minimal manual annotation effort. Second, we designed a data augmentation strategy that creates diverse and realistic H&E stain variations by modifying the hematoxylin and eosin color channels directly. Using it during training combined with network ensembling resulted in a stain invariant mitosis detector. Third, we applied knowledge distillation to reduce the computational requirements of the mitosis detection ensemble with a negligible loss of performance. The system was trained in a single-center cohort and evaluated in an independent multicenter cohort from The Cancer Genome Atlas on the three tasks of the Tumor Proliferation Assessment Challenge (TUPAC). We obtained a performance within the top-3 best methods for most of the tasks of the challenge.
Tasks Data Augmentation, Mitosis Detection
Published 2018-08-17
URL http://arxiv.org/abs/1808.05896v1
PDF http://arxiv.org/pdf/1808.05896v1.pdf
PWC https://paperswithcode.com/paper/whole-slide-mitosis-detection-in-he-breast
Repo
Framework
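
The stain-variation augmentation can be sketched with classical color deconvolution: map RGB to hematoxylin/eosin/residual optical densities using a fixed stain matrix (standard Ruifrok and Johnston values), jitter each stain channel, and map back. The exact transform and perturbation ranges used in the paper differ; this only conveys the idea:

```python
import numpy as np

# Fixed stain vectors (Ruifrok & Johnston): rows are the optical-density
# directions of hematoxylin, eosin, and a residual channel.
STAINS = np.array([[0.65, 0.70, 0.29],
                   [0.07, 0.99, 0.11],
                   [0.27, 0.57, 0.78]])

def he_jitter(rgb, rng, strength=0.05):
    """Jitter an RGB image (floats in (0, 1]) in stain-concentration space."""
    od = -np.log(np.clip(rgb, 1e-6, 1.0))              # Beer-Lambert optical density
    conc = od.reshape(-1, 3) @ np.linalg.inv(STAINS)   # per-pixel stain amounts
    alpha = 1 + rng.uniform(-strength, strength, size=3)
    beta = rng.uniform(-strength, strength, size=3)
    conc = conc * alpha + beta                          # scale and shift each stain
    od_new = conc @ STAINS
    return np.exp(-od_new).reshape(rgb.shape).clip(0, 1)

out = he_jitter(np.full((2, 2, 3), 0.5), np.random.default_rng(0))
```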

Optimal Rates for Spectral Algorithms with Least-Squares Regression over Hilbert Spaces

Title Optimal Rates for Spectral Algorithms with Least-Squares Regression over Hilbert Spaces
Authors Junhong Lin, Alessandro Rudi, Lorenzo Rosasco, Volkan Cevher
Abstract In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral-regularized algorithms, including ridge regression, principal component analysis, and gradient methods. We prove optimal, high-probability convergence results in terms of variants of norms for the studied algorithms, considering a capacity assumption on the hypothesis space and a general source condition on the target function. Consequently, we obtain almost sure convergence results with optimal rates. Our results improve and generalize previous results, filling a theoretical gap for the non-attainable cases.
Tasks
Published 2018-01-20
URL http://arxiv.org/abs/1801.06720v3
PDF http://arxiv.org/pdf/1801.06720v3.pdf
PWC https://paperswithcode.com/paper/optimal-rates-for-spectral-algorithms-with
Repo
Framework
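
Ridge regression, the simplest member of the spectral family studied in the paper, has a closed form in the kernel setting. A minimal sketch with an RBF kernel and hand-picked hyper-parameters (the paper's analysis, not reproduced here, concerns the convergence rates of such estimators):

```python
import numpy as np

def rbf(A, B, sigma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(X, y, lam=1e-3, sigma=0.5):
    """Kernel ridge regression: alpha = (K + n*lam*I)^{-1} y."""
    n = len(X)
    K = rbf(X, X, sigma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda Xt: rbf(Xt, X, sigma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=100)
f = krr_fit(X, y)
Xt = np.linspace(-1, 1, 50)[:, None]
err = np.mean((f(Xt) - np.sin(3 * Xt[:, 0])) ** 2)
print(err)
```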

An Embarrassingly Simple Approach for Knowledge Distillation

Title An Embarrassingly Simple Approach for Knowledge Distillation
Authors Mengya Gao, Yujun Shen, Quanquan Li, Junjie Yan, Liang Wan, Dahua Lin, Chen Change Loy, Xiaoou Tang
Abstract Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model. Previous KD methods typically train a student by minimizing a task-related loss and the KD loss simultaneously, using a pre-defined loss weight to balance these two terms. In this work, we propose to first transfer the backbone knowledge from a teacher to the student, and then only learn the task-head of the student network. Such a decomposition of the training process circumvents the need of choosing an appropriate loss weight, which is often difficult in practice, and thus makes it easier to apply to different datasets and tasks. Importantly, the decomposition permits the core of our method, Stage-by-Stage Knowledge Distillation (SSKD), which facilitates progressive feature mimicking from teacher to student. Extensive experiments on CIFAR-100 and ImageNet suggest that SSKD significantly narrows down the performance gap between student and teacher, outperforming state-of-the-art approaches. We also demonstrate the generalization ability of SSKD on other challenging benchmarks, including face recognition on IJB-A dataset as well as object detection on COCO dataset.
Tasks Face Recognition, Object Detection, Transfer Learning
Published 2018-12-05
URL https://arxiv.org/abs/1812.01819v2
PDF https://arxiv.org/pdf/1812.01819v2.pdf
PWC https://paperswithcode.com/paper/feature-matters-a-stage-by-stage-approach-for
Repo
Framework

THORS: An Efficient Approach for Making Classifiers Cost-sensitive

Title THORS: An Efficient Approach for Making Classifiers Cost-sensitive
Authors Ye Tian, Weiping Zhang
Abstract In this paper, we propose an effective THresholding method based on ORder Statistic, called THORS, that converts an arbitrary scoring-type classifier (one inducing a continuous cumulative distribution function of the score) into a cost-sensitive one. The procedure uses an order statistic to find an optimal classification threshold and requires almost no knowledge of the classifier itself. Unlike common data-driven methods, THORS is shown analytically to have guaranteed performance, theoretical bounds on the costs, and lower time complexity. Coupled with empirical results on several real-world data sets, this leads us to argue that THORS is the preferred cost-sensitive technique.
Tasks
Published 2018-11-07
URL http://arxiv.org/abs/1811.02814v1
PDF http://arxiv.org/pdf/1811.02814v1.pdf
PWC https://paperswithcode.com/paper/thors-an-efficient-approach-for-making
Repo
Framework
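
The core idea can be conveyed with a brute-force stand-in: treat the sorted validation scores (the order statistics) as candidate thresholds and pick the one minimizing empirical cost. THORS instead derives the optimal order statistic analytically, which this sketch does not attempt:

```python
import numpy as np

def order_statistic_threshold(scores, labels, c_fp=1.0, c_fn=5.0):
    """Pick the threshold among the sorted scores minimizing empirical cost."""
    best_t, best_cost = np.inf, np.inf
    for t in np.sort(scores):                 # candidates = order statistics
        pred = scores >= t
        cost = (c_fp * np.sum(pred & (labels == 0))
                + c_fn * np.sum(~pred & (labels == 1)))
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
labels = np.concatenate([np.zeros(100), np.ones(100)]).astype(int)
t = order_statistic_threshold(scores, labels)
print(t)  # false negatives cost more, so t sits below the symmetric crossing
```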

Beyond Domain Adaptation: Unseen Domain Encapsulation via Universal Non-volume Preserving Models

Title Beyond Domain Adaptation: Unseen Domain Encapsulation via Universal Non-volume Preserving Models
Authors Thanh-Dat Truong, Chi Nhan Duong, Khoa Luu, Minh-Triet Tran, Minh Do
Abstract Recognition across domains has recently become an active topic in the research community. However, recognition in new unseen domains remains largely overlooked. In this setting, the deployed deep network models cannot be updated, adapted, or fine-tuned, so recent deep learning techniques such as domain adaptation, feature transfer, and fine-tuning cannot be applied. This paper presents a novel Universal Non-volume Preserving approach to the problem of domain generalization in the context of deep learning. The proposed method can easily be incorporated into any ConvNet framework within an end-to-end deep network design to improve performance. For digit recognition, we benchmark on four popular databases, i.e. MNIST, USPS, SVHN, and MNIST-M. The proposed method is also evaluated on face recognition on the Extended Yale-B, CMU-PIE, and CMU-MPIE databases and compared against other state-of-the-art methods. For pedestrian detection, we empirically observe that the proposed method learns models that improve performance across a priori unknown data distributions.
Tasks Domain Adaptation, Domain Generalization, Face Recognition, Pedestrian Detection
Published 2018-12-09
URL http://arxiv.org/abs/1812.03407v1
PDF http://arxiv.org/pdf/1812.03407v1.pdf
PWC https://paperswithcode.com/paper/beyond-domain-adaptation-unseen-domain
Repo
Framework

Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

Title Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
Authors Ohad Shamir
Abstract We study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1,\ldots,w_k$), which arise in the context of training depth-$k$ linear neural networks. We prove that for standard random initializations, and under mild assumptions on $f$, the number of iterations required for convergence scales exponentially with the depth $k$. We also show empirically that this phenomenon can occur in higher dimensions, where each $w_i$ is a matrix. This highlights a potential obstacle in understanding the convergence of gradient-based methods for deep linear neural networks, where $k$ is large.
Tasks
Published 2018-09-23
URL https://arxiv.org/abs/1809.08587v4
PDF https://arxiv.org/pdf/1809.08587v4.pdf
PWC https://paperswithcode.com/paper/exponential-convergence-time-of-gradient
Repo
Framework
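
The depth-dependent slowdown is easy to reproduce in one dimension. Take f(w) = (prod_i w_i - 1)^2 with a balanced positive initialization: the product starts exponentially small in the depth k, so the gradient is exponentially small too, and the iteration count grows sharply with depth (the setup below, including learning rate and initialization, is an illustrative choice, not the paper's exact experiment):

```python
import numpy as np

def steps_to_converge(k, lr=0.05, tol=1e-2, max_iter=1_000_000):
    """Gradient-descent steps until prod(w) is within tol of 1."""
    w = np.full(k, 0.5)                        # balanced positive initialization
    for t in range(1, max_iter + 1):
        p = np.prod(w)
        if abs(p - 1.0) < tol:
            return t
        # gradient of (prod(w) - 1)^2; since w_i > 0 here, dp/dw_i = p / w_i
        w = w - lr * 2.0 * (p - 1.0) * p / w
    return max_iter

s3, s8 = steps_to_converge(3), steps_to_converge(8)
print(s3, s8)  # the deeper product takes noticeably more iterations
```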

It All Matters: Reporting Accuracy, Inference Time and Power Consumption for Face Emotion Recognition on Embedded Systems

Title It All Matters: Reporting Accuracy, Inference Time and Power Consumption for Face Emotion Recognition on Embedded Systems
Authors Jelena Milosevic, Dexmont Pena, Andrew Forembsky, David Moloney, Miroslaw Malek
Abstract While several approaches to the face emotion recognition task have been proposed in the literature, none of them reports the power consumption or inference time required to run the system in an embedded environment. Without adequate knowledge of these factors, it is not clear whether we can actually provide accurate face emotion recognition in an embedded environment, and if not, how far we are from making it feasible and what the biggest bottlenecks are. The main goal of this paper is to answer these questions and to convey the message that, instead of reporting only detection accuracy, power consumption and inference time should also be reported, since the real usability of the proposed systems and their adoption in human-computer interaction strongly depend on them. In this paper, we identify the state-of-the-art face emotion recognition methods that are potentially suitable for embedded environments, along with the most frequently used datasets for this task. Our study shows that most reported experiments use datasets with posed expressions or particular experimental setups with special conditions for image collection. Since our goal is to evaluate the identified promising methods in a realistic scenario, we collect a new dataset with non-exaggerated emotions and use it, in addition to the publicly available datasets, to evaluate detection accuracy, power consumption, and inference time on three frequently used embedded devices with different computational capabilities. Our results show that gray images are still more suitable for embedded environments than color ones, and that for most of the analyzed systems either inference time or energy consumption, or both, are limiting factors for adoption in real-life embedded applications.
Tasks Emotion Recognition
Published 2018-06-29
URL http://arxiv.org/abs/1807.00046v1
PDF http://arxiv.org/pdf/1807.00046v1.pdf
PWC https://paperswithcode.com/paper/it-all-matters-reporting-accuracy-inference
Repo
Framework
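
A minimal inference-timing harness of the kind the paper argues should accompany accuracy numbers: warm up first, then report mean and tail latency over repeated runs. The stand-in "model" below is just a matrix multiply; real embedded measurements would of course run the actual network on the target device:

```python
import time
import numpy as np

def benchmark(fn, x, warmup=10, runs=100):
    """Time fn(x): warm-up iterations first, then collect per-call latencies."""
    for _ in range(warmup):
        fn(x)                                  # warm caches, lazy allocations
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(x)
        times.append(time.perf_counter() - t0)
    times = np.array(times)
    return {"mean_ms": 1e3 * times.mean(),
            "p99_ms": 1e3 * np.quantile(times, 0.99)}

W = np.random.default_rng(0).normal(size=(256, 256))
stats = benchmark(lambda x: np.tanh(x @ W), np.ones((1, 256)))
print(stats)
```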

DeepThin: A Self-Compressing Library for Deep Neural Networks

Title DeepThin: A Self-Compressing Library for Deep Neural Networks
Authors Matthew Sotoudeh, Sara S. Baghsorkhi
Abstract As the industry deploys increasingly large and complex neural networks to mobile devices, more pressure is put on the memory and compute resources of those devices. Deep compression, or compression of deep neural network weight matrices, is a technique to stretch resources for such scenarios. Existing compression methods cannot effectively compress models smaller than 1-2% of their original size. We develop a new compression technique, DeepThin, building on existing research in the area of low rank factorization. We identify and break artificial constraints imposed by low rank approximations by combining rank factorization with a reshaping process that adds nonlinearity to the approximation function. We deploy DeepThin as a pluggable library integrated with TensorFlow that enables users to seamlessly compress models at different granularities. We evaluate DeepThin on two state-of-the-art acoustic models, TFKaldi and DeepSpeech, comparing it to previous compression work (pruning, HashedNets, and rank factorization), empirical limit study approaches, and hand-tuned models. For TFKaldi, our DeepThin networks show better word error rates (WER) than competing methods at practically all tested compression rates, achieving an average of 60% relative improvement over rank factorization, 57% over pruning, 23% over hand-tuned same-size networks, and 6% over the computationally expensive HashedNets. For DeepSpeech, DeepThin-compressed networks achieve better test loss than all other compression methods, reaching a 28% better result than rank factorization, 27% better than pruning, 20% better than hand-tuned same-size networks, and 12% better than HashedNets. DeepThin also provides inference performance benefits ranging from 2X to 14X speedups, depending on the compression ratio and platform cache sizes.
Tasks
Published 2018-02-20
URL http://arxiv.org/abs/1802.06944v1
PDF http://arxiv.org/pdf/1802.06944v1.pdf
PWC https://paperswithcode.com/paper/deepthin-a-self-compressing-library-for-deep
Repo
Framework
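
The factorize-then-reshape idea can be sketched as follows: rather than approximating the weight matrix directly as a rank-r product (which at very low rank forces repeated rows and columns), store a small factorized buffer, flatten it, and reshape it into the weight shape, breaking the low-rank structure's artificial constraints. The buffer layout below is guessed for illustration and is not the paper's exact scheme:

```python
import numpy as np

def deepthin_like(rows, cols, r, aux=4, rng=None):
    """Build a (rows, cols) weight matrix from a small rank-r buffer + reshape."""
    rng = rng or np.random.default_rng(0)
    n = rows * cols
    a = rng.normal(size=(n // aux + 1, r))   # small factor matrices
    b = rng.normal(size=(r, aux))
    flat = (a @ b).reshape(-1)[:n]           # factorized buffer, flattened/cropped
    return flat.reshape(rows, cols)          # reshape breaks the rank structure

W = deepthin_like(8, 10, r=2)
params = (8 * 10 // 4 + 1) * 2 + 2 * 4       # stored parameters vs 80 dense
print(W.shape, params)
```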