October 18, 2019

3668 words 18 mins read

Paper Group ANR 524

Social Choice Random Utility Models of Intransitive Pairwise Comparisons. Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle. Semantic Stereo for Incidental Satellite Images. When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets. Scene Coordinate Regr …

Social Choice Random Utility Models of Intransitive Pairwise Comparisons

Title Social Choice Random Utility Models of Intransitive Pairwise Comparisons
Authors Rahul Makhijani, Johan Ugander
Abstract There is a growing need for discrete choice models that account for the complex nature of human choices, escaping traditional behavioral assumptions such as the transitivity of pairwise preferences. Recently, several parametric models of intransitive comparisons have been proposed, but in all cases the maximum likelihood problem is non-concave, making inference difficult. In this work we generalize this trend, showing that there cannot exist any parametric model with a concave log-likelihood function that can exhibit intransitive preferences. Given this result, we motivate a new model for analyzing intransitivity in pairwise comparisons, taking inspiration from the Condorcet method (majority vote) in social choice theory. The Majority Vote model we analyze is defined as a voting process over independent Random Utility Models (RUMs). We infer a multidimensional embedding of each object or player, in contrast to the traditional one-dimensional embedding used by models such as the Thurstone or Bradley-Terry-Luce (BTL) models. We show that a three-dimensional majority vote model is capable of modeling arbitrarily strong and long intransitive cycles, and can also represent arbitrary pairwise comparison probabilities on any triplet. We provide experimental results that substantiate our claims regarding the effectiveness of our model in capturing intransitivity for various pairwise choice tasks such as predicting choices in recommendation systems, winners in online video games, and elections.
Tasks Recommendation Systems
Published 2018-10-05
URL http://arxiv.org/abs/1810.02518v3
PDF http://arxiv.org/pdf/1810.02518v3.pdf
PWC https://paperswithcode.com/paper/social-choice-random-utility-models-of
Repo
Framework
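The paper's inference procedure is not reproduced here, but the object it studies, a majority vote over independent random utility comparisons, is easy to simulate. Below is a minimal Python sketch in which each embedding dimension is treated as one independent logistic (BTL-style) comparison; that reading, the logistic noise, and the hand-picked 3-D embeddings are illustrative assumptions, not the authors' fitted model.

```python
import numpy as np

def majority_vote_win_prob(theta_a, theta_b, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(a beats b) under a majority vote over d
    independent comparisons, one per embedding dimension. In dimension k,
    a wins with probability sigmoid(theta_a[k] - theta_b[k]) (logistic RUM);
    a beats b overall if it wins a strict majority of the d dimensions."""
    rng = np.random.default_rng(seed)
    theta_a, theta_b = np.asarray(theta_a, float), np.asarray(theta_b, float)
    d = theta_a.shape[0]
    p_dim = 1.0 / (1.0 + np.exp(-(theta_a - theta_b)))   # per-dimension win probability
    wins = rng.random((n_samples, d)) < p_dim             # sample each dimension's outcome
    return float((wins.sum(axis=1) > d / 2).mean())

# Hypothetical 3-D embeddings producing a rock-paper-scissors-style cycle:
a, b, c = [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]
print(majority_vote_win_prob(b, a),   # ~0.58: b beats a
      majority_vote_win_prob(c, b),   # ~0.58: c beats b
      majority_vote_win_prob(a, c))   # ~0.58: a beats c -> an intransitive cycle
```

No one-dimensional score can reproduce those three probabilities simultaneously, which is the point of the multidimensional embedding.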

Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle

Title Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle
Authors Rana Ali Amjad, Bernhard C. Geiger
Abstract In this theory paper, we investigate training deep neural networks (DNNs) for classification via minimizing the information bottleneck (IB) functional. We show that the resulting optimization problem suffers from two severe issues: First, for deterministic DNNs, either the IB functional is infinite for almost all values of network parameters, making the optimization problem ill-posed, or it is piecewise constant, hence not admitting gradient-based optimization methods. Second, the invariance of the IB functional under bijections prevents it from capturing properties of the learned representation that are desirable for classification, such as robustness and simplicity. We argue that these issues are partly resolved for stochastic DNNs, DNNs that include a (hard or soft) decision rule, or by replacing the IB functional with related, but more well-behaved cost functions. We conclude that recent successes reported about training DNNs using the IB framework must be attributed to such solutions. As a side effect, our results indicate limitations of the IB framework for the analysis of DNNs. We also note that rather than trying to repair the inherent problems in the IB functional, a better approach may be to design regularizers on latent representation enforcing the desired properties directly.
Tasks
Published 2018-02-27
URL http://arxiv.org/abs/1802.09766v6
PDF http://arxiv.org/pdf/1802.09766v6.pdf
PWC https://paperswithcode.com/paper/learning-representations-for-neural-network
Repo
Framework
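For reference, the functional under discussion can be written in standard IB notation (X the input, Y the class label, T the learned representation, β > 0 the compression/prediction trade-off); the paper's exact parametrization may differ:

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y)
```

For a deterministic DNN, T = f_θ(X) is a function of X, so with continuous inputs I(X;T) is typically infinite, and with discrete or quantized representations it is piecewise constant in θ; this is exactly the ill-posedness the abstract describes.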

Semantic Stereo for Incidental Satellite Images

Title Semantic Stereo for Incidental Satellite Images
Authors Marc Bosch, Kevin Foster, Gordon Christie, Sean Wang, Gregory D Hager, Myron Brown
Abstract The increasingly common use of incidental satellite images for stereo reconstruction versus rigidly tasked binocular or trinocular coincident collection is helping to enable timely global-scale 3D mapping; however, reliable stereo correspondence from multi-date image pairs remains very challenging due to seasonal appearance differences and scene change. Promising recent work suggests that semantic scene segmentation can provide a robust regularizing prior for resolving ambiguities in stereo correspondence and reconstruction problems. To enable research for pairwise semantic stereo and multi-view semantic 3D reconstruction with incidental satellite images, we have established a large-scale public dataset including multi-view, multi-band satellite images and ground truth geometric and semantic labels for two large cities. To demonstrate the complementary nature of the stereo and segmentation tasks, we present lightweight public baselines adapted from recent state of the art convolutional neural network models and assess their performance.
Tasks 3D Reconstruction, Scene Segmentation
Published 2018-11-21
URL http://arxiv.org/abs/1811.08739v1
PDF http://arxiv.org/pdf/1811.08739v1.pdf
PWC https://paperswithcode.com/paper/semantic-stereo-for-incidental-satellite
Repo
Framework

When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets

Title When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets
Authors Hong-Yu Zhou, Avital Oliver, Jianxin Wu, Yefeng Zheng
Abstract Semi-Supervised Learning (SSL) has been proved to be an effective way to leverage both labeled and unlabeled data at the same time. Recent semi-supervised approaches focus on deep neural networks and have achieved promising results on several benchmarks: CIFAR10, CIFAR100 and SVHN. However, most of their experiments are based on models trained from scratch instead of pre-trained models. On the other hand, transfer learning has demonstrated its value when the target domain has limited labeled data. Here comes the intuitive question: is it possible to incorporate SSL when fine-tuning a pre-trained model? We comprehensively study how SSL methods starting from pretrained models perform under varying conditions, including training strategies, architecture choice and datasets. From this study, we obtain several interesting and useful observations. While practitioners have had an intuitive understanding of these observations, we do a comprehensive empirical analysis and demonstrate that: (1) the gains from SSL techniques over a fully-supervised baseline are smaller when trained from a pre-trained model than when trained from random initialization, (2) when the domain of the source data used to train the pre-trained model differs significantly from the domain of the target task, the gains from SSL are significantly higher, and (3) some SSL methods are able to advance fully-supervised baselines (like Pseudo-Label). We hope our studies can deepen the understanding of SSL research and facilitate the process of developing more effective SSL methods to utilize pre-trained models. Code is now available at github.
Tasks Transfer Learning
Published 2018-12-13
URL http://arxiv.org/abs/1812.05313v1
PDF http://arxiv.org/pdf/1812.05313v1.pdf
PWC https://paperswithcode.com/paper/when-semi-supervised-learning-meets-transfer
Repo
Framework
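The abstract's code pointer is unspecific ("available at github"), so as a concrete illustration of the setup being studied, here is a minimal PyTorch-style sketch of one SSL technique (pseudo-labeling) applied on top of a pre-trained backbone. The backbone, confidence threshold, and loss weighting are illustrative assumptions, not the authors' exact training protocol.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Start from a pre-trained source model and replace the head for the target task.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def ssl_step(labeled_x, labeled_y, unlabeled_x, threshold=0.95, lam=1.0):
    """One fine-tuning step: supervised cross-entropy on labeled data plus a
    pseudo-label cross-entropy on confidently predicted unlabeled data."""
    model.train()
    sup_loss = F.cross_entropy(model(labeled_x), labeled_y)

    with torch.no_grad():                       # generate pseudo-labels
        probs = F.softmax(model(unlabeled_x), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = conf >= threshold                # keep only confident predictions

    unsup_loss = torch.tensor(0.0)
    if mask.any():
        unsup_loss = F.cross_entropy(model(unlabeled_x[mask]), pseudo_y[mask])

    loss = sup_loss + lam * unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The paper's observation (1) would then correspond to comparing this loop against the same loop with `model` randomly initialized rather than pre-trained.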

Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization

Title Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization
Authors Xiaotian Li, Juha Ylioinas, Jakob Verbeek, Juho Kannala
Abstract Image-based camera relocalization is an important problem in computer vision and robotics. Recent works utilize convolutional neural networks (CNNs) to regress for pixels in a query image their corresponding 3D world coordinates in the scene. The final pose is then solved via a RANSAC-based optimization scheme using the predicted coordinates. Usually, the CNN is trained with ground truth scene coordinates, but it has also been shown that the network can discover 3D scene geometry automatically by minimizing single-view reprojection loss. However, due to the deficiencies of the reprojection loss, the network needs to be carefully initialized. In this paper, we present a new angle-based reprojection loss, which resolves the issues of the original reprojection loss. With this new loss function, the network can be trained without careful initialization, and the system achieves more accurate results. The new loss also enables us to utilize available multi-view constraints, which further improve performance.
Tasks Camera Relocalization
Published 2018-08-15
URL http://arxiv.org/abs/1808.04999v2
PDF http://arxiv.org/pdf/1808.04999v2.pdf
PWC https://paperswithcode.com/paper/scene-coordinate-regression-with-angle-based
Repo
Framework
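The paper's precise loss is not restated in the abstract, but the underlying idea, penalizing the angle between the viewing ray observed at a pixel and the ray toward the predicted scene coordinate, can be sketched as below. The camera convention (world-to-camera pose, pinhole intrinsics) and the use of cosine distance as the angular penalty are my assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def angle_based_reprojection_loss(pred_world, pixels, K, R, t):
    """pred_world: (N, 3) predicted scene coordinates in the world frame.
    pixels: (N, 2) pixel locations (u, v) at which they were predicted.
    K: (3, 3) intrinsics; R: (3, 3), t: (3,) world-to-camera pose.
    Penalizes the angle between the ray through each pixel and the ray from
    the camera center to the predicted 3D point."""
    ones = torch.ones(pixels.shape[0], 1, device=pixels.device)
    rays_obs = (torch.inverse(K) @ torch.cat([pixels, ones], dim=1).T).T  # K^-1 [u v 1]^T
    pts_cam = (R @ pred_world.T).T + t                                    # points in the camera frame
    cos = F.cosine_similarity(rays_obs, pts_cam, dim=1)
    return (1.0 - cos).mean()    # zero when every predicted point lies on its observed ray
```

Unlike a pixel-space reprojection error, this penalty is bounded (the cosine lies in [-1, 1]) even for points predicted behind the camera, which is in the spirit of why an angle-based loss can be trained without careful initialization.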

A 2.5D Cascaded Convolutional Neural Network with Temporal Information for Automatic Mitotic Cell Detection in 4D Microscopic Images

Title A 2.5D Cascaded Convolutional Neural Network with Temporal Information for Automatic Mitotic Cell Detection in 4D Microscopic Images
Authors Titinunt Kitrungrotsakul, Xian-Hau Han, Yutaro Iwamoto, Satoko Takemoto, Hideo Yokota, Sari Ipponjima, Tomomi Nemoto, Xiong Wei, Yen-Wei Chen
Abstract In recent years, intravital skin imaging has been increasingly used in mammalian skin research to investigate cell behaviors. A fundamental step of the investigation is mitotic cell (cell division) detection. Because of the complex backgrounds (normal cells), most existing methods produce many false positives. In this paper, we propose a 2.5D cascaded end-to-end convolutional neural network (CasDetNet) with temporal information to automatically and accurately detect mitotic cells in 4D microscopic images with little training data. The CasDetNet consists of two 2.5D networks. The first is used for detecting candidate cells with only volume information, and the second, containing temporal information, for reducing false positives and recovering mitotic cells that were missed in the first step. The experimental results show that our CasDetNet achieves higher precision and recall than other state-of-the-art methods.
Tasks
Published 2018-06-04
URL http://arxiv.org/abs/1806.01018v2
PDF http://arxiv.org/pdf/1806.01018v2.pdf
PWC https://paperswithcode.com/paper/a-25d-cascaded-convolutional-neural-network
Repo
Framework

A Continuous-Time View of Early Stopping for Least Squares

Title A Continuous-Time View of Early Stopping for Least Squares
Authors Alnur Ali, J. Zico Kolter, Ryan J. Tibshirani
Abstract We study the statistical properties of the iterates generated by gradient descent, applied to the fundamental problem of least squares regression. We take a continuous-time view, i.e., consider infinitesimal step sizes in gradient descent, in which case the iterates form a trajectory called gradient flow. Our primary focus is to compare the risk of gradient flow to that of ridge regression. Under the calibration $t=1/\lambda$—where $t$ is the time parameter in gradient flow, and $\lambda$ the tuning parameter in ridge regression—we prove that the risk of gradient flow is no more than 1.69 times that of ridge, along the entire path (for all $t \geq 0$). This holds in finite samples with very weak assumptions on the data model (in particular, with no assumptions on the features $X$). We prove that the same relative risk bound holds for prediction risk, in an average sense over the underlying signal $\beta_0$. Finally, we examine limiting risk expressions (under standard Marchenko-Pastur asymptotics), and give supporting numerical experiments.
Tasks Calibration
Published 2018-10-23
URL http://arxiv.org/abs/1810.10082v4
PDF http://arxiv.org/pdf/1810.10082v4.pdf
PWC https://paperswithcode.com/paper/a-continuous-time-view-of-early-stopping-for
Repo
Framework
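For readers who want the two estimators being compared written out, a standard formulation (under the usual 1/(2n) least-squares scaling and zero initialization; the paper's exact conventions may differ) is:

```latex
% Gradient flow on \tfrac{1}{2n}\|y - X\beta\|_2^2, started at the origin:
\dot\beta(t) = \tfrac{1}{n} X^\top\!\bigl(y - X\beta(t)\bigr), \quad \beta(0)=0
\;\;\Longrightarrow\;\;
\beta^{\mathrm{gf}}(t) = (X^\top X)^{+}\bigl(I - e^{-t X^\top X / n}\bigr) X^\top y .

% Ridge regression, matched to the path through the calibration t = 1/\lambda:
\hat\beta^{\mathrm{ridge}}(\lambda) = \bigl(X^\top X + n\lambda I\bigr)^{-1} X^\top y .
```

The paper's 1.69 constant bounds how much worse the risk of the gradient flow estimator can be than that of the calibrated ridge estimator, uniformly over $t \geq 0$.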

RPC Considered Harmful: Fast Distributed Deep Learning on RDMA

Title RPC Considered Harmful: Fast Distributed Deep Learning on RDMA
Authors Jilong Xue, Youshan Miao, Cheng Chen, Ming Wu, Lintao Zhang, Lidong Zhou
Abstract Deep learning emerges as an important new resource-intensive workload and has been successfully applied in computer vision, speech, natural language processing, and so on. Distributed deep learning is becoming a necessity to cope with growing data and model sizes. Its computation is typically characterized by a simple tensor data abstraction to model multi-dimensional matrices, a data-flow graph to model computation, and iterative executions with relatively frequent synchronizations, thereby making it substantially different from Map/Reduce style distributed big data computation. RPC, commonly used as the communication primitive, has been adopted by popular deep learning frameworks such as TensorFlow, which uses gRPC. We show that RPC is sub-optimal for distributed deep learning computation, especially on an RDMA-capable network. The tensor abstraction and data-flow graph, coupled with an RDMA network, offer the opportunity to reduce the unnecessary overhead (e.g., memory copy) without sacrificing programmability and generality. In particular, from a data access point of view, a remote machine is abstracted just as a “device” on an RDMA channel, with a simple memory interface for allocating, reading, and writing memory regions. Our graph analyzer looks at both the data flow graph and the tensors to optimize memory allocation and remote data access using this interface. The result is a speedup of up to 25 times in representative deep learning benchmarks against the standard gRPC in TensorFlow and up to 169% improvement even against an RPC implementation optimized for RDMA, leading to faster convergence in the training process.
Tasks
Published 2018-05-22
URL http://arxiv.org/abs/1805.08430v1
PDF http://arxiv.org/pdf/1805.08430v1.pdf
PWC https://paperswithcode.com/paper/rpc-considered-harmful-fast-distributed-deep
Repo
Framework

Combining Linear Non-Gaussian Acyclic Model with Logistic Regression Model for Estimating Causal Structure from Mixed Continuous and Discrete Data

Title Combining Linear Non-Gaussian Acyclic Model with Logistic Regression Model for Estimating Causal Structure from Mixed Continuous and Discrete Data
Authors Chao Li, Shohei Shimizu
Abstract Estimating causal models from observational data is a crucial task in data analysis. For continuous-valued data, Shimizu et al. have proposed a linear non-Gaussian acyclic model to understand the data generating process, and have shown that their model is identifiable when the sample size is sufficiently large. However, situations in which continuous and discrete variables coexist in the same problem are common in practice. Most existing causal discovery methods either ignore the discrete data and apply a continuous-valued algorithm, or discretize all the continuous data and then apply a discrete Bayesian network approach. These methods may lose important information when discrete data are ignored, or introduce approximation error due to discretization. In this paper, we define a novel hybrid causal model which consists of both continuous and discrete variables. The model assumes: (1) the value of a continuous variable is a linear function of its parent variables plus non-Gaussian noise, and (2) each discrete variable is a logistic variable whose distribution parameters depend on the values of its parent variables. In addition, we derive the BIC scoring function for model selection. The new discovery algorithm can learn causal structures from mixed continuous and discrete data without discretization. We empirically demonstrate the power of our method through thorough simulations.
Tasks Causal Discovery, Model Selection
Published 2018-02-16
URL http://arxiv.org/abs/1802.05889v1
PDF http://arxiv.org/pdf/1802.05889v1.pdf
PWC https://paperswithcode.com/paper/combining-linear-non-gaussian-acyclic-model
Repo
Framework
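In symbols (my notation, not necessarily the paper's), the two modeling assumptions and the scoring criterion read:

```latex
% (1) Continuous variable: linear in its parents with independent non-Gaussian noise
x_j \;=\; \sum_{i \in \mathrm{pa}(j)} b_{ji}\, x_i \;+\; e_j ,
\qquad e_j \ \text{non-Gaussian and independent of the parents;}

% (2) Binary discrete variable: logistic in its parents
P\bigl(x_j = 1 \mid x_{\mathrm{pa}(j)}\bigr)
  \;=\; \sigma\!\Bigl(\sum_{i \in \mathrm{pa}(j)} w_{ji}\, x_i + c_j\Bigr),
\qquad \sigma(z) = \frac{1}{1+e^{-z}} ;

% Candidate structures are compared with BIC: -2\log\hat{L} + k \log n
% (k free parameters, n samples).
```

Non-Gaussianity of the continuous noise is what makes the direction of edges identifiable, as in the original LiNGAM result cited above.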

Nonlinear Dimension Reduction via Outer Bi-Lipschitz Extensions

Title Nonlinear Dimension Reduction via Outer Bi-Lipschitz Extensions
Authors Sepideh Mahabadi, Konstantin Makarychev, Yury Makarychev, Ilya Razenshteyn
Abstract We introduce and study the notion of an outer bi-Lipschitz extension of a map between Euclidean spaces. The notion is a natural analogue of the notion of a Lipschitz extension of a Lipschitz map. We show that for every map $f$ there exists an outer bi-Lipschitz extension $f'$ whose distortion is greater than that of $f$ by at most a constant factor. This result can be seen as a counterpart of the classic Kirszbraun theorem for outer bi-Lipschitz extensions. We also study outer bi-Lipschitz extensions of near-isometric maps and show upper and lower bounds for them. Then, we present applications of our results to prioritized and terminal dimension reduction problems. * We prove a prioritized variant of the Johnson-Lindenstrauss lemma: given a set of points $X\subset \mathbb{R}^d$ of size $N$ and a permutation (“priority ranking”) of $X$, there exists an embedding $f$ of $X$ into $\mathbb{R}^{O(\log N)}$ with distortion $O(\log \log N)$ such that the point of rank $j$ has only $O(\log^{3 + \varepsilon} j)$ non-zero coordinates - more specifically, all but the first $O(\log^{3+\varepsilon} j)$ coordinates are equal to $0$; the distortion of $f$ restricted to the first $j$ points (according to the ranking) is at most $O(\log\log j)$. The result makes progress towards answering an open question by Elkin, Filtser, and Neiman about prioritized dimension reductions. * We prove that given a set $X$ of $N$ points in $\mathbb{R}^d$, there exists a terminal dimension reduction embedding of $\mathbb{R}^d$ into $\mathbb{R}^{d'}$, where $d' = O\left(\frac{\log N}{\varepsilon^4}\right)$, which preserves distances $\|x-y\|$ between points $x\in X$ and $y \in \mathbb{R}^{d}$, up to a multiplicative factor of $1 \pm \varepsilon$. This improves a recent result by Elkin, Filtser, and Neiman. The dimension reductions that we obtain are nonlinear, and this nonlinearity is necessary.
Tasks Dimensionality Reduction
Published 2018-11-08
URL http://arxiv.org/abs/1811.03591v1
PDF http://arxiv.org/pdf/1811.03591v1.pdf
PWC https://paperswithcode.com/paper/nonlinear-dimension-reduction-via-outer-bi
Repo
Framework

Learning of Optimal Forecast Aggregation in Partial Evidence Environments

Title Learning of Optimal Forecast Aggregation in Partial Evidence Environments
Authors Yakov Babichenko, Dan Garber
Abstract We consider the forecast aggregation problem in repeated settings, where the forecasts are done on a binary event. At each period multiple experts provide forecasts about an event. The goal of the aggregator is to aggregate those forecasts into a subjective accurate forecast. We assume that experts are Bayesian; namely they share a common prior, each expert is exposed to some evidence, and each expert applies Bayes rule to deduce his forecast. The aggregator is ignorant with respect to the information structure (i.e., distribution over evidence) according to which experts make their prediction. The aggregator observes the experts’ forecasts only. At the end of each period the actual state is realized. We focus on the question of whether the aggregator can learn to optimally aggregate the forecasts of the experts, where the optimal aggregation is the Bayesian aggregation that takes into account all the information (evidence) in the system. We consider the class of partial evidence information structures, where each expert is exposed to a different subset of conditionally independent signals. Our main results are positive; we show that optimal aggregation can be learned in polynomial time for a fairly wide range of instances of partial evidence environments. We provide a tight characterization of the instances where learning is possible and impossible.
Tasks
Published 2018-02-20
URL http://arxiv.org/abs/1802.07107v1
PDF http://arxiv.org/pdf/1802.07107v1.pdf
PWC https://paperswithcode.com/paper/learning-of-optimal-forecast-aggregation-in
Repo
Framework
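To make "optimal (Bayesian) aggregation" concrete in the simplest partial-evidence case, suppose each expert sees a disjoint set of signals that are conditionally independent given the event, all experts share the prior $p_0$, and expert $i$ reports the posterior $p_i$. Then the full-information posterior $p^*$ satisfies the log-odds identity below; this is the textbook special case, stated under my simplifying disjointness assumption, whereas the paper's setting allows overlapping evidence and an aggregator that knows neither the prior nor the information structure.

```latex
\log\frac{p^{*}}{1-p^{*}}
  \;=\; \sum_{i=1}^{n} \log\frac{p_i}{1-p_i}
  \;-\; (n-1)\,\log\frac{p_0}{1-p_0}
```

Learning to aggregate optimally can thus be viewed as learning how much of each expert's reported log-odds is fresh evidence rather than shared prior or overlapping signals.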

Rank Selection of CP-decomposed Convolutional Layers with Variational Bayesian Matrix Factorization

Title Rank Selection of CP-decomposed Convolutional Layers with Variational Bayesian Matrix Factorization
Authors Marcella Astrid, Seung-Ik Lee, Beom-Su Seo
Abstract Convolutional Neural Networks (CNNs) are successful in many areas, such as image classification. However, the memory and computational cost of CNN inference prevents them from running efficiently on mobile devices, which have limited memory and computational ability. One way to compress CNNs is to compress the layers iteratively, i.e., layer-by-layer compression and fine-tuning, using CP-decomposition of the convolutional layers. To compress with CP-decomposition, rank selection is important. In a previous approach, where rank selection was based on the sensitivity of each layer, the average rank of the network was still selected arbitrarily. Additionally, the ranks of all layers were decided before the whole iterative compression process, even though the appropriate rank of a layer can change after fine-tuning. Therefore, this paper proposes selecting the rank of each layer using Variational Bayesian Matrix Factorization (VBMF), which is more systematic than an arbitrary choice. Furthermore, to account for the change in each layer’s rank after fine-tuning, the method is applied just before compressing the target layer, i.e., after the fine-tuning of the previous iteration. The results show better accuracy as well as a higher compression rate when compressing AlexNet’s convolutional layers.
Tasks Image Classification
Published 2018-01-16
URL http://arxiv.org/abs/1801.05243v1
PDF http://arxiv.org/pdf/1801.05243v1.pdf
PWC https://paperswithcode.com/paper/rank-selection-of-cp-decomposed-convolutional
Repo
Framework
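For context, the layer-wise factorization referred to above writes a convolution kernel $W \in \mathbb{R}^{T \times S \times K \times K}$ (output channels, input channels, spatial height and width) as a rank-$R$ CP decomposition; the per-layer rank $R$ is exactly what VBMF is used to choose. The index convention below is a common one, not necessarily the paper's:

```latex
W_{t,s,i,j} \;\approx\; \sum_{r=1}^{R} a_{t r}\, b_{s r}\, c_{i r}\, d_{j r}
```

With this factorization, the original $K \times K$ convolution can be evaluated as a sequence of cheaper ones: a 1x1 convolution from $S$ to $R$ channels (factor $b$), separable $K \times 1$ and $1 \times K$ convolutions within the $R$ components (factors $c$ and $d$), and a final 1x1 convolution from $R$ to $T$ channels (factor $a$). A smaller $R$ gives more compression but a larger approximation error, which is why the rank choice matters.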

Visual Data Augmentation through Learning

Title Visual Data Augmentation through Learning
Authors Grigorios G. Chrysos, Yannis Panagakis, Stefanos Zafeiriou
Abstract The rapid progress in machine learning methods has been empowered by i) huge datasets that have been collected and annotated, ii) improved engineering (e.g. data pre-processing/normalization). The existing datasets typically include several million samples, which makes extending them a colossal task. In addition, the state-of-the-art data-driven methods demand a vast amount of data, hence a standard engineering trick is artificial data augmentation, for instance by adding cropped and (affinely) transformed images to the data. However, this approach does not correspond to any change in the natural 3D scene. We propose instead to perform data augmentation through learning realistic local transformations. We learn a forward and an inverse transformation that maps an image from the high-dimensional space of pixel intensities to a latent space which varies (approximately) linearly with the latent space of a realistically transformed version of the image. Such transformed images can be considered two successive frames in a video. Next, we utilize these transformations to learn a linear model that modifies the latent spaces and then use the inverse transformation to synthesize a new image. We argue that this procedure produces powerful invariant representations. We perform both qualitative and quantitative experiments that demonstrate that our proposed method creates new realistic images.
Tasks Data Augmentation
Published 2018-01-20
URL http://arxiv.org/abs/1801.06665v1
PDF http://arxiv.org/pdf/1801.06665v1.pdf
PWC https://paperswithcode.com/paper/visual-data-augmentation-through-learning
Repo
Framework
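Reduced to pseudocode, the synthesis step described above (encode, apply a learned linear transformation in latent space, decode) might look like the sketch below; the module interfaces and the single shared linear map are illustrative assumptions rather than the authors' architecture.

```python
import torch
import torch.nn as nn

class LatentAugmenter(nn.Module):
    """Augmentation by learning: an encoder maps an image to a latent code,
    a learned linear map models a realistic local transformation in that
    latent space, and a decoder maps the transformed code back to pixels."""
    def __init__(self, encoder: nn.Module, decoder: nn.Module, latent_dim: int):
        super().__init__()
        self.encoder = encoder                                # image -> latent code z
        self.decoder = decoder                                # latent code z -> image
        self.transform = nn.Linear(latent_dim, latent_dim)    # linear model of the transformation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)            # encode the original image
        z_new = self.transform(z)      # move it along the learned transformation
        return self.decoder(z_new)     # decode a synthesized "next frame"
```

Training would fit the encoder, decoder, and linear map on pairs of successive video frames so that latent codes vary approximately linearly, after which calling the module on an image yields additional realistic training samples.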

Generalization Bounds for Unsupervised Cross-Domain Mapping with WGANs

Title Generalization Bounds for Unsupervised Cross-Domain Mapping with WGANs
Authors Tomer Galanti, Sagie Benaim, Lior Wolf
Abstract The recent empirical success of unsupervised cross-domain mapping algorithms, between two domains that share common characteristics, is not well-supported by theoretical justifications. This lacuna is especially troubling, given the clear ambiguity in such mappings. We work with the adversarial training method called the Wasserstein GAN and derive a novel generalization bound, which limits the risk between the learned mapping $h$ and the target mapping $y$, by a sum of two terms: (i) the risk between $h$ and the most distant alternative mapping that was learned by the same cross-domain mapping algorithm, and (ii) the minimal Wasserstein GAN divergence between the target domain and the domain obtained by applying a hypothesis $h^*$ on the samples of the source domain, where $h^*$ is a hypothesis selected by the same algorithm. The bound is directly related to Occam’s razor and encourages the selection of the minimal architecture that supports a small Wasserstein GAN divergence. The bound leads to multiple algorithmic consequences, including a method for hyperparameter selection and for early stopping in cross-domain mapping GANs. We also demonstrate a novel capability for unsupervised estimation of confidence in the mapping of every specific sample. Lastly, we show how non-minimal architectures can be effectively trained by an inverted knowledge distillation, in which a minimal architecture is used to train a larger one, leading to higher quality outputs.
Tasks
Published 2018-07-23
URL https://arxiv.org/abs/1807.08501v3
PDF https://arxiv.org/pdf/1807.08501v3.pdf
PWC https://paperswithcode.com/paper/generalization-bounds-for-unsupervised-cross
Repo
Framework

Efficient Design of Hardware-Enabled Reservoir Computing in FPGAs

Title Efficient Design of Hardware-Enabled Reservoir Computing in FPGAs
Authors Bogdan Penkovsky, Laurent Larger, Daniel Brunner
Abstract In this work, we propose a new approach towards the efficient optimization and implementation of reservoir computing hardware that reduces the required domain expert knowledge and optimization effort. First, we adapt the reservoir input mask to the structure of the data via linear autoencoders. We thereby incorporate the advantages of dimensionality reduction and dimensionality expansion achieved by the conventional, algorithmically efficient linear-algebra procedures of principal component analysis. Second, we employ evolutionary-inspired genetic algorithm techniques, resulting in a highly efficient optimization of the reservoir dynamics with a dramatically reduced number of evaluations compared to exhaustive search. We illustrate the method on the so-called single-node reservoir computing architecture, which is especially suitable for implementation in ultrahigh-speed hardware. The combination of both methods, and the resulting reduction in the time required for performance optimization of a hardware system, establish a strategy towards machine learning hardware capable of self-adaptation to optimally solve specific problems. We confirm the validity of these principles by building reservoir computing hardware based on a field-programmable gate array.
Tasks Dimensionality Reduction
Published 2018-05-04
URL http://arxiv.org/abs/1805.03033v2
PDF http://arxiv.org/pdf/1805.03033v2.pdf
PWC https://paperswithcode.com/paper/efficient-design-of-hardware-enabled
Repo
Framework
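The system above is an optoelectronic, delay-based single-node reservoir, which cannot be reproduced in a few lines; as a rough software analogue of the overall pipeline (a PCA-derived linear input mask, a fixed nonlinear reservoir, and a trained linear readout), the sketch below may still be useful. The leaky-tanh reservoir, all hyperparameters, and the toy task are generic echo-state choices of mine, and the genetic-algorithm tuning stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_input_mask(U, n_components):
    """Linear-autoencoder-style input mask: project inputs onto their top
    principal directions (data-adapted dimensionality reduction/expansion)."""
    Uc = U - U.mean(axis=0)
    _, _, Vt = np.linalg.svd(Uc, full_matrices=False)
    return Vt[:n_components].T                      # (n_features, n_components)

def run_reservoir(U, W_in, W_res, leak=0.3):
    """Generic leaky-tanh reservoir, a software stand-in for the hardware node."""
    x = np.zeros(W_res.shape[0])
    states = []
    for u in U:
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W_res @ x)
        states.append(x.copy())
    return np.array(states)

# Toy task: recover a delayed copy of the first input channel.
T, n_in, n_res = 500, 8, 200
U = rng.standard_normal((T, n_in))
y = np.roll(U[:, 0], 1)

mask = pca_input_mask(U, n_components=4)            # data-adapted input mask
W_in = 0.5 * rng.standard_normal((n_res, 4))
W_res = rng.standard_normal((n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))   # set spectral radius < 1

X = run_reservoir(U @ mask, W_in, W_res)
w_out = np.linalg.lstsq(X, y, rcond=None)[0]        # only the readout is trained
print("train MSE:", float(np.mean((X @ w_out - y) ** 2)))
```

In the paper, the counterpart of the hand-set leak rate and spectral radius would be tuned by the genetic algorithm, and the reservoir itself is realized on the FPGA rather than simulated.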