Paper Group ANR 524
Social Choice Random Utility Models of Intransitive Pairwise Comparisons. Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle. Semantic Stereo for Incidental Satellite Images. When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets. Scene Coordinate Regr …
Social Choice Random Utility Models of Intransitive Pairwise Comparisons
Title | Social Choice Random Utility Models of Intransitive Pairwise Comparisons |
Authors | Rahul Makhijani, Johan Ugander |
Abstract | There is a growing need for discrete choice models that account for the complex nature of human choices, escaping traditional behavioral assumptions such as the transitivity of pairwise preferences. Recently, several parametric models of intransitive comparisons have been proposed, but in all cases the maximum likelihood problem is non-concave, making inference difficult. In this work we generalize this trend, showing that there cannot exist any parametric model with a concave log-likelihood function that can exhibit intransitive preferences. Given this result, we motivate a new model for analyzing intransitivity in pairwise comparisons, taking inspiration from the Condorcet method (majority vote) in social choice theory. The Majority Vote model we analyze is defined as a voting process over independent Random Utility Models (RUMs). We infer a multidimensional embedding of each object or player, in contrast to the traditional one-dimensional embedding used by models such as the Thurstone or Bradley-Terry-Luce (BTL) models. We show that a three-dimensional majority vote model is capable of modeling arbitrarily strong and long intransitive cycles, and can also represent arbitrary pairwise comparison probabilities on any triplet. We provide experimental results that substantiate our claims regarding the effectiveness of our model in capturing intransitivity for various pairwise choice tasks such as predicting choices in recommendation systems, winners in online video games, and elections. |
Tasks | Recommendation Systems |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.02518v3 |
http://arxiv.org/pdf/1810.02518v3.pdf | |
PWC | https://paperswithcode.com/paper/social-choice-random-utility-models-of |
Repo | |
Framework | |
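A minimal sketch of the kind of majority-vote model the abstract describes, assuming a Thurstonian (Gaussian-noise) RUM in each of three dimensions and a Monte Carlo estimate of the pairwise win probability; the embeddings, noise scale, and estimator are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mv_win_prob(x_i, x_j, n_samples=100_000, noise_scale=1.0, seed=0):
    """Estimate P(i beats j) under a d-dimensional majority-vote RUM:
    each dimension is an independent Thurstone comparison, and i wins
    the pairwise comparison if it wins a strict majority of dimensions."""
    rng = np.random.default_rng(seed)
    d = len(x_i)
    # Per-dimension utilities with independent Gaussian noise.
    u_i = x_i + noise_scale * rng.standard_normal((n_samples, d))
    u_j = x_j + noise_scale * rng.standard_normal((n_samples, d))
    dim_wins = (u_i > u_j).sum(axis=1)      # dimensions won by i in each vote
    return np.mean(dim_wins > d / 2)        # i wins on a strict majority

# Three players whose 3D embeddings are cyclic shifts of each other,
# giving an intransitive cycle A > B > C > A.
A = np.array([2.0, 0.0, 1.0])
B = np.array([1.0, 2.0, 0.0])
C = np.array([0.0, 1.0, 2.0])
print(mv_win_prob(A, B), mv_win_prob(B, C), mv_win_prob(C, A))  # each well above 0.5
```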
Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle
Title | Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle |
Authors | Rana Ali Amjad, Bernhard C. Geiger |
Abstract | In this theory paper, we investigate training deep neural networks (DNNs) for classification via minimizing the information bottleneck (IB) functional. We show that the resulting optimization problem suffers from two severe issues: First, for deterministic DNNs, either the IB functional is infinite for almost all values of network parameters, making the optimization problem ill-posed, or it is piecewise constant, hence not admitting gradient-based optimization methods. Second, the invariance of the IB functional under bijections prevents it from capturing properties of the learned representation that are desirable for classification, such as robustness and simplicity. We argue that these issues are partly resolved for stochastic DNNs, DNNs that include a (hard or soft) decision rule, or by replacing the IB functional with related, but better-behaved cost functions. We conclude that recent successes reported about training DNNs using the IB framework must be attributed to such solutions. As a side effect, our results indicate limitations of the IB framework for the analysis of DNNs. We also note that rather than trying to repair the inherent problems in the IB functional, a better approach may be to design regularizers on the latent representation that enforce the desired properties directly. |
Tasks | |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.09766v6 |
http://arxiv.org/pdf/1802.09766v6.pdf | |
PWC | https://paperswithcode.com/paper/learning-representations-for-neural-network |
Repo | |
Framework | |
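The remedy the paper points to, replacing the raw IB functional with a better-behaved cost on a stochastic representation, is in the spirit of variational surrogates such as the deep variational IB. The sketch below is a generic surrogate of that kind, assuming a Gaussian stochastic encoder; the layer sizes, the beta weight, and the specific bound are illustrative assumptions, not this paper's proposal.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticEncoderClassifier(nn.Module):
    """Classifier with a stochastic (Gaussian) bottleneck T ~ q(t|x)."""
    def __init__(self, in_dim=784, bottleneck=32, n_classes=10):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, bottleneck)
        self.logvar = nn.Linear(256, bottleneck)
        self.head = nn.Linear(bottleneck, n_classes)

    def forward(self, x):
        h = self.body(x)
        mu, logvar = self.mu(h), self.logvar(h)
        t = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterized sample
        return self.head(t), mu, logvar

def ib_surrogate_loss(logits, y, mu, logvar, beta=1e-3):
    # Cross-entropy stands in for the -I(T;Y) term; the KL to a standard-normal
    # variational marginal upper-bounds the I(X;T) compression term.
    ce = F.cross_entropy(logits, y)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum(dim=1).mean()
    return ce + beta * kl
```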
Semantic Stereo for Incidental Satellite Images
Title | Semantic Stereo for Incidental Satellite Images |
Authors | Marc Bosch, Kevin Foster, Gordon Christie, Sean Wang, Gregory D Hager, Myron Brown |
Abstract | The increasingly common use of incidental satellite images for stereo reconstruction versus rigidly tasked binocular or trinocular coincident collection is helping to enable timely global-scale 3D mapping; however, reliable stereo correspondence from multi-date image pairs remains very challenging due to seasonal appearance differences and scene change. Promising recent work suggests that semantic scene segmentation can provide a robust regularizing prior for resolving ambiguities in stereo correspondence and reconstruction problems. To enable research for pairwise semantic stereo and multi-view semantic 3D reconstruction with incidental satellite images, we have established a large-scale public dataset including multi-view, multi-band satellite images and ground truth geometric and semantic labels for two large cities. To demonstrate the complementary nature of the stereo and segmentation tasks, we present lightweight public baselines adapted from recent state-of-the-art convolutional neural network models and assess their performance. |
Tasks | 3D Reconstruction, Scene Segmentation |
Published | 2018-11-21 |
URL | http://arxiv.org/abs/1811.08739v1 |
http://arxiv.org/pdf/1811.08739v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-stereo-for-incidental-satellite |
Repo | |
Framework | |
When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets
Title | When Semi-Supervised Learning Meets Transfer Learning: Training Strategies, Models and Datasets |
Authors | Hong-Yu Zhou, Avital Oliver, Jianxin Wu, Yefeng Zheng |
Abstract | Semi-Supervised Learning (SSL) has proven to be an effective way to leverage both labeled and unlabeled data at the same time. Recent semi-supervised approaches focus on deep neural networks and have achieved promising results on several benchmarks: CIFAR10, CIFAR100 and SVHN. However, most of their experiments are based on models trained from scratch instead of pre-trained models. On the other hand, transfer learning has demonstrated its value when the target domain has limited labeled data. This raises an intuitive question: is it possible to incorporate SSL when fine-tuning a pre-trained model? We comprehensively study how SSL methods starting from pre-trained models perform under varying conditions, including training strategies, architecture choice and datasets. From this study, we obtain several interesting and useful observations. While practitioners have had an intuitive understanding of these observations, we provide a comprehensive empirical analysis and demonstrate that: (1) the gains from SSL techniques over a fully-supervised baseline are smaller when trained from a pre-trained model than when trained from random initialization, (2) when the domain of the source data used to train the pre-trained model differs significantly from the domain of the target task, the gains from SSL are significantly higher and (3) some SSL methods (such as Pseudo-Label) are able to improve on fully-supervised baselines. We hope our studies can deepen the understanding of SSL research and facilitate the process of developing more effective SSL methods to utilize pre-trained models. Code is now available at github. |
Tasks | Transfer Learning |
Published | 2018-12-13 |
URL | http://arxiv.org/abs/1812.05313v1 |
http://arxiv.org/pdf/1812.05313v1.pdf | |
PWC | https://paperswithcode.com/paper/when-semi-supervised-learning-meets-transfer |
Repo | |
Framework | |
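One of the settings the study covers, fine-tuning a pre-trained network while also exploiting unlabeled data via Pseudo-Label, can be outlined as below. This is a generic sketch under assumed names (loaders, confidence threshold, unlabeled-loss weight), not the authors' released code.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def finetune_with_pseudo_labels(labeled_loader, unlabeled_loader, n_classes,
                                epochs=10, conf_threshold=0.95, unlabeled_weight=1.0,
                                device="cuda"):
    # Start from an ImageNet-pre-trained backbone instead of random initialization.
    model = models.resnet50(pretrained=True)
    model.fc = torch.nn.Linear(model.fc.in_features, n_classes)
    model = model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-4)

    for _ in range(epochs):
        for (x_l, y_l), (x_u, _) in zip(labeled_loader, unlabeled_loader):
            x_l, y_l, x_u = x_l.to(device), y_l.to(device), x_u.to(device)
            sup_loss = F.cross_entropy(model(x_l), y_l)

            # Pseudo-Label: treat confident predictions on unlabeled data as targets.
            with torch.no_grad():
                probs = F.softmax(model(x_u), dim=1)
                conf, pseudo_y = probs.max(dim=1)
                mask = (conf >= conf_threshold).float()
            unsup_loss = (F.cross_entropy(model(x_u), pseudo_y, reduction="none") * mask).mean()

            loss = sup_loss + unlabeled_weight * unsup_loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```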
Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization
Title | Scene Coordinate Regression with Angle-Based Reprojection Loss for Camera Relocalization |
Authors | Xiaotian Li, Juha Ylioinas, Jakob Verbeek, Juho Kannala |
Abstract | Image-based camera relocalization is an important problem in computer vision and robotics. Recent works utilize convolutional neural networks (CNNs) to regress for pixels in a query image their corresponding 3D world coordinates in the scene. The final pose is then solved via a RANSAC-based optimization scheme using the predicted coordinates. Usually, the CNN is trained with ground truth scene coordinates, but it has also been shown that the network can discover 3D scene geometry automatically by minimizing single-view reprojection loss. However, due to the deficiencies of the reprojection loss, the network needs to be carefully initialized. In this paper, we present a new angle-based reprojection loss, which resolves the issues of the original reprojection loss. With this new loss function, the network can be trained without careful initialization, and the system achieves more accurate results. The new loss also enables us to utilize available multi-view constraints, which further improve performance. |
Tasks | Camera Relocalization |
Published | 2018-08-15 |
URL | http://arxiv.org/abs/1808.04999v2 |
http://arxiv.org/pdf/1808.04999v2.pdf | |
PWC | https://paperswithcode.com/paper/scene-coordinate-regression-with-angle-based |
Repo | |
Framework | |
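The idea of an angle-based reprojection loss can be sketched as follows: instead of penalizing the pixel distance between the reprojected scene coordinate and the observed pixel, penalize the angle between the camera ray through the pixel and the ray towards the predicted scene coordinate in the camera frame. The exact formulation in the paper may differ; the code below is an illustrative version with assumed conventions (pinhole intrinsics K, world-to-camera pose R, t).

```python
import torch

def angle_reprojection_loss(pred_world, pixels, K, R, t, eps=1e-8):
    """pred_world: (N,3) predicted scene coordinates in the world frame,
    pixels: (N,2) observed pixel locations, K: (3,3) intrinsics,
    R: (3,3), t: (3,) world-to-camera rotation and translation."""
    # Ray towards the predicted scene coordinate, expressed in the camera frame.
    pred_cam = pred_world @ R.T + t
    pred_dir = pred_cam / (pred_cam.norm(dim=1, keepdim=True) + eps)

    # Ray through the observed pixel (back-projected with the intrinsics).
    ones = torch.ones(pixels.shape[0], 1, dtype=pixels.dtype, device=pixels.device)
    pix_h = torch.cat([pixels, ones], dim=1)          # homogeneous pixel coordinates
    obs_dir = pix_h @ torch.inverse(K).T
    obs_dir = obs_dir / (obs_dir.norm(dim=1, keepdim=True) + eps)

    # Penalize the angle between the two rays; this stays well-defined even when a
    # predicted point falls behind the camera, unlike the pixel reprojection error.
    cos = (pred_dir * obs_dir).sum(dim=1).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(cos).mean()
```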
A 2.5D Cascaded Convolutional Neural Network with Temporal Information for Automatic Mitotic Cell Detection in 4D Microscopic Images
Title | A 2.5D Cascaded Convolutional Neural Network with Temporal Information for Automatic Mitotic Cell Detection in 4D Microscopic Images |
Authors | Titinunt Kitrungrotsakul, Xian-Hau Han, Yutaro Iwamoto, Satoko Takemoto, Hideo Yokota, Sari Ipponjima, Tomomi Nemoto, Xiong Wei, Yen-Wei Chen |
Abstract | In recent years, intravital skin imaging has been increasingly used in mammalian skin research to investigate cell behaviors. A fundamental step of the investigation is mitotic cell (cell division) detection. Because of the complex backgrounds (normal cells), the majority of existing methods produce many false positives. In this paper, we propose a 2.5D cascaded end-to-end convolutional neural network (CasDetNet) with temporal information to automatically and accurately detect mitotic cells in 4D microscopic images with few training data. The CasDetNet consists of two 2.5D networks. The first is used for detecting candidate cells with only volume information; the second, containing temporal information, reduces false positives and recovers mitotic cells that were missed in the first step. The experimental results show that our CasDetNet can achieve higher precision and recall compared to other state-of-the-art methods. |
Tasks | |
Published | 2018-06-04 |
URL | http://arxiv.org/abs/1806.01018v2 |
http://arxiv.org/pdf/1806.01018v2.pdf | |
PWC | https://paperswithcode.com/paper/a-25d-cascaded-convolutional-neural-network |
Repo | |
Framework | |
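A compact way to read the cascade described above: a first 2.5D network proposes candidate mitotic cells from a single volume, and a second network re-scores each candidate using the same location at neighboring time points. The outline below is a structural sketch under assumed shapes and layer choices, not the published CasDetNet architecture.

```python
import torch
import torch.nn as nn

class Slab2p5DNet(nn.Module):
    """2.5D network: a small stack of adjacent z-slices becomes the channels of a 2D CNN."""
    def __init__(self, in_slices, n_out=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_slices, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_out))

    def forward(self, x):            # x: (B, in_slices, H, W)
        return self.net(x)

# Stage 1: candidate detection from volume information only (z-slab channels).
stage1 = Slab2p5DNet(in_slices=5)
# Stage 2: re-scoring with temporal context (the same slab at t-1, t, t+1).
stage2 = Slab2p5DNet(in_slices=5 * 3)

def cascade_score(slab_t, slab_prev, slab_next):
    # slab_*: (B, 5, H, W) candidate patches at consecutive time points
    keep = (torch.sigmoid(stage1(slab_t)) >= 0.5).float()   # stage-1 candidate filter
    score = torch.sigmoid(stage2(torch.cat([slab_prev, slab_t, slab_next], dim=1)))
    return score * keep                                     # rejected candidates score 0
```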
A Continuous-Time View of Early Stopping for Least Squares
Title | A Continuous-Time View of Early Stopping for Least Squares |
Authors | Alnur Ali, J. Zico Kolter, Ryan J. Tibshirani |
Abstract | We study the statistical properties of the iterates generated by gradient descent, applied to the fundamental problem of least squares regression. We take a continuous-time view, i.e., consider infinitesimal step sizes in gradient descent, in which case the iterates form a trajectory called gradient flow. Our primary focus is to compare the risk of gradient flow to that of ridge regression. Under the calibration $t=1/\lambda$—where $t$ is the time parameter in gradient flow, and $\lambda$ the tuning parameter in ridge regression—we prove that the risk of gradient flow is no more than 1.69 times that of ridge, along the entire path (for all $t \geq 0$). This holds in finite samples with very weak assumptions on the data model (in particular, with no assumptions on the features $X$). We prove that the same relative risk bound holds for prediction risk, in an average sense over the underlying signal $\beta_0$. Finally, we examine limiting risk expressions (under standard Marchenko-Pastur asymptotics), and give supporting numerical experiments. |
Tasks | Calibration |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.10082v4 |
http://arxiv.org/pdf/1810.10082v4.pdf | |
PWC | https://paperswithcode.com/paper/a-continuous-time-view-of-early-stopping-for |
Repo | |
Framework | |
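The comparison in the abstract can be reproduced numerically: gradient flow on the least squares loss has a closed-form path, and under the calibration $t = 1/\lambda$ its estimation error tracks that of ridge closely. The simulation below is a toy illustration with assumed data dimensions and noise, not a recreation of the paper's experiments (which analyze risk in expectation rather than on a single draw).

```python
import numpy as np
from scipy.linalg import expm, pinv

rng = np.random.default_rng(0)
n, d = 200, 30
X = rng.standard_normal((n, d))
beta0 = rng.standard_normal(d)
y = X @ beta0 + rng.standard_normal(n)

S = X.T @ X / n      # the gradient of the loss (1/2n)||y - X beta||^2 is S beta - b
b = X.T @ y / n

def gradient_flow(t):
    # Closed-form gradient flow path started at beta(0) = 0.
    return (np.eye(d) - expm(-t * S)) @ pinv(S) @ b

def ridge(lam):
    return np.linalg.solve(S + lam * np.eye(d), b)

for t in [0.5, 1.0, 5.0, 25.0]:
    gf, rd = gradient_flow(t), ridge(1.0 / t)     # the calibration t = 1/lambda
    print(f"t={t:5.1f}  estimation error: gradient flow "
          f"{np.sum((gf - beta0)**2):.3f}  vs ridge {np.sum((rd - beta0)**2):.3f}")
```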
RPC Considered Harmful: Fast Distributed Deep Learning on RDMA
Title | RPC Considered Harmful: Fast Distributed Deep Learning on RDMA |
Authors | Jilong Xue, Youshan Miao, Cheng Chen, Ming Wu, Lintao Zhang, Lidong Zhou |
Abstract | Deep learning emerges as an important new resource-intensive workload and has been successfully applied in computer vision, speech, natural language processing, and so on. Distributed deep learning is becoming a necessity to cope with growing data and model sizes. Its computation is typically characterized by a simple tensor data abstraction to model multi-dimensional matrices, a data-flow graph to model computation, and iterative executions with relatively frequent synchronizations, thereby making it substantially different from Map/Reduce style distributed big data computation. RPC, commonly used as the communication primitive, has been adopted by popular deep learning frameworks such as TensorFlow, which uses gRPC. We show that RPC is sub-optimal for distributed deep learning computation, especially on an RDMA-capable network. The tensor abstraction and data-flow graph, coupled with an RDMA network, offer the opportunity to reduce the unnecessary overhead (e.g., memory copy) without sacrificing programmability and generality. In particular, from a data access point of view, a remote machine is abstracted just as a “device” on an RDMA channel, with a simple memory interface for allocating, reading, and writing memory regions. Our graph analyzer looks at both the data flow graph and the tensors to optimize memory allocation and remote data access using this interface. The result is up to a 25x speedup in representative deep learning benchmarks against the standard gRPC in TensorFlow and up to a 169% improvement even against an RPC implementation optimized for RDMA, leading to faster convergence in the training process. |
Tasks | |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08430v1 |
http://arxiv.org/pdf/1805.08430v1.pdf | |
PWC | https://paperswithcode.com/paper/rpc-considered-harmful-fast-distributed-deep |
Repo | |
Framework | |
Combining Linear Non-Gaussian Acyclic Model with Logistic Regression Model for Estimating Causal Structure from Mixed Continuous and Discrete Data
Title | Combining Linear Non-Gaussian Acyclic Model with Logistic Regression Model for Estimating Causal Structure from Mixed Continuous and Discrete Data |
Authors | Chao Li, Shohei Shimizu |
Abstract | Estimating causal models from observational data is a crucial task in data analysis. For continuous-valued data, Shimizu et al. have proposed a linear non-Gaussian acyclic model to understand the data generating process, and have shown that their model is identifiable when the sample size is sufficiently large. However, situations in which continuous and discrete variables coexist in the same problem are common in practice. Most existing causal discovery methods either ignore the discrete data and apply a continuous-valued algorithm, or discretize all the continuous data and then apply a discrete Bayesian network approach. These methods may lose important information by ignoring the discrete data, or introduce approximation error due to discretization. In this paper, we define a novel hybrid causal model which consists of both continuous and discrete variables. The model assumes: (1) the value of a continuous variable is a linear function of its parent variables plus a non-Gaussian noise, and (2) each discrete variable is a logistic variable whose distribution parameters depend on the values of its parent variables. In addition, we derive the BIC scoring function for model selection. The new discovery algorithm can learn causal structures from mixed continuous and discrete data without discretization. We empirically demonstrate the power of our method through thorough simulations. |
Tasks | Causal Discovery, Model Selection |
Published | 2018-02-16 |
URL | http://arxiv.org/abs/1802.05889v1 |
http://arxiv.org/pdf/1802.05889v1.pdf | |
PWC | https://paperswithcode.com/paper/combining-linear-non-gaussian-acyclic-model |
Repo | |
Framework | |
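The generative model defined in the abstract can be written down directly: continuous variables are linear in their parents plus non-Gaussian noise, and discrete variables are logistic with parameters depending on their parents. The sketch below generates data from a small assumed structure (x1 → x2 → y); the graph, coefficients, and noise distribution are illustrative assumptions, and the BIC-based structure search is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Continuous root: any non-Gaussian noise works (uniform here).
x1 = rng.uniform(-1, 1, n)

# Continuous child: linear in its parent plus non-Gaussian (Laplace) noise.
x2 = 1.5 * x1 + rng.laplace(scale=0.5, size=n)

# Discrete (binary) child: logistic whose parameter depends on its parents.
logits = 0.8 * x1 - 1.2 * x2 + 0.3
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

data = np.column_stack([x1, x2, y])

# A BIC score for a candidate structure would add the log-likelihood of the
# continuous equations (under the assumed non-Gaussian noise) to the logistic
# log-likelihood of the discrete ones, minus (number of parameters) * log(n) / 2.
```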
Nonlinear Dimension Reduction via Outer Bi-Lipschitz Extensions
Title | Nonlinear Dimension Reduction via Outer Bi-Lipschitz Extensions |
Authors | Sepideh Mahabadi, Konstantin Makarychev, Yury Makarychev, Ilya Razenshteyn |
Abstract | We introduce and study the notion of an outer bi-Lipschitz extension of a map between Euclidean spaces. The notion is a natural analogue of the notion of a Lipschitz extension of a Lipschitz map. We show that for every map $f$ there exists an outer bi-Lipschitz extension $f'$ whose distortion is greater than that of $f$ by at most a constant factor. This result can be seen as a counterpart of the classic Kirszbraun theorem for outer bi-Lipschitz extensions. We also study outer bi-Lipschitz extensions of near-isometric maps and show upper and lower bounds for them. Then, we present applications of our results to prioritized and terminal dimension reduction problems. * We prove a prioritized variant of the Johnson-Lindenstrauss lemma: given a set of points $X\subset \mathbb{R}^d$ of size $N$ and a permutation (“priority ranking”) of $X$, there exists an embedding $f$ of $X$ into $\mathbb{R}^{O(\log N)}$ with distortion $O(\log \log N)$ such that the point of rank $j$ has only $O(\log^{3 + \varepsilon} j)$ non-zero coordinates - more specifically, all but the first $O(\log^{3+\varepsilon} j)$ coordinates are equal to $0$; the distortion of $f$ restricted to the first $j$ points (according to the ranking) is at most $O(\log\log j)$. The result makes progress towards answering an open question by Elkin, Filtser, and Neiman about prioritized dimension reductions. * We prove that given a set $X$ of $N$ points in $\mathbb{R}^d$, there exists a terminal dimension reduction embedding of $\mathbb{R}^d$ into $\mathbb{R}^{d'}$, where $d' = O\left(\frac{\log N}{\varepsilon^4}\right)$, which preserves distances $\|x-y\|$ between points $x\in X$ and $y \in \mathbb{R}^{d}$, up to a multiplicative factor of $1 \pm \varepsilon$. This improves a recent result by Elkin, Filtser, and Neiman. The dimension reductions that we obtain are nonlinear, and this nonlinearity is necessary. |
Tasks | Dimensionality Reduction |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03591v1 |
http://arxiv.org/pdf/1811.03591v1.pdf | |
PWC | https://paperswithcode.com/paper/nonlinear-dimension-reduction-via-outer-bi |
Repo | |
Framework | |
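For context, the classical Johnson-Lindenstrauss transform that the prioritized and terminal variants refine can be checked numerically in a few lines. The sketch below uses a Gaussian random projection to roughly $O(\log N / \varepsilon^2)$ dimensions and measures the worst pairwise distortion; it illustrates only the baseline guarantee, not the nonlinear prioritized or terminal embeddings constructed in the paper.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
N, d, eps = 200, 1000, 0.25
X = rng.standard_normal((N, d))

k = int(np.ceil(8 * np.log(N) / eps**2))       # target dimension ~ O(log N / eps^2)
P = rng.standard_normal((d, k)) / np.sqrt(k)   # Gaussian JL projection
Y = X @ P

ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(N), 2)]
print(f"pairwise distance ratios stay within [{min(ratios):.3f}, {max(ratios):.3f}]")  # ~ 1 +/- eps
```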
Learning of Optimal Forecast Aggregation in Partial Evidence Environments
Title | Learning of Optimal Forecast Aggregation in Partial Evidence Environments |
Authors | Yakov Babichenko, Dan Garber |
Abstract | We consider the forecast aggregation problem in repeated settings, where the forecasts are made about a binary event. At each period, multiple experts provide forecasts about an event. The goal of the aggregator is to aggregate those forecasts into a single accurate forecast. We assume that experts are Bayesian; namely, they share a common prior, each expert is exposed to some evidence, and each expert applies Bayes' rule to deduce his forecast. The aggregator is ignorant of the information structure (i.e., the distribution over evidence) according to which the experts make their predictions. The aggregator observes the experts' forecasts only. At the end of each period the actual state is realized. We focus on the question of whether the aggregator can learn to optimally aggregate the forecasts of the experts, where the optimal aggregation is the Bayesian aggregation that takes into account all the information (evidence) in the system. We consider the class of partial evidence information structures, where each expert is exposed to a different subset of conditionally independent signals. Our main results are positive: we show that optimal aggregation can be learned in polynomial time in a wide range of instances of the partial evidence environments. We provide a tight characterization of the instances where learning is possible and impossible. |
Tasks | |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07107v1 |
http://arxiv.org/pdf/1802.07107v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-of-optimal-forecast-aggregation-in |
Repo | |
Framework | |
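In the simplest partial-evidence instance, where each expert observes a disjoint set of conditionally independent signals and reports an exact Bayesian posterior, the optimal aggregate has a closed form in log-odds space: add the experts' log-odds and subtract the prior log-odds once per extra expert. The sketch below implements that rule; it is the benchmark the aggregator is trying to learn, not the paper's learning algorithm, and the overlapping-evidence cases the paper characterizes are harder.

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def optimal_aggregate(expert_posteriors, prior):
    """Bayesian aggregation of posteriors based on disjoint,
    conditionally independent evidence."""
    experts = np.asarray(expert_posteriors)
    log_odds = logit(experts).sum() - (len(experts) - 1) * logit(prior)
    return 1.0 / (1.0 + np.exp(-log_odds))

# Two experts each nudge the prior of 0.5 up to 0.7; combined, their
# independent evidence is stronger than either report alone.
print(optimal_aggregate([0.7, 0.7], prior=0.5))   # ~0.845, not 0.7
```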
Rank Selection of CP-decomposed Convolutional Layers with Variational Bayesian Matrix Factorization
Title | Rank Selection of CP-decomposed Convolutional Layers with Variational Bayesian Matrix Factorization |
Authors | Marcella Astrid, Seung-Ik Lee, Beom-Su Seo |
Abstract | Convolutional Neural Networks (CNNs) are among the most successful methods in many areas, such as image classification. However, the memory and computational cost of CNN inference prevents them from running efficiently on mobile devices, which have limited memory and computational ability. One way to compress CNNs is to compress the layers iteratively, i.e. by layer-by-layer compression and fine-tuning, with CP-decomposition of the convolutional layers. To compress with CP-decomposition, rank selection is important. In a previous approach, where rank selection was based on the sensitivity of each layer, the average rank of the network was still selected arbitrarily. Additionally, the ranks of all layers were decided before the whole process of iterative compression, even though the rank of a layer can change after fine-tuning. Therefore, this paper proposes selecting the rank of each layer using Variational Bayesian Matrix Factorization (VBMF), which is more systematic than an arbitrary approach. Furthermore, to account for the change of each layer's rank after the fine-tuning of the previous iteration, the method is applied just before compressing the target layer, i.e. after fine-tuning of the previous iteration. The results show better accuracy together with a higher compression rate when compressing AlexNet's convolutional layers. |
Tasks | Image Classification |
Published | 2018-01-16 |
URL | http://arxiv.org/abs/1801.05243v1 |
http://arxiv.org/pdf/1801.05243v1.pdf | |
PWC | https://paperswithcode.com/paper/rank-selection-of-cp-decomposed-convolutional |
Repo | |
Framework | |
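The per-layer compression step, decomposing a 4-way convolution kernel with CP and replacing the layer by a sequence of cheaper convolutions, can be sketched with an off-the-shelf CP solver. The snippet below uses tensorly's parafac as a stand-in and picks the rank by hand; in the paper the rank would instead come from VBMF, which is not shown here.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

tl.set_backend("numpy")

# A convolution kernel of shape (out_channels, in_channels, kH, kW);
# the sizes here are placeholders, not a specific AlexNet layer.
kernel = np.random.randn(96, 48, 5, 5)

rank = 16                  # in the paper this rank would be selected per layer via VBMF
cp = parafac(tl.tensor(kernel), rank=rank)
print([f.shape for f in cp.factors])   # (96, 16), (48, 16), (5, 16), (5, 16)

# The four rank-r factor matrices map onto a pipeline of cheap convolutions:
# a 1x1 conv (in_channels -> r), separable kHx1 and 1xkW convs with r groups,
# and a final 1x1 conv (r -> out_channels), sharply reducing multiply-adds.
```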
Visual Data Augmentation through Learning
Title | Visual Data Augmentation through Learning |
Authors | Grigorios G. Chrysos, Yannis Panagakis, Stefanos Zafeiriou |
Abstract | The rapid progress in machine learning methods has been empowered by i) huge datasets that have been collected and annotated, ii) improved engineering (e.g. data pre-processing/normalization). The existing datasets typically include several million samples, which makes extending them a colossal task. In addition, the state-of-the-art data-driven methods demand a vast amount of data, hence a standard engineering trick is artificial data augmentation, for instance by adding cropped and (affinely) transformed images to the data. However, this approach does not correspond to any change in the natural 3D scene. We propose instead to perform data augmentation through learning realistic local transformations. We learn a forward and an inverse transformation that maps an image from the high-dimensional space of pixel intensities to a latent space which varies (approximately) linearly with the latent space of a realistically transformed version of the image. Such transformed images can be considered two successive frames in a video. Next, we utilize these transformations to learn a linear model that modifies the latent spaces and then use the inverse transformation to synthesize a new image. We argue that this procedure produces powerful invariant representations. We perform both qualitative and quantitative experiments that demonstrate our proposed method creates new realistic images. |
Tasks | Data Augmentation |
Published | 2018-01-20 |
URL | http://arxiv.org/abs/1801.06665v1 |
http://arxiv.org/pdf/1801.06665v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-data-augmentation-through-learning |
Repo | |
Framework | |
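The augmentation pipeline in the abstract, encode an image, move linearly in the latent space, and decode a new realistic image, reduces to a few lines once an encoder/decoder pair is available. The sketch below is structural only: the encoder, decoder, learned linear map A, and image shapes are assumed placeholders rather than the architecture trained in the paper.

```python
import torch
import torch.nn as nn

latent_dim = 128

# Placeholder encoder/decoder standing in for the learned forward/inverse maps.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)))

# Linear model of the latent motion between two "successive frames"; in the
# paper this map is learned from pairs of realistically transformed images.
A = nn.Linear(latent_dim, latent_dim, bias=True)

def augment(images):
    """Synthesize new samples by moving linearly in latent space."""
    with torch.no_grad():
        z = encoder(images)
        z_new = A(z)             # linear transformation of the latent code
        return decoder(z_new)

new_images = augment(torch.rand(8, 3, 64, 64))
print(new_images.shape)          # torch.Size([8, 3, 64, 64])
```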
Generalization Bounds for Unsupervised Cross-Domain Mapping with WGANs
Title | Generalization Bounds for Unsupervised Cross-Domain Mapping with WGANs |
Authors | Tomer Galanti, Sagie Benaim, Lior Wolf |
Abstract | The recent empirical success of unsupervised cross-domain mapping algorithms, between two domains that share common characteristics, is not well-supported by theoretical justifications. This lacuna is especially troubling, given the clear ambiguity in such mappings. We work with the adversarial training method called the Wasserstein GAN and derive a novel generalization bound, which limits the risk between the learned mapping $h$ and the target mapping $y$, by a sum of two terms: (i) the risk between $h$ and the most distant alternative mapping that was learned by the same cross-domain mapping algorithm, and (ii) the minimal Wasserstein GAN divergence between the target domain and the domain obtained by applying a hypothesis $h^*$ on the samples of the source domain, where $h^*$ is a hypothesis selected by the same algorithm. The bound is directly related to Occam's razor and encourages the selection of the minimal architecture that supports a small Wasserstein GAN divergence. The bound leads to multiple algorithmic consequences, including a method for hyperparameter selection and for early stopping in cross-domain mapping GANs. We also demonstrate a novel capability for unsupervised learning of estimating confidence in the mapping of every specific sample. Lastly, we show how non-minimal architectures can be effectively trained by an inverted knowledge distillation, in which a minimal architecture is used to train a larger one, leading to higher quality outputs. |
Tasks | |
Published | 2018-07-23 |
URL | https://arxiv.org/abs/1807.08501v3 |
https://arxiv.org/pdf/1807.08501v3.pdf | |
PWC | https://paperswithcode.com/paper/generalization-bounds-for-unsupervised-cross |
Repo | |
Framework | |
Efficient Design of Hardware-Enabled Reservoir Computing in FPGAs
Title | Efficient Design of Hardware-Enabled Reservoir Computing in FPGAs |
Authors | Bogdan Penkovsky, Laurent Larger, Daniel Brunner |
Abstract | In this work, we propose a new approach towards the efficient optimization and implementation of reservoir computing hardware that reduces the required domain expert knowledge and optimization effort. First, we adapt the reservoir input mask to the structure of the data via linear autoencoders. We thereby incorporate the advantages of dimensionality reduction and dimensionality expansion achieved by the conventional, algorithmically efficient linear algebra procedures of principal component analysis. Second, we employ evolution-inspired genetic algorithm techniques, resulting in a highly efficient optimization of reservoir dynamics with a dramatically reduced number of evaluations compared to exhaustive search. We illustrate the method on the so-called single-node reservoir computing architecture, which is especially suitable for implementation in ultrahigh-speed hardware. The combination of both methods, and the resulting reduction of the time required for performance optimization of a hardware system, establishes a strategy towards machine learning hardware capable of self-adaption to optimally solve specific problems. We confirm the validity of these principles by building reservoir computing hardware based on a field-programmable gate array. |
Tasks | Dimensionality Reduction |
Published | 2018-05-04 |
URL | http://arxiv.org/abs/1805.03033v2 |
http://arxiv.org/pdf/1805.03033v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-design-of-hardware-enabled |
Repo | |
Framework | |
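The first ingredient, adapting the reservoir's input mask to the data with a linear autoencoder (which in the linear case coincides with PCA), fits in a short script. Below is an illustrative echo-state-style update with a PCA-derived input mask; the reservoir size, leak rate, and nonlinearity are assumed values, and the genetic-algorithm search over reservoir dynamics is not shown.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
T, d_in, n_nodes = 2000, 20, 200

U = rng.standard_normal((T, d_in))             # input time series (placeholder data)

# Input mask adapted to the data: project inputs onto their principal components,
# then spread the components over the nodes of the reservoir.
pca = PCA(n_components=min(d_in, n_nodes))
U_pc = pca.fit_transform(U)
W_in = rng.standard_normal((n_nodes, U_pc.shape[1])) * 0.5

# Sparse random recurrent coupling rescaled to spectral radius < 1 (echo state property).
W = rng.standard_normal((n_nodes, n_nodes)) * (rng.random((n_nodes, n_nodes)) < 0.05)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

x = np.zeros(n_nodes)
states = np.zeros((T, n_nodes))
leak = 0.3                                     # parameters like this are what a
for t in range(T):                             # genetic algorithm would tune
    x = (1 - leak) * x + leak * np.tanh(W @ x + W_in @ U_pc[t])
    states[t] = x
# `states` would then feed a linear readout, e.g. trained by ridge regression.
```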