Paper Group ANR 1105
Efficiently testing local optimality and escaping saddles for ReLU networks. Parameter Re-Initialization through Cyclical Batch Size Schedules. Newton-Krylov PDE-constrained LDDMM in the space of band-limited vector fields. Game Tree Search in a Robust Multistage Optimization Framework: Exploiting Pruning Mechanisms. Stochastic Neighbor Embedding u …
Efficiently testing local optimality and escaping saddles for ReLU networks
Title | Efficiently testing local optimality and escaping saddles for ReLU networks |
Authors | Chulhee Yun, Suvrit Sra, Ali Jadbabaie |
Abstract | We provide a theoretical algorithm for checking local optimality and escaping saddles at nondifferentiable points of empirical risks of two-layer ReLU networks. Our algorithm receives any parameter value and returns: local minimum, second-order stationary point, or a strict descent direction. The presence of $M$ data points on the nondifferentiability of the ReLU divides the parameter space into at most $2^M$ regions, which makes analysis difficult. By exploiting polyhedral geometry, we reduce the total computation to one convex quadratic program (QP) for each hidden node, $O(M)$ (in)equality tests, and one (or a few) nonconvex QPs. For the last QP, we show that our specific problem can be solved efficiently, in spite of nonconvexity. In the benign case, we solve one equality-constrained QP, and we prove that projected gradient descent solves it exponentially fast. In the bad case, we have to solve a few more inequality-constrained QPs, but we prove that the time complexity is exponential only in the number of inequality constraints. Our experiments show that either the benign case or the bad case with very few inequality constraints occurs, implying that our algorithm is efficient in most cases. |
Tasks | |
Published | 2018-09-28 |
URL | https://arxiv.org/abs/1809.10858v2 |
https://arxiv.org/pdf/1809.10858v2.pdf | |
PWC | https://paperswithcode.com/paper/efficiently-testing-local-optimality-and |
Repo | |
Framework | |
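As a concrete illustration of the benign case mentioned in the abstract above (a single equality-constrained convex QP solved by projected gradient descent), here is a minimal, self-contained sketch. It is not the authors' algorithm; the QP data, step size, and function names are hypothetical, and the projection onto the affine constraint set is done in closed form.

```python
import numpy as np

def project_affine(x, C, d):
    """Project x onto the affine set {x : C x = d}."""
    y = np.linalg.solve(C @ C.T, C @ x - d)
    return x - C.T @ y

def pgd_equality_qp(A, b, C, d, iters=500):
    """Projected gradient descent for min 0.5 x^T A x + b^T x  s.t.  C x = d,
    with A assumed symmetric positive semidefinite (the convex, benign case)."""
    eta = 1.0 / np.linalg.norm(A, 2)                # step size 1/L, L = largest eigenvalue of A
    x = project_affine(np.zeros(A.shape[0]), C, d)  # feasible starting point
    for _ in range(iters):
        x = project_affine(x - eta * (A @ x + b), C, d)
    return x

# Toy instance: minimize ||x||^2 subject to x1 + x2 = 1 (optimum is [0.5, 0.5]).
A = 2 * np.eye(2); b = np.zeros(2)
C = np.array([[1.0, 1.0]]); d = np.array([1.0])
print(pgd_equality_qp(A, b, C, d))
```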
Parameter Re-Initialization through Cyclical Batch Size Schedules
Title | Parameter Re-Initialization through Cyclical Batch Size Schedules |
Authors | Norman Mu, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney |
Abstract | Optimal parameter initialization remains a crucial problem for neural network training. A poor weight initialization may lead to longer training and/or convergence to sub-optimal solutions. Here, we propose a method of weight re-initialization by repeated annealing and injection of noise in the training process. We implement this through a cyclical batch size schedule motivated by a Bayesian perspective of neural network training. We evaluate our methods through extensive experiments on tasks in language modeling, natural language inference, and image classification. We demonstrate the ability of our method to improve language modeling performance by up to 7.91 perplexity and reduce training iterations by up to 61%, in addition to its flexibility in enabling snapshot ensembling and use with adversarial training. |
Tasks | Image Classification, Language Modelling, Natural Language Inference |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01216v1 |
http://arxiv.org/pdf/1812.01216v1.pdf | |
PWC | https://paperswithcode.com/paper/parameter-re-initialization-through-cyclical |
Repo | |
Framework | |
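The abstract above does not spell out the exact schedule, so the following is only an assumed illustration of a cyclical batch size schedule: within each cycle the batch size grows (annealing the gradient noise) and then resets to a small value (re-injecting noise). All constants are made up.

```python
def cyclical_batch_size(epoch, base_bs=128, max_bs=1024, cycle_len=20):
    """Toy cyclical batch size schedule: the batch size grows geometrically from
    base_bs to max_bs within each cycle (annealing the gradient noise), then resets
    to base_bs at the start of the next cycle (noise injection / re-initialization)."""
    phase = (epoch % cycle_len) / max(cycle_len - 1, 1)   # 0 -> 1 within a cycle
    return int(round(base_bs * (max_bs / base_bs) ** phase))

# Two cycles of 20 epochs each.
print([cyclical_batch_size(e) for e in range(40)])
```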
Newton-Krylov PDE-constrained LDDMM in the space of band-limited vector fields
Title | Newton-Krylov PDE-constrained LDDMM in the space of band-limited vector fields |
Authors | Monica Hernandez |
Abstract | PDE-constrained Large Deformation Diffeomorphic Metric Mapping (LDDMM) is a particularly interesting framework of physically meaningful diffeomorphic registration methods. Newton-Krylov optimization has shown excellent numerical accuracy and an extraordinarily fast convergence rate in this framework. However, the most significant limitation of PDE-constrained LDDMM is its huge computational complexity, which hinders its extensive use in Computational Anatomy applications. In this work, we propose two PDE-constrained LDDMM methods parameterized in the space of band-limited vector fields and we evaluate their performance with respect to the most closely related state-of-the-art methods. The parameterization in the space of band-limited vector fields dramatically alleviates the computational burden by avoiding the computation of the high-frequency components of the velocity fields that would be suppressed by the action of the low-pass filters involved in the computation of the gradient and the Hessian-vector products. In addition, the proposed methods show improved accuracy with respect to the benchmark methods. |
Tasks | |
Published | 2018-07-13 |
URL | http://arxiv.org/abs/1807.05117v1 |
http://arxiv.org/pdf/1807.05117v1.pdf | |
PWC | https://paperswithcode.com/paper/newton-krylov-pde-constrained-lddmm-in-the |
Repo | |
Framework | |
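The computational saving described above comes from never materializing the high frequencies of the velocity fields. Below is a minimal sketch of that idea only (a hard Fourier truncation of one 2D field component); the actual Newton-Krylov LDDMM solver and its filters are not reproduced, and the sizes are arbitrary.

```python
import numpy as np

def band_limit(field, keep=8):
    """Keep only the `keep` lowest frequencies of a 2D field along each axis,
    discarding the high-frequency content a low-pass filter would suppress anyway."""
    F = np.fft.fftshift(np.fft.fft2(field))
    cx, cy = F.shape[0] // 2, F.shape[1] // 2
    mask = np.zeros_like(F)
    mask[cx - keep:cx + keep, cy - keep:cy + keep] = 1.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

# One 64x64 component of a velocity field, represented by far fewer coefficients.
rng = np.random.default_rng(0)
vx = rng.standard_normal((64, 64))
print(band_limit(vx).shape)   # (64, 64), but only (2*keep)^2 nonzero frequencies remain
```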
Game Tree Search in a Robust Multistage Optimization Framework: Exploiting Pruning Mechanisms
Title | Game Tree Search in a Robust Multistage Optimization Framework: Exploiting Pruning Mechanisms |
Authors | Michael Hartisch, Ulf Lorenz |
Abstract | We investigate pruning in search trees of so-called quantified integer linear programs (QIPs). QIPs consist of a set of linear inequalities and a minimax objective function, where some variables are existentially and others are universally quantified. They can be interpreted as two-person zero-sum games between an existential and a universal player on the one hand, or as multistage optimization problems under uncertainty on the other hand. Solutions are so-called winning strategies for the existential player that specify how to react to moves of the universal player (i.e., certain assignments of universally quantified variables) so as to win the game with certainty. QIPs can be solved with the help of game tree search that is enhanced with non-chronological back-jumping. We develop and theoretically substantiate pruning techniques based upon (algebraic) properties similar to pruning mechanisms known from linear programming and quantified Boolean formulas. The presented Strategic Copy-Pruning mechanism makes it possible to implicitly deduce the existence of a strategy in linear time (by static examination of the QIP matrix) without explicitly traversing the strategy itself. We show that the implementation of our findings can massively speed up the search process. |
Tasks | |
Published | 2018-11-29 |
URL | http://arxiv.org/abs/1811.12146v1 |
http://arxiv.org/pdf/1811.12146v1.pdf | |
PWC | https://paperswithcode.com/paper/game-tree-search-in-a-robust-multistage |
Repo | |
Framework | |
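To make the game-tree view concrete, here is a toy alternating search over binary variable assignments with the obvious pruning: an existential node stops at its first winning child, a universal node at its first losing child. It only illustrates the setting; the paper's Strategic Copy-Pruning on QIP matrices is not implemented here, and the feasibility checks are invented.

```python
def exists_winning_strategy(assignment, quantifiers, feasible):
    """Toy game tree search over binary variables.  `quantifiers` is a list of 'E'/'A';
    `feasible` checks a completed assignment.  An existential node wins if some child
    wins, a universal node wins only if all children win; remaining siblings are pruned
    as soon as the node's outcome is decided."""
    depth = len(assignment)
    if depth == len(quantifiers):
        return feasible(assignment)
    existential = quantifiers[depth] == 'E'
    for value in (0, 1):
        win = exists_winning_strategy(assignment + [value], quantifiers, feasible)
        if existential and win:
            return True    # winning move found: prune the remaining branch
        if not existential and not win:
            return False   # refuting universal move found: prune the remaining branch
    return not existential

print(exists_winning_strategy([], ['E', 'A'], lambda a: a[0] + a[1] <= 1))  # True: play x1 = 0
print(exists_winning_strategy([], ['E', 'A'], lambda a: a[0] + a[1] == 1))  # False: no strategy
```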
Stochastic Neighbor Embedding under f-divergences
Title | Stochastic Neighbor Embedding under f-divergences |
Authors | Daniel Jiwoong Im, Nakul Verma, Kristin Branson |
Abstract | The t-distributed Stochastic Neighbor Embedding (t-SNE) is a powerful and popular method for visualizing high-dimensional data. It minimizes the Kullback-Leibler (KL) divergence between the original and embedded data distributions. In this work, we propose extending this method to other f-divergences. We analytically and empirically evaluate the types of latent structure (manifold, cluster, and hierarchical) that are well captured using both the original KL divergence and the proposed f-divergence generalization, and find that different divergences perform better for different types of structure. A common concern with the t-SNE criterion is that it is optimized using gradient descent, and can become stuck in poor local minima. We propose optimizing the f-divergence based loss criteria by minimizing a variational bound. This typically performs better than optimizing the primal form, and our experiments show that it can improve upon the embedding results obtained from the original t-SNE criterion as well. |
Tasks | |
Published | 2018-11-03 |
URL | http://arxiv.org/abs/1811.01247v1 |
http://arxiv.org/pdf/1811.01247v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-neighbor-embedding-under-f |
Repo | |
Framework | |
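The generalization above swaps the KL divergence for a generic f-divergence between the high-dimensional and embedded affinity distributions. A minimal numerical sketch of that quantity follows (the variational-bound optimization mentioned in the abstract is not shown; the toy distributions are invented).

```python
import numpy as np

def f_divergence(P, Q, f):
    """D_f(P || Q) = sum_i Q_i * f(P_i / Q_i) for discrete distributions
    (e.g., flattened pairwise-affinity matrices with zero entries removed)."""
    P, Q = np.asarray(P, float).ravel(), np.asarray(Q, float).ravel()
    return float(np.sum(Q * f(P / Q)))

kl  = lambda t: t * np.log(t)     # f for the KL divergence used by standard t-SNE
rkl = lambda t: -np.log(t)        # f for the reverse KL divergence

P = np.array([0.5, 0.3, 0.2])     # toy "high-dimensional" affinities
Q = np.array([0.4, 0.4, 0.2])     # toy "embedding" affinities
print(f_divergence(P, Q, kl), f_divergence(P, Q, rkl))
```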
Bridging the Generalization Gap: Training Robust Models on Confounded Biological Data
Title | Bridging the Generalization Gap: Training Robust Models on Confounded Biological Data |
Authors | Tzu-Yu Liu, Ajay Kannan, Adam Drake, Marvin Bertin, Nathan Wan |
Abstract | Statistical learning on biological data can be challenging due to confounding variables in sample collection and processing. Confounders can cause models to generalize poorly and result in inaccurate prediction performance metrics if models are not validated thoroughly. In this paper, we propose methods to control for confounding factors and further improve prediction performance. We introduce OrthoNormal basis construction In cOnfounding factor Normalization (ONION) to remove confounding covariates and use the Domain-Adversarial Neural Network (DANN) to penalize models for encoding confounder information. We apply the proposed methods to simulated and empirical patient data and show significant improvements in generalization. |
Tasks | |
Published | 2018-12-12 |
URL | http://arxiv.org/abs/1812.04778v1 |
http://arxiv.org/pdf/1812.04778v1.pdf | |
PWC | https://paperswithcode.com/paper/bridging-the-generalization-gap-training |
Repo | |
Framework | |
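ONION's exact construction is not given in the abstract, so the following is only a generic linear-deconfounding sketch in the same spirit: build an orthonormal basis for the confounder columns and project the features onto its orthogonal complement (the DANN part is not shown; the data and the `remove_confounders` name are hypothetical).

```python
import numpy as np

def remove_confounders(X, C):
    """Remove the linear effect of confounders C (n_samples x n_confounders) from
    features X (n_samples x n_features) by projecting each feature column onto the
    orthogonal complement of the confounder subspace."""
    Q, _ = np.linalg.qr(C)           # orthonormal basis for the column space of C
    return X - Q @ (Q.T @ X)         # residual after projecting out that subspace

# Toy example: one binary confounder (e.g., collection batch), ten features, 100 samples.
rng = np.random.default_rng(0)
batch = rng.integers(0, 2, size=(100, 1)).astype(float)
X = rng.standard_normal((100, 10)) + 3.0 * batch                  # features contaminated by batch
X_clean = remove_confounders(X, np.hstack([np.ones((100, 1)), batch]))
print(np.abs(X_clean.T @ batch).max())                            # ~0: no remaining linear correlation
```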
Do Neural Network Cross-Modal Mappings Really Bridge Modalities?
Title | Do Neural Network Cross-Modal Mappings Really Bridge Modalities? |
Authors | Guillem Collell, Marie-Francine Moens |
Abstract | Feed-forward networks are widely used in cross-modal applications to bridge modalities by mapping distributed vectors of one modality to the other, or to a shared space. The predicted vectors are then used to perform, e.g., retrieval or labeling. Thus, the success of the whole system relies on the ability of the mapping to make the neighborhood structure (i.e., the pairwise similarities) of the predicted vectors akin to that of the target vectors. However, whether this is achieved has not been investigated yet. Here, we propose a new similarity measure and two ad hoc experiments to shed light on this issue. In three cross-modal benchmarks we learn a large number of language-to-vision and vision-to-language neural network mappings (up to five layers) using a rich diversity of image and text features and loss functions. Our results reveal that, surprisingly, the neighborhood structure of the predicted vectors consistently resembles that of the input vectors more than that of the target vectors. In a second experiment, we further show that untrained nets do not significantly disrupt the neighborhood (i.e., semantic) structure of the input vectors. |
Tasks | |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07616v2 |
http://arxiv.org/pdf/1805.07616v2.pdf | |
PWC | https://paperswithcode.com/paper/do-neural-network-cross-modal-mappings-really |
Repo | |
Framework | |
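The paper's similarity measure compares neighborhood structure across spaces; its exact definition is not reproduced in the abstract, so the helper below is only a hypothetical stand-in: the mean overlap of k-nearest-neighbour sets computed for the same items in two different vector spaces.

```python
import numpy as np

def mean_neighbor_overlap(A, B, k=10):
    """Mean overlap of k-nearest-neighbour sets computed in two vector spaces A and B
    (same items, row-aligned).  1.0 means identical neighborhood structure."""
    def knn(X):
        d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)              # exclude the point itself
        return np.argsort(d, axis=1)[:, :k]
    nA, nB = knn(A), knn(B)
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(nA, nB)]))

# Compare the neighborhood structure of mapped vectors with that of inputs and targets.
rng = np.random.default_rng(0)
inputs  = rng.standard_normal((200, 64))
targets = rng.standard_normal((200, 32))
mapped  = inputs @ rng.standard_normal((64, 32)) * 0.1   # stand-in for a learned mapping
print(mean_neighbor_overlap(mapped, inputs), mean_neighbor_overlap(mapped, targets))
```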
Energy Flow Networks: Deep Sets for Particle Jets
Title | Energy Flow Networks: Deep Sets for Particle Jets |
Authors | Patrick T. Komiske, Eric M. Metodiev, Jesse Thaler |
Abstract | A key question for machine learning approaches in particle physics is how to best represent and learn from collider events. As an event is intrinsically a variable-length unordered set of particles, we build upon recent machine learning efforts to learn directly from sets of features or “point clouds”. Adapting and specializing the “Deep Sets” framework to particle physics, we introduce Energy Flow Networks, which respect infrared and collinear safety by construction. We also develop Particle Flow Networks, which allow for general energy dependence and the inclusion of additional particle-level information such as charge and flavor. These networks feature a per-particle internal (latent) representation, and summing over all particles yields an overall event-level latent representation. We show how this latent space decomposition unifies existing event representations based on detector images and radiation moments. To demonstrate the power and simplicity of this set-based approach, we apply these networks to the collider task of discriminating quark jets from gluon jets, finding similar or improved performance compared to existing methods. We also show how the learned event representation can be directly visualized, providing insight into the inner workings of the model. These architectures lend themselves to efficiently processing and analyzing events for a wide variety of tasks at the Large Hadron Collider. Implementations and examples of our architectures are available online in our EnergyFlow package. |
Tasks | |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.05165v2 |
http://arxiv.org/pdf/1810.05165v2.pdf | |
PWC | https://paperswithcode.com/paper/energy-flow-networks-deep-sets-for-particle |
Repo | |
Framework | |
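A minimal sketch of the set-based idea described above: a per-particle map Phi produces latent vectors, summing them over the (variable-length) particle set gives a permutation-invariant event-level representation, and a second map F produces the output. The energy weighting that gives Energy Flow Networks their infrared/collinear safety, and all training code, are omitted; weights and shapes here are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def deep_sets_forward(particles, W_phi, W_F):
    """Minimal Deep-Sets-style forward pass: apply a per-particle map Phi, sum over the
    set to get a permutation-invariant event latent, then apply an event-level map F."""
    latent = relu(particles @ W_phi).sum(axis=0)   # sum over the variable-length particle set
    return relu(latent @ W_F)

# An "event" with 7 particles and 4 features each (e.g., pt, eta, phi, charge).
rng = np.random.default_rng(0)
event = rng.standard_normal((7, 4))
W_phi = rng.standard_normal((4, 16))               # per-particle embedding to a 16-d latent space
W_F   = rng.standard_normal((16, 2))               # event-level head (e.g., quark vs. gluon scores)
print(deep_sets_forward(event, W_phi, W_F))
```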
Hybrid Multi-camera Visual Servoing to Moving Target
Title | Hybrid Multi-camera Visual Servoing to Moving Target |
Authors | Hanz Cuevas-Velasquez, Nanbo Li, Radim Tylecek, Marcelo Saval-Calvo, Robert B. Fisher |
Abstract | Visual servoing is a well-known task in robotics. However, there are still challenges when multiple visual sources are combined to accurately guide the robot or occlusions appear. In this paper we present a novel visual servoing approach using hybrid multi-camera input data to lead a robot arm accurately to dynamically moving target points in the presence of partial occlusions. The approach uses four RGBD sensors as Eye-to-Hand (EtoH) visual input, and an arm-mounted stereo camera as Eye-in-Hand (EinH). A Master supervisor task selects between using the EtoH or the EinH, depending on the distance between the robot and target. The Master also selects the subset of EtoH cameras that best perceive the target. When the EinH sensor is used, if the target becomes occluded or goes out of the sensor’s view-frustum, the Master switches back to the EtoH sensors to re-track the object. Using this adaptive visual input data, the robot is then controlled using an iterative planner that uses position, orientation and joint configuration to estimate the trajectory. Since the target is dynamic, this trajectory is updated every time-step. Experiments show good performance in four different situations: tracking a ball, targeting a bulls-eye, guiding a straw to a mouth and delivering an item to a moving hand. The experiments cover both simple situations such as a ball that is mostly visible from all cameras, and more complex situations such as the mouth which is partially occluded from some of the sensors. |
Tasks | |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.02285v2 |
http://arxiv.org/pdf/1803.02285v2.pdf | |
PWC | https://paperswithcode.com/paper/hybrid-multi-camera-visual-servoing-to-moving |
Repo | |
Framework | |
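The Master supervisor described above reduces, at its core, to a switching rule between the two kinds of visual input. A toy version follows; the 0.3 m threshold and the exact criteria are assumptions, not the paper's values.

```python
def select_visual_source(distance_to_target, target_in_einh_view, switch_dist=0.3):
    """Toy supervisor rule: use the arm-mounted Eye-in-Hand (EinH) stereo camera when the
    target is close and visible, otherwise fall back to the Eye-to-Hand (EtoH) RGBD cameras."""
    if distance_to_target < switch_dist and target_in_einh_view:
        return "EinH"
    return "EtoH"

print(select_visual_source(0.15, True))   # EinH
print(select_visual_source(0.15, False))  # EtoH (target occluded or out of the view-frustum)
print(select_visual_source(0.80, True))   # EtoH (too far for the in-hand camera)
```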
An unsupervised and customizable misspelling generator for mining noisy health-related text sources
Title | An unsupervised and customizable misspelling generator for mining noisy health-related text sources |
Authors | Abeed Sarker, Graciela Gonzalez-Hernandez |
Abstract | In this paper, we present a customizable data-centric system that automatically generates common misspellings for complex health-related terms. The spelling variant generator relies on a dense vector model learned from large unlabeled text, which is used to find semantically close terms to the original/seed keyword, followed by the filtering of terms that are lexically dissimilar beyond a given threshold. The process is executed recursively, converging when no new terms similar (lexically and semantically) to the seed keyword are found. Weighting of intra-word character sequence similarities allows further problem-specific customization of the system. On a dataset prepared for this study, our system outperforms the current state-of-the-art for medication name variant generation with a best F1-score of 0.69 and F1/4-score of 0.78. Extrinsic evaluation of the system on a set of cancer-related terms showed an increase of over 67% in retrieval rate from Twitter posts when the generated variants are included. Our proposed spelling variant generator has several advantages over the current state-of-the-art and other types of variant generators: (i) it is capable of filtering out lexically similar but semantically dissimilar terms, (ii) the number of variants generated is low, as many low-frequency and ambiguous misspellings are filtered out, and (iii) the system is fully automatic, customizable and easily executable. While the base system is fully unsupervised, we show how supervision may be employed to adjust weights for task-specific customization. The performance and significant relative simplicity of our proposed approach make it a much-needed misspelling-generation resource for health-related text mining from noisy sources. The source code for the system has been made publicly available for research purposes. |
Tasks | |
Published | 2018-06-04 |
URL | http://arxiv.org/abs/1806.00910v1 |
http://arxiv.org/pdf/1806.00910v1.pdf | |
PWC | https://paperswithcode.com/paper/an-unsupervised-and-customizable-misspelling |
Repo | |
Framework | |
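The recursive generate-and-filter loop described above can be sketched as follows. `embedding_neighbors` is a hypothetical callable standing in for the dense vector model (any embedding lookup would do), and lexical similarity is approximated with difflib's ratio rather than the paper's weighted intra-word character sequence similarity.

```python
from difflib import SequenceMatcher

def generate_variants(seed, embedding_neighbors, lex_threshold=0.7, max_rounds=5):
    """Sketch of the recursive loop: pull semantically close terms from an embedding model,
    keep only those that are also lexically similar to the seed, and recurse on the newly
    accepted terms until no new variants are found (convergence)."""
    def lexically_similar(a, b):
        return SequenceMatcher(None, a, b).ratio() >= lex_threshold
    accepted, frontier = {seed}, {seed}
    for _ in range(max_rounds):
        new = set()
        for term in frontier:
            for cand in embedding_neighbors(term):
                if cand not in accepted and lexically_similar(cand, seed):
                    new.add(cand)
        if not new:
            break                     # converged: no new variants
        accepted |= new
        frontier = new
    return accepted - {seed}

# Toy neighbor function standing in for a dense vector model trained on noisy text.
toy = {"ibuprofen": ["ibuprofin", "advil", "ibuprophen"], "ibuprofin": ["ibuprofen", "ibProfen"]}
print(generate_variants("ibuprofen", lambda t: toy.get(t, [])))
```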
Deep Factorised Inverse-Sketching
Title | Deep Factorised Inverse-Sketching |
Authors | Kaiyue Pang, Da Li, Jifei Song, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales |
Abstract | Modelling human free-hand sketches has become topical recently, driven by practical applications such as fine-grained sketch-based image retrieval (FG-SBIR). Sketches are clearly related to photo edge-maps, but a human free-hand sketch of a photo is not simply a clean rendering of that photo’s edge map. Instead, there is a fundamental process of abstraction and iconic rendering, where overall geometry is warped and salient details are selectively included. In this paper we study this sketching process and attempt to invert it. We model this inversion by translating iconic free-hand sketches to contours that resemble more geometrically realistic projections of object boundaries, and separately factorise out the salient added details. This factorised re-representation makes it easier to match a free-hand sketch to a photo instance of an object. Specifically, we propose a novel unsupervised image style transfer model based on enforcing a cyclic embedding consistency constraint. A deep FG-SBIR model is then formulated to accommodate complementary discriminative detail from each factorised sketch for better matching with the corresponding photo. Our method is evaluated both qualitatively and quantitatively to demonstrate its superiority over a number of state-of-the-art alternatives for style transfer and FG-SBIR. |
Tasks | Image Retrieval, Sketch-Based Image Retrieval, Style Transfer |
Published | 2018-08-07 |
URL | http://arxiv.org/abs/1808.02313v1 |
http://arxiv.org/pdf/1808.02313v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-factorised-inverse-sketching |
Repo | |
Framework | |
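The cyclic embedding consistency constraint mentioned above can be illustrated very loosely: translating a sketch to the contour domain and back should stay close to the original in an embedding space. The linear stand-ins below are purely hypothetical; the actual model uses deep translation networks and a learned embedding.

```python
import numpy as np

def cyclic_embedding_consistency(x, translate, translate_back, embed):
    """Penalty that is small when translate_back(translate(x)) matches x in embedding space."""
    return float(np.linalg.norm(embed(translate_back(translate(x))) - embed(x)) ** 2)

rng = np.random.default_rng(0)
A = np.eye(64) + 0.1 * rng.standard_normal((64, 64))   # stand-in "sketch -> contour" map
A_inv = np.linalg.inv(A)                                # stand-in "contour -> sketch" map
E = rng.standard_normal((64, 16))                       # stand-in shared embedding
x = rng.standard_normal(64)
print(cyclic_embedding_consistency(x, lambda z: A @ z, lambda z: A_inv @ z, lambda z: z @ E))  # ~0
```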
3D Regression Neural Network for the Quantification of Enlarged Perivascular Spaces in Brain MRI
Title | 3D Regression Neural Network for the Quantification of Enlarged Perivascular Spaces in Brain MRI |
Authors | Florian Dubost, Hieab Adams, Gerda Bortsova, M. Arfan Ikram, Wiro Niessen, Meike Vernooij, Marleen de Bruijne |
Abstract | Enlarged perivascular spaces (EPVS) in the brain are an emerging imaging marker for cerebral small vessel disease, and have been shown to be related to increased risk of various neurological diseases, including stroke and dementia. Automatic quantification of EPVS would greatly help to advance research into its etiology and its potential as a risk indicator of disease. We propose a convolutional network regression method to quantify the extent of EPVS in the basal ganglia from 3D brain MRI. We first segment the basal ganglia and subsequently apply a 3D convolutional regression network designed for small object detection within this region of interest. The network takes an image as input, and outputs a quantification score of EPVS. The network has significantly more convolution operations than pooling ones and no final activation, allowing it to span the space of real numbers. We validated our approach using a dataset of 2000 brain MRI scans scored visually. Experiments with varying sizes of training and test sets showed that a good performance can be achieved with a training set of only 200 scans. With a training set of 1000 scans, the intraclass correlation coefficient (ICC) between our scoring method and the expert’s visual score was 0.74. Our method outperforms, by a large margin of more than 0.10 in ICC, four more conventional automated approaches based on intensities, scale-invariant feature transform, and random forest. We show that the network learns the structures of interest and investigate the influence of hyper-parameters on the performance. We also evaluate the reproducibility of our network using a set of 60 subjects scanned twice (scan-rescan reproducibility). On this set our network achieves an ICC of 0.93, while the intrarater agreement reaches 0.80. Furthermore, the automatic EPVS scoring correlates with age similarly to visual scoring. |
Tasks | Object Detection, Small Object Detection |
Published | 2018-02-16 |
URL | http://arxiv.org/abs/1802.05914v2 |
http://arxiv.org/pdf/1802.05914v2.pdf | |
PWC | https://paperswithcode.com/paper/3d-regression-neural-network-for-the |
Repo | |
Framework | |
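A toy PyTorch sketch of the kind of network the abstract describes (convolution-heavy, little pooling, and no final activation so the regression output can span the reals). The layer sizes are invented and far smaller than anything used in the paper.

```python
import torch
import torch.nn as nn

class EPVSRegressor(nn.Module):
    """Toy 3D convolutional regression network: several convolutions, a single pooling
    step, global average pooling, and a linear head with no final activation so the
    output can be any real number (a continuous quantification score)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                            # the only pooling operation
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32, 1)                    # linear head: no final activation

    def forward(self, x):                               # x: (batch, 1, D, H, W)
        h = self.features(x)
        h = h.mean(dim=(2, 3, 4))                       # global average pooling over the volume
        return self.head(h).squeeze(1)                  # one regression score per scan

# Forward pass on a dummy 32^3 region-of-interest crop.
net = EPVSRegressor()
print(net(torch.randn(2, 1, 32, 32, 32)).shape)         # torch.Size([2])
```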
Impulsive Noise Robust Sparse Recovery via Continuous Mixed Norm
Title | Impulsive Noise Robust Sparse Recovery via Continuous Mixed Norm |
Authors | Amirhossein Javaheri, Hadi Zayyani, Mario A. T. Figueiredo, Farrokh Marvasti |
Abstract | This paper investigates the problem of sparse signal recovery in the presence of additive impulsive noise. Heavy-tailed impulsive noise is well modelled with stable distributions. Since there is no explicit formulation for the probability density function of the $S\alpha S$ distribution, alternative approximations such as the Generalized Gaussian Distribution (GGD) are used, which impose an $\ell_p$-norm fidelity on the residual error. In this paper, we exploit a Continuous Mixed Norm (CMN) for robust sparse recovery instead of the $\ell_p$-norm. We show that in blind conditions, i.e., when the parameters of the noise distribution are unknown, incorporating the CMN can lead to near-optimal recovery. We apply the Alternating Direction Method of Multipliers (ADMM) to solve the problem induced by utilizing the CMN for robust sparse recovery. In this approach, the CMN is replaced with a surrogate function and the Majorization-Minimization technique is incorporated to solve the problem. Simulation results confirm the efficiency of the proposed method compared to some recent algorithms in the literature for impulsive noise robust sparse recovery. |
Tasks | |
Published | 2018-04-12 |
URL | http://arxiv.org/abs/1804.04614v1 |
http://arxiv.org/pdf/1804.04614v1.pdf | |
PWC | https://paperswithcode.com/paper/impulsive-noise-robust-sparse-recovery-via |
Repo | |
Framework | |
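The abstract does not give the exact definition of the continuous mixed norm, so the function below is only one plausible numerical reading of the idea: penalize the residual with $\ell_p^p$ terms averaged over a continuum of exponents p in [1, 2] (uniform weighting assumed here), rather than committing to a single p. Averaging over p inflates an impulsive outlier less than a pure $\ell_2^2$ fit would.

```python
import numpy as np

def continuous_mixed_norm(x, p_min=1.0, p_max=2.0, n_grid=50):
    """Hypothetical numerical sketch of a continuous mixed norm: average the l_p^p
    penalties over a continuum of exponents p in [p_min, p_max] with uniform weighting.
    The exact definition and weighting measure used in the paper may differ."""
    ps = np.linspace(p_min, p_max, n_grid)
    penalties = [np.sum(np.abs(x) ** p) for p in ps]
    return float(np.trapz(penalties, ps) / (p_max - p_min))

# Behaviour on a residual with and without an impulsive outlier.
r_clean   = np.array([0.1, -0.2, 0.05, 0.0])
r_impulse = np.array([0.1, -0.2, 0.05, 10.0])
print(continuous_mixed_norm(r_clean), continuous_mixed_norm(r_impulse))
```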
Cross-City Transfer Learning for Deep Spatio-Temporal Prediction
Title | Cross-City Transfer Learning for Deep Spatio-Temporal Prediction |
Authors | Leye Wang, Xu Geng, Xiaojuan Ma, Feng Liu, Qiang Yang |
Abstract | Spatio-temporal prediction is a key type of task in urban computing, e.g., traffic flow and air quality prediction. Adequate data is usually a prerequisite, especially when deep learning is adopted. However, the development levels of different cities are unbalanced, and many cities still suffer from data scarcity. To address the problem, we propose a novel cross-city transfer learning method for deep spatio-temporal prediction tasks, called RegionTrans. RegionTrans aims to effectively transfer knowledge from a data-rich source city to a data-scarce target city. More specifically, we first learn an inter-city region matching function to match each target city region to a similar source city region. A neural network is designed to effectively extract region-level representations for spatio-temporal prediction. Finally, an optimization algorithm is proposed to transfer learned features from the source city to the target city with the region matching function. Using citywide crowd flow prediction as a demonstration experiment, we verify the effectiveness of RegionTrans. Results show that RegionTrans can outperform state-of-the-art fine-tuning-based deep spatio-temporal prediction models, reducing prediction error by up to 10.7%. |
Tasks | Transfer Learning |
Published | 2018-02-01 |
URL | http://arxiv.org/abs/1802.00386v2 |
http://arxiv.org/pdf/1802.00386v2.pdf | |
PWC | https://paperswithcode.com/paper/cross-city-transfer-learning-for-deep-spatio |
Repo | |
Framework | |
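The first step described above, the inter-city region matching function, can be sketched crudely as a nearest-neighbour match on correlation between (short) demand time series. RegionTrans' learned region representations and its transfer step are not shown, and all shapes here are made up.

```python
import numpy as np

def match_regions(target_series, source_series):
    """Toy inter-city region matching: pair each target-city region with the source-city
    region whose demand time series correlates most strongly with it.
    target_series: (n_target_regions, T), source_series: (n_source_regions, T)."""
    def znorm(X):
        return (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-8)
    corr = znorm(target_series) @ znorm(source_series).T / target_series.shape[1]
    return corr.argmax(axis=1)        # best-matching source region index per target region

# Example: 5 target regions matched against 20 source regions using 48 time steps.
rng = np.random.default_rng(0)
source = rng.standard_normal((20, 48))
target = source[[3, 7, 7, 11, 0]] + 0.1 * rng.standard_normal((5, 48))   # noisy copies
print(match_regions(target, source))   # expected: [3 7 7 11 0]
```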
Nonparametric Pricing Analytics with Customer Covariates
Title | Nonparametric Pricing Analytics with Customer Covariates |
Authors | Ningyuan Chen, Guillermo Gallego |
Abstract | Personalized pricing analytics is becoming an essential tool in retailing. Upon observing the personalized information of each arriving customer, the firm needs to set a price accordingly based on covariates such as income, education background, and past purchasing history to extract more revenue. For new entrants to the business, the lack of historical data may severely limit the power and profitability of personalized pricing. We propose a nonparametric pricing policy to simultaneously learn the preferences of customers based on the covariates and maximize the expected revenue over a finite horizon. The policy does not depend on any prior assumptions on how the personalized information affects consumers’ preferences (such as linear models). It adaptively splits the covariate space into smaller bins (hyper-rectangles) and clusters customers based on their covariates and preferences, offering similar prices to customers who belong to the same cluster, trading off granularity against accuracy. We show that the algorithm achieves a regret of order $O(\log(T)^2 T^{(2+d)/(4+d)})$, where $T$ is the length of the horizon and $d$ is the dimension of the covariate. It improves on the current regret in the literature (Slivkins, 2014), under mild technical conditions in the pricing context (smoothness and local concavity). We also prove that no policy can achieve a regret less than $O(T^{(2+d)/(4+d)})$ for a particular instance, thus demonstrating the near optimality of the proposed policy. |
Tasks | |
Published | 2018-05-03 |
URL | https://arxiv.org/abs/1805.01136v3 |
https://arxiv.org/pdf/1805.01136v3.pdf | |
PWC | https://paperswithcode.com/paper/nonparametric-learning-and-optimization-with |
Repo | |
Framework | |
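As a very coarse sketch of the binning idea described above: fixed bins over a 1-d covariate and a naive explore-then-commit price search in each bin, rather than the paper's adaptive splitting (which is what actually yields the stated regret bound). All parameters and the purchase model are invented.

```python
import numpy as np

def binned_pricing_demo(T=20000, n_bins=4, seed=0):
    """Toy covariate-binned pricing: split the covariate into fixed bins; in each bin try
    every candidate price a few times, then commit to the empirically best one."""
    rng = np.random.default_rng(seed)
    prices = np.linspace(1.0, 10.0, 10)
    explore_rounds = 30
    counts = np.zeros((n_bins, len(prices)))
    revenue = np.zeros((n_bins, len(prices)))
    total = 0.0
    for _ in range(T):
        x = rng.uniform()                                   # customer covariate
        b = min(int(x * n_bins), n_bins - 1)                # bin the customer falls into
        if counts[b].min() < explore_rounds:                # still exploring this bin
            j = int(counts[b].argmin())
        else:                                               # commit to the best observed price
            j = int((revenue[b] / counts[b]).argmax())
        p = prices[j]
        buy = rng.uniform() < 1.0 / (1.0 + np.exp(p - 4.0 - 4.0 * x))   # toy purchase model
        counts[b, j] += 1
        revenue[b, j] += p * buy
        total += p * buy
    return total / T                                        # average revenue per customer

print(binned_pricing_demo())
```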