October 19, 2019

Paper Group ANR 374

Learning the Multiple Traveling Salesmen Problem with Permutation Invariant Pooling Networks. Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent. Linearized Binary Regression. The Singular Values of Convolutional Layers. Energy-Efficient CMOS Memristive Synapses for Mixed-Signal Neuromorphic System-on-a-Chip. T …

Learning the Multiple Traveling Salesmen Problem with Permutation Invariant Pooling Networks

Title Learning the Multiple Traveling Salesmen Problem with Permutation Invariant Pooling Networks
Authors Yoav Kaempfer, Lior Wolf
Abstract While there are optimal TSP solvers, as well as recent learning-based approaches, the generalization of the TSP to the Multiple Traveling Salesmen Problem is much less studied. Here, we design a neural network solution that treats the salesmen, cities and depot as three different sets of varying cardinalities. We apply a novel technique that combines elements from recent architectures that were developed for sets, as well as elements from graph networks. Coupled with new constraint enforcing output layers, a dedicated loss, and a search method, our solution is shown to outperform all the meta-heuristics of the leading solver in the field.
Tasks
Published 2018-03-26
URL http://arxiv.org/abs/1803.09621v2
PDF http://arxiv.org/pdf/1803.09621v2.pdf
PWC https://paperswithcode.com/paper/learning-the-multiple-traveling-salesmen
Repo
Framework
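
The abstract above treats salesmen, cities and depot as sets of varying cardinalities, which calls for operations that do not depend on element order. As a minimal sketch of that general idea (not the authors' architecture), the snippet below uses sum pooling; the function name `sum_pool` and the 2-D city features are hypothetical.

```python
# Illustrative only: sum pooling over a set of feature vectors.
# Names and feature values are hypothetical, not taken from the paper.

def sum_pool(vectors):
    """Element-wise sum of equal-length feature vectors.

    The result does not depend on the order of the vectors, and the
    function accepts sets of any size.
    """
    return [sum(column) for column in zip(*vectors)]


cities = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]   # three cities, two features each
shuffled = [cities[2], cities[0], cities[1]]    # same set, different order

pooled = sum_pool(cities)
pooled_shuffled = sum_pool(shuffled)
```

Both calls return the same pooled vector, which is the property a set-based network layer relies on.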

Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent

Title Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent
Authors Xiaowu Dai, Yuhua Zhu
Abstract Stochastic gradient descent (SGD) is almost ubiquitously used for training non-convex optimization tasks. Recently, a hypothesis proposed by Keskar et al. [2017] that large batch methods tend to converge to sharp minimizers has received increasing attention. We theoretically justify this hypothesis by providing new properties of SGD in both finite-time and asymptotic regimes. In particular, we give an explicit escaping time of SGD from a local minimum in the finite-time regime and prove that SGD tends to converge to flatter minima in the asymptotic regime (although it may take exponential time to converge) regardless of the batch size. We also find that SGD with a larger ratio of learning rate to batch size tends to converge to a flat minimum faster; however, its generalization performance could be worse than that of SGD with a smaller ratio of learning rate to batch size. We include numerical experiments to corroborate these theoretical findings.
Tasks
Published 2018-12-03
URL http://arxiv.org/abs/1812.00542v1
PDF http://arxiv.org/pdf/1812.00542v1.pdf
PWC https://paperswithcode.com/paper/towards-theoretical-understanding-of-large
Repo
Framework

Linearized Binary Regression

Title Linearized Binary Regression
Authors Andrew S. Lan, Mung Chiang, Christoph Studer
Abstract Probit regression was first proposed by Bliss in 1934 to study mortality rates of insects. Since then, an extensive body of work has analyzed and used probit or related binary regression methods (such as logistic regression) in numerous applications and fields. This paper provides a fresh angle to such well-established binary regression methods. Concretely, we demonstrate that linearizing the probit model in combination with linear estimators performs on par with state-of-the-art nonlinear regression methods, such as posterior mean or maximum a posteriori estimation, for a broad range of real-world regression problems. We derive exact, closed-form, and nonasymptotic expressions for the mean-squared error of our linearized estimators, which clearly separates them from nonlinear regression methods that are typically difficult to analyze. We showcase the efficacy of our methods and results for a number of synthetic and real-world datasets, which demonstrates that linearized binary regression finds potential use in a variety of inference, estimation, signal processing, and machine learning applications that deal with binary-valued observations or measurements.
Tasks
Published 2018-02-01
URL http://arxiv.org/abs/1802.00430v1
PDF http://arxiv.org/pdf/1802.00430v1.pdf
PWC https://paperswithcode.com/paper/linearized-binary-regression
Repo
Framework

The Singular Values of Convolutional Layers

Title The Singular Values of Convolutional Layers
Authors Hanie Sedghi, Vineet Gupta, Philip M. Long
Abstract We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation. This characterization also leads to an algorithm for projecting a convolutional layer onto an operator-norm ball. We show that this is an effective regularizer; for example, it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2% to 5.3%.
Tasks
Published 2018-05-26
URL http://arxiv.org/abs/1805.10408v2
PDF http://arxiv.org/pdf/1805.10408v2.pdf
PWC https://paperswithcode.com/paper/the-singular-values-of-convolutional-layers
Repo
Framework
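
To make the notion of "singular values of a convolutional layer" concrete, here is a minimal sketch of the simplest special case: a single input channel, a single output channel, and wrap-around (circular) boundary handling. Under those assumptions, which are ours and not stated in the abstract, the convolution is a circulant-type linear map, so its singular values are the magnitudes of the 2D discrete Fourier transform of the zero-padded kernel. The multi-channel case needs more than this, and the function name `conv_singular_values` is hypothetical.

```python
import cmath


def conv_singular_values(kernel, n):
    """Singular values of circular 2D convolution on an n x n input.

    Assumes ONE input channel and ONE output channel (an illustrative
    simplification). The values are the magnitudes of the 2D DFT of the
    kernel, zero-padded to n x n, returned in descending order.
    """
    values = []
    for u in range(n):
        for v in range(n):
            total = 0j
            for a, row in enumerate(kernel):
                for b, weight in enumerate(row):
                    angle = -2.0 * cmath.pi * (u * a + v * b) / n
                    total += weight * cmath.exp(1j * angle)
            values.append(abs(total))
    return sorted(values, reverse=True)


box = [[1.0, 1.0], [1.0, 1.0]]       # hypothetical 2x2 averaging-style kernel
sv = conv_singular_values(box, 4)    # 16 singular values for a 4x4 input
operator_norm = sv[0]                # largest singular value
```

The largest value is the operator norm of the layer, the quantity that a projection onto an operator-norm ball would bound.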

Energy-Efficient CMOS Memristive Synapses for Mixed-Signal Neuromorphic System-on-a-Chip

Title Energy-Efficient CMOS Memristive Synapses for Mixed-Signal Neuromorphic System-on-a-Chip
Authors Vishal Saxena, Xinyu Wu, Kehan Zhu
Abstract Emerging non-volatile memory (NVM), or memristive, devices promise energy-efficient realization of deep learning when efficiently integrated with mixed-signal integrated circuits on a CMOS substrate. Even though several algorithmic challenges need to be addressed to turn the vision of memristive Neuromorphic Systems-on-a-Chip (NeuSoCs) into reality, issues at the device and circuit interface need immediate attention from the community. In this work, we perform energy estimation of a NeuSoC system and predict the desirable circuit and device parameters for energy-efficiency optimization. Also, CMOS synapse circuits based on the concept of a CMOS memristor emulator are presented as a system prototyping methodology, while practical memristor devices are being developed and integrated with general-purpose CMOS. The proposed mixed-signal memristive synapse can be designed and fabricated using standard CMOS technologies and opens doors to interesting applications in cognitive computing circuits.
Tasks
Published 2018-02-07
URL http://arxiv.org/abs/1802.02342v3
PDF http://arxiv.org/pdf/1802.02342v3.pdf
PWC https://paperswithcode.com/paper/energy-efficient-cmos-memristive-synapses-for
Repo
Framework

Temporally Object-based Video Co-Segmentation

Title Temporally Object-based Video Co-Segmentation
Authors Michael Ying Yang, Matthias Reso, Jun Tang, Wentong Liao, Bodo Rosenhahn
Abstract In this paper, we propose an unsupervised video object co-segmentation framework based on the primary object proposals to extract the common foreground object(s) from a given video set. In addition to the objectness attributes and motion coherence, our framework exploits the temporal consistency of the object-like regions between adjacent frames to enrich the set of original object proposals. We call the enriched proposal sets temporal proposal streams, as they are composed of the most similar proposals from each frame augmented with predicted proposals using temporally consistent superpixel information. The temporal proposal streams represent all the possible region tubes of the objects. Therefore, we formulate a graphical model to select a proposal stream for each object, in which the pairwise potentials consist of the appearance dissimilarity between different streams in the same video and also the similarity between the streams in different videos. This model is suitable for single (multiple) foreground objects in two (more) videos and can be solved by any existing energy minimization method. We evaluate our proposed framework by comparing it to other video co-segmentation algorithms. Our method achieves improved performance on state-of-the-art benchmark datasets.
Tasks
Published 2018-02-09
URL http://arxiv.org/abs/1802.03279v1
PDF http://arxiv.org/pdf/1802.03279v1.pdf
PWC https://paperswithcode.com/paper/temporally-object-based-video-co-segmentation
Repo
Framework

A Multi-Level Deep Ensemble Model for Skin Lesion Classification in Dermoscopy Images

Title A Multi-Level Deep Ensemble Model for Skin Lesion Classification in Dermoscopy Images
Authors Yutong Xie, Jianpeng Zhang, Yong Xia
Abstract A multi-level deep ensemble (MLDE) model that can be trained in an ‘end to end’ manner is proposed for skin lesion classification in dermoscopy images. In this model, four pre-trained ResNet-50 networks are used to characterize the multiscale information of skin lesions and are combined by using an adaptive weighting scheme that can be learned during the error back propagation. The proposed MLDE model achieved an average AUC value of 86.5% on the ISIC-skin 2018 official validation dataset, which is substantially higher than the average AUC values achieved by each of four ResNet-50 networks.
Tasks Skin Lesion Classification
Published 2018-07-23
URL http://arxiv.org/abs/1807.08488v1
PDF http://arxiv.org/pdf/1807.08488v1.pdf
PWC https://paperswithcode.com/paper/a-multi-level-deep-ensemble-model-for-skin
Repo
Framework

Data Augmentation for Neural Online Chat Response Selection

Title Data Augmentation for Neural Online Chat Response Selection
Authors Wenchao Du, Alan W Black
Abstract Data augmentation seeks to manipulate the available data for training to improve the generalization ability of models. We investigate two data augmentation proxies, permutation and flipping, for the neural dialog response selection task on various models over multiple datasets, including both Chinese and English languages. Unlike standard data augmentation techniques, our method combines the original and synthesized data for prediction. Empirical results show that our approach can gain 1 to 3 recall-at-1 points over baseline models in both full-scale and small-scale settings.
Tasks Data Augmentation
Published 2018-09-03
URL http://arxiv.org/abs/1809.00428v1
PDF http://arxiv.org/pdf/1809.00428v1.pdf
PWC https://paperswithcode.com/paper/data-augmentation-for-neural-online-chat
Repo
Framework

Minimum Margin Loss for Deep Face Recognition

Title Minimum Margin Loss for Deep Face Recognition
Authors Xin Wei, Hui Wang, Bryan Scotney, Huan Wan
Abstract Face recognition has achieved great progress owing to the fast development of deep neural networks in the past few years. As an important part of deep neural networks, a number of loss functions have been proposed which significantly improve the state-of-the-art methods. In this paper, we propose a new loss function called Minimum Margin Loss (MML), which aims at enlarging the margin of those overclose class centre pairs so as to enhance the discriminative ability of the deep features. MML supervises the training process together with the Softmax Loss and the Centre Loss, and also compensates for the shortcomings of Softmax + Centre Loss. The experimental results on the MegaFace, LFW and YTF datasets show that the proposed method achieves state-of-the-art performance, which demonstrates the effectiveness of the proposed MML.
Tasks Face Recognition
Published 2018-05-17
URL http://arxiv.org/abs/1805.06741v4
PDF http://arxiv.org/pdf/1805.06741v4.pdf
PWC https://paperswithcode.com/paper/minimum-margin-loss-for-deep-face-recognition
Repo
Framework
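
The abstract describes MML as enlarging the margin between class centre pairs that are too close. The exact formula is not given here, so the following is only a sketch of one plausible reading: a hinge penalty that is zero for centre pairs at least `margin` apart and grows as a pair gets closer. The function name, the use of plain Euclidean distance, and the example centres are all assumptions.

```python
import math
from itertools import combinations


def min_margin_penalty(centres, margin):
    """Sum of hinge penalties over class-centre pairs closer than `margin`.

    ASSUMPTION: the penalty form max(0, margin - distance) is illustrative;
    the paper's actual loss may differ (e.g. squared distances or scaling).
    """
    penalty = 0.0
    for a, b in combinations(centres, 2):
        distance = math.dist(a, b)
        penalty += max(0.0, margin - distance)
    return penalty


# Hypothetical 2-D class centres: the first and third are only 1.0 apart.
centres = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
penalty = min_margin_penalty(centres, margin=2.0)
```

Only the overclose pair contributes; pairs already separated by more than the margin add nothing, so the term leaves well-separated classes alone.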

Composing Entropic Policies using Divergence Correction

Title Composing Entropic Policies using Divergence Correction
Authors Jonathan J Hunt, Andre Barreto, Timothy P Lillicrap, Nicolas Heess
Abstract Composing previously mastered skills to solve novel tasks promises dramatic improvements in the data efficiency of reinforcement learning. Here, we analyze two recent works composing behaviors represented in the form of action-value functions and show that they perform poorly in some situations. As part of this analysis, we extend an important generalization of policy improvement to the maximum entropy framework and introduce an algorithm for the practical implementation of successor features in continuous action spaces. Then we propose a novel approach which addresses the failure cases of prior work and, in principle, recovers the optimal policy during transfer. This method works by explicitly learning the (discounted, future) divergence between base policies. We study this approach in the tabular case and on non-trivial continuous control problems with compositional structure and show that it outperforms or matches existing methods across all tasks considered.
Tasks Continuous Control
Published 2018-12-05
URL https://arxiv.org/abs/1812.02216v2
PDF https://arxiv.org/pdf/1812.02216v2.pdf
PWC https://paperswithcode.com/paper/entropic-policy-composition-with-generalized
Repo
Framework

Regression by clustering using Metropolis-Hastings

Title Regression by clustering using Metropolis-Hastings
Authors Adolfo Quiroz, Simón Ramírez-Amaya, Álvaro Riascos
Abstract High quality risk adjustment in health insurance markets weakens insurer incentives to engage in inefficient behavior to attract lower-cost enrollees. We propose a novel methodology based on Markov Chain Monte Carlo methods to improve risk adjustment by clustering diagnostic codes into risk groups optimal for health expenditure prediction. We test the performance of our methodology against common alternatives using panel data from 500 thousand enrollees of the Colombian Healthcare System. Results show that our methodology outperforms common alternatives and suggest that it has potential to improve access to quality healthcare for the chronically ill.
Tasks
Published 2018-11-29
URL https://arxiv.org/abs/1811.12295v2
PDF https://arxiv.org/pdf/1811.12295v2.pdf
PWC https://paperswithcode.com/paper/regression-by-clustering-using-metropolis
Repo
Framework

Statistical learning of geometric characteristics of wireless networks

Title Statistical learning of geometric characteristics of wireless networks
Authors Antoine Brochard, Bartłomiej Błaszczyszyn, Stéphane Mallat, Sixin Zhang
Abstract Motivated by the prediction of cell loads in cellular networks, we formulate the following new, fundamental problem of statistical learning of geometric marks of point processes: An unknown marking function, depending on the geometry of point patterns, produces characteristics (marks) of the points. One aims at learning this function from the examples of marked point patterns in order to predict the marks of new point patterns. To approximate (interpolate) the marking function, in our baseline approach, we build a statistical regression model of the marks with respect to some local point distance representation. In a more advanced approach, we use a global data representation via the scattering moments of random measures, which build an informative data representation that is stable to deformations and has already proven useful in image analysis and related application domains. In this case, the regression of the scattering moments of the marked point patterns with respect to the non-marked ones is combined with the numerical solution of the inverse problem, where the marks are recovered from the estimated scattering moments. Considering some simple, generic marks that often appear in the modeling of wireless networks, such as the shot-noise values, nearest neighbour distance, and some characteristics of the Voronoi cells, we show that the scattering moments can capture similar geometry information as the baseline approach, and can reach even better performance, especially for non-local marking functions. Our results motivate further development of statistical learning tools for stochastic geometry and analysis of wireless networks, in particular to predict cell loads in cellular networks from the locations of base stations and traffic demand.
Tasks Point Processes
Published 2018-12-19
URL http://arxiv.org/abs/1812.08265v1
PDF http://arxiv.org/pdf/1812.08265v1.pdf
PWC https://paperswithcode.com/paper/statistical-learning-of-geometric
Repo
Framework
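
One of the geometric marks named in the abstract is the nearest neighbour distance. As a small illustration of what a "marking function" computes from a point pattern, the sketch below assigns each point the distance to its closest other point. The function name and the example pattern are hypothetical, and the brute-force search is chosen for clarity rather than speed.

```python
import math


def nearest_neighbour_marks(points):
    """Return, for each point, the Euclidean distance to its nearest
    other point in the pattern (requires at least two points)."""
    marks = []
    for i, p in enumerate(points):
        marks.append(
            min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
    return marks


# Hypothetical pattern: three base-station locations on a line.
pattern = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
marks = nearest_neighbour_marks(pattern)
```

A learning method in this setting would see many such (pattern, marks) pairs and try to predict the marks of a new pattern.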

CEREALS - Cost-Effective REgion-based Active Learning for Semantic Segmentation

Title CEREALS - Cost-Effective REgion-based Active Learning for Semantic Segmentation
Authors Radek Mackowiak, Philip Lenz, Omair Ghori, Ferran Diego, Oliver Lange, Carsten Rother
Abstract State of the art methods for semantic image segmentation are trained in a supervised fashion using a large corpus of fully labeled training images. However, gathering such a corpus is expensive, due to human annotation effort, in contrast to gathering unlabeled data. We propose an active learning-based strategy, called CEREALS, in which a human only has to hand-label a few, automatically selected, regions within an unlabeled image corpus. This minimizes human annotation effort while maximizing the performance of a semantic image segmentation method. The automatic selection procedure is achieved by: a) using a suitable information measure combined with an estimate about human annotation effort, which is inferred from a learned cost model, and b) exploiting the spatial coherency of an image. The performance of CEREALS is demonstrated on Cityscapes, where we are able to reduce the annotation effort to 17%, while keeping 95% of the mean Intersection over Union (mIoU) of a model that was trained with the fully annotated training set of Cityscapes.
Tasks Active Learning, Semantic Segmentation
Published 2018-10-23
URL http://arxiv.org/abs/1810.09726v1
PDF http://arxiv.org/pdf/1810.09726v1.pdf
PWC https://paperswithcode.com/paper/cereals-cost-effective-region-based-active
Repo
Framework

High-Dimensional Vector Semantics

Title High-Dimensional Vector Semantics
Authors M. Andrecut
Abstract In this paper we explore the “vector semantics” problem from the perspective of the “almost orthogonal” property of high-dimensional random vectors. We show that this intriguing property can be used to “memorize” random vectors by simply adding them, and we provide an efficient probabilistic solution to the set membership problem. Also, we discuss several applications to word context vector embeddings, document sentence similarity, and spam filtering.
Tasks
Published 2018-02-23
URL http://arxiv.org/abs/1802.09914v1
PDF http://arxiv.org/pdf/1802.09914v1.pdf
PWC https://paperswithcode.com/paper/high-dimensional-vector-semantics
Repo
Framework
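
The idea of memorizing random vectors by adding them can be tried in a few lines. The sketch below is an illustration under our own assumptions (random ±1 entries, dimension 4096, a threshold of half the dimension), not the paper's construction. Because independent high-dimensional random vectors are almost orthogonal, the dot product of a stored vector with the sum is close to the dimension, while that of an unrelated vector is close to zero.

```python
import random


def random_sign_vector(dim, rng):
    """Random vector with independent entries drawn from {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


DIM = 4096                      # hypothetical dimension
rng = random.Random(0)          # fixed seed for reproducibility

stored = [random_sign_vector(DIM, rng) for _ in range(16)]
others = [random_sign_vector(DIM, rng) for _ in range(16)]

# "Memorize" the stored vectors by adding them element-wise.
memory = [sum(column) for column in zip(*stored)]


def is_member(vector, memory, dim=DIM):
    """Probabilistic membership test: a stored vector contributes about
    `dim` to the dot product, an unrelated one about 0."""
    return dot(vector, memory) > dim / 2
```

The test is probabilistic: with more stored vectors or a smaller dimension the cross-talk noise grows and errors become more likely.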

Predicting Inpatient Discharge Prioritization With Electronic Health Records

Title Predicting Inpatient Discharge Prioritization With Electronic Health Records
Authors Anand Avati, Stephen Pfohl, Chris Lin, Thao Nguyen, Meng Zhang, Philip Hwang, Jessica Wetstone, Kenneth Jung, Andrew Ng, Nigam H. Shah
Abstract Identifying patients who will be discharged within 24 hours can improve hospital resource management and quality of care. We studied this problem using eight years of Electronic Health Records (EHR) data from Stanford Hospital. We fit models to predict 24 hour discharge across the entire inpatient population. The best performing models achieved an area under the receiver-operator characteristic curve (AUROC) of 0.85 and an AUPRC of 0.53 on a held out test set. This model was also well calibrated. Finally, we analyzed the utility of this model in a decision theoretic framework to identify regions of ROC space in which using the model increases expected utility compared to the trivial always negative or always positive classifiers.
Tasks
Published 2018-12-02
URL http://arxiv.org/abs/1812.00371v1
PDF http://arxiv.org/pdf/1812.00371v1.pdf
PWC https://paperswithcode.com/paper/predicting-inpatient-discharge-prioritization
Repo
Framework
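
The decision-theoretic comparison mentioned in the abstract can be illustrated with the standard expected-utility formula for a binary classifier at a single ROC operating point. This is a sketch only: the utility values, the prevalence, and the operating point (TPR 0.8, FPR 0.2) are hypothetical and are not taken from the paper.

```python
def expected_utility(tpr, fpr, prevalence, u_tp, u_fn, u_fp, u_tn):
    """Expected utility per patient at an ROC operating point (tpr, fpr)."""
    positive = tpr * u_tp + (1.0 - tpr) * u_fn     # patient is truly positive
    negative = fpr * u_fp + (1.0 - fpr) * u_tn     # patient is truly negative
    return prevalence * positive + (1.0 - prevalence) * negative


# Hypothetical utilities and prevalence, for illustration only.
utilities = dict(u_tp=1.0, u_fn=-1.0, u_fp=-0.5, u_tn=0.0)
prevalence = 0.2

model = expected_utility(0.8, 0.2, prevalence, **utilities)
always_negative = expected_utility(0.0, 0.0, prevalence, **utilities)
always_positive = expected_utility(1.0, 1.0, prevalence, **utilities)

model_is_useful = model > max(always_negative, always_positive)
```

The trivial classifiers sit at the corners (0, 0) and (1, 1) of ROC space, so sweeping `tpr` and `fpr` over a grid with this function would trace out the region in which a model beats both.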