Paper Group ANR 374
Learning the Multiple Traveling Salesmen Problem with Permutation Invariant Pooling Networks
Title | Learning the Multiple Traveling Salesmen Problem with Permutation Invariant Pooling Networks |
Authors | Yoav Kaempfer, Lior Wolf |
Abstract | While there are optimal TSP solvers, as well as recent learning-based approaches, the generalization of the TSP to the Multiple Traveling Salesmen Problem is much less studied. Here, we design a neural network solution that treats the salesmen, cities and depot as three different sets of varying cardinalities. We apply a novel technique that combines elements from recent architectures that were developed for sets, as well as elements from graph networks. Coupled with new constraint enforcing output layers, a dedicated loss, and a search method, our solution is shown to outperform all the meta-heuristics of the leading solver in the field. |
Tasks | |
Published | 2018-03-26 |
URL | http://arxiv.org/abs/1803.09621v2 |
http://arxiv.org/pdf/1803.09621v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-the-multiple-traveling-salesmen |
Repo | |
Framework | |
Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent
Title | Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent |
Authors | Xiaowu Dai, Yuhua Zhu |
Abstract | Stochastic gradient descent (SGD) is almost ubiquitously used for training non-convex optimization tasks. Recently, a hypothesis proposed by Keskar et al. [2017] that large batch methods tend to converge to sharp minimizers has received increasing attention. We theoretically justify this hypothesis by providing new properties of SGD in both finite-time and asymptotic regimes. In particular, we give an explicit escaping time of SGD from a local minimum in the finite-time regime and prove that SGD tends to converge to flatter minima in the asymptotic regime (although it may take exponential time to converge) regardless of the batch size. We also find that SGD with a larger ratio of learning rate to batch size tends to converge to a flat minimum faster; however, its generalization performance could be worse than that of SGD with a smaller ratio of learning rate to batch size. We include numerical experiments to corroborate these theoretical findings. |
Tasks | |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00542v1 |
http://arxiv.org/pdf/1812.00542v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-theoretical-understanding-of-large |
Repo | |
Framework | |
Linearized Binary Regression
Title | Linearized Binary Regression |
Authors | Andrew S. Lan, Mung Chiang, Christoph Studer |
Abstract | Probit regression was first proposed by Bliss in 1934 to study mortality rates of insects. Since then, an extensive body of work has analyzed and used probit or related binary regression methods (such as logistic regression) in numerous applications and fields. This paper provides a fresh angle to such well-established binary regression methods. Concretely, we demonstrate that linearizing the probit model in combination with linear estimators performs on par with state-of-the-art nonlinear regression methods, such as posterior mean or maximum a posteriori estimation, for a broad range of real-world regression problems. We derive exact, closed-form, and nonasymptotic expressions for the mean-squared error of our linearized estimators, which clearly separates them from nonlinear regression methods that are typically difficult to analyze. We showcase the efficacy of our methods and results for a number of synthetic and real-world datasets, which demonstrates that linearized binary regression finds potential use in a variety of inference, estimation, signal processing, and machine learning applications that deal with binary-valued observations or measurements. |
Tasks | |
Published | 2018-02-01 |
URL | http://arxiv.org/abs/1802.00430v1 |
http://arxiv.org/pdf/1802.00430v1.pdf | |
PWC | https://paperswithcode.com/paper/linearized-binary-regression |
Repo | |
Framework | |
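As a rough illustration of the idea in the abstract above, the sketch below simulates probit-style binary observations and applies an ordinary least-squares estimator directly to the ±1 labels. The problem sizes, the unit noise level, the function name `linear_estimate_from_binary`, and the choice of plain least squares are all assumptions made for illustration; this is not the estimator derived in the paper.

```python
import numpy as np

def linear_estimate_from_binary(A, y):
    """Least-squares fit of binary labels y in {-1, +1} on the rows of A."""
    x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x_hat

rng = np.random.default_rng(0)
M, N = 2000, 10                      # hypothetical numbers of measurements and unknowns
x_true = rng.standard_normal(N)
A = rng.standard_normal((M, N))
# Probit-style observation: the sign of a noisy linear measurement.
y = np.sign(A @ x_true + rng.standard_normal(M))

x_hat = linear_estimate_from_binary(A, y)
# Compare directions only, since this simple estimator does not recover the scale.
cosine = x_hat @ x_true / (np.linalg.norm(x_hat) * np.linalg.norm(x_true))
print(f"cosine similarity between estimate and truth: {cosine:.3f}")
```

The comparison uses cosine similarity because a least-squares fit to sign observations returns a rescaled version of the underlying vector rather than the vector itself.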
The Singular Values of Convolutional Layers
Title | The Singular Values of Convolutional Layers |
Authors | Hanie Sedghi, Vineet Gupta, Philip M. Long |
Abstract | We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation. This characterization also leads to an algorithm for projecting a convolutional layer onto an operator-norm ball. We show that this is an effective regularizer; for example, it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2% to 5.3%. |
Tasks | |
Published | 2018-05-26 |
URL | http://arxiv.org/abs/1805.10408v2 |
http://arxiv.org/pdf/1805.10408v2.pdf | |
PWC | https://paperswithcode.com/paper/the-singular-values-of-convolutional-layers |
Repo | |
Framework | |
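A minimal NumPy sketch of how such a computation can look, assuming circular (wrap-around) padding and a kernel stored as (height, width, input channels, output channels): take the 2D FFT of the kernel padded to the input size, then compute the singular values of the small channel matrix at each frequency. The function name, the kernel layout, and the example sizes are assumptions made for illustration.

```python
import numpy as np

def conv_singular_values(kernel, input_shape):
    """Singular values of a circular 2D multi-channel convolution.

    kernel: array of shape (k, k, c_in, c_out) (assumed layout).
    input_shape: spatial size (n, n) of the feature map.
    Returns an array of shape (n, n, min(c_in, c_out)).
    """
    # One c_in x c_out matrix per spatial frequency.
    transforms = np.fft.fft2(kernel, input_shape, axes=[0, 1])
    # Batched SVD over the leading (frequency) axes.
    return np.linalg.svd(transforms, compute_uv=False)

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 4, 8))    # hypothetical layer
sv = conv_singular_values(kernel, (16, 16))
print("number of singular values:", sv.size)
print("operator norm (largest singular value):", sv.max())
```

Clipping these per-frequency singular values and transforming back is one way to approach the projection mentioned in the abstract, but that step is not shown here.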
Energy-Efficient CMOS Memristive Synapses for Mixed-Signal Neuromorphic System-on-a-Chip
Title | Energy-Efficient CMOS Memristive Synapses for Mixed-Signal Neuromorphic System-on-a-Chip |
Authors | Vishal Saxena, Xinyu Wu, Kehan Zhu |
Abstract | Emerging non-volatile memory (NVM), or memristive, devices promise energy-efficient realization of deep learning, when efficiently integrated with mixed-signal integrated circuits on a CMOS substrate. Even though several algorithmic challenges need to be addressed to turn the vision of memristive Neuromorphic Systems-on-a-Chip (NeuSoCs) into reality, issues at the device and circuit interface need immediate attention from the community. In this work, we perform energy estimation of a NeuSoC system and predict the desirable circuit and device parameters for energy-efficiency optimization. Also, CMOS synapse circuits based on the concept of a CMOS memristor emulator are presented as a system prototyping methodology, while practical memristor devices are being developed and integrated with general-purpose CMOS. The proposed mixed-signal memristive synapse can be designed and fabricated using standard CMOS technologies and opens doors to interesting applications in cognitive computing circuits. |
Tasks | |
Published | 2018-02-07 |
URL | http://arxiv.org/abs/1802.02342v3 |
http://arxiv.org/pdf/1802.02342v3.pdf | |
PWC | https://paperswithcode.com/paper/energy-efficient-cmos-memristive-synapses-for |
Repo | |
Framework | |
Temporally Object-based Video Co-Segmentation
Title | Temporally Object-based Video Co-Segmentation |
Authors | Michael Ying Yang, Matthias Reso, Jun Tang, Wentong Liao, Bodo Rosenhahn |
Abstract | In this paper, we propose an unsupervised video object co-segmentation framework based on the primary object proposals to extract the common foreground object(s) from a given video set. In addition to the objectness attributes and motion coherence, our framework exploits the temporal consistency of the object-like regions between adjacent frames to enrich the set of original object proposals. We call the enriched proposal sets temporal proposal streams, as they are composed of the most similar proposals from each frame augmented with predicted proposals using temporally consistent superpixel information. The temporal proposal streams represent all the possible region tubes of the objects. Therefore, we formulate a graphical model to select a proposal stream for each object in which the pairwise potentials consist of the appearance dissimilarity between different streams in the same video and also the similarity between the streams in different videos. This model is suitable for single (multiple) foreground objects in two (more) videos, which can be solved by any existing energy minimization method. We evaluate our proposed framework by comparing it to other video co-segmentation algorithms. Our method achieves improved performance on state-of-the-art benchmark datasets. |
Tasks | |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03279v1 |
http://arxiv.org/pdf/1802.03279v1.pdf | |
PWC | https://paperswithcode.com/paper/temporally-object-based-video-co-segmentation |
Repo | |
Framework | |
A Multi-Level Deep Ensemble Model for Skin Lesion Classification in Dermoscopy Images
Title | A Multi-Level Deep Ensemble Model for Skin Lesion Classification in Dermoscopy Images |
Authors | Yutong Xie, Jianpeng Zhang, Yong Xia |
Abstract | A multi-level deep ensemble (MLDE) model that can be trained in an ‘end to end’ manner is proposed for skin lesion classification in dermoscopy images. In this model, four pre-trained ResNet-50 networks are used to characterize the multiscale information of skin lesions and are combined by using an adaptive weighting scheme that can be learned during the error back propagation. The proposed MLDE model achieved an average AUC value of 86.5% on the ISIC-skin 2018 official validation dataset, which is substantially higher than the average AUC values achieved by each of the four ResNet-50 networks. |
Tasks | Skin Lesion Classification |
Published | 2018-07-23 |
URL | http://arxiv.org/abs/1807.08488v1 |
http://arxiv.org/pdf/1807.08488v1.pdf | |
PWC | https://paperswithcode.com/paper/a-multi-level-deep-ensemble-model-for-skin |
Repo | |
Framework | |
Data Augmentation for Neural Online Chat Response Selection
Title | Data Augmentation for Neural Online Chat Response Selection |
Authors | Wenchao Du, Alan W Black |
Abstract | Data augmentation seeks to manipulate the available data for training to improve the generalization ability of models. We investigate two data augmentation proxies, permutation and flipping, for the neural dialog response selection task on various models over multiple datasets, including both Chinese and English languages. Different from standard data augmentation techniques, our method combines the original and synthesized data for prediction. Empirical results show that our approach can gain 1 to 3 recall-at-1 points over baseline models in both full-scale and small-scale settings. |
Tasks | Data Augmentation |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00428v1 |
http://arxiv.org/pdf/1809.00428v1.pdf | |
PWC | https://paperswithcode.com/paper/data-augmentation-for-neural-online-chat |
Repo | |
Framework | |
Minimum Margin Loss for Deep Face Recognition
Title | Minimum Margin Loss for Deep Face Recognition |
Authors | Xin Wei, Hui Wang, Bryan Scotney, Huan Wan |
Abstract | Face recognition has achieved great progress owing to the fast development of deep neural networks in the past few years. As an important part of deep neural networks, a number of loss functions have been proposed which significantly improve the state-of-the-art methods. In this paper, we propose a new loss function called Minimum Margin Loss (MML), which aims at enlarging the margin of those class centre pairs that are too close together, so as to enhance the discriminative ability of the deep features. MML supervises the training process together with the Softmax Loss and the Centre Loss, and also compensates for the shortcomings of Softmax + Centre Loss. The experimental results on the MegaFace, LFW and YTF datasets show that the proposed method achieves state-of-the-art performance, which demonstrates the effectiveness of the proposed MML. |
Tasks | Face Recognition |
Published | 2018-05-17 |
URL | http://arxiv.org/abs/1805.06741v4 |
http://arxiv.org/pdf/1805.06741v4.pdf | |
PWC | https://paperswithcode.com/paper/minimum-margin-loss-for-deep-face-recognition |
Repo | |
Framework | |
Composing Entropic Policies using Divergence Correction
Title | Composing Entropic Policies using Divergence Correction |
Authors | Jonathan J Hunt, Andre Barreto, Timothy P Lillicrap, Nicolas Heess |
Abstract | Composing previously mastered skills to solve novel tasks promises dramatic improvements in the data efficiency of reinforcement learning. Here, we analyze two recent works composing behaviors represented in the form of action-value functions and show that they perform poorly in some situations. As part of this analysis, we extend an important generalization of policy improvement to the maximum entropy framework and introduce an algorithm for the practical implementation of successor features in continuous action spaces. Then we propose a novel approach which addresses the failure cases of prior work and, in principle, recovers the optimal policy during transfer. This method works by explicitly learning the (discounted, future) divergence between base policies. We study this approach in the tabular case and on non-trivial continuous control problems with compositional structure and show that it outperforms or matches existing methods across all tasks considered. |
Tasks | Continuous Control |
Published | 2018-12-05 |
URL | https://arxiv.org/abs/1812.02216v2 |
https://arxiv.org/pdf/1812.02216v2.pdf | |
PWC | https://paperswithcode.com/paper/entropic-policy-composition-with-generalized |
Repo | |
Framework | |
Regression by clustering using Metropolis-Hastings
Title | Regression by clustering using Metropolis-Hastings |
Authors | Adolfo Quiroz, Simón Ramírez-Amaya, Álvaro Riascos |
Abstract | High quality risk adjustment in health insurance markets weakens insurer incentives to engage in inefficient behavior to attract lower-cost enrollees. We propose a novel methodology based on Markov Chain Monte Carlo methods to improve risk adjustment by clustering diagnostic codes into risk groups optimal for health expenditure prediction. We test the performance of our methodology against common alternatives using panel data from 500 thousand enrollees of the Colombian Healthcare System. Results show that our methodology outperforms common alternatives and suggest that it has potential to improve access to quality healthcare for the chronically ill. |
Tasks | |
Published | 2018-11-29 |
URL | https://arxiv.org/abs/1811.12295v2 |
https://arxiv.org/pdf/1811.12295v2.pdf | |
PWC | https://paperswithcode.com/paper/regression-by-clustering-using-metropolis |
Repo | |
Framework | |
Statistical learning of geometric characteristics of wireless networks
Title | Statistical learning of geometric characteristics of wireless networks |
Authors | Antoine Brochard, Bartłomiej Błaszczyszyn, Stéphane Mallat, Sixin Zhang |
Abstract | Motivated by the prediction of cell loads in cellular networks, we formulate the following new, fundamental problem of statistical learning of geometric marks of point processes: An unknown marking function, depending on the geometry of point patterns, produces characteristics (marks) of the points. One aims at learning this function from the examples of marked point patterns in order to predict the marks of new point patterns. To approximate (interpolate) the marking function, in our baseline approach, we build a statistical regression model of the marks with respect to some local point distance representation. In a more advanced approach, we use a global data representation via the scattering moments of random measures, which build an informative data representation that is stable to deformations and has already proven useful in image analysis and related application domains. In this case, the regression of the scattering moments of the marked point patterns with respect to the non-marked ones is combined with the numerical solution of the inverse problem, where the marks are recovered from the estimated scattering moments. Considering some simple, generic marks, often appearing in the modeling of wireless networks, such as the shot-noise values, nearest neighbour distance, and some characteristics of the Voronoi cells, we show that the scattering moments can capture similar geometry information as the baseline approach, and can reach even better performance, especially for non-local marking functions. Our results motivate further development of statistical learning tools for stochastic geometry and analysis of wireless networks, in particular to predict cell loads in cellular networks from the locations of base stations and traffic demand. |
Tasks | Point Processes |
Published | 2018-12-19 |
URL | http://arxiv.org/abs/1812.08265v1 |
http://arxiv.org/pdf/1812.08265v1.pdf | |
PWC | https://paperswithcode.com/paper/statistical-learning-of-geometric |
Repo | |
Framework | |
CEREALS - Cost-Effective REgion-based Active Learning for Semantic Segmentation
Title | CEREALS - Cost-Effective REgion-based Active Learning for Semantic Segmentation |
Authors | Radek Mackowiak, Philip Lenz, Omair Ghori, Ferran Diego, Oliver Lange, Carsten Rother |
Abstract | State of the art methods for semantic image segmentation are trained in a supervised fashion using a large corpus of fully labeled training images. However, gathering such a corpus is expensive, due to human annotation effort, in contrast to gathering unlabeled data. We propose an active learning-based strategy, called CEREALS, in which a human only has to hand-label a few, automatically selected, regions within an unlabeled image corpus. This minimizes human annotation effort while maximizing the performance of a semantic image segmentation method. The automatic selection procedure is achieved by: a) using a suitable information measure combined with an estimate about human annotation effort, which is inferred from a learned cost model, and b) exploiting the spatial coherency of an image. The performance of CEREALS is demonstrated on Cityscapes, where we are able to reduce the annotation effort to 17%, while keeping 95% of the mean Intersection over Union (mIoU) of a model that was trained with the fully annotated training set of Cityscapes. |
Tasks | Active Learning, Semantic Segmentation |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.09726v1 |
http://arxiv.org/pdf/1810.09726v1.pdf | |
PWC | https://paperswithcode.com/paper/cereals-cost-effective-region-based-active |
Repo | |
Framework | |
High-Dimensional Vector Semantics
Title | High-Dimensional Vector Semantics |
Authors | M. Andrecut |
Abstract | In this paper we explore the “vector semantics” problem from the perspective of the “almost orthogonal” property of high-dimensional random vectors. We show that this intriguing property can be used to “memorize” random vectors by simply adding them, and we provide an efficient probabilistic solution to the set membership problem. Also, we discuss several applications to word context vector embeddings, document sentence similarity, and spam filtering. |
Tasks | |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.09914v1 |
http://arxiv.org/pdf/1802.09914v1.pdf | |
PWC | https://paperswithcode.com/paper/high-dimensional-vector-semantics |
Repo | |
Framework | |
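The "memorize by adding" idea in the abstract above can be sketched as follows. Random high-dimensional vectors are nearly orthogonal, so the dot product of a query with the sum of the stored vectors is large for members and close to zero for unrelated vectors. The use of ±1 entries, the dimension, the number of stored vectors, and the 0.5 threshold are all assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10_000          # hypothetical dimensionality
N_STORED = 50         # hypothetical number of memorized vectors

def random_bipolar(count, dim, rng):
    """Draw `count` random vectors with independent +1/-1 entries."""
    return rng.choice([-1.0, 1.0], size=(count, dim))

stored = random_bipolar(N_STORED, DIM, rng)
memory = stored.sum(axis=0)           # "memorize" the set by simple addition

def is_member(vector, memory, threshold=0.5):
    """Normalized score is about 1 for stored vectors, about 0 otherwise."""
    return (vector @ memory) / vector.size > threshold

outsider = random_bipolar(1, DIM, rng)[0]
print("stored vector recognized:", is_member(stored[0], memory))
print("unrelated vector recognized:", is_member(outsider, memory))
```

The test is probabilistic: the cross-talk between vectors grows with the number of stored items, so the dimension has to be large relative to the size of the set for the two cases to separate cleanly.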
Predicting Inpatient Discharge Prioritization With Electronic Health Records
Title | Predicting Inpatient Discharge Prioritization With Electronic Health Records |
Authors | Anand Avati, Stephen Pfohl, Chris Lin, Thao Nguyen, Meng Zhang, Philip Hwang, Jessica Wetstone, Kenneth Jung, Andrew Ng, Nigam H. Shah |
Abstract | Identifying patients who will be discharged within 24 hours can improve hospital resource management and quality of care. We studied this problem using eight years of Electronic Health Records (EHR) data from Stanford Hospital. We fit models to predict 24-hour discharge across the entire inpatient population. The best performing model achieved an area under the receiver operating characteristic curve (AUROC) of 0.85 and an AUPRC of 0.53 on a held-out test set. This model was also well calibrated. Finally, we analyzed the utility of this model in a decision-theoretic framework to identify regions of ROC space in which using the model increases expected utility compared to the trivial always-negative or always-positive classifiers. |
Tasks | |
Published | 2018-12-02 |
URL | http://arxiv.org/abs/1812.00371v1 |
http://arxiv.org/pdf/1812.00371v1.pdf | |
PWC | https://paperswithcode.com/paper/predicting-inpatient-discharge-prioritization |
Repo | |
Framework | |
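One way to read the decision-theoretic comparison in the abstract above is as an expected-utility calculation at a chosen ROC operating point. The sketch below is a generic formulation under stated assumptions: the prevalence, the four utility values, the operating point, and the function names are hypothetical and are not taken from the paper.

```python
def expected_utility(tpr, fpr, prevalence, u_tp, u_fn, u_fp, u_tn):
    """Expected utility per patient of a classifier at ROC point (fpr, tpr)."""
    positives = prevalence * (tpr * u_tp + (1.0 - tpr) * u_fn)
    negatives = (1.0 - prevalence) * (fpr * u_fp + (1.0 - fpr) * u_tn)
    return positives + negatives

def beats_trivial(tpr, fpr, prevalence, **utilities):
    """True if the operating point beats both trivial classifiers."""
    model = expected_utility(tpr, fpr, prevalence, **utilities)
    always_negative = expected_utility(0.0, 0.0, prevalence, **utilities)
    always_positive = expected_utility(1.0, 1.0, prevalence, **utilities)
    return model > max(always_negative, always_positive)

# Hypothetical utilities: a correct discharge flag is worth 1, a false alarm costs 0.5.
utilities = dict(u_tp=1.0, u_fn=0.0, u_fp=-0.5, u_tn=0.0)
eu = expected_utility(tpr=0.6, fpr=0.1, prevalence=0.2, **utilities)
print("expected utility at the operating point:", eu)
print("beats trivial classifiers:", beats_trivial(0.6, 0.1, 0.2, **utilities))
```

Sweeping `tpr` and `fpr` over a grid with fixed utilities would trace out the region of ROC space where the model is preferable to always predicting one class.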