Paper Group ANR 944
Fast Optimal Bandwidth Selection for RBF Kernel using Reproducing Kernel Hilbert Space Operators for Kernel Based Classifiers. Deep State Space Models for Unconditional Word Generation. MedAL: Deep Active Learning Sampling Method for Medical Image Analysis. Eliminating Latent Discrimination: Train Then Mask. Physics-constrained, data-driven discove …
Fast Optimal Bandwidth Selection for RBF Kernel using Reproducing Kernel Hilbert Space Operators for Kernel Based Classifiers
Title | Fast Optimal Bandwidth Selection for RBF Kernel using Reproducing Kernel Hilbert Space Operators for Kernel Based Classifiers |
Authors | Bharath Bhushan Damodaran |
Abstract | Kernel based methods have shown effective performance in many remote sensing classification tasks. However, their performance depends significantly on their hyper-parameters. The conventional technique to estimate these parameters comes with high computational complexity. Thus, the objective of this letter is to propose a fast and efficient method to select the bandwidth parameter of the Gaussian kernel in kernel based classification methods. The proposed method is developed based on operators in the reproducing kernel Hilbert space, and it is evaluated on support vector machines and the PerTurbo classification method. Experiments conducted with hyperspectral datasets show that our proposed method outperforms the state-of-the-art method in terms of computational time and classification performance. |
Tasks | |
Published | 2018-04-14 |
URL | http://arxiv.org/abs/1804.05214v1 |
http://arxiv.org/pdf/1804.05214v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-optimal-bandwidth-selection-for-rbf |
Repo | |
Framework | |
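The abstract does not spell out the RKHS-operator construction, but the conventional baseline it improves on is grid search over candidate bandwidths. A minimal sketch of such a baseline (the function names and the leave-one-out 1-NN scoring rule are illustrative assumptions, not the paper's method):

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def grid_search_bandwidth(X, y, sigmas):
    """Conventional baseline: pick the sigma with the best leave-one-out
    accuracy of a 1-NN classifier in the kernel-induced similarity."""
    best_sigma, best_acc = None, -1.0
    for sigma in sigmas:
        K = rbf_kernel(X, X, sigma)
        np.fill_diagonal(K, -np.inf)   # exclude self-matches (leave-one-out)
        pred = y[K.argmax(axis=1)]     # nearest neighbour = largest kernel value
        acc = (pred == y).mean()
        if acc > best_acc:
            best_sigma, best_acc = sigma, acc
    return best_sigma
```

The O(|grid| x n^2) kernel evaluations inside the loop are exactly the computational cost that a direct bandwidth-selection rule avoids.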
Deep State Space Models for Unconditional Word Generation
Title | Deep State Space Models for Unconditional Word Generation |
Authors | Florian Schmidt, Thomas Hofmann |
Abstract | Autoregressive feedback is considered a necessity for successful unconditional text generation using stochastic sequence models. However, such feedback is known to introduce systematic biases into the training process, and it obscures a principle of generation: committing to global information and forgetting local nuances. We show that a non-autoregressive deep state space model with a clear separation of global and local uncertainty can be built from only two ingredients: an independent noise source and a deterministic transition function. Recent advances in flow-based variational inference can be used to optimize an evidence lower bound without resorting to annealing, auxiliary losses or similar measures. The result is a highly interpretable generative model on par with comparable autoregressive models on the task of word generation. |
Tasks | Text Generation |
Published | 2018-06-12 |
URL | http://arxiv.org/abs/1806.04550v2 |
http://arxiv.org/pdf/1806.04550v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-state-space-models-for-unconditional |
Repo | |
Framework | |
MedAL: Deep Active Learning Sampling Method for Medical Image Analysis
Title | MedAL: Deep Active Learning Sampling Method for Medical Image Analysis |
Authors | Asim Smailagic, Hae Young Noh, Pedro Costa, Devesh Walawalkar, Kartik Khandelwal, Mostafa Mirshekari, Jonathon Fagert, Adrián Galdrán, Susu Xu |
Abstract | Deep learning models have been successfully used in medical image analysis problems, but they require a large amount of labeled images to obtain good performance. However, such large labeled datasets are costly to acquire. Active learning techniques can be used to minimize the number of required training labels while maximizing the model's performance. In this work, we propose a novel sampling method that queries the unlabeled examples that maximize the average distance to all training set examples in a learned feature space. We then extend our sampling method to define a better initial training set, without the need for a trained model, by using ORB feature descriptors. We validate MedAL on 3 medical image datasets and show that our method is robust to different dataset properties. MedAL is also efficient, achieving 80% accuracy on the task of Diabetic Retinopathy detection using only 425 labeled images, corresponding to a 32% reduction in the number of required labeled examples compared to the standard uncertainty sampling technique, and a 40% reduction compared to random sampling. |
Tasks | Active Learning, Diabetic Retinopathy Detection |
Published | 2018-09-25 |
URL | http://arxiv.org/abs/1809.09287v2 |
http://arxiv.org/pdf/1809.09287v2.pdf | |
PWC | https://paperswithcode.com/paper/medal-deep-active-learning-sampling-method |
Repo | |
Framework | |
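The sampling rule in the abstract, querying the unlabeled example with the largest average distance to the training set in a learned feature space, can be sketched directly. The feature extractor (e.g. a CNN's penultimate layer, or ORB descriptors for the initial set) is assumed to be supplied externally:

```python
import numpy as np

def medal_query(train_feats, unlabeled_feats):
    """MedAL-style sampling sketch: return the index of the unlabeled example
    whose average Euclidean distance to all training examples (in a learned
    feature space) is largest."""
    # pairwise distances, shape (n_unlabeled, n_train)
    d = np.linalg.norm(unlabeled_feats[:, None, :] - train_feats[None, :, :], axis=-1)
    return int(d.mean(axis=1).argmax())
```

In an active-learning loop one would label the returned example, add it to the training set, retrain, and repeat.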
Eliminating Latent Discrimination: Train Then Mask
Title | Eliminating Latent Discrimination: Train Then Mask |
Authors | Soheil Ghili, Ehsan Kazemi, Amin Karbasi |
Abstract | How can we control for latent discrimination in predictive models? How can we provably remove it? Such questions are at the heart of algorithmic fairness and its impacts on society. In this paper, we define a new operational fairness criterion, inspired by the well-understood notion of omitted-variable bias in statistics and econometrics. Our notion of fairness effectively controls for sensitive features and provides diagnostics for deviations from fair decision making. We then establish analytical and algorithmic results about the existence of a fair classifier in the context of supervised learning. Our results readily imply a simple, but rather counter-intuitive, strategy for eliminating latent discrimination. In order to prevent other features from proxying for sensitive features, we need to include sensitive features in the training phase, but exclude them in the test/evaluation phase while controlling for their effects. We evaluate the performance of our algorithm on several real-world datasets and show how fairness for these datasets can be improved with a very small loss in accuracy. |
Tasks | Decision Making |
Published | 2018-11-12 |
URL | http://arxiv.org/abs/1811.04973v2 |
http://arxiv.org/pdf/1811.04973v2.pdf | |
PWC | https://paperswithcode.com/paper/eliminating-latent-discrimination-train-then |
Repo | |
Framework | |
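A minimal sketch of the train-then-mask strategy described above, using plain logistic regression (the model choice and the mask value are illustrative assumptions; the paper's formal criterion and controls are not reproduced): train with the sensitive feature included so other features cannot proxy for it, then fix that feature to a constant at evaluation time.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, steps=500):
    """Plain logistic regression trained WITH the sensitive feature included."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the log-loss
    return w

def predict_masked(w, X, sensitive_idx, mask_value=0.0):
    """Train-then-mask sketch: at test time the sensitive column is replaced
    by a fixed constant, so predictions cannot vary with the sensitive
    attribute by construction."""
    Xm = X.copy()
    Xm[:, sensitive_idx] = mask_value
    return (1.0 / (1.0 + np.exp(-Xm @ w)) > 0.5).astype(int)
```

Because the sensitive column is overwritten before scoring, two individuals differing only in the sensitive attribute necessarily receive the same prediction.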
Physics-constrained, data-driven discovery of coarse-grained dynamics
Title | Physics-constrained, data-driven discovery of coarse-grained dynamics |
Authors | L. Felsberger, P. S. Koutsourelakis |
Abstract | The combination of high-dimensionality and disparity of time scales encountered in many problems in computational physics has motivated the development of coarse-grained (CG) models. In this paper, we advocate the paradigm of data-driven discovery for extracting governing equations by employing fine-scale simulation data. In particular, we cast the coarse-graining process under a probabilistic state-space model where the transition law dictates the evolution of the CG state variables and the emission law the coarse-to-fine map. The directed probabilistic graphical model implied suggests that, given values for the fine-grained (FG) variables, probabilistic inference tools must be employed to identify the corresponding values for the CG states; to that end, we employ Stochastic Variational Inference. We advocate a sparse Bayesian learning perspective which avoids overfitting and reveals the most salient features in the CG evolution law. The formulation adopted enables the quantification of a crucial, and often neglected, component in the CG process, i.e. the predictive uncertainty due to information loss. Furthermore, it is capable of reconstructing the evolution of the full, fine-scale system. We demonstrate the efficacy of the proposed framework in high-dimensional systems of random walkers. |
Tasks | |
Published | 2018-02-11 |
URL | http://arxiv.org/abs/1802.03824v1 |
http://arxiv.org/pdf/1802.03824v1.pdf | |
PWC | https://paperswithcode.com/paper/physics-constrained-data-driven-discovery-of |
Repo | |
Framework | |
Development of Real-time ADAS Object Detector for Deployment on CPU
Title | Development of Real-time ADAS Object Detector for Deployment on CPU |
Authors | Alexander Kozlov, Daniil Osokin |
Abstract | In this work, we outline the set of problems which any Object Detection CNN faces when its development reaches the deployment stage, and propose methods to deal with such difficulties. We show that these practices allow one to get an Object Detection network which can recognize two classes, vehicles and pedestrians, and achieves more than 60 frames per second inference speed on a Core™ i5-6500 CPU. The proposed model is built on top of the popular Single Shot MultiBox Object Detection framework but with substantial improvements, which were inspired by the discovered problems. The network has just 1.96 GMAC complexity and less than 7 MB model size. It is publicly available as part of the Intel® OpenVINO™ Toolkit. |
Tasks | Object Detection |
Published | 2018-11-14 |
URL | http://arxiv.org/abs/1811.05894v1 |
http://arxiv.org/pdf/1811.05894v1.pdf | |
PWC | https://paperswithcode.com/paper/development-of-real-time-adas-object-detector |
Repo | |
Framework | |
An Efficient Semismooth Newton Based Algorithm for Convex Clustering
Title | An Efficient Semismooth Newton Based Algorithm for Convex Clustering |
Authors | Yancheng Yuan, Defeng Sun, Kim-Chuan Toh |
Abstract | Clustering may be the most fundamental problem in unsupervised learning, and it remains active in machine learning research because of its importance in many applications. Popular methods like K-means may suffer from instability, as they are prone to getting stuck in local minima. Recently, the sum-of-norms (SON) model (also known as the clustering path), which is a convex relaxation of the hierarchical clustering model, has been proposed in [7] and [5]. Although numerical algorithms like ADMM and AMA have been proposed to solve the convex clustering model [2], it is known to be very challenging to solve large-scale problems. In this paper, we propose a semi-smooth Newton based augmented Lagrangian method for large-scale convex clustering problems. Extensive numerical experiments on both simulated and real data demonstrate that our algorithm is highly efficient and robust for solving large-scale problems. Moreover, the numerical results also show the superior performance and scalability of our algorithm compared to existing first-order methods. |
Tasks | |
Published | 2018-02-20 |
URL | http://arxiv.org/abs/1802.07091v1 |
http://arxiv.org/pdf/1802.07091v1.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-semismooth-newton-based |
Repo | |
Framework | |
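For context, the sum-of-norms (SON) model referenced in [7] and [5] minimizes a fitting term plus a norm penalty that fuses centroids; points whose centroids coincide at the optimum belong to one cluster. A sketch of the objective being solved (not the paper's semismooth Newton ALM solver):

```python
import numpy as np

def son_objective(X, U, lam):
    """Sum-of-norms (SON) convex clustering objective:
        0.5 * sum_i ||x_i - u_i||^2  +  lam * sum_{i<j} ||u_i - u_j||_2
    X: data points, U: per-point centroids, lam: fusion strength."""
    fit = 0.5 * ((X - U) ** 2).sum()
    n = len(X)
    reg = sum(np.linalg.norm(U[i] - U[j])
              for i in range(n) for j in range(i + 1, n))
    return fit + lam * reg
```

As lam grows, the unsquared norm penalty drives groups of centroids to merge exactly, tracing out the "clustering path".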
Maximum Margin Metric Learning Over Discriminative Nullspace for Person Re-identification
Title | Maximum Margin Metric Learning Over Discriminative Nullspace for Person Re-identification |
Authors | T M Feroz Ali, Subhasis Chaudhuri |
Abstract | In this paper we propose a novel metric learning framework called Nullspace Kernel Maximum Margin Metric Learning (NK3ML) which efficiently addresses the small sample size (SSS) problem inherent in person re-identification and offers a significant performance gain over existing state-of-the-art methods. Taking advantage of the very high dimensionality of the feature space, the metric is learned using a maximum margin criterion (MMC) over a discriminative nullspace where all training sample points of a given class map onto a single point, minimizing the within class scatter. A kernel version of MMC is used to obtain a better between class separation. Extensive experiments on four challenging benchmark datasets for person re-identification demonstrate that the proposed algorithm outperforms all existing methods. We obtain 99.8% rank-1 accuracy on the most widely accepted and challenging dataset VIPeR, compared to the previous state of the art being only 63.92%. |
Tasks | Metric Learning, Person Re-Identification |
Published | 2018-07-28 |
URL | http://arxiv.org/abs/1807.10908v1 |
http://arxiv.org/pdf/1807.10908v1.pdf | |
PWC | https://paperswithcode.com/paper/maximum-margin-metric-learning-over |
Repo | |
Framework | |
HyP-DESPOT: A Hybrid Parallel Algorithm for Online Planning under Uncertainty
Title | HyP-DESPOT: A Hybrid Parallel Algorithm for Online Planning under Uncertainty |
Authors | Panpan Cai, Yuanfu Luo, David Hsu, Wee Sun Lee |
Abstract | Planning under uncertainty is critical for robust robot performance in uncertain, dynamic environments, but it incurs high computational cost. State-of-the-art online search algorithms, such as DESPOT, have vastly improved the computational efficiency of planning under uncertainty and made it a valuable tool for robotics in practice. This work takes one step further by leveraging both CPU and GPU parallelization in order to achieve near real-time online planning performance for complex tasks with large state, action, and observation spaces. Specifically, we propose Hybrid Parallel DESPOT (HyP-DESPOT), a massively parallel online planning algorithm that integrates CPU and GPU parallelism in a multi-level scheme. It performs parallel DESPOT tree search by simultaneously traversing multiple independent paths using multi-core CPUs and performs parallel Monte-Carlo simulations at the leaf nodes of the search tree using GPUs. Experimental results show that HyP-DESPOT speeds up online planning by up to several hundred times, compared with the original DESPOT algorithm, in several challenging robotic tasks in simulation. |
Tasks | |
Published | 2018-02-17 |
URL | http://arxiv.org/abs/1802.06215v1 |
http://arxiv.org/pdf/1802.06215v1.pdf | |
PWC | https://paperswithcode.com/paper/hyp-despot-a-hybrid-parallel-algorithm-for |
Repo | |
Framework | |
Convolutional Neural Network Quantization using Generalized Gamma Distribution
Title | Convolutional Neural Network Quantization using Generalized Gamma Distribution |
Authors | Doyun Kim, Han Young Yim, Sanghyuck Ha, Changgwun Lee, Inyup Kang |
Abstract | As edge applications using convolutional neural network (CNN) models grow, it is becoming necessary to introduce dedicated hardware accelerators in which network parameters and feature-map data are represented with limited precision. In this paper we propose a novel quantization algorithm for energy-efficient deployment of the hardware accelerators. For weights and biases, the optimal bit length of the fractional part is determined so that the quantization error is minimized over their distribution. For feature-map data, meanwhile, the sample distribution is well approximated by the generalized gamma distribution (GGD), and accordingly the optimal quantization step size can be obtained through the asymptotic closed-form solution of the GGD. The proposed quantization algorithm has a higher signal-to-quantization-noise ratio (SQNR) than other quantization schemes previously proposed for CNNs, and can even be further improved by tuning the quantization parameters, resulting in efficient implementation of the hardware accelerators for CNNs in terms of power consumption and memory bandwidth. |
Tasks | Quantization |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1810.13329v1 |
http://arxiv.org/pdf/1810.13329v1.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-neural-network-quantization |
Repo | |
Framework | |
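The weight/bias rule in the abstract, choosing the fractional bit length that minimizes quantization error over the empirical distribution, can be sketched as follows (the signed fixed-point format and MSE criterion here are assumptions; the GGD-based step-size rule for feature maps is not reproduced):

```python
import numpy as np

def quantize_fixed_point(w, total_bits, frac_bits):
    """Quantize to signed fixed point with `frac_bits` fractional bits."""
    step = 2.0 ** (-frac_bits)
    qmax = 2.0 ** (total_bits - 1) - 1
    return np.clip(np.round(w / step), -qmax - 1, qmax) * step

def best_frac_bits(w, total_bits=8):
    """Pick the fractional bit length minimizing mean squared quantization
    error over the empirical weight distribution: more fractional bits give
    finer resolution but a smaller representable range."""
    errs = [((w - quantize_fixed_point(w, total_bits, f)) ** 2).mean()
            for f in range(total_bits)]
    return int(np.argmin(errs))
```

For weights clustered near zero the rule favors many fractional bits; for wide-ranged weights it trades resolution for range.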
ReConvNet: Video Object Segmentation with Spatio-Temporal Features Modulation
Title | ReConvNet: Video Object Segmentation with Spatio-Temporal Features Modulation |
Authors | Francesco Lattari, Marco Ciccone, Matteo Matteucci, Jonathan Masci, Francesco Visin |
Abstract | We introduce ReConvNet, a recurrent convolutional architecture for semi-supervised video object segmentation that is able to quickly adapt its features to focus on any specific object of interest at inference time. Generalization to new objects never observed during training is known to be a hard task for supervised approaches, which would need to be retrained. To tackle this problem, we propose a more efficient solution that learns spatio-temporal features self-adapting to the object of interest via conditional affine transformations. This approach is simple, can be trained end-to-end, and does not necessarily require extra training steps at inference time. Our method shows competitive results on DAVIS2016 with respect to state-of-the-art approaches that use online fine-tuning, and outperforms them on DAVIS2017. ReConvNet also shows promising results on the DAVIS-Challenge 2018, winning 10th position. |
Tasks | Semantic Segmentation, Semi-supervised Video Object Segmentation, Video Object Segmentation, Video Semantic Segmentation |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05510v2 |
http://arxiv.org/pdf/1806.05510v2.pdf | |
PWC | https://paperswithcode.com/paper/reconvnet-video-object-segmentation-with |
Repo | |
Framework | |
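The "conditional affine transformations" in the abstract resemble feature-wise modulation (FiLM-style). A minimal sketch of one such modulation step, where gamma and beta are assumed to be predicted from the object of interest by a conditioning network not shown here:

```python
import numpy as np

def modulate(features, gamma, beta):
    """Conditional affine (FiLM-style) modulation sketch: scale and shift
    each feature channel with parameters conditioned on the target object.
    features: (C, H, W); gamma, beta: (C,)."""
    return gamma[:, None, None] * features + beta[:, None, None]
```

Setting a channel's gamma near zero suppresses it while a large gamma amplifies it, which is how conditioning can steer shared features toward one object without retraining.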
Empirical Bounds on Linear Regions of Deep Rectifier Networks
Title | Empirical Bounds on Linear Regions of Deep Rectifier Networks |
Authors | Thiago Serra, Srikumar Ramalingam |
Abstract | We can compare the expressiveness of neural networks that use rectified linear units (ReLUs) by the number of linear regions, which reflect the number of pieces of the piecewise linear functions modeled by such networks. However, enumerating these regions is prohibitive and the known analytical bounds are identical for networks with same dimensions. In this work, we approximate the number of linear regions through empirical bounds based on features of the trained network and probabilistic inference. Our first contribution is a method to sample the activation patterns defined by ReLUs using universal hash functions. This method is based on a Mixed-Integer Linear Programming (MILP) formulation of the network and an algorithm for probabilistic lower bounds of MILP solution sets that we call MIPBound, which is considerably faster than exact counting and reaches values in similar orders of magnitude. Our second contribution is a tighter activation-based bound for the maximum number of linear regions, which is particularly stronger in networks with narrow layers. Combined, these bounds yield a fast proxy for the number of linear regions of a deep neural network. |
Tasks | |
Published | 2018-10-08 |
URL | https://arxiv.org/abs/1810.03370v3 |
https://arxiv.org/pdf/1810.03370v3.pdf | |
PWC | https://paperswithcode.com/paper/empirical-bounds-on-linear-regions-of-deep |
Repo | |
Framework | |
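A crude way to see why activation patterns relate to linear regions: every distinct ReLU on/off pattern hit by an input lies in a distinct linear region, so counting patterns over random samples gives a naive lower bound. The sketch below illustrates only that idea, not the paper's MILP-based MIPBound:

```python
import numpy as np

def activation_pattern(weights, biases, x):
    """Return the ReLU on/off pattern of one input as a hashable tuple."""
    pattern = []
    h = x
    for W, b in zip(weights, biases):
        pre = W @ h + b
        pattern.extend(pre > 0)     # which units fire at this layer
        h = np.maximum(pre, 0)
    return tuple(pattern)

def sampled_linear_regions(weights, biases, n_samples=10000, seed=0):
    """Naive lower bound on the number of linear regions: count distinct
    activation patterns reached by uniform random inputs in [-1, 1]^d."""
    rng = np.random.default_rng(seed)
    dim = weights[0].shape[1]
    X = rng.uniform(-1, 1, size=(n_samples, dim))
    return len({activation_pattern(weights, biases, x) for x in X})
```

Exact enumeration grows prohibitively with width and depth, which is the motivation for the probabilistic bounds the paper proposes.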
Differential Private Stack Generalization with an Application to Diabetes Prediction
Title | Differential Private Stack Generalization with an Application to Diabetes Prediction |
Authors | Quanming Yao, Xiawei Guo, James T. Kwok, WeiWei Tu, Yuqiang Chen, Wenyuan Dai, Qiang Yang |
Abstract | To meet the standard of differential privacy, noise is usually added to the original data, which inevitably deteriorates the predictive performance of subsequent learning algorithms. In this paper, motivated by the success of improving predictive performance through ensemble learning, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done with either sample-based or feature-based partitioning. However, we prove that when the privacy budgets are the same, feature-based partitioning requires fewer samples than sample-based partitioning, and thus likely has better empirical performance. As transfer learning is difficult to integrate with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we not only demonstrate the effectiveness of our method on two benchmark data sets, i.e., MNIST and NEWS20, but also apply it to a real application of cross-organizational diabetes prediction from the RUIJIN data set, where privacy is of significant concern. |
Tasks | Diabetes Prediction, Feature Importance, Transfer Learning |
Published | 2018-11-23 |
URL | https://arxiv.org/abs/1811.09491v3 |
https://arxiv.org/pdf/1811.09491v3.pdf | |
PWC | https://paperswithcode.com/paper/differential-private-stack-generalization |
Repo | |
Framework | |
Learning to Represent Bilingual Dictionaries
Title | Learning to Represent Bilingual Dictionaries |
Authors | Muhao Chen, Yingtao Tian, Haochen Chen, Kai-Wei Chang, Steven Skiena, Carlo Zaniolo |
Abstract | Bilingual word embeddings have been widely used to capture the similarity of lexical semantics in different human languages. However, many applications, such as cross-lingual semantic search and question answering, can benefit greatly from cross-lingual correspondence between sentences and lexicons. To bridge this gap, we propose a neural embedding model that leverages bilingual dictionaries. The proposed model is trained to map literal word definitions to cross-lingual target words, for which we explore different sentence encoding techniques. To enhance the learning process on limited resources, our model adopts several critical learning strategies, including multi-task learning on different bridges of languages, and joint learning of the dictionary model with a bilingual word embedding model. Experimental evaluation focuses on two applications. The results of the cross-lingual reverse dictionary retrieval task show our model's promising ability to comprehend bilingual concepts based on descriptions, and highlight the effectiveness of the proposed learning strategies in improving performance. Meanwhile, our model effectively addresses the bilingual paraphrase identification problem and significantly outperforms previous approaches. |
Tasks | Multi-Task Learning, Paraphrase Identification, Question Answering, Word Embeddings |
Published | 2018-08-10 |
URL | https://arxiv.org/abs/1808.03726v3 |
https://arxiv.org/pdf/1808.03726v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-represent-bilingual-dictionaries |
Repo | |
Framework | |
Improving Span-based Question Answering Systems with Coarsely Labeled Data
Title | Improving Span-based Question Answering Systems with Coarsely Labeled Data |
Authors | Hao Cheng, Ming-Wei Chang, Kenton Lee, Ankur Parikh, Michael Collins, Kristina Toutanova |
Abstract | We study approaches to improve fine-grained short answer Question Answering models by integrating coarse-grained data annotated for paragraph-level relevance and show that coarsely annotated data can bring significant performance gains. Experiments demonstrate that the standard multi-task learning approach of sharing representations is not the most effective way to leverage coarse-grained annotations. Instead, we can explicitly model the latent fine-grained short answer variables and optimize the marginal log-likelihood directly or use a newly proposed \emph{posterior distillation} learning objective. Since these latent-variable methods have explicit access to the relationship between the fine and coarse tasks, they result in significantly larger improvements from coarse supervision. |
Tasks | Multi-Task Learning, Question Answering |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.02076v1 |
http://arxiv.org/pdf/1811.02076v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-span-based-question-answering |
Repo | |
Framework | |