Paper Group ANR 383
Perishability of Data: Dynamic Pricing under Varying-Coefficient Models
Title | Perishability of Data: Dynamic Pricing under Varying-Coefficient Models |
Authors | Adel Javanmard |
Abstract | We consider a firm that sells a large number of products to its customers in an online fashion. Each product is described by a high dimensional feature vector, and the market value of a product is assumed to be linear in the values of its features. Parameters of the valuation model are unknown and can change over time. The firm sequentially observes a product’s features and can use the historical sales data (binary sale/no-sale feedback) to set the price of the current product, with the objective of maximizing the collected revenue. We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on projected stochastic gradient descent (PSGD) and characterize its regret in terms of the time horizon $T$, the feature dimension $d$, and the temporal variability in the model parameters, $\delta_t$. We consider two settings. In the first one, feature vectors are chosen antagonistically by nature and we prove that the regret of the PSGD pricing policy is of order $O(\sqrt{T} + \sum_{t=1}^T \sqrt{t}\delta_t)$. In the second setting (referred to as the stochastic features model), the feature vectors are drawn independently from an unknown distribution. We show that in this case, the regret of the PSGD pricing policy is of order $O(d^2 \log T + \sum_{t=1}^T t\delta_t/d)$. |
Tasks | |
Published | 2017-01-13 |
URL | http://arxiv.org/abs/1701.03537v2 |
http://arxiv.org/pdf/1701.03537v2.pdf | |
PWC | https://paperswithcode.com/paper/perishability-of-data-dynamic-pricing-under |
Repo | |
Framework | |
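As a concrete illustration of the policy described above, here is a minimal sketch of a projected-SGD pricing loop. It assumes logistic market noise and a simple greedy pricing rule (both our simplifications); the paper's exact update, pricing rule, and step sizes differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def psgd_pricing(X, theta_star, radius=1.0, seed=0):
    """Run the pricing loop over feature rows X, simulating sales against the
    (unknown to the seller) parameter theta_star. Returns the final estimate
    and the collected revenue."""
    rng = np.random.default_rng(seed)
    T, d = X.shape
    theta = np.zeros(d)                        # running estimate of theta_star
    revenue = 0.0
    for t in range(T):
        x = X[t]
        price = float(theta @ x)               # greedy price from the estimate
        sale = rng.random() < sigmoid(theta_star @ x - price)
        revenue += price if sale else 0.0
        # stochastic gradient of the logistic negative log-likelihood
        grad = (sigmoid(theta @ x - price) - sale) * x
        theta = theta - grad / np.sqrt(t + 1)  # decaying ~1/sqrt(t) step size
        norm = np.linalg.norm(theta)
        if norm > radius:                      # projection onto the l2 ball
            theta *= radius / norm
    return theta, revenue
```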
A Neural Comprehensive Ranker (NCR) for Open-Domain Question Answering
Title | A Neural Comprehensive Ranker (NCR) for Open-Domain Question Answering |
Authors | Bin Bi, Hao Ma |
Abstract | This paper proposes a novel neural machine reading model for open-domain question answering at scale. Existing machine comprehension models typically assume that a short piece of relevant text containing answers is already identified and given to the models, from which the models are designed to extract answers. This assumption, however, is not realistic for building a large-scale open-domain question answering system which requires both deep text understanding and identifying relevant text from a corpus simultaneously. In this paper, we introduce the Neural Comprehensive Ranker (NCR), which integrates both passage ranking and answer extraction in one single framework. A Q&A system based on this framework allows users to issue an open-domain question without needing to provide a piece of text that must contain the answer. Experiments show that the unified NCR model is able to outperform the state-of-the-art in both retrieval of relevant text and answer extraction. |
Tasks | Open-Domain Question Answering, Question Answering, Reading Comprehension |
Published | 2017-09-29 |
URL | http://arxiv.org/abs/1709.10204v2 |
http://arxiv.org/pdf/1709.10204v2.pdf | |
PWC | https://paperswithcode.com/paper/a-neural-comprehensive-ranker-ncr-for-open |
Repo | |
Framework | |
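To make the unified ranking-plus-extraction idea concrete, the following sketch scores answer spans jointly with passage rank probabilities. The interfaces `rank_score_fn` and `span_score_fn` are hypothetical stand-ins, not the paper's architecture.

```python
import numpy as np

def softmax(scores):
    z = scores - scores.max()
    e = np.exp(z)
    return e / e.sum()

def answer_question(question_vec, passages, rank_score_fn, span_score_fn):
    """Score every candidate passage, then weight each passage's best answer
    span by the passage's rank probability, so retrieval and extraction are
    decided jointly rather than in a hard pipeline."""
    rank_probs = softmax(np.array([rank_score_fn(question_vec, p) for p in passages]))
    best_span, best_score = None, -np.inf
    for prob, passage in zip(rank_probs, passages):
        span, span_score = span_score_fn(question_vec, passage)
        joint = prob * span_score              # joint rank + extraction score
        if joint > best_score:
            best_span, best_score = span, joint
    return best_span
```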
Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent
Title | Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent |
Authors | Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel, Xi Wu |
Abstract | Data compression is a popular technique for improving the efficiency of data processing workloads such as SQL queries and, more recently, machine learning (ML) with classical batch gradient methods. But the efficacy of such ideas for mini-batch stochastic gradient descent (MGD), arguably the workhorse algorithm of modern ML, is an open question. MGD’s unique data access pattern renders prior art, including methods designed for batch gradient methods, less effective. We fill this crucial research gap by proposing a new lossless compression scheme we call tuple-oriented compression (TOC) that is inspired by an unlikely source, the string/text compression scheme Lempel-Ziv-Welch, but tailored to MGD in a way that preserves tuple boundaries within mini-batches. We then present a suite of novel compressed matrix operation execution techniques tailored to the TOC compression scheme that operate directly over the compressed data representation and avoid decompression overheads. An extensive empirical evaluation with real-world datasets shows that TOC consistently achieves substantial compression ratios of up to 51x and reduces runtimes for MGD workloads by up to 10.2x in popular ML systems. |
Tasks | |
Published | 2017-02-22 |
URL | http://arxiv.org/abs/1702.06943v3 |
http://arxiv.org/pdf/1702.06943v3.pdf | |
PWC | https://paperswithcode.com/paper/tuple-oriented-compression-for-large-scale |
Repo | |
Framework | |
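The core idea, LZW-style phrase compression that never crosses a tuple boundary, can be sketched in a few lines. This is an illustrative encoder only; the paper's on-disk layout and compressed-execution kernels are not shown.

```python
def toc_encode(rows):
    """Encode each tuple (row of feature values) into a list of dictionary
    codes. Phrase matching never crosses a row boundary, so any mini-batch is
    just a contiguous slice of encoded rows, decodable without touching
    neighboring tuples."""
    dictionary = {}                               # phrase (tuple of values) -> code
    encoded = []
    for row in rows:
        codes, phrase = [], ()
        for v in row:
            ext = phrase + (v,)
            if ext in dictionary:
                phrase = ext                      # extend the longest known phrase
            else:
                if phrase:
                    codes.append(dictionary[phrase])
                dictionary[ext] = len(dictionary) # learn the new, longer phrase
                if (v,) not in dictionary:        # single values always get a code
                    dictionary[(v,)] = len(dictionary)
                phrase = (v,)
        if phrase:
            codes.append(dictionary[phrase])      # flush at the tuple boundary
        encoded.append(codes)
    return encoded, dictionary
```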
Geometry of Factored Nuclear Norm Regularization
Title | Geometry of Factored Nuclear Norm Regularization |
Authors | Qiuwei Li, Zhihui Zhu, Gongguo Tang |
Abstract | This work investigates the geometry of a nonconvex reformulation of minimizing a general convex loss function $f(X)$ regularized by the matrix nuclear norm $\|X\|_*$. Nuclear-norm regularized matrix inverse problems are at the heart of many applications in machine learning, signal processing, and control. The statistical performance of nuclear norm regularization has been studied extensively in the literature using convex analysis techniques. Despite its optimal performance, the resulting optimization has high computational complexity when solved using standard or even tailored fast convex solvers. To develop faster and more scalable algorithms, we follow the proposal of Burer-Monteiro to factor the matrix variable $X$ into the product of two smaller rectangular matrices $X=UV^T$ and also replace the nuclear norm $\|X\|_*$ with $(\|U\|_F^2+\|V\|_F^2)/2$. In spite of the nonconvexity of the factored formulation, we prove that when the convex loss function $f(X)$ is $(2r,4r)$-restricted well-conditioned, each critical point of the factored problem either corresponds to the optimal solution $X^\star$ of the original convex optimization or is a strict saddle point where the Hessian matrix has a strictly negative eigenvalue. Such a geometric structure of the factored formulation allows many local search algorithms to converge to the global optimum with random initializations. |
Tasks | |
Published | 2017-04-05 |
URL | http://arxiv.org/abs/1704.01265v1 |
http://arxiv.org/pdf/1704.01265v1.pdf | |
PWC | https://paperswithcode.com/paper/geometry-of-factored-nuclear-norm |
Repo | |
Framework | |
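For reference, the substitution above rests on a standard variational identity for the nuclear norm; $\lambda$ denotes a regularization weight (our notation, not taken from the abstract):

```latex
% Standard variational characterization of the nuclear norm:
\|X\|_* \;=\; \min_{U,\,V \,:\, X = UV^T} \tfrac{1}{2}\bigl(\|U\|_F^2 + \|V\|_F^2\bigr)
% Hence the convex program  \min_X f(X) + \lambda \|X\|_*  is replaced by the
% smaller, nonconvex factored problem
\min_{U \in \mathbb{R}^{n \times r},\; V \in \mathbb{R}^{m \times r}}\;
  f(UV^T) + \tfrac{\lambda}{2}\bigl(\|U\|_F^2 + \|V\|_F^2\bigr)
```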
On the Suboptimality of Proximal Gradient Descent for $\ell^{0}$ Sparse Approximation
Title | On the Suboptimality of Proximal Gradient Descent for $\ell^{0}$ Sparse Approximation |
Authors | Yingzhen Yang, Jiashi Feng, Nebojsa Jojic, Jianchao Yang, Thomas S. Huang |
Abstract | We study the proximal gradient descent (PGD) method for the $\ell^{0}$ sparse approximation problem as well as its accelerated optimization with randomized algorithms in this paper. We first offer a theoretical analysis of PGD, bounding the gap between the sub-optimal solution obtained by PGD and the globally optimal solution for the $\ell^{0}$ sparse approximation problem under conditions weaker than the Restricted Isometry Property widely used in the compressive sensing literature. Moreover, we propose randomized algorithms to accelerate the optimization by PGD using randomized low rank matrix approximation (PGD-RMA) and randomized dimension reduction (PGD-RDR). Our randomized algorithms substantially reduce the computation cost of the original PGD for the $\ell^{0}$ sparse approximation problem, and the resultant sub-optimal solution still enjoys provable suboptimality: the sub-optimal solution to the reduced problem still has a bounded gap to the globally optimal solution of the original problem. |
Tasks | Compressive Sensing, Dimensionality Reduction |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01230v1 |
http://arxiv.org/pdf/1709.01230v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-suboptimality-of-proximal-gradient |
Repo | |
Framework | |
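The analyzed method itself is short: for the $\ell^{0}$-regularized least-squares objective, the proximal step is elementwise hard thresholding. A minimal sketch, assuming a squared-loss data term and a fixed $1/L$ step size:

```python
import numpy as np

def pgd_l0(A, y, lam, steps=200):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_0 by proximal gradient descent.
    The prox of the step-scaled l0 penalty keeps an entry z_i only when
    |z_i| exceeds sqrt(2 * lam * step), i.e. hard thresholding."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L, L = spectral norm squared
    thresh = np.sqrt(2.0 * lam * step)            # hard-threshold level
    for _ in range(steps):
        z = x - step * (A.T @ (A @ x - y))        # gradient step on the smooth part
        x = np.where(np.abs(z) > thresh, z, 0.0)  # hard-thresholding prox
    return x
```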
DeepPainter: Painter Classification Using Deep Convolutional Autoencoders
Title | DeepPainter: Painter Classification Using Deep Convolutional Autoencoders |
Authors | Eli David, Nathan S. Netanyahu |
Abstract | In this paper we describe the problem of painter classification, and propose a novel approach based on deep convolutional autoencoder neural networks. While previous approaches relied on image processing and manual feature extraction from paintings, our approach operates on the raw pixel level, without any preprocessing or manual feature extraction. We first train a deep convolutional autoencoder on a dataset of paintings, and subsequently use it to initialize a supervised convolutional neural network for the classification phase. The proposed approach substantially outperforms previous methods, improving the previous state-of-the-art for the 3-painter classification problem from 90.44% to 96.52% accuracy, i.e., a 63% reduction in error rate. |
Tasks | |
Published | 2017-11-23 |
URL | http://arxiv.org/abs/1711.08763v1 |
http://arxiv.org/pdf/1711.08763v1.pdf | |
PWC | https://paperswithcode.com/paper/deeppainter-painter-classification-using-deep |
Repo | |
Framework | |
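A minimal sketch of the two-phase recipe, with illustrative (not the paper's) layer sizes: pretrain a convolutional autoencoder on raw pixels, then reuse the encoder to initialize a supervised classifier.

```python
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
)
decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 2, stride=2), nn.Sigmoid(),
)

def pretrain_step(images, opt):
    """Phase 1: unsupervised reconstruction of raw painting pixels."""
    loss = F.mse_loss(decoder(encoder(images)), images)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss

# Phase 2 reuses the pretrained encoder as initialization (3 painter classes).
classifier = nn.Sequential(encoder, nn.Flatten(), nn.LazyLinear(3))

def finetune_step(images, labels, opt):
    """Phase 2: supervised classification fine-tuning."""
    loss = F.cross_entropy(classifier(images), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss
```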
Discriminative Similarity for Clustering and Semi-Supervised Learning
Title | Discriminative Similarity for Clustering and Semi-Supervised Learning |
Authors | Yingzhen Yang, Feng Liang, Nebojsa Jojic, Shuicheng Yan, Jiashi Feng, Thomas S. Huang |
Abstract | Similarity-based clustering and semi-supervised learning methods separate the data into clusters or classes according to the pairwise similarity between the data, and the pairwise similarity is crucial for their performance. In this paper, we propose a novel discriminative similarity learning framework which learns discriminative similarity for either data clustering or semi-supervised learning. The proposed framework learns a classifier from each hypothetical labeling, and searches for the optimal labeling by minimizing the generalization error of the learned classifiers associated with the hypothetical labeling. A kernel classifier is employed in our framework. By generalization analysis via Rademacher complexity, the generalization error bound for the kernel classifier learned from a hypothetical labeling is expressed as the sum of pairwise similarity between the data from different classes, parameterized by the weights of the kernel classifier. Such pairwise similarity serves as the discriminative similarity for the purpose of clustering and semi-supervised learning, and a discriminative similarity of similar form can also be induced by the integrated squared error bound for kernel density classification. Based on the discriminative similarity induced by the kernel classifier, we propose new clustering and semi-supervised learning methods. |
Tasks | |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01231v1 |
http://arxiv.org/pdf/1709.01231v1.pdf | |
PWC | https://paperswithcode.com/paper/discriminative-similarity-for-clustering-and |
Repo | |
Framework | |
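As a toy illustration of the induced objective (a surrogate only; the paper's bound also involves the learned classifier weights), one can compare hypothetical labelings by the total kernel similarity they place across classes:

```python
import numpy as np

def cross_class_similarity(K, labels):
    """Sum of kernel similarity over pairs of points assigned to different
    classes; minimizing it over hypothetical labelings drives clustering."""
    different = labels[:, None] != labels[None, :]
    return K[different].sum() / 2.0               # each unordered pair once

# Toy usage: choose the better of two hypothetical labelings under a Gaussian kernel.
X = np.random.default_rng(0).normal(size=(6, 2))
K = np.exp(-np.linalg.norm(X[:, None] - X[None, :], axis=-1) ** 2)
labeling_a = np.array([0, 0, 0, 1, 1, 1])
labeling_b = np.array([0, 1, 0, 1, 0, 1])
best = min((labeling_a, labeling_b), key=lambda L: cross_class_similarity(K, L))
```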
Simultaneous Optimization of Neural Network Weights and Active Nodes using Metaheuristics
Title | Simultaneous Optimization of Neural Network Weights and Active Nodes using Metaheuristics |
Authors | Varun Kumar Ojha, Ajith Abraham, Vaclav Snasel |
Abstract | The optimization of a neural network (NN) is significantly influenced by the transfer functions used in its active nodes. It has been observed that homogeneity in the activation nodes does not provide the best solution; therefore, customizable transfer functions, whose underlying parameters are subject to optimization, were used to provide heterogeneity to the NN. For the experiments, a meta-heuristic framework using a combined genotype representation of connection weights and transfer-function parameters was used. The performance of adaptive Logistic, Tangent-hyperbolic, Gaussian, and Beta functions was analyzed. In the present work, concise comparisons between the different transfer functions and between the NN optimization algorithms are presented. The comprehensive analysis of the results obtained over the benchmark dataset suggests that the Artificial Bee Colony with adaptive transfer functions provides the best classification accuracy, outperforming particle swarm optimization and differential evolution. |
Tasks | |
Published | 2017-07-06 |
URL | http://arxiv.org/abs/1707.01810v1 |
http://arxiv.org/pdf/1707.01810v1.pdf | |
PWC | https://paperswithcode.com/paper/simultaneous-optimization-of-neural-network |
Repo | |
Framework | |
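A sketch of the combined genotype: weights and per-node transfer-function parameters share one flat vector that any metaheuristic can evolve. The exact layout and the adaptive-logistic parameterization here are our assumptions:

```python
import numpy as np

def decode(genotype, n_in, n_hidden):
    """Split one flat genotype into connection weights and per-node
    transfer-function parameters, mirroring the combined representation
    the abstract describes."""
    k = n_in * n_hidden
    W = genotype[:k].reshape(n_in, n_hidden)       # input -> hidden weights
    w_out = genotype[k : k + n_hidden]             # hidden -> output weights
    slopes = genotype[k + n_hidden :]              # adaptive logistic slopes
    return W, w_out, slopes

def fitness(genotype, X, y, n_in, n_hidden):
    """Classification error of the decoded network; any metaheuristic
    (ABC, PSO, DE) can minimize this over genotypes."""
    W, w_out, slopes = decode(genotype, n_in, n_hidden)
    H = 1.0 / (1.0 + np.exp(-slopes * (X @ W)))    # heterogeneous hidden nodes
    preds = (H @ w_out) > 0.0
    return float(np.mean(preds != y))
```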
Generative Adversarial Perturbations
Title | Generative Adversarial Perturbations |
Authors | Omid Poursaeed, Isay Katsman, Bicheng Gao, Serge Belongie |
Abstract | In this paper, we propose novel generative models for creating adversarial examples, slightly perturbed images resembling natural images but maliciously crafted to fool pre-trained models. We present trainable deep neural networks for transforming images to adversarial perturbations. Our proposed models can produce image-agnostic and image-dependent perturbations for both targeted and non-targeted attacks. We also demonstrate that similar architectures can achieve impressive results in fooling classification and semantic segmentation models, obviating the need for hand-crafting attack methods for each task. Using extensive experiments on challenging high-resolution datasets such as ImageNet and Cityscapes, we show that our perturbations achieve high fooling rates with small perturbation norms. Moreover, our attacks are considerably faster than current iterative methods at inference time. |
Tasks | Semantic Segmentation |
Published | 2017-12-06 |
URL | http://arxiv.org/abs/1712.02328v3 |
http://arxiv.org/pdf/1712.02328v3.pdf | |
PWC | https://paperswithcode.com/paper/generative-adversarial-perturbations |
Repo | |
Framework | |
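A hedged sketch of one training step for an image-dependent, non-targeted attack in this style; `generator` and `classifier` are assumed to be `nn.Module`s, and the tanh-based norm scaling is our simplification:

```python
import torch
import torch.nn.functional as F

def gap_step(generator, classifier, images, eps, opt):
    """One non-targeted step: the generator learns perturbations, scaled to
    an l_inf budget eps, that push the classifier away from its own clean
    predictions."""
    with torch.no_grad():
        clean_labels = classifier(images).argmax(dim=1)
    delta = eps * torch.tanh(generator(images))   # bounded, image-dependent perturbation
    adv = torch.clamp(images + delta, 0.0, 1.0)
    loss = -F.cross_entropy(classifier(adv), clean_labels)  # maximize classifier loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss
```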
Comparative Benchmarking of Causal Discovery Techniques
Title | Comparative Benchmarking of Causal Discovery Techniques |
Authors | Karamjit Singh, Garima Gupta, Vartika Tewari, Gautam Shroff |
Abstract | In this paper we present a comprehensive view of prominent causal discovery algorithms, categorized into two main categories: (1) those assuming acyclicity and no latent variables, and (2) those allowing both cycles and latent variables, along with experimental results comparing them from three perspectives: (a) structural accuracy, (b) standard predictive accuracy, and (c) accuracy of counterfactual inference. For (b) and (c) we train causal Bayesian networks with structures as predicted by each causal discovery technique to carry out counterfactual or standard predictive inference. We compare the causal algorithms on two publicly available datasets and one simulated dataset, with different sample sizes: small, medium, and large. Experiments show that the structural accuracy of a technique does not necessarily correlate with higher accuracy on inference tasks. Further, the surveyed structure learning algorithms do not perform well in terms of structural accuracy on datasets with a large number of variables. |
Tasks | Causal Discovery, Counterfactual Inference |
Published | 2017-08-18 |
URL | http://arxiv.org/abs/1708.06246v2 |
http://arxiv.org/pdf/1708.06246v2.pdf | |
PWC | https://paperswithcode.com/paper/comparative-benchmarking-of-causal-discovery |
Repo | |
Framework | |
Speeding-up the decision making of a learning agent using an ion trap quantum processor
Title | Speeding-up the decision making of a learning agent using an ion trap quantum processor |
Authors | Theeraphot Sriarunothai, Sabine Wölk, Gouri Shankar Giri, Nicolai Friis, Vedran Dunjko, Hans J. Briegel, Christof Wunderlich |
Abstract | We report a proof-of-principle experimental demonstration of the quantum speed-up for learning agents utilizing a small-scale quantum information processor based on radiofrequency-driven trapped ions. The decision-making process of a quantum learning agent within the projective simulation paradigm for machine learning is implemented in a system of two qubits. The latter are realized using hyperfine states of two frequency-addressed atomic ions exposed to a static magnetic field gradient. We show that the deliberation time of this quantum learning agent is quadratically improved with respect to comparable classical learning agents. The performance of this quantum-enhanced learning agent highlights the potential of scalable quantum processors taking advantage of machine learning. |
Tasks | Decision Making |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01366v3 |
http://arxiv.org/pdf/1709.01366v3.pdf | |
PWC | https://paperswithcode.com/paper/speeding-up-the-decision-making-of-a-learning |
Repo | |
Framework | |
Handwritten Urdu Character Recognition using 1-Dimensional BLSTM Classifier
Title | Handwritten Urdu Character Recognition using 1-Dimensional BLSTM Classifier |
Authors | Saad Bin Ahmed, Saeeda Naz, Salahuddin Swati, Muhammad Imran Razzak |
Abstract | The recognition of cursive script is regarded as a subtle task in optical character recognition due to its varied representation. Every cursive script has a different nature and associated challenges. As Urdu is a cursive language derived from the Arabic script, it shares nearly the same challenges and difficulties, and in some respects harder ones. Urdu and Arabic can be distinguished by the writing styles they use: Urdu is mostly written in the Nastaliq style, whereas Arabic follows the Naskh style of writing. This paper presents a new and comprehensive offline Urdu handwriting database named the Urdu-Nastaliq Handwritten Dataset (UNHD). Currently, there is no standard and comprehensive Urdu handwritten dataset publicly available to researchers. The acquired dataset covers commonly used ligatures written by 500 writers in their natural handwriting on A4-size paper. We performed experiments using recurrent neural networks and report a significant accuracy for handwritten Urdu character recognition. |
Tasks | Optical Character Recognition |
Published | 2017-05-15 |
URL | http://arxiv.org/abs/1705.05455v1 |
http://arxiv.org/pdf/1705.05455v1.pdf | |
PWC | https://paperswithcode.com/paper/handwritten-urdu-character-recognition-using |
Repo | |
Framework | |
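A minimal sketch of the kind of 1-D BLSTM recognizer the title refers to; the layer sizes and the CTC-style output head are our assumptions, not details from the paper:

```python
import torch.nn as nn

class BLSTMRecognizer(nn.Module):
    """A 1-D bidirectional LSTM over column features of a text-line image,
    emitting per-timestep character posteriors."""
    def __init__(self, n_features, n_classes, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(n_features, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_classes + 1)   # +1 for the CTC blank

    def forward(self, x):               # x: (batch, time, n_features)
        out, _ = self.blstm(x)
        return self.head(out).log_softmax(dim=-1)
```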
Inter-Subject Analysis: Inferring Sparse Interactions with Dense Intra-Graphs
Title | Inter-Subject Analysis: Inferring Sparse Interactions with Dense Intra-Graphs |
Authors | Cong Ma, Junwei Lu, Han Liu |
Abstract | We develop a new modeling framework for Inter-Subject Analysis (ISA). The goal of ISA is to explore the dependency structure between different subjects with the intra-subject dependency as nuisance. It has important applications in neuroscience for exploring the functional connectivity between brain regions under natural stimuli. Our framework is based on Gaussian graphical models, under which ISA can be converted to the problem of estimation and inference of the inter-subject precision matrix. The main statistical challenge is that we do not impose a sparsity constraint on the whole precision matrix; we only assume the inter-subject part is sparse. For estimation, we propose to estimate an alternative parameter to get around the non-sparsity issue, and this estimator achieves asymptotic consistency even if the intra-subject dependency is dense. For inference, we propose an “untangle and chord” procedure to de-bias our estimator. It is valid without the sparsity assumption on the inverse Hessian of the log-likelihood function. This inferential method is general and can be applied to many other statistical problems, thus it is of independent theoretical interest. Numerical experiments on both simulated and brain imaging data validate our methods and theory. |
Tasks | |
Published | 2017-09-20 |
URL | http://arxiv.org/abs/1709.07036v1 |
http://arxiv.org/pdf/1709.07036v1.pdf | |
PWC | https://paperswithcode.com/paper/inter-subject-analysis-inferring-sparse |
Repo | |
Framework | |
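In our notation (not taken verbatim from the paper), the sparsity assumption can be written as a block-partitioned precision matrix:

```latex
% For two subjects with jointly Gaussian signals x and y, partition the
% precision matrix as
\Omega \;=\;
\begin{pmatrix}
  \Omega_{xx} & \Omega_{xy} \\
  \Omega_{xy}^T & \Omega_{yy}
\end{pmatrix}
% The intra-subject blocks \Omega_{xx}, \Omega_{yy} may be dense; only the
% inter-subject block \Omega_{xy} is assumed sparse, and ISA estimates and
% de-biases its entries.
```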
Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning
Title | Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning |
Authors | James Steven Supancic III, Deva Ramanan |
Abstract | We formulate tracking as an online decision-making process, where a tracking agent must follow an object despite ambiguous image frames and a limited computational budget. Crucially, the agent must decide where to look in the upcoming frames, when to reinitialize because it believes the target has been lost, and when to update its appearance model for the tracked object. Such decisions are typically made heuristically. Instead, we propose to learn an optimal decision-making policy by formulating tracking as a partially observable Markov decision process (POMDP). We learn policies with deep reinforcement learning algorithms that need supervision (a reward signal) only when the track has gone awry. We demonstrate that sparse rewards allow us to quickly train on massive datasets, several orders of magnitude larger than past work. Interestingly, by treating the data source of Internet videos as unlimited streams, we both learn and evaluate our trackers in a single, unified computational stream. |
Tasks | Decision Making |
Published | 2017-07-17 |
URL | http://arxiv.org/abs/1707.04991v1 |
http://arxiv.org/pdf/1707.04991v1.pdf | |
PWC | https://paperswithcode.com/paper/tracking-as-online-decision-making-learning-a |
Repo | |
Framework | |
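A sketch of the online decision loop, with all interfaces (`policy`, `state.observe`, `step_fn`) as hypothetical stand-ins rather than the paper's code:

```python
import numpy as np

ACTIONS = ("track", "reinitialize", "update_appearance")

def run_episode(frames, policy, state, step_fn):
    """Roll the tracker forward: at each frame the policy picks one of three
    actions; rewards are sparse (nonzero only when the track has gone awry),
    which is the supervision signal used for deep RL training."""
    transitions = []
    for frame in frames:
        obs = state.observe(frame)                     # current observation
        action = ACTIONS[int(np.argmax(policy(obs)))]  # greedy action choice
        reward, state = step_fn(state, frame, action)  # usually reward == 0
        transitions.append((obs, action, reward))
    return transitions
```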
Quantifying Program Bias
Title | Quantifying Program Bias |
Authors | Aws Albarghouthi, Loris D’Antoni, Samuel Drews, Aditya Nori |
Abstract | With the range and sensitivity of algorithmic decisions expanding at a break-neck speed, it is imperative that we aggressively investigate whether programs are biased. We propose a novel probabilistic program analysis technique and apply it to quantifying bias in decision-making programs. Specifically, we (i) present a sound and complete automated verification technique for proving quantitative properties of probabilistic programs; (ii) show that certain notions of bias, recently proposed in the fairness literature, can be phrased as quantitative correctness properties; and (iii) present FairSquare, the first verification tool for quantifying program bias, and evaluate it on a range of decision-making programs. |
Tasks | Decision Making |
Published | 2017-02-17 |
URL | http://arxiv.org/abs/1702.05437v2 |
http://arxiv.org/pdf/1702.05437v2.pdf | |
PWC | https://paperswithcode.com/paper/quantifying-program-bias |
Repo | |
Framework | |
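FairSquare verifies such properties exactly; as an illustration of what "quantifying bias" means, here is a sampling-based estimate of one fairness ratio for a toy decision program (everything below is our own example, not the tool's interface):

```python
import random

def hiring_program(minority, score):
    """Toy decision-making program whose bias we want to quantify: it applies
    a stricter cutoff to the minority group."""
    return score > (0.7 if minority else 0.6)

def group_fairness_ratio(n=100_000, seed=0):
    """Monte-Carlo estimate of Pr[hire | minority] / Pr[hire | majority];
    values near 1 indicate demographic-parity-style fairness."""
    rng = random.Random(seed)
    hires = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        minority = rng.random() < 0.3        # assumed 30% minority prior
        hires[minority] += hiring_program(minority, rng.random())
        counts[minority] += 1
    return (hires[True] / counts[True]) / (hires[False] / counts[False])
```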