Paper Group ANR 54
Contextual bandits with surrogate losses: Margin bounds and efficient algorithms
Title | Contextual bandits with surrogate losses: Margin bounds and efficient algorithms |
Authors | Dylan J. Foster, Akshay Krishnamurthy |
Abstract | We use surrogate losses to obtain several new regret bounds and new algorithms for contextual bandit learning. Using the ramp loss, we derive new margin-based regret bounds in terms of standard sequential complexity measures of a benchmark class of real-valued regression functions. Using the hinge loss, we derive an efficient algorithm with a $\sqrt{dT}$-type mistake bound against benchmark policies induced by $d$-dimensional regressors. Under realizability assumptions, our results also yield classical regret bounds. |
Tasks | Multi-Armed Bandits |
Published | 2018-06-28 |
URL | http://arxiv.org/abs/1806.10745v2 |
http://arxiv.org/pdf/1806.10745v2.pdf | |
PWC | https://paperswithcode.com/paper/contextual-bandits-with-surrogate-losses |
Repo | |
Framework | |
Anderson Acceleration for Reinforcement Learning
Title | Anderson Acceleration for Reinforcement Learning |
Authors | Matthieu Geist, Bruno Scherrer |
Abstract | Anderson acceleration is an old and simple method for accelerating the computation of a fixed point. However, as far as we know and quite surprisingly, it has never been applied to dynamic programming or reinforcement learning. In this paper, we briefly explain what Anderson acceleration is and how it can be applied to value iteration, supported by preliminary experiments showing a significant speed-up of convergence, which we critically discuss. We also discuss how this idea could be applied more generally to (deep) reinforcement learning. |
Tasks | |
Published | 2018-09-25 |
URL | http://arxiv.org/abs/1809.09501v1 |
http://arxiv.org/pdf/1809.09501v1.pdf | |
PWC | https://paperswithcode.com/paper/anderson-acceleration-for-reinforcement |
Repo | |
Framework | |
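The fixed-point view makes the idea easy to prototype. Below is a minimal NumPy sketch, not the authors' implementation: value iteration in which each step mixes the last few Bellman iterates with Anderson weights, plus a simple residual-based safeguard (an addition of this sketch, so the toy run cannot diverge). The MDP, memory size, and iteration count are illustrative.

```python
import numpy as np

def bellman(v, P, R, gamma):
    """Optimal Bellman operator. P: (A, S, S) transitions, R: (S, A) rewards."""
    q = R + gamma * (P @ v).T      # (S, A) action values
    return q.max(axis=1)

def anderson_vi(P, R, gamma, m=3, iters=300):
    """Value iteration accelerated with Anderson mixing (memory m).

    Keeps the last m+1 iterates, finds mixing weights (summing to one) that
    minimize the norm of the combined residual, and falls back to a plain
    Bellman update whenever the accelerated step does not contract the
    residual at least as fast as value iteration would.
    """
    v = np.zeros(R.shape[0])
    vs, fs = [], []
    for _ in range(iters):
        fv = bellman(v, P, R, gamma)
        vs.append(v); fs.append(fv)
        vs, fs = vs[-(m + 1):], fs[-(m + 1):]
        res = np.array(fs) - np.array(vs)              # residuals, (k, S)
        gram = res @ res.T
        try:
            w = np.linalg.solve(gram + 1e-10 * np.eye(len(fs)), np.ones(len(fs)))
            alpha = w / w.sum()                        # mixing weights
        except np.linalg.LinAlgError:
            alpha = np.ones(len(fs)) / len(fs)
        cand = alpha @ np.array(fs)                    # accelerated iterate
        # Safeguard (costs one extra Bellman evaluation per step).
        if np.linalg.norm(bellman(cand, P, R, gamma) - cand) <= \
           gamma * np.linalg.norm(fv - v):
            v = cand
        else:
            v = fv
    return v

# Toy MDP: random transitions and rewards.
rng = np.random.default_rng(0)
n_s, n_a = 5, 3
P = rng.random((n_a, n_s, n_s)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_s, n_a))
v_star = anderson_vi(P, R, gamma=0.9)
```

With the safeguard, the residual shrinks by at least the contraction factor every step, so the sketch is never slower than plain value iteration in the worst case.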
Impacts of Dirty Data: and Experimental Evaluation
Title | Impacts of Dirty Data: and Experimental Evaluation |
Authors | Zhixin Qi, Hongzhi Wang, Jianzhong Li, Hong Gao |
Abstract | Data quality issues have attracted widespread attention due to the negative impacts of dirty data on data mining and machine learning results. The relationship between data quality and the accuracy of results could be applied to the selection of an appropriate algorithm in light of data quality, and to the determination of the share of data to clean. However, little research has explored this relationship. Motivated by this, this paper conducts an experimental comparison of the effects of missing, inconsistent, and conflicting data on classification, clustering, and regression algorithms. Based on the experimental findings, we provide guidelines for algorithm selection and data cleaning. |
Tasks | |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.06071v1 |
http://arxiv.org/pdf/1803.06071v1.pdf | |
PWC | https://paperswithcode.com/paper/impacts-of-dirty-data-and-experimental |
Repo | |
Framework | |
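A hypothetical miniature of the kind of experiment the paper describes: inject missing values into clean data, impute, and compare a simple classifier's accuracy against the clean baseline. The data generator, mean imputation, and nearest-centroid classifier are all illustrative choices, not the paper's benchmark suite.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_per_class=200, d=5):
    # Two well-separated Gaussian classes.
    X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, d)),
                   rng.normal(2.0, 1.0, (n_per_class, d))])
    y = np.repeat([0, 1], n_per_class)
    return X, y

def inject_missing(X, rate):
    # Replace a random fraction of cells with NaN ("missing" dirty data).
    X = X.copy()
    X[rng.random(X.shape) < rate] = np.nan
    return X

def mean_impute(X):
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def nearest_centroid_accuracy(X, y):
    # Even rows train, odd rows test (both classes appear in each half).
    Xtr, ytr, Xte, yte = X[0::2], y[0::2], X[1::2], y[1::2]
    c0, c1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1)
            < np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return (pred == yte).mean()

X, y = make_data()
acc_clean = nearest_centroid_accuracy(X, y)
acc_dirty = nearest_centroid_accuracy(mean_impute(inject_missing(X, 0.8)), y)
```

Sweeping the missing rate and plotting accuracy against it gives exactly the kind of quality-vs-accuracy curve that guides the cleaning decision discussed in the abstract.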
Efficient Optimization Algorithms for Robust Principal Component Analysis and Its Variants
Title | Efficient Optimization Algorithms for Robust Principal Component Analysis and Its Variants |
Authors | Shiqian Ma, Necdet Serhat Aybat |
Abstract | Robust PCA has drawn significant attention in the last decade due to its success in numerous application domains, ranging from bio-informatics, statistics, and machine learning to image and video processing in computer vision. Robust PCA and its variants such as sparse PCA and stable PCA can be formulated as optimization problems with exploitable special structures. Many specialized efficient optimization methods have been proposed to solve robust PCA and related problems. In this paper we review existing optimization methods for solving convex and nonconvex relaxations/variants of robust PCA, discuss their advantages and disadvantages, and elaborate on their convergence behaviors. We also provide some insights for possible future research directions including new algorithmic frameworks that might be suitable for implementing on multi-processor setting to handle large-scale problems. |
Tasks | |
Published | 2018-06-09 |
URL | http://arxiv.org/abs/1806.03430v1 |
http://arxiv.org/pdf/1806.03430v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-optimization-algorithms-for-robust |
Repo | |
Framework | |
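As one concrete example of the convex methods such a survey covers, here is a compact sketch of the classical ADMM for principal component pursuit, min ||L||_* + lam·||S||_1 subject to L + S = M, alternating singular-value thresholding and entrywise soft-thresholding. The default lam and penalty parameter follow common conventions and are illustrative.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: prox operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(X, tau):
    """Entrywise soft-thresholding: prox operator of tau * l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def robust_pca(M, lam=None, mu=None, iters=200):
    """ADMM for principal component pursuit:
        min ||L||_* + lam * ||S||_1   s.t.   L + S = M.
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)       # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)    # sparse update
        Y = Y + mu * (M - L - S)                # dual ascent on L + S = M
    return L, S

# Demo: recover a rank-2 matrix corrupted by sparse large-magnitude errors.
rng = np.random.default_rng(0)
L0 = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))
S0 = np.zeros((30, 30))
mask = rng.random((30, 30)) < 0.05
S0[mask] = 5.0 * rng.choice([-1.0, 1.0], size=mask.sum())
L_hat, S_hat = robust_pca(L0 + S0)
```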
Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation
Title | Beyond Weight Tying: Learning Joint Input-Output Embeddings for Neural Machine Translation |
Authors | Nikolaos Pappas, Lesly Miculicich Werlen, James Henderson |
Abstract | Tying the weights of the target word embeddings with the target word classifiers of neural machine translation models leads to faster training and often to better translation quality. Given the success of this parameter sharing, we investigate other forms of sharing that lie between no sharing and hard equality of parameters. In particular, we propose a structure-aware output layer which captures the semantic structure of the output space of words within a joint input-output embedding. The model is a generalized form of weight tying which shares parameters but allows a more flexible relationship with input word embeddings to be learned, and allows the effective capacity of the output layer to be controlled. In addition, the model shares weights across output classifiers and translation contexts, which allows it to better leverage prior knowledge about them. Our evaluation on English-to-Finnish and English-to-German datasets shows the effectiveness of the method against strong encoder-decoder baselines trained with or without weight tying. |
Tasks | Machine Translation, Word Embeddings |
Published | 2018-08-31 |
URL | http://arxiv.org/abs/1808.10681v1 |
http://arxiv.org/pdf/1808.10681v1.pdf | |
PWC | https://paperswithcode.com/paper/beyond-weight-tying-learning-joint-input |
Repo | |
Framework | |
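The generalized-tying idea can be illustrated in a few lines: rather than using the embedding matrix directly as the output classifier (strict tying, which forces the embedding and decoder dimensions to match), score each word by a learned bilinear compatibility between its embedding and the decoder state in a joint space. The dimensions and parameter names below are hypothetical, and the paper's actual layer has additional structure beyond this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_emb, d_hid, d_joint = 1000, 64, 128, 96   # illustrative sizes

E = rng.normal(0, 0.1, (vocab, d_emb))   # word embeddings, shared with the input
U = rng.normal(0, 0.1, (d_joint, d_emb)) # maps word embeddings into a joint space
W = rng.normal(0, 0.1, (d_joint, d_hid)) # maps the decoder state into the same space

def output_logits(h):
    # Generalized weight tying: instead of logits = E @ h (strict tying,
    # requiring d_emb == d_hid), score word j by the bilinear compatibility
    # e_j^T U^T W h in the learned joint space. The ranks of U and W control
    # the effective capacity of the output layer.
    return (E @ U.T) @ (W @ h)

h = rng.normal(size=d_hid)               # a decoder hidden state
logits = output_logits(h)                # one score per vocabulary word
```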
Unsupervised Multimodal Representation Learning across Medical Images and Reports
Title | Unsupervised Multimodal Representation Learning across Medical Images and Reports |
Authors | Tzu-Ming Harry Hsu, Wei-Hung Weng, Willie Boag, Matthew McDermott, Peter Szolovits |
Abstract | Joint embeddings between medical imaging modalities and associated radiology reports have the potential to offer significant benefits to the clinical community, ranging from cross-domain retrieval to conditional generation of reports to the broader goals of multimodal representation learning. In this work, we establish baseline joint embedding results measured via both local and global retrieval methods on the soon to be released MIMIC-CXR dataset consisting of both chest X-ray images and the associated radiology reports. We examine both supervised and unsupervised methods on this task and show that for document retrieval tasks with the learned representations, only a limited amount of supervision is needed to yield results comparable to those of fully-supervised methods. |
Tasks | Representation Learning |
Published | 2018-11-21 |
URL | http://arxiv.org/abs/1811.08615v1 |
http://arxiv.org/pdf/1811.08615v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-multimodal-representation |
Repo | |
Framework | |
Neural Multi-task Learning in Automated Assessment
Title | Neural Multi-task Learning in Automated Assessment |
Authors | Ronan Cummins, Marek Rei |
Abstract | Grammatical error detection and automated essay scoring are two tasks in the area of automated assessment. Traditionally these tasks have been treated independently with different machine learning models and features used for each task. In this paper, we develop a multi-task neural network model that jointly optimises for both tasks, and in particular we show that neural automated essay scoring can be significantly improved. We show that while the essay score provides little evidence to inform grammatical error detection, the essay score is highly influenced by error detection. |
Tasks | Grammatical Error Detection, Multi-Task Learning |
Published | 2018-01-21 |
URL | http://arxiv.org/abs/1801.06830v1 |
http://arxiv.org/pdf/1801.06830v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-multi-task-learning-in-automated |
Repo | |
Framework | |
Voice Disorder Detection Using Long Short Term Memory (LSTM) Model
Title | Voice Disorder Detection Using Long Short Term Memory (LSTM) Model |
Authors | Vibhuti Gupta |
Abstract | Automated detection of voice disorders with computational methods is a recent research area in the medical domain, since accurate diagnosis otherwise requires a rigorous endoscopy. Efficient screening methods are required for the diagnosis of voice disorders so as to provide timely medical care with minimal resources. Detecting voice disorders with computational methods is a challenging problem because audio data is continuous, which makes extracting relevant features and applying machine learning hard and unreliable. This paper proposes a Long Short-Term Memory (LSTM) model to detect pathological voice disorders and evaluates its performance on 400 real, unlabeled test samples. Different feature extraction methods are used to provide the best set of features before applying the LSTM model for classification. The paper describes the approach and experiments that show promising results, with 22% sensitivity, 97% specificity, and 56% unweighted average recall. |
Tasks | |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01779v1 |
http://arxiv.org/pdf/1812.01779v1.pdf | |
PWC | https://paperswithcode.com/paper/voice-disorder-detection-using-long-short |
Repo | |
Framework | |
Learning RUMs: Reducing Mixture to Single Component via PCA
Title | Learning RUMs: Reducing Mixture to Single Component via PCA |
Authors | Devavrat Shah, Dogyoon Song |
Abstract | We consider the problem of learning a mixture of Random Utility Models (RUMs). Despite the success of RUMs in various domains and the versatility of mixture RUMs to capture the heterogeneity in preferences, there has been only limited progress in learning a mixture of RUMs from partial data such as pairwise comparisons. In contrast, there have been significant advances in terms of learning a single component RUM using pairwise comparisons. In this paper, we aim to bridge this gap between mixture learning and single component learning of RUM by developing a 'reduction' procedure. We propose to utilize PCA-based spectral clustering that simultaneously 'de-noises' pairwise comparison data. We prove that our algorithm manages to cluster the partial data correctly (i.e., comparisons from the same RUM component are grouped in the same cluster) with high probability even when data is generated from a possibly {\em heterogeneous} mixture of well-separated {\em generic} RUMs. Both the time and the sample complexities scale polynomially in model parameters including the number of items. Two key features in the analysis are in establishing (1) a meaningful upper bound on the sub-Gaussian norm for RUM components embedded into the vector space of pairwise marginals and (2) the robustness of PCA with missing values in the $L_{2, \infty}$ sense, which might be of interest in their own right. |
Tasks | |
Published | 2018-12-31 |
URL | https://arxiv.org/abs/1812.11917v3 |
https://arxiv.org/pdf/1812.11917v3.pdf | |
PWC | https://paperswithcode.com/paper/mixture-learning-from-partial-observations |
Repo | |
Framework | |
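A toy version of the PCA-then-cluster step: embed each user's pairwise comparisons as a ±1 vector, project onto the top principal direction, and split by sign. The two components here are artificially well separated and every comparison is observed; the paper's algorithm and guarantees cover missing data and generic, heterogeneous RUM mixtures.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_pairs = 100, 20

# Two components with opposite preferences on every item pair: component 0
# prefers the first item of each pair w.p. 0.9, component 1 w.p. 0.1.
labels = np.repeat([0, 1], n_users // 2)
p = np.where(labels[:, None] == 0, 0.9, 0.1)
X = np.where(rng.random((n_users, n_pairs)) < p, 1.0, -1.0)  # comparison outcomes

# PCA via SVD of the centered comparison matrix; cluster each user by the
# sign of their coordinate along the top principal direction.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pred = (U[:, 0] > 0).astype(int)

# The sign of a singular vector is arbitrary, so score up to a label flip.
acc = max((pred == labels).mean(), (pred != labels).mean())
```

Once users are clustered, any single-component RUM learner can be run within each cluster, which is the reduction the abstract describes.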
Causal Modeling with Probabilistic Simulation Models
Title | Causal Modeling with Probabilistic Simulation Models |
Authors | Duligur Ibeling |
Abstract | Recent authors have proposed analyzing conditional reasoning through a notion of intervention on a simulation program, and have found a sound and complete axiomatization of the logic of conditionals in this setting. Here we extend this setting to the case of probabilistic simulation models. We give a natural definition of probability on formulas of the conditional language, allowing for the expression of counterfactuals, and prove foundational results about this definition. We also find an axiomatization for reasoning about linear inequalities involving probabilities in this setting. We prove soundness, completeness, and NP-completeness of the satisfiability problem for this logic. |
Tasks | |
Published | 2018-07-30 |
URL | http://arxiv.org/abs/1807.11139v1 |
http://arxiv.org/pdf/1807.11139v1.pdf | |
PWC | https://paperswithcode.com/paper/causal-modeling-with-probabilistic-simulation |
Repo | |
Framework | |
The role of visual saliency in the automation of seismic interpretation
Title | The role of visual saliency in the automation of seismic interpretation |
Authors | Muhammad Amir Shafiq, Tariq Alshawi, Zhiling Long, Ghassan AlRegib |
Abstract | In this paper, we propose a workflow based on SalSi for the detection and delineation of geological structures such as salt domes. SalSi is a seismic attribute designed based on the modeling of human visual system that detects the salient features and captures the spatial correlation within seismic volumes for delineating seismic structures. Using SalSi, we can not only highlight the neighboring regions of salt domes to assist a seismic interpreter but also delineate such structures using a region growing method and post-processing. The proposed delineation workflow detects the salt-dome boundary with very good precision and accuracy. Experimental results show the effectiveness of the proposed workflow on a real seismic dataset acquired from the North Sea, F3 block. For the subjective evaluation of the results of different salt-dome delineation algorithms, we have used a reference salt-dome boundary interpreted by a geophysicist. For the objective evaluation of results, we have used five different metrics based on pixels, shape, and curvedness to establish the effectiveness of the proposed workflow. The proposed workflow is not only fast but also yields better results as compared to other salt-dome delineation algorithms and shows a promising potential in seismic interpretation. |
Tasks | Seismic Interpretation |
Published | 2018-12-31 |
URL | http://arxiv.org/abs/1812.11960v1 |
http://arxiv.org/pdf/1812.11960v1.pdf | |
PWC | https://paperswithcode.com/paper/the-role-of-visual-saliency-in-the-automation |
Repo | |
Framework | |
Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks
Title | Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks |
Authors | Peter L. Bartlett, David P. Helmbold, Philip M. Long |
Abstract | We analyze algorithms for approximating a function $f(x) = \Phi x$ mapping $\Re^d$ to $\Re^d$ using deep linear neural networks, i.e. that learn a function $h$ parameterized by matrices $\Theta_1, \ldots, \Theta_L$ and defined by $h(x) = \Theta_L \Theta_{L-1} \cdots \Theta_1 x$. We focus on algorithms that learn through gradient descent on the population quadratic loss in the case that the distribution over the inputs is isotropic. We provide polynomial bounds on the number of iterations for gradient descent to approximate the least squares matrix $\Phi$, in the case where the initial hypothesis $\Theta_1 = \cdots = \Theta_L = I$ has excess loss bounded by a small enough constant. On the other hand, we show that gradient descent fails to converge for $\Phi$ whose distance from the identity is a larger constant, and we show that some forms of regularization toward the identity in each layer do not help. If $\Phi$ is symmetric positive definite, we show that an algorithm that initializes $\Theta_i = I$ learns an $\epsilon$-approximation of $f$ using a number of updates polynomial in $L$, the condition number of $\Phi$, and $\log(d/\epsilon)$. In contrast, we show that if the least squares matrix $\Phi$ is symmetric and has a negative eigenvalue, then all members of a class of algorithms that perform gradient descent with identity initialization, and optionally regularize toward the identity in each layer, fail to converge. We analyze an algorithm for the case that $\Phi$ satisfies $u^{\top} \Phi u > 0$ for all $u$, but may not be symmetric. This algorithm uses two regularizers: one that maintains the invariant $u^{\top} \Theta_L \Theta_{L-1} \cdots \Theta_1 u > 0$ for all $u$, and another that "balances" $\Theta_1, \ldots, \Theta_L$ so that they have the same singular values. |
Tasks | |
Published | 2018-02-16 |
URL | http://arxiv.org/abs/1802.06093v4 |
http://arxiv.org/pdf/1802.06093v4.pdf | |
PWC | https://paperswithcode.com/paper/gradient-descent-with-identity-initialization-1 |
Repo | |
Framework | |
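The positive-definite setting can be sketched directly: for isotropic inputs, the population quadratic loss reduces to the Frobenius distance between the end-to-end product and $\Phi$, so we can run gradient descent on that objective from the identity initialization. The depth, learning rate, step count, and target matrix below are illustrative, not the paper's parameter regime.

```python
import numpy as np

def train_deep_linear(Phi, L=4, lr=0.05, steps=2000):
    """Gradient descent on f(Theta) = 0.5 * ||Theta_L ... Theta_1 - Phi||_F^2
    from the identity initialization Theta_1 = ... = Theta_L = I.
    (For isotropic inputs the population quadratic loss reduces to this.)
    """
    d = Phi.shape[0]
    thetas = [np.eye(d) for _ in range(L)]
    for _ in range(steps):
        below = [np.eye(d)]          # below[i] = Theta_i ... Theta_1
        for th in thetas:
            below.append(th @ below[-1])
        above = [np.eye(d)]          # above[j] = Theta_L ... Theta_{L-j+1}
        for th in reversed(thetas):
            above.append(above[-1] @ th)
        E = below[-1] - Phi          # end-to-end residual
        for i in range(L):
            # df/dTheta_{i+1} = (Theta_L..Theta_{i+2})^T E (Theta_i..Theta_1)^T,
            # evaluated at the pre-update weights (simultaneous update).
            grad = above[L - 1 - i].T @ E @ below[i].T
            thetas[i] -= lr * grad
    prod = np.eye(d)
    for th in thetas:
        prod = th @ prod
    return thetas, prod

# A symmetric, diagonally dominant (hence positive definite) target near I.
Phi = np.array([[1.2, 0.3, 0.0],
                [0.3, 0.9, 0.1],
                [0.0, 0.1, 1.1]])
thetas, prod = train_deep_linear(Phi)
```

Replacing $\Phi$ with a symmetric matrix that has a negative eigenvalue makes the same loop stall away from the optimum, in line with the paper's negative result.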
Spectrum concentration in deep residual learning: a free probability approach
Title | Spectrum concentration in deep residual learning: a free probability approach |
Authors | Zenan Ling, Xing He, Robert C. Qiu |
Abstract | We revisit the initialization of deep residual networks (ResNets) by introducing a novel analytical tool from free probability to the deep learning community. This tool deals with non-Hermitian random matrices, rather than their conventional Hermitian counterparts in the literature. As a consequence, this new tool enables us to evaluate the singular value spectrum of the input-output Jacobian of a fully-connected deep ResNet for both linear and nonlinear cases. With the powerful tool of free probability, we conduct an asymptotic analysis of the spectrum in the single-layer case, and then extend this analysis to the multi-layer case with an arbitrary number of layers. In particular, we propose to rescale the classical random initialization by the number of residual units, so that the spectrum has order $O(1)$ even at large width and depth. We empirically demonstrate that the proposed initialization scheme learns orders of magnitude faster than the classical ones, attesting to the strong practical relevance of this investigation. |
Tasks | |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1807.11694v3 |
http://arxiv.org/pdf/1807.11694v3.pdf | |
PWC | https://paperswithcode.com/paper/spectrum-concentration-in-deep-residual |
Repo | |
Framework | |
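The proposed rescaling is easy to check numerically on a linear ResNet: dividing the residual-branch initialization variance by the number of residual units keeps the input-output Jacobian's singular values of order one, while the classical unscaled initialization lets them explode with depth. This is a toy sanity check under assumed conventions, not the paper's free-probability analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d, L = 100, 50   # width and number of residual units

def jacobian(scale):
    # Input-output Jacobian of a linear ResNet x -> (I + W_L) ... (I + W_1) x,
    # with W_l entries drawn i.i.d. N(0, scale / d).
    J = np.eye(d)
    for _ in range(L):
        J = (np.eye(d) + rng.normal(0.0, np.sqrt(scale / d), (d, d))) @ J
    return J

# Rescaled by the number of residual units: variance 1/(d*L) per entry.
s_scaled = np.linalg.svd(jacobian(1.0 / L), compute_uv=False)
# Classical initialization: variance 1/d per entry.
s_classic = np.linalg.svd(jacobian(1.0), compute_uv=False)
```

With the rescaled draw the singular values stay within a few units of one, whereas the classical draw produces a top singular value that grows exponentially in the depth.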
K-nearest Neighbor Search by Random Projection Forests
Title | K-nearest Neighbor Search by Random Projection Forests |
Authors | Donghui Yan, Yingjie Wang, Jin Wang, Honggang Wang, Zhenpeng Li |
Abstract | K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics, and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests) for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees, each constructed recursively through a series of carefully chosen random projections. rpForests achieves remarkable accuracy, in the sense that both the miss rate of true kNNs and the discrepancy in kNN distances decay quickly as the ensemble grows, at very low computational cost. The ensemble nature of rpForests makes it easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing the exponential decay of the probability that neighboring points are separated by ensemble random projection trees as the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable. |
Tasks | |
Published | 2018-12-31 |
URL | http://arxiv.org/abs/1812.11689v1 |
http://arxiv.org/pdf/1812.11689v1.pdf | |
PWC | https://paperswithcode.com/paper/k-nearest-neighbor-search-by-random |
Repo | |
Framework | |
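A minimal sketch of the rpForests idea, with illustrative tree and ensemble sizes: each tree recursively splits the data at the median of a random 1-D projection, a query descends each tree to a leaf, and the union of the leaves is then searched exactly. The paper's method additionally refines how the projections are chosen; here they are plain Gaussian directions.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_rp_tree(X, idx, leaf_size):
    """Recursively split points at the median of a random 1-D projection."""
    if len(idx) <= leaf_size:
        return ('leaf', idx)
    w = rng.normal(size=X.shape[1])            # random projection direction
    proj = X[idx] @ w
    t = np.median(proj)
    left, right = idx[proj <= t], idx[proj > t]
    if len(left) == 0 or len(right) == 0:      # degenerate split: stop here
        return ('leaf', idx)
    return ('node', w, t,
            build_rp_tree(X, left, leaf_size),
            build_rp_tree(X, right, leaf_size))

def query_leaf(tree, q):
    """Descend one tree to the leaf containing the query."""
    while tree[0] == 'node':
        _, w, t, left, right = tree
        tree = left if q @ w <= t else right
    return tree[1]

def forest_knn(trees, X, q, k):
    # Union the candidate sets from all trees, then search the (small)
    # candidate set exactly.
    cand = np.unique(np.concatenate([query_leaf(t, q) for t in trees]))
    dist = np.linalg.norm(X[cand] - q, axis=1)
    return cand[np.argsort(dist)[:k]]

X = rng.normal(size=(500, 8))
trees = [build_rp_tree(X, np.arange(len(X)), leaf_size=20) for _ in range(10)]
neighbors = forest_knn(trees, X, X[0], k=5)
```

Since trees are built and queried independently, the ensemble parallelizes trivially across cores, which is the source of the near-linear speed-up the abstract mentions.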
Posture recognition using an RGB-D camera: exploring 3D body modeling and deep learning approaches
Title | Posture recognition using an RGB-D camera: exploring 3D body modeling and deep learning approaches |
Authors | Mohamed El Amine Elforaici, Ismail Chaaraoui, Wassim Bouachir, Youssef Ouakrim, Neila Mezghani |
Abstract | The emergence of RGB-D sensors offered new possibilities for addressing complex artificial vision problems efficiently. Human posture recognition is among these computer vision problems, with a wide range of applications such as ambient assisted living and intelligent health care systems. In this context, our paper presents novel methods and ideas to design automatic posture recognition systems using an RGB-D camera. More specifically, we introduce two supervised methods to learn and recognize human postures using the main types of visual data provided by an RGB-D camera. The first method is based on convolutional features extracted from 2D images. Convolutional Neural Networks (CNNs) are trained to recognize human postures using transfer learning on RGB and depth images. Secondly, we propose to model the posture using the body joint configuration in the 3D space. Posture recognition is then performed through SVM classification of 3D skeleton-based features. To evaluate the proposed methods, we created a challenging posture recognition dataset with a considerable variability regarding the acquisition conditions. The experimental results demonstrated comparable performances and high precision for both methods in recognizing human postures, with a slight superiority for the CNN-based method when applied on depth images. Moreover, the two approaches demonstrated a high robustness to several perturbation factors, such as scale and orientation change. |
Tasks | Transfer Learning |
Published | 2018-09-30 |
URL | http://arxiv.org/abs/1810.00308v2 |
http://arxiv.org/pdf/1810.00308v2.pdf | |
PWC | https://paperswithcode.com/paper/posture-recognition-using-an-rgb-d-camera |
Repo | |
Framework | |