May 5, 2019

Paper Group ANR 460

SoftTarget Regularization: An Effective Technique to Reduce Over-Fitting in Neural Networks. OBDA Constraints for Effective Query Answering (Extended Version). High Performance Software in Multidimensional Reduction Methods for Image Processing with Application to Ancient Manuscripts. Non-Convex Projected Gradient Descent for Generalized Low-Rank T …

SoftTarget Regularization: An Effective Technique to Reduce Over-Fitting in Neural Networks

Title SoftTarget Regularization: An Effective Technique to Reduce Over-Fitting in Neural Networks
Authors Armen Aghajanyan
Abstract Deep neural networks are learning models with a very high capacity and are therefore prone to over-fitting. Many regularization techniques such as Dropout, DropConnect, and weight decay attempt to solve the problem of over-fitting by reducing the capacity of their respective models (Srivastava et al., 2014), (Wan et al., 2013), (Krogh & Hertz, 1992). In this paper we introduce a new form of regularization that guides the learning problem in a way that reduces over-fitting without sacrificing the capacity of the model. The mistakes that models make in early stages of training carry information about the learning problem. By adjusting the labels of the current epoch of training through a weighted average of the real labels and an exponential average of the past soft-targets, we achieved a regularization scheme as powerful as Dropout without necessarily reducing the capacity of the model, and simplified the complexity of the learning problem. SoftTarget regularization proved to be an effective tool in various neural network architectures.
Tasks
Published 2016-09-21
URL http://arxiv.org/abs/1609.06693v3
PDF http://arxiv.org/pdf/1609.06693v3.pdf
PWC https://paperswithcode.com/paper/softtarget-regularization-an-effective
Repo
Framework
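
Sketch (not from the paper): the label adjustment described in the abstract can be illustrated roughly as follows. The function names and the mixing weights `beta` and `gamma` are hypothetical, and the paper's actual schedule and hyperparameters may differ; this only shows an exponential average of past predictions being blended with the real one-hot label.

```python
def update_soft_targets(running_avg, predictions, beta):
    """Exponential moving average of past predictions (the soft-targets).

    beta is a hypothetical decay weight in [0, 1].
    """
    return [beta * r + (1.0 - beta) * p for r, p in zip(running_avg, predictions)]


def mix_labels(hard_label, running_avg, gamma):
    """Weighted average of the real label and the averaged soft-target.

    gamma is a hypothetical mixing weight in [0, 1]; gamma = 0 keeps the real label.
    """
    return [gamma * r + (1.0 - gamma) * h for h, r in zip(hard_label, running_avg)]


# One training example with three classes; the real label is class 1.
hard = [0.0, 1.0, 0.0]
running = list(hard)                      # start the average at the real label
epoch_predictions = [0.2, 0.7, 0.1]       # made-up network output for this epoch

running = update_soft_targets(running, epoch_predictions, beta=0.5)
target_for_next_epoch = mix_labels(hard, running, gamma=0.5)
```

Because both steps are convex combinations of probability vectors, the adjusted target still sums to one and can be used directly with a cross-entropy loss.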

OBDA Constraints for Effective Query Answering (Extended Version)

Title OBDA Constraints for Effective Query Answering (Extended Version)
Authors Dag Hovland, Davide Lanti, Martin Rezk, Guohui Xiao
Abstract In Ontology Based Data Access (OBDA) users pose SPARQL queries over an ontology that lies on top of relational data sources. These queries are translated on-the-fly into SQL queries by OBDA systems. Standard SPARQL-to-SQL translation techniques in OBDA often produce SQL queries containing redundant joins and unions, even after a number of semantic and structural optimizations. These redundancies are detrimental to the performance of query answering, especially in complex industrial OBDA scenarios with large enterprise databases. To address this issue, we introduce two novel notions of OBDA constraints and show how to exploit them for efficient query answering. We conduct an extensive set of experiments on large datasets using real-world data and queries, showing that these techniques improve the performance of query answering by up to orders of magnitude.
Tasks
Published 2016-05-13
URL http://arxiv.org/abs/1605.04263v2
PDF http://arxiv.org/pdf/1605.04263v2.pdf
PWC https://paperswithcode.com/paper/obda-constraints-for-effective-query
Repo
Framework

High Performance Software in Multidimensional Reduction Methods for Image Processing with Application to Ancient Manuscripts

Title High Performance Software in Multidimensional Reduction Methods for Image Processing with Application to Ancient Manuscripts
Authors Corneliu T. C. Arsene, Stephen Church, Mark Dickinson
Abstract Multispectral imaging is an important technique for improving the readability of written or printed text where the letters have faded, either due to deliberate erasing or simply due to the ravages of time. Often the text can be read simply by looking at individual wavelengths, but in some cases the images need further enhancement to maximise the chances of reading the text. There are many possible enhancement techniques and this paper assesses and compares an extended set of dimensionality reduction methods for image processing. We assess 15 dimensionality reduction methods on two different manuscripts. This assessment was performed both subjectively, by asking scholars who were experts in the languages used in the manuscripts which of the techniques they preferred, and objectively, by using the Davies-Bouldin and Dunn indexes to assess the quality of the resulting image clusters. We found that the Canonical Variates Analysis (CVA) method, which used a Matlab implementation and which we have used previously to enhance multispectral images, was indeed superior to all the other tested methods. However, it is very likely that other approaches will be more suitable in specific circumstances, so we would still recommend that a range of these techniques be tried. In particular, CVA is a supervised clustering technique, so it requires considerably more user time and effort than a non-supervised technique such as the much more commonly used Principal Component Analysis (PCA) approach. If the results from PCA are adequate to allow a text to be read, then the added effort required for CVA may not be justified. For the purposes of comparing the computational times and the image results, a CVA method is also implemented in the C programming language using the GNU (GNU's Not Unix) Scientific Library (GSL) and the OpenCV (Open Source Computer Vision) programming library.
Tasks Dimensionality Reduction
Published 2016-12-19
URL http://arxiv.org/abs/1612.06457v5
PDF http://arxiv.org/pdf/1612.06457v5.pdf
PWC https://paperswithcode.com/paper/high-performance-software-in-multidimensional
Repo
Framework
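
Sketch (not from the paper): the Davies-Bouldin index named in the abstract can be computed with a few lines of standard-library Python. This follows the usual textbook definition with Euclidean distances, not the authors' Matlab or C code; the function names and the toy clusters are hypothetical. It assumes at least two clusters with distinct centroids.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def davies_bouldin(clusters):
    """Davies-Bouldin index of a clustering; lower values mean better separation."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each cluster's points to its own centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


tight = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 4.0)], [(3.0, 0.0), (3.0, 4.0)]]
db_tight = davies_bouldin(tight)
db_loose = davies_bouldin(loose)
```

Compact, well-separated clusters (`tight`) score lower than spread-out, nearby ones (`loose`), which is the sense in which the index ranks the enhanced images' clusters.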

Non-Convex Projected Gradient Descent for Generalized Low-Rank Tensor Regression

Title Non-Convex Projected Gradient Descent for Generalized Low-Rank Tensor Regression
Authors Han Chen, Garvesh Raskutti, Ming Yuan
Abstract In this paper, we consider the problem of learning high-dimensional tensor regression problems with low-rank structure. One of the core challenges associated with learning high-dimensional models is computation since the underlying optimization problems are often non-convex. While convex relaxations could lead to polynomial-time algorithms they are often slow in practice. On the other hand, limited theoretical guarantees exist for non-convex methods. In this paper we present a general framework that provides theoretical guarantees for learning high-dimensional tensor regression models under different low-rank structural assumptions using the projected gradient descent algorithm applied to a potentially non-convex constraint set $\Theta$ in terms of its localized Gaussian width. We juxtapose our theoretical results for non-convex projected gradient descent algorithms with previous results on regularized convex approaches. The two main differences between the convex and non-convex approach are: (i) from a computational perspective whether the non-convex projection operator is computable and whether the projection has desirable contraction properties and (ii) from a statistical upper bound perspective, the non-convex approach has a superior rate for a number of examples. We provide three concrete examples of low-dimensional structure which address these issues and explain the pros and cons for the non-convex and convex approaches. We supplement our theoretical results with simulations which show that, under several common settings of generalized low rank tensor regression, the projected gradient descent approach is superior both in terms of statistical error and run-time provided the step-sizes of the projected descent algorithm are suitably chosen.
Tasks
Published 2016-11-30
URL http://arxiv.org/abs/1611.10349v1
PDF http://arxiv.org/pdf/1611.10349v1.pdf
PWC https://paperswithcode.com/paper/non-convex-projected-gradient-descent-for
Repo
Framework
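
Sketch (not from the paper): the generic projected gradient descent iteration the abstract refers to can be written as below. The loss $f$, the step size $\eta$, and the iterate notation $\theta^{(t)}$ are assumed symbols; only the constraint set $\Theta$ appears in the abstract, and the paper's exact formulation may differ.

```latex
\theta^{(t+1)} = P_{\Theta}\!\left(\theta^{(t)} - \eta\,\nabla f\big(\theta^{(t)}\big)\right),
\qquad
P_{\Theta}(v) \in \operatorname*{arg\,min}_{\theta \in \Theta} \;\lVert \theta - v \rVert_{F}.
```

When $\Theta$ is non-convex (for example, a set of low-rank tensors), the projection $P_{\Theta}$ need not be unique or cheap to compute, which is the computational question raised in point (i) of the abstract.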

A Survey of Multi-View Representation Learning

Title A Survey of Multi-View Representation Learning
Authors Yingming Li, Ming Yang, Zhongfei Zhang
Abstract Recently, multi-view representation learning has become a rapidly growing direction in machine learning and data mining areas. This paper introduces two categories for multi-view representation learning: multi-view representation alignment and multi-view representation fusion. Consequently, we first review the representative methods and theories of multi-view representation learning based on the perspective of alignment, such as correlation-based alignment. Representative examples are canonical correlation analysis (CCA) and its several extensions. Then from the perspective of representation fusion we investigate the advancement of multi-view representation learning that ranges from generative methods including multi-modal topic learning, multi-view sparse coding, and multi-view latent space Markov networks, to neural network-based methods including multi-modal autoencoders, multi-view convolutional neural networks, and multi-modal recurrent neural networks. Further, we also investigate several important applications of multi-view representation learning. Overall, this survey aims to provide an insightful overview of theoretical foundation and state-of-the-art developments in the field of multi-view representation learning and to help researchers find the most appropriate tools for particular applications.
Tasks Representation Learning
Published 2016-10-03
URL http://arxiv.org/abs/1610.01206v5
PDF http://arxiv.org/pdf/1610.01206v5.pdf
PWC https://paperswithcode.com/paper/a-survey-of-multi-view-representation
Repo
Framework

The Peaking Phenomenon in Semi-supervised Learning

Title The Peaking Phenomenon in Semi-supervised Learning
Authors Jesse H. Krijthe, Marco Loog
Abstract For the supervised least squares classifier, when the number of training objects is smaller than the dimensionality of the data, adding more data to the training set may first increase the error rate before decreasing it. This possibly counterintuitive phenomenon is known as peaking. In this work, we observe that a similar but more pronounced version of this phenomenon also occurs in the semi-supervised setting, where instead of labeled objects, unlabeled objects are added to the training set. We explain why the learning curve has a steeper incline and a more gradual decline in this setting through simulation studies and by applying an approximation of the learning curve based on the work by Raudys & Duin.
Tasks
Published 2016-10-17
URL http://arxiv.org/abs/1610.05160v1
PDF http://arxiv.org/pdf/1610.05160v1.pdf
PWC https://paperswithcode.com/paper/the-peaking-phenomenon-in-semi-supervised
Repo
Framework

Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization

Title Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization
Authors Amelia Perry, Alexander S. Wein, Afonso S. Bandeira, Ankur Moitra
Abstract A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, in which a prominent eigenvector is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the signal strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, not all the information about the spike is necessarily contained in the spectrum. We study the fundamental limitations of statistical methods, including non-spectral ones. Our results include: I) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for a variety of benign priors for the spike. We extend previous work on the spherically symmetric and i.i.d. Rademacher priors through an elementary, unified analysis. II) For any non-Gaussian Wigner ensemble, we show that PCA is always suboptimal for detection. However, a variant of PCA achieves the optimal threshold (for benign priors) by pre-transforming the matrix entries according to a carefully designed function. This approach has been stated before, and we give a rigorous and general analysis. III) For both the Gaussian Wishart ensemble and various synchronization problems over groups, we show that inefficient procedures can work below the threshold where PCA succeeds, whereas no known efficient algorithm achieves this. This conjectural gap between what is statistically possible and what can be done efficiently remains open.
Tasks
Published 2016-09-19
URL http://arxiv.org/abs/1609.05573v2
PDF http://arxiv.org/pdf/1609.05573v2.pdf
PWC https://paperswithcode.com/paper/optimality-and-sub-optimality-of-pca-for
Repo
Framework
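
Sketch (not from the paper): the kind of spectral phase transition described in the abstract is easiest to state for the spiked Wigner model. The normalization below is an assumption chosen for illustration ($\lVert x \rVert = 1$, $W$ symmetric with independent unit-variance off-diagonal entries); the Wishart statement cited in the abstract has a different threshold and limit.

```latex
Y = \lambda\, x x^{\top} + \frac{1}{\sqrt{n}}\, W,
\qquad
\lambda_{\max}(Y) \;\xrightarrow[n \to \infty]{\text{a.s.}}\;
\begin{cases}
2, & \lambda \le 1,\\[2pt]
\lambda + \dfrac{1}{\lambda}, & \lambda > 1.
\end{cases}
```

Below $\lambda = 1$ the top eigenvalue sits at the edge of the noise spectrum and carries no trace of the spike; above it, the eigenvalue separates from the bulk, which is what makes detection by PCA possible.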

A Synthetic Approach for Recommendation: Combining Ratings, Social Relations, and Reviews

Title A Synthetic Approach for Recommendation: Combining Ratings, Social Relations, and Reviews
Authors Guang-Neng Hu, Xin-Yu Dai, Yunya Song, Shu-Jian Huang, Jia-Jun Chen
Abstract Recommender systems (RSs) provide an effective way of alleviating the information overload problem by selecting personalized choices. Online social networks and user-generated content provide diverse sources for recommendation beyond ratings, which present opportunities as well as challenges for traditional RSs. Although social matrix factorization (Social MF) can integrate ratings with social relations and topic matrix factorization can integrate ratings with item reviews, both of them ignore some useful information. In this paper, we investigate the effective data fusion by combining the two approaches, in two steps. First, we extend Social MF to exploit the graph structure of neighbors. Second, we propose a novel framework MR3 to jointly model these three types of information effectively for rating prediction by aligning latent factors and hidden topics. We achieve more accurate rating prediction on two real-life datasets. Furthermore, we measure the contribution of each data source to the proposed framework.
Tasks Recommendation Systems
Published 2016-01-11
URL http://arxiv.org/abs/1601.02327v1
PDF http://arxiv.org/pdf/1601.02327v1.pdf
PWC https://paperswithcode.com/paper/a-synthetic-approach-for-recommendation
Repo
Framework

A Harmonic Mean Linear Discriminant Analysis for Robust Image Classification

Title A Harmonic Mean Linear Discriminant Analysis for Robust Image Classification
Authors Shuai Zheng, Feiping Nie, Chris Ding, Heng Huang
Abstract Linear Discriminant Analysis (LDA) is a widely-used supervised dimensionality reduction method in computer vision and pattern recognition. In null space based LDA (NLDA), a well-known LDA extension, between-class distance is maximized in the null space of the within-class scatter matrix. However, there are some limitations in NLDA. Firstly, for many data sets, the null space of the within-class scatter matrix does not exist, thus NLDA is not applicable to those datasets. Secondly, NLDA uses the arithmetic mean of between-class distances and gives equal consideration to all between-class distances, which lets larger between-class distances dominate the result and thus limits the performance of NLDA. In this paper, we propose a harmonic mean based Linear Discriminant Analysis, Multi-Class Discriminant Analysis (MCDA), for image classification, which minimizes the reciprocal of the weighted harmonic mean of pairwise between-class distances. More importantly, MCDA gives higher priority to maximizing small between-class distances. MCDA can be extended to multi-label dimension reduction. Results on 7 single-label data sets and 4 multi-label data sets show that MCDA has consistently better performance than 10 other single-label approaches and 4 other multi-label approaches in terms of classification accuracy, macro and micro average F1 score.
Tasks Dimensionality Reduction, Image Classification
Published 2016-10-14
URL http://arxiv.org/abs/1610.04631v2
PDF http://arxiv.org/pdf/1610.04631v2.pdf
PWC https://paperswithcode.com/paper/a-harmonic-mean-linear-discriminant-analysis
Repo
Framework
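
Sketch (not from the paper): the motivation for replacing the arithmetic mean with a harmonic mean can be seen on toy numbers. The distances below are made up, and the sketch uses the plain unweighted harmonic mean from the Python standard library rather than the paper's weighted objective.

```python
from statistics import harmonic_mean, mean

# Hypothetical pairwise between-class distances for three class pairs.
well_separated = [4.0, 5.0, 6.0]
one_overlap = [0.1, 7.0, 7.9]   # one class pair nearly overlaps

arith_a = mean(well_separated)
arith_b = mean(one_overlap)
harm_a = harmonic_mean(well_separated)
harm_b = harmonic_mean(one_overlap)

print(f"arithmetic: {arith_a:.3f} vs {arith_b:.3f}")
print(f"harmonic:   {harm_a:.3f} vs {harm_b:.3f}")
```

Both configurations have the same arithmetic mean, so an arithmetic-mean criterion cannot tell them apart. The harmonic mean drops sharply for the configuration with the nearly overlapping pair, so maximizing it pushes the smallest distances apart first.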

Context Aware Nonnegative Matrix Factorization Clustering

Title Context Aware Nonnegative Matrix Factorization Clustering
Authors Rocco Tripodi, Sebastiano Vascon, Marcello Pelillo
Abstract In this article we propose a method to refine the clustering results obtained with the nonnegative matrix factorization (NMF) technique, imposing consistency constraints on the final labeling of the data. The research community has focused its effort on the initialization and on the optimization part of this method, without paying attention to the final cluster assignments. We propose a game theoretic framework in which each object to be clustered is represented as a player, which has to choose its cluster membership. The information obtained with NMF is used to initialize the strategy space of the players and a weighted graph is used to model the interactions among the players. These interactions allow the players to choose a cluster which is coherent with the clusters chosen by similar players, a property which is not guaranteed by NMF, since it produces a soft clustering of the data. The results on common benchmarks show that our model is able to improve the performance of many NMF formulations.
Tasks
Published 2016-09-15
URL http://arxiv.org/abs/1609.04628v1
PDF http://arxiv.org/pdf/1609.04628v1.pdf
PWC https://paperswithcode.com/paper/context-aware-nonnegative-matrix
Repo
Framework

Faster Training of Very Deep Networks Via p-Norm Gates

Title Faster Training of Very Deep Networks Via p-Norm Gates
Authors Trang Pham, Truyen Tran, Dinh Phung, Svetha Venkatesh
Abstract A major contributing factor to the recent advances in deep neural networks is structural units that let sensory information and gradients propagate easily. Gating is one such structure that acts as a flow control. Gates are employed in many recent state-of-the-art recurrent models such as LSTM and GRU, and feedforward models such as Residual Nets and Highway Networks. This enables learning in very deep networks with hundreds of layers and helps achieve record-breaking results in vision (e.g., ImageNet with Residual Nets) and NLP (e.g., machine translation with GRU). However, there is limited work in analysing the role of gating in the learning process. In this paper, we propose a flexible $p$-norm gating scheme, which allows user-controllable flow and, as a consequence, improves the learning speed. This scheme subsumes other existing gating schemes, including those in GRU, Highway Networks and Residual Nets as special cases. Experiments on large sequence and vector datasets demonstrate that the proposed gating scheme helps improve the learning speed significantly without extra overhead.
Tasks Machine Translation
Published 2016-08-11
URL http://arxiv.org/abs/1608.03639v1
PDF http://arxiv.org/pdf/1608.03639v1.pdf
PWC https://paperswithcode.com/paper/faster-training-of-very-deep-networks-via-p
Repo
Framework
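
Sketch (not from the paper): one way to read a $p$-norm gating scheme is that a transform gate `t` and a carry gate `c` are tied by the constraint `t**p + c**p == 1`. That constraint, the function names, and the toy vectors are assumptions made for illustration; the paper's exact parameterization may differ.

```python
def carry_gate(transform_gate, p):
    """Carry gate implied by the assumed unit p-norm constraint t**p + c**p == 1."""
    if not 0.0 <= transform_gate <= 1.0:
        raise ValueError("transform_gate must lie in [0, 1]")
    return (1.0 - transform_gate ** p) ** (1.0 / p)


def gated_output(x, h, transform_gate, p):
    """Combine the layer input x and the transformed signal h element-wise."""
    c = carry_gate(transform_gate, p)
    return [transform_gate * hi + c * xi for xi, hi in zip(x, h)]


c_p1 = carry_gate(0.3, p=1)     # coupled gates: carry = 1 - transform
c_p2 = carry_gate(0.5, p=2)
c_p50 = carry_gate(0.5, p=50)   # carry gate close to 1 for large p
y = gated_output([1.0, 2.0], [0.5, -0.5], transform_gate=0.3, p=1)
```

Under this reading, `p = 1` gives the coupled gates `c = 1 - t` familiar from Highway Networks and GRU, while a large `p` leaves the carry path almost fully open, which resembles the identity shortcut of Residual Nets. Larger `p` therefore lets more of the input through for the same transform gate.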

Opponent Modeling in Deep Reinforcement Learning

Title Opponent Modeling in Deep Reinforcement Learning
Authors He He, Jordan Boyd-Graber, Kevin Kwok, Hal Daumé III
Abstract Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because strategies interact with each other and change. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observation of the opponents into a deep Q-Network (DQN); however, we retain explicit modeling (if desired) using multitasking. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.
Tasks
Published 2016-09-18
URL http://arxiv.org/abs/1609.05559v1
PDF http://arxiv.org/pdf/1609.05559v1.pdf
PWC https://paperswithcode.com/paper/opponent-modeling-in-deep-reinforcement
Repo
Framework
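
Sketch (not from the paper): a Mixture-of-Experts combination of Q-values can be illustrated as below. The function names, the gate scores, and the toy numbers are hypothetical; in a full model the expert Q-values and the gate scores would come from neural networks fed with the state and the opponent observations, which this sketch replaces with fixed lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_q_values(expert_q, gate_scores):
    """Weight each expert's Q-values by a gate computed from opponent features.

    expert_q: one list of per-action Q-values per expert.
    gate_scores: one unnormalized score per expert (hypothetical gating output).
    """
    weights = softmax(gate_scores)
    n_actions = len(expert_q[0])
    return [
        sum(w * q[a] for w, q in zip(weights, expert_q))
        for a in range(n_actions)
    ]


# Two experts, two actions; equal gate scores give each expert weight 0.5.
q = mixture_q_values([[1.0, 0.0], [0.0, 2.0]], gate_scores=[0.0, 0.0])
best_action = q.index(max(q))
```

Because the gate depends on the opponent, different experts can specialize in different opponent strategies without any label saying which strategy is being played.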

DECOrrelated feature space partitioning for distributed sparse regression

Title DECOrrelated feature space partitioning for distributed sparse regression
Authors Xiangyu Wang, David Dunson, Chenlei Leng
Abstract Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space). While the majority of the literature focuses on sample space partitioning, feature space partitioning is more effective when $p\gg n$. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In this paper, we solve these problems through a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to $m$ distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number $m$. Extensive numerical experiments are provided to illustrate the performance of the new framework.
Tasks
Published 2016-02-08
URL http://arxiv.org/abs/1602.02575v2
PDF http://arxiv.org/pdf/1602.02575v2.pdf
PWC https://paperswithcode.com/paper/decorrelated-feature-space-partitioning-for
Repo
Framework

Counting Everyday Objects in Everyday Scenes

Title Counting Everyday Objects in Everyday Scenes
Authors Prithvijit Chattopadhyay, Ramakrishna Vedantam, Ramprasaath R. Selvaraju, Dhruv Batra, Devi Parikh
Abstract We are interested in counting the number of instances of object classes in natural, everyday images. Previous counting approaches tackle the problem in restricted domains such as counting pedestrians in surveillance videos. Counts can also be estimated from outputs of other vision tasks like object detection. In this work, we build dedicated models for counting designed to tackle the large variance in counts, appearances, and scales of objects found in natural scenes. Our approach is inspired by the phenomenon of subitizing - the ability of humans to make quick assessments of counts given a perceptual signal, for small count values. Given a natural scene, we employ a divide and conquer strategy while incorporating context across the scene to adapt the subitizing idea to counting. Our approach offers consistent improvements over numerous baseline approaches for counting on the PASCAL VOC 2007 and COCO datasets. Subsequently, we study how counting can be used to improve object detection. We then show a proof of concept application of our counting methods to the task of Visual Question Answering, by studying the 'how many?' questions in the VQA and COCO-QA datasets.
Tasks Object Detection, Question Answering, Visual Question Answering
Published 2016-04-12
URL http://arxiv.org/abs/1604.03505v3
PDF http://arxiv.org/pdf/1604.03505v3.pdf
PWC https://paperswithcode.com/paper/counting-everyday-objects-in-everyday-scenes
Repo
Framework

Mobility Map Computations for Autonomous Navigation using an RGBD Sensor

Title Mobility Map Computations for Autonomous Navigation using an RGBD Sensor
Authors Nicolò Genesio, Tariq Abuhashim, Fabio Solari, Manuela Chessa, Lorenzo Natale
Abstract In recent years, the numbers of life-size humanoids as well as their mobile capabilities have steadily grown. Stable walking motion and control for humanoid robots are active fields of research. In this scenario an open question is how to model and analyse the scene so that a motion planning algorithm can generate an appropriate walking pattern. This paper presents the current work towards scene modelling and understanding, using an RGBD sensor. The main objective is to provide the humanoid robot iCub with capabilities to navigate safely and interact with various parts of the environment. In this sense we address the problem of traversability analysis of the scene, focusing on classification of point clouds as a function of mobility, and hence walking safety.
Tasks Autonomous Navigation, Motion Planning
Published 2016-10-05
URL http://arxiv.org/abs/1610.01326v1
PDF http://arxiv.org/pdf/1610.01326v1.pdf
PWC https://paperswithcode.com/paper/mobility-map-computations-for-autonomous
Repo
Framework