Paper Group ANR 121
Simultaneous Clustering and Estimation of Heterogeneous Graphical Models. Understanding the Energy and Precision Requirements for Online Learning. Active Algorithms For Preference Learning Problems with Multiple Populations. Smoothed Hierarchical Dirichlet Process: A Non-Parametric Approach to Constraint Measures. Adaptive Feature Abstraction for Translating Video to Text. Constrained Multi-Slot Optimization for Ranking Recommendations. The Anatomy of a Search and Mining System for Digital Archives. On Column Selection in Approximate Kernel Canonical Correlation Analysis. Semi-Supervised Recognition of the Diploglossus Millepunctatus Lizard Species using Artificial Vision Algorithms. A Nonparametric Latent Factor Model For Location-Aware Video Recommendations. Neural networks with differentiable structure. Generating Natural Questions About an Image. Neural Network-based Word Alignment through Score Aggregation. A Method for Image Reduction Based on a Generalization of Ordered Weighted Averaging Functions. The interplay between system identification and machine learning.
Simultaneous Clustering and Estimation of Heterogeneous Graphical Models
Title | Simultaneous Clustering and Estimation of Heterogeneous Graphical Models |
Authors | Botao Hao, Will Wei Sun, Yufeng Liu, Guang Cheng |
Abstract | We consider joint estimation of multiple graphical models arising from heterogeneous and high-dimensional observations. Unlike most previous approaches, which assume that the cluster structure is given in advance, an appealing feature of our method is that it learns the cluster structure while estimating the heterogeneous graphical models. This is achieved via a high-dimensional version of the Expectation Conditional Maximization (ECM) algorithm (Meng and Rubin, 1993). A joint graphical lasso penalty is imposed on the conditional maximization step to extract both homogeneity and heterogeneity components across all clusters. Our algorithm is computationally efficient due to fast sparse learning routines and can be implemented without unsupervised learning knowledge. The superior performance of our method is demonstrated by extensive experiments, and its application to a Glioblastoma cancer dataset reveals new insights into the disease. In theory, a non-asymptotic error bound is established for the output of our high-dimensional ECM algorithm; it consists of two quantities: statistical error (statistical accuracy) and optimization error (computational complexity). This result gives a theoretical guideline for terminating the ECM iterations. |
Tasks | Sparse Learning |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09391v2 |
http://arxiv.org/pdf/1611.09391v2.pdf | |
PWC | https://paperswithcode.com/paper/simultaneous-clustering-and-estimation-of |
Repo | |
Framework | |
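The entry above describes an ECM loop whose conditional-maximization step carries a graphical lasso penalty. The Python sketch below illustrates that structure under simplifying assumptions of ours: it fits a separate (not joint) graphical lasso per cluster with scikit-learn's `graphical_lasso`, so the paper's cross-cluster homogeneity penalty is not reproduced, and all hyperparameters are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso

def cluster_and_estimate(X, K, alpha=0.1, n_iter=20, seed=0):
    """EM-style loop: soft clustering plus a sparse precision matrix per cluster."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    means = X[rng.choice(n, K, replace=False)].copy()
    covs = [np.cov(X.T) + 1e-3 * np.eye(p) for _ in range(K)]
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior responsibilities under the current Gaussians.
        log_r = np.stack([np.log(weights[k])
                          + multivariate_normal.logpdf(X, means[k], covs[k])
                          for k in range(K)], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # CM-step: weighted moments, then a graphical-lasso-sparsified covariance.
        for k in range(K):
            w = resp[:, k] / resp[:, k].sum()
            means[k] = w @ X
            Xc = X - means[k]
            S = (Xc * w[:, None]).T @ Xc + 1e-6 * np.eye(p)
            covs[k], _precision = graphical_lasso(S, alpha=alpha)
        weights = resp.mean(axis=0)
    return resp, means, covs
```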
Understanding the Energy and Precision Requirements for Online Learning
Title | Understanding the Energy and Precision Requirements for Online Learning |
Authors | Charbel Sakr, Ameya Patil, Sai Zhang, Yongjune Kim, Naresh Shanbhag |
Abstract | It is well known that the precision of data, hyperparameters, and internal representations employed in learning systems directly impacts their energy, throughput, and latency. The precision requirements for the training algorithm are also important for systems that learn on-the-fly. Prior work has shown that data and hyperparameters can be quantized heavily without incurring much penalty in classification accuracy when compared to floating-point implementations, but it suffers from two key limitations. First, it assumes uniform precision for the classifier and for the training algorithm and thus misses the opportunity to reduce precision further. Second, prior works are empirical studies. In this article, we overcome both limitations by deriving analytical lower bounds on the precision requirements of the commonly employed stochastic gradient descent (SGD) online learning algorithm in the specific context of a support vector machine (SVM). Lower bounds on the data precision are derived in terms of the desired classification accuracy and the precision of the hyperparameters used in the classifier. Additionally, lower bounds on the hyperparameter precision in the SGD training algorithm are obtained. These bounds are validated using both synthetic data and the UCI breast cancer dataset. Finally, the impact of these precisions on the energy consumption of a fixed-point SVM with online training is studied. |
Tasks | |
Published | 2016-07-03 |
URL | http://arxiv.org/abs/1607.00669v3 |
http://arxiv.org/pdf/1607.00669v3.pdf | |
PWC | https://paperswithcode.com/paper/understanding-the-energy-and-precision |
Repo | |
Framework | |
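As a rough illustration of the fixed-point setting the paper analyzes, the sketch below trains a linear SVM with SGD on the hinge loss while quantizing both data and weights to a fixed-point grid. The bit widths, toy data, and hyperparameters are arbitrary choices for illustration, not the bounds derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (a fixed-point grid)."""
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

def sgd_svm_fixed_point(X, y, data_bits=4, weight_bits=8,
                        lr=0.05, lam=0.01, epochs=5):
    n, d = X.shape
    w = np.zeros(d)
    Xq = quantize(X, data_bits)                        # quantized inputs
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (w @ Xq[i])
            grad = lam * w - (y[i] * Xq[i] if margin < 1 else 0.0)
            w = quantize(w - lr * grad, weight_bits)   # quantized weight update
    return w

y = rng.choice([-1, 1], size=200)
X = rng.normal(size=(200, 2)) + 1.5 * y[:, None]       # roughly separable toy data
w = sgd_svm_fixed_point(X, y)
print("train accuracy:", np.mean(np.sign(X @ w) == y))
```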
Active Algorithms For Preference Learning Problems with Multiple Populations
Title | Active Algorithms For Preference Learning Problems with Multiple Populations |
Authors | Aniruddha Bhargava, Ravi Ganti, Robert Nowak |
Abstract | In this paper we model the problem of learning the preferences of a population as an active learning problem. We propose an algorithm that can adaptively choose pairs of items to show to users coming from a heterogeneous population, and use the obtained rewards to decide which pair of items to show next. We provide computationally efficient algorithms with provable sample complexity guarantees for this problem in both the noiseless and noisy cases. In the process of establishing sample complexity guarantees for our algorithms, we establish new results using a Nyström-like method which can be of independent interest. We supplement our theoretical results with experimental comparisons. |
Tasks | Active Learning |
Published | 2016-03-14 |
URL | http://arxiv.org/abs/1603.04118v2 |
http://arxiv.org/pdf/1603.04118v2.pdf | |
PWC | https://paperswithcode.com/paper/active-algorithms-for-preference-learning |
Repo | |
Framework | |
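The sketch below is a generic active preference-learning loop, not the paper's algorithm: it keeps Hoeffding confidence intervals on each pair's empirical win rate, queries only pairs whose winner is still undecided, and reads off a Borda-style ranking. User responses are simulated from a Bradley-Terry model with hidden utilities.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n_items, budget = 6, 400
scores = rng.normal(size=n_items)                 # hidden item utilities
pairs = list(combinations(range(n_items), 2))
wins = {p: 0 for p in pairs}
trials = {p: 0 for p in pairs}

def undecided(p):
    """True while the Hoeffding interval around the win rate still contains 1/2."""
    n = trials[p]
    if n == 0:
        return True
    return abs(wins[p] / n - 0.5) <= np.sqrt(np.log(budget) / (2 * n))

for _ in range(budget):
    candidates = [p for p in pairs if undecided(p)] or pairs
    i, j = min(candidates, key=lambda q: trials[q])       # least-sampled open pair
    prob_i = 1.0 / (1.0 + np.exp(scores[j] - scores[i]))  # Bradley-Terry response
    wins[(i, j)] += rng.random() < prob_i
    trials[(i, j)] += 1

borda = np.zeros(n_items)
for (i, j), w in wins.items():
    borda[i] += w
    borda[j] += trials[(i, j)] - w
print("recovered order:", np.argsort(-borda), "true order:", np.argsort(-scores))
```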
Smoothed Hierarchical Dirichlet Process: A Non-Parametric Approach to Constraint Measures
Title | Smoothed Hierarchical Dirichlet Process: A Non-Parametric Approach to Constraint Measures |
Authors | Cheng Luo, Yang Xiang, Richard Yi Da Xu |
Abstract | Time-varying mixture densities occur in many scenarios; for example, the distributions of keywords that appear in publications may evolve from year to year, and video frame features associated with multiple targets may evolve within a sequence. Any model that realistically caters to this phenomenon must exhibit two important properties: the underlying mixture densities must have an unknown number of mixtures, and there must be some “smoothness” constraints on adjacent mixture densities. The traditional Hierarchical Dirichlet Process (HDP) may be suited to the first property, but certainly not the second, because each random measure in the lower hierarchy is sampled independently of the others and hence does not capture any temporal correlations. To overcome this shortcoming, we propose a new Smoothed Hierarchical Dirichlet Process (sHDP). The key novelty of this model is that we place a temporal constraint on nearby discrete measures $\{G_j\}$ in the form of a symmetric Kullback-Leibler (KL) divergence with a fixed bound $B$. Although the constraint involves only a single scalar value, it nonetheless allows for flexibility in the corresponding successive measures. Remarkably, it also leads us to infer the model within the stick-breaking process, where the traditional Beta distribution used in stick-breaking is now replaced by a new constraint calculated from $B$. We present the inference algorithm and elaborate on its solutions. Our experiment using NIPS keywords shows the desired smoothing effect of the model. |
Tasks | |
Published | 2016-04-16 |
URL | http://arxiv.org/abs/1604.04741v1 |
http://arxiv.org/pdf/1604.04741v1.pdf | |
PWC | https://paperswithcode.com/paper/smoothed-hierarchical-dirichlet-process-a-non |
Repo | |
Framework | |
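The smoothness constraint at the heart of the sHDP is a bound $B$ on the symmetric KL divergence between adjacent discrete measures. A minimal numeric check of that constraint, with toy distributions and a small epsilon to avoid log(0):

```python
import numpy as np

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two discrete distributions."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

B = 0.5                              # the fixed bound from the abstract
G_j = [0.5, 0.3, 0.2]                # adjacent discrete measures (toy values)
G_next = [0.4, 0.35, 0.25]
d = sym_kl(G_j, G_next)
print(f"sym-KL = {d:.4f}; within bound B = {B}: {d <= B}")
```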
Adaptive Feature Abstraction for Translating Video to Text
Title | Adaptive Feature Abstraction for Translating Video to Text |
Authors | Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin |
Abstract | Previous models for video captioning often use the output from a specific layer of a Convolutional Neural Network (CNN) as video features. However, the variable context-dependent semantics in the video may make it more appropriate to adaptively select features from the multiple CNN layers. We propose a new approach for generating adaptive spatiotemporal representations of videos for the captioning task. A novel attention mechanism is developed that adaptively and sequentially focuses on different layers of CNN features (levels of feature “abstraction”), as well as local spatiotemporal regions of the feature maps at each layer. The proposed approach is evaluated on three benchmark datasets: YouTube2Text, M-VAD and MSR-VTT. Along with visualizing the results and how the model works, these experiments quantitatively demonstrate the effectiveness of the proposed adaptive spatiotemporal feature abstraction for translating videos to sentences with rich semantics. |
Tasks | Video Captioning |
Published | 2016-11-23 |
URL | http://arxiv.org/abs/1611.07837v3 |
http://arxiv.org/pdf/1611.07837v3.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-feature-abstraction-for-translating |
Repo | |
Framework | |
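A toy sketch of the central mechanism: softmax attention over features drawn from several CNN layers (levels of abstraction), producing one adaptive context vector for the caption decoder. The pooled feature vectors, bilinear scoring function, and dimensions are stand-ins of ours; the paper's mechanism additionally attends over local spatiotemporal regions within each layer.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d = 64
layer_feats = [rng.normal(size=d) for _ in range(4)]  # pooled features from 4 CNN layers
h = rng.normal(size=d)                                # decoder hidden state
W = rng.normal(size=(d, d)) / np.sqrt(d)              # bilinear scoring weights

scores = np.array([f @ W @ h for f in layer_feats])   # one score per layer
alpha = softmax(scores)                               # attention over abstraction levels
context = sum(a * f for a, f in zip(alpha, layer_feats))
print("attention over layers:", np.round(alpha, 3))
```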
Constrained Multi-Slot Optimization for Ranking Recommendations
Title | Constrained Multi-Slot Optimization for Ranking Recommendations |
Authors | Kinjal Basu, Shaunak Chatterjee, Ankan Saha |
Abstract | Ranking items to be recommended to users is one of the main problems in large scale social media applications. This problem can be set up as a multi-objective optimization problem to allow for trading off multiple, potentially conflicting objectives (that are driven by those items) against each other. Most previous approaches to this problem optimize for a single slot without considering the interaction effect of these items on one another. In this paper, we develop a constrained multi-slot optimization formulation, which allows for modeling interactions among the items on the different slots. We characterize the solution in terms of problem parameters and identify conditions under which an efficient solution is possible. The problem formulation results in a quadratically constrained quadratic program (QCQP). We provide an algorithm that gives us an efficient solution by relaxing the constraints of the QCQP minimally. Through simulated experiments, we show the benefits of modeling interactions in a multi-slot ranking context, and the speed and accuracy of our QCQP approximate solver against other state-of-the-art methods. |
Tasks | |
Published | 2016-02-13 |
URL | http://arxiv.org/abs/1602.04391v2 |
http://arxiv.org/pdf/1602.04391v2.pdf | |
PWC | https://paperswithcode.com/paper/constrained-multi-slot-optimization-for |
Repo | |
Framework | |
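A toy QCQP in the spirit of the formulation above: choose scores for a few slots to maximize per-slot utility plus pairwise interaction terms, subject to a quadratic guardrail constraint. It is solved here with SciPy's generic SLSQP solver rather than the paper's minimal-relaxation algorithm, and all coefficients are made up.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
k = 4                                      # number of slots
u = rng.uniform(0.5, 1.0, size=k)          # per-slot utility
A = -0.1 * rng.uniform(size=(k, k))        # interaction (cannibalization) terms
A = (A + A.T) / 2                          # symmetrize

def neg_objective(x):
    return -(u @ x + x @ A @ x)

# One quadratic constraint: ||x||^2 <= 1.5 (a stand-in guardrail).
cons = [{'type': 'ineq', 'fun': lambda x: 1.5 - x @ x}]
res = minimize(neg_objective, x0=np.full(k, 0.5), bounds=[(0, 1)] * k,
               constraints=cons, method='SLSQP')
print("slot scores:", np.round(res.x, 3), "objective:", -res.fun)
```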
The Anatomy of a Search and Mining System for Digital Archives
Title | The Anatomy of a Search and Mining System for Digital Archives |
Authors | Martyn Harris, Mark Levene, Dell Zhang, Dan Levene |
Abstract | Samtla (Search And Mining Tools with Linguistic Analysis) is a digital humanities system designed in collaboration with historians and linguists to assist them with their research work in quantifying the content of any textual corpora through approximate phrase search and document comparison. The retrieval engine uses a character-based n-gram language model rather than the conventional word-based one so as to achieve great flexibility in language-agnostic query processing. The index is implemented as a space-optimised character-based suffix tree with an accompanying database of document content and metadata. A number of text mining tools are integrated into the system to allow researchers to discover textual patterns, perform comparative analysis, and find out what is currently popular in the research community. Herein we describe the system architecture, user interface, models and algorithms, and data storage of the Samtla system. We also present several case studies of its usage in practice together with an evaluation of the system’s ranking performance through crowdsourcing. |
Tasks | Language Modelling |
Published | 2016-03-23 |
URL | http://arxiv.org/abs/1603.07150v1 |
http://arxiv.org/pdf/1603.07150v1.pdf | |
PWC | https://paperswithcode.com/paper/the-anatomy-of-a-search-and-mining-system-for |
Repo | |
Framework | |
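A minimal character-based n-gram language model of the kind Samtla's retrieval engine is built on, with add-one smoothing; the space-optimised suffix-tree index and the ranking machinery of the real system are not reproduced here.

```python
import math
from collections import Counter

def train_char_ngrams(text, n=3):
    counts, contexts = Counter(), Counter()
    padded = ' ' * (n - 1) + text          # pad so early characters have context
    for i in range(len(text)):
        gram = padded[i:i + n]
        counts[gram] += 1
        contexts[gram[:-1]] += 1
    return counts, contexts

def log_prob(query, counts, contexts, n=3, vocab=256):
    """Add-one-smoothed log-probability of a query string."""
    padded = ' ' * (n - 1) + query
    lp = 0.0
    for i in range(len(query)):
        gram = padded[i:i + n]
        lp += math.log((counts[gram] + 1) / (contexts[gram[:-1]] + vocab))
    return lp

counts, contexts = train_char_ngrams("the cat sat on the mat", n=3)
print(log_prob("the mat", counts, contexts))   # higher = more corpus-like
```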
On Column Selection in Approximate Kernel Canonical Correlation Analysis
Title | On Column Selection in Approximate Kernel Canonical Correlation Analysis |
Authors | Weiran Wang |
Abstract | We study the problem of column selection in large-scale kernel canonical correlation analysis (KCCA) using the Nyström approximation, where one approximates two positive semi-definite kernel matrices using “landmark” points from the training set. When building low-rank kernel approximations in KCCA, previous work mostly samples the landmarks uniformly at random from the training set. We propose novel strategies for sampling the landmarks non-uniformly based on a version of statistical leverage scores recently developed for kernel ridge regression. We study the approximation accuracy of the proposed non-uniform sampling strategy, develop an incremental algorithm that explores the path of approximation ranks and facilitates efficient model selection, and derive the kernel stability of out-of-sample mapping for our method. Experimental results on both synthetic and real-world datasets demonstrate the promise of our method. |
Tasks | Model Selection |
Published | 2016-02-05 |
URL | http://arxiv.org/abs/1602.02172v1 |
http://arxiv.org/pdf/1602.02172v1.pdf | |
PWC | https://paperswithcode.com/paper/on-column-selection-in-approximate-kernel |
Repo | |
Framework | |
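The sketch below contrasts uniform landmark sampling with sampling proportional to ridge leverage scores when building a Nyström approximation of an RBF kernel matrix. For this small toy problem the leverage scores are computed exactly via a matrix inverse; the paper's setting involves two kernel matrices (KCCA) and fast approximate scores.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(K, idx):
    """Low-rank reconstruction K ~ C W^+ C^T from landmark columns idx."""
    C, W = K[:, idx], K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
K = rbf_kernel(X, X)
lam = 1e-2
lev = np.diag(K @ np.linalg.inv(K + lam * np.eye(len(K))))  # ridge leverage scores
p = lev / lev.sum()

m = 30
idx_lev = rng.choice(len(K), m, replace=False, p=p)
idx_uni = rng.choice(len(K), m, replace=False)
for name, idx in [("leverage", idx_lev), ("uniform", idx_uni)]:
    err = np.linalg.norm(K - nystrom(K, idx)) / np.linalg.norm(K)
    print(f"{name:8s} relative error: {err:.4f}")
```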
Semi-Supervised Recognition of the Diploglossus Millepunctatus Lizard Species using Artificial Vision Algorithms
Title | Semi-Supervised Recognition of the Diploglossus Millepunctatus Lizard Species using Artificial Vision Algorithms |
Authors | Jhony-Heriberto Giraldo-Zuluaga, Augusto Salazar, Juan M. Daza |
Abstract | Animal biometrics is an important requirement for monitoring and conservation tasks. Classical animal biometrics methods risk the animals’ integrity, are expensive when many animals are involved, and depend on expert judgment. Non-invasive biometric techniques offer alternatives to these problems. In this paper we propose an automatic segmentation and identification algorithm based on artificial vision algorithms to recognize Diploglossus millepunctatus, an endangered lizard species. The algorithm comprises two stages: automatic segmentation, to remove subjective evaluation, and identification, to reduce analysis time. An average correct segmentation rate of 82.87% is achieved. Identification is performed with Euclidean-distance point-set algorithms such as Iterative Closest Point and Procrustes Analysis, reaching Top-1 accuracy of 92.99% and Top-5 accuracy of 96.82%. The developed software and the database used in this paper are publicly available for download from the web page of the project. |
Tasks | |
Published | 2016-11-09 |
URL | http://arxiv.org/abs/1611.02803v1 |
http://arxiv.org/pdf/1611.02803v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-recognition-of-the |
Repo | |
Framework | |
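A hedged sketch of the identification stage's idea: compare landmark point sets with Procrustes analysis (via `scipy.spatial.procrustes`) and rank known individuals by disparity. The point sets are synthetic stand-ins for spot patterns, and the paper additionally uses Iterative Closest Point.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(4)
gallery = {f"lizard_{k}": rng.normal(size=(20, 2)) for k in range(5)}

# A noisy, rotated, scaled view of lizard_2's pattern serves as the query.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
query = 1.7 * gallery["lizard_2"] @ R.T + rng.normal(scale=0.05, size=(20, 2))

# procrustes(...)[2] is the disparity after optimal translation/scale/rotation.
ranked = sorted(gallery, key=lambda k: procrustes(gallery[k], query)[2])
print("best match:", ranked[0])
```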
A Nonparametric Latent Factor Model For Location-Aware Video Recommendations
Title | A Nonparametric Latent Factor Model For Location-Aware Video Recommendations |
Authors | Ehtsham Elahi |
Abstract | We are interested in learning customers’ video preferences from their historic viewing patterns and geographical location. We consider a Bayesian latent factor modeling approach for this task. In order to tune the complexity of the model to best represent the data, we make use of Bayesian nonparametric techniques. We describe an inference technique that can scale to large real-world data sets. Finally, we show results obtained by applying the model to a large internal Netflix data set, which illustrate that the model is able to capture interesting relationships between viewing patterns and geographical location. |
Tasks | |
Published | 2016-12-05 |
URL | http://arxiv.org/abs/1612.01481v1 |
http://arxiv.org/pdf/1612.01481v1.pdf | |
PWC | https://paperswithcode.com/paper/a-nonparametric-latent-factor-model-for |
Repo | |
Framework | |
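A small sketch of the Bayesian nonparametric ingredient mentioned in the abstract: stick-breaking weights for a Dirichlet process, which let the effective number of latent factors be inferred from data rather than fixed in advance. The full location-aware model is considerably richer.

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking: w_k = beta_k * prod_{j<k} (1 - beta_j)."""
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1 - betas)[:-1]])
    return betas * remaining

rng = np.random.default_rng(5)
w = stick_breaking(alpha=2.0, n_atoms=20, rng=rng)
print("weights:", np.round(w, 3), "sum:", w.sum().round(3))
# Smaller alpha concentrates mass on a few factors; larger alpha spreads it out.
```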
Neural networks with differentiable structure
Title | Neural networks with differentiable structure |
Authors | Thomas Miconi |
Abstract | While gradient descent has proven highly successful in learning connection weights for neural networks, the actual structure of these networks is usually determined by hand, or by other optimization algorithms. Here we describe a simple method to make network structure differentiable, and therefore accessible to gradient descent. We test this method on recurrent neural networks applied to simple sequence prediction problems. Starting with initial networks containing only one node, the method automatically builds networks that successfully solve the tasks. The number of nodes in the final network correlates with task difficulty. The method can dynamically increase network size in response to an abrupt complexification in the task; however, reduction in network size in response to task simplification is not evident for reasonable meta-parameters. The method does not penalize network performance for these test tasks: variable-size networks actually reach better performance than fixed-size networks of higher, lower or identical size. We conclude by discussing how this method could be applied to more complex networks, such as feedforward layered networks, or multiple-area networks of arbitrary shape. |
Tasks | |
Published | 2016-06-20 |
URL | http://arxiv.org/abs/1606.06216v3 |
http://arxiv.org/pdf/1606.06216v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-networks-with-differentiable-structure |
Repo | |
Framework | |
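One simple way to make structure differentiable, sketched below in PyTorch under assumptions of ours (a feedforward layer rather than the paper's recurrent networks): multiply each unit's output by a learnable gate clipped to [0, 1] and add an L1 penalty on the gates, so gradient descent can effectively switch units on and off.

```python
import torch

torch.manual_seed(0)

class GatedLayer(torch.nn.Module):
    """A layer whose effective width is learned: unit k's output is scaled by
    a gate clipped to [0, 1], so gates driven to 0 remove the unit."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.lin = torch.nn.Linear(d_in, d_hidden)
        self.gate = torch.nn.Parameter(torch.full((d_hidden,), 0.9))

    def forward(self, x):
        return torch.tanh(self.lin(x)) * self.gate.clamp(0.0, 1.0)

layer, head = GatedLayer(4, 16), torch.nn.Linear(16, 1)
opt = torch.optim.SGD(list(layer.parameters()) + list(head.parameters()), lr=0.1)
x = torch.randn(256, 4)
y = x[:, :1] * x[:, 1:2]                          # toy nonlinear target

for _ in range(500):
    opt.zero_grad()
    mse = ((head(layer(x)) - y) ** 2).mean()
    sparsity = layer.gate.clamp(0.0, 1.0).sum()   # L1 pressure on open gates
    (mse + 1e-2 * sparsity).backward()
    opt.step()

print("units still active:", int((layer.gate > 0.05).sum()))
```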
Generating Natural Questions About an Image
Title | Generating Natural Questions About an Image |
Authors | Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende |
Abstract | There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by objects in the image. In this paper, we introduce the novel task of Visual Question Generation (VQG), where the system is tasked with asking a natural and engaging question when shown an image. We provide three datasets which cover a variety of images from object-centric to event-centric, with considerably more abstract training data than provided to state-of-the-art captioning systems thus far. We train and test several generative and retrieval models to tackle the task of VQG. Evaluation results show that while such models ask reasonable questions for a variety of images, there is still a wide gap with human performance which motivates further work on connecting images with commonsense knowledge and pragmatics. Our proposed task offers a new challenge to the community which we hope furthers interest in exploring deeper connections between vision & language. |
Tasks | Image Captioning, Question Generation |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06059v3 |
http://arxiv.org/pdf/1603.06059v3.pdf | |
PWC | https://paperswithcode.com/paper/generating-natural-questions-about-an-image |
Repo | |
Framework | |
Neural Network-based Word Alignment through Score Aggregation
Title | Neural Network-based Word Alignment through Score Aggregation |
Authors | Joel Legrand, Michael Auli, Ronan Collobert |
Abstract | We present a simple neural network for word alignment that builds source and target word window representations to compute alignment scores for sentence pairs. To enable unsupervised training, we use an aggregation operation that summarizes the alignment scores for a given target word. A soft-margin objective increases scores for true target words while decreasing scores for target words that are not present. Compared to the popular Fast Align model, our approach improves alignment accuracy by 7 AER on English-Czech, by 6 AER on Romanian-English and by 1.7 AER on English-French alignment. |
Tasks | Word Alignment |
Published | 2016-06-30 |
URL | http://arxiv.org/abs/1606.09560v1 |
http://arxiv.org/pdf/1606.09560v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-network-based-word-alignment-through |
Repo | |
Framework | |
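A sketch of the aggregation idea with random stand-in embeddings: score every (source, target) word pair from window representations, then summarize each target word's column with a smooth maximum (log-sum-exp) so that training needs only sentence-level supervision. The window encoder, negative sampling, and soft-margin loss of the paper are omitted.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(6)
d, src_len, tgt_len = 32, 7, 6
src = rng.normal(size=(src_len, d))    # source word-window representations
tgt = rng.normal(size=(tgt_len, d))    # target word-window representations

scores = src @ tgt.T / np.sqrt(d)      # (src_len, tgt_len) alignment scores
agg = logsumexp(scores, axis=0)        # one aggregated score per target word
print("aggregated target scores:", np.round(agg, 3))

# A soft-margin objective would push `agg` up for observed target words and
# down for sampled negatives; hard alignments are read off per column.
print("hard alignments:", scores.argmax(axis=0))
```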
A Method for Image Reduction Based on a Generalization of Ordered Weighted Averaging Functions
Title | A Method for Image Reduction Based on a Generalization of Ordered Weighted Averaging Functions |
Authors | A. Diego S. Farias, Valdigleis S. Costa, Luiz Ranyer A. Lopes, Benjamín Bedregal, Regivan Santiago |
Abstract | In this paper we propose a special type of aggregation function which generalizes the notion of Ordered Weighted Averaging Function (OWA). The resulting functions are called Dynamic Ordered Weighted Averaging Functions (DYOWAs). This generalization is developed in such a way that the weight vectors are variables depending on the input vector. In particular, these operators generalize aggregation functions such as the minimum, maximum, arithmetic mean, and median, which are extensively used in image processing. In this field of research two problems are considered: the determination of methods to reduce images and the construction of techniques which provide noise reduction. The operators described here can be used in both cases. For image reduction we apply the methodology provided by Patermain et al.; we then use the noise-reduction operators obtained here to treat the reduced images, obtaining images of better quality. |
Tasks | |
Published | 2016-01-15 |
URL | http://arxiv.org/abs/1601.03785v1 |
http://arxiv.org/pdf/1601.03785v1.pdf | |
PWC | https://paperswithcode.com/paper/a-method-for-image-reduction-based-on-a |
Repo | |
Framework | |
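The sketch below shows a classical OWA function and one possible "dynamic" variant in which the weight vector depends on the input, here through its dispersion. The specific dependence is our illustrative choice, not the DYOWA construction of the paper.

```python
import numpy as np

def owa(x, w):
    """Classical OWA: weights are applied to the sorted (descending) inputs."""
    return np.sort(x)[::-1] @ w

def dyowa(x):
    """Toy dynamic OWA: high-dispersion inputs lean toward the median,
    low-dispersion inputs toward the arithmetic mean (odd-length x)."""
    n = len(x)
    t = min(np.std(x) / (np.ptp(x) + 1e-12), 1.0)   # dispersion in [0, 1]
    mean_w = np.full(n, 1.0 / n)
    med_w = np.zeros(n)
    med_w[n // 2] = 1.0                              # selects the median
    w = (1 - t) * mean_w + t * med_w
    return owa(x, w / w.sum())

x = np.array([0.2, 0.9, 0.4, 0.8, 0.1])
print("max  :", owa(x, np.array([1, 0, 0, 0, 0.])))  # OWA recovers the maximum
print("mean :", owa(x, np.full(5, 0.2)))             # ... the arithmetic mean
print("dyowa:", dyowa(x))
```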
The interplay between system identification and machine learning
Title | The interplay between system identification and machine learning |
Authors | Gianluigi Pillonetto |
Abstract | Learning from examples is one of the key problems in science and engineering. It deals with function reconstruction from a finite set of direct and noisy samples. Regularization in reproducing kernel Hilbert spaces (RKHSs) is widely used to solve this task and includes powerful estimators such as regularization networks. Recent achievements include the proof of the statistical consistency of these kernel-based approaches. In parallel, many different system identification techniques have been developed, but their interaction with machine learning does not yet appear strong. One reason is that the RKHSs usually employed in machine learning do not embed the information available on dynamic systems, e.g. BIBO stability. In addition, in system identification the independent-data assumptions routinely adopted in machine learning are never satisfied in practice. This paper provides new results which strengthen the connection between system identification and machine learning. Our starting point is the introduction of RKHSs of dynamic systems. They contain functionals over spaces defined by system inputs and allow system identification to be interpreted as learning from examples. In both linear and nonlinear settings, it is shown that this perspective permits a relatively simple derivation of conditions for RKHS stability (i.e. the property of containing only BIBO-stable systems or predictors), and also facilitates the design of new kernels for system identification. Furthermore, we prove the convergence of the regularized estimator to the optimal predictor under conditions typical of dynamic systems. |
Tasks | |
Published | 2016-12-29 |
URL | http://arxiv.org/abs/1612.09158v1 |
http://arxiv.org/pdf/1612.09158v1.pdf | |
PWC | https://paperswithcode.com/paper/the-interplay-between-system-identification |
Repo | |
Framework | |
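A sketch of identification-as-learning in this spirit: estimate an FIR impulse response by kernel ridge regression with an exponentially decaying kernel (in the flavor of the stable spline kernels from this literature), which encodes BIBO stability as a prior. The system, noise level, and hyperparameters are toy choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 400, 50                          # data length, FIR order
g_true = 0.8 ** np.arange(m) * np.sin(0.9 * np.arange(m))  # stable impulse response
u = rng.normal(size=n)                  # input signal

# Regressor matrix: Phi[t, k] = u[t - k], so y = Phi @ g + noise.
Phi = np.zeros((n, m))
for k in range(m):
    Phi[k:, k] = u[:n - k]
y = Phi @ g_true + 0.1 * rng.normal(size=n)

# Exponentially decaying kernel K[i, j] = alpha**max(i, j) enforces decay of g.
alpha, lam = 0.9, 1.0
i = np.arange(m)
K = alpha ** np.maximum.outer(i, i)

# Kernel ridge (representer) solution: g = K Phi^T (Phi K Phi^T + lam I)^{-1} y.
g_hat = K @ Phi.T @ np.linalg.solve(Phi @ K @ Phi.T + lam * np.eye(n), y)
print("relative error:", np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true))
```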