Paper Group ANR 121
Simultaneous Clustering and Estimation of Heterogeneous Graphical Models. Understanding the Energy and Precision Requirements for Online Learning. Active Algorithms For Preference Learning Problems with Multiple Populations. Smoothed Hierarchical Dirichlet Process: A Non-Parametric Approach to Constraint Measures. Adaptive Feature Abstraction for Translating Video to Text. Constrained Multi-Slot Optimization for Ranking Recommendations. The Anatomy of a Search and Mining System for Digital Archives. On Column Selection in Approximate Kernel Canonical Correlation Analysis. Semi-Supervised Recognition of the Diploglossus Millepunctatus Lizard Species using Artificial Vision Algorithms. A Nonparametric Latent Factor Model For Location-Aware Video Recommendations. Neural networks with differentiable structure. Generating Natural Questions About an Image. Neural Network-based Word Alignment through Score Aggregation. A Method for Image Reduction Based on a Generalization of Ordered Weighted Averaging Functions. The interplay between system identification and machine learning.
Simultaneous Clustering and Estimation of Heterogeneous Graphical Models
Title | Simultaneous Clustering and Estimation of Heterogeneous Graphical Models |
Authors | Botao Hao, Will Wei Sun, Yufeng Liu, Guang Cheng |
Abstract | We consider joint estimation of multiple graphical models arising from heterogeneous and high-dimensional observations. Unlike most previous approaches, which assume that the cluster structure is given in advance, an appealing feature of our method is that it learns the cluster structure while estimating the heterogeneous graphical models. This is achieved via a high-dimensional version of the Expectation Conditional Maximization (ECM) algorithm (Meng and Rubin, 1993). A joint graphical lasso penalty is imposed on the conditional maximization step to extract both homogeneity and heterogeneity components across all clusters. Our algorithm is computationally efficient due to fast sparse learning routines and can be implemented without unsupervised learning knowledge. The superior performance of our method is demonstrated by extensive experiments, and its application to a Glioblastoma cancer dataset reveals new insights into the disease. In theory, a non-asymptotic error bound is established for the output of our high-dimensional ECM algorithm; it consists of two quantities: statistical error (statistical accuracy) and optimization error (computational complexity). This result gives a theoretical guideline for terminating the ECM iterations. |
Tasks | Sparse Learning |
Published | 2016-11-28 |
URL | http://arxiv.org/abs/1611.09391v2 |
http://arxiv.org/pdf/1611.09391v2.pdf | |
PWC | https://paperswithcode.com/paper/simultaneous-clustering-and-estimation-of |
Repo | |
Framework | |
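The entry above describes an ECM loop whose conditional-maximization step carries a graphical lasso penalty. The Python sketch below illustrates that structure under simplifying assumptions of ours: it fits a separate (not joint) graphical lasso per cluster with scikit-learn's `graphical_lasso`, so the paper's cross-cluster homogeneity penalty is not reproduced, and all hyperparameters are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.covariance import graphical_lasso

def cluster_and_estimate(X, K, alpha=0.1, n_iter=20, seed=0):
    """EM-style loop: soft clustering plus a sparse precision matrix per cluster."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    means = X[rng.choice(n, K, replace=False)].copy()
    covs = [np.cov(X.T) + 1e-3 * np.eye(p) for _ in range(K)]
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior responsibilities under the current Gaussians.
        log_r = np.stack([np.log(weights[k])
                          + multivariate_normal.logpdf(X, means[k], covs[k])
                          for k in range(K)], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # CM-step: weighted moments, then a graphical-lasso-sparsified covariance.
        for k in range(K):
            w = resp[:, k] / resp[:, k].sum()
            means[k] = w @ X
            Xc = X - means[k]
            S = (Xc * w[:, None]).T @ Xc + 1e-6 * np.eye(p)
            covs[k], _precision = graphical_lasso(S, alpha=alpha)
        weights = resp.mean(axis=0)
    return resp, means, covs
```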
Understanding the Energy and Precision Requirements for Online Learning
Title | Understanding the Energy and Precision Requirements for Online Learning |
Authors | Charbel Sakr, Ameya Patil, Sai Zhang, Yongjune Kim, Naresh Shanbhag |
Abstract | It is well known that the precision of data, hyperparameters, and internal representations employed in learning systems directly impacts their energy, throughput, and latency. The precision requirements for the training algorithm are also important for systems that learn on-the-fly. Prior work has shown that data and hyperparameters can be quantized heavily without incurring much penalty in classification accuracy when compared to floating-point implementations, but it suffers from two key limitations. First, it assumes uniform precision for the classifier and for the training algorithm and thus misses the opportunity to reduce precision further. Second, prior works are empirical studies. In this article, we overcome both limitations by deriving analytical lower bounds on the precision requirements of the commonly employed stochastic gradient descent (SGD) online learning algorithm in the specific context of a support vector machine (SVM). Lower bounds on the data precision are derived in terms of the desired classification accuracy and the precision of the hyperparameters used in the classifier. Additionally, lower bounds on the hyperparameter precision in the SGD training algorithm are obtained. These bounds are validated using both synthetic data and the UCI breast cancer dataset. Finally, the impact of these precisions on the energy consumption of a fixed-point SVM with online training is studied. |
Tasks | |
Published | 2016-07-03 |
URL | http://arxiv.org/abs/1607.00669v3 |
http://arxiv.org/pdf/1607.00669v3.pdf | |
PWC | https://paperswithcode.com/paper/understanding-the-energy-and-precision |
Repo | |
Framework | |
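As a rough illustration of the fixed-point setting the paper analyzes, the sketch below trains a linear SVM with SGD on the hinge loss while quantizing both data and weights to a fixed-point grid. The bit widths, toy data, and hyperparameters are arbitrary choices for illustration, not the bounds derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, frac_bits):
    """Round to the nearest multiple of 2**-frac_bits (a fixed-point grid)."""
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

def sgd_svm_fixed_point(X, y, data_bits=4, weight_bits=8,
                        lr=0.05, lam=0.01, epochs=5):
    n, d = X.shape
    w = np.zeros(d)
    Xq = quantize(X, data_bits)                        # quantized inputs
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (w @ Xq[i])
            grad = lam * w - (y[i] * Xq[i] if margin < 1 else 0.0)
            w = quantize(w - lr * grad, weight_bits)   # quantized weight update
    return w

y = rng.choice([-1, 1], size=200)
X = rng.normal(size=(200, 2)) + 1.5 * y[:, None]       # roughly separable toy data
w = sgd_svm_fixed_point(X, y)
print("train accuracy:", np.mean(np.sign(X @ w) == y))
```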
Active Algorithms For Preference Learning Problems with Multiple Populations
Title | Active Algorithms For Preference Learning Problems with Multiple Populations |
Authors | Aniruddha Bhargava, Ravi Ganti, Robert Nowak |
Abstract | In this paper we model the problem of learning the preferences of a population as an active learning problem. We propose an algorithm that can adaptively choose pairs of items to show to users coming from a heterogeneous population, and use the obtained rewards to decide which pair of items to show next. We provide computationally efficient algorithms with provable sample complexity guarantees for this problem in both the noiseless and noisy cases. In the process of establishing sample complexity guarantees for our algorithms, we establish new results using a Nyström-like method which can be of independent interest. We supplement our theoretical results with experimental comparisons. |
Tasks | Active Learning |
Published | 2016-03-14 |
URL | http://arxiv.org/abs/1603.04118v2 |
http://arxiv.org/pdf/1603.04118v2.pdf | |
PWC | https://paperswithcode.com/paper/active-algorithms-for-preference-learning |
Repo | |
Framework | |
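The sketch below is a generic active preference-learning loop, not the paper's algorithm: it keeps Hoeffding confidence intervals on each pair's empirical win rate, queries only pairs whose winner is still undecided, and reads off a Borda-style ranking. User responses are simulated from a Bradley-Terry model with hidden utilities.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n_items, budget = 6, 400
scores = rng.normal(size=n_items)                 # hidden item utilities
pairs = list(combinations(range(n_items), 2))
wins = {p: 0 for p in pairs}
trials = {p: 0 for p in pairs}

def undecided(p):
    """True while the Hoeffding interval around the win rate still contains 1/2."""
    n = trials[p]
    if n == 0:
        return True
    return abs(wins[p] / n - 0.5) <= np.sqrt(np.log(budget) / (2 * n))

for _ in range(budget):
    candidates = [p for p in pairs if undecided(p)] or pairs
    i, j = min(candidates, key=lambda q: trials[q])       # least-sampled open pair
    prob_i = 1.0 / (1.0 + np.exp(scores[j] - scores[i]))  # Bradley-Terry response
    wins[(i, j)] += rng.random() < prob_i
    trials[(i, j)] += 1

borda = np.zeros(n_items)
for (i, j), w in wins.items():
    borda[i] += w
    borda[j] += trials[(i, j)] - w
print("recovered order:", np.argsort(-borda), "true order:", np.argsort(-scores))
```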
Smoothed Hierarchical Dirichlet Process: A Non-Parametric Approach to Constraint Measures
Title | Smoothed Hierarchical Dirichlet Process: A Non-Parametric Approach to Constraint Measures |
Authors | Cheng Luo, Yang Xiang, Richard Yi Da Xu |
Abstract | Time-varying mixture densities occur in many scenarios; for example, the distributions of keywords that appear in publications may evolve from year to year, and video frame features associated with multiple targets may evolve within a sequence. Any model that realistically caters to this phenomenon must exhibit two important properties: the underlying mixture densities must have an unknown number of mixtures, and there must be some “smoothness” constraints on adjacent mixture densities. The traditional Hierarchical Dirichlet Process (HDP) may be suited to the first property, but certainly not the second, because each random measure in the lower hierarchy is sampled independently of the others and hence does not capture any temporal correlations. To overcome this shortcoming, we propose a new Smoothed Hierarchical Dirichlet Process (sHDP). The key novelty of this model is that we place a temporal constraint on nearby discrete measures $\{G_j\}$ in the form of a symmetric Kullback-Leibler (KL) divergence with a fixed bound $B$. Although the constraint involves only a single scalar value, it nonetheless allows for flexibility in the corresponding successive measures. Remarkably, it also leads us to infer the model within the stick-breaking process, where the traditional Beta distribution used in stick-breaking is now replaced by a new constraint calculated from $B$. We present the inference algorithm and elaborate on its solutions. Our experiment using NIPS keywords shows the desired smoothing effect of the model. |
Tasks | |
Published | 2016-04-16 |
URL | http://arxiv.org/abs/1604.04741v1 |
http://arxiv.org/pdf/1604.04741v1.pdf | |
PWC | https://paperswithcode.com/paper/smoothed-hierarchical-dirichlet-process-a-non |
Repo | |
Framework | |
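The smoothness constraint at the heart of the sHDP is a bound $B$ on the symmetric KL divergence between adjacent discrete measures. A minimal numeric check of that constraint, with toy distributions and a small epsilon to avoid log(0):

```python
import numpy as np

def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two discrete distributions."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

B = 0.5                              # the fixed bound from the abstract
G_j = [0.5, 0.3, 0.2]                # adjacent discrete measures (toy values)
G_next = [0.4, 0.35, 0.25]
d = sym_kl(G_j, G_next)
print(f"sym-KL = {d:.4f}; within bound B = {B}: {d <= B}")
```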
Adaptive Feature Abstraction for Translating Video to Text
Title | Adaptive Feature Abstraction for Translating Video to Text |
Authors | Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin |
Abstract | Previous models for video captioning often use the output from a specific layer of a Convolutional Neural Network (CNN) as video features. However, the variable context-dependent semantics in the video may make it more appropriate to adaptively select features from the multiple CNN layers. We propose a new approach for generating adaptive spatiotemporal representations of videos for the captioning task. A novel attention mechanism is developed that adaptively and sequentially focuses on different layers of CNN features (levels of feature “abstraction”), as well as local spatiotemporal regions of the feature maps at each layer. The proposed approach is evaluated on three benchmark datasets: YouTube2Text, M-VAD and MSR-VTT. Along with visualizing the results and how the model works, these experiments quantitatively demonstrate the effectiveness of the proposed adaptive spatiotemporal feature abstraction for translating videos to sentences with rich semantics. |
Tasks | Video Captioning |
Published | 2016-11-23 |
URL | http://arxiv.org/abs/1611.07837v3 |
http://arxiv.org/pdf/1611.07837v3.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-feature-abstraction-for-translating |
Repo | |
Framework | |
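A toy sketch of the central mechanism: softmax attention over features drawn from several CNN layers (levels of abstraction), producing one adaptive context vector for the caption decoder. The pooled feature vectors, bilinear scoring function, and dimensions are stand-ins of ours; the paper's mechanism additionally attends over local spatiotemporal regions within each layer.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d = 64
layer_feats = [rng.normal(size=d) for _ in range(4)]  # pooled features from 4 CNN layers
h = rng.normal(size=d)                                # decoder hidden state
W = rng.normal(size=(d, d)) / np.sqrt(d)              # bilinear scoring weights

scores = np.array([f @ W @ h for f in layer_feats])   # one score per layer
alpha = softmax(scores)                               # attention over abstraction levels
context = sum(a * f for a, f in zip(alpha, layer_feats))
print("attention over layers:", np.round(alpha, 3))
```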
Constrained Multi-Slot Optimization for Ranking Recommendations
Title | Constrained Multi-Slot Optimization for Ranking Recommendations |
Authors | Kinjal Basu, Shaunak Chatterjee, Ankan Saha |
Abstract | Ranking items to be recommended to users is one of the main problems in large scale social media applications. This problem can be set up as a multi-objective optimization problem to allow for trading off multiple, potentially conflicting objectives (that are driven by those items) against each other. Most previous approaches to this problem optimize for a single slot without considering the interaction effect of these items on one another. In this paper, we develop a constrained multi-slot optimization formulation, which allows for modeling interactions among the items on the different slots. We characterize the solution in terms of problem parameters and identify conditions under which an efficient solution is possible. The problem formulation results in a quadratically constrained quadratic program (QCQP). We provide an algorithm that gives us an efficient solution by relaxing the constraints of the QCQP minimally. Through simulated experiments, we show the benefits of modeling interactions in a multi-slot ranking context, and the speed and accuracy of our QCQP approximate solver against other state-of-the-art methods. |
Tasks | |
Published | 2016-02-13 |
URL | http://arxiv.org/abs/1602.04391v2 |
http://arxiv.org/pdf/1602.04391v2.pdf | |
PWC | https://paperswithcode.com/paper/constrained-multi-slot-optimization-for |
Repo | |
Framework | |
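A toy QCQP in the spirit of the formulation above: choose scores for a few slots to maximize per-slot utility plus pairwise interaction terms, subject to a quadratic guardrail constraint. It is solved here with SciPy's generic SLSQP solver rather than the paper's minimal-relaxation algorithm, and all coefficients are made up.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
k = 4                                      # number of slots
u = rng.uniform(0.5, 1.0, size=k)          # per-slot utility
A = -0.1 * rng.uniform(size=(k, k))        # interaction (cannibalization) terms
A = (A + A.T) / 2                          # symmetrize

def neg_objective(x):
    return -(u @ x + x @ A @ x)

# One quadratic constraint: ||x||^2 <= 1.5 (a stand-in guardrail).
cons = [{'type': 'ineq', 'fun': lambda x: 1.5 - x @ x}]
res = minimize(neg_objective, x0=np.full(k, 0.5), bounds=[(0, 1)] * k,
               constraints=cons, method='SLSQP')
print("slot scores:", np.round(res.x, 3), "objective:", -res.fun)
```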
The Anatomy of a Search and Mining System for Digital Archives
Title | The Anatomy of a Search and Mining System for Digital Archives |
Authors | Martyn Harris, Mark Levene, Dell Zhang, Dan Levene |
Abstract | Samtla (Search And Mining Tools with Linguistic Analysis) is a digital humanities system designed in collaboration with historians and linguists to assist them with their research work in quantifying the content of any textual corpora through approximate phrase search and document comparison. The retrieval engine uses a character-based n-gram language model rather than the conventional word-based one so as to achieve great flexibility in language-agnostic query processing. The index is implemented as a space-optimised character-based suffix tree with an accompanying database of document content and metadata. A number of text mining tools are integrated into the system to allow researchers to discover textual patterns, perform comparative analysis, and find out what is currently popular in the research community. Herein we describe the system architecture, user interface, models and algorithms, and data storage of the Samtla system. We also present several case studies of its usage in practice together with an evaluation of the system’s ranking performance through crowdsourcing. |
Tasks | Language Modelling |
Published | 2016-03-23 |
URL | http://arxiv.org/abs/1603.07150v1 |
http://arxiv.org/pdf/1603.07150v1.pdf | |
PWC | https://paperswithcode.com/paper/the-anatomy-of-a-search-and-mining-system-for |
Repo | |
Framework | |
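A minimal character-based n-gram language model of the kind Samtla's retrieval engine is built on, with add-one smoothing; the space-optimised suffix-tree index and the ranking machinery of the real system are not reproduced here.

```python
import math
from collections import Counter

def train_char_ngrams(text, n=3):
    counts, contexts = Counter(), Counter()
    padded = ' ' * (n - 1) + text          # pad so early characters have context
    for i in range(len(text)):
        gram = padded[i:i + n]
        counts[gram] += 1
        contexts[gram[:-1]] += 1
    return counts, contexts

def log_prob(query, counts, contexts, n=3, vocab=256):
    """Add-one-smoothed log-probability of a query string."""
    padded = ' ' * (n - 1) + query
    lp = 0.0
    for i in range(len(query)):
        gram = padded[i:i + n]
        lp += math.log((counts[gram] + 1) / (contexts[gram[:-1]] + vocab))
    return lp

counts, contexts = train_char_ngrams("the cat sat on the mat", n=3)
print(log_prob("the mat", counts, contexts))   # higher = more corpus-like
```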
On Column Selection in Approximate Kernel Canonical Correlation Analysis
Title | On Column Selection in Approximate Kernel Canonical Correlation Analysis |
Authors | Weiran Wang |
Abstract | We study the problem of column selection in large-scale kernel canonical correlation analysis (KCCA) using the Nyström approximation, where one approximates two positive semi-definite kernel matrices using “landmark” points from the training set. When building low-rank kernel approximations in KCCA, previous work mostly samples the landmarks uniformly at random from the training set. We propose novel strategies for sampling the landmarks non-uniformly based on a version of statistical leverage scores recently developed for kernel ridge regression. We study the approximation accuracy of the proposed non-uniform sampling strategy, develop an incremental algorithm that explores the path of approximation ranks and facilitates efficient model selection, and derive the kernel stability of out-of-sample mapping for our method. Experimental results on both synthetic and real-world datasets demonstrate the promise of our method. |
Tasks | Model Selection |
Published | 2016-02-05 |
URL | http://arxiv.org/abs/1602.02172v1 |
http://arxiv.org/pdf/1602.02172v1.pdf | |
PWC | https://paperswithcode.com/paper/on-column-selection-in-approximate-kernel |
Repo | |
Framework | |
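The sketch below contrasts uniform landmark sampling with sampling proportional to ridge leverage scores when building a Nyström approximation of an RBF kernel matrix. For this small toy problem the leverage scores are computed exactly via a matrix inverse; the paper's setting involves two kernel matrices (KCCA) and fast approximate scores.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom(K, idx):
    """Low-rank reconstruction K ~ C W^+ C^T from landmark columns idx."""
    C, W = K[:, idx], K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 5))
K = rbf_kernel(X, X)
lam = 1e-2
lev = np.diag(K @ np.linalg.inv(K + lam * np.eye(len(K))))  # ridge leverage scores
p = lev / lev.sum()

m = 30
idx_lev = rng.choice(len(K), m, replace=False, p=p)
idx_uni = rng.choice(len(K), m, replace=False)
for name, idx in [("leverage", idx_lev), ("uniform", idx_uni)]:
    err = np.linalg.norm(K - nystrom(K, idx)) / np.linalg.norm(K)
    print(f"{name:8s} relative error: {err:.4f}")
```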
Semi-Supervised Recognition of the Diploglossus Millepunctatus Lizard Species using Artificial Vision Algorithms
Title | Semi-Supervised Recognition of the Diploglossus Millepunctatus Lizard Species using Artificial Vision Algorithms |
Authors | Jhony-Heriberto Giraldo-Zuluaga, Augusto Salazar, Juan M. Daza |
Abstract | Animal biometrics is an important requirement for monitoring and conservation tasks. Classical animal biometrics methods risk the animals’ integrity, are expensive when many animals are involved, and depend on expert judgment. Non-invasive biometric techniques offer alternatives to these problems. In this paper we propose an automatic segmentation and identification algorithm based on artificial vision algorithms to recognize Diploglossus millepunctatus, an endangered lizard species. The algorithm comprises two stages: automatic segmentation, to remove subjective evaluation, and identification, to reduce analysis time. An average correct segmentation rate of 82.87% is achieved. Identification is performed with Euclidean-distance point-set algorithms such as Iterative Closest Point and Procrustes Analysis, reaching Top-1 accuracy of 92.99% and Top-5 accuracy of 96.82%. The developed software and the database used in this paper are publicly available for download from the web page of the project. |
Tasks | |
Published | 2016-11-09 |
URL | http://arxiv.org/abs/1611.02803v1 |
http://arxiv.org/pdf/1611.02803v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-recognition-of-the |
Repo | |
Framework | |
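A hedged sketch of the identification stage's idea: compare landmark point sets with Procrustes analysis (via `scipy.spatial.procrustes`) and rank known individuals by disparity. The point sets are synthetic stand-ins for spot patterns, and the paper additionally uses Iterative Closest Point.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(4)
gallery = {f"lizard_{k}": rng.normal(size=(20, 2)) for k in range(5)}

# A noisy, rotated, scaled view of lizard_2's pattern serves as the query.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
query = 1.7 * gallery["lizard_2"] @ R.T + rng.normal(scale=0.05, size=(20, 2))

# procrustes(...)[2] is the disparity after optimal translation/scale/rotation.
ranked = sorted(gallery, key=lambda k: procrustes(gallery[k], query)[2])
print("best match:", ranked[0])
```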
A Nonparametric Latent Factor Model For Location-Aware Video Recommendations
Title | A Nonparametric Latent Factor Model For Location-Aware Video Recommendations |
Authors | Ehtsham Elahi |
Abstract | We are interested in learning customers’ video preferences from their historic viewing patterns and geographical location. We consider a Bayesian latent factor modeling approach for this task. In order to tune the complexity of the model to best represent the data, we make use of Bayesian nonparametric techniques. We describe an inference technique that can scale to large real-world data sets. Finally, we show results obtained by applying the model to a large internal Netflix data set, which illustrate that the model is able to capture interesting relationships between viewing patterns and geographical location. |
Tasks | |
Published | 2016-12-05 |
URL | http://arxiv.org/abs/1612.01481v1 |
http://arxiv.org/pdf/1612.01481v1.pdf | |
PWC | https://paperswithcode.com/paper/a-nonparametric-latent-factor-model-for |
Repo | |
Framework | |
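A small sketch of the Bayesian nonparametric ingredient mentioned in the abstract: stick-breaking weights for a Dirichlet process, which let the effective number of latent factors be inferred from data rather than fixed in advance. The full location-aware model is considerably richer.

```python
import numpy as np

def stick_breaking(alpha, n_atoms, rng):
    """Truncated stick-breaking: w_k = beta_k * prod_{j<k} (1 - beta_j)."""
    betas = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1 - betas)[:-1]])
    return betas * remaining

rng = np.random.default_rng(5)
w = stick_breaking(alpha=2.0, n_atoms=20, rng=rng)
print("weights:", np.round(w, 3), "sum:", w.sum().round(3))
# Smaller alpha concentrates mass on a few factors; larger alpha spreads it out.
```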
Neural networks with differentiable structure
Title | Neural networks with differentiable structure |
Authors | Thomas Miconi |
Abstract | While gradient descent has proven highly successful in learning connection weights for neural networks, the actual structure of these networks is usually determined by hand, or by other optimization algorithms. Here we describe a simple method to make network structure differentiable, and therefore accessible to gradient descent. We test this method on recurrent neural networks applied to simple sequence prediction problems. Starting with initial networks containing only one node, the method automatically builds networks that successfully solve the tasks. The number of nodes in the final network correlates with task difficulty. The method can dynamically increase network size in response to an abrupt complexification in the task; however, reduction in network size in response to task simplification is not evident for reasonable meta-parameters. The method does not penalize network performance for these test tasks: variable-size networks actually reach better performance than fixed-size networks of higher, lower or identical size. We conclude by discussing how this method could be applied to more complex networks, such as feedforward layered networks, or multiple-area networks of arbitrary shape. |
Tasks | |
Published | 2016-06-20 |
URL | http://arxiv.org/abs/1606.06216v3 |
http://arxiv.org/pdf/1606.06216v3.pdf | |
PWC | https://paperswithcode.com/paper/neural-networks-with-differentiable-structure |
Repo | |
Framework | |
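One simple way to make structure differentiable, sketched below in PyTorch under assumptions of ours (a feedforward layer rather than the paper's recurrent networks): multiply each unit's output by a learnable gate clipped to [0, 1] and add an L1 penalty on the gates, so gradient descent can effectively switch units on and off.

```python
import torch

torch.manual_seed(0)

class GatedLayer(torch.nn.Module):
    """A layer whose effective width is learned: unit k's output is scaled by
    a gate clipped to [0, 1], so gates driven to 0 remove the unit."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.lin = torch.nn.Linear(d_in, d_hidden)
        self.gate = torch.nn.Parameter(torch.full((d_hidden,), 0.9))

    def forward(self, x):
        return torch.tanh(self.lin(x)) * self.gate.clamp(0.0, 1.0)

layer, head = GatedLayer(4, 16), torch.nn.Linear(16, 1)
opt = torch.optim.SGD(list(layer.parameters()) + list(head.parameters()), lr=0.1)
x = torch.randn(256, 4)
y = x[:, :1] * x[:, 1:2]                          # toy nonlinear target

for _ in range(500):
    opt.zero_grad()
    mse = ((head(layer(x)) - y) ** 2).mean()
    sparsity = layer.gate.clamp(0.0, 1.0).sum()   # L1 pressure on open gates
    (mse + 1e-2 * sparsity).backward()
    opt.step()

print("units still active:", int((layer.gate > 0.05).sum()))
```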
Generating Natural Questions About an Image
Title | Generating Natural Questions About an Image |
Authors | Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende |
Abstract | There has been an explosion of work in the vision & language community during the past few years from image captioning to video transcription, and answering questions about images. These tasks have focused on literal descriptions of the image. To move beyond the literal, we choose to explore how questions about an image are often directed at commonsense inference and the abstract events evoked by objects in the image. In this paper, we introduce the novel task of Visual Question Generation (VQG), where the system is tasked with asking a natural and engaging question when shown an image. We provide three datasets which cover a variety of images from object-centric to event-centric, with considerably more abstract training data than provided to state-of-the-art captioning systems thus far. We train and test several generative and retrieval models to tackle the task of VQG. Evaluation results show that while such models ask reasonable questions for a variety of images, there is still a wide gap with human performance which motivates further work on connecting images with commonsense knowledge and pragmatics. Our proposed task offers a new challenge to the community which we hope furthers interest in exploring deeper connections between vision & language. |
Tasks | Image Captioning, Question Generation |
Published | 2016-03-19 |
URL | http://arxiv.org/abs/1603.06059v3 |
http://arxiv.org/pdf/1603.06059v3.pdf | |
PWC | https://paperswithcode.com/paper/generating-natural-questions-about-an-image |
Repo | |
Framework | |
Neural Network-based Word Alignment through Score Aggregation
Title | Neural Network-based Word Alignment through Score Aggregation |
Authors | Joel Legrand, Michael Auli, Ronan Collobert |
Abstract | We present a simple neural network for word alignment that builds source and target word window representations to compute alignment scores for sentence pairs. To enable unsupervised training, we use an aggregation operation that summarizes the alignment scores for a given target word. A soft-margin objective increases scores for true target words while decreasing scores for target words that are not present. Compared to the popular Fast Align model, our approach improves alignment accuracy by 7 AER on English-Czech, by 6 AER on Romanian-English and by 1.7 AER on English-French alignment. |
Tasks | Word Alignment |
Published | 2016-06-30 |
URL | http://arxiv.org/abs/1606.09560v1 |
http://arxiv.org/pdf/1606.09560v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-network-based-word-alignment-through |
Repo | |
Framework | |
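A sketch of the aggregation idea with random stand-in embeddings: score every (source, target) word pair from window representations, then summarize each target word's column with a smooth maximum (log-sum-exp) so that training needs only sentence-level supervision. The window encoder, negative sampling, and soft-margin loss of the paper are omitted.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(6)
d, src_len, tgt_len = 32, 7, 6
src = rng.normal(size=(src_len, d))    # source word-window representations
tgt = rng.normal(size=(tgt_len, d))    # target word-window representations

scores = src @ tgt.T / np.sqrt(d)      # (src_len, tgt_len) alignment scores
agg = logsumexp(scores, axis=0)        # one aggregated score per target word
print("aggregated target scores:", np.round(agg, 3))

# A soft-margin objective would push `agg` up for observed target words and
# down for sampled negatives; hard alignments are read off per column.
print("hard alignments:", scores.argmax(axis=0))
```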
A Method for Image Reduction Based on a Generalization of Ordered Weighted Averaging Functions
Title | A Method for Image Reduction Based on a Generalization of Ordered Weighted Averaging Functions |
Authors | A. Diego S. Farias, Valdigleis S. Costa, Luiz Ranyer A. Lopes, Benjamín Bedregal, Regivan Santiago |
Abstract | In this paper we propose a special type of aggregation function which generalizes the notion of Ordered Weighted Averaging Function (OWA). The resulting functions are called Dynamic Ordered Weighted Averaging Functions (DYOWAs). This generalization is developed in such a way that the weight vectors are variables depending on the input vector. In particular, these operators generalize aggregation functions such as the minimum, maximum, arithmetic mean, and median, which are extensively used in image processing. In this field of research two problems are considered: the determination of methods to reduce images and the construction of techniques which provide noise reduction. The operators described here can be used in both cases. For image reduction we apply the methodology provided by Patermain et al.; we then use the noise-reduction operators obtained here to treat the reduced images, obtaining images of better quality. |
Tasks | |
Published | 2016-01-15 |
URL | http://arxiv.org/abs/1601.03785v1 |
http://arxiv.org/pdf/1601.03785v1.pdf | |
PWC | https://paperswithcode.com/paper/a-method-for-image-reduction-based-on-a |
Repo | |
Framework | |
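The sketch below shows a classical OWA function and one possible "dynamic" variant in which the weight vector depends on the input, here through its dispersion. The specific dependence is our illustrative choice, not the DYOWA construction of the paper.

```python
import numpy as np

def owa(x, w):
    """Classical OWA: weights are applied to the sorted (descending) inputs."""
    return np.sort(x)[::-1] @ w

def dyowa(x):
    """Toy dynamic OWA: high-dispersion inputs lean toward the median,
    low-dispersion inputs toward the arithmetic mean (odd-length x)."""
    n = len(x)
    t = min(np.std(x) / (np.ptp(x) + 1e-12), 1.0)   # dispersion in [0, 1]
    mean_w = np.full(n, 1.0 / n)
    med_w = np.zeros(n)
    med_w[n // 2] = 1.0                              # selects the median
    w = (1 - t) * mean_w + t * med_w
    return owa(x, w / w.sum())

x = np.array([0.2, 0.9, 0.4, 0.8, 0.1])
print("max  :", owa(x, np.array([1, 0, 0, 0, 0.])))  # OWA recovers the maximum
print("mean :", owa(x, np.full(5, 0.2)))             # ... the arithmetic mean
print("dyowa:", dyowa(x))
```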
The interplay between system identification and machine learning
Title | The interplay between system identification and machine learning |
Authors | Gianluigi Pillonetto |
Abstract | Learning from examples is one of the key problems in science and engineering. It deals with function reconstruction from a finite set of direct and noisy samples. Regularization in reproducing kernel Hilbert spaces (RKHSs) is widely used to solve this task and includes powerful estimators such as regularization networks. Recent achievements include the proof of the statistical consistency of these kernel-based approaches. In parallel, many different system identification techniques have been developed, but their interaction with machine learning does not yet appear strong. One reason is that the RKHSs usually employed in machine learning do not embed the information available on dynamic systems, e.g. BIBO stability. In addition, in system identification the independent-data assumptions routinely adopted in machine learning are never satisfied in practice. This paper provides new results which strengthen the connection between system identification and machine learning. Our starting point is the introduction of RKHSs of dynamic systems. They contain functionals over spaces defined by system inputs and allow system identification to be interpreted as learning from examples. In both linear and nonlinear settings, it is shown that this perspective permits a relatively simple derivation of conditions for RKHS stability (i.e. the property of containing only BIBO-stable systems or predictors), and also facilitates the design of new kernels for system identification. Furthermore, we prove the convergence of the regularized estimator to the optimal predictor under conditions typical of dynamic systems. |
Tasks | |
Published | 2016-12-29 |
URL | http://arxiv.org/abs/1612.09158v1 |
http://arxiv.org/pdf/1612.09158v1.pdf | |
PWC | https://paperswithcode.com/paper/the-interplay-between-system-identification |
Repo | |
Framework | |
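A sketch of identification-as-learning in this spirit: estimate an FIR impulse response by kernel ridge regression with an exponentially decaying kernel (in the flavor of the stable spline kernels from this literature), which encodes BIBO stability as a prior. The system, noise level, and hyperparameters are toy choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 400, 50                          # data length, FIR order
g_true = 0.8 ** np.arange(m) * np.sin(0.9 * np.arange(m))  # stable impulse response
u = rng.normal(size=n)                  # input signal

# Regressor matrix: Phi[t, k] = u[t - k], so y = Phi @ g + noise.
Phi = np.zeros((n, m))
for k in range(m):
    Phi[k:, k] = u[:n - k]
y = Phi @ g_true + 0.1 * rng.normal(size=n)

# Exponentially decaying kernel K[i, j] = alpha**max(i, j) enforces decay of g.
alpha, lam = 0.9, 1.0
i = np.arange(m)
K = alpha ** np.maximum.outer(i, i)

# Kernel ridge (representer) solution: g = K Phi^T (Phi K Phi^T + lam I)^{-1} y.
g_hat = K @ Phi.T @ np.linalg.solve(Phi @ K @ Phi.T + lam * np.eye(n), y)
print("relative error:", np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true))
```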