Paper Group ANR 527
Graphical Model Sketch. Using Apache Lucene to Search Vector of Locally Aggregated Descriptors. Smoothing parameter estimation framework for IBM word alignment models. High Dimensional Inference with Random Maximum A-Posteriori Perturbations. ROOT13: Spotting Hypernyms, Co-Hyponyms and Randoms. Unsupervised preprocessing for Tactile Data. Linear Hypothesis Testing in Dense High-Dimensional Linear Models. Communication-efficient Distributed Sparse Linear Discriminant Analysis. Does Distributionally Robust Supervised Learning Give Robust Classifiers? Simple and Efficient Learning using Privileged Information. A Constraint-Handling Technique for Genetic Algorithms using a Violation Factor. Evaluation of the Effect of Improper Segmentation on Word Spotting. Linear-time Outlier Detection via Sensitivity. Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints. Chinese Song Iambics Generation with Neural Attention-based Model.
Graphical Model Sketch
Title | Graphical Model Sketch |
Authors | Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun |
Abstract | Structured high-cardinality data arises in many domains, and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables. The count-min (CM) sketch is a popular approach to estimating probabilities in high-cardinality data but it does not scale well beyond a few variables. In this work, we bring together the ideas of graphical models and count sketches, and propose and analyze several approaches to estimating probabilities in structured high-cardinality streams of data. The key idea of our approximations is to use the structure of a graphical model and approximately estimate its factors by “sketches”, which hash high-cardinality variables using random projections. Our approximations are computationally efficient and their space complexity is independent of the cardinality of variables. Our error bounds are multiplicative and significantly improve upon those of the CM sketch, a state-of-the-art approach to estimating probabilities in streams. We evaluate our approximations on synthetic and real-world problems, and report order-of-magnitude improvements over the CM sketch. |
Tasks | |
Published | 2016-02-09 |
URL | http://arxiv.org/abs/1602.03105v2 |
http://arxiv.org/pdf/1602.03105v2.pdf | |
PWC | https://paperswithcode.com/paper/graphical-model-sketch |
Repo | |
Framework | |
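The paper builds on the count-min sketch, a standard streaming data structure for approximate counts. A minimal Python sketch of it is below; the hash construction, width, and depth are illustrative assumptions, and the graphical-model factorization that the paper layers on top is not shown.

```python
import random

class CountMinSketch:
    """Minimal count-min sketch: d independent hash rows of width w.
    Point queries are overestimates; the error shrinks as the width grows."""

    def __init__(self, width=2048, depth=5, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        # Per-row seeds stand in for independent hash functions.
        self.seeds = [rng.getrandbits(64) for _ in range(depth)]
        self.tables = [[0] * width for _ in range(depth)]
        self.total = 0

    def _index(self, row, item):
        return hash((self.seeds[row], item)) % self.width

    def update(self, item, count=1):
        self.total += count
        for r in range(self.depth):
            self.tables[r][self._index(r, item)] += count

    def estimate(self, item):
        return min(self.tables[r][self._index(r, item)] for r in range(self.depth))

    def probability(self, item):
        # Plug-in probability estimate for a streamed item.
        return self.estimate(item) / max(self.total, 1)

# Usage: stream (user, page) pairs and query an approximate joint probability.
cms = CountMinSketch()
for pair in [("u1", "a"), ("u1", "a"), ("u2", "b")]:
    cms.update(pair)
print(cms.probability(("u1", "a")))  # ~2/3, up to sketch error
```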
Using Apache Lucene to Search Vector of Locally Aggregated Descriptors
Title | Using Apache Lucene to Search Vector of Locally Aggregated Descriptors |
Authors | Giuseppe Amato, Paolo Bolettieri, Fabrizio Falchi, Claudio Gennaro, Lucia Vadicamo |
Abstract | Surrogate Text Representation (STR) is a profitable solution to efficient similarity search in metric spaces using conventional text search engines, such as Apache Lucene. This technique is based on comparing the permutations of some reference objects in place of the original metric distance. However, the Achilles' heel of the STR approach is the need to reorder the result set of the search according to the metric distance. This forces the use of a support database to store the original objects, which requires efficient random I/O on a fast secondary memory (such as flash-based storage). In this paper, we propose to extend the Surrogate Text Representation to specifically address a class of visual metric objects known as Vector of Locally Aggregated Descriptors (VLAD). This approach is based on representing the individual sub-vectors forming the VLAD vector with the STR, providing a finer representation of the vector and enabling us to get rid of the reordering phase. The experiments on a publicly available dataset show that the extended STR outperforms the baseline STR, achieving satisfactory performance close to that obtained with the original VLAD vectors. |
Tasks | |
Published | 2016-04-19 |
URL | http://arxiv.org/abs/1604.05576v1 |
http://arxiv.org/pdf/1604.05576v1.pdf | |
PWC | https://paperswithcode.com/paper/using-apache-lucene-to-search-vector-of |
Repo | |
Framework | |
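The surrogate-text idea of turning metric closeness into text that a search engine can score can be illustrated with a small permutation-based encoder. The reference objects, prefix length, and token-repetition scheme below are assumptions for illustration, not the paper's exact encoding; indexing the resulting strings in Lucene would then reduce metric search to ordinary full-text search.

```python
import numpy as np

def surrogate_text(vector, references, prefix_len=4):
    """Encode a vector as text via the permutation of its nearest reference objects.
    Closer references are repeated more often, so a text engine's term-frequency
    scoring approximates metric closeness (illustrative scheme, not the paper's exact one)."""
    dists = np.linalg.norm(references - vector, axis=1)
    order = np.argsort(dists)[:prefix_len]          # ids of the closest reference objects
    tokens = []
    for rank, ref_id in enumerate(order):
        # Repeat a token more times the closer its reference object is.
        tokens.extend([f"ref{ref_id}"] * (prefix_len - rank))
    return " ".join(tokens)

rng = np.random.default_rng(0)
refs = rng.normal(size=(10, 8))                     # 10 reference objects in R^8 (placeholder)
query = rng.normal(size=8)
print(surrogate_text(query, refs))                  # e.g. "ref3 ref3 ref3 ref3 ref7 ref7 ref7 ..."
```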
Smoothing parameter estimation framework for IBM word alignment models
Title | Smoothing parameter estimation framework for IBM word alignment models |
Authors | Vuong Van Bui, Cuong Anh Le |
Abstract | IBM models are very important word alignment models in Machine Translation. When their parameters are estimated following the Maximum Likelihood Estimation principle, the models easily overfit the training data when the data are sparse. While smoothing is a very popular solution in language modeling, studies on smoothing for word alignment are still lacking. In this paper, we propose a framework which generalizes the notable work of Moore [2004] on applying additive smoothing to word alignment models. The framework allows developers to customize the smoothing amount for each pair of words. The added amount is scaled appropriately by a common factor which reflects how much the framework trusts the adding strategy according to its performance on the data. We also carefully examine various performance criteria and propose a smoothed version of the error count, which generally gives the best result. |
Tasks | Language Modelling, Machine Translation, Word Alignment |
Published | 2016-01-14 |
URL | http://arxiv.org/abs/1601.03650v4 |
http://arxiv.org/pdf/1601.03650v4.pdf | |
PWC | https://paperswithcode.com/paper/smoothing-parameter-estimation-framework-for |
Repo | |
Framework | |
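Additive smoothing of the translation table, which the framework generalizes, can be sketched on top of one EM step of IBM Model 1. The add-alpha amount and the uniform initialization below are illustrative assumptions, not the paper's tuned smoothing strategy.

```python
from collections import defaultdict

def model1_em_step(bitext, t, alpha=0.1):
    """One EM iteration of IBM Model 1 with additive (add-alpha) smoothing of the
    translation table t[f][e] ~ P(f | e); the alpha value is an illustrative choice."""
    counts = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for src, tgt in bitext:                           # src: source words e, tgt: target words f
        for f in tgt:
            norm = sum(t[f][e] for e in src)
            for e in src:
                c = t[f][e] / norm                    # expected count of aligning f to e
                counts[f][e] += c
                totals[e] += c
    vocab_f = list(counts)
    new_t = defaultdict(lambda: defaultdict(float))
    for f in vocab_f:
        for e in totals:
            # Additive smoothing keeps unseen word pairs away from zero probability.
            new_t[f][e] = (counts[f][e] + alpha) / (totals[e] + alpha * len(vocab_f))
    return new_t

# Tiny usage example with a uniform initialization of the translation table.
bitext = [(["the", "house"], ["das", "haus"]), (["the", "book"], ["das", "buch"])]
t = defaultdict(lambda: defaultdict(lambda: 0.25))
for _ in range(5):
    t = model1_em_step(bitext, t)
print(t["haus"]["house"])                             # should exceed t["haus"]["the"] after EM
```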
High Dimensional Inference with Random Maximum A-Posteriori Perturbations
Title | High Dimensional Inference with Random Maximum A-Posteriori Perturbations |
Authors | Tamir Hazan, Francesco Orabona, Anand D. Sarwate, Subhransu Maji, Tommi Jaakkola |
Abstract | This paper presents a new approach, called perturb-max, for high-dimensional statistical inference that is based on applying random perturbations followed by optimization. This framework injects randomness to maximum a-posteriori (MAP) predictors by randomly perturbing the potential function for the input. A classic result from extreme value statistics asserts that perturb-max operations generate unbiased samples from the Gibbs distribution using high-dimensional perturbations. Unfortunately, the computational cost of generating so many high-dimensional random variables can be prohibitive. However, when the perturbations are of low dimension, sampling the perturb-max prediction is as efficient as MAP optimization. This paper shows that the expected value of perturb-max inference with low dimensional perturbations can be used sequentially to generate unbiased samples from the Gibbs distribution. Furthermore the expected value of the maximal perturbations is a natural bound on the entropy of such perturb-max models. A measure concentration result for perturb-max values shows that the deviation of their sampled average from its expectation decays exponentially in the number of samples, allowing effective approximation of the expectation. |
Tasks | |
Published | 2016-02-10 |
URL | http://arxiv.org/abs/1602.03571v3 |
http://arxiv.org/pdf/1602.03571v3.pdf | |
PWC | https://paperswithcode.com/paper/high-dimensional-inference-with-random |
Repo | |
Framework | |
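The classic extreme-value result the abstract refers to, that maximizing potentials perturbed by i.i.d. Gumbel noise (one perturbation per configuration) yields exact samples from the Gibbs distribution, can be checked directly on a toy model; the potential table below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, 2.0, 0.5, 3.0])            # potentials of 4 configurations (illustrative)
gibbs = np.exp(theta) / np.exp(theta).sum()       # target Gibbs distribution

def perturb_max_sample(theta, rng):
    # Add one i.i.d. Gumbel(0, 1) perturbation per configuration, then take the argmax.
    gumbel = rng.gumbel(size=theta.shape)
    return int(np.argmax(theta + gumbel))

samples = [perturb_max_sample(theta, rng) for _ in range(100000)]
empirical = np.bincount(samples, minlength=4) / len(samples)
print(np.round(gibbs, 3), np.round(empirical, 3))  # the two should agree up to Monte Carlo noise
```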
ROOT13: Spotting Hypernyms, Co-Hyponyms and Randoms
Title | ROOT13: Spotting Hypernyms, Co-Hyponyms and Randoms |
Authors | Enrico Santus, Tin-Shing Chiu, Qin Lu, Alessandro Lenci, Chu-Ren Huang |
Abstract | In this paper, we describe ROOT13, a supervised system for the classification of hypernyms, co-hyponyms and random words. The system relies on a Random Forest algorithm and 13 unsupervised corpus-based features. We evaluate it with a 10-fold cross validation on 9,600 pairs, equally distributed among the three classes and involving several Parts-Of-Speech (i.e. adjectives, nouns and verbs). When all the classes are present, ROOT13 achieves an F1 score of 88.3%, against a baseline of 57.6% (vector cosine). When the classification is binary, ROOT13 achieves the following results: hypernyms-co-hyponyms (93.4% vs. 60.2%), hypernyms-random (92.3% vs. 65.5%) and co-hyponyms-random (97.3% vs. 81.5%). Our results are competitive with state-of-the-art models. |
Tasks | |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08705v1 |
http://arxiv.org/pdf/1603.08705v1.pdf | |
PWC | https://paperswithcode.com/paper/root13-spotting-hypernyms-co-hyponyms-and |
Repo | |
Framework | |
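The modeling setup, a Random Forest over 13 corpus-based features per word pair evaluated with 10-fold cross-validation, maps directly onto scikit-learn. The feature matrix and labels below are random placeholders, since the paper's actual corpus statistics are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(9600, 13))          # placeholder for the 13 corpus-based features per pair
y = rng.integers(0, 3, size=9600)        # 0: hypernym, 1: co-hyponym, 2: random (labels illustrative)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="f1_macro")   # 10-fold CV as in the paper
print(scores.mean())
```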
Unsupervised preprocessing for Tactile Data
Title | Unsupervised preprocessing for Tactile Data |
Authors | Maximilian Karl, Justin Bayer, Patrick van der Smagt |
Abstract | Tactile information is important for gripping, stable grasp, and in-hand manipulation, yet the complexity of tactile data prevents widespread use of such sensors. We make use of an unsupervised learning algorithm that transforms the complex tactile data into a compact, latent representation without the need to record ground truth reference data. These compact representations can either be used directly in a reinforcement learning based controller or can be used to calibrate the tactile sensor to physical quantities with only a few datapoints. We show the quality of our latent representation by predicting important features and with a simple control task. |
Tasks | |
Published | 2016-06-23 |
URL | http://arxiv.org/abs/1606.07312v1 |
http://arxiv.org/pdf/1606.07312v1.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-preprocessing-for-tactile-data |
Repo | |
Framework | |
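The abstract does not name the specific unsupervised model, so the sketch below uses PCA purely as a stand-in: compress raw tactile frames into a compact latent code without labels, then calibrate that code to a physical quantity from only a few labeled points.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
frames = rng.normal(size=(5000, 256))    # placeholder raw tactile frames (e.g. 16x16 taxels)

# Unsupervised compression into a low-dimensional latent code; PCA is only a stand-in here.
latent = PCA(n_components=8).fit(frames)
codes = latent.transform(frames)

# Calibrate the latent code to a physical quantity (e.g. normal force) with a few labeled points.
few_idx = rng.choice(len(frames), size=20, replace=False)
force = rng.normal(size=20)              # placeholder ground-truth readings for those few frames
calib = Ridge().fit(codes[few_idx], force)
```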
Linear Hypothesis Testing in Dense High-Dimensional Linear Models
Title | Linear Hypothesis Testing in Dense High-Dimensional Linear Models |
Authors | Yinchu Zhu, Jelena Bradic |
Abstract | We propose a methodology for testing linear hypotheses in high-dimensional linear models. The proposed test does not impose any restriction on the size of the model, i.e. model sparsity or the loading vector representing the hypothesis. Providing asymptotically valid methods for testing general linear functions of the regression parameters in high dimensions is extremely challenging – especially without making restrictive or unverifiable assumptions on the number of non-zero elements. We propose to test the moment conditions related to the newly designed restructured regression, where the inputs are transformed and augmented features. These new features incorporate the structure of the null hypothesis directly. The test statistics are constructed in such a way that lack of sparsity in the original model parameter does not present a problem for the theoretical justification of our procedures. We establish asymptotically exact control on Type I error without imposing any sparsity assumptions on the model parameter or the vector representing the linear hypothesis. Our method is also shown to achieve certain optimality in detecting deviations from the null hypothesis. We demonstrate the favorable finite-sample performance of the proposed methods via a number of numerical examples and a real data example. |
Tasks | |
Published | 2016-10-10 |
URL | http://arxiv.org/abs/1610.02987v2 |
http://arxiv.org/pdf/1610.02987v2.pdf | |
PWC | https://paperswithcode.com/paper/linear-hypothesis-testing-in-dense-high |
Repo | |
Framework | |
Communication-efficient Distributed Sparse Linear Discriminant Analysis
Title | Communication-efficient Distributed Sparse Linear Discriminant Analysis |
Authors | Lu Tian, Quanquan Gu |
Abstract | We propose a communication-efficient distributed estimation method for sparse linear discriminant analysis (LDA) in the high dimensional regime. Our method distributes the data of size $N$ into $m$ machines, and estimates a local sparse LDA estimator on each machine using the data subset of size $N/m$. After the distributed estimation, our method aggregates the debiased local estimators from $m$ machines, and sparsifies the aggregated estimator. We show that the aggregated estimator attains the same statistical rate as the centralized estimation method, as long as the number of machines $m$ is chosen appropriately. Moreover, we prove that our method can attain the model selection consistency under a milder condition than the centralized method. Experiments on both synthetic and real datasets corroborate our theory. |
Tasks | Model Selection |
Published | 2016-10-15 |
URL | http://arxiv.org/abs/1610.04798v1 |
http://arxiv.org/pdf/1610.04798v1.pdf | |
PWC | https://paperswithcode.com/paper/communication-efficient-distributed-sparse |
Repo | |
Framework | |
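The recipe in the abstract, estimate locally on each machine, average the local estimators, then sparsify the average, can be sketched generically. The debiasing step is omitted, and the plain LDA direction, ridge term, and threshold below are simplifying assumptions, not the paper's estimator.

```python
import numpy as np

def local_lda_direction(X, y):
    """Plain (non-debiased) LDA discriminant direction on one machine's data subset:
    inverse pooled covariance times the class-mean difference."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    cov = np.cov(X.T) + 1e-3 * np.eye(X.shape[1])    # small ridge for stability (assumption)
    return np.linalg.solve(cov, mu1 - mu0)

def distributed_sparse_lda(X, y, m=4, threshold=0.05):
    # Split across m "machines", average the local directions, then sparsify by hard thresholding.
    parts = np.array_split(np.arange(len(X)), m)
    avg = np.mean([local_lda_direction(X[idx], y[idx]) for idx in parts], axis=0)
    avg[np.abs(avg) < threshold] = 0.0               # sparsification step
    return avg

rng = np.random.default_rng(0)
X, y = rng.normal(size=(400, 20)), rng.integers(0, 2, size=400)   # placeholder data
print(distributed_sparse_lda(X, y))                  # aggregated, sparsified discriminant direction
```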
Does Distributionally Robust Supervised Learning Give Robust Classifiers?
Title | Does Distributionally Robust Supervised Learning Give Robust Classifiers? |
Authors | Weihua Hu, Gang Niu, Issei Sato, Masashi Sugiyama |
Abstract | Distributionally Robust Supervised Learning (DRSL) is necessary for building reliable machine learning systems. When machine learning is deployed in the real world, its performance can be significantly degraded because test data may follow a different distribution from training data. DRSL with f-divergences explicitly considers the worst-case distribution shift by minimizing the adversarially reweighted training loss. In this paper, we analyze this DRSL, focusing on the classification scenario. Since the DRSL is explicitly formulated for a distribution shift scenario, we naturally expect it to give a robust classifier that can aggressively handle shifted distributions. However, surprisingly, we prove that the DRSL just ends up giving a classifier that exactly fits the given training distribution, which is too pessimistic. This pessimism comes from two sources: the particular losses used in classification and the fact that the variety of distributions to which the DRSL tries to be robust is too wide. Motivated by our analysis, we propose simple DRSL that overcomes this pessimism and empirically demonstrate its effectiveness. |
Tasks | |
Published | 2016-11-07 |
URL | http://arxiv.org/abs/1611.02041v6 |
http://arxiv.org/pdf/1611.02041v6.pdf | |
PWC | https://paperswithcode.com/paper/does-distributionally-robust-supervised |
Repo | |
Framework | |
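The adversarially reweighted training loss at the heart of f-divergence DRSL has a simple closed form in the KL-penalized case: example weights proportional to exp(loss / temperature). The sketch below shows only that reweighting step, with the temperature as an illustrative fixed value rather than the paper's exact formulation.

```python
import numpy as np

def adversarial_weights(losses, temperature=1.0):
    """Worst-case example weights for the KL-penalized distributionally robust objective:
    weights proportional to exp(loss / temperature). The temperature (playing the role of the
    multiplier on the divergence term) is an illustrative fixed value here."""
    scaled = (losses - losses.max()) / temperature   # subtract the max for numerical stability
    w = np.exp(scaled)
    return w / w.sum()

losses = np.array([0.1, 0.2, 2.5, 0.3])              # per-example training losses (illustrative)
w = adversarial_weights(losses, temperature=0.5)
robust_loss = np.dot(w, losses)                      # adversarially reweighted training loss
print(np.round(w, 3), round(float(robust_loss), 3))  # the hard example dominates the objective
```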
Simple and Efficient Learning using Privileged Information
Title | Simple and Efficient Learning using Privileged Information |
Authors | Xinxing Xu, Joey Tianyi Zhou, Ivor W. Tsang, Zheng Qin, Rick Siow Mong Goh, Yong Liu |
Abstract | The Support Vector Machine using Privileged Information (SVM+) has been proposed to train a classifier to utilize the additional privileged information that is only available in the training phase but not available in the test phase. In this work, we propose an efficient solution for SVM+ by simply utilizing the squared hinge loss instead of the hinge loss as in the existing SVM+ formulation, which interestingly leads to a dual form with fewer variables that has the same form as the dual of the standard SVM. The proposed algorithm is utilized to leverage the additional web knowledge that is only available during training for the image categorization tasks. The extensive experimental results on both the Caltech101 and WebQueries datasets show that our proposed method can achieve up to a hundred-fold speedup with comparable accuracy when compared with the existing SVM+ method. |
Tasks | Image Categorization |
Published | 2016-04-06 |
URL | http://arxiv.org/abs/1604.01518v1 |
http://arxiv.org/pdf/1604.01518v1.pdf | |
PWC | https://paperswithcode.com/paper/simple-and-efficient-learning-using |
Repo | |
Framework | |
A Constraint-Handling Technique for Genetic Algorithms using a Violation Factor
Title | A Constraint-Handling Technique for Genetic Algorithms using a Violation Factor |
Authors | Adam Chehouri, Rafic Younes, Jean Perron, Adrian Ilinca |
Abstract | Over the years, several meta-heuristic algorithms were proposed and are now emerging as common methods for constrained optimization problems. Among them, genetic algorithms (GAs) shine as popular evolutionary algorithms (EAs) in engineering optimization. Most engineering design problems are difficult to solve with conventional optimization algorithms because they are highly nonlinear and contain constraints. In order to handle these constraints, the most common technique is to apply penalty functions. The major drawback is that they require tuning of parameters, which can be very challenging. In this paper, we present a constraint-handling technique for GAs solely using the violation factor, called the VCH (Violation Constraint-Handling) method. Several benchmark problems from the literature are examined. The VCH technique was able to provide consistent performance and match results from other GA-based techniques. |
Tasks | |
Published | 2016-10-04 |
URL | http://arxiv.org/abs/1610.00976v1 |
http://arxiv.org/pdf/1610.00976v1.pdf | |
PWC | https://paperswithcode.com/paper/a-constraint-handling-technique-for-genetic |
Repo | |
Framework | |
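One common way to rank candidates by constraint violation alone, without penalty-parameter tuning, is a feasibility-first comparison inside tournament selection. The rule below is an illustrative reading of violation-based constraint handling, not necessarily the paper's exact VCH method.

```python
def total_violation(x, constraints):
    """Sum of constraint violations, where each constraint g is satisfied when g(x) <= 0."""
    return sum(max(0.0, g(x)) for g in constraints)

def better(a, b, objective, constraints):
    """Feasibility-first comparison for tournament selection: feasible beats infeasible,
    less-violating beats more-violating, and only feasible candidates compare by objective
    (minimization). Illustrative reading of violation-based constraint handling."""
    va, vb = total_violation(a, constraints), total_violation(b, constraints)
    if va == 0.0 and vb == 0.0:
        return a if objective(a) < objective(b) else b
    if va == 0.0 or vb == 0.0:
        return a if va == 0.0 else b
    return a if va < vb else b

# Toy usage: minimize x^2 subject to x >= 1 (written as 1 - x <= 0).
print(better(0.5, 1.2, objective=lambda x: x * x, constraints=[lambda x: 1.0 - x]))  # 1.2 wins
```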
Evaluation of the Effect of Improper Segmentation on Word Spotting
Title | Evaluation of the Effect of Improper Segmentation on Word Spotting |
Authors | Sounak Dey, Anguelos Nicolaou, Josep Llados, Umapada Pal |
Abstract | Word spotting is an important recognition task in historical document analysis. In most cases methods are developed and evaluated assuming perfect word segmentations. In this paper we propose an experimental framework to quantify the effect that the goodness of word segmentation has on the performance achieved by word spotting methods in identical unbiased conditions. The framework consists of generating systematic distortions on segmentation and retrieving the original queries from the distorted dataset. We apply the framework to the George Washington and Barcelona Marriage datasets and to several established and state-of-the-art methods. The experiments allow for an estimate of the end-to-end performance of word spotting methods. |
Tasks | |
Published | 2016-04-21 |
URL | http://arxiv.org/abs/1604.06243v1 |
http://arxiv.org/pdf/1604.06243v1.pdf | |
PWC | https://paperswithcode.com/paper/evaluation-of-the-effect-of-improper |
Repo | |
Framework | |
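The framework rests on generating systematic distortions of word bounding boxes and re-running retrieval on each distorted set. A minimal distortion of this kind is sketched below; the jitter model (uniform offsets of each box edge by a fraction of the box size) is an assumption, not the paper's exact protocol.

```python
import random

def distort_box(box, severity=0.2, rng=None):
    """Jitter each edge of a word bounding box (x0, y0, x1, y1) by up to `severity`
    times the box width/height, simulating imperfect word segmentation."""
    rng = rng or random.Random()
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    return (x0 + rng.uniform(-severity, severity) * w,
            y0 + rng.uniform(-severity, severity) * h,
            x1 + rng.uniform(-severity, severity) * w,
            y1 + rng.uniform(-severity, severity) * h)

# Distort every ground-truth word box at increasing severities, then re-run retrieval on each set.
boxes = [(10, 5, 90, 25), (100, 5, 160, 25)]
for s in (0.1, 0.2, 0.4):
    distorted = [distort_box(b, severity=s, rng=random.Random(0)) for b in boxes]
```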
Linear-time Outlier Detection via Sensitivity
Title | Linear-time Outlier Detection via Sensitivity |
Authors | Mario Lucic, Olivier Bachem, Andreas Krause |
Abstract | Outliers are ubiquitous in modern data sets. Distance-based techniques are a popular non-parametric approach to outlier detection as they require no prior assumptions on the data generating distribution and are simple to implement. Scaling these techniques to massive data sets without sacrificing accuracy is a challenging task. We propose a novel algorithm based on the intuition that outliers have a significant influence on the quality of divergence-based clustering solutions. We propose sensitivity - the worst-case impact of a data point on the clustering objective - as a measure of outlierness. We then prove that influence, a (non-trivial) upper-bound on the sensitivity, can be computed by a simple linear time algorithm. To scale beyond a single machine, we propose a communication efficient distributed algorithm. In an extensive experimental evaluation, we demonstrate the effectiveness and establish the statistical significance of the proposed approach. In particular, it outperforms the most popular distance-based approaches while being several orders of magnitude faster. |
Tasks | Outlier Detection |
Published | 2016-05-02 |
URL | http://arxiv.org/abs/1605.00519v1 |
http://arxiv.org/pdf/1605.00519v1.pdf | |
PWC | https://paperswithcode.com/paper/linear-time-outlier-detection-via-sensitivity |
Repo | |
Framework | |
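A rough rendering of the influence idea, scoring each point by its contribution to a clustering objective bounded via a cheap initial solution, is sketched below. The specific bound (a point's squared-distance share to a few random centers plus a cluster-size term) is a simplified assumption, not the paper's influence formula.

```python
import numpy as np

def influence_scores(X, k=5, seed=0):
    """Simplified influence/sensitivity-style bound for outlier ranking: take a cheap
    clustering solution (here, k random centers stand in for a bicriteria solution),
    then score each point by its squared-distance share plus a cluster-size term."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # squared distances to centers
    assign = d2.argmin(axis=1)
    nearest = d2[np.arange(len(X)), assign]
    cluster_sizes = np.bincount(assign, minlength=k)
    # Points far from every center, or in tiny clusters, get high influence => likely outliers.
    return nearest / max(nearest.sum(), 1e-12) + 1.0 / cluster_sizes[assign]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(200, 2)), np.array([[8.0, 8.0]])])  # one planted outlier
print(influence_scores(X).argmax())                                  # the planted outlier (index 200) should rank highest
```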
Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints
Title | Learning-Based Single-Document Summarization with Compression and Anaphoricity Constraints |
Authors | Greg Durrett, Taylor Berg-Kirkpatrick, Dan Klein |
Abstract | We present a discriminative model for single-document summarization that integrally combines compression and anaphoricity constraints. Our model selects textual units to include in the summary based on a rich set of sparse features whose weights are learned on a large corpus. We allow for the deletion of content within a sentence when that deletion is licensed by compression rules; in our framework, these are implemented as dependencies between subsentential units of text. Anaphoricity constraints then improve cross-sentence coherence by guaranteeing that, for each pronoun included in the summary, the pronoun’s antecedent is included as well or the pronoun is rewritten as a full mention. When trained end-to-end, our final system outperforms prior work on both ROUGE as well as on human judgments of linguistic quality. |
Tasks | Document Summarization |
Published | 2016-03-29 |
URL | http://arxiv.org/abs/1603.08887v2 |
http://arxiv.org/pdf/1603.08887v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-based-single-document-summarization |
Repo | |
Framework | |
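The core decision in the model, selecting textual units to maximize learned scores under a length budget while respecting compression dependencies (a sub-sentential unit may be kept only if its parent unit is kept), can be illustrated with a brute-force selector. The unit scores, lengths, and parent links below are made up, and the actual system uses learned features and an ILP solver rather than enumeration.

```python
from itertools import combinations

# Hypothetical textual units: (score, length, parent index or None).
# A unit may be selected only if its parent is also selected (compression dependency).
units = [(3.0, 8, None), (1.5, 4, 0), (2.5, 7, None), (0.8, 3, 2)]
budget = 15

def feasible(selected):
    ok_parents = all(units[i][2] is None or units[i][2] in selected for i in selected)
    return ok_parents and sum(units[i][1] for i in selected) <= budget

best = max(
    (s for r in range(len(units) + 1)
       for s in map(set, combinations(range(len(units)), r)) if feasible(s)),
    key=lambda s: sum(units[i][0] for i in s),
)
print(sorted(best))   # highest-scoring feasible subset under the length budget, here [0, 2]
```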
Chinese Song Iambics Generation with Neural Attention-based Model
Title | Chinese Song Iambics Generation with Neural Attention-based Model |
Authors | Qixin Wang, Tianyi Luo, Dong Wang, Chao Xing |
Abstract | Learning and generating Chinese poems is a charming yet challenging task. Traditional approaches involve various language modeling and machine translation techniques; however, they do not perform as well when generating poems with complex pattern constraints, for example Song iambics, a famous type of poem that involves variable-length sentences and strict rhythmic patterns. This paper applies the attention-based sequence-to-sequence model to generate Chinese Song iambics. Specifically, we encode the cue sentences by a bi-directional Long Short-Term Memory (LSTM) model and then predict the entire iambic with the information provided by the encoder, in the form of an attention-based LSTM that can regularize the generation process by the fine structure of the input cues. Several techniques are investigated to improve the model, including global context integration, hybrid style training, character vector initialization and adaptation. Both the automatic and subjective evaluation results show that our model indeed can learn the complex structural and rhythmic patterns of Song iambics, and the generation is rather successful. |
Tasks | Language Modelling, Machine Translation |
Published | 2016-04-21 |
URL | http://arxiv.org/abs/1604.06274v2 |
http://arxiv.org/pdf/1604.06274v2.pdf | |
PWC | https://paperswithcode.com/paper/chinese-song-iambics-generation-with-neural |
Repo | |
Framework | |
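The attention-based decoding described above conditions each generation step on a weighted sum of encoder states. A minimal additive (Bahdanau-style) attention module in PyTorch is sketched below; all dimensions are illustrative, and this is the generic mechanism rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style additive attention: score each encoder state against the decoder
    state, softmax the scores, and return the weighted context vector."""

    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, attn_dim)
        self.dec_proj = nn.Linear(dec_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.score(torch.tanh(
            self.enc_proj(enc_states) + self.dec_proj(dec_state).unsqueeze(1)
        )).squeeze(-1)                                 # (batch, src_len)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)   # (batch, enc_dim)
        return context, weights

# Usage with a bidirectional LSTM encoder over the cue sentence (dimensions illustrative).
enc = nn.LSTM(input_size=128, hidden_size=256, bidirectional=True, batch_first=True)
enc_out, _ = enc(torch.randn(2, 7, 128))               # (batch=2, src_len=7, 2*256)
attn = AdditiveAttention(enc_dim=512, dec_dim=256, attn_dim=128)
context, weights = attn(torch.randn(2, 256), enc_out)
```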