Paper Group ANR 252
Towards Geo-Distributed Machine Learning. Online Isotonic Regression. Self-localization from Images with Small Overlap. Combining Gradient Boosting Machines with Collective Inference to Predict Continuous Values. Predicting the future relevance of research institutions - The winning solution of the KDD Cup 2016. Sparse Activity and Sparse Connectivity in Supervised Learning. Leveraging Sentence-level Information with Encoder LSTM for Semantic Slot Filling. Characterizing Realizability in Abstract Argumentation. Distribution-Specific Hardness of Learning Neural Networks. New Steps on the Exact Learning of CNF. Getting Started with Neural Models for Semantic Matching in Web Search. Fuzzy paraphrases in learning word representations with a lexicon. Reinforcement-based Simultaneous Algorithm and its Hyperparameters Selection. Learning Generic Sentence Representations Using Convolutional Neural Networks. Neural Networks Models for Entity Discovery and Linking.
Towards Geo-Distributed Machine Learning
| Title | Towards Geo-Distributed Machine Learning |
| --- | --- |
| Authors | Ignacio Cano, Markus Weimer, Dhruv Mahajan, Carlo Curino, Giovanni Matteo Fumarola |
| Abstract | Latency to end-users and regulatory requirements push large companies to build data centers all around the world. The resulting data is “born” geographically distributed. On the other hand, many machine learning applications require a global view of such data in order to achieve the best results. These types of applications form a new class of learning problems, which we call Geo-Distributed Machine Learning (GDML). Such applications need to cope with: 1) scarce and expensive cross-data center bandwidth, and 2) growing privacy concerns that are pushing for stricter data sovereignty regulations. Current solutions to learning from geo-distributed data sources revolve around the idea of first centralizing the data in one data center, and then training locally. As machine learning algorithms are communication-intensive, the cost of centralizing the data is thought to be offset by the lower cost of intra-data center communication during training. In this work, we show that the current centralized practice can be far from optimal, and propose a system for doing geo-distributed training. Furthermore, we argue that the geo-distributed approach is structurally more amenable to dealing with regulatory constraints, as raw data never leaves the source data center. Our empirical evaluation on three real datasets confirms the general validity of our approach, and shows that GDML is not only possible but also advisable in many scenarios. |
| Tasks | |
| Published | 2016-03-30 |
| URL | http://arxiv.org/abs/1603.09035v1 |
| PDF | http://arxiv.org/pdf/1603.09035v1.pdf |
| PWC | https://paperswithcode.com/paper/towards-geo-distributed-machine-learning |
| Repo | |
| Framework | |
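The core trade-off the abstract describes, keeping raw data in place and paying WAN cost only for model updates, can be illustrated with a toy distributed-gradient loop. This is a minimal sketch of the communication pattern, not the paper's system; the synthetic data, logistic model, and learning rate are our own stand-ins.

```python
# A minimal sketch (not the paper's system) of the geo-distributed pattern:
# raw data stays in its "data center"; only gradients cross the WAN.
import numpy as np

rng = np.random.default_rng(0)

def local_gradient(w, X, y):
    """Logistic-regression gradient computed entirely inside one data center."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

# Three data centers, each holding its own shard of synthetic data.
shards = []
w_true = rng.normal(size=5)
for _ in range(3):
    X = rng.normal(size=(200, 5))
    y = (X @ w_true + 0.1 * rng.normal(size=200) > 0).astype(float)
    shards.append((X, y))

w = np.zeros(5)
for step in range(100):
    # Each site ships only a 5-float gradient, never its raw examples.
    grads = [local_gradient(w, X, y) for X, y in shards]
    w -= 0.5 * np.mean(grads, axis=0)

print("weights after training:", np.round(w, 2))
```

Each round moves a handful of floats across the simulated WAN instead of the full shards, which is where the bandwidth savings of the geo-distributed approach come from.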
Online Isotonic Regression
| Title | Online Isotonic Regression |
| --- | --- |
| Authors | Wojciech Kotłowski, Wouter M. Koolen, Alan Malek |
| Abstract | We consider the online version of the isotonic regression problem. Given a set of linearly ordered points (e.g., on the real line), the learner must predict labels sequentially at adversarially chosen positions and is evaluated by her total squared loss compared against the best isotonic (non-decreasing) function in hindsight. We survey several standard online learning algorithms and show that none of them achieve the optimal regret exponent; in fact, most of them (including Online Gradient Descent, Follow the Leader and Exponential Weights) incur linear regret. We then prove that the Exponential Weights algorithm played over a covering net of isotonic functions has a regret bounded by $O\big(T^{1/3} \log^{2/3}(T)\big)$ and present a matching $\Omega(T^{1/3})$ lower bound on regret. We provide a computationally efficient version of this algorithm. We also analyze the noise-free case, in which the revealed labels are isotonic, and show that the bound can be improved to $O(\log T)$ or even to $O(1)$ (when the labels are revealed in isotonic order). Finally, we extend the analysis beyond squared loss and give bounds for entropic loss and absolute loss. |
| Tasks | |
| Published | 2016-03-14 |
| URL | http://arxiv.org/abs/1603.04190v2 |
| PDF | http://arxiv.org/pdf/1603.04190v2.pdf |
| PWC | https://paperswithcode.com/paper/online-isotonic-regression |
| Repo | |
| Framework | |
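To make the setup concrete, here is a small sketch of the online protocol, with a naive forecaster standing in for the paper's Exponential Weights algorithm; the hindsight comparator uses scikit-learn's IsotonicRegression. Everything except the protocol itself is our own choice.

```python
# A small sketch of the online isotonic protocol: labels arrive at
# adversarially chosen positions, the learner predicts, and regret is
# measured against the best isotonic function fit in hindsight.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(1)
T = 200
xs = rng.permutation(np.linspace(0, 1, T))      # adversarial arrival order
ys = np.clip(xs + 0.2 * rng.normal(size=T), 0, 1)

loss, seen_y = 0.0, []
for x, y in zip(xs, ys):
    # Naive forecaster (mean of revealed labels); the paper's algorithm is
    # Exponential Weights over a covering net of isotonic functions.
    pred = np.mean(seen_y) if seen_y else 0.5
    loss += (pred - y) ** 2
    seen_y.append(y)

# Best isotonic (non-decreasing) function in hindsight.
iso = IsotonicRegression(y_min=0, y_max=1).fit(xs, ys)
hindsight = np.sum((iso.predict(xs) - ys) ** 2)
print(f"regret of naive forecaster: {loss - hindsight:.2f}")
```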
Self-localization from Images with Small Overlap
| Title | Self-localization from Images with Small Overlap |
| --- | --- |
| Authors | Tanaka Kanji |
| Abstract | With the recent success of visual features from deep convolutional neural networks (DCNN) in visual robot self-localization, it has become important and practical to address more general self-localization scenarios. In this paper, we address the scenario of self-localization from images with small overlap. We explicitly introduce a localization difficulty index as a decreasing function of view overlap between query and relevant database images, and investigate performance versus difficulty for challenging cross-view self-localization tasks. We then reformulate self-localization as scalable bag-of-visual-features (BoVF) scene retrieval and present an efficient solution called PCA-NBNN, aiming to facilitate fast yet discriminative correspondence between partially overlapping images. The proposed approach adopts recent findings in discriminativity-preserving encoding of DCNN features using principal component analysis (PCA) and cross-domain scene matching using the naive Bayes nearest neighbor (NBNN) distance metric. We experimentally demonstrate that the proposed PCA-NBNN framework frequently achieves results comparable to previous DCNN features and that the BoVF model is significantly more efficient. We further address an important alternative scenario of “self-localization from images with NO overlap” and report the result. |
| Tasks | |
| Published | 2016-03-03 |
| URL | http://arxiv.org/abs/1603.00993v1 |
| PDF | http://arxiv.org/pdf/1603.00993v1.pdf |
| PWC | https://paperswithcode.com/paper/self-localization-from-images-with-small |
| Repo | |
| Framework | |
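The NBNN side of the method can be sketched in a few lines: each of a query image's (PCA-compressed) local descriptors is matched to the nearest descriptor of a candidate database image, and the summed distances rank the database. The descriptor shapes and random data below are placeholders, not the paper's DCNN features.

```python
# A rough sketch of the NBNN image-to-image distance used by PCA-NBNN.
import numpy as np

def nbnn_distance(query_desc, db_desc):
    """Sum over query descriptors of the squared distance to the nearest
    descriptor of one database image (naive Bayes nearest neighbor)."""
    # Pairwise squared distances: shape (n_query, n_db).
    d2 = ((query_desc[:, None, :] - db_desc[None, :, :]) ** 2).sum(-1)
    return d2.min(axis=1).sum()

def rank_database(query_desc, database):
    """Rank database images by ascending NBNN distance."""
    scores = [nbnn_distance(query_desc, d) for d in database]
    return np.argsort(scores)

rng = np.random.default_rng(2)
# Pretend these are PCA-compressed DCNN descriptors (e.g. 16-D after PCA).
query = rng.normal(size=(50, 16))
database = [rng.normal(size=(60, 16)) for _ in range(5)]
print("retrieval order:", rank_database(query, database))
```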
Combining Gradient Boosting Machines with Collective Inference to Predict Continuous Values
| Title | Combining Gradient Boosting Machines with Collective Inference to Predict Continuous Values |
| --- | --- |
| Authors | Iman Alodah, Jennifer Neville |
| Abstract | Gradient boosting of regression trees is a competitive procedure for learning predictive models of continuous data that fits the data with an additive non-parametric model. The classic version of gradient boosting assumes that the data is independent and identically distributed. However, relational data with interdependent, linked instances is now common, and the dependencies in such data can be exploited to improve predictive performance. Collective inference is one approach to exploit relational correlation patterns and significantly reduce classification error. However, much of the work on collective learning and inference has focused on discrete prediction tasks rather than continuous ones. In this work, we investigate how to combine these two paradigms to improve regression in relational domains. Specifically, we propose a boosting algorithm for learning a collective inference model that predicts a continuous target variable. In the algorithm, we learn a basic relational model, collectively infer the target values, and then iteratively learn relational models to predict the residuals. We evaluate our proposed algorithm on a real network dataset and show that it outperforms alternative boosting methods. Our investigation also revealed that the relational features interact to produce better predictions. |
| Tasks | |
| Published | 2016-07-01 |
| URL | http://arxiv.org/abs/1607.00110v1 |
| PDF | http://arxiv.org/pdf/1607.00110v1.pdf |
| PWC | https://paperswithcode.com/paper/combining-gradient-boosting-machines-with |
| Repo | |
| Framework | |
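The loop the abstract outlines, fit a base model, build relational features from neighbors' current predictions, then boost on the residuals, might look roughly like the following sketch. The random graph, shrinkage, and tree depth are our assumptions, not the paper's settings.

```python
# A hedged sketch of residual boosting with a relational feature: each round
# recomputes the mean of neighbors' current predictions and fits a tree to
# what the ensemble still gets wrong.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
n = 300
X = rng.normal(size=(n, 4))
A = (rng.random((n, n)) < 0.02).astype(float)    # random adjacency matrix
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)
deg = np.maximum(A.sum(1), 1)
# Target depends on a node's own feature and its neighbors' features.
y = X[:, 0] + 0.5 * (A @ X[:, 0]) / deg + 0.1 * rng.normal(size=n)

pred = np.full(n, y.mean())                      # base model: global mean
models, lr = [], 0.3
for _ in range(20):
    rel = ((A @ pred) / deg)[:, None]            # neighbors' mean prediction
    feats = np.hstack([X, rel])
    tree = DecisionTreeRegressor(max_depth=3).fit(feats, y - pred)
    pred += lr * tree.predict(feats)             # collective update
    models.append(tree)

print("train MSE:", round(float(np.mean((pred - y) ** 2)), 4))
```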
Predicting the future relevance of research institutions - The winning solution of the KDD Cup 2016
| Title | Predicting the future relevance of research institutions - The winning solution of the KDD Cup 2016 |
| --- | --- |
| Authors | Vlad Sandulescu, Mihai Chiru |
| Abstract | The world’s collective knowledge is evolving through research and new scientific discoveries. It is becoming increasingly difficult to objectively rank the impact research institutes have on global advancements. However, since funding, governmental support, and the quality of staff and students all mirror the projected quality of an institution, it is essential to measure an affiliation’s rating in a transparent and widely accepted way. We propose and investigate several methods to rank affiliations based on the number of their accepted papers at future academic conferences. We carry out our investigation using publicly available datasets such as the Microsoft Academic Graph, a heterogeneous graph which contains various information about academic papers. We analyze several models, starting with a simple probabilities-based method, and then gradually expand our training dataset, engineer many more features, and use mixed models and gradient-boosted decision trees to improve our predictions. |
| Tasks | |
| Published | 2016-09-09 |
| URL | http://arxiv.org/abs/1609.02728v1 |
| PDF | http://arxiv.org/pdf/1609.02728v1.pdf |
| PWC | https://paperswithcode.com/paper/predicting-the-future-relevance-of-research |
| Repo | |
| Framework | |
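The "simple probabilities-based method" the abstract starts from can be imagined as a recency-weighted share of past accepted papers per conference, as in this toy pandas sketch; the data and the decay weight are fabricated for illustration and are not the paper's.

```python
# Toy baseline: score each affiliation by its recency-weighted share of
# accepted papers at a conference, then rank by that probability.
import pandas as pd

papers = pd.DataFrame({
    "affiliation": ["MIT", "MIT", "CMU", "CMU", "CMU", "ETH"],
    "conference":  ["KDD", "KDD", "KDD", "KDD", "KDD", "KDD"],
    "year":        [2013, 2015, 2014, 2015, 2015, 2013],
})

# Weight recent years more heavily (our assumption, not the paper's weights).
papers["w"] = 0.5 ** (2015 - papers["year"])
counts = papers.groupby(["conference", "affiliation"])["w"].sum()
scores = counts / counts.groupby(level="conference").transform("sum")
print(scores.sort_values(ascending=False))
```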
Sparse Activity and Sparse Connectivity in Supervised Learning
| Title | Sparse Activity and Sparse Connectivity in Supervised Learning |
| --- | --- |
| Authors | Markus Thom, Günther Palm |
| Abstract | Sparseness is a useful regularizer for learning in a wide range of applications, in particular in neural networks. This paper proposes a model targeted at classification tasks, where sparse activity and sparse connectivity are used to enhance classification capabilities. The tool for achieving this is a sparseness-enforcing projection operator which finds the closest vector with a pre-defined sparseness for any given vector. In the theoretical part of this paper, a comprehensive theory for such a projection is developed. It is shown that the projection is differentiable almost everywhere and can thus be implemented as a smooth neuronal transfer function. The entire model can hence be tuned end-to-end using gradient-based methods. Experiments on the MNIST database of handwritten digits show that classification performance can be boosted by sparse activity or sparse connectivity. With a combination of both, performance can be significantly better compared to classical non-sparse approaches. |
| Tasks | Image Classification |
| Published | 2016-03-28 |
| URL | http://arxiv.org/abs/1603.08367v1 |
| PDF | http://arxiv.org/pdf/1603.08367v1.pdf |
| PWC | https://paperswithcode.com/paper/sparse-activity-and-sparse-connectivity-in |
| Repo | |
| Framework | |
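The projection at the heart of the model targets Hoyer's sparseness measure, $\sigma(x) = (\sqrt{n} - \|x\|_1/\|x\|_2)/(\sqrt{n} - 1)$. The sketch below implements the measure plus a crude soft-threshold bisection that only approximates a sparseness-enforcing map; the paper derives the exact, almost-everywhere differentiable projection operator.

```python
# Hoyer sparseness measure plus a rough approximation of sparsification to a
# target sparseness via bisection on a soft-threshold (NOT the paper's exact
# closest-point projection).
import numpy as np

def hoyer_sparseness(x):
    n = x.size
    return (np.sqrt(n) - np.abs(x).sum() / np.linalg.norm(x)) / (np.sqrt(n) - 1)

def approx_sparsify(x, target, iters=50):
    """Bisect a soft-threshold until sigma(x) is approximately `target`."""
    lo, hi = 0.0, np.abs(x).max()
    for _ in range(iters):
        t = (lo + hi) / 2
        z = np.sign(x) * np.maximum(np.abs(x) - t, 0)
        if hoyer_sparseness(z) < target:
            lo = t          # not sparse enough: threshold harder
        else:
            hi = t
    return np.sign(x) * np.maximum(np.abs(x) - (lo + hi) / 2, 0)

x = np.random.default_rng(4).normal(size=100)
z = approx_sparsify(x, target=0.7)
print(round(hoyer_sparseness(z), 3))  # close to 0.7
```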
Leveraging Sentence-level Information with Encoder LSTM for Semantic Slot Filling
| Title | Leveraging Sentence-level Information with Encoder LSTM for Semantic Slot Filling |
| --- | --- |
| Authors | Gakuto Kurata, Bing Xiang, Bowen Zhou, Mo Yu |
| Abstract | Recurrent Neural Network (RNN) and one of its specific architectures, Long Short-Term Memory (LSTM), have been widely used for sequence labeling. In this paper, we first enhance LSTM-based sequence labeling to explicitly model label dependencies. Then we propose another enhancement to incorporate the global information spanning over the whole input sequence. The latter proposed method, encoder-labeler LSTM, first encodes the whole input sequence into a fixed-length vector with the encoder LSTM, and then uses this encoded vector as the initial state of another LSTM for sequence labeling. Combining these methods, we can predict the label sequence while taking into account both label dependencies and information from the whole input sequence. In experiments on a slot filling task, an essential component of natural language understanding, using the standard ATIS corpus, we achieved a state-of-the-art F1-score of 95.66%. |
| Tasks | Slot Filling |
| Published | 2016-01-07 |
| URL | http://arxiv.org/abs/1601.01530v4 |
| PDF | http://arxiv.org/pdf/1601.01530v4.pdf |
| PWC | https://paperswithcode.com/paper/leveraging-sentence-level-information-with |
| Repo | |
| Framework | |
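The encoder-labeler idea is compact enough to show as a model skeleton: one LSTM encodes the full sentence, and its final state initializes the LSTM that emits a label per token. This PyTorch sketch uses our own dimensions and omits the paper's refinements, such as explicit label-dependency modeling.

```python
# Minimal encoder-labeler LSTM sketch: the labeler starts from a summary of
# the whole input instead of a zero state.
import torch
import torch.nn as nn

class EncoderLabeler(nn.Module):
    def __init__(self, vocab, n_labels, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.labeler = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, n_labels)

    def forward(self, tokens):
        x = self.emb(tokens)                 # (B, T, dim)
        _, (h, c) = self.encoder(x)          # fixed-length sentence summary
        y, _ = self.labeler(x, (h, c))       # labeler starts from the summary
        return self.out(y)                   # (B, T, n_labels) slot logits

model = EncoderLabeler(vocab=1000, n_labels=20)
logits = model(torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 20])
```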
Characterizing Realizability in Abstract Argumentation
| Title | Characterizing Realizability in Abstract Argumentation |
| --- | --- |
| Authors | Thomas Linsbichler, Jörg Pührer, Hannes Strass |
| Abstract | Realizability for knowledge representation formalisms studies the following question: given a semantics and a set of interpretations, is there a knowledge base whose semantics coincides exactly with the given interpretation set? We introduce a general framework for analyzing realizability in abstract dialectical frameworks (ADFs) and several of its subclasses. In particular, the framework applies to Dung argumentation frameworks, SETAFs by Nielsen and Parsons, and bipolar ADFs. We present a uniform characterization method for the admissible, complete, preferred and model/stable semantics. We employ this method to devise an algorithm that decides realizability for the mentioned formalisms and semantics; moreover, the algorithm allows for constructing a desired knowledge base whenever one exists. The algorithm is built in a modular way and thus easily extensible to new formalisms and semantics. We have also implemented our approach in answer set programming, and used the implementation to obtain several novel results on the relative expressiveness of the aforementioned formalisms. |
| Tasks | Abstract Argumentation |
| Published | 2016-03-31 |
| URL | http://arxiv.org/abs/1603.09545v1 |
| PDF | http://arxiv.org/pdf/1603.09545v1.pdf |
| PWC | https://paperswithcode.com/paper/characterizing-realizability-in-abstract |
| Repo | |
| Framework | |
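For plain Dung frameworks the realizability question can at least be made concrete by brute force on small instances: enumerate subsets, compute the extensions under a semantics, and compare against the target set. The sketch below does exactly that for the admissible and complete semantics; the paper's ASP-based method is uniform across ADFs, SETAFs, and bipolar ADFs and far more scalable than this.

```python
# Brute-force extensions of a tiny Dung argumentation framework, then a
# naive realizability check: does this AF's complete semantics equal the
# target set of extensions?
from itertools import chain, combinations

def subsets(args):
    return chain.from_iterable(combinations(args, r) for r in range(len(args) + 1))

def conflict_free(S, att):
    return not any((a, b) in att for a in S for b in S)

def defends(S, a, args, att):
    # Every attacker of `a` is counter-attacked by some member of S.
    return all(any((c, b) in att for c in S) for b in args if (b, a) in att)

def admissible(args, att):
    return [set(S) for S in subsets(args)
            if conflict_free(S, att) and all(defends(S, a, args, att) for a in S)]

def complete(args, att):
    # Admissible sets containing every argument they defend.
    return [S for S in admissible(args, att)
            if all(a in S for a in args if defends(S, a, args, att))]

args = {"a", "b", "c"}
att = {("a", "b"), ("b", "a"), ("b", "c")}
target = [set(), {"a", "c"}, {"b"}]   # candidate extension set to realize
found = complete(args, att)
print("complete extensions:", found)
print("realized:", sorted(map(sorted, found)) == sorted(map(sorted, target)))
```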
Distribution-Specific Hardness of Learning Neural Networks
| Title | Distribution-Specific Hardness of Learning Neural Networks |
| --- | --- |
| Authors | Ohad Shamir |
| Abstract | Although neural networks are routinely and successfully trained in practice using simple gradient-based methods, most existing theoretical results are negative, showing that learning such networks is difficult, in a worst-case sense over all data distributions. In this paper, we take a more nuanced view, and consider whether specific assumptions on the “niceness” of the input distribution, or “niceness” of the target function (e.g. in terms of smoothness, non-degeneracy, incoherence, random choice of parameters etc.), are sufficient to guarantee learnability using gradient-based methods. We provide evidence that neither class of assumptions alone is sufficient: On the one hand, for any member of a class of “nice” target functions, there are difficult input distributions. On the other hand, we identify a family of simple target functions, which are difficult to learn even if the input distribution is “nice”. To prove our results, we develop some tools which may be of independent interest, such as extending Fourier-based hardness techniques developed in the context of statistical queries \cite{blum1994weakly}, from the Boolean cube to Euclidean space and to more general classes of functions. |
| Tasks | |
| Published | 2016-09-05 |
| URL | http://arxiv.org/abs/1609.01037v2 |
| PDF | http://arxiv.org/pdf/1609.01037v2.pdf |
| PWC | https://paperswithcode.com/paper/distribution-specific-hardness-of-learning |
| Repo | |
| Framework | |
New Steps on the Exact Learning of CNF
| Title | New Steps on the Exact Learning of CNF |
| --- | --- |
| Authors | Montserrat Hermo, Ana Ozaki |
| Abstract | A major problem in computational learning theory is whether the class of formulas in conjunctive normal form (CNF) is efficiently learnable. Although it is known that this class cannot be polynomially learned using either membership or equivalence queries alone, it is open whether CNF can be polynomially learned using both types of queries. One of the most important results concerning a restriction of the class CNF is that propositional Horn formulas are polynomial time learnable in Angluin’s exact learning model with membership and equivalence queries. In this work we push this boundary and show that the class of multivalued dependency formulas (MVDF) is polynomially learnable from interpretations. We then provide a notion of reduction between learning problems in Angluin’s model, showing that a transformation of the algorithm suffices to efficiently learn multivalued database dependencies from data relations. We also show via reductions that our main result extends well known previous results and allows us to find alternative solutions for them. |
| Tasks | |
| Published | 2016-09-10 |
| URL | http://arxiv.org/abs/1609.03054v1 |
| PDF | http://arxiv.org/pdf/1609.03054v1.pdf |
| PWC | https://paperswithcode.com/paper/new-steps-on-the-exact-learning-of-cnf |
| Repo | |
| Framework | |
Getting Started with Neural Models for Semantic Matching in Web Search
| Title | Getting Started with Neural Models for Semantic Matching in Web Search |
| --- | --- |
| Authors | Kezban Dilek Onal, Ismail Sengor Altingovde, Pinar Karagoz, Maarten de Rijke |
| Abstract | The vocabulary mismatch problem is a long-standing problem in information retrieval. Semantic matching holds the promise of solving the problem. Recent advances in language technology have given rise to unsupervised neural models for learning representations of words as well as bigger textual units. Such representations enable powerful semantic matching methods. This survey is meant as an introduction to the use of neural models for semantic matching. To remain focused we limit ourselves to web search. We detail the required background and terminology, a taxonomy grouping the rapidly growing body of work in the area, and then survey work on neural models for semantic matching in the context of three tasks: query suggestion, ad retrieval, and document retrieval. We include a section on resources and best practices that we believe will help readers who are new to the area. We conclude with an assessment of the state-of-the-art and suggestions for future work. |
| Tasks | Information Retrieval |
| Published | 2016-11-08 |
| URL | http://arxiv.org/abs/1611.03305v1 |
| PDF | http://arxiv.org/pdf/1611.03305v1.pdf |
| PWC | https://paperswithcode.com/paper/getting-started-with-neural-models-for |
| Repo | |
| Framework | |
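As a first step in the survey's spirit, the simplest neural semantic matcher represents query and document as averaged word vectors and scores them by cosine similarity, so matching happens in embedding space rather than by exact term overlap. The vocabulary and vectors below are fabricated stand-ins for trained embeddings.

```python
# Baseline semantic matching sketch: average word vectors, compare by cosine.
import numpy as np

rng = np.random.default_rng(5)
vocab = ["cheap", "flight", "ticket", "airfare", "hotel", "room"]
E = {w: rng.normal(size=8) for w in vocab}      # stand-in word embeddings

def embed(text):
    """Average the embeddings of in-vocabulary tokens."""
    vecs = [E[w] for w in text.split() if w in E]
    return np.mean(vecs, axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

query = "cheap airfare"
for doc in ["flight ticket", "hotel room"]:
    print(doc, "->", round(cosine(embed(query), embed(doc)), 3))
```

With real trained embeddings, "cheap airfare" would score closer to "flight ticket" than to "hotel room" despite sharing no terms, which is exactly the vocabulary-mismatch bridging the survey discusses.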
Fuzzy paraphrases in learning word representations with a lexicon
| Title | Fuzzy paraphrases in learning word representations with a lexicon |
| --- | --- |
| Authors | Yuanzhi Ke, Masafumi Hagiwara |
| Abstract | A synonym of a polysemous word is usually a paraphrase of only one sense among many. When lexicons are used to improve vector-space word representations, such paraphrases are unreliable and bring noise to the vector space. Prior work uses a coefficient to adjust the overall influence of the lexicon and treats all paraphrases equally. In this paper, we propose a novel approach that treats paraphrases differently to alleviate the adverse effects of polysemy. We annotate each paraphrase with a degree of reliability, and paraphrases are randomly eliminated according to these degrees while our model learns word representations. In this way, our approach drops the unreliable paraphrases while keeping the more reliable ones. The experimental results show that the proposed method improves the word vectors. Our approach addresses the polysemy problem while keeping one vector per word, which makes it easier to use than conventional methods that estimate multiple vectors per word. Our approach also outperforms the prior works in the experiments. |
| Tasks | |
| Published | 2016-11-02 |
| URL | http://arxiv.org/abs/1611.00674v9 |
| PDF | http://arxiv.org/pdf/1611.00674v9.pdf |
| PWC | https://paperswithcode.com/paper/fuzzy-paraphrases-in-learning-word |
| Repo | |
| Framework | |
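The random-elimination mechanism is easy to sketch: each paraphrase pair carries a reliability degree and survives a training pass with that probability, so paraphrases of a word's rarer senses rarely contribute. The lexicon entries and degrees below are invented for illustration.

```python
# Sketch of reliability-weighted paraphrase sampling: on each pass, a pair
# is kept with probability equal to its reliability degree.
import random

lexicon = [                      # (word, paraphrase, reliability degree)
    ("bank", "shore", 0.3),      # paraphrase of a rarer sense: low degree
    ("bank", "lender", 0.8),
    ("happy", "glad", 0.95),
]

def sample_pairs(lexicon, rng):
    """Paraphrase pairs surviving one training pass."""
    return [(w, p) for w, p, degree in lexicon if rng.random() < degree]

rng = random.Random(6)
for epoch in range(3):
    kept = sample_pairs(lexicon, rng)
    # ...here the kept pairs would feed the word-vector training objective...
    print(f"epoch {epoch}: training on {kept}")
```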
Reinforcement-based Simultaneous Algorithm and its Hyperparameters Selection
| Title | Reinforcement-based Simultaneous Algorithm and its Hyperparameters Selection |
| --- | --- |
| Authors | Valeria Efimova, Andrey Filchenkov, Anatoly Shalyto |
| Abstract | Many algorithms for data analysis exist, especially for classification problems. To solve a data analysis problem, a proper algorithm must be chosen, and its hyperparameters must be selected. In this paper, we present a new method for the simultaneous selection of an algorithm and its hyperparameters. To do so, we reduce the problem to the multi-armed bandit problem: we treat each algorithm as an arm, and a fixed-time hyperparameter search for that algorithm as playing the corresponding arm. We also suggest a problem-specific reward function. We performed experiments on 10 real datasets and compared the suggested method with the existing one implemented in Auto-WEKA. The results show that our method is significantly better in most cases and never worse than Auto-WEKA. |
| Tasks | |
| Published | 2016-11-07 |
| URL | http://arxiv.org/abs/1611.02053v1 |
| PDF | http://arxiv.org/pdf/1611.02053v1.pdf |
| PWC | https://paperswithcode.com/paper/reinforcement-based-simultaneous-algorithm |
| Repo | |
| Framework | |
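The reduction maps naturally onto a standard bandit strategy such as UCB1: each learning algorithm is an arm, and a pull spends a fixed budget on random hyperparameter search, returning the validation score as reward. The sketch below uses plain cross-validated accuracy and two scikit-learn classifiers as arms; the paper's problem-specific reward function differs.

```python
# UCB1 over algorithms; each pull tries one random hyperparameter setting.
import math
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
X, y = load_digits(return_X_y=True)

def pull(arm):
    """One fixed-budget play: sample a hyperparameter, return the CV score."""
    if arm == 0:
        model = KNeighborsClassifier(n_neighbors=int(rng.integers(1, 15)))
    else:
        model = DecisionTreeClassifier(max_depth=int(rng.integers(2, 20)))
    return cross_val_score(model, X, y, cv=3).mean()

counts, sums = [0, 0], [0.0, 0.0]
for t in range(1, 21):
    # UCB1 index; unplayed arms get infinite priority.
    ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
           if counts[a] else float("inf") for a in range(2)]
    a = int(np.argmax(ucb))
    counts[a] += 1
    sums[a] += pull(a)

print("plays per arm:", counts, "mean rewards:",
      [round(s / c, 3) for s, c in zip(sums, counts)])
```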
Learning Generic Sentence Representations Using Convolutional Neural Networks
| Title | Learning Generic Sentence Representations Using Convolutional Neural Networks |
| --- | --- |
| Authors | Zhe Gan, Yunchen Pu, Ricardo Henao, Chunyuan Li, Xiaodong He, Lawrence Carin |
| Abstract | We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes. The model is learned by using a convolutional neural network as an encoder to map an input sentence into a continuous vector, and using a long short-term memory recurrent neural network as a decoder. Several tasks are considered, including sentence reconstruction and future sentence prediction. Further, a hierarchical encoder-decoder model is proposed to encode a sentence to predict multiple future sentences. By training our models on a large collection of novels, we obtain a highly generic convolutional sentence encoder that performs well in practice. Experimental results on several benchmark datasets, and across a broad range of applications, demonstrate the superiority of the proposed model over competing methods. |
| Tasks | |
| Published | 2016-11-23 |
| URL | http://arxiv.org/abs/1611.07897v2 |
| PDF | http://arxiv.org/pdf/1611.07897v2.pdf |
| PWC | https://paperswithcode.com/paper/learning-generic-sentence-representations |
| Repo | |
| Framework | |
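The architecture reduces to a compact skeleton: a convolutional encoder with max-over-time pooling produces one sentence vector, which initializes an LSTM decoder that reconstructs or predicts a sentence. All dimensions and layer choices in this PyTorch sketch are ours, not the paper's.

```python
# Minimal CNN-encoder / LSTM-decoder sketch for sentence representations.
import torch
import torch.nn as nn

class CNNEncoderLSTMDecoder(nn.Module):
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, src, tgt):
        x = self.emb(src).transpose(1, 2)               # (B, dim, T) for conv
        h = torch.relu(self.conv(x)).max(dim=2).values  # max-over-time pool
        state = (h.unsqueeze(0), torch.zeros_like(h).unsqueeze(0))
        y, _ = self.decoder(self.emb(tgt), state)       # decode from code
        return self.out(y)                              # (B, T_tgt, vocab)

model = CNNEncoderLSTMDecoder(vocab=5000)
logits = model(torch.randint(0, 5000, (4, 10)), torch.randint(0, 5000, (4, 9)))
print(logits.shape)  # torch.Size([4, 9, 5000])
```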
Neural Networks Models for Entity Discovery and Linking
| Title | Neural Networks Models for Entity Discovery and Linking |
| --- | --- |
| Authors | Dan Liu, Wei Lin, Shiliang Zhang, Si Wei, Hui Jiang |
| Abstract | This paper describes the USTC_NELSLIP systems submitted to the Trilingual Entity Detection and Linking (EDL) track of the 2016 TAC Knowledge Base Population (KBP) contest. We built two systems for entity discovery and mention detection (MD): one uses a conditional RNNLM and the other uses an attention-based encoder-decoder framework. The entity linking (EL) system consists of two modules: rule-based candidate generation and a neural-network probability ranking model. Moreover, some simple string matching rules are used for NIL clustering. In the end, our best system achieved an F1 score of 0.624 under the end-to-end typed mention CEAF plus metric. |
| Tasks | Entity Linking, Knowledge Base Population |
| Published | 2016-11-11 |
| URL | http://arxiv.org/abs/1611.03558v1 |
| PDF | http://arxiv.org/pdf/1611.03558v1.pdf |
| PWC | https://paperswithcode.com/paper/neural-networks-models-for-entity-discovery |
| Repo | |
| Framework | |