Paper Group ANR 228
Online Distributed Learning Over Networks in RKH Spaces Using Random Fourier Features. Counterfactual Language Model Adaptation for Suggesting Phrases. Synthetic Database for Evaluation of General, Fundamental Biometric Principles. License Plate Detection and Recognition Using Deeply Learned Convolutional Neural Networks. An Optimal Online Method of Selecting Source Policies for Reinforcement Learning. The placement of the head that maximizes predictability. An information theoretic approach. Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description. L2-constrained Softmax Loss for Discriminative Face Verification. Improving Naive Bayes for Regression with Optimised Artificial Surrogate Data. Handling Homographs in Neural Machine Translation. High-dimensional dynamics of generalization error in neural networks. Joint Text Embedding for Personalized Content-based Recommendation. Why PairDiff works? – A Mathematical Analysis of Bilinear Relational Compositional Operators for Analogy Detection. A Unifying View of Explicit and Implicit Feature Maps of Graph Kernels. A norm knockout method on indirect reciprocity to reveal indispensable norms.
Online Distributed Learning Over Networks in RKH Spaces Using Random Fourier Features
Title | Online Distributed Learning Over Networks in RKH Spaces Using Random Fourier Features |
Authors | Pantelis Bouboulis, Symeon Chouvardas, Sergios Theodoridis |
Abstract | We present a novel diffusion scheme for online kernel-based learning over networks. So far, a major drawback of any online learning algorithm operating in a reproducing kernel Hilbert space (RKHS) has been the need to update a growing number of parameters as time iterations evolve. Besides complexity, in a distributed setting this also increases the demand for communication resources. In contrast, the proposed method approximates the solution as a fixed-size vector (of larger dimension than the input space) using Random Fourier Features. This paves the way for standard linear combine-then-adapt techniques. To the best of our knowledge, this is the first time that a complete protocol for distributed online learning in RKHS is presented. Conditions for asymptotic convergence and boundedness of the network-wise regret are also provided. Simulated tests illustrate the performance of the proposed scheme. |
Tasks | |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.08131v2 |
http://arxiv.org/pdf/1703.08131v2.pdf | |
PWC | https://paperswithcode.com/paper/online-distributed-learning-over-networks-in |
Repo | |
Framework | |
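For context, a minimal sketch of the Random Fourier Feature map the abstract relies on: inputs are projected into a fixed D-dimensional space whose inner products approximate a Gaussian kernel, so each node only ever learns and communicates a fixed-size weight vector. The dimension `D`, bandwidth `sigma`, and the diffusion rule named in the comment are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def rff_map(X, D=200, sigma=1.0, seed=0):
    """Fixed-size feature map: inner products of the mapped vectors
    approximate the Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)).
    W and b must be shared by all network nodes so that their linear
    models live in the same feature space."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# After the map, each node can update its fixed-size weight vector with a
# standard linear combine-then-adapt rule (e.g. diffusion LMS).
Z = rff_map(np.random.randn(5, 3))
print(Z.shape)  # (5, 200)
```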
Counterfactual Language Model Adaptation for Suggesting Phrases
Title | Counterfactual Language Model Adaptation for Suggesting Phrases |
Authors | Kenneth C. Arnold, Kai-Wei Chang, Adam T. Kalai |
Abstract | Mobile devices use language models to suggest words and phrases for use in text entry. Traditional language models are based on contextual word frequency in a static corpus of text. However, certain types of phrases, when offered to writers as suggestions, may be systematically chosen more often than their frequency would predict. In this paper, we propose the task of generating suggestions that writers accept, a task related to, but distinct from, making accurate predictions. Although this task is fundamentally interactive, we propose a counterfactual setting that permits offline training and evaluation. We find that even a simple language model can capture text characteristics that improve acceptability. |
Tasks | Language Modelling |
Published | 2017-10-04 |
URL | http://arxiv.org/abs/1710.01799v1 |
http://arxiv.org/pdf/1710.01799v1.pdf | |
PWC | https://paperswithcode.com/paper/counterfactual-language-model-adaptation-for |
Repo | |
Framework | |
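The abstract does not spell out its offline estimator, but the standard way a counterfactual setting permits offline evaluation is inverse propensity scoring over logged suggestions. The sketch below is that generic recipe, with all names hypothetical; it is not the paper's method:

```python
def ips_acceptance_estimate(logs, new_policy_prob):
    """Off-policy estimate of the acceptance rate a new suggestion policy
    would achieve, computed from logs gathered under the old policy.

    logs: iterable of (suggestion, accepted, logging_prob) records.
    new_policy_prob: maps a suggestion to its probability under the new policy.
    """
    total, n = 0.0, 0
    for suggestion, accepted, logging_prob in logs:
        weight = new_policy_prob(suggestion) / logging_prob  # importance weight
        total += weight * float(accepted)
        n += 1
    return total / n
```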
Synthetic Database for Evaluation of General, Fundamental Biometric Principles
Title | Synthetic Database for Evaluation of General, Fundamental Biometric Principles |
Authors | Lee Friedman, Oleg Komogortsev |
Abstract | We create synthetic biometric databases to study general, fundamental biometric principles. First, we check the validity of the synthetic database design by comparing it to real data in terms of biometric performance. The real data used for this validity check came from an eye-movement-related biometric database. Next, we employ our database to evaluate the impact of variations in the temporal persistence of features on biometric performance. We index temporal persistence with the intraclass correlation coefficient (ICC). We find that variations in temporal persistence are very strongly correlated with variations in biometric performance. Finally, we use our synthetic database strategy to determine how many features are required to achieve particular levels of performance as the number of subjects in the database increases from 100 to 10,000. An important finding is that the number of features required to achieve various EER values (2%, 0.3%, 0.15%) is essentially constant across the database sizes that we studied. We hypothesize that the insights obtained from our study would apply to many biometric modalities where the extracted feature properties resemble those of the synthetic features we discuss in this work. |
Tasks | |
Published | 2017-07-29 |
URL | http://arxiv.org/abs/1707.09543v1 |
http://arxiv.org/pdf/1707.09543v1.pdf | |
PWC | https://paperswithcode.com/paper/synthetic-database-for-evaluation-of-general |
Repo | |
Framework | |
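A minimal sketch of how one might synthesize a feature with a prescribed temporal persistence, assuming the one-way random-effects model behind the ICC (a guess at the spirit of the paper's generator, not its exact design):

```python
import numpy as np

def synthetic_feature(n_subjects, n_sessions, icc, seed=0):
    """One-way model: x = subject_effect + session_noise, with
    ICC = var_between / (var_between + var_within).
    Choosing variances icc and (1 - icc) yields exactly the target ICC."""
    rng = np.random.default_rng(seed)
    subject_effect = rng.normal(0.0, np.sqrt(icc), size=(n_subjects, 1))
    noise = rng.normal(0.0, np.sqrt(1.0 - icc), size=(n_subjects, n_sessions))
    return subject_effect + noise  # rows: subjects, columns: sessions

x = synthetic_feature(100, 2, icc=0.7)  # a highly persistent feature
```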
License Plate Detection and Recognition Using Deeply Learned Convolutional Neural Networks
Title | License Plate Detection and Recognition Using Deeply Learned Convolutional Neural Networks |
Authors | Syed Zain Masood, Guang Shu, Afshin Dehghan, Enrique G. Ortiz |
Abstract | This work details Sighthound's fully automated license plate detection and recognition system. The core technology of the system is built using a sequence of deep Convolutional Neural Networks (CNNs) interlaced with accurate and efficient algorithms. The CNNs are trained and fine-tuned so that they are robust under different conditions (e.g. variations in pose, lighting, and occlusion) and work across a variety of license plate templates (e.g. sizes, backgrounds, and fonts). In quantitative evaluations, our system outperforms the leading license plate detection and recognition technology, i.e. ALPR, on several benchmarks. Our system is available to developers through the Sighthound Cloud API at https://www.sighthound.com/products/cloud |
Tasks | |
Published | 2017-03-21 |
URL | http://arxiv.org/abs/1703.07330v2 |
http://arxiv.org/pdf/1703.07330v2.pdf | |
PWC | https://paperswithcode.com/paper/license-plate-detection-and-recognition-using |
Repo | |
Framework | |
An Optimal Online Method of Selecting Source Policies for Reinforcement Learning
Title | An Optimal Online Method of Selecting Source Policies for Reinforcement Learning |
Authors | Siyuan Li, Chongjie Zhang |
Abstract | Transfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences. The problem of optimally selecting source policies during the learning process is important yet challenging, and it has received little theoretical analysis. In this paper, we develop an optimal online method to select source policies for reinforcement learning. This method formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse. We provide theoretical guarantees on the optimality of the selection process and on convergence to the optimal policy. In addition, we conduct experiments on a grid-based robot navigation domain to demonstrate its efficiency and robustness by comparison with a state-of-the-art transfer learning method. |
Tasks | Q-Learning, Robot Navigation, Transfer Learning |
Published | 2017-09-24 |
URL | http://arxiv.org/abs/1709.08201v1 |
http://arxiv.org/pdf/1709.08201v1.pdf | |
PWC | https://paperswithcode.com/paper/an-optimal-online-method-of-selecting-source |
Repo | |
Framework | |
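As a sketch of the bandit formulation the abstract describes (UCB1 is a standard choice for this shape of problem; the authors' actual selection rule and its optimality guarantees are in the paper):

```python
import math

class SourcePolicySelector:
    """UCB1 bandit over candidate source policies: before each episode,
    pick the source policy whose reuse looks most promising; afterwards,
    update its estimated value with the episodic return."""
    def __init__(self, n_policies):
        self.counts = [0] * n_policies
        self.values = [0.0] * n_policies
        self.t = 0

    def select(self):
        self.t += 1
        for i, c in enumerate(self.counts):
            if c == 0:
                return i  # try every source policy once first
        ucb = [v + math.sqrt(2.0 * math.log(self.t) / c)
               for v, c in zip(self.values, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, i, episodic_return):
        self.counts[i] += 1
        self.values[i] += (episodic_return - self.values[i]) / self.counts[i]
```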
The placement of the head that maximizes predictability. An information theoretic approach
Title | The placement of the head that maximizes predictability. An information theoretic approach |
Authors | Ramon Ferrer-i-Cancho |
Abstract | The minimization of the length of syntactic dependencies is a well-established principle of word order and the basis of a mathematical theory of word order. Here we complete that theory from the perspective of information theory, adding a competing word order principle: the maximization of the predictability of a target element. These two principles are in conflict: to maximize the predictability of the head, the head should appear last, which maximizes the cost with respect to dependency length minimization. The implications of such a broad theoretical framework for understanding the optimality, diversity and evolution of the six possible orderings of subject, object and verb are reviewed. |
Tasks | |
Published | 2017-05-28 |
URL | http://arxiv.org/abs/1705.09932v3 |
http://arxiv.org/pdf/1705.09932v3.pdf | |
PWC | https://paperswithcode.com/paper/the-placement-of-the-head-that-maximizes |
Repo | |
Framework | |
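To make the conflict concrete (a standard formalisation consistent with the abstract, not equations quoted from the paper):

```latex
% Predictability: conditioning never increases entropy, so the head h is
% most predictable when all n of its dependents precede it:
H(h \mid d_1, \dots, d_n) \;\le\; H(h \mid d_1, \dots, d_k), \qquad k < n.
% Dependency length: with the head at position p and dependents at
% positions p_1, \dots, p_n, the total length
D(p) = \sum_{i=1}^{n} \lvert p - p_i \rvert
% is minimized at a central (median) position and maximal at either end,
% so the head-last placement that maximizes predictability also
% maximizes dependency-length cost.
```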
Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description
Title | Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description |
Authors | Desmond Elliott, Stella Frank, Loïc Barrault, Fethi Bougares, Lucia Specia |
Abstract | We present the results from the second shared task on multimodal machine translation and multilingual image description. Nine teams submitted 19 systems to two tasks. The multimodal translation task, in which the source sentence is supplemented by an image, was extended with a new language (French) and two new test sets. The multilingual image description task was changed such that at test time, only the image is given. Compared to last year, multimodal systems improved, but text-only systems remain competitive. |
Tasks | Machine Translation, Multimodal Machine Translation |
Published | 2017-10-19 |
URL | http://arxiv.org/abs/1710.07177v1 |
http://arxiv.org/pdf/1710.07177v1.pdf | |
PWC | https://paperswithcode.com/paper/findings-of-the-second-shared-task-on |
Repo | |
Framework | |
L2-constrained Softmax Loss for Discriminative Face Verification
Title | L2-constrained Softmax Loss for Discriminative Face Verification |
Authors | Rajeev Ranjan, Carlos D. Castillo, Rama Chellappa |
Abstract | In recent years, the performance of face verification systems has significantly improved using deep convolutional neural networks (DCNNs). A typical pipeline for face verification includes training a deep network for subject classification with softmax loss, using the penultimate-layer output as the feature descriptor, and generating a cosine similarity score given a pair of face images. The softmax loss function does not optimize the features to have a higher similarity score for positive pairs and a lower similarity score for negative pairs, which leads to a performance gap. In this paper, we add an L2-constraint to the feature descriptors, which restricts them to lie on a hypersphere of a fixed radius. This module can be easily implemented using existing deep learning frameworks. We show that integrating this simple step in the training pipeline significantly boosts the performance of face verification. Specifically, we achieve state-of-the-art results on the challenging IJB-A dataset, with a True Accept Rate of 0.909 at a False Accept Rate of 0.0001 on the face verification protocol. Additionally, we achieve state-of-the-art performance on the LFW dataset with an accuracy of 99.78%, and competitive performance on the YTF dataset with an accuracy of 96.08%. |
Tasks | Face Verification |
Published | 2017-03-28 |
URL | http://arxiv.org/abs/1703.09507v3 |
http://arxiv.org/pdf/1703.09507v3.pdf | |
PWC | https://paperswithcode.com/paper/l2-constrained-softmax-loss-for |
Repo | |
Framework | |
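The module itself is a one-liner: L2-normalize the feature descriptor and scale it to a fixed radius before the softmax classifier. A minimal PyTorch sketch (the value of `alpha` is illustrative; the paper discusses how to set or learn it):

```python
import torch.nn.functional as F

def l2_constrained(features, alpha=50.0):
    """Restrict feature descriptors to a hypersphere of radius alpha
    before the softmax classification layer."""
    return alpha * F.normalize(features, p=2, dim=1)

# Typical use in the training pipeline:
#   descriptor = backbone(images)          # penultimate-layer output
#   logits = classifier(l2_constrained(descriptor))
#   loss = F.cross_entropy(logits, labels)
```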
Improving Naive Bayes for Regression with Optimised Artificial Surrogate Data
Title | Improving Naive Bayes for Regression with Optimised Artificial Surrogate Data |
Authors | Michael Mayo, Eibe Frank |
Abstract | Can we evolve better training data for machine learning algorithms? To investigate this question we use population-based optimisation algorithms to generate artificial surrogate training data for naive Bayes for regression. We demonstrate that the generalisation performance of naive Bayes for regression models is enhanced by training them on the artificial data as opposed to the real data. These results are important for two reasons. Firstly, naive Bayes models are simple and interpretable but frequently underperform compared to more complex “black box” models, and therefore new methods of enhancing accuracy are called for. Secondly, the idea of using the real training data indirectly in the construction of the artificial training data, as opposed to directly for model training, is a novel twist on the usual machine learning paradigm. |
Tasks | |
Published | 2017-07-16 |
URL | http://arxiv.org/abs/1707.04943v3 |
http://arxiv.org/pdf/1707.04943v3.pdf | |
PWC | https://paperswithcode.com/paper/improving-naive-bayes-for-regression-with |
Repo | |
Framework | |
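A rough illustration of optimising artificial surrogate data, with a simple hill-climber standing in for the paper's population-based optimisers and scikit-learn's BayesianRidge standing in for naive Bayes for regression (which has no off-the-shelf implementation):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import r2_score

def evolve_surrogate(X_real, y_real, n_points=30, steps=200, seed=0):
    """Mutate an artificial training set so that a model trained on it
    generalises well to the real data (fitness = R^2 on the real set)."""
    rng = np.random.default_rng(seed)
    Xs = rng.normal(size=(n_points, X_real.shape[1]))
    ys = rng.normal(size=n_points)

    def fitness(X, y):
        model = BayesianRidge().fit(X, y)
        return r2_score(y_real, model.predict(X_real))

    best = fitness(Xs, ys)
    for _ in range(steps):
        Xc = Xs + 0.1 * rng.normal(size=Xs.shape)  # mutate the candidates
        yc = ys + 0.1 * rng.normal(size=ys.shape)
        f = fitness(Xc, yc)
        if f > best:
            Xs, ys, best = Xc, yc, f
    return Xs, ys  # train the final, interpretable model on these
```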
Handling Homographs in Neural Machine Translation
Title | Handling Homographs in Neural Machine Translation |
Authors | Frederick Liu, Han Lu, Graham Neubig |
Abstract | Homographs, words with different meanings but the same surface form, have long caused difficulty for machine translation systems, as it is difficult to select the correct translation based on the context. However, with the advent of neural machine translation (NMT) systems, which can theoretically take into account global sentential context, one may hypothesize that this problem has been alleviated. In this paper, we first provide empirical evidence that existing NMT systems in fact still have significant problems in properly translating ambiguous words. We then proceed to describe methods, inspired by the word sense disambiguation literature, that model the context of the input word with context-aware word embeddings that help to differentiate the word sense before feeding it into the encoder. Experiments on three language pairs demonstrate that such models improve the performance of NMT systems both in terms of BLEU score and in the accuracy of translating homographs. |
Tasks | Machine Translation, Word Embeddings, Word Sense Disambiguation |
Published | 2017-08-22 |
URL | http://arxiv.org/abs/1708.06510v2 |
http://arxiv.org/pdf/1708.06510v2.pdf | |
PWC | https://paperswithcode.com/paper/handling-homographs-in-neural-machine |
Repo | |
Framework | |
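The general shape of a context-aware input embedding can be caricatured as follows: concatenate each word's embedding with a summary of its sentential context, so a homograph like "bank" presents a different input to the encoder in different sentences. The averaging scheme here is an assumption for illustration; the paper evaluates several context models:

```python
import numpy as np

def context_aware(embeddings):
    """For each position, concatenate the word's embedding with the mean
    of all other embeddings in the sentence."""
    E = np.asarray(embeddings, dtype=float)
    n = len(E)
    out = []
    for i in range(n):
        context = (E.sum(axis=0) - E[i]) / max(n - 1, 1)
        out.append(np.concatenate([E[i], context]))
    return np.stack(out)  # shape: (n, 2 * embedding_dim)
```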
High-dimensional dynamics of generalization error in neural networks
Title | High-dimensional dynamics of generalization error in neural networks |
Authors | Madhu S. Advani, Andrew M. Saxe |
Abstract | We perform an average-case analysis of the generalization dynamics of large neural networks trained using gradient descent. We study the practically relevant “high-dimensional” regime where the number of free parameters in the network is on the order of, or even larger than, the number of examples in the dataset. Using random matrix theory and exact solutions in linear models, we derive the generalization-error and training-error dynamics of learning and analyze how they depend on the dimensionality of the data and the signal-to-noise ratio of the learning problem. We find that the dynamics of gradient descent learning naturally protect against overtraining and overfitting in large networks. Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of samples, and thus can be reduced by making a network smaller or larger. Additionally, in the high-dimensional regime, low generalization error requires starting with small initial weights. We then turn to non-linear neural networks and show that making networks very large does not harm their generalization performance. On the contrary, it can in fact reduce overtraining, even without early stopping or regularization of any sort. We identify two novel phenomena underlying this behavior in overcomplete models: first, there is a frozen subspace of the weights in which no learning occurs under gradient descent; and second, the statistical properties of the high-dimensional regime yield better-conditioned input correlations, which protect against overtraining. We demonstrate that naive application of worst-case theories such as Rademacher complexity is inaccurate in predicting the generalization performance of deep neural networks, and we derive an alternative bound which incorporates the frozen-subspace and conditioning effects and qualitatively matches the behavior observed in simulation. |
Tasks | |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03667v1 |
http://arxiv.org/pdf/1710.03667v1.pdf | |
PWC | https://paperswithcode.com/paper/high-dimensional-dynamics-of-generalization |
Repo | |
Framework | |
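A tiny simulation in the spirit of the linear-model analysis: full-batch gradient descent from small initial weights in an overparameterized regression (here p = 2n), where directions of the data with zero singular value never move, which is the "frozen subspace" the abstract mentions. All constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 100, 200, 0.5                  # more parameters than examples
w_true = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ w_true + sigma * rng.normal(size=n)
X_te = rng.normal(size=(1000, p))
y_te = X_te @ w_true + sigma * rng.normal(size=1000)

w = 0.01 * rng.normal(size=p)                # small initial weights
lr = 0.1 / n
for step in range(2001):
    w -= lr * (X.T @ (X @ w - y))            # full-batch gradient descent
    if step % 500 == 0:
        train = float(np.mean((X @ w - y) ** 2))
        test = float(np.mean((X_te @ w - y_te) ** 2))
        print(step, round(train, 3), round(test, 3))
```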
Joint Text Embedding for Personalized Content-based Recommendation
Title | Joint Text Embedding for Personalized Content-based Recommendation |
Authors | Ting Chen, Liangjie Hong, Yue Shi, Yizhou Sun |
Abstract | Learning a good representation of text is key to many recommendation applications. Examples include news recommendation, where texts to be recommended are published constantly every day. However, most existing recommendation techniques, such as matrix-factorization-based methods, mainly rely on interaction histories to learn representations of items. While latent factors of items can be learned effectively from user interaction data, in many cases such data is not available, especially for newly emerged items. In this work, we aim to address the problem of personalized recommendation for completely new items with text information available. We cast the problem as a personalized text ranking problem and propose a general framework that combines text embedding with personalized recommendation. Users and textual content are embedded into a latent feature space. The text embedding function can be learned end-to-end by predicting user interactions with items. To alleviate sparsity in interaction data, and to leverage large amounts of text data with few or no user interactions, we further propose a joint text embedding model that incorporates unsupervised text embedding with a combination module. Experimental results show that our model can significantly improve the effectiveness of recommendation systems on real-world datasets. |
Tasks | Recommendation Systems |
Published | 2017-06-04 |
URL | http://arxiv.org/abs/1706.01084v2 |
http://arxiv.org/pdf/1706.01084v2.pdf | |
PWC | https://paperswithcode.com/paper/joint-text-embedding-for-personalized-content |
Repo | |
Framework | |
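The scoring rule at the heart of such a framework reduces to an inner product between a user's latent vector and an embedding of the item's text, so brand-new items can be scored from their text alone. A minimal sketch, in which the mean-of-word-vectors encoder and single linear layer are stand-ins for the paper's learned text-embedding function and combination module:

```python
import numpy as np

def score_new_item(user_vec, item_tokens, word_vecs, W):
    """Score a never-before-seen item for a user: embed its text, then
    take the inner product with the user's latent vector. In the paper,
    everything here is trained end-to-end on observed interactions."""
    text = np.mean([word_vecs[t] for t in item_tokens], axis=0)
    return float(user_vec @ (W @ text))
```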
Why PairDiff works? – A Mathematical Analysis of Bilinear Relational Compositional Operators for Analogy Detection
Title | Why PairDiff works? – A Mathematical Analysis of Bilinear Relational Compositional Operators for Analogy Detection |
Authors | Huda Hakami, Danushka Bollegala, Hayashi Kohei |
Abstract | Representing the semantic relations that exist between two given words (or entities) is an important first step in a wide range of NLP applications such as analogical reasoning, knowledge base completion and relational information retrieval. A simple, yet surprisingly accurate, method for representing a relation between two words is to compute the vector offset (\PairDiff) between their corresponding word embeddings. Despite this empirical success, it remains unclear whether \PairDiff is the best operator for obtaining a relational representation from word embeddings. We conduct a theoretical analysis of generalised bilinear operators that can be used to measure the $\ell_{2}$ relational distance between two word pairs. We show that, if the word embeddings are standardised and uncorrelated, such an operator will be independent of bilinear terms and can be simplified to a linear form, of which \PairDiff is a special case. For numerous word embedding types, we empirically verify this uncorrelatedness assumption, demonstrating the general applicability of our theoretical result. Moreover, we experimentally recover \PairDiff from the general bilinear relational composition operator on several benchmark analogy datasets. |
Tasks | Information Retrieval, Knowledge Base Completion, Word Embeddings |
Published | 2017-09-19 |
URL | http://arxiv.org/abs/1709.06673v2 |
http://arxiv.org/pdf/1709.06673v2.pdf | |
PWC | https://paperswithcode.com/paper/why-pairdiff-works-a-mathematical-analysis-of |
Repo | |
Framework | |
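The operator under analysis is easy to state: the relation between words a and b is represented by the offset of their embeddings, and two word pairs are compared by the l2 distance between their offsets:

```python
import numpy as np

def pair_diff(a, b):
    """PairDiff relational representation: the vector offset b - a."""
    return b - a

def relational_distance(pair1, pair2):
    """l2 distance between the relations of two word pairs; in the paper's
    analysis, PairDiff emerges as a special case of a generalised
    bilinear operator under this distance."""
    return float(np.linalg.norm(pair_diff(*pair1) - pair_diff(*pair2)))

# e.g. with embeddings v_king, v_man, v_queen, v_woman, a good embedding
# makes relational_distance((v_man, v_king), (v_woman, v_queen)) small.
```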
A Unifying View of Explicit and Implicit Feature Maps of Graph Kernels
Title | A Unifying View of Explicit and Implicit Feature Maps of Graph Kernels |
Authors | Nils M. Kriege, Marion Neumann, Christopher Morris, Kristian Kersting, Petra Mutzel |
Abstract | Non-linear kernel methods can be approximated by fast linear ones using suitable explicit feature maps, allowing their application to large-scale problems. We investigate how convolution kernels for structured data are composed from base kernels and construct corresponding feature maps. On this basis, we propose exact and approximate feature maps for widely used graph kernels based on the kernel trick. We analyze for which kernels and graph properties computation by explicit feature maps is feasible and actually more efficient. In particular, we derive approximate, explicit feature maps for state-of-the-art kernels supporting real-valued attributes, including the GraphHopper and graph invariant kernels. In extensive experiments we show that our approaches often achieve a classification accuracy close to that of the exact methods based on the kernel trick, but require only a fraction of their running time. Moreover, we propose and analyze algorithms for computing random walk, shortest-path and subgraph matching kernels by explicit and implicit feature maps. Our theoretical results are confirmed experimentally by observing a phase transition when comparing running time with respect to label diversity, walk lengths and subgraph size, respectively. |
Tasks | |
Published | 2017-03-02 |
URL | https://arxiv.org/abs/1703.00676v3 |
https://arxiv.org/pdf/1703.00676v3.pdf | |
PWC | https://paperswithcode.com/paper/a-unifying-view-of-explicit-and-implicit |
Repo | |
Framework | |
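To illustrate the explicit-versus-implicit distinction on the simplest possible graph kernel, a vertex-label histogram (not one of the paper's kernels, just the smallest instance of the pattern): the explicit map turns each graph into a count vector once, after which kernel values are cheap dot products, whereas the implicit computation compares all vertex pairs:

```python
from collections import Counter

def explicit_phi(node_labels):
    """Explicit feature map: a sparse histogram of node labels."""
    return Counter(node_labels)

def kernel_explicit(g1, g2):
    phi1, phi2 = explicit_phi(g1), explicit_phi(g2)
    return sum(c * phi2[label] for label, c in phi1.items())  # <phi(G1), phi(G2)>

def kernel_implicit(g1, g2):
    return sum(1 for a in g1 for b in g2 if a == b)  # all pairwise comparisons

g1, g2 = ["C", "C", "O", "N"], ["C", "O", "O"]
assert kernel_explicit(g1, g2) == kernel_implicit(g1, g2) == 4
```

The paper's question is exactly when this trade pays off: the explicit route costs time linear in graph size after the mapping, while the implicit route costs a product of graph sizes but handles feature spaces too large to materialize.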
A norm knockout method on indirect reciprocity to reveal indispensable norms
Title | A norm knockout method on indirect reciprocity to reveal indispensable norms |
Authors | Hitoshi Yamamoto, Isamu Okada, Satoshi Uchida, Tatsuya Sasaki |
Abstract | Although various norms for reciprocity-based cooperation have been suggested that are evolutionarily stable against invasion by free riders, the process by which norms alternate and the role of norm diversity in the evolution of cooperation remain unclear. We clarify the co-evolutionary dynamics of norms and cooperation in indirect reciprocity and also identify the norms indispensable for the evolution of cooperation. Inspired by the gene knockout method, a genetic engineering technique, we developed a norm knockout method and used it to identify the norms necessary for the establishment of cooperation. Numerical investigations revealed that the majority of norms gradually transitioned to tolerant norms once defectors had been eliminated by strict norms. Furthermore, no cooperation emerges when specific norms that are intolerant of defectors are knocked out. |
Tasks | |
Published | 2017-03-11 |
URL | http://arxiv.org/abs/1703.03943v1 |
http://arxiv.org/pdf/1703.03943v1.pdf | |
PWC | https://paperswithcode.com/paper/a-norm-knockout-method-on-indirect |
Repo | |
Framework | |
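The knockout protocol itself is a leave-one-out loop over the set of norms. A harness-level sketch, with a deliberately toy simulation plugged in so that it runs (any real use would supply a full indirect-reciprocity simulation returning a cooperation rate):

```python
def knockout_analysis(all_norms, run_evolution):
    """Remove one norm at a time, rerun the evolutionary simulation
    without it, and record how far cooperation drops; a large drop
    marks that norm as indispensable."""
    baseline = run_evolution(list(all_norms))
    drop = {}
    for norm in all_norms:
        remaining = [n for n in all_norms if n != norm]
        drop[norm] = baseline - run_evolution(remaining)
    return drop

# Toy stand-in, purely to make the harness executable:
toy = lambda norms: 0.9 if "strict-toward-defectors" in norms else 0.2
print(knockout_analysis(["strict-toward-defectors", "tolerant", "neutral"], toy))
```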