Paper Group ANR 185
Pyramid Vector Quantization for Deep Learning
Title | Pyramid Vector Quantization for Deep Learning |
Authors | Vincenzo Liguori |
Abstract | This paper explores the use of Pyramid Vector Quantization (PVQ) to reduce the computational cost for a variety of neural networks (NNs) while, at the same time, compressing the weights that describe them. This is based on the fact that the dot product between an N-dimensional vector of real numbers and an N-dimensional PVQ vector can be calculated with only additions and subtractions and one multiplication. This is advantageous since tensor products, commonly used in NNs, can be reduced to a dot product or a set of dot products. Finally, it is stressed that any NN architecture that is based on an operation that can be reduced to a dot product can benefit from the techniques described here. |
Tasks | Quantization |
Published | 2017-04-10 |
URL | http://arxiv.org/abs/1704.02681v1, http://arxiv.org/pdf/1704.02681v1.pdf |
PWC | https://paperswithcode.com/paper/pyramid-vector-quantization-for-deep-learning |
Repo | |
Framework | |
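The PVQ abstract above rests on one arithmetic fact: a dot product with a PVQ vector needs only additions, subtractions, and a single multiplication. The sketch below illustrates that fact under stated assumptions: the PVQ vector is taken to be a scale factor `rho` times an integer vector `q` whose absolute values sum to the pyramid parameter K. The function name `pvq_dot` and the sample values are hypothetical and do not come from the paper.

```python
def pvq_dot(x, q, rho):
    """Dot product of a real vector x with the PVQ-coded vector rho * q.

    Assumption: q holds small integers whose absolute values sum to K.
    Each component contributes |q_i| additions or subtractions of x_i,
    so the loop performs K add/subtract steps in total. The only
    multiplication is the final scaling by rho.
    """
    if len(x) != len(q):
        raise ValueError("x and q must have the same length")
    acc = 0.0
    for xi, qi in zip(x, q):
        for _ in range(abs(qi)):
            acc = acc + xi if qi > 0 else acc - xi
    return rho * acc


x = [0.5, -1.25, 2.0, 0.75]   # real-valued input vector
q = [2, 0, -1, 1]             # integer PVQ point, K = 2 + 0 + 1 + 1 = 4
rho = 0.3                     # hypothetical scale factor
result = pvq_dot(x, q, rho)
```

Components with `q_i = 0` cost nothing, which is why a small K relative to N makes the accumulation cheap.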
A Generic Framework for Interesting Subspace Cluster Detection in Multi-attributed Networks
Title | A Generic Framework for Interesting Subspace Cluster Detection in Multi-attributed Networks |
Authors | Feng Chen, Baojian Zhou, Adil Alim, Liang Zhao |
Abstract | Detection of interesting (e.g., coherent or anomalous) clusters has been studied extensively on plain or univariate networks, with various applications. Recently, algorithms have been extended to real-world networks with multiple attributes for each node. In a multi-attributed network, a cluster of nodes is often only interesting for a subset (subspace) of attributes, and such clusters are called subspace clusters. However, in the current literature, few methods are capable of detecting subspace clusters, which involves concurrent feature selection and network cluster detection. These relevant methods are mostly heuristic-driven and customized for specific application scenarios. In this work, we present a generic and theoretical framework for detection of interesting subspace clusters in large multi-attributed networks. Specifically, we propose a subspace graph-structured matching pursuit algorithm, namely, SG-Pursuit, to address a broad class of such problems for different score functions (e.g., coherence or anomalous functions) and topology constraints (e.g., connected subgraphs and dense subgraphs). We prove that our algorithm 1) runs in nearly-linear time on the network size and the total number of attributes and 2) enjoys rigorous guarantees (geometrical convergence rate and tight error bound) analogous to those of the state-of-the-art algorithms for sparse feature selection problems and subgraph detection problems. As a case study, we specialize SG-Pursuit to optimize a number of well-known score functions for two typical tasks, including detection of coherent dense and anomalous connected subspace clusters in real-world networks. Empirical evidence demonstrates that our proposed generic algorithm SG-Pursuit outperforms state-of-the-art methods that are designed specifically for these two tasks. |
Tasks | Feature Selection |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05246v2, http://arxiv.org/pdf/1709.05246v2.pdf |
PWC | https://paperswithcode.com/paper/a-generic-framework-for-interesting-subspace |
Repo | |
Framework | |
Large-Scale Domain Adaptation via Teacher-Student Learning
Title | Large-Scale Domain Adaptation via Teacher-Student Learning |
Authors | Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong |
Abstract | High accuracy speech recognition requires a large amount of transcribed data for supervised training. In the absence of such data, domain adaptation of a well-trained acoustic model can be performed, but even here, high accuracy usually requires significant labeled data from the target domain. In this work, we propose an approach to domain adaptation that does not require transcriptions but instead uses a corpus of unlabeled parallel data, consisting of pairs of samples from the source domain of the well-trained model and the desired target domain. To perform adaptation, we employ teacher/student (T/S) learning, in which the posterior probabilities generated by the source-domain model can be used in lieu of labels to train the target-domain model. We evaluate the proposed approach in two scenarios, adapting a clean acoustic model to noisy speech and adapting an adult speech acoustic model to children's speech. Significant improvements in accuracy are obtained, with reductions in word error rate of up to 44% over the original source model without the need for transcribed data in the target domain. Moreover, we show that increasing the amount of unlabeled data results in additional model robustness, which is particularly beneficial when using simulated training data in the target domain. |
Tasks | Domain Adaptation, Speech Recognition |
Published | 2017-08-17 |
URL | http://arxiv.org/abs/1708.05466v1, http://arxiv.org/pdf/1708.05466v1.pdf |
PWC | https://paperswithcode.com/paper/large-scale-domain-adaptation-via-teacher |
Repo | |
Framework | |
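The teacher/student abstract above says the source-domain model's posteriors stand in for labels when training the target-domain model. A minimal sketch of one common way to express that objective follows: the cross-entropy between teacher and student posteriors for one pair of parallel frames. The function names and the logit values are hypothetical, and the paper's exact loss and model details are not reproduced here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ts_loss(teacher_logits, student_logits):
    """Cross-entropy of the student posteriors against the teacher posteriors.

    Assumption: the teacher scores the source-domain sample and the student
    scores the paired target-domain sample. No transcription is used.
    """
    p = softmax(teacher_logits)   # soft targets from the teacher
    q = softmax(student_logits)   # student predictions
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


# Hypothetical logits for one parallel pair (e.g. clean frame, noisy frame).
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = ts_loss(teacher, student)
floor = ts_loss(teacher, teacher)  # lowest achievable value for this teacher
```

The loss is smallest when the student reproduces the teacher's posteriors exactly, which is what lets unlabeled parallel data drive the adaptation.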
Learning to Embed Words in Context for Syntactic Tasks
Title | Learning to Embed Words in Context for Syntactic Tasks |
Authors | Lifu Tu, Kevin Gimpel, Karen Livescu |
Abstract | We present models for embedding words in the context of surrounding words. Such models, which we refer to as token embeddings, represent the characteristics of a word that are specific to a given context, such as word sense, syntactic category, and semantic role. We explore simple, efficient token embedding models based on standard neural network architectures. We learn token embeddings on a large amount of unannotated text and evaluate them as features for part-of-speech taggers and dependency parsers trained on much smaller amounts of annotated data. We find that predictors endowed with token embeddings consistently outperform baseline predictors across a range of context window and training set sizes. |
Tasks | |
Published | 2017-06-09 |
URL | http://arxiv.org/abs/1706.02807v2, http://arxiv.org/pdf/1706.02807v2.pdf |
PWC | https://paperswithcode.com/paper/learning-to-embed-words-in-context-for |
Repo | |
Framework | |
Simplified Long Short-term Memory Recurrent Neural Networks: part II
Title | Simplified Long Short-term Memory Recurrent Neural Networks: part II |
Authors | Atra Akandeh, Fathi M. Salem |
Abstract | This is part II of a three-part work. Here, we present a second set of five inter-related variants of simplified Long Short-term Memory (LSTM) recurrent neural networks, obtained by further reducing adaptive parameters. Two of these models have been introduced in part I of this work. We evaluate and verify our model variants on the benchmark MNIST dataset and assert that these models are comparable to the base LSTM model while using progressively fewer parameters. Moreover, we observe that when using the ReLU activation, the test accuracy of the standard LSTM drops after a number of epochs when the learning parameter becomes larger. However, all of the new model variants sustain their performance. |
Tasks | |
Published | 2017-07-14 |
URL | http://arxiv.org/abs/1707.04623v1, http://arxiv.org/pdf/1707.04623v1.pdf |
PWC | https://paperswithcode.com/paper/simplified-long-short-term-memory-recurrent-2 |
Repo | |
Framework | |
Machine Learning Approaches for Traffic Volume Forecasting: A Case Study of the Moroccan Highway Network
Title | Machine Learning Approaches for Traffic Volume Forecasting: A Case Study of the Moroccan Highway Network |
Authors | Abderrahim Khalifa, Younes Idsouguou, Loubna Benabbou, Mourad Zirari |
Abstract | In this paper, we illustrate the different approaches we followed while developing a forecasting tool for highway traffic in Morocco. Two main approaches were adopted. The first is statistical analysis, used as a step of data exploration and data wrangling, in which a beta model is fitted for a better understanding of traffic behavior. Next, we moved to machine learning, where we worked with several algorithms such as Random Forest, Artificial Neural Networks, and Extra Trees. Because we were convinced that state-of-the-art models are still under-explored in this field of study, we also cover an application of Long Short-Term Memory neural networks. |
Tasks | |
Published | 2017-11-18 |
URL | http://arxiv.org/abs/1711.06779v1, http://arxiv.org/pdf/1711.06779v1.pdf |
PWC | https://paperswithcode.com/paper/machine-learning-approaches-for-traffic |
Repo | |
Framework | |
Group Recommendations: Axioms, Impossibilities, and Random Walks
Title | Group Recommendations: Axioms, Impossibilities, and Random Walks |
Authors | Omer Lev, Moshe Tennenholtz |
Abstract | We introduce an axiomatic approach to group recommendations, in line with previous work on the axiomatic treatment of trust-based recommendation systems, ranking systems, and other foundational work on the axiomatic approach to internet mechanisms in social choice settings. In group recommendations we wish to recommend to a group of agents, consisting of both opinionated and undecided members, a joint choice that would be acceptable to them. Such a system has many applications, such as choosing a movie or a restaurant to go to with a group of friends, recommending games for online game players, and other communal activities. Our method utilizes a given social graph to extract information on the undecided, relying on the agents influencing them. We first show that a set of fairly natural desired requirements (a.k.a. axioms) leads to an impossibility, rendering mutual satisfaction of them unreachable. However, we also show a modified set of axioms that fully axiomatize a group variant of the random-walk recommendation system, expanding a previous result from the individual recommendation case. |
Tasks | Recommendation Systems |
Published | 2017-07-27 |
URL | http://arxiv.org/abs/1707.08755v1, http://arxiv.org/pdf/1707.08755v1.pdf |
PWC | https://paperswithcode.com/paper/group-recommendations-axioms-impossibilities |
Repo | |
Framework | |
Low Impact Artificial Intelligences
Title | Low Impact Artificial Intelligences |
Authors | Stuart Armstrong, Benjamin Levinstein |
Abstract | There are many goals for an AI that could become dangerous if the AI becomes superintelligent or otherwise powerful. Much work on the AI control problem has been focused on constructing AI goals that are safe even for such AIs. This paper looks at an alternative approach: defining a general concept of 'low impact'. The aim is to ensure that a powerful AI which implements low impact will not modify the world extensively, even if it is given a simple or dangerous goal. The paper proposes various ways of defining and grounding low impact, and discusses methods for ensuring that the AI can still be allowed to have a (desired) impact despite the restriction. The end of the paper addresses known issues with this approach and avenues for future research. |
Tasks | |
Published | 2017-05-30 |
URL | http://arxiv.org/abs/1705.10720v1, http://arxiv.org/pdf/1705.10720v1.pdf |
PWC | https://paperswithcode.com/paper/low-impact-artificial-intelligences |
Repo | |
Framework | |
Multi-Objective Software Suite of Two-Dimensional Shape Descriptors for Object-Based Image Analysis
Title | Multi-Objective Software Suite of Two-Dimensional Shape Descriptors for Object-Based Image Analysis |
Authors | Andrea Baraldi, João V. B. Soares |
Abstract | In recent years two sets of planar (2D) shape attributes, provided with an intuitive physical meaning, were proposed to the remote sensing community by, respectively, Nagao & Matsuyama and Shackelford & Davis in their seminal works on the increasingly popular geographic object based image analysis (GEOBIA) paradigm. These two published sets of intuitive geometric features were selected as initial conditions by the present R&D software project, whose multi-objective goal was to accomplish: (i) a minimally dependent and maximally informative design (knowledge/information representation) of a general purpose, user and application independent dictionary of 2D shape terms provided with a physical meaning intuitive to understand by human end users and (ii) an effective (accurate, scale invariant, easy to use) and efficient implementation of 2D shape descriptors. To comply with the Quality Assurance Framework for Earth Observation guidelines, the proposed suite of geometric functions is validated by means of a novel quantitative quality assurance policy, centered on inter feature dependence (causality) assessment. This innovative multivariate feature validation strategy is alternative to traditional feature selection procedures based on either inductive data learning classification accuracy estimation, which is inherently case specific, or cross correlation estimation, because statistical cross correlation does not imply causation. The project deliverable is an original general purpose software suite of seven validated off the shelf 2D shape descriptors intuitive to use. Alternative to existing commercial or open source software libraries of tens of planar shape functions whose informativeness remains unknown, it is eligible for use in (GE)OBIA systems in operating mode, expected to mimic human reasoning based on a convergence of evidence approach. |
Tasks | Feature Selection |
Published | 2017-01-08 |
URL | http://arxiv.org/abs/1701.01941v2, http://arxiv.org/pdf/1701.01941v2.pdf |
PWC | https://paperswithcode.com/paper/multi-objective-software-suite-of-two |
Repo | |
Framework | |
A Diversified Multi-Start Algorithm for Unconstrained Binary Quadratic Problems Leveraging the Graphics Processor Unit
Title | A Diversified Multi-Start Algorithm for Unconstrained Binary Quadratic Problems Leveraging the Graphics Processor Unit |
Authors | Mark W. Lewis |
Abstract | Multi-start algorithms are a common and effective tool for metaheuristic searches. In this paper we amplify multi-start capabilities by employing the parallel processing power of the graphics processor unit (GPU) to quickly generate a diverse starting set of solutions for the Unconstrained Binary Quadratic Optimization Problem. These solutions are evaluated and used to implement screening methods that select solutions for further optimization. This method is implemented as an initial high-quality solution generation phase prior to a secondary steepest ascent search. A comparison of results to best known approaches on benchmark unconstrained binary quadratic problems demonstrates that GPU-enabled diversified multi-start with screening quickly yields very good results. |
Tasks | |
Published | 2017-05-31 |
URL | http://arxiv.org/abs/1706.00037v1, http://arxiv.org/pdf/1706.00037v1.pdf |
PWC | https://paperswithcode.com/paper/a-diversified-multi-start-algorithm-for |
Repo | |
Framework | |
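The multi-start abstract above describes three stages: generate many diverse starting solutions, screen them by objective value, and refine the survivors with steepest ascent. The sketch below walks through those stages sequentially on the CPU for a tiny problem, maximizing x^T Q x over binary x. It is only an illustration of the pipeline: the paper generates starts in parallel on a GPU, and the one-flip neighborhood, the function names, the parameters `n_starts` and `keep`, and the matrix `Q` are assumptions made here.

```python
import random


def objective(Q, x):
    """Value of x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def steepest_ascent(Q, x):
    """Repeatedly apply the single bit flip with the largest gain."""
    x = list(x)
    best = objective(Q, x)
    while True:
        best_gain, best_i = 0, None
        for i in range(len(x)):
            x[i] ^= 1                      # try flipping bit i
            gain = objective(Q, x) - best
            x[i] ^= 1                      # undo the trial flip
            if gain > best_gain:
                best_gain, best_i = gain, i
        if best_i is None:                 # no improving flip: local optimum
            return x, best
        x[best_i] ^= 1
        best += best_gain


def multi_start(Q, n_starts=64, keep=4, seed=0):
    """Random starts, screening by objective value, then local search."""
    rng = random.Random(seed)
    n = len(Q)
    starts = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_starts)]
    starts.sort(key=lambda s: objective(Q, s), reverse=True)   # screening
    results = [steepest_ascent(Q, s) for s in starts[:keep]]
    return max(results, key=lambda r: r[1])


# Hypothetical symmetric integer matrix for a 4-variable instance.
Q = [[2, -3, 1, 0],
     [-3, 1, 2, -1],
     [1, 2, -2, 3],
     [0, -1, 3, 1]]
best_x, best_val = multi_start(Q)
```

Recomputing the full objective for every trial flip keeps the sketch short. A practical solver would update the gains incrementally instead.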
Revisiting L21-norm Robustness with Vector Outlier Regularization
Title | Revisiting L21-norm Robustness with Vector Outlier Regularization |
Authors | Bo Jiang, Chris Ding |
Abstract | In many real-world applications, data usually contain outliers. One popular approach is to use the L2,1 norm function as a robust error/loss function. However, the robustness of the L2,1 norm function is not well understood so far. In this paper, we propose a new Vector Outlier Regularization (VOR) framework to understand and analyze the robustness of the L2,1 norm function. Our VOR function defines a data point to be an outlier if it is outside a threshold with respect to a theoretical prediction, and regularizes it, i.e., pulls it back to the threshold line. We then prove that the L2,1 function is the limiting case of this VOR with the usual least square/L2 error function as the threshold shrinks to zero. One interesting property of VOR is that how far an outlier lies away from its theoretically predicted value does not affect the final regularization and analysis results. This VOR property unmasks one of the most peculiar properties of the L2,1 norm function: the effects of outliers seem to be independent of how outlying they are. If an outlier is moved further away from the intrinsic manifold/subspace, the final analysis results do not change. VOR provides a new way to understand and analyze the robustness of the L2,1 norm function. Applying VOR to matrix factorization leads to a new VORPCA model. We give a comprehensive comparison with trace-norm based L21-norm PCA to demonstrate the advantages of VORPCA. |
Tasks | |
Published | 2017-06-20 |
URL | https://arxiv.org/abs/1706.06409v2, https://arxiv.org/pdf/1706.06409v2.pdf |
PWC | https://paperswithcode.com/paper/outlier-regularization-for-vector-data-and |
Repo | |
Framework | |
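Two ideas from the VOR abstract above can be made concrete with a few lines of arithmetic: the L2,1 norm itself, and the step that pulls an outlier back to the threshold. The sketch below is one reading of the abstract, not the paper's formulation. It assumes the L2,1 norm is the sum of the Euclidean norms of per-point vectors, and that "pulling back" means rescaling the residual so the point sits exactly at the threshold distance from its prediction. The names `l21_norm` and `pull_back` are hypothetical.

```python
import math


def l21_norm(columns):
    """Sum of the Euclidean (L2) norms of the given column vectors."""
    return sum(math.sqrt(sum(v * v for v in col)) for col in columns)


def pull_back(point, prediction, threshold):
    """Move an outlier onto the threshold sphere around its prediction.

    Points within the threshold are returned unchanged. Points beyond it
    keep their direction from the prediction but lose their excess distance.
    """
    residual = [p - q for p, q in zip(point, prediction)]
    dist = math.sqrt(sum(r * r for r in residual))
    if dist <= threshold:
        return list(point)
    scale = threshold / dist
    return [q + scale * r for q, r in zip(prediction, residual)]


residuals = [[3.0, 4.0], [0.0, 0.0], [5.0, 12.0]]
total = l21_norm(residuals)            # 5 + 0 + 13

near = pull_back([10.0, 0.0], [0.0, 0.0], 1.0)
far = pull_back([100.0, 0.0], [0.0, 0.0], 1.0)
```

Here `near` and `far` end up at the same location, which mirrors the property the abstract highlights: once a point is an outlier, how far out it lies no longer matters.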
On the diffusion approximation of nonconvex stochastic gradient descent
Title | On the diffusion approximation of nonconvex stochastic gradient descent |
Authors | Wenqing Hu, Chris Junchi Li, Lei Li, Jian-Guo Liu |
Abstract | We study the Stochastic Gradient Descent (SGD) method in nonconvex optimization problems from the point of view of approximating diffusion processes. We prove rigorously that the diffusion process can approximate the SGD algorithm weakly using the weak form of master equation for probability evolution. In the small step size regime and the presence of omnidirectional noise, our weak approximating diffusion process suggests the following dynamics for the SGD iteration starting from a local minimizer (resp. saddle point): it escapes in a number of iterations exponentially (resp. almost linearly) dependent on the inverse stepsize. The results are obtained using the theory for random perturbations of dynamical systems (theory of large deviations for local minimizers and theory of exiting for unstable stationary points). In addition, we discuss the effects of batch size for the deep neural networks, and we find that small batch size is helpful for SGD algorithms to escape unstable stationary points and sharp minimizers. Our theory indicates that one should increase the batch size at a later stage for the SGD to be trapped in flat minimizers for better generalization. |
Tasks | |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07562v2, http://arxiv.org/pdf/1705.07562v2.pdf |
PWC | https://paperswithcode.com/paper/on-the-diffusion-approximation-of-nonconvex |
Repo | |
Framework | |
Non-Markovian Control with Gated End-to-End Memory Policy Networks
Title | Non-Markovian Control with Gated End-to-End Memory Policy Networks |
Authors | Julien Perez, Tomi Silander |
Abstract | Partially observable environments present an important open challenge in the domain of sequential control learning with delayed rewards. Despite numerous attempts during the last two decades, the majority of reinforcement learning algorithms and associated approximate models, applied to this context, still assume Markovian state transitions. In this paper, we explore the use of a recently proposed attention-based model, the Gated End-to-End Memory Network, for sequential control. We call the resulting model the Gated End-to-End Memory Policy Network. More precisely, we use a model-free value-based algorithm to learn policies for partially observed domains using this memory-enhanced neural network. This model is end-to-end learnable and it features unbounded memory. Indeed, because of its attention mechanism and associated non-parametric memory, the proposed model allows us to define an attention mechanism over the observation stream unlike recurrent models. We show encouraging results that illustrate the capability of our attention-based model in the context of the continuous-state non-stationary control problem of stock trading. We also present an OpenAI Gym environment for simulated stock exchange and explain its relevance as a benchmark for the field of non-Markovian decision process learning. |
Tasks | |
Published | 2017-05-31 |
URL | http://arxiv.org/abs/1705.10993v1, http://arxiv.org/pdf/1705.10993v1.pdf |
PWC | https://paperswithcode.com/paper/non-markovian-control-with-gated-end-to-end |
Repo | |
Framework | |
Hyperprofile-based Computation Offloading for Mobile Edge Networks
Title | Hyperprofile-based Computation Offloading for Mobile Edge Networks |
Authors | Andrew Crutcher, Caleb Koch, Kyle Coleman, Jon Patman, Flavio Esposito, Prasad Calyam |
Abstract | In recent studies, researchers have developed various computation offloading frameworks for bringing cloud services closer to the user via edge networks. Specifically, an edge device needs to offload computationally intensive tasks because of energy and processing constraints. These constraints present the challenge of identifying which edge nodes should receive tasks to reduce overall resource consumption. We propose a unique solution to this problem which incorporates elements from Knowledge-Defined Networking (KDN) to make intelligent predictions about offloading costs based on historical data. Each server instance can be represented in a multidimensional feature space where each dimension corresponds to a predicted metric. We compute features for a "hyperprofile" and position nodes based on the predicted costs of offloading a particular task. We then perform a k-Nearest Neighbor (kNN) query within the hyperprofile to select nodes for offloading computation. This paper formalizes our hyperprofile-based solution and explores the viability of using machine learning (ML) techniques to predict metrics useful for computation offloading. We also investigate the effects of using different distance metrics for the queries. Our results show that various network metrics can be modeled accurately with regression, and that there are circumstances where kNN queries using Euclidean distance, as opposed to rectilinear distance, are more favorable. |
Tasks | |
Published | 2017-07-28 |
URL | http://arxiv.org/abs/1707.09422v1, http://arxiv.org/pdf/1707.09422v1.pdf |
PWC | https://paperswithcode.com/paper/hyperprofile-based-computation-offloading-for |
Repo | |
Framework | |
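The hyperprofile abstract above selects offloading targets with a kNN query over predicted cost metrics and notes that the choice of distance metric can change the outcome. The sketch below shows how that can happen on a toy example. The node names, the two cost dimensions, and the use of the zero-cost origin as the query point are assumptions for illustration and are not taken from the paper.

```python
def euclidean(a, b):
    """Straight-line (L2) distance between two cost vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def rectilinear(a, b):
    """Rectilinear (L1, Manhattan) distance between two cost vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn(nodes, query, k, dist):
    """Names of the k nodes whose cost vectors lie closest to the query."""
    return sorted(nodes, key=lambda name: dist(nodes[name], query))[:k]


# Hypothetical predicted costs per node: (latency cost, energy cost).
nodes = {
    "edge-a": (3.0, 3.0),
    "edge-b": (0.0, 5.0),
    "edge-c": (6.0, 6.0),
}
ideal = (0.0, 0.0)  # assumed query point: zero predicted cost

by_euclidean = knn(nodes, ideal, 1, euclidean)
by_rectilinear = knn(nodes, ideal, 1, rectilinear)
```

With these values the Euclidean query selects `edge-a` (distance about 4.24 versus 5), while the rectilinear query selects `edge-b` (distance 5 versus 6), so the two metrics rank the same candidates differently.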
Forecasting Player Behavioral Data and Simulating in-Game Events
Title | Forecasting Player Behavioral Data and Simulating in-Game Events |
Authors | Anna Guitart, Pei Pei Chen, Paul Bertens, África Periáñez |
Abstract | Understanding player behavior is fundamental in game data science. Video games evolve as players interact with the game, so being able to foresee player experience would help to ensure a successful game development. In particular, game developers need to evaluate beforehand the impact of in-game events. Simulation optimization of these events is crucial to increase player engagement and maximize monetization. We present an experimental analysis of several methods to forecast game-related variables, with two main aims: to obtain accurate predictions of in-app purchases and playtime in an operational production environment, and to perform simulations of in-game events in order to maximize sales and playtime. Our ultimate purpose is to take a step towards the data-driven development of games. The results suggest that, even though the performance of traditional approaches such as ARIMA is still better, the outcomes of state-of-the-art techniques like deep learning are promising. Deep learning comes up as a well-suited general model that could be used to forecast a variety of time series with different dynamic behaviors. |
Tasks | Time Series |
Published | 2017-10-05 |
URL | http://arxiv.org/abs/1710.01931v1, http://arxiv.org/pdf/1710.01931v1.pdf |
PWC | https://paperswithcode.com/paper/forecasting-player-behavioral-data-and |
Repo | |
Framework | |