July 28, 2019


Paper Group ANR 185

Pyramid Vector Quantization for Deep Learning. A Generic Framework for Interesting Subspace Cluster Detection in Multi-attributed Networks. Large-Scale Domain Adaptation via Teacher-Student Learning. Learning to Embed Words in Context for Syntactic Tasks. Simplified Long Short-term Memory Recurrent Neural Networks: part II. Machine Learning Approac …

Pyramid Vector Quantization for Deep Learning


Title	Pyramid Vector Quantization for Deep Learning
Authors	Vincenzo Liguori
Abstract	This paper explores the use of Pyramid Vector Quantization (PVQ) to reduce the computational cost for a variety of neural networks (NNs) while, at the same time, compressing the weights that describe them. This is based on the fact that the dot product between an N-dimensional vector of real numbers and an N-dimensional PVQ vector can be calculated with only additions and subtractions and one multiplication. This is advantageous since tensor products, commonly used in NNs, can be reduced to a dot product or a set of dot products. Finally, it is stressed that any NN architecture that is based on an operation that can be reduced to a dot product can benefit from the techniques described here.
Tasks	Quantization
Published	2017-04-10
URL	http://arxiv.org/abs/1704.02681v1
PDF	http://arxiv.org/pdf/1704.02681v1.pdf
PWC	https://paperswithcode.com/paper/pyramid-vector-quantization-for-deep-learning
Repo
Framework
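
The dot-product claim in the abstract above can be illustrated with a minimal sketch. It assumes a PVQ vector is stored as integer "pulses" plus one scalar gain; the names `pvq_dot`, `pulses`, and `scale` are hypothetical and are not taken from the paper. Each integer component is applied by repeated addition or subtraction, and the gain is applied once at the end.

```python
def pvq_dot(x, pulses, scale):
    """Dot product of real vector x with the vector scale * pulses.

    pulses: integer components (assumed PVQ-style, so sum(|p|) is a small K).
    scale:  one real gain factor shared by all components (an assumption).
    """
    acc = 0.0
    for xi, p in zip(x, pulses):
        # |p| additions or subtractions stand in for the product p * xi.
        for _ in range(abs(p)):
            acc = acc + xi if p > 0 else acc - xi
    # The only multiplication in the whole dot product.
    return scale * acc


x = [0.5, -1.0, 2.0, 0.25]
pulses = [2, -1, 0, 1]  # sum of absolute values K = 4
print(pvq_dot(x, pulses, 0.5))  # 1.125
```

The number of additions and subtractions equals K, the sum of the absolute pulse values, so in this sketch the cost depends on K and not on the precision of the weights.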

A Generic Framework for Interesting Subspace Cluster Detection in Multi-attributed Networks


Title	A Generic Framework for Interesting Subspace Cluster Detection in Multi-attributed Networks
Authors	Feng Chen, Baojian Zhou, Adil Alim, Liang Zhao
Abstract	Detection of interesting (e.g., coherent or anomalous) clusters has been studied extensively on plain or univariate networks, with various applications. Recently, algorithms have been extended to real-world networks with multiple attributes for each node. In a multi-attributed network, a cluster of nodes is often only interesting for a subset (subspace) of attributes, and clusters of this type are called subspace clusters. However, in the current literature, few methods are capable of detecting subspace clusters, which involves concurrent feature selection and network cluster detection. These relevant methods are mostly heuristic-driven and customized for specific application scenarios. In this work, we present a generic and theoretical framework for detection of interesting subspace clusters in large multi-attributed networks. Specifically, we propose a subspace graph-structured matching pursuit algorithm, namely, SG-Pursuit, to address a broad class of such problems for different score functions (e.g., coherence or anomalous functions) and topology constraints (e.g., connected subgraphs and dense subgraphs). We prove that our algorithm 1) runs in nearly-linear time on the network size and the total number of attributes and 2) enjoys rigorous guarantees (geometrical convergence rate and tight error bound) analogous to those of the state-of-the-art algorithms for sparse feature selection problems and subgraph detection problems. As a case study, we specialize SG-Pursuit to optimize a number of well-known score functions for two typical tasks, including detection of coherent dense and anomalous connected subspace clusters in real-world networks. Empirical evidence demonstrates that our proposed generic algorithm SG-Pursuit outperforms state-of-the-art methods that are designed specifically for these two tasks.
Tasks	Feature Selection
Published	2017-09-15
URL	http://arxiv.org/abs/1709.05246v2
PDF	http://arxiv.org/pdf/1709.05246v2.pdf
PWC	https://paperswithcode.com/paper/a-generic-framework-for-interesting-subspace
Repo
Framework

Large-Scale Domain Adaptation via Teacher-Student Learning


Title	Large-Scale Domain Adaptation via Teacher-Student Learning
Authors	Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong
Abstract	High accuracy speech recognition requires a large amount of transcribed data for supervised training. In the absence of such data, domain adaptation of a well-trained acoustic model can be performed, but even here, high accuracy usually requires significant labeled data from the target domain. In this work, we propose an approach to domain adaptation that does not require transcriptions but instead uses a corpus of unlabeled parallel data, consisting of pairs of samples from the source domain of the well-trained model and the desired target domain. To perform adaptation, we employ teacher/student (T/S) learning, in which the posterior probabilities generated by the source-domain model can be used in lieu of labels to train the target-domain model. We evaluate the proposed approach in two scenarios: adapting a clean acoustic model to noisy speech, and adapting an adult speech acoustic model to children's speech. Significant improvements in accuracy are obtained, with reductions in word error rate of up to 44% over the original source model without the need for transcribed data in the target domain. Moreover, we show that increasing the amount of unlabeled data results in additional model robustness, which is particularly beneficial when using simulated training data in the target domain.
Tasks	Domain Adaptation, Speech Recognition
Published	2017-08-17
URL	http://arxiv.org/abs/1708.05466v1
PDF	http://arxiv.org/pdf/1708.05466v1.pdf
PWC	https://paperswithcode.com/paper/large-scale-domain-adaptation-via-teacher
Repo
Framework
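
The teacher/student idea in the abstract above, where teacher posteriors replace labels, can be sketched as a soft-target cross-entropy. This is an illustration under assumptions, not the authors' training code: the function names, the per-frame list layout, and the averaging over frames are all hypothetical choices.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one frame of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def ts_loss(teacher_posteriors, student_logits):
    """Mean cross-entropy between teacher posteriors and student outputs.

    teacher_posteriors: per-frame class probabilities from the source-domain
                        model, computed on source-domain samples.
    student_logits:     per-frame logits from the target-domain model,
                        computed on the paired target-domain samples.
    No transcriptions are used anywhere in this loss.
    """
    total = 0.0
    for p_teacher, logits in zip(teacher_posteriors, student_logits):
        log_q = log_softmax(logits)
        total -= sum(p * lq for p, lq in zip(p_teacher, log_q))
    return total / len(teacher_posteriors)


teacher = [[0.9, 0.1]]
agreeing_student = [[2.0, 0.0]]     # puts most mass on class 0, like the teacher
disagreeing_student = [[0.0, 2.0]]  # puts most mass on class 1
print(ts_loss(teacher, agreeing_student) < ts_loss(teacher, disagreeing_student))
```

Minimizing this quantity pushes the student's output distribution on target-domain input toward the teacher's distribution on the paired source-domain input.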

Learning to Embed Words in Context for Syntactic Tasks


Title	Learning to Embed Words in Context for Syntactic Tasks
Authors	Lifu Tu, Kevin Gimpel, Karen Livescu
Abstract	We present models for embedding words in the context of surrounding words. Such models, which we refer to as token embeddings, represent the characteristics of a word that are specific to a given context, such as word sense, syntactic category, and semantic role. We explore simple, efficient token embedding models based on standard neural network architectures. We learn token embeddings on a large amount of unannotated text and evaluate them as features for part-of-speech taggers and dependency parsers trained on much smaller amounts of annotated data. We find that predictors endowed with token embeddings consistently outperform baseline predictors across a range of context window and training set sizes.
Tasks
Published	2017-06-09
URL	http://arxiv.org/abs/1706.02807v2
PDF	http://arxiv.org/pdf/1706.02807v2.pdf
PWC	https://paperswithcode.com/paper/learning-to-embed-words-in-context-for
Repo
Framework

Simplified Long Short-term Memory Recurrent Neural Networks: part II


Title	Simplified Long Short-term Memory Recurrent Neural Networks: part II
Authors	Atra Akandeh, Fathi M. Salem
Abstract	This is part II of a three-part work. Here, we present a second set of five inter-related variants of simplified Long Short-term Memory (LSTM) recurrent neural networks, obtained by further reducing adaptive parameters. Two of these models have been introduced in part I of this work. We evaluate and verify our model variants on the benchmark MNIST dataset and assert that these models are comparable to the base LSTM model while using progressively fewer parameters. Moreover, we observe that, when using the ReLU activation, the test accuracy of the standard LSTM drops after a number of epochs when the learning parameter becomes larger. However, all of the new model variants sustain their performance.
Tasks
Published	2017-07-14
URL	http://arxiv.org/abs/1707.04623v1
PDF	http://arxiv.org/pdf/1707.04623v1.pdf
PWC	https://paperswithcode.com/paper/simplified-long-short-term-memory-recurrent-2
Repo
Framework

Machine Learning Approaches for Traffic Volume Forecasting: A Case Study of the Moroccan Highway Network


Title	Machine Learning Approaches for Traffic Volume Forecasting: A Case Study of the Moroccan Highway Network
Authors	Abderrahim Khalifa, Younes Idsouguou, Loubna Benabbou, Mourad Zirari
Abstract	In this paper, we illustrate the different approaches we followed while developing a forecasting tool for highway traffic in Morocco. Two main approaches were adopted. The first is statistical analysis, used as a step of data exploration and data wrangling; here, a beta model is carried out for a better understanding of traffic behavior. Next, we moved to machine learning, where we worked with several algorithms such as Random Forest, Artificial Neural Networks, Extra Trees, etc. Since we were convinced that this field of study still calls for state-of-the-art models, we also cover an application of Long Short-Term Memory neural networks.
Tasks
Published	2017-11-18
URL	http://arxiv.org/abs/1711.06779v1
PDF	http://arxiv.org/pdf/1711.06779v1.pdf
PWC	https://paperswithcode.com/paper/machine-learning-approaches-for-traffic
Repo
Framework

Group Recommendations: Axioms, Impossibilities, and Random Walks


Title	Group Recommendations: Axioms, Impossibilities, and Random Walks
Authors	Omer Lev, Moshe Tennenholtz
Abstract	We introduce an axiomatic approach to group recommendations, in line with previous work on the axiomatic treatment of trust-based recommendation systems, ranking systems, and other foundational work on the axiomatic approach to internet mechanisms in social choice settings. In group recommendations we wish to recommend to a group of agents, consisting of both opinionated and undecided members, a joint choice that would be acceptable to them. Such a system has many applications, such as choosing a movie or a restaurant to go to with a group of friends, recommending games for online game players, and other communal activities. Our method utilizes a given social graph to extract information on the undecided, relying on the agents influencing them. We first show that a set of fairly natural desired requirements (a.k.a. axioms) leads to an impossibility, rendering mutual satisfaction of them unreachable. However, we also show a modified set of axioms that fully axiomatize a group variant of the random-walk recommendation system, expanding a previous result from the individual recommendation case.
Tasks	Recommendation Systems
Published	2017-07-27
URL	http://arxiv.org/abs/1707.08755v1
PDF	http://arxiv.org/pdf/1707.08755v1.pdf
PWC	https://paperswithcode.com/paper/group-recommendations-axioms-impossibilities
Repo
Framework

Low Impact Artificial Intelligences


Title	Low Impact Artificial Intelligences
Authors	Stuart Armstrong, Benjamin Levinstein
Abstract	There are many goals for an AI that could become dangerous if the AI becomes superintelligent or otherwise powerful. Much work on the AI control problem has been focused on constructing AI goals that are safe even for such AIs. This paper looks at an alternative approach: defining a general concept of 'low impact'. The aim is to ensure that a powerful AI which implements low impact will not modify the world extensively, even if it is given a simple or dangerous goal. The paper proposes various ways of defining and grounding low impact, and discusses methods for ensuring that the AI can still be allowed to have a (desired) impact despite the restriction. The end of the paper addresses known issues with this approach and avenues for future research.
Tasks
Published	2017-05-30
URL	http://arxiv.org/abs/1705.10720v1
PDF	http://arxiv.org/pdf/1705.10720v1.pdf
PWC	https://paperswithcode.com/paper/low-impact-artificial-intelligences
Repo
Framework

Multi-Objective Software Suite of Two-Dimensional Shape Descriptors for Object-Based Image Analysis


Title	Multi-Objective Software Suite of Two-Dimensional Shape Descriptors for Object-Based Image Analysis
Authors	Andrea Baraldi, João V. B. Soares
Abstract	In recent years two sets of planar (2D) shape attributes, provided with an intuitive physical meaning, were proposed to the remote sensing community by, respectively, Nagao & Matsuyama and Shackelford & Davis in their seminal works on the increasingly popular geographic object based image analysis (GEOBIA) paradigm. These two published sets of intuitive geometric features were selected as initial conditions by the present R&D software project, whose multi-objective goal was to accomplish: (i) a minimally dependent and maximally informative design (knowledge/information representation) of a general purpose, user and application independent dictionary of 2D shape terms provided with a physical meaning intuitive to understand by human end users and (ii) an effective (accurate, scale invariant, easy to use) and efficient implementation of 2D shape descriptors. To comply with the Quality Assurance Framework for Earth Observation guidelines, the proposed suite of geometric functions is validated by means of a novel quantitative quality assurance policy, centered on inter feature dependence (causality) assessment. This innovative multivariate feature validation strategy is alternative to traditional feature selection procedures based on either inductive data learning classification accuracy estimation, which is inherently case specific, or cross correlation estimation, because statistical cross correlation does not imply causation. The project deliverable is an original general purpose software suite of seven validated off the shelf 2D shape descriptors intuitive to use. Alternative to existing commercial or open source software libraries of tens of planar shape functions whose informativeness remains unknown, it is eligible for use in (GE)OBIA systems in operating mode, expected to mimic human reasoning based on a convergence of evidence approach.
Tasks	Feature Selection
Published	2017-01-08
URL	http://arxiv.org/abs/1701.01941v2
PDF	http://arxiv.org/pdf/1701.01941v2.pdf
PWC	https://paperswithcode.com/paper/multi-objective-software-suite-of-two
Repo
Framework

A Diversified Multi-Start Algorithm for Unconstrained Binary Quadratic Problems Leveraging the Graphics Processor Unit


Title	A Diversified Multi-Start Algorithm for Unconstrained Binary Quadratic Problems Leveraging the Graphics Processor Unit
Authors	Mark W. Lewis
Abstract	Multi-start algorithms are a common and effective tool for metaheuristic searches. In this paper we amplify multi-start capabilities by employing the parallel processing power of the graphics processor unit (GPU) to quickly generate a diverse starting set of solutions for the Unconstrained Binary Quadratic Optimization Problem, which are evaluated and used to implement screening methods to select solutions for further optimization. This method is implemented as an initial high-quality solution generation phase prior to a secondary steepest ascent search, and a comparison of results to best known approaches on benchmark unconstrained binary quadratic problems demonstrates that GPU-enabled diversified multi-start with screening quickly yields very good results.
Tasks
Published	2017-05-31
URL	http://arxiv.org/abs/1706.00037v1
PDF	http://arxiv.org/pdf/1706.00037v1.pdf
PWC	https://paperswithcode.com/paper/a-diversified-multi-start-algorithm-for
Repo
Framework

Revisiting L21-norm Robustness with Vector Outlier Regularization


Title	Revisiting L21-norm Robustness with Vector Outlier Regularization
Authors	Bo Jiang, Chris Ding
Abstract	In many real-world applications, data usually contain outliers. One popular approach is to use the L2,1 norm function as a robust error/loss function. However, the robustness of the L2,1 norm function is not well understood so far. In this paper, we propose a new Vector Outlier Regularization (VOR) framework to understand and analyze the robustness of the L2,1 norm function. Our VOR function defines a data point to be an outlier if it is outside a threshold with respect to a theoretical prediction, and regularizes it, i.e., pulls it back to the threshold line. We then prove that the L2,1 function is the limiting case of this VOR with the usual least square/L2 error function as the threshold shrinks to zero. One interesting property of VOR is that how far an outlier lies away from its theoretically predicted value does not affect the final regularization and analysis results. This VOR property unmasks one of the most peculiar properties of the L2,1 norm function: the effects of outliers seem to be independent of how outlying they are. If an outlier is moved further away from the intrinsic manifold/subspace, the final analysis results do not change. VOR provides a new way to understand and analyze the robustness of the L2,1 norm function. Applying VOR to matrix factorization leads to a new VORPCA model. We give a comprehensive comparison with trace-norm based L21-norm PCA to demonstrate the advantages of VORPCA.
Tasks
Published	2017-06-20
URL	https://arxiv.org/abs/1706.06409v2
PDF	https://arxiv.org/pdf/1706.06409v2.pdf
PWC	https://paperswithcode.com/paper/outlier-regularization-for-vector-data-and
Repo
Framework
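
For readers unfamiliar with the L2,1 norm mentioned in the abstract above, the following minimal sketch contrasts it with the usual squared L2 error. It assumes the common convention of one residual vector per data point; it does not implement VOR or VORPCA, and the function names are hypothetical.

```python
import math


def l21_norm(residuals):
    """Sum over data points of the Euclidean norm of each residual vector."""
    return sum(math.sqrt(sum(v * v for v in r)) for r in residuals)


def squared_l2(residuals):
    """Sum over data points of the squared Euclidean norm (least squares)."""
    return sum(sum(v * v for v in r) for r in residuals)


# Three data points; the last residual is twice as long as the first.
residuals = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
print(l21_norm(residuals))    # 15.0  (5 + 0 + 10)
print(squared_l2(residuals))  # 125.0 (25 + 0 + 100)
```

Doubling a residual doubles its L2,1 contribution but quadruples its squared-L2 contribution, which is the usual informal reason the L2,1 norm is described as less sensitive to outliers.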

On the diffusion approximation of nonconvex stochastic gradient descent


Title	On the diffusion approximation of nonconvex stochastic gradient descent
Authors	Wenqing Hu, Chris Junchi Li, Lei Li, Jian-Guo Liu
Abstract	We study the Stochastic Gradient Descent (SGD) method in nonconvex optimization problems from the point of view of approximating diffusion processes. We prove rigorously that the diffusion process can approximate the SGD algorithm weakly using the weak form of master equation for probability evolution. In the small step size regime and the presence of omnidirectional noise, our weak approximating diffusion process suggests the following dynamics for the SGD iteration starting from a local minimizer (resp. saddle point): it escapes in a number of iterations exponentially (resp. almost linearly) dependent on the inverse step size. The results are obtained using the theory for random perturbations of dynamical systems (theory of large deviations for local minimizers and theory of exiting for unstable stationary points). In addition, we discuss the effects of batch size for deep neural networks, and we find that a small batch size is helpful for SGD algorithms to escape unstable stationary points and sharp minimizers. Our theory indicates that one should increase the batch size at a later stage for the SGD to be trapped in flat minimizers for better generalization.
Tasks
Published	2017-05-22
URL	http://arxiv.org/abs/1705.07562v2
PDF	http://arxiv.org/pdf/1705.07562v2.pdf
PWC	https://paperswithcode.com/paper/on-the-diffusion-approximation-of-nonconvex
Repo
Framework

Non-Markovian Control with Gated End-to-End Memory Policy Networks


Title	Non-Markovian Control with Gated End-to-End Memory Policy Networks
Authors	Julien Perez, Tomi Silander
Abstract	Partially observable environments present an important open challenge in the domain of sequential control learning with delayed rewards. Despite numerous attempts during the last two decades, the majority of reinforcement learning algorithms and associated approximate models applied to this context still assume Markovian state transitions. In this paper, we explore the use of a recently proposed attention-based model, the Gated End-to-End Memory Network, for sequential control. We call the resulting model the Gated End-to-End Memory Policy Network. More precisely, we use a model-free value-based algorithm to learn policies for partially observed domains using this memory-enhanced neural network. This model is end-to-end learnable and it features unbounded memory. Indeed, because of its attention mechanism and associated non-parametric memory, the proposed model allows us to define an attention mechanism over the observation stream, unlike recurrent models. We show encouraging results that illustrate the capability of our attention-based model in the context of the continuous-state non-stationary control problem of stock trading. We also present an OpenAI Gym environment for simulated stock exchange and explain its relevance as a benchmark for the field of non-Markovian decision process learning.
Tasks
Published	2017-05-31
URL	http://arxiv.org/abs/1705.10993v1
PDF	http://arxiv.org/pdf/1705.10993v1.pdf
PWC	https://paperswithcode.com/paper/non-markovian-control-with-gated-end-to-end
Repo
Framework

Hyperprofile-based Computation Offloading for Mobile Edge Networks


Title	Hyperprofile-based Computation Offloading for Mobile Edge Networks
Authors	Andrew Crutcher, Caleb Koch, Kyle Coleman, Jon Patman, Flavio Esposito, Prasad Calyam
Abstract	In recent studies, researchers have developed various computation offloading frameworks for bringing cloud services closer to the user via edge networks. Specifically, an edge device needs to offload computationally intensive tasks because of energy and processing constraints. These constraints present the challenge of identifying which edge nodes should receive tasks to reduce overall resource consumption. We propose a unique solution to this problem which incorporates elements from Knowledge-Defined Networking (KDN) to make intelligent predictions about offloading costs based on historical data. Each server instance can be represented in a multidimensional feature space where each dimension corresponds to a predicted metric. We compute features for a "hyperprofile" and position nodes based on the predicted costs of offloading a particular task. We then perform a k-Nearest Neighbor (kNN) query within the hyperprofile to select nodes for offloading computation. This paper formalizes our hyperprofile-based solution and explores the viability of using machine learning (ML) techniques to predict metrics useful for computation offloading. We also investigate the effects of using different distance metrics for the queries. Our results show that various network metrics can be modeled accurately with regression, and that there are circumstances where kNN queries using Euclidean distance, as opposed to rectilinear distance, are more favorable.
Tasks
Published	2017-07-28
URL	http://arxiv.org/abs/1707.09422v1
PDF	http://arxiv.org/pdf/1707.09422v1.pdf
PWC	https://paperswithcode.com/paper/hyperprofile-based-computation-offloading-for
Repo
Framework
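
The kNN query and the choice between Euclidean and rectilinear distance described in the abstract above can be sketched as follows. The node names, the two cost dimensions, and the numbers are hypothetical and chosen only so that the two metrics disagree; the paper's actual features and data are not reproduced here.

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rectilinear(a, b):
    """Rectilinear (L1, Manhattan) distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn(query, nodes, k, distance):
    """Return the names of the k nodes closest to the query point."""
    ranked = sorted(nodes, key=lambda name: distance(query, nodes[name]))
    return ranked[:k]


# Hypothetical predicted offloading costs per node: (latency, energy).
nodes = {"edge-a": (3.0, 3.0), "edge-b": (0.0, 5.0)}
ideal = (0.0, 0.0)  # zero predicted cost in every dimension

print(knn(ideal, nodes, 1, euclidean))    # ['edge-a']
print(knn(ideal, nodes, 1, rectilinear))  # ['edge-b']
```

Here the node with balanced costs wins under Euclidean distance (about 4.24 versus 5), while the node with one large cost wins under rectilinear distance (5 versus 6), so the metric alone changes which node is selected.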

Forecasting Player Behavioral Data and Simulating in-Game Events


Title	Forecasting Player Behavioral Data and Simulating in-Game Events
Authors	Anna Guitart, Pei Pei Chen, Paul Bertens, África Periáñez
Abstract	Understanding player behavior is fundamental in game data science. Video games evolve as players interact with the game, so being able to foresee player experience would help to ensure a successful game development. In particular, game developers need to evaluate beforehand the impact of in-game events. Simulation optimization of these events is crucial to increase player engagement and maximize monetization. We present an experimental analysis of several methods to forecast game-related variables, with two main aims: to obtain accurate predictions of in-app purchases and playtime in an operational production environment, and to perform simulations of in-game events in order to maximize sales and playtime. Our ultimate purpose is to take a step towards the data-driven development of games. The results suggest that, even though the performance of traditional approaches such as ARIMA is still better, the outcomes of state-of-the-art techniques like deep learning are promising. Deep learning comes up as a well-suited general model that could be used to forecast a variety of time series with different dynamic behaviors.
Tasks	Time Series
Published	2017-10-05
URL	http://arxiv.org/abs/1710.01931v1
PDF	http://arxiv.org/pdf/1710.01931v1.pdf
PWC	https://paperswithcode.com/paper/forecasting-player-behavioral-data-and
Repo
Framework