Paper Group ANR 201
Using Machine Learning for Handover Optimization in Vehicular Fog Computing
Title | Using Machine Learning for Handover Optimization in Vehicular Fog Computing |
Authors | Salman Memon, Muthucumaru Maheswaran |
Abstract | Smart mobility management would be an important prerequisite for future fog computing systems. In this research, we propose a learning-based handover optimization for the Internet of Vehicles that would assist the smooth transition of device connections and offloaded tasks between fog nodes. To accomplish this, we make use of machine learning algorithms to learn from vehicle interactions with fog nodes. Our approach uses a three-layer feed-forward neural network to predict the correct fog node at a given location and time with 99.2% accuracy on a test set. We also implement a dual stacked recurrent neural network (RNN) with long short-term memory (LSTM) cells capable of learning the latency, or cost, associated with these service requests. We build a simulation in JAMScript, driven by a dataset of real-world vehicle movements, to generate the data used to train these networks. We further propose using this predictive system in a smarter request routing mechanism to minimize service interruption during handovers between fog nodes and to anticipate areas of low coverage, and we evaluate the models' performance through a series of experiments on a held-out test set. |
Tasks | |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1812.11652v1 |
http://arxiv.org/pdf/1812.11652v1.pdf | |
PWC | https://paperswithcode.com/paper/using-machine-learning-for-handover |
Repo | |
Framework | |
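The fog-node predictor described in the abstract is a plain three-layer feed-forward classifier over candidate fog nodes. A minimal numpy sketch follows; the input features (x, y, time of day), the number of fog nodes, the hidden widths, and the random weights are all illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical dimensions: 3 input features (x, y, time of day), 8 candidate
# fog nodes, 32 hidden units per layer. Random weights stand in for the
# trained parameters.
W1, b1 = rng.normal(size=(3, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 32)), np.zeros(32)
W3, b3 = rng.normal(size=(32, 8)), np.zeros(8)

def predict_fog_node(features):
    """Three-layer feed-forward pass: probabilities over candidate fog nodes."""
    h1 = relu(features @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return softmax(h2 @ W3 + b3)

probs = predict_fog_node(np.array([0.3, 0.7, 0.5]))
best = int(np.argmax(probs))  # index of the predicted fog node
```

A request router could consult `best` ahead of a handover and pre-stage the offloaded task at the predicted node.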
Micro-Browsing Models for Search Snippets
Title | Micro-Browsing Models for Search Snippets |
Authors | Muhammad Asiful Islam, Ramakrishnan Srikant, Sugato Basu |
Abstract | Click-through rate (CTR) is a key signal of relevance for search engine results, both organic and sponsored. CTR of a result has two core components: (a) the probability of examination of a result by a user, and (b) the perceived relevance of the result given that it has been examined by the user. There has been considerable work on user browsing models, to model and analyze both the examination and the relevance components of CTR. In this paper, we propose a novel formulation: a micro-browsing model for how users read result snippets. The snippet text of a result often plays a critical role in the perceived relevance of the result. We study how particular words within a line of a snippet can influence user behavior. We validate this new micro-browsing user model by considering the problem of predicting which snippet will yield higher CTR, and show that classification accuracy is dramatically higher with our micro-browsing user model. The key insight in this paper is that varying relatively few words within a snippet, and even their location within a snippet, can have a significant influence on the click-through of a snippet. |
Tasks | |
Published | 2018-10-18 |
URL | http://arxiv.org/abs/1810.08223v1 |
http://arxiv.org/pdf/1810.08223v1.pdf | |
PWC | https://paperswithcode.com/paper/micro-browsing-models-for-search-snippets |
Repo | |
Framework | |
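The core idea — that the same word matters differently depending on which snippet line carries it — suggests a featurization indexed by (word, line) pairs. The sketch below is one plausible encoding under that assumption; the vocabulary, two-line snippet format, and featurization details are illustrative, not from the paper:

```python
import numpy as np

def snippet_features(snippet_lines, vocab):
    """Word-by-line indicator features: each word gets a distinct feature per
    snippet line it appears on, so a downstream classifier can weight word
    location. Featurization details are assumptions for illustration."""
    f = np.zeros(len(vocab) * len(snippet_lines))
    for li, line in enumerate(snippet_lines):
        for w in line.lower().split():
            if w in vocab:
                f[li * len(vocab) + vocab[w]] = 1.0
    return f

# Two-line snippet over a toy vocabulary; a linear model on these features
# could then be trained to predict which of two snippets draws the higher CTR.
vocab = {"cheap": 0, "flights": 1}
f = snippet_features(["Cheap flights to Paris", "Book now and save"], vocab)
```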
Machine Learning of coarse-grained Molecular Dynamics Force Fields
Title | Machine Learning of coarse-grained Molecular Dynamics Force Fields |
Authors | Jiang Wang, Simon Olsson, Christoph Wehmeyer, Adria Perez, Nicholas E. Charron, Gianni de Fabritiis, Frank Noe, Cecilia Clementi |
Abstract | Atomistic or ab-initio molecular dynamics simulations are widely used to predict thermodynamics and kinetics and relate them to molecular structure. A common approach to go beyond the time- and length-scales accessible with such computationally expensive simulations is the definition of coarse-grained molecular models. Existing coarse-graining approaches define an effective interaction potential to match defined properties of high-resolution models or experimental data. In this paper, we reformulate coarse-graining as a supervised machine learning problem. We use statistical learning theory to decompose the coarse-graining error and cross-validation to select and compare the performance of different models. We introduce CGnets, a deep learning approach that learns coarse-grained free energy functions and can be trained by a force-matching scheme. CGnets maintain all physically relevant invariances and allow one to incorporate prior physics knowledge to avoid sampling of unphysical structures. We show that CGnets can capture all-atom explicit-solvent free energy surfaces with models using only a few coarse-grained beads and no solvent, while classical coarse-graining methods fail to capture crucial features of the free energy surface. Thus, CGnets are able to capture multi-body terms that emerge from the dimensionality reduction. |
Tasks | Dimensionality Reduction |
Published | 2018-12-04 |
URL | http://arxiv.org/abs/1812.01736v3 |
http://arxiv.org/pdf/1812.01736v3.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-of-coarse-grained-molecular |
Repo | |
Framework | |
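Force matching trains the learned free energy so that its negative gradient reproduces reference forces. A minimal sketch of that loss, assuming a toy quadratic free energy in place of the deep network (the finite-difference gradient stands in for the automatic differentiation a real CGnet would use):

```python
import numpy as np

def free_energy(x, theta):
    # Toy "learned" coarse-grained free energy: a quadratic well with a single
    # parameter theta (a real CGnet would be a deep network over CG beads).
    return 0.5 * theta * float(np.sum(x ** 2))

def predicted_forces(x, theta, eps=1e-5):
    # The CG force is the negative gradient of the free energy; central finite
    # differences stand in for automatic differentiation here.
    grad = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d.flat[i] = eps
        grad.flat[i] = (free_energy(x + d, theta) - free_energy(x - d, theta)) / (2 * eps)
    return -grad

def force_matching_loss(configs, ref_forces, theta):
    # Mean squared deviation between predicted CG forces and the reference
    # forces mapped down from the all-atom simulation.
    errs = [np.sum((predicted_forces(x, theta) - f) ** 2)
            for x, f in zip(configs, ref_forces)]
    return float(np.mean(errs))

x = np.array([1.0, 0.0])
f = predicted_forces(x, theta=1.0)  # close to [-1, 0] for the quadratic toy
```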
Attention-Guided Answer Distillation for Machine Reading Comprehension
Title | Attention-Guided Answer Distillation for Machine Reading Comprehension |
Authors | Minghao Hu, Yuxing Peng, Furu Wei, Zhen Huang, Dongsheng Li, Nan Yang, Ming Zhou |
Abstract | Although current reading comprehension systems have achieved significant advances, their promising performance is often obtained at the cost of making an ensemble of numerous models. Moreover, existing approaches are vulnerable to adversarial attacks. This paper tackles these problems by leveraging knowledge distillation, which aims to transfer knowledge from an ensemble model to a single model. We first demonstrate that vanilla knowledge distillation applied to answer span prediction is effective for reading comprehension systems. We then propose two novel approaches that not only penalize the prediction on confusing answers but also guide the training with alignment information distilled from the ensemble. Experiments show that our best student model has only a slight drop of 0.4% F1 on the SQuAD test set compared to the ensemble teacher, while running 12x faster during inference. It even outperforms the teacher on adversarial SQuAD datasets and the NarrativeQA benchmark. |
Tasks | Machine Reading Comprehension, Reading Comprehension |
Published | 2018-08-23 |
URL | http://arxiv.org/abs/1808.07644v4 |
http://arxiv.org/pdf/1808.07644v4.pdf | |
PWC | https://paperswithcode.com/paper/attention-guided-answer-distillation-for |
Repo | |
Framework | |
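The "vanilla knowledge distillation applied to answer span prediction" step can be sketched as a temperature-softened cross-entropy between the student's and the ensemble teacher's position distributions. The logits and temperature below are illustrative; the paper's attention-guided variants add further terms on top of this:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def span_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Vanilla distillation on answer-span logits: cross-entropy of the
    student's temperature-softened position distribution against the
    ensemble teacher's. In practice this is computed for both the start
    and end positions and the two losses are summed."""
    p_teacher = softmax(np.asarray(teacher_logits) / temperature)
    p_student = softmax(np.asarray(student_logits) / temperature)
    return float(-np.sum(p_teacher * np.log(p_student + 1e-12)))

teacher = np.array([1.0, 4.0, 2.0])  # toy logits over token positions
loss_match = span_distillation_loss(teacher, teacher)
loss_off = span_distillation_loss(np.array([4.0, 1.0, 2.0]), teacher)
```

Cross-entropy is minimized when the student matches the teacher, so `loss_match` is the smallest achievable value for this teacher.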
Similar Elements and Metric Labeling on Complete Graphs
Title | Similar Elements and Metric Labeling on Complete Graphs |
Authors | Pedro F. Felzenszwalb |
Abstract | We consider a problem that involves finding similar elements in a collection of sets. The problem is motivated by applications in machine learning and pattern recognition. We formulate the similar elements problem as an optimization problem and give an efficient approximation algorithm that finds a solution within a factor of 2 of the optimum. The similar elements problem is a special case of the metric labeling problem and we also give an efficient 2-approximation algorithm for the metric labeling problem on complete graphs. |
Tasks | |
Published | 2018-03-21 |
URL | http://arxiv.org/abs/1803.08037v2 |
http://arxiv.org/pdf/1803.08037v2.pdf | |
PWC | https://paperswithcode.com/paper/similar-elements-and-metric-labeling-on |
Repo | |
Framework | |
Exact Recovery in the Hypergraph Stochastic Block Model: a Spectral Algorithm
Title | Exact Recovery in the Hypergraph Stochastic Block Model: a Spectral Algorithm |
Authors | Sam Cole, Yizhe Zhu |
Abstract | We consider the exact recovery problem in the hypergraph stochastic block model (HSBM) with $k$ blocks of equal size. More precisely, we consider a random $d$-uniform hypergraph $H$ with $n$ vertices partitioned into $k$ clusters of size $s = n / k$. Hyperedges $e$ are added independently with probability $p$ if $e$ is contained within a single cluster and $q$ otherwise, where $0 \leq q < p \leq 1$. We present a spectral algorithm which recovers the clusters exactly with high probability, given mild conditions on $n, k, p, q$, and $d$. Our algorithm is based on the adjacency matrix of $H$, which is a symmetric $n \times n$ matrix whose $(u, v)$-th entry is the number of hyperedges containing both $u$ and $v$. To the best of our knowledge, our algorithm is the first to guarantee exact recovery when the number of clusters $k=\Theta(\sqrt{n})$. |
Tasks | |
Published | 2018-11-16 |
URL | https://arxiv.org/abs/1811.06931v4 |
https://arxiv.org/pdf/1811.06931v4.pdf | |
PWC | https://paperswithcode.com/paper/exact-recovery-in-the-hypergraph-stochastic |
Repo | |
Framework | |
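The abstract pins down the matrix the algorithm works on: a symmetric n×n adjacency matrix whose (u, v) entry counts the hyperedges containing both u and v. A minimal sketch of that construction plus the eigenvector embedding the spectral step uses (the toy hypergraph is an illustration; the paper's full algorithm also includes clustering and a correction phase omitted here):

```python
import numpy as np
from itertools import combinations

def hypergraph_adjacency(n, hyperedges):
    """A[u, v] = number of hyperedges containing both vertices u and v,
    the matrix the spectral algorithm operates on."""
    A = np.zeros((n, n))
    for e in hyperedges:
        for u, v in combinations(sorted(set(e)), 2):
            A[u, v] += 1
            A[v, u] += 1
    return A

# Toy 3-uniform hypergraph on 4 vertices; the spectral step projects
# vertices onto the top-k eigenvectors before clustering.
A = hypergraph_adjacency(4, [(0, 1, 2), (0, 1, 3), (0, 1, 2)])
vals, vecs = np.linalg.eigh(A)
embedding = vecs[:, -2:]  # top-2 eigenvector embedding for k = 2 clusters
```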
Hardware Trojan Attacks on Neural Networks
Title | Hardware Trojan Attacks on Neural Networks |
Authors | Joseph Clements, Yingjie Lao |
Abstract | With the rising popularity of machine learning and the ever-increasing demand for computational power, there is a growing need for hardware-optimized implementations of neural networks and other machine learning models. As the technology evolves, it is also plausible that machine learning or artificial intelligence will soon become consumer electronic products and military equipment, in the form of well-trained models. Unfortunately, the modern fabless business model of manufacturing hardware, while economical, leads to deficiencies in security through the supply chain. In this paper, we illuminate these security issues by introducing hardware Trojan attacks on neural networks, expanding the current taxonomy of neural network security to incorporate attacks of this nature. To aid in this, we develop a novel framework for inserting malicious hardware Trojans in the implementation of a neural network classifier. We evaluate the capabilities of the adversary in this setting by implementing the attack algorithm on convolutional neural networks while controlling a variety of parameters available to the adversary. Our experimental results show that the proposed algorithm can effectively classify a selected input trigger as a specified class on the MNIST dataset by injecting hardware Trojans into, on average, 0.03% of the neurons in the 5th hidden layer of arbitrary 7-layer convolutional neural networks, while remaining undetectable under the test data. Finally, we discuss potential defenses to protect neural networks against hardware Trojan attacks. |
Tasks | Neural Network Security |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05768v1 |
http://arxiv.org/pdf/1806.05768v1.pdf | |
PWC | https://paperswithcode.com/paper/hardware-trojan-attacks-on-neural-networks |
Repo | |
Framework | |
Shift-based Primitives for Efficient Convolutional Neural Networks
Title | Shift-based Primitives for Efficient Convolutional Neural Networks |
Authors | Huasong Zhong, Xianggen Liu, Yihui He, Yuchun Ma |
Abstract | We propose a collection of three shift-based primitives for building efficient, compact CNN-based networks. These three primitives (channel shift, address shift, shortcut shift) can reduce inference time on a GPU while maintaining prediction accuracy. These shift-based primitives only move a pointer and avoid memory copies, and are therefore very fast. For example, the channel shift operation is 12.7x faster than channel shuffle in ShuffleNet while achieving the same accuracy. The address shift and channel shift can be merged into the point-wise group convolution, invoking only a single kernel call, so spatial convolution and channel shift take little time to perform. Shortcut shift requires no time to realize a residual connection, by allocating space in advance. We blend these shift-based primitives with point-wise group convolution and build two inference-efficient CNN architectures named AddressNet and Enhanced AddressNet. Experiments on the CIFAR-100 and ImageNet datasets show that our models are faster and achieve comparable or better accuracy. |
Tasks | |
Published | 2018-09-22 |
URL | http://arxiv.org/abs/1809.08458v2 |
http://arxiv.org/pdf/1809.08458v2.pdf | |
PWC | https://paperswithcode.com/paper/shift-based-primitives-for-efficient |
Repo | |
Framework | |
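A channel shift can be sketched in numpy as rolling one half of the channels down by one position and the other half up by one. This is only an illustration of the data movement: `np.roll` copies, whereas the paper's point is that on a GPU the same effect is achieved by offsetting the base pointer with no memory copy at all (the half/half split is also an assumption for the sketch):

```python
import numpy as np

def channel_shift(x):
    """Channel-shift sketch on an NCHW tensor: roll the first half of the
    channels down by one and the second half up by one, mixing information
    across adjacent channels. np.roll copies; a GPU kernel would instead
    just adjust the base pointer."""
    half = x.shape[1] // 2
    out = np.empty_like(x)
    out[:, :half] = np.roll(x[:, :half], 1, axis=1)
    out[:, half:] = np.roll(x[:, half:], -1, axis=1)
    return out

x = np.arange(4, dtype=float).reshape(1, 4, 1, 1)  # (batch, channels, H, W)
y = channel_shift(x)
```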
Designing Adaptive Neural Networks for Energy-Constrained Image Classification
Title | Designing Adaptive Neural Networks for Energy-Constrained Image Classification |
Authors | Dimitrios Stamoulis, Ting-Wu Chin, Anand Krishnan Prakash, Haocheng Fang, Sribhuvan Sajja, Mitchell Bognar, Diana Marculescu |
Abstract | As convolutional neural networks (CNNs) enable state-of-the-art computer vision applications, their high energy consumption has emerged as a key impediment to their deployment on embedded and mobile devices. Towards efficient image classification under hardware constraints, prior work has proposed adaptive CNNs, i.e., systems of networks with different accuracy and computation characteristics, where a selection scheme adaptively selects the network to be evaluated for each input image. While previous efforts have investigated different network selection schemes, we find that they do not necessarily result in energy savings when deployed on mobile systems. The key limitation of existing methods is that they learn only how data should be processed among the CNNs and not the network architectures, with each network being treated as a blackbox. To address this limitation, we pursue a more powerful design paradigm where the architecture settings of the CNNs are treated as hyper-parameters to be globally optimized. We cast the design of adaptive CNNs as a hyper-parameter optimization problem with respect to energy, accuracy, and communication constraints imposed by the mobile device. To efficiently solve this problem, we adapt Bayesian optimization to the properties of the design space, reaching near-optimal configurations in a few tens of function evaluations. Our method reduces the energy consumed for image classification on a mobile device by up to 6x, compared to the best previously published work that uses CNNs as blackboxes. Finally, we evaluate two image classification practices, i.e., classifying all images locally versus over the cloud under energy and communication constraints. |
Tasks | Image Classification |
Published | 2018-08-05 |
URL | http://arxiv.org/abs/1808.01550v2 |
http://arxiv.org/pdf/1808.01550v2.pdf | |
PWC | https://paperswithcode.com/paper/designing-adaptive-neural-networks-for-energy |
Repo | |
Framework | |
Optimal Matrix Momentum Stochastic Approximation and Applications to Q-learning
Title | Optimal Matrix Momentum Stochastic Approximation and Applications to Q-learning |
Authors | Adithya M. Devraj, Ana Bušić, Sean Meyn |
Abstract | Acceleration is an increasingly common theme in the stochastic optimization literature. The two most common examples are Nesterov's method and Polyak's momentum technique. In this paper two new algorithms are introduced for root-finding problems: 1) PolSA is a root-finding algorithm with specially designed matrix momentum, and 2) NeSA can be regarded as a variant of Nesterov's algorithm, or a simplification of PolSA. The PolSA algorithm is new even in the context of optimization (when cast as a root-finding problem). The research surveyed in this paper is motivated by applications to reinforcement learning. It is well known that most variants of TD- and Q-learning may be cast as SA (stochastic approximation) algorithms, and the tools from general SA theory can be used to investigate convergence and bounds on convergence rate. In particular, the asymptotic variance is a common metric of performance for SA algorithms, and is also one among many metrics used in assessing the performance of stochastic optimization algorithms. There are two well-known SA techniques that are known to have optimal asymptotic variance: the Ruppert-Polyak averaging technique, and stochastic Newton-Raphson (SNR). The former algorithm can have extremely bad transient performance, and the latter can be computationally expensive. It is demonstrated here that parameter estimates from the new PolSA algorithm couple with those of the ideal (but more complex) SNR algorithm. The new algorithm is thus a third approach to obtain optimal asymptotic covariance. These strong results require assumptions on the model. A linearized model is considered, and the noise is assumed to be a martingale difference sequence. Numerical results are obtained in a non-linear setting that is the motivation for this work: in PolSA implementations of Q-learning it is observed that coupling occurs with SNR in this non-ideal setting. |
Tasks | Q-Learning, Stochastic Optimization |
Published | 2018-09-17 |
URL | http://arxiv.org/abs/1809.06277v2 |
http://arxiv.org/pdf/1809.06277v2.pdf | |
PWC | https://paperswithcode.com/paper/optimal-matrix-momentum-stochastic |
Repo | |
Framework | |
Regularized Maximum Likelihood Estimation and Feature Selection in Mixtures-of-Experts Models
Title | Regularized Maximum Likelihood Estimation and Feature Selection in Mixtures-of-Experts Models |
Authors | Faicel Chamroukhi, Bao-Tuyen Huynh |
Abstract | Mixture-of-Experts (MoE) models are successful for modeling heterogeneous data in many statistical learning problems, including regression, clustering and classification. Although generally fitted by maximum likelihood estimation via the well-known EM algorithm, their application to high-dimensional problems remains challenging. We consider the problem of fitting and feature selection in MoE models, and propose a regularized maximum likelihood estimation approach that encourages sparse solutions for heterogeneous regression data models with potentially high-dimensional predictors. Unlike state-of-the-art regularized MLE for MoE, the proposed modeling does not require an approximation of the penalty function. We develop two hybrid EM algorithms: an Expectation-Majorization-Maximization (EM/MM) algorithm, and an EM algorithm with a coordinate ascent step. The proposed algorithms allow sparse solutions to be obtained automatically without thresholding, and avoid matrix inversion through univariate parameter updates. An experimental study shows the good performance of the algorithms in terms of recovering the actual sparse solutions, parameter estimation, and clustering of heterogeneous regression data. |
Tasks | Feature Selection |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12161v1 |
http://arxiv.org/pdf/1810.12161v1.pdf | |
PWC | https://paperswithcode.com/paper/regularized-maximum-likelihood-estimation-and |
Repo | |
Framework | |
Solving for high dimensional committor functions using artificial neural networks
Title | Solving for high dimensional committor functions using artificial neural networks |
Authors | Yuehaw Khoo, Jianfeng Lu, Lexing Ying |
Abstract | In this note we propose a method based on artificial neural networks to study the transition between states governed by stochastic processes. In particular, we aim for numerical schemes for the committor function, the central object of transition path theory, which satisfies a high-dimensional Fokker-Planck equation. By working with the variational formulation of this partial differential equation and parameterizing the committor function in terms of a neural network, approximations can be obtained by optimizing the neural network weights using stochastic algorithms. The numerical examples show that moderate accuracy can be achieved for high-dimensional problems. |
Tasks | |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1802.10275v1 |
http://arxiv.org/pdf/1802.10275v1.pdf | |
PWC | https://paperswithcode.com/paper/solving-for-high-dimensional-committor |
Repo | |
Framework | |
Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence
Title | Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence |
Authors | David Manheim |
Abstract | An important challenge for safety in machine learning and artificial intelligence systems is a set of related failures involving specification gaming, reward hacking, fragility to distributional shifts, and Goodhart’s or Campbell’s law. This paper presents additional failure modes for interactions within multi-agent systems that are closely related. These multi-agent failure modes are more complex, more problematic, and less well understood than the single-agent case, and are also already occurring, largely unnoticed. After motivating the discussion with examples from poker-playing artificial intelligence (AI), the paper explains why these failure modes are in some senses unavoidable. Following this, the paper categorizes failure modes, provides definitions, and cites examples for each of the modes: accidental steering, coordination failures, adversarial misalignment, input spoofing and filtering, and goal co-option or direct hacking. The paper then discusses how extant literature on multi-agent AI fails to address these failure modes, and identifies work which may be useful for the mitigation of these failure modes. |
Tasks | |
Published | 2018-10-16 |
URL | http://arxiv.org/abs/1810.10862v4 |
http://arxiv.org/pdf/1810.10862v4.pdf | |
PWC | https://paperswithcode.com/paper/multiparty-dynamics-and-failure-modes-for |
Repo | |
Framework | |
Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning
Title | Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning |
Authors | Vladimir Feinberg, Alvin Wan, Ion Stoica, Michael I. Jordan, Joseph E. Gonzalez, Sergey Levine |
Abstract | Recent model-free reinforcement learning algorithms have proposed incorporating learned dynamics models as a source of additional data with the intention of reducing sample complexity. Such methods hold the promise of incorporating imagined data coupled with a notion of model uncertainty to accelerate the learning of continuous control tasks. Unfortunately, they rely on heuristics that limit usage of the dynamics model. We present model-based value expansion, which controls for uncertainty in the model by only allowing imagination to a fixed depth. By enabling wider use of learned dynamics models within a model-free reinforcement learning algorithm, we improve value estimation, which, in turn, reduces the sample complexity of learning. |
Tasks | Continuous Control |
Published | 2018-02-28 |
URL | http://arxiv.org/abs/1803.00101v1 |
http://arxiv.org/pdf/1803.00101v1.pdf | |
PWC | https://paperswithcode.com/paper/model-based-value-estimation-for-efficient |
Repo | |
Framework | |
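The fixed-depth imagination described above amounts to an H-step value target: accumulate discounted rewards along an imagined rollout of the learned dynamics model, then bootstrap with the learned value function. A minimal sketch, where the callable arguments are placeholders for the learned components (their exact interfaces are assumptions):

```python
def mve_target(state, policy, dynamics, reward, value, horizon, gamma=0.99):
    """Model-based value expansion target: H imagined steps of reward from
    the learned dynamics model, then a value-function bootstrap at depth H.
    policy/dynamics/reward/value are placeholders for learned components."""
    ret, discount, s = 0.0, 1.0, state
    for _ in range(horizon):
        a = policy(s)
        ret += discount * reward(s, a)
        discount *= gamma
        s = dynamics(s, a)
    return ret + discount * value(s)

# Toy deterministic check: unit reward each step, terminal value 10,
# horizon 2, gamma 0.5 -> 1 + 0.5 + 0.25 * 10 = 4.0
target = mve_target(
    state=0.0,
    policy=lambda s: 0,
    dynamics=lambda s, a: s + 1,
    reward=lambda s, a: 1.0,
    value=lambda s: 10.0,
    horizon=2,
    gamma=0.5,
)
```

Setting `horizon=0` recovers the ordinary model-free bootstrap target, which is what makes the depth an explicit knob for controlling model uncertainty.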
Learning Rules-First Classifiers
Title | Learning Rules-First Classifiers |
Authors | Deborah Cohen, Amit Daniely, Amir Globerson, Gal Elidan |
Abstract | Complex classifiers may exhibit “embarrassing” failures in cases where humans can easily provide a justified classification. Avoiding such failures is obviously of key importance. In this work, we focus on one such setting, where a label is perfectly predictable if the input contains certain features, or rules, and otherwise it is predictable by a linear classifier. We define a hypothesis class that captures this notion and determine its sample complexity. We also give evidence that efficient algorithms cannot achieve this sample complexity. We then derive a simple and efficient algorithm and show that its sample complexity is close to optimal, among efficient algorithms. Experiments on synthetic and sentiment analysis data demonstrate the efficacy of the method, both in terms of accuracy and interpretability. |
Tasks | Sentiment Analysis |
Published | 2018-03-08 |
URL | https://arxiv.org/abs/1803.03155v4 |
https://arxiv.org/pdf/1803.03155v4.pdf | |
PWC | https://paperswithcode.com/paper/learning-rules-first-classifiers |
Repo | |
Framework | |
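The hypothesis class in the abstract — a label fixed by rule features when present, a linear classifier otherwise — has a very direct prediction rule. A minimal sketch, where the binary featurization and the (feature index, label) rule format are illustrative assumptions:

```python
import numpy as np

def rules_first_predict(x, rules, w, b):
    """Rules-first prediction: if the input contains any rule feature,
    return that rule's label; otherwise fall back to a linear classifier.
    The rule format (feature_idx, label) is an illustrative assumption."""
    for feature_idx, label in rules:
        if x[feature_idx] != 0:
            return label
    return 1 if float(np.dot(w, x) + b) >= 0.0 else -1

rules = [(0, 1)]                   # "if feature 0 is present, the label is +1"
w, b = np.array([0.0, -1.0]), 0.0
y_rule = rules_first_predict(np.array([1.0, 5.0]), rules, w, b)    # rule fires
y_linear = rules_first_predict(np.array([0.0, 5.0]), rules, w, b)  # fallback
```

The learning problem the paper studies is recovering both the rule set and the linear fallback from data; the sketch only shows the resulting predictor.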