Paper Group ANR 574
Agent based Tools for Modeling and Simulation of Self-Organization in Peer-to-Peer, Ad-Hoc and other Complex Networks. Accumulated Gradient Normalization. Don’t relax: early stopping for convex regularization. Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer. LinXGBoost: Extension of XGBoost to Ge …
Agent based Tools for Modeling and Simulation of Self-Organization in Peer-to-Peer, Ad-Hoc and other Complex Networks
Title | Agent based Tools for Modeling and Simulation of Self-Organization in Peer-to-Peer, Ad-Hoc and other Complex Networks |
Authors | Muaz A. Niazi, Amir Hussain |
Abstract | Agent-based modeling and simulation tools provide a mature platform for the development of complex simulations. However, they have not been applied much in mainstream modeling and simulation of computer networks. In this article, we evaluate whether, and how, these tools can add value in the modeling and simulation of complex networks such as pervasive computing, large-scale peer-to-peer systems, and networks involving considerable environment and human/animal/habitat interaction. Specifically, we demonstrate the effectiveness of NetLogo, a tool that has been widely used in the area of agent-based social simulation. |
Tasks | |
Published | 2017-08-04 |
URL | http://arxiv.org/abs/1708.01599v1 |
PDF | http://arxiv.org/pdf/1708.01599v1.pdf |
PWC | https://paperswithcode.com/paper/agent-based-tools-for-modeling-and-simulation |
Repo | |
Framework | |
Accumulated Gradient Normalization
Title | Accumulated Gradient Normalization |
Authors | Joeri Hermans, Gerasimos Spanakis, Rico Möckel |
Abstract | This work addresses the instability in asynchronous data parallel optimization. It does so by introducing a novel distributed optimizer which is able to efficiently optimize a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. This implies that the magnitude of a worker delta is smaller compared to an accumulated gradient, and provides a better direction towards a minimum compared to first-order gradients. In turn, it also forces possible implicit momentum fluctuations to be more aligned, since we make the assumption that all workers contribute towards a single minimum. As a result, our approach mitigates the parameter staleness problem more effectively, since staleness in asynchrony induces (implicit) momentum, and achieves a better convergence rate than other optimizers such as asynchronous EASGD and DynSGD, as we show empirically. |
Tasks | |
Published | 2017-10-06 |
URL | http://arxiv.org/abs/1710.02368v1 |
PDF | http://arxiv.org/pdf/1710.02368v1.pdf |
PWC | https://paperswithcode.com/paper/accumulated-gradient-normalization |
Repo | |
Framework | |
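The core mechanism in the abstract, pushing an accumulated and normalized sequence of first-order gradients to a parameter server, can be sketched in a few lines. The sketch below is a hedged illustration rather than the authors' implementation; `grad_fn`, the batch iterator, and the server-side application of the delta are assumed names.

```python
import numpy as np

def agn_worker_round(theta, grad_fn, data_batches, eta=0.01, lam=8):
    """Sketch of one Accumulated Gradient Normalization worker round:
    take `lam` local SGD steps from the last pulled parameters, then
    push the accumulated displacement normalized by the number of steps."""
    theta_local = theta.copy()
    for batch in data_batches[:lam]:
        theta_local -= eta * grad_fn(theta_local, batch)
    # Normalized accumulated gradient: its magnitude is smaller than the raw
    # accumulated delta, while still pointing in an averaged descent direction.
    delta = (theta_local - theta) / lam
    return delta  # a parameter server would then apply: theta += delta
```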
Don’t relax: early stopping for convex regularization
Title | Don’t relax: early stopping for convex regularization |
Authors | Simon Matet, Lorenzo Rosasco, Silvia Villa, Bang Long Vu |
Abstract | We consider the problem of designing efficient regularization algorithms when regularization is encoded by a (strongly) convex functional. Unlike classical penalization methods based on a relaxation approach, we propose an iterative method where regularization is achieved via early stopping. Our results show that the proposed procedure achieves the same recovery accuracy as penalization methods, while naturally integrating computational considerations. An empirical analysis on a number of problems provides promising results with respect to the state of the art. |
Tasks | |
Published | 2017-07-18 |
URL | http://arxiv.org/abs/1707.05422v1 |
PDF | http://arxiv.org/pdf/1707.05422v1.pdf |
PWC | https://paperswithcode.com/paper/dont-relax-early-stopping-for-convex |
Repo | |
Framework | |
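A minimal sketch of the general idea of regularization via early stopping, using plain gradient descent on the un-penalized least-squares loss and a held-out validation set as the stopping criterion; the paper's actual algorithms and analysis cover general (strongly) convex regularizers.

```python
import numpy as np

def early_stopped_least_squares(X_tr, y_tr, X_val, y_val, n_iter=500):
    """Regularization by early stopping (sketch): the iteration count,
    selected by validation error, plays the role of the penalty weight."""
    n, d = X_tr.shape
    w = np.zeros(d)
    lipschitz = np.linalg.norm(X_tr, 2) ** 2 / n   # Lipschitz constant of the LS gradient
    step = 1.0 / lipschitz
    best_w, best_err = w.copy(), np.inf
    for _ in range(n_iter):
        w -= step * X_tr.T @ (X_tr @ w - y_tr) / n
        err = np.mean((X_val @ w - y_val) ** 2)
        if err < best_err:                          # early-stopping rule
            best_w, best_err = w.copy(), err
    return best_w
```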
Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer
Title | Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer |
Authors | David Isele, Mohammad Rostami, Eric Eaton |
Abstract | Knowledge transfer between tasks can improve the performance of learned models, but requires an accurate estimate of the inter-task relationships to identify the relevant knowledge to transfer. These inter-task relationships are typically estimated based on training data for each task, which is inefficient in lifelong learning settings where the goal is to learn each consecutive task rapidly from as little data as possible. To reduce this burden, we develop a lifelong learning method based on coupled dictionary learning that utilizes high-level task descriptions to model the inter-task relationships. We show that using task descriptors improves the performance of the learned task policies, providing both theoretical justification for the benefit and empirical demonstration of the improvement across a variety of learning problems. Given only the descriptor for a new task, the lifelong learner is also able to accurately predict a model for the new task through zero-shot learning using the coupled dictionary, eliminating the need to gather training data before addressing the task. |
Tasks | Dictionary Learning, Transfer Learning, Zero-Shot Learning |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03850v1 |
PDF | http://arxiv.org/pdf/1710.03850v1.pdf |
PWC | https://paperswithcode.com/paper/using-task-descriptions-in-lifelong-machine |
Repo | |
Framework | |
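The zero-shot step described in the abstract, predicting a new task's model from its descriptor alone through a coupled dictionary, can be sketched as follows. The dictionary shapes and the use of scikit-learn's `sparse_encode` are assumptions for illustration, not the paper's exact solver.

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def zero_shot_task_model(phi_new, D, L, alpha=0.1):
    """Coupled-dictionary zero-shot sketch: find a sparse code for the new
    task's descriptor phi_new over the descriptor dictionary D, then reuse
    that same code with the model dictionary L to predict the task's
    parameter vector, without any training data for the new task.
    Assumed shapes: phi_new (m,), D (k, m), L (k, d)."""
    s = sparse_encode(phi_new.reshape(1, -1), D, alpha=alpha)   # (1, k) sparse code
    theta_pred = s @ L                                          # (1, d) predicted model
    return theta_pred.ravel()
```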
LinXGBoost: Extension of XGBoost to Generalized Local Linear Models
Title | LinXGBoost: Extension of XGBoost to Generalized Local Linear Models |
Authors | Laurent de Vito |
Abstract | XGBoost is often presented as the algorithm that wins every ML competition. Surprisingly, this is true even though its predictions are piecewise constant. This might be justified in high-dimensional input spaces, but when the number of features is low, a piecewise linear model is likely to perform better. We extend XGBoost into LinXGBoost, which stores a linear model at each leaf. This extension, equivalent to piecewise regularized least-squares, is particularly attractive for the regression of functions that exhibit jumps or discontinuities, which are notoriously hard to regress. Our extension is compared to vanilla XGBoost and random forests in experiments on both synthetic and real-world data sets. |
Tasks | |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03634v1 |
PDF | http://arxiv.org/pdf/1710.03634v1.pdf |
PWC | https://paperswithcode.com/paper/linxgboost-extension-of-xgboost-to |
Repo | |
Framework | |
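A much-simplified illustration of the "linear model at each leaf" idea: a single tree with per-leaf ridge models, rather than LinXGBoost's boosted, regularized piecewise least-squares objective.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge

class PiecewiseLinearTree:
    """Sketch of a tree whose leaves hold linear models instead of constants
    (an illustration of the idea, not the paper's exact formulation)."""
    def __init__(self, max_depth=3, alpha=1.0):
        self.tree = DecisionTreeRegressor(max_depth=max_depth)
        self.alpha = alpha
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)              # leaf index of each training sample
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.leaf_models[leaf] = Ridge(alpha=self.alpha).fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        y_hat = np.empty(len(X))
        for leaf, model in self.leaf_models.items():
            mask = leaves == leaf
            if mask.any():
                y_hat[mask] = model.predict(X[mask])
        return y_hat
```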
Memory Efficient Max Flow for Multi-label Submodular MRFs
Title | Memory Efficient Max Flow for Multi-label Submodular MRFs |
Authors | Thalaiyasingam Ajanthan, Richard Hartley, Mathieu Salzmann |
Abstract | Multi-label submodular Markov Random Fields (MRFs) have been shown to be solvable using max-flow based on an encoding of the labels proposed by Ishikawa, in which each variable $X_i$ is represented by $\ell$ nodes (where $\ell$ is the number of labels) arranged in a column. However, this method in general requires $2\,\ell^2$ edges for each pair of neighbouring variables. This makes it inapplicable to realistic problems with many variables and labels, due to excessive memory requirement. In this paper, we introduce a variant of the max-flow algorithm that requires much less storage. Consequently, our algorithm makes it possible to optimally solve multi-label submodular problems involving large numbers of variables and labels on a standard computer. |
Tasks | |
Published | 2017-02-20 |
URL | http://arxiv.org/abs/1702.05888v1 |
PDF | http://arxiv.org/pdf/1702.05888v1.pdf |
PWC | https://paperswithcode.com/paper/memory-efficient-max-flow-for-multi-label |
Repo | |
Framework | |
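A quick back-of-the-envelope calculation, using only the quantities stated in the abstract ($\ell$ nodes per variable, roughly $2\,\ell^2$ edges per neighbouring pair), shows why the standard encoding becomes infeasible. The bytes-per-edge figure and the example problem size are assumptions for illustration.

```python
def ishikawa_memory(n_vars, n_pairs, n_labels, bytes_per_edge=16):
    """Back-of-the-envelope cost of the standard Ishikawa encoding:
    n_labels nodes per variable and ~2 * n_labels**2 edges per pair."""
    nodes = n_vars * n_labels
    edges = n_pairs * 2 * n_labels ** 2
    return nodes, edges, edges * bytes_per_edge / 1e9   # nodes, edges, GB

# e.g. a 1-megapixel 4-connected grid with 256 labels (a stereo-like setting)
nodes, edges, gb = ishikawa_memory(n_vars=10**6, n_pairs=2 * 10**6, n_labels=256)
print(f"{nodes:.2e} nodes, {edges:.2e} edges, ~{gb:.0f} GB just for edges")
```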
Overcoming Catastrophic Interference by Conceptors
Title | Overcoming Catastrophic Interference by Conceptors |
Authors | Xu He, Herbert Jaeger |
Abstract | Catastrophic interference has been a major roadblock in the research of continual learning. Here we propose a variant of the back-propagation algorithm, “conceptor-aided back-prop” (CAB), in which gradients are shielded by conceptors against degradation of previously learned tasks. Conceptors have their origin in reservoir computing, where they have been previously shown to overcome catastrophic forgetting. CAB extends these results to deep feedforward networks. On the disjoint MNIST task CAB outperforms two other methods for coping with catastrophic interference that have recently been proposed in the deep learning field. |
Tasks | Continual Learning |
Published | 2017-07-16 |
URL | http://arxiv.org/abs/1707.04853v2 |
PDF | http://arxiv.org/pdf/1707.04853v2.pdf |
PWC | https://paperswithcode.com/paper/overcoming-catastrophic-interference-by |
Repo | |
Framework | |
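A rough sketch of the conceptor idea underlying CAB: build a conceptor from activations recorded on previous tasks, then multiply new gradients by the complementary "free" matrix F = I - C so that directions already in use are shielded. The aperture value and shapes are illustrative assumptions, and the exact CAB update rule differs in detail.

```python
import numpy as np

def conceptor(activations, aperture=10.0):
    """Conceptor matrix C = R (R + aperture**-2 I)^-1 computed from a batch
    of layer activations (rows = samples, columns = units)."""
    n, d = activations.shape
    R = activations.T @ activations / n
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def shielded_gradient(grad_W, C_used):
    """Shielding sketch: project the weight gradient onto the subspace NOT
    occupied by previous tasks, i.e. multiply by the free conceptor I - C.
    grad_W has shape (d_out, d_in), C_used has shape (d_in, d_in)."""
    F = np.eye(C_used.shape[0]) - C_used
    return grad_W @ F
```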
Chain-NN: An Energy-Efficient 1D Chain Architecture for Accelerating Deep Convolutional Neural Networks
Title | Chain-NN: An Energy-Efficient 1D Chain Architecture for Accelerating Deep Convolutional Neural Networks |
Authors | Shihao Wang, Dajiang Zhou, Xushen Han, Takeshi Yoshimura |
Abstract | Deep convolutional neural networks (CNNs) have shown good performance in many computer vision tasks. However, their high computational complexity involves a huge amount of data movement between the processor core and the memory hierarchy, which accounts for the majority of the power consumption. This paper presents Chain-NN, a novel energy-efficient 1D chain architecture for accelerating deep CNNs. Chain-NN consists of dedicated dual-channel processing engines (PEs). In Chain-NN, convolutions are performed by 1D systolic primitives composed of groups of adjacent PEs. These systolic primitives, together with the proposed column-wise scan input pattern, can fully reuse input operands to reduce the memory bandwidth requirement and save energy. Moreover, the 1D chain architecture allows the systolic primitives to be easily reconfigured for specific CNN parameters with low design complexity. Chain-NN is synthesized and laid out in a TSMC 28nm process, costing 3751k logic gates and 352KB of on-chip memory. The results show that a 576-PE Chain-NN can run at up to 700MHz, achieving a peak throughput of 806.4GOPS at 567.5mW and accelerating the five convolutional layers of AlexNet at a frame rate of 326.2fps. Its power efficiency of 1421.0GOPS/W is at least 2.5x to 4.1x better than state-of-the-art works. |
Tasks | |
Published | 2017-03-04 |
URL | http://arxiv.org/abs/1703.01457v1 |
PDF | http://arxiv.org/pdf/1703.01457v1.pdf |
PWC | https://paperswithcode.com/paper/chain-nn-an-energy-efficient-1d-chain |
Repo | |
Framework | |
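The headline numbers quoted in the abstract are internally consistent, as a quick check shows. The assumption that each dual-channel PE delivers 2 operations per cycle is ours, chosen because it reproduces the quoted peak exactly.

```python
# Consistency check of the Chain-NN throughput and efficiency figures.
pes, freq_hz, ops_per_pe_cycle = 576, 700e6, 2      # 2 ops/PE/cycle is an assumption
peak_gops = pes * freq_hz * ops_per_pe_cycle / 1e9
power_w = 567.5e-3

print(peak_gops)             # 806.4 GOPS, matching the abstract
print(peak_gops / power_w)   # ~1421 GOPS/W, matching the abstract
```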
A Fully Trainable Network with RNN-based Pooling
Title | A Fully Trainable Network with RNN-based Pooling |
Authors | Shuai Li, Wanqing Li, Chris Cook, Ce Zhu, Yanbo Gao |
Abstract | Pooling is an important component in convolutional neural networks (CNNs) for aggregating features and reducing computational burden. Compared with other components such as convolutional layers and fully connected layers which are completely learned from data, the pooling component is still handcrafted such as max pooling and average pooling. This paper proposes a learnable pooling function using recurrent neural networks (RNN) so that the pooling can be fully adapted to data and other components of the network, leading to an improved performance. Such a network with learnable pooling function is referred to as a fully trainable network (FTN). Experimental results have demonstrated that the proposed RNN-based pooling can well approximate the existing pooling functions and improve the performance of the network. Especially for small networks, the proposed FTN can improve the performance by seven percentage points in terms of error rate on the CIFAR-10 dataset compared with the traditional CNN. |
Tasks | |
Published | 2017-06-16 |
URL | http://arxiv.org/abs/1706.05157v1 |
PDF | http://arxiv.org/pdf/1706.05157v1.pdf |
PWC | https://paperswithcode.com/paper/a-fully-trainable-network-with-rnn-based |
Repo | |
Framework | |
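An illustrative learnable pooling layer in the spirit of the abstract: a small GRU reads each pooling window and its final hidden state replaces the max/average statistic. This is a hedged sketch (PyTorch, made-up sizes), not the authors' exact FTN design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNPool2d(nn.Module):
    """Learnable pooling sketch: each k x k window is fed, element by
    element, to a GRU and the final hidden state is the pooled value."""
    def __init__(self, kernel_size=2, stride=2, hidden_size=1):
        super().__init__()
        self.k, self.s = kernel_size, stride
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)

    def forward(self, x):                                 # x: (N, C, H, W)
        n, c, h, w = x.shape
        cols = F.unfold(x, self.k, stride=self.s)         # (N, C*k*k, L)
        L = cols.shape[-1]
        cols = cols.reshape(n, c, self.k * self.k, L)     # (N, C, k*k, L)
        seq = cols.permute(0, 1, 3, 2).reshape(n * c * L, self.k * self.k, 1)
        _, h_last = self.rnn(seq)                         # (1, N*C*L, hidden)
        out = h_last[-1].mean(dim=-1)                     # average hidden units
        h_out = (h - self.k) // self.s + 1
        w_out = (w - self.k) // self.s + 1
        return out.reshape(n, c, h_out, w_out)

# pool = RNNPool2d(); pool(torch.randn(4, 16, 8, 8)).shape  ->  (4, 16, 4, 4)
```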
Transfer Learning across Low-Resource, Related Languages for Neural Machine Translation
Title | Transfer Learning across Low-Resource, Related Languages for Neural Machine Translation |
Authors | Toan Q. Nguyen, David Chiang |
Abstract | We present a simple method to improve neural translation of a low-resource language pair using parallel data from a related, also low-resource, language pair. The method is based on the transfer method of Zoph et al., but whereas their method ignores any source vocabulary overlap, ours exploits it. First, we split words using Byte Pair Encoding (BPE) to increase vocabulary overlap. Then, we train a model on the first language pair and transfer its parameters, including its source word embeddings, to another model and continue training on the second language pair. Our experiments show that transfer learning helps word-based translation only slightly, but when used on top of a much stronger BPE baseline, it yields larger improvements of up to 4.3 BLEU. |
Tasks | Machine Translation, Transfer Learning, Word Embeddings |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1708.09803v2 |
PDF | http://arxiv.org/pdf/1708.09803v2.pdf |
PWC | https://paperswithcode.com/paper/transfer-learning-across-low-resource-related |
Repo | |
Framework | |
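The transfer step itself is simple to picture: because both language pairs share one BPE vocabulary, the parent model's parameters, source embeddings included, can warm-start the child model before training continues on the second pair. A toy sketch with made-up sizes (not the authors' code):

```python
import torch
import torch.nn as nn

# Both pairs share a single BPE vocabulary, so the parent's source embeddings
# can be copied verbatim into the child model; the vocabulary overlap created
# by BPE is what makes this transfer effective.
vocab_size, emb_dim = 8000, 512
parent_src_emb = nn.Embedding(vocab_size, emb_dim)   # trained on the first pair
child_src_emb = nn.Embedding(vocab_size, emb_dim)    # to be trained on the second pair

with torch.no_grad():
    child_src_emb.weight.copy_(parent_src_emb.weight)   # warm start
# ...then continue normal NMT training on the second (child) language pair.
```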
Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
Title | Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition |
Authors | Kalin Stefanov, Jonas Beskow, Giampiero Salvi |
Abstract | This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their face. Furthermore, the method does not rely on external annotations, thus complying with cognitive development. Instead, the method uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker dependent setting. However, in a speaker independent setting the proposed method yields a significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions. |
Tasks | Language Acquisition |
Published | 2017-11-24 |
URL | https://arxiv.org/abs/1711.08992v2 |
PDF | https://arxiv.org/pdf/1711.08992v2.pdf |
PWC | https://paperswithcode.com/paper/self-supervised-vision-based-detection-of-the |
Repo | |
Framework | |
Encrypted accelerated least squares regression
Title | Encrypted accelerated least squares regression |
Authors | Pedro M. Esperança, Louis J. M. Aslett, Chris C. Holmes |
Abstract | Information that is stored in an encrypted format is, by definition, usually not amenable to statistical analysis or machine learning methods. In this paper we present detailed analysis of coordinate and accelerated gradient descent algorithms which are capable of fitting least squares and penalised ridge regression models, using data encrypted under a fully homomorphic encryption scheme. Gradient descent is shown to dominate in terms of encrypted computational speed, and theoretical results are proven to give parameter bounds which ensure correctness of decryption. The characteristics of encrypted computation are empirically shown to favour a non-standard acceleration technique. This demonstrates the possibility of approximating conventional statistical regression methods using encrypted data without compromising privacy. |
Tasks | |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00839v1 |
PDF | http://arxiv.org/pdf/1703.00839v1.pdf |
PWC | https://paperswithcode.com/paper/encrypted-accelerated-least-squares |
Repo | |
Framework | |
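The iteration that has to run under encryption is ordinary (accelerated) gradient descent with a fixed, data-independent step size, since only additions and multiplications are cheap homomorphically. Below is a plaintext sketch using standard Nesterov momentum; the "non-standard" acceleration variant the paper finds preferable under encryption may differ.

```python
import numpy as np

def accelerated_ridge_gd(X, y, lam=1.0, n_iter=50):
    """Plaintext sketch of the FHE-friendly part: Nesterov-accelerated
    gradient descent for ridge regression with a fixed step size, using
    only additions and multiplications in the update."""
    n, d = X.shape
    lipschitz = np.linalg.norm(X, 2) ** 2 / n + lam   # of the ridge gradient
    step = 1.0 / lipschitz
    w = np.zeros(d)
    z = w.copy()
    for t in range(1, n_iter + 1):
        grad = X.T @ (X @ z - y) / n + lam * z
        w_new = z - step * grad
        z = w_new + (t - 1) / (t + 2) * (w_new - w)   # momentum extrapolation
        w = w_new
    return w
```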
Why do similarity matching objectives lead to Hebbian/anti-Hebbian networks?
Title | Why do similarity matching objectives lead to Hebbian/anti-Hebbian networks? |
Authors | Cengiz Pehlevan, Anirvan Sengupta, Dmitri B. Chklovskii |
Abstract | Modeling self-organization of neural networks for unsupervised learning using Hebbian and anti-Hebbian plasticity has a long history in neuroscience. Yet, derivations of single-layer networks with such local learning rules from principled optimization objectives became possible only recently, with the introduction of similarity matching objectives. What explains the success of similarity matching objectives in deriving neural networks with local learning rules? Here, using dimensionality reduction as an example, we introduce several variable substitutions that illuminate the success of similarity matching. We show that the full network objective may be optimized separately for each synapse using local learning rules both in the offline and online settings. We formalize the long-standing intuition of the rivalry between Hebbian and anti-Hebbian rules by formulating a min-max optimization problem. We introduce a novel dimensionality reduction objective using fractional matrix exponents. To illustrate the generality of our approach, we apply it to a novel formulation of dimensionality reduction combined with whitening. We confirm numerically that the networks with learning rules derived from principled objectives perform better than those with heuristic learning rules. |
Tasks | Dimensionality Reduction |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.07914v2 |
PDF | http://arxiv.org/pdf/1703.07914v2.pdf |
PWC | https://paperswithcode.com/paper/why-do-similarity-matching-objectives-lead-to |
Repo | |
Framework | |
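For concreteness, here is a sketch of the kind of single-layer network this line of work derives from similarity matching objectives: a feedforward matrix updated by a Hebbian rule and a lateral matrix updated by an anti-Hebbian rule, with the output given by the fixed point of the lateral dynamics. This is one common variant; the exact rules derived in the paper may differ.

```python
import numpy as np

def similarity_matching_online(X, k, eta=0.01):
    """Online Hebbian/anti-Hebbian network sketch. X has shape (d, T):
    one column per input sample; k is the output dimensionality."""
    d, T = X.shape
    rng = np.random.default_rng(0)
    W = rng.standard_normal((k, d)) / np.sqrt(d)   # feedforward weights
    M = np.zeros((k, k))                           # lateral weights
    for t in range(T):
        x = X[:, t]
        y = np.linalg.solve(np.eye(k) + M, W @ x)  # fixed point of y = W x - M y
        W += eta * (np.outer(y, x) - W)            # Hebbian update: dW ~ y x^T
        M += eta * (np.outer(y, y) - M)            # anti-Hebbian update: dM ~ y y^T
        np.fill_diagonal(M, 0.0)                   # no self-inhibition
    return W, M
```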
Neural Embeddings of Graphs in Hyperbolic Space
Title | Neural Embeddings of Graphs in Hyperbolic Space |
Authors | Benjamin Paul Chamberlain, James Clough, Marc Peter Deisenroth |
Abstract | Neural embeddings have been used with great success in Natural Language Processing (NLP). They provide compact representations that encapsulate word similarity and attain state-of-the-art performance in a range of linguistic tasks. The success of neural embeddings has prompted significant amounts of research into applications in domains other than language. One such domain is graph-structured data, where embeddings of vertices can be learned that encapsulate vertex similarity and improve performance on tasks including edge prediction and vertex labelling. For both NLP and graph based tasks, embeddings have been learned in high-dimensional Euclidean spaces. However, recent work has shown that the appropriate isometric space for embedding complex networks is not the flat Euclidean space, but negatively curved, hyperbolic space. We present a new concept that exploits these recent insights and propose learning neural embeddings of graphs in hyperbolic space. We provide experimental evidence that embedding graphs in their natural geometry significantly improves performance on downstream tasks for several real-world public datasets. |
Tasks | |
Published | 2017-05-29 |
URL | http://arxiv.org/abs/1705.10359v1 |
PDF | http://arxiv.org/pdf/1705.10359v1.pdf |
PWC | https://paperswithcode.com/paper/neural-embeddings-of-graphs-in-hyperbolic |
Repo | |
Framework | |
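For concreteness, the Poincare ball is a standard model of hyperbolic space used for such embeddings (using this particular model here is an assumption based on common practice, not something stated in the abstract). The closed-form geodesic distance below is the quantity an embedding would be trained against; points near the boundary become exponentially far apart, which is what lets tree-like graphs embed with low distortion.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball:
    d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + num / max(den, eps))

print(poincare_distance([0.0, 0.0], [0.5, 0.0]))    # ~1.10
print(poincare_distance([0.9, 0.0], [0.0, 0.9]))    # ~5.20, near-boundary points are far apart
```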
Neural Word Segmentation with Rich Pretraining
Title | Neural Word Segmentation with Rich Pretraining |
Authors | Jie Yang, Yue Zhang, Fei Dong |
Abstract | Neural word segmentation research has benefited from large-scale raw texts by leveraging them for pretraining character and word embeddings. On the other hand, statistical segmentation research has exploited richer sources of external information, such as punctuation, automatic segmentation and POS. We investigate the effectiveness of a range of external training sources for neural word segmentation by building a modular segmentation model, pretraining the most important submodule using rich external sources. Results show that such pretraining significantly improves the model, leading to accuracies competitive to the best methods on six benchmarks. |
Tasks | Word Embeddings |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1704.08960v1 |
PDF | http://arxiv.org/pdf/1704.08960v1.pdf |
PWC | https://paperswithcode.com/paper/neural-word-segmentation-with-rich |
Repo | |
Framework | |
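One simple, hedged illustration of the "pretraining character embeddings from raw text" ingredient (not the authors' pipeline, which pretrains a submodule from several richer external sources): treat each raw sentence as a sequence of characters and run word2vec over those sequences. The toy Chinese sentences below are purely an example; gensim 4.x parameter names are assumed.

```python
from gensim.models import Word2Vec

raw_sentences = ["神经分词很有用", "预训练字向量帮助分词"]   # toy raw text
char_sequences = [list(s) for s in raw_sentences]          # sentences as character lists

# Skip-gram over character sequences yields pretrained character embeddings.
char_model = Word2Vec(sentences=char_sequences, vector_size=50,
                      window=5, min_count=1, sg=1, epochs=50)
print(char_model.wv["分"].shape)   # (50,) pretrained character vector
```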