Paper Group ANR 574
Agent based Tools for Modeling and Simulation of Self-Organization in Peer-to-Peer, Ad-Hoc and other Complex Networks. Accumulated Gradient Normalization. Don’t relax: early stopping for convex regularization. Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer. LinXGBoost: Extension of XGBoost to Ge …
Agent based Tools for Modeling and Simulation of Self-Organization in Peer-to-Peer, Ad-Hoc and other Complex Networks
Title | Agent based Tools for Modeling and Simulation of Self-Organization in Peer-to-Peer, Ad-Hoc and other Complex Networks |
Authors | Muaz A. Niazi, Amir Hussain |
Abstract | Agent-based modeling and simulation tools provide a mature platform for the development of complex simulations. However, they have not been applied much in mainstream modeling and simulation of computer networks. In this article, we evaluate whether, and how, these tools can add value in the modeling and simulation of complex networks such as pervasive computing, large-scale peer-to-peer systems, and networks involving considerable environment and human/animal/habitat interaction. Specifically, we demonstrate the effectiveness of NetLogo, a tool that has been widely used in the area of agent-based social simulation. |
Tasks | |
Published | 2017-08-04 |
URL | http://arxiv.org/abs/1708.01599v1 |
PDF | http://arxiv.org/pdf/1708.01599v1.pdf |
PWC | https://paperswithcode.com/paper/agent-based-tools-for-modeling-and-simulation |
Repo | |
Framework | |
Accumulated Gradient Normalization
Title | Accumulated Gradient Normalization |
Authors | Joeri Hermans, Gerasimos Spanakis, Rico Möckel |
Abstract | This work addresses the instability in asynchronous data parallel optimization. It does so by introducing a novel distributed optimizer which is able to efficiently optimize a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. This implies that the magnitude of a worker delta is smaller compared to an accumulated gradient, and provides a better direction towards a minimum compared to first-order gradients. In turn, it also forces possible implicit momentum fluctuations to be more aligned, since we make the assumption that all workers contribute towards a single minimum. As a result, our approach mitigates the parameter staleness problem more effectively, since staleness in asynchrony induces (implicit) momentum, and achieves a better convergence rate than other optimizers such as asynchronous EASGD and DynSGD, as we show empirically. |
Tasks | |
Published | 2017-10-06 |
URL | http://arxiv.org/abs/1710.02368v1 |
PDF | http://arxiv.org/pdf/1710.02368v1.pdf |
PWC | https://paperswithcode.com/paper/accumulated-gradient-normalization |
Repo | |
Framework | |
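The core mechanism in the abstract, pushing an accumulated and normalized sequence of first-order gradients to a parameter server, can be sketched in a few lines. The sketch below is a hedged illustration rather than the authors' implementation; `grad_fn`, the batch iterator, and the server-side application of the delta are assumed names.

```python
import numpy as np

def agn_worker_round(theta, grad_fn, data_batches, eta=0.01, lam=8):
    """Sketch of one Accumulated Gradient Normalization worker round:
    take `lam` local SGD steps from the last pulled parameters, then
    push the accumulated displacement normalized by the number of steps."""
    theta_local = theta.copy()
    for batch in data_batches[:lam]:
        theta_local -= eta * grad_fn(theta_local, batch)
    # Normalized accumulated gradient: its magnitude is smaller than the raw
    # accumulated delta, while still pointing in an averaged descent direction.
    delta = (theta_local - theta) / lam
    return delta  # a parameter server would then apply: theta += delta
```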
Don’t relax: early stopping for convex regularization
Title | Don’t relax: early stopping for convex regularization |
Authors | Simon Matet, Lorenzo Rosasco, Silvia Villa, Bang Long Vu |
Abstract | We consider the problem of designing efficient regularization algorithms when regularization is encoded by a (strongly) convex functional. Unlike classical penalization methods based on a relaxation approach, we propose an iterative method where regularization is achieved via early stopping. Our results show that the proposed procedure achieves the same recovery accuracy as penalization methods, while naturally integrating computational considerations. An empirical analysis on a number of problems provides promising results with respect to the state of the art. |
Tasks | |
Published | 2017-07-18 |
URL | http://arxiv.org/abs/1707.05422v1 |
PDF | http://arxiv.org/pdf/1707.05422v1.pdf |
PWC | https://paperswithcode.com/paper/dont-relax-early-stopping-for-convex |
Repo | |
Framework | |
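A minimal sketch of the general idea of regularization via early stopping, using plain gradient descent on the un-penalized least-squares loss and a held-out validation set as the stopping criterion; the paper's actual algorithms and analysis cover general (strongly) convex regularizers.

```python
import numpy as np

def early_stopped_least_squares(X_tr, y_tr, X_val, y_val, n_iter=500):
    """Regularization by early stopping (sketch): the iteration count,
    selected by validation error, plays the role of the penalty weight."""
    n, d = X_tr.shape
    w = np.zeros(d)
    lipschitz = np.linalg.norm(X_tr, 2) ** 2 / n   # Lipschitz constant of the LS gradient
    step = 1.0 / lipschitz
    best_w, best_err = w.copy(), np.inf
    for _ in range(n_iter):
        w -= step * X_tr.T @ (X_tr @ w - y_tr) / n
        err = np.mean((X_val @ w - y_val) ** 2)
        if err < best_err:                          # early-stopping rule
            best_w, best_err = w.copy(), err
    return best_w
```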
Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer
Title | Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer |
Authors | David Isele, Mohammad Rostami, Eric Eaton |
Abstract | Knowledge transfer between tasks can improve the performance of learned models, but requires an accurate estimate of the inter-task relationships to identify the relevant knowledge to transfer. These inter-task relationships are typically estimated based on training data for each task, which is inefficient in lifelong learning settings where the goal is to learn each consecutive task rapidly from as little data as possible. To reduce this burden, we develop a lifelong learning method based on coupled dictionary learning that utilizes high-level task descriptions to model the inter-task relationships. We show that using task descriptors improves the performance of the learned task policies, providing both theoretical justification for the benefit and empirical demonstration of the improvement across a variety of learning problems. Given only the descriptor for a new task, the lifelong learner is also able to accurately predict a model for the new task through zero-shot learning using the coupled dictionary, eliminating the need to gather training data before addressing the task. |
Tasks | Dictionary Learning, Transfer Learning, Zero-Shot Learning |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03850v1 |
PDF | http://arxiv.org/pdf/1710.03850v1.pdf |
PWC | https://paperswithcode.com/paper/using-task-descriptions-in-lifelong-machine |
Repo | |
Framework | |
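The zero-shot step described in the abstract, predicting a new task's model from its descriptor alone through a coupled dictionary, can be sketched as follows. The dictionary shapes and the use of scikit-learn's `sparse_encode` are assumptions for illustration, not the paper's exact solver.

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def zero_shot_task_model(phi_new, D, L, alpha=0.1):
    """Coupled-dictionary zero-shot sketch: find a sparse code for the new
    task's descriptor phi_new over the descriptor dictionary D, then reuse
    that same code with the model dictionary L to predict the task's
    parameter vector, without any training data for the new task.
    Assumed shapes: phi_new (m,), D (k, m), L (k, d)."""
    s = sparse_encode(phi_new.reshape(1, -1), D, alpha=alpha)   # (1, k) sparse code
    theta_pred = s @ L                                          # (1, d) predicted model
    return theta_pred.ravel()
```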
LinXGBoost: Extension of XGBoost to Generalized Local Linear Models
Title | LinXGBoost: Extension of XGBoost to Generalized Local Linear Models |
Authors | Laurent de Vito |
Abstract | XGBoost is often presented as the algorithm that wins every ML competition. Surprisingly, this is true even though its predictions are piecewise constant. This might be justified in high-dimensional input spaces, but when the number of features is low, a piecewise linear model is likely to perform better. We extend XGBoost into LinXGBoost, which stores a linear model at each leaf. This extension, equivalent to piecewise regularized least-squares, is particularly attractive for the regression of functions that exhibit jumps or discontinuities, which are notoriously hard to regress. Our extension is compared to vanilla XGBoost and random forests in experiments on both synthetic and real-world data sets. |
Tasks | |
Published | 2017-10-10 |
URL | http://arxiv.org/abs/1710.03634v1 |
PDF | http://arxiv.org/pdf/1710.03634v1.pdf |
PWC | https://paperswithcode.com/paper/linxgboost-extension-of-xgboost-to |
Repo | |
Framework | |
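A much-simplified illustration of the "linear model at each leaf" idea: a single tree with per-leaf ridge models, rather than LinXGBoost's boosted, regularized piecewise least-squares objective.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge

class PiecewiseLinearTree:
    """Sketch of a tree whose leaves hold linear models instead of constants
    (an illustration of the idea, not the paper's exact formulation)."""
    def __init__(self, max_depth=3, alpha=1.0):
        self.tree = DecisionTreeRegressor(max_depth=max_depth)
        self.alpha = alpha
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)              # leaf index of each training sample
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.leaf_models[leaf] = Ridge(alpha=self.alpha).fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.tree.apply(X)
        y_hat = np.empty(len(X))
        for leaf, model in self.leaf_models.items():
            mask = leaves == leaf
            if mask.any():
                y_hat[mask] = model.predict(X[mask])
        return y_hat
```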
Memory Efficient Max Flow for Multi-label Submodular MRFs
Title | Memory Efficient Max Flow for Multi-label Submodular MRFs |
Authors | Thalaiyasingam Ajanthan, Richard Hartley, Mathieu Salzmann |
Abstract | Multi-label submodular Markov Random Fields (MRFs) have been shown to be solvable using max-flow based on an encoding of the labels proposed by Ishikawa, in which each variable $X_i$ is represented by $\ell$ nodes (where $\ell$ is the number of labels) arranged in a column. However, this method in general requires $2\,\ell^2$ edges for each pair of neighbouring variables. This makes it inapplicable to realistic problems with many variables and labels, due to excessive memory requirement. In this paper, we introduce a variant of the max-flow algorithm that requires much less storage. Consequently, our algorithm makes it possible to optimally solve multi-label submodular problems involving large numbers of variables and labels on a standard computer. |
Tasks | |
Published | 2017-02-20 |
URL | http://arxiv.org/abs/1702.05888v1 |
PDF | http://arxiv.org/pdf/1702.05888v1.pdf |
PWC | https://paperswithcode.com/paper/memory-efficient-max-flow-for-multi-label |
Repo | |
Framework | |
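A quick back-of-the-envelope calculation, using only the quantities stated in the abstract ($\ell$ nodes per variable, roughly $2\,\ell^2$ edges per neighbouring pair), shows why the standard encoding becomes infeasible. The bytes-per-edge figure and the example problem size are assumptions for illustration.

```python
def ishikawa_memory(n_vars, n_pairs, n_labels, bytes_per_edge=16):
    """Back-of-the-envelope cost of the standard Ishikawa encoding:
    n_labels nodes per variable and ~2 * n_labels**2 edges per pair."""
    nodes = n_vars * n_labels
    edges = n_pairs * 2 * n_labels ** 2
    return nodes, edges, edges * bytes_per_edge / 1e9   # nodes, edges, GB

# e.g. a 1-megapixel 4-connected grid with 256 labels (a stereo-like setting)
nodes, edges, gb = ishikawa_memory(n_vars=10**6, n_pairs=2 * 10**6, n_labels=256)
print(f"{nodes:.2e} nodes, {edges:.2e} edges, ~{gb:.0f} GB just for edges")
```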
Overcoming Catastrophic Interference by Conceptors
Title | Overcoming Catastrophic Interference by Conceptors |
Authors | Xu He, Herbert Jaeger |
Abstract | Catastrophic interference has been a major roadblock in the research of continual learning. Here we propose a variant of the back-propagation algorithm, “conceptor-aided back-prop” (CAB), in which gradients are shielded by conceptors against degradation of previously learned tasks. Conceptors have their origin in reservoir computing, where they have been previously shown to overcome catastrophic forgetting. CAB extends these results to deep feedforward networks. On the disjoint MNIST task CAB outperforms two other methods for coping with catastrophic interference that have recently been proposed in the deep learning field. |
Tasks | Continual Learning |
Published | 2017-07-16 |
URL | http://arxiv.org/abs/1707.04853v2 |
PDF | http://arxiv.org/pdf/1707.04853v2.pdf |
PWC | https://paperswithcode.com/paper/overcoming-catastrophic-interference-by |
Repo | |
Framework | |
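A rough sketch of the conceptor idea underlying CAB: build a conceptor from activations recorded on previous tasks, then multiply new gradients by the complementary "free" matrix F = I - C so that directions already in use are shielded. The aperture value and shapes are illustrative assumptions, and the exact CAB update rule differs in detail.

```python
import numpy as np

def conceptor(activations, aperture=10.0):
    """Conceptor matrix C = R (R + aperture**-2 I)^-1 computed from a batch
    of layer activations (rows = samples, columns = units)."""
    n, d = activations.shape
    R = activations.T @ activations / n
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def shielded_gradient(grad_W, C_used):
    """Shielding sketch: project the weight gradient onto the subspace NOT
    occupied by previous tasks, i.e. multiply by the free conceptor I - C.
    grad_W has shape (d_out, d_in), C_used has shape (d_in, d_in)."""
    F = np.eye(C_used.shape[0]) - C_used
    return grad_W @ F
```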
Chain-NN: An Energy-Efficient 1D Chain Architecture for Accelerating Deep Convolutional Neural Networks
Title | Chain-NN: An Energy-Efficient 1D Chain Architecture for Accelerating Deep Convolutional Neural Networks |
Authors | Shihao Wang, Dajiang Zhou, Xushen Han, Takeshi Yoshimura |
Abstract | Deep convolutional neural networks (CNNs) have shown good performance in many computer vision tasks. However, their high computational complexity involves a huge amount of data movement between the processor core and the memory hierarchy, which accounts for the majority of the power consumption. This paper presents Chain-NN, a novel energy-efficient 1D chain architecture for accelerating deep CNNs. Chain-NN consists of dedicated dual-channel processing engines (PEs). In Chain-NN, convolutions are performed by 1D systolic primitives composed of groups of adjacent PEs. These systolic primitives, together with the proposed column-wise scan input pattern, can fully reuse input operands to reduce the memory bandwidth requirement and save energy. Moreover, the 1D chain architecture allows the systolic primitives to be easily reconfigured for specific CNN parameters with low design complexity. Chain-NN is synthesized and laid out in a TSMC 28nm process, costing 3751k logic gates and 352KB of on-chip memory. The results show that a 576-PE Chain-NN can run at up to 700MHz, achieving a peak throughput of 806.4GOPS at 567.5mW and accelerating the five convolutional layers of AlexNet at a frame rate of 326.2fps. Its power efficiency of 1421.0GOPS/W is at least 2.5x to 4.1x better than state-of-the-art works. |
Tasks | |
Published | 2017-03-04 |
URL | http://arxiv.org/abs/1703.01457v1 |
PDF | http://arxiv.org/pdf/1703.01457v1.pdf |
PWC | https://paperswithcode.com/paper/chain-nn-an-energy-efficient-1d-chain |
Repo | |
Framework | |
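The headline numbers quoted in the abstract are internally consistent, as a quick check shows. The assumption that each dual-channel PE delivers 2 operations per cycle is ours, chosen because it reproduces the quoted peak exactly.

```python
# Consistency check of the Chain-NN throughput and efficiency figures.
pes, freq_hz, ops_per_pe_cycle = 576, 700e6, 2      # 2 ops/PE/cycle is an assumption
peak_gops = pes * freq_hz * ops_per_pe_cycle / 1e9
power_w = 567.5e-3

print(peak_gops)             # 806.4 GOPS, matching the abstract
print(peak_gops / power_w)   # ~1421 GOPS/W, matching the abstract
```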
A Fully Trainable Network with RNN-based Pooling
Title | A Fully Trainable Network with RNN-based Pooling |
Authors | Shuai Li, Wanqing Li, Chris Cook, Ce Zhu, Yanbo Gao |
Abstract | Pooling is an important component in convolutional neural networks (CNNs) for aggregating features and reducing computational burden. Compared with other components such as convolutional layers and fully connected layers which are completely learned from data, the pooling component is still handcrafted such as max pooling and average pooling. This paper proposes a learnable pooling function using recurrent neural networks (RNN) so that the pooling can be fully adapted to data and other components of the network, leading to an improved performance. Such a network with learnable pooling function is referred to as a fully trainable network (FTN). Experimental results have demonstrated that the proposed RNN-based pooling can well approximate the existing pooling functions and improve the performance of the network. Especially for small networks, the proposed FTN can improve the performance by seven percentage points in terms of error rate on the CIFAR-10 dataset compared with the traditional CNN. |
Tasks | |
Published | 2017-06-16 |
URL | http://arxiv.org/abs/1706.05157v1 |
PDF | http://arxiv.org/pdf/1706.05157v1.pdf |
PWC | https://paperswithcode.com/paper/a-fully-trainable-network-with-rnn-based |
Repo | |
Framework | |
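An illustrative learnable pooling layer in the spirit of the abstract: a small GRU reads each pooling window and its final hidden state replaces the max/average statistic. This is a hedged sketch (PyTorch, made-up sizes), not the authors' exact FTN design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNPool2d(nn.Module):
    """Learnable pooling sketch: each k x k window is fed, element by
    element, to a GRU and the final hidden state is the pooled value."""
    def __init__(self, kernel_size=2, stride=2, hidden_size=1):
        super().__init__()
        self.k, self.s = kernel_size, stride
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden_size, batch_first=True)

    def forward(self, x):                                 # x: (N, C, H, W)
        n, c, h, w = x.shape
        cols = F.unfold(x, self.k, stride=self.s)         # (N, C*k*k, L)
        L = cols.shape[-1]
        cols = cols.reshape(n, c, self.k * self.k, L)     # (N, C, k*k, L)
        seq = cols.permute(0, 1, 3, 2).reshape(n * c * L, self.k * self.k, 1)
        _, h_last = self.rnn(seq)                         # (1, N*C*L, hidden)
        out = h_last[-1].mean(dim=-1)                     # average hidden units
        h_out = (h - self.k) // self.s + 1
        w_out = (w - self.k) // self.s + 1
        return out.reshape(n, c, h_out, w_out)

# pool = RNNPool2d(); pool(torch.randn(4, 16, 8, 8)).shape  ->  (4, 16, 4, 4)
```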
Transfer Learning across Low-Resource, Related Languages for Neural Machine Translation
Title | Transfer Learning across Low-Resource, Related Languages for Neural Machine Translation |
Authors | Toan Q. Nguyen, David Chiang |
Abstract | We present a simple method to improve neural translation of a low-resource language pair using parallel data from a related, also low-resource, language pair. The method is based on the transfer method of Zoph et al., but whereas their method ignores any source vocabulary overlap, ours exploits it. First, we split words using Byte Pair Encoding (BPE) to increase vocabulary overlap. Then, we train a model on the first language pair and transfer its parameters, including its source word embeddings, to another model and continue training on the second language pair. Our experiments show that transfer learning helps word-based translation only slightly, but when used on top of a much stronger BPE baseline, it yields larger improvements of up to 4.3 BLEU. |
Tasks | Machine Translation, Transfer Learning, Word Embeddings |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1708.09803v2 |
PDF | http://arxiv.org/pdf/1708.09803v2.pdf |
PWC | https://paperswithcode.com/paper/transfer-learning-across-low-resource-related |
Repo | |
Framework | |
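The transfer step itself is simple to picture: because both language pairs share one BPE vocabulary, the parent model's parameters, source embeddings included, can warm-start the child model before training continues on the second pair. A toy sketch with made-up sizes (not the authors' code):

```python
import torch
import torch.nn as nn

# Both pairs share a single BPE vocabulary, so the parent's source embeddings
# can be copied verbatim into the child model; the vocabulary overlap created
# by BPE is what makes this transfer effective.
vocab_size, emb_dim = 8000, 512
parent_src_emb = nn.Embedding(vocab_size, emb_dim)   # trained on the first pair
child_src_emb = nn.Embedding(vocab_size, emb_dim)    # to be trained on the second pair

with torch.no_grad():
    child_src_emb.weight.copy_(parent_src_emb.weight)   # warm start
# ...then continue normal NMT training on the second (child) language pair.
```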
Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
Title | Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition |
Authors | Kalin Stefanov, Jonas Beskow, Giampiero Salvi |
Abstract | This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their face. Furthermore, the method does not rely on external annotations, thus complying with cognitive development. Instead, the method uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker dependent setting. However, in a speaker independent setting the proposed method yields a significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions. |
Tasks | Language Acquisition |
Published | 2017-11-24 |
URL | https://arxiv.org/abs/1711.08992v2 |
PDF | https://arxiv.org/pdf/1711.08992v2.pdf |
PWC | https://paperswithcode.com/paper/self-supervised-vision-based-detection-of-the |
Repo | |
Framework | |
Encrypted accelerated least squares regression
Title | Encrypted accelerated least squares regression |
Authors | Pedro M. Esperança, Louis J. M. Aslett, Chris C. Holmes |
Abstract | Information that is stored in an encrypted format is, by definition, usually not amenable to statistical analysis or machine learning methods. In this paper we present detailed analysis of coordinate and accelerated gradient descent algorithms which are capable of fitting least squares and penalised ridge regression models, using data encrypted under a fully homomorphic encryption scheme. Gradient descent is shown to dominate in terms of encrypted computational speed, and theoretical results are proven to give parameter bounds which ensure correctness of decryption. The characteristics of encrypted computation are empirically shown to favour a non-standard acceleration technique. This demonstrates the possibility of approximating conventional statistical regression methods using encrypted data without compromising privacy. |
Tasks | |
Published | 2017-03-02 |
URL | http://arxiv.org/abs/1703.00839v1 |
PDF | http://arxiv.org/pdf/1703.00839v1.pdf |
PWC | https://paperswithcode.com/paper/encrypted-accelerated-least-squares |
Repo | |
Framework | |
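The iteration that has to run under encryption is ordinary (accelerated) gradient descent with a fixed, data-independent step size, since only additions and multiplications are cheap homomorphically. Below is a plaintext sketch using standard Nesterov momentum; the "non-standard" acceleration variant the paper finds preferable under encryption may differ.

```python
import numpy as np

def accelerated_ridge_gd(X, y, lam=1.0, n_iter=50):
    """Plaintext sketch of the FHE-friendly part: Nesterov-accelerated
    gradient descent for ridge regression with a fixed step size, using
    only additions and multiplications in the update."""
    n, d = X.shape
    lipschitz = np.linalg.norm(X, 2) ** 2 / n + lam   # of the ridge gradient
    step = 1.0 / lipschitz
    w = np.zeros(d)
    z = w.copy()
    for t in range(1, n_iter + 1):
        grad = X.T @ (X @ z - y) / n + lam * z
        w_new = z - step * grad
        z = w_new + (t - 1) / (t + 2) * (w_new - w)   # momentum extrapolation
        w = w_new
    return w
```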
Why do similarity matching objectives lead to Hebbian/anti-Hebbian networks?
Title | Why do similarity matching objectives lead to Hebbian/anti-Hebbian networks? |
Authors | Cengiz Pehlevan, Anirvan Sengupta, Dmitri B. Chklovskii |
Abstract | Modeling self-organization of neural networks for unsupervised learning using Hebbian and anti-Hebbian plasticity has a long history in neuroscience. Yet, derivations of single-layer networks with such local learning rules from principled optimization objectives became possible only recently, with the introduction of similarity matching objectives. What explains the success of similarity matching objectives in deriving neural networks with local learning rules? Here, using dimensionality reduction as an example, we introduce several variable substitutions that illuminate the success of similarity matching. We show that the full network objective may be optimized separately for each synapse using local learning rules both in the offline and online settings. We formalize the long-standing intuition of the rivalry between Hebbian and anti-Hebbian rules by formulating a min-max optimization problem. We introduce a novel dimensionality reduction objective using fractional matrix exponents. To illustrate the generality of our approach, we apply it to a novel formulation of dimensionality reduction combined with whitening. We confirm numerically that the networks with learning rules derived from principled objectives perform better than those with heuristic learning rules. |
Tasks | Dimensionality Reduction |
Published | 2017-03-23 |
URL | http://arxiv.org/abs/1703.07914v2 |
PDF | http://arxiv.org/pdf/1703.07914v2.pdf |
PWC | https://paperswithcode.com/paper/why-do-similarity-matching-objectives-lead-to |
Repo | |
Framework | |
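For concreteness, here is a sketch of the kind of single-layer network this line of work derives from similarity matching objectives: a feedforward matrix updated by a Hebbian rule and a lateral matrix updated by an anti-Hebbian rule, with the output given by the fixed point of the lateral dynamics. This is one common variant; the exact rules derived in the paper may differ.

```python
import numpy as np

def similarity_matching_online(X, k, eta=0.01):
    """Online Hebbian/anti-Hebbian network sketch. X has shape (d, T):
    one column per input sample; k is the output dimensionality."""
    d, T = X.shape
    rng = np.random.default_rng(0)
    W = rng.standard_normal((k, d)) / np.sqrt(d)   # feedforward weights
    M = np.zeros((k, k))                           # lateral weights
    for t in range(T):
        x = X[:, t]
        y = np.linalg.solve(np.eye(k) + M, W @ x)  # fixed point of y = W x - M y
        W += eta * (np.outer(y, x) - W)            # Hebbian update: dW ~ y x^T
        M += eta * (np.outer(y, y) - M)            # anti-Hebbian update: dM ~ y y^T
        np.fill_diagonal(M, 0.0)                   # no self-inhibition
    return W, M
```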
Neural Embeddings of Graphs in Hyperbolic Space
Title | Neural Embeddings of Graphs in Hyperbolic Space |
Authors | Benjamin Paul Chamberlain, James Clough, Marc Peter Deisenroth |
Abstract | Neural embeddings have been used with great success in Natural Language Processing (NLP). They provide compact representations that encapsulate word similarity and attain state-of-the-art performance in a range of linguistic tasks. The success of neural embeddings has prompted significant amounts of research into applications in domains other than language. One such domain is graph-structured data, where embeddings of vertices can be learned that encapsulate vertex similarity and improve performance on tasks including edge prediction and vertex labelling. For both NLP and graph based tasks, embeddings have been learned in high-dimensional Euclidean spaces. However, recent work has shown that the appropriate isometric space for embedding complex networks is not the flat Euclidean space, but negatively curved, hyperbolic space. We present a new concept that exploits these recent insights and propose learning neural embeddings of graphs in hyperbolic space. We provide experimental evidence that embedding graphs in their natural geometry significantly improves performance on downstream tasks for several real-world public datasets. |
Tasks | |
Published | 2017-05-29 |
URL | http://arxiv.org/abs/1705.10359v1 |
PDF | http://arxiv.org/pdf/1705.10359v1.pdf |
PWC | https://paperswithcode.com/paper/neural-embeddings-of-graphs-in-hyperbolic |
Repo | |
Framework | |
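For concreteness, the Poincare ball is a standard model of hyperbolic space used for such embeddings (using this particular model here is an assumption based on common practice, not something stated in the abstract). The closed-form geodesic distance below is the quantity an embedding would be trained against; points near the boundary become exponentially far apart, which is what lets tree-like graphs embed with low distortion.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball:
    d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + num / max(den, eps))

print(poincare_distance([0.0, 0.0], [0.5, 0.0]))    # ~1.10
print(poincare_distance([0.9, 0.0], [0.0, 0.9]))    # ~5.20, near-boundary points are far apart
```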
Neural Word Segmentation with Rich Pretraining
Title | Neural Word Segmentation with Rich Pretraining |
Authors | Jie Yang, Yue Zhang, Fei Dong |
Abstract | Neural word segmentation research has benefited from large-scale raw texts by leveraging them for pretraining character and word embeddings. On the other hand, statistical segmentation research has exploited richer sources of external information, such as punctuation, automatic segmentation and POS. We investigate the effectiveness of a range of external training sources for neural word segmentation by building a modular segmentation model, pretraining the most important submodule using rich external sources. Results show that such pretraining significantly improves the model, leading to accuracies competitive to the best methods on six benchmarks. |
Tasks | Word Embeddings |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1704.08960v1 |
PDF | http://arxiv.org/pdf/1704.08960v1.pdf |
PWC | https://paperswithcode.com/paper/neural-word-segmentation-with-rich |
Repo | |
Framework | |
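One simple, hedged illustration of the "pretraining character embeddings from raw text" ingredient (not the authors' pipeline, which pretrains a submodule from several richer external sources): treat each raw sentence as a sequence of characters and run word2vec over those sequences. The toy Chinese sentences below are purely an example; gensim 4.x parameter names are assumed.

```python
from gensim.models import Word2Vec

raw_sentences = ["神经分词很有用", "预训练字向量帮助分词"]   # toy raw text
char_sequences = [list(s) for s in raw_sentences]          # sentences as character lists

# Skip-gram over character sequences yields pretrained character embeddings.
char_model = Word2Vec(sentences=char_sequences, vector_size=50,
                      window=5, min_count=1, sg=1, epochs=50)
print(char_model.wv["分"].shape)   # (50,) pretrained character vector
```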