Paper Group ANR 68
Data-driven approximation of the Koopman generator: Model reduction, system identification, and control. Multi-task Learning for Japanese Predicate Argument Structure Analysis. Optimization for Reinforcement Learning: From Single Agent to Cooperative Agents. A Spectral Approach to Unsupervised Object Segmentation in Video. Argument Invention from First Principles …
Data-driven approximation of the Koopman generator: Model reduction, system identification, and control
Title | Data-driven approximation of the Koopman generator: Model reduction, system identification, and control |
Authors | Stefan Klus, Feliks Nüske, Sebastian Peitz, Jan-Hendrik Niemann, Cecilia Clementi, Christof Schütte |
Abstract | We derive a data-driven method for the approximation of the Koopman generator called gEDMD, which can be regarded as a straightforward extension of EDMD (extended dynamic mode decomposition). This approach is applicable to deterministic and stochastic dynamical systems. It can be used for computing eigenvalues, eigenfunctions, and modes of the generator and for system identification. In addition to learning the governing equations of deterministic systems, which then reduces to SINDy (sparse identification of nonlinear dynamics), it is possible to identify the drift and diffusion terms of stochastic differential equations from data. Moreover, we apply gEDMD to derive coarse-grained models of high-dimensional systems, and also to determine efficient model predictive control strategies. We highlight relationships with other methods and demonstrate the efficacy of the proposed methods using several guiding examples and prototypical molecular dynamics problems. |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10638v2 |
https://arxiv.org/pdf/1909.10638v2.pdf | |
PWC | https://paperswithcode.com/paper/data-driven-approximation-of-the-koopman |
Repo | |
Framework | |
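The abstract describes gEDMD as a least-squares approximation of the Koopman generator from snapshots together with their time derivatives. As a rough illustration only (not the authors' implementation), here is a minimal numpy sketch for a deterministic 1-D system with a monomial dictionary, assuming the derivatives dx_i = f(x_i) are available:

```python
import numpy as np

# Toy deterministic system dx/dt = f(x) = -x. The generator acts on monomials
# as L x^k = k x^(k-1) * (-x) = -k x^k, so its eigenvalues are {0,-1,-2,-3}.
def f(x):
    return -x

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=1000)
dx = f(x)  # time derivatives, assumed known (or estimated from data)

K = 4  # dictionary psi_k(x) = x^k, k = 0..K-1
Psi = np.stack([x**k for k in range(K)], axis=1)               # psi(x_i)
dPsi = np.stack([k * x**(k - 1) * dx if k > 0 else np.zeros_like(x)
                 for k in range(K)], axis=1)                   # (L psi)(x_i)

# gEDMD regression step: find M with dPsi ~= Psi @ M.T in least squares.
M = np.linalg.lstsq(Psi, dPsi, rcond=None)[0].T
print(np.round(np.linalg.eigvals(M), 3))  # approx the set {0, -1, -2, -3}
```

The paper's stochastic case additionally estimates drift and diffusion terms; the sketch above covers only the deterministic regression.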
Multi-task Learning for Japanese Predicate Argument Structure Analysis
Title | Multi-task Learning for Japanese Predicate Argument Structure Analysis |
Authors | Hikaru Omori, Mamoru Komachi |
Abstract | An event-noun is a noun that has an argument structure similar to a predicate. Recent works, including those considered state-of-the-art, ignore event-nouns or build a single model for solving both Japanese predicate argument structure analysis (PASA) and event-noun argument structure analysis (ENASA). However, because there are interactions between predicates and event-nouns, it is not sufficient to target only predicates. To address this problem, we present a multi-task learning method for PASA and ENASA. Our multi-task models improved the performance of both tasks compared to a single-task model by sharing knowledge from each task. Moreover, in PASA, our models achieved state-of-the-art results in overall F1 scores on the NAIST Text Corpus. In addition, this is the first work to employ neural networks in ENASA. |
Tasks | Multi-Task Learning |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02244v1 |
http://arxiv.org/pdf/1904.02244v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-learning-for-japanese-predicate |
Repo | |
Framework | |
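The abstract describes hard parameter sharing between PASA and ENASA. As a generic illustration (the paper's actual architecture and corpus are not reproduced here; layer sizes and names are invented), a shared encoder with task-specific heads might look like this PyTorch sketch:

```python
import torch
import torch.nn as nn

class SharedArgumentModel(nn.Module):
    """Toy multi-task model: a shared BiLSTM encoder with separate
    classification heads for predicates (PASA) and event-nouns (ENASA)."""
    def __init__(self, vocab_size=10000, emb=128, hidden=256, n_labels=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True,
                               bidirectional=True)          # shared layers
        self.pasa_head = nn.Linear(2 * hidden, n_labels)    # task-specific
        self.enasa_head = nn.Linear(2 * hidden, n_labels)   # task-specific

    def forward(self, tokens, task):
        h, _ = self.encoder(self.embed(tokens))
        head = self.pasa_head if task == "pasa" else self.enasa_head
        return head(h)  # per-token label scores

model = SharedArgumentModel()
batch = torch.randint(0, 10000, (8, 30))       # dummy token ids
loss = nn.functional.cross_entropy(
    model(batch, "pasa").flatten(0, 1),        # alternate tasks per batch
    torch.randint(0, 4, (8 * 30,)))
loss.backward()  # gradients from both tasks flow into the shared encoder
```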
Optimization for Reinforcement Learning: From Single Agent to Cooperative Agents
Title | Optimization for Reinforcement Learning: From Single Agent to Cooperative Agents |
Authors | Donghwan Lee, Niao He, Parameswaran Kamalaruban, Volkan Cevher |
Abstract | This article reviews recent advances in multi-agent reinforcement learning algorithms for large-scale control systems and communication networks, which learn to communicate and cooperate. We provide an overview of this emerging field, with an emphasis on the decentralized setting under different coordination protocols. We highlight the evolution of reinforcement learning algorithms from single-agent to multi-agent systems, from a distributed optimization perspective, and conclude with future directions and challenges, in the hope of catalyzing the growing synergy among the distributed optimization, signal processing, and reinforcement learning communities. |
Tasks | Distributed Optimization, Multi-agent Reinforcement Learning |
Published | 2019-12-01 |
URL | https://arxiv.org/abs/1912.00498v1 |
https://arxiv.org/pdf/1912.00498v1.pdf | |
PWC | https://paperswithcode.com/paper/optimization-for-reinforcement-learning-from |
Repo | |
Framework | |
A Spectral Approach to Unsupervised Object Segmentation in Video
Title | A Spectral Approach to Unsupervised Object Segmentation in Video |
Authors | Elena Burceanu, Marius Leordeanu |
Abstract | We formulate object segmentation in video as a graph partitioning problem in space and time, in which nodes are pixels and their relations form local neighbourhoods. We claim that the strongest cluster in this pixel-level graph represents the salient object segmentation. We compute the main cluster using a novel and fast 3D filtering technique that finds the spectral clustering solution, namely the principal eigenvector of the graph’s adjacency matrix, without building the matrix explicitly, which would be intractable. Our method is based on the power iteration for finding the principal eigenvector of a matrix, which we prove is equivalent to performing a specific set of 3D convolutions in the space-time feature volume. This allows us to avoid creating the matrix and enables a fast parallel implementation on GPU. We show that our method is much faster than classical power iteration applied directly on the adjacency matrix. Different from other works, ours is dedicated to preserving object consistency in space and time at the level of pixels. For that, it requires powerful pixel-wise features at the frame level. This makes it perfectly suitable for incorporating the output of a backbone network or other methods and quickly improving over their solutions without supervision. In experiments, we obtain consistent improvement, with the same set of hyper-parameters, over top state-of-the-art methods on the DAVIS-2016 dataset, in both unsupervised and semi-supervised tasks. We also achieve top results on the well-known SegTrackv2 dataset. |
Tasks | graph partitioning, Semantic Segmentation |
Published | 2019-07-05 |
URL | https://arxiv.org/abs/1907.02731v2 |
https://arxiv.org/pdf/1907.02731v2.pdf | |
PWC | https://paperswithcode.com/paper/a-spectral-approach-to-unsupervised-object |
Repo | |
Framework | |
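The central trick in the abstract is replacing the explicit adjacency matrix with filtering: one power-iteration step Mx is computed as a local 3D filtering pass over the space-time volume. A toy sketch with uniform affinities in a fixed space-time neighbourhood (the paper uses feature-dependent weights, not a plain box filter):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def power_iteration_3d(saliency, n_iters=20, size=(3, 5, 5)):
    """Toy matrix-free power iteration: with uniform affinities inside a
    (t, h, w) neighbourhood, M @ x reduces to a 3D box filter over the
    volume, so the principal eigenvector is found without ever building
    the (intractable) n x n pixel adjacency matrix."""
    x = saliency.copy()
    for _ in range(n_iters):
        x = uniform_filter(x, size=size)   # one step of M @ x via filtering
        x = np.maximum(x, 0.0)             # keep the iterate mask-like
        x /= np.linalg.norm(x)             # normalize, as in power iteration
    return x

video = np.random.rand(10, 64, 64)         # (frames, height, width) features
mask = power_iteration_3d(video)
print(mask.shape, mask.min() >= 0)         # soft segmentation, same shape
```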
Argument Invention from First Principles
Title | Argument Invention from First Principles |
Authors | Yonatan Bilu, Ariel Gera, Daniel Hershcovich, Benjamin Sznajder, Dan Lahav, Guy Moshkowich, Anael Malet, Assaf Gavron, Noam Slonim |
Abstract | Competitive debaters often find themselves facing a challenging task – how to debate a topic they know very little about, with only minutes to prepare, and without access to books or the Internet? What they often do is rely on “first principles”, commonplace arguments which are relevant to many topics, and which they have refined in past debates. In this work we aim to explicitly define a taxonomy of such principled recurring arguments, and, given a controversial topic, to automatically identify which of these arguments are relevant to the topic. As far as we know, this is the first time that this approach to argument invention is formalized and made explicit in the context of NLP. The main goal of this work is to show that it is possible to define such a taxonomy. While the taxonomy suggested here should be thought of as a “first attempt”, it is nonetheless coherent, covers the relevant topics well, coincides with what professional debaters actually argue in their speeches, and facilitates automatic argument invention for new topics. |
Tasks | |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08336v1 |
https://arxiv.org/pdf/1908.08336v1.pdf | |
PWC | https://paperswithcode.com/paper/argument-invention-from-first-principles-1 |
Repo | |
Framework | |
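The task described, ranking which commonplace arguments from a fixed taxonomy apply to a given controversial topic, can be caricatured with a plain TF-IDF similarity baseline. This is not the authors' system, and the taxonomy entries below are invented purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A tiny, hypothetical taxonomy of principled recurring arguments.
taxonomy = {
    "black market": "banning a product creates an illegal black market",
    "personal freedom": "adults should be free to make their own choices",
    "public health": "the policy protects the health of the population",
    "economic cost": "the measure imposes a large economic cost on society",
}

topic = "we should legalize the sale of organs"
vec = TfidfVectorizer().fit(list(taxonomy.values()) + [topic])
scores = cosine_similarity(vec.transform([topic]),
                           vec.transform(list(taxonomy.values())))[0]
for name, score in sorted(zip(taxonomy, scores), key=lambda p: -p[1]):
    print(f"{score:.2f}  {name}")  # rank taxonomy entries by relevance
```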
Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees
Title | Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees |
Authors | Shaohuai Shi, Zhenheng Tang, Qiang Wang, Kaiyong Zhao, Xiaowen Chu |
Abstract | To reduce the long training time of large deep neural network (DNN) models, distributed synchronous stochastic gradient descent (S-SGD) is commonly used on a cluster of workers. However, the speedup brought by multiple workers is limited by the communication overhead. Two approaches, namely pipelining and gradient sparsification, have been separately proposed to alleviate the impact of communication overheads. Yet, the gradient sparsification methods can only initiate the communication after the backpropagation, and hence miss the pipelining opportunity. In this paper, we propose a new distributed optimization method named LAGS-SGD, which combines S-SGD with a novel layer-wise adaptive gradient sparsification (LAGS) scheme. In LAGS-SGD, every worker independently selects a small set of “significant” gradients from each layer, whose size can adapt to the communication-to-computation ratio of that layer. The layer-wise nature of LAGS-SGD opens the opportunity of overlapping communications with computations, while its adaptive nature makes it flexible to control the communication time. We prove that LAGS-SGD has convergence guarantees and that it has the same order of convergence rate as vanilla S-SGD under a weak analytical assumption. Extensive experiments are conducted to verify the analytical assumption and the convergence performance of LAGS-SGD. Experimental results on a 16-GPU cluster show that LAGS-SGD outperforms the original S-SGD and existing sparsified S-SGD variants without noticeable loss of model accuracy. |
Tasks | Distributed Optimization |
Published | 2019-11-20 |
URL | https://arxiv.org/abs/1911.08727v4 |
https://arxiv.org/pdf/1911.08727v4.pdf | |
PWC | https://paperswithcode.com/paper/layer-wise-adaptive-gradient-sparsification |
Repo | |
Framework | |
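The core mechanism here, each worker independently keeping only the top-k "significant" gradient entries per layer and accumulating the rest locally, can be sketched as follows. This is a simplified single-worker view with a fixed per-layer k; the paper adapts k to each layer's communication-to-computation ratio:

```python
import numpy as np

def sparsify_layer(grad, residual, k):
    """Keep the k largest-magnitude entries of (grad + residual); the rest
    stay in the residual and are carried over to the next iteration."""
    acc = grad + residual
    idx = np.argpartition(np.abs(acc).ravel(), -k)[-k:]   # top-k by magnitude
    sparse = np.zeros_like(acc)
    sparse.ravel()[idx] = acc.ravel()[idx]                # what gets sent
    return sparse, acc - sparse                           # new local residual

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 64)), rng.normal(size=(128,))]
residuals = [np.zeros_like(g) for g in layers]

for i, (g, r) in enumerate(zip(layers, residuals)):
    k = max(1, g.size // 100)            # e.g. send ~1% of each layer
    sent, residuals[i] = sparsify_layer(g, r, k)
    # In LAGS-SGD the per-layer communication overlaps with backpropagation
    # of earlier layers, which whole-model sparsification schemes cannot do.
    print(f"layer {i}: sent {np.count_nonzero(sent)}/{g.size} entries")
```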
Goodness-of-fit Test for Latent Block Models
Title | Goodness-of-fit Test for Latent Block Models |
Authors | Chihiro Watanabe, Taiji Suzuki |
Abstract | Latent block models are used for probabilistic biclustering, which has been shown to be an effective method for analyzing various relational data sets. However, there has been no statistical test method for determining the row and column cluster numbers of latent block models. Recent studies have constructed statistical-test-based methods for stochastic block models, which assume that the observed matrix is a square symmetric matrix and that the cluster assignments are the same for rows and columns. In this study, we developed a new goodness-of-fit test for latent block models to test whether an observed data matrix fits a given set of row and column cluster numbers, or whether it consists of more clusters in at least one direction (rows or columns). To construct the test method, we used a result from random matrix theory for a sample covariance matrix. We experimentally demonstrated the effectiveness of the proposed method by showing the asymptotic behavior of the test statistic and measuring the test accuracy. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.03886v5 |
https://arxiv.org/pdf/1906.03886v5.pdf | |
PWC | https://paperswithcode.com/paper/goodness-of-fit-test-for-latent-block-models |
Repo | |
Framework | |
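The random-matrix ingredient can be illustrated in isolation: under the null, residuals behave like noise, so the top eigenvalue of their sample covariance should stay near the Marchenko-Pastur bulk edge (1 + sqrt(p/n))^2, while unmodeled clusters push it out. A toy sketch of that idea, not the paper's exact test statistic:

```python
import numpy as np

def top_cov_eigenvalue(X):
    """Largest eigenvalue of the sample covariance of an n x p matrix."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    return np.linalg.eigvalsh(Xc.T @ Xc / n)[-1]

rng = np.random.default_rng(0)
n, p = 2000, 200
edge = (1 + np.sqrt(p / n)) ** 2          # Marchenko-Pastur bulk edge

noise = rng.normal(size=(n, p))           # null: pure noise residuals
structured = noise.copy()
structured[:, :50] += rng.normal(size=(n, 1))  # hidden column cluster

print(f"MP edge        : {edge:.2f}")
print(f"null residuals : {top_cov_eigenvalue(noise):.2f}")       # near edge
print(f"missed cluster : {top_cov_eigenvalue(structured):.2f}")  # far above
```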
On the Convergence of Local Descent Methods in Federated Learning
Title | On the Convergence of Local Descent Methods in Federated Learning |
Authors | Farzin Haddadpour, Mehrdad Mahdavi |
Abstract | In federated distributed learning, the goal is to optimize a global training objective defined over distributed devices, where the data shard at each device is sampled from a possibly different distribution (a.k.a., heterogeneous or non-i.i.d. data samples). In this paper, we generalize local stochastic and full gradient descent with periodic averaging, originally designed for homogeneous distributed optimization, to solve nonconvex optimization problems in federated learning. Although only scant research is available on the effectiveness of local SGD in reducing the number of communication rounds in the homogeneous setting, its convergence and communication complexity in the heterogeneous setting are mostly demonstrated empirically and lack thorough theoretical understanding. To bridge this gap, we demonstrate that by properly analyzing the effect of unbiased gradients and sampling schema in the federated setting, under mild assumptions, the implicit variance reduction feature of local distributed methods generalizes to heterogeneous data shards and exhibits the best known convergence rates of the homogeneous setting, both for general nonconvex objectives and under the Polyak-Łojasiewicz (PL) condition (a generalization of strong convexity). Our theoretical results complement the recent empirical studies that demonstrate the applicability of local GD/SGD to federated learning. We also specialize the proposed local method for networked distributed optimization. To the best of our knowledge, the obtained convergence rates are the sharpest known to date on the convergence of local descent methods with periodic averaging for solving nonconvex federated optimization in both centralized and networked distributed optimization. |
Tasks | Distributed Optimization |
Published | 2019-10-31 |
URL | https://arxiv.org/abs/1910.14425v2 |
https://arxiv.org/pdf/1910.14425v2.pdf | |
PWC | https://paperswithcode.com/paper/on-the-convergence-of-local-descent-methods |
Repo | |
Framework | |
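The object of study, local gradient descent with periodic averaging over heterogeneous shards, is easy to state in code. A minimal sketch with full local gradients on per-device quadratics (hyperparameters are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_devices, dim, tau, lr, rounds = 10, 5, 8, 0.05, 50

# Heterogeneous shards: device i holds its own quadratic
# f_i(x) = 0.5 * ||x - c_i||^2 with a different optimum c_i.
centers = rng.normal(size=(n_devices, dim))
x = np.zeros((n_devices, dim))             # one local model per device

for _ in range(rounds):
    for _ in range(tau):                   # tau local descent steps
        x -= lr * (x - centers)            # grad of f_i at the local model
    x[:] = x.mean(axis=0)                  # periodic averaging (communication)

global_opt = centers.mean(axis=0)          # minimizer of (1/n) sum_i f_i
print(np.linalg.norm(x[0] - global_opt))   # small: local GD still converges
```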
A Visual Neural Network for Robust Collision Perception in Vehicle Driving Scenarios
Title | A Visual Neural Network for Robust Collision Perception in Vehicle Driving Scenarios |
Authors | Qinbing Fu, Nicola Bellotto, Huatian Wang, F. Claire Rind, Hongxin Wang, Shigang Yue |
Abstract | This research addresses the challenging problem of visual collision detection in very complex and dynamic real physical scenes, specifically, vehicle driving scenarios. It takes inspiration from a large-field looming-sensitive neuron, i.e., the lobula giant movement detector (LGMD) in the locust’s visual pathways, which responds with a high spike frequency to rapidly approaching objects. Building upon our previous models, in this paper we propose a novel inhibition mechanism that is capable of adapting to different levels of background complexity. This adaptive mechanism works effectively to mediate the local inhibition strength and tune the temporal latency of local excitation reaching the LGMD neuron. As a result, the proposed model is effective at extracting collision cues from complex dynamic visual scenes. We tested the proposed method using a range of stimuli including simulated movements in grating backgrounds and shifting of a natural panoramic scene, as well as vehicle crash video sequences. The experimental results demonstrate that the proposed method is feasible for fast collision perception in real-world situations, with potential applications in future autonomous vehicles. |
Tasks | Autonomous Vehicles |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02074v1 |
http://arxiv.org/pdf/1904.02074v1.pdf | |
PWC | https://paperswithcode.com/paper/a-visual-neural-network-for-robust-collision |
Repo | |
Framework | |
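A highly simplified LGMD-style pipeline (frame-difference excitation, blurred and delayed lateral inhibition, a single membrane potential, a sigmoid output) shows the general shape of such models; the paper's contribution, the adaptive inhibition mechanism, is not reproduced here:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lgmd_response(frames, w_inh=0.6):
    """Toy looming detector: excitation is the absolute frame difference,
    inhibition is the previous excitation spatially blurred (one frame of
    delay plus lateral spread); the potential sums the rectified difference."""
    prev_frame, prev_exc = frames[0], np.zeros_like(frames[0])
    outputs = []
    for frame in frames[1:]:
        excitation = np.abs(frame - prev_frame)
        inhibition = uniform_filter(prev_exc, size=5)    # delayed + blurred
        s = np.maximum(excitation - w_inh * inhibition, 0.0)
        potential = s.sum() / s.size
        outputs.append(1.0 / (1.0 + np.exp(-potential)))  # sigmoid output
        prev_frame, prev_exc = frame, excitation
    return outputs

# Synthetic looming stimulus: a dark square growing frame by frame.
frames = np.ones((20, 100, 100))
for t in range(20):
    r = 5 + 2 * t
    frames[t, 50 - r:50 + r, 50 - r:50 + r] = 0.0
print(["%.3f" % o for o in lgmd_response(frames)])  # rises as the square looms
```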
Sample-Optimal Low-Rank Approximation of Distance Matrices
Title | Sample-Optimal Low-Rank Approximation of Distance Matrices |
Authors | Piotr Indyk, Ali Vakilian, Tal Wagner, David Woodruff |
Abstract | A distance matrix $A \in \mathbb R^{n \times m}$ represents all pairwise distances, $A_{ij}=\mathrm{d}(x_i,y_j)$, between two point sets $x_1,…,x_n$ and $y_1,…,y_m$ in an arbitrary metric space $(\mathcal Z, \mathrm{d})$. Such matrices arise in various computational contexts such as learning image manifolds, handwriting recognition, and multi-dimensional unfolding. In this work we study algorithms for low-rank approximation of distance matrices. Recent work by Bakshi and Woodruff (NeurIPS 2018) showed it is possible to compute a rank-$k$ approximation of a distance matrix in time $O((n+m)^{1+\gamma}) \cdot \mathrm{poly}(k,1/\epsilon)$, where $\epsilon>0$ is an error parameter and $\gamma>0$ is an arbitrarily small constant. Notably, their bound is sublinear in the matrix size, which is unachievable for general matrices. We present an algorithm that is both simpler and more efficient. It reads only $O((n+m) k/\epsilon)$ entries of the input matrix, and has a running time of $O(n+m) \cdot \mathrm{poly}(k,1/\epsilon)$. We complement the sample complexity of our algorithm with a matching lower bound on the number of entries that must be read by any algorithm. We provide experimental results to validate the approximation quality and running time of our algorithm. |
Tasks | |
Published | 2019-06-02 |
URL | https://arxiv.org/abs/1906.00339v1 |
https://arxiv.org/pdf/1906.00339v1.pdf | |
PWC | https://paperswithcode.com/paper/sample-optimal-low-rank-approximation-of |
Repo | |
Framework | |
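The headline result is a sublinear algorithm reading only O((n+m)k/epsilon) entries. The authors' exact estimator is not reproduced here, but a simple uniform row/column-sampling sketch conveys the flavor: build a rank-k basis from a few sampled columns, then fit coefficients from a few sampled rows, reading only s(n+m) entries:

```python
import numpy as np

def sampled_low_rank(dist_fn, n, m, k, s):
    """Rank-k factors of an n x m distance matrix from s sampled columns and
    s sampled rows (a simplified uniform-sampling scheme; the paper's
    algorithm comes with matching sample-complexity guarantees)."""
    rng = np.random.default_rng(0)
    cols = rng.choice(m, size=s, replace=False)
    C = np.array([[dist_fn(i, j) for j in cols] for i in range(n)])  # n x s
    U = np.linalg.svd(C, full_matrices=False)[0][:, :k]  # column-space basis
    rows = rng.choice(n, size=s, replace=False)
    R = np.array([[dist_fn(i, j) for j in range(m)] for i in rows])  # s x m
    V = np.linalg.lstsq(U[rows], R, rcond=None)[0]   # fit on sampled rows
    return U, V                                      # A ~= U @ V

# Distance matrix between two random point sets in the plane.
rng = np.random.default_rng(1)
X, Y = rng.normal(size=(200, 2)), rng.normal(size=(300, 2))
d = lambda i, j: np.linalg.norm(X[i] - Y[j])
U, V = sampled_low_rank(d, 200, 300, k=5, s=40)
A = np.array([[d(i, j) for j in range(300)] for i in range(200)])
print(np.linalg.norm(A - U @ V) / np.linalg.norm(A))  # small relative error
```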
LSTM-Assisted Evolutionary Self-Expressive Subspace Clustering
Title | LSTM-Assisted Evolutionary Self-Expressive Subspace Clustering |
Authors | Di Xu, Tianhang Long, Junbin Gao |
Abstract | Massive volumes of high-dimensional data that evolve over time are continuously collected by contemporary information processing systems, which raises the problem of organizing this data into clusters, i.e., achieving dimensionality reduction, while simultaneously learning its temporal evolution patterns. In this paper, a framework for evolutionary subspace clustering, referred to as LSTM-ESCM, is introduced, which aims at clustering a set of evolving high-dimensional data points that lie in a union of low-dimensional evolving subspaces. In order to obtain a parsimonious data representation at each time step, we propose to exploit the so-called self-expressive trait of the data at each time point. At the same time, LSTM networks are implemented to extract the inherent temporal patterns behind the data over the overall time frame. An efficient algorithm is implemented in MATLAB. Experiments are then carried out on real-world datasets to demonstrate the effectiveness of our proposed approach, and the results show that the suggested algorithm dramatically outperforms other known similar approaches in terms of both run time and accuracy. |
Tasks | |
Published | 2019-10-19 |
URL | https://arxiv.org/abs/1910.08862v1 |
https://arxiv.org/pdf/1910.08862v1.pdf | |
PWC | https://paperswithcode.com/paper/lstm-assisted-evolutionary-self-expressive |
Repo | |
Framework | |
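The self-expressive step at each time point, representing every data point as a combination of the others, has a simple ridge-regularized form. A sketch of that one ingredient (the LSTM component that carries the representation across time steps is omitted):

```python
import numpy as np

def self_expressive_codes(X, lam=0.1):
    """Solve min_C ||X - X C||_F^2 + lam ||C||_F^2 with a zero diagonal,
    enforced crudely: each column of X is regressed on all the others."""
    d, n = X.shape
    C = np.zeros((n, n))
    for j in range(n):
        others = np.delete(np.arange(n), j)
        A = X[:, others]
        c = np.linalg.solve(A.T @ A + lam * np.eye(n - 1), A.T @ X[:, j])
        C[others, j] = c
    return C

# Two noisy 1-D subspaces in R^5; spectral clustering would run on W.
rng = np.random.default_rng(0)
b1, b2 = rng.normal(size=(5, 1)), rng.normal(size=(5, 1))
X = np.hstack([b1 @ rng.normal(size=(1, 20)), b2 @ rng.normal(size=(1, 20))])
X += 0.01 * rng.normal(size=X.shape)
C = self_expressive_codes(X)
W = np.abs(C) + np.abs(C).T
within = W[:20, :20].sum() + W[20:, 20:].sum()
print(within / W.sum())   # close to 1: affinities respect the two subspaces
```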
On the convergence of projective-simulation-based reinforcement learning in Markov decision processes
Title | On the convergence of projective-simulation-based reinforcement learning in Markov decision processes |
Authors | Jens Clausen, Walter L. Boyajian, Lea M. Trenkwalder, Vedran Dunjko, Hans J. Briegel |
Abstract | In recent years, the interest in leveraging quantum effects for enhancing machine learning tasks has significantly increased. Many algorithms for speeding up supervised and unsupervised learning have been established. Projective simulation was the first framework in which ways to exploit quantum resources specifically for the broader context of reinforcement learning were found. Projective simulation presents an agent-based reinforcement learning approach designed in a manner which may support quantum walk-based speed-ups. Although classical variants of projective simulation have been benchmarked against common reinforcement learning algorithms, very few formal theoretical analyses have been provided for its performance in standard learning scenarios. In this paper, we provide a detailed formal discussion of the properties of this model. Specifically, we prove that one version of the projective simulation model, understood as a reinforcement learning approach, converges to optimal behavior in a large class of Markov decision processes. This proof shows that a physically inspired approach to reinforcement learning can be guaranteed to converge. |
Tasks | |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11914v1 |
https://arxiv.org/pdf/1910.11914v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-convergence-of-projective-simulation |
Repo | |
Framework | |
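The classical projective-simulation agent the convergence result concerns is simple to implement: a clip network with h-values, a random walk from percept to action, and reward-driven h-updates with damping. A minimal two-layer sketch using the standard PS update rule (parameters are illustrative):

```python
import numpy as np

class PSAgent:
    """Two-layer projective simulation agent: percept clips -> action clips.
    Hopping probabilities come from h-values; learning uses the standard
    damped update h <- h - gamma * (h - 1), plus reward on the used edge."""
    def __init__(self, n_percepts, n_actions, gamma=0.02):
        self.h = np.ones((n_percepts, n_actions))
        self.gamma = gamma
        self.rng = np.random.default_rng(0)

    def act(self, percept):
        p = self.h[percept] / self.h[percept].sum()   # one random-walk hop
        return self.rng.choice(len(p), p=p)

    def learn(self, percept, action, reward):
        self.h -= self.gamma * (self.h - 1.0)         # damping toward h = 1
        self.h[percept, action] += reward             # reinforce used edge

# Trivial environment: the correct action equals the percept.
agent = PSAgent(n_percepts=4, n_actions=4)
for t in range(5000):
    s = t % 4
    a = agent.act(s)
    agent.learn(s, a, reward=1.0 if a == s else 0.0)
print((agent.h.argmax(axis=1) == np.arange(4)).all())  # learned the mapping
```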
SwarmSGD: Scalable Decentralized SGD with Local Updates
Title | SwarmSGD: Scalable Decentralized SGD with Local Updates |
Authors | Giorgi Nadiradze, Amirmojtaba Sabour, Dan Alistarh, Aditya Sharma, Ilia Markov, Vitaly Aksenov |
Abstract | The ability to scale distributed optimization to large node counts has been one of the main enablers of recent progress in machine learning. To this end, several techniques have been explored, such as asynchronous and decentralized execution, which significantly reduce the impact of communication and synchronization, and the ability for nodes to perform several local model updates before communicating, which reduces the frequency of communication. In this paper, we show that these techniques, which have so far been considered independently, can be jointly leveraged to obtain near-perfect scalability for training neural network models via stochastic gradient descent (SGD). We consider a setting with minimal coordination: we have a large number of nodes on a communication graph, each with a local subset of data, performing independent SGD updates on their local models. After some number of local updates, each node chooses an interaction partner uniformly at random from its neighbors, and averages its local model with the neighbor’s model. Our first contribution is in proving that, even under such a relaxed setting, SGD can still be guaranteed to converge to local minima under standard assumptions. The proof improves existing techniques by jointly handling decentralization, asynchrony, and local updates, and by bounding their impact. On the practical side, we instantiate this algorithm in a supercomputing environment, and show that it can successfully converge and scale for large-scale image classification models, matching or even slightly improving on the accuracy of the baseline parallel variants. |
Tasks | Distributed Optimization, Image Classification |
Published | 2019-10-27 |
URL | https://arxiv.org/abs/1910.12308v2 |
https://arxiv.org/pdf/1910.12308v2.pdf | |
PWC | https://paperswithcode.com/paper/popsgd-decentralized-stochastic-gradient-1 |
Repo | |
Framework | |
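The protocol described, local SGD steps followed by pairwise model averaging with a random neighbor, is concise in code. A toy simulation on per-node quadratics over a ring graph (not the authors' supercomputing implementation; all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, local_steps, lr, interactions = 16, 4, 4, 0.05, 2000

targets = rng.normal(size=(n_nodes, dim))   # node i minimizes ||x - t_i||^2 / 2
models = np.zeros((n_nodes, dim))
ring = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}

for _ in range(interactions):
    i = rng.integers(n_nodes)                  # a node wakes up
    for _ in range(local_steps):               # independent local SGD steps
        noise = 0.1 * rng.normal(size=dim)
        models[i] -= lr * (models[i] - targets[i] + noise)
    j = rng.choice(ring[i])                    # random neighbor on the graph
    avg = (models[i] + models[j]) / 2          # pairwise gossip average
    models[i] = models[j] = avg

consensus_gap = np.linalg.norm(models - models.mean(axis=0))
opt_gap = np.linalg.norm(models.mean(axis=0) - targets.mean(axis=0))
print(consensus_gap, opt_gap)                  # both small after mixing
```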
Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks
Title | Accelerated Primal-Dual Algorithms for Distributed Smooth Convex Optimization over Networks |
Authors | Jinming Xu, Ye Tian, Ying Sun, Gesualdo Scutari |
Abstract | This paper proposes a novel family of primal-dual-based distributed algorithms for smooth, convex, multi-agent optimization over networks that use only gradient information and gossip communications. The algorithms can also employ acceleration on the computation and communications. We provide a unified analysis of their convergence rate, measured in terms of the Bregman distance associated with the saddle point reformulation of the distributed optimization problem. When acceleration is employed, the rate is shown to be optimal, in the sense that it matches (under the proposed metric) existing complexity lower bounds for distributed algorithms applicable to this class of problems that use only gradient information and gossip communications. Preliminary numerical results on distributed least-squares regression problems show that the proposed algorithms compare favorably with existing distributed schemes. |
Tasks | Distributed Optimization |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10666v2 |
https://arxiv.org/pdf/1910.10666v2.pdf | |
PWC | https://paperswithcode.com/paper/accelerated-primal-dual-algorithms-for |
Repo | |
Framework | |
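To make the setting concrete, here is a basic, non-accelerated primal-dual consensus method of the general kind the paper's family builds on, applied to distributed least squares. Step sizes and the gossip matrix are illustrative, and this is not the paper's accelerated scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 8, 3
A = rng.normal(size=(n, 10, dim))           # agent i holds (A_i, b_i)
b = rng.normal(size=(n, 10))

# Symmetric doubly stochastic gossip matrix W on a ring (lazy Metropolis).
W = np.eye(n) * 0.5
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

x = np.zeros((n, dim))                      # primal variable per agent
y = np.zeros((n, dim))                      # dual variable for consensus
alpha, beta = 0.02, 0.02
for _ in range(5000):
    resid = np.einsum('ikd,id->ik', A, x) - b
    grad = np.einsum('ikd,ik->id', A, resid)      # grad of 0.5||A_i x - b_i||^2
    x = W @ x - alpha * (grad + y)                # primal: gossip + grad + dual
    y = y + beta * ((np.eye(n) - W) @ x)          # dual ascent on consensus gap

x_star = np.linalg.lstsq(A.reshape(-1, dim), b.ravel(), rcond=None)[0]
print(np.linalg.norm(x - x_star, axis=1).max())   # agents near the optimum
```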
DE/RM-MEDA: A New Hybrid Multi-Objective Generator
Title | DE/RM-MEDA: A New Hybrid Multi-Objective Generator |
Authors | Abel Sa J. R Malano, Guanjun Du, Guoxiang Tong, Naixue Xiong |
Abstract | Under the Karush-Kuhn-Tucker conditions, the Pareto set (PS) in the decision space of an m-objective optimization problem is a piecewise continuous (m-1)-dimensional manifold. To illustrate the degree of convergence of the population, we employ the ratio of the sum of the first (m-1) largest eigenvalues of the population’s covariance matrix to the sum of all eigenvalues. Based on this property, this paper proposes a new algorithm, called DE/RM-MEDA, which mixes differential evolution (DE) and the estimation of distribution algorithm (EDA) to generate new solutions and adaptively adjusts their number according to this ratio. The proposed algorithm is evaluated on nine tec09 problems. Comparisons between DE/RM-MEDA and the other algorithms, NSGA-II-DE and RM-MEDA, show that the proposed algorithm performs better in terms of convergence and diversity metrics. |
Tasks | |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.10995v1 |
https://arxiv.org/pdf/1911.10995v1.pdf | |
PWC | https://paperswithcode.com/paper/derm-meda-a-new-hybrid-multi-objective |
Repo | |
Framework | |
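The convergence indicator from the abstract, the fraction of the covariance spectrum captured by the first (m-1) eigenvalues, is near 1 once the population has collapsed onto the (m-1)-dimensional Pareto-set manifold and is easy to compute. A sketch for m = 2 objectives, where the PS is a 1-D curve:

```python
import numpy as np

def manifold_ratio(population, m):
    """Ratio of the top (m-1) eigenvalues of the population covariance to the
    sum of all eigenvalues; close to 1 when the population lies near an
    (m-1)-dimensional manifold, as the Pareto set does under KKT conditions."""
    eig = np.linalg.eigvalsh(np.cov(population.T))[::-1]  # descending order
    return eig[: m - 1].sum() / eig.sum()

rng = np.random.default_rng(0)
m = 2                                       # two objectives: the PS is a curve
early = rng.uniform(size=(100, 3))          # scattered early population
t = rng.uniform(size=(100, 1))
late = np.hstack([t, t**2, t**3]) + 0.01 * rng.normal(size=(100, 3))

print(manifold_ratio(early, m))  # far from 1: still exploring
print(manifold_ratio(late, m))   # near 1: collapsed onto a 1-D manifold
# Per the abstract, DE/RM-MEDA uses this ratio to adaptively adjust how many
# offspring come from the DE operator versus the EDA manifold model.
```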