Paper Group ANR 1262
Parallel Restarted SPIDER – Communication Efficient Distributed Nonconvex Optimization with Optimal Computation Complexity. From Personalization to Privatization: Meta Matrix Factorization for Private Rating Predictions. Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision. A Comparative Analysis of Expected and Distributional Reinforcement Learning. Bayesian Optimization for Categorical and Category-Specific Continuous Inputs. On the Convergence of Adam and Beyond. QLMC-HD: Quasi Large Margin Classifier based on Hyperdisk. Capabilities and Limitations of Time-lagged Autoencoders for Slow Mode Discovery in Dynamical Systems. Sample-specific repetitive learning for photo aesthetic assessment and highlight region extraction. Image Transformation can make Neural Networks more robust against Adversarial Examples. Convolutional Self-Attention Networks. An automated approach for task evaluation using EEG signals. ISLET: Fast and Optimal Low-rank Tensor Regression via Importance Sketching. Factorization Bandits for Online Influence Maximization. A Neural Model for Dialogue Coherence Assessment.
Parallel Restarted SPIDER – Communication Efficient Distributed Nonconvex Optimization with Optimal Computation Complexity
Title | Parallel Restarted SPIDER – Communication Efficient Distributed Nonconvex Optimization with Optimal Computation Complexity |
Authors | Pranay Sharma, Prashant Khanduri, Saikiran Bulusu, Ketan Rajawat, Pramod K. Varshney |
Abstract | In this paper, we propose a distributed algorithm for smooth, non-convex stochastic optimization. We assume a worker-server architecture where $N$ nodes, each having $n$ (potentially infinite) samples, collaborate with the help of a central server to perform the optimization task. The global objective is to minimize the average of local cost functions available at individual nodes. The proposed approach is a non-trivial extension of the popular parallel-restarted SGD algorithm, incorporating the optimal variance-reduction-based SPIDER gradient estimator into it. We prove convergence of our algorithm to a first-order stationary solution. The proposed approach achieves the best known communication complexity $O(\epsilon^{-1})$ along with the optimal computation complexity. For finite-sum problems (finite $n$), we achieve the optimal computation (IFO) complexity $O(\sqrt{Nn}\epsilon^{-1})$. For online problems ($n$ unknown or infinite), we achieve the optimal IFO complexity $O(\epsilon^{-3/2})$. In both cases, we maintain the linear speedup achieved by existing methods. This is a massive improvement over the $O(\epsilon^{-2})$ IFO complexity of the existing approaches. Additionally, our algorithm is general enough to allow non-identical distributions of data across workers, as in the recently proposed federated learning paradigm. |
Tasks | |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.06036v1 |
https://arxiv.org/pdf/1912.06036v1.pdf | |
PWC | https://paperswithcode.com/paper/parallel-restarted-spider-communication |
Repo | |
Framework | |
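To make the algorithm's structure concrete, here is a minimal numpy simulation of a parallel restarted SPIDER loop on a toy least-squares problem. The objective, batch size, step size, and full-gradient restart are illustrative assumptions rather than the paper's tuned choices; only the SPIDER recursion and the periodic server averaging follow the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: N workers, each with n least-squares samples (rows of A[i], B[i]);
# the global objective is the average of the local costs.
N, n, d = 4, 1000, 10
A = rng.normal(size=(N, n, d))
x_true = rng.normal(size=d)
B = A @ x_true + 0.1 * rng.normal(size=(N, n))

def grad(i, x, idx):
    """Mini-batch gradient of worker i's local least-squares loss."""
    a, b = A[i, idx], B[i, idx]
    return a.T @ (a @ x - b) / len(idx)

R, q, lr, batch = 10, 20, 0.05, 32   # rounds, local steps, step size, batch
x = np.zeros(d)                       # server iterate

for _ in range(R):
    # Restart: every worker evaluates a full local gradient at the shared point.
    v = np.stack([grad(i, x, np.arange(n)) for i in range(N)])
    xi = np.tile(x, (N, 1))           # local iterates
    xp = xi.copy()                    # previous local iterates
    for _ in range(q):                # q local steps, no communication
        for i in range(N):
            idx = rng.integers(0, n, size=batch)
            # SPIDER recursion: v <- g(x_t) - g(x_{t-1}) + v, same mini-batch.
            v[i] = grad(i, xi[i], idx) - grad(i, xp[i], idx) + v[i]
        xp = xi.copy()
        xi = xi - lr * v
    x = xi.mean(axis=0)               # one communication round: average iterates

print("distance to optimum:", np.linalg.norm(x - x_true))
```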
From Personalization to Privatization: Meta Matrix Factorization for Private Rating Predictions
Title | From Personalization to Privatization: Meta Matrix Factorization for Private Rating Predictions |
Authors | Yujie Lin, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Dongxiao Yu, Jun Ma, Maarten de Rijke, Xiuzhen Cheng |
Abstract | Matrix factorization (MF) techniques have been shown to be effective for rating predictions (RPs) in personalized recommender systems. Existing MF methods use the same item embeddings and the same RP model for all users, while ignoring the possibility that different users may have different views about the same item and may favor different RP models. We introduce a novel MF framework, named meta matrix factorization (MetaMF), that generates private item embeddings and RP models. Given a vector representing a user, we first obtain a collaborative vector by collecting useful information from all users with a collaborative memory (CM) module. Then, we employ a meta recommender (MR) module to generate private item embeddings and an RP model based on the collaborative vector. To address the challenge of generating a large number of high-dimensional item embeddings, we devise a rise-dimensional generation (RG) strategy that first generates a low-dimensional item embedding matrix and a rise-dimensional matrix, and then multiplies them to obtain high-dimensional embeddings. Finally, we use the generated model to produce private RPs for a given user. Experiments on two benchmark datasets show that MetaMF outperforms state-of-the-art MF methods. MetaMF generates similar/dissimilar item embeddings and models for different users to flexibly exploit collaborative filtering (CF), demonstrating the benefits of MetaMF. |
Tasks | Recommendation Systems |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.10086v1 |
https://arxiv.org/pdf/1910.10086v1.pdf | |
PWC | https://paperswithcode.com/paper/from-personalization-to-privatization-meta |
Repo | |
Framework | |
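The rise-dimensional generation (RG) strategy is the most code-like idea in the abstract. The sketch below, with random linear maps standing in for the learned CM/MR modules (all sizes and weights are hypothetical), shows how generating two small factors and multiplying them avoids emitting a full item-embedding matrix directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: I items, final embedding dim d, low rank k,
# collaborative-vector dimension c.
I, d, k, c = 5000, 64, 8, 32

# Stand-ins for the learned MR module: random linear maps from the
# collaborative vector to the two factors (hypothetical parameters).
W_low  = rng.normal(size=(c, I * k)) * 0.01   # -> low-dim item matrix (I x k)
W_rise = rng.normal(size=(c, k * d)) * 0.01   # -> rise matrix (k x d)

def generate_item_embeddings(collab_vec):
    """Rise-dimensional generation: emit two small matrices, multiply them.

    Generating I*d values directly would need a c -> I*d output layer;
    the RG strategy only needs c -> I*k and c -> k*d outputs.
    """
    low  = (collab_vec @ W_low ).reshape(I, k)
    rise = (collab_vec @ W_rise).reshape(k, d)
    return low @ rise                          # private (I x d) embeddings

u = rng.normal(size=c)                         # one user's collaborative vector
E_u = generate_item_embeddings(u)
print(E_u.shape)                               # (5000, 64)
print("output params, direct vs RG:", c * I * d, c * (I * k + k * d))
```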
Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision
Title | Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision |
Authors | Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu, Lixin Duan |
Abstract | Action recognition in videos has attracted a lot of attention in the past decade. In order to learn robust models, previous methods usually assume videos are trimmed into short sequences and require ground-truth annotations of each video frame/sequence, which is quite costly and time-consuming. In this paper, given only video-level annotations, we propose a novel weakly supervised framework to simultaneously locate action frames and recognize actions in untrimmed videos. Our proposed framework consists of two major components. First, for action frame localization, we take advantage of the self-attention mechanism to weight each frame, such that the influence of background frames can be effectively eliminated. Second, considering that publicly available trimmed videos contain useful information to leverage, we present an additional module to transfer knowledge from trimmed videos to improve classification performance on untrimmed ones. Extensive experiments are conducted on two benchmark datasets (i.e., THUMOS14 and ActivityNet1.3), and the experimental results clearly corroborate the efficacy of our method. |
Tasks | Action Recognition In Videos, Temporal Action Localization |
Published | 2019-02-20 |
URL | http://arxiv.org/abs/1902.07370v1 |
http://arxiv.org/pdf/1902.07370v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-transferable-self-attentive |
Repo | |
Framework | |
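A minimal sketch of the frame-weighting idea: self-attention scores each frame, the softmax weights pool frame features into a video-level representation for weakly supervised classification, and the same weights flag candidate action frames. The feature extractor and all parameters here are random stand-ins, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# T frame-level features (e.g. from a pretrained CNN), dimension d.
T, d, C = 120, 256, 20                 # frames, feature dim, action classes
feats = rng.normal(size=(T, d))

# Hypothetical attention and classifier parameters.
w_att = rng.normal(size=d) * 0.05      # scores each frame
W_cls = rng.normal(size=(d, C)) * 0.05

alpha = softmax(feats @ w_att)         # per-frame attention weights
video_feat = alpha @ feats             # weighted pooling suppresses background
logits = video_feat @ W_cls            # video-level prediction (weak labels)

# The same weights localize actions: frames with above-average alpha
# are candidate action frames.
action_frames = np.where(alpha > 1.0 / T)[0]
print(logits.shape, action_frames[:10])
```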
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Title | A Comparative Analysis of Expected and Distributional Reinforcement Learning |
Authors | Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare |
Abstract | Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard approach which models expected values (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin the investigation into this fundamental question by analyzing the differences in the tabular, linear approximation, and non-linear approximation settings. We prove that in many realizations of the tabular and linear approximation settings, distributional RL behaves exactly the same as expected RL. In cases where the two methods behave differently, distributional RL can in fact hurt performance when it does not induce identical behaviour. We then continue with an empirical analysis comparing distributional and expected RL methods in control settings with non-linear approximators to tease apart where the improvements from distributional RL methods are coming from. |
Tasks | Distributional Reinforcement Learning |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.11084v2 |
http://arxiv.org/pdf/1901.11084v2.pdf | |
PWC | https://paperswithcode.com/paper/a-comparative-analysis-of-expected-and |
Repo | |
Framework | |
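For readers unfamiliar with the two update rules being compared, here is a tabular sketch contrasting an expected (Q-learning style) update with a categorical distributional update using the standard C51-style projection onto a fixed support; the support, mixing rate, and rewards are illustrative.

```python
import numpy as np

# Tabular comparison: expected RL tracks a value Q(s, a); categorical
# distributional RL tracks a distribution over returns on a fixed support.
gamma = 0.9
support = np.linspace(-10.0, 10.0, 51)          # return atoms
dz = support[1] - support[0]

def expected_update(q, r, q_next, lr=0.1):
    """Standard expected update toward r + gamma * q_next."""
    return q + lr * (r + gamma * q_next - q)

def categorical_update(p, r, p_next, lr=0.1):
    """Distributional update: shift/shrink the next-state distribution by
    (r, gamma), project it back onto the fixed support, then mix."""
    target = np.zeros_like(p)
    for prob, z in zip(p_next, support):
        tz = np.clip(r + gamma * z, support[0], support[-1])
        b = (tz - support[0]) / dz              # fractional atom index
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        target[lo] += prob * (hi - b if hi != lo else 1.0)
        if hi != lo:
            target[hi] += prob * (b - lo)
    return (1 - lr) * p + lr * target

p0 = np.full(51, 1 / 51)                        # uniform initial return dist.
q = expected_update(q=0.0, r=1.0, q_next=p0 @ support)
p = categorical_update(p0, r=1.0, p_next=p0)
# Both means move identically here, echoing the paper's tabular
# equivalence results.
print("distributional mean:", p @ support, " expected value:", q)
```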
Bayesian Optimization for Categorical and Category-Specific Continuous Inputs
Title | Bayesian Optimization for Categorical and Category-Specific Continuous Inputs |
Authors | Dang Nguyen, Sunil Gupta, Santu Rana, Alistair Shilton, Svetha Venkatesh |
Abstract | Many real-world functions are defined over both categorical and category-specific continuous variables and thus cannot be optimized by traditional Bayesian optimization (BO) methods. To optimize such functions, we propose a new method that formulates the problem as a multi-armed bandit problem, wherein each category corresponds to an arm with its reward distribution centered around the optimum of the objective function in the continuous variables. Our goal is to identify the best arm and the maximizer of the corresponding continuous function simultaneously. Our algorithm uses a Thompson sampling scheme that helps connect the multi-armed bandit and BO in a unified framework. We extend our method to batch BO to allow parallel optimization when multiple resources are available. We theoretically analyze our method for convergence and prove sub-linear regret bounds. We perform a variety of experiments: optimization of several benchmark functions, hyper-parameter tuning of a neural network, and automatic selection of the best machine learning model along with its optimal hyper-parameters (a.k.a. automated machine learning). Comparisons with other methods demonstrate the effectiveness of our proposed method. |
Tasks | |
Published | 2019-11-28 |
URL | https://arxiv.org/abs/1911.12473v1 |
https://arxiv.org/pdf/1911.12473v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-optimization-for-categorical-and |
Repo | |
Framework | |
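A rough sketch of the bandit-plus-BO loop under strong simplifications: Thompson sampling uses a crude Gaussian posterior per arm, and local random search around the incumbent stands in for maximizing a GP-based acquisition function inside the chosen category. The objective and all hyper-parameters are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Objective over (category c, category-specific continuous x in [0, 1]).
def f(c, x):
    optima = [0.2, 0.5, 0.8]                 # per-category optima (toy)
    return -(x - optima[c]) ** 2 + 0.1 * c + 0.01 * rng.normal()

K, T = 3, 150
data = [([], []) for _ in range(K)]          # per-arm (xs, ys) history

for t in range(T):
    # Thompson sampling over arms: draw each arm's value from a crude
    # Gaussian posterior around its mean observed reward.
    draws = []
    for c in range(K):
        ys = data[c][1]
        mu = np.mean(ys) if ys else 0.0
        draws.append(rng.normal(mu, 1.0 / np.sqrt(len(ys) + 1)))
    c = int(np.argmax(draws))

    # Within the chosen category, a BO step would maximize an acquisition
    # function on a GP; local random search around the incumbent stands in.
    xs, ys = data[c]
    if xs:
        x = float(np.clip(xs[int(np.argmax(ys))] + 0.1 * rng.normal(), 0, 1))
    else:
        x = rng.uniform()
    y = f(c, x)
    xs.append(x)
    ys.append(y)

best = max(range(K), key=lambda c: max(data[c][1], default=-np.inf))
print("best arm:", best,
      "best x:", data[best][0][int(np.argmax(data[best][1]))])
```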
On the Convergence of Adam and Beyond
Title | On the Convergence of Adam and Beyond |
Authors | Sashank J. Reddi, Satyen Kale, Sanjiv Kumar |
Abstract | Several recently proposed stochastic optimization methods that have been successfully used in training deep networks, such as RMSProp, Adam, Adadelta, and Nadam, are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients. In many applications, e.g. learning with large output spaces, it has been empirically observed that these algorithms fail to converge to an optimal solution (or a critical point in nonconvex settings). We show that one cause of such failures is the exponential moving average used in the algorithms. We provide an explicit example of a simple convex optimization setting where Adam does not converge to the optimal solution, and describe the precise problems with the previous analysis of the Adam algorithm. Our analysis suggests that the convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients, and we propose new variants of the Adam algorithm which not only fix the convergence issues but often also lead to improved empirical performance. |
Tasks | Stochastic Optimization |
Published | 2019-04-19 |
URL | http://arxiv.org/abs/1904.09237v1 |
http://arxiv.org/pdf/1904.09237v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-convergence-of-adam-and-beyond-1 |
Repo | |
Framework | |
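The best-known variant proposed by this paper is AMSGrad, whose entire difference from Adam is one line: the second-moment term is kept non-decreasing. A self-contained numpy sketch, with bias correction omitted for simplicity:

```python
import numpy as np

def amsgrad(grad_fn, x0, lr=0.01, b1=0.9, b2=0.999, eps=1e-8, steps=1000):
    """AMSGrad: Adam with a non-decreasing second-moment term.

    The only change from Adam is v_hat = max(v_hat, v), which gives the
    update rule 'long-term memory' of large past gradients.
    """
    x = x0.astype(float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    v_hat = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        v_hat = np.maximum(v_hat, v)          # the AMSGrad modification
        x -= lr * m / (np.sqrt(v_hat) + eps)  # effective step never inflates
    return x

# Example: minimize a simple quadratic with gradient 2(x - 3).
x_min = amsgrad(lambda x: 2 * (x - 3.0), np.array([0.0]))
print(x_min)   # close to 3.0
```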
QLMC-HD: Quasi Large Margin Classifier based on Hyperdisk
Title | QLMC-HD: Quasi Large Margin Classifier based on Hyperdisk |
Authors | Hassan Ataeian, Shahriar Esmaeili, Ali Amiri, Hossein Safari |
Abstract | In the area of data classification, different classifiers have been developed, each with its own strengths and weaknesses. Among these classifiers, we propose a method that is based on the maximum margin between two classes. One of the main challenges in this area is dealing with noisy data. In this paper, our aim is to optimize the method of large margin classifiers based on hyperdisks (LMC-HD) and incorporate it into a quasi-support vector data description (QSVDD) method. In the proposed method, the bounding hypersphere is calculated based on the QSVDD method, so our convex class model is more robust than the support vector machine (SVM) and less tight than LMC-HD. Large margin classifiers aim to maximize the margin while minimizing the risk. Since our proposed method ignores the effect of outliers and noise, it has the widest margin compared with other large margin classifiers. Finally, we compare our proposed method with other popular large margin classifiers through experiments on a set of standard datasets, which indicate that our method is more efficient than the others. |
Tasks | |
Published | 2019-02-26 |
URL | https://arxiv.org/abs/1902.09692v3 |
https://arxiv.org/pdf/1902.09692v3.pdf | |
PWC | https://paperswithcode.com/paper/qlmc-hd-quasi-large-margin-classifier-based |
Repo | |
Framework | |
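A loose sketch of the bounding-hypersphere idea: each class is summarized by a sphere whose radius covers most (not all) of its points, so outliers and noise do not stretch the class model, and a test point is assigned to the class whose sphere boundary it is closer to. The centroid center and quantile radius below are crude stand-ins for the QSVDD optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_hypersphere(X, quantile=0.9):
    """Crude stand-in for a QSVDD bounding hypersphere: centroid center,
    radius covering `quantile` of the class (so outliers are ignored)."""
    center = X.mean(axis=0)
    radius = np.quantile(np.linalg.norm(X - center, axis=1), quantile)
    return center, radius

def classify(x, sph1, sph2):
    """Assign x to the class whose hypersphere boundary it is closer to."""
    d1 = np.linalg.norm(x - sph1[0]) - sph1[1]
    d2 = np.linalg.norm(x - sph2[0]) - sph2[1]
    return 1 if d1 < d2 else 2

# Two noisy 2-D classes.
X1 = rng.normal([0, 0], 1.0, size=(200, 2))
X2 = rng.normal([5, 5], 1.0, size=(200, 2))
s1, s2 = fit_hypersphere(X1), fit_hypersphere(X2)
print(classify(np.array([1.0, 1.0]), s1, s2))  # -> 1
print(classify(np.array([4.0, 4.0]), s1, s2))  # -> 2
```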
Capabilities and Limitations of Time-lagged Autoencoders for Slow Mode Discovery in Dynamical Systems
Title | Capabilities and Limitations of Time-lagged Autoencoders for Slow Mode Discovery in Dynamical Systems |
Authors | Wei Chen, Hythem Sidky, Andrew L. Ferguson |
Abstract | Time-lagged autoencoders (TAEs) have been proposed as a deep learning regression-based approach to the discovery of slow modes in dynamical systems. However, a rigorous analysis of nonlinear TAEs remains lacking. In this work, we discuss the capabilities and limitations of TAEs through both theoretical and numerical analyses. Theoretically, we derive bounds for nonlinear TAE performance in slow mode discovery and show that in general TAEs learn a mixture of slow and maximum variance modes. Numerically, we illustrate cases where TAEs can and cannot correctly identify the leading slowest mode in two example systems: a 2D “Washington beltway” potential and the alanine dipeptide molecule in explicit water. We also compare the TAE results with those obtained using state-free reversible VAMPnets (SRVs) as a variational-based neural network approach for slow mode discovery, and show that SRVs can correctly discover slow modes where TAEs fail. |
Tasks | |
Published | 2019-06-02 |
URL | https://arxiv.org/abs/1906.00325v1 |
https://arxiv.org/pdf/1906.00325v1.pdf | |
PWC | https://paperswithcode.com/paper/190600325 |
Repo | |
Framework | |
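A linear time-lagged autoencoder can be written in closed form as reduced-rank regression of x_{t+tau} on x_t, which makes the slow-vs-variance tension the paper analyzes easy to reproduce. The sketch below uses a synthetic two-mode trajectory; all dynamics parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic trajectory: a slow mode (large autocorrelation) plus a faster,
# higher-variance mode, mimicking the slow-vs-max-variance tension.
T, tau = 20000, 10
slow = np.zeros(T)
fast = np.zeros(T)
for t in range(1, T):
    slow[t] = 0.999 * slow[t - 1] + 0.05 * rng.normal()
    fast[t] = 0.70 * fast[t - 1] + 1.00 * rng.normal()
X = np.column_stack([slow, fast])
X = X - X.mean(axis=0)

# Linear TAE with a 1-D bottleneck, solved in closed form as rank-1
# reduced-rank regression of x_{t+tau} on x_t.
X0, X1 = X[:-tau], X[tau:]
W_ols, *_ = np.linalg.lstsq(X0, X1, rcond=None)
_, _, Vt = np.linalg.svd(X0 @ W_ols, full_matrices=False)
v = Vt[0]                        # decoder direction of the rank-1 model
encoder = W_ols @ v              # 1-D code: z_t = x_t @ encoder
print("encoder direction (normalized):", encoder / np.linalg.norm(encoder))
# When the fast mode's variance dominates, the learned direction mixes in
# variance instead of isolating the slow coordinate -- the paper's point.
```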
Sample-specific repetitive learning for photo aesthetic assessment and highlight region extraction
Title | Sample-specific repetitive learning for photo aesthetic assessment and highlight region extraction |
Authors | Ying Dai |
Abstract | Aesthetic assessment is subjective, and the distribution of aesthetic levels is imbalanced. In order to realize automatic assessment of photo aesthetics, we focus on retraining a CNN-based aesthetic assessment model by repetitively dropping out the unavailable samples in the middle levels from the training data set, to overcome the effect of imbalanced aesthetic data on classification. Further, we present a method for extracting the aesthetic highlight region of a photo image using the two repetitively trained models. The correlation of the extracted region with the aesthetic levels is then analyzed to illustrate which aesthetic features influence the aesthetic quality of a photo. Moreover, the testing data set comes from a different data source, 500px. Experimental results show that the proposed method is effective. |
Tasks | |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08213v1 |
https://arxiv.org/pdf/1909.08213v1.pdf | |
PWC | https://paperswithcode.com/paper/sample-specific-repetitive-learning-for-photo |
Repo | |
Framework | |
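A schematic of the repetitive retraining loop described above, with hypothetical score thresholds and drop schedule; the CNN itself is elided, and only the middle-level sample dropping is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an imbalanced aesthetic data set: scores in [1, 10],
# heavily concentrated in the middle levels.
scores = np.clip(rng.normal(5.5, 1.5, size=10000), 1, 10)
features = rng.normal(size=(10000, 128))      # stand-in CNN features

def retrain_round(features, scores, low=4.5, high=6.5, keep_mid=0.3):
    """One repetition: drop a fraction of middle-level samples so the
    high/low aesthetic classes dominate, then (re)fit the classifier."""
    mid = (scores > low) & (scores < high)
    keep = ~mid | (rng.uniform(size=len(scores)) < keep_mid)
    X, y = features[keep], (scores[keep] >= high).astype(int)
    # model.fit(X, y) would go here; we just report the class balance.
    return X, y

for r, keep_mid in enumerate([1.0, 0.5, 0.2]):   # progressively drop more
    X, y = retrain_round(features, scores, keep_mid=keep_mid)
    print(f"round {r}: {len(y)} samples, positive fraction {y.mean():.2f}")
```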
Image Transformation can make Neural Networks more robust against Adversarial Examples
Title | Image Transformation can make Neural Networks more robust against Adversarial Examples |
Authors | Dang Duy Thang, Toshihiro Matsui |
Abstract | Neural networks are being applied to many IoT-related tasks with encouraging results. For example, neural networks can precisely detect humans, objects, and animals via surveillance cameras for security purposes. However, neural networks have recently been found vulnerable to well-designed input samples called adversarial examples. This issue causes neural networks to misclassify adversarial examples that are imperceptible to humans. We found that rotating an adversarial example image can defeat the effect of the adversarial perturbation. Using MNIST digit images as the original images, we first generated adversarial examples for a neural network recognizer, which was completely fooled by the forged examples. Then we rotated the adversarial images and fed them to the recognizer, which regained correct recognition. Thus, we empirically confirmed that rotating images can protect pattern recognizers based on neural networks from adversarial example attacks. |
Tasks | |
Published | 2019-01-10 |
URL | http://arxiv.org/abs/1901.03037v1 |
http://arxiv.org/pdf/1901.03037v1.pdf | |
PWC | https://paperswithcode.com/paper/image-transformation-can-make-neural-networks |
Repo | |
Framework | |
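The defense reduces to a few lines: rotate the input before classifying it. Below is a hedged sketch using scipy, with a majority vote over several angles as one possible extension of the paper's single-rotation experiment; `model_predict` is any image-to-label function you supply.

```python
import numpy as np
from scipy.ndimage import rotate

def rotation_defense(model_predict, image, angles=(-30, -15, 15, 30)):
    """Classify rotated copies of a (possibly adversarial) image and take a
    majority vote; rotation can break a crafted perturbation's effect."""
    votes = [model_predict(image)]
    for a in angles:
        r = rotate(image, angle=a, reshape=False, mode="nearest")
        votes.append(model_predict(r))
    return max(set(votes), key=votes.count)

# Usage with any classifier exposing a predict-one-image function, e.g.:
# label = rotation_defense(lambda img: int(clf.predict(img[None])[0]), x_adv)
```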
Convolutional Self-Attention Networks
Title | Convolutional Self-Attention Networks |
Authors | Baosong Yang, Longyue Wang, Derek Wong, Lidia S. Chao, Zhaopeng Tu |
Abstract | Self-attention networks (SANs) have drawn increasing interest due to their high parallelization in computation and flexibility in modeling dependencies. SANs can be further enhanced with multi-head attention by allowing the model to attend to information from different representation subspaces. In this work, we propose novel convolutional self-attention networks, which offer SANs the abilities to 1) strengthen dependencies among neighboring elements, and 2) model the interaction between features extracted by multiple attention heads. Experimental results of machine translation on different language pairs and model settings show that our approach outperforms both the strong Transformer baseline and other existing models on enhancing the locality of SANs. Compared with prior studies, the proposed model is parameter-free in the sense that it introduces no additional parameters. |
Tasks | Machine Translation |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.03107v1 |
http://arxiv.org/pdf/1904.03107v1.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-self-attention-networks |
Repo | |
Framework | |
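The locality ingredient can be illustrated by masking self-attention to a fixed neighborhood, as in the 1-D sketch below; the paper's second ingredient, interaction across attention heads, is elided, and the untrained projections are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(Q, K, V, window=2):
    """Self-attention restricted to a +/- `window` neighborhood: the
    'convolutional' restriction that strengthens local dependencies."""
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)               # (T, T) attention logits
    idx = np.arange(T)
    mask = np.abs(idx[:, None] - idx[None, :]) > window
    scores[mask] = -1e9                          # attend only to neighbors
    return softmax(scores) @ V

T, d = 10, 16
x = rng.normal(size=(T, d))
out = local_self_attention(x, x, x, window=2)   # learned Q/K/V maps elided
print(out.shape)                                 # (10, 16)
```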
An automated approach for task evaluation using EEG signals
Title | An automated approach for task evaluation using EEG signals |
Authors | Vishal Anand, S. R. Sreeja, Debasis Samanta |
Abstract | Critical task and cognition-based environments, such as military and defense operations, aviation, user-technology interaction evaluation on UIs, and understanding the intuitiveness of a hardware model or software toolkit, require an assessment of how much mental workload a particular task generates for a user. This is necessary for understanding how those tasks, operations, and activities can be improved and made better suited to users, so that they reduce the mental workload on the individual and operators can use them with ease and less difficulty. However, a particular task may be gauged as simple by one user while being difficult for another. Understanding the complexity of a particular task can only be done at the user level, and we propose to do this by understanding the mental workload (MWL) generated on an operator while performing a task that requires processing a lot of information. In this work, we propose an experimental setup that replicates the modern-day workload of regular job tasks. We propose an approach to automatically evaluate the task complexity perceived by an individual by using electroencephalogram (EEG) data recorded during operation. Crucial steps addressed in this work include the extraction and optimization of different features, the selection of relevant features for dimensionality reduction, and the use of supervised machine learning techniques. In addition, the performance of the classifiers is compared using all features and using only the selected features. From the results, it can be inferred that machine learning algorithms perform better than traditional approaches for mental workload estimation. |
Tasks | Dimensionality Reduction, EEG |
Published | 2019-11-07 |
URL | https://arxiv.org/abs/1911.02966v2 |
https://arxiv.org/pdf/1911.02966v2.pdf | |
PWC | https://paperswithcode.com/paper/an-automated-approach-for-task-evaluation |
Repo | |
Framework | |
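A minimal sklearn pipeline in the spirit of the described steps (feature scaling, relevance-based feature selection, a supervised classifier), with random arrays standing in for extracted EEG features and workload labels; a real pipeline would compute the features, e.g. per-channel band powers, from raw EEG.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in for extracted EEG features with binary workload labels.
X = rng.normal(size=(200, 64))
y = rng.integers(0, 2, size=200)

pipe = make_pipeline(
    StandardScaler(),                 # normalize feature scales
    SelectKBest(f_classif, k=20),     # keep the most label-relevant features
    RandomForestClassifier(n_estimators=200, random_state=0),
)
print(cross_val_score(pipe, X, y, cv=5).mean())
```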
ISLET: Fast and Optimal Low-rank Tensor Regression via Importance Sketching
Title | ISLET: Fast and Optimal Low-rank Tensor Regression via Importance Sketching |
Authors | Anru Zhang, Yuetian Luo, Garvesh Raskutti, Ming Yuan |
Abstract | In this paper, we develop a novel procedure for low-rank tensor regression, namely \underline{I}mportance \underline{S}ketching \underline{L}ow-rank \underline{E}stimation for \underline{T}ensors (ISLET). The central idea behind ISLET is \emph{importance sketching}, i.e., carefully designed sketches based on both the responses and the low-dimensional structure of the parameter of interest. We show that the proposed method is sharply minimax optimal in terms of the mean-squared error under low-rank Tucker assumptions and under a randomized Gaussian ensemble design. In addition, if a tensor is low-rank with group sparsity, our procedure also achieves minimax optimality. Further, we show through numerical studies that ISLET achieves comparable or better mean-squared error performance than existing state-of-the-art methods whilst having substantial storage and run-time advantages, including capabilities for parallel and distributed computing. In particular, our procedure performs reliable estimation with tensors of dimension $p = O(10^8)$ and is $1$ or $2$ orders of magnitude faster than baseline methods. |
Tasks | |
Published | 2019-11-09 |
URL | https://arxiv.org/abs/1911.03804v1 |
https://arxiv.org/pdf/1911.03804v1.pdf | |
PWC | https://paperswithcode.com/paper/islet-fast-and-optimal-low-rank-tensor |
Repo | |
Framework | |
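A simplified matrix (order-2) rendering of the importance-sketching idea: form a crude moment estimate, take its singular subspaces as sketching directions, and solve a small regression in the projected space. The full ISLET procedure's cross-term sketches and Tucker machinery are elided, and all problem sizes are toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Matrix toy version of low-rank regression: y_i = <X_i, B> + noise,
# with a rank-r parameter matrix B and Gaussian ensemble design.
p, r, n = 30, 3, 4000
U0 = np.linalg.qr(rng.normal(size=(p, r)))[0]
V0 = np.linalg.qr(rng.normal(size=(p, r)))[0]
B = U0 @ np.diag([3.0, 2.0, 1.0]) @ V0.T
Xs = rng.normal(size=(n, p, p))
y = np.einsum("nij,ij->n", Xs, B) + 0.1 * rng.normal(size=n)

# Step 1: crude moment estimate of B (unbiased for iid Gaussian design).
B0 = np.einsum("n,nij->ij", y, Xs) / n

# Step 2: importance sketches = singular subspaces of the crude estimate.
U, _, Vt = np.linalg.svd(B0)
U, V = U[:, :r], Vt[:r].T

# Step 3: project covariates onto the r x r core and solve a small
# least-squares problem, then assemble the low-rank estimate.
Z = np.einsum("ia,nij,jb->nab", U, Xs, V).reshape(n, r * r)
core = np.linalg.lstsq(Z, y, rcond=None)[0].reshape(r, r)
B_hat = U @ core @ V.T
print("relative error:", np.linalg.norm(B_hat - B) / np.linalg.norm(B))
```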
Factorization Bandits for Online Influence Maximization
Title | Factorization Bandits for Online Influence Maximization |
Authors | Qingyun Wu, Zhige Li, Huazheng Wang, Wei Chen, Hongning Wang |
Abstract | We study the problem of online influence maximization in social networks. In this problem, a learner aims to identify the set of “best influencers” in a network by interacting with it, i.e., repeatedly selecting seed nodes and observing activation feedback in the network. We capitalize on an important property of the influence maximization problem named network assortativity, which is ignored by most existing works on online influence maximization. To capture network assortativity, we factorize the activation probability on the edges into latent factors on the corresponding nodes, including an influence factor on the giving nodes and a susceptibility factor on the receiving nodes. We propose an upper-confidence-bound based online learning solution to estimate the latent factors, and therefore the activation probabilities. Considerable regret reduction is achieved by our factorization-based online influence maximization algorithm, and extensive empirical evaluations on two real-world networks show the effectiveness of our proposed solution. |
Tasks | |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.03737v2 |
https://arxiv.org/pdf/1906.03737v2.pdf | |
PWC | https://paperswithcode.com/paper/factorization-bandits-for-online-influence |
Repo | |
Framework | |
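A heavily simplified single-seed sketch of the factorization idea: edge activation probabilities are modeled as a sigmoid of inner products between per-node influence and susceptibility factors, a count-based bonus stands in for the paper's factor-level confidence bounds, and one logistic-loss SGD step absorbs each round's activation feedback.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy graph: activation probability of edge (u, v) factorizes into an
# influence factor theta_u and a susceptibility factor beta_v.
n, d = 30, 4
theta_true = rng.normal(size=(n, d)) * 0.5
beta_true = rng.normal(size=(n, d)) * 0.5

theta = rng.normal(size=(n, d)) * 0.1   # learner's factor estimates
beta = rng.normal(size=(n, d)) * 0.1
counts = np.ones(n)                     # per-node selection counts

for t in range(1, 2000):
    # UCB-style seed choice: estimated outgoing influence plus an
    # exploration bonus shrinking with how often the node was tried.
    score = sigmoid(theta @ beta.T).sum(axis=1) \
        + np.sqrt(2 * np.log(t) / counts)
    u = int(np.argmax(score))
    counts[u] += 1
    # Observe activation feedback on u's edges and take one SGD step on
    # the logistic loss; both node factors share the learned signal.
    p_true = sigmoid(theta_true[u] @ beta_true.T)
    act = (rng.uniform(size=n) < p_true).astype(float)
    p_hat = sigmoid(theta[u] @ beta.T)
    err = p_hat - act                          # logistic-loss gradient term
    grad_theta = err @ beta
    beta -= 0.05 * np.outer(err, theta[u])
    theta[u] -= 0.05 * grad_theta

best = int(np.argmax(sigmoid(theta @ beta.T).sum(axis=1)))
print("estimated best influencer:", best)
```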
A Neural Model for Dialogue Coherence Assessment
Title | A Neural Model for Dialogue Coherence Assessment |
Authors | Mohsen Mesgar, Sebastian Bücker, Iryna Gurevych |
Abstract | Dialogue quality assessment is crucial for evaluating dialogue agents. An essential factor of high-quality dialogues is coherence - what makes dialogue utterances a whole. This paper proposes a novel dialogue coherence model trained in a hierarchical multi-task learning scenario where coherence assessment is the primary and the high-level task, and dialogue act prediction is the auxiliary and the low-level task. The results of our experiments for two benchmark dialogue corpora (i.e. SwitchBoard and DailyDialog) show that our model significantly outperforms its competitors for ranking dialogues with respect to their coherence. Although the performance of other examined models considerably varies across examined corpora, our model robustly achieves high performance. We release the source code and datasets defined for the experiments in this paper to accelerate future research on dialogue coherence. |
Tasks | Multi-Task Learning |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08486v1 |
https://arxiv.org/pdf/1908.08486v1.pdf | |
PWC | https://paperswithcode.com/paper/a-neural-model-for-dialogue-coherence |
Repo | |
Framework | |
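A forward-pass sketch of the hierarchical multi-task layout: a shared encoder feeds a low-level dialogue-act head whose outputs feed the high-level coherence scorer. All weights are random stand-ins for the trained model, and training (a weighted sum of the two losses) is indicated only in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Sizes: U utterances, input dim d, hidden dim h, dialogue-act classes.
U, d, h, n_acts = 8, 64, 32, 4
utt = rng.normal(size=(U, d))                # utterance embeddings
W_enc = rng.normal(size=(d, h)) * 0.1        # shared encoder
W_da = rng.normal(size=(h, n_acts)) * 0.1    # auxiliary: dialogue-act head
w_coh = rng.normal(size=h + n_acts) * 0.1    # primary: coherence scorer

hid = np.tanh(utt @ W_enc)                   # shared representation
da_logits = hid @ W_da                       # low-level task output
feats = np.concatenate([hid, softmax(da_logits)], axis=1)
coherence = 1.0 / (1.0 + np.exp(-(feats @ w_coh).mean()))

# Training would minimize a weighted sum of both objectives, e.g.
# loss = coherence_ranking_loss + lam * dialogue_act_cross_entropy
print("dialogue-act preds:", da_logits.argmax(axis=1),
      "coherence score:", coherence)
```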