April 2, 2020

Paper Group ANR 93

Communication-efficient Variance-reduced Stochastic Gradient Descent. Decentralized gradient methods: does topology matter?. Ranking Significant Discrepancies in Clinical Reports. Analyzing Visual Representations in Embodied Navigation Tasks. “Wait, I’m Still Talking!” Predicting the Dialogue Interaction Behavior Using Imagine-Then-Arbitrate Model. …

Communication-efficient Variance-reduced Stochastic Gradient Descent

Title Communication-efficient Variance-reduced Stochastic Gradient Descent
Authors Hossein S. Ghadikolaei, Sindri Magnusson
Abstract We consider the problem of communication efficient distributed optimization where multiple nodes exchange important algorithm information in every iteration to solve large problems. In particular, we focus on the stochastic variance-reduced gradient and propose a novel approach to make it communication-efficient. That is, we compress the communicated information to a few bits while preserving the linear convergence rate of the original uncompressed algorithm. Comprehensive theoretical and numerical analyses on real datasets reveal that our algorithm can significantly reduce the communication complexity, by as much as 95%, with almost no noticeable penalty. Moreover, it is much more robust to quantization (in terms of maintaining the true minimizer and the convergence rate) than the state-of-the-art algorithms for solving distributed optimization problems. Our results have important implications for using machine learning over internet-of-things and mobile networks.
Tasks Distributed Optimization, Quantization
Published 2020-03-10
URL https://arxiv.org/abs/2003.04686v1
PDF https://arxiv.org/pdf/2003.04686v1.pdf
PWC https://paperswithcode.com/paper/communication-efficient-variance-reduced
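
The idea of compressing what is communicated while keeping variance reduction can be illustrated with a minimal single-process sketch. Everything specific here is an assumption made for illustration and not the authors' scheme: the least-squares objective, the uniform `quantize` grid, the rule that sets the grid `radius` from the largest gradient coordinate, and the function name `quantized_svrg`. Only the snapshot gradient is quantized, standing in for the message a node would send.

```python
import numpy as np

def quantize(v, radius, bits):
    """Round each coordinate of v to one of 2**bits evenly spaced levels in [-radius, radius]."""
    levels = 2 ** bits - 1                      # number of intervals on the grid
    step = 2.0 * radius / levels
    clipped = np.clip(v, -radius, radius)
    return np.round((clipped + radius) / step) * step - radius

def quantized_svrg(A, b, epochs=20, inner=200, lr=0.01, bits=4, seed=0):
    """SVRG on 0.5 * mean((A w - b)**2) using a quantized snapshot gradient (illustrative)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = A.T @ (A @ w_snap - b) / n
        # Assumed rule: the grid shrinks with the gradient, so the
        # quantization error vanishes as the iterates approach the minimizer.
        radius = np.max(np.abs(full_grad)) + 1e-12
        g_q = quantize(full_grad, radius, bits)  # the "communicated" message
        for _ in range(inner):
            i = rng.integers(n)
            grad_i = A[i] * (A[i] @ w - b[i])
            grad_i_snap = A[i] * (A[i] @ w_snap - b[i])
            w = w - lr * (grad_i - grad_i_snap + g_q)
    return w
```

In this sketch a message costs `bits` bits per coordinate plus one float for the radius, instead of a full-precision vector.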

Decentralized gradient methods: does topology matter?

Title Decentralized gradient methods: does topology matter?
Authors Giovanni Neglia, Chuan Xu, Don Towsley, Gianmarco Calbi
Abstract Consensus-based distributed optimization methods have recently been advocated as alternatives to parameter server and ring all-reduce paradigms for large scale training of machine learning models. In this case, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by averaging the estimates obtained from its neighbors, and applying a correction on the basis of its local dataset. While theoretical results suggest that worker communication topology should have a strong impact on the number of epochs needed to converge, previous experiments have shown the opposite. This paper sheds light on this apparent contradiction and shows how sparse topologies can lead to faster convergence even in the absence of communication delays.
Tasks Distributed Optimization
Published 2020-02-28
URL https://arxiv.org/abs/2002.12688v1
PDF https://arxiv.org/pdf/2002.12688v1.pdf
PWC https://paperswithcode.com/paper/decentralized-gradient-methods-does-topology
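
The update described in the abstract (average the neighbors' estimates, then apply a local correction) can be sketched on a toy problem. The ring topology with equal weights, the scalar quadratic local losses, and the names `ring_mixing_matrix` and `decentralized_gd` are assumptions chosen for illustration; the paper studies general topologies and real training workloads.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic weights for a ring: each worker averages itself and its two neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W

def decentralized_gd(targets, W, lr=0.1, steps=200):
    """Worker i holds the local loss 0.5 * (x - targets[i])**2.

    Each step mixes the neighbors' estimates through W, then subtracts a
    local gradient step. The minimizer of the summed loss is targets.mean().
    """
    x = np.zeros(len(targets))
    for _ in range(steps):
        x = W @ x - lr * (x - targets)
    return x
```

With a constant step size the workers' average reaches the global minimizer, while individual estimates settle only in a neighborhood of it; how tight that neighborhood is depends on the mixing matrix, which is where topology enters.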

Ranking Significant Discrepancies in Clinical Reports

Title Ranking Significant Discrepancies in Clinical Reports
Authors Sean MacAvaney, Arman Cohan, Nazli Goharian, Ross Filice
Abstract Medical errors are a major public health concern and a leading cause of death worldwide. Many healthcare centers and hospitals use reporting systems where medical practitioners write a preliminary medical report and the report is later reviewed, revised, and finalized by a more experienced physician. The revisions range from stylistic to corrections of critical errors or misinterpretations of the case. Due to the large quantity of reports written daily, it is often difficult to manually and thoroughly review all the finalized reports to find such errors and learn from them. To address this challenge, we propose a novel ranking approach, consisting of textual and ontological overlaps between the preliminary and final versions of reports. The approach learns to rank the reports based on the degree of discrepancy between the versions. This allows medical practitioners to easily identify and learn from the reports in which their interpretation most substantially differed from that of the attending physician (who finalized the report). This is a crucial step towards uncovering potential errors and helping medical practitioners to learn from such errors, thus improving patient-care in the long run. We evaluate our model on a dataset of radiology reports and show that our approach outperforms both previously-proposed approaches and more recent language models by 4.5% to 15.4%.
Published 2020-01-18
URL https://arxiv.org/abs/2001.06674v1
PDF https://arxiv.org/pdf/2001.06674v1.pdf
PWC https://paperswithcode.com/paper/ranking-significant-discrepancies-in-clinical

Analyzing Visual Representations in Embodied Navigation Tasks

Title Analyzing Visual Representations in Embodied Navigation Tasks
Authors Erik Wijmans, Julian Straub, Dhruv Batra, Irfan Essa, Judy Hoffman, Ari Morcos
Abstract Recent advances in deep reinforcement learning require a large amount of training data and generally result in representations that are often overspecialized to the target task. In this work, we present a methodology to study the underlying potential causes for this specialization. We use the recently proposed projection weighted Canonical Correlation Analysis (PWCCA) to measure the similarity of visual representations learned in the same environment by performing different tasks. We then leverage our proposed methodology to examine the task dependence of visual representations learned on related but distinct embodied navigation tasks. Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures. We then empirically demonstrate that visual representations learned on one task can be effectively transferred to a different task.
Published 2020-03-12
URL https://arxiv.org/abs/2003.05993v1
PDF https://arxiv.org/pdf/2003.05993v1.pdf
PWC https://paperswithcode.com/paper/analyzing-visual-representations-in-embodied

“Wait, I’m Still Talking!” Predicting the Dialogue Interaction Behavior Using Imagine-Then-Arbitrate Model

Title “Wait, I’m Still Talking!” Predicting the Dialogue Interaction Behavior Using Imagine-Then-Arbitrate Model
Authors Zehao Lin, Xiaoming Kang, Guodun Li, Feng Ji, Haiqing Chen, Yin Zhang
Abstract Producing natural and accurate responses like human beings is the ultimate goal of intelligent dialogue agents. So far, most past work has concentrated on selecting or generating one pertinent and fluent response according to the current query and its context. These models work in a one-to-one setting, making one response to one utterance each round. However, in real human-human conversations, people often send several short messages in sequence for readability instead of one long message in a single turn. Thus messages do not end with an explicit ending signal, which is crucial for agents to decide when to reply. So the first step for an intelligent dialogue agent is not replying but deciding whether it should reply at the moment. To address this issue, in this paper, we propose a novel Imagine-then-Arbitrate (ITA) neural dialogue model to help the agent decide whether to wait or to respond directly. Our method has two imaginator modules and an arbitrator module. The two imaginators learn the agent’s and the user’s speaking styles respectively and generate possible utterances which, combined with the dialogue history, serve as the input of the arbitrator. The arbitrator then decides whether to wait or to respond to the user directly. To verify the performance and effectiveness of our method, we prepared two dialogue datasets and compared our approach with several popular models. Experimental results show that our model performs well on the ending-prediction problem and outperforms baseline models.
Published 2020-02-22
URL https://arxiv.org/abs/2002.09616v1
PDF https://arxiv.org/pdf/2002.09616v1.pdf
PWC https://paperswithcode.com/paper/wait-im-still-talking-predicting-the-dialogue

Prediction of number of cases expected and estimation of the final size of coronavirus epidemic in India using the logistic model and genetic algorithm

Title Prediction of number of cases expected and estimation of the final size of coronavirus epidemic in India using the logistic model and genetic algorithm
Authors Ganesh Kumar M, Soman K. P, Gopalakrishnan E. A, Vijay Krishna Menon, Sowmya V
Abstract In this paper, we apply the logistic growth regression model and a genetic algorithm to predict the number of coronavirus cases expected in India in the coming days, and to estimate the final size and the peak time of the coronavirus epidemic in India.
Published 2020-03-26
URL https://arxiv.org/abs/2003.12017v1
PDF https://arxiv.org/pdf/2003.12017v1.pdf
PWC https://paperswithcode.com/paper/prediction-of-number-of-cases-expected-and
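
A logistic growth curve has the form C(t) = K / (1 + exp(-r (t - t0))), where K is the final size and t0 is the inflection point, which is when daily new cases peak. The sketch below fits these three parameters with a very small genetic algorithm. The selection, crossover and mutation rules, the parameter bounds, and the name `fit_logistic_ga` are assumptions for illustration; the abstract does not specify the authors' algorithm settings.

```python
import numpy as np

def logistic(t, K, r, t0):
    """Cumulative cases under logistic growth with final size K, rate r, inflection time t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))

def fit_logistic_ga(t, cases, bounds, pop=60, gens=80, seed=0):
    """Minimize the squared error over (K, r, t0) with an elitist genetic algorithm (illustrative)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T

    def sse(p):
        return float(np.sum((logistic(t, *p) - cases) ** 2))

    population = rng.uniform(lo, hi, size=(pop, 3))
    history = []
    for _ in range(gens):
        scores = np.array([sse(p) for p in population])
        order = np.argsort(scores)
        history.append(scores[order[0]])
        elite = population[order[: pop // 4]]          # survivors, kept unchanged
        n_children = pop - len(elite)
        parents = elite[rng.integers(len(elite), size=(n_children, 2))]
        mix = rng.uniform(size=(n_children, 1))
        children = mix * parents[:, 0] + (1.0 - mix) * parents[:, 1]   # blend crossover
        children += rng.normal(scale=0.02 * (hi - lo), size=children.shape)  # mutation
        population = np.vstack([elite, np.clip(children, lo, hi)])
    scores = np.array([sse(p) for p in population])
    return population[np.argmin(scores)], history
```

Given day indices `t` and cumulative counts `cases`, a call such as `fit_logistic_ga(t, cases, [(100, 5000), (0.01, 1.0), (0, 30)])` returns the best `(K, r, t0)` found and the best error per generation; the bounds shown are placeholders.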

Latent Poisson models for networks with heterogeneous density

Title Latent Poisson models for networks with heterogeneous density
Authors Tiago P. Peixoto
Abstract Empirical networks are often globally sparse, with a small average number of connections per node, when compared to the total size of the network. However, this sparsity tends not to be homogeneous, and networks can also be locally dense, for example with a few nodes connecting to a large fraction of the rest of the network, or with small groups of nodes with a large probability of connections between them. Here we show how latent Poisson models which generate hidden multigraphs can be effective at capturing this density heterogeneity, while being more tractable mathematically than some of the alternatives that model simple graphs directly. We show how these latent multigraphs can be reconstructed from data on simple graphs, and how this allows us to disentangle disassortative degree-degree correlations from the constraints of imposed degree sequences, and to improve the identification of community structure in empirically relevant scenarios.
Published 2020-02-18
URL https://arxiv.org/abs/2002.07803v2
PDF https://arxiv.org/pdf/2002.07803v2.pdf
PWC https://paperswithcode.com/paper/latent-poisson-models-for-networks-with

Corella: A Private Multi Server Learning Approach based on Correlated Queries

Title Corella: A Private Multi Server Learning Approach based on Correlated Queries
Authors Hamidreza Ehteram, Mohammad Ali Maddah-Ali, Mahtab Mirmohseni
Abstract The emerging applications of machine learning algorithms on mobile devices motivate us to offload the computation tasks of training a model or deploying a trained one to the cloud. One of the major challenges in this setup is to guarantee the privacy of the client’s data. Various methods have been proposed in the literature to protect privacy. These include (i) adding noise to the client data, which reduces the accuracy of the result, (ii) using secure multiparty computation, which requires significant communication among the computing nodes or with the client, and (iii) relying on homomorphic encryption methods, which significantly increase the computation load. In this paper, we propose an alternative approach to protect the privacy of user data. The proposed scheme relies on a cluster of servers, each running a deep neural network, where at most $T$ of them, for some integer $T$, may collude. Each server is fed the client data with $\textit{strong}$ noise added. This makes the information leakage to each server information-theoretically negligible. On the other hand, the noise added for different servers is $\textit{correlated}$. This correlation among queries allows the system to be $\textit{trained}$ such that the client can recover the final result with high accuracy by combining the outputs of the servers, with minor computational effort. Simulation results for various datasets demonstrate the accuracy of the proposed approach.
Published 2020-03-26
URL https://arxiv.org/abs/2003.12052v1
PDF https://arxiv.org/pdf/2003.12052v1.pdf
PWC https://paperswithcode.com/paper/corella-a-private-multi-server-learning

Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates

Title Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates
Authors Ping Zhou, Zhen Yu, Jingyi Ma, Maozai Tian
Abstract Distributed statistical inference has recently attracted immense attention. Herein, we study the asymptotic efficiency of the maximum likelihood estimator (MLE), the one-step MLE, and the aggregated estimating equation estimator for generalized linear models with a diverging number of covariates. Then a novel method is proposed to obtain an asymptotically efficient estimator for large-scale distributed data by two rounds of communication between local machines and the central server. The assumption on the number of machines in this paper is more relaxed and thus practical for real-world applications. Simulations and a case study demonstrate the satisfactory finite-sample performance of the proposed estimators.
Published 2020-01-17
URL https://arxiv.org/abs/2001.06194v1
PDF https://arxiv.org/pdf/2001.06194v1.pdf
PWC https://paperswithcode.com/paper/communication-efficient-distributed-estimator

Improved guarantees and a multiple-descent curve for the Column Subset Selection Problem and the Nyström method

Title Improved guarantees and a multiple-descent curve for the Column Subset Selection Problem and the Nyström method
Authors Michał Dereziński, Rajiv Khanna, Michael W. Mahoney
Abstract The Column Subset Selection Problem (CSSP) and the Nyström method are among the leading tools for constructing small low-rank approximations of large datasets in machine learning and scientific computing. A fundamental question in this area is: how well can a data subset of size k compete with the best rank k approximation? We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees which go beyond the standard worst-case analysis. Our approach leads to significantly better bounds for datasets with known rates of singular value decay, e.g., polynomial or exponential decay. Our analysis also reveals an intriguing phenomenon: the approximation factor as a function of k may exhibit multiple peaks and valleys, which we call a multiple-descent curve. A lower bound we establish shows that this behavior is not an artifact of our analysis, but rather it is an inherent property of the CSSP and Nyström tasks. Finally, using the example of a radial basis function (RBF) kernel, we show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
Published 2020-02-21
URL https://arxiv.org/abs/2002.09073v1
PDF https://arxiv.org/pdf/2002.09073v1.pdf
PWC https://paperswithcode.com/paper/improved-guarantees-and-a-multiple-descent
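
The question of how a size-k subset compares with the best rank-k approximation can be made concrete with a small NumPy sketch. It builds an RBF kernel, forms the Nyström approximation from a chosen index set, and reports the ratio of its trace-norm error to the best rank-k error. The choice of trace norm, the `gamma` parameterization of the kernel, and the function names are assumptions for illustration, and the index set is supplied by the caller instead of being sampled as in the paper.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def nystrom(K, idx):
    """Nystrom approximation of K built from the columns listed in idx."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

def approximation_factor(K, idx):
    """Trace-norm error of Nystrom divided by the best rank-k error, with k = len(idx)."""
    k = len(idx)
    nystrom_error = np.trace(K - nystrom(K, idx))
    eigenvalues = np.linalg.eigvalsh(K)[::-1]      # sorted in descending order
    best_rank_k_error = np.sum(eigenvalues[k:])
    return nystrom_error / best_rank_k_error
```

Sweeping `len(idx)` and `gamma` and plotting `approximation_factor` is one way to look for the non-monotone behavior the abstract describes; this sketch makes no claim about what such a plot will show for a given dataset.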

Trends and Advancements in Deep Neural Network Communication

Title Trends and Advancements in Deep Neural Network Communication
Authors Felix Sattler, Thomas Wiegand, Wojciech Samek
Abstract Due to their great performance and scalability properties, neural networks have become ubiquitous building blocks of many applications. With the rise of mobile and IoT, these models are now also being increasingly applied in distributed settings, where the owners of the data are separated by limited communication channels and privacy constraints. To address the challenges of these distributed environments, a wide range of training and evaluation schemes have been developed, which require the communication of neural network parametrizations. These novel approaches, which bring the “intelligence to the data”, have many advantages over traditional cloud solutions such as privacy-preservation, increased security and device autonomy, communication efficiency and high training speed. This paper gives an overview of the recent advancements and challenges in this new field of research at the intersection of machine learning and communications.
Published 2020-03-06
URL https://arxiv.org/abs/2003.03320v1
PDF https://arxiv.org/pdf/2003.03320v1.pdf
PWC https://paperswithcode.com/paper/trends-and-advancements-in-deep-neural

Block Layer Decomposition schemes for training Deep Neural Networks

Title Block Layer Decomposition schemes for training Deep Neural Networks
Authors Laura Palagi, Ruggiero Seccia
Abstract The estimation of the weights of Deep Feedforward Neural Networks (DFNNs) relies on the solution of a very large nonconvex optimization problem that may have many local (non-global) minimizers, saddle points and large plateaus. As a consequence, optimization algorithms can be attracted toward local minimizers, which can lead to bad solutions or slow down the optimization process. Furthermore, the time needed to find good solutions to the training problem depends on both the number of samples and the number of variables. In this work, we show how Block Coordinate Descent (BCD) methods can be applied to improve the performance of state-of-the-art algorithms by avoiding bad stationary points and flat regions. We first describe a batch BCD method able to effectively tackle the network’s depth, and then we extend the algorithm by proposing a \textit{minibatch} BCD framework able to scale with respect to both the number of variables and the number of samples by embedding a BCD approach into a minibatch framework. Through extensive numerical results on standard datasets for several network architectures, we show how applying BCD methods to the training phase of DFNNs makes it possible to outperform standard batch and minibatch algorithms, leading to an improvement in both the training phase and the generalization performance of the networks.
Published 2020-03-18
URL https://arxiv.org/abs/2003.08123v1
PDF https://arxiv.org/pdf/2003.08123v1.pdf
PWC https://paperswithcode.com/paper/block-layer-decomposition-schemes-for
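
Treating each layer as a block and updating one block at a time can be sketched on a two-layer network. This is a generic alternating scheme written for illustration, not the authors' algorithm: the tanh network, the exact least-squares solve for the output layer, the backtracking gradient step for the hidden layer, and the name `block_layer_descent` are all assumptions.

```python
import numpy as np

def forward(X, W1, W2):
    H = np.tanh(X @ W1)
    return H, H @ W2

def loss(X, y, W1, W2):
    return 0.5 * np.mean((forward(X, W1, W2)[1] - y) ** 2)

def block_layer_descent(X, y, hidden=8, sweeps=50, lr=0.5, seed=0):
    """Alternate between the output-layer block and the hidden-layer block (illustrative)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    history = [loss(X, y, W1, W2)]
    for _ in range(sweeps):
        # Block 1: with W1 frozen the problem in W2 is linear least squares.
        H, _ = forward(X, W1, W2)
        W2 = np.linalg.lstsq(H, y, rcond=None)[0]
        # Block 2: with W2 frozen, take a gradient step in W1,
        # halving the step until the loss does not increase.
        H, pred = forward(X, W1, W2)
        delta = ((pred - y) @ W2.T) * (1.0 - H ** 2) / len(X)
        grad_W1 = X.T @ delta
        current, step = loss(X, y, W1, W2), lr
        while step > 1e-8 and loss(X, y, W1 - step * grad_W1, W2) > current:
            step *= 0.5
        if step > 1e-8:
            W1 = W1 - step * grad_W1
        history.append(loss(X, y, W1, W2))
    return W1, W2, history
```

Because each block update is accepted only if it does not raise the loss, `history` is non-increasing up to rounding, which makes the sketch easy to sanity-check.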

Towards More Efficient and Effective Inference: The Joint Decision of Multi-Participants

Title Towards More Efficient and Effective Inference: The Joint Decision of Multi-Participants
Authors Hui Zhu, Zhulin An, Kaiqiang Xu, Xiaolong Hu, Yongjun Xu
Abstract Existing approaches that improve the performance of convolutional neural networks by optimizing the local architectures or deepening the networks tend to increase the size of models significantly. To deploy neural networks on edge devices, which are in great demand, reducing the scale of the networks is crucial. However, compressing the networks can easily degrade image processing performance. In this paper, we propose a method that is suitable for edge devices while improving the efficiency and effectiveness of inference. The joint decision of multiple participants, mainly comprising multiple layers and multiple networks, can achieve higher classification accuracy (by at most 0.26% on CIFAR-10 and 4.49% on CIFAR-100) with a similar total number of parameters for classical convolutional neural networks.
Published 2020-01-19
URL https://arxiv.org/abs/2001.06774v1
PDF https://arxiv.org/pdf/2001.06774v1.pdf
PWC https://paperswithcode.com/paper/towards-more-efficient-and-effective

Universal Data Anomaly Detection via Inverse Generative Adversary Network

Title Universal Data Anomaly Detection via Inverse Generative Adversary Network
Authors Kursat Rasim Mestav, Lang Tong
Abstract The problem of detecting data anomalies is considered. Under the null hypothesis that models anomaly-free data, measurements are assumed to be from an unknown distribution with some authenticated historical samples. Under the composite alternative hypothesis, measurements are from an unknown distribution at a positive distance from the distribution under the null hypothesis. No training data are available for the distribution of anomalous data. A semi-supervised deep learning technique based on an inverse generative adversary network is proposed.
Tasks Anomaly Detection
Published 2020-01-23
URL https://arxiv.org/abs/2001.08809v1
PDF https://arxiv.org/pdf/2001.08809v1.pdf
PWC https://paperswithcode.com/paper/universal-data-anomaly-detection-via-inverse

A Derivative-Free Method for Solving Elliptic Partial Differential Equations with Deep Neural Networks

Title A Derivative-Free Method for Solving Elliptic Partial Differential Equations with Deep Neural Networks
Authors Jihun Han, Mihai Nica, Adam R Stinchcombe
Abstract We introduce a deep neural network based method for solving a class of elliptic partial differential equations. We approximate the solution of the PDE with a deep neural network which is trained under the guidance of a probabilistic representation of the PDE in the spirit of the Feynman-Kac formula. The solution is given by an expectation of a martingale process driven by a Brownian motion. As Brownian walkers explore the domain, the deep neural network is iteratively trained using a form of reinforcement learning. Our method is a ‘Derivative-Free Loss Method’ since it does not require the explicit calculation of the derivatives of the neural network with respect to the input neurons in order to compute the training loss. The advantages of our method are showcased in a series of test problems: a corner singularity problem, an interface problem, and an application to a chemotaxis population model.
Published 2020-01-17
URL https://arxiv.org/abs/2001.06145v1
PDF https://arxiv.org/pdf/2001.06145v1.pdf
PWC https://paperswithcode.com/paper/a-derivative-free-method-for-solving-elliptic
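
The probabilistic representation behind this approach can be seen in its simplest classical case: for the Laplace equation on the unit disk, the solution at a point equals the expected boundary value at the place where a Brownian motion started there first exits. The sketch below estimates that expectation with the walk-on-spheres method and involves no neural network. The domain, the tolerance `eps`, and the function name `walk_on_spheres` are assumptions for illustration; the paper's method trains a network and covers a broader class of elliptic equations.

```python
import numpy as np

def walk_on_spheres(x0, boundary_fn, n_walkers=20000, eps=1e-3, seed=0):
    """Monte Carlo estimate of u(x0) for Laplace's equation on the unit disk.

    Each walker jumps to a uniformly random point on the largest circle
    centered at its position that fits inside the disk, which is how a
    Brownian path exits that circle. It stops within eps of the boundary.
    """
    rng = np.random.default_rng(seed)
    pos = np.tile(np.asarray(x0, dtype=float), (n_walkers, 1))
    active = np.ones(n_walkers, dtype=bool)
    while active.any():
        r = 1.0 - np.linalg.norm(pos[active], axis=1)
        theta = rng.uniform(0.0, 2.0 * np.pi, size=r.shape)
        pos[active] += r[:, None] * np.column_stack([np.cos(theta), np.sin(theta)])
        active = (1.0 - np.linalg.norm(pos, axis=1)) > eps
    exit_points = pos / np.linalg.norm(pos, axis=1, keepdims=True)  # project onto the circle
    return float(np.mean(boundary_fn(exit_points)))
```

A quick check is to use boundary data from a known harmonic function such as x^2 - y^2, whose value at (0.3, 0.4) is -0.07; the estimate should land close to that, up to Monte Carlo noise.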