April 3, 2020

3417 words 17 mins read

Paper Group ANR 8

The Variational InfoMax Learning Objective. An initial investigation on optimizing tandem speaker verification and countermeasure systems using reinforcement learning. Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud. RiskOracle: A Minute-level Citywide Traffic Accident Forecasting Framework. Synchronization in 5G: …

The Variational InfoMax Learning Objective

Title The Variational InfoMax Learning Objective
Authors Vincenzo Crescimanna, Bruce Graham
Abstract Bayesian Inference and Information Bottleneck are the two most popular objectives for neural networks, but they can be optimised only via a variational lower bound: the Variational Information Bottleneck (VIB). In this manuscript we show that the two objectives are actually equivalent to the InfoMax: maximise the information between the data and the labels. The InfoMax representation of the two objectives is relevant not only per se, since it helps to understand the role of the network capacity, but also because it allows us to derive a variational objective, the Variational InfoMax (VIM), that maximises them directly without resorting to any lower bound. The theoretical improvement of VIM over VIB is highlighted by the computational experiments, where the model trained with VIM improves on the VIB model in three respects: accuracy, robustness to noise and representation quality.
Tasks Bayesian Inference
Published 2020-03-07
URL https://arxiv.org/abs/2003.03524v1
PDF https://arxiv.org/pdf/2003.03524v1.pdf
PWC https://paperswithcode.com/paper/the-variational-infomax-learning-objective
Repo
Framework
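The VIB lower bound that VIM dispenses with can be sketched numerically: a task cross-entropy plus a beta-weighted KL term pulling a Gaussian posterior toward the standard normal. The function names and values below are illustrative, not taken from the paper:

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over dimensions."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, logvar))

def vib_style_loss(cross_entropy, mu, logvar, beta=1e-3):
    """Cross-entropy plus beta-weighted KL regularizer (the VIB lower bound)."""
    return cross_entropy + beta * gaussian_kl(mu, logvar)

# A standard-normal posterior incurs zero KL penalty, so the loss
# reduces to the cross-entropy term alone.
loss = vib_style_loss(0.7, mu=[0.0, 0.0], logvar=[0.0, 0.0])
```

VIM's point is that the InfoMax objective can be maximised directly rather than through a bound of this shape; the sketch only shows the baseline being improved upon.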

An initial investigation on optimizing tandem speaker verification and countermeasure systems using reinforcement learning

Title An initial investigation on optimizing tandem speaker verification and countermeasure systems using reinforcement learning
Authors Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi
Abstract The spoofing countermeasure (CM) systems in automatic speaker verification (ASV) are not typically used in isolation from each other. These systems can be combined, for example, into a cascaded system where the CM first decides whether the input is synthetic or bona fide speech. If the CM decides it is a bona fide sample, the ASV system then considers it for speaker verification. End users of the system are not interested in the performance of the individual sub-modules, but rather in the performance of the combined system. Such a combination can be evaluated with the tandem detection cost function (t-DCF) measure, yet the individual components are trained separately from each other using their own performance metrics. In this work we study training the ASV and CM components together for a better t-DCF measure by using reinforcement learning. We demonstrate that such a training procedure is indeed able to improve the performance of the combined system, and does so with more reliable results than the standard supervised learning techniques we compare against.
Tasks Speaker Verification
Published 2020-02-06
URL https://arxiv.org/abs/2002.03801v1
PDF https://arxiv.org/pdf/2002.03801v1.pdf
PWC https://paperswithcode.com/paper/an-initial-investigation-on-optimizing-tandem
Repo
Framework
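The cascaded CM-then-ASV evaluation can be illustrated with a simplified weighted-cost function. This is not the official t-DCF definition (which has its own cost model and normalization); the costs, priors, and the worst-case spoof-handling assumption below are illustrative only:

```python
def simplified_tandem_cost(p_miss_asv, p_fa_asv, p_miss_cm, p_fa_cm,
                           c_miss=1.0, c_fa=10.0, c_fa_spoof=10.0,
                           pi_tar=0.95, pi_non=0.04, pi_spoof=0.01):
    """Simplified tandem cost for a cascade where the CM gates the ASV:
    weighted sum of miss / false-alarm rates (illustrative, not the
    official t-DCF)."""
    # A target is missed if either the CM or the ASV rejects it.
    miss = p_miss_cm + (1 - p_miss_cm) * p_miss_asv
    # A zero-effort impostor is accepted only if both systems accept.
    fa = (1 - p_miss_cm) * p_fa_asv
    # Worst case: any spoof the CM fails to flag is accepted downstream.
    fa_spoof = p_fa_cm
    return (c_miss * pi_tar * miss
            + c_fa * pi_non * fa
            + c_fa_spoof * pi_spoof * fa_spoof)

cost = simplified_tandem_cost(0.02, 0.03, 0.05, 0.10)
```

The coupling visible in the formula (CM errors multiply into ASV terms) is exactly why training the two components jointly against the tandem metric, rather than separately, can help.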

Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud

Title Enabling Efficient and Flexible FPGA Virtualization for Deep Learning in the Cloud
Authors Shulin Zeng, Guohao Dai, Hanbo Sun, Kai Zhong, Guangjun Ge, Kaiyuan Guo, Yu Wang, Huazhong Yang
Abstract FPGAs have shown great potential in providing low-latency and energy-efficient solutions for deep neural network (DNN) inference applications. Currently, the majority of FPGA-based DNN accelerators in the cloud run in a time-division multiplexing way for multiple users sharing a single FPGA, and require re-compilation with $\sim$100 s overhead. Such designs lead to poor isolation and heavy performance loss for multiple users, falling far short of efficient and flexible FPGA virtualization for either public or private cloud scenarios. To solve these problems, we introduce a novel virtualization framework for instruction set architecture (ISA)-based DNN accelerators sharing a single FPGA. We enable isolation by introducing a two-level instruction dispatch module and a multi-core-based hardware resource pool. Such designs provide isolated and runtime-programmable hardware resources, further leading to performance isolation for multiple users. On the other hand, to overcome the heavy re-compilation overheads, we propose a tiling-based instruction frame package design and a two-stage static-dynamic compilation. Only the light-weight runtime information is re-compiled, with $\sim$1 ms overhead, thus guaranteeing performance for the private cloud. Our extensive experimental results show that the proposed virtualization design achieves 1.07-1.69x and 1.88-3.12x throughput improvement over previous static designs using the single-core and the multi-core architectures, respectively.
Tasks
Published 2020-03-26
URL https://arxiv.org/abs/2003.12101v1
PDF https://arxiv.org/pdf/2003.12101v1.pdf
PWC https://paperswithcode.com/paper/enabling-efficient-and-flexible-fpga
Repo
Framework
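The isolation idea behind the two-level dispatch can be sketched in software: a first level routes each user's instruction frames only to that user's allocated cores, and a second level drains each core's queue. This is an illustrative scheduling sketch, not the paper's hardware design; the users, core allocations, and frame names are invented:

```python
from collections import deque

def dispatch(instr_frames, core_alloc):
    """Two-level dispatch sketch: route each user's tiling-based instruction
    frames to that user's allocated cores (first level), keeping a per-core
    FIFO queue (second level). Users never touch each other's cores."""
    queues = {c: deque() for cores in core_alloc.values() for c in cores}
    for user, frame in instr_frames:
        cores = core_alloc[user]
        # Route to the allocated core with the shortest queue (load balance).
        target = min(cores, key=lambda c: len(queues[c]))
        queues[target].append((user, frame))
    return {c: list(q) for c, q in queues.items()}

alloc = {"alice": [0, 1], "bob": [2]}
frames = [("alice", "f0"), ("bob", "f0"), ("alice", "f1"), ("alice", "f2")]
placed = dispatch(frames, alloc)
```

Because the hardware resource pool is partitioned per user, one tenant's load cannot queue behind another's, which is the performance-isolation property the paper targets.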

RiskOracle: A Minute-level Citywide Traffic Accident Forecasting Framework

Title RiskOracle: A Minute-level Citywide Traffic Accident Forecasting Framework
Authors Zhengyang Zhou, Yang Wang, Xike Xie, Lianliang Chen, Hengchang Liu
Abstract Real-time traffic accident forecasting is increasingly important for public safety and urban management (e.g., real-time safe route planning and emergency response deployment). Previous works on accident forecasting often operate at the hour level, utilizing existing neural networks with static region-wise correlations taken into account. However, forecasting becomes more challenging as the granularity of the forecasting step increases, owing to the highly dynamic nature of the road network and the inherent rareness of accident records in any single training sample, which lead to biased results and a zero-inflation issue. In this work, we propose a novel framework, RiskOracle, to improve the prediction granularity to the minute level. Specifically, we first transform the zero-risk values in labels to fit the training network. Then, we propose the Differential Time-varying Graph neural network (DTGN) to capture the immediate changes of traffic status and dynamic inter-subregion correlations. Furthermore, we adopt multi-task and region-selection schemes to highlight the citywide subregions where accidents are most likely, bridging the gap between biased risk values and the sporadic accident distribution. Extensive experiments on two real-world datasets demonstrate the effectiveness and scalability of our RiskOracle framework.
Tasks
Published 2020-02-19
URL https://arxiv.org/abs/2003.00819v1
PDF https://arxiv.org/pdf/2003.00819v1.pdf
PWC https://paperswithcode.com/paper/riskoracle-a-minute-level-citywide-traffic
Repo
Framework
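One way to picture the zero-risk label transformation is to replace exact-zero risk targets with small positive values graded by an auxiliary signal such as traffic volume, so that subregions with no recorded accident still carry an informative target. The exact transform, the auxiliary signal, and the `eps` floor below are assumptions for illustration, not the paper's formula:

```python
def fill_zero_risk(labels, aux, eps=0.05):
    """Replace exact-zero accident-risk labels with small positive values
    proportional to an auxiliary signal (e.g. traffic volume), easing the
    zero-inflation problem. Illustrative only; the paper's scheme may differ."""
    m = max(aux) or 1.0
    return [y if y > 0 else eps * a / m for y, a in zip(labels, aux)]

# Three subregions: two with no recorded accident, one with risk 0.3.
risks = fill_zero_risk([0.0, 0.3, 0.0], aux=[10.0, 5.0, 20.0])
```

After the transform every target is positive and ordered by the auxiliary signal, so the regression network no longer sees a label distribution dominated by zeros.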

Synchronization in 5G: a Bayesian Approach

Title Synchronization in 5G: a Bayesian Approach
Authors M. Goodarzi, D. Cvetkovski, N. Maletic, J. Gutierrez, E. Grass
Abstract In this work, we propose a hybrid approach to synchronize large-scale networks. In particular, we draw on Kalman Filtering (KF) along with time-stamps generated by the Precision Time Protocol (PTP) for pairwise node synchronization. Furthermore, we investigate the merit of Factor Graphs (FGs) along with the Belief Propagation (BP) algorithm in achieving high-precision end-to-end network synchronization. Finally, we present the idea of dividing the large-scale network into local synchronization domains, for each of which a suitable sync algorithm is utilized. The simulation results indicate that, despite the simplifications in the hybrid approach, the error in the offset estimation remains below 5 ns.
Tasks
Published 2020-02-28
URL https://arxiv.org/abs/2002.12660v1
PDF https://arxiv.org/pdf/2002.12660v1.pdf
PWC https://paperswithcode.com/paper/synchronization-in-5g-a-bayesian-approach
Repo
Framework
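The pairwise KF step can be sketched as a scalar Kalman filter tracking a near-constant clock offset from noisy offset measurements of the kind PTP time-stamps yield. The noise variances and random-walk model below are illustrative assumptions, not the paper's parameters:

```python
import random

def kalman_offset(measurements, q=1e-4, r=1.0):
    """Scalar Kalman filter for a slowly drifting clock offset observed
    through noisy pairwise measurements (e.g. derived from PTP time-stamps).
    q: process-noise variance (offset drift), r: measurement-noise variance."""
    x, p = measurements[0], 1.0       # initial state estimate and covariance
    for z in measurements[1:]:
        p += q                        # predict: random-walk offset model
        k = p / (p + r)               # Kalman gain
        x += k * (z - x)              # update with the new measurement
        p *= (1 - k)
    return x

random.seed(0)
true_offset = 42.0                    # hypothetical offset, in arbitrary units
meas = [true_offset + random.gauss(0, 1.0) for _ in range(500)]
est = kalman_offset(meas)
```

The filter averages out the time-stamping noise while still tracking slow drift, which is why it suits the pairwise-synchronization layer before the FG/BP stage handles the end-to-end network.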

Value-driven Hindsight Modelling

Title Value-driven Hindsight Modelling
Authors Arthur Guez, Fabio Viola, Théophane Weber, Lars Buesing, Steven Kapturowski, Doina Precup, David Silver, Nicolas Heess
Abstract Value estimation is a critical component of the reinforcement learning (RL) paradigm. The question of how to effectively learn predictors for value from data is one of the major problems studied by the RL community, and different approaches exploit structure in the problem domain in different ways. Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function. In contrast, model-free methods directly leverage the quantity of interest from the future but have to contend with a potentially weak scalar signal (an estimate of the return). In this paper we develop an approach for representation learning in RL that sits in between these two extremes: we propose to learn what to model in a way that can directly help value prediction. To this end we determine which features of the future trajectory provide useful information to predict the associated return. This provides us with tractable prediction targets that are directly relevant for a task, and can thus accelerate learning of the value function. The idea can be understood as reasoning, in hindsight, about which aspects of the future observations could help past value prediction. We show how this can help dramatically even in simple policy evaluation settings. We then test our approach at scale in challenging domains, including on 57 Atari 2600 games.
Tasks Atari Games, Representation Learning
Published 2020-02-19
URL https://arxiv.org/abs/2002.08329v1
PDF https://arxiv.org/pdf/2002.08329v1.pdf
PWC https://paperswithcode.com/paper/value-driven-hindsight-modelling-1
Repo
Framework

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth

Title A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth
Authors Yiping Lu, Chao Ma, Yulong Lu, Jianfeng Lu, Lexing Ying
Abstract Training deep neural networks with stochastic gradient descent (SGD) can often achieve zero training loss on real-world tasks although the optimization landscape is known to be highly non-convex. To understand the success of SGD for training deep neural networks, this work presents a mean-field analysis of deep residual networks, based on a line of works that interpret the continuum limit of the deep residual network as an ordinary differential equation when the network capacity tends to infinity. Specifically, we propose a new continuum limit of deep residual networks, which enjoys a good landscape in the sense that every local minimizer is global. This characterization enables us to derive the first global convergence result for multilayer neural networks in the mean-field regime. Furthermore, instead of assuming convexity of the loss landscape, our proof relies on a zero-loss assumption at the global minimizer, which holds when the model has a universal approximation property. Key to our result is the observation that a deep residual network resembles a shallow network ensemble, i.e. a two-layer network. We bound the difference between the shallow network and our ResNet model via the adjoint sensitivity method, which enables us to apply existing mean-field analyses of two-layer networks to deep networks. Furthermore, we propose several novel training schemes based on the new continuous model, including one training procedure that switches the order of the residual blocks and results in strong empirical performance on the benchmark datasets.
Tasks
Published 2020-03-11
URL https://arxiv.org/abs/2003.05508v1
PDF https://arxiv.org/pdf/2003.05508v1.pdf
PWC https://paperswithcode.com/paper/a-mean-field-analysis-of-deep-resnet-and
Repo
Framework
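The continuum limit the analysis builds on is easy to see numerically: a residual network with update x_{l+1} = x_l + (1/L) f(x_l) is a forward-Euler discretization of the ODE dx/dt = f(x), so deeper networks track the ODE solution more closely. The scalar toy map f(x) = -x below is purely illustrative:

```python
import math

def resnet_forward(x0, f, depth):
    """Residual network x_{l+1} = x_l + (1/L) f(x_l): forward-Euler
    discretization of dx/dt = f(x) with step size 1/L."""
    x, h = x0, 1.0 / depth
    for _ in range(depth):
        x = x + h * f(x)
    return x

# f(x) = -x gives dx/dt = -x, whose time-1 solution is x0 * exp(-1);
# increasing depth approaches this continuum limit.
shallow = resnet_forward(1.0, lambda x: -x, depth=4)
deep = resnet_forward(1.0, lambda x: -x, depth=4096)
exact = math.exp(-1.0)
```

It is this ODE view, with the residual function playing the role of a mean-field limit of blocks, that lets the paper transfer two-layer mean-field analyses to deep ResNets.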

Near-Optimal Algorithms for Minimax Optimization

Title Near-Optimal Algorithms for Minimax Optimization
Authors Tianyi Lin, Chi Jin, Michael I. Jordan
Abstract This paper resolves a longstanding open question pertaining to the design of near-optimal first-order algorithms for smooth and strongly-convex-strongly-concave minimax problems. Current state-of-the-art first-order algorithms find an approximate Nash equilibrium using $\tilde{O}(\kappa_{\mathbf x}+\kappa_{\mathbf y})$ or $\tilde{O}(\min\{\kappa_{\mathbf x}\sqrt{\kappa_{\mathbf y}}, \sqrt{\kappa_{\mathbf x}}\kappa_{\mathbf y}\})$ gradient evaluations, where $\kappa_{\mathbf x}$ and $\kappa_{\mathbf y}$ are the condition numbers for the strong-convexity and strong-concavity assumptions. A gap remains between these results and the best existing lower bound $\tilde{\Omega}(\sqrt{\kappa_{\mathbf x}\kappa_{\mathbf y}})$. This paper presents the first algorithm with $\tilde{O}(\sqrt{\kappa_{\mathbf x}\kappa_{\mathbf y}})$ gradient complexity, matching the lower bound up to logarithmic factors. Our new algorithm is designed based on an accelerated proximal point method and an accelerated solver for minimax proximal steps. It can be easily extended to the settings of strongly-convex-concave, convex-concave, nonconvex-strongly-concave, and nonconvex-concave functions. This paper also presents algorithms that match or outperform all existing methods in these settings in terms of gradient complexity, up to logarithmic factors.
Tasks
Published 2020-02-05
URL https://arxiv.org/abs/2002.02417v2
PDF https://arxiv.org/pdf/2002.02417v2.pdf
PWC https://paperswithcode.com/paper/near-optimal-algorithms-for-minimax
Repo
Framework
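For orientation, the baseline these accelerated methods improve upon is plain simultaneous gradient descent-ascent (GDA): descend in x, ascend in y. The quadratic test function below is illustrative; the paper's near-optimal algorithm is an accelerated proximal point scheme, not this:

```python
def gda(gx, gy, x, y, lr=0.1, steps=1000):
    """Plain simultaneous gradient descent-ascent on a smooth
    strongly-convex-strongly-concave f(x, y): step down the x-gradient
    and up the y-gradient. (Baseline only, not the paper's method.)"""
    for _ in range(steps):
        dx, dy = gx(x, y), gy(x, y)
        x, y = x - lr * dx, y + lr * dy
    return x, y

# f(x, y) = x^2/2 + x*y - y^2/2 has its unique saddle point at (0, 0);
# its partial gradients are x + y and x - y.
x, y = gda(lambda x, y: x + y, lambda x, y: x - y, 3.0, -2.0)
```

On strongly-convex-strongly-concave problems GDA converges linearly but at a rate governed by both condition numbers additively; the paper's contribution is closing the gap to the $\sqrt{\kappa_{\mathbf x}\kappa_{\mathbf y}}$ lower bound.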

From Matching with Diversity Constraints to Matching with Regional Quotas

Title From Matching with Diversity Constraints to Matching with Regional Quotas
Authors Haris Aziz, Serge Gaspers, Zhaohong Sun, Toby Walsh
Abstract In the past few years, several new matching models have been proposed and studied that take into account complex distributional constraints. Relevant lines of work include (1) school choice with diversity constraints where students have (possibly overlapping) types and (2) hospital-doctor matching where various regional quotas are imposed. In this paper, we present a polynomial-time reduction to transform an instance of (1) to an instance of (2) and we show how the feasibility and stability of corresponding matchings are preserved under the reduction. Our reduction provides a formal connection between two important strands of work on matching with distributional constraints. We then apply the reduction in two ways. Firstly, we show that it is NP-complete to check whether a feasible and stable outcome for (1) exists. Due to our reduction, these NP-completeness results carry over to setting (2). In view of this, we help unify some of the results that have been presented in the literature. Secondly, if we have positive results for (2), then we have corresponding results for (1). One key conclusion of our results is that further developments on axiomatic and algorithmic aspects of hospital-doctor matching with regional quotas will result in corresponding results for school choice with diversity constraints.
Tasks
Published 2020-02-17
URL https://arxiv.org/abs/2002.06748v1
PDF https://arxiv.org/pdf/2002.06748v1.pdf
PWC https://paperswithcode.com/paper/from-matching-with-diversity-constraints-to
Repo
Framework

The Break-Even Point on Optimization Trajectories of Deep Neural Networks

Title The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Authors Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras
Abstract The early phase of training of deep neural networks is critical for their final performance. In this work, we study how the hyperparameters of stochastic gradient descent (SGD) used in the early phase of training affect the rest of the optimization trajectory. We argue for the existence of the “break-even” point on this trajectory, beyond which the curvature of the loss surface and noise in the gradient are implicitly regularized by SGD. In particular, we demonstrate on multiple classification tasks that using a large learning rate in the initial phase of training reduces the variance of the gradient, and improves the conditioning of the covariance of gradients. These effects are beneficial from the optimization perspective and become visible after the break-even point. Complementing prior work, we also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers. In short, our work shows that key properties of the loss surface are strongly influenced by SGD in the early phase of training. We argue that studying the impact of the identified effects on generalization is a promising future direction.
Tasks
Published 2020-02-21
URL https://arxiv.org/abs/2002.09572v1
PDF https://arxiv.org/pdf/2002.09572v1.pdf
PWC https://paperswithcode.com/paper/the-break-even-point-on-optimization
Repo
Framework

A Survey of Methods for Low-Power Deep Learning and Computer Vision

Title A Survey of Methods for Low-Power Deep Learning and Computer Vision
Authors Abhinav Goel, Caleb Tung, Yung-Hsiang Lu, George K. Thiruvathukal
Abstract Deep neural networks (DNNs) are successful in many computer vision tasks. However, the most accurate DNNs require millions of parameters and operations, making them energy, computation and memory intensive. This impedes the deployment of large DNNs in low-power devices with limited compute resources. Recent research improves DNN models by reducing the memory requirement, energy consumption, and number of operations without significantly decreasing the accuracy. This paper surveys the progress of low-power deep learning and computer vision, specifically with regard to inference, and discusses the methods for compacting and accelerating DNN models. The techniques can be divided into four major categories: (1) parameter quantization and pruning, (2) compressed convolutional filters and matrix factorization, (3) network architecture search, and (4) knowledge distillation. We analyze the accuracy, advantages, disadvantages, and potential solutions to the problems with the techniques in each category. We also discuss new evaluation metrics as a guideline for future research.
Tasks Quantization
Published 2020-03-24
URL https://arxiv.org/abs/2003.11066v1
PDF https://arxiv.org/pdf/2003.11066v1.pdf
PWC https://paperswithcode.com/paper/a-survey-of-methods-for-low-power-deep
Repo
Framework
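Of the survey's four categories, parameter quantization is the simplest to sketch: uniform affine quantization maps floats to 8-bit integers via a scale and zero-point. The weights below are made up for illustration:

```python
def quantize_uint8(weights):
    """Uniform affine 8-bit quantization: map floats to {0..255} with a
    scale and offset, a common parameter-quantization scheme."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0    # guard against a constant tensor
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate floats from the quantized representation."""
    return [v * scale + lo for v in q]

w = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, lo = quantize_uint8(w)
w_hat = dequantize(q, scale, lo)
```

Each weight shrinks from 32 bits to 8, and the round-trip error is bounded by half the quantization step, which is why accuracy often degrades only slightly.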

BASGD: Buffered Asynchronous SGD for Byzantine Learning

Title BASGD: Buffered Asynchronous SGD for Byzantine Learning
Authors Yi-Rui Yang, Wu-Jun Li
Abstract Distributed learning has become a hot research topic, due to its wide application in cluster-based large-scale learning, federated learning, edge computing and so on. Most distributed learning methods assume no error or attack on the workers. However, many unexpected cases, such as communication error and even malicious attack, may happen in real applications. Hence, Byzantine learning (BL), which refers to distributed learning with attack or error, has recently attracted much attention. Most existing BL methods are synchronous, which will result in slow convergence when there exist heterogeneous workers. Furthermore, in some applications like federated learning and edge computing, synchronization cannot even be performed most of the time, because workers (clients or edge servers) may be offline. Hence, asynchronous BL (ABL) is more general and practical than synchronous BL (SBL). To the best of our knowledge, there exist only two ABL methods. One of them cannot resist malicious attack. The other needs to store some training instances on the server, which creates a privacy leakage problem. In this paper, we propose a novel method, called buffered asynchronous stochastic gradient descent (BASGD), for BL. BASGD is an asynchronous method. Furthermore, BASGD has no need to store any training instances on the server, and hence can preserve privacy in ABL. BASGD is theoretically proved to be able to resist error and malicious attack. Moreover, BASGD has a similar theoretical convergence rate to that of vanilla asynchronous SGD (ASGD), with an extra constant variance. Empirical results show that BASGD can significantly outperform vanilla ASGD and other ABL baselines, when there exists error or attack on workers.
Tasks
Published 2020-03-02
URL https://arxiv.org/abs/2003.00937v2
PDF https://arxiv.org/pdf/2003.00937v2.pdf
PWC https://paperswithcode.com/paper/basgd-buffered-asynchronous-sgd-for-byzantine
Repo
Framework
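The buffering idea can be sketched as follows: incoming gradients are averaged into a small number of buffers (here keyed by worker id), and the buffers are then combined with a coordinate-wise median, which bounds the influence of a minority of Byzantine gradients. This is a sketch of the mechanism, not the paper's full asynchronous algorithm; the buffer count and worker assignment are illustrative:

```python
import statistics

def basgd_aggregate(grads, worker_ids, num_buffers=3):
    """Buffered robust aggregation sketch: running-mean each worker's
    gradient into one of B buffers, then take a coordinate-wise median
    across buffers."""
    dim = len(grads[0])
    buffers = [[0.0] * dim for _ in range(num_buffers)]
    counts = [0] * num_buffers
    for g, wid in zip(grads, worker_ids):
        b = wid % num_buffers
        counts[b] += 1
        for i in range(dim):
            # Incremental running mean within the buffer.
            buffers[b][i] += (g[i] - buffers[b][i]) / counts[b]
    return [statistics.median(buf[i] for buf in buffers) for i in range(dim)]

# Five honest workers send gradient [1.0]; one Byzantine worker sends [1e6].
# The median over buffers suppresses the outlier.
g = basgd_aggregate([[1.0]] * 5 + [[1e6]], worker_ids=list(range(6)))
```

Because a single malicious gradient can corrupt at most one buffer, the median over buffers stays honest as long as most buffers are clean, and no training instances ever need to be stored on the server.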

Registration made easy – standalone orthopedic navigation with HoloLens

Title Registration made easy – standalone orthopedic navigation with HoloLens
Authors Florentin Liebmann, Simon Roner, Marco von Atzigen, Florian Wanivenhaus, Caroline Neuhaus, José Spirig, Davide Scaramuzza, Reto Sutter, Jess Snedeker, Mazda Farshad, Philipp Fürnstahl
Abstract In surgical navigation, finding correspondence between preoperative plan and intraoperative anatomy, the so-called registration task, is imperative. One promising approach is to intraoperatively digitize anatomy and register it with the preoperative plan. State-of-the-art commercial navigation systems implement such approaches for pedicle screw placement in spinal fusion surgery. Although these systems improve surgical accuracy, they are not the gold standard in clinical practice. Besides economic reasons, this may be due to their difficult integration into clinical workflows and unintuitive navigation feedback. Augmented Reality has the potential to overcome these limitations. Consequently, we propose a surgical navigation approach comprising intraoperative surface digitization for registration and intuitive holographic navigation for pedicle screw placement that runs entirely on the Microsoft HoloLens. Preliminary results from phantom experiments suggest that the method may meet clinical accuracy requirements.
Tasks
Published 2020-01-17
URL https://arxiv.org/abs/2001.06209v1
PDF https://arxiv.org/pdf/2001.06209v1.pdf
PWC https://paperswithcode.com/paper/registration-made-easy-standalone-orthopedic
Repo
Framework
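The registration step, aligning digitized surface points with the preoperative plan, amounts to estimating a rigid transform between point sets. A closed-form 2D sketch with known correspondences is shown below; real systems align 3D point clouds without known correspondences (e.g. via ICP), so the triangle and the recovered motion here are purely illustrative:

```python
import math

def rigid_align_2d(src, dst):
    """Closed-form 2D rigid registration (rotation + translation) of point
    set src onto dst with known one-to-one correspondences."""
    n = len(src)
    csx = sum(x for x, _ in src) / n; csy = sum(y for _, y in src) / n
    cdx = sum(x for x, _ in dst) / n; cdy = sum(y for _, y in dst) / n
    # Optimal rotation from the cross-covariance of the centred points.
    sxx = sxy = syx = syy = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax -= csx; ay -= csy; bx -= cdx; by -= cdy
        sxx += ax * bx; sxy += ax * by; syx += ay * bx; syy += ay * by
    theta = math.atan2(sxy - syx, sxx + syy)
    c, s = math.cos(theta), math.sin(theta)
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, (tx, ty)

# Rotate a triangle by 30 degrees and shift it; alignment recovers the motion.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
ang = math.radians(30)
dst = [(math.cos(ang) * x - math.sin(ang) * y + 5.0,
        math.sin(ang) * x + math.cos(ang) * y - 1.0) for x, y in src]
theta, t = rigid_align_2d(src, dst)
```

Once the transform is known, the preoperative plan (e.g. planned screw trajectories) can be rendered in the surgeon's view in the intraoperative coordinate frame, which is what the holographic navigation displays.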

IoT Based Real Time Noise Mapping System for Urban Sound Pollution Study

Title IoT Based Real Time Noise Mapping System for Urban Sound Pollution Study
Authors Sakib Ahmed, Touseef Saleh Bin Ahmed, Sumaiya Jafreen, Jannatul Tajrin, Jia Uddin
Abstract This paper describes the development of a system that enables real-time visualization, via a web app, of sound-intensity data from multiple node devices connected through the internet. The prototypes were realized using ATmega328 (Arduino Nano) and ESP8266 hardware modules, the NodeMCU Arduino wrapper library, the Google Maps and Firebase APIs, and a JavaScript web app. The system architecture is such that multiple node devices are installed in different locations of the target area. On each node device, an Arduino Nano interfaced with a sound sensor measures the ambient sound intensity, and an ESP8266 Wi-Fi module transmits the data to a database via a web API. The web app plots all the real-time data from the devices over Google Maps according to the locations of the node devices. The logged data can then be used to carry out research on sound pollution in the targeted areas.
Tasks
Published 2020-02-25
URL https://arxiv.org/abs/2002.11188v1
PDF https://arxiv.org/pdf/2002.11188v1.pdf
PWC https://paperswithcode.com/paper/iot-based-real-time-noise-mapping-system-for
Repo
Framework
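The quantity each node reports can be sketched as an RMS sound level in decibels computed from raw sensor samples. This is an illustrative Python sketch (the actual firmware runs C on the Arduino and reads an analog sound-sensor pin); the sample values and reference level are invented:

```python
import math

def sound_level_db(samples, ref=1.0):
    """RMS sound level in decibels relative to `ref`, the kind of value a
    node would compute from raw samples before posting it to the database."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / ref)

# A square-ish signal of amplitude 0.1 relative to the reference level.
level = sound_level_db([0.1, -0.1, 0.1, -0.1])
```

Posting one such scalar per interval keeps the Wi-Fi payload tiny, which suits the ESP8266's bandwidth and the real-time map update.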

Chemical-induced Disease Relation Extraction with Dependency Information and Prior Knowledge

Title Chemical-induced Disease Relation Extraction with Dependency Information and Prior Knowledge
Authors Huiwei Zhou, Shixian Ning, Yunlong Yang, Zhuang Liu, Chengkun Lang, Yingyu Lin
Abstract Chemical-disease relation (CDR) extraction is of significant importance to various areas of biomedical research and health care. Nowadays, many large-scale biomedical knowledge bases (KBs) containing triples about entity pairs and their relations have been built. KBs are important resources for biomedical relation extraction, yet previous research pays little attention to such prior knowledge. In addition, the dependency tree contains important syntactic and semantic information that helps to improve relation extraction, so how to use it effectively is also worth studying. In this paper, we propose a novel convolutional attention network (CAN) for CDR extraction. Firstly, we extract the shortest dependency path (SDP) between chemical and disease pairs in a sentence, which includes a sequence of words, dependency directions, and dependency relation tags. Then convolution operations are performed on the SDP to produce deep semantic dependency features. After that, an attention mechanism is employed to learn the importance/weight of each semantic dependency vector related to knowledge representations learned from KBs. Finally, in order to combine dependency information and prior knowledge, the concatenation of weighted semantic dependency representations and knowledge representations is fed to the softmax layer for classification. Experiments on the BioCreative V CDR dataset show that our method achieves comparable performance with the state-of-the-art systems, and both dependency information and prior knowledge play important roles in the CDR extraction task.
Tasks Relation Extraction
Published 2020-01-02
URL https://arxiv.org/abs/2001.00295v1
PDF https://arxiv.org/pdf/2001.00295v1.pdf
PWC https://paperswithcode.com/paper/chemical-induced-disease-relation-extraction
Repo
Framework
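The SDP construction in the first step can be sketched as a breadth-first search over the dependency tree, traversing edges in both directions while recording the direction and relation tag at each hop. The toy parse below ("aspirin induces headache") and its edge tags are invented for illustration:

```python
from collections import deque

def shortest_dependency_path(edges, start, goal):
    """Shortest path between two tokens in a dependency tree, interleaving
    tokens with traversal directions and relation tags, as in an SDP input.
    edges: iterable of (head, dependent, relation_tag)."""
    adj = {}
    for head, dep, tag in edges:
        adj.setdefault(head, []).append((dep, "down", tag))
        adj.setdefault(dep, []).append((head, "up", tag))
    queue, seen = deque([(start, [start])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, direction, tag in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [direction, tag, nxt]))
    return None

# Toy dependency edges for "aspirin induces headache".
edges = [("induces", "aspirin", "nsubj"), ("induces", "headache", "obj")]
sdp = shortest_dependency_path(edges, "aspirin", "headache")
```

The resulting token/direction/tag sequence is exactly the kind of input the convolution layer consumes before the KB-guided attention weights its features.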