April 3, 2020

Paper Group ANR 11

Communication Efficient Federated Learning over Multiple Access Channels. Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks. A Study of Human Summaries of Scientific Articles. Low-Complexity LSTM Training and Inference with FloatSD8 Weight Representation. Least squares binary quantization of neural …

Communication Efficient Federated Learning over Multiple Access Channels

Title Communication Efficient Federated Learning over Multiple Access Channels
Authors Wei-Ting Chang, Ravi Tandon
Abstract In this work, we study the problem of federated learning (FL), where distributed users aim to jointly train a machine learning model with the help of a parameter server (PS). In each iteration of FL, users compute local gradients, followed by transmission of the quantized gradients for subsequent aggregation and model updates at the PS. One of the challenges of FL is communication overhead due to FL’s iterative nature and large model sizes. One recent direction to alleviate the communication bottleneck in FL is to let users communicate simultaneously over a multiple access channel (MAC), possibly making better use of the communication resources. In this paper, we consider the problem of FL over a MAC. In particular, we focus on the design of digital gradient transmission schemes over a MAC, where gradients at each user are first quantized, and then transmitted over a MAC to be decoded individually at the PS. When designing digital FL schemes over MACs, there are new opportunities to assign different amounts of resources (such as rate or bandwidth) to different users based on a) the informativeness of the gradients at each user, and b) the underlying channel conditions. We propose a stochastic gradient quantization scheme, where the quantization parameters are optimized based on the capacity region of the MAC. We show that such channel-aware quantization for FL outperforms uniform quantization, particularly when users experience different channel conditions, and when they have gradients with varying levels of informativeness.
Tasks Quantization
Published 2020-01-23
URL https://arxiv.org/abs/2001.08737v1
PDF https://arxiv.org/pdf/2001.08737v1.pdf
PWC https://paperswithcode.com/paper/communication-efficient-federated-learning
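
To make the phrase "stochastic gradient quantization" concrete, here is a minimal sketch of a generic unbiased stochastic quantizer on a uniform grid. It is not the channel-aware scheme from the paper: the function name `stochastic_quantize` and the parameter `num_levels` are assumptions for illustration. In a channel-aware design, the number of levels per user would presumably be the quantity chosen from the MAC capacity region.

```python
import random

def stochastic_quantize(values, num_levels, rng):
    """Quantize values onto a uniform grid over [-scale, scale].

    Each value is rounded to one of its two neighbouring grid points at
    random, with probabilities chosen so the result is unbiased in
    expectation. Illustrative only; requires num_levels >= 2.
    """
    scale = max(abs(v) for v in values)
    if scale == 0.0:
        return list(values)
    step = 2.0 * scale / (num_levels - 1)
    quantized = []
    for v in values:
        pos = (v + scale) / step          # position on the grid, in [0, num_levels - 1]
        lower = int(pos)                  # index of the grid point at or below v
        frac = pos - lower                # distance to that point, in grid units
        idx = lower + (1 if rng.random() < frac else 0)
        idx = min(idx, num_levels - 1)    # guard against floating-point overshoot
        quantized.append(-scale + idx * step)
    return quantized

# Hypothetical gradient vector quantized to 3 levels (grid points -1, 0, 1).
rng = random.Random(0)
print(stochastic_quantize([0.5, -1.0, 0.25, 1.0], 3, rng))
```

Fewer levels mean fewer bits per coordinate but higher quantization variance, which is the trade-off a rate allocation across users has to balance.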

Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks

Title Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks
Authors Léopold Cambier, Anahita Bhiwandiwalla, Ting Gong, Mehran Nekuii, Oguz H Elibol, Hanlin Tang
Abstract Training with a larger number of parameters while keeping fast iterations is an increasingly adopted strategy and trend for developing better performing Deep Neural Network (DNN) models. This necessitates increased memory footprint and computational requirements for training. Here we introduce a novel methodology for training deep neural networks using 8-bit floating point (FP8) numbers. Reduced bit precision allows for a larger effective memory and increased computational speed. We name this method Shifted and Squeezed FP8 (S2FP8). We show that, unlike previous 8-bit precision training methods, the proposed method works out-of-the-box for representative models: ResNet-50, Transformer and NCF. The method can maintain model accuracy without requiring fine-tuning of loss scaling parameters or keeping certain layers in single precision. We introduce two learnable statistics of the DNN tensors - shifted and squeezed factors that are used to optimally adjust the range of the tensors in 8 bits, thus minimizing the loss in information due to quantization.
Tasks Quantization
Published 2020-01-16
URL https://arxiv.org/abs/2001.05674v1
PDF https://arxiv.org/pdf/2001.05674v1.pdf
PWC https://paperswithcode.com/paper/shifted-and-squeezed-8-bit-floating-point-1

A Study of Human Summaries of Scientific Articles

Title A Study of Human Summaries of Scientific Articles
Authors Odellia Boni, Guy Feigenblat, Doron Cohen, Haggai Roitman, David Konopnicki
Abstract Researchers and students face an explosion of newly published papers which may be relevant to their work. This has led to a trend of sharing human summaries of scientific papers. We analyze the summaries shared on one of these platforms, Shortscience.org. The goal is to characterize human summaries of scientific papers, and use some of the insights obtained to improve and adapt existing automatic summarization systems to the domain of scientific papers.
Published 2020-02-10
URL https://arxiv.org/abs/2002.03604v1
PDF https://arxiv.org/pdf/2002.03604v1.pdf
PWC https://paperswithcode.com/paper/a-study-of-human-summaries-of-scientific

Low-Complexity LSTM Training and Inference with FloatSD8 Weight Representation

Title Low-Complexity LSTM Training and Inference with FloatSD8 Weight Representation
Authors Yu-Tung Liu, Tzi-Dar Chiueh
Abstract The FloatSD technology has been shown to have excellent performance on low-complexity convolutional neural networks (CNNs) training and inference. In this paper, we applied FloatSD to recurrent neural networks (RNNs), specifically long short-term memory (LSTM). In addition to FloatSD weight representation, we quantized the gradients and activations in model training to 8 bits. Moreover, the arithmetic precision for accumulations and the master copy of weights were reduced from 32 bits to 16 bits. We demonstrated that the proposed training scheme can successfully train several LSTM models from scratch, while fully preserving model accuracy. Finally, to verify the proposed method’s advantage in implementation, we designed an LSTM neuron circuit and showed that it achieved significantly reduced die area and power consumption.
Published 2020-01-23
URL https://arxiv.org/abs/2001.08450v1
PDF https://arxiv.org/pdf/2001.08450v1.pdf
PWC https://paperswithcode.com/paper/low-complexity-lstm-training-and-inference

Least squares binary quantization of neural networks

Title Least squares binary quantization of neural networks
Authors Hadi Pouransari, Zhucheng Tu, Oncel Tuzel
Abstract Quantizing weights and activations of deep neural networks results in significant improvement in inference efficiency at the cost of lower accuracy. A source of the accuracy gap between full precision and quantized models is the quantization error. In this work, we focus on the binary quantization, in which values are mapped to -1 and 1. We provide a unified framework to analyze different scaling strategies. Inspired by the pareto-optimality of 2-bits versus 1-bit quantization, we introduce a novel 2-bits quantization with provably least squares error. Our quantization algorithms can be implemented efficiently on the hardware using bitwise operations. We present proofs to show that our proposed methods are optimal, and also provide empirical error analysis. We conduct experiments on the ImageNet dataset and show a reduced accuracy gap when using the proposed least squares quantization algorithms.
Tasks Quantization
Published 2020-01-09
URL https://arxiv.org/abs/2001.02786v2
PDF https://arxiv.org/pdf/2001.02786v2.pdf
PWC https://paperswithcode.com/paper/least-squares-binary-quantization-of-neural
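
As a worked illustration of what "least squares" means for binary quantization, the sketch below covers only the standard 1-bit case, not the paper's 2-bit algorithm. Approximating a vector x by a·sign(x) and minimizing the squared error ‖x − a·sign(x)‖² over the scalar a gives a = mean(|x|). The function names are assumptions for illustration.

```python
def binarize_least_squares(values):
    """Return (scale, signs) with values ~ scale * signs, signs in {-1, +1}.

    For fixed signs = sign(values), the squared error is
    sum(v^2) - 2 * scale * sum(|v|) + n * scale^2, which is minimized
    at scale = mean(|v|).
    """
    scale = sum(abs(v) for v in values) / len(values)
    signs = [1.0 if v >= 0 else -1.0 for v in values]
    return scale, signs

def squared_error(values, scale, signs):
    """Squared reconstruction error of the binary approximation."""
    return sum((v - scale * s) ** 2 for v, s in zip(values, signs))

# Hypothetical weight vector.
weights = [0.5, -1.5, 2.0, -1.0]
scale, signs = binarize_least_squares(weights)
print(scale, signs)                           # 1.25 [1.0, -1.0, 1.0, -1.0]
print(squared_error(weights, scale, signs))   # 1.25
```

Any other scale with the same signs gives a larger error. Using 1.0 or 1.5 on this vector yields 1.5 instead of 1.25.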

MVC-Net: A Convolutional Neural Network Architecture for Manifold-Valued Images With Applications

Title MVC-Net: A Convolutional Neural Network Architecture for Manifold-Valued Images With Applications
Authors Jose J. Bouza, Chun-Hao Yang, David Vaillancourt, Baba C. Vemuri
Abstract Geometric deep learning has attracted significant attention in recent years, in part due to the availability of exotic data types for which traditional neural network architectures are not well suited. Our goal in this paper is to generalize convolutional neural networks (CNN) to the manifold-valued image case which arises commonly in medical imaging and computer vision applications. Explicitly, the input data to the network is an image where each pixel value is a sample from a Riemannian manifold. To achieve this goal, we must generalize the basic building block of traditional CNN architectures, namely, the weighted combinations operation. To this end, we develop a tangent space combination operation which is used to define a convolution operation on manifold-valued images that we call the Manifold-Valued Convolution (MVC). We prove theoretical properties of the MVC operation, including equivariance to the action of the isometry group admitted by the manifold, and characterize when compositions of MVC layers collapse to a single layer. We present a detailed description of how to use MVC layers to build full, multi-layer neural networks that operate on manifold-valued images, which we call the MVC-net. Further, we empirically demonstrate superior performance of the MVC-nets in medical imaging and computer vision tasks.
Published 2020-03-02
URL https://arxiv.org/abs/2003.01234v2
PDF https://arxiv.org/pdf/2003.01234v2.pdf
PWC https://paperswithcode.com/paper/mvc-net-a-convolutional-neural-network

Bounding the expected run-time of nonconvex optimization with early stopping

Title Bounding the expected run-time of nonconvex optimization with early stopping
Authors Thomas Flynn, Kwang Min Yu, Abid Malik, Nicolas D’Imperio, Shinjae Yoo
Abstract This work examines the convergence of stochastic gradient-based optimization algorithms that use early stopping based on a validation function. The form of early stopping we consider is that optimization terminates when the norm of the gradient of a validation function falls below a threshold. We derive conditions that guarantee this stopping rule is well-defined, and provide bounds on the expected number of iterations and gradient evaluations needed to meet this criterion. The guarantee accounts for the distance between the training and validation sets, measured with the Wasserstein distance. We develop the approach in the general setting of a first-order optimization algorithm, with possibly biased update directions subject to a geometric drift condition. We then derive bounds on the expected running time for early stopping variants of several algorithms, including stochastic gradient descent (SGD), decentralized SGD (DSGD), and the stochastic variance reduced gradient (SVRG) algorithm. Finally, we consider the generalization properties of the iterate returned by early stopping.
Published 2020-02-20
URL https://arxiv.org/abs/2002.08856v1
PDF https://arxiv.org/pdf/2002.08856v1.pdf
PWC https://paperswithcode.com/paper/bounding-the-expected-run-time-of-nonconvex
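
The stopping rule described in the abstract (terminate once the gradient norm of a validation function falls below a threshold) can be sketched on a toy one-dimensional problem. Everything below is a hypothetical setup for illustration, not the paper's experiments: the quadratic objectives, the noise level, and the names `train_grad`, `val_grad`, and `sgd_with_early_stopping` are all assumptions.

```python
import random

def train_grad(w, rng):
    """Noisy gradient of the toy training loss 0.5 * (w - 1.0)^2."""
    return (w - 1.0) + rng.gauss(0.0, 0.1)

def val_grad(w):
    """Exact gradient of the toy validation loss 0.5 * (w - 1.1)^2."""
    return w - 1.1

def sgd_with_early_stopping(w0, step_size, threshold, max_iters, rng):
    """Run SGD on the training loss; stop once |validation gradient| < threshold.

    Returns the final iterate and the number of SGD updates performed.
    """
    w = w0
    for t in range(max_iters):
        if abs(val_grad(w)) < threshold:
            return w, t
        w -= step_size * train_grad(w, rng)
    return w, max_iters

rng = random.Random(0)
w, iters = sgd_with_early_stopping(w0=5.0, step_size=0.1, threshold=0.2,
                                   max_iters=1000, rng=rng)
print(w, iters)
```

In this toy setup the training and validation minimizers differ by 0.1, and the threshold of 0.2 is chosen above the validation-gradient magnitude at the training optimum so that the rule is satisfied near that optimum.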

Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making

Title Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making
Authors Chengchun Shi, Runzhe Wan, Rui Song, Wenbin Lu, Ling Leng
Abstract The Markov assumption (MA) is fundamental to the empirical validity of reinforcement learning. In this paper, we propose a novel Forward-Backward Learning procedure to test MA in sequential decision making. The proposed test does not assume any parametric form on the joint distribution of the observed data and plays an important role for identifying the optimal policy in high-order Markov decision processes and partially observable MDPs. We apply our test to both synthetic datasets and a real data example from mobile health studies to illustrate its usefulness.
Tasks Decision Making
Published 2020-02-05
URL https://arxiv.org/abs/2002.01751v1
PDF https://arxiv.org/pdf/2002.01751v1.pdf
PWC https://paperswithcode.com/paper/does-the-markov-decision-process-fit-the-data

Hierarchical Memory Decoding for Video Captioning

Title Hierarchical Memory Decoding for Video Captioning
Authors Aming Wu, Yahong Han
Abstract Recent advances in video captioning often employ a recurrent neural network (RNN) as the decoder. However, an RNN is prone to diluting long-term information. Recent works have demonstrated that the memory network (MemNet) has the advantage of storing long-term information. However, as a decoder, it has not been well exploited for video captioning. The reason partially comes from the difficulty of sequence decoding with MemNet. Instead of the common practice, i.e., sequence decoding with an RNN, in this paper, we devise a novel memory decoder for video captioning. Concretely, after obtaining a representation of each frame through a pre-trained network, we first fuse the visual and lexical information. Then, at each time step, we construct a multi-layer MemNet-based decoder, i.e., in each layer, we employ a memory set to store previous information and an attention mechanism to select the information related to the current input. Thus, this decoder avoids the dilution of long-term information, and the multi-layer architecture is helpful for capturing dependencies between frames and word sequences. Experimental results show that even without the encoding network, our decoder can still obtain competitive performance and outperform an RNN decoder. Furthermore, compared with a one-layer RNN decoder, our decoder has fewer parameters.
Tasks Video Captioning
Published 2020-02-27
URL https://arxiv.org/abs/2002.11886v1
PDF https://arxiv.org/pdf/2002.11886v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-memory-decoding-for-video

Generating EEG features from Acoustic features

Title Generating EEG features from Acoustic features
Authors Gautam Krishna, Co Tran, Mason Carnahan, Yan Han, Ahmed H Tewfik
Abstract In this paper we demonstrate predicting electroencephalography (EEG) features from acoustic features using a recurrent neural network (RNN) based regression model and a generative adversarial network (GAN). We predict various types of EEG features from acoustic features. We compare our results with the previously studied problem of speech synthesis using EEG, and our results demonstrate that EEG features can be generated from acoustic features with lower root mean square error (RMSE) and normalized RMSE values compared to generating acoustic features from EEG features (i.e., speech synthesis using EEG) when tested using the same data sets.
Tasks EEG, Speech Synthesis
Published 2020-02-29
URL https://arxiv.org/abs/2003.00007v2
PDF https://arxiv.org/pdf/2003.00007v2.pdf
PWC https://paperswithcode.com/paper/generating-eeg-features-from-acoustic

Continuous-action Reinforcement Learning for Playing Racing Games: Comparing SPG to PPO

Title Continuous-action Reinforcement Learning for Playing Racing Games: Comparing SPG to PPO
Authors Mario S. Holubar, Marco A. Wiering
Abstract In this paper, a novel racing environment for OpenAI Gym is introduced. This environment operates with continuous action- and state-spaces and requires agents to learn to control the acceleration and steering of a car while navigating a randomly generated racetrack. Different versions of two actor-critic learning algorithms are tested on this environment: Sampled Policy Gradient (SPG) and Proximal Policy Optimization (PPO). An extension of SPG is introduced that aims to improve learning performance by weighting action samples during the policy update step. The effect of using experience replay (ER) is also investigated. To this end, a modification to PPO is introduced that allows for training using old action samples by optimizing the actor in log space. Finally, a new technique for performing ER is tested that aims to improve learning speed without sacrificing performance by splitting the training into two parts, whereby networks are first trained using state transitions from the replay buffer, and then using only recent experiences. The results indicate that experience replay is not beneficial to PPO in continuous action spaces. The training of SPG seems to be more stable when actions are weighted. All versions of SPG outperform PPO when ER is used. The ER trick is effective at improving training speed on a computationally less intensive version of SPG.
Published 2020-01-15
URL https://arxiv.org/abs/2001.05270v1
PDF https://arxiv.org/pdf/2001.05270v1.pdf
PWC https://paperswithcode.com/paper/continuous-action-reinforcement-learning-for

Visualizing intestines for diagnostic assistance of ileus based on intestinal region segmentation from 3D CT images

Title Visualizing intestines for diagnostic assistance of ileus based on intestinal region segmentation from 3D CT images
Authors Hirohisa Oda, Kohei Nishio, Takayuki Kitasaka, Hizuru Amano, Aitaro Takimoto, Hiroo Uchida, Kojiro Suzuki, Hayato Itoh, Masahiro Oda, Kensaku Mori
Abstract This paper presents a method for visualizing intestine (small and large intestine) regions and their stenosed parts caused by ileus from CT volumes. Since it is difficult for non-expert clinicians to find stenosed parts, the intestine and its stenosed parts should be visualized intuitively. Furthermore, the intestine regions of ileus cases are quite hard to segment. The proposed method segments intestine regions with a 3D FCN (3D U-Net). Intestine regions are difficult to segment in ileus cases because the inside of the intestine is filled with fluids, which have intensities similar to those of the intestinal wall on 3D CT volumes. We segment the intestine regions using a 3D U-Net trained with a weak annotation approach. Weak annotation makes it possible to train the 3D U-Net with small manually traced label images of the intestine. This avoids the need to prepare many annotation labels of the intestine, which has a long and winding shape. Each intestine segment is volume-rendered and colored based on the distance from its endpoint. Stenosed parts (disjoint points of an intestine segment) can be easily identified in such a visualization. In the experiments, we showed that stenosed parts were intuitively visualized as endpoints of segmented regions, which are colored red or blue.
Published 2020-03-03
URL https://arxiv.org/abs/2003.01290v1
PDF https://arxiv.org/pdf/2003.01290v1.pdf
PWC https://paperswithcode.com/paper/visualizing-intestines-for-diagnostic

Consciousness and Automated Reasoning

Title Consciousness and Automated Reasoning
Authors Ulrike Barthelmeß, Ulrich Furbach, Claudia Schon
Abstract This paper aims at demonstrating how a first-order logic reasoning system in combination with a large knowledge base can be understood as an artificial consciousness system. For this we review some aspects from the area of philosophy of mind and in particular Baars’ Global Workspace Theory. This will be applied to the reasoning system Hyper with ConceptNet as a knowledge base. Finally we demonstrate that such a system is very well able to do conscious mind wandering.
Published 2020-01-26
URL https://arxiv.org/abs/2001.09442v1
PDF https://arxiv.org/pdf/2001.09442v1.pdf
PWC https://paperswithcode.com/paper/consciousness-and-automated-reasoning

Speech Synthesis using EEG

Title Speech Synthesis using EEG
Authors Gautam Krishna, Co Tran, Yan Han, Mason Carnahan
Abstract In this paper we demonstrate speech synthesis using different electroencephalography (EEG) feature sets recently introduced in [1]. We make use of a recurrent neural network (RNN) regression model to predict acoustic features directly from EEG features. We demonstrate our results using EEG features recorded in parallel with spoken speech as well as using EEG recorded in parallel with listening utterances. We provide EEG based speech synthesis results for four subjects in this paper and our results demonstrate the feasibility of synthesizing speech directly from EEG features.
Tasks EEG, Speech Synthesis
Published 2020-02-22
URL https://arxiv.org/abs/2002.12756v1
PDF https://arxiv.org/pdf/2002.12756v1.pdf
PWC https://paperswithcode.com/paper/speech-synthesis-using-eeg

Deep Learning System to Screen Coronavirus Disease 2019 Pneumonia

Title Deep Learning System to Screen Coronavirus Disease 2019 Pneumonia
Authors Xiaowei Xu, Xiangao Jiang, Chunlian Ma, Peng Du, Xukun Li, Shuangzhi Lv, Liang Yu, Yanfei Chen, Junwei Su, Guanjing Lang, Yongtao Li, Hong Zhao, Kaijin Xu, Lingxiang Ruan, Wei Wu
Abstract We found that real-time reverse transcription-polymerase chain reaction (RT-PCR) detection of viral RNA from sputum or nasopharyngeal swabs has a relatively low positive rate for determining COVID-19 (named by the World Health Organization) in the early stage. The manifestations of COVID-19 on computed tomography (CT) imaging have their own characteristics, which differ from those of other types of viral pneumonia, such as Influenza-A viral pneumonia. Therefore, clinical doctors call for another early diagnostic criterion for this new type of pneumonia as soon as possible. This study aimed to establish an early screening model to distinguish COVID-19 pneumonia from Influenza-A viral pneumonia and healthy cases with pulmonary CT images using deep learning techniques. The candidate infection regions were first segmented out from the pulmonary CT image set using a 3-dimensional deep learning model. These separated images were then categorized into COVID-19, Influenza-A viral pneumonia, and irrelevant-to-infection groups, together with the corresponding confidence scores, using a location-attention classification model. Finally, the infection type and total confidence score of the CT case were calculated with a Noisy-or Bayesian function. Experimental results on the benchmark dataset showed that the overall accuracy was 86.7% from the perspective of CT cases as a whole. The deep learning models established in this study were effective for the early screening of COVID-19 patients and were demonstrated to be a promising supplementary diagnostic method for frontline clinical doctors.
Tasks Computed Tomography (CT)
Published 2020-02-21
URL https://arxiv.org/abs/2002.09334v1
PDF https://arxiv.org/pdf/2002.09334v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-system-to-screen-coronavirus
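
The abstract's final step combines per-region confidence scores into one case-level score with a Noisy-or function. A minimal sketch of the standard Noisy-or rule follows, assuming each score is a probability and the regions are treated as independent. The paper's exact formulation may differ, and the function name `noisy_or` and the example scores are hypothetical.

```python
def noisy_or(probabilities):
    """Combine independent per-region probabilities into a case-level one.

    The case is positive unless every region is negative, so
    P(case) = 1 - product(1 - p_i).
    """
    prob_all_negative = 1.0
    for p in probabilities:
        prob_all_negative *= (1.0 - p)
    return 1.0 - prob_all_negative

# Hypothetical confidence scores for candidate regions of one CT case.
print(noisy_or([0.5, 0.5]))   # 0.75
```

Under this rule a single high-confidence region dominates the case-level score, while several weak regions can also add up to a confident call.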