April 2, 2020


Paper Group ANR 153


Graph Convolutional Auto-encoder with Bi-decoder and Adaptive-sharing Adjacency

Title Graph Convolutional Auto-encoder with Bi-decoder and Adaptive-sharing Adjacency
Authors Rui Zhang, Yunxing Zhang, Xuelong Li
Abstract Graph autoencoder (GAE) serves as an effective unsupervised learning framework to represent graph data in a latent space for network embedding. Most existing approaches typically focus on minimizing the reconstruction loss of the graph structure but neglect the reconstruction of node features, which may result in overfitting due to the capacity of the autoencoders. Additionally, the adjacency matrix in these methods is always fixed, so it cannot properly represent the connections among nodes in the latent space. To address these problems, we propose in this paper a novel Graph Convolutional Auto-encoder with Bi-decoder and Adaptive-sharing Adjacency method, namely BAGA. The framework encodes the topological structure and node features into latent representations, on which a bi-decoder is trained to reconstruct the graph structure and node features simultaneously. Furthermore, the adjacency matrix can be adaptively updated from the learned latent representations to better represent the connections among nodes in the latent space. Experimental results validate the superiority of our method over state-of-the-art network embedding methods on the clustering task.
Tasks Network Embedding
Published 2020-03-10
URL https://arxiv.org/abs/2003.04508v1
PDF https://arxiv.org/pdf/2003.04508v1.pdf
PWC https://paperswithcode.com/paper/graph-convolutional-auto-encoder-with-bi
Repo
Framework
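
The bi-decoder idea above is easy to sketch. Below is a minimal, hypothetical PyTorch rendering (not the authors' code): a two-layer GCN encoder whose latent codes feed two decoders, an inner-product decoder for the graph structure and a linear decoder for the node features. The layer sizes and the loss weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiDecoderGAE(nn.Module):
    def __init__(self, n_feats, hidden, latent):
        super().__init__()
        self.w1 = nn.Linear(n_feats, hidden)         # first GCN layer
        self.w2 = nn.Linear(hidden, latent)          # second GCN layer
        self.feat_dec = nn.Linear(latent, n_feats)   # feature decoder

    def forward(self, a_norm, x):
        # a_norm: normalized adjacency (N, N); x: node features (N, F)
        z = a_norm @ self.w2(torch.relu(a_norm @ self.w1(x)))
        a_rec = torch.sigmoid(z @ z.t())             # structure decoder (inner product)
        x_rec = self.feat_dec(z)                     # feature decoder
        return z, a_rec, x_rec

def bi_decoder_loss(a, x, a_rec, x_rec, alpha=0.5):
    # Reconstruct structure and features jointly; alpha balances the two terms.
    return F.binary_cross_entropy(a_rec, a) + alpha * F.mse_loss(x_rec, x)

# The adaptive-sharing adjacency could then, for instance, be rebuilt
# periodically from latent similarity, e.g.
# a_new = torch.softmax(z @ z.t(), dim=1)  (an assumption, not the paper's rule).
```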

Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels

Title Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels
Authors Daizong Liu, Shuangjie Xu, Pan Zhou, Kun He, Wei Wei, Zichuan Xu
Abstract Disease diagnosis on chest X-ray images is a challenging multi-label classification task. Previous works generally classify the diseases independently on the input image without considering any correlation among diseases. However, such correlations actually exist; for example, Pleural Effusion is more likely to appear when Pneumothorax is present. In this work, we propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases by using a dynamic learnable adjacency matrix in a graph structure to improve diagnosis accuracy. To learn more natural and reliable correlations, we feed each node with the image-level individual feature map corresponding to each type of disease. To our knowledge, our method is the first to build a graph over the feature maps with a dynamic adjacency matrix for correlation learning. To further deal with the practical issue of incomplete labels, DD-GCN utilizes an adaptive loss and a curriculum learning strategy to train the model on incomplete labels. Experimental results on two popular chest X-ray (CXR) datasets show that our prediction accuracy outperforms state-of-the-art methods, and the learned graph adjacency matrix establishes the correlation representations of different diseases, which is consistent with expert experience. In addition, we conduct an ablation study to demonstrate the effectiveness of each component in DD-GCN.
Tasks Multi-Label Classification
Published 2020-02-26
URL https://arxiv.org/abs/2002.11629v2
PDF https://arxiv.org/pdf/2002.11629v2.pdf
PWC https://paperswithcode.com/paper/dynamic-graph-correlation-learning-for
Repo
Framework
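
The core mechanism, a graph over per-disease feature vectors with a learnable adjacency, can be sketched compactly. The following is a hypothetical PyTorch illustration under stated assumptions (a free adjacency parameter, row-softmax normalization, one propagation layer), not the DD-GCN release.

```python
import torch
import torch.nn as nn

class DynamicGraphLayer(nn.Module):
    """One propagation step over C disease nodes with a learned adjacency."""
    def __init__(self, num_diseases, dim):
        super().__init__()
        self.adj_logits = nn.Parameter(torch.zeros(num_diseases, num_diseases))
        self.proj = nn.Linear(dim, dim)

    def forward(self, node_feats):                    # node_feats: (B, C, D)
        adj = torch.softmax(self.adj_logits, dim=1)   # dynamic, learnable adjacency
        h = torch.einsum('ij,bjd->bid', adj, node_feats)
        return torch.relu(self.proj(h))

# A shared per-node classifier over the propagated features would then produce
# the C disease logits for multi-label prediction.
```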

Modular Simulation Framework for Process Variation Analysis of MRAM-based Deep Belief Networks

Title Modular Simulation Framework for Process Variation Analysis of MRAM-based Deep Belief Networks
Authors Paul Wood, Hossein Pourmeidani, Ronald F. DeMara
Abstract Magnetic Random-Access Memory (MRAM) based p-bit neuromorphic computing devices are garnering increasing interest as a means to compactly and efficiently realize machine learning operations in Restricted Boltzmann Machines (RBMs). When embedded within an RBM resistive crossbar array, the p-bit based neuron realizes a tunable sigmoidal activation function. Since the stochasticity of activation depends on the energy barrier of the MRAM device, it is essential to assess the impact of process variation on the voltage-dependent behavior of the sigmoid function. Varying energy barriers also influence power consumption, requiring a simulation environment that facilitates the multi-objective optimization of device and network parameters. Herein, transportable Python scripts are developed to analyze the effect of output variation, induced by changes in device dimensions, on the accuracy of machine learning applications. Evaluation with RBM circuits on the MNIST dataset reveals the impacts and limits of process variation in device fabrication in terms of the resulting energy vs. accuracy tradeoffs, and the resulting simulation framework is available via a Creative Commons license.
Tasks
Published 2020-02-03
URL https://arxiv.org/abs/2002.00897v1
PDF https://arxiv.org/pdf/2002.00897v1.pdf
PWC https://paperswithcode.com/paper/modular-simulation-framework-for-process
Repo
Framework
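
As a flavor of what such a process-variation analysis involves, here is an illustrative Monte Carlo snippet, not the paper's framework: the p-bit activation is modeled as a Bernoulli draw through a sigmoid whose argument scales with the MRAM energy barrier, and the barrier is perturbed to mimic fabrication variation. The activation model and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pbit_samples(v_in, barrier, kT=0.026, trials=10_000):
    """Boolean 'up' states for input voltage v_in given an energy barrier (eV)."""
    p_up = 1.0 / (1.0 + np.exp(-v_in * barrier / kT))  # assumed activation model
    return rng.random(trials) < p_up

# Sweep process variation: ~10% sigma around a nominal 0.1 eV barrier.
barriers = rng.normal(0.10, 0.01, size=100)
activations = [pbit_samples(0.05, b).mean() for b in barriers]
print(f"mean activation {np.mean(activations):.3f}, std {np.std(activations):.3f}")
```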

Citation Recommendation: Approaches and Datasets

Title Citation Recommendation: Approaches and Datasets
Authors Michael Färber, Adam Jatowt
Abstract Citation recommendation describes the task of recommending citations for a given text. Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other hand, citation recommendation has emerged as an important research topic. In recent years, several approaches and evaluation datasets have been presented. However, to the best of our knowledge, no literature survey has been conducted explicitly on citation recommendation. In this article, we give a thorough introduction to automatic citation recommendation research. We then present an overview of the approaches and datasets for citation recommendation and identify differences and commonalities using various dimensions. Last but not least, we shed light on the evaluation methods and outline general challenges in the evaluation and how to meet them. We restrict ourselves to citation recommendation for scientific publications, as this document type has been studied the most in this area. However, many of the observations and discussions included in this survey are also applicable to other types of text, such as news articles and encyclopedic articles.
Tasks
Published 2020-02-17
URL https://arxiv.org/abs/2002.06961v1
PDF https://arxiv.org/pdf/2002.06961v1.pdf
PWC https://paperswithcode.com/paper/citation-recommendation-approaches-and
Repo
Framework

DAF-NET: a saliency based weakly supervised method of dual attention fusion for fine-grained image classification

Title DAF-NET: a saliency based weakly supervised method of dual attention fusion for fine-grained image classification
Authors ZiChao Dong, JiLong Wu, TingTing Ren, Yue Wang, MengYing Ge
Abstract Fine-grained image classification is a challenging problem due to the difficulty of finding discriminative features. There are essentially two ways to handle this: one is to use attention-based methods to focus on informative areas, while the other aims to find high-order interactions between features. Attention-based methods, in turn, come in two directions, activation-based and detection-based, both of which have proven effective. However, little work has focused on fusing the two types of attention with high-order features. In this paper, we propose a novel DAF method that fuses the two types of attention and uses them as a part attention filter (PAF) in a deep bilinear transformation module to mine the relationships between separate parts of an object. Briefly, our network consists of a student net that outputs two attention maps and a teacher net that uses these two maps as empirical information to refine the result. Experimental results show that the student net alone achieves 87.6% accuracy on the CUB dataset, while cooperating with the teacher net achieves 89.1% accuracy.
Tasks Fine-Grained Image Classification, Image Classification
Published 2020-01-04
URL https://arxiv.org/abs/2001.02219v1
PDF https://arxiv.org/pdf/2001.02219v1.pdf
PWC https://paperswithcode.com/paper/daf-net-a-saliency-based-weakly-supervised
Repo
Framework
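
The fusion step can be sketched in a few lines. The snippet below is a hypothetical illustration of combining an activation-based attention map with a saliency/detection-based map into a single part attention filter; the fusion rule (elementwise product with renormalization) is an assumption, not the paper's exact formulation.

```python
import torch

def fuse_attention(act_map, sal_map, eps=1e-6):
    # act_map, sal_map: (B, 1, H, W), both non-negative attention maps
    fused = act_map * sal_map                         # agreement between the two cues
    fused = fused / (fused.amax(dim=(2, 3), keepdim=True) + eps)  # rescale to [0, 1]
    return fused

def apply_paf(features, paf):
    # Reweight backbone features (B, C, H, W) by the fused part attention filter.
    return features * paf
```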

Comments on Sejnowski’s “The unreasonable effectiveness of deep learning in artificial intelligence” [arXiv:2002.04806]

Title Comments on Sejnowski’s “The unreasonable effectiveness of deep learning in artificial intelligence” [arXiv:2002.04806]
Authors Leslie S. Smith
Abstract Terry Sejnowski’s 2020 paper [arXiv:2002.04806] is entitled “The unreasonable effectiveness of deep learning in artificial intelligence”. However, the paper itself doesn’t attempt to answer the implied question of why Deep Convolutional Neural Networks (DCNNs) can approximate so many of the mappings that they have been trained to model. There are detailed mathematical analyses, but this short paper attempts to look at the issue differently, considering the way that these networks are used, the subset of these functions that can be achieved by training (starting from some location in the original function space), as well as the functions that in reality will be modelled.
Tasks
Published 2020-03-20
URL https://arxiv.org/abs/2003.09415v1
PDF https://arxiv.org/pdf/2003.09415v1.pdf
PWC https://paperswithcode.com/paper/comments-on-sejnowskis-the-unreasonable
Repo
Framework

Multifold Acceleration of Diffusion MRI via Slice-Interleaved Diffusion Encoding (SIDE)

Title Multifold Acceleration of Diffusion MRI via Slice-Interleaved Diffusion Encoding (SIDE)
Authors Yoonmi Hong, Wei-Tang Chang, Geng Chen, Ye Wu, Weili Lin, Dinggang Shen, Pew-Thian Yap
Abstract Diffusion MRI (dMRI) is a unique imaging technique for in vivo characterization of tissue microstructure and white matter pathways. However, its relatively long acquisition time implies greater motion artifacts when imaging, for example, infants and Parkinson’s disease patients. To accelerate dMRI acquisition, we propose in this paper (i) a diffusion encoding scheme, called Slice-Interleaved Diffusion Encoding (SIDE), that interleaves each diffusion-weighted (DW) image volume with slices that are encoded with different diffusion gradients, essentially allowing the slice-undersampling of the image volume associated with each diffusion gradient to significantly reduce acquisition time; and (ii) a method based on deep learning for effective reconstruction of DW images from the highly slice-undersampled data. Evaluation based on the Human Connectome Project (HCP) dataset indicates that our method can achieve a high acceleration factor of up to 6 with minimal information loss. Evaluation using dMRI data acquired with SIDE demonstrates that it is possible to accelerate the acquisition by as much as 50-fold when combined with multi-band imaging.
Tasks
Published 2020-02-25
URL https://arxiv.org/abs/2002.10908v1
PDF https://arxiv.org/pdf/2002.10908v1.pdf
PWC https://paperswithcode.com/paper/multifold-acceleration-of-diffusion-mri-via
Repo
Framework
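
The interleaving pattern itself is straightforward to illustrate. In the toy schedule below (an assumption for illustration; slice and gradient counts are arbitrary), consecutive slices within a repetition cycle through different diffusion gradients, so each gradient samples only a subset of slice positions; the deep-learning reconstruction of the missing slices is not shown.

```python
import numpy as np

n_slices, n_gradients = 60, 6
# Gradient index acquired at each slice position, shifted by one per repetition
# so that, across repetitions, every slice sees every gradient.
schedule = [(s + rep) % n_gradients
            for rep in range(n_gradients)
            for s in range(n_slices)]
table = np.array(schedule).reshape(n_gradients, n_slices)
print(table[:, :12])  # rows: repetitions; columns: slice positions
```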

Concentration Inequalities for Multinoulli Random Variables

Title Concentration Inequalities for Multinoulli Random Variables
Authors Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
Abstract We investigate concentration inequalities for Dirichlet and Multinomial random variables.
Tasks
Published 2020-01-30
URL https://arxiv.org/abs/2001.11595v1
PDF https://arxiv.org/pdf/2001.11595v1.pdf
PWC https://paperswithcode.com/paper/concentration-inequalities-for-multinoulli
Repo
Framework

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation

Title Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Authors Alessandro Raganato, Yves Scherrer, Jörg Tiedemann
Abstract Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different parts of the input. However, recent works have shown that attention heads learn simple positional patterns which are often redundant. In this paper, we propose to replace all but one attention head of each encoder layer with fixed – non-learnable – attentive patterns that are solely based on position and do not require any external knowledge. Our experiments show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality and even increases BLEU scores by up to 3 points in low-resource scenarios.
Tasks Machine Translation
Published 2020-02-24
URL https://arxiv.org/abs/2002.10260v1
PDF https://arxiv.org/pdf/2002.10260v1.pdf
PWC https://paperswithcode.com/paper/fixed-encoder-self-attention-patterns-in
Repo
Framework
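
A fixed positional head is simple to construct. The sketch below assumes the simplest such pattern, attending to the previous token; the paper evaluates a family of position-based patterns, and this is an illustration rather than the authors' implementation.

```python
import torch

def fixed_prev_token_attention(values):
    # values: (B, T, D). Build a fixed, non-learnable attention matrix that puts
    # all mass on position i-1 (position 0 attends to itself).
    T = values.size(1)
    attn = torch.zeros(T, T)
    attn[0, 0] = 1.0
    attn[torch.arange(1, T), torch.arange(0, T - 1)] = 1.0
    return attn @ values   # (T, T) broadcasts over the batch: result is (B, T, D)
```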

Efficient Probabilistic Logic Reasoning with Graph Neural Networks

Title Efficient Probabilistic Logic Reasoning with Graph Neural Networks
Authors Yuyu Zhang, Xinshi Chen, Yuan Yang, Arun Ramamurthy, Bo Li, Yuan Qi, Le Song
Abstract Markov Logic Networks (MLNs), which elegantly combine logic rules and probabilistic graphical models, can be used to address many knowledge graph problems. However, inference in MLNs is computationally intensive, making industrial-scale applications of MLNs very difficult. In recent years, graph neural networks (GNNs) have emerged as efficient and effective tools for large-scale graph problems. Nevertheless, GNNs do not explicitly incorporate prior logic rules into the models, and may require many labeled examples for a target task. In this paper, we explore the combination of MLNs and GNNs, and use graph neural networks for variational inference in MLNs. We propose a GNN variant, named ExpressGNN, which strikes a nice balance between the representation power and the simplicity of the model. Our extensive experiments on several benchmark datasets demonstrate that ExpressGNN leads to effective and efficient probabilistic logic reasoning.
Tasks
Published 2020-01-29
URL https://arxiv.org/abs/2001.11850v2
PDF https://arxiv.org/pdf/2001.11850v2.pdf
PWC https://paperswithcode.com/paper/efficient-probabilistic-logic-reasoning-with-1
Repo
Framework
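
As a rough sketch of the variational idea, the snippet below parameterizes a mean-field Bernoulli posterior over unobserved facts from entity and relation embeddings; in ExpressGNN the entity embeddings come from a GNN over the knowledge graph, for which the plain embedding table here merely stands in. The names and the scoring function are assumptions, not the ExpressGNN implementation.

```python
import torch
import torch.nn as nn

class FactPosterior(nn.Module):
    def __init__(self, num_entities, num_relations, dim):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)   # stand-in for GNN node embeddings
        self.rel = nn.Embedding(num_relations, dim)

    def prob(self, r, h, t):
        # q(fact r(h, t) = true): a mean-field Bernoulli per unobserved fact
        score = (self.ent(h) * self.rel(r) * self.ent(t)).sum(-1)
        return torch.sigmoid(score)

# Training would maximize an ELBO: the expected number of satisfied MLN rule
# groundings under q, plus the entropy of q.
```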

Neural Architecture Search For Fault Diagnosis

Title Neural Architecture Search For Fault Diagnosis
Authors Xudong Li, Yang Hu, Jianhua Zheng, Mingtao Li
Abstract Data-driven methods have made great progress in fault diagnosis, especially deep learning methods. Deep learning is suitable for processing big data and has a strong feature extraction ability, enabling end-to-end fault diagnosis systems. However, designing a neural network architecture requires rich professional knowledge and debugging experience, and many experiments are needed to screen models and hyperparameters, increasing the difficulty of developing deep learning models. Fortunately, neural architecture search (NAS) is developing rapidly and is becoming one of the next frontiers of deep learning. In this paper, we propose a NAS method for fault diagnosis using reinforcement learning. A recurrent neural network is used as an agent to generate network architectures. The accuracy of the generated network on the validation dataset is fed back to the agent as a reward, and the parameters of the agent are updated through the policy gradient algorithm. We use the PHM 2009 Data Challenge gearbox dataset to demonstrate the effectiveness of the proposed method, and obtain state-of-the-art results compared with other manually designed network structures. To the authors’ best knowledge, this is the first time that NAS has been applied to fault diagnosis.
Tasks Neural Architecture Search
Published 2020-02-19
URL https://arxiv.org/abs/2002.07997v1
PDF https://arxiv.org/pdf/2002.07997v1.pdf
PWC https://paperswithcode.com/paper/neural-architecture-search-for-fault
Repo
Framework
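
The controller loop described above maps to a compact REINFORCE sketch, an illustration rather than the paper's code: an LSTM agent samples one choice per decision step, the child network's validation accuracy is the reward, and the policy gradient updates the agent. The choice counts and the `train_child_and_eval` stand-in are hypothetical.

```python
import torch
import torch.nn as nn

n_choices, n_steps = 4, 6          # e.g. 4 options per decision, 6 decisions
ctrl = nn.LSTMCell(n_choices, 64)
head = nn.Linear(64, n_choices)
opt = torch.optim.Adam(list(ctrl.parameters()) + list(head.parameters()), lr=3e-4)

def sample_architecture():
    h = c = torch.zeros(1, 64)
    x = torch.zeros(1, n_choices)
    log_probs, arch = [], []
    for _ in range(n_steps):
        h, c = ctrl(x, (h, c))
        dist = torch.distributions.Categorical(logits=head(h))
        a = dist.sample()
        log_probs.append(dist.log_prob(a))
        arch.append(a.item())
        x = nn.functional.one_hot(a, n_choices).float()  # feed choice back in
    return arch, torch.stack(log_probs).sum()

arch, logp = sample_architecture()
reward = 0.9  # placeholder for train_child_and_eval(arch) on the validation set
loss = -logp * reward              # REINFORCE; a baseline would reduce variance
opt.zero_grad(); loss.backward(); opt.step()
```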

Learning Multivariate Hawkes Processes at Scale

Title Learning Multivariate Hawkes Processes at Scale
Authors Maximilian Nickel, Matthew Le
Abstract Multivariate Hawkes Processes (MHPs) are an important class of temporal point processes that have enabled key advances in understanding and predicting social information systems. However, due to their complex modeling of temporal dependencies, MHPs have proven notoriously difficult to scale, which has limited their application to relatively small domains. In this work, we propose a novel model and computational approach to overcome this important limitation. By exploiting a characteristic sparsity pattern in real-world diffusion processes, we show that our approach allows us to compute the exact likelihood and gradients of an MHP, independently of the ambient dimensions of the underlying network. We show on synthetic and real-world datasets that our model not only achieves state-of-the-art predictive results, but also improves runtime performance by multiple orders of magnitude compared to standard methods on sparse event sequences. In combination with easily interpretable latent variables and influence structures, this allows us to analyze diffusion processes at a previously unattainable scale.
Tasks Point Processes
Published 2020-02-28
URL https://arxiv.org/abs/2002.12501v1
PDF https://arxiv.org/pdf/2002.12501v1.pdf
PWC https://paperswithcode.com/paper/learning-multivariate-hawkes-processes-at
Repo
Framework
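
For context, here is the standard exact log-likelihood of an MHP with exponential kernels (a textbook formulation, not the paper's algorithm); the per-event sum over influence weights is exactly the term whose cost the paper's sparsity argument reduces, since with a sparse `alpha` only dimensions that actually interact contribute.

```python
import numpy as np

def mhp_loglik(events, mu, alpha, beta, T):
    """events: list of (t, i) sorted by time; mu: (D,) base rates;
    alpha[i, j]: influence of dimension j on i; kernel alpha[i, j]*beta*exp(-beta*tau).
    The compensator's decay terms (1 - exp(-beta*(T - t_k))) are dropped for brevity."""
    D = len(mu)
    r = np.zeros(D)                # decayed count of past events per source dimension
    counts = np.zeros(D)
    last_t, ll = 0.0, 0.0
    for t, i in events:
        r *= np.exp(-beta * (t - last_t))              # decay all kernel sums at once
        ll += np.log(mu[i] + beta * (alpha[i] * r).sum())  # intensity at the event
        r[i] += 1.0
        counts[i] += 1
        last_t = t
    ll -= mu.sum() * T + (alpha.sum(0) * counts).sum()  # approximate compensator
    return ll
```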

Training Large Neural Networks with Constant Memory using a New Execution Algorithm

Title Training Large Neural Networks with Constant Memory using a New Execution Algorithm
Authors Bharadwaj Pudipeddi, Maral Mesmakhosroshahi, Jinwen Xi, Sujeeth Bharadwaj
Abstract Widely popular transformer-based NLP models such as BERT and GPT have enormous capacity, trending toward billions of parameters. Current execution methods demand brute-force resources such as HBM devices and high-speed interconnectivity for data parallelism. In this paper, we introduce a new relay-style execution technique called L2L (layer-to-layer) in which, at any given moment, the device memory is primarily populated only with the executing layer(s)’ footprint. The model resides in the DRAM memory attached to either a CPU or an FPGA as an entity we call the eager param-server (EPS). Unlike a traditional param-server, the EPS transmits the model piecemeal to the devices, thereby allowing it to perform other tasks in the background such as reduction and distributed optimization. To overcome the bandwidth cost of shuttling parameters to and from the EPS, the model is executed a layer at a time across many micro-batches instead of the conventional method of minibatches over the whole model. We explore a conservative version of L2L implemented on a modest Azure instance for BERT-Large, running it with a batch size of 32 on a single V100 GPU using less than 8 GB of memory. Our results show a more stable learning curve, faster convergence, better accuracy, and a 35% reduction in memory compared to the state-of-the-art baseline. Our method reproduces BERT results on mid-level GPUs where this was hitherto not feasible. L2L scales to arbitrary depth without impacting memory or devices, allowing researchers to develop affordable devices. It also enables dynamic approaches such as neural architecture search. This work was performed on GPUs first but is also targeted towards high-TFLOPS/Watt accelerators such as Graphcore IPUs. The code will soon be available on GitHub.
Tasks Distributed Optimization, Neural Architecture Search
Published 2020-02-13
URL https://arxiv.org/abs/2002.05645v3
PDF https://arxiv.org/pdf/2002.05645v3.pdf
PWC https://paperswithcode.com/paper/training-large-neural-networks-with-constant
Repo
Framework
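
The relay pattern is easy to convey schematically. The sketch below (a forward-only illustration under simplified assumptions, not the authors' implementation) keeps the whole model on the host, moves one layer at a time to the device, and pushes all micro-batches through it before fetching the next layer; the real system also relays gradients and lets the EPS run reduction and optimization in the background.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
layers = [nn.Linear(1024, 1024) for _ in range(8)]        # model lives in host DRAM
micro_batches = [torch.randn(4, 1024) for _ in range(8)]  # one minibatch, split up

acts = micro_batches
for layer in layers:
    layer.to(device)                   # only this layer occupies device memory
    # Run every micro-batch through the resident layer, staging activations on host.
    acts = [layer(a.to(device)).cpu() for a in acts]
    layer.to("cpu")                    # relay the layer back to the host (EPS role)
```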

The use of Convolutional Neural Networks for signal-background classification in Particle Physics experiments

Title The use of Convolutional Neural Networks for signal-background classification in Particle Physics experiments
Authors Venkitesh Ayyar, Wahid Bhimji, Lisa Gerhardt, Sally Robertson, Zahra Ronaghi
Abstract The success of Convolutional Neural Networks (CNNs) in image classification has prompted efforts to study their use for classifying image data obtained in Particle Physics experiments. Here, we discuss our efforts to apply CNNs to 2D and 3D image data from particle physics experiments to classify signal from background. In this work we present an extensive convolutional neural architecture search, achieving high accuracy for signal/background discrimination for a HEP classification use case based on simulated data from the IceCube Neutrino Observatory and an ATLAS-like detector. We demonstrate, among other things, that we can achieve the same accuracy as complex ResNet architectures with CNNs that have fewer parameters, and present comparisons of computational requirements, training and inference times.
Tasks Image Classification, Neural Architecture Search
Published 2020-02-13
URL https://arxiv.org/abs/2002.05761v1
PDF https://arxiv.org/pdf/2002.05761v1.pdf
PWC https://paperswithcode.com/paper/the-use-of-convolutional-neural-networks-for
Repo
Framework
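
For readers unfamiliar with the setup, a signal/background discriminator of this kind reduces to a small binary CNN classifier; the sketch below uses arbitrary placeholder shapes and depths, not the searched architectures from the paper.

```python
import torch.nn as nn

# Single-channel 2D detector image in, single signal-vs-background logit out.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.LazyLinear(64), nn.ReLU(),
    nn.Linear(64, 1),        # train with BCEWithLogitsLoss
)
```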

Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks

Title Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks
Authors Lei Yang, Zheyu Yan, Meng Li, Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra, Weiwen Jiang, Yiyu Shi
Abstract Neural Architecture Search (NAS) has demonstrated its power on various AI accelerating platforms such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs). However, how to integrate NAS with Application-Specific Integrated Circuits (ASICs) remains an open problem, despite ASICs being the most powerful AI accelerating platforms. The major bottleneck comes from the large design freedom associated with ASIC designs. Moreover, considering that multiple DNNs will run in parallel for different workloads with diverse layer operations and sizes, integrating heterogeneous ASIC sub-accelerators for distinct DNNs in one design can significantly boost performance, but at the same time further complicates the design space. To address these challenges, in this paper we build an ASIC template set based on existing successful designs, described by their unique dataflows, so that the design space is significantly reduced. Based on the templates, we further propose a framework, namely NASAIC, which can simultaneously identify multiple DNN architectures and the associated heterogeneous ASIC accelerator design, such that the design specifications (specs) can be satisfied while the accuracy is maximized. Experimental results show that, compared with successive NAS and ASIC design optimizations, which lead to design spec violations, NASAIC can guarantee that the results meet the design specs, with 17.77%, 2.49x, and 2.32x reductions in latency, energy, and area, respectively, and with only 0.76% accuracy loss. To the best of the authors’ knowledge, this is the first work on neural architecture and ASIC accelerator design co-exploration.
Tasks Neural Architecture Search
Published 2020-02-10
URL https://arxiv.org/abs/2002.04116v1
PDF https://arxiv.org/pdf/2002.04116v1.pdf
PWC https://paperswithcode.com/paper/co-exploration-of-neural-architectures-and
Repo
Framework
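
The co-exploration objective can be condensed into a simple reward shape, sketched below under the assumption of hard spec constraints; the evaluators feeding it would be NASAIC's template-based cost models, which are not reproduced here.

```python
def nasaic_reward(accuracy, latency, energy, area, spec):
    """spec: dict of hard design constraints, e.g. {'latency': ..., 'energy': ..., 'area': ...}."""
    if latency > spec["latency"] or energy > spec["energy"] or area > spec["area"]:
        return -1.0   # reject candidate pairs that violate the design specs
    return accuracy   # otherwise, maximize accuracy within the feasible region
```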