Paper Group ANR 153
Graph Convolutional Auto-encoder with Bi-decoder and Adaptive-sharing Adjacency
Title | Graph Convolutional Auto-encoder with Bi-decoder and Adaptive-sharing Adjacency |
Authors | Rui Zhang, Yunxing Zhang, Xuelong Li |
Abstract | Graph autoencoder (GAE) serves as an effective unsupervised learning framework to represent graph data in a latent space for network embedding. Most existing approaches focus on minimizing the reconstruction loss of the graph structure but neglect the reconstruction of node features, which may result in overfitting due to the capacity of the autoencoders. Additionally, the adjacency matrix in these methods is always fixed, so it cannot properly represent the connections among nodes in latent space. To solve this problem, in this paper, we propose a novel Graph Convolutional Auto-encoder with Bi-decoder and Adaptive-sharing Adjacency method, namely BAGA. The framework encodes the topological structure and node features into latent representations, on which a bi-decoder is trained to reconstruct the graph structure and node features simultaneously. Furthermore, the adjacency matrix can be adaptively updated by the learned latent representations to better represent the connections among nodes in latent space. Experimental results on datasets validate the superiority of our method to the state-of-the-art network embedding methods on the clustering task. |
Tasks | Network Embedding |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04508v1, https://arxiv.org/pdf/2003.04508v1.pdf |
PWC | https://paperswithcode.com/paper/graph-convolutional-auto-encoder-with-bi |
Repo | |
Framework | |
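The bi-decoder idea in the BAGA abstract above (reconstructing graph structure and node features from the same latent representation) can be illustrated with a small sketch. The abstract does not specify the decoders or the loss, so everything here is an assumption: an inner-product structure decoder as in standard GAEs, a linear feature decoder with a hypothetical weight matrix `w`, and a hypothetical `feature_weight` trade-off. It is not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def structure_decoder(z):
    # Assumed inner-product decoder: A_hat[i][j] = sigmoid(z_i . z_j).
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in z] for zi in z]


def feature_decoder(z, w):
    # Assumed linear decoder: X_hat = Z W, with w of shape latent_dim x feature_dim.
    return [
        [sum(zi[k] * w[k][d] for k in range(len(w))) for d in range(len(w[0]))]
        for zi in z
    ]


def bi_decoder_loss(z, w, adjacency, features, feature_weight=1.0):
    """Mean binary cross-entropy on the adjacency plus weighted MSE on features."""
    n = len(z)
    a_hat = structure_decoder(z)
    x_hat = feature_decoder(z, w)
    bce = -sum(
        adjacency[i][j] * math.log(a_hat[i][j])
        + (1 - adjacency[i][j]) * math.log(1.0 - a_hat[i][j])
        for i in range(n)
        for j in range(n)
    ) / (n * n)
    dim = len(features[0])
    mse = sum(
        (x_hat[i][d] - features[i][d]) ** 2 for i in range(n) for d in range(dim)
    ) / (n * dim)
    return bce + feature_weight * mse


# Toy graph with two nodes; the latent codes would normally come from a GCN encoder.
latent = [[0.5, -0.2], [0.1, 0.4]]
weights = [[1.0, 0.0], [0.0, 1.0]]
adjacency = [[1, 0], [0, 1]]
features = [[1.0, 0.0], [0.0, 1.0]]
print(bi_decoder_loss(latent, weights, adjacency, features))
```

Training both terms jointly is what distinguishes this from a structure-only GAE; the adaptive adjacency update mentioned in the abstract is not covered by this sketch.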
Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels
Title | Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels |
Authors | Daizong Liu, Shuangjie Xu, Pan Zhou, Kun He, Wei Wei, Zichuan Xu |
Abstract | Disease diagnosis on chest X-ray images is a challenging multi-label classification task. Previous works generally classify the diseases independently on the input image without considering any correlation among diseases. However, such correlation actually exists; for example, Pleural Effusion is more likely to appear when Pneumothorax is present. In this work, we propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases by using a dynamic learnable adjacency matrix in graph structure to improve the diagnosis accuracy. To learn a more natural and reliable correlation relationship, we feed each node with the image-level individual feature map corresponding to each type of disease. To our knowledge, our method is the first to build a graph over the feature maps with a dynamic adjacency matrix for correlation learning. To further deal with the practical issue of incomplete labels, DD-GCN also utilizes an adaptive loss and a curriculum learning strategy to train the model on incomplete labels. Experimental results on two popular chest X-ray (CXR) datasets show that our prediction accuracy outperforms state-of-the-art methods, and the learned graph adjacency matrix establishes the correlation representations of different diseases, which is consistent with expert experience. In addition, we apply an ablation study to demonstrate the effectiveness of each component in DD-GCN. |
Tasks | Multi-Label Classification |
Published | 2020-02-26 |
URL | https://arxiv.org/abs/2002.11629v2, https://arxiv.org/pdf/2002.11629v2.pdf |
PWC | https://paperswithcode.com/paper/dynamic-graph-correlation-learning-for |
Repo | |
Framework | |
Modular Simulation Framework for Process Variation Analysis of MRAM-based Deep Belief Networks
Title | Modular Simulation Framework for Process Variation Analysis of MRAM-based Deep Belief Networks |
Authors | Paul Wood, Hossein Pourmeidani, Ronald F. DeMara |
Abstract | Magnetic Random-Access Memory (MRAM) based p-bit neuromorphic computing devices are garnering increasing interest as a means to compactly and efficiently realize machine learning operations in Restricted Boltzmann Machines (RBMs). When embedded within an RBM resistive crossbar array, the p-bit based neuron realizes a tunable sigmoidal activation function. Since the stochasticity of activation is dependent on the energy barrier of the MRAM device, it is essential to assess the impact of process variation on the voltage-dependent behavior of the sigmoid function. Other influential performance factors arise from varying energy barriers on power consumption, requiring a simulation environment to facilitate the multi-objective optimization of device and network parameters. Herein, transportable Python scripts are developed to analyze the output variation under changes in device dimensions on the accuracy of machine learning applications. Evaluation with RBM circuits using the MNIST dataset reveals impacts and limits for processing variation of device fabrication in terms of the resulting energy vs. accuracy tradeoffs, and the resulting simulation framework is available via a Creative Commons license. |
Tasks | |
Published | 2020-02-03 |
URL | https://arxiv.org/abs/2002.00897v1, https://arxiv.org/pdf/2002.00897v1.pdf |
PWC | https://paperswithcode.com/paper/modular-simulation-framework-for-process |
Repo | |
Framework | |
Citation Recommendation: Approaches and Datasets
Title | Citation Recommendation: Approaches and Datasets |
Authors | Michael Färber, Adam Jatowt |
Abstract | Citation recommendation describes the task of recommending citations for a given text. Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other hand, citation recommendation has emerged as an important research topic. In recent years, several approaches and evaluation data sets have been presented. However, to the best of our knowledge, no literature survey has been conducted explicitly on citation recommendation. In this article, we give a thorough introduction into automatic citation recommendation research. We then present an overview of the approaches and data sets for citation recommendation and identify differences and commonalities using various dimensions. Last but not least, we shed light on the evaluation methods, and outline general challenges in the evaluation and how to meet them. We restrict ourselves to citation recommendation for scientific publications, as this document type has been studied the most in this area. However, many of the observations and discussions included in this survey are also applicable to other types of text, such as news articles and encyclopedic articles. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.06961v1, https://arxiv.org/pdf/2002.06961v1.pdf |
PWC | https://paperswithcode.com/paper/citation-recommendation-approaches-and |
Repo | |
Framework | |
DAF-NET: a saliency based weakly supervised method of dual attention fusion for fine-grained image classification
Title | DAF-NET: a saliency based weakly supervised method of dual attention fusion for fine-grained image classification |
Authors | ZiChao Dong, JiLong Wu, TingTing Ren, Yue Wang, MengYing Ge |
Abstract | Fine-grained image classification is a challenging problem because of the difficulty of finding discriminative features. There are basically two ways to handle this. One is to use an attention-based method to focus on informative areas, while the other aims to find high-order relations between features. Further, attention-based methods follow two directions, activation based and detection based, both of which have been shown to be effective. However, little work focuses on fusing the two types of attention with high-order features. In this paper, we propose a novel DAF method which fuses the two types of attention and uses them as a PAF (part attention filter) in a deep bilinear transformation module to mine the relationship between separate parts of an object. Briefly, our network is constructed from a student net that attempts to output two attention maps and a teacher net that uses these two maps as empirical information to refine the result. The experimental results show that the student net alone achieves 87.6% accuracy on the CUB dataset, while cooperating with the teacher net achieves 89.1% accuracy. |
Tasks | Fine-Grained Image Classification, Image Classification |
Published | 2020-01-04 |
URL | https://arxiv.org/abs/2001.02219v1, https://arxiv.org/pdf/2001.02219v1.pdf |
PWC | https://paperswithcode.com/paper/daf-net-a-saliency-based-weakly-supervised |
Repo | |
Framework | |
Comments on Sejnowski’s “The unreasonable effectiveness of deep learning in artificial intelligence” [arXiv:2002.04806]
Title | Comments on Sejnowski’s “The unreasonable effectiveness of deep learning in artificial intelligence” [arXiv:2002.04806] |
Authors | Leslie S. Smith |
Abstract | Terry Sejnowski’s 2020 paper [arXiv:2002.04806] is entitled “The unreasonable effectiveness of deep learning in artificial intelligence”. However, the paper itself doesn’t attempt to answer the implied question of why Deep Convolutional Neural Networks (DCNNs) can approximate so many of the mappings that they have been trained to model. There are detailed mathematical analyses, but this short paper attempts to look at the issue differently, considering the way that these networks are used, the subset of these functions that can be achieved by training (starting from some location in the original function space), as well as the functions that in reality will be modelled. |
Tasks | |
Published | 2020-03-20 |
URL | https://arxiv.org/abs/2003.09415v1, https://arxiv.org/pdf/2003.09415v1.pdf |
PWC | https://paperswithcode.com/paper/comments-on-sejnowskis-the-unreasonable |
Repo | |
Framework | |
Multifold Acceleration of Diffusion MRI via Slice-Interleaved Diffusion Encoding (SIDE)
Title | Multifold Acceleration of Diffusion MRI via Slice-Interleaved Diffusion Encoding (SIDE) |
Authors | Yoonmi Hong, Wei-Tang Chang, Geng Chen, Ye Wu, Weili Lin, Dinggang Shen, Pew-Thian Yap |
Abstract | Diffusion MRI (dMRI) is a unique imaging technique for in vivo characterization of tissue microstructure and white matter pathways. However, its relatively long acquisition time implies greater motion artifacts when imaging, for example, infants and Parkinson’s disease patients. To accelerate dMRI acquisition, we propose in this paper (i) a diffusion encoding scheme, called Slice-Interleaved Diffusion Encoding (SIDE), that interleaves each diffusion-weighted (DW) image volume with slices that are encoded with different diffusion gradients, essentially allowing the slice-undersampling of the image volume associated with each diffusion gradient to significantly reduce acquisition time, and (ii) a method based on deep learning for effective reconstruction of DW images from the highly slice-undersampled data. Evaluation based on the Human Connectome Project (HCP) dataset indicates that our method can achieve a high acceleration factor of up to 6 with minimal information loss. Evaluation using dMRI data acquired with SIDE acquisition demonstrates that it is possible to accelerate the acquisition by as much as 50-fold when combined with multi-band imaging. |
Tasks | |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10908v1, https://arxiv.org/pdf/2002.10908v1.pdf |
PWC | https://paperswithcode.com/paper/multifold-acceleration-of-diffusion-mri-via |
Repo | |
Framework | |
Concentration Inequalities for Multinoulli Random Variables
Title | Concentration Inequalities for Multinoulli Random Variables |
Authors | Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric |
Abstract | We investigate concentration inequalities for Dirichlet and Multinomial random variables. |
Tasks | |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11595v1, https://arxiv.org/pdf/2001.11595v1.pdf |
PWC | https://paperswithcode.com/paper/concentration-inequalities-for-multinoulli |
Repo | |
Framework | |
Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Title | Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation |
Authors | Alessandro Raganato, Yves Scherrer, Jörg Tiedemann |
Abstract | Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different parts of the input. However, recent works have shown that attention heads learn simple positional patterns which are often redundant. In this paper, we propose to replace all but one attention head of each encoder layer with fixed – non-learnable – attentive patterns that are solely based on position and do not require any external knowledge. Our experiments show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality and even increases BLEU scores by up to 3 points in low-resource scenarios. |
Tasks | Machine Translation |
Published | 2020-02-24 |
URL | https://arxiv.org/abs/2002.10260v1, https://arxiv.org/pdf/2002.10260v1.pdf |
PWC | https://paperswithcode.com/paper/fixed-encoder-self-attention-patterns-in |
Repo | |
Framework | |
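To make the notion of a fixed, position-only attention head from the abstract above more concrete, here is a minimal sketch. The abstract does not list the exact patterns, so the one-hot "attend to the token at a fixed offset" pattern below, and the helper names `fixed_attention` and `apply_attention`, are illustrative assumptions rather than the paper's definitions.

```python
def fixed_attention(seq_len, offset):
    """Build a seq_len x seq_len attention matrix with no learnable parameters.

    Position i puts all of its weight on position i + offset, clamped to the
    sequence bounds, so every row sums to 1.
    """
    weights = []
    for i in range(seq_len):
        j = min(max(i + offset, 0), seq_len - 1)
        row = [0.0] * seq_len
        row[j] = 1.0
        weights.append(row)
    return weights


def apply_attention(weights, values):
    """Mix the value vectors with the attention weights (a plain matrix product)."""
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]


values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
previous_token_head = fixed_attention(3, offset=-1)
print(apply_attention(previous_token_head, values))
```

Because the weights depend only on position, such a head needs no query or key projections; under the setup the abstract describes, one remaining head per encoder layer would still be learned.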
Efficient Probabilistic Logic Reasoning with Graph Neural Networks
Title | Efficient Probabilistic Logic Reasoning with Graph Neural Networks |
Authors | Yuyu Zhang, Xinshi Chen, Yuan Yang, Arun Ramamurthy, Bo Li, Yuan Qi, Le Song |
Abstract | Markov Logic Networks (MLNs), which elegantly combine logic rules and probabilistic graphical models, can be used to address many knowledge graph problems. However, inference in MLN is computationally intensive, making the industrial-scale application of MLN very difficult. In recent years, graph neural networks (GNNs) have emerged as efficient and effective tools for large-scale graph problems. Nevertheless, GNNs do not explicitly incorporate prior logic rules into the models, and may require many labeled examples for a target task. In this paper, we explore the combination of MLNs and GNNs, and use graph neural networks for variational inference in MLN. We propose a GNN variant, named ExpressGNN, which strikes a nice balance between the representation power and the simplicity of the model. Our extensive experiments on several benchmark datasets demonstrate that ExpressGNN leads to effective and efficient probabilistic logic reasoning. |
Tasks | |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2001.11850v2, https://arxiv.org/pdf/2001.11850v2.pdf |
PWC | https://paperswithcode.com/paper/efficient-probabilistic-logic-reasoning-with-1 |
Repo | |
Framework | |
Neural Architecture Search For Fault Diagnosis
Title | Neural Architecture Search For Fault Diagnosis |
Authors | Xudong Li, Yang Hu, Jianhua Zheng, Mingtao Li |
Abstract | Data-driven methods, especially deep learning methods, have made great progress in fault diagnosis. Deep learning is suitable for processing big data, and has a strong feature extraction ability to realize end-to-end fault diagnosis systems. However, designing a neural network architecture requires rich professional knowledge and debugging experience, and many experiments are needed to screen models and hyperparameters, increasing the difficulty of developing deep learning models. Fortunately, neural architecture search (NAS) is developing rapidly, and is becoming one of the next directions for deep learning. In this paper, we propose a NAS method for fault diagnosis using reinforcement learning. A recurrent neural network is used as an agent to generate network architectures. The accuracy of the generated network on the validation dataset is fed back to the agent as a reward, and the parameters of the agent are updated through the policy gradient algorithm. We use the PHM 2009 Data Challenge gearbox dataset to demonstrate the effectiveness of the proposed method, and obtain state-of-the-art results compared with other manually designed network structures. To the best of the authors’ knowledge, this is the first time that NAS has been applied in fault diagnosis. |
Tasks | Neural Architecture Search |
Published | 2020-02-19 |
URL | https://arxiv.org/abs/2002.07997v1, https://arxiv.org/pdf/2002.07997v1.pdf |
PWC | https://paperswithcode.com/paper/neural-architecture-search-for-fault |
Repo | |
Framework | |
Learning Multivariate Hawkes Processes at Scale
Title | Learning Multivariate Hawkes Processes at Scale |
Authors | Maximilian Nickel, Matthew Le |
Abstract | Multivariate Hawkes Processes (MHPs) are an important class of temporal point processes that have enabled key advances in understanding and predicting social information systems. However, due to their complex modeling of temporal dependencies, MHPs have proven to be notoriously difficult to scale, which has limited their applications to relatively small domains. In this work, we propose a novel model and computational approach to overcome this important limitation. By exploiting a characteristic sparsity pattern in real-world diffusion processes, we show that our approach allows us to compute the exact likelihood and gradients of an MHP – independently of the ambient dimensions of the underlying network. We show on synthetic and real-world datasets that our model not only achieves state-of-the-art predictive results, but also improves runtime performance by multiple orders of magnitude compared to standard methods on sparse event sequences. In combination with easily interpretable latent variables and influence structures, this allows us to analyze diffusion processes at previously unattainable scale. |
Tasks | Point Processes |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12501v1, https://arxiv.org/pdf/2002.12501v1.pdf |
PWC | https://paperswithcode.com/paper/learning-multivariate-hawkes-processes-at |
Repo | |
Framework | |
Training Large Neural Networks with Constant Memory using a New Execution Algorithm
Title | Training Large Neural Networks with Constant Memory using a New Execution Algorithm |
Authors | Bharadwaj Pudipeddi, Maral Mesmakhosroshahi, Jinwen Xi, Sujeeth Bharadwaj |
Abstract | Widely popular transformer-based NLP models such as BERT and GPT have enormous capacity trending to billions of parameters. Current execution methods demand brute-force resources such as HBM devices and high speed interconnectivity for data parallelism. In this paper, we introduce a new relay-style execution technique called L2L (layer-to-layer) where at any given moment, the device memory is primarily populated only with the executing layer(s)’ footprint. The model resides in the DRAM memory attached to either a CPU or an FPGA as an entity we call eager param-server (EPS). Unlike a traditional param-server, EPS transmits the model piecemeal to the devices thereby allowing it to perform other tasks in the background such as reduction and distributed optimization. To overcome the bandwidth issues of shuttling parameters to and from EPS, the model is executed a layer at a time across many micro-batches instead of the conventional method of minibatches over the whole model. In this paper, we explore a conservative version of L2L that is implemented on a modest Azure instance for BERT-Large, running it with a batch size of 32 on a single V100 GPU using less than 8GB memory. Our results show a more stable learning curve, faster convergence, better accuracy and 35% reduction in memory compared to the state-of-the-art baseline. Our method reproduces BERT results on any mid-level GPU, which was hitherto not feasible. L2L scales to arbitrary depth without impacting memory or devices, allowing researchers to develop affordable devices. It also enables dynamic approaches such as neural architecture search. This work has been performed on GPUs first but is also targeted towards high TFLOPS/Watt accelerators such as Graphcore IPUs. The code will soon be available on GitHub. |
Tasks | Distributed Optimization, Neural Architecture Search |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05645v3, https://arxiv.org/pdf/2002.05645v3.pdf |
PWC | https://paperswithcode.com/paper/training-large-neural-networks-with-constant |
Repo | |
Framework | |
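The loop ordering described in the L2L abstract above (one layer at a time across many micro-batches, rather than each batch through the whole model) can be sketched for the forward pass as follows. This is a toy illustration under stated assumptions: layers are plain Python callables, the function names are hypothetical, and the parameter transfer from the eager param-server, the backward pass, and the optimizer are all omitted.

```python
def conventional_forward(layers, micro_batches):
    """Each micro-batch goes through the whole model, so all layers must be resident."""
    outputs = []
    for batch in micro_batches:
        x = batch
        for layer in layers:
            x = layer(x)
        outputs.append(x)
    return outputs


def layer_to_layer_forward(layers, micro_batches):
    """One layer at a time across all micro-batches.

    Only the current layer has to be resident on the device; in a real system
    this is the point where its parameters would be fetched from host memory.
    """
    activations = list(micro_batches)
    for layer in layers:
        activations = [layer(x) for x in activations]
    return activations


# Toy "layers" acting on scalars stand in for transformer layers.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
micro_batches = [0, 1, 2, 3]
print(conventional_forward(layers, micro_batches))
print(layer_to_layer_forward(layers, micro_batches))
```

Both orderings compute the same outputs; the trade-off is that the layer-at-a-time version must keep the intermediate activations of every micro-batch, in exchange for holding only one layer's parameters at a time.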
The use of Convolutional Neural Networks for signal-background classification in Particle Physics experiments
Title | The use of Convolutional Neural Networks for signal-background classification in Particle Physics experiments |
Authors | Venkitesh Ayyar, Wahid Bhimji, Lisa Gerhardt, Sally Robertson, Zahra Ronaghi |
Abstract | The success of Convolutional Neural Networks (CNNs) in image classification has prompted efforts to study their use for classifying image data obtained in Particle Physics experiments. Here, we discuss our efforts to apply CNNs to 2D and 3D image data from particle physics experiments to classify signal from background. In this work we present an extensive convolutional neural architecture search, achieving high accuracy for signal/background discrimination for a HEP classification use-case based on simulated data from the IceCube neutrino observatory and an ATLAS-like detector. We demonstrate among other things that we can achieve the same accuracy as complex ResNet architectures with CNNs with fewer parameters, and present comparisons of computational requirements, training and inference times. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.05761v1, https://arxiv.org/pdf/2002.05761v1.pdf |
PWC | https://paperswithcode.com/paper/the-use-of-convolutional-neural-networks-for |
Repo | |
Framework | |
Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks
Title | Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks |
Authors | Lei Yang, Zheyu Yan, Meng Li, Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra, Weiwen Jiang, Yiyu Shi |
Abstract | Neural Architecture Search (NAS) has demonstrated its power on various AI accelerating platforms such as Field Programmable Gate Arrays (FPGAs) and Graphic Processing Units (GPUs). However, how to integrate NAS with Application-Specific Integrated Circuits (ASICs) remains an open problem, despite them being the most powerful AI accelerating platforms. The major bottleneck comes from the large design freedom associated with ASIC designs. Moreover, considering that multiple DNNs will run in parallel for different workloads with diverse layer operations and sizes, integrating heterogeneous ASIC sub-accelerators for distinct DNNs in one design can significantly boost performance, and at the same time further complicate the design space. To address these challenges, in this paper we build an ASIC template set based on existing successful designs, described by their unique dataflows, so that the design space is significantly reduced. Based on the templates, we further propose a framework, namely NASAIC, which can simultaneously identify multiple DNN architectures and the associated heterogeneous ASIC accelerator design, such that the design specifications (specs) can be satisfied, while the accuracy can be maximized. Experimental results show that compared with successive NAS and ASIC design optimizations which lead to design spec violations, NASAIC can guarantee the results to meet the design specs with 17.77%, 2.49x, and 2.32x reductions on latency, energy, and area and with 0.76% accuracy loss. To the best of the authors’ knowledge, this is the first work on neural architecture and ASIC accelerator design co-exploration. |
Tasks | Neural Architecture Search |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.04116v1, https://arxiv.org/pdf/2002.04116v1.pdf |
PWC | https://paperswithcode.com/paper/co-exploration-of-neural-architectures-and |
Repo | |
Framework | |