January 28, 2020

2971 words 14 mins read

Paper Group ANR 797

Symphony of high-dimensional brain. Deep-learning-based Breast CT for Radiation Dose Reduction. Neuromorphic Electronic Systems for Reservoir Computing. Approximation Rates for Neural Networks with General Activation Functions. AugFPN: Improving Multi-scale Feature Learning for Object Detection. Learning to Score Behaviors for Guided Policy Optimiz …

Symphony of high-dimensional brain

Title Symphony of high-dimensional brain
Authors Alexander N. Gorban, Valeri A. Makarov, Ivan Y. Tyukin
Abstract This paper is the final part of the scientific discussion organised by the Journal “Physics of Life Reviews” about the simplicity revolution in neuroscience and AI. This discussion was initiated by the review paper “The unreasonable effectiveness of small neural ensembles in high-dimensional brain”. Phys Life Rev 2019, doi 10.1016/j.plrev.2018.09.005, arXiv:1809.07656. The topics of the discussion varied from the necessity to take into account the difference between the theoretical random distributions and “extremely non-random” real distributions and revise the common machine learning theory, to different forms of the curse of dimensionality and high-dimensional pitfalls in neuroscience. V. Kůrková, A. Tozzi and J.F. Peters, R. Quian Quiroga, P. Varona, R. Barrio, G. Kreiman, L. Fortuna, C. van Leeuwen, R. Quian Quiroga, and V. Kreinovich, A.N. Gorban, V.A. Makarov, and I.Y. Tyukin participated in the discussion. In this paper we analyse the symphony of opinions and the possible outcomes of the simplicity revolution for machine learning and neuroscience.
Tasks
Published 2019-06-27
URL https://arxiv.org/abs/1906.12222v1
PDF https://arxiv.org/pdf/1906.12222v1.pdf
PWC https://paperswithcode.com/paper/symphony-of-high-dimensional-brain
Repo
Framework

Deep-learning-based Breast CT for Radiation Dose Reduction

Title Deep-learning-based Breast CT for Radiation Dose Reduction
Authors Wenxiang Cong, Hongming Shan, Xiaohua Zhang, Shaohua Liu, Ruola Ning, Ge Wang
Abstract Cone-beam breast computed tomography (CT) provides true 3D breast images with isotropic resolution and high-contrast information, detecting calcifications as small as a few hundred microns and revealing subtle tissue differences. However, breast tissue is highly sensitive to x-ray radiation, so reducing the radiation dose is critically important for healthcare. Few-view cone-beam CT uses only a fraction of the x-ray projection data acquired by standard cone-beam breast CT, enabling a significant reduction of the radiation dose. However, insufficient sampling data cause severe streak artifacts in CT images reconstructed using conventional methods. In this study, we propose a deep-learning-based method that trains a residual neural network model for image reconstruction, applied to few-view breast CT to produce high-quality breast CT images. We evaluate the deep-learning-based image reconstruction using one third and one quarter of the x-ray projection views of standard cone-beam breast CT. Based on a clinical breast imaging dataset, we perform supervised learning to train the neural network to map few-view CT images to the corresponding full-view CT images. Experimental results show that the deep-learning-based image reconstruction method allows few-view breast CT to achieve a radiation dose below 6 mGy per cone-beam CT scan, the threshold set by the FDA for mammographic screening.
Tasks Computed Tomography (CT), Image Reconstruction
Published 2019-09-25
URL https://arxiv.org/abs/1909.11721v1
PDF https://arxiv.org/pdf/1909.11721v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-breast-ct-for-radiation
Repo
Framework
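
Although the paper's network details live in the PDF above, the core residual-learning trick is easy to sketch. Below is a minimal, hypothetical PyTorch version in which a small CNN predicts the streak-artifact residual of a few-view reconstruction and subtracts it; the depth, width, and image size are illustrative placeholders, not the authors' architecture.

```python
# Minimal sketch of residual learning for few-view CT artifact removal.
# Depth/width below are illustrative, not the paper's architecture.
import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    def __init__(self, channels=64, depth=5):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, few_view_img):
        # Predict the streak-artifact residual and subtract it, so the
        # network only has to learn the artifact pattern, not the anatomy.
        return few_view_img - self.body(few_view_img)

model = ResidualDenoiser()
few_view = torch.randn(4, 1, 128, 128)   # stand-in for few-view reconstructions
full_view = torch.randn(4, 1, 128, 128)  # stand-in for full-view ground truth
loss = nn.functional.mse_loss(model(few_view), full_view)
loss.backward()
```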

Neuromorphic Electronic Systems for Reservoir Computing

Title Neuromorphic Electronic Systems for Reservoir Computing
Authors Fatemeh Hadaeghi
Abstract This chapter provides a comprehensive survey of the research on, and motivations for, hardware implementations of reservoir computing (RC) on neuromorphic electronic systems. Owing to its computational efficiency and the fact that training amounts to a simple linear regression, both spiking and non-spiking implementations of reservoir computing on neuromorphic hardware have been developed. Here, a review of these experimental studies is provided to illustrate the progress in this area and to address the technical challenges that arise from this specific hardware implementation. Moreover, to deal with the challenges of computation on such unconventional substrates, several lines of potential solutions are presented based on advances in other computational approaches in machine learning. Keywords: Analog Microchips, FPGA, Memristors, Neuromorphic Architectures, Reservoir Computing
Tasks
Published 2019-08-26
URL https://arxiv.org/abs/1908.09572v1
PDF https://arxiv.org/pdf/1908.09572v1.pdf
PWC https://paperswithcode.com/paper/neuromorphic-electronic-systems-for-reservoir
Repo
Framework
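
Since the chapter's central point is that RC training amounts to a simple linear regression, a software sketch makes that concrete. The following echo state network in NumPy is a generic textbook construction, not any specific neuromorphic implementation reviewed in the chapter; the reservoir size, spectral radius, and toy task are all arbitrary choices.

```python
# Minimal echo state network: fixed random reservoir, ridge-regression readout.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, T = 1, 200, 1000

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()  # spectral radius below 1

u = np.sin(np.arange(T) * 0.1)[:, None]        # toy input signal
y = np.roll(u, -5, axis=0)                     # toy target: input 5 steps ahead

# Drive the (untrained, fixed) reservoir and collect its states.
x = np.zeros(n_res)
X = np.zeros((T, n_res))
for t in range(T):
    x = np.tanh(W @ x + W_in @ u[t])
    X[t] = x

# Training amounts to a linear (ridge) regression on reservoir states.
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
pred = X @ W_out
```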

Approximation Rates for Neural Networks with General Activation Functions

Title Approximation Rates for Neural Networks with General Activation Functions
Authors Jonathan W. Siegel, Jinchao Xu
Abstract We prove some new results concerning the approximation rate of neural networks with general activation functions. Our first result concerns the rate of approximation of a two-layer neural network with a polynomially decaying, non-sigmoidal activation function. We extend the dimension-independent approximation rates previously obtained to this new class of activation functions. Our second result gives a weaker, but still dimension-independent, approximation rate for a larger class of activation functions, removing the polynomial decay assumption. This result applies to any bounded, integrable activation function. Finally, we show that a stratified sampling approach can be used to improve the approximation rate for polynomially decaying activation functions under mild additional assumptions.
Tasks
Published 2019-04-04
URL https://arxiv.org/abs/1904.02311v3
PDF https://arxiv.org/pdf/1904.02311v3.pdf
PWC https://paperswithcode.com/paper/on-the-approximation-properties-of-neural
Repo
Framework
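
To see why stratified sampling can help, consider a toy scalar case: a function with the integral representation f(x) = E_w[cos(w x)] over w uniform on [0, 1], which equals sin(x)/x, approximated by an equal-weight average of N cosine "neurons". The sketch below is a didactic illustration of the sampling idea, not the paper's construction; it compares plain Monte Carlo draws of w with one draw per stratum.

```python
# Monte Carlo vs. stratified sampling of feature parameters for a toy
# integral representation f(x) = E_w[cos(w x)], w ~ U[0,1].
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = np.linspace(0.1, 5.0, 200)
f = np.sin(x) / x                                  # exact value of E_w[cos(w x)]

w_mc = rng.uniform(0, 1, N)                        # plain Monte Carlo draws
w_st = (np.arange(N) + rng.uniform(0, 1, N)) / N   # one draw per stratum

f_mc = np.cos(np.outer(w_mc, x)).mean(axis=0)
f_st = np.cos(np.outer(w_st, x)).mean(axis=0)

print("MC error:        ", np.abs(f_mc - f).max())
print("stratified error:", np.abs(f_st - f).max())
```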

AugFPN: Improving Multi-scale Feature Learning for Object Detection

Title AugFPN: Improving Multi-scale Feature Learning for Object Detection
Authors Chaoxu Guo, Bin Fan, Qian Zhang, Shiming Xiang, Chunhong Pan
Abstract Current state-of-the-art detectors typically exploit a feature pyramid to detect objects at different scales. Among them, FPN is a representative work that builds a feature pyramid by summing multi-scale features. However, design defects prevent the multi-scale features from being fully exploited. In this paper, we first analyze the design defects of the feature pyramid in FPN, and then introduce a new feature pyramid architecture named AugFPN to address these problems. Specifically, AugFPN consists of three components: Consistent Supervision, Residual Feature Augmentation, and Soft RoI Selection. AugFPN narrows the semantic gaps between features of different scales before feature fusion through Consistent Supervision. In feature fusion, ratio-invariant context information is extracted by Residual Feature Augmentation to reduce the information loss of the feature map at the highest pyramid level. Finally, Soft RoI Selection is employed to adaptively learn a better RoI feature after feature fusion. By replacing FPN with AugFPN in Faster R-CNN, our models achieve 2.3 and 1.6 points higher Average Precision (AP) when using ResNet50 and MobileNet-v2 as backbones, respectively. Furthermore, AugFPN improves RetinaNet by 1.6 points AP and FCOS by 0.9 points AP when using ResNet50 as the backbone. Code will be made available.
Tasks Object Detection
Published 2019-12-11
URL https://arxiv.org/abs/1912.05384v1
PDF https://arxiv.org/pdf/1912.05384v1.pdf
PWC https://paperswithcode.com/paper/augfpn-improving-multi-scale-feature-learning
Repo
Framework
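
A simplified reading of Residual Feature Augmentation can be sketched in a few lines: pool the top pyramid level at several ratios, upsample back, and fuse the results into a residual context. In the hypothetical PyTorch sketch below, the pooling ratios and the plain averaging fusion are assumptions; the paper's adaptive fusion is more elaborate.

```python
# Simplified sketch of ratio-invariant context extraction for the top
# pyramid level (a reading of Residual Feature Augmentation).
import torch
import torch.nn.functional as F

def residual_feature_augmentation(p5, ratios=(0.1, 0.2, 0.3)):
    h, w = p5.shape[-2:]
    contexts = []
    for r in ratios:
        # Pool at a fixed ratio of the input size, then upsample back.
        pooled = F.adaptive_avg_pool2d(p5, (max(1, int(h * r)), max(1, int(w * r))))
        contexts.append(F.interpolate(pooled, size=(h, w), mode="bilinear",
                                      align_corners=False))
    # Plain averaging fusion stands in for the paper's learned fusion.
    return p5 + torch.stack(contexts).mean(dim=0)

p5 = torch.randn(2, 256, 25, 25)  # stand-in for the highest pyramid level
m5 = residual_feature_augmentation(p5)
```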

Learning to Score Behaviors for Guided Policy Optimization

Title Learning to Score Behaviors for Guided Policy Optimization
Authors Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan
Abstract We introduce a new approach for comparing reinforcement learning policies, using Wasserstein distances (WDs) in a newly defined latent behavioral space. We show that by utilizing the dual formulation of the WD, we can learn score functions over policy behaviors that can in turn be used to lead policy optimization towards (or away from) (un)desired behaviors. Combined with smoothed WDs, the dual formulation allows us to devise efficient algorithms that take stochastic gradient descent steps through WD regularizers. We incorporate these regularizers into two novel on-policy algorithms, Behavior-Guided Policy Gradient and Behavior-Guided Evolution Strategies, which we demonstrate can outperform existing methods in a variety of challenging environments. We also provide an open source demo.
Tasks Efficient Exploration, Imitation Learning
Published 2019-06-11
URL https://arxiv.org/abs/1906.04349v4
PDF https://arxiv.org/pdf/1906.04349v4.pdf
PWC https://paperswithcode.com/paper/wasserstein-reinforcement-learning
Repo
Framework
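
The scoring idea can be illustrated with off-the-shelf tools: embed each policy's rollouts in a behavioral space and measure the Wasserstein distance between the resulting distributions. The sketch below uses SciPy's 1-D wasserstein_distance on a placeholder scalar embedding; the paper instead learns score functions through the WD's dual formulation and differentiates through smoothed WDs.

```python
# Minimal sketch of comparing two policies by the Wasserstein distance
# between their behavior embeddings. Policies and the 1-D embedding are
# invented placeholders.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def behavior_embedding(policy, n_rollouts=256):
    # Placeholder: embed each rollout as a scalar (e.g., total reward).
    return np.array([policy(rng) for _ in range(n_rollouts)])

policy_a = lambda r: r.normal(0.0, 1.0)   # stand-in behaviors
policy_b = lambda r: r.normal(0.5, 1.0)

wd = wasserstein_distance(behavior_embedding(policy_a),
                          behavior_embedding(policy_b))
# A WD regularizer then steers optimization towards (or away from) a
# reference behavior, e.g. loss = task_loss +/- lam * wd.
print("behavioral distance:", wd)
```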

Making History Matter: History-Advantage Sequence Training for Visual Dialog

Title Making History Matter: History-Advantage Sequence Training for Visual Dialog
Authors Tianhao Yang, Zheng-Jun Zha, Hanwang Zhang
Abstract We study multi-round response generation in visual dialog, where a response is generated according to a visually grounded conversational history. Given a triplet: an image, Q&A history, and current question, all the prevailing methods follow a codec (i.e., encoder-decoder) fashion in a supervised learning paradigm: a multimodal encoder encodes the triplet into a feature vector, which is then fed into the decoder for current answer generation, supervised by the ground truth. However, this conventional supervised learning does NOT take into account the impact of imperfect history, violating the conversational nature of visual dialog and thus making the codec more inclined to learn history bias rather than contextual reasoning. To this end, inspired by the actor-critic policy gradient in reinforcement learning, we propose a novel training paradigm called History Advantage Sequence Training (HAST). Specifically, we intentionally impose wrong answers in the history, obtaining an adverse critic, and see how the historic error impacts the codec’s future behavior through the History Advantage, a quantity obtained by subtracting the adverse critic from the gold reward of the ground-truth history. Moreover, to make the codec more sensitive to the history, we propose a novel attention network called the History-Aware Co-Attention Network (HACAN), which can be effectively trained by using HAST. Experimental results on three benchmarks, VisDial v0.9&v1.0 and GuessWhat?!, show that the proposed HAST strategy consistently outperforms state-of-the-art supervised counterparts.
Tasks Visual Dialog, Visual Reasoning
Published 2019-02-25
URL http://arxiv.org/abs/1902.09326v3
PDF http://arxiv.org/pdf/1902.09326v3.pdf
PWC https://paperswithcode.com/paper/making-history-matter-gold-critic-sequence
Repo
Framework
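
The History Advantage itself is a simple quantity, sketched below with a placeholder reward function: the reward under the ground-truth history minus the adverse critic computed under an intentionally corrupted history. Everything in this snippet is an invented stand-in for the paper's sentence-level rewards.

```python
# Minimal sketch of the History Advantage signal.
def reward(history, answer):
    # Placeholder for a sentence-level reward (e.g., rank of the ground truth).
    return sum(h == "correct" for h in history) + (answer == "gold")

gold_history = ["correct", "correct", "correct"]
adverse_history = ["correct", "wrong", "correct"]   # intentionally imposed error

gold_reward = reward(gold_history, "gold")
adverse_critic = reward(adverse_history, "gold")
history_advantage = gold_reward - adverse_critic    # weights the policy gradient
print("history advantage:", history_advantage)
```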

Transfusion: Understanding Transfer Learning for Medical Imaging

Title Transfusion: Understanding Transfer Learning for Medical Imaging
Authors Maithra Raghu, Chiyuan Zhang, Jon Kleinberg, Samy Bengio
Abstract Transfer learning from natural image datasets, particularly ImageNet, using standard large models and corresponding pretrained weights has become a de facto method for deep learning applications to medical imaging. However, there are fundamental differences in data sizes, features, and task specifications between natural image classification and the target medical tasks, and there is little understanding of the effects of transfer. In this paper, we explore properties of transfer learning for medical imaging. A performance evaluation on two large-scale medical imaging tasks shows that, surprisingly, transfer offers little benefit to performance, and simple, lightweight models can perform comparably to ImageNet architectures. Investigating the learned representations and features, we find that some of the differences from transfer learning are due to the over-parametrization of standard models rather than sophisticated feature reuse. We isolate where useful feature reuse occurs, and outline the implications for more efficient model exploration. We also explore feature-independent benefits of transfer arising from weight scalings.
Tasks Image Classification, Transfer Learning
Published 2019-02-14
URL https://arxiv.org/abs/1902.07208v3
PDF https://arxiv.org/pdf/1902.07208v3.pdf
PWC https://paperswithcode.com/paper/transfusion-understanding-transfer-learning
Repo
Framework
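
The basic experimental contrast is straightforward to set up: the same architecture initialized from ImageNet weights versus from scratch, fine-tuned on the medical task. A hypothetical sketch (assuming the torchvision >= 0.13 weights API and a stand-in two-class medical label space; dataset and training loop omitted):

```python
# Pretrained vs. scratch initialization of the same backbone.
import torch.nn as nn
from torchvision import models

transfer = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
scratch = models.resnet50(weights=None)

for model in (transfer, scratch):
    model.fc = nn.Linear(model.fc.in_features, 2)  # replace the ImageNet head
# Train both identically on the medical task and compare; the paper finds
# the gap is often surprisingly small.
```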

Low-Resource Name Tagging Learned with Weakly Labeled Data

Title Low-Resource Name Tagging Learned with Weakly Labeled Data
Authors Yixin Cao, Zikun Hu, Tat-Seng Chua, Zhiyuan Liu, Heng Ji
Abstract Name tagging in low-resource languages or domains suffers from inadequate training data. Existing work relies heavily on additional information, while leaving the noisy annotations that exist extensively on the web unexplored. In this paper, we propose a novel neural model for name tagging based solely on weakly labeled (WL) data, so that it can be applied in any low-resource setting. To take the best advantage of all WL sentences, we split them into high-quality and noisy portions for two modules, respectively: (1) a classification module focusing on the large portion of noisy data can efficiently and robustly pretrain the tag classifier by capturing textual context semantics; and (2) a costly sequence labeling module focusing on high-quality data utilizes Partial-CRFs with non-entity sampling to achieve the global optimum. The two modules are combined via shared parameters. Extensive experiments involving five low-resource languages and a fine-grained food domain demonstrate our superior performance (6% and 7.8% F1 gains on average) as well as efficiency.
Tasks
Published 2019-08-26
URL https://arxiv.org/abs/1908.09659v1
PDF https://arxiv.org/pdf/1908.09659v1.pdf
PWC https://paperswithcode.com/paper/low-resource-name-tagging-learned-with-weakly
Repo
Framework
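
The splitting step can be sketched with a simple coverage heuristic: route sentences whose tokens are mostly annotated to the sequence labeling module, and the rest to the classification module. The heuristic and the 0.5 threshold below are illustrative assumptions, not the paper's criterion, and the two modules themselves are not shown.

```python
# Minimal sketch of routing weakly labeled (WL) sentences into a
# high-quality portion and a noisy portion.
def coverage(sentence):
    # sentence: list of (token, tag), where tag is None for unannotated tokens
    return sum(tag is not None for _, tag in sentence) / len(sentence)

wl_corpus = [
    [("Paris", "LOC"), ("is", "O"), ("lovely", None)],
    [("try", None), ("the", None), ("ramen", "FOOD")],
]

high_quality = [s for s in wl_corpus if coverage(s) >= 0.5]
noisy = [s for s in wl_corpus if coverage(s) < 0.5]
# The classification module pretrains on `noisy`; the Partial-CRF sequence
# labeling module trains on `high_quality`; the two share parameters.
```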

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection

Title Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Authors Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun
Abstract The self-attention-based Transformer has demonstrated state-of-the-art performance in a number of natural language processing tasks. Self-attention is able to model long-term dependencies, but it may suffer from the extraction of irrelevant information in the context. To tackle the problem, we propose a novel model called the Explicit Sparse Transformer. The Explicit Sparse Transformer is able to improve the concentration of attention on the global context through an explicit selection of the most relevant segments. Extensive experimental results on a series of natural language processing and computer vision tasks, including neural machine translation, image captioning, and language modeling, all demonstrate the advantages of the Explicit Sparse Transformer in model performance. We also show that our proposed sparse attention method achieves comparable or better results than the previous sparse attention method, but significantly reduces training and testing time. For example, the inference speed is twice that of sparsemax in the Transformer model. Code will be available at https://github.com/lancopku/Explicit-Sparse-Transformer
Tasks Image Captioning, Language Modelling, Machine Translation
Published 2019-12-25
URL https://arxiv.org/abs/1912.11637v1
PDF https://arxiv.org/pdf/1912.11637v1.pdf
PWC https://paperswithcode.com/paper/explicit-sparse-transformer-concentrated
Repo
Framework
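
The explicit selection reduces to a top-k mask on the attention logits, which is a few lines in PyTorch. In this minimal sketch, k=8 is an illustrative choice, and the single-head, projection-free setup is a simplification of the full model.

```python
# Minimal sketch of explicit top-k sparse attention: keep the k largest
# logits per query and mask the rest to -inf before the softmax.
import torch
import torch.nn.functional as F

def topk_attention(q, key, v, k=8):
    scores = q @ key.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    # k-th largest score per query row (ties are also kept).
    kth = scores.topk(k, dim=-1).values[..., -1].unsqueeze(-1)
    sparse = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(sparse, dim=-1) @ v

q = torch.randn(2, 16, 64)   # (batch, query_len, dim)
kv = torch.randn(2, 32, 64)  # (batch, key_len, dim)
out = topk_attention(q, kv, kv)
```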

Recurrent Neural Networks in the Eye of Differential Equations

Title Recurrent Neural Networks in the Eye of Differential Equations
Authors Murphy Yuezhen Niu, Lior Horesh, Isaac Chuang
Abstract To understand the fundamental trade-offs between training stability, temporal dynamics and architectural complexity of recurrent neural networks (RNNs), we directly analyze RNN architectures using numerical methods of ordinary differential equations (ODEs). We define a general family of RNNs, the ODERNNs, by relating the composition rules of RNNs to integration methods of ODEs at discrete time steps. We show that the degree $n$ of an RNN’s functional nonlinearity and the range $t$ of its temporal memory can be mapped to the corresponding stage of Runge-Kutta recursion and the order of the time derivative of the ODEs. We prove that popular RNN architectures, such as LSTM and URNN, fit into different orders of $n$-$t$-ODERNNs. This exact correspondence between RNNs and ODEs helps us establish sufficient conditions for RNN training stability and facilitates more flexible top-down designs of new RNN architectures using the large variety of toolboxes from the numerical integration of ODEs. We provide such an example: the Quantum-inspired Universal computing Neural Network (QUNN), which reduces the required number of training parameters from polynomial in both data length and temporal memory length to linear in temporal memory length only.
Tasks
Published 2019-04-29
URL http://arxiv.org/abs/1904.12933v1
PDF http://arxiv.org/pdf/1904.12933v1.pdf
PWC https://paperswithcode.com/paper/recurrent-neural-networks-in-the-eye-of
Repo
Framework
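
The lowest-order member of this correspondence is easy to write down: a forward-Euler discretization of dh/dt = f(h, x) yields a residual recurrent update. The NumPy sketch below is didactic; the sizes, step size, and tanh cell are arbitrary choices, and the paper's ODERNN family covers higher Runge-Kutta stages and time-derivative orders.

```python
# Forward-Euler view of a recurrent update: h_{t+1} = h_t + dt * f(h_t, x_t).
import numpy as np

rng = np.random.default_rng(0)
d_h, d_x, dt = 32, 8, 0.1
W_h = rng.normal(0, 0.3, (d_h, d_h))
W_x = rng.normal(0, 0.3, (d_h, d_x))

def f(h, x):
    # The ODE right-hand side, here a simple tanh cell.
    return np.tanh(W_h @ h + W_x @ x)

h = np.zeros(d_h)
for x in rng.normal(size=(100, d_x)):   # toy input sequence
    h = h + dt * f(h, x)                # one Euler step = one recurrent update
```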

Incidence Networks for Geometric Deep Learning

Title Incidence Networks for Geometric Deep Learning
Authors Marjan Albooyeh, Daniele Bertolini, Siamak Ravanbakhsh
Abstract Sparse incidence tensors can represent a variety of structured data. For example, we may represent attributed graphs using their node-node, node-edge, or edge-edge incidence matrices. In higher dimensions, incidence tensors can represent simplicial complexes and polytopes. In this paper, we formalize incidence tensors, analyze their structure, and present the family of equivariant networks that operate on them. We show that any incidence tensor decomposes into invariant subsets. This decomposition, in turn, leads to a decomposition of the corresponding equivariant linear maps, for which we prove an efficient pooling-and-broadcasting implementation. We demonstrate the effectiveness of this family of networks by reporting state-of-the-art results on graph learning tasks for many targets in the QM9 dataset.
Tasks
Published 2019-05-27
URL https://arxiv.org/abs/1905.11460v3
PDF https://arxiv.org/pdf/1905.11460v3.pdf
PWC https://paperswithcode.com/paper/incidence-networks-for-geometric-deep
Repo
Framework
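
The pooling-and-broadcasting implementation can be illustrated on a node-edge incidence matrix: pool over rows, columns, and the whole matrix, then broadcast the results back. The sketch below shows one simple equivariant map with four scalar coefficients; the paper derives the complete basis from the decomposition into invariant subsets.

```python
# One pooling-and-broadcasting equivariant linear map on a node-edge
# incidence matrix X (nodes x edges).
import numpy as np

def equivariant_linear(X, a=1.0, b=0.5, c=0.5, d=0.25):
    row = X.mean(axis=1, keepdims=True)   # pool over edges, broadcast to rows
    col = X.mean(axis=0, keepdims=True)   # pool over nodes, broadcast to cols
    tot = X.mean()                        # global pool, broadcast everywhere
    return a * X + b * row + c * col + d * tot

X = np.random.default_rng(0).integers(0, 2, (5, 7)).astype(float)
Y = equivariant_linear(X)
# Permuting the nodes (rows) and edges (columns) of X permutes Y the same
# way, which is the equivariance property.
```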

Differentially Private Learning with Adaptive Clipping

Title Differentially Private Learning with Adaptive Clipping
Authors Om Thakkar, Galen Andrew, H. Brendan McMahan
Abstract We introduce a new adaptive clipping technique for training learning models with user-level differential privacy that removes the need for extensive parameter tuning. Previous approaches to this problem use the Federated Stochastic Gradient Descent or the Federated Averaging algorithm with noised updates, and compute a differential privacy guarantee using the Moments Accountant. These approaches rely on choosing a norm bound for each user’s update to the model, which needs to be tuned carefully. The best value depends on the learning rate, model architecture, number of passes made over each user’s data, and possibly various other parameters. We show that adaptively setting the clipping norm applied to each user’s update, based on a differentially private estimate of a target quantile of the distribution of unclipped norms, is sufficient to remove the need for such extensive parameter tuning.
Tasks
Published 2019-05-09
URL https://arxiv.org/abs/1905.03871v1
PDF https://arxiv.org/pdf/1905.03871v1.pdf
PWC https://paperswithcode.com/paper/differentially-private-learning-with-adaptive
Repo
Framework
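
The adaptive rule can be sketched as a geometric quantile tracker: after each round, compare the (privately estimated) fraction of unclipped updates to a target quantile and nudge the clipping norm accordingly. In the sketch below, the target quantile, learning rate, and the Gaussian stand-in for the private count are illustrative assumptions.

```python
# Minimal sketch of quantile-based adaptive clipping for DP training.
import numpy as np

rng = np.random.default_rng(0)
C, gamma, eta = 1.0, 0.5, 0.2   # clip norm, target quantile, learning rate

for rnd in range(100):
    norms = rng.lognormal(mean=1.0, sigma=0.5, size=64)  # stand-in update norms
    # Each user's update would be clipped as: delta * min(1, C / ||delta||).
    frac_unclipped = np.mean(norms <= C)
    frac_unclipped += rng.normal(0, 0.02)   # stand-in for DP noise on the count
    C *= np.exp(-eta * (frac_unclipped - gamma))  # geometric quantile update
print("adapted clip norm:", C)
```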

From Two Graphs to N Questions: A VQA Dataset for Compositional Reasoning on Vision and Commonsense

Title From Two Graphs to N Questions: A VQA Dataset for Compositional Reasoning on Vision and Commonsense
Authors Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen
Abstract Visual Question Answering (VQA) is a challenging task for evaluating the ability to comprehensively understand the world. Existing benchmarks usually focus on reasoning either solely over the vision or mainly over the knowledge, with relatively simple demands on vision. However, the ability to answer a question that requires alternately inferring over the image content and commonsense knowledge is crucial for an advanced VQA system. In this paper, we introduce a VQA dataset, named CRIC, that provides more challenging and general questions about Compositional Reasoning on vIsion and Commonsense. To create this dataset, we develop a powerful method to automatically generate compositional questions and rich annotations from both the scene graph of a given image and some external knowledge graph. Moreover, this paper presents a new compositional model that is capable of implementing various types of reasoning functions over the image content and the knowledge graph. Further, we analyze several baselines, the state of the art, and our model on the CRIC dataset. The experimental results show that the proposed task is challenging: the state of the art obtains 52.26% accuracy and our model obtains 58.38%.
Tasks Question Answering, Visual Question Answering
Published 2019-08-08
URL https://arxiv.org/abs/1908.02962v2
PDF https://arxiv.org/pdf/1908.02962v2.pdf
PWC https://paperswithcode.com/paper/from-two-graphs-to-n-questions-a-vqa-dataset
Repo
Framework
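
The generation recipe can be caricatured in a few lines: walk the scene graph, and whenever a node also appears in the knowledge graph, compose a visual hop with a commonsense hop into one question. The graphs and template below are invented for illustration; the paper's generator and annotations are far richer.

```python
# Toy compositional question generation from a scene graph plus a
# commonsense triple, in the spirit of CRIC.
scene_graph = [("mug", "on", "desk"), ("laptop", "on", "desk")]
knowledge_graph = {"mug": "used for drinking"}

def generate(scene, knowledge):
    for subj, rel, obj in scene:
        if subj in knowledge:
            # Compose a visual hop (subj rel obj) with a commonsense hop.
            yield (f"What is the object {rel} the {obj} that is "
                   f"{knowledge[subj]}?", subj)

for question, answer in generate(scene_graph, knowledge_graph):
    print(question, "->", answer)
```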

Parallel Algorithm for Approximating Nash Equilibrium in Multiplayer Stochastic Games with Application to Naval Strategic Planning

Title Parallel Algorithm for Approximating Nash Equilibrium in Multiplayer Stochastic Games with Application to Naval Strategic Planning
Authors Sam Ganzfried, Conner Laughlin, Charles Morefield
Abstract Many real-world domains contain multiple agents behaving strategically with probabilistic transitions and uncertain (potentially infinite) duration. Such settings can be modeled as stochastic games. While algorithms have been developed for solving (i.e., computing a game-theoretic solution concept such as Nash equilibrium) two-player zero-sum stochastic games, research on algorithms for non-zero-sum and multiplayer stochastic games is limited. We present a new algorithm for these settings, which constitutes the first parallel algorithm for multiplayer stochastic games. We present experimental results on a 4-player stochastic game motivated by a naval strategic planning scenario, showing that our algorithm is able to quickly compute strategies constituting Nash equilibrium up to a very small degree of approximation error.
Tasks
Published 2019-10-01
URL https://arxiv.org/abs/1910.00193v4
PDF https://arxiv.org/pdf/1910.00193v4.pdf
PWC https://paperswithcode.com/paper/parallel-algorithm-for-approximating-nash
Repo
Framework
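
As a rough flavor of the per-state computation such an algorithm could parallelize across states, here is a fictitious-play sketch for a 3-player normal-form stage game with random stand-in payoffs. This is emphatically not the paper's algorithm, which handles full stochastic games and measures the approximation error of the computed equilibrium; fictitious play also need not converge in general games, so the sketch only shows the mechanics.

```python
# Fictitious play on a random 3-player stage game (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
payoff = [rng.uniform(0, 1, (n_actions,) * 3) for _ in range(3)]  # per player
strategy = [np.ones(n_actions) / n_actions for _ in range(3)]

for t in range(1, 2001):
    for i in range(3):
        j, k = [p for p in range(3) if p != i]
        # Expected payoff of each of player i's actions vs. the others'
        # empirical mixtures (axes reordered so player i's action is first).
        util = np.einsum("abc,b,c->a",
                         np.moveaxis(payoff[i], (i, j, k), (0, 1, 2)),
                         strategy[j], strategy[k])
        br = np.eye(n_actions)[np.argmax(util)]       # best response
        strategy[i] += (br - strategy[i]) / (t + 1)   # empirical-average update

print([s.round(3) for s in strategy])
```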