April 1, 2020

Paper Group ANR 468

BUSU-Net: An Ensemble U-Net Framework for Medical Image Segmentation

Title BUSU-Net: An Ensemble U-Net Framework for Medical Image Segmentation
Authors Wei Hao Khoong
Abstract In recent years, convolutional neural networks (CNNs) have revolutionized medical image analysis. One of the most well-known CNN architectures in semantic segmentation is the U-net, which has achieved much success in several medical image segmentation applications. Also more recently, with the rise of autoML and advancements in neural architecture search (NAS), methods like NAS-Unet have been proposed for NAS in medical image segmentation. In this paper, with inspiration from LadderNet, U-Net, autoML and NAS, we propose an ensemble deep neural network with an underlying U-Net framework consisting of bi-directional convolutional LSTMs and dense connections, where the first (from left) U-Net-like network is deeper than the second (from left). We show that this ensemble network outperforms recent state-of-the-art networks in several evaluation metrics, and also evaluate a lightweight version of this ensemble network, which also outperforms recent state-of-the-art networks in some evaluation metrics.
Tasks AutoML, Medical Image Segmentation, Neural Architecture Search, Semantic Segmentation
Published 2020-03-03
URL https://arxiv.org/abs/2003.01581v2
PDF https://arxiv.org/pdf/2003.01581v2.pdf
PWC https://paperswithcode.com/paper/busu-net-an-ensemble-u-net-framework-for

Accelerator-aware Neural Network Design using AutoML

Title Accelerator-aware Neural Network Design using AutoML
Authors Suyog Gupta, Berkin Akin
Abstract While neural network hardware accelerators provide a substantial amount of raw compute throughput, the models deployed on them must be co-designed for the underlying hardware architecture to obtain the optimal system performance. We present a class of computer vision models designed using hardware-aware neural architecture search and customized to run on the Edge TPU, Google’s neural network hardware accelerator for low-power, edge devices. For the Edge TPU in Coral devices, these models enable real-time image classification performance while achieving accuracy typically seen only with larger, compute-heavy models running in data centers. On Pixel 4’s Edge TPU, these models improve the accuracy-latency tradeoff over existing SoTA mobile models.
Tasks AutoML, Image Classification, Neural Architecture Search
Published 2020-03-05
URL https://arxiv.org/abs/2003.02838v1
PDF https://arxiv.org/pdf/2003.02838v1.pdf
PWC https://paperswithcode.com/paper/accelerator-aware-neural-network-design-using

Best of Both Worlds: AutoML Codesign of a CNN and its Hardware Accelerator

Title Best of Both Worlds: AutoML Codesign of a CNN and its Hardware Accelerator
Authors Mohamed S. Abdelfattah, Łukasz Dudziak, Thomas Chau, Royson Lee, Hyeji Kim, Nicholas D. Lane
Abstract Neural architecture search (NAS) has been very successful at outperforming human-designed convolutional neural networks (CNN) in accuracy, and when hardware information is present, latency as well. However, NAS-designed CNNs typically have a complicated topology; therefore, it may be difficult to design a custom hardware (HW) accelerator for such CNNs. We automate HW-CNN codesign using NAS by including parameters from both the CNN model and the HW accelerator, and we jointly search for the best model-accelerator pair that boosts accuracy and efficiency. We call this Codesign-NAS. In this paper we focus on defining the Codesign-NAS multiobjective optimization problem, demonstrating its effectiveness, and exploring different ways of navigating the codesign search space. For CIFAR-10 image classification, we enumerate close to 4 billion model-accelerator pairs, and find the Pareto frontier within that large search space. This allows us to evaluate three different reinforcement-learning-based search strategies. Finally, compared to ResNet on its most optimal HW accelerator from within our HW design space, we improve on CIFAR-100 classification accuracy by 1.3% while simultaneously increasing performance/area by 41% in just 1000 GPU-hours of running Codesign-NAS.
Tasks AutoML, Image Classification, Multiobjective Optimization, Neural Architecture Search
Published 2020-02-11
URL https://arxiv.org/abs/2002.05022v2
PDF https://arxiv.org/pdf/2002.05022v2.pdf
PWC https://paperswithcode.com/paper/best-of-both-worlds-automl-codesign-of-a-cnn
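
The abstract above mentions finding the Pareto frontier over enumerated model-accelerator pairs. A minimal sketch of that filtering step follows, assuming each candidate is summarized by two objectives that are both maximized (accuracy and performance per area). The candidate names and numbers are made up for illustration and do not come from the paper, and a brute-force comparison like this would not scale to billions of pairs.

```python
from typing import List, Tuple

# (name, accuracy, perf_per_area): hypothetical candidates, both objectives maximized.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def pareto_frontier(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates (O(n^2) brute force)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


candidates = [
    ("pair_a", 0.92, 10.0),
    ("pair_b", 0.90, 14.0),
    ("pair_c", 0.89, 12.0),  # dominated by pair_b on both objectives
    ("pair_d", 0.94, 6.0),
]
frontier = pareto_frontier(candidates)
```

Here `pair_c` is dropped because `pair_b` is better on both objectives; the remaining three pairs each trade accuracy against performance per area.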

Trust in AutoML: Exploring Information Needs for Establishing Trust in Automated Machine Learning Systems

Title Trust in AutoML: Exploring Information Needs for Establishing Trust in Automated Machine Learning Systems
Authors Jaimie Drozdal, Justin Weisz, Dakuo Wang, Gaurav Dass, Bingsheng Yao, Changruo Zhao, Michael Muller, Lin Ju, Hui Su
Abstract We explore trust in a relatively new area of data science: Automated Machine Learning (AutoML). In AutoML, AI methods are used to generate and optimize machine learning models by automatically engineering features, selecting models, and optimizing hyperparameters. In this paper, we seek to understand what kinds of information influence data scientists’ trust in the models produced by AutoML. We operationalize trust as a willingness to deploy a model produced using automated methods. We report results from three studies – qualitative interviews, a controlled experiment, and a card-sorting task – to understand the information needs of data scientists for establishing trust in AutoML systems. We find that including transparency features in an AutoML tool increased user trust and understandability in the tool; and out of all proposed features, model performance metrics and visualizations are the most important information to data scientists when establishing their trust with an AutoML tool.
Tasks AutoML
Published 2020-01-17
URL https://arxiv.org/abs/2001.06509v1
PDF https://arxiv.org/pdf/2001.06509v1.pdf
PWC https://paperswithcode.com/paper/trust-in-automl-exploring-information-needs

Vulnerabilities of Connectionist AI Applications: Evaluation and Defence

Title Vulnerabilities of Connectionist AI Applications: Evaluation and Defence
Authors Christian Berghoff, Matthias Neu, Arndt von Twickel
Abstract This article deals with the IT security of connectionist artificial intelligence (AI) applications, focusing on threats to integrity, one of the three IT security goals. Such threats are for instance most relevant in prominent AI computer vision applications. In order to present a holistic view on the IT security goal integrity, many additional aspects such as interpretability, robustness and documentation are taken into account. A comprehensive list of threats and possible mitigations is presented by reviewing the state-of-the-art literature. AI-specific vulnerabilities such as adversarial attacks and poisoning attacks as well as their AI-specific root causes are discussed in detail. Additionally and in contrast to former reviews, the whole AI supply chain is analysed with respect to vulnerabilities, including the planning, data acquisition, training, evaluation and operation phases. The discussion of mitigations is likewise not restricted to the level of the AI system itself but rather advocates viewing AI systems in the context of their supply chains and their embeddings in larger IT infrastructures and hardware devices. Based on this and the observation that adaptive attackers may circumvent any single published AI-specific defence to date, the article concludes that single protective measures are not sufficient but rather multiple measures on different levels have to be combined to achieve a minimum level of IT security for AI applications.
Published 2020-03-18
URL https://arxiv.org/abs/2003.08837v1
PDF https://arxiv.org/pdf/2003.08837v1.pdf
PWC https://paperswithcode.com/paper/vulnerabilities-of-connectionist-ai

Polygames: Improved Zero Learning

Title Polygames: Improved Zero Learning
Authors Tristan Cazenave, Yen-Chi Chen, Guan-Wei Chen, Shi-Yu Chen, Xian-Dong Chiu, Julien Dehos, Maria Elsa, Qucheng Gong, Hengyuan Hu, Vasil Khalidov, Cheng-Ling Li, Hsin-I Lin, Yu-Jin Lin, Xavier Martinet, Vegard Mella, Jeremy Rapin, Baptiste Roziere, Gabriel Synnaeve, Fabien Teytaud, Olivier Teytaud, Shi-Cheng Ye, Yi-Jun Ye, Shi-Jim Yen, Sergey Zagoruyko
Abstract Since DeepMind’s AlphaZero, Zero learning quickly became the state-of-the-art method for many board games. It can be improved using a fully convolutional structure (no fully connected layer). Using such an architecture plus global pooling, we can create bots independent of the board size. The training can be made more robust by keeping track of the best checkpoints during the training and by training against them. Using these features, we release Polygames, our framework for Zero learning, with its library of games and its checkpoints. We won against strong humans at the game of Hex in 19x19, which was often said to be intractable for zero learning; and in Havannah. We also won several first places at the TAAI competitions.
Tasks Board Games
Published 2020-01-27
URL https://arxiv.org/abs/2001.09832v1
PDF https://arxiv.org/pdf/2001.09832v1.pdf
PWC https://paperswithcode.com/paper/polygames-improved-zero-learning
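
The abstract above states that a fully convolutional structure plus global pooling yields bots independent of the board size. The sketch below illustrates why, under simplifying assumptions: a single-channel board, one 3x3 filter with made-up weights, and a scalar output obtained by global average pooling. It is not the Polygames architecture, only a demonstration that the same weights apply to a 5x5 and a 19x19 board.

```python
from typing import List

Board = List[List[float]]


def conv3x3_same(board: Board, kernel: Board) -> Board:
    """3x3 cross-correlation with zero padding; the output has the input's shape."""
    h, w = len(board), len(board[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + 1][dj + 1] * board[y][x]
            out[i][j] = acc
    return out


def global_avg_pool(feature_map: Board) -> float:
    """Collapse a feature map of any size to a single number."""
    cells = [v for row in feature_map for v in row]
    return sum(cells) / len(cells)


def value_head(board: Board, kernel: Board) -> float:
    """Same weights for any board size: convolution followed by global pooling."""
    return global_avg_pool(conv3x3_same(board, kernel))


kernel = [[0.0, 0.1, 0.0], [0.1, 0.5, 0.1], [0.0, 0.1, 0.0]]  # made-up weights
small = [[1.0] * 5 for _ in range(5)]
large = [[1.0] * 19 for _ in range(19)]
v_small = value_head(small, kernel)
v_large = value_head(large, kernel)
```

The convolution output keeps the board's shape, so it could serve as a per-cell policy map, while the pooled value has a fixed size regardless of the board. A fully connected layer would instead tie the weight count to one specific board size.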

Certified and fast computations with shallow covariance kernels

Title Certified and fast computations with shallow covariance kernels
Authors Daniel Kressner, Jonas Latz, Stefano Massei, Elisabeth Ullmann
Abstract Many techniques for data science and uncertainty quantification demand efficient tools to handle Gaussian random fields, which are defined in terms of their mean functions and covariance operators. Recently, parameterized Gaussian random fields have gained increased attention, due to their higher degree of flexibility. However, especially if the random field is parameterized through its covariance operator, classical random field discretization techniques fail or become inefficient. In this work we introduce and analyze a new and certified algorithm for the low-rank approximation of a parameterized family of covariance operators which represents an extension of the adaptive cross approximation method for symmetric positive definite matrices. The algorithm relies on an affine linear expansion of the covariance operator with respect to the parameters, which needs to be computed in a preprocessing step using, e.g., the empirical interpolation method. We discuss and test our new approach for isotropic covariance kernels, such as Matérn kernels. The numerical results demonstrate the advantages of our approach in terms of computational time and confirm that the proposed algorithm provides the basis of a fast sampling procedure for parameter dependent Gaussian random fields.
Published 2020-01-24
URL https://arxiv.org/abs/2001.09187v3
PDF https://arxiv.org/pdf/2001.09187v3.pdf
PWC https://paperswithcode.com/paper/certified-and-fast-computations-with-shallow
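
The abstract above builds on the adaptive cross approximation (ACA) method for symmetric positive definite matrices. The sketch below shows only that single-matrix building block, which with diagonal pivoting coincides with pivoted Cholesky; it is not the paper's parameterized algorithm. The stopping rule uses the trace of the residual, which bounds the residual's spectral norm for a positive semidefinite residual. The Matérn-3/2 kernel, the grid, the length scale and the tolerance are illustrative assumptions.

```python
import math
from typing import Callable, List


def aca_spd(entry: Callable[[int, int], float], n: int, tol: float, max_rank: int) -> List[List[float]]:
    """ACA with diagonal pivoting (pivoted Cholesky) for an SPD matrix.

    Returns vectors l_1..l_k such that A is approximated by sum_k l_k l_k^T.
    Stops once the trace of the residual drops to tol or max_rank is reached.
    """
    diag = [entry(i, i) for i in range(n)]  # diagonal of the current residual
    cols: List[List[float]] = []
    while len(cols) < max_rank and sum(diag) > tol:
        p = max(range(n), key=lambda i: diag[i])  # pivot: largest residual diagonal
        if diag[p] <= 0.0:
            break
        # Column p of the residual: A[:, p] minus the terms already captured.
        col = [entry(i, p) - sum(l[i] * l[p] for l in cols) for i in range(n)]
        scale = math.sqrt(diag[p])
        new = [c / scale for c in col]
        cols.append(new)
        diag = [max(d - v * v, 0.0) for d, v in zip(diag, new)]
    return cols


n, length_scale, tol = 40, 0.5, 1e-2  # assumed problem size and tolerance
points = [i / (n - 1) for i in range(n)]


def kernel(i: int, j: int) -> float:
    """Matern-3/2 covariance between grid points i and j."""
    r = math.sqrt(3.0) * abs(points[i] - points[j]) / length_scale
    return (1.0 + r) * math.exp(-r)


cols = aca_spd(kernel, n, tol, max_rank=n)


def approx(i: int, j: int) -> float:
    """Entry (i, j) of the low-rank approximation."""
    return sum(l[i] * l[j] for l in cols)


max_err = max(abs(kernel(i, j) - approx(i, j)) for i in range(n) for j in range(n))
```

Because the residual stays positive semidefinite, every entry of the error is bounded by the residual trace, so `max_err` should not exceed `tol` up to rounding. This is the sense in which such an approximation can be called certified.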

A Simple General Approach to Balance Task Difficulty in Multi-Task Learning

Title A Simple General Approach to Balance Task Difficulty in Multi-Task Learning
Authors Sicong Liang, Yu Zhang
Abstract In multi-task learning, the difficulty levels of different tasks vary. There are many works to handle this situation and we classify them into five categories, including the direct sum approach, the weighted sum approach, the maximum approach, the curriculum learning approach, and the multi-objective optimization approach. Those approaches have their own limitations, for example, using manually designed rules to update task weights, non-smooth objective function, and failing to incorporate other functions than training losses. In this paper, to alleviate those limitations, we propose a Balanced Multi-Task Learning (BMTL) framework. Different from existing studies which rely on task weighting, the BMTL framework proposes to transform the training loss of each task to balance difficulty levels among tasks based on an intuitive idea that tasks with larger training losses will receive more attention during the optimization procedure. We analyze the transformation function and derive necessary conditions. The proposed BMTL framework is very simple and it can be combined with most multi-task learning models. Empirical studies show the state-of-the-art performance of the proposed BMTL framework.
Tasks Multi-Task Learning
Published 2020-02-12
URL https://arxiv.org/abs/2002.04792v1
PDF https://arxiv.org/pdf/2002.04792v1.pdf
PWC https://paperswithcode.com/paper/a-simple-general-approach-to-balance-task
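
The abstract above describes transforming each task's training loss so that tasks with larger losses receive more attention. A minimal sketch of that idea follows. The abstract does not name the transformation, so the exponential form h(l) = exp(l / T) and the temperature T are assumptions made here for illustration. By the chain rule, the gradient of the summed objective weights each task's gradient by h'(l), which grows with the loss when h is increasing and convex.

```python
import math
from typing import List


def balanced_objective(task_losses: List[float], temperature: float = 1.0) -> float:
    """Sum of transformed losses h(l) = exp(l / T); h is an assumed choice."""
    return sum(math.exp(l / temperature) for l in task_losses)


def implied_task_weights(task_losses: List[float], temperature: float = 1.0) -> List[float]:
    """Normalized h'(l): the share of the total gradient each task's loss receives."""
    raw = [math.exp(l / temperature) / temperature for l in task_losses]
    total = sum(raw)
    return [r / total for r in raw]


losses = [0.2, 1.0, 2.5]  # made-up per-task training losses
weights = implied_task_weights(losses)
objective = balanced_objective(losses)
```

With these made-up losses the weights increase with the loss, so the hardest task dominates the update. No weights are set by hand: they follow from the transformation and change as the losses change.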

Discovering Symmetry Invariants and Conserved Quantities by Interpreting Siamese Neural Networks

Title Discovering Symmetry Invariants and Conserved Quantities by Interpreting Siamese Neural Networks
Authors Sebastian J. Wetzel, Roger G. Melko, Joseph Scott, Maysum Panju, Vijay Ganesh
Abstract In this paper, we introduce interpretable Siamese Neural Networks (SNN) for similarity detection to the field of theoretical physics. More precisely, we apply SNNs to events in special relativity, the transformation of electromagnetic fields, and the motion of particles in a central potential. In these examples, the SNNs learn to identify datapoints belonging to the same events, field configurations, or trajectory of motion. It turns out that in the process of learning which datapoints belong to the same event or field configuration, these SNNs also learn the relevant symmetry invariants and conserved quantities. These SNNs are highly interpretable, which enables us to reveal the symmetry invariants and conserved quantities without prior knowledge.
Published 2020-03-09
URL https://arxiv.org/abs/2003.04299v2
PDF https://arxiv.org/pdf/2003.04299v2.pdf
PWC https://paperswithcode.com/paper/discovering-symmetry-invariants-and-conserved

Deep combinatorial optimisation for optimal stopping time problems and stochastic impulse control. Application to swing options pricing and fixed transaction costs options hedging

Title Deep combinatorial optimisation for optimal stopping time problems and stochastic impulse control. Application to swing options pricing and fixed transaction costs options hedging
Authors Thomas Deschatre, Joseph Mikael
Abstract A new method for stochastic control based on neural networks and using randomisation of discrete random variables is proposed and applied to optimal stopping time problems. Numerical tests are done on the pricing of American and swing options. An extension to impulse control problems is described and applied to options hedging under fixed transaction costs. The proposed algorithms seem to be competitive with the best existing algorithms both in terms of precision and in terms of computation time.
Published 2020-01-30
URL https://arxiv.org/abs/2001.11247v1
PDF https://arxiv.org/pdf/2001.11247v1.pdf
PWC https://paperswithcode.com/paper/deep-combinatorial-optimisation-for-optimal

Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework

Title Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework
Authors Dinghuai Zhang, Mao Ye, Chengyue Gong, Zhanxing Zhu, Qiang Liu
Abstract Randomized classifiers have been shown to provide a promising approach for achieving certified robustness against adversarial attacks in deep learning. However, most existing methods only leverage Gaussian smoothing noise and only work for $\ell_2$ perturbation. We propose a general framework of adversarial certification with non-Gaussian noise and for more general types of attacks, from a unified functional optimization perspective. Our new framework allows us to identify a key trade-off between accuracy and robustness via designing smoothing distributions, helping to design new families of non-Gaussian smoothing distributions that work more efficiently for different $\ell_p$ settings, including $\ell_1$, $\ell_2$ and $\ell_\infty$ attacks. Our proposed methods achieve better certification results than previous works and provide a new perspective on randomized smoothing certification.
Published 2020-02-21
URL https://arxiv.org/abs/2002.09169v1
PDF https://arxiv.org/pdf/2002.09169v1.pdf
PWC https://paperswithcode.com/paper/black-box-certification-with-randomized
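
The abstract above notes that most existing methods use Gaussian smoothing noise and certify against $\ell_2$ perturbations. The sketch below illustrates that Gaussian baseline, not the paper's non-Gaussian framework: a Monte Carlo majority vote under input noise, and the well-known radius $\sigma \, \Phi^{-1}(p)$ for a top-class probability $p > 0.5$. The toy classifier, the input and the sample count are hypothetical. The sketch plugs in the raw vote frequency, whereas a rigorous certificate would use a statistical lower confidence bound on $p$.

```python
import random
from collections import Counter
from statistics import NormalDist
from typing import Callable, List, Tuple


def smoothed_predict(
    base_classifier: Callable[[List[float]], int],
    x: List[float],
    sigma: float,
    n_samples: int,
    rng: random.Random,
) -> Tuple[int, float]:
    """Majority vote of the base classifier under Gaussian input noise.

    Returns the most frequent class and its empirical vote frequency.
    """
    votes = Counter(
        base_classifier([xi + rng.gauss(0.0, sigma) for xi in x])
        for _ in range(n_samples)
    )
    top_class, count = votes.most_common(1)[0]
    return top_class, count / n_samples


def l2_radius(p_top: float, sigma: float) -> float:
    """Gaussian-smoothing l2 radius sigma * Phi^{-1}(p_top); zero if p_top <= 0.5."""
    if p_top <= 0.5:
        return 0.0
    p_top = min(p_top, 1.0 - 1e-9)  # keep inv_cdf finite when every vote agrees
    return sigma * NormalDist().inv_cdf(p_top)


def toy_classifier(x: List[float]) -> int:
    """Hypothetical base classifier: sign of the first coordinate."""
    return 1 if x[0] > 0.0 else 0


rng = random.Random(0)
label, p_top = smoothed_predict(toy_classifier, [1.0, 0.0], sigma=0.5, n_samples=2000, rng=rng)
radius = l2_radius(p_top, sigma=0.5)
```

For this toy classifier the decision boundary lies at distance 1 from the input, and the estimated radius should come out close to that value. Swapping the Gaussian for another noise distribution changes both the sampling step and the radius formula, which is the design space the abstract refers to.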

Prediction of flow characteristics in the bubble column reactor by the artificial pheromone-based communication of biological ants

Title Prediction of flow characteristics in the bubble column reactor by the artificial pheromone-based communication of biological ants
Authors Shahab Shamshirband, Meisam Babanezhad, Amir Mosavi, Narjes Nabipour, Eva Hajnal, Laszlo Nadai, Kwok-Wing Chau
Abstract In order to perceive the behavior presented by the multiphase chemical reactors, the ant colony optimization algorithm was combined with computational fluid dynamics (CFD) data. This intelligent algorithm creates a probabilistic technique for computing flow and it can predict various levels of three-dimensional bubble column reactor (BCR). This artificial ant algorithm is mimicking real ant behavior. This method can anticipate the flow characteristics in the reactor using almost 30% of the whole data in the domain. After discovering the suitable parameters, the method is used for predicting the points not being simulated with CFD, which represent mesh refinement of the ant colony method. In addition, it is possible to anticipate the bubble-column reactors in the absence of numerical results or training of exact values of evaluated data. The major benefits include reduced computational costs and time savings. The results show a great agreement between ant colony prediction and CFD outputs in different sections of the BCR. The combination of ant colony system and neural network framework can provide the smart structure to estimate biological and nature physics base phenomena. The ant colony optimization algorithm (ACO) framework based on ant behavior can solve all local mathematical answers throughout 3D bubble column reactor. The integration of all local answers can provide the overall solution in the reactor for different characteristics. This new overview of modelling can illustrate new insight into biological behavior in nature.
Published 2020-01-09
URL https://arxiv.org/abs/2001.04276v1
PDF https://arxiv.org/pdf/2001.04276v1.pdf
PWC https://paperswithcode.com/paper/prediction-of-flow-characteristics-in-the

Learning Robust and Multilingual Speech Representations

Title Learning Robust and Multilingual Speech Representations
Authors Kazuya Kawakami, Luyu Wang, Chris Dyer, Phil Blunsom, Aaron van den Oord
Abstract Unsupervised speech representation learning has shown remarkable success at finding representations that correlate with phonetic structures and improve downstream speech recognition performance. However, most research has been focused on evaluating the representations in terms of their ability to improve the performance of speech recognition systems on read English (e.g. Wall Street Journal and LibriSpeech). This evaluation methodology overlooks two important desiderata that speech representations should have: robustness to domain shifts and transferability to other languages. In this paper we learn representations from up to 8000 hours of diverse and noisy speech data and evaluate the representations by looking at their robustness to domain shifts and their ability to improve recognition performance in many languages. We find that our representations confer significant robustness advantages to the resulting recognition systems: we see significant improvements in out-of-domain transfer relative to baseline feature sets and the features likewise provide improvements in 25 phonetically diverse languages including tonal languages and low-resource languages.
Tasks Representation Learning, Speech Recognition
Published 2020-01-29
URL https://arxiv.org/abs/2001.11128v1
PDF https://arxiv.org/pdf/2001.11128v1.pdf
PWC https://paperswithcode.com/paper/learning-robust-and-multilingual-speech

Deeper Task-Specificity Improves Joint Entity and Relation Extraction

Title Deeper Task-Specificity Improves Joint Entity and Relation Extraction
Authors Phil Crone
Abstract Multi-task learning (MTL) is an effective method for learning related tasks, but designing MTL models necessitates deciding which and how many parameters should be task-specific, as opposed to shared between tasks. We investigate this issue for the problem of jointly learning named entity recognition (NER) and relation extraction (RE) and propose a novel neural architecture that allows for deeper task-specificity than does prior work. In particular, we introduce additional task-specific bidirectional RNN layers for both the NER and RE tasks and tune the number of shared and task-specific layers separately for different datasets. We achieve state-of-the-art (SOTA) results for both tasks on the ADE dataset; on the CoNLL04 dataset, we achieve SOTA results on the NER task and competitive results on the RE task while using an order of magnitude fewer trainable parameters than the current SOTA architecture. An ablation study confirms the importance of the additional task-specific layers for achieving these results. Our work suggests that previous solutions to joint NER and RE undervalue task-specificity and demonstrates the importance of correctly balancing the number of shared and task-specific parameters for MTL approaches in general.
Tasks Joint Entity and Relation Extraction, Multi-Task Learning, Named Entity Recognition, Relation Extraction
Published 2020-02-15
URL https://arxiv.org/abs/2002.06424v1
PDF https://arxiv.org/pdf/2002.06424v1.pdf
PWC https://paperswithcode.com/paper/deeper-task-specificity-improves-joint-entity

Application of Pre-training Models in Named Entity Recognition

Title Application of Pre-training Models in Named Entity Recognition
Authors Yu Wang, Yining Sun, Zuchang Ma, Lisheng Gao, Yang Xu, Ting Sun
Abstract Named Entity Recognition (NER) is a fundamental Natural Language Processing (NLP) task to extract entities from unstructured data. The previous methods for NER were based on machine learning or deep learning. Recently, pre-training models have significantly improved performance on multiple NLP tasks. In this paper, firstly, we introduce the architecture and pre-training tasks of four common pre-training models: BERT, ERNIE, ERNIE2.0-tiny, and RoBERTa. Then, we apply these pre-training models to a NER task by fine-tuning, and compare the effects of the different model architecture and pre-training tasks on the NER task. The experiment results showed that RoBERTa achieved state-of-the-art results on the MSRA-2006 dataset.
Tasks Named Entity Recognition
Published 2020-02-09
URL https://arxiv.org/abs/2002.08902v1
PDF https://arxiv.org/pdf/2002.08902v1.pdf
PWC https://paperswithcode.com/paper/application-of-pre-training-models-in-named