February 1, 2020

3146 words 15 mins read

Paper Group AWR 131

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Title End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks
Authors Richard Cheng, Gabor Orosz, Richard M. Murray, Joel W. Burdick
Abstract Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) on-line learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
Tasks Continuous Control, Gaussian Processes
Published 2019-03-21
URL http://arxiv.org/abs/1903.08792v1
PDF http://arxiv.org/pdf/1903.08792v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-safe-reinforcement-learning
Repo https://github.com/rcheng805/RL-CBF
Framework tf
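
The safety layer in RL-CBF amounts, in its simplest form, to projecting the RL action onto the set permitted by the barrier constraint. Below is a minimal 1-D sketch in which a toy integrator and a hand-picked barrier function stand in for the paper's GP-modeled dynamics; it illustrates the CBF projection idea, not the paper's full algorithm.

```python
import numpy as np

def cbf_safe_action(x, u_rl, alpha=1.0):
    """Project an RL action onto the safe set of the barrier function
    h(x) = 1 - x^2 (keeps |x| <= 1) for toy dynamics x_dot = u.
    Safety constraint: dh/dx * u + alpha * h(x) >= 0."""
    a = -2.0 * x                # dh/dx
    b = alpha * (1.0 - x ** 2)  # alpha * h(x)
    if a > 0:
        return max(u_rl, -b / a)   # constraint is a lower bound on u
    if a < 0:
        return min(u_rl, -b / a)   # constraint is an upper bound on u
    return u_rl                    # constraint inactive at x = 0

# the RL policy recklessly pushes right; the filter keeps |x| <= 1
x, dt = 0.0, 0.05
for _ in range(100):
    x += dt * cbf_safe_action(x, u_rl=2.0)
print(f"final state: {x:.3f}")  # approaches 1 but never leaves the safe set
```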

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Title GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Authors Drew A. Hudson, Christopher D. Manning
Abstract We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have developed a strong and robust question engine that leverages scene graph structures to create 22M diverse reasoning questions, each paired with a functional program that represents its semantics. We use the programs to gain tight control over the answer distribution and present a new tunable smoothing technique to mitigate question biases. Accompanying the dataset is a suite of new metrics that evaluate essential qualities such as consistency, grounding and plausibility. An extensive analysis is performed for baselines as well as state-of-the-art models, providing fine-grained results for different question types and topologies. Whereas a blind LSTM obtains a mere 42.1% and strong VQA models achieve 54.1%, human performance tops out at 89.3%, offering ample opportunity for new research to explore. We strongly hope GQA will provide an enabling resource for the next generation of models with enhanced robustness, improved consistency, and deeper semantic understanding of images and language.
Tasks Question Answering, Visual Question Answering, Visual Reasoning
Published 2019-02-25
URL https://arxiv.org/abs/1902.09506v3
PDF https://arxiv.org/pdf/1902.09506v3.pdf
PWC https://paperswithcode.com/paper/gqa-a-new-dataset-for-compositional-question
Repo https://github.com/stanfordnlp/mac-network.git
Framework none
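
The headline numbers above are accuracies, which GQA reports per question type. A minimal sketch of that per-type breakdown; the record format here is invented for illustration, and the official evaluation additionally computes consistency, grounding and plausibility.

```python
from collections import defaultdict

def accuracy_by_type(records):
    """records: iterable of (question_type, predicted, gold) triples.
    Returns accuracy per question type, mirroring GQA's fine-grained
    reporting. Plain accuracy only; the official metrics suite also
    scores consistency, grounding and plausibility."""
    hits, totals = defaultdict(int), defaultdict(int)
    for qtype, pred, gold in records:
        totals[qtype] += 1
        hits[qtype] += int(pred.strip().lower() == gold.strip().lower())
    return {t: hits[t] / totals[t] for t in totals}

preds = [("verify", "yes", "yes"), ("query", "dog", "cat"),
         ("verify", "no", "no")]
print(accuracy_by_type(preds))   # {'verify': 1.0, 'query': 0.0}
```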

A Deep Learning based Pipeline for Efficient Oral Cancer Screening on Whole Slide Images

Title A Deep Learning based Pipeline for Efficient Oral Cancer Screening on Whole Slide Images
Authors Jiahao Lu, Nataša Sladoje, Christina Runow Stark, Eva Darai Ramqvist, Jan-Michaél Hirsch, Joakim Lindblad
Abstract Oral cancer incidence is rapidly increasing worldwide. The most important determinant of cancer survival is early diagnosis. To facilitate large-scale screening, we propose a fully automated end-to-end pipeline for oral cancer screening on whole slide cytology images. The pipeline consists of regression-based nucleus detection, followed by per-cell focus selection, and CNN-based classification. We demonstrate that the pipeline provides fast and efficient cancer classification of whole slide cytology images, improving over previous results. The complete source code is made available as open source (https://github.com/MIDA-group/OralScreen).
Tasks Oral Cancer Classification
Published 2019-10-23
URL https://arxiv.org/abs/1910.10549v1
PDF https://arxiv.org/pdf/1910.10549v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-based-pipeline-for-efficient
Repo https://github.com/MIDA-group/OralScreen
Framework none
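
The pipeline's three stages are nucleus detection, per-cell focus selection, and CNN classification. The skeleton below wires toy stand-ins for each stage together on random data; the detector, focus measure and classifier are placeholders, not the paper's models.

```python
import numpy as np

def detect_nuclei(image, k=10):
    """Stand-in for the paper's regression-based detector: treat pixel
    intensity as a predicted nucleus-density map and take the k peaks."""
    peaks = np.argsort(image.ravel())[-k:]
    return [divmod(int(i), image.shape[1]) for i in peaks]

def best_focus_patch(stack, y, x, size=32):
    """Pick the sharpest focal plane for one nucleus; gradient energy is
    a common focus measure (the paper's criterion differs)."""
    def sharpness(p):
        gy, gx = np.gradient(p.astype(float))
        return (gy ** 2 + gx ** 2).mean()
    h = size // 2
    patches = [plane[max(0, y - h):y + h, max(0, x - h):x + h]
               for plane in stack]
    return max(patches, key=sharpness)

def classify(patch):
    """Placeholder for the CNN classifier: returns a malignancy score."""
    return float(patch.mean() > 0.5)

# end-to-end run on random data standing in for a whole-slide z-stack
stack = np.random.rand(5, 256, 256)          # 5 focal planes
coords = detect_nuclei(stack[2])
scores = [classify(best_focus_patch(stack, y, x)) for y, x in coords]
print(f"{len(coords)} nuclei, mean score {np.mean(scores):.2f}")
```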

Enhancing Adversarial Example Transferability with an Intermediate Level Attack

Title Enhancing Adversarial Example Transferability with an Intermediate Level Attack
Authors Qian Huang, Isay Katsman, Horace He, Zeqi Gu, Serge Belongie, Ser-Nam Lim
Abstract Neural networks are vulnerable to adversarial examples, malicious inputs crafted to fool trained models. Adversarial examples often exhibit black-box transfer, meaning that adversarial examples for one model can fool another model. However, adversarial examples are typically overfit to exploit the particular architecture and feature representation of a source model, resulting in sub-optimal black-box transfer attacks to other target models. We introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model, improving upon state-of-the-art methods. We show that we can select a layer of the source model to perturb without any knowledge of the target models while achieving high transferability. Additionally, we provide some explanatory insights regarding our method and the effect of optimizing for adversarial examples using intermediate feature maps. Our code is available at https://github.com/CUVL/Intermediate-Level-Attack.
Tasks
Published 2019-07-23
URL https://arxiv.org/abs/1907.10823v3
PDF https://arxiv.org/pdf/1907.10823v3.pdf
PWC https://paperswithcode.com/paper/enhancing-adversarial-example-transferability
Repo https://github.com/CUVL/Intermediate-Level-Attack
Framework pytorch
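
ILA refines an already-found adversarial example by pushing the feature perturbation at a chosen intermediate layer further along the direction that example established. A simplified PyTorch sketch of that loop; the model, layer choice and step sizes are placeholders, and the paper studies several loss variants.

```python
import torch

def ila_refine(model, layer, x, x_adv, eps=8/255, steps=10, lr=1/255):
    """Refine x_adv by maximizing the projection of the mid-layer
    feature perturbation onto the direction x_adv already induced."""
    feats = {}
    hook = layer.register_forward_hook(
        lambda m, inp, out: feats.__setitem__("z", out))
    with torch.no_grad():
        model(x)
        z_clean = feats["z"].detach()
        model(x_adv)
        direction = (feats["z"] - z_clean).flatten(1).detach()
    x_new = x_adv.clone()
    for _ in range(steps):
        x_new.requires_grad_(True)
        model(x_new)
        delta = (feats["z"] - z_clean).flatten(1)
        (delta * direction).sum().backward()          # projection objective
        with torch.no_grad():
            x_new = x_new + lr * x_new.grad.sign()    # gradient ascent step
            x_new = x + (x_new - x).clamp(-eps, eps)  # stay in the eps-ball
            x_new = x_new.clamp(0, 1)
    hook.remove()
    return x_new.detach()

# e.g. for a torchvision ResNet-18: ila_refine(model, model.layer2, x, x_adv)
```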

Retrieving Similar E-Commerce Images Using Deep Learning

Title Retrieving Similar E-Commerce Images Using Deep Learning
Authors Rishab Sharma, Anirudha Vishvakarma
Abstract In this paper, we propose a deep convolutional neural network for learning image embeddings that capture the notion of visual similarity. We present a deep siamese architecture that, when trained on positive and negative pairs of images, learns an embedding that accurately approximates the ranking of images by visual similarity. We also implement a novel loss calculation method using an angular loss metric tailored to the problem's requirements. The final embedding of an image is a combined representation of its lower- and top-level embeddings. We use a fractional distance matrix to calculate distances between the learned embeddings in n-dimensional space. Finally, we compare our architecture with other existing deep architectures and demonstrate the superiority of our solution in terms of image retrieval on four datasets. We also show how our suggested network outperforms traditional deep CNNs at capturing fine-grained image similarities by learning an optimum embedding.
Tasks Fine-Grained Visual Recognition, Image Retrieval, Product Recommendation, Recommendation Systems
Published 2019-01-11
URL http://arxiv.org/abs/1901.03546v1
PDF http://arxiv.org/pdf/1901.03546v1.pdf
PWC https://paperswithcode.com/paper/retrieving-similar-e-commerce-images-using
Repo https://github.com/gofynd/mildnet
Framework tf
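
The fractional distance matrix mentioned in the abstract is a pairwise Minkowski distance with order p < 1, which tends to be more discriminative than L2 in high-dimensional spaces. A small numpy sketch with illustrative dimensions:

```python
import numpy as np

def fractional_distance_matrix(a, b, p=0.5):
    """Pairwise Minkowski distance with fractional order p < 1.
    a: (n, d), b: (m, d) -> (n, m) matrix of distances."""
    diff = np.abs(a[:, None, :] - b[None, :, :])   # (n, m, d)
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

emb_queries = np.random.rand(4, 128)   # stand-ins for learned embeddings
emb_catalog = np.random.rand(10, 128)
d = fractional_distance_matrix(emb_queries, emb_catalog)
print(d.shape, d[0].argmin())          # nearest catalog item for query 0
```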

musicnn: Pre-trained convolutional neural networks for music audio tagging

Title musicnn: Pre-trained convolutional neural networks for music audio tagging
Authors Jordi Pons, Xavier Serra
Abstract Pronounced as “musician”, the musicnn library contains a set of pre-trained musically motivated convolutional neural networks for music audio tagging: https://github.com/jordipons/musicnn. This repository also includes some pre-trained vgg-like baselines. These models can be used as out-of-the-box music audio taggers, as music feature extractors, or as pre-trained models for transfer learning. We also provide the code to train the aforementioned models: https://github.com/jordipons/musicnn-training. This framework also allows implementing novel models. For example, a musically motivated convolutional neural network with an attention-based output layer (instead of the temporal pooling layer) can achieve state-of-the-art results for music audio tagging: 90.77 ROC-AUC / 38.61 PR-AUC on the MagnaTagATune dataset — and 88.81 ROC-AUC / 31.51 PR-AUC on the Million Song Dataset.
Tasks Audio Tagging, Transfer Learning
Published 2019-09-14
URL https://arxiv.org/abs/1909.06654v1
PDF https://arxiv.org/pdf/1909.06654v1.pdf
PWC https://paperswithcode.com/paper/musicnn-pre-trained-convolutional-neural
Repo https://github.com/jordipons/musicnn-training
Framework tf
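
Typical usage, per the repository README at the time of writing (the audio path is a placeholder and the API may have changed since):

```python
# Tag a music clip with the pre-trained MagnaTagATune model.
from musicnn.tagger import top_tags

tags = top_tags('audio.mp3', model='MTT_musicnn', topN=5)
print(tags)   # e.g. ['guitar', 'classical', 'slow', ...]
```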

Residual Non-local Attention Networks for Image Restoration

Title Residual Non-local Attention Networks for Image Restoration
Authors Yulun Zhang, Kunpeng Li, Kai Li, Bineng Zhong, Yun Fu
Abstract In this paper, we propose a residual non-local attention network for high-quality image restoration. Previous methods ignore the uneven distribution of information in corrupted images and are restricted by local convolutional operations and the equal treatment of spatial- and channel-wise features. To address this issue, we design local and non-local attention blocks to extract features that capture the long-range dependencies between pixels and pay more attention to the challenging parts. Specifically, we design a trunk branch and a (non-)local mask branch in each (non-)local attention block. The trunk branch is used to extract hierarchical features. Local and non-local mask branches aim to adaptively rescale these hierarchical features with mixed attentions. The local mask branch concentrates on more local structures with convolutional operations, while non-local attention attends more to long-range dependencies across the whole feature map. Furthermore, we propose residual local and non-local attention learning to train the very deep network, which further enhances the representation ability of the network. Our proposed method can be generalized to various image restoration applications, such as image denoising, demosaicing, compression artifact reduction, and super-resolution. Experiments demonstrate that our method obtains comparable or better results than recent leading methods, both quantitatively and visually.
Tasks Demosaicking, Denoising, Image Denoising, Image Restoration, Super-Resolution
Published 2019-03-24
URL http://arxiv.org/abs/1903.10082v1
PDF http://arxiv.org/pdf/1903.10082v1.pdf
PWC https://paperswithcode.com/paper/residual-non-local-attention-networks-for-1
Repo https://github.com/bruinxiong/RNAN
Framework pytorch
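
The non-local mask branch builds on the standard non-local operation, which lets every spatial position attend to every other. A simplified PyTorch sketch of such a block; RNAN additionally combines it with trunk and local-mask branches and residual learning.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Generic non-local (self-attention) block over spatial positions."""
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 2, 1)
        self.phi   = nn.Conv2d(channels, channels // 2, 1)
        self.g     = nn.Conv2d(channels, channels // 2, 1)
        self.out   = nn.Conv2d(channels // 2, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (n, hw, c/2)
        k = self.phi(x).flatten(2)                     # (n, c/2, hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (n, hw, c/2)
        attn = torch.softmax(q @ k, dim=-1)            # pairwise affinity
        y = (attn @ v).transpose(1, 2).reshape(n, c // 2, h, w)
        return x + self.out(y)                         # residual connection

x = torch.randn(1, 64, 16, 16)
print(NonLocalBlock(64)(x).shape)   # torch.Size([1, 64, 16, 16])
```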

Fooling Detection Alone is Not Enough: First Adversarial Attack against Multiple Object Tracking

Title Fooling Detection Alone is Not Enough: First Adversarial Attack against Multiple Object Tracking
Authors Yunhan Jia, Yantao Lu, Junjie Shen, Qi Alfred Chen, Zhenyu Zhong, Tao Wei
Abstract Recent work in adversarial machine learning has started to focus on visual perception in autonomous driving and has studied Adversarial Examples (AEs) for object detection models. However, in such a visual perception pipeline the detected objects must also be tracked, in a process called Multiple Object Tracking (MOT), to build the moving trajectories of surrounding obstacles. Since MOT is designed to be robust against errors in object detection, it poses a general challenge to existing attack techniques that blindly target object detection: we find that a success rate of over 98% is needed for them to actually affect the tracking results, a requirement that no existing attack technique can satisfy. In this paper, we are the first to study adversarial machine learning attacks against the complete visual perception pipeline in autonomous driving, and we discover a novel attack technique, tracker hijacking, that can effectively fool MOT using AEs on object detection. Using our technique, successful AEs on as few as one frame can move an existing object into or out of the headway of an autonomous vehicle and cause potential safety hazards. We perform an evaluation using the Berkeley DeepDrive dataset and find that, on average, when 3 frames are attacked, our attack achieves a nearly 100% success rate, while attacks that blindly target object detection reach at most 25%.
Tasks Adversarial Attack, Autonomous Driving, Multiple Object Tracking, Object Detection, Object Tracking
Published 2019-05-27
URL https://arxiv.org/abs/1905.11026v2
PDF https://arxiv.org/pdf/1905.11026v2.pdf
PWC https://paperswithcode.com/paper/fooling-detection-alone-is-not-enough-first
Repo https://github.com/anonymousjack/hijacking
Framework none
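
The robustness the attack must overcome comes from how MOT keeps a track alive for several frames without a matching detection. The toy IoU tracker below illustrates why fooling the detector in a single frame does not change the tracking result; thresholds are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

class Track:
    def __init__(self, box):
        self.box, self.missed = box, 0

def update(tracks, detections, iou_thresh=0.3, max_missed=5):
    """Toy MOT step: greedy IoU association; tracks survive up to
    max_missed frames without a detection -- the robustness that forces
    detection-only attacks to succeed in nearly every frame."""
    for t in tracks:
        match = max(detections, key=lambda d: iou(t.box, d), default=None)
        if match is not None and iou(t.box, match) > iou_thresh:
            t.box, t.missed = match, 0
            detections.remove(match)
        else:
            t.missed += 1
    tracks[:] = [t for t in tracks if t.missed <= max_missed]
    tracks.extend(Track(d) for d in detections)

tracks = [Track([10, 10, 50, 50])]
update(tracks, [])                    # detector fooled this frame...
print(len(tracks), tracks[0].missed)  # ...but the track survives: 1 1
```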

Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base

Title Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base
Authors Tao Shen, Xiubo Geng, Tao Qin, Daya Guo, Duyu Tang, Nan Duan, Guodong Long, Daxin Jiang
Abstract We consider the problem of conversational question answering over a large-scale knowledge base. To handle the huge entity vocabulary of a large-scale knowledge base, recent neural semantic parsing based approaches usually decompose the task into several subtasks and solve them sequentially, which leads to the following issues: 1) errors in earlier subtasks propagate and negatively affect downstream ones; and 2) each subtask cannot naturally share supervision signals with the others. To tackle these issues, we propose an innovative multi-task learning framework in which a pointer-equipped semantic parsing model resolves coreference in conversations and naturally empowers joint learning with a novel type-aware entity detection model. The proposed framework thus enables shared supervision and alleviates the effect of error propagation. Experiments on a large-scale conversational question answering dataset containing 1.6M question-answer pairs over 12.8M entities show that the proposed framework improves the overall F1 score from 67% to 79% compared with previous state-of-the-art work.
Tasks Multi-Task Learning, Question Answering, Semantic Parsing
Published 2019-10-11
URL https://arxiv.org/abs/1910.05069v1
PDF https://arxiv.org/pdf/1910.05069v1.pdf
PWC https://paperswithcode.com/paper/multi-task-learning-for-conversational
Repo https://github.com/taoshen58/MaSP
Framework tf
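
The framework's central move is a shared encoder that both the semantic-parsing head and the entity-detection head backpropagate through, so the subtasks share supervision instead of running as a pipeline. A toy PyTorch sketch of that structure; dimensions and heads are illustrative, not the MaSP architecture.

```python
import torch
import torch.nn as nn

class JointModel(nn.Module):
    """One encoder feeds a parsing head and an entity head; both losses
    update the shared encoder, so supervision is shared across subtasks."""
    def __init__(self, vocab=1000, dim=64, n_actions=50, n_types=20):
        super().__init__()
        self.encoder = nn.Sequential(nn.Embedding(vocab, dim),
                                     nn.LSTM(dim, dim, batch_first=True))
        self.parse_head  = nn.Linear(dim, n_actions)
        self.entity_head = nn.Linear(dim, n_types)

    def forward(self, tokens):
        h, _ = self.encoder(tokens)          # (batch, seq, dim)
        return self.parse_head(h), self.entity_head(h)

model = JointModel()
tokens = torch.randint(0, 1000, (2, 12))
parse_logits, entity_logits = model(tokens)
loss = (nn.functional.cross_entropy(parse_logits.flatten(0, 1),
                                    torch.randint(0, 50, (24,)))
        + nn.functional.cross_entropy(entity_logits.flatten(0, 1),
                                      torch.randint(0, 20, (24,))))
loss.backward()   # gradients from both tasks reach the shared encoder
```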

Learning Digital Circuits: A Journey Through Weight Invariant Self-Pruning Neural Networks

Title Learning Digital Circuits: A Journey Through Weight Invariant Self-Pruning Neural Networks
Authors Amey Agrawal, Rohit Karlupia
Abstract Recently, in the paper “Weight Agnostic Neural Networks”, Gaier & Ha utilized architecture search to find networks whose topology completely encodes the knowledge. However, architecture search in topology space is expensive. We use the existing framework of binarized networks to find performant topologies by constraining the weights to be either zero or one. We show that such topologies achieve performance similar to standard networks while pruning more than 99% of the weights. We further demonstrate that these topologies can perform tasks using constant weights without any explicit tuning. Finally, we discover that in our setup each neuron acts like a NOR gate, virtually learning a digital circuit. We demonstrate the efficacy of our approach on computer vision datasets.
Tasks
Published 2019-08-30
URL https://arxiv.org/abs/1909.00052v2
PDF https://arxiv.org/pdf/1909.00052v2.pdf
PWC https://paperswithcode.com/paper/learning-digital-circuits-a-journey-through
Repo https://github.com/AgrawalAmey/learning-digital-net
Framework none
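
With weights constrained to {0, 1}, a neuron that fires only when every input it attends to is off computes NOR, and NOR is functionally complete, so networks of such neurons can realize any digital circuit. A tiny illustration:

```python
import numpy as np

def nor_neuron(x, w):
    """Neuron with weights in {0, 1} and a step activation: it fires
    only when every selected input (w_i = 1) is off, i.e. it computes
    NOR over the inputs it attends to."""
    return int(np.dot(w, x) == 0)

w = np.array([1, 1, 0])   # neuron "wired" to inputs 0 and 1; input 2 pruned
for x in [(0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 1, 1)]:
    print(x, "->", nor_neuron(np.array(x), w))
# fires only when both selected inputs are 0, regardless of input 2
```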

An Algorithm for Routing Capsules in All Domains

Title An Algorithm for Routing Capsules in All Domains
Authors Franz A. Heinsen
Abstract Building on recent work on capsule networks, we propose a new, general-purpose form of “routing by agreement” that activates output capsules in a layer as a function of their net benefit to use and net cost to ignore input capsules from earlier layers. To illustrate the usefulness of our routing algorithm, we present two capsule networks that apply it in different domains: vision and language. The first network achieves new state-of-the-art accuracy of 99.1% on the smallNORB visual recognition task with fewer parameters and an order of magnitude less training than previous capsule models, and we find evidence that it learns to perform a form of “reverse graphics.” The second network achieves new state-of-the-art accuracies on the root sentences of the Stanford Sentiment Treebank: 58.5% on fine-grained and 95.6% on binary labels with a single-task model that routes frozen embeddings from a pretrained transformer as capsules. In both domains, we train with the same regime. Code is available at https://github.com/glassroom/heinsen_routing along with replication instructions.
Tasks
Published 2019-11-02
URL https://arxiv.org/abs/1911.00792v6
PDF https://arxiv.org/pdf/1911.00792v6.pdf
PWC https://paperswithcode.com/paper/an-algorithm-for-routing-capsules-in-all
Repo https://github.com/glassroom/heinsen_routing
Framework pytorch
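
For orientation, the classic “routing by agreement” update that this work generalizes fits in a few lines. The sketch below is the standard dynamic-routing scheme, not Heinsen's net-benefit/net-cost formulation; see the linked repository for the paper's algorithm.

```python
import torch

def routing_by_agreement(votes, iters=3):
    """Classic dynamic routing: routing weights grow for input capsules
    whose votes agree with the current output capsules.
    votes: (n_in, n_out, d) -> output capsules (n_out, d)."""
    n_in, n_out, d = votes.shape
    logits = torch.zeros(n_in, n_out)
    for _ in range(iters):
        c = torch.softmax(logits, dim=1)                  # routing weights
        out = (c.unsqueeze(-1) * votes).sum(0)            # (n_out, d)
        out = out / (1 + out.norm(dim=-1, keepdim=True))  # squash-like
        logits = logits + (votes * out).sum(-1)           # agreement update
    return out

print(routing_by_agreement(torch.randn(8, 4, 16)).shape)  # torch.Size([4, 16])
```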

PubLayNet: largest dataset ever for document layout analysis

Title PubLayNet: largest dataset ever for document layout analysis
Authors Xu Zhong, Jianbin Tang, Antonio Jimeno Yepes
Abstract Recognizing the layout of unstructured digital documents is an important step when parsing documents into a structured, machine-readable format for downstream applications. Deep neural networks developed for computer vision have proven to be an effective method for analyzing the layout of document images. However, the document layout datasets that are currently publicly available are several orders of magnitude smaller than established computer vision datasets. Models therefore have to be trained by transfer learning from a base model pre-trained on a traditional computer vision dataset. In this paper, we develop the PubLayNet dataset for document layout analysis by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. The size of the dataset is comparable to established computer vision datasets, containing over 360 thousand document images with typical document layout elements annotated. The experiments demonstrate that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles. The pre-trained models are also more effective base models for transfer learning to a different document domain. We release the dataset (https://github.com/ibm-aur-nlp/PubLayNet) to support the development and evaluation of more advanced models for document layout analysis.
Tasks Document Layout Analysis, Transfer Learning
Published 2019-08-16
URL https://arxiv.org/abs/1908.07836v1
PDF https://arxiv.org/pdf/1908.07836v1.pdf
PWC https://paperswithcode.com/paper/190807836
Repo https://github.com/ibm-aur-nlp/PubLayNet
Framework none
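
The released annotations follow the COCO format, so standard tooling applies directly. A minimal exploration sketch; 'train.json' stands in for the downloaded annotation file.

```python
import json
from collections import Counter

# Count annotated layout regions per category in a COCO-style file.
with open("train.json") as f:
    coco = json.load(f)

names = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(names[a["category_id"]] for a in coco["annotations"])
print(f"{len(coco['images'])} pages")
for name, n in counts.most_common():
    print(f"{name}: {n} regions")   # text, title, list, table, figure
```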

Demystifying Inter-Class Disentanglement

Title Demystifying Inter-Class Disentanglement
Authors Aviv Gabbay, Yedid Hoshen
Abstract Learning to disentangle the hidden factors of variation within a set of observations is a key task for artificial intelligence. We present a unified formulation for class and content disentanglement and use it to illustrate the limitations of current methods. We therefore introduce LORD, a novel method based on Latent Optimization for Representation Disentanglement. We find that latent optimization, along with an asymmetric noise regularization, is superior to amortized inference for achieving disentangled representations. In extensive experiments, our method is shown to achieve better disentanglement performance than both adversarial and non-adversarial methods that use the same level of supervision. We further introduce a clustering-based approach for extending our method to settings that exhibit in-class variation, with promising results on the task of domain translation.
Tasks Style Transfer
Published 2019-06-27
URL https://arxiv.org/abs/1906.11796v3
PDF https://arxiv.org/pdf/1906.11796v3.pdf
PWC https://paperswithcode.com/paper/latent-optimization-for-non-adversarial
Repo https://github.com/avivga/lord
Framework tf
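
Latent optimization replaces amortized inference with learnable per-sample codes optimized jointly with the generator, and noise on the content code keeps class information out of it. A toy PyTorch sketch of that structure; dimensions, generator and noise scale are illustrative.

```python
import torch
import torch.nn as nn

class LatentOptimization(nn.Module):
    """Every training image owns a learnable content code and every
    class owns a learnable class code; both are optimized jointly with
    the generator instead of being produced by an encoder."""
    def __init__(self, n_images, n_classes, content_dim=16, class_dim=8):
        super().__init__()
        self.content = nn.Embedding(n_images, content_dim)
        self.klass   = nn.Embedding(n_classes, class_dim)
        self.gen = nn.Sequential(nn.Linear(content_dim + class_dim, 128),
                                 nn.ReLU(), nn.Linear(128, 28 * 28))

    def forward(self, img_ids, class_ids, sigma=0.3):
        c = self.content(img_ids)
        c = c + sigma * torch.randn_like(c)   # noise regularizes content
        return self.gen(torch.cat([c, self.klass(class_ids)], dim=-1))

model = LatentOptimization(n_images=1000, n_classes=10)
recon = model(torch.tensor([0, 1]), torch.tensor([3, 3]))
loss = nn.functional.mse_loss(recon, torch.rand(2, 28 * 28))
loss.backward()   # gradients flow into the per-sample latent codes
```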

Adaptive Learning Rate Clipping Stabilizes Learning

Title Adaptive Learning Rate Clipping Stabilizes Learning
Authors Jeffrey M. Ede, Richard Beanland
Abstract Artificial neural network training with stochastic gradient descent can be destabilized by “bad batches” with high losses. This is often problematic for training with small batch sizes, high-order loss functions or unstably high learning rates. To stabilize learning, we have developed adaptive learning rate clipping (ALRC) to limit backpropagated losses to a number of standard deviations above their running means. ALRC is designed to complement existing learning algorithms: our algorithm is computationally inexpensive, can be applied to any loss function or batch size, is robust to hyperparameter choices and does not affect backpropagated gradient distributions. Experiments with CIFAR-10 supersampling show that ALRC decreases errors for unstable mean quartic error training while stable mean squared error training is unaffected. We also show that ALRC decreases unstable mean squared errors for partial scanning transmission electron micrograph completion. Our source code is publicly available at https://github.com/Jeffrey-Ede/ALRC
Tasks
Published 2019-06-21
URL https://arxiv.org/abs/1906.09060v2
PDF https://arxiv.org/pdf/1906.09060v2.pdf
PWC https://paperswithcode.com/paper/adaptive-learning-rate-clipping-stabilizes
Repo https://github.com/Jeffrey-Ede/ALRC
Framework tf
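
As described, ALRC rescales any loss that exceeds its running mean by more than n running standard deviations, so the gradient direction is preserved while its magnitude is limited. A sketch following that description; the initial statistics and momentum are guesses rather than the paper's values.

```python
import math
import torch

class ALRC:
    """Scale losses above mu + n*sigma of their running statistics."""
    def __init__(self, n_stddev=3.0, momentum=0.999, mu1=10.0, mu2=200.0):
        self.n, self.m = n_stddev, momentum
        self.mu1, self.mu2 = mu1, mu2        # running E[L] and E[L^2]

    def __call__(self, loss):
        sigma = math.sqrt(max(self.mu2 - self.mu1 ** 2, 1e-8))
        limit = self.mu1 + self.n * sigma
        raw = loss.item()
        if raw > limit:
            loss = loss * (limit / raw)      # shrink, keep gradient direction
        self.mu1 = self.m * self.mu1 + (1 - self.m) * raw
        self.mu2 = self.m * self.mu2 + (1 - self.m) * raw ** 2
        return loss

alrc = ALRC()
x = torch.tensor([5.0], requires_grad=True)
loss = alrc((x ** 4).mean())   # quartic losses are spike-prone
loss.backward()                # backpropagated loss is capped near mu + n*sigma
```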

Likelihood Ratios for Out-of-Distribution Detection

Title Likelihood Ratios for Out-of-Distribution Detection
Authors Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark A. DePristo, Joshua V. Dillon, Balaji Lakshminarayanan
Abstract Discriminative neural networks offer little or no performance guarantees when deployed on data not generated by the same process as the training distribution. On such out-of-distribution (OOD) inputs, the prediction may not only be erroneous, but confidently so, limiting the safe deployment of classifiers in real-world applications. One such challenging application is bacteria identification based on genomic sequences, which holds the promise of early detection of diseases, but requires a model that can output low confidence predictions on OOD genomic sequences from new bacteria that were not present in the training data. We introduce a genomics dataset for OOD detection that allows other researchers to benchmark progress on this important problem. We investigate deep generative model based approaches for OOD detection and observe that the likelihood score is heavily affected by population level background statistics. We propose a likelihood ratio method for deep generative models which effectively corrects for these confounding background statistics. We benchmark the OOD detection performance of the proposed method against existing approaches on the genomics dataset and show that our method achieves state-of-the-art performance. We demonstrate the generality of the proposed method by showing that it significantly improves OOD detection when applied to deep generative models of images.
Tasks Out-of-Distribution Detection
Published 2019-06-07
URL https://arxiv.org/abs/1906.02845v2
PDF https://arxiv.org/pdf/1906.02845v2.pdf
PWC https://paperswithcode.com/paper/likelihood-ratios-for-out-of-distribution
Repo https://github.com/google-research/google-research/tree/master/genomics_ood
Framework tf
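
The score is simply the full model's log-likelihood minus a background model's (the background model is trained on perturbed inputs to capture population-level statistics). A toy numeric illustration with dummy log-probabilities standing in for two trained generative models:

```python
def llr_score(log_p_full, log_p_background):
    """Likelihood-ratio OOD score: subtracting the background model's
    log-likelihood cancels population-level background statistics.
    Higher score = more in-distribution."""
    return log_p_full - log_p_background

# An OOD input can get a *higher* raw likelihood purely from background
# statistics, but the ratio corrects the ranking.
log_p = {"in_dist": -80.0, "ood": -75.0}    # raw likelihoods mislead here
log_p_bg = {"in_dist": -95.0, "ood": -76.0}
for name in log_p:
    print(name, llr_score(log_p[name], log_p_bg[name]))
# in_dist 15.0, ood 1.0 -> the ratio ranks the OOD input lower
```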