February 1, 2020

3146 words 15 mins read

Paper Group AWR 131

End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks

Title End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks
Authors Richard Cheng, Gabor Orosz, Richard M. Murray, Joel W. Burdick
Abstract Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) on-line learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
Tasks Continuous Control, Gaussian Processes
Published 2019-03-21
URL http://arxiv.org/abs/1903.08792v1
PDF http://arxiv.org/pdf/1903.08792v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-safe-reinforcement-learning
Repo https://github.com/rcheng805/RL-CBF
Framework tf
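
The safety layer in RL-CBF amounts, in its simplest form, to projecting the RL action onto the set permitted by the barrier constraint. Below is a minimal 1-D sketch in which a toy integrator and a hand-picked barrier function stand in for the paper's GP-modeled dynamics; it illustrates the CBF projection idea, not the paper's full algorithm.

```python
import numpy as np

def cbf_safe_action(x, u_rl, alpha=1.0):
    """Project an RL action onto the safe set of the barrier function
    h(x) = 1 - x^2 (keeps |x| <= 1) for toy dynamics x_dot = u.
    Safety constraint: dh/dx * u + alpha * h(x) >= 0."""
    a = -2.0 * x                # dh/dx
    b = alpha * (1.0 - x ** 2)  # alpha * h(x)
    if a > 0:
        return max(u_rl, -b / a)   # constraint is a lower bound on u
    if a < 0:
        return min(u_rl, -b / a)   # constraint is an upper bound on u
    return u_rl                    # constraint inactive at x = 0

# the RL policy recklessly pushes right; the filter keeps |x| <= 1
x, dt = 0.0, 0.05
for _ in range(100):
    x += dt * cbf_safe_action(x, u_rl=2.0)
print(f"final state: {x:.3f}")  # approaches 1 but never leaves the safe set
```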

GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering

Title GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Authors Drew A. Hudson, Christopher D. Manning
Abstract We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets. We have developed a strong and robust question engine that leverages scene graph structures to create 22M diverse reasoning questions, each paired with a functional program that represents its semantics. We use the programs to gain tight control over the answer distribution and present a new tunable smoothing technique to mitigate question biases. Accompanying the dataset is a suite of new metrics that evaluate essential qualities such as consistency, grounding and plausibility. An extensive analysis is performed for baselines as well as state-of-the-art models, providing fine-grained results for different question types and topologies. Whereas a blind LSTM obtains a mere 42.1% and strong VQA models achieve 54.1%, human performance tops out at 89.3%, offering ample opportunity for new research to explore. We strongly hope GQA will provide an enabling resource for the next generation of models with enhanced robustness, improved consistency, and deeper semantic understanding of images and language.
Tasks Question Answering, Visual Question Answering, Visual Reasoning
Published 2019-02-25
URL https://arxiv.org/abs/1902.09506v3
PDF https://arxiv.org/pdf/1902.09506v3.pdf
PWC https://paperswithcode.com/paper/gqa-a-new-dataset-for-compositional-question
Repo https://github.com/stanfordnlp/mac-network.git
Framework none
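
The headline numbers above are accuracies, which GQA reports per question type. A minimal sketch of that per-type breakdown; the record format here is invented for illustration, and the official evaluation additionally computes consistency, grounding and plausibility.

```python
from collections import defaultdict

def accuracy_by_type(records):
    """records: iterable of (question_type, predicted, gold) triples.
    Returns accuracy per question type, mirroring GQA's fine-grained
    reporting. Plain accuracy only; the official metrics suite also
    scores consistency, grounding and plausibility."""
    hits, totals = defaultdict(int), defaultdict(int)
    for qtype, pred, gold in records:
        totals[qtype] += 1
        hits[qtype] += int(pred.strip().lower() == gold.strip().lower())
    return {t: hits[t] / totals[t] for t in totals}

preds = [("verify", "yes", "yes"), ("query", "dog", "cat"),
         ("verify", "no", "no")]
print(accuracy_by_type(preds))   # {'verify': 1.0, 'query': 0.0}
```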

A Deep Learning based Pipeline for Efficient Oral Cancer Screening on Whole Slide Images

Title A Deep Learning based Pipeline for Efficient Oral Cancer Screening on Whole Slide Images
Authors Jiahao Lu, Nataša Sladoje, Christina Runow Stark, Eva Darai Ramqvist, Jan-Michaél Hirsch, Joakim Lindblad
Abstract Oral cancer incidence is rapidly increasing worldwide. The most important determinant of cancer survival is early diagnosis. To facilitate large-scale screening, we propose a fully automated end-to-end pipeline for oral cancer screening on whole slide cytology images. The pipeline consists of regression-based nucleus detection, followed by per-cell focus selection, and CNN-based classification. We demonstrate that the pipeline provides fast and efficient cancer classification of whole slide cytology images, improving over previous results. The complete source code is made available as open source (https://github.com/MIDA-group/OralScreen).
Tasks Oral Cancer Classification
Published 2019-10-23
URL https://arxiv.org/abs/1910.10549v1
PDF https://arxiv.org/pdf/1910.10549v1.pdf
PWC https://paperswithcode.com/paper/a-deep-learning-based-pipeline-for-efficient
Repo https://github.com/MIDA-group/OralScreen
Framework none
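
The pipeline's three stages are nucleus detection, per-cell focus selection, and CNN classification. The skeleton below wires toy stand-ins for each stage together on random data; the detector, focus measure and classifier are placeholders, not the paper's models.

```python
import numpy as np

def detect_nuclei(image, k=10):
    """Stand-in for the paper's regression-based detector: treat pixel
    intensity as a predicted nucleus-density map and take the k peaks."""
    peaks = np.argsort(image.ravel())[-k:]
    return [divmod(int(i), image.shape[1]) for i in peaks]

def best_focus_patch(stack, y, x, size=32):
    """Pick the sharpest focal plane for one nucleus; gradient energy is
    a common focus measure (the paper's criterion differs)."""
    def sharpness(p):
        gy, gx = np.gradient(p.astype(float))
        return (gy ** 2 + gx ** 2).mean()
    h = size // 2
    patches = [plane[max(0, y - h):y + h, max(0, x - h):x + h]
               for plane in stack]
    return max(patches, key=sharpness)

def classify(patch):
    """Placeholder for the CNN classifier: returns a malignancy score."""
    return float(patch.mean() > 0.5)

# end-to-end run on random data standing in for a whole-slide z-stack
stack = np.random.rand(5, 256, 256)          # 5 focal planes
coords = detect_nuclei(stack[2])
scores = [classify(best_focus_patch(stack, y, x)) for y, x in coords]
print(f"{len(coords)} nuclei, mean score {np.mean(scores):.2f}")
```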

Enhancing Adversarial Example Transferability with an Intermediate Level Attack

Title Enhancing Adversarial Example Transferability with an Intermediate Level Attack
Authors Qian Huang, Isay Katsman, Horace He, Zeqi Gu, Serge Belongie, Ser-Nam Lim
Abstract Neural networks are vulnerable to adversarial examples, malicious inputs crafted to fool trained models. Adversarial examples often exhibit black-box transfer, meaning that adversarial examples for one model can fool another model. However, adversarial examples are typically overfit to exploit the particular architecture and feature representation of a source model, resulting in sub-optimal black-box transfer attacks to other target models. We introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model, improving upon state-of-the-art methods. We show that we can select a layer of the source model to perturb without any knowledge of the target models while achieving high transferability. Additionally, we provide some explanatory insights regarding our method and the effect of optimizing for adversarial examples using intermediate feature maps. Our code is available at https://github.com/CUVL/Intermediate-Level-Attack.
Tasks
Published 2019-07-23
URL https://arxiv.org/abs/1907.10823v3
PDF https://arxiv.org/pdf/1907.10823v3.pdf
PWC https://paperswithcode.com/paper/enhancing-adversarial-example-transferability
Repo https://github.com/CUVL/Intermediate-Level-Attack
Framework pytorch
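
ILA refines an already-found adversarial example by pushing the feature perturbation at a chosen intermediate layer further along the direction that example established. A simplified PyTorch sketch of that loop; the model, layer choice and step sizes are placeholders, and the paper studies several loss variants.

```python
import torch

def ila_refine(model, layer, x, x_adv, eps=8/255, steps=10, lr=1/255):
    """Refine x_adv by maximizing the projection of the mid-layer
    feature perturbation onto the direction x_adv already induced."""
    feats = {}
    hook = layer.register_forward_hook(
        lambda m, inp, out: feats.__setitem__("z", out))
    with torch.no_grad():
        model(x)
        z_clean = feats["z"].detach()
        model(x_adv)
        direction = (feats["z"] - z_clean).flatten(1).detach()
    x_new = x_adv.clone()
    for _ in range(steps):
        x_new.requires_grad_(True)
        model(x_new)
        delta = (feats["z"] - z_clean).flatten(1)
        (delta * direction).sum().backward()          # projection objective
        with torch.no_grad():
            x_new = x_new + lr * x_new.grad.sign()    # gradient ascent step
            x_new = x + (x_new - x).clamp(-eps, eps)  # stay in the eps-ball
            x_new = x_new.clamp(0, 1)
    hook.remove()
    return x_new.detach()

# e.g. for a torchvision ResNet-18: ila_refine(model, model.layer2, x, x_adv)
```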

Retrieving Similar E-Commerce Images Using Deep Learning

Title Retrieving Similar E-Commerce Images Using Deep Learning
Authors Rishab Sharma, Anirudha Vishvakarma
Abstract In this paper, we propose a deep convolutional neural network for learning image embeddings that capture the notion of visual similarity. We present a deep siamese architecture that, when trained on positive and negative pairs of images, learns an embedding that accurately approximates the ranking of images by visual similarity. We also implement a novel loss calculation method using an angular loss metric tailored to the problem's requirements. The final embedding of an image is a combined representation of its lower- and top-level embeddings. We use a fractional distance matrix to calculate distances between the learned embeddings in n-dimensional space. Finally, we compare our architecture with other existing deep architectures and demonstrate the superiority of our solution in terms of image retrieval on four datasets. We also show how our suggested network outperforms traditional deep CNNs at capturing fine-grained image similarities by learning an optimum embedding.
Tasks Fine-Grained Visual Recognition, Image Retrieval, Product Recommendation, Recommendation Systems
Published 2019-01-11
URL http://arxiv.org/abs/1901.03546v1
PDF http://arxiv.org/pdf/1901.03546v1.pdf
PWC https://paperswithcode.com/paper/retrieving-similar-e-commerce-images-using
Repo https://github.com/gofynd/mildnet
Framework tf
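
The fractional distance matrix mentioned in the abstract is a pairwise Minkowski distance with order p < 1, which tends to be more discriminative than L2 in high-dimensional spaces. A small numpy sketch with illustrative dimensions:

```python
import numpy as np

def fractional_distance_matrix(a, b, p=0.5):
    """Pairwise Minkowski distance with fractional order p < 1.
    a: (n, d), b: (m, d) -> (n, m) matrix of distances."""
    diff = np.abs(a[:, None, :] - b[None, :, :])   # (n, m, d)
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

emb_queries = np.random.rand(4, 128)   # stand-ins for learned embeddings
emb_catalog = np.random.rand(10, 128)
d = fractional_distance_matrix(emb_queries, emb_catalog)
print(d.shape, d[0].argmin())          # nearest catalog item for query 0
```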

musicnn: Pre-trained convolutional neural networks for music audio tagging

Title musicnn: Pre-trained convolutional neural networks for music audio tagging
Authors Jordi Pons, Xavier Serra
Abstract Pronounced as “musician”, the musicnn library contains a set of pre-trained musically motivated convolutional neural networks for music audio tagging: https://github.com/jordipons/musicnn. This repository also includes some pre-trained vgg-like baselines. These models can be used as out-of-the-box music audio taggers, as music feature extractors, or as pre-trained models for transfer learning. We also provide the code to train the aforementioned models: https://github.com/jordipons/musicnn-training. This framework also allows implementing novel models. For example, a musically motivated convolutional neural network with an attention-based output layer (instead of the temporal pooling layer) can achieve state-of-the-art results for music audio tagging: 90.77 ROC-AUC / 38.61 PR-AUC on the MagnaTagATune dataset — and 88.81 ROC-AUC / 31.51 PR-AUC on the Million Song Dataset.
Tasks Audio Tagging, Transfer Learning
Published 2019-09-14
URL https://arxiv.org/abs/1909.06654v1
PDF https://arxiv.org/pdf/1909.06654v1.pdf
PWC https://paperswithcode.com/paper/musicnn-pre-trained-convolutional-neural
Repo https://github.com/jordipons/musicnn-training
Framework tf
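
Typical usage, per the repository README at the time of writing (the audio path is a placeholder and the API may have changed since):

```python
# Tag a music clip with the pre-trained MagnaTagATune model.
from musicnn.tagger import top_tags

tags = top_tags('audio.mp3', model='MTT_musicnn', topN=5)
print(tags)   # e.g. ['guitar', 'classical', 'slow', ...]
```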

Residual Non-local Attention Networks for Image Restoration

Title Residual Non-local Attention Networks for Image Restoration
Authors Yulun Zhang, Kunpeng Li, Kai Li, Bineng Zhong, Yun Fu
Abstract In this paper, we propose a residual non-local attention network for high-quality image restoration. Previous methods ignore the uneven distribution of information in corrupted images and are restricted by local convolutional operations and the equal treatment of spatial- and channel-wise features. To address this issue, we design local and non-local attention blocks to extract features that capture the long-range dependencies between pixels and pay more attention to the challenging parts. Specifically, we design a trunk branch and a (non-)local mask branch in each (non-)local attention block. The trunk branch is used to extract hierarchical features. Local and non-local mask branches aim to adaptively rescale these hierarchical features with mixed attentions. The local mask branch concentrates on more local structures with convolutional operations, while non-local attention attends more to long-range dependencies across the whole feature map. Furthermore, we propose residual local and non-local attention learning to train the very deep network, which further enhances the representation ability of the network. Our proposed method can be generalized to various image restoration applications, such as image denoising, demosaicing, compression artifact reduction, and super-resolution. Experiments demonstrate that our method obtains comparable or better results than recent leading methods, both quantitatively and visually.
Tasks Demosaicking, Denoising, Image Denoising, Image Restoration, Super-Resolution
Published 2019-03-24
URL http://arxiv.org/abs/1903.10082v1
PDF http://arxiv.org/pdf/1903.10082v1.pdf
PWC https://paperswithcode.com/paper/residual-non-local-attention-networks-for-1
Repo https://github.com/bruinxiong/RNAN
Framework pytorch
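
The non-local mask branch builds on the standard non-local operation, which lets every spatial position attend to every other. A simplified PyTorch sketch of such a block; RNAN additionally combines it with trunk and local-mask branches and residual learning.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Generic non-local (self-attention) block over spatial positions."""
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 2, 1)
        self.phi   = nn.Conv2d(channels, channels // 2, 1)
        self.g     = nn.Conv2d(channels, channels // 2, 1)
        self.out   = nn.Conv2d(channels // 2, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (n, hw, c/2)
        k = self.phi(x).flatten(2)                     # (n, c/2, hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (n, hw, c/2)
        attn = torch.softmax(q @ k, dim=-1)            # pairwise affinity
        y = (attn @ v).transpose(1, 2).reshape(n, c // 2, h, w)
        return x + self.out(y)                         # residual connection

x = torch.randn(1, 64, 16, 16)
print(NonLocalBlock(64)(x).shape)   # torch.Size([1, 64, 16, 16])
```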

Fooling Detection Alone is Not Enough: First Adversarial Attack against Multiple Object Tracking

Title Fooling Detection Alone is Not Enough: First Adversarial Attack against Multiple Object Tracking
Authors Yunhan Jia, Yantao Lu, Junjie Shen, Qi Alfred Chen, Zhenyu Zhong, Tao Wei
Abstract Recent work in adversarial machine learning has started to focus on visual perception in autonomous driving and has studied Adversarial Examples (AEs) for object detection models. However, in such a visual perception pipeline the detected objects must also be tracked, in a process called Multiple Object Tracking (MOT), to build the moving trajectories of surrounding obstacles. Since MOT is designed to be robust against errors in object detection, it poses a general challenge to existing attack techniques that blindly target object detection: we find that a success rate of over 98% is needed for them to actually affect the tracking results, a requirement that no existing attack technique can satisfy. In this paper, we are the first to study adversarial machine learning attacks against the complete visual perception pipeline in autonomous driving, and we discover a novel attack technique, tracker hijacking, that can effectively fool MOT using AEs on object detection. Using our technique, successful AEs on as few as one frame can move an existing object into or out of the headway of an autonomous vehicle and cause potential safety hazards. We perform an evaluation using the Berkeley DeepDrive dataset and find that, on average, when 3 frames are attacked, our attack achieves a nearly 100% success rate, while attacks that blindly target object detection reach at most 25%.
Tasks Adversarial Attack, Autonomous Driving, Multiple Object Tracking, Object Detection, Object Tracking
Published 2019-05-27
URL https://arxiv.org/abs/1905.11026v2
PDF https://arxiv.org/pdf/1905.11026v2.pdf
PWC https://paperswithcode.com/paper/fooling-detection-alone-is-not-enough-first
Repo https://github.com/anonymousjack/hijacking
Framework none
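
The robustness the attack must overcome comes from how MOT keeps a track alive for several frames without a matching detection. The toy IoU tracker below illustrates why fooling the detector in a single frame does not change the tracking result; thresholds are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

class Track:
    def __init__(self, box):
        self.box, self.missed = box, 0

def update(tracks, detections, iou_thresh=0.3, max_missed=5):
    """Toy MOT step: greedy IoU association; tracks survive up to
    max_missed frames without a detection -- the robustness that forces
    detection-only attacks to succeed in nearly every frame."""
    for t in tracks:
        match = max(detections, key=lambda d: iou(t.box, d), default=None)
        if match is not None and iou(t.box, match) > iou_thresh:
            t.box, t.missed = match, 0
            detections.remove(match)
        else:
            t.missed += 1
    tracks[:] = [t for t in tracks if t.missed <= max_missed]
    tracks.extend(Track(d) for d in detections)

tracks = [Track([10, 10, 50, 50])]
update(tracks, [])                    # detector fooled this frame...
print(len(tracks), tracks[0].missed)  # ...but the track survives: 1 1
```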

Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base

Title Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base
Authors Tao Shen, Xiubo Geng, Tao Qin, Daya Guo, Duyu Tang, Nan Duan, Guodong Long, Daxin Jiang
Abstract We consider the problem of conversational question answering over a large-scale knowledge base. To handle the huge entity vocabulary of a large-scale knowledge base, recent neural semantic parsing based approaches usually decompose the task into several subtasks and solve them sequentially, which leads to the following issues: 1) errors in earlier subtasks propagate and negatively affect downstream ones; and 2) each subtask cannot naturally share supervision signals with the others. To tackle these issues, we propose an innovative multi-task learning framework in which a pointer-equipped semantic parsing model resolves coreference in conversations and naturally empowers joint learning with a novel type-aware entity detection model. The proposed framework thus enables shared supervision and alleviates the effect of error propagation. Experiments on a large-scale conversational question answering dataset containing 1.6M question-answer pairs over 12.8M entities show that the proposed framework improves the overall F1 score from 67% to 79% compared with previous state-of-the-art work.
Tasks Multi-Task Learning, Question Answering, Semantic Parsing
Published 2019-10-11
URL https://arxiv.org/abs/1910.05069v1
PDF https://arxiv.org/pdf/1910.05069v1.pdf
PWC https://paperswithcode.com/paper/multi-task-learning-for-conversational
Repo https://github.com/taoshen58/MaSP
Framework tf
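
The framework's central move is a shared encoder that both the semantic-parsing head and the entity-detection head backpropagate through, so the subtasks share supervision instead of running as a pipeline. A toy PyTorch sketch of that structure; dimensions and heads are illustrative, not the MaSP architecture.

```python
import torch
import torch.nn as nn

class JointModel(nn.Module):
    """One encoder feeds a parsing head and an entity head; both losses
    update the shared encoder, so supervision is shared across subtasks."""
    def __init__(self, vocab=1000, dim=64, n_actions=50, n_types=20):
        super().__init__()
        self.encoder = nn.Sequential(nn.Embedding(vocab, dim),
                                     nn.LSTM(dim, dim, batch_first=True))
        self.parse_head  = nn.Linear(dim, n_actions)
        self.entity_head = nn.Linear(dim, n_types)

    def forward(self, tokens):
        h, _ = self.encoder(tokens)          # (batch, seq, dim)
        return self.parse_head(h), self.entity_head(h)

model = JointModel()
tokens = torch.randint(0, 1000, (2, 12))
parse_logits, entity_logits = model(tokens)
loss = (nn.functional.cross_entropy(parse_logits.flatten(0, 1),
                                    torch.randint(0, 50, (24,)))
        + nn.functional.cross_entropy(entity_logits.flatten(0, 1),
                                      torch.randint(0, 20, (24,))))
loss.backward()   # gradients from both tasks reach the shared encoder
```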

Learning Digital Circuits: A Journey Through Weight Invariant Self-Pruning Neural Networks

Title Learning Digital Circuits: A Journey Through Weight Invariant Self-Pruning Neural Networks
Authors Amey Agrawal, Rohit Karlupia
Abstract Recently, in the paper “Weight Agnostic Neural Networks”, Gaier & Ha utilized architecture search to find networks whose topology completely encodes the knowledge. However, architecture search in topology space is expensive. We use the existing framework of binarized networks to find performant topologies by constraining the weights to be either zero or one. We show that such topologies achieve performance similar to standard networks while pruning more than 99% of the weights. We further demonstrate that these topologies can perform tasks using constant weights without any explicit tuning. Finally, we discover that in our setup each neuron acts like a NOR gate, virtually learning a digital circuit. We demonstrate the efficacy of our approach on computer vision datasets.
Tasks
Published 2019-08-30
URL https://arxiv.org/abs/1909.00052v2
PDF https://arxiv.org/pdf/1909.00052v2.pdf
PWC https://paperswithcode.com/paper/learning-digital-circuits-a-journey-through
Repo https://github.com/AgrawalAmey/learning-digital-net
Framework none
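
With weights constrained to {0, 1}, a neuron that fires only when every input it attends to is off computes NOR, and NOR is functionally complete, so networks of such neurons can realize any digital circuit. A tiny illustration:

```python
import numpy as np

def nor_neuron(x, w):
    """Neuron with weights in {0, 1} and a step activation: it fires
    only when every selected input (w_i = 1) is off, i.e. it computes
    NOR over the inputs it attends to."""
    return int(np.dot(w, x) == 0)

w = np.array([1, 1, 0])   # neuron "wired" to inputs 0 and 1; input 2 pruned
for x in [(0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 1, 1)]:
    print(x, "->", nor_neuron(np.array(x), w))
# fires only when both selected inputs are 0, regardless of input 2
```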

An Algorithm for Routing Capsules in All Domains

Title An Algorithm for Routing Capsules in All Domains
Authors Franz A. Heinsen
Abstract Building on recent work on capsule networks, we propose a new, general-purpose form of “routing by agreement” that activates output capsules in a layer as a function of their net benefit to use and net cost to ignore input capsules from earlier layers. To illustrate the usefulness of our routing algorithm, we present two capsule networks that apply it in different domains: vision and language. The first network achieves new state-of-the-art accuracy of 99.1% on the smallNORB visual recognition task with fewer parameters and an order of magnitude less training than previous capsule models, and we find evidence that it learns to perform a form of “reverse graphics.” The second network achieves new state-of-the-art accuracies on the root sentences of the Stanford Sentiment Treebank: 58.5% on fine-grained and 95.6% on binary labels with a single-task model that routes frozen embeddings from a pretrained transformer as capsules. In both domains, we train with the same regime. Code is available at https://github.com/glassroom/heinsen_routing along with replication instructions.
Tasks
Published 2019-11-02
URL https://arxiv.org/abs/1911.00792v6
PDF https://arxiv.org/pdf/1911.00792v6.pdf
PWC https://paperswithcode.com/paper/an-algorithm-for-routing-capsules-in-all
Repo https://github.com/glassroom/heinsen_routing
Framework pytorch
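
For orientation, the classic “routing by agreement” update that this work generalizes fits in a few lines. The sketch below is the standard dynamic-routing scheme, not Heinsen's net-benefit/net-cost formulation; see the linked repository for the paper's algorithm.

```python
import torch

def routing_by_agreement(votes, iters=3):
    """Classic dynamic routing: routing weights grow for input capsules
    whose votes agree with the current output capsules.
    votes: (n_in, n_out, d) -> output capsules (n_out, d)."""
    n_in, n_out, d = votes.shape
    logits = torch.zeros(n_in, n_out)
    for _ in range(iters):
        c = torch.softmax(logits, dim=1)                  # routing weights
        out = (c.unsqueeze(-1) * votes).sum(0)            # (n_out, d)
        out = out / (1 + out.norm(dim=-1, keepdim=True))  # squash-like
        logits = logits + (votes * out).sum(-1)           # agreement update
    return out

print(routing_by_agreement(torch.randn(8, 4, 16)).shape)  # torch.Size([4, 16])
```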

PubLayNet: largest dataset ever for document layout analysis

Title PubLayNet: largest dataset ever for document layout analysis
Authors Xu Zhong, Jianbin Tang, Antonio Jimeno Yepes
Abstract Recognizing the layout of unstructured digital documents is an important step when parsing documents into a structured, machine-readable format for downstream applications. Deep neural networks developed for computer vision have proven to be an effective method for analyzing the layout of document images. However, the document layout datasets that are currently publicly available are several orders of magnitude smaller than established computer vision datasets. Models therefore have to be trained by transfer learning from a base model pre-trained on a traditional computer vision dataset. In this paper, we develop the PubLayNet dataset for document layout analysis by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. The size of the dataset is comparable to established computer vision datasets, containing over 360 thousand document images with typical document layout elements annotated. The experiments demonstrate that deep neural networks trained on PubLayNet accurately recognize the layout of scientific articles. The pre-trained models are also more effective base models for transfer learning to a different document domain. We release the dataset (https://github.com/ibm-aur-nlp/PubLayNet) to support the development and evaluation of more advanced models for document layout analysis.
Tasks Document Layout Analysis, Transfer Learning
Published 2019-08-16
URL https://arxiv.org/abs/1908.07836v1
PDF https://arxiv.org/pdf/1908.07836v1.pdf
PWC https://paperswithcode.com/paper/190807836
Repo https://github.com/ibm-aur-nlp/PubLayNet
Framework none
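
The released annotations follow the COCO format, so standard tooling applies directly. A minimal exploration sketch; 'train.json' stands in for the downloaded annotation file.

```python
import json
from collections import Counter

# Count annotated layout regions per category in a COCO-style file.
with open("train.json") as f:
    coco = json.load(f)

names = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(names[a["category_id"]] for a in coco["annotations"])
print(f"{len(coco['images'])} pages")
for name, n in counts.most_common():
    print(f"{name}: {n} regions")   # text, title, list, table, figure
```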

Demystifying Inter-Class Disentanglement

Title Demystifying Inter-Class Disentanglement
Authors Aviv Gabbay, Yedid Hoshen
Abstract Learning to disentangle the hidden factors of variation within a set of observations is a key task for artificial intelligence. We present a unified formulation for class and content disentanglement and use it to illustrate the limitations of current methods. We therefore introduce LORD, a novel method based on Latent Optimization for Representation Disentanglement. We find that latent optimization, along with an asymmetric noise regularization, is superior to amortized inference for achieving disentangled representations. In extensive experiments, our method is shown to achieve better disentanglement performance than both adversarial and non-adversarial methods that use the same level of supervision. We further introduce a clustering-based approach for extending our method to settings that exhibit in-class variation, with promising results on the task of domain translation.
Tasks Style Transfer
Published 2019-06-27
URL https://arxiv.org/abs/1906.11796v3
PDF https://arxiv.org/pdf/1906.11796v3.pdf
PWC https://paperswithcode.com/paper/latent-optimization-for-non-adversarial
Repo https://github.com/avivga/lord
Framework tf
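
Latent optimization replaces amortized inference with learnable per-sample codes optimized jointly with the generator, and noise on the content code keeps class information out of it. A toy PyTorch sketch of that structure; dimensions, generator and noise scale are illustrative.

```python
import torch
import torch.nn as nn

class LatentOptimization(nn.Module):
    """Every training image owns a learnable content code and every
    class owns a learnable class code; both are optimized jointly with
    the generator instead of being produced by an encoder."""
    def __init__(self, n_images, n_classes, content_dim=16, class_dim=8):
        super().__init__()
        self.content = nn.Embedding(n_images, content_dim)
        self.klass   = nn.Embedding(n_classes, class_dim)
        self.gen = nn.Sequential(nn.Linear(content_dim + class_dim, 128),
                                 nn.ReLU(), nn.Linear(128, 28 * 28))

    def forward(self, img_ids, class_ids, sigma=0.3):
        c = self.content(img_ids)
        c = c + sigma * torch.randn_like(c)   # noise regularizes content
        return self.gen(torch.cat([c, self.klass(class_ids)], dim=-1))

model = LatentOptimization(n_images=1000, n_classes=10)
recon = model(torch.tensor([0, 1]), torch.tensor([3, 3]))
loss = nn.functional.mse_loss(recon, torch.rand(2, 28 * 28))
loss.backward()   # gradients flow into the per-sample latent codes
```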

Adaptive Learning Rate Clipping Stabilizes Learning

Title Adaptive Learning Rate Clipping Stabilizes Learning
Authors Jeffrey M. Ede, Richard Beanland
Abstract Artificial neural network training with stochastic gradient descent can be destabilized by “bad batches” with high losses. This is often problematic for training with small batch sizes, high-order loss functions or unstably high learning rates. To stabilize learning, we have developed adaptive learning rate clipping (ALRC) to limit backpropagated losses to a number of standard deviations above their running means. ALRC is designed to complement existing learning algorithms: our algorithm is computationally inexpensive, can be applied to any loss function or batch size, is robust to hyperparameter choices and does not affect backpropagated gradient distributions. Experiments with CIFAR-10 supersampling show that ALRC decreases errors for unstable mean quartic error training while stable mean squared error training is unaffected. We also show that ALRC decreases unstable mean squared errors for partial scanning transmission electron micrograph completion. Our source code is publicly available at https://github.com/Jeffrey-Ede/ALRC
Tasks
Published 2019-06-21
URL https://arxiv.org/abs/1906.09060v2
PDF https://arxiv.org/pdf/1906.09060v2.pdf
PWC https://paperswithcode.com/paper/adaptive-learning-rate-clipping-stabilizes
Repo https://github.com/Jeffrey-Ede/ALRC
Framework tf
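
As described, ALRC rescales any loss that exceeds its running mean by more than n running standard deviations, so the gradient direction is preserved while its magnitude is limited. A sketch following that description; the initial statistics and momentum are guesses rather than the paper's values.

```python
import math
import torch

class ALRC:
    """Scale losses above mu + n*sigma of their running statistics."""
    def __init__(self, n_stddev=3.0, momentum=0.999, mu1=10.0, mu2=200.0):
        self.n, self.m = n_stddev, momentum
        self.mu1, self.mu2 = mu1, mu2        # running E[L] and E[L^2]

    def __call__(self, loss):
        sigma = math.sqrt(max(self.mu2 - self.mu1 ** 2, 1e-8))
        limit = self.mu1 + self.n * sigma
        raw = loss.item()
        if raw > limit:
            loss = loss * (limit / raw)      # shrink, keep gradient direction
        self.mu1 = self.m * self.mu1 + (1 - self.m) * raw
        self.mu2 = self.m * self.mu2 + (1 - self.m) * raw ** 2
        return loss

alrc = ALRC()
x = torch.tensor([5.0], requires_grad=True)
loss = alrc((x ** 4).mean())   # quartic losses are spike-prone
loss.backward()                # backpropagated loss is capped near mu + n*sigma
```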

Likelihood Ratios for Out-of-Distribution Detection

Title Likelihood Ratios for Out-of-Distribution Detection
Authors Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark A. DePristo, Joshua V. Dillon, Balaji Lakshminarayanan
Abstract Discriminative neural networks offer little or no performance guarantees when deployed on data not generated by the same process as the training distribution. On such out-of-distribution (OOD) inputs, the prediction may not only be erroneous, but confidently so, limiting the safe deployment of classifiers in real-world applications. One such challenging application is bacteria identification based on genomic sequences, which holds the promise of early detection of diseases, but requires a model that can output low confidence predictions on OOD genomic sequences from new bacteria that were not present in the training data. We introduce a genomics dataset for OOD detection that allows other researchers to benchmark progress on this important problem. We investigate deep generative model based approaches for OOD detection and observe that the likelihood score is heavily affected by population level background statistics. We propose a likelihood ratio method for deep generative models which effectively corrects for these confounding background statistics. We benchmark the OOD detection performance of the proposed method against existing approaches on the genomics dataset and show that our method achieves state-of-the-art performance. We demonstrate the generality of the proposed method by showing that it significantly improves OOD detection when applied to deep generative models of images.
Tasks Out-of-Distribution Detection
Published 2019-06-07
URL https://arxiv.org/abs/1906.02845v2
PDF https://arxiv.org/pdf/1906.02845v2.pdf
PWC https://paperswithcode.com/paper/likelihood-ratios-for-out-of-distribution
Repo https://github.com/google-research/google-research/tree/master/genomics_ood
Framework tf
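
The score is simply the full model's log-likelihood minus a background model's (the background model is trained on perturbed inputs to capture population-level statistics). A toy numeric illustration with dummy log-probabilities standing in for two trained generative models:

```python
def llr_score(log_p_full, log_p_background):
    """Likelihood-ratio OOD score: subtracting the background model's
    log-likelihood cancels population-level background statistics.
    Higher score = more in-distribution."""
    return log_p_full - log_p_background

# An OOD input can get a *higher* raw likelihood purely from background
# statistics, but the ratio corrects the ranking.
log_p = {"in_dist": -80.0, "ood": -75.0}    # raw likelihoods mislead here
log_p_bg = {"in_dist": -95.0, "ood": -76.0}
for name in log_p:
    print(name, llr_score(log_p[name], log_p_bg[name]))
# in_dist 15.0, ood 1.0 -> the ratio ranks the OOD input lower
```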