Paper Group ANR 87
Understanding and Mitigating the Tradeoff Between Robustness and Accuracy. Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation. Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities. Unifying Training and Inference for Panoptic Segmentation. Evolution of Image Segmentation using Deep Conv …
Understanding and Mitigating the Tradeoff Between Robustness and Accuracy
Title | Understanding and Mitigating the Tradeoff Between Robustness and Accuracy |
Authors | Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, Percy Liang |
Abstract | Adversarial training augments the training set with perturbations to improve the robust error (over worst-case perturbations), but it often leads to an increase in the standard error (on unperturbed test inputs). Previous explanations for this tradeoff rely on the assumption that no predictor in the hypothesis class has low standard and robust error. In this work, we precisely characterize the effect of augmentation on the standard error in linear regression when the optimal linear predictor has zero standard and robust error. In particular, we show that the standard error could increase even when the augmented perturbations have noiseless observations from the optimal linear predictor. We then prove that the recently proposed robust self-training (RST) estimator improves robust error without sacrificing standard error for noiseless linear regression. Empirically, for neural networks, we find that RST with different adversarial training methods improves both standard and robust error for random and adversarial rotations and adversarial $\ell_\infty$ perturbations in CIFAR-10. |
Tasks | |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10716v1 |
https://arxiv.org/pdf/2002.10716v1.pdf | |
PWC | https://paperswithcode.com/paper/understanding-and-mitigating-the-tradeoff |
Repo | |
Framework | |
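In the linear setting the abstract analyzes, robust self-training has a compact form: fit a standard estimator on the labeled data, pseudo-label the augmented inputs with it, then refit on the combined set. A minimal NumPy sketch, assuming a toy noiseless linear model (the data and perturbation model here are illustrative stand-ins, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([2.0, -1.0, 0.5])   # optimal linear predictor
X = rng.normal(size=(30, 3))
y = X @ theta_star                         # noiseless observations

# Step 1: standard estimator from the labeled data (least squares).
theta_std = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: pseudo-label augmented inputs with the standard estimator
# (the Gaussian perturbation below is a hypothetical choice).
X_aug = X + 0.3 * rng.normal(size=X.shape)
y_pseudo = X_aug @ theta_std

# Step 3: refit on labeled + pseudo-labeled data.
X_all = np.vstack([X, X_aug])
y_all = np.concatenate([y, y_pseudo])
theta_rst = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
print(theta_rst)  # stays close to theta_star
```

The point of the construction is that the pseudo-labels anchor the augmented points to the standard predictor, so robustness can be gained without drifting away from it.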
Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Title | Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation |
Authors | Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen |
Abstract | Convolution exploits locality for efficiency at the cost of missing long range context. Self-attention has been adopted to augment CNNs with non-local interactions. Recent works have shown it is possible to stack self-attention layers to obtain a fully attentional network by restricting the attention to a local region. In this paper, we attempt to remove this constraint by factorizing 2D self-attention into two 1D self-attentions. This reduces computational complexity and allows performing attention within a larger or even global region. As a companion, we also propose a position-sensitive self-attention design. Combining both yields our position-sensitive axial-attention layer, a novel building block that one could stack to form axial-attention models for image classification and dense prediction. We demonstrate the effectiveness of our model on four large-scale datasets. In particular, our model outperforms all existing stand-alone self-attention models on ImageNet. Our Axial-DeepLab improves 2.8% PQ over the bottom-up state-of-the-art on COCO test-dev. This previous state-of-the-art is attained by our small variant that is 3.8x parameter-efficient and 27x computation-efficient. Axial-DeepLab also achieves state-of-the-art results on Mapillary Vistas and Cityscapes. |
Tasks | Image Classification, Panoptic Segmentation |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07853v1 |
https://arxiv.org/pdf/2003.07853v1.pdf | |
PWC | https://paperswithcode.com/paper/axial-deeplab-stand-alone-axial-attention-for |
Repo | |
Framework | |
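To make the factorization concrete, here is a minimal single-head sketch of axial attention in PyTorch. It omits the paper's position-sensitive terms and multi-head structure, and the shared projection matrices are toy placeholders:

```python
import torch
import torch.nn.functional as F

def axial_attention(x, wq, wk, wv, axis):
    """Self-attention along one spatial axis of a (B, C, H, W) feature map."""
    if axis == "height":
        seq = x.permute(0, 3, 2, 1)  # (B, W, H, C): sequences along H
    else:
        seq = x.permute(0, 2, 3, 1)  # (B, H, W, C): sequences along W
    q, k, v = seq @ wq, seq @ wk, seq @ wv
    attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    out = attn @ v
    return out.permute(0, 3, 2, 1) if axis == "height" else out.permute(0, 3, 1, 2)

B, C, H, W = 2, 8, 16, 16
x = torch.randn(B, C, H, W)
wq, wk, wv = (torch.randn(C, C) for _ in range(3))
y = axial_attention(axial_attention(x, wq, wk, wv, "height"), wq, wk, wv, "width")
print(y.shape)  # torch.Size([2, 8, 16, 16])
```

Attending along each axis costs O(HW·H) + O(HW·W) rather than the O((HW)²) of full 2D self-attention, which is what makes a larger or even global region affordable.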
Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities
Title | Deep Learning for Sensor-based Human Activity Recognition: Overview, Challenges and Opportunities |
Authors | Kaixuan Chen, Dalin Zhang, Lina Yao, Bin Guo, Zhiwen Yu, Yunhao Liu |
Abstract | The vast proliferation of sensor devices and the Internet of Things enables applications of sensor-based activity recognition. However, there exist substantial challenges that could influence the performance of the recognition system in practical scenarios. Recently, as deep learning has demonstrated its effectiveness in many areas, plenty of deep methods have been investigated to address the challenges in activity recognition. In this study, we present a survey of the state-of-the-art deep learning methods for sensor-based human activity recognition. We first introduce the multi-modality of the sensory data and provide information on public datasets that can be used for evaluation on different challenge tasks. We then propose a new taxonomy to structure the deep methods by challenges. Challenges and challenge-related deep methods are summarized and analyzed to form an overview of the current research progress. At the end of this work, we discuss the open issues and provide some insights for future directions. |
Tasks | Activity Recognition, Human Activity Recognition |
Published | 2020-01-21 |
URL | https://arxiv.org/abs/2001.07416v1 |
https://arxiv.org/pdf/2001.07416v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-sensor-based-human-activity |
Repo | |
Framework | |
Unifying Training and Inference for Panoptic Segmentation
Title | Unifying Training and Inference for Panoptic Segmentation |
Authors | Qizhu Li, Xiaojuan Qi, Philip H. S. Torr |
Abstract | We present an end-to-end network to bridge the gap between the training and inference pipelines for panoptic segmentation, a task that seeks to partition an image into semantic regions for “stuff” and object instances for “things”. In contrast to recent works, our network exploits a parametrised, yet lightweight panoptic segmentation submodule, powered by an end-to-end learnt dense instance affinity, to capture the probability that any pair of pixels belong to the same instance. This panoptic submodule gives rise to a novel propagation mechanism for panoptic logits and enables the network to output a coherent panoptic segmentation map for both “stuff” and “thing” classes, without any post-processing. Reaping the benefits of end-to-end training, our full system sets new records on the popular street scene dataset, Cityscapes, achieving 61.4 PQ with a ResNet-50 backbone using only the fine annotations. On the challenging COCO dataset, our ResNet-50-based network also delivers state-of-the-art accuracy of 43.4 PQ. Moreover, our network flexibly works with and without object mask cues, performing competitively under both settings, which is of interest for applications with limited computation budgets. |
Tasks | Panoptic Segmentation |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.04982v1 |
https://arxiv.org/pdf/2001.04982v1.pdf | |
PWC | https://paperswithcode.com/paper/unifying-training-and-inference-for-panoptic |
Repo | |
Framework | |
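As a rough illustration of the dense instance affinity idea, the toy sketch below propagates per-pixel panoptic logits through a row-normalized pairwise affinity, so each pixel's logits become a weighted average over pixels likely to share its instance. The shapes and normalization are assumptions for illustration, not the paper's exact submodule:

```python
import torch

N, C = 6, 3                                   # N pixels, C panoptic classes
logits = torch.randn(N, C)                    # initial per-pixel logits
affinity = torch.rand(N, N)                   # p(pixels i and j share an instance)
affinity = (affinity + affinity.T) / 2        # symmetrize
weights = affinity / affinity.sum(dim=1, keepdim=True)
propagated = weights @ logits                 # (N, C) refined panoptic logits
print(propagated.shape)
```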
Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey
Title | Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey |
Authors | Farhana Sultana, Abu Sufian, Paramartha Dutta |
Abstract | From autonomous driving to medical diagnosis, the task of image segmentation is required everywhere. Segmentation of an image is one of the indispensable tasks in computer vision. This task is comparatively more complicated than other vision tasks as it needs low-level spatial information. Basically, image segmentation can be of two types: semantic segmentation and instance segmentation. The combined version of these two basic tasks is known as panoptic segmentation. In the recent era, the success of deep convolutional neural networks (CNNs) has greatly influenced the field of segmentation and has given us various successful models to date. In this survey, we take a glance at the evolution of both semantic and instance segmentation work based on CNNs. We have also specified comparative architectural details of some state-of-the-art models and discussed their training details to present a lucid understanding of hyper-parameter tuning of those models. Lastly, we have drawn a comparison among the performance of those models on different datasets. |
Tasks | Instance Segmentation, Medical Diagnosis, Panoptic Segmentation, Semantic Segmentation |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04074v2 |
https://arxiv.org/pdf/2001.04074v2.pdf | |
PWC | https://paperswithcode.com/paper/evolution-of-image-segmentation-using-deep |
Repo | |
Framework | |
Effective End-to-End Learning Framework for Economic Dispatch
Title | Effective End-to-End Learning Framework for Economic Dispatch |
Authors | Chenbei Lu, Kui Wang, Chenye Wu |
Abstract | Conventional wisdom to improve the effectiveness of economic dispatch is to design the load forecasting method to be as accurate as possible. However, this approach can be problematic due to the temporal and spatial correlations between system cost and load prediction errors. This motivates us to adopt the notion of end-to-end machine learning and to propose a task-specific learning criterion to conduct economic dispatch. Specifically, to maximize data utilization, we design an efficient optimization kernel for the learning process. We provide both theoretical analysis and empirical insights to highlight the effectiveness and efficiency of the proposed learning framework. |
Tasks | Load Forecasting |
Published | 2020-02-22 |
URL | https://arxiv.org/abs/2002.12755v1 |
https://arxiv.org/pdf/2002.12755v1.pdf | |
PWC | https://paperswithcode.com/paper/effective-end-to-end-learning-framework-for |
Repo | |
Framework | |
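A toy sketch of the end-to-end idea: train the load forecaster against a downstream dispatch cost instead of forecast MSE, so errors that are cheap for the dispatcher are penalized less. The asymmetric quadratic cost and synthetic data below are stand-ins, not the paper's optimization kernel:

```python
import torch

torch.manual_seed(0)
X = torch.randn(256, 4)                                    # forecast features
true_load = X @ torch.tensor([1.0, -0.5, 0.3, 0.2]) + 5.0  # synthetic load

w = torch.zeros(4, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([w, b], lr=0.05)

def dispatch_cost(forecast, load):
    # Hypothetical asymmetry: under-forecasting forces expensive spot purchases.
    shortfall = torch.relu(load - forecast)
    surplus = torch.relu(forecast - load)
    return (10.0 * shortfall ** 2 + 1.0 * surplus ** 2).mean()

for _ in range(500):
    opt.zero_grad()
    loss = dispatch_cost(X @ w + b, true_load)
    loss.backward()
    opt.step()
print(float(loss))  # the forecaster learns to hedge toward over-forecasting
```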
Forming Diverse Teams from Sequentially Arriving People
Title | Forming Diverse Teams from Sequentially Arriving People |
Authors | Faez Ahmed, John Dickerson, Mark Fuge |
Abstract | Collaborative work often benefits from having teams or organizations with heterogeneous members. In this paper, we present a method to form such diverse teams from people arriving sequentially over time. We define a monotone submodular objective function that combines the diversity and quality of a team and propose an algorithm to maximize the objective while satisfying multiple constraints. This allows us to balance both how diverse the team is and how well it can perform the task at hand. Using crowd experiments, we show that, in practice, the algorithm leads to large gains in team diversity. Using simulations, we show how to quantify the additional cost of forming diverse teams and how to address the problem of simultaneously maximizing diversity for several attributes (e.g., country of origin, gender). Our method has applications in collaborative work, including team formation, the assignment of workers to teams in crowdsourcing, and reviewer allocation for journal papers arriving sequentially. Our code is publicly accessible for further research. |
Tasks | |
Published | 2020-02-25 |
URL | https://arxiv.org/abs/2002.10697v1 |
https://arxiv.org/pdf/2002.10697v1.pdf | |
PWC | https://paperswithcode.com/paper/forming-diverse-teams-from-sequentially |
Repo | |
Framework | |
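For intuition, here is a simplified greedy sketch of the streaming setting: accept an arriving candidate when their marginal gain to a monotone submodular quality-plus-diversity objective clears a threshold, up to a team-size cap. The objective, threshold rule, and data are illustrative assumptions, not the paper's algorithm:

```python
def objective(team):
    quality = sum(q for q, _ in team)
    coverage = len({attr for _, attr in team})  # submodular diversity term
    return quality + 2.0 * coverage

def stream_team(candidates, k=3, threshold=1.5):
    team = []
    for cand in candidates:  # people arrive one at a time; decisions are final
        gain = objective(team + [cand]) - objective(team)
        if len(team) < k and gain >= threshold:
            team.append(cand)
    return team

# (quality, attribute) pairs arriving sequentially, e.g. country of origin
arrivals = [(1.0, "US"), (0.9, "US"), (1.2, "IN"), (0.4, "FR"), (2.0, "US")]
print(stream_team(arrivals))  # [(1.0, 'US'), (1.2, 'IN'), (0.4, 'FR')]
```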
Efficient Tensor Kernel methods for sparse regression
Title | Efficient Tensor Kernel methods for sparse regression |
Authors | Feliks Hibraj, Marcello Pelillo, Saverio Salzo, Massimiliano Pontil |
Abstract | Recently, classical kernel methods have been extended by the introduction of suitable tensor kernels so as to promote sparsity in the solution of the underlying regression problem. Indeed, they solve an $\ell_p$-norm regularization problem, with p=m/(m-1) and m an even integer, which happens to be close to a lasso problem. However, a major drawback of the method is that storing tensors requires a considerable amount of memory, ultimately limiting its applicability. In this work we address this problem by proposing two advances. First, we directly reduce the memory requirement by introducing a new and more efficient layout for storing the data. Second, we use a Nystrom-type subsampling approach, which allows for a training phase with a smaller number of data points, so as to reduce the computational cost. Experiments, both on synthetic and real datasets, show the effectiveness of the proposed improvements. Finally, we take care of implementing the code in C++ to further speed up the computation. |
Tasks | |
Published | 2020-03-23 |
URL | https://arxiv.org/abs/2003.10482v1 |
https://arxiv.org/pdf/2003.10482v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-tensor-kernel-methods-for-sparse |
Repo | |
Framework | |
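In the linear case, the regularization problem the abstract refers to can be written schematically as (our notation, not necessarily the paper's):

$$\min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \big( y_i - \langle w, x_i \rangle \big)^2 + \lambda \, \|w\|_p^p, \qquad p = \frac{m}{m-1}, \quad m \text{ even.}$$

As $m$ grows, $p$ approaches 1 from above, so the penalty approaches the lasso's $\ell_1$ norm, which is where the sparsity-promoting behaviour comes from.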
Cooperation without Coordination: Hierarchical Predictive Planning for Decentralized Multiagent Navigation
Title | Cooperation without Coordination: Hierarchical Predictive Planning for Decentralized Multiagent Navigation |
Authors | Rose E. Wang, J. Chase Kew, Dennis Lee, Tsang-Wei Edward Lee, Tingnan Zhang, Brian Ichter, Jie Tan, Aleksandra Faust |
Abstract | Decentralized multiagent planning raises many challenges, such as adaptation to changing environments inexplicable by the agent’s own behavior, coordination from noisy sensor inputs like lidar, and cooperation without knowing other agents’ intents. To address these challenges, we present hierarchical predictive planning (HPP) for decentralized multiagent navigation tasks. HPP learns prediction models for itself and other teammates, and uses the prediction models to propose and evaluate navigation goals that complete the cooperative task without explicit coordination. To learn the prediction models, HPP observes other agents’ behavior and learns to map its own sensors to predicted locations of other agents. HPP then uses the cross-entropy method to iteratively propose, evaluate, and improve navigation goals, under the assumption that all agents in the team share a common objective. HPP removes the need for a centralized operator (i.e. robots determine their own actions without coordinating their beliefs or plans) and can be trained and easily transferred to real world environments. The results show that HPP generalizes to new environments, including a real-world robot team. It is also 33x more sample-efficient and performs better in complex environments compared to a baseline. The video and website for this paper can be found at https://youtu.be/-LqgfksqNH8 and https://sites.google.com/view/multiagent-hpp. |
Tasks | |
Published | 2020-03-15 |
URL | https://arxiv.org/abs/2003.06906v1 |
https://arxiv.org/pdf/2003.06906v1.pdf | |
PWC | https://paperswithcode.com/paper/cooperation-without-coordination-hierarchical |
Repo | |
Framework | |
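The goal-proposal loop is a standard cross-entropy method. A minimal sketch, assuming a generic cost function in place of HPP's learned prediction models:

```python
import numpy as np

def cem(cost, dim=2, pop=64, elites=8, iters=20, seed=0):
    """Iteratively propose goals, keep the best, and refit the sampler."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, dim))
        best = samples[np.argsort([cost(s) for s in samples])[:elites]]
        mu, sigma = best.mean(axis=0), best.std(axis=0) + 1e-6
    return mu

# Stand-in cost: distance of a proposed goal from a hypothetical target.
goal = cem(lambda g: np.linalg.norm(g - np.array([3.0, -1.0])))
print(goal)  # converges near [3, -1]
```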
Multi-task Learning for Voice Trigger Detection
Title | Multi-task Learning for Voice Trigger Detection |
Authors | Siddharth Sigtia, Pascal Clark, Rob Haynes, Hywel Richards, John Bridle |
Abstract | We describe the design of a voice trigger detection system for smart speakers. In this study, we address two major challenges. The first is that the detectors are deployed in complex acoustic environments with external noise and loud playback by the device itself. The second is that collecting training examples for a specific keyword or trigger phrase is challenging, resulting in a scarcity of trigger-phrase-specific training data. We describe a two-stage cascaded architecture where a low-power detector is always running and listening for the trigger phrase. If a detection is made at this stage, the candidate audio segment is re-scored by larger, more complex models to verify that the segment contains the trigger phrase. In this study, we focus our attention on the architecture and design of these second-pass detectors. We start by training a general acoustic model that produces phonetic transcriptions given a large labelled training dataset. Next, we collect a much smaller dataset of examples that are challenging for the baseline system. We then use multi-task learning to train a model to simultaneously produce accurate phonetic transcriptions on the larger dataset \emph{and} discriminate between true and easily confusable examples using the smaller dataset. Our results demonstrate that the proposed model reduces errors by half compared to the baseline in a range of challenging test conditions \emph{without} requiring extra parameters. |
Tasks | Multi-Task Learning |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09519v1 |
https://arxiv.org/pdf/2001.09519v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-learning-for-voice-trigger |
Repo | |
Framework | |
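The two-stage design reduces to simple gating logic: the always-on low-power detector rejects most audio cheaply, and only surviving segments pay for the larger second-pass models. A schematic sketch with placeholder score functions standing in for the two models:

```python
def cascade(segment, cheap_score, expensive_score, t1=0.5, t2=0.9):
    """Run the expensive verifier only on segments the cheap detector flags."""
    if cheap_score(segment) < t1:
        return False                       # first pass: early, low-power reject
    return expensive_score(segment) >= t2  # second pass: verify the trigger phrase

# Placeholder scorers standing in for the low-power and second-pass models.
print(cascade("audio-segment", lambda s: 0.7, lambda s: 0.95))  # True
```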
MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers
Title | MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers |
Authors | Muhammad Raza Khan, Morteza Ziyadi, Mohamed AbdelHady |
Abstract | Conversational agents such as Cortana, Alexa and Siri are continuously working on increasing their capabilities by adding new domains. The support of a new domain includes the design and development of a number of NLU components for domain classification, intent classification and slot tagging (including named entity recognition). First, each component only performs well when trained on a large amount of labeled data. Second, these components are deployed on limited-memory devices, which requires some model compression. Third, for some domains such as the health domain, it is hard to find a single training dataset that covers all the required slot types. To overcome these problems, we present a multi-task transformer-based neural architecture for slot tagging. We consider the training of a slot tagger using multiple datasets covering different slot types as a multi-task learning problem. The experimental results on the biomedical domain show that the proposed approach outperforms the previous state-of-the-art systems for slot tagging on the different benchmark biomedical datasets in terms of (time and memory) efficiency and effectiveness. The output slot tagger can be used by the conversational agent to better identify entities in the input utterances. |
Tasks | Model Compression, Multi-Task Learning, Named Entity Recognition |
Published | 2020-01-24 |
URL | https://arxiv.org/abs/2001.08904v1 |
https://arxiv.org/pdf/2001.08904v1.pdf | |
PWC | https://paperswithcode.com/paper/mt-bioner-multi-task-learning-for-biomedical |
Repo | |
Framework | |
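One way to realize the shared-encoder, per-dataset-head design described above is sketched below; the tiny transformer encoder and head sizes are placeholders, not the paper's pretrained deep bidirectional transformer:

```python
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Shared encoder with one slot-tagging head per training dataset."""
    def __init__(self, vocab_size, dim, head_sizes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.heads = nn.ModuleDict(
            {name: nn.Linear(dim, n_slots) for name, n_slots in head_sizes.items()})

    def forward(self, tokens, dataset):
        return self.heads[dataset](self.encoder(self.embed(tokens)))

# Hypothetical datasets covering different slot types.
model = MultiTaskTagger(vocab_size=1000, dim=64,
                        head_sizes={"dataset_a": 5, "dataset_b": 9})
tokens = torch.randint(0, 1000, (2, 12))
print(model(tokens, "dataset_a").shape)  # (2, 12, 5) per-token slot logits
```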
Learning When and Where to Zoom with Deep Reinforcement Learning
Title | Learning When and Where to Zoom with Deep Reinforcement Learning |
Authors | Burak Uzkent, Stefano Ermon |
Abstract | While high-resolution images contain semantically more useful information than their lower-resolution counterparts, processing them is computationally more expensive, and in some applications, e.g. remote sensing, they can be much more expensive to acquire. For these reasons, it is desirable to develop an automatic method to selectively use high-resolution data when necessary while maintaining accuracy and reducing acquisition/run-time cost. In this direction, we propose PatchDrop, a reinforcement learning approach to dynamically identify when and where to use/acquire high-resolution data, conditioned on paired, cheap, low-resolution images. We conduct experiments on CIFAR10, CIFAR100, ImageNet and fMoW datasets, where we use significantly less high-resolution data while maintaining similar accuracy to models which use full high-resolution images. |
Tasks | |
Published | 2020-03-01 |
URL | https://arxiv.org/abs/2003.00425v1 |
https://arxiv.org/pdf/2003.00425v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-when-and-where-to-zoom-with-deep |
Repo | |
Framework | |
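A toy REINFORCE sketch of the sampling idea: a policy samples a binary mask over image patches and is rewarded for acquiring task-relevant patches minus a per-patch acquisition cost. The reward below is synthetic, standing in for the downstream classifier's accuracy:

```python
import torch

torch.manual_seed(0)
n_patches = 16
policy_logits = torch.zeros(n_patches, requires_grad=True)
opt = torch.optim.Adam([policy_logits], lr=0.1)
informative = torch.zeros(n_patches)
informative[:4] = 1.0  # pretend only the first 4 patches help the classifier

for _ in range(300):
    dist = torch.distributions.Bernoulli(logits=policy_logits)
    mask = dist.sample()                           # which patches to acquire in HR
    # Synthetic reward: benefit from informative patches minus acquisition cost.
    reward = (mask * informative).sum() - 0.2 * mask.sum()
    loss = -(dist.log_prob(mask).sum() * reward)   # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()

print(torch.sigmoid(policy_logits).round())  # ~1 on informative patches, ~0 elsewhere
```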
DeepFL-IQA: Weak Supervision for Deep IQA Feature Learning
Title | DeepFL-IQA: Weak Supervision for Deep IQA Feature Learning |
Authors | Hanhe Lin, Vlad Hosu, Dietmar Saupe |
Abstract | Multi-level deep features have been driving state-of-the-art methods for aesthetics and image quality assessment (IQA). However, most IQA benchmarks are comprised of artificially distorted images, for which features derived from ImageNet under-perform. We propose a new IQA dataset and a weakly supervised feature learning approach to train features more suitable for IQA of artificially distorted images. The dataset, KADIS-700k, is far more extensive than similar works, consisting of 140,000 pristine images and 25 distortion types, totaling 700k distorted versions. Our weakly supervised feature learning is designed as multi-task learning, using eleven existing full-reference IQA metrics as proxies for differential mean opinion scores. We also introduce a benchmark database, KADID-10k, of artificially degraded images, each subjectively annotated by 30 crowd workers. We make use of our derived image feature vectors for (no-reference) image quality assessment by training and testing a shallow regression network on this database and five other benchmark IQA databases. Our method, termed DeepFL-IQA, performs better than other feature-based no-reference IQA methods and also better than all tested full-reference IQA methods on KADID-10k. For the other five benchmark IQA databases, DeepFL-IQA matches the performance of the best existing end-to-end deep learning-based methods on average. |
Tasks | Image Quality Assessment, Multi-Task Learning, No-Reference Image Quality Assessment |
Published | 2020-01-20 |
URL | https://arxiv.org/abs/2001.08113v1 |
https://arxiv.org/pdf/2001.08113v1.pdf | |
PWC | https://paperswithcode.com/paper/deepfl-iqa-weak-supervision-for-deep-iqa |
Repo | |
Framework | |
On the Search for Feedback in Reinforcement Learning
Title | On the Search for Feedback in Reinforcement Learning |
Authors | Ran Wang, Karthikeya S. Parunandi, Dan Yu, Dileep Kalathil, Suman Chakravorty |
Abstract | This paper addresses the problem of learning the optimal feedback policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space and unknown dynamics. Feedback policies are complex objects that typically need a large-dimensional parametrization, which makes Reinforcement Learning algorithms that search for an optimum in this large parameter space sample-inefficient and subject to high variance. We propose a “decoupling” principle that drastically reduces the feedback parameter space while still remaining near-optimal to fourth order in a small noise parameter. Based on this principle, we propose a decoupled data-based control (D2C) algorithm that addresses the stochastic control problem: first, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a linear closed-loop control is developed around this nominal trajectory using only a simulation model. Empirical evidence suggests significant reduction in training time, as well as in training variance, compared to other state-of-the-art Reinforcement Learning algorithms. |
Tasks | |
Published | 2020-02-21 |
URL | https://arxiv.org/abs/2002.09478v1 |
https://arxiv.org/pdf/2002.09478v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-search-for-feedback-in-reinforcement |
Repo | |
Framework | |
Gradient Surgery for Multi-Task Learning
Title | Gradient Surgery for Multi-Task Learning |
Authors | Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn |
Abstract | While deep learning and deep reinforcement learning (RL) systems have demonstrated impressive results in domains such as image classification, game playing, and robotic control, data efficiency remains a major challenge. Multi-task learning has emerged as a promising approach for sharing structure across multiple tasks to enable more efficient learning. However, the multi-task setting presents a number of optimization challenges, making it difficult to realize large efficiency gains compared to learning tasks independently. The reasons why multi-task learning is so challenging compared to single-task learning are not fully understood. In this work, we identify a set of three conditions of the multi-task optimization landscape that cause detrimental gradient interference, and develop a simple yet general approach for avoiding such interference between task gradients. We propose a form of gradient surgery that projects a task’s gradient onto the normal plane of the gradient of any other task that has a conflicting gradient. On a series of challenging multi-task supervised and multi-task RL problems, this approach leads to substantial gains in efficiency and performance. Further, it is model-agnostic and can be combined with previously-proposed multi-task architectures for enhanced performance. |
Tasks | Image Classification, Multi-Task Learning |
Published | 2020-01-19 |
URL | https://arxiv.org/abs/2001.06782v1 |
https://arxiv.org/pdf/2001.06782v1.pdf | |
PWC | https://paperswithcode.com/paper/gradient-surgery-for-multi-task-learning-1 |
Repo | |
Framework | |
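The projection step is easy to state in isolation. A minimal NumPy sketch of the surgery on one pair of flattened task gradients (the full method applies this pairwise across all tasks):

```python
import numpy as np

def surgery(g_task, g_other):
    """Project g_task onto the normal plane of g_other if they conflict."""
    dot = g_task @ g_other
    if dot < 0:  # negative cosine similarity: conflicting gradients
        g_task = g_task - (dot / (g_other @ g_other)) * g_other
    return g_task

g1 = np.array([1.0, 0.5])
g2 = np.array([-1.0, 0.2])
print(surgery(g1, g2))       # [0.1346..., 0.6730...]
print(surgery(g1, g2) @ g2)  # ~0: the projected gradient no longer conflicts
```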