April 2, 2020

Paper Group ANR 203

EmpTransfo: A Multi-head Transformer Architecture for Creating Empathetic Dialog Systems

Title EmpTransfo: A Multi-head Transformer Architecture for Creating Empathetic Dialog Systems
Authors Rohola Zandie, Mohammad H. Mahoor
Abstract Understanding emotions and responding accordingly is one of the biggest challenges of dialog systems. This paper presents EmpTransfo, a multi-head Transformer architecture for creating an empathetic dialog system. EmpTransfo utilizes state-of-the-art pre-trained models (e.g., OpenAI-GPT) for language generation, though models with different sizes can be used. We show that utilizing the history of emotions and other metadata can improve the quality of generated conversations by the dialog system. Our experimental results using a challenging language corpus show that the proposed approach outperforms other models in terms of Hit@1 and PPL (Perplexity).
Tasks Text Generation
Published 2020-03-05
URL https://arxiv.org/abs/2003.02958v1
PDF https://arxiv.org/pdf/2003.02958v1.pdf
PWC https://paperswithcode.com/paper/emptransfo-a-multi-head-transformer
Repo
Framework

Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies

Title Parameterizing Branch-and-Bound Search Trees to Learn Branching Policies
Authors Giulia Zarpellon, Jason Jo, Andrea Lodi, Yoshua Bengio
Abstract Branch and Bound (B&B) is the exact tree search method typically used to solve Mixed-Integer Linear Programming problems (MILPs). Learning branching policies for MILP has become an active research area, with most works proposing to imitate the strong branching rule and specialize it to distinct classes of problems. We aim instead at learning a policy that generalizes across heterogeneous MILPs: our main hypothesis is that parameterizing the state of the B&B search tree can significantly aid this type of generalization. We propose a novel imitation learning framework, and introduce new input features and architectures to represent branching. Experiments on MILP benchmark instances clearly show the advantages of incorporating into a baseline model an explicit parameterization of the state of the search tree to modulate the branching decisions. The resulting policy reaches higher accuracy than the baseline, and on average explores smaller B&B trees, while effectively allowing generalization to generic unseen instances.
Tasks Imitation Learning
Published 2020-02-12
URL https://arxiv.org/abs/2002.05120v1
PDF https://arxiv.org/pdf/2002.05120v1.pdf
PWC https://paperswithcode.com/paper/parameterizing-branch-and-bound-search-trees
Repo
Framework

Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning

Title Accelerating Reinforcement Learning for Reaching using Continuous Curriculum Learning
Authors Sha Luo, Hamidreza Kasaei, Lambert Schomaker
Abstract Reinforcement learning has shown great promise in the training of robot behavior owing to its sequential decision-making nature. However, the enormous amount of interactive and informative training data it requires is the major stumbling block for progress. In this study, we focus on accelerating reinforcement learning (RL) training and improving the performance of multi-goal reaching tasks. Specifically, we propose a precision-based continuous curriculum learning (PCCL) method in which the requirements are gradually adjusted during the training process, instead of fixing the parameter in a static schedule. To this end, we explore various continuous curriculum strategies for controlling a training process. This approach is tested using a Universal Robot 5e in both simulation and real-world multi-goal reach experiments. Experimental results support the hypothesis that a static training schedule is suboptimal, and that using an appropriate decay function for curriculum learning provides superior results faster.
Tasks Decision Making
Published 2020-02-07
URL https://arxiv.org/abs/2002.02697v1
PDF https://arxiv.org/pdf/2002.02697v1.pdf
PWC https://paperswithcode.com/paper/accelerating-reinforcement-learning-for
Repo
Framework
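
The abstract above describes requirements that are adjusted continuously during training through a decay function, but it does not give the functions themselves. As a rough sketch only, the snippet below assumes the "precision" is a distance tolerance for counting a reach as successful, and shows two hypothetical decay schedules (linear and exponential) that tighten it over training. All names and numbers are illustrative and not taken from the paper.

```python
def _progress(step, total_steps):
    """Fraction of training completed, clamped to [0, 1]."""
    return min(max(step / total_steps, 0.0), 1.0)


def linear_decay(step, total_steps, start, end):
    """Tolerance moves from `start` to `end` in equal increments."""
    return start + (end - start) * _progress(step, total_steps)


def exponential_decay(step, total_steps, start, end):
    """Tolerance shrinks by a constant ratio per step (start, end > 0)."""
    return start * (end / start) ** _progress(step, total_steps)


def reached(distance_to_goal, tolerance):
    """A reach counts as successful once the end effector is within tolerance."""
    return distance_to_goal <= tolerance


if __name__ == "__main__":
    # Hypothetical values: tolerance tightens from 20 cm to 1 cm over 100 steps.
    for step in (0, 25, 50, 75, 100):
        lin = linear_decay(step, 100, 0.20, 0.01)
        exp = exponential_decay(step, 100, 0.20, 0.01)
        print(f"step {step:3d}: linear={lin:.4f} m, exponential={exp:.4f} m")
```

A static schedule would correspond to returning `end` at every step. The continuous variants start with an easy target and only reach the final precision at the end of training.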

Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks

Title Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks
Authors Sourjya Roy, Priyadarshini Panda, Gopalakrishnan Srinivasan, Anand Raghunathan
Abstract Modern deep networks have millions to billions of parameters, which leads to high memory and energy requirements during training as well as during inference on resource-constrained edge devices. Consequently, pruning techniques have been proposed that remove less significant weights in deep networks, thereby reducing their memory and computational requirements. Pruning is usually performed after training the original network, and is followed by further retraining to compensate for the accuracy loss incurred during pruning. The prune-and-retrain procedure is repeated iteratively until an optimum tradeoff between accuracy and efficiency is reached. However, such iterative retraining adds to the overall training complexity of the network. In this work, we propose a dynamic pruning-while-training procedure, wherein we prune filters of the convolutional layers of a deep network during training itself, thereby precluding the need for separate retraining. We evaluate our dynamic pruning-while-training approach with three different pre-existing pruning strategies, viz. mean activation-based pruning, random pruning, and L1 normalization-based pruning. Our results for VGG-16 trained on CIFAR10 show that L1 normalization provides the best performance among all the techniques explored in this work, with less than 1% drop in accuracy after pruning 80% of the filters compared to the original network. We further evaluated the L1 normalization-based pruning mechanism on CIFAR100. Results indicate that pruning while training yields a compressed network with almost no accuracy loss after pruning 50% of the filters compared to the original network and ~5% loss for high pruning rates (>80%). The proposed pruning methodology yields 41% reduction in the number of computations and memory accesses during training for CIFAR10, CIFAR100 and ImageNet compared to training with retraining for 10 epochs.
Tasks
Published 2020-03-05
URL https://arxiv.org/abs/2003.02800v1
PDF https://arxiv.org/pdf/2003.02800v1.pdf
PWC https://paperswithcode.com/paper/pruning-filters-while-training-for
Repo
Framework
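
To make the L1-based criterion mentioned in the abstract above more concrete, here is a minimal sketch that ranks the filters of one layer by the L1 norm of their weights and selects the weakest fraction for removal. It assumes "L1 normalization-based pruning" refers to this common L1-norm ranking; the paper's exact criterion and its schedule for pruning during training are not given in the abstract. Function names and the toy weights are hypothetical.

```python
def l1_norm(filter_weights):
    """Sum of absolute values of one filter's (flattened) weights."""
    return sum(abs(w) for w in filter_weights)


def filters_to_prune(layer_filters, prune_fraction):
    """Return the indices of the filters with the smallest L1 norms.

    layer_filters: list of filters, each a flat list of weights.
    prune_fraction: share of filters to remove, between 0 and 1.
    """
    n_prune = int(len(layer_filters) * prune_fraction)
    ranked = sorted(range(len(layer_filters)),
                    key=lambda i: l1_norm(layer_filters[i]))
    return sorted(ranked[:n_prune])


if __name__ == "__main__":
    # Toy layer with four filters of two weights each (illustrative values).
    layer = [[0.5, -0.5], [0.01, 0.02], [1.0, 2.0], [-0.1, 0.1]]
    print(filters_to_prune(layer, 0.5))  # [1, 3]
```

In a pruning-while-training setup, a selection step like this would be called at chosen points during training rather than once after convergence.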

Long-tail Visual Relationship Recognition with a Visiolinguistic Hubless Loss

Title Long-tail Visual Relationship Recognition with a Visiolinguistic Hubless Loss
Authors Sherif Abdelkarim, Panos Achlioptas, Jiaji Huang, Boyang Li, Kenneth Church, Mohamed Elhoseiny
Abstract Scaling up the vocabulary and complexity of current visual understanding systems is necessary in order to bridge the gap between human and machine visual intelligence. However, a crucial impediment to this end lies in the difficulty of generalizing to data distributions that come from real-world scenarios. Typically such distributions follow Zipf’s law, which states that only a small portion of the collected object classes will have abundant examples (head), while most classes will contain just a few (tail). In this paper, we propose to study a novel task concerning the generalization of visual relationships that are on the distribution’s tail, i.e. we investigate how to help AI systems to better recognize rare relationships like <S:dog, P:riding, O:horse>, where the subject S, predicate P, and/or the object O come from the tail of the corresponding distributions. To achieve this goal, we first introduce two large-scale visual-relationship detection benchmarks built upon the widely used Visual Genome and GQA datasets. We also propose an intuitive evaluation protocol that gives credit to classifiers that prefer concepts that are semantically close to the ground truth class according to WordNet- or word2vec-induced metrics. Finally, we introduce a visiolinguistic version of a Hubless loss which, as we show experimentally, consistently encourages classifiers to be more predictive of the tail classes while still being accurate on head classes. Our code and models are available at http://bit.ly/LTVRR.
Tasks
Published 2020-03-25
URL https://arxiv.org/abs/2004.00436v1
PDF https://arxiv.org/pdf/2004.00436v1.pdf
PWC https://paperswithcode.com/paper/long-tail-visual-relationship-recognition
Repo
Framework

Sharp Analysis of Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization

Title Sharp Analysis of Epoch Stochastic Gradient Descent Ascent Methods for Min-Max Optimization
Authors Yan Yan, Yi Xu, Qihang Lin, Wei Liu, Tianbao Yang
Abstract The epoch gradient descent method (a.k.a. Epoch-GD) proposed by Hazan and Kale (2011) was deemed a breakthrough for stochastic strongly convex minimization, which achieves the optimal convergence rate of O(1/T) with T iterative updates for the objective gap. However, its extension to solving stochastic min-max problems with strong convexity and strong concavity still remains open, and it is still unclear whether a fast rate of O(1/T) for the duality gap is achievable for stochastic min-max optimization under strong convexity and strong concavity. Although some recent studies have proposed stochastic algorithms with fast convergence rates for min-max problems, they require additional assumptions about the problem, e.g., smoothness, bi-linear structure, etc. In this paper, we bridge this gap by providing a sharp analysis of the epoch-wise stochastic gradient descent ascent method (referred to as Epoch-GDA) for solving strongly convex strongly concave (SCSC) min-max problems, without imposing any additional assumptions about smoothness or its structure. To the best of our knowledge, our result is the first one that shows Epoch-GDA can achieve the fast rate of O(1/T) for the duality gap of general SCSC min-max problems. We emphasize that such generalization of Epoch-GD for strongly convex minimization problems to Epoch-GDA for SCSC min-max problems is non-trivial and requires novel technical analysis. Moreover, we notice that the key lemma can also be used for proving the convergence of Epoch-GDA for weakly-convex strongly-concave min-max problems, leading to the best complexity as well without smoothness or other structural conditions.
Tasks
Published 2020-02-13
URL https://arxiv.org/abs/2002.05309v1
PDF https://arxiv.org/pdf/2002.05309v1.pdf
PWC https://paperswithcode.com/paper/sharp-analysis-of-epoch-stochastic-gradient
Repo
Framework
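
As an informal illustration of the epoch-wise descent-ascent idea in the abstract above, the sketch below runs stochastic gradient descent on x and ascent on y for the toy SCSC objective f(x, y) = 0.5x^2 + xy - 0.5y^2, whose saddle point is (0, 0). Each epoch restarts from the average of the previous epoch's iterates, halves the step size and doubles the epoch length. That schedule is borrowed from the usual presentation of Epoch-GD and is an assumption here; the abstract does not spell out the algorithm. The projection step and all constants are simplified or hypothetical.

```python
import random


def epoch_gda(x0, y0, eta0=0.5, t0=8, epochs=8, noise=0.1, seed=0):
    """Epoch-wise stochastic gradient descent ascent on a toy SCSC problem.

    Objective: f(x, y) = 0.5*x**2 + x*y - 0.5*y**2 (min over x, max over y).
    Gradients are perturbed with Gaussian noise to mimic a stochastic oracle.
    """
    rng = random.Random(seed)
    x, y, eta, t = x0, y0, eta0, t0
    for _ in range(epochs):
        cx, cy = x, y            # inner iterates start at the epoch's restart point
        sum_x = sum_y = 0.0
        for _ in range(t):
            gx = cx + cy + rng.gauss(0.0, noise)   # noisy df/dx
            gy = cx - cy + rng.gauss(0.0, noise)   # noisy df/dy
            # Simultaneous update: descend in x, ascend in y.
            cx, cy = cx - eta * gx, cy + eta * gy
            sum_x += cx
            sum_y += cy
        x, y = sum_x / t, sum_y / t   # restart from the averaged iterate
        eta /= 2.0                    # halve the step size
        t *= 2                        # double the epoch length
    return x, y


if __name__ == "__main__":
    x, y = epoch_gda(2.0, -1.0)
    print(f"approximate saddle point: x={x:.4f}, y={y:.4f}")
```

The toy objective is smooth, so it does not exercise the non-smooth setting that the analysis targets. It only shows the control flow of the epoch scheme.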

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search

Title AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Authors Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou
Abstract Large pre-trained language models such as BERT have shown their effectiveness in various natural language processing tasks. However, the huge parameter size makes them difficult to deploy in real-time applications that require quick inference with limited resources. Existing methods compress BERT into small models, but such compression is task-independent, i.e., the same compressed BERT is used for all downstream tasks. Motivated by the necessity and benefits of task-oriented BERT compression, we propose a novel compression method, AdaBERT, that leverages differentiable Neural Architecture Search to automatically compress BERT into task-adaptive small models for specific tasks. We incorporate a task-oriented knowledge distillation loss to provide search hints and an efficiency-aware loss as search constraints, which enables a good trade-off between efficiency and effectiveness for task-adaptive BERT compression. We evaluate AdaBERT on several NLP tasks, and the results demonstrate that those task-adaptive compressed models are 12.7x to 29.3x faster than BERT in inference time and 11.5x to 17.0x smaller in terms of parameter size, while comparable performance is maintained.
Tasks Neural Architecture Search
Published 2020-01-13
URL https://arxiv.org/abs/2001.04246v1
PDF https://arxiv.org/pdf/2001.04246v1.pdf
PWC https://paperswithcode.com/paper/adabert-task-adaptive-bert-compression-with
Repo
Framework

Gamma-Reward: A Novel Multi-Agent Reinforcement Learning Method for Traffic Signal Control

Title Gamma-Reward: A Novel Multi-Agent Reinforcement Learning Method for Traffic Signal Control
Authors Junjia Liu, Huimin Zhang, Zhuang Fu, Yao Wang
Abstract The intelligent control of traffic signals is critical to the optimization of transportation systems. To solve the problem in large-scale road networks, recent research has focused on interactions among intersections, which have shown promising results. However, existing studies pay more attention to the sensation sharing among agents and do not consider the results after taking each action. In this paper, we propose a novel multi-agent interaction mechanism, defined as Gamma-Reward, that includes both the original Gamma-Reward and Gamma-Attention-Reward, which use the space-time information in the replay buffer to amend the reward of each action, for traffic signal control based on deep reinforcement learning. We give a detailed theoretical foundation and prove that the proposed method can converge to a Nash Equilibrium. By extending the idea of a Markov Chain to the road network, this interaction mechanism replaces the graph attention method and realizes the decoupling of the road network, which is more in line with practical applications. Simulation and experiment results demonstrate that the proposed model can achieve better performance than previous studies by amending the reward. To the best of our knowledge, our work appears to be the first to treat the road network itself as a Markov Chain.
Tasks Multi-agent Reinforcement Learning
Published 2020-02-27
URL https://arxiv.org/abs/2002.11874v1
PDF https://arxiv.org/pdf/2002.11874v1.pdf
PWC https://paperswithcode.com/paper/gamma-reward-a-novel-multi-agent
Repo
Framework

Dataset Cleaning – A Cross Validation Methodology for Large Facial Datasets using Face Recognition

Title Dataset Cleaning – A Cross Validation Methodology for Large Facial Datasets using Face Recognition
Authors Viktor Varkarakis, Peter Corcoran
Abstract In recent years, large “in the wild” face datasets have been released in an attempt to facilitate progress in tasks such as face detection, face recognition, and other tasks. Most of these datasets are acquired from webpages with automatic procedures. As a consequence, noisy data are often found. Furthermore, in these large face datasets, the annotation of identities is important as they are used for training face recognition algorithms. But due to the automatic way of gathering these datasets and due to their large size, many identity folders contain mislabeled samples, which deteriorates the quality of the datasets. In this work, a semi-automatic method is presented for cleaning noisy large face datasets with the use of face recognition. This methodology is applied to clean the CelebA dataset and shown to be effective. Furthermore, the list of mislabeled samples in the CelebA dataset is made available.
Tasks Face Detection, Face Recognition
Published 2020-03-24
URL https://arxiv.org/abs/2003.10815v1
PDF https://arxiv.org/pdf/2003.10815v1.pdf
PWC https://paperswithcode.com/paper/dataset-cleaning-a-cross-validation
Repo
Framework

Long Short-Term Sample Distillation

Title Long Short-Term Sample Distillation
Authors Liang Jiang, Zujie Wen, Zhongping Liang, Yafang Wang, Gerard de Melo, Zhe Li, Liangzhuang Ma, Jiaxing Zhang, Xiaolong Li, Yuan Qi
Abstract In the past decade, there has been substantial progress in training increasingly deep neural networks. Recent advances within the teacher–student training paradigm have established that information about past training updates shows promise as a source of guidance during subsequent training steps. Based on this notion, in this paper, we propose Long Short-Term Sample Distillation, a novel training policy that simultaneously leverages multiple phases of the previous training process to guide the later training updates to a neural network, while efficiently proceeding in just one single generation pass. With Long Short-Term Sample Distillation, the supervision signal for each sample is decomposed into two parts: a long-term signal and a short-term one. The long-term teacher draws on snapshots from several epochs ago in order to provide steadfast guidance and to guarantee teacher–student differences, while the short-term one yields more up-to-date cues with the goal of enabling higher-quality updates. Moreover, the teachers for each sample are unique, such that, overall, the model learns from a very diverse set of teachers. Comprehensive experimental results across a range of vision and NLP tasks demonstrate the effectiveness of this new training method.
Tasks
Published 2020-03-02
URL https://arxiv.org/abs/2003.00739v1
PDF https://arxiv.org/pdf/2003.00739v1.pdf
PWC https://paperswithcode.com/paper/long-short-term-sample-distillation
Repo
Framework

KPNet: Towards Minimal Face Detector

Title KPNet: Towards Minimal Face Detector
Authors Guanglu Song, Yu Liu, Yuhang Zang, Xiaogang Wang, Biao Leng, Qingsheng Yuan
Abstract The small receptive field and capacity of minimal neural networks limit their performance when they are used as the backbone of detectors. In this work, we find that the appearance feature of a generic face is discriminative enough for a tiny and shallow neural network to distinguish it from the background. The essential barriers are 1) the vague definition of the face bounding box and 2) the tricky design of anchor boxes or receptive fields. Unlike most top-down methods for joint face detection and alignment, the proposed KPNet detects small facial keypoints instead of the whole face, in a bottom-up manner. It first predicts the facial landmarks from a low-resolution image via the well-designed fine-grained scale approximation and scale adaptive soft-argmax operator. Finally, the precise face bounding boxes, no matter how we define them, can be inferred from the keypoints. Without any complex head architecture or meticulous network designing, KPNet achieves state-of-the-art accuracy on generic face detection and alignment benchmarks with only ~1M parameters, runs at 1000 fps on a GPU, and can easily run in real time on most modern front-end chips.
Tasks Face Detection
Published 2020-03-17
URL https://arxiv.org/abs/2003.07543v1
PDF https://arxiv.org/pdf/2003.07543v1.pdf
PWC https://paperswithcode.com/paper/kpnet-towards-minimal-face-detector
Repo
Framework
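
The abstract above mentions a scale adaptive soft-argmax operator without defining it. For orientation, the sketch below shows only the standard soft-argmax on a 2-D heatmap: a softmax over all cells followed by the expected column and row index, which gives a differentiable, sub-pixel keypoint estimate. The scale-adaptive part is not reproduced, and the `beta` sharpness parameter is an assumption of this sketch.

```python
import math


def soft_argmax_2d(heatmap, beta=1.0):
    """Return the (x, y) expectation of cell coordinates under softmax(beta * heatmap).

    heatmap: list of rows, each a list of floats. x is the column index and
    y is the row index. A larger beta pushes the result toward the hard argmax.
    """
    peak = max(max(row) for row in heatmap)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


if __name__ == "__main__":
    heatmap = [
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 0.0],
    ]
    print(soft_argmax_2d(heatmap, beta=1.0))   # pulled toward the centre
    print(soft_argmax_2d(heatmap, beta=50.0))  # close to the peak at x=2, y=1
```

Because the output is an expectation rather than an index, it can fall between grid cells. This is one reason such operators are used on low-resolution inputs.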

Deep Multi-task Multi-label CNN for Effective Facial Attribute Classification

Title Deep Multi-task Multi-label CNN for Effective Facial Attribute Classification
Authors Longbiao Mao, Yan Yan, Jing-Hao Xue, Hanzi Wang
Abstract Facial Attribute Classification (FAC) has attracted increasing attention in computer vision and pattern recognition. However, state-of-the-art FAC methods perform face detection/alignment and FAC independently. The inherent dependencies between these tasks are not fully exploited. In addition, most methods predict all facial attributes using the same CNN network architecture, which ignores the different learning complexities of facial attributes. To address the above problems, we propose a novel deep multi-task multi-label CNN, termed DMM-CNN, for effective FAC. Specifically, DMM-CNN jointly optimizes two closely-related tasks (i.e., facial landmark detection and FAC) to improve the performance of FAC by taking advantage of multi-task learning. To deal with the diverse learning complexities of facial attributes, we divide the attributes into two groups: objective attributes and subjective attributes. Two different network architectures are respectively designed to extract features for two groups of attributes, and a novel dynamic weighting scheme is proposed to automatically assign the loss weight to each facial attribute during training. Furthermore, an adaptive thresholding strategy is developed to effectively alleviate the problem of class imbalance for multi-label learning. Experimental results on the challenging CelebA and LFWA datasets show the superiority of the proposed DMM-CNN method compared with several state-of-the-art FAC methods.
Tasks Face Detection, Facial Attribute Classification, Facial Landmark Detection, Multi-Label Learning, Multi-Task Learning
Published 2020-02-10
URL https://arxiv.org/abs/2002.03683v1
PDF https://arxiv.org/pdf/2002.03683v1.pdf
PWC https://paperswithcode.com/paper/deep-multi-task-multi-label-cnn-for-effective
Repo
Framework

Global Texture Enhancement for Fake Face Detection in the Wild

Title Global Texture Enhancement for Fake Face Detection in the Wild
Authors Zhengzhe Liu, Xiaojuan Qi, Philip Torr
Abstract Generative Adversarial Networks (GANs) can generate realistic fake face images that can easily fool human beings. On the contrary, a common Convolutional Neural Network (CNN) discriminator can achieve more than 99.9% accuracy in discerning fake/real images. In this paper, we conduct an empirical study on fake/real faces, and have two important observations: firstly, the texture of fake faces is substantially different from real ones; secondly, global texture statistics are more robust to image editing and transferable to fake faces from different GANs and datasets. Motivated by the above observations, we propose a new architecture coined as Gram-Net, which leverages global image texture representations for robust fake image detection. Experimental results on several datasets demonstrate that our Gram-Net outperforms existing approaches. In particular, our Gram-Net is more robust to image editing, e.g. down-sampling, JPEG compression, blur, and noise. More importantly, our Gram-Net generalizes significantly better in detecting fake faces from GAN models not seen in the training phase and can perform decently in detecting fake natural images.
Tasks Face Detection, Fake Image Detection
Published 2020-02-01
URL https://arxiv.org/abs/2002.00133v3
PDF https://arxiv.org/pdf/2002.00133v3.pdf
PWC https://paperswithcode.com/paper/global-texture-enhancement-for-fake-face
Repo
Framework
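
The abstract above refers to global image texture representations, and the name Gram-Net suggests Gram matrices of feature maps, a common texture descriptor. The abstract does not confirm how the network computes or uses them, so the following is only a sketch of a plain Gram matrix over flattened feature channels. The function name and the normalization by the number of spatial positions are assumptions.

```python
def gram_matrix(features):
    """Channel-by-channel Gram matrix of a feature map.

    features: list of C channels, each a flat list of N spatial activations.
    Entry (i, j) is the inner product of channels i and j divided by N, so it
    summarizes which channels co-activate regardless of where in the image.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(ch_i, ch_j)) / n for ch_j in features]
            for ch_i in features]


if __name__ == "__main__":
    # Two channels with two spatial positions each (toy values).
    feats = [[1.0, 2.0], [3.0, 4.0]]
    for row in gram_matrix(feats):
        print(row)
    # [2.5, 5.5]
    # [5.5, 12.5]
```

Since spatial positions are summed out, the descriptor is global: permuting pixel locations leaves it unchanged, which fits the abstract's emphasis on texture statistics over local structure.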

Improved Gradient based Adversarial Attacks for Quantized Networks

Title Improved Gradient based Adversarial Attacks for Quantized Networks
Authors Kartik Gupta, Thalaiyasingam Ajanthan
Abstract Neural network quantization has become increasingly popular due to efficient memory consumption and faster computation resulting from bitwise operations on the quantized networks. Even though they exhibit excellent generalization capabilities, their robustness properties are not well understood. In this work, we systematically study the robustness of quantized networks against gradient-based adversarial attacks and demonstrate that these quantized models suffer from gradient vanishing issues and show a false sense of security. By attributing gradient vanishing to poor forward-backward signal propagation in the trained network, we introduce a simple temperature scaling approach to mitigate this issue while preserving the decision boundary. Despite being a simple modification to existing gradient-based adversarial attacks, experiments on CIFAR-10/100 datasets with VGG-16 and ResNet-18 networks demonstrate that our temperature scaled attacks obtain near-perfect success rate on quantized networks while outperforming original attacks on adversarially trained models as well as floating-point networks.
Tasks Quantization
Published 2020-03-30
URL https://arxiv.org/abs/2003.13511v1
PDF https://arxiv.org/pdf/2003.13511v1.pdf
PWC https://paperswithcode.com/paper/improved-gradient-based-adversarial-attacks
Repo
Framework
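
The claim in the abstract above that temperature scaling can restore vanishing gradients "while preserving the decision boundary" can be illustrated with a toy softmax. Dividing logits by a positive temperature leaves the argmax unchanged but de-saturates the softmax, so the cross-entropy gradient with respect to the logits is no longer numerically zero. The sketch below is a generic illustration with made-up logits; how the paper chooses the temperature is not stated in the abstract.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed stably."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ce_grad_wrt_logits(logits, label, temperature=1.0):
    """Gradient of cross-entropy(softmax(logits / T), label) w.r.t. the logits."""
    probs = softmax(logits, temperature)
    return [(p - (1.0 if i == label else 0.0)) / temperature
            for i, p in enumerate(probs)]


if __name__ == "__main__":
    logits = [40.0, 0.0, -5.0]   # hypothetical, heavily saturated output
    for temp in (1.0, 20.0):
        probs = softmax(logits, temp)
        grad = ce_grad_wrt_logits(logits, label=0, temperature=temp)
        print(f"T={temp}: argmax={probs.index(max(probs))}, "
              f"max |grad|={max(abs(g) for g in grad):.3e}")
```

A gradient-based attack follows this gradient back to the input, so a signal that underflows at T=1 gives the attack nothing to work with even though the model may still be vulnerable.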

GANHopper: Multi-Hop GAN for Unsupervised Image-to-Image Translation

Title GANHopper: Multi-Hop GAN for Unsupervised Image-to-Image Translation
Authors Wallace Lira, Johannes Merz, Daniel Ritchie, Daniel Cohen-Or, Hao Zhang
Abstract We introduce GANHopper, an unsupervised image-to-image translation network that transforms images gradually between two domains, through multiple hops. Instead of executing translation directly, we steer the translation by requiring the network to produce in-between images which resemble weighted hybrids between images from the two input domains. Our network is trained on unpaired images from the two domains only, without any in-between images. All hops are produced using a single generator along each direction. In addition to the standard cycle-consistency and adversarial losses, we introduce a new hybrid discriminator, which is trained to classify the intermediate images produced by the generator as weighted hybrids, with weights based on a predetermined hop count. We also introduce a smoothness term to constrain the magnitude of each hop, further regularizing the translation. Compared to previous methods, GANHopper excels at image translations involving domain-specific image features and geometric variations while also preserving non-domain-specific features such as backgrounds and general color schemes.
Tasks Image-to-Image Translation, Unsupervised Image-To-Image Translation
Published 2020-02-24
URL https://arxiv.org/abs/2002.10102v2
PDF https://arxiv.org/pdf/2002.10102v2.pdf
PWC https://paperswithcode.com/paper/ganhopper-multi-hop-gan-for-unsupervised
Repo
Framework