April 3, 2020

# Paper Group AWR 24

Advbox: a toolbox to generate adversarial examples that fool neural networks. On the Learning Property of Logistic and Softmax Losses for Deep Neural Networks. Inductive Document Network Embedding with Topic-Word Attention. Explaining Away Attacks Against Neural Networks. Predicting Many Properties of a Quantum System from Very Few Measurements. Mu …

#### Advbox: a toolbox to generate adversarial examples that fool neural networks

Title Advbox: a toolbox to generate adversarial examples that fool neural networks
Authors Dou Goodman, Hao Xin, Wang Yang, Wu Yuesheng, Xiong Junfeng, Zhang Huan
Published 2020-01-13
URL https://arxiv.org/abs/2001.05574v4
PDF https://arxiv.org/pdf/2001.05574v4.pdf
Framework tf

#### On the Learning Property of Logistic and Softmax Losses for Deep Neural Networks

Title On the Learning Property of Logistic and Softmax Losses for Deep Neural Networks
Authors Xiangrui Li, Xin Li, Deng Pan, Dongxiao Zhu
Abstract Deep convolutional neural networks (CNNs) trained with logistic and softmax losses have made significant advances in visual recognition tasks in computer vision. When training data exhibit class imbalances, the class-wise reweighted versions of logistic and softmax losses are often used to boost performance over the unweighted versions. In this paper, motivated to explain the reweighting mechanism, we explicate the learning property of those two loss functions by analyzing the necessary condition (e.g., the gradient equals zero) after training CNNs to converge to a local minimum. The analysis immediately provides us explanations for understanding (1) quantitative effects of the class-wise reweighting mechanism: deterministic effectiveness for binary classification using logistic loss yet indeterministic for multi-class classification using softmax loss; (2) the disadvantage of logistic loss for single-label multi-class classification via the one-vs.-all approach, which is due to the averaging effect on predicted probabilities for the negative class (e.g., non-target classes) in the learning process. With the disadvantage and advantage of logistic loss disentangled, we thereafter propose a novel reweighted logistic loss for multi-class classification. Our simple yet effective formulation improves ordinary logistic loss by focusing on learning hard non-target classes (target vs. non-target class in one-vs.-all) and turns out to be competitive with softmax loss. We evaluate our method on several benchmark datasets to demonstrate its effectiveness.
Published 2020-03-04
URL https://arxiv.org/abs/2003.02309v1
PDF https://arxiv.org/pdf/2003.02309v1.pdf
PWC https://paperswithcode.com/paper/on-the-learning-property-of-logistic-and
Repo https://github.com/Dichoto/LGL-INR
Framework pytorch
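
The class-wise reweighting discussed in the abstract above can be illustrated with a minimal pure-Python sketch. This shows only the standard weighted logistic and softmax losses, not the paper's proposed reweighted logistic loss; the function names and the `class_weights`, `pos_weight`, and `neg_weight` parameters are assumptions made for illustration.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_softmax_loss(logits, target, class_weights):
    # Cross-entropy for one sample, scaled by the weight of its true class.
    probs = softmax(logits)
    return -class_weights[target] * math.log(probs[target])

def weighted_logistic_loss(logit, label, pos_weight, neg_weight):
    # Binary logistic loss with separate weights for positives and negatives.
    p = 1.0 / (1.0 + math.exp(-logit))
    if label == 1:
        return -pos_weight * math.log(p)
    return -neg_weight * math.log(1.0 - p)
```

With all weights set to 1, both functions reduce to the ordinary unweighted losses; raising the weight of a rare class increases the contribution of its samples to the total loss.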

#### Inductive Document Network Embedding with Topic-Word Attention

Title Inductive Document Network Embedding with Topic-Word Attention
Authors Robin Brochier, Adrien Guille, Julien Velcin
Abstract Document network embedding aims at learning representations for a structured text corpus i.e. when documents are linked to each other. Recent algorithms extend network embedding approaches by incorporating the text content associated with the nodes in their formulations. In most cases, it is hard to interpret the learned representations. Moreover, little importance is given to the generalization to new documents that are not observed within the network. In this paper, we propose an interpretable and inductive document network embedding method. We introduce a novel mechanism, the Topic-Word Attention (TWA), that generates document representations based on the interplay between word and topic representations. We train these word and topic vectors through our general model, Inductive Document Network Embedding (IDNE), by leveraging the connections in the document network. Quantitative evaluations show that our approach achieves state-of-the-art performance on various networks and we qualitatively show that our model produces meaningful and interpretable representations of the words, topics and documents.
Published 2020-01-10
URL https://arxiv.org/abs/2001.03369v1
PDF https://arxiv.org/pdf/2001.03369v1.pdf
PWC https://paperswithcode.com/paper/inductive-document-network-embedding-with
Repo https://github.com/brochier/idne
Framework none

#### Explaining Away Attacks Against Neural Networks

Title Explaining Away Attacks Against Neural Networks
Authors Sean Saito, Jin Wang
Abstract We investigate the problem of identifying adversarial attacks on image-based neural networks. We present intriguing experimental results showing significant discrepancies between the explanations generated for the predictions of a model on clean and adversarial data. Utilizing this intuition, we propose a framework which can identify whether a given input is adversarial based on the explanations given by the model. Code for our experiments can be found here: https://github.com/seansaito/Explaining-Away-Attacks-Against-Neural-Networks.
Published 2020-03-06
URL https://arxiv.org/abs/2003.05748v1
PDF https://arxiv.org/pdf/2003.05748v1.pdf
PWC https://paperswithcode.com/paper/explaining-away-attacks-against-neural
Repo https://github.com/seansaito/Explaining-Away-Attacks-Against-Neural-Networks
Framework pytorch

#### Predicting Many Properties of a Quantum System from Very Few Measurements

Title Predicting Many Properties of a Quantum System from Very Few Measurements
Authors Hsin-Yuan Huang, Richard Kueng, John Preskill
Abstract Predicting properties of complex, large-scale quantum systems is essential for developing quantum technologies. We present an efficient method for constructing an approximate classical description of a quantum state using very few measurements of the state. This description, called a classical shadow, can be used to predict many different properties: order $\log M$ measurements suffice to accurately predict $M$ different functions of the state with high success probability. The number of measurements is independent of the system size, and saturates information-theoretic lower bounds. Moreover, target properties to predict can be selected after the measurements are completed. We support our theoretical findings with extensive numerical experiments. We apply classical shadows to predict quantum fidelities, entanglement entropies, two-point correlation functions, expectation values of local observables, and the energy variance of many-body local Hamiltonians, which allows applications to speedup variational quantum algorithms. The numerical results highlight the advantages of classical shadows relative to previously known methods.
Published 2020-02-18
URL https://arxiv.org/abs/2002.08953v1
PDF https://arxiv.org/pdf/2002.08953v1.pdf
PWC https://paperswithcode.com/paper/predicting-many-properties-of-a-quantum
Repo https://github.com/momohuang/predicting-quantum-properties
Framework none

#### Multitask Emotion Recognition with Incomplete Labels

Title Multitask Emotion Recognition with Incomplete Labels
Authors Didan Deng, Zhaokang Chen, Bertram E. Shi
Abstract We train a unified model to perform three tasks: facial action unit detection, expression classification, and valence-arousal estimation. We address two main challenges of learning the three tasks. First, most existing datasets are highly imbalanced. Second, most existing datasets do not contain labels for all three tasks. To tackle the first challenge, we apply data balancing techniques to experimental datasets. To tackle the second challenge, we propose an algorithm for the multitask model to learn from missing (incomplete) labels. This algorithm has two steps. We first train a teacher model to perform all three tasks, where each instance is trained by the ground truth label of its corresponding task. Secondly, we refer to the outputs of the teacher model as the soft labels. We use the soft labels and the ground truth to train the student model. We find that most of the student models outperform their teacher model on all the three tasks. Finally, we use model ensembling to boost performance further on the three tasks.
Tasks Action Unit Detection, Emotion Recognition, Facial Action Unit Detection
Published 2020-02-10
URL https://arxiv.org/abs/2002.03557v2
PDF https://arxiv.org/pdf/2002.03557v2.pdf
PWC https://paperswithcode.com/paper/fau-facial-expressions-valence-and-arousal-a
Framework pytorch
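
The two-step teacher/student procedure in the abstract above can be sketched for a single classification task as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the mixing weight `alpha` and all function names are hypothetical, and the paper's action unit and valence-arousal tasks would need their own loss terms.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    # Skip zero-probability targets so one-hot labels work without log(0) issues.
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)

def student_loss(student_logits, teacher_logits, hard_label=None, alpha=0.5):
    # Soft-label term: match the teacher's output distribution.
    student_probs = softmax(student_logits)
    soft_loss = cross_entropy(softmax(teacher_logits), student_probs)
    if hard_label is None:
        # No ground truth for this task on this sample: rely on the teacher only.
        return soft_loss
    one_hot = [1.0 if i == hard_label else 0.0 for i in range(len(student_logits))]
    hard_loss = cross_entropy(one_hot, student_probs)
    # alpha is a hypothetical mixing weight between ground truth and soft labels.
    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

Passing `hard_label=None` covers the missing-label case: a sample annotated for one task still produces a training signal for the others through the teacher's soft labels.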

#### Robust Robotic Pouring using Audition and Haptics

Title Robust Robotic Pouring using Audition and Haptics
Authors Hongzhuo Liang, Chuangchuang Zhou, Shuang Li, Xiaojian Ma, Norman Hendrich, Timo Gerkmann, Fuchun Sun, Jianwei Zhang
Abstract Robust and accurate estimation of liquid height lies as an essential part of pouring tasks for service robots. However, vision-based methods often fail in occluded conditions while audio-based methods cannot work well in a noisy environment. We instead propose a multimodal pouring network (MP-Net) that is able to robustly predict liquid height by conditioning on both audition and haptics input. MP-Net is trained on a self-collected multimodal pouring dataset. This dataset contains 300 robot pouring recordings with audio and force/torque measurements for three types of target containers. We also augment the audio data by inserting robot noise. We evaluated MP-Net on our collected dataset and a wide variety of robot experiments. Both network training results and robot experiments demonstrate that MP-Net is robust against noise and changes to the task and environment. Moreover, we further combine the predicted height and force data to estimate the shape of the target container.
Published 2020-02-29
URL https://arxiv.org/abs/2003.00342v1
PDF https://arxiv.org/pdf/2003.00342v1.pdf
PWC https://paperswithcode.com/paper/robust-robotic-pouring-using-audition-and
Repo https://github.com/lianghongzhuo/MultimodalPouring
Framework pytorch

#### Adversarial Robustness Through Local Lipschitzness

Title Adversarial Robustness Through Local Lipschitzness
Authors Yao-Yuan Yang, Cyrus Rashtchian, Hongyang Zhang, Ruslan Salakhutdinov, Kamalika Chaudhuri
Abstract A standard method for improving the robustness of neural networks is adversarial training, where the network is trained on adversarial examples that are close to the training inputs. This produces classifiers that are robust, but it often decreases clean accuracy. Prior work even posits that the tradeoff between robustness and accuracy may be inevitable. We investigate this tradeoff in more depth through the lens of local Lipschitzness. In many image datasets, the classes are separated in the sense that images with different labels are not extremely close in $\ell_\infty$ distance. Using this separation as a starting point, we argue that it is possible to achieve both accuracy and robustness by encouraging the classifier to be locally smooth around the data. More precisely, we consider classifiers that are obtained by rounding locally Lipschitz functions. Theoretically, we show that such classifiers exist for any dataset such that there is a positive distance between the support of different classes. Empirically, we compare the local Lipschitzness of classifiers trained by several methods. Our results show that having a small Lipschitz constant correlates with achieving high clean and robust accuracy, and therefore, the smoothness of the classifier is an important property to consider in the context of adversarial examples. Code available at https://github.com/yangarbiter/robust-local-lipschitz .
Published 2020-03-05
URL https://arxiv.org/abs/2003.02460v1
PDF https://arxiv.org/pdf/2003.02460v1.pdf
Repo https://github.com/yangarbiter/robust-local-lipschitz
Framework pytorch
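
To make the notion of local Lipschitzness in the abstract above concrete, the sketch below estimates it at a single point by random sampling inside an $\ell_\infty$ ball. It is a rough illustration, not the paper's evaluation code: random sampling only gives a lower bound on the true maximum, measuring output differences in $\ell_1$ is an assumption, and the function names are hypothetical.

```python
import random

def linf_distance(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def local_lipschitz_estimate(f, x, epsilon, num_samples=200, seed=0):
    # f maps a list of floats to a list of floats (e.g. class scores).
    rng = random.Random(seed)
    fx = f(x)
    best = 0.0
    for _ in range(num_samples):
        # Sample a perturbed point inside the l_inf ball of radius epsilon.
        x_prime = [xi + rng.uniform(-epsilon, epsilon) for xi in x]
        dist = linf_distance(x, x_prime)
        if dist == 0.0:
            continue
        # Ratio of output change to input change; keep the largest seen.
        best = max(best, l1_distance(fx, f(x_prime)) / dist)
    return best
```

Averaging this estimate over a dataset gives one number per classifier, which can then be compared across training methods; a gradient-based search for the worst-case perturbation would generally give a tighter estimate than random sampling.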

#### Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning

Title Periodic Intra-Ensemble Knowledge Distillation for Reinforcement Learning
Authors Zhang-Wei Hong, Prabhat Nagarajan, Guilherme Maeda
Abstract Off-policy ensemble reinforcement learning (RL) methods have demonstrated impressive results across a range of RL benchmark tasks. Recent works suggest that directly imitating experts’ policies in a supervised manner before or during the course of training enables faster policy improvement for an RL agent. Motivated by these recent insights, we propose Periodic Intra-Ensemble Knowledge Distillation (PIEKD). PIEKD is a learning framework that uses an ensemble of policies to act in the environment while periodically sharing knowledge amongst policies in the ensemble through knowledge distillation. Our experiments demonstrate that PIEKD improves upon a state-of-the-art RL method in sample efficiency on several challenging MuJoCo benchmark tasks. Additionally, we perform ablation studies to better understand PIEKD.
Published 2020-02-01
URL https://arxiv.org/abs/2002.00149v1
PDF https://arxiv.org/pdf/2002.00149v1.pdf
PWC https://paperswithcode.com/paper/periodic-intra-ensemble-knowledge
Repo https://github.com/pfnet-research/piekd
Framework none

#### BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations

Title BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations
Authors Hyungjun Kim, Kyungsu Kim, Jinseok Kim, Jae-Joon Kim
Abstract Binary Neural Networks (BNNs) have been garnering interest thanks to their compute cost reduction and memory savings. However, BNNs suffer from performance degradation mainly due to the gradient mismatch caused by binarizing activations. Previous works tried to address the gradient mismatch problem by reducing the discrepancy between activation functions used at forward pass and its differentiable approximation used at backward pass, which is an indirect measure. In this work, we use the gradient of smoothed loss function to better estimate the gradient mismatch in quantized neural network. Analysis using the gradient mismatch estimator indicates that using higher precision for activation is more effective than modifying the differentiable approximation of activation function. Based on the observation, we propose a new training scheme for binary activation networks called BinaryDuo in which two binary activations are coupled into a ternary activation during training. Experimental results show that BinaryDuo outperforms state-of-the-art BNNs on various benchmarks with the same amount of parameters and computing cost.
Published 2020-02-16
URL https://arxiv.org/abs/2002.06517v1
PDF https://arxiv.org/pdf/2002.06517v1.pdf
Repo https://github.com/Hyungjun-K1m/BinaryDuo
Framework pytorch
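
The gradient mismatch and the coupling idea in the abstract above can be illustrated with a toy scalar sketch. It is not the BinaryDuo training scheme itself: the thresholds `low` and `high`, the surrogate `width`, and the function names are assumptions chosen to show how two binary step functions combine into a three-level activation.

```python
def binary_activation(x, threshold=0.0):
    # Forward pass: a hard step, which has zero gradient almost everywhere.
    return 1.0 if x >= threshold else 0.0

def ste_gradient(x, threshold=0.0, width=1.0):
    # Backward pass surrogate (straight-through style): pretend the step is
    # linear within `width` of the threshold. The gap between this surrogate
    # and the true step function is the source of gradient mismatch.
    return 1.0 if abs(x - threshold) <= width else 0.0

def coupled_ternary_activation(x, low=-0.5, high=0.5):
    # Two binary activations with different thresholds, averaged, give a
    # ternary activation with levels 0, 0.5, and 1.
    return 0.5 * (binary_activation(x, low) + binary_activation(x, high))
```

For example, inputs of -1, 0, and 1 map to the three levels 0, 0.5, and 1, so the coupled activation carries more precision than either binary activation alone.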

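#### Deep Learning for Free-Hand Sketch: A Survey

Title Deep Learning for Free-Hand Sketch: A Survey
Authors Peng Xu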
Abstract Free-hand sketches are highly hieroglyphic and illustrative, which have been widely used by humans to depict objects or stories from ancient times to the present. The recent prevalence of touchscreen devices has made sketch creation a much easier task than ever and consequently made sketch-oriented applications increasingly more popular. The prosperity of deep learning has also immensely promoted the research for the free-hand sketch. This paper presents a comprehensive survey of the free-hand sketch oriented deep learning techniques. The main contents of this survey include: (i) The intrinsic traits and domain-unique challenges of the free-hand sketch are discussed, to clarify the essential differences between free-hand sketch and other data modalities, e.g., natural photo. (ii) The development of the free-hand sketch community in the deep learning era is reviewed, by surveying the existing datasets, research topics, and the state-of-the-art methods via a detailed taxonomy. (iii) Moreover, the bottlenecks, open problems, and potential research directions of this community have also been discussed to promote the future works.
Published 2020-01-08
URL https://arxiv.org/abs/2001.02600v1
PDF https://arxiv.org/pdf/2001.02600v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-free-hand-sketch-a-survey
Framework tf

#### On Pruning Adversarially Robust Neural Networks

Title On Pruning Adversarially Robust Neural Networks
Authors Vikash Sehwag, Shiqi Wang, Prateek Mittal, Suman Jana
Abstract In safety-critical but computationally resource-constrained applications, deep learning faces two key challenges: lack of robustness against adversarial attacks and large neural network size (often millions of parameters). While the research community has extensively explored the use of robust training and network pruning *independently* to address one of these challenges, we show that integrating existing pruning techniques with multiple types of robust training techniques, including verifiably robust training, leads to poor robust accuracy even though such techniques can preserve high regular accuracy. We further demonstrate that making pruning techniques aware of the robust learning objective can lead to a large improvement in performance. We realize this insight by formulating the pruning objective as an empirical risk minimization problem which is then solved using SGD. We demonstrate the success of the proposed pruning technique across the CIFAR-10, SVHN, and ImageNet datasets with four different robust training techniques: iterative adversarial training, randomized smoothing, MixTrain, and CROWN-IBP. Specifically, at 99% connection pruning ratio, we achieve gains up to 3.2, 10.0, and 17.8 percentage points in robust accuracy under state-of-the-art adversarial attacks for the ImageNet, CIFAR-10, and SVHN datasets, respectively. Our code and compressed networks are publicly available at https://github.com/inspire-group/compactness-robustness
Published 2020-02-24
URL https://arxiv.org/abs/2002.10509v1
PDF https://arxiv.org/pdf/2002.10509v1.pdf
Repo https://github.com/inspire-group/compactness-robustness
Framework pytorch

#### Learning Video Object Segmentation from Unlabeled Videos

Title Learning Video Object Segmentation from Unlabeled Videos
Authors Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David Crandall, Steven C. H. Hoi
Abstract We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data. We introduce a unified unsupervised/weakly supervised learning framework, called MuG, that comprehensively captures intrinsic properties of VOS at multiple granularities. Our approach can help advance understanding of visual patterns in VOS and significantly reduce annotation burden. With a carefully-designed architecture and strong representation learning ability, our learned model can be applied to diverse VOS settings, including object-level zero-shot VOS, instance-level zero-shot VOS, and one-shot VOS. Experiments demonstrate promising performance in these settings, as well as the potential of MuG in leveraging unlabeled data to further improve the segmentation accuracy.
Tasks Representation Learning, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2020-03-10
URL https://arxiv.org/abs/2003.05020v1
PDF https://arxiv.org/pdf/2003.05020v1.pdf
PWC https://paperswithcode.com/paper/learning-video-object-segmentation-from-2
Repo https://github.com/carrierlxk/MuG
Framework none

#### TailorGAN: Making User-Defined Fashion Designs

Title TailorGAN: Making User-Defined Fashion Designs
Authors Lele Chen, Justin Tian, Guo Li, Cheng-Haw Wu, Erh-Kan King, Kuan-Ting Chen, Shao-Hang Hsieh, Chenliang Xu
Abstract Attribute editing has become an important and emerging topic of computer vision. In this paper, we consider a task: given a reference garment image A and another image B with target attribute (collar/sleeve), generate a photo-realistic image which combines the texture from reference A and the new attribute from reference B. The highly convoluted attributes and the lack of paired data are the main challenges to the task. To overcome those limitations, we propose a novel self-supervised model to synthesize garment images with disentangled attributes (e.g., collar and sleeves) without paired data. Our method consists of a reconstruction learning step and an adversarial learning step. The model learns texture and location information through reconstruction learning. And, the model’s capability is generalized to achieve single-attribute manipulation by adversarial learning. Meanwhile, we compose a new dataset, named GarmentSet, with annotation of landmarks of collars and sleeves on clean garment images. Extensive experiments on this dataset and real-world samples demonstrate that our method can synthesize much better results than the state-of-the-art methods in both quantitative and qualitative comparisons.
Published 2020-01-17
URL https://arxiv.org/abs/2001.06427v2
PDF https://arxiv.org/pdf/2001.06427v2.pdf
PWC https://paperswithcode.com/paper/tailorgan-making-user-defined-fashion-designs
Repo https://github.com/gli-27/TailorGAN
Framework pytorch

#### Domain Adaptation by Class Centroid Matching and Local Manifold Self-Learning

Title Domain Adaptation by Class Centroid Matching and Local Manifold Self-Learning
Authors Lei Tian, Yongqiang Tang, Liangchen Hu, Zhida Ren, Wensheng Zhang
Abstract Domain adaptation has been a fundamental technology for transferring knowledge from a source domain to a target domain. The key issue of domain adaptation is how to reduce the distribution discrepancy between two domains in a proper way such that they can be treated indifferently for learning. Different from existing methods that make label prediction for target samples independently, in this paper, we propose a novel domain adaptation approach that assigns pseudo-labels to target data with the guidance of class centroids in two domains, so that the data distribution structure of both source and target domains can be emphasized. Besides, to explore the structure information of target data more thoroughly, we further introduce a local connectivity self-learning strategy into our proposal to adaptively capture the inherent local manifold structure of target samples. The aforementioned class centroid matching and local manifold self-learning are integrated into one joint optimization problem and an iterative optimization algorithm is designed to solve it with theoretical convergence guarantee. In addition to unsupervised domain adaptation, we further extend our method to the semi-supervised scenario including both homogeneous and heterogeneous settings in a direct but elegant way. Extensive experiments on five benchmark datasets validate the significant superiority of our proposal in both unsupervised and semi-supervised manners.
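
The idea of assigning pseudo-labels with the guidance of class centroids, described in the abstract above, can be sketched in its simplest form as nearest-centroid labeling. This is only an illustration under stated assumptions: the paper solves a joint optimization with local manifold self-learning, whereas the sketch below performs a single nearest-centroid assignment using source centroids, and all function names are hypothetical.

```python
def centroid(points):
    # Component-wise mean of a non-empty list of equal-length vectors.
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def class_centroids(features, labels):
    # Group labeled source features by class and average each group.
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)
    return {y: centroid(pts) for y, pts in by_class.items()}

def pseudo_label(target_features, centroids):
    # Assign each unlabeled target sample the class of its nearest centroid.
    return [
        min(centroids, key=lambda y: squared_distance(f, centroids[y]))
        for f in target_features
    ]
```

A typical use would compute `class_centroids` on labeled source features and then call `pseudo_label` on target features; an iterative method could recompute target centroids from the pseudo-labels and repeat.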