January 28, 2020

Paper Group ANR 863

Distill Knowledge from NRSfM for Weakly Supervised 3D Pose Learning


Title	Distill Knowledge from NRSfM for Weakly Supervised 3D Pose Learning
Authors	Chaoyang Wang, Chen Kong, Simon Lucey
Abstract	We propose to learn a 3D pose estimator by distilling knowledge from Non-Rigid Structure from Motion (NRSfM). Our method uses solely 2D landmark annotations. No 3D data, multi-view/temporal footage, or object-specific prior is required. This alleviates the data bottleneck, which is one of the major concerns for supervised methods. The challenge in using NRSfM methods as a teacher is that they often produce poor depth reconstructions when the 2D projections have strong ambiguity. Directly using those wrong depths as hard targets would negatively impact the student. Instead, we propose a novel loss that ties depth prediction to the cost function used in NRSfM. This gives the student pose estimator freedom to reduce depth error by associating with image features. Validated on the H3.6M dataset, our learned 3D pose estimation network achieves more accurate reconstruction than NRSfM methods. It also outperforms other weakly supervised methods, in spite of using significantly less supervision.
Tasks	3D Pose Estimation, Depth Estimation, Pose Estimation
Published	2019-08-18
URL	https://arxiv.org/abs/1908.06377v1
PDF	https://arxiv.org/pdf/1908.06377v1.pdf
PWC	https://paperswithcode.com/paper/distill-knowledge-from-nrsfm-for-weakly
Repo
Framework

Convergence Analyses of Online ADAM Algorithm in Convex Setting and Two-Layer ReLU Neural Network


Title	Convergence Analyses of Online ADAM Algorithm in Convex Setting and Two-Layer ReLU Neural Network
Authors	Biyi Fang, Diego Klabjan
Abstract	Nowadays, online learning is an appealing learning paradigm, which is of great interest in practice due to the recent emergence of large scale applications such as online advertising placement and online web ranking. Standard online learning assumes a finite number of samples while in practice data is streamed infinitely. In such a setting gradient descent with a diminishing learning rate does not work. We first introduce regret with a rolling window, a new performance metric for online streaming learning, which measures the performance of an algorithm on every fixed number of contiguous samples. At the same time, we propose a family of algorithms based on gradient descent with a constant or adaptive learning rate and provide very technical analyses establishing regret bound properties of the algorithms. We cover the convex setting, showing regret of the order of the square root of the window size in the constant and dynamic learning rate scenarios. Our proof also applies to the standard online setting, where we provide the first analysis of the same regret order (the previous proofs have flaws). We also study a two-layer neural network setting with ReLU activation. In this case we establish that if initial weights are close to a stationary point, the same square root regret bound is attainable. We conduct computational experiments demonstrating superior performance of the proposed algorithms.
Tasks
Published	2019-05-22
URL	https://arxiv.org/abs/1905.09356v2
PDF	https://arxiv.org/pdf/1905.09356v2.pdf
PWC	https://paperswithcode.com/paper/convergence-analyses-of-online-adam-algorithm
Repo
Framework
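
The rolling-window regret described in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm or their exact definition: it assumes one-dimensional quadratic losses f_t(x) = (x - y_t)^2, plain online gradient descent with a constant learning rate, and takes the regret of a window to be the algorithm's loss on that window minus the loss of the best fixed point for the same window. The function names (online_gd, window_regret, rolling_window_regret) are hypothetical.

```python
def online_gd(targets, lr=0.1, x0=0.0):
    """Online gradient descent with a constant step on f_t(x) = (x - y_t)^2.

    Returns the point played at each round, chosen before y_t is revealed.
    """
    x = x0
    played = []
    for y in targets:
        played.append(x)
        x -= lr * 2.0 * (x - y)  # gradient of (x - y)^2 is 2 (x - y)
    return played


def window_regret(played, targets, start, size):
    """Loss of the algorithm on one window minus the best fixed point's loss."""
    xs = played[start:start + size]
    ys = targets[start:start + size]
    alg_loss = sum((x - y) ** 2 for x, y in zip(xs, ys))
    best = sum(ys) / len(ys)  # minimizer of the summed quadratic losses
    best_loss = sum((best - y) ** 2 for y in ys)
    return alg_loss - best_loss


def rolling_window_regret(played, targets, size):
    """Worst regret over every window of `size` contiguous samples."""
    last_start = len(targets) - size
    return max(window_regret(played, targets, s, size)
               for s in range(last_start + 1))


# Example: a stream whose target value drifts halfway through.
stream = [0.0] * 50 + [1.0] * 50
iterates = online_gd(stream, lr=0.1)
print(rolling_window_regret(iterates, stream, size=20))
```

With a constant step the iterate keeps tracking the drifting target, so the regret of late windows does not depend on how long the stream has been running, which is the situation the rolling-window metric is meant to capture.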

Multi-Category Fairness in Sponsored Search Auctions


Title	Multi-Category Fairness in Sponsored Search Auctions
Authors	Shuchi Chawla, Christina Ilvento, Meena Jagadeesan
Abstract	Fairness in advertising is a topic of particular concern motivated by theoretical and empirical observations in both the computer science and economics literature. We examine the problem of fairness in advertising for general purpose platforms that service advertisers from many different categories. First, we propose inter-category and intra-category fairness desiderata that take inspiration from individual fairness and envy-freeness. Second, we investigate the “platform utility” (a proxy for the quality of the allocation) achievable by mechanisms satisfying these desiderata. More specifically, we compare the utility of fair mechanisms against the unfair optimal, and we show by construction that our fairness desiderata are compatible with utility. That is, we construct a family of fair mechanisms with high utility that perform close to optimally within a class of fair mechanisms. Our mechanisms also enjoy nice implementation properties including metric-obliviousness, which allows the platform to produce fair allocations without needing to know the specifics of the fairness requirements.
Tasks
Published	2019-06-20
URL	https://arxiv.org/abs/1906.08732v2
PDF	https://arxiv.org/pdf/1906.08732v2.pdf
PWC	https://paperswithcode.com/paper/individual-fairness-in-sponsored-search
Repo
Framework

Making the Invisible Visible: Action Recognition Through Walls and Occlusions


Title	Making the Invisible Visible: Action Recognition Through Walls and Occlusions
Authors	Tianhong Li, Lijie Fan, Mingmin Zhao, Yingcheng Liu, Dina Katabi
Abstract	Understanding people’s actions and interactions typically depends on seeing them. Automating the process of action recognition from visual data has been the topic of much research in the computer vision community. But what if it is too dark, or if the person is occluded or behind a wall? In this paper, we introduce a neural network model that can detect human actions through walls and occlusions, and in poor lighting conditions. Our model takes radio frequency (RF) signals as input, generates 3D human skeletons as an intermediate representation, and recognizes actions and interactions of multiple people over time. By translating the input to an intermediate skeleton-based representation, our model can learn from both vision-based and RF-based datasets, and allow the two tasks to help each other. We show that our model achieves comparable accuracy to vision-based action recognition systems in visible scenarios, yet continues to work accurately when people are not visible, hence addressing scenarios that are beyond the limit of today’s vision-based action recognition.
Tasks	3D Human Pose Estimation, RF-based Pose Estimation, Skeleton Based Action Recognition
Published	2019-09-20
URL	https://arxiv.org/abs/1909.09300v1
PDF	https://arxiv.org/pdf/1909.09300v1.pdf
PWC	https://paperswithcode.com/paper/making-the-invisible-visible-action
Repo
Framework

Mean Shift Rejection: Training Deep Neural Networks Without Minibatch Statistics or Normalization


Title	Mean Shift Rejection: Training Deep Neural Networks Without Minibatch Statistics or Normalization
Authors	Brendan Ruff, Taylor Beck, Joscha Bach
Abstract	Deep convolutional neural networks are known to be unstable during training at high learning rate unless normalization techniques are employed. Normalizing weights or activations allows the use of higher learning rates, resulting in faster convergence and higher test accuracy. Batch normalization requires minibatch statistics that approximate the dataset statistics but this incurs additional compute and memory costs and causes a communication bottleneck for distributed training. Weight normalization and initialization-only schemes do not achieve comparable test accuracy. We introduce a new understanding of the cause of training instability and provide a technique that is independent of normalization and minibatch statistics. Our approach treats training instability as a spatial common mode signal which is suppressed by placing the model on a channel-wise zero-mean isocline that is maintained throughout training. Firstly, we apply channel-wise zero-mean initialization of filter kernels with overall unity kernel magnitude. At each training step we modify the gradients of spatial kernels so that their weighted channel-wise mean is subtracted in order to maintain the common mode rejection condition. This prevents the onset of mean shift. This new technique allows direct training of the test graph so that training and test models are identical. We also demonstrate that injecting random noise throughout the network during training improves generalization. This is based on the idea that, as a side effect, batch normalization performs deep data augmentation by injecting minibatch noise due to the weakness of the dataset approximation. Our technique achieves higher accuracy compared to batch normalization and for the first time shows that minibatches and normalization are unnecessary for state-of-the-art training.
Tasks	Data Augmentation
Published	2019-11-29
URL	https://arxiv.org/abs/1911.13173v1
PDF	https://arxiv.org/pdf/1911.13173v1.pdf
PWC	https://paperswithcode.com/paper/mean-shift-rejection-training-deep-neural
Repo
Framework
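
The two steps named in the abstract above (channel-wise zero-mean initialization with unit kernel magnitude, then removing the channel-wise mean from each gradient) can be sketched in plain Python. This is a simplified reading, not the authors' implementation: a kernel is stored as a list of channels, each a flat list of spatial taps, and the abstract's "weighted" mean is replaced by an unweighted mean, which is an assumption. All function names are hypothetical.

```python
import math
import random


def zero_mean_per_channel(kernel):
    """Subtract each channel's own mean from its spatial taps."""
    return [[w - sum(ch) / len(ch) for w in ch] for ch in kernel]


def init_kernel(channels, taps, rng):
    """Random kernel with zero mean per channel and overall unit L2 norm."""
    kernel = [[rng.gauss(0.0, 1.0) for _ in range(taps)]
              for _ in range(channels)]
    kernel = zero_mean_per_channel(kernel)
    norm = math.sqrt(sum(w * w for ch in kernel for w in ch))
    return [[w / norm for w in ch] for ch in kernel]


def sgd_step(kernel, grad, lr):
    """SGD update whose gradient has its channel-wise mean removed first.

    Because the update is zero-mean per channel, a kernel that starts
    zero-mean per channel stays zero-mean (up to rounding).
    """
    grad = zero_mean_per_channel(grad)
    return [[w - lr * g for w, g in zip(ch, gch)]
            for ch, gch in zip(kernel, grad)]


rng = random.Random(0)
kernel = init_kernel(channels=3, taps=9, rng=rng)  # e.g. one 3x3 filter
grad = [[rng.gauss(0.0, 1.0) for _ in range(9)] for _ in range(3)]
kernel = sgd_step(kernel, grad, lr=0.1)
print([round(sum(ch), 12) for ch in kernel])  # channel sums stay near zero
```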

Patch Reordering: a Novel Way to Achieve Rotation and Translation Invariance in Convolutional Neural Networks


Title	Patch Reordering: a Novel Way to Achieve Rotation and Translation Invariance in Convolutional Neural Networks
Authors	Xu Shen, Xinmei Tian, Shaoyan Sun, Dacheng Tao
Abstract	Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance on many visual recognition tasks. However, the combination of convolution and pooling operations only shows invariance to small local location changes in meaningful objects in input. Sometimes, such networks are trained using data augmentation to encode this invariance into the parameters, which restricts the capacity of the model to learn the content of these objects. A more efficient use of the parameter budget is to encode rotation or translation invariance into the model architecture, which relieves the model from the need to learn them. To enable the model to focus on learning the content of objects rather than their locations, we propose to conduct patch ranking of the feature maps before feeding them into the next layer. When patch ranking is combined with convolution and pooling operations, we obtain consistent representations regardless of the location of meaningful objects in input. We show that the patch ranking module improves the performance of the CNN on many benchmark tasks, including MNIST digit recognition, large-scale image recognition, and image retrieval. The code is available at https://github.com//jasonustc/caffe-multigpu/tree/TICNN .
Tasks	Data Augmentation, Image Retrieval
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12682v1
PDF	https://arxiv.org/pdf/1911.12682v1.pdf
PWC	https://paperswithcode.com/paper/patch-reordering-a-novel-way-to-achieve
Repo
Framework
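
A toy version of the patch ranking idea in the abstract above: split a feature map into non-overlapping patches, rank them, and lay them back out in ranked order. The abstract does not state the ranking criterion, so this sketch assumes ranking by patch energy (sum of squared activations); the function names are hypothetical and this is not the released Caffe code.

```python
def split_patches(fmap, p):
    """Cut a 2D list into non-overlapping p x p patches, row-major order."""
    height, width = len(fmap), len(fmap[0])
    return [[row[c:c + p] for row in fmap[r:r + p]]
            for r in range(0, height, p)
            for c in range(0, width, p)]


def energy(patch):
    """Assumed ranking score: sum of squared activations in the patch."""
    return sum(v * v for row in patch for v in row)


def reorder_patches(fmap, p):
    """Place patches back on the grid in descending order of energy."""
    height, width = len(fmap), len(fmap[0])
    ranked = sorted(split_patches(fmap, p), key=energy, reverse=True)
    per_row = width // p
    out = []
    for block_row in range(height // p):
        blocks = ranked[block_row * per_row:(block_row + 1) * per_row]
        for i in range(p):
            out.append([v for block in blocks for v in block[i]])
    return out


# The same 2x2 "object" placed in two different corners of a 4x4 map.
top_left = [[1, 2, 0, 0], [3, 4, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
bottom_right = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 2], [0, 0, 3, 4]]
print(reorder_patches(top_left, 2) == reorder_patches(bottom_right, 2))
```

In this simplified form the output is identical only for shifts by whole patches; smaller shifts would still rely on the convolution and pooling layers around the module.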

Variational Recurrent Models for Solving Partially Observable Control Tasks


Title	Variational Recurrent Models for Solving Partially Observable Control Tasks
Authors	Dongqi Han, Kenji Doya, Jun Tani
Abstract	In partially observable (PO) environments, deep reinforcement learning (RL) agents often suffer from unsatisfactory performance, since two problems need to be tackled together: how to extract information from the raw observations to solve the task, and how to improve the policy. In this study, we propose an RL algorithm for solving PO tasks. Our method comprises two parts: a variational recurrent model (VRM) for modeling the environment, and an RL controller that has access to both the environment and the VRM. The proposed algorithm was tested in two types of PO robotic control tasks: those in which either coordinates or velocities were not observable, and those that require long-term memorization. Our experiments show that the proposed algorithm achieved better data efficiency and/or learned a better policy than alternative approaches in tasks in which unobserved states cannot be inferred from raw observations in a simple manner.
Tasks
Published	2019-12-23
URL	https://arxiv.org/abs/1912.10703v2
PDF	https://arxiv.org/pdf/1912.10703v2.pdf
PWC	https://paperswithcode.com/paper/variational-recurrent-models-for-solving-1
Repo
Framework

A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning


Title	A study in Rashomon curves and volumes: A new perspective on generalization and model simplicity in machine learning
Authors	Lesia Semenova, Cynthia Rudin, Ronald Parr
Abstract	The Rashomon effect occurs when many different explanations exist for the same phenomenon. In machine learning, Leo Breiman used this term to characterize problems where many accurate-but-different models exist to describe the same data. In this work, we study how the Rashomon effect can be useful for understanding the relationship between training and test performance, and the possibility that simple-yet-accurate models exist for many problems. We consider the Rashomon set - the set of almost-equally-accurate models for a given problem - and study its properties and the types of models it could contain. We present the Rashomon ratio as a new measure related to simplicity of model classes, which is the ratio of the volume of the set of accurate models to the volume of the hypothesis space; the Rashomon ratio is different from standard complexity measures from statistical learning theory. For a hierarchy of hypothesis spaces, the Rashomon ratio can help modelers to navigate the trade-off between simplicity and accuracy. In particular, we find empirically that a plot of empirical risk vs. Rashomon ratio forms a characteristic $\Gamma$-shaped Rashomon curve, whose elbow seems to be a reliable model selection criterion. When the Rashomon set is large, models that are accurate - but that also have various other useful properties - can often be obtained. These models might obey various constraints such as interpretability, fairness, or monotonicity.
Tasks	Model Selection
Published	2019-08-05
URL	https://arxiv.org/abs/1908.01755v2
PDF	https://arxiv.org/pdf/1908.01755v2.pdf
PWC	https://paperswithcode.com/paper/a-study-in-rashomon-curves-and-volumes-a-new
Repo
Framework
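
The Rashomon ratio defined in the abstract above (volume of the set of accurate models divided by the volume of the hypothesis space) can be approximated for a toy hypothesis space. The sketch below assumes one-dimensional threshold classifiers with the threshold in [0, 1], a uniform grid as a stand-in for volume, and a Rashomon set made of models whose empirical risk is within theta of the best risk found. These choices and the function names are illustrative assumptions, not the paper's setup.

```python
def empirical_risk(threshold, xs, ys):
    """0-1 loss of the classifier 'predict True when x > threshold'."""
    errors = sum(1 for x, y in zip(xs, ys) if (x > threshold) != y)
    return errors / len(xs)


def rashomon_ratio(xs, ys, theta, grid_size=1001):
    """Fraction of grid thresholds whose risk is within theta of the best."""
    thresholds = [i / (grid_size - 1) for i in range(grid_size)]
    risks = [empirical_risk(t, xs, ys) for t in thresholds]
    best = min(risks)
    in_set = sum(1 for r in risks if r <= best + theta)
    return in_set / grid_size


# Separable toy data: any threshold between 0.4 and 0.6 is error-free.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [False, False, False, False, True, True, True, True]
for theta in (0.0, 0.125, 0.25):
    print(theta, rashomon_ratio(xs, ys, theta))
```

Repeating the estimate over a hierarchy of hypothesis spaces and plotting empirical risk against the ratio would give the kind of curve the abstract describes.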

End-to-end Cloud Segmentation in High-Resolution Multispectral Satellite Imagery Using Deep Learning


Title	End-to-end Cloud Segmentation in High-Resolution Multispectral Satellite Imagery Using Deep Learning
Authors	Giorgio Morales, Alejandro Ramírez, Joel Telles
Abstract	Segmenting clouds in high-resolution satellite images is an arduous and challenging task due to the many types of geographies and clouds a satellite can capture. Therefore, it needs to be automated and optimized, especially for those who regularly process great amounts of satellite images, such as governmental institutions. In that sense, the contribution of this work is twofold: we present the CloudPeru2 dataset, consisting of 22,400 images of 512x512 pixels and their respective hand-drawn cloud masks, and we propose an end-to-end segmentation method for clouds using a Convolutional Neural Network (CNN) based on the Deeplab v3+ architecture. The results over the test set achieved an accuracy of 96.62%, precision of 96.46%, specificity of 98.53%, and sensitivity of 96.72%, which are superior to those of the compared methods.
Tasks
Published	2019-04-29
URL	http://arxiv.org/abs/1904.12743v1
PDF	http://arxiv.org/pdf/1904.12743v1.pdf
PWC	https://paperswithcode.com/paper/end-to-end-cloud-segmentation-in-high
Repo
Framework
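
For readers unfamiliar with the four figures reported in the abstract above, the standard definitions can be written out from pixel-level confusion counts. The counts in the example are made up for illustration and have no relation to the CloudPeru2 results; the function name is hypothetical.

```python
def segmentation_metrics(tp, fp, tn, fn):
    """Standard binary metrics from pixel counts (cloud = positive class)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),    # predicted cloud that is cloud
        "specificity": tn / (tn + fp),  # clear pixels correctly kept clear
        "sensitivity": tp / (tp + fn),  # cloud pixels that were found
    }


# Illustrative counts only.
print(segmentation_metrics(tp=8, fp=2, tn=85, fn=5))
```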

AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence


Title	AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
Authors	Jeff Clune
Abstract	Perhaps the most ambitious scientific quest in human history is the creation of general artificial intelligence, which roughly means AI that is as smart as or smarter than humans. The dominant approach in the machine learning community is to attempt to discover each of the pieces required for intelligence, with the implicit assumption that some future group will complete the Herculean task of figuring out how to combine all of those pieces into a complex thinking machine. I call this the “manual AI approach”. This paper describes another exciting path that ultimately may be more successful at producing general AI. It is based on the clear trend in machine learning that hand-designed solutions eventually are replaced by more effective, learned solutions. The idea is to create an AI-generating algorithm (AI-GA), which automatically learns how to produce general AI. Three Pillars are essential for the approach: (1) meta-learning architectures, (2) meta-learning the learning algorithms themselves, and (3) generating effective learning environments. I argue that either approach could produce general AI first, and both are scientifically worthwhile irrespective of which is the fastest path. Because both are promising, yet the ML community is currently committed to the manual approach, I argue that our community should increase its research investment in the AI-GA approach. To encourage such research, I describe promising work in each of the Three Pillars. I also discuss AI-GA-specific safety and ethical considerations. Because it may be the fastest path to general AI and because it is inherently scientifically interesting to understand the conditions in which a simple algorithm can produce general AI (as happened on Earth where Darwinian evolution produced human intelligence), I argue that the pursuit of AI-GAs should be considered a new grand challenge of computer science research.
Tasks	Meta-Learning
Published	2019-05-27
URL	https://arxiv.org/abs/1905.10985v2
PDF	https://arxiv.org/pdf/1905.10985v2.pdf
PWC	https://paperswithcode.com/paper/ai-gas-ai-generating-algorithms-an-alternate
Repo
Framework

Explaining Explanations to Society


Title	Explaining Explanations to Society
Authors	Leilani H. Gilpin, Cecilia Testart, Nathaniel Fruchter, Julius Adebayo
Abstract	There is a disconnect between explanatory artificial intelligence (XAI) methods and the types of explanations that are useful for and demanded by society (policy makers, government officials, etc.). Questions that experts in artificial intelligence (AI) ask opaque systems provide inside explanations, focused on debugging, reliability, and validation. These are different from those that society will ask of these systems to build trust and confidence in their decisions. Although explanatory AI systems can answer many questions that experts desire, they often don’t explain why they made decisions in a way that is precise (true to the model) and understandable to humans. These outside explanations can be used to build trust, comply with regulatory and policy changes, and act as external validation. In this paper, we focus on XAI methods for deep neural networks (DNNs) because of DNNs’ use in decision-making and inherent opacity. We explore the types of questions that explanatory DNN systems can answer and discuss challenges in building explanatory systems that provide outside explanations for societal requirements and benefit.
Tasks	Decision Making
Published	2019-01-19
URL	http://arxiv.org/abs/1901.06560v1
PDF	http://arxiv.org/pdf/1901.06560v1.pdf
PWC	https://paperswithcode.com/paper/explaining-explanations-to-society
Repo
Framework

Training Robust Deep Neural Networks via Adversarial Noise Propagation


Title	Training Robust Deep Neural Networks via Adversarial Noise Propagation
Authors	Aishan Liu, Xianglong Liu, Chongzhi Zhang, Hang Yu, Qiang Liu, Junfeng He
Abstract	Deep neural networks have been found vulnerable to noise such as adversarial examples and corruptions in practice. A number of adversarial defense methods have been developed, which indeed improve model robustness towards adversarial examples in practice. However, because they rely only on training with data mixed with noise, most of them still fail to defend against more general types of noise. Motivated by the fact that hidden layers play a very important role in maintaining a robust model, this paper proposes a simple yet powerful training algorithm named Adversarial Noise Propagation (ANP) that injects diversified noise into the hidden layers in a layer-wise manner. We show that ANP can be efficiently implemented by exploiting the nature of the popular backward-forward training style for deep models. To comprehensively understand the behaviors and contributions of hidden layers, we further explore the insights from hidden representation insensitivity and human vision perception alignment. Extensive experiments on MNIST, CIFAR-10, CIFAR-10-C, CIFAR-10-P and ImageNet demonstrate that ANP gives deep models strong robustness against general noise, including both adversarial and corrupted examples, and significantly outperforms various adversarial defense methods.
Tasks	Adversarial Defense
Published	2019-09-19
URL	https://arxiv.org/abs/1909.09034v1
PDF	https://arxiv.org/pdf/1909.09034v1.pdf
PWC	https://paperswithcode.com/paper/training-robust-deep-neural-networks-via
Repo
Framework

Defending Against Adversarial Attacks by Suppressing the Largest Eigenvalue of Fisher Information Matrix


Title	Defending Against Adversarial Attacks by Suppressing the Largest Eigenvalue of Fisher Information Matrix
Authors	Chaomin Shen, Yaxin Peng, Guixu Zhang, Jinsong Fan
Abstract	We propose a scheme for defending against adversarial attacks by suppressing the largest eigenvalue of the Fisher information matrix (FIM). Our starting point is one explanation of why adversarial examples exist. If the difference between a benign sample and its adversarial example is measured by the Euclidean norm, while the difference between their classification probability densities at the last (softmax) layer of the network is measured by the Kullback-Leibler (KL) divergence, the explanation shows that the output difference is a quadratic form of the input difference. If the largest eigenvalue of the matrix of this quadratic form (the FIM) is large, the output difference becomes large even when the input difference is small, which explains the adversarial phenomenon. This makes adversarial defense possible by controlling the eigenvalues of the FIM. Our solution is to add one term representing the trace of the FIM to the loss function of the original network, as the largest eigenvalue is bounded by the trace. Our defensive scheme is verified by experiments using a variety of common attacking methods on typical deep neural networks, e.g. LeNet, VGG and ResNet, with datasets MNIST, CIFAR-10, and German Traffic Sign Recognition Benchmark (GTSRB). Our new network, after adopting the novel loss function and retraining, has an effective and robust defensive capability, as it decreases the fooling ratio of the generated adversarial examples while retaining the classification accuracy of the original network.
Tasks	Adversarial Defense, Traffic Sign Recognition
Published	2019-09-13
URL	https://arxiv.org/abs/1909.06137v1
PDF	https://arxiv.org/pdf/1909.06137v1.pdf
PWC	https://paperswithcode.com/paper/defending-against-adversarial-attacks-by-3
Repo
Framework
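
The chain of reasoning in the abstract above can be written out as a short derivation. The notation is assumed here rather than taken from the paper: $x$ is the input, $\eta$ a small perturbation, $G_x$ the Fisher information matrix with respect to the input, $\mathcal{L}_0$ the original loss, and $\mu > 0$ a hypothetical regularization weight.

```latex
% Second-order expansion of the KL divergence around \eta = 0
\mathrm{KL}\big(p(y \mid x)\,\|\,p(y \mid x + \eta)\big)
  \approx \tfrac{1}{2}\,\eta^{\top} G_x\,\eta,
\qquad
G_x = \mathbb{E}_{y \sim p(\cdot \mid x)}
  \big[\nabla_x \log p(y \mid x)\,\nabla_x \log p(y \mid x)^{\top}\big].

% G_x is positive semidefinite, so its eigenvalues are nonnegative and
% the largest one is at most their sum, which is the trace
\tfrac{1}{2}\,\eta^{\top} G_x\,\eta
  \le \tfrac{1}{2}\,\lambda_{\max}(G_x)\,\|\eta\|_2^2
  \le \tfrac{1}{2}\,\operatorname{tr}(G_x)\,\|\eta\|_2^2.

% One possible form of the regularized training loss
\mathcal{L}(x, y) = \mathcal{L}_0(x, y) + \mu\,\operatorname{tr}(G_x).
```

Penalizing the trace is a looser but cheaper surrogate than penalizing $\lambda_{\max}(G_x)$ directly, since the trace can be computed without an eigendecomposition.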

Transfer Learning and Meta Classification Based Deep Churn Prediction System for Telecom Industry


Title	Transfer Learning and Meta Classification Based Deep Churn Prediction System for Telecom Industry
Authors	Uzair Ahmed, Asifullah Khan, Saddam Hussain Khan, Abdul Basit, Irfan Ul Haq, Yeon Soo Lee
Abstract	A churn prediction system guides telecom service providers to reduce revenue loss. However, the development of a churn prediction system for the telecom industry is a challenging task, mainly due to the large size of the data, high dimensional features, and imbalanced distribution of the data. In this paper, we present a solution to the inherent problems of churn prediction, using the concept of Transfer Learning (TL) and Ensemble-based Meta-Classification. The proposed method TL-DeepE is applied in two stages. The first stage employs TL by fine-tuning multiple pre-trained Deep Convolutional Neural Networks (CNNs). Telecom datasets are normally in vector form, which are converted into 2D images because Deep CNNs have high learning capacity on images. In the second stage, predictions from these Deep CNNs are appended to the original feature vector and thus are used to build a final feature vector for the high-level Genetic Programming (GP) and AdaBoost based ensemble classifier. Thus, the experiments are conducted using various CNNs as base classifiers and the GP-AdaBoost as a meta-classifier. By using 10-fold cross-validation, the performance of the proposed TL-DeepE system is compared with existing techniques for two standard telecommunication datasets: Orange and Cell2cell. On the Orange and Cell2cell datasets, the prediction accuracy obtained was 75.4% and 68.2%, while the area under the curve was 0.83 and 0.74, respectively.
Tasks	Transfer Learning
Published	2019-01-18
URL	http://arxiv.org/abs/1901.06091v2
PDF	http://arxiv.org/pdf/1901.06091v2.pdf
PWC	https://paperswithcode.com/paper/transfer-learning-and-meta-classification
Repo
Framework

Gated Convolutional Networks with Hybrid Connectivity for Image Classification


Title	Gated Convolutional Networks with Hybrid Connectivity for Image Classification
Authors	Chuanguang Yang, Zhulin An, Hui Zhu, Xiaolong Hu, Kun Zhang, Kaiqiang Xu, Chao Li, Yongjun Xu
Abstract	We propose a simple yet effective method to reduce the redundancy of DenseNet by substantially decreasing the number of stacked modules, replacing the original bottleneck with our SMG module, which is augmented by a local residual. Furthermore, the SMG module is equipped with an efficient two-stage pipeline aimed at DenseNet-like architectures that need to integrate all previous outputs: it squeezes the incoming informative but redundant features gradually by hierarchical convolutions in an hourglass shape and then excites them by multi-kernel depthwise convolutions, the output of which is compact and holds more informative multi-scale features. We further develop a forget gate and an update gate by introducing the popular attention modules to implement effective fusion instead of a simple addition between reused and new features. Due to the Hybrid Connectivity (nested combination of global dense and local residual) and Gated mechanisms, we call our network HCGNet. Experimental results on CIFAR and ImageNet datasets show that HCGNet is markedly more efficient than DenseNet, and can also significantly outperform state-of-the-art networks with less complexity. Moreover, HCGNet also shows remarkable interpretability and robustness under network dissection and adversarial defense, respectively. On MS-COCO, HCGNet can consistently learn better features than popular backbones.
Tasks	Adversarial Defense, Image Classification
Published	2019-08-26
URL	https://arxiv.org/abs/1908.09699v3
PDF	https://arxiv.org/pdf/1908.09699v3.pdf
PWC	https://paperswithcode.com/paper/gated-convolutional-networks-with-hybrid
Repo
Framework