October 21, 2019

Paper Group AWR 38

DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation

Title DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation
Authors Jimmy Wu, Bolei Zhou, Diondra Peck, Scott Hsieh, Vandana Dialani, Lester Mackey, Genevieve Patterson
Abstract We propose DeepMiner, a framework to discover interpretable representations in deep neural networks and to build explanations for medical predictions. By probing convolutional neural networks (CNNs) trained to classify cancer in mammograms, we show that many individual units in the final convolutional layer of a CNN respond strongly to diseased tissue concepts specified by the BI-RADS lexicon. After expert annotation of the interpretable units, our proposed method is able to generate explanations for CNN mammogram classification that are correlated with ground truth radiology reports on the DDSM dataset. We show that DeepMiner not only enables better understanding of the nuances of CNN classification decisions, but also possibly discovers new visual knowledge relevant to medical diagnosis.
Tasks Medical Diagnosis
Published 2018-05-31
URL http://arxiv.org/abs/1805.12323v1
PDF http://arxiv.org/pdf/1805.12323v1.pdf
PWC https://paperswithcode.com/paper/deepminer-discovering-interpretable
Repo https://github.com/jimmyyhwu/ddsm-visual-primitives
Framework pytorch

Deep learning for pedestrians: backpropagation in CNNs

Title Deep learning for pedestrians: backpropagation in CNNs
Authors Laurent Boué
Abstract The goal of this document is to provide a pedagogical introduction to the main concepts underpinning the training of deep neural networks using gradient descent; a process known as backpropagation. Although we focus on a very influential class of architectures called “convolutional neural networks” (CNNs) the approach is generic and useful to the machine learning community as a whole. Motivated by the observation that derivations of backpropagation are often obscured by clumsy index-heavy narratives that appear somewhat mathemagical, we aim to offer a conceptually clear, vectorized description that articulates well the higher level logic. Following the principle of “writing is nature’s way of letting you know how sloppy your thinking is”, we try to make the calculations meticulous, self-contained and yet as intuitive as possible. Taking nothing for granted, ample illustrations serve as visual guides and an extensive bibliography is provided for further explorations. (For the sake of clarity, long mathematical derivations and visualizations have been broken up into short “summarized views” and longer “detailed views” encoded into the PDF as optional content groups. Some figures contain animations designed to illustrate important concepts in a more engaging style. For these reasons, we advise downloading the document locally and opening it using Adobe Acrobat Reader. Other viewers were not tested and may not render the detailed views and animations correctly.)
Tasks
Published 2018-11-29
URL http://arxiv.org/abs/1811.11987v1
PDF http://arxiv.org/pdf/1811.11987v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-for-pedestrians-backpropagation
Repo https://github.com/Ranlot/backpropagation-CNNs
Framework pytorch

Deep Facial Expression Recognition: A Survey

Title Deep Facial Expression Recognition: A Survey
Authors Shan Li, Weihong Deng
Abstract With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias. In this paper, we provide a comprehensive survey on deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.
Tasks Facial Expression Recognition
Published 2018-04-23
URL http://arxiv.org/abs/1804.08348v2
PDF http://arxiv.org/pdf/1804.08348v2.pdf
PWC https://paperswithcode.com/paper/deep-facial-expression-recognition-a-survey
Repo https://github.com/yijiazh/DFER_Summer2019
Framework tf

On the Decision Boundary of Deep Neural Networks

Title On the Decision Boundary of Deep Neural Networks
Authors Yu Li, Lizhong Ding, Xin Gao
Abstract While deep learning models and techniques have achieved great empirical success, our understanding of the source of success in many aspects remains very limited. In an attempt to bridge the gap, we investigate the decision boundary of a production deep learning architecture with weak assumptions on both the training data and the model. We demonstrate, both theoretically and empirically, that the last weight layer of a neural network converges to a linear SVM trained on the output of the last hidden layer, for both the binary case and the multi-class case with the commonly used cross-entropy loss. Furthermore, we show empirically that training a neural network as a whole, instead of only fine-tuning the last weight layer, may result in better bias constant for the last weight layer, which is important for generalization. In addition to facilitating the understanding of deep learning, our result can be helpful for solving a broad range of practical problems of deep learning, such as catastrophic forgetting and adversarial attacking. The experiment codes are available at https://github.com/lykaust15/NN_decision_boundary
Tasks
Published 2018-08-16
URL http://arxiv.org/abs/1808.05385v3
PDF http://arxiv.org/pdf/1808.05385v3.pdf
PWC https://paperswithcode.com/paper/on-the-decision-boundary-of-deep-neural
Repo https://github.com/lykaust15/NN_decision_boundary
Framework tf

Exploiting temporal and depth information for multi-frame face anti-spoofing

Title Exploiting temporal and depth information for multi-frame face anti-spoofing
Authors Zezheng Wang, Chenxu Zhao, Yunxiao Qin, Qiusheng Zhou, Guojun Qi, Jun Wan, Zhen Lei
Abstract Face anti-spoofing is significant to the security of face recognition systems. Previous works on depth supervised learning have proved the effectiveness for face anti-spoofing. Nevertheless, they only considered the depth as an auxiliary supervision in the single frame. Different from these methods, we develop a new method to estimate depth information from multiple RGB frames and propose a depth-supervised architecture which can efficiently encode spatiotemporal information for presentation attack detection. It includes two novel modules: optical flow guided feature block (OFFB) and convolution gated recurrent units (ConvGRU) module, which are designed to extract short-term and long-term motion to discriminate living and spoofing faces. Extensive experiments demonstrate that the proposed approach achieves state-of-the-art results on four benchmark datasets, namely OULU-NPU, SiW, CASIA-MFSD, and Replay-Attack.
Tasks Face Anti-Spoofing, Face Recognition, Optical Flow Estimation
Published 2018-11-13
URL http://arxiv.org/abs/1811.05118v3
PDF http://arxiv.org/pdf/1811.05118v3.pdf
PWC https://paperswithcode.com/paper/exploiting-temporal-and-depth-information-for
Repo https://github.com/clks-wzz/PRNet-Depth-Generation
Framework tf

Deep Neural Network Compression with Single and Multiple Level Quantization

Title Deep Neural Network Compression with Single and Multiple Level Quantization
Authors Yuhui Xu, Yongzhuang Wang, Aojun Zhou, Weiyao Lin, Hongkai Xiong
Abstract Network quantization is an effective solution to compress deep neural networks for practical usage. Existing network quantization methods cannot sufficiently exploit the depth information to generate a low-bit compressed network. In this paper, we propose two novel network quantization approaches, single-level network quantization (SLQ) for high-bit quantization and multi-level network quantization (MLQ) for extremely low-bit quantization (ternary). We are the first to consider the network quantization from both width and depth level. In the width level, parameters are divided into two parts: one for quantization and the other for re-training to eliminate the quantization loss. SLQ leverages the distribution of the parameters to improve the width level. In the depth level, we introduce incremental layer compensation to quantize layers iteratively which decreases the quantization loss in each iteration. The proposed approaches are validated with extensive experiments based on the state-of-the-art neural networks including AlexNet, VGG-16, GoogleNet and ResNet-18. Both SLQ and MLQ achieve impressive results.
Tasks Neural Network Compression, Quantization
Published 2018-03-06
URL http://arxiv.org/abs/1803.03289v2
PDF http://arxiv.org/pdf/1803.03289v2.pdf
PWC https://paperswithcode.com/paper/deep-neural-network-compression-with-single
Repo https://github.com/yuhuixu1993/SLQ
Framework none
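
To make "extremely low-bit quantization (ternary)" concrete, here is a minimal sketch of a generic threshold-based ternarization, in which every weight is mapped to one of three values: -alpha, 0, or +alpha. This is not the SLQ or MLQ procedure from the paper, which the abstract does not specify in detail. The function name `ternarize`, the `threshold_ratio` default of 0.7, and the rule for choosing `alpha` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def ternarize(weights: Sequence[float], threshold_ratio: float = 0.7) -> Tuple[List[float], float]:
    """Map each weight to -alpha, 0.0, or +alpha (illustrative only, not SLQ/MLQ)."""
    if not weights:
        return [], 0.0

    # Assumed heuristic: the cut-off is a fraction of the mean absolute weight.
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs

    # alpha is the mean magnitude of the weights that survive the cut-off.
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0

    quantized = []
    for w in weights:
        if w > delta:
            quantized.append(alpha)
        elif w < -delta:
            quantized.append(-alpha)
        else:
            quantized.append(0.0)
    return quantized, alpha


if __name__ == "__main__":
    q, a = ternarize([1.0, -1.0, 0.1, -0.1])
    print(q, a)
```

In a scheme like the one the abstract outlines, a step of this kind would apply only to the part of the parameters selected for quantization, while the remaining parameters are re-trained to compensate for the quantization loss.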

Amortized Bayesian inference for clustering models

Title Amortized Bayesian inference for clustering models
Authors Ari Pakman, Liam Paninski
Abstract We develop methods for efficient amortized approximate Bayesian inference over posterior distributions of probabilistic clustering models, such as Dirichlet process mixture models. The approach is based on mapping distributed, symmetry-invariant representations of cluster arrangements into conditional probabilities. The method parallelizes easily, yields iid samples from the approximate posterior of cluster assignments with the same computational cost of a single Gibbs sampler sweep, and can easily be applied to both conjugate and non-conjugate models, as training only requires samples from the generative model.
Tasks Bayesian Inference
Published 2018-11-24
URL http://arxiv.org/abs/1811.09747v1
PDF http://arxiv.org/pdf/1811.09747v1.pdf
PWC https://paperswithcode.com/paper/amortized-bayesian-inference-for-clustering
Repo https://github.com/aripakman/neural_clustering_process
Framework pytorch

Reinforcement Learning for Solving the Vehicle Routing Problem

Title Reinforcement Learning for Solving the Vehicle Routing Problem
Authors Mohammadreza Nazari, Afshin Oroojlooy, Lawrence V. Snyder, Martin Takáč
Abstract We present an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing the reward signals and following feasibility rules. Our model represents a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. On capacitated VRP, our approach outperforms classical heuristics and Google’s OR-Tools on medium-sized instances in solution quality with comparable computation time (after training). We demonstrate how our approach can handle problems with split delivery and explore the effect of such deliveries on the solution quality. Our proposed framework can be applied to other variants of the VRP such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems.
Tasks Combinatorial Optimization
Published 2018-02-12
URL http://arxiv.org/abs/1802.04240v2
PDF http://arxiv.org/pdf/1802.04240v2.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-for-solving-the
Repo https://github.com/OptMLGroup/VRP-RL
Framework tf

Bayesian Uncertainty Estimation for Batch Normalized Deep Networks

Title Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
Authors Mattias Teye, Hossein Azizpour, Kevin Smith
Abstract We show that training a deep network using batch normalization is equivalent to approximate inference in Bayesian models. We further demonstrate that this finding allows us to make meaningful estimates of the model uncertainty using conventional architectures, without modifications to the network or the training procedure. Our approach is thoroughly validated by measuring the quality of uncertainty in a series of empirical experiments on different tasks. It outperforms baselines with strong statistical significance, and displays competitive performance with recent Bayesian approaches.
Tasks
Published 2018-02-18
URL http://arxiv.org/abs/1802.06455v2
PDF http://arxiv.org/pdf/1802.06455v2.pdf
PWC https://paperswithcode.com/paper/bayesian-uncertainty-estimation-for-batch
Repo https://github.com/petteriTeikari/pyML_regression_skeleton
Framework none

Node Classification for Signed Social Networks Using Diffuse Interface Methods

Title Node Classification for Signed Social Networks Using Diffuse Interface Methods
Authors Pedro Mercado, Jessica Bosch, Martin Stoll
Abstract Signed networks contain both positive and negative kinds of interactions like friendship and enmity. The task of node classification in non-signed graphs has proven to be beneficial in many real world applications, yet extensions to signed networks remain largely unexplored. In this paper we introduce the first analysis of node classification in signed social networks via diffuse interface methods based on the Ginzburg-Landau functional together with different extensions of the graph Laplacian to signed networks. We show that blending the information from both positive and negative interactions leads to performance improvement in real signed social networks, consistently outperforming the current state of the art.
Tasks Node Classification
Published 2018-09-07
URL https://arxiv.org/abs/1809.06432v2
PDF https://arxiv.org/pdf/1809.06432v2.pdf
PWC https://paperswithcode.com/paper/node-classification-for-signed-social
Repo https://github.com/melopeo/GL
Framework none

Music Genre Classification using Masked Conditional Neural Networks

Title Music Genre Classification using Masked Conditional Neural Networks
Authors Fady Medhat, David Chesmore, John Robinson
Abstract The ConditionaL Neural Networks (CLNN) and the Masked ConditionaL Neural Networks (MCLNN) exploit the nature of multi-dimensional temporal signals. The CLNN captures the conditional temporal influence between the frames in a window and the mask in the MCLNN enforces a systematic sparseness that follows a filterbank-like pattern over the network links. The mask induces the network to learn about time-frequency representations in bands, allowing the network to sustain frequency shifts. Additionally, the mask in the MCLNN automates the exploration of a range of feature combinations, usually done through an exhaustive manual search. We have evaluated the MCLNN performance using the Ballroom and Homburg datasets of music genres. MCLNN has achieved accuracies that are competitive to state-of-the-art handcrafted attempts in addition to models based on Convolutional Neural Networks.
Tasks
Published 2018-02-18
URL http://arxiv.org/abs/1802.06432v2
PDF http://arxiv.org/pdf/1802.06432v2.pdf
PWC https://paperswithcode.com/paper/music-genre-classification-using-masked
Repo https://github.com/fadymedhat/MCLNN
Framework tf

Federated Learning for Mobile Keyboard Prediction

Title Federated Learning for Mobile Keyboard Prediction
Authors Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, Daniel Ramage
Abstract We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the Federated Averaging algorithm. The federated algorithm, which enables training on a higher-quality dataset for this use case, is shown to achieve better prediction recall. This work demonstrates the feasibility and benefit of training language models on client devices without exporting sensitive user data to servers. The federated learning environment gives users greater control over the use of their data and simplifies the task of incorporating privacy by default with distributed training and aggregation across a population of client devices.
Tasks Language Modelling
Published 2018-11-08
URL http://arxiv.org/abs/1811.03604v2
PDF http://arxiv.org/pdf/1811.03604v2.pdf
PWC https://paperswithcode.com/paper/federated-learning-for-mobile-keyboard
Repo https://github.com/MsAmberWelch/Privacy-Engineering
Framework tf
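
The abstract names the Federated Averaging algorithm but does not spell out its aggregation step. As a rough illustration only, the sketch below shows the commonly described server-side step: a weighted average of client parameter vectors, with each client weighted by its number of training examples. The function name `federated_average`, the flat-list representation of model parameters, and the toy values are assumptions made for this example, not code from the paper or the linked repository.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local example counts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one example count per client")

    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n_examples in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must report the same number of parameters")
        share = n_examples / total  # this client's fraction of all examples
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


if __name__ == "__main__":
    # Two hypothetical clients: one with 1 example, one with 3.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Only the parameter vectors and example counts reach the server in this sketch, which mirrors the abstract's point that training happens on client devices without exporting user data.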

CINIC-10 is not ImageNet or CIFAR-10

Title CINIC-10 is not ImageNet or CIFAR-10
Authors Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey
Abstract In this brief technical report we introduce the CINIC-10 dataset as a plug-in extended alternative for CIFAR-10. It was compiled by combining CIFAR-10 with images selected and downsampled from the ImageNet database. We present the approach to compiling the dataset, illustrate the example images for different classes, give pixel distributions for each part of the repository, and give some standard benchmarks for well known models. Details for download, usage, and compilation can be found in the associated github repository.
Tasks Image Classification
Published 2018-10-02
URL http://arxiv.org/abs/1810.03505v1
PDF http://arxiv.org/pdf/1810.03505v1.pdf
PWC https://paperswithcode.com/paper/cinic-10-is-not-imagenet-or-cifar-10
Repo https://github.com/BayesWatch/cinic-10
Framework pytorch

Everybody Dance Now

Title Everybody Dance Now
Authors Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros
Abstract This paper presents a simple method for “do as I do” motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. We approach this problem as video-to-video translation using pose as an intermediate representation. To transfer the motion, we extract poses from the source subject and apply the learned pose-to-appearance mapping to generate the target subject. We predict two consecutive frames for temporally coherent video results and introduce a separate pipeline for realistic face synthesis. Although our method is quite simple, it produces surprisingly compelling results (see video). This motivates us to also provide a forensics tool for reliable synthetic content detection, which is able to distinguish videos synthesized by our system from real data. In addition, we release a first-of-its-kind open-source dataset of videos that can be legally used for training and motion transfer.
Tasks Face Generation, Image-to-Image Translation, Video Generation
Published 2018-08-22
URL https://arxiv.org/abs/1808.07371v2
PDF https://arxiv.org/pdf/1808.07371v2.pdf
PWC https://paperswithcode.com/paper/everybody-dance-now
Repo https://github.com/ShutoAraki/EverybodyDanceNow
Framework none

Infrared and visible image fusion using Latent Low-Rank Representation

Title Infrared and visible image fusion using Latent Low-Rank Representation
Authors Hui Li, Xiao-Jun Wu
Abstract Infrared and visible image fusion is an important problem in the field of image fusion which has been applied widely in many fields. To better preserve the useful information from source images, in this paper, we propose a novel image fusion method based on latent low-rank representation (LatLRR) which is simple and effective. Firstly, the source images are decomposed into low-rank parts (global structure) and saliency parts (local structure) by LatLRR. Then, the low-rank parts are fused by weighted-average strategy to preserve more contour information. Then, the saliency parts are simply fused by sum strategy which is an efficient operation in this fusion framework. Finally, the fused image is obtained by combining the fused low-rank part and the fused saliency part. Compared with other fusion methods experimentally, the proposed method has better fusion performance than state-of-the-art fusion methods in both subjective and objective evaluation. The code of our fusion method is available at https://github.com/hli1221/imagefusion_Infrared_visible_latlrr
Tasks Infrared And Visible Image Fusion
Published 2018-04-24
URL https://arxiv.org/abs/1804.08992v4
PDF https://arxiv.org/pdf/1804.08992v4.pdf
PWC https://paperswithcode.com/paper/infrared-and-visible-image-fusion-using
Repo https://github.com/exceptionLi/imagefusion_Infrared_visible_latlrr
Framework none
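
The fusion steps in the abstract can be sketched as follows, assuming the LatLRR decomposition has already been computed elsewhere. The function name `fuse_parts`, the nested-list image representation, the equal weights of 0.5, and the reading of "combining" as element-wise addition are assumptions made for this example; the decomposition itself is not shown.

```python
from typing import List

Image = List[List[float]]


def fuse_parts(
    lowrank_ir: Image,
    lowrank_vis: Image,
    saliency_ir: Image,
    saliency_vis: Image,
    w_ir: float = 0.5,
    w_vis: float = 0.5,
) -> Image:
    """Fuse pre-computed low-rank and saliency parts of two source images."""
    fused = []
    rows = zip(lowrank_ir, lowrank_vis, saliency_ir, saliency_vis)
    for lr_ir_row, lr_vis_row, s_ir_row, s_vis_row in rows:
        fused_row = []
        for lr_ir, lr_vis, s_ir, s_vis in zip(lr_ir_row, lr_vis_row, s_ir_row, s_vis_row):
            lowrank = w_ir * lr_ir + w_vis * lr_vis  # weighted-average strategy
            saliency = s_ir + s_vis                  # sum strategy
            fused_row.append(lowrank + saliency)     # combine the two fused parts
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    # Toy 1x2 "images" with made-up values.
    print(fuse_parts([[2.0, 4.0]], [[4.0, 8.0]], [[1.0, 0.0]], [[0.0, 1.0]]))
```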