October 21, 2019

Paper Group AWR 38

DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation. Deep learning for pedestrians: backpropagation in CNNs. Deep Facial Expression Recognition: A Survey. On the Decision Boundary of Deep Neural Networks. Exploiting temporal and depth information for multi-frame face anti-spoofing. Deep Neural Network C …

DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation


Title	DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation
Authors	Jimmy Wu, Bolei Zhou, Diondra Peck, Scott Hsieh, Vandana Dialani, Lester Mackey, Genevieve Patterson
Abstract	We propose DeepMiner, a framework to discover interpretable representations in deep neural networks and to build explanations for medical predictions. By probing convolutional neural networks (CNNs) trained to classify cancer in mammograms, we show that many individual units in the final convolutional layer of a CNN respond strongly to diseased tissue concepts specified by the BI-RADS lexicon. After expert annotation of the interpretable units, our proposed method is able to generate explanations for CNN mammogram classification that are correlated with ground truth radiology reports on the DDSM dataset. We show that DeepMiner not only enables better understanding of the nuances of CNN classification decisions, but also possibly discovers new visual knowledge relevant to medical diagnosis.
Tasks	Medical Diagnosis
Published	2018-05-31
URL	http://arxiv.org/abs/1805.12323v1
PDF	http://arxiv.org/pdf/1805.12323v1.pdf
PWC	https://paperswithcode.com/paper/deepminer-discovering-interpretable
Repo	https://github.com/jimmyyhwu/ddsm-visual-primitives
Framework	pytorch

Deep learning for pedestrians: backpropagation in CNNs


Title	Deep learning for pedestrians: backpropagation in CNNs
Authors	Laurent Boué
Abstract	The goal of this document is to provide a pedagogical introduction to the main concepts underpinning the training of deep neural networks using gradient descent, a process known as backpropagation. Although we focus on a very influential class of architectures called “convolutional neural networks” (CNNs), the approach is generic and useful to the machine learning community as a whole. Motivated by the observation that derivations of backpropagation are often obscured by clumsy index-heavy narratives that appear somewhat mathemagical, we aim to offer a conceptually clear, vectorized description that articulates well the higher level logic. Following the principle of “writing is nature’s way of letting you know how sloppy your thinking is”, we try to make the calculations meticulous, self-contained and yet as intuitive as possible. Taking nothing for granted, ample illustrations serve as visual guides and an extensive bibliography is provided for further explorations. (For the sake of clarity, long mathematical derivations and visualizations have been broken up into short “summarized views” and longer “detailed views” encoded into the PDF as optional content groups. Some figures contain animations designed to illustrate important concepts in a more engaging style. For these reasons, we advise downloading the document locally and opening it using Adobe Acrobat Reader. Other viewers were not tested and may not render the detailed views and animations correctly.)
Tasks
Published	2018-11-29
URL	http://arxiv.org/abs/1811.11987v1
PDF	http://arxiv.org/pdf/1811.11987v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-pedestrians-backpropagation
Repo	https://github.com/Ranlot/backpropagation-CNNs
Framework	pytorch
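
To make the idea in the abstract above concrete, here is a minimal sketch of backpropagation on a toy two-layer scalar network, with a finite-difference check of the result. It is not taken from the paper or its repository: the network shape, the tanh activation, the squared-error loss, and all function names are assumptions chosen only for illustration.

```python
import math

def forward(w1, w2, x):
    """Toy network (illustrative assumption): h = tanh(w1 * x), y_hat = w2 * h."""
    h = math.tanh(w1 * x)
    return h, w2 * h

def loss(w1, w2, x, y):
    """Squared-error loss 0.5 * (y_hat - y)^2."""
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2

def backward(w1, w2, x, y):
    """Apply the chain rule from the loss back to each weight."""
    h, y_hat = forward(w1, w2, x)
    d_yhat = y_hat - y            # dL/dy_hat
    grad_w2 = d_yhat * h          # dL/dw2
    d_h = d_yhat * w2             # dL/dh
    d_z = d_h * (1.0 - h * h)     # tanh'(z) = 1 - tanh(z)^2
    grad_w1 = d_z * x             # dL/dw1
    return grad_w1, grad_w2

def numerical_grads(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    g1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
    return g1, g2

analytic = backward(0.3, -0.7, 0.5, 1.0)
numeric = numerical_grads(0.3, -0.7, 0.5, 1.0)
```

Comparing `analytic` with `numeric` is a common sanity check when deriving gradients by hand; the two should agree up to the finite-difference error.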

Deep Facial Expression Recognition: A Survey


Title	Deep Facial Expression Recognition: A Survey
Authors	Shan Li, Weihong Deng
Abstract	With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias. In this paper, we provide a comprehensive survey on deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems.
Tasks	Facial Expression Recognition
Published	2018-04-23
URL	http://arxiv.org/abs/1804.08348v2
PDF	http://arxiv.org/pdf/1804.08348v2.pdf
PWC	https://paperswithcode.com/paper/deep-facial-expression-recognition-a-survey
Repo	https://github.com/yijiazh/DFER_Summer2019
Framework	tf

On the Decision Boundary of Deep Neural Networks


Title	On the Decision Boundary of Deep Neural Networks
Authors	Yu Li, Lizhong Ding, Xin Gao
Abstract	While deep learning models and techniques have achieved great empirical success, our understanding of the source of success in many aspects remains very limited. In an attempt to bridge the gap, we investigate the decision boundary of a production deep learning architecture with weak assumptions on both the training data and the model. We demonstrate, both theoretically and empirically, that the last weight layer of a neural network converges to a linear SVM trained on the output of the last hidden layer, for both the binary case and the multi-class case with the commonly used cross-entropy loss. Furthermore, we show empirically that training a neural network as a whole, instead of only fine-tuning the last weight layer, may result in a better bias constant for the last weight layer, which is important for generalization. In addition to facilitating the understanding of deep learning, our result can be helpful for solving a broad range of practical problems of deep learning, such as catastrophic forgetting and adversarial attacks. The experiment codes are available at https://github.com/lykaust15/NN_decision_boundary
Tasks
Published	2018-08-16
URL	http://arxiv.org/abs/1808.05385v3
PDF	http://arxiv.org/pdf/1808.05385v3.pdf
PWC	https://paperswithcode.com/paper/on-the-decision-boundary-of-deep-neural
Repo	https://github.com/lykaust15/NN_decision_boundary
Framework	tf

Exploiting temporal and depth information for multi-frame face anti-spoofing


Title	Exploiting temporal and depth information for multi-frame face anti-spoofing
Authors	Zezheng Wang, Chenxu Zhao, Yunxiao Qin, Qiusheng Zhou, Guojun Qi, Jun Wan, Zhen Lei
Abstract	Face anti-spoofing is significant to the security of face recognition systems. Previous works on depth supervised learning have proved its effectiveness for face anti-spoofing. Nevertheless, they only considered the depth as an auxiliary supervision in the single frame. Different from these methods, we develop a new method to estimate depth information from multiple RGB frames and propose a depth-supervised architecture which can efficiently encode spatiotemporal information for presentation attack detection. It includes two novel modules: optical flow guided feature block (OFFB) and convolution gated recurrent units (ConvGRU) module, which are designed to extract short-term and long-term motion to discriminate living and spoofing faces. Extensive experiments demonstrate that the proposed approach achieves state-of-the-art results on four benchmark datasets, namely OULU-NPU, SiW, CASIA-MFSD, and Replay-Attack.
Tasks	Face Anti-Spoofing, Face Recognition, Optical Flow Estimation
Published	2018-11-13
URL	http://arxiv.org/abs/1811.05118v3
PDF	http://arxiv.org/pdf/1811.05118v3.pdf
PWC	https://paperswithcode.com/paper/exploiting-temporal-and-depth-information-for
Repo	https://github.com/clks-wzz/PRNet-Depth-Generation
Framework	tf

Deep Neural Network Compression with Single and Multiple Level Quantization


Title	Deep Neural Network Compression with Single and Multiple Level Quantization
Authors	Yuhui Xu, Yongzhuang Wang, Aojun Zhou, Weiyao Lin, Hongkai Xiong
Abstract	Network quantization is an effective solution to compress deep neural networks for practical usage. Existing network quantization methods cannot sufficiently exploit the depth information to generate a low-bit compressed network. In this paper, we propose two novel network quantization approaches, single-level network quantization (SLQ) for high-bit quantization and multi-level network quantization (MLQ) for extremely low-bit quantization (ternary). We are the first to consider the network quantization from both width and depth levels. In the width level, parameters are divided into two parts: one for quantization and the other for re-training to eliminate the quantization loss. SLQ leverages the distribution of the parameters to improve the width level. In the depth level, we introduce incremental layer compensation to quantize layers iteratively which decreases the quantization loss in each iteration. The proposed approaches are validated with extensive experiments based on the state-of-the-art neural networks including AlexNet, VGG-16, GoogleNet and ResNet-18. Both SLQ and MLQ achieve impressive results.
Tasks	Neural Network Compression, Quantization
Published	2018-03-06
URL	http://arxiv.org/abs/1803.03289v2
PDF	http://arxiv.org/pdf/1803.03289v2.pdf
PWC	https://paperswithcode.com/paper/deep-neural-network-compression-with-single
Repo	https://github.com/yuhuixu1993/SLQ
Framework	none

Amortized Bayesian inference for clustering models


Title	Amortized Bayesian inference for clustering models
Authors	Ari Pakman, Liam Paninski
Abstract	We develop methods for efficient amortized approximate Bayesian inference over posterior distributions of probabilistic clustering models, such as Dirichlet process mixture models. The approach is based on mapping distributed, symmetry-invariant representations of cluster arrangements into conditional probabilities. The method parallelizes easily, yields iid samples from the approximate posterior of cluster assignments with the same computational cost as a single Gibbs sampler sweep, and can easily be applied to both conjugate and non-conjugate models, as training only requires samples from the generative model.
Tasks	Bayesian Inference
Published	2018-11-24
URL	http://arxiv.org/abs/1811.09747v1
PDF	http://arxiv.org/pdf/1811.09747v1.pdf
PWC	https://paperswithcode.com/paper/amortized-bayesian-inference-for-clustering
Repo	https://github.com/aripakman/neural_clustering_process
Framework	pytorch

Reinforcement Learning for Solving the Vehicle Routing Problem


Title	Reinforcement Learning for Solving the Vehicle Routing Problem
Authors	Mohammadreza Nazari, Afshin Oroojlooy, Lawrence V. Snyder, Martin Takáč
Abstract	We present an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing the reward signals and following feasibility rules. Our model represents a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. On capacitated VRP, our approach outperforms classical heuristics and Google’s OR-Tools on medium-sized instances in solution quality with comparable computation time (after training). We demonstrate how our approach can handle problems with split delivery and explore the effect of such deliveries on the solution quality. Our proposed framework can be applied to other variants of the VRP such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems.
Tasks	Combinatorial Optimization
Published	2018-02-12
URL	http://arxiv.org/abs/1802.04240v2
PDF	http://arxiv.org/pdf/1802.04240v2.pdf
PWC	https://paperswithcode.com/paper/reinforcement-learning-for-solving-the
Repo	https://github.com/OptMLGroup/VRP-RL
Framework	tf

Bayesian Uncertainty Estimation for Batch Normalized Deep Networks


Title	Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
Authors	Mattias Teye, Hossein Azizpour, Kevin Smith
Abstract	We show that training a deep network using batch normalization is equivalent to approximate inference in Bayesian models. We further demonstrate that this finding allows us to make meaningful estimates of the model uncertainty using conventional architectures, without modifications to the network or the training procedure. Our approach is thoroughly validated by measuring the quality of uncertainty in a series of empirical experiments on different tasks. It outperforms baselines with strong statistical significance, and displays competitive performance with recent Bayesian approaches.
Tasks
Published	2018-02-18
URL	http://arxiv.org/abs/1802.06455v2
PDF	http://arxiv.org/pdf/1802.06455v2.pdf
PWC	https://paperswithcode.com/paper/bayesian-uncertainty-estimation-for-batch
Repo	https://github.com/petteriTeikari/pyML_regression_skeleton
Framework	none

Node Classification for Signed Social Networks Using Diffuse Interface Methods


Title	Node Classification for Signed Social Networks Using Diffuse Interface Methods
Authors	Pedro Mercado, Jessica Bosch, Martin Stoll
Abstract	Signed networks contain both positive and negative kinds of interactions like friendship and enmity. The task of node classification in non-signed graphs has proven to be beneficial in many real world applications, yet extensions to signed networks remain largely unexplored. In this paper we introduce the first analysis of node classification in signed social networks via diffuse interface methods based on the Ginzburg-Landau functional together with different extensions of the graph Laplacian to signed networks. We show that blending the information from both positive and negative interactions leads to performance improvement in real signed social networks, consistently outperforming the current state of the art.
Tasks	Node Classification
Published	2018-09-07
URL	https://arxiv.org/abs/1809.06432v2
PDF	https://arxiv.org/pdf/1809.06432v2.pdf
PWC	https://paperswithcode.com/paper/node-classification-for-signed-social
Repo	https://github.com/melopeo/GL
Framework	none

Music Genre Classification using Masked Conditional Neural Networks


Title	Music Genre Classification using Masked Conditional Neural Networks
Authors	Fady Medhat, David Chesmore, John Robinson
Abstract	The ConditionaL Neural Networks (CLNN) and the Masked ConditionaL Neural Networks (MCLNN) exploit the nature of multi-dimensional temporal signals. The CLNN captures the conditional temporal influence between the frames in a window and the mask in the MCLNN enforces a systematic sparseness that follows a filterbank-like pattern over the network links. The mask induces the network to learn about time-frequency representations in bands, allowing the network to sustain frequency shifts. Additionally, the mask in the MCLNN automates the exploration of a range of feature combinations, usually done through an exhaustive manual search. We have evaluated the MCLNN performance using the Ballroom and Homburg datasets of music genres. MCLNN has achieved accuracies that are competitive with state-of-the-art handcrafted attempts in addition to models based on Convolutional Neural Networks.
Tasks
Published	2018-02-18
URL	http://arxiv.org/abs/1802.06432v2
PDF	http://arxiv.org/pdf/1802.06432v2.pdf
PWC	https://paperswithcode.com/paper/music-genre-classification-using-masked
Repo	https://github.com/fadymedhat/MCLNN
Framework	tf

Federated Learning for Mobile Keyboard Prediction


Title	Federated Learning for Mobile Keyboard Prediction
Authors	Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, Daniel Ramage
Abstract	We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the Federated Averaging algorithm. The federated algorithm, which enables training on a higher-quality dataset for this use case, is shown to achieve better prediction recall. This work demonstrates the feasibility and benefit of training language models on client devices without exporting sensitive user data to servers. The federated learning environment gives users greater control over the use of their data and simplifies the task of incorporating privacy by default with distributed training and aggregation across a population of client devices.
Tasks	Language Modelling
Published	2018-11-08
URL	http://arxiv.org/abs/1811.03604v2
PDF	http://arxiv.org/pdf/1811.03604v2.pdf
PWC	https://paperswithcode.com/paper/federated-learning-for-mobile-keyboard
Repo	https://github.com/MsAmberWelch/Privacy-Engineering
Framework	tf
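
The abstract above names the Federated Averaging algorithm without spelling it out. As a rough illustration, the sketch below shows only the server-side aggregation step as it is commonly described: client parameter vectors are averaged with weights proportional to each client's number of local training examples. The function name, the flat-list parameter format, and the example numbers are hypothetical and not taken from the paper or the linked repository; client-side training, client sampling, and communication are omitted.

```python
from typing import List, Sequence

def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighting each by its local example count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one example count per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n_examples in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must report vectors of the same length")
        for i, w in enumerate(weights):
            averaged[i] += w * n_examples / total
    return averaged

# Two hypothetical clients: the second holds three times as much data,
# so its parameters dominate the average.
new_global = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a full round, the server would send `new_global` back to the next set of clients, which would each continue training locally from it.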

CINIC-10 is not ImageNet or CIFAR-10


Title	CINIC-10 is not ImageNet or CIFAR-10
Authors	Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey
Abstract	In this brief technical report we introduce the CINIC-10 dataset as a plug-in extended alternative for CIFAR-10. It was compiled by combining CIFAR-10 with images selected and downsampled from the ImageNet database. We present the approach to compiling the dataset, illustrate the example images for different classes, give pixel distributions for each part of the repository, and give some standard benchmarks for well known models. Details for download, usage, and compilation can be found in the associated github repository.
Tasks	Image Classification
Published	2018-10-02
URL	http://arxiv.org/abs/1810.03505v1
PDF	http://arxiv.org/pdf/1810.03505v1.pdf
PWC	https://paperswithcode.com/paper/cinic-10-is-not-imagenet-or-cifar-10
Repo	https://github.com/BayesWatch/cinic-10
Framework	pytorch

Everybody Dance Now


Title	Everybody Dance Now
Authors	Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros
Abstract	This paper presents a simple method for “do as I do” motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. We approach this problem as video-to-video translation using pose as an intermediate representation. To transfer the motion, we extract poses from the source subject and apply the learned pose-to-appearance mapping to generate the target subject. We predict two consecutive frames for temporally coherent video results and introduce a separate pipeline for realistic face synthesis. Although our method is quite simple, it produces surprisingly compelling results (see video). This motivates us to also provide a forensics tool for reliable synthetic content detection, which is able to distinguish videos synthesized by our system from real data. In addition, we release a first-of-its-kind open-source dataset of videos that can be legally used for training and motion transfer.
Tasks	Face Generation, Image-to-Image Translation, Video Generation
Published	2018-08-22
URL	https://arxiv.org/abs/1808.07371v2
PDF	https://arxiv.org/pdf/1808.07371v2.pdf
PWC	https://paperswithcode.com/paper/everybody-dance-now
Repo	https://github.com/ShutoAraki/EverybodyDanceNow
Framework	none

Infrared and visible image fusion using Latent Low-Rank Representation


Title	Infrared and visible image fusion using Latent Low-Rank Representation
Authors	Hui Li, Xiao-Jun Wu
Abstract	Infrared and visible image fusion is an important problem in the field of image fusion which has been applied widely in many fields. To better preserve the useful information from source images, in this paper, we propose a novel image fusion method based on latent low-rank representation (LatLRR) which is simple and effective. Firstly, the source images are decomposed into low-rank parts (global structure) and saliency parts (local structure) by LatLRR. Then, the low-rank parts are fused by weighted-average strategy to preserve more contour information. Then, the saliency parts are simply fused by sum strategy which is an efficient operation in this fusion framework. Finally, the fused image is obtained by combining the fused low-rank part and the fused saliency part. Compared with other fusion methods experimentally, the proposed method has better fusion performance than state-of-the-art fusion methods in both subjective and objective evaluation. The code of our fusion method is available at https://github.com/hli1221/imagefusion_Infrared_visible_latlrr
Tasks	Infrared And Visible Image Fusion
Published	2018-04-24
URL	https://arxiv.org/abs/1804.08992v4
PDF	https://arxiv.org/pdf/1804.08992v4.pdf
PWC	https://paperswithcode.com/paper/infrared-and-visible-image-fusion-using
Repo	https://github.com/exceptionLi/imagefusion_Infrared_visible_latlrr
Framework	none
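
The fusion rule described in the abstract above (weighted average of the low-rank parts, sum of the saliency parts, then combine) can be sketched as follows. This is only an illustration under stated assumptions: the LatLRR decomposition itself is not implemented and the four input parts are assumed to be already computed, the equal weights of 0.5 are a placeholder rather than the authors' setting, "combining" is assumed to mean pixel-wise addition, and the function name `fuse_parts` is hypothetical.

```python
from typing import List

Image = List[List[float]]

def fuse_parts(
    low_rank_a: Image,
    low_rank_b: Image,
    saliency_a: Image,
    saliency_b: Image,
    w_a: float = 0.5,   # assumed weight for the first low-rank part
    w_b: float = 0.5,   # assumed weight for the second low-rank part
) -> Image:
    """Fuse two pre-decomposed images pixel by pixel."""
    fused: Image = []
    for r in range(len(low_rank_a)):
        row = []
        for c in range(len(low_rank_a[r])):
            # Weighted-average strategy for the low-rank (global structure) parts.
            low = w_a * low_rank_a[r][c] + w_b * low_rank_b[r][c]
            # Sum strategy for the saliency (local structure) parts.
            sal = saliency_a[r][c] + saliency_b[r][c]
            # Assumed combination: add the two fused parts.
            row.append(low + sal)
        fused.append(row)
    return fused

# One-pixel example with made-up values for the infrared and visible parts.
fused_pixel = fuse_parts([[2.0]], [[4.0]], [[1.0]], [[0.5]])
```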