Paper Group AWR 38
DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation
Title | DeepMiner: Discovering Interpretable Representations for Mammogram Classification and Explanation |
Authors | Jimmy Wu, Bolei Zhou, Diondra Peck, Scott Hsieh, Vandana Dialani, Lester Mackey, Genevieve Patterson |
Abstract | We propose DeepMiner, a framework to discover interpretable representations in deep neural networks and to build explanations for medical predictions. By probing convolutional neural networks (CNNs) trained to classify cancer in mammograms, we show that many individual units in the final convolutional layer of a CNN respond strongly to diseased tissue concepts specified by the BI-RADS lexicon. After expert annotation of the interpretable units, our proposed method is able to generate explanations for CNN mammogram classification that are correlated with ground truth radiology reports on the DDSM dataset. We show that DeepMiner not only enables better understanding of the nuances of CNN classification decisions, but also possibly discovers new visual knowledge relevant to medical diagnosis. |
Tasks | Medical Diagnosis |
Published | 2018-05-31 |
URL | http://arxiv.org/abs/1805.12323v1 |
http://arxiv.org/pdf/1805.12323v1.pdf | |
PWC | https://paperswithcode.com/paper/deepminer-discovering-interpretable |
Repo | https://github.com/jimmyyhwu/ddsm-visual-primitives |
Framework | pytorch |
Deep learning for pedestrians: backpropagation in CNNs
Title | Deep learning for pedestrians: backpropagation in CNNs |
Authors | Laurent Boué |
Abstract | The goal of this document is to provide a pedagogical introduction to the main concepts underpinning the training of deep neural networks using gradient descent; a process known as backpropagation. Although we focus on a very influential class of architectures called “convolutional neural networks” (CNNs) the approach is generic and useful to the machine learning community as a whole. Motivated by the observation that derivations of backpropagation are often obscured by clumsy index-heavy narratives that appear somewhat mathemagical, we aim to offer a conceptually clear, vectorized description that articulates well the higher level logic. Following the principle of “writing is nature’s way of letting you know how sloppy your thinking is”, we try to make the calculations meticulous, self-contained and yet as intuitive as possible. Taking nothing for granted, ample illustrations serve as visual guides and an extensive bibliography is provided for further explorations. (For the sake of clarity, long mathematical derivations and visualizations have been broken up into short “summarized views” and longer “detailed views” encoded into the PDF as optional content groups. Some figures contain animations designed to illustrate important concepts in a more engaging style. For these reasons, we advise to download the document locally and open it using Adobe Acrobat Reader. Other viewers were not tested and may not render the detailed views, animations correctly.) |
Tasks | |
Published | 2018-11-29 |
URL | http://arxiv.org/abs/1811.11987v1 |
http://arxiv.org/pdf/1811.11987v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-pedestrians-backpropagation |
Repo | https://github.com/Ranlot/backpropagation-CNNs |
Framework | pytorch |
Deep Facial Expression Recognition: A Survey
Title | Deep Facial Expression Recognition: A Survey |
Authors | Shan Li, Weihong Deng |
Abstract | With the transition of facial expression recognition (FER) from laboratory-controlled to challenging in-the-wild conditions and the recent success of deep learning techniques in various fields, deep neural networks have increasingly been leveraged to learn discriminative representations for automatic FER. Recent deep FER systems generally focus on two important issues: overfitting caused by a lack of sufficient training data and expression-unrelated variations, such as illumination, head pose and identity bias. In this paper, we provide a comprehensive survey on deep FER, including datasets and algorithms that provide insights into these intrinsic problems. First, we describe the standard pipeline of a deep FER system with the related background knowledge and suggestions of applicable implementations for each stage. We then introduce the available datasets that are widely used in the literature and provide accepted data selection and evaluation principles for these datasets. For the state of the art in deep FER, we review existing novel deep neural networks and related training strategies that are designed for FER based on both static images and dynamic image sequences, and discuss their advantages and limitations. Competitive performances on widely used benchmarks are also summarized in this section. We then extend our survey to additional related issues and application scenarios. Finally, we review the remaining challenges and corresponding opportunities in this field as well as future directions for the design of robust deep FER systems. |
Tasks | Facial Expression Recognition |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08348v2 |
http://arxiv.org/pdf/1804.08348v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-facial-expression-recognition-a-survey |
Repo | https://github.com/yijiazh/DFER_Summer2019 |
Framework | tf |
On the Decision Boundary of Deep Neural Networks
Title | On the Decision Boundary of Deep Neural Networks |
Authors | Yu Li, Lizhong Ding, Xin Gao |
Abstract | While deep learning models and techniques have achieved great empirical success, our understanding of the source of success in many aspects remains very limited. In an attempt to bridge the gap, we investigate the decision boundary of a production deep learning architecture with weak assumptions on both the training data and the model. We demonstrate, both theoretically and empirically, that the last weight layer of a neural network converges to a linear SVM trained on the output of the last hidden layer, for both the binary case and the multi-class case with the commonly used cross-entropy loss. Furthermore, we show empirically that training a neural network as a whole, instead of only fine-tuning the last weight layer, may result in better bias constant for the last weight layer, which is important for generalization. In addition to facilitating the understanding of deep learning, our result can be helpful for solving a broad range of practical problems of deep learning, such as catastrophic forgetting and adversarial attacking. The experiment codes are available at https://github.com/lykaust15/NN_decision_boundary |
Tasks | |
Published | 2018-08-16 |
URL | http://arxiv.org/abs/1808.05385v3 |
http://arxiv.org/pdf/1808.05385v3.pdf | |
PWC | https://paperswithcode.com/paper/on-the-decision-boundary-of-deep-neural |
Repo | https://github.com/lykaust15/NN_decision_boundary |
Framework | tf |
Exploiting temporal and depth information for multi-frame face anti-spoofing
Title | Exploiting temporal and depth information for multi-frame face anti-spoofing |
Authors | Zezheng Wang, Chenxu Zhao, Yunxiao Qin, Qiusheng Zhou, Guojun Qi, Jun Wan, Zhen Lei |
Abstract | Face anti-spoofing is significant to the security of face recognition systems. Previous works on depth-supervised learning have proved effective for face anti-spoofing. Nevertheless, they only considered the depth as an auxiliary supervision in the single frame. Different from these methods, we develop a new method to estimate depth information from multiple RGB frames and propose a depth-supervised architecture which can efficiently encode spatiotemporal information for presentation attack detection. It includes two novel modules: optical flow guided feature block (OFFB) and convolution gated recurrent units (ConvGRU) module, which are designed to extract short-term and long-term motion to discriminate living and spoofing faces. Extensive experiments demonstrate that the proposed approach achieves state-of-the-art results on four benchmark datasets, namely OULU-NPU, SiW, CASIA-MFSD, and Replay-Attack. |
Tasks | Face Anti-Spoofing, Face Recognition, Optical Flow Estimation |
Published | 2018-11-13 |
URL | http://arxiv.org/abs/1811.05118v3 |
http://arxiv.org/pdf/1811.05118v3.pdf | |
PWC | https://paperswithcode.com/paper/exploiting-temporal-and-depth-information-for |
Repo | https://github.com/clks-wzz/PRNet-Depth-Generation |
Framework | tf |
Deep Neural Network Compression with Single and Multiple Level Quantization
Title | Deep Neural Network Compression with Single and Multiple Level Quantization |
Authors | Yuhui Xu, Yongzhuang Wang, Aojun Zhou, Weiyao Lin, Hongkai Xiong |
Abstract | Network quantization is an effective solution to compress deep neural networks for practical usage. Existing network quantization methods cannot sufficiently exploit the depth information to generate a low-bit compressed network. In this paper, we propose two novel network quantization approaches, single-level network quantization (SLQ) for high-bit quantization and multi-level network quantization (MLQ) for extremely low-bit quantization (ternary). We are the first to consider network quantization at both the width and depth levels. At the width level, parameters are divided into two parts: one for quantization and the other for re-training to eliminate the quantization loss. SLQ leverages the distribution of the parameters to improve the width level. At the depth level, we introduce incremental layer compensation to quantize layers iteratively, which decreases the quantization loss in each iteration. The proposed approaches are validated with extensive experiments based on the state-of-the-art neural networks including AlexNet, VGG-16, GoogleNet and ResNet-18. Both SLQ and MLQ achieve impressive results. |
Tasks | Neural Network Compression, Quantization |
Published | 2018-03-06 |
URL | http://arxiv.org/abs/1803.03289v2 |
http://arxiv.org/pdf/1803.03289v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-neural-network-compression-with-single |
Repo | https://github.com/yuhuixu1993/SLQ |
Framework | none |
Amortized Bayesian inference for clustering models
Title | Amortized Bayesian inference for clustering models |
Authors | Ari Pakman, Liam Paninski |
Abstract | We develop methods for efficient amortized approximate Bayesian inference over posterior distributions of probabilistic clustering models, such as Dirichlet process mixture models. The approach is based on mapping distributed, symmetry-invariant representations of cluster arrangements into conditional probabilities. The method parallelizes easily, yields iid samples from the approximate posterior of cluster assignments with the same computational cost of a single Gibbs sampler sweep, and can easily be applied to both conjugate and non-conjugate models, as training only requires samples from the generative model. |
Tasks | Bayesian Inference |
Published | 2018-11-24 |
URL | http://arxiv.org/abs/1811.09747v1 |
http://arxiv.org/pdf/1811.09747v1.pdf | |
PWC | https://paperswithcode.com/paper/amortized-bayesian-inference-for-clustering |
Repo | https://github.com/aripakman/neural_clustering_process |
Framework | pytorch |
Reinforcement Learning for Solving the Vehicle Routing Problem
Title | Reinforcement Learning for Solving the Vehicle Routing Problem |
Authors | Mohammadreza Nazari, Afshin Oroojlooy, Lawrence V. Snyder, Martin Takáč |
Abstract | We present an end-to-end framework for solving the Vehicle Routing Problem (VRP) using reinforcement learning. In this approach, we train a single model that finds near-optimal solutions for problem instances sampled from a given distribution, only by observing the reward signals and following feasibility rules. Our model represents a parameterized stochastic policy, and by applying a policy gradient algorithm to optimize its parameters, the trained model produces the solution as a sequence of consecutive actions in real time, without the need to re-train for every new problem instance. On capacitated VRP, our approach outperforms classical heuristics and Google’s OR-Tools on medium-sized instances in solution quality with comparable computation time (after training). We demonstrate how our approach can handle problems with split delivery and explore the effect of such deliveries on the solution quality. Our proposed framework can be applied to other variants of the VRP such as the stochastic VRP, and has the potential to be applied more generally to combinatorial optimization problems. |
Tasks | Combinatorial Optimization |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.04240v2 |
http://arxiv.org/pdf/1802.04240v2.pdf | |
PWC | https://paperswithcode.com/paper/reinforcement-learning-for-solving-the |
Repo | https://github.com/OptMLGroup/VRP-RL |
Framework | tf |
Bayesian Uncertainty Estimation for Batch Normalized Deep Networks
Title | Bayesian Uncertainty Estimation for Batch Normalized Deep Networks |
Authors | Mattias Teye, Hossein Azizpour, Kevin Smith |
Abstract | We show that training a deep network using batch normalization is equivalent to approximate inference in Bayesian models. We further demonstrate that this finding allows us to make meaningful estimates of the model uncertainty using conventional architectures, without modifications to the network or the training procedure. Our approach is thoroughly validated by measuring the quality of uncertainty in a series of empirical experiments on different tasks. It outperforms baselines with strong statistical significance, and displays competitive performance with recent Bayesian approaches. |
Tasks | |
Published | 2018-02-18 |
URL | http://arxiv.org/abs/1802.06455v2 |
http://arxiv.org/pdf/1802.06455v2.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-uncertainty-estimation-for-batch |
Repo | https://github.com/petteriTeikari/pyML_regression_skeleton |
Framework | none |
Node Classification for Signed Social Networks Using Diffuse Interface Methods
Title | Node Classification for Signed Social Networks Using Diffuse Interface Methods |
Authors | Pedro Mercado, Jessica Bosch, Martin Stoll |
Abstract | Signed networks contain both positive and negative kinds of interactions like friendship and enmity. The task of node classification in non-signed graphs has proven to be beneficial in many real world applications, yet extensions to signed networks remain largely unexplored. In this paper we introduce the first analysis of node classification in signed social networks via diffuse interface methods based on the Ginzburg-Landau functional together with different extensions of the graph Laplacian to signed networks. We show that blending the information from both positive and negative interactions leads to performance improvement in real signed social networks, consistently outperforming the current state of the art. |
Tasks | Node Classification |
Published | 2018-09-07 |
URL | https://arxiv.org/abs/1809.06432v2 |
https://arxiv.org/pdf/1809.06432v2.pdf | |
PWC | https://paperswithcode.com/paper/node-classification-for-signed-social |
Repo | https://github.com/melopeo/GL |
Framework | none |
Music Genre Classification using Masked Conditional Neural Networks
Title | Music Genre Classification using Masked Conditional Neural Networks |
Authors | Fady Medhat, David Chesmore, John Robinson |
Abstract | The ConditionaL Neural Networks (CLNN) and the Masked ConditionaL Neural Networks (MCLNN) exploit the nature of multi-dimensional temporal signals. The CLNN captures the conditional temporal influence between the frames in a window and the mask in the MCLNN enforces a systematic sparseness that follows a filterbank-like pattern over the network links. The mask induces the network to learn about time-frequency representations in bands, allowing the network to sustain frequency shifts. Additionally, the mask in the MCLNN automates the exploration of a range of feature combinations, usually done through an exhaustive manual search. We have evaluated the MCLNN performance using the Ballroom and Homburg datasets of music genres. MCLNN has achieved accuracies that are competitive to state-of-the-art handcrafted attempts in addition to models based on Convolutional Neural Networks. |
Tasks | |
Published | 2018-02-18 |
URL | http://arxiv.org/abs/1802.06432v2 |
http://arxiv.org/pdf/1802.06432v2.pdf | |
PWC | https://paperswithcode.com/paper/music-genre-classification-using-masked |
Repo | https://github.com/fadymedhat/MCLNN |
Framework | tf |
Federated Learning for Mobile Keyboard Prediction
Title | Federated Learning for Mobile Keyboard Prediction |
Authors | Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, Daniel Ramage |
Abstract | We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the Federated Averaging algorithm. The federated algorithm, which enables training on a higher-quality dataset for this use case, is shown to achieve better prediction recall. This work demonstrates the feasibility and benefit of training language models on client devices without exporting sensitive user data to servers. The federated learning environment gives users greater control over the use of their data and simplifies the task of incorporating privacy by default with distributed training and aggregation across a population of client devices. |
Tasks | Language Modelling |
Published | 2018-11-08 |
URL | http://arxiv.org/abs/1811.03604v2 |
http://arxiv.org/pdf/1811.03604v2.pdf | |
PWC | https://paperswithcode.com/paper/federated-learning-for-mobile-keyboard |
Repo | https://github.com/MsAmberWelch/Privacy-Engineering |
Framework | tf |
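The abstract above compares server-based SGD with the Federated Averaging algorithm. As a rough illustration only, the server-side aggregation step of Federated Averaging can be sketched as a weighted mean of client parameters, with each client weighted by its number of local examples. The function name `federated_average`, the toy parameter vectors, and the example counts are hypothetical and are not taken from the paper or the linked repository; local client training is omitted.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """One aggregation step: average client parameters, weighted by local example counts."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    # Each client's contribution is proportional to the number of examples it trained on.
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

# Hypothetical toy setup: two clients, each holding a 2-parameter model.
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
sizes = [1, 3]
global_weights = federated_average(clients, sizes)
print(global_weights)
```

In this sketch the second client holds three times as many examples, so the aggregated parameters land three quarters of the way toward its values.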
CINIC-10 is not ImageNet or CIFAR-10
Title | CINIC-10 is not ImageNet or CIFAR-10 |
Authors | Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey |
Abstract | In this brief technical report we introduce the CINIC-10 dataset as a plug-in extended alternative for CIFAR-10. It was compiled by combining CIFAR-10 with images selected and downsampled from the ImageNet database. We present the approach to compiling the dataset, illustrate the example images for different classes, give pixel distributions for each part of the repository, and give some standard benchmarks for well known models. Details for download, usage, and compilation can be found in the associated github repository. |
Tasks | Image Classification |
Published | 2018-10-02 |
URL | http://arxiv.org/abs/1810.03505v1 |
http://arxiv.org/pdf/1810.03505v1.pdf | |
PWC | https://paperswithcode.com/paper/cinic-10-is-not-imagenet-or-cifar-10 |
Repo | https://github.com/BayesWatch/cinic-10 |
Framework | pytorch |
Everybody Dance Now
Title | Everybody Dance Now |
Authors | Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros |
Abstract | This paper presents a simple method for “do as I do” motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. We approach this problem as video-to-video translation using pose as an intermediate representation. To transfer the motion, we extract poses from the source subject and apply the learned pose-to-appearance mapping to generate the target subject. We predict two consecutive frames for temporally coherent video results and introduce a separate pipeline for realistic face synthesis. Although our method is quite simple, it produces surprisingly compelling results (see video). This motivates us to also provide a forensics tool for reliable synthetic content detection, which is able to distinguish videos synthesized by our system from real data. In addition, we release a first-of-its-kind open-source dataset of videos that can be legally used for training and motion transfer. |
Tasks | Face Generation, Image-to-Image Translation, Video Generation |
Published | 2018-08-22 |
URL | https://arxiv.org/abs/1808.07371v2 |
https://arxiv.org/pdf/1808.07371v2.pdf | |
PWC | https://paperswithcode.com/paper/everybody-dance-now |
Repo | https://github.com/ShutoAraki/EverybodyDanceNow |
Framework | none |
Infrared and visible image fusion using Latent Low-Rank Representation
Title | Infrared and visible image fusion using Latent Low-Rank Representation |
Authors | Hui Li, Xiao-Jun Wu |
Abstract | Infrared and visible image fusion is an important problem in the field of image fusion which has been applied widely in many fields. To better preserve the useful information from source images, in this paper, we propose a novel image fusion method based on latent low-rank representation (LatLRR) which is simple and effective. Firstly, the source images are decomposed into low-rank parts (global structure) and saliency parts (local structure) by LatLRR. Then, the low-rank parts are fused by a weighted-average strategy to preserve more contour information. Then, the saliency parts are simply fused by a sum strategy, which is an efficient operation in this fusion framework. Finally, the fused image is obtained by combining the fused low-rank part and the fused saliency part. Compared with other fusion methods experimentally, the proposed method has better fusion performance than state-of-the-art fusion methods in both subjective and objective evaluation. The code of our fusion method is available at https://github.com/hli1221/imagefusion_Infrared_visible_latlrr |
Tasks | Infrared And Visible Image Fusion |
Published | 2018-04-24 |
URL | https://arxiv.org/abs/1804.08992v4 |
https://arxiv.org/pdf/1804.08992v4.pdf | |
PWC | https://paperswithcode.com/paper/infrared-and-visible-image-fusion-using |
Repo | https://github.com/exceptionLi/imagefusion_Infrared_visible_latlrr |
Framework | none |
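The fusion rule described in the abstract above (weighted average of the low-rank parts, sum of the saliency parts, then combination of the two) can be sketched as follows. This is a minimal illustration that assumes the LatLRR decomposition has already been computed elsewhere; the function name `fuse_latlrr_parts`, the equal weights of 0.5, and the toy arrays are assumptions for illustration, not the authors' code.

```python
import numpy as np

def fuse_latlrr_parts(lowrank_ir, lowrank_vis, saliency_ir, saliency_vis,
                      w_ir=0.5, w_vis=0.5):
    """Fuse pre-computed low-rank and saliency parts of an infrared/visible pair."""
    # Low-rank (global structure) parts: weighted average. Equal weights are an assumption.
    fused_lowrank = w_ir * lowrank_ir + w_vis * lowrank_vis
    # Saliency (local structure) parts: plain sum.
    fused_saliency = saliency_ir + saliency_vis
    # Final image: fused low-rank part plus fused saliency part.
    return fused_lowrank + fused_saliency

# Hypothetical 2x2 "images" standing in for real decomposition outputs.
lowrank_ir = np.full((2, 2), 0.2)
lowrank_vis = np.full((2, 2), 0.6)
saliency_ir = np.array([[0.1, 0.0], [0.0, 0.0]])
saliency_vis = np.array([[0.0, 0.0], [0.0, 0.2]])
fused = fuse_latlrr_parts(lowrank_ir, lowrank_vis, saliency_ir, saliency_vis)
print(fused)
```

The decomposition itself is deliberately left out, since it needs an optimization solver that this sketch does not attempt to reproduce.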