Paper Group NAWR 30
Skin Lesion Segmentation using SegNet with Binary Cross-Entropy. On the (In)fidelity and Sensitivity of Explanations. Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss. Errudite: Scalable, Reproducible, and Testable Error Analysis. Digging Into Self-Supervised Monocular Depth Estimation. RUBi: Reducing Unimodal Biases for Visual Question Answering. D2-Net: A Trainable CNN for Joint Description and Detection of Local Features. Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification. Addressing Failure Detection by Learning Model Confidence. Conditional Structure Generation through Graph Variational Generative Adversarial Nets. Modular Deep Probabilistic Programming. Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation. Automated Brain Disorders Diagnosis Through Deep Neural Networks. Learning Non-Volumetric Depth Fusion Using Successive Reprojections. Derivative-Free Optimization of Neural Networks using Local Search.
Skin Lesion Segmentation using SegNet with Binary Cross-Entropy
Title | Skin Lesion Segmentation using SegNet with Binary Cross-Entropy |
Authors | Prashant Brahmbhatt, Siddhi Nath Rajan |
Abstract | This paper presents a simple and computationally efficient approach to automatic skin lesion segmentation using the SegNet deep learning architecture, with some additional modifications to improve the results. A secondary objective is to keep pre- and post-processing of the images minimal. The model is trained on a limited number of images from the PH2 dataset of dermoscopic images, which were manually segmented and are accompanied by masks, clinical diagnoses, and annotations of several dermoscopic structures produced by professional dermatologists. The aim is to reach a Jaccard index (IoU) of 92% on evaluation. |
Tasks | Lesion Segmentation, Skin Cancer Segmentation |
Published | 2019-11-15 |
URL | https://raw.githubusercontent.com/hashbanger/Skin_Lesion_Segmentation/master/abstract.txt |
https://drive.google.com/file/d/1pgAXmKgY2NerSMzvaS9M8PKnP0cTrbQM/view?usp=sharing | |
PWC | https://paperswithcode.com/paper/skin-lesion-segmentation-using-segnet-with |
Repo | https://github.com/hashbanger/Skin_Lesion_Segmentation |
Framework | none |
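The training objective and evaluation metric named in the abstract are standard; as a minimal PyTorch sketch (the repo lists no framework, so this is purely illustrative), assuming a generic encoder-decoder producing one mask logit per pixel:

```python
import torch
import torch.nn.functional as F

def bce_loss(pred_logits, target_mask):
    # Per-pixel binary cross-entropy on the raw (pre-sigmoid) mask logits.
    return F.binary_cross_entropy_with_logits(pred_logits, target_mask)

def jaccard_index(pred_logits, target_mask, threshold=0.5, eps=1e-7):
    # IoU between the thresholded prediction and the ground-truth mask;
    # tensors are assumed to be (B, 1, H, W) with masks in {0, 1}.
    pred = (torch.sigmoid(pred_logits) > threshold).float()
    inter = (pred * target_mask).sum(dim=(1, 2, 3))
    union = ((pred + target_mask) > 0).float().sum(dim=(1, 2, 3))
    return ((inter + eps) / (union + eps)).mean()
```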
On the (In)fidelity and Sensitivity of Explanations
Title | On the (In)fidelity and Sensitivity of Explanations |
Authors | Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Suggala, David I. Inouye, Pradeep K. Ravikumar |
Abstract | We consider objective evaluation measures of saliency explanations for complex black-box machine learning models. We propose simple robust variants of two notions that have been considered in recent literature: (in)fidelity, and sensitivity. We analyze optimal explanations with respect to both these measures, and while the optimal explanation for sensitivity is a vacuous constant explanation, the optimal explanation for infidelity is a novel combination of two popular explanation methods. By varying the perturbation distribution that defines infidelity, we obtain novel explanations by optimizing infidelity, which we show to outperform existing explanations in both quantitative and qualitative measurements. Another salient question given these measures is how to modify any given explanation to have better values with respect to these measures. We propose a simple modification based on lowering sensitivity, and moreover show that when done appropriately, we could simultaneously improve both sensitivity as well as fidelity. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9278-on-the-infidelity-and-sensitivity-of-explanations |
http://papers.nips.cc/paper/9278-on-the-infidelity-and-sensitivity-of-explanations.pdf | |
PWC | https://paperswithcode.com/paper/on-the-infidelity-and-sensitivity-of |
Repo | https://github.com/chihkuanyeh/saliency_evaluation |
Framework | pytorch |
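The infidelity measure compares the effect an explanation predicts for a perturbation against the model's actual output change. A minimal sketch under Gaussian perturbations, where `f` returns the scalar score of the predicted class for a single example `x` and `phi` is a saliency map of the same shape (these names are ours):

```python
import torch

def infidelity(f, x, phi, n_samples=100, noise_std=0.1):
    # INFD(phi) = E_I [ ( I . phi  -  (f(x) - f(x - I)) )^2 ],  I ~ N(0, s^2)
    fx = f(x)
    errs = []
    for _ in range(n_samples):
        I = noise_std * torch.randn_like(x)
        pred_effect = (I * phi).sum()     # effect the explanation predicts
        true_effect = fx - f(x - I)       # model's actual output change
        errs.append((pred_effect - true_effect) ** 2)
    return torch.stack(errs).mean()
```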
Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss
Title | Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss |
Authors | Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu |
Abstract | We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we propose first to transfer audio to high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned on the landmarks. Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. We, humans, are sensitive to temporal discontinuities and subtle artifacts in video. To avoid those pixel jittering problems and to force the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. Furthermore, to generate a sharper image with well-synchronized facial movements, we propose a novel regression-based discriminator structure, which considers sequence-level information along with frame-level information. Thorough experiments on several datasets and real-world samples demonstrate significantly better results obtained by our method than the state-of-the-art methods in both quantitative and qualitative comparisons. |
Tasks | Face Generation, Talking Face Generation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Chen_Hierarchical_Cross-Modal_Talking_Face_Generation_With_Dynamic_Pixel-Wise_Loss_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Chen_Hierarchical_Cross-Modal_Talking_Face_Generation_With_Dynamic_Pixel-Wise_Loss_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-cross-modal-talking-face-1 |
Repo | https://github.com/lelechen63/ATVGnet |
Framework | pytorch |
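The dynamically adjustable pixel-wise loss is described only at a high level here; a hedged sketch of the general idea, with the attention map `attn` (values in [0, 1], derived dynamically during training in the paper) supplied as an input:

```python
import torch

def attention_weighted_l1(pred, target, attn, base=0.5):
    # Pixel-wise L1 reconstruction loss reweighted by an attention map so
    # audio-correlated regions (attn near 1, e.g. the mouth) dominate and
    # static background pixels -- the source of jitter -- are down-weighted.
    weight = base + (1.0 - base) * attn
    return (weight * (pred - target).abs()).mean()
```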
Errudite: Scalable, Reproducible, and Testable Error Analysis
Title | Errudite: Scalable, Reproducible, and Testable Error Analysis |
Authors | Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel Weld |
Abstract | Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model and task agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries. Third, hypotheses about the cause of errors should be explicitly tested; Errudite supports this via automated counterfactual rewriting. We validate our approach with a user study, finding that Errudite (1) enables users to perform high-quality and reproducible error analyses with less effort, (2) reveals substantial ambiguities in prior published error analysis practices, and (3) enhances the error analysis experience by allowing users to test and revise prior beliefs. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1073/ |
https://www.aclweb.org/anthology/P19-1073 | |
PWC | https://paperswithcode.com/paper/errudite-scalable-reproducible-and-testable |
Repo | https://github.com/uwdata/errudite |
Framework | none |
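Errudite ships its own domain-specific language; the snippet below is *not* that DSL, just a hypothetical Python stand-in illustrating what a precisely defined, reproducible error group looks like when applied to all instances rather than a hand-picked sample:

```python
# Hypothetical stand-in for an Errudite-style error group -- NOT the tool's
# actual DSL. The point is that the group is a precise, reusable predicate
# applied to every instance, not a hand-labelled sample.
def long_question_errors(instances):
    return [ex for ex in instances
            if ex["prediction"] != ex["gold"]
            and len(ex["question"].split()) > 20]

# A counterfactual hypothesis ("does shortening the question fix the error?")
# would then be tested by re-running the model on rewritten instances.
```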
Digging Into Self-Supervised Monocular Depth Estimation
Title | Digging Into Self-Supervised Monocular Depth Estimation |
Authors | Clement Godard, Oisin Mac Aodha, Michael Firman, Gabriel J. Brostow |
Abstract | Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Godard_Digging_Into_Self-Supervised_Monocular_Depth_Estimation_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Godard_Digging_Into_Self-Supervised_Monocular_Depth_Estimation_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/digging-into-self-supervised-monocular-depth-1 |
Repo | https://github.com/nianticlabs/monodepth2 |
Framework | pytorch |
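The minimum reprojection loss from contribution (i) is compact enough to sketch, with the photometric error simplified to plain L1 (the paper combines SSIM and L1):

```python
import torch

def min_reprojection_loss(target, warped_sources):
    # warped_sources: source frames warped into the target view using the
    # predicted depth and pose, each (B, 3, H, W). Taking the per-pixel
    # *minimum* photometric error over source views keeps occluded pixels
    # from being penalised by the one view in which they are not visible.
    errors = [(target - w).abs().mean(1, keepdim=True) for w in warped_sources]
    return torch.cat(errors, dim=1).min(dim=1).values.mean()
```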
RUBi: Reducing Unimodal Biases for Visual Question Answering
Title | RUBi: Reducing Unimodal Biases for Visual Question Answering |
Authors | Remi Cadene, Corentin Dancette, Hedi Ben Younes, Matthieu Cord, Devi Parikh |
Abstract | Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a huge drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use the two input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. It prevents the base VQA model from learning them by influencing its predictions. This leads to dynamically adjusting the loss in order to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2. This dataset is specifically designed to assess the robustness of VQA models when exposed to different question biases at test time than what was seen during training. |
Tasks | Question Answering, Visual Question Answering |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8371-rubi-reducing-unimodal-biases-for-visual-question-answering |
http://papers.nips.cc/paper/8371-rubi-reducing-unimodal-biases-for-visual-question-answering.pdf | |
PWC | https://paperswithcode.com/paper/rubi-reducing-unimodal-biases-for-visual |
Repo | https://github.com/cdancette/rubi.bootstrap.pytorch |
Framework | pytorch |
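A sketch of the RUBi masking idea, simplified in that the paper additionally controls which parameters each loss term is allowed to update:

```python
import torch
import torch.nn.functional as F

def rubi_losses(vqa_logits, q_only_logits, answers):
    # The question-only branch captures the language bias; its sigmoid mask
    # rescales the base VQA logits, so examples answerable from the question
    # alone yield smaller gradients for the main model.
    mask = torch.sigmoid(q_only_logits)
    fused_logits = vqa_logits * mask
    loss_fused = F.cross_entropy(fused_logits, answers)  # trains the VQA model
    loss_q = F.cross_entropy(q_only_logits, answers)     # trains the bias branch
    return loss_fused + loss_q
```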
D2-Net: A Trainable CNN for Joint Description and Detection of Local Features
Title | D2-Net: A Trainable CNN for Joint Description and Detection of Local Features |
Authors | Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, Torsten Sattler |
Abstract | In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. We propose an approach where a single convolutional neural network plays a dual role: It is simultaneously a dense feature descriptor and a feature detector. By postponing the detection to a later stage, the obtained keypoints are more stable than their traditional counterparts based on early detection of low-level structures. We show that this model can be trained using pixel correspondences extracted from readily available large-scale SfM reconstructions, without any further annotations. The proposed method obtains state-of-the-art performance on both the difficult Aachen Day-Night localization dataset and the InLoc indoor localization benchmark, as well as competitive performance on other benchmarks for image matching and 3D reconstruction. |
Tasks | 3D Reconstruction |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Dusmanu_D2-Net_A_Trainable_CNN_for_Joint_Description_and_Detection_of_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Dusmanu_D2-Net_A_Trainable_CNN_for_Joint_Description_and_Detection_of_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/d2-net-a-trainable-cnn-for-joint-description |
Repo | https://github.com/mihaidusmanu/d2-net |
Framework | pytorch |
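A sketch of the describe-and-detect scoring, assuming post-ReLU (non-negative) feature maps: a location scores highly when it is a spatial soft local maximum of some channel and that channel dominates the others there:

```python
import torch
import torch.nn.functional as F

def d2_soft_detection(features, eps=1e-6):
    # features: (B, C, H, W) dense descriptor map after ReLU.
    exp = torch.exp(features)
    # Sum over each 3x3 neighbourhood (average pool times window size).
    local_sum = F.avg_pool2d(exp, 3, stride=1, padding=1) * 9.0
    alpha = exp / local_sum                              # soft local-max per channel
    beta = features / features.max(dim=1, keepdim=True).values.clamp(min=eps)
    score = (alpha * beta).max(dim=1).values             # best channel per location
    return score / score.sum(dim=(1, 2), keepdim=True).clamp(min=eps)
```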
Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification
Title | Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification |
Authors | Arulkumar Subramaniam, Athira Nambiar, Anurag Mittal |
Abstract | Person re-identification (Re-ID) is an important real-world surveillance problem that entails associating a person’s identity over a network of cameras. Video-based Re-ID approaches have gained significant attention recently since a video, and not just an image, is often available. In this work, we propose a novel Co-segmentation inspired video Re-ID deep architecture and formulate a Co-segmentation based Attention Module (COSAM) that activates a common set of salient features across multiple frames of a video via mutual consensus in an unsupervised manner. As opposed to most of the prior work, our approach is able to attend to person accessories along with the person. Our plug-and-play and interpretable COSAM module applied on two deep architectures (ResNet50, SE-ResNet50) outperform the state-of-the-art methods on three benchmark datasets. |
Tasks | Person Re-Identification, Video-Based Person Re-Identification |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Subramaniam_Co-Segmentation_Inspired_Attention_Networks_for_Video-Based_Person_Re-Identification_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Subramaniam_Co-Segmentation_Inspired_Attention_Networks_for_Video-Based_Person_Re-Identification_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/co-segmentation-inspired-attention-networks |
Repo | https://github.com/InnovArul/vidreid_cosegmentation |
Framework | pytorch |
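A simplified take on the mutual-consensus attention: each frame is attended by its correlation with a summary pooled from the *other* frames, so only features that recur across the tracklet survive (the actual COSAM module is richer; shapes and pooling here are our assumptions):

```python
import torch

def cosam_spatial_attention(frame_feats):
    # frame_feats: (T, C, H, W) conv features for T > 1 frames of a tracklet.
    T, C, H, W = frame_feats.shape
    flat = frame_feats.view(T, C, H * W)
    summaries = flat.mean(dim=2)                               # (T, C)
    attn = []
    for t in range(T):
        consensus = torch.cat([summaries[:t], summaries[t + 1:]]).mean(0)  # (C,)
        corr = (consensus[None, :, None] * flat[t][None]).sum(1)           # (1, H*W)
        attn.append(torch.softmax(corr, dim=1).view(1, H, W))
    return torch.stack(attn)                                   # (T, 1, H, W)
```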
Addressing Failure Detection by Learning Model Confidence
Title | Addressing Failure Detection by Learning Model Confidence |
Authors | Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez |
Abstract | Assessing reliably the confidence of a deep neural net and predicting its failures is of primary importance for the practical deployment of these models. In this paper, we propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP). We show how using the TCP is more suited than relying on the classic Maximum Class Probability (MCP). We provide in addition theoretical guarantees for TCP in the context of failure prediction. Since the true class is by essence unknown at test time, we propose to learn TCP criterion on the training set, introducing a specific learning scheme adapted to this context. Extensive experiments are conducted for validating the relevance of the proposed approach. We study various network architectures, small and large scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction. |
Tasks | Image Classification, Semantic Segmentation |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8556-addressing-failure-detection-by-learning-model-confidence |
http://papers.nips.cc/paper/8556-addressing-failure-detection-by-learning-model-confidence.pdf | |
PWC | https://paperswithcode.com/paper/addressing-failure-detection-by-learning |
Repo | https://github.com/valeoai/ConfidNet |
Framework | pytorch |
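The TCP criterion itself is one line; a sketch of the auxiliary confidence head's training loss (the paper uses an MSE-style regression; the head architecture is omitted):

```python
import torch
import torch.nn.functional as F

def tcp_regression_loss(conf_pred, logits, targets):
    # True Class Probability: the softmax mass the classifier puts on the
    # ground-truth class -- low exactly when the model is about to fail.
    # The auxiliary confidence head outputs one scalar per sample and is
    # trained to regress this target on the training set.
    probs = F.softmax(logits, dim=1)
    tcp = probs.gather(1, targets.unsqueeze(1)).squeeze(1).detach()
    return F.mse_loss(conf_pred, tcp)
```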
Conditional Structure Generation through Graph Variational Generative Adversarial Nets
Title | Conditional Structure Generation through Graph Variational Generative Adversarial Nets |
Authors | Carl Yang, Peiye Zhuang, Wenhan Shi, Alan Luu, Pan Li |
Abstract | Graph embedding has been intensively studied recently, due to the advance of various neural network models. Theoretical analyses and empirical studies have pushed forward the translation of discrete graph structures into distributed representation vectors, but seldom considered the reverse direction, i.e., generation of graphs from given related context spaces. Particularly, since graphs often become more meaningful when associated with semantic contexts (e.g., social networks of certain communities, gene networks of certain diseases), the ability to infer graph structures according to given semantic conditions could be of great value. While existing graph generative models only consider graph structures without semantic contexts, we formulate the novel problem of conditional structure generation, and propose a novel unified model of graph variational generative adversarial nets (CondGen) to handle the intrinsic challenges of flexible context-structure conditioning and permutation-invariant generation. Extensive experiments on two deliberately created benchmark datasets of real-world context-enriched networks demonstrate the supreme effectiveness and generalizability of CondGen. |
Tasks | Graph Embedding |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8415-conditional-structure-generation-through-graph-variational-generative-adversarial-nets |
http://papers.nips.cc/paper/8415-conditional-structure-generation-through-graph-variational-generative-adversarial-nets.pdf | |
PWC | https://paperswithcode.com/paper/conditional-structure-generation-through |
Repo | https://github.com/KelestZ/CondGen |
Framework | pytorch |
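A toy sketch of the conditioning and permutation-invariant decoding only; CondGen itself wraps this in a VAE encoder and a GAN discriminator:

```python
import torch
import torch.nn as nn

class CondGraphDecoder(nn.Module):
    """Sketch: the semantic condition c is concatenated with latent noise z
    and decoded into per-node embeddings; the adjacency comes from pairwise
    inner products, which is invariant to node permutations."""

    def __init__(self, cond_dim, z_dim, n_nodes, hid=64):
        super().__init__()
        self.n_nodes, self.hid = n_nodes, hid
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim + z_dim, hid), nn.ReLU(),
            nn.Linear(hid, n_nodes * hid))

    def forward(self, c, z):
        nodes = self.mlp(torch.cat([c, z], dim=-1)).view(-1, self.n_nodes, self.hid)
        adj_logits = nodes @ nodes.transpose(1, 2)   # pairwise node affinities
        return torch.sigmoid(adj_logits)             # edge probabilities
```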
Modular Deep Probabilistic Programming
Title | Modular Deep Probabilistic Programming |
Authors | Zhenwen Dai, Eric Meissner, Neil D. Lawrence |
Abstract | Modularity is a key feature of deep learning libraries but has not been fully exploited for probabilistic programming. We propose to improve the modularity of probabilistic programming languages by offering not only plain probability distributions but also sophisticated probabilistic models, such as Bayesian non-parametric models, as fundamental building blocks. We demonstrate this idea by presenting MXFusion, a modular probabilistic programming language that includes a new type of re-usable building block called the probabilistic module. A probabilistic module consists of a set of random variables with associated probability distributions and dedicated inference methods. Under the framework of variational inference, the pre-specified inference methods of individual probabilistic modules can be used transparently for inference over the whole probabilistic model. We demonstrate the power and convenience of probabilistic modules in MXFusion with various examples of Gaussian process models, which are evaluated with experiments on real data. |
Tasks | Probabilistic Programming |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=B1xnPsA5KX |
https://openreview.net/pdf?id=B1xnPsA5KX | |
PWC | https://paperswithcode.com/paper/modular-deep-probabilistic-programming |
Repo | https://github.com/amzn/MXFusion |
Framework | mxnet |
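The snippet below is not MXFusion's actual API, just a schematic of the probabilistic-module idea: bundling random variables with a dedicated inference routine so a larger variational-inference model can treat the pair as one building block:

```python
# Schematic only -- not MXFusion's real API. A probabilistic module bundles
# its random variables (with priors) and a dedicated posterior-inference
# routine, so the enclosing variational-inference model can treat, say, a
# whole Gaussian process as a single building block.
class ProbabilisticModule:
    def __init__(self, variables, inference_fn):
        self.variables = variables          # dict: name -> random variable
        self.inference_fn = inference_fn    # module-specific VI update

    def infer(self, data):
        # Called transparently when inference runs over the whole model.
        return self.inference_fn(self.variables, data)
```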
Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation
Title | Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation |
Authors | Yunyang Xiong, Hyunwoo J. Kim, Vikas Singh |
Abstract | There is much interest in computer vision to utilize commodity hardware for gaze estimation. A number of papers have shown that algorithms based on deep convolutional architectures are approaching accuracies where streaming data from mass-market devices can offer good gaze tracking performance, although a gap still remains between what is possible and the performance users will expect in real deployments. We observe that one obvious avenue for improvement relates to a gap between some basic technical assumptions behind most existing approaches and the statistical properties of the data used for training. Specifically, most training datasets involve tens of users with a few hundred (or more) repeated acquisitions per user. The non-i.i.d. nature of this data suggests better estimation may be possible if the model explicitly made use of such “repeated measurements” from each user as is commonly done in classical statistical analysis using so-called mixed effects models. The goal of this paper is to adapt these “mixed effects” ideas from statistics within a deep neural network architecture for gaze estimation, based on eye images. Such a formulation seeks to specifically utilize information regarding the hierarchical structure of the training data – each node in the hierarchy is a user who provides tens or hundreds of repeated samples. This modification yields an architecture that offers state-of-the-art performance on various publicly available datasets, improving results by 10-20%. |
Tasks | Gaze Estimation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Xiong_Mixed_Effects_Neural_Networks_MeNets_With_Applications_to_Gaze_Estimation_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Xiong_Mixed_Effects_Neural_Networks_MeNets_With_Applications_to_Gaze_Estimation_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/mixed-effects-neural-networks-menets-with |
Repo | https://github.com/vsingh-group/MeNets |
Framework | none |
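A sketch of the mixed-effects decomposition y = f(x) + g_u(x): a shared fixed-effects regressor plus a per-user random-effects correction learned from that user's repeated measurements (the paper estimates the random effects with an EM-style procedure; here they are ordinary learned embeddings, and the framework is PyTorch purely for illustration):

```python
import torch
import torch.nn as nn

class MixedEffectsHead(nn.Module):
    def __init__(self, feat_dim, n_users, out_dim=2):
        super().__init__()
        self.fixed = nn.Linear(feat_dim, out_dim)                 # shared f(x)
        self.random = nn.Embedding(n_users, feat_dim * out_dim)   # one W_u per user
        self.feat_dim, self.out_dim = feat_dim, out_dim

    def forward(self, feats, user_ids):
        # Per-user random effect: W_u x, added to the shared prediction.
        w_u = self.random(user_ids).view(-1, self.out_dim, self.feat_dim)
        random_term = torch.bmm(w_u, feats.unsqueeze(-1)).squeeze(-1)
        return self.fixed(feats) + random_term                    # (B, out_dim)
```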
Automated Brain Disorders Diagnosis Through Deep Neural Networks
Title | Automated Brain Disorders Diagnosis Through Deep Neural Networks |
Authors | Gabriel Maggiotti |
Abstract | In most cases, the diagnosis of brain disorders such as epilepsy is slow and requires repeated visits to doctors and EEG technicians. This project aims to automate brain disorder diagnosis using artificial intelligence and deep learning. Many brain disorders can be detected by reading an electroencephalogram (EEG): collecting electrical signals directly from the brain with a non-invasive EEG device yields significant information about its health. Classifying these signals and detecting anomalies in them is what doctors currently do when reading an EEG. With enough data and the use of artificial intelligence, it is possible to learn to classify these signals into groups (e.g., anxiety, epileptic spikes, etc.) and then use a trained neural network to interpret them and identify evidence of a disorder, ultimately automating the detection and classification of the disorders found. |
Tasks | EEG |
Published | 2019-01-11 |
URL | http://vixra.org/abs/1901.0166 |
http://vixra.org/pdf/1901.0166v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-brain-disorders-diagnosis-through |
Repo | https://github.com/gmaggiotti/epilepsy-prediction |
Framework | tf |
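The abstract stays at the concept level; a generic sketch of the kind of model it describes, a small 1-D CNN over fixed-length, multi-channel EEG windows (written in PyTorch for consistency with the other sketches; the channel count and class names are placeholders, not the paper's exact model):

```python
import torch.nn as nn

# Input: (B, 23, T) windows -- 23 scalp channels is an assumption.
eeg_classifier = nn.Sequential(
    nn.Conv1d(in_channels=23, out_channels=32, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.MaxPool1d(4),
    nn.Conv1d(32, 64, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(64, 3),   # e.g. normal / epileptic spikes / other anomaly
)
```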
Learning Non-Volumetric Depth Fusion Using Successive Reprojections
Title | Learning Non-Volumetric Depth Fusion Using Successive Reprojections |
Authors | Simon Donne, Andreas Geiger |
Abstract | Given a set of input views, multi-view stereopsis techniques estimate depth maps to represent the 3D reconstruction of the scene; these are fused into a single, consistent, reconstruction – most often a point cloud. In this work we propose to learn an auto-regressive depth refinement directly from data. While deep learning has improved the accuracy and speed of depth estimation significantly, learned MVS techniques remain limited to the planesweeping paradigm. We refine a set of input depth maps by successively reprojecting information from neighbouring views to leverage multi-view constraints. Compared to learning-based volumetric fusion techniques, an image-based representation allows significantly more detailed reconstructions; compared to traditional point-based techniques, our method learns noise suppression and surface completion in a data-driven fashion. Due to the limited availability of high-quality reconstruction datasets with ground truth, we introduce two novel synthetic datasets to (pre-)train our network. Our approach is able to improve both the output depth maps and the reconstructed point cloud, for both learned and traditional depth estimation front-ends, on both synthetic and real data. |
Tasks | 3D Reconstruction, Depth Estimation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Donne_Learning_Non-Volumetric_Depth_Fusion_Using_Successive_Reprojections_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Donne_Learning_Non-Volumetric_Depth_Fusion_Using_Successive_Reprojections_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/learning-non-volumetric-depth-fusion-using |
Repo | https://github.com/simon-donne/defusr |
Framework | pytorch |
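The core operation is reprojecting a neighbouring view's depth into the reference view; a geometry-only sketch assuming known intrinsics `K` and relative pose `T_src_to_ref` (splatting onto the reference pixel grid, which the method also needs, is omitted):

```python
import torch

def reproject_depth(src_depth, K, K_inv, T_src_to_ref):
    # src_depth: (H, W) depth of a neighbouring view; K, K_inv: (3, 3)
    # intrinsics; T_src_to_ref: (4, 4) relative pose. Lifts every source
    # pixel to 3D, moves it into the reference frame, and returns both its
    # reference-view pixel coordinates and the depth it would have there.
    H, W = src_depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float().view(3, -1)
    pts_src = (K_inv @ pix) * src_depth.view(1, -1)        # 3D in source frame
    pts_hom = torch.cat([pts_src, torch.ones(1, H * W)], 0)
    pts_ref = (T_src_to_ref @ pts_hom)[:3]                 # 3D in reference frame
    proj = K @ pts_ref
    uv = proj[:2] / proj[2:].clamp(min=1e-6)               # reference pixel coords
    return uv.view(2, H, W), pts_ref[2].view(H, W)
```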
Derivative-Free Optimization of Neural Networks using Local Search
Title | Derivative-Free Optimization of Neural Networks using Local Search |
Authors | Ahmed Aly, Gianluca Guadagni, Joanne Bechta Dugan |
Abstract | Deep Neural Networks have received a great deal of attention in the past few years. Applications of Deep Learning have spread across domains such as Reinforcement Learning and Computer Vision. Despite their popularity and success, training neural networks can be a challenging process. This paper presents a study on derivative-free, single-candidate optimization of neural networks using Local Search (LS). LS is an algorithm in which constrained noise is iteratively applied to subsets of the search space, coupled with a Score Decay mechanism to enhance performance; it belongs to the Random Search family. Experiments were conducted using a setup that is both suitable for an introduction of the algorithm and representative of modern deep learning tasks, based on the FashionMNIST dataset. A 5-million-parameter CNN was trained in several scenarios, including Stochastic Gradient Descent (SGD) coupled with Backpropagation (BP) for comparison. Results reveal that although LS was not competitive in terms of convergence speed, it was able to converge to a lower loss than SGD. In addition, LS trained the CNN using accuracy rather than loss as a learning signal, though with lower final performance. In conclusion, LS presents a viable alternative in cases where SGD fails or is not suitable. The simplicity of LS can make it attractive to non-experts who want to try neural nets for the first time or on novel, non-differentiable tasks. |
Tasks | |
Published | 2019-10-15 |
URL | https://www.researchgate.net/publication/338501738_Derivative-Free_Optimization_of_Neural_Networks_Using_Local_Search |
PWC | https://paperswithcode.com/paper/derivative-free-optimization-of-neural |
Repo | https://github.com/AroMorin/DNNOP |
Framework | pytorch |
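A sketch of one LS step as described: constrained noise applied to a random subset of the weights, kept only on improvement (the Score Decay mechanism is omitted for brevity):

```python
import copy
import torch

def local_search_step(model, score_fn, noise_std=0.05, frac=0.1):
    # One derivative-free LS step: perturb roughly `frac` of the weights
    # with bounded Gaussian noise and keep the candidate only if the score
    # (higher is better, e.g. accuracy or negative loss) improves.
    candidate = copy.deepcopy(model)
    with torch.no_grad():
        for p in candidate.parameters():
            mask = (torch.rand_like(p) < frac).float()   # perturbed subset
            p.add_(mask * noise_std * torch.randn_like(p))
    return candidate if score_fn(candidate) > score_fn(model) else model
```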