January 25, 2020

3106 words 15 mins read

Paper Group NAWR 30

Skin Lesion Segmentation using SegNet with Binary Cross-Entropy. On the (In)fidelity and Sensitivity of Explanations. Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss. Errudite: Scalable, Reproducible, and Testable Error Analysis. Digging Into Self-Supervised Monocular Depth Estimation. RUBi: Reducing Unimodal Biases fo …

Skin Lesion Segmentation using SegNet with Binary Cross-Entropy

Title Skin Lesion Segmentation using SegNet with Binary Cross-Entropy
Authors Prashant Brahmbhatt, Siddhi Nath Rajan
Abstract In this paper, a simple and computationally efficient approach to automatic skin lesion segmentation is presented, using the deep learning architecture SegNet together with some additional modifications to improve the results. A secondary objective is to keep pre- and post-processing of the images minimal. The model is trained on a limited number of images from the PH2 dataset, which contains dermoscopic images that were manually segmented, along with their masks, the clinical diagnoses, and annotations of several dermoscopic structures produced by professional dermatologists. The aim is to reach a performance threshold of 92% Jaccard index (IoU) on evaluation.
Tasks Lesion Segmentation, Skin Cancer Segmentation
Published 2019-11-15
URL https://raw.githubusercontent.com/hashbanger/Skin_Lesion_Segmentation/master/abstract.txt
PDF https://drive.google.com/file/d/1pgAXmKgY2NerSMzvaS9M8PKnP0cTrbQM/view?usp=sharing
PWC https://paperswithcode.com/paper/skin-lesion-segmentation-using-segnet-with
Repo https://github.com/hashbanger/Skin_Lesion_Segmentation
Framework none
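
The training objective and the evaluation metric named in the abstract are both compact enough to sketch. Below is a minimal PyTorch version of the binary cross-entropy loss and the Jaccard index (IoU); the SegNet encoder-decoder itself is omitted, and the `eps` smoothing term is an assumption, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def bce_segmentation_loss(logits, target_mask):
    # Binary cross-entropy on raw logits; target_mask is a float tensor of 0s/1s.
    return F.binary_cross_entropy_with_logits(logits, target_mask)

def jaccard_index(logits, target_mask, threshold=0.5, eps=1e-7):
    # Jaccard index (IoU) between a thresholded prediction and the ground-truth mask.
    pred = (torch.sigmoid(logits) > threshold).float()
    intersection = (pred * target_mask).sum()
    union = pred.sum() + target_mask.sum() - intersection
    return (intersection + eps) / (union + eps)
```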

On the (In)fidelity and Sensitivity of Explanations

Title On the (In)fidelity and Sensitivity of Explanations
Authors Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Suggala, David I. Inouye, Pradeep K. Ravikumar
Abstract We consider objective evaluation measures of saliency explanations for complex black-box machine learning models. We propose simple robust variants of two notions that have been considered in recent literature: (in)fidelity, and sensitivity. We analyze optimal explanations with respect to both these measures, and while the optimal explanation for sensitivity is a vacuous constant explanation, the optimal explanation for infidelity is a novel combination of two popular explanation methods. By varying the perturbation distribution that defines infidelity, we obtain novel explanations by optimizing infidelity, which we show to out-perform existing explanations in both quantitative and qualitative measurements. Another salient question given these measures is how to modify any given explanation to have better values with respect to these measures. We propose a simple modification based on lowering sensitivity, and moreover show that when done appropriately, we could simultaneously improve both sensitivity as well as fidelity.
Tasks
Published 2019-12-01
URL http://papers.nips.cc/paper/9278-on-the-infidelity-and-sensitivity-of-explanations
PDF http://papers.nips.cc/paper/9278-on-the-infidelity-and-sensitivity-of-explanations.pdf
PWC https://paperswithcode.com/paper/on-the-infidelity-and-sensitivity-of
Repo https://github.com/chihkuanyeh/saliency_evaluation
Framework pytorch
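
The infidelity measure admits a short Monte-Carlo estimator: compare the explanation's predicted change in the model output under a random perturbation I against the actual change. A sketch under the assumption of Gaussian perturbations and a scalar-valued `f` (e.g. the logit of the explained class); the repo above holds the authors' version.

```python
import torch

@torch.no_grad()
def infidelity(f, phi, x, n_samples=50, noise_std=0.1):
    # f: callable mapping an input tensor to a scalar score (assumption: one output)
    # phi: explanation tensor with the same shape as x
    fx = f(x)
    errs = []
    for _ in range(n_samples):
        I = noise_std * torch.randn_like(x)   # perturbation I ~ N(0, sigma^2)
        pred_change = (I * phi).sum()         # I^T Phi(f, x): change the explanation predicts
        true_change = fx - f(x - I)           # f(x) - f(x - I): change that actually occurs
        errs.append((pred_change - true_change) ** 2)
    return torch.stack(errs).mean()
```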

Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss

Title Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss
Authors Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu
Abstract We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we propose first to transfer audio to high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned on the landmarks. Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. We, humans, are sensitive to temporal discontinuities and subtle artifacts in video. To avoid those pixel jittering problems and to enforce the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. Furthermore, to generate a sharper image with well-synchronized facial movements, we propose a novel regression-based discriminator structure, which considers sequence-level information along with frame-level information. Thoughtful experiments on several datasets and real-world samples demonstrate significantly better results obtained by our method than the state-of-the-art methods in both quantitative and qualitative comparisons.
Tasks Face Generation, Talking Face Generation
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Chen_Hierarchical_Cross-Modal_Talking_Face_Generation_With_Dynamic_Pixel-Wise_Loss_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Chen_Hierarchical_Cross-Modal_Talking_Face_Generation_With_Dynamic_Pixel-Wise_Loss_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/hierarchical-cross-modal-talking-face-1
Repo https://github.com/lelechen63/ATVGnet
Framework pytorch
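
The dynamically adjustable pixel-wise loss can be read as an attention-weighted reconstruction loss: pixels the attention map flags as audio-correlated (typically the mouth region) dominate the objective. A minimal sketch, assuming a precomputed attention map in [0, 1]; in the paper the attention is produced and adjusted by the network itself.

```python
import torch

def dynamic_pixelwise_loss(fake, real, attention, eps=1e-6):
    # Attention-weighted L1 reconstruction loss (a sketch, not the authors'
    # exact formulation): audiovisual-correlated pixels contribute more.
    alpha = attention.clamp(eps, 1.0)               # per-pixel weights in (0, 1]
    per_pixel = (fake - real).abs()                 # plain L1 error per pixel
    return (alpha * per_pixel).sum() / alpha.sum()  # normalized weighted mean
```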

Errudite: Scalable, Reproducible, and Testable Error Analysis

Title Errudite: Scalable, Reproducible, and Testable Error Analysis
Authors Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel Weld
Abstract Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model and task agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries. Third, hypotheses about the cause of errors should be explicitly tested; Errudite supports this via automated counterfactual rewriting. We validate our approach with a user study, finding that Errudite (1) enables users to perform high quality and reproducible error analyses with less effort, (2) reveals substantial ambiguities in prior published error analyses practices, and (3) enhances the error analysis experience by allowing users to test and revise prior beliefs.
Tasks
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-1073/
PDF https://www.aclweb.org/anthology/P19-1073
PWC https://paperswithcode.com/paper/errudite-scalable-reproducible-and-testable
Repo https://github.com/uwdata/errudite
Framework none
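
Errudite's first principle, precisely defined error groups evaluated over all instances, is easy to mimic in plain Python even without its DSL. The snippet below is illustrative only; `length`, `is_error`, and the toy instances are invented, and Errudite's actual query language is far richer.

```python
def length(ex):
    return len(ex["question"].split())

def is_error(ex):
    return ex["prediction"] != ex["gold"]

def group(instances, *predicates):
    # An "error group": a precisely defined, reusable filter over ALL instances,
    # rather than a hand-picked sample of errors.
    return [ex for ex in instances if all(p(ex) for p in predicates)]

instances = [
    {"question": "what color is the dog on the left", "prediction": "red", "gold": "brown"},
    {"question": "who wrote it", "prediction": "poe", "gold": "poe"},
]
# e.g. "errors on long questions", reproducible because the predicates are explicit
long_question_errors = group(instances, is_error, lambda ex: length(ex) > 5)
```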

Digging Into Self-Supervised Monocular Depth Estimation

Title Digging Into Self-Supervised Monocular Depth Estimation
Authors Clement Godard, Oisin Mac Aodha, Michael Firman, Gabriel J. Brostow
Abstract Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark.
Tasks Depth Estimation, Monocular Depth Estimation
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Godard_Digging_Into_Self-Supervised_Monocular_Depth_Estimation_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Godard_Digging_Into_Self-Supervised_Monocular_Depth_Estimation_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/digging-into-self-supervised-monocular-depth-1
Repo https://github.com/nianticlabs/monodepth2
Framework pytorch
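
Two of the three proposed components, the minimum reprojection loss (i) and auto-masking (iii), fit in a few lines. A simplified sketch using plain L1 photometric error; the paper combines L1 with SSIM and adds small tie-breaking noise to the identity term, both omitted here.

```python
import torch

def min_reprojection_loss(target, warped_sources, identity_sources):
    # warped_sources: source frames warped into the target view via predicted
    # depth and pose; identity_sources: the raw, unwarped source frames.
    reproj = torch.stack([(w - target).abs().mean(1) for w in warped_sources])      # (S,B,H,W)
    identity = torch.stack([(s - target).abs().mean(1) for s in identity_sources])  # (S,B,H,W)
    min_reproj, _ = reproj.min(0)      # (i) per-pixel minimum over sources handles occlusion
    min_identity, _ = identity.min(0)
    # (iii) auto-mask: ignore pixels that look no better warped than unwarped,
    # i.e. pixels violating the camera-motion assumption (static scene regions).
    automask = (min_reproj < min_identity).float()
    return (automask * min_reproj).sum() / automask.sum().clamp(min=1.0)
```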

RUBi: Reducing Unimodal Biases for Visual Question Answering

Title RUBi: Reducing Unimodal Biases for Visual Question Answering
Authors Remi Cadene, Corentin Dancette, Hedi Ben Younes, Matthieu Cord, Devi Parikh
Abstract Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a huge drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use the two input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. It prevents the base VQA model from learning them by influencing its predictions. This leads to dynamically adjusting the loss in order to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2. This dataset is specifically designed to assess the robustness of VQA models when exposed to different question biases at test time than what was seen during training.
Tasks Question Answering, Visual Question Answering
Published 2019-12-01
URL http://papers.nips.cc/paper/8371-rubi-reducing-unimodal-biases-for-visual-question-answering
PDF http://papers.nips.cc/paper/8371-rubi-reducing-unimodal-biases-for-visual-question-answering.pdf
PWC https://paperswithcode.com/paper/rubi-reducing-unimodal-biases-for-visual
Repo https://github.com/cdancette/rubi.bootstrap.pytorch
Framework pytorch
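
The core of RUBi is a question-only branch whose sigmoided logits rescale the base model's logits during training. A sketch of the two losses; note that in the paper the question-only loss does not backpropagate into the shared question encoder, a detail elided here.

```python
import torch
import torch.nn.functional as F

def rubi_losses(vqa_logits, question_only_logits, answers):
    # The question-only branch captures language biases; its sigmoid mask rescales
    # the base VQA logits, so examples answerable from the question alone produce
    # smaller gradients for the base model.
    mask = torch.sigmoid(question_only_logits)
    fused_logits = vqa_logits * mask
    loss_vqa = F.cross_entropy(fused_logits, answers)        # trains the base VQA model
    loss_q = F.cross_entropy(question_only_logits, answers)  # trains the bias branch
    return loss_vqa + loss_q
```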

D2-Net: A Trainable CNN for Joint Description and Detection of Local Features

Title D2-Net: A Trainable CNN for Joint Description and Detection of Local Features
Authors Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, Torsten Sattler
Abstract In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. We propose an approach where a single convolutional neural network plays a dual role: It is simultaneously a dense feature descriptor and a feature detector. By postponing the detection to a later stage, the obtained keypoints are more stable than their traditional counterparts based on early detection of low-level structures. We show that this model can be trained using pixel correspondences extracted from readily available large-scale SfM reconstructions, without any further annotations. The proposed method obtains state-of-the-art performance on both the difficult Aachen Day-Night localization dataset and the InLoc indoor localization benchmark, as well as competitive performance on other benchmarks for image matching and 3D reconstruction.
Tasks 3D Reconstruction
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Dusmanu_D2-Net_A_Trainable_CNN_for_Joint_Description_and_Detection_of_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Dusmanu_D2-Net_A_Trainable_CNN_for_Joint_Description_and_Detection_of_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/d2-net-a-trainable-cnn-for-joint-description
Repo https://github.com/mihaidusmanu/d2-net
Framework pytorch
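
The "detect-later" idea can be sketched as a soft detection score computed from the dense feature map: a keypoint must be both a spatial soft local maximum and the strongest channel at its location. A simplified version of the paper's scoring, assuming post-ReLU (non-negative) features.

```python
import torch
import torch.nn.functional as F

def soft_detection_scores(features):
    # features: (B, C, H, W), assumed non-negative (post-ReLU).
    exp = features.exp()
    local_sum = F.avg_pool2d(exp, kernel_size=3, stride=1, padding=1) * 9.0  # 3x3 sum
    alpha = exp / local_sum                                   # soft spatial local-max score
    beta = features / features.amax(dim=1, keepdim=True).clamp(min=1e-6)  # channel ratio
    score = (alpha * beta).amax(dim=1)                        # best channel per pixel
    return score / score.flatten(1).sum(-1)[:, None, None]    # normalize per image
```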

Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification

Title Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification
Authors Arulkumar Subramaniam, Athira Nambiar, Anurag Mittal
Abstract Person re-identification (Re-ID) is an important real-world surveillance problem that entails associating a person’s identity over a network of cameras. Video-based Re-ID approaches have gained significant attention recently since a video, and not just an image, is often available. In this work, we propose a novel Co-segmentation inspired video Re-ID deep architecture and formulate a Co-segmentation based Attention Module (COSAM) that activates a common set of salient features across multiple frames of a video via mutual consensus in an unsupervised manner. As opposed to most of the prior work, our approach is able to attend to person accessories along with the person. Our plug-and-play and interpretable COSAM module applied on two deep architectures (ResNet50, SE-ResNet50) outperform the state-of-the-art methods on three benchmark datasets.
Tasks Person Re-Identification, Video-Based Person Re-Identification
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Subramaniam_Co-Segmentation_Inspired_Attention_Networks_for_Video-Based_Person_Re-Identification_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Subramaniam_Co-Segmentation_Inspired_Attention_Networks_for_Video-Based_Person_Re-Identification_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/co-segmentation-inspired-attention-networks
Repo https://github.com/InnovArul/vidreid_cosegmentation
Framework pytorch
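
The mutual-consensus idea behind COSAM can be sketched as cross-frame correlation: locations whose descriptors agree with the other frames of the same tracklet (the person, carried accessories) receive high attention. This is an illustrative reading, not the exact module from the repo.

```python
import torch
import torch.nn.functional as F

def cosegmentation_attention(frame_feats):
    # frame_feats: (B, T, C, H, W) with T >= 2 frames per tracklet.
    B, T, C, H, W = frame_feats.shape
    x = F.normalize(frame_feats.reshape(B, T, C, H * W), dim=2)  # unit descriptors
    attn = []
    for t in range(T):
        others = torch.cat([x[:, s] for s in range(T) if s != t], dim=2)  # (B, C, (T-1)HW)
        corr = torch.bmm(x[:, t].transpose(1, 2), others)        # (B, HW, (T-1)HW)
        consensus = corr.clamp(min=0).mean(dim=2)                # mutual agreement in [0, 1]
        attn.append(consensus.reshape(B, 1, H, W))
    return torch.stack(attn, dim=1)                              # (B, T, 1, H, W)
```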

Addressing Failure Detection by Learning Model Confidence

Title Addressing Failure Detection by Learning Model Confidence
Authors Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez
Abstract Assessing reliably the confidence of a deep neural net and predicting its failures is of primary importance for the practical deployment of these models. In this paper, we propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP). We show how using the TCP is more suited than relying on the classic Maximum Class Probability (MCP). We provide in addition theoretical guarantees for TCP in the context of failure prediction. Since the true class is by essence unknown at test time, we propose to learn TCP criterion on the training set, introducing a specific learning scheme adapted to this context. Extensive experiments are conducted for validating the relevance of the proposed approach. We study various network architectures, small and large scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction.
Tasks Image Classification, Semantic Segmentation
Published 2019-12-01
URL http://papers.nips.cc/paper/8556-addressing-failure-detection-by-learning-model-confidence
PDF http://papers.nips.cc/paper/8556-addressing-failure-detection-by-learning-model-confidence.pdf
PWC https://paperswithcode.com/paper/addressing-failure-detection-by-learning
Repo https://github.com/valeoai/ConfidNet
Framework pytorch
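
The True Class Probability criterion is just the softmax mass on the ground-truth class, and the learning scheme regresses an auxiliary confidence head onto it. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def tcp_target(logits, labels):
    # True Class Probability: softmax probability of the *true* class,
    # the paper's confidence target (vs. MCP, the max probability).
    probs = F.softmax(logits, dim=1)
    return probs.gather(1, labels.unsqueeze(1)).squeeze(1)

def confidence_loss(predicted_confidence, logits, labels):
    # Regress a confidence head onto the TCP of the (frozen) classifier; at test
    # time the head flags likely failures without access to the true class.
    with torch.no_grad():
        target = tcp_target(logits, labels)
    return F.mse_loss(predicted_confidence, target)
```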

Conditional Structure Generation through Graph Variational Generative Adversarial Nets

Title Conditional Structure Generation through Graph Variational Generative Adversarial Nets
Authors Carl Yang, Peiye Zhuang, Wenhan Shi, Alan Luu, Pan Li
Abstract Graph embedding has been intensively studied recently, due to the advance of various neural network models. Theoretical analyses and empirical studies have pushed forward the translation of discrete graph structures into distributed representation vectors, but seldom considered the reverse direction, i.e., generation of graphs from given related context spaces. Particularly, since graphs often become more meaningful when associated with semantic contexts (e.g., social networks of certain communities, gene networks of certain diseases), the ability to infer graph structures according to given semantic conditions could be of great value. While existing graph generative models only consider graph structures without semantic contexts, we formulate the novel problem of conditional structure generation, and propose a novel unified model of graph variational generative adversarial nets (CondGen) to handle the intrinsic challenges of flexible context-structure conditioning and permutation-invariant generation. Extensive experiments on two deliberately created benchmark datasets of real-world context-enriched networks demonstrate the supreme effectiveness and generalizability of CondGen.
Tasks Graph Embedding
Published 2019-12-01
URL http://papers.nips.cc/paper/8415-conditional-structure-generation-through-graph-variational-generative-adversarial-nets
PDF http://papers.nips.cc/paper/8415-conditional-structure-generation-through-graph-variational-generative-adversarial-nets.pdf
PWC https://paperswithcode.com/paper/conditional-structure-generation-through
Repo https://github.com/KelestZ/CondGen
Framework pytorch
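
The conditioning side of CondGen can be illustrated with a toy decoder: a latent code concatenated with the semantic condition is mapped to node embeddings whose inner products give edge probabilities, a construction that is permutation-equivariant by design. Every layer size here is invented; CondGen adds variational encoding and an adversarial critic on top.

```python
import torch
import torch.nn as nn

class ConditionalGraphDecoder(nn.Module):
    # Toy sketch of condition-to-structure decoding, not the CondGen architecture.
    def __init__(self, latent_dim, cond_dim, n_nodes, hidden=64):
        super().__init__()
        self.n_nodes = n_nodes
        self.node_dim = hidden // 4
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_nodes * self.node_dim),
        )

    def forward(self, z, cond):
        h = self.mlp(torch.cat([z, cond], dim=-1))
        nodes = h.view(-1, self.n_nodes, self.node_dim)
        edge_logits = torch.bmm(nodes, nodes.transpose(1, 2))  # symmetric scores
        return torch.sigmoid(edge_logits)                      # edge probabilities
```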

Modular Deep Probabilistic Programming

Title Modular Deep Probabilistic Programming
Authors Zhenwen Dai, Eric Meissner, Neil D. Lawrence
Abstract Modularity is a key feature of deep learning libraries but has not been fully exploited for probabilistic programming. We propose to improve modularity of probabilistic programming language by offering not only plain probabilistic distributions but also sophisticated probabilistic model such as Bayesian non-parametric models as fundamental building blocks. We demonstrate this idea by presenting a modular probabilistic programming language MXFusion, which includes a new type of re-usable building blocks, called probabilistic modules. A probabilistic module consists of a set of random variables with associated probabilistic distributions and dedicated inference methods. Under the framework of variational inference, the pre-specified inference methods of individual probabilistic modules can be transparently used for inference of the whole probabilistic model. We demonstrate the power and convenience of probabilistic modules in MXFusion with various examples of Gaussian process models, which are evaluated with experiments on real data.
Tasks Probabilistic Programming
Published 2019-05-01
URL https://openreview.net/forum?id=B1xnPsA5KX
PDF https://openreview.net/pdf?id=B1xnPsA5KX
PWC https://paperswithcode.com/paper/modular-deep-probabilistic-programming
Repo https://github.com/amzn/MXFusion
Framework mxnet
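
The probabilistic-module concept, a bundle of random variables, distributions, and a dedicated inference method behind one reusable interface, can be illustrated in plain PyTorch. This is NOT the MXFusion API (which is MXNet-based); it only mirrors the interface idea with the simplest possible module.

```python
import torch
from torch.distributions import Normal

class GaussianModule:
    # Illustrative "probabilistic module": the outer variational inference only
    # needs log_pdf and draw_samples, so a sophisticated model (e.g. a GP with
    # its own inference method) could stand behind the same interface.
    def __init__(self, mean, scale):
        self.dist = Normal(mean, scale)

    def log_pdf(self, value):
        return self.dist.log_prob(value)

    def draw_samples(self, n):
        return self.dist.sample((n,))
```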

Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation

Title Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation
Authors Yunyang Xiong, Hyunwoo J. Kim, Vikas Singh
Abstract There is much interest in computer vision to utilize commodity hardware for gaze estimation. A number of papers have shown that algorithms based on deep convolutional architectures are approaching accuracies where streaming data from mass-market devices can offer good gaze tracking performance, although a gap still remains between what is possible and the performance users will expect in real deployments. We observe that one obvious avenue for improvement relates to a gap between some basic technical assumptions behind most existing approaches and the statistical properties of the data used for training. Specifically, most training datasets involve tens of users with a few hundreds (or more) repeated acquisitions per user. The non i.i.d. nature of this data suggests better estimation may be possible if the model explicitly made use of such “repeated measurements” from each user as is commonly done in classical statistical analysis using so-called mixed effects models. The goal of this paper is to adapt these “mixed effects” ideas from statistics within a deep neural network architecture for gaze estimation, based on eye images. Such a formulation seeks to specifically utilize information regarding the hierarchical structure of the training data – each node in the hierarchy is a user who provides tens or hundreds of repeated samples. This modification yields an architecture that offers state of the art performance on various publicly available datasets improving results by 10-20%.
Tasks Gaze Estimation
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Xiong_Mixed_Effects_Neural_Networks_MeNets_With_Applications_to_Gaze_Estimation_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Xiong_Mixed_Effects_Neural_Networks_MeNets_With_Applications_to_Gaze_Estimation_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/mixed-effects-neural-networks-menets-with
Repo https://github.com/vsingh-group/MeNets
Framework none
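
The mixed-effects decomposition is prediction = shared fixed effects + per-user random effects, which a per-user embedding can sketch. MeNets fits the random effects with an EM-style solver inside SGD; the toy head below just learns them directly, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class MixedEffectsHead(nn.Module):
    # Sketch: a shared "fixed effects" regressor plus a per-user "random effects"
    # correction, exploiting the repeated measurements each user contributes.
    def __init__(self, feat_dim, n_users, out_dim=2):  # out_dim=2 -> e.g. gaze yaw/pitch
        super().__init__()
        self.fixed = nn.Linear(feat_dim, out_dim)                # shared across users
        self.random = nn.Embedding(n_users, feat_dim * out_dim)  # one matrix per user
        self.feat_dim, self.out_dim = feat_dim, out_dim

    def forward(self, feats, user_ids):
        W_u = self.random(user_ids).view(-1, self.out_dim, self.feat_dim)
        random_part = torch.bmm(W_u, feats.unsqueeze(-1)).squeeze(-1)
        return self.fixed(feats) + random_part
```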

Automated Brain Disorders Diagnosis Through Deep Neural Networks

Title Automated Brain Disorders Diagnosis Through Deep Neural Networks
Authors Gabriel Maggiotti
Abstract In most cases, the diagnosis of brain disorders such as epilepsy is slow and requires endless visits to doctors and EEG technicians. This project aims to automate brain disorder diagnosis using artificial intelligence and deep learning. Many brain disorders can be detected by reading an electroencephalogram (EEG). Collecting electrical signals directly from the brain with a non-invasive EEG device yields significant information about its health, and classifying and detecting anomalies in these signals is what doctors currently do when reading an EEG. With the right amount of data and the use of artificial intelligence, these signals can be learned and classified into groups (e.g. anxiety, epileptic spikes). A trained neural network can then interpret those signals and identify evidence of a disorder, ultimately automating the detection and classification of the disorders found.
Tasks EEG
Published 2019-01-11
URL http://vixra.org/abs/1901.0166
PDF http://vixra.org/pdf/1901.0166v1.pdf
PWC https://paperswithcode.com/paper/automated-brain-disorders-diagnosis-through
Repo https://github.com/gmaggiotti/epilepsy-prediction
Framework tf
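
The pipeline the abstract describes, windows of multi-channel EEG in, disorder class out, is a standard 1-D CNN. A sketch in PyTorch for consistency with the other examples (the repo itself uses TensorFlow); channel counts and layer sizes are invented, not taken from the paper.

```python
import torch
import torch.nn as nn

class EEGClassifier(nn.Module):
    # Minimal 1-D CNN over raw EEG: (batch, channels, time) -> class logits.
    def __init__(self, n_channels=23, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):
        return self.head(self.features(x).squeeze(-1))
```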

Learning Non-Volumetric Depth Fusion Using Successive Reprojections

Title Learning Non-Volumetric Depth Fusion Using Successive Reprojections
Authors Simon Donne, Andreas Geiger
Abstract Given a set of input views, multi-view stereopsis techniques estimate depth maps to represent the 3D reconstruction of the scene; these are fused into a single, consistent, reconstruction – most often a point cloud. In this work we propose to learn an auto-regressive depth refinement directly from data. While deep learning has improved the accuracy and speed of depth estimation significantly, learned MVS techniques remain limited to the planesweeping paradigm. We refine a set of input depth maps by successively reprojecting information from neighbouring views to leverage multi-view constraints. Compared to learning-based volumetric fusion techniques, an image-based representation allows significantly more detailed reconstructions; compared to traditional point-based techniques, our method learns noise suppression and surface completion in a data-driven fashion. Due to the limited availability of high-quality reconstruction datasets with ground truth, we introduce two novel synthetic datasets to (pre-)train our network. Our approach is able to improve both the output depth maps and the reconstructed point cloud, for both learned and traditional depth estimation front-ends, on both synthetic and real data.
Tasks 3D Reconstruction, Depth Estimation
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Donne_Learning_Non-Volumetric_Depth_Fusion_Using_Successive_Reprojections_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Donne_Learning_Non-Volumetric_Depth_Fusion_Using_Successive_Reprojections_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/learning-non-volumetric-depth-fusion-using
Repo https://github.com/simon-donne/defusr
Framework pytorch
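
The reprojection step that drives the refinement, lifting a neighbour's depth map to 3D and reading off where (and how deep) those points land in the reference view, is sketched below; scatter/z-buffering of the projected points and the learned refinement network are omitted.

```python
import torch

def reproject_depth(depth_src, K, K_inv, T_src_to_ref, H, W):
    # depth_src: (H, W) depth of the source view; K, K_inv: (3, 3) intrinsics;
    # T_src_to_ref: (4, 4) rigid transform from source to reference camera.
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)  # homogeneous
    pts_src = (K_inv @ pix) * depth_src.reshape(1, -1)                # back-project to 3D
    pts_ref = T_src_to_ref[:3, :3] @ pts_src + T_src_to_ref[:3, 3:]   # change of frame
    proj = K @ pts_ref
    uv = proj[:2] / proj[2:].clamp(min=1e-6)          # pixel coords in the reference view
    return uv.reshape(2, H, W), pts_ref[2].reshape(H, W)  # landing point + induced depth
```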

Derivative-Free Optimization of Neural Networks using Local Search

Title Derivative-Free Optimization of Neural Networks using Local Search
Authors Ahmed Aly, Gianluca Guadagni, Joanne Bechta Dugan
Abstract Deep Neural Networks have received a great deal of attention in the past few years. Applications of Deep Learning broached areas of different domains such as Reinforcement Learning and Computer Vision. Despite their popularity and success, training neural networks can be a challenging process. This paper presents a study on derivative-free, single-candidate optimization of neural networks using Local Search (LS). LS is an algorithm where constrained noise is iteratively applied to subsets of the search space. It is coupled with a Score Decay mechanism to enhance performance. LS is a subsidiary of the Random Search family. Experiments were conducted using a setup that is both suitable for an introduction of the algorithm and representative of modern deep learning tasks, based on the FashionMNIST dataset. Training of a 5-Million parameter CNN was done in several scenarios, including Stochastic Gradient Descent (SGD) coupled with Backpropagation (BP) for comparison. Results reveal that although LS was not competitive in terms of convergence speed, it was actually able to converge to a lower loss than SGD. In addition, LS trained the CNN using Accuracy rather than Loss as a learning signal, though to a lower performance. In conclusion, LS presents a viable alternative in cases where SGD fails or is not suitable. The simplicity of LS can make it attractive to non-experts who would want to try neural nets for the first-time or on novel, non-differentiable tasks.
Tasks
Published 2019-10-15
URL https://www.researchgate.net/publication/338501738_Derivative-Free_Optimization_of_Neural_Networks_Using_Local_Search
PDF https://www.researchgate.net/publication/338501738_Derivative-Free_Optimization_of_Neural_Networks_Using_Local_Search
PWC https://paperswithcode.com/paper/derivative-free-optimization-of-neural
Repo https://github.com/AroMorin/DNNOP
Framework pytorch
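
The algorithm itself is compact: perturb a random subset of the weights with constrained noise and keep the move only if the score improves, where the score may be accuracy directly since no gradients are needed. A sketch without the Score Decay mechanism:

```python
import torch

@torch.no_grad()
def local_search(model, score_fn, steps=1000, noise_scale=0.01):
    # Single-candidate Local Search: score_fn can be accuracy itself (higher is
    # better), since the method is derivative-free.
    params = [p for p in model.parameters() if p.requires_grad]
    best = score_fn(model)
    for _ in range(steps):
        p = params[torch.randint(len(params), (1,)).item()]  # one tensor as the subset
        backup = p.clone()
        p.add_(noise_scale * torch.randn_like(p))            # constrained, zero-mean noise
        score = score_fn(model)
        if score > best:
            best = score          # accept the move
        else:
            p.copy_(backup)       # reject and revert
    return best
```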