Paper Group NAWR 30
Skin Lesion Segmentation using SegNet with Binary Cross-Entropy. On the (In)fidelity and Sensitivity of Explanations. Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss. Errudite: Scalable, Reproducible, and Testable Error Analysis. Digging Into Self-Supervised Monocular Depth Estimation. RUBi: Reducing Unimodal Biases for Visual Question Answering. D2-Net: A Trainable CNN for Joint Description and Detection of Local Features. Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification. Addressing Failure Detection by Learning Model Confidence. Conditional Structure Generation through Graph Variational Generative Adversarial Nets. Modular Deep Probabilistic Programming. Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation. Automated Brain Disorders Diagnosis Through Deep Neural Networks. Learning Non-Volumetric Depth Fusion Using Successive Reprojections. Derivative-Free Optimization of Neural Networks using Local Search.
Skin Lesion Segmentation using SegNet with Binary Cross-Entropy
Title | Skin Lesion Segmentation using SegNet with Binary Cross-Entropy |
Authors | Prashant Brahmbhatt, Siddhi Nath Rajan |
Abstract | This paper presents a simple and computationally efficient approach to automatic skin lesion segmentation using the SegNet deep learning architecture, with some additional modifications to improve the results. A secondary objective is to keep pre- and post-processing of the images minimal. The model is trained on a limited number of images from the PH2 dataset of dermoscopic images, which were manually segmented and are accompanied by masks, clinical diagnoses, and annotations of several dermoscopic structures produced by professional dermatologists. The aim is to reach a Jaccard index (IoU) of 92% on evaluation. |
Tasks | Lesion Segmentation, Skin Cancer Segmentation |
Published | 2019-11-15 |
URL | https://raw.githubusercontent.com/hashbanger/Skin_Lesion_Segmentation/master/abstract.txt |
https://drive.google.com/file/d/1pgAXmKgY2NerSMzvaS9M8PKnP0cTrbQM/view?usp=sharing | |
PWC | https://paperswithcode.com/paper/skin-lesion-segmentation-using-segnet-with |
Repo | https://github.com/hashbanger/Skin_Lesion_Segmentation |
Framework | none |
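The training objective and evaluation metric named in the abstract are standard; as a minimal PyTorch sketch (the repo lists no framework, so this is purely illustrative), assuming a generic encoder-decoder producing one mask logit per pixel:

```python
import torch
import torch.nn.functional as F

def bce_loss(pred_logits, target_mask):
    # Per-pixel binary cross-entropy on the raw (pre-sigmoid) mask logits.
    return F.binary_cross_entropy_with_logits(pred_logits, target_mask)

def jaccard_index(pred_logits, target_mask, threshold=0.5, eps=1e-7):
    # IoU between the thresholded prediction and the ground-truth mask;
    # tensors are assumed to be (B, 1, H, W) with masks in {0, 1}.
    pred = (torch.sigmoid(pred_logits) > threshold).float()
    inter = (pred * target_mask).sum(dim=(1, 2, 3))
    union = ((pred + target_mask) > 0).float().sum(dim=(1, 2, 3))
    return ((inter + eps) / (union + eps)).mean()
```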
On the (In)fidelity and Sensitivity of Explanations
Title | On the (In)fidelity and Sensitivity of Explanations |
Authors | Chih-Kuan Yeh, Cheng-Yu Hsieh, Arun Suggala, David I. Inouye, Pradeep K. Ravikumar |
Abstract | We consider objective evaluation measures of saliency explanations for complex black-box machine learning models. We propose simple robust variants of two notions that have been considered in recent literature: (in)fidelity, and sensitivity. We analyze optimal explanations with respect to both these measures, and while the optimal explanation for sensitivity is a vacuous constant explanation, the optimal explanation for infidelity is a novel combination of two popular explanation methods. By varying the perturbation distribution that defines infidelity, we obtain novel explanations by optimizing infidelity, which we show to outperform existing explanations in both quantitative and qualitative measurements. Another salient question given these measures is how to modify any given explanation to have better values with respect to these measures. We propose a simple modification based on lowering sensitivity, and moreover show that when done appropriately, we could simultaneously improve both sensitivity as well as fidelity. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9278-on-the-infidelity-and-sensitivity-of-explanations |
http://papers.nips.cc/paper/9278-on-the-infidelity-and-sensitivity-of-explanations.pdf | |
PWC | https://paperswithcode.com/paper/on-the-infidelity-and-sensitivity-of |
Repo | https://github.com/chihkuanyeh/saliency_evaluation |
Framework | pytorch |
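The infidelity measure compares the effect an explanation predicts for a perturbation against the model's actual output change. A minimal sketch under Gaussian perturbations, where `f` returns the scalar score of the predicted class for a single example `x` and `phi` is a saliency map of the same shape (these names are ours):

```python
import torch

def infidelity(f, x, phi, n_samples=100, noise_std=0.1):
    # INFD(phi) = E_I [ ( I . phi  -  (f(x) - f(x - I)) )^2 ],  I ~ N(0, s^2)
    fx = f(x)
    errs = []
    for _ in range(n_samples):
        I = noise_std * torch.randn_like(x)
        pred_effect = (I * phi).sum()     # effect the explanation predicts
        true_effect = fx - f(x - I)       # model's actual output change
        errs.append((pred_effect - true_effect) ** 2)
    return torch.stack(errs).mean()
```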
Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss
Title | Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss |
Authors | Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu |
Abstract | We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we propose first to transfer audio to high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned on the landmarks. Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. We, humans, are sensitive to temporal discontinuities and subtle artifacts in video. To avoid those pixel jittering problems and to force the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. Furthermore, to generate a sharper image with well-synchronized facial movements, we propose a novel regression-based discriminator structure, which considers sequence-level information along with frame-level information. Thorough experiments on several datasets and real-world samples demonstrate significantly better results obtained by our method than the state-of-the-art methods in both quantitative and qualitative comparisons. |
Tasks | Face Generation, Talking Face Generation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Chen_Hierarchical_Cross-Modal_Talking_Face_Generation_With_Dynamic_Pixel-Wise_Loss_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Chen_Hierarchical_Cross-Modal_Talking_Face_Generation_With_Dynamic_Pixel-Wise_Loss_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/hierarchical-cross-modal-talking-face-1 |
Repo | https://github.com/lelechen63/ATVGnet |
Framework | pytorch |
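The dynamically adjustable pixel-wise loss is described only at a high level here; a hedged sketch of the general idea, with the attention map `attn` (values in [0, 1], derived dynamically during training in the paper) supplied as an input:

```python
import torch

def attention_weighted_l1(pred, target, attn, base=0.5):
    # Pixel-wise L1 reconstruction loss reweighted by an attention map so
    # audio-correlated regions (attn near 1, e.g. the mouth) dominate and
    # static background pixels -- the source of jitter -- are down-weighted.
    weight = base + (1.0 - base) * attn
    return (weight * (pred - target).abs()).mean()
```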
Errudite: Scalable, Reproducible, and Testable Error Analysis
Title | Errudite: Scalable, Reproducible, and Testable Error Analysis |
Authors | Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel Weld |
Abstract | Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model and task agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries. Third, hypotheses about the cause of errors should be explicitly tested; Errudite supports this via automated counterfactual rewriting. We validate our approach with a user study, finding that Errudite (1) enables users to perform high-quality and reproducible error analyses with less effort, (2) reveals substantial ambiguities in prior published error analysis practices, and (3) enhances the error analysis experience by allowing users to test and revise prior beliefs. |
Tasks | |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-1073/ |
https://www.aclweb.org/anthology/P19-1073 | |
PWC | https://paperswithcode.com/paper/errudite-scalable-reproducible-and-testable |
Repo | https://github.com/uwdata/errudite |
Framework | none |
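Errudite ships its own domain-specific language; the snippet below is *not* that DSL, just a hypothetical Python stand-in illustrating what a precisely defined, reproducible error group looks like when applied to all instances rather than a hand-picked sample:

```python
# Hypothetical stand-in for an Errudite-style error group -- NOT the tool's
# actual DSL. The point is that the group is a precise, reusable predicate
# applied to every instance, not a hand-labelled sample.
def long_question_errors(instances):
    return [ex for ex in instances
            if ex["prediction"] != ex["gold"]
            and len(ex["question"].split()) > 20]

# A counterfactual hypothesis ("does shortening the question fix the error?")
# would then be tested by re-running the model on rewritten instances.
```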
Digging Into Self-Supervised Monocular Depth Estimation
Title | Digging Into Self-Supervised Monocular Depth Estimation |
Authors | Clement Godard, Oisin Mac Aodha, Michael Firman, Gabriel J. Brostow |
Abstract | Per-pixel ground-truth depth data is challenging to acquire at scale. To overcome this limitation, self-supervised learning has emerged as a promising alternative for training models to perform monocular depth estimation. In this paper, we propose a set of improvements, which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. Research on self-supervised monocular training usually explores increasingly complex architectures, loss functions, and image formation models, all of which have recently helped to close the gap with fully-supervised methods. We show that a surprisingly simple model, and associated design choices, lead to superior predictions. In particular, we propose (i) a minimum reprojection loss, designed to robustly handle occlusions, (ii) a full-resolution multi-scale sampling method that reduces visual artifacts, and (iii) an auto-masking loss to ignore training pixels that violate camera motion assumptions. We demonstrate the effectiveness of each component in isolation, and show high quality, state-of-the-art results on the KITTI benchmark. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Godard_Digging_Into_Self-Supervised_Monocular_Depth_Estimation_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Godard_Digging_Into_Self-Supervised_Monocular_Depth_Estimation_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/digging-into-self-supervised-monocular-depth-1 |
Repo | https://github.com/nianticlabs/monodepth2 |
Framework | pytorch |
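The minimum reprojection loss from contribution (i) is compact enough to sketch, with the photometric error simplified to plain L1 (the paper combines SSIM and L1):

```python
import torch

def min_reprojection_loss(target, warped_sources):
    # warped_sources: source frames warped into the target view using the
    # predicted depth and pose, each (B, 3, H, W). Taking the per-pixel
    # *minimum* photometric error over source views keeps occluded pixels
    # from being penalised by the one view in which they are not visible.
    errors = [(target - w).abs().mean(1, keepdim=True) for w in warped_sources]
    return torch.cat(errors, dim=1).min(dim=1).values.mean()
```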
RUBi: Reducing Unimodal Biases for Visual Question Answering
Title | RUBi: Reducing Unimodal Biases for Visual Question Answering |
Authors | Remi Cadene, Corentin Dancette, Hedi Ben Younes, Matthieu Cord, Devi Parikh |
Abstract | Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models often exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer from a huge drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use the two input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. It prevents the base VQA model from learning them by influencing its predictions. This leads to dynamically adjusting the loss in order to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2. This dataset is specifically designed to assess the robustness of VQA models when exposed to different question biases at test time than what was seen during training. |
Tasks | Question Answering, Visual Question Answering |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8371-rubi-reducing-unimodal-biases-for-visual-question-answering |
http://papers.nips.cc/paper/8371-rubi-reducing-unimodal-biases-for-visual-question-answering.pdf | |
PWC | https://paperswithcode.com/paper/rubi-reducing-unimodal-biases-for-visual |
Repo | https://github.com/cdancette/rubi.bootstrap.pytorch |
Framework | pytorch |
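A sketch of the RUBi masking idea, simplified in that the paper additionally controls which parameters each loss term is allowed to update:

```python
import torch
import torch.nn.functional as F

def rubi_losses(vqa_logits, q_only_logits, answers):
    # The question-only branch captures the language bias; its sigmoid mask
    # rescales the base VQA logits, so examples answerable from the question
    # alone yield smaller gradients for the main model.
    mask = torch.sigmoid(q_only_logits)
    fused_logits = vqa_logits * mask
    loss_fused = F.cross_entropy(fused_logits, answers)  # trains the VQA model
    loss_q = F.cross_entropy(q_only_logits, answers)     # trains the bias branch
    return loss_fused + loss_q
```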
D2-Net: A Trainable CNN for Joint Description and Detection of Local Features
Title | D2-Net: A Trainable CNN for Joint Description and Detection of Local Features |
Authors | Mihai Dusmanu, Ignacio Rocco, Tomas Pajdla, Marc Pollefeys, Josef Sivic, Akihiko Torii, Torsten Sattler |
Abstract | In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. We propose an approach where a single convolutional neural network plays a dual role: It is simultaneously a dense feature descriptor and a feature detector. By postponing the detection to a later stage, the obtained keypoints are more stable than their traditional counterparts based on early detection of low-level structures. We show that this model can be trained using pixel correspondences extracted from readily available large-scale SfM reconstructions, without any further annotations. The proposed method obtains state-of-the-art performance on both the difficult Aachen Day-Night localization dataset and the InLoc indoor localization benchmark, as well as competitive performance on other benchmarks for image matching and 3D reconstruction. |
Tasks | 3D Reconstruction |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Dusmanu_D2-Net_A_Trainable_CNN_for_Joint_Description_and_Detection_of_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Dusmanu_D2-Net_A_Trainable_CNN_for_Joint_Description_and_Detection_of_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/d2-net-a-trainable-cnn-for-joint-description |
Repo | https://github.com/mihaidusmanu/d2-net |
Framework | pytorch |
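A sketch of the describe-and-detect scoring, assuming post-ReLU (non-negative) feature maps: a location scores highly when it is a spatial soft local maximum of some channel and that channel dominates the others there:

```python
import torch
import torch.nn.functional as F

def d2_soft_detection(features, eps=1e-6):
    # features: (B, C, H, W) dense descriptor map after ReLU.
    exp = torch.exp(features)
    # Sum over each 3x3 neighbourhood (average pool times window size).
    local_sum = F.avg_pool2d(exp, 3, stride=1, padding=1) * 9.0
    alpha = exp / local_sum                              # soft local-max per channel
    beta = features / features.max(dim=1, keepdim=True).values.clamp(min=eps)
    score = (alpha * beta).max(dim=1).values             # best channel per location
    return score / score.sum(dim=(1, 2), keepdim=True).clamp(min=eps)
```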
Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification
Title | Co-Segmentation Inspired Attention Networks for Video-Based Person Re-Identification |
Authors | Arulkumar Subramaniam, Athira Nambiar, Anurag Mittal |
Abstract | Person re-identification (Re-ID) is an important real-world surveillance problem that entails associating a person’s identity over a network of cameras. Video-based Re-ID approaches have gained significant attention recently since a video, and not just an image, is often available. In this work, we propose a novel Co-segmentation inspired video Re-ID deep architecture and formulate a Co-segmentation based Attention Module (COSAM) that activates a common set of salient features across multiple frames of a video via mutual consensus in an unsupervised manner. As opposed to most of the prior work, our approach is able to attend to person accessories along with the person. Our plug-and-play and interpretable COSAM module applied on two deep architectures (ResNet50, SE-ResNet50) outperform the state-of-the-art methods on three benchmark datasets. |
Tasks | Person Re-Identification, Video-Based Person Re-Identification |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Subramaniam_Co-Segmentation_Inspired_Attention_Networks_for_Video-Based_Person_Re-Identification_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Subramaniam_Co-Segmentation_Inspired_Attention_Networks_for_Video-Based_Person_Re-Identification_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/co-segmentation-inspired-attention-networks |
Repo | https://github.com/InnovArul/vidreid_cosegmentation |
Framework | pytorch |
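A simplified take on the mutual-consensus attention: each frame is attended by its correlation with a summary pooled from the *other* frames, so only features that recur across the tracklet survive (the actual COSAM module is richer; shapes and pooling here are our assumptions):

```python
import torch

def cosam_spatial_attention(frame_feats):
    # frame_feats: (T, C, H, W) conv features for T > 1 frames of a tracklet.
    T, C, H, W = frame_feats.shape
    flat = frame_feats.view(T, C, H * W)
    summaries = flat.mean(dim=2)                               # (T, C)
    attn = []
    for t in range(T):
        consensus = torch.cat([summaries[:t], summaries[t + 1:]]).mean(0)  # (C,)
        corr = (consensus[None, :, None] * flat[t][None]).sum(1)           # (1, H*W)
        attn.append(torch.softmax(corr, dim=1).view(1, H, W))
    return torch.stack(attn)                                   # (T, 1, H, W)
```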
Addressing Failure Detection by Learning Model Confidence
Title | Addressing Failure Detection by Learning Model Confidence |
Authors | Charles Corbière, Nicolas Thome, Avner Bar-Hen, Matthieu Cord, Patrick Pérez |
Abstract | Assessing reliably the confidence of a deep neural net and predicting its failures is of primary importance for the practical deployment of these models. In this paper, we propose a new target criterion for model confidence, corresponding to the True Class Probability (TCP). We show how using the TCP is more suited than relying on the classic Maximum Class Probability (MCP). We provide in addition theoretical guarantees for TCP in the context of failure prediction. Since the true class is by essence unknown at test time, we propose to learn TCP criterion on the training set, introducing a specific learning scheme adapted to this context. Extensive experiments are conducted for validating the relevance of the proposed approach. We study various network architectures, small and large scale datasets for image classification and semantic segmentation. We show that our approach consistently outperforms several strong methods, from MCP to Bayesian uncertainty, as well as recent approaches specifically designed for failure prediction. |
Tasks | Image Classification, Semantic Segmentation |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8556-addressing-failure-detection-by-learning-model-confidence |
http://papers.nips.cc/paper/8556-addressing-failure-detection-by-learning-model-confidence.pdf | |
PWC | https://paperswithcode.com/paper/addressing-failure-detection-by-learning |
Repo | https://github.com/valeoai/ConfidNet |
Framework | pytorch |
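The TCP criterion itself is one line; a sketch of the auxiliary confidence head's training loss (the paper uses an MSE-style regression; the head architecture is omitted):

```python
import torch
import torch.nn.functional as F

def tcp_regression_loss(conf_pred, logits, targets):
    # True Class Probability: the softmax mass the classifier puts on the
    # ground-truth class -- low exactly when the model is about to fail.
    # The auxiliary confidence head outputs one scalar per sample and is
    # trained to regress this target on the training set.
    probs = F.softmax(logits, dim=1)
    tcp = probs.gather(1, targets.unsqueeze(1)).squeeze(1).detach()
    return F.mse_loss(conf_pred, tcp)
```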
Conditional Structure Generation through Graph Variational Generative Adversarial Nets
Title | Conditional Structure Generation through Graph Variational Generative Adversarial Nets |
Authors | Carl Yang, Peiye Zhuang, Wenhan Shi, Alan Luu, Pan Li |
Abstract | Graph embedding has been intensively studied recently, due to the advance of various neural network models. Theoretical analyses and empirical studies have pushed forward the translation of discrete graph structures into distributed representation vectors, but seldom considered the reverse direction, i.e., generation of graphs from given related context spaces. Particularly, since graphs often become more meaningful when associated with semantic contexts (e.g., social networks of certain communities, gene networks of certain diseases), the ability to infer graph structures according to given semantic conditions could be of great value. While existing graph generative models only consider graph structures without semantic contexts, we formulate the novel problem of conditional structure generation, and propose a novel unified model of graph variational generative adversarial nets (CondGen) to handle the intrinsic challenges of flexible context-structure conditioning and permutation-invariant generation. Extensive experiments on two deliberately created benchmark datasets of real-world context-enriched networks demonstrate the supreme effectiveness and generalizability of CondGen. |
Tasks | Graph Embedding |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8415-conditional-structure-generation-through-graph-variational-generative-adversarial-nets |
http://papers.nips.cc/paper/8415-conditional-structure-generation-through-graph-variational-generative-adversarial-nets.pdf | |
PWC | https://paperswithcode.com/paper/conditional-structure-generation-through |
Repo | https://github.com/KelestZ/CondGen |
Framework | pytorch |
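A toy sketch of the conditioning and permutation-invariant decoding only; CondGen itself wraps this in a VAE encoder and a GAN discriminator:

```python
import torch
import torch.nn as nn

class CondGraphDecoder(nn.Module):
    """Sketch: the semantic condition c is concatenated with latent noise z
    and decoded into per-node embeddings; the adjacency comes from pairwise
    inner products, which is invariant to node permutations."""

    def __init__(self, cond_dim, z_dim, n_nodes, hid=64):
        super().__init__()
        self.n_nodes, self.hid = n_nodes, hid
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim + z_dim, hid), nn.ReLU(),
            nn.Linear(hid, n_nodes * hid))

    def forward(self, c, z):
        nodes = self.mlp(torch.cat([c, z], dim=-1)).view(-1, self.n_nodes, self.hid)
        adj_logits = nodes @ nodes.transpose(1, 2)   # pairwise node affinities
        return torch.sigmoid(adj_logits)             # edge probabilities
```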
Modular Deep Probabilistic Programming
Title | Modular Deep Probabilistic Programming |
Authors | Zhenwen Dai, Eric Meissner, Neil D. Lawrence |
Abstract | Modularity is a key feature of deep learning libraries but has not been fully exploited for probabilistic programming. We propose to improve the modularity of probabilistic programming languages by offering not only plain probability distributions but also sophisticated probabilistic models, such as Bayesian non-parametric models, as fundamental building blocks. We demonstrate this idea by presenting MXFusion, a modular probabilistic programming language that includes a new type of re-usable building block called the probabilistic module. A probabilistic module consists of a set of random variables with associated probability distributions and dedicated inference methods. Under the framework of variational inference, the pre-specified inference methods of individual probabilistic modules can be used transparently for inference over the whole probabilistic model. We demonstrate the power and convenience of probabilistic modules in MXFusion with various examples of Gaussian process models, which are evaluated with experiments on real data. |
Tasks | Probabilistic Programming |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=B1xnPsA5KX |
https://openreview.net/pdf?id=B1xnPsA5KX | |
PWC | https://paperswithcode.com/paper/modular-deep-probabilistic-programming |
Repo | https://github.com/amzn/MXFusion |
Framework | mxnet |
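The snippet below is not MXFusion's actual API, just a schematic of the probabilistic-module idea: bundling random variables with a dedicated inference routine so a larger variational-inference model can treat the pair as one building block:

```python
# Schematic only -- not MXFusion's real API. A probabilistic module bundles
# its random variables (with priors) and a dedicated posterior-inference
# routine, so the enclosing variational-inference model can treat, say, a
# whole Gaussian process as a single building block.
class ProbabilisticModule:
    def __init__(self, variables, inference_fn):
        self.variables = variables          # dict: name -> random variable
        self.inference_fn = inference_fn    # module-specific VI update

    def infer(self, data):
        # Called transparently when inference runs over the whole model.
        return self.inference_fn(self.variables, data)
```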
Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation
Title | Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation |
Authors | Yunyang Xiong, Hyunwoo J. Kim, Vikas Singh |
Abstract | There is much interest in computer vision to utilize commodity hardware for gaze estimation. A number of papers have shown that algorithms based on deep convolutional architectures are approaching accuracies where streaming data from mass-market devices can offer good gaze tracking performance, although a gap still remains between what is possible and the performance users will expect in real deployments. We observe that one obvious avenue for improvement relates to a gap between some basic technical assumptions behind most existing approaches and the statistical properties of the data used for training. Specifically, most training datasets involve tens of users with a few hundred (or more) repeated acquisitions per user. The non-i.i.d. nature of this data suggests better estimation may be possible if the model explicitly made use of such “repeated measurements” from each user as is commonly done in classical statistical analysis using so-called mixed effects models. The goal of this paper is to adapt these “mixed effects” ideas from statistics within a deep neural network architecture for gaze estimation, based on eye images. Such a formulation seeks to specifically utilize information regarding the hierarchical structure of the training data – each node in the hierarchy is a user who provides tens or hundreds of repeated samples. This modification yields an architecture that offers state-of-the-art performance on various publicly available datasets, improving results by 10-20%. |
Tasks | Gaze Estimation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Xiong_Mixed_Effects_Neural_Networks_MeNets_With_Applications_to_Gaze_Estimation_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Xiong_Mixed_Effects_Neural_Networks_MeNets_With_Applications_to_Gaze_Estimation_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/mixed-effects-neural-networks-menets-with |
Repo | https://github.com/vsingh-group/MeNets |
Framework | none |
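A sketch of the mixed-effects decomposition y = f(x) + g_u(x): a shared fixed-effects regressor plus a per-user random-effects correction learned from that user's repeated measurements (the paper estimates the random effects with an EM-style procedure; here they are ordinary learned embeddings, and the framework is PyTorch purely for illustration):

```python
import torch
import torch.nn as nn

class MixedEffectsHead(nn.Module):
    def __init__(self, feat_dim, n_users, out_dim=2):
        super().__init__()
        self.fixed = nn.Linear(feat_dim, out_dim)                 # shared f(x)
        self.random = nn.Embedding(n_users, feat_dim * out_dim)   # one W_u per user
        self.feat_dim, self.out_dim = feat_dim, out_dim

    def forward(self, feats, user_ids):
        # Per-user random effect: W_u x, added to the shared prediction.
        w_u = self.random(user_ids).view(-1, self.out_dim, self.feat_dim)
        random_term = torch.bmm(w_u, feats.unsqueeze(-1)).squeeze(-1)
        return self.fixed(feats) + random_term                    # (B, out_dim)
```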
Automated Brain Disorders Diagnosis Through Deep Neural Networks
Title | Automated Brain Disorders Diagnosis Through Deep Neural Networks |
Authors | Gabriel Maggiotti |
Abstract | In most cases, the diagnosis of brain disorders such as epilepsy is slow and requires repeated visits to doctors and EEG technicians. This project aims to automate brain disorder diagnosis using artificial intelligence and deep learning. Many brain disorders can be detected by reading an electroencephalogram (EEG): collecting electrical signals directly from the brain with a non-invasive EEG device yields significant information about its health. Classifying these signals and detecting anomalies in them is what doctors currently do when reading an EEG. With enough data and the use of artificial intelligence, it is possible to learn to classify these signals into groups (e.g., anxiety, epileptic spikes, etc.) and then use a trained neural network to interpret them and identify evidence of a disorder, ultimately automating the detection and classification of the disorders found. |
Tasks | EEG |
Published | 2019-01-11 |
URL | http://vixra.org/abs/1901.0166 |
http://vixra.org/pdf/1901.0166v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-brain-disorders-diagnosis-through |
Repo | https://github.com/gmaggiotti/epilepsy-prediction |
Framework | tf |
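The abstract stays at the concept level; a generic sketch of the kind of model it describes, a small 1-D CNN over fixed-length, multi-channel EEG windows (written in PyTorch for consistency with the other sketches; the channel count and class names are placeholders, not the paper's exact model):

```python
import torch.nn as nn

# Input: (B, 23, T) windows -- 23 scalp channels is an assumption.
eeg_classifier = nn.Sequential(
    nn.Conv1d(in_channels=23, out_channels=32, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.MaxPool1d(4),
    nn.Conv1d(32, 64, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),
    nn.Flatten(),
    nn.Linear(64, 3),   # e.g. normal / epileptic spikes / other anomaly
)
```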
Learning Non-Volumetric Depth Fusion Using Successive Reprojections
Title | Learning Non-Volumetric Depth Fusion Using Successive Reprojections |
Authors | Simon Donne, Andreas Geiger |
Abstract | Given a set of input views, multi-view stereopsis techniques estimate depth maps to represent the 3D reconstruction of the scene; these are fused into a single, consistent, reconstruction – most often a point cloud. In this work we propose to learn an auto-regressive depth refinement directly from data. While deep learning has improved the accuracy and speed of depth estimation significantly, learned MVS techniques remain limited to the planesweeping paradigm. We refine a set of input depth maps by successively reprojecting information from neighbouring views to leverage multi-view constraints. Compared to learning-based volumetric fusion techniques, an image-based representation allows significantly more detailed reconstructions; compared to traditional point-based techniques, our method learns noise suppression and surface completion in a data-driven fashion. Due to the limited availability of high-quality reconstruction datasets with ground truth, we introduce two novel synthetic datasets to (pre-)train our network. Our approach is able to improve both the output depth maps and the reconstructed point cloud, for both learned and traditional depth estimation front-ends, on both synthetic and real data. |
Tasks | 3D Reconstruction, Depth Estimation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Donne_Learning_Non-Volumetric_Depth_Fusion_Using_Successive_Reprojections_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Donne_Learning_Non-Volumetric_Depth_Fusion_Using_Successive_Reprojections_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/learning-non-volumetric-depth-fusion-using |
Repo | https://github.com/simon-donne/defusr |
Framework | pytorch |
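The core operation is reprojecting a neighbouring view's depth into the reference view; a geometry-only sketch assuming known intrinsics `K` and relative pose `T_src_to_ref` (splatting onto the reference pixel grid, which the method also needs, is omitted):

```python
import torch

def reproject_depth(src_depth, K, K_inv, T_src_to_ref):
    # src_depth: (H, W) depth of a neighbouring view; K, K_inv: (3, 3)
    # intrinsics; T_src_to_ref: (4, 4) relative pose. Lifts every source
    # pixel to 3D, moves it into the reference frame, and returns both its
    # reference-view pixel coordinates and the depth it would have there.
    H, W = src_depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float().view(3, -1)
    pts_src = (K_inv @ pix) * src_depth.view(1, -1)        # 3D in source frame
    pts_hom = torch.cat([pts_src, torch.ones(1, H * W)], 0)
    pts_ref = (T_src_to_ref @ pts_hom)[:3]                 # 3D in reference frame
    proj = K @ pts_ref
    uv = proj[:2] / proj[2:].clamp(min=1e-6)               # reference pixel coords
    return uv.view(2, H, W), pts_ref[2].view(H, W)
```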
Derivative-Free Optimization of Neural Networks using Local Search
Title | Derivative-Free Optimization of Neural Networks using Local Search |
Authors | Ahmed Aly, Gianluca Guadagni, Joanne Bechta Dugan |
Abstract | Deep Neural Networks have received a great deal of attention in the past few years. Applications of Deep Learning have spread across domains such as Reinforcement Learning and Computer Vision. Despite their popularity and success, training neural networks can be a challenging process. This paper presents a study on derivative-free, single-candidate optimization of neural networks using Local Search (LS). LS is an algorithm in which constrained noise is iteratively applied to subsets of the search space, coupled with a Score Decay mechanism to enhance performance; it belongs to the Random Search family. Experiments were conducted using a setup that is both suitable for an introduction of the algorithm and representative of modern deep learning tasks, based on the FashionMNIST dataset. A 5-million-parameter CNN was trained in several scenarios, including Stochastic Gradient Descent (SGD) coupled with Backpropagation (BP) for comparison. Results reveal that although LS was not competitive in terms of convergence speed, it was able to converge to a lower loss than SGD. In addition, LS trained the CNN using accuracy rather than loss as a learning signal, though with lower final performance. In conclusion, LS presents a viable alternative in cases where SGD fails or is not suitable. The simplicity of LS can make it attractive to non-experts who want to try neural nets for the first time or on novel, non-differentiable tasks. |
Tasks | |
Published | 2019-10-15 |
URL | https://www.researchgate.net/publication/338501738_Derivative-Free_Optimization_of_Neural_Networks_Using_Local_Search |
PWC | https://paperswithcode.com/paper/derivative-free-optimization-of-neural |
Repo | https://github.com/AroMorin/DNNOP |
Framework | pytorch |
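A sketch of one LS step as described: constrained noise applied to a random subset of the weights, kept only on improvement (the Score Decay mechanism is omitted for brevity):

```python
import copy
import torch

def local_search_step(model, score_fn, noise_std=0.05, frac=0.1):
    # One derivative-free LS step: perturb roughly `frac` of the weights
    # with bounded Gaussian noise and keep the candidate only if the score
    # (higher is better, e.g. accuracy or negative loss) improves.
    candidate = copy.deepcopy(model)
    with torch.no_grad():
        for p in candidate.parameters():
            mask = (torch.rand_like(p) < frac).float()   # perturbed subset
            p.add_(mask * noise_std * torch.randn_like(p))
    return candidate if score_fn(candidate) > score_fn(model) else model
```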