January 27, 2020

2980 words 14 mins read

Paper Group ANR 1333

Contextual Online False Discovery Rate Control. Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning. Directing DNNs Attention for Facial Attribution Classification using Gradient-weighted Class Activation Mapping. A Useful Taxonomy for Adversarial Robustness of Neural Networks. M3D-GAN: Multi-Modal Multi-Domain Trans …

Contextual Online False Discovery Rate Control

Title Contextual Online False Discovery Rate Control
Authors Shiyun Chen, Shiva Kasiviswanathan
Abstract Multiple hypothesis testing, a situation when we wish to consider many hypotheses, is a core problem in statistical inference that arises in almost every scientific field. In this setting, controlling the false discovery rate (FDR), the expected proportion of type I errors among the rejected hypotheses, is an important challenge for making meaningful inferences. In this paper, we consider the problem of controlling FDR in an online manner. Concretely, we consider an ordered, possibly infinite, sequence of hypotheses, arriving one at each timestep, and for each hypothesis we observe a p-value along with a set of features specific to that hypothesis. The decision whether or not to reject the current hypothesis must be made immediately at each timestep, before the next hypothesis is observed. The multi-dimensional feature set provides a very general way of leveraging auxiliary information in the data, which helps in maximizing the number of discoveries. We propose a new class of powerful online testing procedures in which the rejection thresholds (significance levels) are learned sequentially by incorporating contextual information and previous results. We prove that any rule in this class controls online FDR under some standard assumptions. We then focus on a subclass of these procedures, based on weighting significance levels, to derive a practical algorithm that learns a parametric weight function in an online fashion to gain more discoveries. We also prove theoretically, in a stylized setting, that our proposed procedures lead to an increase in achieved statistical power over a popular online testing procedure proposed by Javanmard & Montanari (2018). Finally, we demonstrate the favorable performance of our procedure by comparing it to state-of-the-art online multiple testing procedures on both synthetic data and real data from different applications.
Tasks
Published 2019-02-07
URL http://arxiv.org/abs/1902.02885v2
PDF http://arxiv.org/pdf/1902.02885v2.pdf
PWC https://paperswithcode.com/paper/contextual-online-false-discovery-rate
Repo
Framework
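
For readers unfamiliar with online multiple testing, the sketch below shows the general pattern the abstract describes: at each timestep a significance level is derived from an alpha-spending sequence and a context-dependent weight, and the hypothesis is rejected if its p-value falls below that level. This is a toy illustration with placeholder choices (the `weight` function and the spending sequence are assumptions), not the authors' algorithm, and no FDR guarantee is claimed for this simplified rule.

```python
import numpy as np

def online_weighted_test(stream, alpha=0.05):
    """Toy online testing loop: per-test significance levels come from a
    fixed spending sequence scaled by a context-dependent weight.
    Illustrative only -- NOT the paper's procedure."""
    rejections = []
    for t, (p_value, features) in enumerate(stream, start=1):
        gamma_t = 6 / (np.pi ** 2 * t ** 2)   # spending sequence; sums to 1
        w_t = weight(features)                # hypothetical contextual weight
        alpha_t = alpha * gamma_t * w_t       # significance level for test t
        rejections.append(p_value <= alpha_t)
    return rejections

def weight(features):
    """Hypothetical parametric weight function w(x); the paper learns such a
    function online, here it is a fixed placeholder."""
    return 1.0
```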

Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning

Title Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning
Authors Rohit Pandey, Anastasia Tkach, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Ricardo Martin-Brualla, Andrea Tagliasacchi, George Papandreou, Philip Davidson, Cem Keskin, Shahram Izadi, Sean Fanello
Abstract Volumetric (4D) performance capture is fundamental for AR/VR content generation. Whereas previous work in 4D performance capture has shown impressive results in studio settings, the technology is still far from being accessible to a typical consumer who, at best, might own a single RGBD sensor. Thus, in this work, we propose a method to synthesize free viewpoint renderings using a single RGBD camera. The key insight is to leverage previously seen “calibration” images of a given user to extrapolate what should be rendered in a novel viewpoint from the data available in the sensor. Given these past observations from multiple viewpoints, and the current RGBD image from a fixed view, we propose an end-to-end framework that fuses both data sources to generate novel renderings of the performer. We demonstrate that the method can produce high fidelity images, and handle extreme changes in subject pose and camera viewpoints. We also show that the system generalizes to performers not seen in the training data. We run exhaustive experiments demonstrating the effectiveness of the proposed semi-parametric model (i.e., calibration images available to the neural network) compared to other state-of-the-art machine-learned solutions. Further, we compare the method with more traditional pipelines that employ multi-view capture. We show that our framework is able to achieve compelling results, with substantially less infrastructure than previously required.
Tasks Calibration
Published 2019-05-29
URL https://arxiv.org/abs/1905.12162v1
PDF https://arxiv.org/pdf/1905.12162v1.pdf
PWC https://paperswithcode.com/paper/volumetric-capture-of-humans-with-a-single-1
Repo
Framework

Directing DNNs Attention for Facial Attribution Classification using Gradient-weighted Class Activation Mapping

Title Directing DNNs Attention for Facial Attribution Classification using Gradient-weighted Class Activation Mapping
Authors Xi Yang, Bojian Wu, Issei Sato, Takeo Igarashi
Abstract Deep neural networks (DNNs) achieve high accuracy on image classification tasks. However, DNNs trained on datasets with co-occurrence bias may rely on the wrong features when making classification decisions, which greatly affects the transferability of pre-trained DNNs. In this paper, we propose an interactive method that directs classifiers to pay attention to regions manually specified by users, in order to mitigate the influence of co-occurrence bias. We test on the CelebA dataset: a pre-trained AlexNet is fine-tuned to focus on specific facial attributes based on the results of Grad-CAM.
Tasks Image Classification
Published 2019-05-02
URL http://arxiv.org/abs/1905.00593v1
PDF http://arxiv.org/pdf/1905.00593v1.pdf
PWC https://paperswithcode.com/paper/directing-dnns-attention-for-facial
Repo
Framework
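
Grad-CAM, the attention-visualization technique this paper builds on, is compact enough to sketch. The snippet below computes a Grad-CAM heatmap for a stock torchvision AlexNet; the use of a generic pretrained model (rather than the paper's CelebA-fine-tuned one) is an assumption for illustration.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.alexnet(pretrained=True).eval()

def grad_cam(image, class_idx):
    """Grad-CAM heatmap: weight the conv feature maps by the spatially
    averaged gradients of the target class score, then ReLU and upsample."""
    feats = model.features(image)             # (1, 256, 6, 6) conv features
    feats.retain_grad()                       # keep grad for this non-leaf
    pooled = model.avgpool(feats)
    logits = model.classifier(torch.flatten(pooled, 1))
    model.zero_grad()
    logits[0, class_idx].backward()
    weights = feats.grad.mean(dim=(2, 3), keepdim=True)   # (1, 256, 1, 1)
    cam = F.relu((weights * feats).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode='bilinear',
                        align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()           # map in [0, 1]

heatmap = grad_cam(torch.randn(1, 3, 224, 224), class_idx=0)
```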

A Useful Taxonomy for Adversarial Robustness of Neural Networks

Title A Useful Taxonomy for Adversarial Robustness of Neural Networks
Authors Leslie N. Smith
Abstract Adversarial attacks and defenses are currently active areas of research for the deep learning community. A recent review paper divided the defense approaches into three categories: gradient masking, robust optimization, and adversarial example detection. We divide gradient masking and robust optimization differently: (1) increasing intra-class compactness and inter-class separation of the feature vectors improves adversarial robustness, and (2) marginalization or removal of non-robust image features also improves adversarial robustness. By reframing these topics, we provide a fresh perspective that offers insight into the underlying factors enabling the training of more robust networks and can help inspire novel solutions. In addition, several papers in the adversarial defense literature claim there is a cost for adversarial robustness, or a trade-off between robustness and accuracy, but under the proposed taxonomy we hypothesize that this is not universal. We follow up on our taxonomy with several challenges to the deep learning research community that build on the connections and insights in this paper.
Tasks
Published 2019-10-23
URL https://arxiv.org/abs/1910.10679v1
PDF https://arxiv.org/pdf/1910.10679v1.pdf
PWC https://paperswithcode.com/paper/a-useful-taxonomy-for-adversarial-robustness
Repo
Framework
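
To make category (1) concrete, here is a toy center-loss-style penalty that encourages intra-class compactness of feature vectors. It is one illustrative example of such a term, not a method proposed in the paper.

```python
import torch

def compactness_loss(features, labels, centers):
    """Toy intra-class compactness term: pull each feature vector toward
    its class center (a center-loss-style penalty). Illustrates the idea
    only; the paper surveys this family rather than proposing this loss."""
    return ((features - centers[labels]) ** 2).sum(dim=1).mean()

features = torch.randn(8, 128)           # batch of feature vectors
labels = torch.randint(0, 10, (8,))      # class labels
centers = torch.randn(10, 128)           # (learnable) per-class centers
loss = compactness_loss(features, labels, centers)
```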

M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention

Title M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention
Authors Shuang Ma, Daniel McDuff, Yale Song
Abstract Generative adversarial networks have led to significant advances in cross-modal/domain translation. However, typically these networks are designed for a specific task (e.g., dialogue generation or image synthesis, but not both). We present a unified model, M3D-GAN, that can translate across a wide range of modalities (e.g., text, image, and speech) and domains (e.g., attributes in images or emotions in speech). Our model consists of modality subnets that convert data from different modalities into unified representations, and a unified computing body where data from different modalities share the same network architecture. We introduce a universal attention module that is jointly trained with the whole network and learns to encode a large range of domain information into a highly structured latent space. We use this to control synthesis in novel ways, such as producing diverse realistic pictures from a sketch or varying the emotion of synthesized speech. We evaluate our approach on extensive benchmark tasks, including image-to-image, text-to-image, image captioning, text-to-speech, speech recognition, and machine translation. Our results show state-of-the-art performance on some of the tasks.
Tasks Dialogue Generation, Image Captioning, Image Generation, Machine Translation, Speech Recognition
Published 2019-07-09
URL https://arxiv.org/abs/1907.04378v1
PDF https://arxiv.org/pdf/1907.04378v1.pdf
PWC https://paperswithcode.com/paper/m3d-gan-multi-modal-multi-domain-translation
Repo
Framework
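
As a rough illustration of what a jointly trained attention module over a structured latent space could look like, the sketch below attends over a learned bank of latent codes shared across modalities. The abstract does not specify the module's actual design, so the memory-bank formulation and all sizes here are assumptions.

```python
import torch
import torch.nn as nn

class UniversalAttention(nn.Module):
    """Hedged sketch: attend over a learned bank of latent 'domain' codes.
    Shows only the general pattern of a shared, jointly trained attention
    memory -- not M3D-GAN's actual module."""
    def __init__(self, dim=256, n_codes=64):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(n_codes, dim))  # latent memory
        self.query = nn.Linear(dim, dim)

    def forward(self, h):                  # h: (batch, dim) unified repr.
        q = self.query(h)
        attn = torch.softmax(q @ self.codes.T / h.shape[-1] ** 0.5, dim=-1)
        return attn @ self.codes           # (batch, dim) domain context

h = torch.randn(8, 256)       # unified representation from a modality subnet
context = UniversalAttention()(h)
```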

Spiking Neural Network on Neuromorphic Hardware for Energy-Efficient Unidimensional SLAM

Title Spiking Neural Network on Neuromorphic Hardware for Energy-Efficient Unidimensional SLAM
Authors Guangzhi Tang, Arpit Shah, Konstantinos P. Michmizos
Abstract Energy-efficient simultaneous localization and mapping (SLAM) is crucial for mobile robots exploring unknown environments. The mammalian brain solves SLAM via a network of specialized neurons, exhibiting asynchronous computations and event-based communications, with very low energy consumption. We propose a brain-inspired spiking neural network (SNN) architecture that solves unidimensional SLAM by introducing spike-based reference frame transformation, visual likelihood computation, and Bayesian inference. We integrated our neuromorphic algorithm into Intel’s Loihi neuromorphic processor, non-von Neumann hardware that mimics the brain’s computing paradigms. We performed comparative analyses of accuracy and energy efficiency between our neuromorphic approach and the GMapping algorithm, which is widely used in small environments. Our Loihi-based SNN architecture consumes 100 times less energy than GMapping run on a CPU, while having comparable accuracy in head-direction localization and map generation. These results pave the way for scaling our approach toward alternative active-SLAM solutions for Loihi-controlled autonomous robots.
Tasks Bayesian Inference, Simultaneous Localization and Mapping
Published 2019-03-06
URL https://arxiv.org/abs/1903.02504v2
PDF https://arxiv.org/pdf/1903.02504v2.pdf
PWC https://paperswithcode.com/paper/spiking-neural-network-on-neuromorphic
Repo
Framework
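
The classical computation the SNN maps onto spiking neurons is a one-dimensional Bayes filter: a motion (prediction) update followed by a measurement (likelihood) update over a discretized state. The sketch below is conventional code with placeholder motion and observation models, not spiking code and not the authors' Loihi implementation.

```python
import numpy as np

def bayes_filter_1d(belief, likelihood, shift):
    """One step of a discrete 1-D Bayes filter: shift the belief according
    to the motion command, multiply by the observation likelihood, and
    normalize. The models here are illustrative placeholders."""
    predicted = np.roll(belief, shift)        # motion update (cyclic shift)
    posterior = predicted * likelihood        # measurement update
    return posterior / posterior.sum()        # normalize to a distribution

belief = np.ones(36) / 36                     # uniform over 36 heading bins
likelihood = np.exp(-0.5 * ((np.arange(36) - 9) / 2.0) ** 2)  # obs near bin 9
belief = bayes_filter_1d(belief, likelihood, shift=1)
```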

Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding

Title Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding
Authors Muhammad Osama, Dave Zachariah, Thomas B. Schön
Abstract We address the problem of inferring the causal effect of an exposure on an outcome across space, using observational data. The data is possibly subject to unmeasured confounding variables which, in a standard approach, must be adjusted for by estimating a nuisance function. Here we develop a method that eliminates the nuisance function, while mitigating the resulting errors-in-variables. The result is a robust and accurate inference method for spatially varying heterogeneous causal effects. The properties of the method are demonstrated on synthetic as well as real data from Germany and the US.
Tasks
Published 2019-01-28
URL https://arxiv.org/abs/1901.09919v2
PDF https://arxiv.org/pdf/1901.09919v2.pdf
PWC https://paperswithcode.com/paper/inferring-heterogeneous-causal-effects-in
Repo
Framework

TSK-Streams: Learning TSK Fuzzy Systems on Data Streams

Title TSK-Streams: Learning TSK Fuzzy Systems on Data Streams
Authors Ammar Shaker, Eyke Hüllermeier
Abstract The problem of adaptive learning from evolving and possibly non-stationary data streams has attracted a lot of interest in machine learning in the recent past, and has also stimulated research in related fields, such as computational intelligence and fuzzy systems. In particular, several rule-based methods for the incremental induction of regression models have been proposed. In this paper, we develop a method, TSK-Streams, that combines the strengths of two existing approaches rooted in different learning paradigms. More concretely, our method adopts basic principles of the state-of-the-art learning algorithm AMRules and enriches them with the representational advantages of fuzzy rules. In a comprehensive experimental study, TSK-Streams is shown to be highly competitive in terms of performance.
Tasks
Published 2019-11-10
URL https://arxiv.org/abs/1911.03951v1
PDF https://arxiv.org/pdf/1911.03951v1.pdf
PWC https://paperswithcode.com/paper/tsk-streams-learning-tsk-fuzzy-systems-on
Repo
Framework
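
For context, TSK (Takagi-Sugeno-Kang) inference itself is simple: each rule pairs a fuzzy membership function with a local linear model, and the prediction is the firing-strength-weighted average of the local outputs. The sketch below shows that inference step with hypothetical Gaussian-membership rules; the paper's actual contribution, incremental rule induction on streams, is not shown.

```python
import numpy as np

def tsk_predict(x, rules):
    """First-order TSK inference: each rule is (center, width, w, b) with a
    Gaussian membership over the input and a local linear model w @ x + b.
    Prediction = firing-strength-weighted average of local outputs."""
    strengths = np.array([np.exp(-np.sum((x - c) ** 2 / (2 * s ** 2)))
                          for c, s, _, _ in rules])
    outputs = np.array([w @ x + b for _, _, w, b in rules])
    return strengths @ outputs / strengths.sum()

# Two hypothetical rules over a 2-D input space:
rules = [(np.array([0., 0.]), 1.0, np.array([1., -1.]), 0.0),
         (np.array([2., 2.]), 1.0, np.array([0.5, 0.5]), 1.0)]
print(tsk_predict(np.array([1.0, 1.0]), rules))
```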

Conditional GANs For Painting Generation

Title Conditional GANs For Painting Generation
Authors Adeel Mufti, Biagio Antonelli, Julius Monello
Abstract We examined the use of modern Generative Adversarial Nets to generate novel images of oil paintings using the Painter By Numbers dataset. We implemented Spectral Normalization GAN (SN-GAN) and Spectral Normalization GAN with Gradient Penalty, and compared their outputs to a Deep Convolutional GAN. Visually, and quantitatively according to the Sliced Wasserstein Distance metric, we determined that the SN-GAN produced paintings that were most comparable to our training dataset. We then performed a series of experiments to add supervised conditioning to SN-GAN, the culmination of which is what we believe to be a novel architecture that can generate face paintings with user-specified characteristics.
Tasks
Published 2019-03-06
URL http://arxiv.org/abs/1903.06259v1
PDF http://arxiv.org/pdf/1903.06259v1.pdf
PWC https://paperswithcode.com/paper/conditional-gans-for-painting-generation
Repo
Framework
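
Spectral normalization, the core ingredient of SN-GAN, constrains each layer's largest singular value to 1 so the discriminator stays approximately 1-Lipschitz, which stabilizes GAN training. PyTorch ships this as a wrapper, as sketched below; the architecture is an illustrative placeholder, not the paper's exact discriminator.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Each wrapped layer's weight is divided by its largest singular value
# (estimated by power iteration) on every forward pass.
discriminator = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.1),
    spectral_norm(nn.Conv2d(64, 128, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.1),
    nn.Flatten(),
    spectral_norm(nn.Linear(128 * 16 * 16, 1)),   # assumes 64x64 inputs
)
scores = discriminator(torch.randn(4, 3, 64, 64))  # (4, 1) realness scores
```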

Machine Learning on Biomedical Images: Interactive Learning, Transfer Learning, Class Imbalance, and Beyond

Title Machine Learning on Biomedical Images: Interactive Learning, Transfer Learning, Class Imbalance, and Beyond
Authors Naimul Mefraz Khan, Nabila Abraham, Ling Guan
Abstract In this paper, we highlight three issues that limit the performance of machine learning on biomedical images, and tackle them through three case studies: 1) interactive machine learning (IML): we show how IML can drastically improve exploration time and the quality of direct volume rendering; 2) transfer learning: we show how transfer learning, along with intelligent pre-processing, can result in better Alzheimer’s diagnosis using a much smaller training set; and 3) data imbalance: we show how our novel focal Tversky loss function can provide better segmentation results, taking into account the imbalanced nature of segmentation datasets. The case studies are accompanied by an in-depth analytical discussion of results with possible future directions.
Tasks Transfer Learning
Published 2019-02-13
URL http://arxiv.org/abs/1902.05908v1
PDF http://arxiv.org/pdf/1902.05908v1.pdf
PWC https://paperswithcode.com/paper/machine-learning-on-biomedical-images
Repo
Framework
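
The focal Tversky loss mentioned in case study 3 has a compact closed form: a Tversky index that weights false negatives and false positives asymmetrically, raised to a focal exponent that down-weights easy examples. A minimal sketch, with illustrative hyperparameter values (exponent conventions vary across implementations):

```python
import torch

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75,
                       eps=1e-6):
    """Focal Tversky loss for imbalanced segmentation. alpha > beta
    penalizes false negatives more than false positives; gamma is the
    focal exponent. pred: predicted probabilities, target: binary mask,
    same shape. Hyperparameter values here are illustrative."""
    tp = (pred * target).sum()
    fn = ((1 - pred) * target).sum()
    fp = (pred * (1 - target)).sum()
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1 - tversky) ** gamma
```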

Missing MRI Pulse Sequence Synthesis using Multi-Modal Generative Adversarial Network

Title Missing MRI Pulse Sequence Synthesis using Multi-Modal Generative Adversarial Network
Authors Anmol Sharma, Ghassan Hamarneh
Abstract Magnetic resonance imaging (MRI) is being increasingly utilized to assess, diagnose, and plan treatment for a variety of diseases. The ability to visualize tissue in varied contrasts, in the form of MR pulse sequences in a single scan, provides valuable insights to physicians, as well as enabling automated systems performing downstream analysis. However, many issues, such as prohibitive scan time, image corruption, different acquisition protocols, or allergies to certain contrast materials, may hinder the process of acquiring multiple sequences for a patient. This poses challenges to both physicians and automated systems, since complementary information provided by the missing sequences is lost. In this paper, we propose a variant of the generative adversarial network (GAN) capable of leveraging redundant information contained within multiple available sequences in order to generate one or more missing sequences for a patient scan. The proposed network is designed as a multi-input, multi-output network which combines information from all the available pulse sequences, implicitly infers which sequences are missing, and synthesizes the missing ones in a single forward pass. We demonstrate and validate our method on two brain MRI datasets, each with four sequences, and show the applicability of the proposed method in simultaneously synthesizing all missing sequences in any possible scenario where one, two, or three of the four sequences may be missing. We compare our approach with competing unimodal and multi-modal methods, and show that we outperform both quantitatively and qualitatively.
Tasks
Published 2019-04-27
URL https://arxiv.org/abs/1904.12200v3
PDF https://arxiv.org/pdf/1904.12200v3.pdf
PWC https://paperswithcode.com/paper/missing-mri-pulse-sequence-synthesis-using
Repo
Framework
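
One common way to build a multi-input, multi-output synthesis network that tolerates missing inputs is to encode each sequence separately, fuse only the available latents, and decode all target sequences from the fused code. The sketch below shows that general pattern as a guess at the shape of such a design; the fusion rule, layer sizes, and the 2D toy setting are all assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiSequenceSynthesizer(nn.Module):
    """Hedged sketch of a multi-input, multi-output synthesis network:
    one encoder per pulse sequence, fusion by averaging whichever latents
    are available, one decoder per target sequence."""
    def __init__(self, n_seq=4, ch=32):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Conv2d(1, ch, 3, padding=1) for _ in range(n_seq)])
        self.decoders = nn.ModuleList(
            [nn.Conv2d(ch, 1, 3, padding=1) for _ in range(n_seq)])

    def forward(self, images, available):
        # images: list of (B,1,H,W) tensors or None; available: list of bools
        latents = [enc(x) for enc, x, ok in
                   zip(self.encoders, images, available) if ok]
        fused = torch.stack(latents).mean(dim=0)      # fuse available latents
        return [dec(fused) for dec in self.decoders]  # synthesize all outputs

net = MultiSequenceSynthesizer()
x = torch.randn(2, 1, 64, 64)
outs = net([x, x, None, x], available=[True, True, False, True])
```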

DisSim: A Discourse-Aware Syntactic Text Simplification Framework for English and German

Title DisSim: A Discourse-Aware Syntactic Text Simplification Framework for English and German
Authors Christina Niklaus, Matthias Cetto, Andre Freitas, Siegfried Handschuh
Abstract We introduce DisSim, a discourse-aware sentence splitting framework for English and German whose goal is to transform syntactically complex sentences into an intermediate representation that presents a simpler and more regular structure that is easier to process for downstream semantic applications. For this purpose, we turn input sentences into a two-layered semantic hierarchy in the form of core facts and accompanying contexts, while identifying the rhetorical relations that hold between them. In that way, we preserve the coherence structure of the input and, hence, its interpretability for downstream tasks.
Tasks Text Simplification
Published 2019-09-26
URL https://arxiv.org/abs/1909.12140v1
PDF https://arxiv.org/pdf/1909.12140v1.pdf
PWC https://paperswithcode.com/paper/dissim-a-discourse-aware-syntactic-text
Repo
Framework

A Multimodal CNN-based Tool to Censure Inappropriate Video Scenes

Title A Multimodal CNN-based Tool to Censure Inappropriate Video Scenes
Authors Pedro V. A. de Freitas, Paulo R. C. Mendes, Gabriel N. P. dos Santos, Antonio José G. Busson, Álan Livio Guedes, Sérgio Colcher, Ruy Luiz Milidiú
Abstract Due to the extensive use of video-sharing platforms and services for their storage, the amount of such media on the internet has become massive. This volume of data makes it difficult to control the kind of content that may be present in such video files. One of the main concerns regarding video content is whether it has an inappropriate subject matter, such as nudity, violence, or other potentially disturbing content. Beyond telling whether a video is appropriate or inappropriate, it is also important to identify which parts of it contain such content, so that parts that would otherwise be discarded in a simple broad analysis can be preserved. In this work, we present a multimodal (using audio and image features) architecture based on Convolutional Neural Networks (CNNs) for detecting inappropriate scenes in video files. In the task of classifying video files, our model achieved F1-scores of 98.95% and 98.94% for the appropriate and inappropriate classes, respectively. We also present a censoring tool that automatically censors inappropriate segments of a video file.
Tasks
Published 2019-11-10
URL https://arxiv.org/abs/1911.03974v1
PDF https://arxiv.org/pdf/1911.03974v1.pdf
PWC https://paperswithcode.com/paper/a-multimodal-cnn-based-tool-to-censure
Repo
Framework

Graph DNA: Deep Neighborhood Aware Graph Encoding for Collaborative Filtering

Title Graph DNA: Deep Neighborhood Aware Graph Encoding for Collaborative Filtering
Authors Liwei Wu, Hsiang-Fu Yu, Nikhil Rao, James Sharpnack, Cho-Jui Hsieh
Abstract In this paper, we consider recommender systems with side information in the form of graphs. Existing collaborative filtering algorithms mainly utilize only immediate neighborhood information and have a hard time taking advantage of deeper neighborhoods beyond 1-2 hops. The main caveat of exploiting deeper graph information is the rapidly growing time and space complexity when incorporating information from these neighborhoods. In this paper, we propose Graph DNA, a novel Deep Neighborhood Aware graph encoding algorithm, for exploiting deeper neighborhood information. DNA encoding computes approximate deep neighborhood information in linear time using Bloom filters, a space-efficient probabilistic data structure, and results in a per-node encoding that is logarithmic in the number of nodes in the graph. It can be used in conjunction with both feature-based and graph-regularization-based collaborative filtering algorithms. Graph DNA has the advantages of being memory and time efficient and providing additional regularization when compared to directly using higher-order graph information. We conduct experiments on real-world datasets, showing that Graph DNA can be easily used with four popular collaborative filtering algorithms and consistently leads to a performance boost with little computational and memory overhead.
Tasks Recommendation Systems
Published 2019-05-29
URL https://arxiv.org/abs/1905.12217v1
PDF https://arxiv.org/pdf/1905.12217v1.pdf
PWC https://paperswithcode.com/paper/graph-dna-deep-neighborhood-aware-graph
Repo
Framework
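
The Bloom-filter encoding is easy to picture: seed each node's filter with its own id, then repeatedly OR in the neighbors' filters, so that after c rounds filter i approximately encodes node i's c-hop neighborhood. A minimal sketch with illustrative parameters (the hash construction and all sizes are assumptions, not the paper's settings):

```python
import hashlib
import numpy as np

def bloom_hash(item, num_bits, num_hashes):
    """k hash positions for an item via salted SHA-1 (illustrative choice)."""
    return [int(hashlib.sha1(f"{salt}:{item}".encode()).hexdigest(), 16)
            % num_bits for salt in range(num_hashes)]

def graph_dna(adj, num_bits=256, num_hashes=3, hops=2):
    """Sketch of Bloom-filter neighborhood encoding: initialize each node's
    filter with its own id, then propagate by OR-ing neighbors' filters
    once per hop."""
    n = len(adj)
    filters = np.zeros((n, num_bits), dtype=bool)
    for i in range(n):
        filters[i, bloom_hash(i, num_bits, num_hashes)] = True
    for _ in range(hops):
        new = filters.copy()
        for i in range(n):
            for j in adj[i]:
                new[i] |= filters[j]          # bitwise OR merges neighborhoods
        filters = new
    return filters

adj = {0: [1], 1: [0, 2], 2: [1]}             # tiny path graph 0-1-2
dna = graph_dna(adj)
```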

Self-Supervised Audio-Visual Co-Segmentation

Title Self-Supervised Audio-Visual Co-Segmentation
Authors Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh McDermott, Antonio Torralba
Abstract Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data. In this paper we develop a neural network model for visual object segmentation and sound source separation that learns from natural videos through self-supervision. The model is an extension of recently proposed work that maps image pixels to sounds. Here, we introduce a learning approach to disentangle concepts in the neural networks, and assign semantic categories to network feature channels to enable independent image segmentation and sound source separation after audio-visual training on videos. Our evaluations show that the disentangled model outperforms several baselines in semantic segmentation and sound source separation.
Tasks Semantic Segmentation
Published 2019-04-18
URL http://arxiv.org/abs/1904.09013v1
PDF http://arxiv.org/pdf/1904.09013v1.pdf
PWC https://paperswithcode.com/paper/self-supervised-audio-visual-co-segmentation
Repo
Framework