Paper Group ANR 1333
Contextual Online False Discovery Rate Control
Title | Contextual Online False Discovery Rate Control |
Authors | Shiyun Chen, Shiva Kasiviswanathan |
Abstract | Multiple hypothesis testing, a situation in which we wish to consider many hypotheses, is a core problem in statistical inference that arises in almost every scientific field. In this setting, controlling the false discovery rate (FDR), which is the expected proportion of type I errors, is an important challenge for making meaningful inferences. In this paper, we consider the problem of controlling FDR in an online manner. Concretely, we consider an ordered, possibly infinite, sequence of hypotheses, arriving one at each timestep, and for each hypothesis we observe a p-value along with a set of features specific to that hypothesis. The decision whether or not to reject the current hypothesis must be made immediately at each timestep, before the next hypothesis is observed. The multi-dimensional feature set provides a very general way of leveraging the auxiliary information in the data, which helps in maximizing the number of discoveries. We propose a new class of powerful online testing procedures, where the rejection thresholds (significance levels) are learnt sequentially by incorporating contextual information and previous results. We prove that any rule in this class controls online FDR under some standard assumptions. We then focus on a subclass of these procedures, based on weighting significance levels, to derive a practical algorithm that learns a parametric weight function in an online fashion to gain more discoveries. We also prove theoretically, in a stylized setting, that our proposed procedures lead to an increase in the achieved statistical power over a popular online testing procedure proposed by Javanmard & Montanari (2018). Finally, we demonstrate the favorable performance of our procedure by comparing it to state-of-the-art online multiple testing procedures on both synthetic data and real data generated from different applications. |
Tasks | |
Published | 2019-02-07 |
URL | http://arxiv.org/abs/1902.02885v2 |
http://arxiv.org/pdf/1902.02885v2.pdf | |
PWC | https://paperswithcode.com/paper/contextual-online-false-discovery-rate |
Repo | |
Framework | |
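To make the weighting idea concrete, the following is a minimal, illustrative sketch of a generic online testing loop in which a contextual weight rescales a LORD-style decaying spending sequence. The weight function `weight_fn` and the spending sequence are assumptions for illustration; this is not the authors' exact FDR-controlling procedure, which learns the weight function online and states its guarantees under specific assumptions.

```python
import numpy as np

def contextual_online_testing(p_values, features, alpha=0.05, weight_fn=None):
    """Illustrative online multiple-testing loop with contextual weights.

    At step t the hypothesis receives significance level
    alpha_t = alpha * gamma_t * w(x_t), where gamma_t is a fixed decaying
    spending sequence (summing to less than 1) and w(x_t) is a bounded,
    user-supplied contextual weight; H_t is rejected iff p_t <= alpha_t.
    A valid procedure additionally needs the weights normalized so that
    FDR control is preserved -- omitted here for brevity.
    """
    T = len(p_values)
    gamma = np.array([1.0 / (t * (t + 1)) for t in range(1, T + 1)])  # sums to < 1
    decisions = []
    for t in range(T):
        w = 1.0 if weight_fn is None else weight_fn(features[t])
        decisions.append(p_values[t] <= alpha * gamma[t] * w)
    return decisions

# Usage: hypotheses whose feature suggests signal get a (hypothetical) larger weight.
rng = np.random.default_rng(0)
p_vals, feats = rng.uniform(size=100), rng.uniform(size=100)
print(sum(contextual_online_testing(p_vals, feats, weight_fn=lambda x: 0.5 + x)))
```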
Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning
Title | Volumetric Capture of Humans with a Single RGBD Camera via Semi-Parametric Learning |
Authors | Rohit Pandey, Anastasia Tkach, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Ricardo Martin-Brualla, Andrea Tagliasacchi, George Papandreou, Philip Davidson, Cem Keskin, Shahram Izadi, Sean Fanello |
Abstract | Volumetric (4D) performance capture is fundamental for AR/VR content generation. Whereas previous work in 4D performance capture has shown impressive results in studio settings, the technology is still far from being accessible to a typical consumer who, at best, might own a single RGBD sensor. Thus, in this work, we propose a method to synthesize free viewpoint renderings using a single RGBD camera. The key insight is to leverage previously seen “calibration” images of a given user to extrapolate what should be rendered in a novel viewpoint from the data available in the sensor. Given these past observations from multiple viewpoints, and the current RGBD image from a fixed view, we propose an end-to-end framework that fuses both these data sources to generate novel renderings of the performer. We demonstrate that the method can produce high fidelity images, and handle extreme changes in subject pose and camera viewpoints. We also show that the system generalizes to performers not seen in the training data. We run exhaustive experiments demonstrating the effectiveness of the proposed semi-parametric model (i.e. calibration images available to the neural network) compared to other state of the art machine learned solutions. Further, we compare the method with more traditional pipelines that employ multi-view capture. We show that our framework is able to achieve compelling results, with substantially less infrastructure than previously required. |
Tasks | Calibration |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12162v1 |
https://arxiv.org/pdf/1905.12162v1.pdf | |
PWC | https://paperswithcode.com/paper/volumetric-capture-of-humans-with-a-single-1 |
Repo | |
Framework | |
Directing DNNs Attention for Facial Attribution Classification using Gradient-weighted Class Activation Mapping
Title | Directing DNNs Attention for Facial Attribution Classification using Gradient-weighted Class Activation Mapping |
Authors | Xi Yang, Bojian Wu, Issei Sato, Takeo Igarashi |
Abstract | Deep neural networks (DNNs) achieve high accuracy on image classification tasks. However, DNNs trained on datasets with co-occurrence bias may rely on the wrong features when making classification decisions, which greatly affects the transferability of pre-trained DNNs. In this paper, we propose an interactive method to direct classifiers to pay attention to regions manually specified by users, in order to mitigate the influence of co-occurrence bias. We test on the CelebA dataset, where a pre-trained AlexNet is fine-tuned to focus on specific facial attributes based on the results of Grad-CAM. |
Tasks | Image Classification |
Published | 2019-05-02 |
URL | http://arxiv.org/abs/1905.00593v1 |
http://arxiv.org/pdf/1905.00593v1.pdf | |
PWC | https://paperswithcode.com/paper/directing-dnns-attention-for-facial |
Repo | |
Framework | |
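Grad-CAM itself is a standard technique, so a short sketch may help readers unfamiliar with it: the class score is back-propagated to the last convolutional layer, the gradients are spatially averaged to weight the feature maps, and the ReLU of the weighted sum gives the attention heatmap. The PyTorch code below is a generic Grad-CAM sketch with an AlexNet layer chosen for illustration; it does not include the interactive fine-tuning step described in the abstract.

```python
import torch
import torch.nn.functional as F
from torchvision import models

def grad_cam(model, image, target_class, conv_layer):
    """Compute a Grad-CAM heatmap for `target_class` (illustrative sketch)."""
    activations, gradients = {}, {}
    h1 = conv_layer.register_forward_hook(lambda m, i, o: activations.update(value=o))
    h2 = conv_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(value=go[0]))

    model.zero_grad()
    scores = model(image.unsqueeze(0))            # (1, num_classes)
    scores[0, target_class].backward()
    h1.remove()
    h2.remove()

    acts, grads = activations["value"], gradients["value"]       # (1, C, H, W)
    weights = grads.mean(dim=(2, 3), keepdim=True)               # pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))      # weighted sum of maps
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear", align_corners=False)
    return ((cam - cam.min()) / (cam.max() - cam.min() + 1e-8)).squeeze()

# Usage: last conv layer of AlexNet (features[10]); weights and input here are dummies.
model = models.alexnet(weights=None).eval()
heatmap = grad_cam(model, torch.rand(3, 224, 224), target_class=0,
                   conv_layer=model.features[10])
```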
A Useful Taxonomy for Adversarial Robustness of Neural Networks
Title | A Useful Taxonomy for Adversarial Robustness of Neural Networks |
Authors | Leslie N. Smith |
Abstract | Adversarial attacks and defenses are currently active areas of research for the deep learning community. A recent review paper divided the defense approaches into three categories: gradient masking, robust optimization, and adversarial example detection. We divide gradient masking and robust optimization differently: (1) increasing intra-class compactness and inter-class separation of the feature vectors improves adversarial robustness, and (2) marginalization or removal of non-robust image features also improves adversarial robustness. By reframing these topics differently, we provide a fresh perspective that offers insight into the underlying factors that enable training more robust networks and can help inspire novel solutions. In addition, several papers in the adversarial defense literature claim that there is a cost to adversarial robustness, or a trade-off between robustness and accuracy, but under the proposed taxonomy we hypothesize that this is not universal. We follow up on our taxonomy with several challenges to the deep learning research community that build on the connections and insights in this paper. |
Tasks | |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10679v1 |
https://arxiv.org/pdf/1910.10679v1.pdf | |
PWC | https://paperswithcode.com/paper/a-useful-taxonomy-for-adversarial-robustness |
Repo | |
Framework | |
M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention
Title | M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention |
Authors | Shuang Ma, Daniel McDuff, Yale Song |
Abstract | Generative adversarial networks have led to significant advances in cross-modal/domain translation. However, typically these networks are designed for a specific task (e.g., dialogue generation or image synthesis, but not both). We present a unified model, M3D-GAN, that can translate across a wide range of modalities (e.g., text, image, and speech) and domains (e.g., attributes in images or emotions in speech). Our model consists of modality subnets that convert data from different modalities into unified representations, and a unified computing body where data from different modalities share the same network architecture. We introduce a universal attention module that is jointly trained with the whole network and learns to encode a large range of domain information into a highly structured latent space. We use this to control synthesis in novel ways, such as producing diverse realistic pictures from a sketch or varying the emotion of synthesized speech. We evaluate our approach on extensive benchmark tasks, including image-to-image, text-to-image, image captioning, text-to-speech, speech recognition, and machine translation. Our results show state-of-the-art performance on some of the tasks. |
Tasks | Dialogue Generation, Image Captioning, Image Generation, Machine Translation, Speech Recognition |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04378v1 |
https://arxiv.org/pdf/1907.04378v1.pdf | |
PWC | https://paperswithcode.com/paper/m3d-gan-multi-modal-multi-domain-translation |
Repo | |
Framework | |
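The abstract describes the universal attention module only at a high level. As context for readers, the snippet below is a generic scaled dot-product attention over a trainable latent dictionary, one common way to encode domain information into a structured latent space; the class name, sizes, and overall design are assumptions for illustration and are not the paper's architecture.

```python
import torch
import torch.nn as nn

class LatentDictionaryAttention(nn.Module):
    """Generic attention over a learned latent dictionary (illustrative only)."""

    def __init__(self, query_dim, latent_dim=256, num_latents=64):
        super().__init__()
        self.dictionary = nn.Parameter(torch.randn(num_latents, latent_dim))
        self.to_query = nn.Linear(query_dim, latent_dim)

    def forward(self, x):                                   # x: (batch, query_dim)
        q = self.to_query(x)                                # (batch, latent_dim)
        attn = torch.softmax(q @ self.dictionary.t() / q.shape[-1] ** 0.5, dim=-1)
        return attn @ self.dictionary                       # soft mixture of latents

# Usage: representations from any modality subnet can query the same dictionary.
module = LatentDictionaryAttention(query_dim=512)
latent = module(torch.randn(8, 512))                        # (8, 256)
```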
Spiking Neural Network on Neuromorphic Hardware for Energy-Efficient Unidimensional SLAM
Title | Spiking Neural Network on Neuromorphic Hardware for Energy-Efficient Unidimensional SLAM |
Authors | Guangzhi Tang, Arpit Shah, Konstantinos P. Michmizos |
Abstract | Energy-efficient simultaneous localization and mapping (SLAM) is crucial for mobile robots exploring unknown environments. The mammalian brain solves SLAM via a network of specialized neurons, exhibiting asynchronous computations and event-based communications, with very low energy consumption. We propose a brain-inspired spiking neural network (SNN) architecture that solves unidimensional SLAM by introducing spike-based reference frame transformation, visual likelihood computation, and Bayesian inference. We integrated our neuromorphic algorithm into Intel’s Loihi neuromorphic processor, a non-von Neumann hardware that mimics the brain’s computing paradigms. We performed comparative analyses of accuracy and energy efficiency between our neuromorphic approach and the GMapping algorithm, which is widely used in small environments. Our Loihi-based SNN architecture consumes 100 times less energy than GMapping run on a CPU, while having comparable accuracy in head-direction localization and map generation. These results pave the way for scaling our approach towards alternative active-SLAM solutions for Loihi-controlled autonomous robots. |
Tasks | Bayesian Inference, Simultaneous Localization and Mapping |
Published | 2019-03-06 |
URL | https://arxiv.org/abs/1903.02504v2 |
https://arxiv.org/pdf/1903.02504v2.pdf | |
PWC | https://paperswithcode.com/paper/spiking-neural-network-on-neuromorphic |
Repo | |
Framework | |
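The Bayesian inference that the spiking network implements can be viewed, in conventional (non-spiking) terms, as a discrete Bayes filter over a one-dimensional state: a prediction step that shifts and diffuses the belief according to the motion, followed by a likelihood-weighted measurement update. The sketch below shows that filter on a toy circular track; the motion and observation models are assumptions for illustration, and nothing here reflects the Loihi implementation.

```python
import numpy as np

def bayes_filter_1d(belief, motion_noise, likelihood):
    """One predict/update step of a discrete 1-D Bayes filter (illustrative)."""
    predicted = np.convolve(belief, motion_noise, mode="same")   # diffuse motion noise
    posterior = predicted * likelihood                           # measurement update
    return posterior / posterior.sum()                           # normalize

# Usage: 100 cells on a circular track, robot moves one cell right per step.
n = 100
belief = np.full(n, 1.0 / n)                     # start with no position knowledge
motion_noise = np.array([0.1, 0.8, 0.1])         # small chance of over/under-shooting
for step in range(20):
    moved = np.roll(belief, 1)                                     # deterministic motion
    obs = np.exp(-0.5 * ((np.arange(n) - (step + 1)) / 2.0) ** 2)  # Gaussian likelihood
    belief = bayes_filter_1d(moved, motion_noise, obs)
print(np.argmax(belief))                         # close to cell 20
```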
Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding
Title | Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding |
Authors | Muhammad Osama, Dave Zachariah, Thomas B. Schön |
Abstract | We address the problem of inferring the causal effect of an exposure on an outcome across space, using observational data. The data is possibly subject to unmeasured confounding variables which, in a standard approach, must be adjusted for by estimating a nuisance function. Here we develop a method that eliminates the nuisance function, while mitigating the resulting errors-in-variables. The result is a robust and accurate inference method for spatially varying heterogeneous causal effects. The properties of the method are demonstrated on synthetic as well as real data from Germany and the US. |
Tasks | |
Published | 2019-01-28 |
URL | https://arxiv.org/abs/1901.09919v2 |
https://arxiv.org/pdf/1901.09919v2.pdf | |
PWC | https://paperswithcode.com/paper/inferring-heterogeneous-causal-effects-in |
Repo | |
Framework | |
TSK-Streams: Learning TSK Fuzzy Systems on Data Streams
Title | TSK-Streams: Learning TSK Fuzzy Systems on Data Streams |
Authors | Ammar Shaker, Eyke Hüllermeier |
Abstract | The problem of adaptive learning from evolving and possibly non-stationary data streams has attracted a lot of interest in machine learning in the recent past, and has also stimulated research in related fields, such as computational intelligence and fuzzy systems. In particular, several rule-based methods for the incremental induction of regression models have been proposed. In this paper, we develop a method that combines the strengths of two existing approaches rooted in different learning paradigms. More concretely, our method adopts basic principles of the state-of-the-art learning algorithm AMRules and enriches them with the representational advantages of fuzzy rules. In a comprehensive experimental study, TSK-Streams is shown to be highly competitive in terms of performance. |
Tasks | |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03951v1 |
https://arxiv.org/pdf/1911.03951v1.pdf | |
PWC | https://paperswithcode.com/paper/tsk-streams-learning-tsk-fuzzy-systems-on |
Repo | |
Framework | |
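For readers unfamiliar with the representation, a first-order Takagi-Sugeno-Kang (TSK) system predicts with a firing-strength-weighted average of per-rule linear models, where firing strengths come from fuzzy membership functions over the inputs. The sketch below shows this inference step with Gaussian memberships; it illustrates the model class only, not the incremental rule-induction algorithm of TSK-Streams, and all parameter values are made up.

```python
import numpy as np

def tsk_predict(x, centers, widths, coeffs, intercepts):
    """First-order TSK inference for a single input vector x (illustrative sketch).

    centers, widths: (n_rules, n_features) Gaussian membership parameters
    coeffs:          (n_rules, n_features) linear consequent weights
    intercepts:      (n_rules,)            consequent offsets
    """
    memberships = np.exp(-0.5 * ((x - centers) / widths) ** 2)   # per-feature memberships
    firing = memberships.prod(axis=1)                            # rule firing strengths
    consequents = coeffs @ x + intercepts                        # per-rule linear outputs
    return np.dot(firing, consequents) / (firing.sum() + 1e-12)  # weighted average

# Usage: two rules over a 2-D input.
x = np.array([0.3, 0.7])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.full((2, 2), 0.5)
coeffs = np.array([[1.0, -1.0], [0.5, 0.5]])
intercepts = np.array([0.0, 1.0])
print(tsk_predict(x, centers, widths, coeffs, intercepts))
```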
Conditional GANs For Painting Generation
Title | Conditional GANs For Painting Generation |
Authors | Adeel Mufti, Biagio Antonelli, Julius Monello |
Abstract | We examined the use of modern Generative Adversarial Nets to generate novel images of oil paintings using the Painter By Numbers dataset. We implemented Spectral Normalization GAN (SN-GAN) and Spectral Normalization GAN with Gradient Penalty, and compared their outputs to a Deep Convolutional GAN. Visually, and quantitatively according to the Sliced Wasserstein Distance metric, we determined that the SN-GAN produced paintings that were most comparable to our training dataset. We then performed a series of experiments to add supervised conditioning to SN-GAN, the culmination of which is what we believe to be a novel architecture that can generate face paintings with user-specified characteristics. |
Tasks | |
Published | 2019-03-06 |
URL | http://arxiv.org/abs/1903.06259v1 |
http://arxiv.org/pdf/1903.06259v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-gans-for-painting-generation |
Repo | |
Framework | |
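Spectral normalization, the key ingredient of SN-GAN, is available out of the box in PyTorch as `torch.nn.utils.spectral_norm`, which rescales each wrapped layer's weight by an estimate of its largest singular value at every forward pass. The discriminator below is a minimal sketch showing how the wrapper is applied; the architecture, image size, and channel counts are placeholders rather than the configuration used in the paper.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SNDiscriminator(nn.Module):
    """Small DCGAN-style discriminator with spectral normalization (sketch)."""

    def __init__(self, channels=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv2d(channels, base, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.1),
            spectral_norm(nn.Conv2d(base, base * 2, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.1),
            spectral_norm(nn.Conv2d(base * 2, base * 4, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.1),
            spectral_norm(nn.Conv2d(base * 4, 1, 4, stride=1, padding=0)),
        )

    def forward(self, x):                        # x: (batch, 3, 32, 32)
        return self.net(x).view(x.size(0), -1)   # real/fake logit per image

# Usage: a batch of 32x32 "paintings".
d = SNDiscriminator()
print(d(torch.randn(8, 3, 32, 32)).shape)        # torch.Size([8, 1])
```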
Machine Learning on Biomedical Images: Interactive Learning, Transfer Learning, Class Imbalance, and Beyond
Title | Machine Learning on Biomedical Images: Interactive Learning, Transfer Learning, Class Imbalance, and Beyond |
Authors | Naimul Mefraz Khan, Nabila Abraham, Ling Guan |
Abstract | In this paper, we highlight three issues that limit the performance of machine learning on biomedical images, and tackle them through three case studies: 1) interactive machine learning (IML): we show how IML can drastically improve exploration time and quality of direct volume rendering; 2) transfer learning: we show how transfer learning along with intelligent pre-processing can result in better Alzheimer’s diagnosis using a much smaller training set; 3) data imbalance: we show how our novel focal Tversky loss function can provide better segmentation results by taking into account the imbalanced nature of segmentation datasets. The case studies are accompanied by an in-depth analytical discussion of results with possible future directions. |
Tasks | Transfer Learning |
Published | 2019-02-13 |
URL | http://arxiv.org/abs/1902.05908v1 |
http://arxiv.org/pdf/1902.05908v1.pdf | |
PWC | https://paperswithcode.com/paper/machine-learning-on-biomedical-images |
Repo | |
Framework | |
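The focal Tversky loss mentioned in the third case study generalizes the Dice loss: false negatives and false positives are weighted separately (alpha and beta), and the complement of the Tversky index is raised to a focal exponent so that hard, small-region examples contribute more. A minimal PyTorch sketch follows; the default parameter values are commonly used ones and are not necessarily those of the paper.

```python
import torch

def focal_tversky_loss(pred, target, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss for binary segmentation (illustrative sketch).

    pred:   predicted foreground probabilities, shape (batch, H, W)
    target: binary ground-truth masks,          shape (batch, H, W)
    """
    pred = pred.reshape(pred.size(0), -1)
    target = target.reshape(target.size(0), -1).float()
    tp = (pred * target).sum(dim=1)                # soft true positives
    fn = ((1 - pred) * target).sum(dim=1)          # soft false negatives
    fp = (pred * (1 - target)).sum(dim=1)          # soft false positives
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return ((1 - tversky) ** gamma).mean()

# Usage with dummy predictions and masks.
pred = torch.sigmoid(torch.randn(4, 128, 128))
mask = torch.rand(4, 128, 128) > 0.5
print(focal_tversky_loss(pred, mask))
```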
Missing MRI Pulse Sequence Synthesis using Multi-Modal Generative Adversarial Network
Title | Missing MRI Pulse Sequence Synthesis using Multi-Modal Generative Adversarial Network |
Authors | Anmol Sharma, Ghassan Hamarneh |
Abstract | Magnetic resonance imaging (MRI) is being increasingly utilized to assess, diagnose, and plan treatment for a variety of diseases. The ability to visualize tissue in varied contrasts, in the form of MR pulse sequences acquired in a single scan, provides valuable insights to physicians, as well as enabling automated systems performing downstream analysis. However, many issues, such as prohibitive scan time, image corruption, different acquisition protocols, or allergies to certain contrast materials, may hinder the process of acquiring multiple sequences for a patient. This poses challenges to both physicians and automated systems, since the complementary information provided by the missing sequences is lost. In this paper, we propose a variant of the generative adversarial network (GAN) capable of leveraging redundant information contained within multiple available sequences in order to generate one or more missing sequences for a patient scan. The proposed network is designed as a multi-input, multi-output network that combines information from all the available pulse sequences, implicitly infers which sequences are missing, and synthesizes the missing ones in a single forward pass. We demonstrate and validate our method on two brain MRI datasets, each with four sequences, and show the applicability of the proposed method in simultaneously synthesizing all missing sequences in any possible scenario where one, two, or three of the four sequences may be missing. We compare our approach with competing unimodal and multi-modal methods, and show that we outperform both quantitatively and qualitatively. |
Tasks | |
Published | 2019-04-27 |
URL | https://arxiv.org/abs/1904.12200v3 |
https://arxiv.org/pdf/1904.12200v3.pdf | |
PWC | https://paperswithcode.com/paper/missing-mri-pulse-sequence-synthesis-using |
Repo | |
Framework | |
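As a rough illustration of the multi-input, multi-output idea, in which each available pulse sequence is encoded, the encodings are fused, and every sequence (including the missing ones) is decoded from the fused code, consider the sketch below. The masked-average fusion, layer sizes, and class name are assumptions for illustration; the paper's actual architecture and adversarial training objective are not reproduced here.

```python
import torch
import torch.nn as nn

class MultiSequenceSynthesizer(nn.Module):
    """Multi-input, multi-output synthesis sketch for missing MRI sequences."""

    def __init__(self, n_sequences=4, feat=32):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(1, feat, 3, padding=1), nn.ReLU())
             for _ in range(n_sequences)]
        )
        self.decoders = nn.ModuleList(
            [nn.Conv2d(feat, 1, 3, padding=1) for _ in range(n_sequences)]
        )

    def forward(self, sequences, available):
        # sequences: (batch, n_sequences, H, W); available: (n_sequences,) bool mask
        codes = [enc(sequences[:, i:i + 1]) for i, enc in enumerate(self.encoders)]
        mask = available.float().view(1, -1, 1, 1, 1)
        fused = (torch.stack(codes, dim=1) * mask).sum(dim=1) / available.sum()
        return torch.cat([dec(fused) for dec in self.decoders], dim=1)

# Usage: four sequences, with the fourth missing (zero-filled) in the input.
model = MultiSequenceSynthesizer()
scans = torch.randn(2, 4, 64, 64)
out = model(scans, available=torch.tensor([True, True, True, False]))
print(out.shape)                                 # torch.Size([2, 4, 64, 64])
```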
DisSim: A Discourse-Aware Syntactic Text Simplification Framework for English and German
Title | DisSim: A Discourse-Aware Syntactic Text Simplification Framework for English and German |
Authors | Christina Niklaus, Matthias Cetto, Andre Freitas, Siegfried Handschuh |
Abstract | We introduce DisSim, a discourse-aware sentence splitting framework for English and German whose goal is to transform syntactically complex sentences into an intermediate representation that presents a simple and more regular structure which is easier to process for downstream semantic applications. For this purpose, we turn input sentences into a two-layered semantic hierarchy in the form of core facts and accompanying contexts, while identifying the rhetorical relations that hold between them. In that way, we preserve the coherence structure of the input and, hence, its interpretability for downstream tasks. |
Tasks | Text Simplification |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12140v1 |
https://arxiv.org/pdf/1909.12140v1.pdf | |
PWC | https://paperswithcode.com/paper/dissim-a-discourse-aware-syntactic-text |
Repo | |
Framework | |
A Multimodal CNN-based Tool to Censure Inappropriate Video Scenes
Title | A Multimodal CNN-based Tool to Censure Inappropriate Video Scenes |
Authors | Pedro V. A. de Freitas, Paulo R. C. Mendes, Gabriel N. P. dos Santos, Antonio José G. Busson, Álan Livio Guedes, Sérgio Colcher, Ruy Luiz Milidiú |
Abstract | Due to the extensive use of video-sharing platforms and services for their storage, the amount of such media on the internet has become massive. This volume of data makes it difficult to control the kind of content that may be present in such video files. One of the main concerns regarding video content is whether it has inappropriate subject matter, such as nudity, violence, or other potentially disturbing content. Beyond telling whether a video is appropriate or inappropriate, it is also important to identify which parts of it contain such content, in order to preserve parts that would otherwise be discarded in a simple broad analysis. In this work, we present a multimodal (using audio and image features) architecture based on Convolutional Neural Networks (CNNs) for detecting inappropriate scenes in video files. In the task of classifying video files, our model achieved an F1-score of 98.95% for the appropriate class and 98.94% for the inappropriate class. We also present a censoring tool that automatically censors inappropriate segments of a video file. |
Tasks | |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.03974v1 |
https://arxiv.org/pdf/1911.03974v1.pdf | |
PWC | https://paperswithcode.com/paper/a-multimodal-cnn-based-tool-to-censure |
Repo | |
Framework | |
Graph DNA: Deep Neighborhood Aware Graph Encoding for Collaborative Filtering
Title | Graph DNA: Deep Neighborhood Aware Graph Encoding for Collaborative Filtering |
Authors | Liwei Wu, Hsiang-Fu Yu, Nikhil Rao, James Sharpnack, Cho-Jui Hsieh |
Abstract | In this paper, we consider recommender systems with side information in the form of graphs. Existing collaborative filtering algorithms mainly utilize only immediate neighborhood information and have a hard time taking advantage of deeper neighborhoods beyond 1-2 hops. The main caveat of exploiting deeper graph information is the rapidly growing time and space complexity of incorporating information from these neighborhoods. In this paper, we propose Graph DNA, a novel Deep Neighborhood Aware graph encoding algorithm, for exploiting deeper neighborhood information. DNA encoding computes approximate deep neighborhood information in linear time using Bloom filters, a space-efficient probabilistic data structure, and results in a per-node encoding that is logarithmic in the number of nodes in the graph. It can be used in conjunction with both feature-based and graph-regularization-based collaborative filtering algorithms. Graph DNA has the advantages of being memory- and time-efficient and of providing additional regularization when compared to directly using higher-order graph information. We conduct experiments on real-world datasets, showing that Graph DNA can be easily used with 4 popular collaborative filtering algorithms and consistently leads to a performance boost with little computational and memory overhead. |
Tasks | Recommendation Systems |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12217v1 |
https://arxiv.org/pdf/1905.12217v1.pdf | |
PWC | https://paperswithcode.com/paper/graph-dna-deep-neighborhood-aware-graph |
Repo | |
Framework | |
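The central trick is that an (approximate) multi-hop neighborhood can be stored as a Bloom filter, so deeper neighborhoods are accumulated by OR-ing fixed-size bit arrays rather than materializing growing neighbor sets. The sketch below shows this propagation on a toy graph; the hash scheme and filter size are illustrative, and the integration with collaborative filtering models is omitted.

```python
import hashlib
import numpy as np

class BloomFilter:
    """Tiny Bloom filter over hashable node ids (illustrative sketch)."""

    def __init__(self, n_bits=256, n_hashes=3):
        self.bits = np.zeros(n_bits, dtype=bool)
        self.n_bits, self.n_hashes = n_bits, n_hashes

    def _positions(self, item):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

def graph_dna_encode(adjacency, depth=2, n_bits=256):
    """Approximate d-hop neighborhood encodings via Bloom-filter propagation."""
    filters = {v: BloomFilter(n_bits) for v in adjacency}
    for v in adjacency:
        filters[v].add(v)
    for _ in range(depth):
        updated = {}
        for v, neighbors in adjacency.items():
            bf = BloomFilter(n_bits)
            bf.bits = filters[v].bits.copy()
            for u in neighbors:                       # union = bitwise OR of filters
                bf.bits |= filters[u].bits
            updated[v] = bf
        filters = updated
    return filters

# Usage: path graph 0-1-2-3; node 0's 2-hop encoding should cover node 2 but not 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
enc = graph_dna_encode(adj, depth=2)
print(enc[0].might_contain(2), enc[0].might_contain(3))   # True False (w.h.p.)
```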
Self-Supervised Audio-Visual Co-Segmentation
Title | Self-Supervised Audio-Visual Co-Segmentation |
Authors | Andrew Rouditchenko, Hang Zhao, Chuang Gan, Josh McDermott, Antonio Torralba |
Abstract | Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data. In this paper we develop a neural network model for visual object segmentation and sound source separation that learns from natural videos through self-supervision. The model is an extension of recently proposed work that maps image pixels to sounds. Here, we introduce a learning approach to disentangle concepts in the neural networks, and assign semantic categories to network feature channels to enable independent image segmentation and sound source separation after audio-visual training on videos. Our evaluations show that the disentangled model outperforms several baselines in semantic segmentation and sound source separation. |
Tasks | Semantic Segmentation |
Published | 2019-04-18 |
URL | http://arxiv.org/abs/1904.09013v1 |
http://arxiv.org/pdf/1904.09013v1.pdf | |
PWC | https://paperswithcode.com/paper/self-supervised-audio-visual-co-segmentation |
Repo | |
Framework | |