Paper Group ANR 187
Affine Variational Autoencoders: An Efficient Approach for Improving Generalization and Robustness to Distribution Shift
Title | Affine Variational Autoencoders: An Efficient Approach for Improving Generalization and Robustness to Distribution Shift |
Authors | Rene Bidart, Alexander Wong |
Abstract | In this study, we propose the Affine Variational Autoencoder (AVAE), a variant of the Variational Autoencoder (VAE) designed to improve robustness by overcoming the inability of VAEs to generalize to distributional shifts in the form of affine perturbations. By optimizing an affine transform to maximize the ELBO, the proposed AVAE maps an input back to the training distribution without the need to increase model complexity to cover the full distribution of affine transforms. In addition, we introduce a training procedure that creates an efficient model by learning a subset of the training distribution and using the AVAE to improve generalization and robustness to distributional shift at test time. Experiments demonstrate that the proposed AVAE significantly improves generalization and robustness to distributional shift in the form of affine perturbations without an increase in model complexity. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05300v1 |
https://arxiv.org/pdf/1905.05300v1.pdf | |
PWC | https://paperswithcode.com/paper/affine-variational-autoencoders-an-efficient |
Repo | |
Framework | |
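The test-time procedure the abstract describes lends itself to a short sketch: freeze a trained VAE and optimize a handful of affine parameters so that the transformed input maximizes the ELBO. Below is a minimal, hypothetical PyTorch version; the `vae.elbo` method and the four-parameter (rotation, log-scale, translation) setup are assumptions, not the authors' exact interface.

```python
# Hypothetical AVAE test-time sketch: optimize affine parameters so the
# transformed input maximizes the ELBO of a trained, frozen VAE.
import torch
import torch.nn.functional as F

def avae_infer(vae, x, steps=50, lr=0.05):
    # params = [angle, log-scale, tx, ty]; an assumed parameterization
    params = torch.zeros(4, requires_grad=True)
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        angle, s = params[0], params[1].exp()
        cos, sin = torch.cos(angle), torch.sin(angle)
        mat = torch.stack([
            torch.stack([s * cos, -s * sin, params[2]]),
            torch.stack([s * sin,  s * cos, params[3]]),
        ]).unsqueeze(0)                          # (1, 2, 3) affine matrix
        grid = F.affine_grid(mat, x.shape, align_corners=False)
        x_t = F.grid_sample(x, grid, align_corners=False)
        loss = -vae.elbo(x_t)                    # maximize ELBO = minimize -ELBO
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_t.detach()
```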
Towards Computational Models and Applications of Insect Visual Systems for Motion Perception: A Review
Title | Towards Computational Models and Applications of Insect Visual Systems for Motion Perception: A Review |
Authors | Qinbing Fu, Hongxin Wang, Cheng Hu, Shigang Yue |
Abstract | Motion perception is a critical capability determining many aspects of an insect’s life, including avoiding predators and foraging. A good number of motion detectors have been identified in insects’ visual pathways. Computational modelling of these motion detectors has not only provided effective solutions for artificial intelligence, but has also benefited the understanding of complicated biological visual systems. These biological mechanisms, refined through millions of years of evolution, form solid modules for constructing dynamic vision systems for future intelligent machines. This article reviews the computational motion perception models originating from biological research on insects’ visual systems. These motion perception models or neural networks comprise the looming-sensitive neuronal models of lobula giant movement detectors (LGMDs) in locusts, the translation-sensitive neural systems of direction selective neurons (DSNs) in fruit flies, bees and locusts, as well as the small target motion detectors (STMDs) in dragonflies and hoverflies. We also review the applications of these models to robots and vehicles. Through these modelling studies, we summarise the methodologies that generate different direction and size selectivity in motion perception. Finally, we discuss multiple-system integration and the hardware realisation of these bio-inspired motion perception models. |
Tasks | |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02048v1 |
http://arxiv.org/pdf/1904.02048v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-computational-models-and-applications |
Repo | |
Framework | |
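As one concrete example of the direction-selective models such reviews cover, a Hassenstein-Reichardt elementary motion detector (EMD) fits in a few lines: each photoreceptor signal is correlated with a delayed copy of its neighbour, and opponent subtraction yields direction selectivity. The signal shapes and the first-order low-pass used as the delay below are illustrative assumptions.

```python
# Minimal Hassenstein-Reichardt EMD: correlate each input with a delayed
# (low-pass filtered) copy of its neighbour, then subtract the mirror term.
import numpy as np

def emd_response(left, right, dt=1e-3, tau=20e-3):
    alpha = dt / (tau + dt)                    # first-order low-pass as delay
    d_left = np.zeros_like(left)
    d_right = np.zeros_like(right)
    for t in range(1, len(left)):
        d_left[t] = d_left[t-1] + alpha * (left[t] - d_left[t-1])
        d_right[t] = d_right[t-1] + alpha * (right[t] - d_right[t-1])
    # opponent subtraction yields direction selectivity
    return d_left * right - d_right * left

# A rightward-moving pattern: the right input is a delayed copy of the left.
t = np.arange(0, 0.5, 1e-3)
left = (np.sin(2 * np.pi * 4 * t) > 0).astype(float)
right = np.roll(left, 25)
print(emd_response(left, right).mean())        # positive => rightward motion
```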
StyleRemix: An Interpretable Representation for Neural Image Style Transfer
Title | StyleRemix: An Interpretable Representation for Neural Image Style Transfer |
Authors | Hongmin Xu, Qiang Li, Wenbo Zhang, Wen Zheng |
Abstract | Multi-Style Transfer (MST) intends to capture the high-level visual vocabulary of different styles and to express these vocabularies in a joint model so as to transfer each specific style. Recently, Style Embedding Learning (SEL) based methods represent each style with an explicit set of parameters to perform the MST task. However, most existing SEL methods either learn an explicit style representation with numerous independent parameters or learn a relatively black-box style representation, which makes it difficult to control the stylized results. In this paper, we outline a novel MST model, StyleRemix, that compactly and explicitly integrates multiple styles into one network. By decomposing diverse styles into the same basis, StyleRemix represents a specific style in a continuous vector space with 1-dimensional coefficients. With this interpretable style representation, StyleRemix not only enables the style visualization task but also allows several ways of remixing styles in the smooth style embedding space. Extensive experiments demonstrate the effectiveness of StyleRemix on various MST tasks compared to state-of-the-art SEL approaches. |
Tasks | Style Transfer |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.10425v3 |
http://arxiv.org/pdf/1902.10425v3.pdf | |
PWC | https://paperswithcode.com/paper/styleremix-an-interpretable-representation |
Repo | |
Framework | |
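The decomposition the abstract describes — styles as coefficient vectors over a shared basis — is easy to illustrate, and remixing then becomes interpolation in coefficient space. Names and dimensions in this sketch are assumptions, not the paper's.

```python
# Illustrative sketch: each style is a 1-D coefficient vector over a shared
# style basis, so remixing two styles is interpolation of coefficients.
import torch

n_styles, n_basis, dim = 10, 8, 64
basis = torch.randn(n_basis, dim)            # shared style basis
coeffs = torch.randn(n_styles, n_basis)      # per-style coefficients

def style_code(i):
    return coeffs[i] @ basis                 # style i as a basis combination

def remix(i, j, alpha=0.5):
    # interpolate coefficients, not raw parameters, to stay on the basis
    mixed = alpha * coeffs[i] + (1 - alpha) * coeffs[j]
    return mixed @ basis

print(style_code(0).shape, remix(0, 1).shape)
```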
Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion
Title | Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion |
Authors | Pingping Zhang, Wei Liu, Yinjie Lei, Huchuan Lu, Xiaoyun Yang |
Abstract | Semantic Scene Completion (SSC) aims to simultaneously predict the volumetric occupancy and semantic category of a 3D scene. It helps intelligent devices understand and interact with their surroundings. Due to high memory requirements, current methods only produce low-resolution completion predictions and generally lose object details. Furthermore, they also ignore multi-scale spatial contexts, which play a vital role in 3D inference. To address these issues, in this work we propose a novel deep learning framework, named Cascaded Context Pyramid Network (CCPNet), to jointly infer the occupancy and semantic labels of a volumetric 3D scene from a single depth image. The proposed CCPNet improves labeling coherence with a cascaded context pyramid. Meanwhile, based on low-level features, it progressively restores the fine structures of objects with Guided Residual Refinement (GRR) modules. Our proposed framework has three outstanding advantages: (1) it explicitly models the 3D spatial context for performance improvement; (2) full-resolution 3D volumes are produced with structure-preserving details; (3) lightweight models with low memory requirements are obtained with good extensibility. Extensive experiments demonstrate that, despite taking only a single-view depth map, our proposed framework can generate high-quality SSC results, and outperforms state-of-the-art approaches on both the synthetic SUNCG and real NYU datasets. |
Tasks | |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00382v1 |
https://arxiv.org/pdf/1908.00382v1.pdf | |
PWC | https://paperswithcode.com/paper/cascaded-context-pyramid-for-full-resolution |
Repo | |
Framework | |
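A hedged sketch of what a cascaded context pyramid block might look like: multi-scale 3D context is aggregated stage by stage, with each stage residually refining the previous one. Channel counts and dilation rates here are illustrative, not taken from the paper.

```python
# Sketch of a cascaded multi-scale 3D context block: successive dilated
# convolutions, each refining the previous stage's output residually.
import torch
import torch.nn as nn

class CascadedContextPyramid(nn.Module):
    def __init__(self, ch=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Conv3d(ch, ch, 3, padding=d, dilation=d) for d in dilations
        )

    def forward(self, x):
        out = x
        for stage in self.stages:
            out = torch.relu(stage(out)) + out   # cascade: refine the previous stage
        return out

feat = torch.randn(1, 32, 16, 16, 16)            # a low-resolution 3D feature map
print(CascadedContextPyramid()(feat).shape)
```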
A deep-learning-based surrogate model for data assimilation in dynamic subsurface flow problems
Title | A deep-learning-based surrogate model for data assimilation in dynamic subsurface flow problems |
Authors | Meng Tang, Yimin Liu, Louis J. Durlofsky |
Abstract | A deep-learning-based surrogate model is developed and applied for predicting dynamic subsurface flow in channelized geological models. The surrogate model is based on deep convolutional and recurrent neural network architectures, specifically a residual U-Net and a convolutional long short-term memory recurrent network. Training samples entail global pressure and saturation maps, at a series of time steps, generated by simulating oil-water flow in many (1500 in our case) realizations of a 2D channelized system. After training, the ‘recurrent R-U-Net’ surrogate model is shown to be capable of accurately predicting dynamic pressure and saturation maps and well rates (e.g., time-varying oil and water rates at production wells) for new geological realizations. Assessments demonstrating high surrogate-model accuracy are presented for an individual geological realization and for an ensemble of 500 test geomodels. The surrogate model is then used for the challenging problem of data assimilation (history matching) in a channelized system. For this study, posterior reservoir models are generated using the randomized maximum likelihood method, with the permeability field represented using the recently developed CNN-PCA parameterization. The flow responses required during the data assimilation procedure are provided by the recurrent R-U-Net. The overall approach is shown to lead to a substantial reduction in prediction uncertainty. High-fidelity numerical simulation results for the posterior geomodels (generated by the surrogate-based data assimilation procedure) are shown to be in essential agreement with the recurrent R-U-Net predictions. The accuracy and dramatic speedup provided by the surrogate model suggest that it may eventually enable the application of more formal posterior sampling methods to realistic problems. |
Tasks | |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05823v1 |
https://arxiv.org/pdf/1908.05823v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-learning-based-surrogate-model-for |
Repo | |
Framework | |
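The architecture the abstract outlines — a convolutional encoder, a convolutional LSTM rolled forward in time, and a decoder emitting pressure and saturation maps per step — can be sketched compactly. The stand-in encoder and decoder below are single convolutions rather than the paper's residual U-Net, and all sizes are assumptions.

```python
# Minimal recurrent-surrogate sketch: encode the static geomodel once, roll
# a convolutional LSTM forward in time, decode state maps at each step.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 4 * ch, 3, padding=1)

    def forward(self, x, h, c):
        i, f, g, o = self.gates(torch.cat([x, h], 1)).chunk(4, 1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        return torch.sigmoid(o) * torch.tanh(c), c

encoder = nn.Conv2d(1, 16, 3, padding=1)   # stand-in for the residual U-Net
decoder = nn.Conv2d(16, 2, 3, padding=1)   # -> pressure, saturation channels
cell = ConvLSTMCell(16)

perm = torch.randn(1, 1, 64, 64)           # one permeability realization
h = c = torch.zeros(1, 16, 64, 64)
x = encoder(perm)
maps = []
for _ in range(10):                        # 10 surrogate time steps
    h, c = cell(x, h, c)
    maps.append(decoder(h))                # (1, 2, 64, 64) state maps per step
```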
Scalable Knowledge Graph Construction from Twitter
Title | Scalable Knowledge Graph Construction from Twitter |
Authors | Omar Alonso, Vasileios Kandylas, Serge-Eric Tremblay |
Abstract | We describe a knowledge graph derived from Twitter data with the goal of discovering relationships between people, links, and topics. The aim is to filter out noise from Twitter and surface an inside-out view that relies on high-quality content. The generated graph contains many relationships, and the user can query and traverse the structure from different angles, allowing the development of new applications. |
Tasks | graph construction |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.05986v1 |
https://arxiv.org/pdf/1906.05986v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-knowledge-graph-construction-from |
Repo | |
Framework | |
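A toy version of the graph construction described: each tweet contributes typed nodes for its user, links, and topics, connected by typed edges that can then be queried and traversed. The tweet schema here is hypothetical.

```python
# Toy knowledge-graph construction from tweets: users, links, and topics
# become typed nodes; sharing and discussion become typed edges.
import networkx as nx

tweets = [
    {"user": "alice", "links": ["example.com/a"], "topics": ["nlp"]},
    {"user": "bob",   "links": ["example.com/a"], "topics": ["nlp", "ir"]},
]

g = nx.Graph()
for t in tweets:
    g.add_node(t["user"], kind="user")
    for link in t["links"]:
        g.add_node(link, kind="link")
        g.add_edge(t["user"], link, rel="shared")
    for topic in t["topics"]:
        g.add_node(topic, kind="topic")
        g.add_edge(t["user"], topic, rel="discussed")

# traverse from a topic to the users connected to it
print(list(g.neighbors("nlp")))
```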
Pairwise Fairness for Ranking and Regression
Title | Pairwise Fairness for Ranking and Regression |
Authors | Harikrishna Narasimhan, Andrew Cotter, Maya Gupta, Serena Wang |
Abstract | We present pairwise fairness metrics for ranking models and regression models that form analogues of statistical fairness notions such as equal opportunity, equal accuracy, and statistical parity. Our pairwise formulation supports both discrete protected groups, and continuous protected attributes. We show that the resulting training problems can be efficiently and effectively solved using existing constrained optimization and robust optimization techniques developed for fair classification. Experiments illustrate the broad applicability and trade-offs of these methods. |
Tasks | |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05330v3 |
https://arxiv.org/pdf/1906.05330v3.pdf | |
PWC | https://paperswithcode.com/paper/pairwise-fairness-for-ranking-and-regression |
Repo | |
Framework | |
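The pairwise formulation can be made concrete: for a scoring function f, estimate P(f(x_i) > f(x_j) | y_i > y_j) restricted to pairs whose higher-label member belongs to a given group, and compare that rate across groups — a pairwise analogue of equal opportunity. The sketch below illustrates the metric; it is not the authors' code.

```python
# Group-conditioned pairwise accuracy: among pairs where example i should
# rank above example j, how often does the score agree, per group of i?
import numpy as np

def group_pairwise_accuracy(scores, labels, groups, group):
    correct = total = 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j] and groups[i] == group:
                total += 1
                correct += scores[i] > scores[j]
    return correct / max(total, 1)

scores = np.array([0.9, 0.2, 0.7, 0.4])
labels = np.array([1, 0, 1, 0])
groups = np.array([0, 0, 1, 1])
gap = abs(group_pairwise_accuracy(scores, labels, groups, 0)
          - group_pairwise_accuracy(scores, labels, groups, 1))
print(gap)   # a pairwise equal-opportunity-style gap between groups
```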
Distributed Deep Learning with Event-Triggered Communication
Title | Distributed Deep Learning with Event-Triggered Communication |
Authors | Jemin George, Prudhvi Gurram |
Abstract | We develop a Distributed Event-Triggered Stochastic GRAdient Descent (DETSGRAD) algorithm for solving non-convex optimization problems typically encountered in distributed deep learning. We propose a novel communication-triggering mechanism that allows the networked agents to update their model parameters aperiodically, and we provide sufficient conditions on the algorithm step-sizes that guarantee asymptotic mean-square convergence. The algorithm is applied to a distributed supervised-learning problem in which a set of networked agents collaboratively train their individual neural networks to recognize handwritten digits in images, while aperiodically sharing the model parameters with their one-hop neighbors. Results indicate that all agents report similar performance, comparable to that of a centrally trained neural network, while the event-triggered communication provides a significant reduction in inter-agent communication. Results also show that the proposed algorithm allows the individual agents to recognize the digits even though the training data corresponding to all the digits are not locally available to each agent. |
Tasks | |
Published | 2019-09-08 |
URL | https://arxiv.org/abs/1909.05020v1 |
https://arxiv.org/pdf/1909.05020v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-deep-learning-with-event |
Repo | |
Framework | |
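The triggering idea is simple to sketch: an agent broadcasts its parameters only when they have drifted far enough from the last broadcast copy, with the trigger threshold decaying over iterations. The threshold schedule below is an assumption chosen only to illustrate the mechanism.

```python
# Event-triggered broadcasting sketch: send parameters only when the drift
# since the last broadcast exceeds a decaying threshold.
import numpy as np

def maybe_broadcast(params, last_sent, step, c=1.0, decay=0.55):
    threshold = c / (step + 1) ** decay        # assumed decaying trigger threshold
    if np.linalg.norm(params - last_sent) > threshold:
        return params.copy(), True             # trigger fired: broadcast a copy
    return last_sent, False                    # stay silent this iteration

params = np.zeros(5)
last_sent = params.copy()
sent = 0
for step in range(1000):
    params += 0.01 * np.random.randn(5)        # stand-in for an SGD update
    last_sent, fired = maybe_broadcast(params, last_sent, step)
    sent += fired
print(f"broadcasts: {sent} / 1000")
```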
LumièreNet: Lecture Video Synthesis from Audio
Title | LumièreNet: Lecture Video Synthesis from Audio |
Authors | Byung-Hak Kim, Varun Ganapathi |
Abstract | We present LumièreNet, a simple, modular, and completely deep-learning-based architecture that synthesizes high-quality, full-pose headshot lecture videos from an instructor’s new audio narration of any length. Unlike prior works, LumièreNet is entirely composed of trainable neural network modules that learn mapping functions from the audio to video through compact, abstract, pose-based intermediate latent codes. Our video demos are available at [22] and [23]. |
Tasks | |
Published | 2019-07-04 |
URL | https://arxiv.org/abs/1907.02253v1 |
https://arxiv.org/pdf/1907.02253v1.pdf | |
PWC | https://paperswithcode.com/paper/lumierenet-lecture-video-synthesis-from-audio |
Repo | |
Framework | |
Meta-Learning for Few-Shot Time Series Classification
Title | Meta-Learning for Few-Shot Time Series Classification |
Authors | Jyoti Narwariya, Pankaj Malhotra, Lovekesh Vig, Gautam Shroff, Vishnu Tv |
Abstract | Deep neural networks (DNNs) have achieved state-of-the-art results on time series classification (TSC) tasks. In this work, we focus on leveraging DNNs in the often-encountered practical scenario where access to labeled training data is difficult, and where DNNs would be prone to overfitting. We leverage recent advancements in gradient-based meta-learning and propose an approach to train a residual neural network with convolutional layers as a meta-learning agent for few-shot TSC. The network is trained on a diverse set of few-shot tasks sampled from various domains (e.g. healthcare, activity recognition, etc.) such that it can solve a target task from another domain using only a small number of training samples from the target task. Most existing meta-learning approaches are limited in practice, as they assume a fixed number of target classes across tasks. To overcome this limitation and train a common agent across domains in which each domain has a different number of target classes, we utilize a triplet-loss-based learning procedure that does not require any constraints to be enforced on the number of classes for few-shot TSC tasks. To the best of our knowledge, we are the first to use meta-learning-based pre-training for TSC. Our approach sets a new benchmark for few-shot TSC, outperforming several strong baselines on few-shot tasks sampled from 41 datasets in the UCR TSC Archive. We observe that pre-training under the meta-learning paradigm allows the network to quickly adapt to new unseen tasks with a small number of labeled instances. |
Tasks | Activity Recognition, Meta-Learning, Time Series, Time Series Classification |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.07155v2 |
https://arxiv.org/pdf/1909.07155v2.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-for-few-shot-time-series |
Repo | |
Framework | |
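The abstract's point about class counts can be seen directly in code: a triplet loss is defined over embeddings rather than over a fixed label space, so tasks from domains with different numbers of classes can share one encoder. The encoder below is a stand-in for the paper's convolutional residual network, and all shapes are assumptions.

```python
# Triplet-loss training signal over embeddings: no fixed class count is
# needed, so one encoder can serve tasks from many domains.
import torch
import torch.nn as nn

encoder = nn.Sequential(                   # stand-in for the conv ResNet
    nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
)
triplet = nn.TripletMarginLoss(margin=1.0)

anchor = torch.randn(8, 1, 128)            # same class as positive
positive = torch.randn(8, 1, 128)
negative = torch.randn(8, 1, 128)          # any other class, from any domain
loss = triplet(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
```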
Teaching AI to Explain its Decisions Using Embeddings and Multi-Task Learning
Title | Teaching AI to Explain its Decisions Using Embeddings and Multi-Task Learning |
Authors | Noel C. F. Codella, Michael Hind, Karthikeyan Natesan Ramamurthy, Murray Campbell, Amit Dhurandhar, Kush R. Varshney, Dennis Wei, Aleksandra Mojsilović |
Abstract | Using machine learning in high-stakes applications often requires predictions to be accompanied by explanations comprehensible to the domain user, who has ultimate responsibility for decisions and outcomes. Recently, a new framework for providing explanations, called TED, has been proposed to provide meaningful explanations for predictions. This framework augments training data to include explanations elicited from domain users, in addition to features and labels. This approach ensures that explanations for predictions are tailored to the complexity expectations and domain knowledge of the consumer. In this paper, we build on this foundational work by exploring more sophisticated instantiations of the TED framework and empirically evaluating their effectiveness in two diverse domains: chemical odor and skin cancer prediction. Results demonstrate that meaningful explanations can be reliably taught to machine learning algorithms, and in some cases can also improve modeling accuracy. |
Tasks | Multi-Task Learning |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02299v1 |
https://arxiv.org/pdf/1906.02299v1.pdf | |
PWC | https://paperswithcode.com/paper/teaching-ai-to-explain-its-decisions-using |
Repo | |
Framework | |
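One simple instantiation of the TED idea (an assumption here, not necessarily the variant the paper evaluates) folds the label and the elicited explanation into a joint target, so a standard classifier learns to predict both at once.

```python
# Joint (label, explanation) target: a hypothetical, minimal TED-style
# instantiation with a standard classifier on synthetic data.
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.random.randn(100, 5)
y = np.random.randint(0, 2, 100)           # decision label
e = np.random.randint(0, 3, 100)           # explanation elicited from users
joint = y * 3 + e                          # fold both into one joint class

clf = LogisticRegression(max_iter=1000).fit(X, joint)
pred = clf.predict(X[:1])[0]
print("label:", pred // 3, "explanation:", pred % 3)
```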
Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision
Title | Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision |
Authors | Michael Niemeyer, Lars Mescheder, Michael Oechsle, Andreas Geiger |
Abstract | Learning-based 3D reconstruction methods have shown impressive results. However, most methods require 3D supervision which is often hard to obtain for real-world datasets. Recently, several works have proposed differentiable rendering techniques to train reconstruction models from RGB images. Unfortunately, these approaches are currently restricted to voxel- and mesh-based representations, suffering from discretization or low resolution. In this work, we propose a differentiable rendering formulation for implicit shape and texture representations. Implicit representations have recently gained popularity as they represent shape and texture continuously. Our key insight is that depth gradients can be derived analytically using the concept of implicit differentiation. This allows us to learn implicit shape and texture representations directly from RGB images. We experimentally show that our single-view reconstructions rival those learned with full 3D supervision. Moreover, we find that our method can be used for multi-view 3D reconstruction, directly resulting in watertight meshes. |
Tasks | 3D Reconstruction |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07372v2 |
https://arxiv.org/pdf/1912.07372v2.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-volumetric-rendering-learning |
Repo | |
Framework | |
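The key insight the abstract alludes to can be written out: if the surface depth along a ray is defined implicitly by the network's level set, the implicit function theorem gives its gradient in closed form, with no need to store intermediate ray samples. This is a paraphrase under assumed notation (ray origin o, direction w, network f with parameters θ, level set τ), not the paper's exact equations.

```latex
% Depth \hat{d} along a ray r(d) = o + d\,w is defined implicitly by the
% level-set condition; implicit differentiation yields its gradient with
% respect to the network parameters \theta.
f_\theta\bigl(o + \hat{d}\,w\bigr) = \tau
\;\Longrightarrow\;
\frac{\partial \hat{d}}{\partial \theta}
  = -\Bigl(\nabla_p f_\theta(\hat{p}) \cdot w\Bigr)^{-1}
    \frac{\partial f_\theta(\hat{p})}{\partial \theta},
\qquad \hat{p} = o + \hat{d}\,w .
```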
Memory-Based Neighbourhood Embedding for Visual Recognition
Title | Memory-Based Neighbourhood Embedding for Visual Recognition |
Authors | Suichan Li, Dapeng Chen, Bin Liu, Nenghai Yu, Rui Zhao |
Abstract | Learning discriminative image feature embeddings is of great importance to visual recognition. To achieve better feature embeddings, most current methods focus on designing different network structures or loss functions, and the estimated feature embeddings are usually only related to the input images. In this paper, we propose Memory-based Neighbourhood Embedding (MNE) to enhance a general CNN feature by considering its neighbourhood. The method aims to solve two critical problems, i.e., how to acquire more relevant neighbours during network training and how to aggregate the neighbourhood information for a more discriminative embedding. We first augment the network with an episodic memory module, which can provide more relevant neighbours for both training and testing. The neighbours are then organized in a tree graph with the target instance as the root node. The neighbourhood information is gradually aggregated to the root node in a bottom-up manner, and the aggregation weights are supervised by the class relationships between the nodes. We apply MNE to image search and few-shot learning tasks. Extensive ablation studies demonstrate the effectiveness of each component, and our method significantly outperforms the state-of-the-art approaches. |
Tasks | Few-Shot Learning, Image Retrieval |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.04992v1 |
https://arxiv.org/pdf/1908.04992v1.pdf | |
PWC | https://paperswithcode.com/paper/memory-based-neighbourhood-embedding-for |
Repo | |
Framework | |
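A rough sketch of the aggregation step: neighbours retrieved from an episodic memory bank are merged into the query (root) embedding with similarity-based weights, standing in for the paper's learned, class-supervised weights. The tree is flattened to a single level for brevity; everything below is an assumption.

```python
# Sketch: retrieve neighbours from a memory bank, then aggregate them into
# the query embedding with softmax-similarity weights.
import torch
import torch.nn.functional as F

def aggregate(root, children):
    # weight each child by its softmax similarity to the root embedding
    w = F.softmax(children @ root, dim=0)
    return F.normalize(root + (w[:, None] * children).sum(0), dim=0)

memory = F.normalize(torch.randn(100, 64), dim=1)    # episodic memory bank
query = F.normalize(torch.randn(64), dim=0)
neighbours = memory[(memory @ query).topk(8).indices]
print(aggregate(query, neighbours).shape)            # enhanced embedding
```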
Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges
Title | Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges |
Authors | Rob Ashmore, Radu Calinescu, Colin Paterson |
Abstract | Machine learning has evolved into an enabling technology for a wide range of highly successful applications. The potential for this success to continue and accelerate has placed machine learning (ML) at the top of research, economic and political agendas. Such unprecedented interest is fuelled by a vision of ML applicability extending to healthcare, transportation, defence and other domains of great societal importance. Achieving this vision requires the use of ML in safety-critical applications that demand levels of assurance beyond those needed for current ML applications. Our paper provides a comprehensive survey of the state-of-the-art in the assurance of ML, i.e. in the generation of evidence that ML is sufficiently safe for its intended use. The survey covers the methods capable of providing such evidence at different stages of the machine learning lifecycle, i.e. of the complex, iterative process that starts with the collection of the data used to train an ML component for a system, and ends with the deployment of that component within the system. The paper begins with a systematic presentation of the ML lifecycle and its stages. We then define assurance desiderata for each stage, review existing methods that contribute to achieving these desiderata, and identify open challenges that require further research. |
Tasks | |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04223v1 |
https://arxiv.org/pdf/1905.04223v1.pdf | |
PWC | https://paperswithcode.com/paper/assuring-the-machine-learning-lifecycle |
Repo | |
Framework | |
Deep Learning for Energy Estimation and Particle Identification in Gamma-ray Astronomy
Title | Deep Learning for Energy Estimation and Particle Identification in Gamma-ray Astronomy |
Authors | Evgeny Postnikov, Alexander Kryukov, Stanislav Polyakov, Dmitry Zhurov |
Abstract | Deep learning techniques, namely convolutional neural networks (CNNs), have previously been adapted to select gamma-ray events in the TAIGA experiment, achieving good selection quality compared with the conventional Hillas approach. Another important task for the TAIGA data analysis was also solved with a CNN: gamma-ray energy estimation showed some improvement over the conventional method based on the Hillas analysis. Furthermore, our software was completely redeveloped for the graphics processing unit (GPU), which led to significantly faster calculations in both of these tasks. All the results have been obtained with simulated data from the TAIGA Monte Carlo software; their experimental confirmation is envisaged for the near future. |
Tasks | |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.10480v1 |
https://arxiv.org/pdf/1907.10480v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-energy-estimation-and |
Repo | |
Framework | |
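The two tasks described — gamma/hadron selection and energy estimation — suggest a shared CNN trunk with two heads, sketched below. The image size and layer sizes are illustrative; this is not the TAIGA pipeline.

```python
# Two-headed CNN sketch: one trunk over camera images, a classification
# head for event selection and a regression head for energy estimation.
import torch
import torch.nn as nn

trunk = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
classify = nn.Linear(32, 2)      # gamma vs. hadron selection
regress = nn.Linear(32, 1)       # (log-)energy estimation

img = torch.randn(4, 1, 32, 32)  # simulated camera images
feats = trunk(img)
print(classify(feats).shape, regress(feats).shape)
```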