Paper Group ANR 187
Affine Variational Autoencoders: An Efficient Approach for Improving Generalization and Robustness to Distribution Shift
Title | Affine Variational Autoencoders: An Efficient Approach for Improving Generalization and Robustness to Distribution Shift |
Authors | Rene Bidart, Alexander Wong |
Abstract | In this study, we propose the Affine Variational Autoencoder (AVAE), a variant of the Variational Autoencoder (VAE) designed to improve robustness by overcoming the inability of VAEs to generalize to distributional shifts in the form of affine perturbations. By optimizing an affine transform to maximize the ELBO, the proposed AVAE maps an input back to the training distribution without the need to increase model complexity to cover the full distribution of affine transforms. In addition, we introduce a training procedure that creates an efficient model by learning a subset of the training distribution and using the AVAE to improve generalization and robustness to distributional shift at test time. Experiments demonstrate that the proposed AVAE significantly improves generalization and robustness to distributional shift in the form of affine perturbations without an increase in model complexity. |
Tasks | |
Published | 2019-05-13 |
URL | https://arxiv.org/abs/1905.05300v1 |
https://arxiv.org/pdf/1905.05300v1.pdf | |
PWC | https://paperswithcode.com/paper/affine-variational-autoencoders-an-efficient |
Repo | |
Framework | |
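The test-time procedure the abstract describes lends itself to a short sketch: freeze a trained VAE and optimize a handful of affine parameters so that the transformed input maximizes the ELBO. Below is a minimal, hypothetical PyTorch version; the `vae.elbo` method and the four-parameter (rotation, log-scale, translation) setup are assumptions, not the authors' exact interface.

```python
# Hypothetical AVAE test-time sketch: optimize affine parameters so the
# transformed input maximizes the ELBO of a trained, frozen VAE.
import torch
import torch.nn.functional as F

def avae_infer(vae, x, steps=50, lr=0.05):
    # params = [angle, log-scale, tx, ty]; an assumed parameterization
    params = torch.zeros(4, requires_grad=True)
    opt = torch.optim.Adam([params], lr=lr)
    for _ in range(steps):
        angle, s = params[0], params[1].exp()
        cos, sin = torch.cos(angle), torch.sin(angle)
        mat = torch.stack([
            torch.stack([s * cos, -s * sin, params[2]]),
            torch.stack([s * sin,  s * cos, params[3]]),
        ]).unsqueeze(0)                          # (1, 2, 3) affine matrix
        grid = F.affine_grid(mat, x.shape, align_corners=False)
        x_t = F.grid_sample(x, grid, align_corners=False)
        loss = -vae.elbo(x_t)                    # maximize ELBO = minimize -ELBO
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_t.detach()
```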
Towards Computational Models and Applications of Insect Visual Systems for Motion Perception: A Review
Title | Towards Computational Models and Applications of Insect Visual Systems for Motion Perception: A Review |
Authors | Qinbing Fu, Hongxin Wang, Cheng Hu, Shigang Yue |
Abstract | Motion perception is a critical capability determining many aspects of an insect’s life, including avoiding predators and foraging. A good number of motion detectors have been identified in insects’ visual pathways. Computational modelling of these motion detectors has not only provided effective solutions for artificial intelligence, but has also benefited the understanding of complicated biological visual systems. These biological mechanisms, refined through millions of years of evolution, form solid modules for constructing dynamic vision systems for future intelligent machines. This article reviews the computational motion perception models originating from biological research on insects’ visual systems. These motion perception models or neural networks comprise the looming-sensitive neuronal models of lobula giant movement detectors (LGMDs) in locusts, the translation-sensitive neural systems of direction selective neurons (DSNs) in fruit flies, bees and locusts, as well as the small target motion detectors (STMDs) in dragonflies and hoverflies. We also review the applications of these models to robots and vehicles. Through these modelling studies, we summarise the methodologies that generate different direction and size selectivity in motion perception. Finally, we discuss multiple-system integration and the hardware realisation of these bio-inspired motion perception models. |
Tasks | |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02048v1 |
http://arxiv.org/pdf/1904.02048v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-computational-models-and-applications |
Repo | |
Framework | |
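As one concrete example of the direction-selective models such reviews cover, a Hassenstein-Reichardt elementary motion detector (EMD) fits in a few lines: each photoreceptor signal is correlated with a delayed copy of its neighbour, and opponent subtraction yields direction selectivity. The signal shapes and the first-order low-pass used as the delay below are illustrative assumptions.

```python
# Minimal Hassenstein-Reichardt EMD: correlate each input with a delayed
# (low-pass filtered) copy of its neighbour, then subtract the mirror term.
import numpy as np

def emd_response(left, right, dt=1e-3, tau=20e-3):
    alpha = dt / (tau + dt)                    # first-order low-pass as delay
    d_left = np.zeros_like(left)
    d_right = np.zeros_like(right)
    for t in range(1, len(left)):
        d_left[t] = d_left[t-1] + alpha * (left[t] - d_left[t-1])
        d_right[t] = d_right[t-1] + alpha * (right[t] - d_right[t-1])
    # opponent subtraction yields direction selectivity
    return d_left * right - d_right * left

# A rightward-moving pattern: the right input is a delayed copy of the left.
t = np.arange(0, 0.5, 1e-3)
left = (np.sin(2 * np.pi * 4 * t) > 0).astype(float)
right = np.roll(left, 25)
print(emd_response(left, right).mean())        # positive => rightward motion
```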
StyleRemix: An Interpretable Representation for Neural Image Style Transfer
Title | StyleRemix: An Interpretable Representation for Neural Image Style Transfer |
Authors | Hongmin Xu, Qiang Li, Wenbo Zhang, Wen Zheng |
Abstract | Multi-Style Transfer (MST) intends to capture the high-level visual vocabulary of different styles and to express these vocabularies in a joint model so as to transfer each specific style. Recently, Style Embedding Learning (SEL) based methods represent each style with an explicit set of parameters to perform the MST task. However, most existing SEL methods either learn an explicit style representation with numerous independent parameters or learn a relatively black-box style representation, which makes it difficult to control the stylized results. In this paper, we outline a novel MST model, StyleRemix, that compactly and explicitly integrates multiple styles into one network. By decomposing diverse styles into the same basis, StyleRemix represents a specific style in a continuous vector space with 1-dimensional coefficients. With this interpretable style representation, StyleRemix not only enables the style visualization task but also allows several ways of remixing styles in the smooth style embedding space. Extensive experiments demonstrate the effectiveness of StyleRemix on various MST tasks compared to state-of-the-art SEL approaches. |
Tasks | Style Transfer |
Published | 2019-02-27 |
URL | http://arxiv.org/abs/1902.10425v3 |
http://arxiv.org/pdf/1902.10425v3.pdf | |
PWC | https://paperswithcode.com/paper/styleremix-an-interpretable-representation |
Repo | |
Framework | |
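The decomposition the abstract describes — styles as coefficient vectors over a shared basis — is easy to illustrate, and remixing then becomes interpolation in coefficient space. Names and dimensions in this sketch are assumptions, not the paper's.

```python
# Illustrative sketch: each style is a 1-D coefficient vector over a shared
# style basis, so remixing two styles is interpolation of coefficients.
import torch

n_styles, n_basis, dim = 10, 8, 64
basis = torch.randn(n_basis, dim)            # shared style basis
coeffs = torch.randn(n_styles, n_basis)      # per-style coefficients

def style_code(i):
    return coeffs[i] @ basis                 # style i as a basis combination

def remix(i, j, alpha=0.5):
    # interpolate coefficients, not raw parameters, to stay on the basis
    mixed = alpha * coeffs[i] + (1 - alpha) * coeffs[j]
    return mixed @ basis

print(style_code(0).shape, remix(0, 1).shape)
```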
Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion
Title | Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion |
Authors | Pingping Zhang, Wei Liu, Yinjie Lei, Huchuan Lu, Xiaoyun Yang |
Abstract | Semantic Scene Completion (SSC) aims to simultaneously predict the volumetric occupancy and semantic category of a 3D scene. It helps intelligent devices understand and interact with their surroundings. Due to high memory requirements, current methods only produce low-resolution completion predictions and generally lose object details. Furthermore, they also ignore multi-scale spatial contexts, which play a vital role in 3D inference. To address these issues, in this work we propose a novel deep learning framework, named Cascaded Context Pyramid Network (CCPNet), to jointly infer the occupancy and semantic labels of a volumetric 3D scene from a single depth image. The proposed CCPNet improves labeling coherence with a cascaded context pyramid. Meanwhile, based on low-level features, it progressively restores the fine structures of objects with Guided Residual Refinement (GRR) modules. Our proposed framework has three outstanding advantages: (1) it explicitly models the 3D spatial context for performance improvement; (2) full-resolution 3D volumes are produced with structure-preserving details; (3) lightweight models with low memory requirements are obtained with good extensibility. Extensive experiments demonstrate that, despite taking only a single-view depth map, our proposed framework can generate high-quality SSC results, and outperforms state-of-the-art approaches on both the synthetic SUNCG and real NYU datasets. |
Tasks | |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00382v1 |
https://arxiv.org/pdf/1908.00382v1.pdf | |
PWC | https://paperswithcode.com/paper/cascaded-context-pyramid-for-full-resolution |
Repo | |
Framework | |
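A hedged sketch of what a cascaded context pyramid block might look like: multi-scale 3D context is aggregated stage by stage, with each stage residually refining the previous one. Channel counts and dilation rates here are illustrative, not taken from the paper.

```python
# Sketch of a cascaded multi-scale 3D context block: successive dilated
# convolutions, each refining the previous stage's output residually.
import torch
import torch.nn as nn

class CascadedContextPyramid(nn.Module):
    def __init__(self, ch=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Conv3d(ch, ch, 3, padding=d, dilation=d) for d in dilations
        )

    def forward(self, x):
        out = x
        for stage in self.stages:
            out = torch.relu(stage(out)) + out   # cascade: refine the previous stage
        return out

feat = torch.randn(1, 32, 16, 16, 16)            # a low-resolution 3D feature map
print(CascadedContextPyramid()(feat).shape)
```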
A deep-learning-based surrogate model for data assimilation in dynamic subsurface flow problems
Title | A deep-learning-based surrogate model for data assimilation in dynamic subsurface flow problems |
Authors | Meng Tang, Yimin Liu, Louis J. Durlofsky |
Abstract | A deep-learning-based surrogate model is developed and applied for predicting dynamic subsurface flow in channelized geological models. The surrogate model is based on deep convolutional and recurrent neural network architectures, specifically a residual U-Net and a convolutional long short-term memory recurrent network. Training samples entail global pressure and saturation maps, at a series of time steps, generated by simulating oil-water flow in many (1500 in our case) realizations of a 2D channelized system. After training, the ‘recurrent R-U-Net’ surrogate model is shown to be capable of accurately predicting dynamic pressure and saturation maps and well rates (e.g., time-varying oil and water rates at production wells) for new geological realizations. Assessments demonstrating high surrogate-model accuracy are presented for an individual geological realization and for an ensemble of 500 test geomodels. The surrogate model is then used for the challenging problem of data assimilation (history matching) in a channelized system. For this study, posterior reservoir models are generated using the randomized maximum likelihood method, with the permeability field represented using the recently developed CNN-PCA parameterization. The flow responses required during the data assimilation procedure are provided by the recurrent R-U-Net. The overall approach is shown to lead to a substantial reduction in prediction uncertainty. High-fidelity numerical simulation results for the posterior geomodels (generated by the surrogate-based data assimilation procedure) are shown to be in essential agreement with the recurrent R-U-Net predictions. The accuracy and dramatic speedup provided by the surrogate model suggest that it may eventually enable the application of more formal posterior sampling methods to realistic problems. |
Tasks | |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05823v1 |
https://arxiv.org/pdf/1908.05823v1.pdf | |
PWC | https://paperswithcode.com/paper/a-deep-learning-based-surrogate-model-for |
Repo | |
Framework | |
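The architecture the abstract outlines — a convolutional encoder, a convolutional LSTM rolled forward in time, and a decoder emitting pressure and saturation maps per step — can be sketched compactly. The stand-in encoder and decoder below are single convolutions rather than the paper's residual U-Net, and all sizes are assumptions.

```python
# Minimal recurrent-surrogate sketch: encode the static geomodel once, roll
# a convolutional LSTM forward in time, decode state maps at each step.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 4 * ch, 3, padding=1)

    def forward(self, x, h, c):
        i, f, g, o = self.gates(torch.cat([x, h], 1)).chunk(4, 1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        return torch.sigmoid(o) * torch.tanh(c), c

encoder = nn.Conv2d(1, 16, 3, padding=1)   # stand-in for the residual U-Net
decoder = nn.Conv2d(16, 2, 3, padding=1)   # -> pressure, saturation channels
cell = ConvLSTMCell(16)

perm = torch.randn(1, 1, 64, 64)           # one permeability realization
h = c = torch.zeros(1, 16, 64, 64)
x = encoder(perm)
maps = []
for _ in range(10):                        # 10 surrogate time steps
    h, c = cell(x, h, c)
    maps.append(decoder(h))                # (1, 2, 64, 64) state maps per step
```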
Scalable Knowledge Graph Construction from Twitter
Title | Scalable Knowledge Graph Construction from Twitter |
Authors | Omar Alonso, Vasileios Kandylas, Serge-Eric Tremblay |
Abstract | We describe a knowledge graph derived from Twitter data with the goal of discovering relationships between people, links, and topics. The aim is to filter out noise from Twitter and surface an inside-out view that relies on high-quality content. The generated graph contains many relationships, and the user can query and traverse the structure from different angles, allowing the development of new applications. |
Tasks | graph construction |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.05986v1 |
https://arxiv.org/pdf/1906.05986v1.pdf | |
PWC | https://paperswithcode.com/paper/scalable-knowledge-graph-construction-from |
Repo | |
Framework | |
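A toy version of the graph construction described: each tweet contributes typed nodes for its user, links, and topics, connected by typed edges that can then be queried and traversed. The tweet schema here is hypothetical.

```python
# Toy knowledge-graph construction from tweets: users, links, and topics
# become typed nodes; sharing and discussion become typed edges.
import networkx as nx

tweets = [
    {"user": "alice", "links": ["example.com/a"], "topics": ["nlp"]},
    {"user": "bob",   "links": ["example.com/a"], "topics": ["nlp", "ir"]},
]

g = nx.Graph()
for t in tweets:
    g.add_node(t["user"], kind="user")
    for link in t["links"]:
        g.add_node(link, kind="link")
        g.add_edge(t["user"], link, rel="shared")
    for topic in t["topics"]:
        g.add_node(topic, kind="topic")
        g.add_edge(t["user"], topic, rel="discussed")

# traverse from a topic to the users connected to it
print(list(g.neighbors("nlp")))
```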
Pairwise Fairness for Ranking and Regression
Title | Pairwise Fairness for Ranking and Regression |
Authors | Harikrishna Narasimhan, Andrew Cotter, Maya Gupta, Serena Wang |
Abstract | We present pairwise fairness metrics for ranking models and regression models that form analogues of statistical fairness notions such as equal opportunity, equal accuracy, and statistical parity. Our pairwise formulation supports both discrete protected groups, and continuous protected attributes. We show that the resulting training problems can be efficiently and effectively solved using existing constrained optimization and robust optimization techniques developed for fair classification. Experiments illustrate the broad applicability and trade-offs of these methods. |
Tasks | |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05330v3 |
https://arxiv.org/pdf/1906.05330v3.pdf | |
PWC | https://paperswithcode.com/paper/pairwise-fairness-for-ranking-and-regression |
Repo | |
Framework | |
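The pairwise formulation can be made concrete: for a scoring function f, estimate P(f(x_i) > f(x_j) | y_i > y_j) restricted to pairs whose higher-label member belongs to a given group, and compare that rate across groups — a pairwise analogue of equal opportunity. The sketch below illustrates the metric; it is not the authors' code.

```python
# Group-conditioned pairwise accuracy: among pairs where example i should
# rank above example j, how often does the score agree, per group of i?
import numpy as np

def group_pairwise_accuracy(scores, labels, groups, group):
    correct = total = 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j] and groups[i] == group:
                total += 1
                correct += scores[i] > scores[j]
    return correct / max(total, 1)

scores = np.array([0.9, 0.2, 0.7, 0.4])
labels = np.array([1, 0, 1, 0])
groups = np.array([0, 0, 1, 1])
gap = abs(group_pairwise_accuracy(scores, labels, groups, 0)
          - group_pairwise_accuracy(scores, labels, groups, 1))
print(gap)   # a pairwise equal-opportunity-style gap between groups
```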
Distributed Deep Learning with Event-Triggered Communication
Title | Distributed Deep Learning with Event-Triggered Communication |
Authors | Jemin George, Prudhvi Gurram |
Abstract | We develop a Distributed Event-Triggered Stochastic GRAdient Descent (DETSGRAD) algorithm for solving non-convex optimization problems typically encountered in distributed deep learning. We propose a novel communication-triggering mechanism that allows the networked agents to update their model parameters aperiodically, and we provide sufficient conditions on the algorithm step-sizes that guarantee asymptotic mean-square convergence. The algorithm is applied to a distributed supervised-learning problem in which a set of networked agents collaboratively train their individual neural networks to recognize handwritten digits in images, while aperiodically sharing the model parameters with their one-hop neighbors. Results indicate that all agents report similar performance, comparable to that of a centrally trained neural network, while the event-triggered communication provides a significant reduction in inter-agent communication. Results also show that the proposed algorithm allows the individual agents to recognize the digits even though the training data corresponding to all the digits are not locally available to each agent. |
Tasks | |
Published | 2019-09-08 |
URL | https://arxiv.org/abs/1909.05020v1 |
https://arxiv.org/pdf/1909.05020v1.pdf | |
PWC | https://paperswithcode.com/paper/distributed-deep-learning-with-event |
Repo | |
Framework | |
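The triggering idea is simple to sketch: an agent broadcasts its parameters only when they have drifted far enough from the last broadcast copy, with the trigger threshold decaying over iterations. The threshold schedule below is an assumption chosen only to illustrate the mechanism.

```python
# Event-triggered broadcasting sketch: send parameters only when the drift
# since the last broadcast exceeds a decaying threshold.
import numpy as np

def maybe_broadcast(params, last_sent, step, c=1.0, decay=0.55):
    threshold = c / (step + 1) ** decay        # assumed decaying trigger threshold
    if np.linalg.norm(params - last_sent) > threshold:
        return params.copy(), True             # trigger fired: broadcast a copy
    return last_sent, False                    # stay silent this iteration

params = np.zeros(5)
last_sent = params.copy()
sent = 0
for step in range(1000):
    params += 0.01 * np.random.randn(5)        # stand-in for an SGD update
    last_sent, fired = maybe_broadcast(params, last_sent, step)
    sent += fired
print(f"broadcasts: {sent} / 1000")
```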
LumièreNet: Lecture Video Synthesis from Audio
Title | LumièreNet: Lecture Video Synthesis from Audio |
Authors | Byung-Hak Kim, Varun Ganapathi |
Abstract | We present LumièreNet, a simple, modular, and completely deep-learning-based architecture that synthesizes high-quality, full-pose headshot lecture videos from an instructor’s new audio narration of any length. Unlike prior works, LumièreNet is entirely composed of trainable neural network modules that learn mapping functions from the audio to video through compact, abstract, pose-based intermediate latent codes. Our video demos are available at [22] and [23]. |
Tasks | |
Published | 2019-07-04 |
URL | https://arxiv.org/abs/1907.02253v1 |
https://arxiv.org/pdf/1907.02253v1.pdf | |
PWC | https://paperswithcode.com/paper/lumierenet-lecture-video-synthesis-from-audio |
Repo | |
Framework | |
Meta-Learning for Few-Shot Time Series Classification
Title | Meta-Learning for Few-Shot Time Series Classification |
Authors | Jyoti Narwariya, Pankaj Malhotra, Lovekesh Vig, Gautam Shroff, Vishnu Tv |
Abstract | Deep neural networks (DNNs) have achieved state-of-the-art results on time series classification (TSC) tasks. In this work, we focus on leveraging DNNs in the often-encountered practical scenario where access to labeled training data is difficult, and where DNNs would be prone to overfitting. We leverage recent advancements in gradient-based meta-learning and propose an approach to train a residual neural network with convolutional layers as a meta-learning agent for few-shot TSC. The network is trained on a diverse set of few-shot tasks sampled from various domains (e.g. healthcare, activity recognition, etc.) such that it can solve a target task from another domain using only a small number of training samples from the target task. Most existing meta-learning approaches are limited in practice, as they assume a fixed number of target classes across tasks. To overcome this limitation and train a common agent across domains in which each domain has a different number of target classes, we utilize a triplet-loss-based learning procedure that does not require any constraints to be enforced on the number of classes for few-shot TSC tasks. To the best of our knowledge, we are the first to use meta-learning-based pre-training for TSC. Our approach sets a new benchmark for few-shot TSC, outperforming several strong baselines on few-shot tasks sampled from 41 datasets in the UCR TSC Archive. We observe that pre-training under the meta-learning paradigm allows the network to quickly adapt to new unseen tasks with a small number of labeled instances. |
Tasks | Activity Recognition, Meta-Learning, Time Series, Time Series Classification |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.07155v2 |
https://arxiv.org/pdf/1909.07155v2.pdf | |
PWC | https://paperswithcode.com/paper/meta-learning-for-few-shot-time-series |
Repo | |
Framework | |
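The abstract's point about class counts can be seen directly in code: a triplet loss is defined over embeddings rather than over a fixed label space, so tasks from domains with different numbers of classes can share one encoder. The encoder below is a stand-in for the paper's convolutional residual network, and all shapes are assumptions.

```python
# Triplet-loss training signal over embeddings: no fixed class count is
# needed, so one encoder can serve tasks from many domains.
import torch
import torch.nn as nn

encoder = nn.Sequential(                   # stand-in for the conv ResNet
    nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
)
triplet = nn.TripletMarginLoss(margin=1.0)

anchor = torch.randn(8, 1, 128)            # same class as positive
positive = torch.randn(8, 1, 128)
negative = torch.randn(8, 1, 128)          # any other class, from any domain
loss = triplet(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()
```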
Teaching AI to Explain its Decisions Using Embeddings and Multi-Task Learning
Title | Teaching AI to Explain its Decisions Using Embeddings and Multi-Task Learning |
Authors | Noel C. F. Codella, Michael Hind, Karthikeyan Natesan Ramamurthy, Murray Campbell, Amit Dhurandhar, Kush R. Varshney, Dennis Wei, Aleksandra Mojsilović |
Abstract | Using machine learning in high-stakes applications often requires predictions to be accompanied by explanations comprehensible to the domain user, who has ultimate responsibility for decisions and outcomes. Recently, a new framework for providing explanations, called TED, has been proposed to provide meaningful explanations for predictions. This framework augments training data to include explanations elicited from domain users, in addition to features and labels. This approach ensures that explanations for predictions are tailored to the complexity expectations and domain knowledge of the consumer. In this paper, we build on this foundational work by exploring more sophisticated instantiations of the TED framework and empirically evaluating their effectiveness in two diverse domains: chemical odor and skin cancer prediction. Results demonstrate that meaningful explanations can be reliably taught to machine learning algorithms, and in some cases can also improve modeling accuracy. |
Tasks | Multi-Task Learning |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02299v1 |
https://arxiv.org/pdf/1906.02299v1.pdf | |
PWC | https://paperswithcode.com/paper/teaching-ai-to-explain-its-decisions-using |
Repo | |
Framework | |
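One simple instantiation of the TED idea (an assumption here, not necessarily the variant the paper evaluates) folds the label and the elicited explanation into a joint target, so a standard classifier learns to predict both at once.

```python
# Joint (label, explanation) target: a hypothetical, minimal TED-style
# instantiation with a standard classifier on synthetic data.
from sklearn.linear_model import LogisticRegression
import numpy as np

X = np.random.randn(100, 5)
y = np.random.randint(0, 2, 100)           # decision label
e = np.random.randint(0, 3, 100)           # explanation elicited from users
joint = y * 3 + e                          # fold both into one joint class

clf = LogisticRegression(max_iter=1000).fit(X, joint)
pred = clf.predict(X[:1])[0]
print("label:", pred // 3, "explanation:", pred % 3)
```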
Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision
Title | Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision |
Authors | Michael Niemeyer, Lars Mescheder, Michael Oechsle, Andreas Geiger |
Abstract | Learning-based 3D reconstruction methods have shown impressive results. However, most methods require 3D supervision which is often hard to obtain for real-world datasets. Recently, several works have proposed differentiable rendering techniques to train reconstruction models from RGB images. Unfortunately, these approaches are currently restricted to voxel- and mesh-based representations, suffering from discretization or low resolution. In this work, we propose a differentiable rendering formulation for implicit shape and texture representations. Implicit representations have recently gained popularity as they represent shape and texture continuously. Our key insight is that depth gradients can be derived analytically using the concept of implicit differentiation. This allows us to learn implicit shape and texture representations directly from RGB images. We experimentally show that our single-view reconstructions rival those learned with full 3D supervision. Moreover, we find that our method can be used for multi-view 3D reconstruction, directly resulting in watertight meshes. |
Tasks | 3D Reconstruction |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07372v2 |
https://arxiv.org/pdf/1912.07372v2.pdf | |
PWC | https://paperswithcode.com/paper/differentiable-volumetric-rendering-learning |
Repo | |
Framework | |
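The key insight the abstract alludes to can be written out: if the surface depth along a ray is defined implicitly by the network's level set, the implicit function theorem gives its gradient in closed form, with no need to store intermediate ray samples. This is a paraphrase under assumed notation (ray origin o, direction w, network f with parameters θ, level set τ), not the paper's exact equations.

```latex
% Depth \hat{d} along a ray r(d) = o + d\,w is defined implicitly by the
% level-set condition; implicit differentiation yields its gradient with
% respect to the network parameters \theta.
f_\theta\bigl(o + \hat{d}\,w\bigr) = \tau
\;\Longrightarrow\;
\frac{\partial \hat{d}}{\partial \theta}
  = -\Bigl(\nabla_p f_\theta(\hat{p}) \cdot w\Bigr)^{-1}
    \frac{\partial f_\theta(\hat{p})}{\partial \theta},
\qquad \hat{p} = o + \hat{d}\,w .
```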
Memory-Based Neighbourhood Embedding for Visual Recognition
Title | Memory-Based Neighbourhood Embedding for Visual Recognition |
Authors | Suichan Li, Dapeng Chen, Bin Liu, Nenghai Yu, Rui Zhao |
Abstract | Learning discriminative image feature embeddings is of great importance to visual recognition. To achieve better feature embeddings, most current methods focus on designing different network structures or loss functions, and the estimated feature embeddings are usually only related to the input images. In this paper, we propose Memory-based Neighbourhood Embedding (MNE) to enhance a general CNN feature by considering its neighbourhood. The method aims to solve two critical problems, i.e., how to acquire more relevant neighbours during network training and how to aggregate the neighbourhood information for a more discriminative embedding. We first augment the network with an episodic memory module, which can provide more relevant neighbours for both training and testing. The neighbours are then organized in a tree graph with the target instance as the root node. The neighbourhood information is gradually aggregated to the root node in a bottom-up manner, and the aggregation weights are supervised by the class relationships between the nodes. We apply MNE to image search and few-shot learning tasks. Extensive ablation studies demonstrate the effectiveness of each component, and our method significantly outperforms the state-of-the-art approaches. |
Tasks | Few-Shot Learning, Image Retrieval |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.04992v1 |
https://arxiv.org/pdf/1908.04992v1.pdf | |
PWC | https://paperswithcode.com/paper/memory-based-neighbourhood-embedding-for |
Repo | |
Framework | |
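A rough sketch of the aggregation step: neighbours retrieved from an episodic memory bank are merged into the query (root) embedding with similarity-based weights, standing in for the paper's learned, class-supervised weights. The tree is flattened to a single level for brevity; everything below is an assumption.

```python
# Sketch: retrieve neighbours from a memory bank, then aggregate them into
# the query embedding with softmax-similarity weights.
import torch
import torch.nn.functional as F

def aggregate(root, children):
    # weight each child by its softmax similarity to the root embedding
    w = F.softmax(children @ root, dim=0)
    return F.normalize(root + (w[:, None] * children).sum(0), dim=0)

memory = F.normalize(torch.randn(100, 64), dim=1)    # episodic memory bank
query = F.normalize(torch.randn(64), dim=0)
neighbours = memory[(memory @ query).topk(8).indices]
print(aggregate(query, neighbours).shape)            # enhanced embedding
```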
Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges
Title | Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges |
Authors | Rob Ashmore, Radu Calinescu, Colin Paterson |
Abstract | Machine learning has evolved into an enabling technology for a wide range of highly successful applications. The potential for this success to continue and accelerate has placed machine learning (ML) at the top of research, economic and political agendas. Such unprecedented interest is fuelled by a vision of ML applicability extending to healthcare, transportation, defence and other domains of great societal importance. Achieving this vision requires the use of ML in safety-critical applications that demand levels of assurance beyond those needed for current ML applications. Our paper provides a comprehensive survey of the state-of-the-art in the assurance of ML, i.e. in the generation of evidence that ML is sufficiently safe for its intended use. The survey covers the methods capable of providing such evidence at different stages of the machine learning lifecycle, i.e. of the complex, iterative process that starts with the collection of the data used to train an ML component for a system, and ends with the deployment of that component within the system. The paper begins with a systematic presentation of the ML lifecycle and its stages. We then define assurance desiderata for each stage, review existing methods that contribute to achieving these desiderata, and identify open challenges that require further research. |
Tasks | |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04223v1 |
https://arxiv.org/pdf/1905.04223v1.pdf | |
PWC | https://paperswithcode.com/paper/assuring-the-machine-learning-lifecycle |
Repo | |
Framework | |
Deep Learning for Energy Estimation and Particle Identification in Gamma-ray Astronomy
Title | Deep Learning for Energy Estimation and Particle Identification in Gamma-ray Astronomy |
Authors | Evgeny Postnikov, Alexander Kryukov, Stanislav Polyakov, Dmitry Zhurov |
Abstract | Deep learning techniques, namely convolutional neural networks (CNNs), have previously been adapted to select gamma-ray events in the TAIGA experiment, achieving good selection quality compared with the conventional Hillas approach. Another important task for the TAIGA data analysis was also solved with a CNN: gamma-ray energy estimation showed some improvement over the conventional method based on the Hillas analysis. Furthermore, our software was completely redeveloped for the graphics processing unit (GPU), which led to significantly faster calculations in both of these tasks. All the results have been obtained with simulated data from the TAIGA Monte Carlo software; their experimental confirmation is envisaged for the near future. |
Tasks | |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.10480v1 |
https://arxiv.org/pdf/1907.10480v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-for-energy-estimation-and |
Repo | |
Framework | |
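The two tasks described — gamma/hadron selection and energy estimation — suggest a shared CNN trunk with two heads, sketched below. The image size and layer sizes are illustrative; this is not the TAIGA pipeline.

```python
# Two-headed CNN sketch: one trunk over camera images, a classification
# head for event selection and a regression head for energy estimation.
import torch
import torch.nn as nn

trunk = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
classify = nn.Linear(32, 2)      # gamma vs. hadron selection
regress = nn.Linear(32, 1)       # (log-)energy estimation

img = torch.randn(4, 1, 32, 32)  # simulated camera images
feats = trunk(img)
print(classify(feats).shape, regress(feats).shape)
```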