January 26, 2020

Paper Group ANR 1368

PyHessian: Neural Networks Through the Lens of the Hessian. Variational Conditional GAN for Fine-grained Controllable Image Generation. Fourier-CPPNs for Image Synthesis. Spatial-Aware Non-Local Attention for Fashion Landmark Detection. Capacity allocation through neural network layers. Ranking architectures using meta-learning. Drug-Drug Adverse E …

PyHessian: Neural Networks Through the Lens of the Hessian


Title	PyHessian: Neural Networks Through the Lens of the Hessian
Authors	Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney
Abstract	We present PYHESSIAN, a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks. PYHESSIAN enables fast computations of the top Hessian eigenvalues, the Hessian trace, and the full Hessian eigenvalue/spectral density, and it supports distributed-memory execution on cloud/supercomputer systems and is available as open source. This general framework can be used to analyze neural network models, including the topology of the loss landscape (i.e., curvature information) to gain insight into the behavior of different models/optimizers. To illustrate this, we analyze the effect of residual connections and Batch Normalization layers on the trainability of neural networks. One recent claim, based on simpler first-order analysis, is that residual connections and Batch Normalization make the loss landscape smoother, thus making it easier for Stochastic Gradient Descent to converge to a good solution. Our extensive analysis shows new finer-scale insights, demonstrating that, while conventional wisdom is sometimes validated, in other cases it is simply incorrect. In particular, we find that Batch Normalization does not necessarily make the loss landscape smoother, especially for shallower networks.
Tasks
Published	2019-12-16
URL	https://arxiv.org/abs/1912.07145v3
PDF	https://arxiv.org/pdf/1912.07145v3.pdf
PWC	https://paperswithcode.com/paper/pyhessian-neural-networks-through-the-lens-of
Repo
Framework
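
The quantities named in the abstract (top Hessian eigenvalue, Hessian trace) can be estimated without ever forming the Hessian, using only Hessian-vector products. The sketch below is an illustration under stated assumptions, not PyHessian's API: it uses a toy quadratic loss whose Hessian is the hypothetical matrix `A`, approximates Hessian-vector products by central differences of the gradient (a real framework would typically use automatic differentiation), and applies two standard matrix-free methods, power iteration and Hutchinson's randomized trace estimator. All function names are made up for this example.

```python
import math
import random

# Toy quadratic loss f(w) = 0.5 * w^T A w, so the Hessian is exactly A.
# A is a hypothetical example matrix, not taken from the paper.
A = [[3.0, 1.0], [1.0, 2.0]]


def grad(w):
    """Gradient of the toy loss, i.e. A @ w."""
    return [sum(a * x for a, x in zip(row, w)) for row in A]


def hvp(grad_fn, w, v, eps=1e-3):
    """Hessian-vector product H @ v via central differences of the gradient."""
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad_fn(w_plus), grad_fn(w_minus)
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]


def top_eigenvalue(grad_fn, w, iters=100, seed=0):
    """Power iteration: apply H repeatedly and renormalise the vector."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in w]
    eig = 0.0
    for _ in range(iters):
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        hv = hvp(grad_fn, w, v)
        eig = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        v = hv
    return eig


def hutchinson_trace(grad_fn, w, samples=2000, seed=0):
    """Hutchinson estimator: average v^T H v over random +/-1 vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        v = [rng.choice((-1.0, 1.0)) for _ in w]
        hv = hvp(grad_fn, w, v)
        total += sum(a * b for a, b in zip(v, hv))
    return total / samples


w0 = [0.5, -0.5]
top = top_eigenvalue(grad, w0)
trace = hutchinson_trace(grad, w0)
```

For this toy matrix the exact values are known (largest eigenvalue (5 + √5)/2, trace 5), so the estimates can be checked directly. The trace estimate is stochastic and only approaches the exact value as the number of samples grows.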

Variational Conditional GAN for Fine-grained Controllable Image Generation


Title	Variational Conditional GAN for Fine-grained Controllable Image Generation
Authors	Mingqi Hu, Deyu Zhou, Yulan He
Abstract	In this paper, we propose a novel variational generator framework for conditional GANs that captures semantic details to improve generation quality and diversity. Traditional generators in conditional GANs simply concatenate the conditional vector with the noise as the input representation, which is directly employed for upsampling operations. However, the hidden condition information is not fully exploited, especially when the input is a class label. Therefore, we introduce variational inference into the generator to infer the posterior of the latent variable from the conditional input alone, which helps achieve a variable augmented representation for image generation. Qualitative and quantitative experimental results show that the proposed method outperforms the state-of-the-art approaches and generates realistic, controllable images.
Tasks	Image Generation
Published	2019-09-22
URL	https://arxiv.org/abs/1909.09979v1
PDF	https://arxiv.org/pdf/1909.09979v1.pdf
PWC	https://paperswithcode.com/paper/190909979
Repo
Framework

Fourier-CPPNs for Image Synthesis


Title	Fourier-CPPNs for Image Synthesis
Authors	Mattie Tesfaldet, Xavier Snelgrove, David Vazquez
Abstract	Compositional Pattern Producing Networks (CPPNs) are differentiable networks that independently map (x, y) pixel coordinates to (r, g, b) colour values. Recently, CPPNs have been used for creating interesting imagery for creative purposes, e.g., neural art. However, their architecture biases generated images to be overly smooth, lacking high-frequency detail. In this work, we extend CPPNs to explicitly model the frequency information for each pixel output, capturing frequencies beyond the DC component. We show that our Fourier-CPPNs (F-CPPNs) provide improved visual detail for image synthesis.
Tasks	Image Generation
Published	2019-09-20
URL	https://arxiv.org/abs/1909.09273v1
PDF	https://arxiv.org/pdf/1909.09273v1.pdf
PWC	https://paperswithcode.com/paper/fourier-cppns-for-image-synthesis
Repo
Framework
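
To make the coordinate-to-colour mapping concrete, here is a minimal sketch of a plain CPPN: a tiny, randomly initialised network evaluated independently at every pixel. It does not implement the Fourier extension proposed in the paper; the layer sizes, activations, and the names `make_cppn` and `render` are assumptions chosen for illustration.

```python
import math
import random


def make_cppn(hidden=8, seed=0):
    """Build a random one-hidden-layer network mapping (x, y) -> (r, g, b)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(hidden)]
    w2 = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(3)]

    def cppn(x, y):
        # Hidden layer with tanh, then a sigmoid so each channel lies in (0, 1).
        h = [math.tanh(row[0] * x + row[1] * y + b) for row, b in zip(w1, b1)]
        out = [sum(w * hi for w, hi in zip(row, h)) for row in w2]
        return tuple(1.0 / (1.0 + math.exp(-o)) for o in out)

    return cppn


def render(cppn, width, height):
    """Evaluate the network at each pixel, with coordinates scaled to [-1, 1]."""
    image = []
    for j in range(height):
        y = 2.0 * j / (height - 1) - 1.0
        row = []
        for i in range(width):
            x = 2.0 * i / (width - 1) - 1.0
            row.append(cppn(x, y))
        image.append(row)
    return image


image = render(make_cppn(), width=4, height=3)
```

Because the same smooth function is sampled at every pixel, the image can be rendered at any resolution, and the smoothness of the activations hints at why such images tend to lack high-frequency detail.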

Spatial-Aware Non-Local Attention for Fashion Landmark Detection


Title	Spatial-Aware Non-Local Attention for Fashion Landmark Detection
Authors	Yixin Li, Shengqin Tang, Yun Ye, Jinwen Ma
Abstract	Fashion landmark detection is a challenging task even with current deep learning techniques, due to the large variation and non-rigid deformation of clothes. To tackle these problems, we propose the Spatial-Aware Non-Local (SANL) block, an attentive module for deep neural networks that can utilize spatial information while capturing global dependencies. The SANL block is constructed from the non-local block in a residual manner and learns spatially related representations by taking a spatial attention map from Grad-CAM. We then build our fashion landmark detection framework on a feature pyramid network, equipped with four SANL blocks in the backbone. Experimental results on two large-scale fashion datasets show that our fashion landmark detection approach with the SANL blocks outperforms the current state-of-the-art methods considerably. Supplementary experiments on fine-grained image classification also show the effectiveness of the proposed SANL block.
Tasks	Fine-Grained Image Classification, Image Classification
Published	2019-03-11
URL	http://arxiv.org/abs/1903.04104v1
PDF	http://arxiv.org/pdf/1903.04104v1.pdf
PWC	https://paperswithcode.com/paper/spatial-aware-non-local-attention-for-fashion
Repo
Framework

Capacity allocation through neural network layers


Title	Capacity allocation through neural network layers
Authors	Jonathan Donier
Abstract	Capacity analysis has been recently introduced as a way to analyze how linear models distribute their modelling capacity across the input space. In this paper, we extend the notion of capacity allocation to the case of neural networks with non-linear layers. We show that under some hypotheses the problem is equivalent to linear capacity allocation, within some extended input space that factors in the non-linearities. We introduce the notion of layer decoupling, which quantifies the degree to which a non-linear activation decouples its outputs, and show that it plays a central role in capacity allocation through layers. In the highly non-linear limit where decoupling is total, we show that the propagation of capacity throughout the layers follows a simple markovian rule, which turns into a diffusion PDE in the limit of deep networks with residual layers. This allows us to recover some known results about deep neural networks, such as the size of the effective receptive field, or why ResNets avoid the shattering problem.
Tasks
Published	2019-02-22
URL	http://arxiv.org/abs/1902.08572v2
PDF	http://arxiv.org/pdf/1902.08572v2.pdf
PWC	https://paperswithcode.com/paper/capacity-allocation-through-neural-network
Repo
Framework

Ranking architectures using meta-learning


Title	Ranking architectures using meta-learning
Authors	Alina Dubatovka, Efi Kokiopoulou, Luciano Sbaiz, Andrea Gesmundo, Gabor Bartok, Jesse Berent
Abstract	Neural architecture search has recently attracted lots of research efforts as it promises to automate the manual design of neural networks. However, it requires a large amount of computing resources and in order to alleviate this, a performance prediction network has been recently proposed that enables efficient architecture search by forecasting the performance of candidate architectures, instead of relying on actual model training. The performance predictor is task-aware taking as input not only the candidate architecture but also task meta-features and it has been designed to collectively learn from several tasks. In this work, we introduce a pairwise ranking loss for training a network able to rank candidate architectures for a new unseen task conditioning on its task meta-features. We present experimental results, showing that the ranking network is more effective in architecture search than the previously proposed performance predictor.
Tasks	Meta-Learning, Neural Architecture Search
Published	2019-11-26
URL	https://arxiv.org/abs/1911.11481v1
PDF	https://arxiv.org/pdf/1911.11481v1.pdf
PWC	https://paperswithcode.com/paper/ranking-architectures-using-meta-learning
Repo
Framework
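
The abstract does not give the exact form of the pairwise ranking loss, so the following is only a sketch of one common choice, a logistic (RankNet-style) loss: for every pair in which one architecture truly outperforms another, the loss penalises the model when the predicted scores are ordered the other way. The function name and the toy numbers are hypothetical.

```python
import math


def pairwise_ranking_loss(scores, accuracies):
    """Mean logistic loss over pairs (i, j) where architecture i beats j.

    scores: predicted ranking scores, one per candidate architecture.
    accuracies: measured accuracies used as ground truth for the ordering.
    """
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if accuracies[i] > accuracies[j]:
                margin = scores[i] - scores[j]
                # Numerically stable log(1 + exp(-margin)).
                total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
                pairs += 1
    return total / pairs if pairs else 0.0


true_acc = [0.91, 0.85, 0.70]                                      # toy ground truth
good = pairwise_ranking_loss([2.0, 1.0, 0.0], true_acc)            # correct order
bad = pairwise_ranking_loss([0.0, 1.0, 2.0], true_acc)             # reversed order
flat = pairwise_ranking_loss([0.0, 0.0, 0.0], true_acc)            # no preference
```

Only the relative order of the scores matters here, which is the point of a ranking objective: the network does not have to predict accuracies exactly, just which candidate is better.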

Drug-Drug Adverse Effect Prediction with Graph Co-Attention


Title	Drug-Drug Adverse Effect Prediction with Graph Co-Attention
Authors	Andreea Deac, Yu-Hsiang Huang, Petar Veličković, Pietro Liò, Jian Tang
Abstract	Complex or co-existing diseases are commonly treated using drug combinations, which can lead to higher risk of adverse side effects. The detection of polypharmacy side effects is usually done in Phase IV clinical trials, but there are still plenty which remain undiscovered when the drugs are put on the market. Such accidents have been affecting an increasing proportion of the population (15% in the US now) and it is thus of high interest to be able to predict the potential side effects as early as possible. Systematic combinatorial screening of possible drug-drug interactions (DDI) is challenging and expensive. However, the recent significant increases in data availability from pharmaceutical research and development efforts offer a novel paradigm for recovering relevant insights for DDI prediction. Accordingly, several recent approaches focus on curating massive DDI datasets (with millions of examples) and training machine learning models on them. Here we propose a neural network architecture able to set state-of-the-art results on this task—using the type of the side-effect and the molecular structure of the drugs alone—by leveraging a co-attentional mechanism. In particular, we show the importance of integrating joint information from the drug pairs early on when learning each drug’s representation.
Tasks
Published	2019-05-02
URL	http://arxiv.org/abs/1905.00534v1
PDF	http://arxiv.org/pdf/1905.00534v1.pdf
PWC	https://paperswithcode.com/paper/drug-drug-adverse-effect-prediction-with
Repo
Framework

ReQA: An Evaluation for End-to-End Answer Retrieval Models


Title	ReQA: An Evaluation for End-to-End Answer Retrieval Models
Authors	Amin Ahmad, Noah Constant, Yinfei Yang, Daniel Cer
Abstract	Popular QA benchmarks like SQuAD have driven progress on the task of identifying answer spans within a specific passage, with models now surpassing human performance. However, retrieving relevant answers from a huge corpus of documents is still a challenging problem, and places different requirements on the model architecture. There is growing interest in developing scalable answer retrieval models trained end-to-end, bypassing the typical document retrieval step. In this paper, we introduce Retrieval Question-Answering (ReQA), a benchmark for evaluating large-scale sentence-level answer retrieval models. We establish baselines using both neural encoding models as well as classical information retrieval techniques. We release our evaluation code to encourage further work on this challenging task.
Tasks	Information Retrieval, Question Answering
Published	2019-07-10
URL	https://arxiv.org/abs/1907.04780v2
PDF	https://arxiv.org/pdf/1907.04780v2.pdf
PWC	https://paperswithcode.com/paper/reqa-an-evaluation-for-end-to-end-answer
Repo
Framework

Masking by Moving: Learning Distraction-Free Radar Odometry from Pose Information


Title	Masking by Moving: Learning Distraction-Free Radar Odometry from Pose Information
Authors	Dan Barnes, Rob Weston, Ingmar Posner
Abstract	This paper presents an end-to-end radar odometry system which delivers robust, real-time pose estimates based on a learned embedding space free of sensing artefacts and distractor objects. The system deploys a fully differentiable, correlation-based radar matching approach. This provides the same level of interpretability as established scan-matching methods and allows for a principled derivation of uncertainty estimates. The system is trained in a (self-)supervised way using only previously obtained pose information as a training signal. Using 280km of urban driving data, we demonstrate that our approach outperforms the previous state-of-the-art in radar odometry by reducing errors by up to 68% whilst running an order of magnitude faster.
Tasks
Published	2019-09-09
URL	https://arxiv.org/abs/1909.03752v4
PDF	https://arxiv.org/pdf/1909.03752v4.pdf
PWC	https://paperswithcode.com/paper/masking-by-moving-learning-distraction-free
Repo
Framework

Bayesian Recurrent Framework for Missing Data Imputation and Prediction with Clinical Time Series


Title	Bayesian Recurrent Framework for Missing Data Imputation and Prediction with Clinical Time Series
Authors	Yang Guo, Zhengyuan Liu, Pavitra Krishnswamy, Savitha Ramasamy
Abstract	Real-world clinical time series data sets exhibit a high prevalence of missing values. Hence, there is an increasing interest in missing data imputation. Traditional statistical approaches impose constraints on the data-generating process and decouple imputation from prediction. Recent works propose recurrent neural network based approaches for missing data imputation and prediction with time series data. However, they generate deterministic outputs and neglect the inherent uncertainty. In this work, we introduce a unified Bayesian recurrent framework for simultaneous imputation and prediction on time series data sets. We evaluate our approach on two real-world mortality prediction tasks using the MIMIC-III and PhysioNet benchmark datasets. We demonstrate strong performance gains over state-of-the-art (SOTA) methods, and provide strategies to use the resulting probability distributions to better assess reliability of the imputations and predictions.
Tasks	Imputation, Mortality Prediction, Time Series
Published	2019-11-18
URL	https://arxiv.org/abs/1911.07572v2
PDF	https://arxiv.org/pdf/1911.07572v2.pdf
PWC	https://paperswithcode.com/paper/bayesian-recurrent-framework-for-missing-data
Repo
Framework

A Reproducible Comparison of RSSI Fingerprinting Localization Methods Using LoRaWAN


Title	A Reproducible Comparison of RSSI Fingerprinting Localization Methods Using LoRaWAN
Authors	Grigorios G. Anagnostopoulos, Alexandros Kalousis
Abstract	The use of fingerprinting localization techniques in outdoor IoT settings has started to gain popularity in recent years. Communication signals of Low Power Wide Area Networks (LPWAN), such as LoRaWAN, are used to estimate the location of low power mobile devices. In this study, a publicly available dataset of LoRaWAN RSSI measurements is utilized to compare different machine learning methods and their accuracy in producing location estimates. The tested methods are: the k Nearest Neighbours method, the Extra Trees method and a neural network approach using a Multilayer Perceptron. To facilitate the reproducibility of tests and the comparability of results, the code and the train/validation/test split of the dataset used in this study have been made available. The neural network approach was the method with the highest accuracy, achieving a mean error of 358 meters and a median error of 204 meters.
Tasks
Published	2019-08-14
URL	https://arxiv.org/abs/1908.05085v1
PDF	https://arxiv.org/pdf/1908.05085v1.pdf
PWC	https://paperswithcode.com/paper/a-reproducible-comparison-of-rssi
Repo
Framework
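
As an illustration of the simplest of the three compared methods, the sketch below shows k Nearest Neighbours fingerprinting: a query RSSI vector is matched against stored fingerprints and the positions of the closest ones are averaged. It is not the authors' code; the fingerprint values, the planar (x, y) coordinates, and the function name `knn_locate` are made up for the example, and a real evaluation on latitude/longitude data would need a geodesic distance to report errors in meters.

```python
import math


def knn_locate(fingerprints, query_rssi, k=3):
    """Estimate a position as the mean location of the k nearest fingerprints.

    fingerprints: list of (rssi_vector, (x, y)) pairs, one RSSI per gateway.
    query_rssi: RSSI vector of the device to localise (same gateway order).
    """
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[0], query_rssi))
    nearest = ranked[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return (x, y)


# Hypothetical training fingerprints: RSSI (dBm) at two gateways, position in meters.
fingerprints = [
    ([-70.0, -90.0], (0.0, 0.0)),
    ([-72.0, -88.0], (2.0, 0.0)),
    ([-110.0, -60.0], (100.0, 100.0)),
]
estimate = knn_locate(fingerprints, [-71.0, -89.0], k=2)
```

The two fingerprints that resemble the query in signal space are also close in physical space, so their average is a reasonable estimate; the distant third fingerprint is ignored.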

One-view occlusion detection for stereo matching with a fully connected CRF model


Title	One-view occlusion detection for stereo matching with a fully connected CRF model
Authors	Mikhail G. Mozerov, Joost van de Weijer
Abstract	In this paper, we extend the standard belief propagation (BP) sequential technique proposed in the tree-reweighted sequential method to fully connected CRF models with the geodesic distance affinity. The proposed method has been applied to the stereo matching problem. We also propose a new approach to the BP marginal solution, which we call one-view occlusion detection (OVOD). In contrast to the standard winner takes all (WTA) estimation, the proposed OVOD solution makes it possible to find occluded regions in the disparity map and simultaneously improve the matching result. As a result, we can perform only one energy minimization process and avoid the cost calculation for the second view and the left-right check procedure. We show that the OVOD approach considerably improves results for cost augmentation and energy minimization techniques in comparison with the standard one-view affinity space implementation. We apply our method to the Middlebury data set and reach state-of-the-art results, especially for the median, average and mean squared error metrics.
Tasks	Stereo Matching, Stereo Matching Hand
Published	2019-01-12
URL	http://arxiv.org/abs/1901.03852v1
PDF	http://arxiv.org/pdf/1901.03852v1.pdf
PWC	https://paperswithcode.com/paper/one-view-occlusion-detection-for-stereo
Repo
Framework
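
For reference, the winner-takes-all baseline that the abstract contrasts with OVOD can be sketched in a few lines: each pixel independently picks the disparity with the lowest matching cost. This is only the baseline, with no occlusion handling, and not the proposed method; the cost-volume layout and the function name are assumptions for illustration.

```python
def winner_takes_all(cost_volume):
    """Pick, for each pixel, the disparity index with the lowest cost.

    cost_volume[row][col] is a list of matching costs, one per candidate
    disparity (a hypothetical layout chosen for this sketch).
    """
    return [
        [min(range(len(costs)), key=costs.__getitem__) for costs in row]
        for row in cost_volume
    ]


# Toy 1x2 image with three candidate disparities per pixel.
costs = [[[3.0, 1.0, 2.0], [0.5, 5.0, 5.0]]]
disparity = winner_takes_all(costs)
```

Because every pixel is forced to take some disparity, occluded pixels receive a value too, which is why such pipelines usually need a second-view cost calculation and a left-right check.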

Learning Cross-Domain Representation with Multi-Graph Neural Network


Title	Learning Cross-Domain Representation with Multi-Graph Neural Network
Authors	Yi Ouyang, Bin Guo, Xing Tang, Xiuqiang He, Jian Xiong, Zhiwen Yu
Abstract	Learning effective embeddings has proved useful in many real-world problems, such as recommender systems, search ranking and online advertising. However, one of the challenges in learning large-scale item embeddings is data sparsity, as users’ historical behavior data are usually lacking or insufficient in an individual domain. In fact, users’ behaviors from different domains regarding the same items are usually related. Therefore, we can learn complete user behaviors to alleviate the sparsity using complementary information from correlated domains. It is intuitive to model users’ behaviors using graphs, and graph neural networks (GNNs) have recently shown great power for representation learning, which can be used to learn item embeddings. However, it is challenging to transfer information across domains and learn cross-domain representations using existing GNNs. To address these challenges, in this paper, we propose a novel model - Deep Multi-Graph Embedding (DMGE) to learn cross-domain representation. Specifically, we first construct a multi-graph based on users’ behaviors from different domains, and then propose a multi-graph neural network to learn cross-domain representation in an unsupervised manner. In particular, we present a multiple-gradient descent optimizer for efficiently training the model. We evaluate our approach on various large-scale real-world datasets, and the experimental results show that DMGE outperforms other state-of-the-art embedding methods in various tasks.
Tasks	Graph Embedding, Recommendation Systems, Representation Learning
Published	2019-05-24
URL	https://arxiv.org/abs/1905.10095v1
PDF	https://arxiv.org/pdf/1905.10095v1.pdf
PWC	https://paperswithcode.com/paper/learning-cross-domain-representation-with
Repo
Framework

DDTCDR: Deep Dual Transfer Cross Domain Recommendation


Title	DDTCDR: Deep Dual Transfer Cross Domain Recommendation
Authors	Pan Li, Alexander Tuzhilin
Abstract	Cross domain recommender systems have been increasingly valuable for helping consumers identify the most satisfying items from different categories. However, previously proposed cross-domain models do not take into account bidirectional latent relations between users and items. In addition, they do not explicitly model information of user and item features, utilizing only user ratings information for recommendations. To address these concerns, in this paper we propose a novel approach to cross-domain recommendations based on the mechanism of dual learning that transfers information between two related domains in an iterative manner until the learning process stabilizes. We develop a novel latent orthogonal mapping to extract user preferences over multiple domains while preserving relations between users across different latent spaces. Combining this with an autoencoder approach to extract the latent essence of feature information, we propose the Deep Dual Transfer Cross Domain Recommendation (DDTCDR) model to provide recommendations in the respective domains. We test the proposed method on a large dataset containing three domains of movie, book and music items and demonstrate that it consistently and significantly outperforms several state-of-the-art baselines and also classical transfer learning approaches.
Tasks	Recommendation Systems, Transfer Learning
Published	2019-10-11
URL	https://arxiv.org/abs/1910.05189v1
PDF	https://arxiv.org/pdf/1910.05189v1.pdf
PWC	https://paperswithcode.com/paper/ddtcdr-deep-dual-transfer-cross-domain
Repo
Framework

An Empirical Evaluation of Adversarial Robustness under Transfer Learning


Title	An Empirical Evaluation of Adversarial Robustness under Transfer Learning
Authors	Todor Davchev, Timos Korres, Stathi Fotiadis, Nick Antonopoulos, Subramanian Ramamoorthy
Abstract	In this work, we evaluate adversarial robustness in the context of transfer learning from a source trained on CIFAR 100 to a target network trained on CIFAR 10. Specifically, we study the effects of using robust optimisation in the source and target networks. This allows us to identify transfer learning strategies under which adversarial defences are successfully retained, in addition to revealing potential vulnerabilities. We study the extent to which features learnt by a fast gradient sign method (FGSM) and its iterative alternative (PGD) can preserve their defence properties against black and white-box attacks under three different transfer learning strategies. We find that using PGD examples during training on the source task leads to more general robust features that are easier to transfer. Furthermore, under successful transfer, it achieves 5.2% more accuracy against white-box PGD attacks than suitable baselines. Overall, our empirical evaluations give insights on how well adversarial robustness under transfer learning can generalise.
Tasks	Transfer Learning
Published	2019-05-07
URL	https://arxiv.org/abs/1905.02675v4
PDF	https://arxiv.org/pdf/1905.02675v4.pdf
PWC	https://paperswithcode.com/paper/towards-evaluating-and-understanding-robust
Repo
Framework
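
The two attacks named in the abstract can be illustrated on a model small enough to differentiate by hand. The sketch below is an assumption-laden toy, not the paper's experimental setup: it uses a logistic-regression classifier with made-up weights instead of a CIFAR network, and the function names are hypothetical. FGSM takes one step of size `eps` along the sign of the input gradient of the loss; PGD repeats smaller steps and clips the result back into the `eps` box around the original input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]
    return loss, grad_x


def fgsm(w, b, x, y, eps):
    """Single-step attack: move each input coordinate by eps * sign(gradient)."""
    _, g = loss_and_input_grad(w, b, x, y)
    return [xi + eps * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]


def pgd(w, b, x, y, eps, alpha, steps):
    """Iterative attack: repeated small sign steps, clipped to the eps box."""
    x_adv = list(x)
    for _ in range(steps):
        x_adv = fgsm(w, b, x_adv, y, alpha)
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv


w, b = [2.0, -1.0], 0.0          # hypothetical model parameters
x, y = [1.0, 1.0], 1             # a correctly classified toy input
clean_loss, _ = loss_and_input_grad(w, b, x, y)
x_fgsm = fgsm(w, b, x, y, eps=0.1)
x_pgd = pgd(w, b, x, y, eps=0.1, alpha=0.03, steps=10)
fgsm_loss, _ = loss_and_input_grad(w, b, x_fgsm, y)
pgd_loss, _ = loss_and_input_grad(w, b, x_pgd, y)
```

Adversarial training in this setting would mean computing such perturbed inputs during training and fitting the model on them; for a linear model the two attacks end up at the same corner of the `eps` box, whereas on deep networks PGD is generally the stronger attack.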