Paper Group ANR 1368
PyHessian: Neural Networks Through the Lens of the Hessian
Title | PyHessian: Neural Networks Through the Lens of the Hessian |
Authors | Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney |
Abstract | We present PYHESSIAN, a new scalable framework that enables fast computation of Hessian (i.e., second-order derivative) information for deep neural networks. PYHESSIAN enables fast computations of the top Hessian eigenvalues, the Hessian trace, and the full Hessian eigenvalue/spectral density, and it supports distributed-memory execution on cloud/supercomputer systems and is available as open source. This general framework can be used to analyze neural network models, including the topology of the loss landscape (i.e., curvature information) to gain insight into the behavior of different models/optimizers. To illustrate this, we analyze the effect of residual connections and Batch Normalization layers on the trainability of neural networks. One recent claim, based on simpler first-order analysis, is that residual connections and Batch Normalization make the loss landscape smoother, thus making it easier for Stochastic Gradient Descent to converge to a good solution. Our extensive analysis shows new finer-scale insights, demonstrating that, while conventional wisdom is sometimes validated, in other cases it is simply incorrect. In particular, we find that Batch Normalization does not necessarily make the loss landscape smoother, especially for shallower networks. |
Tasks | |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07145v3 |
PDF | https://arxiv.org/pdf/1912.07145v3.pdf |
PWC | https://paperswithcode.com/paper/pyhessian-neural-networks-through-the-lens-of |
Repo | |
Framework | |
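The two quantities the PyHessian abstract highlights, the top Hessian eigenvalue and the Hessian trace, can both be estimated from Hessian-vector products alone, without ever forming the Hessian. The sketch below is a minimal NumPy illustration under stated assumptions and is not PyHessian's API: a small explicit symmetric matrix `H` stands in for a network Hessian (PyHessian obtains Hessian-vector products through automatic differentiation), and `hutchinson_trace` and `power_iteration` are hypothetical helper names for Hutchinson's randomized trace estimator and power iteration.

```python
import numpy as np

def hutchinson_trace(hvp, dim, num_samples=2000, seed=0):
    """Estimate tr(H) as the average of v^T H v over random Rademacher vectors v."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(num_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # entries are +1 or -1 with equal probability
        estimates.append(v @ hvp(v))
    return float(np.mean(estimates))

def power_iteration(hvp, dim, num_iters=200, seed=0):
    """Estimate the largest-magnitude eigenvalue of H using only Hessian-vector products."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(num_iters):
        hv = hvp(v)
        eig = float(v @ hv)          # Rayleigh quotient of the current unit vector
        v = hv / np.linalg.norm(hv)  # move towards the dominant eigenvector
    return eig

# Stand-in "Hessian" with known spectrum: eigenvalues 10, 3, 1, 0.5, 0.1 (trace 14.6).
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
H = Q @ np.diag([10.0, 3.0, 1.0, 0.5, 0.1]) @ Q.T
hvp = lambda v: H @ v  # in a real network this would be an autograd Hessian-vector product
```

Both routines only call `hvp`, which is what lets this style of analysis scale to models whose Hessian is far too large to store.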
Variational Conditional GAN for Fine-grained Controllable Image Generation
Title | Variational Conditional GAN for Fine-grained Controllable Image Generation |
Authors | Mingqi Hu, Deyu Zhou, Yulan He |
Abstract | In this paper, we propose a novel variational generator framework for conditional GANs to capture semantic details for improving the generation quality and diversity. Traditional generators in conditional GANs simply concatenate the conditional vector with the noise as the input representation, which is directly employed for upsampling operations. However, the hidden condition information is not fully exploited, especially when the input is a class label. Therefore, we introduce variational inference into the generator to infer the posterior of the latent variable only from the conditional input, which helps achieve a variable augmented representation for image generation. Qualitative and quantitative experimental results show that the proposed method outperforms the state-of-the-art approaches and achieves realistic controllable images. |
Tasks | Image Generation |
Published | 2019-09-22 |
URL | https://arxiv.org/abs/1909.09979v1 |
PDF | https://arxiv.org/pdf/1909.09979v1.pdf |
PWC | https://paperswithcode.com/paper/190909979 |
Repo | |
Framework | |
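The VCGAN abstract describes inferring a latent posterior from the conditional input alone. A common way to do this is a Gaussian posterior with the reparameterization trick and a KL penalty towards a standard normal prior. The sketch below illustrates only that general mechanism under explicit assumptions: the linear "condition encoder" (`W_mu`, `W_logvar`), the dimensions, and the choice to concatenate the sampled `z` with ordinary noise are all hypothetical, and the paper's actual architecture and loss may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, LATENT_DIM = 10, 8

# Hypothetical condition encoder: linear maps from a one-hot label to the mean
# and log-variance of a diagonal Gaussian posterior q(z | c).
W_mu = rng.standard_normal((NUM_CLASSES, LATENT_DIM)) * 0.1
W_logvar = rng.standard_normal((NUM_CLASSES, LATENT_DIM)) * 0.1

def encode_condition(label):
    c = np.eye(NUM_CLASSES)[label]  # one-hot class label
    return c @ W_mu, c @ W_logvar

def sample_latent(mu, logvar, rng):
    # Reparameterization trick: z = mu + sigma * eps keeps z differentiable in mu and logvar.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).
    return 0.5 * float(np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar))

mu, logvar = encode_condition(3)
z = sample_latent(mu, logvar, rng)
noise = rng.standard_normal(LATENT_DIM)
generator_input = np.concatenate([z, noise])  # would feed an (omitted) upsampling generator
```

The KL term would be added to the generator loss so that the condition-specific posterior stays close to the prior.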
Fourier-CPPNs for Image Synthesis
Title | Fourier-CPPNs for Image Synthesis |
Authors | Mattie Tesfaldet, Xavier Snelgrove, David Vazquez |
Abstract | Compositional Pattern Producing Networks (CPPNs) are differentiable networks that independently map (x, y) pixel coordinates to (r, g, b) colour values. Recently, CPPNs have been used for creating interesting imagery for creative purposes, e.g., neural art. However their architecture biases generated images to be overly smooth, lacking high-frequency detail. In this work, we extend CPPNs to explicitly model the frequency information for each pixel output, capturing frequencies beyond the DC component. We show that our Fourier-CPPNs (F-CPPNs) provide improved visual detail for image synthesis. |
Tasks | Image Generation |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09273v1 |
PDF | https://arxiv.org/pdf/1909.09273v1.pdf |
PWC | https://paperswithcode.com/paper/fourier-cppns-for-image-synthesis |
Repo | |
Framework | |
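The abstract defines a CPPN as a network that maps each (x, y) pixel coordinate independently to an (r, g, b) colour. The sketch below renders an image from a randomly initialised baseline CPPN to make that mapping concrete. It is an assumption-laden illustration: the layer count, width, tanh hidden activations and sigmoid output are hypothetical choices, and it does not implement the Fourier (F-CPPN) extension the paper proposes.

```python
import numpy as np

def cppn_image(height, width, hidden=32, layers=3, seed=0):
    """Render an image from a random baseline CPPN: every pixel's (x, y)
    coordinate passes independently through the same small MLP to (r, g, b)."""
    rng = np.random.default_rng(seed)
    ys, xs = np.meshgrid(np.linspace(-1, 1, height), np.linspace(-1, 1, width), indexing="ij")
    h = np.stack([xs, ys], axis=-1).reshape(-1, 2)  # (H*W, 2) coordinate inputs
    in_dim = 2
    for _ in range(layers):
        W = rng.standard_normal((in_dim, hidden))
        h = np.tanh(h @ W)                          # smooth hidden activations
        in_dim = hidden
    W_out = rng.standard_normal((hidden, 3))
    rgb = 1.0 / (1.0 + np.exp(-(h @ W_out)))        # sigmoid maps colours into [0, 1]
    return rgb.reshape(height, width, 3)
```

Because the network is a smooth function of the coordinates, images produced this way tend to contain mostly low-frequency structure, which is the bias the F-CPPN work targets.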
Spatial-Aware Non-Local Attention for Fashion Landmark Detection
Title | Spatial-Aware Non-Local Attention for Fashion Landmark Detection |
Authors | Yixin Li, Shengqin Tang, Yun Ye, Jinwen Ma |
Abstract | Fashion landmark detection is a challenging task even using the current deep learning techniques, due to the large variation and non-rigid deformation of clothes. In order to tackle these problems, we propose Spatial-Aware Non-Local (SANL) block, an attentive module in deep neural network which can utilize spatial information while capturing global dependency. Actually, the SANL block is constructed from the non-local block in the residual manner which can learn the spatial related representation by taking a spatial attention map from Grad-CAM. We then establish our fashion landmark detection framework on feature pyramid network, equipped with four SANL blocks in the backbone. It is demonstrated by the experimental results on two large-scale fashion datasets that our proposed fashion landmark detection approach with the SANL blocks outperforms the current state-of-the-art methods considerably. Some supplementary experiments on fine-grained image classification also show the effectiveness of the proposed SANL block. |
Tasks | Fine-Grained Image Classification, Image Classification |
Published | 2019-03-11 |
URL | http://arxiv.org/abs/1903.04104v1 |
PDF | http://arxiv.org/pdf/1903.04104v1.pdf |
PWC | https://paperswithcode.com/paper/spatial-aware-non-local-attention-for-fashion |
Repo | |
Framework | |
Capacity allocation through neural network layers
Title | Capacity allocation through neural network layers |
Authors | Jonathan Donier |
Abstract | Capacity analysis has been recently introduced as a way to analyze how linear models distribute their modelling capacity across the input space. In this paper, we extend the notion of capacity allocation to the case of neural networks with non-linear layers. We show that under some hypotheses the problem is equivalent to linear capacity allocation, within some extended input space that factors in the non-linearities. We introduce the notion of layer decoupling, which quantifies the degree to which a non-linear activation decouples its outputs, and show that it plays a central role in capacity allocation through layers. In the highly non-linear limit where decoupling is total, we show that the propagation of capacity throughout the layers follows a simple markovian rule, which turns into a diffusion PDE in the limit of deep networks with residual layers. This allows us to recover some known results about deep neural networks, such as the size of the effective receptive field, or why ResNets avoid the shattering problem. |
Tasks | |
Published | 2019-02-22 |
URL | http://arxiv.org/abs/1902.08572v2 |
PDF | http://arxiv.org/pdf/1902.08572v2.pdf |
PWC | https://paperswithcode.com/paper/capacity-allocation-through-neural-network |
Repo | |
Framework | |
Ranking architectures using meta-learning
Title | Ranking architectures using meta-learning |
Authors | Alina Dubatovka, Efi Kokiopoulou, Luciano Sbaiz, Andrea Gesmundo, Gabor Bartok, Jesse Berent |
Abstract | Neural architecture search has recently attracted lots of research efforts as it promises to automate the manual design of neural networks. However, it requires a large amount of computing resources and in order to alleviate this, a performance prediction network has been recently proposed that enables efficient architecture search by forecasting the performance of candidate architectures, instead of relying on actual model training. The performance predictor is task-aware taking as input not only the candidate architecture but also task meta-features and it has been designed to collectively learn from several tasks. In this work, we introduce a pairwise ranking loss for training a network able to rank candidate architectures for a new unseen task conditioning on its task meta-features. We present experimental results, showing that the ranking network is more effective in architecture search than the previously proposed performance predictor. |
Tasks | Meta-Learning, Neural Architecture Search |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11481v1 |
PDF | https://arxiv.org/pdf/1911.11481v1.pdf |
PWC | https://paperswithcode.com/paper/ranking-architectures-using-meta-learning |
Repo | |
Framework | |
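The ranking paper trains its network with a pairwise ranking loss rather than regressing accuracy directly. The abstract does not give the exact form, so the sketch below assumes a RankNet-style logistic loss over every pair of candidate architectures whose true accuracies differ; the task meta-feature conditioning is omitted, and `pairwise_ranking_loss` and `rank_candidates` are hypothetical names.

```python
import numpy as np

def pairwise_ranking_loss(scores, accuracies):
    """Mean logistic loss log(1 + exp(-(s_i - s_j))) over all pairs (i, j)
    where architecture i is truly more accurate than architecture j."""
    scores = np.asarray(scores, dtype=float)
    acc = np.asarray(accuracies, dtype=float)
    diff = scores[:, None] - scores[None, :]   # s_i - s_j for every pair
    better = acc[:, None] > acc[None, :]       # True where i should rank above j
    losses = np.logaddexp(0.0, -diff[better])  # numerically stable log(1 + exp(-d))
    return float(losses.mean()) if losses.size else 0.0

def rank_candidates(scores):
    """Indices of candidate architectures, best predicted score first."""
    return [int(i) for i in np.argsort(-np.asarray(scores))]
```

A pairwise objective only asks the network to order candidates correctly, which is all an architecture search needs when picking the best candidate for a new task.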
Drug-Drug Adverse Effect Prediction with Graph Co-Attention
Title | Drug-Drug Adverse Effect Prediction with Graph Co-Attention |
Authors | Andreea Deac, Yu-Hsiang Huang, Petar Veličković, Pietro Liò, Jian Tang |
Abstract | Complex or co-existing diseases are commonly treated using drug combinations, which can lead to higher risk of adverse side effects. The detection of polypharmacy side effects is usually done in Phase IV clinical trials, but there are still plenty which remain undiscovered when the drugs are put on the market. Such accidents have been affecting an increasing proportion of the population (15% in the US now) and it is thus of high interest to be able to predict the potential side effects as early as possible. Systematic combinatorial screening of possible drug-drug interactions (DDI) is challenging and expensive. However, the recent significant increases in data availability from pharmaceutical research and development efforts offer a novel paradigm for recovering relevant insights for DDI prediction. Accordingly, several recent approaches focus on curating massive DDI datasets (with millions of examples) and training machine learning models on them. Here we propose a neural network architecture able to set state-of-the-art results on this task—using the type of the side-effect and the molecular structure of the drugs alone—by leveraging a co-attentional mechanism. In particular, we show the importance of integrating joint information from the drug pairs early on when learning each drug’s representation. |
Tasks | |
Published | 2019-05-02 |
URL | http://arxiv.org/abs/1905.00534v1 |
PDF | http://arxiv.org/pdf/1905.00534v1.pdf |
PWC | https://paperswithcode.com/paper/drug-drug-adverse-effect-prediction-with |
Repo | |
Framework | |
ReQA: An Evaluation for End-to-End Answer Retrieval Models
Title | ReQA: An Evaluation for End-to-End Answer Retrieval Models |
Authors | Amin Ahmad, Noah Constant, Yinfei Yang, Daniel Cer |
Abstract | Popular QA benchmarks like SQuAD have driven progress on the task of identifying answer spans within a specific passage, with models now surpassing human performance. However, retrieving relevant answers from a huge corpus of documents is still a challenging problem, and places different requirements on the model architecture. There is growing interest in developing scalable answer retrieval models trained end-to-end, bypassing the typical document retrieval step. In this paper, we introduce Retrieval Question-Answering (ReQA), a benchmark for evaluating large-scale sentence-level answer retrieval models. We establish baselines using both neural encoding models as well as classical information retrieval techniques. We release our evaluation code to encourage further work on this challenging task. |
Tasks | Information Retrieval, Question Answering |
Published | 2019-07-10 |
URL | https://arxiv.org/abs/1907.04780v2 |
PDF | https://arxiv.org/pdf/1907.04780v2.pdf |
PWC | https://paperswithcode.com/paper/reqa-an-evaluation-for-end-to-end-answer |
Repo | |
Framework | |
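ReQA evaluates sentence-level answer retrieval: each question is scored against every candidate answer sentence in the corpus and judged by where the correct sentence lands. The sketch below assumes dot-product scoring of precomputed embeddings and reports mean reciprocal rank (MRR) and recall@k, which are standard metrics for this kind of benchmark; it is an illustration, not the released ReQA evaluation code, and `retrieval_metrics` is a hypothetical name.

```python
import numpy as np

def retrieval_metrics(question_emb, answer_emb, gold, ks=(1, 5, 10)):
    """Rank all candidate answers for each question by dot product and report
    mean reciprocal rank (MRR) and recall@k.

    gold[i] is the index of the correct answer sentence for question i."""
    scores = question_emb @ answer_emb.T    # (num_questions, num_answers)
    order = np.argsort(-scores, axis=1)     # highest-scoring answers first
    ranks = np.array([int(np.where(order[i] == g)[0][0]) + 1 for i, g in enumerate(gold)])
    metrics = {"mrr": float(np.mean(1.0 / ranks))}
    for k in ks:
        metrics[f"recall@{k}"] = float(np.mean(ranks <= k))
    return metrics
```

Since every question is scored against the whole answer pool, answer embeddings can be computed once and indexed, which is what makes end-to-end retrieval scalable.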
Masking by Moving: Learning Distraction-Free Radar Odometry from Pose Information
Title | Masking by Moving: Learning Distraction-Free Radar Odometry from Pose Information |
Authors | Dan Barnes, Rob Weston, Ingmar Posner |
Abstract | This paper presents an end-to-end radar odometry system which delivers robust, real-time pose estimates based on a learned embedding space free of sensing artefacts and distractor objects. The system deploys a fully differentiable, correlation-based radar matching approach. This provides the same level of interpretability as established scan-matching methods and allows for a principled derivation of uncertainty estimates. The system is trained in a (self-)supervised way using only previously obtained pose information as a training signal. Using 280km of urban driving data, we demonstrate that our approach outperforms the previous state-of-the-art in radar odometry by reducing errors by up to 68% whilst running an order of magnitude faster. |
Tasks | |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.03752v4 |
PDF | https://arxiv.org/pdf/1909.03752v4.pdf |
PWC | https://paperswithcode.com/paper/masking-by-moving-learning-distraction-free |
Repo | |
Framework | |
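The radar odometry abstract relies on correlation-based scan matching. The sketch below shows the generic building block: recovering an integer translation between two 2D scans as the peak of their FFT-computed cross-correlation. It is a simplified illustration under stated assumptions: scans are treated as periodic images, rotation is ignored, and the optional `mask_a`/`mask_b` arguments are hypothetical stand-ins for the learned distractor-suppressing masks the paper describes.

```python
import numpy as np

def estimate_translation(scan_a, scan_b, mask_a=None, mask_b=None):
    """Return the integer (row, col) circular shift s with scan_b ≈ np.roll(scan_a, s),
    taken as the peak of the FFT-based cross-correlation of the (optionally masked) scans."""
    if mask_a is not None:
        scan_a = scan_a * mask_a  # e.g. suppress returns from moving distractors
    if mask_b is not None:
        scan_b = scan_b * mask_b
    corr = np.fft.ifft2(np.fft.fft2(scan_b) * np.conj(np.fft.fft2(scan_a))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks in the upper half of an axis correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, corr.shape))
```

Because every step is differentiable apart from the final argmax, a softened peak (for example a softmax over `corr`) is the kind of change that would allow a mask to be trained from pose supervision.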
Bayesian Recurrent Framework for Missing Data Imputation and Prediction with Clinical Time Series
Title | Bayesian Recurrent Framework for Missing Data Imputation and Prediction with Clinical Time Series |
Authors | Yang Guo, Zhengyuan Liu, Pavitra Krishnswamy, Savitha Ramasamy |
Abstract | Real-world clinical time series data sets exhibit a high prevalence of missing values. Hence, there is an increasing interest in missing data imputation. Traditional statistical approaches impose constraints on the data-generating process and decouple imputation from prediction. Recent works propose recurrent neural network based approaches for missing data imputation and prediction with time series data. However, they generate deterministic outputs and neglect the inherent uncertainty. In this work, we introduce a unified Bayesian recurrent framework for simultaneous imputation and prediction on time series data sets. We evaluate our approach on two real-world mortality prediction tasks using the MIMIC-III and PhysioNet benchmark datasets. We demonstrate strong performance gains over state-of-the-art (SOTA) methods, and provide strategies to use the resulting probability distributions to better assess reliability of the imputations and predictions. |
Tasks | Imputation, Mortality Prediction, Time Series |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07572v2 |
PDF | https://arxiv.org/pdf/1911.07572v2.pdf |
PWC | https://paperswithcode.com/paper/bayesian-recurrent-framework-for-missing-data |
Repo | |
Framework | |
A Reproducible Comparison of RSSI Fingerprinting Localization Methods Using LoRaWAN
Title | A Reproducible Comparison of RSSI Fingerprinting Localization Methods Using LoRaWAN |
Authors | Grigorios G. Anagnostopoulos, Alexandros Kalousis |
Abstract | The use of fingerprinting localization techniques in outdoor IoT settings has started to gain popularity over the recent years. Communication signals of Low Power Wide Area Networks (LPWAN), such as LoRaWAN, are used to estimate the location of low power mobile devices. In this study, a publicly available dataset of LoRaWAN RSSI measurements is utilized to compare different machine learning methods and their accuracy in producing location estimates. The tested methods are: the k Nearest Neighbours method, the Extra Trees method and a neural network approach using a Multilayer Perceptron. To facilitate the reproducibility of tests and the comparability of results, the code and the train/validation/test split of the dataset used in this study have become available. The neural network approach was the method with the highest accuracy, achieving a mean error of 358 meters and a median error of 204 meters. |
Tasks | |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05085v1 |
PDF | https://arxiv.org/pdf/1908.05085v1.pdf |
PWC | https://paperswithcode.com/paper/a-reproducible-comparison-of-rssi |
Repo | |
Framework | |
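Of the methods compared in the LoRaWAN study, k Nearest Neighbours is the simplest to state: a query's RSSI vector is matched against stored fingerprints, and the positions of the closest ones are averaged. The sketch below assumes Euclidean distance in RSSI space, positions already projected to planar coordinates in metres, and gateways that did not receive a message filled with a very low RSSI value beforehand; `knn_localize` and `localization_errors` are hypothetical names, not the study's released code.

```python
import numpy as np

def knn_localize(train_rssi, train_xy, query_rssi, k=3):
    """Estimate each query position as the mean position of the k training
    fingerprints closest to it in RSSI space (Euclidean distance)."""
    d = np.linalg.norm(query_rssi[:, None, :] - train_rssi[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]  # (num_queries, k) fingerprint indices
    return train_xy[nearest].mean(axis=1)   # (num_queries, 2) estimated positions

def localization_errors(pred_xy, true_xy):
    """Mean and median Euclidean error, in the units of the coordinates (here metres)."""
    err = np.linalg.norm(pred_xy - true_xy, axis=1)
    return float(err.mean()), float(np.median(err))
```

Reporting both the mean and the median, as the study does, shows how much a few large errors inflate the average.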
One-view occlusion detection for stereo matching with a fully connected CRF model
Title | One-view occlusion detection for stereo matching with a fully connected CRF model |
Authors | Mikhail G. Mozerov, Joost van de Weijer |
Abstract | In this paper, we extend the standard belief propagation (BP) sequential technique proposed in the tree-reweighted sequential method to fully connected CRF models with the geodesic distance affinity. The proposed method has been applied to the stereo matching problem. We also propose a new approach to the BP marginal solution that we call one-view occlusion detection (OVOD). In contrast to the standard winner-takes-all (WTA) estimation, the proposed OVOD solution allows us to find occluded regions in the disparity map and simultaneously improve the matching result. As a result, we can perform only one energy minimization process and avoid the cost calculation for the second view and the left-right check procedure. We show that the OVOD approach considerably improves results for cost augmentation and energy minimization techniques in comparison with the standard one-view affinity space implementation. We apply our method to the Middlebury data set and reach state-of-the-art results, especially for the median, average and mean squared error metrics. |
Tasks | Stereo Matching, Stereo Matching Hand |
Published | 2019-01-12 |
URL | http://arxiv.org/abs/1901.03852v1 |
PDF | http://arxiv.org/pdf/1901.03852v1.pdf |
PWC | https://paperswithcode.com/paper/one-view-occlusion-detection-for-stereo |
Repo | |
Framework | |
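The OVOD abstract contrasts its solution with standard winner-takes-all (WTA) estimation. For reference, WTA simply picks, for each pixel, the disparity with the lowest matching cost. The sketch below shows that baseline only, assuming an absolute-difference cost on grayscale images and circular shifts (`np.roll` wraps at the image border) for brevity; it does not implement OVOD or the fully connected CRF.

```python
import numpy as np

def wta_disparity(left, right, max_disp):
    """Winner-takes-all: for each pixel, choose the disparity with the lowest matching cost."""
    costs = np.stack([
        np.abs(left - np.roll(right, d, axis=1))  # compares left[y, x] with right[y, x - d] (wraps at border)
        for d in range(max_disp + 1)
    ])                                             # (max_disp + 1, H, W) cost volume
    return np.argmin(costs, axis=0)
```

WTA makes an independent decision at every pixel and has no notion of occlusion, which is why methods such as a left-right consistency check, or the one-view OVOD alternative proposed here, are needed to find occluded regions.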
Learning Cross-Domain Representation with Multi-Graph Neural Network
Title | Learning Cross-Domain Representation with Multi-Graph Neural Network |
Authors | Yi Ouyang, Bin Guo, Xing Tang, Xiuqiang He, Jian Xiong, Zhiwen Yu |
Abstract | Learning effective embedding has been proved to be useful in many real-world problems, such as recommender systems, search ranking and online advertisement. However, one of the challenges is data sparsity in learning large-scale item embedding, as users’ historical behavior data are usually lacking or insufficient in an individual domain. In fact, user’s behaviors from different domains regarding the same items are usually relevant. Therefore, we can learn complete user behaviors to alleviate the sparsity using complementary information from correlated domains. It is intuitive to model users’ behaviors using graph, and graph neural networks (GNNs) have recently shown the great power for representation learning, which can be used to learn item embedding. However, it is challenging to transfer the information across domains and learn cross-domain representation using the existing GNNs. To address these challenges, in this paper, we propose a novel model - Deep Multi-Graph Embedding (DMGE) to learn cross-domain representation. Specifically, we first construct a multi-graph based on users’ behaviors from different domains, and then propose a multi-graph neural network to learn cross-domain representation in an unsupervised manner. Particularly, we present a multiple-gradient descent optimizer for efficiently training the model. We evaluate our approach on various large-scale real-world datasets, and the experimental results show that DMGE outperforms other state-of-art embedding methods in various tasks. |
Tasks | Graph Embedding, Recommendation Systems, Representation Learning |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10095v1 |
PDF | https://arxiv.org/pdf/1905.10095v1.pdf |
PWC | https://paperswithcode.com/paper/learning-cross-domain-representation-with |
Repo | |
Framework | |
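The DMGE abstract mentions a multiple-gradient descent optimizer for training across domains. The abstract gives no details, so the sketch below shows only the generic two-objective case of multiple-gradient descent: the minimum-norm convex combination of two gradients has a closed form, and stepping against it does not increase either objective to first order. Treating each domain's loss as one objective, and the name `min_norm_two`, are assumptions for illustration.

```python
import numpy as np

def min_norm_two(g1, g2):
    """Minimum-norm point in the convex hull of two gradients (two-objective case).

    Returns (alpha, d) with d = alpha * g1 + (1 - alpha) * g2 of smallest norm."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:  # identical gradients: every convex combination is the same vector
        return 0.5, g1.copy()
    alpha = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return alpha, alpha * g1 + (1.0 - alpha) * g2

# Usage sketch: params -= learning_rate * d, with g1 and g2 the gradients of two domain losses.
```

The clipping to [0, 1] keeps the combination convex; when one gradient is already the shorter end of the segment, the whole weight goes to it.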
DDTCDR: Deep Dual Transfer Cross Domain Recommendation
Title | DDTCDR: Deep Dual Transfer Cross Domain Recommendation |
Authors | Pan Li, Alexander Tuzhilin |
Abstract | Cross domain recommender systems have been increasingly valuable for helping consumers identify the most satisfying items from different categories. However, previously proposed cross-domain models did not take into account bidirectional latent relations between users and items. In addition, they do not explicitly model information of user and item features, while utilizing only user ratings information for recommendations. To address these concerns, in this paper we propose a novel approach to cross-domain recommendations based on the mechanism of dual learning that transfers information between two related domains in an iterative manner until the learning process stabilizes. We develop a novel latent orthogonal mapping to extract user preferences over multiple domains while preserving relations between users across different latent spaces. Combining with autoencoder approach to extract the latent essence of feature information, we propose Deep Dual Transfer Cross Domain Recommendation (DDTCDR) model to provide recommendations in respective domains. We test the proposed method on a large dataset containing three domains of movies, book and music items and demonstrate that it consistently and significantly outperforms several state-of-the-art baselines and also classical transfer learning approaches. |
Tasks | Recommendation Systems, Transfer Learning |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.05189v1 |
PDF | https://arxiv.org/pdf/1910.05189v1.pdf |
PWC | https://paperswithcode.com/paper/ddtcdr-deep-dual-transfer-cross-domain |
Repo | |
Framework | |
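DDTCDR's latent orthogonal mapping transfers user representations between two domains while preserving relations between users. One property that makes an orthogonal map attractive for dual (two-way) transfer is that its inverse is its transpose and it preserves inner products and distances. The sketch below demonstrates those properties with a random orthogonal matrix; the dimension, the QR construction, and the soft `orthogonality_penalty` are hypothetical illustrations, not the paper's training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Hypothetical orthogonal mapping X between the latent spaces of domain A and domain B.
X, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

def to_domain_b(user_a):
    return user_a @ X

def to_domain_a(user_b):
    # For an orthogonal X the inverse is the transpose, so the reverse transfer needs no extra parameters.
    return user_b @ X.T

def orthogonality_penalty(M):
    # Squared Frobenius norm of M^T M - I: a soft constraint one could add when learning M.
    return float(np.sum((M.T @ M - np.eye(M.shape[1])) ** 2))
```

Because distances between users are unchanged by `X`, users who are similar in one domain remain similar after transfer.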
An Empirical Evaluation of Adversarial Robustness under Transfer Learning
Title | An Empirical Evaluation of Adversarial Robustness under Transfer Learning |
Authors | Todor Davchev, Timos Korres, Stathi Fotiadis, Nick Antonopoulos, Subramanian Ramamoorthy |
Abstract | In this work, we evaluate adversarial robustness in the context of transfer learning from a source trained on CIFAR 100 to a target network trained on CIFAR 10. Specifically, we study the effects of using robust optimisation in the source and target networks. This allows us to identify transfer learning strategies under which adversarial defences are successfully retained, in addition to revealing potential vulnerabilities. We study the extent to which features learnt by a fast gradient sign method (FGSM) and its iterative alternative (PGD) can preserve their defence properties against black and white-box attacks under three different transfer learning strategies. We find that using PGD examples during training on the source task leads to more general robust features that are easier to transfer. Furthermore, under successful transfer, it achieves 5.2% more accuracy against white-box PGD attacks than suitable baselines. Overall, our empirical evaluations give insights on how well adversarial robustness under transfer learning can generalise. |
Tasks | Transfer Learning |
Published | 2019-05-07 |
URL | https://arxiv.org/abs/1905.02675v4 |
PDF | https://arxiv.org/pdf/1905.02675v4.pdf |
PWC | https://paperswithcode.com/paper/towards-evaluating-and-understanding-robust |
Repo | |
Framework | |
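The adversarial-robustness study builds its robust features from FGSM and PGD examples. The sketch below shows both attacks on a linear logistic model, where the input gradient of the binary cross-entropy has a closed form, (p - y) * w. It is a simplified illustration under stated assumptions: the logistic model stands in for the CIFAR networks, and clipping to a valid pixel range is omitted.

```python
import numpy as np

def loss_grad_wrt_input(x, y, w, b):
    """Gradient of the binary cross-entropy of a logistic model w.r.t. the input x (y in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return (p - y) * w

def fgsm(x, y, w, b, eps):
    """Fast gradient sign method: a single step of size eps along the sign of the input gradient."""
    return x + eps * np.sign(loss_grad_wrt_input(x, y, w, b))

def pgd(x, y, w, b, eps, step, iters):
    """Projected gradient descent attack: repeated signed steps, each followed by
    projection back into the L-infinity ball of radius eps around the clean input x."""
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(loss_grad_wrt_input(x_adv, y, w, b))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

PGD is the iterative alternative to FGSM mentioned in the abstract: it takes several smaller steps instead of one, while staying within the same perturbation budget eps.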