Paper Group AWR 314
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification. DAG-GNN: DAG Structure Learning with Graph Neural Networks. High-Fidelity Image Generation With Fewer Labels. Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction. Dual Graph Convolutional Network for Semantic Segmentation. Learning to Adapt for Stereo. …
MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification
Title | MultiDepth: Single-Image Depth Estimation via Multi-Task Regression and Classification |
Authors | Lukas Liebel, Marco Körner |
Abstract | We introduce MultiDepth, a novel training strategy and convolutional neural network (CNN) architecture that allows approaching single-image depth estimation (SIDE) as a multi-task problem. SIDE is an important part of road scene understanding and thus plays a vital role in advanced driver assistance systems and autonomous vehicles. The best results for the SIDE task so far have been achieved using deep CNNs. However, optimization of regression problems, such as estimating depth, is still a challenging task. For the related tasks of image classification and semantic segmentation, numerous CNN-based methods with robust training behavior have been proposed. Hence, in order to overcome the notorious instability and slow convergence of depth value regression during training, MultiDepth makes use of depth interval classification as an auxiliary task. The auxiliary task can be disabled at test time to predict continuous depth values using the main regression branch more efficiently. We applied MultiDepth to road scenes and present results on the KITTI depth prediction dataset. In experiments, we show that end-to-end multi-task learning with both regression and classification considerably improves training and yields more accurate results. |
Tasks | Autonomous Vehicles, Depth Estimation, Image Classification, Multi-Task Learning, Scene Understanding, Semantic Segmentation |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.11111v1 |
https://arxiv.org/pdf/1907.11111v1.pdf | |
PWC | https://paperswithcode.com/paper/multidepth-single-image-depth-estimation-via |
Repo | https://github.com/lukasliebel/MultiDepth |
Framework | tf |
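
To make the core idea concrete, here is a minimal PyTorch sketch (not the authors' released TensorFlow model) of a shared encoder with a depth-regression head and an auxiliary depth-bin classification head. The toy encoder, the bin count, and the fixed equal loss weighting are illustrative assumptions; the paper's actual backbone and task weighting differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskDepthNet(nn.Module):
    def __init__(self, num_bins=64, max_depth=80.0):
        super().__init__()
        self.max_depth, self.num_bins = max_depth, num_bins
        # Toy shared encoder; the paper uses a full CNN backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.reg_head = nn.Conv2d(64, 1, 1)          # continuous depth
        self.cls_head = nn.Conv2d(64, num_bins, 1)   # auxiliary depth bins

    def forward(self, x, with_aux=True):
        feats = self.encoder(x)
        depth = F.softplus(self.reg_head(feats))     # keep depth positive
        if with_aux:
            return depth, self.cls_head(feats)       # plus logits over bins
        return depth                                 # aux branch off at test time

def joint_loss(depth_pred, bin_logits, depth_gt, num_bins=64, max_depth=80.0):
    reg = F.l1_loss(depth_pred, depth_gt)
    # Discretize ground truth into depth intervals for the auxiliary task.
    bins = (depth_gt / max_depth * num_bins).long().clamp(0, num_bins - 1)
    cls = F.cross_entropy(bin_logits, bins.squeeze(1))
    return reg + cls                                 # equal weighting (assumption)

net = MultiTaskDepthNet()
img, gt = torch.randn(2, 3, 64, 64), torch.rand(2, 1, 64, 64) * 80.0
loss = joint_loss(*net(img), gt)
```

At test time, `net(img, with_aux=False)` skips the classification branch and returns only the continuous depth map, matching the paper's idea of disabling the auxiliary task for efficient inference.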
DAG-GNN: DAG Structure Learning with Graph Neural Networks
Title | DAG-GNN: DAG Structure Learning with Graph Neural Networks |
Authors | Yue Yu, Jie Chen, Tian Gao, Mo Yu |
Abstract | Learning a faithful directed acyclic graph (DAG) from samples of a joint distribution is a challenging combinatorial problem, owing to the intractable search space superexponential in the number of graph nodes. A recent breakthrough formulates the problem as a continuous optimization with a structural constraint that ensures acyclicity (Zheng et al., 2018). The authors apply the approach to the linear structural equation model (SEM) and the least-squares loss function that are statistically well justified but nevertheless limited. Motivated by the widespread success of deep learning that is capable of capturing complex nonlinear mappings, in this work we propose a deep generative model and apply a variant of the structural constraint to learn the DAG. At the heart of the generative model is a variational autoencoder parameterized by a novel graph neural network architecture, which we coin DAG-GNN. In addition to the richer capacity, an advantage of the proposed model is that it naturally handles discrete variables as well as vector-valued ones. We demonstrate that on synthetic data sets, the proposed method learns more accurate graphs for nonlinearly generated samples; and on benchmark data sets with discrete variables, the learned graphs are reasonably close to the global optima. The code is available at \url{https://github.com/fishmoon1234/DAG-GNN}. |
Tasks | |
Published | 2019-04-22 |
URL | http://arxiv.org/abs/1904.10098v1 |
http://arxiv.org/pdf/1904.10098v1.pdf | |
PWC | https://paperswithcode.com/paper/dag-gnn-dag-structure-learning-with-graph |
Repo | https://github.com/fishmoon1234/DAG-GNN |
Framework | pytorch |
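
The continuous acyclicity constraint at the core of this line of work is easy to state in code. Below is a small NumPy sketch of the polynomial variant $h(A) = \mathrm{tr}[(I + \alpha A \circ A)^d] - d$, which is zero exactly when the weighted adjacency matrix $A$ encodes a DAG; the choice $\alpha = 1/d$ is an assumption for illustration.

```python
import numpy as np

def acyclicity(A, alpha=None):
    """h(A) = tr[(I + alpha * A∘A)^d] - d; zero iff A has no directed cycle."""
    d = A.shape[0]
    if alpha is None:
        alpha = 1.0 / d
    M = np.eye(d) + alpha * (A * A)      # elementwise square removes edge signs
    return np.trace(np.linalg.matrix_power(M, d)) - d

cycle = np.array([[0.0, 1.0], [1.0, 0.0]])   # 2-cycle: not a DAG
chain = np.array([[0.0, 1.0], [0.0, 0.0]])   # 1 -> 2: a DAG
print(acyclicity(cycle) > 0, np.isclose(acyclicity(chain), 0.0))  # True True
```

During training, this scalar is driven to zero with an augmented-Lagrangian scheme while the VAE's evidence lower bound is maximized.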
High-Fidelity Image Generation With Fewer Labels
Title | High-Fidelity Image Generation With Fewer Labels |
Authors | Mario Lucic, Michael Tschannen, Marvin Ritter, Xiaohua Zhai, Olivier Bachem, Sylvain Gelly |
Abstract | Deep generative models are becoming a cornerstone of modern machine learning. Recent work on conditional generative adversarial networks has shown that learning complex, high-dimensional distributions over natural images is within reach. While the latest models are able to generate high-fidelity, diverse natural images at high resolution, they rely on a vast quantity of labeled data. In this work we demonstrate how one can benefit from recent work on self- and semi-supervised learning to outperform the state of the art in both unsupervised ImageNet synthesis and the conditional setting. In particular, the proposed approach is able to match the sample quality (as measured by FID) of the current state-of-the-art conditional model BigGAN on ImageNet using only 10% of the labels, and to outperform it using 20% of the labels. |
Tasks | Conditional Image Generation, Image Generation |
Published | 2019-03-06 |
URL | https://arxiv.org/abs/1903.02271v2 |
https://arxiv.org/pdf/1903.02271v2.pdf | |
PWC | https://paperswithcode.com/paper/high-fidelity-image-generation-with-fewer |
Repo | https://github.com/google/compare_gan |
Framework | tf |
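
The semi-supervised half of the approach can be sketched compactly: train a classifier on the small labeled subset, let it pseudo-label the rest, and condition the GAN on the inferred labels. The PyTorch snippet below shows only that pseudo-labeling step with a stand-in classifier; the paper additionally leverages self-supervised representation learning, and the GAN itself is BigGAN-scale.

```python
import torch
import torch.nn as nn

def infer_labels(classifier, unlabeled_batches):
    """Assign pseudo-labels to unlabeled images with a (pre)trained classifier."""
    classifier.eval()
    pseudo = []
    with torch.no_grad():
        for x in unlabeled_batches:
            pseudo.append(classifier(x).argmax(dim=1))
    return torch.cat(pseudo)

# Untrained stand-in classifier, just to exercise the function.
clf = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
batches = [torch.randn(8, 3, 32, 32) for _ in range(2)]
labels = infer_labels(clf, batches)    # shape (16,)
# The conditional GAN then consumes (z, pseudo_label) pairs, with a
# projection discriminator conditioned on the same inferred labels.
```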
Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction
Title | Fi-GNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction |
Authors | Zekun Li, Zeyu Cui, Shu Wu, Xiaoyu Zhang, Liang Wang |
Abstract | Click-through rate (CTR) prediction is an essential task in web applications such as online advertising and recommender systems, whose features are usually in multi-field form. The key to this task is to model feature interactions among different feature fields. Recently proposed deep learning based models follow a general paradigm: raw sparse multi-field input features are first mapped into dense field embedding vectors, and then simply concatenated together and fed into deep neural networks (DNN) or other specifically designed networks to learn high-order feature interactions. However, the simple \emph{unstructured combination} of feature fields will inevitably limit the capability to model sophisticated interactions among different fields in a sufficiently flexible and explicit fashion. In this work, we propose to represent the multi-field features intuitively in a graph structure, where each node corresponds to a feature field and different fields can interact through edges. The task of modeling feature interactions can thus be converted to modeling node interactions on the corresponding graph. To this end, we design a novel model, Feature Interaction Graph Neural Networks (Fi-GNN). Taking advantage of the strong representative power of graphs, our proposed model can not only model sophisticated feature interactions in a flexible and explicit fashion, but also provide good model explanations for CTR prediction. Experimental results on two real-world datasets show its superiority over state-of-the-art methods. |
Tasks | Click-Through Rate Prediction, Recommendation Systems |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.05552v1 |
https://arxiv.org/pdf/1910.05552v1.pdf | |
PWC | https://paperswithcode.com/paper/fi-gnn-modeling-feature-interactions-via |
Repo | https://github.com/CRIPAC-DIG/Fi_GNNs |
Framework | tf |
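
A minimal PyTorch sketch of the field-graph message-passing pattern: each feature field is a node, every node receives transformed messages from all other fields, and a GRU updates the node state, as in gated graph neural networks. The shared edge transform and the toy scoring head are simplifications; the released model uses attentional edge weights and an attention-based predictor.

```python
import torch
import torch.nn as nn

class FieldGraphLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim, bias=False)   # shared edge transform
        self.gru = nn.GRUCell(dim, dim)              # gated node-state update

    def forward(self, h):                            # h: (batch, fields, dim)
        b, f, d = h.shape
        t = self.msg(h)
        m = t.sum(dim=1, keepdim=True) - t           # messages from all other fields
        return self.gru(m.reshape(b * f, d),
                        h.reshape(b * f, d)).reshape(b, f, d)

fields = torch.randn(4, 10, 16)          # 10 embedded feature fields
out = FieldGraphLayer(16)(fields)        # interactions propagated on the graph
ctr_logit = out.sum(dim=(1, 2))          # toy scoring head; the paper's is richer
```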
Dual Graph Convolutional Network for Semantic Segmentation
Title | Dual Graph Convolutional Network for Semantic Segmentation |
Authors | Li Zhang, Xiangtai Li, Anurag Arnab, Kuiyuan Yang, Yunhai Tong, Philip H. S. Torr |
Abstract | Exploiting long-range contextual information is key for pixel-wise prediction tasks such as semantic segmentation. In contrast to previous work that uses multi-scale feature fusion or dilated convolutions, we propose a novel graph-convolutional network (GCN) to address this problem. Our Dual Graph Convolutional Network (DGCNet) captures the global context of the input features by modelling two orthogonal graphs in a single framework. The first component models spatial relationships between pixels in the image, whilst the second models interdependencies along the channel dimensions of the network’s feature map. This is done efficiently by projecting the feature into a new, lower-dimensional space where all pairwise interactions can be modelled, before reprojecting into the original space. Our simple method provides substantial benefits over a strong baseline and achieves state-of-the-art results on both the Cityscapes (82.0% mean IoU) and Pascal Context (53.7% mean IoU) datasets. |
Tasks | Semantic Segmentation |
Published | 2019-09-13 |
URL | https://arxiv.org/abs/1909.06121v2 |
https://arxiv.org/pdf/1909.06121v2.pdf | |
PWC | https://paperswithcode.com/paper/dual-graph-convolutional-network-for-semantic |
Repo | https://github.com/lxtGH/GALD-Net |
Framework | pytorch |
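
The "project, reason, reproject" pattern the paper builds on can be sketched in a few lines of PyTorch: pixels are softly assigned to a small set of graph nodes, a graph convolution models all pairwise interactions in that low-dimensional space, and the result is projected back. Channel and node counts are illustrative assumptions; DGCNet runs one such branch over space and another over channels.

```python
import torch
import torch.nn as nn

class ProjectedGCN(nn.Module):
    def __init__(self, channels=64, nodes=16):
        super().__init__()
        self.assign = nn.Conv2d(channels, nodes, 1)   # soft pixel->node assignment
        self.adj = nn.Parameter(torch.eye(nodes))     # learnable node adjacency
        self.update = nn.Linear(channels, channels)   # node feature transform

    def forward(self, x):                             # x: (b, c, h, w)
        b, c, h, w = x.shape
        a = self.assign(x).flatten(2).softmax(dim=-1) # (b, nodes, h*w)
        v = a @ x.flatten(2).transpose(1, 2)          # project: (b, nodes, c)
        v = torch.relu(self.update(self.adj @ v))     # reason on the small graph
        y = (a.transpose(1, 2) @ v).transpose(1, 2)   # reproject to pixels
        return x + y.reshape(b, c, h, w)              # residual fusion

feat = torch.randn(2, 64, 32, 32)
out = ProjectedGCN()(feat)                            # same shape, global context mixed in
```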
Learning to Adapt for Stereo
Title | Learning to Adapt for Stereo |
Authors | Alessio Tonioni, Oscar Rahnama, Thomas Joy, Luigi Di Stefano, Thalaiyasingam Ajanthan, Philip H. S. Torr |
Abstract | Real world applications of stereo depth estimation require models that are robust to dynamic variations in the environment. Even though deep learning based stereo methods are successful, they often fail to generalize to unseen variations in the environment, making them less suitable for practical applications such as autonomous driving. In this work, we introduce a “learning-to-adapt” framework that enables deep stereo methods to continuously adapt to new target domains in an unsupervised manner. Specifically, our approach incorporates the adaptation procedure into the learning objective to obtain a base set of parameters that are better suited for unsupervised online adaptation. To further improve the quality of the adaptation, we learn a confidence measure that effectively masks the errors introduced during the unsupervised adaptation. We evaluate our method on synthetic and real-world stereo datasets, and our experiments show that learning-to-adapt is indeed beneficial for online adaptation on vastly different domains. |
Tasks | Autonomous Driving, Depth Estimation, Stereo Depth Estimation |
Published | 2019-04-05 |
URL | http://arxiv.org/abs/1904.02957v1 |
http://arxiv.org/pdf/1904.02957v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-adapt-for-stereo |
Repo | https://github.com/CVLAB-Unibo/Learning2AdaptForStereo |
Framework | tf |
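
At its core this is a MAML-style objective: adapt on a frame with an unsupervised loss, then evaluate the adapted weights with a supervised loss and backpropagate through the adaptation. A schematic PyTorch (>= 2.0) sketch follows; all interfaces are assumptions rather than the released code, and the learned confidence mask is assumed to be folded into `unsup_loss`.

```python
import torch
from torch.func import functional_call

def meta_step(model, seq, unsup_loss, sup_loss, inner_lr=1e-4):
    """One meta-training step over a short sequence of (left, right, gt) samples."""
    params = dict(model.named_parameters())
    meta_loss = 0.0
    for left, right, gt in seq:
        # Inner step: unsupervised adaptation (e.g. a confidence-masked
        # photometric loss), exactly as it would run at deployment time.
        pred = functional_call(model, params, (left, right))
        grads = torch.autograd.grad(unsup_loss(pred, left, right),
                                    tuple(params.values()), create_graph=True)
        params = {n: p - inner_lr * g
                  for (n, p), g in zip(params.items(), grads)}
        # Outer objective: the adapted network should match ground truth.
        pred = functional_call(model, params, (left, right))
        meta_loss = meta_loss + sup_loss(pred, gt)
    return meta_loss          # backpropagates through the inner updates
```

Calling `meta_loss.backward()` then nudges the base parameters so that a few unsupervised gradient steps suffice in a new domain.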
A Novel Independent RNN Approach to Classification of Seizures against Non-seizures
Title | A Novel Independent RNN Approach to Classification of Seizures against Non-seizures |
Authors | Xinghua Yao, Qiang Cheng, Guo-Qiang Zhang |
Abstract | In current clinical practice, electroencephalograms (EEG) are reviewed and analyzed by trained neurologists to provide support for therapeutic decisions. Manual reviews can be laborious and error-prone. Automatic and accurate seizure/non-seizure classification methods are therefore desirable. A critical challenge is that seizure morphologies exhibit considerable variability. In order to capture essential seizure features, this paper leverages an emerging deep learning model, the independently recurrent neural network (IndRNN), to construct a new approach for seizure/non-seizure classification. This new approach gradually expands the time scales, thereby extracting temporal and spatial features from the local time duration to the entire record. Evaluations are conducted with cross-validation experiments across subjects on the noisy CHB-MIT data. Experimental results demonstrate that the proposed approach outperforms the current state-of-the-art methods. In addition, we explore how the segment length affects classification performance. Thirteen different segment lengths are assessed, showing that classification performance varies with segment length by a margin of more than 4%. Thus, segment length is an important factor influencing classification performance. |
Tasks | EEG |
Published | 2019-03-22 |
URL | http://arxiv.org/abs/1903.09326v1 |
http://arxiv.org/pdf/1903.09326v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-independent-rnn-approach-to |
Repo | https://github.com/gabi-a/EEG-Literature |
Framework | none |
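
The building block the approach leverages is the IndRNN cell, where the recurrent weight is an elementwise vector rather than a matrix, so each hidden unit carries its own independent memory and deep stacks over long sequences stay trainable. A minimal PyTorch sketch (sizes are illustrative; the paper stacks such layers over expanding time scales):

```python
import torch
import torch.nn as nn

class IndRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.w = nn.Linear(input_size, hidden_size)
        self.u = nn.Parameter(torch.ones(hidden_size))   # per-unit recurrence

    def forward(self, x, h):
        # h_t = relu(W x_t + u * h_{t-1}); '*' is elementwise, not a matmul.
        return torch.relu(self.w(x) + self.u * h)

cell = IndRNNCell(input_size=23, hidden_size=64)   # e.g. 23 EEG channels
h = torch.zeros(8, 64)
for x_t in torch.randn(100, 8, 23):                # 100 time steps, batch of 8
    h = cell(x_t, h)
```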
SleepEEGNet: Automated Sleep Stage Scoring with Sequence to Sequence Deep Learning Approach
Title | SleepEEGNet: Automated Sleep Stage Scoring with Sequence to Sequence Deep Learning Approach |
Authors | Sajad Mousavi, Fatemeh Afghah, U. Rajendra Acharya |
Abstract | The electroencephalogram (EEG) is a common base signal used to monitor brain activity and diagnose sleep disorders. Manual sleep stage scoring is a time-consuming task for sleep experts and is limited by inter-rater reliability. In this paper, we propose an automatic sleep stage annotation method called SleepEEGNet using a single-channel EEG signal. SleepEEGNet is composed of deep convolutional neural networks (CNNs) to extract time-invariant features and frequency information, and a sequence to sequence model to capture the complex and long short-term context dependencies between sleep epochs and scores. In addition, to reduce the effect of the class imbalance problem present in the available sleep datasets, we applied novel loss functions to enforce an equal misclassification error for each sleep stage while training the network. We evaluated the proposed method on different single-EEG channels (i.e., the Fpz-Cz and Pz-Oz EEG channels) from the Physionet Sleep-EDF datasets published in 2013 and 2018. The evaluation results demonstrate that the proposed method achieved the best annotation performance compared to the current literature, with an overall accuracy of 84.26%, a macro F1-score of 79.66% and a Cohen’s Kappa coefficient of 0.79. Our model is ready to be tested on further sleep EEG signals and to aid sleep specialists in arriving at an accurate diagnosis. The source code is available at https://github.com/SajadMo/SleepEEGNet. |
Tasks | EEG, Sleep Stage Detection |
Published | 2019-03-05 |
URL | http://arxiv.org/abs/1903.02108v1 |
http://arxiv.org/pdf/1903.02108v1.pdf | |
PWC | https://paperswithcode.com/paper/sleepeegnet-automated-sleep-stage-scoring |
Repo | https://github.com/SajadMo/SleepEEGNet |
Framework | tf |
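
The loss functions aim to give each sleep stage an equal misclassification error despite heavy class imbalance. As a hedged illustration of that intent (not the paper's exact formulation), inverse-frequency class weights in the cross-entropy have the same effect:

```python
import torch
import torch.nn.functional as F

def balanced_ce(logits, targets, num_classes=5):
    """Cross-entropy with inverse-frequency class weights computed per batch."""
    counts = torch.bincount(targets, minlength=num_classes).float()
    weights = counts.sum() / (num_classes * counts.clamp(min=1))
    return F.cross_entropy(logits, targets, weight=weights)

logits = torch.randn(32, 5)                # 5 sleep stages (W, N1-N3, REM)
targets = torch.randint(0, 5, (32,))
loss = balanced_ce(logits, targets)        # rare stages weigh as much as common ones
```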
Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models
Title | Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models |
Authors | Yuge Shi, N. Siddharth, Brooks Paige, Philip H. S. Torr |
Abstract | Learning generative models that span multiple data modalities, such as vision and language, is often motivated by the desire to learn more useful, generalisable representations that faithfully capture common underlying factors between the modalities. In this work, we characterise successful learning of such models as the fulfillment of four criteria: i) implicit latent decomposition into shared and private subspaces, ii) coherent joint generation over all modalities, iii) coherent cross-generation across individual modalities, and iv) improved model learning for individual modalities through multi-modal integration. Here, we propose a mixture-of-experts multimodal variational autoencoder (MMVAE) to learn generative models on different sets of modalities, including a challenging image-language dataset, and demonstrate its ability to satisfy all four criteria, both qualitatively and quantitatively. |
Tasks | |
Published | 2019-11-08 |
URL | https://arxiv.org/abs/1911.03393v1 |
https://arxiv.org/pdf/1911.03393v1.pdf | |
PWC | https://paperswithcode.com/paper/variational-mixture-of-experts-autoencoders |
Repo | https://github.com/iffsid/mmvae |
Framework | pytorch |
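
The model's joint posterior is a mixture of experts over the unimodal encoders, $q(z \mid x_{1:M}) = \frac{1}{M}\sum_m q_m(z \mid x_m)$, so a joint sample is drawn by picking a modality uniformly and sampling from its Gaussian encoder. A minimal PyTorch sketch with stand-in encoder outputs:

```python
import torch

def sample_moe_posterior(mus, logvars):
    """mus, logvars: lists of (batch, dim) Gaussian params, one per modality."""
    m = torch.randint(len(mus), (1,)).item()        # pick an expert uniformly
    std = (0.5 * logvars[m]).exp()
    return mus[m] + std * torch.randn_like(std)     # reparameterized draw

mus = [torch.zeros(4, 8), torch.ones(4, 8)]         # two modalities' encoders
logvars = [torch.zeros(4, 8), torch.zeros(4, 8)]
z = sample_moe_posterior(mus, logvars)              # (4, 8) joint latent sample
```

Because each expert sees only its own modality, shared structure must live where the experts agree, which is what drives the coherent joint and cross-modal generation the paper evaluates.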
Kernelized Bayesian Softmax for Text Generation
Title | Kernelized Bayesian Softmax for Text Generation |
Authors | Ning Miao, Hao Zhou, Chengqi Zhao, Wenxian Shi, Lei Li |
Abstract | Neural models for text generation require a softmax layer with proper token embeddings during the decoding phase. Most existing approaches adopt a single point embedding for each token. However, a word may have multiple senses depending on context, and some of these senses can be quite distinct. In this paper, we propose KerBS, a novel approach for learning better embeddings for text generation. KerBS embodies two advantages: (a) it employs a Bayesian composition of embeddings for words with multiple senses; (b) it is adaptive to the semantic variance of words and robust to rare sentence contexts by imposing learned kernels to capture the closeness of words (senses) in the embedding space. Empirical studies show that KerBS significantly boosts the performance of several text generation tasks. |
Tasks | Text Generation |
Published | 2019-11-01 |
URL | https://arxiv.org/abs/1911.00274v1 |
https://arxiv.org/pdf/1911.00274v1.pdf | |
PWC | https://paperswithcode.com/paper/kernelized-bayesian-softmax-for-text |
Repo | https://github.com/NingMiao/KerBS |
Framework | tf |
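
A simplified PyTorch sketch of the multi-sense output-layer idea: each token owns several sense embeddings, and its logit aggregates the senses with a log-sum-exp so that the sense best matching the decoder state dominates. KerBS's actual Bayesian composition and learned kernels are richer than this; all sizes are illustrative.

```python
import torch
import torch.nn as nn

class MultiSenseSoftmax(nn.Module):
    def __init__(self, vocab, dim, senses=3):
        super().__init__()
        # One embedding per (token, sense) pair.
        self.emb = nn.Parameter(torch.randn(vocab, senses, dim) * 0.01)

    def forward(self, h):                    # h: (batch, dim) decoder state
        scores = torch.einsum('bd,vsd->bvs', h, self.emb)  # per-sense scores
        return scores.logsumexp(dim=-1)      # (batch, vocab) token logits

layer = MultiSenseSoftmax(vocab=10000, dim=256)
logits = layer(torch.randn(4, 256))
probs = logits.softmax(dim=-1)               # decoding distribution
```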
The Multi-Lane Capsule Network (MLCN)
Title | The Multi-Lane Capsule Network (MLCN) |
Authors | Vanderson Martins do Rosario, Edson Borin, Mauricio Breternitz Jr |
Abstract | We introduce Multi-Lane Capsule Networks (MLCN), a separable and resource-efficient organization of capsule networks (CapsNet) that allows parallel processing while achieving high accuracy at reduced cost. An MLCN is composed of a number of (distinct) parallel lanes, each contributing to a dimension of the result, trained using the routing-by-agreement organization of CapsNet. Our results indicate similar accuracy with a much reduced cost in number of parameters for the Fashion-MNIST and CIFAR-10 datasets. They also indicate that MLCN outperforms the original CapsNet when using a proposed novel configuration for the lanes. MLCN also has faster training and inference times, being more than two-fold faster than the original CapsNet on the same accelerator. |
Tasks | |
Published | 2019-02-22 |
URL | http://arxiv.org/abs/1902.08431v1 |
http://arxiv.org/pdf/1902.08431v1.pdf | |
PWC | https://paperswithcode.com/paper/the-multi-lane-capsule-network-mlcn |
Repo | https://github.com/vandersonmr/lanes-capsnet |
Framework | tf |
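
The lane structure can be sketched as follows: the feature maps are split channel-wise into independent lanes, each lane runs its own capsule stack (reduced here to a toy stand-in), and each contributes a slice of the final capsule dimensions, which is what makes the lanes separable and parallelizable. All sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyLane(nn.Module):
    """Stand-in for one lane's primary-capsule + routing stack."""
    def __init__(self, in_ch, out_dims):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(in_ch, 8, 3), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(8, out_dims))

    def forward(self, x):
        return self.net(x)

lanes = nn.ModuleList([ToyLane(in_ch=16, out_dims=4) for _ in range(4)])
x = torch.randn(2, 64, 12, 12)
chunks = x.chunk(4, dim=1)                       # one feature slice per lane
caps = torch.cat([lane(c) for lane, c in zip(lanes, chunks)], dim=-1)
print(caps.shape)                                # (2, 16): 4 lanes x 4 dims each
```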
Progressive Stochastic Binarization of Deep Networks
Title | Progressive Stochastic Binarization of Deep Networks |
Authors | David Hartmann, Michael Wand |
Abstract | A plethora of recent research has focused on improving the memory footprint and inference speed of deep networks by reducing the complexity of (i) numerical representations (for example, by deterministic or stochastic quantization) and (ii) arithmetic operations (for example, by binarization of weights). We propose a stochastic binarization scheme for deep networks that allows for efficient inference on hardware by restricting itself to additions of small integers and fixed shifts. Unlike previous approaches, the underlying randomized approximation is progressive, thus permitting an adaptive control of the accuracy of each operation at run-time. In a low-precision setting, we match the accuracy of previous binarized approaches. Our representation is unbiased - it approaches continuous computation with increasing sample size. In a high-precision regime, the computational costs are competitive with previous quantization schemes. Progressive stochastic binarization also permits localized, dynamic accuracy control within a single network, thereby providing a new tool for adaptively focusing computational attention. We evaluate our method on networks of various architectures, already pretrained on ImageNet. With representational costs comparable to previous schemes, we obtain accuracies close to the original floating point implementation. This includes pruned networks, except for the known special case of certain types of separated convolutions. By focusing computational attention using progressive sampling, we reduce inference costs on ImageNet further by up to 33% (before network pruning). |
Tasks | Network Pruning, Quantization |
Published | 2019-04-03 |
URL | http://arxiv.org/abs/1904.02205v1 |
http://arxiv.org/pdf/1904.02205v1.pdf | |
PWC | https://paperswithcode.com/paper/progressive-stochastic-binarization-of-deep |
Repo | https://github.com/JGU-VC/progressive_stochastic_binarization |
Framework | tf |
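
The essential mechanism is an unbiased stochastic binarization whose accuracy is controlled at run time by the number of samples averaged. A NumPy sketch under the assumption of weights pre-scaled to $[-1, 1]$ (the paper's scheme additionally restricts arithmetic to small-integer additions and fixed shifts):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_binarize(w, samples=1):
    """Unbiased binary approximation of w in [-1, 1]: E[output] == w."""
    p = (w + 1.0) / 2.0                       # P(sample = +1)
    draws = np.where(rng.random((samples,) + w.shape) < p, 1.0, -1.0)
    return draws.mean(axis=0)                 # more samples -> higher accuracy

w = rng.uniform(-1, 1, size=1000)
for k in (1, 16, 256):
    err = np.abs(stochastic_binarize(w, k) - w).mean()
    print(f"{k:4d} samples: mean |error| = {err:.3f}")   # shrinks ~ 1/sqrt(k)
```

This is what "progressive" means here: the same representation serves low-precision fast paths and, with more samples, approaches full-precision computation, so accuracy can be reallocated dynamically within one network.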
Amortized Population Gibbs Samplers with Neural Sufficient Statistics
Title | Amortized Population Gibbs Samplers with Neural Sufficient Statistics |
Authors | Hao Wu, Heiko Zimmermann, Eli Sennesh, Tuan Anh Le, Jan-Willem van de Meent |
Abstract | Amortized variational methods have proven difficult to scale to structured problems, such as inferring positions of multiple objects from video images. We develop amortized population Gibbs (APG) samplers, a class of scalable methods that frames structured variational inference as adaptive importance sampling. APG samplers construct high-dimensional proposals by iterating over updates to lower-dimensional blocks of variables. We train each conditional proposal by minimizing the inclusive KL divergence with respect to the conditional posterior. To appropriately account for the size of the input data, we develop a new parameterization in terms of neural sufficient statistics. Experiments show that APG samplers can train highly structured deep generative models in an unsupervised manner, and achieve substantial improvements in inference accuracy relative to standard autoencoding variational methods. |
Tasks | |
Published | 2019-11-04 |
URL | https://arxiv.org/abs/1911.01382v2 |
https://arxiv.org/pdf/1911.01382v2.pdf | |
PWC | https://paperswithcode.com/paper/amortized-population-gibbs-samplers-with |
Repo | https://github.com/hao-w/apg-samplers |
Framework | pytorch |
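
Schematically, an APG sweep revisits one block of latent variables at a time, proposes from a learned conditional, and uses importance weights to resample the particle population. The PyTorch sketch below simplifies the incremental weight (the paper's version also accounts for the reverse-proposal density), and all interfaces are assumptions:

```python
import torch

def apg_sweep(x, z, blocks, propose, log_joint, sweeps=2):
    """z: dict of latent tensors, each with a leading particle dimension."""
    for _ in range(sweeps):
        for b in blocks:                          # revisit one block at a time
            cand, log_q = propose(b, x, z)        # learned conditional proposal
            z_new = {**z, b: cand}
            # Simplified incremental importance weight per particle.
            log_w = log_joint(x, z_new) - log_joint(x, z) - log_q
            idx = torch.multinomial(torch.softmax(log_w, dim=0),
                                    num_samples=log_w.numel(), replacement=True)
            z = {k: v[idx] for k, v in z_new.items()}   # resample population
    return z
```

Training minimizes the inclusive KL between each conditional posterior and its proposal, which is what lets a few learned sweeps stand in for a long Gibbs chain.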
Stability of Graph Scattering Transforms
Title | Stability of Graph Scattering Transforms |
Authors | Fernando Gama, Joan Bruna, Alejandro Ribeiro |
Abstract | Scattering transforms are non-trainable deep convolutional architectures that exploit the multi-scale resolution of a wavelet filter bank to obtain an appropriate representation of data. More importantly, they are provably invariant to translations and stable to perturbations that are close to translations. This stability property endows the scattering transform with robustness to small changes in the metric domain of the data. For network data, however, regular convolutions no longer apply, since the data domain has an irregular structure given by the network topology. In this work, we extend scattering transforms to network data by using multiresolution graph wavelets, whose computation can be obtained by means of graph convolutions. Furthermore, we prove that the resulting graph scattering transforms are stable to metric perturbations of the underlying network. This renders graph scattering transforms robust to changes in the network topology, making them particularly useful for transfer learning, topology estimation and time-varying graphs. |
Tasks | Transfer Learning |
Published | 2019-06-11 |
URL | https://arxiv.org/abs/1906.04784v1 |
https://arxiv.org/pdf/1906.04784v1.pdf | |
PWC | https://paperswithcode.com/paper/stability-of-graph-scattering-transforms |
Repo | https://github.com/alelab-upenn/graph-scattering-transforms |
Framework | pytorch |
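
A minimal NumPy sketch of a graph scattering transform: dyadic diffusion wavelets built from a lazy-diffusion operator, cascaded with a modulus nonlinearity and graph averaging, with no trained coefficients anywhere. The wavelet construction shown is one standard choice among several the framework admits.

```python
import numpy as np

def diffusion_wavelets(A, scales=3):
    """Dyadic wavelets psi_j = T^(2^j) - T^(2^(j+1)) from lazy diffusion T."""
    deg = np.maximum(A.sum(1), 1e-8)
    T = 0.5 * (np.eye(len(A)) + A / np.sqrt(np.outer(deg, deg)))
    powers = [np.linalg.matrix_power(T, 2 ** j) for j in range(scales + 1)]
    return [powers[j] - powers[j + 1] for j in range(scales)]

def scatter(x, wavelets, depth=2):
    coeffs, layer = [x.mean()], [x]
    for _ in range(depth):
        layer = [np.abs(W @ u) for u in layer for W in wavelets]  # filter + modulus
        coeffs += [u.mean() for u in layer]      # graph-average each path
    return np.array(coeffs)

rng = np.random.default_rng(1)
A = (rng.random((10, 10)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T                   # random undirected graph
x = rng.standard_normal(10)                      # signal on the nodes
print(scatter(x, diffusion_wavelets(A)).shape)   # (13,) = 1 + 3 + 9 path features
```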
On Relativistic $f$-Divergences
Title | On Relativistic $f$-Divergences |
Authors | Alexia Jolicoeur-Martineau |
Abstract | This paper provides a more rigorous look at Relativistic Generative Adversarial Networks (RGANs). We prove that the objective function of the discriminator is a statistical divergence for any concave function $f$ with minimal properties ($f(0)=0$, $f'(0) \neq 0$, $\sup_x f(x)>0$). We also devise a few variants of relativistic $f$-divergences. Wasserstein GAN was originally justified by the idea that the Wasserstein distance (WD) is most sensible because it is weak (i.e., it induces a weak topology). We show that the WD is weaker than $f$-divergences, which are in turn weaker than relativistic $f$-divergences. Given the good performance of RGANs, this suggests that WGAN does not perform well primarily because of the weak metric, but rather because of regularization and the use of a relativistic discriminator. We also take a closer look at estimators of relativistic $f$-divergences. We introduce the minimum-variance unbiased estimator (MVUE) for Relativistic paired GANs (RpGANs; originally called RGANs, a name that could cause confusion) and show that it does not perform better. Furthermore, we show that the estimator of Relativistic average GANs (RaGANs) is only asymptotically unbiased, but that the finite-sample bias is small. Removing this bias does not improve performance. |
Tasks | |
Published | 2019-01-08 |
URL | http://arxiv.org/abs/1901.02474v1 |
http://arxiv.org/pdf/1901.02474v1.pdf | |
PWC | https://paperswithcode.com/paper/on-relativistic-f-divergences |
Repo | https://github.com/AlexiaJM/relativistic-f-divergences |
Framework | tf |
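
For the standard log-sigmoid choice of $f$, the relativistic paired objective depends only on the difference of critic scores $C(x_r) - C(x_f)$, i.e. on how much more realistic the real sample looks than the fake one. A minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def rpgan_d_loss(c_real, c_fake):
    """-E[log sigmoid(C(x_r) - C(x_f))]; softplus(-z) == -log sigmoid(z)."""
    return F.softplus(-(c_real - c_fake)).mean()

def rpgan_g_loss(c_real, c_fake):
    """Same objective with the roles of real and fake swapped."""
    return F.softplus(-(c_fake - c_real)).mean()

c_real, c_fake = torch.randn(16), torch.randn(16)   # critic outputs per sample
print(rpgan_d_loss(c_real, c_fake), rpgan_g_loss(c_real, c_fake))
```

Other choices of $f$ satisfying the paper's three conditions yield the other relativistic $f$-divergences it analyzes; only these two loss functions change.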