February 2, 2020

3328 words 16 mins read

Paper Group AWR 4

Multitask Learning On Graph Neural Networks Applied To Molecular Property Predictions. A Self Validation Network for Object-Level Human Attention Estimation. Improving Visual Relation Detection using Depth Maps. Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization. Representational Rényi heterogeneity …

Multitask Learning On Graph Neural Networks Applied To Molecular Property Predictions

Title Multitask Learning On Graph Neural Networks Applied To Molecular Property Predictions
Authors Fabio Capela, Vincent Nouchi, Ruud Van Deursen, Igor V. Tetko, Guillaume Godin
Abstract Prediction of molecular properties, including physico-chemical properties, is a challenging task in chemistry. Herein we present a new state-of-the-art multitask prediction method based on existing graph neural network models. We have used different architectures for our models, and the results clearly demonstrate that multitask learning can improve model performance. Additionally, a significant reduction of variance in the models has been observed. Most importantly, datasets with a small number of data points reach better results without the need for augmentation.
Tasks Data Augmentation
Published 2019-10-29
URL https://arxiv.org/abs/1910.13124v2
PDF https://arxiv.org/pdf/1910.13124v2.pdf
PWC https://paperswithcode.com/paper/multitask-learning-on-graph-neural-networks-1
Repo https://github.com/firmenich/MultiTask-GNN
Framework pytorch
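
For orientation, here is a minimal sketch of the multitask setup the abstract describes: a shared graph-level embedding feeding one prediction head per property, with molecules lacking a label for a task masked out of the loss. Names, shapes, and the loss form are illustrative assumptions, not the authors' code (see the linked repo for that).

import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    # One small regression head per property on top of a shared graph-level embedding.
    def __init__(self, embed_dim, task_names):
        super().__init__()
        self.heads = nn.ModuleDict({t: nn.Linear(embed_dim, 1) for t in task_names})

    def forward(self, graph_embedding):
        # graph_embedding: (batch, embed_dim), produced by any GNN encoder with a readout
        return {t: head(graph_embedding).squeeze(-1) for t, head in self.heads.items()}

def multitask_loss(preds, targets, masks):
    # Sum of per-task MSE terms, skipping molecules that lack a label for a task.
    loss = torch.zeros(())
    for t, pred in preds.items():
        m = masks[t]
        if m.any():
            loss = loss + ((pred[m] - targets[t][m]) ** 2).mean()
    return loss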

A Self Validation Network for Object-Level Human Attention Estimation

Title A Self Validation Network for Object-Level Human Attention Estimation
Authors Zehua Zhang, Chen Yu, David Crandall
Abstract Due to the foveated nature of the human vision system, people can focus their visual attention on a small region of their visual field at a time, which usually contains only a single object. Estimating this object of attention in first-person (egocentric) videos is useful for many human-centered real-world applications such as augmented reality applications and driver assistance systems. A straightforward solution for this problem is to pick the object whose bounding box is hit by the gaze, where the eye gaze point is obtained from a traditional eye gaze estimator and object candidates are generated from an off-the-shelf object detector. However, such an approach can fail because it addresses the where and the what problems separately, even though they are highly related, chicken-and-egg problems. In this paper, we propose a novel unified model that incorporates both spatial and temporal evidence in identifying as well as locating the attended object in first-person videos. It introduces a novel Self Validation Module that enforces and leverages consistency of the where and the what concepts. We evaluate on two public datasets, demonstrating that the Self Validation Module significantly benefits both training and testing and that our model outperforms the state-of-the-art.
Tasks
Published 2019-10-31
URL https://arxiv.org/abs/1910.14260v2
PDF https://arxiv.org/pdf/1910.14260v2.pdf
PWC https://paperswithcode.com/paper/a-self-validation-network-for-object-level
Repo https://github.com/zehzhang/MindreaderNet-Mr.-Net-
Framework tf
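
As a point of reference, the naive baseline that the abstract argues against fits in a few lines: pick the detected object whose box contains the gaze point. The detection dict layout below is a hypothetical assumption, not the paper's data format.

def attended_object_baseline(gaze_xy, detections):
    # detections: list of dicts like {"label": "cup", "score": 0.9, "x1": 10, "y1": 20, "x2": 60, "y2": 80}
    gx, gy = gaze_xy
    hits = [d for d in detections
            if d["x1"] <= gx <= d["x2"] and d["y1"] <= gy <= d["y2"]]
    if not hits:
        return None
    # If several boxes contain the gaze point, fall back to the highest-scoring one.
    return max(hits, key=lambda d: d["score"])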

Improving Visual Relation Detection using Depth Maps

Title Improving Visual Relation Detection using Depth Maps
Authors Sahand Sharifzadeh, Sina Moayed Baharlou, Max Berrendorf, Rajat Koner, Volker Tresp
Abstract State-of-the-art visual relation detection methods mostly rely on object information extracted from RGB images, such as predicted class probabilities, 2D bounding boxes and feature maps. Depth maps can additionally provide valuable information on object relations, e.g. helping to detect not only spatial relations, such as standing behind, but also non-spatial relations, such as holding. In this work, we study the effect of using different object information, with a focus on depth maps. To enable this study, we release a new synthetic dataset of depth maps, VG-Depth, as an extension to Visual Genome (VG). We also note that, given the highly imbalanced distribution of relations in VG, typical evaluation metrics for visual relation detection cannot reveal improvements on under-represented relations. To address this problem, we propose an additional metric, which we call Macro Recall@K, and demonstrate its effectiveness on VG. Finally, our experiments confirm that by effective utilization of depth maps within a simple yet competitive framework, the performance of visual relation detection can be significantly improved.
Tasks
Published 2019-05-02
URL https://arxiv.org/abs/1905.00966v3
PDF https://arxiv.org/pdf/1905.00966v3.pdf
PWC https://paperswithcode.com/paper/improving-visual-relation-detection-using
Repo https://github.com/Sina-Baharlou/Depth-VRD
Framework pytorch
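
A sketch of one plausible reading of Macro Recall@K: recall is computed separately for each predicate class and then averaged over classes, so rare relations count as much as frequent ones. The exact definition is in the paper; this is only an illustration with assumed triple tuples.

from collections import defaultdict

def macro_recall_at_k(gt_triples, ranked_predictions, k):
    # gt_triples and ranked_predictions are (subject, predicate, object) tuples;
    # ranked_predictions is sorted by confidence, best first.
    hits, totals = defaultdict(int), defaultdict(int)
    top_k = set(ranked_predictions[:k])
    for subj, pred, obj in gt_triples:
        totals[pred] += 1
        if (subj, pred, obj) in top_k:
            hits[pred] += 1
    per_class = [hits[p] / totals[p] for p in totals]
    return sum(per_class) / len(per_class) if per_class else 0.0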

Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

Title Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization
Authors Qi Zhou, Houqiang Li, Jie Wang
Abstract Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance as model-free methods. In this paper, we propose a Policy Optimization method with Model-Based Uncertainty (POMBU), a novel model-based approach that can effectively improve the asymptotic performance using the uncertainty in Q-values. We derive an upper bound of the uncertainty, based on which we can approximate the uncertainty accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability. This can significantly alleviate the overfitting of the policy to inaccurate models. Experiments show that POMBU can outperform existing state-of-the-art policy optimization algorithms in terms of sample efficiency and asymptotic performance. Moreover, the experiments demonstrate the excellent robustness of POMBU compared to previous model-based approaches.
Tasks
Published 2019-11-28
URL https://arxiv.org/abs/1911.12574v1
PDF https://arxiv.org/pdf/1911.12574v1.pdf
PWC https://paperswithcode.com/paper/deep-model-based-reinforcement-learning-via
Repo https://github.com/MIRALab-USTC/RL-POMBU
Framework tf
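
The general idea of penalizing uncertain value estimates so the policy is updated conservatively can be sketched as below. This ensemble-variance illustration is only a generic stand-in; POMBU's actual uncertainty bound and its policy-optimization procedure are derived in the paper and differ from this.

import numpy as np

def conservative_q_estimate(q_ensemble, beta=1.0):
    # q_ensemble: (n_models, n_state_action_pairs) Q-value estimates from an ensemble.
    # Disagreement among the estimates acts as an uncertainty proxy; subtracting it
    # makes the policy prefer actions whose value the learned models agree on.
    q = np.asarray(q_ensemble)
    return q.mean(axis=0) - beta * q.std(axis=0)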

Representational Rényi heterogeneity

Title Representational Rényi heterogeneity
Authors Abraham Nunes, Martin Alda, Timothy Bardouille, Thomas Trappenberg
Abstract A discrete system’s heterogeneity is measured by the Rényi heterogeneity family of indices (also known as Hill numbers or Hannah-Kay indices), whose units are known as the numbers equivalent, and whose scaling properties are consistent and intuitive. Unfortunately, numbers-equivalent heterogeneity measures for non-categorical data require a priori (A) categorical partitioning and (B) pairwise distance measurement on the space of observable data. This precludes their application to problems in disciplines where categories are ill-defined or where semantically relevant features must be learned as abstractions from some data. We thus introduce representational Rényi heterogeneity (RRH), which transforms an observable domain onto a latent space upon which the Rényi heterogeneity is both tractable and semantically relevant. This method requires neither a priori binning nor the definition of a distance function on the observable space. Compared with existing state-of-the-art indices on a beta-mixture distribution, we show that RRH more accurately detects the number of distinct mixture components. We also show that RRH can measure heterogeneity in natural images whose semantically relevant features must be abstracted using deep generative models. We further show that RRH can uniquely capture heterogeneity caused by distinct components in mixture distributions. Our novel approach will enable measurement of heterogeneity in disciplines where a priori categorical partitions of observable data are not possible, or where semantically relevant features must be inferred using latent variable models.
Tasks Latent Variable Models
Published 2019-12-10
URL https://arxiv.org/abs/1912.05031v2
PDF https://arxiv.org/pdf/1912.05031v2.pdf
PWC https://paperswithcode.com/paper/representational-renyi-heterogeneity
Repo https://github.com/abrahamnunes/RRH
Framework none
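
For reference, the standard Rényi heterogeneity (Hill number) of order $q$ for a categorical distribution $p = (p_1, \dots, p_n)$, which the abstract builds on, is

$$\Pi_q(p) = \left( \sum_{i=1}^{n} p_i^{\,q} \right)^{\frac{1}{1-q}},$$

with the limits $q \to 1$ (the exponential of Shannon entropy), $q = 0$ (the number of observed categories), and $q \to \infty$ ($1/\max_i p_i$) recovering familiar special cases. RRH evaluates this quantity on a learned latent representation rather than on a priori categories.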

A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling

Title A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling
Authors Haoran Chen, Ke Lin, Alexander Maye, Jianming Li, Xiaolin Hu
Abstract Given the features of a video, a recurrent neural network can be used to automatically generate a caption for the video. Existing methods for video captioning have at least three limitations. First, semantic information has been widely applied to boost the performance of video captioning models, but existing networks often fail to provide meaningful semantic features. Second, the Teacher Forcing algorithm is often utilized to optimize video captioning models, but different strategies are applied to guide word generation during training and inference, which leads to poor performance. Third, current video captioning models are prone to generate relatively short captions, which express the video content inadequately. To resolve these three problems, we make three corresponding improvements. First, we utilize both static spatial features and dynamic spatio-temporal features as input for the semantic detection network (SDN) in order to generate meaningful semantic features for videos. Then, we propose a scheduled sampling strategy which gradually shifts the training phase from a teacher-guided manner towards a more self-teaching manner. Finally, the ordinary log-probability loss function is weighted by sentence length so that the inclination towards short sentences is alleviated. Our model achieves state-of-the-art results on the Youtube2Text dataset and is competitive with the state-of-the-art models on the MSR-VTT dataset.
Tasks Video Captioning
Published 2019-08-31
URL https://arxiv.org/abs/1909.00121v2
PDF https://arxiv.org/pdf/1909.00121v2.pdf
PWC https://paperswithcode.com/paper/a-semantics-assisted-video-captioning-model
Repo https://github.com/WingsBrokenAngel/Semantics-AssistedVideoCaptioningModelTrainedwithScheduledSamplingStrategy
Framework tf
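
Scheduled sampling, the second improvement in the abstract, is small enough to show directly: at each decoding step, feed the ground-truth previous word with a probability that decays over training, otherwise feed the model's own previous prediction. The linear decay below is one possible schedule, not necessarily the paper's.

import random

def next_input_token(gt_prev_token, model_prev_token, teacher_prob):
    # With probability teacher_prob use teacher forcing; otherwise self-teach.
    return gt_prev_token if random.random() < teacher_prob else model_prev_token

def linear_decay(epoch, total_epochs, start=1.0, end=0.0):
    # Probability of teacher forcing decreases from start to end over training.
    return start + (end - start) * min(epoch / max(total_epochs, 1), 1.0)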

Adaptive Masked Proxies for Few-Shot Segmentation

Title Adaptive Masked Proxies for Few-Shot Segmentation
Authors Mennatullah Siam, Boris Oreshkin, Martin Jagersand
Abstract Deep learning has thrived by training on large-scale datasets. However, in robotics applications sample efficiency is critical. We propose a novel adaptive masked proxies method that constructs the final segmentation layer weights from few labelled samples. It utilizes multi-resolution average pooling on base embeddings masked with the label to act as a positive proxy for the new class, while fusing it with the previously learned class signatures. Our method is evaluated on the PASCAL-$5^i$ dataset and outperforms the state-of-the-art in few-shot semantic segmentation. Unlike previous methods, our approach does not require a second branch to estimate parameters or prototypes, which enables it to be used with two-stream motion- and appearance-based segmentation networks. We further propose a novel setup for evaluating continual learning of object segmentation, which we name incremental PASCAL (iPASCAL), where our method outperforms the baseline method. Our code is publicly available at https://github.com/MSiam/AdaptiveMaskedProxies.
Tasks Continual Learning, Few-Shot Semantic Segmentation, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation
Published 2019-02-19
URL https://arxiv.org/abs/1902.11123v5
PDF https://arxiv.org/pdf/1902.11123v5.pdf
PWC https://paperswithcode.com/paper/adaptive-masked-weight-imprinting-for-few
Repo https://github.com/MSiam/AdaptiveMaskedProxies
Framework pytorch
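
The core imprinting step, masked average pooling at a single resolution, can be sketched as below. The released code in the linked repo does this at multiple resolutions and fuses the proxy with previously learned class signatures; this sketch shows only the single-resolution idea with assumed tensor shapes.

import torch

def masked_average_proxy(features, mask, eps=1e-6):
    # features: (C, H, W) base embeddings; mask: (H, W) binary mask of the new class.
    # Average the embedding only over pixels labelled as the new class, yielding a
    # proxy vector that can be imprinted as that class's weights in the segmentation head.
    mask = mask.float().unsqueeze(0)                                  # (1, H, W)
    proxy = (features * mask).sum(dim=(1, 2)) / (mask.sum() + eps)    # (C,)
    return proxy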

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning

Title Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning
Authors Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Hongyang Chao, Tao Mei
Abstract Video captioning is widely regarded as a fundamental but challenging task in both computer vision and artificial intelligence. The prevalent approach is to map an input video to a variable-length output sentence in a sequence-to-sequence manner via a Recurrent Neural Network (RNN). Nevertheless, the training of RNNs still suffers to some degree from the vanishing/exploding gradient problem, making optimization difficult. Moreover, the inherently recurrent dependency in RNNs prevents parallelization within a sequence during training and therefore limits the computation. In this paper, we present a novel design, Temporal Deformable Convolutional Encoder-Decoder Networks (dubbed TDConvED), that fully employs convolutions in both the encoder and decoder networks for video captioning. Technically, we exploit convolutional block structures that compute intermediate states of a fixed number of inputs and stack several blocks to capture long-term relationships. The encoder is further equipped with temporal deformable convolution to enable free-form deformation of temporal sampling. Our model also capitalizes on a temporal attention mechanism for sentence generation. Extensive experiments are conducted on both the MSVD and MSR-VTT video captioning datasets, and superior results are reported compared to conventional RNN-based encoder-decoder techniques. Most remarkably, TDConvED increases CIDEr-D performance from 58.8% to 67.2% on MSVD.
Tasks Video Captioning
Published 2019-05-03
URL https://arxiv.org/abs/1905.01077v1
PDF https://arxiv.org/pdf/1905.01077v1.pdf
PWC https://paperswithcode.com/paper/temporal-deformable-convolutional-encoder
Repo https://github.com/b05902062/TDConvED
Framework pytorch
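
To make the block-stacking idea concrete, here is a plain stacked temporal-convolution encoder over per-frame features. TDConvED additionally uses deformable temporal sampling in the encoder and an attention-equipped convolutional decoder, both omitted here; this is a simplified assumption-level sketch.

import torch.nn as nn

class TemporalConvEncoder(nn.Module):
    def __init__(self, dim, kernel_size=3, n_blocks=4):
        super().__init__()
        pad = kernel_size // 2
        self.blocks = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size, padding=pad) for _ in range(n_blocks))
        self.act = nn.ReLU()

    def forward(self, x):
        # x: (batch, dim, n_frames); each residual block widens the temporal receptive field.
        for conv in self.blocks:
            x = x + self.act(conv(x))
        return x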

Genetic Algorithm-based Polar Code Construction for the AWGN Channel

Title Genetic Algorithm-based Polar Code Construction for the AWGN Channel
Authors Ahmed Elkelesh, Moustafa Ebada, Sebastian Cammerer, Stephan ten Brink
Abstract We propose a new polar code construction framework (i.e., selecting the frozen bit positions) for the additive white Gaussian noise (AWGN) channel, tailored to a given decoding algorithm rather than based on the (not necessarily optimal) assumption of successive cancellation (SC) decoding. The proposed framework is based on the Genetic Algorithm (GenAlg), where populations (i.e., collections) of information sets evolve successively via evolutionary transformations based on their individual error-rate performance. These populations converge towards an information set that fits the decoding behavior. Using our proposed algorithm, we construct a polar code of length 2048 with code rate 0.5, without CRC aid, tailored to plain successive cancellation list (SCL) decoding, achieving the same error-rate performance as CRC-aided SCL decoding and yielding a coding gain of 1 dB at a BER of $10^{-6}$. Furthermore, a belief propagation (BP)-tailored polar code approaches the SCL error-rate performance without any modifications to the decoding algorithm itself.
Tasks
Published 2019-01-19
URL http://arxiv.org/abs/1901.06444v1
PDF http://arxiv.org/pdf/1901.06444v1.pdf
PWC https://paperswithcode.com/paper/genetic-algorithm-based-polar-code
Repo https://github.com/AhmedElkelesh/Genetic-Algorithm-based-Polar-Code-Construction
Framework none
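
A generic GA skeleton in the spirit of the abstract: evolve candidate information sets (which bit positions carry data) and rank them by a decoder-in-the-loop error-rate estimate. The crossover and mutation operators below are illustrative, not the authors' exact transformations, and estimate_error_rate is a user-supplied function that simulates the chosen decoder (SCL, BP, ...) and returns a BER/BLER estimate.

import random

def genetic_polar_construction(n, k, estimate_error_rate, pop_size=20, generations=50):
    # Each individual is a set of k information (non-frozen) bit positions out of n.
    population = [set(random.sample(range(n), k)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=estimate_error_rate)           # lower error rate = fitter
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = set(random.sample(sorted(a | b), k))    # crossover: sample from the union
            if random.random() < 0.3:                       # mutation: swap one position
                child.discard(random.choice(sorted(child)))
                child.add(random.choice([i for i in range(n) if i not in child]))
            children.append(child)
        population = parents + children
    return min(population, key=estimate_error_rate)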

VideoBERT: A Joint Model for Video and Language Representation Learning

Title VideoBERT: A Joint Model for Video and Language Representation Learning
Authors Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid
Abstract Self-supervised learning has become increasingly important to leverage the abundance of unlabeled data available on platforms like YouTube. Whereas most existing approaches learn low-level representations, we propose a joint visual-linguistic model to learn high-level features without any explicit supervision. In particular, inspired by its recent success in language modeling, we build upon the BERT model to learn bidirectional joint distributions over sequences of visual and linguistic tokens, derived from vector quantization of video data and off-the-shelf speech recognition outputs, respectively. We use VideoBERT in numerous tasks, including action classification and video captioning. We show that it can be applied directly to open-vocabulary classification, and confirm that large amounts of training data and cross-modal information are critical to performance. Furthermore, we outperform the state-of-the-art on video captioning, and quantitative results verify that the model learns high-level semantic features.
Tasks Action Classification, Language Modelling, Quantization, Representation Learning, Speech Recognition, Video Captioning
Published 2019-04-03
URL https://arxiv.org/abs/1904.01766v2
PDF https://arxiv.org/pdf/1904.01766v2.pdf
PWC https://paperswithcode.com/paper/videobert-a-joint-model-for-video-and
Repo https://github.com/DataScienceNigeria/AI-powered-by-Google-s-VideoBERT-
Framework none
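
A simplified view of the "visual token" step: continuous clip features are clustered and each clip is replaced by the id of its nearest centroid, which is what lets a BERT-style model treat video like a second language. The paper uses hierarchical k-means over S3D features; the flat version below is only an assumption-level sketch.

import numpy as np
from sklearn.cluster import KMeans

def build_visual_vocab(segment_features, vocab_size=1024, seed=0):
    # segment_features: (n_segments, feat_dim) array of clip-level features.
    return KMeans(n_clusters=vocab_size, random_state=seed, n_init=10).fit(segment_features)

def tokenize_video(km, segment_features):
    # One integer "visual word" id per clip segment.
    return km.predict(segment_features)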

When Does Label Smoothing Help?

Title When Does Label Smoothing Help?
Authors Rafael Müller, Simon Kornblith, Geoffrey Hinton
Abstract The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident and label smoothing has been used in many state-of-the-art models, including image classification, language translation and speech recognition. Despite its widespread use, label smoothing is still poorly understood. Here we show empirically that in addition to improving generalization, label smoothing improves model calibration which can significantly improve beam-search. However, we also observe that if a teacher network is trained with label smoothing, knowledge distillation into a student network is much less effective. To explain these observations, we visualize how label smoothing changes the representations learned by the penultimate layer of the network. We show that label smoothing encourages the representations of training examples from the same class to group in tight clusters. This results in loss of information in the logits about resemblances between instances of different classes, which is necessary for distillation, but does not hurt generalization or calibration of the model’s predictions.
Tasks Calibration, Image Classification, Speech Recognition
Published 2019-06-06
URL https://arxiv.org/abs/1906.02629v2
PDF https://arxiv.org/pdf/1906.02629v2.pdf
PWC https://paperswithcode.com/paper/when-does-label-smoothing-help
Repo https://github.com/seominseok0429/label-smoothing-visualization-pytorch
Framework pytorch
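
The soft-target construction in the first sentence of the abstract is small enough to write out directly: the target distribution puts weight 1 - eps on the true class and spreads eps uniformly over all classes. This sketch is equivalent in effect to label smoothing as commonly implemented, not code from the paper.

import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, eps=0.1):
    # logits: (batch, n_classes); targets: (batch,) integer class indices.
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, n_classes).float()
    soft_targets = (1.0 - eps) * one_hot + eps / n_classes   # weighted average with uniform
    return -(soft_targets * log_probs).sum(dim=-1).mean()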

Probing Biomedical Embeddings from Language Models

Title Probing Biomedical Embeddings from Language Models
Authors Qiao Jin, Bhuwan Dhingra, William W. Cohen, Xinghua Lu
Abstract Contextualized word embeddings derived from pre-trained language models (LMs) show significant improvements on downstream NLP tasks. Pre-training on domain-specific corpora, such as biomedical articles, further improves their performance. In this paper, we conduct probing experiments to determine what additional information is carried intrinsically by the in-domain trained contextualized embeddings. For this we use the pre-trained LMs as fixed feature extractors and restrict the downstream task models to not have additional sequence modeling layers. We compare BERT, ELMo, BioBERT and BioELMo, a biomedical version of ELMo trained on 10M PubMed abstracts. Surprisingly, while fine-tuned BioBERT is better than BioELMo in biomedical NER and NLI tasks, as a fixed feature extractor BioELMo outperforms BioBERT in our probing tasks. We use visualization and nearest neighbor analysis to show that better encoding of entity-type and relational information leads to this superiority.
Tasks Word Embeddings
Published 2019-04-03
URL http://arxiv.org/abs/1904.02181v1
PDF http://arxiv.org/pdf/1904.02181v1.pdf
PWC https://paperswithcode.com/paper/probing-biomedical-embeddings-from-language
Repo https://github.com/Andy-jqa/bioelmo
Framework tf
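
The probing protocol, in spirit: features from the frozen pre-trained LM (computed elsewhere) are classified by a simple linear model, with no additional sequence-modeling layers. The sketch below is a generic stand-in for that setup, not the paper's exact probe.

from sklearn.linear_model import LogisticRegression

def linear_probe(train_feats, train_labels, test_feats, test_labels):
    # train_feats/test_feats: (n_examples, feat_dim) fixed embeddings from a frozen LM.
    clf = LogisticRegression(max_iter=1000).fit(train_feats, train_labels)
    return clf.score(test_feats, test_labels)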

Machine Learning and System Identification for Estimation in Physical Systems

Title Machine Learning and System Identification for Estimation in Physical Systems
Authors Fredrik Bagge Carlson
Abstract In this thesis, we draw inspiration from both classical system identification and modern machine learning in order to solve estimation problems for real-world, physical systems. The main approach to estimation and learning adopted is optimization-based. Concepts such as regularization are utilized to encode prior knowledge, and basis-function expansions are used to add nonlinear modeling power while keeping data requirements practical. The thesis covers a wide range of applications, many inspired by robotics, but also extending outside this already wide field. Usage of the proposed methods and algorithms is in many cases illustrated in the real-world applications that motivated the research. Topics covered include dynamics modeling and estimation, model-based reinforcement learning, spectral estimation, friction modeling, and state estimation and calibration in robotic machining. In the work on modeling and identification of dynamics, we develop regularization strategies that allow us to incorporate prior domain knowledge into flexible, overparameterized models. We make use of classical control theory to gain insight into training and regularization while using flexible tools from modern deep learning. A particular focus of the work is to enable the use of modern methods in scenarios where gathering data is associated with a high cost. In the robotics-inspired parts of the thesis, we develop methods that are practically motivated and ensure that they are implementable outside the research setting. We demonstrate this by performing experiments in realistic settings and providing open-source implementations of all proposed methods and algorithms.
Tasks Calibration
Published 2019-06-05
URL https://arxiv.org/abs/1906.02003v1
PDF https://arxiv.org/pdf/1906.02003v1.pdf
PWC https://paperswithcode.com/paper/machine-learning-and-system-identification
Repo https://github.com/baggepinnen/LTVModels.jl
Framework none
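
Two recurring ingredients mentioned in the abstract, basis-function expansion and regularization, can be illustrated generically. This is not code from the thesis (the linked LTVModels.jl package is in Julia); it only shows how an RBF expansion adds nonlinearity while a ridge penalty encodes the prior that parameters should stay small.

import numpy as np

def rbf_features(x, centers, width):
    # Lift 1-D inputs x to nonlinear radial-basis features, one column per center.
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def fit_ridge(Phi, y, lam=1e-2):
    # Closed-form ridge regression on the expanded features.
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)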

Variational Denoising Network: Toward Blind Noise Modeling and Removal

Title Variational Denoising Network: Toward Blind Noise Modeling and Removal
Authors Zongsheng Yue, Hongwei Yong, Qian Zhao, Lei Zhang, Deyu Meng
Abstract Blind image denoising is an important yet very challenging problem in computer vision due to the complicated acquisition process of real images. In this work we propose a new variational inference method, which integrates both noise estimation and image denoising into a unique Bayesian framework, for blind image denoising. Specifically, an approximate posterior, parameterized by deep neural networks, is presented by taking the intrinsic clean image and noise variances as latent variables conditioned on the input noisy image. This posterior provides explicit parametric forms for all its involved hyper-parameters, and thus can be easily implemented for blind image denoising with automatic noise estimation for the test noisy image. On the one hand, like other data-driven deep learning methods, our method, namely the variational denoising network (VDN), can perform denoising efficiently due to the explicit parametric form of its posterior. On the other hand, VDN inherits the advantages of traditional model-driven approaches, especially the good generalization capability of generative models. VDN has good interpretability and can be flexibly utilized to estimate and remove complicated non-i.i.d. noise collected in real scenarios. Comprehensive experiments are performed to substantiate the superiority of our method in blind image denoising.
Tasks Denoising, Image Denoising
Published 2019-08-29
URL https://arxiv.org/abs/1908.11314v2
PDF https://arxiv.org/pdf/1908.11314v2.pdf
PWC https://paperswithcode.com/paper/variational-denoising-network-toward-blind
Repo https://github.com/zsyOAOA/VDNet
Framework pytorch

DeepFlow: History Matching in the Space of Deep Generative Models

Title DeepFlow: History Matching in the Space of Deep Generative Models
Authors Lukas Mosser, Olivier Dubrule, Martin J. Blunt
Abstract The calibration of a reservoir model with observed transient data of fluid pressures and rates is a key task in obtaining a predictive model of the flow and transport behaviour of the earth’s subsurface. The model calibration task, commonly referred to as “history matching”, can be formalised as an ill-posed inverse problem where we aim to find the underlying spatial distribution of petrophysical properties that explain the observed dynamic data. We use a generative adversarial network pretrained on geostatistical object-based models to represent the distribution of rock properties for a synthetic model of a hydrocarbon reservoir. The dynamic behaviour of the reservoir fluids is modelled using a transient two-phase incompressible Darcy formulation. We invert for the underlying reservoir properties by first modeling property distributions using the pre-trained generative model and then using the adjoint equations of the forward problem to perform gradient descent on the latent variables that control the output of the generative model. In addition to the dynamic observation data, we include well rock-type constraints by introducing an additional objective function. Our contribution shows that for a synthetic test case, we are able to obtain solutions to the inverse problem by optimising in the latent variable space of a deep generative model, given a set of transient observations of a non-linear forward problem.
Tasks Calibration
Published 2019-05-14
URL https://arxiv.org/abs/1905.05749v2
PDF https://arxiv.org/pdf/1905.05749v2.pdf
PWC https://paperswithcode.com/paper/deepflow-history-matching-in-the-space-of
Repo https://github.com/LukasMosser/DeepFlow
Framework pytorch
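
The optimization loop implied by the abstract, reduced to its simplest form: search over the generator's latent variables with gradient descent on a data-mismatch loss. The generator and forward_model below are placeholder differentiable modules, not the paper's GAN or Darcy-flow simulator, and the loss omits the well rock-type constraint term.

import torch

def invert_in_latent_space(generator, forward_model, observations, z_dim, steps=200, lr=0.05):
    # Optimize the latent code z so that the simulated response of the generated
    # property field matches the observed pressures and rates.
    z = torch.randn(1, z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        properties = generator(z)                 # e.g. permeability/porosity fields
        simulated = forward_model(properties)     # predicted pressures and rates
        loss = torch.mean((simulated - observations) ** 2)
        loss.backward()
        opt.step()
    return z.detach()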