February 1, 2020

Paper Group AWR 143

Learning document embeddings along with their uncertainties

Title Learning document embeddings along with their uncertainties
Authors Santosh Kesiraju, Oldřich Plchot, Lukáš Burget, Suryakanth V Gangashetty
Abstract The majority of text modelling techniques yield only point estimates of document embeddings and fail to capture the uncertainty of those estimates. These uncertainties give a notion of how well the embeddings represent a document. We present the Bayesian subspace multinomial model (Bayesian SMM), a generative log-linear model that learns to represent documents in the form of Gaussian distributions, thereby encoding the uncertainty in its covariance. Additionally, in the proposed Bayesian SMM, we address a commonly encountered problem of intractability that appears during variational inference in mixed-logit models. We also present a generative Gaussian linear classifier for topic identification that exploits the uncertainty in document embeddings. Our intrinsic evaluation using the perplexity measure shows that the proposed Bayesian SMM fits the data better than the state-of-the-art neural variational document model on the Fisher speech and 20Newsgroups text corpora. Our topic identification experiments show that the proposed systems are robust to over-fitting on unseen test data. The topic ID results show that the proposed model outperforms state-of-the-art unsupervised topic models and achieves results comparable to state-of-the-art fully supervised discriminative models.
Tasks Topic Models
Published 2019-08-20
URL https://arxiv.org/abs/1908.07599v3
PDF https://arxiv.org/pdf/1908.07599v3.pdf
PWC https://paperswithcode.com/paper/190807599
Repo https://github.com/skesiraju/BaySMM
Framework pytorch
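
As a rough illustration of the core idea, here is a minimal PyTorch sketch of a Gaussian document embedding trained with a Monte-Carlo ELBO. All names and sizes are illustrative, not the authors' API; see the BaySMM repo above for the real implementation.

```python
import torch

V, K = 5000, 100                       # vocabulary size, embedding dimension
T = torch.randn(V, K) * 0.01           # shared subspace bases (learned in SMM)
b = torch.zeros(V)                     # background log-unigram probabilities

mu = torch.zeros(K, requires_grad=True)      # posterior mean for one document
logvar = torch.zeros(K, requires_grad=True)  # posterior log-variance (the uncertainty)

def elbo(counts, n_samples=4):
    """Monte-Carlo ELBO: expected multinomial log-likelihood minus KL to N(0, I)."""
    ll = 0.0
    for _ in range(n_samples):
        eps = torch.randn(K)
        w = mu + eps * (0.5 * logvar).exp()          # reparameterization trick
        log_probs = torch.log_softmax(b + T @ w, dim=0)
        ll = ll + (counts * log_probs).sum()
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).sum()
    return ll / n_samples - kl
```

Maximizing the ELBO with respect to mu and logvar yields the Gaussian embedding; a downstream classifier can then consume both moments instead of a point estimate.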

Continual Reinforcement Learning in 3D Non-stationary Environments

Title Continual Reinforcement Learning in 3D Non-stationary Environments
Authors Vincenzo Lomonaco, Karan Desai, Eugenio Culurciello, Davide Maltoni
Abstract High-dimensional, always-changing environments constitute a hard challenge for current reinforcement learning techniques. Artificial agents, nowadays, are often trained off-line in very static and controlled conditions in simulation, such that training observations can be thought of as sampled i.i.d. from the entire observation space. However, in real-world settings, the environment is often non-stationary and subject to unpredictable, frequent changes. In this paper we propose and openly release CRLMaze, a new benchmark for learning continually through reinforcement in a complex 3D non-stationary task based on ViZDoom and subject to several environmental changes. Then, we introduce an end-to-end model-free continual reinforcement learning strategy that shows competitive results with respect to four different baselines without requiring any access to additional supervised signals, previously encountered environmental conditions or observations.
Tasks
Published 2019-05-24
URL https://arxiv.org/abs/1905.10112v1
PDF https://arxiv.org/pdf/1905.10112v1.pdf
PWC https://paperswithcode.com/paper/continual-reinforcement-learning-in-3d-non
Repo https://github.com/vlomonaco/crlmaze
Framework pytorch

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Title BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Authors Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, Jaewoo Kang
Abstract Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.
Tasks Language Modelling, Medical Named Entity Recognition, Medical Relation Extraction, Named Entity Recognition, Question Answering, Relation Extraction, Sentence Classification
Published 2019-01-25
URL https://arxiv.org/abs/1901.08746v4
PDF https://arxiv.org/pdf/1901.08746v4.pdf
PWC https://paperswithcode.com/paper/biobert-a-pre-trained-biomedical-language
Repo https://github.com/ManasRMohanty/DS5500-capstone
Framework none
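
For readers who want to try BioBERT on a token-level task, a minimal setup with the Hugging Face transformers library looks like the sketch below. The checkpoint name is the community-hosted Hub release and is an assumption here; the paper itself distributes the raw pre-trained weights at the URLs above.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

name = "dmis-lab/biobert-base-cased-v1.1"   # assumed Hub checkpoint, not from the paper
tokenizer = AutoTokenizer.from_pretrained(name)
# The token-classification head is freshly initialized and must be fine-tuned.
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=3)  # e.g. B/I/O

batch = tokenizer("Mutations in BRCA1 increase cancer risk.", return_tensors="pt")
logits = model(**batch).logits              # (1, seq_len, 3): per-token tag scores
```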

Learning to Denoise Distantly-Labeled Data for Entity Typing

Title Learning to Denoise Distantly-Labeled Data for Entity Typing
Authors Yasumasa Onoe, Greg Durrett
Abstract Distantly-labeled data can be used to scale up training of statistical models, but it is typically noisy and that noise can vary with the distant labeling technique. In this work, we propose a two-stage procedure for handling this type of data: denoise it with a learned model, then train our final model on clean and denoised distant data with standard supervised training. Our denoising approach consists of two parts. First, a filtering function discards examples from the distantly labeled data that are wholly unusable. Second, a relabeling function repairs noisy labels for the retained examples. Each of these components is a model trained on synthetically-noised examples generated from a small manually-labeled set. We investigate this approach on the ultra-fine entity typing task of Choi et al. (2018). Our baseline model is an extension of their model with pre-trained ELMo representations, which already achieves state-of-the-art performance. Adding distant data that has been denoised with our learned models gives further performance gains over this base model, outperforming models trained on raw distant data or heuristically-denoised distant data.
Tasks Denoising, Entity Typing
Published 2019-05-04
URL https://arxiv.org/abs/1905.01566v1
PDF https://arxiv.org/pdf/1905.01566v1.pdf
PWC https://paperswithcode.com/paper/learning-to-denoise-distantly-labeled-data
Repo https://github.com/yasumasaonoe/DenoiseET
Framework pytorch
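
The two-stage procedure can be summarized in a few lines. In the sketch below, filter_model and relabel_model stand in for the learned components described above; their score/predict interfaces are hypothetical, not the authors' code.

```python
def denoise(distant_examples, filter_model, relabel_model, threshold=0.5):
    """Filter wholly unusable examples, then repair the labels of the rest."""
    clean = []
    for mention, noisy_labels in distant_examples:
        # Stage 1: the filtering function discards unusable examples.
        if filter_model.score(mention, noisy_labels) < threshold:
            continue
        # Stage 2: the relabeling function repairs the retained labels.
        clean.append((mention, relabel_model.predict(mention, noisy_labels)))
    return clean
```

Both components are trained on synthetically-noised examples generated from the small manually-labeled set, so no extra annotation is needed.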

Neural Contextual Bandits with UCB-based Exploration

Title Neural Contextual Bandits with UCB-based Exploration
Authors Dongruo Zhou, Lihong Li, Quanquan Gu
Abstract We study the stochastic contextual bandit problem, where the reward is generated from an unknown function with additive noise. No assumption is made about the reward function other than boundedness. We propose a new algorithm, NeuralUCB, which leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound (UCB) of reward for efficient exploration. We prove that, under standard assumptions, NeuralUCB achieves $\tilde O(\sqrt{T})$ regret, where $T$ is the number of rounds. To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee. We also show the algorithm is empirically competitive against representative baselines in a number of benchmarks.
Tasks Efficient Exploration, Multi-Armed Bandits
Published 2019-11-11
URL https://arxiv.org/abs/1911.04462v2
PDF https://arxiv.org/pdf/1911.04462v2.pdf
PWC https://paperswithcode.com/paper/neural-contextual-bandits-with-upper
Repo https://github.com/sauxpa/neural_exploration
Framework none
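
A simplified view of NeuralUCB's arm selection: the gradient of the network output with respect to its parameters acts as a feature map, and the induced confidence width gives the exploration bonus. The sketch below omits the incremental design-matrix update and is an approximation of the paper's method, not a faithful reimplementation.

```python
import torch

def select_arm(net, contexts, Z_inv, gamma=1.0):
    """Pick the arm with the highest UCB; net maps a context to a scalar reward estimate."""
    ucbs = []
    for x in contexts:                        # one context vector per arm
        net.zero_grad()
        f = net(x)
        f.backward()                          # populate parameter gradients
        g = torch.cat([p.grad.flatten() for p in net.parameters()])
        width = torch.sqrt(g @ Z_inv @ g)     # confidence width from gradient features
        ucbs.append(f.item() + gamma * width.item())
    return max(range(len(ucbs)), key=ucbs.__getitem__)
```

After pulling the chosen arm, Z_inv is updated with the outer product of that arm's gradient features (e.g. via the Sherman-Morrison formula) and the network is retrained on the observed rewards.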

Semi-Conditional Normalizing Flows for Semi-Supervised Learning

Title Semi-Conditional Normalizing Flows for Semi-Supervised Learning
Authors Andrei Atanov, Alexandra Volokhova, Arsenii Ashukha, Ivan Sosnovik, Dmitry Vetrov
Abstract This paper proposes a semi-conditional normalizing flow model for semi-supervised learning. The model uses both labeled and unlabeled data to learn an explicit model of the joint distribution over objects and labels. The semi-conditional architecture of the model allows us to efficiently compute the value and gradients of the marginal likelihood for unlabeled objects. The conditional part of the model is based on a proposed conditional coupling layer. We demonstrate the performance of the model on semi-supervised classification problems across different datasets. The model outperforms the baseline approach based on variational auto-encoders on the MNIST dataset.
Tasks
Published 2019-05-01
URL http://arxiv.org/abs/1905.00505v1
PDF http://arxiv.org/pdf/1905.00505v1.pdf
PWC https://paperswithcode.com/paper/semi-conditional-normalizing-flows-for-semi
Repo https://github.com/bayesgroup/semi-supervised-NFs
Framework pytorch
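
A conditional coupling layer can be sketched as an affine coupling whose scale and shift depend on the label embedding as well as on the untouched half of the input. This is our reading of the construction, with illustrative sizes; the paper's exact parameterization may differ.

```python
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """Affine coupling conditioned on a label embedding y (dim must be even)."""
    def __init__(self, dim, label_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim // 2 + label_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),           # produces scale and shift jointly
        )

    def forward(self, x, y):
        x1, x2 = x.chunk(2, dim=-1)           # x1 passes through unchanged
        s, t = self.net(torch.cat([x1, y], dim=-1)).chunk(2, dim=-1)
        z2 = x2 * s.exp() + t                 # invertible given x1 and y
        log_det = s.sum(dim=-1)               # log |det Jacobian|
        return torch.cat([x1, z2], dim=-1), log_det
```

Because inversion only needs x1 and y, the unconditional part of the flow can be shared across labels, which is presumably what keeps marginalizing over labels cheap for unlabeled objects.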

Emotion Recognition in Conversations with Transfer Learning from Generative Conversation Modeling

Title Emotion Recognition in Conversations with Transfer Learning from Generative Conversation Modeling
Authors Devamanyu Hazarika, Soujanya Poria, Roger Zimmermann, Rada Mihalcea
Abstract Recognizing emotions in conversations is a challenging task due to the presence of contextual dependencies governed by self- and inter-personal influences. Recent approaches have focused on modeling these dependencies primarily via supervised learning. However, purely supervised strategies demand large amounts of annotated data, which most of the available corpora for this task lack. To tackle this challenge, we look at transfer learning approaches as a viable alternative. Given the large amount of available conversational data, we investigate whether generative conversational models can be leveraged to transfer affective knowledge for the target task of detecting emotions in context. We propose an approach where we first train a neural dialogue model and then perform parameter transfer to initialize our target model. Apart from the traditional pre-trained sentence encoders, we also incorporate parameter transfer from the recurrent components that model inter-sentence context across the whole conversation. Based on this idea, we perform several experiments across multiple datasets and find improvements in performance and robustness against limited training data. Our models also achieve better validation performance in significantly fewer epochs. Overall, we infer that knowledge acquired from dialogue generators can indeed help recognize emotions in conversations.
Tasks Emotion Recognition, Emotion Recognition in Conversation, Transfer Learning
Published 2019-10-11
URL https://arxiv.org/abs/1910.04980v2
PDF https://arxiv.org/pdf/1910.04980v2.pdf
PWC https://paperswithcode.com/paper/emotion-recognition-in-conversations-with
Repo https://github.com/SenticNet/conv-emotion
Framework pytorch
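
The transfer step itself is simple: copy the dialogue model's sentence-encoder and context-RNN weights into the emotion model and train only the new classification head from scratch. The sketch below is illustrative; checkpoint keys and module names are hypothetical.

```python
import torch

def transfer_parameters(emotion_model, dialogue_ckpt_path):
    """Initialize the emotion classifier from a pre-trained dialogue model."""
    state = torch.load(dialogue_ckpt_path)
    emotion_model.utterance_encoder.load_state_dict(state["encoder"])   # sentence level
    emotion_model.context_rnn.load_state_dict(state["context_rnn"])     # conversation level
    # The classification head keeps its fresh initialization and is trained
    # on the (much smaller) emotion-annotated data.
```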

Meta-learning Convolutional Neural Architectures for Multi-target Concrete Defect Classification with the COncrete DEfect BRidge IMage Dataset

Title Meta-learning Convolutional Neural Architectures for Multi-target Concrete Defect Classification with the COncrete DEfect BRidge IMage Dataset
Authors Martin Mundt, Sagnik Majumder, Sreenivas Murali, Panagiotis Panetsos, Visvanathan Ramesh
Abstract Recognition of defects in concrete infrastructure, especially in bridges, is a crucial first step in assessing structural integrity, yet it is costly and time-consuming. Large variation in the appearance of the concrete material, changing illumination and weather conditions, a variety of possible surface markings, as well as the possibility for different types of defects to overlap, make it a challenging real-world task. In this work we introduce the novel COncrete DEfect BRidge IMage dataset (CODEBRIM) for multi-target classification of five commonly appearing concrete defects. We investigate and compare two reinforcement-learning-based meta-learning approaches, MetaQNN and efficient neural architecture search, to find suitable convolutional neural network architectures for this challenging multi-class multi-target task. We show that the learned architectures have fewer overall parameters in addition to yielding better multi-target accuracy in comparison to popular neural architectures from the literature evaluated in the context of our application.
Tasks Meta-Learning, Neural Architecture Search
Published 2019-04-02
URL http://arxiv.org/abs/1904.08486v1
PDF http://arxiv.org/pdf/1904.08486v1.pdf
PWC https://paperswithcode.com/paper/190408486
Repo https://github.com/SAGNIKMJR/CODEBRIM_MetaQNN
Framework pytorch
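
Because several defects can overlap in one image, the task is multi-target (multi-label) rather than multi-class: each of the five defects gets its own sigmoid. A minimal sketch of such a head (feature size illustrative):

```python
import torch
import torch.nn as nn

feature_dim = 512                              # illustrative backbone feature size
head = nn.Linear(feature_dim, 5)               # one logit per defect class
loss_fn = nn.BCEWithLogitsLoss()               # independent binary targets

feats = torch.randn(8, feature_dim)            # a batch of image features
targets = torch.randint(0, 2, (8, 5)).float()  # each defect present or absent
loss = loss_fn(head(feats), targets)
```

Under the paper's multi-target accuracy, a sample counts as correct only when all five binary predictions match its targets.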

Associatively Segmenting Instances and Semantics in Point Clouds

Title Associatively Segmenting Instances and Semantics in Point Clouds
Authors Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia
Abstract A 3D point cloud describes the real scene precisely and intuitively. To date, how to segment diversified elements in such an informative 3D scene is rarely discussed. In this paper, we first introduce a simple and flexible framework to segment instances and semantics in point clouds simultaneously. Then, we propose two approaches which make the two tasks take advantage of each other, leading to a win-win situation. Specifically, we make instance segmentation benefit from semantic segmentation through learning semantic-aware point-level instance embeddings. Meanwhile, semantic features of the points belonging to the same instance are fused together to make more accurate per-point semantic predictions. Our method largely outperforms the state-of-the-art method in 3D instance segmentation, along with a significant improvement in 3D semantic segmentation. Code has been made available at: https://github.com/WXinlong/ASIS.
Tasks 3D Instance Segmentation, 3D Semantic Segmentation, Instance Segmentation, Semantic Segmentation
Published 2019-02-26
URL http://arxiv.org/abs/1902.09852v2
PDF http://arxiv.org/pdf/1902.09852v2.pdf
PWC https://paperswithcode.com/paper/associatively-segmenting-instances-and
Repo https://github.com/LebronGG/ASIS
Framework tf
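
The semantic-fusion direction can be illustrated in a few lines: average the semantic features of all points assigned to the same instance before making per-point predictions. This is a simplification of ASIS, with illustrative names (sketched in PyTorch for consistency with the other examples on this page, although the repo above is TensorFlow).

```python
import torch

def fuse_by_instance(sem_feats, inst_ids):
    """sem_feats: (N, C) per-point features; inst_ids: (N,) instance assignments."""
    fused = sem_feats.clone()
    for i in inst_ids.unique():
        mask = inst_ids == i
        fused[mask] = sem_feats[mask].mean(dim=0)   # instance-level consensus feature
    return fused
```

The reverse direction, making instance segmentation semantic-aware, conditions the learned point-level instance embedding on the semantic features.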

Adaptively Sparse Transformers

Title Adaptively Sparse Transformers
Authors Gonçalo M. Correia, Vlad Niculae, André F. T. Martins
Abstract Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns. This sparsity is accomplished by replacing softmax with $\alpha$-entmax: a differentiable generalization of softmax that allows low-scoring words to receive precisely zero weight. Moreover, we derive a method to automatically learn the $\alpha$ parameter – which controls the shape and sparsity of $\alpha$-entmax – allowing attention heads to choose between focused or spread-out behavior. Our adaptively sparse Transformer improves interpretability and head diversity when compared to softmax Transformers on machine translation datasets. Findings of the quantitative and qualitative analysis of our approach include that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than softmax Transformers. Furthermore, at no cost in accuracy, sparsity in attention heads helps to uncover different head specializations.
Tasks Machine Translation
Published 2019-08-30
URL https://arxiv.org/abs/1909.00015v2
PDF https://arxiv.org/pdf/1909.00015v2.pdf
PWC https://paperswithcode.com/paper/adaptively-sparse-transformers
Repo https://github.com/deep-spin/entmax
Framework pytorch
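
The entmax repo linked above exposes the softmax replacement directly. A quick example of the bisection-based variant (alpha = 1 recovers softmax, alpha = 2 is sparsemax):

```python
import torch
from entmax import entmax_bisect   # pip install entmax

scores = torch.tensor([[2.0, 1.0, 0.2, -1.5]])
probs = entmax_bisect(scores, alpha=1.5, dim=-1)
print(probs)                       # low-scoring entries receive exactly zero weight
```

In the adaptively sparse Transformer, each attention head has its own alpha, learned jointly with the rest of the network.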

Multidataset Independent Subspace Analysis with Application to Multimodal Fusion

Title Multidataset Independent Subspace Analysis with Application to Multimodal Fusion
Authors Rogers F. Silva, Sergey M. Plis, Tulay Adali, Marios S. Pattichis, Vince D. Calhoun
Abstract In the last two decades, unsupervised latent variable models—blind source separation (BSS) especially—have enjoyed a strong reputation for the interpretable features they produce. Seldom do these models combine the rich diversity of information available in multiple datasets. Multidatasets, on the other hand, yield joint solutions otherwise unavailable in isolation, with a potential for pivotal insights into complex systems. To take advantage of the complex multidimensional subspace structures that capture underlying modes of shared and unique variability across and within datasets, we present a direct, principled approach to multidataset combination. We design a new method called multidataset independent subspace analysis (MISA) that leverages joint information from multiple heterogeneous datasets in a flexible and synergistic fashion. Methodological innovations exploiting the Kotz distribution for subspace modeling in conjunction with a novel combinatorial optimization for evasion of local minima enable MISA to produce a robust generalization of independent component analysis (ICA), independent vector analysis (IVA), and independent subspace analysis (ISA) in a single unified model. We highlight the utility of MISA for multimodal information fusion, including sample-poor regimes and low signal-to-noise ratio scenarios, promoting novel applications in both unimodal and multimodal brain imaging data.
Tasks Combinatorial Optimization, Latent Variable Models
Published 2019-11-11
URL https://arxiv.org/abs/1911.04048v1
PDF https://arxiv.org/pdf/1911.04048v1.pdf
PWC https://paperswithcode.com/paper/multidataset-independent-subspace-analysis
Repo https://github.com/rsilva8/MISA
Framework none

End-to-end Lane Detection through Differentiable Least-Squares Fitting

Title End-to-end Lane Detection through Differentiable Least-Squares Fitting
Authors Wouter Van Gansbeke, Bert De Brabandere, Davy Neven, Marc Proesmans, Luc Van Gool
Abstract Lane detection is typically tackled with a two-step pipeline in which a segmentation mask of the lane markings is predicted first, and a lane line model (like a parabola or spline) is fitted to the post-processed mask next. The problem with such a two-step approach is that the parameters of the network are not optimized for the true task of interest (estimating the lane curvature parameters) but for a proxy task (segmenting the lane markings), resulting in sub-optimal performance. In this work, we propose a method to train a lane detector in an end-to-end manner, directly regressing the lane parameters. The architecture consists of two components: a deep network that predicts a segmentation-like weight map for each lane line, and a differentiable least-squares fitting module that returns for each map the parameters of the best-fitting curve in the weighted least-squares sense. These parameters can subsequently be supervised with a loss function of choice. Our method relies on the observation that it is possible to backpropagate through a least-squares fitting procedure. This leads to an end-to-end method where the features are optimized for the true task of interest: the network implicitly learns to generate features that prevent instabilities during the model fitting step, as opposed to two-step pipelines that need to handle outliers with heuristics. Additionally, the system is not just a black box but offers a degree of interpretability because the intermediately generated segmentation-like weight maps can be inspected and visualized. Code and a video are available at github.com/wvangansbeke/LaneDetection_End2End.
Tasks Lane Detection
Published 2019-02-01
URL https://arxiv.org/abs/1902.00293v3
PDF https://arxiv.org/pdf/1902.00293v3.pdf
PWC https://paperswithcode.com/paper/end-to-end-lane-detection-through
Repo https://github.com/wvangansbeke/LaneDetection_End2End
Framework pytorch
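
The key step, backpropagating through a weighted least-squares fit, is compact enough to sketch. For a parabola y = b0 + b1*x + b2*x^2, the closed-form solution is differentiable in the per-pixel weights w, so gradients flow back into the network that produced them (a minimal sketch, not the authors' exact module):

```python
import torch

def fit_parabola(x, y, w):
    """Weighted least squares: beta = (X^T W X)^{-1} X^T W y, differentiable in w."""
    X = torch.stack([torch.ones_like(x), x, x**2], dim=-1)   # (N, 3) design matrix
    A = X.t() @ (w[:, None] * X)                             # X^T W X without forming W
    b = X.t() @ (w * y)                                      # X^T W y
    return torch.linalg.solve(A, b)                          # curve parameters beta
```

A loss on beta (e.g. against ground-truth curve parameters) then supervises the weight maps end-to-end.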

Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

Title Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
Authors Nathan Kallus, Masatoshi Uehara
Abstract Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. The problem’s importance has attracted many proposed solutions, including importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimates. DR and its variants ensure semiparametric local efficiency if Q-functions are well-specified, but if they are not they can be worse than both IS and SNIS. It also does not enjoy SNIS’s inherent stability and boundedness. We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and satisfy the same stability and boundedness properties as SNIS. On the way, we categorize various properties and classify existing estimators by them. Besides the theoretical guarantees, empirical studies suggest the new estimators provide advantages.
Tasks Multi-Armed Bandits
Published 2019-06-09
URL https://arxiv.org/abs/1906.03735v1
PDF https://arxiv.org/pdf/1906.03735v1.pdf
PWC https://paperswithcode.com/paper/intrinsically-efficient-stable-and-bounded
Repo https://github.com/CausalML/IntrinsicallyEfficientStableOPE
Framework none
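
For reference, the two classical estimators the paper compares against, written for the contextual bandit case with importance ratios rho_i = pi_e(a_i|x_i) / pi_b(a_i|x_i):

```python
import numpy as np

def is_estimate(rho, rewards):
    """Importance sampling: unbiased but can have unbounded variance."""
    return np.mean(rho * rewards)

def snis_estimate(rho, rewards):
    """Self-normalized IS: biased but bounded by the reward range and more stable."""
    return np.sum(rho * rewards) / np.sum(rho)
```

The proposed empirical-likelihood estimators are constructed to be always more efficient than IS, SNIS, and DR while inheriting SNIS's stability and boundedness.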

DDFlow: Learning Optical Flow with Unlabeled Data Distillation

Title DDFlow: Learning Optical Flow with Unlabeled Data Distillation
Authors Pengpeng Liu, Irwin King, Michael R. Lyu, Jia Xu
Abstract We present DDFlow, a data distillation approach to learning optical flow estimation from unlabeled data. The approach distills reliable predictions from a teacher network, and uses these predictions as annotations to guide a student network to learn optical flow. Unlike existing work relying on hand-crafted energy terms to handle occlusion, our approach is data-driven, and learns optical flow for occluded pixels. This enables us to train our model with a much simpler loss function, and achieve a much higher accuracy. We conduct a rigorous evaluation on the challenging Flying Chairs, MPI Sintel, KITTI 2012 and 2015 benchmarks, and show that our approach significantly outperforms all existing unsupervised learning methods, while running at real time.
Tasks Optical Flow Estimation
Published 2019-02-25
URL http://arxiv.org/abs/1902.09145v1
PDF http://arxiv.org/pdf/1902.09145v1.pdf
PWC https://paperswithcode.com/paper/ddflow-learning-optical-flow-with-unlabeled
Repo https://github.com/ppliuboy/DDFlow
Framework tf
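
The distillation loss at the heart of the method is simple: the student imitates the teacher's flow only on pixels where the teacher's prediction is trustworthy (e.g. pixels visible to the teacher but occluded in the student's cropped input). A minimal sketch, with the masking policy left abstract:

```python
import torch

def distillation_loss(student_flow, teacher_flow, valid_mask, eps=1e-6):
    """L1 distillation on reliable pixels; teacher predictions are not backpropagated."""
    diff = (student_flow - teacher_flow.detach()).abs()
    return (diff * valid_mask).sum() / (valid_mask.sum() + eps)
```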

BINet: a binary inpainting network for deep patch-based image compression

Title BINet: a binary inpainting network for deep patch-based image compression
Authors André Nortje, Willie Brink, Herman A. Engelbrecht, Herman Kamper
Abstract Recent deep learning models outperform standard lossy image compression codecs. However, applying these models on a patch-by-patch basis requires that each image patch be encoded and decoded independently. The influence from adjacent patches is therefore lost, leading to block artefacts at low bitrates. We propose the Binary Inpainting Network (BINet), an autoencoder framework which incorporates binary inpainting to reinstate interdependencies between adjacent patches, for improved patch-based compression of still images. When decoding a patch, BINet additionally uses the binarised encodings from surrounding patches to guide its reconstruction. In contrast to sequential inpainting methods where patches are decoded based on previous reconstructions, BINet operates directly on the binary codes of surrounding patches without access to the original or reconstructed image data. Encoding and decoding can therefore be performed in parallel. We demonstrate that BINet improves the compression quality of a competitive deep image codec across a range of compression levels.
Tasks Image Compression
Published 2019-12-11
URL https://arxiv.org/abs/1912.05189v1
PDF https://arxiv.org/pdf/1912.05189v1.pdf
PWC https://paperswithcode.com/paper/binet-a-binary-inpainting-network-for-deep
Repo https://github.com/adnortje/binet
Framework pytorch
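
The decoding idea can be sketched as a decoder that consumes the binary codes of a patch's 3x3 neighbourhood rather than its own code alone (names and sizes illustrative; the real model is convolutional):

```python
import torch
import torch.nn as nn

class NeighbourhoodDecoder(nn.Module):
    """Reconstruct one patch from the binary codes of its 3x3 patch neighbourhood."""
    def __init__(self, code_dim, patch_pixels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(9 * code_dim, 1024), nn.ReLU(),
            nn.Linear(1024, patch_pixels),
        )

    def forward(self, neighbour_codes):        # (B, 9, code_dim), entries in {-1, +1}
        return self.net(neighbour_codes.flatten(1))
```

Because each patch only needs its neighbours' already-computed binary codes, all patches can still be decoded in parallel, unlike sequential inpainting.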