Paper Group AWR 143
Learning document embeddings along with their uncertainties
Title | Learning document embeddings along with their uncertainties |
Authors | Santosh Kesiraju, Oldřich Plchot, Lukáš Burget, Suryakanth V Gangashetty |
Abstract | The majority of text modelling techniques yield only point estimates of document embeddings and fail to capture the uncertainty of the estimates. These uncertainties give a notion of how well the embeddings represent a document. We present the Bayesian subspace multinomial model (Bayesian SMM), a generative log-linear model that learns to represent documents in the form of Gaussian distributions, thereby encoding the uncertainty in its covariance. Additionally, in the proposed Bayesian SMM, we address a commonly encountered problem of intractability that appears during variational inference in mixed-logit models. We also present a generative Gaussian linear classifier for topic identification that exploits the uncertainty in document embeddings. Our intrinsic evaluation using the perplexity measure shows that the proposed Bayesian SMM fits the data better than the state-of-the-art neural variational document model on the Fisher speech and 20Newsgroups text corpora. Our topic identification experiments show that the proposed systems are robust to over-fitting on unseen test data. The topic ID results show that the proposed model outperforms state-of-the-art unsupervised topic models and achieves results comparable to the state-of-the-art fully supervised discriminative models. |
Tasks | Topic Models |
Published | 2019-08-20 |
URL | https://arxiv.org/abs/1908.07599v3 |
https://arxiv.org/pdf/1908.07599v3.pdf | |
PWC | https://paperswithcode.com/paper/190807599 |
Repo | https://github.com/skesiraju/BaySMM |
Framework | pytorch |
Continual Reinforcement Learning in 3D Non-stationary Environments
Title | Continual Reinforcement Learning in 3D Non-stationary Environments |
Authors | Vincenzo Lomonaco, Karan Desai, Eugenio Culurciello, Davide Maltoni |
Abstract | High-dimensional, always-changing environments constitute a hard challenge for current reinforcement learning techniques. Artificial agents, nowadays, are often trained off-line in very static and controlled conditions in simulation such that training observations can be thought of as sampled i.i.d. from the entire observation space. However, in real world settings, the environment is often non-stationary and subject to unpredictable, frequent changes. In this paper we propose and openly release CRLMaze, a new benchmark for learning continually through reinforcement in a complex 3D non-stationary task based on ViZDoom and subject to several environmental changes. Then, we introduce an end-to-end model-free continual reinforcement learning strategy showing competitive results with respect to four different baselines and not requiring any access to additional supervised signals, previously encountered environmental conditions or observations. |
Tasks | |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10112v1 |
https://arxiv.org/pdf/1905.10112v1.pdf | |
PWC | https://paperswithcode.com/paper/continual-reinforcement-learning-in-3d-non |
Repo | https://github.com/vlomonaco/crlmaze |
Framework | pytorch |
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Title | BioBERT: a pre-trained biomedical language representation model for biomedical text mining |
Authors | Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, Jaewoo Kang |
Abstract | Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert. |
Tasks | Language Modelling, Medical Named Entity Recognition, Medical Relation Extraction, Named Entity Recognition, Question Answering, Relation Extraction, Sentence Classification |
Published | 2019-01-25 |
URL | https://arxiv.org/abs/1901.08746v4 |
https://arxiv.org/pdf/1901.08746v4.pdf | |
PWC | https://paperswithcode.com/paper/biobert-a-pre-trained-biomedical-language |
Repo | https://github.com/ManasRMohanty/DS5500-capstone |
Framework | none |
Learning to Denoise Distantly-Labeled Data for Entity Typing
Title | Learning to Denoise Distantly-Labeled Data for Entity Typing |
Authors | Yasumasa Onoe, Greg Durrett |
Abstract | Distantly-labeled data can be used to scale up training of statistical models, but it is typically noisy and that noise can vary with the distant labeling technique. In this work, we propose a two-stage procedure for handling this type of data: denoise it with a learned model, then train our final model on clean and denoised distant data with standard supervised training. Our denoising approach consists of two parts. First, a filtering function discards examples from the distantly labeled data that are wholly unusable. Second, a relabeling function repairs noisy labels for the retained examples. Each of these components is a model trained on synthetically-noised examples generated from a small manually-labeled set. We investigate this approach on the ultra-fine entity typing task of Choi et al. (2018). Our baseline model is an extension of their model with pre-trained ELMo representations, which already achieves state-of-the-art performance. Adding distant data that has been denoised with our learned models gives further performance gains over this base model, outperforming models trained on raw distant data or heuristically-denoised distant data. |
Tasks | Denoising, Entity Typing |
Published | 2019-05-04 |
URL | https://arxiv.org/abs/1905.01566v1 |
https://arxiv.org/pdf/1905.01566v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-denoise-distantly-labeled-data |
Repo | https://github.com/yasumasaonoe/DenoiseET |
Framework | pytorch |
Neural Contextual Bandits with UCB-based Exploration
Title | Neural Contextual Bandits with UCB-based Exploration |
Authors | Dongruo Zhou, Lihong Li, Quanquan Gu |
Abstract | We study the stochastic contextual bandit problem, where the reward is generated from an unknown function with additive noise. No assumption is made about the reward function other than boundedness. We propose a new algorithm, NeuralUCB, which leverages the representation power of deep neural networks and uses a neural network-based random feature mapping to construct an upper confidence bound (UCB) of reward for efficient exploration. We prove that, under standard assumptions, NeuralUCB achieves $\tilde O(\sqrt{T})$ regret, where $T$ is the number of rounds. To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee. We also show the algorithm is empirically competitive against representative baselines in a number of benchmarks. |
Tasks | Efficient Exploration, Multi-Armed Bandits |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04462v2 |
https://arxiv.org/pdf/1911.04462v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-contextual-bandits-with-upper |
Repo | https://github.com/sauxpa/neural_exploration |
Framework | none |
Semi-Conditional Normalizing Flows for Semi-Supervised Learning
Title | Semi-Conditional Normalizing Flows for Semi-Supervised Learning |
Authors | Andrei Atanov, Alexandra Volokhova, Arsenii Ashukha, Ivan Sosnovik, Dmitry Vetrov |
Abstract | This paper proposes a semi-conditional normalizing flow model for semi-supervised learning. The model uses both labeled and unlabeled data to learn an explicit model of the joint distribution over objects and labels. The semi-conditional architecture of the model allows us to efficiently compute the value and gradients of the marginal likelihood for unlabeled objects. The conditional part of the model is based on a proposed conditional coupling layer. We demonstrate the performance of the model on semi-supervised classification problems on different datasets. The model outperforms the baseline approach based on variational auto-encoders on the MNIST dataset. |
Tasks | |
Published | 2019-05-01 |
URL | http://arxiv.org/abs/1905.00505v1 |
http://arxiv.org/pdf/1905.00505v1.pdf | |
PWC | https://paperswithcode.com/paper/semi-conditional-normalizing-flows-for-semi |
Repo | https://github.com/bayesgroup/semi-supervised-NFs |
Framework | pytorch |
Emotion Recognition in Conversations with Transfer Learning from Generative Conversation Modeling
Title | Emotion Recognition in Conversations with Transfer Learning from Generative Conversation Modeling |
Authors | Devamanyu Hazarika, Soujanya Poria, Roger Zimmermann, Rada Mihalcea |
Abstract | Recognizing emotions in conversations is a challenging task due to the presence of contextual dependencies governed by self- and inter-personal influences. Recent approaches have focused on modeling these dependencies primarily via supervised learning. However, purely supervised strategies demand large amounts of annotated data, which is lacking in most of the available corpora in this task. To tackle this challenge, we look at transfer learning approaches as a viable alternative. Given the large amount of available conversational data, we investigate whether generative conversational models can be leveraged to transfer affective knowledge for the target task of detecting emotions in context. We propose an approach where we first train a neural dialogue model and then perform parameter transfer to initiate our target model. Apart from the traditional pre-trained sentence encoders, we also incorporate parameter transfer from the recurrent components that model inter-sentence context across the whole conversation. Based on this idea, we perform several experiments across multiple datasets and find improvement in performance and robustness against limited training data. Our models also achieve better validation performances in significantly fewer epochs. Overall, we infer that knowledge acquired from dialogue generators can indeed help recognize emotions in conversations. |
Tasks | Emotion Recognition, Emotion Recognition in Conversation, Transfer Learning |
Published | 2019-10-11 |
URL | https://arxiv.org/abs/1910.04980v2 |
https://arxiv.org/pdf/1910.04980v2.pdf | |
PWC | https://paperswithcode.com/paper/emotion-recognition-in-conversations-with |
Repo | https://github.com/SenticNet/conv-emotion |
Framework | pytorch |
Meta-learning Convolutional Neural Architectures for Multi-target Concrete Defect Classification with the COncrete DEfect BRidge IMage Dataset
Title | Meta-learning Convolutional Neural Architectures for Multi-target Concrete Defect Classification with the COncrete DEfect BRidge IMage Dataset |
Authors | Martin Mundt, Sagnik Majumder, Sreenivas Murali, Panagiotis Panetsos, Visvanathan Ramesh |
Abstract | Recognition of defects in concrete infrastructure, especially in bridges, is a costly and time-consuming but crucial first step in the assessment of structural integrity. Large variation in the appearance of the concrete material, changing illumination and weather conditions, a variety of possible surface markings, and the possibility for different types of defects to overlap make it a challenging real-world task. In this work we introduce the novel COncrete DEfect BRidge IMage dataset (CODEBRIM) for multi-target classification of five commonly appearing concrete defects. We investigate and compare two reinforcement learning based meta-learning approaches, MetaQNN and efficient neural architecture search, to find suitable convolutional neural network architectures for this challenging multi-class multi-target task. We show that learned architectures have fewer overall parameters in addition to yielding better multi-target accuracy in comparison to popular neural architectures from the literature evaluated in the context of our application. |
Tasks | Meta-Learning, Neural Architecture Search |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.08486v1 |
http://arxiv.org/pdf/1904.08486v1.pdf | |
PWC | https://paperswithcode.com/paper/190408486 |
Repo | https://github.com/SAGNIKMJR/CODEBRIM_MetaQNN |
Framework | pytorch |
Associatively Segmenting Instances and Semantics in Point Clouds
Title | Associatively Segmenting Instances and Semantics in Point Clouds |
Authors | Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia |
Abstract | A 3D point cloud describes the real scene precisely and intuitively. To date, how to segment diversified elements in such an informative 3D scene has rarely been discussed. In this paper, we first introduce a simple and flexible framework to segment instances and semantics in point clouds simultaneously. Then, we propose two approaches which make the two tasks take advantage of each other, leading to a win-win situation. Specifically, we make instance segmentation benefit from semantic segmentation through learning semantic-aware point-level instance embedding. Meanwhile, semantic features of the points belonging to the same instance are fused together to make more accurate per-point semantic predictions. Our method largely outperforms the state-of-the-art method in 3D instance segmentation along with a significant improvement in 3D semantic segmentation. Code has been made available at: https://github.com/WXinlong/ASIS. |
Tasks | 3D Instance Segmentation, 3D Semantic Segmentation, Instance Segmentation, Semantic Segmentation |
Published | 2019-02-26 |
URL | http://arxiv.org/abs/1902.09852v2 |
http://arxiv.org/pdf/1902.09852v2.pdf | |
PWC | https://paperswithcode.com/paper/associatively-segmenting-instances-and |
Repo | https://github.com/LebronGG/ASIS |
Framework | tf |
Adaptively Sparse Transformers
Title | Adaptively Sparse Transformers |
Authors | Gonçalo M. Correia, Vlad Niculae, André F. T. Martins |
Abstract | Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns. This sparsity is accomplished by replacing softmax with $\alpha$-entmax: a differentiable generalization of softmax that allows low-scoring words to receive precisely zero weight. Moreover, we derive a method to automatically learn the $\alpha$ parameter – which controls the shape and sparsity of $\alpha$-entmax – allowing attention heads to choose between focused or spread-out behavior. Our adaptively sparse Transformer improves interpretability and head diversity when compared to softmax Transformers on machine translation datasets. Findings of the quantitative and qualitative analysis of our approach include that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than softmax Transformers. Furthermore, at no cost in accuracy, sparsity in attention heads helps to uncover different head specializations. |
Tasks | Machine Translation |
Published | 2019-08-30 |
URL | https://arxiv.org/abs/1909.00015v2 |
https://arxiv.org/pdf/1909.00015v2.pdf | |
PWC | https://paperswithcode.com/paper/adaptively-sparse-transformers |
Repo | https://github.com/deep-spin/entmax |
Framework | pytorch |
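Note on the abstract above: the claim that $\alpha$-entmax can assign exactly zero weight is easiest to see in the special case $\alpha = 2$, known as sparsemax, which has a simple sort-and-threshold solution. The following is a minimal pure-Python sketch of that special case only, set against softmax for contrast. It is not the authors' implementation, it does not cover learned or general $\alpha$, and the function names are chosen here for illustration.

```python
import math


def softmax(scores):
    """Dense baseline: every score receives a strictly positive weight."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Euclidean projection of the scores onto the probability simplex.

    This is the alpha = 2 case of alpha-entmax. Scores below the
    threshold tau receive exactly zero weight.
    """
    ordered = sorted(scores, reverse=True)
    running_sum = 0.0
    tau = 0.0
    for k, value in enumerate(ordered, start=1):
        running_sum += value
        # The k-th largest score is in the support while this holds.
        if 1.0 + k * value > running_sum:
            tau = (running_sum - 1.0) / k
    return [max(s - tau, 0.0) for s in scores]


if __name__ == "__main__":
    attention_scores = [1.0, 0.8, -1.0]
    print("softmax  :", softmax(attention_scores))
    print("sparsemax:", sparsemax(attention_scores))
```

With the example scores, softmax leaves a small positive weight on the lowest-scoring item, while sparsemax sets it to zero and still returns weights that sum to one.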
Multidataset Independent Subspace Analysis with Application to Multimodal Fusion
Title | Multidataset Independent Subspace Analysis with Application to Multimodal Fusion |
Authors | Rogers F. Silva, Sergey M. Plis, Tulay Adali, Marios S. Pattichis, Vince D. Calhoun |
Abstract | In the last two decades, unsupervised latent variable models—blind source separation (BSS) especially—have enjoyed a strong reputation for the interpretable features they produce. Seldom do these models combine the rich diversity of information available in multiple datasets. Multidatasets, on the other hand, yield joint solutions otherwise unavailable in isolation, with a potential for pivotal insights into complex systems. To take advantage of the complex multidimensional subspace structures that capture underlying modes of shared and unique variability across and within datasets, we present a direct, principled approach to multidataset combination. We design a new method called multidataset independent subspace analysis (MISA) that leverages joint information from multiple heterogeneous datasets in a flexible and synergistic fashion. Methodological innovations exploiting the Kotz distribution for subspace modeling in conjunction with a novel combinatorial optimization for evasion of local minima enable MISA to produce a robust generalization of independent component analysis (ICA), independent vector analysis (IVA), and independent subspace analysis (ISA) in a single unified model. We highlight the utility of MISA for multimodal information fusion, including sample-poor regimes and low signal-to-noise ratio scenarios, promoting novel applications in both unimodal and multimodal brain imaging data. |
Tasks | Combinatorial Optimization, Latent Variable Models |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04048v1 |
https://arxiv.org/pdf/1911.04048v1.pdf | |
PWC | https://paperswithcode.com/paper/multidataset-independent-subspace-analysis |
Repo | https://github.com/rsilva8/MISA |
Framework | none |
End-to-end Lane Detection through Differentiable Least-Squares Fitting
Title | End-to-end Lane Detection through Differentiable Least-Squares Fitting |
Authors | Wouter Van Gansbeke, Bert De Brabandere, Davy Neven, Marc Proesmans, Luc Van Gool |
Abstract | Lane detection is typically tackled with a two-step pipeline in which a segmentation mask of the lane markings is predicted first, and a lane line model (like a parabola or spline) is fitted to the post-processed mask next. The problem with such a two-step approach is that the parameters of the network are not optimized for the true task of interest (estimating the lane curvature parameters) but for a proxy task (segmenting the lane markings), resulting in sub-optimal performance. In this work, we propose a method to train a lane detector in an end-to-end manner, directly regressing the lane parameters. The architecture consists of two components: a deep network that predicts a segmentation-like weight map for each lane line, and a differentiable least-squares fitting module that returns for each map the parameters of the best-fitting curve in the weighted least-squares sense. These parameters can subsequently be supervised with a loss function of choice. Our method relies on the observation that it is possible to backpropagate through a least-squares fitting procedure. This leads to an end-to-end method where the features are optimized for the true task of interest: the network implicitly learns to generate features that prevent instabilities during the model fitting step, as opposed to two-step pipelines that need to handle outliers with heuristics. Additionally, the system is not just a black box but offers a degree of interpretability because the intermediately generated segmentation-like weight maps can be inspected and visualized. Code and a video are available at github.com/wvangansbeke/LaneDetection_End2End. |
Tasks | Lane Detection |
Published | 2019-02-01 |
URL | https://arxiv.org/abs/1902.00293v3 |
https://arxiv.org/pdf/1902.00293v3.pdf | |
PWC | https://paperswithcode.com/paper/end-to-end-lane-detection-through |
Repo | https://github.com/wvangansbeke/LaneDetection_End2End |
Framework | pytorch |
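Note on the abstract above: the fitting module it describes returns curve parameters that are optimal in the weighted least-squares sense. A minimal pure-Python sketch of such a fit is given below for a parabola, solved in closed form through the normal equations. The parabola parameterisation, the function names, and the sample data are assumptions made for illustration and are not taken from the paper or its repository. The sketch shows only the forward computation; it consists of sums, products, and a linear solve, which is the kind of computation an automatic-differentiation framework can backpropagate through.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def weighted_parabola_fit(us, vs, weights):
    """Return (a, b, c) minimising sum_i w_i * (a*u_i**2 + b*u_i + c - v_i)**2."""
    design = [[u * u, u, 1.0] for u in us]
    # Normal equations: (A^T W A) beta = A^T W v
    lhs = [
        [sum(w * row[i] * row[j] for w, row in zip(weights, design))
         for j in range(3)]
        for i in range(3)
    ]
    rhs = [
        sum(w * row[i] * v for w, row, v in zip(weights, design, vs))
        for i in range(3)
    ]
    return tuple(solve_linear_system(lhs, rhs))


if __name__ == "__main__":
    us = [0.0, 1.0, 2.0, 3.0, 4.0]
    vs = [0.5 * u * u - 1.0 * u + 2.0 for u in us]  # exact parabola
    weights = [1.0, 0.5, 2.0, 1.0, 0.3]  # stand-in for a predicted weight map
    print(weighted_parabola_fit(us, vs, weights))
```

In the setting of the abstract, the weights would come from the network's predicted weight map rather than being fixed by hand as they are here.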
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning
Title | Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning |
Authors | Nathan Kallus, Masatoshi Uehara |
Abstract | Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. The problem’s importance has attracted many proposed solutions, including importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimates. DR and its variants ensure semiparametric local efficiency if Q-functions are well-specified, but if they are not, they can be worse than both IS and SNIS. They also do not enjoy SNIS’s inherent stability and boundedness. We propose new estimators for OPE based on empirical likelihood that are always more efficient than IS, SNIS, and DR and satisfy the same stability and boundedness properties as SNIS. On the way, we categorize various properties and classify existing estimators by them. Besides the theoretical guarantees, empirical studies suggest the new estimators provide advantages. |
Tasks | Multi-Armed Bandits |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.03735v1 |
https://arxiv.org/pdf/1906.03735v1.pdf | |
PWC | https://paperswithcode.com/paper/intrinsically-efficient-stable-and-bounded |
Repo | https://github.com/CausalML/IntrinsicallyEfficientStableOPE |
Framework | none |
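Note on the abstract above: the boundedness property it attributes to SNIS can be illustrated with the two baseline estimators in a one-step (contextual bandit) setting. The sketch below is a minimal pure-Python illustration with made-up logged data. It implements only standard IS and SNIS, not the empirical-likelihood estimators proposed in the paper, and the function names are chosen here for illustration.

```python
def importance_weights(target_probs, behavior_probs):
    """Ratio of target-policy to behavior-policy probability per logged action."""
    return [t / b for t, b in zip(target_probs, behavior_probs)]


def is_estimate(rewards, target_probs, behavior_probs):
    """Importance sampling: unbiased, but can leave the range of the rewards."""
    weights = importance_weights(target_probs, behavior_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


def snis_estimate(rewards, target_probs, behavior_probs):
    """Self-normalized IS: a weighted average, so it stays within the reward range."""
    weights = importance_weights(target_probs, behavior_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical logged data: rewards lie in [0, 1].
    rewards = [1.0, 0.0, 1.0, 1.0]
    target_probs = [0.9, 0.1, 0.9, 0.5]
    behavior_probs = [0.5, 0.5, 0.5, 0.5]
    print("IS  :", is_estimate(rewards, target_probs, behavior_probs))
    print("SNIS:", snis_estimate(rewards, target_probs, behavior_probs))
```

On this data the IS estimate is 1.15, which exceeds the largest possible reward of 1, while the SNIS estimate remains between the smallest and largest observed rewards.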
DDFlow: Learning Optical Flow with Unlabeled Data Distillation
Title | DDFlow: Learning Optical Flow with Unlabeled Data Distillation |
Authors | Pengpeng Liu, Irwin King, Michael R. Lyu, Jia Xu |
Abstract | We present DDFlow, a data distillation approach to learning optical flow estimation from unlabeled data. The approach distills reliable predictions from a teacher network, and uses these predictions as annotations to guide a student network to learn optical flow. Unlike existing work relying on hand-crafted energy terms to handle occlusion, our approach is data-driven, and learns optical flow for occluded pixels. This enables us to train our model with a much simpler loss function, and achieve a much higher accuracy. We conduct a rigorous evaluation on the challenging Flying Chairs, MPI Sintel, KITTI 2012 and 2015 benchmarks, and show that our approach significantly outperforms all existing unsupervised learning methods, while running in real time. |
Tasks | Optical Flow Estimation |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09145v1 |
http://arxiv.org/pdf/1902.09145v1.pdf | |
PWC | https://paperswithcode.com/paper/ddflow-learning-optical-flow-with-unlabeled |
Repo | https://github.com/ppliuboy/DDFlow |
Framework | tf |
BINet: a binary inpainting network for deep patch-based image compression
Title | BINet: a binary inpainting network for deep patch-based image compression |
Authors | André Nortje, Willie Brink, Herman A. Engelbrecht, Herman Kamper |
Abstract | Recent deep learning models outperform standard lossy image compression codecs. However, applying these models on a patch-by-patch basis requires that each image patch be encoded and decoded independently. The influence from adjacent patches is therefore lost, leading to block artefacts at low bitrates. We propose the Binary Inpainting Network (BINet), an autoencoder framework which incorporates binary inpainting to reinstate interdependencies between adjacent patches, for improved patch-based compression of still images. When decoding a patch, BINet additionally uses the binarised encodings from surrounding patches to guide its reconstruction. In contrast to sequential inpainting methods where patches are decoded based on previous reconstructions, BINet operates directly on the binary codes of surrounding patches without access to the original or reconstructed image data. Encoding and decoding can therefore be performed in parallel. We demonstrate that BINet improves the compression quality of a competitive deep image codec across a range of compression levels. |
Tasks | Image Compression |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05189v1 |
https://arxiv.org/pdf/1912.05189v1.pdf | |
PWC | https://paperswithcode.com/paper/binet-a-binary-inpainting-network-for-deep |
Repo | https://github.com/adnortje/binet |
Framework | pytorch |