October 20, 2019


Paper Group AWR 312


Conditional Density Estimation with Bayesian Normalising Flows


Title	Conditional Density Estimation with Bayesian Normalising Flows
Authors	Brian L Trippe, Richard E Turner
Abstract	Modeling complex conditional distributions is critical in a variety of settings. Despite a long tradition of research into conditional density estimation, current methods employ either simple parametric forms or are difficult to learn in practice. This paper employs normalising flows as a flexible likelihood model and presents an efficient method for fitting them to complex densities. These estimators must trade off distributional complexity, functional complexity and heteroscedasticity without overfitting. We recognize these trade-offs as modeling decisions and develop a Bayesian framework for placing priors over these conditional density estimators using variational Bayesian neural networks. We evaluate this method on several small benchmark regression datasets, on some of which it obtains state-of-the-art performance. Finally, we apply the method to two spatial density modeling tasks with over 1 million datapoints using the New York City yellow taxi dataset and the Chicago crime dataset.
Tasks	Density Estimation, Normalising Flows
Published	2018-02-14
URL	http://arxiv.org/abs/1802.04908v1
PDF	http://arxiv.org/pdf/1802.04908v1.pdf
PWC	https://paperswithcode.com/paper/conditional-density-estimation-with-bayesian
Repo	https://github.com/blt2114/CDE_with_BNF
Framework	tf
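As a rough illustration of the change-of-variables idea behind a normalising-flow likelihood, the sketch below evaluates a one-dimensional conditional density. The sinh-affine transform, the linear dependence of the shift on x, and the names `log_density`, `conditional_log_density`, `weight` and `log_scale` are all assumptions made for this example; they are not the flow family or the Bayesian neural network used in the paper.

```python
import math


def log_density(y, shift, log_scale):
    """Log p(y) when y = sinh(exp(log_scale) * z + shift) and z ~ N(0, 1)."""
    scale = math.exp(log_scale)
    # Normalising direction: map the observation back to the base variable.
    z = (math.asinh(y) - shift) / scale
    # Log density of the standard normal base distribution at z.
    log_base = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    # Log |dz/dy| = -log(scale) - 0.5 * log(1 + y^2).
    log_det = -log_scale - 0.5 * math.log1p(y * y)
    return log_base + log_det


def conditional_log_density(y, x, weight=0.8, log_scale=-0.5):
    """Hypothetical conditioning: the flow's shift is a linear function of x."""
    return log_density(y, weight * x, log_scale)


if __name__ == "__main__":
    print(conditional_log_density(1.0, x=1.0))
```

In a learned estimator the flow parameters would be produced by a network that takes x as input; the fixed `weight` here only stands in for that mapping.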


Non-blind Image Restoration Based on Convolutional Neural Network


Title	Non-blind Image Restoration Based on Convolutional Neural Network
Authors	Kazutaka Uchida, Masayuki Tanaka, Masatoshi Okutomi
Abstract	Blind image restoration processors based on convolutional neural networks (CNNs) are intensively researched because of their high performance. However, they are too sensitive to perturbations of the degradation model: they easily fail to restore an image whose degradation model differs slightly from the one used in training. In this paper, we propose a non-blind CNN-based image restoration processor that aims to be more robust to perturbations of the degradation model than blind restoration processors. Experimental comparisons demonstrate that the proposed non-blind CNN-based image restoration processor restores images more robustly than existing blind CNN-based image restoration processors.
Tasks	Image Restoration
Published	2018-09-11
URL	http://arxiv.org/abs/1809.03757v1
PDF	http://arxiv.org/pdf/1809.03757v1.pdf
PWC	https://paperswithcode.com/paper/non-blind-image-restoration-based-on
Repo	https://github.com/kuchida/NonBlindImageRestoration
Framework	none

On Offline Evaluation of Vision-based Driving Models


Title	On Offline Evaluation of Vision-based Driving Models
Authors	Felipe Codevilla, Antonio M. López, Vladlen Koltun, Alexey Dosovitskiy
Abstract	Autonomous driving models should ideally be evaluated by deploying them on a fleet of physical vehicles in the real world. Unfortunately, this approach is not practical for the vast majority of researchers. An attractive alternative is to evaluate models offline, on a pre-collected validation dataset with ground truth annotation. In this paper, we investigate the relation between various online and offline metrics for evaluation of autonomous driving models. We find that offline prediction error is not necessarily correlated with driving quality, and two models with identical prediction error can differ dramatically in their driving performance. We show that the correlation of offline evaluation with driving quality can be significantly improved by selecting an appropriate validation dataset and suitable offline metrics. The supplementary video can be viewed at https://www.youtube.com/watch?v=P8K8Z-iF0cY
Tasks	Autonomous Driving
Published	2018-09-13
URL	http://arxiv.org/abs/1809.04843v1
PDF	http://arxiv.org/pdf/1809.04843v1.pdf
PWC	https://paperswithcode.com/paper/on-offline-evaluation-of-vision-based-driving
Repo	https://github.com/felipecode/coiltraine
Framework	none

The Advantage of Doubling: A Deep Reinforcement Learning Approach to Studying the Double Team in the NBA


Title	The Advantage of Doubling: A Deep Reinforcement Learning Approach to Studying the Double Team in the NBA
Authors	Jiaxuan Wang, Ian Fox, Jonathan Skaza, Nick Linck, Satinder Singh, Jenna Wiens
Abstract	During the 2017 NBA playoffs, Celtics coach Brad Stevens was faced with a difficult decision when defending against the Cavaliers: “Do you double and risk giving up easy shots, or stay at home and do the best you can?” It’s a tough call, but finding a good defensive strategy that effectively incorporates doubling can make all the difference in the NBA. In this paper, we analyze double teaming in the NBA, quantifying the trade-off between risk and reward. Using player trajectory data pertaining to over 643,000 possessions, we identified when the ball handler was double teamed. Given these data and the corresponding outcome (i.e., was the defense successful), we used deep reinforcement learning to estimate the quality of the defensive actions. We present qualitative and quantitative results summarizing our learned defensive strategy for defending. We show that our policy value estimates are predictive of points per possession and win percentage. Overall, the proposed framework represents a step toward a more comprehensive understanding of defensive strategies in the NBA.
Tasks
Published	2018-03-08
URL	http://arxiv.org/abs/1803.02940v1
PDF	http://arxiv.org/pdf/1803.02940v1.pdf
PWC	https://paperswithcode.com/paper/the-advantage-of-doubling-a-deep
Repo	https://github.com/igfox/AdvantageOfDoubling
Framework	pytorch

Multi-Task Learning as Multi-Objective Optimization


Title	Multi-Task Learning as Multi-Objective Optimization
Authors	Ozan Sener, Vladlen Koltun
Abstract	In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. Multi-task learning is inherently a multi-objective problem because different tasks may conflict, necessitating a trade-off. A common compromise is to optimize a proxy objective that minimizes a weighted linear combination of per-task losses. However, this workaround is only valid when the tasks do not compete, which is rarely the case. In this paper, we explicitly cast multi-task learning as multi-objective optimization, with the overall objective of finding a Pareto optimal solution. To this end, we use algorithms developed in the gradient-based multi-objective optimization literature. These algorithms are not directly applicable to large-scale learning problems since they scale poorly with the dimensionality of the gradients and the number of tasks. We therefore propose an upper bound for the multi-objective loss and show that it can be optimized efficiently. We further prove that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions. We apply our method to a variety of multi-task deep learning problems including digit classification, scene understanding (joint semantic segmentation, instance segmentation, and depth estimation), and multi-label classification. Our method produces higher-performing models than recent multi-task learning formulations or per-task training.
Tasks	Depth Estimation, Instance Segmentation, Multi-Label Classification, Multi-Task Learning, Scene Understanding, Semantic Segmentation
Published	2018-10-10
URL	http://arxiv.org/abs/1810.04650v2
PDF	http://arxiv.org/pdf/1810.04650v2.pdf
PWC	https://paperswithcode.com/paper/multi-task-learning-as-multi-objective
Repo	https://github.com/IntelVCL/MultiObjectiveOptimization
Framework	pytorch
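To make the contrast with a fixed weighted sum concrete, the sketch below computes the minimum-norm convex combination of two task gradients, which has a closed form in the two-task case. This is a simplified illustration of the gradient-based multi-objective idea, not the authors' full algorithm or their upper bound; the function names `min_norm_weight` and `combine` are made up for this example, and gradients are plain Python lists.

```python
def min_norm_weight(g1, g2):
    """Return alpha in [0, 1] minimising ||alpha * g1 + (1 - alpha) * g2||^2."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(g1, g2))
    if diff_sq == 0.0:
        # Identical gradients: every convex combination is the same vector.
        return 0.5
    # Unconstrained minimiser: alpha = (g2 - g1) . g2 / ||g1 - g2||^2.
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / diff_sq
    # Clip so the result stays inside the convex hull of the two gradients.
    return min(1.0, max(0.0, alpha))


def combine(g1, g2):
    """Shared update direction built from the minimum-norm combination."""
    alpha = min_norm_weight(g1, g2)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]


if __name__ == "__main__":
    print(combine([1.0, 0.0], [0.0, 1.0]))
```

When the combined vector is non-zero, it has a non-negative inner product with both task gradients, so a small step against it does not increase either loss to first order. A fixed weighted sum offers no such guarantee when the tasks conflict.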

Variational Cross-domain Natural Language Generation for Spoken Dialogue Systems


Title	Variational Cross-domain Natural Language Generation for Spoken Dialogue Systems
Authors	Bo-Hsiang Tseng, Florian Kreyssig, Pawel Budzianowski, Inigo Casanueva, Yen-Chen Wu, Stefan Ultes, Milica Gasic
Abstract	Cross-domain natural language generation (NLG) is still a difficult task within spoken dialogue modelling. Given a semantic representation provided by the dialogue manager, the language generator should generate sentences that convey the desired information. Traditional template-based generators can produce sentences with all necessary information, but these sentences are not sufficiently diverse. With RNN-based models, the diversity of the generated sentences can be high; however, some information is lost in the process. In this work, we improve an RNN-based generator by considering latent information at the sentence level during generation using the conditional variational autoencoder architecture. We demonstrate that our model outperforms the original RNN-based generator, while yielding highly diverse sentences. In addition, our model performs better when the training data is limited.
Tasks	Spoken Dialogue Systems, Text Generation
Published	2018-12-20
URL	http://arxiv.org/abs/1812.08879v1
PDF	http://arxiv.org/pdf/1812.08879v1.pdf
PWC	https://paperswithcode.com/paper/variational-cross-domain-natural-language
Repo	https://github.com/andy194673/nlg-scvae
Framework	pytorch

DSKG: A Deep Sequential Model for Knowledge Graph Completion


Title	DSKG: A Deep Sequential Model for Knowledge Graph Completion
Authors	Lingbing Guo, Qingheng Zhang, Weiyi Ge, Wei Hu, Yuzhong Qu
Abstract	Knowledge graph (KG) completion aims to fill in the missing facts in a KG, where a fact is represented as a triple in the form of $(subject, relation, object)$. Current KG completion models require two-thirds of a triple to be provided (e.g., $subject$ and $relation$) in order to predict the remaining element. In this paper, we propose a new model, which uses a KG-specific multi-layer recurrent neural network (RNN) to model triples in a KG as sequences. It outperformed several state-of-the-art KG completion models on the conventional entity prediction task for many evaluation metrics, based on two benchmark datasets and a more difficult dataset. Furthermore, because of its sequential nature, our model is capable of predicting whole triples given only one entity. Our experiments demonstrated that our model achieved promising performance on this new triple prediction task.
Tasks	Knowledge Graph Completion
Published	2018-10-30
URL	http://arxiv.org/abs/1810.12582v2
PDF	http://arxiv.org/pdf/1810.12582v2.pdf
PWC	https://paperswithcode.com/paper/dskg-a-deep-sequential-model-for-knowledge
Repo	https://github.com/nju-websoft/DSKG
Framework	tf

Offline Multi-Action Policy Learning: Generalization and Optimization


Title	Offline Multi-Action Policy Learning: Generalization and Optimization
Authors	Zhengyuan Zhou, Susan Athey, Stefan Wager
Abstract	In many settings, a decision-maker wishes to learn a rule, or policy, that maps from observable characteristics of an individual to an action. Examples include selecting offers, prices, advertisements, or emails to send to consumers, as well as the problem of determining which medication to prescribe to a patient. While there is a growing body of literature devoted to this problem, most existing results are focused on the case where data comes from a randomized experiment, and further, there are only two possible actions, such as giving a drug to a patient or not. In this paper, we study the offline multi-action policy learning problem with observational data and where the policy may need to respect budget constraints or belong to a restricted policy class such as decision trees. We build on the theory of efficient semi-parametric inference in order to propose and implement a policy learning algorithm that achieves asymptotically minimax-optimal regret. To the best of our knowledge, this is the first result of this type in the multi-action setup, and it provides a substantial performance improvement over the existing learning algorithms. We then consider additional computational challenges that arise in implementing our method for the case where the policy is restricted to take the form of a decision tree. We propose two different approaches, one using a mixed integer program formulation and the other using a tree-search based algorithm.
Tasks
Published	2018-10-10
URL	http://arxiv.org/abs/1810.04778v2
PDF	http://arxiv.org/pdf/1810.04778v2.pdf
PWC	https://paperswithcode.com/paper/offline-multi-action-policy-learning
Repo	https://github.com/grf-labs/policyTree
Framework	none

Hybrid semi-Markov CRF for Neural Sequence Labeling


Title	Hybrid semi-Markov CRF for Neural Sequence Labeling
Authors	Zhi-Xiu Ye, Zhen-Hua Ling
Abstract	This paper proposes hybrid semi-Markov conditional random fields (SCRFs) for neural sequence labeling in natural language processing. Based on conventional conditional random fields (CRFs), SCRFs have been designed for the tasks of assigning labels to segments by extracting features from and describing transitions between segments instead of words. In this paper, we improve the existing SCRF methods by employing word-level and segment-level information simultaneously. First, word-level labels are utilized to derive the segment scores in SCRFs. Second, a CRF output layer and an SCRF output layer are integrated into a unified neural network and trained jointly. Experimental results on the CoNLL 2003 named entity recognition (NER) shared task show that our model achieves state-of-the-art performance when no external knowledge is used.
Tasks	Named Entity Recognition
Published	2018-05-10
URL	http://arxiv.org/abs/1805.03838v1
PDF	http://arxiv.org/pdf/1805.03838v1.pdf
PWC	https://paperswithcode.com/paper/hybrid-semi-markov-crf-for-neural-sequence
Repo	https://github.com/ZhixiuYe/HSCRF-pytorch
Framework	pytorch

Large-Margin Classification in Hyperbolic Space


Title	Large-Margin Classification in Hyperbolic Space
Authors	Hyunghoon Cho, Benjamin DeMeo, Jian Peng, Bonnie Berger
Abstract	Representing data in hyperbolic space can effectively capture latent hierarchical relationships. With the goal of enabling accurate classification of points in hyperbolic space while respecting their hyperbolic geometry, we introduce hyperbolic SVM, a hyperbolic formulation of support vector machine classifiers, and elucidate through new theoretical work its connection to the Euclidean counterpart. We demonstrate the performance improvement of hyperbolic SVM for multi-class prediction tasks on real-world complex networks as well as simulated datasets. Our work allows analytic pipelines that take the inherent hyperbolic geometry of the data into account in an end-to-end fashion without resorting to ill-fitting tools developed for Euclidean space.
Tasks
Published	2018-06-01
URL	http://arxiv.org/abs/1806.00437v1
PDF	http://arxiv.org/pdf/1806.00437v1.pdf
PWC	https://paperswithcode.com/paper/large-margin-classification-in-hyperbolic
Repo	https://github.com/plumdeq/hsvm
Framework	pytorch
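For readers unfamiliar with hyperbolic geometry, the sketch below computes the geodesic distance between two points in the Poincaré ball, one common model of hyperbolic space. The choice of this model and the function name `poincare_distance` are assumptions for illustration only; the abstract does not say which model the hyperbolic SVM is formulated in.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    norm_u_sq = sum(c * c for c in u)
    norm_v_sq = sum(c * c for c in v)
    if norm_u_sq >= 1.0 or norm_v_sq >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2) (1 - ||v||^2)))
    ratio = 2.0 * diff_sq / ((1.0 - norm_u_sq) * (1.0 - norm_v_sq))
    return math.acosh(1.0 + ratio)


if __name__ == "__main__":
    # The same Euclidean gap of 0.1 is much longer near the boundary.
    print(poincare_distance([0.0, 0.0], [0.1, 0.0]))
    print(poincare_distance([0.8, 0.0], [0.9, 0.0]))
```

Distances grow rapidly towards the boundary of the ball, which is why tree-like hierarchies embed well in this geometry and why Euclidean margins can be a poor fit for such embeddings.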

Position Detection and Direction Prediction for Arbitrary-Oriented Ships via Multitask Rotation Region Convolutional Neural Network


Title	Position Detection and Direction Prediction for Arbitrary-Oriented Ships via Multitask Rotation Region Convolutional Neural Network
Authors	Xue Yang, Hao Sun, Xian Sun, Menglong Yan, Zhi Guo, Kun Fu
Abstract	Ship detection is of great importance and full of challenges in the field of remote sensing. The complexity of application scenarios, the redundancy of the detection region, and the difficulty of dense ship detection are the main obstacles that limit the successful operation of traditional methods in ship detection. In this paper, we propose a new detection model based on a multitask rotational region convolutional neural network to solve the problems above. This model mainly consists of five consecutive parts: Dense Feature Pyramid Network (DFPN), adaptive region of interest (ROI) Align, rotational bounding box regression, prow direction prediction and rotational nonmaximum suppression (R-NMS). First, the low-level location information and high-level semantic information are fully utilized through multiscale feature networks. Then, we design Adaptive ROI Align to obtain high-quality proposals that retain complete spatial and semantic information. Unlike most previous approaches, the prediction obtained by our method is the minimum bounding rectangle of the object, with fewer redundant regions. Therefore, the rotational region detection framework is more suitable for detecting dense objects than traditional detection models. Additionally, we can find the berthing and sailing direction of a ship through prediction. A detailed evaluation based on SRSS for rotation detection shows that our detection method has competitive performance.
Tasks
Published	2018-06-13
URL	http://arxiv.org/abs/1806.04828v2
PDF	http://arxiv.org/pdf/1806.04828v2.pdf
PWC	https://paperswithcode.com/paper/position-detection-and-direction-prediction
Repo	https://github.com/DetectionTeamUCAS/RRPN_Faster-RCNN_Tensorflow
Framework	tf

Development and Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans


Title	Development and Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans
Authors	Sasank Chilamkurthy, Rohit Ghosh, Swetha Tanamala, Mustafa Biviji, Norbert G. Campeau, Vasantha Kumar Venugopal, Vidur Mahajan, Pooja Rao, Prashant Warier
Abstract	Importance: Non-contrast head CT scan is the current standard for initial imaging of patients with head trauma or stroke symptoms. Objective: To develop and validate a set of deep learning algorithms for automated detection of the following key findings from non-contrast head CT scans: intracranial hemorrhage (ICH) and its types, intraparenchymal (IPH), intraventricular (IVH), subdural (SDH), extradural (EDH) and subarachnoid (SAH) hemorrhages, calvarial fractures, midline shift and mass effect. Design and Settings: We retrospectively collected a dataset containing 313,318 head CT scans along with their clinical reports from various centers. A part of this dataset (Qure25k dataset) was used to validate and the rest to develop algorithms. Additionally, a dataset (CQ500 dataset) was collected from different centers in two batches B1 & B2 to clinically validate the algorithms. Main Outcomes and Measures: The original clinical radiology report and the consensus of three independent radiologists were considered the gold standard for the Qure25k and CQ500 datasets respectively. Area under the receiver operating characteristic curve (AUC) for each finding was primarily used to evaluate the algorithms. Results: The Qure25k dataset contained 21,095 scans (mean age 43.31; 42.87% female) while batches B1 and B2 of the CQ500 dataset consisted of 214 (mean age 43.40; 43.92% female) and 277 (mean age 51.70; 30.31% female) scans respectively. On the Qure25k dataset, the algorithms achieved AUCs of 0.9194, 0.8977, 0.9559, 0.9161, 0.9288 and 0.9044 for detecting ICH, IPH, IVH, SDH, EDH and SAH respectively. AUCs for the same on the CQ500 dataset were 0.9419, 0.9544, 0.9310, 0.9521, 0.9731 and 0.9574 respectively. For detecting calvarial fractures, midline shift and mass effect, AUCs on the Qure25k dataset were 0.9244, 0.9276 and 0.8583 respectively, while AUCs on the CQ500 dataset were 0.9624, 0.9697 and 0.9216 respectively.
Tasks
Published	2018-03-13
URL	http://arxiv.org/abs/1803.05854v2
PDF	http://arxiv.org/pdf/1803.05854v2.pdf
PWC	https://paperswithcode.com/paper/development-and-validation-of-deep-learning
Repo	https://github.com/jarodroland/ConvOuch
Framework	none
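Since the abstract above reports every result as an AUC, a minimal sketch of how that metric can be computed may help. The code uses the pairwise (Mann-Whitney) definition: the fraction of positive-negative pairs in which the positive scan receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative only and have nothing to do with the Qure25k or CQ500 data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: 1 = finding present, 0 = finding absent.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The double loop is quadratic in the number of scans; for datasets of the size described above, a rank-based implementation would be the practical choice.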

Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text


Title	Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text
Authors	Iroro Orife
Abstract	Yorùbá is a widely spoken West African language with a writing system rich in tonal and orthographic diacritics. With very few exceptions, diacritics are omitted from electronic texts, due to limited device and application support. Diacritics provide morphological information, are crucial for lexical disambiguation, pronunciation and are vital for any Yorùbá text-to-speech (TTS), automatic speech recognition (ASR) and natural language processing (NLP) tasks. Reframing Automatic Diacritic Restoration (ADR) as a machine translation task, we experiment with two different attentive Sequence-to-Sequence neural models to process undiacritized text. On our evaluation dataset, this approach produces diacritization error rates of less than 5%. We have released pre-trained models, datasets and source-code as an open-source project to advance efforts on Yorùbá language technology.
Tasks	Machine Translation, Speech Recognition
Published	2018-04-03
URL	http://arxiv.org/abs/1804.00832v2
PDF	http://arxiv.org/pdf/1804.00832v2.pdf
PWC	https://paperswithcode.com/paper/attentive-sequence-to-sequence-learning-for
Repo	https://github.com/Niger-Volta-LTI/yoruba-adr
Framework	pytorch
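Framing diacritic restoration as translation needs parallel data: undiacritized text as the source and the original text as the target. One plausible way to build such pairs, sketched below, is to strip combining marks from diacritized text with Python's standard `unicodedata` module. The helper names `strip_diacritics` and `make_pair` are hypothetical, and this is not necessarily how the released datasets were prepared.

```python
import unicodedata


def strip_diacritics(text):
    """Remove tone marks and underdots by dropping Unicode combining marks."""
    # NFD splits each accented character into a base letter plus marks.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def make_pair(diacritized):
    """Return a (source, target) pair for a sequence-to-sequence model."""
    return strip_diacritics(diacritized), diacritized


if __name__ == "__main__":
    print(make_pair("Yorùbá"))
```

Stripping is many-to-one, so several distinct words can collapse to the same undiacritized form. That ambiguity is what the restoration model has to resolve from context.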

W-TALC: Weakly-supervised Temporal Activity Localization and Classification


Title	W-TALC: Weakly-supervised Temporal Activity Localization and Classification
Authors	Sujoy Paul, Sourya Roy, Amit K Roy-Chowdhury
Abstract	Most activity localization methods in the literature suffer from the burden of frame-wise annotation requirement. Learning from weak labels may be a potential solution towards reducing such manual labeling effort. Recent years have witnessed a substantial influx of tagged videos on the Internet, which can serve as a rich source of weakly-supervised training data. Specifically, the correlations between videos with similar tags can be utilized to temporally localize the activities. Towards this goal, we present W-TALC, a Weakly-supervised Temporal Activity Localization and Classification framework using only video-level labels. The proposed network can be divided into two sub-networks, namely the Two-Stream based feature extractor network and a weakly-supervised module, which we learn by optimizing two complementary loss functions. Qualitative and quantitative results on two challenging datasets, Thumos14 and ActivityNet1.2, demonstrate that the proposed method is able to detect activities at a fine granularity and achieve better performance than current state-of-the-art methods.
Tasks	Weakly Supervised Action Localization
Published	2018-07-27
URL	http://arxiv.org/abs/1807.10418v3
PDF	http://arxiv.org/pdf/1807.10418v3.pdf
PWC	https://paperswithcode.com/paper/w-talc-weakly-supervised-temporal-activity
Repo	https://github.com/sujoyp/wtalc-pytorch
Framework	pytorch

Recurrent Convolutional Fusion for RGB-D Object Recognition


Title	Recurrent Convolutional Fusion for RGB-D Object Recognition
Authors	Mohammad Reza Loghmani, Mirco Planamente, Barbara Caputo, Markus Vincze
Abstract	Providing machines with the ability to recognize objects like humans has always been one of the primary goals of machine vision. The introduction of RGB-D cameras has paved the way for a significant leap forward in this direction thanks to the rich information provided by these sensors. However, the machine vision community still lacks an effective method to synergically use the RGB and depth data to improve object recognition. In order to take a step in this direction, we introduce a novel end-to-end architecture for RGB-D object recognition called recurrent convolutional fusion (RCFusion). Our method generates compact and highly discriminative multi-modal features by combining complementary RGB and depth information representing different levels of abstraction. Extensive experiments on two popular datasets, RGB-D Object Dataset and JHUIT-50, show that RCFusion significantly outperforms state-of-the-art approaches in both the object categorization and instance recognition tasks.
Tasks	Object Recognition
Published	2018-06-05
URL	http://arxiv.org/abs/1806.01673v3
PDF	http://arxiv.org/pdf/1806.01673v3.pdf
PWC	https://paperswithcode.com/paper/recurrent-convolutional-fusion-for-rgb-d
Repo	https://github.com/MRLoghmani/rcfusion
Framework	tf