January 28, 2020

3077 words 15 mins read

Paper Group ANR 910

Answer Them All! Toward Universal Visual Question Answering Models. Hybrid Camera Pose Estimation with Online Partitioning. Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator. Deep Structured Cross-Modal Anomaly Detection. DublinCity: Annotated LiDAR Point Cloud and its Applications. Task Agnostic Continual Learning via Meta L …

Answer Them All! Toward Universal Visual Question Answering Models

Title Answer Them All! Toward Universal Visual Question Answering Models
Authors Robik Shrestha, Kushal Kafle, Christopher Kanan
Abstract Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning. A good VQA algorithm should be capable of both, but only a few VQA algorithms are tested in this manner. We compare five state-of-the-art VQA algorithms across eight VQA datasets covering both domains. To make the comparison fair, all of the models are standardized as much as possible, e.g., they use the same visual features, answer vocabularies, etc. We find that methods do not generalize across the two domains. To address this problem, we propose a new VQA algorithm that rivals or exceeds the state-of-the-art for both domains.
Tasks Question Answering, Visual Question Answering
Published 2019-03-01
URL http://arxiv.org/abs/1903.00366v2
PDF http://arxiv.org/pdf/1903.00366v2.pdf
PWC https://paperswithcode.com/paper/answer-them-all-toward-universal-visual
Repo
Framework
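
The comparison hinges on standardizing every model behind the same visual features and one shared answer vocabulary. A minimal sketch of such an evaluation harness, assuming hypothetical toy datasets and a trivial prior baseline in place of the real VQA models:

```python
# Sketch of the standardized evaluation protocol: every model answers every
# dataset from one shared answer vocabulary. The baseline model and the toy
# datasets below are hypothetical placeholders, not the paper's algorithms.

ANSWER_VOCAB = ["yes", "no", "red", "two"]      # one vocabulary shared by all models

class PriorBaseline:
    """Answers every question with a fixed vocabulary entry (a trivial stand-in)."""
    def __init__(self, answer="yes"):
        self.answer = answer
    def predict(self, image_feats, question, vocab):
        return self.answer if self.answer in vocab else vocab[0]

def evaluate(model, dataset, vocab=ANSWER_VOCAB):
    correct = sum(model.predict(f, q, vocab) == a for f, q, a in dataset)
    return correct / len(dataset)

# toy "natural" and "synthetic reasoning" datasets: (image_features, question, answer)
natural = [(None, "Is there a dog?", "yes"), (None, "What color is the car?", "red")]
synthetic = [(None, "How many cubes?", "two"), (None, "Is the sphere left of the cube?", "no")]
for name, data in [("natural", natural), ("synthetic", synthetic)]:
    print(name, evaluate(PriorBaseline(), data))
```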

Hybrid Camera Pose Estimation with Online Partitioning

Title Hybrid Camera Pose Estimation with Online Partitioning
Authors Xinyi Li, Haibin Ling
Abstract This paper presents a hybrid real-time camera pose estimation framework with a novel partitioning scheme and introduces motion averaging to on-line monocular systems. Breaking through the limitations of fixed-size temporal partitioning in most conventional pose estimation mechanisms, the proposed approach significantly improves the accuracy of local bundle adjustment by gathering spatially-strongly-connected cameras into each block. With dynamic initialization using intermediate computation values, our proposed self-adaptive Levenberg-Marquardt solver achieves a quadratic convergence rate to further enhance the efficiency of the local optimization. Moreover, the dense data association between blocks by virtue of our co-visibility-based partitioning enables us to explore and implement motion averaging to efficiently align the blocks globally, updating camera motion estimations on-the-fly. Experimental results on benchmarks convincingly demonstrate the practicality and robustness of our proposed approach by outperforming conventional bundle adjustment by orders of magnitude.
Tasks Pose Estimation
Published 2019-08-05
URL https://arxiv.org/abs/1908.01797v1
PDF https://arxiv.org/pdf/1908.01797v1.pdf
PWC https://paperswithcode.com/paper/hybrid-camera-pose-estimation-with-online
Repo
Framework
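
For reference, the local refinement step this solver damps and iterates is the standard Levenberg-Marquardt update below; the paper's self-adaptive damping schedule and dynamic initialization are not reproduced here.

```latex
% Standard Levenberg-Marquardt step on residuals r(\theta) with Jacobian J = \partial r / \partial \theta:
\bigl( J^{\top} J + \lambda \,\operatorname{diag}(J^{\top} J) \bigr)\, \delta = -\, J^{\top} r,
\qquad
\theta \leftarrow \theta + \delta,
% with the damping \lambda increased when the step worsens the cost and decreased otherwise.
```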

Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator

Title Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator
Authors Tian Zhao, Yaqi Zhang, Kunle Olukotun
Abstract Recurrent Neural Network (RNN) applications form a major class of AI-powered, low-latency data center workloads. Most execution models for RNN acceleration break computation graphs into BLAS kernels, which leads to significant inter-kernel data movement and resource underutilization. We show that by supporting more general loop constructs that capture design parameters in accelerators, it is possible to improve resource utilization using cross-kernel optimization without sacrificing programmability. Such an abstraction level enables a design space search that can lead to efficient usage of on-chip resources on a spatial architecture across a range of problem sizes. We evaluate our optimization strategy on this abstraction with DeepBench using a configurable spatial accelerator. We demonstrate that this implementation provides a geometric speedup of 30x in performance, 1.6x in area, and 2x in power efficiency compared to a Tesla V100 GPU, and a geometric speedup of 2x compared to the Microsoft Brainwave implementation on a Stratix 10 FPGA.
Tasks
Published 2019-09-26
URL https://arxiv.org/abs/1909.13654v1
PDF https://arxiv.org/pdf/1909.13654v1.pdf
PWC https://paperswithcode.com/paper/serving-recurrent-neural-networks-efficiently
Repo
Framework
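
The core claim is that splitting an RNN cell into separate BLAS kernels materializes intermediates between kernels, while a more general loop construct lets the whole step stay fused. A NumPy sketch contrasting the two execution styles for a vanilla RNN step (purely illustrative; the actual mapping is expressed for the accelerator, not in Python):

```python
import numpy as np

def rnn_step_blas(x, h, Wx, Wh, b):
    # Kernel-by-kernel: each matmul is a separate BLAS call whose output
    # is materialized before the next kernel runs.
    a1 = x @ Wx                       # kernel 1
    a2 = h @ Wh                       # kernel 2
    return np.tanh(a1 + a2 + b)       # kernel 3 (elementwise)

def rnn_step_fused(x, h, Wx, Wh, b):
    # Cross-kernel view: one loop nest over output units, keeping partial
    # sums local instead of writing intermediates back between kernels.
    out = np.empty_like(h)
    for j in range(h.shape[0]):
        out[j] = np.tanh(b[j] + x @ Wx[:, j] + h @ Wh[:, j])
    return out

rng = np.random.default_rng(0)
d, n = 4, 3
x, h = rng.normal(size=d), rng.normal(size=n)
Wx, Wh, b = rng.normal(size=(d, n)), rng.normal(size=(n, n)), rng.normal(size=n)
assert np.allclose(rnn_step_blas(x, h, Wx, Wh, b), rnn_step_fused(x, h, Wx, Wh, b))
```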

Deep Structured Cross-Modal Anomaly Detection

Title Deep Structured Cross-Modal Anomaly Detection
Authors Yuening Li, Ninghao Liu, Jundong Li, Mengnan Du, Xia Hu
Abstract Anomaly detection is a fundamental problem in data mining with many real-world applications. The vast majority of existing anomaly detection methods focus predominantly on data collected from a single source. In real-world applications, instances often have multiple types of features, such as images (ID photos, fingerprints) and texts (bank transaction histories, user online social media posts), resulting in the so-called multi-modal data. In this paper, we focus on identifying anomalies whose patterns are disparate across different modalities, i.e., cross-modal anomalies. Some data instances within a multi-modal context are not anomalous when they are viewed separately in each individual modality, but contain inconsistent patterns when multiple sources are jointly considered. The existence of multi-modal data in many real-world scenarios brings both opportunities and challenges to the canonical task of anomaly detection. On the one hand, in multi-modal data, information from different modalities may complement each other in improving the detection performance. On the other hand, complicated distributions across different modalities call for a principled framework to characterize their inherent and complex correlations, which are often difficult to capture with conventional linear models. To this end, we propose a novel deep structured anomaly detection framework to identify the cross-modal anomalies embedded in the data. Experiments on real-world datasets demonstrate the effectiveness of the proposed framework compared with the state-of-the-art.
Tasks Anomaly Detection
Published 2019-08-11
URL https://arxiv.org/abs/1908.03848v1
PDF https://arxiv.org/pdf/1908.03848v1.pdf
PWC https://paperswithcode.com/paper/deep-structured-cross-modal-anomaly-detection
Repo
Framework
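
As a rough linear stand-in for the deep structured framework (which the paper argues is needed precisely because linear models fall short), the idea of flagging instances whose modalities disagree can be sketched with CCA: project both modalities into a shared space and score each pair by its disagreement there. The data below is synthetic and the scoring rule illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, d_img, d_txt = 500, 20, 15

# Paired "image" and "text" features driven by a shared latent factor.
z = rng.normal(size=(n, 5))
X_img = z @ rng.normal(size=(5, d_img)) + 0.1 * rng.normal(size=(n, d_img))
X_txt = z @ rng.normal(size=(5, d_txt)) + 0.1 * rng.normal(size=(n, d_txt))
# Inject cross-modal anomalies: the last 10 instances get mismatched text.
X_txt[-10:] = X_txt[rng.permutation(n)[:10]]

cca = CCA(n_components=5).fit(X_img, X_txt)
U, V = cca.transform(X_img, X_txt)        # shared-space projections of each modality
score = np.linalg.norm(U - V, axis=1)     # cross-modal disagreement = anomaly score

print("flagged indices:", sorted(np.argsort(score)[-10:]))
```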

DublinCity: Annotated LiDAR Point Cloud and its Applications

Title DublinCity: Annotated LiDAR Point Cloud and its Applications
Authors S. M. Iman Zolanvari, Susana Ruano, Aakanksha Rana, Alan Cummins, Rogerio Eduardo da Silva, Morteza Rahbar, Aljosa Smolic
Abstract Scene understanding of full-scale 3D models of an urban area remains a challenging task. While advanced computer vision techniques offer cost-effective approaches to analyse 3D urban elements, a precise and densely labelled dataset is quintessential. The paper presents the first-ever labelled dataset for a highly dense Aerial Laser Scanning (ALS) point cloud at city-scale. This work introduces a novel benchmark dataset that includes a manually annotated point cloud for over 260 million laser scanning points into 100’000 (approx.) assets from Dublin LiDAR point cloud [12] in 2015. Objects are labelled into 13 classes using hierarchical levels of detail from large (i.e., building, vegetation and ground) to refined (i.e., window, door and tree) elements. To validate the performance of our dataset, two different applications are showcased. Firstly, the labelled point cloud is employed for training Convolutional Neural Networks (CNNs) to classify urban elements. The dataset is tested on the well-known state-of-the-art CNNs (i.e., PointNet, PointNet++ and So-Net). Secondly, the complete ALS dataset is applied as detailed ground truth for city-scale image-based 3D reconstruction.
Tasks 3D Reconstruction, Scene Understanding
Published 2019-09-06
URL https://arxiv.org/abs/1909.03613v1
PDF https://arxiv.org/pdf/1909.03613v1.pdf
PWC https://paperswithcode.com/paper/dublincity-annotated-lidar-point-cloud-and
Repo
Framework

Task Agnostic Continual Learning via Meta Learning

Title Task Agnostic Continual Learning via Meta Learning
Authors Xu He, Jakub Sygnowski, Alexandre Galashov, Andrei A. Rusu, Yee Whye Teh, Razvan Pascanu
Abstract While neural networks are powerful function approximators, they suffer from catastrophic forgetting when the data distribution is not stationary. One particular formalism that studies learning under non-stationary distributions is provided by continual learning, where the non-stationarity is imposed by a sequence of distinct tasks. Most methods in this space assume, however, knowledge of task boundaries, and focus on alleviating catastrophic forgetting. In this work, we depart from this view and move the focus towards faster remembering – i.e., measuring how quickly the network recovers performance rather than measuring the network’s performance without any adaptation. We argue that in many settings this can be more effective and that it opens the door to combining meta-learning and continual learning techniques, leveraging their complementary advantages. We propose a framework specific to the scenario where no information about task boundaries or task identity is given. It relies on a separation of concerns into what task is being solved and how the task should be solved. This framework is implemented by differentiating task-specific parameters from task-agnostic parameters, where the latter are optimized in a continual meta-learning fashion, without access to multiple tasks at the same time. We showcase this framework in a supervised learning scenario and discuss the implications of the proposed formalism.
Tasks Continual Learning, Meta-Learning
Published 2019-06-12
URL https://arxiv.org/abs/1906.05201v1
PDF https://arxiv.org/pdf/1906.05201v1.pdf
PWC https://paperswithcode.com/paper/task-agnostic-continual-learning-via-meta
Repo
Framework
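
The "faster remembering" criterion can be made concrete as a small evaluation routine: measure accuracy on a returning task as a function of the number of recovery gradient steps. A hedged PyTorch sketch of that metric, with a toy model and synthetic data in place of the authors' task-specific/task-agnostic architecture:

```python
import copy
import torch
from torch import nn

def recovery_curve(model, loss_fn, xs, ys, steps=5, lr=1e-2):
    """Accuracy after 0..steps adaptation steps on a returning task.

    'Faster remembering' means this curve climbs back quickly, even if the
    zero-shot (step 0) accuracy is poor.
    """
    model = copy.deepcopy(model)                # leave the deployed model untouched
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    curve = []
    for step in range(steps + 1):
        with torch.no_grad():
            acc = (model(xs).argmax(dim=1) == ys).float().mean().item()
        curve.append(acc)
        if step < steps:                        # adapt before the next measurement
            opt.zero_grad()
            loss_fn(model(xs), ys).backward()
            opt.step()
    return curve

# tiny synthetic "returning task"
torch.manual_seed(0)
xs, ys = torch.randn(64, 10), torch.randint(0, 3, (64,))
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
print(recovery_curve(net, nn.CrossEntropyLoss(), xs, ys))
```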

Submodular Batch Selection for Training Deep Neural Networks

Title Submodular Batch Selection for Training Deep Neural Networks
Authors K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian
Abstract Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation captures the informativeness of each sample and diversity of the whole subset. We design an efficient, greedy algorithm which can give high-quality solutions to this NP-hard combinatorial optimization problem. Our extensive experiments on standard datasets show that the deep models trained using the proposed batch selection strategy provide better generalization than Stochastic Gradient Descent as well as a popular baseline sampling strategy across different learning rates, batch sizes, and distance metrics.
Tasks Combinatorial Optimization
Published 2019-06-20
URL https://arxiv.org/abs/1906.08771v1
PDF https://arxiv.org/pdf/1906.08771v1.pdf
PWC https://paperswithcode.com/paper/submodular-batch-selection-for-training-deep
Repo
Framework
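
A minimal sketch of greedy selection under a monotone submodular batch objective; the facility-location term stands in for diversity and the per-sample loss for informativeness, so the exact formulation may differ from the paper's.

```python
import numpy as np

def greedy_submodular_batch(features, losses, batch_size, alpha=1.0):
    """Greedily pick a batch maximizing informativeness + facility-location diversity.

    f(S) = alpha * sum_{i in S} losses[i] + sum_j max_{i in S} sim[j, i]
    The second term is a classic monotone submodular facility-location function,
    so greedy selection enjoys the usual (1 - 1/e) approximation guarantee.
    """
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T                        # cosine similarity between samples
    n = len(losses)
    selected, coverage = [], np.zeros(n)           # coverage[j] = max sim of j to selected set
    for _ in range(batch_size):
        # marginal gain of adding each candidate i to the current set
        gains = alpha * losses + np.maximum(sim - coverage[:, None], 0).sum(axis=0)
        gains[selected] = -np.inf                  # cannot reselect
        best = int(np.argmax(gains))
        selected.append(best)
        coverage = np.maximum(coverage, sim[:, best])
    return selected

rng = np.random.default_rng(0)
feats, losses = rng.normal(size=(200, 16)), rng.random(200)
print(greedy_submodular_batch(feats, losses, batch_size=8))
```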

Combining Reinforcement Learning and Configuration Checking for Maximum k-plex Problem

Title Combining Reinforcement Learning and Configuration Checking for Maximum k-plex Problem
Authors Peilin Chen, Hai Wan, Shaowei Cai, Weilin Luo, Jia Li
Abstract The Maximum k-plex Problem is an important combinatorial optimization problem with increasingly wide applications. Due to its exponential time complexity, many heuristic methods have been proposed that can return a good-quality solution in a reasonable time. However, most of these heuristic algorithms are memoryless and unable to utilize the experience gathered during the search. Inspired by the multi-armed bandit (MAB) problem in reinforcement learning (RL), we propose a novel perturbation mechanism named BLP, which learns online to select a good vertex for perturbation when the search gets stuck in local optima. To the best of our knowledge, this is the first attempt to combine local search with RL for the maximum $ k $-plex problem. Besides, we also propose a novel strategy, named Dynamic-threshold Configuration Checking (DTCC), which extends the original Configuration Checking (CC) strategy in two aspects. Based on BLP and DTCC, we develop a local search algorithm named BDCC and improve it with a hyperheuristic strategy. Experimental results show that our algorithms dominate on the standard DIMACS and BHOSLIB benchmarks and achieve state-of-the-art performance on massive graphs.
Tasks Combinatorial Optimization
Published 2019-06-06
URL https://arxiv.org/abs/1906.02578v1
PDF https://arxiv.org/pdf/1906.02578v1.pdf
PWC https://paperswithcode.com/paper/combining-reinforcement-learning-and
Repo
Framework
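
The bandit view of perturbation can be sketched independently of the k-plex machinery: treat each candidate vertex as an arm, keep a running estimate of how much perturbing with it helped, and pick by UCB when the search stalls. This is a generic UCB sketch, not the paper's BLP mechanism.

```python
import math
import random

class VertexBandit:
    """UCB-style online selection of a perturbation vertex."""

    def __init__(self, vertices, c=1.4):
        self.c = c
        self.value = {v: 0.0 for v in vertices}   # running mean reward per vertex
        self.count = {v: 0 for v in vertices}
        self.total = 0

    def select(self, candidates):
        # Try each candidate once before trusting the estimates.
        unseen = [v for v in candidates if self.count[v] == 0]
        if unseen:
            return random.choice(unseen)
        return max(candidates, key=lambda v: self.value[v]
                   + self.c * math.sqrt(math.log(self.total) / self.count[v]))

    def update(self, v, reward):
        # reward: e.g. improvement of the best solution after perturbing at v
        self.count[v] += 1
        self.total += 1
        self.value[v] += (reward - self.value[v]) / self.count[v]

# usage inside a local search loop (objective and candidate set are application-specific):
bandit = VertexBandit(vertices=range(100))
v = bandit.select(candidates=list(range(10)))   # vertices eligible for perturbation
bandit.update(v, reward=1.0)                    # observed improvement after restarting the search
```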

Model Specification Test with Unlabeled Data: Approach from Covariate Shift

Title Model Specification Test with Unlabeled Data: Approach from Covariate Shift
Authors Masahiro Kato, Hikaru Kawarazaki
Abstract We propose a novel framework for model specification testing in regression using unlabeled test data. Statistical inference is often conducted under the assumption that the model is correctly specified; however, it is difficult to confirm whether this is the case. To overcome this problem, existing works have devised statistical tests for model specification, defining a correctly specified regression model as one whose error term has zero conditional mean over the training data only. Extending this conventional definition, we define a correctly specified model as one whose error term has zero conditional mean under any distribution of the explanatory variable. This definition is a natural consequence of the orthogonality of the explanatory variable and the error term. If a model does not satisfy this condition, it might lack robustness with regard to distribution shift. The proposed method enables us to reject a misspecified model under our definition. By applying the proposed method, we can obtain a model that predicts the labels of the unlabeled test data well without losing the interpretability of the model. In experiments, we show how the proposed method works on synthetic and real-world datasets.
Tasks
Published 2019-11-02
URL https://arxiv.org/abs/1911.00688v2
PDF https://arxiv.org/pdf/1911.00688v2.pdf
PWC https://paperswithcode.com/paper/model-specification-test-with-unlabeled-data
Repo
Framework
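
The definitional shift can be written compactly: a conventional test checks the zero-conditional-mean condition only under the training covariate distribution, while the covariate-shift definition requires it pointwise, hence under any test distribution. This is a paraphrase of the abstract, not the paper's notation.

```latex
% Conventional specification tests: zero conditional mean of the error over the
% training covariate distribution p(x) only.
H_0^{\mathrm{train}}:\quad \mathbb{E}_{p}\bigl[\,Y - f(X)\mid X\,\bigr] = 0 \quad p\text{-almost surely.}
% Covariate-shift definition: the condition must hold for every x,
H_0:\quad \mathbb{E}\bigl[\,Y - f(X)\mid X = x\,\bigr] = 0 \quad \text{for all } x,
% which implies, for any test covariate distribution q(x) and any weight function g,
\mathbb{E}_{q}\bigl[\, g(X)\,\bigl(Y - f(X)\bigr)\bigr] = 0 .
```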

Learning Compositional Neural Programs with Recursive Tree Search and Planning

Title Learning Compositional Neural Programs with Recursive Tree Search and Planning
Authors Thomas Pierrot, Guillaume Ligner, Scott Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas
Abstract We propose a novel reinforcement learning algorithm, AlphaNPI, that incorporates the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero. NPI contributes structural biases in the form of modularity, hierarchy and recursion, which are helpful to reduce sample complexity, improve generalization and increase interpretability. AlphaZero contributes powerful neural network guided search algorithms, which we augment with recursion. AlphaNPI only assumes a hierarchical program specification with sparse rewards: 1 when the program execution satisfies the specification, and 0 otherwise. Using this specification, AlphaNPI is able to train NPI models effectively with RL for the first time, completely eliminating the need for strong supervision in the form of execution traces. The experiments show that AlphaNPI can sort as well as previous strongly supervised NPI variants. The AlphaNPI agent is also trained on a Tower of Hanoi puzzle with two disks and is shown to generalize to puzzles with an arbitrary number of disks.
Tasks
Published 2019-05-30
URL https://arxiv.org/abs/1905.12941v1
PDF https://arxiv.org/pdf/1905.12941v1.pdf
PWC https://paperswithcode.com/paper/learning-compositional-neural-programs-with
Repo
Framework
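
The only supervision is the sparse reward attached to each program's specification. A toy sketch of that reward interface for the two tasks mentioned above (sorting and Tower of Hanoi); the environments, program library, and tree search are not shown, and the function names are illustrative.

```python
# Toy sketch of the sparse, specification-only reward AlphaNPI trains against.
# Function names and state encodings are illustrative, not the paper's code.

def reward_sort(scratchpad):
    """1 if program execution left the scratchpad sorted, else 0."""
    return 1.0 if all(scratchpad[i] <= scratchpad[i + 1]
                      for i in range(len(scratchpad) - 1)) else 0.0

def reward_tower_of_hanoi(pegs, n_disks, target_peg=2):
    """1 if all disks ended up on the target peg in the right order, else 0."""
    return 1.0 if pegs[target_peg] == list(range(n_disks, 0, -1)) else 0.0

print(reward_sort([1, 2, 3, 5]))                   # 1.0
print(reward_tower_of_hanoi([[], [], [2, 1]], 2))  # 1.0
```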

Adapting and evaluating a deep learning language model for clinical why-question answering

Title Adapting and evaluating a deep learning language model for clinical why-question answering
Authors Andrew Wen, Mohamed Y. Elwazir, Sungrim Moon, Jungwei Fan
Abstract Objectives: To adapt and evaluate a deep learning language model for answering why-questions based on patient-specific clinical text. Materials and Methods: Bidirectional encoder representations from transformers (BERT) models were trained with varying data sources to perform SQuAD 2.0 style why-question answering (why-QA) on clinical notes. The evaluation focused on 1) comparing the merits of different training data sources and 2) error analysis. Results: The best model achieved an accuracy of 0.707 (or 0.760 by partial match). Customizing the training toward clinical language increased accuracy by 6%. Discussion: The error analysis suggested that the model did not perform deep reasoning and that clinical why-QA might warrant more sophisticated solutions. Conclusion: The BERT model achieved moderate accuracy in clinical why-QA and should benefit from the rapidly evolving technology. Despite the identified limitations, it could serve as a competent proxy for question-driven clinical information extraction.
Tasks Language Modelling, Question Answering
Published 2019-11-13
URL https://arxiv.org/abs/1911.05604v1
PDF https://arxiv.org/pdf/1911.05604v1.pdf
PWC https://paperswithcode.com/paper/adapting-and-evaluating-a-deep-learning
Repo
Framework
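
For orientation, the SQuAD-style extractive setup the paper adapts can be exercised with an off-the-shelf reader. The checkpoint below is a public general-domain SQuAD 2.0 model, not the clinically trained BERT the authors evaluate, and the note text is invented.

```python
# Generic SQuAD 2.0-style extractive QA as a stand-in for the paper's
# clinically adapted BERT reader (which is not reproduced here).
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

note = ("The patient was started on metoprolol because of persistent "
        "atrial fibrillation with rapid ventricular response.")
result = qa(question="Why was the patient started on metoprolol?", context=note)
print(result["answer"], result["score"])
```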

Teacher-Student Learning Paradigm for Tri-training: An Efficient Method for Unlabeled Data Exploitation

Title Teacher-Student Learning Paradigm for Tri-training: An Efficient Method for Unlabeled Data Exploitation
Authors Yash Bhalgat, Zhe Liu, Pritam Gundecha, Jalal Mahmud, Amita Misra
Abstract Given that labeled data is expensive to obtain in real-world scenarios, many semi-supervised algorithms have explored exploiting unlabeled data. The traditional tri-training algorithm and tri-training with disagreement have shown promise in tasks where labeled data is limited. In this work, we introduce a new paradigm for tri-training that mimics the real-world teacher-student learning process. We show that the adaptive teacher-student thresholds used in the proposed method provide more control over the learning process and higher label quality. We evaluate on the SemEval sentiment analysis task and provide comprehensive comparisons over experimental settings with varied ratios of labeled to unlabeled data. Experimental results show that our method outperforms other strong semi-supervised baselines while requiring fewer labeled training samples.
Tasks Sentiment Analysis
Published 2019-09-25
URL https://arxiv.org/abs/1909.11233v1
PDF https://arxiv.org/pdf/1909.11233v1.pdf
PWC https://paperswithcode.com/paper/teacher-student-learning-paradigm-for-tri
Repo
Framework
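
A compressed sketch of one teacher-student pseudo-labeling round with confidence thresholds. The acceptance rule used here (teachers must agree confidently, the student must still be unsure) and the threshold values are assumptions, not the paper's exact adaptive schedule.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def teacher_student_round(teachers, student, X_unlab, teacher_thr, student_thr):
    """One pseudo-labeling round: teachers must agree confidently; the student
    only accepts examples it is itself unsure about (so they add information)."""
    p1, p2 = (t.predict_proba(X_unlab) for t in teachers)
    agree = p1.argmax(1) == p2.argmax(1)
    t_labels = p1.argmax(1)
    t_conf = np.minimum(p1.max(1), p2.max(1))
    s_conf = student.predict_proba(X_unlab).max(1)
    mask = agree & (t_conf >= teacher_thr) & (s_conf <= student_thr)
    return X_unlab[mask], t_labels[mask]

# toy data: two teachers and a student trained on bootstrap samples, then one round
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_lab, y_lab, X_unlab = X[:60], y[:60], X[60:]
models = []
for _ in range(3):
    idx = rng.integers(0, len(X_lab), len(X_lab))          # bootstrap resample
    models.append(LogisticRegression().fit(X_lab[idx], y_lab[idx]))
X_new, y_new = teacher_student_round(models[:2], models[2], X_unlab,
                                     teacher_thr=0.9, student_thr=0.7)
models[2].fit(np.vstack([X_lab, X_new]), np.concatenate([y_lab, y_new]))
print(len(y_new), "pseudo-labeled examples accepted by the student")
```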

Feature Fusion Encoder Decoder Network For Automatic Liver Lesion Segmentation

Title Feature Fusion Encoder Decoder Network For Automatic Liver Lesion Segmentation
Authors Xueying Chen, Rong Zhang, Pingkun Yan
Abstract Liver lesion segmentation is a difficult yet critical task for medical image analysis. Recently, deep learning based image segmentation methods have achieved promising performance; based on the dimensionality of the models, they can be divided into three categories: 2D, 2.5D and 3D. However, 2.5D and 3D methods can have very high complexity, and 2D methods may not perform satisfactorily. To obtain competitive performance with low complexity, in this paper we propose a Feature-fusion Encoder-Decoder Network (FED-Net) based 2D segmentation model to tackle the challenging problem of liver lesion segmentation from CT images. Our feature fusion method is based on the attention mechanism, which fuses high-level features carrying semantic information with low-level features containing image details. Additionally, to compensate for the information loss during the upsampling process, a dense upsampling convolution and a residual convolutional structure are proposed. We tested our method on the MICCAI 2017 Liver Tumor Segmentation (LiTS) Challenge dataset and achieved competitive results compared with other state-of-the-art methods.
Tasks Lesion Segmentation, Semantic Segmentation
Published 2019-03-28
URL http://arxiv.org/abs/1903.11834v1
PDF http://arxiv.org/pdf/1903.11834v1.pdf
PWC https://paperswithcode.com/paper/feature-fusion-encoder-decoder-network-for
Repo
Framework
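
The fusion idea, letting semantic high-level features gate detailed low-level ones before merging, can be sketched as a small PyTorch module; channel counts and the gating form are illustrative rather than the exact FED-Net block.

```python
import torch
from torch import nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    """Fuse a low-level (high-resolution) feature map with a high-level (semantic) one."""

    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(high_ch, low_ch, kernel_size=1),
                                  nn.Sigmoid())           # attention weights from semantics
        self.merge = nn.Conv2d(low_ch + high_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, low, high):
        high_up = F.interpolate(high, size=low.shape[-2:],  # upsample semantics to detail resolution
                                mode="bilinear", align_corners=False)
        low = low * self.gate(high_up)                       # re-weight details by semantic attention
        return self.merge(torch.cat([low, high_up], dim=1))

# usage: low-level 64-channel map at 64x64, high-level 256-channel map at 16x16
fuse = AttentionFusion(low_ch=64, high_ch=256, out_ch=128)
out = fuse(torch.randn(1, 64, 64, 64), torch.randn(1, 256, 16, 16))
print(out.shape)   # torch.Size([1, 128, 64, 64])
```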

A Study on Action Detection in the Wild

Title A Study on Action Detection in the Wild
Authors Yubo Zhang, Pavel Tokmakov, Martial Hebert, Cordelia Schmid
Abstract The recent introduction of the AVA dataset for action detection has caused renewed interest in this problem. Several approaches that improve performance have recently been proposed. However, all of them have ignored the main difficulty of the AVA dataset - its realistic distribution of training and test examples. This dataset was collected by exhaustive annotation of human actions in uncurated videos. As a result, the most common categories, such as ‘stand’ or ‘sit’, contain tens of thousands of examples, whereas rare ones have only dozens. In this work we study the problem of action detection in a highly imbalanced dataset. Differently from previous work on handling long-tail category distributions, we begin by analyzing the imbalance in the test set. We demonstrate that the standard AP metric is not informative for the categories in the tail, and propose an alternative one - Sampled AP. Armed with this new measure, we study the problem of transferring representations from the data-rich head to the rare tail categories and propose a simple but effective approach.
Tasks Action Detection
Published 2019-04-29
URL https://arxiv.org/abs/1904.12993v2
PDF https://arxiv.org/pdf/1904.12993v2.pdf
PWC https://paperswithcode.com/paper/a-study-on-action-detection-in-the-wild
Repo
Framework
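
One plausible instantiation of a Sampled-AP-style measure (the paper's exact definition may differ): instead of computing AP once over a test set dominated by negatives, average AP over random subsets in which negatives are capped at a fixed ratio to positives.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def sampled_ap(y_true, y_score, neg_per_pos=10, n_rounds=50, seed=0):
    """Average AP over random subsets with at most `neg_per_pos` negatives per positive."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(y_true == 1)
    neg = np.flatnonzero(y_true == 0)
    n_neg = min(len(neg), neg_per_pos * len(pos))
    aps = []
    for _ in range(n_rounds):
        idx = np.concatenate([pos, rng.choice(neg, size=n_neg, replace=False)])
        aps.append(average_precision_score(y_true[idx], y_score[idx]))
    return float(np.mean(aps))

# a tail category: 20 positives drowned in 10,000 negatives
rng = np.random.default_rng(1)
y = np.concatenate([np.ones(20), np.zeros(10_000)]).astype(int)
s = np.concatenate([rng.normal(1.0, 1.0, 20), rng.normal(0.0, 1.0, 10_000)])
print("standard AP:", average_precision_score(y, s))
print("sampled  AP:", sampled_ap(y, s))
```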

Weakly Supervised Gaussian Networks for Action Detection

Title Weakly Supervised Gaussian Networks for Action Detection
Authors Basura Fernando, Cheston Tan Yin Chet, Hakan Bilen
Abstract Detecting the temporal extents of human actions in videos is a challenging computer vision problem that requires detailed manual supervision, including frame-level labels. This expensive annotation process limits deploying action detectors to a small number of categories. We propose a novel method, called WSGN, that learns to detect actions from weak supervision, using only video-level labels. WSGN learns to exploit both video-specific and dataset-wide statistics to predict the relevance of each frame to an action category. This strategy leads to significant gains in action detection on two standard benchmarks, THUMOS14 and Charades. Our method obtains excellent results compared to state-of-the-art methods that use similar features and loss functions on the THUMOS14 dataset. Similarly, our weakly supervised method is only 0.3% mAP behind a state-of-the-art supervised method on the challenging Charades dataset for action localization.
Tasks Action Detection, Action Localization, Temporal Action Localization
Published 2019-04-16
URL https://arxiv.org/abs/1904.07774v4
PDF https://arxiv.org/pdf/1904.07774v4.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-gaussian-networks-for
Repo
Framework
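
The core of training a frame-level detector from video-level labels is to score every frame, pool the scores into a video-level prediction, and backpropagate a video-level loss. A generic sketch of that pattern with attention pooling (one common choice); WSGN's combination of video-specific and dataset-wide statistics is not modeled.

```python
import torch
from torch import nn

class WeaklySupervisedDetector(nn.Module):
    """Per-frame class scores trained only with video-level labels."""

    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.frame_scores = nn.Linear(feat_dim, n_classes)    # relevance of each frame to each class
        self.attention = nn.Linear(feat_dim, 1)               # how much each frame contributes

    def forward(self, frames):                  # frames: (batch, time, feat_dim)
        scores = self.frame_scores(frames)      # (batch, time, n_classes)
        attn = torch.softmax(self.attention(frames), dim=1)   # (batch, time, 1)
        video_logits = (attn * scores).sum(dim=1)             # pooled video-level prediction
        return video_logits, scores             # frame scores localize actions at test time

model = WeaklySupervisedDetector(feat_dim=1024, n_classes=20)
frames = torch.randn(2, 64, 1024)               # 2 videos, 64 frames of precomputed features
video_labels = torch.zeros(2, 20); video_labels[0, 3] = 1; video_labels[1, 7] = 1
video_logits, frame_scores = model(frames)
loss = nn.BCEWithLogitsLoss()(video_logits, video_labels)     # only video-level supervision
loss.backward()
print(loss.item(), frame_scores.shape)
```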