February 1, 2020

3411 words 17 mins read

Paper Group AWR 247

DA-RefineNet: A Dual Input Whole Slide Image Segmentation Algorithm Based on Attention. Regularizing Neural Networks for Future Trajectory Prediction via Inverse Reinforcement Learning Framework. Graph-Based Object Classification for Neuromorphic Vision Sensing. Tag-based Multi-Span Extraction in Reading Comprehension. Biologically plausible deep le …

DA-RefineNet: A Dual Input Whole Slide Image Segmentation Algorithm Based on Attention

Title DA-RefineNet: A Dual Input Whole Slide Image Segmentation Algorithm Based on Attention
Authors Ziqiang Li, Rentuo Tao, Qianrun Wu, Bin Li
Abstract Due to the high resolution of pathological images, automated semantic segmentation of medical pathological images poses greater challenges than segmentation of natural images. The sliding-window method has proven effective at handling the high resolution of whole slide images (WSI); however, owing to its locality, it also suffers from a lack of global information. In this paper, a dual-input semantic segmentation network based on attention is proposed, in which one input provides small-scale fine information and the other provides large-scale coarse information. Compared with single-input methods, our dual-input attention method, DA-RefineNet, exhibits a dramatic performance improvement on the ICIAR2018 breast cancer segmentation task.
Tasks Semantic Segmentation
Published 2019-07-15
URL https://arxiv.org/abs/1907.06358v2
PDF https://arxiv.org/pdf/1907.06358v2.pdf
PWC https://paperswithcode.com/paper/ca-refineneta-dual-input-wsi-image
Repo https://github.com/iceli1007/ICIAR2018_BACH_Challenge-DA-Refinenet
Framework pytorch
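
To make the dual-input idea in the abstract above concrete, here is a minimal PyTorch sketch of one plausible way to fuse a fine-scale patch with a coarse-scale context crop through channel attention. It is a toy illustration under assumptions, not the DA-RefineNet architecture; all module names and sizes are invented for the example.

```python
import torch
import torch.nn as nn

class ToyDualInputAttentionSeg(nn.Module):
    """Toy dual-input segmentation head: a fine (small-scale) patch and a
    coarse (large-scale) context crop are encoded separately; the coarse
    features gate the fine features through a channel-attention map."""
    def __init__(self, in_ch=3, feat_ch=32, num_classes=4):
        super().__init__()
        self.fine_enc = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.coarse_enc = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(feat_ch, feat_ch, 1), nn.Sigmoid())
        self.head = nn.Conv2d(feat_ch, num_classes, 1)

    def forward(self, fine_patch, coarse_patch):
        f = self.fine_enc(fine_patch)      # fine-detail features
        c = self.coarse_enc(coarse_patch)  # global-context features
        a = self.attn(c)                   # channel attention derived from context
        return self.head(f * a)            # context-gated prediction

# usage: logits = ToyDualInputAttentionSeg()(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```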

Regularizing Neural Networks for Future Trajectory Prediction via Inverse Reinforcement Learning Framework

Title Regularizing Neural Networks for Future Trajectory Prediction via Inverse Reinforcement Learning Framework
Authors Dooseop Choi, Kyoungwook Min, Jeongdan Choi
Abstract Predicting the distant future trajectories of agents in a dynamic scene is not an easy problem, because an agent's future trajectory is affected not only by its past trajectory but also by the scene context. To tackle this problem, we propose a model based on recurrent neural networks (RNNs) and a novel method for training the model. The proposed model is based on an encoder-decoder architecture where the encoder encodes inputs (past trajectories and scene context information) while the decoder produces a trajectory from the context vector given by the encoder. We train the networks of the proposed model to produce a future trajectory that is closest to the true trajectory while maximizing a reward from a reward function. The reward function is trained at the same time to maximize the margin between the rewards of the ground-truth trajectory and its estimate. The reward function plays the role of a regularizer for the proposed model, so the trained networks are better able to utilize the scene context information for the prediction task. We evaluated the proposed model on several public datasets. Experimental results show that the prediction performance of the proposed model is much improved by the regularization and outperforms the state of the art in terms of accuracy. The implementation code is available at https://github.com/d1024choi/traj-pred-irl/.
Tasks Trajectory Prediction
Published 2019-07-10
URL https://arxiv.org/abs/1907.04525v2
PDF https://arxiv.org/pdf/1907.04525v2.pdf
PWC https://paperswithcode.com/paper/regularizing-neural-networks-for-future
Repo https://github.com/d1024choi/traj-pred-irl
Framework tf
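
The official code for this paper is TensorFlow; the snippet below is only a sketch, written with PyTorch tensors, of the two loss terms the abstract describes: a max-margin objective for the reward function, and a regression loss for the trajectory decoder that also encourages high-reward predictions. The function names, margin, and weighting `alpha` are assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def max_margin_reward_loss(r_gt, r_pred, margin=1.0):
    # push the reward of the ground-truth trajectory above the predicted one by a margin
    return F.relu(margin - (r_gt - r_pred)).mean()

def regularized_prediction_loss(pred_traj, gt_traj, r_pred, alpha=0.1):
    # regression term plus a reward term that rewards high-reward predicted trajectories
    return F.mse_loss(pred_traj, gt_traj) - alpha * r_pred.mean()
```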

Graph-Based Object Classification for Neuromorphic Vision Sensing

Title Graph-Based Object Classification for Neuromorphic Vision Sensing
Authors Yin Bi, Aaron Chadha, Alhabib Abbas, Eirina Bourtsoulatze, Yiannis Andreopoulos
Abstract Neuromorphic vision sensing (NVS) devices represent visual information as sequences of asynchronous discrete events (a.k.a. “spikes”) in response to changes in scene reflectance. Unlike conventional active pixel sensing (APS), NVS allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, object classification with NVS streams cannot directly leverage state-of-the-art convolutional neural networks (CNNs), since NVS does not produce frame representations. To circumvent this mismatch between sensing and processing with CNNs, we propose a compact graph representation for NVS. We couple this with novel residual graph CNN architectures and show that, when trained on spatio-temporal NVS data for object classification, such residual graph CNNs preserve the spatial and temporal coherence of spike events while requiring less computation and memory. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we present and make available a 100k dataset of NVS recordings of the American sign language letters, acquired with an iniLabs DAVIS240c device under real-world conditions.
Tasks Object Classification
Published 2019-08-19
URL https://arxiv.org/abs/1908.06648v1
PDF https://arxiv.org/pdf/1908.06648v1.pdf
PWC https://paperswithcode.com/paper/graph-based-object-classification-for
Repo https://github.com/PIX2NVS/NVS2Graph
Framework pytorch
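
As a rough illustration of turning an event stream into a graph, here is a toy numpy sketch that connects each NVS event to its nearest neighbours in a scaled space-time volume. The scaling constant, k, and function name are arbitrary assumptions; the paper's actual graph construction and residual graph CNN are more involved.

```python
import numpy as np

def events_to_knn_graph(events, k=8, time_scale=1e-3):
    """Toy sketch: connect each event (x, y, t, polarity) to its k nearest
    neighbours in a scaled space-time volume; returns an edge list (i, j)."""
    coords = np.stack([events[:, 0], events[:, 1], events[:, 2] / time_scale], axis=1)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # no self-edges
    nbrs = np.argsort(dists, axis=1)[:, :k]    # k nearest neighbours per event
    return [(i, j) for i in range(len(events)) for j in nbrs[i]]
```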

Tag-based Multi-Span Extraction in Reading Comprehension

Title Tag-based Multi-Span Extraction in Reading Comprehension
Authors Avia Efrat, Elad Segal, Mor Shoham
Abstract With models reaching human performance on many popular reading comprehension datasets in recent years, a new dataset, DROP, introduced questions that were expected to present a harder challenge for reading comprehension models. Among these new types of questions were “multi-span questions”, questions whose answers consist of several spans from either the paragraph or the question itself. Until now, only one model attempted to tackle multi-span questions as a part of its design. In this work, we suggest a new approach for tackling multi-span questions, based on sequence tagging, which differs from previous approaches for answering span questions. We show that our approach leads to an absolute improvement of 29.7 EM and 15.1 F1 compared to existing state-of-the-art results, while not hurting performance on other question types. Furthermore, we show that our model slightly eclipses the current state-of-the-art results on the entire DROP dataset.
Tasks Reading Comprehension
Published 2019-09-29
URL https://arxiv.org/abs/1909.13375v2
PDF https://arxiv.org/pdf/1909.13375v2.pdf
PWC https://paperswithcode.com/paper/tag-based-multi-span-extraction-in-reading
Repo https://github.com/llamazing/numnet_plus
Framework pytorch
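
A sequence-tagging approach to multi-span extraction ultimately needs a decoding step that turns per-token tags into answer spans. Below is a minimal BIO-style decoder as a sketch of that step; the tag scheme and function name are illustrative, not taken from the paper's code.

```python
def decode_bio_spans(tags, tokens):
    """Toy BIO decoder: turn per-token tags ('B', 'I', 'O') into answer spans."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == 'B':
            if current:
                spans.append(' '.join(current))
            current = [tok]
        elif tag == 'I' and current:
            current.append(tok)
        else:
            if current:
                spans.append(' '.join(current))
            current = []
    if current:
        spans.append(' '.join(current))
    return spans

# decode_bio_spans(['O','B','I','O','B'], ['the','two','spans','and','more']) -> ['two spans', 'more']
```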

Biologically plausible deep learning – but how far can we go with shallow networks?

Title Biologically plausible deep learning – but how far can we go with shallow networks?
Authors Bernd Illing, Wulfram Gerstner, Johanni Brea
Abstract Training deep neural networks with the error backpropagation algorithm is considered implausible from a biological perspective. Numerous recent publications suggest elaborate models for biologically plausible variants of deep learning, typically defining success as reaching around 98% test accuracy on the MNIST data set. Here, we investigate how far we can go on digit (MNIST) and object (CIFAR10) classification with biologically plausible, local learning rules in a network with one hidden layer and a single readout layer. The hidden layer weights are either fixed (random or random Gabor filters) or trained with unsupervised methods (PCA, ICA or Sparse Coding) that can be implemented by local learning rules. The readout layer is trained with a supervised, local learning rule. We first implement these models with rate neurons. This comparison reveals, first, that unsupervised learning does not lead to better performance than fixed random projections or Gabor filters for large hidden layers. Second, networks with localized receptive fields perform significantly better than networks with all-to-all connectivity and can reach backpropagation performance on MNIST. We then implement two of the networks - fixed, localized, random & random Gabor filters in the hidden layer - with spiking leaky integrate-and-fire neurons and spike timing dependent plasticity to train the readout layer. These spiking models achieve > 98.2% test accuracy on MNIST, which is close to the performance of rate networks with one hidden layer trained with backpropagation. The performance of our shallow network models is comparable to most current biologically plausible models of deep learning. Furthermore, our results with a shallow spiking network provide an important reference and suggest the use of datasets other than MNIST for testing the performance of future models of biologically plausible deep learning.
Tasks
Published 2019-02-27
URL https://arxiv.org/abs/1905.04101v2
PDF https://arxiv.org/pdf/1905.04101v2.pdf
PWC https://paperswithcode.com/paper/190504101
Repo https://github.com/EPFL-LCN/pub-illing2019-nnetworks
Framework none
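
The numpy sketch below illustrates the general recipe the abstract describes, under simplifying assumptions: a fixed random hidden layer (no backpropagation through it) and a readout trained with a local delta rule. It is not the paper's spiking or Gabor-filter models, just the rate-based skeleton.

```python
import numpy as np

def train_shallow_readout(X, y, n_hidden=500, lr=0.01, epochs=5, seed=0):
    """Toy sketch: fixed random hidden layer + a local delta rule on the readout,
    i.e. no backpropagation through the hidden weights."""
    rng = np.random.default_rng(seed)
    n_classes = y.max() + 1
    W_hidden = rng.standard_normal((X.shape[1], n_hidden)) / np.sqrt(X.shape[1])  # fixed
    W_out = np.zeros((n_hidden, n_classes))                                        # trained
    H = np.maximum(X @ W_hidden, 0.0)                                              # ReLU features
    T = np.eye(n_classes)[y]                                                       # one-hot targets
    for _ in range(epochs):
        W_out += lr * H.T @ (T - H @ W_out) / len(X)   # local delta rule on the readout only
    return W_hidden, W_out
```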

Object Detection based on Region Decomposition and Assembly

Title Object Detection based on Region Decomposition and Assembly
Authors Seung-Hwan Bae
Abstract Region-based object detection infers object regions for one or more categories in an image. Due to recent advances in deep learning and region proposal methods, object detectors based on convolutional neural networks (CNNs) have flourished and provided promising detection results. However, detection accuracy is often degraded because of the low discriminability of object CNN features caused by occlusions and inaccurate region proposals. In this paper, we therefore propose a region decomposition and assembly detector (R-DAD) for more accurate object detection. In the proposed R-DAD, we first decompose an object region into multiple small regions. To capture the entire appearance and part details of the object jointly, we extract CNN features within the whole object region and the decomposed regions. We then learn the semantic relations between the object and its parts by combining the multi-region features stage by stage with region assembly blocks, and use the combined high-level semantic features for object classification and localization. In addition, for more accurate region proposals, we propose a multi-scale proposal layer that can generate object proposals of various scales. We integrate the R-DAD into several feature extractors and demonstrate a distinct performance improvement on PASCAL07/12 and MSCOCO18 compared to recent convolutional detectors.
Tasks Object Classification, Object Detection
Published 2019-01-24
URL http://arxiv.org/abs/1901.08225v1
PDF http://arxiv.org/pdf/1901.08225v1.pdf
PWC https://paperswithcode.com/paper/object-detection-based-on-region
Repo https://github.com/hualuluu/--every-day-paper--
Framework none
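
As a toy illustration of the region-decomposition step, the helper below splits an object box into half-regions whose features could then be pooled and re-assembled stage by stage. This particular decomposition scheme is an assumption for illustration only, not the one used in R-DAD.

```python
def decompose_region(box):
    """Toy sketch: split an object box (x1, y1, x2, y2) into left/right/top/bottom
    half-regions, whose pooled features could later be combined with the whole box."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return {
        'left':   (x1, y1, cx, y2),
        'right':  (cx, y1, x2, y2),
        'top':    (x1, y1, x2, cy),
        'bottom': (x1, cy, x2, y2),
    }
```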

Region-specific Diffeomorphic Metric Mapping

Title Region-specific Diffeomorphic Metric Mapping
Authors Zhengyang Shen, François-Xavier Vialard, Marc Niethammer
Abstract We introduce a region-specific diffeomorphic metric mapping (RDMM) registration approach. RDMM is non-parametric, estimating spatio-temporal velocity fields which parameterize the sought-for spatial transformation. Regularization of these velocity fields is necessary. However, while existing non-parametric registration approaches, e.g., the large displacement diffeomorphic metric mapping (LDDMM) model, use a fixed spatially-invariant regularization, our model advects a spatially-varying regularizer with the estimated velocity field, thereby naturally attaching a spatio-temporal regularizer to deforming objects. We explore a family of RDMM registration approaches: 1) a registration model where regions with separate regularizations are pre-defined (e.g., in an atlas space), 2) a registration model where a general spatially-varying regularizer is estimated, and 3) a registration model where the spatially-varying regularizer is obtained via an end-to-end trained deep learning (DL) model. We provide a variational derivation of RDMM, show that the model can assure diffeomorphic transformations in the continuum, and show that LDDMM is a particular instance of RDMM. To evaluate RDMM performance we experiment 1) on synthetic 2D data and 2) on two 3D datasets: knee magnetic resonance images (MRIs) of the Osteoarthritis Initiative (OAI) and computed tomography images (CT) of the lung. Results show that our framework achieves state-of-the-art image registration performance, while providing additional information via a learned spatio-temporal regularizer. Further, our deep learning approach allows for very fast RDMM and LDDMM estimations. Our code will be open-sourced. Code is available at https://github.com/uncbiag/registration.
Tasks Image Registration, Medical Image Registration
Published 2019-06-01
URL https://arxiv.org/abs/1906.00139v2
PDF https://arxiv.org/pdf/1906.00139v2.pdf
PWC https://paperswithcode.com/paper/190600139
Repo https://github.com/hbgtjxzbbx/introduction
Framework none
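
For orientation, a schematic (and deliberately simplified) form of the underlying objective: LDDMM-style registration minimizes a regularized matching energy over a time-dependent velocity field, and RDMM additionally transports a spatially-varying regularizer with that field. The exact metric and similarity terms used in the paper differ in detail; this is only a sketch.

```latex
E(v) \;=\; \int_0^1 \lVert v_t \rVert_{L_t}^2 \, dt
\;+\; \frac{1}{\sigma^2}\,\mathrm{Sim}\!\left[ I_0 \circ \Phi_1^{-1},\, I_1 \right],
\qquad
\partial_t w_t + \nabla w_t \cdot v_t = 0
```

Here the map \(\Phi_1\) is obtained by integrating the velocity field, \(\mathrm{Sim}\) is an image similarity term, and \(w_t\) stands in for the spatially-varying regularizer advected by the velocity field; in LDDMM the norm is fixed and spatially invariant.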

Meta-learning algorithms for Few-Shot Computer Vision

Title Meta-learning algorithms for Few-Shot Computer Vision
Authors Etienne Bennequin
Abstract Few-Shot Learning is the challenge of training a model with only a small amount of data. Many solutions to this problem use meta-learning algorithms, i.e. algorithms that learn to learn. By sampling few-shot tasks from a larger dataset, we can teach these algorithms to solve new, unseen tasks. This document reports my work on meta-learning algorithms for Few-Shot Computer Vision. This work was done during my internship at Sicara, a French company building image recognition solutions for businesses. It contains: 1. an extensive review of the state-of-the-art in few-shot computer vision; 2. a benchmark of meta-learning algorithms for few-shot image classification; 3. the introduction of a novel meta-learning algorithm for few-shot object detection, which is still in development.
Tasks Few-Shot Image Classification, Few-Shot Learning, Few-Shot Object Detection, Image Classification, Meta-Learning, Object Detection
Published 2019-09-30
URL https://arxiv.org/abs/1909.13579v1
PDF https://arxiv.org/pdf/1909.13579v1.pdf
PWC https://paperswithcode.com/paper/meta-learning-algorithms-for-few-shot
Repo https://github.com/ebennequin/FewShotVision
Framework pytorch
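
Meta-learning for few-shot classification is typically trained on episodes sampled from a larger labelled set. As a minimal sketch of that sampling loop (not code from the linked repository), the helper below draws an N-way K-shot support set and a query set per episode:

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way=5, k_shot=1, q_queries=15, seed=None):
    """Toy episodic sampler: pick n_way classes, then k_shot support and
    q_queries query indices per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + q_queries]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        chosen = rng.sample(by_class[c], k_shot + q_queries)
        support += chosen[:k_shot]
        query += chosen[k_shot:]
    return support, query, classes
```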

Self-Supervised Difference Detection for Weakly-Supervised Semantic Segmentation

Title Self-Supervised Difference Detection for Weakly-Supervised Semantic Segmentation
Authors Wataru Shimoda, Keiji Yanai
Abstract To minimize the annotation costs associated with the training of semantic segmentation models, researchers have extensively investigated weakly-supervised segmentation approaches. In current weakly-supervised segmentation methods, the most widely adopted approach is based on visualization. However, the visualization results are not generally equal to semantic segmentation. Therefore, to perform accurate semantic segmentation under the weakly supervised condition, it is necessary to consider mapping functions that convert the visualization results into semantic segmentation. For such mapping functions, the conditional random field and iterative re-training using the outputs of a segmentation model are usually used. However, these methods do not always guarantee improvements in accuracy; therefore, if we apply these mapping functions iteratively multiple times, the accuracy will eventually stop improving or will decrease. In this paper, to make the most of such mapping functions, we assume that the results of the mapping function include noise, and we improve accuracy by removing this noise. To achieve our aim, we propose a self-supervised difference detection module, which estimates noise from the results of the mapping functions by predicting the difference between the segmentation masks before and after the mapping. We verified the effectiveness of the proposed method by performing experiments on the PASCAL Visual Object Classes 2012 dataset, achieving 64.9% on the val set and 65.5% on the test set. Both results set a new state of the art under the same setting of weakly supervised semantic segmentation.
Tasks Semantic Segmentation, Weakly-Supervised Semantic Segmentation
Published 2019-11-04
URL https://arxiv.org/abs/1911.01370v2
PDF https://arxiv.org/pdf/1911.01370v2.pdf
PWC https://paperswithcode.com/paper/self-supervised-difference-detection-for-1
Repo https://github.com/shimoda-uec/ssdd
Framework pytorch
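
The supervision signal for the difference-detection module is, at its core, simply where the segmentation masks before and after a mapping function disagree. A one-line sketch of that target (illustrative only, not the paper's code):

```python
import numpy as np

def difference_target(mask_before, mask_after):
    # 1 where the pre- and post-mapping masks disagree, 0 elsewhere
    return (mask_before != mask_after).astype(np.float32)
```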

Discriminative and Robust Online Learning for Siamese Visual Tracking

Title Discriminative and Robust Online Learning for Siamese Visual Tracking
Authors Jinghao Zhou, Peng Wang, Haoyang Sun
Abstract The problem of visual object tracking has traditionally been handled by two distinct tracking paradigms: either learning a model of the object's appearance exclusively online, or matching the object with the target in an offline-trained embedding space. Despite their recent success, each paradigm suffers from intrinsic constraints. The online-only approaches lack generalization in the model they learn and are thus inferior in target regression, while the offline-only approaches (e.g., convolutional siamese trackers) lack target-specific context information and are thus not discriminative enough to handle distractors, nor robust enough to deformation. Therefore, we propose an online module with an attention mechanism for offline siamese networks to extract target-specific features under an L2 error. We further propose a filter update strategy adaptive to treacherous background noise for discriminative learning, and a template update strategy to handle large target deformations for robust learning. The effectiveness is validated by consistent improvement over three siamese baselines: SiamFC, SiamRPN++, and SiamMask. Beyond that, our model based on SiamRPN++ obtains the best results on six popular tracking benchmarks and can operate beyond real time.
Tasks Object Tracking, Visual Object Tracking, Visual Tracking
Published 2019-09-06
URL https://arxiv.org/abs/1909.02959v2
PDF https://arxiv.org/pdf/1909.02959v2.pdf
PWC https://paperswithcode.com/paper/discriminative-and-robust-online-learning-for
Repo https://github.com/shallowtoil/DROL
Framework none

Uncertainty-aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation

Title Uncertainty-aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation
Authors Lequan Yu, Shujun Wang, Xiaomeng Li, Chi-Wing Fu, Pheng-Ann Heng
Abstract Training deep convolutional neural networks usually requires a large amount of labeled data. However, it is expensive and time-consuming to annotate data for medical image segmentation tasks. In this paper, we present a novel uncertainty-aware semi-supervised framework for left atrium segmentation from 3D MR images. Our framework can effectively leverage the unlabeled data by encouraging consistent predictions of the same input under different perturbations. Concretely, the framework consists of a student model and a teacher model, and the student model learns from the teacher model by minimizing a segmentation loss and a consistency loss with respect to the targets of the teacher model. We design a novel uncertainty-aware scheme to enable the student model to gradually learn from the meaningful and reliable targets by exploiting the uncertainty information. Experiments show that our method achieves high performance gains by incorporating the unlabeled data. Our method outperforms the state-of-the-art semi-supervised methods, demonstrating the potential of our framework for the challenging semi-supervised problems.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2019-07-16
URL https://arxiv.org/abs/1907.07034v1
PDF https://arxiv.org/pdf/1907.07034v1.pdf
PWC https://paperswithcode.com/paper/uncertainty-aware-self-ensembling-model-for
Repo https://github.com/yulequan/UA-MT
Framework pytorch
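
Below is a minimal PyTorch sketch of the two ingredients the abstract names: a teacher kept as an exponential moving average (EMA) of the student, and a consistency loss masked by an uncertainty estimate so the student only learns from reliable teacher targets. How the uncertainty map is obtained, the threshold, and the EMA rate are assumptions here, not the paper's exact choices.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    # teacher weights are an exponential moving average of the student weights
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1 - alpha)

def masked_consistency_loss(student_prob, teacher_prob, uncertainty, threshold=0.5):
    # only penalize student/teacher disagreement where the teacher is confident;
    # `uncertainty` is assumed broadcastable to the prediction shape
    mask = (uncertainty < threshold).float()
    sq_err = (student_prob - teacher_prob) ** 2
    return (mask * sq_err).sum() / (mask.sum() + 1e-8)
```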

Compositional Generalization for Primitive Substitutions

Title Compositional Generalization for Primitive Substitutions
Authors Yuanpeng Li, Liang Zhao, Jianyu Wang, Joel Hestness
Abstract Compositional generalization is a basic mechanism in human language learning, but current neural networks lack such ability. In this paper, we conduct fundamental research for encoding compositionality in neural networks. Conventional methods use a single representation for the input sentence, making it hard to apply prior knowledge of compositionality. In contrast, our approach leverages such knowledge with two representations, one generating attention maps, and the other mapping attended input words to output symbols. We reduce the entropy in each representation to improve generalization. Our experiments demonstrate significant improvements over the conventional methods in five NLP tasks including instruction learning and machine translation. In the SCAN domain, it boosts accuracies from 14.0% to 98.8% in Jump task, and from 92.0% to 99.7% in TurnLeft task. It also beats human performance on a few-shot learning task. We hope the proposed approach can help ease future research towards human-level compositional language learning.
Tasks Few-Shot Learning, Machine Translation
Published 2019-10-07
URL https://arxiv.org/abs/1910.02612v1
PDF https://arxiv.org/pdf/1910.02612v1.pdf
PWC https://paperswithcode.com/paper/compositional-generalization-for-primitive
Repo https://github.com/yli1/CGPS
Framework tf
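
A minimal sketch of the two-representation idea from the abstract: one embedding stream is used only to compute attention over input words, and a separate stream maps the attended word to an output symbol, keeping "where to look" and "what to emit" factored apart. This is an illustrative decoder step with invented names and sizes, not the paper's model (whose official code is TensorFlow).

```python
import torch
import torch.nn as nn

class TwoStreamDecoderStep(nn.Module):
    """Toy decoder step with separate attention and symbol-mapping embeddings."""
    def __init__(self, vocab_in, vocab_out, d_attn=32, d_map=32):
        super().__init__()
        self.attn_emb = nn.Embedding(vocab_in, d_attn)  # stream 1: where to attend
        self.map_emb = nn.Embedding(vocab_in, d_map)    # stream 2: what to emit
        self.out = nn.Linear(d_map, vocab_out)

    def forward(self, input_ids, query):
        # input_ids: (B, T) source tokens; query: (B, d_attn) decoder state
        scores = torch.einsum('bd,btd->bt', query, self.attn_emb(input_ids))
        attn = scores.softmax(dim=-1)
        attended = torch.einsum('bt,btd->bd', attn, self.map_emb(input_ids))
        return self.out(attended), attn
```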

Activity Driven Weakly Supervised Object Detection

Title Activity Driven Weakly Supervised Object Detection
Authors Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan
Abstract Weakly supervised object detection aims at reducing the amount of supervision required to train detection models. Such models are traditionally learned from images/videos labelled only with the object class and not the object bounding box. In our work, we try to leverage not only the object class labels but also the action labels associated with the data. We show that the action depicted in the image/video can provide strong cues about the location of the associated object. We learn a spatial prior for the object dependent on the action (e.g. “ball” is closer to “leg of the person” in “kicking ball”), and incorporate this prior to simultaneously train a joint object detection and action classification model. We conducted experiments on both video datasets and image datasets to evaluate the performance of our weakly supervised object detection model. Our approach outperformed the current state-of-the-art (SOTA) method by more than 6% in mAP on the Charades video dataset.
Tasks Action Classification, Object Detection, Weakly Supervised Object Detection
Published 2019-04-02
URL http://arxiv.org/abs/1904.01665v1
PDF http://arxiv.org/pdf/1904.01665v1.pdf
PWC https://paperswithcode.com/paper/activity-driven-weakly-supervised-object
Repo https://github.com/zhenheny/ADWSOD
Framework pytorch

Saliency Tubes: Visual Explanations for Spatio-Temporal Convolutions

Title Saliency Tubes: Visual Explanations for Spatio-Temporal Convolutions
Authors Alexandros Stergiou, Georgios Kapidis, Grigorios Kalliatakis, Christos Chrysoulas, Remco Veltkamp, Ronald Poppe
Abstract Deep learning approaches have become the main methodology for video classification and recognition. Recently, 3-dimensional convolutions have been used to achieve state-of-the-art performance on many challenging video datasets. Because of the high complexity of these methods, in which the convolution operations are extended to an additional dimension in order to extract features from it as well, providing a visualization of the signals that the network interprets as informative is a challenging task. An effective way of understanding the network's inner workings is to isolate the spatio-temporal regions of the video that the network finds most informative. We propose a method called Saliency Tubes, which demonstrates the foremost points and regions, both at the frame level and over time, that the network focuses on. We demonstrate our findings on widely used datasets for third-person and egocentric action classification, and enhance the set of methods and visualizations that improve the intelligibility of 3D Convolutional Neural Networks (CNNs).
Tasks Action Classification, Video Classification
Published 2019-02-04
URL https://arxiv.org/abs/1902.01078v2
PDF https://arxiv.org/pdf/1902.01078v2.pdf
PWC https://paperswithcode.com/paper/saliency-tubes-visual-explanations-for-spatio
Repo https://github.com/alexandrosstergiou/Saliency-Tubes-Visual-Explanations-for-Spatio-Temporal-Convolutions
Framework pytorch
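
Saliency Tubes are in the spirit of class activation maps extended to 3D features. The sketch below weights the last 3D convolutional feature volume by one class's classifier weights and sums over channels; the exact layer choice and normalization in the paper may differ, so treat this as an assumption-laden illustration.

```python
import torch

def saliency_tube(features_3d, class_weights, class_idx):
    """Toy class-activation-style map for a 3D CNN: weight the last conv features
    (C, T, H, W) by one class's classifier weights and sum over channels, giving a
    spatio-temporal saliency volume (T, H, W)."""
    w = class_weights[class_idx]                        # (C,)
    tube = torch.relu(torch.einsum('c,cthw->thw', w, features_3d))
    return tube / (tube.max() + 1e-8)                   # normalize to [0, 1]
```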

On the Effect of Low-Frequency Terms on Neural-IR Models

Title On the Effect of Low-Frequency Terms on Neural-IR Models
Authors Sebastian Hofstätter, Navid Rekabsaz, Carsten Eickhoff, Allan Hanbury
Abstract Low-frequency terms are a recurring challenge for information retrieval models; neural IR frameworks in particular struggle to adequately capture infrequently observed words. While these terms are often removed from neural models - mainly as a concession to efficiency demands - they traditionally play an important role in the performance of IR models. In this paper, we analyze the effects of low-frequency terms on the performance and robustness of neural IR models. We conduct controlled experiments on three recent neural IR models, trained on a large-scale passage retrieval collection. We evaluate the neural IR models with various vocabulary sizes for their respective word embeddings, considering different levels of constraints on the available GPU memory. We observe that despite the significant benefits of using larger vocabularies, the performance gap between the vocabularies can be, to a great extent, mitigated by extensive tuning of a related parameter: the number of documents to re-rank. We further investigate the use of subword-token embedding models, in particular FastText, for neural IR models. Our experiments show that using FastText brings slight improvements to the overall performance of the neural IR models in comparison to models trained on the full vocabulary, while the improvement becomes much more pronounced for queries containing low-frequency terms.
Tasks Information Retrieval, Word Embeddings
Published 2019-04-29
URL http://arxiv.org/abs/1904.12683v2
PDF http://arxiv.org/pdf/1904.12683v2.pdf
PWC https://paperswithcode.com/paper/on-the-effect-of-low-frequency-terms-on
Repo https://github.com/sebastian-hofstaetter/sigir19-neural-ir
Framework pytorch
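
FastText-style subword embeddings represent a word through its character n-grams, which is why they help with low-frequency terms: rare words still share n-grams with frequent ones. A toy sketch of the n-gram extraction (the hashing and embedding lookup that follow are omitted):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Toy sketch of FastText-style subword units: pad the word with boundary
    markers and emit all character n-grams of length n_min..n_max."""
    padded = f"<{word}>"
    return [padded[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

# char_ngrams("where") -> ['<wh', 'whe', 'her', 'ere', 're>', '<whe', ...]
```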