April 3, 2020

3369 words 16 mins read

Paper Group AWR 64

A Generalized Deep Learning Framework for Whole-Slide Image Segmentation and Analysis. Snippext: Semi-supervised Opinion Mining with Augmented Data. Bayesian task embedding for few-shot Bayesian optimization. Reconstruction of 3D flight trajectories from ad-hoc camera networks. Harmonic Convolutional Networks based on Discrete Cosine Transform. Lig …

A Generalized Deep Learning Framework for Whole-Slide Image Segmentation and Analysis

Title A Generalized Deep Learning Framework for Whole-Slide Image Segmentation and Analysis
Authors Mahendra Khened, Avinash Kori, Haran Rajkumar, Balaji Srinivasan, Ganapathy Krishnamurthi
Abstract Histopathology tissue analysis is considered the gold standard in cancer diagnosis and prognosis. Given the large size of these images and the increase in the number of potential cancer cases, an automated solution as an aid to histopathologists is highly desirable. In the recent past, deep learning-based techniques have provided state-of-the-art results in a wide variety of image analysis tasks, including analysis of digitized slides. However, the size of images and variability in histopathology tasks make it a challenge to develop an integrated framework for histopathology image analysis. We propose a deep learning-based framework for histopathology tissue analysis. We demonstrate the generalizability of our framework, including training and inference, on several open-source datasets, which include the CAMELYON (breast cancer metastases), DigestPath (colon cancer), and PAIP (liver cancer) datasets. We discuss multiple types of uncertainties pertaining to data and model, namely aleatoric and epistemic, respectively. Simultaneously, we demonstrate our model's generalization across different data distributions by evaluating samples from TCGA data. On the CAMELYON16 test data (n=139) for the task of lesion detection, the FROC score achieved was 0.86, and on the CAMELYON17 test data (n=500) for the task of pN-staging, the Cohen’s kappa score achieved was 0.9090 (third on the open leaderboard). On the DigestPath test data (n=212) for the task of tumor segmentation, a Dice score of 0.782 was achieved (fourth in the challenge). On the PAIP test data (n=40) for the task of viable tumor segmentation, a Jaccard index of 0.75 was achieved (third in the challenge), and for viable tumor burden, a score of 0.633 was achieved (second in the challenge). Our entire framework and related documentation are freely available on GitHub and PyPI.
Tasks Semantic Segmentation
Published 2020-01-01
URL https://arxiv.org/abs/2001.00258v1
PDF https://arxiv.org/pdf/2001.00258v1.pdf
PWC https://paperswithcode.com/paper/a-generalized-deep-learning-framework-for
Repo https://github.com/koriavinash1/DigitalHistoPath
Framework tf
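
Note: gigapixel whole-slide images cannot be pushed through a network in one pass, so inference in frameworks like this one is typically performed patch-wise and stitched back together. The NumPy sketch below illustrates that generic pattern only; the `model` callable returning per-pixel tumor probabilities for a single patch is a hypothetical placeholder, not the authors' released implementation (see the linked repo for that).

```python
import numpy as np

def sliding_window_inference(slide, model, patch=256, stride=128):
    """Tile a large slide, run a patch-level model, and average the
    overlapping predictions back into a full-resolution probability map."""
    h, w = slide.shape[:2]
    prob = np.zeros((h, w), dtype=np.float32)
    count = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            p = model(slide[y:y + patch, x:x + patch])  # (patch, patch) probabilities
            prob[y:y + patch, x:x + patch] += p
            count[y:y + patch, x:x + patch] += 1.0
    return prob / np.maximum(count, 1.0)
```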

Snippext: Semi-supervised Opinion Mining with Augmented Data

Title Snippext: Semi-supervised Opinion Mining with Augmented Data
Authors Zhengjie Miao, Yuliang Li, Xiaolan Wang, Wang-Chiew Tan
Abstract Online services are interested in solutions to opinion mining, which is the problem of extracting aspects, opinions, and sentiments from text. One method to mine opinions is to leverage the recent success of pre-trained language models which can be fine-tuned to obtain high-quality extractions from reviews. However, fine-tuning language models still requires a non-trivial amount of training data. In this paper, we study the problem of how to significantly reduce the amount of labeled training data required in fine-tuning language models for opinion mining. We describe Snippext, an opinion mining system developed over a language model that is fine-tuned through semi-supervised learning with augmented data. A novelty of Snippext is its clever use of a two-prong approach to achieve state-of-the-art (SOTA) performance with little labeled training data through: (1) data augmentation to automatically generate more labeled training data from existing ones, and (2) a semi-supervised learning technique to leverage the massive amount of unlabeled data in addition to the (limited amount of) labeled data. We show with extensive experiments that Snippext performs comparably and can even exceed previous SOTA results on several opinion mining tasks with only half the training data required. Furthermore, it achieves new SOTA results when all training data are leveraged. By comparison to a baseline pipeline, we found that Snippext extracts significantly more fine-grained opinions which enable new opportunities of downstream applications.
Tasks Data Augmentation, Language Modelling, Opinion Mining
Published 2020-02-07
URL https://arxiv.org/abs/2002.03049v1
PDF https://arxiv.org/pdf/2002.03049v1.pdf
PWC https://paperswithcode.com/paper/snippext-semi-supervised-opinion-mining-with
Repo https://github.com/rit-git/Snippext_public
Framework pytorch
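
Note: Snippext's semi-supervised recipe interpolates augmented examples with the originals in the encoder's representation space, MixUp-style. The snippet below is only a sketch of that interpolation step under stated assumptions: `hidden_a`/`hidden_b` are encoder outputs and `labels_a`/`labels_b` one-hot label tensors (hypothetical names); the real system adds label-preserving augmentation operators and a full semi-supervised objective.

```python
import torch

def mixup_representations(hidden_a, labels_a, hidden_b, labels_b, alpha=0.8):
    """MixUp-style interpolation of encoder representations and labels,
    roughly the interpolation Snippext applies to augmented examples."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    lam = torch.max(lam, 1 - lam)          # keep the first example dominant
    mixed_hidden = lam * hidden_a + (1 - lam) * hidden_b
    mixed_labels = lam * labels_a + (1 - lam) * labels_b
    return mixed_hidden, mixed_labels
```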

Bayesian task embedding for few-shot Bayesian optimization

Title Bayesian task embedding for few-shot Bayesian optimization
Authors Steven Atkinson, Sayan Ghosh, Natarajan Chennimalai-Kumar, Genghis Khan, Liping Wang
Abstract We describe a method for Bayesian optimization by which one may incorporate data from multiple systems whose quantitative interrelationships are unknown a priori. All general (nonreal-valued) features of the systems are associated with continuous latent variables that enter as inputs into a single metamodel that simultaneously learns the response surfaces of all of the systems. Bayesian inference is used to determine appropriate beliefs regarding the latent variables. We explain how the resulting probabilistic metamodel may be used for Bayesian optimization tasks and demonstrate its implementation on a variety of synthetic and real-world examples, comparing its performance under zero-, one-, and few-shot settings against traditional Bayesian optimization, which usually requires substantially more data from the system of interest.
Tasks Bayesian Inference
Published 2020-01-02
URL https://arxiv.org/abs/2001.00637v1
PDF https://arxiv.org/pdf/2001.00637v1.pdf
PWC https://paperswithcode.com/paper/bayesian-task-embedding-for-few-shot-bayesian
Repo https://github.com/sdatkinson/BEBO
Framework pytorch
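
Note: the core idea is that every system gets a continuous latent embedding appended to its inputs, so a single surrogate is fit jointly over all systems. A minimal scikit-learn sketch follows; the embedding values here are fixed by hand purely for illustration, whereas the paper infers posterior beliefs over them with Bayesian inference and uses its own metamodel.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Hypothetical latent embeddings for two related systems (the paper infers these).
task_embeddings = {0: np.array([0.0]), 1: np.array([1.0])}

def augment(x, task_id):
    """Append the system's latent embedding to the design variables."""
    return np.concatenate([x, task_embeddings[task_id]])

rng = np.random.default_rng(0)
X0 = rng.uniform(0, 1, (5, 1)); y0 = np.sin(6 * X0[:, 0])          # system 0
X1 = rng.uniform(0, 1, (5, 1)); y1 = np.sin(6 * X1[:, 0] + 0.3)    # system 1

X = np.vstack([[augment(x, 0) for x in X0], [augment(x, 1) for x in X1]])
y = np.concatenate([y0, y1])

gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)           # one shared metamodel
mean, std = gp.predict([augment(np.array([0.5]), 1)], return_std=True)
```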

Reconstruction of 3D flight trajectories from ad-hoc camera networks

Title Reconstruction of 3D flight trajectories from ad-hoc camera networks
Authors Jingtong Li, Jesse Murray, Dorina Ismaili, Konrad Schindler, Cenek Albl
Abstract We present a method to reconstruct the 3D trajectory of an airborne robotic system only from videos recorded with cameras that are unsynchronized, may feature rolling shutter distortion, and whose viewpoints are unknown. Our approach enables robust and accurate outside-in tracking of dynamically flying targets, with cheap and easy-to-deploy equipment. We show that, in spite of the weakly constrained setting, recent developments in computer vision make it possible to reconstruct trajectories in 3D from unsynchronized, uncalibrated networks of consumer cameras, and validate the proposed method in a realistic field experiment. We make our code available along with the data, including cm-accurate ground-truth from differential GNSS navigation.
Tasks
Published 2020-03-10
URL https://arxiv.org/abs/2003.04784v1
PDF https://arxiv.org/pdf/2003.04784v1.pdf
PWC https://paperswithcode.com/paper/reconstruction-of-3d-flight-trajectories-from
Repo https://github.com/CenekAlbl/mvus
Framework none

Harmonic Convolutional Networks based on Discrete Cosine Transform

Title Harmonic Convolutional Networks based on Discrete Cosine Transform
Authors Matej Ulicny, Vladimir A. Krylov, Rozenn Dahyot
Abstract Convolutional neural networks (CNNs) learn filters in order to capture local correlation patterns in feature space. In this paper we propose to revert to learning combinations of preset spectral filters by switching to CNNs with harmonic blocks. We rely on the use of the Discrete Cosine Transform (DCT) filters which have excellent energy compaction properties and are widely used for image compression. The proposed harmonic blocks rely on DCT-modeling and replace conventional convolutional layers to produce partially or fully harmonic versions of new or existing CNN architectures. We demonstrate how the harmonic networks can be efficiently compressed in a straightforward manner by truncating high-frequency information in harmonic blocks which is possible due to the redundancies in the spectral domain. We report extensive experimental validation demonstrating the benefits of the introduction of harmonic blocks into state-of-the-art CNN models in image classification, segmentation and edge detection applications.
Tasks Edge Detection, Image Classification, Object Detection
Published 2020-01-18
URL https://arxiv.org/abs/2001.06570v1
PDF https://arxiv.org/pdf/2001.06570v1.pdf
PWC https://paperswithcode.com/paper/harmonic-convolutional-networks-based-on
Repo https://github.com/matej-ulicny/harmonic-networks
Framework pytorch
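
Note: a harmonic block replaces a learned spatial kernel with a fixed DCT filter bank whose responses are combined by a learned 1x1 convolution. The PyTorch sketch below captures that structure under simplifying assumptions (no basis normalization, no frequency truncation); it is illustrative rather than the authors' implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class HarmonicBlock(nn.Module):
    """Sketch of a harmonic block: convolve with a fixed 2D DCT filter bank,
    then learn only the 1x1 combination of the DCT responses."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # Build the k*k separable 2D DCT-II basis filters (unnormalized).
        n = torch.arange(k, dtype=torch.float32)
        basis1d = torch.stack([torch.cos(math.pi / k * (n + 0.5) * u) for u in range(k)])
        dct2d = torch.einsum('ui,vj->uvij', basis1d, basis1d).reshape(k * k, 1, k, k)
        self.register_buffer('dct', dct2d)                          # fixed, not trained
        self.pad = k // 2
        self.mix = nn.Conv2d(in_ch * k * k, out_ch, kernel_size=1)  # learned combination

    def forward(self, x):
        b, c, h, w = x.shape
        # Apply every DCT filter to every input channel (depthwise-style).
        spec = F.conv2d(x.reshape(b * c, 1, h, w), self.dct, padding=self.pad)
        spec = spec.reshape(b, c * self.dct.shape[0], h, w)
        return self.mix(spec)

y = HarmonicBlock(3, 16)(torch.randn(1, 3, 32, 32))  # -> (1, 16, 32, 32)
```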

LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation

Title LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation
Authors Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, Meng Wang
Abstract Graph Convolution Network (GCN) has become the new state of the art for collaborative filtering. Nevertheless, the reasons for its effectiveness in recommendation are not well understood. Existing work that adapts GCN to recommendation lacks thorough ablation analyses of GCN, which was originally designed for graph classification tasks and is equipped with many neural network operations. However, we empirically find that the two most common designs in GCNs — feature transformation and nonlinear activation — contribute little to the performance of collaborative filtering. Even worse, including them adds to the difficulty of training and degrades recommendation performance. In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation. We propose a new model named LightGCN, including only the most essential component of GCN — neighborhood aggregation — for collaborative filtering. Specifically, LightGCN learns user and item embeddings by linearly propagating them on the user-item interaction graph, and uses the weighted sum of the embeddings learned at all layers as the final embedding. Such a simple, linear, and neat model is much easier to implement and train, exhibiting substantial improvements (about 16.5% relative improvement on average) over Neural Graph Collaborative Filtering (NGCF) — a state-of-the-art GCN-based recommender model — under exactly the same experimental setting. Further analyses of the rationality of the simple LightGCN are provided from both analytical and empirical perspectives.
Tasks Graph Classification
Published 2020-02-06
URL https://arxiv.org/abs/2002.02126v1
PDF https://arxiv.org/pdf/2002.02126v1.pdf
PWC https://paperswithcode.com/paper/lightgcn-simplifying-and-powering-graph
Repo https://github.com/kuandeng/LightGCN
Framework tf
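
Note: because LightGCN drops feature transforms and nonlinearities, a propagation layer reduces to a sparse matrix product, and the final embedding is a weighted sum over layers (uniform weights below). This is a rough sketch rather than the released TensorFlow code; `adj_norm` is assumed to be the symmetrically normalized user-item adjacency matrix and `emb0` the trainable ID embedding table.

```python
import torch

def lightgcn_propagate(adj_norm, emb0, num_layers=3):
    """Propagate ID embeddings over the interaction graph with no feature
    transformation or activation, then average the per-layer embeddings."""
    layer_embs = [emb0]
    e = emb0
    for _ in range(num_layers):
        e = torch.sparse.mm(adj_norm, e) if adj_norm.is_sparse else adj_norm @ e
        layer_embs.append(e)
    return torch.stack(layer_embs, dim=0).mean(dim=0)  # uniform layer weights
```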

Siamese Box Adaptive Network for Visual Tracking

Title Siamese Box Adaptive Network for Visual Tracking
Authors Zedu Chen, Bineng Zhong, Guorong Li, Shengping Zhang, Rongrong Ji
Abstract Most of the existing trackers usually rely on either a multi-scale searching scheme or pre-defined anchor boxes to accurately estimate the scale and aspect ratio of a target. Unfortunately, they typically call for tedious and heuristic configurations. To address this issue, we propose a simple yet effective visual tracking framework (named Siamese Box Adaptive Network, SiamBAN) by exploiting the expressive power of the fully convolutional network (FCN). SiamBAN views the visual tracking problem as a parallel classification and regression problem, and thus directly classifies objects and regresses their bounding boxes in a unified FCN. The no-prior box design avoids hyper-parameters associated with the candidate boxes, making SiamBAN more flexible and general. Extensive experiments on visual tracking benchmarks including VOT2018, VOT2019, OTB100, NFS, UAV123, and LaSOT demonstrate that SiamBAN achieves state-of-the-art performance and runs at 40 FPS, confirming its effectiveness and efficiency. The code will be available at https://github.com/hqucv/siamban.
Tasks Visual Tracking
Published 2020-03-15
URL https://arxiv.org/abs/2003.06761v1
PDF https://arxiv.org/pdf/2003.06761v1.pdf
PWC https://paperswithcode.com/paper/siamese-box-adaptive-network-for-visual
Repo https://github.com/hqucv/siamban
Framework none
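
Note: the "no-prior box" design means each spatial position of the correlation feature map directly predicts a foreground/background score and four box offsets, with no anchor boxes. The PyTorch fragment below sketches such a head under assumed channel sizes; the actual SiamBAN stacks depthwise cross-correlation and several convolutional layers.

```python
import torch
import torch.nn as nn

class AnchorFreeHead(nn.Module):
    """Parallel classification and box-regression branches applied to a
    correlation feature map, one prediction per spatial location."""
    def __init__(self, channels=256):
        super().__init__()
        self.cls = nn.Conv2d(channels, 2, kernel_size=1)  # target vs. background
        self.reg = nn.Conv2d(channels, 4, kernel_size=1)  # left/top/right/bottom offsets

    def forward(self, corr_feat):
        return self.cls(corr_feat), torch.exp(self.reg(corr_feat))  # exp keeps offsets positive

cls_map, box_map = AnchorFreeHead()(torch.randn(1, 256, 25, 25))
```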

Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors

Title Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors
Authors Rituraj Kaushik, Timothée Anne, Jean-Baptiste Mouret
Abstract Meta-learning algorithms can accelerate model-based reinforcement learning (MBRL) algorithms by finding an initial set of parameters for the dynamical model such that the model can be trained to match the actual dynamics of the system with only a few data points. However, in the real world, a robot might encounter situations ranging from motor failures to finding itself on rocky terrain, where the dynamics of the robot can differ significantly from one situation to another. In this paper, first, we show that when meta-training situations (the prior situations) have such diverse dynamics, using a single set of meta-trained parameters as a starting point still requires a large number of observations from the real system to learn a useful model of the dynamics. Second, we propose an algorithm called FAMLE that mitigates this limitation by meta-training several starting points (i.e., initial parameters) for training the model and allows the robot to select the most suitable starting point to adapt the model to the current situation with only a few gradient steps. We compare FAMLE to MBRL, MBRL with a model meta-trained with MAML, and the model-free policy search algorithm PPO on various simulated and real robotic tasks, and show that FAMLE allows the robots to adapt to novel damage in significantly fewer time steps than the baselines.
Tasks Meta-Learning
Published 2020-03-10
URL https://arxiv.org/abs/2003.04663v1
PDF https://arxiv.org/pdf/2003.04663v1.pdf
PWC https://paperswithcode.com/paper/fast-online-adaptation-in-robotics-through
Repo https://github.com/resibots/kaushik_2020_famle
Framework none
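
Note: the adaptation step hinges on picking, among the meta-trained initializations, the one whose dynamics model best explains the robot's most recent transitions, then fine-tuning from there. A schematic version of that selection is sketched below; `models`, `recent_x`, and `recent_y` are assumed names, and FAMLE's actual selection and fine-tuning details differ.

```python
import torch
import torch.nn.functional as F

def select_starting_point(models, recent_x, recent_y):
    """Score each meta-trained dynamics model on the latest real transitions
    and return the best-matching one as the starting point for adaptation."""
    with torch.no_grad():
        losses = torch.stack([F.mse_loss(m(recent_x), recent_y) for m in models])
    return models[int(torch.argmin(losses))]  # then fine-tune with a few gradient steps
```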

DRMIME: Differentiable Mutual Information and Matrix Exponential for Multi-Resolution Image Registration

Title DRMIME: Differentiable Mutual Information and Matrix Exponential for Multi-Resolution Image Registration
Authors Abhishek Nan, Matthew Tennant, Uriel Rubin, Nilanjan Ray
Abstract In this work, we present a novel unsupervised image registration algorithm. It is differentiable end-to-end and can be used for both multi-modal and mono-modal registration. This is done using mutual information (MI) as a metric. The novelty here is that rather than using traditional ways of approximating MI, we use a neural estimator called MINE and supplement it with matrix exponential for transformation matrix computation. This leads to improved results as compared to the standard algorithms available out-of-the-box in state-of-the-art image registration toolboxes.
Tasks Image Registration
Published 2020-01-27
URL https://arxiv.org/abs/2001.09865v1
PDF https://arxiv.org/pdf/2001.09865v1.pdf
PWC https://paperswithcode.com/paper/drmime-differentiable-mutual-information-and
Repo https://github.com/abnan/DRMIME
Framework none
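
Note: the matrix-exponential part is easy to illustrate: the transformation is parameterized by unconstrained coefficients of affine generators and mapped to a valid homogeneous matrix with expm, so the pipeline stays differentiable. The sketch below shows that parameterization only (the MINE-based mutual-information estimator is omitted), and the basis ordering is an assumption, not the repository's.

```python
import torch

# Optimise an unconstrained generator v and map it to a 2D homogeneous
# transform with the matrix exponential; v = 0 gives the identity.
v = torch.zeros(6, requires_grad=True)

def affine_from_generator(v):
    # A basis of the 2D affine Lie algebra in homogeneous coordinates (3x3).
    B = torch.zeros(6, 3, 3)
    B[0, 0, 2] = B[1, 1, 2] = 1.0                # translations
    B[2, 0, 1], B[2, 1, 0] = -1.0, 1.0           # rotation
    B[3, 0, 0] = B[3, 1, 1] = 1.0                # isotropic scale
    B[4, 0, 0], B[4, 1, 1] = 1.0, -1.0           # stretch
    B[5, 0, 1] = B[5, 1, 0] = 1.0                # shear
    return torch.matrix_exp((v[:, None, None] * B).sum(dim=0))

T = affine_from_generator(v)   # differentiable 3x3 transformation matrix
```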

Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog

Title Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog
Authors Zekang Li, Zongjia Li, Jinchao Zhang, Yang Feng, Cheng Niu, Jie Zhou
Abstract Audio-Visual Scene-Aware Dialog (AVSD) is the task of generating responses when chatting about a given video; it is organized as a track of the 8th Dialog System Technology Challenge (DSTC8). To solve the task, we propose a universal multimodal transformer and introduce a multi-task learning method to learn joint representations among different modalities as well as generate informative and fluent responses. Our method extends a pre-trained natural language generation model to the multimodal dialogue generation task. Our system achieves the best performance in both objective and subjective evaluations in the challenge.
Tasks Dialogue Generation, Multi-Task Learning, Text Generation
Published 2020-02-01
URL https://arxiv.org/abs/2002.00163v1
PDF https://arxiv.org/pdf/2002.00163v1.pdf
PWC https://paperswithcode.com/paper/bridging-text-and-video-a-universal
Repo https://github.com/ictnlp/DSTC8-AVSD
Framework pytorch
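
Note: on the input side, a universal multimodal transformer of this kind projects video and audio features into the word-embedding space and prepends them to the token embeddings, so one pre-trained text decoder can attend over all modalities. The sketch below shows only that input construction; dimensions and names are assumptions, and the paper's model includes task-specific heads and multi-task losses not shown here.

```python
import torch
import torch.nn as nn

class MultimodalPrefix(nn.Module):
    """Project video/audio features into the embedding space and prepend
    them to the token embeddings of a text decoder (simplified sketch)."""
    def __init__(self, vid_dim=2048, aud_dim=128, d_model=768, vocab=50257):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, d_model)
        self.vid_proj = nn.Linear(vid_dim, d_model)
        self.aud_proj = nn.Linear(aud_dim, d_model)

    def forward(self, vid_feats, aud_feats, token_ids):
        return torch.cat([self.vid_proj(vid_feats),
                          self.aud_proj(aud_feats),
                          self.word_emb(token_ids)], dim=1)  # fed to the transformer

seq = MultimodalPrefix()(torch.randn(1, 8, 2048), torch.randn(1, 4, 128),
                         torch.randint(0, 50257, (1, 10)))   # (1, 22, 768)
```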

Y-net: Multi-scale feature aggregation network with wavelet structure similarity loss function for single image dehazing

Title Y-net: Multi-scale feature aggregation network with wavelet structure similarity loss function for single image dehazing
Authors Hao-Hsiang Yang, Chao-Han Huck Yang, Yi-Chang James Tsai
Abstract Single image dehazing is an ill-posed two-dimensional signal reconstruction problem. Recently, deep convolutional neural networks (CNNs) have been successfully used in many computer vision problems. In this paper, we propose a Y-net, named for its structure. This network reconstructs clear images by aggregating multi-scale feature maps. Additionally, we propose a Wavelet Structure SIMilarity (W-SSIM) loss function for the training step. In the proposed loss function, discrete wavelet transforms are applied repeatedly to divide the image into differently sized patches with different frequencies and scales. The proposed loss function accumulates the SSIM loss of the various patches with their respective ratios. Extensive experimental results demonstrate that the proposed Y-net with the W-SSIM loss function restores high-quality clear images and outperforms state-of-the-art algorithms. Code and models are available at https://github.com/dectrfov/Y-net.
Tasks Image Dehazing, Single Image Dehazing
Published 2020-03-31
URL https://arxiv.org/abs/2003.13912v1
PDF https://arxiv.org/pdf/2003.13912v1.pdf
PWC https://paperswithcode.com/paper/y-net-multi-scale-feature-aggregation-network
Repo https://github.com/dectrfov/Y-net
Framework none
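
Note: the W-SSIM loss decomposes prediction and target with repeated discrete wavelet transforms and accumulates a per-band loss with fixed ratios. The sketch below uses a Haar DWT and substitutes plain L1 for the per-band SSIM to keep it short, so it only illustrates the structure of the loss; the ratio values are placeholders, not the paper's.

```python
import torch
import torch.nn.functional as F

def haar_dwt(x):
    """One level of a 2D Haar transform (up to normalization) on an NCHW tensor."""
    a = x[..., 0::2, 0::2]; b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]; d = x[..., 1::2, 1::2]
    ll, lh = (a + b + c + d) / 4, (a + b - c - d) / 4
    hl, hh = (a - b + c - d) / 4, (a - b - c + d) / 4
    return ll, (lh, hl, hh)

def multiscale_wavelet_loss(pred, target, ratios=(0.5, 0.25, 0.25)):
    """Decompose both images repeatedly and accumulate a per-band loss with
    fixed ratios (L1 stands in here for the per-band SSIM of the paper)."""
    loss = 0.0
    for r in ratios:
        pred, hp = haar_dwt(pred)
        target, hq = haar_dwt(target)
        band = sum(F.l1_loss(p, q) for p, q in zip(hp, hq)) + F.l1_loss(pred, target)
        loss = loss + r * band
    return loss
```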

An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization

Title An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization
Authors Yiqiu Shen, Nan Wu, Jason Phang, Jungkyu Park, Kangning Liu, Sudarshini Tyagi, Laura Heacock, S. Gene Kim, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras
Abstract Medical images differ from natural images in significantly higher resolutions and smaller regions of interest. Because of these differences, neural network architectures that work well for natural images might not be applicable to medical image analysis. In this work, we extend the globally-aware multiple instance classifier, a framework we proposed to address these unique properties of medical images. This model first uses a low-capacity, yet memory-efficient, network on the whole image to identify the most informative regions. It then applies another higher-capacity network to collect details from chosen regions. Finally, it employs a fusion module that aggregates global and local information to make a final prediction. While existing methods often require lesion segmentation during training, our model is trained with only image-level labels and can generate pixel-level saliency maps indicating possible malignant findings. We apply the model to screening mammography interpretation: predicting the presence or absence of benign and malignant lesions. On the NYU Breast Cancer Screening Dataset, consisting of more than one million images, our model achieves an AUC of 0.93 in classifying breasts with malignant findings, outperforming ResNet-34 and Faster R-CNN. Compared to ResNet-34, our model is 4.1x faster for inference while using 78.4% less GPU memory. Furthermore, we demonstrate, in a reader study, that our model surpasses radiologist-level AUC by a margin of 0.11. The proposed model is available online: https://github.com/nyukat/GMIC.
Tasks Lesion Segmentation
Published 2020-02-13
URL https://arxiv.org/abs/2002.07613v1
PDF https://arxiv.org/pdf/2002.07613v1.pdf
PWC https://paperswithcode.com/paper/an-interpretable-classifier-for-high
Repo https://github.com/nyukat/GMIC
Framework pytorch
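
Note: the two-stage design first runs a low-capacity global network over the full image to produce a saliency map, then crops the most salient high-resolution patches for a second, higher-capacity network. The fragment below sketches only that patch-selection step; tensor shapes and the crop size are assumptions, and the released GMIC code implements a more careful retrieval procedure.

```python
import torch
import torch.nn.functional as F

def top_k_patches(image, saliency, k=4, patch=64):
    """Crop the k most salient high-resolution patches, given a full image
    (B, C, H, W) and a coarse saliency map (B, 1, h, w) from the global net."""
    b, _, H, W = image.shape
    sal = F.interpolate(saliency, size=(H, W), mode='bilinear', align_corners=False)
    idx = sal.reshape(b, -1).topk(k, dim=1).indices        # top-k pixel indices
    patches = []
    for bi in range(b):
        for i in idx[bi]:
            y, x = int(i) // W, int(i) % W
            y0 = min(max(y - patch // 2, 0), H - patch)     # clamp crop to the image
            x0 = min(max(x - patch // 2, 0), W - patch)
            patches.append(image[bi:bi + 1, :, y0:y0 + patch, x0:x0 + patch])
    return torch.cat(patches, dim=0)                        # (B * k, C, patch, patch)
```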

Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion

Title Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion
Authors Wen-Chin Huang, Hao Luo, Hsin-Te Hwang, Chen-Chou Lo, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang
Abstract An effective approach for voice conversion (VC) is to disentangle linguistic content from other components in the speech signal. The effectiveness of variational autoencoder (VAE) based VC (VAE-VC), for instance, strongly relies on this principle. In our prior work, we proposed a cross-domain VAE-VC (CDVAE-VC) framework, which utilized acoustic features of different properties, to improve the performance of VAE-VC. We believed that the success came from more disentangled latent representations. In this paper, we extend the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement, thereby improving the quality and similarity of converted speech. More specifically, we first investigate the effectiveness of incorporating the generative adversarial networks (GANs) with CDVAE-VC. Then, we consider the concept of domain adversarial training and add an explicit constraint to the latent representation, realized by a speaker classifier, to explicitly eliminate the speaker information that resides in the latent code. Experimental results confirm that the degree of disentanglement of the learned latent representation can be enhanced by both GANs and the speaker classifier. Meanwhile, subjective evaluation results in terms of quality and similarity scores demonstrate the effectiveness of our proposed methods.
Tasks Voice Conversion
Published 2020-01-22
URL https://arxiv.org/abs/2001.07849v3
PDF https://arxiv.org/pdf/2001.07849v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-representation-disentanglement
Repo https://github.com/unilight/cdvae-vc
Framework tf
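
Note: the "explicit constraint" is an adversarial speaker classifier on the latent code: the classifier tries to identify the speaker while the encoder is trained so that it cannot. The sketch below realizes this with a gradient-reversal layer, which is one common way to implement such a constraint; the dimensions and classifier architecture are placeholders, and the paper's exact training scheme differs.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad

speaker_clf = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

def adversarial_speaker_loss(latent, speaker_ids):
    """Classifier learns to predict the speaker; the encoder, receiving the
    reversed gradient, learns to strip speaker information from the latent."""
    logits = speaker_clf(GradReverse.apply(latent))
    return nn.functional.cross_entropy(logits, speaker_ids)
```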

Deep learning of dynamical attractors from time series measurements

Title Deep learning of dynamical attractors from time series measurements
Authors William Gilpin
Abstract Experimental measurements of physical systems often have a finite number of independent channels, causing essential dynamical variables to remain unobserved. However, many popular methods for unsupervised inference of latent dynamics from experimental data implicitly assume that the measurements have higher intrinsic dimensionality than the underlying system—making coordinate identification a dimensionality reduction problem. Here, we study the opposite limit, in which hidden governing coordinates must be inferred from only a low-dimensional time series of measurements. Inspired by classical techniques for studying the strange attractors of chaotic systems, we introduce a general embedding technique for time series, consisting of an autoencoder trained with a novel latent-space loss function. We first apply our technique to a variety of synthetic and real-world datasets with known strange attractors, and we use established and novel measures of attractor fidelity to show that our method successfully reconstructs attractors better than existing techniques. We then use our technique to discover dynamical attractors in datasets ranging from patient electrocardiograms, to household electricity usage, to eruptions of the Old Faithful geyser—demonstrating diverse applications of our technique for exploratory data analysis.
Tasks Dimensionality Reduction, Time Series
Published 2020-02-14
URL https://arxiv.org/abs/2002.05909v1
PDF https://arxiv.org/pdf/2002.05909v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-of-dynamical-attractors-from
Repo https://github.com/williamgilpin/fnn
Framework tf
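
Note: the method ingests a low-dimensional measurement as a stack of lagged windows (a Hankel matrix) and trains an autoencoder, with a latent-space regularizer, to recover attractor coordinates. The snippet sketches only the windowing step that precedes the network; the names and toy signal are illustrative.

```python
import numpy as np

def hankel_embedding(series, window=20):
    """Build the lagged (Hankel) matrix fed to the autoencoder: each row is a
    short window of the 1D measurement time series."""
    n = len(series) - window + 1
    return np.stack([series[i:i + window] for i in range(n)])

t = np.arange(500)
x = np.sin(0.1 * t) + 0.3 * np.sin(0.37 * t)   # toy 1D measurement
H = hankel_embedding(x, window=20)             # (481, 20), input to the autoencoder
```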

Structural Deep Clustering Network

Title Structural Deep Clustering Network
Authors Deyu Bo, Xiao Wang, Chuan Shi, Meiqi Zhu, Emiao Lu, Peng Cui
Abstract Clustering is a fundamental task in data analysis. Recently, deep clustering, which derives its inspiration primarily from deep learning approaches, has achieved state-of-the-art performance and attracted considerable attention. Current deep clustering methods usually boost the clustering results by means of the powerful representation ability of deep learning, e.g., autoencoders, suggesting that learning an effective representation for clustering is a crucial requirement. The strength of deep clustering methods is to extract useful representations from the data itself, whereas the structure of the data receives scarce attention in representation learning. Motivated by the great success of the Graph Convolutional Network (GCN) in encoding graph structure, we propose a Structural Deep Clustering Network (SDCN) to integrate structural information into deep clustering. Specifically, we design a delivery operator to transfer the representations learned by the autoencoder to the corresponding GCN layer, and a dual self-supervised mechanism to unify these two different deep neural architectures and guide the update of the whole model. In this way, the multiple structures of the data, from low-order to high-order, are naturally combined with the multiple representations learned by the autoencoder. Furthermore, we theoretically analyze the delivery operator, i.e., with the delivery operator, the GCN improves the autoencoder-specific representation as a high-order graph regularization constraint, and the autoencoder helps alleviate the over-smoothing problem in the GCN. Through comprehensive experiments, we demonstrate that our proposed model consistently outperforms state-of-the-art techniques.
Tasks Representation Learning
Published 2020-02-05
URL https://arxiv.org/abs/2002.01633v3
PDF https://arxiv.org/pdf/2002.01633v3.pdf
PWC https://paperswithcode.com/paper/structural-deep-clustering-network
Repo https://github.com/461054993/SDCN
Framework pytorch
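
Note: the delivery operator mixes the autoencoder's layer-wise representation into the matching GCN layer before graph propagation, which is what couples the two branches. Below is a schematic PyTorch layer with a fixed mixing weight; the real SDCN also includes the dual self-supervised mechanism and learned/normalized details not shown here.

```python
import torch
import torch.nn as nn

class DeliveryGCNLayer(nn.Module):
    """Mix the autoencoder's hidden representation h into the GCN layer's
    input, then propagate over the normalized adjacency matrix."""
    def __init__(self, in_dim, out_dim, eps=0.5):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)
        self.eps = eps  # fixed mixing weight for this sketch

    def forward(self, z_gcn, h_ae, adj_norm):
        mixed = (1 - self.eps) * z_gcn + self.eps * h_ae   # delivery operator
        return torch.relu(adj_norm @ self.lin(mixed))      # GCN propagation
```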