Paper Group AWR 64
A Generalized Deep Learning Framework for Whole-Slide Image Segmentation and Analysis
Title | A Generalized Deep Learning Framework for Whole-Slide Image Segmentation and Analysis |
Authors | Mahendra Khened, Avinash Kori, Haran Rajkumar, Balaji Srinivasan, Ganapathy Krishnamurthi |
Abstract | Histopathology tissue analysis is considered the gold standard in cancer diagnosis and prognosis. Given the large size of these images and the increase in the number of potential cancer cases, an automated solution as an aid to histopathologists is highly desirable. In the recent past, deep learning-based techniques have provided state-of-the-art results in a wide variety of image analysis tasks, including analysis of digitized slides. However, the size of images and variability in histopathology tasks make it a challenge to develop an integrated framework for histopathology image analysis. We propose a deep learning-based framework for histopathology tissue analysis. We demonstrate the generalizability of our framework, including training and inference, on several open-source datasets, which include CAMELYON (breast cancer metastases), DigestPath (colon cancer), and PAIP (liver cancer) datasets. We discuss multiple types of uncertainties pertaining to data and model, namely aleatoric and epistemic, respectively. Simultaneously, we demonstrate our model generalization across different data distributions by evaluating some samples on TCGA data. On CAMELYON16 test data (n=139) for the task of lesion detection, the FROC score achieved was 0.86, and on the CAMELYON17 test data (n=500) for the task of pN-staging, the Cohen’s kappa score achieved was 0.9090 (third in the open leaderboard). On DigestPath test data (n=212) for the task of tumor segmentation, a Dice score of 0.782 was achieved (fourth in the challenge). On PAIP test data (n=40) for the task of viable tumor segmentation, a Jaccard Index of 0.75 (third in the challenge) was achieved, and for viable tumor burden, a score of 0.633 was achieved (second in the challenge). Our entire framework and related documentation are freely available at GitHub and PyPI. |
Tasks | Semantic Segmentation |
Published | 2020-01-01 |
URL | https://arxiv.org/abs/2001.00258v1 |
https://arxiv.org/pdf/2001.00258v1.pdf | |
PWC | https://paperswithcode.com/paper/a-generalized-deep-learning-framework-for |
Repo | https://github.com/koriavinash1/DigitalHistoPath |
Framework | tf |
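The abstract above reports segmentation quality as a Dice score and a Jaccard Index. As a reading aid, here is a minimal sketch of how these two overlap metrics are commonly computed for a pair of binary masks. It is not taken from the DigitalHistoPath repository; the function name `dice_and_jaccard`, the nested-list mask format, and the convention for two empty masks are assumptions made for illustration.

```python
def dice_and_jaccard(pred, truth):
    """Return (Dice, Jaccard) for two binary masks given as nested lists.

    Hypothetical helper for illustration; the masks must have the same size.
    """
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    inter = sum(1 for a, b in zip(p, t) if a and b)  # pixels set in both masks
    p_sum = sum(1 for a in p if a)                   # predicted foreground
    t_sum = sum(1 for b in t if b)                   # ground-truth foreground
    union = p_sum + t_sum - inter

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    dice = 2.0 * inter / (p_sum + t_sum)
    jaccard = inter / union
    return dice, jaccard


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
dice, jaccard = dice_and_jaccard(pred, truth)
print(dice, jaccard)
```

The two metrics are monotonically related (Dice = 2J / (1 + J)), so a Dice score and a Jaccard Index computed on the same masks rank methods identically even though the numbers differ.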
Snippext: Semi-supervised Opinion Mining with Augmented Data
Title | Snippext: Semi-supervised Opinion Mining with Augmented Data |
Authors | Zhengjie Miao, Yuliang Li, Xiaolan Wang, Wang-Chiew Tan |
Abstract | Online services are interested in solutions to opinion mining, which is the problem of extracting aspects, opinions, and sentiments from text. One method to mine opinions is to leverage the recent success of pre-trained language models which can be fine-tuned to obtain high-quality extractions from reviews. However, fine-tuning language models still requires a non-trivial amount of training data. In this paper, we study the problem of how to significantly reduce the amount of labeled training data required in fine-tuning language models for opinion mining. We describe Snippext, an opinion mining system developed over a language model that is fine-tuned through semi-supervised learning with augmented data. A novelty of Snippext is its clever use of a two-prong approach to achieve state-of-the-art (SOTA) performance with little labeled training data through: (1) data augmentation to automatically generate more labeled training data from existing ones, and (2) a semi-supervised learning technique to leverage the massive amount of unlabeled data in addition to the (limited amount of) labeled data. We show with extensive experiments that Snippext performs comparably and can even exceed previous SOTA results on several opinion mining tasks with only half the training data required. Furthermore, it achieves new SOTA results when all training data are leveraged. By comparison to a baseline pipeline, we found that Snippext extracts significantly more fine-grained opinions which enable new opportunities of downstream applications. |
Tasks | Data Augmentation, Language Modelling, Opinion Mining |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.03049v1 |
https://arxiv.org/pdf/2002.03049v1.pdf | |
PWC | https://paperswithcode.com/paper/snippext-semi-supervised-opinion-mining-with |
Repo | https://github.com/rit-git/Snippext_public |
Framework | pytorch |
Bayesian task embedding for few-shot Bayesian optimization
Title | Bayesian task embedding for few-shot Bayesian optimization |
Authors | Steven Atkinson, Sayan Ghosh, Natarajan Chennimalai-Kumar, Genghis Khan, Liping Wang |
Abstract | We describe a method for Bayesian optimization by which one may incorporate data from multiple systems whose quantitative interrelationships are unknown a priori. All general (nonreal-valued) features of the systems are associated with continuous latent variables that enter as inputs into a single metamodel that simultaneously learns the response surfaces of all of the systems. Bayesian inference is used to determine appropriate beliefs regarding the latent variables. We explain how the resulting probabilistic metamodel may be used for Bayesian optimization tasks and demonstrate its implementation on a variety of synthetic and real-world examples, comparing its performance under zero-, one-, and few-shot settings against traditional Bayesian optimization, which usually requires substantially more data from the system of interest. |
Tasks | Bayesian Inference |
Published | 2020-01-02 |
URL | https://arxiv.org/abs/2001.00637v1 |
https://arxiv.org/pdf/2001.00637v1.pdf | |
PWC | https://paperswithcode.com/paper/bayesian-task-embedding-for-few-shot-bayesian |
Repo | https://github.com/sdatkinson/BEBO |
Framework | pytorch |
Reconstruction of 3D flight trajectories from ad-hoc camera networks
Title | Reconstruction of 3D flight trajectories from ad-hoc camera networks |
Authors | Jingtong Li, Jesse Murray, Dorina Ismaili, Konrad Schindler, Cenek Albl |
Abstract | We present a method to reconstruct the 3D trajectory of an airborne robotic system only from videos recorded with cameras that are unsynchronized, may feature rolling shutter distortion, and whose viewpoints are unknown. Our approach enables robust and accurate outside-in tracking of dynamically flying targets, with cheap and easy-to-deploy equipment. We show that, in spite of the weakly constrained setting, recent developments in computer vision make it possible to reconstruct trajectories in 3D from unsynchronized, uncalibrated networks of consumer cameras, and validate the proposed method in a realistic field experiment. We make our code available along with the data, including cm-accurate ground-truth from differential GNSS navigation. |
Tasks | |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04784v1 |
https://arxiv.org/pdf/2003.04784v1.pdf | |
PWC | https://paperswithcode.com/paper/reconstruction-of-3d-flight-trajectories-from |
Repo | https://github.com/CenekAlbl/mvus |
Framework | none |
Harmonic Convolutional Networks based on Discrete Cosine Transform
Title | Harmonic Convolutional Networks based on Discrete Cosine Transform |
Authors | Matej Ulicny, Vladimir A. Krylov, Rozenn Dahyot |
Abstract | Convolutional neural networks (CNNs) learn filters in order to capture local correlation patterns in feature space. In this paper we propose to revert to learning combinations of preset spectral filters by switching to CNNs with harmonic blocks. We rely on the use of the Discrete Cosine Transform (DCT) filters which have excellent energy compaction properties and are widely used for image compression. The proposed harmonic blocks rely on DCT-modeling and replace conventional convolutional layers to produce partially or fully harmonic versions of new or existing CNN architectures. We demonstrate how the harmonic networks can be efficiently compressed in a straightforward manner by truncating high-frequency information in harmonic blocks which is possible due to the redundancies in the spectral domain. We report extensive experimental validation demonstrating the benefits of the introduction of harmonic blocks into state-of-the-art CNN models in image classification, segmentation and edge detection applications. |
Tasks | Edge Detection, Image Classification, Object Detection |
Published | 2020-01-18 |
URL | https://arxiv.org/abs/2001.06570v1 |
https://arxiv.org/pdf/2001.06570v1.pdf | |
PWC | https://paperswithcode.com/paper/harmonic-convolutional-networks-based-on |
Repo | https://github.com/matej-ulicny/harmonic-networks |
Framework | pytorch |
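The abstract above describes learning combinations of preset DCT filters and compressing a network by truncating high-frequency information. The following sketch only illustrates that idea: it builds an unnormalized K x K DCT-II filter bank and drops the high-frequency filters. It is not the code from the harmonic-networks repository; the names `dct_filter_bank`, `inner`, and `truncate`, and the truncation rule `u + v <= max_freq`, are assumptions made for illustration.

```python
import math


def dct_filter_bank(k=3):
    """Return a dict mapping (u, v) to a k x k DCT-II basis filter (unnormalized)."""
    bank = {}
    for u in range(k):
        for v in range(k):
            bank[(u, v)] = [
                [
                    math.cos(math.pi * (i + 0.5) * u / k)
                    * math.cos(math.pi * (j + 0.5) * v / k)
                    for j in range(k)
                ]
                for i in range(k)
            ]
    return bank


def inner(a, b):
    """Elementwise inner product of two equally sized 2D filters."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def truncate(bank, max_freq):
    """Keep only filters whose frequency indices satisfy u + v <= max_freq.

    This particular cutoff rule is an assumption, not taken from the paper.
    """
    return {uv: f for uv, f in bank.items() if sum(uv) <= max_freq}


bank = dct_filter_bank(3)
low = truncate(bank, 2)
print(len(bank), "filters in the full bank,", len(low), "after truncation")
```

Because the basis filters are mutually orthogonal, each one contributes independent information, which is why discarding the highest-frequency ones is a natural way to shrink the filter bank.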
LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation
Title | LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation |
Authors | Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, Meng Wang |
Abstract | Graph Convolution Network (GCN) has become the new state of the art for collaborative filtering. Nevertheless, the reasons for its effectiveness for recommendation are not well understood. Existing work that adapts GCN to recommendation lacks thorough ablation analyses on GCN, which was originally designed for graph classification tasks and is equipped with many neural network operations. However, we empirically find that the two most common designs in GCNs, feature transformation and nonlinear activation, contribute little to the performance of collaborative filtering. Even worse, including them adds to the difficulty of training and degrades recommendation performance. In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation. We propose a new model named LightGCN, including only the most essential component in GCN, neighborhood aggregation, for collaborative filtering. Specifically, LightGCN learns user and item embeddings by linearly propagating them on the user-item interaction graph, and uses the weighted sum of the embeddings learned at all layers as the final embedding. Such a simple, linear, and neat model is much easier to implement and train, exhibiting substantial improvements (about 16.5% relative improvement on average) over Neural Graph Collaborative Filtering (NGCF), a state-of-the-art GCN-based recommender model, under exactly the same experimental setting. Further analyses are provided towards the rationality of the simple LightGCN from both analytical and empirical perspectives. |
Tasks | Graph Classification |
Published | 2020-02-06 |
URL | https://arxiv.org/abs/2002.02126v1 |
https://arxiv.org/pdf/2002.02126v1.pdf | |
PWC | https://paperswithcode.com/paper/lightgcn-simplifying-and-powering-graph |
Repo | https://github.com/kuandeng/LightGCN |
Framework | tf |
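The abstract above describes linear propagation of embeddings on the user-item graph followed by a weighted sum over layers. The sketch below illustrates that description on a toy graph. It is not the code from the linked repository: the function name `lightgcn_embeddings`, the symmetric 1/sqrt(deg_u * deg_i) edge weight, and the uniform layer weights 1/(K+1) are assumptions chosen for illustration, and the interaction list is assumed to contain no duplicate pairs.

```python
import math


def lightgcn_embeddings(user_emb, item_emb, interactions, num_layers=2):
    """Propagate embeddings linearly over a bipartite graph and average the layers.

    user_emb / item_emb: lists of embedding vectors (lists of floats).
    interactions: list of unique (user_index, item_index) pairs.
    No feature transformation or nonlinearity is applied.
    """
    n_u, n_i, dim = len(user_emb), len(item_emb), len(user_emb[0])

    deg_u = [0] * n_u
    deg_i = [0] * n_i
    for u, i in interactions:
        deg_u[u] += 1
        deg_i[i] += 1

    cur_u = [list(e) for e in user_emb]
    cur_i = [list(e) for e in item_emb]
    sum_u = [list(e) for e in user_emb]  # running sum, layer 0 included
    sum_i = [list(e) for e in item_emb]

    for layer in range(num_layers):
        nxt_u = [[0.0] * dim for row in range(n_u)]
        nxt_i = [[0.0] * dim for row in range(n_i)]
        for u, i in interactions:
            # Assumed symmetric normalization of each edge.
            w = 1.0 / math.sqrt(deg_u[u] * deg_i[i])
            for d in range(dim):
                nxt_u[u][d] += w * cur_i[i][d]
                nxt_i[i][d] += w * cur_u[u][d]
        cur_u, cur_i = nxt_u, nxt_i
        for u in range(n_u):
            for d in range(dim):
                sum_u[u][d] += cur_u[u][d]
        for i in range(n_i):
            for d in range(dim):
                sum_i[i][d] += cur_i[i][d]

    # Assumed uniform layer weights 1 / (num_layers + 1).
    scale = 1.0 / (num_layers + 1)
    final_u = [[scale * x for x in e] for e in sum_u]
    final_i = [[scale * x for x in e] for e in sum_i]
    return final_u, final_i


users = [[1.0, 0.0], [0.0, 1.0]]
items = [[0.5, 0.5], [1.0, -1.0]]
edges = [(0, 0), (0, 1), (1, 1)]
final_users, final_items = lightgcn_embeddings(users, items, edges, num_layers=2)

# A recommendation score is then the dot product of the final embeddings.
score = sum(a * b for a, b in zip(final_users[0], final_items[1]))
print(score)
```

The only trainable quantities in such a setup would be the layer-0 embeddings, which is what makes the model simpler to train than a GCN with per-layer weight matrices.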
Siamese Box Adaptive Network for Visual Tracking
Title | Siamese Box Adaptive Network for Visual Tracking |
Authors | Zedu Chen, Bineng Zhong, Guorong Li, Shengping Zhang, Rongrong Ji |
Abstract | Most of the existing trackers usually rely on either a multi-scale searching scheme or pre-defined anchor boxes to accurately estimate the scale and aspect ratio of a target. Unfortunately, they typically call for tedious and heuristic configurations. To address this issue, we propose a simple yet effective visual tracking framework (named Siamese Box Adaptive Network, SiamBAN) by exploiting the expressive power of the fully convolutional network (FCN). SiamBAN views the visual tracking problem as a parallel classification and regression problem, and thus directly classifies objects and regresses their bounding boxes in a unified FCN. The no-prior box design avoids hyper-parameters associated with the candidate boxes, making SiamBAN more flexible and general. Extensive experiments on visual tracking benchmarks including VOT2018, VOT2019, OTB100, NFS, UAV123, and LaSOT demonstrate that SiamBAN achieves state-of-the-art performance and runs at 40 FPS, confirming its effectiveness and efficiency. The code will be available at https://github.com/hqucv/siamban. |
Tasks | Visual Tracking |
Published | 2020-03-15 |
URL | https://arxiv.org/abs/2003.06761v1 |
https://arxiv.org/pdf/2003.06761v1.pdf | |
PWC | https://paperswithcode.com/paper/siamese-box-adaptive-network-for-visual |
Repo | https://github.com/hqucv/siamban |
Framework | none |
Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors
Title | Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors |
Authors | Rituraj Kaushik, Timothée Anne, Jean-Baptiste Mouret |
Abstract | Meta-learning algorithms can accelerate model-based reinforcement learning (MBRL) algorithms by finding an initial set of parameters for the dynamical model such that the model can be trained to match the actual dynamics of the system with only a few data points. However, in the real world, a robot might encounter any situation, from motor failures to finding itself on rocky terrain, and the dynamics of the robot can differ significantly from one situation to another. In this paper, first, we show that when meta-training situations (the prior situations) have such diverse dynamics, using a single set of meta-trained parameters as a starting point still requires a large number of observations from the real system to learn a useful model of the dynamics. Second, we propose an algorithm called FAMLE that mitigates this limitation by meta-training several initial starting points (i.e., initial parameters) for training the model and allows the robot to select the most suitable starting point to adapt the model to the current situation with only a few gradient steps. We compare FAMLE to MBRL, MBRL with a meta-trained model with MAML, and the model-free policy search algorithm PPO for various simulated and real robotic tasks, and show that FAMLE allows the robots to adapt to novel damage in significantly fewer time-steps than the baselines. |
Tasks | Meta-Learning |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.04663v1 |
https://arxiv.org/pdf/2003.04663v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-online-adaptation-in-robotics-through |
Repo | https://github.com/resibots/kaushik_2020_famle |
Framework | none |
DRMIME: Differentiable Mutual Information and Matrix Exponential for Multi-Resolution Image Registration
Title | DRMIME: Differentiable Mutual Information and Matrix Exponential for Multi-Resolution Image Registration |
Authors | Abhishek Nan, Matthew Tennant, Uriel Rubin, Nilanjan Ray |
Abstract | In this work, we present a novel unsupervised image registration algorithm. It is differentiable end-to-end and can be used for both multi-modal and mono-modal registration. This is done using mutual information (MI) as a metric. The novelty here is that rather than using traditional ways of approximating MI, we use a neural estimator called MINE and supplement it with matrix exponential for transformation matrix computation. This leads to improved results as compared to the standard algorithms available out-of-the-box in state-of-the-art image registration toolboxes. |
Tasks | Image Registration |
Published | 2020-01-27 |
URL | https://arxiv.org/abs/2001.09865v1 |
https://arxiv.org/pdf/2001.09865v1.pdf | |
PWC | https://paperswithcode.com/paper/drmime-differentiable-mutual-information-and |
Repo | https://github.com/abnan/DRMIME |
Framework | none |
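The abstract above mentions using the matrix exponential to compute the transformation matrix. The sketch below shows the general idea for a 2D rigid transform in homogeneous coordinates: unconstrained parameters fill a generator matrix, and its exponential is always a valid transform. The abstract does not give the paper's exact parameterization, so the generator layout in `rigid_generator` and the truncated Taylor series in `mat_exp` are assumptions made for illustration; a production implementation would more likely rely on a library routine.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=30):
    """Approximate exp(M) with the truncated Taylor series sum_k M^k / k!."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds M^k / k!, starting at k = 0
    for k in range(1, terms):
        term = mat_mul(term, m)
        term = [[x / k for x in row] for row in term]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def rigid_generator(theta, tx, ty):
    """Hypothetical generator for a 2D rigid transform in homogeneous coordinates."""
    return [[0.0, -theta, tx],
            [theta, 0.0, ty],
            [0.0, 0.0, 0.0]]


# A pure rotation generator exponentiates to a rotation matrix.
T = mat_exp(rigid_generator(math.pi / 2, 0.0, 0.0))
for row in T:
    print([round(x, 6) for x in row])
```

Every operation in the series is a sum or product, so the mapping from parameters to the transformation matrix is differentiable, which is what an end-to-end differentiable registration pipeline needs.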
Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog
Title | Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog |
Authors | Zekang Li, Zongjia Li, Jinchao Zhang, Yang Feng, Cheng Niu, Jie Zhou |
Abstract | Audio-Visual Scene-Aware Dialog (AVSD) is a task to generate responses when chatting about a given video, which is organized as a track of the 8th Dialog System Technology Challenge (DSTC8). To solve the task, we propose a universal multimodal transformer and introduce the multi-task learning method to learn joint representations among different modalities as well as generate informative and fluent responses. Our method extends the natural language generation pre-trained model to multimodal dialogue generation task. Our system achieves the best performance in both objective and subjective evaluations in the challenge. |
Tasks | Dialogue Generation, Multi-Task Learning, Text Generation |
Published | 2020-02-01 |
URL | https://arxiv.org/abs/2002.00163v1 |
https://arxiv.org/pdf/2002.00163v1.pdf | |
PWC | https://paperswithcode.com/paper/bridging-text-and-video-a-universal |
Repo | https://github.com/ictnlp/DSTC8-AVSD |
Framework | pytorch |
Y-net: Multi-scale feature aggregation network with wavelet structure similarity loss function for single image dehazing
Title | Y-net: Multi-scale feature aggregation network with wavelet structure similarity loss function for single image dehazing |
Authors | Hao-Hsiang Yang, Chao-Han Huck Yang, Yi-Chang James Tsai |
Abstract | Single image dehazing is an ill-posed two-dimensional signal reconstruction problem. Recently, deep convolutional neural networks (CNNs) have been successfully used in many computer vision problems. In this paper, we propose a Y-net that is named for its structure. This network reconstructs clear images by aggregating multi-scale feature maps. Additionally, we propose a Wavelet Structure SIMilarity (W-SSIM) loss function in the training step. In the proposed loss function, discrete wavelet transforms are applied repeatedly to divide the image into differently sized patches with different frequencies and scales. The proposed loss function is the accumulation of SSIM loss of various patches with respective ratios. Extensive experimental results demonstrate that the proposed Y-net with the W-SSIM loss function restores high-quality clear images and outperforms state-of-the-art algorithms. Code and models are available at https://github.com/dectrfov/Y-net. |
Tasks | Image Dehazing, Single Image Dehazing |
Published | 2020-03-31 |
URL | https://arxiv.org/abs/2003.13912v1 |
https://arxiv.org/pdf/2003.13912v1.pdf | |
PWC | https://paperswithcode.com/paper/y-net-multi-scale-feature-aggregation-network |
Repo | https://github.com/dectrfov/Y-net |
Framework | none |
An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization
Title | An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization |
Authors | Yiqiu Shen, Nan Wu, Jason Phang, Jungkyu Park, Kangning Liu, Sudarshini Tyagi, Laura Heacock, S. Gene Kim, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras |
Abstract | Medical images differ from natural images in significantly higher resolutions and smaller regions of interest. Because of these differences, neural network architectures that work well for natural images might not be applicable to medical image analysis. In this work, we extend the globally-aware multiple instance classifier, a framework we proposed to address these unique properties of medical images. This model first uses a low-capacity, yet memory-efficient, network on the whole image to identify the most informative regions. It then applies another higher-capacity network to collect details from chosen regions. Finally, it employs a fusion module that aggregates global and local information to make a final prediction. While existing methods often require lesion segmentation during training, our model is trained with only image-level labels and can generate pixel-level saliency maps indicating possible malignant findings. We apply the model to screening mammography interpretation: predicting the presence or absence of benign and malignant lesions. On the NYU Breast Cancer Screening Dataset, consisting of more than one million images, our model achieves an AUC of 0.93 in classifying breasts with malignant findings, outperforming ResNet-34 and Faster R-CNN. Compared to ResNet-34, our model is 4.1x faster for inference while using 78.4% less GPU memory. Furthermore, we demonstrate, in a reader study, that our model surpasses radiologist-level AUC by a margin of 0.11. The proposed model is available online: https://github.com/nyukat/GMIC. |
Tasks | Lesion Segmentation |
Published | 2020-02-13 |
URL | https://arxiv.org/abs/2002.07613v1 |
https://arxiv.org/pdf/2002.07613v1.pdf | |
PWC | https://paperswithcode.com/paper/an-interpretable-classifier-for-high |
Repo | https://github.com/nyukat/GMIC |
Framework | pytorch |
Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion
Title | Unsupervised Representation Disentanglement using Cross Domain Features and Adversarial Learning in Variational Autoencoder based Voice Conversion |
Authors | Wen-Chin Huang, Hao Luo, Hsin-Te Hwang, Chen-Chou Lo, Yu-Huai Peng, Yu Tsao, Hsin-Min Wang |
Abstract | An effective approach for voice conversion (VC) is to disentangle linguistic content from other components in the speech signal. The effectiveness of variational autoencoder (VAE) based VC (VAE-VC), for instance, strongly relies on this principle. In our prior work, we proposed a cross-domain VAE-VC (CDVAE-VC) framework, which utilized acoustic features of different properties, to improve the performance of VAE-VC. We believed that the success came from more disentangled latent representations. In this paper, we extend the CDVAE-VC framework by incorporating the concept of adversarial learning, in order to further increase the degree of disentanglement, thereby improving the quality and similarity of converted speech. More specifically, we first investigate the effectiveness of incorporating the generative adversarial networks (GANs) with CDVAE-VC. Then, we consider the concept of domain adversarial training and add an explicit constraint to the latent representation, realized by a speaker classifier, to explicitly eliminate the speaker information that resides in the latent code. Experimental results confirm that the degree of disentanglement of the learned latent representation can be enhanced by both GANs and the speaker classifier. Meanwhile, subjective evaluation results in terms of quality and similarity scores demonstrate the effectiveness of our proposed methods. |
Tasks | Voice Conversion |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.07849v3 |
https://arxiv.org/pdf/2001.07849v3.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-representation-disentanglement |
Repo | https://github.com/unilight/cdvae-vc |
Framework | tf |
Deep learning of dynamical attractors from time series measurements
Title | Deep learning of dynamical attractors from time series measurements |
Authors | William Gilpin |
Abstract | Experimental measurements of physical systems often have a finite number of independent channels, causing essential dynamical variables to remain unobserved. However, many popular methods for unsupervised inference of latent dynamics from experimental data implicitly assume that the measurements have higher intrinsic dimensionality than the underlying system—making coordinate identification a dimensionality reduction problem. Here, we study the opposite limit, in which hidden governing coordinates must be inferred from only a low-dimensional time series of measurements. Inspired by classical techniques for studying the strange attractors of chaotic systems, we introduce a general embedding technique for time series, consisting of an autoencoder trained with a novel latent-space loss function. We first apply our technique to a variety of synthetic and real-world datasets with known strange attractors, and we use established and novel measures of attractor fidelity to show that our method successfully reconstructs attractors better than existing techniques. We then use our technique to discover dynamical attractors in datasets ranging from patient electrocardiograms, to household electricity usage, to eruptions of the Old Faithful geyser—demonstrating diverse applications of our technique for exploratory data analysis. |
Tasks | Dimensionality Reduction, Time Series |
Published | 2020-02-14 |
URL | https://arxiv.org/abs/2002.05909v1 |
https://arxiv.org/pdf/2002.05909v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-of-dynamical-attractors-from |
Repo | https://github.com/williamgilpin/fnn |
Framework | tf |
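The abstract above says the method is inspired by classical techniques for studying strange attractors but does not name them. One standard classical technique is time-delay embedding, which lifts a scalar time series into a higher-dimensional space by stacking lagged copies of itself. The sketch below shows only that classical baseline, as an assumption about the kind of technique meant; it is not the paper's autoencoder or its latent-space loss, and the function name `delay_embed` is hypothetical.

```python
import math


def delay_embed(series, dim=3, lag=1):
    """Return delay vectors [x(t), x(t + lag), ..., x(t + (dim - 1) * lag)]."""
    n = len(series) - (dim - 1) * lag
    if n <= 0:
        raise ValueError("series too short for this dim and lag")
    return [[series[t + k * lag] for k in range(dim)] for t in range(n)]


# A one-channel measurement of a periodic signal.
signal = [math.sin(0.1 * t) for t in range(200)]
points = delay_embed(signal, dim=2, lag=15)
print(len(points), points[0])
```

With a lag near a quarter period, the two coordinates of each delay vector are roughly a sine and a cosine, so the embedded points trace out a closed loop even though only one channel was measured.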
Structural Deep Clustering Network
Title | Structural Deep Clustering Network |
Authors | Deyu Bo, Xiao Wang, Chuan Shi, Meiqi Zhu, Emiao Lu, Peng Cui |
Abstract | Clustering is a fundamental task in data analysis. Recently, deep clustering, which derives inspiration primarily from deep learning approaches, has achieved state-of-the-art performance and attracted considerable attention. Current deep clustering methods usually boost the clustering results by means of the powerful representation ability of deep learning, e.g., autoencoder, suggesting that learning an effective representation for clustering is a crucial requirement. The strength of deep clustering methods is to extract useful representations from the data itself, rather than from the structure of the data, which receives scarce attention in representation learning. Motivated by the great success of Graph Convolutional Network (GCN) in encoding the graph structure, we propose a Structural Deep Clustering Network (SDCN) to integrate the structural information into deep clustering. Specifically, we design a delivery operator to transfer the representations learned by the autoencoder to the corresponding GCN layer, and a dual self-supervised mechanism to unify these two different deep neural architectures and guide the update of the whole model. In this way, the multiple structures of data, from low-order to high-order, are naturally combined with the multiple representations learned by the autoencoder. Furthermore, we theoretically analyze the delivery operator, i.e., with the delivery operator, GCN improves the autoencoder-specific representation as a high-order graph regularization constraint and the autoencoder helps alleviate the over-smoothing problem in GCN. Through comprehensive experiments, we demonstrate that our proposed model consistently outperforms state-of-the-art techniques. |
Tasks | Representation Learning |
Published | 2020-02-05 |
URL | https://arxiv.org/abs/2002.01633v3 |
https://arxiv.org/pdf/2002.01633v3.pdf | |
PWC | https://paperswithcode.com/paper/structural-deep-clustering-network |
Repo | https://github.com/461054993/SDCN |
Framework | pytorch |