February 2, 2020

3223 words 16 mins read

Paper Group AWR 3

Distributionally Robust Optimization and Generalization in Kernel Methods. ICDM 2019 Knowledge Graph Contest: Team UWA. NEURO-DRAM: a 3D recurrent visual attention model for interpretable neuroimaging classification. Planning with State Abstractions for Non-Markovian Task Specifications. Neural Assistant: Joint Action Prediction, Response Generatio …

Distributionally Robust Optimization and Generalization in Kernel Methods

Title Distributionally Robust Optimization and Generalization in Kernel Methods
Authors Matthew Staib, Stefanie Jegelka
Abstract Distributionally robust optimization (DRO) has attracted attention in machine learning due to its connections to regularization, generalization, and robustness. Existing work has considered uncertainty sets based on phi-divergences and Wasserstein distances, each of which has drawbacks. In this paper, we study DRO with uncertainty sets measured via maximum mean discrepancy (MMD). We show that MMD DRO is roughly equivalent to regularization by the Hilbert norm and, as a byproduct, reveal deep connections to classic results in statistical learning. In particular, we obtain an alternative proof of a generalization bound for Gaussian kernel ridge regression via a DRO lens. The proof also suggests a new regularizer. Our results apply beyond kernel methods: we derive a generically applicable approximation of MMD DRO, and show that it generalizes recent work on variance-based regularization.
Tasks
Published 2019-05-27
URL https://arxiv.org/abs/1905.10943v1
PDF https://arxiv.org/pdf/1905.10943v1.pdf
PWC https://paperswithcode.com/paper/distributionally-robust-optimization-and
Repo https://github.com/mstaib/mmd-dro-code
Framework none
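
For readers unfamiliar with MMD, the quantity at the heart of this paper, here is a minimal numpy sketch of the standard biased estimator of squared MMD under a Gaussian kernel. This is the textbook estimator, not code from the authors' repo, and the sample sizes and bandwidth are arbitrary.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Gram matrix of the Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    sq = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2 * x @ y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_biased(x, y, sigma=1.0):
    # Biased estimate of MMD^2 between samples x ~ P and y ~ Q:
    # MMD^2(P, Q) = E[k(x,x')] - 2 E[k(x,y)] + E[k(y,y')].
    kxx = gaussian_kernel(x, x, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    return kxx - 2 * kxy + kyy

rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, size=(200, 2))   # sample from P
q = rng.normal(0.5, 1.0, size=(200, 2))   # sample from a shifted Q
print(mmd2_biased(p, q))                  # larger when P and Q differ
```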

ICDM 2019 Knowledge Graph Contest: Team UWA

Title ICDM 2019 Knowledge Graph Contest: Team UWA
Authors Michael Stewart, Majigsuren Enkhsaikhan, Wei Liu
Abstract We present an overview of our triple extraction system for the ICDM 2019 Knowledge Graph Contest. Our system uses a pipeline-based approach to extract a set of triples from a given document. It offers a simple and effective solution to the challenge of knowledge graph construction from domain-specific text. It also provides the facility to visualise useful information about each triple such as the degree, betweenness, structured relation type(s), and named entity types.
Tasks Graph Construction
Published 2019-09-04
URL https://arxiv.org/abs/1909.01807v1
PDF https://arxiv.org/pdf/1909.01807v1.pdf
PWC https://paperswithcode.com/paper/icdm-2019-knowledge-graph-contest-team-uwa
Repo https://github.com/Michael-Stewart-Webdev/text2kg-visualisation
Framework none
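
The team's pipeline is considerably richer, but as a hedged sketch of what pipeline-based triple extraction looks like, here is a naive subject-verb-object extractor built on spaCy dependency parses. The model name and dependency labels are standard spaCy; everything else is illustrative.

```python
import spacy  # requires: pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")

def naive_triples(text):
    # Extract (subject, verb, object) triples from dependency parses.
    # A real pipeline (as in the paper) would add entity typing,
    # relation normalisation, and graph metrics on top of this.
    triples = []
    for sent in nlp(text).sents:
        for tok in sent:
            if tok.pos_ == "VERB":
                subj = [c for c in tok.children if c.dep_ in ("nsubj", "nsubjpass")]
                obj = [c for c in tok.children if c.dep_ in ("dobj", "obj", "attr")]
                if subj and obj:
                    triples.append((subj[0].text, tok.lemma_, obj[0].text))
    return triples

print(naive_triples("The company acquired the mine. Engineers inspected the site."))
# e.g. [('company', 'acquire', 'mine'), ('Engineers', 'inspect', 'site')]
```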

NEURO-DRAM: a 3D recurrent visual attention model for interpretable neuroimaging classification

Title NEURO-DRAM: a 3D recurrent visual attention model for interpretable neuroimaging classification
Authors David Wood, James Cole, Thomas Booth
Abstract Deep learning is attracting significant interest in the neuroimaging community as a means to diagnose psychiatric and neurological disorders from structural magnetic resonance images. However, there is a tendency amongst researchers to adopt architectures optimized for traditional computer vision tasks, rather than design networks customized for neuroimaging data. We address this by introducing NEURO-DRAM, a 3D recurrent visual attention model tailored for neuroimaging classification. The model comprises an agent which, trained by reinforcement learning, learns to navigate through volumetric images, selectively attending to the most informative regions for a given task. When applied to Alzheimer’s disease prediction, NEURO-DRAM achieves state-of-the-art classification accuracy on an out-of-sample dataset, significantly outperforming a baseline convolutional neural network. When further applied to the task of predicting which patients with mild cognitive impairment will be diagnosed with Alzheimer’s disease within two years, the model achieves state-of-the-art accuracy with no additional training. Encouragingly, the agent learns, without explicit instruction, a search policy in agreement with standardized radiological hallmarks of Alzheimer’s disease, suggesting a route to automated biomarker discovery for more poorly understood disorders.
Tasks Disease Prediction
Published 2019-10-10
URL https://arxiv.org/abs/1910.04721v3
PDF https://arxiv.org/pdf/1910.04721v3.pdf
PWC https://paperswithcode.com/paper/neuro-dram-a-3d-recurrent-visual-attention
Repo https://github.com/neurodram/3D-recurrent-visual-attention-model
Framework pytorch
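
The central mechanism is an agent that attends to sub-volumes of a 3D scan. As a rough illustration (names, sizes, and the clamping strategy are assumptions, not taken from the repo), here is a pytorch sketch of extracting a fixed-size 3D glimpse at an agent-chosen location:

```python
import torch

def glimpse_3d(volume, center, size=16):
    # Crop a (size x size x size) sub-volume around `center` from a 3D scan.
    # volume: (D, H, W) tensor; center: (z, y, x) voxel indices.
    # A recurrent attention model would feed this crop to its glimpse
    # network and let an RL-trained policy pick the next `center`.
    half = size // 2
    slices = []
    for c, dim in zip(center, volume.shape):
        lo = max(0, min(int(c) - half, dim - size))  # clamp crop inside the volume
        slices.append(slice(lo, lo + size))
    return volume[slices[0], slices[1], slices[2]]

scan = torch.randn(91, 109, 91)            # e.g. an MNI-space MRI volume
crop = glimpse_3d(scan, center=(45, 60, 40))
print(crop.shape)                          # torch.Size([16, 16, 16])
```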

Planning with State Abstractions for Non-Markovian Task Specifications

Title Planning with State Abstractions for Non-Markovian Task Specifications
Authors Yoonseon Oh, Roma Patel, Thao Nguyen, Baichuan Huang, Ellie Pavlick, Stefanie Tellex
Abstract Oftentimes, we specify tasks for a robot using temporal language that can also span different levels of abstraction. The example command “go to the kitchen before going to the second floor” contains spatial abstraction, given that “second floor” consists of individual rooms that can also be referred to in isolation (“kitchen”, for example). There is also a temporal ordering of events, defined by the word “before”. Previous works have used Linear Temporal Logic (LTL) to interpret temporal language (such as “before”), and Abstract Markov Decision Processes (AMDPs) to interpret hierarchical abstractions (such as “kitchen” and “second floor”), separately. To handle both types of commands at once, we introduce the Abstract Product Markov Decision Process (AP-MDP), a novel approach capable of representing non-Markovian reward functions at different levels of abstraction. The AP-MDP framework translates LTL into its corresponding automata, creates a product Markov Decision Process (MDP) of the LTL specification and the environment MDP, and decomposes the problem into subproblems to enable efficient planning with abstractions. AP-MDP performs faster than a non-hierarchical method of solving LTL problems in over 95% of tasks, and this fraction only increases as the size of the environment domain grows. We also present a neural sequence-to-sequence model trained to translate language commands into LTL expressions, and a new corpus of non-Markovian language commands spanning different levels of abstraction. We test our framework with the collected language commands on a drone, demonstrating that our approach enables a robot to efficiently solve temporal commands at different levels of abstraction.
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.12096v1
PDF https://arxiv.org/pdf/1905.12096v1.pdf
PWC https://paperswithcode.com/paper/planning-with-state-abstractions-for-non
Repo https://github.com/h2r/ltl-amdp
Framework none
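
To make the product-MDP construction concrete, here is a hedged, self-contained toy in plain Python: a hand-written automaton stands in for the LTL-to-automaton translation (which AP-MDP performs automatically with dedicated tooling), composed with a one-dimensional room environment.

```python
# Toy environment: rooms in a line; moving left/right is deterministic.
ROOMS = ["hall", "kitchen", "stairs", "floor2"]

def env_step(state, action):            # action in {-1, +1}
    return min(max(state + action, 0), len(ROOMS) - 1)

def labels_of(state):
    return {ROOMS[state]}

def automaton_step(q, labels):
    # Hand-written automaton for roughly F(kitchen & F floor2):
    # 0 = nothing done, 1 = kitchen visited, 2 = accepting.
    if q == 0 and "kitchen" in labels:
        return 1
    if q == 1 and "floor2" in labels:
        return 2
    return q

def product_step(state, q, action):
    # One step of the product MDP: environment and automaton advance
    # together; reward is issued only when the automaton accepts.
    nxt = env_step(state, action)
    nq = automaton_step(nxt := nxt, labels=labels_of(nxt)) if False else automaton_step(q, labels_of(nxt))
    return (nxt, nq), (1.0 if nq == 2 else 0.0)

# Walk right from the hall: the product state tracks task progress.
s, q = 0, 0
for a in [1, 1, 1]:
    (s, q), r = product_step(s, q, a)
    print(ROOMS[s], q, r)   # kitchen 1 0.0 / stairs 1 0.0 / floor2 2 1.0
```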

Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning

Title Neural Assistant: Joint Action Prediction, Response Generation, and Latent Knowledge Reasoning
Authors Arvind Neelakantan, Semih Yavuz, Sharan Narang, Vishaal Prasad, Ben Goodrich, Daniel Duckworth, Chinnadhurai Sankar, Xifeng Yan
Abstract Task-oriented dialog presents a difficult challenge encompassing multiple problems, including multi-turn language understanding and generation, knowledge retrieval and reasoning, and action prediction. Modern dialog systems typically begin by converting the conversation history to a symbolic object referred to as the belief state, using supervised learning. The belief state is then used to reason over an external knowledge source, whose result, along with the conversation history, is used for action prediction and response generation independently. Such a pipeline of individually optimized components not only makes the development process cumbersome but also makes it non-trivial to leverage session-level user reinforcement signals. In this paper, we develop Neural Assistant: a single neural network model that takes the conversation history and an external knowledge source as input and jointly produces both the text response and the action to be taken by the system as output. The model learns to reason on the provided knowledge source with a weak supervision signal coming from the text generation and action prediction tasks, hence removing the need for belief state annotations. On the MultiWOZ dataset, we study the effect of distant supervision and of the size of the knowledge base on model performance. We find that Neural Assistant without belief states is able to incorporate external knowledge information, achieving higher factual accuracy scores compared to a plain Transformer. In settings comparable to reported baseline systems, Neural Assistant, when provided with the oracle belief state, significantly improves language generation performance.
Tasks Text Generation
Published 2019-10-31
URL https://arxiv.org/abs/1910.14613v1
PDF https://arxiv.org/pdf/1910.14613v1.pdf
PWC https://paperswithcode.com/paper/neural-assistant-joint-action-prediction
Repo https://github.com/tensorflow/tensor2tensor
Framework tf
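
The core idea is a single network with two output heads trained jointly, so the action signal weakly supervises knowledge reasoning. Below is a minimal pytorch sketch of that joint objective, with a toy GRU encoder standing in for the paper's Transformer (all names and sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

class JointAssistant(nn.Module):
    # Sketch: shared encoder over [history ; knowledge] tokens, with one
    # head generating the response and one predicting the system action.
    def __init__(self, vocab=1000, n_actions=20, d=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.lm_head = nn.Linear(d, vocab)        # next-token logits
        self.action_head = nn.Linear(d, n_actions)

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))
        return self.lm_head(h), self.action_head(h[:, -1])

model = JointAssistant()
tokens = torch.randint(0, 1000, (4, 32))     # [history ; KB] token ids
targets = torch.randint(0, 1000, (4, 32))    # gold response tokens
actions = torch.randint(0, 20, (4,))         # gold system actions
lm_logits, act_logits = model(tokens)
# Joint loss: generation and action prediction supervise the shared
# encoder, with no belief-state annotations anywhere in the objective.
loss = (nn.functional.cross_entropy(lm_logits.reshape(-1, 1000), targets.reshape(-1))
        + nn.functional.cross_entropy(act_logits, actions))
loss.backward()
```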

An Ensemble of Epoch-wise Empirical Bayes for Few-shot Learning

Title An Ensemble of Epoch-wise Empirical Bayes for Few-shot Learning
Authors Yaoyao Liu, Bernt Schiele, Qianru Sun
Abstract Few-shot learning aims to train efficient predictive models with only a few examples. The lack of training data leads to poor models that make high-variance or low-confidence predictions. In this paper, we propose to meta-learn an ensemble of epoch-wise empirical Bayes models (E3BM) to achieve robust predictions. “Epoch-wise” means that each training epoch has a Bayes model whose parameters are specifically learned and deployed. “Empirical” means that the hyperparameters, e.g. those used for learning and ensembling the epoch-wise models, are generated by hyperprior learners conditioned on task-specific data. We introduce four kinds of hyperprior learners by considering inductive vs. transductive, and epoch-dependent vs. epoch-independent variants, in the paradigm of meta-learning. We conduct extensive experiments on five-class few-shot tasks on three challenging benchmarks: miniImageNet, tieredImageNet, and FC100, and achieve top performance using the epoch-dependent transductive hyperprior learner, which captures the richest information. Our ablation study shows that both the “epoch-wise ensemble” and the “empirical” hyperpriors contribute to the efficiency and robustness of the model.
Tasks Few-Shot Learning, Meta-Learning
Published 2019-04-17
URL https://arxiv.org/abs/1904.08479v5
PDF https://arxiv.org/pdf/1904.08479v5.pdf
PWC https://paperswithcode.com/paper/lcc-learning-to-customize-and-combine-neural
Repo https://github.com/yaoyao-liu/E3BM
Framework pytorch
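
The ensembling step itself is simple to picture: one set of logits per training epoch, mixed by weights that the hyperprior learners generate. A hedged pytorch sketch, with free-floating weights standing in for the task-conditional hyperprior learner:

```python
import torch

def ensemble_epoch_logits(epoch_logits, log_weights):
    # epoch_logits: list of (batch, classes) predictions, one per training
    # epoch's model; log_weights: per-epoch mixing scores that a hyperprior
    # learner would generate conditioned on the task in the actual method.
    w = torch.softmax(log_weights, dim=0)            # normalise over epochs
    stacked = torch.stack(epoch_logits, dim=0)       # (epochs, batch, classes)
    return (w[:, None, None] * stacked).sum(dim=0)   # weighted ensemble

logits = [torch.randn(8, 5) for _ in range(3)]       # 3 epochs, 5-way task
scores = torch.zeros(3, requires_grad=True)          # meta-learned weights
pred = ensemble_epoch_logits(logits, scores)
print(pred.shape)                                    # torch.Size([8, 5])
```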

Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters

Title Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters
Authors Axel Barroso-Laguna, Edgar Riba, Daniel Ponsa, Krystian Mikolajczyk
Abstract We introduce a novel approach to the keypoint detection task that combines handcrafted and learned CNN filters within a shallow multi-scale architecture. Handcrafted filters provide anchor structures for learned filters, which localize, score, and rank repeatable features. A scale-space representation is used within the network to extract keypoints at different levels. We design a loss function to detect robust features that exist across a range of scales and to maximize the repeatability score. Our Key.Net model is trained on data synthetically created from ImageNet and evaluated on the HPatches benchmark. Results show that our approach outperforms state-of-the-art detectors in terms of repeatability, matching performance, and complexity.
Tasks Keypoint Detection
Published 2019-04-01
URL https://arxiv.org/abs/1904.00889v3
PDF https://arxiv.org/pdf/1904.00889v3.pdf
PWC https://paperswithcode.com/paper/keynet-keypoint-detection-by-handcrafted-and
Repo https://github.com/axelBarroso/Key.Net
Framework tf
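
As a hedged sketch of the handcrafted-plus-learned idea (the real network also uses second-order derivative combinations and a scale-space pyramid), here are fixed Sobel derivatives feeding a learned convolution in pytorch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HandcraftedPlusLearned(nn.Module):
    # Fixed derivative filters act as anchor structures; learned filters
    # refine them. Key.Net additionally feeds second-order combinations
    # (gx*gy, gx^2, ...) and runs the block over a multi-scale pyramid.
    def __init__(self, out_channels=8):
        super().__init__()
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        kernels = torch.stack([sobel_x, sobel_x.t()]).unsqueeze(1)  # (2,1,3,3)
        self.register_buffer("derivs", kernels)                    # not trained
        self.learned = nn.Conv2d(2, out_channels, 3, padding=1)    # trained

    def forward(self, gray):                        # gray: (B, 1, H, W)
        g = F.conv2d(gray, self.derivs, padding=1)  # handcrafted gx, gy
        return self.learned(g)                      # learned response maps

img = torch.randn(1, 1, 64, 64)
print(HandcraftedPlusLearned()(img).shape)          # torch.Size([1, 8, 64, 64])
```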

Synthesizing New Retinal Symptom Images by Multiple Generative Models

Title Synthesizing New Retinal Symptom Images by Multiple Generative Models
Authors Yi-Chieh Liu, Hao-Hsiang Yang, Chao-Han Huck Yang, Jia-Hong Huang, Meng Tian, Hiromasa Morikawa, Yi-Chang James Tsai, Jesper Tegner
Abstract Age-Related Macular Degeneration (AMD) is an asymptomatic retinal disease which may result in loss of vision. There is limited access to high-quality relevant retinal images and poor understanding of the features defining sub-classes of this disease. Motivated by recent advances in machine learning, we specifically explore the potential of generative modeling, using Generative Adversarial Networks (GANs) and style transfer, to facilitate clinical diagnosis and disease understanding by feature extraction. We design an analytic pipeline which first generates synthetic retinal images from clinical images; a subsequent verification step is applied. In the synthesizing step we merge GANs (DCGAN and WGAN architectures) and style transfer for image generation, whereas the verification step checks the accuracy of the generated images. We find that the generated images contain sufficient pathological details to facilitate ophthalmologists in disease classification and in the discovery of disease-relevant features. In particular, our system predicts the drusen and geographic atrophy sub-classes of AMD. Furthermore, classification performance using GAN-generated CFP images outperforms classification based on using only the original clinical dataset. Our results are evaluated using an existing retinal-disease classifier and class activation maps, supporting the predictive power of the synthetic images and their utility for feature extraction. Our code examples are available online.
Tasks Image Generation
Published 2019-02-11
URL http://arxiv.org/abs/1902.04147v1
PDF http://arxiv.org/pdf/1902.04147v1.pdf
PWC https://paperswithcode.com/paper/synthesizing-new-retinal-symptom-images-by
Repo https://github.com/huckiyang/EyeNet-GANs
Framework pytorch
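
The synthesizing step builds on standard GAN generators. For orientation only, here is a minimal DCGAN-style generator in pytorch; the paper's pipeline additionally applies style transfer and a verification classifier, neither of which is shown here:

```python
import torch
import torch.nn as nn

# Minimal DCGAN-style generator: latent vector -> 64x64 RGB image.
# Purely illustrative of the "synthesizing" step on retinal images.
G = nn.Sequential(
    nn.ConvTranspose2d(100, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),
)
z = torch.randn(4, 100, 1, 1)   # latent codes
print(G(z).shape)               # torch.Size([4, 3, 64, 64])
```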

Learning a Generative Model of Cancer Metastasis

Title Learning a Generative Model of Cancer Metastasis
Authors Benjamin Kompa, Beau Coker
Abstract We introduce a Unified Disentanglement Network (UFDN) trained on The Cancer Genome Atlas (TCGA). We demonstrate that the UFDN learns a biologically relevant latent space of gene expression data by applying our network to two classification tasks: cancer status and cancer type. Our UFDN-specific algorithms perform comparably to random forest methods. The UFDN allows for continuous, partial interpolation between distinct cancer types. Furthermore, we perform an analysis of differentially expressed genes between skin cutaneous melanoma (SKCM) samples and the same samples interpolated into glioblastoma (GBM). We demonstrate that our interpolations learn relevant metagenes that recapitulate known glioblastoma mechanisms and suggest possible starting points for investigations into the metastasis of SKCM into GBM.
Tasks
Published 2019-01-17
URL http://arxiv.org/abs/1901.06023v1
PDF http://arxiv.org/pdf/1901.06023v1.pdf
PWC https://paperswithcode.com/paper/learning-a-generative-model-of-cancer
Repo https://github.com/bkompa/UFDN-TCGA
Framework pytorch
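
The interpolation analysis is easy to picture: encode two expression profiles, blend the latent codes, decode. A hedged sketch with linear stand-ins for the UFDN's encoder and decoder (gene counts and dimensions are illustrative, and the actual UFDN is a VAE-like network with adversarial disentanglement):

```python
import torch
import torch.nn as nn

enc = nn.Linear(5000, 64)     # stand-in encoder: 5000 genes -> 64-d latent
dec = nn.Linear(64, 5000)     # stand-in decoder

skcm = torch.randn(1, 5000)   # a melanoma expression profile (illustrative)
gbm = torch.randn(1, 5000)    # a glioblastoma profile (illustrative)

z0, z1 = enc(skcm), enc(gbm)
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    # Partial interpolation between cancer types in latent space; the
    # paper analyses differentially expressed genes along this path.
    z = (1 - alpha) * z0 + alpha * z1
    profile = dec(z)
    print(alpha, profile.shape)
```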

The Indirect Convolution Algorithm

Title The Indirect Convolution Algorithm
Authors Marat Dukhan
Abstract Deep learning frameworks commonly implement convolution operators with GEMM-based algorithms. In these algorithms, convolution is implemented on top of matrix-matrix multiplication (GEMM) functions, provided by highly optimized BLAS libraries. Convolutions with 1x1 kernels can be directly represented as a GEMM call, but convolutions with larger kernels require a special memory layout transformation - im2col or im2row - to fit the GEMM interface. The Indirect Convolution algorithm provides the efficiency of the GEMM primitive without the overhead of the im2col transformation. In contrast to GEMM-based algorithms, Indirect Convolution does not reshuffle the data to fit the GEMM primitive but introduces an indirection buffer - a buffer of pointers to the start of each row of image pixels. This broadens the application of our modified GEMM function to convolutions with arbitrary kernel size, padding, stride, and dilation. The Indirect Convolution algorithm reduces memory overhead proportionally to the number of input channels and outperforms the GEMM-based algorithm by up to 62% on convolution parameters which involve im2col transformations in GEMM-based algorithms. This, however, comes at the cost of a minor performance reduction on 1x1 stride-1 convolutions.
Tasks
Published 2019-07-03
URL https://arxiv.org/abs/1907.02129v1
PDF https://arxiv.org/pdf/1907.02129v1.pdf
PWC https://paperswithcode.com/paper/the-indirect-convolution-algorithm
Repo https://github.com/google/XNNPACK
Framework tf
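
To see what the indirection buffer buys, here is a hedged numpy sketch that uses integer indices in place of raw pointers. Note that numpy's fancy indexing materializes the gathered patches, so this only illustrates the data flow; the actual algorithm dereferences the pointers inside the GEMM micro-kernel and avoids the copy entirely.

```python
import numpy as np

def indirect_conv2d(x, w, stride=1):
    # x: (H, W, C) input; w: (KH, KW, C, OC) weights. Padding omitted.
    # Build an "indirection buffer": for every output pixel and kernel tap,
    # the flat index of the input pixel to read. The real algorithm stores
    # raw pointers and reads through them during the GEMM.
    H, W, C = x.shape
    KH, KW, _, OC = w.shape
    OH, OW = (H - KH) // stride + 1, (W - KW) // stride + 1
    pixels = x.reshape(H * W, C)
    idx = np.empty((OH * OW, KH * KW), dtype=np.int64)
    for oy in range(OH):
        for ox in range(OW):
            idx[oy * OW + ox] = [(oy * stride + ky) * W + (ox * stride + kx)
                                 for ky in range(KH) for kx in range(KW)]
    patches = pixels[idx].reshape(OH * OW, KH * KW * C)   # gather via buffer
    return (patches @ w.reshape(KH * KW * C, OC)).reshape(OH, OW, OC)

x = np.random.randn(8, 8, 3)
w = np.random.randn(3, 3, 3, 4)
print(indirect_conv2d(x, w).shape)   # (6, 6, 4)
```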

A Hierarchical Grocery Store Image Dataset with Visual and Semantic Labels

Title A Hierarchical Grocery Store Image Dataset with Visual and Semantic Labels
Authors Marcus Klasson, Cheng Zhang, Hedvig Kjellström
Abstract Image classification models built into visual support systems and other assistive devices need to provide accurate predictions about their environment. We focus on an application of assistive technology for people with visual impairments, covering daily activities such as shopping or cooking. In this paper, we provide a new benchmark dataset for a challenging task in this application - classification of fruits, vegetables, and refrigerated products, e.g. milk packages and juice cartons, in grocery stores. To enable the learning process to utilize multiple sources of structured information, this dataset not only contains a large volume of natural images but also includes the corresponding product information from an online shopping website. Such information encompasses the hierarchical structure of the object classes, as well as an iconic image of each type of object. This dataset can be used to train and evaluate image classification models for helping visually impaired people in natural environments. Additionally, we provide benchmark results for pretrained convolutional neural networks often used for image understanding purposes, as well as for a multi-view variational autoencoder, which is capable of utilizing the rich product information in the dataset.
Tasks Image Classification
Published 2019-01-03
URL http://arxiv.org/abs/1901.00711v1
PDF http://arxiv.org/pdf/1901.00711v1.pdf
PWC https://paperswithcode.com/paper/a-hierarchical-grocery-store-image-dataset
Repo https://github.com/marcusklasson/GroceryStoreDataset
Framework none

Traffic4cast-Traffic Map Movie Forecasting – Team MIE-Lab

Title Traffic4cast-Traffic Map Movie Forecasting – Team MIE-Lab
Authors Henry Martin, Ye Hong, Dominik Bucher, Christian Rupprecht, René Buffat
Abstract The goal of the IARAI competition traffic4cast was to predict city-wide traffic status within a 15-minute time window, based on information from the previous hour. The traffic status was given as multi-channel images (one pixel roughly corresponds to 100x100 meters), where one channel indicated the traffic volume, another the average speed of vehicles, and a third their rough heading. As part of our work on the competition, we evaluated many different network architectures, analyzed the statistical properties of the given data in detail, and considered how to transform the problem so as to take additional spatio-temporal context information into account, such as the street network, the positions of traffic lights, or the weather. This document summarizes the efforts that led to our best submission, and gives some insights about which other approaches we evaluated and why they did not work as well as expected.
Tasks
Published 2019-10-27
URL https://arxiv.org/abs/1910.13824v2
PDF https://arxiv.org/pdf/1910.13824v2.pdf
PWC https://paperswithcode.com/paper/traffic4cast-traffic-map-movie-forecasting
Repo https://github.com/mie-lab/traffic4cast
Framework pytorch
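
For context on the data format, here is a hedged persistence baseline in numpy: repeat the last observed frame for each of the three requested future frames. The frame shape follows our recollection of the 2019 setup and should be treated as illustrative.

```python
import numpy as np

# Previous hour as 12 frames (5-minute bins) of (H, W, 3) images with
# volume/speed/heading channels; the task asks for the next 3 frames
# (15 minutes). Repeating the last frame is a strong naive baseline
# for this kind of short-horizon forecast.
history = np.random.randint(0, 256, size=(12, 495, 436, 3), dtype=np.uint8)
prediction = np.repeat(history[-1:], 3, axis=0)   # (3, H, W, 3)
print(prediction.shape)
```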

UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation

Title UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation
Authors Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, Jianming Liang
Abstract The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal depth is a priori unknown, requiring extensive architecture search or an inefficient ensemble of models of varying depths; and (2) their skip connections impose an unnecessarily restrictive fusion scheme, forcing aggregation only at the same-scale feature maps of the encoder and decoder sub-networks. To overcome these two limitations, we propose UNet++, a new neural architecture for semantic and instance segmentation, by (1) alleviating the unknown network depth with an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision; (2) redesigning skip connections to aggregate features of varying semantic scales at the decoder sub-networks, leading to a highly flexible feature fusion scheme; and (3) devising a pruning scheme to accelerate the inference speed of UNet++. We have evaluated UNet++ using six different medical image segmentation datasets, covering multiple imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and electron microscopy (EM), and demonstrating that (1) UNet++ consistently outperforms the baseline models for the task of semantic segmentation across different datasets and backbone architectures; (2) UNet++ enhances segmentation quality of varying-size objects – an improvement over the fixed-depth U-Net; (3) Mask RCNN++ (Mask R-CNN with UNet++ design) outperforms the original Mask R-CNN for the task of instance segmentation; and (4) pruned UNet++ models achieve significant speedup while showing only modest performance degradation. Our implementation and pre-trained models are available at https://github.com/MrGiovanni/UNetPlusPlus.
Tasks Computed Tomography (CT), Instance Segmentation, Medical Image Segmentation, Semantic Segmentation
Published 2019-12-11
URL https://arxiv.org/abs/1912.05074v2
PDF https://arxiv.org/pdf/1912.05074v2.pdf
PWC https://paperswithcode.com/paper/unet-redesigning-skip-connections-to-exploit
Repo https://github.com/MrGiovanni/UNetPlusPlus
Framework pytorch
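
The redesigned skip connection is the heart of the architecture: each nested node fuses all same-level predecessors with an upsampled deeper feature. A hedged pytorch sketch of a single such node (channel sizes illustrative, not the official implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipNode(nn.Module):
    # One nested node X^{i,j} of UNet++: concatenate the same-level
    # predecessors X^{i,0..j-1} with the upsampled deeper node
    # X^{i+1,j-1}, then fuse with a small conv block.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(),
        )

    def forward(self, same_level, deeper):
        up = F.interpolate(deeper, scale_factor=2, mode="bilinear",
                           align_corners=False)
        return self.fuse(torch.cat(same_level + [up], dim=1))

# X^{0,1} from X^{0,0} (32 channels) and upsampled X^{1,0} (64 channels).
node = SkipNode(in_ch=32 + 64, out_ch=32)
x00 = torch.randn(1, 32, 64, 64)
x10 = torch.randn(1, 64, 32, 32)
print(node([x00], x10).shape)    # torch.Size([1, 32, 64, 64])
```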

Insights into LSTM Fully Convolutional Networks for Time Series Classification

Title Insights into LSTM Fully Convolutional Networks for Time Series Classification
Authors Fazle Karim, Somshubra Majumdar, Houshang Darabi
Abstract Long Short Term Memory Fully Convolutional Neural Networks (LSTM-FCN) and Attention LSTM-FCN (ALSTM-FCN) have been shown to achieve state-of-the-art performance on the task of classifying time series signals on the old University of California-Riverside (UCR) time series repository. However, there has been no study of why LSTM-FCN and ALSTM-FCN perform well. In this paper, we perform a series of ablation tests (3627 experiments) on LSTM-FCN and ALSTM-FCN to provide a better understanding of the model and each of its sub-modules. Results from the ablation tests on ALSTM-FCN and LSTM-FCN show that the LSTM and FCN blocks perform better when applied in a conjoined manner. Two z-normalizing techniques, z-normalizing each sample independently and z-normalizing the whole dataset, are compared using a Wilcoxon signed-rank test to show a statistical difference in performance. In addition, we provide an understanding of the impact the dimension shuffle has on LSTM-FCN by comparing its performance with that of LSTM-FCN when no dimension shuffle is applied. Finally, we demonstrate the performance of the LSTM-FCN when the LSTM block is replaced by a GRU, a basic RNN, and a dense block.
Tasks Time Series, Time Series Classification
Published 2019-02-27
URL https://arxiv.org/abs/1902.10756v3
PDF https://arxiv.org/pdf/1902.10756v3.pdf
PWC https://paperswithcode.com/paper/insights-into-lstm-fully-convolutional
Repo https://github.com/titu1994/LSTM-FCN-Ablation
Framework tf
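
Since the dimension shuffle is one of the ablated components, here is a hedged pytorch sketch of an LSTM-FCN-style model making the shuffle explicit: the LSTM sees the whole univariate series as a single time step with T features, while the FCN branch convolves over time. The filter sizes follow the usual LSTM-FCN recipe, but the details are illustrative.

```python
import torch
import torch.nn as nn

class LSTMFCN(nn.Module):
    # Conjoined LSTM + FCN branches for univariate series of length T.
    # The "dimension shuffle" feeds the LSTM the series as one time step
    # with T features, rather than T steps of 1 feature.
    def __init__(self, t_len, n_classes, hidden=8):
        super().__init__()
        self.lstm = nn.LSTM(t_len, hidden, batch_first=True)
        self.fcn = nn.Sequential(
            nn.Conv1d(1, 128, 8, padding=4), nn.ReLU(),
            nn.Conv1d(128, 256, 5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                 # global average pooling
        )
        self.out = nn.Linear(hidden + 128, n_classes)

    def forward(self, x):                            # x: (B, 1, T)
        _, (h, _) = self.lstm(x)                     # dimension-shuffled input
        fcn = self.fcn(x).squeeze(-1)                # (B, 128)
        return self.out(torch.cat([h[-1], fcn], dim=1))

model = LSTMFCN(t_len=96, n_classes=5)
print(model(torch.randn(4, 1, 96)).shape)            # torch.Size([4, 5])
```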

Unsupervised Emergence of Egocentric Spatial Structure from Sensorimotor Prediction

Title Unsupervised Emergence of Egocentric Spatial Structure from Sensorimotor Prediction
Authors Alban Laflaquière, Michael Garcia Ortiz
Abstract Despite its omnipresence in robotics applications, the nature of spatial knowledge and the mechanisms that underlie its emergence in autonomous agents are still poorly understood. Recent theoretical work suggests that the Euclidean structure of space induces invariants in an agent’s raw sensorimotor experience. We hypothesize that capturing these invariants is beneficial for sensorimotor prediction and that, under certain exploratory conditions, a motor representation capturing the structure of the external space should emerge as a byproduct of learning to predict future sensory experiences. We propose a simple sensorimotor predictive scheme, apply it to different agents and types of exploration, and evaluate the pertinence of these hypotheses. We show that a naive agent can capture the topology and metric regularity of its sensor’s position in an egocentric spatial frame without any a priori knowledge or extraneous supervision.
Tasks
Published 2019-06-04
URL https://arxiv.org/abs/1906.01401v3
PDF https://arxiv.org/pdf/1906.01401v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-emergence-of-egocentric-spatial
Repo https://github.com/alaflaquiere/learn-spatial-structure
Framework tf
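
The predictive scheme itself is lightweight. Here is a hedged pytorch sketch of the general setup, with random tensors standing in for an agent's logged sensorimotor transitions (all dimensions are illustrative assumptions):

```python
import torch
import torch.nn as nn

# A motor encoder maps raw motor states into a small latent space, and a
# predictor maps (current sensation, latent motor command) to the future
# sensation. After training, the latent motor space can be inspected for
# the topology/metric structure of the sensor's external position.
motor_dim, sensor_dim, latent_dim = 6, 10, 3
encode_motor = nn.Sequential(nn.Linear(motor_dim, 32), nn.Tanh(),
                             nn.Linear(32, latent_dim))
predict = nn.Sequential(nn.Linear(sensor_dim + latent_dim, 64), nn.Tanh(),
                        nn.Linear(64, sensor_dim))

opt = torch.optim.Adam(list(encode_motor.parameters()) + list(predict.parameters()))
s_t = torch.randn(32, sensor_dim)      # current sensations (illustrative data)
m_t = torch.randn(32, motor_dim)       # motor commands
s_next = torch.randn(32, sensor_dim)   # observed future sensations

pred = predict(torch.cat([s_t, encode_motor(m_t)], dim=1))
loss = nn.functional.mse_loss(pred, s_next)
loss.backward()
opt.step()
```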