October 20, 2019

Paper Group AWR 243

Efficient Attention: Attention with Linear Complexities


Title	Efficient Attention: Attention with Linear Complexities
Authors	Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Hongsheng Li
Abstract	The attention mechanism has seen wide applications in computer vision and natural language processing. Recent works developed the dot-product attention mechanism and applied it to various vision and language tasks. However, the memory and computational costs of dot-product attention grow quadratically with the spatiotemporal size of the input. Such growth prohibits the application of the mechanism to large inputs, e.g., long sequences, high-resolution images, or large videos. To remedy this drawback, this paper proposes a novel efficient attention mechanism, which is equivalent to dot-product attention but has substantially lower memory and computational costs. The resource efficiency allows more widespread and flexible incorporation of efficient attention modules into a neural network, which leads to improved accuracies. Empirical evaluations on object recognition and image classification demonstrated these advantages. Models with efficient attention achieved state-of-the-art performance on MS-COCO 2017 and significant improvement on ImageNet. Further, the resource efficiency of the mechanism democratizes attention to complex models that were previously unable to incorporate the original dot-product attention due to prohibitively high costs. As an exemplar, an efficient attention-augmented model achieved state-of-the-art accuracies for stereo depth estimation on the Scene Flow dataset. Code is available at https://github.com/cmsflash/efficient-attention.
Tasks	Depth Estimation, Image Classification, Instance Segmentation, Object Detection, Object Recognition, Semantic Segmentation, Stereo Depth Estimation
Published	2018-12-04
URL	https://arxiv.org/abs/1812.01243v8
PDF	https://arxiv.org/pdf/1812.01243v8.pdf
PWC	https://paperswithcode.com/paper/factorized-attention-self-attention-with
Repo	https://github.com/cmsflash/efficient-attention
Framework	pytorch
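
The equivalence claim above rests on the associativity of matrix multiplication: computing (QKᵀ)V materializes an n × n map, while Q(KᵀV) only needs a d_k × d_v intermediate. The plain-Python sketch below assumes a simple scaling normalization (dividing by the sequence length n), under which the two orderings are exactly equal; the function names are our own, and any softmax-style normalization the paper may use is not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of an (n x m) matrix and an (m x p) matrix."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def scale(a: Matrix, c: float) -> Matrix:
    return [[c * x for x in row] for row in a]


def dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """((Q K^T) / n) V: materializes an n x n attention map (quadratic in n)."""
    n = len(q)
    attention_map = scale(matmul(q, transpose(k)), 1.0 / n)  # n x n
    return matmul(attention_map, v)


def efficient_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """(Q / n)(K^T V): the intermediate K^T V is d_k x d_v, independent of n."""
    n = len(q)
    context = matmul(transpose(k), v)  # d_k x d_v
    return matmul(scale(q, 1.0 / n), context)


if __name__ == "__main__":
    q = [[1.0, 0.5], [0.2, -1.0], [0.3, 0.3], [-0.7, 2.0]]
    k = [[0.1, 0.4], [1.0, -0.5], [0.0, 0.2], [0.6, 0.6]]
    v = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0], [2.0, 0.0, 1.0], [-1.0, 1.0, 1.0]]
    a = dot_product_attention(q, k, v)
    b = efficient_attention(q, k, v)
    print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

The printed difference is at floating-point rounding level; the saving comes from never building the n × n map.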

Estimating Depth from RGB and Sparse Sensing


Title	Estimating Depth from RGB and Sparse Sensing
Authors	Zhao Chen, Vijay Badrinarayanan, Gilad Drozdov, Andrew Rabinovich
Abstract	We present a deep model that can accurately produce dense depth maps given an RGB image with known depth at a very sparse set of pixels. The model works simultaneously for both indoor and outdoor scenes and produces state-of-the-art dense depth maps at nearly real-time speeds on both the NYUv2 and KITTI datasets. We surpass the state of the art for monocular depth estimation even with depth values for only 1 out of every ~10000 image pixels, and we outperform other sparse-to-dense depth methods at all sparsity levels. With depth values for 1/256 of the image pixels, we achieve a mean absolute error of less than 1% of actual depth on indoor scenes, comparable to the performance of consumer-grade depth sensor hardware. Our experiments demonstrate that it would indeed be possible to efficiently transform sparse depth measurements obtained using, e.g., lower-power depth sensors or SLAM systems into high-quality dense depth maps.
Tasks	Depth Estimation, Monocular Depth Estimation
Published	2018-04-08
URL	http://arxiv.org/abs/1804.02771v2
PDF	http://arxiv.org/pdf/1804.02771v2.pdf
PWC	https://paperswithcode.com/paper/estimating-depth-from-rgb-and-sparse-sensing
Repo	https://github.com/lakshjaisinghani/Estimating-Depth-from-RGB-and-Sparse-Sensing
Framework	pytorch
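
To make the sparsity levels concrete: keeping one pixel out of every 16 × 16 block gives depth at 1/256 of the pixels. The sketch below assumes a regular-grid sampling pattern and reads "mean absolute error of less than 1% of actual depth" as a mean absolute relative error; both the pattern and the helper names are our assumptions, and the network that densifies the sparse input is not shown.

```python
from typing import List, Tuple

Depth = List[List[float]]


def grid_sample(depth: Depth, stride: int = 16) -> Tuple[Depth, List[List[int]]]:
    """Keep depth at every `stride`-th pixel in both directions (1/stride**2 of pixels).

    Returns the sparse map (0.0 where depth is unknown) and a binary validity mask.
    """
    h, w = len(depth), len(depth[0])
    sparse = [[0.0] * w for _ in range(h)]
    mask = [[0] * w for _ in range(h)]
    for i in range(0, h, stride):
        for j in range(0, w, stride):
            sparse[i][j] = depth[i][j]
            mask[i][j] = 1
    return sparse, mask


def mean_abs_rel_error(pred: Depth, gt: Depth) -> float:
    """Mean of |pred - gt| / gt over pixels with positive ground-truth depth (needs at least one)."""
    errs = [abs(p - g) / g for pr, gr in zip(pred, gt) for p, g in zip(pr, gr) if g > 0]
    return sum(errs) / len(errs)


if __name__ == "__main__":
    depth = [[1.0 + 0.01 * (i + j) for j in range(64)] for i in range(64)]
    sparse, mask = grid_sample(depth)
    print(sum(map(sum, mask)), "of", 64 * 64, "pixels kept")
```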

Generating Diffusion MRI scalar maps from T1 weighted images using generative adversarial networks


Title	Generating Diffusion MRI scalar maps from T1 weighted images using generative adversarial networks
Authors	Xuan Gu, Hans Knutsson, Markus Nilsson, Anders Eklund
Abstract	Diffusion magnetic resonance imaging (diffusion MRI) is a non-invasive microstructure assessment technique. Scalar measures that quantify microstructural tissue properties, such as FA (fractional anisotropy) and MD (mean diffusivity), can be obtained using diffusion models and data processing pipelines. However, it is costly and time-consuming to collect high-quality diffusion data. Here, we therefore demonstrate how Generative Adversarial Networks (GANs) can be used to generate synthetic diffusion scalar measures from structural T1-weighted images in a single optimized step. Specifically, we train the popular CycleGAN model to learn to map a T1 image to FA or MD, and vice versa. As an application, we show that synthetic FA images can be used as a target for non-linear registration, to correct for geometric distortions common in diffusion MRI.
Tasks
Published	2018-10-05
URL	http://arxiv.org/abs/1810.02683v3
PDF	http://arxiv.org/pdf/1810.02683v3.pdf
PWC	https://paperswithcode.com/paper/generating-diffusion-mri-scalar-maps-from-t1
Repo	https://github.com/xuagu37/CycleGAN
Framework	tf
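
FA and MD, the two scalar maps the GAN learns to synthesize, are conventionally computed from the three eigenvalues of a fitted diffusion tensor. The sketch below uses the standard textbook formulas; it is background on what the maps contain, not part of the paper's CycleGAN pipeline, and the function name is our own.

```python
import math
from typing import Sequence, Tuple


def fa_md(eigenvalues: Sequence[float]) -> Tuple[float, float]:
    """Fractional anisotropy and mean diffusivity from the three diffusion-tensor eigenvalues."""
    l1, l2, l3 = eigenvalues
    md = (l1 + l2 + l3) / 3.0  # mean diffusivity: average of the eigenvalues
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # FA = sqrt(3/2) * ||lambda - MD|| / ||lambda||: 0 for isotropic, 1 for purely linear diffusion.
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return fa, md


if __name__ == "__main__":
    print(fa_md((1.7e-3, 0.3e-3, 0.3e-3)))  # elongated tensor, e.g. coherent fibres (units mm^2/s)
```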

Unsupervised Grammar Induction with Depth-bounded PCFG


Title	Unsupervised Grammar Induction with Depth-bounded PCFG
Authors	Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz
Abstract	There has been recent interest in applying cognitively or empirically motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016). This work extends this depth-bounding approach to probabilistic context-free grammar induction (DB-PCFG), which has a smaller parameter space than hierarchical sequence models, and therefore more fully exploits the space reductions of depth-bounding. Results for this model on grammar acquisition from transcribed child-directed speech and newswire text exceed or are competitive with those of other models when evaluated on parse accuracy. Moreover, grammars acquired from this model demonstrate a consistent use of category labels, something which has not been demonstrated by other acquisition models.
Tasks
Published	2018-02-23
URL	http://arxiv.org/abs/1802.08545v2
PDF	http://arxiv.org/pdf/1802.08545v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-grammar-induction-with-depth
Repo	https://github.com/lifengjin/db-pcfg
Framework	none
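
Depth-bounding limits how many center-embedded constituents a parse may stack. As a rough illustration only, the sketch below measures a simplified proxy on binary trees: along the worst root-to-leaf path, it counts internal nodes that are the left child of a right child (constituents with material on both sides). This is our simplification, not the paper's exact left-corner depth definition; the tuple tree encoding and the function names are hypothetical.

```python
from typing import Tuple, Union

# A tree is either a word (leaf) or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def embedding_depth(tree: Tree, is_right_child: bool = False) -> int:
    """Simplified center-embedding depth of a binary tree (1 = no center embedding)."""
    if isinstance(tree, str):
        return 1
    left, right = tree
    left_depth = embedding_depth(left, False)
    # An internal left child of a right child is center-embedded: add one level.
    if is_right_child and not isinstance(left, str):
        left_depth += 1
    right_depth = embedding_depth(right, True)
    return max(left_depth, right_depth)


def within_bound(tree: Tree, max_depth: int) -> bool:
    """Would a depth-bounded model with bound `max_depth` admit this tree (under this proxy)?"""
    return embedding_depth(tree) <= max_depth


if __name__ == "__main__":
    right_branching = ("a", ("b", ("c", "d")))
    center_embedded = ("a", (("b", "c"), "d"))
    print(embedding_depth(right_branching), embedding_depth(center_embedded))
```

Purely left- or right-branching trees stay at depth 1 under this proxy, which is why a small bound prunes relatively few plausible parses while shrinking the search space.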

Insights on representational similarity in neural networks with canonical correlation


Title	Insights on representational similarity in neural networks with canonical correlation
Authors	Ari S. Morcos, Maithra Raghu, Samy Bengio
Abstract	Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training. Here, we develop projection weighted CCA (Canonical Correlation Analysis) as a tool for understanding neural networks, building on SVCCA, a recently proposed method (Raghu et al., 2017). We first improve the core method, showing how to differentiate between signal and noise, and then apply this technique to compare across a group of CNNs, demonstrating that networks which generalize converge to more similar representations than networks which memorize, that wider networks converge to more similar solutions than narrow networks, and that trained networks with identical topology but different learning rates converge to distinct clusters with diverse representations. We also investigate the representational dynamics of RNNs, across both training and sequential timesteps, finding that RNNs converge in a bottom-up pattern over the course of training and that the hidden state is highly variable over the course of a sequence, even when accounting for linear transforms. Together, these results provide new insights into the function of CNNs and RNNs, and demonstrate the utility of using CCA to understand representations.
Tasks
Published	2018-06-14
URL	http://arxiv.org/abs/1806.05759v3
PDF	http://arxiv.org/pdf/1806.05759v3.pdf
PWC	https://paperswithcode.com/paper/insights-on-representational-similarity-in
Repo	https://github.com/moskomule/cca.pytorch
Framework	pytorch
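
Projection weighting replaces SVCCA's plain mean of canonical correlations with a weighted mean, so that canonical directions accounting for more of the original layer count more. The sketch below assumes the CCA step has already been run (canonical correlations and canonical variates are given) and uses the weighting α_i = Σ_j |⟨h_i, z_j⟩|, where h_i is the i-th canonical variate and z_j is neuron j's activation vector over the same datapoints. The function and argument names are illustrative, and the CCA computation itself is omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pwcca_distance(rhos: Sequence[float], variates: List[Vector], neurons: List[Vector]) -> float:
    """Projection-weighted CCA distance from precomputed CCA outputs.

    rhos[i] is the i-th canonical correlation and variates[i] the i-th canonical
    variate; each entry of `neurons` is one neuron's activations over the datapoints.
    """
    # How much of the original layer each canonical variate accounts for.
    alphas = [sum(abs(dot(h, z)) for z in neurons) for h in variates]
    total = sum(alphas)
    weights = [a / total for a in alphas]
    # 1 - weighted mean correlation: 0 means identical (up to linear transform) representations.
    return 1.0 - sum(w * r for w, r in zip(weights, rhos))


if __name__ == "__main__":
    print(pwcca_distance([0.9, 0.1], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[2.0, 0.0, 0.0]]))
```

Here the low-correlation direction is orthogonal to the only neuron, so it receives zero weight; an unweighted mean would instead report a distance of 0.5.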

Merge Double Thompson Sampling for Large Scale Online Ranker Evaluation


Title	Merge Double Thompson Sampling for Large Scale Online Ranker Evaluation
Authors	Chang Li, Ilya Markov, Maarten de Rijke, Masrour Zoghi
Abstract	Online ranker evaluation is one of the key challenges in information retrieval. While preferences between rankers can be inferred by interleaved comparison methods, the problem of choosing which pair of rankers generates the result list, without degrading the user experience too much, can be formalized as a K-armed dueling bandit problem: an online partial-information learning framework in which feedback comes in the form of pairwise preferences. A commercial search system may evaluate a large number of rankers concurrently, and scaling effectively in the presence of numerous rankers has not been fully studied. In this paper, we focus on solving the large-scale online ranker evaluation problem under the so-called Condorcet assumption, where there exists an optimal ranker that is preferred to all other rankers. We propose Merge Double Thompson Sampling (MergeDTS), which first utilizes a divide-and-conquer strategy that localizes the comparisons carried out by the algorithm to small batches of rankers, and then employs Thompson Sampling (TS) to reduce the comparisons between suboptimal rankers inside these small batches. The effectiveness (regret) and efficiency (time complexity) of MergeDTS are extensively evaluated using examples from the domain of online evaluation for web search. Our main finding is that for large-scale Condorcet ranker evaluation problems, MergeDTS outperforms the state-of-the-art dueling bandit algorithms.
Tasks	Information Retrieval, Online Ranker Evaluation
Published	2018-12-11
URL	http://arxiv.org/abs/1812.04412v1
PDF	http://arxiv.org/pdf/1812.04412v1.pdf
PWC	https://paperswithcode.com/paper/merge-double-thompson-sampling-for-large
Repo	https://github.com/chang-li/MergeDTS
Framework	none
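
The two ingredients named above, small batches of rankers and Thompson sampling inside each batch, can be sketched as follows. This is a simplified, hypothetical illustration: each pairwise preference probability gets a Beta posterior over interleaving wins and losses, the first ranker is the one that beats the most batch-mates under one posterior sample, and the second is its strongest sampled challenger. MergeDTS's elimination and batch-merging rules, and its exact selection criteria, are not reproduced.

```python
import random
from typing import List, Sequence, Tuple


def make_batches(rankers: Sequence[int], batch_size: int) -> List[List[int]]:
    """Divide-and-conquer step: split the rankers into small, disjoint batches."""
    return [list(rankers[i:i + batch_size]) for i in range(0, len(rankers), batch_size)]


def thompson_pair(batch: List[int], wins: List[List[int]], rng: random.Random) -> Tuple[int, int]:
    """Choose two rankers from one batch (needs at least two) by Thompson sampling.

    wins[a][b] counts how often ranker a beat ranker b in interleaved comparisons.
    """
    # One posterior sample of P(a beats b) for every unordered pair in the batch.
    theta = {}
    for a in batch:
        for b in batch:
            if a < b:
                p = rng.betavariate(wins[a][b] + 1, wins[b][a] + 1)
                theta[(a, b)], theta[(b, a)] = p, 1.0 - p

    def sampled_wins(a: int) -> int:
        return sum(theta[(a, b)] > 0.5 for b in batch if b != a)

    first = max(batch, key=sampled_wins)
    # Fresh sample of each opponent's chance of beating `first`; keep the strongest.
    second = max(
        (b for b in batch if b != first),
        key=lambda b: rng.betavariate(wins[b][first] + 1, wins[first][b] + 1),
    )
    return first, second


if __name__ == "__main__":
    rng = random.Random(0)
    wins = [[0] * 10 for _ in range(10)]
    for batch in make_batches(list(range(10)), 4):
        print(batch, thompson_pair(batch, wins, rng))
```

Because comparisons stay inside a batch, the per-step cost depends on the batch size rather than on the total number of rankers.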

Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data


Title	Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data
Authors	Jacson Rodrigues Correia-Silva, Rodrigo F. Berriel, Claudine Badue, Alberto F. de Souza, Thiago Oliveira-Santos
Abstract	In the past few years, Convolutional Neural Networks (CNNs) have been achieving state-of-the-art performance on a variety of problems. Many companies invest resources and money to generate these models and provide them as an API, so it is in their best interest to protect them, i.e., to prevent others from copying them. Recent studies revealed that state-of-the-art CNNs are vulnerable to adversarial example attacks, and this weakness indicates that CNNs do not need to operate in the problem domain (PD). Therefore, we hypothesize that they also do not need to be trained with examples of the PD in order to operate in it. Given these facts, in this paper, we investigate whether a target black-box CNN can be copied by persuading it to confess its knowledge through random non-labeled data. The copy is two-fold: i) the target network is queried with random data and its predictions are used to create a fake dataset with the knowledge of the network; and ii) a copycat network is trained with the fake dataset and should be able to achieve performance similar to that of the target network. This hypothesis was evaluated locally on three problems (facial expression, object, and crosswalk classification) and against a cloud-based API. In the copy attacks, images from both the non-problem domain and the PD were used. All copycat networks achieved at least 93.7% of the performance of the original models with non-problem-domain data, and at least 98.6% using additional data from the PD. Additionally, the copycat CNN successfully copied at least 97.3% of the performance of the Microsoft Azure Emotion API. Our results show that it is possible to create a copycat CNN by simply querying a target network as a black box with random non-labeled data.
Tasks
Published	2018-06-14
URL	http://arxiv.org/abs/1806.05476v1
PDF	http://arxiv.org/pdf/1806.05476v1.pdf
PWC	https://paperswithcode.com/paper/copycat-cnn-stealing-knowledge-by-persuading
Repo	https://github.com/jeiks/Stealing_DL_Models
Framework	caffe2
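
The two-step copy procedure can be illustrated with a deliberately tiny stand-in: a hidden linear rule plays the black-box target, uniformly random 2-D points play the random non-labeled queries, and a nearest-centroid classifier plays the copycat CNN. Everything here (target rule, query distribution, copycat model, function names) is a hypothetical toy that shows only the query-then-train structure, not the paper's networks or results.

```python
import random
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Classifier = Callable[[Point], int]


def target_api(x: Point) -> int:
    """Hidden rule standing in for the black-box target; we may only query it."""
    return 1 if x[0] - 0.5 * x[1] > 0 else 0


def steal_labels(api: Classifier, queries: List[Point]) -> List[Tuple[Point, int]]:
    """Step i): label random, unlabeled queries with the target's predictions."""
    return [(q, api(q)) for q in queries]


def train_copycat(fake_dataset: List[Tuple[Point, int]]) -> Classifier:
    """Step ii): fit a nearest-centroid copycat on the fake dataset."""
    sums: Dict[int, List[float]] = {}
    for (x0, x1), y in fake_dataset:
        s = sums.setdefault(y, [0.0, 0.0, 0.0])
        s[0] += x0
        s[1] += x1
        s[2] += 1
    centroids = {y: (s[0] / s[2], s[1] / s[2]) for y, s in sums.items()}

    def copycat(x: Point) -> int:
        return min(centroids, key=lambda y: (x[0] - centroids[y][0]) ** 2 + (x[1] - centroids[y][1]) ** 2)

    return copycat


def agreement(a: Classifier, b: Classifier, points: List[Point]) -> float:
    """Fraction of points on which the two classifiers give the same label."""
    return sum(a(p) == b(p) for p in points) / len(points)


if __name__ == "__main__":
    rng = random.Random(42)
    queries = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
    copycat = train_copycat(steal_labels(target_api, queries))
    held_out = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
    print(f"agreement with target: {agreement(copycat, target_api, held_out):.3f}")
```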

Attributed Network Embedding for Incomplete Attributed Networks


Title	Attributed Network Embedding for Incomplete Attributed Networks
Authors	Chengbin Hou, Shan He, Ke Tang
Abstract	Attributed networks are ubiquitous, since a network often comes with auxiliary attribute information, e.g., a social network with user profiles. Attributed Network Embedding (ANE), which aims to learn unified low-dimensional node embeddings while preserving both structural and attribute information, has recently attracted considerable attention. The resulting node embeddings can then facilitate various downstream network tasks, e.g., link prediction. Although there are several ANE methods, most of them cannot deal with incomplete attributed networks with missing links and/or missing node attributes, which often occur in real-world scenarios. To address this issue, we propose a robust ANE method whose general idea is to reconstruct a unified, denser network by fusing two sources of information for information enhancement, and then employ a random-walk-based network embedding method to learn node embeddings. Experiments on link prediction, node classification, visualization, and parameter sensitivity analysis on six real-world datasets validate the effectiveness of our method on incomplete attributed networks.
Tasks	Link Prediction, Network Embedding, Node Classification
Published	2018-11-28
URL	https://arxiv.org/abs/1811.11728v2
PDF	https://arxiv.org/pdf/1811.11728v2.pdf
PWC	https://paperswithcode.com/paper/attributed-network-embedding-for-incomplete
Repo	https://github.com/houchengbin/OpenANE
Framework	tf
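
The two-stage idea above (densify the network by fusing structure with attributes, then learn embeddings from random walks) can be sketched as follows. The fusion rule here, linking each node to its k most attribute-similar nodes by cosine similarity, is a hypothetical choice for illustration rather than the paper's exact reconstruction step, and the skip-gram training that would consume the walks is omitted.

```python
import math
import random
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def fuse(edges: Dict[int, Set[int]], attrs: Dict[int, Sequence[float]], k: int = 1) -> Dict[int, Set[int]]:
    """Densify the structure: link each node to its k most attribute-similar nodes (undirected)."""
    fused = {u: set(nbrs) for u, nbrs in edges.items()}
    for u in attrs:
        others = sorted((v for v in attrs if v != u), key=lambda v: cosine(attrs[u], attrs[v]), reverse=True)
        for v in others[:k]:
            fused.setdefault(u, set()).add(v)
            fused.setdefault(v, set()).add(u)
    return fused


def random_walks(graph: Dict[int, Set[int]], walks_per_node: int, walk_length: int,
                 rng: random.Random) -> List[List[int]]:
    """Uniform random walks over the fused graph (input to a skip-gram model, not shown)."""
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(graph):
            walk = [start]
            while len(walk) < walk_length and graph.get(walk[-1]):
                walk.append(rng.choice(sorted(graph[walk[-1]])))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    # Nodes 2 and 3 have lost all their links but still carry attributes.
    edges = {0: {1}, 1: {0}, 2: set(), 3: set()}
    attrs = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.1, 0.9]}
    print(random_walks(fuse(edges, attrs), 1, 4, random.Random(0)))
```

Without fusion, walks started from the isolated nodes would stop immediately and those nodes would get no useful context.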

Tell Me Where to Look: Guided Attention Inference Network


Title	Tell Me Where to Look: Guided Attention Inference Network
Authors	Kunpeng Li, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, Yun Fu
Abstract	Weakly supervised learning with only coarse labels can obtain visual explanations of deep neural networks, such as attention maps, by back-propagating gradients. These attention maps are then available as priors for tasks such as object localization and semantic segmentation. In one common framework, we address three shortcomings of previous approaches to modeling such attention maps: we (1) for the first time make attention maps an explicit and natural component of end-to-end training, (2) provide self-guidance directly on these maps by exploring supervision from the network itself to improve them, and (3) seamlessly bridge the gap between using weak supervision and extra supervision if available. Despite its simplicity, experiments on the semantic segmentation task demonstrate the effectiveness of our methods. We clearly surpass the state of the art on the Pascal VOC 2012 val and test sets. Besides, the proposed framework provides a way not only to explain the focus of the learner but also to feed back direct guidance towards specific tasks. Under mild assumptions, our method can also be understood as a plug-in to existing weakly supervised learners to improve their generalization performance.
Tasks	Object Localization, Semantic Segmentation
Published	2018-02-27
URL	http://arxiv.org/abs/1802.10171v1
PDF	http://arxiv.org/pdf/1802.10171v1.pdf
PWC	https://paperswithcode.com/paper/tell-me-where-to-look-guided-attention
Repo	https://github.com/AustinDoolittle/Pytorch-Gain
Framework	pytorch
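
One way to provide self-guidance on an attention map is to erase the regions the network currently attends to and penalize any class evidence that remains in the erased image. The sketch below shows only that erasing step, using a sigmoid soft threshold T(A) = 1 / (1 + exp(-ω(A − σ))) and I* = I − T(A) ⊙ I. This form follows our reading of the GAIN paper and should be checked against it; the default σ and ω values are arbitrary, the function names are ours, and the classifier and loss are not shown.

```python
import math
from typing import List

Grid = List[List[float]]


def soft_mask(attention: Grid, sigma: float = 0.5, omega: float = 10.0) -> Grid:
    """Sigmoid soft threshold of an attention map: near 1 where attention > sigma, near 0 below."""
    return [[1.0 / (1.0 + math.exp(-omega * (a - sigma))) for a in row] for row in attention]


def erase_attended(image: Grid, attention: Grid, sigma: float = 0.5, omega: float = 10.0) -> Grid:
    """I* = I - T(A) * I (elementwise): remove what the network currently looks at."""
    t = soft_mask(attention, sigma, omega)
    return [[px - m * px for px, m in zip(irow, trow)] for irow, trow in zip(image, t)]


if __name__ == "__main__":
    image = [[1.0, 1.0], [1.0, 1.0]]
    attention = [[1.0, 0.0], [0.0, 0.5]]
    print(erase_attended(image, attention))
```

If the attention map covers the whole object, a classifier should find little evidence for the class in the erased image; a high score there signals that the map missed part of the object.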

Dimensionality-Driven Learning with Noisy Labels


Title	Dimensionality-Driven Learning with Noisy Labels
Authors	Xingjun Ma, Yisen Wang, Michael E. Houle, Shuo Zhou, Sarah M. Erfani, Shu-Tao Xia, Sudanthi Wijewickrema, James Bailey
Abstract	Datasets with significant proportions of noisy (incorrect) class labels present challenges for training accurate Deep Neural Networks (DNNs). We propose a new perspective for understanding DNN generalization for such datasets, by investigating the dimensionality of the deep representation subspace of training samples. We show that from a dimensionality perspective, DNNs exhibit quite distinctive learning styles when trained with clean labels versus when trained with a proportion of noisy labels. Based on this finding, we develop a new dimensionality-driven learning strategy, which monitors the dimensionality of subspaces during training and adapts the loss function accordingly. We empirically demonstrate that our approach is highly tolerant to significant proportions of noisy labels, and can effectively learn low-dimensional local subspaces that capture the data distribution.
Tasks
Published	2018-06-07
URL	http://arxiv.org/abs/1806.02612v2
PDF	http://arxiv.org/pdf/1806.02612v2.pdf
PWC	https://paperswithcode.com/paper/dimensionality-driven-learning-with-noisy
Repo	https://github.com/ansuini/IntrinsicDimDeep
Framework	pytorch
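
The dimensionality of the representation subspace can be estimated locally around each sample. As an illustrative assumption, the sketch below uses the standard maximum-likelihood estimator of local intrinsic dimensionality (LID) from a point's k nearest-neighbour distances, LID = −((1/k) Σ_i log(r_i / r_k))^(−1), where r_k is the largest of the k distances. The helper names are ours, and how the paper aggregates such estimates and adapts its loss is not reproduced.

```python
import math
from typing import List, Sequence


def knn_distances(x: Sequence[float], batch: Sequence[Sequence[float]], k: int) -> List[float]:
    """Distances from x to its k nearest points in the batch (zero distances are skipped)."""
    dists = sorted(d for d in (math.dist(x, y) for y in batch) if d > 0)
    return dists[:k]


def lid_mle(distances: Sequence[float]) -> float:
    """MLE of local intrinsic dimensionality from k nearest-neighbour distances.

    Distances must be positive and not all equal (otherwise the log-sum is zero).
    """
    r = sorted(distances)
    r_k = r[-1]
    log_sum = sum(math.log(ri / r_k) for ri in r)  # <= 0, since every r_i <= r_k
    return -len(r) / log_sum


if __name__ == "__main__":
    # Points on a straight line embedded in 3-D.
    pts = [(float(t), 2.0 * t, 0.0) for t in range(21)]
    print(lid_mle(knn_distances(pts[10], pts, 10)))
```

With few neighbours and tied distances the estimate is noisy, so in practice it is averaged over many samples.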

RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition


Title	RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition
Authors	Albert Zeyer, Tamer Alkhouli, Hermann Ney
Abstract	We compare the training and decoding speed of RETURNN attention models for translation, which is fast thanks to CUDA LSTM kernels and a fast pure TensorFlow beam search decoder. We show that a layer-wise pretraining scheme for recurrent attention models gives an absolute improvement of over 1% BLEU and allows deeper recurrent encoder networks to be trained. Promising preliminary results on maximum expected BLEU training are presented. We are able to train state-of-the-art models for translation and end-to-end models for speech recognition, and we show results on WMT 2017 and Switchboard. The flexibility of RETURNN allows a fast research feedback loop for experimenting with alternative architectures, and its generality allows it to be used in a wide range of applications.
Tasks	Speech Recognition
Published	2018-05-14
URL	http://arxiv.org/abs/1805.05225v2
PDF	http://arxiv.org/pdf/1805.05225v2.pdf
PWC	https://paperswithcode.com/paper/returnn-as-a-generic-flexible-neural-toolkit
Repo	https://github.com/rwth-i6/returnn
Framework	tf

Cycle Consistent Adversarial Denoising Network for Multiphase Coronary CT Angiography


Title	Cycle Consistent Adversarial Denoising Network for Multiphase Coronary CT Angiography
Authors	Eunhee Kang, Hyun Jung Koo, Dong Hyun Yang, Joon Bum Seo, Jong Chul Ye
Abstract	In coronary CT angiography, a series of CT images is taken at different levels of radiation dose during the examination. Although this reduces the total radiation dose, the image quality during the low-dose phases is significantly degraded. To address this problem, we propose a novel semi-supervised learning technique that can remove noise from the CT images obtained in the low-dose phases by learning from the CT images in the routine-dose phases. Although a supervised learning approach is not possible due to differences in the underlying heart structure between the two phases, the images in the two phases are closely related, so we propose a cycle-consistent adversarial denoising network to learn the non-degenerate mapping between the low- and high-dose cardiac phases. Experimental results showed that the proposed method effectively reduces the noise in the low-dose CT images while preserving detailed texture and edge information. Moreover, thanks to the cycle-consistency and identity losses, the proposed network does not create any artificial features that are not present in the input images. Visual grading and quality evaluation also confirm that the proposed method provides significant improvement in diagnostic quality.
Tasks	Denoising
Published	2018-06-26
URL	http://arxiv.org/abs/1806.09748v3
PDF	http://arxiv.org/pdf/1806.09748v3.pdf
PWC	https://paperswithcode.com/paper/cycle-consistent-adversarial-denoising
Repo	https://github.com/hyeongyuy/CT-CYCLE_IDNETITY_GAN_tensorflow
Framework	tf
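
The cycle-consistency and identity terms that the abstract credits with preventing artificial features can be written down directly. In the sketch below, images are flattened to lists of floats, g maps low-dose to routine-dose images and f maps back, and the λ weights are placeholder values of our choosing. The adversarial (discriminator) terms of the full objective are omitted, and the function names are ours.

```python
from typing import Callable, List

Image = List[float]
Mapping = Callable[[Image], Image]


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_and_identity_loss(g: Mapping, f: Mapping, x_low: Image, y_routine: Image,
                            lam_cyc: float = 10.0, lam_id: float = 5.0) -> float:
    """Cycle-consistency plus identity terms for generators g (low -> routine) and f (routine -> low)."""
    # Cycle consistency: translating to the other domain and back should recover the input.
    cyc = l1(f(g(x_low)), x_low) + l1(g(f(y_routine)), y_routine)
    # Identity: an image already in a generator's output domain should pass through unchanged.
    idt = l1(g(y_routine), y_routine) + l1(f(x_low), x_low)
    return lam_cyc * cyc + lam_id * idt


if __name__ == "__main__":
    brighten = lambda v: [p + 1.0 for p in v]
    darken = lambda v: [p - 1.0 for p in v]
    # Perfectly invertible but not identity-preserving: only the identity term is nonzero.
    print(cycle_and_identity_loss(brighten, darken, [0.5, 1.5], [1.0, 2.0]))
```

The example shows why both terms are used: a pair of generators can be perfectly cycle-consistent while still altering every image, and only the identity term penalizes that.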

Exploring Embedding Methods in Binary Hyperdimensional Computing: A Case Study for Motor-Imagery based Brain-Computer Interfaces


Title	Exploring Embedding Methods in Binary Hyperdimensional Computing: A Case Study for Motor-Imagery based Brain-Computer Interfaces
Authors	Michael Hersche, José del R. Millán, Luca Benini, Abbas Rahimi
Abstract	Key properties of brain-inspired hyperdimensional (HD) computing make it a prime candidate for energy-efficient and fast learning in biosignal processing. The main challenge, however, is to formulate embedding methods that map biosignal measures to a binary HD space. In this paper, we explore a variety of such embedding methods and examine them with a challenging application: a motor imagery brain-computer interface (MI-BCI) based on electroencephalography (EEG) recordings. We explore embedding methods including random projections, quantization-based thermometer and Gray coding, and learning HD representations using end-to-end training. All these methods, differing in complexity, aim to represent EEG signals in binary HD space, e.g., with 10,000 bits. This leads to the development of a set of HD learning and classification methods that can be selectively chosen (or configured) based on the accuracy and/or computational complexity requirements of a given task. We compare them with a state-of-the-art linear support vector machine (SVM) on an NVIDIA TX2 board using the 4-class BCI competition IV-2a dataset as well as a new 3-class dataset. Compared to SVM, results on the 3-class dataset show that simple thermometer embedding achieves moderate average accuracy (79.56% vs. 82.67%) with 26.8× faster training time and 22.3× lower energy; on the other hand, switching to end-to-end training with learned HD representations wipes out these training benefits while boosting the accuracy to 84.22% (1.55% higher than SVM). A similar trend is observed on the 4-class dataset, where SVM achieves on average 74.29%: the thermometer embedding achieves 89.9× faster training time and 58.7× lower energy, but a lower accuracy (67.09%) than the learned representation (72.54%).
Tasks	EEG, Quantization
Published	2018-12-13
URL	http://arxiv.org/abs/1812.05705v2
PDF	http://arxiv.org/pdf/1812.05705v2.pdf
PWC	https://paperswithcode.com/paper/exploring-embedding-methods-in-binary
Repo	https://github.com/MHersche/HDembedding-BCI
Framework	pytorch
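
Thermometer coding, one of the embeddings listed above, quantizes a feature value to one of L levels and maps it to a binary vector whose leading bits are set in proportion to the level, so Hamming distance grows with the difference between levels. The sketch below assumes a uniform quantizer over a known value range; the paper's exact quantizer and how per-channel vectors are combined into one hypervector are not reproduced, and the function names are ours.

```python
from typing import List


def thermometer(value: float, lo: float, hi: float, levels: int, dim: int) -> List[int]:
    """Thermometer code: quantize value to a level q in [0, levels - 1], then set the
    first q * dim // (levels - 1) bits to 1 and the rest to 0 (needs levels >= 2, hi > lo)."""
    v = min(max(value, lo), hi)  # clip to the known range
    q = round((v - lo) / (hi - lo) * (levels - 1))
    ones = q * dim // (levels - 1)
    return [1] * ones + [0] * (dim - ones)


def hamming(a: List[int], b: List[int]) -> int:
    """Number of differing bits; for thermometer codes it grows with the level gap."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    codes = [thermometer(v, 0.0, 1.0, levels=5, dim=10000) for v in (0.0, 0.25, 0.5, 1.0)]
    print([hamming(codes[0], c) for c in codes])
```

The appeal for low-power hardware is that encoding needs only a comparison and bit setting, with no learned parameters.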

IVUS-Net: An Intravascular Ultrasound Segmentation Network


Title	IVUS-Net: An Intravascular Ultrasound Segmentation Network
Authors	Ji Yang, Lin Tong, Mehdi Faraji, Anup Basu
Abstract	IntraVascular UltraSound (IVUS) is one of the most effective imaging modalities that assist experts in diagnosing and treating cardiovascular diseases. We address a central problem in IVUS image analysis with a Fully Convolutional Network (FCN): automatically delineating the lumen and media-adventitia borders in IVUS images, which is crucial for shortening the diagnosis process and benefits faster and more accurate 3D reconstruction of the artery. In particular, we propose an FCN architecture, called IVUS-Net, followed by a post-processing contour extraction step, to automatically segment the interior (lumen) and exterior (media-adventitia) regions of human arteries. We evaluated IVUS-Net on the test set of a standard, publicly available dataset containing 326 IVUS B-mode images, using two metrics: Jaccard Measure (JM) and Hausdorff Distance (HD). The evaluation shows that IVUS-Net outperforms the state-of-the-art lumen and media segmentation methods by 4% to 20% in terms of HD. IVUS-Net performs well on test images that contain major artifacts such as bifurcations, shadows, and side branches, which are not common in the training set. Furthermore, using a modern GPU, IVUS-Net segments each IVUS frame in only 0.15 seconds. To the best of our knowledge, the proposed work is the first deep learning based method for segmentation of both the lumen and the media vessel walls in 20 MHz IVUS B-mode images that achieves the best results without any manual intervention. Code is available at https://github.com/Kulbear/ivus-segmentation-icsm2018
Tasks	3D Reconstruction
Published	2018-06-10
URL	http://arxiv.org/abs/1806.03583v2
PDF	http://arxiv.org/pdf/1806.03583v2.pdf
PWC	https://paperswithcode.com/paper/ivus-net-an-intravascular-ultrasound
Repo	https://github.com/Kulbear/ivus-segmentation-icsm2018
Framework	tf
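
The two evaluation metrics named above have standard definitions. The Jaccard measure is the intersection-over-union of the predicted and reference regions, and the Hausdorff distance is the largest distance from a point on one contour to the nearest point on the other. The sketch below computes both on pixel coordinates (function names are ours); any conversion to millimetres that a benchmark may require is left out.

```python
import math
from typing import Sequence, Set, Tuple

Pixel = Tuple[int, int]


def jaccard(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Jaccard measure |A ∩ B| / |A ∪ B| between two segmented regions (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def hausdorff(a: Sequence[Pixel], b: Sequence[Pixel]) -> float:
    """Symmetric Hausdorff distance between two contours given as point lists."""
    def directed(p: Sequence[Pixel], q: Sequence[Pixel]) -> float:
        # Worst case, over points in p, of the distance to the closest point in q.
        return max(min(math.dist(x, y) for y in q) for x in p)

    return max(directed(a, b), directed(b, a))


if __name__ == "__main__":
    predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
    reference = {(1, 0), (1, 1), (2, 0), (2, 1)}
    print(jaccard(predicted, reference), hausdorff(sorted(predicted), sorted(reference)))
```

The two metrics are complementary: JM rewards overall overlap, while HD is dominated by the single worst boundary error, which is why a method can improve HD substantially without a large change in JM.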

VFunc: a Deep Generative Model for Functions


Title	VFunc: a Deep Generative Model for Functions
Authors	Philip Bachman, Riashat Islam, Alessandro Sordoni, Zafarali Ahmed
Abstract	We introduce a deep generative model for functions. Our model provides a joint distribution p(f, z) over functions f and latent variables z which lets us efficiently sample from the marginal p(f) and maximize a variational lower bound on the entropy H(f). We can thus maximize objectives of the form E_{f~p(f)}[R(f)] + c*H(f), where R(f) denotes, e.g., a data log-likelihood term or an expected reward. Such objectives encompass Bayesian deep learning in function space, rather than parameter space, and Bayesian deep RL with representations of uncertainty that offer benefits over bootstrapping and parameter noise. In this short paper we describe our model, situate it in the context of prior work, and present proof-of-concept experiments for regression and RL.
Tasks
Published	2018-07-11
URL	http://arxiv.org/abs/1807.04106v1
PDF	http://arxiv.org/pdf/1807.04106v1.pdf
PWC	https://paperswithcode.com/paper/vfunc-a-deep-generative-model-for-functions
Repo	https://github.com/zafarali/emdp
Framework	none