Paper Group AWR 243
Efficient Attention: Attention with Linear Complexities
Title | Efficient Attention: Attention with Linear Complexities |
Authors | Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Hongsheng Li |
Abstract | The attention mechanism has seen wide applications in computer vision and natural language processing. Recent works developed the dot-product attention mechanism and applied it to various vision and language tasks. However, the memory and computational costs of dot-product attention grow quadratically with the spatiotemporal size of the input. Such growth prohibits the application of the mechanism to large inputs, e.g., long sequences, high-resolution images, or large videos. To remedy this drawback, this paper proposes a novel efficient attention mechanism, which is equivalent to dot-product attention but has substantially lower memory and computational costs. The resource efficiency allows more widespread and flexible incorporation of efficient attention modules into a neural network, which leads to improved accuracies. Empirical evaluations on object recognition and image classification demonstrated its effectiveness. Models with efficient attention achieved state-of-the-art performance on MS-COCO 2017 and significant improvement on ImageNet. Further, the resource efficiency of the mechanism democratizes attention to complicated models, which were unable to incorporate original dot-product attention due to prohibitively high costs. As an exemplar, an efficient attention-augmented model achieved state-of-the-art accuracies for stereo depth estimation on the Scene Flow dataset. Code is available at https://github.com/cmsflash/efficient-attention. |
Tasks | Depth Estimation, Image Classification, Instance Segmentation, Object Detection, Object Recognition, Semantic Segmentation, Stereo Depth Estimation |
Published | 2018-12-04 |
URL | https://arxiv.org/abs/1812.01243v8 |
PDF | https://arxiv.org/pdf/1812.01243v8.pdf |
PWC | https://paperswithcode.com/paper/factorized-attention-self-attention-with |
Repo | https://github.com/cmsflash/efficient-attention |
Framework | pytorch |
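The linear-cost claim in the abstract above rests on the associativity of matrix multiplication. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes the queries and keys have already been normalised (the normalisation itself is not described in the abstract), and the helper names `quadratic_attention` and `linear_attention` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def quadratic_attention(q, k, v):
    # (Q K^T) V: materialises an n x n matrix, so memory grows with n^2.
    return matmul(matmul(q, transpose(k)), v)


def linear_attention(q, k, v):
    # Q (K^T V): materialises only a d_k x d_v matrix, independent of n.
    return matmul(q, matmul(transpose(k), v))


# Toy input: n = 4 positions, d_k = 2 key channels, d_v = 3 value channels.
# The values are made up and stand in for already-normalised features.
q = [[0.1, 0.9], [0.5, 0.5], [0.8, 0.2], [0.3, 0.7]]
k = [[0.2, 0.1], [0.3, 0.4], [0.1, 0.3], [0.4, 0.2]]
v = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 1.0, 0.0], [1.0, 2.0, 1.0]]

slow = quadratic_attention(q, k, v)
fast = linear_attention(q, k, v)
max_gap = max(abs(a - b) for ra, rb in zip(slow, fast) for a, b in zip(ra, rb))
```

Both orderings give the same 4 x 3 output up to floating-point rounding; only the size of the intermediate matrix differs.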
Estimating Depth from RGB and Sparse Sensing
Title | Estimating Depth from RGB and Sparse Sensing |
Authors | Zhao Chen, Vijay Badrinarayanan, Gilad Drozdov, Andrew Rabinovich |
Abstract | We present a deep model that can accurately produce dense depth maps given an RGB image with known depth at a very sparse set of pixels. The model works simultaneously for both indoor/outdoor scenes and produces state-of-the-art dense depth maps at nearly real-time speeds on both the NYUv2 and KITTI datasets. We surpass the state-of-the-art for monocular depth estimation even with depth values for only 1 out of every ~10000 image pixels, and we outperform other sparse-to-dense depth methods at all sparsity levels. With depth values for 1/256 of the image pixels, we achieve a mean absolute error of less than 1% of actual depth on indoor scenes, comparable to the performance of consumer-grade depth sensor hardware. Our experiments demonstrate that it would indeed be possible to efficiently transform sparse depth measurements obtained using e.g. lower-power depth sensors or SLAM systems into high-quality dense depth maps. |
Tasks | Depth Estimation, Monocular Depth Estimation |
Published | 2018-04-08 |
URL | http://arxiv.org/abs/1804.02771v2 |
PDF | http://arxiv.org/pdf/1804.02771v2.pdf |
PWC | https://paperswithcode.com/paper/estimating-depth-from-rgb-and-sparse-sensing |
Repo | https://github.com/lakshjaisinghani/Estimating-Depth-from-RGB-and-Sparse-Sensing |
Framework | pytorch |
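To make the sparsity figure and the error measure in the abstract above concrete, here is a small sketch. It assumes a regular sampling grid and a per-pixel relative error; the paper's actual sampling pattern and metric definition may differ, and the function names are hypothetical.

```python
def sparse_mask(height, width, stride=16):
    """Mark one pixel per stride x stride cell, i.e. 1/stride^2 of the image."""
    return [[(r % stride == 0 and c % stride == 0) for c in range(width)]
            for r in range(height)]


def mean_abs_rel_error(pred, truth):
    """Mean of |pred - truth| / truth over pixels with positive true depth."""
    errs = [abs(p - t) / t
            for pred_row, truth_row in zip(pred, truth)
            for p, t in zip(pred_row, truth_row) if t > 0]
    return sum(errs) / len(errs)


# A 64 x 64 image sampled every 16 pixels keeps 16 of 4096 pixels, i.e. 1/256.
mask = sparse_mask(64, 64)
fraction = sum(map(sum, mask)) / (64 * 64)

# Made-up depths in metres: every prediction is off by 1% of the true depth.
truth = [[2.0, 4.0], [5.0, 10.0]]
pred = [[2.02, 3.96], [5.05, 9.9]]
error = mean_abs_rel_error(pred, truth)
```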
Generating Diffusion MRI scalar maps from T1 weighted images using generative adversarial networks
Title | Generating Diffusion MRI scalar maps from T1 weighted images using generative adversarial networks |
Authors | Xuan Gu, Hans Knutsson, Markus Nilsson, Anders Eklund |
Abstract | Diffusion magnetic resonance imaging (diffusion MRI) is a non-invasive microstructure assessment technique. Scalar measures, such as FA (fractional anisotropy) and MD (mean diffusivity), quantifying micro-structural tissue properties can be obtained using diffusion models and data processing pipelines. However, it is costly and time consuming to collect high quality diffusion data. Here, we therefore demonstrate how Generative Adversarial Networks (GANs) can be used to generate synthetic diffusion scalar measures from structural T1-weighted images in a single optimized step. Specifically, we train the popular CycleGAN model to learn to map a T1 image to FA or MD, and vice versa. As an application, we show that synthetic FA images can be used as a target for non-linear registration, to correct for geometric distortions common in diffusion MRI. |
Tasks | |
Published | 2018-10-05 |
URL | http://arxiv.org/abs/1810.02683v3 |
PDF | http://arxiv.org/pdf/1810.02683v3.pdf |
PWC | https://paperswithcode.com/paper/generating-diffusion-mri-scalar-maps-from-t1 |
Repo | https://github.com/xuagu37/CycleGAN |
Framework | tf |
Unsupervised Grammar Induction with Depth-bounded PCFG
Title | Unsupervised Grammar Induction with Depth-bounded PCFG |
Authors | Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz |
Abstract | There has been recent interest in applying cognitively or empirically motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016). This work extends this depth-bounding approach to probabilistic context-free grammar induction (DB-PCFG), which has a smaller parameter space than hierarchical sequence models, and therefore more fully exploits the space reductions of depth-bounding. Results for this model on grammar acquisition from transcribed child-directed speech and newswire text exceed or are competitive with those of other models when evaluated on parse accuracy. Moreover, grammars acquired from this model demonstrate a consistent use of category labels, something which has not been demonstrated by other acquisition models. |
Tasks | |
Published | 2018-02-23 |
URL | http://arxiv.org/abs/1802.08545v2 |
PDF | http://arxiv.org/pdf/1802.08545v2.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-grammar-induction-with-depth |
Repo | https://github.com/lifengjin/db-pcfg |
Framework | none |
Insights on representational similarity in neural networks with canonical correlation
Title | Insights on representational similarity in neural networks with canonical correlation |
Authors | Ari S. Morcos, Maithra Raghu, Samy Bengio |
Abstract | Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training. Here, we develop projection weighted CCA (Canonical Correlation Analysis) as a tool for understanding neural networks, building off of SVCCA, a recently proposed method (Raghu et al., 2017). We first improve the core method, showing how to differentiate between signal and noise, and then apply this technique to compare across a group of CNNs, demonstrating that networks which generalize converge to more similar representations than networks which memorize, that wider networks converge to more similar solutions than narrow networks, and that trained networks with identical topology but different learning rates converge to distinct clusters with diverse representations. We also investigate the representational dynamics of RNNs, across both training and sequential timesteps, finding that RNNs converge in a bottom-up pattern over the course of training and that the hidden state is highly variable over the course of a sequence, even when accounting for linear transforms. Together, these results provide new insights into the function of CNNs and RNNs, and demonstrate the utility of using CCA to understand representations. |
Tasks | |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05759v3 |
PDF | http://arxiv.org/pdf/1806.05759v3.pdf |
PWC | https://paperswithcode.com/paper/insights-on-representational-similarity-in |
Repo | https://github.com/moskomule/cca.pytorch |
Framework | pytorch |
Merge Double Thompson Sampling for Large Scale Online Ranker Evaluation
Title | Merge Double Thompson Sampling for Large Scale Online Ranker Evaluation |
Authors | Chang Li, Ilya Markov, Maarten de Rijke, Masrour Zoghi |
Abstract | Online ranker evaluation is one of the key challenges in information retrieval. While the preferences of rankers can be inferred by interleaved comparison methods, the problem of choosing which pair of rankers should generate the result list without degrading the user experience too much can be formalized as a K-armed dueling bandit problem, which is an online partial-information learning framework, where feedback comes in the form of pair-wise preferences. A commercial search system may evaluate a large number of rankers concurrently, and scaling effectively in the presence of numerous rankers has not been fully studied. In this paper, we focus on solving the large-scale online ranker evaluation problem under the so-called Condorcet assumption, where there exists an optimal ranker that is preferred to all other rankers. We propose Merge Double Thompson Sampling (MergeDTS), which first utilizes a divide-and-conquer strategy that localizes the comparisons carried out by the algorithm to small batches of rankers, and then employs Thompson Sampling (TS) to reduce the comparisons between suboptimal rankers inside these small batches. The effectiveness (regret) and efficiency (time complexity) of MergeDTS are extensively evaluated using examples from the domain of online evaluation for web search. Our main finding is that for large-scale Condorcet ranker evaluation problems MergeDTS outperforms the state-of-the-art dueling bandit algorithms. |
Tasks | Information Retrieval, Online Ranker Evaluation |
Published | 2018-12-11 |
URL | http://arxiv.org/abs/1812.04412v1 |
PDF | http://arxiv.org/pdf/1812.04412v1.pdf |
PWC | https://paperswithcode.com/paper/merge-double-thompson-sampling-for-large |
Repo | https://github.com/chang-li/MergeDTS |
Framework | none |
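The Thompson Sampling step mentioned in the abstract above can be illustrated with a simplified dueling-bandit loop. This is a sketch under stated assumptions, not MergeDTS: it omits the divide-and-conquer batching entirely, the selection rule is a simplification, and `sample_pair`, `simulate`, and the preference matrix are hypothetical.

```python
import random


def sample_pair(wins, rng):
    """Pick two rankers to compare, using Beta samples over pairwise wins."""
    k = len(wins)
    # Sample a plausible preference matrix from a Beta posterior per pair.
    theta = [[0.5] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            theta[i][j] = rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)
            theta[j][i] = 1.0 - theta[i][j]
    # First ranker: the one that beats the most others under this sample.
    first = max(range(k),
                key=lambda i: sum(theta[i][j] > 0.5 for j in range(k) if j != i))

    # Second ranker: resample each opponent's chance of beating `first`.
    def challenge(j):
        return rng.betavariate(wins[j][first] + 1, wins[first][j] + 1)

    second = max((j for j in range(k) if j != first), key=challenge)
    return first, second


def simulate(true_pref, rounds, seed=0):
    """Run comparisons; true_pref[a][b] is the chance that a beats b."""
    rng = random.Random(seed)
    k = len(true_pref)
    wins = [[0] * k for _ in range(k)]
    for _ in range(rounds):
        a, b = sample_pair(wins, rng)
        if rng.random() < true_pref[a][b]:
            wins[a][b] += 1
        else:
            wins[b][a] += 1
    return wins


# Made-up preferences in which ranker 0 is the Condorcet winner.
true_pref = [[0.5, 0.6, 0.7], [0.4, 0.5, 0.6], [0.3, 0.4, 0.5]]
wins = simulate(true_pref, rounds=500)
```

Each round records exactly one pairwise outcome, so the entries of `wins` sum to the number of rounds.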
Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data
Title | Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data |
Authors | Jacson Rodrigues Correia-Silva, Rodrigo F. Berriel, Claudine Badue, Alberto F. de Souza, Thiago Oliveira-Santos |
Abstract | In the past few years, Convolutional Neural Networks (CNNs) have been achieving state-of-the-art performance on a variety of problems. Many companies employ resources and money to generate these models and provide them as an API; therefore, it is in their best interest to protect them, i.e., to prevent others from copying them. Recent studies revealed that state-of-the-art CNNs are vulnerable to adversarial example attacks, and this weakness indicates that CNNs do not need to operate in the problem domain (PD). Therefore, we hypothesize that they also do not need to be trained with examples of the PD in order to operate in it. Given these facts, in this paper, we investigate if a target black-box CNN can be copied by persuading it to confess its knowledge through random non-labeled data. The copy is two-fold: i) the target network is queried with random data and its predictions are used to create a fake dataset with the knowledge of the network; and ii) a copycat network is trained with the fake dataset and should be able to achieve similar performance as the target network. This hypothesis was evaluated locally in three problems (facial expression, object, and crosswalk classification) and against a cloud-based API. In the copy attacks, images from both non-problem domain and PD were used. All copycat networks achieved at least 93.7% of the performance of the original models with non-problem domain data, and at least 98.6% using additional data from the PD. Additionally, the copycat CNN successfully copied at least 97.3% of the performance of the Microsoft Azure Emotion API. Our results show that it is possible to create a copycat CNN by simply querying a target network as black-box with random non-labeled data. |
Tasks | |
Published | 2018-06-14 |
URL | http://arxiv.org/abs/1806.05476v1 |
PDF | http://arxiv.org/pdf/1806.05476v1.pdf |
PWC | https://paperswithcode.com/paper/copycat-cnn-stealing-knowledge-by-persuading |
Repo | https://github.com/jeiks/Stealing_DL_Models |
Framework | caffe2 |
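The two-step copy procedure in the abstract above can be sketched on a toy problem. Everything here is an illustrative assumption: the black box is a hand-written rule over 2D points rather than a CNN, the copy is a 1-nearest-neighbour lookup rather than a trained network, and all names are hypothetical.

```python
import random


def target_model(x, y):
    """Stand-in black box: the attacker sees only the returned label."""
    return 1 if x + y > 1.0 else 0


def build_fake_dataset(query, n, rng):
    """Step i: label random inputs with the target's own predictions."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [(p, query(*p)) for p in points]


def copycat_predict(dataset, x, y):
    """Step ii stand-in: a 1-nearest-neighbour copy built from stolen labels."""
    _, label = min(dataset,
                   key=lambda item: (item[0][0] - x) ** 2 + (item[0][1] - y) ** 2)
    return label


rng = random.Random(0)
fake = build_fake_dataset(target_model, 500, rng)

# Fraction of fresh inputs on which the copy agrees with the target.
test_points = [(rng.random(), rng.random()) for _ in range(200)]
agreement = sum(copycat_predict(fake, x, y) == target_model(x, y)
                for x, y in test_points) / len(test_points)
```

The attacker never reads the rule inside `target_model`; the copy is built only from query results, which mirrors the black-box setting described in the abstract.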
Attributed Network Embedding for Incomplete Attributed Networks
Title | Attributed Network Embedding for Incomplete Attributed Networks |
Authors | Chengbin Hou, Shan He, Ke Tang |
Abstract | Attributed networks are ubiquitous since a network often comes with auxiliary attribute information, e.g., a social network with user profiles. Attributed Network Embedding (ANE), which aims to learn unified low-dimensional node embeddings while preserving both structural and attribute information, has recently attracted considerable attention. The resulting node embeddings can then facilitate various downstream network tasks, e.g., link prediction. Although there are several ANE methods, most of them cannot deal with incomplete attributed networks with missing links and/or missing node attributes, which often occur in real-world scenarios. To address this issue, we propose a robust ANE method, the general idea of which is to reconstruct a unified denser network by fusing two sources of information for information enhancement, and then employ a random-walk-based network embedding method for learning node embeddings. The experiments of link prediction, node classification, visualization, and parameter sensitivity analysis on six real-world datasets validate the effectiveness of our method on incomplete attributed networks. |
Tasks | Link Prediction, Network Embedding, Node Classification |
Published | 2018-11-28 |
URL | https://arxiv.org/abs/1811.11728v2 |
PDF | https://arxiv.org/pdf/1811.11728v2.pdf |
PWC | https://paperswithcode.com/paper/attributed-network-embedding-for-incomplete |
Repo | https://github.com/houchengbin/OpenANE |
Framework | tf |
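The abstract above relies on random walks over a network to produce training sequences for node embeddings. The sketch below shows generic uniform random walks only; the abstract does not say which random-walk method the authors use, the fusion step that builds the denser network is omitted, and the graph and function name are hypothetical.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Made-up undirected graph stored as adjacency lists.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(graph, walk_length=5, walks_per_node=2)
```

The resulting node sequences would typically be fed to a skip-gram style model, in the same way that sentences are used to train word embeddings.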
Tell Me Where to Look: Guided Attention Inference Network
Title | Tell Me Where to Look: Guided Attention Inference Network |
Authors | Kunpeng Li, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, Yun Fu |
Abstract | Weakly supervised learning with only coarse labels can obtain visual explanations of deep neural networks, such as attention maps, by back-propagating gradients. These attention maps are then available as priors for tasks such as object localization and semantic segmentation. In one common framework we address three shortcomings of previous approaches in modeling such attention maps: We (1) for the first time make attention maps an explicit and natural component of the end-to-end training, (2) provide self-guidance directly on these maps by exploring supervision from the network itself to improve them, and (3) seamlessly bridge the gap between using weak and extra supervision if available. Despite its simplicity, experiments on the semantic segmentation task demonstrate the effectiveness of our methods. We clearly surpass the state-of-the-art on Pascal VOC 2012 val. and test set. Besides, the proposed framework provides a way not only to explain the focus of the learner but also to feed back direct guidance towards specific tasks. Under mild assumptions our method can also be understood as a plug-in to existing weakly supervised learners to improve their generalization performance. |
Tasks | Object Localization, Semantic Segmentation |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.10171v1 |
PDF | http://arxiv.org/pdf/1802.10171v1.pdf |
PWC | https://paperswithcode.com/paper/tell-me-where-to-look-guided-attention |
Repo | https://github.com/AustinDoolittle/Pytorch-Gain |
Framework | pytorch |
Dimensionality-Driven Learning with Noisy Labels
Title | Dimensionality-Driven Learning with Noisy Labels |
Authors | Xingjun Ma, Yisen Wang, Michael E. Houle, Shuo Zhou, Sarah M. Erfani, Shu-Tao Xia, Sudanthi Wijewickrema, James Bailey |
Abstract | Datasets with significant proportions of noisy (incorrect) class labels present challenges for training accurate Deep Neural Networks (DNNs). We propose a new perspective for understanding DNN generalization for such datasets, by investigating the dimensionality of the deep representation subspace of training samples. We show that from a dimensionality perspective, DNNs exhibit quite distinctive learning styles when trained with clean labels versus when trained with a proportion of noisy labels. Based on this finding, we develop a new dimensionality-driven learning strategy, which monitors the dimensionality of subspaces during training and adapts the loss function accordingly. We empirically demonstrate that our approach is highly tolerant to significant proportions of noisy labels, and can effectively learn low-dimensional local subspaces that capture the data distribution. |
Tasks | |
Published | 2018-06-07 |
URL | http://arxiv.org/abs/1806.02612v2 |
PDF | http://arxiv.org/pdf/1806.02612v2.pdf |
PWC | https://paperswithcode.com/paper/dimensionality-driven-learning-with-noisy |
Repo | https://github.com/ansuini/IntrinsicDimDeep |
Framework | pytorch |
RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition
Title | RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition |
Authors | Albert Zeyer, Tamer Alkhouli, Hermann Ney |
Abstract | We compare the training and decoding speed of RETURNN for attention models in translation, which is fast due to fast CUDA LSTM kernels and a fast pure TensorFlow beam search decoder. We show that a layer-wise pretraining scheme for recurrent attention models gives over 1% absolute BLEU improvement and allows training deeper recurrent encoder networks. Promising preliminary results on max. expected BLEU training are presented. We are able to train state-of-the-art models for translation and end-to-end models for speech recognition and show results on WMT 2017 and Switchboard. The flexibility of RETURNN allows a fast research feedback loop to experiment with alternative architectures, and its generality allows it to be used on a wide range of applications. |
Tasks | Speech Recognition |
Published | 2018-05-14 |
URL | http://arxiv.org/abs/1805.05225v2 |
PDF | http://arxiv.org/pdf/1805.05225v2.pdf |
PWC | https://paperswithcode.com/paper/returnn-as-a-generic-flexible-neural-toolkit |
Repo | https://github.com/rwth-i6/returnn |
Framework | tf |
Cycle Consistent Adversarial Denoising Network for Multiphase Coronary CT Angiography
Title | Cycle Consistent Adversarial Denoising Network for Multiphase Coronary CT Angiography |
Authors | Eunhee Kang, Hyun Jung Koo, Dong Hyun Yang, Joon Bum Seo, Jong Chul Ye |
Abstract | In coronary CT angiography, a series of CT images are taken at different levels of radiation dose during the examination. Although this reduces the total radiation dose, the image quality during the low-dose phases is significantly degraded. To address this problem, here we propose a novel semi-supervised learning technique that can remove noise from the CT images obtained in the low-dose phases by learning from the CT images in the routine dose phases. Although a supervised learning approach is not possible due to the differences in the underlying heart structure in the two phases, the images in the two phases are closely related, so we propose a cycle-consistent adversarial denoising network to learn the non-degenerate mapping between the low and high dose cardiac phases. Experimental results showed that the proposed method effectively reduces the noise in the low-dose CT image while preserving the detailed texture and edge information. Moreover, thanks to the cyclic consistency and identity loss, the proposed network does not create any artificial features that are not present in the input images. Visual grading and quality evaluation also confirm that the proposed method provides significant improvement in diagnostic quality. |
Tasks | Denoising |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.09748v3 |
PDF | http://arxiv.org/pdf/1806.09748v3.pdf |
PWC | https://paperswithcode.com/paper/cycle-consistent-adversarial-denoising |
Repo | https://github.com/hyeongyuy/CT-CYCLE_IDNETITY_GAN_tensorflow |
Framework | tf |
Exploring Embedding Methods in Binary Hyperdimensional Computing: A Case Study for Motor-Imagery based Brain-Computer Interfaces
Title | Exploring Embedding Methods in Binary Hyperdimensional Computing: A Case Study for Motor-Imagery based Brain-Computer Interfaces |
Authors | Michael Hersche, José del R. Millán, Luca Benini, Abbas Rahimi |
Abstract | Key properties of brain-inspired hyperdimensional (HD) computing make it a prime candidate for energy-efficient and fast learning in biosignal processing. The main challenge, however, is to formulate embedding methods that map biosignal measures to a binary HD space. In this paper, we explore a variety of such embedding methods and examine them with a challenging application of motor imagery brain-computer interface (MI-BCI) from electroencephalography (EEG) recordings. We explore embedding methods including random projections, quantization based thermometer and Gray coding, and learning HD representations using end-to-end training. All these methods, differing in complexity, aim to represent EEG signals in binary HD space, e.g. with 10,000 bits. This leads to the development of a set of HD learning and classification methods that can be selectively chosen (or configured) based on accuracy and/or computational complexity requirements of a given task. We compare them with a state-of-the-art linear support vector machine (SVM) on an NVIDIA TX2 board using the 4-class BCI competition IV-2a dataset as well as a new 3-class dataset. Compared to SVM, results on the 3-class dataset show that simple thermometer embedding achieves moderate average accuracy (79.56% vs. 82.67%) with 26.8$\times$ faster training time and 22.3$\times$ lower energy; on the other hand, switching to end-to-end training with learned HD representations wipes out these training benefits while boosting the accuracy to 84.22% (1.55% higher than SVM). A similar trend is observed on the 4-class dataset where SVM achieves on average 74.29%: the thermometer embedding achieves 89.9$\times$ faster training time and 58.7$\times$ lower energy, but a lower accuracy (67.09%) than the learned representation of 72.54%. |
Tasks | EEG, Quantization |
Published | 2018-12-13 |
URL | http://arxiv.org/abs/1812.05705v2 |
PDF | http://arxiv.org/pdf/1812.05705v2.pdf |
PWC | https://paperswithcode.com/paper/exploring-embedding-methods-in-binary |
Repo | https://github.com/MHersche/HDembedding-BCI |
Framework | pytorch |
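Thermometer and Gray coding, named in the abstract above, are both standard ways to turn a quantised value into bits. The sketch below shows the two codes on a handful of levels; how the paper expands such codes into 10,000-bit HD vectors is not described in the abstract and is not attempted here. The function names and the uniform quantiser are assumptions.

```python
def quantize(value, low, high, levels):
    """Map a value in [low, high] to an integer level in [0, levels - 1]."""
    clipped = min(max(value, low), high)
    return min(int((clipped - low) / (high - low) * levels), levels - 1)


def thermometer_code(level, levels):
    """Level q becomes q ones followed by zeros, using levels - 1 bits."""
    return [1] * level + [0] * (levels - 1 - level)


def gray_code(level, bits):
    """Reflected binary Gray code: adjacent levels differ in exactly one bit."""
    g = level ^ (level >> 1)
    return [(g >> i) & 1 for i in reversed(range(bits))]


level = quantize(0.5, low=0.0, high=1.0, levels=4)  # level 2 of 0..3
thermo = thermometer_code(level, levels=4)          # [1, 1, 0]
gray = gray_code(level, bits=2)                     # [1, 1]
```

In a thermometer code the Hamming distance between two codes equals the difference between their levels, while a Gray code is more compact and only guarantees that neighbouring levels differ by one bit.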
IVUS-Net: An Intravascular Ultrasound Segmentation Network
Title | IVUS-Net: An Intravascular Ultrasound Segmentation Network |
Authors | Ji Yang, Lin Tong, Mehdi Faraji, Anup Basu |
Abstract | IntraVascular UltraSound (IVUS) is one of the most effective imaging modalities that provides assistance to experts in order to diagnose and treat cardiovascular diseases. We address a central problem in IVUS image analysis with a Fully Convolutional Network (FCN): automatically delineating the lumen and media-adventitia borders in IVUS images, which is crucial for shortening the diagnosis process and benefits a faster and more accurate 3D reconstruction of the artery. In particular, we propose an FCN architecture, called IVUS-Net, followed by a post-processing contour extraction step, in order to automatically segment the interior (lumen) and exterior (media-adventitia) regions of the human arteries. We evaluated our IVUS-Net on the test set of a standard publicly available dataset containing 326 IVUS B-mode images with two measurements, namely Jaccard Measure (JM) and Hausdorff Distance (HD). The evaluation result shows that IVUS-Net outperforms the state-of-the-art lumen and media segmentation methods by 4% to 20% in terms of HD. IVUS-Net performs well on images in the test set that contain a significant amount of major artifacts such as bifurcations, shadows, and side branches that are not common in the training set. Furthermore, using a modern GPU, IVUS-Net segments each IVUS frame in only 0.15 seconds. The proposed work, to the best of our knowledge, is the first deep learning based method for segmentation of both the lumen and the media vessel walls in 20 MHz IVUS B-mode images that achieves the best results without any manual intervention. Code is available at https://github.com/Kulbear/ivus-segmentation-icsm2018 |
Tasks | 3D Reconstruction |
Published | 2018-06-10 |
URL | http://arxiv.org/abs/1806.03583v2 |
PDF | http://arxiv.org/pdf/1806.03583v2.pdf |
PWC | https://paperswithcode.com/paper/ivus-net-an-intravascular-ultrasound |
Repo | https://github.com/Kulbear/ivus-segmentation-icsm2018 |
Framework | tf |
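The two evaluation measures named in the abstract above, Jaccard Measure and Hausdorff Distance, have standard textbook definitions, sketched here on tiny pixel sets. This assumes regions are given as sets of pixel coordinates and distances are in pixels; the paper's exact evaluation protocol (for example, computing HD on extracted contours or in physical units) may differ.

```python
import math


def jaccard(mask_a, mask_b):
    """Intersection over union of two sets of pixel coordinates."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 1.0


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(points_a, points_b), directed(points_b, points_a))


# Made-up 2 x 2 regions that overlap in one column.
pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth = {(0, 1), (1, 1), (0, 2), (1, 2)}

jm = jaccard(pred, truth)    # 2 shared pixels out of 6 in the union
hd = hausdorff(pred, truth)  # worst-case nearest-neighbour gap is 1 pixel
```

A higher Jaccard value means better overlap, whereas a lower Hausdorff distance means the worst boundary disagreement is smaller.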
VFunc: a Deep Generative Model for Functions
Title | VFunc: a Deep Generative Model for Functions |
Authors | Philip Bachman, Riashat Islam, Alessandro Sordoni, Zafarali Ahmed |
Abstract | We introduce a deep generative model for functions. Our model provides a joint distribution p(f, z) over functions f and latent variables z which lets us efficiently sample from the marginal p(f) and maximize a variational lower bound on the entropy H(f). We can thus maximize objectives of the form E_{f~p(f)}[R(f)] + c*H(f), where R(f) denotes, e.g., a data log-likelihood term or an expected reward. Such objectives encompass Bayesian deep learning in function space, rather than parameter space, and Bayesian deep RL with representations of uncertainty that offer benefits over bootstrapping and parameter noise. In this short paper we describe our model, situate it in the context of prior work, and present proof-of-concept experiments for regression and RL. |
Tasks | |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.04106v1 |
PDF | http://arxiv.org/pdf/1807.04106v1.pdf |
PWC | https://paperswithcode.com/paper/vfunc-a-deep-generative-model-for-functions |
Repo | https://github.com/zafarali/emdp |
Framework | none |