February 1, 2020

3272 words 16 mins read

Paper Group AWR 350

Edge-labeling Graph Neural Network for Few-shot Learning

Title Edge-labeling Graph Neural Network for Few-shot Learning
Authors Jongmin Kim, Taesup Kim, Sungwoong Kim, Chang D. Yoo
Abstract In this paper, we propose a novel edge-labeling graph neural network (EGNN), which adapts a deep neural network to an edge-labeling graph, for few-shot learning. Previous graph neural network (GNN) approaches to few-shot learning have been based on the node-labeling framework, which implicitly models intra-cluster similarity and inter-cluster dissimilarity. In contrast, the proposed EGNN learns to predict edge labels rather than node labels on the graph, which lets an explicit clustering evolve by iteratively updating the edge labels with direct exploitation of both intra-cluster similarity and inter-cluster dissimilarity. It is also well suited to varying numbers of classes without retraining, and can easily be extended to transductive inference. The parameters of the EGNN are learned by episodic training with an edge-labeling loss to obtain a model that generalizes well to unseen low-data problems. On both supervised and semi-supervised few-shot image classification tasks with two benchmark datasets, the proposed EGNN significantly improves performance over existing GNNs.
Tasks Few-Shot Image Classification, Few-Shot Learning, Image Classification
Published 2019-05-04
URL https://arxiv.org/abs/1905.01436v1
PDF https://arxiv.org/pdf/1905.01436v1.pdf
PWC https://paperswithcode.com/paper/edge-labeling-graph-neural-network-for-few
Repo https://github.com/khy0809/fewshot-egnn
Framework pytorch
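A minimal numpy sketch of the alternating update idea behind EGNN: edge labels are computed from pairwise node-feature (dis)similarity, and node features are then re-aggregated under those edge labels, iteratively. The exponential similarity metric and the simple averaging are illustrative stand-ins for the paper's learned metric and parameterized updates.

```python
import numpy as np

def edge_update(nodes, eps=1e-8):
    # Edge labels from pairwise node-feature distance (the paper learns this metric).
    d = np.linalg.norm(nodes[:, None, :] - nodes[None, :, :], axis=-1)
    sim = np.exp(-d)                                  # intra-cluster similarity
    return sim / (sim.sum(axis=1, keepdims=True) + eps)

def node_update(nodes, edges):
    # Aggregate neighbor features weighted by the current edge labels.
    return edges @ nodes

rng = np.random.default_rng(0)
nodes = rng.normal(size=(5, 8))                       # 5 support/query nodes, 8-dim features
for _ in range(3):                                    # iterative edge/node refinement
    edges = edge_update(nodes)
    nodes = node_update(nodes, edges)
```

Because the clustering lives in the edge labels rather than in node labels, the same machinery works for any number of classes per episode.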

CubiCasa5K: A Dataset and an Improved Multi-Task Model for Floorplan Image Analysis

Title CubiCasa5K: A Dataset and an Improved Multi-Task Model for Floorplan Image Analysis
Authors Ahti Kalervo, Juha Ylioinas, Markus Häikiö, Antti Karhu, Juho Kannala
Abstract Better understanding and modelling of building interiors and the emergence of more impressive AR/VR technology have brought up the need for automatic parsing of floorplan images. However, there is a clear lack of representative datasets to investigate the problem further. To address this shortcoming, this paper presents CubiCasa5K, a large-scale floorplan image dataset containing 5000 samples annotated with over 80 floorplan object categories. The annotations are dense and versatile, using polygons to separate the different objects. Diverging from classical approaches based on strong heuristics and low-level pixel operations, we present a method relying on an improved multi-task convolutional neural network. By releasing the novel dataset and our implementations, this study significantly boosts research on automatic floorplan image analysis by providing a richer set of tools for investigating the problem in a more comprehensive manner.
Tasks
Published 2019-04-03
URL http://arxiv.org/abs/1904.01920v1
PDF http://arxiv.org/pdf/1904.01920v1.pdf
PWC https://paperswithcode.com/paper/cubicasa5k-a-dataset-and-an-improved-multi
Repo https://github.com/CubiCasa/CubiCasa5k
Framework pytorch

CFSNet: Toward a Controllable Feature Space for Image Restoration

Title CFSNet: Toward a Controllable Feature Space for Image Restoration
Authors Wei Wang, Ruiming Guo, Yapeng Tian, Wenming Yang
Abstract Deep learning methods have achieved great progress in image restoration as measured by specific metrics (e.g., PSNR, SSIM). However, the perceptual quality of a restored image is relatively subjective, and users need to control the reconstruction according to personal preferences or image characteristics, which existing deterministic networks cannot do. This motivates us to design a unified interactive framework for general image restoration tasks. Under this framework, users can control a continuous transition between different objectives, e.g., the perception-distortion trade-off of image super-resolution, or the trade-off between noise reduction and detail preservation. We achieve this goal by controlling the latent features of the designed network. Specifically, our proposed framework, named Controllable Feature Space Network (CFSNet), couples two branches trained for different objectives. The framework adaptively learns the coupling coefficients of different layers and channels, which provides finer control over restored image quality. Experiments on several typical image restoration tasks fully validate the benefits of the proposed method. Code is available at https://github.com/qibao77/CFSNet.
Tasks Image Restoration, Image Super-Resolution, Super-Resolution
Published 2019-04-01
URL https://arxiv.org/abs/1904.00634v2
PDF https://arxiv.org/pdf/1904.00634v2.pdf
PWC https://paperswithcode.com/paper/cfsnet-toward-a-controllable-feature-space
Repo https://github.com/qibao77/CFSNet
Framework pytorch
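The core control mechanism can be sketched as interpolating the feature maps of the two objective-specific branches. CFSNet learns per-layer, per-channel coupling coefficients; the scalar `alpha` below is a simplified stand-in that a user would turn like a knob.

```python
import numpy as np

def couple(feat_main, feat_tune, alpha):
    # Continuous transition between two branches trained for different
    # objectives (e.g., distortion vs. perception). CFSNet learns these
    # coefficients per layer and channel; here a single scalar for clarity.
    return alpha * feat_main + (1.0 - alpha) * feat_tune

f_distortion = np.ones((4, 4))    # stand-in: branch optimized for low distortion
f_perception = np.zeros((4, 4))   # stand-in: branch optimized for perceptual quality
out = couple(f_distortion, f_perception, alpha=0.3)
```

Sliding `alpha` from 0 to 1 traces the perception-distortion trade-off continuously at test time, with no retraining.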

Breast Cancer Diagnosis with Transfer Learning and Global Pooling

Title Breast Cancer Diagnosis with Transfer Learning and Global Pooling
Authors Sara Hosseinzadeh Kassani, Peyman Hosseinzadeh Kassani, Michal J. Wesolowski, Kevin A. Schneider, Ralph Deters
Abstract Breast cancer is one of the most common causes of cancer-related death in women worldwide. Early and accurate diagnosis of breast cancer may significantly increase the survival rate of patients. In this study, we aim to develop a fully automatic, deep learning-based method using descriptor features extracted by Deep Convolutional Neural Network (DCNN) models and a pooling operation for the classification of hematoxylin and eosin stain (H&E) histological breast cancer images, provided as part of the International Conference on Image Analysis and Recognition (ICIAR) 2018 Grand Challenge on BreAst Cancer Histology (BACH) Images. Different data augmentation methods are applied to optimize the DCNN performance. We also investigate the efficacy of different stain normalization methods as a pre-processing step. The proposed network architecture using a pre-trained Xception model yields 92.50% average classification accuracy.
Tasks Data Augmentation, Transfer Learning
Published 2019-09-26
URL https://arxiv.org/abs/1909.11839v1
PDF https://arxiv.org/pdf/1909.11839v1.pdf
PWC https://paperswithcode.com/paper/breast-cancer-diagnosis-with-transfer
Repo https://github.com/sara-kassani/ICIAR_Transfer_Learning_Global_Average_Pooling
Framework tf
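The global pooling step at the heart of this pipeline is simple: each C x H x W feature map from the pre-trained backbone is collapsed to a C-dimensional descriptor that feeds the classifier. A minimal numpy sketch (the backbone itself, e.g. Xception, is assumed and not shown):

```python
import numpy as np

def global_average_pool(feature_maps):
    # Collapse each channel's H x W map to its mean, turning a (C, H, W)
    # tensor into a C-dim descriptor for the classification head.
    return feature_maps.mean(axis=(-2, -1))

fmap = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)  # C=2, H=3, W=4
desc = global_average_pool(fmap)
```

Because pooling discards spatial dimensions, the same head works regardless of input image size, which is convenient when transferring a backbone trained on a different resolution.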

LYTNet: A Convolutional Neural Network for Real-Time Pedestrian Traffic Lights and Zebra Crossing Recognition for the Visually Impaired

Title LYTNet: A Convolutional Neural Network for Real-Time Pedestrian Traffic Lights and Zebra Crossing Recognition for the Visually Impaired
Authors Samuel Yu, Heon Lee, John Kim
Abstract Currently, the visually impaired rely on either a sighted human, a guide dog, or a white cane to navigate safely. However, training guide dogs is extremely expensive, and canes cannot provide essential information such as the color of traffic lights and the direction of crosswalks. In this paper, we propose a deep learning based solution that provides information on both the traffic light mode and the position of the zebra crossing. Previous machine learning solutions provide only one piece of information and are mostly binary, detecting only red or green lights. The proposed convolutional neural network, LYTNet, is designed for comprehensiveness, accuracy, and computational efficiency, delivering the two most important pieces of information for the visually impaired to cross the road. We provide five classes of pedestrian traffic lights rather than the commonly seen three or four, plus a direction vector representing the midline of the zebra crossing, converted from the 2D image plane to real-world positions. We created our own dataset of pedestrian traffic lights containing over 5000 photos taken at hundreds of intersections in Shanghai. Our experiments achieve a classification accuracy of 94%, an average angle error of 6.35 degrees, and a frame rate of 20 frames per second when testing the network on an iPhone 7 with additional post-processing steps.
Tasks
Published 2019-07-23
URL https://arxiv.org/abs/1907.09706v1
PDF https://arxiv.org/pdf/1907.09706v1.pdf
PWC https://paperswithcode.com/paper/lytnet-a-convolutional-neural-network-for
Repo https://github.com/samuelyu2002/pedestrian-traffic-lights
Framework pytorch

Concept Saliency Maps to Visualize Relevant Features in Deep Generative Models

Title Concept Saliency Maps to Visualize Relevant Features in Deep Generative Models
Authors Lennart Brocki, Neo Christopher Chung
Abstract Evaluating, explaining, and visualizing high-level concepts in generative models, such as variational autoencoders (VAEs), is challenging in part due to a lack of known prediction classes, which are required to generate saliency maps in supervised learning. While saliency maps may help identify relevant features (e.g., pixels) in the input for classification tasks of deep neural networks, similar frameworks are understudied in unsupervised learning. Therefore, we introduce a new method of obtaining saliency maps for latent representations of known or novel high-level concepts, often called concept vectors, in generative models. Concept scores, analogous to class scores in classification tasks, are defined as dot products between concept vectors and encoded input data, which can be readily used to compute the gradients. The resulting concept saliency maps are shown to highlight input features deemed important for high-level concepts. Our method is applied to the VAE’s latent space on the CelebA dataset, in which known attributes such as “smiles” and “hats” are used to elucidate relevant facial features. Furthermore, our application to spatial transcriptomic (ST) data of a mouse olfactory bulb demonstrates the potential of latent representations of morphological layers and molecular features in advancing our understanding of complex biological systems. By extending the popular method of saliency maps to generative models, the proposed concept saliency maps help improve interpretability of latent variable models in deep learning. Code to reproduce and implement concept saliency maps: https://github.com/lenbrocki/concept-saliency-maps
Tasks Latent Variable Models
Published 2019-10-29
URL https://arxiv.org/abs/1910.13140v1
PDF https://arxiv.org/pdf/1910.13140v1.pdf
PWC https://paperswithcode.com/paper/concept-saliency-maps-to-visualize-relevant
Repo https://github.com/lenbrocki/concept-saliency-maps
Framework tf
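The recipe above translates almost directly into code: the concept score is a dot product between the concept vector and the encoded input, and the saliency map is its gradient with respect to the input. A minimal sketch with a toy linear "encoder" and finite-difference gradients (autodiff in practice; the encoder and concept vector here are illustrative):

```python
import numpy as np

def concept_score(x, encoder, concept_vec):
    # Concept score = <concept vector, encoded input>, playing the role
    # a class score plays in supervised saliency maps.
    return concept_vec @ encoder(x)

def saliency_map(x, encoder, concept_vec, eps=1e-5):
    # Finite-difference gradient of the concept score w.r.t. input features;
    # large magnitudes mark inputs relevant to the concept.
    grad = np.zeros_like(x)
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx.flat[i] = eps
        grad.flat[i] = (concept_score(x + dx, encoder, concept_vec)
                        - concept_score(x - dx, encoder, concept_vec)) / (2 * eps)
    return grad

W = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])  # toy linear encoder weights
encoder = lambda x: W @ x
sal = saliency_map(np.zeros(3), encoder, np.array([1.0, 1.0]))
```

For a linear encoder the map is exactly W^T c, which is a useful sanity check before applying the method to a real VAE encoder.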

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

Title Espresso: A Fast End-to-end Neural Speech Recognition Toolkit
Authors Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur
Abstract We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented. Espresso achieves state-of-the-art ASR performance on the WSJ, LibriSpeech, and Switchboard data sets among other end-to-end systems without data augmentation, and is 4–11x faster for decoding than similar systems (e.g. ESPnet).
Tasks Data Augmentation, Language Modelling, Machine Translation, Speech Recognition
Published 2019-09-18
URL https://arxiv.org/abs/1909.08723v3
PDF https://arxiv.org/pdf/1909.08723v3.pdf
PWC https://paperswithcode.com/paper/espresso-a-fast-end-to-end-neural-speech
Repo https://github.com/freewym/espresso
Framework pytorch

Efficient Deep Neural Network for Photo-realistic Image Super-Resolution

Title Efficient Deep Neural Network for Photo-realistic Image Super-Resolution
Authors Namhyuk Ahn, Byungkon Kang, Kyung-Ah Sohn
Abstract Recent progress in the deep learning-based models has improved photo-realistic (or perceptual) single-image super-resolution significantly. However, despite their powerful performance, many models are difficult to apply to the real-world applications because of the heavy computational requirements. To facilitate the use of a deep learning model under such demands, we focus on keeping the model fast and lightweight while maintaining its performance. In detail, we design an architecture that implements a cascading mechanism on a residual network to boost the performance with limited resources via multi-level feature fusion. Moreover, we adopt group convolution and weight-tying for our proposed model in order to achieve extreme efficiency. In addition to our network, we use the adversarial learning paradigm and a multi-scale discriminator approach. By doing so, we show that the performances of the proposed models surpass those of the recent methods, which have a complexity similar to ours, for both traditional pixel-based and perception-based tasks. To verify the effectiveness of our models, we investigate through extensive internal experiments and benchmark using various datasets.
Tasks Image Super-Resolution, Super-Resolution
Published 2019-03-06
URL https://arxiv.org/abs/1903.02240v2
PDF https://arxiv.org/pdf/1903.02240v2.pdf
PWC https://paperswithcode.com/paper/photo-realistic-image-super-resolution-with
Repo https://github.com/nmhkahn/PCARN-pytorch
Framework pytorch
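The efficiency argument for group convolution reduces to simple parameter counting: splitting channels into g groups divides the weight count by g. A small sketch of that arithmetic (channel and kernel sizes are illustrative):

```python
def conv_params(c_in, c_out, k, groups=1):
    # Parameter count of a (grouped) k x k convolution: each output channel
    # only sees c_in/groups input channels, so grouping divides the cost.
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k

standard = conv_params(64, 64, 3)            # plain 3x3 conv: 36,864 weights
grouped = conv_params(64, 64, 3, groups=4)   # grouped 3x3 conv: 4x fewer
```

Combined with weight-tying across cascading blocks, this is how the model stays lightweight while the multi-level feature fusion preserves accuracy.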

Learning Fair Representations for Kernel Models

Title Learning Fair Representations for Kernel Models
Authors Zilong Tan, Samuel Yeom, Matt Fredrikson, Ameet Talwalkar
Abstract Fair representations are a powerful tool for establishing criteria like statistical parity, proxy non-discrimination, and equality of opportunity in learned models. Existing techniques for learning these representations are typically model-agnostic, as they preprocess the original data such that the output satisfies some fairness criterion, and can be used with arbitrary learning methods. In contrast, we demonstrate the promise of learning a model-aware fair representation, focusing on kernel-based models. We leverage the classical Sufficient Dimension Reduction (SDR) framework to construct representations as subspaces of the reproducing kernel Hilbert space (RKHS), whose member functions are guaranteed to satisfy fairness. Our method supports several fairness criteria, continuous and discrete data, and multiple protected attributes. We further show how to calibrate the accuracy tradeoff by characterizing it in terms of the principal angles between subspaces of the RKHS. Finally, we apply our approach to obtain the first Fair Gaussian Process (FGP) prior for fair Bayesian learning, and show that it is competitive with, and in some cases outperforms, state-of-the-art methods on real data.
Tasks Dimensionality Reduction
Published 2019-06-27
URL https://arxiv.org/abs/1906.11813v2
PDF https://arxiv.org/pdf/1906.11813v2.pdf
PWC https://paperswithcode.com/paper/learning-fair-representations-for-kernel
Repo https://github.com/ZilongTan/fgp
Framework none

Learning Dynamic Context Augmentation for Global Entity Linking

Title Learning Dynamic Context Augmentation for Global Entity Linking
Authors Xiyuan Yang, Xiaotao Gu, Sheng Lin, Siliang Tang, Yueting Zhuang, Fei Wu, Zhigang Chen, Guoping Hu, Xiang Ren
Abstract Despite the recent success of collective entity linking (EL) methods, these “global” inference methods may yield sub-optimal results when the “all-mention coherence” assumption breaks, and they often suffer from high computational cost at inference time due to the complex search space. In this paper, we propose a simple yet effective solution, called Dynamic Context Augmentation (DCA), for collective EL, which requires only one pass through the mentions in a document. DCA sequentially accumulates context information to make efficient, collective inference, and can be combined with different local EL models as a plug-and-enhance module. We explore both supervised and reinforcement learning strategies for learning the DCA model. Extensive experiments show the effectiveness of our model across different learning settings, base models, decision orders and attention mechanisms.
Tasks Entity Linking
Published 2019-09-04
URL https://arxiv.org/abs/1909.02117v1
PDF https://arxiv.org/pdf/1909.02117v1.pdf
PWC https://paperswithcode.com/paper/learning-dynamic-context-augmentation-for
Repo https://github.com/YoungXiyuan/DCA
Framework pytorch
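The one-pass accumulation can be sketched in a few lines: score each mention's candidates against the running context vector, link the best candidate, then fold its embedding back into the context. The dot-product scoring, the averaging update, and all entity names below are simplified stand-ins; the paper learns the scorer and attends over accumulated entities.

```python
import numpy as np

def dca_link(cand_embs, cand_ids, init_ctx):
    # One pass over a document's mentions with a dynamically augmented context.
    ctx = np.asarray(init_ctx, dtype=float)
    linked = []
    for embs, ids in zip(cand_embs, cand_ids):
        best = int(np.argmax(embs @ ctx))     # pick candidate closest to context
        linked.append(ids[best])
        ctx = (ctx + embs[best]) / 2.0        # fold linked entity back in
    return linked

cands = [np.array([[1.0, 0.0], [0.0, 1.0]]),   # candidates for mention 1
         np.array([[0.9, 0.1], [0.0, 1.0]])]   # candidates for mention 2
ids = [["Paris_city", "Paris_Hilton"], ["France", "Texas"]]
linked = dca_link(cands, ids, init_ctx=[1.0, 0.0])
```

Because each mention is visited once and the context is a fixed-size vector, inference cost is linear in the number of mentions, unlike global methods that search over joint assignments.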

Zero-Shot Entity Linking by Reading Entity Descriptions

Title Zero-Shot Entity Linking by Reading Entity Descriptions
Authors Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, Honglak Lee
Abstract We present the zero-shot entity linking task, where mentions must be linked to unseen entities without in-domain labeled data. The goal is to enable robust transfer to highly specialized domains, and so no metadata or alias tables are assumed. In this setting, entities are only identified by text descriptions, and models must rely strictly on language understanding to resolve the new entities. First, we show that strong reading comprehension models pre-trained on large unlabeled data can be used to generalize to unseen entities. Second, we propose a simple and effective adaptive pre-training strategy, which we term domain-adaptive pre-training (DAP), to address the domain shift problem associated with linking unseen entities in a new domain. We present experiments on a new dataset that we construct for this task and show that DAP improves over strong pre-training baselines, including BERT. The data and code are available at https://github.com/lajanugen/zeshel.
Tasks Entity Linking, Reading Comprehension
Published 2019-06-18
URL https://arxiv.org/abs/1906.07348v1
PDF https://arxiv.org/pdf/1906.07348v1.pdf
PWC https://paperswithcode.com/paper/zero-shot-entity-linking-by-reading-entity
Repo https://github.com/lajanugen/zeshel
Framework tf

FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance

Title FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance
Authors Wataru Sakata, Tomohide Shibata, Ribeka Tanaka, Sadao Kurohashi
Abstract Frequently Asked Question (FAQ) retrieval is an important task whose objective is to retrieve an appropriate Question-Answer (QA) pair from a database given a user’s query. We propose a FAQ retrieval system that considers the similarity between a user’s query and a question as well as the relevance between the query and an answer. Although a common approach to FAQ retrieval is to construct labeled training data, doing so incurs annotation costs. We therefore use a traditional unsupervised information retrieval system to calculate the similarity between the query and each question. The relevance between the query and an answer, on the other hand, can be learned from the QA pairs in a FAQ database; the recently proposed BERT model is used for this relevance calculation. Since the number of QA pairs on a FAQ page is not enough to train a model, we cope with this issue by leveraging FAQ sets similar to the one in question. We evaluate our approach on two datasets: localgovFAQ, a dataset we construct in a Japanese administrative-municipality domain, and the StackExchange dataset, a public English dataset. We demonstrate that our proposed method outperforms baseline methods on both.
Tasks Information Retrieval, Question Similarity
Published 2019-05-08
URL https://arxiv.org/abs/1905.02851v2
PDF https://arxiv.org/pdf/1905.02851v2.pdf
PWC https://paperswithcode.com/paper/faq-retrieval-using-query-question-similarity
Repo https://github.com/ku-nlp/bert-based-faqir
Framework tf
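One plausible way to combine the two signals is to trust confident unsupervised query-question matches outright and fall back to the learned query-answer relevance otherwise. The rule and threshold below are illustrative, not necessarily the paper's exact combination scheme:

```python
import numpy as np

def rank_faq(query_q_sim, query_a_rel, threshold=0.3):
    # Rank QA pairs: confident query-question matches (unsupervised IR score
    # above a threshold) are promoted ahead of everything; the rest are
    # ordered by the BERT-style query-answer relevance.
    score = np.where(query_q_sim >= threshold, query_q_sim + 1.0, query_a_rel)
    return np.argsort(-score)

sims = np.array([0.1, 0.6, 0.2])   # unsupervised query-question similarity
rels = np.array([0.9, 0.2, 0.5])   # learned query-answer relevance
order = rank_faq(sims, rels)       # pair 1 matched the question; 0 and 2 ranked by relevance
```

The split plays to each signal's strength: lexical question matching needs no labels, while the answer-relevance model handles queries phrased unlike any stored question.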

On the Robustness of Deep K-Nearest Neighbors

Title On the Robustness of Deep K-Nearest Neighbors
Authors Chawin Sitawarin, David Wagner
Abstract Despite a large amount of attention on adversarial examples, very few works have demonstrated an effective defense against this threat. We examine the Deep k-Nearest Neighbor (DkNN) classifier, a proposed defense that combines k-Nearest Neighbor (kNN) and deep learning to improve a model’s robustness to adversarial examples. Evaluating the robustness of this scheme is challenging due to the lack of efficient algorithms for attacking kNN classifiers with large k and high-dimensional data. We propose a heuristic attack that uses gradient descent to find adversarial examples for kNN classifiers, and then apply it to attack the DkNN defense as well. Results suggest that our attack is moderately stronger than any naive attack on kNN and significantly outperforms other attacks on DkNN.
Tasks
Published 2019-03-20
URL http://arxiv.org/abs/1903.08333v1
PDF http://arxiv.org/pdf/1903.08333v1.pdf
PWC https://paperswithcode.com/paper/on-the-robustness-of-deep-k-nearest-neighbors
Repo https://github.com/fiona-lxd/AdvKnn
Framework pytorch
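The key obstacle is that kNN voting is not differentiable. A common workaround, and the core of such heuristic attacks, is a soft surrogate: weight training points by a softmax over negative distances, so moving the input toward the target class smoothly lowers a loss. A minimal numpy sketch (the temperature and data are illustrative; the paper adds further machinery):

```python
import numpy as np

def soft_knn_loss(x, train_X, train_y, target, temp=1.0):
    # Differentiable surrogate for kNN voting: neighbors are softmax-weighted
    # by negative distance; minimizing this pushes x toward the target class.
    d = np.linalg.norm(train_X - x, axis=1)
    w = np.exp(-d / temp)
    w /= w.sum()
    return -w[train_y == target].sum()

train_X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
train_y = np.array([0, 0, 1])
near = soft_knn_loss(np.array([4.0, 4.0]), train_X, train_y, target=1)
far = soft_knn_loss(np.array([0.0, 0.1]), train_X, train_y, target=1)
```

Gradient descent on this loss yields adversarial perturbations for plain kNN, and the same surrogate applies to the layerwise neighbor lookups inside DkNN.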

Shredder: Learning Noise Distributions to Protect Inference Privacy

Title Shredder: Learning Noise Distributions to Protect Inference Privacy
Authors Fatemehsadat Mireshghallah, Mohammadkazem Taram, Prakash Ramrakhyani, Dean Tullsen, Hadi Esmaeilzadeh
Abstract The sheer amount of computation in deep neural networks has pushed their execution to the cloud. This de facto cloud-hosted inference, however, raises serious privacy concerns, as private data is communicated to and stored on remote servers. The data could be mishandled by cloud providers, used for unsolicited analytics, or simply compromised through network and system security vulnerabilities. To that end, this paper devises Shredder, which reduces the information content of the communicated data without diminishing the cloud’s ability to maintain acceptably high accuracy. Shredder learns two sets of noise distributions whose samples, named multiplicative and additive noise tensors, are applied to the communicated data while maintaining inference accuracy. The key idea is that Shredder learns these noise distributions offline without altering the topology or the weights of the pre-trained network: it repeatedly learns sample noise tensors from the distributions by casting the tensors as trainable parameters while keeping the weights constant. Since the key idea is learning the noise, we can devise a loss function that strikes a balance between accuracy and information degradation, using self-supervision to train noise tensors that yield an intermediate representation containing less private information. Experiments with real-world deep neural networks show that, compared to the original execution, Shredder reduces the mutual information between the input and the communicated data by 66.90% and yields a misclassification rate of 94.5% on private labels, significantly reducing an adversary’s ability to infer private data, while sacrificing only 1.74% accuracy and requiring no knowledge of the private labels.
Tasks
Published 2019-05-26
URL https://arxiv.org/abs/1905.11814v2
PDF https://arxiv.org/pdf/1905.11814v2.pdf
PWC https://paperswithcode.com/paper/190511814
Repo https://github.com/mireshghallah/shredder-v2-self-supervised
Framework pytorch
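The mechanism applied at inference time is a single elementwise transform on the intermediate activation before it leaves the device. A minimal sketch, with randomly drawn tensors standing in for the learned noise distributions:

```python
import numpy as np

def shred(activation, mult_noise, add_noise):
    # Apply multiplicative and additive noise tensors to the intermediate
    # activation before sending it to the cloud; the network's own weights
    # are never modified.
    return activation * mult_noise + add_noise

rng = np.random.default_rng(0)
act = rng.normal(size=(4, 4))                          # on-device layer output
mult = rng.normal(loc=1.0, scale=0.1, size=(4, 4))     # stand-in for learned tensor
add = rng.normal(scale=0.1, size=(4, 4))               # stand-in for learned tensor
noisy = shred(act, mult, add)                          # what actually gets sent
```

Training amounts to treating `mult` and `add` as the only trainable parameters under a loss balancing task accuracy against information leakage; at deployment, sampling fresh tensors per query prevents the server from averaging the noise away.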

Large Memory Layers with Product Keys

Title Large Memory Layers with Product Keys
Authors Guillaume Lample, Alexandre Sablayrolles, Marc’Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou
Abstract This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with negligible computational overhead. Its design and access pattern are based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the overall system strike a better trade-off between prediction accuracy and computational efficiency at both training and test time. This memory layer allows us to tackle very large scale language modeling tasks. In our experiments we consider a dataset with up to 30 billion words, and we plug our memory layer into a state-of-the-art transformer-based architecture. In particular, we find that a memory-augmented model with only 12 layers outperforms a baseline transformer model with 24 layers, while being twice as fast at inference time. We release our code for reproducibility purposes.
Tasks Language Modelling
Published 2019-07-10
URL https://arxiv.org/abs/1907.05242v2
PDF https://arxiv.org/pdf/1907.05242v2.pdf
PWC https://paperswithcode.com/paper/large-memory-layers-with-product-keys
Repo https://github.com/facebookresearch/XLM
Framework pytorch
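The product-key trick can be sketched compactly: split the query in half, score each half against a small set of sub-keys, and combine the per-half top-k candidates. Because the full key set is the Cartesian product of the two sub-key sets, scoring |K1| + |K2| sub-keys stands in for exact search over |K1| x |K2| full keys. The tiny sizes below are illustrative:

```python
import numpy as np

def product_key_lookup(query, sub_keys1, sub_keys2, k=2):
    # Exact top-k over the product key set at roughly sqrt cost: the best
    # full key must combine a top-k sub-key from each half.
    q1, q2 = np.split(query, 2)
    s1, s2 = sub_keys1 @ q1, sub_keys2 @ q2
    i1 = np.argsort(-s1)[:k]                  # top-k over first-half sub-keys
    i2 = np.argsort(-s2)[:k]                  # top-k over second-half sub-keys
    cand = [(s1[a] + s2[b], int(a) * len(sub_keys2) + int(b))
            for a in i1 for b in i2]          # k*k candidate full keys
    cand.sort(reverse=True)
    return [idx for _, idx in cand[:k]]       # indices into the full key set

sub_keys1 = np.eye(2)                         # 2 sub-keys per half -> 4 full keys
sub_keys2 = np.eye(2)
hits = product_key_lookup(np.array([2.0, 1.0, 0.0, 3.0]), sub_keys1, sub_keys2)
```

The selected indices address memory value slots; with, say, 1024 sub-keys per half, a million-slot memory is searched exactly while scoring only 2048 sub-keys.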