January 31, 2020

2986 words 15 mins read

Paper Group AWR 383

XNAS: Neural Architecture Search with Expert Advice. Automated Steel Bar Counting and Center Localization with Convolutional Neural Networks. Multi-hop Convolutions on Weighted Graphs. Selective Style Transfer for Text. Structuring Latent Spaces for Stylized Response Generation. Learning to Perform Role-Filler Binding with Schematic Knowledge. Join …

XNAS: Neural Architecture Search with Expert Advice

Title XNAS: Neural Architecture Search with Expert Advice
Authors Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, Lihi Zelnik-Manor
Abstract This paper introduces a novel optimization method for differentiable neural architecture search, based on the theory of prediction with expert advice. Its optimization criterion is well suited for architecture selection, i.e., it minimizes the regret incurred by a sub-optimal selection of operations. Unlike previous search relaxations that require hard pruning of architectures, our method is designed to dynamically wipe out inferior architectures and enhance superior ones. It achieves an optimal worst-case regret bound and suggests the use of multiple learning rates, based on the amount of information carried by the backward gradients. Experiments show that our algorithm achieves strong performance over several image classification datasets. Specifically, it obtains an error rate of 1.6% for CIFAR-10, 24% for ImageNet under mobile settings, and achieves state-of-the-art results on three additional datasets.
Tasks Image Classification, Neural Architecture Search
Published 2019-06-19
URL https://arxiv.org/abs/1906.08031v1
PDF https://arxiv.org/pdf/1906.08031v1.pdf
PWC https://paperswithcode.com/paper/xnas-neural-architecture-search-with-expert
Repo https://github.com/NivNayman/XNAS
Framework pytorch
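
A minimal sketch of the expert-advice view described above: each candidate operation is an "expert" whose architecture weight is updated multiplicatively from the loss it incurs, so inferior operations are wiped out dynamically rather than hard-pruned. The learning rate and per-operation loss signal below are illustrative assumptions, not the exact XNAS update rule.

```python
import numpy as np

def expert_advice_update(weights, losses, eta=0.1):
    """One multiplicative-weights step over candidate operations."""
    w = weights * np.exp(-eta * losses)   # penalize high-loss "experts"
    return w / w.sum()                    # renormalize to a distribution

weights = np.ones(4) / 4                  # 4 candidate ops, uniform prior
losses = np.array([0.9, 0.2, 0.5, 0.7])   # per-op loss signal (assumed)
print(expert_advice_update(weights, losses))  # mass shifts to the best op
```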

Automated Steel Bar Counting and Center Localization with Convolutional Neural Networks

Title Automated Steel Bar Counting and Center Localization with Convolutional Neural Networks
Authors Zhun Fan, Jiewei Lu, Benzhang Qiu, Tao Jiang, Kang An, Alex Noel Josephraj, Chuliang Wei
Abstract Automated steel bar counting and center localization play an important role in the factory automation of steel bars. Traditional methods focus only on steel bar counting, and their performance is often limited by complex industrial environments. A convolutional neural network (CNN), which has a great capability to deal with complex tasks in challenging environments, is applied in this work. A framework called CNN-DC is proposed to achieve automated steel bar counting and center localization simultaneously. The proposed CNN-DC first detects candidate center points with a deep CNN. Then an effective clustering algorithm named Distance Clustering (DC) is proposed to cluster the candidate center points and locate the true centers of the steel bars. The proposed CNN-DC achieves 99.26% accuracy for steel bar counting and a 4.1% center offset for center localization on the established steel bar dataset, which demonstrates that CNN-DC performs well on both tasks. Code is made publicly available at: https://github.com/BenzhangQiu/Steel-bar-Detection.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00891v2
PDF https://arxiv.org/pdf/1906.00891v2.pdf
PWC https://paperswithcode.com/paper/190600891
Repo https://github.com/BenzhangQiu/Steel-bar-Detection
Framework tf
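
A sketch of the clustering step in the spirit of Distance Clustering (DC): candidate center points within a radius of each other are grouped, and each group's mean becomes a bar center, so the bar count is the number of groups. The greedy grouping and the radius are assumptions, and the sketch is in NumPy although the official repo uses TensorFlow; see the repository for the actual DC algorithm.

```python
import numpy as np

def distance_cluster(points, radius):
    """Group candidate centers by distance; return one center per group."""
    centers, remaining = [], [np.asarray(p, float) for p in points]
    while remaining:
        seed = remaining.pop(0)
        group, keep = [seed], []
        for p in remaining:
            (group if np.linalg.norm(p - seed) < radius else keep).append(p)
        remaining = keep
        centers.append(np.mean(group, axis=0))   # cluster mean = bar center
    return np.array(centers)

pts = [[10, 10], [11, 9], [40, 42], [41, 41]]
centers = distance_cluster(pts, radius=5.0)
print(len(centers), centers)                     # count = 2, two centers
```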

Multi-hop Convolutions on Weighted Graphs

Title Multi-hop Convolutions on Weighted Graphs
Authors Qikui Zhu, Bo Du, Pingkun Yan
Abstract Graph Convolutional Networks (GCNs) have made significant advances in semi-supervised learning, especially for classification tasks. However, existing GCN based methods have two main drawbacks. First, to increase the receptive field and improve the representation capability of GCNs, larger kernels or deeper network architectures are used, which greatly increases the computational complexity and the number of parameters. Second, methods working on higher-order graphs computed directly from adjacency matrices may alter the relationship between graph nodes, particularly for weighted graphs. In addition, the direct construction of higher-order graphs introduces redundant information, which may result in lower network performance. To address the above weaknesses, in this paper we propose a new multi-hop convolutional network on weighted graphs. The proposed method consists of multiple convolutional branches, where each branch extracts node representations from a $k$-hop graph with small kernels. Such a design systematically aggregates multi-scale contextual information without adding redundant information. Furthermore, to efficiently combine the extracted information from the multi-hop branches, an adaptive weight computation (AWC) layer is proposed. We demonstrate the superiority of our MultiHop on six publicly available datasets, including three citation network datasets and three medical image datasets. The experimental results show that our proposed MultiHop method achieves the highest classification accuracy and outperforms the state-of-the-art methods. The source code of this work has been released on GitHub (https://github.com/ahukui/Multi-hop-Convolutions-on-Weighted-Graphs).
Tasks
Published 2019-11-12
URL https://arxiv.org/abs/1911.04978v1
PDF https://arxiv.org/pdf/1911.04978v1.pdf
PWC https://paperswithcode.com/paper/multi-hop-convolutions-on-weighted-graphs
Repo https://github.com/ahukui/Multi-hop-Convolutions-on-Weighted-Graphs
Framework none
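
A minimal sketch of the multi-hop idea: one small branch per k-hop propagation, with branch outputs combined by learned weights standing in for the paper's AWC layer. Layer sizes, the shared propagation matrix, and the softmax combination are assumptions.

```python
import torch
import torch.nn as nn

class MultiHopGCN(nn.Module):
    def __init__(self, in_dim, out_dim, hops=3):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Linear(in_dim, out_dim) for _ in range(hops))
        self.branch_logits = nn.Parameter(torch.zeros(hops))  # adaptive weights

    def forward(self, x, adj_norm):
        h, outs = x, []
        for branch in self.branches:
            h = adj_norm @ h                     # one more hop of propagation
            outs.append(branch(h))               # small kernel per k-hop graph
        w = torch.softmax(self.branch_logits, dim=0)
        return sum(wi * oi for wi, oi in zip(w, outs))

x = torch.randn(5, 8)                    # 5 nodes, 8 features
adj = torch.eye(5)                       # placeholder normalized adjacency
print(MultiHopGCN(8, 4)(x, adj).shape)   # torch.Size([5, 4])
```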

Selective Style Transfer for Text

Title Selective Style Transfer for Text
Authors Raul Gomez, Ali Furkan Biten, Lluis Gomez, Jaume Gibert, Marçal Rusiñol, Dimosthenis Karatzas
Abstract This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions. Results on different text domains (scene text, machine-printed text and handwritten text) and cross-modal results demonstrate that this is feasible and opens up different research lines. Furthermore, two architectures for selective style transfer, i.e., transferring style only to the desired image pixels, are proposed. Finally, scene text selective style transfer is evaluated as a data augmentation technique to expand scene text detection datasets, resulting in a boost in text detector performance. Our implementation of the described models is publicly available.
Tasks Data Augmentation, Scene Text Detection, Style Transfer
Published 2019-06-04
URL https://arxiv.org/abs/1906.01466v1
PDF https://arxiv.org/pdf/1906.01466v1.pdf
PWC https://paperswithcode.com/paper/selective-style-transfer-for-text
Repo https://github.com/furkanbiten/SelectiveTextStyleTransfer
Framework tf
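
The selective part of the transfer reduces, at its simplest, to masked blending: stylize the whole image, then keep the stylized values only at text pixels. This step is a simplified stand-in for the paper's end-to-end architectures, and the text mask here is a hand-made placeholder for what a text segmentation head would predict.

```python
import numpy as np

def selective_blend(original, stylized, text_mask):
    """Keep background pixels; replace text pixels with stylized ones."""
    m = text_mask[..., None]                  # HxW -> HxWx1 for broadcasting
    return m * stylized + (1.0 - m) * original

orig = np.zeros((4, 4, 3), np.float32)        # dummy original image
styled = np.ones((4, 4, 3), np.float32)       # dummy stylized image
mask = np.zeros((4, 4), np.float32)
mask[1:3, 1:3] = 1.0                          # pretend text occupies the center
print(selective_blend(orig, styled, mask)[:, :, 0])
```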

Structuring Latent Spaces for Stylized Response Generation

Title Structuring Latent Spaces for Stylized Response Generation
Authors Xiang Gao, Yizhe Zhang, Sungjin Lee, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan
Abstract Generating responses in a targeted style is a useful yet challenging task, especially in the absence of parallel data. With limited data, existing methods tend to generate responses that are either less stylized or less context-relevant. We propose StyleFusion, which bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. This structure allows the system to generate stylized relevant responses by sampling in the neighborhood of the conversation model prediction, and continuously control the style level. We demonstrate this method using dialogues from Reddit data and two sets of sentences with distinct styles (arXiv and Sherlock Holmes novels). Automatic and human evaluation show that, without sacrificing appropriateness, the system generates responses of the targeted style and outperforms competitive baselines.
Tasks Style Transfer
Published 2019-09-03
URL https://arxiv.org/abs/1909.05361v1
PDF https://arxiv.org/pdf/1909.05361v1.pdf
PWC https://paperswithcode.com/paper/structuring-latent-spaces-for-stylized
Repo https://github.com/golsun/StyleFusion
Framework none
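
A sketch of the structured-latent-space property the abstract describes: a response is decoded from a point sampled near the conversation model's prediction, slid toward the style region by an interpolation weight that acts as the continuous style-level knob. The vectors below are stand-ins for learned encodings.

```python
import numpy as np

def stylized_sample(z_pred, z_style, style_level=0.5, noise=0.05, rng=None):
    """Sample near the predicted response, nudged toward the style region."""
    if rng is None:
        rng = np.random.default_rng(0)
    z = (1 - style_level) * z_pred + style_level * z_style  # style knob
    return z + noise * rng.standard_normal(z.shape)         # local sampling

z_pred = np.zeros(8)    # conversation-model prediction (assumed encoding)
z_style = np.ones(8)    # center of the target-style region (assumed encoding)
print(stylized_sample(z_pred, z_style, style_level=0.8)[:4])
```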

Learning to Perform Role-Filler Binding with Schematic Knowledge

Title Learning to Perform Role-Filler Binding with Schematic Knowledge
Authors Catherine Chen, Qihong Lu, Andre Beukers, Christopher Baldassano, Kenneth A. Norman
Abstract Through specific experiences, humans learn structural relationships underlying events in the world. Generalizing knowledge of structural relationships to new situations requires dynamic role-filler binding, the ability to associate specific “fillers” with abstract “roles”. Previous work found that artificial neural networks can learn this ability when explicitly told what the roles and fillers are. We show that networks can learn these relationships even without explicitly labeled roles and fillers, and show that analyses inspired by neural decoding can provide a means of understanding what the networks have learned.
Tasks Question Answering
Published 2019-02-24
URL https://arxiv.org/abs/1902.09006v2
PDF https://arxiv.org/pdf/1902.09006v2.pdf
PWC https://paperswithcode.com/paper/learning-to-apply-schematic-knowledge-to
Repo https://github.com/cchen23/generalized_schema_learning
Framework tf
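
A sketch of the decoding-inspired analysis the abstract mentions: fit a linear classifier that predicts which filler occupied a role from the network's hidden state; above-chance held-out accuracy suggests the state carries the binding. The hidden states and filler labels below are synthetic placeholders with a planted signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
fillers = rng.integers(0, 3, size=200)      # which filler held the role
hidden = rng.standard_normal((200, 16))     # stand-in network hidden states
hidden[:, 0] += fillers                     # plant a decodable signal

decoder = LogisticRegression(max_iter=1000).fit(hidden[:150], fillers[:150])
print(decoder.score(hidden[150:], fillers[150:]))  # held-out decoding accuracy
```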

Jointly Optimizing Diversity and Relevance in Neural Response Generation

Title Jointly Optimizing Diversity and Relevance in Neural Response Generation
Authors Xiang Gao, Sungjin Lee, Yizhe Zhang, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan
Abstract Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a SpaceFusion model to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance.
Tasks Chatbot, Dialogue Generation
Published 2019-02-28
URL http://arxiv.org/abs/1902.11205v3
PDF http://arxiv.org/pdf/1902.11205v3.pdf
PWC https://paperswithcode.com/paper/jointly-optimizing-diversity-and-relevance-in
Repo https://github.com/golsun/SpaceFusion
Framework none
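
A sketch of the two regularizers suggested by the abstract: a fusion term pulls the seq2seq prediction and the autoencoder encoding of the same response together, and an interpolation term asks points on the segment between them to still decode to that response. The unit weighting and the stand-in decoder loss are assumptions; the paper specifies the exact terms.

```python
import torch

def spacefusion_loss(z_s2s, z_ae, decode_nll):
    fuse = (z_s2s - z_ae).pow(2).sum(-1).mean()   # align the two latent spaces
    t = torch.rand(z_s2s.size(0), 1)
    z_mid = t * z_s2s + (1 - t) * z_ae            # random interpolation point
    return fuse + decode_nll(z_mid)               # midpoints must decode too

fake_nll = lambda z: z.pow(2).mean()              # placeholder decoder loss
z_s2s, z_ae = torch.randn(8, 16), torch.randn(8, 16)
print(spacefusion_loss(z_s2s, z_ae, fake_nll).item())
```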

Decoupled Attention Network for Text Recognition

Title Decoupled Attention Network for Text Recognition
Authors Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Canjie Luo, Xiaoxue Chen, Yaqiang Wu, Qianying Wang, Mingxiang Cai
Abstract Text recognition has attracted considerable research interest because of its various applications. The cutting-edge text recognition methods are based on attention mechanisms. However, most attention methods suffer from a serious alignment problem due to their recurrent alignment operation, where the alignment relies on historical decoding results. To remedy this issue, we propose a decoupled attention network (DAN), which decouples the alignment operation from historical decoding results. DAN is an effective, flexible and robust end-to-end text recognizer, which consists of three components: 1) a feature encoder that extracts visual features from the input image; 2) a convolutional alignment module that performs the alignment operation based on visual features from the encoder; and 3) a decoupled text decoder that makes the final prediction by jointly using the feature map and attention maps. Experimental results show that DAN achieves state-of-the-art performance on multiple text recognition tasks, including offline handwritten text recognition and regular/irregular scene text recognition.
Tasks Scene Text Recognition
Published 2019-12-21
URL https://arxiv.org/abs/1912.10205v1
PDF https://arxiv.org/pdf/1912.10205v1.pdf
PWC https://paperswithcode.com/paper/decoupled-attention-network-for-text
Repo https://github.com/Canjie-Luo/Scene-Text-Image-Transformer
Framework pytorch
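
A sketch of the decoupling itself: the attention maps come from a convolution over encoder features alone, with no dependence on past decoding results, and the decoder then consumes per-step context vectors built from those maps. The single-conv alignment module and all tensor sizes are simplifying assumptions.

```python
import torch
import torch.nn as nn

feats = torch.randn(1, 32, 8, 25)                 # encoder feature map (BxCxHxW)
max_steps = 10
align = nn.Conv2d(32, max_steps, kernel_size=3, padding=1)

attn = align(feats)                               # (1, T, 8, 25): a map per step
attn = torch.softmax(attn.flatten(2), dim=-1)     # normalize over positions
context = attn @ feats.flatten(2).transpose(1, 2) # (1, T, 32) step contexts
print(context.shape)                              # fed to the decoupled decoder
```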

Bidirectional Scene Text Recognition with a Single Decoder

Title Bidirectional Scene Text Recognition with a Single Decoder
Authors Maurits Bleeker, Maarten de Rijke
Abstract Scene Text Recognition (STR) is the problem of recognizing the correct word or character sequence in a cropped word image. To obtain more robust output sequences, the notion of bidirectional STR has been introduced. So far, bidirectional STR has been implemented by using two separate decoders: one for left-to-right decoding and one for right-to-left. Having two separate decoders for almost the same task with the same output space is undesirable from a computational and optimization point of view. We introduce the bidirectional Scene Text Transformer (Bi-STET), a novel bidirectional STR method with a single decoder for bidirectional text decoding. With its single decoder, Bi-STET outperforms methods that apply bidirectional decoding by using two separate decoders while also being more efficient than those methods. Furthermore, we achieve or beat state-of-the-art (SOTA) methods on all STR benchmarks with Bi-STET. Finally, we provide analyses and insights into the performance of Bi-STET.
Tasks Scene Text Recognition
Published 2019-12-08
URL https://arxiv.org/abs/1912.03656v2
PDF https://arxiv.org/pdf/1912.03656v2.pdf
PWC https://paperswithcode.com/paper/bidirectional-scene-text-recognition-with-a
Repo https://github.com/MauritsBleeker/Bi-STET
Framework pytorch
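
The single-decoder trick can be sketched at the data level: each training word yields two targets, one per reading direction, distinguished only by a direction signal the decoder is conditioned on. The direction-token convention below is an assumption for illustration.

```python
L2R, R2L = 0, 1   # assumed direction tokens the decoder is conditioned on

def make_bidirectional_targets(token_ids):
    """Return (direction, target) training pairs for one word image."""
    return [
        (L2R, list(token_ids)),
        (R2L, list(reversed(token_ids))),   # same decoder, reversed target
    ]

print(make_bidirectional_targets([5, 8, 2]))
# [(0, [5, 8, 2]), (1, [2, 8, 5])] -- one decoder learns both directions
```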

Pyramid Mask Text Detector

Title Pyramid Mask Text Detector
Authors Jingchao Liu, Xuebo Liu, Jie Sheng, Ding Liang, Xin Li, Qingjie Liu
Abstract Scene text detection, an essential step in a scene text recognition system, is the task of automatically locating text instances in natural scene images. Some recent attempts benefiting from Mask R-CNN formulate the scene text detection task as an instance segmentation problem and achieve remarkable performance. In this paper, we present a new Mask R-CNN based framework named Pyramid Mask Text Detector (PMTD) to handle scene text detection. Instead of the binary text mask generated by existing Mask R-CNN based methods, our PMTD performs pixel-level regression under the guidance of location-aware supervision, yielding a more informative soft text mask for each text instance. To generate text boxes, PMTD reinterprets the obtained 2D soft mask in 3D space and introduces a novel plane clustering algorithm to derive the optimal text box from the 3D shape. Experiments on standard datasets demonstrate that the proposed PMTD brings consistent and noticeable gains and clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 80.13% on the ICDAR 2017 MLT dataset.
Tasks Instance Segmentation, Scene Text Detection, Scene Text Recognition, Semantic Segmentation
Published 2019-03-28
URL http://arxiv.org/abs/1903.11800v1
PDF http://arxiv.org/pdf/1903.11800v1.pdf
PWC https://paperswithcode.com/paper/pyramid-mask-text-detector
Repo https://github.com/STVIR/PMTD
Framework none
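
A sketch of the soft label behind the name: inside a text region the target value rises from 0 at the border to 1 at the center, forming the pyramid that the plane clustering step later reinterprets in 3D. Axis-aligned rectangles are a simplifying assumption; the paper handles general text instances.

```python
import numpy as np

def pyramid_mask(h, w):
    """Soft mask with apex value 1 at the center, falling to 0 at the border."""
    ys = np.minimum(np.arange(h), np.arange(h)[::-1]) / ((h - 1) / 2)
    xs = np.minimum(np.arange(w), np.arange(w)[::-1]) / ((w - 1) / 2)
    return np.minimum.outer(ys, xs)

print(np.round(pyramid_mask(5, 7), 2))   # a 5x7 pyramid-shaped soft target
```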

Multi-scale self-guided attention for medical image segmentation

Title Multi-scale self-guided attention for medical image segmentation
Authors Ashish Sinha, Jose Dolz
Abstract Even though convolutional neural networks (CNNs) are driving progress in medical image segmentation, standard models still have some drawbacks. First, the use of multi-scale approaches, i.e., encoder-decoder architectures, leads to a redundant use of information, where similar low-level features are extracted multiple times at multiple scales. Second, long-range feature dependencies are not efficiently modeled, resulting in non-optimal discriminative feature representations associated with each semantic class. In this paper we attempt to overcome these limitations with the proposed architecture, by capturing richer contextual dependencies based on the use of guided self-attention mechanisms. This approach is able to integrate local features with their corresponding global dependencies, as well as highlight interdependent channel maps in an adaptive manner. Further, the additional loss between different modules guides the attention mechanisms to neglect irrelevant information and focus on more discriminant regions of the image by emphasizing relevant feature associations. We evaluate the proposed model in the context of semantic segmentation on three different datasets: abdominal organs, cardiovascular structures and brain tumors. A series of ablation experiments support the importance of these attention modules in the proposed architecture. In addition, compared to other state-of-the-art segmentation networks our model yields better segmentation performance, increasing the accuracy of the predictions while reducing the standard deviation. This demonstrates the efficiency of our approach to generate precise and reliable automatic segmentations of medical images. Our code is made publicly available at https://github.com/sinAshish/Multi-Scale-Attention
Tasks Attentive segmentation networks, Deep Attention, Medical Image Segmentation, Semantic Segmentation
Published 2019-06-07
URL https://arxiv.org/abs/1906.02849v3
PDF https://arxiv.org/pdf/1906.02849v3.pdf
PWC https://paperswithcode.com/paper/multi-scale-guided-attention-for-medical
Repo https://github.com/sinAshish/Multi-Scale-Attention
Framework pytorch
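
The attention blocks the architecture builds on can be sketched as position attention (every spatial location attends to every other) and channel attention (channel maps attend to each other). The guiding losses between modules and the multi-scale wiring are omitted; tensor sizes are illustrative.

```python
import torch

def position_attention(x):                 # x: (B, C, N) flattened features
    affinity = torch.softmax(x.transpose(1, 2) @ x, dim=-1)   # (B, N, N)
    return x @ affinity.transpose(1, 2)    # aggregate over all positions

def channel_attention(x):                  # x: (B, C, N)
    affinity = torch.softmax(x @ x.transpose(1, 2), dim=-1)   # (B, C, C)
    return affinity @ x                    # mix interdependent channel maps

x = torch.randn(2, 16, 64)                 # e.g. an 8x8 feature map, flattened
print(position_attention(x).shape, channel_attention(x).shape)
```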

Saliency Guided Self-attention Network for Weakly and Semi-supervised Semantic Segmentation

Title Saliency Guided Self-attention Network for Weakly and Semi-supervised Semantic Segmentation
Authors Qi Yao, Xiaojin Gong
Abstract Weakly supervised semantic segmentation (WSSS) using only image-level labels can greatly reduce the annotation cost and has therefore attracted considerable research interest. However, its performance is still inferior to that of fully supervised counterparts. To mitigate the performance gap, we propose a saliency guided self-attention network (SGAN) to address the WSSS problem. The introduced self-attention mechanism is able to capture rich and extensive contextual information but may mis-spread attention to unexpected regions. In order to enable this mechanism to work effectively under weak supervision, we integrate class-agnostic saliency priors into the self-attention mechanism and utilize class-specific attention cues as additional supervision for SGAN. Our SGAN is able to produce dense and accurate localization cues, so the segmentation performance is boosted. Moreover, by simply replacing the additional supervision with partially labeled ground truth, SGAN works effectively for semi-supervised semantic segmentation as well. Experiments on the PASCAL VOC 2012 and COCO datasets show that our approach outperforms all other state-of-the-art methods in both weakly and semi-supervised settings.
Tasks Semantic Segmentation, Semi-Supervised Semantic Segmentation, Weakly-Supervised Semantic Segmentation
Published 2019-10-12
URL https://arxiv.org/abs/1910.05475v2
PDF https://arxiv.org/pdf/1910.05475v2.pdf
PWC https://paperswithcode.com/paper/saliency-guided-self-attention-network-for
Repo https://github.com/yaoqi-zd/SGAN
Framework pytorch
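
One simple way to realize "integrate class-agnostic saliency priors into self-attention", assumed here purely for illustration: suppress affinities toward non-salient positions before the softmax, so attention cannot leak from objects into background. The paper's actual integration may differ.

```python
import torch

def saliency_guided_attention(feats, saliency):
    """feats: (B, C, N) flattened features; saliency: (B, N) in [0, 1]."""
    logits = feats.transpose(1, 2) @ feats            # (B, N, N) affinities
    logits = logits.masked_fill(saliency[:, None, :] < 0.5, float('-inf'))
    attn = torch.softmax(logits, dim=-1)              # only salient keys remain
    return feats @ attn.transpose(1, 2)               # refined features

feats = torch.randn(1, 8, 16)
sal = torch.ones(1, 16)
sal[:, :5] = 0                                        # background positions
print(saliency_guided_attention(feats, sal).shape)
```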

Direct Fitting of Gaussian Mixture Models

Title Direct Fitting of Gaussian Mixture Models
Authors Leonid Keselman, Martial Hebert
Abstract When fitting Gaussian Mixture Models to 3D geometry, the model is typically fit to point clouds, even when the shapes were obtained as 3D meshes. Here we present a formulation for fitting Gaussian Mixture Models (GMMs) directly to a triangular mesh instead of using points sampled from its surface. Part of this work analyzes a general formulation for evaluating likelihood of geometric objects. This modification enables fitting higher-quality GMMs under a wider range of initialization conditions. Additionally, models obtained from this fitting method are shown to produce an improvement in 3D registration for both meshes and RGB-D frames. This result is general and applicable to arbitrary geometric objects, including representing uncertainty from sensor measurements.
Tasks
Published 2019-04-11
URL https://arxiv.org/abs/1904.05537v2
PDF https://arxiv.org/pdf/1904.05537v2.pdf
PWC https://paperswithcode.com/paper/direct-fitting-of-gaussian-mixture-models
Repo https://github.com/leonidk/direct_gmm
Framework none
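
A rough sketch of the paper's direction: make the fit depend on the surface itself by weighting each triangle by its area, instead of sampling points from it. Reducing each triangle to an area-weighted centroid, as below, is a coarse approximation assumed for brevity; the paper evaluates the exact expected likelihood over each triangle.

```python
import numpy as np

def triangle_areas_and_centroids(vertices, faces):
    tri = vertices[faces]                              # (F, 3, 3) corner coords
    cross = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    areas = 0.5 * np.linalg.norm(cross, axis=1)        # triangle surface areas
    return areas, tri.mean(axis=1)

verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], float)
faces = np.array([[0, 1, 2], [1, 3, 2]])
areas, cents = triangle_areas_and_centroids(verts, faces)
mean = np.average(cents, axis=0, weights=areas)        # area-weighted statistics
print(areas, mean)
```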

Universal Adversarial Audio Perturbations

Title Universal Adversarial Audio Perturbations
Authors Sajjad Abdoli, Luiz G. Hafemann, Jerome Rony, Ismail Ben Ayed, Patrick Cardinal, Alessandro L. Koerich
Abstract We demonstrate the existence of universal adversarial perturbations, which can fool a family of audio classification architectures, for both targeted and untargeted attack scenarios. We propose two methods for finding such perturbations. The first method is based on an iterative, greedy approach that is well known in computer vision: it aggregates small perturbations to the input so as to push it to the decision boundary. The second method, which is the main contribution of this work, is a novel penalty formulation that finds targeted and untargeted universal adversarial perturbations. Unlike the greedy approach, the penalty method minimizes an appropriate objective function on a batch of samples and therefore produces more successful attacks when the number of training samples is limited. Moreover, we provide a proof that the proposed penalty method theoretically converges to a solution that corresponds to universal adversarial perturbations. We also demonstrate that it is possible to mount successful attacks using the penalty method when only one sample from the target dataset is available to the attacker. Experimental results on attacking five 1D CNN architectures show attack success rates higher than 85.4% and 83.1% for targeted and untargeted attacks, respectively, using the proposed penalty method.
Tasks Audio Classification
Published 2019-08-08
URL https://arxiv.org/abs/1908.03173v3
PDF https://arxiv.org/pdf/1908.03173v3.pdf
PWC https://paperswithcode.com/paper/universal-adversarial-audio-perturbations
Repo https://github.com/sajabdoli/UAP
Framework tf
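
The penalty formulation can be sketched as optimizing one shared perturbation over a batch so that every perturbed clip is misclassified, with a norm penalty keeping it small. The toy classifier, loss sign, and penalty weight are placeholders, and the sketch is in PyTorch although the official repo uses TensorFlow; the paper's exact objective differs in its details.

```python
import torch

def universal_perturbation(model, clips, labels, steps=100, lam=0.1, lr=1e-2):
    v = torch.zeros(clips.size(-1), requires_grad=True)   # one shared v
    opt = torch.optim.Adam([v], lr=lr)
    for _ in range(steps):
        logits = model(clips + v)                         # same v on each clip
        loss = -torch.nn.functional.cross_entropy(logits, labels)  # untargeted
        loss = loss + lam * v.norm()                      # keep v imperceptible
        opt.zero_grad(); loss.backward(); opt.step()
    return v.detach()

toy_model = torch.nn.Linear(8000, 10)                     # stand-in classifier
clips, labels = torch.randn(4, 8000), torch.randint(0, 10, (4,))
print(universal_perturbation(toy_model, clips, labels, steps=5).norm())
```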

Software and application patterns for explanation methods

Title Software and application patterns for explanation methods
Authors Maximilian Alber
Abstract Deep neural networks have successfully pervaded many application domains and are increasingly used in critical decision processes. Understanding their workings is desirable, or even required, to further foster their potential as well as to access sensitive domains like medical applications or autonomous driving. One key to this broader usage of explanation frameworks is the accessibility and understanding of the respective software. In this work we introduce software and application patterns for explanation techniques that aim to explain individual predictions of neural networks. We discuss how to code well-known algorithms efficiently within deep learning software frameworks and describe how to embed algorithms in downstream implementations. Building on this, we show how explanation methods can be used in applications to understand predictions for misclassified samples, to compare algorithms or networks, and to examine the focus of networks. Furthermore, we review available open-source packages and discuss the challenges that complex and evolving neural network structures pose to explanation algorithm development and implementation.
Tasks Autonomous Driving
Published 2019-04-09
URL http://arxiv.org/abs/1904.04734v1
PDF http://arxiv.org/pdf/1904.04734v1.pdf
PWC https://paperswithcode.com/paper/software-and-application-patterns-for
Repo https://github.com/albermax/interpretable_ai_book__sw_chapter
Framework none
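
A minimal example of the simplest pattern such a chapter covers: a gradient ("saliency") explanation of an individual prediction, computed by backpropagating the predicted class score to the input. The tiny model is a placeholder; packages like the author's iNNvestigate bundle many such methods.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 5))   # stand-in network
x = torch.randn(1, 10, requires_grad=True)
score = model(x).max()                                 # predicted class score
score.backward()                                       # gradient w.r.t. input
print(x.grad.abs())                                    # per-feature relevance
```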