Paper Group AWR 383
XNAS: Neural Architecture Search with Expert Advice
Title | XNAS: Neural Architecture Search with Expert Advice |
Authors | Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, Lihi Zelnik-Manor |
Abstract | This paper introduces a novel optimization method for differentiable neural architecture search, based on the theory of prediction with expert advice. Its optimization criterion is well suited to architecture selection, i.e., it minimizes the regret incurred by a sub-optimal selection of operations. Unlike previous search relaxations that require hard pruning of architectures, our method is designed to dynamically wipe out inferior architectures and enhance superior ones. It achieves an optimal worst-case regret bound and suggests the use of multiple learning rates, based on the amount of information carried by the backward gradients. Experiments show that our algorithm achieves strong performance across several image classification datasets. Specifically, it obtains an error rate of 1.6% on CIFAR-10 and 24% on ImageNet under mobile settings, and achieves state-of-the-art results on three additional datasets. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.08031v1 |
https://arxiv.org/pdf/1906.08031v1.pdf | |
PWC | https://paperswithcode.com/paper/xnas-neural-architecture-search-with-expert |
Repo | https://github.com/NivNayman/XNAS |
Framework | pytorch |
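
The expert-advice view above treats each candidate operation on an edge as an "expert" whose weight is updated multiplicatively from the backward gradient it receives. Below is a minimal sketch of such an exponentiated (Hedge-style) update, with a toy weight vector and gradient signal; it illustrates the general idea, not the authors' exact XNAS update rule.

```python
# Sketch: multiplicative-weights update over candidate operations.
# The tensors below are toy values, not taken from the paper.
import torch

def exponentiated_update(op_weights, op_grads, lr):
    """Hedge-style update of per-operation weights.

    op_weights: 1-D tensor of non-negative weights over candidate ops.
    op_grads:   gradient of the loss w.r.t. each op's contribution.
    """
    new_w = op_weights * torch.exp(-lr * op_grads)
    return new_w / new_w.sum()  # renormalize to a distribution

ops = torch.tensor([0.25, 0.25, 0.25, 0.25])  # uniform prior over 4 ops
grads = torch.tensor([0.3, -0.1, 0.8, 0.0])   # toy backward signals
print(exponentiated_update(ops, grads, lr=1.0))
```

Operations that repeatedly receive unfavorable gradients decay toward zero weight, which is the "dynamic wipe-out" behavior the abstract contrasts with hard pruning.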
Automated Steel Bar Counting and Center Localization with Convolutional Neural Networks
Title | Automated Steel Bar Counting and Center Localization with Convolutional Neural Networks |
Authors | Zhun Fan, Jiewei Lu, Benzhang Qiu, Tao Jiang, Kang An, Alex Noel Josephraj, Chuliang Wei |
Abstract | Automated steel bar counting and center localization play an important role in the factory automation of steel bars. Traditional methods focus only on steel bar counting, and their performance is often limited by complex industrial environments. Convolutional neural networks (CNNs), which have a great capability to deal with complex tasks in challenging environments, are applied in this work. A framework called CNN-DC is proposed to achieve automated steel bar counting and center localization simultaneously. CNN-DC first detects candidate center points with a deep CNN. An effective clustering algorithm named Distance Clustering (DC) is then proposed to cluster the candidate center points and locate the true centers of the steel bars. CNN-DC achieves 99.26% accuracy for steel bar counting and a 4.1% center offset for center localization on the established steel bar dataset, demonstrating that it performs well on both tasks. Code is made publicly available at: https://github.com/BenzhangQiu/Steel-bar-Detection. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00891v2 |
https://arxiv.org/pdf/1906.00891v2.pdf | |
PWC | https://paperswithcode.com/paper/190600891 |
Repo | https://github.com/BenzhangQiu/Steel-bar-Detection |
Framework | tf |
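
The DC step groups nearby candidate center points and reports each group's mean as a bar center. A minimal sketch of distance-based clustering in that spirit follows; the greedy grouping rule and the `radius` threshold are illustrative assumptions, not the authors' exact algorithm.

```python
# Sketch: greedy distance clustering of CNN-detected candidate centers.
import numpy as np

def distance_cluster(points, radius):
    """Greedily group candidate points; each cluster's mean is a bar center."""
    points = np.asarray(points, dtype=float)
    unassigned = list(range(len(points)))
    centers = []
    while unassigned:
        seed = unassigned.pop(0)
        member = [seed]
        # collect every remaining candidate within `radius` of the seed
        for idx in unassigned[:]:
            if np.linalg.norm(points[idx] - points[seed]) < radius:
                member.append(idx)
                unassigned.remove(idx)
        centers.append(points[member].mean(axis=0))
    return np.array(centers)

candidates = [(10, 10), (11, 9), (50, 52), (49, 50), (51, 51)]
print(distance_cluster(candidates, radius=5.0))  # -> two bar centers
```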
Multi-hop Convolutions on Weighted Graphs
Title | Multi-hop Convolutions on Weighted Graphs |
Authors | Qikui Zhu, Bo Du, Pingkun Yan |
Abstract | Graph Convolutional Networks (GCNs) have made significant advances in semi-supervised learning, especially for classification tasks. However, existing GCN-based methods have two main drawbacks. First, to increase the receptive field and improve the representation capability of GCNs, larger kernels or deeper network architectures are used, which greatly increases the computational complexity and the number of parameters. Second, methods working on higher-order graphs computed directly from adjacency matrices may alter the relationship between graph nodes, particularly for weighted graphs. In addition, the direct construction of higher-order graphs introduces redundant information, which may result in lower network performance. To address these weaknesses, we propose a new multi-hop convolutional network on weighted graphs. The proposed method consists of multiple convolutional branches, where each branch extracts node representations from a $k$-hop graph with small kernels. This design systematically aggregates multi-scale contextual information without adding redundant information. Furthermore, to efficiently combine the information extracted by the multi-hop branches, an adaptive weight computation (AWC) layer is proposed. We demonstrate the superiority of our MultiHop method on six publicly available datasets, including three citation network datasets and three medical image datasets. The experimental results show that MultiHop achieves the highest classification accuracy and outperforms the state-of-the-art methods. The source code of this work has been released on GitHub (https://github.com/ahukui/Multi-hop-Convolutions-on-Weighted-Graphs). |
Tasks | |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.04978v1 |
https://arxiv.org/pdf/1911.04978v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-hop-convolutions-on-weighted-graphs |
Repo | https://github.com/ahukui/Multi-hop-Convolutions-on-Weighted-Graphs |
Framework | none |
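
A minimal sketch of the branch-and-combine structure follows. Note one simplification: it uses plain adjacency powers as a stand-in for the paper's $k$-hop graphs (which are constructed specifically to avoid the distortions of direct adjacency powers on weighted graphs); the layer sizes and the softmax form of the AWC weights are also assumptions.

```python
# Sketch: multi-hop branches with an adaptive weighted combination.
import torch
import torch.nn as nn

class MultiHopGCN(nn.Module):
    def __init__(self, in_dim, out_dim, hops=3):
        super().__init__()
        self.hops = hops
        self.branches = nn.ModuleList(nn.Linear(in_dim, out_dim)
                                      for _ in range(hops))
        self.awc = nn.Parameter(torch.zeros(hops))  # adaptive branch weights

    def forward(self, x, adj):
        a_k = adj
        outs = []
        for k in range(self.hops):
            outs.append(self.branches[k](a_k @ x))  # conv on the k-hop graph
            a_k = a_k @ adj                         # move to the next hop
        w = torch.softmax(self.awc, dim=0)
        return sum(w[k] * outs[k] for k in range(self.hops))

x = torch.randn(5, 8)                   # 5 nodes, 8 features
adj = torch.eye(5)                      # toy (self-loop only) adjacency
print(MultiHopGCN(8, 4)(x, adj).shape)  # -> torch.Size([5, 4])
```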
Selective Style Transfer for Text
Title | Selective Style Transfer for Text |
Authors | Raul Gomez, Ali Furkan Biten, Lluis Gomez, Jaume Gibert, Marçal Rusiñol, Dimosthenis Karatzas |
Abstract | This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions. Results on different text domains (scene text, machine-printed text and handwritten text) and cross-modal results demonstrate that this is feasible and open up different research lines. Furthermore, two architectures for selective style transfer, i.e., transferring style only to the desired image pixels, are proposed. Finally, scene text selective style transfer is evaluated as a data augmentation technique to expand scene text detection datasets, resulting in a boost in text detector performance. Our implementation of the described models is publicly available. |
Tasks | Data Augmentation, Scene Text Detection, Style Transfer |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01466v1 |
https://arxiv.org/pdf/1906.01466v1.pdf | |
PWC | https://paperswithcode.com/paper/selective-style-transfer-for-text |
Repo | https://github.com/furkanbiten/SelectiveTextStyleTransfer |
Framework | tf |
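
The "selective" part can be pictured as blending a fully stylized image back into the original under a text mask, so that only text pixels change. The mask-blending form below is an illustrative assumption, not the paper's architectures (which learn the selectivity end to end).

```python
# Sketch: blend stylized output into the original under a text-pixel mask.
import numpy as np

def selective_blend(original, stylized, text_mask):
    """original, stylized: HxWx3 float arrays; text_mask: HxW in [0, 1]."""
    m = text_mask[..., None]             # broadcast mask over channels
    return m * stylized + (1.0 - m) * original

h, w = 4, 4
out = selective_blend(np.zeros((h, w, 3)), np.ones((h, w, 3)),
                      np.eye(h))         # toy mask: stylize the diagonal
print(out[..., 0])
```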
Structuring Latent Spaces for Stylized Response Generation
Title | Structuring Latent Spaces for Stylized Response Generation |
Authors | Xiang Gao, Yizhe Zhang, Sungjin Lee, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan |
Abstract | Generating responses in a targeted style is a useful yet challenging task, especially in the absence of parallel data. With limited data, existing methods tend to generate responses that are either less stylized or less context-relevant. We propose StyleFusion, which bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. This structure allows the system to generate stylized relevant responses by sampling in the neighborhood of the conversation model prediction, and continuously control the style level. We demonstrate this method using dialogues from Reddit data and two sets of sentences with distinct styles (arXiv and Sherlock Holmes novels). Automatic and human evaluation show that, without sacrificing appropriateness, the system generates responses of the targeted style and outperforms competitive baselines. |
Tasks | Style Transfer |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.05361v1 |
https://arxiv.org/pdf/1909.05361v1.pdf | |
PWC | https://paperswithcode.com/paper/structuring-latent-spaces-for-stylized |
Repo | https://github.com/golsun/StyleFusion |
Framework | none |
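
The abstract's "sampling in the neighborhood of the conversation model prediction" with a continuous style level can be sketched as interpolating the predicted latent vector toward a style region and adding local noise. The `alpha` knob and the vectors below are toy assumptions, not StyleFusion's learned structure.

```python
# Sketch: style-controlled sampling in a shared latent space.
import numpy as np

rng = np.random.default_rng(0)

def stylized_sample(z_conv, z_style_center, alpha, noise_scale=0.1):
    """Interpolate toward the style region, then perturb locally."""
    z = (1 - alpha) * z_conv + alpha * z_style_center
    return z + noise_scale * rng.standard_normal(z.shape)

z_conv = np.zeros(64)   # predicted response vector (toy)
z_style = np.ones(64)   # center of the target-style region (toy)
print(stylized_sample(z_conv, z_style, alpha=0.5)[:4])
```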
Learning to Perform Role-Filler Binding with Schematic Knowledge
Title | Learning to Perform Role-Filler Binding with Schematic Knowledge |
Authors | Catherine Chen, Qihong Lu, Andre Beukers, Christopher Baldassano, Kenneth A. Norman |
Abstract | Through specific experiences, humans learn structural relationships underlying events in the world. Generalizing knowledge of structural relationships to new situations requires dynamic role-filler binding, the ability to associate specific “fillers” with abstract “roles”. Previous work found that artificial neural networks can learn this ability when explicitly told what the roles and fillers are. We show that networks can learn these relationships even without explicitly labeled roles and fillers, and show that analyses inspired by neural decoding can provide a means of understanding what the networks have learned. |
Tasks | Question Answering |
Published | 2019-02-24 |
URL | https://arxiv.org/abs/1902.09006v2 |
https://arxiv.org/pdf/1902.09006v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-apply-schematic-knowledge-to |
Repo | https://github.com/cchen23/generalized_schema_learning |
Framework | tf |
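
The "analyses inspired by neural decoding" amount to checking whether filler identity can be read out from the network's hidden states. A minimal sketch with a nearest-centroid decoder on toy hidden states follows; the decoder choice and the synthetic data are assumptions for illustration.

```python
# Sketch: decode filler identity from hidden states (neural-decoding style).
import numpy as np

rng = np.random.default_rng(1)

def nearest_centroid_decode(train_h, train_y, test_h):
    labels = np.unique(train_y)
    centroids = np.stack([train_h[train_y == c].mean(axis=0) for c in labels])
    d = np.linalg.norm(test_h[:, None, :] - centroids[None], axis=-1)
    return labels[d.argmin(axis=1)]

# toy "hidden states": two fillers separated along one dimension
h = rng.standard_normal((100, 16)); y = (rng.random(100) > 0.5).astype(int)
h[:, 0] += 3 * y
pred = nearest_centroid_decode(h[:80], y[:80], h[80:])
print("decoding accuracy:", (pred == y[80:]).mean())
```

High decoding accuracy would indicate that the binding is represented, even when roles and fillers were never explicitly labeled during training.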
Jointly Optimizing Diversity and Relevance in Neural Response Generation
Title | Jointly Optimizing Diversity and Relevance in Neural Response Generation |
Authors | Xiang Gao, Sungjin Lee, Yizhe Zhang, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan |
Abstract | Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a SpaceFusion model to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance. |
Tasks | Chatbot, Dialogue Generation |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11205v3 |
http://arxiv.org/pdf/1902.11205v3.pdf | |
PWC | https://paperswithcode.com/paper/jointly-optimizing-diversity-and-relevance-in |
Repo | https://github.com/golsun/SpaceFusion |
Framework | none |
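
The fusion of the two latent spaces can be sketched as two regularizers: a distance term pulling the seq2seq prediction vector and the autoencoder vector of the same response together, and a smoothness term obtained by decoding random interpolations between them. The exact loss forms and weights below are assumptions, not SpaceFusion's published terms.

```python
# Sketch: regularizers that fuse a seq2seq space with an autoencoder space.
import torch

def fusion_regularizers(z_s2s, z_ae):
    """z_s2s: predicted-response vectors; z_ae: autoencoded-response vectors."""
    # term 1: fuse the two spaces by penalizing their distance
    d_fuse = (z_s2s - z_ae).pow(2).sum(dim=-1).mean()
    # term 2: encourage smoothness along random interpolations
    u = torch.rand(z_s2s.size(0), 1)
    z_interp = u * z_s2s + (1 - u) * z_ae  # would be fed to the decoder
    return d_fuse, z_interp

z1, z2 = torch.randn(8, 32), torch.randn(8, 32)
d, zi = fusion_regularizers(z1, z2)
print(d.item(), zi.shape)
```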
Decoupled Attention Network for Text Recognition
Title | Decoupled Attention Network for Text Recognition |
Authors | Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Canjie Luo, Xiaoxue Chen, Yaqiang Wu, Qianying Wang, Mingxiang Cai |
Abstract | Text recognition has attracted considerable research interest because of its various applications. Cutting-edge text recognition methods are based on attention mechanisms. However, most attention methods suffer from a serious alignment problem due to their recurrent alignment operation, where the alignment relies on historical decoding results. To remedy this issue, we propose a decoupled attention network (DAN), which decouples the alignment operation from historical decoding results. DAN is an effective, flexible and robust end-to-end text recognizer, which consists of three components: 1) a feature encoder that extracts visual features from the input image; 2) a convolutional alignment module that performs the alignment operation based on visual features from the encoder; and 3) a decoupled text decoder that makes the final prediction by jointly using the feature map and attention maps. Experimental results show that DAN achieves state-of-the-art performance on multiple text recognition tasks, including offline handwritten text recognition and regular/irregular scene text recognition. |
Tasks | Scene Text Recognition |
Published | 2019-12-21 |
URL | https://arxiv.org/abs/1912.10205v1 |
https://arxiv.org/pdf/1912.10205v1.pdf | |
PWC | https://paperswithcode.com/paper/decoupled-attention-network-for-text |
Repo | https://github.com/Canjie-Luo/Scene-Text-Image-Transformer |
Framework | pytorch |
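
The key decoupling is that attention maps are predicted from visual features alone, one map per decoding step, instead of from previous decoding results. A sketch follows; the single 1x1 convolution standing in for the alignment module and all shapes are illustrative assumptions, not DAN's actual architecture.

```python
# Sketch: convolutional alignment producing per-step attention maps,
# decoupled from the text decoder's history.
import torch
import torch.nn as nn

class ConvAlignment(nn.Module):
    def __init__(self, channels, max_steps):
        super().__init__()
        self.to_maps = nn.Conv2d(channels, max_steps, kernel_size=1)

    def forward(self, feats):              # feats: (B, C, H, W)
        maps = self.to_maps(feats)         # (B, T, H, W), one map per step
        b, t, h, w = maps.shape
        return torch.softmax(maps.view(b, t, -1), dim=-1).view(b, t, h, w)

feats = torch.randn(2, 64, 8, 32)
attn = ConvAlignment(64, max_steps=25)(feats)
# per-step context vectors for the decoupled decoder:
context = torch.einsum('bthw,bchw->btc', attn, feats)
print(context.shape)                       # torch.Size([2, 25, 64])
```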
Bidirectional Scene Text Recognition with a Single Decoder
Title | Bidirectional Scene Text Recognition with a Single Decoder |
Authors | Maurits Bleeker, Maarten de Rijke |
Abstract | Scene Text Recognition (STR) is the problem of recognizing the correct word or character sequence in a cropped word image. To obtain more robust output sequences, the notion of bidirectional STR has been introduced. So far, bidirectional STR has been implemented with two separate decoders: one for left-to-right decoding and one for right-to-left. Having two separate decoders for almost the same task with the same output space is undesirable from a computational and optimization point of view. We introduce the bidirectional Scene Text Transformer (Bi-STET), a novel bidirectional STR method with a single decoder for bidirectional text decoding. With its single decoder, Bi-STET outperforms methods that apply bidirectional decoding with two separate decoders while also being more efficient than those methods. Furthermore, we achieve or beat state-of-the-art (SOTA) methods on all STR benchmarks with Bi-STET. Finally, we provide analyses and insights into the performance of Bi-STET. |
Tasks | Scene Text Recognition |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.03656v2 |
https://arxiv.org/pdf/1912.03656v2.pdf | |
PWC | https://paperswithcode.com/paper/bidirectional-scene-text-recognition-with-a |
Repo | https://github.com/MauritsBleeker/Bi-STET |
Framework | pytorch |
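
One way to get both directions out of a single decoder is to condition it on a learned direction embedding and feed the target sequence in either order. The tiny GRU decoder below is an assumption for brevity (Bi-STET itself is transformer-based); it only shows the single-decoder, shared-weights idea.

```python
# Sketch: one decoder serving both reading directions via a direction embedding.
import torch
import torch.nn as nn

class DirectionalDecoder(nn.Module):
    def __init__(self, vocab, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.direction = nn.Embedding(2, dim)   # 0: L2R, 1: R2L
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens, direction):
        x = self.embed(tokens) + self.direction(direction)[:, None, :]
        h, _ = self.rnn(x)
        return self.out(h)

dec = DirectionalDecoder(vocab=40)
tokens = torch.randint(0, 40, (2, 7))
l2r = dec(tokens, torch.zeros(2, dtype=torch.long))
r2l = dec(tokens.flip(1), torch.ones(2, dtype=torch.long))  # same weights
print(l2r.shape, r2l.shape)
```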
Pyramid Mask Text Detector
Title | Pyramid Mask Text Detector |
Authors | Jingchao Liu, Xuebo Liu, Jie Sheng, Ding Liang, Xin Li, Qingjie Liu |
Abstract | Scene text detection, an essential step of scene text recognition systems, is to automatically locate text instances in natural scene images. Some recent attempts, benefiting from Mask R-CNN, formulate scene text detection as an instance segmentation problem and achieve remarkable performance. In this paper, we present a new Mask R-CNN based framework named Pyramid Mask Text Detector (PMTD) to handle scene text detection. Instead of the binary text mask generated by existing Mask R-CNN based methods, our PMTD performs pixel-level regression under the guidance of location-aware supervision, yielding a more informative soft text mask for each text instance. For the generation of text boxes, PMTD reinterprets the obtained 2D soft mask in 3D space and introduces a novel plane clustering algorithm to derive the optimal text box from the 3D shape. Experiments on standard datasets demonstrate that the proposed PMTD brings consistent and noticeable gains and clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 80.13% on the ICDAR 2017 MLT dataset. |
Tasks | Instance Segmentation, Scene Text Detection, Scene Text Recognition, Semantic Segmentation |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.11800v1 |
http://arxiv.org/pdf/1903.11800v1.pdf | |
PWC | https://paperswithcode.com/paper/pyramid-mask-text-detector |
Repo | https://github.com/STVIR/PMTD |
Framework | none |
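
Reading the soft mask as a height field over the image plane, the box edges fall where the pyramid's side faces meet the ground plane. The sketch below fits a single plane to toy mask values by least squares, as a simplified stand-in for the paper's plane clustering step.

```python
# Sketch: least-squares plane fit to soft-mask values, treating the mask
# as a 3D surface z = a*x + b*y + c over pixel coordinates (x, y).
import numpy as np

def fit_plane(xs, ys, zs):
    """Fit z = a*x + b*y + c to mask values z at pixel coordinates (x, y)."""
    A = np.stack([xs, ys, np.ones_like(xs)], axis=1)
    coef, *_ = np.linalg.lstsq(A, zs, rcond=None)
    return coef  # (a, b, c); the plane's z = 0 line marks a box edge

# toy soft mask: values rise linearly from the left edge of the instance
xs, ys = np.meshgrid(np.arange(10.0), np.arange(5.0))
zs = 0.1 * xs
print(fit_plane(xs.ravel(), ys.ravel(), zs.ravel()))  # ~ (0.1, 0.0, 0.0)
```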
Multi-scale self-guided attention for medical image segmentation
Title | Multi-scale self-guided attention for medical image segmentation |
Authors | Ashish Sinha, Jose Dolz |
Abstract | Even though convolutional neural networks (CNNs) are driving progress in medical image segmentation, standard models still have some drawbacks. First, the use of multi-scale approaches, i.e., encoder-decoder architectures, leads to a redundant use of information, where similar low-level features are extracted multiple times at multiple scales. Second, long-range feature dependencies are not efficiently modeled, resulting in non-optimal discriminative feature representations for each semantic class. In this paper we attempt to overcome these limitations with the proposed architecture, which captures richer contextual dependencies through guided self-attention mechanisms. This approach is able to integrate local features with their corresponding global dependencies, as well as highlight interdependent channel maps in an adaptive manner. Further, an additional loss between the different modules guides the attention mechanisms to ignore irrelevant information and focus on the more discriminative regions of the image by emphasizing relevant feature associations. We evaluate the proposed model in the context of semantic segmentation on three different datasets: abdominal organs, cardiovascular structures and brain tumors. A series of ablation experiments supports the importance of these attention modules in the proposed architecture. In addition, compared to other state-of-the-art segmentation networks, our model yields better segmentation performance, increasing the accuracy of the predictions while reducing the standard deviation. This demonstrates the efficiency of our approach to generate precise and reliable automatic segmentations of medical images. Our code is made publicly available at https://github.com/sinAshish/Multi-Scale-Attention |
Tasks | Attentive segmentation networks, Deep Attention, Medical Image Segmentation, Semantic Segmentation |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.02849v3 |
https://arxiv.org/pdf/1906.02849v3.pdf | |
PWC | https://paperswithcode.com/paper/multi-scale-guided-attention-for-medical |
Repo | https://github.com/sinAshish/Multi-Scale-Attention |
Framework | pytorch |
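
The spatial (position) self-attention block that such architectures build on has a standard non-local form: every location attends to every other. A minimal sketch follows, with channel sizes as assumptions; the paper's guided variant adds supervision on top of blocks like this.

```python
# Sketch: a standard position (spatial) self-attention block.
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)
        self.k = nn.Conv2d(channels, channels // 8, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual scale

    def forward(self, x):                                 # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.q(x).view(b, -1, h * w).transpose(1, 2)  # (B, HW, C')
        k = self.k(x).view(b, -1, h * w)                  # (B, C', HW)
        attn = torch.softmax(q @ k, dim=-1)               # (B, HW, HW)
        v = self.v(x).view(b, c, h * w)                   # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

print(PositionAttention(32)(torch.randn(1, 32, 16, 16)).shape)
```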
Saliency Guided Self-attention Network for Weakly and Semi-supervised Semantic Segmentation
Title | Saliency Guided Self-attention Network for Weakly and Semi-supervised Semantic Segmentation |
Authors | Qi Yao, Xiaojin Gong |
Abstract | Weakly supervised semantic segmentation (WSSS) using only image-level labels can greatly reduce the annotation cost and has therefore attracted considerable research interest. However, its performance is still inferior to that of fully supervised counterparts. To mitigate the performance gap, we propose a saliency guided self-attention network (SGAN) to address the WSSS problem. The introduced self-attention mechanism is able to capture rich and extensive contextual information, but may mis-spread attention to unexpected regions. To enable this mechanism to work effectively under weak supervision, we integrate class-agnostic saliency priors into the self-attention mechanism and utilize class-specific attention cues as additional supervision for SGAN. Our SGAN is able to produce dense and accurate localization cues, so that the segmentation performance is boosted. Moreover, by simply replacing the additional supervision with partially labeled ground truth, SGAN also works effectively for semi-supervised semantic segmentation. Experiments on the PASCAL VOC 2012 and COCO datasets show that our approach outperforms all other state-of-the-art methods in both weakly and semi-supervised settings. |
Tasks | Semantic Segmentation, Semi-Supervised Semantic Segmentation, Weakly-Supervised Semantic Segmentation |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.05475v2 |
https://arxiv.org/pdf/1910.05475v2.pdf | |
PWC | https://paperswithcode.com/paper/saliency-guided-self-attention-network-for |
Repo | https://github.com/yaoqi-zd/SGAN |
Framework | pytorch |
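
One simple way to inject a class-agnostic saliency prior into self-attention is to damp attention weights toward background locations and renormalize, so attention cannot mis-spread outside salient regions. The multiplicative masking below is an illustrative assumption, not SGAN's exact integration.

```python
# Sketch: restrict self-attention with a class-agnostic saliency prior.
import torch

def saliency_guided_attention(attn, saliency):
    """attn: (B, HW, HW) raw attention; saliency: (B, HW) in [0, 1]."""
    masked = attn * saliency[:, None, :]   # damp attention to background
    return masked / masked.sum(dim=-1, keepdim=True).clamp_min(1e-8)

attn = torch.rand(1, 6, 6)
sal = torch.tensor([[1., 1., 1., 0., 0., 0.]])  # toy: first half is salient
print(saliency_guided_attention(attn, sal))
```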
Direct Fitting of Gaussian Mixture Models
Title | Direct Fitting of Gaussian Mixture Models |
Authors | Leonid Keselman, Martial Hebert |
Abstract | When fitting Gaussian Mixture Models to 3D geometry, the model is typically fit to point clouds, even when the shapes were obtained as 3D meshes. Here we present a formulation for fitting Gaussian Mixture Models (GMMs) directly to a triangular mesh instead of using points sampled from its surface. Part of this work analyzes a general formulation for evaluating likelihood of geometric objects. This modification enables fitting higher-quality GMMs under a wider range of initialization conditions. Additionally, models obtained from this fitting method are shown to produce an improvement in 3D registration for both meshes and RGB-D frames. This result is general and applicable to arbitrary geometric objects, including representing uncertainty from sensor measurements. |
Tasks | |
Published | 2019-04-11 |
URL | https://arxiv.org/abs/1904.05537v2 |
https://arxiv.org/pdf/1904.05537v2.pdf | |
PWC | https://paperswithcode.com/paper/direct-fitting-of-gaussian-mixture-models |
Repo | https://github.com/leonidk/direct_gmm |
Framework | none |
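
The starting point for fitting a GMM to a mesh rather than to sampled points is that each triangle contributes area-weighted moments. The sketch below computes per-triangle centroids and areas; using only the centroid (first moment) is a simplification, since the paper evaluates the likelihood integrated over the full triangle.

```python
# Sketch: per-triangle centroid and area from an indexed triangle mesh,
# the area-weighted statistics a mesh-aware GMM fit builds on.
import numpy as np

def triangle_stats(vertices, faces):
    """vertices: (V, 3) float array; faces: (F, 3) int array of indices."""
    tri = vertices[faces]                   # (F, 3, 3)
    centroids = tri.mean(axis=1)
    e1, e2 = tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]
    areas = 0.5 * np.linalg.norm(np.cross(e1, e2), axis=1)
    return centroids, areas

v = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
f = np.array([[0, 1, 2], [1, 3, 2]])
c, a = triangle_stats(v, f)
print(c, a)   # areas sum to 1.0 for this unit square
```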
Universal Adversarial Audio Perturbations
Title | Universal Adversarial Audio Perturbations |
Authors | Sajjad Abdoli, Luiz G. Hafemann, Jerome Rony, Ismail Ben Ayed, Patrick Cardinal, Alessandro L. Koerich |
Abstract | We demonstrate the existence of universal adversarial perturbations, which can fool a family of audio classification architectures, for both targeted and untargeted attack scenarios. We propose two methods for finding such perturbations. The first method is based on an iterative, greedy approach that is well known in computer vision: it aggregates small perturbations to the input so as to push it to the decision boundary. The second method, which is the main contribution of this work, is a novel penalty formulation that finds targeted and untargeted universal adversarial perturbations. Unlike the greedy approach, the penalty method minimizes an appropriate objective function on a batch of samples, and therefore produces more successful attacks when the number of training samples is limited. Moreover, we prove that the proposed penalty method theoretically converges to a solution that corresponds to universal adversarial perturbations. We also demonstrate that successful attacks can be mounted with the penalty method when only one sample from the target dataset is available to the attacker. Experimental results on attacking five 1D CNN architectures show attack success rates higher than 85.4% and 83.1% for targeted and untargeted attacks, respectively, using the proposed penalty method. |
Tasks | Audio Classification |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.03173v3 |
https://arxiv.org/pdf/1908.03173v3.pdf | |
PWC | https://paperswithcode.com/paper/universal-adversarial-audio-perturbations |
Repo | https://github.com/sajabdoli/UAP |
Framework | tf |
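
The penalty formulation optimizes a single perturbation over a whole batch, trading classification loss against perturbation size. A minimal sketch of an untargeted variant follows; the toy classifier, loss weights and step counts are assumptions, not the paper's models or exact objective.

```python
# Sketch: penalty-based universal adversarial perturbation on a batch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(100, 5))  # toy audio classifier
x = torch.randn(16, 1, 100)                             # batch of waveforms
y = torch.randint(0, 5, (16,))

v = torch.zeros(1, 1, 100, requires_grad=True)          # universal perturbation
opt = torch.optim.Adam([v], lr=0.01)
for _ in range(100):
    # untargeted attack: maximize batch cross-entropy (minimize its negative)
    loss = -nn.functional.cross_entropy(model(x + v), y)
    penalty = 0.1 * v.norm()                            # keep v small
    opt.zero_grad(); (loss + penalty).backward(); opt.step()
print(v.abs().max().item())
```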
Software and application patterns for explanation methods
Title | Software and application patterns for explanation methods |
Authors | Maximilian Alber |
Abstract | Deep neural networks have successfully pervaded many application domains and are increasingly used in critical decision processes. Understanding their workings is desirable or even required to further foster their potential, as well as to access sensitive domains like medical applications or autonomous driving. One key to the broader usage of explanation frameworks is the accessibility and understanding of the respective software. In this work we introduce software and application patterns for explanation techniques that aim to explain individual predictions of neural networks. We discuss how to implement well-known algorithms efficiently within deep learning software frameworks and describe how to embed them in downstream implementations. Building on this, we show how explanation methods can be used in applications to understand predictions for misclassified samples, to compare algorithms or networks, and to examine the focus of networks. Furthermore, we review available open-source packages and discuss challenges posed by complex and evolving neural network structures to the development and implementation of explanation algorithms. |
Tasks | Autonomous Driving |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04734v1 |
http://arxiv.org/pdf/1904.04734v1.pdf | |
PWC | https://paperswithcode.com/paper/software-and-application-patterns-for |
Repo | https://github.com/albermax/interpretable_ai_book__sw_chapter |
Framework | none |
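
The simplest pattern such chapters build on is a gradient (saliency) map for one prediction, computed with autograd. A minimal sketch follows; the toy model is an assumption, but the pattern carries over unchanged to real networks.

```python
# Sketch: gradient saliency for a single prediction via autograd.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.randn(1, 1, 28, 28, requires_grad=True)
score = model(x)[0].max()          # score of the predicted class
score.backward()                   # gradients flow back to the input
saliency = x.grad.abs().squeeze()  # per-pixel relevance estimate
print(saliency.shape)              # torch.Size([28, 28])
```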