Paper Group AWR 383
XNAS: Neural Architecture Search with Expert Advice
Title | XNAS: Neural Architecture Search with Expert Advice |
Authors | Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, Lihi Zelnik-Manor |
Abstract | This paper introduces a novel optimization method for differentiable neural architecture search, based on the theory of prediction with expert advice. Its optimization criterion is well suited to architecture selection, i.e., it minimizes the regret incurred by a sub-optimal selection of operations. Unlike previous search relaxations that require hard pruning of architectures, our method is designed to dynamically wipe out inferior architectures and enhance superior ones. It achieves an optimal worst-case regret bound and suggests the use of multiple learning rates, based on the amount of information carried by the backward gradients. Experiments show that our algorithm achieves strong performance across several image classification datasets. Specifically, it obtains an error rate of 1.6% on CIFAR-10 and 24% on ImageNet under mobile settings, and achieves state-of-the-art results on three additional datasets. |
Tasks | Image Classification, Neural Architecture Search |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.08031v1 |
https://arxiv.org/pdf/1906.08031v1.pdf | |
PWC | https://paperswithcode.com/paper/xnas-neural-architecture-search-with-expert |
Repo | https://github.com/NivNayman/XNAS |
Framework | pytorch |
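
The expert-advice view above treats each candidate operation on an edge as an "expert" whose weight is updated multiplicatively from the backward gradient it receives. Below is a minimal sketch of such an exponentiated (Hedge-style) update, with a toy weight vector and gradient signal; it illustrates the general idea, not the authors' exact XNAS update rule.

```python
# Sketch: multiplicative-weights update over candidate operations.
# The tensors below are toy values, not taken from the paper.
import torch

def exponentiated_update(op_weights, op_grads, lr):
    """Hedge-style update of per-operation weights.

    op_weights: 1-D tensor of non-negative weights over candidate ops.
    op_grads:   gradient of the loss w.r.t. each op's contribution.
    """
    new_w = op_weights * torch.exp(-lr * op_grads)
    return new_w / new_w.sum()  # renormalize to a distribution

ops = torch.tensor([0.25, 0.25, 0.25, 0.25])  # uniform prior over 4 ops
grads = torch.tensor([0.3, -0.1, 0.8, 0.0])   # toy backward signals
print(exponentiated_update(ops, grads, lr=1.0))
```

Operations that repeatedly receive unfavorable gradients decay toward zero weight, which is the "dynamic wipe-out" behavior the abstract contrasts with hard pruning.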
Automated Steel Bar Counting and Center Localization with Convolutional Neural Networks
Title | Automated Steel Bar Counting and Center Localization with Convolutional Neural Networks |
Authors | Zhun Fan, Jiewei Lu, Benzhang Qiu, Tao Jiang, Kang An, Alex Noel Josephraj, Chuliang Wei |
Abstract | Automated steel bar counting and center localization play an important role in the factory automation of steel bars. Traditional methods focus only on steel bar counting, and their performance is often limited by complex industrial environments. Convolutional neural networks (CNNs), which have a great capability to deal with complex tasks in challenging environments, are applied in this work. A framework called CNN-DC is proposed to achieve automated steel bar counting and center localization simultaneously. CNN-DC first detects candidate center points with a deep CNN. An effective clustering algorithm named Distance Clustering (DC) is then proposed to cluster the candidate center points and locate the true centers of the steel bars. CNN-DC achieves 99.26% accuracy for steel bar counting and a 4.1% center offset for center localization on the established steel bar dataset, demonstrating that it performs well on both tasks. Code is made publicly available at: https://github.com/BenzhangQiu/Steel-bar-Detection. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00891v2 |
https://arxiv.org/pdf/1906.00891v2.pdf | |
PWC | https://paperswithcode.com/paper/190600891 |
Repo | https://github.com/BenzhangQiu/Steel-bar-Detection |
Framework | tf |
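
The DC step groups nearby candidate center points and reports each group's mean as a bar center. A minimal sketch of distance-based clustering in that spirit follows; the greedy grouping rule and the `radius` threshold are illustrative assumptions, not the authors' exact algorithm.

```python
# Sketch: greedy distance clustering of CNN-detected candidate centers.
import numpy as np

def distance_cluster(points, radius):
    """Greedily group candidate points; each cluster's mean is a bar center."""
    points = np.asarray(points, dtype=float)
    unassigned = list(range(len(points)))
    centers = []
    while unassigned:
        seed = unassigned.pop(0)
        member = [seed]
        # collect every remaining candidate within `radius` of the seed
        for idx in unassigned[:]:
            if np.linalg.norm(points[idx] - points[seed]) < radius:
                member.append(idx)
                unassigned.remove(idx)
        centers.append(points[member].mean(axis=0))
    return np.array(centers)

candidates = [(10, 10), (11, 9), (50, 52), (49, 50), (51, 51)]
print(distance_cluster(candidates, radius=5.0))  # -> two bar centers
```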
Multi-hop Convolutions on Weighted Graphs
Title | Multi-hop Convolutions on Weighted Graphs |
Authors | Qikui Zhu, Bo Du, Pingkun Yan |
Abstract | Graph Convolutional Networks (GCNs) have made significant advances in semi-supervised learning, especially for classification tasks. However, existing GCN-based methods have two main drawbacks. First, to increase the receptive field and improve the representation capability of GCNs, larger kernels or deeper network architectures are used, which greatly increases the computational complexity and the number of parameters. Second, methods working on higher-order graphs computed directly from adjacency matrices may alter the relationship between graph nodes, particularly for weighted graphs. In addition, the direct construction of higher-order graphs introduces redundant information, which may result in lower network performance. To address these weaknesses, we propose a new multi-hop convolutional network on weighted graphs. The proposed method consists of multiple convolutional branches, where each branch extracts node representations from a $k$-hop graph with small kernels. This design systematically aggregates multi-scale contextual information without adding redundant information. Furthermore, to efficiently combine the information extracted by the multi-hop branches, an adaptive weight computation (AWC) layer is proposed. We demonstrate the superiority of our MultiHop method on six publicly available datasets, including three citation network datasets and three medical image datasets. The experimental results show that MultiHop achieves the highest classification accuracy and outperforms the state-of-the-art methods. The source code of this work has been released on GitHub (https://github.com/ahukui/Multi-hop-Convolutions-on-Weighted-Graphs). |
Tasks | |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.04978v1 |
https://arxiv.org/pdf/1911.04978v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-hop-convolutions-on-weighted-graphs |
Repo | https://github.com/ahukui/Multi-hop-Convolutions-on-Weighted-Graphs |
Framework | none |
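
A minimal sketch of the branch-and-combine structure follows. Note one simplification: it uses plain adjacency powers as a stand-in for the paper's $k$-hop graphs (which are constructed specifically to avoid the distortions of direct adjacency powers on weighted graphs); the layer sizes and the softmax form of the AWC weights are also assumptions.

```python
# Sketch: multi-hop branches with an adaptive weighted combination.
import torch
import torch.nn as nn

class MultiHopGCN(nn.Module):
    def __init__(self, in_dim, out_dim, hops=3):
        super().__init__()
        self.hops = hops
        self.branches = nn.ModuleList(nn.Linear(in_dim, out_dim)
                                      for _ in range(hops))
        self.awc = nn.Parameter(torch.zeros(hops))  # adaptive branch weights

    def forward(self, x, adj):
        a_k = adj
        outs = []
        for k in range(self.hops):
            outs.append(self.branches[k](a_k @ x))  # conv on the k-hop graph
            a_k = a_k @ adj                         # move to the next hop
        w = torch.softmax(self.awc, dim=0)
        return sum(w[k] * outs[k] for k in range(self.hops))

x = torch.randn(5, 8)                   # 5 nodes, 8 features
adj = torch.eye(5)                      # toy (self-loop only) adjacency
print(MultiHopGCN(8, 4)(x, adj).shape)  # -> torch.Size([5, 4])
```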
Selective Style Transfer for Text
Title | Selective Style Transfer for Text |
Authors | Raul Gomez, Ali Furkan Biten, Lluis Gomez, Jaume Gibert, Marçal Rusiñol, Dimosthenis Karatzas |
Abstract | This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions. Results on different text domains (scene text, machine-printed text and handwritten text) and cross-modal results demonstrate that this is feasible and open up different research lines. Furthermore, two architectures for selective style transfer, i.e., transferring style only to the desired image pixels, are proposed. Finally, scene text selective style transfer is evaluated as a data augmentation technique to expand scene text detection datasets, resulting in a boost in text detector performance. Our implementation of the described models is publicly available. |
Tasks | Data Augmentation, Scene Text Detection, Style Transfer |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01466v1 |
https://arxiv.org/pdf/1906.01466v1.pdf | |
PWC | https://paperswithcode.com/paper/selective-style-transfer-for-text |
Repo | https://github.com/furkanbiten/SelectiveTextStyleTransfer |
Framework | tf |
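
The "selective" part can be pictured as blending a fully stylized image back into the original under a text mask, so that only text pixels change. The mask-blending form below is an illustrative assumption, not the paper's architectures (which learn the selectivity end to end).

```python
# Sketch: blend stylized output into the original under a text-pixel mask.
import numpy as np

def selective_blend(original, stylized, text_mask):
    """original, stylized: HxWx3 float arrays; text_mask: HxW in [0, 1]."""
    m = text_mask[..., None]             # broadcast mask over channels
    return m * stylized + (1.0 - m) * original

h, w = 4, 4
out = selective_blend(np.zeros((h, w, 3)), np.ones((h, w, 3)),
                      np.eye(h))         # toy mask: stylize the diagonal
print(out[..., 0])
```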
Structuring Latent Spaces for Stylized Response Generation
Title | Structuring Latent Spaces for Stylized Response Generation |
Authors | Xiang Gao, Yizhe Zhang, Sungjin Lee, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan |
Abstract | Generating responses in a targeted style is a useful yet challenging task, especially in the absence of parallel data. With limited data, existing methods tend to generate responses that are either less stylized or less context-relevant. We propose StyleFusion, which bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. This structure allows the system to generate stylized relevant responses by sampling in the neighborhood of the conversation model prediction, and continuously control the style level. We demonstrate this method using dialogues from Reddit data and two sets of sentences with distinct styles (arXiv and Sherlock Holmes novels). Automatic and human evaluation show that, without sacrificing appropriateness, the system generates responses of the targeted style and outperforms competitive baselines. |
Tasks | Style Transfer |
Published | 2019-09-03 |
URL | https://arxiv.org/abs/1909.05361v1 |
https://arxiv.org/pdf/1909.05361v1.pdf | |
PWC | https://paperswithcode.com/paper/structuring-latent-spaces-for-stylized |
Repo | https://github.com/golsun/StyleFusion |
Framework | none |
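
The abstract's "sampling in the neighborhood of the conversation model prediction" with a continuous style level can be sketched as interpolating the predicted latent vector toward a style region and adding local noise. The `alpha` knob and the vectors below are toy assumptions, not StyleFusion's learned structure.

```python
# Sketch: style-controlled sampling in a shared latent space.
import numpy as np

rng = np.random.default_rng(0)

def stylized_sample(z_conv, z_style_center, alpha, noise_scale=0.1):
    """Interpolate toward the style region, then perturb locally."""
    z = (1 - alpha) * z_conv + alpha * z_style_center
    return z + noise_scale * rng.standard_normal(z.shape)

z_conv = np.zeros(64)   # predicted response vector (toy)
z_style = np.ones(64)   # center of the target-style region (toy)
print(stylized_sample(z_conv, z_style, alpha=0.5)[:4])
```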
Learning to Perform Role-Filler Binding with Schematic Knowledge
Title | Learning to Perform Role-Filler Binding with Schematic Knowledge |
Authors | Catherine Chen, Qihong Lu, Andre Beukers, Christopher Baldassano, Kenneth A. Norman |
Abstract | Through specific experiences, humans learn structural relationships underlying events in the world. Generalizing knowledge of structural relationships to new situations requires dynamic role-filler binding, the ability to associate specific “fillers” with abstract “roles”. Previous work found that artificial neural networks can learn this ability when explicitly told what the roles and fillers are. We show that networks can learn these relationships even without explicitly labeled roles and fillers, and show that analyses inspired by neural decoding can provide a means of understanding what the networks have learned. |
Tasks | Question Answering |
Published | 2019-02-24 |
URL | https://arxiv.org/abs/1902.09006v2 |
https://arxiv.org/pdf/1902.09006v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-apply-schematic-knowledge-to |
Repo | https://github.com/cchen23/generalized_schema_learning |
Framework | tf |
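
The "analyses inspired by neural decoding" amount to checking whether filler identity can be read out from the network's hidden states. A minimal sketch with a nearest-centroid decoder on toy hidden states follows; the decoder choice and the synthetic data are assumptions for illustration.

```python
# Sketch: decode filler identity from hidden states (neural-decoding style).
import numpy as np

rng = np.random.default_rng(1)

def nearest_centroid_decode(train_h, train_y, test_h):
    labels = np.unique(train_y)
    centroids = np.stack([train_h[train_y == c].mean(axis=0) for c in labels])
    d = np.linalg.norm(test_h[:, None, :] - centroids[None], axis=-1)
    return labels[d.argmin(axis=1)]

# toy "hidden states": two fillers separated along one dimension
h = rng.standard_normal((100, 16)); y = (rng.random(100) > 0.5).astype(int)
h[:, 0] += 3 * y
pred = nearest_centroid_decode(h[:80], y[:80], h[80:])
print("decoding accuracy:", (pred == y[80:]).mean())
```

High decoding accuracy would indicate that the binding is represented, even when roles and fillers were never explicitly labeled during training.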
Jointly Optimizing Diversity and Relevance in Neural Response Generation
Title | Jointly Optimizing Diversity and Relevance in Neural Response Generation |
Authors | Xiang Gao, Sungjin Lee, Yizhe Zhang, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan |
Abstract | Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a SpaceFusion model to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance. |
Tasks | Chatbot, Dialogue Generation |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11205v3 |
http://arxiv.org/pdf/1902.11205v3.pdf | |
PWC | https://paperswithcode.com/paper/jointly-optimizing-diversity-and-relevance-in |
Repo | https://github.com/golsun/SpaceFusion |
Framework | none |
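
The fusion of the two latent spaces can be sketched as two regularizers: a distance term pulling the seq2seq prediction vector and the autoencoder vector of the same response together, and a smoothness term obtained by decoding random interpolations between them. The exact loss forms and weights below are assumptions, not SpaceFusion's published terms.

```python
# Sketch: regularizers that fuse a seq2seq space with an autoencoder space.
import torch

def fusion_regularizers(z_s2s, z_ae):
    """z_s2s: predicted-response vectors; z_ae: autoencoded-response vectors."""
    # term 1: fuse the two spaces by penalizing their distance
    d_fuse = (z_s2s - z_ae).pow(2).sum(dim=-1).mean()
    # term 2: encourage smoothness along random interpolations
    u = torch.rand(z_s2s.size(0), 1)
    z_interp = u * z_s2s + (1 - u) * z_ae  # would be fed to the decoder
    return d_fuse, z_interp

z1, z2 = torch.randn(8, 32), torch.randn(8, 32)
d, zi = fusion_regularizers(z1, z2)
print(d.item(), zi.shape)
```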
Decoupled Attention Network for Text Recognition
Title | Decoupled Attention Network for Text Recognition |
Authors | Tianwei Wang, Yuanzhi Zhu, Lianwen Jin, Canjie Luo, Xiaoxue Chen, Yaqiang Wu, Qianying Wang, Mingxiang Cai |
Abstract | Text recognition has attracted considerable research interest because of its various applications. Cutting-edge text recognition methods are based on attention mechanisms. However, most attention methods suffer from a serious alignment problem due to their recurrent alignment operation, where the alignment relies on historical decoding results. To remedy this issue, we propose a decoupled attention network (DAN), which decouples the alignment operation from historical decoding results. DAN is an effective, flexible and robust end-to-end text recognizer, which consists of three components: 1) a feature encoder that extracts visual features from the input image; 2) a convolutional alignment module that performs the alignment operation based on visual features from the encoder; and 3) a decoupled text decoder that makes the final prediction by jointly using the feature map and attention maps. Experimental results show that DAN achieves state-of-the-art performance on multiple text recognition tasks, including offline handwritten text recognition and regular/irregular scene text recognition. |
Tasks | Scene Text Recognition |
Published | 2019-12-21 |
URL | https://arxiv.org/abs/1912.10205v1 |
https://arxiv.org/pdf/1912.10205v1.pdf | |
PWC | https://paperswithcode.com/paper/decoupled-attention-network-for-text |
Repo | https://github.com/Canjie-Luo/Scene-Text-Image-Transformer |
Framework | pytorch |
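
The key decoupling is that attention maps are predicted from visual features alone, one map per decoding step, instead of from previous decoding results. A sketch follows; the single 1x1 convolution standing in for the alignment module and all shapes are illustrative assumptions, not DAN's actual architecture.

```python
# Sketch: convolutional alignment producing per-step attention maps,
# decoupled from the text decoder's history.
import torch
import torch.nn as nn

class ConvAlignment(nn.Module):
    def __init__(self, channels, max_steps):
        super().__init__()
        self.to_maps = nn.Conv2d(channels, max_steps, kernel_size=1)

    def forward(self, feats):              # feats: (B, C, H, W)
        maps = self.to_maps(feats)         # (B, T, H, W), one map per step
        b, t, h, w = maps.shape
        return torch.softmax(maps.view(b, t, -1), dim=-1).view(b, t, h, w)

feats = torch.randn(2, 64, 8, 32)
attn = ConvAlignment(64, max_steps=25)(feats)
# per-step context vectors for the decoupled decoder:
context = torch.einsum('bthw,bchw->btc', attn, feats)
print(context.shape)                       # torch.Size([2, 25, 64])
```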
Bidirectional Scene Text Recognition with a Single Decoder
Title | Bidirectional Scene Text Recognition with a Single Decoder |
Authors | Maurits Bleeker, Maarten de Rijke |
Abstract | Scene Text Recognition (STR) is the problem of recognizing the correct word or character sequence in a cropped word image. To obtain more robust output sequences, the notion of bidirectional STR has been introduced. So far, bidirectional STR has been implemented with two separate decoders: one for left-to-right decoding and one for right-to-left. Having two separate decoders for almost the same task with the same output space is undesirable from a computational and optimization point of view. We introduce the bidirectional Scene Text Transformer (Bi-STET), a novel bidirectional STR method with a single decoder for bidirectional text decoding. With its single decoder, Bi-STET outperforms methods that apply bidirectional decoding with two separate decoders while also being more efficient than those methods. Furthermore, we achieve or beat state-of-the-art (SOTA) methods on all STR benchmarks with Bi-STET. Finally, we provide analyses and insights into the performance of Bi-STET. |
Tasks | Scene Text Recognition |
Published | 2019-12-08 |
URL | https://arxiv.org/abs/1912.03656v2 |
https://arxiv.org/pdf/1912.03656v2.pdf | |
PWC | https://paperswithcode.com/paper/bidirectional-scene-text-recognition-with-a |
Repo | https://github.com/MauritsBleeker/Bi-STET |
Framework | pytorch |
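
One way to get both directions out of a single decoder is to condition it on a learned direction embedding and feed the target sequence in either order. The tiny GRU decoder below is an assumption for brevity (Bi-STET itself is transformer-based); it only shows the single-decoder, shared-weights idea.

```python
# Sketch: one decoder serving both reading directions via a direction embedding.
import torch
import torch.nn as nn

class DirectionalDecoder(nn.Module):
    def __init__(self, vocab, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.direction = nn.Embedding(2, dim)   # 0: L2R, 1: R2L
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens, direction):
        x = self.embed(tokens) + self.direction(direction)[:, None, :]
        h, _ = self.rnn(x)
        return self.out(h)

dec = DirectionalDecoder(vocab=40)
tokens = torch.randint(0, 40, (2, 7))
l2r = dec(tokens, torch.zeros(2, dtype=torch.long))
r2l = dec(tokens.flip(1), torch.ones(2, dtype=torch.long))  # same weights
print(l2r.shape, r2l.shape)
```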
Pyramid Mask Text Detector
Title | Pyramid Mask Text Detector |
Authors | Jingchao Liu, Xuebo Liu, Jie Sheng, Ding Liang, Xin Li, Qingjie Liu |
Abstract | Scene text detection, an essential step of scene text recognition systems, is to automatically locate text instances in natural scene images. Some recent attempts, benefiting from Mask R-CNN, formulate scene text detection as an instance segmentation problem and achieve remarkable performance. In this paper, we present a new Mask R-CNN based framework named Pyramid Mask Text Detector (PMTD) to handle scene text detection. Instead of the binary text mask generated by existing Mask R-CNN based methods, our PMTD performs pixel-level regression under the guidance of location-aware supervision, yielding a more informative soft text mask for each text instance. For the generation of text boxes, PMTD reinterprets the obtained 2D soft mask in 3D space and introduces a novel plane clustering algorithm to derive the optimal text box from the 3D shape. Experiments on standard datasets demonstrate that the proposed PMTD brings consistent and noticeable gains and clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 80.13% on the ICDAR 2017 MLT dataset. |
Tasks | Instance Segmentation, Scene Text Detection, Scene Text Recognition, Semantic Segmentation |
Published | 2019-03-28 |
URL | http://arxiv.org/abs/1903.11800v1 |
http://arxiv.org/pdf/1903.11800v1.pdf | |
PWC | https://paperswithcode.com/paper/pyramid-mask-text-detector |
Repo | https://github.com/STVIR/PMTD |
Framework | none |
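
Reading the soft mask as a height field over the image plane, the box edges fall where the pyramid's side faces meet the ground plane. The sketch below fits a single plane to toy mask values by least squares, as a simplified stand-in for the paper's plane clustering step.

```python
# Sketch: least-squares plane fit to soft-mask values, treating the mask
# as a 3D surface z = a*x + b*y + c over pixel coordinates (x, y).
import numpy as np

def fit_plane(xs, ys, zs):
    """Fit z = a*x + b*y + c to mask values z at pixel coordinates (x, y)."""
    A = np.stack([xs, ys, np.ones_like(xs)], axis=1)
    coef, *_ = np.linalg.lstsq(A, zs, rcond=None)
    return coef  # (a, b, c); the plane's z = 0 line marks a box edge

# toy soft mask: values rise linearly from the left edge of the instance
xs, ys = np.meshgrid(np.arange(10.0), np.arange(5.0))
zs = 0.1 * xs
print(fit_plane(xs.ravel(), ys.ravel(), zs.ravel()))  # ~ (0.1, 0.0, 0.0)
```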
Multi-scale self-guided attention for medical image segmentation
Title | Multi-scale self-guided attention for medical image segmentation |
Authors | Ashish Sinha, Jose Dolz |
Abstract | Even though convolutional neural networks (CNNs) are driving progress in medical image segmentation, standard models still have some drawbacks. First, the use of multi-scale approaches, i.e., encoder-decoder architectures, leads to a redundant use of information, where similar low-level features are extracted multiple times at multiple scales. Second, long-range feature dependencies are not efficiently modeled, resulting in non-optimal discriminative feature representations for each semantic class. In this paper we attempt to overcome these limitations with the proposed architecture, which captures richer contextual dependencies through guided self-attention mechanisms. This approach is able to integrate local features with their corresponding global dependencies, as well as highlight interdependent channel maps in an adaptive manner. Further, an additional loss between the different modules guides the attention mechanisms to ignore irrelevant information and focus on the more discriminative regions of the image by emphasizing relevant feature associations. We evaluate the proposed model in the context of semantic segmentation on three different datasets: abdominal organs, cardiovascular structures and brain tumors. A series of ablation experiments supports the importance of these attention modules in the proposed architecture. In addition, compared to other state-of-the-art segmentation networks, our model yields better segmentation performance, increasing the accuracy of the predictions while reducing the standard deviation. This demonstrates the efficiency of our approach to generate precise and reliable automatic segmentations of medical images. Our code is made publicly available at https://github.com/sinAshish/Multi-Scale-Attention |
Tasks | Attentive segmentation networks, Deep Attention, Medical Image Segmentation, Semantic Segmentation |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.02849v3 |
https://arxiv.org/pdf/1906.02849v3.pdf | |
PWC | https://paperswithcode.com/paper/multi-scale-guided-attention-for-medical |
Repo | https://github.com/sinAshish/Multi-Scale-Attention |
Framework | pytorch |
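
The spatial (position) self-attention block that such architectures build on has a standard non-local form: every location attends to every other. A minimal sketch follows, with channel sizes as assumptions; the paper's guided variant adds supervision on top of blocks like this.

```python
# Sketch: a standard position (spatial) self-attention block.
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)
        self.k = nn.Conv2d(channels, channels // 8, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual scale

    def forward(self, x):                                 # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.q(x).view(b, -1, h * w).transpose(1, 2)  # (B, HW, C')
        k = self.k(x).view(b, -1, h * w)                  # (B, C', HW)
        attn = torch.softmax(q @ k, dim=-1)               # (B, HW, HW)
        v = self.v(x).view(b, c, h * w)                   # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

print(PositionAttention(32)(torch.randn(1, 32, 16, 16)).shape)
```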
Saliency Guided Self-attention Network for Weakly and Semi-supervised Semantic Segmentation
Title | Saliency Guided Self-attention Network for Weakly and Semi-supervised Semantic Segmentation |
Authors | Qi Yao, Xiaojin Gong |
Abstract | Weakly supervised semantic segmentation (WSSS) using only image-level labels can greatly reduce the annotation cost and has therefore attracted considerable research interest. However, its performance is still inferior to that of fully supervised counterparts. To mitigate the performance gap, we propose a saliency guided self-attention network (SGAN) to address the WSSS problem. The introduced self-attention mechanism is able to capture rich and extensive contextual information, but may mis-spread attention to unexpected regions. To enable this mechanism to work effectively under weak supervision, we integrate class-agnostic saliency priors into the self-attention mechanism and utilize class-specific attention cues as additional supervision for SGAN. Our SGAN is able to produce dense and accurate localization cues, so that the segmentation performance is boosted. Moreover, by simply replacing the additional supervision with partially labeled ground truth, SGAN also works effectively for semi-supervised semantic segmentation. Experiments on the PASCAL VOC 2012 and COCO datasets show that our approach outperforms all other state-of-the-art methods in both weakly and semi-supervised settings. |
Tasks | Semantic Segmentation, Semi-Supervised Semantic Segmentation, Weakly-Supervised Semantic Segmentation |
Published | 2019-10-12 |
URL | https://arxiv.org/abs/1910.05475v2 |
https://arxiv.org/pdf/1910.05475v2.pdf | |
PWC | https://paperswithcode.com/paper/saliency-guided-self-attention-network-for |
Repo | https://github.com/yaoqi-zd/SGAN |
Framework | pytorch |
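
One simple way to inject a class-agnostic saliency prior into self-attention is to damp attention weights toward background locations and renormalize, so attention cannot mis-spread outside salient regions. The multiplicative masking below is an illustrative assumption, not SGAN's exact integration.

```python
# Sketch: restrict self-attention with a class-agnostic saliency prior.
import torch

def saliency_guided_attention(attn, saliency):
    """attn: (B, HW, HW) raw attention; saliency: (B, HW) in [0, 1]."""
    masked = attn * saliency[:, None, :]   # damp attention to background
    return masked / masked.sum(dim=-1, keepdim=True).clamp_min(1e-8)

attn = torch.rand(1, 6, 6)
sal = torch.tensor([[1., 1., 1., 0., 0., 0.]])  # toy: first half is salient
print(saliency_guided_attention(attn, sal))
```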
Direct Fitting of Gaussian Mixture Models
Title | Direct Fitting of Gaussian Mixture Models |
Authors | Leonid Keselman, Martial Hebert |
Abstract | When fitting Gaussian Mixture Models to 3D geometry, the model is typically fit to point clouds, even when the shapes were obtained as 3D meshes. Here we present a formulation for fitting Gaussian Mixture Models (GMMs) directly to a triangular mesh instead of using points sampled from its surface. Part of this work analyzes a general formulation for evaluating likelihood of geometric objects. This modification enables fitting higher-quality GMMs under a wider range of initialization conditions. Additionally, models obtained from this fitting method are shown to produce an improvement in 3D registration for both meshes and RGB-D frames. This result is general and applicable to arbitrary geometric objects, including representing uncertainty from sensor measurements. |
Tasks | |
Published | 2019-04-11 |
URL | https://arxiv.org/abs/1904.05537v2 |
https://arxiv.org/pdf/1904.05537v2.pdf | |
PWC | https://paperswithcode.com/paper/direct-fitting-of-gaussian-mixture-models |
Repo | https://github.com/leonidk/direct_gmm |
Framework | none |
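
The starting point for fitting a GMM to a mesh rather than to sampled points is that each triangle contributes area-weighted moments. The sketch below computes per-triangle centroids and areas; using only the centroid (first moment) is a simplification, since the paper evaluates the likelihood integrated over the full triangle.

```python
# Sketch: per-triangle centroid and area from an indexed triangle mesh,
# the area-weighted statistics a mesh-aware GMM fit builds on.
import numpy as np

def triangle_stats(vertices, faces):
    """vertices: (V, 3) float array; faces: (F, 3) int array of indices."""
    tri = vertices[faces]                   # (F, 3, 3)
    centroids = tri.mean(axis=1)
    e1, e2 = tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]
    areas = 0.5 * np.linalg.norm(np.cross(e1, e2), axis=1)
    return centroids, areas

v = np.array([[0., 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
f = np.array([[0, 1, 2], [1, 3, 2]])
c, a = triangle_stats(v, f)
print(c, a)   # areas sum to 1.0 for this unit square
```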
Universal Adversarial Audio Perturbations
Title | Universal Adversarial Audio Perturbations |
Authors | Sajjad Abdoli, Luiz G. Hafemann, Jerome Rony, Ismail Ben Ayed, Patrick Cardinal, Alessandro L. Koerich |
Abstract | We demonstrate the existence of universal adversarial perturbations, which can fool a family of audio classification architectures, for both targeted and untargeted attack scenarios. We propose two methods for finding such perturbations. The first method is based on an iterative, greedy approach that is well known in computer vision: it aggregates small perturbations to the input so as to push it to the decision boundary. The second method, which is the main contribution of this work, is a novel penalty formulation that finds targeted and untargeted universal adversarial perturbations. Unlike the greedy approach, the penalty method minimizes an appropriate objective function on a batch of samples, and therefore produces more successful attacks when the number of training samples is limited. Moreover, we prove that the proposed penalty method theoretically converges to a solution that corresponds to universal adversarial perturbations. We also demonstrate that successful attacks can be mounted with the penalty method when only one sample from the target dataset is available to the attacker. Experimental results on attacking five 1D CNN architectures show attack success rates higher than 85.4% and 83.1% for targeted and untargeted attacks, respectively, using the proposed penalty method. |
Tasks | Audio Classification |
Published | 2019-08-08 |
URL | https://arxiv.org/abs/1908.03173v3 |
https://arxiv.org/pdf/1908.03173v3.pdf | |
PWC | https://paperswithcode.com/paper/universal-adversarial-audio-perturbations |
Repo | https://github.com/sajabdoli/UAP |
Framework | tf |
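
The penalty formulation optimizes a single perturbation over a whole batch, trading classification loss against perturbation size. A minimal sketch of an untargeted variant follows; the toy classifier, loss weights and step counts are assumptions, not the paper's models or exact objective.

```python
# Sketch: penalty-based universal adversarial perturbation on a batch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(100, 5))  # toy audio classifier
x = torch.randn(16, 1, 100)                             # batch of waveforms
y = torch.randint(0, 5, (16,))

v = torch.zeros(1, 1, 100, requires_grad=True)          # universal perturbation
opt = torch.optim.Adam([v], lr=0.01)
for _ in range(100):
    # untargeted attack: maximize batch cross-entropy (minimize its negative)
    loss = -nn.functional.cross_entropy(model(x + v), y)
    penalty = 0.1 * v.norm()                            # keep v small
    opt.zero_grad(); (loss + penalty).backward(); opt.step()
print(v.abs().max().item())
```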
Software and application patterns for explanation methods
Title | Software and application patterns for explanation methods |
Authors | Maximilian Alber |
Abstract | Deep neural networks have successfully pervaded many application domains and are increasingly used in critical decision processes. Understanding their workings is desirable or even required to further foster their potential, as well as to access sensitive domains like medical applications or autonomous driving. One key to the broader usage of explanation frameworks is the accessibility and understanding of the respective software. In this work we introduce software and application patterns for explanation techniques that aim to explain individual predictions of neural networks. We discuss how to implement well-known algorithms efficiently within deep learning software frameworks and describe how to embed them in downstream implementations. Building on this, we show how explanation methods can be used in applications to understand predictions for misclassified samples, to compare algorithms or networks, and to examine the focus of networks. Furthermore, we review available open-source packages and discuss challenges posed by complex and evolving neural network structures to the development and implementation of explanation algorithms. |
Tasks | Autonomous Driving |
Published | 2019-04-09 |
URL | http://arxiv.org/abs/1904.04734v1 |
http://arxiv.org/pdf/1904.04734v1.pdf | |
PWC | https://paperswithcode.com/paper/software-and-application-patterns-for |
Repo | https://github.com/albermax/interpretable_ai_book__sw_chapter |
Framework | none |
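
The simplest pattern such chapters build on is a gradient (saliency) map for one prediction, computed with autograd. A minimal sketch follows; the toy model is an assumption, but the pattern carries over unchanged to real networks.

```python
# Sketch: gradient saliency for a single prediction via autograd.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.randn(1, 1, 28, 28, requires_grad=True)
score = model(x)[0].max()          # score of the predicted class
score.backward()                   # gradients flow back to the input
saliency = x.grad.abs().squeeze()  # per-pixel relevance estimate
print(saliency.shape)              # torch.Size([28, 28])
```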