February 1, 2020

3096 words 15 mins read

Paper Group AWR 301

Logo-2K+: A Large-Scale Logo Dataset for Scalable Logo Classification

Title Logo-2K+: A Large-Scale Logo Dataset for Scalable Logo Classification
Authors Jing Wang, Weiqing Min, Sujuan Hou, Shengnan Ma, Yuanjie Zheng, Haishuai Wang, Shuqiang Jiang
Abstract Logo classification has gained increasing attention for its various applications, such as copyright infringement detection, product recommendation and contextual advertising. Compared with other types of object images, real-world logo images show greater variety in logo appearance and more complex backgrounds, which makes recognizing logos in images challenging. To support efforts towards scalable logo classification, we have curated Logo-2K+, a new large-scale publicly available real-world logo dataset with 2,341 categories and 167,140 images. Compared with existing popular logo datasets, such as FlickrLogos-32 and LOGO-Net, Logo-2K+ offers more comprehensive coverage of logo categories and a larger quantity of logo images. Moreover, we propose a Discriminative Region Navigation and Augmentation Network (DRNA-Net), which is capable of discovering more informative logo regions and augmenting these image regions for logo classification. DRNA-Net consists of four sub-networks: the navigator sub-network first selects informative logo-relevant regions, guided by the teacher sub-network, which evaluates each region’s confidence of belonging to the ground-truth logo class. The data augmentation sub-network then augments the selected regions via both region cropping and region dropping. Finally, the scrutinizer sub-network fuses features from the augmented regions and the whole image for logo classification. Comprehensive experiments on Logo-2K+ and three other existing benchmark datasets demonstrate the effectiveness of the proposed method. Logo-2K+ and the proposed strong baseline DRNA-Net are expected to further the development of scalable logo image recognition, and the Logo-2K+ dataset can be found at https://github.com/msn199959/Logo-2k-plus-Dataset.
Tasks Data Augmentation, Product Recommendation
Published 2019-11-11
URL https://arxiv.org/abs/1911.07924v1
PDF https://arxiv.org/pdf/1911.07924v1.pdf
PWC https://paperswithcode.com/paper/logo-2k-a-large-scale-logo-dataset-for
Repo https://github.com/msn199959/Logo-2k-plus-Dataset
Framework pytorch
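
The region cropping and dropping performed by the data augmentation sub-network are the most code-friendly part of DRNA-Net. Below is a minimal PyTorch sketch of those two operations, assuming the navigator has already produced a region box; the function name and box format are illustrative, not the authors’ API.

```python
# Hypothetical sketch of region cropping/dropping; not the authors' API.
import torch

def augment_region(image, box, mode="crop"):
    """image: (C, H, W) tensor; box: (x1, y1, x2, y2) region assumed to
    come from the navigator sub-network."""
    C, H, W = image.shape
    x1, y1, x2, y2 = box
    if mode == "crop":
        # Zoom into the informative region: crop, then resize to full size.
        region = image[:, y1:y2, x1:x2].unsqueeze(0)
        return torch.nn.functional.interpolate(
            region, size=(H, W), mode="bilinear", align_corners=False
        ).squeeze(0)
    elif mode == "drop":
        # Erase the region so the classifier must rely on other evidence.
        out = image.clone()
        out[:, y1:y2, x1:x2] = 0.0
        return out
    raise ValueError(mode)

# Example: augment a dummy 3x224x224 image around a candidate logo region.
img = torch.rand(3, 224, 224)
cropped = augment_region(img, (50, 60, 180, 200), mode="crop")
dropped = augment_region(img, (50, 60, 180, 200), mode="drop")
```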

Meta-learning of textual representations

Title Meta-learning of textual representations
Authors Jorge Madrid, Hugo Jair Escalante, Eduardo Morales
Abstract Recent progress in AutoML has led to state-of-the-art methods (e.g., AutoSKLearn) that can be readily used by non-experts to approach any supervised learning problem. While these methods are quite effective, they are still limited in the sense that they work only for tabular (matrix-formatted) data. This paper describes one step forward in trying to automate the design of supervised learning methods in the context of text mining. We introduce a meta-learning methodology for automatically obtaining a representation for text mining tasks starting from raw text. We report experiments considering 60 different textual representations and more than 80 text mining datasets associated with a wide variety of tasks. Experimental results show the proposed methodology is a promising solution for obtaining highly effective off-the-shelf text classification pipelines.
Tasks AutoML, Meta-Learning, Text Classification
Published 2019-06-21
URL https://arxiv.org/abs/1906.08934v2
PDF https://arxiv.org/pdf/1906.08934v2.pdf
PWC https://paperswithcode.com/paper/meta-learning-of-textual-representations
Repo https://github.com/jorgegus/autotext
Framework none
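
To make the meta-learning idea concrete, here is a toy Python sketch: describe each text dataset by a few simple meta-features, then recommend the representation that worked best on the most similar previously seen dataset. The meta-features, the stored “experience”, and the 1-NN recommender are illustrative assumptions, not the paper’s exact method.

```python
# Toy meta-learning recommender; meta-features and past results are made up.
import numpy as np

def meta_features(texts):
    lengths = [len(t.split()) for t in texts]
    vocab = set(w for t in texts for w in t.split())
    # [dataset size, mean document length, type/token ratio]
    return np.array([len(texts), np.mean(lengths),
                     len(vocab) / max(sum(lengths), 1)])

# "Experience": meta-features of past datasets and their best representation.
past = [
    (np.array([1000.0, 12.0, 0.30]), "tfidf+svm"),
    (np.array([50000.0, 180.0, 0.05]), "word2vec+lstm"),
]

def recommend(texts):
    f = meta_features(texts)
    dists = [np.linalg.norm(f - pf) for pf, _ in past]
    return past[int(np.argmin(dists))][1]

print(recommend(["short tweet about ai", "another short text"]))
```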

Query-based Deep Improvisation

Title Query-based Deep Improvisation
Authors Shlomo Dubnov
Abstract In this paper we explore techniques for generating new music using a Variational Autoencoder (VAE) neural network trained on a corpus of a specific style. Instead of randomly sampling the latent states of the network to produce free improvisation, we generate new music by querying the network with musical input in a style different from the training corpus. This allows us to produce new musical output with longer-term structure that blends aspects of the query with the style of the network. In order to control the level of this blending we add a noisy channel between the VAE encoder and decoder, using a bit-allocation algorithm from rate-distortion theory in communications. Our experiments provide new insight into the relations between the representational and structural information of latent states and the query signal, suggesting their possible use for composition purposes.
Tasks
Published 2019-06-21
URL https://arxiv.org/abs/1906.09155v1
PDF https://arxiv.org/pdf/1906.09155v1.pdf
PWC https://paperswithcode.com/paper/query-based-deep-improvisation
Repo https://github.com/sdubnov/qbdi
Framework none
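
A rough sketch of the noisy-channel idea follows: after encoding the query, each latent dimension is perturbed according to an allocated bit budget, so fewer bits pull the output toward the trained style while more bits preserve the query. The specific rule here (noise standard deviation halving per bit, zero bits erasing the dimension) is an assumption for illustration, not the paper’s exact bit-allocation algorithm.

```python
# Illustrative noisy channel between VAE encoder and decoder; the
# bit-to-noise mapping is an assumption, not the paper's algorithm.
import numpy as np

def noisy_channel(z, bits_per_dim):
    """z: latent vector from the encoder; bits_per_dim: bit budget per dim."""
    z_out = np.empty_like(z)
    for i, (zi, b) in enumerate(zip(z, bits_per_dim)):
        if b == 0:
            z_out[i] = 0.0  # dimension fully replaced by the prior mean
        else:
            # Each extra bit halves the noise std (rate-distortion flavour).
            z_out[i] = zi + np.random.normal(0.0, 2.0 ** (-b))
    return z_out

z_query = np.random.randn(8)                      # encoder output for a query
z_blend = noisy_channel(z_query, [4, 4, 2, 2, 1, 1, 0, 0])
# z_blend would then be fed to the decoder to generate the improvisation.
```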

Adversarial Feature Distribution Alignment for Semi-Supervised Learning

Title Adversarial Feature Distribution Alignment for Semi-Supervised Learning
Authors Christoph Mayer, Matthieu Paul, Radu Timofte
Abstract Training deep neural networks with only a few labeled samples can lead to overfitting. This is problematic in semi-supervised learning (SSL), where only a few labeled samples are available. In this paper, we show that a consequence of overfitting in SSL is feature distribution misalignment between labeled and unlabeled samples. Hence, we propose a new feature distribution alignment method. Our method is particularly effective when only a small amount of labeled samples is used. We test our method on CIFAR10 and SVHN. On SVHN we achieve a test error of 3.88% (250 labeled samples) and 3.39% (1000 labeled samples), which is close to the fully supervised model at 2.89% (73k labeled samples). In comparison, the current SOTA achieves only 4.29% and 3.74%. Finally, we provide theoretical insight into why feature distribution misalignment occurs and show that our method reduces it.
Tasks
Published 2019-12-22
URL https://arxiv.org/abs/1912.10428v1
PDF https://arxiv.org/pdf/1912.10428v1.pdf
PWC https://paperswithcode.com/paper/adversarial-feature-distribution-alignment
Repo https://github.com/kleinzcy/Semi-supervised-Learning
Framework none
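
One common way to realize adversarial feature alignment is with a small discriminator that tries to tell labeled-batch features from unlabeled-batch features while the feature extractor learns to fool it. The sketch below follows that pattern in PyTorch; the architectures, optimizers, and loss weighting are illustrative assumptions, not necessarily the authors’ exact setup.

```python
# Minimal adversarial alignment sketch; networks and weighting are assumed.
import torch
import torch.nn as nn

feat = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 64))
disc = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()
opt_f = torch.optim.Adam(feat.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

x_lab, x_unlab = torch.randn(32, 784), torch.randn(32, 784)

# 1) Discriminator step: labeled features -> 1, unlabeled features -> 0.
f_lab, f_unlab = feat(x_lab).detach(), feat(x_unlab).detach()
d_loss = (bce(disc(f_lab), torch.ones(32, 1)) +
          bce(disc(f_unlab), torch.zeros(32, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# 2) Feature-extractor step: make unlabeled features look labeled,
#    pushing the two feature distributions together.
align_loss = bce(disc(feat(x_unlab)), torch.ones(32, 1))
opt_f.zero_grad(); align_loss.backward(); opt_f.step()
```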

DeepShift: Towards Multiplication-Less Neural Networks

Title DeepShift: Towards Multiplication-Less Neural Networks
Authors Mostafa Elhoushi, Zihao Chen, Farhan Shafiq, Ye Henry Tian, Joey Yiwei Li
Abstract When deploying convolutional neural networks (CNNs) in mobile environments, their high computation and power budgets prove to be a major bottleneck. Convolution layers and fully connected layers, because of their intense use of multiplications, are the dominant contributors to this computation budget. This paper proposes to tackle this problem by introducing two new operations, convolutional shifts and fully-connected shifts, which replace multiplications altogether with bitwise shifts and sign flips. For inference, both approaches may require only 6 bits to represent the weights. This family of neural network architectures (that use convolutional shifts and fully-connected shifts) is referred to as DeepShift models. We propose two methods to train DeepShift models: DeepShift-Q, which trains regular weights constrained to powers of 2, and DeepShift-PS, which trains the values of the shifts and sign flips directly. Training the DeepShift version of the ResNet18 architecture from scratch, we obtained an accuracy of 92.33% on the CIFAR10 dataset, and Top-1/Top-5 accuracies of 65.63%/86.33% on the ImageNet dataset. Training the DeepShift version of VGG16 on ImageNet from scratch resulted in a drop of less than 0.3% in Top-5 accuracy. Converting the pre-trained 32-bit floating-point baseline model of GoogleNet to DeepShift and training it for 3 epochs resulted in Top-1/Top-5 accuracies of 69.87%/89.62%, actually higher than those of the original model. Further testing was done on various well-known CNN architectures. Last but not least, we implemented the convolutional-shift and fully-connected-shift GPU kernels and showed a 25% reduction in latency when inferring ResNet18 compared to unoptimized multiplication-based GPU kernels. The code is available online at https://github.com/mostafaelhoushi/DeepShift.
Tasks
Published 2019-05-30
URL https://arxiv.org/abs/1905.13298v3
PDF https://arxiv.org/pdf/1905.13298v3.pdf
PWC https://paperswithcode.com/paper/deepshift-towards-multiplication-less-neural
Repo https://github.com/mostafaelhoushi/DeepShift
Framework pytorch
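
The core trick is easy to show numerically: if every weight is constrained to sign * 2^shift, a multiply-accumulate becomes a bitwise shift plus a sign flip. The NumPy sketch below demonstrates this on integer inputs; the function and variable names are illustrative, and the paper’s trainable DeepShift-Q/PS layers are more involved.

```python
# Multiplication-free "linear layer" demo; names are illustrative only.
import numpy as np

def shift_linear(x, shifts, signs):
    """x: int vector; effective weight[i, j] = signs[i, j] * 2**shifts[i, j]."""
    out = np.zeros(shifts.shape[0], dtype=np.int64)
    for i in range(shifts.shape[0]):
        for j in range(x.shape[0]):
            # A left shift replaces the multiply by a power of two.
            out[i] += signs[i, j] * (x[j] << shifts[i, j])
    return out

x = np.array([3, 5, 7], dtype=np.int64)
shifts = np.array([[1, 0, 2], [3, 1, 0]])   # powers of two: 2,1,4 / 8,2,1
signs = np.array([[1, -1, 1], [1, 1, -1]])
print(shift_linear(x, shifts, signs))       # equals x @ (signs * 2**shifts).T
```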

Multimodal Transformer for Unaligned Multimodal Language Sequences

Title Multimodal Transformer for Unaligned Multimodal Language Sequences
Authors Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J. Zico Kolter, Louis-Philippe Morency, Ruslan Salakhutdinov
Abstract Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges exist in modeling such multimodal human language time-series data: 1) inherent data non-alignment due to variable sampling rates for the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this paper, we introduce the Multimodal Transformer (MulT) to generically address the above issues in an end-to-end manner without explicitly aligning the data. At the heart of our model is the directional pairwise crossmodal attention, which attends to interactions between multimodal sequences across distinct time steps and latently adapts streams from one modality to another. Comprehensive experiments on both aligned and non-aligned multimodal time-series show that our model outperforms state-of-the-art methods by a large margin. In addition, empirical analysis suggests that the proposed crossmodal attention mechanism in MulT is able to capture correlated crossmodal signals.
Tasks Time Series
Published 2019-06-01
URL https://arxiv.org/abs/1906.00295v1
PDF https://arxiv.org/pdf/1906.00295v1.pdf
PWC https://paperswithcode.com/paper/190600295
Repo https://github.com/yaohungt/Multimodal-Transformer
Framework pytorch
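
The directional pairwise crossmodal attention can be sketched with a standard attention module: the target modality supplies the queries and the source modality supplies the keys and values, so sequences of different lengths and sampling rates need no alignment. The PyTorch snippet below shows one such direction; dimensions are illustrative, and MulT itself stacks many of these blocks.

```python
# One "audio -> text" crossmodal attention direction; sizes are illustrative.
import torch
import torch.nn as nn

d_model, n_heads = 40, 5
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

text = torch.randn(2, 50, d_model)    # 50 text steps
audio = torch.randn(2, 375, d_model)  # 375 audio frames (different rate)

# Target modality (text) queries the source modality (audio):
adapted, attn_weights = cross_attn(query=text, key=audio, value=audio)
print(adapted.shape)  # torch.Size([2, 50, 40]) -- one fused vector per text step
```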

Aesthetic Image Captioning From Weakly-Labelled Photographs

Title Aesthetic Image Captioning From Weakly-Labelled Photographs
Authors Koustav Ghosal, Aakanksha Rana, Aljosa Smolic
Abstract Aesthetic image captioning (AIC) refers to the multi-modal task of generating critical textual feedback for photographs. While in natural image captioning (NIC) deep models are trained end-to-end using large curated datasets such as MS-COCO, no such large-scale, clean dataset exists for AIC. Towards this goal, we propose an automatic cleaning strategy to create a benchmark AIC dataset, by exploiting the images and noisy comments easily available from photography websites. We propose a probabilistic caption-filtering method for cleaning the noisy web data, and compile a large-scale, clean dataset, “AVA-Captions” (230,000 images with 5 captions per image). Additionally, by exploiting the latent associations between aesthetic attributes, we propose a strategy for training the convolutional neural network (CNN) based visual feature extractor, the first component of the AIC framework. The strategy is weakly supervised and can be effectively used to learn rich aesthetic representations without requiring expensive ground-truth annotations. We finally showcase a thorough analysis of the proposed contributions using automatic metrics and subjective evaluations.
Tasks Image Captioning
Published 2019-08-29
URL https://arxiv.org/abs/1908.11310v1
PDF https://arxiv.org/pdf/1908.11310v1.pdf
PWC https://paperswithcode.com/paper/aesthetic-image-captioning-from-weakly
Repo https://github.com/V-Sense/Aesthetic-Image-Captioning-ICCVW-2019
Framework pytorch
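
As a toy illustration of probabilistic caption filtering, the sketch below scores each raw comment by the average token log-probability under a unigram model of in-domain feedback and keeps only comments that score high enough. The unigram model, smoothing, and threshold are assumptions for illustration, not the paper’s exact filter.

```python
# Toy caption filter; the model and threshold are illustrative assumptions.
import math
from collections import Counter

corpus = ["lovely composition and great use of light",
          "nice colours but the horizon is tilted",
          "congrats on the win", "great shot"]
counts = Counter(w for c in corpus for w in c.split())
total = sum(counts.values())

def score(caption):
    """Average Laplace-smoothed token log-probability under the domain model."""
    toks = caption.split()
    return sum(math.log((counts[w] + 1) / (total + len(counts)))
               for w in toks) / len(toks)

comments = ["great use of light and colours", "first!!! please vote back"]
kept = [c for c in comments if score(c) > -3.5]
print(kept)  # off-topic chatter scores low and is dropped
```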

Automated Chess Commentator Powered by Neural Chess Engine

Title Automated Chess Commentator Powered by Neural Chess Engine
Authors Hongyu Zang, Zhiwei Yu, Xiaojun Wan
Abstract In this paper, we explore a new approach for automated chess commentary generation, which aims to generate chess commentary texts in different categories (e.g., description, comparison, planning, etc.). We introduce a neural chess engine into text generation models to help with encoding boards, predicting moves, and analyzing situations. By jointly training the neural chess engine and the generation models for different categories, the models become more effective. We conduct experiments on 5 categories in a benchmark Chess Commentary dataset and achieve inspiring results in both automatic and human evaluations.
Tasks Text Generation
Published 2019-09-23
URL https://arxiv.org/abs/1909.10413v1
PDF https://arxiv.org/pdf/1909.10413v1.pdf
PWC https://paperswithcode.com/paper/190910413
Repo https://github.com/zhyack/SCC
Framework tf

K-Metamodes: frequency- and ensemble-based distributed k-modes clustering for security analytics

Title K-Metamodes: frequency- and ensemble-based distributed k-modes clustering for security analytics
Authors Andrey Sapegin, Christoph Meinel
Abstract Nowadays, processing of Big Security Data, such as log messages, is commonly used for intrusion detection purposes. Its heterogeneous nature, as well as its combination of numerical and categorical attributes, means that existing data mining methods cannot be applied directly to the data without feature preprocessing. Therefore, a rather computationally expensive conversion of categorical attributes into a vector space is usually required for the analysis of such data. However, the well-known k-modes algorithm can cluster categorical data directly, avoiding the conversion into a vector space. Existing implementations of k-modes for Big Data processing are ensemble-based and utilise two-step clustering, where data subsets are first clustered independently, after which the resulting cluster modes are clustered again in order to calculate metamodes valid for all data subsets. In this paper, a novel frequency-based distance function is proposed for the second step of ensemble-based k-modes clustering. Besides this, an existing feature discretisation method from previous work is utilised in order to adapt k-modes for processing mixed data sets. The resulting k-metamodes algorithm was tested on two public security data sets and reached higher effectiveness in comparison with previous work.
Tasks Intrusion Detection
Published 2019-09-30
URL https://arxiv.org/abs/1909.13721v1
PDF https://arxiv.org/pdf/1909.13721v1.pdf
PWC https://paperswithcode.com/paper/k-metamodes-frequency-and-ensemble-based
Repo https://github.com/asapegin/pyspark-kmetamodes
Framework none
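
The frequency-based distance for the metamode step can be sketched compactly: each metamode carries, per attribute, the relative frequencies of category values in its metacluster, and a mismatching or rare value contributes more to the distance. The Python sketch below follows the common frequency-based k-modes dissimilarity; the paper’s exact weighting may differ.

```python
# Frequency-based mode-to-metamode distance; a sketch, not the paper's exact rule.
def freq_distance(mode, metamode_freqs):
    """mode: list of category values; metamode_freqs: per attribute, a dict
    mapping category value -> relative frequency inside the metacluster."""
    d = 0.0
    for value, freqs in zip(mode, metamode_freqs):
        d += 1.0 - freqs.get(value, 0.0)   # frequent match -> small distance
    return d

mode = ["tcp", "http", "allow"]
metamode = [{"tcp": 0.9, "udp": 0.1}, {"http": 0.6, "dns": 0.4}, {"allow": 1.0}]
print(freq_distance(mode, metamode))  # 0.1 + 0.4 + 0.0 = 0.5
```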

Deep Bayesian Optimization on Attributed Graphs

Title Deep Bayesian Optimization on Attributed Graphs
Authors Jiaxu Cui, Bo Yang, Xia Hu
Abstract Attributed graphs, which contain rich contextual features beyond just network structure, are ubiquitous and have been observed to benefit various network analytics applications. Graph structure optimization, which aims to find the optimal graph in terms of some specific measure, has become an effective computational tool in complex network analysis. However, traditional model-free methods suffer from the expensive computational cost of evaluating graphs, while existing vectorial Bayesian optimization methods cannot be directly applied to attributed graphs and have scalability issues due to the use of Gaussian processes (GPs). To bridge the gap, in this paper we propose a novel scalable Deep Graph Bayesian Optimization (DGBO) method on attributed graphs. The proposed DGBO avoids the cubic complexity of GPs by adopting a deep graph neural network as a surrogate for the black-box function, and can scale linearly with the number of observations. Intensive experiments are conducted on both artificial and real-world problems, including molecular discovery and urban road network design, and demonstrate the effectiveness of DGBO compared with the state-of-the-art.
Tasks Gaussian Processes
Published 2019-05-31
URL https://arxiv.org/abs/1905.13403v1
PDF https://arxiv.org/pdf/1905.13403v1.pdf
PWC https://paperswithcode.com/paper/deep-bayesian-optimization-on-attributed
Repo https://github.com/0h-n0/tfdbonas
Framework tf
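
The overall loop can be sketched by swapping the GP for a neural surrogate whose uncertainty comes from a small ensemble, then selecting the next candidate by an acquisition score. In the PyTorch sketch below, graphs are replaced by stand-in feature vectors and the acquisition is a simple mean-plus-std rule; DGBO itself uses a deep graph network and a more principled acquisition.

```python
# Toy neural-surrogate Bayesian optimization loop; everything here is a
# stand-in for DGBO's graph network and acquisition function.
import torch
import torch.nn as nn

def fit(x, y):
    net = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))
    opt = torch.optim.Adam(net.parameters(), lr=0.05)
    for _ in range(200):
        opt.zero_grad()
        loss = ((net(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return net

x_obs = torch.randn(10, 4)
y_obs = x_obs.sum(dim=1, keepdim=True)        # stand-in black-box objective
candidates = torch.randn(100, 4)

# Ensemble of surrogates gives a crude predictive uncertainty.
preds = torch.stack([fit(x_obs, y_obs)(candidates) for _ in range(3)])
ucb = preds.mean(0) + preds.std(0)            # acquisition: mean + uncertainty
next_x = candidates[ucb.argmax()]             # next candidate to evaluate
```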

Segmenting Medical MRI via Recurrent Decoding Cell

Title Segmenting Medical MRI via Recurrent Decoding Cell
Authors Ying Wen, Kai Xie, Lianghua He
Abstract Encoder-decoder networks are commonly used in medical image segmentation due to their remarkable performance in hierarchical feature fusion. However, the expanding path for feature decoding and spatial recovery does not consider the long-term dependency when fusing feature maps from different layers, and the universal encoder-decoder network does not make full use of multi-modality information to improve network robustness, especially for segmenting medical MRI. In this paper, we propose a novel feature fusion unit called the Recurrent Decoding Cell (RDC), which leverages convolutional RNNs to memorize long-term context information from previous layers in the decoding phase. An encoder-decoder network, named the Convolutional Recurrent Decoding Network (CRDN), is also proposed based on the RDC for segmenting multi-modality medical MRI. CRDN adopts a CNN backbone to encode image features and decodes them hierarchically through a chain of RDCs to obtain the final high-resolution score map. Evaluation experiments on the BrainWeb, MRBrainS and HVSMR datasets demonstrate that the introduction of the RDC effectively improves segmentation accuracy while reducing model size, and that the proposed CRDN is robust to image noise and intensity non-uniformity in medical MRI.
Tasks Medical Image Segmentation, Semantic Segmentation
Published 2019-11-21
URL https://arxiv.org/abs/1911.09401v1
PDF https://arxiv.org/pdf/1911.09401v1.pdf
PWC https://paperswithcode.com/paper/segmenting-medical-mri-via-recurrent-decoding
Repo https://github.com/shakex/Recurrent-Decoding-Cell
Framework pytorch
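
A convolutional GRU is one natural reading of the RDC: the decoder’s running state is the hidden state and each encoder skip feature is the input, so context is carried across decoding stages. The PyTorch sketch below implements such a cell; channel sizes and the exact gating are illustrative assumptions rather than the paper’s precise design.

```python
# ConvGRU-style decoding cell; an assumed reading of the RDC, not its exact form.
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=p)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=p)

    def forward(self, x, h):
        # Update (z) and reset (r) gates from the skip feature and state.
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], 1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], 1)))
        return (1 - z) * h + z * h_tilde

cell = ConvGRUCell(in_ch=64, hid_ch=32)
skip = torch.randn(1, 64, 56, 56)    # encoder skip feature (input)
h = torch.randn(1, 32, 56, 56)       # upsampled decoder state (hidden)
h = cell(skip, h)                    # fused, memory-aware decoding step
```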

Devil is in the Edges: Learning Semantic Boundaries from Noisy Annotations

Title Devil is in the Edges: Learning Semantic Boundaries from Noisy Annotations
Authors David Acuna, Amlan Kar, Sanja Fidler
Abstract We tackle the problem of semantic boundary prediction, which aims to identify pixels that belong to object (class) boundaries. We notice that the relevant datasets contain a significant level of label noise, reflecting the fact that precise annotations are laborious to obtain, so annotators trade off quality for efficiency. We aim to learn sharp and precise semantic boundaries by explicitly reasoning about annotation noise during training. We propose a simple new layer and loss that can be used with existing learning-based boundary detectors. Our layer/loss forces the detector to predict a maximum response along the normal direction at an edge, while also regularizing its direction. We further reason about true object boundaries during training using a level-set formulation, which allows the network to learn from misaligned labels in an end-to-end fashion. Experiments show that we improve over the CASENet backbone network by more than 4% in terms of MF(ODS) and 18.61% in terms of AP, outperforming all current state-of-the-art methods including those that deal with alignment. Furthermore, we show that our learned network can be used to significantly improve coarse segmentation labels, lending itself as an efficient way to label new data.
Tasks Semantic Segmentation
Published 2019-04-16
URL https://arxiv.org/abs/1904.07934v2
PDF https://arxiv.org/pdf/1904.07934v2.pdf
PWC https://paperswithcode.com/paper/190407934
Repo https://github.com/nv-tlabs/STEAL
Framework pytorch
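
The max-response constraint can be illustrated in one dimension: sample the predicted boundary probability at a few points along the annotated edge normal, normalize with a softmax, and penalize mass away from the true boundary position. The sketch below shows just that loss; the sampling along normals, the direction regularizer, and the level-set relabeling from the paper are omitted.

```python
# 1-D illustration of a max-response loss along an edge normal; the sampling
# step is assumed already done, and this omits most of the paper's machinery.
import torch

# Predicted boundary probabilities at 5 points along one edge normal
# (index 2 is the annotated boundary pixel).
samples = torch.tensor([0.1, 0.4, 0.5, 0.45, 0.2], requires_grad=True)
log_p = torch.log_softmax(samples, dim=0)   # normalize responses along normal
nms_loss = -log_p[2]                        # force the peak onto the boundary
nms_loss.backward()
print(nms_loss.item(), samples.grad)        # gradients sharpen the response
```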

Commonsense Properties from Query Logs and Question Answering Forums

Title Commonsense Properties from Query Logs and Question Answering Forums
Authors Julien Romero, Simon Razniewski, Koninika Pal, Jeff Z. Pan, Archit Sakhadeo, Gerhard Weikum
Abstract Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality.
Tasks Question Answering
Published 2019-05-27
URL https://arxiv.org/abs/1905.10989v3
PDF https://arxiv.org/pdf/1905.10989v3.pdf
PWC https://paperswithcode.com/paper/commonsense-properties-from-query-logs-and
Repo https://github.com/Aunsiels/CSK
Framework none

KTBoost: Combined Kernel and Tree Boosting

Title KTBoost: Combined Kernel and Tree Boosting
Authors Fabio Sigrist
Abstract In this article, we introduce a novel boosting algorithm called ‘KTBoost’, which combines kernel boosting and tree boosting. In each boosting iteration, the algorithm adds either a regression tree or a reproducing kernel Hilbert space (RKHS) regression function to the ensemble of base learners. Intuitively, the idea is that discontinuous trees and continuous RKHS regression functions complement each other, and that this combination allows for better learning of functions that have parts with varying degrees of regularity, such as discontinuities and smooth parts. We empirically show that KTBoost outperforms both tree and kernel boosting in terms of predictive accuracy on a wide array of data sets.
Tasks
Published 2019-02-11
URL https://arxiv.org/abs/1902.03999v2
PDF https://arxiv.org/pdf/1902.03999v2.pdf
PWC https://paperswithcode.com/paper/ktboost-combined-kernel-and-tree-boosting
Repo https://github.com/fabsig/KTBoost
Framework none
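
A single KTBoost-style iteration is easy to sketch with scikit-learn: fit both a shallow regression tree and a kernel ridge (RKHS) regressor to the current residuals, then add whichever base learner reduces the squared loss more. Hyperparameters below are illustrative; the authors’ KTBoost package implements the full algorithm.

```python
# KTBoost-flavoured boosting loop; a sketch with assumed hyperparameters.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + (X[:, 0] > 1.0)        # smooth part + a discontinuity

pred, nu = np.zeros_like(y), 0.1             # ensemble prediction, learning rate
for _ in range(20):
    resid = y - pred
    tree = DecisionTreeRegressor(max_depth=3).fit(X, resid)
    kern = KernelRidge(kernel="rbf", alpha=1.0, gamma=1.0).fit(X, resid)
    # Keep whichever base learner fits the residuals better this round.
    cand = min((tree, kern), key=lambda m: np.mean((resid - m.predict(X)) ** 2))
    pred += nu * cand.predict(X)
print("train MSE:", np.mean((y - pred) ** 2))
```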

LCA: Loss Change Allocation for Neural Network Training

Title LCA: Loss Change Allocation for Neural Network Training
Authors Janice Lan, Rosanne Liu, Hattie Zhou, Jason Yosinski
Abstract Neural networks enjoy widespread use, but many aspects of their training, representation, and operation are poorly understood. In particular, our view into the training process is limited, with a single scalar loss being the most common viewport into this high-dimensional, dynamic process. We propose a new window into training called Loss Change Allocation (LCA), in which credit for changes to the network loss is conservatively partitioned to the parameters. This measurement is accomplished by decomposing the components of an approximate path integral along the training trajectory using a Runge-Kutta integrator. This rich view shows which parameters are responsible for decreasing or increasing the loss during training, or which parameters “help” or “hurt” the network’s learning, respectively. LCA may be summed over training iterations and/or over neurons, channels, or layers for increasingly coarse views. This new measurement device produces several insights into training. (1) We find that barely over 50% of parameters help during any given iteration. (2) Some entire layers hurt overall, moving on average against the training gradient, a phenomenon we hypothesize may be due to phase lag in an oscillatory training process. (3) Finally, increments in learning proceed in a synchronized manner across layers, often peaking on identical iterations.
Tasks
Published 2019-09-03
URL https://arxiv.org/abs/1909.01440v2
PDF https://arxiv.org/pdf/1909.01440v2.pdf
PWC https://paperswithcode.com/paper/lca-loss-change-allocation-for-neural-network
Repo https://github.com/vkumaresan/LCA
Framework none
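
A first-order version of LCA is simple to sketch: the loss change of one training step is allocated to each parameter as the product of its gradient and its update. The PyTorch snippet below shows that bookkeeping on a two-parameter toy problem; the paper uses a Runge-Kutta approximation of the path integral for a more accurate decomposition.

```python
# First-order Loss Change Allocation on a toy problem; the paper's
# Runge-Kutta path-integral version is more accurate than this sketch.
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
x, y = torch.tensor([0.5, 1.5]), torch.tensor(1.0)

def loss_fn(w):
    return ((w * x).sum() - y) ** 2

loss = loss_fn(w)
loss.backward()
with torch.no_grad():
    delta = -0.1 * w.grad        # one SGD step
    lca = w.grad * delta         # per-parameter credit for the loss change
    w += delta
print(lca)                       # negative entries "helped" (decreased the loss)
print(lca.sum(), loss_fn(w) - loss)  # first-order estimate vs actual change
```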