April 3, 2020

3448 words · 17 min read

Paper Group AWR 70

Synaptic Metaplasticity in Binarized Neural Networks. Hierarchical Neural Architecture Search for Single Image Super-Resolution. Meta-Learning across Meta-Tasks for Few-Shot Learning. Few-Shot Learning as Domain Adaptation: Algorithm and Analysis. Simultaneous Navigation and Radio Mapping for Cellular-Connected UAV with Deep Reinforcement Learning. …

Synaptic Metaplasticity in Binarized Neural Networks

Title Synaptic Metaplasticity in Binarized Neural Networks
Authors Axel Laborieux, Maxence Ernoult, Tifenn Hirtzlin, Damien Querlioz
Abstract While deep neural networks have surpassed human performance in multiple situations, they are prone to catastrophic forgetting: upon training a new task, they rapidly forget previously learned ones. Neuroscience studies, based on idealized tasks, suggest that in the brain, synapses overcome this issue by adjusting their plasticity depending on their past history. However, such “metaplastic” behaviour has never been leveraged to mitigate catastrophic forgetting in deep neural networks. In this work, we highlight a connection between metaplasticity models and the training process of binarized neural networks, a low-precision version of deep neural networks. Building on this idea, we propose and demonstrate experimentally, in situations of multitask and stream learning, a training technique that prevents catastrophic forgetting without needing previously presented data, nor formal boundaries between datasets. We support our approach with a theoretical analysis on a tractable task. This work bridges computational neuroscience and deep learning, and presents significant assets for future embedded and neuromorphic systems.
Tasks
Published 2020-03-07
URL https://arxiv.org/abs/2003.03533v1
PDF https://arxiv.org/pdf/2003.03533v1.pdf
PWC https://paperswithcode.com/paper/synaptic-metaplasticity-in-binarized-neural
Repo https://github.com/Laborieux-Axel/SynapticMetaplasticityBNN
Framework pytorch
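A minimal PyTorch sketch of the metaplastic idea, in which each binary weight is the sign of a real-valued hidden weight and updates that would push a hidden weight back towards zero (i.e. that could flip its sign) are attenuated as its magnitude grows, so consolidated synapses become harder to overwrite. Function and parameter names are illustrative; the repository above contains the authors' actual optimizer.

```python
import torch

def metaplastic_update(hidden_weight, grad, lr=0.01, m=1.3):
    """One metaplastic update step for a binarized layer (sketch).

    hidden_weight: real-valued hidden weights whose sign gives the binary weight.
    Updates that push a hidden weight towards zero are scaled by a factor that
    decays with the weight's magnitude (tanh-based attenuation), so weights that
    have been strongly consolidated resist being overwritten by a new task.
    """
    update = -lr * grad
    # An update "weakens" a weight when its sign is opposite to the hidden weight's.
    weakening = torch.sign(update) != torch.sign(hidden_weight)
    attenuation = 1.0 - torch.tanh(m * hidden_weight.abs()) ** 2
    scale = torch.where(weakening, attenuation, torch.ones_like(attenuation))
    return hidden_weight + scale * update
```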

Hierarchical Neural Architecture Search for Single Image Super-Resolution

Title Hierarchical Neural Architecture Search for Single Image Super-Resolution
Authors Yong Guo, Yongsheng Luo, Zhenhao He, Jin Huang, Jian Chen
Abstract Deep neural networks have exhibited promising performance in image super-resolution (SR). Most SR models follow a hierarchical architecture that contains both the cell-level design of computational blocks and the network-level design of the positions of upsampling blocks. However, designing SR models heavily relies on human expertise and is very labor-intensive. More critically, these SR models often contain a huge number of parameters and may not meet the requirements of computation resources in real-world applications. To address the above issues, we propose a Hierarchical Neural Architecture Search (HNAS) method to automatically design promising architectures with different requirements of computation cost. To this end, we design a hierarchical SR search space and propose a hierarchical controller for architecture search. Such a hierarchical controller is able to simultaneously find promising cell-level blocks and network-level positions of upsampling layers. Moreover, to design compact architectures with promising performance, we build a joint reward by considering both the performance and computation cost to guide the search process. Extensive experiments on five benchmark datasets demonstrate the superiority of our method over existing methods.
Tasks Image Super-Resolution, Neural Architecture Search, Super-Resolution
Published 2020-03-10
URL https://arxiv.org/abs/2003.04619v1
PDF https://arxiv.org/pdf/2003.04619v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-neural-architecture-search-for
Repo https://github.com/guoyongcs/HNAS-SR
Framework pytorch
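The search is guided by a joint reward that trades off super-resolution quality against computation cost. A toy sketch of such a reward is below; the exact functional form and weighting used by HNAS are not reproduced here, so the names and the penalty shape are assumptions.

```python
def joint_reward(psnr, flops, target_flops=600e9, alpha=0.1):
    """Toy joint reward for the architecture-search controller (sketch).

    Rewards validation PSNR while penalizing architectures whose computation
    cost exceeds a target budget, steering the controller towards compact
    cell designs and upsampling placements.
    """
    over_budget = max(0.0, flops / target_flops - 1.0)
    return psnr * (1.0 - alpha * over_budget)

# Example: an architecture reaching 28.5 dB but 20% over budget gets a reduced reward.
print(joint_reward(28.5, 720e9))
```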

Meta-Learning across Meta-Tasks for Few-Shot Learning

Title Meta-Learning across Meta-Tasks for Few-Shot Learning
Authors Nanyi Fei, Zhiwu Lu, Yizhao Gao, Jia Tian, Tao Xiang, Ji-Rong Wen
Abstract Existing meta-learning based few-shot learning (FSL) methods typically adopt an episodic training strategy whereby each episode contains a meta-task. Across episodes, these tasks are sampled randomly and their relationships are ignored. In this paper, we argue that the inter-meta-task relationships should be exploited and that tasks should be sampled strategically to assist in meta-learning. Specifically, we consider the relationships defined over two types of meta-task pairs and propose different strategies to exploit them. (1) Two meta-tasks with disjoint sets of classes: this pair is interesting because it is reminiscent of the relationship between the source seen classes and target unseen classes, featuring a domain gap caused by class differences. A novel learning objective termed meta-domain adaptation (MDA) is proposed to make the meta-learned model more robust to the domain gap. (2) Two meta-tasks with identical sets of classes: this pair is useful because it can be employed to learn models that are robust against poorly sampled few-shot examples. To that end, a novel meta-knowledge distillation (MKD) objective is formulated. Extensive experiments demonstrate that both MDA and MKD significantly boost the performance of a variety of FSL methods, resulting in new state-of-the-art results on three benchmarks.
Tasks Domain Adaptation, Few-Shot Learning, Meta-Learning
Published 2020-02-11
URL https://arxiv.org/abs/2002.04274v2
PDF https://arxiv.org/pdf/2002.04274v2.pdf
PWC https://paperswithcode.com/paper/meta-learning-across-meta-tasks-for-few-shot
Repo https://github.com/neilfei/MLMT-FSL
Framework pytorch
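A minimal sketch of the second strategy, meta-knowledge distillation between two meta-tasks that share the same classes but use different support samples. Prototypical-network-style classification is assumed for concreteness; the paper's exact MKD formulation may differ.

```python
import torch
import torch.nn.functional as F

def episode_logits(support, support_labels, query, n_way):
    """Prototype-based logits: negative distances from query embeddings to class prototypes."""
    protos = torch.stack([support[support_labels == c].mean(0) for c in range(n_way)])
    return -torch.cdist(query, protos)

def mkd_loss(support_a, labels_a, support_b, labels_b, query, n_way):
    """Two episodes with identical classes but different support samples should give
    consistent predictions on a shared query set (sketch of the MKD idea)."""
    log_p_a = F.log_softmax(episode_logits(support_a, labels_a, query, n_way), dim=1)
    p_b = F.softmax(episode_logits(support_b, labels_b, query, n_way), dim=1)
    return F.kl_div(log_p_a, p_b, reduction="batchmean")
```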

Few-Shot Learning as Domain Adaptation: Algorithm and Analysis

Title Few-Shot Learning as Domain Adaptation: Algorithm and Analysis
Authors Jiechao Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen
Abstract To recognize the unseen classes with only few samples, few-shot learning (FSL) uses prior knowledge learned from the seen classes. A major challenge for FSL is that the distribution of the unseen classes is different from that of those seen, resulting in poor generalization even when a model is meta-trained on the seen classes. This class-difference-caused distribution shift can be considered as a special case of domain shift. In this paper, for the first time, we propose a domain adaptation prototypical network with attention (DAPNA) to explicitly tackle such a domain shift problem in a meta-learning framework. Specifically, armed with a set transformer based attention module, we construct each episode with two sub-episodes without class overlap on the seen classes to simulate the domain shift between the seen and unseen classes. To align the feature distributions of the two sub-episodes with limited training samples, a feature transfer network is employed together with a margin disparity discrepancy (MDD) loss. Importantly, theoretical analysis is provided to give the learning bound of our DAPNA. Extensive experiments show that our DAPNA outperforms the state-of-the-art FSL alternatives, often by significant margins.
Tasks Domain Adaptation, Few-Shot Learning, Meta-Learning
Published 2020-02-06
URL https://arxiv.org/abs/2002.02050v2
PDF https://arxiv.org/pdf/2002.02050v2.pdf
PWC https://paperswithcode.com/paper/few-shot-learning-as-domain-adaptation
Repo https://github.com/JiechaoGuan/FSL-DAPNA
Framework pytorch
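A rough sketch of the episode construction described above: each training episode over the seen classes is split into two sub-episodes with disjoint class sets, simulating the seen/unseen domain shift, and their feature distributions are aligned. For brevity, a simple feature-mean discrepancy stands in for the margin disparity discrepancy (MDD) loss actually used by DAPNA.

```python
import torch

def split_subepisodes(features, labels):
    """Split one episode into two sub-episodes with disjoint class sets (sketch)."""
    classes = labels.unique()
    first_half = classes[: len(classes) // 2]
    mask = torch.isin(labels, first_half)
    return features[mask], features[~mask]

def alignment_loss(feat_a, feat_b):
    # Stand-in for the MDD loss: pull the mean features of the two sub-episodes together.
    return (feat_a.mean(0) - feat_b.mean(0)).pow(2).sum()
```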

Simultaneous Navigation and Radio Mapping for Cellular-Connected UAV with Deep Reinforcement Learning

Title Simultaneous Navigation and Radio Mapping for Cellular-Connected UAV with Deep Reinforcement Learning
Authors Yong Zeng, Xiaoli Xu, Shi Jin, Rui Zhang
Abstract The cellular-connected unmanned aerial vehicle (UAV) is a promising technology to unlock the full potential of UAVs in the future. However, how to achieve ubiquitous three-dimensional (3D) communication coverage for the UAVs in the sky is a new challenge. In this paper, we tackle this challenge with a new coverage-aware navigation approach, which exploits the UAV’s controllable mobility to design its navigation/trajectory so as to avoid the coverage holes of the cellular base stations (BSs) while accomplishing its mission. We formulate a UAV trajectory optimization problem to minimize the weighted sum of its mission completion time and expected communication outage duration, and propose a new solution approach based on the technique of deep reinforcement learning (DRL). To further improve the performance, we propose a new framework called simultaneous navigation and radio mapping (SNARM), where the UAV’s signal measurements are used not only to train the deep Q network (DQN) directly, but also to create a radio map that is able to predict the outage probabilities at all locations in the area of interest. This enables the generation of simulated UAV trajectories and the prediction of their expected returns, which are then used to further train the DQN via the Dyna technique, greatly improving the learning efficiency.
Tasks
Published 2020-03-17
URL https://arxiv.org/abs/2003.07574v1
PDF https://arxiv.org/pdf/2003.07574v1.pdf
PWC https://paperswithcode.com/paper/simultaneous-navigation-and-radio-mapping-for
Repo https://github.com/xuxiaoli-seu/SNARM-UAV-Learning
Framework tf
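The objective above is a weighted sum of mission completion time and expected outage duration, which suggests a per-step cost like the sketch below, where the outage probability at the UAV's current location comes either from real measurements or, for Dyna-style simulated transitions, from the learned radio map. The names and weighting are assumptions, not the paper's exact formulation.

```python
def step_cost(outage_prob, delta_t=1.0, mu=40.0):
    """Cost of one flight step: elapsed time plus weighted expected outage time (sketch).
    The per-step DQN reward can be taken as -step_cost(...)."""
    return delta_t + mu * outage_prob * delta_t

# For a simulated (Dyna) transition, outage_prob would be read from the radio map
# rather than measured, so imagined trajectories can be scored with the same cost.
print(step_cost(outage_prob=0.05))
```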

Frustratingly Simple Few-Shot Object Detection

Title Frustratingly Simple Few-Shot Object Detection
Authors Xin Wang, Thomas E. Huang, Trevor Darrell, Joseph E. Gonzalez, Fisher Yu
Abstract Detecting rare objects from a few examples is an emerging problem. Prior works show meta-learning is a promising approach. But, fine-tuning techniques have drawn scant attention. We find that fine-tuning only the last layer of existing detectors on rare classes is crucial to the few-shot object detection task. Such a simple approach outperforms the meta-learning methods by roughly 2~20 points on current benchmarks and sometimes even doubles the accuracy of the prior methods. However, the high variance in the few samples often leads to the unreliability of existing benchmarks. We revise the evaluation protocols by sampling multiple groups of training examples to obtain stable comparisons and build new benchmarks based on three datasets: PASCAL VOC, COCO and LVIS. Again, our fine-tuning approach establishes a new state of the art on the revised benchmarks. The code as well as the pretrained models are available at https://github.com/ucbdrive/few-shot-object-detection.
Tasks Few-Shot Object Detection, Meta-Learning, Object Detection
Published 2020-03-16
URL https://arxiv.org/abs/2003.06957v1
PDF https://arxiv.org/pdf/2003.06957v1.pdf
PWC https://paperswithcode.com/paper/frustratingly-simple-few-shot-object
Repo https://github.com/ucbdrive/few-shot-object-detection
Framework pytorch
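The core recipe, freezing the entire detector except its last layer (the box classifier and regressor) and fine-tuning that layer on the few-shot classes, can be sketched with torchvision's Faster R-CNN. The authors' repository builds on a different detection codebase, so this is an illustration of the idea rather than their implementation.

```python
import torch
import torchvision

# A base-trained detector (here a COCO-pretrained Faster R-CNN as a stand-in).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Freeze everything, then unfreeze only the final box predictor (the last layer).
for p in model.parameters():
    p.requires_grad = False
for p in model.roi_heads.box_predictor.parameters():
    p.requires_grad = True

# Fine-tune the unfrozen parameters on the few examples of the novel classes.
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.001, momentum=0.9)
```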

A Quadruplet Loss for Enforcing Semantically Coherent Embeddings in Multi-output Classification Problems

Title A Quadruplet Loss for Enforcing Semantically Coherent Embeddings in Multi-output Classification Problems
Authors Hugo Proença, Ehsan Yaghoubi, Pendar Alirezazadeh
Abstract This paper describes one objective function for learning semantically coherent feature embeddings in multi-output classification problems, i.e., when the response variables have dimension higher than one. In particular, we consider the problems of identity retrieval and soft biometrics labelling in visual surveillance environments, which have been attracting growing interest. Inspired by the triplet loss function [34], we propose a generalization that: 1) defines a metric that considers the number of agreeing labels between pairs of elements; and 2) disregards the notion of anchor, replacing the d(A1, A2) < d(A1, B) constraint by d(A, B) < d(C, D), with the pairs (A, B) and (C, D) chosen according to their numbers of agreeing labels. Like the triplet loss formulation, our proposal also privileges small distances between positive pairs, but at the same time explicitly enforces that the distance between other pairs corresponds directly to their similarity in terms of agreeing labels. This yields feature embeddings with a strong correspondence between the class centroids and their semantic descriptions, i.e., where elements are closer to others that share some of their labels than to elements with fully disjoint label memberships. As a practical effect, the proposed loss can be seen as particularly suitable for performing joint coarse (soft label) + fine (ID) inference, based on simple rules such as k-nearest neighbours, which is a novelty with respect to previous related loss functions. Also, in contrast to its triplet counterpart, the proposed loss is agnostic with regard to any demanding criteria for mining learning instances (such as semi-hard pairs). Our experiments were carried out on five different datasets (BIODI, LFW, IJB-A, Megaface and PETA) and validate our assumptions, showing highly promising results.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2020-02-26
URL https://arxiv.org/abs/2002.11644v3
PDF https://arxiv.org/pdf/2002.11644v3.pdf
PWC https://paperswithcode.com/paper/a-quadruplet-loss-for-enforcing-semantically
Repo https://github.com/hugomcp/quadruplets
Framework tf
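A minimal sketch of the d(A, B) < d(C, D) constraint described above, where the pair that agrees on more labels must be closer in embedding space and the required margin grows with the difference in the number of agreeing labels. The exact margin schedule of the paper is not reproduced; names are illustrative.

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(emb_a, emb_b, emb_c, emb_d,
                    labels_a, labels_b, labels_c, labels_d, margin=0.1):
    """Quadruplet-style loss (sketch). labels_* are (batch, n_attributes) tensors."""
    agree_ab = (labels_a == labels_b).float().sum(dim=1)
    agree_cd = (labels_c == labels_d).float().sum(dim=1)
    d_ab = F.pairwise_distance(emb_a, emb_b)
    d_cd = F.pairwise_distance(emb_c, emb_d)
    # Enforce d(A, B) < d(C, D) only where (A, B) agree on more labels than (C, D),
    # with a margin proportional to how much more they agree.
    mask = (agree_ab > agree_cd).float()
    return (mask * F.relu(d_ab - d_cd + margin * (agree_ab - agree_cd))).mean()
```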

A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings

Title A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings
Authors Iker García, Rodrigo Agerri, German Rigau
Abstract This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings. Our method integrates multiple word embeddings created from complementary techniques, textual sources, knowledge bases and languages. Existing word vectors are projected to a common semantic space using linear transformations and averaging. With our method the resulting meta-embeddings maintain the dimensionality of the original embeddings without losing information while dealing with the out-of-vocabulary problem. An extensive empirical evaluation demonstrates the effectiveness of our technique with respect to previous work on various intrinsic and extrinsic multilingual evaluations, obtaining competitive results for Semantic Textual Similarity and state-of-the-art performance for word similarity and POS tagging (English and Spanish). The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities. In other words, we can leverage pre-trained source embeddings from a resource-rich language in order to improve the word representations for under-resourced languages.
Tasks Cross-Lingual Transfer, Semantic Textual Similarity, Transfer Learning, Word Embeddings
Published 2020-01-17
URL https://arxiv.org/abs/2001.06381v1
PDF https://arxiv.org/pdf/2001.06381v1.pdf
PWC https://paperswithcode.com/paper/a-common-semantic-space-for-monolingual-and
Repo https://github.com/ikergarcia1996/MVM-Embeddings
Framework tf
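A compact sketch of the "project to a common space, then average" idea: an orthogonal (Procrustes-style) map is estimated on the vocabulary shared by two embedding sets, source vectors are projected into the target space, and the two views are averaged, with projection alone covering words missing from one vocabulary. This is a simplification; the paper's exact transformations and out-of-vocabulary handling may differ.

```python
import numpy as np

def fit_projection(src, tgt):
    """Orthogonal map W (Procrustes solution) such that src @ W ≈ tgt,
    estimated from the vectors of the shared vocabulary."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def meta_embed(vec_src, vec_tgt, W):
    """Meta-embedding of one word: average of the projected source vector and the
    target vector; if the word is missing from one space, fall back to the other."""
    if vec_tgt is None:
        return vec_src @ W
    if vec_src is None:
        return vec_tgt
    return 0.5 * (vec_src @ W + vec_tgt)

# Example with random stand-ins for the shared-vocabulary matrices (n_words, dim).
src, tgt = np.random.randn(100, 300), np.random.randn(100, 300)
W = fit_projection(src, tgt)
print(meta_embed(src[0], tgt[0], W).shape)  # (300,)
```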

Recommending Themes for Ad Creative Design via Visual-Linguistic Representations

Title Recommending Themes for Ad Creative Design via Visual-Linguistic Representations
Authors Yichao Zhou, Shaunak Mishra, Manisha Verma, Narayan Bhamidipati, Wei Wang
Abstract There is a perennial need in the online advertising industry to refresh ad creatives, i.e., images and text used for enticing online users towards a brand. Such refreshes are required to reduce the likelihood of ad fatigue among online users, and to incorporate insights from other successful campaigns in related product categories. Given a brand, coming up with themes for a new ad is a painstaking and time-consuming process for creative strategists. Strategists typically draw inspiration from the images and text used for past ad campaigns, as well as world knowledge on the brands. To automatically infer ad themes via such multimodal sources of information in past ad campaigns, we propose a theme (keyphrase) recommender system for ad creative strategists. The theme recommender is based on aggregating results from a visual question answering (VQA) task, which ingests the following: (i) ad images, (ii) text associated with the ads as well as Wikipedia pages on the brands in the ads, and (iii) questions around the ad. We leverage transformer based cross-modality encoders to train visual-linguistic representations for our VQA task. We study two formulations for the VQA task along the lines of classification and ranking; via experiments on a public dataset, we show that cross-modal representations lead to significantly better classification accuracy and ranking precision-recall metrics. Cross-modal representations show better performance compared to separate image and text representations. In addition, the use of multimodal information shows a significant lift over using only textual or visual information.
Tasks Question Answering, Recommendation Systems, Visual Question Answering
Published 2020-01-20
URL https://arxiv.org/abs/2001.07194v2
PDF https://arxiv.org/pdf/2001.07194v2.pdf
PWC https://paperswithcode.com/paper/recommending-themes-for-ad-creative-design
Repo https://github.com/joey1993/ad-themes
Framework none
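A toy sketch of the ranking formulation mentioned above: a fused visual-linguistic representation of an ad (produced in the paper by a transformer-based cross-modality encoder, abstracted away here) is scored against candidate theme keyphrases, and the top-scoring themes are recommended. All names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class ThemeRanker(nn.Module):
    """Score candidate theme keyphrases given a fused ad representation (sketch)."""
    def __init__(self, fused_dim=768, theme_dim=300, n_themes=1000):
        super().__init__()
        self.theme_table = nn.Embedding(n_themes, theme_dim)  # keyphrase embeddings
        self.project = nn.Linear(fused_dim, theme_dim)        # map ad features into theme space

    def forward(self, fused_ad_features):
        q = self.project(fused_ad_features)                   # (batch, theme_dim)
        return q @ self.theme_table.weight.t()                # (batch, n_themes) scores

ranker = ThemeRanker()
fused = torch.randn(2, 768)                 # stand-in for the cross-modal encoder output
top_themes = ranker(fused).topk(5).indices  # 5 recommended theme ids per ad
```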

SUOD: A Scalable Unsupervised Outlier Detection Framework

Title SUOD: A Scalable Unsupervised Outlier Detection Framework
Authors Yue Zhao, Xiyang Hu, Cheng Cheng, Cong Wang, Cao Xiao, Yunlong Wang, Jimeng Sun, Leman Akoglu
Abstract Outlier detection is a key data mining task for identifying abnormal objects from massive data. Due to the high expense of acquiring ground truth, practitioners lean towards building a large number of unsupervised models for further combination and analysis, rather than relying on a single model, out of reliability considerations. However, this poses a scalability challenge for high-dimensional, large datasets. In this study, we propose a three-module framework called SUOD to address the challenge. It can accelerate outlier model building and scoring when a large number of base models are used. It focuses on three complementary levels to speed up the process while controlling prediction performance degradation at the same time. At the data level, its Random Projection module projects high-dimensional data onto diversified low-dimensional subspaces while preserving the pairwise distance relationship. At the model level, SUOD’s Pseudo-supervised Approximation module can approximate and replace fitted unsupervised models by low-cost supervised regressors, leading to fast offline scoring on newly arriving samples with better interpretability. At the system level, its Balanced Parallel Scheduling module mitigates the workload imbalance within distributed systems, which is helpful for heterogeneous outlier ensembles. As the three modules are independent with different focuses, they have great flexibility to “mix and match”. The framework is also designed with extensibility in mind. One may customize each module based on specific use cases, and the framework may be generalized to other learning tasks as well. Extensive experiments on more than 20 benchmark datasets demonstrate SUOD’s effectiveness. In addition, a real-world deployment system on fraudulent claim analysis by IQVIA is also discussed. The full framework, documentation, and examples are openly shared at https://github.com/yzhao062/SUOD.
Tasks Outlier Detection, outlier ensembles
Published 2020-03-11
URL https://arxiv.org/abs/2003.05731v1
PDF https://arxiv.org/pdf/2003.05731v1.pdf
PWC https://paperswithcode.com/paper/suod-a-scalable-unsupervised-outlier
Repo https://github.com/yzhao062/SUOD
Framework none
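The first two modules map naturally onto standard tooling: random projection at the data level and pseudo-supervised approximation at the model level. The sketch below uses scikit-learn stand-ins to illustrate the idea only; the SUOD package linked above bundles all three modules into one framework.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestRegressor
from sklearn.random_projection import GaussianRandomProjection

X_train = np.random.randn(2000, 200)   # placeholder high-dimensional data
X_new = np.random.randn(10, 200)

# Data level: project onto a lower-dimensional subspace (fit once, reuse for new data).
proj = GaussianRandomProjection(n_components=30, random_state=0).fit(X_train)
Xp = proj.transform(X_train)

# Model level: fit an unsupervised detector, then approximate its scoring function
# with a cheap supervised regressor (pseudo-supervised approximation).
detector = IsolationForest(random_state=0).fit(Xp)
scores = -detector.score_samples(Xp)          # higher = more outlying
approximator = RandomForestRegressor(random_state=0).fit(Xp, scores)

# Fast offline scoring of new samples via the approximator.
new_scores = approximator.predict(proj.transform(X_new))
```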

Improving GANs for Speech Enhancement

Title Improving GANs for Speech Enhancement
Authors Huy Phan, Ian V. McLoughlin, Lam Pham, Oliver Y. Chén, Philipp Koch, Maarten De Vos, Alfred Mertins
Abstract Generative adversarial networks (GANs) have recently been shown to be efficient for speech enhancement. Most, if not all, existing speech enhancement GANs (SEGANs) make use of a single generator to perform one-stage enhancement mapping. In this work, we propose two novel SEGAN frameworks, iterated SEGAN (ISEGAN) and deep SEGAN (DSEGAN). In the two proposed frameworks, the GAN architectures are composed of multiple generators that are chained to accomplish multiple-stage enhancement mapping, which gradually refines the noisy input signals in a stage-wise fashion. On the one hand, ISEGAN’s generators share their parameters to learn an iterative enhancement mapping. On the other hand, DSEGAN’s generators share a common architecture but their parameters are independent; as a result, different enhancement mappings are learned at different stages of the network. We empirically demonstrate favorable results obtained by the proposed ISEGAN and DSEGAN frameworks over the vanilla SEGAN. The source code is available at http://github.com/pquochuy/idsegan.
Tasks Speech Enhancement
Published 2020-01-15
URL https://arxiv.org/abs/2001.05532v1
PDF https://arxiv.org/pdf/2001.05532v1.pdf
PWC https://paperswithcode.com/paper/improving-gans-for-speech-enhancement
Repo https://github.com/pquochuy/idsegan
Framework tf
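The difference between the two frameworks is whether the chained generators share parameters. A compact PyTorch sketch of that chaining; the actual SEGAN generator architecture and adversarial training are omitted, and `make_generator` stands for any callable returning a generator module.

```python
import torch.nn as nn

class ChainedEnhancer(nn.Module):
    """Multi-stage enhancement with chained generators (sketch).

    share_params=True  -> ISEGAN-style: one generator applied repeatedly (iterated mapping).
    share_params=False -> DSEGAN-style: independent generators, one per stage.
    """
    def __init__(self, make_generator, n_stages=2, share_params=False):
        super().__init__()
        if share_params:
            g = make_generator()
            self.stages = nn.ModuleList([g] * n_stages)
        else:
            self.stages = nn.ModuleList([make_generator() for _ in range(n_stages)])

    def forward(self, noisy):
        x, outputs = noisy, []
        for g in self.stages:
            x = g(x)           # each stage refines the previous stage's output
            outputs.append(x)
        return outputs          # every stage output is matched against the clean target
```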

StyleGAN2 Distillation for Feed-forward Image Manipulation

Title StyleGAN2 Distillation for Feed-forward Image Manipulation
Authors Yuri Viazovetskyi, Vladimir Ivashkin, Evgeny Kashin
Abstract StyleGAN2 is a state-of-the-art network for generating realistic images. Moreover, it was explicitly trained to have disentangled directions in latent space, which allows efficient image manipulation by varying latent factors. Editing existing images requires embedding a given image into the latent space of StyleGAN2. Latent code optimization via backpropagation is commonly used for qualitative embedding of real world images, although it is prohibitively slow for many applications. We propose a way to distill a particular image manipulation of StyleGAN2 into an image-to-image network trained in a paired way. The resulting pipeline is an alternative to existing GANs trained on unpaired data. We provide results for transformations of human faces: gender swap, aging/rejuvenation, style transfer and image morphing. We show that the quality of generation using our method is comparable to StyleGAN2 backpropagation and current state-of-the-art methods in these particular tasks.
Tasks Image Morphing, Style Transfer
Published 2020-03-07
URL https://arxiv.org/abs/2003.03581v1
PDF https://arxiv.org/pdf/2003.03581v1.pdf
PWC https://paperswithcode.com/paper/stylegan2-distillation-for-feed-forward-image
Repo https://github.com/EvgenyKashin/stylegan2-distillation
Framework none
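The pipeline amounts to generating synthetic paired data with StyleGAN2 (an image before and after moving its latent code along an edit direction) and then training a feed-forward image-to-image network on those pairs. A schematic sketch of the data-generation step follows, where `G` stands for a pretrained StyleGAN2 generator with the usual mapping/synthesis split; the interface is hypothetical, not the repository's actual API.

```python
import torch

@torch.no_grad()
def make_pair(G, direction, strength=3.0, latent_dim=512):
    """Create one (source, edited) training pair for the image-to-image student.

    G.mapping:   z -> w   (latent mapping network)
    G.synthesis: w -> image
    direction:   a vector in w-space for one attribute (e.g. gender or age).
    """
    z = torch.randn(1, latent_dim)
    w = G.mapping(z)
    source = G.synthesis(w)
    edited = G.synthesis(w + strength * direction)
    return source, edited

# Many such pairs form a paired dataset on which a standard pix2pix-style network
# is trained, replacing slow per-image latent optimization at inference time.
```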

Human-Aware Motion Deblurring

Title Human-Aware Motion Deblurring
Authors Ziyi Shen, Wenguan Wang, Xiankai Lu, Jianbing Shen, Haibin Ling, Tingfa Xu, Ling Shao
Abstract This paper proposes a human-aware deblurring model that disentangles the motion blur between foreground (FG) humans and background (BG). The proposed model is based on a triple-branch encoder-decoder architecture. The first two branches are learned for sharpening FG humans and BG details, respectively, while the third one produces global, harmonious results by comprehensively fusing multi-scale deblurring information from the two domains. The proposed model is further endowed with a supervised, human-aware attention mechanism in an end-to-end fashion. It learns a soft mask that encodes FG human information and explicitly drives the FG/BG decoder-branches to focus on their specific domains. To further benefit research on human-aware image deblurring, we introduce a large-scale dataset, named HIDE, which consists of 8,422 blurry and sharp image pairs with 65,784 densely annotated FG human bounding boxes. HIDE is specifically built to span a broad range of scenes, human object sizes, motion patterns, and background complexities. Extensive experiments on public benchmarks and our dataset demonstrate that our model performs favorably against the state-of-the-art motion deblurring methods, especially in capturing semantic details.
Tasks Deblurring
Published 2020-01-19
URL https://arxiv.org/abs/2001.06816v1
PDF https://arxiv.org/pdf/2001.06816v1.pdf
PWC https://paperswithcode.com/paper/human-aware-motion-deblurring-1
Repo https://github.com/joanshen0508/HA_deblur
Framework none
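A highly simplified sketch of the triple-branch idea: a shared encoder, FG and BG decoder branches gated by a predicted soft human mask (supervised in the paper), and a fusion branch that merges the two domains. Layer sizes and depths are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class HumanAwareDeblur(nn.Module):
    """Triple-branch human-aware deblurring (sketch)."""
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.mask_head = nn.Sequential(nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())
        self.fg_decoder = nn.Conv2d(ch, ch, 3, padding=1)
        self.bg_decoder = nn.Conv2d(ch, ch, 3, padding=1)
        self.fusion = nn.Conv2d(2 * ch, 3, 3, padding=1)

    def forward(self, blurry):
        feat = self.encoder(blurry)
        mask = self.mask_head(feat)              # soft FG-human attention mask
        fg = self.fg_decoder(feat) * mask        # FG branch focuses on human regions
        bg = self.bg_decoder(feat) * (1 - mask)  # BG branch focuses on the rest
        sharp = self.fusion(torch.cat([fg, bg], dim=1))
        return sharp, mask
```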

CSNNs: Unsupervised, Backpropagation-free Convolutional Neural Networks for Representation Learning

Title CSNNs: Unsupervised, Backpropagation-free Convolutional Neural Networks for Representation Learning
Authors Bonifaz Stuhr, Jürgen Brauer
Abstract This work combines Convolutional Neural Networks (CNNs), clustering via Self-Organizing Maps (SOMs) and Hebbian Learning to propose the building blocks of Convolutional Self-Organizing Neural Networks (CSNNs), which learn representations in an unsupervised and Backpropagation-free manner. Our approach replaces the learning of traditional convolutional layers from CNNs with the competitive learning procedure of SOMs and simultaneously learns local masks between those layers with separate Hebbian-like learning rules to overcome the problem of disentangling factors of variation when filters are learned through clustering. We investigate the learned representation by designing two simple models with our building blocks: they reach performance on Cifar10 comparable to many methods that use Backpropagation, and they provide baseline performances on Cifar100, Tiny ImageNet and a small subset of ImageNet for Backpropagation-free methods.
Tasks Representation Learning
Published 2020-01-28
URL https://arxiv.org/abs/2001.10388v2
PDF https://arxiv.org/pdf/2001.10388v2.pdf
PWC https://paperswithcode.com/paper/csnns-unsupervised-backpropagation-free
Repo https://github.com/BonifazStuhr/CSNN
Framework tf
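A minimal sketch of the competitive, backpropagation-free update that replaces gradient-based learning of convolution filters: image patches are assigned to their best-matching filter, which is then pulled towards them. This is a plain winner-take-all, k-means-like rule; the actual CSNN update also uses SOM neighborhood functions and the Hebbian-learned masks described above.

```python
import torch

def competitive_update(filters, patches, lr=0.1):
    """One competitive-learning step (sketch).

    filters: (n_filters, patch_dim) flattened convolution filters.
    patches: (n_patches, patch_dim) flattened image patches from the current batch.
    """
    dists = torch.cdist(patches, filters)      # (n_patches, n_filters)
    winners = dists.argmin(dim=1)              # best-matching filter per patch
    for k in range(filters.shape[0]):
        assigned = patches[winners == k]
        if len(assigned):
            filters[k] += lr * (assigned.mean(dim=0) - filters[k])
    return filters
```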

Block-wise Dynamic Sparseness

Title Block-wise Dynamic Sparseness
Authors Amir Hadifar, Johannes Deleu, Chris Develder, Thomas Demeester
Abstract Neural networks have achieved state-of-the-art performance across a wide variety of machine learning tasks, often with large and computation-heavy models. Inducing sparseness as a way to reduce the memory and computation footprint of these models has seen significant research attention in recent years. In this paper, we present a new method for dynamic sparseness, whereby part of the computations are omitted dynamically, based on the input. For efficiency, we combine the idea of dynamic sparseness with block-wise matrix-vector multiplications. In contrast to static sparseness, which permanently zeroes out selected positions in weight matrices, our method preserves the full network capabilities by potentially accessing any trained weights. Yet, matrix-vector multiplications are accelerated by omitting a pre-defined fraction of weight blocks from the matrix, based on the input. Experimental results on the task of language modeling, using recurrent and quasi-recurrent models, show that the proposed method can outperform a magnitude-based static sparseness baseline. In addition, our method achieves similar language modeling perplexities as the dense baseline, at half the computational cost at inference time.
Tasks Language Modelling
Published 2020-01-14
URL https://arxiv.org/abs/2001.04686v1
PDF https://arxiv.org/pdf/2001.04686v1.pdf
PWC https://paperswithcode.com/paper/block-wise-dynamic-sparseness
Repo https://github.com/hadifar/dynamic-sparseness
Framework none
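A sketch of a block-wise dynamic matrix-vector product as described above: a small gating function scores every weight block for the current input, only the top-scoring fraction of blocks is used, and the remaining block multiplications are skipped. The gating function and block layout below are assumptions for illustration.

```python
import torch

def blockwise_dynamic_matvec(W, x, gate, block_size=64, keep_ratio=0.5):
    """Input-dependent block-sparse matrix-vector product (sketch).

    gate(x) must return one score per weight block, i.e.
    (out_dim // block_size) * (in_dim // block_size) values; only the
    top keep_ratio fraction of blocks is actually multiplied.
    """
    out_dim, in_dim = W.shape
    n_out, n_in = out_dim // block_size, in_dim // block_size
    scores = gate(x).view(n_out, n_in)
    k = max(1, int(keep_ratio * n_out * n_in))
    keep = torch.zeros(n_out * n_in, dtype=torch.bool)
    keep[scores.view(-1).topk(k).indices] = True
    keep = keep.view(n_out, n_in)

    y = torch.zeros(out_dim)
    for i in range(n_out):
        for j in range(n_in):
            if keep[i, j]:
                y[i * block_size:(i + 1) * block_size] += (
                    W[i * block_size:(i + 1) * block_size,
                      j * block_size:(j + 1) * block_size]
                    @ x[j * block_size:(j + 1) * block_size])
    return y

# Example with a random stand-in for a tiny learned gating network over 4x4 blocks.
W, x = torch.randn(256, 256), torch.randn(256)
y = blockwise_dynamic_matvec(W, x, gate=lambda v: torch.randn(16))
```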