January 31, 2020

3207 words 16 mins read

Paper Group AWR 403

Wave Physics as an Analog Recurrent Neural Network

Title Wave Physics as an Analog Recurrent Neural Network
Authors Tyler W. Hughes, Ian A. D. Williamson, Momchil Minkov, Shanhui Fan
Abstract Analog machine learning hardware platforms promise to be faster and more energy-efficient than their digital counterparts. Wave physics, as found in acoustics and optics, is a natural candidate for building analog processors for time-varying signals. Here we identify a mapping between the dynamics of wave physics and the computation in recurrent neural networks. This mapping indicates that physical wave systems can be trained to learn complex features in temporal data, using standard training techniques for neural networks. As a demonstration, we show that an inverse-designed inhomogeneous medium can perform vowel classification on raw audio signals as their waveforms scatter and propagate through it, achieving performance comparable to a standard digital implementation of a recurrent neural network. These findings pave the way for a new class of analog machine learning platforms, capable of fast and efficient processing of information in its native domain.
Tasks Vowel Classification
Published 2019-04-29
URL https://arxiv.org/abs/1904.12831v2
PDF https://arxiv.org/pdf/1904.12831v2.pdf
PWC https://paperswithcode.com/paper/wave-physics-as-an-analog-recurrent-neural
Repo https://github.com/fancompute/wavetorch
Framework pytorch
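
The key mapping is that the leapfrog discretization of the scalar wave equation is itself a recurrent update: the field at the current and previous time steps plays the role of the hidden state, and the spatial wave-speed distribution plays the role of the trainable weights. Below is a minimal 1-D sketch of this idea in PyTorch (an illustration of the mapping, not the wavetorch implementation; the grid size, probe locations, and source setup are arbitrary choices here):

```python
import torch

# Minimal 1-D sketch of the wave-equation-as-RNN mapping (hypothetical
# illustration, not the wavetorch implementation). The hidden state is the
# field at the current and previous time steps; the trainable parameter is
# the spatial wave-speed distribution c(x) of the inhomogeneous medium.
class WaveRNNCell(torch.nn.Module):
    def __init__(self, n_grid, dt=1.0, dx=1.0):
        super().__init__()
        self.dt, self.dx = dt, dx
        # Inverse design: the medium c(x) is the "weight" being trained.
        self.c = torch.nn.Parameter(torch.full((n_grid,), 0.5))

    def forward(self, source, state):
        u_now, u_prev = state
        # Discrete Laplacian with fixed (zero) boundaries.
        lap = torch.zeros_like(u_now)
        lap[1:-1] = (u_now[2:] - 2 * u_now[1:-1] + u_now[:-2]) / self.dx ** 2
        # Leapfrog update: u_{t+1} = 2 u_t - u_{t-1} + (c dt)^2 lap + source
        u_next = 2 * u_now - u_prev + (self.c * self.dt) ** 2 * lap + source
        return u_next, (u_next, u_now)

cell = WaveRNNCell(n_grid=128)
state = (torch.zeros(128), torch.zeros(128))
for sample in torch.randn(100):            # raw waveform as input sequence
    src = torch.zeros(128)
    src[10] = sample                       # inject audio at a source point
    out, state = cell(src, state)
# Probe field intensity at "receiver" grid points as class logits.
logits = torch.stack([out[40] ** 2, out[64] ** 2, out[88] ** 2])
```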

Self-Monitoring Navigation Agent via Auxiliary Progress Estimation

Title Self-Monitoring Navigation Agent via Auxiliary Progress Estimation
Authors Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong
Abstract The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic unknown environments. This challenging task demands that the agent be aware of which instruction was completed, which instruction is needed next, which way to go, and its navigation progress towards the goal. In this paper, we introduce a self-monitoring agent with two complementary components: (1) a visual-textual co-grounding module to locate the instruction completed in the past, the instruction required for the next action, and the next moving direction from surrounding images, and (2) a progress monitor to ensure the grounded instruction correctly reflects the navigation progress. We test our self-monitoring agent on a standard benchmark and analyze our proposed approach through a series of ablation studies that elucidate the contributions of the primary components. Using our proposed method, we set the new state of the art by a significant margin (8% absolute increase in success rate on the unseen test set). Code is available at https://github.com/chihyaoma/selfmonitoring-agent .
Tasks Natural Language Visual Grounding, Vision-Language Navigation, Visual Navigation
Published 2019-01-10
URL http://arxiv.org/abs/1901.03035v1
PDF http://arxiv.org/pdf/1901.03035v1.pdf
PWC https://paperswithcode.com/paper/self-monitoring-navigation-agent-via
Repo https://github.com/ayusefi/Localization-Papers
Framework none
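
The progress monitor is the distinctive piece: an auxiliary head that regresses how far along the instruction the agent is. A hedged sketch of that idea follows (module names and feature sizes are illustrative assumptions, not the authors' exact architecture):

```python
import torch
import torch.nn as nn

# Sketch of an auxiliary progress monitor. Given the agent's hidden state
# and its attention distribution over instruction words, regress a scalar
# in [-1, 1] estimating normalized progress toward the goal; attention
# shifting toward later words should imply more progress.
class ProgressMonitor(nn.Module):
    def __init__(self, hidden_dim, max_instr_len):
        super().__init__()
        self.fc_h = nn.Linear(hidden_dim, hidden_dim)
        self.fc_out = nn.Linear(hidden_dim + max_instr_len, 1)

    def forward(self, h_t, instr_attn):
        # instr_attn: (batch, max_instr_len) textual attention weights
        x = torch.cat([torch.tanh(self.fc_h(h_t)), instr_attn], dim=-1)
        return torch.tanh(self.fc_out(x))  # predicted progress in [-1, 1]

# Trained with an auxiliary regression loss against ground-truth progress,
# e.g. 1 - (remaining distance to goal / initial distance), alongside the
# usual action-prediction loss.
monitor = ProgressMonitor(hidden_dim=512, max_instr_len=80)
progress = monitor(torch.randn(4, 512), torch.softmax(torch.randn(4, 80), -1))
```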

Improving the Harmony of the Composite Image by Spatial-Separated Attention Module

Title Improving the Harmony of the Composite Image by Spatial-Separated Attention Module
Authors Xiaodong Cun, Chi-Man Pun
Abstract Image composition is one of the most important applications in image processing. However, the inharmonious appearance between the spliced region and the background degrades the quality of the composite image. We therefore address the problem of image harmonization: given a spliced image and the mask of the spliced region, we try to harmonize the “style” of the pasted region with the background (non-spliced region). Previous approaches have focused on learning this mapping directly with a neural network. In this work, we start from an empirical observation: the spliced image and the harmonized result differ only in the spliced region, while they share the same semantic information and appearance in the non-spliced region. Thus, in order to learn the features of the masked region and the rest of the image individually, we propose a novel attention module named the Spatial-Separated Attention Module (S2AM). Furthermore, we design a novel image harmonization framework by inserting the S2AM into the coarser low-level features of the U-Net structure in two different ways. Beyond image harmonization, we also take a step toward harmonizing the composite image without a specific mask, building on the same observation. The experiments show that the proposed S2AM performs better than other state-of-the-art attention modules on our task. Moreover, we demonstrate the advantages of our model over other state-of-the-art image harmonization methods via multiple evaluation criteria. Code is available at https://github.com/vinthony/s2am
Tasks
Published 2019-07-15
URL https://arxiv.org/abs/1907.06406v3
PDF https://arxiv.org/pdf/1907.06406v3.pdf
PWC https://paperswithcode.com/paper/improving-the-harmony-of-the-composite-image
Repo https://github.com/vinthony/s2am
Framework pytorch
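
The core mechanism is to attend to the masked (spliced) region and the background with separately learned attentions and recombine them via the spatial mask. A minimal PyTorch sketch of that reading follows (a simplification of S2AM, not the released implementation; the channel-attention design here is an assumption):

```python
import torch
import torch.nn as nn

# Illustrative sketch of the spatial-separated attention idea: learn
# separate channel attentions for the masked (spliced) region and the
# background, then recombine them with the spatial mask.
class S2AMSketch(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        def channel_attn():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
                nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.attn_fg = channel_attn()   # attention for the spliced region
        self.attn_bg = channel_attn()   # attention for the background

    def forward(self, feat, mask):
        # mask: (B, 1, H, W), 1 inside the spliced region.
        fg = feat * self.attn_fg(feat) * mask
        bg = feat * self.attn_bg(feat) * (1 - mask)
        return fg + bg

m = S2AMSketch(64)
out = m(torch.randn(2, 64, 32, 32), torch.rand(2, 1, 32, 32).round())
```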

Improved Precision and Recall Metric for Assessing Generative Models

Title Improved Precision and Recall Metric for Assessing Generative Models
Authors Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, Timo Aila
Abstract The ability to automatically estimate the quality and coverage of the samples produced by a generative model is a vital requirement for driving algorithm research. We present an evaluation metric that can separately and reliably measure both of these aspects in image generation tasks by forming explicit, non-parametric representations of the manifolds of real and generated data. We demonstrate the effectiveness of our metric in StyleGAN and BigGAN by providing several illustrative examples where existing metrics yield uninformative or contradictory results. Furthermore, we analyze multiple design variants of StyleGAN to better understand the relationships between the model architecture, training methods, and the properties of the resulting sample distribution. In the process, we identify new variants that improve the state-of-the-art. We also perform the first principled analysis of truncation methods and identify an improved method. Finally, we extend our metric to estimate the perceptual quality of individual samples, and use this to study latent space interpolations.
Tasks Image Generation
Published 2019-04-15
URL https://arxiv.org/abs/1904.06991v3
PDF https://arxiv.org/pdf/1904.06991v3.pdf
PWC https://paperswithcode.com/paper/improved-precision-and-recall-metric-for
Repo https://github.com/kynkaat/improved-precision-and-recall-metric
Framework tf
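
The metric itself is simple to state: embed real and generated samples, estimate each manifold non-parametrically with per-sample k-nearest-neighbour hyperspheres, and count how many samples of one set fall inside the other set's manifold. A compact NumPy sketch (simplified from the paper; feature embeddings are assumed precomputed, with random vectors standing in here):

```python
import numpy as np

# A sample is counted as lying on the other set's manifold if it falls
# inside any hypersphere whose radius is that set's k-th nearest-neighbour
# distance.
def knn_radii(points, k):
    d = np.linalg.norm(points[:, None] - points[None], axis=-1)
    return np.sort(d, axis=1)[:, k]  # index k skips the zero self-distance

def covered(queries, points, radii):
    d = np.linalg.norm(queries[:, None] - points[None], axis=-1)
    return (d <= radii[None]).any(axis=1)

def precision_recall(real, fake, k=3):
    precision = covered(fake, real, knn_radii(real, k)).mean()
    recall = covered(real, fake, knn_radii(fake, k)).mean()
    return precision, recall

real = np.random.randn(200, 64)   # stand-ins for e.g. VGG-16 embeddings
fake = np.random.randn(200, 64)
print(precision_recall(real, fake))
```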

Investigating Recurrent Neural Network Memory Structures using Neuro-Evolution

Title Investigating Recurrent Neural Network Memory Structures using Neuro-Evolution
Authors Alexander Ororbia, AbdElRahman ElSaid, Travis Desell
Abstract This paper presents a new algorithm, Evolutionary eXploration of Augmenting Memory Models (EXAMM), which is capable of evolving recurrent neural networks (RNNs) using a wide variety of memory structures, such as Delta-RNN, GRU, LSTM, MGU and UGRNN cells. EXAMM evolved RNNs to perform prediction of large-scale, real-world time series data from the aviation and power industries. These data sets consist of very long time series (thousands of readings), each with a large number of potentially correlated and dependent parameters. Four different parameters were selected for prediction, and EXAMM runs were performed using each memory cell type alone, each cell type with feed-forward nodes, and with all possible memory cell types. Evolved RNN performance was measured using repeated k-fold cross validation, resulting in 1,210 EXAMM runs which evolved 2,420,000 RNNs in 12,100 CPU hours on a high performance computing cluster. Generalization of the evolved RNNs was examined statistically, providing interesting findings that can help refine RNN memory cell design as well as inform the development of future neuro-evolution algorithms.
Tasks Time Series
Published 2019-02-06
URL http://arxiv.org/abs/1902.02390v2
PDF http://arxiv.org/pdf/1902.02390v2.pdf
PWC https://paperswithcode.com/paper/investigating-recurrent-neural-network-memory
Repo https://github.com/travisdesell/exact
Framework none
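
At its core the algorithm is an evolutionary loop over RNN genomes whose mutations insert nodes with randomly chosen memory-cell types. The sketch below is a highly simplified, synchronous stand-in for that loop (EXAMM itself is asynchronous and island-based; the genome encoding and the stubbed fitness function here are illustrative assumptions):

```python
import random, copy

# Simplified neuro-evolution loop. A genome lists its nodes' memory-cell
# types; mutation inserts a new node with a randomly chosen cell type, and
# selection keeps the genomes with the lowest validation error.
CELL_TYPES = ["delta-rnn", "gru", "lstm", "mgu", "ugrnn", "feedforward"]

def mutate(genome):
    child = copy.deepcopy(genome)
    child["nodes"].append(random.choice(CELL_TYPES))
    return child

def evolve(fitness, pop_size=20, generations=50):
    population = [{"nodes": [random.choice(CELL_TYPES)]}
                  for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(random.choice(population)) for _ in range(pop_size)]
        population = sorted(population + children, key=fitness)[:pop_size]
    return population[0]

# fitness would train the genome's RNN briefly and return validation error;
# a stub that mildly penalizes size is used here.
best = evolve(lambda g: random.random() + 0.01 * len(g["nodes"]))
```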

LOSSGRAD: automatic learning rate in gradient descent

Title LOSSGRAD: automatic learning rate in gradient descent
Authors Bartosz Wójcik, Łukasz Maziarka, Jacek Tabor
Abstract In this paper, we propose a simple, fast and easy-to-implement algorithm, LOSSGRAD (locally optimal step-size in gradient descent), which automatically modifies the step-size of gradient descent during neural network training. Given a function $f$, a point $x$, and the gradient $\nabla_x f$ of $f$, we aim to find the step-size $h$ which is (locally) optimal, i.e. satisfies: $$h = \arg\min_{t \geq 0} f(x - t\,\nabla_x f).$$ Making use of a quadratic approximation, we show that the algorithm satisfies the above condition. We experimentally show that our method is insensitive to the choice of initial learning rate while achieving results comparable to other methods.
Tasks
Published 2019-02-20
URL http://arxiv.org/abs/1902.07656v1
PDF http://arxiv.org/pdf/1902.07656v1.pdf
PWC https://paperswithcode.com/paper/lossgrad-automatic-learning-rate-in-gradient
Repo https://github.com/bartwojcik/lossgrad
Framework pytorch
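
Concretely, the method fits a one-dimensional quadratic model of the loss along the negative-gradient ray and steps to its minimum. A sketch under that quadratic model follows (a simplification; see the repo for the actual algorithm, and note the probe step and safeguards here are assumptions):

```python
import torch

# Along the ray x - t*g we fit q(t) = f(x) - t*||g||^2 + a*t^2 through the
# probe point f(x - h*g) and step to the parabola's minimum
# t* = ||g||^2 / (2a), which becomes the new step-size h.
def lossgrad_step(f, x, h):
    fx = f(x)
    g = torch.autograd.grad(fx, x)[0]
    g_sq = (g * g).sum()
    with torch.no_grad():
        f_probe = f(x - h * g)
        a = (f_probe - fx + h * g_sq) / h ** 2
        if a > 0:                      # parabola opens upward: use its vertex
            h = (g_sq / (2 * a)).item()
        x_new = (x - h * g).detach().requires_grad_(True)
    return x_new, h

f = lambda v: ((v - 3.0) ** 2).sum()   # toy objective with minimum at (3, 3)
x = torch.tensor([0.0, 0.0], requires_grad=True)
h = 0.1
for _ in range(20):
    x, h = lossgrad_step(f, x, h)
print(x, h)                            # converges regardless of initial h
```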

Biologically inspired alternatives to backpropagation through time for learning in recurrent neural nets

Title Biologically inspired alternatives to backpropagation through time for learning in recurrent neural nets
Authors Guillaume Bellec, Franz Scherr, Elias Hajek, Darjan Salaj, Robert Legenstein, Wolfgang Maass
Abstract How recurrently connected networks of spiking neurons in the brain acquire powerful information-processing capabilities through learning has remained a mystery. This lack of understanding is linked to a lack of learning algorithms for recurrent networks of spiking neurons (RSNNs) that are both functionally powerful and implementable by known biological mechanisms. Since RSNNs are simultaneously a primary target for implementations of brain-inspired circuits in neuromorphic hardware, this lack of algorithmic insight also hinders technological progress in that area. The gold standard for learning in recurrent neural networks in machine learning is back-propagation through time (BPTT), which implements stochastic gradient descent with regard to a given loss function. But BPTT is unrealistic from a biological perspective, since it requires the transmission of error signals backwards in time and in space, i.e., from post- to presynaptic neurons. We show that an online merging of locally available information during a computation with suitable top-down learning signals in real time provides highly capable approximations to BPTT. For tasks where information on errors arises only late during a network computation, we enrich the locally available information through feedforward eligibility traces of synapses that can easily be computed in an online manner. The resulting new generation of learning algorithms for recurrent neural networks provides a new understanding of network learning in the brain that can be tested experimentally. In addition, these algorithms provide efficient methods for on-chip training of RSNNs in neuromorphic hardware.
Tasks
Published 2019-01-25
URL http://arxiv.org/abs/1901.09049v2
PDF http://arxiv.org/pdf/1901.09049v2.pdf
PWC https://paperswithcode.com/paper/biologically-inspired-alternatives-to
Repo https://github.com/NathanWycoff/eprop
Framework none
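
The eligibility-trace idea can be sketched in a few lines: each synapse keeps a locally computable, leaky record of its recent influence, and a top-down error signal gates weight updates online, with no backward pass through time. A toy NumPy sketch follows (far simpler than the paper's spiking-network setting; the network, task, and learning-signal choice here are illustrative assumptions):

```python
import numpy as np

# e-prop-style toy: eligibility traces combined online with a top-down
# learning signal (here, the instantaneous output error).
rng = np.random.default_rng(0)
n_in, n_h = 4, 8
W_in = rng.normal(0, 0.5, (n_h, n_in))
w_out = rng.normal(0, 0.5, n_h)
alpha, lr = 0.9, 0.01                # leak factor and learning rate

h = np.zeros(n_h)
trace = np.zeros((n_h, n_in))        # eligibility trace per input synapse
for t in range(200):
    x = rng.normal(size=n_in)
    target = x.sum()                 # toy regression target
    pre = W_in @ x
    h = alpha * h + np.tanh(pre)
    # Local trace update: leaky accumulation of each synapse's recent input
    # scaled by the postsynaptic activation derivative.
    trace = alpha * trace + (1 - np.tanh(pre) ** 2)[:, None] * x[None, :]
    err = w_out @ h - target         # top-down learning signal
    W_in -= lr * err * w_out[:, None] * trace
    w_out -= lr * err * h
```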

Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch

Title Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch
Authors Kaiyang Zhou, Tao Xiang
Abstract Person re-identification (re-ID), which aims to re-identify people across different camera views, has been significantly advanced by deep learning in recent years, particularly with convolutional neural networks (CNNs). In this paper, we present Torchreid, a software library built on PyTorch that allows fast development and end-to-end training and evaluation of deep re-ID models. As a general-purpose framework for person re-ID research, Torchreid provides (1) unified data loaders that support 15 commonly used re-ID benchmark datasets covering both image and video domains, (2) streamlined pipelines for quick development and benchmarking of deep re-ID models, and (3) implementations of the latest re-ID CNN architectures along with their pre-trained models to facilitate reproducibility as well as future research. With a high level of modularity in its design, Torchreid offers great flexibility, allowing easy extension to new datasets, CNN models and loss functions.
Tasks Person Re-Identification
Published 2019-10-22
URL https://arxiv.org/abs/1910.10093v1
PDF https://arxiv.org/pdf/1910.10093v1.pdf
PWC https://paperswithcode.com/paper/torchreid-a-library-for-deep-learning-person
Repo https://github.com/KaiyangZhou/deep-person-reid
Framework pytorch
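
For flavor, a typical training script looks roughly like the following (adapted from the project README at the time of the paper; argument names may differ across library versions):

```python
import torchreid

# Unified data loading for a standard benchmark.
datamanager = torchreid.data.ImageDataManager(
    root='reid-data', sources='market1501',
    height=256, width=128,
    batch_size_train=32, batch_size_test=100)

# Build a re-ID CNN with ImageNet-pretrained weights.
model = torchreid.models.build_model(
    name='resnet50', num_classes=datamanager.num_train_pids,
    loss='softmax', pretrained=True)

optimizer = torchreid.optim.build_optimizer(model, optim='adam', lr=0.0003)
scheduler = torchreid.optim.build_lr_scheduler(
    optimizer, lr_scheduler='single_step', stepsize=20)

# Streamlined end-to-end training and evaluation.
engine = torchreid.engine.ImageSoftmaxEngine(
    datamanager, model, optimizer=optimizer, scheduler=scheduler)
engine.run(save_dir='log/resnet50', max_epoch=60,
           eval_freq=10, print_freq=10, test_only=False)
```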

Learning-to-Learn Stochastic Gradient Descent with Biased Regularization

Title Learning-to-Learn Stochastic Gradient Descent with Biased Regularization
Authors Giulia Denevi, Carlo Ciliberto, Riccardo Grazzi, Massimiliano Pontil
Abstract We study the problem of learning-to-learn: inferring a learning algorithm that works well on tasks sampled from an unknown distribution. As the class of algorithms, we consider Stochastic Gradient Descent on the true risk regularized by the squared Euclidean distance to a bias vector. We present an average excess risk bound for such a learning algorithm. This result quantifies the potential benefit of using a bias vector with respect to the unbiased case. We then address the problem of estimating the bias from a sequence of tasks. We propose a meta-algorithm which incrementally updates the bias as new tasks are observed. The low space and time complexity of this approach makes it appealing in practice. We provide guarantees on the learning ability of the meta-algorithm. A key feature of our results is that, when the number of tasks grows and their variance is relatively small, our learning-to-learn approach has a significant advantage over learning each task in isolation by Stochastic Gradient Descent without a bias term. We report on numerical experiments which demonstrate the effectiveness of our approach.
Tasks
Published 2019-03-25
URL http://arxiv.org/abs/1903.10399v1
PDF http://arxiv.org/pdf/1903.10399v1.pdf
PWC https://paperswithcode.com/paper/learning-to-learn-stochastic-gradient-descent
Repo https://github.com/prolearner/onlineLTL
Framework none
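
The algorithmic pattern is easy to sketch: within each task, run SGD on the task loss plus the squared-distance regularizer $(\lambda/2)\,\|w - h\|^2$, and across tasks, incrementally move the bias $h$. The sketch below illustrates this on toy linear-regression tasks (the specific meta-update and task model are simplifying assumptions, not the paper's exact algorithm):

```python
import numpy as np

# Within-task: biased-regularized SGD. Across tasks: move the shared bias
# h toward each task's final iterate.
rng = np.random.default_rng(0)
d, lam, lr, meta_lr = 5, 1.0, 0.05, 0.1
h = np.zeros(d)                                  # shared bias vector
w_star = rng.normal(1.0, 0.3, d)                 # common task mean

for task in range(100):
    w_task = w_star + rng.normal(0, 0.1, d)      # tasks with small variance
    w = h.copy()                                 # warm-start at the bias
    for _ in range(50):                          # within-task biased SGD
        x = rng.normal(size=d)
        grad = (w @ x - w_task @ x) * x + lam * (w - h)
        w -= lr * grad
    h += meta_lr * (w - h)                       # incremental bias update

print(np.linalg.norm(h - w_star))                # h drifts toward task mean
```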

Semantic Object Accuracy for Generative Text-to-Image Synthesis

Title Semantic Object Accuracy for Generative Text-to-Image Synthesis
Authors Tobias Hinz, Stefan Heinrich, Stefan Wermter
Abstract Generative adversarial networks conditioned on simple textual image descriptions are capable of generating realistic-looking images. However, current methods still struggle to generate images based on complex image captions from a heterogeneous domain. Furthermore, quantitatively evaluating these text-to-image synthesis models is still challenging, as most evaluation metrics only judge image quality but not the conformity between the image and its caption. To address the aforementioned challenges, we introduce both a new model that explicitly models individual objects within an image and a new evaluation metric called Semantic Object Accuracy (SOA) that specifically evaluates images given an image caption. Our model adds an object pathway to both the generator and the discriminator to explicitly learn features of individual objects. The SOA uses a pre-trained object detector to evaluate if a generated image contains objects that are specifically mentioned in the image caption, e.g. whether an image generated from “a car driving down the street” contains a car. Our evaluation shows that models which explicitly model individual objects outperform models which only model global image characteristics. However, the SOA also shows that despite this increased performance current models still struggle to generate images that contain realistic objects of multiple different domains.
Tasks Image Captioning, Image Generation, Text-to-Image Generation
Published 2019-10-29
URL https://arxiv.org/abs/1910.13321v1
PDF https://arxiv.org/pdf/1910.13321v1.pdf
PWC https://paperswithcode.com/paper/191013321
Repo https://github.com/tohinz/semantic-object-accuracy-for-generative-text-to-image-synthesis
Framework pytorch
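
The SOA computation itself is easy to sketch: parse each caption for detectable object classes, run a pre-trained detector on the generated image, and count matches. A hedged Python sketch follows (the caption parsing, stub detector, and flat averaging here are simplifying assumptions; the paper specifies the detector and per-class averaging in detail):

```python
# For each caption mentioning a detectable class, count whether that class
# appears in the detections for the generated image.
def soa(samples, detect, class_names):
    """samples: list of (caption, image); detect(image) -> set of class names."""
    hits, total = 0, 0
    for caption, image in samples:
        mentioned = {c for c in class_names if c in caption.lower()}
        if not mentioned:
            continue                       # caption names no known class
        found = detect(image)
        hits += len(mentioned & found)
        total += len(mentioned)
    return hits / max(total, 1)

# Usage with a stub detector; a real setup would use a pre-trained
# YOLO/Faster R-CNN model.
score = soa([("a car driving down the street", None)],
            detect=lambda img: {"car"}, class_names=["car", "person"])
print(score)
```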

E-Sports Talent Scouting Based on Multimodal Twitch Stream Data

Title E-Sports Talent Scouting Based on Multimodal Twitch Stream Data
Authors Anna Belova, Wen He, Ziyi Zhong
Abstract We propose and investigate the feasibility of a novel task that consists in finding e-sports talent using multimodal Twitch chat and video stream data. Specifically, we focus on predicting the ranks of Counter-Strike: Global Offensive (CS:GO) gamers who broadcast their games on Twitch. During January 2019–April 2019, we built two Twitch stream collections: one for 425 publicly ranked CS:GO gamers and one for 9,928 unranked CS:GO gamers. We extract neural features from video, audio and text chat data and estimate modality-specific probabilities for a gamer to be top-ranked during the data collection time-frame. A hierarchical Bayesian model is then used to pool the evidence across modalities and generate estimates of intrinsic skill for each gamer. Our modeling is validated by correlating the intrinsic skill predictions with May 2019 ranks of the publicly profiled gamers.
Tasks
Published 2019-07-02
URL https://arxiv.org/abs/1907.01615v1
PDF https://arxiv.org/pdf/1907.01615v1.pdf
PWC https://paperswithcode.com/paper/e-sports-talent-scouting-based-on-multimodal
Repo https://github.com/mug31416/E-sports-on-Twitch
Framework none
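
The pooling step can be gestured at with a simple log-odds average (a stand-in only; the paper fits a full hierarchical Bayesian model, which this sketch does not reproduce):

```python
import numpy as np

# Toy pooling of modality-specific probabilities that a gamer is
# top-ranked (video, audio, chat) into a single skill estimate.
def pool_log_odds(probs, eps=1e-6):
    p = np.clip(np.asarray(probs), eps, 1 - eps)
    logit = np.log(p / (1 - p)).mean(axis=-1)    # average the evidence
    return 1 / (1 + np.exp(-logit))

skill = pool_log_odds([[0.7, 0.6, 0.9],          # gamer A
                       [0.2, 0.4, 0.3]])         # gamer B
print(skill)
```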

Bonsai – Diverse and Shallow Trees for Extreme Multi-label Classification

Title Bonsai – Diverse and Shallow Trees for Extreme Multi-label Classification
Authors Sujay Khandagale, Han Xiao, Rohit Babbar
Abstract Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC and partitions the labels in the representation space to learn shallow trees. We show three concrete realizations of this label representation space, including: (i) the input space, which is spanned by the input features, (ii) the output space, spanned by label vectors based on their co-occurrence with other labels, and (iii) the joint space, obtained by combining the input and output representations. Furthermore, the constraint-free multi-way partitions learnt iteratively in these spaces lead to shallow trees. By combining the effect of shallow trees and generalized label representation, Bonsai achieves the best of both worlds: fast training, comparable to state-of-the-art tree-based methods in XMC, and much better prediction accuracy, particularly on tail labels. On the benchmark Amazon-3M dataset with 3 million labels, Bonsai outperforms a state-of-the-art one-vs-rest method in terms of prediction accuracy, while being approximately 200 times faster to train. The code for Bonsai is available at https://github.com/xmc-aalto/bonsai
Tasks Extreme Multi-Label Classification, Multi-Label Classification, Multi-Label Learning
Published 2019-04-17
URL https://arxiv.org/abs/1904.08249v2
PDF https://arxiv.org/pdf/1904.08249v2.pdf
PWC https://paperswithcode.com/paper/bonsai-diverse-and-shallow-trees-for-extreme
Repo https://github.com/scakc/ConvBonsai-Tree
Framework tf
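
The tree construction can be sketched as recursive K-means over label representations with early stopping, which is what keeps the trees shallow and multi-way. A rough NumPy sketch (the representation, K, and leaf size are illustrative assumptions; Bonsai's actual partitioning and classifier training are richer):

```python
import numpy as np

# Represent each label by a vector (e.g. a co-occurrence-based vector or
# the mean of its positive instances), then recursively split labels with
# K-means into K children, stopping early so the tree stays shallow.
def kmeans(points, k, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        assign = np.linalg.norm(points[:, None] - centers[None],
                                axis=-1).argmin(1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = points[assign == j].mean(0)
    return assign

def build_tree(label_ids, label_reps, k=16, max_leaf=100):
    if len(label_ids) <= max_leaf:
        return {"labels": label_ids}       # leaf: train one-vs-rest here
    assign = kmeans(label_reps[label_ids], min(k, len(label_ids)))
    return {"children": [
        build_tree(label_ids[assign == j], label_reps, k, max_leaf)
        for j in range(assign.max() + 1)]}

reps = np.random.randn(1000, 32)           # stand-in label representations
tree = build_tree(np.arange(1000), reps)
```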

A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification

Title A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification
Authors Hao Luo, Wei Jiang, Youzhi Gu, Fuxu Liu, Xingyu Liao, Shenqi Lai, Jianyang Gu
Abstract This study explores a simple but strong baseline for person re-identification (ReID). Person ReID with deep neural networks has progressed and achieved high performance in recent years. However, many state-of-the-art methods design complex network structures and concatenate multi-branch features. In the literature, some effective training tricks appear only briefly in a few papers or source codes. The present study collects and evaluates these effective training tricks in person ReID. By combining these tricks, the model achieves 94.5% rank-1 and 85.9% mean average precision on Market1501 using only the global features of ResNet50. This performance surpasses all existing global- and part-based baselines in person ReID. We propose a novel neck structure named batch normalization neck (BNNeck). BNNeck adds a batch normalization layer after the global pooling layer to separate the metric and classification losses into two different feature spaces, because we observe that they are inconsistent in a single embedding space. Extensive experiments show that BNNeck can boost the baseline, and our baseline can improve the performance of existing state-of-the-art methods. Our code and models are available at: https://github.com/michuanhaohao/reid-strong-baseline.
Tasks Person Re-Identification
Published 2019-06-19
URL https://arxiv.org/abs/1906.08332v2
PDF https://arxiv.org/pdf/1906.08332v2.pdf
PWC https://paperswithcode.com/paper/a-strong-baseline-and-batch-normalization
Repo https://github.com/XingangPan/IBN-Net
Framework pytorch
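
BNNeck itself is a few lines of PyTorch: insert a batch-norm layer after global pooling, feed the pre-BN feature to the triplet (metric) loss and the post-BN feature to the ID classifier, and retrieve with the post-BN feature at inference. A minimal sketch (feature and class counts are illustrative; see the linked repo for the full baseline):

```python
import torch
import torch.nn as nn

# Features before BN feed the triplet loss; normalized features feed the
# ID classification loss, so the two losses optimize in different spaces.
class BNNeck(nn.Module):
    def __init__(self, feat_dim=2048, num_classes=751):
        super().__init__()
        self.bn = nn.BatchNorm1d(feat_dim)
        self.bn.bias.requires_grad_(False)        # freeze the BN bias
        self.classifier = nn.Linear(feat_dim, num_classes, bias=False)

    def forward(self, global_feat):               # pooled backbone feature
        feat_bn = self.bn(global_feat)
        return global_feat, feat_bn, self.classifier(feat_bn)

neck = BNNeck()
f_triplet, f_infer, logits = neck(torch.randn(8, 2048))
# Triplet loss on f_triplet, cross-entropy on logits, retrieval with f_infer.
```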

One-shot Face Reenactment

Title One-shot Face Reenactment
Authors Yunxuan Zhang, Siwei Zhang, Yue He, Cheng Li, Chen Change Loy, Ziwei Liu
Abstract To enable realistic shape (e.g. pose and expression) transfer, existing face reenactment methods rely on a set of target faces for learning subject-specific traits. However, in real-world scenarios end-users often have only one target face at hand, rendering existing methods inapplicable. In this work, we bridge this gap by proposing a novel one-shot face reenactment learning framework. Our key insight is that the one-shot learner should be able to disentangle and compose appearance and shape information for effective modeling. Specifically, the target face appearance and the source face shape are first projected into latent spaces with their corresponding encoders. These two latent spaces are then associated by learning a shared decoder that aggregates multi-level features to produce the final reenactment result. To further improve the synthesis quality in the mustache and hair regions, we additionally propose FusionNet, which combines the strengths of our learned decoder and the traditional warping method. Extensive experiments show that our one-shot face reenactment system achieves superior transfer fidelity and identity-preserving capability compared to alternatives. More remarkably, our approach trained with only one target image per subject achieves results competitive with those using a set of target images, demonstrating the practical merit of this work. Code, models and an additional set of reenacted faces have been publicly released at the project page.
Tasks Face Reconstruction, Face Reenactment
Published 2019-08-05
URL https://arxiv.org/abs/1908.03251v1
PDF https://arxiv.org/pdf/1908.03251v1.pdf
PWC https://paperswithcode.com/paper/one-shot-face-reenactment
Repo https://github.com/bj80heyue/Learning_One_Shot_Face_Reenactment
Framework pytorch
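
The disentangle-and-compose structure can be sketched as two encoders feeding one shared decoder. The sketch below is deliberately coarse (module sizes are arbitrary, and the paper's multi-level feature aggregation and FusionNet are omitted):

```python
import torch
import torch.nn as nn

# Encode appearance from the single target face and shape from the source
# face, then decode their concatenation into the reenacted face.
def conv_encoder(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 4, 2, 1), nn.ReLU(),
        nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU())

class OneShotReenactSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc_app = conv_encoder(3)     # appearance from the target face
        self.enc_shape = conv_encoder(3)   # shape (e.g. a landmark map)
        self.dec = nn.Sequential(          # shared decoder
            nn.ConvTranspose2d(128, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh())

    def forward(self, target_img, source_shape):
        z = torch.cat([self.enc_app(target_img),
                       self.enc_shape(source_shape)], 1)
        return self.dec(z)

model = OneShotReenactSketch()
out = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```
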
Genetic Network Architecture Search

Title Genetic Network Architecture Search
Authors Hai Victor Habi, Gil Rafalovich
Abstract We propose a method for learning the neural network architecture based on a Genetic Algorithm (GA). Our approach uses a genetic algorithm integrated with standard Stochastic Gradient Descent (SGD), which allows the sharing of weights across all architecture solutions. The method uses the GA to design a sub-graph of a convolution cell that maximizes accuracy on the validation set. Through experiments, we demonstrate this method's performance on both the CIFAR10 and CIFAR100 datasets, achieving accuracies of 96% and 80.1%, respectively. The code and results of this work are available on GitHub: https://github.com/haihabi/GeneticNAS.
Tasks Neural Architecture Search
Published 2019-07-05
URL https://arxiv.org/abs/1907.02871v1
PDF https://arxiv.org/pdf/1907.02871v1.pdf
PWC https://paperswithcode.com/paper/genetic-network-architecture-search
Repo https://github.com/haihabi/GeneticNAS
Framework pytorch
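
The search loop pairs a plain genetic algorithm with shared super-network weights: genomes index operations in a cell sub-graph, all candidates are scored with the same shared weights, and the population evolves by crossover and mutation. A simplified sketch (the genome encoding, stub fitness, and stub weight-training step are illustrative assumptions; the linked repo has the full version):

```python
import random

# GA over cell sub-graphs with weight sharing: every candidate is evaluated
# with the same shared super-network weights, which are trained with SGD
# between generations.
OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]

def random_genome(n_edges=8):
    return [random.randrange(len(OPS)) for _ in range(n_edges)]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(g, p=0.1):
    return [random.randrange(len(OPS)) if random.random() < p else x for x in g]

def search(val_accuracy, train_shared_weights, generations=30, pop=20):
    population = [random_genome() for _ in range(pop)]
    for _ in range(generations):
        train_shared_weights(population)   # one SGD pass sampling genomes
        scored = sorted(population, key=val_accuracy, reverse=True)
        parents = scored[: pop // 2]
        population = parents + [mutate(crossover(*random.sample(parents, 2)))
                                for _ in range(pop - len(parents))]
    return max(population, key=val_accuracy)

# Stubs stand in for super-network training and validation accuracy.
best = search(val_accuracy=lambda g: sum(g),
              train_shared_weights=lambda p: None)
```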