Paper Group ANR 192
Large Batch Training Does Not Need Warmup
Title | Large Batch Training Does Not Need Warmup |
Authors | Zhouyuan Huo, Bin Gu, Heng Huang |
Abstract | Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications. However, the optimizer converges slowly at early epochs, and there is a gap between large-batch deep learning optimization heuristics and theoretical underpinnings. In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training. We also analyze the convergence rate of the proposed method by introducing a new fine-grained analysis of gradient-based methods. Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques, including linear learning rate scaling, gradual warmup, and layer-wise adaptive rate scaling. Extensive experiments demonstrate that the proposed algorithm outperforms the gradual warmup technique by a large margin and surpasses the state-of-the-art large-batch optimizer in convergence when training advanced deep neural networks (ResNet, DenseNet, MobileNet) on the ImageNet dataset. |
Tasks | |
Published | 2020-02-04 |
URL | https://arxiv.org/abs/2002.01576v1 |
PDF | https://arxiv.org/pdf/2002.01576v1.pdf |
PWC | https://paperswithcode.com/paper/large-batch-training-does-not-need-warmup |
Repo | |
Framework | |
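The abstract names three existing large-batch heuristics: linear learning-rate scaling, gradual warmup, and layer-wise adaptive rate scaling. The sketch below shows commonly used forms of each in plain Python. The reference batch size of 256, the trust coefficient of 0.001, and all function names are assumptions; this is not the authors' CLARS algorithm.

```python
import math
from typing import Sequence

def linear_scaled_lr(base_lr: float, batch_size: int, base_batch_size: int = 256) -> float:
    """Linear scaling rule: grow the learning rate in proportion to the batch size."""
    return base_lr * batch_size / base_batch_size

def warmup_lr(step: int, warmup_steps: int, peak_lr: float) -> float:
    """Gradual warmup: ramp linearly up to peak_lr over warmup_steps, then hold it."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return peak_lr
    return peak_lr * (step + 1) / warmup_steps

def l2_norm(values: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in values))

def lars_layer_lr(global_lr: float, weights: Sequence[float], grads: Sequence[float],
                  trust_coef: float = 0.001, weight_decay: float = 0.0,
                  eps: float = 1e-9) -> float:
    """LARS-style layer rate: global_lr * trust_coef * ||w|| / (||g|| + wd * ||w||)."""
    w_norm, g_norm = l2_norm(weights), l2_norm(grads)
    if w_norm == 0.0 or g_norm == 0.0:
        return global_lr  # degenerate layer: fall back to the global rate
    return global_lr * trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)
```

For example, scaling a base rate of 0.1 from batch 256 to batch 8192 gives a peak rate of 3.2, which warmup then approaches over the first steps; the layer-wise factor rescales that rate per layer.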
Spatial-Temporal Transformer Networks for Traffic Flow Forecasting
Title | Spatial-Temporal Transformer Networks for Traffic Flow Forecasting |
Authors | Mingxing Xu, Wenrui Dai, Chunmiao Liu, Xing Gao, Weiyao Lin, Guo-Jun Qi, Hongkai Xiong |
Abstract | Traffic forecasting has emerged as a core component of intelligent transportation systems. However, timely and accurate traffic forecasting, especially long-term forecasting, remains an open challenge due to the highly nonlinear and dynamic spatial-temporal dependencies of traffic flows. In this paper, we propose a novel paradigm of Spatial-Temporal Transformer Networks (STTNs) that leverages dynamic directed spatial dependencies and long-range temporal dependencies to improve the accuracy of long-term traffic forecasting. Specifically, we present a new variant of graph neural networks, named spatial transformer, that dynamically models directed spatial dependencies with a self-attention mechanism to capture real-time traffic conditions as well as the directionality of traffic flows. Furthermore, different spatial dependency patterns can be jointly modeled with a multi-head attention mechanism to consider diverse relationships related to different factors (e.g. similarity, connectivity and covariance). In addition, the temporal transformer is utilized to model long-range bidirectional temporal dependencies across multiple time steps. Finally, they are composed as a block to jointly model the spatial-temporal dependencies for accurate traffic prediction. Compared to existing works, the proposed model enables fast and scalable training over long-range spatial-temporal dependencies. Experiment results demonstrate that the proposed model achieves competitive results compared with state-of-the-art methods, especially in forecasting long-term traffic flows on the real-world PeMS-Bay and PeMSD7(M) datasets. |
Tasks | Traffic Prediction |
Published | 2020-01-09 |
URL | https://arxiv.org/abs/2001.02908v1 |
PDF | https://arxiv.org/pdf/2001.02908v1.pdf |
PWC | https://paperswithcode.com/paper/spatial-temporal-transformer-networks-for |
Repo | |
Framework | |
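The spatial transformer is described as self-attention over the nodes of the road graph. A minimal single-head sketch follows, assuming identity query/key/value projections and no graph mask (a real STTN would learn separate projections, use several heads, and add temporal attention). It shows how each node's output becomes an attention-weighted mix of all nodes.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_self_attention(node_feats: Matrix) -> Matrix:
    """Single-head scaled dot-product attention across the N nodes of one time step.

    Queries, keys and values are the node features themselves (identity
    projections), a simplifying assumption made for this sketch.
    """
    d = len(node_feats[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in node_feats:
        # Compatibility of this node with every node: (q . k) / sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in node_feats]
        weights = softmax(scores)  # each row of weights sums to 1
        # Output: attention-weighted mix of all nodes' feature vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, node_feats)) for j in range(d)])
    return out
```

Because the weights are recomputed from the current features at every time step, the mixing pattern changes with traffic conditions; learned, non-identical query and key projections are what would let node i weigh node j differently from how j weighs i.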
Improved Image Coding Autoencoder With Deep Learning
Title | Improved Image Coding Autoencoder With Deep Learning |
Authors | Licheng Xiao, Hairong Wang, Nam Ling |
Abstract | In this paper, we build autoencoder-based pipelines for extreme end-to-end image compression based on Ballé’s approach, which is the state-of-the-art open source implementation in image compression using deep learning. We deepened the network by adding one more hidden layer before each strided convolutional layer, with exactly the same number of down-samplings and up-samplings. Our approach outperformed Ballé’s approach, achieving around a 4.0% reduction in bits per pixel (bpp), a 0.03% increase in multi-scale structural similarity (MS-SSIM), and only a 0.47% decrease in peak signal-to-noise ratio (PSNR). It also outperforms all traditional image compression methods, including JPEG2000 and HEIC, by at least 20% in terms of compression efficiency at similar reconstruction image quality. Regarding encoding and decoding time, our approach takes a similar amount of time to traditional methods with GPU support, which means it is almost ready for industrial applications. |
Tasks | Image Compression |
Published | 2020-02-28 |
URL | https://arxiv.org/abs/2002.12521v1 |
PDF | https://arxiv.org/pdf/2002.12521v1.pdf |
PWC | https://paperswithcode.com/paper/improved-image-coding-autoencoder-with-deep |
Repo | |
Framework | |
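The reported results are relative changes in bits per pixel and PSNR. A small sketch of these standard metrics follows, assuming 8-bit images flattened to sequences of pixel values (MAX = 255). MS-SSIM is omitted, and the function names are illustrative.

```python
import math
from typing import Sequence

def bits_per_pixel(compressed_bytes: int, height: int, width: int) -> float:
    """bpp = total compressed bits divided by the number of pixels."""
    return compressed_bytes * 8 / (height * width)

def psnr(original: Sequence[float], reconstructed: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

def relative_change(new: float, old: float) -> float:
    """Signed percentage change, e.g. -4.0 for a 4% bpp reduction."""
    return 100.0 * (new - old) / old
```

A 1000-byte code for a 100 x 80 image, for instance, works out to exactly 1 bpp.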
Update Aware Device Scheduling for Federated Learning at the Wireless Edge
Title | Update Aware Device Scheduling for Federated Learning at the Wireless Edge |
Authors | Mohammad Mohammadi Amiri, Deniz Gunduz, Sanjeev R. Kulkarni, H. Vincent Poor |
Abstract | We study federated learning (FL) at the wireless edge, where power-limited devices with local datasets train a joint model with the help of a remote parameter server (PS). We assume that the devices are connected to the PS through a bandwidth-limited shared wireless channel. At each iteration of FL, a subset of the devices is scheduled to transmit their local model updates to the PS over orthogonal channel resources. We design novel scheduling policies that decide on the subset of devices to transmit at each round based not only on their channel conditions but also on the significance of their local model updates. Numerical results show that the proposed scheduling policy provides better long-term performance than scheduling policies based on either of the two metrics alone. We also observe that when the data is independent and identically distributed (i.i.d.) across devices, selecting a single device at each round provides the best performance, while when the data distribution is non-i.i.d., more devices should be scheduled. |
Tasks | |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2001.10402v1 |
PDF | https://arxiv.org/pdf/2001.10402v1.pdf |
PWC | https://paperswithcode.com/paper/update-aware-device-scheduling-for-federated |
Repo | |
Framework | |
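The scheduling idea combines channel quality with the significance of each local update. The sketch below is hypothetical. It scores each device by the product of its update's L2 norm and the Shannon rate log2(1 + SNR), then picks the top k. The paper's actual policy may weigh the two metrics differently, and the `Device` type and function names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Device:
    device_id: int
    snr: float                # linear (not dB) signal-to-noise ratio of the channel
    update: Sequence[float]   # local model update since the last round

def channel_rate(snr: float) -> float:
    """Shannon spectral efficiency log2(1 + SNR), in bits/s/Hz."""
    return math.log2(1.0 + snr)

def update_norm(update: Sequence[float]) -> float:
    return math.sqrt(sum(u * u for u in update))

def schedule(devices: List[Device], k: int) -> List[int]:
    """Pick the k devices with the largest (update norm x channel rate) score.

    The product is a hypothetical combination; holding either factor constant
    recovers a single-metric (channel-only or update-only) policy.
    """
    ranked = sorted(devices,
                    key=lambda d: update_norm(d.update) * channel_rate(d.snr),
                    reverse=True)
    return [d.device_id for d in ranked[:k]]
```

With k = 1 this mirrors the single-device-per-round setting the abstract reports as best for i.i.d. data. A larger k corresponds to the non-i.i.d. case.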
The iCub multisensor datasets for robot and computer vision applications
Title | The iCub multisensor datasets for robot and computer vision applications |
Authors | Murat Kirtay, Ugo Albanese, Lorenzo Vannucci, Guido Schillaci, Cecilia Laschi, Egidio Falotico |
Abstract | This document presents novel datasets constructed by employing the iCub robot equipped with an additional depth sensor and color camera. We used the robot to acquire color and depth information for 210 objects in different acquisition scenarios. The result is a set of large-scale datasets for robot and computer vision applications: object representation, object recognition and classification, and action recognition. |
Tasks | Object Recognition |
Published | 2020-03-04 |
URL | https://arxiv.org/abs/2003.01994v1 |
PDF | https://arxiv.org/pdf/2003.01994v1.pdf |
PWC | https://paperswithcode.com/paper/the-icub-multisensor-datasets-for-robot-and |
Repo | |
Framework | |
Selective Segmentation Networks Using Top-Down Attention
Title | Selective Segmentation Networks Using Top-Down Attention |
Authors | Mahdi Biparva, John Tsotsos |
Abstract | Convolutional neural networks model the transformation of the input sensory data at the bottom of a network hierarchy to the semantic information at the top of the visual hierarchy. Feedforward processing is sufficient for some object recognition tasks. Top-Down selection is potentially required in addition to the Bottom-Up feedforward pass. It can, in part, address the shortcoming of the loss of location information imposed by the hierarchical feature pyramids. We propose a unified 2-pass framework for object segmentation that augments Bottom-Up ConvNets with a Top-Down selection network. We utilize the top-down selection gating activities to modulate the bottom-up hidden activities for segmentation predictions. We develop an end-to-end multi-task framework with loss terms satisfying task requirements at the two ends of the network. We evaluate the proposed network on benchmark datasets for semantic segmentation, and show that networks with the Top-Down selection capability outperform the baseline model. Additionally, we shed light on the superior aspects of the new segmentation paradigm and qualitatively and quantitatively support the efficiency of the novel framework over the baseline model that relies purely on parametric skip connections. |
Tasks | Object Recognition, Semantic Segmentation |
Published | 2020-02-04 |
URL | https://arxiv.org/abs/2002.01125v1 |
PDF | https://arxiv.org/pdf/2002.01125v1.pdf |
PWC | https://paperswithcode.com/paper/selective-segmentation-networks-using-top |
Repo | |
Framework | |
Tractable Reinforcement Learning of Signal Temporal Logic Objectives
Title | Tractable Reinforcement Learning of Signal Temporal Logic Objectives |
Authors | Harish Venkataraman, Derya Aksaray, Peter Seiler |
Abstract | Signal temporal logic (STL) is an expressive language to specify time-bound real-world robotic tasks and safety specifications. Recently, there has been an interest in learning optimal policies to satisfy STL specifications via reinforcement learning (RL). Learning to satisfy STL specifications often needs a sufficient length of state history to compute reward and the next action. The need for history results in exponential state-space growth for the learning problem. Thus the learning problem becomes computationally intractable for most real-world applications. In this paper, we propose a compact means to capture state history in a new augmented state-space representation. An approximation to the objective (maximizing probability of satisfaction) is proposed and solved for in the new augmented state-space. We show the performance bound of the approximate solution and compare it with the solution of an existing technique via simulations. |
Tasks | |
Published | 2020-01-26 |
URL | https://arxiv.org/abs/2001.09467v2 |
PDF | https://arxiv.org/pdf/2001.09467v2.pdf |
PWC | https://paperswithcode.com/paper/tractable-reinforcement-learning-of-signal |
Repo | |
Framework | |
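Why history matters can be seen from the standard quantitative (robustness) semantics of STL: evaluating an "eventually" (F) or "always" (G) operator over a time window requires every state in that window. The sketch below computes robustness for a scalar trace against a threshold predicate. It also counts the states of a naive augmentation that stores the last few states verbatim, which is the exponential growth the abstract refers to. It does not implement the paper's compact representation, and the function names are illustrative.

```python
from typing import Sequence

def predicate_robustness(x: float, threshold: float) -> float:
    """Robustness of the atomic predicate x >= threshold: positive iff it holds."""
    return x - threshold

def _window(trace: Sequence[float], t: int, a: int, b: int) -> Sequence[float]:
    if a < 0 or a > b or t + b >= len(trace):
        raise ValueError("window [t+a, t+b] must lie inside the trace")
    return trace[t + a : t + b + 1]

def eventually(trace: Sequence[float], threshold: float, t: int, a: int, b: int) -> float:
    """F_[a,b](x >= threshold) at time t: the best margin inside the window."""
    return max(predicate_robustness(x, threshold) for x in _window(trace, t, a, b))

def always(trace: Sequence[float], threshold: float, t: int, a: int, b: int) -> float:
    """G_[a,b](x >= threshold) at time t: the worst margin inside the window."""
    return min(predicate_robustness(x, threshold) for x in _window(trace, t, a, b))

def naive_augmented_size(num_states: int, history_len: int) -> int:
    """States needed if the last history_len discrete states are stored verbatim."""
    return num_states ** history_len
```

Even a modest 10-state system with a 5-step window already needs 100,000 augmented states under this naive scheme.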
Real-time information retrieval from Identity cards
Title | Real-time information retrieval from Identity cards |
Authors | Niloofar Tavakolian, Azadeh Nazemi, Donal Fitzpatrick |
Abstract | Information is frequently retrieved from valid personal ID cards by authorised organisations for different purposes. Successful information retrieval (IR) depends on the accuracy and timing of the process. A process that takes a long time to respond is frustrating for both sides in the exchange of data. This paper proposes a series of state-of-the-art methods for the journey of an identification (ID) card from the scanning or capture phase to the point before optical character recognition (OCR). The key factors for this proposal are the accuracy and speed of the process during the journey. The experimental results of this research show that using deep-learning-based methods, such as the Efficient and Accurate Scene Text (EAST) detector and a Deep Neural Network (DNN) for face detection, instead of traditional methods increases efficiency considerably. |
Tasks | Face Detection, Information Retrieval, Optical Character Recognition |
Published | 2020-03-26 |
URL | https://arxiv.org/abs/2003.12103v1 |
PDF | https://arxiv.org/pdf/2003.12103v1.pdf |
PWC | https://paperswithcode.com/paper/real-time-information-retrieval-from-identity |
Repo | |
Framework | |
SAR Tomography at the Limit: Building Height Reconstruction Using Only 3-5 TanDEM-X Bistatic Interferograms
Title | SAR Tomography at the Limit: Building Height Reconstruction Using Only 3-5 TanDEM-X Bistatic Interferograms |
Authors | Yilei Shi, Richard Bamler, Yuanyuan Wang, Xiao Xiang Zhu |
Abstract | Multi-baseline interferometric synthetic aperture radar (InSAR) techniques are effective approaches for retrieving the 3-D information of urban areas. In order to obtain a plausible reconstruction, it is necessary to use more than twenty interferograms. Hence, these methods are commonly not appropriate for large-scale 3-D urban mapping using TanDEM-X data, where only a few acquisitions are available on average for each city. This work proposes a new SAR tomographic processing framework that works with such extremely small stacks by integrating non-local filtering into SAR tomography inversion. The applicability of the algorithm is demonstrated using a TanDEM-X multi-baseline stack with 5 bistatic interferograms over the whole city of Munich, Germany. Systematic comparison of our result with TanDEM-X raw digital elevation models (DEM) and airborne LiDAR data shows that the relative height accuracy of two thirds of the buildings is within two meters, which outperforms the TanDEM-X raw DEM. The promising performance of the proposed algorithm is a first step towards high-quality large-scale 3-D urban mapping. |
Tasks | |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07803v1 |
PDF | https://arxiv.org/pdf/2003.07803v1.pdf |
PWC | https://paperswithcode.com/paper/sar-tomography-at-the-limit-building-height |
Repo | |
Framework | |
COEBA: A Coevolutionary Bat Algorithm for Discrete Evolutionary Multitasking
Title | COEBA: A Coevolutionary Bat Algorithm for Discrete Evolutionary Multitasking |
Authors | Eneko Osaba, Javier Del Ser, Xin-She Yang, Andres Iglesias, Akemi Galvez |
Abstract | Multitasking optimization is an emerging research field that has attracted a lot of attention in the scientific community. The main purpose of this paradigm is to solve multiple optimization problems or tasks simultaneously by conducting a single search process. The main catalyst for reaching this objective is to exploit possible synergies and complementarities among the tasks to be optimized, which help each other through the transfer of knowledge among them (hence the name Transfer Optimization). In this context, Evolutionary Multitasking addresses Transfer Optimization problems by resorting to concepts from Evolutionary Computation to solve the tasks at hand simultaneously. This work contributes to this trend by proposing a novel algorithmic scheme for dealing with multitasking environments. The proposed approach, named Coevolutionary Bat Algorithm, finds its inspiration in concepts from both co-evolutionary strategies and the metaheuristic Bat Algorithm. We compare the performance of our proposed method with that of its Multifactorial Evolutionary Algorithm counterpart over 15 different multitasking setups, composed of eight reference instances of the discrete Traveling Salesman Problem. The experimentation and results stemming therefrom support the main hypothesis of this study: the proposed Coevolutionary Bat Algorithm is a promising meta-heuristic for solving Evolutionary Multitasking scenarios. |
Tasks | |
Published | 2020-03-24 |
URL | https://arxiv.org/abs/2003.11628v1 |
PDF | https://arxiv.org/pdf/2003.11628v1.pdf |
PWC | https://paperswithcode.com/paper/coeba-a-coevolutionary-bat-algorithm-for |
Repo | |
Framework | |
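The baseline in this comparison, the Multifactorial Evolutionary Algorithm, assigns each individual a skill factor and a scalar fitness. The skill factor is the task on which the individual ranks best, and the scalar fitness is 1 / best rank. The sketch below computes these quantities from per-task costs, using TSP tour length as the cost. It illustrates that baseline's bookkeeping under a simplifying assumption (every individual is evaluated on every task) and is not the Coevolutionary Bat Algorithm itself.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def tour_length(tour: Sequence[int], cities: Sequence[Point]) -> float:
    """Length of the closed tour visiting cities in the given permutation order."""
    n = len(tour)
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % n]]) for i in range(n))

def skill_factors(costs: Sequence[Sequence[float]]) -> Tuple[List[int], List[float]]:
    """costs[i][j] is the cost of individual i on task j (lower is better).

    Returns (skill factor, scalar fitness) for every individual.
    """
    n, k = len(costs), len(costs[0])
    ranks = [[0] * k for _ in range(n)]
    for j in range(k):
        # Factorial rank: position of each individual when sorted by cost on task j.
        order = sorted(range(n), key=lambda i: costs[i][j])
        for rank, i in enumerate(order, start=1):
            ranks[i][j] = rank
    skills = [min(range(k), key=lambda j: ranks[i][j]) for i in range(n)]
    fitness = [1.0 / min(ranks[i]) for i in range(n)]
    return skills, fitness
```

Ties in rank are broken in favour of the lower task index here, which is an arbitrary choice.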
CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data
Title | CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data |
Authors | Youngjae Yu, Seunghwan Lee, Yuncheol Choi, Gunhee Kim |
Abstract | We present an approach named CurlingNet that can measure the semantic distance of compositions of image-text embeddings. To learn an effective image-text composition for data in the fashion domain, our model has two key components. First, the Delivery makes the transition of a source image in an embedding space. Second, the Sweeping emphasizes query-related components of fashion images in the embedding space. We use a channel-wise gating mechanism to make this possible. Our single model outperforms previous state-of-the-art image-text composition models, including TIRG and FiLM. We participated in the first Fashion IQ challenge at ICCV 2019, in which an ensemble of our models achieved one of the best performances. |
Tasks | |
Published | 2020-03-27 |
URL | https://arxiv.org/abs/2003.12299v2 |
PDF | https://arxiv.org/pdf/2003.12299v2.pdf |
PWC | https://paperswithcode.com/paper/curlingnet-compositional-learning-between |
Repo | |
Framework | |
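CurlingNet's components rely on channel-wise gating. A generic sketch follows, in which a hypothetical linear layer (weights `W`, bias `b`) maps a text-query embedding to one sigmoid gate per image channel. It shows how such a gate re-weights image features. The shapes and names are assumptions, not the paper's architecture.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def channel_gate(image_feats: Sequence[float], query: Sequence[float],
                 W: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Scale each image channel c by sigmoid(W[c] . query + b[c])."""
    if len(W) != len(image_feats) or len(b) != len(image_feats):
        raise ValueError("need one gate row and one bias per image channel")
    # One gate in (0, 1) per channel, computed from the text query.
    gates = [sigmoid(sum(w * q for w, q in zip(row, query)) + bc) for row, bc in zip(W, b)]
    # Channels the query deems relevant keep their magnitude; others are damped.
    return [x * g for x, g in zip(image_feats, gates)]
```

With all-zero weights every gate is 0.5, so the features are simply halved. Strongly positive gate inputs pass a channel through almost unchanged.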
How deep is your encoder: an analysis of features descriptors for an autoencoder-based audio-visual quality metric
Title | How deep is your encoder: an analysis of features descriptors for an autoencoder-based audio-visual quality metric |
Authors | Helard Martinez, Andrew Hines, Mylene C. Q. Farias |
Abstract | The development of audio-visual quality assessment models poses a number of challenges in obtaining accurate predictions. One of these challenges is modelling the complex interaction between audio and visual stimuli and how this interaction is interpreted by human users. The No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder (NAViDAd) deals with this problem from a machine learning perspective. The metric receives two sets of audio and video feature descriptors and produces a low-dimensional set of features used to predict the audio-visual quality. A basic implementation of NAViDAd was able to produce accurate predictions when tested with a range of different audio-visual databases. The current work performs an ablation study on the base architecture of the metric. Several modules are removed or re-trained using different configurations to gain a better understanding of the metric’s functionality. The results presented in this study provide important feedback that allows us to understand the real capacity of the metric’s architecture and eventually develop a much better audio-visual quality metric. |
Tasks | |
Published | 2020-03-24 |
URL | https://arxiv.org/abs/2003.11100v1 |
PDF | https://arxiv.org/pdf/2003.11100v1.pdf |
PWC | https://paperswithcode.com/paper/how-deep-is-your-encoder-an-analysis-of |
Repo | |
Framework | |
Population-Based Training for Loss Function Optimization
Title | Population-Based Training for Loss Function Optimization |
Authors | Jason Liang, Santiago Gonzalez, Risto Miikkulainen |
Abstract | Metalearning of deep neural network (DNN) architectures and hyperparameters has become an increasingly important area of research. Loss functions are a type of metaknowledge that is crucial to effective training of DNNs and their potential role in metalearning has not yet been fully explored. This paper presents an algorithm called Enhanced Population-Based Training (EPBT) that interleaves the training of a DNN’s weights with the metalearning of optimal hyperparameters and loss functions. Loss functions use a TaylorGLO parameterization, based on multivariate Taylor expansions, that EPBT can directly optimize. On the CIFAR-10 and SVHN image classification benchmarks, EPBT discovers loss function schedules that enable faster, more accurate learning. The discovered functions adapt to the training process and serve to regularize the learning task by discouraging overfitting to the labels. EPBT thus demonstrates a promising synergy of simultaneous training and metalearning. |
Tasks | Image Classification |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04225v1 |
PDF | https://arxiv.org/pdf/2002.04225v1.pdf |
PWC | https://paperswithcode.com/paper/population-based-training-for-loss-function |
Repo | |
Framework | |
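EPBT builds on population-based training, which interleaves training with an exploit/explore step. A sketch of the generic truncation-selection variant follows. The worst quarter of the population copies the weights and hyperparameters of a random top-quarter member, then multiplies each hyperparameter by 0.8 or 1.2. Storing TaylorGLO loss coefficients as entries of `hparams` is an assumption here, and EPBT's actual operators may differ.

```python
import copy
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence

@dataclass
class Member:
    weights: List[float]
    hparams: Dict[str, float]  # e.g. learning rate; loss coefficients are an assumption
    score: float = 0.0         # latest validation metric, higher is better

def exploit_and_explore(population: List[Member], rng: random.Random,
                        frac: float = 0.25,
                        factors: Sequence[float] = (0.8, 1.2)) -> None:
    """Truncation selection: the bottom `frac` copy a random top-`frac` member, then perturb."""
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    n = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:n], ranked[-n:]
    for loser in bottom:
        winner = rng.choice(top)
        loser.weights = copy.deepcopy(winner.weights)       # exploit: inherit weights
        loser.hparams = {name: value * rng.choice(factors)  # explore: perturb each value
                         for name, value in winner.hparams.items()}
```

The function mutates the population in place. Scores are expected to be refreshed by the next round of training and evaluation.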
LESS is More: Rethinking Probabilistic Models of Human Behavior
Title | LESS is More: Rethinking Probabilistic Models of Human Behavior |
Authors | Andreea Bobu, Dexter R. R. Scobee, Jaime F. Fisac, S. Shankar Sastry, Anca D. Dragan |
Abstract | Robots need models of human behavior for both inferring human goals and preferences, and predicting what people will do. A common model is the Boltzmann noisily-rational decision model, which assumes people approximately optimize a reward function and choose trajectories in proportion to their exponentiated reward. While this model has been successful in a variety of robotics domains, its roots lie in econometrics, and in modeling decisions among different discrete options, each with its own utility or reward. In contrast, human trajectories lie in a continuous space, with continuous-valued features that influence the reward function. We propose that it is time to rethink the Boltzmann model, and design it from the ground up to operate over such trajectory spaces. We introduce a model that explicitly accounts for distances between trajectories, rather than only their rewards. Rather than each trajectory affecting the decision independently, similar trajectories now affect the decision together. We start by showing that our model better explains human behavior in a user study. We then analyze the implications this has for robot inference, first in toy environments where we have ground truth and find more accurate inference, and finally for a 7DOF robot arm learning from user demonstrations. |
Tasks | |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04465v1 |
PDF | https://arxiv.org/pdf/2001.04465v1.pdf |
PWC | https://paperswithcode.com/paper/less-is-more-rethinking-probabilistic-models |
Repo | |
Framework | |
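The Boltzmann model assigns each option a probability proportional to exp(beta * R). The second function below is a hedged sketch of the idea the abstract describes, in which similar trajectories share influence rather than each counting independently. It discounts each option by the similarity mass around it, using an assumed Gaussian kernel on trajectory feature vectors. The paper's exact similarity measure and normalisation may differ.

```python
import math
from typing import List, Sequence

def boltzmann(rewards: Sequence[float], beta: float) -> List[float]:
    """Noisily-rational choice: P(i) proportional to exp(beta * R_i)."""
    m = max(rewards)
    w = [math.exp(beta * (r - m)) for r in rewards]  # shift by the max for stability
    z = sum(w)
    return [x / z for x in w]

def similarity_discounted(rewards: Sequence[float], features: Sequence[Sequence[float]],
                          beta: float, bandwidth: float = 1.0) -> List[float]:
    """Divide each Boltzmann weight by the option's total similarity to all options."""
    m = max(rewards)
    w = []
    for r, f in zip(rewards, features):
        # Gaussian-kernel similarity mass; includes the option itself (similarity 1).
        mass = sum(math.exp(-math.dist(f, g) ** 2 / (2 * bandwidth ** 2)) for g in features)
        w.append(math.exp(beta * (r - m)) / mass)
    z = sum(w)
    return [x / z for x in w]
```

With three equally rewarded options, two of them near-duplicates, the Boltzmann model gives each 1/3. The similarity-discounted version instead gives the distinct option about one half and splits the rest between the duplicates.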
Nonparametric Deconvolution Models
Title | Nonparametric Deconvolution Models |
Authors | Allison J. B. Chaney, Archit Verma, Young-suk Lee, Barbara E. Engelhardt |
Abstract | We describe nonparametric deconvolution models (NDMs), a family of Bayesian nonparametric models for collections of data in which each observation is the average over the features from heterogeneous particles. For example, these types of data are found in elections, where we observe precinct-level vote tallies (observations) of individual citizens’ votes (particles) across each of the candidates or ballot measures (features), where each voter is part of a specific voter cohort or demographic (factor). Like the hierarchical Dirichlet process, NDMs rely on two tiers of Dirichlet processes to explain the data with an unknown number of latent factors; each observation is modeled as a weighted average of these latent factors. Unlike existing models, NDMs recover how factor distributions vary locally for each observation. This uniquely allows NDMs both to deconvolve each observation into its constituent factors, and also to describe how the factor distributions specific to each observation vary across observations and deviate from the corresponding global factors. We present variational inference techniques for this family of models and study its performance on simulated data and voting data from California. We show that including local factors improves estimates of global factors and provides a novel scaffold for exploring data. |
Tasks | |
Published | 2020-03-17 |
URL | https://arxiv.org/abs/2003.07718v1 |
PDF | https://arxiv.org/pdf/2003.07718v1.pdf |
PWC | https://paperswithcode.com/paper/nonparametric-deconvolution-models |
Repo | |
Framework | |
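The core modelling assumption is that each observation is a weighted average of latent factors, with local factors that deviate from global ones. The sketch below generates one such observation for vote-share data. It assumes a fixed number of factors and draws each local factor from a Dirichlet distribution centred on its global factor, which is a hypothetical choice. The actual NDM uses two tiers of Dirichlet processes (so the number of factors is unknown) and variational inference, neither of which is shown.

```python
import random
from typing import List, Sequence

def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Dirichlet draw via normalised Gamma(alpha_i, 1) variates (all alpha_i > 0)."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def local_factor(global_factor: Sequence[float], concentration: float,
                 rng: random.Random) -> List[float]:
    """Hypothetical local deviation: a Dirichlet whose mean is the global factor."""
    return sample_dirichlet([concentration * p for p in global_factor], rng)

def observe(weights: Sequence[float], factors: Sequence[Sequence[float]]) -> List[float]:
    """Observation = sum_k weights[k] * factors[k], computed feature by feature."""
    dim = len(factors[0])
    return [sum(w * f[j] for w, f in zip(weights, factors)) for j in range(dim)]

def generate_precinct(global_factors: Sequence[Sequence[float]],
                      factor_alpha: Sequence[float], concentration: float,
                      rng: random.Random) -> List[float]:
    weights = sample_dirichlet(factor_alpha, rng)  # cohort proportions in this precinct
    locals_ = [local_factor(g, concentration, rng) for g in global_factors]
    return observe(weights, locals_)               # precinct-level vote shares
```

Because the weights and every local factor sum to one, the generated precinct vote shares also sum to one. A larger `concentration` keeps local factors closer to the global ones.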