February 1, 2020

Paper Group AWR 248

Scale-Aware Trident Networks for Object Detection

Title Scale-Aware Trident Networks for Object Detection
Authors Yanghao Li, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang
Abstract Scale variation is one of the key challenges in object detection. In this work, we first present a controlled experiment to investigate the effect of receptive fields on scale variation in object detection. Based on the findings from these exploration experiments, we propose a novel Trident Network (TridentNet) that aims to generate scale-specific feature maps with uniform representational power. We construct a parallel multi-branch architecture in which each branch shares the same transformation parameters but has a different receptive field. We then adopt a scale-aware training scheme to specialize each branch by sampling object instances of the proper scale for training. As a bonus, a fast approximation of TridentNet achieves significant improvements over the vanilla detector without any additional parameters or computational cost. On the COCO dataset, our TridentNet with a ResNet-101 backbone achieves a state-of-the-art single-model result of 48.4 mAP. Code is available at https://git.io/fj5vR.
Tasks Object Detection
Published 2019-01-07
URL https://arxiv.org/abs/1901.01892v2
PDF https://arxiv.org/pdf/1901.01892v2.pdf
PWC https://paperswithcode.com/paper/scale-aware-trident-networks-for-object
Repo https://github.com/chengzhengxin/groupsoftmax-simpledet
Framework mxnet
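
The core idea of TridentNet, parallel branches that share convolution weights but differ in dilation rate, can be sketched in plain Python (a toy 1-D illustration, not the authors' mxnet implementation):

```python
def dilated_conv1d(x, w, dilation):
    """1-D valid convolution (correlation) with the given dilation rate."""
    k = len(w)
    span = (k - 1) * dilation + 1  # effective receptive field of the dilated kernel
    return [
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]

def trident_block(x, w, dilations=(1, 2, 3)):
    """One branch per dilation rate; all branches reuse the same weights w."""
    return [dilated_conv1d(x, w, d) for d in dilations]
```

Because every branch reuses the same `w`, adding branches adds no parameters; only the receptive field (the span of the dilated kernel) changes, which is the weight-sharing property the abstract describes.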

Representative Datasets: The Perceptron Case

Title Representative Datasets: The Perceptron Case
Authors Rocio Gonzalez-Diaz, Miguel A. Gutiérrez-Naranjo, Eduardo Paluzo-Hidalgo
Abstract One of the main drawbacks of the practical use of neural networks is the long time needed for the training process. This training process consists of iteratively changing the parameters to minimize a loss function. These changes are driven by a dataset, which can be seen as a set of labeled points in an n-dimensional space. In this paper, we explore the concept of a representative dataset, which is smaller than the original dataset and satisfies a nearness condition that is independent of isometric transformations. Representativeness is measured using persistence diagrams due to their computational efficiency. We also prove that the accuracy of the learning process of a neural network on a representative dataset is comparable to the accuracy on the original dataset when the neural network architecture is a perceptron and the loss function is the mean squared error. These theoretical results, accompanied by experiments, open the door to reducing the size of a dataset in order to save time in the training process of any neural network.
Tasks
Published 2019-03-20
URL http://arxiv.org/abs/1903.08519v1
PDF http://arxiv.org/pdf/1903.08519v1.pdf
PWC https://paperswithcode.com/paper/representative-datasets-the-perceptron-case
Repo https://github.com/Cimagroup/Experiments-Representative-datasets
Framework none
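
The nearness condition can be illustrated with a simple Hausdorff-distance check; farthest-point sampling is one hypothetical way to build a small subset that stays close to the full dataset (the paper measures representativeness with persistence diagrams, which this sketch omits):

```python
import math

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(P, Q):
        return max(min(math.dist(p, q) for q in Q) for p in P)
    return max(directed(A, B), directed(B, A))

def farthest_point_subset(X, m):
    """Greedy farthest-point sampling: repeatedly add the point
    farthest from the subset chosen so far."""
    chosen = [X[0]]
    while len(chosen) < m:
        far = max(X, key=lambda x: min(math.dist(x, c) for c in chosen))
        chosen.append(far)
    return chosen
```

A smaller `hausdorff(subset, X)` means the subset is "nearer" to the original dataset in the sense sketched here.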

Practical Open-Loop Optimistic Planning

Title Practical Open-Loop Optimistic Planning
Authors Edouard Leurent, Odalric-Ambrym Maillard
Abstract We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies - i.e. sequences of actions - and under a budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KL-OLOP, that leads to better practical performance while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms.
Tasks
Published 2019-04-09
URL http://arxiv.org/abs/1904.04700v1
PDF http://arxiv.org/pdf/1904.04700v1.pdf
PWC https://paperswithcode.com/paper/practical-open-loop-optimistic-planning
Repo https://github.com/maximecb/gym-minigrid
Framework pytorch
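
The tighter upper-confidence bounds are of the Kullback-Leibler type; a minimal sketch of a Bernoulli KL upper-confidence bound solved by bisection (the exact exploration term used in the paper may differ):

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb(mean, count, exploration, iters=60):
    """Largest q in [mean, 1] with count * KL(mean, q) <= exploration,
    found by bisection (the KL set is an interval around the mean)."""
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if count * kl_bernoulli(mean, mid) <= exploration:
            lo = mid
        else:
            hi = mid
    return lo
```

As the sample count grows, the bound tightens toward the empirical mean, which is what makes KL bounds less conservative than Hoeffding-style ones.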

Real-time Scalable Dense Surfel Mapping

Title Real-time Scalable Dense Surfel Mapping
Authors Kaixuan Wang, Fei Gao, Shaojie Shen
Abstract In this paper, we propose a novel dense surfel mapping system that scales well in different environments with only CPU computation. Using a sparse SLAM system to estimate camera poses, the proposed mapping system can fuse intensity images and depth images into a globally consistent model. The system is carefully designed so that it can build everything from room-scale to urban-scale environments using depth images from RGB-D cameras, stereo cameras, or even a monocular camera. First, superpixels extracted from both intensity and depth images are used to model surfels in the system. Superpixel-based surfels make our method both run-time efficient and memory efficient. Second, surfels are further organized according to the pose graph of the SLAM system to achieve $O(1)$ fusion time regardless of the scale of the reconstructed models. Third, a fast map deformation using the optimized pose graph enables the map to achieve global consistency in real time. The proposed surfel mapping system is compared with other state-of-the-art methods on synthetic datasets. The performance of urban-scale and room-scale reconstruction is demonstrated using the KITTI dataset and autonomous aggressive flights, respectively. The code is available for the benefit of the community.
Tasks
Published 2019-09-10
URL https://arxiv.org/abs/1909.04250v1
PDF https://arxiv.org/pdf/1909.04250v1.pdf
PWC https://paperswithcode.com/paper/real-time-scalable-dense-surfel-mapping
Repo https://github.com/HKUST-Aerial-Robotics/DenseSurfelMapping
Framework none
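
At its core, surfel fusion of the kind described above is a confidence-weighted running average; a toy sketch (the real system also fuses normals, radii, and colors, and the dict fields here are illustrative):

```python
def fuse_surfel(surfel, obs):
    """Confidence-weighted running average of a surfel with a new observation.
    Each surfel is a dict holding a 3-D position and an accumulated weight."""
    w1, w2 = surfel["weight"], obs["weight"]
    total = w1 + w2
    return {
        "pos": tuple((w1 * a + w2 * b) / total
                     for a, b in zip(surfel["pos"], obs["pos"])),
        "weight": total,  # fused surfel carries the combined confidence
    }
```

Because each fusion touches only one surfel and a constant amount of state, per-surfel fusion cost stays constant regardless of map size, in the spirit of the $O(1)$ claim.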

Image Synthesis with a Single (Robust) Classifier

Title Image Synthesis with a Single (Robust) Classifier
Authors Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Andrew Ilyas, Logan Engstrom, Aleksander Madry
Abstract We show that the basic classification framework alone can be used to tackle some of the most challenging tasks in image synthesis. In contrast to other state-of-the-art approaches, the toolkit we develop is rather minimal: it uses a single, off-the-shelf classifier for all these tasks. The crux of our approach is that we train this classifier to be adversarially robust. It turns out that adversarial robustness is precisely what we need to directly manipulate salient features of the input. Overall, our findings demonstrate the utility of robustness in the broader machine learning context. Code and models for our experiments can be found at https://git.io/robust-apps.
Tasks Image Generation
Published 2019-06-06
URL https://arxiv.org/abs/1906.09453v2
PDF https://arxiv.org/pdf/1906.09453v2.pdf
PWC https://paperswithcode.com/paper/computer-vision-with-a-single-robust
Repo https://github.com/MadryLab/robustness
Framework pytorch
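
Manipulating salient input features with a robust classifier boils down to gradient ascent on a class score with respect to the input; a toy sketch with a stand-in score gradient (the real method uses a robust ResNet classifier, which this sketch does not include):

```python
def synthesize(grad_score, x0, step=0.1, n_steps=200):
    """Gradient ascent on a class score with respect to the input itself."""
    x = list(x0)
    for _ in range(n_steps):
        g = grad_score(x)  # gradient of the class score at the current input
        x = [xi + step * gi for xi, gi in zip(x, g)]
    return x
```

With a robust classifier, this input-space gradient points along perceptually meaningful features, which is why the same simple loop covers generation, inpainting, and related tasks.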

CFM-BD: a distributed rule induction algorithm for building Compact Fuzzy Models in Big Data classification problems

Title CFM-BD: a distributed rule induction algorithm for building Compact Fuzzy Models in Big Data classification problems
Authors Mikel Elkano, Jose Sanz, Edurne Barrenechea, Humberto Bustince, Mikel Galar
Abstract Interpretability has always been a major concern for fuzzy rule-based classifiers. The usage of human-readable models allows them to explain the reasoning behind their predictions and decisions. However, when it comes to Big Data classification problems, fuzzy rule-based classifiers have not been able to maintain the good trade-off between accuracy and interpretability that has characterized these techniques in non-Big Data environments. The most accurate methods build overly complex models composed of a large number of rules and fuzzy sets, while those approaches focusing on interpretability do not provide state-of-the-art discrimination capabilities. In this paper, we propose a new distributed learning algorithm named CFM-BD to construct accurate and compact fuzzy rule-based classification systems for Big Data. This method has been specifically designed from scratch for Big Data problems and does not adapt or extend any existing algorithm. The proposed learning process consists of three stages: 1) pre-processing based on the probability integral transform theorem; 2) rule induction inspired by the CHI-BD and Apriori algorithms; 3) rule selection by means of a global evolutionary optimization. We conducted a complete empirical study to test the performance of our approach in terms of accuracy, complexity, and runtime. The results obtained were compared and contrasted with four state-of-the-art fuzzy classifiers for Big Data (FBDT, FMDT, Chi-Spark-RS, and CHI-BD). According to this study, CFM-BD is able to provide competitive discrimination capabilities using significantly simpler models composed of a few rules with fewer than 3 antecedents, employing 5 linguistic labels for all variables.
Tasks
Published 2019-02-25
URL http://arxiv.org/abs/1902.09357v1
PDF http://arxiv.org/pdf/1902.09357v1.pdf
PWC https://paperswithcode.com/paper/cfm-bd-a-distributed-rule-induction-algorithm
Repo https://github.com/melkano/cfm-bd
Framework none
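
Stage 1, pre-processing based on the probability integral transform, can be approximated with an empirical CDF; a sketch (the paper's exact estimator may differ):

```python
import bisect

def probability_integral_transform(column):
    """Map each value of a feature column to its empirical CDF value in (0, 1]."""
    ordered = sorted(column)
    n = len(column)
    # rank of each value (counting ties as <=) divided by the sample size
    return [bisect.bisect_right(ordered, v) / n for v in column]
```

After this transform every feature is roughly uniform on (0, 1], so a single set of linguistic labels can partition all variables evenly.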

Triangulation Learning Network: from Monocular to Stereo 3D Object Detection

Title Triangulation Learning Network: from Monocular to Stereo 3D Object Detection
Authors Zengyi Qin, Jinglu Wang, Yan Lu
Abstract In this paper, we study the problem of 3D object detection from stereo images, in which the key challenge is how to effectively utilize stereo information. Different from previous methods using pixel-level depth maps, we propose employing 3D anchors to explicitly construct object-level correspondences between the regions of interest in stereo images, from which the deep neural network learns to detect and triangulate the targeted object in 3D space. We also introduce a cost-efficient channel reweighting strategy that enhances representational features and weakens noisy signals to facilitate the learning process. All of these are flexibly integrated into a solid baseline detector that uses monocular images. We demonstrate that both the monocular baseline and the stereo triangulation learning network outperform the previous state of the art in 3D object detection and localization on the challenging KITTI dataset.
Tasks 3D Object Detection, 3D object detection from stereo images, Object Detection
Published 2019-06-04
URL https://arxiv.org/abs/1906.01193v1
PDF https://arxiv.org/pdf/1906.01193v1.pdf
PWC https://paperswithcode.com/paper/triangulation-learning-network-from-monocular-1
Repo https://github.com/Zengyi-Qin/TLNet
Framework tf
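
Triangulation from object-level stereo correspondences rests on the standard disparity-depth relation; a minimal sketch of a rectified stereo camera model (the focal length, principal point, and baseline values in the test are illustrative, not KITTI calibration):

```python
def project_stereo(point, focal, cx, baseline):
    """Project a 3-D point (x, y, z) into the left and right image columns
    of a rectified stereo pair."""
    x, _, z = point
    u_left = focal * x / z + cx
    u_right = focal * (x - baseline) / z + cx
    return u_left, u_right

def triangulate_depth(u_left, u_right, focal, baseline):
    """Recover depth from the disparity between corresponding columns."""
    disparity = u_left - u_right
    return focal * baseline / disparity
```

A 3D anchor makes this correspondence explicit: projecting the same anchor into both views fixes which left/right regions should be compared, rather than matching pixels independently.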

FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis

Title FLNet: Landmark Driven Fetching and Learning Network for Faithful Talking Facial Animation Synthesis
Authors Kuangxiao Gu, Yuqian Zhou, Thomas Huang
Abstract Talking face synthesis has been widely studied in either appearance-based or warping-based methods. Previous works mostly utilize a single face image as the source and generate novel facial animations by merging another person's facial features. However, some facial regions, like the eyes or teeth, which may be hidden in the source image, cannot be synthesized faithfully and stably. In this paper, we present a landmark-driven two-stream network to generate faithful talking facial animation, in which more facial details are created, preserved, and transferred from multiple source images instead of a single one. Specifically, we propose a network consisting of a learning stream and a fetching stream. The fetching sub-net directly learns to attentively warp and merge facial regions from five source images with distinctive landmarks, while the learning pipeline renders facial organs from the training face space to compensate. Extensive experiments demonstrate that the proposed method achieves higher performance than baseline algorithms, both quantitatively and qualitatively. Code is at https://github.com/kgu3/FLNet_AAAI2020.
Tasks Face Generation
Published 2019-11-21
URL https://arxiv.org/abs/1911.09224v1
PDF https://arxiv.org/pdf/1911.09224v1.pdf
PWC https://paperswithcode.com/paper/flnet-landmark-driven-fetching-and-learning
Repo https://github.com/kgu3/FLNet_AAAI2020
Framework none
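
The attentive warp-and-merge of the fetching stream can be caricatured as a softmax-weighted blend of candidate patches, one per source image; a toy sketch on flat vectors (in FLNet the attention scores are predicted by the network, here they are given):

```python
import math

def attentive_merge(patches, scores):
    """Softmax-weighted blend of candidate patches (one per source image)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    return [
        sum(w * p[i] for w, p in zip(weights, patches))
        for i in range(len(patches[0]))
    ]
```

When one source clearly shows a region (say, open eyes), its score dominates and the merge effectively fetches that region from the best source.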

Confidence Regularized Self-Training

Title Confidence Regularized Self-Training
Authors Yang Zou, Zhiding Yu, Xiaofeng Liu, B. V. K. Vijaya Kumar, Jinsong Wang
Abstract Recent advances in domain adaptation show that deep self-training presents a powerful means for unsupervised domain adaptation. These methods often involve an iterative process of predicting on the target domain and then taking the confident predictions as pseudo-labels for retraining. However, since pseudo-labels can be noisy, self-training can put overconfident label belief on wrong classes, leading to deviated solutions with propagated errors. To address this problem, we propose a confidence regularized self-training (CRST) framework, formulated as regularized self-training. Our method treats pseudo-labels as continuous latent variables jointly optimized via alternating optimization. We propose two types of confidence regularization: label regularization (LR) and model regularization (MR). CRST-LR generates soft pseudo-labels, while CRST-MR encourages smoothness of the network output. Extensive experiments on image classification and semantic segmentation show that CRSTs outperform their non-regularized counterparts with state-of-the-art performance. The code and models of this work are available at https://github.com/yzou2/CRST.
Tasks Domain Adaptation, Image Classification, Semantic Segmentation, Unsupervised Domain Adaptation
Published 2019-08-26
URL https://arxiv.org/abs/1908.09822v2
PDF https://arxiv.org/pdf/1908.09822v2.pdf
PWC https://paperswithcode.com/paper/confidence-regularized-self-training
Repo https://github.com/yzou2/CRST
Framework pytorch
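
CRST-LR's soft pseudo-labels can be approximated, for illustration, by label smoothing of the hard argmax pseudo-label (a stand-in only; the paper derives its soft labels from a regularized objective rather than this rule):

```python
def soft_pseudo_label(probs, smoothing=0.3):
    """Hard argmax pseudo-label softened toward the uniform distribution,
    so retraining never sees a fully confident (possibly wrong) label."""
    k = len(probs)
    top = max(range(k), key=lambda i: probs[i])
    return [(1 - smoothing) * (1.0 if i == top else 0.0) + smoothing / k
            for i in range(k)]
```

The softened label keeps probability mass on the other classes, which is the confidence-regularization effect the abstract describes: wrong pseudo-labels do less damage during retraining.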

GaborNet: Gabor filters with learnable parameters in deep convolutional neural networks

Title GaborNet: Gabor filters with learnable parameters in deep convolutional neural networks
Authors Andrey Alekseev, Anatoly Bobe
Abstract The article describes a system for image recognition using deep convolutional neural networks. A modified network architecture is proposed that focuses on improving convergence and reducing training complexity. The filters in the first layer of the network are constrained to fit the Gabor function, and their parameters are learnable and updated by standard backpropagation. The system was implemented in Python, tested on several datasets, and outperformed common convolutional networks.
Tasks
Published 2019-04-30
URL http://arxiv.org/abs/1904.13204v1
PDF http://arxiv.org/pdf/1904.13204v1.pdf
PWC https://paperswithcode.com/paper/gabornet-gabor-filters-with-learnable
Repo https://github.com/iKintosh/GaborNet
Framework pytorch
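
The constrained first layer samples the Gabor function on a grid; a plain-Python sketch of such a kernel (parameter values are illustrative, and in GaborNet they would be learned by backpropagation rather than fixed):

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, psi=0.0, gamma=1.0):
    """Sample a Gabor function (Gaussian envelope times a sinusoidal carrier)
    on a size x size grid; size should be odd so the kernel is centered."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # rotate coordinates by the orientation theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel
```

Making `sigma`, `theta`, `wavelength`, `psi`, and `gamma` trainable parameters (instead of raw per-pixel weights) is what shrinks the first layer's parameter count and constrains it to oriented band-pass filters.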

NAOMI: Non-Autoregressive Multiresolution Sequence Imputation

Title NAOMI: Non-Autoregressive Multiresolution Sequence Imputation
Authors Yukai Liu, Rose Yu, Stephan Zheng, Eric Zhan, Yisong Yue
Abstract Missing value imputation is a fundamental problem in spatiotemporal modeling, from motion tracking to the dynamics of physical systems. Deep autoregressive models suffer from error propagation, which becomes catastrophic for imputing long-range sequences. In this paper, we take a non-autoregressive approach and propose a novel deep generative model: Non-AutOregressive Multiresolution Imputation (NAOMI), which imputes long-range sequences given arbitrary missing patterns. NAOMI exploits the multiresolution structure of spatiotemporal data and decodes recursively from coarse to fine-grained resolutions using a divide-and-conquer strategy. We further enhance our model with adversarial training. When evaluated extensively on benchmark datasets from systems of both deterministic and stochastic dynamics, NAOMI demonstrates significant improvement in imputation accuracy (reducing average prediction error by 60% compared to autoregressive counterparts) and generalization for long-range sequences.
Tasks Imitation Learning, Imputation, Multivariate Time Series Imputation
Published 2019-01-30
URL https://arxiv.org/abs/1901.10946v3
PDF https://arxiv.org/pdf/1901.10946v3.pdf
PWC https://paperswithcode.com/paper/naomi-non-autoregressive-multiresolution
Repo https://github.com/felixykliu/NAOMI
Framework pytorch
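
NAOMI's coarse-to-fine divide-and-conquer decoding can be caricatured by recursive midpoint interpolation over the missing entries (a deterministic stand-in for the learned decoder, which predicts midpoints with a neural network instead of averaging):

```python
def impute_midpoints(seq):
    """Recursively fill None entries by imputing midpoints between known
    endpoints, coarse to fine. Assumes the first and last entries are observed."""
    def fill(lo, hi):
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        if seq[mid] is None:
            seq[mid] = (seq[lo] + seq[hi]) / 2  # stand-in for the learned decoder
        fill(lo, mid)   # each half now has known endpoints
        fill(mid, hi)
    fill(0, len(seq) - 1)
    return seq
```

Because each imputed value is conditioned on two anchors rather than only on previous predictions, errors do not compound along the sequence the way they do in an autoregressive decoder.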

On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models

Title On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models
Authors Erik Nijkamp, Mitch Hill, Tian Han, Song-Chun Zhu, Ying Nian Wu
Abstract This study investigates the effects of Markov chain Monte Carlo (MCMC) sampling in unsupervised Maximum Likelihood (ML) learning. Our attention is restricted to the family of unnormalized probability densities for which the negative log density (or energy function) is a ConvNet. We find that many of the techniques used to stabilize training in previous studies are not necessary. ML learning with a ConvNet potential requires only a few hyper-parameters and no regularization. Using this minimal framework, we identify a variety of ML learning outcomes that depend solely on the implementation of MCMC sampling. On one hand, we show that it is easy to train an energy-based model which can sample realistic images with short-run Langevin. ML can be effective and stable even when MCMC samples have much higher energy than true steady-state samples throughout training. Based on this insight, we introduce an ML method with purely noise-initialized MCMC, high-quality short-run synthesis, and the same budget as ML with informative MCMC initialization such as CD or PCD. Unlike previous models, our energy model can obtain realistic high-diversity samples from a noise signal after training. On the other hand, ConvNet potentials learned with non-convergent MCMC do not have a valid steady-state and cannot be considered approximate unnormalized densities of the training data because long-run MCMC samples differ greatly from observed images. We show that it is much harder to train a ConvNet potential to learn a steady-state over realistic images. To our knowledge, long-run MCMC samples of all previous models lose the realism of short-run samples. With correct tuning of Langevin noise, we train the first ConvNet potentials for which long-run and steady-state MCMC samples are realistic images.
Tasks
Published 2019-03-29
URL https://arxiv.org/abs/1903.12370v4
PDF https://arxiv.org/pdf/1903.12370v4.pdf
PWC https://paperswithcode.com/paper/on-the-anatomy-of-mcmc-based-maximum
Repo https://github.com/point0bar1/ebm-anatomy
Framework pytorch
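
The short-run Langevin sampler at the heart of this analysis follows a simple update: gradient descent on the energy plus Gaussian noise. A 1-D sketch with a quadratic energy (the paper's energies are ConvNets, and its step sizes and noise tuning differ):

```python
import math
import random

def langevin_sample(grad_energy, x0, step=0.01, n_steps=500, seed=0):
    """Langevin dynamics: descend the energy while injecting Gaussian noise
    with variance 2 * step at every iteration."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = x - step * grad_energy(x) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
    return x
```

For the quadratic energy E(x) = x^2 / 2 used in the test, the chain contracts toward the mode at 0 and then fluctuates with roughly unit variance, the steady-state behavior the paper's tuning discussion is about.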

Class Teaching for Inverse Reinforcement Learners

Title Class Teaching for Inverse Reinforcement Learners
Authors Manuel Lopes, Francisco Melo
Abstract In this paper we propose the first machine teaching algorithm for multiple inverse reinforcement learners. Specifically, our contributions are: (i) we formally introduce the problem of teaching a sequential task to a heterogeneous group of learners; (ii) we identify conditions under which it is possible to conduct such teaching using the same demonstration for all learners; and (iii) we propose and evaluate a simple algorithm that computes one or more demonstrations ensuring that all agents in a heterogeneous class learn a task description compatible with the target task. Our analysis shows that, contrary to other teaching problems, teaching a heterogeneous class with a single demonstration may not be possible as the differences between agents increase. We also showcase the advantages of our proposed machine teaching approach against several possible alternatives.
Tasks
Published 2019-11-29
URL https://arxiv.org/abs/1911.13009v1
PDF https://arxiv.org/pdf/1911.13009v1.pdf
PWC https://paperswithcode.com/paper/class-teaching-for-inverse-reinforcement
Repo https://github.com/maclopes/classteachIRL
Framework none

Unsupervised Visuomotor Control through Distributional Planning Networks

Title Unsupervised Visuomotor Control through Distributional Planning Networks
Authors Tianhe Yu, Gleb Shevchuk, Dorsa Sadigh, Chelsea Finn
Abstract While reinforcement learning (RL) has the potential to enable robots to autonomously acquire a wide range of skills, in practice, RL usually requires manual, per-task engineering of reward functions, especially in real world settings where aspects of the environment needed to compute progress are not directly accessible. To enable robots to autonomously learn skills, we instead consider the problem of reinforcement learning without access to rewards. We aim to learn an unsupervised embedding space under which the robot can measure progress towards a goal for itself. Our approach explicitly optimizes for a metric space under which action sequences that reach a particular state are optimal when the goal is the final state reached. This enables learning effective and control-centric representations that lead to more autonomous reinforcement learning algorithms. Our experiments on three simulated environments and two real-world manipulation problems show that our method can learn effective goal metrics from unlabeled interaction, and use the learned goal metrics for autonomous reinforcement learning.
Tasks
Published 2019-02-14
URL http://arxiv.org/abs/1902.05542v1
PDF http://arxiv.org/pdf/1902.05542v1.pdf
PWC https://paperswithcode.com/paper/unsupervised-visuomotor-control-through
Repo https://github.com/tianheyu927/dpn
Framework tf
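
The learned goal metric is used as a self-supervised reward: the negative distance between the embeddings of the current state and the goal. A minimal sketch with a placeholder embedding function (the real `embed` is the learned distributional planning network, not supplied here):

```python
import math

def goal_metric_reward(embed, state, goal):
    """Self-supervised reward: negative embedding-space distance to the goal.
    The robot measures its own progress without an external reward signal."""
    return -math.dist(embed(state), embed(goal))
```

States closer to the goal in the learned metric receive higher reward, so a standard RL algorithm can optimize progress toward the goal with no manually engineered reward function.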

CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

Title CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Authors Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzmán, Armand Joulin, Edouard Grave
Abstract Pre-trained text representations have led to significant improvements in many areas of natural language processing. The quality of these models benefits greatly from the size of the pretraining corpora, as long as their quality is preserved. In this paper, we describe an automatic pipeline to extract massive high-quality monolingual datasets from Common Crawl for a variety of languages. Our pipeline follows the data processing introduced in fastText (Mikolov et al., 2017; Grave et al., 2018), which deduplicates documents and identifies their language. We augment this pipeline with a filtering step to select documents that are close to high-quality corpora like Wikipedia.
Tasks
Published 2019-11-01
URL https://arxiv.org/abs/1911.00359v2
PDF https://arxiv.org/pdf/1911.00359v2.pdf
PWC https://paperswithcode.com/paper/ccnet-extracting-high-quality-monolingual
Repo https://github.com/facebookresearch/cc_net
Framework none
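
The deduplication step of such a pipeline can be sketched as paragraph-level hashing (CCNet's actual pipeline also normalizes text, shards the hash table, and adds language identification and perplexity filtering, none of which appear here):

```python
import hashlib

def dedup_paragraphs(documents):
    """Drop any paragraph whose normalized hash has already been seen
    anywhere in the corpus; keeps the first occurrence."""
    seen = set()
    cleaned = []
    for doc in documents:
        kept = []
        for para in doc.split("\n"):
            key = hashlib.sha1(para.strip().lower().encode("utf-8")).hexdigest()
            if key not in seen:
                seen.add(key)
                kept.append(para)
        cleaned.append("\n".join(kept))
    return cleaned
```

Hashing paragraphs rather than whole documents is what removes boilerplate (cookie banners, navigation text) that repeats across otherwise distinct web pages.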