February 1, 2020


Paper Group AWR 95


Searching for MobileNetV3. Explicit Shape Encoding for Real-Time Instance Segmentation. ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views. Towards Optimal Discrete Online Hashing with Balanced Similarity. The Thermodynamic Variational Objective. A Learned Representation for Scalable Vector Graphics. Exact-K Recommendation …

Searching for MobileNetV3

Title Searching for MobileNetV3
Authors Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam
Abstract We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances. This paper starts the exploration of how automated search algorithms and network design can work together to harness complementary approaches improving the overall state of the art. Through this process we create two new MobileNet models for release: MobileNetV3-Large and MobileNetV3-Small which are targeted for high and low resource use cases. These models are then adapted and applied to the tasks of object detection and semantic segmentation. For the task of semantic segmentation (or any dense pixel prediction), we propose a new efficient segmentation decoder Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP). We achieve new state of the art results for mobile classification, detection and segmentation. MobileNetV3-Large is 3.2% more accurate on ImageNet classification while reducing latency by 15% compared to MobileNetV2. MobileNetV3-Small is 4.6% more accurate while reducing latency by 5% compared to MobileNetV2. MobileNetV3-Large detection is 25% faster at roughly the same accuracy as MobileNetV2 on COCO detection. MobileNetV3-Large LR-ASPP is 30% faster than MobileNetV2 R-ASPP at similar accuracy for Cityscapes segmentation.
Tasks Image Classification, Neural Architecture Search, Object Detection, Semantic Segmentation
Published 2019-05-06
URL https://arxiv.org/abs/1905.02244v5
PDF https://arxiv.org/pdf/1905.02244v5.pdf
PWC https://paperswithcode.com/paper/searching-for-mobilenetv3
Repo https://github.com/d-li14/mobilenetv3.pytorch
Framework pytorch

Explicit Shape Encoding for Real-Time Instance Segmentation

Title Explicit Shape Encoding for Real-Time Instance Segmentation
Authors Wenqiang Xu, Haiyang Wang, Fubo Qi, Cewu Lu
Abstract In this paper, we propose a novel top-down instance segmentation framework based on explicit shape encoding, named ESE-Seg. It largely reduces the computational cost of instance segmentation by explicitly decoding multiple object shapes with tensor operations, and thus performs instance segmentation at almost the same speed as object detection. ESE-Seg is based on a novel shape signature, Inner-center Radius (IR), Chebyshev polynomial fitting, and strong modern object detectors. ESE-Seg with YOLOv3 outperforms Mask R-CNN on Pascal VOC 2012 at mAP$^r$@0.5 while being 7 times faster.
Tasks Instance Segmentation, Object Detection, Real-time Instance Segmentation, Semantic Segmentation
Published 2019-08-12
URL https://arxiv.org/abs/1908.04067v1
PDF https://arxiv.org/pdf/1908.04067v1.pdf
PWC https://paperswithcode.com/paper/explicit-shape-encoding-for-real-time
Repo https://github.com/WenqiangX/ese_seg
Framework mxnet
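
The abstract above names Chebyshev polynomial fitting of a radius-based shape signature. The following is a minimal sketch of that idea, not the authors' implementation: it samples the distance from an assumed center to a toy elliptical contour as a function of angle, fits a Chebyshev series by interpolation at Chebyshev nodes, and decodes contour points from the coefficients. The ellipse, the choice of center, the coefficient count, and all function names are assumptions made for illustration.

```python
import math

def chebyshev_fit(f, n_coeffs):
    """Chebyshev coefficients of f on [-1, 1], by interpolation at Chebyshev nodes."""
    n = n_coeffs
    nodes = [math.cos(math.pi * (j + 0.5) / n) for j in range(n)]
    values = [f(x) for x in nodes]
    coeffs = []
    for k in range(n):
        # Discrete orthogonality: T_k(node_j) = cos(k * pi * (j + 0.5) / n)
        s = sum(v * math.cos(k * math.pi * (j + 0.5) / n) for j, v in enumerate(values))
        coeffs.append((1.0 if k == 0 else 2.0) * s / n)
    return coeffs

def chebyshev_eval(coeffs, x):
    """Evaluate sum_k c_k T_k(x) using T_k(x) = cos(k * arccos(x))."""
    t = math.acos(max(-1.0, min(1.0, x)))
    return sum(c * math.cos(k * t) for k, c in enumerate(coeffs))

def ellipse_radius(theta, a=1.2, b=1.0):
    """Toy shape: distance from the center of an ellipse to its contour at angle theta."""
    return a * b / math.sqrt((b * math.cos(theta)) ** 2 + (a * math.sin(theta)) ** 2)

def signature(x):
    """Radius signature with the angle range [0, 2*pi] mapped onto x in [-1, 1]."""
    return ellipse_radius(math.pi * (x + 1.0))

def decode_contour(coeffs, center, n_points):
    """Rebuild contour points from the coefficient vector and the assumed center."""
    cx, cy = center
    points = []
    for i in range(n_points):
        theta = 2.0 * math.pi * i / n_points
        r = chebyshev_eval(coeffs, theta / math.pi - 1.0)
        points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return points

coeffs = chebyshev_fit(signature, 32)  # 32 coefficients is an arbitrary choice
contour = decode_contour(coeffs, center=(0.0, 0.0), n_points=200)
max_error = max(
    abs(chebyshev_eval(coeffs, 2.0 * i / 200 - 1.0) - signature(2.0 * i / 200 - 1.0))
    for i in range(201)
)
print(len(coeffs), max_error)
```

In this sketch a whole contour is reduced to a short coefficient vector plus a center, so decoding is a few multiply-adds per point. That is one plausible reading of why a detector that regresses such a vector could run at close to detection speed.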

ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views

Title ShopSign: a Diverse Scene Text Dataset of Chinese Shop Signs in Street Views
Authors Chongsheng Zhang, Guowen Peng, Yuefeng Tao, Feifei Fu, Wei Jiang, George Almpanidis, Ke Chen
Abstract In this paper, we introduce the ShopSign dataset, which is a newly developed natural scene text dataset of Chinese shop signs in street views. Although a few scene text datasets are already publicly available (e.g. ICDAR2015, COCO-Text), there are few images in these datasets that contain Chinese texts/characters. Hence, we collect and annotate the ShopSign dataset to advance research in Chinese scene text detection and recognition. The new dataset has three distinctive characteristics: (1) large-scale: it contains 25,362 Chinese shop sign images, with a total number of 196,010 text-lines. (2) diversity: the images in ShopSign were captured in different scenes, from downtown to developing regions, using more than 50 different mobile phones. (3) difficulty: the dataset is very sparse and imbalanced. It also includes five categories of hard images (mirror, wooden, deformed, exposed and obscure). To illustrate the challenges in ShopSign, we run baseline experiments using state-of-the-art scene text detection methods (including CTPN, TextBoxes++ and EAST), and cross-dataset validation to compare their corresponding performance on the related datasets such as CTW, RCTW and ICPR 2018 MTWI challenge dataset. The sample images and detailed descriptions of our ShopSign dataset are publicly available at: https://github.com/chongshengzhang/shopsign.
Tasks Scene Text Detection
Published 2019-03-25
URL http://arxiv.org/abs/1903.10412v1
PDF http://arxiv.org/pdf/1903.10412v1.pdf
PWC https://paperswithcode.com/paper/shopsign-a-diverse-scene-text-dataset-of
Repo https://github.com/chongshengzhang/shopsign
Framework none

Towards Optimal Discrete Online Hashing with Balanced Similarity

Title Towards Optimal Discrete Online Hashing with Balanced Similarity
Authors Mingbao Lin, Rongrong Ji, Hong Liu, Xiaoshuai Sun, Yongjian Wu, Yunsheng Wu
Abstract When facing large-scale image datasets, online hashing serves as a promising solution for online retrieval and prediction tasks. It encodes the online streaming data into compact binary codes, and simultaneously updates the hash functions to renew codes of the existing dataset. To this end, the existing methods update hash functions solely based on the new data batch, without investigating the correlation between such new data and the existing dataset. In addition, existing works update the hash functions using a relaxation process in its corresponding approximated continuous space. And it remains as an open problem to directly apply discrete optimizations in online hashing. In this paper, we propose a novel supervised online hashing method, termed Balanced Similarity for Online Discrete Hashing (BSODH), to solve the above problems in a unified framework. BSODH employs a well-designed hashing algorithm to preserve the similarity between the streaming data and the existing dataset via an asymmetric graph regularization. We further identify the “data-imbalance” problem brought by the constructed asymmetric graph, which restricts the application of discrete optimization in our problem. Therefore, a novel balanced similarity is further proposed, which uses two equilibrium factors to balance the similar and dissimilar weights and eventually enables the usage of discrete optimizations. Extensive experiments conducted on three widely-used benchmarks demonstrate the advantages of the proposed method over the state-of-the-art methods.
Tasks
Published 2019-01-29
URL http://arxiv.org/abs/1901.10185v2
PDF http://arxiv.org/pdf/1901.10185v2.pdf
PWC https://paperswithcode.com/paper/towards-optimal-discrete-online-hashing-with
Repo https://github.com/lmbxmu/mycode
Framework none
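
The "balanced similarity" described in the abstract can be sketched as follows, assuming a label-based similarity matrix with entries +1 (same class) and -1 (different class) between a new batch and the existing data. The two scaling factors, called `eta_s` and `eta_d` here, stand in for the abstract's "equilibrium factors". Their values and the function names are hypothetical, and the hashing optimization itself is not shown.

```python
def pairwise_similarity(new_labels, old_labels):
    """Asymmetric similarity: rows are new (streaming) items, columns are existing items."""
    return [[1 if a == b else -1 for b in old_labels] for a in new_labels]

def balance(similarity, eta_s, eta_d):
    """Scale similar entries by eta_s and dissimilar entries by eta_d (assumed values)."""
    return [[eta_s * s if s > 0 else eta_d * s for s in row] for row in similarity]

new_labels = [0, 1]         # labels of the incoming batch (toy data)
old_labels = [0, 0, 1, 2]   # labels of the existing dataset (toy data)

S = pairwise_similarity(new_labels, old_labels)
S_balanced = balance(S, eta_s=1.2, eta_d=0.2)
print(S)
print(S_balanced)
```

With many classes, most pairs are dissimilar, so the -1 entries dominate the unscaled matrix. Down-weighting them relative to the +1 entries is one way to keep the similar pairs from being swamped.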

The Thermodynamic Variational Objective

Title The Thermodynamic Variational Objective
Authors Vaden Masrani, Tuan Anh Le, Frank Wood
Abstract We introduce the thermodynamic variational objective (TVO) for learning in both continuous and discrete deep generative models. The TVO arises from a key connection between variational inference and thermodynamic integration that results in a tighter lower bound to the log marginal likelihood than the standard variational evidence lower bound (ELBO) while remaining as broadly applicable. We provide a computationally efficient gradient estimator for the TVO that applies to continuous, discrete, and non-reparameterizable distributions and show that the objective functions used in variational inference, variational autoencoders, wake sleep, and inference compilation are all special cases of the TVO. We use the TVO to learn both discrete and continuous deep generative models and empirically demonstrate state of the art model and inference network learning.
Tasks
Published 2019-06-28
URL https://arxiv.org/abs/1907.00031v4
PDF https://arxiv.org/pdf/1907.00031v4.pdf
PWC https://paperswithcode.com/paper/the-thermodynamic-variational-objective
Repo https://github.com/vmasrani/tvo
Framework pytorch
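
The connection between thermodynamic integration and the ELBO mentioned in the abstract can be written out as a short derivation. The notation is assumed here rather than taken from the abstract: $q_\phi(z \mid x)$ is the variational distribution, $p_\theta(x, z)$ the joint model, and $0 = \beta_0 < \dots < \beta_K = 1$ a partition of the unit interval.

```latex
\begin{align}
\tilde{\pi}_\beta(z) &= q_\phi(z \mid x)^{1-\beta}\, p_\theta(x, z)^{\beta}, \qquad
Z_\beta = \int \tilde{\pi}_\beta(z)\, dz, \qquad
\pi_\beta(z) = \tilde{\pi}_\beta(z) / Z_\beta \\
\log p_\theta(x) &= \log Z_1 - \log Z_0
  = \int_0^1 \mathbb{E}_{\pi_\beta}\!\left[ \log \frac{p_\theta(x, z)}{q_\phi(z \mid x)} \right] d\beta \\
&\ge \sum_{k=0}^{K-1} (\beta_{k+1} - \beta_k)\,
  \mathbb{E}_{\pi_{\beta_k}}\!\left[ \log \frac{p_\theta(x, z)}{q_\phi(z \mid x)} \right]
\end{align}
```

The inequality holds because the integrand is non-decreasing in $\beta$ (its derivative is a variance), so a left Riemann sum underestimates the integral. With $K = 1$ the sum reduces to $\mathbb{E}_{q_\phi}[\log p_\theta(x, z) - \log q_\phi(z \mid x)]$, which is the ELBO; finer partitions can only tighten the bound in this construction.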

A Learned Representation for Scalable Vector Graphics

Title A Learned Representation for Scalable Vector Graphics
Authors Raphael Gontijo Lopes, David Ha, Douglas Eck, Jonathon Shlens
Abstract Dramatic advances in generative models have resulted in near photographic quality for artificially rendered faces, animals and other objects in the natural world. In spite of such advances, a higher level understanding of vision and imagery does not arise from exhaustively modeling an object, but instead identifying higher-level attributes that best summarize the aspects of an object. In this work we attempt to model the drawing process of fonts by building sequential generative models of vector graphics. This model has the benefit of providing a scale-invariant representation for imagery whose latent representation may be systematically manipulated and exploited to perform style propagation. We demonstrate these results on a large dataset of fonts and highlight how such a model captures the statistical dependencies and richness of this dataset. We envision that our model can find use as a tool for graphic designers to facilitate font design.
Tasks
Published 2019-04-04
URL http://arxiv.org/abs/1904.02632v1
PDF http://arxiv.org/pdf/1904.02632v1.pdf
PWC https://paperswithcode.com/paper/a-learned-representation-for-scalable-vector
Repo https://github.com/tensorflow/magenta/tree/master/magenta/models/svg_vae
Framework tf

Exact-K Recommendation via Maximal Clique Optimization

Title Exact-K Recommendation via Maximal Clique Optimization
Authors Yu Gong, Yu Zhu, Lu Duan, Qingwen Liu, Ziyu Guan, Fei Sun, Wenwu Ou, Kenny Q. Zhu
Abstract This paper targets a novel but practical recommendation problem named exact-K recommendation. It is different from traditional top-K recommendation, as it focuses more on (constrained) combinatorial optimization which will optimize to recommend a whole set of K items called card, rather than ranking optimization which assumes that “better” items should be put into top positions. Thus we take the first step to give a formal problem definition, and innovatively reduce it to Maximum Clique Optimization based on graph. To tackle this specific combinatorial optimization problem which is NP-hard, we propose Graph Attention Networks (GAttN) with a Multi-head Self-attention encoder and a decoder with attention mechanism. It can end-to-end learn the joint distribution of the K items and generate an optimal card rather than rank individual items by prediction scores. Then we propose Reinforcement Learning from Demonstrations (RLfD) which combines the advantages in behavior cloning and reinforcement learning, making it sufficient-and-efficient to train the model. Extensive experiments on three datasets demonstrate the effectiveness of our proposed GAttN with RLfD method: it outperforms several strong baselines with a relative improvement of 7.7% and 4.7% on average in Precision and Hit Ratio respectively, and achieves state-of-the-art (SOTA) performance for the exact-K recommendation problem.
Tasks Combinatorial Optimization
Published 2019-05-17
URL https://arxiv.org/abs/1905.07089v1
PDF https://arxiv.org/pdf/1905.07089v1.pdf
PWC https://paperswithcode.com/paper/exact-k-recommendation-via-maximal-clique
Repo https://github.com/pangolulu/exact-k-recommendation
Framework tf

Deep Reinforcement Learning using Genetic Algorithm for Parameter Optimization

Title Deep Reinforcement Learning using Genetic Algorithm for Parameter Optimization
Authors Adarsh Sehgal, Hung Manh La, Sushil J. Louis, Hai Nguyen
Abstract Reinforcement learning (RL) enables agents to make decisions based on a reward function. However, in the process of learning, the choice of values for learning algorithm parameters can significantly impact the overall learning process. In this paper, we use a genetic algorithm (GA) to find the values of parameters used in Deep Deterministic Policy Gradient (DDPG) combined with Hindsight Experience Replay (HER), to help speed up the learning agent. We used this method on fetch-reach, slide, push, pick and place, and door opening in robotic manipulation tasks. Our experimental evaluation shows that our method leads to better performance, faster than the original algorithm.
Tasks
Published 2019-02-19
URL http://arxiv.org/abs/1905.04100v1
PDF http://arxiv.org/pdf/1905.04100v1.pdf
PWC https://paperswithcode.com/paper/190504100
Repo https://github.com/Bibyutatsu/Self_Driving_Car
Framework pytorch
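
As a rough illustration of using a genetic algorithm to search over learning-algorithm parameters, here is a minimal sketch with elitism, uniform crossover, and Gaussian mutation. It does not train DDPG or HER: `toy_fitness` is a hypothetical stand-in for "how quickly the agent learns with these parameters", and the two parameters, their bounds, and the GA settings are all assumptions made for this example.

```python
import random

def evolve(fitness, bounds, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Maximize `fitness` over box-bounded real parameters with a simple GA."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def crossover(a, b):
        # Uniform crossover: each gene comes from either parent with equal probability.
        return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

    def mutate(individual):
        child = []
        for value, (lo, hi) in zip(individual, bounds):
            if rng.random() < mutation_rate:
                value += rng.gauss(0.0, 0.1 * (hi - lo))
            child.append(min(hi, max(lo, value)))  # clip back into the bounds
        return child

    population = [random_individual() for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

# Hypothetical search space: (learning_rate, discount).
BOUNDS = [(1e-4, 1e-2), (0.9, 0.999)]

def toy_fitness(params):
    """Stand-in objective with an arbitrary optimum; a real run would train an agent."""
    learning_rate, discount = params
    return -(((learning_rate - 0.001) / 0.01) ** 2 + ((discount - 0.98) / 0.1) ** 2)

best, history = evolve(toy_fitness, BOUNDS)
print(best, history[0], history[-1])
```

In a real setting each fitness evaluation would be a full or partial training run, so population size and generation count would be limited by compute rather than chosen freely as they are here.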

Intra-frame Object Tracking by Deblatting

Title Intra-frame Object Tracking by Deblatting
Authors Jan Kotera, Denys Rozumnyi, Filip Šroubek, Jiří Matas
Abstract Objects moving at high speed along complex trajectories often appear in videos, especially videos of sports. Such objects travel a non-negligible distance during the exposure time of a single frame and therefore their position in the frame is not well defined. They appear as semi-transparent streaks due to the motion blur and cannot be reliably tracked by standard trackers. We propose a novel approach called Tracking by Deblatting based on the observation that motion blur is directly related to the intra-frame trajectory of an object. Blur is estimated by solving two intertwined inverse problems, blind deblurring and image matting, which we call deblatting. The trajectory is then estimated by fitting a piecewise quadratic curve, which models physically justifiable trajectories. As a result, tracked objects are precisely localized with higher temporal resolution than by conventional trackers. The proposed TbD tracker was evaluated on a newly created dataset of videos with ground truth obtained by a high-speed camera using a novel Trajectory-IoU metric that generalizes the traditional Intersection over Union and measures the accuracy of the intra-frame trajectory. The proposed method outperforms the baseline both in recall and trajectory accuracy.
Tasks Deblurring, Image Matting, Object Tracking
Published 2019-05-09
URL https://arxiv.org/abs/1905.03633v2
PDF https://arxiv.org/pdf/1905.03633v2.pdf
PWC https://paperswithcode.com/paper/190503633
Repo https://github.com/rozumden/tbd
Framework none

Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

Title Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack
Authors Francesco Croce, Matthias Hein
Abstract The evaluation of robustness against adversarial manipulation of neural networks-based classifiers is mainly tested with empirical attacks as the methods for the exact computation, even when available, do not scale to large networks. We propose in this paper a new white-box adversarial attack with respect to the $l_p$-norms for $p \in \{1,2,\infty\}$ aiming at finding the minimal perturbation necessary to change the class of a given input. It has an intuitive geometric meaning, yields high quality results already with one restart, minimizes the size of the perturbation, so that the robust accuracy can be evaluated at all possible thresholds with a single run, and comes with almost no free parameters except number of iterations and restarts. It achieves better or similar robust test accuracy compared to state-of-the-art attacks which are partially specialized to one $l_p$-norm.
Tasks Adversarial Attack
Published 2019-07-03
URL https://arxiv.org/abs/1907.02044v1
PDF https://arxiv.org/pdf/1907.02044v1.pdf
PWC https://paperswithcode.com/paper/minimally-distorted-adversarial-examples-with
Repo https://github.com/fra31/fab-attack
Framework tf
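
The idea of a "minimal perturbation necessary to change the class" has a closed form in the simplest case, a linear binary classifier under the $l_2$ norm: the smallest change is the orthogonal projection of the input onto the decision hyperplane. The sketch below shows only that special case. It is not the attack from the paper, which targets neural networks and also the $l_1$ and $l_\infty$ norms; the function name, the `overshoot` parameter, and the toy numbers are assumptions.

```python
import math

def minimal_l2_perturbation(w, b, x, overshoot=1e-6):
    """Smallest l2 change to x that flips the sign of f(x) = w.x + b.

    The perturbation is the projection of x onto the hyperplane w.x + b = 0,
    scaled by (1 + overshoot) so the point lands just across the boundary.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm_sq = sum(wi * wi for wi in w)
    scale = -(1.0 + overshoot) * score / norm_sq
    return [scale * wi for wi in w]

w, b = [3.0, 4.0], -5.0   # toy linear classifier
x = [3.0, 4.0]            # input with positive score

delta = minimal_l2_perturbation(w, b, x)
x_adv = [xi + di for xi, di in zip(x, delta)]
old_score = sum(wi * xi for wi, xi in zip(w, x)) + b
new_score = sum(wi * xi for wi, xi in zip(w, x_adv)) + b
delta_norm = math.sqrt(sum(d * d for d in delta))
print(old_score, new_score, delta_norm)
```

The perturbation norm equals the point-to-hyperplane distance $|f(x)| / \lVert w \rVert_2$ up to the overshoot factor. For a nonlinear network the boundary is curved, so this projection is only exact for a local linearization of the classifier.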

Attention Augmented Convolutional Networks

Title Attention Augmented Convolutional Networks
Authors Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, Quoc V. Le
Abstract Convolutional networks have been the paradigm of choice in many computer vision applications. The convolution operation however has a significant weakness in that it only operates on a local neighborhood, thus missing global information. Self-attention, on the other hand, has emerged as a recent advance to capture long range interactions, but has mostly been applied to sequence modeling and generative modeling tasks. In this paper, we consider the use of self-attention for discriminative visual tasks as an alternative to convolutions. We introduce a novel two-dimensional relative self-attention mechanism that proves competitive in replacing convolutions as a stand-alone computational primitive for image classification. We find in control experiments that the best results are obtained when combining both convolutions and self-attention. We therefore propose to augment convolutional operators with this self-attention mechanism by concatenating convolutional feature maps with a set of feature maps produced via self-attention. Extensive experiments show that Attention Augmentation leads to consistent improvements in image classification on ImageNet and object detection on COCO across many different models and scales, including ResNets and a state-of-the-art mobile constrained network, while keeping the number of parameters similar. In particular, our method achieves a 1.3% top-1 accuracy improvement on ImageNet classification over a ResNet50 baseline and outperforms other attention mechanisms for images such as Squeeze-and-Excitation. It also achieves an improvement of 1.4 mAP in COCO Object Detection on top of a RetinaNet baseline.
Tasks Image Classification, Object Detection
Published 2019-04-22
URL https://arxiv.org/abs/1904.09925v4
PDF https://arxiv.org/pdf/1904.09925v4.pdf
PWC https://paperswithcode.com/paper/190409925
Repo https://github.com/sebastiani/pytorch-attention-augmented-convolution
Framework pytorch

Cormorant: Covariant Molecular Neural Networks

Title Cormorant: Covariant Molecular Neural Networks
Authors Brandon Anderson, Truong-Son Hy, Risi Kondor
Abstract We propose Cormorant, a rotationally covariant neural network architecture for learning the behavior and properties of complex many-body physical systems. We apply these networks to molecular systems with two goals: learning atomic potential energy surfaces for use in Molecular Dynamics simulations, and learning ground state properties of molecules calculated by Density Functional Theory. Some of the key features of our network are that (a) each neuron explicitly corresponds to a subset of atoms; (b) the activation of each neuron is covariant to rotations, ensuring that overall the network is fully rotationally invariant. Furthermore, the non-linearity in our network is based upon tensor products and the Clebsch-Gordan decomposition, allowing the network to operate entirely in Fourier space. Cormorant significantly outperforms competing algorithms in learning molecular Potential Energy Surfaces from conformational geometries in the MD-17 dataset, and is competitive with other methods at learning geometric, energetic, electronic, and thermodynamic properties of molecules on the GDB-9 dataset.
Tasks
Published 2019-06-06
URL https://arxiv.org/abs/1906.04015v3
PDF https://arxiv.org/pdf/1906.04015v3.pdf
PWC https://paperswithcode.com/paper/cormorant-covariant-molecular-neural-networks
Repo https://github.com/risilab/cormorant
Framework pytorch

ROS2Learn: a reinforcement learning framework for ROS 2

Title ROS2Learn: a reinforcement learning framework for ROS 2
Authors Yue Leire Erro Nuin, Nestor Gonzalez Lopez, Elias Barba Moral, Lander Usategui San Juan, Alejandro Solano Rueda, Víctor Mayoral Vilches, Risto Kojcev
Abstract We propose a novel framework for Deep Reinforcement Learning (DRL) in modular robotics to train a robot directly from joint states, using traditional robotic tools. We use a state-of-the-art implementation of the Proximal Policy Optimization, Trust Region Policy Optimization and Actor-Critic Kronecker-Factored Trust Region algorithms to learn policies in four different Modular Articulated Robotic Arm (MARA) environments. We support this process using a framework that communicates with typical tools used in robotics, such as Gazebo and Robot Operating System 2 (ROS 2). We evaluate several algorithms in modular robots with an empirical study in simulation.
Tasks
Published 2019-03-14
URL http://arxiv.org/abs/1903.06282v2
PDF http://arxiv.org/pdf/1903.06282v2.pdf
PWC https://paperswithcode.com/paper/ros2learn-a-reinforcement-learning-framework
Repo https://github.com/acutronicrobotics/ros2learn
Framework tf

Massive Styles Transfer with Limited Labeled Data

Title Massive Styles Transfer with Limited Labeled Data
Authors Hongyu Zang, Xiaojun Wan
Abstract Language style transfer has attracted more and more attention in the past few years. Recent research focuses on improving neural models that transfer from one style to another using labeled data. However, transferring across multiple styles is often very useful in real-life applications. Previous research on language style transfer has two main deficiencies: dependency on massive labeled data and neglect of the mutual influence among different style transfer tasks. In this paper, we propose a multi-agent style transfer system (MAST) for addressing multiple style transfer tasks with limited labeled data, by leveraging abundant unlabeled data and the mutual benefit among the multiple styles. A style transfer agent in our system not only learns from unlabeled data by using techniques like denoising auto-encoder and back-translation, but also learns to cooperate with other style transfer agents in a self-organizing manner. We conduct our experiments by simulating a set of real-world style transfer tasks with multiple versions of the Bible. Our model significantly outperforms the other competitive methods. Extensive results and analysis further verify the efficacy of our proposed system.
Tasks Denoising, Style Transfer
Published 2019-06-03
URL https://arxiv.org/abs/1906.00580v1
PDF https://arxiv.org/pdf/1906.00580v1.pdf
PWC https://paperswithcode.com/paper/190600580
Repo https://github.com/zhyack/MAST
Framework tf

A hybrid parametric-deep learning approach for sound event localization and detection

Title A hybrid parametric-deep learning approach for sound event localization and detection
Authors Andres Perez-Lopez, Eduardo Fonseca, Xavier Serra
Abstract This work describes and discusses an algorithm submitted to the Sound Event Localization and Detection Task of DCASE2019 Challenge. The proposed methodology relies on parametric spatial audio analysis for source localization and detection, combined with a deep learning-based monophonic event classifier. The evaluation of the proposed algorithm yields overall results comparable to the baseline system. The main highlight is a reduction of the localization error on the evaluation dataset by a factor of 2.6, compared with the baseline performance.
Tasks
Published 2019-08-27
URL https://arxiv.org/abs/1908.10133v1
PDF https://arxiv.org/pdf/1908.10133v1.pdf
PWC https://paperswithcode.com/paper/a-hybrid-parametric-deep-learning-approach
Repo https://github.com/andresperezlopez/DCASE2019_task3
Framework tf