February 1, 2020

2928 words 14 mins read

Paper Group AWR 164


TANet: Robust 3D Object Detection from Point Clouds with Triple Attention. RetinaFace: Single-stage Dense Face Localisation in the Wild. GO Gradient for Expectation-Based Objectives. No Padding Please: Efficient Neural Handwriting Recognition. Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification. ORRB – OpenAI Remote Rendering Backend …

TANet: Robust 3D Object Detection from Point Clouds with Triple Attention

Title TANet: Robust 3D Object Detection from Point Clouds with Triple Attention
Authors Zhe Liu, Xin Zhao, Tengteng Huang, Ruolan Hu, Yu Zhou, Xiang Bai
Abstract In this paper, we focus on the robustness of 3D object detection in point clouds, which has rarely been discussed in existing approaches. We observe two crucial phenomena: 1) detection accuracy on hard objects, e.g., pedestrians, is unsatisfactory; 2) when additional noise points are added, the performance of existing approaches decreases rapidly. To alleviate these problems, a novel TANet is introduced, which mainly contains a Triple Attention (TA) module and a Coarse-to-Fine Regression (CFR) module. By considering channel-wise, point-wise, and voxel-wise attention jointly, the TA module enhances the crucial information of the target while suppressing unstable points. A stacked TA variant further exploits multi-level feature attention, and the CFR module boosts localization accuracy without excessive computational cost. Experimental results on the KITTI validation set demonstrate that, in the challenging noisy cases, i.e., with additional random noise points added around each object, the presented approach goes far beyond state-of-the-art approaches. Furthermore, on the 3D object detection task of the KITTI benchmark, our approach ranks first on the Pedestrian class using point clouds as the only input. The running speed is around 29 frames per second. (See the sketch after this entry.)
Tasks 3D Object Detection, Object Detection
Published 2019-12-11
URL https://arxiv.org/abs/1912.05163v1
PDF https://arxiv.org/pdf/1912.05163v1.pdf
PWC https://paperswithcode.com/paper/tanet-robust-3d-object-detection-from-point
Repo https://github.com/happinesslz/TANet
Framework pytorch
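
To make the triple-attention idea concrete, here is a minimal PyTorch sketch that jointly applies channel-wise, point-wise, and voxel-wise gates to voxelized point features. All shapes, layer sizes, and names (e.g., `ToyTripleAttention`) are illustrative assumptions, not the authors' implementation; see the linked repo for the real one.

```python
import torch
import torch.nn as nn

class ToyTripleAttention(nn.Module):
    """Joint channel/point/voxel re-weighting of voxelized point features."""
    def __init__(self, num_channels: int):
        super().__init__()
        # Channel attention: squeeze over points, excite per channel.
        self.channel_fc = nn.Sequential(
            nn.Linear(num_channels, num_channels // 2), nn.ReLU(),
            nn.Linear(num_channels // 2, num_channels))
        self.point_fc = nn.Linear(num_channels, 1)  # one score per point
        self.voxel_fc = nn.Linear(num_channels, 1)  # one gate per voxel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (voxels, points_per_voxel, channels)
        ch = torch.sigmoid(self.channel_fc(x.mean(dim=1)))       # (V, C)
        pt = torch.sigmoid(self.point_fc(x))                     # (V, P, 1)
        x = x * ch.unsqueeze(1) * pt                             # joint gating
        vox = torch.sigmoid(self.voxel_fc(x.max(dim=1).values))  # (V, 1)
        return x * vox.unsqueeze(1)

feats = torch.randn(8, 32, 64)              # 8 voxels, 32 points each, 64 channels
print(ToyTripleAttention(64)(feats).shape)  # torch.Size([8, 32, 64])
```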

RetinaFace: Single-stage Dense Face Localisation in the Wild

Title RetinaFace: Single-stage Dense Face Localisation in the Wild
Authors Jiankang Deng, Jia Guo, Yuxiang Zhou, Jinke Yu, Irene Kotsia, Stefanos Zafeiriou
Abstract Face Analysis Project on MXNet
Tasks Face Detection, Face Verification, Multi-Task Learning
Published 2019-05-02
URL https://arxiv.org/abs/1905.00641v2
PDF https://arxiv.org/pdf/1905.00641v2.pdf
PWC https://paperswithcode.com/paper/190500641
Repo https://github.com/peteryuX/retinaface-tf2
Framework tf

GO Gradient for Expectation-Based Objectives

Title GO Gradient for Expectation-Based Objectives
Authors Yulai Cong, Miaoyun Zhao, Ke Bai, Lawrence Carin
Abstract Within many machine learning algorithms, a fundamental problem concerns efficient calculation of an unbiased gradient with respect to parameters $\boldsymbol{\gamma}$ for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]$. Most existing methods either (i) suffer from high variance, seeking help from (often) complicated variance-reduction techniques, or (ii) only apply to reparameterizable continuous random variables via the reparameterization trick. To address these limitations, we propose a General and One-sample (GO) gradient that (i) applies to many distributions associated with non-reparameterizable continuous or discrete random variables, and (ii) has the same low variance as the reparameterization trick. We find that the GO gradient often works well in practice based on only one Monte Carlo sample (although one can of course use more samples if desired). Alongside the GO gradient, we develop a means of propagating the chain rule through distributions, yielding statistical back-propagation, which couples neural networks to common random variables. (See the sketch after this entry.)
Tasks
Published 2019-01-17
URL http://arxiv.org/abs/1901.06020v1
PDF http://arxiv.org/pdf/1901.06020v1.pdf
PWC https://paperswithcode.com/paper/go-gradient-for-expectation-based-objectives
Repo https://github.com/YulaiCong/GOgradient
Framework tf
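
For context, the sketch below computes a one-sample gradient of an expectation-based objective $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]$ via the reparameterization trick for a Gaussian $q$. This is the low-variance baseline that the GO gradient is designed to match while also covering non-reparameterizable cases; it is not the paper's estimator, and the choice of $f$ is arbitrary.

```python
import torch

f = lambda y: (y ** 2).sum()            # arbitrary differentiable objective f(y)
mu = torch.zeros(3, requires_grad=True)         # gamma = (mu, log_sigma)
log_sigma = torch.zeros(3, requires_grad=True)

eps = torch.randn(3)                    # a single Monte Carlo sample
y = mu + log_sigma.exp() * eps          # reparameterized draw from q_gamma(y)
f(y).backward()                         # unbiased one-sample gradient wrt gamma
print(mu.grad, log_sigma.grad)
```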

No Padding Please: Efficient Neural Handwriting Recognition

Title No Padding Please: Efficient Neural Handwriting Recognition
Authors Gideon Maillette de Buy Wenniger, Lambert Schomaker, Andy Way
Abstract Neural handwriting recognition (NHR) is the recognition of handwritten text with deep learning models, such as multi-dimensional long short-term memory (MDLSTM) recurrent neural networks. Models with MDLSTM layers have achieved state-of-the-art results on handwritten text recognition tasks. While multi-directional MDLSTM layers have an unbeaten ability to capture the complete context in all directions, this strength limits the possibilities for parallelization and therefore comes at a high computational cost. In this work we develop methods to create efficient MDLSTM-based models for NHR, in particular a method aimed at eliminating the computation waste that results from padding. This proposed method, called example-packing, replaces wasteful stacking of padded examples with efficient tiling in a 2-dimensional grid. For word-based NHR this yields a speed improvement of factor 6.6 over an already efficient baseline of minimal padding for each batch separately. For line-based NHR the savings are more modest, but still significant. In addition to example-packing, we propose: 1) a technique to optimize parallelization for dynamic graph definition frameworks including PyTorch, using convolutions with grouping, 2) a method for parallelization across GPUs for variable-length example batches. All our techniques are thoroughly tested on our own PyTorch re-implementation of MDLSTM-based NHR models. A thorough evaluation on the IAM dataset shows that our models perform similarly to earlier implementations of state-of-the-art models. Our efficient NHR model and some of the reusable techniques discussed with it offer ways to realize relatively efficient models for the omnipresent scenario of variable-length inputs in deep learning. (See the sketch after this entry.)
Tasks
Published 2019-02-28
URL http://arxiv.org/abs/1902.11208v1
PDF http://arxiv.org/pdf/1902.11208v1.pdf
PWC https://paperswithcode.com/paper/no-padding-please-efficient-neural
Repo https://github.com/gwenniger/multi-hare
Framework pytorch
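
A toy rendition of the example-packing idea: rather than padding every image to the widest one in a batch, variable-width images are tiled side by side into fixed-width strips, so padding occurs only at each strip's tail. The greedy first-fit policy, sizes, and names here are illustrative assumptions; the paper's packing scheme is more elaborate.

```python
import numpy as np

def pack_rows(images, canvas_width):
    """Greedily tile variable-width images into fixed-width strips.

    Assumes each image is narrower than canvas_width."""
    rows, row, used = [], [], 0
    for img in images:
        if used + img.shape[1] > canvas_width and row:
            rows.append(row)            # current strip is full; start a new one
            row, used = [], 0
        row.append(img)
        used += img.shape[1]
    if row:
        rows.append(row)
    strips = []
    for row in rows:                    # materialize strips, pad only the tail
        height = max(img.shape[0] for img in row)
        strip = np.zeros((height, canvas_width), dtype=np.float32)
        x = 0
        for img in row:
            strip[: img.shape[0], x : x + img.shape[1]] = img
            x += img.shape[1]
        strips.append(strip)
    return strips

imgs = [np.ones((32, w)) for w in (50, 120, 80, 200, 60)]
print([s.shape for s in pack_rows(imgs, 256)])  # 3 strips instead of 5 padded rows
```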

Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification

Title Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification
Authors Fenyu Hu, Yanqiao Zhu, Shu Wu, Liang Wang, Tieniu Tan
Abstract Graph convolutional networks (GCNs) have been successfully applied to node classification tasks in network mining. However, most of these neighborhood-aggregation models are shallow and lack a “graph pooling” mechanism, which prevents them from obtaining adequate global information. In order to increase the receptive field, we propose a novel deep Hierarchical Graph Convolutional Network (H-GCN) for semi-supervised node classification. H-GCN first repeatedly aggregates structurally similar nodes into hyper-nodes and then refines the coarsened graph back to the original to restore a representation for each node. Instead of merely aggregating one- or two-hop neighborhood information, the proposed coarsening procedure enlarges the receptive field for each node, so more global information can be captured. The proposed H-GCN model shows strong empirical performance on various public benchmark graph datasets, outperforming state-of-the-art methods and achieving up to a 5.9% accuracy improvement. In addition, when only a few labeled samples are provided, our model gains substantial improvements. (See the sketch after this entry.)
Tasks Node Classification
Published 2019-02-13
URL https://arxiv.org/abs/1902.06667v4
PDF https://arxiv.org/pdf/1902.06667v4.pdf
PWC https://paperswithcode.com/paper/semi-supervised-node-classification-via
Repo https://github.com/CRIPAC-DIG/H-GCN
Framework tf
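
As a refresher on the aggregation primitive that H-GCN stacks around its coarsen-and-refine pipeline, here is a single GCN layer with symmetric normalization. The coarsening step itself (merging structurally similar nodes into hyper-nodes) is omitted, and all sizes are illustrative.

```python
import torch

def gcn_layer(adj, x, weight):
    """One GCN layer: relu(D^-1/2 (A + I) D^-1/2 X W)."""
    a = adj + torch.eye(adj.size(0))        # add self-loops
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)     # symmetric degree normalization
    a_norm = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    return torch.relu(a_norm @ x @ weight)

adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])  # 3-node path graph
x, w = torch.randn(3, 4), torch.randn(4, 2)
print(gcn_layer(adj, x, w).shape)  # torch.Size([3, 2])
```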

ORRB – OpenAI Remote Rendering Backend

Title ORRB – OpenAI Remote Rendering Backend
Authors Maciek Chociej, Peter Welinder, Lilian Weng
Abstract We present the OpenAI Remote Rendering Backend (ORRB), a system that allows fast and customizable rendering of robotics environments. It is based on the Unity3d game engine and interfaces with the MuJoCo physics simulation library. ORRB was designed with visual domain randomization in mind. It is optimized for cloud deployment and high throughput operation. We are releasing it to the public under a liberal MIT license: https://github.com/openai/orrb .
Tasks
Published 2019-06-26
URL https://arxiv.org/abs/1906.11633v1
PDF https://arxiv.org/pdf/1906.11633v1.pdf
PWC https://paperswithcode.com/paper/orrb-openai-remote-rendering-backend
Repo https://github.com/openai/orrb
Framework none

Beyond Correlation: A Path-Invariant Measure for Seismogram Similarity

Title Beyond Correlation: A Path-Invariant Measure for Seismogram Similarity
Authors Joshua Dickey, Brett Borghetti, William Junek, Richard Martin
Abstract Similarity search is a popular technique for seismic signal processing, with template matching, matched filters, and subspace detectors being utilized for a wide variety of tasks, including both signal detection and source discrimination. Traditionally, these techniques rely on the cross-correlation function as the basis for measuring similarity. Unfortunately, seismogram correlation is dominated by path effects, essentially requiring a distinct waveform template along each path of interest. To address this limitation, we propose a novel measure of seismogram similarity that is explicitly invariant to path. Using EarthScope’s USArray experiment, a path-rich dataset of 207,291 regional seismograms across 8,452 unique events is constructed and then employed, via the batch-hard triplet loss function, to train a deep convolutional neural network that maps raw seismograms to a low-dimensional embedding space, where nearness in the space corresponds to nearness of source function, regardless of path or recording instrumentation. This path-agnostic embedding space forms a new representation for seismograms, characterized by robust, source-specific features, which we show to be useful for performing both pairwise event association and template-based source discrimination with a single template. (See the sketch after this entry.)
Tasks
Published 2019-04-16
URL https://arxiv.org/abs/1904.07936v4
PDF https://arxiv.org/pdf/1904.07936v4.pdf
PWC https://paperswithcode.com/paper/beyond-correlation-a-path-invariant-measure
Repo https://github.com/joshuadickey/seis-sim
Framework tf
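
The batch-hard triplet loss named in the abstract is easy to state in code: for each anchor, take the farthest positive and the closest negative within the batch. Below is a generic sketch of that objective; the margin value and shapes are assumptions, and this is not the authors' code.

```python
import torch

def batch_hard_triplet_loss(emb, labels, margin=0.2):
    # emb: (B, D) embeddings; labels: (B,) integer event ids
    dist = torch.cdist(emb, emb)                      # (B, B) pairwise distances
    same = (labels[:, None] == labels[None, :]).float()
    hardest_pos = (dist * same).max(dim=1).values     # farthest same-event sample
    hardest_neg = dist.masked_fill(same.bool(), float("inf")).min(dim=1).values
    return torch.clamp(hardest_pos - hardest_neg + margin, min=0).mean()

emb = torch.nn.functional.normalize(torch.randn(8, 16), dim=1)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])       # four events, two views each
print(batch_hard_triplet_loss(emb, labels))
```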

On the Cross-lingual Transferability of Monolingual Representations

Title On the Cross-lingual Transferability of Monolingual Representations
Authors Mikel Artetxe, Sebastian Ruder, Dani Yogatama
Abstract State-of-the-art unsupervised multilingual models (e.g., multilingual BERT) have been shown to generalize in a zero-shot cross-lingual setting. This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions. We evaluate this hypothesis by designing an alternative approach that transfers a monolingual model to new languages at the lexical level. More concretely, we first train a transformer-based masked language model on one language, and transfer it to a new language by learning a new embedding matrix with the same masked language modeling objective, freezing the parameters of all other layers. This approach does not rely on a shared vocabulary or joint training. However, we show that it is competitive with multilingual BERT on standard cross-lingual classification benchmarks and on a new Cross-lingual Question Answering Dataset (XQuAD). Our results contradict common beliefs about the basis of the generalization ability of multilingual models and suggest that deep monolingual models learn some abstractions that generalize across languages. We also release XQuAD as a more comprehensive cross-lingual benchmark, which comprises 240 paragraphs and 1,190 question-answer pairs from SQuAD v1.1 translated into ten languages by professional translators. (See the sketch after this entry.)
Tasks Language Modelling, Question Answering
Published 2019-10-25
URL https://arxiv.org/abs/1910.11856v1
PDF https://arxiv.org/pdf/1910.11856v1.pdf
PWC https://paperswithcode.com/paper/on-the-cross-lingual-transferability-of
Repo https://github.com/deepmind/xquad
Framework none
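
The transfer recipe from the abstract, in miniature: keep the trained transformer body frozen and learn only a fresh embedding matrix for the target language. The tiny encoder, vocabulary size, and training step below are placeholder assumptions; the paper does this with a full masked language model.

```python
import torch
import torch.nn as nn

body = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2)                        # stands in for the pretrained body
new_embed = nn.Embedding(1000, 64)       # fresh target-language vocabulary

for p in body.parameters():              # freeze everything except embeddings
    p.requires_grad = False
optim = torch.optim.Adam(new_embed.parameters(), lr=1e-3)

tokens = torch.randint(0, 1000, (2, 16))
loss = body(new_embed(tokens)).sum()     # placeholder for the real MLM loss
loss.backward()                          # gradients flow only into new_embed
optim.step()
print(new_embed.weight.grad is not None)  # True
```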

MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension

Title MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension
Authors Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, Danqi Chen
Abstract We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems. In this task, we adapted and unified 18 distinct question answering datasets into the same format. Among them, six datasets were made available for training, six datasets were made available for development, and the final six were hidden for final evaluation. Ten teams submitted systems, which explored various ideas including data sampling, multi-task learning, adversarial training and ensembling. The best system achieved an average F1 score of 72.5 on the 12 held-out datasets, 10.7 absolute points higher than our initial baseline based on BERT.
Tasks Multi-Task Learning, Question Answering, Reading Comprehension
Published 2019-10-22
URL https://arxiv.org/abs/1910.09753v2
PDF https://arxiv.org/pdf/1910.09753v2.pdf
PWC https://paperswithcode.com/paper/mrqa-2019-shared-task-evaluating
Repo https://github.com/mrqa/MRQA-Shared-Task-2019
Framework none

FireNet: A Specialized Lightweight Fire & Smoke Detection Model for Real-Time IoT Applications

Title FireNet: A Specialized Lightweight Fire & Smoke Detection Model for Real-Time IoT Applications
Authors Arpit Jadon, Mohd. Omama, Akshay Varshney, Mohammad Samar Ansari, Rishabh Sharma
Abstract Fire disasters typically result in great loss of life and property. It is therefore imperative that precise, fast, and possibly portable solutions to detect fire be made readily available to the masses at reasonable prices. There have been several research attempts to design effective and appropriately priced fire detection systems, with varying degrees of success. However, most of them demonstrate a trade-off between performance and model size (which decides the model’s ability to be installed on portable devices). The work presented in this paper is an attempt to deal with both the performance and model size issues in one design. Toward that end, a ‘designed-from-scratch’ neural network, named FireNet, is proposed which delivers on both counts: (i) it has better performance than existing counterparts, and (ii) it is lightweight enough to be deployable on embedded platforms like the Raspberry Pi. Performance evaluations on a standard dataset, as well as our own newly introduced custom-compiled fire dataset, are extremely encouraging. (See the sketch after this entry.)
Tasks
Published 2019-05-28
URL https://arxiv.org/abs/1905.11922v2
PDF https://arxiv.org/pdf/1905.11922v2.pdf
PWC https://paperswithcode.com/paper/firenet-a-specialized-lightweight-fire-smoke
Repo https://github.com/arpit-jadon/FireNet-LightWeight-Network-for-Fire-Detection
Framework tf
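
To illustrate what "lightweight enough for a Raspberry Pi" can look like, here is a hypothetical small CNN classifier. It is not the FireNet architecture; every layer size is an assumption chosen only to keep the parameter count small.

```python
import torch
import torch.nn as nn

tiny_fire_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 2))                          # fire / no-fire logits

x = torch.randn(1, 3, 64, 64)                  # one RGB frame
print(tiny_fire_net(x).shape)                  # torch.Size([1, 2])
print(sum(p.numel() for p in tiny_fire_net.parameters()), "parameters")
```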

Sherlock: A Deep Learning Approach to Semantic Data Type Detection

Title Sherlock: A Deep Learning Approach to Semantic Data Type Detection
Authors Madelon Hulsebos, Kevin Hu, Michiel Bakker, Emanuel Zgraggen, Arvind Satyanarayan, Tim Kraska, Çağatay Demiralp, César Hidalgo
Abstract Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery. Existing data preparation and analysis systems rely on dictionary lookups and regular expression matching to detect semantic types. However, these matching-based approaches are often not robust to dirty data and only detect a limited number of types. We introduce Sherlock, a multi-input deep neural network for detecting semantic types. We train Sherlock on 686,765 data columns retrieved from the VizNet corpus by matching 78 semantic types from DBpedia to column headers. We characterize each matched column with 1,588 features describing the statistical properties, character distributions, word embeddings, and paragraph vectors of column values. Sherlock achieves a support-weighted F1 score of 0.89, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations. (See the sketch after this entry.)
Tasks Word Embeddings
Published 2019-05-25
URL https://arxiv.org/abs/1905.10688v1
PDF https://arxiv.org/pdf/1905.10688v1.pdf
PWC https://paperswithcode.com/paper/sherlock-a-deep-learning-approach-to-semantic
Repo https://github.com/tsegall/fta
Framework none
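
A toy version of the Sherlock recipe: summarize a column of raw values with a few statistical and character-level features, then classify the resulting vector with a small network. The real system uses 1,588 features and a multi-input architecture; the four features and layer sizes below are assumptions for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

def column_features(values):
    """Four toy features: mean/std of value length, digit fraction, uniqueness."""
    lengths = np.array([len(v) for v in values], dtype=np.float32)
    digit_frac = np.mean([sum(c.isdigit() for c in v) / max(len(v), 1)
                          for v in values])
    return torch.tensor([float(lengths.mean()), float(lengths.std()),
                         float(digit_frac), len(set(values)) / len(values)])

clf = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 78))  # 78 types
col = ["2019-05-25", "2019-01-17", "2019-10-22"]
print(clf(column_features(col)).argmax().item())  # predicted type id (untrained)
```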

GANchors: Realistic Image Perturbation Distributions for Anchors Using Generative Models

Title GANchors: Realistic Image Perturbation Distributions for Anchors Using Generative Models
Authors Kurtis Evan David, Harrison Keane, Jun Min Noh
Abstract We extend and improve the Anchors approach to model-agnostic explanations for image classification through the use of generative adversarial networks (GANs). Using GANs, we generate samples from a more realistic perturbation distribution by optimizing in a lower-dimensional latent space. This increases the trust in an explanation, as results now come from images that are more likely to be found in the original training set of a classifier, rather than an overlay of random images. A major drawback of our method is the computational complexity of sampling through optimization; to address this, we implement more efficient algorithms, including a diverse encoder. Lastly, we share results from the MNIST and CelebA datasets, and note that our explanations can lead to smaller and higher-precision anchors. (See the sketch after this entry.)
Tasks Image Classification
Published 2019-06-01
URL https://arxiv.org/abs/1906.00297v1
PDF https://arxiv.org/pdf/1906.00297v1.pdf
PWC https://paperswithcode.com/paper/190600297
Repo https://github.com/kurtisdavid/ImageAnchors
Framework pytorch
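
The core mechanism from the abstract, as a sketch: perturb an instance's latent code and decode with a generator, so perturbed samples stay near the data manifold instead of being random pixel overlays. The untrained generator and all sizes below are stand-in assumptions; the paper uses a trained GAN plus optimization to find the latent code.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(16, 128), nn.ReLU(),
                          nn.Linear(128, 28 * 28), nn.Tanh())  # stand-in GAN

z = torch.randn(16)                             # latent code of the instance
perturbed = torch.stack([generator(z + 0.1 * torch.randn(16))
                         for _ in range(8)])    # 8 on-manifold perturbations
print(perturbed.shape)                          # torch.Size([8, 784])
```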

Global Context for Convolutional Pose Machines

Title Global Context for Convolutional Pose Machines
Authors Daniil Osokin
Abstract The Convolutional Pose Machine is a popular neural network architecture for articulated pose estimation. In this work we explore its empirical receptive field and find that it can be enhanced by integrating a global context. To do so, a U-shaped context module is proposed and compared with the pyramid pooling and atrous spatial pyramid pooling modules, which are often used in the semantic segmentation domain. The proposed neural network achieves state-of-the-art accuracy with 87.9% PCKh for single-person pose estimation on the Look Into Person dataset. A smaller version of this network runs at more than 160 frames per second while being just 2.9% less accurate. Generalization of the proposed approach is tested on the MPII benchmark, showing that it is faster than hourglass-based networks while providing similar accuracy. The code is available at https://github.com/opencv/openvino_training_extensions/tree/develop/pytorch_toolkit/human_pose_estimation . (See the sketch after this entry.)
Tasks Pose Estimation, Semantic Segmentation
Published 2019-06-10
URL https://arxiv.org/abs/1906.04104v1
PDF https://arxiv.org/pdf/1906.04104v1.pdf
PWC https://paperswithcode.com/paper/global-context-for-convolutional-pose
Repo https://github.com/Daniil-Osokin/gccpm-look-into-person-cvpr19.pytorch
Framework pytorch
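
For reference, here is a minimal pyramid pooling module, one of the global-context baselines the paper compares its U-shaped module against. Bin sizes and channel counts are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, channels, bins=(1, 2, 4)):
        super().__init__()
        self.bins = bins
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels // len(bins), 1) for _ in bins)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x]
        for bin_size, conv in zip(self.bins, self.convs):
            pooled = F.adaptive_avg_pool2d(x, bin_size)   # coarse global context
            feats.append(F.interpolate(conv(pooled), size=(h, w),
                                       mode="bilinear", align_corners=False))
        return torch.cat(feats, dim=1)                    # append context to input

x = torch.randn(1, 48, 32, 32)
print(PyramidPooling(48)(x).shape)  # torch.Size([1, 96, 32, 32])
```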

Down to the Last Detail: Virtual Try-on with Detail Carving

Title Down to the Last Detail: Virtual Try-on with Detail Carving
Authors Jiahang Wang, Wei Zhang, Weizhong Liu, Tao Mei
Abstract Virtual try-on under arbitrary poses has attracted considerable research attention due to its huge application potential. However, existing methods can hardly preserve the details in clothing texture and facial identity (face, hair) while fitting novel clothes and poses onto a person. In this paper, we propose a novel multi-stage framework to synthesize person images, where rich details in salient regions can be well preserved. Specifically, the generation is decomposed into spatial alignment followed by coarse-to-fine synthesis. To better preserve the details in salient areas such as clothing and facial regions, we propose a Tree-Block (tree dilated fusion block) to harness multi-scale features in the generator networks. With end-to-end training of multiple stages, the whole framework can be jointly optimized for results with significantly better visual fidelity and richer details. Extensive experiments on standard datasets demonstrate that our proposed framework achieves state-of-the-art performance, especially in preserving the visual details in clothing texture and facial identity. Our implementation will be publicly available soon. (See the sketch after this entry.)
Tasks
Published 2019-12-13
URL https://arxiv.org/abs/1912.06324v2
PDF https://arxiv.org/pdf/1912.06324v2.pdf
PWC https://paperswithcode.com/paper/down-to-the-last-detail-virtual-try-on-with
Repo https://github.com/AIprogrammer/Down-to-the-Last-Detail-Virtual-Try-on-with-Detail-Carving
Framework pytorch
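
To gesture at what a dilated fusion block does, the sketch below runs parallel convolutions with increasing dilation and fuses their multi-scale outputs with a residual connection. This is only a generic dilated fusion block under assumed sizes; the paper's tree-structured fusion is not reproduced here.

```python
import torch
import torch.nn as nn

class DilatedFusionBlock(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations)                 # parallel multi-scale branches
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        return torch.relu(self.fuse(multi_scale)) + x   # fuse, keep a residual

x = torch.randn(1, 32, 64, 64)
print(DilatedFusionBlock(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```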

Bridging the Domain Gap for Ground-to-Aerial Image Matching

Title Bridging the Domain Gap for Ground-to-Aerial Image Matching
Authors Krishna Regmi, Mubarak Shah
Abstract The visual entities in cross-view images exhibit drastic domain changes due to the difference in the viewpoints from which each set of images is captured. Existing state-of-the-art methods address the problem by learning view-invariant descriptors for the images. We propose a novel method for solving this task by exploiting the generative power of conditional GANs to synthesize an aerial representation of a ground-level panorama and using it to minimize the domain gap between the two views. Because the synthesized image is from the same view as the target image, it helps the network preserve important cues in aerial images, following our Joint Feature Learning approach. Our Feature Fusion method combines the complementary features from a synthesized aerial image with the corresponding ground features to obtain a robust query representation. In addition, multi-scale feature aggregation preserves image representations at different feature scales, which is useful for solving this complex task. Experimental results show that our proposed approach performs significantly better than state-of-the-art methods on the challenging CVUSA dataset in terms of top-1 and top-1% retrieval accuracies. Furthermore, to evaluate the generalization of our method on urban landscapes, we collected a new cross-view localization dataset with geo-reference information. (See the sketch after this entry.)
Tasks
Published 2019-04-24
URL https://arxiv.org/abs/1904.11045v2
PDF https://arxiv.org/pdf/1904.11045v2.pdf
PWC https://paperswithcode.com/paper/bridging-the-domain-gap-for-ground-to-aerial
Repo https://github.com/kregmi/cross-view-image-matching
Framework tf
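
The Feature Fusion step in miniature: concatenate the query's ground-view features with features of the GAN-synthesized aerial view, then match the fused query against real aerial features. The linear encoders and every size below are illustrative stand-ins, not the paper's networks.

```python
import torch
import torch.nn as nn

ground_enc = nn.Linear(512, 128)      # stand-in ground-image encoder
aerial_enc = nn.Linear(512, 128)      # stand-in aerial-image encoder
fuse = nn.Linear(256, 128)            # joint query representation

ground_feat = ground_enc(torch.randn(1, 512))
synth_aerial_feat = aerial_enc(torch.randn(1, 512))   # from the conditional GAN
query = fuse(torch.cat([ground_feat, synth_aerial_feat], dim=1))

gallery = aerial_enc(torch.randn(100, 512))           # reference aerial images
scores = torch.cosine_similarity(query, gallery)      # (100,) retrieval scores
print(scores.argmax().item())                         # index of the best match
```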