Paper Group AWR 164
TANet: Robust 3D Object Detection from Point Clouds with Triple Attention. RetinaFace: Single-stage Dense Face Localisation in the Wild. GO Gradient for Expectation-Based Objectives. No Padding Please: Efficient Neural Handwriting Recognition. Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification. ORRB – OpenAI Remote Rendering Backend. Beyond Correlation: A Path-Invariant Measure for Seismogram Similarity. On the Cross-lingual Transferability of Monolingual Representations. MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension. FireNet: A Specialized Lightweight Fire & Smoke Detection Model for Real-Time IoT Applications. Sherlock: A Deep Learning Approach to Semantic Data Type Detection. GANchors: Realistic Image Perturbation Distributions for Anchors Using Generative Models. Global Context for Convolutional Pose Machines. Down to the Last Detail: Virtual Try-on with Detail Carving. Bridging the Domain Gap for Ground-to-Aerial Image Matching.
TANet: Robust 3D Object Detection from Point Clouds with Triple Attention
Title | TANet: Robust 3D Object Detection from Point Clouds with Triple Attention |
Authors | Zhe Liu, Xin Zhao, Tengteng Huang, Ruolan Hu, Yu Zhou, Xiang Bai |
Abstract | In this paper, we focus on exploring the robustness of 3D object detection in point clouds, which has been rarely discussed in existing approaches. We observe two crucial phenomena: 1) the detection accuracy of hard objects, e.g., pedestrians, is unsatisfactory; 2) when additional noise points are added, the performance of existing approaches decreases rapidly. To alleviate these problems, a novel TANet is introduced in this paper, which mainly contains a Triple Attention (TA) module and a Coarse-to-Fine Regression (CFR) module. By considering the channel-wise, point-wise and voxel-wise attention jointly, the TA module enhances the crucial information of the target while suppressing unstable cloud points. Besides, the novel stacked TA further exploits multi-level feature attention. In addition, the CFR module boosts the accuracy of localization without excessive computation cost. Experimental results on the validation set of the KITTI dataset demonstrate that, in the challenging noisy cases, i.e., adding additional random noisy points around each object, the presented approach goes far beyond state-of-the-art approaches. Furthermore, for the 3D object detection task of the KITTI benchmark, our approach ranks first on the Pedestrian class, using point clouds as the only input. The running speed is around 29 frames per second. |
Tasks | 3D Object Detection, Object Detection |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05163v1 |
https://arxiv.org/pdf/1912.05163v1.pdf | |
PWC | https://paperswithcode.com/paper/tanet-robust-3d-object-detection-from-point |
Repo | https://github.com/happinesslz/TANet |
Framework | pytorch |
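The interplay of the three attention granularities can be illustrated with a short PyTorch sketch. This is a loose illustration of joint channel-, point- and voxel-wise re-weighting over voxelized point features, not the paper's exact TA architecture; the tensor shapes and single-layer attention heads are assumptions:

```python
import torch
import torch.nn as nn

class TripleAttentionSketch(nn.Module):
    """Jointly re-weight voxelized point features of shape (V, P, C):
    V voxels, P points per voxel, C channels."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_fc = nn.Linear(channels, channels)  # channel-wise weights
        self.point_fc = nn.Linear(channels, 1)           # one weight per point
        self.voxel_fc = nn.Linear(channels, 1)           # one weight per voxel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ch_att = torch.sigmoid(self.channel_fc(x.mean(dim=1)))      # (V, C)
        pt_att = torch.sigmoid(self.point_fc(x))                    # (V, P, 1)
        x = x * ch_att.unsqueeze(1) * pt_att                        # joint re-weighting
        vx_att = torch.sigmoid(self.voxel_fc(x.max(dim=1).values))  # (V, 1)
        return x * vx_att.unsqueeze(1)

x = torch.randn(32, 100, 64)        # 32 voxels, 100 points each, 64 channels
out = TripleAttentionSketch(64)(x)  # same shape, attention-reweighted
```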
RetinaFace: Single-stage Dense Face Localisation in the Wild
Title | RetinaFace: Single-stage Dense Face Localisation in the Wild |
Authors | Jiankang Deng, Jia Guo, Yuxiang Zhou, Jinke Yu, Irene Kotsia, Stefanos Zafeiriou |
Abstract | Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on various scales of faces by taking advantage of joint extra-supervised and self-supervised multi-task learning. Specifically, we make contributions in the following five aspects: (1) We manually annotate five facial landmarks on the WIDER FACE dataset and observe significant improvement in hard face detection with the assistance of this extra supervision signal. (2) We further add a self-supervised mesh decoder branch for predicting pixel-wise 3D shape face information in parallel with the existing supervised branches. (3) On the WIDER FACE hard test set, RetinaFace outperforms the state-of-the-art average precision (AP) by 1.1% (achieving AP equal to 91.4%). (4) On the IJB-C test set, RetinaFace enables state-of-the-art methods (ArcFace) to improve their results in face verification (TAR=89.59% for FAR=1e-6). (5) By employing light-weight backbone networks, RetinaFace can run in real time on a single CPU core for a VGA-resolution image. |
Tasks | Face Detection, Face Verification, Multi-Task Learning |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.00641v2 |
https://arxiv.org/pdf/1905.00641v2.pdf | |
PWC | https://paperswithcode.com/paper/190500641 |
Repo | https://github.com/peteryuX/retinaface-tf2 |
Framework | tf |
GO Gradient for Expectation-Based Objectives
Title | GO Gradient for Expectation-Based Objectives |
Authors | Yulai Cong, Miaoyun Zhao, Ke Bai, Lawrence Carin |
Abstract | Within many machine learning algorithms, a fundamental problem concerns efficient calculation of an unbiased gradient wrt parameters $\boldsymbol{\gamma}$ for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]$. Most existing methods either (i) suffer from high variance, seeking help from (often) complicated variance-reduction techniques; or (ii) apply only to reparameterizable continuous random variables and employ a reparameterization trick. To address these limitations, we propose a General and One-sample (GO) gradient that (i) applies to many distributions associated with non-reparameterizable continuous or discrete random variables, and (ii) has the same low variance as the reparameterization trick. We find that the GO gradient often works well in practice based on only one Monte Carlo sample (although one can of course use more samples if desired). Alongside the GO gradient, we develop a means of propagating the chain rule through distributions, yielding statistical back-propagation, coupling neural networks to common random variables. |
Tasks | |
Published | 2019-01-17 |
URL | http://arxiv.org/abs/1901.06020v1 |
http://arxiv.org/pdf/1901.06020v1.pdf | |
PWC | https://paperswithcode.com/paper/go-gradient-for-expectation-based-objectives |
Repo | https://github.com/YulaiCong/GOgradient |
Framework | tf |
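For context, the two standard gradient estimators the abstract contrasts can be written out explicitly; this is textbook background, not the GO estimator itself:

```latex
% Score-function (REINFORCE) estimator: unbiased, but often high-variance.
\nabla_{\boldsymbol{\gamma}}\,\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]
  = \mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}\!\left[
      f(\boldsymbol{y})\,\nabla_{\boldsymbol{\gamma}}\log q_{\boldsymbol{\gamma}}(\boldsymbol{y})
    \right]

% Reparameterization estimator: low-variance, but requires
% \boldsymbol{y} = g_{\boldsymbol{\gamma}}(\boldsymbol{\epsilon}) with
% \boldsymbol{\epsilon}\sim p(\boldsymbol{\epsilon}) independent of \boldsymbol{\gamma}.
\nabla_{\boldsymbol{\gamma}}\,\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]
  = \mathbb{E}_{p(\boldsymbol{\epsilon})}\!\left[
      \nabla_{\boldsymbol{\gamma}} f\bigl(g_{\boldsymbol{\gamma}}(\boldsymbol{\epsilon})\bigr)
    \right]
```

The GO gradient aims to keep the second estimator's low variance while covering distributions, including discrete ones, for which no such $g_{\boldsymbol{\gamma}}$ exists.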
No Padding Please: Efficient Neural Handwriting Recognition
Title | No Padding Please: Efficient Neural Handwriting Recognition |
Authors | Gideon Maillette de Buy Wenniger, Lambert Schomaker, Andy Way |
Abstract | Neural handwriting recognition (NHR) is the recognition of handwritten text with deep learning models, such as multi-dimensional long short-term memory (MDLSTM) recurrent neural networks. Models with MDLSTM layers have achieved state-of-the-art results on handwritten text recognition tasks. While multi-directional MDLSTM-layers have an unbeaten ability to capture the complete context in all directions, this strength limits the possibilities for parallelization, and therefore comes at a high computational cost. In this work we develop methods to create efficient MDLSTM-based models for NHR, particularly a method aimed at eliminating computation waste that results from padding. This proposed method, called example-packing, replaces wasteful stacking of padded examples with efficient tiling in a 2-dimensional grid. For word-based NHR this yields a speed improvement of factor 6.6 over an already efficient baseline of minimal padding for each batch separately. For line-based NHR the savings are more modest, but still significant. In addition to example-packing, we propose: 1) a technique to optimize parallelization for dynamic graph definition frameworks including PyTorch, using convolutions with grouping, 2) a method for parallelization across GPUs for variable-length example batches. All our techniques are thoroughly tested on our own PyTorch re-implementation of MDLSTM-based NHR models. A thorough evaluation on the IAM dataset shows that our models perform similarly to earlier implementations of state-of-the-art models. Our efficient NHR model and some of the reusable techniques discussed with it offer ways to realize relatively efficient models for the omnipresent scenario of variable-length inputs in deep learning. |
Tasks | |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11208v1 |
http://arxiv.org/pdf/1902.11208v1.pdf | |
PWC | https://paperswithcode.com/paper/no-padding-please-efficient-neural |
Repo | https://github.com/gwenniger/multi-hare |
Framework | pytorch |
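A toy sketch of the padding waste that example-packing targets, assuming same-height word images packed greedily into rows of a fixed-width grid; the real method handles full 2D tiling and the MDLSTM specifics:

```python
import numpy as np

def pack_rows(images, grid_width):
    """Greedily tile same-height images left-to-right into grid rows,
    starting a new row when the next image no longer fits."""
    rows, current, used = [], [], 0
    for img in images:
        _, w = img.shape
        if used + w > grid_width and current:
            rows.append(current)
            current, used = [], 0
        current.append(img)
        used += w
    if current:
        rows.append(current)
    # Render each row; only the trailing gap of each row is padded.
    h = images[0].shape[0]
    grid = np.zeros((h * len(rows), grid_width))
    for r, row in enumerate(rows):
        x = 0
        for img in row:
            grid[r * h:(r + 1) * h, x:x + img.shape[1]] = img
            x += img.shape[1]
    return grid

imgs = [np.ones((32, w)) for w in (50, 120, 80, 200, 60)]
packed = pack_rows(imgs, grid_width=256)
# Naive batching would pad every image to width 200 (the batch maximum),
# wasting far more area than the per-row trailing gaps here.
```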
Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification
Title | Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification |
Authors | Fenyu Hu, Yanqiao Zhu, Shu Wu, Liang Wang, Tieniu Tan |
Abstract | Graph convolutional networks (GCNs) have been successfully applied to node classification tasks in network mining. However, most of these neighborhood-aggregation models are shallow and lack a “graph pooling” mechanism, which prevents the model from obtaining adequate global information. In order to increase the receptive field, we propose a novel deep Hierarchical Graph Convolutional Network (H-GCN) for semi-supervised node classification. H-GCN first repeatedly aggregates structurally similar nodes into hyper-nodes and then refines the coarsened graph back to the original one to restore the representation of each node. Instead of merely aggregating one- or two-hop neighborhood information, the proposed coarsening procedure enlarges the receptive field of each node, so that more global information can be captured. The proposed H-GCN model shows strong empirical performance on various public benchmark graph datasets, outperforming state-of-the-art methods and achieving up to a 5.9% accuracy improvement. In addition, when only a few labeled samples are provided, our model gains substantial improvements. |
Tasks | Node Classification |
Published | 2019-02-13 |
URL | https://arxiv.org/abs/1902.06667v4 |
https://arxiv.org/pdf/1902.06667v4.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-node-classification-via |
Repo | https://github.com/CRIPAC-DIG/H-GCN |
Framework | tf |
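One way to group structurally similar nodes into hyper-nodes, as the abstract describes, is to merge nodes that share an identical neighbor set. This particular rule is an assumption for illustration, not the paper's full coarsening procedure:

```python
from collections import defaultdict

def coarsen_by_structural_equivalence(adj):
    """Merge nodes with identical neighbor sets into hyper-nodes.
    adj: dict node -> set of neighbors. Returns node -> hyper-node id."""
    groups = defaultdict(list)
    for node, nbrs in adj.items():
        groups[frozenset(nbrs)].append(node)
    assignment = {}
    for hyper_id, members in enumerate(groups.values()):
        for node in members:
            assignment[node] = hyper_id
    return assignment

adj = {0: {2}, 1: {2}, 2: {0, 1, 3}, 3: {2}}
print(coarsen_by_structural_equivalence(adj))
# nodes 0, 1 and 3 share the neighbor set {2} -> merged into one hyper-node
```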
ORRB – OpenAI Remote Rendering Backend
Title | ORRB – OpenAI Remote Rendering Backend |
Authors | Maciek Chociej, Peter Welinder, Lilian Weng |
Abstract | We present the OpenAI Remote Rendering Backend (ORRB), a system that allows fast and customizable rendering of robotics environments. It is based on the Unity3d game engine and interfaces with the MuJoCo physics simulation library. ORRB was designed with visual domain randomization in mind. It is optimized for cloud deployment and high throughput operation. We are releasing it to the public under a liberal MIT license: https://github.com/openai/orrb . |
Tasks | |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.11633v1 |
https://arxiv.org/pdf/1906.11633v1.pdf | |
PWC | https://paperswithcode.com/paper/orrb-openai-remote-rendering-backend |
Repo | https://github.com/openai/orrb |
Framework | none |
Beyond Correlation: A Path-Invariant Measure for Seismogram Similarity
Title | Beyond Correlation: A Path-Invariant Measure for Seismogram Similarity |
Authors | Joshua Dickey, Brett Borghetti, William Junek, Richard Martin |
Abstract | Similarity search is a popular technique for seismic signal processing, with template matching, matched filters and subspace detectors being utilized for a wide variety of tasks, including both signal detection and source discrimination. Traditionally, these techniques rely on the cross-correlation function as the basis for measuring similarity. Unfortunately, seismogram correlation is dominated by path effects, essentially requiring a distinct waveform template along each path of interest. To address this limitation, we propose a novel measure of seismogram similarity that is explicitly invariant to path. Using EarthScope’s USArray experiment, a path-rich dataset of 207,291 regional seismograms across 8,452 unique events is constructed, and then employed via the batch-hard triplet loss function, to train a deep convolutional neural network that maps raw seismograms to a low-dimensional embedding space, where nearness in the space corresponds to nearness of source function, regardless of path or recording instrumentation. This path-agnostic embedding space forms a new representation for seismograms, characterized by robust, source-specific features, which we show to be useful for performing both pairwise event association as well as template-based source discrimination with a single template. |
Tasks | |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07936v4 |
https://arxiv.org/pdf/1904.07936v4.pdf | |
PWC | https://paperswithcode.com/paper/beyond-correlation-a-path-invariant-measure |
Repo | https://github.com/joshuadickey/seis-sim |
Framework | tf |
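The batch-hard triplet loss used above (Hermans et al.'s formulation) picks, for each anchor, the hardest positive and hardest negative within the batch. A compact PyTorch sketch; the margin value is an illustrative assumption:

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """embeddings: (B, D) sample embeddings; labels: (B,) source/event ids."""
    dist = torch.cdist(embeddings, embeddings)         # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) same-source mask
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos_mask = same & ~eye
    # Hardest positive: farthest same-source sample.
    hardest_pos = (dist * pos_mask).max(dim=1).values
    # Hardest negative: closest different-source sample.
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

emb = torch.randn(8, 16)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = batch_hard_triplet_loss(emb, labels)
```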
On the Cross-lingual Transferability of Monolingual Representations
Title | On the Cross-lingual Transferability of Monolingual Representations |
Authors | Mikel Artetxe, Sebastian Ruder, Dani Yogatama |
Abstract | State-of-the-art unsupervised multilingual models (e.g., multilingual BERT) have been shown to generalize in a zero-shot cross-lingual setting. This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions. We evaluate this hypothesis by designing an alternative approach that transfers a monolingual model to new languages at the lexical level. More concretely, we first train a transformer-based masked language model on one language, and transfer it to a new language by learning a new embedding matrix with the same masked language modeling objective, freezing the parameters of all other layers. This approach does not rely on a shared vocabulary or joint training. However, we show that it is competitive with multilingual BERT on standard cross-lingual classification benchmarks and on a new Cross-lingual Question Answering Dataset (XQuAD). Our results contradict common beliefs about the basis of the generalization ability of multilingual models and suggest that deep monolingual models learn some abstractions that generalize across languages. We also release XQuAD as a more comprehensive cross-lingual benchmark, which comprises 240 paragraphs and 1190 question-answer pairs from SQuAD v1.1 translated into ten languages by professional translators. |
Tasks | Language Modelling, Question Answering |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11856v1 |
https://arxiv.org/pdf/1910.11856v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-cross-lingual-transferability-of |
Repo | https://github.com/deepmind/xquad |
Framework | none |
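The transfer recipe in the abstract, learning only a new embedding matrix while everything else stays frozen, reduces to a few lines in PyTorch. A generic sketch; the `embeddings` attribute name is an assumption about the model's layout, not a specific library API:

```python
import torch.nn as nn

def freeze_all_but_embeddings(model: nn.Module, new_vocab: int, dim: int):
    """Freeze every existing parameter, then attach a fresh trainable
    embedding matrix for the new language's vocabulary."""
    for p in model.parameters():
        p.requires_grad = False
    model.embeddings = nn.Embedding(new_vocab, dim)  # trainable by default
    return [p for p in model.parameters() if p.requires_grad]

# Hypothetical usage with a pretrained masked LM:
# trainable = freeze_all_but_embeddings(masked_lm, new_vocab=32000, dim=768)
# optimizer = torch.optim.Adam(trainable, lr=1e-4)  # updates embeddings only
```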
MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension
Title | MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension |
Authors | Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, Danqi Chen |
Abstract | We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems. In this task, we adapted and unified 18 distinct question answering datasets into the same format. Among them, six datasets were made available for training, six datasets were made available for development, and the final six were hidden for final evaluation. Ten teams submitted systems, which explored various ideas including data sampling, multi-task learning, adversarial training and ensembling. The best system achieved an average F1 score of 72.5 on the 12 held-out datasets, 10.7 absolute points higher than our initial baseline based on BERT. |
Tasks | Multi-Task Learning, Question Answering, Reading Comprehension |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09753v2 |
https://arxiv.org/pdf/1910.09753v2.pdf | |
PWC | https://paperswithcode.com/paper/mrqa-2019-shared-task-evaluating |
Repo | https://github.com/mrqa/MRQA-Shared-Task-2019 |
Framework | none |
FireNet: A Specialized Lightweight Fire & Smoke Detection Model for Real-Time IoT Applications
Title | FireNet: A Specialized Lightweight Fire & Smoke Detection Model for Real-Time IoT Applications |
Authors | Arpit Jadon, Mohd. Omama, Akshay Varshney, Mohammad Samar Ansari, Rishabh Sharma |
Abstract | Fire disasters typically result in a great loss of life and property. It is therefore imperative that precise, fast, and possibly portable solutions to detect fire be made readily available to the masses at reasonable prices. There have been several research attempts to design effective and appropriately priced fire detection systems, with varying degrees of success. However, most of them demonstrate a trade-off between performance and model size (which decides the model’s ability to be installed on portable devices). The work presented in this paper is an attempt to deal with both the performance and model size issues in one design. Toward that end, a ‘designed-from-scratch’ neural network, named FireNet, is proposed which is worthy on both counts: (i) it has better performance than existing counterparts, and (ii) it is lightweight enough to be deployable on embedded platforms like Raspberry Pi. Performance evaluations on a standard dataset, as well as our own newly introduced custom-compiled fire dataset, are extremely encouraging. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11922v2 |
https://arxiv.org/pdf/1905.11922v2.pdf | |
PWC | https://paperswithcode.com/paper/firenet-a-specialized-lightweight-fire-smoke |
Repo | https://github.com/arpit-jadon/FireNet-LightWeight-Network-for-Fire-Detection |
Framework | tf |
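The performance-versus-size trade-off can be made concrete with a hypothetical lightweight binary fire classifier in Keras. This is not the published FireNet architecture; input size and layer widths are illustrative assumptions showing the scale of model that fits a Raspberry Pi:

```python
import tensorflow as tf

# Hypothetical small CNN over 64x64 RGB frames.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # fire / no-fire
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
print(model.count_params())  # tens of thousands of parameters, embedded-friendly
```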
Sherlock: A Deep Learning Approach to Semantic Data Type Detection
Title | Sherlock: A Deep Learning Approach to Semantic Data Type Detection |
Authors | Madelon Hulsebos, Kevin Hu, Michiel Bakker, Emanuel Zgraggen, Arvind Satyanarayan, Tim Kraska, Çağatay Demiralp, César Hidalgo |
Abstract | Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery. Existing data preparation and analysis systems rely on dictionary lookups and regular expression matching to detect semantic types. However, these matching-based approaches often are not robust to dirty data and only detect a limited number of types. We introduce Sherlock, a multi-input deep neural network for detecting semantic types. We train Sherlock on $686,765$ data columns retrieved from the VizNet corpus by matching $78$ semantic types from DBpedia to column headers. We characterize each matched column with $1,588$ features describing the statistical properties, character distributions, word embeddings, and paragraph vectors of column values. Sherlock achieves a support-weighted F$_1$ score of $0.89$, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations. |
Tasks | Word Embeddings |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10688v1 |
https://arxiv.org/pdf/1905.10688v1.pdf | |
PWC | https://paperswithcode.com/paper/sherlock-a-deep-learning-approach-to-semantic |
Repo | https://github.com/tsegall/fta |
Framework | none |
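A toy version of the per-column featurization the abstract describes, combining simple statistical and character-distribution features; the real system computes 1,588 features including word and paragraph embeddings, so these few are illustrative only:

```python
import statistics
import string

def column_features(values):
    """Compute a few hand-crafted features of a column's string values."""
    lengths = [len(v) for v in values]
    text = "".join(values)
    feats = {
        "n_values": len(values),
        "mean_len": statistics.mean(lengths),
        "std_len": statistics.pstdev(lengths),
        "frac_digits": sum(c.isdigit() for c in text) / max(len(text), 1),
        "frac_alpha": sum(c.isalpha() for c in text) / max(len(text), 1),
    }
    # One count per ASCII letter: a crude character distribution.
    for ch in string.ascii_lowercase:
        feats[f"count_{ch}"] = text.lower().count(ch)
    return feats

print(column_features(["1975-05-02", "1988-11-23", "2001-01-09"])["frac_digits"])
# Date-like columns score high on digit fraction, one cue toward their type.
```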
GANchors: Realistic Image Perturbation Distributions for Anchors Using Generative Models
Title | GANchors: Realistic Image Perturbation Distributions for Anchors Using Generative Models |
Authors | Kurtis Evan David, Harrison Keane, Jun Min Noh |
Abstract | We extend and improve the work of Model Agnostic Anchors for explanations on image classification through the use of generative adversarial networks (GANs). Using GANs, we generate samples from a more realistic perturbation distribution, by optimizing under a lower dimensional latent space. This increases the trust in an explanation, as results now come from images that are more likely to be found in the original training set of a classifier, rather than an overlay of random images. A large drawback to our method is the computational complexity of sampling through optimization; to address this, we implement more efficient algorithms, including a diverse encoder. Lastly, we share results from the MNIST and CelebA datasets, and note that our explanations can lead to smaller and higher precision anchors. |
Tasks | Image Classification |
Published | 2019-06-01 |
URL | https://arxiv.org/abs/1906.00297v1 |
https://arxiv.org/pdf/1906.00297v1.pdf | |
PWC | https://paperswithcode.com/paper/190600297 |
Repo | https://github.com/kurtisdavid/ImageAnchors |
Framework | pytorch |
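The core idea, perturbing in a GAN's latent space so that perturbed samples stay near the data manifold instead of being random pixel overlays, looks roughly like this; `generator`, its input convention, and the noise scale are hypothetical stand-ins:

```python
import torch

def latent_perturbations(generator, z, n_samples=32, sigma=0.1):
    """Sample realistic perturbations of the image G(z) by jittering z,
    rather than overlaying random pixels on the original image."""
    noise = sigma * torch.randn(n_samples, *z.shape)
    return generator(z.unsqueeze(0) + noise)  # (n_samples, C, H, W), near-manifold

# Hypothetical usage: feed these samples to the Anchors explainer in
# place of its default pixel-level perturbation distribution.
# anchors_input = latent_perturbations(trained_gan_decoder, z_encoded)
```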
Global Context for Convolutional Pose Machines
Title | Global Context for Convolutional Pose Machines |
Authors | Daniil Osokin |
Abstract | Convolutional Pose Machine is a popular neural network architecture for articulated pose estimation. In this work we explore its empirical receptive field and find that it can be enhanced by integrating global context. To do so, a U-shaped context module is proposed and compared with the pyramid pooling and atrous spatial pyramid pooling modules, which are often used in the semantic segmentation domain. The proposed neural network achieves state-of-the-art accuracy with 87.9% PCKh for single-person pose estimation on the Look Into Person dataset. A smaller version of this network runs at more than 160 frames per second while being just 2.9% less accurate. Generalization of the proposed approach is tested on the MPII benchmark, showing that it is faster than hourglass-based networks while providing similar accuracy. The code is available at https://github.com/opencv/openvino_training_extensions/tree/develop/pytorch_toolkit/human_pose_estimation . |
Tasks | Pose Estimation, Semantic Segmentation |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04104v1 |
https://arxiv.org/pdf/1906.04104v1.pdf | |
PWC | https://paperswithcode.com/paper/global-context-for-convolutional-pose |
Repo | https://github.com/Daniil-Osokin/gccpm-look-into-person-cvpr19.pytorch |
Framework | pytorch |
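The pyramid pooling module the abstract compares against (from PSPNet) is a standard way to inject global context into a feature map; a minimal PyTorch sketch with illustrative bin sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Pool features at several grid sizes, project, upsample and concat."""
    def __init__(self, channels, bins=(1, 2, 3, 6)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels // len(bins), 1) for _ in bins)
        self.bins = bins

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [x]
        for bin_size, conv in zip(self.bins, self.convs):
            pooled = F.adaptive_avg_pool2d(x, bin_size)  # coarse global context
            outs.append(F.interpolate(conv(pooled), size=(h, w),
                                      mode="bilinear", align_corners=False))
        return torch.cat(outs, dim=1)  # (B, 2*channels, H, W)

feat = torch.randn(1, 64, 32, 32)
ctx = PyramidPooling(64)(feat)  # torch.Size([1, 128, 32, 32])
```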
Down to the Last Detail: Virtual Try-on with Detail Carving
Title | Down to the Last Detail: Virtual Try-on with Detail Carving |
Authors | Jiahang Wang, Wei Zhang, Weizhong Liu, Tao Mei |
Abstract | Virtual try-on under arbitrary poses has attracted much research attention due to its huge application potential. However, existing methods can hardly preserve the details of clothing texture and facial identity (face, hair) while fitting novel clothes and poses onto a person. In this paper, we propose a novel multi-stage framework to synthesize person images, in which rich details in salient regions can be well preserved. Specifically, the framework decomposes the generation into spatial alignment followed by coarse-to-fine synthesis. To better preserve the details in salient areas such as clothing and facial regions, we propose a Tree-Block (tree dilated fusion block) to harness multi-scale features in the generator networks. With end-to-end training of multiple stages, the whole framework can be jointly optimized for results with significantly better visual fidelity and richer details. Extensive experiments on standard datasets demonstrate that our proposed framework achieves state-of-the-art performance, especially in preserving the visual details in clothing texture and facial identity. Our implementation will be publicly available soon. |
Tasks | |
Published | 2019-12-13 |
URL | https://arxiv.org/abs/1912.06324v2 |
https://arxiv.org/pdf/1912.06324v2.pdf | |
PWC | https://paperswithcode.com/paper/down-to-the-last-detail-virtual-try-on-with |
Repo | https://github.com/AIprogrammer/Down-to-the-Last-Detail-Virtual-Try-on-with-Detail-Carving |
Framework | pytorch |
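Fusing parallel dilated convolutions, the general mechanism behind a "tree dilated fusion block", can be sketched as follows; the branch layout and residual summation are an illustrative guess, not the paper's exact Tree-Block:

```python
import torch
import torch.nn as nn

class DilatedFusionSketch(nn.Module):
    """Parallel dilated conv branches fused by summation, so each output
    pixel mixes several receptive-field scales."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations)

    def forward(self, x):
        return x + sum(branch(x) for branch in self.branches)  # residual fusion

x = torch.randn(1, 32, 64, 64)
y = DilatedFusionSketch(32)(x)  # same shape, multi-scale context mixed in
```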
Bridging the Domain Gap for Ground-to-Aerial Image Matching
Title | Bridging the Domain Gap for Ground-to-Aerial Image Matching |
Authors | Krishna Regmi, Mubarak Shah |
Abstract | The visual entities in cross-view images exhibit drastic domain changes due to the difference in viewpoints each set of images is captured from. Existing state-of-the-art methods address the problem by learning view-invariant descriptors for the images. We propose a novel method for solving this task by exploiting the generative powers of conditional GANs to synthesize an aerial representation of a ground level panorama and use it to minimize the domain gap between the two views. The synthesized image being from the same view as the target image helps the network to preserve important cues in aerial images following our Joint Feature Learning approach. Our Feature Fusion method combines the complementary features from a synthesized aerial image with the corresponding ground features to obtain a robust query representation. In addition, multi-scale feature aggregation preserves image representations at different feature scales useful for solving this complex task. Experimental results show that our proposed approach performs significantly better than the state-of-the-art methods on the challenging CVUSA dataset in terms of top-1 and top-1% retrieval accuracies. Furthermore, to evaluate the generalization of our method on urban landscapes, we collected a new cross-view localization dataset with geo-reference information. |
Tasks | |
Published | 2019-04-24 |
URL | https://arxiv.org/abs/1904.11045v2 |
https://arxiv.org/pdf/1904.11045v2.pdf | |
PWC | https://paperswithcode.com/paper/bridging-the-domain-gap-for-ground-to-aerial |
Repo | https://github.com/kregmi/cross-view-image-matching |
Framework | tf |
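The retrieval step implied by the abstract, fusing ground features with features of a GAN-synthesized aerial view and ranking the aerial database by similarity, can be sketched as below; fusion by simple concatenation and cosine ranking are assumptions standing in for the paper's learned Feature Fusion:

```python
import torch
import torch.nn.functional as F

def fused_retrieval(ground_feat, synth_aerial_feat, aerial_db):
    """Concatenate ground and synthesized-aerial features into one query,
    then rank the aerial database by cosine similarity."""
    query = F.normalize(torch.cat([ground_feat, synth_aerial_feat]), dim=0)
    db = F.normalize(aerial_db, dim=1)       # (N, 2D), pre-fused database
    scores = db @ query                      # cosine similarity per candidate
    return scores.argsort(descending=True)   # best match first

ranks = fused_retrieval(torch.randn(256), torch.randn(256),
                        torch.randn(1000, 512))
```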