Paper Group AWR 164
TANet: Robust 3D Object Detection from Point Clouds with Triple Attention. RetinaFace: Single-stage Dense Face Localisation in the Wild. GO Gradient for Expectation-Based Objectives. No Padding Please: Efficient Neural Handwriting Recognition. Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification. ORRB – OpenAI Remote Rendering Backend. Beyond Correlation: A Path-Invariant Measure for Seismogram Similarity. On the Cross-lingual Transferability of Monolingual Representations. MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension. FireNet: A Specialized Lightweight Fire & Smoke Detection Model for Real-Time IoT Applications. Sherlock: A Deep Learning Approach to Semantic Data Type Detection. GANchors: Realistic Image Perturbation Distributions for Anchors Using Generative Models. Global Context for Convolutional Pose Machines. Down to the Last Detail: Virtual Try-on with Detail Carving. Bridging the Domain Gap for Ground-to-Aerial Image Matching.
TANet: Robust 3D Object Detection from Point Clouds with Triple Attention
Title | TANet: Robust 3D Object Detection from Point Clouds with Triple Attention |
Authors | Zhe Liu, Xin Zhao, Tengteng Huang, Ruolan Hu, Yu Zhou, Xiang Bai |
Abstract | In this paper, we focus on exploring the robustness of 3D object detection in point clouds, which has been rarely discussed in existing approaches. We observe two crucial phenomena: 1) the detection accuracy of hard objects, e.g., pedestrians, is unsatisfactory; 2) when additional noise points are added, the performance of existing approaches decreases rapidly. To alleviate these problems, a novel TANet is introduced in this paper, which mainly contains a Triple Attention (TA) module and a Coarse-to-Fine Regression (CFR) module. By considering the channel-wise, point-wise and voxel-wise attention jointly, the TA module enhances the crucial information of the target while suppressing unstable cloud points. Besides, the novel stacked TA further exploits multi-level feature attention. In addition, the CFR module boosts the accuracy of localization without excessive computation cost. Experimental results on the validation set of the KITTI dataset demonstrate that, in the challenging noisy cases, i.e., adding additional random noisy points around each object, the presented approach goes far beyond state-of-the-art approaches. Furthermore, for the 3D object detection task of the KITTI benchmark, our approach ranks first on the Pedestrian class, using point clouds as the only input. The running speed is around 29 frames per second. |
Tasks | 3D Object Detection, Object Detection |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05163v1 |
https://arxiv.org/pdf/1912.05163v1.pdf | |
PWC | https://paperswithcode.com/paper/tanet-robust-3d-object-detection-from-point |
Repo | https://github.com/happinesslz/TANet |
Framework | pytorch |
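The interplay of the three attention granularities can be illustrated with a short PyTorch sketch. This is a loose illustration of joint channel-, point- and voxel-wise re-weighting over voxelized point features, not the paper's exact TA architecture; the tensor shapes and single-layer attention heads are assumptions:

```python
import torch
import torch.nn as nn

class TripleAttentionSketch(nn.Module):
    """Jointly re-weight voxelized point features of shape (V, P, C):
    V voxels, P points per voxel, C channels."""
    def __init__(self, channels: int):
        super().__init__()
        self.channel_fc = nn.Linear(channels, channels)  # channel-wise weights
        self.point_fc = nn.Linear(channels, 1)           # one weight per point
        self.voxel_fc = nn.Linear(channels, 1)           # one weight per voxel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ch_att = torch.sigmoid(self.channel_fc(x.mean(dim=1)))      # (V, C)
        pt_att = torch.sigmoid(self.point_fc(x))                    # (V, P, 1)
        x = x * ch_att.unsqueeze(1) * pt_att                        # joint re-weighting
        vx_att = torch.sigmoid(self.voxel_fc(x.max(dim=1).values))  # (V, 1)
        return x * vx_att.unsqueeze(1)

x = torch.randn(32, 100, 64)        # 32 voxels, 100 points each, 64 channels
out = TripleAttentionSketch(64)(x)  # same shape, attention-reweighted
```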
RetinaFace: Single-stage Dense Face Localisation in the Wild
Title | RetinaFace: Single-stage Dense Face Localisation in the Wild |
Authors | Jiankang Deng, Jia Guo, Yuxiang Zhou, Jinke Yu, Irene Kotsia, Stefanos Zafeiriou |
Abstract | Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on various scales of faces by taking advantage of joint extra-supervised and self-supervised multi-task learning. Specifically, we make contributions in the following five aspects: (1) We manually annotate five facial landmarks on the WIDER FACE dataset and observe significant improvement in hard face detection with the assistance of this extra supervision signal. (2) We further add a self-supervised mesh decoder branch for predicting pixel-wise 3D shape face information in parallel with the existing supervised branches. (3) On the WIDER FACE hard test set, RetinaFace outperforms the state-of-the-art average precision (AP) by 1.1% (achieving AP equal to 91.4%). (4) On the IJB-C test set, RetinaFace enables state-of-the-art methods (ArcFace) to improve their results in face verification (TAR=89.59% for FAR=1e-6). (5) By employing light-weight backbone networks, RetinaFace can run in real time on a single CPU core for a VGA-resolution image. |
Tasks | Face Detection, Face Verification, Multi-Task Learning |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.00641v2 |
https://arxiv.org/pdf/1905.00641v2.pdf | |
PWC | https://paperswithcode.com/paper/190500641 |
Repo | https://github.com/peteryuX/retinaface-tf2 |
Framework | tf |
GO Gradient for Expectation-Based Objectives
Title | GO Gradient for Expectation-Based Objectives |
Authors | Yulai Cong, Miaoyun Zhao, Ke Bai, Lawrence Carin |
Abstract | Within many machine learning algorithms, a fundamental problem concerns efficient calculation of an unbiased gradient wrt parameters $\boldsymbol{\gamma}$ for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]$. Most existing methods either (i) suffer from high variance, seeking help from (often) complicated variance-reduction techniques; or (ii) apply only to reparameterizable continuous random variables and employ a reparameterization trick. To address these limitations, we propose a General and One-sample (GO) gradient that (i) applies to many distributions associated with non-reparameterizable continuous or discrete random variables, and (ii) has the same low variance as the reparameterization trick. We find that the GO gradient often works well in practice based on only one Monte Carlo sample (although one can of course use more samples if desired). Alongside the GO gradient, we develop a means of propagating the chain rule through distributions, yielding statistical back-propagation, coupling neural networks to common random variables. |
Tasks | |
Published | 2019-01-17 |
URL | http://arxiv.org/abs/1901.06020v1 |
http://arxiv.org/pdf/1901.06020v1.pdf | |
PWC | https://paperswithcode.com/paper/go-gradient-for-expectation-based-objectives |
Repo | https://github.com/YulaiCong/GOgradient |
Framework | tf |
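For context, the two standard gradient estimators the abstract contrasts can be written out explicitly; this is textbook background, not the GO estimator itself:

```latex
% Score-function (REINFORCE) estimator: unbiased, but often high-variance.
\nabla_{\boldsymbol{\gamma}}\,\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]
  = \mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}\!\left[
      f(\boldsymbol{y})\,\nabla_{\boldsymbol{\gamma}}\log q_{\boldsymbol{\gamma}}(\boldsymbol{y})
    \right]

% Reparameterization estimator: low-variance, but requires
% \boldsymbol{y} = g_{\boldsymbol{\gamma}}(\boldsymbol{\epsilon}) with
% \boldsymbol{\epsilon}\sim p(\boldsymbol{\epsilon}) independent of \boldsymbol{\gamma}.
\nabla_{\boldsymbol{\gamma}}\,\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]
  = \mathbb{E}_{p(\boldsymbol{\epsilon})}\!\left[
      \nabla_{\boldsymbol{\gamma}} f\bigl(g_{\boldsymbol{\gamma}}(\boldsymbol{\epsilon})\bigr)
    \right]
```

The GO gradient aims to keep the second estimator's low variance while covering distributions, including discrete ones, for which no such $g_{\boldsymbol{\gamma}}$ exists.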
No Padding Please: Efficient Neural Handwriting Recognition
Title | No Padding Please: Efficient Neural Handwriting Recognition |
Authors | Gideon Maillette de Buy Wenniger, Lambert Schomaker, Andy Way |
Abstract | Neural handwriting recognition (NHR) is the recognition of handwritten text with deep learning models, such as multi-dimensional long short-term memory (MDLSTM) recurrent neural networks. Models with MDLSTM layers have achieved state-of-the-art results on handwritten text recognition tasks. While multi-directional MDLSTM-layers have an unbeaten ability to capture the complete context in all directions, this strength limits the possibilities for parallelization, and therefore comes at a high computational cost. In this work we develop methods to create efficient MDLSTM-based models for NHR, particularly a method aimed at eliminating computation waste that results from padding. This proposed method, called example-packing, replaces wasteful stacking of padded examples with efficient tiling in a 2-dimensional grid. For word-based NHR this yields a speed improvement of factor 6.6 over an already efficient baseline of minimal padding for each batch separately. For line-based NHR the savings are more modest, but still significant. In addition to example-packing, we propose: 1) a technique to optimize parallelization for dynamic graph definition frameworks including PyTorch, using convolutions with grouping, 2) a method for parallelization across GPUs for variable-length example batches. All our techniques are thoroughly tested on our own PyTorch re-implementation of MDLSTM-based NHR models. A thorough evaluation on the IAM dataset shows that our models perform similarly to earlier implementations of state-of-the-art models. Our efficient NHR model and some of the reusable techniques discussed with it offer ways to realize relatively efficient models for the omnipresent scenario of variable-length inputs in deep learning. |
Tasks | |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11208v1 |
http://arxiv.org/pdf/1902.11208v1.pdf | |
PWC | https://paperswithcode.com/paper/no-padding-please-efficient-neural |
Repo | https://github.com/gwenniger/multi-hare |
Framework | pytorch |
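A toy sketch of the padding waste that example-packing targets, assuming same-height word images packed greedily into rows of a fixed-width grid; the real method handles full 2D tiling and the MDLSTM specifics:

```python
import numpy as np

def pack_rows(images, grid_width):
    """Greedily tile same-height images left-to-right into grid rows,
    starting a new row when the next image no longer fits."""
    rows, current, used = [], [], 0
    for img in images:
        _, w = img.shape
        if used + w > grid_width and current:
            rows.append(current)
            current, used = [], 0
        current.append(img)
        used += w
    if current:
        rows.append(current)
    # Render each row; only the trailing gap of each row is padded.
    h = images[0].shape[0]
    grid = np.zeros((h * len(rows), grid_width))
    for r, row in enumerate(rows):
        x = 0
        for img in row:
            grid[r * h:(r + 1) * h, x:x + img.shape[1]] = img
            x += img.shape[1]
    return grid

imgs = [np.ones((32, w)) for w in (50, 120, 80, 200, 60)]
packed = pack_rows(imgs, grid_width=256)
# Naive batching would pad every image to width 200 (the batch maximum),
# wasting far more area than the per-row trailing gaps here.
```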
Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification
Title | Hierarchical Graph Convolutional Networks for Semi-supervised Node Classification |
Authors | Fenyu Hu, Yanqiao Zhu, Shu Wu, Liang Wang, Tieniu Tan |
Abstract | Graph convolutional networks (GCNs) have been successfully applied to node classification tasks in network mining. However, most of these neighborhood-aggregation models are shallow and lack a “graph pooling” mechanism, which prevents the model from obtaining adequate global information. In order to increase the receptive field, we propose a novel deep Hierarchical Graph Convolutional Network (H-GCN) for semi-supervised node classification. H-GCN first repeatedly aggregates structurally similar nodes into hyper-nodes and then refines the coarsened graph back to the original one to restore the representation of each node. Instead of merely aggregating one- or two-hop neighborhood information, the proposed coarsening procedure enlarges the receptive field of each node, so that more global information can be captured. The proposed H-GCN model shows strong empirical performance on various public benchmark graph datasets, outperforming state-of-the-art methods and achieving up to a 5.9% accuracy improvement. In addition, when only a few labeled samples are provided, our model gains substantial improvements. |
Tasks | Node Classification |
Published | 2019-02-13 |
URL | https://arxiv.org/abs/1902.06667v4 |
https://arxiv.org/pdf/1902.06667v4.pdf | |
PWC | https://paperswithcode.com/paper/semi-supervised-node-classification-via |
Repo | https://github.com/CRIPAC-DIG/H-GCN |
Framework | tf |
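One way to group structurally similar nodes into hyper-nodes, as the abstract describes, is to merge nodes that share an identical neighbor set. This particular rule is an assumption for illustration, not the paper's full coarsening procedure:

```python
from collections import defaultdict

def coarsen_by_structural_equivalence(adj):
    """Merge nodes with identical neighbor sets into hyper-nodes.
    adj: dict node -> set of neighbors. Returns node -> hyper-node id."""
    groups = defaultdict(list)
    for node, nbrs in adj.items():
        groups[frozenset(nbrs)].append(node)
    assignment = {}
    for hyper_id, members in enumerate(groups.values()):
        for node in members:
            assignment[node] = hyper_id
    return assignment

adj = {0: {2}, 1: {2}, 2: {0, 1, 3}, 3: {2}}
print(coarsen_by_structural_equivalence(adj))
# nodes 0, 1 and 3 share the neighbor set {2} -> merged into one hyper-node
```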
ORRB – OpenAI Remote Rendering Backend
Title | ORRB – OpenAI Remote Rendering Backend |
Authors | Maciek Chociej, Peter Welinder, Lilian Weng |
Abstract | We present the OpenAI Remote Rendering Backend (ORRB), a system that allows fast and customizable rendering of robotics environments. It is based on the Unity3d game engine and interfaces with the MuJoCo physics simulation library. ORRB was designed with visual domain randomization in mind. It is optimized for cloud deployment and high throughput operation. We are releasing it to the public under a liberal MIT license: https://github.com/openai/orrb . |
Tasks | |
Published | 2019-06-26 |
URL | https://arxiv.org/abs/1906.11633v1 |
https://arxiv.org/pdf/1906.11633v1.pdf | |
PWC | https://paperswithcode.com/paper/orrb-openai-remote-rendering-backend |
Repo | https://github.com/openai/orrb |
Framework | none |
Beyond Correlation: A Path-Invariant Measure for Seismogram Similarity
Title | Beyond Correlation: A Path-Invariant Measure for Seismogram Similarity |
Authors | Joshua Dickey, Brett Borghetti, William Junek, Richard Martin |
Abstract | Similarity search is a popular technique for seismic signal processing, with template matching, matched filters and subspace detectors being utilized for a wide variety of tasks, including both signal detection and source discrimination. Traditionally, these techniques rely on the cross-correlation function as the basis for measuring similarity. Unfortunately, seismogram correlation is dominated by path effects, essentially requiring a distinct waveform template along each path of interest. To address this limitation, we propose a novel measure of seismogram similarity that is explicitly invariant to path. Using EarthScope’s USArray experiment, a path-rich dataset of 207,291 regional seismograms across 8,452 unique events is constructed, and then employed via the batch-hard triplet loss function, to train a deep convolutional neural network that maps raw seismograms to a low-dimensional embedding space, where nearness in the space corresponds to nearness of source function, regardless of path or recording instrumentation. This path-agnostic embedding space forms a new representation for seismograms, characterized by robust, source-specific features, which we show to be useful for performing both pairwise event association as well as template-based source discrimination with a single template. |
Tasks | |
Published | 2019-04-16 |
URL | https://arxiv.org/abs/1904.07936v4 |
https://arxiv.org/pdf/1904.07936v4.pdf | |
PWC | https://paperswithcode.com/paper/beyond-correlation-a-path-invariant-measure |
Repo | https://github.com/joshuadickey/seis-sim |
Framework | tf |
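The batch-hard triplet loss used above (Hermans et al.'s formulation) picks, for each anchor, the hardest positive and hardest negative within the batch. A compact PyTorch sketch; the margin value is an illustrative assumption:

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """embeddings: (B, D) sample embeddings; labels: (B,) source/event ids."""
    dist = torch.cdist(embeddings, embeddings)         # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # (B, B) same-source mask
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos_mask = same & ~eye
    # Hardest positive: farthest same-source sample.
    hardest_pos = (dist * pos_mask).max(dim=1).values
    # Hardest negative: closest different-source sample.
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    return torch.relu(hardest_pos - hardest_neg + margin).mean()

emb = torch.randn(8, 16)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = batch_hard_triplet_loss(emb, labels)
```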
On the Cross-lingual Transferability of Monolingual Representations
Title | On the Cross-lingual Transferability of Monolingual Representations |
Authors | Mikel Artetxe, Sebastian Ruder, Dani Yogatama |
Abstract | State-of-the-art unsupervised multilingual models (e.g., multilingual BERT) have been shown to generalize in a zero-shot cross-lingual setting. This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions. We evaluate this hypothesis by designing an alternative approach that transfers a monolingual model to new languages at the lexical level. More concretely, we first train a transformer-based masked language model on one language, and transfer it to a new language by learning a new embedding matrix with the same masked language modeling objective, freezing the parameters of all other layers. This approach does not rely on a shared vocabulary or joint training. However, we show that it is competitive with multilingual BERT on standard cross-lingual classification benchmarks and on a new Cross-lingual Question Answering Dataset (XQuAD). Our results contradict common beliefs about the basis of the generalization ability of multilingual models and suggest that deep monolingual models learn some abstractions that generalize across languages. We also release XQuAD as a more comprehensive cross-lingual benchmark, which comprises 240 paragraphs and 1190 question-answer pairs from SQuAD v1.1 translated into ten languages by professional translators. |
Tasks | Language Modelling, Question Answering |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11856v1 |
https://arxiv.org/pdf/1910.11856v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-cross-lingual-transferability-of |
Repo | https://github.com/deepmind/xquad |
Framework | none |
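The transfer recipe in the abstract, learning only a new embedding matrix while everything else stays frozen, reduces to a few lines in PyTorch. A generic sketch; the `embeddings` attribute name is an assumption about the model's layout, not a specific library API:

```python
import torch.nn as nn

def freeze_all_but_embeddings(model: nn.Module, new_vocab: int, dim: int):
    """Freeze every existing parameter, then attach a fresh trainable
    embedding matrix for the new language's vocabulary."""
    for p in model.parameters():
        p.requires_grad = False
    model.embeddings = nn.Embedding(new_vocab, dim)  # trainable by default
    return [p for p in model.parameters() if p.requires_grad]

# Hypothetical usage with a pretrained masked LM:
# trainable = freeze_all_but_embeddings(masked_lm, new_vocab=32000, dim=768)
# optimizer = torch.optim.Adam(trainable, lr=1e-4)  # updates embeddings only
```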
MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension
Title | MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension |
Authors | Adam Fisch, Alon Talmor, Robin Jia, Minjoon Seo, Eunsol Choi, Danqi Chen |
Abstract | We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems. In this task, we adapted and unified 18 distinct question answering datasets into the same format. Among them, six datasets were made available for training, six datasets were made available for development, and the final six were hidden for final evaluation. Ten teams submitted systems, which explored various ideas including data sampling, multi-task learning, adversarial training and ensembling. The best system achieved an average F1 score of 72.5 on the 12 held-out datasets, 10.7 absolute points higher than our initial baseline based on BERT. |
Tasks | Multi-Task Learning, Question Answering, Reading Comprehension |
Published | 2019-10-22 |
URL | https://arxiv.org/abs/1910.09753v2 |
https://arxiv.org/pdf/1910.09753v2.pdf | |
PWC | https://paperswithcode.com/paper/mrqa-2019-shared-task-evaluating |
Repo | https://github.com/mrqa/MRQA-Shared-Task-2019 |
Framework | none |
FireNet: A Specialized Lightweight Fire & Smoke Detection Model for Real-Time IoT Applications
Title | FireNet: A Specialized Lightweight Fire & Smoke Detection Model for Real-Time IoT Applications |
Authors | Arpit Jadon, Mohd. Omama, Akshay Varshney, Mohammad Samar Ansari, Rishabh Sharma |
Abstract | Fire disasters typically result in a great loss of life and property. It is therefore imperative that precise, fast, and possibly portable solutions to detect fire be made readily available to the masses at reasonable prices. There have been several research attempts to design effective and appropriately priced fire detection systems, with varying degrees of success. However, most of them demonstrate a trade-off between performance and model size (which decides the model’s ability to be installed on portable devices). The work presented in this paper is an attempt to deal with both the performance and model size issues in one design. Toward that end, a ‘designed-from-scratch’ neural network, named FireNet, is proposed which is worthy on both counts: (i) it has better performance than existing counterparts, and (ii) it is lightweight enough to be deployable on embedded platforms like Raspberry Pi. Performance evaluations on a standard dataset, as well as our own newly introduced custom-compiled fire dataset, are extremely encouraging. |
Tasks | |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11922v2 |
https://arxiv.org/pdf/1905.11922v2.pdf | |
PWC | https://paperswithcode.com/paper/firenet-a-specialized-lightweight-fire-smoke |
Repo | https://github.com/arpit-jadon/FireNet-LightWeight-Network-for-Fire-Detection |
Framework | tf |
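The performance-versus-size trade-off can be made concrete with a hypothetical lightweight binary fire classifier in Keras. This is not the published FireNet architecture; input size and layer widths are illustrative assumptions showing the scale of model that fits a Raspberry Pi:

```python
import tensorflow as tf

# Hypothetical small CNN over 64x64 RGB frames.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # fire / no-fire
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
print(model.count_params())  # tens of thousands of parameters, embedded-friendly
```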
Sherlock: A Deep Learning Approach to Semantic Data Type Detection
Title | Sherlock: A Deep Learning Approach to Semantic Data Type Detection |
Authors | Madelon Hulsebos, Kevin Hu, Michiel Bakker, Emanuel Zgraggen, Arvind Satyanarayan, Tim Kraska, Çağatay Demiralp, César Hidalgo |
Abstract | Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery. Existing data preparation and analysis systems rely on dictionary lookups and regular expression matching to detect semantic types. However, these matching-based approaches often are not robust to dirty data and only detect a limited number of types. We introduce Sherlock, a multi-input deep neural network for detecting semantic types. We train Sherlock on $686,765$ data columns retrieved from the VizNet corpus by matching $78$ semantic types from DBpedia to column headers. We characterize each matched column with $1,588$ features describing the statistical properties, character distributions, word embeddings, and paragraph vectors of column values. Sherlock achieves a support-weighted F$_1$ score of $0.89$, exceeding that of machine learning baselines, dictionary and regular expression benchmarks, and the consensus of crowdsourced annotations. |
Tasks | Word Embeddings |
Published | 2019-05-25 |
URL | https://arxiv.org/abs/1905.10688v1 |
https://arxiv.org/pdf/1905.10688v1.pdf | |
PWC | https://paperswithcode.com/paper/sherlock-a-deep-learning-approach-to-semantic |
Repo | https://github.com/tsegall/fta |
Framework | none |
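A toy version of the per-column featurization the abstract describes, combining simple statistical and character-distribution features; the real system computes 1,588 features including word and paragraph embeddings, so these few are illustrative only:

```python
import statistics
import string

def column_features(values):
    """Compute a few hand-crafted features of a column's string values."""
    lengths = [len(v) for v in values]
    text = "".join(values)
    feats = {
        "n_values": len(values),
        "mean_len": statistics.mean(lengths),
        "std_len": statistics.pstdev(lengths),
        "frac_digits": sum(c.isdigit() for c in text) / max(len(text), 1),
        "frac_alpha": sum(c.isalpha() for c in text) / max(len(text), 1),
    }
    # One count per ASCII letter: a crude character distribution.
    for ch in string.ascii_lowercase:
        feats[f"count_{ch}"] = text.lower().count(ch)
    return feats

print(column_features(["1975-05-02", "1988-11-23", "2001-01-09"])["frac_digits"])
# Date-like columns score high on digit fraction, one cue toward their type.
```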
GANchors: Realistic Image Perturbation Distributions for Anchors Using Generative Models
Title | GANchors: Realistic Image Perturbation Distributions for Anchors Using Generative Models |
Authors | Kurtis Evan David, Harrison Keane, Jun Min Noh |
Abstract | We extend and improve the work of Model Agnostic Anchors for explanations on image classification through the use of generative adversarial networks (GANs). Using GANs, we generate samples from a more realistic perturbation distribution, by optimizing under a lower dimensional latent space. This increases the trust in an explanation, as results now come from images that are more likely to be found in the original training set of a classifier, rather than an overlay of random images. A large drawback to our method is the computational complexity of sampling through optimization; to address this, we implement more efficient algorithms, including a diverse encoder. Lastly, we share results from the MNIST and CelebA datasets, and note that our explanations can lead to smaller and higher precision anchors. |
Tasks | Image Classification |
Published | 2019-06-01 |
URL | https://arxiv.org/abs/1906.00297v1 |
https://arxiv.org/pdf/1906.00297v1.pdf | |
PWC | https://paperswithcode.com/paper/190600297 |
Repo | https://github.com/kurtisdavid/ImageAnchors |
Framework | pytorch |
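The core idea, perturbing in a GAN's latent space so that perturbed samples stay near the data manifold instead of being random pixel overlays, looks roughly like this; `generator`, its input convention, and the noise scale are hypothetical stand-ins:

```python
import torch

def latent_perturbations(generator, z, n_samples=32, sigma=0.1):
    """Sample realistic perturbations of the image G(z) by jittering z,
    rather than overlaying random pixels on the original image."""
    noise = sigma * torch.randn(n_samples, *z.shape)
    return generator(z.unsqueeze(0) + noise)  # (n_samples, C, H, W), near-manifold

# Hypothetical usage: feed these samples to the Anchors explainer in
# place of its default pixel-level perturbation distribution.
# anchors_input = latent_perturbations(trained_gan_decoder, z_encoded)
```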
Global Context for Convolutional Pose Machines
Title | Global Context for Convolutional Pose Machines |
Authors | Daniil Osokin |
Abstract | Convolutional Pose Machine is a popular neural network architecture for articulated pose estimation. In this work we explore its empirical receptive field and find that it can be enhanced by integrating global context. To do so, a U-shaped context module is proposed and compared with the pyramid pooling and atrous spatial pyramid pooling modules, which are often used in the semantic segmentation domain. The proposed neural network achieves state-of-the-art accuracy with 87.9% PCKh for single-person pose estimation on the Look Into Person dataset. A smaller version of this network runs at more than 160 frames per second while being just 2.9% less accurate. Generalization of the proposed approach is tested on the MPII benchmark, showing that it is faster than hourglass-based networks while providing similar accuracy. The code is available at https://github.com/opencv/openvino_training_extensions/tree/develop/pytorch_toolkit/human_pose_estimation . |
Tasks | Pose Estimation, Semantic Segmentation |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04104v1 |
https://arxiv.org/pdf/1906.04104v1.pdf | |
PWC | https://paperswithcode.com/paper/global-context-for-convolutional-pose |
Repo | https://github.com/Daniil-Osokin/gccpm-look-into-person-cvpr19.pytorch |
Framework | pytorch |
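The pyramid pooling module the abstract compares against (from PSPNet) is a standard way to inject global context into a feature map; a minimal PyTorch sketch with illustrative bin sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Pool features at several grid sizes, project, upsample and concat."""
    def __init__(self, channels, bins=(1, 2, 3, 6)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels // len(bins), 1) for _ in bins)
        self.bins = bins

    def forward(self, x):
        h, w = x.shape[2:]
        outs = [x]
        for bin_size, conv in zip(self.bins, self.convs):
            pooled = F.adaptive_avg_pool2d(x, bin_size)  # coarse global context
            outs.append(F.interpolate(conv(pooled), size=(h, w),
                                      mode="bilinear", align_corners=False))
        return torch.cat(outs, dim=1)  # (B, 2*channels, H, W)

feat = torch.randn(1, 64, 32, 32)
ctx = PyramidPooling(64)(feat)  # torch.Size([1, 128, 32, 32])
```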
Down to the Last Detail: Virtual Try-on with Detail Carving
Title | Down to the Last Detail: Virtual Try-on with Detail Carving |
Authors | Jiahang Wang, Wei Zhang, Weizhong Liu, Tao Mei |
Abstract | Virtual try-on under arbitrary poses has attracted much research attention due to its huge application potential. However, existing methods can hardly preserve the details of clothing texture and facial identity (face, hair) while fitting novel clothes and poses onto a person. In this paper, we propose a novel multi-stage framework to synthesize person images, in which rich details in salient regions can be well preserved. Specifically, the framework decomposes the generation into spatial alignment followed by coarse-to-fine synthesis. To better preserve the details in salient areas such as clothing and facial regions, we propose a Tree-Block (tree dilated fusion block) to harness multi-scale features in the generator networks. With end-to-end training of multiple stages, the whole framework can be jointly optimized for results with significantly better visual fidelity and richer details. Extensive experiments on standard datasets demonstrate that our proposed framework achieves state-of-the-art performance, especially in preserving the visual details in clothing texture and facial identity. Our implementation will be publicly available soon. |
Tasks | |
Published | 2019-12-13 |
URL | https://arxiv.org/abs/1912.06324v2 |
https://arxiv.org/pdf/1912.06324v2.pdf | |
PWC | https://paperswithcode.com/paper/down-to-the-last-detail-virtual-try-on-with |
Repo | https://github.com/AIprogrammer/Down-to-the-Last-Detail-Virtual-Try-on-with-Detail-Carving |
Framework | pytorch |
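Fusing parallel dilated convolutions, the general mechanism behind a "tree dilated fusion block", can be sketched as follows; the branch layout and residual summation are an illustrative guess, not the paper's exact Tree-Block:

```python
import torch
import torch.nn as nn

class DilatedFusionSketch(nn.Module):
    """Parallel dilated conv branches fused by summation, so each output
    pixel mixes several receptive-field scales."""
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations)

    def forward(self, x):
        return x + sum(branch(x) for branch in self.branches)  # residual fusion

x = torch.randn(1, 32, 64, 64)
y = DilatedFusionSketch(32)(x)  # same shape, multi-scale context mixed in
```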
Bridging the Domain Gap for Ground-to-Aerial Image Matching
Title | Bridging the Domain Gap for Ground-to-Aerial Image Matching |
Authors | Krishna Regmi, Mubarak Shah |
Abstract | The visual entities in cross-view images exhibit drastic domain changes due to the difference in viewpoints each set of images is captured from. Existing state-of-the-art methods address the problem by learning view-invariant descriptors for the images. We propose a novel method for solving this task by exploiting the generative powers of conditional GANs to synthesize an aerial representation of a ground level panorama and use it to minimize the domain gap between the two views. The synthesized image being from the same view as the target image helps the network to preserve important cues in aerial images following our Joint Feature Learning approach. Our Feature Fusion method combines the complementary features from a synthesized aerial image with the corresponding ground features to obtain a robust query representation. In addition, multi-scale feature aggregation preserves image representations at different feature scales useful for solving this complex task. Experimental results show that our proposed approach performs significantly better than the state-of-the-art methods on the challenging CVUSA dataset in terms of top-1 and top-1% retrieval accuracies. Furthermore, to evaluate the generalization of our method on urban landscapes, we collected a new cross-view localization dataset with geo-reference information. |
Tasks | |
Published | 2019-04-24 |
URL | https://arxiv.org/abs/1904.11045v2 |
https://arxiv.org/pdf/1904.11045v2.pdf | |
PWC | https://paperswithcode.com/paper/bridging-the-domain-gap-for-ground-to-aerial |
Repo | https://github.com/kregmi/cross-view-image-matching |
Framework | tf |
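The retrieval step implied by the abstract, fusing ground features with features of a GAN-synthesized aerial view and ranking the aerial database by similarity, can be sketched as below; fusion by simple concatenation and cosine ranking are assumptions standing in for the paper's learned Feature Fusion:

```python
import torch
import torch.nn.functional as F

def fused_retrieval(ground_feat, synth_aerial_feat, aerial_db):
    """Concatenate ground and synthesized-aerial features into one query,
    then rank the aerial database by cosine similarity."""
    query = F.normalize(torch.cat([ground_feat, synth_aerial_feat]), dim=0)
    db = F.normalize(aerial_db, dim=1)       # (N, 2D), pre-fused database
    scores = db @ query                      # cosine similarity per candidate
    return scores.argsort(descending=True)   # best match first

ranks = fused_retrieval(torch.randn(256), torch.randn(256),
                        torch.randn(1000, 512))
```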