April 3, 2020


Paper Group ANR 65


From English To Foreign Languages: Transferring Pre-trained Language Models


Title	From English To Foreign Languages: Transferring Pre-trained Language Models
Authors	Ke Tran
Abstract	Pre-trained models have demonstrated their effectiveness in many downstream natural language processing (NLP) tasks. The availability of multilingual pre-trained models enables zero-shot transfer of NLP tasks from high resource languages to low resource ones. However, recent research in improving pre-trained models focuses heavily on English. While it is possible to train the latest neural architectures for other languages from scratch, it is undesirable due to the required amount of compute. In this work, we tackle the problem of transferring an existing pre-trained model from English to other languages under a limited computational budget. With a single GPU, our approach can obtain a foreign BERT base model within a day and a foreign BERT large within two days. Furthermore, evaluating our models on six languages, we demonstrate that our models are better than multilingual BERT on two zero-shot tasks: natural language inference and dependency parsing.
Tasks	Dependency Parsing, Natural Language Inference
Published	2020-02-18
URL	https://arxiv.org/abs/2002.07306v1
PDF	https://arxiv.org/pdf/2002.07306v1.pdf
PWC	https://paperswithcode.com/paper/from-english-to-foreign-languages-1
Repo
Framework

Boundary-Aware Dense Feature Indicator for Single-Stage 3D Object Detection from Point Clouds


Title	Boundary-Aware Dense Feature Indicator for Single-Stage 3D Object Detection from Point Clouds
Authors	Guodong Xu, Wenxiao Wang, Zili Liu, Liang Xie, Zheng Yang, Haifeng Liu, Deng Cai
Abstract	3D object detection based on point clouds has become more and more popular. Some methods propose localizing 3D objects directly from raw point clouds to avoid information loss. However, these methods come with complex structures and significant computational overhead, limiting their broader application in real-time scenarios. Some methods choose to transform the point cloud data into compact tensors first and leverage off-the-shelf 2D detectors to propose 3D objects, which is much faster and achieves state-of-the-art results. However, because of the inconsistency between 2D and 3D data, we argue that the performance of compact tensor-based 3D detectors is restricted if we use 2D detectors without corresponding modification. Specifically, the distribution of point clouds is uneven, with most points gathering on the boundary of objects, while detectors for 2D data always extract features evenly. Motivated by this observation, we propose DENse Feature Indicator (DENFI), a universal module that helps 3D detectors focus on the densest region of the point clouds in a boundary-aware manner. Moreover, DENFI is lightweight and guarantees real-time speed when applied to 3D object detectors. Experiments on the KITTI dataset show that DENFI improves the performance of the baseline single-stage detector remarkably, which achieves new state-of-the-art performance among previous 3D detectors, including both two-stage and multi-sensor fusion methods, in terms of mAP with a 34 FPS detection speed.
Tasks	3D Object Detection, Object Detection, Sensor Fusion
Published	2020-04-01
URL	https://arxiv.org/abs/2004.00186v1
PDF	https://arxiv.org/pdf/2004.00186v1.pdf
PWC	https://paperswithcode.com/paper/boundary-aware-dense-feature-indicator-for
Repo
Framework

Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks


Title	Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
Authors	Carlos Aspillaga, Andrés Carvallo, Vladimir Araujo
Abstract	There has been significant progress in recent years in the field of Natural Language Processing thanks to the introduction of the Transformer architecture. Current state-of-the-art models, via a large number of parameters and pre-training on massive text corpora, have shown impressive results on several downstream tasks. Many researchers have studied previous (non-Transformer) models to understand their actual behavior under different scenarios, showing that these models are taking advantage of clues or failures of datasets and that slight perturbations on the input data can severely reduce their performance. In contrast, recent models have not been systematically tested with adversarial examples in order to show their robustness under severe stress conditions. For that reason, this work evaluates three Transformer-based models (RoBERTa, XLNet, and BERT) in Natural Language Inference (NLI) and Question Answering (QA) tasks to determine whether they are more robust or whether they have the same flaws as their predecessors. As a result, our experiments reveal that RoBERTa, XLNet and BERT are more robust than recurrent neural network models to stress tests for both NLI and QA tasks. Nevertheless, they are still very fragile and demonstrate various unexpected behaviors, thus revealing that there is still room for future improvement in this field.
Tasks	Natural Language Inference, Question Answering
Published	2020-02-14
URL	https://arxiv.org/abs/2002.06261v2
PDF	https://arxiv.org/pdf/2002.06261v2.pdf
PWC	https://paperswithcode.com/paper/stress-test-evaluation-of-transformer-based
Repo
Framework

Extending a Tag-based Collaborative Recommender with Co-occurring Information Interests


Title	Extending a Tag-based Collaborative Recommender with Co-occurring Information Interests
Authors	Noemi Mauro, Liliana Ardissono
Abstract	Collaborative Filtering is widely applied to personalize item recommendation but its performance is affected by the sparsity of rating data. In order to address this issue, recent systems have been developed to improve recommendation by extracting latent factors from the rating matrices, or by exploiting trust relations established among users in social networks. In this work, we are interested in evaluating whether sources of preference information other than ratings and social ties can be used to improve recommendation performance. Specifically, we aim at testing whether the integration of frequently co-occurring interests in information search logs can improve recommendation performance in User-to-User Collaborative Filtering (U2UCF). For this purpose, we propose the Extended Category-based Collaborative Filtering (ECCF) recommender, which enriches category-based user profiles derived from the analysis of rating behavior with data categories that are frequently searched together by people in search sessions. We test our model using a big rating dataset and a log of a widely used search engine to extract the co-occurrence of interests. The experiments show that ECCF outperforms U2UCF and category-based collaborative recommendation in accuracy, MRR, diversity of recommendations and user coverage. Moreover, it outperforms the SVD++ Matrix Factorization algorithm in accuracy and diversity of recommendation lists.
Tasks
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13474v1
PDF	https://arxiv.org/pdf/2003.13474v1.pdf
PWC	https://paperswithcode.com/paper/extending-a-tag-based-collaborative
Repo
Framework

Utilizing BERT Intermediate Layers for Aspect Based Sentiment Analysis and Natural Language Inference


Title	Utilizing BERT Intermediate Layers for Aspect Based Sentiment Analysis and Natural Language Inference
Authors	Youwei Song, Jiahai Wang, Zhiwei Liang, Zhiyue Liu, Tao Jiang
Abstract	Aspect based sentiment analysis aims to identify the sentimental tendency towards a given aspect in text. Fine-tuning of pretrained BERT performs excellently on this task and achieves state-of-the-art performance. Existing BERT-based works only utilize the last output layer of BERT and ignore the semantic knowledge in the intermediate layers. This paper explores the potential of utilizing BERT intermediate layers to enhance the performance of fine-tuning of BERT. To the best of our knowledge, no existing work has explored this direction. To show the generality, we also apply this approach to a natural language inference task. Experimental results demonstrate the effectiveness and generality of the proposed approach.
Tasks	Aspect-Based Sentiment Analysis, Natural Language Inference, Sentiment Analysis
Published	2020-02-12
URL	https://arxiv.org/abs/2002.04815v1
PDF	https://arxiv.org/pdf/2002.04815v1.pdf
PWC	https://paperswithcode.com/paper/utilizing-bert-intermediate-layers-for-aspect
Repo
Framework

Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference


Title	Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference
Authors	Timo Schick, Hinrich Schütze
Abstract	Some NLP tasks can be solved in a fully unsupervised fashion by providing a pretrained language model with “task descriptions” in natural language (e.g., Radford et al., 2019). While this approach underperforms its supervised counterpart, we show in this work that the two ideas can be combined: We introduce Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases which help the language model understand the given task. These phrases are then used to assign soft labels to a large set of unlabeled examples. Finally, regular supervised training is performed on the resulting training set. On several tasks, we show that PET outperforms both supervised training and unsupervised approaches in low-resource settings by a large margin.
Tasks	Language Modelling, Natural Language Inference, Text Classification
Published	2020-01-21
URL	https://arxiv.org/abs/2001.07676v1
PDF	https://arxiv.org/pdf/2001.07676v1.pdf
PWC	https://paperswithcode.com/paper/exploiting-cloze-questions-for-few-shot-text
Repo
Framework
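
The cloze reformulation and soft-labeling steps described in the PET abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pattern "It was [MASK].", the verbalizer words, the function names `to_cloze` and `soft_label`, and the example scores are all hypothetical, and a real setup would obtain the scores from a pretrained masked language model.

```python
import math
from typing import Dict

MASK = "[MASK]"  # placeholder mask token; the real token depends on the model

# Hypothetical verbalizer: maps each label to one word the model could
# plausibly predict at the masked position.
VERBALIZER: Dict[str, str] = {"positive": "great", "negative": "terrible"}


def to_cloze(text: str) -> str:
    """Reformulate an input as a cloze-style phrase (illustrative pattern)."""
    return f"{text} It was {MASK}."


def soft_label(mask_scores: Dict[str, float]) -> Dict[str, float]:
    """Softmax over the verbalizer words' scores at the masked position."""
    logits = {label: mask_scores[word] for label, word in VERBALIZER.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Scores a language model might assign at the mask (made-up numbers).
example_scores = {"great": 2.0, "terrible": 0.0}
print(to_cloze("Best pizza ever!"))
print(soft_label(example_scores))
```

In this sketch the soft labels would then serve as training targets for an ordinary supervised classifier on the unlabeled examples, which is the final step the abstract mentions.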

Cryptocurrency Address Clustering and Labeling


Title	Cryptocurrency Address Clustering and Labeling
Authors	Mengjiao Wang, Hikaru Ichijo, Bob Xiao
Abstract	Anonymity is one of the most important qualities of blockchain technology. For example, one can simply create a bitcoin address to send and receive funds without providing KYC to any authority. In general, the real identity behind cryptocurrency addresses is not known; however, some addresses can be clustered according to their ownership by analyzing behavioral patterns, allowing those with known attribution to be assigned labels. These labels may be further used for legal and compliance purposes to assist in law enforcement investigations. In this document, we discuss our methodology behind assigning attribution labels to cryptocurrency addresses.
Tasks
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13399v1
PDF	https://arxiv.org/pdf/2003.13399v1.pdf
PWC	https://paperswithcode.com/paper/cryptocurrency-address-clustering-and
Repo
Framework

Empirical Analysis of Zipf’s Law, Power Law, and Lognormal Distributions in Medical Discharge Reports


Title	Empirical Analysis of Zipf’s Law, Power Law, and Lognormal Distributions in Medical Discharge Reports
Authors	Juan C Quiroz, Liliana Laranjo, Catalin Tufanaru, Ahmet Baki Kocaballi, Dana Rezazadegan, Shlomo Berkovsky, Enrico Coiera
Abstract	Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions. This paper empirically analyses whether text in medical discharge reports follows Zipf’s law, a commonly assumed statistical property of language where word frequency follows a discrete power law distribution. We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods included splitting the discharge reports into tokens, counting token frequency, fitting power law distributions to the data, and testing whether alternative distributions (lognormal, exponential, stretched exponential, and truncated power law) provided superior fits to the data. Results show that discharge reports are best fit by the truncated power law and lognormal distributions. Our findings suggest that Bayesian modelling and statistical text analysis of discharge report text would benefit from using truncated power law and lognormal probability priors.
Tasks
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13352v1
PDF	https://arxiv.org/pdf/2003.13352v1.pdf
PWC	https://paperswithcode.com/paper/empirical-analysis-of-zipf-s-law-power-law
Repo
Framework
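
The tokenize, count, and fit pipeline described in the abstract above can be sketched in a few lines of Python. This is only a rough diagnostic under stated assumptions: it estimates the Zipf exponent with a least-squares line in log-log space, which is not the distribution-fitting procedure the paper uses, and the function names `rank_frequency` and `loglog_slope` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def rank_frequency(tokens: Sequence[str]) -> List[int]:
    """Token counts sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs: Sequence[float]) -> float:
    """Least-squares slope of log(frequency) against log(rank).

    Under Zipf's law, frequency is proportional to rank**(-s), so the
    slope is an estimate of -s.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic frequencies that follow frequency = 1000 / rank exactly.
ideal = [1000.0 / rank for rank in range(1, 51)]
print(loglog_slope(ideal))  # close to -1 for this ideal Zipfian input
```

A slope near -1 on real text is only suggestive; comparing against alternatives such as the lognormal or truncated power law, as the paper does, needs likelihood-based fitting rather than a straight-line fit.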

Slow and Stale Gradients Can Win the Race


Title	Slow and Stale Gradients Can Win the Race
Authors	Sanghamitra Dutta, Jianyu Wang, Gauri Joshi
Abstract	Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in runtime as it waits for the slowest workers (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect the convergence error. In this work, we present a novel theoretical characterization of the speedup offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wallclock time). The main novelty in our work is that our runtime analysis considers random straggling delays, which helps us design and compare distributed SGD algorithms that strike a balance between straggling and staleness. We also provide a new error convergence analysis of asynchronous SGD variants without bounded or exponential delay assumptions. Finally, based on our theoretical characterization of the error-runtime trade-off, we propose a method of gradually varying synchronicity in distributed SGD and demonstrate its performance on the CIFAR10 dataset.
Tasks
Published	2020-03-23
URL	https://arxiv.org/abs/2003.10579v1
PDF	https://arxiv.org/pdf/2003.10579v1.pdf
PWC	https://paperswithcode.com/paper/slow-and-stale-gradients-can-win-the-race-1
Repo
Framework
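
The gradient staleness mentioned in the abstract above can be made concrete with a toy Python simulation. It is a deliberately simplified sketch, not the paper's algorithm or analysis: the objective is the one-dimensional quadratic f(w) = 0.5 * w**2, the delay is fixed rather than random, there is no gradient noise, and the function name `stale_gradient_descent` is hypothetical.

```python
from collections import deque
from typing import List


def stale_gradient_descent(lr: float, staleness: int, steps: int,
                           w0: float = 1.0) -> List[float]:
    """Minimize f(w) = 0.5 * w**2 with gradients that are `staleness` steps old."""
    # Buffer of the most recent parameter values; its oldest entry is the
    # copy a slow worker would have used to compute its gradient.
    history = deque([w0] * (staleness + 1), maxlen=staleness + 1)
    w = w0
    trajectory = [w]
    for _ in range(steps):
        stale_w = history[0]
        grad = stale_w          # gradient of 0.5 * w**2 at the stale copy
        w = w - lr * grad
        history.append(w)       # the full deque drops its oldest entry
        trajectory.append(w)
    return trajectory


fresh = stale_gradient_descent(lr=1.0, staleness=0, steps=50)
stale = stale_gradient_descent(lr=1.0, staleness=2, steps=50)
print(abs(fresh[-1]), max(abs(w) for w in stale[-5:]))
```

On this toy problem, a step size that converges with fresh gradients diverges once the gradients are two steps old, while a smaller step size such as 0.1 still converges. That gives a small-scale view of why staleness and runtime have to be traded off.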

DeepDualMapper: A Gated Fusion Network for Automatic Map Extraction using Aerial Images and Trajectories


Title	DeepDualMapper: A Gated Fusion Network for Automatic Map Extraction using Aerial Images and Trajectories
Authors	Hao Wu, Hanyuan Zhang, Xinyu Zhang, Weiwei Sun, Baihua Zheng, Yuning Jiang
Abstract	Automatic map extraction is of great importance to urban computing and location-based services. Aerial image and GPS trajectory data refer to two different data sources that could be leveraged to generate the map, although they carry different types of information. Most previous works on data fusion between aerial images and data from auxiliary sensors do not fully utilize the information of both modalities and hence suffer from the issue of information loss. We propose a deep convolutional neural network called DeepDualMapper which fuses the aerial image and trajectory data in a more seamless manner to extract the digital map. We design a gated fusion module to explicitly control the information flows from both modalities in a complementary-aware manner. Moreover, we propose a novel densely supervised refinement decoder to generate the prediction in a coarse-to-fine way. Our comprehensive experiments demonstrate that DeepDualMapper can fuse the information of images and trajectories much more effectively than existing approaches, and is able to generate maps with higher accuracy.
Tasks
Published	2020-02-17
URL	https://arxiv.org/abs/2002.06832v1
PDF	https://arxiv.org/pdf/2002.06832v1.pdf
PWC	https://paperswithcode.com/paper/deepdualmapper-a-gated-fusion-network-for
Repo
Framework

Machine Learning String Standard Models


Title	Machine Learning String Standard Models
Authors	Rehan Deen, Yang-Hui He, Seung-Joo Lee, Andre Lukas
Abstract	We study machine learning of phenomenologically relevant properties of string compactifications, which arise in the context of heterotic line bundle models. Both supervised and unsupervised learning are considered. We find that, for a fixed compactification manifold, relatively small neural networks are capable of distinguishing consistent line bundle models with the correct gauge group and the correct chiral asymmetry from random models without these properties. The same distinction can also be achieved in the context of unsupervised learning, using an auto-encoder. Learning non-topological properties, specifically the number of Higgs multiplets, turns out to be more difficult, but is possible using sizeable networks and feature-enhanced data sets.
Tasks
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13339v1
PDF	https://arxiv.org/pdf/2003.13339v1.pdf
PWC	https://paperswithcode.com/paper/machine-learning-string-standard-models
Repo
Framework

On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions


Title	On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions
Authors	Jingzhao Zhang, Hongzhou Lin, Stefanie Jegelka, Ali Jadbabaie, Suvrit Sra
Abstract	We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains important examples such as ReLU neural networks and others with non-differentiable activation functions. First, we show that finding an $\epsilon$-stationary point with first-order methods is impossible in finite time. Therefore, we introduce the notion of $(\delta, \epsilon)$-stationarity, a generalization that allows for a point to be within distance $\delta$ of an $\epsilon$-stationary point and reduces to $\epsilon$-stationarity for smooth functions. We propose a series of randomized first-order methods and analyze their complexity of finding a $(\delta, \epsilon)$-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on $\delta$. Empirically, our methods perform well for training ReLU neural networks.
Tasks
Published	2020-02-10
URL	https://arxiv.org/abs/2002.04130v2
PDF	https://arxiv.org/pdf/2002.04130v2.pdf
PWC	https://paperswithcode.com/paper/on-complexity-of-finding-stationary-points-of
Repo
Framework
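
The abstract above introduces $(\delta, \epsilon)$-stationarity without stating it formally. One standard way to formalize such a notion uses the Goldstein $\delta$-subdifferential. The notation below ($B$ for the unit ball, $\partial f$ for the generalized gradient) is an assumption for illustration and should be checked against the paper's own definition.

```latex
% Goldstein \delta-subdifferential: convex hull of the generalized
% gradients taken over the ball of radius \delta around x.
\partial f(x + \delta B) := \mathrm{conv}\Big( \bigcup_{y \in x + \delta B} \partial f(y) \Big)

% x is (\delta, \epsilon)-stationary when this set contains a short vector.
\min \{ \| g \| : g \in \partial f(x + \delta B) \} \le \epsilon
```

Setting $\delta = 0$ collapses the ball to the point $x$ and recovers the usual $\epsilon$-stationarity condition on $\partial f(x)$.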

PointGMM: a Neural GMM Network for Point Clouds


Title	PointGMM: a Neural GMM Network for Point Clouds
Authors	Amir Hertz, Rana Hanocka, Raja Giryes, Daniel Cohen-Or
Abstract	Point clouds are a popular representation for 3D shapes. However, they encode a particular sampling without accounting for shape priors or non-local information. We advocate for the use of a hierarchical Gaussian mixture model (hGMM), which is a compact, adaptive and lightweight representation that probabilistically defines the underlying 3D surface. We present PointGMM, a neural network that learns to generate hGMMs which are characteristic of the shape class, and also coincide with the input point cloud. PointGMM is trained over a collection of shapes to learn a class-specific prior. The hierarchical representation has two main advantages: (i) coarse-to-fine learning, which avoids converging to poor local minima; and (ii) an unsupervised, consistent partitioning of the input shape. We show that as a generative model, PointGMM learns a meaningful latent space which enables generating consistent interpolations between existing shapes, as well as synthesizing novel shapes. We also present a novel framework for rigid registration using PointGMM, that learns to disentangle orientation from structure of an input shape.
Tasks
Published	2020-03-30
URL	https://arxiv.org/abs/2003.13326v1
PDF	https://arxiv.org/pdf/2003.13326v1.pdf
PWC	https://paperswithcode.com/paper/pointgmm-a-neural-gmm-network-for-point
Repo
Framework

Provable Robust Learning Based on Transformation-Specific Smoothing


Title	Provable Robust Learning Based on Transformation-Specific Smoothing
Authors	Linyi Li, Maurice Weber, Xiaojun Xu, Luka Rimanic, Tao Xie, Ce Zhang, Bo Li
Abstract	As machine learning systems become pervasive, safeguarding their security is critical. Recent work has demonstrated that motivated adversaries could manipulate the test data to mislead ML systems to make arbitrary mistakes. So far, most research has focused on providing provable robustness guarantees for a specific $\ell_p$ norm bounded adversarial perturbation. However, in practice there are more adversarial transformations that are realistic and of semantic meaning, which need to be analyzed and ideally certified. In this paper we aim to provide a unified framework for certifying ML model robustness against general adversarial transformations. First, we leverage the function smoothing strategy to certify robustness against a series of adversarial transformations such as rotation, translation, Gaussian blur, etc. We then provide sufficient conditions and strategies for certifying certain transformations. For instance, we propose a novel sampling based interpolation approach with the estimated Lipschitz upper bound to certify the robustness against rotation transformation. In addition, we theoretically optimize the smoothing strategies for certifying the robustness of ML models against different transformations. For instance, we show that smoothing by sampling from an exponential distribution provides a tighter robustness bound than sampling from a Gaussian. We also prove two generalization gaps for the proposed framework to understand its theoretic barrier. Extensive experiments show that our proposed unified framework significantly outperforms the state-of-the-art certified robustness approaches on several datasets including ImageNet.
Tasks
Published	2020-02-27
URL	https://arxiv.org/abs/2002.12398v2
PDF	https://arxiv.org/pdf/2002.12398v2.pdf
PWC	https://paperswithcode.com/paper/provable-robust-learning-based-on
Repo
Framework
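
The function smoothing strategy named in the abstract above builds a smoothed classifier that returns the most likely prediction of a base classifier under random perturbations. The Python sketch below shows only that prediction step, under simplifying assumptions: it adds Gaussian noise to a scalar input, whereas the paper smooths over transformation parameters, and it computes no robustness certificate. The names `smoothed_predict` and `sign_classifier` are hypothetical.

```python
import random
from collections import Counter
from typing import Callable


def smoothed_predict(base: Callable[[float], int], x: float, sigma: float,
                     n_samples: int, rng: random.Random) -> int:
    """Monte Carlo majority vote of `base` over Gaussian perturbations of x."""
    votes = Counter(base(x + rng.gauss(0.0, sigma)) for _ in range(n_samples))
    return votes.most_common(1)[0][0]


def sign_classifier(x: float) -> int:
    """Toy base classifier: class 1 for non-negative inputs, else class 0."""
    return 1 if x >= 0 else 0


rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(smoothed_predict(sign_classifier, 2.0, sigma=0.5, n_samples=1000, rng=rng))
```

A certified variant would additionally bound the vote probabilities and turn them into a guaranteed radius, and that part depends on the noise distribution and the transformation being certified.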

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition


Title	RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition
Authors	Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang Li, Yifan Gong, Bin Ren, Xue Lin, Yanzhi Wang, Dingwen Tao
Abstract	Recurrent neural network (RNN) based automatic speech recognition has nowadays become prevalent on mobile devices such as smartphones. However, previous RNN compression techniques either suffer from hardware performance overhead due to irregularity or significant accuracy loss due to the preserved regularity for hardware friendliness. In this work, we propose RTMobile that leverages both a novel block-based pruning approach and compiler optimizations to accelerate RNN inference on mobile devices. Our proposed RTMobile is the first work that can achieve real-time RNN inference on mobile platforms. Experimental results demonstrate that RTMobile can significantly outperform existing RNN hardware acceleration methods in terms of inference accuracy and time. Compared with prior work on FPGA, RTMobile using Adreno 640 embedded GPU on GRU can improve the energy-efficiency by about 40$\times$ while maintaining the same inference time.
Tasks	Speech Recognition
Published	2020-02-19
URL	https://arxiv.org/abs/2002.11474v1
PDF	https://arxiv.org/pdf/2002.11474v1.pdf
PWC	https://paperswithcode.com/paper/rtmobile-beyond-real-time-mobile-acceleration
Repo
Framework