April 3, 2020

Paper Group AWR 55

RobBERT: a Dutch RoBERTa-based Language Model

Title RobBERT: a Dutch RoBERTa-based Language Model
Authors Pieter Delobelle, Thomas Winters, Bettina Berendt
Abstract Pre-trained language models have been dominating the field of natural language processing in recent years, and have led to significant performance gains for various complex natural language tasks. One of the most prominent pre-trained language models is BERT (Bidirectional Encoder Representations from Transformers), which was released as an English as well as a multilingual version. Although multilingual BERT performs well on many tasks, recent studies showed that BERT models trained on a single language significantly outperform the multilingual results. Training a Dutch BERT model thus has a lot of potential for a wide range of Dutch NLP tasks. While previous approaches have used earlier implementations of BERT to train their Dutch BERT, we used RoBERTa, a robustly optimized BERT approach, to train a Dutch language model called RobBERT. We show that RobBERT improves state-of-the-art results in Dutch-specific language tasks, and also outperforms other existing Dutch BERT-based models in sentiment analysis. These results indicate that RobBERT is a powerful pre-trained model for fine-tuning on a large variety of Dutch language tasks. We publicly release this pre-trained model in the hope of supporting further downstream Dutch NLP applications.
Tasks Language Modelling, Sentiment Analysis
Published 2020-01-17
URL https://arxiv.org/abs/2001.06286v1
PDF https://arxiv.org/pdf/2001.06286v1.pdf
PWC https://paperswithcode.com/paper/robbert-a-dutch-roberta-based-language-model
Repo https://github.com/iPieter/RobBERT
Framework pytorch

Slice Tuner: A Selective Data Collection Framework for Accurate and Fair Machine Learning Models

Title Slice Tuner: A Selective Data Collection Framework for Accurate and Fair Machine Learning Models
Authors Ki Hyun Tae, Steven Euijong Whang
Abstract As machine learning becomes democratized in the era of Software 2.0, one of the most serious bottlenecks is collecting enough labeled data to ensure accurate and fair models. Recent techniques including crowdsourcing provide cost-effective ways to gather such data. However, simply collecting as much data as possible is not necessarily an effective strategy for optimizing accuracy and fairness. For example, if an online app store has enough training data for certain slices of data (say American customers), but not for others, collecting more American customer data will only bias the model training. Instead, we contend that one needs to selectively collect data and propose Slice Tuner, which collects possibly different amounts of data per slice such that the model accuracy and fairness on all slices are optimized. At its core, Slice Tuner maintains learning curves of slices that estimate the model accuracies given more data and uses convex optimization to find the best data collection strategy. The key challenges of estimating learning curves are that they may be inaccurate if there is not enough data, and there may be dependencies among slices where collecting data for one slice influences the learning curves of others. We solve these issues by iteratively and efficiently updating the learning curves as more data is collected. We evaluate Slice Tuner on real datasets using crowdsourcing for data collection and show that Slice Tuner significantly outperforms baselines in terms of model accuracy and fairness, even for initially small slices. We believe Slice Tuner is a practical tool for suggesting concrete action items based on model analysis.
Published 2020-03-10
URL https://arxiv.org/abs/2003.04549v1
PDF https://arxiv.org/pdf/2003.04549v1.pdf
PWC https://paperswithcode.com/paper/slice-tuner-a-selective-data-collection
Repo https://github.com/khtae8250/SliceTuner
Framework none
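
The abstract describes per-slice learning curves that feed a budget allocation. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a power-law curve `loss = b * n**(-a)` for each slice, and it swaps the paper's convex optimization for a simple greedy allocation; fairness terms are not modeled. The names `fit_power_law`, `predicted_gain`, and `allocate` are hypothetical.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss ~= b * n**(-a) by least squares in log-log space; returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


def predicted_gain(curve, n, step):
    """Predicted loss reduction from adding `step` examples to a slice of size n."""
    a, b = curve
    return b * n ** (-a) - b * (n + step) ** (-a)


def allocate(curves, current_sizes, budget, step=10):
    """Greedy stand-in for the convex optimization (an assumption of this sketch):
    repeatedly give `step` examples to the slice with the largest predicted gain."""
    sizes = dict(current_sizes)
    extra = {name: 0 for name in curves}
    for _ in range(budget // step):
        best = max(curves, key=lambda name: predicted_gain(curves[name], sizes[name], step))
        sizes[best] += step
        extra[best] += step
    return extra
```

With identical curves, the greedy rule sends the budget to the smaller slice, because a power-law curve flattens as the slice grows. A real system would refit the curves after each collection round, as the abstract notes.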

On the Texture Bias for Few-Shot CNN Segmentation

Title On the Texture Bias for Few-Shot CNN Segmentation
Authors Reza Azad, Abdur R Fayjie, Claude Kauffman, Ismail Ben Ayed, Marco Pedersoli, Jose Dolz
Abstract Despite the initial belief that Convolutional Neural Networks (CNNs) are driven by shapes to perform visual recognition tasks, recent evidence suggests that texture bias in CNNs provides higher performing and more robust models. This contrasts with the perceptual bias in the human visual cortex, which has a stronger preference towards shape components. Perceptual differences may explain why CNNs achieve human-level performance when large labeled datasets are available, but their performance significantly degrades in low-labeled data scenarios, such as few-shot semantic segmentation. To remove the texture bias in the context of few-shot learning, we propose a novel architecture that integrates a set of difference of Gaussians (DoG) to attenuate high-frequency local components in the feature space. This produces a set of modified feature maps, whose high-frequency components are diminished at different standard deviation values of the Gaussian distribution in the spatial domain. As this results in multiple feature maps for a single image, we employ a bi-directional convolutional long short-term memory to efficiently merge the multi-scale-space representations. We perform extensive experiments on two well-known few-shot segmentation benchmarks (Pascal-5i and FSS-1000) and demonstrate that our method significantly outperforms state-of-the-art approaches. The code is available at: https://github.com/rezazad68/fewshot-segmentation
Tasks Few-Shot Learning, Few-Shot Semantic Segmentation, Semantic Segmentation
Published 2020-03-09
URL https://arxiv.org/abs/2003.04052v1
PDF https://arxiv.org/pdf/2003.04052v1.pdf
PWC https://paperswithcode.com/paper/on-the-texture-bias-for-few-shot-cnn
Repo https://github.com/rezazad68/fewshot-segmentation
Framework tf
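
To make the difference-of-Gaussians step in the abstract concrete, here is a minimal sketch on a 1D signal. It is an illustration under stated assumptions, not the paper's architecture: the paper applies DoG to 2D feature maps inside a network, while this sketch uses plain Python lists, a kernel truncated at three standard deviations, and edge replication. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel, truncated at roughly 3 standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, sigma):
    """Convolve the signal with a Gaussian kernel, replicating the edge values."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def difference_of_gaussians(signal, sigma_small, sigma_large):
    """Subtract a wide blur from a narrow blur, leaving a band of frequencies."""
    narrow = blur(signal, sigma_small)
    wide = blur(signal, sigma_large)
    return [a - b for a, b in zip(narrow, wide)]


def dog_pyramid(signal, sigmas):
    """One DoG response per pair of consecutive standard deviations."""
    return [difference_of_gaussians(signal, s1, s2) for s1, s2 in zip(sigmas, sigmas[1:])]
```

Each entry of the pyramid corresponds to one scale; the abstract's bi-directional convolutional LSTM is what merges such per-scale maps, and that part is not sketched here.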

AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

Title AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data
Authors Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, Alexander Smola
Abstract We introduce AutoGluon-Tabular, an open-source AutoML framework that requires only a single line of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file. Unlike existing AutoML frameworks that primarily focus on model/hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers. Experiments reveal that our multi-layer combination of many models offers better use of allocated training time than seeking out the best. A second contribution is an extensive evaluation of public and commercial AutoML platforms including TPOT, H2O, AutoWEKA, auto-sklearn, AutoGluon, and Google AutoML Tables. Tests on a suite of 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark reveal that AutoGluon is faster, more robust, and much more accurate. We find that AutoGluon often even outperforms the best-in-hindsight combination of all of its competitors. In two popular Kaggle competitions, AutoGluon beat 99% of the participating data scientists after merely 4h of training on the raw data.
Tasks AutoML
Published 2020-03-13
URL https://arxiv.org/abs/2003.06505v1
PDF https://arxiv.org/pdf/2003.06505v1.pdf
PWC https://paperswithcode.com/paper/autogluon-tabular-robust-and-accurate-automl
Repo https://github.com/Innixma/autogluon-benchmarking
Framework mxnet

TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network

Title TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network
Authors Jiaming Shen, Zhihong Shen, Chenyan Xiong, Chi Wang, Kuansan Wang, Jiawei Han
Abstract Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications. For example, online retailers (e.g., Amazon and eBay) use taxonomies for product recommendation, and web search engines (e.g., Google and Bing) leverage taxonomies to enhance query understanding. Enormous efforts have been made on constructing taxonomies either manually or semi-automatically. However, with the fast-growing volume of web content, existing taxonomies will become outdated and fail to capture emerging knowledge. Therefore, in many applications, dynamic expansions of an existing taxonomy are in great demand. In this paper, we study how to expand an existing taxonomy by adding a set of new concepts. We propose a novel self-supervised framework, named TaxoExpan, which automatically generates a set of <query concept, anchor concept> pairs from the existing taxonomy as training data. Using such self-supervision data, TaxoExpan learns a model to predict whether a query concept is the direct hyponym of an anchor concept. We develop two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural network that encodes the local structure of an anchor concept in the existing taxonomy, and (2) a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data. Extensive experiments on three large-scale datasets from different domains demonstrate both the effectiveness and the efficiency of TaxoExpan for taxonomy expansion.
Tasks Product Recommendation
Published 2020-01-26
URL https://arxiv.org/abs/2001.09522v1
PDF https://arxiv.org/pdf/2001.09522v1.pdf
PWC https://paperswithcode.com/paper/taxoexpan-self-supervised-taxonomy-expansion
Repo https://github.com/mickeystroller/TaxoExpan
Framework pytorch
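
The self-supervision described in the abstract can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes the taxonomy is a tree stored as a child-to-parent dictionary, treats each existing edge as a positive <query concept, anchor concept> pair, and draws negatives uniformly from the other nodes. The function name `make_training_pairs` and the sampling scheme are assumptions.

```python
import random


def make_training_pairs(parent_of, num_negatives=2, seed=0):
    """Build (query, anchor, label) triples from an existing taxonomy.

    parent_of maps each child concept to its direct parent. Label 1 marks the
    true parent; label 0 marks a randomly sampled non-parent anchor.
    """
    rng = random.Random(seed)
    nodes = sorted(set(parent_of) | set(parent_of.values()))
    pairs = []
    for query, parent in sorted(parent_of.items()):
        pairs.append((query, parent, 1))
        candidates = [n for n in nodes if n != parent and n != query]
        count = min(num_negatives, len(candidates))
        for anchor in rng.sample(candidates, count):
            pairs.append((query, anchor, 0))
    return pairs
```

Because negatives are sampled blindly, some of them may still be reasonable anchors, which is one source of the label noise that the abstract's noise-robust objective is meant to tolerate.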

Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation

Title Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation
Authors Xiaocheng Feng, Yawei Sun, Bing Qin, Heng Gong, Yibo Sun, Wei Bi, Xiaojiang Liu, Ting Liu
Abstract In this paper, we focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer and aims to preserve text styles while altering the content. In detail, the input is a set of structured records and a reference text describing another recordset. The output is a summary that accurately describes part of the content in the source recordset in the same writing style as the reference. The task is unsupervised due to the lack of parallel data, and it is challenging both to select suitable records and style words from the bi-aspect inputs and to generate a high-fidelity long document. To tackle those problems, we first build a dataset based on a basketball game report corpus as our testbed, and present an unsupervised neural model with an interactive attention mechanism, which is used for learning the semantic relationship between records and reference texts to achieve better content transfer and better style preservation. In addition, we also explore the effectiveness of back-translation in our task for constructing some pseudo-training pairs. Empirical results show the superiority of our approaches over competitive methods, and the models also yield a new state-of-the-art result on a sentence-level dataset.
Tasks Style Transfer, Text Style Transfer
Published 2020-02-24
URL https://arxiv.org/abs/2002.10210v1
PDF https://arxiv.org/pdf/2002.10210v1.pdf
PWC https://paperswithcode.com/paper/learning-to-select-bi-aspect-information-for
Repo https://github.com/syw1996/SCIR-TG-Data2text-Bi-Aspect
Framework pytorch

Individual Claims Forecasting with Bayesian Mixture Density Networks

Title Individual Claims Forecasting with Bayesian Mixture Density Networks
Authors Kevin Kuo
Abstract We introduce an individual claims forecasting framework utilizing Bayesian mixture density networks that can be used for claims analytics tasks such as case reserving and triaging. The proposed approach enables incorporating claims information from both structured and unstructured data sources, producing multi-period cash flow forecasts, and generating different scenarios of future payment patterns. We implement and evaluate the modeling framework using publicly available data.
Published 2020-03-05
URL https://arxiv.org/abs/2003.02453v1
PDF https://arxiv.org/pdf/2003.02453v1.pdf
PWC https://paperswithcode.com/paper/individual-claims-forecasting-with-bayesian
Repo https://github.com/kasaai/bnn-claims
Framework none

Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation

Title Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation
Authors Vikas Gupta
Abstract We propose a new deep learning network that introduces a deeper CNN channel filter and constraints as losses to reduce joint position and motion errors for 3D video human body pose estimation. Our model outperforms the previous best result from the literature based on mean per-joint position error, velocity error, and acceleration error on the Human 3.6M benchmark, corresponding to a new state-of-the-art mean error reduction in all protocols and motion metrics. Mean per-joint position error is reduced by 1%, velocity error by 7%, and acceleration error by 13% compared to the best results from the literature. Our contribution, which increases positional accuracy and motion smoothness in video, can be integrated with future end-to-end networks without increasing network complexity. Our model and code are available at https://vnmr.github.io/
Keywords 3D, human, image, pose, action, detection, object, video, visual, supervised, joint, kinematic
Tasks 3D Human Pose Estimation, Action Detection, Pose Estimation
Published 2020-02-22
URL https://arxiv.org/abs/2002.11251v1
PDF https://arxiv.org/pdf/2002.11251v1.pdf
PWC https://paperswithcode.com/paper/back-to-the-future-joint-aware-temporal-deep
Repo https://github.com/vnmr/JointVideoPose3D
Framework pytorch
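
The three metrics named in the abstract can be sketched as below. This is an illustration under assumptions, not the paper's evaluation code: poses are lists of frames, each frame a list of (x, y, z) joints; velocity and acceleration are taken as first and second finite differences between consecutive frames; and no alignment protocol is applied. The function names are hypothetical.

```python
import math


def mean_joint_distance(pred, target):
    """Mean Euclidean distance between corresponding joints over all frames."""
    dists = [
        math.dist(p, t)
        for pred_frame, target_frame in zip(pred, target)
        for p, t in zip(pred_frame, target_frame)
    ]
    return sum(dists) / len(dists)


def finite_difference(sequence):
    """Per-joint difference between consecutive frames."""
    return [
        [tuple(b - a for a, b in zip(j0, j1)) for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(sequence, sequence[1:])
    ]


def position_error(pred, target):
    return mean_joint_distance(pred, target)


def velocity_error(pred, target):
    return mean_joint_distance(finite_difference(pred), finite_difference(target))


def acceleration_error(pred, target):
    second = lambda seq: finite_difference(finite_difference(seq))
    return mean_joint_distance(second(pred), second(target))
```

A prediction that is offset by a constant but moves identically to the ground truth has a nonzero position error and zero velocity and acceleration error, which is why the motion metrics are reported separately.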

An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs

Title An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs
Authors Benedek Rozemberczki, Oliver Kiss, Rik Sarkar
Abstract We present Karate Club, a Python framework combining more than 30 state-of-the-art graph mining algorithms which can solve unsupervised machine learning tasks. The primary goal of the package is to make community detection, node and whole graph embedding available to a wide audience of machine learning researchers and practitioners. We designed Karate Club with an emphasis on a consistent application interface, scalability, ease of use, sensible out-of-the-box model behaviour, standardized dataset ingestion, and output generation. This paper discusses the design principles behind this framework with practical examples. We show Karate Club’s efficiency with respect to learning performance on a wide range of real-world clustering problems and classification tasks, and provide supporting evidence with regard to its competitive speed.
Tasks Community Detection, Graph Classification, Graph Embedding, Node Classification
Published 2020-03-10
URL https://arxiv.org/abs/2003.04819v1
PDF https://arxiv.org/pdf/2003.04819v1.pdf
PWC https://paperswithcode.com/paper/an-api-oriented-open-source-python-framework-1
Repo https://github.com/benedekrozemberczki/karateclub
Framework none

SOLOv2: Dynamic, Faster and Stronger

Title SOLOv2: Dynamic, Faster and Stronger
Authors Xinlong Wang, Rufeng Zhang, Tao Kong, Lei Li, Chunhua Shen
Abstract In this work, we aim at building a simple, direct, and fast instance segmentation framework with strong performance. We follow the principle of the SOLO method of Wang et al. “SOLO: segmenting objects by locations”. Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location. Specifically, the mask branch is decoupled into a mask kernel branch and mask feature branch, which are responsible for learning the convolution kernel and the convolved features respectively. Moreover, we propose Matrix NMS (non maximum suppression) to significantly reduce the inference time overhead due to NMS of masks. Our Matrix NMS performs NMS with parallel matrix operations in one shot, and yields better results. We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art methods in both speed and accuracy. A light-weight version of SOLOv2 executes at 31.3 FPS and yields 37.1% AP. Moreover, our state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation. Code is available at: https://git.io/AdelaiDet
Tasks Instance Segmentation, Object Detection, Panoptic Segmentation, Semantic Segmentation
Published 2020-03-23
URL https://arxiv.org/abs/2003.10152v1
PDF https://arxiv.org/pdf/2003.10152v1.pdf
PWC https://paperswithcode.com/paper/solov2-dynamic-faster-and-stronger
Repo https://github.com/aim-uofa/AdelaiDet
Framework pytorch
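
The Matrix NMS idea from the abstract can be sketched as follows. This is a readability-first illustration, not the AdelaiDet code: it uses the linear decay variant, represents each binary mask as a set of pixel coordinates, and uses plain Python loops where the original uses parallel matrix operations. The helper names and the small epsilon guard against division by zero are assumptions of this sketch.

```python
def mask_iou(a, b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def matrix_nms_linear(scores, masks, eps=1e-9):
    """Decay each score by its overlap with higher-scoring masks.

    Nothing is removed here; heavily overlapped masks end up with scores near
    zero and can be dropped afterwards with a threshold.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    # iou[i][j] is filled only for i < j, i.e. mask i outranks mask j.
    iou = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            iou[i][j] = mask_iou(masks[order[i]], masks[order[j]])
    # cmax[i]: largest overlap of mask i with any mask that outranks it.
    cmax = [max((iou[k][i] for k in range(i)), default=0.0) for i in range(n)]
    decayed = list(scores)
    for j in range(n):
        # A suppressor i counts less if i itself is likely suppressed (large cmax[i]).
        factors = ((1.0 - iou[i][j]) / max(1.0 - cmax[i], eps) for i in range(j))
        decayed[order[j]] = scores[order[j]] * min(factors, default=1.0)
    return decayed
```

Since every decay factor depends only on the pairwise IoU table, the whole computation can be expressed as a few matrix operations, which is what allows the one-shot parallel execution the abstract mentions.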

Evaluation Framework For Large-scale Federated Learning

Title Evaluation Framework For Large-scale Federated Learning
Authors Lifeng Liu, Fengda Zhang, Jun Xiao, Chao Wu
Abstract Federated learning is proposed as a machine learning setting that enables distributed edge devices, such as mobile phones, to collaboratively learn a shared prediction model while keeping all the training data on device. This can not only take full advantage of data distributed across millions of nodes to train a good model but also protect data privacy. However, learning in this scenario poses new challenges. In fact, data across a massive number of unreliable devices is likely to be non-IID (that is, not independently and identically distributed), which may make the performance of models trained by federated learning unstable. In this paper, we introduce a framework designed for large-scale federated learning, which consists of approaches to generating datasets and a modular evaluation framework. First, we construct a suite of open-source non-IID datasets covering three aspects, namely covariate shift, prior probability shift, and concept shift, which are grounded in real-world assumptions. In addition, we design several rigorous evaluation metrics, including the number of network nodes, the size of datasets, the number of communication rounds, and communication resources. Finally, we present an open-source benchmark for large-scale federated learning research.
Published 2020-03-03
URL https://arxiv.org/abs/2003.01575v2
PDF https://arxiv.org/pdf/2003.01575v2.pdf
PWC https://paperswithcode.com/paper/evaluation-framework-for-large-scale
Repo https://github.com/ZJU-DistributedAI/DAIDataset
Framework pytorch

Membership Inference Attacks Against Object Detection Models

Title Membership Inference Attacks Against Object Detection Models
Authors Yeachan Park, Myungjoo Kang
Abstract Machine learning models can leak information about the dataset they were trained on. In this paper, we present the first membership inference attack against black-box object detection models that determines whether given data records were used in training. To attack the object detection model, we devise a novel method called the canvas method, in which predicted bounding boxes are drawn on an empty image that serves as the attack model input. Based on the experiments, we successfully reveal the membership status of privacy-sensitive data trained using one-stage and two-stage detection models. We then propose defense strategies and also conduct a transfer attack between the models and datasets. Our results show that object detection models are also vulnerable to inference attacks like other models.
Tasks Inference Attack, Object Detection
Published 2020-01-12
URL https://arxiv.org/abs/2001.04011v2
PDF https://arxiv.org/pdf/2001.04011v2.pdf
PWC https://paperswithcode.com/paper/membership-inference-attacks-against-object
Repo https://github.com/yechanp/Membership-Inference-Attacks-Against-Object-Detection-Models
Framework pytorch
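
The canvas method in the abstract turns a detector's output into an image for the attack model. A minimal sketch of that preprocessing step follows. The drawing rule is an assumption made for illustration: boxes are filled rectangles whose intensity is the confidence score, and overlapping boxes keep the higher score. The function name `draw_canvas` and the box tuple layout are hypothetical.

```python
def draw_canvas(boxes, width, height):
    """Render predicted boxes onto an empty (all-zero) canvas.

    boxes: iterable of (x1, y1, x2, y2, score) with integer pixel corners,
    where x2 and y2 are exclusive. Returns a height x width list of floats.
    """
    canvas = [[0.0] * width for _ in range(height)]
    for x1, y1, x2, y2, score in boxes:
        # Clip each box to the canvas so out-of-range predictions are safe.
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                canvas[y][x] = max(canvas[y][x], score)
    return canvas
```

The resulting canvas would then be fed to a binary classifier that predicts member versus non-member; that classifier is outside the scope of this sketch.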

Fisher Deep Domain Adaptation

Title Fisher Deep Domain Adaptation
Authors Yinghua Zhang, Yu Zhang, Ying Wei, Kun Bai, Yangqiu Song, Qiang Yang
Abstract Deep domain adaptation models learn a neural network in an unlabeled target domain by leveraging the knowledge from a labeled source domain. This can be achieved by learning a domain-invariant feature space. Though the learned representations are separable in the source domain, they usually have a large variance and samples with different class labels tend to overlap in the target domain, which yields suboptimal adaptation performance. To fill the gap, a Fisher loss is proposed to learn discriminative representations which are within-class compact and between-class separable. Experimental results on two benchmark datasets show that the Fisher loss is a general and effective loss for deep domain adaptation. Noticeable improvements are brought when it is used together with widely adopted transfer criteria, including MMD, CORAL and domain adversarial loss. For example, an absolute improvement of 6.67% in terms of the mean accuracy is attained when the Fisher loss is used together with the domain adversarial loss on the Office-Home dataset.
Tasks Domain Adaptation
Published 2020-03-12
URL https://arxiv.org/abs/2003.05636v1
PDF https://arxiv.org/pdf/2003.05636v1.pdf
PWC https://paperswithcode.com/paper/fisher-deep-domain-adaptation
Repo https://github.com/HKUST-KnowComp/FisherDA
Framework pytorch
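
The abstract asks for representations that are within-class compact and between-class separable. One classical way to score that property is a ratio of within-class to between-class scatter, sketched below. The exact form of the paper's Fisher loss is not given in the abstract, so the trace-ratio form, the `eps` term, and the function names here are assumptions.

```python
def _mean(rows):
    dim = len(rows[0])
    return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fisher_scatter(features, labels):
    """Return (within, between): traces of the within- and between-class scatter."""
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)
    global_mean = _mean(features)
    within = 0.0
    between = 0.0
    for rows in by_class.values():
        center = _mean(rows)
        within += sum(_sqdist(r, center) for r in rows)
        between += len(rows) * _sqdist(center, global_mean)
    return within, between


def fisher_ratio_loss(features, labels, eps=1e-8):
    """Small when classes are tight and far apart; large when they overlap."""
    within, between = fisher_scatter(features, labels)
    return within / (between + eps)
```

In a deep model such a term would be computed on a batch of learned features and added to the transfer criterion (for example MMD, CORAL, or a domain adversarial loss, as listed in the abstract).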

Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System

Title Automatic Compilation of Resources for Academic Writing and Evaluating with Informal Word Identification and Paraphrasing System
Authors Seid Muhie Yimam, Gopalakrishnan Venkatesh, John Sie Yuen Lee, Chris Biemann
Abstract We present the first approach to automatically building resources for academic writing. The aim is to build a writing aid system that automatically edits a text so that it better adheres to the academic style of writing. On top of existing academic resources, such as the Corpus of Contemporary American English (COCA) academic Word List, the New Academic Word List, and the Academic Collocation List, we also explore how to dynamically build such resources that would be used to automatically identify informal or non-academic words or phrases. The resources are compiled using different generic approaches that can be extended for different domains and languages. We describe the evaluation of resources with a system implementation. The system consists of three components: informal word identification (IWI), academic candidate paraphrase generation, and paraphrase ranking. To generate candidates and rank them in context, we have used the PPDB and WordNet paraphrase resources. We use the Concepts in Context (CoInCO) “All-Words” lexical substitution dataset both for the informal word identification and paraphrase generation experiments. Our informal word identification component achieves an F-1 score of 82%, significantly outperforming a stratified classifier baseline. The main contribution of this work is a domain-independent methodology to build targeted resources for writing aids.
Tasks Paraphrase Generation
Published 2020-03-05
URL https://arxiv.org/abs/2003.02955v1
PDF https://arxiv.org/pdf/2003.02955v1.pdf
PWC https://paperswithcode.com/paper/automatic-compilation-of-resources-for
Repo https://github.com/uhh-lt/par4Acad
Framework tf

How to 0wn NAS in Your Spare Time

Title How to 0wn NAS in Your Spare Time
Authors Sanghyun Hong, Michael Davinroy, Yiğitcan Kaya, Dana Dachman-Soled, Tudor Dumitraş
Abstract New data processing pipelines and novel network architectures increasingly drive the success of deep learning. In consequence, the industry considers top-performing architectures as intellectual property and devotes considerable computational resources to discovering such architectures through neural architecture search (NAS). This provides an incentive for adversaries to steal these novel architectures; when the architectures are used in the cloud to provide Machine Learning as a Service (MLaaS), the adversaries also have an opportunity to reconstruct them by exploiting a range of hardware side channels. However, it is challenging to reconstruct novel architectures and pipelines without knowing the computational graph (e.g., the layers, branches or skip connections), the architectural parameters (e.g., the number of filters in a convolutional layer) or the specific pre-processing steps (e.g., embeddings). In this paper, we design an algorithm that reconstructs the key components of a novel deep learning system by exploiting a small amount of information leakage from a cache side-channel attack, Flush+Reload. We use Flush+Reload to infer the trace of computations and the timing for each computation. Our algorithm then generates candidate computational graphs from the trace and eliminates incompatible candidates through a parameter estimation process. We implement our algorithm in PyTorch and TensorFlow. We demonstrate experimentally that we can reconstruct MalConv, a novel data pre-processing pipeline for malware detection, and ProxylessNAS-CPU, a novel network architecture for ImageNet classification optimized to run on CPUs, without knowing the architecture family. In both cases, we achieve 0% error. These results suggest hardware side channels are a practical attack vector against MLaaS, and more efforts should be devoted to understanding their impact on the security of deep learning systems.
Tasks Malware Detection, Neural Architecture Search
Published 2020-02-17
URL https://arxiv.org/abs/2002.06776v1
PDF https://arxiv.org/pdf/2002.06776v1.pdf
PWC https://paperswithcode.com/paper/how-to-0wn-nas-in-your-spare-time
Repo https://github.com/Sanghyun-Hong/How-to-0wn-NAS-in-Your-Spare-Time
Framework none