April 3, 2020

Paper Group AWR 1

A Sparse Deep Factorization Machine for Efficient CTR prediction. Probability Weighted Compact Feature for Domain Adaptive Retrieval. Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning. Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features. SQLFlow: A Bridg …

A Sparse Deep Factorization Machine for Efficient CTR prediction

Title A Sparse Deep Factorization Machine for Efficient CTR prediction
Authors Wei Deng, Junwei Pan, Tian Zhou, Aaron Flores, Guang Lin
Abstract Click-through rate (CTR) prediction is a crucial task in online display advertising and the key part is to learn important feature interactions. The mainstream models are embedding-based neural networks that provide end-to-end training by incorporating hybrid components to model both low-order and high-order feature interactions. These models, however, slow down the prediction inference by at least hundreds of times due to the deep neural network (DNN) component. Considering the challenge of deploying embedding-based neural networks for online advertising, we propose to prune the redundant parameters for the first time to accelerate the inference and reduce the run-time memory usage. Most notably, we can accelerate the inference by 46X on Criteo dataset and 27X on Avazu dataset without loss on the prediction accuracy. In addition, the deep model acceleration makes an efficient model ensemble possible with low latency and significant gains on the performance.
Tasks Click-Through Rate Prediction
Published 2020-02-17
URL https://arxiv.org/abs/2002.06987v1
PDF https://arxiv.org/pdf/2002.06987v1.pdf
PWC https://paperswithcode.com/paper/a-sparse-deep-factorization-machine-for
Repo https://github.com/WayneDW/sDeepFwFM
Framework pytorch
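
The abstract says redundant parameters are pruned to speed up inference but does not name the pruning criterion. As a rough illustration only, the sketch below uses plain magnitude pruning (an assumed stand-in, not necessarily the authors' method): the fraction `sparsity` of weights with the smallest absolute value is set to zero. The function name and the toy weight matrix are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of a 2-D weight list with the smallest-magnitude
    entries zeroed. ASSUMPTION: magnitude pruning is used here only as a
    generic example; the paper's actual criterion may differ."""
    n_rows, n_cols = len(weights), len(weights[0])
    # Rank every entry by absolute value, remembering its position.
    ranked = sorted(
        (abs(weights[i][j]), i, j)
        for i in range(n_rows)
        for j in range(n_cols)
    )
    n_prune = int(sparsity * n_rows * n_cols)  # how many weights to drop
    pruned = [row[:] for row in weights]
    for _, i, j in ranked[:n_prune]:
        pruned[i][j] = 0.0
    return pruned


# Toy example: prune half of a 2x3 layer.
layer = [[0.5, -0.05, 0.3], [0.01, -0.9, 0.2]]
sparse_layer = magnitude_prune(layer, sparsity=0.5)
```

A sparse layer only saves time if the runtime skips the zeroed entries, for example by storing the weights in a sparse format.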

Probability Weighted Compact Feature for Domain Adaptive Retrieval

Title Probability Weighted Compact Feature for Domain Adaptive Retrieval
Authors Fuxiang Huang, Lei Zhang, Yang Yang, Xichuan Zhou
Abstract Domain adaptive image retrieval includes single-domain retrieval and cross-domain retrieval. Most of the existing image retrieval methods only focus on single-domain retrieval, which assumes that the distributions of retrieval databases and queries are similar. However, in practical applications, the discrepancies between retrieval databases, often taken in ideal illumination/pose/background/camera conditions, and queries, usually obtained in uncontrolled conditions, are very large. In this paper, considering the practical application, we focus on challenging cross-domain retrieval. To address the problem, we propose an effective method named Probability Weighted Compact Feature Learning (PWCF), which provides inter-domain correlation guidance to promote cross-domain retrieval accuracy and learns a series of compact binary codes to improve the retrieval speed. First, we derive our loss function through the Maximum A Posteriori Estimation (MAP): Bayesian Perspective (BP) induced focal-triplet loss, BP induced quantization loss and BP induced classification loss. Second, we propose a common manifold structure between domains to explore the potential correlation across domains. Because the original feature representation is biased by the inter-domain discrepancy, the manifold structure is difficult to construct. Therefore, we propose a new feature named Histogram Feature of Neighbors (HFON) from the sample statistics perspective. Extensive experiments on various benchmark databases validate that our method outperforms many state-of-the-art image retrieval methods for domain adaptive image retrieval. The source code is available at https://github.com/fuxianghuang1/PWCF
Tasks Image Retrieval, Quantization
Published 2020-03-06
URL https://arxiv.org/abs/2003.03293v1
PDF https://arxiv.org/pdf/2003.03293v1.pdf
PWC https://paperswithcode.com/paper/probability-weighted-compact-feature-for
Repo https://github.com/fuxianghuang1/PWCF
Framework none

Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning

Title Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning
Authors Tianyi Wang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Qiong Zhang
Abstract Multi-role dialogue understanding comprises a wide range of diverse tasks such as question answering, act classification, and dialogue summarization. While dialogue corpora are abundantly available, labeled data for specific learning tasks can be highly scarce and expensive. In this work, we investigate dialogue context representation learning with various types of unsupervised pretraining tasks where the training objectives are given naturally according to the nature of the utterance and the structure of the multi-role conversation. Meanwhile, in order to locate essential information for dialogue summarization/extraction, the pretraining process enables external knowledge integration. The proposed fine-tuned pretraining mechanism is comprehensively evaluated via three different dialogue datasets along with a number of downstream dialogue-mining tasks. Results show that the proposed pretraining mechanism significantly contributes to all the downstream tasks without discrimination to different encoders.
Tasks Dialogue Understanding, Question Answering, Representation Learning
Published 2020-02-27
URL https://arxiv.org/abs/2003.04994v1
PDF https://arxiv.org/pdf/2003.04994v1.pdf
PWC https://paperswithcode.com/paper/masking-orchestration-multi-task-pretraining
Repo https://github.com/wangtianyiftd/dialogue_pretrain
Framework none

Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features

Title Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features
Authors Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas
Abstract Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities. The novelty of the proposed model consists of the usage of a PHOC descriptor to construct a bag of textual words along with a Fisher Vector Encoding that captures the morphology of text. This approach provides a stronger multimodal representation for this task and as our experiments demonstrate, it achieves state-of-the-art results on two different tasks, fine-grained classification and image retrieval.
Tasks Fine-Grained Image Classification, Image Classification, Image Retrieval, Question Answering, Visual Question Answering
Published 2020-01-14
URL https://arxiv.org/abs/2001.04732v1
PDF https://arxiv.org/pdf/2001.04732v1.pdf
PWC https://paperswithcode.com/paper/fine-grained-image-classification-and
Repo https://github.com/DreadPiratePsyopus/Fine_Grained_Clf
Framework pytorch
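
The PHOC (Pyramidal Histogram of Characters) descriptor mentioned in the abstract can be sketched as follows. This is a minimal version under stated assumptions: a lowercase a-z alphabet, pyramid levels 1 to 3, and the common rule that a character belongs to a region when at least half of its span falls inside it. The paper's actual alphabet, levels, and any extra n-gram bins are not given in the abstract and may differ.

```python
import string

ALPHABET = string.ascii_lowercase  # ASSUMPTION: a-z only, no digits


def phoc(word, levels=(1, 2, 3)):
    """Binary PHOC vector: for each pyramid level, the word is split into
    `level` equal regions and each region records which characters occur
    in it. Integer arithmetic keeps the 50% overlap test exact."""
    chars = [c for c in word.lower() if c in ALPHABET]
    n = len(chars)
    vec = []
    for level in levels:
        for region in range(level):
            hist = [0] * len(ALPHABET)
            # Work in units of 1 / (n * level) so all bounds are integers.
            r_lo, r_hi = region * n, (region + 1) * n
            for k, ch in enumerate(chars):
                c_lo, c_hi = k * level, (k + 1) * level
                overlap = min(c_hi, r_hi) - max(c_lo, r_lo)
                # Character span has length `level` in these units.
                if 2 * overlap >= level:
                    hist[ALPHABET.index(ch)] = 1
            vec.extend(hist)
    return vec


descriptor = phoc("bakery")  # 6 regions x 26 letters
```

Because the descriptor records where in the word each character occurs, words with similar spelling map to nearby vectors, which is what lets it capture the morphology of scene text.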

SQLFlow: A Bridge between SQL and Machine Learning

Title SQLFlow: A Bridge between SQL and Machine Learning
Authors Yi Wang, Yang Yang, Weiguo Zhu, Yi Wu, Xu Yan, Yongfeng Liu, Yu Wang, Liang Xie, Ziyao Gao, Wenjing Zhu, Xiang Chen, Wei Yan, Mingjie Tang, Yuan Tang
Abstract Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs focusing on the purpose (what) and ignoring the procedure (how). Previous database systems extended their SQL dialect to support ML. SQLFlow (https://sqlflow.org/sqlflow) takes another strategy to work as a bridge over various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines like TensorFlow, XGBoost, and scikit-learn. We extended SQL syntax carefully to make the extension work with various SQL dialects. We implement the extension by inventing a collaborative parsing algorithm. SQLFlow is efficient and expressive for a wide variety of ML techniques – supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerable execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.
Published 2020-01-19
URL https://arxiv.org/abs/2001.06846v1
PDF https://arxiv.org/pdf/2001.06846v1.pdf
PWC https://paperswithcode.com/paper/sqlflow-a-bridge-between-sql-and-machine
Repo https://github.com/sql-machine-learning/sqlflow
Framework tf

Representation Learning for Medical Data

Title Representation Learning for Medical Data
Authors Karol Antczak
Abstract We propose a representation learning framework for the medical diagnosis domain. It is based on a heterogeneous network-based model of diagnostic data as well as a modified metapath2vec algorithm for learning latent node representations. We compare the proposed algorithm with other representation learning methods in two practical case studies: symptom/disease classification and disease prediction. We observe a significant performance boost in these tasks resulting from learning representations of domain data in the form of a heterogeneous network.
Tasks Disease Prediction, Medical Diagnosis, Representation Learning
Published 2020-01-22
URL https://arxiv.org/abs/2001.08269v1
PDF https://arxiv.org/pdf/2001.08269v1.pdf
PWC https://paperswithcode.com/paper/representation-learning-for-medical-data
Repo https://github.com/KarolAntczak/multimetapath2vec
Framework none

JAA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention

Title JAA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention
Authors Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma
Abstract Facial action unit (AU) detection and face alignment are two highly correlated tasks, since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection. However, most existing AU detection works handle the two tasks independently by treating face alignment as a preprocessing step, and often use landmarks to predefine a fixed region or attention for each AU. In this paper, we propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, a multi-scale shared feature is learned first, and the high-level feature of face alignment is fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively. Finally, the assembled local features are integrated with the face alignment feature and global feature for AU detection. Extensive experiments demonstrate that our framework (i) significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks, (ii) can adaptively capture the irregular region of each AU, (iii) achieves competitive performance for face alignment, and (iv) also works well under partial occlusions and non-frontal poses. The code for our method is available at https://github.com/ZhiwenShao/PyTorch-JAANet.
Tasks Action Unit Detection, Face Alignment, Facial Action Unit Detection
Published 2020-03-18
URL https://arxiv.org/abs/2003.08834v1
PDF https://arxiv.org/pdf/2003.08834v1.pdf
PWC https://paperswithcode.com/paper/jaa-net-joint-facial-action-unit-detection
Repo https://github.com/ZhiwenShao/PyTorch-JAANet
Framework pytorch

A Large Scale Event-based Detection Dataset for Automotive

Title A Large Scale Event-based Detection Dataset for Automotive
Authors Pierre de Tournemire, Davide Nitti, Etienne Perot, Davide Migliore, Amos Sironi
Abstract We introduce the first very large detection dataset for event cameras. The dataset is composed of more than 39 hours of automotive recordings acquired with a 304x240 ATIS sensor. It contains open roads and very diverse driving scenarios, ranging from urban, highway, suburbs and countryside scenes, as well as different weather and illumination conditions. Manual bounding box annotations of cars and pedestrians contained in the recordings are also provided at a frequency between 1 and 4Hz, yielding more than 255,000 labels in total. We believe that the availability of a labeled dataset of this size will contribute to major advances in event-based vision tasks such as object detection and classification. We also expect benefits in other tasks such as optical flow, structure from motion and tracking, where for example, the large amount of data can be leveraged by self-supervised learning methods.
Tasks Event-based vision, Object Detection, Optical Flow Estimation
Published 2020-01-23
URL https://arxiv.org/abs/2001.08499v3
PDF https://arxiv.org/pdf/2001.08499v3.pdf
PWC https://paperswithcode.com/paper/a-large-scale-event-based-detection-dataset
Repo https://github.com/prophesee-ai/prophesee-automotive-dataset-toolbox
Framework none

CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking

Title CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking
Authors Grigori Fursin, Herve Guillou, Nicolas Essayan
Abstract We present CodeReef - an open platform to share all the components necessary to enable cross-platform MLOps (MLSysOps), i.e. automating the deployment of ML models across diverse systems in the most efficient way. We also introduce the CodeReef solution - a way to package and share models as non-virtualized, portable, customizable and reproducible archive files. Such ML packages include JSON meta description of models with all dependencies, Python APIs, CLI actions and portable workflows necessary to automatically build, benchmark, test and customize models across diverse platforms, AI frameworks, libraries, compilers and datasets. We demonstrate several CodeReef solutions to automatically build, run and measure object detection based on SSD-Mobilenets, TensorFlow and COCO dataset from the latest MLPerf inference benchmark across a wide range of platforms from Raspberry Pi, Android phones and IoT devices to data centers. Our long-term goal is to help researchers share their new techniques as production-ready packages along with research papers to participate in collaborative and reproducible benchmarking, compare the different ML/software/hardware stacks and select the most efficient ones on a Pareto frontier using online CodeReef dashboards.
Tasks Object Detection
Published 2020-01-22
URL https://arxiv.org/abs/2001.07935v2
PDF https://arxiv.org/pdf/2001.07935v2.pdf
PWC https://paperswithcode.com/paper/codereef-an-open-platform-for-portable-mlops
Repo https://github.com/ctuning/ck
Framework none

LaProp: a Better Way to Combine Momentum with Adaptive Gradient

Title LaProp: a Better Way to Combine Momentum with Adaptive Gradient
Authors Liu Ziyin, Zhikang T. Wang, Masahito Ueda
Abstract Identifying a divergence problem in Adam, we propose a new optimizer, LaProp, which belongs to the family of adaptive gradient descent methods. This method allows for greater flexibility in choosing its hyperparameters, mitigates the effort of fine tuning, and permits straightforward interpolation between the signed gradient methods and the adaptive gradient methods. We bound the regret of LaProp on a convex problem and show that our bound differs from the previous methods by a key factor, which demonstrates its advantage. We experimentally show that LaProp outperforms the previous methods on a toy task with noisy gradients, optimization of extremely deep fully-connected networks, neural art style transfer, natural language processing using transformers, and reinforcement learning with deep-Q networks. The performance improvement of LaProp is shown to be consistent, sometimes dramatic and qualitative.
Tasks Style Transfer
Published 2020-02-12
URL https://arxiv.org/abs/2002.04839v1
PDF https://arxiv.org/pdf/2002.04839v1.pdf
PWC https://paperswithcode.com/paper/laprop-a-better-way-to-combine-momentum-with
Repo https://github.com/Z-T-WANG/LaProp-Optimizer
Framework pytorch
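
To make the idea of combining momentum with an adaptive gradient concrete, here is a minimal sketch of one update step. It assumes the update divides the gradient by the bias-corrected second-moment estimate before accumulating momentum, whereas Adam accumulates momentum first and divides afterwards. The function name, state layout, and default hyperparameters are illustrative assumptions; the linked repository is the reference implementation.

```python
import math


def laprop_step(params, grads, state, lr=4e-4, mu=0.9, nu=0.999, eps=1e-15):
    """One optimizer step over flat lists of floats.
    ASSUMPTION: normalize-then-momentum ordering as described in the
    lead-in; hyperparameter defaults are illustrative, not authoritative."""
    state["t"] += 1
    t = state["t"]
    c_m = 1.0 - mu ** t  # bias correction for the momentum
    c_n = 1.0 - nu ** t  # bias correction for the second moment
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Running average of squared gradients.
        state["n"][i] = nu * state["n"][i] + (1.0 - nu) * g * g
        # Normalize the gradient first, then feed it into the momentum.
        normalized = g / (math.sqrt(state["n"][i] / c_n) + eps)
        state["m"][i] = mu * state["m"][i] + (1.0 - mu) * normalized
        new_params.append(p - lr * state["m"][i] / c_m)
    return new_params


# Usage: minimize f(x) = x^2 from x = 1.0 (gradient is 2x).
x = [1.0]
opt_state = {"t": 0, "m": [0.0], "n": [0.0]}
for _ in range(100):
    x = laprop_step(x, [2.0 * x[0]], opt_state, lr=0.01)
```

With this ordering the momentum buffer only ever accumulates normalized gradients, so its magnitude does not depend on the second-moment decay rate.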

Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding

Title Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding
Authors Yibo Yang, Robert Bamler, Stephan Mandt
Abstract Deep Bayesian latent variable models have enabled new approaches to both model and data compression. Here, we propose a new algorithm for compressing latent representations in deep probabilistic models, such as variational autoencoders, in post-processing. The approach thus separates model design and training from the compression task. Our algorithm generalizes arithmetic coding to the continuous domain, using adaptive discretization accuracy that exploits estimates of posterior uncertainty. A consequence of the “plug and play” nature of our approach is that various rate-distortion trade-offs can be achieved with a single trained model, eliminating the need to train multiple models for different bit rates. Our experimental results demonstrate the importance of taking into account posterior uncertainties, and show that image compression with the proposed algorithm outperforms JPEG over a wide range of bit rates using only a single machine learning model. Further experiments on Bayesian neural word embeddings demonstrate the versatility of the proposed method.
Tasks Image Compression, Latent Variable Models, Word Embeddings
Published 2020-02-18
URL https://arxiv.org/abs/2002.08158v1
PDF https://arxiv.org/pdf/2002.08158v1.pdf
PWC https://paperswithcode.com/paper/variable-bitrate-neural-compression-via
Repo https://github.com/mandt-lab/bayesian-ac
Framework tf

High-Fidelity Synthesis with Disentangled Representation

Title High-Fidelity Synthesis with Disentangled Representation
Authors Wonkwang Lee, Donggyun Kim, Seunghoon Hong, Honglak Lee
Abstract Learning disentangled representation of data without supervision is an important step towards improving the interpretability of generative models. Despite recent advances in disentangled representation learning, existing approaches often suffer from the trade-off between representation learning and generation performance (i.e. improving generation quality sacrifices disentanglement performance). We propose an Information-Distillation Generative Adversarial Network (ID-GAN), a simple yet generic framework that easily incorporates the existing state-of-the-art models for both disentanglement learning and high-fidelity synthesis. Our method learns disentangled representation using VAE-based models, and distills the learned representation with an additional nuisance variable to the separate GAN-based generator for high-fidelity synthesis. To ensure that both generative models are aligned to render the same generative factors, we further constrain the GAN generator to maximize the mutual information between the learned latent code and the output. Despite the simplicity, we show that the proposed method is highly effective, achieving comparable image generation quality to the state-of-the-art methods using the disentangled representation. We also show that the proposed decomposition leads to an efficient and stable model design, and we demonstrate photo-realistic high-resolution image synthesis results (1024x1024 pixels) for the first time using the disentangled representations.
Tasks Image Generation, Representation Learning
Published 2020-01-13
URL https://arxiv.org/abs/2001.04296v1
PDF https://arxiv.org/pdf/2001.04296v1.pdf
PWC https://paperswithcode.com/paper/high-fidelity-synthesis-with-disentangled
Repo https://github.com/rosinality/id-gan-pytorch
Framework pytorch

A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs

Title A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs
Authors Zequn Sun, Qingheng Zhang, Wei Hu, Chengming Wang, Muhao Chen, Farahnaz Akrami, Chengkai Li
Abstract Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the same real-world object. Recent advancement in KG embedding impels the advent of embedding-based entity alignment, which encodes entities in a continuous embedding space and measures entity similarities based on the learned embeddings. In this paper, we conduct a comprehensive experimental study of this emerging field. This study surveys 23 recent embedding-based entity alignment approaches and categorizes them based on their techniques and characteristics. We further observe that current approaches use different datasets in evaluation, and the degree distributions of entities in these datasets are inconsistent with real KGs. Hence, we propose a new KG sampling algorithm, with which we generate a set of dedicated benchmark datasets with various heterogeneity and distributions for a realistic evaluation. This study also produces an open-source library, which includes 12 representative embedding-based entity alignment approaches. We extensively evaluate these approaches on the generated datasets, to understand their strengths and limitations. Additionally, for several directions that have not been explored in current approaches, we perform exploratory experiments and report our preliminary findings for future studies. The benchmark datasets, open-source library and experimental results are all accessible online and will be duly maintained.
Tasks Entity Alignment, Knowledge Graphs
Published 2020-03-10
URL https://arxiv.org/abs/2003.07743v1
PDF https://arxiv.org/pdf/2003.07743v1.pdf
PWC https://paperswithcode.com/paper/a-benchmarking-study-of-embedding-based
Repo https://github.com/nju-websoft/OpenEA
Framework tf
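
The core step shared by embedding-based entity alignment, measuring entity similarity in the learned embedding space, can be sketched as a nearest-neighbor search. The example below assumes cosine similarity and greedy matching; the entity names and two-dimensional vectors are made up for illustration, and the surveyed approaches differ in how embeddings are learned and compared.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(kg1_emb, kg2_emb):
    """Map each KG1 entity to its most similar KG2 entity.
    ASSUMPTION: greedy nearest neighbor, so several KG1 entities may
    pick the same KG2 entity."""
    return {
        e1: max(kg2_emb, key=lambda e2: cosine(v1, kg2_emb[e2]))
        for e1, v1 in kg1_emb.items()
    }


# Hypothetical embeddings for two tiny knowledge graphs.
kg_en = {"Paris_en": [1.0, 0.0], "Berlin_en": [0.0, 1.0]}
kg_fr = {"Paris_fr": [0.9, 0.1], "Berlin_fr": [0.2, 0.8]}
matches = align(kg_en, kg_fr)
```

Predicted matches are typically scored against a held-out set of known aligned pairs.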

Learning Dynamic Routing for Semantic Segmentation

Title Learning Dynamic Routing for Semantic Segmentation
Authors Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun
Abstract Recently, numerous handcrafted and searched networks have been applied for semantic segmentation. However, previous works intend to handle inputs with various scales in pre-defined static architectures, such as FCN, U-Net, and DeepLab series. This paper studies a conceptually new method to alleviate the scale variance in semantic representation, named dynamic routing. The proposed framework generates data-dependent routes, adapting to the scale distribution of each image. To this end, a differentiable gating function, called soft conditional gate, is proposed to select scale transform paths on the fly. In addition, the computational cost can be further reduced in an end-to-end manner by giving budget constraints to the gating function. We further relax the network level routing space to support multi-path propagations and skip-connections in each forward, bringing substantial network capacity. To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space. Extensive experiments are conducted on Cityscapes and PASCAL VOC 2012 to illustrate the effectiveness of the dynamic framework. Code is available at https://github.com/yanwei-li/DynamicRouting.
Tasks Semantic Segmentation
Published 2020-03-23
URL https://arxiv.org/abs/2003.10401v1
PDF https://arxiv.org/pdf/2003.10401v1.pdf
PWC https://paperswithcode.com/paper/learning-dynamic-routing-for-semantic
Repo https://github.com/yanwei-li/DynamicRouting
Framework pytorch
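
A differentiable gate that selects paths on the fly can be sketched as below. The sketch assumes the gate has the form max(0, tanh(x)), which is exactly zero for non-positive inputs (so that path can be skipped) and smooth for positive inputs. This form, the scalar path outputs, and the function names are assumptions for illustration; see the linked repository for the actual implementation.

```python
import math


def soft_conditional_gate(logit):
    """ASSUMED gate form: max(0, tanh(x)). Returns 0.0 for a closed path
    and a weight in (0, 1) for an open one."""
    return max(0.0, math.tanh(logit))


def aggregate_paths(path_outputs, gate_logits):
    """Weighted sum of candidate path outputs (scalars here for brevity).
    Paths whose gate is exactly zero are skipped, which is where the
    compute savings of data-dependent routing would come from."""
    total = 0.0
    for output, logit in zip(path_outputs, gate_logits):
        weight = soft_conditional_gate(logit)
        if weight == 0.0:
            continue  # closed path: no need to evaluate or add it
        total += weight * output
    return total


# Three candidate paths (e.g. up-sample, keep, down-sample); only the
# last one is open for this hypothetical input.
routed = aggregate_paths([1.0, 2.0, 3.0], [-1.0, 0.0, 5.0])
```

A budget constraint, as mentioned in the abstract, would add a penalty on how many gates are open; that term is not shown here.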

BiCANet: Bi-directional Contextual Aggregating Network for Image Semantic Segmentation

Title BiCANet: Bi-directional Contextual Aggregating Network for Image Semantic Segmentation
Authors Quan Zhou, Dechun Cong, Bin Kang, Xiaofu Wu, Baoyu Zheng, Huimin Lu, Longin Jan Latecki
Abstract Exploring contextual information in convolutional neural networks (CNNs) has gained substantial attention in recent years for semantic segmentation. This paper introduces a Bi-directional Contextual Aggregating Network, called BiCANet, for semantic segmentation. Unlike previous approaches that encode context in feature space, BiCANet aggregates contextual cues from a categorical perspective and mainly consists of three parts: contextual condensed projection block (CCPB), bi-directional context interaction block (BCIB), and multi-scale contextual fusion block (MCFB). More specifically, CCPB learns a category-based mapping through a split-transform-merge architecture, which condenses contextual cues with different receptive fields from intermediate layers. BCIB, on the other hand, employs dense skip connections to enhance class-level context exchange. Finally, MCFB integrates multi-scale contextual cues by investigating short- and long-range spatial dependencies. To evaluate BiCANet, we have conducted extensive experiments on three semantic segmentation datasets: PASCAL VOC 2012, Cityscapes, and ADE20K. The experimental results demonstrate that BiCANet outperforms recent state-of-the-art networks without any post-processing techniques. In particular, BiCANet achieves mIoU scores of 86.7%, 82.4% and 38.66% on the PASCAL VOC 2012, Cityscapes and ADE20K test sets, respectively.
Tasks Semantic Segmentation
Published 2020-03-21
URL https://arxiv.org/abs/2003.09669v1
PDF https://arxiv.org/pdf/2003.09669v1.pdf
PWC https://paperswithcode.com/paper/bicanet-bi-directional-contextual-aggregating
Repo https://github.com/cdcnjupt/BCANet
Framework none