April 3, 2020

3210 words 16 mins read

Paper Group AWR 1



A Sparse Deep Factorization Machine for Efficient CTR prediction

Title A Sparse Deep Factorization Machine for Efficient CTR prediction
Authors Wei Deng, Junwei Pan, Tian Zhou, Aaron Flores, Guang Lin
Abstract Click-through rate (CTR) prediction is a crucial task in online display advertising, and the key part is to learn important feature interactions. The mainstream models are embedding-based neural networks that provide end-to-end training by incorporating hybrid components to model both low-order and high-order feature interactions. These models, however, slow down prediction inference by at least hundreds of times due to the deep neural network (DNN) component. Considering the challenge of deploying embedding-based neural networks for online advertising, we propose, for the first time, to prune the redundant parameters to accelerate inference and reduce run-time memory usage. Most notably, we can accelerate inference by 46X on the Criteo dataset and 27X on the Avazu dataset without loss of prediction accuracy. In addition, the deep model acceleration makes an efficient model ensemble possible with low latency and significant performance gains.
Tasks Click-Through Rate Prediction
Published 2020-02-17
URL https://arxiv.org/abs/2002.06987v1
PDF https://arxiv.org/pdf/2002.06987v1.pdf
PWC https://paperswithcode.com/paper/a-sparse-deep-factorization-machine-for
Repo https://github.com/WayneDW/sDeepFwFM
Framework pytorch
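
The speedup comes from sparsifying the over-parameterized components. As a rough illustration of the idea only (not the paper's exact pruning schedule, which targets DeepFwFM's components specifically), here is a minimal PyTorch sketch of magnitude pruning applied to a hypothetical stand-in MLP; layer sizes and the 90% ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical stand-in for the DNN component of an embedding-based CTR model.
mlp = nn.Sequential(
    nn.Linear(400, 400), nn.ReLU(),
    nn.Linear(400, 400), nn.ReLU(),
    nn.Linear(400, 1),
)

# Magnitude-prune 90% of each linear layer's weights, then make the mask
# permanent so the zeros persist at inference time.
for module in mlp.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")

linears = [m for m in mlp.modules() if isinstance(m, nn.Linear)]
zeros = sum(int((m.weight == 0).sum()) for m in linears)
total = sum(m.weight.numel() for m in linears)
print(f"overall weight sparsity: {zeros / total:.2%}")
```

Note that sparse weights alone do not make anything faster; the reported speedups require exploiting the sparsity (e.g., sparse kernels) at serving time.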

Probability Weighted Compact Feature for Domain Adaptive Retrieval

Title Probability Weighted Compact Feature for Domain Adaptive Retrieval
Authors Fuxiang Huang, Lei Zhang, Yang Yang, Xichuan Zhou
Abstract Domain adaptive image retrieval includes single-domain retrieval and cross-domain retrieval. Most existing image retrieval methods focus only on single-domain retrieval, which assumes that the distributions of the retrieval database and the queries are similar. In practical applications, however, the discrepancies between retrieval databases, often captured under ideal illumination/pose/background/camera conditions, and queries, usually obtained under uncontrolled conditions, are very large. In this paper, considering the practical application, we focus on challenging cross-domain retrieval. To address the problem, we propose an effective method named Probability Weighted Compact Feature Learning (PWCF), which provides inter-domain correlation guidance to promote cross-domain retrieval accuracy and learns a series of compact binary codes to improve retrieval speed. First, we derive our loss functions through Maximum A Posteriori (MAP) estimation: a Bayesian Perspective (BP) induced focal-triplet loss, a BP-induced quantization loss, and a BP-induced classification loss. Second, we propose a common manifold structure between domains to explore the potential correlation across domains. Since the original feature representation is biased by the inter-domain discrepancy, the manifold structure is difficult to construct. We therefore propose a new feature, the Histogram Feature of Neighbors (HFON), built from sample statistics. Extensive experiments on various benchmark databases validate that our method outperforms many state-of-the-art image retrieval methods for domain adaptive image retrieval. The source code is available at https://github.com/fuxianghuang1/PWCF
Tasks Image Retrieval, Quantization
Published 2020-03-06
URL https://arxiv.org/abs/2003.03293v1
PDF https://arxiv.org/pdf/2003.03293v1.pdf
PWC https://paperswithcode.com/paper/probability-weighted-compact-feature-for
Repo https://github.com/fuxianghuang1/PWCF
Framework none
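
The HFON feature is only sketched in the abstract; one plausible reading, which the toy implementation below assumes, is a per-sample class histogram over the labels (or pseudo-labels) of its nearest neighbors, so a sample is described by neighborhood statistics rather than by its biased raw features. All names and sizes here are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hfon(features, labels, n_classes, k=10):
    """Histogram-of-neighbors feature: for each sample, the normalized class
    histogram of its k nearest neighbors in the original feature space."""
    nn_index = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn_index.kneighbors(features)   # idx[:, 0] is the sample itself
    neighbor_labels = labels[idx[:, 1:]]     # shape (n_samples, k)
    hist = np.stack([np.bincount(row, minlength=n_classes)
                     for row in neighbor_labels])
    return hist / k

# Toy usage with random data.
X = np.random.randn(100, 32)
y = np.random.randint(0, 5, size=100)
print(hfon(X, y, n_classes=5).shape)   # (100, 5)
```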

Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning

Title Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning
Authors Tianyi Wang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Qiong Zhang
Abstract Multi-role dialogue understanding comprises a wide range of diverse tasks such as question answering, act classification, and dialogue summarization. While dialogue corpora are abundantly available, labeled data for specific learning tasks can be highly scarce and expensive. In this work, we investigate dialogue context representation learning with various types of unsupervised pretraining tasks, where the training objectives arise naturally from the nature of the utterances and the structure of the multi-role conversation. Moreover, to locate essential information for dialogue summarization/extraction, the pretraining process enables external knowledge integration. The proposed fine-tuned pretraining mechanism is comprehensively evaluated on three different dialogue datasets along with a number of downstream dialogue-mining tasks. Results show that the proposed pretraining mechanism contributes significantly to all the downstream tasks regardless of the choice of encoder.
Tasks Dialogue Understanding, Question Answering, Representation Learning
Published 2020-02-27
URL https://arxiv.org/abs/2003.04994v1
PDF https://arxiv.org/pdf/2003.04994v1.pdf
PWC https://paperswithcode.com/paper/masking-orchestration-multi-task-pretraining
Repo https://github.com/wangtianyiftd/dialogue_pretrain
Framework none
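
As a hedged illustration of one such structure-derived objective (the paper orchestrates several masking tasks; this is not its exact formulation), the sketch below masks whole utterances, optionally restricted to one speaker role, so a model can be trained to recover them from the surrounding context. The function and the toy dialogue are illustrative.

```python
import random

def mask_utterances(dialogue, mask_token="[MASK]", p=0.15, target_role=None):
    """Randomly replace whole utterances with a mask token.
    `dialogue` is a list of (role, utterance) pairs; the returned targets
    are the utterances the model should reconstruct from context."""
    masked, targets = [], []
    for role, utt in dialogue:
        eligible = target_role is None or role == target_role
        if eligible and random.random() < p:
            masked.append((role, mask_token))
            targets.append(utt)
        else:
            masked.append((role, utt))
    return masked, targets

dialogue = [("judge", "Please state your claim."),
            ("plaintiff", "The defendant owes me 3000 yuan."),
            ("defendant", "I already repaid half of it.")]
print(mask_utterances(dialogue, p=0.5))
```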

Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features

Title Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features
Authors Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas
Abstract Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problems of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the intrinsic relation between the two modalities. The novelty of the proposed model consists in the use of a PHOC descriptor to construct a bag of textual words, along with a Fisher Vector encoding that captures the morphology of text. This approach provides a stronger multimodal representation for these tasks and, as our experiments demonstrate, achieves state-of-the-art results on two different tasks: fine-grained classification and image retrieval.
Tasks Fine-Grained Image Classification, Image Classification, Image Retrieval, Question Answering, Visual Question Answering
Published 2020-01-14
URL https://arxiv.org/abs/2001.04732v1
PDF https://arxiv.org/pdf/2001.04732v1.pdf
PWC https://paperswithcode.com/paper/fine-grained-image-classification-and
Repo https://github.com/DreadPiratePsyopus/Fine_Grained_Clf
Framework pytorch
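
A PHOC (Pyramidal Histogram of Characters) descriptor encodes, at several pyramid levels, which characters occur in which region of a word. The sketch below is a simplified version that assigns each character to the region containing its midpoint; the original formulation (Almazán et al.) uses an occupancy-overlap rule and also adds bigram levels.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def phoc(word, levels=(1, 2, 3)):
    """Simplified PHOC: at each pyramid level the word is split into equal
    regions, and each region gets a binary histogram of the characters
    whose midpoint falls inside it."""
    word = word.lower()
    n = max(len(word), 1)
    vec = []
    for level in levels:
        hist = np.zeros((level, len(ALPHABET)))
        for i, ch in enumerate(word):
            if ch not in ALPHABET:
                continue
            mid = (i + 0.5) / n                    # normalized position
            region = min(int(mid * level), level - 1)
            hist[region, ALPHABET.index(ch)] = 1
        vec.append(hist.ravel())
    return np.concatenate(vec)

print(phoc("text").shape)   # (1 + 2 + 3) * 36 = (216,)
```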

SQLFlow: A Bridge between SQL and Machine Learning

Title SQLFlow: A Bridge between SQL and Machine Learning
Authors Yi Wang, Yang Yang, Weiguo Zhu, Yi Wu, Xu Yan, Yongfeng Liu, Yu Wang, Liang Xie, Ziyao Gao, Wenjing Zhu, Xiang Chen, Wei Yan, Mingjie Tang, Yuan Tang
Abstract Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs focusing on the purpose (what) while ignoring the procedure (how). Previous database systems extended their SQL dialects to support ML. SQLFlow (https://sqlflow.org/sqlflow) takes another strategy: it works as a bridge over various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines like TensorFlow, XGBoost, and scikit-learn. We extended the SQL syntax carefully so that the extension works with various SQL dialects, and we implement it with a collaborative parsing algorithm. SQLFlow is efficient and expressive enough for a wide variety of ML techniques – supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.
Tasks
Published 2020-01-19
URL https://arxiv.org/abs/2001.06846v1
PDF https://arxiv.org/pdf/2001.06846v1.pdf
PWC https://paperswithcode.com/paper/sqlflow-a-bridge-between-sql-and-machine
Repo https://github.com/sql-machine-learning/sqlflow
Framework tf
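
To make the extension concrete, here is a representative statement following SQLFlow's documented syntax: everything before TO TRAIN is ordinary SQL executed by the underlying database, and the trailing clauses tell the ML engine what to train. The table and model names are the project's classic iris example, used here for illustration.

```sql
SELECT sepal_length, sepal_width, petal_length, petal_width, class
FROM iris.train
TO TRAIN DNNClassifier            -- hand the result set to an ML engine
WITH model.n_classes = 3, model.hidden_units = [10, 20]
LABEL class                       -- which column is the training label
INTO sqlflow_models.my_dnn_model; -- where to store the trained model
```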

Representation Learning for Medical Data

Title Representation Learning for Medical Data
Authors Karol Antczak
Abstract We propose a representation learning framework for the medical diagnosis domain. It is based on a heterogeneous network model of diagnostic data together with a modified metapath2vec algorithm for learning latent node representations. We compare the proposed algorithm with other representation learning methods in two practical case studies: symptom/disease classification and disease prediction. We observe a significant performance boost in these tasks resulting from learning representations of domain data in the form of a heterogeneous network.
Tasks Disease Prediction, Medical Diagnosis, Representation Learning
Published 2020-01-22
URL https://arxiv.org/abs/2001.08269v1
PDF https://arxiv.org/pdf/2001.08269v1.pdf
PWC https://paperswithcode.com/paper/representation-learning-for-medical-data
Repo https://github.com/KarolAntczak/multimetapath2vec
Framework none
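
For intuition, a vanilla metapath2vec pipeline (the paper uses a modified variant) generates type-constrained random walks over the heterogeneous network and feeds them to a skip-gram model. The toy patient/symptom/disease graph below is purely illustrative.

```python
import random
from collections import defaultdict
from gensim.models import Word2Vec

# Toy heterogeneous diagnosis graph: patients (P), symptoms (S), diseases (D).
edges = [("patient1", "fever"), ("patient1", "flu"),
         ("patient2", "fever"), ("patient2", "cold")]
node_type = {"patient1": "P", "patient2": "P",
             "fever": "S", "flu": "D", "cold": "D"}
adj = defaultdict(list)
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

def metapath_walk(start, type_seq, length):
    """Random walk that only steps to neighbors matching the metapath."""
    walk, cur = [start], start
    for i in range(1, length):
        wanted = type_seq[i % len(type_seq)]
        candidates = [n for n in adj[cur] if node_type[n] == wanted]
        if not candidates:
            break
        cur = random.choice(candidates)
        walk.append(cur)
    return walk

# Walks following the P-S-P-S... metapath, then skip-gram embeddings.
walks = [metapath_walk(p, ["P", "S"], 8)
         for p in ("patient1", "patient2") for _ in range(20)]
model = Word2Vec(walks, vector_size=16, window=2, min_count=1, sg=1)
print(model.wv["fever"][:4])
```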

JAA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention

Title JAA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention
Authors Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma
Abstract Facial action unit (AU) detection and face alignment are two highly correlated tasks, since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection. However, most existing AU detection works handle the two tasks independently by treating face alignment as preprocessing, and often use landmarks to predefine a fixed region or attention for each AU. In this paper, we propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, a multi-scale shared feature is learned first, and the high-level feature of face alignment is fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively. Finally, the assembled local features are integrated with the face alignment feature and the global feature for AU detection. Extensive experiments demonstrate that our framework (i) significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks, (ii) can adaptively capture the irregular region of each AU, (iii) achieves competitive performance for face alignment, and (iv) also works well under partial occlusions and non-frontal poses. The code for our method is available at https://github.com/ZhiwenShao/PyTorch-JAANet.
Tasks Action Unit Detection, Face Alignment, Facial Action Unit Detection
Published 2020-03-18
URL https://arxiv.org/abs/2003.08834v1
PDF https://arxiv.org/pdf/2003.08834v1.pdf
PWC https://paperswithcode.com/paper/jaa-net-joint-facial-action-unit-detection
Repo https://github.com/ZhiwenShao/PyTorch-JAANet
Framework pytorch
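
The joint training amounts to optimizing the AU detection and alignment objectives together, so that alignment features can guide AU attention. The sketch below shows only this joint structure with generic losses; the actual JAA-Net uses weighted multi-label losses plus an attention refinement term, and the AU/landmark counts here are illustrative.

```python
import torch
import torch.nn.functional as F

def joint_loss(au_logits, au_labels, pred_landmarks, gt_landmarks, lam=0.5):
    """Joint objective: multi-label AU classification plus landmark
    regression, weighted by lam and optimized end to end."""
    au_loss = F.binary_cross_entropy_with_logits(au_logits, au_labels)
    align_loss = F.mse_loss(pred_landmarks, gt_landmarks)
    return au_loss + lam * align_loss

au_logits = torch.randn(8, 12)                     # 12 AUs, batch of 8
au_labels = torch.randint(0, 2, (8, 12)).float()
pred = torch.randn(8, 68, 2)                       # 68 2-D landmarks
gt = torch.randn(8, 68, 2)
print(joint_loss(au_logits, au_labels, pred, gt))
```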

A Large Scale Event-based Detection Dataset for Automotive

Title A Large Scale Event-based Detection Dataset for Automotive
Authors Pierre de Tournemire, Davide Nitti, Etienne Perot, Davide Migliore, Amos Sironi
Abstract We introduce the first very large detection dataset for event cameras. The dataset is composed of more than 39 hours of automotive recordings acquired with a 304x240 ATIS sensor. It contains open roads and very diverse driving scenarios, including urban, highway, suburban, and countryside scenes, as well as different weather and illumination conditions. Manual bounding box annotations of the cars and pedestrians contained in the recordings are also provided at a frequency between 1 and 4 Hz, yielding more than 255,000 labels in total. We believe that the availability of a labeled dataset of this size will contribute to major advances in event-based vision tasks such as object detection and classification. We also expect benefits in other tasks such as optical flow, structure from motion and tracking, where, for example, the large amount of data can be leveraged by self-supervised learning methods.
Tasks Event-based vision, Object Detection, Optical Flow Estimation
Published 2020-01-23
URL https://arxiv.org/abs/2001.08499v3
PDF https://arxiv.org/pdf/2001.08499v3.pdf
PWC https://paperswithcode.com/paper/a-large-scale-event-based-detection-dataset
Repo https://github.com/prophesee-ai/prophesee-automotive-dataset-toolbox
Framework none
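
The recordings ship in Prophesee's event format, decoded by the linked toolbox. A common way to consume such data with standard CNN detectors, shown below as a generic sketch (not the toolbox's own API), is to accumulate a slice of events into a per-polarity count image at the ATIS sensor's 304x240 resolution.

```python
import numpy as np

def events_to_frame(x, y, polarity, height=240, width=304):
    """Accumulate events into a 2-channel count image, one channel per
    polarity - a common dense representation for feeding events to CNNs."""
    frame = np.zeros((2, height, width), dtype=np.float32)
    np.add.at(frame, (polarity, y, x), 1.0)   # unbuffered scatter-add
    return frame

# Toy events at the sensor resolution.
n = 1000
x = np.random.randint(0, 304, n)
y = np.random.randint(0, 240, n)
p = np.random.randint(0, 2, n)
print(events_to_frame(x, y, p).sum())   # 1000.0
```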

CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking

Title CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking
Authors Grigori Fursin, Herve Guillou, Nicolas Essayan
Abstract We present CodeReef - an open platform to share all the components necessary to enable cross-platform MLOps (MLSysOps), i.e., automating the deployment of ML models across diverse systems in the most efficient way. We also introduce the CodeReef solution - a way to package and share models as non-virtualized, portable, customizable and reproducible archive files. Such ML packages include a JSON meta-description of the model with all dependencies, Python APIs, CLI actions and portable workflows necessary to automatically build, benchmark, test and customize models across diverse platforms, AI frameworks, libraries, compilers and datasets. We demonstrate several CodeReef solutions that automatically build, run and measure object detection based on SSD-MobileNets, TensorFlow and the COCO dataset from the latest MLPerf inference benchmark across a wide range of platforms, from Raspberry Pi, Android phones and IoT devices to data centers. Our long-term goal is to help researchers share their new techniques as production-ready packages along with their research papers, participate in collaborative and reproducible benchmarking, compare different ML/software/hardware stacks, and select the most efficient ones on a Pareto frontier using online CodeReef dashboards.
Tasks Object Detection
Published 2020-01-22
URL https://arxiv.org/abs/2001.07935v2
PDF https://arxiv.org/pdf/2001.07935v2.pdf
PWC https://paperswithcode.com/paper/codereef-an-open-platform-for-portable-mlops
Repo https://github.com/ctuning/ck
Framework none

LaProp: a Better Way to Combine Momentum with Adaptive Gradient

Title LaProp: a Better Way to Combine Momentum with Adaptive Gradient
Authors Liu Ziyin, Zhikang T. Wang, Masahito Ueda
Abstract Identifying a divergence problem in Adam, we propose a new optimizer, LaProp, which belongs to the family of adaptive gradient descent methods. This method allows for greater flexibility in choosing its hyperparameters, mitigates the effort of fine-tuning, and permits straightforward interpolation between the signed gradient methods and the adaptive gradient methods. We bound the regret of LaProp on a convex problem and show that our bound differs from those of previous methods by a key factor, which demonstrates its advantage. We experimentally show that LaProp outperforms the previous methods on a toy task with noisy gradients, optimization of extremely deep fully-connected networks, neural art style transfer, natural language processing using transformers, and reinforcement learning with deep Q-networks. The performance improvement of LaProp is shown to be consistent, and sometimes dramatic and qualitative.
Tasks Style Transfer
Published 2020-02-12
URL https://arxiv.org/abs/2002.04839v1
PDF https://arxiv.org/pdf/2002.04839v1.pdf
PWC https://paperswithcode.com/paper/laprop-a-better-way-to-combine-momentum-with
Repo https://github.com/Z-T-WANG/LaProp-Optimizer
Framework pytorch
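
The core difference from Adam is the order of operations: Adam accumulates momentum of the raw gradient and then divides by the second-moment estimate, while LaProp divides first and accumulates momentum of the already-normalized gradient. Below is a minimal single-tensor sketch of the update as described in the paper; the released optimizer additionally handles weight decay and per-parameter state, and the default hyperparameters here are assumptions.

```python
import torch

@torch.no_grad()
def laprop_step(param, grad, state, lr=4e-4, betas=(0.9, 0.999), eps=1e-15):
    """One LaProp update: normalize the gradient by the running second
    moment first, then apply momentum to the normalized gradient."""
    beta1, beta2 = betas
    state["step"] += 1
    t = state["step"]
    m, v = state["m"], state["v"]
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)   # 2nd moment
    v_hat = v / (1 - beta2 ** t)                          # bias correction
    m.mul_(beta1).add_(grad / (v_hat.sqrt() + eps), alpha=1 - beta1)
    param.add_(m / (1 - beta1 ** t), alpha=-lr)           # bias-corrected step

w = torch.zeros(3)
state = {"step": 0, "m": torch.zeros(3), "v": torch.zeros(3)}
for _ in range(5):
    laprop_step(w, torch.randn(3), state)
print(w)
```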

Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding

Title Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding
Authors Yibo Yang, Robert Bamler, Stephan Mandt
Abstract Deep Bayesian latent variable models have enabled new approaches to both model and data compression. Here, we propose a new algorithm for compressing latent representations in deep probabilistic models, such as variational autoencoders, in post-processing. The approach thus separates model design and training from the compression task. Our algorithm generalizes arithmetic coding to the continuous domain, using adaptive discretization accuracy that exploits estimates of posterior uncertainty. A consequence of the “plug and play” nature of our approach is that various rate-distortion trade-offs can be achieved with a single trained model, eliminating the need to train multiple models for different bit rates. Our experimental results demonstrate the importance of taking into account posterior uncertainties, and show that image compression with the proposed algorithm outperforms JPEG over a wide range of bit rates using only a single machine learning model. Further experiments on Bayesian neural word embeddings demonstrate the versatility of the proposed method.
Tasks Image Compression, Latent Variable Models, Word Embeddings
Published 2020-02-18
URL https://arxiv.org/abs/2002.08158v1
PDF https://arxiv.org/pdf/2002.08158v1.pdf
PWC https://paperswithcode.com/paper/variable-bitrate-neural-compression-via
Repo https://github.com/mandt-lab/bayesian-ac
Framework tf
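
The key "adaptive discretization" idea can be caricatured in a few lines: quantize each latent dimension on a grid whose spacing scales with its posterior standard deviation, so uncertain dimensions are coded coarsely (few bits) and confident ones finely. The sketch below shows only this grid; the scaling rule is an assumption, and the actual method generalizes arithmetic coding over such discretized values rather than simple rounding.

```python
import numpy as np

def quantize_latents(mu, sigma, base_step=0.5):
    """Uncertainty-adaptive rounding: grid spacing per dimension is
    proportional to the posterior standard deviation."""
    step = base_step * sigma                    # per-dimension grid spacing
    indices = np.round(mu / step).astype(int)   # integers an entropy coder would encode
    return indices, indices * step              # code indices and reconstruction

mu = np.array([0.03, -1.7, 4.2])      # posterior means of three latents
sigma = np.array([2.0, 0.1, 0.5])     # posterior standard deviations
idx, recon = quantize_latents(mu, sigma)
print(idx, recon)
```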

High-Fidelity Synthesis with Disentangled Representation

Title High-Fidelity Synthesis with Disentangled Representation
Authors Wonkwang Lee, Donggyun Kim, Seunghoon Hong, Honglak Lee
Abstract Learning disentangled representations of data without supervision is an important step towards improving the interpretability of generative models. Despite recent advances in disentangled representation learning, existing approaches often suffer from a trade-off between representation learning and generation performance (i.e., improving generation quality sacrifices disentanglement performance). We propose an Information-Distillation Generative Adversarial Network (ID-GAN), a simple yet generic framework that easily incorporates existing state-of-the-art models for both disentanglement learning and high-fidelity synthesis. Our method learns a disentangled representation using VAE-based models, and distills the learned representation, together with an additional nuisance variable, to a separate GAN-based generator for high-fidelity synthesis. To ensure that both generative models are aligned to render the same generative factors, we further constrain the GAN generator to maximize the mutual information between the learned latent code and the output. Despite its simplicity, we show that the proposed method is highly effective, achieving image generation quality comparable to state-of-the-art methods using the disentangled representation. We also show that the proposed decomposition leads to an efficient and stable model design, and we demonstrate photo-realistic high-resolution image synthesis results (1024x1024 pixels) for the first time using disentangled representations.
Tasks Image Generation, Representation Learning
Published 2020-01-13
URL https://arxiv.org/abs/2001.04296v1
PDF https://arxiv.org/pdf/2001.04296v1.pdf
PWC https://paperswithcode.com/paper/high-fidelity-synthesis-with-disentangled
Repo https://github.com/rosinality/id-gan-pytorch
Framework pytorch
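
A hedged sketch of the generator objective implied by the abstract: a standard non-saturating adversarial term plus an InfoGAN-style mutual-information term that keeps the output predictive of the VAE's disentangled code. The MI term is approximated here, as is common, by code reconstruction under a Gaussian assumption; the paper's exact losses and weighting may differ.

```python
import torch
import torch.nn.functional as F

def id_gan_generator_loss(d_fake_logits, c_true, c_recon, lam=1.0):
    """Adversarial term + mutual-information lower bound (code recovery)."""
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))  # non-saturating GAN loss
    mi = F.mse_loss(c_recon, c_true)  # Gaussian MI bound ~ code reconstruction
    return adv + lam * mi

d_out = torch.randn(8, 1)        # discriminator logits on generated images
c = torch.randn(8, 10)           # disentangled code from the VAE encoder
c_hat = torch.randn(8, 10)       # code re-estimated from the generated image
print(id_gan_generator_loss(d_out, c, c_hat))
```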

A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs

Title A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs
Authors Zequn Sun, Qingheng Zhang, Wei Hu, Chengming Wang, Muhao Chen, Farahnaz Akrami, Chengkai Li
Abstract Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the same real-world object. Recent advances in KG embedding have spurred embedding-based entity alignment, which encodes entities in a continuous embedding space and measures entity similarities based on the learned embeddings. In this paper, we conduct a comprehensive experimental study of this emerging field. This study surveys 23 recent embedding-based entity alignment approaches and categorizes them based on their techniques and characteristics. We further observe that current approaches use different datasets in evaluation, and that the degree distributions of entities in these datasets are inconsistent with real KGs. Hence, we propose a new KG sampling algorithm, with which we generate a set of dedicated benchmark datasets with various heterogeneity and distributions for a realistic evaluation. This study also produces an open-source library, which includes 12 representative embedding-based entity alignment approaches. We extensively evaluate these approaches on the generated datasets to understand their strengths and limitations. Additionally, for several directions that have not been explored in current approaches, we perform exploratory experiments and report our preliminary findings for future studies. The benchmark datasets, open-source library and experimental results are all accessible online and will be duly maintained.
Tasks Entity Alignment, Knowledge Graphs
Published 2020-03-10
URL https://arxiv.org/abs/2003.07743v1
PDF https://arxiv.org/pdf/2003.07743v1.pdf
PWC https://paperswithcode.com/paper/a-benchmarking-study-of-embedding-based
Repo https://github.com/nju-websoft/OpenEA
Framework tf
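
Evaluation in this field is typically nearest-neighbor retrieval in the shared embedding space. As a generic sketch (not OpenEA's own code), the snippet below computes the standard Hits@k metric, assuming aligned entity pairs share row indices; the synthetic embeddings are illustrative.

```python
import numpy as np

def hits_at_k(src_emb, tgt_emb, k=1):
    """For each source entity, rank all target entities by cosine similarity
    and check whether the true counterpart appears in the top k."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T
    topk = np.argsort(-sim, axis=1)[:, :k]
    return np.mean([i in topk[i] for i in range(len(src))])

rng = np.random.default_rng(0)
e1 = rng.normal(size=(100, 64))
e2 = e1 + 0.1 * rng.normal(size=(100, 64))   # noisy aligned counterparts
print(hits_at_k(e1, e2, k=1))
```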

Learning Dynamic Routing for Semantic Segmentation

Title Learning Dynamic Routing for Semantic Segmentation
Authors Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun
Abstract Recently, numerous handcrafted and searched networks have been applied to semantic segmentation. However, previous works attempt to handle inputs of various scales with pre-defined static architectures, such as FCN, U-Net, and the DeepLab series. This paper studies a conceptually new method to alleviate scale variance in semantic representation, named dynamic routing. The proposed framework generates data-dependent routes, adapting to the scale distribution of each image. To this end, a differentiable gating function, called the soft conditional gate, is proposed to select scale-transform paths on the fly. In addition, the computational cost can be further reduced in an end-to-end manner by placing budget constraints on the gating function. We further relax the network-level routing space to support multi-path propagation and skip connections in each forward pass, bringing substantial network capacity. To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space. Extensive experiments are conducted on Cityscapes and PASCAL VOC 2012 to illustrate the effectiveness of the dynamic framework. Code is available at https://github.com/yanwei-li/DynamicRouting.
Tasks Semantic Segmentation
Published 2020-03-23
URL https://arxiv.org/abs/2003.10401v1
PDF https://arxiv.org/pdf/2003.10401v1.pdf
PWC https://paperswithcode.com/paper/learning-dynamic-routing-for-semantic
Repo https://github.com/yanwei-li/DynamicRouting
Framework pytorch
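
A soft conditional gate, as described, predicts a data-dependent weight per candidate path and must be able to output exact zeros so unused paths can be skipped under a budget constraint. Below is one plausible PyTorch form; the ReLU(tanh(.)) activation is an assumption chosen because it produces weights in [0, 1) with exact zeros, not necessarily the paper's precise choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftConditionalGate(nn.Module):
    """Predicts a weight in [0, 1) for each candidate scale-transform path
    from a squeezed feature; zero-weighted paths can be skipped at inference."""
    def __init__(self, channels, n_paths):
        super().__init__()
        self.fc = nn.Conv2d(channels, n_paths, kernel_size=1)

    def forward(self, x):
        squeezed = F.adaptive_avg_pool2d(x, 1)        # (B, C, 1, 1)
        # ReLU(tanh(.)) allows exact zeros, so the gate prunes paths
        # rather than merely down-weighting them.
        return F.relu(torch.tanh(self.fc(squeezed)))  # (B, n_paths, 1, 1)

gate = SoftConditionalGate(channels=64, n_paths=3)
x = torch.randn(2, 64, 32, 32)
print(gate(x).squeeze())   # per-image path weights
```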

BiCANet: Bi-directional Contextual Aggregating Network for Image Semantic Segmentation

Title BiCANet: Bi-directional Contextual Aggregating Network for Image Semantic Segmentation
Authors Quan Zhou, Dechun Cong, Bin Kang, Xiaofu Wu, Baoyu Zheng, Huimin Lu, Longin Jan Latecki
Abstract Exploring contextual information in convolutional neural networks (CNNs) has gained substantial attention in recent years for semantic segmentation. This paper introduces a Bi-directional Contextual Aggregating Network, called BiCANet, for semantic segmentation. Unlike previous approaches that encode context in feature space, BiCANet aggregates contextual cues from a categorical perspective. It mainly consists of three parts: a contextual condensed projection block (CCPB), a bi-directional context interaction block (BCIB), and a multi-scale contextual fusion block (MCFB). More specifically, CCPB learns a category-based mapping through a split-transform-merge architecture, which condenses contextual cues with different receptive fields from intermediate layers. BCIB, on the other hand, employs dense skip connections to enhance class-level context exchange. Finally, MCFB integrates multi-scale contextual cues by investigating short- and long-range spatial dependencies. To evaluate BiCANet, we have conducted extensive experiments on three semantic segmentation datasets: PASCAL VOC 2012, Cityscapes, and ADE20K. The experimental results demonstrate that BiCANet outperforms recent state-of-the-art networks without any post-processing techniques. In particular, BiCANet achieves mIoU scores of 86.7%, 82.4%, and 38.66% on the PASCAL VOC 2012, Cityscapes, and ADE20K test sets, respectively.
Tasks Semantic Segmentation
Published 2020-03-21
URL https://arxiv.org/abs/2003.09669v1
PDF https://arxiv.org/pdf/2003.09669v1.pdf
PWC https://paperswithcode.com/paper/bicanet-bi-directional-contextual-aggregating
Repo https://github.com/cdcnjupt/BCANet
Framework none