Paper Group AWR 1
A Sparse Deep Factorization Machine for Efficient CTR prediction
Title | A Sparse Deep Factorization Machine for Efficient CTR prediction |
Authors | Wei Deng, Junwei Pan, Tian Zhou, Aaron Flores, Guang Lin |
Abstract | Click-through rate (CTR) prediction is a crucial task in online display advertising, and the key part is to learn important feature interactions. The mainstream models are embedding-based neural networks that provide end-to-end training by incorporating hybrid components to model both low-order and high-order feature interactions. These models, however, slow down prediction inference by at least hundreds of times due to the deep neural network (DNN) component. Considering the challenge of deploying embedding-based neural networks for online advertising, we propose, for the first time, to prune the redundant parameters to accelerate inference and reduce run-time memory usage. Most notably, we can accelerate inference by 46X on the Criteo dataset and 27X on the Avazu dataset without loss of prediction accuracy. In addition, the deep model acceleration makes an efficient model ensemble possible, with low latency and significant performance gains. |
Tasks | Click-Through Rate Prediction |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.06987v1 |
https://arxiv.org/pdf/2002.06987v1.pdf | |
PWC | https://paperswithcode.com/paper/a-sparse-deep-factorization-machine-for |
Repo | https://github.com/WayneDW/sDeepFwFM |
Framework | pytorch |
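
The reported 46X/27X speedups come from sparsifying the DNN component. As a minimal sketch (not the authors' exact procedure), the code below applies one-shot magnitude pruning to a stand-in DNN using PyTorch's built-in pruning utilities; the layer sizes and the 90% sparsity level are illustrative assumptions.

```python
# Minimal sketch: magnitude pruning of the DNN component of an
# embedding-based CTR model. Layer sizes and the 90% sparsity level
# are illustrative assumptions, not the paper's exact settings.
import torch.nn as nn
import torch.nn.utils.prune as prune

dnn = nn.Sequential(            # stand-in for the DNN part of the model
    nn.Linear(400, 400), nn.ReLU(),
    nn.Linear(400, 400), nn.ReLU(),
    nn.Linear(400, 1),
)

# Remove the 90% smallest-magnitude weights in every linear layer.
for module in dnn:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # make the sparsity permanent

zeros = sum((m.weight == 0).sum().item() for m in dnn if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in dnn if isinstance(m, nn.Linear))
print(f"overall weight sparsity: {zeros / total:.2%}")
```

In practice the speedup also depends on sparse-aware inference kernels; pruning alone only zeroes the parameters.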
Probability Weighted Compact Feature for Domain Adaptive Retrieval
Title | Probability Weighted Compact Feature for Domain Adaptive Retrieval |
Authors | Fuxiang Huang, Lei Zhang, Yang Yang, Xichuan Zhou |
Abstract | Domain adaptive image retrieval includes single-domain retrieval and cross-domain retrieval. Most existing image retrieval methods focus only on single-domain retrieval, which assumes that the distributions of retrieval databases and queries are similar. In practical applications, however, the discrepancies between retrieval databases (often captured under ideal illumination/pose/background/camera conditions) and queries (usually obtained under uncontrolled conditions) are very large. In this paper, with practical applications in mind, we focus on the challenging cross-domain retrieval setting. To address the problem, we propose an effective method named Probability Weighted Compact Feature Learning (PWCF), which provides inter-domain correlation guidance to promote cross-domain retrieval accuracy and learns a series of compact binary codes to improve retrieval speed. First, we derive our loss function through Maximum A Posteriori (MAP) estimation: a Bayesian Perspective (BP) induced focal-triplet loss, a BP induced quantization loss, and a BP induced classification loss. Second, we propose a common manifold structure between domains to explore the potential correlation across domains. Because the original feature representation is biased by the inter-domain discrepancy, this manifold structure is difficult to construct. We therefore propose a new feature, the Histogram Feature of Neighbors (HFON), from the sample-statistics perspective. Extensive experiments on various benchmark databases validate that our method outperforms many state-of-the-art image retrieval methods for domain adaptive image retrieval. The source code is available at https://github.com/fuxianghuang1/PWCF |
Tasks | Image Retrieval, Quantization |
Published | 2020-03-06 |
URL | https://arxiv.org/abs/2003.03293v1 |
https://arxiv.org/pdf/2003.03293v1.pdf | |
PWC | https://paperswithcode.com/paper/probability-weighted-compact-feature-for |
Repo | https://github.com/fuxianghuang1/PWCF |
Framework | none |
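
The focal-triplet term is the centerpiece of PWCF's objective. As a hedged sketch of the general idea, the code below shows a triplet loss whose focal-style weight up-weights hard triplets; the particular weighting `(1 - p)^gamma` is our illustrative assumption, not the paper's MAP-derived weights.

```python
# Sketch of a focal-weighted triplet loss in the spirit of PWCF's
# "BP induced focal-triplet" term. The focal weight (1 - p)^gamma over
# a triplet "easiness" probability is an illustrative assumption.
import torch
import torch.nn.functional as F

def focal_triplet_loss(anchor, positive, negative, margin=0.5, gamma=2.0):
    d_ap = F.pairwise_distance(anchor, positive)   # anchor-positive distance
    d_an = F.pairwise_distance(anchor, negative)   # anchor-negative distance
    base = F.relu(d_ap - d_an + margin)            # standard triplet hinge
    # Probability that the triplet is already "easy"; hard triplets
    # (p small) receive larger focal weights.
    p = torch.sigmoid(d_an - d_ap)
    weight = (1.0 - p) ** gamma
    return (weight * base).mean()

a, pos, neg = torch.randn(32, 64), torch.randn(32, 64), torch.randn(32, 64)
print(focal_triplet_loss(a, pos, neg))
```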
Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning
Title | Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning |
Authors | Tianyi Wang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Qiong Zhang |
Abstract | Multi-role dialogue understanding comprises a wide range of diverse tasks such as question answering, act classification, and dialogue summarization. While dialogue corpora are abundantly available, labeled data for specific learning tasks can be highly scarce and expensive. In this work, we investigate dialogue context representation learning with various types of unsupervised pretraining tasks, where the training objectives arise naturally from the nature of the utterances and the structure of the multi-role conversation. Meanwhile, in order to locate essential information for dialogue summarization/extraction, the pretraining process enables external knowledge integration. The proposed fine-tuned pretraining mechanism is comprehensively evaluated on three different dialogue datasets along with a number of downstream dialogue-mining tasks. Results show that the proposed pretraining mechanism significantly contributes to all the downstream tasks, regardless of the choice of encoder. |
Tasks | Dialogue Understanding, Question Answering, Representation Learning |
Published | 2020-02-27 |
URL | https://arxiv.org/abs/2003.04994v1 |
https://arxiv.org/pdf/2003.04994v1.pdf | |
PWC | https://paperswithcode.com/paper/masking-orchestration-multi-task-pretraining |
Repo | https://github.com/wangtianyiftd/dialogue_pretrain |
Framework | none |
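
To make "objectives given naturally by conversation structure" concrete, here is a minimal sketch of one plausible pretraining task: masking a given role's utterances and using the masked texts as reconstruction targets. The masking scheme, the `[MASK_UTT]` token, and the toy dialogue are assumptions, not the paper's exact orchestration.

```python
# Role-aware utterance masking for dialogue pretraining (illustrative).
import random

MASK = "[MASK_UTT]"

def mask_by_role(dialogue, role, p=0.3, seed=0):
    """Mask a fraction of one role's utterances; masked texts become targets."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for speaker, utterance in dialogue:
        if speaker == role and rng.random() < p:
            inputs.append((speaker, MASK))
            targets.append(utterance)
        else:
            inputs.append((speaker, utterance))
    return inputs, targets

dialogue = [("judge", "Please state your claim."),
            ("plaintiff", "The defendant owes me rent."),
            ("defendant", "I already paid in March.")]
print(mask_by_role(dialogue, role="plaintiff", p=1.0))
```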
Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features
Title | Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features |
Authors | Andres Mafla, Sounak Dey, Ali Furkan Biten, Lluis Gomez, Dimosthenis Karatzas |
Abstract | Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problems of fine-grained classification and image retrieval by leveraging textual information along with visual cues to capture the intrinsic relation between the two modalities. The novelty of the proposed model consists in the usage of a PHOC descriptor to construct a bag of textual words, along with a Fisher Vector encoding that captures the morphology of text. This approach provides a stronger multimodal representation for these tasks and, as our experiments demonstrate, achieves state-of-the-art results on two different tasks: fine-grained classification and image retrieval. |
Tasks | Fine-Grained Image Classification, Image Classification, Image Retrieval, Question Answering, Visual Question Answering |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.04732v1 |
https://arxiv.org/pdf/2001.04732v1.pdf | |
PWC | https://paperswithcode.com/paper/fine-grained-image-classification-and |
Repo | https://github.com/DreadPiratePsyopus/Fine_Grained_Clf |
Framework | pytorch |
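
Since the model hinges on PHOC descriptors, a simplified sketch may help: PHOC is a binary vector marking which characters appear in each region of a word at several pyramid levels. The full descriptor also models region overlap and common bigrams; here each character is simply assigned to one region per level, and the 2-5 levels and alphabet are the usual conventions rather than the paper's verified settings.

```python
# Simplified PHOC (Pyramidal Histogram Of Characters) sketch.
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
LEVELS = (2, 3, 4, 5)

def phoc(word):
    word = word.lower()
    parts = []
    for level in LEVELS:
        hist = np.zeros((level, len(ALPHABET)))
        for i, ch in enumerate(word):
            if ch in ALPHABET:
                # Region this character's position falls into at this level.
                region = min(int(i / len(word) * level), level - 1)
                hist[region, ALPHABET.index(ch)] = 1.0
        parts.append(hist.ravel())
    return np.concatenate(parts)

print(phoc("beer").shape)  # (2+3+4+5) * 36 = 504 dimensions
```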
SQLFlow: A Bridge between SQL and Machine Learning
Title | SQLFlow: A Bridge between SQL and Machine Learning |
Authors | Yi Wang, Yang Yang, Weiguo Zhu, Yi Wu, Xu Yan, Yongfeng Liu, Yu Wang, Liang Xie, Ziyao Gao, Wenjing Zhu, Xiang Chen, Wei Yan, Mingjie Tang, Yuan Tang |
Abstract | Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs focusing on the purpose (what) and ignoring the procedure (how). Previous database systems extended their SQL dialects to support ML. SQLFlow (https://sqlflow.org/sqlflow ) takes another strategy: it works as a bridge over various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines like TensorFlow, XGBoost, and scikit-learn. We extended the SQL syntax carefully so that the extension works with various SQL dialects, and we implement the extension with a novel collaborative parsing algorithm. SQLFlow is efficient and expressive enough to cover a wide variety of ML techniques – supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group. |
Tasks | |
Published | 2020-01-19 |
URL | https://arxiv.org/abs/2001.06846v1 |
https://arxiv.org/pdf/2001.06846v1.pdf | |
PWC | https://paperswithcode.com/paper/sqlflow-a-bridge-between-sql-and-machine |
Repo | https://github.com/sql-machine-learning/sqlflow |
Framework | tf |
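
An extended-SQL training statement, modeled on the examples in the SQLFlow documentation, makes the "what, not how" claim concrete; exact attribute names vary by model, so treat this as illustrative. It is wrapped in a Python string here only for presentation.

```python
# Illustrative SQLFlow TO TRAIN statement (modeled on the SQLFlow docs;
# model and attribute names may differ in a given release).
train_stmt = """
SELECT * FROM iris.train
TO TRAIN DNNClassifier
WITH model.hidden_units = [10, 20], model.n_classes = 3
COLUMN sepal_length, sepal_width, petal_length, petal_width
LABEL class
INTO sqlflow_models.my_dnn_model;
"""
print(train_stmt)
```

The standard SELECT prefix runs on the underlying database; everything after TO TRAIN is what SQLFlow's collaborative parser peels off and compiles into the ML workflow.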
Representation Learning for Medical Data
Title | Representation Learning for Medical Data |
Authors | Karol Antczak |
Abstract | We propose a representation learning framework for the medical diagnosis domain. It is based on a heterogeneous network-based model of diagnostic data together with a modified metapath2vec algorithm for learning latent node representations. We compare the proposed algorithm with other representation learning methods in two practical case studies: symptom/disease classification and disease prediction. We observe a significant performance boost in these tasks, resulting from learning representations of domain data in the form of a heterogeneous network. |
Tasks | Disease Prediction, Medical Diagnosis, Representation Learning |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.08269v1 |
https://arxiv.org/pdf/2001.08269v1.pdf | |
PWC | https://paperswithcode.com/paper/representation-learning-for-medical-data |
Repo | https://github.com/KarolAntczak/multimetapath2vec |
Framework | none |
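
metapath2vec-style methods consume random walks whose node types follow a fixed pattern over the heterogeneous graph. The sketch below generates such walks on a toy patient-symptom network; the graph content, node names, and metapath are illustrative assumptions, not the paper's data.

```python
# Metapath-guided random walks on a toy heterogeneous medical network.
import random

# node -> list of (neighbor, neighbor_type); content is illustrative.
graph = {
    "p1":    [("fever", "symptom"), ("flu", "disease")],
    "p2":    [("fever", "symptom")],
    "fever": [("p1", "patient"), ("p2", "patient")],
    "flu":   [("p1", "patient")],
}

def metapath_walk(start, metapath, length, seed=0):
    rng = random.Random(seed)
    walk, node = [start], start
    for step in range(length - 1):
        wanted = metapath[(step + 1) % len(metapath)]  # next required type
        nexts = [n for n, t in graph[node] if t == wanted]
        if not nexts:
            break
        node = rng.choice(nexts)
        walk.append(node)
    return walk

# patient-symptom-patient-symptom-... walk, i.e. symptom-based similarity
print(metapath_walk("p1", ["patient", "symptom"], length=5))
```

The resulting walks would then be fed to a skip-gram model, as in the original metapath2vec.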
JAA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention
Title | JAA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention |
Authors | Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma |
Abstract | Facial action unit (AU) detection and face alignment are two highly correlated tasks, since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection. However, most existing AU detection works handle the two tasks independently by treating face alignment as a preprocessing step, and often use landmarks to predefine a fixed region or attention for each AU. In this paper, we propose a novel end-to-end deep learning framework for joint AU detection and face alignment, which has not been explored before. In particular, multi-scale shared features are learned first, and high-level face alignment features are fed into AU detection. Moreover, to extract precise local features, we propose an adaptive attention learning module that refines the attention map of each AU adaptively. Finally, the assembled local features are integrated with face alignment features and global features for AU detection. Extensive experiments demonstrate that our framework (i) significantly outperforms the state-of-the-art AU detection methods on the challenging BP4D, DISFA, GFT and BP4D+ benchmarks, (ii) can adaptively capture the irregular region of each AU, (iii) achieves competitive performance for face alignment, and (iv) also works well under partial occlusions and non-frontal poses. The code for our method is available at https://github.com/ZhiwenShao/PyTorch-JAANet. |
Tasks | Action Unit Detection, Face Alignment, Facial Action Unit Detection |
Published | 2020-03-18 |
URL | https://arxiv.org/abs/2003.08834v1 |
https://arxiv.org/pdf/2003.08834v1.pdf | |
PWC | https://paperswithcode.com/paper/jaa-net-joint-facial-action-unit-detection |
Repo | https://github.com/ZhiwenShao/PyTorch-JAANet |
Framework | pytorch |
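
A minimal sketch of the adaptive-attention idea: start from a landmark-defined attention map for one AU and let a small convolutional net refine it, instead of keeping the predefined region fixed. The module below is an illustration under assumed shapes, not JAA-Net's exact configuration (see the linked repo for that).

```python
# Illustrative attention refinement: a predefined landmark-based map is
# refined by convolutions conditioned on the shared features.
import torch
import torch.nn as nn

class AttentionRefiner(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels + 1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, features, init_attention):
        # init_attention: (B,1,H,W) predefined map around the AU's landmarks
        att = self.refine(torch.cat([features, init_attention], dim=1))
        return features * att, att   # attended features + refined map

feats = torch.randn(2, 64, 44, 44)
init_att = torch.rand(2, 1, 44, 44)
out, att = AttentionRefiner()(feats, init_att)
print(out.shape, att.shape)
```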
A Large Scale Event-based Detection Dataset for Automotive
Title | A Large Scale Event-based Detection Dataset for Automotive |
Authors | Pierre de Tournemire, Davide Nitti, Etienne Perot, Davide Migliore, Amos Sironi |
Abstract | We introduce the first very large detection dataset for event cameras. The dataset is composed of more than 39 hours of automotive recordings acquired with a 304x240 ATIS sensor. It contains open roads and very diverse driving scenarios, ranging from urban, highway, suburban and countryside scenes, as well as different weather and illumination conditions. Manual bounding box annotations of the cars and pedestrians contained in the recordings are also provided, at a frequency between 1 and 4 Hz, yielding more than 255,000 labels in total. We believe that the availability of a labeled dataset of this size will contribute to major advances in event-based vision tasks such as object detection and classification. We also expect benefits in other tasks such as optical flow, structure from motion and tracking, where, for example, the large amount of data can be leveraged by self-supervised learning methods. |
Tasks | Event-based vision, Object Detection, Optical Flow Estimation |
Published | 2020-01-23 |
URL | https://arxiv.org/abs/2001.08499v3 |
https://arxiv.org/pdf/2001.08499v3.pdf | |
PWC | https://paperswithcode.com/paper/a-large-scale-event-based-detection-dataset |
Repo | https://github.com/prophesee-ai/prophesee-automotive-dataset-toolbox |
Framework | none |
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking
Title | CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking |
Authors | Grigori Fursin, Herve Guillou, Nicolas Essayan |
Abstract | We present CodeReef - an open platform to share all the components necessary to enable cross-platform MLOps (MLSysOps), i.e. automating the deployment of ML models across diverse systems in the most efficient way. We also introduce the CodeReef solution - a way to package and share models as non-virtualized, portable, customizable and reproducible archive files. Such ML packages include a JSON meta-description of models with all dependencies, Python APIs, CLI actions and portable workflows necessary to automatically build, benchmark, test and customize models across diverse platforms, AI frameworks, libraries, compilers and datasets. We demonstrate several CodeReef solutions to automatically build, run and measure object detection based on SSD-MobileNets, TensorFlow and the COCO dataset from the latest MLPerf inference benchmark across a wide range of platforms, from Raspberry Pi, Android phones and IoT devices to data centers. Our long-term goal is to help researchers share their new techniques as production-ready packages along with research papers, participate in collaborative and reproducible benchmarking, compare different ML/software/hardware stacks, and select the most efficient ones on a Pareto frontier using online CodeReef dashboards. |
Tasks | Object Detection |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.07935v2 |
https://arxiv.org/pdf/2001.07935v2.pdf | |
PWC | https://paperswithcode.com/paper/codereef-an-open-platform-for-portable-mlops |
Repo | https://github.com/ctuning/ck |
Framework | none |
LaProp: a Better Way to Combine Momentum with Adaptive Gradient
Title | LaProp: a Better Way to Combine Momentum with Adaptive Gradient |
Authors | Liu Ziyin, Zhikang T. Wang, Masahito Ueda |
Abstract | Identifying a divergence problem in Adam, we propose a new optimizer, LaProp, which belongs to the family of adaptive gradient descent methods. This method allows for greater flexibility in choosing its hyperparameters, mitigates the effort of hyperparameter tuning, and permits straightforward interpolation between the signed gradient methods and the adaptive gradient methods. We bound the regret of LaProp on a convex problem and show that our bound differs from those of the previous methods by a key factor, which demonstrates its advantage. We experimentally show that LaProp outperforms the previous methods on a toy task with noisy gradients, optimization of extremely deep fully-connected networks, neural art style transfer, natural language processing using transformers, and reinforcement learning with deep Q-networks. The performance improvement of LaProp is consistent, and sometimes dramatic and qualitative. |
Tasks | Style Transfer |
Published | 2020-02-12 |
URL | https://arxiv.org/abs/2002.04839v1 |
https://arxiv.org/pdf/2002.04839v1.pdf | |
PWC | https://paperswithcode.com/paper/laprop-a-better-way-to-combine-momentum-with |
Repo | https://github.com/Z-T-WANG/LaProp-Optimizer |
Framework | pytorch |
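
The key change relative to Adam is that momentum is accumulated over the *already normalized* gradient g/sqrt(v), rather than normalizing the accumulated momentum. Below is a simplified single-step sketch with Adam-style bias corrections; consult the linked repo for the exact formulation and recommended hyperparameters.

```python
# Minimal LaProp step (simplified): momentum of the normalized gradient.
import torch

@torch.no_grad()
def laprop_step(p, g, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-15):
    b1, b2 = betas
    state["t"] = state.get("t", 0) + 1
    m = state.setdefault("m", torch.zeros_like(p))
    v = state.setdefault("v", torch.zeros_like(p))
    v.mul_(b2).addcmul_(g, g, value=1 - b2)        # 2nd-moment estimate
    v_hat = v / (1 - b2 ** state["t"])             # bias-corrected
    # Unlike Adam, the momentum buffer stores normalized gradients.
    m.mul_(b1).add_(g / (v_hat.sqrt() + eps), alpha=1 - b1)
    m_hat = m / (1 - b1 ** state["t"])             # bias-corrected momentum
    p.add_(m_hat, alpha=-lr)

p, state = torch.randn(10), {}
laprop_step(p, torch.randn(10), state)
print(p)
```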
Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding
Title | Variable-Bitrate Neural Compression via Bayesian Arithmetic Coding |
Authors | Yibo Yang, Robert Bamler, Stephan Mandt |
Abstract | Deep Bayesian latent variable models have enabled new approaches to both model and data compression. Here, we propose a new algorithm for compressing latent representations in deep probabilistic models, such as variational autoencoders, in post-processing. The approach thus separates model design and training from the compression task. Our algorithm generalizes arithmetic coding to the continuous domain, using adaptive discretization accuracy that exploits estimates of posterior uncertainty. A consequence of the “plug and play” nature of our approach is that various rate-distortion trade-offs can be achieved with a single trained model, eliminating the need to train multiple models for different bit rates. Our experimental results demonstrate the importance of taking into account posterior uncertainties, and show that image compression with the proposed algorithm outperforms JPEG over a wide range of bit rates using only a single machine learning model. Further experiments on Bayesian neural word embeddings demonstrate the versatility of the proposed method. |
Tasks | Image Compression, Latent Variable Models, Word Embeddings |
Published | 2020-02-18 |
URL | https://arxiv.org/abs/2002.08158v1 |
https://arxiv.org/pdf/2002.08158v1.pdf | |
PWC | https://paperswithcode.com/paper/variable-bitrate-neural-compression-via |
Repo | https://github.com/mandt-lab/bayesian-ac |
Framework | tf |
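
The core idea is to discretize each latent coordinate with an accuracy tied to its posterior uncertainty, so low-variance (more informative) dimensions get finer quantization. The scaling rule below is an illustrative assumption rather than the paper's exact scheme, and the subsequent arithmetic-coding step is omitted.

```python
# Posterior-adaptive quantization of VAE latents (illustrative sketch).
import numpy as np

def quantize(mu, sigma, step_scale=1.0):
    step = step_scale * sigma             # coarser bins where posterior is wide
    codes = np.round(mu / step).astype(int)
    return codes, codes * step            # integer symbols + dequantized values

mu = np.array([0.8, -2.3, 0.05])          # posterior means from the encoder
sigma = np.array([0.1, 1.5, 0.5])         # posterior stds from the encoder
print(quantize(mu, sigma))
```

The integer symbols would then be entropy-coded; dimensions with large sigma cost few bits, which is how a single trained model yields multiple rate-distortion operating points.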
High-Fidelity Synthesis with Disentangled Representation
Title | High-Fidelity Synthesis with Disentangled Representation |
Authors | Wonkwang Lee, Donggyun Kim, Seunghoon Hong, Honglak Lee |
Abstract | Learning disentangled representations of data without supervision is an important step towards improving the interpretability of generative models. Despite recent advances in disentangled representation learning, existing approaches often suffer from a trade-off between representation learning and generation performance (i.e., improving generation quality sacrifices disentanglement performance). We propose an Information-Distillation Generative Adversarial Network (ID-GAN), a simple yet generic framework that easily incorporates existing state-of-the-art models for both disentanglement learning and high-fidelity synthesis. Our method learns a disentangled representation using VAE-based models and distills the learned representation, together with an additional nuisance variable, into a separate GAN-based generator for high-fidelity synthesis. To ensure that both generative models are aligned to render the same generative factors, we further constrain the GAN generator to maximize the mutual information between the learned latent code and the output. Despite its simplicity, we show that the proposed method is highly effective, achieving image generation quality comparable to the state-of-the-art methods while using the disentangled representation. We also show that the proposed decomposition leads to an efficient and stable model design, and we demonstrate photo-realistic high-resolution image synthesis results (1024x1024 pixels) for the first time using disentangled representations. |
Tasks | Image Generation, Representation Learning |
Published | 2020-01-13 |
URL | https://arxiv.org/abs/2001.04296v1 |
https://arxiv.org/pdf/2001.04296v1.pdf | |
PWC | https://paperswithcode.com/paper/high-fidelity-synthesis-with-disentangled |
Repo | https://github.com/rosinality/id-gan-pytorch |
Framework | pytorch |
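
The mutual-information constraint can be sketched InfoGAN-style: an auxiliary head Q tries to recover the disentangled code c from the generated image, and the generator is trained to make that easy (a variational lower bound on MI). The network shapes, the MSE surrogate, and the `lambda_mi` weight below are illustrative placeholders, not ID-GAN's exact architecture.

```python
# Sketch of ID-GAN's mutual-information term (InfoGAN-style surrogate).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(10 + 16, 64), nn.ReLU(), nn.Linear(64, 784))
Q = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))

c = torch.randn(8, 10)      # disentangled code from a frozen VAE encoder
z = torch.randn(8, 16)      # nuisance variable for high-fidelity details
fake = G(torch.cat([c, z], dim=1))
mi_loss = nn.functional.mse_loss(Q(fake), c)   # maximizing MI <=> recovering c
# total generator loss = adversarial_loss + lambda_mi * mi_loss
print(mi_loss)
```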
A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs
Title | A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs |
Authors | Zequn Sun, Qingheng Zhang, Wei Hu, Chengming Wang, Muhao Chen, Farahnaz Akrami, Chengkai Li |
Abstract | Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the same real-world object. Recent advances in KG embedding have spurred embedding-based entity alignment, which encodes entities in a continuous embedding space and measures entity similarities based on the learned embeddings. In this paper, we conduct a comprehensive experimental study of this emerging field. The study surveys 23 recent embedding-based entity alignment approaches and categorizes them based on their techniques and characteristics. We further observe that current approaches use different datasets in evaluation, and that the degree distributions of entities in these datasets are inconsistent with real KGs. Hence, we propose a new KG sampling algorithm, with which we generate a set of dedicated benchmark datasets with various heterogeneity levels and distributions for realistic evaluation. This study also produces an open-source library that includes 12 representative embedding-based entity alignment approaches. We extensively evaluate these approaches on the generated datasets to understand their strengths and limitations. Additionally, for several directions that have not been explored by current approaches, we perform exploratory experiments and report our preliminary findings for future studies. The benchmark datasets, open-source library and experimental results are all accessible online and will be duly maintained. |
Tasks | Entity Alignment, Knowledge Graphs |
Published | 2020-03-10 |
URL | https://arxiv.org/abs/2003.07743v1 |
https://arxiv.org/pdf/2003.07743v1.pdf | |
PWC | https://paperswithcode.com/paper/a-benchmarking-study-of-embedding-based |
Repo | https://github.com/nju-websoft/OpenEA |
Framework | tf |
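
A minimal sketch of how embedding-based alignment is typically scored: align each source entity to its nearest target under cosine similarity and report Hits@1. Random embeddings stand in for a trained model's output; the exact metrics and candidate pruning in OpenEA may differ.

```python
# Nearest-neighbor alignment + Hits@1 evaluation (illustrative).
import numpy as np

def hits_at_1(src_emb, tgt_emb):
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                 # cosine similarity matrix
    pred = sim.argmax(axis=1)         # nearest target for each source entity
    gold = np.arange(len(src))        # convention: row i aligns with row i
    return (pred == gold).mean()

rng = np.random.default_rng(0)
print(hits_at_1(rng.normal(size=(100, 32)), rng.normal(size=(100, 32))))
```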
Learning Dynamic Routing for Semantic Segmentation
Title | Learning Dynamic Routing for Semantic Segmentation |
Authors | Yanwei Li, Lin Song, Yukang Chen, Zeming Li, Xiangyu Zhang, Xingang Wang, Jian Sun |
Abstract | Recently, numerous handcrafted and searched networks have been applied to semantic segmentation. However, previous works attempt to handle inputs of various scales with pre-defined static architectures, such as FCN, U-Net, and the DeepLab series. This paper studies a conceptually new method to alleviate scale variance in semantic representation, named dynamic routing. The proposed framework generates data-dependent routes, adapting to the scale distribution of each image. To this end, a differentiable gating function, called the soft conditional gate, is proposed to select scale-transform paths on the fly. In addition, the computational cost can be further reduced in an end-to-end manner by imposing budget constraints on the gating function. We further relax the network-level routing space to support multi-path propagation and skip-connections in each forward pass, bringing substantial network capacity. To demonstrate the superiority of the dynamic property, we compare with several static architectures, which can be modeled as special cases in the routing space. Extensive experiments are conducted on Cityscapes and PASCAL VOC 2012 to illustrate the effectiveness of the dynamic framework. Code is available at https://github.com/yanwei-li/DynamicRouting. |
Tasks | Semantic Segmentation |
Published | 2020-03-23 |
URL | https://arxiv.org/abs/2003.10401v1 |
https://arxiv.org/pdf/2003.10401v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-dynamic-routing-for-semantic |
Repo | https://github.com/yanwei-li/DynamicRouting |
Framework | pytorch |
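
A soft conditional gate can be sketched as a data-dependent scalar per candidate path, produced from globally pooled features; paths whose gate hits zero can be skipped at inference. The activation choice and omission of the budget loss below are simplifications of the paper's design.

```python
# Illustrative soft conditional gate over candidate scale-transform paths.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftGate(nn.Module):
    def __init__(self, channels, n_paths):
        super().__init__()
        self.fc = nn.Linear(channels, n_paths)

    def forward(self, x):
        pooled = F.adaptive_avg_pool2d(x, 1).flatten(1)   # (B, C)
        # ReLU keeps gates non-negative and allows exact zeros,
        # which is what makes unused paths skippable on the fly.
        return F.relu(self.fc(pooled))                     # (B, n_paths)

x = torch.randn(2, 64, 32, 32)
paths = [torch.randn(2, 64, 32, 32) for _ in range(3)]    # candidate transforms
gates = SoftGate(64, 3)(x)                                 # data-dependent weights
out = sum(g.view(-1, 1, 1, 1) * p for g, p in zip(gates.unbind(1), paths))
print(out.shape)
```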
BiCANet: Bi-directional Contextual Aggregating Network for Image Semantic Segmentation
Title | BiCANet: Bi-directional Contextual Aggregating Network for Image Semantic Segmentation |
Authors | Quan Zhou, Dechun Cong, Bin Kang, Xiaofu Wu, Baoyu Zheng, Huimin Lu, Longin Jan Latecki |
Abstract | Exploring contextual information in convolutional neural networks (CNNs) has gained substantial attention in recent years for semantic segmentation. This paper introduces a Bi-directional Contextual Aggregating Network, called BiCANet, for semantic segmentation. Unlike previous approaches that encode context in feature space, BiCANet aggregates contextual cues from a categorical perspective and mainly consists of three parts: a contextual condensed projection block (CCPB), a bi-directional context interaction block (BCIB), and a multi-scale contextual fusion block (MCFB). More specifically, CCPB learns a category-based mapping through a split-transform-merge architecture, which condenses contextual cues with different receptive fields from intermediate layers. BCIB, on the other hand, employs dense skip connections to enhance class-level context exchange. Finally, MCFB integrates multi-scale contextual cues by investigating short- and long-range spatial dependencies. To evaluate BiCANet, we have conducted extensive experiments on three semantic segmentation datasets: PASCAL VOC 2012, Cityscapes, and ADE20K. The experimental results demonstrate that BiCANet outperforms recent state-of-the-art networks without any postprocessing techniques, achieving mIoU scores of 86.7%, 82.4%, and 38.66% on the PASCAL VOC 2012, Cityscapes, and ADE20K test sets, respectively. |
Tasks | Semantic Segmentation |
Published | 2020-03-21 |
URL | https://arxiv.org/abs/2003.09669v1 |
https://arxiv.org/pdf/2003.09669v1.pdf | |
PWC | https://paperswithcode.com/paper/bicanet-bi-directional-contextual-aggregating |
Repo | https://github.com/cdcnjupt/BCANet |
Framework | none |
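
A split-transform-merge block in the spirit of CCPB can be sketched as parallel branches with different dilation rates that condense context at several receptive fields, merged by concatenation and a 1x1 projection. Branch count and dilations here are illustrative assumptions, not BiCANet's settings.

```python
# Split-transform-merge context block (illustrative CCPB-style sketch).
import torch
import torch.nn as nn

class CondensedProjection(nn.Module):
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d),  # split/transform
                nn.ReLU(),
            )
            for d in dilations
        )
        self.merge = nn.Conv2d(out_ch * len(dilations), out_ch, 1)   # merge

    def forward(self, x):
        return self.merge(torch.cat([b(x) for b in self.branches], dim=1))

print(CondensedProjection(256, 64)(torch.randn(1, 256, 33, 33)).shape)
```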