January 26, 2020

Paper Group ANR 1614

SWAG: Item Recommendations using Convolutions on Weighted Graphs


Title	SWAG: Item Recommendations using Convolutions on Weighted Graphs
Authors	Amit Pande, Kai Ni, Venkataramani Kini
Abstract	Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. In this work, we present a Graph Convolutional Network (GCN) algorithm, SWAG (Sample Weight and AGgregate), which combines efficient random walks and graph convolutions on weighted graphs to generate embeddings for nodes (items) that incorporate both graph structure and node feature information such as item descriptions and item images. The three important SWAG operations that enable us to efficiently generate node embeddings based on graph structures are (a) sampling the graph to a homogeneous structure, (b) weighting the sampling, walks, and convolution operations, and (c) using aggregation functions for generating convolutions. The work is an adaptation of GraphSAGE to weighted graphs. We deploy SWAG at Target and train it on a graph of more than 500K products sold online with over 50M edges. Offline and online evaluations reveal the benefit of using a graph-based approach and the benefits of weighting to produce high-quality embeddings and product recommendations.
Tasks	Recommendation Systems
Published	2019-11-22
URL	https://arxiv.org/abs/1911.10232v1
PDF	https://arxiv.org/pdf/1911.10232v1.pdf
PWC	https://paperswithcode.com/paper/swag-item-recommendations-using-convolutions
Repo
Framework
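
The weighting and aggregation steps named in the SWAG abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the function names `sample_neighbors` and `weighted_mean_aggregate`, and the choice of sampling in proportion to edge weight followed by an edge-weighted mean are all assumptions made for illustration.

```python
import random

def sample_neighbors(graph, node, k, rng):
    """Sample k neighbors (with replacement), proportional to edge weight."""
    neighbors = list(graph[node].keys())
    weights = [graph[node][n] for n in neighbors]
    return rng.choices(neighbors, weights=weights, k=k)

def weighted_mean_aggregate(graph, features, node, sampled):
    """Edge-weighted mean of the sampled neighbors' feature vectors."""
    dim = len(features[node])
    total = sum(graph[node][n] for n in sampled)
    agg = [0.0] * dim
    for n in sampled:
        w = graph[node][n] / total
        for d in range(dim):
            agg[d] += w * features[n][d]
    return agg

# Hypothetical toy item graph; edge weights could be co-view or co-purchase counts.
graph = {
    "a": {"b": 3.0, "c": 1.0},
    "b": {"a": 3.0},
    "c": {"a": 1.0},
}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

rng = random.Random(0)
sampled = sample_neighbors(graph, "a", k=4, rng=rng)
neighborhood = weighted_mean_aggregate(graph, features, "a", sampled)

# GraphSAGE-style step: concatenate the node's own features with the
# aggregated neighborhood. A learned linear layer and nonlinearity would follow.
embedding_input = features["a"] + neighborhood
```

Sampling a fixed number of neighbors gives every node the same neighborhood size, which is one plausible reading of "sampling the graph to a homogeneous structure".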

Few-shot Learning with Meta Metric Learners


Title	Few-shot Learning with Meta Metric Learners
Authors	Yu Cheng, Mo Yu, Xiaoxiao Guo, Bowen Zhou
Abstract	Few-shot learning aims to learn classifiers for new classes with only a few training examples per class. Existing meta-learning or metric-learning based few-shot learning approaches are limited in handling diverse domains with varying numbers of labels. The meta-learning approaches train a meta learner to predict weights of homogeneous-structured task-specific networks, requiring a uniform number of classes across tasks. The metric-learning approaches learn one task-invariant metric for all the tasks, and they fail if the tasks diverge. We propose to deal with these limitations with meta metric learning. Our meta metric learning approach consists of task-specific learners, which exploit metric learning to handle flexible labels, and a meta learner, which discovers good parameters and gradient descent to specify the metrics in task-specific learners. Thus the proposed model is able to handle unbalanced classes as well as to generate task-specific metrics. We test our approach in the 'k-shot N-way' few-shot learning setting used in previous work and in a new realistic few-shot setting with diverse multi-domain tasks and flexible label numbers. Experiments show that our approach attains superior performance in both settings.
Tasks	Few-Shot Learning, Meta-Learning, Metric Learning
Published	2019-01-26
URL	http://arxiv.org/abs/1901.09890v1
PDF	http://arxiv.org/pdf/1901.09890v1.pdf
PWC	https://paperswithcode.com/paper/few-shot-learning-with-meta-metric-learners
Repo
Framework

PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems


Title	PRINCE: Provider-side Interpretability with Counterfactual Explanations in Recommender Systems
Authors	Azin Ghazimatin, Oana Balalau, Rishiraj Saha Roy, Gerhard Weikum
Abstract	Interpretable explanations for recommender systems and other machine learning models are crucial to gain user trust. Prior works that have focused on paths connecting users and items in a heterogeneous network have several limitations, such as discovering relationships rather than true explanations, or disregarding other users’ privacy. In this work, we take a fresh perspective, and present PRINCE: a provider-side mechanism to produce tangible explanations for end-users, where an explanation is defined to be a set of minimal actions performed by the user that, if removed, changes the recommendation to a different item. Given a recommendation, PRINCE uses a polynomial-time optimal algorithm for finding this minimal set of a user’s actions from an exponential search space, based on random walks over dynamic graphs. Experiments on two real-world datasets show that PRINCE provides more compact explanations than intuitive baselines, and insights from a crowdsourced user-study demonstrate the viability of such action-based explanations. We thus posit that PRINCE produces scrutable, actionable, and concise explanations, owing to its use of counterfactual evidence, a user’s own actions, and minimal sets, respectively.
Tasks	Recommendation Systems
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08378v4
PDF	https://arxiv.org/pdf/1911.08378v4.pdf
PWC	https://paperswithcode.com/paper/prince-provider-side-interpretability-with
Repo
Framework
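
The definition of an explanation used in the PRINCE abstract (a minimal set of the user's own actions whose removal changes the recommended item) can be made concrete with a small sketch. This is a brute-force search over a hypothetical additive scoring model, not the polynomial-time random-walk algorithm the paper describes; the action names, items, and affinity values are invented for illustration.

```python
from itertools import combinations

def recommend(actions, affinity, items):
    """Toy recommender: score each item by summing per-action affinities."""
    scores = {
        item: sum(affinity[a].get(item, 0.0) for a in actions)
        for item in items
    }
    # Sort first so ties are broken deterministically by item name.
    return max(sorted(items), key=lambda item: scores[item])

def minimal_counterfactual(actions, affinity, items):
    """Smallest set of actions whose removal changes the top item.

    Exhaustive search by increasing subset size (exponential in the
    worst case), used here only to illustrate the definition.
    """
    original = recommend(actions, affinity, items)
    for size in range(1, len(actions) + 1):
        for removed in combinations(actions, size):
            remaining = [a for a in actions if a not in removed]
            replacement = recommend(remaining, affinity, items)
            if replacement != original:
                return set(removed), original, replacement
    return None

# Hypothetical user history and action-to-item affinities.
actions = ["liked_A", "liked_B", "liked_C"]
items = ["x", "y"]
affinity = {
    "liked_A": {"x": 0.5, "y": 0.1},
    "liked_B": {"x": 0.4, "y": 0.2},
    "liked_C": {"y": 0.5},
}

explanation = minimal_counterfactual(actions, affinity, items)
```

In this toy setup the explanation reads as "you were recommended x because of liked_A; without it you would have been recommended y".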

U-Net Training with Instance-Layer Normalization


Title	U-Net Training with Instance-Layer Normalization
Authors	Xiao-Yun Zhou, Peichao Li, Zhao-Yang Wang, Guang-Zhong Yang
Abstract	Normalization layers are essential in a Deep Convolutional Neural Network (DCNN). Various normalization methods have been proposed. The statistics used to normalize the feature maps can be computed at batch, channel, or instance level. However, in most existing methods, the normalization for each layer is fixed. Batch-Instance Normalization (BIN) is one of the first proposed methods that combines two different normalization methods and achieves diverse normalization for different layers. However, two potential issues exist in BIN: first, the Clip function is not differentiable at input values of 0 and 1; second, the combined feature map does not have a normalized distribution, which is harmful for signal propagation in a DCNN. In this paper, an Instance-Layer Normalization (ILN) layer is proposed by using the Sigmoid function for the feature map combination, and cascading group normalization. The performance of ILN is validated on image segmentation of the Right Ventricle (RV) and Left Ventricle (LV) using U-Net as the network architecture. The results show that the proposed ILN outperforms previous traditional and popular normalization methods with noticeable accuracy improvements for most validations, supporting the effectiveness of the proposed ILN.
Tasks	Semantic Segmentation
Published	2019-08-21
URL	https://arxiv.org/abs/1908.08466v2
PDF	https://arxiv.org/pdf/1908.08466v2.pdf
PWC	https://paperswithcode.com/paper/u-net-training-with-instance-layer
Repo
Framework
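
The two ingredients the ILN abstract names, a Sigmoid gate for combining feature maps and a cascaded group normalization, can be sketched for a single sample as follows. The exact form is an assumption inferred from the abstract and the method's name: one gate parameter `rho` per channel, instance and layer normalization as the two branches, and no learned scale or shift. The function names are hypothetical.

```python
import math

def _normalize(values, eps=1e-5):
    """Shift and scale a flat list to zero mean and (near) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def instance_norm(x):
    """Normalize each channel of one sample independently. x is [C][N]."""
    return [_normalize(channel) for channel in x]

def group_norm(x, groups):
    """Normalize channels jointly within each group; groups=1 is layer norm."""
    c, n = len(x), len(x[0])
    size = c // groups
    out = []
    for g in range(groups):
        block = x[g * size:(g + 1) * size]
        flat = _normalize([v for channel in block for v in channel])
        out.extend(flat[i * n:(i + 1) * n] for i in range(size))
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def iln_sketch(x, rho, groups):
    """Sigmoid-gated mix of instance and layer norm, then group norm."""
    gate = [sigmoid(r) for r in rho]  # smooth and differentiable everywhere
    inst = instance_norm(x)
    layer = group_norm(x, groups=1)
    mixed = [
        [g * a + (1.0 - g) * b for a, b in zip(ch_inst, ch_layer)]
        for g, ch_inst, ch_layer in zip(gate, inst, layer)
    ]
    # Cascaded group norm restores a normalized distribution after mixing.
    return group_norm(mixed, groups)

# Hypothetical feature map: 4 channels, 4 spatial positions each.
x = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [0.0, 1.0, 0.0, 1.0],
    [5.0, 5.0, 6.0, 6.0],
]
rho = [0.0, 1.0, -1.0, 2.0]
out = iln_sketch(x, rho, groups=2)
```

Unlike a clipped gate, the Sigmoid keeps the mixing weight strictly between 0 and 1 with a defined gradient at every input, which addresses the first issue the abstract raises about BIN.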

Pairwise Interactive Graph Attention Network for Context-Aware Recommendation


Title	Pairwise Interactive Graph Attention Network for Context-Aware Recommendation
Authors	Yahui Liu, Furao Shen, Jian Zhao
Abstract	Context-aware recommender systems (CARS), which consider rich side information to improve recommendation performance, have attracted increasing attention in both academia and industry. How to predict user preferences from diverse contextual features is the core of CARS. Several recent models pay attention to user behaviors and use specifically designed structures to extract adaptive user interests from historical behaviors. However, few works take item interaction history into consideration, which leads to insufficient item feature representation and item attraction extraction. From these observations, we model the user-item interaction as a dynamic interaction graph (DIG) and propose a GNN-based model called Pairwise Interactive Graph Attention Network (PIGAT) to capture dynamic user interests and item attractions simultaneously. PIGAT introduces the attention mechanism to consider the importance of each interacted user/item to both the user and the item, which captures user interests, item attractions and their influence on the recommendation context. Moreover, confidence embeddings are applied to interactions to distinguish the confidence of interactions occurring at different times. More expressive user/item representations and adaptive interaction features are then generated, which benefits recommendation performance, especially when long-tail items are involved. We conduct experiments on three real-world datasets to demonstrate the effectiveness of PIGAT.
Tasks	Recommendation Systems
Published	2019-11-18
URL	https://arxiv.org/abs/1911.07429v1
PDF	https://arxiv.org/pdf/1911.07429v1.pdf
PWC	https://paperswithcode.com/paper/pairwise-interactive-graph-attention-network
Repo
Framework

An Automatic Interaction Detection Hybrid Model for Bankcard Response Classification


Title	An Automatic Interaction Detection Hybrid Model for Bankcard Response Classification
Authors	Yan Wang, Xuelei Sherry Ni, Brian Stone
Abstract	In this paper, we propose a hybrid bankcard response model, which integrates decision tree based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. Then in the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation of the proposed hybrid model is that adding variable interactions may improve the performance of logistic regression. To demonstrate the effectiveness of the proposed hybrid model, it is evaluated on a real credit customer response data set. As the results reveal, by identifying potential interactions among independent variables, the proposed hybrid approach outperforms logistic regression without a search for interactions in terms of classification accuracy, the area under the receiver operating characteristic (ROC) curve, and the Kolmogorov-Smirnov (KS) statistic. Furthermore, CHAID analysis for interaction detection is much more computationally efficient than a stepwise search for interactions, and some identified interactions are shown to have statistically significant predictive power on the target variable. Last but not least, the customer profile created based on the CHAID tree provides a reasonable interpretation of the interactions, which is required by regulations of the credit industry. Hence, this study provides an alternative for handling bankcard classification tasks.
Tasks
Published	2019-01-02
URL	http://arxiv.org/abs/1901.00251v1
PDF	http://arxiv.org/pdf/1901.00251v1.pdf
PWC	https://paperswithcode.com/paper/an-automatic-interaction-detection-hybrid
Repo
Framework
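
The two-stage idea in the abstract above (detect an interaction with a chi-square test, then feed it to logistic regression as an extra input) can be sketched on toy data. This is a simplified stand-in and not CHAID itself: it screens one hand-picked product term with a Pearson chi-square test rather than growing a tree, and it stops at building the augmented design matrix. The data and variable names are invented.

```python
def chi_square_2x2(xs, ys):
    """Pearson chi-square statistic for two binary lists (no correction)."""
    n = len(xs)
    stat = 0.0
    for xv in (0, 1):
        for yv in (0, 1):
            observed = sum(1 for x, y in zip(xs, ys) if x == xv and y == yv)
            expected = xs.count(xv) * ys.count(yv) / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical responses: a customer responds only when exactly one of
# two binary attributes is present, so neither attribute helps alone.
rows = []
for x1 in (0, 1):
    for x2 in (0, 1):
        rows.extend([(x1, x2, int(x1 != x2))] * 10)

x1 = [r[0] for r in rows]
x2 = [r[1] for r in rows]
y = [r[2] for r in rows]
interaction = [a * b for a, b in zip(x1, x2)]

# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CRITICAL_05_DF1 = 3.841

main_effect_stat = chi_square_2x2(x1, y)
interaction_stat = chi_square_2x2(interaction, y)

# Stage 2: add the interaction as an extra column only if it passes the test.
if interaction_stat > CRITICAL_05_DF1:
    design = [[a, b, ab] for a, b, ab in zip(x1, x2, interaction)]
else:
    design = [[a, b] for a, b in zip(x1, x2)]
```

Here the main effect of `x1` shows no association with the response, while the product term does, so a logistic regression fitted on `design` would see a signal that the two raw columns alone do not expose linearly.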

Learning Generalized Transformation Equivariant Representations via Autoencoding Transformations


Title	Learning Generalized Transformation Equivariant Representations via Autoencoding Transformations
Authors	Guo-Jun Qi, Liheng Zhang, Xiao Wang
Abstract	Transformation Equivariant Representations (TERs) aim to capture the intrinsic visual structures that equivary to various transformations by expanding the notion of translation equivariance underlying the success of Convolutional Neural Networks (CNNs). For this purpose, we present both deterministic AutoEncoding Transformations (AET) and probabilistic AutoEncoding Variational Transformations (AVT) models to learn visual representations from generic groups of transformations. While the AET is trained by directly decoding the transformations from the learned representations, the AVT is trained by maximizing the joint mutual information between the learned representation and transformations. This results in Generalized TERs (GTERs) equivariant against transformations in a more general fashion by capturing complex patterns of visual structures beyond the conventional linear equivariance under a transformation group. The presented approach can be extended to (semi-)supervised models by jointly maximizing the mutual information of the learned representation with both labels and transformations. Experiments demonstrate the proposed models outperform the state-of-the-art models in both unsupervised and (semi-)supervised tasks.
Tasks
Published	2019-06-19
URL	https://arxiv.org/abs/1906.08628v3
PDF	https://arxiv.org/pdf/1906.08628v3.pdf
PWC	https://paperswithcode.com/paper/learning-generalized-transformation
Repo
Framework

DDP-GCN: Multi-Graph Convolutional Network for Spatiotemporal Traffic Forecasting


Title	DDP-GCN: Multi-Graph Convolutional Network for Spatiotemporal Traffic Forecasting
Authors	Kyungeun Lee, Wonjong Rhee
Abstract	Traffic speed forecasting is one of the core problems in Intelligent Transportation Systems. For more accurate prediction, recent studies have started using not only the temporal speed patterns but also the spatial information on the road network through graph convolutional networks. Even though the road network is highly complex due to its non-Euclidean and directional characteristics, previous approaches mainly focus on modeling the spatial dependencies with distance only. In this paper, we identify two essential spatial dependencies in traffic forecasting in addition to distance, namely direction and positional relationship, for designing basic graph elements as the smallest building blocks. Using the building blocks, we suggest DDP-GCN (Distance, Direction, and Positional relationship Graph Convolutional Network) to incorporate the three spatial relationships into a prediction network for traffic forecasting. We evaluate the proposed model with two large-scale real-world datasets, and find a 7.40% average improvement for 1-hour forecasting in highly complex urban networks.
Tasks
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12256v2
PDF	https://arxiv.org/pdf/1905.12256v2.pdf
PWC	https://paperswithcode.com/paper/graph-convolutional-modules-for-traffic
Repo
Framework

Generating Stereotypes Automatically For Complex Categorical Features


Title	Generating Stereotypes Automatically For Complex Categorical Features
Authors	Nourah ALRossais, Daniel Kudenko
Abstract	In the context of stereotype creation for recommender systems, we found that certain types of categorical variables pose particular challenges if simple clustering procedures are employed with the objective of creating stereotypes. A categorical variable is defined to be complex when it cannot be easily translated into a numerical variable, when the semantics of the categories potentially play an important role in the optimal determination of stereotypes, and when it is also multi-choice (e.g., each item can be labelled with one or more applicable categories, in a number that is not pre-defined). The main objective of this paper is to analyse the possibility of obtaining a viable recommendation system that operates on stereotypes generated directly via the feature's metadata similarities, without using ratings information at the time the classes are generated. The encouraging results using an integrated MovieLens and IMDb data set show that the proposed algorithm performs better than other categorical clustering algorithms like k-modes when clustering complex categorical features. Notably, the representation of complex categorical features can help to alleviate cold-start issues in recommender systems.
Tasks	Recommendation Systems
Published	2019-11-13
URL	https://arxiv.org/abs/1911.11064v1
PDF	https://arxiv.org/pdf/1911.11064v1.pdf
PWC	https://paperswithcode.com/paper/191111064
Repo
Framework

BETANAS: BalancEd TrAining and selective drop for Neural Architecture Search


Title	BETANAS: BalancEd TrAining and selective drop for Neural Architecture Search
Authors	Muyuan Fang, Qiang Wang, Zhao Zhong
Abstract	Automatic neural architecture search techniques are becoming increasingly important in the machine learning area. In particular, weight sharing methods have shown remarkable potential for finding good network architectures with few computational resources. However, existing weight sharing methods mainly suffer from limitations in their search strategies: these methods either uniformly train all network paths to convergence, which introduces conflicts between branches and wastes a large amount of computation on unpromising candidates, or selectively train branches with different frequencies, which leads to unfair evaluation and comparison among paths. To address these issues, we propose a novel neural architecture search method with a balanced training strategy to ensure fair comparisons and a selective drop mechanism to reduce conflicts among candidate paths. The experimental results show that our proposed method can achieve a leading performance of 79.0% on ImageNet under mobile settings, which outperforms other state-of-the-art methods in both accuracy and efficiency.
Tasks	Neural Architecture Search
Published	2019-12-24
URL	https://arxiv.org/abs/1912.11191v1
PDF	https://arxiv.org/pdf/1912.11191v1.pdf
PWC	https://paperswithcode.com/paper/betanas-balanced-training-and-selective-drop-1
Repo
Framework

Self-Supervised Contextual Language Representation of Radiology Reports to Improve the Identification of Communication Urgency


Title	Self-Supervised Contextual Language Representation of Radiology Reports to Improve the Identification of Communication Urgency
Authors	Xing Meng, Craig H. Ganoe, Ryan T. Sieberg, Yvonne Y. Cheung, Saeed Hassanpour
Abstract	Machine learning methods have recently achieved high performance in biomedical text analysis. However, a major bottleneck in the widespread application of these methods is obtaining the required large amounts of annotated training data, which is resource intensive and time consuming. Recent progress in self-supervised learning has shown promise in leveraging large text corpora without explicit annotations. In this work, we built a self-supervised contextual language representation model using BERT, a deep bidirectional transformer architecture, to identify radiology reports requiring prompt communication to the referring physicians. We pre-trained the BERT model on a large unlabeled corpus of radiology reports and used the resulting contextual representations in a final text classifier for communication urgency. Our model achieved a precision of 97.0%, recall of 93.3%, and F-measure of 95.1% on an independent test set in identifying radiology reports for prompt communication, and significantly outperformed the previous state-of-the-art model based on word2vec representations.
Tasks
Published	2019-12-05
URL	https://arxiv.org/abs/1912.02703v1
PDF	https://arxiv.org/pdf/1912.02703v1.pdf
PWC	https://paperswithcode.com/paper/self-supervised-contextual-language
Repo
Framework
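
The three figures reported in the abstract above are mutually consistent if the F-measure is the balanced F1 score, that is, the harmonic mean of precision and recall. That reading is an assumption, since the abstract does not state which F-measure is used. A quick check:

```python
# Values as reported in the abstract.
precision = 0.970
recall = 0.933

# Balanced F-measure (F1): harmonic mean of precision and recall.
f_measure = 2 * precision * recall / (precision + recall)

print(f"{100 * f_measure:.1f}%")  # prints 95.1%
```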

Learning-Accelerated ADMM for Distributed Optimal Power Flow


Title	Learning-Accelerated ADMM for Distributed Optimal Power Flow
Authors	Dave Biagioni, Peter Graf, Xiangyu Zhang, Jennifer King
Abstract	We propose a novel data-driven method to accelerate the convergence of Alternating Direction Method of Multipliers (ADMM) for solving distributed DC optimal power flow (DC-OPF) where lines are shared between independent network partitions. Using previous observations of ADMM trajectories for a given system under varying load, the method trains a recurrent neural network (RNN) to predict the converged values of dual and consensus variables. Given a new realization of system load, a small number of initial ADMM iterations is taken as input to infer the converged values and directly inject them into the iteration. For this purpose, we utilize a recently proposed RNN architecture called antisymmetric RNN (aRNN) that avoids vanishing and exploding gradients via network weights designed to have the spectral properties of a convergent numerical integration scheme. We demonstrate empirically that the online injection of these values into the ADMM iteration accelerates convergence by a factor of 2-50x for partitioned 13-, 300- and 2848-bus test systems under differing load scenarios. The proposed method has several advantages: it can be easily integrated around an existing software framework, requiring no changes to underlying physical models; it maintains the security of private decision variables inherent in consensus ADMM; inference is fast and so may be used in online settings; historical data is leveraged to improve performance instead of being discarded or ignored. While we focus on the ADMM formulation of distributed DC-OPF in this paper, the ideas presented are naturally extended to other iterative optimization schemes.
Tasks
Published	2019-11-08
URL	https://arxiv.org/abs/1911.03019v1
PDF	https://arxiv.org/pdf/1911.03019v1.pdf
PWC	https://paperswithcode.com/paper/learning-accelerated-admm-for-distributed
Repo
Framework
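
The effect of injecting predicted dual and consensus values into ADMM can be illustrated on a toy consensus problem. This sketch makes several assumptions that are not in the paper: the problem is a scalar quadratic consensus problem rather than DC-OPF, and the solution for a previous, slightly different load stands in for the RNN's prediction. The function `consensus_admm` and all data are hypothetical.

```python
def consensus_admm(targets, rho=1.0, z0=0.0, u0=None, tol=1e-6, max_iter=1000):
    """Scaled-dual ADMM for: min sum_i 0.5*(x_i - a_i)^2  s.t.  x_i = z.

    Returns (z, u, iterations). z0 and u0 allow a warm start.
    """
    n = len(targets)
    z = z0
    u = list(u0) if u0 is not None else [0.0] * n
    for iteration in range(1, max_iter + 1):
        # Local x-updates: closed form for the quadratic objective.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        z_old = z
        # Consensus update, then scaled dual update.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        u = [ui + xi - z for ui, xi in zip(u, x)]
        primal_residual = max(abs(xi - z) for xi in x)
        dual_residual = rho * abs(z - z_old)
        if primal_residual < tol and dual_residual < tol:
            return z, u, iteration
    return z, u, max_iter

# Hypothetical "loads": a previously solved scenario and a new, nearby one.
previous_load = [1.0, 2.0, 6.0]
new_load = [1.1, 2.0, 6.2]

z_prev, u_prev, _ = consensus_admm(previous_load)

# Cold start on the new scenario.
z_cold, _, cold_iters = consensus_admm(new_load)

# Warm start: inject the previous solution as the predicted converged values.
z_warm, _, warm_iters = consensus_admm(new_load, z0=z_prev, u0=u_prev)
```

Both runs reach the same consensus value (the mean of `new_load`), but the warm-started run needs fewer iterations because its initial error is smaller. A more accurate prediction would shrink the iteration count further.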

The Missing Ingredient in Zero-Shot Neural Machine Translation


Title	The Missing Ingredient in Zero-Shot Neural Machine Translation
Authors	Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Roee Aharoni, Melvin Johnson, Wolfgang Macherey
Abstract	Multilingual Neural Machine Translation (NMT) models are capable of translating between multiple source and target languages. Despite various approaches to train such models, they have difficulty with zero-shot translation: translating between language pairs that were not seen together during training. In this paper we first diagnose why state-of-the-art multilingual NMT models that rely purely on parameter sharing fail to generalize to unseen language pairs. We then propose auxiliary losses on the NMT encoder that impose representational invariance across languages. Our simple approach vastly improves zero-shot translation quality without regressing on supervised directions. For the first time, on WMT14 English-French/German, we achieve zero-shot performance that is on par with pivoting. We also demonstrate the easy scalability of our approach to multiple languages on the IWSLT 2017 shared task.
Tasks	Machine Translation
Published	2019-03-17
URL	http://arxiv.org/abs/1903.07091v1
PDF	http://arxiv.org/pdf/1903.07091v1.pdf
PWC	https://paperswithcode.com/paper/the-missing-ingredient-in-zero-shot-neural
Repo
Framework
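
One way to read "auxiliary losses on the NMT encoder that impose representational invariance across languages" is as a penalty on the distance between the encoder representations of a sentence and its translation. The sketch below uses mean pooling and cosine distance; both choices, the function names, and the weighting constant are assumptions for illustration and are not taken from the abstract.

```python
import math

def mean_pool(token_vectors):
    """Average per-token encoder states into one sentence vector."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(tok[d] for tok in token_vectors) / count for d in range(dim)]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def alignment_loss(source_states, target_states):
    """Auxiliary loss: small when both sentences encode to similar vectors."""
    return cosine_distance(mean_pool(source_states), mean_pool(target_states))

# Hypothetical encoder outputs for a sentence pair (lengths may differ).
english_states = [[1.0, 0.0], [1.0, 0.0]]
french_states_far = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
french_states_near = [[2.0, 0.0]]

far = alignment_loss(english_states, french_states_far)
near = alignment_loss(english_states, french_states_near)

# The auxiliary term would be added to the usual translation loss.
translation_loss = 2.5   # placeholder value
aux_weight = 1.0         # hypothetical hyperparameter
total_loss = translation_loss + aux_weight * far
```

Pooling over tokens makes the loss independent of sentence length, so the two sides of a parallel pair can be compared even when they tokenize differently.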

BASN – Learning Steganography with Binary Attention Mechanism


Title	BASN – Learning Steganography with Binary Attention Mechanism
Authors	Yang Yang
Abstract	Secret information sharing through image carriers has attracted much research attention in recent years with images' growing domination on the Internet and mobile applications. However, with the booming trend of convolutional neural networks, image steganography is facing a more significant challenge from neural-network-automated tasks. To improve the security of image steganography and minimize task result distortion, models must keep the feature maps generated by task-specific networks independent of any hidden information embedded in the carrier. This paper introduces a binary attention mechanism into image steganography to help alleviate the security issue and, at the same time, increase embedding payload capacity. The experimental results show that our method has the advantage of high payload capacity with little feature map distortion and still resists detection by state-of-the-art image steganalysis algorithms.
Tasks	Image Steganography
Published	2019-07-09
URL	https://arxiv.org/abs/1907.04362v1
PDF	https://arxiv.org/pdf/1907.04362v1.pdf
PWC	https://paperswithcode.com/paper/basn-learning-steganography-with-binary
Repo
Framework

Adversarial Domain Adaptation for Machine Reading Comprehension


Title	Adversarial Domain Adaptation for Machine Reading Comprehension
Authors	Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang
Abstract	In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain. To this end, we propose an Adversarial Domain Adaptation framework (AdaMRC), where (i) pseudo questions are first generated for unlabeled passages in the target domain, and then (ii) a domain classifier is incorporated into an MRC model to predict which domain a given passage-question pair comes from. The classifier and the passage-question encoder are jointly trained using adversarial learning to enforce domain-invariant representation learning. Comprehensive evaluations demonstrate that our approach (i) is generalizable to different MRC models and datasets, (ii) can be combined with pre-trained large-scale language models (such as ELMo and BERT), and (iii) can be extended to semi-supervised learning.
Tasks	Domain Adaptation, Machine Reading Comprehension, Reading Comprehension, Representation Learning, Unsupervised Domain Adaptation
Published	2019-08-24
URL	https://arxiv.org/abs/1908.09209v1
PDF	https://arxiv.org/pdf/1908.09209v1.pdf
PWC	https://paperswithcode.com/paper/adversarial-domain-adaptation-for-machine
Repo
Framework