February 1, 2020

Paper Group AWR 111

Using Text Embeddings for Causal Inference. INFaaS: A Model-less Inference Serving System. Distorted Representation Space Characterization Through Backpropagated Gradients. Choosing Transfer Languages for Cross-Lingual Learning. Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition. Saliency detection based on structural dissimila …

Using Text Embeddings for Causal Inference


Title	Using Text Embeddings for Causal Inference
Authors	Victor Veitch, Dhanya Sridhar, David M. Blei
Abstract	We address causal inference with text documents. For example, does adding a theorem to a paper affect its chance of acceptance? Does reporting the gender of a forum post author affect the popularity of the post? We estimate these effects from observational data, where they may be confounded by features of the text such as the subject or writing quality. Although the text suffices for causal adjustment, it is prohibitively high-dimensional. The challenge is to find a low-dimensional text representation that can be used in causal inference. A key insight is that causal adjustment requires only the aspects of text that are predictive of both the treatment and outcome. Our proposed method adapts deep language models to learn low-dimensional embeddings from text that predict these values well; these embeddings suffice for causal adjustment. We establish theoretical properties of this method. We study it empirically on semi-simulated and real data on paper acceptance and forum post popularity. Code is available at https://github.com/blei-lab/causal-text-embeddings.
Tasks	Causal Inference
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12741v1
PDF	https://arxiv.org/pdf/1905.12741v1.pdf
PWC	https://paperswithcode.com/paper/using-text-embeddings-for-causal-inference
Repo	https://github.com/blei-lab/causal-text-embeddings
Framework	tf
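
The adjustment idea in the abstract above can be illustrated with a small sketch. It assumes we already have, for each document, a predicted outcome under treatment and under control (`q1`, `q0`) and a predicted treatment probability (`g`), for instance from a model fit on text embeddings. The function names and the choice of estimators (a plug-in estimate and an augmented inverse-probability-weighted estimate) are standard illustrations, not necessarily the ones the authors use.

```python
import numpy as np

def ate_plugin(q0, q1):
    """Plug-in average treatment effect: mean of Q(1, x_i) - Q(0, x_i)."""
    q0 = np.asarray(q0, dtype=float)
    q1 = np.asarray(q1, dtype=float)
    return float(np.mean(q1 - q0))

def ate_aipw(t, y, g, q0, q1):
    """Augmented inverse-probability-weighted (doubly robust) ATE estimate.

    t: observed binary treatments, y: observed outcomes,
    g: predicted P(T=1 | x), q0/q1: predicted outcomes under T=0 / T=1.
    """
    t, y, g, q0, q1 = (np.asarray(a, dtype=float) for a in (t, y, g, q0, q1))
    correction = t * (y - q1) / g - (1.0 - t) * (y - q0) / (1.0 - g)
    return float(np.mean(q1 - q0 + correction))

# Toy data (hypothetical): the outcome model is exact, so both estimates agree.
t = [1, 0, 1, 0]
q0 = [1.0, 2.0, 3.0, 4.0]
q1 = [3.0, 4.0, 5.0, 6.0]
y = [3.0, 2.0, 5.0, 4.0]
g = [0.5, 0.5, 0.5, 0.5]
print(ate_plugin(q0, q1), ate_aipw(t, y, g, q0, q1))
```

Both estimators only need predictions of the treatment and the outcome, which is why a representation that predicts those two quantities well can be enough for adjustment.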

INFaaS: A Model-less Inference Serving System


Title	INFaaS: A Model-less Inference Serving System
Authors	Francisco Romero, Qian Li, Neeraja J. Yadwadkar, Christos Kozyrakis
Abstract	Despite existing work in machine learning inference serving, ease-of-use and cost efficiency remain key challenges. Developers must manually match the performance, accuracy, and cost constraints of their applications to decisions about selecting the right model and model optimizations, suitable hardware architectures, and auto-scaling configurations. These interacting decisions are difficult to make for users, especially when the application load varies, applications evolve, and the available resources vary over time. Thus, users often end up making decisions that overprovision resources. This paper introduces INFaaS, a model-less inference-as-a-service system that relieves users of making these decisions. INFaaS provides a simple interface allowing users to specify their inference task, and performance and accuracy requirements. To implement this interface, INFaaS generates and leverages model-variants, versions of a model that differ in resource footprints, latencies, costs, and accuracies. Based on the characteristics of the model-variants, INFaaS automatically navigates the decision space on behalf of users to meet user-specified objectives: (a) it selects a model, hardware architecture, and any compiler optimizations, and (b) it makes scaling and resource allocation decisions. By sharing models across users and hardware resources across models, INFaaS achieves up to 150x cost savings, 1.5x higher throughput, and violates latency objectives 1.5x less frequently, compared to Clipper and TensorFlow Serving.
Tasks	Model Selection
Published	2019-05-30
URL	https://arxiv.org/abs/1905.13348v5
PDF	https://arxiv.org/pdf/1905.13348v5.pdf
PWC	https://paperswithcode.com/paper/infaas-managed-model-less-inference-serving
Repo	https://github.com/stanford-mast/INFaaS
Framework	tf

Distorted Representation Space Characterization Through Backpropagated Gradients


Title	Distorted Representation Space Characterization Through Backpropagated Gradients
Authors	Gukyeong Kwon, Mohit Prabhushankar, Dogancan Temel, Ghassan AlRegib
Abstract	In this paper, we utilize weight gradients from backpropagation to characterize the representation space learned by deep learning algorithms. We demonstrate the utility of such gradients in applications including perceptual image quality assessment and out-of-distribution classification. The applications are chosen to validate the effectiveness of gradients as features when the test image distribution is distorted from the train image distribution. In both applications, the proposed gradient based features outperform activation features. In image quality assessment, the proposed approach is compared with other state of the art approaches and is generally the top performing method on TID 2013 and MULTI-LIVE databases in terms of accuracy, consistency, linearity, and monotonic behavior. Finally, we analyze the effect of regularization on gradients using CURE-TSR dataset for out-of-distribution classification.
Tasks	Image Quality Assessment
Published	2019-08-27
URL	https://arxiv.org/abs/1908.09998v1
PDF	https://arxiv.org/pdf/1908.09998v1.pdf
PWC	https://paperswithcode.com/paper/distorted-representation-space
Repo	https://github.com/gukyeongkwon/distorted-representation-characterization
Framework	pytorch

Choosing Transfer Languages for Cross-Lingual Learning


Title	Choosing Transfer Languages for Cross-Lingual Learning
Authors	Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig
Abstract	Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages. However, given a particular task language, it is not clear which language to transfer from, and the standard strategy is to select languages based on ad hoc criteria, usually the intuition of the experimenter. Since a large number of features contribute to the success of cross-lingual transfer (including phylogenetic similarity, typological properties, lexical overlap, or size of available data), even the most enlightened experimenter rarely considers all these factors for the particular task at hand. In this paper, we consider this task of automatically selecting optimal transfer languages as a ranking problem, and build models that consider the aforementioned features to perform this prediction. In experiments on representative NLP tasks, we demonstrate that our model predicts good transfer languages much better than ad hoc baselines considering single features in isolation, and glean insights into which features are most informative for each NLP task, which may inform future ad hoc selection even without use of our method. Code, data, and pre-trained models are available at https://github.com/neulab/langrank
Tasks	Cross-Lingual Transfer
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12688v2
PDF	https://arxiv.org/pdf/1905.12688v2.pdf
PWC	https://paperswithcode.com/paper/choosing-transfer-languages-for-cross-lingual
Repo	https://github.com/neulab/langrank
Framework	none

Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition


Title	Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition
Authors	Chao Gao, Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor
Abstract	The Pommerman Team Environment is a recently proposed benchmark which involves a multi-agent domain with challenges such as partial observability, decentralized execution (without communication), and very sparse and delayed rewards. The inaugural Pommerman Team Competition held at NeurIPS 2018 hosted 25 participants who each submitted a team of 2 agents. Our submission nn_team_skynet955_skynet955 won 2nd place in the “learning agents” category. Our team is composed of 2 neural networks trained with state of the art deep reinforcement learning algorithms and makes use of concepts like reward shaping, curriculum learning, and an automatic reasoning module for action pruning. Here, we describe these elements and additionally we present a collection of open-sourced agents that can be used for training and testing in the Pommerman environment. Code available at: https://github.com/BorealisAI/pommerman-baseline
Tasks
Published	2019-04-20
URL	http://arxiv.org/abs/1905.01360v1
PDF	http://arxiv.org/pdf/1905.01360v1.pdf
PWC	https://paperswithcode.com/paper/190501360
Repo	https://github.com/BorealisAI/pommerman-baseline
Framework	none

Saliency detection based on structural dissimilarity induced by image quality assessment model


Title	Saliency detection based on structural dissimilarity induced by image quality assessment model
Authors	Yang Li, Xuanqin Mou
Abstract	The distinctiveness of image regions is widely used as the cue of saliency. Generally, the distinctiveness is computed according to the absolute difference of features. However, according to the image quality assessment (IQA) studies, the human visual system is highly sensitive to structural changes rather than absolute difference. Accordingly, we propose the computation of the structural dissimilarity between image patches as the distinctiveness measure for saliency detection. Similar to IQA models, the structural dissimilarity is computed based on the correlation of the structural features. The global structural dissimilarity of a patch to all the other patches represents saliency of the patch. We adopt two widely used structural features, namely the local contrast and gradient magnitude, into the structural dissimilarity computation in the proposed model. Without any postprocessing, the proposed model based on the correlation of either of the two structural features outperforms 11 state-of-the-art saliency models on three saliency databases.
Tasks	Image Quality Assessment, Saliency Detection
Published	2019-05-24
URL	https://arxiv.org/abs/1905.10150v1
PDF	https://arxiv.org/pdf/1905.10150v1.pdf
PWC	https://paperswithcode.com/paper/saliency-detection-based-on-structural
Repo	https://github.com/yangli-xjtu/SDS
Framework	none
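
As a rough illustration of the idea in the abstract above, the sketch below scores each patch by its average structural dissimilarity to all other patches. It assumes each patch is already described by a feature vector (for example local contrast or gradient magnitude values), and it defines dissimilarity as one minus the Pearson correlation; that definition and the function name are assumptions made for this example, not the authors' exact formulation.

```python
import numpy as np

def structural_saliency(patch_features):
    """Mean (1 - correlation) of each patch's features to every other patch.

    patch_features: array of shape (n_patches, n_features), n_patches >= 2.
    Rows with constant features have undefined correlation and yield NaN.
    """
    feats = np.asarray(patch_features, dtype=float)
    corr = np.corrcoef(feats)            # rows are treated as variables
    dissim = 1.0 - corr
    np.fill_diagonal(dissim, 0.0)        # ignore self-comparison
    return dissim.sum(axis=1) / (len(feats) - 1)

# Two patches share the same structure; the third is inverted and stands out.
patches = [[1.0, 2.0, 3.0],
           [1.0, 2.0, 3.0],
           [3.0, 2.0, 1.0]]
print(structural_saliency(patches))
```

Because correlation ignores offset and scale, two patches with the same structure but different brightness would still count as similar, which is the contrast with an absolute-difference measure that the abstract draws.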

UPSNet: A Unified Panoptic Segmentation Network


Title	UPSNet: A Unified Panoptic Segmentation Network
Authors	Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun
Abstract	In this paper, we propose a unified panoptic segmentation network (UPSNet) for tackling the newly proposed panoptic segmentation task. On top of a single backbone residual network, we first design a deformable convolution based semantic segmentation head and a Mask R-CNN style instance segmentation head which solve these two subtasks simultaneously. More importantly, we introduce a parameter-free panoptic head which solves the panoptic segmentation via pixel-wise classification. It first leverages the logits from the previous two heads and then innovatively expands the representation for enabling prediction of an extra unknown class which helps better resolve the conflicts between semantic and instance segmentation. Additionally, it handles the challenge caused by the varying number of instances and permits back propagation to the bottom modules in an end-to-end manner. Extensive experimental results on Cityscapes, COCO and our internal dataset demonstrate that our UPSNet achieves state-of-the-art performance with much faster inference. Code has been made available at: https://github.com/uber-research/UPSNet
Tasks	Instance Segmentation, Panoptic Segmentation, Semantic Segmentation
Published	2019-01-12
URL	http://arxiv.org/abs/1901.03784v2
PDF	http://arxiv.org/pdf/1901.03784v2.pdf
PWC	https://paperswithcode.com/paper/upsnet-a-unified-panoptic-segmentation
Repo	https://github.com/uber-research/UPSNet
Framework	pytorch

Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation


Title	Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation
Authors	Chenze Shao, Jinchao Zhang, Yang Feng, Fandong Meng, Jie Zhou
Abstract	Non-Autoregressive Neural Machine Translation (NAT) achieves significant decoding speedup through generating target words independently and simultaneously. However, in the context of non-autoregressive translation, the word-level cross-entropy loss cannot model the target-side sequential dependency properly, leading to its weak correlation with the translation quality. As a result, NAT tends to generate disfluent translations with over-translation and under-translation errors. In this paper, we propose to train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model output and the reference sentence. The bag-of-ngrams training objective is differentiable and can be efficiently calculated, which encourages NAT to capture the target-side sequential dependency and correlates well with the translation quality. We validate our approach on three translation tasks and show that our approach largely outperforms the NAT baseline by about 5.0 BLEU scores on WMT14 En↔De and about 2.5 BLEU scores on WMT16 En↔Ro.
Tasks	Machine Translation
Published	2019-11-21
URL	https://arxiv.org/abs/1911.09320v1
PDF	https://arxiv.org/pdf/1911.09320v1.pdf
PWC	https://paperswithcode.com/paper/minimizing-the-bag-of-ngrams-difference-for
Repo	https://github.com/ictnlp/BoN-NAT
Framework	pytorch
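
To make the bag-of-ngrams difference concrete, here is a minimal sketch that counts n-grams in a hypothesis and a reference and sums the absolute count differences. This discrete version is not differentiable and only illustrates the quantity being compared; the training objective described in the abstract works on the model's output distribution instead. The use of an L1 distance and the function names are assumptions for this example.

```python
from collections import Counter

def bag_of_ngrams(tokens, n):
    """Multiset of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bon_l1_distance(hyp, ref, n=2):
    """Sum of absolute n-gram count differences between hypothesis and reference."""
    h = bag_of_ngrams(hyp, n)
    r = bag_of_ngrams(ref, n)
    return sum(abs(h[g] - r[g]) for g in set(h) | set(r))

# A repeated word (over-translation) shows up as a non-zero unigram difference.
print(bon_l1_distance(["a", "a", "b"], ["a", "b"], n=1))
# A wrong final word changes one bigram on each side.
print(bon_l1_distance(["a", "b", "c"], ["a", "b", "d"], n=2))
```

Unlike a position-wise cross-entropy, this measure is unaffected by where in the sentence a matching n-gram occurs, so it penalizes repeated or missing words directly.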

TENER: Adapting Transformer Encoder for Named Entity Recognition


Title	TENER: Adapting Transformer Encoder for Named Entity Recognition
Authors	Hang Yan, Bocao Deng, Xiaonan Li, Xipeng Qiu
Abstract	Bidirectional long short-term memory networks (BiLSTMs) have been widely used as an encoder in models solving the named entity recognition (NER) task. Recently, the Transformer has been broadly adopted in various Natural Language Processing (NLP) tasks owing to its parallelism and advantageous performance. Nevertheless, the performance of the Transformer in NER is not as good as it is in other NLP tasks. In this paper, we propose TENER, an NER architecture that adopts an adapted Transformer encoder to model the character-level features and word-level features. By incorporating the direction and relative distance aware attention and the un-scaled attention, we prove the Transformer-like encoder is just as effective for NER as for other NLP tasks.
Tasks	Chinese Named Entity Recognition, Named Entity Recognition
Published	2019-11-10
URL	https://arxiv.org/abs/1911.04474v3
PDF	https://arxiv.org/pdf/1911.04474v3.pdf
PWC	https://paperswithcode.com/paper/tener-adapting-transformer-encoder-for-name
Repo	https://github.com/fastnlp/TENER
Framework	pytorch

Satellite System Graph: Towards the Efficiency Up-Boundary of Graph-Based Approximate Nearest Neighbor Search


Title	Satellite System Graph: Towards the Efficiency Up-Boundary of Graph-Based Approximate Nearest Neighbor Search
Authors	Cong Fu, Changxu Wang, Deng Cai
Abstract	Approximate Nearest Neighbor Search (ANNS) in high dimensional space is essential in database and information retrieval. Recently, there has been a surge of interest in exploring efficient graph-based indices for the ANNS problem. Among them, the NSG has resurrected the theory of Monotonic Search Networks (MSNET) and achieved the state-of-the-art performance. However, the performance of the NSG deviates from a potentially optimal position due to the high sparsity of the graph. Specifically, though the average degree of the graph is small, its search algorithm travels a longer way to reach the query. Integrating both factors, the total search complexity (i.e., the number of distance calculations) is not minimized as intended. In addition, NSG suffers from a high indexing time complexity, which limits the efficiency and the scalability of the method. In this paper, we aim to further mine the potential of the MSNETs. Inspired by the message transfer mechanism of the communication satellite system, we find a new family of MSNETs, namely the Satellite System Graphs (SSG). In particular, while inheriting the superior ANNS properties from the MSNET, we try to ensure the angles between the edges to be no smaller than a given value. Consequently, each node in the graph builds effective connections to its neighborhood omnidirectionally, which ensures an efficient search-routing on the graph like the message transfer among the satellites. We also propose an approximation of the SSG, Navigating SSG, to increase the efficiency of indexing. Both theoretical and extensive experimental analyses are provided to demonstrate the strengths of the proposed approach over the existing state-of-the-art algorithms. Our code has been released on GitHub.
Tasks	Information Retrieval
Published	2019-07-13
URL	https://arxiv.org/abs/1907.06146v2
PDF	https://arxiv.org/pdf/1907.06146v2.pdf
PWC	https://paperswithcode.com/paper/satellite-system-graph-towards-the-efficiency
Repo	https://github.com/ZJULearning/SSG
Framework	none
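
The angle constraint mentioned in the abstract above can be sketched as a neighbor-pruning step: an edge to a candidate is kept only if it forms at least a minimum angle with every edge already kept. The greedy nearest-first order, the 60-degree default, and the function name are assumptions for illustration; the abstract only states that angles between edges should be no smaller than a given value.

```python
import numpy as np

def prune_by_angle(node, candidates, min_angle_deg=60.0):
    """Return indices of candidates kept as neighbors of `node`.

    Candidates are visited nearest first; one is kept only if its edge from
    `node` makes an angle >= min_angle_deg with every previously kept edge.
    """
    node = np.asarray(node, dtype=float)
    vecs = np.asarray(candidates, dtype=float) - node
    dists = np.linalg.norm(vecs, axis=1)
    order = np.argsort(dists, kind="stable")
    cos_max = np.cos(np.radians(min_angle_deg))
    kept = []
    for i in order:
        if dists[i] == 0.0:
            continue  # skip duplicates of the node itself
        u = vecs[i] / dists[i]
        # A larger cosine means a smaller angle, so reject if any cosine is too big.
        if all(np.dot(u, vecs[j] / dists[j]) <= cos_max + 1e-12 for j in kept):
            kept.append(int(i))
    return kept

points = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(prune_by_angle([0.0, 0.0], points))
```

In this toy case the second point lies almost in the same direction as the first, so it is dropped, while the remaining neighbors spread out around the node.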

Heterogeneous Deep Graph Infomax


Title	Heterogeneous Deep Graph Infomax
Authors	Yuxiang Ren, Bo Liu, Chao Huang, Peng Dai, Liefeng Bo, Jiawei Zhang
Abstract	Graph representation learning aims to learn universal node representations that preserve both node attributes and structural information. The derived node representations can be used to serve various downstream tasks, such as node classification and node clustering. When a graph is heterogeneous, the problem becomes more challenging than the homogeneous graph node learning problem. Inspired by the emerging information theoretic-based learning algorithm, in this paper we propose an unsupervised graph neural network Heterogeneous Deep Graph Infomax (HDGI) for heterogeneous graph representation learning. We use the meta-path structure to analyze the connections involving semantics in heterogeneous graphs and utilize graph convolution module and semantic-level attention mechanism to capture local representations. By maximizing local-global mutual information, HDGI effectively learns high-level node representations that can be utilized in downstream graph-related tasks. Experimental results show that HDGI remarkably outperforms state-of-the-art unsupervised graph representation learning methods on both classification and clustering tasks. By feeding the learned representations into a parametric model, such as logistic regression, we even achieve comparable performance in node classification tasks compared with state-of-the-art supervised end-to-end GNN models.
Tasks	Graph Representation Learning, Heterogeneous Node Classification, Node Classification, Representation Learning
Published	2019-11-19
URL	https://arxiv.org/abs/1911.08538v2
PDF	https://arxiv.org/pdf/1911.08538v2.pdf
PWC	https://paperswithcode.com/paper/heterogeneous-deep-graph-infomax
Repo	https://github.com/YuxiangRen/Heterogeneous-Deep-Graph-Infomax
Framework	pytorch
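
For readers unfamiliar with meta-paths, the sketch below shows one common way to turn a meta-path into an adjacency matrix over a single node type. It assumes a bipartite incidence matrix between papers and authors and builds a paper-author-paper adjacency from it; the node types, the function name, and the binarization are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def metapath_adjacency(incidence):
    """Adjacency between row-type nodes that share at least one column-type node.

    incidence: (n_rows, n_cols) 0/1 matrix, e.g. papers x authors.
    Returns an (n_rows, n_rows) 0/1 matrix with a zero diagonal.
    """
    inc = np.asarray(incidence)
    counts = inc @ inc.T                 # number of shared column-type nodes
    adj = (counts > 0).astype(int)
    np.fill_diagonal(adj, 0)             # drop self-loops
    return adj

# Three papers, two authors: paper 1 shares an author with papers 0 and 2.
paper_author = [[1, 0],
                [1, 1],
                [0, 1]]
print(metapath_adjacency(paper_author))
```

Each meta-path yields its own adjacency matrix, so a graph convolution can be applied per meta-path before the results are combined.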

Question Answering as Global Reasoning over Semantic Abstractions


Title	Question Answering as Global Reasoning over Semantic Abstractions
Authors	Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Dan Roth
Abstract	We propose a novel method for exploiting the semantic structure of text to answer multiple-choice questions. The approach is especially suitable for domains that require reasoning over a diverse set of linguistic constructs but have limited training data. To address these challenges, we present the first system, to the best of our knowledge, that reasons over a wide range of semantic abstractions of the text, which are derived using off-the-shelf, general-purpose, pre-trained natural language modules such as semantic role labelers, coreference resolvers, and dependency parsers. Representing multiple abstractions as a family of graphs, we translate question answering (QA) into a search for an optimal subgraph that satisfies certain global and local properties. This formulation generalizes several prior structured QA systems. Our system, SEMANTICILP, demonstrates strong performance on two domains simultaneously. In particular, on a collection of challenging science QA datasets, it outperforms various state-of-the-art approaches, including neural models, broad coverage information retrieval, and specialized techniques using structured knowledge bases, by 2%-6%.
Tasks	Information Retrieval, Question Answering
Published	2019-06-09
URL	https://arxiv.org/abs/1906.03672v1
PDF	https://arxiv.org/pdf/1906.03672v1.pdf
PWC	https://paperswithcode.com/paper/question-answering-as-global-reasoning-over
Repo	https://github.com/allenai/semanticilp
Framework	none

Spherical View Synthesis for Self-Supervised 360 Depth Estimation


Title	Spherical View Synthesis for Self-Supervised 360 Depth Estimation
Authors	Nikolaos Zioulis, Antonis Karakottas, Dimitrios Zarpalas, Federico Alvarez, Petros Daras
Abstract	Learning based approaches for depth perception are limited by the availability of clean training data. This has led to the utilization of view synthesis as an indirect objective for learning depth estimation using efficient data acquisition procedures. Nonetheless, most research focuses on pinhole based monocular vision, with scarce works presenting results for omnidirectional input. In this work, we explore spherical view synthesis for learning monocular 360 depth in a self-supervised manner and demonstrate its feasibility. Under a purely geometrically derived formulation we present results for horizontal and vertical baselines, as well as for the trinocular case. Further, we show how to better exploit the expressiveness of traditional CNNs when applied to the equirectangular domain in an efficient manner. Finally, given the availability of ground truth depth data, our work is uniquely positioned to compare view synthesis against direct supervision in a consistent and fair manner. The results indicate that alternative research directions might be better suited to enable higher quality depth perception. Our data, models and code are publicly available at https://vcl3d.github.io/SphericalViewSynthesis/.
Tasks	3D Depth Estimation, Depth Estimation
Published	2019-09-17
URL	https://arxiv.org/abs/1909.08112v1
PDF	https://arxiv.org/pdf/1909.08112v1.pdf
PWC	https://paperswithcode.com/paper/spherical-view-synthesis-for-self-supervised
Repo	https://github.com/VCL3D/SphericalViewSynthesis
Framework	pytorch

Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective


Title	Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective
Authors	Kaidi Xu, Hongge Chen, Sijia Liu, Pin-Yu Chen, Tsui-Wei Weng, Mingyi Hong, Xue Lin
Abstract	Graph neural networks (GNNs), which apply deep neural networks to graph data, have achieved significant performance for the task of semi-supervised node classification. However, only a few works have addressed the adversarial robustness of GNNs. In this paper, we first present a novel gradient-based attack method that eases the difficulty of tackling discrete graph data. Compared with current adversarial attacks on GNNs, the results show that by perturbing only a small number of edges, through addition and deletion, our optimization-based attack can lead to a noticeable decrease in classification performance. Moreover, leveraging our gradient-based attack, we propose the first optimization-based adversarial training for GNNs. Our method yields higher robustness against both gradient-based and greedy attack methods without sacrificing classification accuracy on the original graph.
Tasks	Node Classification
Published	2019-06-10
URL	https://arxiv.org/abs/1906.04214v3
PDF	https://arxiv.org/pdf/1906.04214v3.pdf
PWC	https://paperswithcode.com/paper/topology-attack-and-defense-for-graph-neural
Repo	https://github.com/KaidiXu/GCN_ADV_Train
Framework	tf

Modelling the influence of data structure on learning in neural networks: the hidden manifold model


Title	Modelling the influence of data structure on learning in neural networks: the hidden manifold model
Authors	Sebastian Goldt, Marc Mézard, Florent Krzakala, Lenka Zdeborová
Abstract	The lack of crisp mathematical models that capture the structure of real-world data sets is a major obstacle to the detailed theoretical understanding of deep neural networks. Here, we introduce a generative model for data sets that we call the hidden manifold model (HMM). The idea is to have high-dimensional inputs lie on a lower-dimensional manifold, with labels that depend only on their position within this manifold, akin to a single layer decoder or generator in a generative adversarial network. We first demonstrate the effect of structured data sets by experimentally comparing the dynamics and the performance of two-layer neural networks trained on three different data sets: (i) an unstructured synthetic data set containing random i.i.d. inputs, (ii) a structured data set drawn from the HMM and (iii) a simple canonical data set containing MNIST images. We pinpoint two phenomena related to the dynamics of the networks and their ability to generalise that only appear when training on structured data sets, and we experimentally demonstrate that training networks on data sets drawn from the HMM reproduces both phenomena seen during training on a real data set. Our main theoretical result is that the learning dynamics in the hidden manifold model is amenable to an analytical treatment, which we show by proving a “Gaussian Equivalence Theorem”, opening the way to further detailed theoretical studies. In particular, we show how the dynamics of stochastic gradient descent for a two-layer network is captured by a set of ordinary differential equations that track the generalisation error at all times.
Tasks
Published	2019-09-25
URL	https://arxiv.org/abs/1909.11500v2
PDF	https://arxiv.org/pdf/1909.11500v2.pdf
PWC	https://paperswithcode.com/paper/modelling-the-influence-of-data-structure-on-1
Repo	https://github.com/sgoldt/hidden-manifold-model
Framework	none
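
The generative idea in the abstract above can be sketched in a few lines: draw low-dimensional latent points, push them through a fixed random single-layer map to obtain high-dimensional inputs, and compute labels from the latent points only. The Gaussian latents, the `tanh` nonlinearity, the sign-of-a-linear-teacher labels, and all names here are assumptions chosen for illustration, not the authors' specification.

```python
import numpy as np

def sample_hidden_manifold(n_samples, input_dim, latent_dim, seed=0):
    """Draw a toy data set whose inputs lie on a latent_dim-dimensional manifold.

    Returns (inputs, labels, latent): inputs has shape (n_samples, input_dim),
    labels are +1/-1 and depend only on the latent coordinates.
    """
    rng = np.random.default_rng(seed)
    latent = rng.standard_normal((n_samples, latent_dim))
    features = rng.standard_normal((latent_dim, input_dim))  # fixed random map
    teacher = rng.standard_normal(latent_dim)                # hypothetical label rule
    inputs = np.tanh(latent @ features / np.sqrt(latent_dim))
    labels = np.sign(latent @ teacher / np.sqrt(latent_dim))
    return inputs, labels, latent

inputs, labels, latent = sample_hidden_manifold(n_samples=50, input_dim=100, latent_dim=5)
print(inputs.shape, labels.shape, latent.shape)
```

Setting `latent_dim` equal to `input_dim` and skipping the nonlinear map would recover an unstructured i.i.d. data set, which is the kind of baseline the abstract compares against.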