Paper Group AWR 111
Using Text Embeddings for Causal Inference
Title | Using Text Embeddings for Causal Inference |
Authors | Victor Veitch, Dhanya Sridhar, David M. Blei |
Abstract | We address causal inference with text documents. For example, does adding a theorem to a paper affect its chance of acceptance? Does reporting the gender of a forum post author affect the popularity of the post? We estimate these effects from observational data, where they may be confounded by features of the text such as the subject or writing quality. Although the text suffices for causal adjustment, it is prohibitively high-dimensional. The challenge is to find a low-dimensional text representation that can be used in causal inference. A key insight is that causal adjustment requires only the aspects of text that are predictive of both the treatment and outcome. Our proposed method adapts deep language models to learn low-dimensional embeddings from text that predict these values well; these embeddings suffice for causal adjustment. We establish theoretical properties of this method. We study it empirically on semi-simulated and real data on paper acceptance and forum post popularity. Code is available at https://github.com/blei-lab/causal-text-embeddings. |
Tasks | Causal Inference |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12741v1 |
https://arxiv.org/pdf/1905.12741v1.pdf | |
PWC | https://paperswithcode.com/paper/using-text-embeddings-for-causal-inference |
Repo | https://github.com/blei-lab/causal-text-embeddings |
Framework | tf |
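The abstract above argues that causal adjustment only needs the parts of the text that predict both treatment and outcome. As a rough illustration of what "adjustment" means, the sketch below compares a naive difference in means with a stratified (backdoor-adjusted) estimate. It is a toy under stated assumptions, not the authors' method: the discrete `stratum` label is a hypothetical stand-in for a learned low-dimensional text embedding, and the function names and data are invented for the example.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def naive_effect(rows):
    """Difference in mean outcome between treated and control rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def adjusted_effect(rows):
    """Stratified estimate: sum over strata of P(stratum) * per-stratum effect.

    Assumes every stratum contains both treated and control rows (overlap).
    """
    by_stratum = defaultdict(list)
    for stratum, t, y in rows:
        by_stratum[stratum].append((t, y))
    total = len(rows)
    effect = 0.0
    for members in by_stratum.values():
        treated = [y for t, y in members if t == 1]
        control = [y for t, y in members if t == 0]
        effect += (len(members) / total) * (mean(treated) - mean(control))
    return effect


# Hypothetical rows: (stratum, treatment, outcome). Stratum "A" is treated
# more often and also has higher outcomes, so it confounds the comparison.
rows = [
    ("A", 1, 3.0), ("A", 1, 3.0), ("A", 1, 3.0), ("A", 0, 2.0),
    ("B", 1, 1.0), ("B", 0, 0.0), ("B", 0, 0.0), ("B", 0, 0.0),
]
print(naive_effect(rows))     # 2.0
print(adjusted_effect(rows))  # 1.0
```

In this toy data the naive estimate doubles the true per-stratum effect, which is the kind of confounding the paper's embeddings are meant to adjust for.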
INFaaS: A Model-less Inference Serving System
Title | INFaaS: A Model-less Inference Serving System |
Authors | Francisco Romero, Qian Li, Neeraja J. Yadwadkar, Christos Kozyrakis |
Abstract | Despite existing work in machine learning inference serving, ease-of-use and cost efficiency remain key challenges. Developers must manually match the performance, accuracy, and cost constraints of their applications to decisions about selecting the right model and model optimizations, suitable hardware architectures, and auto-scaling configurations. These interacting decisions are difficult to make for users, especially when the application load varies, applications evolve, and the available resources vary over time. Thus, users often end up making decisions that overprovision resources. This paper introduces INFaaS, a model-less inference-as-a-service system that relieves users of making these decisions. INFaaS provides a simple interface allowing users to specify their inference task, and performance and accuracy requirements. To implement this interface, INFaaS generates and leverages model-variants, versions of a model that differ in resource footprints, latencies, costs, and accuracies. Based on the characteristics of the model-variants, INFaaS automatically navigates the decision space on behalf of users to meet user-specified objectives: (a) it selects a model, hardware architecture, and any compiler optimizations, and (b) it makes scaling and resource allocation decisions. By sharing models across users and hardware resources across models, INFaaS achieves up to 150x cost savings, 1.5x higher throughput, and violates latency objectives 1.5x less frequently, compared to Clipper and TensorFlow Serving. |
Tasks | Model Selection |
Published | 2019-05-30 |
URL | https://arxiv.org/abs/1905.13348v5 |
https://arxiv.org/pdf/1905.13348v5.pdf | |
PWC | https://paperswithcode.com/paper/infaas-managed-model-less-inference-serving |
Repo | https://github.com/stanford-mast/INFaaS |
Framework | tf |
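The INFaaS abstract describes choosing among model-variants that differ in accuracy, latency, and cost so that user-specified requirements are met. A minimal sketch of that selection idea follows. The `ModelVariant` fields, the variant names, and the "cheapest feasible variant" rule are assumptions made for illustration; they are not taken from the INFaaS code base, and the real system also handles scaling and resource sharing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelVariant:
    name: str            # hypothetical identifier
    accuracy: float      # e.g. top-1 accuracy in [0, 1]
    latency_ms: float    # measured inference latency
    cost_per_hour: float


def select_variant(
    variants: Sequence[ModelVariant],
    min_accuracy: float,
    max_latency_ms: float,
) -> Optional[ModelVariant]:
    """Return the cheapest variant meeting both requirements, or None."""
    feasible = [
        v for v in variants
        if v.accuracy >= min_accuracy and v.latency_ms <= max_latency_ms
    ]
    return min(feasible, key=lambda v: v.cost_per_hour, default=None)


# Hypothetical variants of one model on different hardware and precisions.
variants = [
    ModelVariant("resnet50-cpu-fp32", 0.76, 120.0, 0.10),
    ModelVariant("resnet50-gpu-fp16", 0.76, 8.0, 0.90),
    ModelVariant("resnet18-cpu-int8", 0.69, 25.0, 0.05),
]
choice = select_variant(variants, min_accuracy=0.75, max_latency_ms=50.0)
print(choice.name if choice else "no feasible variant")
```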
Distorted Representation Space Characterization Through Backpropagated Gradients
Title | Distorted Representation Space Characterization Through Backpropagated Gradients |
Authors | Gukyeong Kwon, Mohit Prabhushankar, Dogancan Temel, Ghassan AlRegib |
Abstract | In this paper, we utilize weight gradients from backpropagation to characterize the representation space learned by deep learning algorithms. We demonstrate the utility of such gradients in applications including perceptual image quality assessment and out-of-distribution classification. The applications are chosen to validate the effectiveness of gradients as features when the test image distribution is distorted from the train image distribution. In both applications, the proposed gradient based features outperform activation features. In image quality assessment, the proposed approach is compared with other state of the art approaches and is generally the top performing method on TID 2013 and MULTI-LIVE databases in terms of accuracy, consistency, linearity, and monotonic behavior. Finally, we analyze the effect of regularization on gradients using CURE-TSR dataset for out-of-distribution classification. |
Tasks | Image Quality Assessment |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.09998v1 |
https://arxiv.org/pdf/1908.09998v1.pdf | |
PWC | https://paperswithcode.com/paper/distorted-representation-space |
Repo | https://github.com/gukyeongkwon/distorted-representation-characterization |
Framework | pytorch |
Choosing Transfer Languages for Cross-Lingual Learning
Title | Choosing Transfer Languages for Cross-Lingual Learning |
Authors | Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, Graham Neubig |
Abstract | Cross-lingual transfer, where a high-resource transfer language is used to improve the accuracy of a low-resource task language, is now an invaluable tool for improving performance of natural language processing (NLP) on low-resource languages. However, given a particular task language, it is not clear which language to transfer from, and the standard strategy is to select languages based on ad hoc criteria, usually the intuition of the experimenter. Since a large number of features contribute to the success of cross-lingual transfer (including phylogenetic similarity, typological properties, lexical overlap, or size of available data), even the most enlightened experimenter rarely considers all these factors for the particular task at hand. In this paper, we consider this task of automatically selecting optimal transfer languages as a ranking problem, and build models that consider the aforementioned features to perform this prediction. In experiments on representative NLP tasks, we demonstrate that our model predicts good transfer languages much better than ad hoc baselines considering single features in isolation, and glean insights on what features are most informative for each NLP task, which may inform future ad hoc selection even without use of our method. Code, data, and pre-trained models are available at https://github.com/neulab/langrank |
Tasks | Cross-Lingual Transfer |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1905.12688v2 |
https://arxiv.org/pdf/1905.12688v2.pdf | |
PWC | https://paperswithcode.com/paper/choosing-transfer-languages-for-cross-lingual |
Repo | https://github.com/neulab/langrank |
Framework | none |
Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition
Title | Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition |
Authors | Chao Gao, Pablo Hernandez-Leal, Bilal Kartal, Matthew E. Taylor |
Abstract | The Pommerman Team Environment is a recently proposed benchmark which involves a multi-agent domain with challenges such as partial observability, decentralized execution (without communication), and very sparse and delayed rewards. The inaugural Pommerman Team Competition held at NeurIPS 2018 hosted 25 participants who submitted a team of 2 agents. Our submission nn_team_skynet955_skynet955 won 2nd place in the “learning agents” category. Our team is composed of 2 neural networks trained with state of the art deep reinforcement learning algorithms and makes use of concepts like reward shaping, curriculum learning, and an automatic reasoning module for action pruning. Here, we describe these elements and additionally we present a collection of open-sourced agents that can be used for training and testing in the Pommerman environment. Code available at: https://github.com/BorealisAI/pommerman-baseline |
Tasks | |
Published | 2019-04-20 |
URL | http://arxiv.org/abs/1905.01360v1 |
http://arxiv.org/pdf/1905.01360v1.pdf | |
PWC | https://paperswithcode.com/paper/190501360 |
Repo | https://github.com/BorealisAI/pommerman-baseline |
Framework | none |
Saliency detection based on structural dissimilarity induced by image quality assessment model
Title | Saliency detection based on structural dissimilarity induced by image quality assessment model |
Authors | Yang Li, Xuanqin Mou |
Abstract | The distinctiveness of image regions is widely used as the cue of saliency. Generally, the distinctiveness is computed according to the absolute difference of features. However, according to the image quality assessment (IQA) studies, the human visual system is highly sensitive to structural changes rather than absolute difference. Accordingly, we propose the computation of the structural dissimilarity between image patches as the distinctiveness measure for saliency detection. Similar to IQA models, the structural dissimilarity is computed based on the correlation of the structural features. The global structural dissimilarity of a patch to all the other patches represents saliency of the patch. We adopt two widely used structural features, namely the local contrast and gradient magnitude, into the structural dissimilarity computation in the proposed model. Without any postprocessing, the proposed model based on the correlation of either of the two structural features outperforms 11 state-of-the-art saliency models on three saliency databases. |
Tasks | Image Quality Assessment, Saliency Detection |
Published | 2019-05-24 |
URL | https://arxiv.org/abs/1905.10150v1 |
https://arxiv.org/pdf/1905.10150v1.pdf | |
PWC | https://paperswithcode.com/paper/saliency-detection-based-on-structural |
Repo | https://github.com/yangli-xjtu/SDS |
Framework | none |
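The saliency abstract above scores a patch by its global structural dissimilarity to all other patches, computed from the correlation of structural features. The sketch below shows one way such a score could be formed. It assumes each patch is already summarized by a feature vector (for example local contrast values) and uses `1 - Pearson correlation` as the dissimilarity; both choices are illustrative assumptions and may differ from the formula used in the paper.

```python
import math


def correlation(a, b, eps=1e-12):
    """Pearson correlation of two equal-length feature vectors.

    eps keeps the ratio finite for constant (zero-variance) patches.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (math.sqrt(var_a * var_b) + eps)


def patch_saliency(patch_features):
    """Saliency of each patch: summed dissimilarity to every other patch."""
    return [
        sum(
            1.0 - correlation(p, q)
            for j, q in enumerate(patch_features)
            if j != i
        )
        for i, p in enumerate(patch_features)
    ]


# Hypothetical feature vectors: the third patch has the opposite structure
# to the first two, so it receives the highest saliency.
patches = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
scores = patch_saliency(patches)
print(scores.index(max(scores)))  # 2
```

Note that the first two patches differ in absolute values but not in structure, so their mutual dissimilarity is close to zero.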
UPSNet: A Unified Panoptic Segmentation Network
Title | UPSNet: A Unified Panoptic Segmentation Network |
Authors | Yuwen Xiong, Renjie Liao, Hengshuang Zhao, Rui Hu, Min Bai, Ersin Yumer, Raquel Urtasun |
Abstract | In this paper, we propose a unified panoptic segmentation network (UPSNet) for tackling the newly proposed panoptic segmentation task. On top of a single backbone residual network, we first design a deformable convolution based semantic segmentation head and a Mask R-CNN style instance segmentation head which solve these two subtasks simultaneously. More importantly, we introduce a parameter-free panoptic head which solves the panoptic segmentation via pixel-wise classification. It first leverages the logits from the previous two heads and then innovatively expands the representation for enabling prediction of an extra unknown class which helps better resolve the conflicts between semantic and instance segmentation. Additionally, it handles the challenge caused by the varying number of instances and permits back propagation to the bottom modules in an end-to-end manner. Extensive experimental results on Cityscapes, COCO and our internal dataset demonstrate that our UPSNet achieves state-of-the-art performance with much faster inference. Code has been made available at: https://github.com/uber-research/UPSNet |
Tasks | Instance Segmentation, Panoptic Segmentation, Semantic Segmentation |
Published | 2019-01-12 |
URL | http://arxiv.org/abs/1901.03784v2 |
http://arxiv.org/pdf/1901.03784v2.pdf | |
PWC | https://paperswithcode.com/paper/upsnet-a-unified-panoptic-segmentation |
Repo | https://github.com/uber-research/UPSNet |
Framework | pytorch |
Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation
Title | Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation |
Authors | Chenze Shao, Jinchao Zhang, Yang Feng, Fandong Meng, Jie Zhou |
Abstract | Non-Autoregressive Neural Machine Translation (NAT) achieves significant decoding speedup through generating target words independently and simultaneously. However, in the context of non-autoregressive translation, the word-level cross-entropy loss cannot model the target-side sequential dependency properly, leading to its weak correlation with the translation quality. As a result, NAT tends to generate disfluent translations with over-translation and under-translation errors. In this paper, we propose to train NAT to minimize the Bag-of-Ngrams (BoN) difference between the model output and the reference sentence. The bag-of-ngrams training objective is differentiable and can be efficiently calculated, which encourages NAT to capture the target-side sequential dependency and correlates well with the translation quality. We validate our approach on three translation tasks and show that our approach largely outperforms the NAT baseline by about 5.0 BLEU scores on WMT14 En$\leftrightarrow$De and about 2.5 BLEU scores on WMT16 En$\leftrightarrow$Ro. |
Tasks | Machine Translation |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09320v1 |
https://arxiv.org/pdf/1911.09320v1.pdf | |
PWC | https://paperswithcode.com/paper/minimizing-the-bag-of-ngrams-difference-for |
Repo | https://github.com/ictnlp/BoN-NAT |
Framework | pytorch |
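To make the Bag-of-Ngrams difference from the abstract above concrete, the sketch below counts n-grams in two token sequences and sums the absolute differences of the counts. This is a simplified, discrete illustration: the training objective described in the paper is differentiable and computed from the model's output probabilities, which this example does not attempt to reproduce. The function names and the L1 distance are assumptions chosen for the example.

```python
from collections import Counter


def bag_of_ngrams(tokens, n):
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bon_l1_distance(hypothesis, reference, n=2):
    """Sum of absolute count differences over all n-grams in either sequence."""
    hyp_bag = bag_of_ngrams(hypothesis, n)
    ref_bag = bag_of_ngrams(reference, n)
    # Counter returns 0 for n-grams missing from one side.
    return sum(abs(hyp_bag[g] - ref_bag[g]) for g in set(hyp_bag) | set(ref_bag))


hypothesis = "the the cat sat".split()   # over-translates "the", drops "down"
reference = "the cat sat down".split()
print(bon_l1_distance(hypothesis, reference, n=2))  # 2
```

The repeated word and the missing word each contribute one mismatched bigram, which is the kind of over- and under-translation error the abstract mentions.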
TENER: Adapting Transformer Encoder for Named Entity Recognition
Title | TENER: Adapting Transformer Encoder for Named Entity Recognition |
Authors | Hang Yan, Bocao Deng, Xiaonan Li, Xipeng Qiu |
Abstract | Bidirectional long short-term memory networks (BiLSTM) have been widely used as an encoder in models solving the named entity recognition (NER) task. Recently, the Transformer has been broadly adopted in various Natural Language Processing (NLP) tasks owing to its parallelism and advantageous performance. Nevertheless, the performance of the Transformer in NER is not as good as it is in other NLP tasks. In this paper, we propose TENER, a NER architecture adopting an adapted Transformer encoder to model the character-level features and word-level features. By incorporating the direction and relative distance aware attention and the un-scaled attention, we prove the Transformer-like encoder is just as effective for NER as for other NLP tasks. |
Tasks | Chinese Named Entity Recognition, Named Entity Recognition |
Published | 2019-11-10 |
URL | https://arxiv.org/abs/1911.04474v3 |
https://arxiv.org/pdf/1911.04474v3.pdf | |
PWC | https://paperswithcode.com/paper/tener-adapting-transformer-encoder-for-name |
Repo | https://github.com/fastnlp/TENER |
Framework | pytorch |
Satellite System Graph: Towards the Efficiency Up-Boundary of Graph-Based Approximate Nearest Neighbor Search
Title | Satellite System Graph: Towards the Efficiency Up-Boundary of Graph-Based Approximate Nearest Neighbor Search |
Authors | Cong Fu, Changxu Wang, Deng Cai |
Abstract | Approximate Nearest Neighbor Search (ANNS) in high dimensional space is essential in database and information retrieval. Recently, there has been a surge of interest in exploring efficient graph-based indices for the ANNS problem. Among them, the NSG has resurrected the theory of Monotonic Search Networks (MSNET) and achieved the state-of-the-art performance. However, the performance of the NSG deviates from a potentially optimal position due to the high sparsity of the graph. Specifically, though the average degree of the graph is small, their search algorithm travels a longer way to reach the query. Integrating both factors, the total search complexity (i.e., the number of distance calculations) is not minimized as intended. In addition, NSG suffers from a high indexing time complexity, which limits the efficiency and the scalability of their method. In this paper, we aim to further mine the potential of the MSNETs. Inspired by the message transfer mechanism of the communication satellite system, we find a new family of MSNETs, namely the Satellite System Graphs (SSG). In particular, while inheriting the superior ANNS properties from the MSNET, we try to ensure the angles between the edges to be no smaller than a given value. Consequently, each node in the graph builds effective connections to its neighborhood omnidirectionally, which ensures an efficient search-routing on the graph like the message transfer among the satellites. We also propose an approximation of the SSG, Navigating SSG, to increase the efficiency of indexing. Both theoretical and extensive experimental analyses are provided to demonstrate the strengths of the proposed approach over the existing state-of-the-art algorithms. Our code has been released on GitHub. |
Tasks | Information Retrieval |
Published | 2019-07-13 |
URL | https://arxiv.org/abs/1907.06146v2 |
https://arxiv.org/pdf/1907.06146v2.pdf | |
PWC | https://paperswithcode.com/paper/satellite-system-graph-towards-the-efficiency |
Repo | https://github.com/ZJULearning/SSG |
Framework | none |
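The SSG abstract above says edges out of a node are chosen so that the angle between any two of them is no smaller than a given value. One simple way to picture this is a greedy pass over candidate neighbors, nearest first, that keeps a candidate only if its direction is far enough from every edge already kept. The sketch below does that; the greedy nearest-first order, the function names, and the sample points are assumptions for illustration, not the authors' indexing algorithm.

```python
import math


def angle_between(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def select_neighbors(node, candidates, min_angle_deg):
    """Keep candidates, nearest first, whose edge direction differs from
    every previously kept edge by at least min_angle_deg."""
    min_angle = math.radians(min_angle_deg)
    ordered = sorted(candidates, key=lambda c: math.dist(node, c))
    kept, kept_dirs = [], []
    for cand in ordered:
        direction = [c - n for c, n in zip(cand, node)]
        if not any(direction):
            continue  # candidate coincides with the node itself
        if all(angle_between(direction, d) >= min_angle for d in kept_dirs):
            kept.append(cand)
            kept_dirs.append(direction)
    return kept


node = (0.0, 0.0)
candidates = [(1.0, 0.0), (2.0, 0.1), (0.0, 1.0), (-1.0, 0.0), (1.0, 1.0)]
print(select_neighbors(node, candidates, min_angle_deg=60))
# [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
```

The two rejected points lie within 60 degrees of the edge to `(1.0, 0.0)`, so the kept edges spread out in different directions around the node.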
Heterogeneous Deep Graph Infomax
Title | Heterogeneous Deep Graph Infomax |
Authors | Yuxiang Ren, Bo Liu, Chao Huang, Peng Dai, Liefeng Bo, Jiawei Zhang |
Abstract | Graph representation learning aims to learn universal node representations that preserve both node attributes and structural information. The derived node representations can be used to serve various downstream tasks, such as node classification and node clustering. When a graph is heterogeneous, the problem becomes more challenging than the homogeneous graph node learning problem. Inspired by emerging information-theoretic learning algorithms, in this paper we propose an unsupervised graph neural network, Heterogeneous Deep Graph Infomax (HDGI), for heterogeneous graph representation learning. We use the meta-path structure to analyze the connections involving semantics in heterogeneous graphs and utilize a graph convolution module and a semantic-level attention mechanism to capture local representations. By maximizing local-global mutual information, HDGI effectively learns high-level node representations that can be utilized in downstream graph-related tasks. Experimental results show that HDGI remarkably outperforms state-of-the-art unsupervised graph representation learning methods on both classification and clustering tasks. By feeding the learned representations into a parametric model, such as logistic regression, we even achieve comparable performance in node classification tasks compared with state-of-the-art supervised end-to-end GNN models. |
Tasks | Graph Representation Learning, Heterogeneous Node Classification, Node Classification, Representation Learning |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08538v2 |
https://arxiv.org/pdf/1911.08538v2.pdf | |
PWC | https://paperswithcode.com/paper/heterogeneous-deep-graph-infomax |
Repo | https://github.com/YuxiangRen/Heterogeneous-Deep-Graph-Infomax |
Framework | pytorch |
Question Answering as Global Reasoning over Semantic Abstractions
Title | Question Answering as Global Reasoning over Semantic Abstractions |
Authors | Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Dan Roth |
Abstract | We propose a novel method for exploiting the semantic structure of text to answer multiple-choice questions. The approach is especially suitable for domains that require reasoning over a diverse set of linguistic constructs but have limited training data. To address these challenges, we present the first system, to the best of our knowledge, that reasons over a wide range of semantic abstractions of the text, which are derived using off-the-shelf, general-purpose, pre-trained natural language modules such as semantic role labelers, coreference resolvers, and dependency parsers. Representing multiple abstractions as a family of graphs, we translate question answering (QA) into a search for an optimal subgraph that satisfies certain global and local properties. This formulation generalizes several prior structured QA systems. Our system, SEMANTICILP, demonstrates strong performance on two domains simultaneously. In particular, on a collection of challenging science QA datasets, it outperforms various state-of-the-art approaches, including neural models, broad coverage information retrieval, and specialized techniques using structured knowledge bases, by 2%-6%. |
Tasks | Information Retrieval, Question Answering |
Published | 2019-06-09 |
URL | https://arxiv.org/abs/1906.03672v1 |
https://arxiv.org/pdf/1906.03672v1.pdf | |
PWC | https://paperswithcode.com/paper/question-answering-as-global-reasoning-over |
Repo | https://github.com/allenai/semanticilp |
Framework | none |
Spherical View Synthesis for Self-Supervised 360 Depth Estimation
Title | Spherical View Synthesis for Self-Supervised 360 Depth Estimation |
Authors | Nikolaos Zioulis, Antonis Karakottas, Dimitrios Zarpalas, Federico Alvarez, Petros Daras |
Abstract | Learning based approaches for depth perception are limited by the availability of clean training data. This has led to the utilization of view synthesis as an indirect objective for learning depth estimation using efficient data acquisition procedures. Nonetheless, most research focuses on pinhole based monocular vision, with scarce works presenting results for omnidirectional input. In this work, we explore spherical view synthesis for learning monocular 360 depth in a self-supervised manner and demonstrate its feasibility. Under a purely geometrically derived formulation we present results for horizontal and vertical baselines, as well as for the trinocular case. Further, we show how to better exploit the expressiveness of traditional CNNs when applied to the equirectangular domain in an efficient manner. Finally, given the availability of ground truth depth data, our work is uniquely positioned to compare view synthesis against direct supervision in a consistent and fair manner. The results indicate that alternative research directions might be better suited to enable higher quality depth perception. Our data, models and code are publicly available at https://vcl3d.github.io/SphericalViewSynthesis/. |
Tasks | 3D Depth Estimation, Depth Estimation |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.08112v1 |
https://arxiv.org/pdf/1909.08112v1.pdf | |
PWC | https://paperswithcode.com/paper/spherical-view-synthesis-for-self-supervised |
Repo | https://github.com/VCL3D/SphericalViewSynthesis |
Framework | pytorch |
Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective
Title | Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective |
Authors | Kaidi Xu, Hongge Chen, Sijia Liu, Pin-Yu Chen, Tsui-Wei Weng, Mingyi Hong, Xue Lin |
Abstract | Graph neural networks (GNNs), which apply deep neural networks to graph data, have achieved significant performance for the task of semi-supervised node classification. However, only a few works have addressed the adversarial robustness of GNNs. In this paper, we first present a novel gradient-based attack method that eases the difficulty of tackling discrete graph data. Compared with current adversarial attacks on GNNs, the results show that by perturbing only a small number of edges, including addition and deletion, our optimization-based attack can lead to a noticeable decrease in classification performance. Moreover, leveraging our gradient-based attack, we propose the first optimization-based adversarial training for GNNs. Our method yields higher robustness against both different gradient based and greedy attack methods without sacrificing classification accuracy on the original graph. |
Tasks | Node Classification |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04214v3 |
https://arxiv.org/pdf/1906.04214v3.pdf | |
PWC | https://paperswithcode.com/paper/topology-attack-and-defense-for-graph-neural |
Repo | https://github.com/KaidiXu/GCN_ADV_Train |
Framework | tf |
Modelling the influence of data structure on learning in neural networks: the hidden manifold model
Title | Modelling the influence of data structure on learning in neural networks: the hidden manifold model |
Authors | Sebastian Goldt, Marc Mézard, Florent Krzakala, Lenka Zdeborová |
Abstract | The lack of crisp mathematical models that capture the structure of real-world data sets is a major obstacle to the detailed theoretical understanding of deep neural networks. Here, we introduce a generative model for data sets that we call the hidden manifold model (HMM). The idea is to have high-dimensional inputs lie on a lower-dimensional manifold, with labels that depend only on their position within this manifold, akin to a single layer decoder or generator in a generative adversarial network. We first demonstrate the effect of structured data sets by experimentally comparing the dynamics and the performance of two-layer neural networks trained on three different data sets: (i) an unstructured synthetic data set containing random i.i.d. inputs, (ii) a structured data set drawn from the HMM and (iii) a simple canonical data set containing MNIST images. We pinpoint two phenomena related to the dynamics of the networks and their ability to generalise that only appear when training on structured data sets, and we experimentally demonstrate that training networks on data sets drawn from the HMM reproduces both phenomena seen during training on real data sets. Our main theoretical result is a proof of a “Gaussian Equivalence Theorem”, which shows that the learning dynamics in the hidden manifold model are amenable to an analytical treatment, opening the way to further detailed theoretical studies. In particular, we show how the dynamics of stochastic gradient descent for a two-layer network is captured by a set of ordinary differential equations that track the generalisation error at all times. |
Tasks | |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1909.11500v2 |
https://arxiv.org/pdf/1909.11500v2.pdf | |
PWC | https://paperswithcode.com/paper/modelling-the-influence-of-data-structure-on-1 |
Repo | https://github.com/sgoldt/hidden-manifold-model |
Framework | none |
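The hidden manifold model in the abstract above places high-dimensional inputs on a lower-dimensional manifold and lets labels depend only on the position within that manifold. A minimal sampler in that spirit is sketched below: a Gaussian latent vector is pushed through a fixed random feature matrix and a nonlinearity to form the input, while the label is computed from the latent vector alone. The `tanh` nonlinearity, the single-vector sign "teacher", and all names are assumptions chosen for a short example; the paper's exact construction may differ.

```python
import math
import random


def sample_hidden_manifold(num_samples, latent_dim, input_dim, seed=0):
    """Return (inputs, labels) where labels depend only on the latent vector."""
    rng = random.Random(seed)
    # Fixed random projection from latent space to input space.
    feature_matrix = [
        [rng.gauss(0.0, 1.0) for _ in range(input_dim)]
        for _ in range(latent_dim)
    ]
    # Hypothetical teacher: a single random direction in latent space.
    teacher = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    scale = math.sqrt(latent_dim)

    inputs, labels = [], []
    for _ in range(num_samples):
        latent = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        x = [
            math.tanh(
                sum(latent[r] * feature_matrix[r][i] for r in range(latent_dim))
                / scale
            )
            for i in range(input_dim)
        ]
        y = 1 if sum(c * w for c, w in zip(latent, teacher)) >= 0 else -1
        inputs.append(x)
        labels.append(y)
    return inputs, labels


inputs, labels = sample_hidden_manifold(num_samples=5, latent_dim=3, input_dim=20)
print(len(inputs), len(inputs[0]), labels)
```

Here the inputs have 20 coordinates but are generated from only 3 latent degrees of freedom, which is the structure the abstract contrasts with unstructured i.i.d. inputs.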