February 1, 2020

3291 words 16 mins read

Paper Group AWR 201

Train, Sort, Explain: Learning to Diagnose Translation Models. A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks. Black-Box Adversarial Attack with Transferable Model-based Embedding. Loss Aware Post-training Quantization. Relation Distillation Networks for Video Object Detection. PyRep: Bringing V-REP to Deep …

Train, Sort, Explain: Learning to Diagnose Translation Models

Title Train, Sort, Explain: Learning to Diagnose Translation Models
Authors Robert Schwarzenberg, David Harbecke, Vivien Macketanz, Eleftherios Avramidis, Sebastian Möller
Abstract Evaluating translation models is a trade-off between effort and detail. At one end of the spectrum are automatic count-based methods such as BLEU; at the other are linguistic evaluations by humans, which are arguably more informative but also require a disproportionately high effort. To narrow the spectrum, we propose a general approach for automatically exposing systematic differences between human and machine translations to human experts. Inspired by adversarial settings, we train a neural text classifier to distinguish human from machine translations. A classifier that performs and generalizes well after training should recognize systematic differences between the two classes, which we uncover with neural explainability methods. Our proof-of-concept implementation, DiaMaT, is open source. Applied to a dataset translated by a state-of-the-art neural Transformer model, DiaMaT achieves a classification accuracy of 75% and exposes meaningful differences between humans and the Transformer, amidst the current discussion about human parity.
Tasks
Published 2019-03-28
URL http://arxiv.org/abs/1903.12017v1
PDF http://arxiv.org/pdf/1903.12017v1.pdf
PWC https://paperswithcode.com/paper/train-sort-explain-learning-to-diagnose
Repo https://github.com/dfki-nlp/diamat
Framework tf
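
To make the train-then-explain loop concrete, here is a minimal sketch of the idea, not DiaMaT itself: a logistic regression over n-grams stands in for the paper's neural classifier, and its learned weights stand in for neural explainability methods. The toy sentences and all names are illustrative.

```python
# Minimal sketch of train-then-explain: a linear classifier over n-grams
# substitutes for the neural classifier; its weights substitute for
# neural explainability methods.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

samples = [
    ("the committee approved the proposal", 0),         # 0 = human translation
    ("the committee has approved of the proposal", 1),  # 1 = machine translation
    ("she walked home in the rain", 0),
    ("she was walking to home in the rain", 1),
]
texts, labels = zip(*samples)

vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

# n-grams with the largest positive weights mark machine output;
# the most negative weights mark human text.
ranked = sorted(zip(clf.coef_[0], vec.get_feature_names_out()), reverse=True)
print("most machine-like:", ranked[:3])
print("most human-like:", ranked[-3:])
```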

A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks

Title A Context-Aware Citation Recommendation Model with BERT and Graph Convolutional Networks
Authors Chanwoo Jeong, Sion Jang, Hyuna Shin, Eunjeong Park, Sungchul Choi
Abstract With the tremendous growth in the number of scientific papers being published, searching for references while writing a scientific paper is a time-consuming process. A technique that could add a reference citation at the appropriate place in a sentence would be beneficial. From this perspective, context-aware citation recommendation has been researched for around two decades. Many researchers have utilized the text surrounding the citation tag (the context sentence) and the metadata of the target paper to find the appropriate cited research. However, the lack of well-organized benchmark datasets and of models that attain high performance has made this research difficult. In this paper, we propose a deep-learning-based model and a well-organized dataset for context-aware paper citation recommendation. Our model comprises a document encoder and a context encoder, which use a Graph Convolutional Network (GCN) layer and Bidirectional Encoder Representations from Transformers (BERT), a model pre-trained on textual data. By modifying the related PeerRead dataset, we propose a new dataset called FullTextPeerRead containing context sentences paired with cited references and paper metadata. To the best of our knowledge, this dataset is the first well-organized dataset for context-aware paper recommendation. The results indicate that the proposed model with the proposed datasets can attain state-of-the-art performance and achieve a more than 28% improvement in mean average precision (MAP) and recall@k.
Tasks
Published 2019-03-15
URL http://arxiv.org/abs/1903.06464v1
PDF http://arxiv.org/pdf/1903.06464v1.pdf
PWC https://paperswithcode.com/paper/a-context-aware-citation-recommendation-model
Repo https://github.com/TeamLab/bert-gcn-for-paper-citation
Framework tf
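
As a rough illustration of the two-encoder architecture, the sketch below pairs a single GCN layer over a citation graph (the document encoder) with a stand-in context vector in place of BERT, and ranks candidate papers by dot product. The dimensions, the random graph, and the scoring rule are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        deg = adj.sum(1, keepdim=True).clamp(min=1)  # avoid divide-by-zero
        return torch.relu(self.lin(adj @ x / deg))   # mean-aggregate, then transform

n_papers, dim = 100, 64
paper_feats = torch.randn(n_papers, dim)                # node attributes
adj = (torch.rand(n_papers, n_papers) < 0.05).float()   # toy citation graph

doc_emb = GCNLayer(dim)(paper_feats, adj)   # document encoder (GCN)
ctx_emb = torch.randn(1, dim)               # stand-in for BERT(context sentence)

scores = ctx_emb @ doc_emb.T                # score every candidate paper
print(scores.topk(5).indices)               # top-5 recommendations
```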

Black-Box Adversarial Attack with Transferable Model-based Embedding

Title Black-Box Adversarial Attack with Transferable Model-based Embedding
Authors Zhichao Huang, Tong Zhang
Abstract We present a new method for black-box adversarial attacks. Unlike previous methods that combine transfer-based and score-based approaches by using the gradient or initialization of a surrogate white-box model, this new method learns a low-dimensional embedding using a pretrained model, and then performs an efficient search within the embedding space to attack an unknown target network. The method produces adversarial perturbations with high-level semantic patterns that are easily transferable. We show that this approach can greatly improve the query efficiency of black-box adversarial attacks across different target network architectures. We evaluate our approach on MNIST, ImageNet, and the Google Cloud Vision API, achieving a significant reduction in the number of queries. We also attack adversarially defended networks on CIFAR10 and ImageNet, where our method not only reduces the number of queries but also improves the attack success rate.
Tasks Adversarial Attack
Published 2019-11-17
URL https://arxiv.org/abs/1911.07140v2
PDF https://arxiv.org/pdf/1911.07140v2.pdf
PWC https://paperswithcode.com/paper/black-box-adversarial-attack-with-1
Repo https://github.com/TransEmbedBA/TREMBA
Framework pytorch
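
A minimal sketch of the embedding-space search follows, assuming the paper's pretrained perturbation generator is replaced by an untrained linear map and the target network by a toy loss. The part this illustrates is the NES-style gradient estimate built purely from black-box queries.

```python
import torch

d_embed, d_image = 32, 3 * 32 * 32
generator = torch.nn.Linear(d_embed, d_image)  # stand-in for the pretrained decoder

def query_loss(x_adv):
    return (x_adv ** 2).sum()  # placeholder for querying the unknown target network

x = torch.rand(d_image)    # the image under attack
z = torch.zeros(d_embed)   # embedding of the perturbation
sigma, lr, n_samples = 0.1, 0.05, 10

with torch.no_grad():
    for _ in range(100):
        noise = torch.randn(n_samples, d_embed)
        losses = torch.stack([query_loss(x + generator(z + sigma * u)) for u in noise])
        grad = (losses[:, None] * noise).mean(0) / sigma  # NES gradient estimate
        z = z - lr * grad                                 # descend in embedding space
```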

Loss Aware Post-training Quantization

Title Loss Aware Post-training Quantization
Authors Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, Alex M. Bronstein, Avi Mendelson
Abstract Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. We show that for mild quantization the landscape is flat and separable, enabling straightforward post-training quantization methods to achieve good results. With more aggressive quantization, however, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvements over current post-training quantization methods. A reference implementation is available at https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq
Tasks Quantization
Published 2019-11-17
URL https://arxiv.org/abs/1911.07190v2
PDF https://arxiv.org/pdf/1911.07190v2.pdf
PWC https://paperswithcode.com/paper/loss-aware-post-training-quantization
Repo https://github.com/ynahshan/nn-quantization-pytorch
Framework pytorch
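
The sketch below illustrates the paper's central point in miniature: with aggressive (low-bit) quantization, choosing each layer's scale in isolation can be suboptimal, so the scales are searched jointly against the task loss. The two-layer model, calibration data, and grid are all illustrative, not the paper's actual optimization procedure.

```python
import itertools
import torch

def quantize(w, scale, bits=4):
    qmax = 2 ** (bits - 1) - 1
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

w1, w2 = torch.randn(8, 8), torch.randn(8, 8)  # weights of two toy "layers"
x, y = torch.randn(16, 8), torch.randn(16, 8)  # calibration inputs and targets

def task_loss(s1, s2):
    h = x @ quantize(w1, s1)
    return ((h @ quantize(w2, s2) - y) ** 2).mean().item()

# joint search over both scales, rather than tuning each layer alone
grid = [0.05, 0.1, 0.2, 0.4]
best = min(itertools.product(grid, grid), key=lambda s: task_loss(*s))
print("jointly selected scales:", best)
```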

Relation Distillation Networks for Video Object Detection

Title Relation Distillation Networks for Video Object Detection
Authors Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei
Abstract It is well recognized that modeling object-to-object relations is helpful for object detection. Nevertheless, the problem is non-trivial, especially when exploring interactions between objects to boost video object detectors. The difficulty stems from the fact that reliable object relations in a video should depend not only on the objects in the present frame but also on all the supportive objects extracted over a long temporal span of the video. In this paper, we introduce a new design to capture the interactions across objects in a spatio-temporal context. Specifically, we present Relation Distillation Networks (RDN), a new architecture that aggregates and propagates object relations to augment object features for detection. Technically, object proposals are first generated via Region Proposal Networks (RPN). RDN then, on one hand, models object relations via multi-stage reasoning, and on the other, progressively distills relations by refining supportive object proposals with high objectness scores in a cascaded manner. The learnt relations prove effective both for improving object detection in each frame and for linking boxes across frames. Extensive experiments are conducted on the ImageNet VID dataset, with superior results reported in comparison to state-of-the-art methods. More remarkably, our RDN achieves 81.8% and 83.2% mAP with ResNet-101 and ResNeXt-101, respectively. When further equipped with linking and rescoring, we obtain the best mAP reported to date: 83.8% and 84.7%.
Tasks Object Detection, Video Object Detection
Published 2019-08-26
URL https://arxiv.org/abs/1908.09511v1
PDF https://arxiv.org/pdf/1908.09511v1.pdf
PWC https://paperswithcode.com/paper/relation-distillation-networks-for-video
Repo https://github.com/Scalsol/mega.pytorch
Framework pytorch
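
At its core, the relation operation resembles scaled dot-product attention from each proposal to a pool of supportive proposals; a minimal single-stage version might look like the following, with random features standing in for RoI features. The paper stacks such stages with cascaded proposal refinement, which is omitted here.

```python
import torch

n_props, n_support, dim = 10, 50, 256
props = torch.randn(n_props, dim)      # proposal features in the current frame
support = torch.randn(n_support, dim)  # high-objectness proposals across frames

# each proposal attends over supportive proposals to augment its feature
attn = torch.softmax(props @ support.T / dim ** 0.5, dim=-1)
augmented = props + attn @ support     # relation-augmented proposal features
```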

PyRep: Bringing V-REP to Deep Robot Learning

Title PyRep: Bringing V-REP to Deep Robot Learning
Authors Stephen James, Marc Freese, Andrew J. Davison
Abstract PyRep is a toolkit for robot learning research, built on top of the virtual robotics experimentation platform (V-REP). Through a series of modifications and additions, we have created a tailored version of V-REP built with robot learning in mind. The new PyRep toolkit offers three improvements: (1) a simple and flexible API for robot control and scene manipulation, (2) a new rendering engine, and (3) speed boosts upwards of 10,000x in comparison to the previous Python Remote API. With these improvements, we believe PyRep is the ideal toolkit to facilitate rapid prototyping of learning algorithms in the areas of reinforcement learning, imitation learning, state estimation, mapping, and computer vision.
Tasks Imitation Learning
Published 2019-06-26
URL https://arxiv.org/abs/1906.11176v1
PDF https://arxiv.org/pdf/1906.11176v1.pdf
PWC https://paperswithcode.com/paper/pyrep-bringing-v-rep-to-deep-robot-learning
Repo https://github.com/stepjam/PyRep
Framework none
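
A typical PyRep session, following the usage pattern in the project README, looks roughly like the snippet below; 'scene.ttt' is a placeholder for an actual V-REP scene file.

```python
from pyrep import PyRep

pr = PyRep()
pr.launch('scene.ttt', headless=True)  # 'scene.ttt' is a placeholder scene file
pr.start()                             # start the physics simulation
for _ in range(100):
    pr.step()                          # advance the simulation one timestep
pr.stop()
pr.shutdown()
```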

Avoidance Learning Using Observational Reinforcement Learning

Title Avoidance Learning Using Observational Reinforcement Learning
Authors David Venuto, Leonard Boussioux, Junhao Wang, Rola Dali, Jhelum Chakravorty, Yoshua Bengio, Doina Precup
Abstract Imitation learning seeks to learn an expert policy from sampled demonstrations. However, in the real world it is often difficult to find a perfect expert, and avoiding dangerous behaviors becomes relevant for safety reasons. We present the idea of learning to avoid, an objective opposite to imitation learning in some sense, where an agent learns to avoid a demonstrator policy given an environment. We define avoidance learning as the process of optimizing the agent's reward while avoiding dangerous behaviors given by a demonstrator. In this work we develop a framework for avoidance learning by defining a suitable objective function for these problems, which involves the distance between the state occupancy distributions of the expert and demonstrator policies. We use density estimates for state occupancy measures and use this distance as a reward bonus for avoiding the demonstrator. We validate our theory with experiments on a wide range of partially observable environments. Experimental results show that we are able to improve sample efficiency during training compared to state-of-the-art policy optimization and safety methods.
Tasks Imitation Learning
Published 2019-09-24
URL https://arxiv.org/abs/1909.11228v1
PDF https://arxiv.org/pdf/1909.11228v1.pdf
PWC https://paperswithcode.com/paper/avoidance-learning-using-observational
Repo https://github.com/maximecb/gym-miniworld
Framework pytorch
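
A minimal sketch of the avoidance bonus, assuming kernel density estimation stands in for the paper's density estimators: states that are likely under the demonstrator's occupancy measure earn a smaller bonus, pushing the agent away from demonstrated behavior.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

demo_states = np.random.randn(500, 4)                # states visited by the demonstrator
kde = KernelDensity(bandwidth=0.5).fit(demo_states)  # density model of its occupancy

def shaped_reward(env_reward, state, beta=0.1):
    # low log-density under the demonstrator's occupancy -> larger avoidance bonus
    log_p = kde.score_samples(state.reshape(1, -1))[0]
    return env_reward + beta * (-log_p)

print(shaped_reward(1.0, np.zeros(4)))
```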

Benchmarks for Graph Embedding Evaluation

Title Benchmarks for Graph Embedding Evaluation
Authors Palash Goyal, Di Huang, Ankita Goswami, Sujit Rokka Chhetri, Arquimedes Canedo, Emilio Ferrara
Abstract Graph embedding is the task of representing the nodes of a graph in a low-dimensional space, and its applications to graph tasks have gained significant traction in academia and industry. The primary difference among the many recently proposed graph embedding methods is the way they preserve the inherent properties of the graphs. However, in practice, comparing these methods is very challenging. The majority of methods report performance boosts on a few selected real graphs. Therefore, it is difficult to generalize these performance improvements to other types of graphs. Given a graph, it is currently impossible to quantify the advantages of one approach over another. In this work, we introduce a principled framework to compare graph embedding methods. Our goal is threefold: (i) provide a unifying framework for comparing the performance of various graph embedding methods, (ii) establish a benchmark with real-world graphs that exhibit different structural properties, and (iii) provide users with a tool to identify the best graph embedding method for their data. This paper evaluates 4 of the most influential graph embedding methods and 4 traditional link prediction methods against a corpus of 100 real-world networks with varying properties. We organize the 100 networks in terms of their properties to get a better understanding of the embedding performance of these popular methods. We use the comparisons on our 100 benchmark graphs to define the GFS-score, which can be applied to any embedding method to quantify its performance. We rank the state-of-the-art embedding approaches using the GFS-score and show that it can be used to understand and evaluate novel embedding approaches. We envision that the proposed framework (https://www.github.com/palash1992/GEM-Benchmark) will serve the community as a benchmarking platform to test and compare the performance of future graph embedding techniques.
Tasks Graph Embedding, Link Prediction
Published 2019-08-19
URL https://arxiv.org/abs/1908.06543v3
PDF https://arxiv.org/pdf/1908.06543v3.pdf
PWC https://paperswithcode.com/paper/benchmarks-for-graph-embedding-evaluation
Repo https://github.com/palash1992/GEM-Benchmark
Framework none
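
The link-prediction leg of such a benchmark reduces to a simple loop; the sketch below, with random embeddings standing in for a real method, shows the held-out-edge evaluation that would be repeated per method and per graph before aggregating into a GFS-score-style summary.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 50
emb = rng.normal(size=(n, 16))  # stand-in for a method's learned node embeddings

pos = [(rng.integers(n), rng.integers(n)) for _ in range(200)]  # held-out "edges"
neg = [(rng.integers(n), rng.integers(n)) for _ in range(200)]  # sampled non-edges

# score candidate edges by embedding dot product, then measure ranking quality
scores = [float(emb[u] @ emb[v]) for u, v in pos + neg]
labels = [1] * len(pos) + [0] * len(neg)
print("link-prediction AUC:", roc_auc_score(labels, scores))
```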

Global Greedy Dependency Parsing

Title Global Greedy Dependency Parsing
Authors Zuchao Li, Hai Zhao, Kevin Parnow
Abstract Most syntactic dependency parsing models fall into one of two categories: transition-based and graph-based models. The former enjoy high inference efficiency with linear time complexity, but they rely on stacking or re-ranking partially built parse trees to obtain a complete parse and suffer from slower training due to the necessity of dynamic oracle training. The latter, graph-based models, may boast better performance but are unfortunately marred by polynomial-time inference. In this paper, we propose a novel parsing order objective, resulting in a novel dependency parsing model capable of both global (sentence-scope) feature extraction, as in graph-based models, and linear-time inference, as in transition-based models. The proposed global greedy parser uses only two arc-building actions, left and right arcs, for projective parsing. When equipped with two extra non-projective arc-building actions, the proposed parser also smoothly supports non-projective parsing. Using multiple benchmark treebanks, including the Penn Treebank (PTB), the CoNLL-X treebanks, and the Universal Dependencies treebanks, we evaluate our parser and demonstrate that it achieves good performance with faster training and decoding.
Tasks Dependency Parsing
Published 2019-11-20
URL https://arxiv.org/abs/1911.08673v3
PDF https://arxiv.org/pdf/1911.08673v3.pdf
PWC https://paperswithcode.com/paper/global-greedy-dependency-parsing
Repo https://github.com/bcmi220/ggdp
Framework none
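
A toy version of global greedy decoding is sketched below: score every head-dependent pair once, then repeatedly commit the globally best remaining arc. This omits the paper's actual action set and its cycle and projectivity handling, and uses random scores in place of a trained scorer.

```python
import numpy as np

n = 6                             # tokens, with the root at index 0
scores = np.random.rand(n, n)     # scores[h, d] = score of arc h -> d
np.fill_diagonal(scores, -np.inf) # no self-arcs
scores[:, 0] = -np.inf            # the root takes no head

heads = [-1] * n
for _ in range(n - 1):
    # greedily commit the best remaining arc anywhere in the sentence
    h, d = np.unravel_index(np.argmax(scores), scores.shape)
    heads[d] = int(h)
    scores[:, d] = -np.inf        # each word receives exactly one head
print(heads)                      # heads[0] stays -1 (root)
```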

Evaluating the Factual Consistency of Abstractive Text Summarization

Title Evaluating the Factual Consistency of Abstractive Text Summarization
Authors Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher
Abstract Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents. We propose a weakly-supervised, model-based approach for verifying factual consistency and identifying conflicts between source documents and a generated summary. Training data is generated by applying a series of rule-based transformations to the sentences of source documents. The factual consistency model is then trained jointly on three tasks: 1) identify whether sentences remain factually consistent after transformation, 2) extract a span in the source documents to support the consistency prediction, 3) extract a span in the summary sentence that is inconsistent if one exists. Transferring this model to summaries generated by several state-of-the-art models reveals that this highly scalable approach substantially outperforms previous models, including those trained with strong supervision using standard datasets for natural language inference and fact checking. Additionally, human evaluation shows that the auxiliary span extraction tasks provide useful assistance in the process of verifying factual consistency.
Tasks Abstractive Text Summarization, Natural Language Inference, Text Summarization
Published 2019-10-28
URL https://arxiv.org/abs/1910.12840v1
PDF https://arxiv.org/pdf/1910.12840v1.pdf
PWC https://paperswithcode.com/paper/evaluating-the-factual-consistency-of
Repo https://github.com/nargesam/factCC
Framework tf
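
One of the rule-based transformations can be sketched as follows: leaving a sentence unchanged yields a CONSISTENT example, while swapping an entity yields an INCONSISTENT one. The entity list and sentence are illustrative; the paper uses a broader set of transformations (negation, pronoun swaps, noise injection, and more).

```python
import random

def entity_swap(sentence, entities):
    # replace one entity mentioned in the sentence with another from the document
    present = [e for e in entities if e in sentence]
    if not present:
        return sentence, "CONSISTENT"  # nothing to corrupt
    old = random.choice(present)
    new = random.choice([e for e in entities if e != old])
    return sentence.replace(old, new), "INCONSISTENT"

doc_entities = ["Paris", "London", "Berlin"]
print(entity_swap("The summit was held in Paris.", doc_entities))
```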

Explaining a black-box using Deep Variational Information Bottleneck Approach

Title Explaining a black-box using Deep Variational Information Bottleneck Approach
Authors Seojin Bang, Pengtao Xie, Heewook Lee, Wei Wu, Eric Xing
Abstract Interpretable machine learning has gained much attention recently. Briefness and comprehensiveness are necessary in order to provide a large amount of information concisely when explaining a black-box decision system. However, existing interpretable machine learning methods fail to consider briefness and comprehensiveness simultaneously, leading to redundant explanations. We propose the variational information bottleneck for interpretation, VIBI, a system-agnostic interpretable method that provides a brief but comprehensive explanation. VIBI adopts the information bottleneck principle, an information-theoretic criterion, for finding such explanations. For each instance, VIBI selects key features that are maximally compressed with respect to an input (briefness) and informative about the decision made by the black-box system on that input (comprehensiveness). We evaluate VIBI on three datasets and compare it with state-of-the-art interpretable machine learning methods in terms of both interpretability and fidelity, evaluated by human and quantitative metrics.
Tasks Interpretable Machine Learning
Published 2019-02-19
URL https://arxiv.org/abs/1902.06918v2
PDF https://arxiv.org/pdf/1902.06918v2.pdf
PWC https://paperswithcode.com/paper/explaining-a-black-box-using-deep-variational
Repo https://github.com/SeojinBang/TCR
Framework pytorch
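
The briefness/comprehensiveness trade-off can be made concrete with a brute-force toy: for each budget k, find the k-feature subset that best mimics the black-box decisions. VIBI learns such a selector with a variational information bottleneck objective instead of enumerating subsets; everything below is illustrative.

```python
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
blackbox = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # black-box decisions to explain

def comprehensiveness(subset):
    # how well do these features alone reproduce the black-box decisions?
    approx = LogisticRegression().fit(X[:, list(subset)], blackbox)
    return approx.score(X[:, list(subset)], blackbox)

for k in (1, 2, 3):  # briefness budget: at most k features
    best = max(itertools.combinations(range(6), k), key=comprehensiveness)
    print(k, best, round(comprehensiveness(best), 3))
```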

Does Generative Face Completion Help Face Recognition?

Title Does Generative Face Completion Help Face Recognition?
Authors Joe Mathai, Iacopo Masi, Wael AbdAlmageed
Abstract Face occlusions, covering either the majority or discriminative parts of the face, can break facial perception and produce a drastic loss of information. Biometric systems such as recent deep face recognition models are not immune to obstructions or other objects covering parts of the face. While most current face recognition methods are not optimized to handle occlusions, there have been a few attempts to improve robustness directly at the training stage. Unlike those, we propose to study the effect of generative face completion on recognition. We present a face completion encoder-decoder, based on a convolutional operator with a gating mechanism, trained with an ample set of face occlusions. To systematically evaluate the impact of realistic occlusions on recognition, we propose to play the occlusion game: we render 3D objects onto different face parts, providing valuable knowledge of the impact of effectively removing those occlusions. Extensive experiments on Labeled Faces in the Wild (LFW), and its more difficult variant LFW-BLUFR, show that face completion is able to partially restore face perception in machine vision systems for improved recognition.
Tasks Face Recognition, Facial Inpainting
Published 2019-06-07
URL https://arxiv.org/abs/1906.02858v1
PDF https://arxiv.org/pdf/1906.02858v1.pdf
PWC https://paperswithcode.com/paper/does-generative-face-completion-help-face
Repo https://github.com/isi-vista/face-completion
Framework pytorch
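
The occlusion game protocol can be sketched with stand-ins: paste a synthetic occluder onto a chosen face region and measure how a recognizer's embedding similarity degrades; the completion network would then be inserted before the recognizer. The crop size, region, and toy recognizer below are assumptions.

```python
import numpy as np

face = np.random.rand(112, 112, 3)  # stand-in for an aligned face crop
occluded = face.copy()
occluded[40:70, 30:80] = 0.0        # "render" an occluder over the eye region

def embed(img):
    # stand-in for a deep face recognizer's embedding function
    return img.mean(axis=(0, 1))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# a completion network would be applied to `occluded` before embedding
print("occluded-vs-original similarity:", cosine(embed(face), embed(occluded)))
```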

Graph Attention Auto-Encoders

Title Graph Attention Auto-Encoders
Authors Amin Salehi, Hasan Davulcu
Abstract Auto-encoders have emerged as a successful framework for unsupervised learning. However, conventional auto-encoders are incapable of utilizing explicit relations in structured data. To take advantage of relations in graph-structured data, several graph auto-encoders have recently been proposed, but they neglect to reconstruct either the graph structure or node attributes. In this paper, we present the graph attention auto-encoder (GATE), a neural network architecture for unsupervised representation learning on graph-structured data. Our architecture is able to reconstruct graph-structured inputs, including both node attributes and the graph structure, through stacked encoder/decoder layers equipped with self-attention mechanisms. In the encoder, by considering node attributes as initial node representations, each layer generates new representations of nodes by attending over their neighbors’ representations. In the decoder, we attempt to reverse the encoding process to reconstruct node attributes. Moreover, node representations are regularized to reconstruct the graph structure. Our proposed architecture does not need to know the graph structure upfront, and thus it can be applied to inductive learning. Our experiments demonstrate competitive performance on several node classification benchmark datasets for transductive and inductive tasks, even exceeding the performance of supervised learning baselines in most cases.
Tasks Node Classification, Representation Learning, Unsupervised Representation Learning
Published 2019-05-26
URL https://arxiv.org/abs/1905.10715v1
PDF https://arxiv.org/pdf/1905.10715v1.pdf
PWC https://paperswithcode.com/paper/graph-attention-auto-encoders
Repo https://github.com/amin-salehi/GATE
Framework none
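
A minimal single-layer version of the encoder might look like the following: node representations are updated by attending over neighbors only, starting from node attributes. The attention parameterization is a generic additive-style stand-in, not necessarily the paper's exact formulation, and the decoder and structure-reconstruction loss are omitted.

```python
import torch
import torch.nn as nn

class GATELayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.proj = nn.Linear(d_in, d_out)
        self.attn = nn.Linear(2 * d_out, 1)

    def forward(self, x, adj):
        h = self.proj(x)
        n = h.size(0)
        # score every (i, j) pair from the concatenated representations
        pairs = torch.cat([h.repeat_interleave(n, 0), h.repeat(n, 1)], dim=1)
        e = self.attn(pairs).view(n, n)
        e = e.masked_fill(adj == 0, float('-inf'))  # attend only over neighbours
        return torch.relu(torch.softmax(e, dim=-1) @ h)

x = torch.randn(5, 16)                                 # node attributes
adj = torch.eye(5) + (torch.rand(5, 5) < 0.3).float()  # self-loops keep softmax finite
z = GATELayer(16, 8)(x, adj)                           # new node representations
```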

PI-REC: Progressive Image Reconstruction Network With Edge and Color Domain

Title PI-REC: Progressive Image Reconstruction Network With Edge and Color Domain
Authors Sheng You, Ning You, Minxue Pan
Abstract We propose a universal image reconstruction method that represents detailed images purely from a binary sparse edge map and a flat color domain. Inspired by the procedure of painting, our framework, based on a generative adversarial network, consists of three phases: the Imitation Phase initializes the networks, the Generating Phase reconstructs preliminary images, and the Refinement Phase fine-tunes the preliminary images into final outputs with details. This framework allows our model to generate abundant high-frequency details from sparse input information. We also explore the defects of implicitly disentangling a style latent space from images, and demonstrate that the explicit color domain in our model performs better in terms of controllability and interpretability. In our experiments, we achieve outstanding results on reconstructing realistic images and translating hand-drawn drafts into satisfactory paintings. Moreover, within the domain of edge-to-image translation, our model PI-REC outperforms existing state-of-the-art methods in evaluations of realism and accuracy, both quantitatively and qualitatively.
Tasks Image Reconstruction, Image-to-Image Translation
Published 2019-03-25
URL http://arxiv.org/abs/1903.10146v1
PDF http://arxiv.org/pdf/1903.10146v1.pdf
PWC https://paperswithcode.com/paper/pi-rec-progressive-image-reconstruction-1
Repo https://github.com/youyuge34/PI-REC
Framework pytorch
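
Preparing the two inputs the model consumes can be approximated with standard OpenCV operations, sketched below on a random stand-in image: Canny for the binary sparse edge map and repeated median filtering for the flat color domain. Thresholds and kernel sizes are illustrative, not the paper's exact preprocessing.

```python
import cv2
import numpy as np

img = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)  # stand-in for a real photo
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

edges = cv2.Canny(gray, 100, 200)                           # binary sparse edge map
color_domain = cv2.medianBlur(cv2.medianBlur(img, 21), 21)  # flat color regions
```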

DeepFork: Supervised Prediction of Information Diffusion in GitHub

Title DeepFork: Supervised Prediction of Information Diffusion in GitHub
Authors Ramya Akula, Niloofar Yousefi, Ivan Garibay
Abstract Information spreads extremely fast on complex social networks; in other words, a piece of information can go viral in no time. It is often hard to contain this diffusion before significant chaos occurs, be it on a social media site or an online coding platform. GitHub is one such trending online focal point where businesses can reach their potential contributors and customers simultaneously. By exploiting this software development paradigm, millions of free software projects have recently emerged across diverse communities. To understand human influence, information spread, and the evolution of transmitted information among assorted users on GitHub, we developed a deep neural network model, DeepFork, a supervised machine-learning-based approach that aims to predict information diffusion in complex social networks, considering node as well as topological features. In our empirical studies, we observed that information diffusion can be detected by link prediction using supervised learning. DeepFork outperforms other machine learning models as it better learns the discriminative patterns from the input features. DeepFork aids in understanding information spread and evolution through a bipartite network of users and repositories, i.e., information flows from a user to a repository to another user.
Tasks Link Prediction
Published 2019-10-17
URL https://arxiv.org/abs/1910.07999v1
PDF https://arxiv.org/pdf/1910.07999v1.pdf
PWC https://paperswithcode.com/paper/deepfork-supervised-prediction-of-information
Repo https://github.com/akula01/DeepFork
Framework none
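
Framed as supervised link prediction, the core of such a system is a classifier over (user, repository) pair features; the sketch below uses synthetic features and labels purely to show the shape of the pipeline, not the paper's actual feature set or architecture.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 6))                # e.g. user activity, repo popularity, shared neighbours
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # toy "user forked the repo" label

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500).fit(X, y)
print("training accuracy:", clf.score(X, y))
```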