April 3, 2020

3129 words 15 mins read

Paper Group AWR 54

DivideMix: Learning with Noisy Labels as Semi-supervised Learning. Robust Explanations for Visual Question Answering. Hurtful Words: Quantifying Biases in Clinical Contextual Word Embeddings. Restricting the Flow: Information Bottlenecks for Attribution. Adversarial-Learned Loss for Domain Adaptation. On the Role of Conceptualization in Commonsense …

DivideMix: Learning with Noisy Labels as Semi-supervised Learning

Title DivideMix: Learning with Noisy Labels as Semi-supervised Learning
Authors Junnan Li, Richard Socher, Steven C. H. Hoi
Abstract Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in a semi-supervised manner. To avoid confirmation bias, we simultaneously train two diverged networks where each network uses the dataset division from the other network. During the semi-supervised training phase, we improve the MixMatch strategy by performing label co-refinement and label co-guessing on labeled and unlabeled samples, respectively. Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods. Code is available at https://github.com/LiJunnan1992/DivideMix .
Tasks
Published 2020-02-18
URL https://arxiv.org/abs/2002.07394v1
PDF https://arxiv.org/pdf/2002.07394v1.pdf
PWC https://paperswithcode.com/paper/dividemix-learning-with-noisy-labels-as-semi-1
Repo https://github.com/LiJunnan1992/DivideMix
Framework pytorch
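
The co-divide step is easy to prototype: fit a two-component Gaussian mixture to the per-sample training losses and treat the low-loss mode as probably clean. A minimal sketch (the `codivide` helper, the threshold, and the toy losses are illustrative, not the official code linked above):

```python
# Sketch of DivideMix's co-divide idea: a 2-component GMM over per-sample
# losses; the component with the smaller mean is taken as the "clean" mode.
import numpy as np
from sklearn.mixture import GaussianMixture

def codivide(losses, threshold=0.5):
    """Return a boolean clean-mask and per-sample clean-probabilities."""
    losses = np.asarray(losses, dtype=np.float64).reshape(-1, 1)
    # Normalize to [0, 1] so the fit is scale-invariant.
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    gmm = GaussianMixture(n_components=2, reg_covar=5e-4).fit(losses)
    clean_component = int(np.argmin(gmm.means_))        # low-loss mode
    p_clean = gmm.predict_proba(losses)[:, clean_component]
    return p_clean > threshold, p_clean

# Toy usage: a bimodal loss distribution (clean vs. mislabeled samples).
rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(0.2, 0.05, 900), rng.normal(1.5, 0.3, 100)])
mask, p_clean = codivide(losses)
print(f"{mask.sum()} of {len(losses)} samples kept as clean")
```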

Robust Explanations for Visual Question Answering

Title Robust Explanations for Visual Question Answering
Authors Badri N. Patro, Shivansh Patel, Vinay P. Namboodiri
Abstract In this paper, we propose a method to obtain robust explanations for visual question answering (VQA) that correlate well with the answers. Our model explains the answers obtained through a VQA model by providing visual and textual explanations. The main challenges that we address are: i) answers and textual explanations obtained by current methods are not well correlated, and ii) current methods for visual explanation do not focus on the right location for explaining the answer. We address both challenges with a collaborative correlated module which ensures that, even if we do not train against noise-based attacks, the enhanced correlation allows the right explanation and answer to be generated. We further show that this also improves the generated visual and textual explanations. The correlated module can be thought of as a robust way to verify whether the answer and explanations are coherent. We evaluate this model on the VQA-X dataset and observe that the proposed method yields better textual and visual justifications that support the decision. We showcase the robustness of the model against a noise-based perturbation attack using the corresponding visual and textual explanations, and provide a detailed empirical analysis. Source code for our model is available at https://github.com/DelTA-Lab-IITK/CCM-WACV.
Tasks Question Answering, Visual Question Answering
Published 2020-01-23
URL https://arxiv.org/abs/2001.08730v1
PDF https://arxiv.org/pdf/2001.08730v1.pdf
PWC https://paperswithcode.com/paper/robust-explanations-for-visual-question
Repo https://github.com/DelTA-Lab-IITK/CCM-WACV
Framework caffe2
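
The abstract does not specify the "collaborative correlated module", but the underlying idea of encouraging answer and explanation representations to agree can be sketched as a simple correlation objective. A hypothetical stand-in (the feature sizes and the cosine-based loss are assumptions, not the paper's exact module):

```python
# Hypothetical correlation objective: push answer and explanation features
# toward agreement so that explanations stay coherent with answers.
import torch
import torch.nn.functional as F

def correlation_loss(answer_feat, expl_feat):
    """1 - mean cosine similarity between answer and explanation features."""
    return 1.0 - F.cosine_similarity(answer_feat, expl_feat, dim=-1).mean()

a = torch.randn(8, 256)   # batch of answer embeddings (made-up sizes)
e = torch.randn(8, 256)   # batch of explanation embeddings
print(correlation_loss(a, e).item())
```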

Hurtful Words: Quantifying Biases in Clinical Contextual Word Embeddings

Title Hurtful Words: Quantifying Biases in Clinical Contextual Word Embeddings
Authors Haoran Zhang, Amy X. Lu, Mohamed Abdalla, Matthew McDermott, Marzyeh Ghassemi
Abstract In this work, we examine the extent to which embeddings may encode marginalized populations differently, and how this may lead to a perpetuation of biases and worsened performance on clinical tasks. We pretrain deep embedding models (BERT) on medical notes from the MIMIC-III hospital dataset, and quantify potential disparities using two approaches. First, we identify dangerous latent relationships that are captured by the contextual word embeddings using a fill-in-the-blank method with text from real clinical notes and a log probability bias score quantification. Second, we evaluate performance gaps across different definitions of fairness on over 50 downstream clinical prediction tasks that include detection of acute and chronic conditions. We find that classifiers trained from BERT representations exhibit statistically significant differences in performance, often favoring the majority group with regard to gender, language, ethnicity, and insurance status. Finally, we explore shortcomings of using adversarial debiasing to obfuscate subgroup information in contextual word embeddings, and recommend best practices for such deep embedding models in clinical settings.
Tasks Word Embeddings
Published 2020-03-11
URL https://arxiv.org/abs/2003.11515v1
PDF https://arxiv.org/pdf/2003.11515v1.pdf
PWC https://paperswithcode.com/paper/hurtful-words-quantifying-biases-in-clinical
Repo https://github.com/MLforHealth/HurtfulWords
Framework pytorch
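
The first probe, the fill-in-the-blank log probability bias score, reduces to comparing attribute probabilities normalized by each group's prior. A sketch with made-up probabilities (the function name and all numbers are illustrative; the formulation follows the log-probability bias score this line of work builds on):

```python
# Log-probability bias score: how much more likely a masked LM finds an
# attribute for one group than another, after normalizing by group priors.
import math

def log_prob_bias_score(p_attr_given_group, p_group_prior):
    """Log of the attribute probability normalized by the group's prior."""
    return math.log(p_attr_given_group / p_group_prior)

# Hypothetical masked-LM probabilities for "[MASK] patient is [attribute]".
score_a = log_prob_bias_score(p_attr_given_group=0.030, p_group_prior=0.45)
score_b = log_prob_bias_score(p_attr_given_group=0.012, p_group_prior=0.40)
print(f"bias score difference (group A - group B): {score_a - score_b:+.3f}")
```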

Restricting the Flow: Information Bottlenecks for Attribution

Title Restricting the Flow: Information Bottlenecks for Attribution
Authors Karl Schulz, Leon Sixt, Federico Tombari, Tim Landgraf
Abstract Attribution methods provide insights into the decision-making of machine learning models like artificial neural networks. For a given input sample, they assign a relevance score to each individual input variable, such as the pixels of an image. In this work we adapt the information bottleneck concept for attribution. By adding noise to intermediate feature maps we restrict the flow of information and can quantify (in bits) how much information image regions provide. We compare our method against ten baselines using three different metrics on VGG-16 and ResNet-50, and find that our method outperforms all baselines in five out of six settings. The method’s information-theoretic foundation provides an absolute frame of reference for attribution values (bits) and a guarantee that regions scored close to zero are not necessary for the network’s decision. For reviews: https://openreview.net/forum?id=S1xWh1rYwB For code: https://github.com/BioroboticsLab/IBA
Tasks Decision Making
Published 2020-01-02
URL https://arxiv.org/abs/2001.00396v2
PDF https://arxiv.org/pdf/2001.00396v2.pdf
PWC https://paperswithcode.com/paper/restricting-the-flow-information-bottlenecks-1
Repo https://github.com/BioroboticsLab/IBA
Framework tf
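
The core quantity is the information (in bits) that survives after mixing a feature map with Gaussian noise, z = λf + (1−λ)ε, measured as a KL divergence to the pure-noise prior. A minimal sketch assuming a single scalar Gaussian prior estimated from the feature map (shapes and names are illustrative; the official implementation is linked above):

```python
# KL[N(lam*f + (1-lam)*mu, ((1-lam)*sigma)^2) || N(mu, sigma^2)], in bits.
# When lam -> 0 the feature is pure noise and the KL (information) -> 0.
import torch

def bottleneck_bits(feat, lam, mu, sigma):
    m = lam * feat + (1 - lam) * mu            # mean of the noised feature
    s = (1 - lam) * sigma                      # std of the noised feature
    kl = torch.log(sigma / s) + (s**2 + (m - mu)**2) / (2 * sigma**2) - 0.5
    return kl / torch.log(torch.tensor(2.0))   # nats -> bits

feat = torch.randn(1, 64, 14, 14)              # an intermediate feature map
lam = torch.rand_like(feat)                    # per-element mixing coefficients
bits = bottleneck_bits(feat, lam, mu=feat.mean(), sigma=feat.std())
print(bits.sum().item(), "bits of information admitted")
```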

Adversarial-Learned Loss for Domain Adaptation

Title Adversarial-Learned Loss for Domain Adaptation
Authors Minghao Chen, Shuai Zhao, Haifeng Liu, Deng Cai
Abstract Recently, remarkable progress has been made in learning transferable representations across domains. Previous work on domain adaptation is largely based on two techniques: domain-adversarial learning and self-training. However, domain-adversarial learning only aligns feature distributions between domains and does not consider whether the target features are discriminative. Self-training, on the other hand, uses the model’s predictions to enhance the discrimination of target features, but is unable to explicitly align domain distributions. To combine the strengths of these two methods, we propose a novel method called Adversarial-Learned Loss for Domain Adaptation (ALDA). We first analyze the pseudo-label method, a typical self-training method, and note that the gap between pseudo-labels and the ground truth can cause incorrect training. We therefore introduce a confusion matrix, learned in an adversarial manner in ALDA, to reduce this gap and align the feature distributions. Finally, a new loss function is automatically constructed from the learned confusion matrix and serves as the loss for unlabeled target samples. Our ALDA outperforms state-of-the-art approaches on four standard domain adaptation datasets. Our code is available at https://github.com/ZJULearning/ALDA.
Tasks Domain Adaptation
Published 2020-01-04
URL https://arxiv.org/abs/2001.01046v1
PDF https://arxiv.org/pdf/2001.01046v1.pdf
PWC https://paperswithcode.com/paper/adversarial-learned-loss-for-domain
Repo https://github.com/ZJULearning/ALDA
Framework pytorch
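
The corrected-target idea can be sketched independently of the adversarial part: a confusion matrix turns hard pseudo-labels into soft targets, which then define the loss on unlabeled target samples. A hypothetical sketch (the fixed confusion matrix here stands in for the one ALDA learns adversarially):

```python
# Cross-entropy of target-domain logits against confusion-corrected soft labels.
import torch
import torch.nn.functional as F

def corrected_target_loss(logits, pseudo_labels, confusion):
    onehot = F.one_hot(pseudo_labels, num_classes=logits.size(1)).float()
    soft_targets = onehot @ confusion    # row c of `confusion` = p(true | pseudo=c)
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()

num_classes = 5
logits = torch.randn(16, num_classes)             # target-domain predictions
pseudo = logits.argmax(dim=1)                     # hard pseudo-labels
confusion = torch.full((num_classes, num_classes), 0.02)
confusion.fill_diagonal_(1 - 0.02 * (num_classes - 1))   # rows sum to 1
print(corrected_target_loss(logits, pseudo, confusion).item())
```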

On the Role of Conceptualization in Commonsense Knowledge Graph Construction

Title On the Role of Conceptualization in Commonsense Knowledge Graph Construction
Authors Mutian He, Yangqiu Song, Kun Xu, Yu Dong
Abstract Commonsense knowledge graphs (CKGs) like Atomic and ASER differ substantially from conventional KGs: they consist of a much larger number of nodes formed by loosely structured text, which enables them to handle highly diverse natural-language queries about commonsense but poses unique challenges for automatic KG construction methods. Besides identifying missing relations between existing nodes, construction methods are also expected to discover absent nodes represented by text, in which different real-world things or entities may appear. To deal with the innumerable entities involved in real-world commonsense, we introduce conceptualization to CKG construction, i.e., viewing entities mentioned in text as instances of specific concepts, or vice versa. We build synthetic triples by conceptualization and formulate the task as triple classification, handled by a discriminative model that transfers knowledge from pretrained language models and is fine-tuned by negative sampling. Experiments demonstrate that our methods can effectively identify plausible triples and expand the KG with triples of both new nodes and new edges of high diversity and novelty.
Tasks graph construction, Knowledge Graphs
Published 2020-03-06
URL https://arxiv.org/abs/2003.03239v1
PDF https://arxiv.org/pdf/2003.03239v1.pdf
PWC https://paperswithcode.com/paper/on-the-role-of-conceptualization-in
Repo https://github.com/mutiann/ccc
Framework pytorch
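
Conceptualization itself is simple to illustrate: replace an entity mention with one of its concepts to mint a synthetic triple, then ask a classifier whether the result is plausible. A toy sketch (the concept map and triple format are made up; the paper scores triples with a fine-tuned language-model classifier):

```python
# Build synthetic triples by swapping entities for their concepts.
concept_of = {"coffee": "beverage", "aspirin": "medication"}  # toy concept map

def conceptualize(triple):
    """Yield synthetic triples with each known entity replaced by its concept."""
    head, rel, tail = triple
    if head in concept_of:
        yield (concept_of[head], rel, tail)
    if tail in concept_of:
        yield (head, rel, concept_of[tail])

for synthetic in conceptualize(("coffee", "is used for", "staying awake")):
    # A plausibility classifier would score `synthetic` here.
    print(synthetic)
```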

AMP Chain Graphs: Minimal Separators and Structure Learning Algorithms

Title AMP Chain Graphs: Minimal Separators and Structure Learning Algorithms
Authors Mohammad Ali Javidian, Marco Valtorta, Pooyan Jamshidi
Abstract We address the problem of finding a minimal separator in an Andersson-Madigan-Perlman chain graph (AMP CG), namely, finding a set Z of nodes that separates a given non-adjacent pair of nodes such that no proper subset of Z separates that pair. We analyze several versions of this problem and offer polynomial-time algorithms for each. These include finding a minimal separator from a restricted set of nodes, finding a minimal separator for two given disjoint sets, and testing whether a given separator is minimal. We provide an extension of the decomposition approach for learning Bayesian networks (BNs) proposed by Xie et al. to learn AMP CGs, which include BNs as a special case, under the faithfulness assumption, and prove its correctness using the minimal separator results. The advantages of this decomposition approach hold in the more general setting: reduced complexity and increased power of computational independence tests. In addition, we show that the PC-like algorithm is order-dependent, in the sense that the output can depend on the order in which the variables are given. We propose two modifications of the PC-like algorithm that remove part or all of this order-dependence. Simulations under a variety of settings demonstrate the competitive performance of our decomposition-based method, called LCD-AMP, in comparison with the (modified version of the) PC-like algorithm. In fact, the decomposition-based algorithm usually outperforms the PC-like algorithm. We empirically show that the results of both algorithms are more accurate and stable when the sample size is reasonably large and the underlying graph is sparse.
Tasks
Published 2020-02-24
URL https://arxiv.org/abs/2002.10870v1
PDF https://arxiv.org/pdf/2002.10870v1.pdf
PWC https://paperswithcode.com/paper/amp-chain-graphs-minimal-separators-and
Repo https://github.com/majavid/AMPCGs2019
Framework none
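
For plain undirected graphs there is a classic polynomial-time minimality test: Z is a minimal (u, v)-separator iff it separates u from v and every vertex of Z has a neighbor in both the u-side and the v-side component of G − Z. A sketch of that baseline test (undirected graphs only; the paper's algorithms handle the AMP chain graph generalization):

```python
# Minimality test for undirected separators via the neighbor-in-both-sides
# characterization.
import networkx as nx

def is_minimal_separator(G, u, v, Z):
    H = G.subgraph(set(G) - set(Z))
    comp_u = nx.node_connected_component(H, u)
    comp_v = nx.node_connected_component(H, v)
    if comp_u == comp_v:                 # Z does not even separate u and v
        return False
    return all(
        any(n in comp_u for n in G[z]) and any(n in comp_v for n in G[z])
        for z in Z
    )

G = nx.cycle_graph(6)                              # 0-1-2-3-4-5-0
print(is_minimal_separator(G, 0, 3, {1, 4}))       # True: both vertices needed
print(is_minimal_separator(G, 0, 3, {1, 2, 4}))    # False: {1, 4} already separates
```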

Attacking Neural Text Detectors

Title Attacking Neural Text Detectors
Authors Max Wolff
Abstract Machine learning based language models have recently made significant progress, which also introduces a danger of spreading misinformation. To combat this potential danger, several methods have been proposed for detecting text written by these language models. This paper presents two classes of black-box attacks on these detectors, one which randomly replaces characters with homoglyphs, and the other a simple scheme to purposefully misspell words. The homoglyph and misspelling attacks decrease a popular neural text detector’s recall on neural text from 97.44% to 0.26% and 22.68%, respectively. Results also indicate that the attacks are transferable to other neural text detectors.
Tasks
Published 2020-02-19
URL https://arxiv.org/abs/2002.11768v2
PDF https://arxiv.org/pdf/2002.11768v2.pdf
PWC https://paperswithcode.com/paper/attacking-neural-text-detectors
Repo https://github.com/mwolff31/attacking_neural_text_detectors
Framework none
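
The homoglyph attack fits in a few lines: swap Latin characters for visually identical Unicode code points so the text looks unchanged to a human but tokenizes differently for a detector. A toy sketch (the mapping and the replacement rate are illustrative, not the paper's exact configuration):

```python
# Randomly substitute characters with Cyrillic look-alikes.
import random

HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def homoglyph_attack(text, rate=0.1, seed=0):
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[c] if c in HOMOGLYPHS and rng.random() < rate else c
        for c in text
    )

attacked = homoglyph_attack("language models generate fluent text", rate=0.5)
print(attacked)                            # reads the same to a human
print(attacked.encode("unicode_escape").decode())  # but code points differ
```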

Adversarial Generation of Continuous Implicit Shape Representations

Title Adversarial Generation of Continuous Implicit Shape Representations
Authors Marian Kleineberg, Matthias Fey, Frank Weichert
Abstract This work presents a generative adversarial architecture for generating three-dimensional shapes based on signed distance representations. While the deep generation of shapes has been mostly tackled by voxel and surface point cloud approaches, our generator learns to approximate the signed distance for any point in space given prior latent information. Although structurally similar to generative point cloud approaches, this formulation can be evaluated with arbitrary point density during inference, leading to fine-grained details in generated outputs. Furthermore, we study the effects of using either progressively growing voxel- or point-processing networks as discriminators, and propose a refinement scheme to strengthen the generator’s capabilities in modeling the zero iso-surface decision boundary of shapes. We train our approach on the ShapeNet benchmark dataset and validate, both quantitatively and qualitatively, its performance in generating realistic 3D shapes.
Tasks
Published 2020-02-02
URL https://arxiv.org/abs/2002.00349v2
PDF https://arxiv.org/pdf/2002.00349v2.pdf
PWC https://paperswithcode.com/paper/adversarial-generation-of-continuous-implicit
Repo https://github.com/marian42/shapegan
Framework pytorch
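
The generator's interface is what makes arbitrary sampling density possible: a network maps a latent code plus any 3D query point to a signed distance. A hypothetical sketch of that interface (layer sizes and names are made up; see the repo above for the actual architecture):

```python
# An MLP that conditions on a latent code and evaluates the SDF at any point.
import torch
import torch.nn as nn

class SDFGenerator(nn.Module):
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),              # signed distance output
        )

    def forward(self, latent, points):
        # latent: (B, latent_dim); points: (B, N, 3) query locations.
        latent = latent.unsqueeze(1).expand(-1, points.size(1), -1)
        return self.net(torch.cat([latent, points], dim=-1)).squeeze(-1)

gen = SDFGenerator()
z = torch.randn(2, 128)
pts = torch.rand(2, 4096, 3) * 2 - 1           # any density of query points
print(gen(z, pts).shape)                        # torch.Size([2, 4096])
```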

Differentiate Everything with a Reversible Programming Language

Title Differentiate Everything with a Reversible Programming Language
Authors Jin-Guo Liu, Taine Zhao
Abstract This paper considers source-to-source automatic differentiation (AD) in a reversible language. We start by reviewing the limitations of traditional AD frameworks. To address these limitations, we developed NiLang, a reversible eDSL in Julia that can differentiate general programs while remaining compatible with Julia’s ecosystem. It gives users the flexibility to trade off time, space, and energy, so that one can obtain gradients and Hessians for everything from elementary mathematical functions to the sparse matrix operations and linear algebra widely used in scientific programming. We demonstrate that a source-to-source AD framework can achieve state-of-the-art performance by presenting and benchmarking several examples. Finally, we discuss the challenges we face on the road to rigorous reversible programming, mainly from the instruction-set and hardware perspective.
Tasks
Published 2020-03-10
URL https://arxiv.org/abs/2003.04617v1
PDF https://arxiv.org/pdf/2003.04617v1.pdf
PWC https://paperswithcode.com/paper/differentiate-everything-with-a-reversible
Repo https://github.com/GiggleLiu/NiLang.jl
Framework none
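
The benefit of reversibility for AD can be shown without Julia: if every instruction has an exact inverse, intermediate states can be recovered by running the program backward instead of being stored on a tape. A conceptual Python sketch of that idea (NiLang itself is a Julia eDSL and operates at the instruction level):

```python
# A program built only from invertible updates can be undone step by step.
def forward(x, y):
    y += x * x        # invertible: undone by subtracting x * x
    x += y            # invertible: undone by subtracting y
    return x, y

def backward(x, y):
    x -= y            # exact inverses, applied in reverse order
    y -= x * x
    return x, y

x0, y0 = 3.0, 5.0
x1, y1 = forward(x0, y0)
print(backward(x1, y1) == (x0, y0))   # True: inputs recovered without a tape
```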

Low Rank Saddle Free Newton: Algorithm and Analysis

Title Low Rank Saddle Free Newton: Algorithm and Analysis
Authors Thomas O’Leary-Roseberry, Nick Alger, Omar Ghattas
Abstract Many tasks in engineering and machine learning involve minimizing a high-dimensional non-convex function, and the existence of saddle points poses a central challenge in practice. The Saddle Free Newton (SFN) algorithm can rapidly escape high-dimensional saddle points by using the absolute value of the Hessian of the empirical risk function. In SFN, a Lanczos-type procedure is used to approximate the absolute value of the Hessian. Motivated by recent empirical work noting that neural network training Hessians are typically low rank, we propose approximating the Hessian via scalable randomized low-rank methods. Such factorizations can be efficiently inverted via the Sherman-Morrison-Woodbury formula. We derive bounds on convergence rates in expectation for a stochastic version of the algorithm, quantifying the errors incurred in subsampling as well as in approximating the Hessian via low-rank factorization. We test the method on standard neural network training benchmark problems: MNIST and CIFAR10. Numerical results demonstrate that in addition to avoiding saddle points, the method can converge faster than first-order methods, and that the Hessian can be subsampled significantly relative to the gradient while retaining superior performance.
Tasks
Published 2020-02-07
URL https://arxiv.org/abs/2002.02881v1
PDF https://arxiv.org/pdf/2002.02881v1.pdf
PWC https://paperswithcode.com/paper/low-rank-saddle-free-newton-algorithm-and
Repo https://github.com/tomoleary/hessianlearn
Framework tf
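
The low-rank inverse is the computational heart: with H ≈ U diag(lam) U^T and damping gamma, the saddle-free step uses (U diag(|lam|) U^T + gamma I)^{-1} g = (g − U D U^T g) / gamma with D = diag(|lam| / (|lam| + gamma)). A dense toy check of that identity (the subsampling and the randomized eigensolver are omitted; gamma and the shapes are illustrative):

```python
# Woodbury-style application of the damped |Hessian| inverse to a gradient.
import numpy as np

def low_rank_sfn_step(U, lam, grad, gamma=1e-1):
    d = np.abs(lam) / (np.abs(lam) + gamma)
    return (grad - U @ (d * (U.T @ grad))) / gamma

rng = np.random.default_rng(0)
n, r = 50, 5
U, _ = np.linalg.qr(rng.normal(size=(n, r)))   # orthonormal eigenvector block
lam = rng.normal(size=r) * 10                  # signed eigenvalues (saddles!)
g = rng.normal(size=n)

step = low_rank_sfn_step(U, lam, g, gamma=0.1)
# Verify against the dense inverse of U |Lambda| U^T + gamma I.
dense = U @ np.diag(np.abs(lam)) @ U.T + 0.1 * np.eye(n)
print(np.allclose(step, np.linalg.solve(dense, g)))   # True
```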

Paraphrase Generation with Latent Bag of Words

Title Paraphrase Generation with Latent Bag of Words
Authors Yao Fu, Yansong Feng, John P. Cunningham
Abstract Paraphrase generation is a longstanding and important problem in natural language processing, and recent progress in deep generative models has shown promising results with discrete latent variables for text generation. Inspired by variational autoencoders with discrete latent structures, we propose a latent bag of words (BOW) model for paraphrase generation. We ground the semantics of a discrete latent variable with the BOW from the target sentences, and use this latent variable to build a fully differentiable content planning and surface realization model. Specifically, we use source words to predict their neighbors and model the target BOW with a mixture of softmax. We use Gumbel top-k reparameterization to perform differentiable subset sampling from the predicted BOW distribution, then retrieve the sampled word embeddings and use them to augment the decoder and guide its generation search space. Our latent BOW model not only enhances the decoder but also exhibits clear interpretability, which we show with regard to (i) unsupervised learning of word neighbors and (ii) the step-by-step generation procedure. Extensive experiments demonstrate the transparent and effective generation process of this model. Our code can be found at https://github.com/FranxYao/dgm_latent_bow.
Tasks Paraphrase Generation, Text Generation, Word Embeddings
Published 2020-01-07
URL https://arxiv.org/abs/2001.01941v1
PDF https://arxiv.org/pdf/2001.01941v1.pdf
PWC https://paperswithcode.com/paper/paraphrase-generation-with-latent-bag-of-1
Repo https://github.com/FranxYao/dgm_latent_bow
Framework tf
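
Gumbel top-k subset sampling is compact: add Gumbel noise to the log-probabilities and keep the k largest entries, which draws a size-k subset without replacement. A sketch with a toy vocabulary (the differentiable relaxation used for training is omitted here):

```python
# Sample a size-k subset of a categorical distribution via Gumbel top-k.
import torch

def gumbel_topk(log_probs, k):
    # Gumbel noise: -log(-log(U)) with U ~ Uniform(0, 1).
    gumbel = -torch.log(-torch.log(torch.rand_like(log_probs)))
    return torch.topk(log_probs + gumbel, k).indices

vocab = ["cat", "feline", "sits", "rests", "mat", "rug"]
log_probs = torch.log(torch.tensor([0.3, 0.2, 0.2, 0.1, 0.1, 0.1]))
idx = gumbel_topk(log_probs, k=3)
print([vocab[i] for i in idx.tolist()])   # a sampled bag of 3 candidate words
```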

Improving Generalizability of Fake News Detection Methods using Propensity Score Matching

Title Improving Generalizability of Fake News Detection Methods using Propensity Score Matching
Authors Bo Ni, Zhichun Guo, Jianing Li, Meng Jiang
Abstract Recently, owing to the booming influence of online social networks, detecting fake news has drawn significant attention from both academic communities and the general public. In this paper, we consider the existence of confounding variables in the features of fake news and use Propensity Score Matching (PSM) to select generalizable features in order to reduce the effects of those confounding variables. Experimental results show that the generalizability of fake news detection methods is significantly better when PSM is used to select features than when raw frequency is used. We investigate multiple types of fake news detection methods (classifiers), such as logistic regression, random forests, and support vector machines, and observe consistent performance improvements.
Tasks Fake News Detection
Published 2020-01-28
URL https://arxiv.org/abs/2002.00838v1
PDF https://arxiv.org/pdf/2002.00838v1.pdf
PWC https://paperswithcode.com/paper/improving-generalizability-of-fake-news
Repo https://github.com/Arstanley/fakenews_pscore_match
Framework none
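
Generic propensity score matching is straightforward to sketch: fit a propensity model on the confounders, then pair each treated sample with the control whose score is nearest. A minimal example with synthetic data (the paper's feature-selection use of PSM is more specific than this generic matching loop):

```python
# Logistic-regression propensity scores + greedy 1-NN matching.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # confounding covariates
treated = (X[:, 0] + rng.normal(size=200)) > 0   # treatment depends on X

propensity = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
t_idx, c_idx = np.where(treated)[0], np.where(~treated)[0]
matches = {t: c_idx[np.argmin(np.abs(propensity[c_idx] - propensity[t]))]
           for t in t_idx}
print(f"matched {len(matches)} treated samples to controls")
```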

D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features

Title D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features
Authors Xuyang Bai, Zixin Luo, Lei Zhou, Hongbo Fu, Long Quan, Chiew-Lan Tai
Abstract Successful point cloud registration often relies on the robust establishment of sparse matches through discriminative 3D local features. Despite the fast evolution of learning-based 3D feature descriptors, little attention has been paid to learning 3D feature detectors, and even less to joint learning of the two tasks. In this paper, we leverage a 3D fully convolutional network for 3D point clouds and propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point. In particular, we propose a keypoint selection strategy that overcomes the inherent density variations of 3D point clouds, and further propose a self-supervised detector loss guided by the on-the-fly feature matching results during training. Our method achieves state-of-the-art results in both indoor and outdoor scenarios, evaluated on the 3DMatch and KITTI datasets, and shows strong generalization ability on the ETH dataset. Toward practical use, we show that by adopting a reliable feature detector, sampling a smaller number of features is sufficient for accurate and fast point cloud alignment. Code is released at https://github.com/XuyangBai/D3Feat.
Tasks Point Cloud Registration
Published 2020-03-06
URL https://arxiv.org/abs/2003.03164v1
PDF https://arxiv.org/pdf/2003.03164v1.pdf
PWC https://paperswithcode.com/paper/d3feat-joint-learning-of-dense-detection-and
Repo https://github.com/XuyangBai/D3Feat
Framework tf
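
How joint dense detection and description get consumed downstream can be sketched: keep the top-k scored points of each cloud as keypoints and match their descriptors by nearest neighbor. A toy sketch with random stand-ins for the network outputs (D3Feat's actual scores and descriptors come from its 3D fully convolutional network):

```python
# Select top-scoring keypoints and match descriptors across two clouds.
import numpy as np

def top_keypoints(descriptors, scores, k):
    idx = np.argsort(-scores)[:k]
    return idx, descriptors[idx]

rng = np.random.default_rng(0)
desc_a, desc_b = rng.normal(size=(1000, 32)), rng.normal(size=(1000, 32))
scores_a, scores_b = rng.random(1000), rng.random(1000)

ia, da = top_keypoints(desc_a, scores_a, k=250)
ib, db = top_keypoints(desc_b, scores_b, k=250)
# Nearest-neighbor descriptor matching (L2) between the two keypoint sets.
nn = np.argmin(np.linalg.norm(da[:, None] - db[None, :], axis=-1), axis=1)
print(list(zip(ia[:5], ib[nn[:5]])))             # first few tentative matches
```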

Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection

Title Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection
Authors Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chang Liu, Chun Yang, Hongfa Wang, Xu-Cheng Yin
Abstract Arbitrary shape text detection is a challenging task due to the high variety and complexity of scene texts. In this paper, we propose a novel unified relational reasoning graph network for arbitrary shape text detection. In our method, an innovative local graph bridges a text proposal model based on a Convolutional Neural Network (CNN) and a deep relational reasoning network based on a Graph Convolutional Network (GCN), making our network end-to-end trainable. Concretely, every text instance is divided into a series of small rectangular components, and the geometric attributes (e.g., height, width, and orientation) of the small components are estimated by our text proposal model. Given these geometric attributes, the local graph construction model can roughly establish linkages between different text components. To further reason about and deduce the likelihood of linkages between a component and its neighbors, we adopt a graph-based network to perform deep relational reasoning on the local graphs. Experiments on publicly available datasets demonstrate the state-of-the-art performance of our method.
Tasks graph construction, Relational Reasoning
Published 2020-03-17
URL https://arxiv.org/abs/2003.07493v1
PDF https://arxiv.org/pdf/2003.07493v1.pdf
PWC https://paperswithcode.com/paper/deep-relational-reasoning-graph-network-for
Repo https://github.com/GXYM/DRRG
Framework pytorch
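
The local graph construction can be caricatured in a few lines: each rectangular text component is a node, and components whose estimated geometry places them close together get candidate links, which the GCN then reasons over. A toy sketch (the distance-threshold rule here is a stand-in for the paper's pivot-based local graphs):

```python
# Candidate same-text links between nearby component centers.
import numpy as np

def build_local_graph(centers, radius):
    """Adjacency over component centers; an edge is a candidate link."""
    d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    return (d < radius) & ~np.eye(len(centers), dtype=bool)

centers = np.array([[0, 0], [1, 0], [2, 0], [10, 10]], dtype=float)
adj = build_local_graph(centers, radius=1.5)
print(adj.astype(int))        # the distant component gets no candidate links
```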