Paper Group AWR 46
Does BERT agree? Evaluating knowledge of structure dependence through agreement relations
Title | Does BERT agree? Evaluating knowledge of structure dependence through agreement relations |
Authors | Geoff Bacon, Terry Regier |
Abstract | Learning representations that accurately model semantics is an important goal of natural language processing research. Many semantic phenomena depend on syntactic structure. Recent work examines the extent to which state-of-the-art models for pre-training representations, such as BERT, capture such structure-dependent phenomena, but is largely restricted to one phenomenon in English: number agreement between subjects and verbs. We evaluate BERT’s sensitivity to four types of structure-dependent agreement relations in a new semi-automatically curated dataset across 26 languages. We show that both the single-language and multilingual BERT models capture syntax-sensitive agreement patterns well in general, but we also highlight the specific linguistic contexts in which their performance degrades. |
Tasks | |
Published | 2019-08-26 |
URL | https://arxiv.org/abs/1908.09892v1 |
https://arxiv.org/pdf/1908.09892v1.pdf | |
PWC | https://paperswithcode.com/paper/does-bert-agree-evaluating-knowledge-of |
Repo | https://github.com/geoffbacon/does-bert-agree |
Framework | none |
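A minimal sketch of the cloze-style agreement probe that this line of work builds on, using the Hugging Face transformers API; the model choice, sentence template, and word pair below are illustrative assumptions, not the paper's dataset or exact protocol.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased").eval()

def prefers_agreeing_form(template, good, bad):
    """True if the masked-LM logit for `good` beats the one for `bad`."""
    inputs = tok(template.replace("___", tok.mask_token), return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tok.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    return logits[tok.convert_tokens_to_ids(good)] > logits[tok.convert_tokens_to_ids(bad)]

# Subject-verb number agreement across an intervening noun phrase
print(prefers_agreeing_form("The keys to the cabinet ___ on the table.", "are", "is"))
```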
Maybe Deep Neural Networks are the Best Choice for Modeling Source Code
Title | Maybe Deep Neural Networks are the Best Choice for Modeling Source Code |
Authors | Rafael-Michael Karampatsis, Charles Sutton |
Abstract | Statistical language modeling techniques have successfully been applied to source code, yielding a variety of new software development tools, such as tools for code suggestion and improving readability. A major issue with these techniques is that code introduces new vocabulary at a far higher rate than natural language, as new identifier names proliferate. But traditional language models limit the vocabulary to a fixed set of common words. For code, this strong assumption has been shown to have a significant negative effect on predictive performance. However, an open-vocabulary version of neural language models for code has not yet been introduced in the literature. We present a new open-vocabulary neural language model for code that is not limited to a fixed vocabulary of identifier names. We employ a segmentation into subword units, subsequences of tokens chosen based on a compression criterion, following previous work in machine translation. Our network achieves best-in-class performance, outperforming even the state-of-the-art methods of Hellendoorn and Devanbu that are designed specifically to model code. Furthermore, we present a simple method for dynamically adapting the model to a new test project, resulting in increased performance. We showcase our methodology on code corpora in three different languages of over a billion tokens each, hundreds of times larger than in previous work. To our knowledge, this is the largest neural language model for code that has been reported. |
Tasks | Language Modelling, Machine Translation |
Published | 2019-03-13 |
URL | http://arxiv.org/abs/1903.05734v1 |
http://arxiv.org/pdf/1903.05734v1.pdf | |
PWC | https://paperswithcode.com/paper/maybe-deep-neural-networks-are-the-best |
Repo | https://github.com/mast-group/OpenVocabCodeNLM |
Framework | tf |
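The subword segmentation the abstract refers to is byte-pair encoding (BPE) from the machine-translation literature: repeatedly merge the most frequent adjacent symbol pair. A toy sketch of the merge-learning loop, assuming a pre-lexed token stream; the paper applies this at billion-token scale.

```python
import collections

def learn_bpe_merges(tokens, num_merges):
    """Toy BPE: learn merge operations from a list of (code) tokens."""
    vocab = collections.Counter(tuple(t) + ("</w>",) for t in tokens)
    merges = []
    for _ in range(num_merges):
        pairs = collections.Counter()
        for word, freq in vocab.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = collections.Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1]); i += 2
                else:
                    out.append(word[i]); i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

# Identifier-heavy code decomposes into reusable subword units
print(learn_bpe_merges(["getName", "setName", "getValue"], 5))
```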
Second-order Co-occurrence Sensitivity of Skip-Gram with Negative Sampling
Title | Second-order Co-occurrence Sensitivity of Skip-Gram with Negative Sampling |
Authors | Dominik Schlechtweg, Cennet Oguz, Sabine Schulte im Walde |
Abstract | We simulate first- and second-order context overlap and show that Skip-Gram with Negative Sampling is similar to Singular Value Decomposition in capturing second-order co-occurrence information, while Pointwise Mutual Information is agnostic to it. We support the results with an empirical study finding that the models react differently when provided with additional second-order information. Our findings reveal a basic property of Skip-Gram with Negative Sampling and point towards an explanation of its success on a variety of tasks. |
Tasks | |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02479v2 |
https://arxiv.org/pdf/1906.02479v2.pdf | |
PWC | https://paperswithcode.com/paper/second-order-co-occurrence-sensitivity-of |
Repo | https://github.com/Garrafao/SecondOrder |
Framework | none |
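The first-order/second-order distinction is easy to make concrete: two words are first-order similar if they co-occur directly, and second-order similar if their context distributions overlap. A small numpy sketch with an invented toy count matrix (not data from the paper):

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information from a word-context count matrix."""
    total = counts.sum()
    p_wc = counts / total
    p_w = counts.sum(axis=1, keepdims=True) / total
    p_c = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    return np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

# Rows = {car, automobile}, columns = {road, engine, wheel}. The two words
# never co-occur with each other (no first-order evidence) yet share
# contexts, so their representations align: second-order similarity.
C = np.array([[4.0, 3.0, 2.0],
              [3.0, 4.0, 2.0]])
v = ppmi(C)
cos = v[0] @ v[1] / (np.linalg.norm(v[0]) * np.linalg.norm(v[1]))
print(f"second-order cosine similarity: {cos:.3f}")
```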
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
Title | Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression |
Authors | Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese |
Abstract | Intersection over Union (IoU) is the most popular evaluation metric used in object detection benchmarks. However, there is a gap between optimizing the commonly used distance losses for regressing the parameters of a bounding box and maximizing this metric value. The optimal objective for a metric is the metric itself. In the case of axis-aligned 2D bounding boxes, it can be shown that $IoU$ can be directly used as a regression loss. However, $IoU$ has a plateau making it infeasible to optimize in the case of non-overlapping bounding boxes. In this paper, we address the weaknesses of $IoU$ by introducing a generalized version as both a new loss and a new metric. By incorporating this generalized $IoU$ ($GIoU$) as a loss into state-of-the-art object detection frameworks, we show a consistent improvement in their performance using both the standard, $IoU$-based, and new, $GIoU$-based, performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO. |
Tasks | Object Detection |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09630v2 |
http://arxiv.org/pdf/1902.09630v2.pdf | |
PWC | https://paperswithcode.com/paper/generalized-intersection-over-union-a-metric |
Repo | https://github.com/RuiminChen/GIou_loss_caffe |
Framework | none |
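The generalization is compact: with $C$ the smallest box enclosing $A$ and $B$, $GIoU = IoU - |C \setminus (A \cup B)| / |C|$, and the regression loss is $1 - GIoU$. A minimal PyTorch sketch for paired axis-aligned (x1, y1, x2, y2) boxes, following that definition:

```python
import torch

def generalized_iou(boxes1, boxes2, eps=1e-7):
    """GIoU = IoU - |C minus (A u B)| / |C| for paired axis-aligned boxes."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])

    # Intersection and union
    lt = torch.max(boxes1[:, :2], boxes2[:, :2])
    rb = torch.min(boxes1[:, 2:], boxes2[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    union = area1 + area2 - inter
    iou = inter / (union + eps)

    # Smallest enclosing box C; its area stays informative even when the
    # boxes do not overlap, which removes the plateau of plain IoU.
    wh_c = (torch.max(boxes1[:, 2:], boxes2[:, 2:])
            - torch.min(boxes1[:, :2], boxes2[:, :2])).clamp(min=0)
    area_c = wh_c[:, 0] * wh_c[:, 1]
    return iou - (area_c - union) / (area_c + eps)

def giou_loss(pred, target):
    return (1.0 - generalized_iou(pred, target)).mean()
```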
SpanBERT: Improving Pre-training by Representing and Predicting Spans
Title | SpanBERT: Improving Pre-training by Representing and Predicting Spans |
Authors | Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy |
Abstract | We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0, respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even show gains on GLUE. |
Tasks | Coreference Resolution, Linguistic Acceptability, Natural Language Inference, Open-Domain Question Answering, Question Answering, Relation Extraction, Semantic Textual Similarity, Sentiment Analysis |
Published | 2019-07-24 |
URL | https://arxiv.org/abs/1907.10529v3 |
https://arxiv.org/pdf/1907.10529v3.pdf | |
PWC | https://paperswithcode.com/paper/spanbert-improving-pre-training-by |
Repo | https://github.com/facebookresearch/SpanBERT |
Framework | pytorch |
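The span-masking step is simple to sketch. The paper samples span lengths from a geometric distribution clipped at 10 and masks about 15% of tokens; a minimal stand-alone version using those reported defaults:

```python
import random

def span_mask(tokens, mask_ratio=0.15, p=0.2, max_span=10, seed=0):
    """Mask contiguous spans until roughly mask_ratio of tokens are covered."""
    rng = random.Random(seed)
    n = len(tokens)
    budget = max(1, round(n * mask_ratio))
    masked = set()
    while len(masked) < budget:
        length = 1                      # clipped geometric span length
        while rng.random() > p and length < max_span:
            length += 1
        start = rng.randrange(0, max(1, n - length + 1))
        masked.update(range(start, min(start + length, n)))
    return [t if i not in masked else "[MASK]" for i, t in enumerate(tokens)]

print(span_mask("the quick brown fox jumps over the lazy dog tonight".split()))
```

The other half of the method, the span boundary objective that predicts each masked token from the two boundary representations plus a position embedding, is omitted here for brevity.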
The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 Challenge
Title | The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 Challenge |
Authors | Nicholas Heller, Fabian Isensee, Klaus H. Maier-Hein, Xiaoshuai Hou, Chunmei Xie, Fengyi Li, Yang Nan, Guangrui Mu, Zhiyong Lin, Miofei Han, Guang Yao, Yaozong Gao, Yao Zhang, Yixin Wang, Feng Hou, Jiawei Yang, Guangwei Xiong, Jiang Tian, Cheng Zhong, Jun Ma, Jack Rickman, Joshua Dean, Bethany Stai, Resha Tejpaul, Makinna Oestreich, Paul Blake, Heather Kaluzniak, Shaneabbas Raza, Joel Rosenberg, Keenan Moore, Edward Walczak, Zachary Rengel, Zach Edgerton, Ranveer Vasdev, Matthew Peterson, Sean McSweeney, Sarah Peterson, Arveen Kalapara, Niranjan Sathianathen, Christopher Weight, Nikolaos Papanikolopoulos |
Abstract | There is a large body of literature linking anatomic and geometric characteristics of kidney tumors to perioperative and oncologic outcomes. Semantic segmentation of these tumors and their host kidneys is a promising tool for quantitatively characterizing these lesions, but its adoption is limited due to the manual effort required to produce high-quality 3D segmentations of these structures. Recently, methods based on deep learning have shown excellent results in automatic 3D segmentation, but they require large datasets for training, and there remains little consensus on which methods perform best. The 2019 Kidney and Kidney Tumor Segmentation challenge (KiTS19) was a competition held in conjunction with the 2019 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) which sought to address these issues and stimulate progress on this automatic segmentation problem. A training set of 210 cross-sectional CT images with kidney tumors was publicly released with corresponding semantic segmentation masks. 106 teams from five continents used this data to develop automated systems to predict the true segmentation masks on a test set of 90 CT images for which the corresponding ground truth segmentations were kept private. These predictions were scored and ranked according to their average Sørensen-Dice coefficient between the kidney and tumor across all 90 cases. The winning team achieved a Dice of 0.974 for kidney and 0.851 for tumor, approaching the inter-annotator performance on kidney (0.983) but falling short on tumor (0.923). This challenge has now entered an “open leaderboard” phase where it serves as a challenging benchmark in 3D semantic segmentation. |
Tasks | 3D Semantic Segmentation, Semantic Segmentation |
Published | 2019-12-02 |
URL | https://arxiv.org/abs/1912.01054v1 |
https://arxiv.org/pdf/1912.01054v1.pdf | |
PWC | https://paperswithcode.com/paper/the-state-of-the-art-in-kidney-and-kidney |
Repo | https://github.com/neheller/kits19 |
Framework | none |
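The challenge's scoring metric is straightforward to reproduce. A minimal sketch of the Sørensen-Dice coefficient on binary masks; the convention for two empty masks is an assumption:

```python
import numpy as np

def dice(pred, gt):
    """Sørensen-Dice coefficient, 2|P & G| / (|P| + |G|), for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, gt).sum() / denom
```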
SegSort: Segmentation by Discriminative Sorting of Segments
Title | SegSort: Segmentation by Discriminative Sorting of Segments |
Authors | Jyh-Jing Hwang, Stella X. Yu, Jianbo Shi, Maxwell D. Collins, Tien-Ju Yang, Xiao Zhang, Liang-Chieh Chen |
Abstract | Almost all existing deep learning approaches for semantic segmentation tackle this task as a pixel-wise classification problem. Yet humans understand a scene not in terms of pixels, but by decomposing it into perceptual groups and structures that are the basic building blocks of recognition. This motivates us to propose an end-to-end pixel-wise metric learning approach that mimics this process. In our approach, the optimal visual representation determines the right segmentation within individual images and associates segments with the same semantic classes across images. The core visual learning problem is therefore to maximize the similarity within segments and minimize the similarity between segments. Given a model trained this way, inference is performed consistently by extracting pixel-wise embeddings and clustering, with the semantic label determined by the majority vote of its nearest neighbors from an annotated set. As a result, we present SegSort, a first attempt at using deep learning for unsupervised semantic segmentation, achieving 76% of the performance of its supervised counterpart. When supervision is available, SegSort shows consistent improvements over conventional approaches based on pixel-wise softmax training. Additionally, our approach produces more precise boundaries and consistent region predictions. The proposed SegSort further produces an interpretable result, as each choice of label can be easily understood from the retrieved nearest segments. |
Tasks | Metric Learning, Semantic Segmentation, Unsupervised Semantic Segmentation |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06962v2 |
https://arxiv.org/pdf/1910.06962v2.pdf | |
PWC | https://paperswithcode.com/paper/segsort-segmentation-by-discriminative |
Repo | https://github.com/jyhjinghwang/segsort |
Framework | tf |
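The inference step described in the abstract (label each segment by the majority vote of its nearest annotated neighbors) is compact enough to sketch; cosine similarity and k=5 are assumptions here:

```python
import numpy as np

def knn_segment_labels(query_emb, bank_emb, bank_labels, k=5):
    """Majority-vote labels for query segment embeddings against an
    annotated bank of segment embeddings (integer labels)."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    b = bank_emb / np.linalg.norm(bank_emb, axis=1, keepdims=True)
    nn = np.argsort(-(q @ b.T), axis=1)[:, :k]      # top-k by cosine
    return np.array([np.bincount(bank_labels[row]).argmax() for row in nn])
```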
Hyperbolic Disk Embeddings for Directed Acyclic Graphs
Title | Hyperbolic Disk Embeddings for Directed Acyclic Graphs |
Authors | Ryota Suzuki, Ryusuke Takahama, Shun Onoda |
Abstract | Obtaining continuous representations of structural data such as directed acyclic graphs (DAGs) has gained attention in machine learning and artificial intelligence. However, embedding complex DAGs in which both the ancestors and descendants of nodes grow exponentially is difficult. Tackling this problem, we develop Disk Embeddings, a framework for embedding DAGs into quasi-metric spaces. Existing state-of-the-art methods, Order Embeddings and Hyperbolic Entailment Cones, are instances of Disk Embeddings in Euclidean space and spheres, respectively. Furthermore, we propose a novel method, Hyperbolic Disk Embeddings, to handle the exponential growth of relations. The results of our experiments show that our Disk Embedding models outperform existing methods, especially on complex DAGs other than trees. |
Tasks | |
Published | 2019-02-12 |
URL | https://arxiv.org/abs/1902.04335v3 |
https://arxiv.org/pdf/1902.04335v3.pdf | |
PWC | https://paperswithcode.com/paper/hyperbolic-disk-embeddings-for-directed |
Repo | https://github.com/lapras-inc/disk-embedding |
Framework | none |
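The ordering test at the heart of Disk Embeddings is disk containment, which in the Euclidean instance reduces to a single inequality; the hyperbolic method replaces the norm with hyperbolic distance. A minimal sketch:

```python
import numpy as np

def disk_contains(center_a, r_a, center_b, r_b):
    """Disk A contains disk B iff d(c_A, c_B) + r_B <= r_A; containment
    of disks is how the framework encodes reachability in the DAG."""
    d = np.linalg.norm(np.asarray(center_a) - np.asarray(center_b))
    return d + r_b <= r_a

print(disk_contains([0.0, 0.0], 2.0, [0.5, 0.5], 1.0))  # True: A is an ancestor
```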
IterNet: Retinal Image Segmentation Utilizing Structural Redundancy in Vessel Networks
Title | IterNet: Retinal Image Segmentation Utilizing Structural Redundancy in Vessel Networks |
Authors | Liangzhi Li, Manisha Verma, Yuta Nakashima, Hajime Nagahara, Ryo Kawasaki |
Abstract | Retinal vessel segmentation is of great interest for diagnosis of retinal vascular diseases. To further improve the performance of vessel segmentation, we propose IterNet, a new model based on UNet, with the ability to find obscured details of the vessel from the segmented vessel image itself, rather than the raw input image. IterNet consists of multiple iterations of a mini-UNet, which can be 4$\times$ deeper than the common UNet. IterNet also adopts the weight-sharing and skip-connection features to facilitate training; therefore, even with such a large architecture, IterNet can still learn from merely 10$\sim$20 labeled images, without pre-training or any prior knowledge. IterNet achieves AUCs of 0.9816, 0.9851, and 0.9881 on three mainstream datasets, namely DRIVE, CHASE-DB1, and STARE, respectively, which currently are the best scores in the literature. The source code is available. |
Tasks | Retinal Vessel Segmentation, Semantic Segmentation |
Published | 2019-12-12 |
URL | https://arxiv.org/abs/1912.05763v1 |
https://arxiv.org/pdf/1912.05763v1.pdf | |
PWC | https://paperswithcode.com/paper/iternet-retinal-image-segmentation-utilizing |
Repo | https://github.com/conscienceli/IterNet |
Framework | none |
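A structural sketch of the iteration scheme, with the base network and the weight-shared refiner left abstract; the real model uses mini-UNets with skip connections between iterations, which this toy module does not reproduce:

```python
import torch.nn as nn

class IterativeRefiner(nn.Module):
    """Toy IterNet-style wrapper: a base segmenter whose output is
    repeatedly refined by one module whose weights are shared across
    all iterations."""
    def __init__(self, base, refiner, iterations=3):
        super().__init__()
        self.base = base
        self.refiner = refiner          # one module => shared weights
        self.iterations = iterations

    def forward(self, x):
        outputs = [self.base(x)]
        for _ in range(self.iterations):
            outputs.append(self.refiner(outputs[-1]))
        return outputs  # a loss on every output is a natural training choice
```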
Verified Uncertainty Calibration
Title | Verified Uncertainty Calibration |
Authors | Ananya Kumar, Percy Liang, Tengyu Ma |
Abstract | Applications such as weather forecasting and personalized medicine demand models that output calibrated probability estimates—those representative of the true likelihood of a prediction. Most models are not calibrated out of the box but are recalibrated by post-processing model outputs. We find in this work that popular recalibration methods like Platt scaling and temperature scaling are (i) less calibrated than reported, and (ii) current techniques cannot estimate how miscalibrated they are. An alternative method, histogram binning, has measurable calibration error but is sample inefficient—it requires $O(B/\epsilon^2)$ samples, compared to $O(1/\epsilon^2)$ for scaling methods, where $B$ is the number of distinct probabilities the model can output. To get the best of both worlds, we introduce the scaling-binning calibrator, which first fits a parametric function to reduce variance and then bins the function values to actually ensure calibration. This requires only $O(1/\epsilon^2 + B)$ samples. Next, we show that we can estimate a model’s calibration error more accurately using an estimator from the meteorological community—or equivalently measure its calibration error with fewer samples ($O(\sqrt{B})$ instead of $O(B)$). We validate our approach with multiclass calibration experiments on CIFAR-10 and ImageNet, where we obtain a 35% lower calibration error than histogram binning and, unlike scaling methods, guarantees on true calibration. In these experiments, we also estimate the calibration error and ECE more accurately than the commonly used plugin estimators. We implement all these methods in a Python library: https://pypi.org/project/uncertainty-calibration |
Tasks | Calibration, Weather Forecasting |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10155v2 |
https://arxiv.org/pdf/1909.10155v2.pdf | |
PWC | https://paperswithcode.com/paper/190910155 |
Repo | https://github.com/AnanyaKumar/verified_calibration |
Framework | none |
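A hedged numpy/scikit-learn sketch of the scaling-binning recipe as the abstract states it: fit a parametric (here Platt) scaling function, form equal-mass bins of its outputs, and replace each output with the mean function value of its bin. The paper's own library at the PyPI link above is the reference implementation; this sketch does not reproduce its API.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_scaling_binning(probs, labels, n_bins=10, eps=1e-12):
    """probs: predicted P(y=1); labels: binary 0/1 outcomes."""
    logits = np.log((probs + eps) / (1 - probs + eps)).reshape(-1, 1)
    platt = LogisticRegression().fit(logits, labels)       # step 1: scaling
    g = platt.predict_proba(logits)[:, 1]

    edges = np.quantile(g, np.linspace(0, 1, n_bins + 1))  # step 2: equal-mass bins
    idx = np.clip(np.searchsorted(edges, g, side="right") - 1, 0, n_bins - 1)
    bin_means = np.array([g[idx == b].mean() if np.any(idx == b)
                          else 0.5 * (edges[b] + edges[b + 1])
                          for b in range(n_bins)])         # step 3: mean g per bin

    def calibrate(p):
        z = np.log((p + eps) / (1 - p + eps)).reshape(-1, 1)
        gp = platt.predict_proba(z)[:, 1]
        i = np.clip(np.searchsorted(edges, gp, side="right") - 1, 0, n_bins - 1)
        return bin_means[i]
    return calibrate
```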
Orthogonal Convolutional Neural Networks
Title | Orthogonal Convolutional Neural Networks |
Authors | Jiayun Wang, Yubei Chen, Rudrasis Chakraborty, Stella X. Yu |
Abstract | Instability and feature redundancy in CNNs hinder further performance improvement. Using orthogonality as a regularizer has shown success in alleviating these issues. Previous works, however, only considered kernel orthogonality in the convolution layers of CNNs, which is a necessary but not sufficient condition for orthogonal convolutions in general. We propose orthogonal convolutions as regularizations in CNNs and benchmark their effect on various tasks. We observe up to 3% gain for CIFAR100 and up to 1% gain for ImageNet classification. Our experiments also demonstrate improved performance on image retrieval, inpainting and generation, which suggests orthogonal convolution improves feature expressiveness. Empirically, we show that the uniform spectrum and reduced feature redundancy may account for the gain in performance and robustness under adversarial attacks. |
Tasks | Image Retrieval |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12207v1 |
https://arxiv.org/pdf/1911.12207v1.pdf | |
PWC | https://paperswithcode.com/paper/orthogonal-convolutional-neural-networks |
Repo | https://github.com/samaonline/Orthogonal-Convolutional-Neural-Networks |
Framework | pytorch |
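The abstract's contrast is between kernel orthogonality (necessary but not sufficient) and full orthogonal convolution. The simpler kernel-orthogonality penalty is easy to sketch; note this is the baseline the paper improves on, not its stricter condition on the convolution operator itself:

```python
import torch

def kernel_orthogonality_loss(conv_weight):
    """||W W^T - I||_F^2 over flattened filters of a conv layer
    (weight shape: out_channels x in_channels x kH x kW)."""
    out_c = conv_weight.shape[0]
    w = conv_weight.reshape(out_c, -1)
    gram = w @ w.t()
    return ((gram - torch.eye(out_c, device=w.device)) ** 2).sum()

# Usage: add lam * kernel_orthogonality_loss(layer.weight) to the task loss
```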
Efficient Deep Gaussian Process Models for Variable-Sized Input
Title | Efficient Deep Gaussian Process Models for Variable-Sized Input |
Authors | Issam H. Laradji, Mark Schmidt, Vladimir Pavlovic, Minyoung Kim |
Abstract | Deep Gaussian processes (DGP) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their limitation is that they do not scale well with the size of the data. Existing approaches address this using a deep random feature (DRF) expansion model, which makes inference tractable by approximating DGPs. However, DRF is not suitable for variable-sized input data such as trees, graphs, and sequences. We introduce GP-DRF, a novel Bayesian model with an input layer of GPs followed by DRF layers. The key advantage is that the combination of GP and DRF leads to a tractable model that can both handle variable-sized input and learn deep long-range dependency structures of the data. We provide a novel, efficient method to simultaneously infer the posterior of the GP’s latent vectors and of the DRF’s internal weights and random frequencies. Our experiments show that GP-DRF outperforms the standard GP model and DRF model across many datasets. Furthermore, they demonstrate that GP-DRF enables improved uncertainty quantification compared to GP and DRF alone, with respect to a Bhattacharyya distance assessment. Source code is available at https://github.com/IssamLaradji/GP_DRF. |
Tasks | Gaussian Processes |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06982v1 |
https://arxiv.org/pdf/1905.06982v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-deep-gaussian-process-models-for |
Repo | https://github.com/IssamLaradji/GP_DRF |
Framework | pytorch |
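For background, the random-feature expansion that DRF layers build on approximates a kernel with random Fourier features; a minimal RBF-kernel version (feature count and lengthscale are arbitrary choices here, and this is not the paper's full model):

```python
import numpy as np

def random_fourier_features(X, n_features=500, lengthscale=1.0, seed=0):
    """z(x) = sqrt(2/D) * cos(W x + b) with W ~ N(0, 1/lengthscale^2),
    b ~ U[0, 2pi]; then z(x) . z(y) approximates the RBF kernel k(x, y)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / lengthscale, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
```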
Linking artificial and human neural representations of language
Title | Linking artificial and human neural representations of language |
Authors | Jon Gauthier, Roger Levy |
Abstract | What information from an act of sentence understanding is robustly represented in the human brain? We investigate this question by comparing sentence encoding models on a brain decoding task, where the sentence that an experimental participant has seen must be predicted from the fMRI signal evoked by the sentence. We take a pre-trained BERT architecture as a baseline sentence encoding model and fine-tune it on a variety of natural language understanding (NLU) tasks, asking which lead to improvements in brain-decoding performance. We find that none of the sentence encoding tasks tested yield significant increases in brain decoding performance. Through further task ablations and representational analyses, we find that tasks which produce syntax-light representations yield significant improvements in brain decoding performance. Our results constrain the space of NLU models that could best account for human neural representations of language, but also suggest limits on the possibility of decoding fine-grained syntactic information from fMRI human neuroimaging. |
Tasks | Brain Decoding |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.01244v1 |
https://arxiv.org/pdf/1910.01244v1.pdf | |
PWC | https://paperswithcode.com/paper/linking-artificial-and-human-neural |
Repo | https://github.com/hans/nn-decoding |
Framework | tf |
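The evaluation pattern (predict sentence representations from evoked fMRI with a regularized linear map) can be sketched in a few lines; the shapes, the random stand-in data, and the scoring metric below are assumptions for illustration, not the paper's protocol:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

fmri = np.random.randn(180, 1000)      # stand-in: sentences x voxels
sent_emb = np.random.randn(180, 768)   # stand-in: sentence encoder features

decoder = RidgeCV(alphas=np.logspace(-2, 4, 7))
score = cross_val_score(decoder, fmri, sent_emb, cv=5, scoring="r2").mean()
print(f"mean cross-validated R^2: {score:.3f}")
```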
Unmasking DeepFakes with simple Features
Title | Unmasking DeepFakes with simple Features |
Authors | Ricard Durall, Margret Keuper, Franz-Josef Pfreundt, Janis Keuper |
Abstract | Deep generative models have recently achieved impressive results for many real-world applications, successfully generating high-resolution and diverse samples from complex datasets. Due to this improvement, fake digital content has proliferated, raising concern and spreading distrust in image content and leading to an urgent need for automated ways to detect these AI-generated fake images. Despite the fact that many face editing algorithms seem to produce realistic human faces, upon closer examination, they do exhibit artifacts in certain domains which are often hidden to the naked eye. In this work, we present a simple way to detect such fake face images - so-called DeepFakes. Our method is based on a classical frequency domain analysis followed by a basic classifier. Compared to previous systems, which need to be fed with large amounts of labeled data, our approach showed very good results using only a few annotated training samples and even achieved good accuracies in fully unsupervised scenarios. For the evaluation on high resolution face images, we combined several public datasets of real and fake faces into a new benchmark: Faces-HQ. Given such high-resolution images, our approach reaches a perfect classification accuracy of 100% when it is trained on as few as 20 annotated samples. In a second experiment, on the medium-resolution images of the CelebA dataset, our method achieves 100% accuracy in a supervised and 96% in an unsupervised setting. Finally, evaluating low-resolution video sequences of the FaceForensics++ dataset, our method achieves 91% accuracy in detecting manipulated videos. Source Code: https://github.com/cc-hpc-itwm/DeepFakeDetection |
Tasks | DeepFake Detection |
Published | 2019-11-02 |
URL | https://arxiv.org/abs/1911.00686v3 |
https://arxiv.org/pdf/1911.00686v3.pdf | |
PWC | https://paperswithcode.com/paper/unmasking-deepfakes-with-simple-features |
Repo | https://github.com/cc-hpc-itwm/UpConv |
Framework | pytorch |
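The frequency features the abstract mentions are typically an azimuthally averaged power spectrum of the image's 2D DFT, fed to a basic classifier such as logistic regression or an SVM. A sketch of the feature extraction, with the binning scheme as an assumption:

```python
import numpy as np

def radial_power_spectrum(img, n_bins=64):
    """Log of the azimuthally averaged 2D power spectrum of a grayscale image."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(x - w // 2, y - h // 2)
    edges = np.linspace(0.0, r.max() + 1e-9, n_bins + 1)
    idx = np.digitize(r.ravel(), edges) - 1
    p = power.ravel()
    spec = np.array([p[idx == b].mean() if np.any(idx == b) else 0.0
                     for b in range(n_bins)])
    return np.log(spec + 1e-12)
```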
MIC: Mining Interclass Characteristics for Improved Metric Learning
Title | MIC: Mining Interclass Characteristics for Improved Metric Learning |
Authors | Karsten Roth, Biagio Brattoli, Björn Ommer |
Abstract | Metric learning seeks to embed images of objects such that class-defined relations are captured by the embedding space. However, variability in images is not just due to different depicted object classes, but also depends on other latent characteristics such as viewpoint or illumination. In addition to these structured properties, random noise further obstructs the visual relations of interest. The common approach to metric learning is to enforce a representation that is invariant under all factors but the ones of interest. In contrast, we propose to explicitly learn the latent characteristics that are shared by and go across object classes. We can then directly explain away structured visual variability, rather than assuming it to be unknown random noise. We propose a novel surrogate task to learn visual characteristics shared across classes with a separate encoder. This encoder is trained jointly with the encoder for class information by reducing their mutual information. On five standard image retrieval benchmarks the approach significantly improves upon the state-of-the-art. |
Tasks | Image Retrieval, Metric Learning |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1909.11574v1 |
https://arxiv.org/pdf/1909.11574v1.pdf | |
PWC | https://paperswithcode.com/paper/mic-mining-interclass-characteristics-for |
Repo | https://github.com/Confusezius/metric-learning-mining-interclass-characteristics |
Framework | pytorch |
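The abstract's key mechanism is training the class encoder and the characteristics encoder jointly while reducing the information they share. As a heavily simplified stand-in for that mutual-information reduction (not the paper's exact estimator), one can penalize the cross-covariance between the two embeddings:

```python
import torch

def cross_covariance_penalty(z_class, z_shared):
    """Frobenius norm of the cross-covariance between two batches of
    embeddings; a crude proxy for reducing their shared information."""
    zc = z_class - z_class.mean(dim=0, keepdim=True)
    zs = z_shared - z_shared.mean(dim=0, keepdim=True)
    cov = zc.t() @ zs / (zc.shape[0] - 1)
    return (cov ** 2).sum()
```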