February 2, 2020

3335 words 16 mins read

Paper Group AWR 46

Does BERT agree? Evaluating knowledge of structure dependence through agreement relations. Maybe Deep Neural Networks are the Best Choice for Modeling Source Code. Second-order Co-occurrence Sensitivity of Skip-Gram with Negative Sampling. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. SpanBERT: Improving Pre- …

Does BERT agree? Evaluating knowledge of structure dependence through agreement relations

Title Does BERT agree? Evaluating knowledge of structure dependence through agreement relations
Authors Geoff Bacon, Terry Regier
Abstract Learning representations that accurately model semantics is an important goal of natural language processing research. Many semantic phenomena depend on syntactic structure. Recent work examines the extent to which state-of-the-art models for pre-training representations, such as BERT, capture such structure-dependent phenomena, but is largely restricted to one phenomenon in English: number agreement between subjects and verbs. We evaluate BERT’s sensitivity to four types of structure-dependent agreement relations in a new semi-automatically curated dataset across 26 languages. We show that both the single-language and multilingual BERT models capture syntax-sensitive agreement patterns well in general, but we also highlight the specific linguistic contexts in which their performance degrades.
Tasks
Published 2019-08-26
URL https://arxiv.org/abs/1908.09892v1
PDF https://arxiv.org/pdf/1908.09892v1.pdf
PWC https://paperswithcode.com/paper/does-bert-agree-evaluating-knowledge-of
Repo https://github.com/geoffbacon/does-bert-agree
Framework none
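
The agreement probe lends itself to a small cloze-style check: mask the agreeing verb, then compare the model's probabilities for the grammatical and ungrammatical forms. A minimal sketch with HuggingFace transformers follows; the sentence, candidate forms, and scoring are illustrative, not the paper's curated dataset or evaluation code.

```python
# A cloze-style agreement check with multilingual BERT; the example sentence
# (with a distractor noun between subject and verb) is hypothetical.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased").eval()

def form_probability(masked_sentence, candidate):
    """P(candidate | context) at the [MASK] position."""
    inputs = tokenizer(masked_sentence, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_pos].softmax(dim=-1)
    return probs[tokenizer.convert_tokens_to_ids(candidate)].item()

context = "The key to the cabinets [MASK] on the table."
print(form_probability(context, "is"))   # grammatical form
print(form_probability(context, "are"))  # agreement error
```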

Maybe Deep Neural Networks are the Best Choice for Modeling Source Code

Title Maybe Deep Neural Networks are the Best Choice for Modeling Source Code
Authors Rafael-Michael Karampatsis, Charles Sutton
Abstract Statistical language modeling techniques have successfully been applied to source code, yielding a variety of new software development tools, such as tools for code suggestion and improving readability. A major issue with these techniques is that code introduces new vocabulary at a far higher rate than natural language, as new identifier names proliferate. But traditional language models limit the vocabulary to a fixed set of common words. For code, this strong assumption has been shown to have a significant negative effect on predictive performance. However, an open-vocabulary version of neural network language models for code has not yet been introduced in the literature. We present a new open-vocabulary neural language model for code that is not limited to a fixed vocabulary of identifier names. We employ a segmentation into subword units, subsequences of tokens chosen based on a compression criterion, following previous work in machine translation. Our network achieves best-in-class performance, outperforming even the state-of-the-art methods of Hellendoorn and Devanbu that are designed specifically to model code. Furthermore, we present a simple method for dynamically adapting the model to a new test project, resulting in increased performance. We showcase our methodology on code corpora in three different languages of over a billion tokens each, hundreds of times larger than in previous work. To our knowledge, this is the largest neural language model for code that has been reported.
Tasks Language Modelling, Machine Translation
Published 2019-03-13
URL http://arxiv.org/abs/1903.05734v1
PDF http://arxiv.org/pdf/1903.05734v1.pdf
PWC https://paperswithcode.com/paper/maybe-deep-neural-networks-are-the-best
Repo https://github.com/mast-group/OpenVocabCodeNLM
Framework tf
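
The compression-based subword segmentation borrowed from machine translation is byte-pair encoding: repeatedly merge the most frequent adjacent symbol pair, so unseen identifiers decompose into known subwords instead of becoming <UNK>. A toy sketch of the merge-learning step (the repo uses a proper BPE implementation; this only conveys the idea):

```python
# Toy BPE merge learning: start from characters, repeatedly merge the most
# frequent adjacent pair observed in the token corpus.
from collections import Counter

def learn_bpe(tokens, num_merges):
    vocab = Counter(tuple(t) + ("</w>",) for t in tokens)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] += freq
        vocab = merged
    return merges

# Identifiers share subwords ("get", "set", "Value", "Name"), so new names
# like "setVersion" would still segment into mostly-known units.
print(learn_bpe(["getValue", "getName", "setValue", "setName"], 8))
```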

Second-order Co-occurrence Sensitivity of Skip-Gram with Negative Sampling

Title Second-order Co-occurrence Sensitivity of Skip-Gram with Negative Sampling
Authors Dominik Schlechtweg, Cennet Oguz, Sabine Schulte im Walde
Abstract We simulate first- and second-order context overlap and show that Skip-Gram with Negative Sampling is similar to Singular Value Decomposition in capturing second-order co-occurrence information, while Pointwise Mutual Information is agnostic to it. We support the results with an empirical study finding that the models react differently when provided with additional second-order information. Our findings reveal a basic property of Skip-Gram with Negative Sampling and point towards an explanation of its success on a variety of tasks.
Tasks
Published 2019-06-06
URL https://arxiv.org/abs/1906.02479v2
PDF https://arxiv.org/pdf/1906.02479v2.pdf
PWC https://paperswithcode.com/paper/second-order-co-occurrence-sensitivity-of
Repo https://github.com/Garrafao/SecondOrder
Framework none
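
For context, the count-based side of the comparison can be sketched directly: a PPMI matrix, which the paper finds agnostic to second-order overlap, and its truncated SVD, whose dense factors, like SGNS vectors, can pick up second-order co-occurrence. The toy counts below are illustrative, not the paper's simulation setup.

```python
# PPMI from a word-context count matrix, then rank-k SVD word vectors.
import numpy as np

def ppmi(counts):
    total = counts.sum()
    p_w = counts.sum(axis=1, keepdims=True) / total
    p_c = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore"):
        pmi = np.log((counts / total) / (p_w * p_c))
    return np.maximum(pmi, 0.0)  # clip negatives/-inf: positive PMI

counts = np.array([[0, 4, 1],    # toy co-occurrence counts
                   [4, 0, 3],
                   [1, 3, 0]], dtype=float)
M = ppmi(counts)
U, S, Vt = np.linalg.svd(M)
embeddings = U[:, :2] * np.sqrt(S[:2])  # rank-2 word vectors
print(embeddings)
```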

Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression

Title Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
Authors Hamid Rezatofighi, Nathan Tsoi, JunYoung Gwak, Amir Sadeghian, Ian Reid, Silvio Savarese
Abstract Intersection over Union (IoU) is the most popular evaluation metric used in object detection benchmarks. However, there is a gap between optimizing the commonly used distance losses for regressing the parameters of a bounding box and maximizing this metric value. The optimal objective for a metric is the metric itself. In the case of axis-aligned 2D bounding boxes, it can be shown that $IoU$ can be directly used as a regression loss. However, $IoU$ has a plateau, making it infeasible to optimize in the case of non-overlapping bounding boxes. In this paper, we address the weaknesses of $IoU$ by introducing a generalized version as both a new loss and a new metric. By incorporating this generalized $IoU$ ($GIoU$) as a loss into state-of-the-art object detection frameworks, we show a consistent improvement in their performance using both the standard, $IoU$-based, and new, $GIoU$-based, performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO.
Tasks Object Detection
Published 2019-02-25
URL http://arxiv.org/abs/1902.09630v2
PDF http://arxiv.org/pdf/1902.09630v2.pdf
PWC https://paperswithcode.com/paper/generalized-intersection-over-union-a-metric
Repo https://github.com/RuiminChen/GIou_loss_caffe
Framework none
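
The generalized metric itself is compact: GIoU subtracts from IoU the fraction of the smallest enclosing box C not covered by the union, so the loss 1 - GIoU keeps a useful gradient even for disjoint boxes. A direct sketch for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
# GIoU for axis-aligned boxes: IoU minus |C \ (A U B)| / |C|, where C is
# the smallest box enclosing both A and B. Range is (-1, 1].
def giou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C.
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    area_c = (cx2 - cx1) * (cy2 - cy1)
    return iou - (area_c - union) / area_c

print(giou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlapping: IoU = 1/7
print(giou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint: IoU = 0, GIoU < 0
```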

SpanBERT: Improving Pre-training by Representing and Predicting Spans

Title SpanBERT: Improving Pre-training by Representing and Predicting Spans
Authors Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy
Abstract We present SpanBERT, a pre-training method that is designed to better represent and predict spans of text. Our approach extends BERT by (1) masking contiguous random spans, rather than random tokens, and (2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it. SpanBERT consistently outperforms BERT and our better-tuned baselines, with substantial gains on span selection tasks such as question answering and coreference resolution. In particular, with the same training data and model size as BERT-large, our single model obtains 94.6% and 88.7% F1 on SQuAD 1.1 and 2.0, respectively. We also achieve a new state of the art on the OntoNotes coreference resolution task (79.6% F1), strong performance on the TACRED relation extraction benchmark, and even show gains on GLUE.
Tasks Coreference Resolution, Linguistic Acceptability, Natural Language Inference, Open-Domain Question Answering, Question Answering, Relation Extraction, Semantic Textual Similarity, Sentiment Analysis
Published 2019-07-24
URL https://arxiv.org/abs/1907.10529v3
PDF https://arxiv.org/pdf/1907.10529v3.pdf
PWC https://paperswithcode.com/paper/spanbert-improving-pre-training-by
Repo https://github.com/facebookresearch/SpanBERT
Framework pytorch
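
The first modification is easy to picture: sample span lengths from a geometric distribution (clipped at 10 in the paper) and mask whole contiguous spans until roughly 15% of tokens are covered. A simplified sketch; real SpanBERT masks complete words, avoids overlapping spans, and adds the span-boundary objective on top.

```python
# Simplified span masking: geometric span lengths (clipped), whole spans
# replaced by [MASK]. Overlaps are not prevented here, unlike the paper.
import numpy as np

def mask_spans(tokens, mask_ratio=0.15, p=0.2, max_len=10, mask="[MASK]"):
    budget = max(1, int(mask_ratio * len(tokens)))
    masked, spans = list(tokens), []
    while budget > 0:
        length = min(np.random.geometric(p), max_len, budget)
        start = np.random.randint(0, len(tokens) - length + 1)
        spans.append((start, start + length))
        masked[start:start + length] = [mask] * length
        budget -= length
    return masked, spans

tokens = "an american football game is a timed contest between two teams".split()
print(mask_spans(tokens))
```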

The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 Challenge

Title The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 Challenge
Authors Nicholas Heller, Fabian Isensee, Klaus H. Maier-Hein, Xiaoshuai Hou, Chunmei Xie, Fengyi Li, Yang Nan, Guangrui Mu, Zhiyong Lin, Miofei Han, Guang Yao, Yaozong Gao, Yao Zhang, Yixin Wang, Feng Hou, Jiawei Yang, Guangwei Xiong, Jiang Tian, Cheng Zhong, Jun Ma, Jack Rickman, Joshua Dean, Bethany Stai, Resha Tejpaul, Makinna Oestreich, Paul Blake, Heather Kaluzniak, Shaneabbas Raza, Joel Rosenberg, Keenan Moore, Edward Walczak, Zachary Rengel, Zach Edgerton, Ranveer Vasdev, Matthew Peterson, Sean McSweeney, Sarah Peterson, Arveen Kalapara, Niranjan Sathianathen, Christopher Weight, Nikolaos Papanikolopoulos
Abstract There is a large body of literature linking anatomic and geometric characteristics of kidney tumors to perioperative and oncologic outcomes. Semantic segmentation of these tumors and their host kidneys is a promising tool for quantitatively characterizing these lesions, but its adoption is limited due to the manual effort required to produce high-quality 3D segmentations of these structures. Recently, methods based on deep learning have shown excellent results in automatic 3D segmentation, but they require large datasets for training, and there remains little consensus on which methods perform best. The 2019 Kidney and Kidney Tumor Segmentation challenge (KiTS19) was a competition held in conjunction with the 2019 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) which sought to address these issues and stimulate progress on this automatic segmentation problem. A training set of 210 cross-sectional CT images with kidney tumors was publicly released with corresponding semantic segmentation masks. 106 teams from five continents used this data to develop automated systems to predict the true segmentation masks on a test set of 90 CT images for which the corresponding ground truth segmentations were kept private. These predictions were scored and ranked according to their average Sørensen-Dice coefficient between the kidney and tumor across all 90 cases. The winning team achieved a Dice of 0.974 for kidney and 0.851 for tumor, approaching the inter-annotator performance on kidney (0.983) but falling short on tumor (0.923). This challenge has now entered an “open leaderboard” phase where it serves as a challenging benchmark in 3D semantic segmentation.
Tasks 3D Semantic Segmentation, Semantic Segmentation
Published 2019-12-02
URL https://arxiv.org/abs/1912.01054v1
PDF https://arxiv.org/pdf/1912.01054v1.pdf
PWC https://paperswithcode.com/paper/the-state-of-the-art-in-kidney-and-kidney
Repo https://github.com/neheller/kits19
Framework none
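
The challenge metric is the Sørensen-Dice coefficient: twice the overlap between predicted and ground-truth masks divided by their combined size, computed per case for the kidney and tumor labels. A minimal sketch:

```python
# Sørensen-Dice coefficient between two binary segmentation masks.
import numpy as np

def dice(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(pred, truth).sum() / denom

pred = np.array([[0, 1, 1], [0, 1, 0]])
truth = np.array([[0, 1, 1], [1, 1, 0]])
print(dice(pred, truth))  # 2*3 / (3+4) = 0.857...
```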

SegSort: Segmentation by Discriminative Sorting of Segments

Title SegSort: Segmentation by Discriminative Sorting of Segments
Authors Jyh-Jing Hwang, Stella X. Yu, Jianbo Shi, Maxwell D. Collins, Tien-Ju Yang, Xiao Zhang, Liang-Chieh Chen
Abstract Almost all existing deep learning approaches for semantic segmentation tackle this task as a pixel-wise classification problem. Yet humans understand a scene not in terms of pixels, but by decomposing it into perceptual groups and structures that are the basic building blocks of recognition. This motivates us to propose an end-to-end pixel-wise metric learning approach that mimics this process. In our approach, the optimal visual representation determines the right segmentation within individual images and associates segments with the same semantic classes across images. The core visual learning problem is therefore to maximize the similarity within segments and minimize the similarity between segments. Given a model trained this way, inference is performed consistently by extracting pixel-wise embeddings and clustering, with the semantic label determined by the majority vote of its nearest neighbors from an annotated set. As a result, we present SegSort, a first attempt at using deep learning for unsupervised semantic segmentation, achieving 76% of the performance of its supervised counterpart. When supervision is available, SegSort shows consistent improvements over conventional approaches based on pixel-wise softmax training. Additionally, our approach produces more precise boundaries and consistent region predictions. The proposed SegSort further produces an interpretable result, as each choice of label can be easily understood from the retrieved nearest segments.
Tasks Metric Learning, Semantic Segmentation, Unsupervised Semantic Segmentation
Published 2019-10-15
URL https://arxiv.org/abs/1910.06962v2
PDF https://arxiv.org/pdf/1910.06962v2.pdf
PWC https://paperswithcode.com/paper/segsort-segmentation-by-discriminative
Repo https://github.com/jyhjinghwang/segsort
Framework tf
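
Inference in this framework reduces to retrieval: embed and cluster pixels into segments, then label each segment by majority vote of its nearest neighbors in an annotated segment bank. A rough sketch of that last step; names and shapes are illustrative, not the repo's API.

```python
# Label a query segment by k-NN majority vote over an annotated bank of
# segment embeddings, using cosine similarity.
import numpy as np

def label_segment(segment_embedding, bank_embeddings, bank_labels, k=5):
    sims = bank_embeddings @ segment_embedding
    sims /= np.linalg.norm(bank_embeddings, axis=1) * np.linalg.norm(segment_embedding)
    nearest = np.argsort(-sims)[:k]
    votes = bank_labels[nearest]
    return np.bincount(votes).argmax()  # majority vote

bank = np.random.randn(100, 32)          # embeddings of annotated segments
labels = np.random.randint(0, 21, 100)   # e.g. 21 PASCAL VOC classes
query = np.random.randn(32)
print(label_segment(query, bank, labels))
```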

Hyperbolic Disk Embeddings for Directed Acyclic Graphs

Title Hyperbolic Disk Embeddings for Directed Acyclic Graphs
Authors Ryota Suzuki, Ryusuke Takahama, Shun Onoda
Abstract Obtaining continuous representations of structural data such as directed acyclic graphs (DAGs) has gained attention in machine learning and artificial intelligence. However, embedding complex DAGs in which both the ancestors and descendants of nodes grow exponentially is difficult. To tackle this problem, we develop Disk Embeddings, a framework for embedding DAGs into quasi-metric spaces. The existing state-of-the-art methods, Order Embeddings and Hyperbolic Entailment Cones, are instances of Disk Embeddings in Euclidean space and spheres, respectively. Furthermore, we propose a novel method, Hyperbolic Disk Embeddings, to handle the exponential growth of relations. The results of our experiments show that our Disk Embedding models outperform existing methods, especially on complex DAGs other than trees.
Tasks
Published 2019-02-12
URL https://arxiv.org/abs/1902.04335v3
PDF https://arxiv.org/pdf/1902.04335v3.pdf
PWC https://paperswithcode.com/paper/hyperbolic-disk-embeddings-for-directed
Repo https://github.com/lapras-inc/disk-embedding
Framework none
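
The ordering behind Disk Embeddings is containment: u is an ancestor of v when v's disk fits inside u's, i.e. d(c_u, c_v) <= r_u - r_v. A sketch of that containment test with a Poincaré-ball distance; this simplifies the paper's formulation, which also allows generalized disks with negative radii.

```python
# Disk containment test: u entails v iff dist(center_u, center_v)
# is at most radius_u - radius_v.
import numpy as np

def poincare_dist(x, y):
    """Distance in the Poincaré ball (points must have norm < 1)."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq / denom)

def entails(center_u, r_u, center_v, r_v, dist=poincare_dist):
    return dist(center_u, center_v) <= r_u - r_v

u, v = np.array([0.0, 0.0]), np.array([0.3, 0.0])
print(entails(u, 2.0, v, 1.0))  # True: v's disk fits inside u's
```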

IterNet: Retinal Image Segmentation Utilizing Structural Redundancy in Vessel Networks

Title IterNet: Retinal Image Segmentation Utilizing Structural Redundancy in Vessel Networks
Authors Liangzhi Li, Manisha Verma, Yuta Nakashima, Hajime Nagahara, Ryo Kawasaki
Abstract Retinal vessel segmentation is of great interest for diagnosis of retinal vascular diseases. To further improve the performance of vessel segmentation, we propose IterNet, a new model based on UNet, with the ability to find obscured details of the vessel from the segmented vessel image itself, rather than the raw input image. IterNet consists of multiple iterations of a mini-UNet, which can be 4$\times$ deeper than the common UNet. IterNet also adopts the weight-sharing and skip-connection features to facilitate training; therefore, even with such a large architecture, IterNet can still learn from merely 10$\sim$20 labeled images, without pre-training or any prior knowledge. IterNet achieves AUCs of 0.9816, 0.9851, and 0.9881 on three mainstream datasets, namely DRIVE, CHASE-DB1, and STARE, respectively, which currently are the best scores in the literature. The source code is available.
Tasks Retinal Vessel Segmentation, Semantic Segmentation
Published 2019-12-12
URL https://arxiv.org/abs/1912.05763v1
PDF https://arxiv.org/pdf/1912.05763v1.pdf
PWC https://paperswithcode.com/paper/iternet-retinal-image-segmentation-utilizing
Repo https://github.com/conscienceli/IterNet
Framework none
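
The architecture reads as a refinement loop: a base UNet produces an initial vessel map, then one weight-shared mini-UNet is applied repeatedly to its own output, with supervision at every iteration. A structural sketch in PyTorch; module names and interfaces are placeholders inferred from the abstract, not the repo's code.

```python
# Structural sketch: one shared mini-UNet iteratively refines the vessel map.
import torch.nn as nn

class IterNetSketch(nn.Module):
    def __init__(self, base_unet, mini_unet, iterations=3):
        super().__init__()
        self.base = base_unet        # full UNet applied to the raw image
        self.mini = mini_unet        # single module, reused: shared weights
        self.iterations = iterations

    def forward(self, image):
        outputs = [self.base(image)]  # initial segmentation
        for _ in range(self.iterations):
            # Refine the previous prediction, not the raw image.
            outputs.append(self.mini(outputs[-1]))
        return outputs  # deep supervision: a loss on every iteration's output
```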

Verified Uncertainty Calibration

Title Verified Uncertainty Calibration
Authors Ananya Kumar, Percy Liang, Tengyu Ma
Abstract Applications such as weather forecasting and personalized medicine demand models that output calibrated probability estimates—those representative of the true likelihood of a prediction. Most models are not calibrated out of the box but are recalibrated by post-processing model outputs. We find in this work that popular recalibration methods like Platt scaling and temperature scaling are (i) less calibrated than reported, and (ii) current techniques cannot estimate how miscalibrated they are. An alternative method, histogram binning, has measurable calibration error but is sample inefficient—it requires $O(B/\epsilon^2)$ samples, compared to $O(1/\epsilon^2)$ for scaling methods, where $B$ is the number of distinct probabilities the model can output. To get the best of both worlds, we introduce the scaling-binning calibrator, which first fits a parametric function to reduce variance and then bins the function values to actually ensure calibration. This requires only $O(1/\epsilon^2 + B)$ samples. Next, we show that we can estimate a model’s calibration error more accurately using an estimator from the meteorological community—or equivalently measure its calibration error with fewer samples ($O(\sqrt{B})$ instead of $O(B)$). We validate our approach with multiclass calibration experiments on CIFAR-10 and ImageNet, where we obtain a 35% lower calibration error than histogram binning and, unlike scaling methods, guarantees on true calibration. In these experiments, we also estimate the calibration error and ECE more accurately than the commonly used plugin estimators. We implement all these methods in a Python library: https://pypi.org/project/uncertainty-calibration
Tasks Calibration, Weather Forecasting
Published 2019-09-23
URL https://arxiv.org/abs/1909.10155v2
PDF https://arxiv.org/pdf/1909.10155v2.pdf
PWC https://paperswithcode.com/paper/190910155
Repo https://github.com/AnanyaKumar/verified_calibration
Framework none
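
The two steps of the scaling-binning calibrator are easy to sketch from scratch: fit a low-variance parametric map (Platt scaling here) on one split, then replace each fitted value with the mean of its equal-mass bin on another split. The authors ship a proper implementation in the library linked above; this toy version only illustrates the recipe for binary scores.

```python
# Scaling-binning calibrator: Platt scaling, then equal-mass binning of
# the fitted values so the output is verifiably calibrated.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_scaling_binning(scores, labels, num_bins=10):
    half = len(scores) // 2
    # Step 1 (scaling): fit the parametric recalibrator on the first half.
    platt = LogisticRegression().fit(scores[:half, None], labels[:half])
    g = lambda s: platt.predict_proba(np.asarray(s)[:, None])[:, 1]
    # Step 2 (binning): equal-mass bins over fitted values on the second half.
    fitted = g(scores[half:])
    edges = np.quantile(fitted, np.linspace(0, 1, num_bins + 1))[1:-1]
    bins = np.digitize(fitted, edges)
    bin_means = np.array([fitted[bins == b].mean() for b in range(num_bins)])
    return lambda s: bin_means[np.digitize(g(s), edges)]

rng = np.random.default_rng(0)
scores = rng.uniform(size=2000)
labels = (rng.uniform(size=2000) < scores**2).astype(int)  # miscalibrated model
calibrate = fit_scaling_binning(scores, labels)
print(calibrate(np.array([0.2, 0.5, 0.9])))
```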

Orthogonal Convolutional Neural Networks

Title Orthogonal Convolutional Neural Networks
Authors Jiayun Wang, Yubei Chen, Rudrasis Chakraborty, Stella X. Yu
Abstract Instability and feature redundancy in CNNs hinder further performance improvement. Using orthogonality as a regularizer has shown success in alleviating these issues. Previous works, however, only considered kernel orthogonality in the convolution layers of CNNs, which is a necessary but not sufficient condition for orthogonal convolutions in general. We propose orthogonal convolutions as regularizers in CNNs and benchmark their effect on various tasks. We observe up to a 3% gain on CIFAR100 and up to a 1% gain on ImageNet classification. Our experiments also demonstrate improved performance on image retrieval, inpainting, and generation, which suggests that orthogonal convolutions improve feature expressiveness. Empirically, we show that a uniform spectrum and reduced feature redundancy may account for the gains in performance and in robustness under adversarial attacks.
Tasks Image Retrieval
Published 2019-11-27
URL https://arxiv.org/abs/1911.12207v1
PDF https://arxiv.org/pdf/1911.12207v1.pdf
PWC https://paperswithcode.com/paper/orthogonal-convolutional-neural-networks
Repo https://github.com/samaonline/Orthogonal-Convolutional-Neural-Networks
Framework pytorch
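
The baseline condition the paper starts from, kernel orthogonality, is a one-liner: penalize the deviation of the flattened filters' Gram matrix from the identity. Note this sketch shows only that necessary-but-not-sufficient kernel version; the paper's OCNN regularizer enforces orthogonality of the convolution operation itself, which is stricter.

```python
# Kernel-orthogonality penalty: rows of the reshaped conv weight should be
# orthonormal, i.e. W W^T close to the identity.
import torch

def kernel_orth_penalty(conv_weight):
    # conv_weight: (out_channels, in_channels, k, k)
    w = conv_weight.reshape(conv_weight.shape[0], -1)  # one row per filter
    gram = w @ w.t()
    eye = torch.eye(w.shape[0], device=w.device)
    return ((gram - eye) ** 2).sum()

# Added to the task loss with a small coefficient, e.g.:
# loss = task_loss + 1e-4 * sum(kernel_orth_penalty(m.weight)
#                               for m in model.modules()
#                               if isinstance(m, torch.nn.Conv2d))
```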

Efficient Deep Gaussian Process Models for Variable-Sized Input

Title Efficient Deep Gaussian Process Models for Variable-Sized Input
Authors Issam H. Laradji, Mark Schmidt, Vladimir Pavlovic, Minyoung Kim
Abstract Deep Gaussian processes (DGP) have appealing Bayesian properties, can handle variable-sized data, and learn deep features. Their limitation is that they do not scale well with the size of the data. Existing approaches address this using a deep random feature (DRF) expansion model, which makes inference tractable by approximating DGPs. However, DRF is not suitable for variable-sized input data such as trees, graphs, and sequences. We introduce the GP-DRF, a novel Bayesian model with an input layer of GPs, followed by DRF layers. The key advantage is that the combination of GP and DRF leads to a tractable model that can both handle a variable-sized input as well as learn deep long-range dependency structures of the data. We provide a novel efficient method to simultaneously infer the posterior of GP’s latent vectors and infer the posterior of DRF’s internal weights and random frequencies. Our experiments show that GP-DRF outperforms the standard GP model and DRF model across many datasets. Furthermore, they demonstrate that GP-DRF enables improved uncertainty quantification compared to GP and DRF alone, with respect to a Bhattacharyya distance assessment. Source code is available at https://github.com/IssamLaradji/GP_DRF.
Tasks Gaussian Processes
Published 2019-05-16
URL https://arxiv.org/abs/1905.06982v1
PDF https://arxiv.org/pdf/1905.06982v1.pdf
PWC https://paperswithcode.com/paper/efficient-deep-gaussian-process-models-for
Repo https://github.com/IssamLaradji/GP_DRF
Framework pytorch
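
The random-feature expansion underlying DRF layers can be sketched with random Fourier features: an RBF-kernel GP is approximated by a finite cosine feature map, which is what makes such layers cheap to stack. This shows only the generic RFF approximation, not GP-DRF's input GP layer for variable-sized data.

```python
# Random Fourier features: Phi(X) Phi(X)^T approximates the RBF kernel
# matrix k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2)).
import numpy as np

def random_fourier_features(X, num_features=256, lengthscale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

X = np.random.randn(5, 3)
Phi = random_fourier_features(X)
print(Phi @ Phi.T)  # approximate 5x5 RBF kernel matrix
```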

Linking artificial and human neural representations of language

Title Linking artificial and human neural representations of language
Authors Jon Gauthier, Roger Levy
Abstract What information from an act of sentence understanding is robustly represented in the human brain? We investigate this question by comparing sentence encoding models on a brain decoding task, where the sentence that an experimental participant has seen must be predicted from the fMRI signal evoked by the sentence. We take a pre-trained BERT architecture as a baseline sentence encoding model and fine-tune it on a variety of natural language understanding (NLU) tasks, asking which lead to improvements in brain-decoding performance. We find that none of the sentence encoding tasks tested yield significant increases in brain decoding performance. Through further task ablations and representational analyses, we find that tasks which produce syntax-light representations yield significant improvements in brain decoding performance. Our results constrain the space of NLU models that could best account for human neural representations of language, but also suggest limits on the possibility of decoding fine-grained syntactic information from fMRI human neuroimaging.
Tasks Brain Decoding
Published 2019-10-02
URL https://arxiv.org/abs/1910.01244v1
PDF https://arxiv.org/pdf/1910.01244v1.pdf
PWC https://paperswithcode.com/paper/linking-artificial-and-human-neural
Repo https://github.com/hans/nn-decoding
Framework tf
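
The decoding task has a simple generic form: regress from fMRI images to sentence-encoder representations, then match each held-out scan to the nearest candidate encoding. A sketch of that setup with a plain ridge decoder, which stands in for (but is not necessarily identical to) the authors' estimator:

```python
# Brain decoding as regularized regression plus nearest-neighbor matching.
import numpy as np
from sklearn.linear_model import Ridge

def decode(train_fmri, train_encodings, test_fmri, candidate_encodings):
    model = Ridge(alpha=1.0).fit(train_fmri, train_encodings)
    predicted = model.predict(test_fmri)               # (n_test, d)
    # Rank candidate sentences by distance to each predicted encoding.
    dists = np.linalg.norm(
        predicted[:, None, :] - candidate_encodings[None, :, :], axis=-1)
    return dists.argmin(axis=1)  # best-matching sentence index per scan
```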

Unmasking DeepFakes with simple Features

Title Unmasking DeepFakes with simple Features
Authors Ricard Durall, Margret Keuper, Franz-Josef Pfreundt, Janis Keuper
Abstract Deep generative models have recently achieved impressive results for many real-world applications, successfully generating high-resolution and diverse samples from complex datasets. Due to this improvement, fake digital content has proliferated, raising concern and spreading distrust in image content and leading to an urgent need for automated ways to detect these AI-generated fake images. Despite the fact that many face editing algorithms seem to produce realistic human faces, upon closer examination they do exhibit artifacts in certain domains which are often hidden to the naked eye. In this work, we present a simple way to detect such fake face images - so-called DeepFakes. Our method is based on a classical frequency-domain analysis followed by a basic classifier. Compared to previous systems, which need to be fed with large amounts of labeled data, our approach shows very good results using only a few annotated training samples and even achieves good accuracies in fully unsupervised scenarios. For the evaluation on high-resolution face images, we combined several public datasets of real and fake faces into a new benchmark: Faces-HQ. Given such high-resolution images, our approach reaches a perfect classification accuracy of 100% when trained on as few as 20 annotated samples. In a second experiment, on the medium-resolution images of the CelebA dataset, our method achieves 100% accuracy in a supervised and 96% in an unsupervised setting. Finally, evaluating low-resolution video sequences of the FaceForensics++ dataset, our method achieves 91% accuracy in detecting manipulated videos. Source Code: https://github.com/cc-hpc-itwm/DeepFakeDetection
Tasks DeepFake Detection
Published 2019-11-02
URL https://arxiv.org/abs/1911.00686v3
PDF https://arxiv.org/pdf/1911.00686v3.pdf
PWC https://paperswithcode.com/paper/unmasking-deepfakes-with-simple-features
Repo https://github.com/cc-hpc-itwm/UpConv
Framework pytorch
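
The frequency-domain feature is a 1D azimuthal average of the image's 2D power spectrum; GAN-generated faces tend to show tell-tale artifacts in the high-frequency bins, which a basic classifier (SVM, logistic regression) can then separate. A sketch of the feature extraction; bin count and log scaling here are illustrative choices.

```python
# Azimuthally averaged power spectrum: radially bin |FFT|^2 into a 1D
# profile over spatial frequency.
import numpy as np

def azimuthal_power_spectrum(gray_image, num_bins=64):
    f = np.fft.fftshift(np.fft.fft2(gray_image))
    power = np.abs(f) ** 2
    h, w = power.shape
    y, x = np.indices((h, w))
    r = np.hypot(x - w / 2, y - h / 2).astype(int)  # radius of each pixel
    # Mean power per radial (frequency) bin.
    profile = np.bincount(r.ravel(), weights=power.ravel()) / np.bincount(r.ravel())
    return np.log(profile[:num_bins] + 1e-8)  # log scale, low bins kept

img = np.random.rand(128, 128)  # stand-in for a grayscale face crop
print(azimuthal_power_spectrum(img).shape)  # (64,) feature vector
```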

MIC: Mining Interclass Characteristics for Improved Metric Learning

Title MIC: Mining Interclass Characteristics for Improved Metric Learning
Authors Karsten Roth, Biagio Brattoli, Björn Ommer
Abstract Metric learning seeks to embed images of objects such that class-defined relations are captured by the embedding space. However, variability in images is not just due to different depicted object classes, but also depends on other latent characteristics such as viewpoint or illumination. In addition to these structured properties, random noise further obstructs the visual relations of interest. The common approach to metric learning is to enforce a representation that is invariant under all factors but the ones of interest. In contrast, we propose to explicitly learn the latent characteristics that are shared by and go across object classes. We can then directly explain away structured visual variability, rather than assuming it to be unknown random noise. We propose a novel surrogate task to learn visual characteristics shared across classes with a separate encoder. This encoder is trained jointly with the encoder for class information by reducing their mutual information. On five standard image retrieval benchmarks the approach significantly improves upon the state-of-the-art.
Tasks Image Retrieval, Metric Learning
Published 2019-09-25
URL https://arxiv.org/abs/1909.11574v1
PDF https://arxiv.org/pdf/1909.11574v1.pdf
PWC https://paperswithcode.com/paper/mic-mining-interclass-characteristics-for
Repo https://github.com/Confusezius/metric-learning-mining-interclass-characteristics
Framework pytorch
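
The key training signal is the mutual-information reduction between the class encoder and the characteristics encoder. A simple stand-in for that term is a cross-correlation penalty between the two embeddings, sketched below; the paper's actual MI-reduction mechanism is more involved, so this only conveys the shape of the joint objective.

```python
# Cross-correlation penalty between class and characteristic embeddings,
# a simple surrogate for reducing their mutual information.
import torch

def decorrelation_penalty(z_class, z_char):
    zc = (z_class - z_class.mean(0)) / (z_class.std(0) + 1e-8)
    za = (z_char - z_char.mean(0)) / (z_char.std(0) + 1e-8)
    cross = zc.t() @ za / zc.shape[0]   # (d_class, d_char) correlations
    return (cross ** 2).mean()

# Joint objective (schematic): metric loss on the class branch, surrogate
# task loss on the characteristics branch, plus the decorrelation term:
# loss = triplet_loss + surrogate_loss + lam * decorrelation_penalty(zc, za)
```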