February 2, 2020

3093 words 15 mins read

Paper Group AWR 63



Survival Function Matching for Calibrated Time-to-Event Predictions

Title Survival Function Matching for Calibrated Time-to-Event Predictions
Authors Paidamoyo Chapfuwa, Chenyang Tao, Lawrence Carin, Ricardo Henao
Abstract Models for predicting the time of a future event are crucial for risk assessment, across a diverse range of applications. Existing time-to-event (survival) models have focused primarily on preserving pairwise ordering of estimated event times, or relative risk. Model calibration is relatively underexplored, despite its critical importance in time-to-event applications. We present a survival function estimator for probabilistic predictions in time-to-event models, based on a neural network model for draws from the distribution of event times, without explicit assumptions on the form of the distribution. This is done in a manner similar to adversarial learning, but learning is achieved without a discriminator or an adversarial objective. The proposed estimator can be used in practice as a means of estimating and comparing conditional survival distributions, while accounting for the predictive uncertainty of probabilistic models. Extensive experiments show that the proposed model outperforms existing approaches, trained both with and without adversarial learning, in terms of both calibration and concentration of time-to-event distributions.
Tasks Calibration
Published 2019-05-21
URL https://arxiv.org/abs/1905.08838v1
PDF https://arxiv.org/pdf/1905.08838v1.pdf
PWC https://paperswithcode.com/paper/survival-function-matching-for-calibrated
Repo https://github.com/paidamoyo/survival_cluster_analysis
Framework tf
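
The abstract's central idea is to check calibration by matching the model's implied survival function to the data rather than only preserving rankings. As a loose, self-contained illustration of that idea (not the paper's estimator or the code in the linked repo), the sketch below compares the empirical survival function of Monte Carlo event-time draws from a stand-in model against a Kaplan-Meier curve fit to censored observations; every distribution and variable name here is invented for the example.

```python
# Minimal sketch (not the paper's estimator): compare the empirical survival
# function of model-sampled event times against a Kaplan-Meier curve built
# from observed (possibly censored) data, as a rough calibration check.
import numpy as np

def empirical_survival(samples, grid):
    """S(t) = P(T > t) estimated from Monte Carlo draws of event times."""
    samples = np.asarray(samples)
    return np.array([(samples > t).mean() for t in grid])

def kaplan_meier(times, events, grid):
    """Kaplan-Meier survival estimate on `grid` (events: 1 = observed, 0 = censored)."""
    order = np.argsort(times)
    times, events = np.asarray(times)[order], np.asarray(events)[order]
    km_t, km_s, s, at_risk = [0.0], [1.0], 1.0, len(times)
    for t, e in zip(times, events):
        if e == 1:
            s *= (at_risk - 1) / at_risk
        at_risk -= 1
        km_t.append(t)
        km_s.append(s)
    # step-function lookup on the evaluation grid
    idx = np.searchsorted(km_t, grid, side="right") - 1
    return np.asarray(km_s)[idx]

rng = np.random.default_rng(0)
true_times = rng.exponential(scale=2.0, size=200)        # ground-truth event times
censor = rng.uniform(0.0, 4.0, size=200)
times = np.minimum(true_times, censor)
events = (true_times <= censor).astype(int)
model_draws = rng.exponential(scale=2.2, size=5000)      # stand-in for network samples

grid = np.linspace(0.0, 6.0, 50)
gap = np.abs(empirical_survival(model_draws, grid) - kaplan_meier(times, events, grid))
print(f"max |S_model - S_KM| on grid: {gap.max():.3f}")
```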

Diagnosing Bottlenecks in Deep Q-learning Algorithms

Title Diagnosing Bottlenecks in Deep Q-learning Algorithms
Authors Justin Fu, Aviral Kumar, Matthew Soh, Sergey Levine
Abstract Q-learning methods represent a commonly used class of algorithms in reinforcement learning: they are generally efficient and simple, and can be combined readily with function approximators for deep reinforcement learning (RL). However, the behavior of Q-learning methods with function approximation is poorly understood, both theoretically and empirically. In this work, we aim to experimentally investigate potential issues in Q-learning, by means of a “unit testing” framework where we can utilize oracles to disentangle sources of error. Specifically, we investigate questions related to function approximation, sampling error and nonstationarity, and where available, verify if trends found in oracle settings hold true with modern deep RL methods. We find that large neural network architectures have many benefits with regard to learning stability; we offer several practical compensations for overfitting; and we develop a novel sampling method based on explicitly compensating for function approximation error that yields a fair improvement on high-dimensional continuous control domains.
Tasks Continuous Control, Q-Learning
Published 2019-02-26
URL http://arxiv.org/abs/1902.10250v1
PDF http://arxiv.org/pdf/1902.10250v1.pdf
PWC https://paperswithcode.com/paper/diagnosing-bottlenecks-in-deep-q-learning
Repo https://github.com/justinjfu/diagnosing_qlearning
Framework none
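
The “unit testing” framing — substitute oracles to isolate one error source at a time — can be mimicked on a tiny tabular MDP. The sketch below is my own toy, not the authors' framework or their deep-RL experiments: it contrasts exact (oracle) Bellman backups with backups estimated from sampled transitions, so any remaining gap to Q* is attributable to sampling error alone.

```python
# Toy "unit test" in the spirit of the paper: on a small random MDP, compare
# Q-iteration with exact (oracle) Bellman backups against backups estimated
# from sampled transitions, isolating sampling error from other sources.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 20, 4, 0.95
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over next states
R = rng.uniform(0, 1, size=(S, A))

def exact_backup(Q):
    return R + gamma * P @ Q.max(axis=1)

def sampled_backup(Q, n_samples):
    target = np.zeros((S, A))
    for s in range(S):
        for a in range(A):
            ns = rng.choice(S, size=n_samples, p=P[s, a])
            target[s, a] = R[s, a] + gamma * Q[ns].max(axis=1).mean()
    return target

# Oracle Q* from exact value iteration.
Q_star = np.zeros((S, A))
for _ in range(1000):
    Q_star = exact_backup(Q_star)

for n in (1, 10, 100):
    Q = np.zeros((S, A))
    for _ in range(500):
        Q = sampled_backup(Q, n)
    print(f"{n:4d} samples/backup -> ||Q - Q*||_inf = {np.abs(Q - Q_star).max():.3f}")
```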

DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis

Title DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis
Authors Minfeng Zhu, Pingbo Pan, Wei Chen, Yi Yang
Abstract In this paper, we focus on generating realistic images from text descriptions. Current methods first generate an initial image with rough shape and color, and then refine the initial image to a high-resolution one. Most existing text-to-image synthesis methods have two main problems. (1) These methods depend heavily on the quality of the initial images. If the initial image is not well initialized, the following processes can hardly refine the image to a satisfactory quality. (2) Each word contributes a different level of importance when depicting different image contents; however, existing image refinement processes use an unchanged text representation. In this paper, we propose the Dynamic Memory Generative Adversarial Network (DM-GAN) to generate high-quality images. The proposed method introduces a dynamic memory module to refine fuzzy image contents when the initial images are not well generated. A memory writing gate is designed to select the important text information based on the initial image content, which enables our method to accurately generate images from the text description. We also utilize a response gate to adaptively fuse the information read from the memories and the image features. We evaluate the DM-GAN model on the Caltech-UCSD Birds 200 dataset and the Microsoft Common Objects in Context dataset. Experimental results demonstrate that our DM-GAN model performs favorably against the state-of-the-art approaches.
Tasks Image Generation
Published 2019-04-02
URL http://arxiv.org/abs/1904.01310v1
PDF http://arxiv.org/pdf/1904.01310v1.pdf
PWC https://paperswithcode.com/paper/dm-gan-dynamic-memory-generative-adversarial
Repo https://github.com/MinfengZhu/DM-GAN
Framework tf
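
As a rough sketch of the gating idea described in the abstract (not the official DM-GAN module — key/value addressing, the convolutional refinement stages, and the real dimensions are omitted), the snippet below shows a memory writing gate that weights word features against image features, and a response gate that fuses the memory read-out back into the image representation. All layer names and sizes are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedMemorySketch(nn.Module):
    def __init__(self, d_word, d_img, d_mem):
        super().__init__()
        self.write_gate = nn.Linear(d_word + d_img, 1)   # memory writing gate
        self.word_to_mem = nn.Linear(d_word, d_mem)
        self.img_to_mem = nn.Linear(d_img, d_mem)
        self.response_gate = nn.Linear(2 * d_mem, 1)     # response gate

    def forward(self, words, img):
        # words: (B, T, d_word) per-word text features; img: (B, d_img) pooled image features
        img_rep = img.unsqueeze(1).expand(-1, words.size(1), -1)               # (B, T, d_img)
        g_w = torch.sigmoid(self.write_gate(torch.cat([words, img_rep], dim=-1)))
        memory = g_w * self.word_to_mem(words) + (1 - g_w) * self.img_to_mem(img_rep)
        img_m = self.img_to_mem(img)                                           # (B, d_mem)
        attn = F.softmax(img_m.unsqueeze(1) @ memory.transpose(1, 2), dim=-1)  # (B, 1, T)
        read = (attn @ memory).squeeze(1)                                      # memory read-out
        g_r = torch.sigmoid(self.response_gate(torch.cat([read, img_m], dim=-1)))
        return g_r * read + (1 - g_r) * img_m                                  # fused features

# e.g. GatedMemorySketch(256, 128, 256)(torch.randn(2, 12, 256), torch.randn(2, 128)) -> (2, 256)
```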

Differentiable Surface Splatting for Point-based Geometry Processing

Title Differentiable Surface Splatting for Point-based Geometry Processing
Authors Wang Yifan, Felice Serena, Shihao Wu, Cengiz Öztireli, Olga Sorkine-Hornung
Abstract We propose Differentiable Surface Splatting (DSS), a high-fidelity differentiable renderer for point clouds. Gradients for point locations and normals are carefully designed to handle discontinuities of the rendering function. Regularization terms are introduced to ensure uniform distribution of the points on the underlying surface. We demonstrate applications of DSS to inverse rendering for geometry synthesis and denoising, where large-scale topological changes, as well as small-scale detail modifications, are accurately and robustly handled without requiring explicit connectivity, outperforming state-of-the-art techniques. The data and code are at https://github.com/yifita/DSS.
Tasks Denoising
Published 2019-06-10
URL https://arxiv.org/abs/1906.04173v3
PDF https://arxiv.org/pdf/1906.04173v3.pdf
PWC https://paperswithcode.com/paper/differentiable-surface-splatting-for-point
Repo https://github.com/yifita/DSS
Framework pytorch

BLiMP: A Benchmark of Linguistic Minimal Pairs for English

Title BLiMP: A Benchmark of Linguistic Minimal Pairs for English
Authors Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, Samuel R. Bowman
Abstract We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English. BLiMP consists of 67 sub-datasets, each containing 1000 minimal pairs isolating specific contrasts in syntax, morphology, or semantics. The data is automatically generated according to expert-crafted grammars, and aggregate human agreement with the labels is 96.4%. We use it to evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs. We find that state-of-the-art models identify morphological contrasts reliably, but they struggle with semantic restrictions on the distribution of quantifiers and negative polarity items, and with subtle syntactic phenomena such as extraction islands.
Tasks
Published 2019-12-02
URL https://arxiv.org/abs/1912.00582v1
PDF https://arxiv.org/pdf/1912.00582v1.pdf
PWC https://paperswithcode.com/paper/blimp-a-benchmark-of-linguistic-minimal-pairs
Repo https://github.com/alexwarstadt/blimp
Framework none
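
BLiMP's evaluation protocol reduces to a forced choice: the LM is credited when it assigns higher probability to the acceptable sentence of a pair. A minimal sketch of that scoring (not the released BLiMP evaluation code) is below; it assumes the HuggingFace transformers package and downloads GPT-2 weights, and the example pair is a made-up agreement item in the style of the benchmark.

```python
# Minimal-pair LM scoring sketch: the model "passes" an item if it assigns a
# higher log-probability to the grammatical sentence than to the ungrammatical one.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def sentence_logprob(sentence):
    ids = tok(sentence, return_tensors="pt").input_ids
    out = model(ids, labels=ids)
    # `loss` is the mean token cross-entropy over len-1 predictions; scale back
    # to a total log-probability (the first token is not scored, which is fine
    # for a rough comparison within a pair).
    return -out.loss.item() * (ids.size(1) - 1)

good = "The cats annoy Tim."     # hypothetical agreement pair, not taken from BLiMP
bad = "The cats annoys Tim."
print("correct" if sentence_logprob(good) > sentence_logprob(bad) else "incorrect")
```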

Generating Diverse High-Fidelity Images with VQ-VAE-2

Title Generating Diverse High-Fidelity Images with VQ-VAE-2
Authors Ali Razavi, Aaron van den Oord, Oriol Vinyals
Abstract We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE) models for large scale image generation. To this end, we scale and enhance the autoregressive priors used in VQ-VAE to generate synthetic samples of much higher coherence and fidelity than possible before. We use simple feed-forward encoder and decoder networks, making our model an attractive candidate for applications where the encoding and/or decoding speed is critical. Additionally, VQ-VAE requires sampling an autoregressive model only in the compressed latent space, which is an order of magnitude faster than sampling in the pixel space, especially for large images. We demonstrate that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, is able to generate samples with quality that rivals that of state-of-the-art Generative Adversarial Networks on multifaceted datasets such as ImageNet, while not suffering from the known shortcomings of GANs such as mode collapse and lack of diversity.
Tasks Image Generation
Published 2019-06-02
URL https://arxiv.org/abs/1906.00446v1
PDF https://arxiv.org/pdf/1906.00446v1.pdf
PWC https://paperswithcode.com/paper/190600446
Repo https://github.com/rosinality/vq-vae-2-pytorch
Framework pytorch
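
The model above builds on the vector-quantization bottleneck of the original VQ-VAE. The sketch below shows only that core operation — nearest-codebook lookup with a straight-through gradient and the usual codebook/commitment losses — not the hierarchical VQ-VAE-2 architecture or its autoregressive priors; hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class VectorQuantizerSketch(nn.Module):
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z):                                   # z: (B, N, dim) encoder outputs
        d = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)   # (B, N, num_codes)
        idx = d.argmin(dim=-1)                              # nearest code index per latent
        z_q = self.codebook(idx)                            # quantized latents, (B, N, dim)
        # codebook loss + commitment loss, as in the original VQ-VAE objective
        loss = ((z_q - z.detach()) ** 2).mean() + self.beta * ((z_q.detach() - z) ** 2).mean()
        z_q = z + (z_q - z).detach()                        # straight-through gradient to encoder
        return z_q, idx, loss

# usage: z_q, codes, vq_loss = VectorQuantizerSketch()(torch.randn(2, 16, 64))
```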

Critically Examining the “Neural Hype”: Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models

Title Critically Examining the “Neural Hype”: Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models
Authors Wei Yang, Kuang Lu, Peilin Yang, Jimmy Lin
Abstract Is neural IR mostly hype? In a recent SIGIR Forum article, Lin expressed skepticism that neural ranking models were actually improving ad hoc retrieval effectiveness in limited data scenarios. He provided anecdotal evidence that authors of neural IR papers demonstrate “wins” by comparing against weak baselines. This paper provides a rigorous evaluation of those claims in two ways: First, we conducted a meta-analysis of papers that have reported experimental results on the TREC Robust04 test collection. We do not find evidence of an upward trend in effectiveness over time. In fact, the best reported results are from a decade ago and no recent neural approach comes close. Second, we applied five recent neural models to rerank the strong baselines that Lin used to make his arguments. A significant improvement was observed for one of the models, demonstrating additivity in gains. While there appears to be merit to neural IR approaches, at least some of the gains reported in the literature appear illusory.
Tasks
Published 2019-04-19
URL https://arxiv.org/abs/1904.09171v2
PDF https://arxiv.org/pdf/1904.09171v2.pdf
PWC https://paperswithcode.com/paper/critically-examining-the-neural-hype-weak
Repo https://github.com/castorini/Anserini
Framework none

Occlusion-shared and Feature-separated Network for Occlusion Relationship Reasoning

Title Occlusion-shared and Feature-separated Network for Occlusion Relationship Reasoning
Authors Rui Lu, Feng Xue, Menghan Zhou, Anlong Ming, Yu Zhou
Abstract Occlusion relationship reasoning demands a closed contour to express each object and an orientation for each contour pixel to describe the order relationship between objects. Current CNN-based methods neglect two critical issues of the task: (1) the simultaneous existence of relevance and distinction between the two elements, i.e., occlusion edge and occlusion orientation; and (2) inadequate exploration of the orientation features. For the reasons above, we propose the Occlusion-shared and Feature-separated Network (OFNet). On one hand, considering the relevance between edge and orientation, two sub-networks are designed to share the occlusion cue. On the other hand, the whole network is split into two paths to learn the high-level semantic features separately. Moreover, a contextual feature for orientation prediction is extracted, which represents the bilateral cue of the foreground and background areas. The bilateral cue is then fused with the occlusion cue to precisely locate the object regions. Finally, a stripe convolution is designed to further aggregate features from the surrounding scenes of the occlusion edge. The proposed OFNet remarkably advances the state-of-the-art approaches on the PIOD and BSDS ownership datasets. The source code is available at https://github.com/buptlr/OFNet.
Tasks
Published 2019-08-16
URL https://arxiv.org/abs/1908.05898v1
PDF https://arxiv.org/pdf/1908.05898v1.pdf
PWC https://paperswithcode.com/paper/occlusion-shared-and-feature-separated
Repo https://github.com/buptlr/OFNet
Framework none

A Closer Look at the Optimization Landscapes of Generative Adversarial Networks

Title A Closer Look at the Optimization Landscapes of Generative Adversarial Networks
Authors Hugo Berard, Gauthier Gidel, Amjad Almahairi, Pascal Vincent, Simon Lacoste-Julien
Abstract Generative adversarial networks have been very successful in generative modeling; however, they remain relatively challenging to train compared to standard deep neural networks. In this paper, we propose new visualization techniques for the optimization landscapes of GANs that enable us to study the game vector field resulting from the concatenation of the gradients of both players. Using these visualization techniques, we try to bridge the gap between theory and practice by showing empirically that the training of GANs exhibits significant rotations around locally stable stationary points (LSSPs), similar to those predicted by theory on toy examples. Moreover, we provide empirical evidence that GAN training seems to converge to a stable stationary point that is a saddle point for the generator loss, not a minimum, while still achieving excellent performance.
Tasks
Published 2019-06-11
URL https://arxiv.org/abs/1906.04848v2
PDF https://arxiv.org/pdf/1906.04848v2.pdf
PWC https://paperswithcode.com/paper/a-closer-look-at-the-optimization-landscapes
Repo https://github.com/facebookresearch/GAN-optimization-landscape
Framework pytorch
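
The "game vector field" mentioned in the abstract is simply the stacked gradients of the two players, and its rotational character is easiest to see on a toy bilinear game rather than a real GAN. The sketch below is an illustration of the concept, not the paper's experiments: for min_x max_y f(x, y) = xy the field v(x, y) = (df/dx, -df/dy) = (y, -x) is a pure rotation, so simultaneous gradient steps spiral around — and, with a fixed step size, away from — the equilibrium at the origin.

```python
# Toy game vector field for the bilinear game min_x max_y f(x, y) = x * y.
import numpy as np

def game_vector_field(x, y):
    return np.array([y, -x])          # (grad wrt x of f, -grad wrt y of f)

x, y, lr = 1.0, 0.0, 0.1
for step in range(200):
    vx, vy = game_vector_field(x, y)
    x, y = x - lr * vx, y - lr * vy   # simultaneous gradient descent/ascent
print(f"distance to equilibrium: start 1.00, after 200 steps {np.hypot(x, y):.2f}")
```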

Differentiable Product Quantization for End-to-End Embedding Compression

Title Differentiable Product Quantization for End-to-End Embedding Compression
Authors Ting Chen, Lala Li, Yizhou Sun
Abstract Embedding layers are commonly used to map discrete symbols into continuous embedding vectors that reflect their semantic meanings. Despite their effectiveness, the number of parameters in an embedding layer increases linearly with the number of symbols and poses a critical challenge under memory and storage constraints. In this work, we propose a generic and end-to-end learnable compression framework termed differentiable product quantization (DPQ). We present two instantiations of DPQ that leverage different approximation techniques to enable differentiability in end-to-end learning. Our method can readily serve as a drop-in alternative for any existing embedding layer. Empirically, DPQ offers significant compression ratios (14-238X) at negligible or no performance cost on 10 datasets across three different language tasks.
Tasks Quantization
Published 2019-08-26
URL https://arxiv.org/abs/1908.09756v2
PDF https://arxiv.org/pdf/1908.09756v2.pdf
PWC https://paperswithcode.com/paper/differentiable-product-quantization-for-end
Repo https://github.com/chentingpc/dpq_embedding_compression
Framework tf
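
A hedged sketch of the product-quantization idea behind DPQ follows: each symbol is represented by a few small codebook indices rather than a full float vector, and a differentiable relaxation lets those codes be learned end to end. This is not the paper's DPQ-SX or DPQ-VQ instantiation — a Gumbel-softmax stand-in replaces their approximation techniques, and all sizes and names are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProductQuantizedEmbeddingSketch(nn.Module):
    def __init__(self, vocab, dim, groups=4, codes_per_group=16, tau=1.0):
        super().__init__()
        assert dim % groups == 0
        self.groups, self.sub = groups, dim // groups
        self.query = nn.Embedding(vocab, dim)      # continuous "query" table used to pick codes
        self.codebooks = nn.Parameter(torch.randn(groups, codes_per_group, self.sub))
        self.tau = tau

    def forward(self, ids):
        q = self.query(ids).view(*ids.shape, self.groups, self.sub)   # (..., G, sub)
        logits = torch.einsum("...gd,gkd->...gk", q, self.codebooks)  # similarity to codes
        one_hot = F.gumbel_softmax(logits, tau=self.tau, hard=True)   # differentiable code choice
        # reconstruct the embedding from the selected per-group codewords
        return torch.einsum("...gk,gkd->...gd", one_hot, self.codebooks).flatten(-2)

emb = ProductQuantizedEmbeddingSketch(vocab=10000, dim=64)
print(emb(torch.randint(0, 10000, (2, 5))).shape)   # torch.Size([2, 5, 64])
```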

On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention

Title On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention
Authors Junyeop Lee, Sungrae Park, Jeonghun Baek, Seong Joon Oh, Seonghyeon Kim, Hwalsuk Lee
Abstract Scene text recognition (STR) is the task of recognizing character sequences in natural scenes. While there have been great advances in STR methods, current methods still fail to recognize texts in arbitrary shapes, such as heavily curved or rotated texts, which are abundant in daily life (e.g., restaurant signs, product labels, company logos). This paper introduces a novel architecture for recognizing texts of arbitrary shapes, named the Self-Attention Text Recognition Network (SATRN), which is inspired by the Transformer. SATRN utilizes the self-attention mechanism to describe two-dimensional (2D) spatial dependencies of characters in a scene text image. Exploiting the full-graph propagation of self-attention, SATRN can recognize texts with arbitrary arrangements and large inter-character spacing. As a result, SATRN outperforms existing STR models by a large margin of 5.7 pp on average on “irregular text” benchmarks. We provide empirical analyses that illustrate the inner mechanisms and the extent to which the model is applicable (e.g., rotated and multi-line text). We will open-source the code.
Tasks Scene Text Recognition
Published 2019-10-10
URL https://arxiv.org/abs/1910.04396v1
PDF https://arxiv.org/pdf/1910.04396v1.pdf
PWC https://paperswithcode.com/paper/on-recognizing-texts-of-arbitrary-shapes-with
Repo https://github.com/hestheimar/Paper-Review
Framework none
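
The key architectural move described in the abstract is applying self-attention over the 2D feature map so that arbitrarily arranged characters can interact. The sketch below flattens an H x W feature map into a sequence and runs standard multi-head self-attention over it; it is a rough stand-in, not the SATRN model — in particular, SATRN's adaptive 2D positional encoding is replaced here by a plain learned embedding.

```python
import torch
import torch.nn as nn

class SelfAttn2DSketch(nn.Module):
    def __init__(self, channels, h, w, heads=8):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, h * w, channels))   # learned positional embedding
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, feat):                                 # feat: (B, C, H, W) CNN features
        b, c, h, w = feat.shape
        seq = feat.flatten(2).transpose(1, 2) + self.pos     # (B, H*W, C): every position attends
        out, _ = self.attn(seq, seq, seq)                    # to every other position
        return out.transpose(1, 2).view(b, c, h, w)

# usage: SelfAttn2DSketch(256, 8, 32)(torch.randn(2, 256, 8, 32)).shape -> (2, 256, 8, 32)
```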

Attention on Attention for Image Captioning

Title Attention on Attention for Image Captioning
Authors Lun Huang, Wenmin Wang, Jie Chen, Xiao-Yong Wei
Abstract Attention mechanisms are widely used in current encoder/decoder frameworks of image captioning, where a weighted average over encoded vectors is generated at each time step to guide the caption decoding process. However, the decoder has little idea of whether or how well the attended vector and the given attention query are related, which could lead the decoder to give misleading results. In this paper, we propose an Attention on Attention (AoA) module, which extends the conventional attention mechanisms to determine the relevance between attention results and queries. AoA first generates an information vector and an attention gate using the attention result and the current context, then applies another attention via element-wise multiplication of the two, and finally obtains the attended information, i.e., the expected useful knowledge. We apply AoA to both the encoder and the decoder of our image captioning model, which we name the AoA Network (AoANet). Experiments show that AoANet outperforms all previously published methods and achieves a new state-of-the-art performance of 129.8 CIDEr-D score on the MS COCO Karpathy offline test split and 129.6 CIDEr-D (C40) score on the official online testing server. Code is available at https://github.com/husthuaan/AoANet.
Tasks Image Captioning
Published 2019-08-19
URL https://arxiv.org/abs/1908.06954v2
PDF https://arxiv.org/pdf/1908.06954v2.pdf
PWC https://paperswithcode.com/paper/attention-on-attention-for-image-captioning
Repo https://github.com/husthuaan/AoANet
Framework pytorch
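
The AoA module itself is compact enough to sketch directly from the abstract: concatenate the attention result with the query, produce an information vector and a sigmoid gate, and multiply them element-wise. The snippet below follows that description but is not the released AoANet code; the use of nn.MultiheadAttention and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class AoASketch(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.info = nn.Linear(2 * dim, dim)   # information vector
        self.gate = nn.Linear(2 * dim, dim)   # attention gate

    def forward(self, query, keys):
        attended, _ = self.attn(query, keys, keys)          # conventional attention result
        x = torch.cat([attended, query], dim=-1)
        return torch.sigmoid(self.gate(x)) * self.info(x)   # gated information vector

# usage: AoASketch(512)(torch.randn(2, 1, 512), torch.randn(2, 36, 512)) -> (2, 1, 512)
```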

SEN12MS – A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion

Title SEN12MS – A Curated Dataset of Georeferenced Multi-Spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion
Authors Michael Schmitt, Lloyd Haydn Hughes, Chunping Qiu, Xiao Xiang Zhu
Abstract The availability of curated large-scale training data is a crucial factor for the development of well-generalizing deep learning methods for the extraction of geoinformation from multi-sensor remote sensing imagery. While quite a few datasets have already been published by the community, most of them suffer from rather strong limitations, e.g., regarding spatial coverage, diversity, or simply the number of available samples. Exploiting the freely available data acquired by the Sentinel satellites of the Copernicus program implemented by the European Space Agency, as well as the cloud computing facilities of Google Earth Engine, we provide a dataset consisting of 180,662 triplets of dual-pol synthetic aperture radar (SAR) image patches, multi-spectral Sentinel-2 image patches, and MODIS land cover maps. With all patches being fully georeferenced at a 10 m ground sampling distance and covering all inhabited continents during all meteorological seasons, we expect the dataset to support the community in developing sophisticated deep learning-based approaches for common tasks such as scene classification or semantic segmentation for land cover mapping.
Tasks Scene Classification, Semantic Segmentation
Published 2019-06-18
URL https://arxiv.org/abs/1906.07789v1
PDF https://arxiv.org/pdf/1906.07789v1.pdf
PWC https://paperswithcode.com/paper/sen12ms-a-curated-dataset-of-georeferenced
Repo https://github.com/lucashu1/land-cover
Framework none

YOLACT++: Better Real-time Instance Segmentation

Title YOLACT++: Better Real-time Instance Segmentation
Authors Daniel Bolya, Chong Zhou, Fanyi Xiao, Yong Jae Lee
Abstract We present a simple, fully-convolutional model for real-time (>30 fps) instance segmentation that achieves competitive results on MS COCO evaluated on a single Titan Xp, which is significantly faster than any previous state-of-the-art approach. Moreover, we obtain this result after training on only one GPU. We accomplish this by breaking instance segmentation into two parallel subtasks: (1) generating a set of prototype masks and (2) predicting per-instance mask coefficients. Then we produce instance masks by linearly combining the prototypes with the mask coefficients. We find that because this process doesn’t depend on repooling, this approach produces very high-quality masks and exhibits temporal stability for free. Furthermore, we analyze the emergent behavior of our prototypes and show they learn to localize instances on their own in a translation-variant manner, despite being fully convolutional. We also propose Fast NMS, a drop-in replacement for standard NMS that is 12 ms faster and incurs only a marginal performance penalty. Finally, by incorporating deformable convolutions into the backbone network, optimizing the prediction head with better anchor scales and aspect ratios, and adding a novel fast mask re-scoring branch, our YOLACT++ model can achieve 34.1 mAP on MS COCO at 33.5 fps, which is fairly close to state-of-the-art approaches while still running in real time.
Tasks Instance Segmentation, Real-time Instance Segmentation, Semantic Segmentation
Published 2019-12-03
URL https://arxiv.org/abs/1912.06218v1
PDF https://arxiv.org/pdf/1912.06218v1.pdf
PWC https://paperswithcode.com/paper/yolact-better-real-time-instance-segmentation
Repo https://github.com/dbolya/yolact
Framework pytorch
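
The mask-assembly step described in the abstract — instance masks as linear combinations of shared prototype masks with per-instance coefficients — reduces to a single matrix product, sketched below. Cropping, thresholding, the prototype/prediction heads, and Fast NMS are all omitted, and the tensor sizes are illustrative rather than taken from the paper.

```python
import torch

def assemble_masks(prototypes, coefficients):
    # prototypes:   (H, W, k)  k prototype masks shared across the image
    # coefficients: (n, k)     one k-vector per detected instance
    return torch.sigmoid(torch.einsum("hwk,nk->nhw", prototypes, coefficients))

masks = assemble_masks(torch.randn(138, 138, 32), torch.randn(5, 32))
print(masks.shape)   # torch.Size([5, 138, 138])
```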

CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog

Title CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
Authors Satwik Kottur, José M. F. Moura, Devi Parikh, Dhruv Batra, Marcus Rohrbach
Abstract Visual Dialog is a multimodal task of answering a sequence of questions grounded in an image, using the conversation history as context. It entails challenges in vision, language, reasoning, and grounding. However, studying these subtasks in isolation on large, real datasets is infeasible as it requires prohibitively expensive complete annotation of the ‘state’ of all images and dialogs. We develop CLEVR-Dialog, a large diagnostic dataset for studying multi-round reasoning in visual dialog. Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset. This combination results in a dataset where all aspects of the visual dialog are fully annotated. In total, CLEVR-Dialog contains 5 instances of 10-round dialogs for about 85k CLEVR images, totaling 4.25M question-answer pairs. We use CLEVR-Dialog to benchmark the performance of standard visual dialog models; in particular, on visual coreference resolution (as a function of the coreference distance). This is the first analysis of its kind for visual dialog models that was not possible without this dataset. We hope the findings from CLEVR-Dialog will help inform the development of future models for visual dialog. Our dataset and code are publicly available.
Tasks Coreference Resolution, Visual Dialog
Published 2019-03-07
URL https://arxiv.org/abs/1903.03166v2
PDF https://arxiv.org/pdf/1903.03166v2.pdf
PWC https://paperswithcode.com/paper/clevr-dialog-a-diagnostic-dataset-for-multi
Repo https://github.com/satwikkottur/clevr-dialog
Framework none