February 1, 2020

3150 words 15 mins read

Paper Group AWR 331

DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better. MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension. Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies. Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram …

DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better

Title DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better
Authors Orest Kupyn, Tetiana Martyniuk, Junru Wu, Zhangyang Wang
Abstract We present a new end-to-end generative adversarial network (GAN) for single image motion deblurring, named DeblurGAN-v2, which considerably boosts state-of-the-art deblurring efficiency, quality, and flexibility. DeblurGAN-v2 is based on a relativistic conditional GAN with a double-scale discriminator. For the first time, we introduce the Feature Pyramid Network into deblurring, as a core building block in the generator of DeblurGAN-v2. It can flexibly work with a wide range of backbones, to navigate the balance between performance and efficiency. Plugging in sophisticated backbones (e.g., Inception-ResNet-v2) leads to solid state-of-the-art deblurring. Meanwhile, with lightweight backbones (e.g., MobileNet and its variants), DeblurGAN-v2 runs 10-100 times faster than the nearest competitors while maintaining close to state-of-the-art results, implying the option of real-time video deblurring. We demonstrate that DeblurGAN-v2 obtains very competitive performance on several popular benchmarks, in terms of deblurring quality (both objective and subjective) as well as efficiency. We also show the architecture to be effective for general image restoration tasks. Our code, models, and data are available at: https://github.com/KupynOrest/DeblurGANv2
Tasks Deblurring, Image Restoration
Published 2019-08-10
URL https://arxiv.org/abs/1908.03826v1
PDF https://arxiv.org/pdf/1908.03826v1.pdf
PWC https://paperswithcode.com/paper/deblurgan-v2-deblurring-orders-of-magnitude
Repo https://github.com/KupynOrest/DeblurGANv2
Framework pytorch
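
A minimal sketch of the relativistic average GAN loss that the abstract's "relativistic conditional GAN" refers to. The discriminator `D` and its call signature are placeholders, and DeblurGAN-v2 additionally applies its discriminator at two scales (global and patch); this illustrates the loss form only, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ragan_d_loss(D, real, fake):
    """Relativistic average discriminator loss: real images should score
    higher than fakes do *on average*, and vice versa."""
    d_real, d_fake = D(real), D(fake.detach())
    loss_real = F.binary_cross_entropy_with_logits(
        d_real - d_fake.mean(), torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(
        d_fake - d_real.mean(), torch.zeros_like(d_fake))
    return 0.5 * (loss_real + loss_fake)

def ragan_g_loss(D, real, fake):
    """Generator side: the same expression with the labels swapped."""
    d_real, d_fake = D(real), D(fake)
    loss_real = F.binary_cross_entropy_with_logits(
        d_real - d_fake.mean(), torch.zeros_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(
        d_fake - d_real.mean(), torch.ones_like(d_fake))
    return 0.5 * (loss_real + loss_fake)
```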

MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension

Title MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension
Authors Alon Talmor, Jonathan Berant
Abstract A large number of reading comprehension (RC) datasets have been created recently, but little analysis has been done on whether they generalize to one another, and the extent to which existing datasets can be leveraged for improving performance on new ones. In this paper, we conduct such an investigation over ten RC datasets, training on one or more source RC datasets, and evaluating generalization, as well as transfer to a target RC dataset. We analyze the factors that contribute to generalization, and show that training on a source RC dataset and transferring to a target dataset substantially improves performance, even in the presence of powerful contextual representations from BERT (Devlin et al., 2019). We also find that training on multiple source RC datasets leads to robust generalization and transfer, and can reduce the cost of example collection for a new RC dataset. Following our analysis, we propose MultiQA, a BERT-based model, trained on multiple RC datasets, which leads to state-of-the-art performance on five RC datasets. We share our infrastructure for the benefit of the research community.
Tasks Reading Comprehension
Published 2019-05-31
URL https://arxiv.org/abs/1905.13453v1
PDF https://arxiv.org/pdf/1905.13453v1.pdf
PWC https://paperswithcode.com/paper/multiqa-an-empirical-investigation-of
Repo https://github.com/alontalmor/multiqa
Framework pytorch
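
The multi-source setup in the abstract amounts to training one reader on a mixture of RC datasets before transferring to the target. A toy sketch of that mixing step, with a stand-in dataset class (the real pipeline tokenizes with BERT and tracks answer spans):

```python
import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader

class RCDataset(Dataset):
    """Toy stand-in for a tokenized reading-comprehension dataset."""
    def __init__(self, examples):
        self.examples = examples
    def __len__(self):
        return len(self.examples)
    def __getitem__(self, i):
        return self.examples[i]

# Mix several source datasets into one training stream; after this stage,
# continue fine-tuning on the (small) target dataset for transfer.
sources = [RCDataset([{"input_ids": torch.zeros(8, dtype=torch.long),
                       "start": 0, "end": 1}] * 100) for _ in range(3)]
loader = DataLoader(ConcatDataset(sources), batch_size=32, shuffle=True)
```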

Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

Title Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies
Authors Patrick Nadeem Ward, Ariella Smofsky, Avishek Joey Bose
Abstract Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle with respect to hyperparameters as well as sample inefficient. Soft Actor Critic (SAC) proposes an off-policy deep actor critic algorithm within the maximum entropy RL framework which offers greater stability and empirical gains. The choice of policy distribution, a factored Gaussian, is motivated by its easy re-parametrization rather than its modeling power. We introduce Normalizing Flow policies within the SAC framework that learn more expressive classes of policies than simple factored Gaussians. We show empirically on continuous grid world tasks that our approach increases stability and is better suited to difficult exploration in sparse reward settings.
Tasks
Published 2019-06-06
URL https://arxiv.org/abs/1906.02771v1
PDF https://arxiv.org/pdf/1906.02771v1.pdf
PWC https://paperswithcode.com/paper/improving-exploration-in-soft-actor-critic
Repo https://github.com/joeybose/FloRL
Framework pytorch
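
A minimal sketch of a flow-based policy: a factored Gaussian base distribution pushed through invertible transforms, with exact log-probabilities via the change-of-variables formula. The specific transforms below are stand-ins for learned flow layers, not the paper's architecture.

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import AffineTransform, TanhTransform

def flow_policy(mu, log_std):
    base = Normal(mu, log_std.exp())          # factored Gaussian, as in SAC
    transforms = [
        AffineTransform(loc=0.0, scale=2.0),  # stand-in for learned flow layers
        TanhTransform(cache_size=1),          # squash actions into (-1, 1)
    ]
    return TransformedDistribution(base, transforms)

mu, log_std = torch.zeros(4), torch.zeros(4)
pi = flow_policy(mu, log_std)
a = pi.rsample()                  # reparameterized sample, as SAC requires
logp = pi.log_prob(a).sum(-1)     # exact log-density through the flow
```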

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

Title Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
Authors Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
Abstract We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network. In the proposed method, a non-autoregressive WaveNet is trained by jointly optimizing multi-resolution spectrogram and adversarial loss functions, which can effectively capture the time-frequency distribution of the realistic speech waveform. As our method does not require density distillation used in the conventional teacher-student framework, the entire model can be easily trained. Furthermore, our model is able to generate high-fidelity speech even with its compact architecture. In particular, the proposed Parallel WaveGAN has only 1.44 M parameters and can generate 24 kHz speech waveform 28.68 times faster than real-time on a single GPU environment. Perceptual listening test results verify that our proposed method achieves 4.16 mean opinion score within a Transformer-based text-to-speech framework, which is comparable to the best distillation-based Parallel WaveNet system.
Tasks Speech Synthesis, Text-To-Speech Synthesis
Published 2019-10-25
URL https://arxiv.org/abs/1910.11480v2
PDF https://arxiv.org/pdf/1910.11480v2.pdf
PWC https://paperswithcode.com/paper/parallel-wavegan-a-fast-waveform-generation
Repo https://github.com/yanggeng1995/GAN-TTS
Framework pytorch
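
The "multi-resolution spectrogram" loss in the abstract combines a spectral-convergence term and a log-STFT-magnitude term at several FFT resolutions. A sketch, with resolution settings that are common choices rather than necessarily the paper's:

```python
import torch
import torch.nn.functional as F

def stft_mag(x, n_fft, hop, win):
    window = torch.hann_window(win, device=x.device)
    spec = torch.stft(x, n_fft, hop_length=hop, win_length=win,
                      window=window, return_complex=True)
    return spec.abs().clamp(min=1e-7)

def multi_resolution_stft_loss(fake, real,
                               resolutions=((1024, 256, 1024),
                                            (2048, 512, 2048),
                                            (512, 128, 512))):
    loss = 0.0
    for n_fft, hop, win in resolutions:
        f = stft_mag(fake, n_fft, hop, win)
        r = stft_mag(real, n_fft, hop, win)
        sc = torch.norm(r - f, p="fro") / torch.norm(r, p="fro")  # spectral convergence
        mag = F.l1_loss(f.log(), r.log())                         # log-magnitude L1
        loss = loss + sc + mag
    return loss / len(resolutions)
```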

SCALOR: Generative World Models with Scalable Object Representations

Title SCALOR: Generative World Models with Scalable Object Representations
Authors Jindong Jiang, Sepehr Janghorbani, Gerard de Melo, Sungjin Ahn
Abstract Scalability in terms of object density in a scene is a primary challenge in unsupervised sequential object-oriented representation learning. Most of the previous models have been shown to work only on scenes with a few objects. In this paper, we propose SCALOR, a probabilistic generative world model for learning SCALable Object-oriented Representation of a video. With the proposed spatially-parallel attention and proposal-rejection mechanisms, SCALOR can deal with orders of magnitude larger numbers of objects compared to the previous state-of-the-art models. Additionally, we introduce a background module that allows SCALOR to model complex dynamic backgrounds as well as many foreground objects in the scene. We demonstrate that SCALOR can deal with crowded scenes containing up to a hundred objects while jointly modeling complex dynamic backgrounds. Importantly, SCALOR is the first unsupervised object representation model shown to work for natural scenes containing several tens of moving objects.
Tasks Representation Learning
Published 2019-10-06
URL https://arxiv.org/abs/1910.02384v4
PDF https://arxiv.org/pdf/1910.02384v4.pdf
PWC https://paperswithcode.com/paper/scalable-object-oriented-sequential-1
Repo https://github.com/JindongJiang/JindongJiang.github.io
Framework none
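
A much-simplified illustration of a proposal-rejection step in the spirit of the abstract: newly proposed object boxes that overlap heavily with objects propagated from the previous frame are discarded. SCALOR's actual mechanism operates on latent variables; this sketch reduces it to box IoU.

```python
import torch

def iou(a, b):
    """IoU between [x1, y1, x2, y2] box tensors of shapes (N, 4) and (M, 4)."""
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (rb - lt).clamp(min=0).prod(-1)
    area_a = (a[:, 2:] - a[:, :2]).prod(-1)
    area_b = (b[:, 2:] - b[:, :2]).prod(-1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def reject_proposals(proposals, propagated, thresh=0.3):
    """Keep only proposals whose max IoU with propagated objects is low."""
    if len(propagated) == 0:
        return proposals
    return proposals[iou(proposals, propagated).max(dim=1).values < thresh]
```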

Unsupervised Video Interpolation Using Cycle Consistency

Title Unsupervised Video Interpolation Using Cycle Consistency
Authors Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
Abstract Learning to synthesize high frame rate videos via interpolation requires large quantities of high frame rate training videos, which, however, are scarce, especially at high resolutions. Here, we propose unsupervised techniques to synthesize high frame rate videos directly from low frame rate videos using cycle consistency. For a triplet of consecutive frames, we optimize models to minimize the discrepancy between the center frame and its cycle reconstruction, obtained by interpolating back from interpolated intermediate frames. This simple unsupervised constraint alone achieves results comparable with supervision using the ground truth intermediate frames. We further introduce a pseudo supervised loss term that enforces the interpolated frames to be consistent with predictions of a pre-trained interpolation model. The pseudo supervised loss term, used together with cycle consistency, can effectively adapt a pre-trained model to a new target domain. With no additional data and in a completely unsupervised fashion, our techniques significantly improve pre-trained models on new target domains, increasing PSNR values from 32.84dB to 33.05dB on the Slowflow and from 31.82dB to 32.53dB on the Sintel evaluation datasets.
Tasks
Published 2019-06-13
URL https://arxiv.org/abs/1906.05928v2
PDF https://arxiv.org/pdf/1906.05928v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-video-interpolation-using-cycle
Repo https://github.com/NVIDIA/unsupervised-video-interpolation
Framework pytorch
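
The cycle-consistency constraint from the abstract, as a loss sketch: interpolate two intermediate frames from a triplet, interpolate back between them, and compare the reconstruction with the known center frame. `interp` stands in for any two-frame interpolation network.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(interp, i0, i1, i2):
    mid_01 = interp(i0, i1)          # estimated frame at t = 0.5
    mid_12 = interp(i1, i2)          # estimated frame at t = 1.5
    i1_rec = interp(mid_01, mid_12)  # interpolate back to the center frame
    return F.l1_loss(i1_rec, i1)     # supervise with the real center frame
```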

Fast Structured Decoding for Sequence Models

Title Fast Structured Decoding for Sequence Models
Authors Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, Zhi-Hong Deng
Abstract Autoregressive sequence models achieve state-of-the-art performance in domains like machine translation. However, due to the autoregressive factorization nature, these models suffer from heavy latency during inference. Recently, non-autoregressive sequence models were proposed to reduce the inference time. However, these models assume that the decoding process of each token is conditionally independent of others. Such a generation process sometimes makes the output sentence inconsistent, and thus the learned non-autoregressive models could only achieve inferior accuracy compared to their autoregressive counterparts. To improve the decoding consistency and reduce the inference cost at the same time, we propose to incorporate a structured inference module into the non-autoregressive models. Specifically, we design an efficient approximation for Conditional Random Fields (CRF) for non-autoregressive sequence models, and further propose a dynamic transition technique to model positional contexts in the CRF. Experiments in machine translation show that while adding little latency (8~14ms), our model could achieve significantly better translation performance than previous non-autoregressive models on different translation datasets. In particular, for the WMT14 En-De dataset, our model obtains a BLEU score of 26.80, which largely outperforms the previous non-autoregressive baselines and is only 0.61 lower in BLEU than purely autoregressive models.
Tasks Machine Translation
Published 2019-10-25
URL https://arxiv.org/abs/1910.11555v2
PDF https://arxiv.org/pdf/1910.11555v2.pdf
PWC https://paperswithcode.com/paper/fast-structured-decoding-for-sequence-models
Repo https://github.com/Edward-Sun/structured-nart
Framework pytorch
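
For reference, the exact forward algorithm for a linear-chain CRF's log-partition function, which is the quantity the paper approximates for large vocabularies (their contribution is a low-rank approximation plus dynamic transitions; this sketch is the exact small-label-set case):

```python
import torch

def crf_log_partition(emissions, transitions):
    """emissions: (T, L) unary scores; transitions: (L, L) pairwise scores."""
    alpha = emissions[0]                                  # (L,)
    for t in range(1, emissions.size(0)):
        # alpha[j] = logsumexp_i(alpha[i] + trans[i, j]) + emit[t, j]
        alpha = torch.logsumexp(alpha[:, None] + transitions, dim=0) + emissions[t]
    return torch.logsumexp(alpha, dim=0)                  # log Z
```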

SweepNet: Wide-baseline Omnidirectional Depth Estimation

Title SweepNet: Wide-baseline Omnidirectional Depth Estimation
Authors Changhee Won, Jongbin Ryu, Jongwoo Lim
Abstract Omnidirectional depth sensing has an advantage over conventional stereo systems since it enables us to recognize objects of interest in all directions without any blind regions. In this paper, we propose a novel wide-baseline omnidirectional stereo algorithm which computes the dense depth estimate from the fisheye images using a deep convolutional neural network. The capture system consists of multiple cameras mounted on a wide-baseline rig with ultrawide field of view (FOV) lenses, and we present the calibration algorithm for the extrinsic parameters based on the bundle adjustment. Instead of estimating depth maps from multiple sets of rectified images and stitching them, our approach directly generates one dense omnidirectional depth map with full 360-degree coverage at the rig global coordinate system. To this end, the proposed neural network is designed to output the cost volume from the warped images in the sphere sweeping method, and the final depth map is estimated by taking the minimum cost indices of the cost volume aggregated by SGM. For training the deep neural network and testing the entire system, realistic synthetic urban datasets are rendered using Blender. The experiments using the synthetic and real-world datasets show that our algorithm outperforms conventional depth estimation methods and generates highly accurate depth maps.
Tasks Calibration, Depth Estimation
Published 2019-02-28
URL https://arxiv.org/abs/1902.10904v2
PDF https://arxiv.org/pdf/1902.10904v2.pdf
PWC https://paperswithcode.com/paper/sweepnet-wide-baseline-omnidirectional-depth
Repo https://github.com/hyu-cvlab/sweepnet
Framework none
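
The final selection step described in the abstract, taking minimum-cost indices over the aggregated cost volume, is a per-pixel argmin; the SGM aggregation itself is omitted here. A sketch:

```python
import torch

def select_depth(cost_volume, hypotheses):
    """cost_volume: (D, H, W) aggregated costs over D depth hypotheses;
    hypotheses: (D,) candidate depths (or inverse depths)."""
    best = cost_volume.argmin(dim=0)   # (H, W) index of the winning hypothesis
    return hypotheses[best]            # per-pixel depth map
```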

TinyBERT: Distilling BERT for Natural Language Understanding

Title TinyBERT: Distilling BERT for Natural Language Understanding
Authors Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
Abstract Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, so it is difficult to execute them effectively on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel transformer distillation method that is a specially designed knowledge distillation (KD) method for transformer-based models. By leveraging this new KD method, the abundant knowledge encoded in a large teacher BERT can be well transferred to a small student TinyBERT. Moreover, we introduce a new two-stage learning framework for TinyBERT, which performs transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and task-specific knowledge of the teacher BERT. TinyBERT is empirically effective and achieves more than 96% of the performance of teacher BERT-BASE on the GLUE benchmark while being 7.5x smaller and 9.4x faster at inference. TinyBERT is also significantly better than state-of-the-art baselines for BERT distillation, with only about 28% of their parameters and about 31% of their inference time.
Tasks Language Modelling, Linguistic Acceptability, Natural Language Inference, Question Answering, Semantic Textual Similarity
Published 2019-09-23
URL https://arxiv.org/abs/1909.10351v4
PDF https://arxiv.org/pdf/1909.10351v4.pdf
PWC https://paperswithcode.com/paper/190910351
Repo https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/TinyBERT
Framework tf
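
A sketch of the per-layer transformer distillation objective the abstract describes: MSE between student and teacher attention maps, and between hidden states after a learned projection (needed because the student is narrower than the teacher). The layer mapping and loss weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def layer_distill_loss(student_hidden, teacher_hidden,
                       student_attn, teacher_attn, proj):
    """student_hidden: (B, T, d_s), teacher_hidden: (B, T, d_t),
    attention maps: (B, heads, T, T); proj: nn.Linear(d_s, d_t)."""
    hidden_loss = F.mse_loss(proj(student_hidden), teacher_hidden)
    attn_loss = F.mse_loss(student_attn, teacher_attn)
    return hidden_loss + attn_loss   # summed over the mapped layer pairs
```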

Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs

Title Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs
Authors Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, Rui Yan, Dongyan Zhao
Abstract Entity alignment is the task of linking entities with the same real-world identity from different knowledge graphs (KGs), which has been recently dominated by embedding-based methods. Such approaches work by learning KG representations so that entity alignment can be performed by measuring the similarities between entity embeddings. While promising, prior works in the field often fail to properly capture complex relation information that commonly exists in multi-relational KGs, leaving much room for improvement. In this paper, we propose a novel Relation-aware Dual-Graph Convolutional Network (RDGCN) to incorporate relation information via attentive interactions between the knowledge graph and its dual relation counterpart, and further capture neighboring structures to learn better entity representations. Experiments on three real-world cross-lingual datasets show that our approach delivers better and more robust results over the state-of-the-art alignment methods by learning better KG representations.
Tasks Entity Alignment, Entity Embeddings, Knowledge Graphs
Published 2019-08-22
URL https://arxiv.org/abs/1908.08210v1
PDF https://arxiv.org/pdf/1908.08210v1.pdf
PWC https://paperswithcode.com/paper/relation-aware-entity-alignment-for
Repo https://github.com/StephanieWyt/RDGCN
Framework tf
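
Once RDGCN has produced entity embeddings, alignment reduces to nearest-neighbor search between the two KGs' embedding spaces. A sketch of that inference step (cosine similarity here is an assumption; the paper's distance measure may differ):

```python
import torch
import torch.nn.functional as F

def align_entities(src_emb, tgt_emb):
    """src_emb: (N, d), tgt_emb: (M, d) learned entity embeddings."""
    sim = F.normalize(src_emb, dim=1) @ F.normalize(tgt_emb, dim=1).T
    return sim.argmax(dim=1)   # index of the best-matching target entity
```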

Resultant Based Incremental Recovery of Camera Pose from Pairwise Matches

Title Resultant Based Incremental Recovery of Camera Pose from Pairwise Matches
Authors Yoni Kasten, Meirav Galun, Ronen Basri
Abstract Incremental (online) structure from motion pipelines seek to recover the camera matrix associated with an image $I_n$ given $n-1$ images, $I_1,\ldots,I_{n-1}$, whose camera matrices have already been recovered. In this paper, we introduce a novel solution to the six-point online algorithm to recover the exterior parameters associated with $I_n$. Our algorithm uses just six corresponding pairs of 2D points, each extracted from $I_n$ and from \textit{any} of the preceding $n-1$ images, allowing the recovery of the full six degrees of freedom of the $n$-th camera, and unlike common methods, does not require tracking feature points in three or more images. Our novel solution is based on constructing a Dixon resultant, yielding a solution method that is both efficient and accurate compared to existing solutions. We further use Bernstein's theorem to prove a tight bound on the number of complex solutions. Our experiments demonstrate the utility of our approach.
Tasks
Published 2019-01-27
URL http://arxiv.org/abs/1901.09364v1
PDF http://arxiv.org/pdf/1901.09364v1.pdf
PWC https://paperswithcode.com/paper/resultant-based-incremental-recovery-of
Repo https://github.com/ykasten/resultantCamPose
Framework none

A King's Ransom for Encryption: Ransomware Classification using Augmented One-Shot Learning and Bayesian Approximation

Title A King's Ransom for Encryption: Ransomware Classification using Augmented One-Shot Learning and Bayesian Approximation
Authors Amir Atapour-Abarghouei, Stephen Bonner, Andrew Stephen McGough
Abstract Newly emerging variants of ransomware pose an ever-growing threat to computer systems governing every aspect of modern life through the handling and analysis of big data. While various recent security-based approaches have focused on detecting and classifying ransomware at the network or system level, easy-to-use post-infection ransomware classification for the lay user has not been attempted before. In this paper, we investigate the possibility of classifying the ransomware a system is infected with simply based on a screenshot of the splash screen or the ransom note captured using a consumer camera commonly found in any modern mobile device. To train and evaluate our system, we create a sample dataset of the splash screens of 50 well-known ransomware variants. In our dataset, only a single training image is available per ransomware. Instead of creating a large training dataset of ransomware screenshots, we simulate screenshot capture conditions via carefully designed data augmentation techniques, enabling simple and efficient one-shot learning. Moreover, using model uncertainty obtained via Bayesian approximation, we ensure special input cases such as unrelated non-ransomware images and previously-unseen ransomware variants are correctly identified for special handling and not mis-classified. Extensive experimental evaluation demonstrates the efficacy of our work, with accuracy levels of up to 93.6% for ransomware classification.
Tasks Data Augmentation, One-Shot Learning
Published 2019-08-19
URL https://arxiv.org/abs/1908.06750v1
PDF https://arxiv.org/pdf/1908.06750v1.pdf
PWC https://paperswithcode.com/paper/a-kings-ransom-for-encryption-ransomware
Repo https://github.com/atapour/ransomware-classification
Framework pytorch
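
Two of the abstract's ingredients are easy to sketch: simulating camera-capture conditions with augmentations of a single screenshot per class, and estimating uncertainty via Monte Carlo dropout as the Bayesian approximation. Transform parameters and the number of passes are illustrative.

```python
import torch
from torchvision import transforms

# Simulate camera angle, lighting, and focus variation around one screenshot.
capture_sim = transforms.Compose([
    transforms.RandomPerspective(distortion_scale=0.4, p=1.0),
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
])

def mc_dropout_predict(model, x, passes=20):
    """Keep dropout active at test time; the spread across passes serves as
    an approximate-Bayesian uncertainty signal for flagging unknown inputs."""
    model.train()                       # enables dropout layers
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(-1) for _ in range(passes)])
    return probs.mean(0), probs.std(0)  # prediction and per-class uncertainty
```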

Integration of Static and Dynamic Analysis for Malware Family Classification with Composite Neural Network

Title Integration of Static and Dynamic Analysis for Malware Family Classification with Composite Neural Network
Authors Yao Saint Yen, Zhe Wei Chen, Ying Ren Guo, Meng Chang Chen
Abstract Deep learning has been used in malware analysis research. Most classification methods use either static analysis features or dynamic analysis features for malware family classification; they rarely combine the two, and little effort has been spent on integrating them. In this paper, we combine static and dynamic analysis features with deep neural networks for Windows malware classification. We develop several methods to generate static and dynamic analysis features to classify malware in different ways. Given these features, we conduct experiments with a composite neural network, showing that the proposed approach performs best with an accuracy of 83.17% on a total of 80 malware families with 4519 malware samples. Additionally, we show that using integrated features for malware family classification outperforms using static features or dynamic features alone. We show how static and dynamic features complement each other for malware classification.
Tasks Malware Classification
Published 2019-12-24
URL https://arxiv.org/abs/1912.11249v1
PDF https://arxiv.org/pdf/1912.11249v1.pdf
PWC https://paperswithcode.com/paper/integration-of-static-and-dynamic-analysis
Repo https://github.com/guelfoweb/peframe
Framework none
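
A sketch of what a composite network for this setup might look like: separate branches for static and dynamic features, fused by concatenation before an 80-way family classifier. All layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class CompositeMalwareNet(nn.Module):
    def __init__(self, static_dim=256, dynamic_dim=128, n_families=80):
        super().__init__()
        self.static_branch = nn.Sequential(nn.Linear(static_dim, 128), nn.ReLU())
        self.dynamic_branch = nn.Sequential(nn.Linear(dynamic_dim, 128), nn.ReLU())
        self.classifier = nn.Linear(256, n_families)

    def forward(self, static_x, dynamic_x):
        # Fuse the two feature views by concatenation, then classify.
        fused = torch.cat([self.static_branch(static_x),
                           self.dynamic_branch(dynamic_x)], dim=-1)
        return self.classifier(fused)
```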

KiloGrams: Very Large N-Grams for Malware Classification

Title KiloGrams: Very Large N-Grams for Malware Classification
Authors Edward Raff, William Fleming, Richard Zak, Hyrum Anderson, Bill Finlayson, Charles Nicholas, Mark McLean
Abstract N-grams have been a common tool for information retrieval and machine learning applications for decades. In nearly all previous works, only a few values of $n$ are tested, with $n > 6$ being exceedingly rare. Larger values of $n$ are not tested due to computational burden or the fear of overfitting. In this work, we present a method to find the top-$k$ most frequent $n$-grams that is 60$\times$ faster for small $n$, and can tackle large $n\geq1024$. Despite the unprecedented size of $n$ considered, we show how these features still have predictive ability for malware classification tasks. More importantly, large $n$-grams provide benefits in producing features that are interpretable by malware analysts, and can be used to create general-purpose signatures compatible with industry-standard tools like Yara. Furthermore, the counts of common $n$-grams in a file may be added as features to publicly available human-engineered features, rivaling the efficacy of professionally developed features when used to train gradient-boosted decision tree models on the EMBER dataset.
Tasks Information Retrieval, Malware Classification
Published 2019-08-01
URL https://arxiv.org/abs/1908.00200v1
PDF https://arxiv.org/pdf/1908.00200v1.pdf
PWC https://paperswithcode.com/paper/kilograms-very-large-n-grams-for-malware
Repo https://github.com/NeuromorphicComputationResearchProgram/KiloGrams
Framework none
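
A much-simplified sketch of the paper's hash-then-count idea for huge n: hash every n-gram into a fixed number of buckets on a first pass, keep the hottest buckets, then collect exact n-grams only for those buckets on a second pass. The bucket count, k, and the use of Python's built-in hash are illustrative simplifications of the actual algorithm.

```python
from collections import Counter

def top_k_ngrams(data: bytes, n=64, k=10, buckets=1 << 20):
    bucket_counts = Counter()
    for i in range(len(data) - n + 1):            # pass 1: hashed bucket counts
        bucket_counts[hash(data[i:i + n]) % buckets] += 1
    hot = {b for b, _ in bucket_counts.most_common(k)}
    exact = Counter()
    for i in range(len(data) - n + 1):            # pass 2: exact grams for hot buckets
        g = data[i:i + n]
        if hash(g) % buckets in hot:
            exact[g] += 1
    return exact.most_common(k)
```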

Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following

Title Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following
Authors David Gaddy, Dan Klein
Abstract We consider the problem of learning to map from natural language instructions to state transitions (actions) in a data-efficient manner. Our method takes inspiration from the idea that it should be easier to ground language to concepts that have already been formed through pre-linguistic observation. We augment a baseline instruction-following learner with an initial environment-learning phase that uses observations of language-free state transitions to induce a suitable latent representation of actions before processing the instruction-following training data. We show that mapping to pre-learned representations substantially improves performance over systems whose representations are learned from limited instructional data alone.
Tasks
Published 2019-07-23
URL https://arxiv.org/abs/1907.09671v1
PDF https://arxiv.org/pdf/1907.09671v1.pdf
PWC https://paperswithcode.com/paper/pre-learning-environment-representations-for
Repo https://github.com/dgaddy/environment-learning
Framework pytorch
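
A sketch of the environment-learning phase: an encoder compresses language-free (state, next state) pairs into a latent action, and a decoder reconstructs the next state, yielding action representations that the instruction follower later predicts from text. Shapes and modules are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, latent_dim = 64, 16
encode = nn.Linear(2 * state_dim, latent_dim)          # (s, s') -> latent action z
decode = nn.Linear(state_dim + latent_dim, state_dim)  # (s, z) -> predicted s'

def environment_learning_loss(s, s_next):
    z = encode(torch.cat([s, s_next], dim=-1))
    s_pred = decode(torch.cat([s, z], dim=-1))
    return F.mse_loss(s_pred, s_next)
# Phase 2 (not shown): train the instruction follower to map text to z,
# with the pre-learned action representation held fixed.
```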