February 1, 2020

3150 words 15 mins read

Paper Group AWR 331

DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better. MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension. Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies. Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram …

DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better

Title DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better
Authors Orest Kupyn, Tetiana Martyniuk, Junru Wu, Zhangyang Wang
Abstract We present a new end-to-end generative adversarial network (GAN) for single image motion deblurring, named DeblurGAN-v2, which considerably boosts state-of-the-art deblurring efficiency, quality, and flexibility. DeblurGAN-v2 is based on a relativistic conditional GAN with a double-scale discriminator. For the first time, we introduce the Feature Pyramid Network into deblurring, as a core building block in the generator of DeblurGAN-v2. It can flexibly work with a wide range of backbones, to navigate the balance between performance and efficiency. Plugging in sophisticated backbones (e.g., Inception-ResNet-v2) leads to solid state-of-the-art deblurring. Meanwhile, with lightweight backbones (e.g., MobileNet and its variants), DeblurGAN-v2 runs 10-100 times faster than the nearest competitors while maintaining close to state-of-the-art results, implying the option of real-time video deblurring. We demonstrate that DeblurGAN-v2 obtains very competitive performance on several popular benchmarks, in terms of deblurring quality (both objective and subjective) as well as efficiency. We also show the architecture to be effective for general image restoration tasks. Our code, models, and data are available at: https://github.com/KupynOrest/DeblurGANv2
Tasks Deblurring, Image Restoration
Published 2019-08-10
URL https://arxiv.org/abs/1908.03826v1
PDF https://arxiv.org/pdf/1908.03826v1.pdf
PWC https://paperswithcode.com/paper/deblurgan-v2-deblurring-orders-of-magnitude
Repo https://github.com/KupynOrest/DeblurGANv2
Framework pytorch
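
A minimal sketch of the relativistic average GAN loss that the abstract's "relativistic conditional GAN" refers to. The discriminator `D` and its call signature are placeholders, and DeblurGAN-v2 additionally applies its discriminator at two scales (global and patch); this illustrates the loss form only, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ragan_d_loss(D, real, fake):
    """Relativistic average discriminator loss: real images should score
    higher than fakes do *on average*, and vice versa."""
    d_real, d_fake = D(real), D(fake.detach())
    loss_real = F.binary_cross_entropy_with_logits(
        d_real - d_fake.mean(), torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(
        d_fake - d_real.mean(), torch.zeros_like(d_fake))
    return 0.5 * (loss_real + loss_fake)

def ragan_g_loss(D, real, fake):
    """Generator side: the same expression with the labels swapped."""
    d_real, d_fake = D(real), D(fake)
    loss_real = F.binary_cross_entropy_with_logits(
        d_real - d_fake.mean(), torch.zeros_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(
        d_fake - d_real.mean(), torch.ones_like(d_fake))
    return 0.5 * (loss_real + loss_fake)
```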

MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension

Title MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension
Authors Alon Talmor, Jonathan Berant
Abstract A large number of reading comprehension (RC) datasets have been created recently, but little analysis has been done on whether they generalize to one another, and the extent to which existing datasets can be leveraged for improving performance on new ones. In this paper, we conduct such an investigation over ten RC datasets, training on one or more source RC datasets, and evaluating generalization, as well as transfer to a target RC dataset. We analyze the factors that contribute to generalization, and show that training on a source RC dataset and transferring to a target dataset substantially improves performance, even in the presence of powerful contextual representations from BERT (Devlin et al., 2019). We also find that training on multiple source RC datasets leads to robust generalization and transfer, and can reduce the cost of example collection for a new RC dataset. Following our analysis, we propose MultiQA, a BERT-based model, trained on multiple RC datasets, which leads to state-of-the-art performance on five RC datasets. We share our infrastructure for the benefit of the research community.
Tasks Reading Comprehension
Published 2019-05-31
URL https://arxiv.org/abs/1905.13453v1
PDF https://arxiv.org/pdf/1905.13453v1.pdf
PWC https://paperswithcode.com/paper/multiqa-an-empirical-investigation-of
Repo https://github.com/alontalmor/multiqa
Framework pytorch
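
The multi-source setup in the abstract amounts to training one reader on a mixture of RC datasets before transferring to the target. A toy sketch of that mixing step, with a stand-in dataset class (the real pipeline tokenizes with BERT and tracks answer spans):

```python
import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader

class RCDataset(Dataset):
    """Toy stand-in for a tokenized reading-comprehension dataset."""
    def __init__(self, examples):
        self.examples = examples
    def __len__(self):
        return len(self.examples)
    def __getitem__(self, i):
        return self.examples[i]

# Mix several source datasets into one training stream; after this stage,
# continue fine-tuning on the (small) target dataset for transfer.
sources = [RCDataset([{"input_ids": torch.zeros(8, dtype=torch.long),
                       "start": 0, "end": 1}] * 100) for _ in range(3)]
loader = DataLoader(ConcatDataset(sources), batch_size=32, shuffle=True)
```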

Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies

Title Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies
Authors Patrick Nadeem Ward, Ariella Smofsky, Avishek Joey Bose
Abstract Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle with respect to hyperparameters as well as sample inefficient. Soft Actor Critic (SAC) proposes an off-policy deep actor critic algorithm within the maximum entropy RL framework which offers greater stability and empirical gains. The choice of policy distribution, a factored Gaussian, is motivated by its easy re-parametrization rather than its modeling power. We introduce Normalizing Flow policies within the SAC framework that learn more expressive classes of policies than simple factored Gaussians. We show empirically on continuous grid world tasks that our approach increases stability and is better suited to difficult exploration in sparse reward settings.
Tasks
Published 2019-06-06
URL https://arxiv.org/abs/1906.02771v1
PDF https://arxiv.org/pdf/1906.02771v1.pdf
PWC https://paperswithcode.com/paper/improving-exploration-in-soft-actor-critic
Repo https://github.com/joeybose/FloRL
Framework pytorch
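
A minimal sketch of a flow-based policy: a factored Gaussian base distribution pushed through invertible transforms, with exact log-probabilities via the change-of-variables formula. The specific transforms below are stand-ins for learned flow layers, not the paper's architecture.

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import AffineTransform, TanhTransform

def flow_policy(mu, log_std):
    base = Normal(mu, log_std.exp())          # factored Gaussian, as in SAC
    transforms = [
        AffineTransform(loc=0.0, scale=2.0),  # stand-in for learned flow layers
        TanhTransform(cache_size=1),          # squash actions into (-1, 1)
    ]
    return TransformedDistribution(base, transforms)

mu, log_std = torch.zeros(4), torch.zeros(4)
pi = flow_policy(mu, log_std)
a = pi.rsample()                  # reparameterized sample, as SAC requires
logp = pi.log_prob(a).sum(-1)     # exact log-density through the flow
```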

Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram

Title Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
Authors Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim
Abstract We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network. In the proposed method, a non-autoregressive WaveNet is trained by jointly optimizing multi-resolution spectrogram and adversarial loss functions, which can effectively capture the time-frequency distribution of the realistic speech waveform. As our method does not require density distillation used in the conventional teacher-student framework, the entire model can be easily trained. Furthermore, our model is able to generate high-fidelity speech even with its compact architecture. In particular, the proposed Parallel WaveGAN has only 1.44 M parameters and can generate 24 kHz speech waveform 28.68 times faster than real-time on a single GPU environment. Perceptual listening test results verify that our proposed method achieves 4.16 mean opinion score within a Transformer-based text-to-speech framework, which is comparable to the best distillation-based Parallel WaveNet system.
Tasks Speech Synthesis, Text-To-Speech Synthesis
Published 2019-10-25
URL https://arxiv.org/abs/1910.11480v2
PDF https://arxiv.org/pdf/1910.11480v2.pdf
PWC https://paperswithcode.com/paper/parallel-wavegan-a-fast-waveform-generation
Repo https://github.com/yanggeng1995/GAN-TTS
Framework pytorch
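
The "multi-resolution spectrogram" loss in the abstract combines a spectral-convergence term and a log-STFT-magnitude term at several FFT resolutions. A sketch, with resolution settings that are common choices rather than necessarily the paper's:

```python
import torch
import torch.nn.functional as F

def stft_mag(x, n_fft, hop, win):
    window = torch.hann_window(win, device=x.device)
    spec = torch.stft(x, n_fft, hop_length=hop, win_length=win,
                      window=window, return_complex=True)
    return spec.abs().clamp(min=1e-7)

def multi_resolution_stft_loss(fake, real,
                               resolutions=((1024, 256, 1024),
                                            (2048, 512, 2048),
                                            (512, 128, 512))):
    loss = 0.0
    for n_fft, hop, win in resolutions:
        f = stft_mag(fake, n_fft, hop, win)
        r = stft_mag(real, n_fft, hop, win)
        sc = torch.norm(r - f, p="fro") / torch.norm(r, p="fro")  # spectral convergence
        mag = F.l1_loss(f.log(), r.log())                         # log-magnitude L1
        loss = loss + sc + mag
    return loss / len(resolutions)
```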

SCALOR: Generative World Models with Scalable Object Representations

Title SCALOR: Generative World Models with Scalable Object Representations
Authors Jindong Jiang, Sepehr Janghorbani, Gerard de Melo, Sungjin Ahn
Abstract Scalability in terms of object density in a scene is a primary challenge in unsupervised sequential object-oriented representation learning. Most of the previous models have been shown to work only on scenes with a few objects. In this paper, we propose SCALOR, a probabilistic generative world model for learning SCALable Object-oriented Representation of a video. With the proposed spatially-parallel attention and proposal-rejection mechanisms, SCALOR can deal with orders of magnitude larger numbers of objects compared to the previous state-of-the-art models. Additionally, we introduce a background module that allows SCALOR to model complex dynamic backgrounds as well as many foreground objects in the scene. We demonstrate that SCALOR can deal with crowded scenes containing up to a hundred objects while jointly modeling complex dynamic backgrounds. Importantly, SCALOR is the first unsupervised object representation model shown to work for natural scenes containing several tens of moving objects.
Tasks Representation Learning
Published 2019-10-06
URL https://arxiv.org/abs/1910.02384v4
PDF https://arxiv.org/pdf/1910.02384v4.pdf
PWC https://paperswithcode.com/paper/scalable-object-oriented-sequential-1
Repo https://github.com/JindongJiang/JindongJiang.github.io
Framework none
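
A much-simplified illustration of a proposal-rejection step in the spirit of the abstract: newly proposed object boxes that overlap heavily with objects propagated from the previous frame are discarded. SCALOR's actual mechanism operates on latent variables; this sketch reduces it to box IoU.

```python
import torch

def iou(a, b):
    """IoU between [x1, y1, x2, y2] box tensors of shapes (N, 4) and (M, 4)."""
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    inter = (rb - lt).clamp(min=0).prod(-1)
    area_a = (a[:, 2:] - a[:, :2]).prod(-1)
    area_b = (b[:, 2:] - b[:, :2]).prod(-1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def reject_proposals(proposals, propagated, thresh=0.3):
    """Keep only proposals whose max IoU with propagated objects is low."""
    if len(propagated) == 0:
        return proposals
    return proposals[iou(proposals, propagated).max(dim=1).values < thresh]
```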

Unsupervised Video Interpolation Using Cycle Consistency

Title Unsupervised Video Interpolation Using Cycle Consistency
Authors Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
Abstract Learning to synthesize high frame rate videos via interpolation requires large quantities of high frame rate training videos, which, however, are scarce, especially at high resolutions. Here, we propose unsupervised techniques to synthesize high frame rate videos directly from low frame rate videos using cycle consistency. For a triplet of consecutive frames, we optimize models to minimize the discrepancy between the center frame and its cycle reconstruction, obtained by interpolating back from interpolated intermediate frames. This simple unsupervised constraint alone achieves results comparable with supervision using the ground truth intermediate frames. We further introduce a pseudo supervised loss term that enforces the interpolated frames to be consistent with predictions of a pre-trained interpolation model. The pseudo supervised loss term, used together with cycle consistency, can effectively adapt a pre-trained model to a new target domain. With no additional data and in a completely unsupervised fashion, our techniques significantly improve pre-trained models on new target domains, increasing PSNR values from 32.84dB to 33.05dB on the Slowflow and from 31.82dB to 32.53dB on the Sintel evaluation datasets.
Tasks
Published 2019-06-13
URL https://arxiv.org/abs/1906.05928v2
PDF https://arxiv.org/pdf/1906.05928v2.pdf
PWC https://paperswithcode.com/paper/unsupervised-video-interpolation-using-cycle
Repo https://github.com/NVIDIA/unsupervised-video-interpolation
Framework pytorch
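
The cycle-consistency constraint from the abstract, as a loss sketch: interpolate two intermediate frames from a triplet, interpolate back between them, and compare the reconstruction with the known center frame. `interp` stands in for any two-frame interpolation network.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(interp, i0, i1, i2):
    mid_01 = interp(i0, i1)          # estimated frame at t = 0.5
    mid_12 = interp(i1, i2)          # estimated frame at t = 1.5
    i1_rec = interp(mid_01, mid_12)  # interpolate back to the center frame
    return F.l1_loss(i1_rec, i1)     # supervise with the real center frame
```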

Fast Structured Decoding for Sequence Models

Title Fast Structured Decoding for Sequence Models
Authors Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, Zhi-Hong Deng
Abstract Autoregressive sequence models achieve state-of-the-art performance in domains like machine translation. However, due to the autoregressive factorization nature, these models suffer from heavy latency during inference. Recently, non-autoregressive sequence models were proposed to reduce the inference time. However, these models assume that the decoding process of each token is conditionally independent of others. Such a generation process sometimes makes the output sentence inconsistent, and thus the learned non-autoregressive models could only achieve inferior accuracy compared to their autoregressive counterparts. To improve the decoding consistency and reduce the inference cost at the same time, we propose to incorporate a structured inference module into the non-autoregressive models. Specifically, we design an efficient approximation for Conditional Random Fields (CRF) for non-autoregressive sequence models, and further propose a dynamic transition technique to model positional contexts in the CRF. Experiments in machine translation show that while adding little latency (8~14ms), our model could achieve significantly better translation performance than previous non-autoregressive models on different translation datasets. In particular, for the WMT14 En-De dataset, our model obtains a BLEU score of 26.80, which largely outperforms the previous non-autoregressive baselines and is only 0.61 lower in BLEU than purely autoregressive models.
Tasks Machine Translation
Published 2019-10-25
URL https://arxiv.org/abs/1910.11555v2
PDF https://arxiv.org/pdf/1910.11555v2.pdf
PWC https://paperswithcode.com/paper/fast-structured-decoding-for-sequence-models
Repo https://github.com/Edward-Sun/structured-nart
Framework pytorch
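
For reference, the exact forward algorithm for a linear-chain CRF's log-partition function, which is the quantity the paper approximates for large vocabularies (their contribution is a low-rank approximation plus dynamic transitions; this sketch is the exact small-label-set case):

```python
import torch

def crf_log_partition(emissions, transitions):
    """emissions: (T, L) unary scores; transitions: (L, L) pairwise scores."""
    alpha = emissions[0]                                  # (L,)
    for t in range(1, emissions.size(0)):
        # alpha[j] = logsumexp_i(alpha[i] + trans[i, j]) + emit[t, j]
        alpha = torch.logsumexp(alpha[:, None] + transitions, dim=0) + emissions[t]
    return torch.logsumexp(alpha, dim=0)                  # log Z
```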

SweepNet: Wide-baseline Omnidirectional Depth Estimation

Title SweepNet: Wide-baseline Omnidirectional Depth Estimation
Authors Changhee Won, Jongbin Ryu, Jongwoo Lim
Abstract Omnidirectional depth sensing has an advantage over conventional stereo systems since it enables us to recognize objects of interest in all directions without any blind regions. In this paper, we propose a novel wide-baseline omnidirectional stereo algorithm which computes the dense depth estimate from the fisheye images using a deep convolutional neural network. The capture system consists of multiple cameras mounted on a wide-baseline rig with ultrawide field of view (FOV) lenses, and we present the calibration algorithm for the extrinsic parameters based on the bundle adjustment. Instead of estimating depth maps from multiple sets of rectified images and stitching them, our approach directly generates one dense omnidirectional depth map with full 360-degree coverage at the rig global coordinate system. To this end, the proposed neural network is designed to output the cost volume from the warped images in the sphere sweeping method, and the final depth map is estimated by taking the minimum cost indices of the cost volume aggregated by SGM. For training the deep neural network and testing the entire system, realistic synthetic urban datasets are rendered using Blender. The experiments using the synthetic and real-world datasets show that our algorithm outperforms conventional depth estimation methods and generates highly accurate depth maps.
Tasks Calibration, Depth Estimation
Published 2019-02-28
URL https://arxiv.org/abs/1902.10904v2
PDF https://arxiv.org/pdf/1902.10904v2.pdf
PWC https://paperswithcode.com/paper/sweepnet-wide-baseline-omnidirectional-depth
Repo https://github.com/hyu-cvlab/sweepnet
Framework none
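
The final selection step described in the abstract, taking minimum-cost indices over the aggregated cost volume, is a per-pixel argmin; the SGM aggregation itself is omitted here. A sketch:

```python
import torch

def select_depth(cost_volume, hypotheses):
    """cost_volume: (D, H, W) aggregated costs over D depth hypotheses;
    hypotheses: (D,) candidate depths (or inverse depths)."""
    best = cost_volume.argmin(dim=0)   # (H, W) index of the winning hypothesis
    return hypotheses[best]            # per-pixel depth map
```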

TinyBERT: Distilling BERT for Natural Language Understanding

Title TinyBERT: Distilling BERT for Natural Language Understanding
Authors Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
Abstract Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, so it is difficult to execute them effectively on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel transformer distillation method that is a specially designed knowledge distillation (KD) method for transformer-based models. By leveraging this new KD method, the abundant knowledge encoded in a large teacher BERT can be well transferred to a small student TinyBERT. Moreover, we introduce a new two-stage learning framework for TinyBERT, which performs transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and task-specific knowledge of the teacher BERT. TinyBERT is empirically effective and achieves more than 96% of the performance of teacher BERT-BASE on the GLUE benchmark while being 7.5x smaller and 9.4x faster at inference. TinyBERT is also significantly better than state-of-the-art baselines for BERT distillation, with only about 28% of their parameters and about 31% of their inference time.
Tasks Language Modelling, Linguistic Acceptability, Natural Language Inference, Question Answering, Semantic Textual Similarity
Published 2019-09-23
URL https://arxiv.org/abs/1909.10351v4
PDF https://arxiv.org/pdf/1909.10351v4.pdf
PWC https://paperswithcode.com/paper/190910351
Repo https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/TinyBERT
Framework tf
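
A sketch of the per-layer transformer distillation objective the abstract describes: MSE between student and teacher attention maps, and between hidden states after a learned projection (needed because the student is narrower than the teacher). The layer mapping and loss weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def layer_distill_loss(student_hidden, teacher_hidden,
                       student_attn, teacher_attn, proj):
    """student_hidden: (B, T, d_s), teacher_hidden: (B, T, d_t),
    attention maps: (B, heads, T, T); proj: nn.Linear(d_s, d_t)."""
    hidden_loss = F.mse_loss(proj(student_hidden), teacher_hidden)
    attn_loss = F.mse_loss(student_attn, teacher_attn)
    return hidden_loss + attn_loss   # summed over the mapped layer pairs
```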

Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs

Title Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs
Authors Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, Rui Yan, Dongyan Zhao
Abstract Entity alignment is the task of linking entities with the same real-world identity from different knowledge graphs (KGs), which has been recently dominated by embedding-based methods. Such approaches work by learning KG representations so that entity alignment can be performed by measuring the similarities between entity embeddings. While promising, prior works in the field often fail to properly capture complex relation information that commonly exists in multi-relational KGs, leaving much room for improvement. In this paper, we propose a novel Relation-aware Dual-Graph Convolutional Network (RDGCN) to incorporate relation information via attentive interactions between the knowledge graph and its dual relation counterpart, and further capture neighboring structures to learn better entity representations. Experiments on three real-world cross-lingual datasets show that our approach delivers better and more robust results over the state-of-the-art alignment methods by learning better KG representations.
Tasks Entity Alignment, Entity Embeddings, Knowledge Graphs
Published 2019-08-22
URL https://arxiv.org/abs/1908.08210v1
PDF https://arxiv.org/pdf/1908.08210v1.pdf
PWC https://paperswithcode.com/paper/relation-aware-entity-alignment-for
Repo https://github.com/StephanieWyt/RDGCN
Framework tf
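
Once RDGCN has produced entity embeddings, alignment reduces to nearest-neighbor search between the two KGs' embedding spaces. A sketch of that inference step (cosine similarity here is an assumption; the paper's distance measure may differ):

```python
import torch
import torch.nn.functional as F

def align_entities(src_emb, tgt_emb):
    """src_emb: (N, d), tgt_emb: (M, d) learned entity embeddings."""
    sim = F.normalize(src_emb, dim=1) @ F.normalize(tgt_emb, dim=1).T
    return sim.argmax(dim=1)   # index of the best-matching target entity
```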

Resultant Based Incremental Recovery of Camera Pose from Pairwise Matches

Title Resultant Based Incremental Recovery of Camera Pose from Pairwise Matches
Authors Yoni Kasten, Meirav Galun, Ronen Basri
Abstract Incremental (online) structure from motion pipelines seek to recover the camera matrix associated with an image $I_n$ given $n-1$ images, $I_1,\ldots,I_{n-1}$, whose camera matrices have already been recovered. In this paper, we introduce a novel solution to the six-point online algorithm to recover the exterior parameters associated with $I_n$. Our algorithm uses just six corresponding pairs of 2D points, each extracted from $I_n$ and from \textit{any} of the preceding $n-1$ images, allowing the recovery of the full six degrees of freedom of the $n$-th camera, and unlike common methods, does not require tracking feature points in three or more images. Our novel solution is based on constructing a Dixon resultant, yielding a solution method that is both efficient and accurate compared to existing solutions. We further use Bernstein's theorem to prove a tight bound on the number of complex solutions. Our experiments demonstrate the utility of our approach.
Tasks
Published 2019-01-27
URL http://arxiv.org/abs/1901.09364v1
PDF http://arxiv.org/pdf/1901.09364v1.pdf
PWC https://paperswithcode.com/paper/resultant-based-incremental-recovery-of
Repo https://github.com/ykasten/resultantCamPose
Framework none

A King's Ransom for Encryption: Ransomware Classification using Augmented One-Shot Learning and Bayesian Approximation

Title A King's Ransom for Encryption: Ransomware Classification using Augmented One-Shot Learning and Bayesian Approximation
Authors Amir Atapour-Abarghouei, Stephen Bonner, Andrew Stephen McGough
Abstract Newly emerging variants of ransomware pose an ever-growing threat to computer systems governing every aspect of modern life through the handling and analysis of big data. While various recent security-based approaches have focused on detecting and classifying ransomware at the network or system level, easy-to-use post-infection ransomware classification for the lay user has not been attempted before. In this paper, we investigate the possibility of classifying the ransomware a system is infected with simply based on a screenshot of the splash screen or the ransom note captured using a consumer camera commonly found in any modern mobile device. To train and evaluate our system, we create a sample dataset of the splash screens of 50 well-known ransomware variants. In our dataset, only a single training image is available per ransomware. Instead of creating a large training dataset of ransomware screenshots, we simulate screenshot capture conditions via carefully designed data augmentation techniques, enabling simple and efficient one-shot learning. Moreover, using model uncertainty obtained via Bayesian approximation, we ensure special input cases such as unrelated non-ransomware images and previously-unseen ransomware variants are correctly identified for special handling and not mis-classified. Extensive experimental evaluation demonstrates the efficacy of our work, with accuracy levels of up to 93.6% for ransomware classification.
Tasks Data Augmentation, One-Shot Learning
Published 2019-08-19
URL https://arxiv.org/abs/1908.06750v1
PDF https://arxiv.org/pdf/1908.06750v1.pdf
PWC https://paperswithcode.com/paper/a-kings-ransom-for-encryption-ransomware
Repo https://github.com/atapour/ransomware-classification
Framework pytorch
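
Two of the abstract's ingredients are easy to sketch: simulating camera-capture conditions with augmentations of a single screenshot per class, and estimating uncertainty via Monte Carlo dropout as the Bayesian approximation. Transform parameters and the number of passes are illustrative.

```python
import torch
from torchvision import transforms

# Simulate camera angle, lighting, and focus variation around one screenshot.
capture_sim = transforms.Compose([
    transforms.RandomPerspective(distortion_scale=0.4, p=1.0),
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
])

def mc_dropout_predict(model, x, passes=20):
    """Keep dropout active at test time; the spread across passes serves as
    an approximate-Bayesian uncertainty signal for flagging unknown inputs."""
    model.train()                       # enables dropout layers
    with torch.no_grad():
        probs = torch.stack([model(x).softmax(-1) for _ in range(passes)])
    return probs.mean(0), probs.std(0)  # prediction and per-class uncertainty
```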

Integration of Static and Dynamic Analysis for Malware Family Classification with Composite Neural Network

Title Integration of Static and Dynamic Analysis for Malware Family Classification with Composite Neural Network
Authors Yao Saint Yen, Zhe Wei Chen, Ying Ren Guo, Meng Chang Chen
Abstract Deep learning has been used in malware analysis research. Most classification methods use either static analysis features or dynamic analysis features for malware family classification; they rarely combine the two, and little effort has been spent on integrating them. In this paper, we combine static and dynamic analysis features with deep neural networks for Windows malware classification. We develop several methods to generate static and dynamic analysis features to classify malware in different ways. Given these features, we conduct experiments with a composite neural network, showing that the proposed approach performs best with an accuracy of 83.17% on a total of 80 malware families with 4519 malware samples. Additionally, we show that using integrated features for malware family classification outperforms using static features or dynamic features alone. We show how static and dynamic features complement each other for malware classification.
Tasks Malware Classification
Published 2019-12-24
URL https://arxiv.org/abs/1912.11249v1
PDF https://arxiv.org/pdf/1912.11249v1.pdf
PWC https://paperswithcode.com/paper/integration-of-static-and-dynamic-analysis
Repo https://github.com/guelfoweb/peframe
Framework none
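
A sketch of what a composite network for this setup might look like: separate branches for static and dynamic features, fused by concatenation before an 80-way family classifier. All layer sizes are placeholders.

```python
import torch
import torch.nn as nn

class CompositeMalwareNet(nn.Module):
    def __init__(self, static_dim=256, dynamic_dim=128, n_families=80):
        super().__init__()
        self.static_branch = nn.Sequential(nn.Linear(static_dim, 128), nn.ReLU())
        self.dynamic_branch = nn.Sequential(nn.Linear(dynamic_dim, 128), nn.ReLU())
        self.classifier = nn.Linear(256, n_families)

    def forward(self, static_x, dynamic_x):
        # Fuse the two feature views by concatenation, then classify.
        fused = torch.cat([self.static_branch(static_x),
                           self.dynamic_branch(dynamic_x)], dim=-1)
        return self.classifier(fused)
```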

KiloGrams: Very Large N-Grams for Malware Classification

Title KiloGrams: Very Large N-Grams for Malware Classification
Authors Edward Raff, William Fleming, Richard Zak, Hyrum Anderson, Bill Finlayson, Charles Nicholas, Mark McLean
Abstract N-grams have been a common tool for information retrieval and machine learning applications for decades. In nearly all previous works, only a few values of $n$ are tested, with $n > 6$ being exceedingly rare. Larger values of $n$ are not tested due to computational burden or the fear of overfitting. In this work, we present a method to find the top-$k$ most frequent $n$-grams that is 60$\times$ faster for small $n$, and can tackle large $n\geq1024$. Despite the unprecedented size of $n$ considered, we show how these features still have predictive ability for malware classification tasks. More importantly, large $n$-grams provide benefits in producing features that are interpretable by malware analysts, and can be used to create general-purpose signatures compatible with industry-standard tools like Yara. Furthermore, the counts of common $n$-grams in a file may be added as features to publicly available human-engineered features, rivaling the efficacy of professionally developed features when used to train gradient-boosted decision tree models on the EMBER dataset.
Tasks Information Retrieval, Malware Classification
Published 2019-08-01
URL https://arxiv.org/abs/1908.00200v1
PDF https://arxiv.org/pdf/1908.00200v1.pdf
PWC https://paperswithcode.com/paper/kilograms-very-large-n-grams-for-malware
Repo https://github.com/NeuromorphicComputationResearchProgram/KiloGrams
Framework none
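
A much-simplified sketch of the paper's hash-then-count idea for huge n: hash every n-gram into a fixed number of buckets on a first pass, keep the hottest buckets, then collect exact n-grams only for those buckets on a second pass. The bucket count, k, and the use of Python's built-in hash are illustrative simplifications of the actual algorithm.

```python
from collections import Counter

def top_k_ngrams(data: bytes, n=64, k=10, buckets=1 << 20):
    bucket_counts = Counter()
    for i in range(len(data) - n + 1):            # pass 1: hashed bucket counts
        bucket_counts[hash(data[i:i + n]) % buckets] += 1
    hot = {b for b, _ in bucket_counts.most_common(k)}
    exact = Counter()
    for i in range(len(data) - n + 1):            # pass 2: exact grams for hot buckets
        g = data[i:i + n]
        if hash(g) % buckets in hot:
            exact[g] += 1
    return exact.most_common(k)
```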

Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following

Title Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following
Authors David Gaddy, Dan Klein
Abstract We consider the problem of learning to map from natural language instructions to state transitions (actions) in a data-efficient manner. Our method takes inspiration from the idea that it should be easier to ground language to concepts that have already been formed through pre-linguistic observation. We augment a baseline instruction-following learner with an initial environment-learning phase that uses observations of language-free state transitions to induce a suitable latent representation of actions before processing the instruction-following training data. We show that mapping to pre-learned representations substantially improves performance over systems whose representations are learned from limited instructional data alone.
Tasks
Published 2019-07-23
URL https://arxiv.org/abs/1907.09671v1
PDF https://arxiv.org/pdf/1907.09671v1.pdf
PWC https://paperswithcode.com/paper/pre-learning-environment-representations-for
Repo https://github.com/dgaddy/environment-learning
Framework pytorch
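
A sketch of the environment-learning phase: an encoder compresses language-free (state, next state) pairs into a latent action, and a decoder reconstructs the next state, yielding action representations that the instruction follower later predicts from text. Shapes and modules are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, latent_dim = 64, 16
encode = nn.Linear(2 * state_dim, latent_dim)          # (s, s') -> latent action z
decode = nn.Linear(state_dim + latent_dim, state_dim)  # (s, z) -> predicted s'

def environment_learning_loss(s, s_next):
    z = encode(torch.cat([s, s_next], dim=-1))
    s_pred = decode(torch.cat([s, z], dim=-1))
    return F.mse_loss(s_pred, s_next)
# Phase 2 (not shown): train the instruction follower to map text to z,
# with the pre-learned action representation held fixed.
```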