Paper Group AWR 331
DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better
Title | DeblurGAN-v2: Deblurring (Orders-of-Magnitude) Faster and Better |
Authors | Orest Kupyn, Tetiana Martyniuk, Junru Wu, Zhangyang Wang |
Abstract | We present a new end-to-end generative adversarial network (GAN) for single image motion deblurring, named DeblurGAN-v2, which considerably boosts state-of-the-art deblurring efficiency, quality, and flexibility. DeblurGAN-v2 is based on a relativistic conditional GAN with a double-scale discriminator. For the first time, we introduce the Feature Pyramid Network into deblurring, as a core building block in the generator of DeblurGAN-v2. It can flexibly work with a wide range of backbones to navigate the balance between performance and efficiency. Plugging in sophisticated backbones (e.g., Inception-ResNet-v2) leads to solid state-of-the-art deblurring. Meanwhile, with lightweight backbones (e.g., MobileNet and its variants), DeblurGAN-v2 runs 10-100 times faster than the nearest competitors while maintaining close to state-of-the-art results, implying the option of real-time video deblurring. We demonstrate that DeblurGAN-v2 obtains very competitive performance on several popular benchmarks, in terms of deblurring quality (both objective and subjective) as well as efficiency. We also show the architecture to be effective for general image restoration tasks. Our codes, models and data are available at: https://github.com/KupynOrest/DeblurGANv2 |
Tasks | Deblurring, Image Restoration |
Published | 2019-08-10 |
URL | https://arxiv.org/abs/1908.03826v1 |
https://arxiv.org/pdf/1908.03826v1.pdf | |
PWC | https://paperswithcode.com/paper/deblurgan-v2-deblurring-orders-of-magnitude |
Repo | https://github.com/KupynOrest/DeblurGANv2 |
Framework | pytorch |
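DeblurGAN-v2's double-scale discriminator is trained with a relativistic conditional GAN objective. As a rough sketch of the relativistic average least-squares loss this family of models commonly uses (our own naming and code, not the linked repo's), each sample's score is compared against the mean score of the opposite class:

```python
# Sketch of a relativistic average least-squares (RaLSGAN-style) objective;
# the +/-1 targets follow the standard formulation, assumed here, not the repo.
import torch

def rals_d_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    # Discriminator: real should score one above the average fake, and vice versa.
    return 0.5 * (torch.mean((real_logits - fake_logits.mean() - 1) ** 2)
                  + torch.mean((fake_logits - real_logits.mean() + 1) ** 2))

def rals_g_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    # Generator: the relativistic roles of real and fake are swapped.
    return 0.5 * (torch.mean((real_logits - fake_logits.mean() + 1) ** 2)
                  + torch.mean((fake_logits - real_logits.mean() - 1) ** 2))
```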
MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension
Title | MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension |
Authors | Alon Talmor, Jonathan Berant |
Abstract | A large number of reading comprehension (RC) datasets have been created recently, but little analysis has been done on whether they generalize to one another, and on the extent to which existing datasets can be leveraged for improving performance on new ones. In this paper, we conduct such an investigation over ten RC datasets, training on one or more source RC datasets, and evaluating generalization, as well as transfer to a target RC dataset. We analyze the factors that contribute to generalization, and show that training on a source RC dataset and transferring to a target dataset substantially improves performance, even in the presence of powerful contextual representations from BERT (Devlin et al., 2019). We also find that training on multiple source RC datasets leads to robust generalization and transfer, and can reduce the cost of example collection for a new RC dataset. Following our analysis, we propose MultiQA, a BERT-based model, trained on multiple RC datasets, which leads to state-of-the-art performance on five RC datasets. We share our infrastructure for the benefit of the research community. |
Tasks | Reading Comprehension |
Published | 2019-05-31 |
URL | https://arxiv.org/abs/1905.13453v1 |
https://arxiv.org/pdf/1905.13453v1.pdf | |
PWC | https://paperswithcode.com/paper/multiqa-an-empirical-investigation-of |
Repo | https://github.com/alontalmor/multiqa |
Framework | pytorch |
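To make the multi-source training setup concrete, here is a minimal sketch, not the authors' released infrastructure, of streaming several tokenized RC datasets into one reader; the tensors below are dummy stand-ins for (input_ids, start_position, end_position) examples:

```python
# Minimal multi-dataset training stream via concatenation; all names here are
# illustrative placeholders rather than MultiQA's actual data pipeline.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def make_multi_source_loader(datasets, batch_size=32):
    """Mix examples from all source RC datasets into a single shuffled stream."""
    return DataLoader(ConcatDataset(datasets), batch_size=batch_size, shuffle=True)

# Dummy stand-ins for three tokenized source datasets:
sources = [TensorDataset(torch.randint(0, 30000, (64, 128)),   # input_ids
                         torch.randint(0, 128, (64,)),          # start positions
                         torch.randint(0, 128, (64,)))          # end positions
           for _ in range(3)]
loader = make_multi_source_loader(sources)
```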
Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies
Title | Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies |
Authors | Patrick Nadeem Ward, Ariella Smofsky, Avishek Joey Bose |
Abstract | Deep Reinforcement Learning (DRL) algorithms for continuous action spaces are known to be brittle with respect to hyperparameters as well as sample inefficient. Soft Actor Critic (SAC) proposes an off-policy deep actor critic algorithm within the maximum entropy RL framework which offers greater stability and empirical gains. The choice of policy distribution, a factored Gaussian, is motivated by its easy re-parametrization rather than its modeling power. We introduce Normalizing Flow policies within the SAC framework that learn more expressive classes of policies than simple factored Gaussians. We show empirically on continuous grid world tasks that our approach increases stability and is better suited to difficult exploration in sparse reward settings. |
Tasks | |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02771v1 |
https://arxiv.org/pdf/1906.02771v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-exploration-in-soft-actor-critic |
Repo | https://github.com/joeybose/FloRL |
Framework | pytorch |
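As an illustration of what a normalizing-flow policy head can look like, the sketch below stacks planar flows on a factored Gaussian base; the flow family, layer sizes, and the omission of SAC's tanh action squashing (and its log-det correction) are all our simplifications, not the paper's exact model:

```python
# Planar-flow policy sketch: sample from a factored Gaussian, push the sample
# through learned flows, and track log-probability via the change of variables.
import torch
import torch.nn as nn

class PlanarFlow(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(torch.randn(dim) * 0.01)
        self.u = nn.Parameter(torch.randn(dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        # f(z) = z + u * tanh(w^T z + b); invertibility needs w.u >= -1,
        # a constraint we omit for brevity.
        lin = z @ self.w + self.b                            # (batch,)
        f = z + self.u * torch.tanh(lin).unsqueeze(-1)       # (batch, dim)
        psi = (1 - torch.tanh(lin) ** 2).unsqueeze(-1) * self.w
        log_det = torch.log(torch.abs(1 + psi @ self.u) + 1e-8)
        return f, log_det

class FlowPolicy(nn.Module):
    def __init__(self, act_dim, n_flows=2):
        super().__init__()
        self.flows = nn.ModuleList(PlanarFlow(act_dim) for _ in range(n_flows))

    def sample(self, mu, log_std):
        base = torch.distributions.Normal(mu, log_std.exp())
        a = base.rsample()
        log_prob = base.log_prob(a).sum(-1)
        for flow in self.flows:
            a, log_det = flow(a)
            log_prob = log_prob - log_det   # change-of-variables correction
        return a, log_prob                   # feeds SAC's entropy term
```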
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram
Title | Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram |
Authors | Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim |
Abstract | We propose Parallel WaveGAN, a distillation-free, fast, and small-footprint waveform generation method using a generative adversarial network. In the proposed method, a non-autoregressive WaveNet is trained by jointly optimizing multi-resolution spectrogram and adversarial loss functions, which can effectively capture the time-frequency distribution of realistic speech waveforms. As our method does not require the density distillation used in the conventional teacher-student framework, the entire model can be easily trained. Furthermore, our model is able to generate high-fidelity speech even with its compact architecture. In particular, the proposed Parallel WaveGAN has only 1.44 M parameters and can generate 24 kHz speech waveforms 28.68 times faster than real-time on a single GPU. Perceptual listening test results verify that our proposed method achieves a 4.16 mean opinion score within a Transformer-based text-to-speech framework, which is comparable to the best distillation-based Parallel WaveNet system. |
Tasks | Speech Synthesis, Text-To-Speech Synthesis |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11480v2 |
https://arxiv.org/pdf/1910.11480v2.pdf | |
PWC | https://paperswithcode.com/paper/parallel-wavegan-a-fast-waveform-generation |
Repo | https://github.com/yanggeng1995/GAN-TTS |
Framework | pytorch |
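The multi-resolution spectrogram loss is the part of the method that is easiest to make concrete. Below is a reimplementation sketch, not the linked repo's code; the three STFT resolutions are typical values we assume, and each resolution combines spectral convergence with a log-magnitude L1 term:

```python
# Multi-resolution STFT auxiliary loss over (batch, samples) waveforms.
import torch
import torch.nn.functional as F

def stft_mag(x, n_fft, hop, win_len):
    window = torch.hann_window(win_len, device=x.device)
    spec = torch.stft(x, n_fft, hop_length=hop, win_length=win_len,
                      window=window, return_complex=True)
    return spec.abs().clamp(min=1e-7)

def multi_resolution_stft_loss(pred, target,
                               resolutions=((1024, 256, 1024),
                                            (2048, 512, 2048),
                                            (512, 128, 512))):
    loss = 0.0
    for n_fft, hop, win_len in resolutions:
        p = stft_mag(pred, n_fft, hop, win_len)
        t = stft_mag(target, n_fft, hop, win_len)
        sc = torch.norm(t - p) / torch.norm(t)        # spectral convergence
        mag = F.l1_loss(torch.log(p), torch.log(t))   # log STFT magnitude L1
        loss = loss + sc + mag
    return loss / len(resolutions)
```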
SCALOR: Generative World Models with Scalable Object Representations
Title | SCALOR: Generative World Models with Scalable Object Representations |
Authors | Jindong Jiang, Sepehr Janghorbani, Gerard de Melo, Sungjin Ahn |
Abstract | Scalability in terms of object density in a scene is a primary challenge in unsupervised sequential object-oriented representation learning. Most of the previous models have been shown to work only on scenes with a few objects. In this paper, we propose SCALOR, a probabilistic generative world model for learning SCALable Object-oriented Representation of a video. With the proposed spatially-parallel attention and proposal-rejection mechanisms, SCALOR can deal with orders of magnitude larger numbers of objects compared to the previous state-of-the-art models. Additionally, we introduce a background module that allows SCALOR to model complex dynamic backgrounds as well as many foreground objects in the scene. We demonstrate that SCALOR can deal with crowded scenes containing up to a hundred objects while jointly modeling complex dynamic backgrounds. Importantly, SCALOR is the first unsupervised object representation model shown to work for natural scenes containing several tens of moving objects. |
Tasks | Representation Learning |
Published | 2019-10-06 |
URL | https://arxiv.org/abs/1910.02384v4 |
https://arxiv.org/pdf/1910.02384v4.pdf | |
PWC | https://paperswithcode.com/paper/scalable-object-oriented-sequential-1 |
Repo | https://github.com/JindongJiang/JindongJiang.github.io |
Framework | none |
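One way to picture the spatially-parallel proposal idea is a convolutional head in which every feature-map cell proposes an object at once. The sketch below is a loose toy rendering of that mechanism under our own assumptions; it is not SCALOR's actual discovery, propagation, or rejection machinery:

```python
# Per-cell parallel object proposals: presence, where (box), and what (appearance).
import torch
import torch.nn as nn

class ParallelProposals(nn.Module):
    def __init__(self, in_ch, z_what_dim=32):
        super().__init__()
        self.pres = nn.Conv2d(in_ch, 1, 1)                 # per-cell presence logit
        self.where = nn.Conv2d(in_ch, 4, 1)                # per-cell box (cx, cy, w, h)
        self.what = nn.Conv2d(in_ch, 2 * z_what_dim, 1)    # appearance (mu, log_sigma)

    def forward(self, feat):                               # feat: (B, C, H, W)
        pres_prob = torch.sigmoid(self.pres(feat))
        where = self.where(feat)
        mu, log_sigma = self.what(feat).chunk(2, dim=1)
        z_what = mu + log_sigma.exp() * torch.randn_like(mu)  # reparameterized sample
        keep = pres_prob > 0.5        # crude stand-in for proposal rejection
        return pres_prob, where, z_what, keep
```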
Unsupervised Video Interpolation Using Cycle Consistency
Title | Unsupervised Video Interpolation Using Cycle Consistency |
Authors | Fitsum A. Reda, Deqing Sun, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro |
Abstract | Learning to synthesize high frame rate videos via interpolation requires large quantities of high frame rate training videos, which, however, are scarce, especially at high resolutions. Here, we propose unsupervised techniques to synthesize high frame rate videos directly from low frame rate videos using cycle consistency. For a triplet of consecutive frames, we optimize models to minimize the discrepancy between the center frame and its cycle reconstruction, obtained by interpolating back from interpolated intermediate frames. This simple unsupervised constraint alone achieves results comparable with supervision using the ground truth intermediate frames. We further introduce a pseudo supervised loss term that enforces the interpolated frames to be consistent with predictions of a pre-trained interpolation model. The pseudo supervised loss term, used together with cycle consistency, can effectively adapt a pre-trained model to a new target domain. With no additional data and in a completely unsupervised fashion, our techniques significantly improve pre-trained models on new target domains, increasing PSNR values from 32.84 dB to 33.05 dB on the Slowflow and from 31.82 dB to 32.53 dB on the Sintel evaluation datasets. |
Tasks | |
Published | 2019-06-13 |
URL | https://arxiv.org/abs/1906.05928v2 |
https://arxiv.org/pdf/1906.05928v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-video-interpolation-using-cycle |
Repo | https://github.com/NVIDIA/unsupervised-video-interpolation |
Framework | pytorch |
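The cycle-consistency constraint in the abstract translates almost directly into code. A minimal sketch, assuming `model` is any frame-interpolation network that maps two frames to their midpoint:

```python
# Cycle reconstruction of the center frame from interpolated intermediates.
import torch.nn.functional as F

def cycle_consistency_loss(model, f0, f1, f2):
    mid_01 = model(f0, f1)           # midpoint estimate between f0 and f1
    mid_12 = model(f1, f2)           # midpoint estimate between f1 and f2
    f1_rec = model(mid_01, mid_12)   # interpolating back should recover f1
    return F.l1_loss(f1_rec, f1)
```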
Fast Structured Decoding for Sequence Models
Title | Fast Structured Decoding for Sequence Models |
Authors | Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, Zhi-Hong Deng |
Abstract | Autoregressive sequence models achieve state-of-the-art performance in domains like machine translation. However, due to their autoregressive factorization, these models suffer from heavy latency during inference. Recently, non-autoregressive sequence models were proposed to reduce inference time. However, these models assume that the decoding process of each token is conditionally independent of others. Such a generation process sometimes makes the output sentence inconsistent, and thus the learned non-autoregressive models can only achieve inferior accuracy compared to their autoregressive counterparts. To improve the decoding consistency and reduce the inference cost at the same time, we propose to incorporate a structured inference module into the non-autoregressive models. Specifically, we design an efficient approximation of Conditional Random Fields (CRF) for non-autoregressive sequence models, and further propose a dynamic transition technique to model positional contexts in the CRF. Experiments in machine translation show that, while adding little latency (8-14 ms), our model achieves significantly better translation performance than previous non-autoregressive models on different translation datasets. In particular, for the WMT14 En-De dataset, our model obtains a BLEU score of 26.80, which largely outperforms the previous non-autoregressive baselines and is only 0.61 BLEU lower than purely autoregressive models. |
Tasks | Machine Translation |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11555v2 |
https://arxiv.org/pdf/1910.11555v2.pdf | |
PWC | https://paperswithcode.com/paper/fast-structured-decoding-for-sequence-models |
Repo | https://github.com/Edward-Sun/structured-nart |
Framework | pytorch |
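For context on the structured module: the exact forward recursion of a linear-chain CRF is sketched below. Over an MT vocabulary the full transition matrix is intractable, which is precisely why the paper proposes a low-rank approximation and dynamic transitions; this naive version, our own illustration, only works for a small label set:

```python
# Naive forward algorithm for the log partition function of a linear-chain CRF.
import torch

def crf_log_partition(emissions, transitions):
    """emissions: (T, V) per-position label scores; transitions: (V, V).
    Returns log Z, the normalizer over all length-T label sequences."""
    alpha = emissions[0]                                   # (V,)
    for t in range(1, emissions.size(0)):
        # alpha'_j = logsumexp_i(alpha_i + transitions_ij) + emissions_tj
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[t]
    return torch.logsumexp(alpha, dim=0)
```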
SweepNet: Wide-baseline Omnidirectional Depth Estimation
Title | SweepNet: Wide-baseline Omnidirectional Depth Estimation |
Authors | Changhee Won, Jongbin Ryu, Jongwoo Lim |
Abstract | Omnidirectional depth sensing has an advantage over conventional stereo systems since it enables us to recognize objects of interest in all directions without any blind regions. In this paper, we propose a novel wide-baseline omnidirectional stereo algorithm which computes a dense depth estimate from fisheye images using a deep convolutional neural network. The capture system consists of multiple cameras mounted on a wide-baseline rig with ultra-wide field of view (FOV) lenses, and we present a calibration algorithm for the extrinsic parameters based on bundle adjustment. Instead of estimating depth maps from multiple sets of rectified images and stitching them, our approach directly generates one dense omnidirectional depth map with full 360-degree coverage in the rig's global coordinate system. To this end, the proposed neural network is designed to output the cost volume from the warped images in the sphere-sweeping method, and the final depth map is estimated by taking the minimum-cost indices of the cost volume aggregated by SGM. For training the deep neural network and testing the entire system, realistic synthetic urban datasets are rendered using Blender. Experiments using the synthetic and real-world datasets show that our algorithm outperforms conventional depth estimation methods and generates highly accurate depth maps. |
Tasks | Calibration, Depth Estimation |
Published | 2019-02-28 |
URL | https://arxiv.org/abs/1902.10904v2 |
https://arxiv.org/pdf/1902.10904v2.pdf | |
PWC | https://paperswithcode.com/paper/sweepnet-wide-baseline-omnidirectional-depth |
Repo | https://github.com/hyu-cvlab/sweepnet |
Framework | none |
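The sweeping step can be pictured as stacking per-hypothesis matching costs into a volume. The sketch below is a generic plane/sphere-sweep toy, with `warp_to_depth` as a hypothetical placeholder for the paper's fisheye spherical warp; the real system feeds the volume to SGM rather than taking a per-pixel argmin:

```python
# Toy cost volume over depth hypotheses from warped source features.
import torch

def build_cost_volume(ref_feat, src_feats, depth_hyps, warp_to_depth):
    """ref_feat: (C, H, W); src_feats: iterable of (C, H, W); returns (D, H, W)."""
    costs = []
    for d in depth_hyps:
        diffs = [(ref_feat - warp_to_depth(f, d)).abs().mean(dim=0)  # (H, W)
                 for f in src_feats]
        costs.append(torch.stack(diffs).mean(dim=0))
    return torch.stack(costs)        # (D, H, W); winner-takes-all: .argmin(dim=0)
```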
TinyBERT: Distilling BERT for Natural Language Understanding
Title | TinyBERT: Distilling BERT for Natural Language Understanding |
Authors | Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu |
Abstract | Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, so it is difficult to execute them effectively on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel transformer distillation method that is a specially designed knowledge distillation (KD) method for transformer-based models. By leveraging this new KD method, the plentiful knowledge encoded in a large teacher BERT can be well transferred to a small student TinyBERT. Moreover, we introduce a new two-stage learning framework for TinyBERT, which performs transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and task-specific knowledge of the teacher BERT. TinyBERT is empirically effective and achieves more than 96% of the performance of the teacher BERT-Base on the GLUE benchmark, while being 7.5x smaller and 9.4x faster at inference. TinyBERT is also significantly better than state-of-the-art baselines on BERT distillation, with only about 28% of their parameters and about 31% of their inference time. |
Tasks | Language Modelling, Linguistic Acceptability, Natural Language Inference, Question Answering, Semantic Textual Similarity |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.10351v4 |
https://arxiv.org/pdf/1909.10351v4.pdf | |
PWC | https://paperswithcode.com/paper/190910351 |
Repo | https://github.com/huawei-noah/Pretrained-Language-Model/tree/master/TinyBERT |
Framework | tf |
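Our reading of the transformer-distillation idea, sketched in miniature: match student attention maps to the teacher's, and match hidden states through a learned projection that lifts the narrower student width to the teacher width. Layer mapping and loss weights are assumptions here, not the paper's settings:

```python
# Attention-map and hidden-state distillation losses for a smaller student.
import torch.nn as nn
import torch.nn.functional as F

def attention_loss(student_attn, teacher_attn):
    """Both: (batch, heads, seq, seq) attention matrices of mapped layers."""
    return F.mse_loss(student_attn, teacher_attn)

class HiddenStateLoss(nn.Module):
    def __init__(self, d_student, d_teacher):
        super().__init__()
        self.proj = nn.Linear(d_student, d_teacher)  # lift student width to teacher width

    def forward(self, student_h, teacher_h):
        # student_h: (batch, seq, d_student); teacher_h: (batch, seq, d_teacher)
        return F.mse_loss(self.proj(student_h), teacher_h)
```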
Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs
Title | Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs |
Authors | Yuting Wu, Xiao Liu, Yansong Feng, Zheng Wang, Rui Yan, Dongyan Zhao |
Abstract | Entity alignment is the task of linking entities with the same real-world identity from different knowledge graphs (KGs), which has been recently dominated by embedding-based methods. Such approaches work by learning KG representations so that entity alignment can be performed by measuring the similarities between entity embeddings. While promising, prior works in the field often fail to properly capture complex relation information that commonly exists in multi-relational KGs, leaving much room for improvement. In this paper, we propose a novel Relation-aware Dual-Graph Convolutional Network (RDGCN) to incorporate relation information via attentive interactions between the knowledge graph and its dual relation counterpart, and further capture neighboring structures to learn better entity representations. Experiments on three real-world cross-lingual datasets show that our approach delivers better and more robust results over the state-of-the-art alignment methods by learning better KG representations. |
Tasks | Entity Alignment, Entity Embeddings, Knowledge Graphs |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08210v1 |
https://arxiv.org/pdf/1908.08210v1.pdf | |
PWC | https://paperswithcode.com/paper/relation-aware-entity-alignment-for |
Repo | https://github.com/StephanieWyt/RDGCN |
Framework | tf |
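The final alignment step, measuring similarities between entity embeddings, is simple to sketch. Below, a generic GCN layer stands in for RDGCN's relation-aware dual-graph encoder (which this sketch does not implement), and alignment is nearest-neighbor search under cosine similarity:

```python
# Embedding-based entity alignment by cosine nearest neighbors.
import torch
import torch.nn.functional as F

def gcn_layer(adj, x, weight):
    """adj: (N, N) normalized adjacency; x: (N, d_in); weight: (d_in, d_out)."""
    return F.relu(adj @ x @ weight)

def align(emb_kg1, emb_kg2):
    """For each KG1 entity, return the index of its most similar KG2 entity."""
    sim = F.normalize(emb_kg1, dim=1) @ F.normalize(emb_kg2, dim=1).T
    return sim.argmax(dim=1)
```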
Resultant Based Incremental Recovery of Camera Pose from Pairwise Matches
Title | Resultant Based Incremental Recovery of Camera Pose from Pairwise Matches |
Authors | Yoni Kasten, Meirav Galun, Ronen Basri |
Abstract | Incremental (online) structure from motion pipelines seek to recover the camera matrix associated with an image $I_n$ given $n-1$ images, $I_1,\ldots,I_{n-1}$, whose camera matrices have already been recovered. In this paper, we introduce a novel solution to the six-point online algorithm to recover the exterior parameters associated with $I_n$. Our algorithm uses just six corresponding pairs of 2D points, extracted each from $I_n$ and from \textit{any} of the preceding $n-1$ images, allowing the recovery of the full six degrees of freedom of the $n$'th camera, and unlike common methods, does not require tracking feature points in three or more images. Our novel solution is based on constructing a Dixon resultant, yielding a solution method that is both efficient and accurate compared to existing solutions. We further use Bernstein's theorem to prove a tight bound on the number of complex solutions. Our experiments demonstrate the utility of our approach. |
Tasks | |
Published | 2019-01-27 |
URL | http://arxiv.org/abs/1901.09364v1 |
http://arxiv.org/pdf/1901.09364v1.pdf | |
PWC | https://paperswithcode.com/paper/resultant-based-incremental-recovery-of |
Repo | https://github.com/ykasten/resultantCamPose |
Framework | none |
A King's Ransom for Encryption: Ransomware Classification using Augmented One-Shot Learning and Bayesian Approximation
Title | A King's Ransom for Encryption: Ransomware Classification using Augmented One-Shot Learning and Bayesian Approximation |
Authors | Amir Atapour-Abarghouei, Stephen Bonner, Andrew Stephen McGough |
Abstract | Newly emerging variants of ransomware pose an ever-growing threat to computer systems governing every aspect of modern life through the handling and analysis of big data. While various recent security-based approaches have focused on detecting and classifying ransomware at the network or system level, easy-to-use post-infection ransomware classification for the lay user has not been attempted before. In this paper, we investigate the possibility of classifying the ransomware a system is infected with simply based on a screenshot of the splash screen or the ransom note captured using a consumer camera commonly found in any modern mobile device. To train and evaluate our system, we create a sample dataset of the splash screens of 50 well-known ransomware variants. In our dataset, only a single training image is available per ransomware. Instead of creating a large training dataset of ransomware screenshots, we simulate screenshot capture conditions via carefully designed data augmentation techniques, enabling simple and efficient one-shot learning. Moreover, using model uncertainty obtained via Bayesian approximation, we ensure special input cases such as unrelated non-ransomware images and previously-unseen ransomware variants are correctly identified for special handling and not misclassified. Extensive experimental evaluation demonstrates the efficacy of our work, with accuracy levels of up to 93.6% for ransomware classification. |
Tasks | Data Augmentation, One-Shot Learning |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06750v1 |
https://arxiv.org/pdf/1908.06750v1.pdf | |
PWC | https://paperswithcode.com/paper/a-kings-ransom-for-encryption-ransomware |
Repo | https://github.com/atapour/ransomware-classification |
Framework | pytorch |
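The "model uncertainty obtained via Bayesian approximation" is commonly realized with Monte Carlo dropout, which we assume here purely as an illustration: keep dropout stochastic at test time, average several passes, and reject high-entropy inputs as unknown. The threshold and the classifier itself are placeholders:

```python
# MC-dropout prediction with an entropy-based reject option.
import torch
import torch.nn as nn

def enable_dropout_only(model: nn.Module):
    """Eval mode everywhere, but keep Dropout layers stochastic."""
    model.eval()
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()

def mc_dropout_predict(model, x, n_samples=20, reject_entropy=1.0):
    enable_dropout_only(model)
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(n_samples)]).mean(dim=0)
    entropy = -(probs * probs.clamp(min=1e-12).log()).sum(dim=-1)
    return probs.argmax(dim=-1), entropy, entropy > reject_entropy
```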
Integration of Static and Dynamic Analysis for Malware Family Classification with Composite Neural Network
Title | Integration of Static and Dynamic Analysis for Malware Family Classification with Composite Neural Network |
Authors | Yao Saint Yen, Zhe Wei Chen, Ying Ren Guo, Meng Chang Chen |
Abstract | Deep learning has been used in malware analysis research. Most classification methods use either static analysis features or dynamic analysis features for malware family classification; they rarely combine the two as classification features, and little effort has been spent on integrating the two types of features. In this paper, we combine static and dynamic analysis features with deep neural networks for Windows malware classification. We develop several methods to generate static and dynamic analysis features to classify malware in different ways. Given these features, we conduct experiments with a composite neural network, showing that the proposed approach performs best with an accuracy of 83.17% on a total of 80 malware families with 4519 malware samples. Additionally, we show that using integrated features for malware family classification outperforms using static features or dynamic features alone, and we show how static and dynamic features complement each other for malware classification. |
Tasks | Malware Classification |
Published | 2019-12-24 |
URL | https://arxiv.org/abs/1912.11249v1 |
https://arxiv.org/pdf/1912.11249v1.pdf | |
PWC | https://paperswithcode.com/paper/integration-of-static-and-dynamic-analysis |
Repo | https://github.com/guelfoweb/peframe |
Framework | none |
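A composite two-branch network of the kind the abstract describes can be sketched as follows; fusion by concatenation and the layer sizes are our assumptions, while the 80-family output matches the abstract:

```python
# Two-branch network fusing static and dynamic malware features.
import torch
import torch.nn as nn

class CompositeMalwareNet(nn.Module):
    def __init__(self, static_dim, dynamic_dim, n_families=80):
        super().__init__()
        self.static_branch = nn.Sequential(nn.Linear(static_dim, 128), nn.ReLU())
        self.dynamic_branch = nn.Sequential(nn.Linear(dynamic_dim, 128), nn.ReLU())
        self.head = nn.Linear(256, n_families)   # classify over malware families

    def forward(self, static_x, dynamic_x):
        fused = torch.cat([self.static_branch(static_x),
                           self.dynamic_branch(dynamic_x)], dim=-1)
        return self.head(fused)
```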
KiloGrams: Very Large N-Grams for Malware Classification
Title | KiloGrams: Very Large N-Grams for Malware Classification |
Authors | Edward Raff, William Fleming, Richard Zak, Hyrum Anderson, Bill Finlayson, Charles Nicholas, Mark McLean |
Abstract | N-grams have been a common tool for information retrieval and machine learning applications for decades. In nearly all previous works, only a few values of $n$ are tested, with $n > 6$ being exceedingly rare. Larger values of $n$ are not tested due to computational burden or the fear of overfitting. In this work, we present a method to find the top-$k$ most frequent $n$-grams that is 60$\times$ faster for small $n$, and can tackle large $n\geq1024$. Despite the unprecedented size of $n$ considered, we show how these features still have predictive ability for malware classification tasks. More importantly, large $n$-grams provide benefits in producing features that are interpretable by malware analysts, and can be used to create general-purpose signatures compatible with industry-standard tools like Yara. Furthermore, the counts of common $n$-grams in a file may be added as features to publicly available human-engineered features that rival the efficacy of professionally developed features when used to train gradient-boosted decision tree models on the EMBER dataset. |
Tasks | Information Retrieval, Malware Classification |
Published | 2019-08-01 |
URL | https://arxiv.org/abs/1908.00200v1 |
https://arxiv.org/pdf/1908.00200v1.pdf | |
PWC | https://paperswithcode.com/paper/kilograms-very-large-n-grams-for-malware |
Repo | https://github.com/NeuromorphicComputationResearchProgram/KiloGrams |
Framework | none |
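The speedup comes from counting hashes of n-grams rather than the n-grams themselves. A two-pass sketch of that strategy follows, with the bucket count, hash parameters, and exact pass structure being our assumptions rather than the paper's constants:

```python
# Pass 1: count rolling-hash buckets. Pass 2: recover n-grams in top-k buckets.
from collections import Counter

B = 2 ** 24                     # number of hash buckets (assumed)
BASE, MOD = 257, (1 << 61) - 1  # polynomial rolling-hash parameters (assumed)

def rolling_hashes(data: bytes, n: int):
    """Yield (offset, bucket) for every n-gram of `data` in O(len(data))."""
    h, power = 0, pow(BASE, n - 1, MOD)
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= n - 1:
            yield i - n + 1, h % B
            h = (h - data[i - n + 1] * power) % MOD   # drop the leading byte

def top_k_ngrams(files, n=1024, k=1000):
    bucket_counts = Counter()
    for data in files:                                # pass 1: bucket counting
        for _, b in rolling_hashes(data, n):
            bucket_counts[b] += 1
    top_buckets = {b for b, _ in bucket_counts.most_common(k)}
    grams = Counter()
    for data in files:                                # pass 2: exact n-grams
        for off, b in rolling_hashes(data, n):
            if b in top_buckets:
                grams[data[off:off + n]] += 1
    return grams.most_common(k)
```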
Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following
Title | Pre-Learning Environment Representations for Data-Efficient Neural Instruction Following |
Authors | David Gaddy, Dan Klein |
Abstract | We consider the problem of learning to map from natural language instructions to state transitions (actions) in a data-efficient manner. Our method takes inspiration from the idea that it should be easier to ground language to concepts that have already been formed through pre-linguistic observation. We augment a baseline instruction-following learner with an initial environment-learning phase that uses observations of language-free state transitions to induce a suitable latent representation of actions before processing the instruction-following training data. We show that mapping to pre-learned representations substantially improves performance over systems whose representations are learned from limited instructional data alone. |
Tasks | |
Published | 2019-07-23 |
URL | https://arxiv.org/abs/1907.09671v1 |
https://arxiv.org/pdf/1907.09671v1.pdf | |
PWC | https://paperswithcode.com/paper/pre-learning-environment-representations-for |
Repo | https://github.com/dgaddy/environment-learning |
Framework | pytorch |
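The pre-linguistic phase can be pictured as an autoencoder over language-free state transitions whose bottleneck becomes the latent action space that language is later grounded into. Everything below (sizes, MSE objective, module names) is a placeholder sketch, not the paper's architecture:

```python
# Induce latent "actions" z from (s, s') transitions before seeing language.
import torch
import torch.nn as nn

class TransitionAutoencoder(nn.Module):
    def __init__(self, state_dim, z_dim=16):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(2 * state_dim, 64), nn.ReLU(),
                                    nn.Linear(64, z_dim))
        self.decode = nn.Sequential(nn.Linear(state_dim + z_dim, 64), nn.ReLU(),
                                    nn.Linear(64, state_dim))

    def forward(self, s, s_next):
        z = self.encode(torch.cat([s, s_next], dim=-1))       # latent action
        s_next_hat = self.decode(torch.cat([s, z], dim=-1))   # predicted next state
        return s_next_hat, z

# Phase 1: minimize MSE(s_next_hat, s_next) on language-free transitions.
# Phase 2: train an instruction encoder to predict z from the instruction text.
```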