April 3, 2020

3561 words 17 mins read

Paper Group AWR 76

Self-guided Approximate Linear Programs. Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Improved Techniques for Training Single-Image GANs. A C Code Generator for Fast Inference and Simple Deployment of Convolutional Neural Networks on Resource Constrained Systems. Image Fine-grained …

Self-guided Approximate Linear Programs

Title Self-guided Approximate Linear Programs
Authors Parshan Pakiman, Selvaprabu Nadarajah, Negar Soheili, Qihang Lin
Abstract Approximate linear programs (ALPs) are well-known models based on value function approximations (VFAs) to obtain heuristic policies and lower bounds on the optimal policy cost of Markov decision processes (MDPs). The ALP VFA is a linear combination of predefined basis functions that are chosen using domain knowledge and updated heuristically if the ALP optimality gap is large. We side-step the need for such basis function engineering in ALP – an implementation bottleneck – by proposing a sequence of ALPs that embed increasing numbers of random basis functions obtained via inexpensive sampling. We provide a sampling guarantee and show that the VFAs from this sequence of models converge to the exact value function. Nevertheless, the performance of the ALP policy can fluctuate significantly as more basis functions are sampled. To mitigate these fluctuations, we “self-guide” our convergent sequence of ALPs using past VFA information such that a worst-case measure of policy performance is improved. We perform numerical experiments on perishable inventory control and generalized joint replenishment applications, which, respectively, give rise to challenging discounted-cost MDPs and average-cost semi-MDPs. We find that self-guided ALPs (i) significantly reduce policy cost fluctuations and improve the optimality gaps from an ALP approach that employs basis functions tailored to the former application, and (ii) deliver optimality gaps that are comparable to a known adaptive basis function generation approach targeting the latter application. More broadly, our methodology provides application-agnostic policies and lower bounds to benchmark approaches that exploit application structure.
Tasks
Published 2020-01-09
URL https://arxiv.org/abs/2001.02798v1
PDF https://arxiv.org/pdf/2001.02798v1.pdf
PWC https://paperswithcode.com/paper/self-guided-approximate-linear-programs
Repo https://github.com/Self-guided-Approximate-Linear-Programs/Self-guided-ALPs-and-Related-Benchmarks
Framework none
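
As a rough illustration of the approach described in the abstract above, the sketch below builds an approximate linear program whose value function approximation uses randomly sampled basis functions (random Fourier features here) on a toy random discounted-cost MDP. The MDP, the basis family, and all sizes are illustrative assumptions, not the perishable-inventory or joint-replenishment instances from the paper, and the self-guiding constraints that stabilize the policy as more bases are added are omitted.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_states, n_actions, gamma, n_basis = 50, 4, 0.95, 30

# Toy random discounted-cost MDP (a stand-in for the applications in the paper).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state distribution
cost = rng.random((n_states, n_actions))                          # per-stage costs

# Random basis functions sampled cheaply: random Fourier features of a 1-d state index.
s_grid = np.linspace(0.0, 1.0, n_states)[:, None]
w = rng.normal(size=(1, n_basis))
b = rng.uniform(0.0, 2.0 * np.pi, n_basis)
Phi = np.cos(s_grid @ w + b)                                       # (n_states, n_basis) basis matrix

# ALP: maximize sum_s Phi[s] @ theta  s.t.  Phi[s] @ theta <= cost[s, a] + gamma * E[Phi[s'] @ theta].
A_ub = np.vstack([Phi[s] - gamma * P[s, a] @ Phi
                  for s in range(n_states) for a in range(n_actions)])
b_ub = np.array([cost[s, a] for s in range(n_states) for a in range(n_actions)])
res = linprog(c=-Phi.sum(axis=0), A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
theta = res.x
print("VFA lower bound on the optimal cost of state 0:", Phi[0] @ theta)
```

Re-solving with a growing `n_basis` mimics the paper's sequence of ALPs; what the sketch leaves out is the self-guiding constraint that keeps each new VFA from degrading the worst-case policy performance of its predecessor.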

Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data

Title Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data
Authors Charles H. Martin, Tongsu Peng, Michael W. Mahoney
Abstract In many applications, one works with deep neural network (DNN) models trained by someone else. For such pretrained models, one typically does not have access to training/test data. Moreover, one does not know many details about the model, such as the specifics of the training data, the loss function, the hyperparameter values, etc. Given one or many pretrained models, can one say anything about the expected performance or quality of the models? Here, we present and evaluate empirical quality metrics for pretrained DNN models at scale. Using the open-source WeightWatcher tool, we analyze hundreds of publicly-available pretrained models, including older and current state-of-the-art models in CV and NLP. We examine norm-based capacity control metrics as well as newer Power Law (PL) based metrics (including fitted PL exponents and a Weighted Alpha metric), from the recently-developed Theory of Heavy-Tailed Self Regularization. Norm-based metrics correlate well with reported test accuracies for well-trained models across nearly all CV architecture series. On the other hand, norm-based metrics cannot distinguish “good-versus-bad” models—which, arguably, is the point of needing quality metrics. Indeed, they may give spurious results. PL-based metrics do much better—quantitatively better at discriminating series of “good-better-best” models, and qualitatively better at discriminating “good-versus-bad” models. PL-based metrics can also be used to characterize fine-scale properties of models, and we introduce the layer-wise Correlation Flow as a new quality assessment. We show how poorly-trained (and/or poorly fine-tuned) models may exhibit both Scale Collapse and unusually large PL exponents, in particular for recent NLP models. Our techniques can be used to identify when a pretrained DNN has problems that cannot be detected simply by examining training/test accuracies.
Tasks
Published 2020-02-17
URL https://arxiv.org/abs/2002.06716v1
PDF https://arxiv.org/pdf/2002.06716v1.pdf
PWC https://paperswithcode.com/paper/predicting-trends-in-the-quality-of-state-of
Repo https://github.com/CalculatedContent/WeightWatcher
Framework pytorch
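
Heavy-tailed metrics of this kind can be approximated from the weights alone. Below is a minimal sketch, not the WeightWatcher implementation: for each weight matrix it computes the eigenvalue spectrum of W^T W, fits a power-law tail exponent with a simple Hill-type estimator, and averages alpha * log10(lambda_max) into a single weighted-alpha-style score. The tail fraction and the aggregation rule are assumptions made for brevity.

```python
import numpy as np
import torch
import torchvision.models as models

def pl_alpha(evals, tail_frac=0.5):
    """Hill-type MLE for a power-law tail exponent over the largest eigenvalues."""
    tail = np.sort(evals)[-max(2, int(len(evals) * tail_frac)):]
    xmin = tail[0]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def layer_metrics(model):
    out = []
    for name, p in model.named_parameters():
        if p.ndim < 2 or "weight" not in name:
            continue
        W = p.detach().reshape(p.shape[0], -1).cpu().numpy()
        evals = np.linalg.svd(W, compute_uv=False) ** 2   # eigenvalues of W^T W
        out.append((name, pl_alpha(evals), evals.max()))
    return out

model = models.resnet18(pretrained=True)                  # any pretrained model; no data needed
per_layer = layer_metrics(model)
# one aggregate quality-style score: average of alpha_i * log10(lambda_max_i) across layers
weighted_alpha = np.mean([a * np.log10(lmax) for _, a, lmax in per_layer])
print(f"weighted alpha ~ {weighted_alpha:.2f}")
```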

Improved Techniques for Training Single-Image GANs

Title Improved Techniques for Training Single-Image GANs
Authors Tobias Hinz, Matthew Fisher, Oliver Wang, Stefan Wermter
Abstract Recently there has been an interest in the potential of learning generative models from a single image, as opposed to from a large dataset. This task is of practical significance, as it means that generative models can be used in domains where collecting a large dataset is not feasible. However, training a model capable of generating realistic images from only a single sample is a difficult problem. In this work, we conduct a number of experiments to understand the challenges of training these methods and propose some best practices that we found allowed us to generate improved results over previous work in this space. One key piece is that, unlike prior single image generation methods, we concurrently train several stages in a sequential multi-stage manner, allowing us to learn models with fewer stages of increasing image resolution. Compared to a recent state-of-the-art baseline, our model is up to six times faster to train, has fewer parameters, and can better capture the global structure of images.
Tasks Image Generation
Published 2020-03-25
URL https://arxiv.org/abs/2003.11512v1
PDF https://arxiv.org/pdf/2003.11512v1.pdf
PWC https://paperswithcode.com/paper/improved-techniques-for-training-single-image
Repo https://github.com/tohinz/ConSinGAN
Framework pytorch

A C Code Generator for Fast Inference and Simple Deployment of Convolutional Neural Networks on Resource Constrained Systems

Title A C Code Generator for Fast Inference and Simple Deployment of Convolutional Neural Networks on Resource Constrained Systems
Authors Oliver Urbann, Simon Camphausen, Arne Moos, Ingmar Schwarz, Sören Kerner, Maximilian Otten
Abstract Inference of Convolutional Neural Networks in time-critical applications usually requires a GPU. In robotics or embedded devices these are often not available due to energy, space and cost constraints. Furthermore, installation of a deep learning framework or even a native compiler on the target platform is not possible. This paper presents a neural network code generator (NNCG) that generates from a trained CNN a plain ANSI C code file that encapsulates the inference in a single function. It can easily be included in existing projects and, due to the lack of dependencies, cross compilation is usually possible. Additionally, the code generation is optimized based on the known trained CNN and target platform following four design principles. The system is evaluated utilizing a small CNN designed for this application. Compared to TensorFlow XLA and Glow, speed-ups of up to 11.81 can be shown and even GPUs are outperformed regarding latency.
Tasks Code Generation
Published 2020-01-14
URL https://arxiv.org/abs/2001.05572v1
PDF https://arxiv.org/pdf/2001.05572v1.pdf
PWC https://paperswithcode.com/paper/a-c-code-generator-for-fast-inference-and
Repo https://github.com/iml130/nncg
Framework tf
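
To make the idea concrete, here is a small, hypothetical generator in the same spirit, not the NNCG tool itself: it bakes the weights of one trained dense layer into a dependency-free ANSI C function. The layer name, formatting, and the ReLU choice are illustrative assumptions.

```python
import numpy as np

def emit_dense_layer_c(weights, bias, name="dense0"):
    """Emit a plain C function computing y = relu(W x + b) with weights baked in as literals."""
    n_out, n_in = weights.shape
    w = ", ".join(f"{v:.8f}f" for v in weights.flatten())
    b = ", ".join(f"{v:.8f}f" for v in bias)
    return f"""
static const float {name}_w[{n_out * n_in}] = {{{w}}};
static const float {name}_b[{n_out}] = {{{b}}};

void {name}(const float x[{n_in}], float y[{n_out}]) {{
    for (int o = 0; o < {n_out}; ++o) {{
        float acc = {name}_b[o];
        for (int i = 0; i < {n_in}; ++i)
            acc += {name}_w[o * {n_in} + i] * x[i];
        y[o] = acc > 0.0f ? acc : 0.0f;  /* ReLU */
    }}
}}
"""

rng = np.random.default_rng(0)
print(emit_dense_layer_c(rng.normal(size=(4, 8)), rng.normal(size=4)))
```

The emitted file has no dependencies beyond the C standard, which is what makes cross compilation for constrained targets straightforward; per-platform optimizations such as loop unrolling or SIMD hints would be added at this generation step.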

Image Fine-grained Inpainting

Title Image Fine-grained Inpainting
Authors Zheng Hui, Jie Li, Xiumei Wang, Xinbo Gao
Abstract Image inpainting techniques have shown promising improvement with the assistance of generative adversarial networks (GANs) recently. However, most of them often suffer from completed results with unreasonable structure or blurriness. To mitigate this problem, in this paper, we present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields. Benefiting from this property of the network, we can more easily recover large regions in an incomplete image. To better train this efficient generator, in addition to the frequently-used VGG feature matching loss, we design a novel self-guided regression loss for concentrating on uncertain areas and enhancing the semantic details. Besides, we devise a geometrical alignment constraint term to compensate for the pixel-based distance between prediction features and ground-truth ones. We also employ a discriminator with local and global branches to ensure local-global content consistency. To further improve the quality of generated images, discriminator feature matching on the local branch is introduced, which dynamically minimizes the discrepancy between intermediate features of synthetic and ground-truth patches. Extensive experiments on several public datasets demonstrate that our approach outperforms current state-of-the-art methods. Code is available at https://github.com/Zheng222/DMFN.
Tasks Facial Inpainting, Fine-Grained Image Inpainting, Image Inpainting
Published 2020-02-07
URL https://arxiv.org/abs/2002.02609v1
PDF https://arxiv.org/pdf/2002.02609v1.pdf
PWC https://paperswithcode.com/paper/image-fine-grained-inpainting
Repo https://github.com/Zheng222/DMFN
Framework none
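
The core ingredient, dense combinations of dilated convolutions, can be sketched in a few lines of PyTorch. The block below is an assumption-laden approximation of the idea (channel counts, dilation rates, and the residual fusion are illustrative), not the DMFN architecture itself.

```python
import torch
import torch.nn as nn

class DenseDilatedBlock(nn.Module):
    """Dense combination of dilated convolutions: each branch sees the concatenation of the
    input and all previous branch outputs, enlarging the effective receptive field."""
    def __init__(self, channels=64, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList()
        for i, d in enumerate(dilations):
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels * (i + 1), channels, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            ))
        self.fuse = nn.Conv2d(channels * (len(dilations) + 1), channels, 1)

    def forward(self, x):
        feats = [x]
        for branch in self.branches:
            feats.append(branch(torch.cat(feats, dim=1)))
        return self.fuse(torch.cat(feats, dim=1)) + x      # residual fusion keeps resolution

x = torch.randn(1, 64, 128, 128)
print(DenseDilatedBlock()(x).shape)                        # torch.Size([1, 64, 128, 128])
```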

FastGAE: Fast, Scalable and Effective Graph Autoencoders with Stochastic Subgraph Decoding

Title FastGAE: Fast, Scalable and Effective Graph Autoencoders with Stochastic Subgraph Decoding
Authors Guillaume Salha, Romain Hennequin, Jean-Baptiste Remy, Manuel Moussallam, Michalis Vazirgiannis
Abstract Graph autoencoders (AE) and variational autoencoders (VAE) are powerful node embedding methods, but suffer from scalability issues. In this paper, we introduce FastGAE, a general framework to scale graph AE and VAE to large graphs with millions of nodes and edges. Our strategy, based on node sampling and subgraph decoding, significantly speeds up the training of graph AE and VAE while preserving or even improving performance. We demonstrate the effectiveness of FastGAE on numerous real-world graphs, outperforming the few existing approaches to scale graph AE and VAE by a wide margin.
Tasks
Published 2020-02-05
URL https://arxiv.org/abs/2002.01910v3
PDF https://arxiv.org/pdf/2002.01910v3.pdf
PWC https://paperswithcode.com/paper/fastgae-fast-scalable-and-effective-graph
Repo https://github.com/deezer/fastgae
Framework none
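
The decoding trick is easy to illustrate: instead of reconstructing the full N x N adjacency, score only the pairs among a sampled node subset. The sketch below uses uniform sampling and a dense adjacency matrix purely for readability; the paper's sampling distribution (e.g., degree-based) and sparse graph representations are simplified away.

```python
import torch
import torch.nn.functional as F

def subsampled_reconstruction_loss(z, adj, n_sample=256):
    """Decode only a random subgraph: score pairs among n_sample nodes instead of all N^2 pairs.
    z: (N, d) node embeddings from the encoder; adj: dense 0/1 adjacency (illustration only)."""
    idx = torch.randperm(z.size(0))[:n_sample]
    z_s, a_s = z[idx], adj[idx][:, idx]
    logits = z_s @ z_s.t()                                  # inner-product decoder on the subgraph
    pos_weight = (a_s.numel() - a_s.sum()) / a_s.sum().clamp(min=1.0)
    return F.binary_cross_entropy_with_logits(logits, a_s, pos_weight=pos_weight)

z = torch.randn(2000, 16, requires_grad=True)               # embeddings for a 2000-node toy graph
adj = (torch.rand(2000, 2000) < 0.005).float()
subsampled_reconstruction_loss(z, adj).backward()
```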

Domain Adaptive Ensemble Learning

Title Domain Adaptive Ensemble Learning
Authors Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang
Abstract The problem of generalizing deep neural networks from multiple source domains to a target one is studied under two settings: When unlabeled target data is available, it is a multi-source unsupervised domain adaptation (UDA) problem, otherwise a domain generalization (DG) problem. We propose a unified framework termed domain adaptive ensemble learning (DAEL) to address both problems. A DAEL model is composed of a CNN feature extractor shared across domains and multiple classifier heads each trained to specialize in a particular source domain. Each such classifier is an expert for its own domain and a non-expert for the others. DAEL aims to learn these experts collaboratively so that when forming an ensemble, they can leverage complementary information from each other to be more effective for an unseen target domain. To this end, each source domain is used in turn as a pseudo-target-domain with its own expert providing a supervision signal to the ensemble of non-experts learned from the other sources. For unlabeled target data under the UDA setting, where a real expert does not exist, DAEL uses pseudo-labels to supervise the ensemble learning. Extensive experiments on three multi-source UDA datasets and two DG datasets show that DAEL improves the state-of-the-art on both problems, often by significant margins. The code is released at \url{https://github.com/KaiyangZhou/Dassl.pytorch}.
Tasks Domain Adaptation, Domain Generalization, Unsupervised Domain Adaptation
Published 2020-03-16
URL https://arxiv.org/abs/2003.07325v1
PDF https://arxiv.org/pdf/2003.07325v1.pdf
PWC https://paperswithcode.com/paper/domain-adaptive-ensemble-learning
Repo https://github.com/KaiyangZhou/Dassl.pytorch
Framework pytorch
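
A stripped-down version of the collaborative objective might look like the sketch below: a shared feature extractor, one classifier head per source domain, and each domain acting in turn as the pseudo-target whose expert head supervises the ensemble of the other heads. The toy backbone, the MSE consistency loss, and the omission of data augmentation and the UDA branch are all simplifying assumptions, not the DAEL training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_domains, feat_dim, n_classes = 3, 128, 10
backbone = nn.Sequential(nn.Linear(32, feat_dim), nn.ReLU())        # shared feature extractor (toy)
heads = nn.ModuleList([nn.Linear(feat_dim, n_classes) for _ in range(n_domains)])

def collaborative_step(batches):
    """One step: each source domain in turn acts as pseudo-target; its own expert head provides
    the (soft) supervision signal for the ensemble of the other, non-expert heads."""
    loss = 0.0
    for d, (x, y) in enumerate(batches):
        feats = backbone(x)
        loss = loss + F.cross_entropy(heads[d](feats), y)            # expert learns its own domain
        with torch.no_grad():
            target = F.softmax(heads[d](feats), dim=1)               # expert's soft prediction
        non_experts = [F.softmax(heads[k](feats), dim=1) for k in range(n_domains) if k != d]
        ensemble = torch.stack(non_experts).mean(dim=0)
        loss = loss + F.mse_loss(ensemble, target)                   # non-expert ensemble mimics expert
    return loss

batches = [(torch.randn(16, 32), torch.randint(0, n_classes, (16,))) for _ in range(n_domains)]
collaborative_step(batches).backward()
```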

Semi-Supervised Speech Recognition via Local Prior Matching

Title Semi-Supervised Speech Recognition via Local Prior Matching
Authors Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Hannun
Abstract For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability. In this work, we propose local prior matching (LPM), a semi-supervised objective that distills knowledge from a strong prior (e.g. a language model) to provide learning signal to a discriminative model trained on unlabeled speech. We demonstrate that LPM is theoretically well-motivated, simple to implement, and superior to existing knowledge distillation techniques under comparable settings. Starting from a baseline trained on 100 hours of labeled speech, with an additional 360 hours of unlabeled data, LPM recovers 54% and 73% of the word error rate on clean and noisy test sets relative to a fully supervised model on the same data.
Tasks Language Modelling, Speech Recognition
Published 2020-02-24
URL https://arxiv.org/abs/2002.10336v1
PDF https://arxiv.org/pdf/2002.10336v1.pdf
PWC https://paperswithcode.com/paper/semi-supervised-speech-recognition-via-local
Repo https://github.com/facebookresearch/wav2letter
Framework none
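
Roughly, the objective can be read as a weighted likelihood over beam hypotheses: the language-model prior, renormalized over the local beam, weights the ASR model's log-probability of each hypothesis for an unlabeled utterance. The sketch below shows only that reduction with made-up scores; beam generation, length normalization, and the scaling details from the paper are omitted, so treat it as an interpretation rather than the LPM loss itself.

```python
import torch
import torch.nn.functional as F

def local_prior_matching_loss(asr_logprobs, lm_scores):
    """Weight the ASR model's log-likelihood of each beam hypothesis by the prior (LM)
    distribution renormalized over that local beam, giving a learning signal on unlabeled speech.
    asr_logprobs: (beam,) log q(hypothesis | audio); lm_scores: (beam,) log p_LM(hypothesis)."""
    prior = F.softmax(lm_scores, dim=0)        # renormalize the prior over the beam
    return -(prior * asr_logprobs).sum()

# toy usage with made-up scores for a beam of 4 hypotheses on one unlabeled utterance
asr_logprobs = torch.tensor([-3.1, -4.0, -5.2, -6.3], requires_grad=True)
lm_scores = torch.tensor([-12.0, -15.5, -30.0, -41.0])
local_prior_matching_loss(asr_logprobs, lm_scores).backward()
```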

ARMS: Automated rules management system for fraud detection

Title ARMS: Automated rules management system for fraud detection
Authors David Aparício, Ricardo Barata, João Bravo, João Tiago Ascensão, Pedro Bizarro
Abstract Fraud detection is essential in financial services, with the potential of greatly reducing criminal activities and saving considerable resources for businesses and customers. We address online fraud detection, which consists of classifying incoming transactions as either legitimate or fraudulent in real-time. Modern fraud detection systems consist of a machine learning model and rules defined by human experts. Often, the rules' performance degrades over time due to concept drift, especially of an adversarial nature. Furthermore, they can be costly to maintain, either because they are computationally expensive or because they send transactions for manual review. We propose ARMS, an automated rules management system that evaluates the contribution of individual rules and optimizes the set of active rules using heuristic search and a user-defined loss function. It complies with critical domain-specific requirements, such as handling different actions (e.g., accept, alert, and decline), priorities, blacklists, and large datasets (i.e., hundreds of rules and millions of transactions). We use ARMS to optimize the rule-based systems of two real-world clients. Results show that it can maintain the original systems’ performance (e.g., recall, or false-positive rate) using only a fraction of the original rules (~ 50% in one case, and ~ 20% in the other).
Tasks Fraud Detection
Published 2020-02-14
URL https://arxiv.org/abs/2002.06075v1
PDF https://arxiv.org/pdf/2002.06075v1.pdf
PWC https://paperswithcode.com/paper/arms-automated-rules-management-system-for
Repo https://github.com/feedzai/research-arms
Framework none
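
A minimal version of the rule-management idea, assuming a single "decline" action and a toy fraud/false-positive cost trade-off: evaluate each candidate rule set with a user-defined loss and greedily drop rules whose removal does not hurt. ARMS itself handles multiple actions, priorities, blacklists, and other search heuristics, none of which appear in this sketch.

```python
import numpy as np

def rule_set_loss(active, rule_hits, is_fraud, fp_cost=1.0, fn_cost=20.0):
    """User-defined loss for a candidate rule set: a transaction is declined if any active rule fires."""
    declined = rule_hits[:, active].any(axis=1) if active else np.zeros(len(is_fraud), bool)
    fp = (declined & ~is_fraud).sum()
    fn = (~declined & is_fraud).sum()
    return fp_cost * fp + fn_cost * fn

def greedy_prune(rule_hits, is_fraud):
    """Start from all rules and drop any rule whose removal does not increase the loss
    (ties favour fewer rules), until no single removal helps."""
    active = list(range(rule_hits.shape[1]))
    best = rule_set_loss(active, rule_hits, is_fraud)
    improved = True
    while improved and active:
        improved = False
        for r in list(active):
            candidate = [x for x in active if x != r]
            loss = rule_set_loss(candidate, rule_hits, is_fraud)
            if loss <= best:
                best, active, improved = loss, candidate, True
    return active, best

rng = np.random.default_rng(0)
rule_hits = rng.random((5000, 50)) < 0.02        # which of 50 rules fired on each transaction
is_fraud = rng.random(5000) < 0.01
print(greedy_prune(rule_hits, is_fraud))
```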

Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose

Title Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose
Authors Ran Yi, Zipeng Ye, Juyong Zhang, Hujun Bao, Yong-Jin Liu
Abstract Real-world talking faces are often accompanied by natural head movement. However, most existing talking face video generation methods only consider facial animation with a fixed head pose. In this paper, we address this problem by proposing a deep neural network model that takes an audio signal A of a source person and a very short video V of a target person as input, and outputs a synthesized high-quality talking face video with personalized head pose (making use of the visual information in V), expression and lip synchronization (by considering both A and V). The most challenging issue in our work is that natural poses often cause in-plane and out-of-plane head rotations, which make the synthesized talking face video far from realistic. To address this challenge, we reconstruct 3D face animation and re-render it into synthesized frames. To fine-tune these frames into realistic ones with smooth background transitions, we propose a novel memory-augmented GAN module. By first training a general mapping based on a publicly available dataset and fine-tuning the mapping using the input short video of the target person, we develop an effective strategy that only requires a small number of frames (about 300 frames) to learn personalized talking behavior including head pose. Extensive experiments and two user studies show that our method can generate high-quality (i.e., personalized head movements, expressions and good lip synchronization) talking face videos, which look natural and show more distinctive head movement effects than state-of-the-art methods.
Tasks 3D Face Animation, Video Generation
Published 2020-02-24
URL https://arxiv.org/abs/2002.10137v2
PDF https://arxiv.org/pdf/2002.10137v2.pdf
PWC https://paperswithcode.com/paper/audio-driven-talking-face-video-generation
Repo https://github.com/yiranran/Audio-driven-TalkingFace-HeadPose
Framework pytorch

Improving Domain-Adapted Sentiment Classification by Deep Adversarial Mutual Learning

Title Improving Domain-Adapted Sentiment Classification by Deep Adversarial Mutual Learning
Authors Qianming Xue, Wei Zhang, Hongyuan Zha
Abstract Domain-adapted sentiment classification refers to training on a labeled source domain in order to infer document-level sentiment well on an unlabeled target domain. Most existing relevant models involve a feature extractor and a sentiment classifier, where the feature extractor works towards learning domain-invariant features from both domains, and the sentiment classifier is trained only on the source domain to guide the feature extractor. As such, they lack a mechanism to use sentiment polarity lying in the target domain. To improve domain-adapted sentiment classification by learning sentiment from the target domain as well, we devise a novel deep adversarial mutual learning approach involving two groups of feature extractors, domain discriminators, sentiment classifiers, and label probers. The domain discriminators enable the feature extractors to obtain domain-invariant features. Meanwhile, the label prober in each group explores document sentiment polarity of the target domain through the sentiment prediction generated by the classifier in the peer group, and guides the learning of the feature extractor in its own group. The proposed approach achieves the mutual learning of the two groups in an end-to-end manner. Experiments on multiple public datasets indicate that our method obtains state-of-the-art performance, validating the effectiveness of mutual learning through label probers.
Tasks Sentiment Analysis
Published 2020-02-01
URL https://arxiv.org/abs/2002.00119v1
PDF https://arxiv.org/pdf/2002.00119v1.pdf
PWC https://paperswithcode.com/paper/improving-domain-adapted-sentiment
Repo https://github.com/SleepyBag/DAML
Framework tf

Classifying the classifier: dissecting the weight space of neural networks

Title Classifying the classifier: dissecting the weight space of neural networks
Authors Gabriel Eilertsen, Daniel Jönsson, Timo Ropinski, Jonas Unger, Anders Ynnerman
Abstract This paper presents an empirical study on the weights of neural networks, where we interpret each model as a point in a high-dimensional space – the neural weight space. To explore the complex structure of this space, we sample from a diverse selection of training variations (dataset, optimization procedure, architecture, etc.) of neural network classifiers, and train a large number of models to represent the weight space. Then, we use a machine learning approach for analyzing and extracting information from this space. Most centrally, we train a number of novel deep meta-classifiers with the objective of classifying different properties of the training setup by identifying their footprints in the weight space. Thus, the meta-classifiers probe for patterns induced by hyper-parameters, so that we can quantify how much, where, and when these are encoded through the optimization process. This provides a novel and complementary view for explainable AI, and we show how meta-classifiers can reveal a great deal of information about the training setup and optimization, by only considering a small subset of randomly selected consecutive weights. To promote further research on the weight space, we release the neural weight space (NWS) dataset – a collection of 320K weight snapshots from 16K individually trained deep neural networks.
Tasks
Published 2020-02-13
URL https://arxiv.org/abs/2002.05688v1
PDF https://arxiv.org/pdf/2002.05688v1.pdf
PWC https://paperswithcode.com/paper/classifying-the-classifier-dissecting-the
Repo https://github.com/gabrieleilertsen/nws
Framework tf

Temporal Interlacing Network

Title Temporal Interlacing Network
Authors Hao Shao, Shengju Qian, Yu Liu
Abstract For a long time, the vision community has tried to learn spatio-temporal representations by combining convolutional neural networks with various temporal models, such as the families of Markov chains, optical flow, RNNs and temporal convolution. However, these pipelines consume enormous computing resources due to the alternating learning process for spatial and temporal information. One natural question is whether we can embed the temporal information into the spatial one so that the information in the two domains can be learned jointly in a single pass. In this work, we answer this question by presenting a simple yet powerful operator – the temporal interlacing network (TIN). Instead of learning the temporal features, TIN fuses the two kinds of information by interlacing spatial representations from the past to the future, and vice versa. A differentiable interlacing target can be learned to control the interlacing process. In this way, a heavy temporal model is replaced by a simple interlacing operator. We theoretically prove that with a learnable interlacing target, TIN performs equivalently to the regularized temporal convolution network (r-TCN), but gains 4% more accuracy with 6x less latency on 6 challenging benchmarks. These results push the state-of-the-art performances of video understanding by a considerable margin. Not surprisingly, the ensemble model of the proposed TIN won the $1^{st}$ place in the ICCV19 - Multi Moments in Time challenge. Code is made available to facilitate further research at https://github.com/deepcs233/TIN
Tasks Optical Flow Estimation, Video Understanding
Published 2020-01-17
URL https://arxiv.org/abs/2001.06499v1
PDF https://arxiv.org/pdf/2001.06499v1.pdf
PWC https://paperswithcode.com/paper/temporal-interlacing-network
Repo https://github.com/deepcs233/TIN
Framework pytorch
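
The interlacing operator itself is small: shift a fraction of channels along the time axis by a learnable offset, kept differentiable by interpolating between the two neighbouring integer shifts. The sketch below uses circular shifts (torch.roll) and a single shared offset for brevity; it is a simplified reading of the idea, not the TIN implementation, which learns per-group offsets and does not wrap around in time.

```python
import torch
import torch.nn as nn

class TemporalInterlace(nn.Module):
    """Shift a fraction of channels along the time axis by a learnable, fractional offset.
    Linear interpolation between the two neighbouring integer shifts keeps it differentiable."""
    def __init__(self, channels, shift_ratio=0.25):
        super().__init__()
        self.n_shift = int(channels * shift_ratio)
        self.offset = nn.Parameter(torch.zeros(1))         # learnable temporal offset (in frames)

    def forward(self, x):                                  # x: (batch, time, channels, h, w)
        head, tail = x[:, :, :self.n_shift], x[:, :, self.n_shift:]
        floor = torch.floor(self.offset)
        frac = self.offset - floor
        lo = torch.roll(head, shifts=int(floor.item()), dims=1)
        hi = torch.roll(head, shifts=int(floor.item()) + 1, dims=1)
        head = (1 - frac) * lo + frac * hi                 # interpolate between the integer shifts
        return torch.cat([head, tail], dim=2)

x = torch.randn(2, 8, 64, 14, 14)
print(TemporalInterlace(64)(x).shape)                      # torch.Size([2, 8, 64, 14, 14])
```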

Know thy corpus! Robust methods for digital curation of Web corpora

Title Know thy corpus! Robust methods for digital curation of Web corpora
Authors Serge Sharoff
Abstract This paper proposes a novel framework for digital curation of Web corpora in order to provide robust estimation of their parameters, such as their composition and the lexicon. In recent years language models pre-trained on large corpora emerged as clear winners in numerous NLP tasks, but no proper analysis of the corpora which led to their success has been conducted. The paper presents a procedure for robust frequency estimation, which helps in establishing the core lexicon for a given corpus, as well as a procedure for estimating the corpus composition via unsupervised topic models and via supervised genre classification of Web pages. The results of the digital curation study applied to several Web-derived corpora demonstrate their considerable differences. First, this concerns different frequency bursts which impact the core lexicon obtained from each corpus. Second, this concerns the kinds of texts they contain. For example, OpenWebText contains considerably more topical news and political argumentation in comparison to ukWac or Wikipedia. The tools and the results of analysis have been released.
Tasks Topic Models
Published 2020-03-13
URL https://arxiv.org/abs/2003.06389v1
PDF https://arxiv.org/pdf/2003.06389v1.pdf
PWC https://paperswithcode.com/paper/know-thy-corpus-robust-methods-for-digital
Repo https://github.com/ssharoff/robust
Framework none
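
Robust frequency estimation can be approximated very simply: compute per-document rates and take a median across documents, so that a burst of a word in a handful of pages does not inflate its corpus-wide frequency. The sketch below is one plausible reading of that idea, not the procedure from the paper or the released tools.

```python
from collections import Counter
import numpy as np

def robust_frequencies(documents):
    """Per-million word frequencies estimated robustly: median of per-document rates
    (zero for documents where the word is absent) instead of a single corpus-wide count."""
    rates = {}
    for doc in documents:
        tokens = doc.lower().split()
        for w, c in Counter(tokens).items():
            rates.setdefault(w, []).append(1e6 * c / len(tokens))
    n_docs = len(documents)
    return {w: np.median(r + [0.0] * (n_docs - len(r))) for w, r in rates.items()}

docs = ["the cat sat on the mat", "the dog barked at the cat", "stocks rallied as markets opened"]
freqs = robust_frequencies(docs)
print(sorted(freqs.items(), key=lambda kv: -kv[1])[:5])
```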

Weakly Supervised Temporal Action Localization Using Deep Metric Learning

Title Weakly Supervised Temporal Action Localization Using Deep Metric Learning
Authors Ashraful Islam, Richard J. Radke
Abstract Temporal action localization is an important step towards video understanding. Most current action localization methods depend on untrimmed videos with full temporal annotations of action instances. However, it is expensive and time-consuming to annotate both action labels and temporal boundaries of videos. To this end, we propose a weakly supervised temporal action localization method that only requires video-level action instances as supervision during training. We propose a classification module to generate action labels for each segment in the video, and a deep metric learning module to learn the similarity between different action instances. We jointly optimize a balanced binary cross-entropy loss and a metric loss using a standard backpropagation algorithm. Extensive experiments demonstrate the effectiveness of both of these components in temporal localization. We evaluate our algorithm on two challenging untrimmed video datasets: THUMOS14 and ActivityNet1.2. Our approach improves the current state-of-the-art result for THUMOS14 by 6.5% mAP at IoU threshold 0.5, and achieves competitive performance for ActivityNet1.2.
Tasks Action Localization, Metric Learning, Temporal Action Localization, Temporal Localization, Video Understanding, Weakly-supervised Temporal Action Localization
Published 2020-01-21
URL https://arxiv.org/abs/2001.07793v1
PDF https://arxiv.org/pdf/2001.07793v1.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-temporal-action-1
Repo https://github.com/asrafulashiq/wsad
Framework pytorch
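
A skeletal version of the weakly supervised setup, covering only the classification branch: per-segment scores are pooled with a top-k mean into video-level logits and trained against video-level labels with binary cross-entropy. The feature dimension, the value of k, and the omission of the deep metric learning module and the class balancing are assumptions made for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeaklySupLocalizer(nn.Module):
    """Video-level supervision only: per-segment class scores are pooled (top-k mean) into a
    video-level prediction; at test time the per-segment scores localize actions in time."""
    def __init__(self, feat_dim=1024, n_classes=20, k=8):
        super().__init__()
        self.classifier = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                        nn.Linear(256, n_classes))
        self.k = k

    def forward(self, segments):                           # segments: (n_segments, feat_dim)
        scores = self.classifier(segments)                 # per-segment class activation sequence
        topk = scores.topk(min(self.k, scores.size(0)), dim=0).values
        return scores, topk.mean(dim=0)                    # video-level logits

model = WeaklySupLocalizer()
segments = torch.randn(120, 1024)                          # one untrimmed video, 120 segments
video_label = torch.zeros(20); video_label[3] = 1          # video-level action labels only
seg_scores, video_logits = model(segments)
F.binary_cross_entropy_with_logits(video_logits, video_label).backward()
```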