Paper Group ANR 102
Using Small Proxy Datasets to Accelerate Hyperparameter Search. FSGAN: Subject Agnostic Face Swapping and Reenactment. Data-Driven Deep Learning of Partial Differential Equations in Modal Space. Learning Energy-based Spatial-Temporal Generative ConvNets for Dynamic Patterns. Integrating Source-channel and Attention-based Sequence-to-sequence Models …
Using Small Proxy Datasets to Accelerate Hyperparameter Search
Title | Using Small Proxy Datasets to Accelerate Hyperparameter Search |
Authors | Sam Shleifer, Eric Prokop |
Abstract | One of the biggest bottlenecks in a machine learning workflow is waiting for models to train. Depending on the available computing resources, it can take days to weeks to train a neural network on a large dataset with many classes, such as ImageNet. For researchers experimenting with new algorithmic approaches, this is impractically time-consuming and costly. We aim to generate smaller “proxy datasets” where experiments are cheaper to run but results are highly correlated with experimental results on the full dataset. We generate these proxy datasets by randomly sampling examples or classes, by training on only the easiest or hardest examples, and by training on synthetic examples generated by “data distillation”. We compare these techniques to the more widely used baseline of training on the full dataset for fewer epochs. For each proxying strategy, we estimate three measures of “proxy quality”: how much of the variance in experimental results on the full dataset can be explained by experimental results on the proxy dataset. Experiments on Imagenette and Imagewoof (Howard, 2019) show that running hyperparameter search on the easiest 10% of examples explains 81% of the variance in experiment results on the target task, and using the easiest 50% of examples can explain 95% of the variance, significantly more than the widely used baseline of training on all the data for fewer epochs. These “easy” proxies are higher quality than that baseline at equivalent computational cost and, unexpectedly, higher quality than proxies constructed from the hardest examples. Without access to a trained model, researchers can improve proxy quality by restricting the subset to fewer classes; proxies built on half the classes are higher quality than those with an equivalent number of examples spread across all classes. |
Tasks | |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.04887v1 |
PDF | https://arxiv.org/pdf/1906.04887v1.pdf |
PWC | https://paperswithcode.com/paper/using-small-proxy-datasets-to-accelerate |
Repo | |
Framework | |
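The abstract defines proxy quality as the share of variance in full-dataset results that proxy results explain. A minimal sketch, assuming this is the R² of a least-squares line fitted across hyperparameter configurations, and assuming "easiest" means lowest per-example loss under an already trained model. The function names are hypothetical, not from the paper's code.

```python
import numpy as np


def proxy_quality(proxy_scores, full_scores):
    """R^2 of a least-squares line predicting full-dataset scores from proxy scores."""
    x = np.asarray(proxy_scores, dtype=float)
    y = np.asarray(full_scores, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def easiest_fraction(per_example_loss, fraction):
    """Indices of the lowest-loss examples, using losses from an already trained model."""
    losses = np.asarray(per_example_loss, dtype=float)
    k = max(1, int(round(fraction * len(losses))))
    return np.argsort(losses)[:k]
```

For a single predictor with an intercept, this R² equals the squared Pearson correlation between proxy and full-dataset scores.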
FSGAN: Subject Agnostic Face Swapping and Reenactment
Title | FSGAN: Subject Agnostic Face Swapping and Reenactment |
Authors | Yuval Nirkin, Yosi Keller, Tal Hassner |
Abstract | We present Face Swapping GAN (FSGAN) for face swapping and reenactment. Unlike previous work, FSGAN is subject agnostic and can be applied to pairs of faces without requiring training on those faces. To this end, we describe a number of technical contributions. We derive a novel recurrent neural network (RNN)-based approach for face reenactment which adjusts for both pose and expression variations and can be applied to a single image or a video sequence. For video sequences, we introduce continuous interpolation of the face views based on reenactment, Delaunay Triangulation, and barycentric coordinates. Occluded face regions are handled by a face completion network. Finally, we use a face blending network for seamless blending of the two faces while preserving target skin color and lighting conditions. This network uses a novel Poisson blending loss which combines Poisson optimization with perceptual loss. We compare our approach to existing state-of-the-art systems and show our results to be both qualitatively and quantitatively superior. |
Tasks | Face Reenactment, Face Swapping, Facial Inpainting |
Published | 2019-08-16 |
URL | https://arxiv.org/abs/1908.05932v1 |
PDF | https://arxiv.org/pdf/1908.05932v1.pdf |
PWC | https://paperswithcode.com/paper/fsgan-subject-agnostic-face-swapping-and |
Repo | |
Framework | |
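The abstract interpolates face views with Delaunay triangulation and barycentric coordinates. A minimal sketch of the barycentric step, assuming views are indexed by 2D points (for example yaw and pitch) and that the triangulation is already available as vertex-index triples (for instance from the `simplices` of `scipy.spatial.Delaunay`). The function names are hypothetical.

```python
import numpy as np


def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p with respect to triangle (a, b, c)."""
    p, a, b, c = (np.asarray(v, dtype=float) for v in (p, a, b, c))
    edges = np.column_stack((b - a, c - a))  # 2x2 matrix of edge vectors
    l1, l2 = np.linalg.solve(edges, p - a)
    return np.array([1.0 - l1 - l2, l1, l2])


def interpolate_view(query, points, triangles, values):
    """Blend per-vertex values (scalars or image arrays) inside the triangle containing query."""
    for tri in triangles:
        w = barycentric(query, *(points[i] for i in tri))
        if np.all(w >= -1e-9):  # inside the triangle or on its boundary
            return sum(wi * values[i] for wi, i in zip(w, tri))
    raise ValueError("query lies outside the triangulation")
```

Since the weights are non-negative and sum to one inside a triangle, the blend is a convex combination of the three nearest views.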
Data-Driven Deep Learning of Partial Differential Equations in Modal Space
Title | Data-Driven Deep Learning of Partial Differential Equations in Modal Space |
Authors | Kailiang Wu, Dongbin Xiu |
Abstract | We present a framework for recovering, or approximating, an unknown time-dependent partial differential equation (PDE) from its solution data. Instead of identifying the terms in the underlying PDE, we seek to approximate the evolution operator of the underlying PDE numerically. The evolution operator of the PDE, defined in an infinite-dimensional space, maps the solution from a current time to a future time and completely characterizes the solution evolution of the underlying unknown PDE. Our recovery strategy approximates the evolution operator in a properly defined modal space, i.e., a generalized Fourier space, in order to reduce the problem to finite dimensions. The finite-dimensional approximation is then accomplished by training a deep neural network based on the residual network (ResNet) on the given data. Error analysis is provided to illustrate the predictive accuracy of the proposed method. A set of examples of different types of PDEs, including the inviscid Burgers’ equation, which develops discontinuities in its solution, is presented to demonstrate the effectiveness of the proposed method. |
Tasks | |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06948v2 |
PDF | https://arxiv.org/pdf/1910.06948v2.pdf |
PWC | https://paperswithcode.com/paper/data-driven-deep-learning-of-partial |
Repo | |
Framework | |
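The abstract reduces the evolution operator to finite dimensions by working with generalized Fourier coefficients and learns a ResNet-style update. A minimal sketch, assuming a periodic domain, a truncated Fourier basis, and the heat equation u_t = u_xx as a hypothetical test case: for that equation the modal update is known exactly, so the residual a trained network would have to learn can be written down directly.

```python
import numpy as np

N = 64  # grid points on [0, 2*pi)
K = 8   # retained Fourier modes: the finite-dimensional modal space
x = 2 * np.pi * np.arange(N) / N


def to_modal(u, K=K):
    """Truncated Fourier coefficients of a periodic grid function."""
    return np.fft.rfft(u)[:K] / N


def from_modal(c, N=N):
    """Map truncated coefficients back to grid values."""
    full = np.zeros(N // 2 + 1, dtype=complex)
    full[:len(c)] = c * N
    return np.fft.irfft(full, n=N)


def residual_step(c, dt):
    """One ResNet-style step c_next = c + R(c).

    R is the exact residual for the heat equation (an illustrative assumption);
    in the paper's setting a trained network approximates this map from data.
    """
    k = np.arange(len(c))
    return c + (np.exp(-(k ** 2) * dt) - 1.0) * c
```

The residual form c + R(c) mirrors the identity skip connection of a ResNet block, so the network only needs to learn the change over one time step.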
Learning Energy-based Spatial-Temporal Generative ConvNets for Dynamic Patterns
Title | Learning Energy-based Spatial-Temporal Generative ConvNets for Dynamic Patterns |
Authors | Jianwen Xie, Song-Chun Zhu, Ying Nian Wu |
Abstract | Video sequences contain rich dynamic patterns, such as dynamic texture patterns that exhibit stationarity in the temporal domain, and action patterns that are non-stationary in either the spatial or the temporal domain. We show that an energy-based spatial-temporal generative ConvNet can be used to model and synthesize dynamic patterns. The model defines a probability distribution on the video sequence, and the log probability is defined by a spatial-temporal ConvNet that consists of multiple layers of spatial-temporal filters to capture spatial-temporal patterns of different scales. The model can be learned from the training video sequences by an “analysis by synthesis” learning algorithm that iterates the following two steps. Step 1 synthesizes video sequences from the currently learned model. Step 2 then updates the model parameters based on the difference between the synthesized video sequences and the observed training sequences. We show that the learning algorithm can synthesize realistic dynamic patterns. We also show that it is possible to learn the model from incomplete training sequences with either occluded pixels or missing frames, so that model learning and pattern completion can be accomplished simultaneously. |
Tasks | |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.11975v1 |
PDF | https://arxiv.org/pdf/1909.11975v1.pdf |
PWC | https://paperswithcode.com/paper/learning-energy-based-spatial-temporal |
Repo | |
Framework | |
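The two-step "analysis by synthesis" loop can be illustrated on a toy one-dimensional model. This is a minimal sketch, not the paper's spatial-temporal ConvNet: it assumes a hypothetical energy f(x; θ) = θx on top of a standard normal reference density, so the model is N(θ, 1). Step 1 draws synthesized samples by Langevin dynamics, and Step 2 moves θ by the difference between observed and synthesized statistics.

```python
import numpy as np

rng = np.random.default_rng(0)


def grad_log_p(x, theta):
    # log p(x) = theta * x - x**2 / 2 + const (linear energy times a N(0, 1) reference)
    return theta - x


def langevin(x, theta, n_steps=30, step=0.3):
    """Step 1: synthesize samples from the current model by Langevin dynamics."""
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step ** 2 * grad_log_p(x, theta) + step * noise
    return x


def learn(observed, n_iters=200, lr=0.1, n_synth=500):
    """Alternate synthesis and parameter updates; returns the learned theta."""
    theta = 0.0
    synth = rng.standard_normal(n_synth)  # persistent chains started from the reference
    for _ in range(n_iters):
        synth = langevin(synth, theta)
        # Step 2: d log p / d theta = x, so the gradient is observed mean minus synthesized mean.
        theta += lr * (observed.mean() - synth.mean())
    return theta
```

At the fixed point the synthesized and observed statistics match, which for this toy model puts θ at the sample mean of the observations.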
Integrating Source-channel and Attention-based Sequence-to-sequence Models for Speech Recognition
Title | Integrating Source-channel and Attention-based Sequence-to-sequence Models for Speech Recognition |
Authors | Qiujia Li, Chao Zhang, Philip C. Woodland |
Abstract | This paper proposes a novel automatic speech recognition (ASR) framework called Integrated Source-Channel and Attention (ISCA) that combines the advantages of traditional systems based on the noisy source-channel model (SC) and end-to-end style systems using attention-based sequence-to-sequence models. The traditional SC system framework includes hidden Markov models and connectionist temporal classification (CTC) based acoustic models, language models (LMs), and a decoding procedure based on a lexicon, whereas the end-to-end style attention-based system jointly models the whole process with a single model. By rescoring the hypotheses produced by traditional systems using end-to-end style systems based on an extended noisy source-channel model, ISCA allows structured knowledge to be easily incorporated via the SC-based model while exploiting the complementarity of the attention-based model. Experiments on the AMI meeting corpus show that ISCA gives relative word error rate reductions of up to 21% over an individual system and 13% over an alternative method that also combines CTC and attention-based models. |
Tasks | Speech Recognition |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.06614v2 |
PDF | https://arxiv.org/pdf/1909.06614v2.pdf |
PWC | https://paperswithcode.com/paper/integrating-source-channel-and-attention |
Repo | |
Framework | |
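ISCA rescores hypotheses from the source-channel system with an attention-based model. A minimal sketch of N-best rescoring, assuming a simple log-linear interpolation of the two log-probabilities with a hypothetical weight `attn_weight`; the paper's extended noisy source-channel formulation may weight or normalize the scores differently.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: str
    sc_logprob: float    # acoustic model + LM score from the source-channel system
    attn_logprob: float  # score from the attention-based sequence-to-sequence model


def rescore(nbest, attn_weight=0.5):
    """Return hypotheses sorted best-first by a log-linear combination of the two scores."""
    def combined(h):
        return (1.0 - attn_weight) * h.sc_logprob + attn_weight * h.attn_logprob
    return sorted(nbest, key=combined, reverse=True)
```

Setting `attn_weight=0` recovers the original source-channel ranking, so the weight can be tuned on held-out data.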
On the Intrinsic Privacy of Stochastic Gradient Descent
Title | On the Intrinsic Privacy of Stochastic Gradient Descent |
Authors | Stephanie L. Hyland, Shruti Tople |
Abstract | Private learning algorithms have been proposed that ensure strong differential-privacy (DP) guarantees; however, they often come at a cost to utility. Meanwhile, stochastic gradient descent (SGD) contains intrinsic randomness which has not been leveraged for privacy. In this work, we take the first step towards analysing the intrinsic privacy properties of SGD. Our primary contribution is a large-scale empirical analysis of SGD on convex and non-convex objectives. We evaluate the inherent variability in SGD on 4 datasets and calculate the intrinsic $\epsilon_i$ values due to the inherent noise. First, we show that the variability in model parameters due to the random sampling almost always exceeds that due to changes in the data. We observe that SGD provides intrinsic $\epsilon_i$ values of 2.8, 6.9, 13.01 and 17.99 on the Forest Covertype, Adult, MNIST-binary and CIFAR2 datasets, respectively. Next, we propose a method to augment the intrinsic noise of SGD to achieve the desired target $\epsilon$. Our augmented SGD outputs models that outperform existing approaches with the same privacy guarantee, closing the gap to noiseless utility by between 0.19% and 10.07%. Finally, we show that the existing theoretical bound on the sensitivity of SGD is not tight. By estimating the tightest bound empirically, we achieve near-noiseless performance at $\epsilon=1$, closing the utility gap to the noiseless model by between 3.13% and 100%. Our experiments provide concrete evidence that changing the seed in SGD has far greater impact on the model than excluding any given training example. By accounting for this intrinsic randomness, higher utility is achievable without sacrificing further privacy. With these results, we hope to inspire the research community to further characterise the randomness in SGD, its impact on privacy, and the parallels with generalisation in machine learning. |
Tasks | |
Published | 2019-12-05 |
URL | https://arxiv.org/abs/1912.02919v2 |
PDF | https://arxiv.org/pdf/1912.02919v2.pdf |
PWC | https://paperswithcode.com/paper/on-the-intrinsic-privacy-of-stochastic |
Repo | |
Framework | |
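Augmenting intrinsic noise to reach a target $\epsilon$ can be illustrated with the classic Gaussian mechanism. A minimal sketch, assuming the intrinsic variability of the SGD output is treated as Gaussian with standard deviation `intrinsic_sigma` and calibrated with the textbook bound σ = Δ·sqrt(2 ln(1.25/δ))/ε (stated for ε < 1). The paper's exact calibration may differ, and all function names are hypothetical.

```python
import math


def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classic Gaussian mechanism (bound stated for epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_epsilon(sigma, delta, sensitivity):
    """Inverse of gaussian_sigma: the epsilon implied by a given noise scale."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / sigma


def extra_noise_sigma(intrinsic_sigma, target_epsilon, delta, sensitivity):
    """Std of independent Gaussian noise to add so the total noise meets the target epsilon.

    Variances of independent Gaussians add: total**2 = intrinsic**2 + extra**2.
    """
    target = gaussian_sigma(target_epsilon, delta, sensitivity)
    return math.sqrt(max(target ** 2 - intrinsic_sigma ** 2, 0.0))
```

The more noise SGD already contributes, the less must be added, which is the intuition behind the utility gains reported above.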
Differentially Private Algorithms for Learning Mixtures of Separated Gaussians
Title | Differentially Private Algorithms for Learning Mixtures of Separated Gaussians |
Authors | Gautam Kamath, Or Sheffet, Vikrant Singhal, Jonathan Ullman |
Abstract | Learning the parameters of Gaussian mixture models is a fundamental and widely studied problem with numerous applications. In this work, we give new algorithms for learning the parameters of a high-dimensional, well separated, Gaussian mixture model subject to the strong constraint of differential privacy. In particular, we give a differentially private analogue of the algorithm of Achlioptas and McSherry. Our algorithm has two key properties not achieved by prior work: (1) The algorithm’s sample complexity matches that of the corresponding non-private algorithm up to lower order terms in a wide range of parameters. (2) The algorithm does not require strong a priori bounds on the parameters of the mixture components. |
Tasks | |
Published | 2019-09-09 |
URL | https://arxiv.org/abs/1909.03951v2 |
PDF | https://arxiv.org/pdf/1909.03951v2.pdf |
PWC | https://paperswithcode.com/paper/differentially-private-algorithms-for-1 |
Repo | |
Framework | |
Evaluating Combinatorial Generalization in Variational Autoencoders
Title | Evaluating Combinatorial Generalization in Variational Autoencoders |
Authors | Alican Bozkurt, Babak Esmaeili, Dana H. Brooks, Jennifer G. Dy, Jan-Willem van de Meent |
Abstract | We evaluate the ability of variational autoencoders to generalize to unseen examples in domains with a large combinatorial space of feature values. Our experiments systematically evaluate the effect of network width, depth, regularization, and the typical distance between the training and test examples. Increasing network capacity benefits generalization in easy problems, where test-set examples are similar to training examples. In more difficult problems, increasing capacity deteriorates generalization when optimizing the standard VAE objective, but once again improves generalization when we decrease the KL regularization. Our results establish that the interplay between model capacity and KL regularization is not clear-cut; we need to take the typical distance between train and test examples into account when evaluating generalization. |
Tasks | |
Published | 2019-11-11 |
URL | https://arxiv.org/abs/1911.04594v1 |
PDF | https://arxiv.org/pdf/1911.04594v1.pdf |
PWC | https://paperswithcode.com/paper/evaluating-combinatorial-generalization-in |
Repo | |
Framework | |
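"Decreasing the KL regularization" means down-weighting the KL term of the VAE objective. A minimal sketch, assuming a diagonal Gaussian encoder, a standard normal prior, and a hypothetical weight `beta` (beta = 1 gives the standard negative ELBO); the reconstruction term is passed in precomputed.

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var, axis=-1)


def weighted_neg_elbo(recon_nll, mu, log_var, beta=1.0):
    """Negative ELBO with the KL term scaled by beta; beta < 1 weakens KL regularization."""
    return recon_nll + beta * gaussian_kl(mu, log_var)
```

The KL term is zero exactly when the encoder outputs the prior, so a smaller `beta` lets codes move further from the prior in exchange for better reconstructions.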
When to use parametric models in reinforcement learning?
Title | When to use parametric models in reinforcement learning? |
Authors | Hado van Hasselt, Matteo Hessel, John Aslanides |
Abstract | We examine the question of when and how parametric models are most useful in reinforcement learning. In particular, we look at commonalities and differences between parametric models and experience replay. Replay-based learning algorithms share important traits with model-based approaches, including the ability to plan: to use more computation without additional data to improve predictions and behaviour. We discuss when to expect benefits from either approach, and interpret prior work in this context. We hypothesise that, under suitable conditions, replay-based algorithms should be competitive with or better than model-based algorithms if the model is used only to generate fictional transitions from observed states for an update rule that is otherwise model-free. We validated this hypothesis on Atari 2600 video games. The replay-based algorithm attained state-of-the-art data efficiency, improving over prior results with parametric models. |
Tasks | |
Published | 2019-06-12 |
URL | https://arxiv.org/abs/1906.05243v1 |
PDF | https://arxiv.org/pdf/1906.05243v1.pdf |
PWC | https://paperswithcode.com/paper/when-to-use-parametric-models-in |
Repo | |
Framework | |
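Replay lets an agent "use more computation without additional data" by re-applying a model-free update to stored transitions. A minimal tabular sketch, assuming uniform sampling from a bounded buffer and one-step Q-learning; the paper works with deep agents on Atari, and these class and function names are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Bounded store of observed (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.data.append((s, a, r, s_next, done))

    def sample(self, k, rng):
        return rng.sample(list(self.data), min(k, len(self.data)))


def q_update(q, transition, n_actions, alpha=0.5, gamma=0.9):
    """Model-free Q-learning update applied to one stored transition."""
    s, a, r, s_next, done = transition
    best_next = 0.0 if done else max(q.get((s_next, b), 0.0) for b in range(n_actions))
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def replay_updates(q, buffer, n_updates, batch_size, n_actions, seed=0):
    """Spend extra computation, not extra data, by replaying past transitions."""
    rng = random.Random(seed)
    for _ in range(n_updates):
        for t in buffer.sample(batch_size, rng):
            q_update(q, t, n_actions)
```

A parametric model would instead generate the transitions fed to `q_update`; the hypothesis in the abstract is that replaying real ones is at least as good when the update rule is otherwise model-free.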
Feature-wise change detection and robust indoor positioning using RANSAC-like approach
Title | Feature-wise change detection and robust indoor positioning using RANSAC-like approach |
Authors | Caifa Zhou |
Abstract | Fingerprinting-based positioning, a promising indoor positioning solution, has been broadly explored owing to the pervasiveness of sensor-rich mobile devices, the abundance of opportunistically measurable location-relevant signals, and the progress of data-driven algorithms. One critical challenge is to control and improve the quality of the reference fingerprint map (RFM), which is built at the offline stage and applied for online positioning. The key concept in the quality control of the RFM is updating it according to newly measured data. Although various methods have been proposed for adapting the RFM, they approach the problem by introducing extra-positioning schemes (e.g. PDR or UGV) and directly adjust the RFM without distinguishing whether critical changes have occurred. This paper proposes an extra-positioning-free solution that makes full use of the redundancy of measurable features. Loosely inspired by random sample consensus (RANSAC), arbitrarily sampled subsets of features from the online measurement are used to generate multiple resamples, which are used for estimating intermediate locations. Resampling mitigates the impact of changed features on positioning and enables accurate location estimation. The user's location is robustly computed by identifying candidate locations among these intermediate ones using a modified Jaccard index (MJI), and the feature-wise change belief is calculated from the world model of the RFM and the estimated variability of the features. To validate the proposed approach, two levels of experimental analysis have been carried out. On the simulated dataset, the average change detection accuracy is about 90%. Meanwhile, dropping the features detected as changed during positioning improves the positioning accuracy within 2 m by about 20% compared with using all measured features for location estimation. On the long-term collected dataset, the average change detection accuracy is about 85%. |
Tasks | |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.09301v1 |
PDF | https://arxiv.org/pdf/1912.09301v1.pdf |
PWC | https://paperswithcode.com/paper/feature-wise-change-detection-and-robust |
Repo | |
Framework | |
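The resampling idea can be sketched as follows: estimate a location from many random subsets of the measured features, then take a consensus so that a few changed features cannot dominate. This is a minimal sketch under stated assumptions: the RFM is a matrix of feature vectors (e.g. RSS per access point) at known reference points, each resample uses nearest-neighbour matching, and a plain majority vote stands in for the paper's modified Jaccard index (MJI) step. Function names and the change threshold are hypothetical.

```python
import numpy as np


def nearest_reference(measurement, rfm_features, subset):
    """Index of the reference point closest to the measurement on the chosen features."""
    diffs = rfm_features[:, subset] - measurement[subset]
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))


def ransac_like_position(measurement, rfm_locations, rfm_features,
                         n_resamples=50, subset_size=3, seed=0):
    """Estimate intermediate locations from random feature subsets, then vote."""
    rng = np.random.default_rng(seed)
    n_features = rfm_features.shape[1]
    votes = np.zeros(len(rfm_locations), dtype=int)
    for _ in range(n_resamples):
        subset = rng.choice(n_features, size=subset_size, replace=False)
        votes[nearest_reference(measurement, rfm_features, subset)] += 1
    best = int(np.argmax(votes))  # majority vote, standing in for the MJI-based selection
    return best, rfm_locations[best]


def changed_features(measurement, rfm_features, ref_index, threshold):
    """Flag features whose online value disagrees with the RFM at the chosen reference point."""
    return np.abs(measurement - rfm_features[ref_index]) > threshold
```

Features flagged by `changed_features` could then be dropped before a final position estimate, matching the drop-out strategy the abstract evaluates.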
D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
Title | D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation |
Authors | Chien-Yi Chang, De-An Huang, Yanan Sui, Li Fei-Fei, Juan Carlos Niebles |
Abstract | We address weakly supervised action alignment and segmentation in videos, where only the order of occurring actions is available during training. We propose Discriminative Differentiable Dynamic Time Warping (D3TW), the first discriminative model using weak ordering supervision. The key technical challenge for discriminative modeling with weak supervision is that the loss function of the ordering supervision is usually formulated using dynamic programming and is thus not differentiable. We address this challenge with a continuous relaxation of the min-operator in dynamic programming and extend the alignment loss to be differentiable. The proposed D3TW innovatively solves sequence alignment with discriminative modeling and end-to-end training, which substantially improves the performance in weakly supervised action alignment and segmentation tasks. We show that our model is able to bypass the degenerate sequence problem usually encountered in previous work and outperform the current state-of-the-art across three evaluation metrics in two challenging datasets. |
Tasks | |
Published | 2019-01-09 |
URL | http://arxiv.org/abs/1901.02598v2 |
PDF | http://arxiv.org/pdf/1901.02598v2.pdf |
PWC | https://paperswithcode.com/paper/d3tw-discriminative-differentiable-dynamic |
Repo | |
Framework | |
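The "continuous relaxation of the min-operator" can be illustrated with a log-sum-exp soft-min inside the dynamic-time-warping recursion, as in soft-DTW. A minimal sketch, assuming that relaxation (the paper's exact operator and its discriminative loss are not reproduced) and a precomputed pairwise cost matrix between two sequences; names are hypothetical.

```python
import numpy as np


def soft_min(values, gamma):
    """Smooth relaxation of min: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = np.asarray(values, dtype=float) / -gamma
    m = v.max()
    return -gamma * (m + np.log(np.exp(v - m).sum()))


def soft_dtw(cost, gamma=1.0):
    """Differentiable DTW alignment cost for an (n, m) pairwise cost matrix."""
    n, m = cost.shape
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r[i, j] = cost[i - 1, j - 1] + soft_min(
                [r[i - 1, j - 1], r[i - 1, j], r[i, j - 1]], gamma)
    return r[n, m]
```

As `gamma` approaches zero the soft-min approaches the hard min and the result approaches classic DTW; for positive `gamma` every cell is a smooth function of the costs, so gradients can flow through the alignment.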
Improving Context-aware Neural Machine Translation with Target-side Context
Title | Improving Context-aware Neural Machine Translation with Target-side Context |
Authors | Hayahide Yamagishi, Mamoru Komachi |
Abstract | In recent years, several studies on neural machine translation (NMT) have attempted to use document-level context by using a multi-encoder and two attention mechanisms to read the current and previous sentences to incorporate the context of the previous sentences. These studies concluded that the target-side context is less useful than the source-side context. However, we hypothesized that the target-side context appeared less useful because of the architecture used to model it. Therefore, in this study, we investigate how the target-side context can improve context-aware neural machine translation. We propose a weight sharing method wherein NMT saves decoder states and calculates an attention vector using the saved states when translating the current sentence. Our experiments show that the target-side context is also useful if we plug it into NMT as the decoder state when translating a previous sentence. |
Tasks | Machine Translation |
Published | 2019-09-02 |
URL | https://arxiv.org/abs/1909.00531v1 |
PDF | https://arxiv.org/pdf/1909.00531v1.pdf |
PWC | https://paperswithcode.com/paper/improving-context-aware-neural-machine |
Repo | |
Framework | |
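The method attends over decoder states saved while translating the previous sentence. A minimal sketch, assuming scaled dot-product scoring between the current decoder state and the saved states; the paper's attention form and its weight sharing are not reproduced, and `attend` is a hypothetical name.

```python
import numpy as np


def attend(query, saved_states):
    """Scaled dot-product attention of the current decoder state over saved decoder states.

    query: (d,) current decoder state; saved_states: (n, d) states from the previous sentence.
    Returns the (d,) context vector and the (n,) attention weights.
    """
    scores = saved_states @ query / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ saved_states, weights
```

The resulting context vector would be combined with the usual source-side attention when predicting the next target word.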
Recurrent Convolutional Strategies for Face Manipulation Detection in Videos
Title | Recurrent Convolutional Strategies for Face Manipulation Detection in Videos |
Authors | Ekraam Sabir, Jiaxin Cheng, Ayush Jaiswal, Wael AbdAlmageed, Iacopo Masi, Prem Natarajan |
Abstract | The spread of misinformation through synthetically generated yet realistic images and videos has become a significant problem, calling for robust manipulation detection methods. While most effort has focused on detecting face manipulation in still images, less attention has been paid to identifying tampered faces in videos by exploiting the temporal information present in the stream. Recurrent convolutional models are a class of deep learning models which have proven effective at exploiting the temporal information from image streams across domains. We thereby distill the best strategy for combining variations in these models along with domain-specific face preprocessing techniques through extensive experimentation to obtain state-of-the-art performance on publicly available video-based facial manipulation benchmarks. Specifically, we attempt to detect Deepfake, Face2Face and FaceSwap tampered faces in video streams. Evaluation is performed on the recently introduced FaceForensics++ dataset, improving the previous state-of-the-art by up to 4.55% in accuracy. |
Tasks | Face Swapping |
Published | 2019-05-02 |
URL | https://arxiv.org/abs/1905.00582v3 |
PDF | https://arxiv.org/pdf/1905.00582v3.pdf |
PWC | https://paperswithcode.com/paper/recurrent-convolution-approach-to-deepfake |
Repo | |
Framework | |
“You might also like this model”: Data Driven Approach for Recommending Deep Learning Models for Unknown Image Datasets
Title | “You might also like this model”: Data Driven Approach for Recommending Deep Learning Models for Unknown Image Datasets |
Authors | Ameya Prabhu, Riddhiman Dasgupta, Anush Sankaran, Srikanth Tamilselvam, Senthil Mani |
Abstract | For an unknown (new) classification dataset, choosing an appropriate deep learning architecture is often an iterative, time-consuming, and laborious process. In this research, we propose a novel technique to recommend a suitable architecture from a repository of known models. Further, we predict the performance accuracy of the recommended architecture on the given unknown dataset, without the need for training the model. We propose a model encoder approach to learn a fixed-length representation of deep learning architectures along with their hyperparameters, in an unsupervised fashion. We manually curate a repository of image datasets with corresponding known deep learning models and show that the predicted accuracy is a good estimator of the actual accuracy. We discuss the implications of the proposed approach for three benchmark image datasets and also the challenges in using the approach for the text modality. To further increase the reproducibility of the proposed approach, the entire implementation is made publicly available along with the trained models. |
Tasks | |
Published | 2019-11-26 |
URL | https://arxiv.org/abs/1911.11433v1 |
PDF | https://arxiv.org/pdf/1911.11433v1.pdf |
PWC | https://paperswithcode.com/paper/you-might-also-like-this-model-data-driven |
Repo | |
Framework | |
PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking
Title | PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking |
Authors | Xinke Deng, Arsalan Mousavian, Yu Xiang, Fei Xia, Timothy Bretl, Dieter Fox |
Abstract | Tracking 6D poses of objects from videos provides rich information to a robot in performing different tasks such as manipulation and navigation. In this work, we formulate the 6D object pose tracking problem in the Rao-Blackwellized particle filtering framework, where the 3D rotation and the 3D translation of an object are decoupled. This factorization allows our approach, called PoseRBPF, to efficiently estimate the 3D translation of an object along with the full distribution over the 3D rotation. This is achieved by discretizing the rotation space in a fine-grained manner, and training an auto-encoder network to construct a codebook of feature embeddings for the discretized rotations. As a result, PoseRBPF can track objects with arbitrary symmetries while still maintaining adequate posterior distributions. Our approach achieves state-of-the-art results on two 6D pose estimation benchmarks. A video showing the experiments can be found at https://youtu.be/lE5gjzRKWuA |
Tasks | 6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation, Pose Tracking |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.09304v1 |
PDF | https://arxiv.org/pdf/1905.09304v1.pdf |
PWC | https://paperswithcode.com/paper/poserbpf-a-rao-blackwellized-particle-filter |
Repo | |
Framework | |
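PoseRBPF matches an observed embedding against a codebook of embeddings for discretized rotations. A minimal sketch of that matching step, assuming cosine similarity converted to a likelihood with a hypothetical `temperature`, and assuming each translation particle is weighted by marginalizing this likelihood over a uniform rotation prior. In the paper the codebook comes from a trained auto-encoder and each particle's embedding from an image crop at that particle's translation; here both are plain arrays.

```python
import numpy as np


def _unit_rows(a):
    """Normalize each row of a 1D or 2D array to unit length."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    return a / np.linalg.norm(a, axis=1, keepdims=True)


def rotation_posterior(embedding, codebook, prior=None, temperature=0.1):
    """Posterior over discretized rotations from cosine similarity to codebook entries."""
    sims = _unit_rows(codebook) @ _unit_rows(embedding)[0]
    likelihood = np.exp((sims - sims.max()) / temperature)
    post = likelihood if prior is None else likelihood * np.asarray(prior, dtype=float)
    return post / post.sum()


def particle_weights(particle_embeddings, codebook, temperature=0.1):
    """Weight translation particles by marginalizing the codebook likelihood over rotations."""
    sims = _unit_rows(particle_embeddings) @ _unit_rows(codebook).T  # (particles, rotations)
    likelihood = np.exp(sims / temperature)
    w = likelihood.mean(axis=1)  # uniform prior over rotation bins
    return w / w.sum()
```

Keeping a full distribution over rotation bins, rather than a single best match, is what lets a filter of this kind represent the ambiguity of symmetric objects.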