Paper Group ANR 172
Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation. Detecting Changes in Asset Co-Movement Using the Autoencoder Reconstruction Ratio. Multilingual Denoising Pre-training for Neural Machine Translation. Provable Self-Play Algorithms for Competitive Reinforcement Learning. Hierarchical Multi-Process Fusion for Visual Place Recognition. …
Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation
Title | Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation |
Authors | Lingjie Liu, Weipeng Xu, Marc Habermann, Michael Zollhoefer, Florian Bernard, Hyeongwoo Kim, Wenping Wang, Christian Theobalt |
Abstract | Synthesizing realistic videos of humans using neural networks has been a popular alternative to the conventional graphics-based rendering pipeline due to its high efficiency. Existing works typically formulate this as an image-to-image translation problem in 2D screen space, which leads to artifacts such as over-smoothing, missing body parts, and temporal instability of fine-scale detail, such as pose-dependent wrinkles in the clothing. In this paper, we propose a novel human video synthesis method that approaches these limiting factors by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space. More specifically, our method relies on the combination of two convolutional neural networks (CNNs). Given the pose information, the first CNN predicts a dynamic texture map that contains time-coherent high-frequency details, and the second CNN conditions the generation of the final video on the temporally coherent output of the first CNN. We demonstrate several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively. |
Tasks | Image-to-Image Translation, Novel View Synthesis |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.04947v2 |
PDF | https://arxiv.org/pdf/2001.04947v2.pdf |
PWC | https://paperswithcode.com/paper/neural-human-video-rendering-joint-learning |
Repo | |
Framework | |
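As a rough illustration of the two-CNN pipeline described in the abstract, the sketch below wires together a texture-prediction network, a placeholder texturing/rendering step, and a rendering-to-video translation network. The tiny convolutional stacks and the `render_with_texture` placeholder are assumptions for illustration only, not the architecture from the paper.

```python
# Structural sketch of the two-stage idea: pose -> dynamic texture -> rendering
# -> final frame. All module sizes and the rendering placeholder are illustrative.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class TexNet(nn.Module):
    """Pose encoding (as a multi-channel map) -> dynamic RGB texture map."""
    def __init__(self, pose_channels=3):
        super().__init__()
        self.net = nn.Sequential(conv_block(pose_channels, 32), conv_block(32, 32),
                                 nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
    def forward(self, pose_map):
        return self.net(pose_map)

class RefNet(nn.Module):
    """Rendered body with the predicted texture -> final video frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 32), conv_block(32, 32),
                                 nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
    def forward(self, rendering):
        return self.net(rendering)

def render_with_texture(texture, uv_map):
    """Placeholder for the texturing/rendering step: sample the predicted
    texture at the body's per-pixel UV coordinates."""
    return nn.functional.grid_sample(texture, uv_map, align_corners=True)

# Toy forward pass: pose map at 256x256, per-pixel UV lookup at 128x128.
tex_net, ref_net = TexNet(), RefNet()
pose_map = torch.rand(1, 3, 256, 256)
uv_map = torch.rand(1, 128, 128, 2) * 2 - 1   # grid_sample expects coords in [-1, 1]
texture = tex_net(pose_map)                   # time-coherent dynamic texture
frame = ref_net(render_with_texture(texture, uv_map))
print(frame.shape)                            # (1, 3, 128, 128)
```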
Detecting Changes in Asset Co-Movement Using the Autoencoder Reconstruction Ratio
Title | Detecting Changes in Asset Co-Movement Using the Autoencoder Reconstruction Ratio |
Authors | Bryan Lim, Stefan Zohren, Stephen Roberts |
Abstract | Detecting changes in asset co-movements is of much importance to financial practitioners, with numerous risk management benefits arising from the timely detection of breakdowns in historical correlations. In this article, we propose a real-time indicator to detect temporary increases in asset co-movements, the Autoencoder Reconstruction Ratio (ARR), which measures how well a basket of asset returns can be modelled using a lower-dimensional set of latent variables. The ARR uses a deep sparse denoising autoencoder to perform the dimensionality reduction on the returns vector, which replaces the PCA approach of the standard Absorption Ratio, and provides a better model for non-Gaussian returns. Through a systemic risk application to forecasting on the CRSP US Total Market Index, we show that lower ARR values coincide with higher volatility and larger drawdowns, indicating that increased asset co-movement does correspond with periods of market weakness. We also demonstrate that short-term (i.e. 5-min and 1-hour) predictors for realised volatility and market crashes can be improved by including additional ARR inputs. |
Tasks | Denoising, Dimensionality Reduction |
Published | 2020-01-23 |
URL | https://arxiv.org/abs/2002.02008v1 |
PDF | https://arxiv.org/pdf/2002.02008v1.pdf |
PWC | https://paperswithcode.com/paper/detecting-changes-in-asset-co-movement-using |
Repo | |
Framework | |
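To make the indicator concrete, here is a minimal sketch of an ARR-style computation, assuming a small denoising autoencoder fitted to a window of return vectors and the ratio taken as the normalised per-day reconstruction error (lower values meaning the cross-section is well explained by few latent factors). The network size, noise level, and exact ratio definition are illustrative assumptions, not the paper's implementation.

```python
# Sketch of an Autoencoder Reconstruction Ratio (ARR) style indicator.
import numpy as np
import torch
import torch.nn as nn

def train_autoencoder(returns, latent_dim=4, noise_std=0.05, epochs=200, lr=1e-3):
    """Fit a small denoising autoencoder on a window of return vectors (T x N)."""
    x = torch.tensor(returns, dtype=torch.float32)
    n_assets = x.shape[1]
    model = nn.Sequential(
        nn.Linear(n_assets, latent_dim), nn.Tanh(),   # encoder to low-dim latents
        nn.Linear(latent_dim, n_assets),              # decoder back to returns
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        noisy = x + noise_std * torch.randn_like(x)   # denoising objective
        loss = nn.functional.mse_loss(model(noisy), x)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

def arr(model, returns):
    """Normalised reconstruction error per time step; lower values indicate the
    cross-section of returns is well captured by the low-dimensional latents."""
    x = torch.tensor(returns, dtype=torch.float32)
    with torch.no_grad():
        recon = model(x)
    err = ((x - recon) ** 2).sum(dim=1)
    tot = (x ** 2).sum(dim=1) + 1e-12
    return (err / tot).numpy()

# Toy usage: 250 days of returns for 20 assets.
rng = np.random.default_rng(0)
rets = 0.01 * rng.standard_normal((250, 20))
model = train_autoencoder(rets)
print(arr(model, rets)[-5:])
```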
Multilingual Denoising Pre-training for Neural Machine Translation
Title | Multilingual Denoising Pre-training for Neural Machine Translation |
Authors | Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer |
Abstract | This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART – a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables new types of transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training. |
Tasks | Denoising, Machine Translation, Unsupervised Machine Translation |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.08210v2 |
PDF | https://arxiv.org/pdf/2001.08210v2.pdf |
PWC | https://paperswithcode.com/paper/multilingual-denoising-pre-training-for |
Repo | |
Framework | |
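The sketch below illustrates the kind of denoising objective the abstract refers to: sentence permutation plus span masking over a monolingual document, with the model trained to reconstruct the original text. The masking probabilities and span-length distribution here are assumptions in the spirit of BART, not the exact mBART recipe.

```python
# Illustrative BART-style noising: permute sentences, then replace token spans
# (lengths drawn from a Poisson distribution) with a single mask token.
import numpy as np

MASK = "<mask>"

def bart_noise(sentences, mask_ratio=0.35, poisson_lambda=3.5, seed=0):
    """Corrupt a document; a seq2seq model is trained to reconstruct the original."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(sentences))                 # sentence permutation
    tokens = " ".join(sentences[i] for i in order).split()
    budget = int(mask_ratio * len(tokens))                  # how many tokens to mask
    out, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < 0.2:               # start a masked span
            span = max(1, int(rng.poisson(poisson_lambda)))
            span = min(span, budget, len(tokens) - i)
            out.append(MASK)                                 # whole span -> ONE mask token
            budget -= span
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)

doc = ["The cat sat on the mat.", "It was warm.", "Then it slept."]
print(bart_noise(doc))   # corrupted input; the target is the original document
```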
Provable Self-Play Algorithms for Competitive Reinforcement Learning
Title | Provable Self-Play Algorithms for Competitive Reinforcement Learning |
Authors | Yu Bai, Chi Jin |
Abstract | Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the majority of existing theory in reinforcement learning only applies to the setting where the agent plays against a fixed environment. It remains largely open whether self-play algorithms can be provably effective, especially when it is necessary to manage the exploration/exploitation tradeoff. We study self-play in competitive reinforcement learning under the setting of Markov games, a generalization of Markov decision processes to the two-player case. We introduce a self-play algorithm, Value Iteration with Upper/Lower Confidence Bound (VI-ULCB), and show that it achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game. The regret is measured by the agent's performance against a \emph{fully adversarial} opponent who can exploit the agent's strategy at \emph{any} step. We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case. To the best of our knowledge, our work presents the first line of provably sample-efficient self-play algorithms for competitive reinforcement learning. |
Tasks | |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.04017v2 |
PDF | https://arxiv.org/pdf/2002.04017v2.pdf |
PWC | https://paperswithcode.com/paper/provable-self-play-algorithms-for-competitive |
Repo | |
Framework | |
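A heavily simplified illustration of the confidence-bound idea: for one state of a zero-sum game, build an optimistic Q matrix from empirical averages plus an exploration bonus, and let the max-player play the maximin strategy of that matrix (solved as a small linear program). The bonus form and constants are assumptions; the actual VI-ULCB maintains both upper and lower bounds and handles the full episodic Markov game.

```python
# Simplified confidence-bound value iteration step for one stage game.
import numpy as np
from scipy.optimize import linprog

def maximin(Q):
    """Maximin mixed strategy and value for the row player of matrix game Q (n x m)."""
    n, m = Q.shape
    c = np.r_[np.zeros(n), -1.0]                    # variables [x_1..x_n, v], maximise v
    A_ub = np.hstack([-Q.T, np.ones((m, 1))])       # v - sum_i x_i Q[i, j] <= 0 for all j
    b_ub = np.zeros(m)
    A_eq = np.r_[np.ones(n), 0.0].reshape(1, -1)    # probabilities sum to one
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

def ucb_stage_value(reward_sum, counts, t=1, c_bonus=1.0):
    """Optimistic value of one state: empirical Q plus a count-based bonus."""
    counts = np.maximum(counts, 1)
    q_hat = reward_sum / counts
    bonus = c_bonus * np.sqrt(np.log(max(t, 2)) / counts)
    _, value = maximin(q_hat + bonus)
    return value

# Toy usage: 3 actions per player, a few observed plays.
rng = np.random.default_rng(1)
counts = rng.integers(1, 10, size=(3, 3))
rewards = counts * rng.uniform(-1, 1, size=(3, 3))
print(ucb_stage_value(rewards, counts, t=50))
```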
Hierarchical Multi-Process Fusion for Visual Place Recognition
Title | Hierarchical Multi-Process Fusion for Visual Place Recognition |
Authors | Stephen Hausler, Michael Milford |
Abstract | Combining multiple complementary techniques together has long been regarded as a way to improve performance. In visual localization, multi-sensor fusion, multi-process fusion of a single sensing modality, and even combinations of different localization techniques have been shown to result in improved performance. However, merely fusing together different localization techniques does not account for the varying performance characteristics of different localization techniques. In this paper we present a novel, hierarchical localization system that explicitly benefits from three varying characteristics of localization techniques: the distribution of their localization hypotheses, their appearance- and viewpoint-invariant properties, and the resulting differences in where in an environment each system works well and fails. We show how two techniques deployed hierarchically work better than in parallel fusion, how combining two different techniques works better than two levels of a single technique, even when the single technique has superior individual performance, and develop two and three-tier hierarchical structures that progressively improve localization performance. Finally, we develop a stacked hierarchical framework where localization hypotheses from techniques with complementary characteristics are concatenated at each layer, significantly improving retention of the correct hypothesis through to the final localization stage. Using two challenging datasets, we show the proposed system outperforming state-of-the-art techniques. |
Tasks | Sensor Fusion, Visual Localization, Visual Place Recognition |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2002.03895v1 |
PDF | https://arxiv.org/pdf/2002.03895v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-multi-process-fusion-for-visual |
Repo | |
Framework | |
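A minimal two-tier sketch of the hierarchical idea: a coarse, viewpoint-tolerant technique shortlists candidate places, and a complementary technique re-ranks only that shortlist. The cosine-similarity scorers and descriptor sizes below are stand-ins, not the specific localization techniques evaluated in the paper.

```python
# Minimal two-tier hierarchical place recognition sketch.
import numpy as np

def score(query_desc, db_descs):
    """Cosine similarity of one query descriptor against a database (N x D)."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    return db @ q

def hierarchical_localise(q1, db1, q2, db2, top_k=20):
    # Tier 1: cheap / viewpoint-tolerant descriptors give a candidate shortlist.
    s1 = score(q1, db1)
    candidates = np.argsort(-s1)[:top_k]
    # Tier 2: a complementary technique scores only the surviving candidates.
    s2 = score(q2, db2[candidates])
    return candidates[np.argmax(s2)]

# Toy usage with random descriptors for 1000 database places.
rng = np.random.default_rng(0)
db1, db2 = rng.standard_normal((1000, 128)), rng.standard_normal((1000, 512))
q1 = db1[42] + 0.1 * rng.standard_normal(128)
q2 = db2[42] + 0.1 * rng.standard_normal(512)
print(hierarchical_localise(q1, db1, q2, db2))   # ideally 42
```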
First Order Motion Model for Image Animation
Title | First Order Motion Model for Image Animation |
Authors | Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe |
Abstract | Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video. Our framework addresses this problem without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g. faces, human bodies), our method can be applied to any object of this class. To achieve this, we decouple appearance and motion information using a self-supervised formulation. To support complex motions, we use a representation consisting of a set of learned keypoints along with their local affine transformations. A generator network models occlusions arising during target motions and combines the appearance extracted from the source image and the motion derived from the driving video. Our framework scores best on diverse benchmarks and on a variety of object categories. Our source code is publicly available. |
Tasks | Image Animation |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00196v2 |
PDF | https://arxiv.org/pdf/2003.00196v2.pdf |
PWC | https://paperswithcode.com/paper/first-order-motion-model-for-image-animation-1 |
Repo | |
Framework | |
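The first-order approximation at the heart of the model can be sketched directly: around each keypoint the driving-to-source warp is a local affine map defined by the paired keypoints and their Jacobians, and the per-keypoint warps are blended into a dense motion field. In the paper the blending masks (and occlusion maps) come from a dense-motion network; the Gaussian weighting below is a simplification for illustration.

```python
# Sketch of the first-order motion approximation:
#   T_k(z) ~= p_src_k + J_k @ (z - p_drv_k)   around each keypoint k.
import numpy as np

def dense_motion(grid, p_src, p_drv, jacobians, sigma=0.1):
    """grid: (H, W, 2) driving coords in [-1, 1]; p_*: (K, 2); jacobians: (K, 2, 2)."""
    H, W, _ = grid.shape
    K = p_src.shape[0]
    warped = np.zeros((K, H, W, 2))
    weights = np.zeros((K, H, W))
    for k in range(K):
        diff = grid - p_drv[k]                        # (H, W, 2)
        warped[k] = p_src[k] + diff @ jacobians[k].T  # local affine transform
        weights[k] = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))
    weights /= weights.sum(0, keepdims=True) + 1e-8   # stand-in for learned masks
    return (weights[..., None] * warped).sum(0)       # (H, W, 2) sampling field

# Toy usage: 10 keypoints on a 64x64 grid.
rng = np.random.default_rng(0)
ys, xs = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64), indexing="ij")
grid = np.stack([xs, ys], axis=-1)
p_src, p_drv = rng.uniform(-1, 1, (10, 2)), rng.uniform(-1, 1, (10, 2))
jac = np.tile(np.eye(2), (10, 1, 1))
flow = dense_motion(grid, p_src, p_drv, jac)
print(flow.shape)   # (64, 64, 2): where to sample the source image for each pixel
```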
Post-Comparison Mitigation of Demographic Bias in Face Recognition Using Fair Score Normalization
Title | Post-Comparison Mitigation of Demographic Bias in Face Recognition Using Fair Score Normalization |
Authors | Philipp Terhörst, Jan Niklas Kolf, Naser Damer, Florian Kirchbuchner, Arjan Kuijper |
Abstract | Current face recognition systems have achieved high performance on several benchmark tests. Despite this progress, recent works showed that these systems are strongly biased against demographic sub-groups. Consequently, an easily integrable solution is needed to reduce the discriminatory effect of these biased systems. Previous work introduced fairness-enhancing solutions that strongly degrade the overall system performance. In this work, we propose a novel fair score normalization approach that is specifically designed to reduce the effect of bias in face recognition and subsequently leads to a significant overall performance boost. Our hypothesis is built on the notion of individual fairness by designing a normalization approach that leads to treating "similar" individuals "similarly". Experiments were conducted on two publicly available datasets captured under controlled and in-the-wild circumstances. The results show that our fair normalization approach enhances the overall performance by up to 14.8% under intermediate false match rate settings and up to 30.7% under high security settings. Our proposed approach significantly reduces the errors of all demographic groups, and thus reduces bias. Especially under in-the-wild conditions, we demonstrate that our fair normalization method improves the recognition performance of the affected population sub-groups by 31.6%. Unlike previous work, our proposed fairness-enhancing solution does not require demographic information about the individuals, leads to an overall performance boost, and can be easily integrated into existing biometric systems. |
Tasks | Face Recognition |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03592v1 |
PDF | https://arxiv.org/pdf/2002.03592v1.pdf |
PWC | https://paperswithcode.com/paper/post-comparison-mitigation-of-demographic |
Repo | |
Framework | |
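One plausible reading of "treating similar individuals similarly" is sketched below: cluster embeddings without demographic labels, estimate a per-cluster impostor-score threshold at a target false match rate, and shift comparison scores so every cluster shares a common operating threshold. The clustering choice, FMR target, and shift rule are assumptions, not the paper's exact normalization.

```python
# Sketch of cluster-wise score normalisation for face comparison scores.
import numpy as np
from sklearn.cluster import KMeans

def fit_cluster_thresholds(embeddings, impostor_pairs, scores, k=4, fmr=1e-3):
    """impostor_pairs: (M, 2) indices into embeddings; scores: (M,) their similarities."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    labels = km.labels_
    thresholds = np.zeros(k)
    for c in range(k):
        # impostor comparisons where at least one sample falls in cluster c
        mask = (labels[impostor_pairs[:, 0]] == c) | (labels[impostor_pairs[:, 1]] == c)
        cluster_scores = np.sort(scores[mask])
        if len(cluster_scores) == 0:
            continue  # no impostor data for this cluster; leave threshold at 0
        thresholds[c] = cluster_scores[int((1 - fmr) * (len(cluster_scores) - 1))]
    return km, thresholds

def normalise(score, emb_a, emb_b, km, thresholds, global_thr):
    """Shift a comparison score by the mean offset of the two clusters involved,
    so that one global decision threshold yields a similar local FMR everywhere."""
    ca, cb = km.predict(np.vstack([emb_a, emb_b]))
    offset = 0.5 * ((global_thr - thresholds[ca]) + (global_thr - thresholds[cb]))
    return score + offset
```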
On the Robustness of Face Recognition Algorithms Against Attacks and Bias
Title | On the Robustness of Face Recognition Algorithms Against Attacks and Bias |
Authors | Richa Singh, Akshay Agarwal, Maneet Singh, Shruti Nagpal, Mayank Vatsa |
Abstract | Face recognition algorithms have demonstrated very high recognition performance, suggesting suitability for real world applications. Despite the enhanced accuracies, robustness of these algorithms against attacks and bias has been challenged. This paper summarizes different ways in which the robustness of a face recognition algorithm is challenged, which can severely affect its intended working. Different types of attacks such as physical presentation attacks, disguise/makeup, digital adversarial attacks, and morphing/tampering using GANs have been discussed. We also present a discussion on the effect of bias on face recognition models and showcase that factors such as age and gender variations affect the performance of modern algorithms. The paper also presents the potential reasons for these challenges and some of the future research directions for increasing the robustness of face recognition models. |
Tasks | Face Recognition |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02942v1 |
PDF | https://arxiv.org/pdf/2002.02942v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-robustness-of-face-recognition |
Repo | |
Framework | |
How Does Gender Balance In Training Data Affect Face Recognition Accuracy?
Title | How Does Gender Balance In Training Data Affect Face Recognition Accuracy? |
Authors | Vítor Albiero, Kai Zhang, Kevin W. Bowyer |
Abstract | Even though deep learning methods have greatly increased the overall accuracy of face recognition, an old problem still persists: accuracy is higher for men than for women. Previous researchers have speculated that the difference could be due to cosmetics, head pose, or hair covering the face. It is also often speculated that the lower accuracy for women is caused by women being under-represented in the training data. This work aims to investigate if gender imbalance in the training data is actually the cause of lower accuracy for females. Using a state-of-the-art deep CNN, three different loss functions, and two training datasets, we train each on seven subsets with different male/female ratios, totaling forty-two trainings. The trained face matchers are then tested on three different testing datasets. Results show that gender-balancing the dataset has an overall positive effect, with higher accuracy for most of the combinations of loss functions and datasets when a balanced subset is used. However, for the best combination of loss function and dataset, the original training dataset shows better accuracy in 3 out of 4 cases. We observe that test accuracy for males is higher when the training data is all male. However, test accuracy for females is not maximized when the training data is all female. For a number of combinations of loss function and test dataset, accuracy for females is higher when only 75% of the training data is female than when 100% of the training data is female. This suggests that lower accuracy for females is not a simple result of the fraction of female training data. By clustering face features, we show that in general, male faces are closer to other male faces than female faces, and female faces are closer to other female faces than male faces. |
Tasks | Face Recognition |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02934v1 |
PDF | https://arxiv.org/pdf/2002.02934v1.pdf |
PWC | https://paperswithcode.com/paper/how-does-gender-balance-in-training-data |
Repo | |
Framework | |
MEMO: A Deep Network for Flexible Combination of Episodic Memories
Title | MEMO: A Deep Network for Flexible Combination of Episodic Memories |
Authors | Andrea Banino, Adrià Puigdomènech Badia, Raphael Köster, Martin J. Chadwick, Vinicius Zambaldi, Demis Hassabis, Caswell Barry, Matthew Botvinick, Dharshan Kumaran, Charles Blundell |
Abstract | Recent research developing neural network architectures with external memory has often used the benchmark bAbI question-answering dataset, which provides a challenging number of tasks requiring reasoning. Here we employed a classic associative inference task from the memory-based reasoning neuroscience literature in order to more carefully probe the reasoning capacity of existing memory-augmented architectures. This task is thought to capture the essence of reasoning – the appreciation of distant relationships among elements distributed across multiple facts or memories. Surprisingly, we found that current architectures struggle to reason over long distance associations. Similar results were obtained on a more complex task involving finding the shortest path between nodes in a path. We therefore developed MEMO, an architecture endowed with the capacity to reason over longer distances. This was accomplished with the addition of two novel components. First, it introduces a separation between memories (facts) stored in external memory and the items that comprise these facts in external memory. Second, it makes use of an adaptive retrieval mechanism, allowing a variable number of "memory hops" before the answer is produced. MEMO is capable of solving our novel reasoning tasks, as well as matching state-of-the-art results on bAbI. |
Tasks | |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2001.10913v1 |
PDF | https://arxiv.org/pdf/2001.10913v1.pdf |
PWC | https://paperswithcode.com/paper/memo-a-deep-network-for-flexible-combination-1 |
Repo | |
Framework | |
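The adaptive retrieval mechanism can be sketched as repeated attention "hops" over an external memory, with a small halting network deciding when to stop, up to a hop budget. This is a generic adaptive-computation sketch, not the authors' architecture; MEMO additionally separates stored facts from their constituent items and trains the halting policy differently.

```python
# Sketch of adaptive memory hops: attend over memory repeatedly, halt when confident.
import torch
import torch.nn as nn

class AdaptiveHops(nn.Module):
    def __init__(self, dim, max_hops=5):
        super().__init__()
        self.max_hops = max_hops
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.update = nn.GRUCell(dim, dim)
        self.halt = nn.Linear(dim, 1)

    def forward(self, query, memory):
        """query: (B, dim) current question state; memory: (B, M, dim) stored facts."""
        state = query
        for hop in range(self.max_hops):
            read, _ = self.attn(state.unsqueeze(1), memory, memory)  # attend over memory
            state = self.update(read.squeeze(1), state)              # integrate the read
            p_halt = torch.sigmoid(self.halt(state))                 # probability of stopping
            if bool((p_halt > 0.5).all()):                           # greedy halting at inference
                break
        return state, hop + 1

# Toy usage: batch of 2 queries over 16 memory slots of width 64.
model = AdaptiveHops(dim=64)
q, mem = torch.randn(2, 64), torch.randn(2, 16, 64)
out, hops = model(q, mem)
print(out.shape, hops)
```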
Four Principles of Explainable AI as Applied to Biometrics and Facial Forensic Algorithms
Title | Four Principles of Explainable AI as Applied to Biometrics and Facial Forensic Algorithms |
Authors | P. Jonathon Phillips, Mark Przybocki |
Abstract | Traditionally, researchers in automatic face recognition and biometric technologies have focused on developing accurate algorithms. With this technology being integrated into operational systems, engineers and scientists are being asked, do these systems meet societal norms? The origin of this line of inquiry is 'trust' of artificial intelligence (AI) systems. In this paper, we concentrate on adapting explainable AI to face recognition and biometrics, and we present four principles of explainable AI as applied to face recognition and biometrics. The principles are illustrated by four case studies, which show the challenges and issues in developing algorithms that can produce explanations. |
Tasks | Face Recognition |
Published | 2020-02-03 |
URL | https://arxiv.org/abs/2002.01014v1 |
PDF | https://arxiv.org/pdf/2002.01014v1.pdf |
PWC | https://paperswithcode.com/paper/four-principles-of-explainable-ai-as-applied |
Repo | |
Framework | |
A Deep Structural Model for Analyzing Correlated Multivariate Time Series
Title | A Deep Structural Model for Analyzing Correlated Multivariate Time Series |
Authors | Changwei Hu, Yifan Hu, Sungyong Seo |
Abstract | Multivariate time series are routinely encountered in real-world applications, and in many cases, these time series are strongly correlated. In this paper, we present a deep learning structural time series model which can (i) handle correlated multivariate time series input, and (ii) forecast the targeted temporal sequence by explicitly learning/extracting the trend, seasonality, and event components. The trend is learned via a 1D and 2D temporal CNN and LSTM hierarchical neural net. The CNN-LSTM architecture can (i) seamlessly leverage the dependency among multiple correlated time series in a natural way, (ii) extract the weighted differencing feature for better trend learning, and (iii) memorize the long-term sequential pattern. The seasonality component is approximated via a non-linear function of a set of Fourier terms, and the event components are learned by a simple linear function of regressors encoding the event dates. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of time series data sets, such as forecasts of Amazon AWS Simple Storage Service (S3) and Elastic Compute Cloud (EC2) billings, and the closing prices for corporate stocks in the same category. |
Tasks | Time Series |
Published | 2020-01-02 |
URL | https://arxiv.org/abs/2001.00559v1 |
PDF | https://arxiv.org/pdf/2001.00559v1.pdf |
PWC | https://paperswithcode.com/paper/a-deep-structural-model-for-analyzing |
Repo | |
Framework | |
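The seasonality and event components lend themselves to a short sketch: Fourier terms of the time index for seasonality, and indicator regressors for event dates. The trend component (the CNN-LSTM over correlated series) is omitted, and the feature construction below is an illustrative assumption rather than the paper's code.

```python
# Sketch of seasonality (Fourier terms) and event (indicator) regressors.
import numpy as np

def fourier_terms(t, period, order=3):
    """t: integer time index array; returns (len(t), 2*order) sin/cos features."""
    t = np.asarray(t, dtype=float)
    feats = []
    for k in range(1, order + 1):
        feats.append(np.sin(2 * np.pi * k * t / period))
        feats.append(np.cos(2 * np.pi * k * t / period))
    return np.stack(feats, axis=1)

def event_regressors(dates, event_dates):
    """One indicator column per event date; dates are numpy datetime64 values."""
    dates = np.asarray(dates)
    return np.stack([(dates == np.datetime64(e)).astype(float) for e in event_dates], axis=1)

# Toy usage: weekly seasonality and two event dates over 60 days.
t = np.arange(60)
dates = np.datetime64("2020-01-01") + t
X_season = fourier_terms(t, period=7, order=2)                      # (60, 4)
X_event = event_regressors(dates, ["2020-01-15", "2020-02-10"])     # (60, 2)
print(X_season.shape, X_event.shape)
```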
Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models
Title | Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models |
Authors | Pranav Agarwal, Alejandro Betancourt, Vana Panagiotou, Natalia Díaz-Rodríguez |
Abstract | Image captioning models have been able to generate grammatically correct and human understandable sentences. However, most of the captions convey limited information, as the models used are trained on datasets that do not caption all possible objects existing in everyday life. Due to this lack of prior information, most of the captions are biased to only a few objects present in the scene, hence limiting their usage in daily life. In this paper, we attempt to show the biased nature of the currently existing image captioning models and present a new image captioning dataset, Egoshots, consisting of 978 real life images with no captions. We further exploit the state of the art pre-trained image captioning and object recognition networks to annotate our images and show the limitations of existing works. Furthermore, in order to evaluate the quality of the generated captions, we propose a new image captioning metric, object based Semantic Fidelity (SF). Existing image captioning metrics can evaluate a caption only in the presence of their corresponding annotations; however, SF allows evaluating captions generated for images without annotations, making it highly useful for real life generated captions. |
Tasks | Image Captioning, Object Recognition |
Published | 2020-03-26 |
URL | https://arxiv.org/abs/2003.11743v2 |
PDF | https://arxiv.org/pdf/2003.11743v2.pdf |
PWC | https://paperswithcode.com/paper/egoshots-an-ego-vision-life-logging-dataset |
Repo | |
Framework | |
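A recall-style stand-in for the object-based Semantic Fidelity idea is easy to sketch: compare the labels returned by a pre-trained object detector with the words of the generated caption, without needing reference captions. The exact SF formula in the paper may differ from this simple overlap score.

```python
# Sketch of an annotation-free, object-based caption fidelity check.
def semantic_fidelity(detected_objects, caption, synonyms=None):
    """detected_objects: labels from an object detector; caption: generated sentence."""
    synonyms = synonyms or {}
    caption_tokens = set(caption.lower().replace(".", "").split())
    covered = 0
    for obj in detected_objects:
        names = {obj.lower()} | set(synonyms.get(obj.lower(), []))
        if names & caption_tokens:          # the caption mentions this object (or a synonym)
            covered += 1
    return covered / max(len(detected_objects), 1)

# Toy usage.
detections = ["laptop", "cup", "person"]
caption = "A person working on a laptop next to a mug"
print(semantic_fidelity(detections, caption, synonyms={"cup": ["mug"]}))  # 1.0
```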
LiDARNet: A Boundary-Aware Domain Adaptation Model for Lidar Point Cloud Semantic Segmentation
Title | LiDARNet: A Boundary-Aware Domain Adaptation Model for Lidar Point Cloud Semantic Segmentation |
Authors | Peng Jiang, Srikanth Saripalli |
Abstract | We present a boundary-aware domain adaptation model for Lidar point cloud semantic segmentation. Our model is designed to extract both the domain private features and the domain shared features using shared weights. We embedded Gated-SCNN into the shared feature extractors to help it learn boundary information while learning other shared features. Besides, the CycleGAN mechanism is imposed for further adaptation. We conducted experiments on real-world datasets. The source domain data is from the Semantic KITTI dataset, and the target domain data is collected from our own platform (a warthog) in off-road as well as urban scenarios. The two datasets have differences in channel distributions, reflectivity distributions, and sensor setup. Using our approach, we are able to obtain a single model that can work on both domains. The model is capable of achieving state-of-the-art performance on the source domain (Semantic KITTI dataset) and reaching 44.0% mIoU on the target domain dataset. |
Tasks | Domain Adaptation, Semantic Segmentation |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.01174v1 |
PDF | https://arxiv.org/pdf/2003.01174v1.pdf |
PWC | https://paperswithcode.com/paper/lidarnet-a-boundary-aware-domain-adaptation |
Repo | |
Framework | |
Adversarial Robustness for Code
Title | Adversarial Robustness for Code |
Authors | Pavol Bielik, Martin Vechev |
Abstract | We propose a novel technique which addresses the challenge of learning accurate and robust models of code in a principled way. Our method consists of three key components: (i) learning to abstain from making a prediction if uncertain, (ii) adversarial training, and (iii) representation refinement which learns the program parts relevant for the prediction and abstracts the rest. These components are used to iteratively train multiple models, each of which learns a suitable program representation necessary to make robust predictions on a different subset of the dataset. We instantiated our approach to the task of type inference for dynamically typed languages and demonstrate its effectiveness by learning a model that achieves 88% accuracy and 84% robustness. Further, our evaluation shows that using the combination of all three components is key to obtaining accurate and robust models. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04694v1 |
PDF | https://arxiv.org/pdf/2002.04694v1.pdf |
PWC | https://paperswithcode.com/paper/adversarial-robustness-for-code |
Repo | |
Framework | |
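The abstain-and-refine loop can be sketched with generic classifiers: each model predicts only where it is confident, and the next model is trained on the examples the previous ones abstained from. The models and confidence rule below are stand-ins; the paper combines this with adversarial training and learned representation refinement over program representations.

```python
# Sketch of an abstaining cascade: confident models answer, the rest is passed on.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_abstaining_cascade(X, y, n_models=3, conf_threshold=0.9):
    """Iteratively train models; each later model sees only the earlier models' abstentions."""
    models, remaining = [], np.arange(len(y))
    for _ in range(n_models):
        if len(remaining) == 0:
            break
        clf = LogisticRegression(max_iter=1000).fit(X[remaining], y[remaining])
        conf = clf.predict_proba(X[remaining]).max(axis=1)
        models.append(clf)
        remaining = remaining[conf < conf_threshold]   # next model handles the hard subset
    return models

def predict_or_abstain(models, x, conf_threshold=0.9):
    """Return the first confident model's prediction, or None to abstain."""
    for clf in models:
        proba = clf.predict_proba(x.reshape(1, -1))[0]
        if proba.max() >= conf_threshold:
            return clf.classes_[proba.argmax()]
    return None   # abstain: no model is confident enough
```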