Paper Group ANR 172
Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation. Detecting Changes in Asset Co-Movement Using the Autoencoder Reconstruction Ratio. Multilingual Denoising Pre-training for Neural Machine Translation. Provable Self-Play Algorithms for Competitive Reinforcement Learning. Hierarchical Multi-Process Fusion for Visual Place Recognition. …
Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation
Title | Neural Human Video Rendering by Learning Dynamic Textures and Rendering-to-Video Translation |
Authors | Lingjie Liu, Weipeng Xu, Marc Habermann, Michael Zollhoefer, Florian Bernard, Hyeongwoo Kim, Wenping Wang, Christian Theobalt |
Abstract | Synthesizing realistic videos of humans using neural networks has been a popular alternative to the conventional graphics-based rendering pipeline due to its high efficiency. Existing works typically formulate this as an image-to-image translation problem in 2D screen space, which leads to artifacts such as over-smoothing, missing body parts, and temporal instability of fine-scale detail, such as pose-dependent wrinkles in the clothing. In this paper, we propose a novel human video synthesis method that approaches these limiting factors by explicitly disentangling the learning of time-coherent fine-scale details from the embedding of the human in 2D screen space. More specifically, our method relies on the combination of two convolutional neural networks (CNNs). Given the pose information, the first CNN predicts a dynamic texture map that contains time-coherent high-frequency details, and the second CNN conditions the generation of the final video on the temporally coherent output of the first CNN. We demonstrate several applications of our approach, such as human reenactment and novel view synthesis from monocular video, where we show significant improvement over the state of the art both qualitatively and quantitatively. |
Tasks | Image-to-Image Translation, Novel View Synthesis |
Published | 2020-01-14 |
URL | https://arxiv.org/abs/2001.04947v2 |
PDF | https://arxiv.org/pdf/2001.04947v2.pdf |
PWC | https://paperswithcode.com/paper/neural-human-video-rendering-joint-learning |
Repo | |
Framework | |
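As a rough illustration of the two-CNN pipeline described in the abstract, the sketch below wires together a texture-prediction network, a placeholder texturing/rendering step, and a rendering-to-video translation network. The tiny convolutional stacks and the `render_with_texture` placeholder are assumptions for illustration only, not the architecture from the paper.

```python
# Structural sketch of the two-stage idea: pose -> dynamic texture -> rendering
# -> final frame. All module sizes and the rendering placeholder are illustrative.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class TexNet(nn.Module):
    """Pose encoding (as a multi-channel map) -> dynamic RGB texture map."""
    def __init__(self, pose_channels=3):
        super().__init__()
        self.net = nn.Sequential(conv_block(pose_channels, 32), conv_block(32, 32),
                                 nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
    def forward(self, pose_map):
        return self.net(pose_map)

class RefNet(nn.Module):
    """Rendered body with the predicted texture -> final video frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 32), conv_block(32, 32),
                                 nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
    def forward(self, rendering):
        return self.net(rendering)

def render_with_texture(texture, uv_map):
    """Placeholder for the texturing/rendering step: sample the predicted
    texture at the body's per-pixel UV coordinates."""
    return nn.functional.grid_sample(texture, uv_map, align_corners=True)

# Toy forward pass: pose map at 256x256, per-pixel UV lookup at 128x128.
tex_net, ref_net = TexNet(), RefNet()
pose_map = torch.rand(1, 3, 256, 256)
uv_map = torch.rand(1, 128, 128, 2) * 2 - 1   # grid_sample expects coords in [-1, 1]
texture = tex_net(pose_map)                   # time-coherent dynamic texture
frame = ref_net(render_with_texture(texture, uv_map))
print(frame.shape)                            # (1, 3, 128, 128)
```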
Detecting Changes in Asset Co-Movement Using the Autoencoder Reconstruction Ratio
Title | Detecting Changes in Asset Co-Movement Using the Autoencoder Reconstruction Ratio |
Authors | Bryan Lim, Stefan Zohren, Stephen Roberts |
Abstract | Detecting changes in asset co-movements is of much importance to financial practitioners, with numerous risk management benefits arising from the timely detection of breakdowns in historical correlations. In this article, we propose a real-time indicator to detect temporary increases in asset co-movements, the Autoencoder Reconstruction Ratio (ARR), which measures how well a basket of asset returns can be modelled using a lower-dimensional set of latent variables. The ARR uses a deep sparse denoising autoencoder to perform the dimensionality reduction on the returns vector, which replaces the PCA approach of the standard Absorption Ratio, and provides a better model for non-Gaussian returns. Through a systemic risk application to forecasting on the CRSP US Total Market Index, we show that lower ARR values coincide with higher volatility and larger drawdowns, indicating that increased asset co-movement does correspond with periods of market weakness. We also demonstrate that short-term (i.e. 5-min and 1-hour) predictors for realised volatility and market crashes can be improved by including additional ARR inputs. |
Tasks | Denoising, Dimensionality Reduction |
Published | 2020-01-23 |
URL | https://arxiv.org/abs/2002.02008v1 |
PDF | https://arxiv.org/pdf/2002.02008v1.pdf |
PWC | https://paperswithcode.com/paper/detecting-changes-in-asset-co-movement-using |
Repo | |
Framework | |
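To make the indicator concrete, here is a minimal sketch of an ARR-style computation, assuming a small denoising autoencoder fitted to a window of return vectors and the ratio taken as the normalised per-day reconstruction error (lower values meaning the cross-section is well explained by few latent factors). The network size, noise level, and exact ratio definition are illustrative assumptions, not the paper's implementation.

```python
# Sketch of an Autoencoder Reconstruction Ratio (ARR) style indicator.
import numpy as np
import torch
import torch.nn as nn

def train_autoencoder(returns, latent_dim=4, noise_std=0.05, epochs=200, lr=1e-3):
    """Fit a small denoising autoencoder on a window of return vectors (T x N)."""
    x = torch.tensor(returns, dtype=torch.float32)
    n_assets = x.shape[1]
    model = nn.Sequential(
        nn.Linear(n_assets, latent_dim), nn.Tanh(),   # encoder to low-dim latents
        nn.Linear(latent_dim, n_assets),              # decoder back to returns
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        noisy = x + noise_std * torch.randn_like(x)   # denoising objective
        loss = nn.functional.mse_loss(model(noisy), x)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

def arr(model, returns):
    """Normalised reconstruction error per time step; lower values indicate the
    cross-section of returns is well captured by the low-dimensional latents."""
    x = torch.tensor(returns, dtype=torch.float32)
    with torch.no_grad():
        recon = model(x)
    err = ((x - recon) ** 2).sum(dim=1)
    tot = (x ** 2).sum(dim=1) + 1e-12
    return (err / tot).numpy()

# Toy usage: 250 days of returns for 20 assets.
rng = np.random.default_rng(0)
rets = 0.01 * rng.standard_normal((250, 20))
model = train_autoencoder(rets)
print(arr(model, rets)[-5:])
```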
Multilingual Denoising Pre-training for Neural Machine Translation
Title | Multilingual Denoising Pre-training for Neural Machine Translation |
Authors | Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer |
Abstract | This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART – a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables new types of transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training. |
Tasks | Denoising, Machine Translation, Unsupervised Machine Translation |
Published | 2020-01-22 |
URL | https://arxiv.org/abs/2001.08210v2 |
PDF | https://arxiv.org/pdf/2001.08210v2.pdf |
PWC | https://paperswithcode.com/paper/multilingual-denoising-pre-training-for |
Repo | |
Framework | |
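The sketch below illustrates the kind of denoising objective the abstract refers to: sentence permutation plus span masking over a monolingual document, with the model trained to reconstruct the original text. The masking probabilities and span-length distribution here are assumptions in the spirit of BART, not the exact mBART recipe.

```python
# Illustrative BART-style noising: permute sentences, then replace token spans
# (lengths drawn from a Poisson distribution) with a single mask token.
import numpy as np

MASK = "<mask>"

def bart_noise(sentences, mask_ratio=0.35, poisson_lambda=3.5, seed=0):
    """Corrupt a document; a seq2seq model is trained to reconstruct the original."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(sentences))                 # sentence permutation
    tokens = " ".join(sentences[i] for i in order).split()
    budget = int(mask_ratio * len(tokens))                  # how many tokens to mask
    out, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < 0.2:               # start a masked span
            span = max(1, int(rng.poisson(poisson_lambda)))
            span = min(span, budget, len(tokens) - i)
            out.append(MASK)                                 # whole span -> ONE mask token
            budget -= span
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)

doc = ["The cat sat on the mat.", "It was warm.", "Then it slept."]
print(bart_noise(doc))   # corrupted input; the target is the original document
```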
Provable Self-Play Algorithms for Competitive Reinforcement Learning
Title | Provable Self-Play Algorithms for Competitive Reinforcement Learning |
Authors | Yu Bai, Chi Jin |
Abstract | Self-play, where the algorithm learns by playing against itself without requiring any direct supervision, has become the new weapon in modern Reinforcement Learning (RL) for achieving superhuman performance in practice. However, the majority of existing theory in reinforcement learning only applies to the setting where the agent plays against a fixed environment. It remains largely open whether self-play algorithms can be provably effective, especially when it is necessary to manage the exploration/exploitation tradeoff. We study self-play in competitive reinforcement learning under the setting of Markov games, a generalization of Markov decision processes to the two-player case. We introduce a self-play algorithm, Value Iteration with Upper/Lower Confidence Bound (VI-ULCB), and show that it achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game. The regret is measured by the agent's performance against a \emph{fully adversarial} opponent who can exploit the agent's strategy at \emph{any} step. We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case. To the best of our knowledge, our work presents the first line of provably sample-efficient self-play algorithms for competitive reinforcement learning. |
Tasks | |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.04017v2 |
PDF | https://arxiv.org/pdf/2002.04017v2.pdf |
PWC | https://paperswithcode.com/paper/provable-self-play-algorithms-for-competitive |
Repo | |
Framework | |
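A heavily simplified illustration of the confidence-bound idea: for one state of a zero-sum game, build an optimistic Q matrix from empirical averages plus an exploration bonus, and let the max-player play the maximin strategy of that matrix (solved as a small linear program). The bonus form and constants are assumptions; the actual VI-ULCB maintains both upper and lower bounds and handles the full episodic Markov game.

```python
# Simplified confidence-bound value iteration step for one stage game.
import numpy as np
from scipy.optimize import linprog

def maximin(Q):
    """Maximin mixed strategy and value for the row player of matrix game Q (n x m)."""
    n, m = Q.shape
    c = np.r_[np.zeros(n), -1.0]                    # variables [x_1..x_n, v], maximise v
    A_ub = np.hstack([-Q.T, np.ones((m, 1))])       # v - sum_i x_i Q[i, j] <= 0 for all j
    b_ub = np.zeros(m)
    A_eq = np.r_[np.ones(n), 0.0].reshape(1, -1)    # probabilities sum to one
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

def ucb_stage_value(reward_sum, counts, t=1, c_bonus=1.0):
    """Optimistic value of one state: empirical Q plus a count-based bonus."""
    counts = np.maximum(counts, 1)
    q_hat = reward_sum / counts
    bonus = c_bonus * np.sqrt(np.log(max(t, 2)) / counts)
    _, value = maximin(q_hat + bonus)
    return value

# Toy usage: 3 actions per player, a few observed plays.
rng = np.random.default_rng(1)
counts = rng.integers(1, 10, size=(3, 3))
rewards = counts * rng.uniform(-1, 1, size=(3, 3))
print(ucb_stage_value(rewards, counts, t=50))
```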
Hierarchical Multi-Process Fusion for Visual Place Recognition
Title | Hierarchical Multi-Process Fusion for Visual Place Recognition |
Authors | Stephen Hausler, Michael Milford |
Abstract | Combining multiple complementary techniques together has long been regarded as a way to improve performance. In visual localization, multi-sensor fusion, multi-process fusion of a single sensing modality, and even combinations of different localization techniques have been shown to result in improved performance. However, merely fusing together different localization techniques does not account for the varying performance characteristics of different localization techniques. In this paper we present a novel, hierarchical localization system that explicitly benefits from three varying characteristics of localization techniques: the distribution of their localization hypotheses, their appearance- and viewpoint-invariant properties, and the resulting differences in where in an environment each system works well and fails. We show how two techniques deployed hierarchically work better than in parallel fusion, how combining two different techniques works better than two levels of a single technique, even when the single technique has superior individual performance, and develop two and three-tier hierarchical structures that progressively improve localization performance. Finally, we develop a stacked hierarchical framework where localization hypotheses from techniques with complementary characteristics are concatenated at each layer, significantly improving retention of the correct hypothesis through to the final localization stage. Using two challenging datasets, we show the proposed system outperforming state-of-the-art techniques. |
Tasks | Sensor Fusion, Visual Localization, Visual Place Recognition |
Published | 2020-01-28 |
URL | https://arxiv.org/abs/2002.03895v1 |
PDF | https://arxiv.org/pdf/2002.03895v1.pdf |
PWC | https://paperswithcode.com/paper/hierarchical-multi-process-fusion-for-visual |
Repo | |
Framework | |
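A minimal two-tier sketch of the hierarchical idea: a coarse, viewpoint-tolerant technique shortlists candidate places, and a complementary technique re-ranks only that shortlist. The cosine-similarity scorers and descriptor sizes below are stand-ins, not the specific localization techniques evaluated in the paper.

```python
# Minimal two-tier hierarchical place recognition sketch.
import numpy as np

def score(query_desc, db_descs):
    """Cosine similarity of one query descriptor against a database (N x D)."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    return db @ q

def hierarchical_localise(q1, db1, q2, db2, top_k=20):
    # Tier 1: cheap / viewpoint-tolerant descriptors give a candidate shortlist.
    s1 = score(q1, db1)
    candidates = np.argsort(-s1)[:top_k]
    # Tier 2: a complementary technique scores only the surviving candidates.
    s2 = score(q2, db2[candidates])
    return candidates[np.argmax(s2)]

# Toy usage with random descriptors for 1000 database places.
rng = np.random.default_rng(0)
db1, db2 = rng.standard_normal((1000, 128)), rng.standard_normal((1000, 512))
q1 = db1[42] + 0.1 * rng.standard_normal(128)
q2 = db2[42] + 0.1 * rng.standard_normal(512)
print(hierarchical_localise(q1, db1, q2, db2))   # ideally 42
```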
First Order Motion Model for Image Animation
Title | First Order Motion Model for Image Animation |
Authors | Aliaksandr Siarohin, Stéphane Lathuilière, Sergey Tulyakov, Elisa Ricci, Nicu Sebe |
Abstract | Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video. Our framework addresses this problem without using any annotation or prior information about the specific object to animate. Once trained on a set of videos depicting objects of the same category (e.g. faces, human bodies), our method can be applied to any object of this class. To achieve this, we decouple appearance and motion information using a self-supervised formulation. To support complex motions, we use a representation consisting of a set of learned keypoints along with their local affine transformations. A generator network models occlusions arising during target motions and combines the appearance extracted from the source image and the motion derived from the driving video. Our framework scores best on diverse benchmarks and on a variety of object categories. Our source code is publicly available. |
Tasks | Image Animation |
Published | 2020-02-29 |
URL | https://arxiv.org/abs/2003.00196v2 |
PDF | https://arxiv.org/pdf/2003.00196v2.pdf |
PWC | https://paperswithcode.com/paper/first-order-motion-model-for-image-animation-1 |
Repo | |
Framework | |
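The first-order approximation at the heart of the model can be sketched directly: around each keypoint the driving-to-source warp is a local affine map defined by the paired keypoints and their Jacobians, and the per-keypoint warps are blended into a dense motion field. In the paper the blending masks (and occlusion maps) come from a dense-motion network; the Gaussian weighting below is a simplification for illustration.

```python
# Sketch of the first-order motion approximation:
#   T_k(z) ~= p_src_k + J_k @ (z - p_drv_k)   around each keypoint k.
import numpy as np

def dense_motion(grid, p_src, p_drv, jacobians, sigma=0.1):
    """grid: (H, W, 2) driving coords in [-1, 1]; p_*: (K, 2); jacobians: (K, 2, 2)."""
    H, W, _ = grid.shape
    K = p_src.shape[0]
    warped = np.zeros((K, H, W, 2))
    weights = np.zeros((K, H, W))
    for k in range(K):
        diff = grid - p_drv[k]                        # (H, W, 2)
        warped[k] = p_src[k] + diff @ jacobians[k].T  # local affine transform
        weights[k] = np.exp(-(diff ** 2).sum(-1) / (2 * sigma ** 2))
    weights /= weights.sum(0, keepdims=True) + 1e-8   # stand-in for learned masks
    return (weights[..., None] * warped).sum(0)       # (H, W, 2) sampling field

# Toy usage: 10 keypoints on a 64x64 grid.
rng = np.random.default_rng(0)
ys, xs = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64), indexing="ij")
grid = np.stack([xs, ys], axis=-1)
p_src, p_drv = rng.uniform(-1, 1, (10, 2)), rng.uniform(-1, 1, (10, 2))
jac = np.tile(np.eye(2), (10, 1, 1))
flow = dense_motion(grid, p_src, p_drv, jac)
print(flow.shape)   # (64, 64, 2): where to sample the source image for each pixel
```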
Post-Comparison Mitigation of Demographic Bias in Face Recognition Using Fair Score Normalization
Title | Post-Comparison Mitigation of Demographic Bias in Face Recognition Using Fair Score Normalization |
Authors | Philipp Terhörst, Jan Niklas Kolf, Naser Damer, Florian Kirchbuchner, Arjan Kuijper |
Abstract | Current face recognition systems have achieved high performance on several benchmark tests. Despite this progress, recent works showed that these systems are strongly biased against demographic sub-groups. Consequently, an easily integrable solution is needed to reduce the discriminatory effect of these biased systems. Previous work introduced fairness-enhancing solutions that strongly degrade the overall system performance. In this work, we propose a novel fair score normalization approach that is specifically designed to reduce the effect of bias in face recognition and subsequently leads to a significant overall performance boost. Our hypothesis is built on the notion of individual fairness by designing a normalization approach that leads to treating "similar" individuals "similarly". Experiments were conducted on two publicly available datasets captured under controlled and in-the-wild circumstances. The results show that our fair normalization approach enhances the overall performance by up to 14.8% under intermediate false match rate settings and up to 30.7% under high security settings. Our proposed approach significantly reduces the errors of all demographic groups, and thus reduces bias. Especially under in-the-wild conditions, we demonstrate that our fair normalization method improves the recognition performance of the affected population sub-groups by 31.6%. Unlike previous work, our proposed fairness-enhancing solution does not require demographic information about the individuals, leads to an overall performance boost, and can be easily integrated into existing biometric systems. |
Tasks | Face Recognition |
Published | 2020-02-10 |
URL | https://arxiv.org/abs/2002.03592v1 |
PDF | https://arxiv.org/pdf/2002.03592v1.pdf |
PWC | https://paperswithcode.com/paper/post-comparison-mitigation-of-demographic |
Repo | |
Framework | |
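One plausible reading of "treating similar individuals similarly" is sketched below: cluster embeddings without demographic labels, estimate a per-cluster impostor-score threshold at a target false match rate, and shift comparison scores so every cluster shares a common operating threshold. The clustering choice, FMR target, and shift rule are assumptions, not the paper's exact normalization.

```python
# Sketch of cluster-wise score normalisation for face comparison scores.
import numpy as np
from sklearn.cluster import KMeans

def fit_cluster_thresholds(embeddings, impostor_pairs, scores, k=4, fmr=1e-3):
    """impostor_pairs: (M, 2) indices into embeddings; scores: (M,) their similarities."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    labels = km.labels_
    thresholds = np.zeros(k)
    for c in range(k):
        # impostor comparisons where at least one sample falls in cluster c
        mask = (labels[impostor_pairs[:, 0]] == c) | (labels[impostor_pairs[:, 1]] == c)
        cluster_scores = np.sort(scores[mask])
        if len(cluster_scores) == 0:
            continue  # no impostor data for this cluster; leave threshold at 0
        thresholds[c] = cluster_scores[int((1 - fmr) * (len(cluster_scores) - 1))]
    return km, thresholds

def normalise(score, emb_a, emb_b, km, thresholds, global_thr):
    """Shift a comparison score by the mean offset of the two clusters involved,
    so that one global decision threshold yields a similar local FMR everywhere."""
    ca, cb = km.predict(np.vstack([emb_a, emb_b]))
    offset = 0.5 * ((global_thr - thresholds[ca]) + (global_thr - thresholds[cb]))
    return score + offset
```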
On the Robustness of Face Recognition Algorithms Against Attacks and Bias
Title | On the Robustness of Face Recognition Algorithms Against Attacks and Bias |
Authors | Richa Singh, Akshay Agarwal, Maneet Singh, Shruti Nagpal, Mayank Vatsa |
Abstract | Face recognition algorithms have demonstrated very high recognition performance, suggesting suitability for real world applications. Despite the enhanced accuracies, robustness of these algorithms against attacks and bias has been challenged. This paper summarizes different ways in which the robustness of a face recognition algorithm is challenged, which can severely affect its intended working. Different types of attacks such as physical presentation attacks, disguise/makeup, digital adversarial attacks, and morphing/tampering using GANs have been discussed. We also present a discussion on the effect of bias on face recognition models and showcase that factors such as age and gender variations affect the performance of modern algorithms. The paper also presents the potential reasons for these challenges and some of the future research directions for increasing the robustness of face recognition models. |
Tasks | Face Recognition |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02942v1 |
PDF | https://arxiv.org/pdf/2002.02942v1.pdf |
PWC | https://paperswithcode.com/paper/on-the-robustness-of-face-recognition |
Repo | |
Framework | |
How Does Gender Balance In Training Data Affect Face Recognition Accuracy?
Title | How Does Gender Balance In Training Data Affect Face Recognition Accuracy? |
Authors | Vítor Albiero, Kai Zhang, Kevin W. Bowyer |
Abstract | Even though deep learning methods have greatly increased the overall accuracy of face recognition, an old problem still persists: accuracy is higher for men than for women. Previous researchers have speculated that the difference could be due to cosmetics, head pose, or hair covering the face. It is also often speculated that the lower accuracy for women is caused by women being under-represented in the training data. This work aims to investigate if gender imbalance in the training data is actually the cause of lower accuracy for females. Using a state-of-the-art deep CNN, three different loss functions, and two training datasets, we train each on seven subsets with different male/female ratios, totaling forty-two trainings. The trained face matchers are then tested on three different testing datasets. Results show that gender-balancing the dataset has an overall positive effect, with higher accuracy for most of the combinations of loss functions and datasets when a balanced subset is used. However, for the best combination of loss function and dataset, the original training dataset shows better accuracy in 3 out of 4 cases. We observe that test accuracy for males is higher when the training data is all male. However, test accuracy for females is not maximized when the training data is all female. For a number of combinations of loss function and test dataset, accuracy for females is higher when only 75% of the training data is female than when 100% of the training data is female. This suggests that lower accuracy for females is not a simple result of the fraction of female training data. By clustering face features, we show that in general, male faces are closer to other male faces than female faces, and female faces are closer to other female faces than male faces. |
Tasks | Face Recognition |
Published | 2020-02-07 |
URL | https://arxiv.org/abs/2002.02934v1 |
PDF | https://arxiv.org/pdf/2002.02934v1.pdf |
PWC | https://paperswithcode.com/paper/how-does-gender-balance-in-training-data |
Repo | |
Framework | |
MEMO: A Deep Network for Flexible Combination of Episodic Memories
Title | MEMO: A Deep Network for Flexible Combination of Episodic Memories |
Authors | Andrea Banino, Adrià Puigdomènech Badia, Raphael Köster, Martin J. Chadwick, Vinicius Zambaldi, Demis Hassabis, Caswell Barry, Matthew Botvinick, Dharshan Kumaran, Charles Blundell |
Abstract | Recent research developing neural network architectures with external memory has often used the benchmark bAbI question-answering dataset, which provides a challenging number of tasks requiring reasoning. Here we employed a classic associative inference task from the memory-based reasoning neuroscience literature in order to more carefully probe the reasoning capacity of existing memory-augmented architectures. This task is thought to capture the essence of reasoning – the appreciation of distant relationships among elements distributed across multiple facts or memories. Surprisingly, we found that current architectures struggle to reason over long distance associations. Similar results were obtained on a more complex task involving finding the shortest path between nodes in a path. We therefore developed MEMO, an architecture endowed with the capacity to reason over longer distances. This was accomplished with the addition of two novel components. First, it introduces a separation between memories (facts) stored in external memory and the items that comprise these facts in external memory. Second, it makes use of an adaptive retrieval mechanism, allowing a variable number of "memory hops" before the answer is produced. MEMO is capable of solving our novel reasoning tasks, as well as matching state-of-the-art results on bAbI. |
Tasks | |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2001.10913v1 |
PDF | https://arxiv.org/pdf/2001.10913v1.pdf |
PWC | https://paperswithcode.com/paper/memo-a-deep-network-for-flexible-combination-1 |
Repo | |
Framework | |
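The adaptive retrieval mechanism can be sketched as repeated attention "hops" over an external memory, with a small halting network deciding when to stop, up to a hop budget. This is a generic adaptive-computation sketch, not the authors' architecture; MEMO additionally separates stored facts from their constituent items and trains the halting policy differently.

```python
# Sketch of adaptive memory hops: attend over memory repeatedly, halt when confident.
import torch
import torch.nn as nn

class AdaptiveHops(nn.Module):
    def __init__(self, dim, max_hops=5):
        super().__init__()
        self.max_hops = max_hops
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.update = nn.GRUCell(dim, dim)
        self.halt = nn.Linear(dim, 1)

    def forward(self, query, memory):
        """query: (B, dim) current question state; memory: (B, M, dim) stored facts."""
        state = query
        for hop in range(self.max_hops):
            read, _ = self.attn(state.unsqueeze(1), memory, memory)  # attend over memory
            state = self.update(read.squeeze(1), state)              # integrate the read
            p_halt = torch.sigmoid(self.halt(state))                 # probability of stopping
            if bool((p_halt > 0.5).all()):                           # greedy halting at inference
                break
        return state, hop + 1

# Toy usage: batch of 2 queries over 16 memory slots of width 64.
model = AdaptiveHops(dim=64)
q, mem = torch.randn(2, 64), torch.randn(2, 16, 64)
out, hops = model(q, mem)
print(out.shape, hops)
```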
Four Principles of Explainable AI as Applied to Biometrics and Facial Forensic Algorithms
Title | Four Principles of Explainable AI as Applied to Biometrics and Facial Forensic Algorithms |
Authors | P. Jonathon Phillips, Mark Przybocki |
Abstract | Traditionally, researchers in automatic face recognition and biometric technologies have focused on developing accurate algorithms. With this technology being integrated into operational systems, engineers and scientists are being asked, do these systems meet societal norms? The origin of this line of inquiry is 'trust' of artificial intelligence (AI) systems. In this paper, we concentrate on adapting explainable AI to face recognition and biometrics, and we present four principles of explainable AI as applied to face recognition and biometrics. The principles are illustrated by four case studies, which show the challenges and issues in developing algorithms that can produce explanations. |
Tasks | Face Recognition |
Published | 2020-02-03 |
URL | https://arxiv.org/abs/2002.01014v1 |
PDF | https://arxiv.org/pdf/2002.01014v1.pdf |
PWC | https://paperswithcode.com/paper/four-principles-of-explainable-ai-as-applied |
Repo | |
Framework | |
A Deep Structural Model for Analyzing Correlated Multivariate Time Series
Title | A Deep Structural Model for Analyzing Correlated Multivariate Time Series |
Authors | Changwei Hu, Yifan Hu, Sungyong Seo |
Abstract | Multivariate time series are routinely encountered in real-world applications, and in many cases, these time series are strongly correlated. In this paper, we present a deep learning structural time series model which can (i) handle correlated multivariate time series input, and (ii) forecast the targeted temporal sequence by explicitly learning/extracting the trend, seasonality, and event components. The trend is learned via a 1D and 2D temporal CNN and LSTM hierarchical neural net. The CNN-LSTM architecture can (i) seamlessly leverage the dependency among multiple correlated time series in a natural way, (ii) extract the weighted differencing feature for better trend learning, and (iii) memorize the long-term sequential pattern. The seasonality component is approximated via a non-linear function of a set of Fourier terms, and the event components are learned by a simple linear function of regressors encoding the event dates. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of time series data sets, such as forecasts of Amazon AWS Simple Storage Service (S3) and Elastic Compute Cloud (EC2) billings, and the closing prices for corporate stocks in the same category. |
Tasks | Time Series |
Published | 2020-01-02 |
URL | https://arxiv.org/abs/2001.00559v1 |
PDF | https://arxiv.org/pdf/2001.00559v1.pdf |
PWC | https://paperswithcode.com/paper/a-deep-structural-model-for-analyzing |
Repo | |
Framework | |
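The seasonality and event components lend themselves to a short sketch: Fourier terms of the time index for seasonality, and indicator regressors for event dates. The trend component (the CNN-LSTM over correlated series) is omitted, and the feature construction below is an illustrative assumption rather than the paper's code.

```python
# Sketch of seasonality (Fourier terms) and event (indicator) regressors.
import numpy as np

def fourier_terms(t, period, order=3):
    """t: integer time index array; returns (len(t), 2*order) sin/cos features."""
    t = np.asarray(t, dtype=float)
    feats = []
    for k in range(1, order + 1):
        feats.append(np.sin(2 * np.pi * k * t / period))
        feats.append(np.cos(2 * np.pi * k * t / period))
    return np.stack(feats, axis=1)

def event_regressors(dates, event_dates):
    """One indicator column per event date; dates are numpy datetime64 values."""
    dates = np.asarray(dates)
    return np.stack([(dates == np.datetime64(e)).astype(float) for e in event_dates], axis=1)

# Toy usage: weekly seasonality and two event dates over 60 days.
t = np.arange(60)
dates = np.datetime64("2020-01-01") + t
X_season = fourier_terms(t, period=7, order=2)                      # (60, 4)
X_event = event_regressors(dates, ["2020-01-15", "2020-02-10"])     # (60, 2)
print(X_season.shape, X_event.shape)
```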
Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models
Title | Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models |
Authors | Pranav Agarwal, Alejandro Betancourt, Vana Panagiotou, Natalia Díaz-Rodríguez |
Abstract | Image captioning models have been able to generate grammatically correct and human understandable sentences. However, most of the captions convey limited information, as the models used are trained on datasets that do not caption all possible objects existing in everyday life. Due to this lack of prior information, most of the captions are biased to only a few objects present in the scene, hence limiting their usage in daily life. In this paper, we attempt to show the biased nature of the currently existing image captioning models and present a new image captioning dataset, Egoshots, consisting of 978 real life images with no captions. We further exploit the state of the art pre-trained image captioning and object recognition networks to annotate our images and show the limitations of existing works. Furthermore, in order to evaluate the quality of the generated captions, we propose a new image captioning metric, object based Semantic Fidelity (SF). Existing image captioning metrics can evaluate a caption only in the presence of their corresponding annotations; however, SF allows evaluating captions generated for images without annotations, making it highly useful for real life generated captions. |
Tasks | Image Captioning, Object Recognition |
Published | 2020-03-26 |
URL | https://arxiv.org/abs/2003.11743v2 |
PDF | https://arxiv.org/pdf/2003.11743v2.pdf |
PWC | https://paperswithcode.com/paper/egoshots-an-ego-vision-life-logging-dataset |
Repo | |
Framework | |
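A recall-style stand-in for the object-based Semantic Fidelity idea is easy to sketch: compare the labels returned by a pre-trained object detector with the words of the generated caption, without needing reference captions. The exact SF formula in the paper may differ from this simple overlap score.

```python
# Sketch of an annotation-free, object-based caption fidelity check.
def semantic_fidelity(detected_objects, caption, synonyms=None):
    """detected_objects: labels from an object detector; caption: generated sentence."""
    synonyms = synonyms or {}
    caption_tokens = set(caption.lower().replace(".", "").split())
    covered = 0
    for obj in detected_objects:
        names = {obj.lower()} | set(synonyms.get(obj.lower(), []))
        if names & caption_tokens:          # the caption mentions this object (or a synonym)
            covered += 1
    return covered / max(len(detected_objects), 1)

# Toy usage.
detections = ["laptop", "cup", "person"]
caption = "A person working on a laptop next to a mug"
print(semantic_fidelity(detections, caption, synonyms={"cup": ["mug"]}))  # 1.0
```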
LiDARNet: A Boundary-Aware Domain Adaptation Model for Lidar Point Cloud Semantic Segmentation
Title | LiDARNet: A Boundary-Aware Domain Adaptation Model for Lidar Point Cloud Semantic Segmentation |
Authors | Peng Jiang, Srikanth Saripalli |
Abstract | We present a boundary-aware domain adaptation model for Lidar point cloud semantic segmentation. Our model is designed to extract both the domain private features and the domain shared features using shared weights. We embedded Gated-SCNN into the shared feature extractors to help it learn boundary information while learning other shared features. Besides, the CycleGAN mechanism is imposed for further adaptation. We conducted experiments on real-world datasets. The source domain data is from the Semantic KITTI dataset, and the target domain data is collected from our own platform (a warthog) in off-road as well as urban scenarios. The two datasets have differences in channel distributions, reflectivity distributions, and sensor setup. Using our approach, we are able to obtain a single model that can work on both domains. The model is capable of achieving state-of-the-art performance on the source domain (Semantic KITTI dataset) and reaching 44.0% mIoU on the target domain dataset. |
Tasks | Domain Adaptation, Semantic Segmentation |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.01174v1 |
PDF | https://arxiv.org/pdf/2003.01174v1.pdf |
PWC | https://paperswithcode.com/paper/lidarnet-a-boundary-aware-domain-adaptation |
Repo | |
Framework | |
Adversarial Robustness for Code
Title | Adversarial Robustness for Code |
Authors | Pavol Bielik, Martin Vechev |
Abstract | We propose a novel technique which addresses the challenge of learning accurate and robust models of code in a principled way. Our method consists of three key components: (i) learning to abstain from making a prediction if uncertain, (ii) adversarial training, and (iii) representation refinement which learns the program parts relevant for the prediction and abstracts the rest. These components are used to iteratively train multiple models, each of which learns a suitable program representation necessary to make robust predictions on a different subset of the dataset. We instantiated our approach to the task of type inference for dynamically typed languages and demonstrate its effectiveness by learning a model that achieves 88% accuracy and 84% robustness. Further, our evaluation shows that using the combination of all three components is key to obtaining accurate and robust models. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04694v1 |
PDF | https://arxiv.org/pdf/2002.04694v1.pdf |
PWC | https://paperswithcode.com/paper/adversarial-robustness-for-code |
Repo | |
Framework | |
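The abstain-and-refine loop can be sketched with generic classifiers: each model predicts only where it is confident, and the next model is trained on the examples the previous ones abstained from. The models and confidence rule below are stand-ins; the paper combines this with adversarial training and learned representation refinement over program representations.

```python
# Sketch of an abstaining cascade: confident models answer, the rest is passed on.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_abstaining_cascade(X, y, n_models=3, conf_threshold=0.9):
    """Iteratively train models; each later model sees only the earlier models' abstentions."""
    models, remaining = [], np.arange(len(y))
    for _ in range(n_models):
        if len(remaining) == 0:
            break
        clf = LogisticRegression(max_iter=1000).fit(X[remaining], y[remaining])
        conf = clf.predict_proba(X[remaining]).max(axis=1)
        models.append(clf)
        remaining = remaining[conf < conf_threshold]   # next model handles the hard subset
    return models

def predict_or_abstain(models, x, conf_threshold=0.9):
    """Return the first confident model's prediction, or None to abstain."""
    for clf in models:
        proba = clf.predict_proba(x.reshape(1, -1))[0]
        if proba.max() >= conf_threshold:
            return clf.classes_[proba.argmax()]
    return None   # abstain: no model is confident enough
```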