April 2, 2020

Paper Group ANR 186

Focus on Semantic Consistency for Cross-domain Crowd Understanding. InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining. Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching. Estimating Treatment Effects with Observed Confounders and Mediators. Meta3D: Single-View 3D Object Reconstruction from Sh …

Focus on Semantic Consistency for Cross-domain Crowd Understanding

Title Focus on Semantic Consistency for Cross-domain Crowd Understanding
Authors Tao Han, Junyu Gao, Yuan Yuan, Qi Wang
Abstract For pixel-level crowd understanding, data collection and annotation are time-consuming and laborious. Some domain adaptation algorithms try to relieve this burden by training models with synthetic data, and results in some recent works have proved the feasibility. However, we found that a mass of estimation errors in the background areas impedes the performance of the existing methods. In this paper, we propose a domain adaptation method to eliminate these errors. Based on semantic consistency, that is, the similar distribution of deep-layer features in synthetic and real-world crowd areas, we first introduce a semantic extractor to effectively distinguish crowd and background in high-level semantic information. Besides, to further enhance the adapted model, we adopt adversarial learning to align features in the semantic space. Experiments on three representative real datasets show that the proposed domain adaptation scheme achieves state-of-the-art results for cross-domain counting problems.
Tasks Domain Adaptation
Published 2020-02-20
URL https://arxiv.org/abs/2002.08623v1
PDF https://arxiv.org/pdf/2002.08623v1.pdf
PWC https://paperswithcode.com/paper/focus-on-semantic-consistency-for-cross
Repo
Framework

InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining

Title InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining
Authors Junyang Lin, An Yang, Yichang Zhang, Jie Liu, Jingren Zhou, Hongxia Yang
Abstract Multi-modal pretraining for learning high-level multi-modal representation is a further step towards deep learning and artificial intelligence. In this work, we propose a novel model, namely InterBERT (BERT for Interaction), which has a strong capability for modeling interaction between the information flows of different modalities. The single-stream interaction module is capable of effectively processing information of multiple modalities, and the two-stream module on top preserves the independence of each modality to avoid performance degradation in single-modal tasks. We pretrain the model with three pretraining tasks, including masked segment modeling (MSM), masked region modeling (MRM) and image-text matching (ITM); and finetune the model on a series of vision-and-language downstream tasks. Experimental results demonstrate that InterBERT outperforms a series of strong baselines, including the most recent multi-modal pretraining methods, and the analysis shows that MSM and MRM are effective for pretraining and our method can achieve performance comparable to BERT in single-modal tasks. Besides, we propose a large-scale dataset for multi-modal pretraining in Chinese, and we develop the Chinese InterBERT which is the first Chinese multi-modal pretrained model. We pretrain the Chinese InterBERT on our proposed dataset of 3.1M image-text pairs from the mobile Taobao, the largest Chinese e-commerce platform. We finetune the model for text-based image retrieval, and recently we deployed the model online for topic-based recommendation.
Tasks Image Retrieval, Text Matching
Published 2020-03-30
URL https://arxiv.org/abs/2003.13198v1
PDF https://arxiv.org/pdf/2003.13198v1.pdf
PWC https://paperswithcode.com/paper/interbert-vision-and-language-interaction-for
Repo
Framework

Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching

Title Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching
Authors Tianlang Chen, Jiebo Luo
Abstract Existing image-text matching approaches typically infer the similarity of an image-text pair by capturing and aggregating the affinities between the text and each independent object of the image. However, they ignore the connections between the objects that are semantically related. These objects may collectively determine whether the image corresponds to a text or not. To address this problem, we propose a Dual Path Recurrent Neural Network (DP-RNN) which processes images and sentences symmetrically by recurrent neural networks (RNN). In particular, given an input image-text pair, our model reorders the image objects based on the positions of their most related words in the text. In the same way as extracting the hidden features from word embeddings, the model leverages RNN to extract high-level object features from the reordered object inputs. We validate that the high-level object features contain useful joint information of semantically related objects, which benefit the retrieval task. To compute the image-text similarity, we incorporate a Multi-attention Cross Matching Model into DP-RNN. It aggregates the affinity between objects and words with cross-modality guided attention and self-attention. Our model achieves the state-of-the-art performance on Flickr30K dataset and competitive performance on MS-COCO dataset. Extensive experiments demonstrate the effectiveness of our model.
Tasks Text Matching, Word Embeddings
Published 2020-02-20
URL https://arxiv.org/abs/2002.08510v1
PDF https://arxiv.org/pdf/2002.08510v1.pdf
PWC https://paperswithcode.com/paper/expressing-objects-just-like-words-recurrent
Repo
Framework

Estimating Treatment Effects with Observed Confounders and Mediators

Title Estimating Treatment Effects with Observed Confounders and Mediators
Authors Shantanu Gupta, Zachary C. Lipton, David Childers
Abstract Given a causal graph, the do-calculus can express treatment effects as functionals of the observational joint distribution that can be estimated empirically. Sometimes the do-calculus identifies multiple valid formulae, prompting us to compare the statistical properties of the corresponding estimators. For example, the backdoor formula applies when all confounders are observed and the frontdoor formula applies when an observed mediator transmits the causal effect. In this paper, we investigate the over-identified scenario where both confounders and mediators are observed, rendering both estimators valid. Addressing the linear Gaussian causal model, we derive the finite-sample variance for both estimators and demonstrate that either estimator can dominate the other by an unbounded constant factor depending on the model parameters. Next, we derive an optimal estimator, which leverages all observed variables to strictly outperform the backdoor and frontdoor estimators. We also present a procedure for combining two datasets, with confounders observed in one and mediators in the other. Finally, we evaluate our methods on both simulated data and the IHDP and JTPA datasets.
Tasks
Published 2020-03-26
URL https://arxiv.org/abs/2003.11991v1
PDF https://arxiv.org/pdf/2003.11991v1.pdf
PWC https://paperswithcode.com/paper/estimating-treatment-effects-with-observed
Repo
Framework
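
As a rough illustration of the over-identified setting described in the abstract above, the following sketch simulates a linear Gaussian model in which both a confounder and a mediator are observed, then computes a backdoor and a frontdoor estimate of the same effect. The graph (W → X, W → Y, X → M, M → Y), the coefficient names `a`, `b`, `c`, `d`, and the helper functions are assumptions made for this example and are not taken from the paper; the paper's variance analysis and optimal estimator are not reproduced here.

```python
import numpy as np


def simulate(n, a=1.0, b=1.0, c=0.8, d=0.5, seed=0):
    """Draw n samples from an assumed linear Gaussian model.

    W confounds X and Y; M mediates the whole effect of X on Y,
    so the true effect of X on Y is c * d.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                  # observed confounder
    x = a * w + rng.normal(size=n)          # treatment
    m = c * x + rng.normal(size=n)          # observed mediator
    y = d * m + b * w + rng.normal(size=n)  # outcome
    return w, x, m, y


def ols(target, *regressors):
    """Least-squares coefficients (no intercept; all variables are zero-mean)."""
    design = np.column_stack(regressors)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def backdoor_estimate(w, x, y):
    """Adjust for the confounder: coefficient of X when regressing Y on (X, W)."""
    return ols(y, x, w)[0]


def frontdoor_estimate(x, m, y):
    """Chain X -> M and M -> Y: the second regression controls for X."""
    c_hat = ols(m, x)[0]
    d_hat = ols(y, m, x)[0]
    return c_hat * d_hat
```

With the default coefficients the true effect is `0.8 * 0.5`, and calling `backdoor_estimate(w, x, y)` and `frontdoor_estimate(x, m, y)` on a large sample from `simulate` should give two estimates close to that value. Repeating the simulation over many seeds is one way to compare their spread empirically.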

Meta3D: Single-View 3D Object Reconstruction from Shape Priors in Memory

Title Meta3D: Single-View 3D Object Reconstruction from Shape Priors in Memory
Authors Shuo Yang, Min Xu, Hongxun Yao
Abstract 3D shape reconstruction from a single-view RGB image is an ill-posed problem due to the invisible parts of the object to be reconstructed. Most of the existing methods rely on large-scale data to obtain shape priors through tuning parameters of reconstruction models. These methods might not be able to deal with cases of heavy object occlusion and noisy backgrounds, since prior information cannot be retained completely or applied efficiently. In this paper, we are the first to develop a memory-based meta-learning framework for single-view 3D reconstruction. A write controller is designed to extract shape-discriminative features from images and store image features and their corresponding volumes into external memory. A read controller is proposed to sequentially encode shape priors related to the input image and predict a shape-specific refiner. Experimental results demonstrate that our Meta3D outperforms state-of-the-art methods by a large margin through retaining shape priors explicitly, especially for the extremely difficult cases.
Tasks 3D Object Reconstruction, 3D Reconstruction, Meta-Learning, Object Reconstruction, Single-View 3D Reconstruction
Published 2020-03-08
URL https://arxiv.org/abs/2003.03711v2
PDF https://arxiv.org/pdf/2003.03711v2.pdf
PWC https://paperswithcode.com/paper/meta3d-single-view-3d-object-reconstruction
Repo
Framework

FSinR: an exhaustive package for feature selection

Title FSinR: an exhaustive package for feature selection
Authors F. Aragón-Royón, A. Jiménez-Vílchez, A. Arauzo-Azofra, J. M. Benítez
Abstract Feature Selection (FS) is a key task in Machine Learning. It consists in selecting a number of relevant variables for the model construction or data analysis. We present the R package, FSinR, which implements a variety of widely known filter and wrapper methods, as well as search algorithms. Thus, the package provides the possibility to perform the feature selection process, which consists in the combination of a guided search on the subsets of features with the filter or wrapper methods that return an evaluation measure of those subsets. In this article, we also present some examples on the usage of the package and a comparison with other packages available in R that contain methods for feature selection.
Tasks Feature Selection
Published 2020-02-24
URL https://arxiv.org/abs/2002.10330v1
PDF https://arxiv.org/pdf/2002.10330v1.pdf
PWC https://paperswithcode.com/paper/fsinr-an-exhaustive-package-for-feature
Repo
Framework
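
The process described in the abstract above, a guided search over feature subsets combined with a measure that scores each subset, can be sketched generically as follows. This is Python rather than R and does not use or mirror the FSinR API; the function names `r_squared` and `forward_selection`, the `min_gain` stopping threshold, and the choice of in-sample R² as the evaluation measure are all assumptions made for illustration.

```python
import numpy as np


def r_squared(X, y, subset):
    """Evaluation measure: in-sample R^2 of a linear fit on the chosen columns."""
    if not subset:
        return 0.0
    design = np.column_stack([X[:, list(subset)], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return 1.0 - residual.var() / y.var()


def forward_selection(X, y, evaluate, min_gain=0.01):
    """Greedy sequential forward search over feature subsets.

    At each step, add the feature that most improves the evaluation
    measure; stop when the best improvement falls below min_gain.
    """
    selected, best = [], evaluate(X, y, [])
    remaining = set(range(X.shape[1]))
    while remaining:
        scores = {j: evaluate(X, y, selected + [j]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best
```

Because `evaluate` is passed in as an argument, the same search loop can be paired with a different subset measure, which is the separation between search algorithm and evaluator that the abstract describes.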

ActGAN: Flexible and Efficient One-shot Face Reenactment

Title ActGAN: Flexible and Efficient One-shot Face Reenactment
Authors Ivan Kosarevych, Marian Petruk, Markian Kostiv, Orest Kupyn, Mykola Maksymenko, Volodymyr Budzan
Abstract This paper introduces ActGAN - a novel end-to-end generative adversarial network (GAN) for one-shot face reenactment. Given two images, the goal is to transfer the facial expression of the source actor onto a target person in a photo-realistic fashion. While existing methods require target identity to be predefined, we address this problem by introducing a “many-to-many” approach, which allows arbitrary persons both for source and target without additional retraining. To this end, we employ the Feature Pyramid Network (FPN) as a core generator building block - the first application of FPN in face reenactment, producing finer results. We also introduce a solution to preserve a person’s identity between synthesized and target person by adopting the state-of-the-art approach in deep face recognition domain. The architecture readily supports reenactment in different scenarios: “many-to-many”, “one-to-one”, “one-to-another” in terms of expression accuracy, identity preservation, and overall image quality. We demonstrate that ActGAN achieves competitive performance against recent works concerning visual quality.
Tasks Face Recognition, Face Reenactment
Published 2020-03-30
URL https://arxiv.org/abs/2003.13840v1
PDF https://arxiv.org/pdf/2003.13840v1.pdf
PWC https://paperswithcode.com/paper/actgan-flexible-and-efficient-one-shot-face
Repo
Framework

Continuous Domain Adaptation with Variational Domain-Agnostic Feature Replay

Title Continuous Domain Adaptation with Variational Domain-Agnostic Feature Replay
Authors Qicheng Lao, Xiang Jiang, Mohammad Havaei, Yoshua Bengio
Abstract Learning in non-stationary environments is one of the biggest challenges in machine learning. Non-stationarity can be caused by either task drift, i.e., the drift in the conditional distribution of labels given the input data, or the domain drift, i.e., the drift in the marginal distribution of the input data. This paper aims to tackle this challenge in the context of continuous domain adaptation, where the model is required to learn new tasks adapted to new domains in a non-stationary environment while maintaining previously learned knowledge. To deal with both drifts, we propose variational domain-agnostic feature replay, an approach that is composed of three components: an inference module that filters the input data into domain-agnostic representations, a generative module that facilitates knowledge transfer, and a solver module that applies the filtered and transferable knowledge to solve the queries. We address the two fundamental scenarios in continuous domain adaptation, demonstrating the effectiveness of our proposed approach for practical usage.
Tasks Domain Adaptation, Transfer Learning
Published 2020-03-09
URL https://arxiv.org/abs/2003.04382v1
PDF https://arxiv.org/pdf/2003.04382v1.pdf
PWC https://paperswithcode.com/paper/continuous-domain-adaptation-with-variational
Repo
Framework

Advances in Bayesian Probabilistic Modeling for Industrial Applications

Title Advances in Bayesian Probabilistic Modeling for Industrial Applications
Authors Sayan Ghosh, Piyush Pandita, Steven Atkinson, Waad Subber, Yiming Zhang, Natarajan Chennimalai Kumar, Suryarghya Chakrabarti, Liping Wang
Abstract Industrial applications frequently pose a notorious challenge for state-of-the-art methods in the contexts of optimization, designing experiments and modeling unknown physical response. This problem is aggravated by limited availability of clean data, uncertainty in available physics-based models and additional logistic and computational expense associated with experiments. In such a scenario, Bayesian methods have played an impactful role in alleviating the aforementioned obstacles by quantifying uncertainty of different types under limited resources. These methods, usually deployed as a framework, allow decision makers to make informed choices under uncertainty while being able to incorporate information on the fly, usually in the form of data, from multiple sources while being consistent with the physical intuition about the problem. This is a major advantage that Bayesian methods bring to fruition especially in the industrial context. This paper is a compendium of the Bayesian modeling methodology that is being consistently developed at GE Research. The methodology, called GE’s Bayesian Hybrid Modeling (GEBHM), is a probabilistic modeling method, based on the Kennedy and O’Hagan framework, that has been continuously scaled-up and industrialized over several years. In this work, we explain the various advancements in GEBHM’s methods and demonstrate their impact on several challenging industrial problems.
Tasks
Published 2020-03-26
URL https://arxiv.org/abs/2003.11939v1
PDF https://arxiv.org/pdf/2003.11939v1.pdf
PWC https://paperswithcode.com/paper/advances-in-bayesian-probabilistic-modeling
Repo
Framework

Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration

Title Off-Policy Deep Reinforcement Learning with Analogous Disentangled Exploration
Authors Anji Liu, Yitao Liang, Guy Van den Broeck
Abstract Off-policy reinforcement learning (RL) is concerned with learning a rewarding policy by executing another policy that gathers samples of experience. While the former policy (i.e. target policy) is rewarding but inexpressive (in most cases, deterministic), doing well in the latter task, in contrast, requires an expressive policy (i.e. behavior policy) that offers guided and effective exploration. Contrary to most methods that make a trade-off between optimality and expressiveness, disentangled frameworks explicitly decouple the two objectives, each of which is handled by a separate policy. Although the two policies can be freely designed and optimized with respect to their own objectives, naively disentangling them can lead to inefficient learning or stability issues. To mitigate this problem, our proposed method Analogous Disentangled Actor-Critic (ADAC) designs analogous pairs of actors and critics. Specifically, ADAC leverages a key property about Stein variational gradient descent (SVGD) to constrain the expressive energy-based behavior policy with respect to the target one for effective exploration. Additionally, an analogous critic pair is introduced to incorporate intrinsic rewards in a principled manner, with theoretical guarantees on the overall learning stability and effectiveness. We empirically evaluate environment-reward-only ADAC on 14 continuous-control tasks and report the state-of-the-art on 10 of them. We further demonstrate that ADAC, when paired with intrinsic rewards, outperforms alternatives in exploration-challenging tasks.
Tasks Continuous Control
Published 2020-02-25
URL https://arxiv.org/abs/2002.10738v2
PDF https://arxiv.org/pdf/2002.10738v2.pdf
PWC https://paperswithcode.com/paper/off-policy-deep-reinforcement-learning-with
Repo
Framework

Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose

Title Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose
Authors Xianfang Zeng, Yusu Pan, Mengmeng Wang, Jiangning Zhang, Yong Liu
Abstract Recent works have shown how realistic talking face images can be obtained under the supervision of geometry guidance, e.g., facial landmark or boundary. To alleviate the demand for manual annotations, in this paper, we propose a novel self-supervised hybrid model (DAE-GAN) that learns how to reenact faces naturally given large amounts of unlabeled videos. Our approach combines two deforming autoencoders with the latest advances in conditional generation. On the one hand, we adopt the deforming autoencoder to disentangle identity and pose representations. A strong prior in talking face videos is that each frame can be encoded as two parts: one for video-specific identity and the other for various poses. Inspired by that, we utilize a multi-frame deforming autoencoder to learn a pose-invariant embedded face for each video. Meanwhile, a multi-scale deforming autoencoder is proposed to extract pose-related information for each frame. On the other hand, the conditional generator allows for enhancing fine details and overall realism. It leverages the disentangled features to generate photo-realistic and pose-alike face images. We evaluate our model on the VoxCeleb1 and RaFD datasets. Experimental results demonstrate the superior quality of reenacted images and the flexibility of transferring facial movements between identities.
Tasks Face Reenactment
Published 2020-03-29
URL https://arxiv.org/abs/2003.12957v1
PDF https://arxiv.org/pdf/2003.12957v1.pdf
PWC https://paperswithcode.com/paper/realistic-face-reenactment-via-self
Repo
Framework

Safeguarded Learned Convex Optimization

Title Safeguarded Learned Convex Optimization
Authors Howard Heaton, Xiaohan Chen, Zhangyang Wang, Wotao Yin
Abstract Many applications require repeatedly solving a certain type of optimization problem, each time with new (but similar) data. Data-driven algorithms can “learn to optimize” (L2O) with much fewer iterations and with similar cost per iteration as general-purpose optimization algorithms. L2O algorithms are often derived from general-purpose algorithms, but with the inclusion of (possibly many) tunable parameters. Exceptional performance has been demonstrated when the parameters are optimized for a particular distribution of data. Unfortunately, it is impossible to ensure all L2O algorithms always converge to a solution. However, we present a framework that uses L2O updates together with a safeguard to guarantee convergence for convex problems with proximal and/or gradient oracles. The safeguard is simple and computationally cheap to implement, and it should be activated only when the current L2O updates would perform poorly or appear to diverge. This approach yields the numerical benefits of employing machine learning methods to create rapid L2O algorithms while still guaranteeing convergence. Our numerical examples demonstrate the efficacy of this approach for existing and new L2O schemes.
Tasks
Published 2020-03-04
URL https://arxiv.org/abs/2003.01880v1
PDF https://arxiv.org/pdf/2003.01880v1.pdf
PWC https://paperswithcode.com/paper/safeguarded-learned-convex-optimization
Repo
Framework
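
The safeguarding idea in the abstract above (use the learned update when it behaves well, fall back to a classical update otherwise) can be sketched as below. This is a simplified illustration and not the paper's algorithm: the acceptance test based on the gradient norm, the `shrink` factor, the plain gradient-descent fallback, and the names `safeguarded_minimize` and `learned_update` are assumptions for this example, and the sketch does not reproduce the paper's convergence guarantee.

```python
import numpy as np


def safeguarded_minimize(grad, x0, learned_update, step, iters=200, shrink=0.99):
    """Iterate with a learned update, guarded by a gradient-descent fallback.

    grad:           callable returning the gradient of a smooth convex objective
    learned_update: hypothetical stand-in for a trained L2O update, x -> x_next
    step:           step size for the fallback gradient step
    shrink:         the learned candidate is accepted only if it reduces the
                    gradient norm to at most shrink times its current value
    Returns the final iterate and the number of fallback steps taken.
    """
    x = np.asarray(x0, dtype=float)
    fallbacks = 0
    for _ in range(iters):
        candidate = learned_update(x)
        current = np.linalg.norm(grad(x))
        if np.linalg.norm(grad(candidate)) <= shrink * current:
            x = candidate           # learned step makes enough progress
        else:
            x = x - step * grad(x)  # safeguard: plain gradient step
            fallbacks += 1
    return x, fallbacks
```

For a quadratic objective with gradient `A @ x - b`, a deliberately bad `learned_update` such as `lambda x: x + 1.0` is rejected whenever it fails the test, so the iteration behaves like ordinary gradient descent; a useful update is accepted and the fallback is skipped on those iterations.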

Gender Representation in Open Source Speech Resources

Title Gender Representation in Open Source Speech Resources
Authors Mahault Garnerin, Solange Rossato, Laurent Besacier
Abstract With the rise of artificial intelligence (AI) and the growing use of deep-learning architectures, the question of ethics, transparency and fairness of AI systems has become a central concern within the research community. We address transparency and fairness in spoken language systems by proposing a study about gender representation in speech resources available through the Open Speech and Language Resource platform. We show that finding gender information in open source corpora is not straightforward and that gender balance depends on other corpus characteristics (elicited/non elicited speech, low/high resource language, speech task targeted). The paper ends with recommendations about metadata and gender information for researchers in order to assure better transparency of the speech systems built using such corpora.
Tasks
Published 2020-03-18
URL https://arxiv.org/abs/2003.08132v1
PDF https://arxiv.org/pdf/2003.08132v1.pdf
PWC https://paperswithcode.com/paper/gender-representation-in-open-source-speech
Repo
Framework

Predictive intraday correlations in stable and volatile market environments: Evidence from deep learning

Title Predictive intraday correlations in stable and volatile market environments: Evidence from deep learning
Authors Ben Moews, Gbenga Ibikunle
Abstract Standard methods and theories in finance can be ill-equipped to capture highly non-linear interactions in financial prediction problems based on large-scale datasets, with deep learning offering a way to gain insights into correlations in markets as complex systems. In this paper, we apply deep learning to econometrically constructed gradients to learn and exploit lagged correlations among S&P 500 stocks to compare model behaviour in stable and volatile market environments, and under the exclusion of target stock information for predictions. In order to measure the effect of time horizons, we predict intraday and daily stock price movements in varying interval lengths and gauge the complexity of the problem at hand with a modification of our model architecture. Our findings show that accuracies, while remaining significant and demonstrating the exploitability of lagged correlations in stock markets, decrease with shorter prediction horizons. We discuss implications for modern finance theory and our work’s applicability as an investigative tool for portfolio managers. Lastly, we show that our model’s performance is consistent in volatile markets by exposing it to the environment of the recent financial crisis of 2007/2008.
Tasks
Published 2020-02-24
URL https://arxiv.org/abs/2002.10385v1
PDF https://arxiv.org/pdf/2002.10385v1.pdf
PWC https://paperswithcode.com/paper/predictive-intraday-correlations-in-stable
Repo
Framework

Causal datasheet: An approximate guide to practically assess Bayesian networks in the real world

Title Causal datasheet: An approximate guide to practically assess Bayesian networks in the real world
Authors Bradley Butcher, Vincent S. Huang, Jeremy Reffin, Sema K. Sgaier, Grace Charles, Novi Quadrianto
Abstract In solving real-world problems like changing healthcare-seeking behaviors, designing interventions to improve downstream outcomes requires an understanding of the causal links within the system. Causal Bayesian Networks (BN) have been proposed as one such powerful method. In real-world applications, however, confidence in the results of BNs is often moderate at best. This is due in part to the inability to validate against some ground truth, as the DAG is not available. This is especially problematic if the learned DAG conflicts with pre-existing domain doctrine. At the policy level, one must justify insights generated by such analysis, preferably accompanying them with uncertainty estimation. Here we propose a causal extension to the datasheet concept proposed by Gebru et al. (2018) to include approximate BN performance expectations for any given dataset. To generate the results for a prototype Causal Datasheet, we constructed over 30,000 synthetic datasets with properties mirroring characteristics of real data. We then recorded the results given by state-of-the-art structure learning algorithms. These results were used to populate the Causal Datasheet, and recommendations were automatically generated dependent on expected performance. As a proof of concept, we used our Causal Datasheet Generation Tool (CDG-T) to assign performance expectations to a maternal health survey we conducted in Uttar Pradesh, India.
Tasks
Published 2020-03-12
URL https://arxiv.org/abs/2003.07182v1
PDF https://arxiv.org/pdf/2003.07182v1.pdf
PWC https://paperswithcode.com/paper/causal-datasheet-an-approximate-guide-to
Repo
Framework