Paper Group ANR 188
A linear method for camera pair self-calibration and multi-view reconstruction with geometrically verified correspondences. A Novel Unsupervised Post-Processing Calibration Method for DNNs with Robustness to Domain Shift. Evidential distance measure in complex belief function theory. Discovering Options for Exploration by Minimizing Cover Time. Agi …
A linear method for camera pair self-calibration and multi-view reconstruction with geometrically verified correspondences
Title | A linear method for camera pair self-calibration and multi-view reconstruction with geometrically verified correspondences |
Authors | Nikos Melanitis, Petros Maragos |
Abstract | We examine 3D reconstruction of architectural scenes in unordered sets of uncalibrated images. We introduce a linear method to self-calibrate and find the metric reconstruction of a camera pair. We assume unknown and different focal lengths but otherwise known internal camera parameters and a known projective reconstruction of the camera pair. We recover two possible camera configurations in space and use the Cheirality condition, that all 3D scene points are in front of both cameras, to disambiguate the solution. We show in two Theorems, first that the two solutions are in mirror positions and then the relations between their viewing directions. Our new method performs on par (median rotation error $\Delta R = 3.49^{\circ}$) with the standard approach of Kruppa equations ($\Delta R = 3.77^{\circ}$) for self-calibration and the 5-Point algorithm for calibrated metric reconstruction of a camera pair. We reject erroneous image correspondences by introducing a method to examine whether point correspondences appear in the same order along the $x, y$ image axes in image pairs. We evaluate this method by its precision and recall and show that it improves the robustness of point matches in architectural and general scenes. Finally, we integrate all the introduced methods into a 3D reconstruction pipeline. We utilize the numerous camera pair metric reconstructions using rotation-averaging algorithms and a novel method to average focal length estimates. |
Tasks | 3D Reconstruction, Calibration |
Published | 2019-06-28 |
URL | https://arxiv.org/abs/1906.12075v1 |
https://arxiv.org/pdf/1906.12075v1.pdf | |
PWC | https://paperswithcode.com/paper/a-linear-method-for-camera-pair-self |
Repo | |
Framework | |
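The cheirality disambiguation mentioned in the abstract above is easy to sketch: triangulate the matched points under each candidate camera pair and keep the configuration in which every point has positive depth in both views. Below is a minimal NumPy sketch under standard multiple-view-geometry conventions; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence; x1, x2 are homogeneous pixels."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X / X[3]

def depth(P, X):
    """Signed depth of the homogeneous point X in the camera with 3x4 matrix P."""
    x = P @ X
    M = P[:, :3]
    return np.sign(np.linalg.det(M)) * x[2] / np.linalg.norm(M[2])

def passes_cheirality(P1, P2, pts1, pts2):
    """True if all triangulated points lie in front of both cameras."""
    for x1, x2 in zip(pts1, pts2):
        X = triangulate(P1, P2, x1, x2)
        if depth(P1, X) <= 0 or depth(P2, X) <= 0:
            return False
    return True
```

Of the two mirror-symmetric solutions the paper derives, only one should pass this test for generic scene points.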
A Novel Unsupervised Post-Processing Calibration Method for DNNs with Robustness to Domain Shift
Title | A Novel Unsupervised Post-Processing Calibration Method for DNNs with Robustness to Domain Shift |
Authors | Azadeh Sadat Mozafari, Hugo Siqueira Gomes, Christian Gagne |
Abstract | Uncertainty estimation is critical in real-world decision-making applications, especially when distributional shift between the training and test data is prevalent. Many calibration methods have been proposed in the literature to improve the predictive uncertainty of DNNs, which are generally not well-calibrated. However, none of them is specifically designed to work properly under domain shift. In this paper, we propose Unsupervised Temperature Scaling (UTS) as a calibration method that is robust to domain shift. It exploits unlabeled test samples instead of the training ones to adjust the uncertainty predictions of deep models towards the test distribution. UTS utilizes a novel loss function, a weighted NLL, which allows unsupervised calibration. We evaluate UTS on a wide range of model-dataset pairs to show the possibility of calibration without labels and demonstrate the robustness of UTS compared to other methods (e.g., TS, MC-dropout, SVI, ensembles) in shifted domains. |
Tasks | Calibration, Decision Making |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.11195v1 |
https://arxiv.org/pdf/1911.11195v1.pdf | |
PWC | https://paperswithcode.com/paper/a-novel-unsupervised-post-processing |
Repo | |
Framework | |
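Temperature scaling itself is a one-parameter post-hoc adjustment of the logits. The PyTorch sketch below fits a single temperature on unlabeled logits with a weighted NLL in which the model's own softmax outputs supply pseudo-labels and confidence weights; this weight definition is only an illustrative assumption, not the exact UTS loss.

```python
import torch

def weighted_nll_temperature(logits, n_steps=200, lr=0.01):
    """Fit a temperature T on unlabeled logits by minimizing a weighted NLL,
    with pseudo-labels and weights taken from the model's own predictions
    (an illustrative stand-in for the UTS objective)."""
    logits = logits.detach()
    probs = torch.softmax(logits, dim=1)
    weights, pseudo = probs.max(dim=1)          # confidence as weight, argmax as pseudo-label
    log_T = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    opt = torch.optim.Adam([log_T], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        scaled = logits / log_T.exp()
        nll = torch.nn.functional.cross_entropy(scaled, pseudo, reduction="none")
        loss = (weights * nll).mean()
        loss.backward()
        opt.step()
    return log_T.exp().item()

# usage: T = weighted_nll_temperature(model(x_test_unlabeled))
```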
Evidential distance measure in complex belief function theory
Title | Evidential distance measure in complex belief function theory |
Authors | Fuyuan Xiao |
Abstract | In this paper, an evidential distance measure is proposed that can measure the difference or dissimilarity between complex basic belief assignments (CBBAs), which are composed of complex numbers. When the CBBAs degenerate from complex numbers to real numbers, i.e., to BBAs, the proposed distance reduces to Jousselme et al.'s distance. Therefore, the proposed distance provides a promising way to measure the differences between pieces of evidence in the more general framework of the complex plane. |
Tasks | |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1907.00716v1 |
https://arxiv.org/pdf/1907.00716v1.pdf | |
PWC | https://paperswithcode.com/paper/evidential-distance-measure-in-complex-belief |
Repo | |
Framework | |
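For orientation, Jousselme et al.'s distance between BBAs is $d(m_1, m_2) = \sqrt{\tfrac{1}{2}(m_1 - m_2)^T D (m_1 - m_2)}$ with $D_{ij} = |A_i \cap A_j| / |A_i \cup A_j|$ over the focal sets. One plausible complex-valued generalization replaces the transpose with the conjugate transpose, as sketched below; the paper's exact definition may differ.

```python
import numpy as np

def jaccard_matrix(subsets):
    """D[i, j] = |A_i ∩ A_j| / |A_i ∪ A_j| over the listed focal sets."""
    n = len(subsets)
    D = np.zeros((n, n))
    for i, A in enumerate(subsets):
        for j, B in enumerate(subsets):
            D[i, j] = len(A & B) / len(A | B) if (A | B) else 0.0
    return D

def evidential_distance(m1, m2, subsets):
    """Jousselme-style distance; for complex masses we use the conjugate
    transpose and keep the real part (an assumed generalization)."""
    d = np.asarray(m1, dtype=complex) - np.asarray(m2, dtype=complex)
    D = jaccard_matrix(subsets)
    val = np.real(d.conj() @ D @ d)
    return np.sqrt(0.5 * max(val, 0.0))

# usage on a frame {a, b} with focal sets {a}, {b}, {a, b}
subsets = [frozenset("a"), frozenset("b"), frozenset("ab")]
m1 = [0.6 + 0.1j, 0.3 - 0.1j, 0.1]
m2 = [0.4, 0.4, 0.2]
print(evidential_distance(m1, m2, subsets))
```

With purely real masses the function returns the ordinary Jousselme distance, matching the degeneration property stated in the abstract.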
Discovering Options for Exploration by Minimizing Cover Time
Title | Discovering Options for Exploration by Minimizing Cover Time |
Authors | Yuu Jinnai, Jee Won Park, David Abel, George Konidaris |
Abstract | One of the main challenges in reinforcement learning is solving tasks with sparse reward. We show that the difficulty of discovering a distant rewarding state in an MDP is bounded by the expected cover time of a random walk over the graph induced by the MDP’s transition dynamics. We therefore propose to accelerate exploration by constructing options that minimize cover time. The proposed algorithm finds an option which provably diminishes the expected number of steps to visit every state in the state space by a uniform random walk. We show empirically that the proposed algorithm improves the learning time in several domains with sparse rewards. |
Tasks | |
Published | 2019-03-02 |
URL | http://arxiv.org/abs/1903.00606v2 |
http://arxiv.org/pdf/1903.00606v2.pdf | |
PWC | https://paperswithcode.com/paper/discovering-options-for-exploration-by |
Repo | |
Framework | |
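One common way to minimize cover time over a state-transition graph is to increase its algebraic connectivity by connecting the extremes of the Fiedler vector (the Laplacian eigenvector of the second-smallest eigenvalue) with an option. The NumPy sketch below illustrates that idea; it is a simplified rendering and the paper's algorithm may differ in detail.

```python
import numpy as np

def covering_option_endpoints(adjacency):
    """Pick the two states an exploration option should connect by taking the
    extremes of the Fiedler vector of the graph Laplacian, which increases
    algebraic connectivity and thus lowers the expected cover time."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A            # combinatorial Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)      # ascending eigenvalues
    fiedler = eigvecs[:, 1]
    return int(np.argmin(fiedler)), int(np.argmax(fiedler))

# usage on a path graph 0-1-2-3: the option connects the two far ends
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(covering_option_endpoints(A))  # (0, 3) or (3, 0)
```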
Aging Memories Generate More Fluent Dialogue Responses with Memory Networks
Title | Aging Memories Generate More Fluent Dialogue Responses with Memory Networks |
Authors | Omar U. Florez, Erik Mueller |
Abstract | The integration of a Knowledge Base (KB) into a neural dialogue agent is one of the key challenges in Conversational AI. Memory networks have proven effective at encoding KB information into an external memory and thus generating more fluent and informed responses. Unfortunately, such memory becomes full of latent representations during training, so the most common strategy is to overwrite old memory entries randomly. In this paper, we question this approach and provide experimental evidence showing that conventional memory networks generate many redundant latent vectors, resulting in overfitting and the need for larger memories. We introduce memory dropout as an automatic technique that encourages diversity in the latent space by 1) aging redundant memories to increase their probability of being overwritten during training, and 2) sampling new memories that summarize the knowledge acquired by redundant memories. This technique allows us to incorporate Knowledge Bases to achieve state-of-the-art dialogue generation on the Stanford Multi-Turn Dialogue dataset. With the same architecture, its use provides an improvement of +2.2 BLEU points for the automatic generation of responses and an increase of +8.1% in the recognition of named entities. |
Tasks | Dialogue Generation |
Published | 2019-11-19 |
URL | https://arxiv.org/abs/1911.08522v1 |
https://arxiv.org/pdf/1911.08522v1.pdf | |
PWC | https://paperswithcode.com/paper/aging-memories-generate-more-fluent-dialogue |
Repo | |
Framework | |
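The aging-and-overwrite mechanic can be sketched as a small data structure: entries that look redundant with an incoming key age faster, and older entries are more likely to be replaced. The thresholds, similarity measure, and replacement distribution below are illustrative assumptions, not the paper's exact rules.

```python
import numpy as np

class AgingMemory:
    """Minimal sketch of memory dropout: redundant entries age, and older
    entries are sampled for overwriting with higher probability."""

    def __init__(self, slots, dim, redundancy_threshold=0.9, seed=0):
        self.rng = np.random.default_rng(seed)
        self.keys = self.rng.standard_normal((slots, dim))
        self.age = np.zeros(slots)
        self.threshold = redundancy_threshold

    def write(self, key):
        key = key / np.linalg.norm(key)
        sims = self.keys @ key / np.linalg.norm(self.keys, axis=1)
        self.age[sims > self.threshold] += 1.0          # age redundant memories
        probs = np.exp(self.age) / np.exp(self.age).sum()
        slot = self.rng.choice(len(self.keys), p=probs)  # older -> more likely replaced
        self.keys[slot] = key
        self.age[slot] = 0.0
```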
Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning
Title | Look, Read and Feel: Benchmarking Ads Understanding with Multimodal Multitask Learning |
Authors | Huaizheng Zhang, Yong Luo, Qiming Ai, Yonggang Wen |
Abstract | Given the massive advertising market and the sharply increasing amount of online multimedia content (such as videos), it is now fashionable to promote advertisements (ads) together with the multimedia content. It is exhausting to manually find relevant ads to match the provided content, and hence automatic advertising techniques have been developed. Since ads are usually hard to understand from their visual appearance alone due to the visual metaphors they contain, other modalities, such as the contained text, should be exploited for understanding. To further improve user experience, it is necessary to understand both the topic and sentiment of the ads. This motivates us to develop a novel deep multimodal multitask framework that integrates multiple modalities to achieve effective topic and sentiment prediction simultaneously for ads understanding. In particular, our model first extracts multimodal information from ads and learns high-level, comparable representations. The visual metaphor of the ad is decoded in an unsupervised manner. The obtained representations are then fed into the proposed hierarchical multimodal attention modules to learn task-specific representations for final prediction. A multitask loss function is also designed to train both the topic and sentiment prediction models jointly in an end-to-end manner. We conduct extensive experiments on a recent large advertisement dataset and achieve state-of-the-art performance for both prediction tasks. The obtained results could be utilized as a benchmark for ads understanding. |
Tasks | |
Published | 2019-12-21 |
URL | https://arxiv.org/abs/1912.10248v2 |
https://arxiv.org/pdf/1912.10248v2.pdf | |
PWC | https://paperswithcode.com/paper/look-read-and-feel-benchmarking-ads |
Repo | |
Framework | |
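The joint training described above amounts to two classifier heads over a shared fused representation, optimized with a weighted sum of per-task losses. A hedged PyTorch sketch follows; the dimensions, class counts, and loss weighting are assumptions for illustration, not the paper's values.

```python
import torch
import torch.nn as nn

class TopicSentimentHeads(nn.Module):
    """Illustrative multitask head: a shared fused ad representation feeds
    a topic classifier and a sentiment classifier trained jointly."""

    def __init__(self, fused_dim=512, n_topics=40, n_sentiments=30, alpha=0.5):
        super().__init__()
        self.topic_head = nn.Linear(fused_dim, n_topics)
        self.sentiment_head = nn.Linear(fused_dim, n_sentiments)
        self.alpha = alpha
        self.ce = nn.CrossEntropyLoss()

    def forward(self, fused, topic_labels, sentiment_labels):
        topic_logits = self.topic_head(fused)
        sent_logits = self.sentiment_head(fused)
        loss = (self.alpha * self.ce(topic_logits, topic_labels)
                + (1 - self.alpha) * self.ce(sent_logits, sentiment_labels))
        return loss, topic_logits, sent_logits
```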
Composition-Aware Image Aesthetics Assessment
Title | Composition-Aware Image Aesthetics Assessment |
Authors | Dong Liu, Rohit Puri, Nagendra Kamath, Subhabrata Bhattachary |
Abstract | Automatic image aesthetics assessment is important for a wide variety of applications such as on-line photo suggestion, photo album management and image retrieval. Previous methods have focused on mapping the holistic image content to a high or low aesthetics rating. However, the composition information of an image characterizes the harmony of its visual elements according to the principles of art, and provides richer information for learning aesthetics. In this work, we propose to model the image composition information as the mutual dependency of its local regions, and design a novel architecture to leverage such information to boost the performance of aesthetics assessment. To achieve this, we densely partition an image into local regions and compute aesthetics-preserving features over the regions to characterize the aesthetics properties of image content. With the feature representation of local regions, we build a region composition graph in which each node denotes one region and any two nodes are connected by an edge weighted by the similarity of the region features. We perform reasoning on this graph via graph convolution, in which the activation of each node is determined by its highly correlated neighbors. Our method naturally uncovers the mutual dependency of local regions in the network training procedure, and achieves the state-of-the-art performance on the benchmark visual aesthetics datasets. |
Tasks | Image Retrieval |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.10801v1 |
https://arxiv.org/pdf/1907.10801v1.pdf | |
PWC | https://paperswithcode.com/paper/composition-aware-image-aesthetics-assessment |
Repo | |
Framework | |
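The region composition graph reduces to a similarity-weighted adjacency over region features followed by graph convolution. The NumPy sketch below shows one such reasoning step; the similarity kernel, normalization, and projection are simplified assumptions rather than the paper's exact architecture.

```python
import numpy as np

def region_graph_convolution(region_feats, W):
    """One graph-reasoning step over region features: edge weights are
    softmax-normalized feature similarities, then features are aggregated
    over neighbors and projected with a learned matrix W (ReLU activation)."""
    F = np.asarray(region_feats)                       # (n_regions, d)
    sims = F @ F.T                                     # dot-product similarity
    np.fill_diagonal(sims, -np.inf)                    # no self-loops
    A = np.exp(sims) / np.exp(sims).sum(axis=1, keepdims=True)
    return np.maximum(A @ F @ W, 0.0)

# usage with random regions and weights
rng = np.random.default_rng(0)
F = rng.standard_normal((16, 64))
W = rng.standard_normal((64, 64)) * 0.1
print(region_graph_convolution(F, W).shape)            # (16, 64)
```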
A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data
Title | A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data |
Authors | Yinhe Zheng, Rongsheng Zhang, Xiaoxi Mao, Minlie Huang |
Abstract | Endowing dialogue systems with personas is essential to deliver more human-like conversations. However, this problem is still far from well explored due to the difficulties of both embodying personalities in natural languages and the persona sparsity issue observed in most dialogue corpora. This paper proposes a pre-training based personalized dialogue model that can generate coherent responses using persona-sparse dialogue data. In this method, a pre-trained language model is used to initialize an encoder and decoder, and personal attribute embeddings are devised to model richer dialogue contexts by encoding speakers’ personas together with dialogue histories. Further, to incorporate the target persona in the decoding process and to balance its contribution, an attention routing structure is devised in the decoder to merge features extracted from the target persona and dialogue contexts using dynamically predicted weights. Our model can utilize persona-sparse dialogues in a unified manner during the training process, and can also control the amount of persona-related features to exhibit during the inference process. Both automatic and manual evaluations demonstrate that the proposed model outperforms state-of-the-art methods, generating more coherent and persona-consistent responses from persona-sparse data. |
Tasks | Dialogue Generation, Language Modelling |
Published | 2019-11-12 |
URL | https://arxiv.org/abs/1911.04700v1 |
https://arxiv.org/pdf/1911.04700v1.pdf | |
PWC | https://paperswithcode.com/paper/a-pre-training-based-personalized-dialogue |
Repo | |
Framework | |
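At its core, the "attention routing" described above merges two feature streams with a dynamically predicted weight. The PyTorch sketch below uses a simple learned sigmoid gate as that weight; it is a loose stand-in for the paper's decoder module, with hypothetical names and dimensions.

```python
import torch
import torch.nn as nn

class AttentionRouter(nn.Module):
    """Merge persona-attended and context-attended features with a
    dynamically predicted per-position weight (illustrative sketch)."""

    def __init__(self, d_model=768):
        super().__init__()
        self.gate = nn.Linear(2 * d_model, 1)

    def forward(self, persona_feat, context_feat):
        # persona_feat, context_feat: (batch, seq_len, d_model)
        w = torch.sigmoid(self.gate(torch.cat([persona_feat, context_feat], dim=-1)))
        return w * persona_feat + (1 - w) * context_feat
```

Scaling the gate towards 0 or 1 at inference time is one way to realize the controllable amount of persona-related features mentioned in the abstract.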
Nail Polish Try-On: Realtime Semantic Segmentation of Small Objects for Native and Browser Smartphone AR Applications
Title | Nail Polish Try-On: Realtime Semantic Segmentation of Small Objects for Native and Browser Smartphone AR Applications |
Authors | Brendan Duke, Abdalla Ahmed, Edmund Phung, Irina Kezele, Parham Aarabi |
Abstract | We provide a system for semantic segmentation of small objects that enables nail polish try-on AR applications to run client-side in realtime in native and web mobile applications. By adjusting input resolution and neural network depth, our model design enables a smooth trade-off of performance and runtime, with the highest performance setting achieving 94.5 mIoU at a 29.8 ms runtime in native applications on an iPad Pro. We also provide a postprocessing and rendering algorithm for nail polish try-on, which integrates with our semantic segmentation and fingernail base-tip direction predictions. |
Tasks | Semantic Segmentation |
Published | 2019-06-05 |
URL | https://arxiv.org/abs/1906.02222v2 |
https://arxiv.org/pdf/1906.02222v2.pdf | |
PWC | https://paperswithcode.com/paper/nail-polish-try-on-realtime-semantic |
Repo | |
Framework | |
Multi-Label Classification with Label Graph Superimposing
Title | Multi-Label Classification with Label Graph Superimposing |
Authors | Ya Wang, Dongliang He, Fu Li, Xiang Long, Zhichao Zhou, Jinwen Ma, Shilei Wen |
Abstract | Images or videos always contain multiple objects or actions. Multi-label recognition has achieved strong performance owing to the rapid development of deep learning technologies. Recently, graph convolution networks (GCNs) have been leveraged to boost the performance of multi-label recognition. However, what is the best way to model label correlations, and how feature learning can be improved with awareness of the label system, are still unclear. In this paper, we propose a label graph superimposing framework to improve the conventional GCN+CNN framework developed for multi-label recognition in the following two aspects. Firstly, we model the label correlations by superimposing a label graph built from statistical co-occurrence information onto the graph constructed from knowledge priors of labels, and then multi-layer graph convolutions are applied on the final superimposed graph for label embedding abstraction. Secondly, we propose to leverage the embedding of the whole label system for better representation learning. In detail, lateral connections between GCN and CNN are added at shallow, middle and deep layers to inject information of the label system into the backbone CNN for label-awareness in the feature learning process. Extensive experiments are carried out on the MS-COCO and Charades datasets, showing that our proposed solution greatly improves recognition performance and achieves new state-of-the-art results. |
Tasks | Multi-Label Classification, Representation Learning |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09243v1 |
https://arxiv.org/pdf/1911.09243v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-label-classification-with-label-graph |
Repo | |
Framework | |
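The superimposing step amounts to mixing a co-occurrence adjacency with a knowledge-prior adjacency and running graph convolutions over label embeddings on the result. Below is a minimal NumPy sketch of one such layer; the mixing coefficient and normalization are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def superimpose_and_convolve(A_cooc, A_prior, label_emb, W, lam=0.5):
    """Superimpose a statistical co-occurrence graph onto a knowledge-prior
    graph, then apply one symmetric-normalized GCN layer with ReLU."""
    A = lam * A_cooc + (1 - lam) * A_prior + np.eye(len(A_cooc))  # add self-loops
    d = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(d, d))                # symmetric normalization
    return np.maximum(A_hat @ label_emb @ W, 0.0)
```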
Fast and Robust Multi-Person 3D Pose Estimation from Multiple Views
Title | Fast and Robust Multi-Person 3D Pose Estimation from Multiple Views |
Authors | Junting Dong, Wen Jiang, Qixing Huang, Hujun Bao, Xiaowei Zhou |
Abstract | This paper addresses the problem of 3D pose estimation for multiple people in a few calibrated camera views. The main challenge of this problem is to find the cross-view correspondences among noisy and incomplete 2D pose predictions. Most previous methods address this challenge by directly reasoning in 3D using a pictorial structure model, which is inefficient due to the huge state space. We propose a fast and robust approach to solve this problem. Our key idea is to use a multi-way matching algorithm to cluster the detected 2D poses in all views. Each resulting cluster encodes 2D poses of the same person across different views and consistent correspondences across the keypoints, from which the 3D pose of each person can be effectively inferred. The proposed convex optimization based multi-way matching algorithm is efficient and robust against missing and false detections, without knowing the number of people in the scene. Moreover, we propose to combine geometric and appearance cues for cross-view matching. The proposed approach achieves significant performance gains over the state-of-the-art (96.3% vs. 90.6% and 96.9% vs. 88% on the Campus and Shelf datasets, respectively), while being efficient for real-time applications. |
Tasks | 3D Pose Estimation, Pose Estimation |
Published | 2019-01-14 |
URL | http://arxiv.org/abs/1901.04111v1 |
http://arxiv.org/pdf/1901.04111v1.pdf | |
PWC | https://paperswithcode.com/paper/fast-and-robust-multi-person-3d-pose |
Repo | |
Framework | |
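The geometric-plus-appearance cue combination can be sketched as a pairwise affinity between detections in two views: a re-ID descriptor similarity blended with an epipolar-consistency term over corresponding joints. The NumPy sketch below is an illustrative affinity, with assumed weights and kernels rather than the paper's exact scoring.

```python
import numpy as np

def pairwise_affinity(desc_a, desc_b, joints_a, joints_b, F, sigma=10.0, w_geo=0.5):
    """Affinity between one person detected in view A and one in view B:
    cosine similarity of appearance descriptors combined with a geometric
    term from symmetric epipolar distances of corresponding 2D joints."""
    app = desc_a @ desc_b / (np.linalg.norm(desc_a) * np.linalg.norm(desc_b))
    dists = []
    for xa, xb in zip(joints_a, joints_b):     # homogeneous 2D joints, shape (3,)
        line_in_b = F @ xa                     # epipolar line of xa in view B
        line_in_a = F.T @ xb                   # epipolar line of xb in view A
        d_b = abs(xb @ line_in_b) / np.linalg.norm(line_in_b[:2])
        d_a = abs(xa @ line_in_a) / np.linalg.norm(line_in_a[:2])
        dists.append(0.5 * (d_a + d_b))
    geo = np.exp(-np.mean(dists) / sigma)
    return w_geo * geo + (1 - w_geo) * app
```

Such affinities would populate the matrix that the multi-way matching step then clusters into per-person groups.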
The Use of Gaussian Processes in System Identification
Title | The Use of Gaussian Processes in System Identification |
Authors | Simo Särkkä |
Abstract | Gaussian processes are used in machine learning to learn input-output mappings from observed data. Gaussian process regression is based on imposing a Gaussian process prior on the unknown regressor function and statistically conditioning it on the observed data. In system identification, Gaussian processes are used to form time series prediction models such as non-linear finite-impulse response (NFIR) models as well as non-linear autoregressive (NARX) models. Gaussian process state-space models (GPSS) can be used to learn the dynamic and measurement models for a state-space representation of the input-output data. Temporal and spatio-temporal Gaussian processes can be used directly to form regressors on the data in the time domain. The aim of this article is to briefly outline the main directions in system identification methods using Gaussian processes. |
Tasks | Gaussian Processes, Time Series, Time Series Prediction |
Published | 2019-07-13 |
URL | https://arxiv.org/abs/1907.06066v1 |
https://arxiv.org/pdf/1907.06066v1.pdf | |
PWC | https://paperswithcode.com/paper/the-use-of-gaussian-processes-in-system |
Repo | |
Framework | |
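For reference, plain GP regression with a squared-exponential kernel is all that the NFIR/NARX models above need once the regressor vectors are built from lagged inputs and outputs. The sketch below uses standard GP prediction equations; the regressor construction in the comment is the usual NARX convention, not specific to this article.

```python
import numpy as np

def gp_regression(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """GP regression with a squared-exponential kernel. For a NARX model the
    rows of X would be lagged data, e.g. x_k = [y_{k-1},...,y_{k-na}, u_{k-1},...,u_{k-nb}]."""
    def kern(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * sq / length_scale ** 2)

    K = kern(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = kern(X_test, X_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = kern(X_test, X_test) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)
```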
Approximate Query Processing using Deep Generative Models
Title | Approximate Query Processing using Deep Generative Models |
Authors | Saravanan Thirumuruganathan, Shohedul Hasan, Nick Koudas, Gautam Das |
Abstract | Data is generated at an unprecedented rate, surpassing our ability to analyze it. The database community has pioneered many novel techniques for Approximate Query Processing (AQP) that can give approximate results in a fraction of the time needed to compute exact results. In this work, we explore the use of deep learning (DL) for answering aggregate queries, specifically for interactive applications such as data exploration and visualization. We use deep generative models, an unsupervised learning based approach, to learn the data distribution faithfully, so that aggregate queries can be answered approximately by generating samples from the learned model. The model is often compact - a few hundred kilobytes - so that arbitrary AQP queries can be answered on the client side without contacting the database server. Our other contributions include identifying model bias and minimizing it through a rejection sampling based approach, and an algorithm to build model ensembles for AQP for improved accuracy. Our extensive experiments show that our proposed approach can provide answers with high accuracy and low latency. |
Tasks | |
Published | 2019-03-24 |
URL | https://arxiv.org/abs/1903.10000v3 |
https://arxiv.org/pdf/1903.10000v3.pdf | |
PWC | https://paperswithcode.com/paper/approximate-query-processing-using-deep |
Repo | |
Framework | |
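The core idea is easy to sketch: draw synthetic rows from the learned generative model and evaluate the aggregate on them instead of the real table. In the snippet below, `generator` is a hypothetical stand-in for a trained deep generative model, and the bias-correcting rejection sampling step is left outside the sketch.

```python
import numpy as np

def approximate_avg(generator, predicate, column, n_samples=100_000):
    """Approximately answer `SELECT AVG(column) WHERE predicate` from rows
    sampled out of a learned generative model. `generator(n)` is assumed to
    return a dict of column arrays for n synthetic rows."""
    rows = generator(n_samples)
    mask = predicate(rows)
    return float(rows[column][mask].mean())

# usage with a toy generator standing in for a trained deep generative model
rng = np.random.default_rng(0)
toy_generator = lambda n: {"age": rng.integers(18, 80, n),
                           "salary": rng.normal(50_000, 10_000, n)}
print(approximate_avg(toy_generator, lambda r: r["age"] > 40, "salary"))
```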
Synthetic learner: model-free inference on treatments over time
Title | Synthetic learner: model-free inference on treatments over time |
Authors | Davide Viviano, Jelena Bradic |
Abstract | Understanding the effect of a particular treatment or policy pertains to many areas of interest, ranging from political economics and marketing to health care and personalized treatment studies. In this paper, we develop a non-parametric, model-free test for detecting the effects of treatment over time that extends widely used Synthetic Control tests. The test is built on counterfactual predictions arising from many learning algorithms. In the Neyman-Rubin potential outcome framework with possible carry-over effects, we show that the proposed test is asymptotically consistent for stationary, beta-mixing processes. We do not assume that the class of learners necessarily captures the correct model. We also discuss estimates of the average treatment effect, and we provide regret bounds on the predictive performance. To the best of our knowledge, this is the first set of results that allows, for example, any Random Forest to be used for provably valid statistical inference in the Synthetic Control setting. In experiments, we show that our Synthetic Learner is substantially more powerful than classical methods based on Synthetic Control or Difference-in-Differences, especially in the presence of non-linear outcome models. |
Tasks | |
Published | 2019-04-02 |
URL | http://arxiv.org/abs/1904.01490v1 |
http://arxiv.org/pdf/1904.01490v1.pdf | |
PWC | https://paperswithcode.com/paper/synthetic-learner-model-free-inference-on |
Repo | |
Framework | |
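A counterfactual-prediction test in this general spirit can be sketched as follows: fit an arbitrary learner on pre-treatment data to predict the treated series from the control series, compute a post-treatment residual statistic, and compare it against a placebo distribution from fake earlier treatment dates. This is only a loose illustration of the idea, not the paper's exact procedure or its theoretical guarantees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_and_residual(controls, treated, fit_end, eval_start, eval_end):
    """Fit a learner on [0, fit_end) and return the mean residual on [eval_start, eval_end)."""
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(controls[:fit_end], treated[:fit_end])
    resid = treated[eval_start:eval_end] - model.predict(controls[eval_start:eval_end])
    return resid.mean()

def placebo_time_test(controls, treated, t0, n_placebos=100, seed=0):
    """Compare the observed post-t0 residual against placebo 'treatment' dates
    drawn from the pre-treatment period (requires t0 > 2 * (T - t0))."""
    rng = np.random.default_rng(seed)
    T = len(treated)
    window = T - t0
    observed = fit_and_residual(controls, treated, t0, t0, T)
    starts = rng.integers(window, t0 - window, n_placebos)
    placebos = np.array([fit_and_residual(controls, treated, s, s, s + window) for s in starts])
    p_value = np.mean(np.abs(placebos) >= abs(observed))
    return observed, p_value
```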
Learning Geometrically Consistent Mesh Corrections
Title | Learning Geometrically Consistent Mesh Corrections |
Authors | Ştefan Săftescu, Paul Newman |
Abstract | Building good 3D maps is a challenging and expensive task, which requires high-quality sensors and careful, time-consuming scanning. We seek to reduce the cost of building good reconstructions by correcting views of existing low-quality ones in a post-hoc fashion using learnt priors over surfaces and appearance. We train a CNN model to predict the difference in inverse-depth from varying viewpoints of two meshes – one of low quality that we wish to correct, and one of high-quality that we use as a reference. In contrast to previous work, we pay attention to the problem of excessive smoothing in corrected meshes. We address this with a suitable network architecture, and introduce a loss-weighting mechanism that emphasises edges in the prediction. Furthermore, smooth predictions result in geometrical inconsistencies. To deal with this issue, we present a loss function which penalises re-projection differences that are not due to occlusions. Our model reduces gross errors by 45.3%–77.5%, up to five times more than previous work. |
Tasks | |
Published | 2019-09-08 |
URL | https://arxiv.org/abs/1909.03471v1 |
https://arxiv.org/pdf/1909.03471v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-geometrically-consistent-mesh |
Repo | |
Framework | |
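The edge-emphasising loss weighting described above can be sketched as an L1 loss whose per-pixel weights grow with the gradient magnitude of the target inverse depth, so flat predictions near edges are penalized more. The weight definition below (1 + alpha * |gradient|) is an illustrative choice, not necessarily the paper's exact one.

```python
import torch

def edge_weighted_l1(pred_inv_depth, target_inv_depth, alpha=4.0):
    """L1 loss on inverse depth with weights that emphasise edges of the target,
    discouraging over-smoothed corrections. Tensors have shape (..., H, W)."""
    gx = target_inv_depth[..., :, 1:] - target_inv_depth[..., :, :-1]
    gy = target_inv_depth[..., 1:, :] - target_inv_depth[..., :-1, :]
    grad = torch.zeros_like(target_inv_depth)
    grad[..., :, 1:] += gx.abs()
    grad[..., 1:, :] += gy.abs()
    weight = 1.0 + alpha * grad
    return (weight * (pred_inv_depth - target_inv_depth).abs()).mean()
```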