January 29, 2020

Paper Group ANR 570

OptiBox: Breaking the Limits of Proposals for Visual Grounding

Title OptiBox: Breaking the Limits of Proposals for Visual Grounding
Authors Zicong Fan, Si Yi Meng, Leonid Sigal, James J. Little
Abstract The problem of language grounding has attracted much attention in recent years due to its pivotal role in more general image-lingual high-level reasoning tasks (e.g., image captioning, VQA). Despite the tremendous progress in visual grounding, the performance of most approaches has been hindered by the quality of bounding box proposals obtained in the early stages of all recent pipelines. To address this limitation, we propose a general progressive query-guided bounding box refinement architecture (OptiBox) that leverages global image encoding for added context. We apply this architecture in the context of the GroundeR model, first introduced in 2016, which has a number of unique and appealing properties, such as the ability to learn in the semi-supervised setting by leveraging cyclic language reconstruction. Using GroundeR + OptiBox and a simple semantic language reconstruction loss that we propose, we achieve state-of-the-art grounding performance in the supervised setting on the Flickr30k Entities dataset. More importantly, we are able to surpass many recent fully supervised models with only 50% of the training data and perform competitively with as little as 3%.
Tasks Image Captioning, Visual Question Answering
Published 2019-11-29
URL https://arxiv.org/abs/1912.00076v1
PDF https://arxiv.org/pdf/1912.00076v1.pdf
PWC https://paperswithcode.com/paper/optibox-breaking-the-limits-of-proposals-for
Repo
Framework

Non-rigid 3D shape retrieval based on multi-view metric learning

Title Non-rigid 3D shape retrieval based on multi-view metric learning
Authors Haohao Li, Shengfa Wang, Nannan Li, Zhixun Su, Ximin Liu
Abstract This study presents a novel multi-view metric learning algorithm, which aims to improve 3D non-rigid shape retrieval. With the development of non-rigid 3D shape analysis, many shape descriptors have become available. The intrinsic descriptors can be explored to construct various intrinsic representations for the non-rigid 3D shape retrieval task. The different intrinsic representations (features) focus on different geometric properties to describe the same 3D shape, which makes the representations related. Therefore, it is possible and necessary to learn multiple metrics for different representations jointly. We propose an effective multi-view metric learning algorithm by extending Marginal Fisher Analysis (MFA) into the multi-view domain, and exploring the Hilbert-Schmidt Independence Criterion (HSIC) as a diversity term to jointly learn the new metrics. The different classes can be separated by MFA in our method. Meanwhile, HSIC is exploited to bring the multiple representations into consensus. The learned metrics can reduce the redundancy between the multiple representations, and improve the accuracy of the retrieval results. Experiments are performed on SHREC’10 benchmarks, and the results show that the proposed method outperforms the state-of-the-art non-rigid 3D shape retrieval methods.
Tasks 3D Shape Analysis, 3D Shape Retrieval, Metric Learning
Published 2019-03-20
URL http://arxiv.org/abs/1904.00765v1
PDF http://arxiv.org/pdf/1904.00765v1.pdf
PWC https://paperswithcode.com/paper/non-rigid-3d-shape-retrieval-based-on-multi
Repo
Framework

Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding

Title Orthogonal Relation Transforms with Graph Context Modeling for Knowledge Graph Embedding
Authors Yun Tang, Jing Huang, Guangtao Wang, Xiaodong He, Bowen Zhou
Abstract Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE. However, N-1, 1-N and N-N predictions remain challenging. In this work, we propose a novel translational distance-based approach for knowledge graph link prediction. The proposed method is two-fold. First, we extend RotatE from the 2D complex domain to a high-dimensional space with orthogonal transforms to model relations for better modeling capacity. Second, the graph context is explicitly modeled via two directed context representations. These context representations are used as part of the distance scoring function to measure the plausibility of the triples during training and inference. The proposed approach effectively improves prediction accuracy on the difficult N-1, 1-N and N-N cases for the knowledge graph link prediction task. The experimental results show that it achieves better performance on two benchmark data sets compared to the baseline RotatE, especially on the data set (FB15k-237) with many high in-degree connection nodes.
Tasks Graph Embedding, Knowledge Graph Embedding, Link Prediction
Published 2019-11-09
URL https://arxiv.org/abs/1911.04910v1
PDF https://arxiv.org/pdf/1911.04910v1.pdf
PWC https://paperswithcode.com/paper/orthogonal-relation-transforms-with-graph
Repo
Framework
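
For orientation, the baseline that this abstract builds on can be sketched in a few lines. The snippet below illustrates only the RotatE-style distance (each relation is an element-wise rotation in the complex plane), not the orthogonal-transform extension or the graph context terms that the paper proposes. The function name `rotate_distance`, the toy embeddings, and the choice of summing per-dimension moduli are assumptions made for illustration.

```python
import cmath
import math


def rotate_distance(head, relation_phases, tail):
    """Sum of per-dimension moduli |h * r - t|; lower means more plausible.

    Each relation component is a unit-modulus complex number exp(i * phase),
    so applying it to the head embedding is a pure rotation.
    """
    total = 0.0
    for h, phase, t in zip(head, relation_phases, tail):
        r = cmath.exp(1j * phase)  # rotation with modulus 1
        total += abs(h * r - t)
    return total


# Toy 2-dimensional complex embeddings (hypothetical values).
head = [1 + 0j, 0 + 1j]
tail = [0 + 1j, 0 - 1j]

# Rotating by 90 and 180 degrees maps this head exactly onto the tail.
matching = rotate_distance(head, [math.pi / 2, math.pi], tail)

# The identity rotation leaves a gap between head and tail.
mismatched = rotate_distance(head, [0.0, 0.0], tail)
```

A triple whose relation rotates the head onto the tail gets a distance near zero, while a poorly fitting relation gets a larger one; a link-prediction model ranks candidate tails by this distance.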

The Field-of-View Constraint of Markers for Mobile Robot with Pan-Tilt Camera

Title The Field-of-View Constraint of Markers for Mobile Robot with Pan-Tilt Camera
Authors Hongxuan Ma, Wei Zou, Zheng Zhu, Siyang Sun, Zhaobing Kang
Abstract In the fields of navigation and visual servoing, it is common to calculate relative pose from feature points on markers, so keeping markers in the camera’s view is an important problem. In this paper, we propose a novel approach to calculate the field-of-view (FOV) constraint of markers for a camera. Our method enables the camera to maintain the visibility of all feature points during the motion of the mobile robot. According to the angular aperture of the camera, the mobile robot can obtain the FOV constraint region where the camera cannot keep all feature points in an image. Based on the FOV constraint region, the mobile robot can be guided to move from the initial position to the destination. Finally, simulations and experiments are conducted on a mobile robot equipped with a pan-tilt camera, which validate the effectiveness of the method for obtaining the FOV constraints.
Tasks
Published 2019-09-24
URL https://arxiv.org/abs/1909.10682v1
PDF https://arxiv.org/pdf/1909.10682v1.pdf
PWC https://paperswithcode.com/paper/the-field-of-view-constraint-of-markers-for
Repo
Framework

A Sketch Based 3D Shape Retrieval Approach Based on Efficient Deep Point-to-Subspace Metric Learning

Title A Sketch Based 3D Shape Retrieval Approach Based on Efficient Deep Point-to-Subspace Metric Learning
Authors Yinjie Lei, Ziqin Zhou, Pingping Zhang, Yulan Guo, Zijun Ma, Lingqiao Liu
Abstract A sketch based 3D shape retrieval
Tasks 3D Shape Retrieval, Metric Learning
Published 2019-03-01
URL http://arxiv.org/abs/1903.00117v2
PDF http://arxiv.org/pdf/1903.00117v2.pdf
PWC https://paperswithcode.com/paper/a-sketch-based-3d-shape-retrieval-approach
Repo
Framework

Multi-feature Distance Metric Learning for Non-rigid 3D Shape Retrieval

Title Multi-feature Distance Metric Learning for Non-rigid 3D Shape Retrieval
Authors Huibing Wang, Haohao Li, Xianping Fu
Abstract In the past decades, feature-learning-based 3D shape retrieval approaches have received widespread attention in the computer graphics community. These approaches usually explored hand-crafted distance metrics or conventional distance metric learning methods to compute the similarity of a single feature. A single feature contains only one kind of geometric information, which cannot characterize 3D shapes well. Therefore, multiple features should be used for the retrieval task to overcome the limitation of a single feature and further improve the performance. However, most conventional distance metric learning methods fail to integrate the complementary information from multiple features to construct the distance metric. To address this issue, a novel multi-feature distance metric learning method for non-rigid 3D shape retrieval is presented in this study, which can make full use of the complementary geometric information from multiple shape features by utilizing KL-divergences. Minimizing the KL-divergence between the metric of each feature and a common metric acts as a consistency constraint, which leads to a consistent shared latent feature space for the multiple features. We apply the proposed method to 3D model retrieval, and test our method on a well-known benchmark database. The results show that our method substantially outperforms the state-of-the-art non-rigid 3D shape retrieval methods.
Tasks 3D Shape Retrieval, Metric Learning
Published 2019-01-10
URL http://arxiv.org/abs/1901.03031v1
PDF http://arxiv.org/pdf/1901.03031v1.pdf
PWC https://paperswithcode.com/paper/multi-feature-distance-metric-learning-for
Repo
Framework

Learning Low-Rank Approximation for CNNs

Title Learning Low-Rank Approximation for CNNs
Authors Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Gu-Yeon Wei
Abstract Low-rank approximation is an effective model compression technique that reduces not only parameter storage requirements but also computations. For convolutional neural networks (CNNs), however, well-known low-rank approximation methods, such as Tucker or CP decomposition, result in degraded model accuracy because decomposed layers hinder training convergence. In this paper, we propose a new training technique that finds a flat minimum in the view of low-rank approximation without a decomposed structure during training. Because the original model structure is preserved, 2-dimensional low-rank approximation, which requires lowering (such as im2col), is available in our proposed scheme. We show that CNN models can be compressed by low-rank approximation with a much higher compression ratio than conventional training methods while maintaining or even enhancing model accuracy. We also discuss various 2-dimensional low-rank approximation techniques for CNNs.
Tasks Model Compression
Published 2019-05-24
URL https://arxiv.org/abs/1905.10145v1
PDF https://arxiv.org/pdf/1905.10145v1.pdf
PWC https://paperswithcode.com/paper/learning-low-rank-approximation-for-cnns
Repo
Framework
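
The storage arithmetic behind 2-dimensional low-rank approximation can be sketched as follows. This is not the paper's training technique; it only shows, under stated assumptions, how a convolution weight lowered to a matrix (as with im2col) shrinks when replaced by two rank-k factors. The helper names and the layer sizes are hypothetical.

```python
def lowered_shape(c_out, c_in, kernel_h, kernel_w):
    """Shape of a conv weight after lowering to a 2D matrix (im2col layout)."""
    return c_out, c_in * kernel_h * kernel_w


def low_rank_param_count(rows, cols, rank):
    """Parameters in a rank-k factorization: (rows x k) plus (k x cols)."""
    return rank * (rows + cols)


def compression_ratio(rows, cols, rank):
    """Dense parameter count divided by the factorized parameter count."""
    return (rows * cols) / low_rank_param_count(rows, cols, rank)


# Hypothetical layer: 64 output channels, 32 input channels, 3x3 kernel.
rows, cols = lowered_shape(64, 32, 3, 3)
ratio = compression_ratio(rows, cols, rank=8)
```

The ratio grows as the rank shrinks, which is why the accuracy lost at low rank is the quantity such training methods try to control.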

Independent Subspace Analysis for Unsupervised Learning of Disentangled Representations

Title Independent Subspace Analysis for Unsupervised Learning of Disentangled Representations
Authors Jan Stühmer, Richard E. Turner, Sebastian Nowozin
Abstract Recently there has been an increased interest in unsupervised learning of disentangled representations using the Variational Autoencoder (VAE) framework. Most of the existing work has focused largely on modifying the variational cost function to achieve this goal. We first show that these modifications, e.g. beta-VAE, simplify the tendency of variational inference to underfit, causing pathological over-pruning and over-orthogonalization of learned components. Second, we propose a complementary approach: to modify the probabilistic model with a structured latent prior. This prior allows the discovery of latent variable representations that are structured into a hierarchy of independent vector spaces. The proposed prior has three major advantages. First, in contrast to the standard VAE normal prior, the proposed prior is not rotationally invariant. This resolves the problem of unidentifiability of the standard VAE normal prior. Second, we demonstrate that the proposed prior encourages a disentangled latent representation, which facilitates learning of disentangled representations. Third, extensive quantitative experiments demonstrate that the prior significantly mitigates the trade-off between reconstruction loss and disentanglement over the state of the art.
Tasks
Published 2019-09-05
URL https://arxiv.org/abs/1909.05063v1
PDF https://arxiv.org/pdf/1909.05063v1.pdf
PWC https://paperswithcode.com/paper/independent-subspace-analysis-for
Repo
Framework

Transfer Learning in Visual and Relational Reasoning

Title Transfer Learning in Visual and Relational Reasoning
Authors T. S. Jayram, Vincent Marois, Tomasz Kornuta, Vincent Albouy, Emre Sevgen, Ahmet S. Ozcan
Abstract Transfer learning has become the de facto standard in computer vision and natural language processing, especially where labeled data is scarce. Accuracy can be significantly improved by using pre-trained models and subsequent fine-tuning. In visual reasoning tasks, such as image question answering, transfer learning is more complex. In addition to transferring the capability to recognize visual features, we also expect to transfer the system’s ability to reason. Moreover, for video data, temporal reasoning adds another dimension. In this work, we formalize these unique aspects of transfer learning and propose a theoretical framework for visual reasoning, exemplified by the well-established CLEVR and COG datasets. Furthermore, we introduce a new, end-to-end differentiable recurrent model (SAMNet), which shows state-of-the-art accuracy and better performance in transfer learning on both datasets. The improved performance of SAMNet stems from its capability to decouple the abstract multi-step reasoning from the length of the sequence, and from its selective attention, which enables it to store only the question-relevant objects in the external memory.
Tasks Question Answering, Relational Reasoning, Transfer Learning, Visual Question Answering, Visual Reasoning
Published 2019-11-27
URL https://arxiv.org/abs/1911.11938v2
PDF https://arxiv.org/pdf/1911.11938v2.pdf
PWC https://paperswithcode.com/paper/transfer-learning-in-visual-and-relational
Repo
Framework

Smoothing and Interpolating Noisy GPS Data with Smoothing Splines

Title Smoothing and Interpolating Noisy GPS Data with Smoothing Splines
Authors Jeffrey J. Early, Adam M. Sykulski
Abstract A comprehensive methodology is provided for smoothing noisy, irregularly sampled data with non-Gaussian noise using smoothing splines. We demonstrate how the spline order and tension parameter can be chosen a priori from physical reasoning. We also show how to allow for non-Gaussian noise and outliers which are typical in GPS signals. We demonstrate the effectiveness of our methods on GPS trajectory data obtained from oceanographic floating instruments known as drifters.
Tasks
Published 2019-04-26
URL https://arxiv.org/abs/1904.12064v2
PDF https://arxiv.org/pdf/1904.12064v2.pdf
PWC https://paperswithcode.com/paper/smoothing-and-interpolating-noisy-gps-data
Repo
Framework

Question-Conditioned Counterfactual Image Generation for VQA

Title Question-Conditioned Counterfactual Image Generation for VQA
Authors Jingjing Pan, Yash Goyal, Stefan Lee
Abstract While Visual Question Answering (VQA) models continue to push the state-of-the-art forward, they largely remain black-boxes - failing to provide insight into how or why an answer is generated. In this ongoing work, we propose addressing this shortcoming by learning to generate counterfactual images for a VQA model - i.e. given a question-image pair, we wish to generate a new image such that i) the VQA model outputs a different answer, ii) the new image is minimally different from the original, and iii) the new image is realistic. Our hope is that providing such counterfactual examples allows users to investigate and understand the VQA model’s internal mechanisms.
Tasks Image Generation, Question Answering, Visual Question Answering
Published 2019-11-14
URL https://arxiv.org/abs/1911.06352v1
PDF https://arxiv.org/pdf/1911.06352v1.pdf
PWC https://paperswithcode.com/paper/question-conditioned-counterfactual-image
Repo
Framework

Understanding racial bias in health using the Medical Expenditure Panel Survey data

Title Understanding racial bias in health using the Medical Expenditure Panel Survey data
Authors Moninder Singh, Karthikeyan Natesan Ramamurthy
Abstract Over the years, several studies have demonstrated that there exist significant disparities in health indicators in the United States population across various groups. Healthcare expense is used as a proxy for health in algorithms that drive healthcare systems and this exacerbates the existing bias. In this work, we focus on the presence of racial bias in health indicators in the publicly available, and nationally representative Medical Expenditure Panel Survey (MEPS) data. We show that predictive models for care management trained using this data inherit this bias. Finally, we demonstrate that this inherited bias can be reduced significantly using simple mitigation techniques.
Tasks
Published 2019-11-04
URL https://arxiv.org/abs/1911.01509v1
PDF https://arxiv.org/pdf/1911.01509v1.pdf
PWC https://paperswithcode.com/paper/understanding-racial-bias-in-health-using-the
Repo
Framework

Adaptive Generation of Unrestricted Adversarial Inputs

Title Adaptive Generation of Unrestricted Adversarial Inputs
Authors Isaac Dunn, Hadrien Pouget, Tom Melham, Daniel Kroening
Abstract Neural networks are vulnerable to adversarially-constructed perturbations of their inputs. Most research so far has considered perturbations of a fixed magnitude under some $l_p$ norm. Although studying these attacks is valuable, there has been increasing interest in the construction of (and robustness to) unrestricted attacks, which are not constrained to a small and rather artificial subset of all possible adversarial inputs. We introduce a novel algorithm for generating such unrestricted adversarial inputs which, unlike prior work, is adaptive: it is able to tune its attacks to the classifier being targeted. It also offers a 400-2,000x speedup over the existing state of the art. We demonstrate our approach by generating unrestricted adversarial inputs that fool classifiers robust to perturbation-based attacks. We also show that, by virtue of being adaptive and unrestricted, our attack is able to defeat adversarial training against it.
Tasks
Published 2019-05-07
URL https://arxiv.org/abs/1905.02463v2
PDF https://arxiv.org/pdf/1905.02463v2.pdf
PWC https://paperswithcode.com/paper/generating-realistic-unrestricted-adversarial
Repo
Framework

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation

Title Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
Authors Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, Chung-Cheng Chiu, Semih Yavuz, Ruoming Pang, Wei Li, Colin Raffel
Abstract Simultaneous machine translation begins to translate each source sentence before the source speaker is finished speaking, with applications to live and streaming scenarios. Simultaneous systems must carefully schedule their reading of the source sentence to balance quality against latency. We present the first simultaneous translation system to learn an adaptive schedule jointly with a neural machine translation (NMT) model that attends over all source tokens read thus far. We do so by introducing Monotonic Infinite Lookback (MILk) attention, which maintains both a hard, monotonic attention head to schedule the reading of the source sentence, and a soft attention head that extends from the monotonic head back to the beginning of the source. We show that MILk’s adaptive schedule allows it to arrive at latency-quality trade-offs that are favorable to those of a recently proposed wait-k strategy for many latency values.
Tasks Machine Translation
Published 2019-06-12
URL https://arxiv.org/abs/1906.05218v1
PDF https://arxiv.org/pdf/1906.05218v1.pdf
PWC https://paperswithcode.com/paper/monotonic-infinite-lookback-attention-for
Repo
Framework
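
The wait-k strategy that MILk is compared against follows a fixed schedule, which is easy to sketch: read k source tokens first, then alternate between writing one target token and reading one more source token until the source is exhausted. The snippet below illustrates that fixed baseline only, not MILk's learned adaptive schedule; the function name and the example lengths are assumptions.

```python
def wait_k_schedule(k, source_len, target_len):
    """Source tokens read before emitting each target token (steps are 1-based).

    At target step t the model has read min(k + t - 1, source_len) tokens.
    """
    return [min(k + t - 1, source_len) for t in range(1, target_len + 1)]


# Hypothetical sentence pair: 5 source tokens, 6 target tokens, k = 3.
schedule = wait_k_schedule(k=3, source_len=5, target_len=6)
```

Here the schedule is `[3, 4, 5, 5, 5, 5]`: the lag stays at k tokens regardless of content, whereas an adaptive schedule can read more or fewer tokens depending on the input.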

Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin using Recursive Neural Networks

Title Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin using Recursive Neural Networks
Authors Minh Nguyen, Gia H. Ngo, Nancy F. Chen
Abstract Logographs (Chinese characters) have recursive structures (i.e. hierarchies of sub-units in logographs) that contain phonological and semantic information, and developmental psychology literature suggests that native speakers leverage these structures to learn how to read. Exploiting these structures could potentially lead to better embeddings that can benefit many downstream tasks. We propose building hierarchical logograph (character) embeddings from logograph recursive structures using treeLSTM, a recursive neural network. Using a recursive neural network imposes a prior on the mapping from logographs to embeddings, since the network must read in the sub-units in logographs according to the order specified by the recursive structures. Based on human behavior in language learning and reading, we hypothesize that modeling logographs’ structures using a recursive neural network should be beneficial. To verify this claim, we consider two tasks: (1) predicting logographs’ Cantonese pronunciation from logographic structures and (2) language modeling. Empirical results show that the proposed hierarchical embeddings outperform baseline approaches. Diagnostic analysis suggests that hierarchical embeddings constructed using treeLSTM are less sensitive to distractors, and thus more robust, especially on complex logographs.
Tasks Language Modelling
Published 2019-12-20
URL https://arxiv.org/abs/1912.09913v1
PDF https://arxiv.org/pdf/1912.09913v1.pdf
PWC https://paperswithcode.com/paper/hierarchical-character-embeddings-learning
Repo
Framework