July 28, 2019

2867 words 14 mins read

Paper Group ANR 370

Strongly-Typed Agents are Guaranteed to Interact Safely. Correction of “Cloud Removal By Fusing Multi-Source and Multi-Temporal Images”. Full Quantification of Left Ventricle via Deep Multitask Learning Network Respecting Intra- and Inter-Task Relatedness. First-spike based visual categorization using reward-modulated STDP. Shape-Color Differential …

Strongly-Typed Agents are Guaranteed to Interact Safely

Title Strongly-Typed Agents are Guaranteed to Interact Safely
Authors David Balduzzi
Abstract As artificial agents proliferate, it is becoming increasingly important to ensure that their interactions with one another are well-behaved. In this paper, we formalize a common-sense notion of when algorithms are well-behaved: an algorithm is safe if it does no harm. Motivated by recent progress in deep learning, we focus on the specific case where agents update their actions according to gradient descent. The paper shows that gradient descent converges to a Nash equilibrium in safe games. The main contribution is to define strongly-typed agents and show they are guaranteed to interact safely, thereby providing sufficient conditions for safe interactions. A series of examples shows that strong-typing generalizes certain key features of convexity, is closely related to blind source separation, and introduces a new perspective on classical multilinear games based on tensor decomposition.
Tasks Common Sense Reasoning
Published 2017-02-24
URL http://arxiv.org/abs/1702.07450v2
PDF http://arxiv.org/pdf/1702.07450v2.pdf
PWC https://paperswithcode.com/paper/strongly-typed-agents-are-guaranteed-to
Repo
Framework
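
To make the convergence claim concrete, here is a minimal sketch of simultaneous gradient descent in a two-player game whose joint gradient field is monotone, a property in the spirit of the paper's safety conditions. The losses are our own toy choices, not the paper's construction.

```python
# A toy "safe" game (illustrative losses, not from the paper):
#   player 1 minimizes l1(x, y) = 0.5*x**2 + x*y   over x
#   player 2 minimizes l2(x, y) = 0.5*y**2 - x*y   over y
# The joint gradient field is monotone and the unique Nash
# equilibrium is (x, y) = (0, 0).

def grad_l1(x, y):
    return x + y   # d l1 / dx

def grad_l2(x, y):
    return y - x   # d l2 / dy

x, y, lr = 3.0, -2.0, 0.1
for _ in range(200):
    gx, gy = grad_l1(x, y), grad_l2(x, y)  # simultaneous updates
    x, y = x - lr * gx, y - lr * gy

print(f"after 200 steps: x={x:.6f}, y={y:.6f}")  # both approach 0
```

The antisymmetric interaction (+xy for one player, -xy for the other) is what keeps the coupled updates from doing harm in this example: each player's influence on the other cancels in the convergence analysis.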

Correction of “Cloud Removal By Fusing Multi-Source and Multi-Temporal Images”

Title Correction of “Cloud Removal By Fusing Multi-Source and Multi-Temporal Images”
Authors Chengyue Zhang, Zhiwei Li, Qing Cheng, Xinghua Li, Huanfeng Shen
Abstract Remote sensing images often suffer from cloud cover, and cloud removal is required in many applications of remote sensing imagery. Multitemporal-based methods are popular and effective for coping with thick clouds. This paper contributes a summary and experimental comparison of the existing multitemporal-based methods. Furthermore, we propose a spatiotemporal fusion with Poisson adjustment method to fuse multi-sensor and multi-temporal images for cloud removal. The experimental results show that the proposed method has the potential to address the loss of cloud-removal accuracy in multi-temporal images with significant changes.
Tasks
Published 2017-07-25
URL http://arxiv.org/abs/1707.09959v1
PDF http://arxiv.org/pdf/1707.09959v1.pdf
PWC https://paperswithcode.com/paper/correction-of-cloud-removal-by-fusing-multi
Repo
Framework
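
The Poisson-adjustment idea can be illustrated with a gradient-domain blend: keep the cloud-free reference image's gradients inside the cloud mask while matching the cloudy target's values on the mask boundary, which adjusts for radiometric differences between acquisitions. The sketch below uses synthetic arrays and a plain Jacobi solver; it is not the authors' code.

```python
import numpy as np

H = W = 64
rng = np.random.default_rng(0)
target = rng.normal(0.5, 0.05, (H, W))      # cloudy acquisition
reference = rng.normal(0.3, 0.05, (H, W))   # cloud-free acquisition
mask = np.zeros((H, W), dtype=bool)
mask[16:48, 16:48] = True                    # cloud-covered region

# Discrete Laplacian of the reference drives the solution inside the mask.
lap = np.zeros_like(reference)
lap[1:-1, 1:-1] = (reference[:-2, 1:-1] + reference[2:, 1:-1] +
                   reference[1:-1, :-2] + reference[1:-1, 2:] -
                   4 * reference[1:-1, 1:-1])

result = target.copy()
for _ in range(2000):                        # Jacobi iterations
    nb = (np.roll(result, 1, 0) + np.roll(result, -1, 0) +
          np.roll(result, 1, 1) + np.roll(result, -1, 1))
    updated = (nb - lap) / 4.0
    result[mask] = updated[mask]             # boundary values stay fixed

print("blended mean inside mask:", result[mask].mean())
```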

Full Quantification of Left Ventricle via Deep Multitask Learning Network Respecting Intra- and Inter-Task Relatedness

Title Full Quantification of Left Ventricle via Deep Multitask Learning Network Respecting Intra- and Inter-Task Relatedness
Authors Wufeng Xue, Andrea Lum, Ashley Mercado, Mark Landis, James Warrington, Shuo Li
Abstract Cardiac left ventricle (LV) quantification is among the most clinically important tasks for identification and diagnosis of cardiac diseases, yet it remains a challenge due to the high variability of cardiac structure and the complexity of temporal dynamics. Full quantification, i.e., simultaneously quantifying all LV indices including two areas (cavity and myocardium), six regional wall thicknesses (RWT), three LV dimensions, and one cardiac phase, is even more challenging, since the uncertain relatedness within and between the types of indices may hinder the learning procedure from achieving better convergence and generalization. In this paper, we propose a newly designed multitask learning network (FullLVNet), which is constituted by a deep convolutional neural network (CNN) for expressive feature embedding of cardiac structure; two subsequent parallel recurrent neural network (RNN) modules for temporal dynamic modeling; and four linear models for the final estimation. During the final estimation, both intra- and inter-task relatedness are modeled to improve generalization: 1) respecting intra-task relatedness, group lasso is applied to each of the regression tasks for sparse and common feature selection and consistent prediction; 2) respecting inter-task relatedness, three phase-guided constraints are proposed to penalize violation of the temporal behavior of the obtained LV indices. Experiments on MR sequences of 145 subjects show that FullLVNet achieves highly accurate prediction with our intra- and inter-task relatedness, leading to an MAE of 190 mm$^2$, 1.41 mm, and 2.68 mm for the areas, RWT, and dimensions, respectively, and an error rate of 10.4% for phase classification. This gives our method great potential for comprehensive clinical assessment of global, regional, and dynamic cardiac function.
Tasks Feature Selection
Published 2017-06-06
URL http://arxiv.org/abs/1706.01912v2
PDF http://arxiv.org/pdf/1706.01912v2.pdf
PWC https://paperswithcode.com/paper/full-quantification-of-left-ventricle-via
Repo
Framework
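
For the intra-task relatedness term, the abstract relies on the group lasso. A minimal sketch of that ingredient (standalone NumPy, not FullLVNet itself): the penalty sums the l2 norms of predefined weight groups, and its proximal operator zeroes out whole groups, which is what yields sparse, common feature selection within a task.

```python
import numpy as np

def group_lasso_penalty(w, groups, lam):
    """Sum of l2 norms over groups; `groups` is a list of index arrays."""
    return lam * sum(np.linalg.norm(w[g]) for g in groups)

def group_lasso_prox(w, groups, lam):
    """Block soft-thresholding: shrink each group's norm by lam."""
    out = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        out[g] = 0.0 if norm <= lam else (1 - lam / norm) * w[g]
    return out

w = np.array([0.9, -0.8, 0.05, -0.02, 1.5, 1.2])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
print(group_lasso_penalty(w, groups, lam=0.1))
print(group_lasso_prox(w, groups, lam=0.1))   # the weak middle group is zeroed out
```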

First-spike based visual categorization using reward-modulated STDP

Title First-spike based visual categorization using reward-modulated STDP
Authors Milad Mozafari, Saeed Reza Kheradpisheh, Timothée Masquelier, Abbas Nowzari-Dalini, Mohammad Ganjtabesh
Abstract Reinforcement learning (RL) has recently regained popularity, with major achievements such as beating the European Go champion. Here, for the first time, we show that RL can be used efficiently to train a spiking neural network (SNN) to perform object recognition in natural images without using an external classifier. We used a feedforward convolutional SNN and a temporal coding scheme where the most strongly activated neurons fire first, while less activated ones fire later or not at all. In the highest layers, each neuron was assigned to an object category, and it was assumed that the stimulus category was the category of the first neuron to fire. If this assumption was correct, the neuron was rewarded, i.e., spike-timing-dependent plasticity (STDP) was applied, which reinforced the neuron's selectivity. Otherwise, anti-STDP was applied, which encouraged the neuron to learn something else. As demonstrated on various image datasets (Caltech, ETH-80, and NORB), this reward-modulated STDP (R-STDP) approach extracted particularly discriminative visual features, whereas classic unsupervised STDP extracts any feature that consistently repeats. As a result, R-STDP outperformed STDP on these datasets. Furthermore, R-STDP is suitable for online learning and can adapt to drastic changes such as label permutations. Finally, it is worth mentioning that both feature extraction and classification were done with spikes, using at most one spike per neuron. Thus the network is hardware-friendly and energy-efficient.
Tasks Game of Go, Object Recognition
Published 2017-05-25
URL http://arxiv.org/abs/1705.09132v3
PDF http://arxiv.org/pdf/1705.09132v3.pdf
PWC https://paperswithcode.com/paper/first-spike-based-visual-categorization-using
Repo
Framework
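
A minimal sketch of the R-STDP rule described above, with our own simplifications (a single layer, intensity-derived latencies, a crude "fired before" test): the first output neuron to fire is the prediction, and a correct prediction applies STDP while an incorrect one applies anti-STDP by flipping the sign of the update.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out, a_plus, a_minus = 20, 3, 0.04, 0.03
W = rng.uniform(0.2, 0.8, (n_out, n_in))

def r_stdp_step(x, label):
    """x: input intensities in [0, 1]; stronger inputs spike earlier."""
    t_in = 1.0 - x                          # first-spike latencies
    potentials = W @ x
    winner = int(np.argmax(potentials))     # first output neuron to fire
    reward = 1.0 if winner == label else -1.0
    pre_fired = t_in < np.median(t_in)      # inputs that fired "before"
    dw = np.where(pre_fired, a_plus, -a_minus)   # LTP / LTD
    W[winner] = np.clip(W[winner] + reward * dw, 0.0, 1.0)
    return winner

x = rng.uniform(0, 1, n_in)
print("predicted class:", r_stdp_step(x, label=0))
```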

Shape-Color Differential Moment Invariants under Affine Transformations

Title Shape-Color Differential Moment Invariants under Affine Transformations
Authors Hanlin Mo, Shirui Li, You Hao, Hua Li
Abstract In this paper, we propose a general construction formula for shape-color primitives using partial differentials of each color channel. From these primitives, shape-color differential moment invariants (SCDMIs) can be constructed very easily; they are invariant to shape affine and color affine transforms. Fifty instances of SCDMIs are obtained in this way. In experiments, several commonly used color descriptors and the SCDMIs are applied to image classification and retrieval of color images, respectively. Comparing the experimental results, we find that the SCDMIs achieve better results.
Tasks Image Classification
Published 2017-06-14
URL http://arxiv.org/abs/1706.04382v1
PDF http://arxiv.org/pdf/1706.04382v1.pdf
PWC https://paperswithcode.com/paper/shape-color-differential-moment-invariants
Repo
Framework
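
As a hedged illustration of what a shape-color primitive built from partial differentials can look like (a generic example, not necessarily one of the paper's 50 SCDMIs): the Jacobian determinant of two color channels' gradients is a classic differential primitive, and integrating it against monomials gives moment-like quantities. Affine normalization factors are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
img = rng.uniform(0, 1, (32, 32, 3))        # toy RGB image

gy_r, gx_r = np.gradient(img[:, :, 0])      # partials of the R channel
gy_g, gx_g = np.gradient(img[:, :, 1])      # partials of the G channel
primitive = gx_r * gy_g - gy_r * gx_g       # Jacobian-determinant primitive

ys, xs = np.mgrid[0:32, 0:32].astype(float)
moment_00 = primitive.sum()                 # zeroth-order moment
moment_11 = (xs * ys * primitive).sum()     # a higher-order moment
print(moment_00, moment_11)
```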

Dynamic Input Structure and Network Assembly for Few-Shot Learning

Title Dynamic Input Structure and Network Assembly for Few-Shot Learning
Authors Nathan Hilliard, Nathan O. Hodas, Courtney D. Corley
Abstract The ability to learn from a small number of examples has been a difficult problem in machine learning since its inception. While methods have succeeded with large amounts of training data, research has been underway into how to accomplish similar performance with fewer examples, known as one-shot or, more generally, few-shot learning. This technique has shown promising performance, but in practice it requires fixed-size inputs, making it impractical for production systems where class sizes can vary. This impedes training and the final utility of few-shot learning systems. This paper describes an approach to constructing and training a network that can handle arbitrary example sizes dynamically as the system is used.
Tasks Few-Shot Learning
Published 2017-08-22
URL http://arxiv.org/abs/1708.06819v1
PDF http://arxiv.org/pdf/1708.06819v1.pdf
PWC https://paperswithcode.com/paper/dynamic-input-structure-and-network-assembly
Repo
Framework
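
One plausible reading of "handle arbitrary example sizes dynamically" is a size-agnostic aggregation over per-example embeddings. The sketch below illustrates that idea with a toy linear embedding and mean pooling; it should not be taken as the paper's actual network assembly.

```python
import numpy as np

rng = np.random.default_rng(3)
W_embed = rng.normal(0, 0.1, (16, 64))      # toy linear embedding

def class_representation(examples):
    """examples: (n, 64) array with any n >= 1."""
    embedded = np.maximum(examples @ W_embed.T, 0.0)  # ReLU embedding
    return embedded.mean(axis=0)            # fixed size regardless of n

print(class_representation(rng.normal(size=(1, 64))).shape)   # (16,)
print(class_representation(rng.normal(size=(7, 64))).shape)   # (16,)
```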

Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning

Title Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning
Authors Xiangxiang Chu, Hangjun Ye
Abstract Deep reinforcement learning for multi-agent cooperation and competition has been a hot topic recently. This paper focuses on cooperative multi-agent problems based on actor-critic methods under local-observation settings. Multi-agent deep deterministic policy gradient obtained state-of-the-art results for some multi-agent games but cannot scale well with a growing number of agents. In order to boost scalability, we propose a parameter-sharing deterministic policy gradient method with three variants based on neural networks: actor-critic sharing, actor sharing, and actor sharing with a partially shared critic. Benchmarks from rllab show that the proposed method has advantages in learning speed and memory efficiency, scales well with a growing number of agents, and can make full use of reward sharing and exchangeability when possible.
Tasks Multi-agent Reinforcement Learning
Published 2017-10-01
URL http://arxiv.org/abs/1710.00336v2
PDF http://arxiv.org/pdf/1710.00336v2.pdf
PWC https://paperswithcode.com/paper/parameter-sharing-deep-deterministic-policy
Repo
Framework
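
The actor-sharing variant is easy to picture: one set of actor parameters serves every agent's local observation, so memory stays constant in the number of agents and every agent's experience trains the same weights. A minimal sketch with a toy deterministic policy (ours, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(4)
obs_dim, act_dim, n_agents = 10, 2, 5
shared_actor = {"W": rng.normal(0, 0.1, (act_dim, obs_dim)),
                "b": np.zeros(act_dim)}

def act(obs):
    """Deterministic policy, shared by all agents."""
    return np.tanh(shared_actor["W"] @ obs + shared_actor["b"])

observations = rng.normal(size=(n_agents, obs_dim))
actions = np.array([act(o) for o in observations])
print(actions.shape)   # (5, 2): one action per agent, one parameter set
```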

Hybrid eye center localization using cascaded regression and hand-crafted model fitting

Title Hybrid eye center localization using cascaded regression and hand-crafted model fitting
Authors Alex Levinshtein, Edmund Phung, Parham Aarabi
Abstract We propose a new cascaded regressor for eye center detection. Previous methods start from a face or an eye detector and use either advanced features or powerful regressors for eye center localization, but not both. Instead, we detect the eyes more accurately using an existing facial feature alignment method. We improve the robustness of localization by using both advanced features and powerful regression machinery. Unlike most other methods that do not refine the regression results, we make the localization more accurate by adding a robust circle fitting post-processing step. Finally, using a simple hand-crafted method for eye center localization, we show how to train the cascaded regressor without the need for manually annotated training data. We evaluate our new approach and show that it achieves state-of-the-art performance on the BioID, GI4E, and the TalkingFace datasets. At an average normalized error of e < 0.05, the regressor trained on manually annotated data yields an accuracy of 95.07% (BioID), 99.27% (GI4E), and 95.68% (TalkingFace). The automatically trained regressor is nearly as good, yielding an accuracy of 93.9% (BioID), 99.27% (GI4E), and 95.46% (TalkingFace).
Tasks
Published 2017-12-07
URL http://arxiv.org/abs/1712.02822v1
PDF http://arxiv.org/pdf/1712.02822v1.pdf
PWC https://paperswithcode.com/paper/hybrid-eye-center-localization-using-cascaded
Repo
Framework
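
The circle-fitting post-processing step can be sketched with the classic algebraic least-squares (Kasa) fit, shown below on synthetic boundary points; the paper's robust variant may differ in details.

```python
import numpy as np

def fit_circle(points):
    """points: (n, 2) array; returns (cx, cy, r).

    From (x-cx)^2 + (y-cy)^2 = r^2, rearranged as
    2*cx*x + 2*cy*y + (r^2 - cx^2 - cy^2) = x^2 + y^2,
    which is linear in the unknowns (cx, cy, c).
    """
    x, y = points[:, 0], points[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones(len(x))])
    b = x**2 + y**2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    r = np.sqrt(c + cx**2 + cy**2)
    return cx, cy, r

theta = np.linspace(0, 2 * np.pi, 50)
pts = np.column_stack([3 + 2 * np.cos(theta), 1 + 2 * np.sin(theta)])
pts += np.random.default_rng(5).normal(0, 0.02, pts.shape)  # noisy edge points
print(fit_circle(pts))   # approximately (3, 1, 2)
```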

Tensors Come of Age: Why the AI Revolution will help HPC

Title Tensors Come of Age: Why the AI Revolution will help HPC
Authors John L. Gustafson, Lenore M. Mullin
Abstract This article discusses how the automation of tensor algorithms, based on A Mathematics of Arrays and Psi Calculus, and a new way to represent numbers, Unum Arithmetic, enables mechanically provable, scalable, portable, and more numerically accurate software.
Tasks
Published 2017-09-26
URL http://arxiv.org/abs/1709.09108v1
PDF http://arxiv.org/pdf/1709.09108v1.pdf
PWC https://paperswithcode.com/paper/tensors-come-of-age-why-the-ai-revolution
Repo
Framework

Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search

Title Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search
Authors Lei Zhu, Zi Huang, Xiaobai Liu, Xiangnan He, Jingkuan Song, Xiaofang Zhou
Abstract Mobile landmark search (MLS) has recently received increasing attention for its great practical value. However, it remains unsolved due to two important challenges. One is the high bandwidth consumption of query transmission, and the other is the huge visual variation of query images sent from mobile devices. In this paper, we propose a novel hashing scheme, named canonical-view-based discrete multi-modal hashing (CV-DMH), to handle these problems via a novel three-stage learning procedure. First, a submodular function is designed to measure the visual representativeness and redundancy of a view set. With it, canonical views, which capture the key visual appearances of a landmark with limited redundancy, are efficiently discovered with an iterative mining strategy. Second, multi-modal sparse coding is applied to transform visual features from multiple modalities into an intermediate representation. It can robustly and adaptively characterize the visual content of varied landmark images with certain canonical views. Finally, compact binary codes are learned on the intermediate representation within a tailored discrete binary embedding model which preserves the visual relations of images measured with canonical views and removes the involved noise. In this part, we develop a new augmented Lagrangian multiplier (ALM) based optimization method to solve the discrete binary codes directly. We not only explicitly deal with the discrete constraint, but also consider the bit-uncorrelation and balance constraints together. Experiments on real-world landmark datasets demonstrate the superior performance of CV-DMH over several state-of-the-art methods.
Tasks
Published 2017-07-13
URL http://arxiv.org/abs/1707.04047v1
PDF http://arxiv.org/pdf/1707.04047v1.pdf
PWC https://paperswithcode.com/paper/discrete-multi-modal-hashing-with-canonical
Repo
Framework
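
A hedged sketch of the first stage (our own toy objective, not CV-DMH's exact submodular function): score a view set by how well it covers all images minus how redundant the selected views are with each other, and grow the set greedily, the standard recipe for such coverage-style objectives.

```python
import numpy as np

rng = np.random.default_rng(6)
feats = rng.normal(size=(100, 32))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
sim = feats @ feats.T                       # cosine similarities

def objective(selected):
    if not selected:
        return 0.0
    cover = sim[:, selected].max(axis=1).sum()              # representativeness
    redund = sim[np.ix_(selected, selected)].sum() - len(selected)
    return cover - 0.5 * redund

selected, k = [], 5
for _ in range(k):                          # greedy growth
    gains = [(objective(selected + [i]) - objective(selected), i)
             for i in range(len(feats)) if i not in selected]
    best_gain, best_i = max(gains)
    selected.append(best_i)
print("canonical views:", selected)
```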

Joint Semantic and Motion Segmentation for dynamic scenes using Deep Convolutional Networks

Title Joint Semantic and Motion Segmentation for dynamic scenes using Deep Convolutional Networks
Authors Nazrul Haque, N Dinesh Reddy, K. Madhava Krishna
Abstract Dynamic scene understanding is a challenging problem, and motion segmentation plays a crucial role in solving it. Incorporating semantics and motion enhances the overall perception of a dynamic scene. For applications of outdoor robotic navigation, joint learning methods have not been extensively used for extracting spatio-temporal features or adding different priors into the formulation. The task becomes even more challenging without stereo information. This paper proposes an approach to fuse semantic features and motion cues using CNNs to address the problem of monocular semantic motion segmentation. We deduce semantic and motion labels by integrating optical flow as a constraint with semantic features in a dilated convolution network. The pipeline consists of three main stages, i.e., feature extraction, feature amplification, and multi-scale context aggregation, to fuse the semantics and flow features. Our joint formulation shows significant improvements in monocular motion segmentation over state-of-the-art methods on the challenging KITTI tracking dataset.
Tasks Motion Segmentation, Optical Flow Estimation, Scene Understanding
Published 2017-04-18
URL http://arxiv.org/abs/1704.08331v1
PDF http://arxiv.org/pdf/1704.08331v1.pdf
PWC https://paperswithcode.com/paper/joint-semantic-and-motion-segmentation-for
Repo
Framework
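
The dilated-convolution fusion can be sketched directly: stack semantic feature maps with the two optical-flow channels and convolve with a dilated kernel, which enlarges the receptive field for context aggregation without downsampling. A toy single-filter NumPy version (not the paper's network):

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """x: (H, W, C); kernel: (kh, kw, C); returns an (H', W') map."""
    kh, kw, _ = kernel.shape
    eh, ew = (kh - 1) * dilation, (kw - 1) * dilation  # effective extent
    H, W, _ = x.shape
    out = np.zeros((H - eh, W - ew))
    for i in range(kh):                     # shift-and-accumulate
        for j in range(kw):
            patch = x[i * dilation:i * dilation + H - eh,
                      j * dilation:j * dilation + W - ew, :]
            out += (patch * kernel[i, j]).sum(axis=-1)
    return out

rng = np.random.default_rng(7)
semantic = rng.normal(size=(40, 40, 8))     # toy semantic features
flow = rng.normal(size=(40, 40, 2))         # toy optical flow (u, v)
fused = np.concatenate([semantic, flow], axis=-1)
kernel = rng.normal(0, 0.1, (3, 3, 10))
print(dilated_conv2d(fused, kernel, dilation=2).shape)   # (36, 36)
```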

Learning first-order definable concepts over structures of small degree

Title Learning first-order definable concepts over structures of small degree
Authors Martin Grohe, Martin Ritzert
Abstract We consider a declarative framework for machine learning where concepts and hypotheses are defined by formulas of a logic over some background structure. We show that within this framework, concepts defined by first-order formulas over a background structure of at most polylogarithmic degree can be learned in polylogarithmic time in the “probably approximately correct” learning sense.
Tasks
Published 2017-01-19
URL http://arxiv.org/abs/1701.05487v1
PDF http://arxiv.org/pdf/1701.05487v1.pdf
PWC https://paperswithcode.com/paper/learning-first-order-definable-concepts-over
Repo
Framework
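
For concreteness, the "probably approximately correct" criterion the result refers to can be stated as follows (standard PAC notation, our rendering):

```latex
% Standard PAC criterion: with probability at least 1 - delta over an
% i.i.d. sample S of size m, the learner L outputs a hypothesis whose
% error under the data distribution D is at most epsilon.
\[
\Pr_{S \sim \mathcal{D}^m}\!\left[\operatorname{err}_{\mathcal{D}}\big(L(S)\big) \le \varepsilon\right] \ge 1 - \delta,
\qquad
\operatorname{err}_{\mathcal{D}}(h) = \Pr_{x \sim \mathcal{D}}\left[h(x) \ne c(x)\right],
\]
% where c is the target concept, here one definable by a first-order
% formula over the background structure; the paper's contribution is
% achieving this in polylogarithmic time for structures of at most
% polylogarithmic degree.
```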

Illuminant Estimation using Ensembles of Multivariate Regression Trees

Title Illuminant Estimation using Ensembles of Multivariate Regression Trees
Authors Peter van Beek, R. Wayne Oldford
Abstract White balancing is a fundamental step in the image processing pipeline. The process involves estimating the chromaticity of the illuminant or light source and using the estimate to correct the image to remove any color cast. Given the importance of the problem, there has been much previous work on illuminant estimation. Recently, an approach based on ensembles of univariate regression trees that are fit using the squared-error loss function has been proposed and shown to give excellent performance. In this paper, we show that a simpler and more accurate ensemble model can be learned by (i) using multivariate regression trees to take into account that the chromaticity components of the illuminant are correlated and constrained, and (ii) fitting each tree by directly minimizing a loss function of interest—such as recovery angular error or reproduction angular error—rather than indirectly using the squared-error loss function as a surrogate. We show empirically that overall our method leads to improved performance on diverse image sets.
Tasks
Published 2017-03-15
URL http://arxiv.org/abs/1703.05354v1
PDF http://arxiv.org/pdf/1703.05354v1.pdf
PWC https://paperswithcode.com/paper/illuminant-estimation-using-ensembles-of
Repo
Framework
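
The loss the trees minimize directly can be sketched as follows: recovery angular error is the angle between the estimated and true illuminant vectors (reproduction angular error is analogous), and a multivariate leaf can be scored by its mean angular error rather than squared error. The tiny leaf-evaluation example below is illustrative, not the authors' code.

```python
import numpy as np

def recovery_angular_error(est, true):
    """Angle in degrees between two illuminant vectors."""
    cos = np.dot(est, true) / (np.linalg.norm(est) * np.linalg.norm(true))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# A multivariate leaf predicts one chromaticity vector for all samples
# that reach it; split search can score candidates by mean angular error.
truths = np.array([[0.33, 0.34], [0.30, 0.36], [0.35, 0.33]])
leaf_pred = truths.mean(axis=0)
print(np.mean([recovery_angular_error(leaf_pred, t) for t in truths]))
```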

View-Invariant Template Matching Using Homography Constraints

Title View-Invariant Template Matching Using Homography Constraints
Authors Sina Lotfian, Hassan Foroosh
Abstract Change in viewpoint is one of the major factors for variation in object appearance across different images. Thus, view-invariant object recognition is a challenging and important image understanding task. In this paper, we propose a method that can match objects in images taken under different viewpoints. Unlike most methods in the literature, no restrictions on camera orientation or internal camera parameters are imposed, and no prior knowledge of the 3D structure of the object is required. We prove that when two cameras take pictures of the same object from two different viewing angles, the relationship between every quadruple of points reduces to the special case of a homography with two equal eigenvalues. Based on this property, we formulate the problem as an error function that indicates how likely two sets of 2D points are projections of the same set of 3D points under two different cameras. A comprehensive set of experiments was conducted to prove the robustness of the method to noise and to evaluate its performance on real-world applications, such as face and object recognition.
Tasks Object Recognition
Published 2017-05-12
URL http://arxiv.org/abs/1705.04433v1
PDF http://arxiv.org/pdf/1705.04433v1.pdf
PWC https://paperswithcode.com/paper/view-invariant-template-matching-using
Repo
Framework
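
A hedged sketch of how the two-equal-eigenvalues property could be turned into a score (the paper defines its own error function; this particular formula is ours): compute the eigenvalue magnitudes of a candidate homography and measure how close the two nearest ones are.

```python
import numpy as np

def equal_eigenvalue_error(H):
    """Smaller is better; ~0 when two eigenvalue magnitudes coincide."""
    lam = np.sort(np.abs(np.linalg.eigvals(H)))
    return np.diff(lam).min() / (lam.mean() + 1e-12)

# A rotation-plus-translation homography has a complex-conjugate
# eigenvalue pair of equal magnitude, so its error is ~0; a generic
# anisotropic scaling does not.
t = np.deg2rad(20)
H_good = np.array([[np.cos(t), -np.sin(t), 0.3],
                   [np.sin(t),  np.cos(t), -0.1],
                   [0.0,        0.0,        1.0]])
H_bad = np.diag([1.0, 2.5, 4.0])
print(equal_eigenvalue_error(H_good), equal_eigenvalue_error(H_bad))
```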

From Lifestyle Vlogs to Everyday Interactions

Title From Lifestyle Vlogs to Everyday Interactions
Authors David F. Fouhey, Wei-cheng Kuo, Alexei A. Efros, Jitendra Malik
Abstract A major stumbling block to progress in understanding basic human interactions, such as getting out of bed or opening a refrigerator, is the lack of good training data. Most past efforts have gathered this data explicitly: starting with a laundry list of action labels and then querying search engines for videos tagged with each label. In this work, we do the reverse and search implicitly: we start with a large collection of interaction-rich video data and then annotate and analyze it. We use Internet Lifestyle Vlogs as the source of surprisingly large and diverse interaction data. We show that by collecting the data first, we are able to achieve greater scale and far greater diversity in terms of actions and actors. Additionally, our data exposes biases built into common explicitly gathered data. We make sense of our data by analyzing the central component of interaction: hands. We benchmark two tasks: identifying semantic object contact at the video level and non-semantic contact state at the frame level. We additionally demonstrate future prediction of hands.
Tasks Future prediction
Published 2017-12-06
URL http://arxiv.org/abs/1712.02310v1
PDF http://arxiv.org/pdf/1712.02310v1.pdf
PWC https://paperswithcode.com/paper/from-lifestyle-vlogs-to-everyday-interactions
Repo
Framework