May 5, 2019

3120 words 15 mins read

Paper Group ANR 484

Visual Question: Predicting If a Crowd Will Agree on the Answer. A Hybrid Approach to Query Answering under Expressive Datalog+/-. Efficient Hill-Climber for Multi-Objective Pseudo-Boolean Optimization. Spatial Pyramid Convolutional Neural Network for Social Event Detection in Static Image. Photometric Bundle Adjustment for Vision-Based SLAM. First …

Visual Question: Predicting If a Crowd Will Agree on the Answer

Title Visual Question: Predicting If a Crowd Will Agree on the Answer
Authors Danna Gurari, Kristen Grauman
Abstract Visual question answering (VQA) systems are emerging from a desire to empower users to ask any natural language question about visual content and receive a valid answer in response. However, close examination of the VQA problem reveals an unavoidable, entangled problem: multiple humans may not always agree on a single answer to a visual question. We train a model to automatically predict from a visual question whether a crowd would agree on a single answer. We then propose how to exploit this system in a novel application to efficiently allocate human effort to collect answers to visual questions. Specifically, we propose a crowdsourcing system that automatically solicits fewer human responses when answer agreement is expected and more human responses when answer disagreement is expected. Our system improves upon existing crowdsourcing systems, typically eliminating at least 20% of human effort with no loss to the information collected from the crowd.
Tasks Question Answering, Visual Question Answering
Published 2016-08-29
URL http://arxiv.org/abs/1608.08188v1
PDF http://arxiv.org/pdf/1608.08188v1.pdf
PWC https://paperswithcode.com/paper/visual-question-predicting-if-a-crowd-will
Repo
Framework
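The allocation idea in the abstract above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `predict_agreement` stands in for the trained agreement classifier, and the thresholds and budgets are made-up parameters.

```python
def predict_agreement(question: str) -> float:
    """Toy stand-in for the paper's trained classifier: yes/no-style
    questions tend to yield crowd agreement more often."""
    return 0.9 if question.lower().startswith(("is", "are", "does")) else 0.3

def answers_to_collect(question: str, threshold: float = 0.5,
                       few: int = 1, many: int = 5) -> int:
    """Solicit fewer crowd responses when agreement is expected,
    more when disagreement is expected."""
    return few if predict_agreement(question) >= threshold else many

print(answers_to_collect("Is the cat sleeping?"))   # agreement expected -> 1
print(answers_to_collect("What is in the image?"))  # disagreement expected -> 5
```

The effort saving comes from the `few` branch: questions routed there cost one worker instead of the fixed budget a conventional system would spend.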

A Hybrid Approach to Query Answering under Expressive Datalog+/-

Title A Hybrid Approach to Query Answering under Expressive Datalog+/-
Authors Mostafa Milani, Andrea Cali, Leopoldo Bertossi
Abstract Datalog+/- is a family of ontology languages that combine good computational properties with high expressive power. Datalog+/- languages are provably able to capture the most relevant Semantic Web languages. In this paper we consider the class of weakly-sticky (WS) Datalog+/- programs, which allow for certain useful forms of joins in rule bodies as well as extending the well-known class of weakly-acyclic TGDs. So far, only non-deterministic algorithms were known for answering queries on WS Datalog+/- programs. We present novel deterministic query answering algorithms under WS Datalog+/-. In particular, we propose: (1) a bottom-up grounding algorithm based on a query-driven chase, and (2) a hybrid approach based on transforming a WS program into a so-called sticky one, for which query rewriting techniques are known. We discuss how our algorithms can be optimized and effectively applied for query answering in real-world scenarios.
Tasks
Published 2016-04-22
URL http://arxiv.org/abs/1604.06770v2
PDF http://arxiv.org/pdf/1604.06770v2.pdf
PWC https://paperswithcode.com/paper/a-hybrid-approach-to-query-answering-under
Repo
Framework
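The bottom-up grounding the abstract mentions iterates rule applications to a fixpoint. The sketch below shows that fixpoint pattern on a plain Datalog reachability program; it is an illustrative toy, far simpler than the paper's query-driven chase for WS Datalog+/- (no existential variables, no termination subtleties).

```python
def reachability(edges: set) -> set:
    """Naive bottom-up evaluation of:
    reach(X,Y) <- edge(X,Y);  reach(X,Z) <- reach(X,Y), edge(Y,Z)."""
    reach = set(edges)
    changed = True
    while changed:  # apply rules until no new facts are derived (fixpoint)
        changed = False
        for (x, y) in list(reach):
            for (a, b) in edges:
                if y == a and (x, b) not in reach:
                    reach.add((x, b))
                    changed = True
    return reach

print(sorted(reachability({("a", "b"), ("b", "c")})))
```

A query-driven chase additionally restricts which facts get derived to those relevant to the query, which is where the paper's efficiency gains come from.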

Efficient Hill-Climber for Multi-Objective Pseudo-Boolean Optimization

Title Efficient Hill-Climber for Multi-Objective Pseudo-Boolean Optimization
Authors Francisco Chicano, Darrell Whitley, Renato Tinos
Abstract Local search and iterated local search algorithms are basic optimization techniques. Local search can be a stand-alone search method, but it can also be hybridized with evolutionary algorithms. Recently, it has been shown that it is possible to identify improving moves in Hamming neighborhoods for k-bounded pseudo-Boolean optimization problems in constant time. This means that local search does not need to enumerate neighborhoods to find improving moves. It also means that evolutionary algorithms do not need to use random mutation as an operator, except perhaps as a way to escape local optima. In this paper, we show how improving moves can be identified in constant time for multiobjective problems that are expressed as k-bounded pseudo-Boolean functions. In particular, multiobjective forms of NK Landscapes and Mk Landscapes are considered.
Tasks
Published 2016-01-27
URL http://arxiv.org/abs/1601.07596v1
PDF http://arxiv.org/pdf/1601.07596v1.pdf
PWC https://paperswithcode.com/paper/efficient-hill-climber-for-multi-objective
Repo
Framework
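The key idea behind avoiding neighborhood enumeration can be sketched as follows. For a k-bounded function (a sum of subfunctions, each touching at most k bits), the gain of flipping each bit can be cached and, after a move, recomputed only for the bits sharing a subfunction with the flipped bit. This single-objective sketch is an assumption-laden simplification of the paper's multiobjective, constant-time scheme.

```python
def flip_gain(x, v, subfns_of_v):
    """Change in objective if bit v of x is flipped, summed over the
    subfunctions that contain v."""
    g = 0.0
    for vars_, f in subfns_of_v:
        y = list(x)
        y[v] ^= 1
        g += f(tuple(y[u] for u in vars_)) - f(tuple(x[u] for u in vars_))
    return g

def hill_climb(x, subfns, steps=100):
    """Maximize sum of (vars, f) subfunctions; gains are updated locally
    instead of re-enumerating the whole Hamming neighborhood."""
    touching = {}
    for vars_, f in subfns:
        for v in vars_:
            touching.setdefault(v, []).append((vars_, f))
    gains = {v: flip_gain(x, v, touching[v]) for v in touching}
    for _ in range(steps):
        v = max(gains, key=gains.get)
        if gains[v] <= 0:          # no improving move left: local optimum
            break
        x[v] ^= 1
        # only bits co-occurring with v in some subfunction are affected
        affected = {u for vars_, _ in touching[v] for u in vars_}
        for u in affected:
            gains[u] = flip_gain(x, u, touching[u])
    return x

# maximize (x0 XOR x1) + x2: a 2-bounded pseudo-Boolean function
subfns = [((0, 1), lambda b: b[0] ^ b[1]), ((2,), lambda b: b[0])]
x = hill_climb([0, 0, 0], subfns)
```

Because each bit appears in O(1) subfunctions of size at most k, each update touches a constant number of entries, which is the source of the constant-time move identification.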

Spatial Pyramid Convolutional Neural Network for Social Event Detection in Static Image

Title Spatial Pyramid Convolutional Neural Network for Social Event Detection in Static Image
Authors Reza Fuad Rachmadi, Keiichi Uchimura, Gou Koutaki
Abstract Social event detection in a static image is a very challenging problem, and it is very useful for Internet of Things applications including automatic photo organization, ads recommender systems, or image captioning. Several publications show that the variety of objects, scenes, and people can be very ambiguous for the system to decide the event that occurs in the image. We propose a spatial pyramid configuration of a convolutional neural network (CNN) classifier for social event detection in a static image. By applying the spatial pyramid configuration to the CNN classifier, the details in the image can be observed more accurately by the classifier. The USED dataset provided by Ahmad et al., which consists of two different image sets (EiMM and SED), is used to evaluate our proposed method. As a result, the average accuracy of our system outperforms the baseline method by 15% and 2%, respectively.
Tasks Image Captioning, Recommendation Systems
Published 2016-12-13
URL http://arxiv.org/abs/1612.04062v1
PDF http://arxiv.org/pdf/1612.04062v1.pdf
PWC https://paperswithcode.com/paper/spatial-pyramid-convolutional-neural-network
Repo
Framework

Photometric Bundle Adjustment for Vision-Based SLAM

Title Photometric Bundle Adjustment for Vision-Based SLAM
Authors Hatem Alismail, Brett Browning, Simon Lucey
Abstract We propose a novel algorithm for the joint refinement of structure and motion parameters from image data directly, without relying on fixed and known correspondences. In contrast to traditional bundle adjustment (BA), where the optimal parameters are determined by minimizing the reprojection error using tracked features, the proposed algorithm relies on maximizing the photometric consistency and estimates the correspondences implicitly. Since the proposed algorithm does not require correspondences, its application is not limited to corner-like structure; any pixel with nonvanishing gradient could be used in the estimation process. Furthermore, we demonstrate the feasibility of refining the motion and structure parameters simultaneously using the photometric error in unconstrained scenes and without requiring restrictive assumptions such as planarity. The proposed algorithm is evaluated on a range of challenging outdoor datasets, and it is shown to improve upon the accuracy of state-of-the-art VSLAM methods obtained by minimizing the reprojection error using traditional BA as well as loop closure.
Tasks
Published 2016-08-05
URL http://arxiv.org/abs/1608.02026v1
PDF http://arxiv.org/pdf/1608.02026v1.pdf
PWC https://paperswithcode.com/paper/photometric-bundle-adjustment-for-vision
Repo
Framework
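The two ingredients the abstract describes, selecting any pixel with nonvanishing gradient and measuring photometric consistency under the current structure/motion estimate, can be sketched as below. This is a minimal illustration: `project` stands in for the warp induced by the (assumed given) depth and pose parameters, and no optimization over those parameters is shown.

```python
import numpy as np

def select_pixels(I, thresh=0.1):
    """Any pixel with nonvanishing image gradient may participate,
    not just corner-like structure."""
    gy, gx = np.gradient(I.astype(float))
    mag = np.hypot(gx, gy)
    vs, us = np.nonzero(mag > thresh)
    return list(zip(us, vs))

def photometric_residuals(I_ref, I_tgt, pts, project):
    """Intensity differences for the selected pixels; `project` maps a
    reference pixel to target coordinates under the current estimate."""
    r = []
    for (u, v) in pts:
        u2, v2 = project(u, v)
        r.append(I_ref[v, u] - I_tgt[int(round(v2)), int(round(u2))])
    return np.array(r)
```

Photometric BA would then adjust depth and pose to drive these residuals down, in place of the reprojection error over tracked features.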

First Steps Toward Camera Model Identification with Convolutional Neural Networks

Title First Steps Toward Camera Model Identification with Convolutional Neural Networks
Authors Luca Bondi, Luca Baroffio, David Güera, Paolo Bestagini, Edward J. Delp, Stefano Tubaro
Abstract Detecting the camera model used to shoot a picture enables solving a wide series of forensic problems, from copyright infringement to ownership attribution. For this reason, the forensic community has developed a set of camera model identification algorithms that exploit characteristic traces left on acquired images by the processing pipelines specific to each camera model. In this paper, we investigate a novel approach to the camera model identification problem. Specifically, we propose a data-driven algorithm based on convolutional neural networks, which learns features characterizing each camera model directly from the acquired pictures. Results on a well-known dataset of 18 camera models show that: (i) the proposed method outperforms up-to-date state-of-the-art algorithms on classification of 64x64 color image patches; (ii) features learned by the proposed network generalize to camera models never used for training.
Tasks
Published 2016-03-03
URL http://arxiv.org/abs/1603.01068v2
PDF http://arxiv.org/pdf/1603.01068v2.pdf
PWC https://paperswithcode.com/paper/first-steps-toward-camera-model
Repo
Framework

Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification

Title Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification
Authors Ting Chen, Yizhou Sun
Abstract In this paper, we study the problem of author identification under the double-blind review setting, which is to identify potential authors given information of an anonymized paper. Different from existing approaches that rely heavily on feature engineering, we propose to use a network embedding approach to address the problem, which can automatically represent nodes as lower-dimensional feature vectors. However, there are two major limitations in recent studies on network embedding: (1) they are usually general-purpose embedding methods, which are independent of the specific tasks; and (2) most of these approaches can only deal with homogeneous networks, where the heterogeneity of the network is ignored. Hence, the challenges faced here are twofold: (1) how to embed the network under the guidance of the author identification task, and (2) how to select the best type of information given the heterogeneity of the network. To address the challenges, we propose a task-guided and path-augmented heterogeneous network embedding model. In our model, nodes are first embedded as vectors in a latent feature space. Embeddings are then shared and jointly trained according to task-specific and network-general objectives. We extend existing unsupervised network embedding to incorporate meta paths in heterogeneous networks, and select paths according to the specific task. The guidance from the author identification task for network embedding is provided both explicitly in joint training and implicitly during meta path selection. Our experiments demonstrate that by using path-augmented network embedding with task guidance, our model can obtain significantly better accuracy at identifying the true authors compared to existing methods.
Tasks Feature Engineering, Network Embedding
Published 2016-12-08
URL http://arxiv.org/abs/1612.02814v2
PDF http://arxiv.org/pdf/1612.02814v2.pdf
PWC https://paperswithcode.com/paper/task-guided-and-path-augmented-heterogeneous
Repo
Framework

Joint Data Compression and MAC Protocol Design for Smartgrids with Renewable Energy

Title Joint Data Compression and MAC Protocol Design for Smartgrids with Renewable Energy
Authors Le Thanh Tan, Long Bao Le
Abstract In this paper, we consider the joint design of data compression and 802.15.4-based medium access control (MAC) protocol for smartgrids with renewable energy. We study the setting where a number of nodes, each of which comprises electricity load and/or renewable sources, report periodically their injected powers to a data concentrator. Our design exploits the correlation of the reported data in both time and space to efficiently design the data compression using the compressed sensing (CS) technique and the MAC protocol so that the reported data can be recovered reliably within minimum reporting time. Specifically, we perform the following design tasks: i) we employ the two-dimensional (2D) CS technique to compress the reported data in a distributed manner; ii) we propose to adapt the 802.15.4 MAC protocol frame structure to enable efficient data transmission and reliable data reconstruction; and iii) we develop an analytical model based on which we can obtain an efficient MAC parameter configuration to minimize the reporting delay. Finally, numerical results are presented to demonstrate the effectiveness of our proposed framework compared to existing solutions.
Tasks
Published 2016-06-15
URL http://arxiv.org/abs/1606.04995v1
PDF http://arxiv.org/pdf/1606.04995v1.pdf
PWC https://paperswithcode.com/paper/joint-data-compression-and-mac-protocol
Repo
Framework

A Photometrically Calibrated Benchmark For Monocular Visual Odometry

Title A Photometrically Calibrated Benchmark For Monocular Visual Odometry
Authors Jakob Engel, Vladyslav Usenko, Daniel Cremers
Abstract We present a dataset for evaluating the tracking accuracy of monocular visual odometry and SLAM methods. It contains 50 real-world sequences comprising more than 100 minutes of video, recorded across dozens of different environments – ranging from narrow indoor corridors to wide outdoor scenes. All sequences contain mostly exploring camera motion, starting and ending at the same position. This allows evaluating tracking accuracy via the accumulated drift from start to end, without requiring ground truth for the full sequence. In contrast to existing datasets, all sequences are photometrically calibrated. We provide exposure times for each frame as reported by the sensor, the camera response function, and dense lens attenuation factors. We also propose a novel, simple approach to non-parametric vignette calibration, which requires minimal set-up and is easy to reproduce. Finally, we thoroughly evaluate two existing methods (ORB-SLAM and DSO) on the dataset, including an analysis of the effect of image resolution, camera field of view, and the camera motion direction.
Tasks Calibration, Monocular Visual Odometry, Visual Odometry
Published 2016-07-09
URL http://arxiv.org/abs/1607.02555v2
PDF http://arxiv.org/pdf/1607.02555v2.pdf
PWC https://paperswithcode.com/paper/a-photometrically-calibrated-benchmark-for
Repo
Framework
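Given the quantities this benchmark provides (per-frame exposure time t, camera response G, and a vignette map V), raw intensities can be mapped back to scene irradiance by inverting the imaging model I = G(t · V · B). The sketch below assumes a toy gamma curve for G; the benchmark's actual response function is supplied with the data, not the one used here.

```python
import numpy as np

def undo_photometric(I, t, V, G_inv):
    """Recover irradiance from a raw image: B = G_inv(I) / (t * V)."""
    return G_inv(I) / (t * V)

gamma = 2.2
G = lambda e: np.clip(e, 0, None) ** (1 / gamma)  # toy response (assumption)
G_inv = lambda i: i ** gamma                      # its inverse

V = np.full((4, 4), 0.8)       # toy constant vignette attenuation
B_true = np.full((4, 4), 2.0)  # ground-truth irradiance
t = 0.5                        # exposure time
I = G(t * V * B_true)          # simulate the raw image

B = undo_photometric(I, t, V, G_inv)  # recovers B_true
```

Direct methods benefit from this correction because photometric residuals are then compared in irradiance space rather than in the camera's nonlinear intensity space.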

The IMP game: Learnability, approximability and adversarial learning beyond $Σ^0_1$

Title The IMP game: Learnability, approximability and adversarial learning beyond $Σ^0_1$
Authors Michael Brand, David L. Dowe
Abstract We introduce a problem set-up we call the Iterated Matching Pennies (IMP) game and show that it is a powerful framework for the study of three problems: adversarial learnability, conventional (i.e., non-adversarial) learnability and approximability. Using it, we are able to derive the following theorems. (1) It is possible to learn by example all of $\Sigma^0_1 \cup \Pi^0_1$ as well as some supersets; (2) in adversarial learning (which we describe as a pursuit-evasion game), the pursuer has a winning strategy (in other words, $\Sigma^0_1$ can be learned adversarially, but $\Pi^0_1$ not); (3) some languages in $\Pi^0_1$ cannot be approximated by any language in $\Sigma^0_1$. We show corresponding results also for $\Sigma^0_i$ and $\Pi^0_i$ for arbitrary $i$.
Tasks
Published 2016-02-07
URL http://arxiv.org/abs/1602.02743v1
PDF http://arxiv.org/pdf/1602.02743v1.pdf
PWC https://paperswithcode.com/paper/the-imp-game-learnability-approximability-and
Repo
Framework

Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition

Title Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition
Authors Xiu-Shen Wei, Chen-Wei Xie, Jianxin Wu
Abstract Fine-grained image recognition is a challenging computer vision problem, due to the small inter-class variations caused by highly similar subordinate categories, and the large intra-class variations in poses, scales and rotations. In this paper, we propose a novel end-to-end Mask-CNN model without the fully connected layers for fine-grained recognition. Based on the part annotations of fine-grained images, the proposed model consists of a fully convolutional network to both locate the discriminative parts (e.g., head and torso), and more importantly generate object/part masks for selecting useful and meaningful convolutional descriptors. After that, a four-stream Mask-CNN model is built for aggregating the selected object- and part-level descriptors simultaneously. The proposed Mask-CNN model has the smallest number of parameters, lowest feature dimensionality and highest recognition accuracy when compared with state-of-the-art fine-grained approaches.
Tasks Fine-Grained Image Recognition
Published 2016-05-23
URL http://arxiv.org/abs/1605.06878v1
PDF http://arxiv.org/pdf/1605.06878v1.pdf
PWC https://paperswithcode.com/paper/mask-cnn-localizing-parts-and-selecting
Repo
Framework
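The descriptor-selection step the abstract describes, keeping only the convolutional descriptors that fall inside an object/part mask before pooling, can be sketched as below. This is a simplified illustration using average pooling only; the paper aggregates with both average and max pooling across its four streams.

```python
import numpy as np

def select_and_pool(feat, mask):
    """feat: (H, W, C) conv activations; mask: (H, W) binary object/part
    mask. Keep only the descriptors inside the mask, then average-pool
    them into a single C-dimensional representation."""
    selected = feat[mask.astype(bool)]   # (N, C) descriptors kept
    return selected.mean(axis=0)

feat = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)
mask = np.array([[1, 0], [0, 1]])        # keep two of four locations
desc = select_and_pool(feat, mask)       # 3-dim pooled descriptor
```

Discarding background descriptors before pooling is what lets the model stay compact (no fully connected layers) while keeping the representation focused on the object and its parts.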

Multi-Region Neural Representation: A novel model for decoding visual stimuli in human brains

Title Multi-Region Neural Representation: A novel model for decoding visual stimuli in human brains
Authors Muhammad Yousefnezhad, Daoqiang Zhang
Abstract Multivariate Pattern (MVP) classification holds enormous potential for decoding visual stimuli in the human brain by employing task-based fMRI data sets. There is a wide range of challenges in MVP techniques, i.e. decreasing noise and sparsity, defining effective regions of interest (ROIs), visualizing results, and the cost of brain studies. In overcoming these challenges, this paper proposes a novel model of neural representation, which can automatically detect the active regions for each visual stimulus and then utilize these anatomical regions for visualizing and analyzing the functional activities. Therefore, this model provides an opportunity for neuroscientists to ask what the effect of a stimulus is on each of the detected regions, instead of just studying the fluctuation of voxels in manually selected ROIs. Moreover, our method introduces analyzing snapshots of brain images for decreasing sparsity rather than using the whole fMRI time series. Further, a new Gaussian smoothing method is proposed for removing noise from voxels at the level of ROIs. The proposed method enables us to combine different fMRI data sets for reducing the cost of brain studies. Experimental studies on 4 visual categories (words, consonants, objects and nonsense photos) confirm that the proposed method achieves superior performance to state-of-the-art methods.
Tasks Time Series
Published 2016-12-26
URL http://arxiv.org/abs/1612.08392v1
PDF http://arxiv.org/pdf/1612.08392v1.pdf
PWC https://paperswithcode.com/paper/multi-region-neural-representation-a-novel
Repo
Framework

Unsupervised Perceptual Rewards for Imitation Learning

Title Unsupervised Perceptual Rewards for Imitation Learning
Authors Pierre Sermanet, Kelvin Xu, Sergey Levine
Abstract Reward function design and exploration time are arguably the biggest obstacles to the deployment of reinforcement learning (RL) agents in the real world. In many real-world tasks, designing a reward function takes considerable hand engineering and often requires additional sensors to be installed just to measure whether the task has been executed successfully. Furthermore, many interesting tasks consist of multiple implicit intermediate steps that must be executed in sequence. Even when the final outcome can be measured, it does not necessarily provide feedback on these intermediate steps. To address these issues, we propose leveraging the abstraction power of intermediate visual representations learned by deep models to quickly infer perceptual reward functions from small numbers of demonstrations. We present a method that is able to identify key intermediate steps of a task from only a handful of demonstration sequences, and automatically identify the most discriminative features for identifying these steps. This method makes use of the features in a pre-trained deep model, but does not require any explicit specification of sub-goals. The resulting reward functions can then be used by an RL agent to learn to perform the task in real-world settings. To evaluate the learned reward, we present qualitative results on two real-world tasks and a quantitative evaluation against a human-designed reward function. We also show that our method can be used to learn a real-world door opening skill using a real robot, even when the demonstration used for reward learning is provided by a human using their own hand. To our knowledge, these are the first results showing that complex robotic manipulation skills can be learned directly and without supervised labels from a video of a human performing the task. Supplementary material and data are available at https://sermanet.github.io/rewards
Tasks Imitation Learning
Published 2016-12-20
URL http://arxiv.org/abs/1612.06699v3
PDF http://arxiv.org/pdf/1612.06699v3.pdf
PWC https://paperswithcode.com/paper/unsupervised-perceptual-rewards-for-imitation
Repo
Framework
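The core mechanism in the abstract above, scoring frames by how close their pre-trained deep features are to the features of demonstrated intermediate steps, can be sketched as a simple distance-based reward. The feature extractor and the exponential shaping below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def perceptual_reward(feat, goal_feats, scale=1.0):
    """Reward rises as the current frame's (pre-trained) deep features
    `feat` approach the features of any demonstrated intermediate step
    in `goal_feats`; equals 1.0 exactly at a sub-goal."""
    d = min(np.linalg.norm(feat - g) for g in goal_feats)
    return float(np.exp(-scale * d))
```

An RL agent can then maximize this signal in place of a hand-engineered reward, with the intermediate steps themselves identified automatically from a handful of demonstrations.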

Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks

Title Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks
Authors Jason M. Klusowski, Andrew R. Barron
Abstract Let $ f^{\star} $ be a function on $ \mathbb{R}^d $ with an assumption of a spectral norm $ v_{f^{\star}} $. For various noise settings, we show that $ \mathbb{E}\|\hat{f} - f^{\star}\|^2 \leq \left(v^4_{f^{\star}}\frac{\log d}{n}\right)^{1/3} $, where $ n $ is the sample size and $ \hat{f} $ is either a penalized least squares estimator or a greedily obtained version of such using linear combinations of sinusoidal, sigmoidal, ramp, ramp-squared or other smooth ridge functions. The candidate fits may be chosen from a continuum of functions, thus avoiding the rigidity of discretizations of the parameter space. On the other hand, if the candidate fits are chosen from a discretization, we show that $ \mathbb{E}\|\hat{f} - f^{\star}\|^2 \leq \left(v^3_{f^{\star}}\frac{\log d}{n}\right)^{2/5} $. This work bridges non-linear and non-parametric function estimation and includes single-hidden layer nets. Unlike past theory for such settings, our bound shows that the risk is small even when the input dimension $ d $ of an infinite-dimensional parameterized dictionary is much larger than the available sample size. When the dimension is larger than the cube root of the sample size, this quantity is seen to improve the more familiar risk bound of $ v_{f^{\star}}\left(\frac{d\log (n/d)}{n}\right)^{1/2} $, also investigated here.
Tasks
Published 2016-07-05
URL http://arxiv.org/abs/1607.01434v4
PDF http://arxiv.org/pdf/1607.01434v4.pdf
PWC https://paperswithcode.com/paper/risk-bounds-for-high-dimensional-ridge
Repo
Framework

Mutual Transformation of Information and Knowledge

Title Mutual Transformation of Information and Knowledge
Authors Olegs Verhodubs
Abstract Information and knowledge are transformable into each other. Information transformation into knowledge by the example of rule generation from OWL (Web Ontology Language) ontology has been shown during the development of the SWES (Semantic Web Expert System). The SWES is expected as an expert system for searching OWL ontologies from the Web, generating rules from the found ontologies and supplementing the SWES knowledge base with these rules. The purpose of this paper is to show knowledge transformation into information by the example of ontology generation from rules.
Tasks
Published 2016-04-26
URL http://arxiv.org/abs/1604.07625v1
PDF http://arxiv.org/pdf/1604.07625v1.pdf
PWC https://paperswithcode.com/paper/mutual-transformation-of-information-and
Repo
Framework