Paper Group ANR 484
Visual Question: Predicting If a Crowd Will Agree on the Answer. A Hybrid Approach to Query Answering under Expressive Datalog+/-. Efficient Hill-Climber for Multi-Objective Pseudo-Boolean Optimization. Spatial Pyramid Convolutional Neural Network for Social Event Detection in Static Image. Photometric Bundle Adjustment for Vision-Based SLAM. First …
Visual Question: Predicting If a Crowd Will Agree on the Answer
Title | Visual Question: Predicting If a Crowd Will Agree on the Answer |
Authors | Danna Gurari, Kristen Grauman |
Abstract | Visual question answering (VQA) systems are emerging from a desire to empower users to ask any natural language question about visual content and receive a valid answer in response. However, close examination of the VQA problem reveals an unavoidable, entangled problem: multiple humans may not always agree on a single answer to a visual question. We train a model to automatically predict from a visual question whether a crowd would agree on a single answer. We then propose how to exploit this system in a novel application to efficiently allocate human effort to collect answers to visual questions. Specifically, we propose a crowdsourcing system that automatically solicits fewer human responses when answer agreement is expected and more human responses when answer disagreement is expected. Our system improves upon existing crowdsourcing systems, typically eliminating at least 20% of human effort with no loss to the information collected from the crowd. |
Tasks | Question Answering, Visual Question Answering |
Published | 2016-08-29 |
URL | http://arxiv.org/abs/1608.08188v1 |
http://arxiv.org/pdf/1608.08188v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-question-predicting-if-a-crowd-will |
Repo | |
Framework | |
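The crowdsourcing scheme described in the abstract above can be sketched in a few lines; the function names, threshold, and response counts below are illustrative assumptions, not details from the paper:

```python
def collect_answers(question, predict_agreement, ask_crowd,
                    few=1, many=5, threshold=0.5):
    """Allocate crowd effort based on predicted answer agreement.

    predict_agreement: model mapping a visual question to the
        probability that a crowd would agree on a single answer.
    ask_crowd: function soliciting n human answers to a question.
    """
    p_agree = predict_agreement(question)
    # Fewer responses when agreement is expected, more otherwise.
    n = few if p_agree >= threshold else many
    return ask_crowd(question, n)

# Toy usage with stub components:
answers = collect_answers(
    "What color is the car?",
    predict_agreement=lambda q: 0.9,     # stub agreement classifier
    ask_crowd=lambda q, n: ["red"] * n,  # stub crowd interface
)
print(answers)
```

When the classifier predicts likely disagreement, the same call would request `many` responses instead, which is where the reported effort savings come from.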
A Hybrid Approach to Query Answering under Expressive Datalog+/-
Title | A Hybrid Approach to Query Answering under Expressive Datalog+/- |
Authors | Mostafa Milani, Andrea Cali, Leopoldo Bertossi |
Abstract | Datalog+/- is a family of ontology languages that combine good computational properties with high expressive power. Datalog+/- languages are provably able to capture the most relevant Semantic Web languages. In this paper we consider the class of weakly-sticky (WS) Datalog+/- programs, which allow for certain useful forms of joins in rule bodies while extending the well-known class of weakly-acyclic TGDs. So far, only non-deterministic algorithms were known for answering queries on WS Datalog+/- programs. We present novel deterministic query answering algorithms under WS Datalog+/-. In particular, we propose: (1) a bottom-up grounding algorithm based on a query-driven chase, and (2) a hybrid approach based on transforming a WS program into a so-called sticky one, for which query rewriting techniques are known. We discuss how our algorithms can be optimized and effectively applied for query answering in real-world scenarios. |
Tasks | |
Published | 2016-04-22 |
URL | http://arxiv.org/abs/1604.06770v2 |
http://arxiv.org/pdf/1604.06770v2.pdf | |
PWC | https://paperswithcode.com/paper/a-hybrid-approach-to-query-answering-under |
Repo | |
Framework | |
Efficient Hill-Climber for Multi-Objective Pseudo-Boolean Optimization
Title | Efficient Hill-Climber for Multi-Objective Pseudo-Boolean Optimization |
Authors | Francisco Chicano, Darrell Whitley, Renato Tinos |
Abstract | Local search algorithms and iterated local search algorithms are a basic optimization technique. Local search can be used as a stand-alone search method, but it can also be hybridized with evolutionary algorithms. Recently, it has been shown that it is possible to identify improving moves in Hamming neighborhoods for k-bounded pseudo-Boolean optimization problems in constant time. This means that local search does not need to enumerate neighborhoods to find improving moves. It also means that evolutionary algorithms do not need to use random mutation as an operator, except perhaps as a way to escape local optima. In this paper, we show how improving moves can be identified in constant time for multiobjective problems that are expressed as k-bounded pseudo-Boolean functions. In particular, multiobjective forms of NK Landscapes and Mk Landscapes are considered. |
Tasks | |
Published | 2016-01-27 |
URL | http://arxiv.org/abs/1601.07596v1 |
http://arxiv.org/pdf/1601.07596v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-hill-climber-for-multi-objective |
Repo | |
Framework | |
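The core idea behind the abstract above, finding improving moves without enumerating the Hamming neighborhood, can be illustrated with a single-objective sketch. A k-bounded pseudo-Boolean function is a sum of subfunctions over at most k variables each, so after a bit flip only the deltas of variables sharing a subfunction with the flipped bit change. The sketch below recomputes just those entries (constant work per move when k and the number of subfunctions per variable are bounded); the paper's multiobjective extension is not reproduced here:

```python
def make_deltas(x, subfns):
    """delta[i] = change in f when bit i is flipped.
    subfns: list of (vars, fn) pairs, each touching <= k variables."""
    delta = [0.0] * len(x)
    for vars_, fn in subfns:
        base = fn([x[v] for v in vars_])
        for v in vars_:
            y = [x[u] if u != v else 1 - x[u] for u in vars_]
            delta[v] += fn(y) - base
    return delta

def recompute(x, delta, vars_to_update, touch):
    """Refresh deltas only for the given variables."""
    for v in vars_to_update:
        d = 0.0
        for vars_, fn in touch[v]:
            base = fn([x[u] for u in vars_])
            y = [x[u] if u != v else 1 - x[u] for u in vars_]
            d += fn(y) - base
        delta[v] = d

def hill_climb(x, subfns):
    """Next-improvement hill climber with incremental delta updates."""
    n = len(x)
    touch = {i: [] for i in range(n)}   # subfunctions touching each var
    for vars_, fn in subfns:
        for v in vars_:
            touch[v].append((vars_, fn))
    delta = make_deltas(x, subfns)
    while True:
        movers = [i for i in range(n) if delta[i] > 0]
        if not movers:                  # no improving move: local optimum
            return x
        i = movers[0]
        x[i] = 1 - x[i]
        # Only variables co-occurring with bit i can change their delta.
        affected = {u for vars_, _ in touch[i] for u in vars_}
        recompute(x, delta, affected, touch)
```

For example, with the ONEMAX decomposition `[([i], lambda b: b[0]) for i in range(n)]`, the climber flips every zero bit exactly once and stops.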
Spatial Pyramid Convolutional Neural Network for Social Event Detection in Static Image
Title | Spatial Pyramid Convolutional Neural Network for Social Event Detection in Static Image |
Authors | Reza Fuad Rachmadi, Keiichi Uchimura, Gou Koutaki |
Abstract | Social event detection in a static image is a very challenging problem, and it is very useful for internet of things applications including automatic photo organization, ads recommender systems, and image captioning. Several publications show that the variety of objects, scenes, and people in an image can make it very ambiguous for a system to decide which event occurs in the image. We propose a spatial pyramid configuration of a convolutional neural network (CNN) classifier for social event detection in a static image. By applying the spatial pyramid configuration to the CNN classifier, details in the image can be observed more accurately by the classifier. The USED dataset provided by Ahmad et al., which consists of two different image sets, EiMM and SED, is used to evaluate our proposed method. As a result, the average accuracy of our system outperforms the baseline method by 15% and 2%, respectively. |
Tasks | Image Captioning, Recommendation Systems |
Published | 2016-12-13 |
URL | http://arxiv.org/abs/1612.04062v1 |
http://arxiv.org/pdf/1612.04062v1.pdf | |
PWC | https://paperswithcode.com/paper/spatial-pyramid-convolutional-neural-network |
Repo | |
Framework | |
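The abstract above applies a spatial pyramid configuration to a CNN classifier. A generic spatial-pyramid pooling step, not the authors' exact architecture, can be sketched as:

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Pool a CxHxW feature map over pyramid grids and concatenate.

    At level L the map is divided into an LxL grid; each cell is
    max-pooled, so coarse spatial layout is preserved in the final
    fixed-length descriptor.
    """
    C, H, W = fmap.shape
    feats = []
    for L in levels:
        hs = np.linspace(0, H, L + 1).astype(int)  # cell boundaries
        ws = np.linspace(0, W, L + 1).astype(int)
        for i in range(L):
            for j in range(L):
                cell = fmap[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                feats.append(cell.max(axis=(1, 2)))
    return np.concatenate(feats)

fmap = np.random.rand(8, 16, 16)   # e.g. a CNN feature map
vec = spatial_pyramid_pool(fmap)
print(vec.shape)                   # 8 channels x (1 + 4 + 16) cells
```

The concatenated vector can then feed a classifier head, giving it access to detail at several spatial granularities.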
Photometric Bundle Adjustment for Vision-Based SLAM
Title | Photometric Bundle Adjustment for Vision-Based SLAM |
Authors | Hatem Alismail, Brett Browning, Simon Lucey |
Abstract | We propose a novel algorithm for the joint refinement of structure and motion parameters from image data directly, without relying on fixed and known correspondences. In contrast to traditional bundle adjustment (BA), where the optimal parameters are determined by minimizing the reprojection error using tracked features, the proposed algorithm relies on maximizing the photometric consistency and estimates the correspondences implicitly. Since the proposed algorithm does not require correspondences, its application is not limited to corner-like structure; any pixel with a nonvanishing gradient could be used in the estimation process. Furthermore, we demonstrate the feasibility of refining the motion and structure parameters simultaneously using the photometric objective in unconstrained scenes and without requiring restrictive assumptions such as planarity. The proposed algorithm is evaluated on a range of challenging outdoor datasets, and it is shown to improve upon the accuracy of state-of-the-art VSLAM methods that minimize the reprojection error using traditional BA as well as loop closure. |
Tasks | |
Published | 2016-08-05 |
URL | http://arxiv.org/abs/1608.02026v1 |
http://arxiv.org/pdf/1608.02026v1.pdf | |
PWC | https://paperswithcode.com/paper/photometric-bundle-adjustment-for-vision |
Repo | |
Framework | |
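The photometric objective described above can be sketched as follows. This is a minimal illustration assuming a pinhole camera and nearest-pixel intensity lookup; a real implementation would interpolate sub-pixel intensities and jointly optimize pose and structure over these residuals:

```python
import numpy as np

def photometric_residuals(points, R, t, K, ref_intensity, img):
    """Photometric error for 3D points observed in a target frame.

    points:        Nx3 array of 3D points (world frame)
    R, t:          world-to-camera rotation (3x3) and translation (3,)
    K:             3x3 camera intrinsics
    ref_intensity: N intensities of the points in a reference frame
    img:           target grayscale image
    Correspondences are implicit: each point is projected into the
    target image and its intensity compared with the reference.
    """
    cam = points @ R.T + t                   # transform to camera frame
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]              # perspective projection
    u = np.clip(uv[:, 0].round().astype(int), 0, img.shape[1] - 1)
    v = np.clip(uv[:, 1].round().astype(int), 0, img.shape[0] - 1)
    return img[v, u] - ref_intensity         # residuals to minimize
```

Squaring and summing these residuals over all points and frames gives the photometric cost that replaces the reprojection error of feature-based BA.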
First Steps Toward Camera Model Identification with Convolutional Neural Networks
Title | First Steps Toward Camera Model Identification with Convolutional Neural Networks |
Authors | Luca Bondi, Luca Baroffio, David Güera, Paolo Bestagini, Edward J. Delp, Stefano Tubaro |
Abstract | Detecting the camera model used to shoot a picture makes it possible to solve a wide range of forensic problems, from copyright infringement to ownership attribution. For this reason, the forensic community has developed a set of camera model identification algorithms that exploit characteristic traces left on acquired images by the processing pipeline specific to each camera model. In this paper, we investigate a novel approach to solve the camera model identification problem. Specifically, we propose a data-driven algorithm based on convolutional neural networks, which learns features characterizing each camera model directly from the acquired pictures. Results on a well-known dataset of 18 camera models show that: (i) the proposed method outperforms state-of-the-art algorithms on the classification of 64x64 color image patches; (ii) features learned by the proposed network generalize to camera models never used for training. |
Tasks | |
Published | 2016-03-03 |
URL | http://arxiv.org/abs/1603.01068v2 |
http://arxiv.org/pdf/1603.01068v2.pdf | |
PWC | https://paperswithcode.com/paper/first-steps-toward-camera-model |
Repo | |
Framework | |
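The abstract above reports classification of 64x64 color image patches. A common way to turn patch-level predictions into an image-level decision is to aggregate votes over patches; the majority vote below is an illustrative assumption, and the patch classifier is left as a stub for the CNN the paper trains:

```python
import numpy as np

def image_patches(img, size=64):
    """Split an image into non-overlapping size x size patches."""
    H, W = img.shape[:2]
    return [img[i:i + size, j:j + size]
            for i in range(0, H - size + 1, size)
            for j in range(0, W - size + 1, size)]

def predict_camera_model(img, patch_classifier):
    """Classify each patch and majority-vote over the patches.

    patch_classifier maps a patch to a camera-model label; in the
    paper this role is played by a trained CNN.
    """
    votes = [patch_classifier(p) for p in image_patches(img)]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]
```

Voting over many small patches also makes the decision robust to local content that carries few model-specific traces.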
Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification
Title | Task-Guided and Path-Augmented Heterogeneous Network Embedding for Author Identification |
Authors | Ting Chen, Yizhou Sun |
Abstract | In this paper, we study the problem of author identification under double-blind review setting, which is to identify potential authors given information of an anonymized paper. Different from existing approaches that rely heavily on feature engineering, we propose to use a network embedding approach to address the problem, which can automatically represent nodes into lower dimensional feature vectors. However, there are two major limitations in recent studies on network embedding: (1) they are usually general-purpose embedding methods, which are independent of the specific tasks; and (2) most of these approaches can only deal with homogeneous networks, where the heterogeneity of the network is ignored. Hence, the challenges faced here are twofold: (1) how to embed the network under the guidance of the author identification task, and (2) how to select the best type of information due to the heterogeneity of the network. To address the challenges, we propose a task-guided and path-augmented heterogeneous network embedding model. In our model, nodes are first embedded as vectors in latent feature space. Embeddings are then shared and jointly trained according to task-specific and network-general objectives. We extend the existing unsupervised network embedding to incorporate meta paths in heterogeneous networks, and select paths according to the specific task. The guidance from the author identification task for network embedding is provided both explicitly in joint training and implicitly during meta path selection. Our experiments demonstrate that by using path-augmented network embedding with task guidance, our model can obtain significantly better accuracy at identifying the true authors compared to existing methods. |
Tasks | Feature Engineering, Network Embedding |
Published | 2016-12-08 |
URL | http://arxiv.org/abs/1612.02814v2 |
http://arxiv.org/pdf/1612.02814v2.pdf | |
PWC | https://paperswithcode.com/paper/task-guided-and-path-augmented-heterogeneous |
Repo | |
Framework | |
Joint Data Compression and MAC Protocol Design for Smartgrids with Renewable Energy
Title | Joint Data Compression and MAC Protocol Design for Smartgrids with Renewable Energy |
Authors | Le Thanh Tan, Long Bao Le |
Abstract | In this paper, we consider the joint design of data compression and 802.15.4-based medium access control (MAC) protocol for smartgrids with renewable energy. We study the setting where a number of nodes, each of which comprises electricity load and/or renewable sources, report periodically their injected powers to a data concentrator. Our design exploits the correlation of the reported data in both time and space to efficiently design the data compression using the compressed sensing (CS) technique and the MAC protocol so that the reported data can be recovered reliably within a minimum reporting time. Specifically, we perform the following design tasks: i) we employ the two-dimensional (2D) CS technique to compress the reported data in a distributed manner; ii) we propose to adapt the 802.15.4 MAC protocol frame structure to enable efficient data transmission and reliable data reconstruction; and iii) we develop an analytical model based on which we can obtain efficient MAC parameter configuration to minimize the reporting delay. Finally, numerical results are presented to demonstrate the effectiveness of our proposed framework compared to existing solutions. |
Tasks | |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04995v1 |
http://arxiv.org/pdf/1606.04995v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-data-compression-and-mac-protocol |
Repo | |
Framework | |
A Photometrically Calibrated Benchmark For Monocular Visual Odometry
Title | A Photometrically Calibrated Benchmark For Monocular Visual Odometry |
Authors | Jakob Engel, Vladyslav Usenko, Daniel Cremers |
Abstract | We present a dataset for evaluating the tracking accuracy of monocular visual odometry and SLAM methods. It contains 50 real-world sequences comprising more than 100 minutes of video, recorded across dozens of different environments – ranging from narrow indoor corridors to wide outdoor scenes. All sequences contain mostly exploring camera motion, starting and ending at the same position. This makes it possible to evaluate tracking accuracy via the accumulated drift from start to end, without requiring ground truth for the full sequence. In contrast to existing datasets, all sequences are photometrically calibrated. We provide exposure times for each frame as reported by the sensor, the camera response function, and dense lens attenuation factors. We also propose a novel, simple approach to non-parametric vignette calibration, which requires minimal set-up and is easy to reproduce. Finally, we thoroughly evaluate two existing methods (ORB-SLAM and DSO) on the dataset, including an analysis of the effect of image resolution, camera field of view, and the camera motion direction. |
Tasks | Calibration, Monocular Visual Odometry, Visual Odometry |
Published | 2016-07-09 |
URL | http://arxiv.org/abs/1607.02555v2 |
http://arxiv.org/pdf/1607.02555v2.pdf | |
PWC | https://paperswithcode.com/paper/a-photometrically-calibrated-benchmark-for |
Repo | |
Framework | |
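Since the dataset provides exposure times, the camera response function, and dense vignetting factors, a frame can be photometrically corrected before tracking. The sketch below assumes the standard image formation model I = G(t * V(x) * B); the variable names are illustrative:

```python
import numpy as np

def photometric_correction(img, inv_response, vignette, exposure_t):
    """Recover scene irradiance from a raw frame.

    Assuming raw intensity forms as I = G(t * V(x) * B), irradiance
    is recovered as B = G^{-1}(I) / (V(x) * t), where G is the camera
    response, V the vignetting map, and t the exposure time.
    """
    return inv_response[img] / (vignette * exposure_t)

# Toy usage: identity 8-bit response, uniform vignette, 10 ms exposure.
inv_response = np.arange(256, dtype=float)  # G^{-1} as a lookup table
vignette = np.full((4, 4), 0.5)
img = np.full((4, 4), 100, dtype=np.uint8)
irradiance = photometric_correction(img, inv_response, vignette, 0.01)
```

Applying this correction is what lets direct methods such as DSO compare pixel intensities across frames with different exposures.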
The IMP game: Learnability, approximability and adversarial learning beyond $\Sigma^0_1$
Title | The IMP game: Learnability, approximability and adversarial learning beyond $\Sigma^0_1$ |
Authors | Michael Brand, David L. Dowe |
Abstract | We introduce a problem set-up we call the Iterated Matching Pennies (IMP) game and show that it is a powerful framework for the study of three problems: adversarial learnability, conventional (i.e., non-adversarial) learnability and approximability. Using it, we are able to derive the following theorems. (1) It is possible to learn by example all of $\Sigma^0_1 \cup \Pi^0_1$ as well as some supersets; (2) in adversarial learning (which we describe as a pursuit-evasion game), the pursuer has a winning strategy (in other words, $\Sigma^0_1$ can be learned adversarially, but $\Pi^0_1$ not); (3) some languages in $\Pi^0_1$ cannot be approximated by any language in $\Sigma^0_1$. We show corresponding results also for $\Sigma^0_i$ and $\Pi^0_i$ for arbitrary $i$. |
Tasks | |
Published | 2016-02-07 |
URL | http://arxiv.org/abs/1602.02743v1 |
http://arxiv.org/pdf/1602.02743v1.pdf | |
PWC | https://paperswithcode.com/paper/the-imp-game-learnability-approximability-and |
Repo | |
Framework | |
Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition
Title | Mask-CNN: Localizing Parts and Selecting Descriptors for Fine-Grained Image Recognition |
Authors | Xiu-Shen Wei, Chen-Wei Xie, Jianxin Wu |
Abstract | Fine-grained image recognition is a challenging computer vision problem, due to the small inter-class variations caused by highly similar subordinate categories, and the large intra-class variations in poses, scales and rotations. In this paper, we propose a novel end-to-end Mask-CNN model without the fully connected layers for fine-grained recognition. Based on the part annotations of fine-grained images, the proposed model consists of a fully convolutional network to both locate the discriminative parts (e.g., head and torso), and more importantly generate object/part masks for selecting useful and meaningful convolutional descriptors. After that, a four-stream Mask-CNN model is built for aggregating the selected object- and part-level descriptors simultaneously. The proposed Mask-CNN model has the smallest number of parameters, lowest feature dimensionality and highest recognition accuracy when compared with state-of-the-art fine-grained approaches. |
Tasks | Fine-Grained Image Recognition |
Published | 2016-05-23 |
URL | http://arxiv.org/abs/1605.06878v1 |
http://arxiv.org/pdf/1605.06878v1.pdf | |
PWC | https://paperswithcode.com/paper/mask-cnn-localizing-parts-and-selecting |
Repo | |
Framework | |
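The descriptor-selection step described in the abstract can be sketched as follows; average pooling over the foreground positions is a simplification of the paper's four-stream aggregation:

```python
import numpy as np

def masked_descriptor(fmap, mask):
    """Select and aggregate convolutional descriptors with a mask.

    fmap: CxHxW convolutional feature map
    mask: HxW binary object/part mask (1 = keep the descriptor)
    Only descriptors at foreground locations contribute, discarding
    background activations before aggregation.
    """
    keep = mask.astype(bool)
    if not keep.any():              # degenerate mask: global average
        return fmap.mean(axis=(1, 2))
    selected = fmap[:, keep]        # C x (number of kept positions)
    return selected.mean(axis=1)
```

In the paper one such masked descriptor would be computed per stream (object, head, torso, and so on) and the results concatenated for classification.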
Multi-Region Neural Representation: A novel model for decoding visual stimuli in human brains
Title | Multi-Region Neural Representation: A novel model for decoding visual stimuli in human brains |
Authors | Muhammad Yousefnezhad, Daoqiang Zhang |
Abstract | Multivariate Pattern (MVP) classification holds enormous potential for decoding visual stimuli in the human brain by employing task-based fMRI data sets. There is a wide range of challenges in MVP techniques, i.e. decreasing noise and sparsity, defining effective regions of interest (ROIs), visualizing results, and the cost of brain studies. To overcome these challenges, this paper proposes a novel model of neural representation, which can automatically detect the active regions for each visual stimulus and then utilize these anatomical regions for visualizing and analyzing the functional activities. Therefore, this model provides an opportunity for neuroscientists to ask what the effect of a stimulus is on each of the detected regions, instead of just studying the fluctuation of voxels in manually selected ROIs. Moreover, our method analyzes snapshots of brain images to decrease sparsity, rather than using whole fMRI time series. Further, a new Gaussian smoothing method is proposed for removing voxel noise at the level of ROIs. The proposed method enables us to combine different fMRI data sets, reducing the cost of brain studies. Experimental studies on 4 visual categories (words, consonants, objects and nonsense photos) confirm that the proposed method achieves superior performance to state-of-the-art methods. |
Tasks | Time Series |
Published | 2016-12-26 |
URL | http://arxiv.org/abs/1612.08392v1 |
http://arxiv.org/pdf/1612.08392v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-region-neural-representation-a-novel |
Repo | |
Framework | |
Unsupervised Perceptual Rewards for Imitation Learning
Title | Unsupervised Perceptual Rewards for Imitation Learning |
Authors | Pierre Sermanet, Kelvin Xu, Sergey Levine |
Abstract | Reward function design and exploration time are arguably the biggest obstacles to the deployment of reinforcement learning (RL) agents in the real world. In many real-world tasks, designing a reward function takes considerable hand engineering and often requires additional sensors to be installed just to measure whether the task has been executed successfully. Furthermore, many interesting tasks consist of multiple implicit intermediate steps that must be executed in sequence. Even when the final outcome can be measured, it does not necessarily provide feedback on these intermediate steps. To address these issues, we propose leveraging the abstraction power of intermediate visual representations learned by deep models to quickly infer perceptual reward functions from small numbers of demonstrations. We present a method that is able to identify key intermediate steps of a task from only a handful of demonstration sequences, and automatically identify the most discriminative features for identifying these steps. This method makes use of the features in a pre-trained deep model, but does not require any explicit specification of sub-goals. The resulting reward functions can then be used by an RL agent to learn to perform the task in real-world settings. To evaluate the learned reward, we present qualitative results on two real-world tasks and a quantitative evaluation against a human-designed reward function. We also show that our method can be used to learn a real-world door opening skill using a real robot, even when the demonstration used for reward learning is provided by a human using their own hand. To our knowledge, these are the first results showing that complex robotic manipulation skills can be learned directly and without supervised labels from a video of a human performing the task. Supplementary material and data are available at https://sermanet.github.io/rewards |
Tasks | Imitation Learning |
Published | 2016-12-20 |
URL | http://arxiv.org/abs/1612.06699v3 |
http://arxiv.org/pdf/1612.06699v3.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-perceptual-rewards-for-imitation |
Repo | |
Framework | |
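One way to picture a perceptual reward built from intermediate-step representations is to score an observation's features against per-step prototypes. The sketch below is a simplified stand-in, not the paper's method; it uses plain feature distances where the paper learns discriminative features and classifiers from demonstrations:

```python
import numpy as np

def perceptual_reward(frame_feats, step_prototypes, weights=None):
    """Reward from distances to intermediate-step feature prototypes.

    frame_feats:     feature vector of the current observation,
                     e.g. taken from a pre-trained deep model
    step_prototypes: one feature vector per intermediate task step,
                     identified from a handful of demonstrations
    The reward grows as the observation gets closer to later steps,
    giving an RL agent shaped feedback on task progress.
    """
    dists = np.array([np.linalg.norm(frame_feats - p)
                      for p in step_prototypes])
    sims = np.exp(-dists)            # similarity to each step
    if weights is None:
        # Weight later steps more, so progress raises the reward.
        weights = np.arange(1, len(step_prototypes) + 1, dtype=float)
    return float((weights * sims).sum() / weights.sum())
```

An RL agent can then maximize this signal per frame without any hand-designed reward sensors.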
Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks
Title | Risk Bounds for High-dimensional Ridge Function Combinations Including Neural Networks |
Authors | Jason M. Klusowski, Andrew R. Barron |
Abstract | Let $ f^{\star} $ be a function on $ \mathbb{R}^d $ with an assumption of a spectral norm $ v_{f^{\star}} $. For various noise settings, we show that $ \mathbb{E}\|\hat{f} - f^{\star}\|^2 \leq \left(v^4_{f^{\star}}\frac{\log d}{n}\right)^{1/3} $, where $ n $ is the sample size and $ \hat{f} $ is either a penalized least squares estimator or a greedily obtained version of such using linear combinations of sinusoidal, sigmoidal, ramp, ramp-squared or other smooth ridge functions. The candidate fits may be chosen from a continuum of functions, thus avoiding the rigidity of discretizations of the parameter space. On the other hand, if the candidate fits are chosen from a discretization, we show that $ \mathbb{E}\|\hat{f} - f^{\star}\|^2 \leq \left(v^3_{f^{\star}}\frac{\log d}{n}\right)^{2/5} $. This work bridges non-linear and non-parametric function estimation and includes single-hidden layer nets. Unlike past theory for such settings, our bound shows that the risk is small even when the input dimension $ d $ of an infinite-dimensional parameterized dictionary is much larger than the available sample size. When the dimension is larger than the cube root of the sample size, this quantity is seen to improve the more familiar risk bound of $ v_{f^{\star}}\left(\frac{d\log (n/d)}{n}\right)^{1/2} $, also investigated here. |
Tasks | |
Published | 2016-07-05 |
URL | http://arxiv.org/abs/1607.01434v4 |
http://arxiv.org/pdf/1607.01434v4.pdf | |
PWC | https://paperswithcode.com/paper/risk-bounds-for-high-dimensional-ridge |
Repo | |
Framework | |
Mutual Transformation of Information and Knowledge
Title | Mutual Transformation of Information and Knowledge |
Authors | Olegs Verhodubs |
Abstract | Information and knowledge are transformable into each other. The transformation of information into knowledge has been shown, using the example of rule generation from OWL (Web Ontology Language) ontologies, during the development of the SWES (Semantic Web Expert System). The SWES is intended as an expert system for searching OWL ontologies on the Web, generating rules from the found ontologies, and supplementing the SWES knowledge base with these rules. The purpose of this paper is to show the transformation of knowledge into information, using the example of ontology generation from rules. |
Tasks | |
Published | 2016-04-26 |
URL | http://arxiv.org/abs/1604.07625v1 |
http://arxiv.org/pdf/1604.07625v1.pdf | |
PWC | https://paperswithcode.com/paper/mutual-transformation-of-information-and |
Repo | |
Framework | |