July 27, 2019

3129 words 15 mins read

Paper Group ANR 683

Active Learning amidst Logical Knowledge. Best Viewpoint Tracking for Camera Mounted on Robotic Arm with Dynamic Obstacles. Date-Field Retrieval in Scene Image and Video Frames using Text Enhancement and Shape Coding. Recognizing and Curating Photo Albums via Event-Specific Image Importance. Fast and easy blind deblurring using an inverse filter and PROBE. …

Active Learning amidst Logical Knowledge

Title Active Learning amidst Logical Knowledge
Authors Emmanouil Antonios Platanios, Ashish Kapoor, Eric Horvitz
Abstract Structured prediction is ubiquitous in applications of machine learning such as knowledge extraction and natural language processing. Structure often can be formulated in terms of logical constraints. We consider the question of how to perform efficient active learning in the presence of logical constraints among variables inferred by different classifiers. We propose several methods and provide theoretical results that demonstrate the inappropriateness of employing uncertainty-guided sampling, a commonly used active learning method. Furthermore, experiments on ten different datasets demonstrate that the methods significantly outperform alternatives in practice. The results are of practical significance in situations where labeled data is scarce.
Tasks Active Learning, Structured Prediction
Published 2017-09-26
URL http://arxiv.org/abs/1709.08850v1
PDF http://arxiv.org/pdf/1709.08850v1.pdf
PWC https://paperswithcode.com/paper/active-learning-amidst-logical-knowledge
Repo
Framework
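
For contrast with the paper's proposals, the sketch below shows the plain entropy-based uncertainty sampling that the authors argue is inappropriate once logical constraints couple the classifiers' outputs. The array shapes and selection rule are generic illustration, not the paper's code.

```python
import numpy as np

def uncertainty_sampling(probs, k):
    """Pick the k unlabeled examples whose predicted class
    distribution has the highest Shannon entropy."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-k:]

# Toy usage: five unlabeled examples, three classes.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.30, 0.30],
                  [0.34, 0.33, 0.33],
                  [0.80, 0.10, 0.10],
                  [0.50, 0.25, 0.25]])
print(uncertainty_sampling(probs, 2))  # indices of the two most uncertain
```

Roughly speaking, when a constraint such as "label A implies label B" ties variables together, querying the most uncertain variable in isolation can waste labels; this coupling is the failure mode the paper's theory addresses.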

Best Viewpoint Tracking for Camera Mounted on Robotic Arm with Dynamic Obstacles

Title Best Viewpoint Tracking for Camera Mounted on Robotic Arm with Dynamic Obstacles
Authors Christos Maniatis, Marcelo Saval-Calvo, Radim Tylecek, Robert B. Fisher
Abstract The problem of finding a next best viewpoint for 3D modeling or scene mapping has been explored in computer vision over the last decade. This paper tackles a similar problem, but with different characteristics. It proposes a method for dynamic next best viewpoint recovery of a target point while avoiding possible occlusions. Since the environment can change, the method has to iteratively find the next best view with a global understanding of the free and occupied parts. We model the problem as a set of possible viewpoints which correspond to the centers of the facets of a virtual tessellated hemisphere covering the scene. Taking into account occlusions, distances between current and future viewpoints, quality of the viewpoint and joint constraints (robot arm joint distances or limits), we evaluate the next best viewpoint. The proposal has been evaluated on 8 different scenarios with different occlusions and a short 3D video sequence to validate its dynamic performance.
Tasks
Published 2017-08-01
URL http://arxiv.org/abs/1708.00300v2
PDF http://arxiv.org/pdf/1708.00300v2.pdf
PWC https://paperswithcode.com/paper/best-viewpoint-tracking-for-camera-mounted-on
Repo
Framework
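
A rough sketch of the evaluation described above: candidate viewpoints on a hemisphere over the scene are scored by combining occlusion, travel distance, viewpoint quality, and joint cost. The azimuth/elevation grid standing in for the facet centers, the weights, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hemisphere_viewpoints(radius, n_az=12, n_el=4):
    """Candidate viewpoints: a regular azimuth/elevation grid standing in
    for the facet centers of the tessellated hemisphere."""
    els = np.linspace(np.pi / (2 * n_el), np.pi / 2, n_el, endpoint=False)
    azs = np.linspace(0.0, 2 * np.pi, n_az, endpoint=False)
    return [radius * np.array([np.cos(el) * np.cos(az),
                               np.cos(el) * np.sin(az),
                               np.sin(el)])
            for el in els for az in azs]

def next_best_view(views, current, occluded, quality, joint_cost,
                   w=(10.0, 1.0, 2.0, 1.0)):
    """Penalize occlusion, travel distance and joint cost; reward quality."""
    def score(i):
        return (-w[0] * occluded[i]
                - w[1] * np.linalg.norm(views[i] - current)
                + w[2] * quality[i]
                - w[3] * joint_cost[i])
    return max(range(len(views)), key=score)
```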

Date-Field Retrieval in Scene Image and Video Frames using Text Enhancement and Shape Coding

Title Date-Field Retrieval in Scene Image and Video Frames using Text Enhancement and Shape Coding
Authors Partha Pratim Roy, Ayan Kumar Bhunia, Umapada Pal
Abstract Text recognition in scene images and video frames is difficult because of low resolution, blur, background noise, etc. Since traditional OCRs do not perform well on such images, information retrieval using keywords can be an alternative way to index and retrieve such text. Dates are a useful piece of information with various applications, including date-wise video/scene searching, indexing and retrieval. This paper presents a date-spotting-based information retrieval system for natural scene images and video frames where text appears against complex backgrounds. We propose a line-based date-spotting approach using a Hidden Markov Model (HMM) to detect date information in a given text line. Different date models are searched within a line without segmenting characters or words. Given a text line image in RGB, we apply an efficient gray-image conversion to enhance the text information. Wavelet decomposition and gradient sub-bands are used to enhance text information in gray scale. Next, Pyramid Histogram of Oriented Gradients (PHOG) features are extracted from the gray and binary images for the date-spotting framework. Binary and gray image features are combined by an MLP-based tandem approach. Finally, to boost performance further, a shape-coding scheme merges characters of similar shape into the same class during word spotting. For our experiments, three different date models were constructed to search for date information in scene/video text: numeric dates, which contain numerals and punctuation, and semi-numeric dates, which combine numerals with month names. We tested our system on 1648 text lines, and the results show the effectiveness of the proposed date-spotting approach.
Tasks Information Retrieval
Published 2017-07-21
URL http://arxiv.org/abs/1707.06833v1
PDF http://arxiv.org/pdf/1707.06833v1.pdf
PWC https://paperswithcode.com/paper/date-field-retrieval-in-scene-image-and-video
Repo
Framework
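
The shape-coding idea at the end of the abstract can be pictured with a toy character-class map: visually confusable characters collapse to one code, so a noisy recognition hypothesis still matches the intended date string. The particular groupings below are invented for illustration and are not the paper's classes.

```python
# Map visually similar characters to one shape class so that a noisy
# hypothesis like "2l/O7/2OI7" still matches the date "21/07/2017".
SHAPE_CLASS = {'0': '0', 'O': '0', 'o': '0', 'D': '0',
               '1': '1', 'l': '1', 'I': '1', '|': '1',
               '5': '5', 'S': '5', 's': '5',
               '8': '8', 'B': '8',
               '2': '2', 'Z': '2', 'z': '2'}

def shape_code(text):
    return ''.join(SHAPE_CLASS.get(c, c) for c in text)

print(shape_code('2l/O7/2OI7') == shape_code('21/07/2017'))  # True
```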

Recognizing and Curating Photo Albums via Event-Specific Image Importance

Title Recognizing and Curating Photo Albums via Event-Specific Image Importance
Authors Yufei Wang, Zhe Lin, Xiaohui Shen, Radomir Mech, Gavin Miller, Garrison W. Cottrell
Abstract Automatic organization of personal photos is a problem with many real-world applications, and can be divided into two main tasks: recognizing the event type of the photo collection, and selecting interesting images from the collection. In this paper, we attempt to solve both tasks simultaneously: album-wise event recognition and image-wise importance prediction. We collected an album dataset with both event type labels and image importance labels, refined from the existing CUFED dataset. We propose a hybrid system consisting of three parts: a Siamese network for event-specific image importance prediction, a Convolutional Neural Network (CNN) that recognizes the event type, and a Long Short-Term Memory (LSTM)-based sequence-level event recognizer. We propose an iterative updating procedure for event type and image importance score prediction. We experimentally verified that image importance score prediction and event type recognition each help the performance of the other.
Tasks
Published 2017-07-19
URL http://arxiv.org/abs/1707.05911v1
PDF http://arxiv.org/pdf/1707.05911v1.pdf
PWC https://paperswithcode.com/paper/recognizing-and-curating-photo-albums-via
Repo
Framework
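
A loose sketch of the iterative updating idea mentioned above, alternating between an album-level event belief and importance-weighted image evidence. The initialization, weighting scheme, and array layout are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def refine(image_event_evidence, importance, n_iters=3):
    """image_event_evidence: (N, E) per-image event scores from a CNN;
    importance: (E, N) per-image importance under each event hypothesis.
    Alternate: form an album-level event belief from importance-weighted
    image votes, then score images by belief-weighted importance."""
    belief = softmax(image_event_evidence.mean(axis=0))   # init: average vote
    for _ in range(n_iters):
        weights = importance[np.argmax(belief)]           # trust important images
        belief = softmax((weights[:, None] * image_event_evidence).sum(axis=0))
    final_importance = belief @ importance                # (N,) expected importance
    return belief, final_importance
```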

Fast and easy blind deblurring using an inverse filter and PROBE

Title Fast and easy blind deblurring using an inverse filter and PROBE
Authors Naftali Zon, Rana Hanocka, Nahum Kiryati
Abstract PROBE (Progressive Removal of Blur Residual) is a recursive framework for blind deblurring. Using the elementary modified inverse filter at its core, PROBE’s experimental performance meets or exceeds the state of the art, both visually and quantitatively. Remarkably, PROBE lends itself to analysis that reveals its convergence properties. PROBE is motivated by recent ideas on progressive blind deblurring, but breaks away from previous research by its simplicity, speed, performance and potential for analysis. PROBE is neither a functional minimization approach, nor an open-loop sequential method (blur kernel estimation followed by non-blind deblurring). PROBE is a feedback scheme, deriving its unique strength from the closed-loop architecture rather than from the accuracy of its algorithmic components.
Tasks Deblurring
Published 2017-02-04
URL http://arxiv.org/abs/1702.01315v1
PDF http://arxiv.org/pdf/1702.01315v1.pdf
PWC https://paperswithcode.com/paper/fast-and-easy-blind-deblurring-using-an
Repo
Framework
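
A minimal sketch of a PROBE-like loop: a regularized ("modified") inverse filter in the frequency domain is applied recursively to peel off residual blur. The Tikhonov-style regularization and the pluggable kernel estimator are assumptions; the paper's exact filter and estimator may differ.

```python
import numpy as np

def modified_inverse_filter(img, kernel_fft, eps=1e-2):
    """Regularized inverse filter in the frequency domain; eps keeps the
    division stable where the blur kernel response is tiny."""
    F = np.fft.fft2(img)
    G = np.conj(kernel_fft) / (np.abs(kernel_fft) ** 2 + eps)
    return np.real(np.fft.ifft2(F * G))

def probe_like(blurred, estimate_kernel_fft, n_iters=5):
    """Recursively estimate the residual blur and peel it off."""
    x = blurred
    for _ in range(n_iters):
        H = estimate_kernel_fft(x)   # stand-in for a residual-blur estimator
        x = modified_inverse_filter(x, H)
    return x
```

The closed-loop character the abstract emphasizes shows up here as feedback: each iteration re-estimates the blur of the *current* estimate rather than committing to a one-shot kernel.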

Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks

Title Spatiotemporal Recurrent Convolutional Networks for Traffic Prediction in Transportation Networks
Authors Haiyang Yu, Zhihai Wu, Shuqin Wang, Yunpeng Wang, Xiaolei Ma
Abstract Predicting large-scale transportation network traffic has become an important and challenging topic in recent decades. Inspired by the domain knowledge of motion prediction, in which the future motion of an object can be predicted based on previous scenes, we propose a network grid representation method that can retain the fine-scale structure of a transportation network. Network-wide traffic speeds are converted into a series of static images and input into a novel deep architecture, namely, spatiotemporal recurrent convolutional networks (SRCNs), for traffic forecasting. The proposed SRCNs inherit the advantages of deep convolutional neural networks (DCNNs) and long short-term memory (LSTM) neural networks. The spatial dependencies of network-wide traffic can be captured by DCNNs, and the temporal dynamics can be learned by LSTMs. An experiment on a Beijing transportation network with 278 links demonstrates that SRCNs outperform other deep learning-based algorithms in both short-term and long-term traffic prediction.
Tasks motion prediction, Traffic Prediction
Published 2017-05-07
URL http://arxiv.org/abs/1705.02699v1
PDF http://arxiv.org/pdf/1705.02699v1.pdf
PWC https://paperswithcode.com/paper/spatiotemporal-recurrent-convolutional-1
Repo
Framework
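
A compact PyTorch sketch of the SRCN idea: a small CNN encodes each traffic-grid frame, an LSTM consumes the resulting sequence, and a linear head regresses the next-step speed of each link. Layer sizes and the 32x32 grid are illustrative; only the 278-link output echoes the abstract.

```python
import torch
import torch.nn as nn

class SRCNSketch(nn.Module):
    """CNN per traffic-grid frame, LSTM over the frame sequence."""
    def __init__(self, grid=32, hidden=128, n_links=278):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten())
        feat = 32 * (grid // 4) ** 2
        self.lstm = nn.LSTM(feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_links)   # speed for each link

    def forward(self, x):                        # x: (batch, time, 1, grid, grid)
        b, t = x.shape[:2]
        f = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(f)
        return self.head(out[:, -1])             # next-step speeds
```

The LSTM sees one CNN embedding per time step, so spatial structure is captured before temporal dynamics, mirroring the division of labor described in the abstract.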

Learning to Act Properly: Predicting and Explaining Affordances from Images

Title Learning to Act Properly: Predicting and Explaining Affordances from Images
Authors Ching-Yao Chuang, Jiaman Li, Antonio Torralba, Sanja Fidler
Abstract We address the problem of affordance reasoning in diverse scenes that appear in the real world. Affordances relate the agent’s actions to their effects when taken on the surrounding objects. In our work, we take the egocentric view of the scene, and aim to reason about action-object affordances that respect both the physical world and the social norms imposed by society. We also aim to teach artificial agents why some actions should not be taken in certain situations, and what would likely happen if they were taken. We collect a new dataset that builds on ADE20k, referred to as ADE-Affordance, which contains annotations enabling such rich visual reasoning. We propose a model that exploits Graph Neural Networks to propagate contextual information from the scene in order to perform detailed affordance reasoning about each object. Our model is showcased through various ablation studies, pointing to successes and challenges in this complex task.
Tasks Visual Reasoning
Published 2017-12-20
URL http://arxiv.org/abs/1712.07576v2
PDF http://arxiv.org/pdf/1712.07576v2.pdf
PWC https://paperswithcode.com/paper/learning-to-act-properly-predicting-and
Repo
Framework
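
A generic message-passing step of the kind Graph Neural Networks use to propagate scene context between objects; the mean-aggregation update and untrained random weights below are illustrative, not the paper's architecture.

```python
import numpy as np

def propagate(node_feats, adj, n_steps=2, seed=0):
    """Each node averages its neighbors' features (the 'message') and
    mixes them into its own state; repeated steps spread scene context."""
    rng = np.random.default_rng(seed)
    d = node_feats.shape[1]
    W_self = rng.normal(scale=0.1, size=(d, d))   # untrained, for illustration
    W_msg = rng.normal(scale=0.1, size=(d, d))
    h = node_feats
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    for _ in range(n_steps):
        msg = (adj @ h) / deg                     # mean over neighbors
        h = np.tanh(h @ W_self + msg @ W_msg)
    return h
```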

Inference in Sparse Graphs with Pairwise Measurements and Side Information

Title Inference in Sparse Graphs with Pairwise Measurements and Side Information
Authors Dylan J. Foster, Daniel Reichman, Karthik Sridharan
Abstract We consider the statistical problem of recovering a hidden “ground truth” binary labeling for the vertices of a graph, up to low Hamming error, from noisy edge and vertex measurements. We present new algorithms and a sharp finite-sample analysis for this problem on trees and sparse graphs with poor expansion properties, such as hypergrids and ring lattices. Our method generalizes and improves over that of Globerson et al. (2015), who introduced the problem for two-dimensional grid lattices. For trees we provide a simple, efficient algorithm that infers the ground truth with optimal Hamming error, has optimal sample complexity, and implies recovery results for all connected graphs. Here, the presence of side information is critical to obtaining a non-trivial recovery rate. We then show how to adapt this algorithm to tree decompositions of edge-subgraphs of certain graph families, such as lattices, resulting in optimal recovery error rates that can be obtained efficiently. The thrust of our analysis is to 1) use the tree decomposition along with edge measurements to produce a small class of viable vertex labelings and 2) apply an analysis influenced by statistical learning theory to show that we can infer the ground truth from this class using vertex measurements. We show the power of our method in several examples, including hypergrids, ring lattices, and the Newman-Watts model for small-world graphs. For two-dimensional grids, our results improve over Globerson et al. (2015) by obtaining optimal recovery in the constant-height regime.
Tasks
Published 2017-03-08
URL http://arxiv.org/abs/1703.02728v3
PDF http://arxiv.org/pdf/1703.02728v3.pdf
PWC https://paperswithcode.com/paper/inference-in-sparse-graphs-with-pairwise
Repo
Framework
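
For intuition on the tree case, here is a toy max-product dynamic program that recovers a MAP binary labeling from noisy "same/different" edge measurements plus noisy vertex labels (the side information). It is a simplification for illustration; the paper's algorithm and its tree-decomposition analysis are considerably more involved.

```python
import math
from collections import defaultdict

def tree_map_labeling(edges, edge_obs, vertex_obs, p=0.9, q=0.7):
    """MAP binary labeling of a tree. edge_obs[i] is 0 ('same') or 1
    ('different') for edges[i], correct w.p. p; vertex_obs[u] is a noisy
    label, correct w.p. q. Max-product DP rooted at node 0."""
    adj = defaultdict(list)
    for (u, v), o in zip(edges, edge_obs):
        adj[u].append((v, o))
        adj[v].append((u, o))

    def node_ll(u, lab):                  # log P(vertex_obs[u] | lab)
        return math.log(q if vertex_obs[u] == lab else 1 - q)

    def edge_ll(o, lu, lv):               # log P(edge obs | labels)
        return math.log(p if o == int(lu != lv) else 1 - p)

    # orient the tree: pre-order so parents precede children
    order, seen, stack = [], {0}, [(0, None, None)]
    while stack:
        u, par, o = stack.pop()
        order.append((u, par, o))
        for v, ov in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append((v, u, ov))

    best = {}                             # best[u][lab]: score of u's subtree
    for u, par, _ in reversed(order):
        best[u] = {}
        for lab in (0, 1):
            s = node_ll(u, lab)
            for v, ov in adj[u]:
                if v != par:
                    s += max(best[v][lv] + edge_ll(ov, lab, lv) for lv in (0, 1))
            best[u][lab] = s

    labels = {0: max((0, 1), key=lambda lab: best[0][lab])}
    for u, par, o in order[1:]:           # backtrack; parents already labeled
        labels[u] = max((0, 1), key=lambda lv: best[u][lv] + edge_ll(o, labels[par], lv))
    return labels

# Toy path 0-1-2: both edges observed 'same', one vertex label flipped.
print(tree_map_labeling([(0, 1), (1, 2)], [0, 0], [1, 1, 0]))
# -> {0: 1, 1: 1, 2: 1}: edge agreement outvotes the flipped vertex
```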

Enabling Embedded Inference Engine with ARM Compute Library: A Case Study

Title Enabling Embedded Inference Engine with ARM Compute Library: A Case Study
Authors Dawei Sun, Shaoshan Liu, Jean-Luc Gaudiot
Abstract When you need to enable deep learning on low-cost embedded SoCs, is it better to port an existing deep learning framework or to build one from scratch? In this paper, we share our practical experience of building an embedded inference engine using the ARM Compute Library (ACL). The results show that, contrary to conventional wisdom, for simple models it takes much less development time to build an inference engine from scratch than to port existing frameworks. In addition, by utilizing ACL, we managed to build an inference engine that outperforms TensorFlow by 25%. Our conclusion is that, on embedded devices, we will most likely use very simple deep learning models for inference, and with well-developed building blocks such as ACL, it may be better in both performance and development time to build the engine from scratch.
Tasks
Published 2017-04-12
URL http://arxiv.org/abs/1704.03751v3
PDF http://arxiv.org/pdf/1704.03751v3.pdf
PWC https://paperswithcode.com/paper/enabling-embedded-inference-engine-with-arm
Repo
Framework

Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models

Title Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models
Authors Jesse Engel, Matthew Hoffman, Adam Roberts
Abstract Deep generative neural networks have proven effective at both conditional and unconditional modeling of complex data distributions. Conditional generation enables interactive control, but creating new controls often requires expensive retraining. In this paper, we develop a method to condition generation without retraining the model. By post-hoc learning latent constraints, value functions that identify regions in latent space that generate outputs with desired attributes, we can conditionally sample from these regions with gradient-based optimization or amortized actor functions. Combining attribute constraints with a universal “realism” constraint, which enforces similarity to the data distribution, we generate realistic conditional images from an unconditional variational autoencoder. Further, using gradient-based optimization, we demonstrate identity-preserving transformations that make the minimal adjustment in latent space to modify the attributes of an image. Finally, with discrete sequences of musical notes, we demonstrate zero-shot conditional generation, learning latent constraints in the absence of labeled data or a differentiable reward function. Code with dedicated cloud instance has been made publicly available (https://goo.gl/STGMGx).
Tasks
Published 2017-11-15
URL http://arxiv.org/abs/1711.05772v2
PDF http://arxiv.org/pdf/1711.05772v2.pdf
PWC https://paperswithcode.com/paper/latent-constraints-learning-to-generate
Repo
Framework
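
The core optimization is easy to sketch: starting from a latent code, take gradient steps that raise both a pretrained "realism" value function and an attribute value function, then decode the result. The critic interfaces and optimizer settings below are assumptions; the code released at https://goo.gl/STGMGx is the authoritative implementation.

```python
import torch

def constrain_latent(z0, realism_critic, attr_critic, steps=100, lr=0.05):
    """Gradient ascent in latent space: nudge z until both value functions
    score it highly, then hand z to the decoder of the unconditional VAE."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -(realism_critic(z) + attr_critic(z)).mean()
        loss.backward()
        opt.step()
    return z.detach()
```

Because only z moves, the generator stays frozen, which is exactly why no retraining is needed; the amortized actor the paper also describes would replace this inner loop with a learned network.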

Online Hashing

Title Online Hashing
Authors Long-Kai Huang, Qiang Yang, Wei-Shi Zheng
Abstract Although hash function learning algorithms have achieved great success in recent years, most existing hash models are offline, and are thus not suitable for processing sequential or online data. To address this problem, this work proposes an online hash model that accommodates streaming data for online learning. Specifically, a new loss function is proposed to measure the similarity loss between a pair of data samples in Hamming space. Then, a structured hash model is derived and optimized in a passive-aggressive way. A theoretical analysis of the upper bound on the cumulative loss of the proposed online hash model is provided. Furthermore, we extend our online hashing from a single model to a multi-model setting that trains and retains multiple diverse hashing models in order to avoid biased updates. The competitive efficiency and effectiveness of the proposed online hash models are verified through extensive experiments on several large-scale datasets, as compared to related hashing methods.
Tasks
Published 2017-04-06
URL http://arxiv.org/abs/1704.01897v1
PDF http://arxiv.org/pdf/1704.01897v1.pdf
PWC https://paperswithcode.com/paper/online-hashing
Repo
Framework
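
A simplified passive-aggressive step for a linear hash model: if a relaxed Hamming-space similarity loss on the incoming pair is zero, the model is left alone (passive); otherwise the weights move just enough along the loss gradient, with step size capped by C (aggressive). The tanh relaxation and per-bit hinge are stand-ins for the paper's structured loss.

```python
import numpy as np

def pa_hash_update(W, x1, x2, similar, C=1.0):
    """One passive-aggressive step for a linear hash model h(x) = sign(W @ x)."""
    r1, r2 = W @ x1, W @ x2
    agree = np.tanh(r1) * np.tanh(r2)             # soft bit agreement in [-1, 1]
    target = 1.0 if similar else -1.0
    hinge = np.maximum(0.0, 1.0 - target * agree)
    loss = hinge.sum()
    if loss == 0.0:
        return W                                  # passive: pair already consistent
    active = hinge > 0                            # bits still violating the margin
    g1 = active * (1 - np.tanh(r1) ** 2) * np.tanh(r2)
    g2 = active * (1 - np.tanh(r2) ** 2) * np.tanh(r1)
    grad = -target * (np.outer(g1, x1) + np.outer(g2, x2))
    tau = min(C, loss / (np.linalg.norm(grad) ** 2 + 1e-12))
    return W - tau * grad                         # aggressive: minimal correcting step
```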

Weakly-supervised Semantic Parsing with Abstract Examples

Title Weakly-supervised Semantic Parsing with Abstract Examples
Authors Omer Goldman, Veronica Latcinnik, Udi Naveh, Amir Globerson, Jonathan Berant
Abstract Training semantic parsers from weak supervision (denotations) rather than strong supervision (programs) complicates training in two ways. First, a large search space of potential programs needs to be explored at training time to find a correct program. Second, spurious programs that accidentally lead to a correct denotation add noise to training. In this work we propose that in closed worlds with clear semantic types, one can substantially alleviate these problems by utilizing an abstract representation, where tokens in both the language utterance and program are lifted to an abstract form. We show that these abstractions can be defined with a handful of lexical rules and that they result in sharing between different examples that alleviates the difficulties in training. To test our approach, we develop the first semantic parser for CNLVR, a challenging visual reasoning dataset, where the search space is large and overcoming spuriousness is critical, because denotations are either TRUE or FALSE, and thus random programs are likely to lead to a correct denotation. Our method substantially improves performance, and reaches 82.5% accuracy, a 14.7% absolute accuracy improvement compared to the best reported accuracy so far.
Tasks Semantic Parsing, Visual Reasoning
Published 2017-11-14
URL http://arxiv.org/abs/1711.05240v5
PDF http://arxiv.org/pdf/1711.05240v5.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-semantic-parsing-with-1
Repo
Framework
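
The abstraction step can be pictured with a handful of lexical rules that lift utterance tokens to abstract types, so structurally identical examples share statistics during training. The rules below are toy stand-ins; the paper defines its own small rule set for CNLVR.

```python
import re

# Illustrative lexical rules lifting utterance tokens to abstract types.
LEXICAL_RULES = [
    (re.compile(r'^(yellow|blue|black)$'), 'C_COLOR'),
    (re.compile(r'^(circle|square|triangle)s?$'), 'C_SHAPE'),
    (re.compile(r'^\d+$'), 'C_NUM'),
    (re.compile(r'^(one|two|three|four|five)$'), 'C_NUM'),
]

def abstract(utterance):
    out = []
    for tok in utterance.lower().split():
        for pat, typ in LEXICAL_RULES:
            if pat.match(tok):
                out.append(typ)
                break
        else:
            out.append(tok)
    return out

print(abstract('there are 2 yellow circles'))
# ['there', 'are', 'C_NUM', 'C_COLOR', 'C_SHAPE']
```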

Deep Reinforcement Learning with Surrogate Agent-Environment Interface

Title Deep Reinforcement Learning with Surrogate Agent-Environment Interface
Authors Song Wang, Yu Jing
Abstract In this paper, we propose a surrogate agent-environment interface (SAEI) for reinforcement learning. We show that learning based on the probability surrogate agent-environment interface provides the optimal policy for the task agent-environment interface. We introduce surrogate probability actions and develop the probability surrogate action deterministic policy gradient (PSADPG) algorithm based on SAEI. This algorithm enables continuous control of discrete actions. Experiments show that PSADPG achieves the performance of DQN in certain tasks, with a stochastic optimal policy in the initial training stage.
Tasks Continuous Control
Published 2017-09-12
URL http://arxiv.org/abs/1709.03942v3
PDF http://arxiv.org/pdf/1709.03942v3.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-with-surrogate
Repo
Framework
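
The surrogate interface is simple to picture: the actor emits a continuous probability vector over the discrete actions (the surrogate action), and the interface samples the real discrete action from it, which is what lets deterministic policy gradients act on a discrete problem. The sketch below shows only the interface; the toy actor and the DPG training loop it enables are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def surrogate_step(actor, env_step, state, rng):
    """Actor emits a probability vector (the surrogate continuous action);
    the interface samples the discrete action the real environment sees."""
    probs = actor(state)                       # continuous action on the simplex
    a = int(rng.choice(len(probs), p=probs))   # discrete action for the env
    return env_step(a), probs

# Toy usage: a softmax actor over 3 discrete actions, 4-dim states.
rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 3))
actor = lambda s: softmax(s @ theta)
env_step = lambda a: (f"stepped with action {a}", 0.0)
print(surrogate_step(actor, env_step, np.ones(4), rng))
```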

Multi-spectral Image Panchromatic Sharpening, Outcome and Process Quality Assessment Protocol

Title Multi-spectral Image Panchromatic Sharpening, Outcome and Process Quality Assessment Protocol
Authors Andrea Baraldi, Francesca Despini, Sergio Teggi
Abstract Multispectral (MS) image panchromatic (PAN) sharpening algorithms proposed to the remote sensing community are ever increasing in number and variety. Their aim is to sharpen a coarse-spatial-resolution MS image with a fine-spatial-resolution PAN image acquired simultaneously by a spaceborne or airborne Earth observation (EO) optical imaging sensor pair. Unfortunately, to date, no standard evaluation procedure for MS image PAN sharpening outcome and process has been agreed upon by the community, in contrast with the Quality Assurance Framework for Earth Observation (QA4EO) guidelines proposed by the intergovernmental Group on Earth Observations (GEO). In general, process is easier to measure, but outcome is more important. The original contribution of the present study is fourfold. First, existing procedures for quantitative quality assessment (Q2A) of the (sole) PAN-sharpened MS product are critically reviewed, and their conceptual and implementation drawbacks are highlighted for quality improvement. Second, a novel (to the best of these authors’ knowledge, the first) protocol for Q2A of the MS image PAN sharpening product and process is designed, implemented and validated by independent means. Third, within this protocol, an innovative categorization of spectral and spatial image quality indicators and metrics is presented. Fourth, according to this new taxonomy, an original third-order isotropic multi-scale gray-level co-occurrence matrix (TIMS-GLCM) calculator and a TIMS-GLCM texture feature extractor are proposed to replace popular second-order GLCMs.
Tasks
Published 2017-01-08
URL http://arxiv.org/abs/1701.01942v1
PDF http://arxiv.org/pdf/1701.01942v1.pdf
PWC https://paperswithcode.com/paper/multi-spectral-image-panchromatic-sharpening
Repo
Framework
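
As one concrete example of a spectral quality indicator of the kind the protocol above categorizes, the spectral angle mapper (SAM) below measures the mean per-pixel angle between reference and pan-sharpened spectra. SAM is a standard community metric, shown for illustration; it is not the paper's full protocol.

```python
import numpy as np

def spectral_angle_mapper(ms_ref, ms_fused):
    """Mean spectral angle (radians) between reference and pan-sharpened
    MS images of shape (H, W, bands); 0 means spectrally identical."""
    a = ms_ref.reshape(-1, ms_ref.shape[-1]).astype(float)
    b = ms_fused.reshape(-1, ms_fused.shape[-1]).astype(float)
    dot = (a * b).sum(axis=1)
    denom = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    return float(np.mean(np.arccos(np.clip(dot / denom, -1.0, 1.0))))
```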

Complete 3D Scene Parsing from an RGBD Image

Title Complete 3D Scene Parsing from an RGBD Image
Authors Chuhang Zou, Ruiqi Guo, Zhizhong Li, Derek Hoiem
Abstract One major goal of vision is to infer physical models of objects, surfaces, and their layout from sensors. In this paper, we aim to interpret indoor scenes from one RGBD image. Our representation encodes the layout of orthogonal walls and the extent of objects, modeled with CAD-like 3D shapes. We parse both the visible and occluded portions of the scene and all observable objects, producing a complete 3D parse. Such a scene interpretation is useful for robotics and visual reasoning, but difficult to produce due to the well-known challenge of segmentation, the high degree of occlusion, and the diversity of objects in indoor scenes. We take a data-driven approach, generating sets of potential object regions, matching to regions in training images, and transferring and aligning associated 3D models while encouraging fit to observations and spatial consistency. We use support inference to aid interpretation and propose a retrieval scheme that uses convolutional neural networks (CNNs) to classify regions and retrieve objects with similar shapes. We demonstrate the performance of our method on our newly annotated NYUd v2 dataset with detailed 3D shapes.
Tasks Scene Parsing, Visual Reasoning
Published 2017-10-25
URL http://arxiv.org/abs/1710.09490v2
PDF http://arxiv.org/pdf/1710.09490v2.pdf
PWC https://paperswithcode.com/paper/complete-3d-scene-parsing-from-an-rgbd-image
Repo
Framework