July 28, 2019

Paper Group ANR 454

Transferable Semi-supervised Semantic Segmentation. Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks. Neural Person Search Machines. Segmenting Sky Pixels in Images. r-BTN: Cross-domain Face Composite and Synthesis from Limited Facial Patches. IAN: The Individual Aggregation Network for Person Search. Climbing a shaky …

Transferable Semi-supervised Semantic Segmentation


Title	Transferable Semi-supervised Semantic Segmentation
Authors	Huaxin Xiao, Yunchao Wei, Yu Liu, Maojun Zhang, Jiashi Feng
Abstract	The performance of deep learning based semantic segmentation models heavily depends on sufficient data with careful annotations. However, even the largest public datasets only provide samples with pixel-level annotations for rather limited semantic categories. Such data scarcity critically limits scalability and applicability of semantic segmentation models in real applications. In this paper, we propose a novel transferable semi-supervised semantic segmentation model that can transfer the learned segmentation knowledge from a few strong categories with pixel-level annotations to unseen weak categories with only image-level annotations, significantly broadening the applicable territory of deep segmentation models. In particular, the proposed model consists of two complementary and learnable components: a Label transfer Network (L-Net) and a Prediction transfer Network (P-Net). The L-Net learns to transfer the segmentation knowledge from strong categories to the images in the weak categories and produces coarse pixel-level semantic maps, by effectively exploiting the similar appearance shared across categories. Meanwhile, the P-Net tailors the transferred knowledge through a carefully designed adversarial learning strategy and produces refined segmentation results with better details. Integrating the L-Net and P-Net achieves 96.5% and 89.4% performance of the fully-supervised baseline using 50% and 0% categories with pixel-level annotations respectively on PASCAL VOC 2012. With such a novel transfer mechanism, our proposed model is easily generalizable to a variety of new categories, only requiring image-level annotations, and offers appealing scalability in real applications.
Tasks	Semantic Segmentation, Semi-Supervised Semantic Segmentation
Published	2017-11-18
URL	http://arxiv.org/abs/1711.06828v2
PDF	http://arxiv.org/pdf/1711.06828v2.pdf
PWC	https://paperswithcode.com/paper/transferable-semi-supervised-semantic
Repo
Framework

Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks


Title	Reward Maximization Under Uncertainty: Leveraging Side-Observations on Networks
Authors	Swapna Buccapatnam, Fang Liu, Atilla Eryilmaz, Ness B. Shroff
Abstract	We study the stochastic multi-armed bandit (MAB) problem in the presence of side-observations across actions that occur as a result of an underlying network structure. In our model, a bipartite graph captures the relationship between actions and a common set of unknowns such that choosing an action reveals observations for the unknowns that it is connected to. This models a common scenario in online social networks where users respond to their friends’ activity, thus providing side information about each other’s preferences. Our contributions are as follows: 1) We derive an asymptotic lower bound (with respect to time) as a function of the bipartite network structure on the regret of any uniformly good policy that achieves the maximum long-term average reward. 2) We propose two policies, a randomized policy and a policy based on the well-known upper confidence bound (UCB) policies, both of which explore each action at a rate that is a function of its network position. We show, under mild assumptions, that these policies achieve the asymptotic lower bound on the regret up to a multiplicative factor, independent of the network structure. Finally, we use numerical examples on a real-world social network and a routing example network to demonstrate the benefits obtained by our policies over other existing policies.
Tasks
Published	2017-04-26
URL	http://arxiv.org/abs/1704.07943v2
PDF	http://arxiv.org/pdf/1704.07943v2.pdf
PWC	https://paperswithcode.com/paper/reward-maximization-under-uncertainty
Repo
Framework
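
The idea of exploiting side-observations can be illustrated with a small simulation. The sketch below is not the authors' policy: it is a plain UCB1-style index rule under the simplifying assumption that pulling an arm directly reveals a reward sample for each of its neighbors (an action-to-action graph rather than the paper's bipartite action-to-unknown graph). The function name, the `neighbors` structure, and the Bernoulli reward model are all hypothetical.

```python
import math
import random


def ucb_with_side_observations(neighbors, means, horizon, seed=0):
    """UCB1-style policy where pulling an arm also reveals its neighbors' rewards.

    neighbors[i]: arms observed when arm i is pulled (assumed to include i).
    means[i]: Bernoulli mean of arm i (used only to simulate rewards).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    obs_count = [0] * k    # observations per arm, direct plus side
    obs_sum = [0.0] * k    # sum of observed rewards per arm
    pulls = [0] * k

    for t in range(1, horizon + 1):
        unobserved = [i for i in range(k) if obs_count[i] == 0]
        if unobserved:
            # Make sure every arm has at least one observation first.
            arm = unobserved[0]
        else:
            # Empirical mean plus a confidence bonus that shrinks with the
            # number of observations, including side-observations.
            arm = max(
                range(k),
                key=lambda i: obs_sum[i] / obs_count[i]
                + math.sqrt(2.0 * math.log(t) / obs_count[i]),
            )
        pulls[arm] += 1
        for j in neighbors[arm]:
            reward = 1.0 if rng.random() < means[j] else 0.0
            obs_count[j] += 1
            obs_sum[j] += reward
    return pulls
```

Because the bonus depends on observations rather than pulls, a well-connected arm reduces the uncertainty of its neighbors without those neighbors having to be played, which is the intuition behind exploring at a rate that depends on network position.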

Neural Person Search Machines


Title	Neural Person Search Machines
Authors	Hao Liu, Jiashi Feng, Zequn Jie, Karlekar Jayashree, Bo Zhao, Meibin Qi, Jianguo Jiang, Shuicheng Yan
Abstract	We investigate the problem of person search in the wild in this work. Instead of comparing the query against all candidate regions generated in a query-blind manner, we propose to recursively shrink the search area from the whole image until achieving precise localization of the target person, by fully exploiting information from the query and contextual cues in every recursive search step. We develop the Neural Person Search Machines (NPSM) to implement such recursive localization for person search. Benefiting from its neural search mechanism, NPSM is able to selectively shrink its focus from a loose region to a tighter one containing the target automatically. In this process, NPSM employs an internal primitive memory component to memorize the query representation which modulates the attention and augments its robustness to other distracting regions. Evaluations on two benchmark datasets, CUHK-SYSU Person Search dataset and PRW dataset, have demonstrated that our method can outperform current state-of-the-art methods in both mAP and top-1 evaluation protocols.
Tasks	Person Search
Published	2017-07-21
URL	http://arxiv.org/abs/1707.06777v1
PDF	http://arxiv.org/pdf/1707.06777v1.pdf
PWC	https://paperswithcode.com/paper/neural-person-search-machines
Repo
Framework

Segmenting Sky Pixels in Images


Title	Segmenting Sky Pixels in Images
Authors	Cecilia La Place, Aisha Urooj Khan, Ali Borji
Abstract	Outdoor scene parsing models are often trained on ideal datasets and produce quality results. However, this leads to a discrepancy when applied to the real world. The quality of scene parsing, particularly sky classification, decreases in night time images, images involving varying weather conditions, and scene changes due to seasonal weather. This project focuses on approaching these challenges by using a state-of-the-art model in conjunction with a non-ideal dataset: SkyFinder and a subset from SUN database with Sky object. We focus specifically on sky segmentation, the task of determining sky and not-sky pixels, and improving upon an existing state-of-the-art model: RefineNet. As a result of our efforts, we have seen an improvement of 10-15% in the average MCR compared to the prior methods on SkyFinder dataset. We have also improved from an off-the-shelf model in terms of average mIOU by nearly 35%. Further, we analyze our trained models on images with respect to two aspects: times of day and weather, and find that, in spite of facing the same challenges as prior methods, our trained models significantly outperform them.
Tasks	Scene Parsing
Published	2017-12-26
URL	http://arxiv.org/abs/1712.09161v2
PDF	http://arxiv.org/pdf/1712.09161v2.pdf
PWC	https://paperswithcode.com/paper/segmenting-sky-pixels-in-images
Repo
Framework
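
The two metrics quoted in this abstract can be computed on binary masks as follows. This is a minimal sketch, assuming MCR denotes the per-pixel misclassification rate and mIOU the intersection-over-union averaged over the two classes (sky = 1, not-sky = 0); the function names are hypothetical and the paper's exact evaluation protocol may differ.

```python
def misclassification_rate(pred, target):
    """Fraction of pixels whose predicted label differs from the ground truth."""
    wrong = total = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            wrong += 1 if p != t else 0
            total += 1
    return wrong / total


def mean_iou(pred, target, classes=(0, 1)):
    """Intersection-over-union averaged over the given class labels."""
    ious = []
    for cls in classes:
        inter = union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                in_pred, in_target = (p == cls), (t == cls)
                inter += 1 if (in_pred and in_target) else 0
                union += 1 if (in_pred or in_target) else 0
        # A class absent from both masks is counted as a perfect match here.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)
```

For a 2x2 prediction `[[1, 1], [0, 0]]` against ground truth `[[1, 0], [0, 0]]`, one of four pixels is wrong, the sky IoU is 1/2 and the not-sky IoU is 2/3.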

r-BTN: Cross-domain Face Composite and Synthesis from Limited Facial Patches


Title	r-BTN: Cross-domain Face Composite and Synthesis from Limited Facial Patches
Authors	Yang Song, Zhifei Zhang, Hairong Qi
Abstract	We start by asking an interesting yet challenging question, “If an eyewitness can only recall the eye features of the suspect, such that the forensic artist can only produce a sketch of the eyes (e.g., the top-left sketch shown in Fig. 1), can advanced computer vision techniques help generate the whole face image?” A more generalized question is that if a large proportion (e.g., more than 50%) of the face/sketch is missing, can a realistic whole face sketch/image still be estimated. Existing face completion and generation methods either do not conduct domain transfer learning or cannot handle large missing areas. For example, the inpainting approach tends to blur the generated region when the missing area is large (i.e., more than 50%). In this paper, we exploit the potential of deep learning networks in filling large missing regions (e.g., as high as 95% missing) and generating realistic faces with high fidelity in cross domains. We propose the recursive generation by bidirectional transformation networks (r-BTN) that recursively generates a whole face/sketch from a small sketch/face patch. The large missing area and the cross domain challenge make it difficult to generate satisfactory results using a unidirectional cross-domain learning structure. On the other hand, a forward and backward bidirectional learning between the face and sketch domains would enable recursive estimation of the missing region in an incremental manner (Fig. 1) and yield appealing results. r-BTN also adopts an adversarial constraint to encourage the generation of realistic faces/sketches. Extensive experiments have been conducted to demonstrate the superior performance of r-BTN as compared to existing potential solutions.
Tasks	Facial Inpainting, Transfer Learning
Published	2017-06-02
URL	http://arxiv.org/abs/1706.00556v2
PDF	http://arxiv.org/pdf/1706.00556v2.pdf
PWC	https://paperswithcode.com/paper/r-btn-cross-domain-face-composite-and
Repo
Framework

IAN: The Individual Aggregation Network for Person Search


Title	IAN: The Individual Aggregation Network for Person Search
Authors	Jimin Xiao, Yanchun Xie, Tammam Tillo, Kaizhu Huang, Yunchao Wei, Jiashi Feng
Abstract	Person search in real-world scenarios is a new challenging computer vision task with many meaningful applications. The challenge of this task mainly comes from: (1) unavailable bounding boxes for pedestrians and the model needs to search for the person over the whole gallery images; (2) huge variance of visual appearance of a particular person owing to varying poses, lighting conditions, and occlusions. To address these two critical issues in modern person search applications, we propose a novel Individual Aggregation Network (IAN) that can accurately localize persons by learning to minimize intra-person feature variations. IAN is built upon the state-of-the-art object detection framework, i.e., Faster R-CNN, so that high-quality region proposals for pedestrians can be produced in an online manner. In addition, to relieve the negative effect caused by varying visual appearances of the same individual, IAN introduces a novel center loss that can increase the intra-class compactness of feature representations. The engaged center loss encourages persons with the same identity to have similar feature characteristics. Extensive experimental results on two benchmarks, i.e., CUHK-SYSU and PRW, well demonstrate the superiority of the proposed model. In particular, IAN achieves 77.23% mAP and 80.45% top-1 accuracy on CUHK-SYSU, which outperform the state-of-the-art by 1.7% and 1.85%, respectively.
Tasks	Object Detection, Person Search
Published	2017-05-16
URL	http://arxiv.org/abs/1705.05552v1
PDF	http://arxiv.org/pdf/1705.05552v1.pdf
PWC	https://paperswithcode.com/paper/ian-the-individual-aggregation-network-for
Repo
Framework
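
The intra-class compactness term mentioned in this abstract can be illustrated with a generic center loss: half the squared Euclidean distance between each feature vector and the center of its identity. The sketch below is an assumption-laden illustration, not IAN's actual loss; averaging over the batch, the dictionary of centers, and the function name are all hypothetical choices.

```python
def center_loss(features, labels, centers):
    """Half the mean squared distance between each feature and its class center.

    features: list of feature vectors (lists of floats).
    labels: identity label for each feature.
    centers: mapping from identity label to its center vector.
    """
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return 0.5 * total / len(features)
```

The loss is zero when every feature coincides with its identity's center and grows as features of the same person spread apart, so minimizing it pulls same-identity features together.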

Climbing a shaky ladder: Better adaptive risk estimation


Title	Climbing a shaky ladder: Better adaptive risk estimation
Authors	Moritz Hardt
Abstract	We revisit the leaderboard problem introduced by Blum and Hardt (2015) in an effort to reduce overfitting in machine learning benchmarks. We show that a randomized version of their Ladder algorithm achieves leaderboard error O(1/n^{0.4}) compared with the previous best rate of O(1/n^{1/3}). Short of proving that our algorithm is optimal, we point out a major obstacle toward further progress. Specifically, any improvement to our upper bound would lead to asymptotic improvements in the general adaptive estimation setting as have remained elusive in recent years. This connection also directly leads to lower bounds for specific classes of algorithms. In particular, we exhibit a new attack on the leaderboard algorithm that both theoretically and empirically distinguishes between our algorithm and previous leaderboard algorithms.
Tasks
Published	2017-06-08
URL	http://arxiv.org/abs/1706.02733v1
PDF	http://arxiv.org/pdf/1706.02733v1.pdf
PWC	https://paperswithcode.com/paper/climbing-a-shaky-ladder-better-adaptive-risk
Repo
Framework
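
For context, the original (non-randomized) Ladder mechanism of Blum and Hardt can be sketched in a few lines: a submission's score is released only if it beats the best released score by more than a step size, and the released value is rounded to a multiple of that step size. The sketch below covers only this baseline, not the randomized variant the abstract analyzes; the function name and the representation of submissions as a list of holdout losses are assumptions.

```python
def ladder(losses, eta):
    """Released leaderboard scores for a stream of holdout losses.

    losses: holdout loss of each submission, in submission order.
    eta: step size; improvements smaller than eta are not revealed.
    """
    best = float("inf")
    released = []
    for loss in losses:
        if loss < best - eta:
            # Reveal the improvement, rounded to the nearest multiple of eta.
            best = round(loss / eta) * eta
        released.append(best)
    return released
```

With losses 0.50, 0.49, 0.42, 0.43, 0.30 and a step size of 0.05, the second and fourth submissions leave the leaderboard unchanged, so an analyst learns nothing from small fluctuations on the holdout set.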

Reducing Crowdsourcing to Graphon Estimation, Statistically


Title	Reducing Crowdsourcing to Graphon Estimation, Statistically
Authors	Devavrat Shah, Christina Lee Yu
Abstract	Inferring the correct answers to binary tasks based on multiple noisy answers in an unsupervised manner has emerged as the canonical question for micro-task crowdsourcing or more generally aggregating opinions. In graphon estimation, one is interested in estimating edge intensities or probabilities between nodes using a single snapshot of a graph realization. In the recent literature, there has been exciting development within both of these topics. In the context of crowdsourcing, the key intellectual challenge is to understand whether a given task can be more accurately denoised by aggregating answers collected from other different tasks. In the context of graphon estimation, precise information limits and estimation algorithms remain of interest. In this paper, we utilize a statistical reduction from crowdsourcing to graphon estimation to advance the state of the art for both of these challenges. We use concepts from graphon estimation to design an algorithm that achieves better performance than the majority voting scheme for a setup that goes beyond the rank one models considered in the literature. We use known explicit lower bounds for crowdsourcing to provide refined lower bounds for graphon estimation.
Tasks	Graphon Estimation
Published	2017-03-23
URL	https://arxiv.org/abs/1703.08085v4
PDF	https://arxiv.org/pdf/1703.08085v4.pdf
PWC	https://paperswithcode.com/paper/reducing-crowdsourcing-to-graphon-estimation
Repo
Framework
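
The majority voting baseline that this abstract compares against is simple enough to state in code. This is a minimal sketch under assumed conventions: answers are encoded as +1 or -1, a 0 means the worker did not answer the task, and ties are broken toward +1. The function name and data layout are hypothetical.

```python
def majority_vote(answers):
    """Estimate each binary task's label as the sign of the summed answers.

    answers[w][t]: worker w's answer to task t, in {+1, -1}, or 0 if missing.
    """
    num_tasks = len(answers[0])
    estimates = []
    for t in range(num_tasks):
        total = sum(worker[t] for worker in answers)
        estimates.append(1 if total >= 0 else -1)
    return estimates
```

Majority voting treats every task in isolation and every worker as equally reliable, which is why methods that pool information across tasks can improve on it.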

Spatial Pyramid Context-Aware Moving Object Detection and Tracking for Full Motion Video and Wide Aerial Motion Imagery


Title	Spatial Pyramid Context-Aware Moving Object Detection and Tracking for Full Motion Video and Wide Aerial Motion Imagery
Authors	Mahdieh Poostchi
Abstract	A robust and fast automatic moving object detection and tracking system is essential to characterize target objects and extract spatial and temporal information for different functionalities including video surveillance systems, urban traffic monitoring and navigation, and robotics. In this dissertation, I present a collaborative Spatial Pyramid Context-aware moving object detection and Tracking system (SPCT). The proposed visual tracker is composed of one master tracker that usually relies on visual object features and two auxiliary trackers based on object temporal motion information that will be called dynamically to assist the master tracker. SPCT utilizes image spatial context at different levels to make the video tracking system resistant to occlusion and background noise, and to improve target localization accuracy and robustness. We chose a pre-selected set of seven-channel complementary features including RGB color, intensity and spatial pyramid of HoG to encode object color, shape and spatial layout information. We exploit the integral histogram as a building block to meet the demands of real-time performance. A novel fast algorithm is presented to accurately evaluate spatially weighted local histograms in constant time complexity using an extension of the integral histogram method. Different techniques are explored to efficiently compute the integral histogram on GPU architecture and applied for fast spatio-temporal median computations and 3D face reconstruction texturing. We proposed a multi-component framework based on semantic fusion of motion information with projected building footprint map to significantly reduce the false alarm rate in urban scenes with many tall structures. The experiments on extensive VOTC2016 benchmark dataset and aerial video confirm that combining complementary tracking cues in an intelligent fusion framework enables persistent tracking for Full Motion Video and Wide Aerial Motion Imagery.
Tasks	3D Face Reconstruction, Face Reconstruction, Object Detection
Published	2017-11-05
URL	http://arxiv.org/abs/1711.01656v1
PDF	http://arxiv.org/pdf/1711.01656v1.pdf
PWC	https://paperswithcode.com/paper/spatial-pyramid-context-aware-moving-object
Repo
Framework
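
The integral histogram used as a building block here can be sketched as a per-bin summed-area table: once it is built, the histogram of any axis-aligned rectangle follows from four corner lookups per bin, independent of the rectangle's size. The sketch below shows only the standard unweighted version on the CPU, not the dissertation's spatially weighted extension or its GPU implementation; function names and the input layout (a 2D list of precomputed bin indices) are assumptions.

```python
def integral_histogram(image, num_bins):
    """Build H where H[r][c][b] counts pixels of bin b in image rows < r, cols < c.

    image: 2D list of bin indices in range(num_bins).
    """
    rows, cols = len(image), len(image[0])
    H = [[[0] * num_bins for _ in range(cols + 1)] for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            cell = H[r + 1][c + 1]
            for b in range(num_bins):
                # Inclusion-exclusion over the three already-filled neighbors.
                cell[b] = H[r][c + 1][b] + H[r + 1][c][b] - H[r][c][b]
            cell[image[r][c]] += 1
    return H


def region_histogram(H, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and cols left..right-1 via four lookups."""
    num_bins = len(H[0][0])
    return [
        H[bottom][right][b] - H[top][right][b] - H[bottom][left][b] + H[top][left][b]
        for b in range(num_bins)
    ]
```

A tracker that scores many candidate windows per frame can then compare their histograms without rescanning pixels, which is what makes the structure attractive for real-time use.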

Gabor frames and deep scattering networks in audio processing


Title	Gabor frames and deep scattering networks in audio processing
Authors	Roswitha Bammer, Monika Dörfler, Pavol Harar
Abstract	This paper introduces Gabor scattering, a feature extractor based on Gabor frames and Mallat’s scattering transform. By using a simple signal model for audio signals, specific properties of Gabor scattering are studied. It is shown that for each layer, specific invariances to certain signal characteristics occur. Furthermore, deformation stability of the coefficient vector generated by the feature extractor is derived by using a decoupling technique which exploits the contractivity of general scattering networks. Deformations are introduced as changes in spectral shape and frequency modulation. The theoretical results are illustrated by numerical examples and experiments. Numerical evidence is given, by evaluation on a synthetic and a “real” data set, that the invariances encoded by the Gabor scattering transform lead to higher performance in comparison with just using the Gabor transform, especially when few training samples are available.
Tasks
Published	2017-06-27
URL	https://arxiv.org/abs/1706.08818v4
PDF	https://arxiv.org/pdf/1706.08818v4.pdf
PWC	https://paperswithcode.com/paper/gabor-frames-and-deep-scattering-networks-in
Repo
Framework

Simulated Data Experiments for Time Series Classification Part 1: Accuracy Comparison with Default Settings


Title	Simulated Data Experiments for Time Series Classification Part 1: Accuracy Comparison with Default Settings
Authors	Anthony Bagnall, Aaron Bostrom, James Large, Jason Lines
Abstract	There is now a broad range of time series classification (TSC) algorithms designed to exploit different representations of the data. These have been evaluated on a range of problems hosted at the UCR-UEA TSC Archive (www.timeseriesclassification.com), and there have been extensive comparative studies. However, our understanding of why one algorithm outperforms another is still anecdotal at best. This series of experiments is meant to help provide insights into what sort of discriminatory features in the data lead one set of algorithms that exploit a particular representation to be better than other algorithms. We categorise five different feature spaces exploited by TSC algorithms then design data simulators to generate randomised data from each representation. We describe what results we expected from each class of algorithm and data representation, then observe whether these prior beliefs are supported by the experimental evidence. We provide an open source implementation of all the simulators to allow for the controlled testing of hypotheses relating to classifier performance on different data representations. We identify many surprising results that confounded our expectations, and use these results to highlight how an oversimplified view of classifier structure can often lead to erroneous prior beliefs. We believe ensembling can often overcome prior bias, and our results support the belief by showing that the ensemble approach adopted by the Hierarchical Collective of Transform based Ensembles (HIVE-COTE) is significantly better than the alternatives when the data representation is unknown, and is significantly better than, or not significantly worse than, the best other approach on three out of five of the individual simulators.
Tasks	Time Series, Time Series Classification
Published	2017-03-28
URL	http://arxiv.org/abs/1703.09480v1
PDF	http://arxiv.org/pdf/1703.09480v1.pdf
PWC	https://paperswithcode.com/paper/simulated-data-experiments-for-time-series
Repo
Framework

“i have a feeling trump will win………………”: Forecasting Winners and Losers from User Predictions on Twitter


Title	“i have a feeling trump will win………………”: Forecasting Winners and Losers from User Predictions on Twitter
Authors	Sandesh Swamy, Alan Ritter, Marie-Catherine de Marneffe
Abstract	Social media users often make explicit predictions about upcoming events. Such statements vary in the degree of certainty the author expresses toward the outcome: “Leonardo DiCaprio will win Best Actor” vs. “Leonardo DiCaprio may win” or “No way Leonardo wins!”. Can popular beliefs on social media predict who will win? To answer this question, we build a corpus of tweets annotated for veridicality on which we train a log-linear classifier that detects positive veridicality with high precision. We then forecast uncertain outcomes using the wisdom of crowds, by aggregating users’ explicit predictions. Our method for forecasting winners is fully automated, relying only on a set of contenders as input. It requires no training data of past outcomes and outperforms sentiment and tweet volume baselines on a broad range of contest prediction tasks. We further demonstrate how our approach can be used to measure the reliability of individual accounts’ predictions and retrospectively identify surprise outcomes.
Tasks
Published	2017-07-22
URL	http://arxiv.org/abs/1707.07212v3
PDF	http://arxiv.org/pdf/1707.07212v3.pdf
PWC	https://paperswithcode.com/paper/i-have-a-feeling-trump-will-win-forecasting
Repo
Framework

Saliency Detection by Forward and Backward Cues in Deep-CNNs


Title	Saliency Detection by Forward and Backward Cues in Deep-CNNs
Authors	Nevrez Imamoglu, Chi Zhang, Wataru Shimoda, Yuming Fang, Boxin Shi
Abstract	As prior knowledge of objects or object features helps us make relations for similar objects on attentional tasks, pre-trained deep convolutional neural networks (CNNs) can be used to detect salient objects on images regardless of whether the object class is in the network knowledge or not. In this paper, we propose a top-down saliency model using CNN, a weakly supervised CNN model trained for a 1000-object labelling task from RGB images. The model detects attentive regions based on their objectness scores predicted by selected features from CNNs. To estimate the salient objects effectively, we combine both forward and backward features, while demonstrating that partially-guided backpropagation will provide sufficient information for selecting the features from the forward run of the CNN model. Finally, these top-down cues are enhanced with a state-of-the-art bottom-up model to complement the overall saliency. As the proposed model is an effective integration of forward and backward cues through objectness without any supervision or regression to ground truth data, it gives promising results compared to state-of-the-art models on two different datasets.
Tasks	Saliency Detection
Published	2017-03-01
URL	http://arxiv.org/abs/1703.00152v2
PDF	http://arxiv.org/pdf/1703.00152v2.pdf
PWC	https://paperswithcode.com/paper/saliency-detection-by-forward-and-backward
Repo
Framework

Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking


Title	Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking
Authors	Heng Fan, Haibin Ling
Abstract	Being intensively studied, visual tracking has seen great recent advances in either speed (e.g., with correlation filters) or accuracy (e.g., with deep features). Real-time and high accuracy tracking algorithms, however, remain scarce. In this paper we study the problem from a new perspective and present a novel parallel tracking and verifying (PTAV) framework, by taking advantage of the ubiquity of multi-thread techniques and borrowing from the success of parallel tracking and mapping in visual SLAM. Our PTAV framework typically consists of two components, a tracker T and a verifier V, working in parallel on two separate threads. The tracker T aims to provide a super real-time tracking inference and is expected to perform well most of the time; by contrast, the verifier V checks the tracking results and corrects T when needed. The key innovation is that V does not work on every frame but only upon the requests from T; on the other end, T may adjust the tracking according to the feedback from V. With such collaboration, PTAV enjoys both the high efficiency provided by T and the strong discriminative power of V. In our extensive experiments on popular benchmarks including OTB2013, OTB2015, TC128 and UAV20L, PTAV achieves the best tracking accuracy among all real-time trackers, and in fact performs even better than many deep learning based solutions. Moreover, as a general framework, PTAV is very flexible and has great room for improvement and generalization.
Tasks	Visual Tracking
Published	2017-08-01
URL	http://arxiv.org/abs/1708.00153v1
PDF	http://arxiv.org/pdf/1708.00153v1.pdf
PWC	https://paperswithcode.com/paper/parallel-tracking-and-verifying-a-framework
Repo
Framework

WARP: Wavelets with adaptive recursive partitioning for multi-dimensional data


Title	WARP: Wavelets with adaptive recursive partitioning for multi-dimensional data
Authors	Meng Li, Li Ma
Abstract	Effective identification of asymmetric and local features in images and other data observed on multi-dimensional grids plays a critical role in a wide range of applications including biomedical and natural image processing. Moreover, the ever increasing amount of image data, in terms of both the resolution per image and the number of images processed per application, requires algorithms and methods for such applications to be computationally efficient. We develop a new probabilistic framework for multi-dimensional data to overcome these challenges through incorporating data adaptivity into discrete wavelet transforms, thereby allowing them to adapt to the geometric structure of the data while maintaining the linear computational scalability. By exploiting a connection between the local directionality of wavelet transforms and recursive dyadic partitioning on the grid points of the observation, we obtain the desired adaptivity through adding to the traditional Bayesian wavelet regression framework an additional layer of Bayesian modeling on the space of recursive partitions over the grid points. We derive the corresponding inference recipe in the form of a recursive representation of the exact posterior, and develop a class of efficient recursive message passing algorithms for achieving exact Bayesian inference with a computational complexity linear in the resolution and sample size of the images. While our framework is applicable to a range of problems including multi-dimensional signal processing, compression, and structural learning, we illustrate its work and evaluate its performance in the context of 2D and 3D image reconstruction using real images from the ImageNet database. We also apply the framework to analyze a data set from retinal optical coherence tomography.
Tasks	Bayesian Inference, Image Reconstruction
Published	2017-11-02
URL	http://arxiv.org/abs/1711.00789v4
PDF	http://arxiv.org/pdf/1711.00789v4.pdf
PWC	https://paperswithcode.com/paper/warp-wavelets-with-adaptive-recursive
Repo
Framework