May 7, 2019

2962 words 14 mins read

Paper Group AWR 58

Fast $ε$-free Inference of Simulation Models with Bayesian Conditional Density Estimation. Commonly Uncommon: Semantic Sparsity in Situation Recognition. Does Multimodality Help Human and Machine for Translation and Image Captioning? Exploring Structure for Long-Term Tracking of Multiple Objects in Sports Videos. Joint Detection and Identification …

Fast $ε$-free Inference of Simulation Models with Bayesian Conditional Density Estimation

Title Fast $ε$-free Inference of Simulation Models with Bayesian Conditional Density Estimation
Authors George Papamakarios, Iain Murray
Abstract Many statistical models can be simulated forwards but have intractable likelihoods. Approximate Bayesian Computation (ABC) methods are used to infer properties of these models from data. Traditionally these methods approximate the posterior over parameters by conditioning on data being inside an $\epsilon$-ball around the observed data, which is only correct in the limit $\epsilon \rightarrow 0$. Monte Carlo methods can then draw samples from the approximate posterior to approximate predictions or error bars on parameters. These algorithms critically slow down as $\epsilon \rightarrow 0$, and in practice draw samples from a broader distribution than the posterior. We propose a new approach to likelihood-free inference based on Bayesian conditional density estimation. Preliminary inferences based on limited simulation data are used to guide later simulations. In some cases, learning an accurate parametric representation of the entire true posterior distribution requires fewer model simulations than Monte Carlo ABC methods need to produce a single sample from an approximate posterior.
Tasks Density Estimation
Published 2016-05-20
URL http://arxiv.org/abs/1605.06376v4
PDF http://arxiv.org/pdf/1605.06376v4.pdf
PWC https://paperswithcode.com/paper/fast-free-inference-of-simulation-models-with
Repo https://github.com/gpapamak/epsilon_free_inference
Framework none
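
As a point of reference for the $\epsilon$-ball baseline the abstract contrasts against, here is a minimal sketch of classic rejection ABC on a toy Gaussian model. All names and the toy model are illustrative, not taken from the paper or its repo:

```python
import numpy as np

def rejection_abc(observed, simulate, prior_sample, n_samples, eps):
    """Classic rejection ABC: keep parameter draws whose simulated data
    lands inside an eps-ball around the observed data."""
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sample()               # draw parameters from the prior
        x = simulate(theta)                  # forward-simulate the model
        if np.linalg.norm(x - observed) < eps:
            accepted.append(theta)           # inside the eps-ball: accept
    return np.array(accepted)

# Toy example: infer the mean of a Gaussian with known variance.
rng = np.random.default_rng(0)
observed = np.array([1.3])
posterior_samples = rejection_abc(
    observed,
    simulate=lambda t: rng.normal(t, 1.0, size=1),
    prior_sample=lambda: rng.normal(0.0, 5.0),
    n_samples=200,
    eps=0.3,
)
print(posterior_samples.mean())
```

The while-loop makes the trade-off explicit: shrinking eps toward zero tightens the approximation but collapses the acceptance rate, which is exactly the cost the paper's parametric conditional density estimation sidesteps.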

Commonly Uncommon: Semantic Sparsity in Situation Recognition

Title Commonly Uncommon: Semantic Sparsity in Situation Recognition
Authors Mark Yatskar, Vicente Ordonez, Luke Zettlemoyer, Ali Farhadi
Abstract Semantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the training set. This paper studies semantic sparsity in situation recognition, the task of producing structured summaries of what is happening in images, including activities, objects and the roles objects play within the activity. For this problem, we find empirically that most object-role combinations are rare, and current state-of-the-art models significantly underperform in this sparse data regime. We avoid many such errors by (1) introducing a novel tensor composition function that learns to share examples across role-noun combinations and (2) semantically augmenting our training data with automatically gathered examples of rarely observed outputs using web data. When integrated within a complete CRF-based structured prediction model, the tensor-based approach outperforms existing state of the art by a relative improvement of 2.11% and 4.40% on top-5 verb and noun-role accuracy, respectively. Adding 5 million images with our semantic augmentation techniques gives further relative improvements of 6.23% and 9.57% on top-5 verb and noun-role accuracy.
Tasks Structured Prediction
Published 2016-12-03
URL http://arxiv.org/abs/1612.00901v1
PDF http://arxiv.org/pdf/1612.00901v1.pdf
PWC https://paperswithcode.com/paper/commonly-uncommon-semantic-sparsity-in
Repo https://github.com/my89/imSitu
Framework pytorch
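
The abstract's key modeling idea is a composition function that shares parameters across role-noun combinations rather than learning one weight per pair. Below is a hypothetical low-rank sketch of that idea in PyTorch; it is not the paper's exact tensor composition, and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class LowRankComposition(nn.Module):
    """Hypothetical factorized scoring of (role, noun) pairs that shares
    parameters across combinations instead of learning one weight per pair."""
    def __init__(self, n_roles, n_nouns, img_dim, rank=32):
        super().__init__()
        self.role_emb = nn.Embedding(n_roles, rank)
        self.noun_emb = nn.Embedding(n_nouns, rank)
        self.proj = nn.Linear(img_dim, rank)  # maps image features to the shared space

    def forward(self, image_feat, role_ids, noun_ids):
        # Elementwise product composes role and noun factors; rare pairs still
        # get sensible scores because each factor is trained on every
        # combination it appears in.
        pair = self.role_emb(role_ids) * self.noun_emb(noun_ids)
        return (self.proj(image_feat) * pair).sum(-1)  # one scalar score per pair
```

The point of the factorization is that a rarely seen role-noun pair borrows statistical strength from every example that shares its role or its noun.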

Does Multimodality Help Human and Machine for Translation and Image Captioning?

Title Does Multimodality Help Human and Machine for Translation and Image Captioning?
Authors Ozan Caglayan, Walid Aransa, Yaxing Wang, Marc Masana, Mercedes García-Martínez, Fethi Bougares, Loïc Barrault, Joost van de Weijer
Abstract This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural networks models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.
Tasks Image Captioning, Machine Translation, Multimodal Machine Translation
Published 2016-05-30
URL http://arxiv.org/abs/1605.09186v4
PDF http://arxiv.org/pdf/1605.09186v4.pdf
PWC https://paperswithcode.com/paper/does-multimodality-help-human-and-machine-for
Repo https://github.com/lium-lst/nmtpy
Framework none

Exploring Structure for Long-Term Tracking of Multiple Objects in Sports Videos

Title Exploring Structure for Long-Term Tracking of Multiple Objects in Sports Videos
Authors Henrique Morimitsu, Isabelle Bloch, Roberto M. Cesar-Jr
Abstract In this paper, we propose a novel approach for exploiting structural relations to track multiple objects that may undergo long-term occlusion and abrupt motion. We use a model-free approach that relies only on annotations given in the first frame of the video to track all the objects online, i.e. without knowledge from future frames. We initialize a probabilistic Attributed Relational Graph (ARG) from the first frame, which is incrementally updated along the video. Instead of using the structural information only to evaluate the scene, the proposed approach considers it to generate new tracking hypotheses. In this way, our method is capable of generating relevant object candidates that are used to improve or recover the track of lost objects. The proposed method is evaluated on several videos of table tennis, volleyball, and on the ACASVA dataset. The results show that our approach is very robust, flexible and able to outperform other state-of-the-art methods in sports videos that present structural patterns.
Tasks
Published 2016-12-19
URL http://arxiv.org/abs/1612.06454v1
PDF http://arxiv.org/pdf/1612.06454v1.pdf
PWC https://paperswithcode.com/paper/exploring-structure-for-long-term-tracking-of
Repo https://github.com/henriquem87/structured-graph-tracker
Framework none

Joint Detection and Identification Feature Learning for Person Search

Title Joint Detection and Identification Feature Learning for Person Search
Authors Tong Xiao, Shuang Li, Bochao Wang, Liang Lin, Xiaogang Wang
Abstract Existing person re-identification benchmarks and methods mainly focus on matching cropped pedestrian images between queries and candidates. However, this differs from real-world scenarios where the annotations of pedestrian bounding boxes are unavailable and the target person needs to be searched from a gallery of whole scene images. To close the gap, we propose a new deep learning framework for person search. Instead of breaking it down into two separate tasks, pedestrian detection and person re-identification, we jointly handle both aspects in a single convolutional neural network. An Online Instance Matching (OIM) loss function is proposed to train the network effectively, which is scalable to datasets with numerous identities. To validate our approach, we collect and annotate a large-scale benchmark dataset for person search. It contains 18,184 images, 8,432 identities, and 96,143 pedestrian bounding boxes. Experiments show that our framework outperforms other separate approaches, and the proposed OIM loss function converges much faster and better than the conventional Softmax loss.
Tasks Pedestrian Detection, Person Re-Identification, Person Search
Published 2016-04-07
URL http://arxiv.org/abs/1604.01850v3
PDF http://arxiv.org/pdf/1604.01850v3.pdf
PWC https://paperswithcode.com/paper/joint-detection-and-identification-feature
Repo https://github.com/ShuangLI59/person_search
Framework none
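
A hedged sketch of what an OIM-style loss looks like: labeled identity features live in a lookup table that is updated with momentum rather than learned, so the softmax stays scalable as the number of identities grows. The real OIM also maintains a circular queue for unlabeled identities, omitted here; all parameter values are illustrative:

```python
import torch
import torch.nn.functional as F

class OIMLookup:
    """Sketch of an Online Instance Matching (OIM) style loss. The lookup
    table (LUT) stores one normalized feature per identity and is updated
    with momentum instead of backpropagation."""
    def __init__(self, num_ids, dim, momentum=0.5, temperature=0.1):
        self.lut = torch.zeros(num_ids, dim)
        self.momentum = momentum
        self.temperature = temperature

    def loss(self, feats, labels):
        feats = F.normalize(feats, dim=1)
        # Cosine similarity of each detection feature to every stored identity.
        logits = feats @ self.lut.t() / self.temperature
        loss = F.cross_entropy(logits, labels)
        with torch.no_grad():  # momentum update of the stored features
            for f, y in zip(feats, labels):
                self.lut[y] = F.normalize(
                    self.momentum * self.lut[y] + (1 - self.momentum) * f, dim=0)
        return loss
```

Because the LUT is non-parametric, gradients flow only through the detection features, which is what keeps the objective cheap with tens of thousands of identities.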

Detecting Text in Natural Image with Connectionist Text Proposal Network

Title Detecting Text in Natural Image with Connectionist Text Proposal Network
Authors Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao
Abstract We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural images. The CTPN detects a text line in a sequence of fine-scale text proposals directly in convolutional feature maps. We develop a vertical anchor mechanism that jointly predicts the location and text/non-text score of each fixed-width proposal, considerably improving localization accuracy. The sequential proposals are naturally connected by a recurrent neural network, which is seamlessly incorporated into the convolutional network, resulting in an end-to-end trainable model. This allows the CTPN to explore rich context information of the image, making it powerful for detecting extremely ambiguous text. The CTPN works reliably on multi-scale and multi-language text without further post-processing, departing from previous bottom-up methods requiring multi-step post-processing. It achieves 0.88 and 0.61 F-measure on the ICDAR 2013 and 2015 benchmarks, surpassing recent results [8, 35] by a large margin. The CTPN is computationally efficient at 0.14s/image using the very deep VGG16 model [27]. Online demo is available at: http://textdet.com/.
Tasks Scene Text Detection
Published 2016-09-12
URL http://arxiv.org/abs/1609.03605v1
PDF http://arxiv.org/pdf/1609.03605v1.pdf
PWC https://paperswithcode.com/paper/detecting-text-in-natural-image-with
Repo https://github.com/eragonruan/text-detection-ctpn
Framework tf
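
The vertical anchor mechanism boils down to tiling every feature-map position with fixed-width, variable-height anchors so the network only has to regress vertical position and height. A minimal sketch of the anchor generation, with heights taken as illustrative values in the spirit of common CTPN implementations:

```python
import numpy as np

def vertical_anchors(feat_h, feat_w, stride=16,
                     heights=(11, 16, 23, 33, 48, 68, 97, 139, 198, 283)):
    """Fixed-width anchors in the CTPN style: every feature-map cell gets
    k anchors of width `stride` and varying heights (values illustrative)."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx = x * stride + stride / 2  # anchor center in image pixels
            cy = y * stride + stride / 2
            for h in heights:
                anchors.append([cx - stride / 2, cy - h / 2,
                                cx + stride / 2, cy + h / 2])
    return np.array(anchors)  # (feat_h * feat_w * k, 4) boxes as x1,y1,x2,y2
```

Fixing the width to the feature-map stride is what lets the recurrent layer stitch adjacent proposals into a full text line afterwards.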

Generalized Random Forests

Title Generalized Random Forests
Authors Susan Athey, Julie Tibshirani, Stefan Wager
Abstract We propose generalized random forests, a method for non-parametric statistical estimation based on random forests (Breiman, 2001) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian, and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: non-parametric quantile regression, conditional average partial effect estimation, and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.
Tasks
Published 2016-10-05
URL http://arxiv.org/abs/1610.01271v4
PDF http://arxiv.org/pdf/1610.01271v4.pdf
PWC https://paperswithcode.com/paper/generalized-random-forests
Repo https://github.com/rajkumarkarthik/mgrf-develop
Framework none
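
The core mechanism is the forest-induced weighting: a training point's weight is the fraction of trees in which it shares a leaf with the query point, and the target quantity then solves a locally weighted moment equation. A minimal sketch for quantile regression, assuming leaf assignments come from something like scikit-learn's `forest.apply`; the gradient-based splitting rule that targets heterogeneity in the quantity of interest is omitted:

```python
import numpy as np

def forest_weights(query_leaves, train_leaves):
    """Forest-induced weights: training point i is weighted by how often it
    shares a leaf with the query point, averaged over trees.
    train_leaves has shape (n_trees, n_train), e.g. forest.apply(X_train).T;
    query_leaves has shape (n_trees,)."""
    n_trees, n_train = train_leaves.shape
    alpha = np.zeros(n_train)
    for b in range(n_trees):
        in_leaf = train_leaves[b] == query_leaves[b]  # neighbours in tree b
        alpha[in_leaf] += 1.0 / in_leaf.sum()
    return alpha / n_trees

def weighted_quantile(y, alpha, q=0.5):
    """Solve the local moment equation for the q-th quantile: the point
    where the weighted empirical CDF crosses q."""
    order = np.argsort(y)
    cdf = np.cumsum(alpha[order]) / alpha.sum()
    return y[order][np.searchsorted(cdf, q)]
```

Swapping the quantile moment for an average-partial-effect or instrumental-variables moment gives the other two applications the abstract lists, with the same weights.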

Deep Successor Reinforcement Learning

Title Deep Successor Reinforcement Learning
Authors Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, Samuel J. Gershman
Abstract Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components – a reward predictor and a successor map. The successor map represents the expected future state occupancy from any given state and the reward predictor maps states to scalar rewards. The value function of a state can be computed as the inner product between the successor map and the reward weights. In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. DSR has several appealing properties including: increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals) given successor maps trained under a random policy. We show the efficacy of our approach on two diverse environments given raw pixel observations – simple grid-world domains (MazeBase) and the Doom game engine.
Tasks FPS Games, Game of Doom
Published 2016-06-08
URL http://arxiv.org/abs/1606.02396v1
PDF http://arxiv.org/pdf/1606.02396v1.pdf
PWC https://paperswithcode.com/paper/deep-successor-reinforcement-learning
Repo https://github.com/Ardavans/DSR
Framework none
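
The SR decomposition is easy to see in the tabular case: under a fixed policy the successor map has the closed form $\psi = (I - \gamma P)^{-1}$, and the value function is its inner product with the reward weights. A small illustrative example (not the paper's deep variant):

```python
import numpy as np

n_states, gamma = 5, 0.9
P = np.full((n_states, n_states), 1.0 / n_states)  # transitions under a fixed policy
w = np.array([0.0, 0.0, 0.0, 0.0, 1.0])            # reward weights (reward per state)

# psi[s, s'] = expected discounted occupancy of s' starting from s;
# it solves the SR Bellman equation in closed form.
psi = np.linalg.inv(np.eye(n_states) - gamma * P)
V = psi @ w                                         # value function as an inner product
print(V)

# A distal reward change only requires re-fitting w, not the successor map:
w2 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
print(psi @ w2)
```

This factorization is the source of the sensitivity property the abstract mentions: the world-dynamics half (psi) is untouched when rewards move.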

Single Pass PCA of Matrix Products

Title Single Pass PCA of Matrix Products
Authors Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alexandros G. Dimakis
Abstract In this paper we present a new algorithm for computing a low rank approximation of the product $A^TB$ by taking only a single pass of the two matrices $A$ and $B$. The straightforward way to do this is to (a) first sketch $A$ and $B$ individually, and then (b) find the top components using PCA on the sketch. Our algorithm in contrast retains additional summary information about $A,B$ (e.g. row and column norms etc.) and uses this additional information to obtain an improved approximation from the sketches. Our main analytical result establishes a comparable spectral norm guarantee to existing two-pass methods; in addition we also provide results from an Apache Spark implementation that shows better computational and statistical performance on real-world and synthetic evaluation datasets.
Tasks
Published 2016-10-21
URL http://arxiv.org/abs/1610.06656v2
PDF http://arxiv.org/pdf/1610.06656v2.pdf
PWC https://paperswithcode.com/paper/single-pass-pca-of-matrix-products
Repo https://github.com/wushanshan/MatrixProductPCA
Framework none
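
For context, the "straightforward way" the abstract mentions, sketching each matrix with a shared random projection in one pass and then running PCA on the small sketched product, looks like the snippet below; the paper's contribution is to improve on exactly this baseline by retaining extra summary statistics. Dimensions and sketch size are illustrative:

```python
import numpy as np

def one_pass_lowrank_ATB(A, B, k, r, seed=0):
    """Baseline one-pass rank-r estimate of A^T B: compress both matrices
    with a shared random projection, then take the top-r SVD of the small
    sketched product. In a streaming setting Pi @ A and Pi @ B would be
    accumulated row by row in a single pass over A and B."""
    rng = np.random.default_rng(seed)
    Pi = rng.standard_normal((k, A.shape[0])) / np.sqrt(k)  # E[Pi^T Pi] = I
    S = (Pi @ A).T @ (Pi @ B)          # unbiased estimate of A^T B
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]  # rank-r approximation

rng = np.random.default_rng(1)
A, B = rng.standard_normal((10000, 50)), rng.standard_normal((10000, 40))
approx = one_pass_lowrank_ATB(A, B, k=2000, r=5)
exact = A.T @ B
print(np.linalg.norm(approx - exact, 2) / np.linalg.norm(exact, 2))
```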

f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

Title f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization
Authors Sebastian Nowozin, Botond Cseke, Ryota Tomioka
Abstract Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generative-adversarial training method makes it possible to train such models through the use of an auxiliary discriminative neural network. We show that the generative-adversarial approach is a special case of a more general variational divergence estimation approach. We show that any f-divergence can be used for training generative neural samplers. We discuss the benefits of various choices of divergence functions on training complexity and the quality of the obtained generative models.
Tasks
Published 2016-06-02
URL http://arxiv.org/abs/1606.00709v1
PDF http://arxiv.org/pdf/1606.00709v1.pdf
PWC https://paperswithcode.com/paper/f-gan-training-generative-neural-samplers
Repo https://github.com/mboudiaf/Mutual-Information-Variational-Bounds
Framework tf
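
The variational objective is $F(\theta,\omega) = \mathbb{E}_P[g_f(V(x))] - \mathbb{E}_Q[f^*(g_f(V(x)))]$: the critic maximizes this lower bound on $D_f(P\|Q)$ and the generator minimizes it. A hedged sketch for two divergences; the activation $g_f$ only needs to map the raw critic output into the domain of the conjugate $f^*$, and the choices below are one valid parameterization:

```python
import torch

def fgan_losses(v_real, v_fake, divergence="kl"):
    """Variational f-divergence losses from raw critic outputs v."""
    if divergence == "kl":                   # f(u) = u log u
        gf = lambda v: v
        fstar = lambda t: torch.exp(t - 1.0)
    elif divergence == "reverse_kl":         # f(u) = -log u
        gf = lambda v: -torch.exp(-v)        # maps into dom(f*) = (-inf, 0)
        fstar = lambda t: -1.0 - torch.log(-t)
    else:
        raise ValueError(divergence)
    t_real, t_fake = gf(v_real), gf(v_fake)
    d_loss = -(t_real.mean() - fstar(t_fake).mean())  # critic: maximize the bound
    g_loss = -fstar(t_fake).mean()                    # generator: minimize it
    return d_loss, g_loss
```

Plugging in the conjugate of $f(u) = u\log u - (u+1)\log(u+1)$ instead recovers the original GAN objective as the special case the abstract refers to.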

Learning Video Object Segmentation from Static Images

Title Learning Video Object Segmentation from Static Images
Authors Anna Khoreva, Federico Perazzi, Rodrigo Benenson, Bernt Schiele, Alexander Sorkine-Hornung
Abstract Inspired by recent advances of deep learning in instance segmentation and object tracking, we formulate the video object segmentation problem as guided instance segmentation. Our model proceeds on a per-frame basis, guided by the output of the previous frame towards the object of interest in the next frame. We demonstrate that highly accurate object segmentation in videos can be enabled by using a convnet trained with static images only. The key ingredient of our approach is a combination of offline and online learning strategies, where the former serves to produce a refined mask from the previous frame estimate and the latter allows the model to capture the appearance of the specific object instance. Our method can handle different types of input annotations, bounding boxes and segments, and can incorporate multiple annotated frames, making the system suitable for diverse applications. We obtain competitive results on three different datasets, independently of the type of input annotation.
Tasks Instance Segmentation, Object Tracking, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Visual Object Tracking
Published 2016-12-08
URL http://arxiv.org/abs/1612.02646v1
PDF http://arxiv.org/pdf/1612.02646v1.pdf
PWC https://paperswithcode.com/paper/learning-video-object-segmentation-from
Repo https://github.com/birdman9390/MetaMaskTrack
Framework pytorch

SDP Relaxation with Randomized Rounding for Energy Disaggregation

Title SDP Relaxation with Randomized Rounding for Energy Disaggregation
Authors Kiarash Shaloudegi, András György, Csaba Szepesvári, Wilsun Xu
Abstract We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring. In this problem the goal is to estimate the energy consumption of each appliance over time based on the total energy-consumption signal of a household. The current state of the art is to model the problem as inference in factorial HMMs, and use quadratic programming to find an approximate solution to the resulting quadratic integer program. Here we take a more principled approach, better suited to integer programming problems, and find an approximate optimum by combining convex semidefinite relaxations and randomized rounding, as well as a scalable ADMM method that exploits the special structure of the resulting semidefinite program. Simulation results on both synthetic and real-world datasets demonstrate the superiority of our method.
Tasks
Published 2016-10-29
URL http://arxiv.org/abs/1610.09491v1
PDF http://arxiv.org/pdf/1610.09491v1.pdf
PWC https://paperswithcode.com/paper/sdp-relaxation-with-randomized-rounding-for
Repo https://github.com/kiarashshaloudegi/FHMM_inference
Framework none
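
The paper's rounding step and ADMM solver are specialized to the factorial-HMM structure; as a generic illustration of SDP relaxation plus randomized rounding, here is the standard hyperplane-rounding recipe for a binary quadratic program (all names illustrative):

```python
import numpy as np

def randomized_rounding(X, Q, n_trials=100, seed=0):
    """Round the PSD solution X of an SDP relaxation to binary variables:
    factor X = V V^T, cut with random hyperplanes, keep the best candidate
    under the quadratic objective Q."""
    rng = np.random.default_rng(seed)
    w, U = np.linalg.eigh(X)
    V = U * np.sqrt(np.clip(w, 0.0, None))   # rows of V satisfy V @ V.T = X
    best_z, best_val = None, -np.inf
    for _ in range(n_trials):
        z = np.sign(V @ rng.standard_normal(X.shape[0]))  # hyperplane cut
        z[z == 0] = 1.0                       # break ties deterministically
        val = z @ Q @ z                       # score the rounded candidate
        if val > best_val:
            best_z, best_val = z, val
    return best_z
```

The SDP solution X would come from a convex solver; the rounding loop is embarrassingly parallel, which is part of what makes the overall scheme scalable.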

House price estimation from visual and textual features

Title House price estimation from visual and textual features
Authors Eman Ahmed, Mohamed Moustafa
Abstract Most existing automatic house price estimation systems rely only on textual data, such as the neighborhood and the number of rooms. The final price is estimated by a human agent who visits the house and assesses it visually. In this paper, we propose extracting visual features from house photographs and combining them with the house's textual information. The combined features are fed to a fully connected multilayer Neural Network (NN) that estimates the house price as its single output. To train and evaluate our network, we have collected the first houses dataset (to our knowledge) that combines both images and textual attributes. The dataset is composed of 535 sample houses from the state of California, USA. Our experiments showed that adding the visual features increased the R-value by a factor of 3 and decreased the Mean Square Error (MSE) by one order of magnitude compared with textual-only features. Additionally, when trained on the benchmark textual-only features housing dataset, our proposed NN still outperformed the existing model published results.
Tasks
Published 2016-09-27
URL http://arxiv.org/abs/1609.08399v1
PDF http://arxiv.org/pdf/1609.08399v1.pdf
PWC https://paperswithcode.com/paper/house-price-estimation-from-visual-and
Repo https://github.com/SPra03/Housedataset
Framework none
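
The described architecture is essentially feature concatenation followed by a fully connected regressor. A minimal PyTorch sketch under assumed dimensions, e.g. a 512-d CNN feature vector and four textual attributes; neither number is from the paper:

```python
import torch
import torch.nn as nn

class PriceNet(nn.Module):
    """Concatenate visual and textual features, regress a single price.
    Dimensions are illustrative assumptions, not the paper's."""
    def __init__(self, img_dim=512, txt_dim=4, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # single output: the estimated price
        )

    def forward(self, img_feat, txt_feat):
        return self.mlp(torch.cat([img_feat, txt_feat], dim=1))
```

In practice `img_feat` would come from a pretrained image encoder and `txt_feat` would be the normalized tabular attributes.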

Progressive Neural Networks

Title Progressive Neural Networks
Authors Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia Hadsell
Abstract Learning to solve complex sequences of tasks, while both leveraging transfer and avoiding catastrophic forgetting, remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: progressive networks are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architecture extensively on a wide variety of reinforcement learning tasks (Atari and 3D maze games), and show that it outperforms common baselines based on pretraining and finetuning. Using a novel sensitivity measure, we demonstrate that transfer occurs at both low-level sensory and high-level control layers of the learned policy.
Tasks
Published 2016-06-15
URL http://arxiv.org/abs/1606.04671v3
PDF http://arxiv.org/pdf/1606.04671v3.pdf
PWC https://paperswithcode.com/paper/progressive-neural-networks
Repo https://github.com/GuangpingYuan/PNN_Pong_A3C
Framework tf
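
A hedged sketch of the lateral-connection pattern: each new column's layer sums its own pre-activation with adapted activations from the same depth of every frozen earlier column. The plain linear adapters and MLP layers are illustrative simplifications of the paper's architecture:

```python
import torch
import torch.nn as nn

class ProgressiveColumn(nn.Module):
    """One column of a progressive network: earlier columns are frozen, and
    each new layer also receives lateral inputs from every previous column."""
    def __init__(self, sizes, n_prev):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1))
        # One lateral adapter per (layer > 0, previous column) pair.
        self.laterals = nn.ModuleList(
            nn.ModuleList(nn.Linear(sizes[i], sizes[i + 1]) for _ in range(n_prev))
            for i in range(1, len(sizes) - 1))

    def forward(self, x, prev_acts):
        # prev_acts[j][i] holds the layer-i activations of frozen column j.
        h, acts = x, []
        for i, layer in enumerate(self.layers):
            z = layer(h)
            if i > 0:  # lateral inputs from the same depth of each old column
                for j, lat in enumerate(self.laterals[i - 1]):
                    z = z + lat(prev_acts[j][i - 1])
            h = torch.relu(z)
            acts.append(h)
        return h, acts
```

Freezing old columns is what makes the architecture immune to forgetting; the laterals are the only path through which prior knowledge transfers.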

Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation

Title Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation
Authors Yiyi Liao, Lichao Huang, Yue Wang, Sarath Kodagoda, Yinan Yu, Yong Liu
Abstract Many standard robotic platforms are equipped with at least a fixed 2D laser range finder and a monocular camera. Although those platforms do not have sensors for 3D depth sensing, knowledge of depth is essential in many robotics tasks. Therefore, there is an increasing interest in depth estimation using monocular images. As this task is inherently ambiguous, the data-driven estimated depth might be unreliable in robotics applications. In this paper, we attempt to improve the precision of monocular depth estimation by introducing 2D planar observations from the existing laser range finder without extra cost. Specifically, we construct a dense reference map from the sparse laser range data, redefining the depth estimation task as estimating the distance between the real and the reference depth. To solve the problem, we construct a novel residual-of-residual neural network, and tightly combine the classification and regression losses for continuous depth estimation. Experimental results suggest that our method achieves considerable improvement over state-of-the-art methods on both NYUD2 and KITTI, validating the effectiveness of leveraging the additional sensory information. We further demonstrate the potential of our method in obstacle avoidance, where it provides more comprehensive depth information than solutions using a monocular camera or a 2D laser range finder alone.
Tasks Depth Completion, Depth Estimation
Published 2016-10-17
URL http://arxiv.org/abs/1611.02174v1
PDF http://arxiv.org/pdf/1611.02174v1.pdf
PWC https://paperswithcode.com/paper/parse-geometry-from-a-line-monocular-depth
Repo https://github.com/fangchangma/sparse-to-dense.pytorch
Framework pytorch
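
One plausible reading of the reference-map construction, offered only as an assumption-laden sketch: replicate the single laser scan line down every image column to get a dense map, so the network regresses the residual between the true depth and this reference rather than absolute depth:

```python
import numpy as np

def reference_map(laser_row, height):
    """Assumed construction: tile the per-column laser depths down every
    image row to form a dense (height, width) reference map."""
    return np.tile(laser_row, (height, 1))

laser_row = np.array([2.0, 2.1, 3.5, 3.4])  # depth per image column from the scanner
ref = reference_map(laser_row, height=3)
residual_target = lambda depth: depth - ref  # what the network would regress
```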