Paper Group AWR 58
Fast $ε$-free Inference of Simulation Models with Bayesian Conditional Density Estimation. Commonly Uncommon: Semantic Sparsity in Situation Recognition. Does Multimodality Help Human and Machine for Translation and Image Captioning? Exploring Structure for Long-Term Tracking of Multiple Objects in Sports Videos. Joint Detection and Identification Feature Learning for Person Search. Detecting Text in Natural Image with Connectionist Text Proposal Network. Generalized Random Forests. Deep Successor Reinforcement Learning. Single Pass PCA of Matrix Products. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization. Learning Video Object Segmentation from Static Images. SDP Relaxation with Randomized Rounding for Energy Disaggregation. House price estimation from visual and textual features. Progressive Neural Networks. Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation.
Fast $ε$-free Inference of Simulation Models with Bayesian Conditional Density Estimation
Title | Fast $ε$-free Inference of Simulation Models with Bayesian Conditional Density Estimation |
Authors | George Papamakarios, Iain Murray |
Abstract | Many statistical models can be simulated forwards but have intractable likelihoods. Approximate Bayesian Computation (ABC) methods are used to infer properties of these models from data. Traditionally these methods approximate the posterior over parameters by conditioning on data being inside an $\epsilon$-ball around the observed data, which is only correct in the limit $\epsilon \rightarrow 0$. Monte Carlo methods can then draw samples from the approximate posterior to approximate predictions or error bars on parameters. These algorithms critically slow down as $\epsilon \rightarrow 0$, and in practice draw samples from a broader distribution than the posterior. We propose a new approach to likelihood-free inference based on Bayesian conditional density estimation. Preliminary inferences based on limited simulation data are used to guide later simulations. In some cases, learning an accurate parametric representation of the entire true posterior distribution requires fewer model simulations than Monte Carlo ABC methods need to produce a single sample from an approximate posterior. |
Tasks | Density Estimation |
Published | 2016-05-20 |
URL | http://arxiv.org/abs/1605.06376v4 |
http://arxiv.org/pdf/1605.06376v4.pdf | |
PWC | https://paperswithcode.com/paper/fast-free-inference-of-simulation-models-with |
Repo | https://github.com/gpapamak/epsilon_free_inference |
Framework | none |
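To make the epsilon-ball conditioning concrete, here is a minimal rejection-ABC sketch (the baseline the paper improves on, not the paper's method); the toy Gaussian simulator, prior, and summary statistic are illustrative assumptions:

```python
# Minimal rejection-ABC sketch: accept prior draws whose simulated summary
# lands inside an epsilon-ball around the observed summary. As eps -> 0 the
# acceptance rate collapses, which is the cost the paper targets.
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n=50):
    # Toy model: n Gaussian draws with unknown mean theta.
    return rng.normal(theta, 1.0, size=n)

def summary(x):
    return x.mean()  # a single summary statistic

s_obs = summary(simulator(1.5))  # "observed" data summary

def rejection_abc(eps, n_sims=20_000):
    thetas = rng.normal(0.0, 3.0, size=n_sims)      # prior draws
    sims = np.array([summary(simulator(t)) for t in thetas])
    return thetas[np.abs(sims - s_obs) < eps]       # epsilon-ball condition

for eps in (1.0, 0.1, 0.01):
    accepted = rejection_abc(eps)
    print(f"eps={eps}: acceptance rate {len(accepted) / 20_000:.4f}")
```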
Commonly Uncommon: Semantic Sparsity in Situation Recognition
Title | Commonly Uncommon: Semantic Sparsity in Situation Recognition |
Authors | Mark Yatskar, Vicente Ordonez, Luke Zettlemoyer, Ali Farhadi |
Abstract | Semantic sparsity is a common challenge in structured visual classification problems; when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the training set. This paper studies semantic sparsity in situation recognition, the task of producing structured summaries of what is happening in images, including activities, objects and the roles objects play within the activity. For this problem, we find empirically that most object-role combinations are rare, and current state-of-the-art models significantly underperform in this sparse data regime. We avoid many such errors by (1) introducing a novel tensor composition function that learns to share examples across role-noun combinations and (2) semantically augmenting our training data with automatically gathered examples of rarely observed outputs using web data. When integrated within a complete CRF-based structured prediction model, the tensor-based approach outperforms existing state of the art by a relative improvement of 2.11% and 4.40% on top-5 verb and noun-role accuracy, respectively. Adding 5 million images with our semantic augmentation techniques gives further relative improvements of 6.23% and 9.57% on top-5 verb and noun-role accuracy. |
Tasks | Structured Prediction |
Published | 2016-12-03 |
URL | http://arxiv.org/abs/1612.00901v1 |
http://arxiv.org/pdf/1612.00901v1.pdf | |
PWC | https://paperswithcode.com/paper/commonly-uncommon-semantic-sparsity-in |
Repo | https://github.com/my89/imSitu |
Framework | pytorch |
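A hedged sketch of the parameter-sharing idea: a generic low-rank trilinear composition for scoring (verb, role, noun) triples, so rare role-noun combinations borrow strength from shared factors. The dimensions and the exact factorization form below are illustrative assumptions, not the paper's model:

```python
# Low-rank trilinear scoring of (verb, role, noun) triples with shared factors.
import numpy as np

rng = np.random.default_rng(0)
n_verbs, n_roles, n_nouns, k = 500, 190, 2000, 64   # assumed vocabulary sizes

V = rng.normal(size=(n_verbs, k))   # verb factors
R = rng.normal(size=(n_roles, k))   # role factors
N = rng.normal(size=(n_nouns, k))   # noun factors

def score(v, r, n):
    # Trilinear composition: sum_j V[v,j] * R[r,j] * N[n,j]. Because factors
    # are shared, a rarely observed (role, noun) pair still gets a meaningful
    # score from the examples seen with other verbs and nouns.
    return np.sum(V[v] * R[r] * N[n])

print(score(3, 10, 42))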
Does Multimodality Help Human and Machine for Translation and Image Captioning?
Title | Does Multimodality Help Human and Machine for Translation and Image Captioning? |
Authors | Ozan Caglayan, Walid Aransa, Yaxing Wang, Marc Masana, Mercedes García-Martínez, Fethi Bougares, Loïc Barrault, Joost van de Weijer |
Abstract | This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural network models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR. |
Tasks | Image Captioning, Machine Translation, Multimodal Machine Translation |
Published | 2016-05-30 |
URL | http://arxiv.org/abs/1605.09186v4 |
http://arxiv.org/pdf/1605.09186v4.pdf | |
PWC | https://paperswithcode.com/paper/does-multimodality-help-human-and-machine-for |
Repo | https://github.com/lium-lst/nmtpy |
Framework | none |
Exploring Structure for Long-Term Tracking of Multiple Objects in Sports Videos
Title | Exploring Structure for Long-Term Tracking of Multiple Objects in Sports Videos |
Authors | Henrique Morimitsu, Isabelle Bloch, Roberto M. Cesar-Jr |
Abstract | In this paper, we propose a novel approach for exploiting structural relations to track multiple objects that may undergo long-term occlusion and abrupt motion. We use a model-free approach that relies only on annotations given in the first frame of the video to track all the objects online, i.e. without knowledge from future frames. We initialize a probabilistic Attributed Relational Graph (ARG) from the first frame, which is incrementally updated along the video. Instead of using the structural information only to evaluate the scene, the proposed approach considers it to generate new tracking hypotheses. In this way, our method is capable of generating relevant object candidates that are used to improve or recover the track of lost objects. The proposed method is evaluated on several videos of table tennis, volleyball, and on the ACASVA dataset. The results show that our approach is very robust, flexible and able to outperform other state-of-the-art methods in sports videos that present structural patterns. |
Tasks | |
Published | 2016-12-19 |
URL | http://arxiv.org/abs/1612.06454v1 |
http://arxiv.org/pdf/1612.06454v1.pdf | |
PWC | https://paperswithcode.com/paper/exploring-structure-for-long-term-tracking-of |
Repo | https://github.com/henriquem87/structured-graph-tracker |
Framework | none |
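A hedged sketch of how structural relations can generate a tracking hypothesis: propose a lost object's position from a still-tracked neighbor plus the relative displacement stored on their graph edge. The paper's ARG is probabilistic and updated online; this static toy version (with assumed values) only illustrates the idea:

```python
# Propose a candidate position for a lost object B from tracked object A
# and the mean relative vector stored on the A->B graph edge.
import numpy as np

mean_rel_ab = np.array([120.0, -15.0])   # assumed edge attribute (pixels)
pos_a = np.array([340.0, 210.0])         # object A is still tracked

hypothesis_b = pos_a + mean_rel_ab       # candidate location to re-detect B
print(hypothesis_b)                      # [460. 195.]
```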
Joint Detection and Identification Feature Learning for Person Search
Title | Joint Detection and Identification Feature Learning for Person Search |
Authors | Tong Xiao, Shuang Li, Bochao Wang, Liang Lin, Xiaogang Wang |
Abstract | Existing person re-identification benchmarks and methods mainly focus on matching cropped pedestrian images between queries and candidates. However, this differs from real-world scenarios, where the annotations of pedestrian bounding boxes are unavailable and the target person needs to be searched from a gallery of whole scene images. To close the gap, we propose a new deep learning framework for person search. Instead of breaking it down into two separate tasks (pedestrian detection and person re-identification), we jointly handle both aspects in a single convolutional neural network. An Online Instance Matching (OIM) loss function is proposed to train the network effectively, which is scalable to datasets with numerous identities. To validate our approach, we collect and annotate a large-scale benchmark dataset for person search. It contains 18,184 images, 8,432 identities, and 96,143 pedestrian bounding boxes. Experiments show that our framework outperforms other separate approaches, and the proposed OIM loss function converges much faster and better than the conventional Softmax loss. |
Tasks | Pedestrian Detection, Person Re-Identification, Person Search |
Published | 2016-04-07 |
URL | http://arxiv.org/abs/1604.01850v3 |
http://arxiv.org/pdf/1604.01850v3.pdf | |
PWC | https://paperswithcode.com/paper/joint-detection-and-identification-feature |
Repo | https://github.com/ShuangLI59/person_search |
Framework | none |
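A hedged sketch of the OIM idea: compare an L2-normalized feature against a lookup table (LUT) of identity features via a temperature-scaled softmax, then update the matched LUT entry with a moving average rather than learning classifier weights. The circular queue for unlabeled identities is omitted, and the sizes, temperature, and momentum below are illustrative:

```python
# Simplified Online Instance Matching (OIM) loss over a feature lookup table.
import numpy as np

rng = np.random.default_rng(0)
num_ids, dim, tau, momentum = 1000, 256, 0.1, 0.5

lut = rng.normal(size=(num_ids, dim))
lut /= np.linalg.norm(lut, axis=1, keepdims=True)

def oim_loss(feat, target_id):
    feat = feat / np.linalg.norm(feat)
    logits = lut @ feat / tau                     # cosine similarity / temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    # LUT update: moving average toward the new feature (no gradient needed).
    lut[target_id] = momentum * lut[target_id] + (1 - momentum) * feat
    lut[target_id] /= np.linalg.norm(lut[target_id])
    return -np.log(probs[target_id])

print(oim_loss(rng.normal(size=dim), target_id=7))
```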
Detecting Text in Natural Image with Connectionist Text Proposal Network
Title | Detecting Text in Natural Image with Connectionist Text Proposal Network |
Authors | Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao |
Abstract | We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural image. The CTPN detects a text line in a sequence of fine-scale text proposals directly in convolutional feature maps. We develop a vertical anchor mechanism that jointly predicts location and text/non-text score of each fixed-width proposal, considerably improving localization accuracy. The sequential proposals are naturally connected by a recurrent neural network, which is seamlessly incorporated into the convolutional network, resulting in an end-to-end trainable model. This allows the CTPN to explore rich context information of image, making it powerful to detect extremely ambiguous text. The CTPN works reliably on multi-scale and multi-language text without further post-processing, departing from previous bottom-up methods requiring multi-step post-processing. It achieves 0.88 and 0.61 F-measure on the ICDAR 2013 and 2015 benchmarks, surpassing recent results [8, 35] by a large margin. The CTPN is computationally efficient with 0.14s/image, by using the very deep VGG16 model [27]. Online demo is available at: http://textdet.com/. |
Tasks | Scene Text Detection |
Published | 2016-09-12 |
URL | http://arxiv.org/abs/1609.03605v1 |
http://arxiv.org/pdf/1609.03605v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-text-in-natural-image-with |
Repo | https://github.com/eragonruan/text-detection-ctpn |
Framework | tf |
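A hedged sketch of the vertical anchor mechanism: every proposal shares a fixed width (16 px at the input scale) and anchors vary only in height, so the network only regresses vertical position/height plus a text/non-text score per anchor. The specific height set below follows the common k=10 design (heights scaled by roughly 1.4x from about 11 to 273 px) and should be treated as an assumption:

```python
# CTPN-style fixed-width vertical anchors centered on a feature-map location.
import numpy as np

ANCHOR_WIDTH = 16.0
heights = [11 * 1.4**i for i in range(10)]       # ~11 .. 273 px

def anchors_at(cx, cy):
    """Fixed-width anchor boxes (x1, y1, x2, y2) centered at (cx, cy)."""
    return np.array([
        (cx - ANCHOR_WIDTH / 2, cy - h / 2, cx + ANCHOR_WIDTH / 2, cy + h / 2)
        for h in heights
    ])

print(anchors_at(cx=200.0, cy=120.0).round(1))
```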
Generalized Random Forests
Title | Generalized Random Forests |
Authors | Susan Athey, Julie Tibshirani, Stefan Wager |
Abstract | We propose generalized random forests, a method for non-parametric statistical estimation based on random forests (Breiman, 2001) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian, and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: non-parametric quantile regression, conditional average partial effect estimation, and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN. |
Tasks | |
Published | 2016-10-05 |
URL | http://arxiv.org/abs/1610.01271v4 |
http://arxiv.org/pdf/1610.01271v4.pdf | |
PWC | https://paperswithcode.com/paper/generalized-random-forests |
Repo | https://github.com/rajkumarkarthik/mgrf-develop |
Framework | none |
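A hedged sketch of the forest-weighting idea, specialized to quantile regression: the weight of training point i at a query x is how often i shares a leaf with x (normalized by leaf size), and the estimate solves the weighted quantile moment condition. A plain sklearn forest stands in here; grf's gradient-based splitting rule and honesty are not reproduced:

```python
# Forest-derived neighborhood weights + weighted quantile estimate.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = X[:, 0] + rng.normal(scale=0.3 + 0.3 * (X[:, 1] > 0), size=2000)

forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=20).fit(X, y)

def forest_weights(x):
    leaves_train = forest.apply(X)               # (n, n_trees) leaf ids
    leaves_x = forest.apply(x.reshape(1, -1))    # (1, n_trees)
    same = leaves_train == leaves_x              # leaf co-membership indicator
    # Per tree, each co-leaf point gets weight 1/|leaf|; average over trees.
    return (same / same.sum(axis=0, keepdims=True)).mean(axis=1)

def weighted_quantile(x, q):
    alpha = forest_weights(x)                    # weights sum to 1
    order = np.argsort(y)
    cdf = np.cumsum(alpha[order])
    return y[order][np.searchsorted(cdf, q)]

print(weighted_quantile(np.array([0.5, 0.8]), q=0.9))
```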
Deep Successor Reinforcement Learning
Title | Deep Successor Reinforcement Learning |
Authors | Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, Samuel J. Gershman |
Abstract | Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components – a reward predictor and a successor map. The successor map represents the expected future state occupancy from any given state and the reward predictor maps states to scalar rewards. The value function of a state can be computed as the inner product between the successor map and the reward weights. In this paper, we present DSR, which generalizes SR within an end-to-end deep reinforcement learning framework. DSR has several appealing properties including: increased sensitivity to distal reward changes due to factorization of reward and world dynamics, and the ability to extract bottleneck states (subgoals) given successor maps trained under a random policy. We show the efficacy of our approach on two diverse environments given raw pixel observations – simple grid-world domains (MazeBase) and the Doom game engine. |
Tasks | FPS Games, Game of Doom |
Published | 2016-06-08 |
URL | http://arxiv.org/abs/1606.02396v1 |
http://arxiv.org/pdf/1606.02396v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-successor-reinforcement-learning |
Repo | https://github.com/Ardavans/DSR |
Framework | none |
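The value decomposition is easiest to see in a tabular toy: the successor map M is learned with a TD rule, the reward weights w by regression, and V(s) = M[s] · w. DSR replaces the tables below with deep networks over raw pixels; this sketch only shows the structure:

```python
# Tabular successor representation on a chain MDP under a random policy.
import numpy as np

n_states, gamma, lr = 10, 0.95, 0.1
M = np.zeros((n_states, n_states))   # successor map: expected discounted occupancy
w = np.zeros(n_states)               # reward weights (here: per-state reward)

def td_update(s, s_next, r):
    onehot = np.eye(n_states)[s]
    # TD target for the successor map: current occupancy + discounted future map.
    M[s] += lr * (onehot + gamma * M[s_next] - M[s])
    w[s] += lr * (r - w[s])          # running estimate of the reward

rng = np.random.default_rng(0)
s = 0
for _ in range(50_000):              # random walk with reward at the right end
    s_next = np.clip(s + rng.choice([-1, 1]), 0, n_states - 1)
    td_update(s, s_next, r=float(s_next == n_states - 1))
    s = s_next

V = M @ w                            # value = inner product of map and reward weights
print(V.round(2))
```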
Single Pass PCA of Matrix Products
Title | Single Pass PCA of Matrix Products |
Authors | Shanshan Wu, Srinadh Bhojanapalli, Sujay Sanghavi, Alexandros G. Dimakis |
Abstract | In this paper we present a new algorithm for computing a low rank approximation of the product $A^TB$ by taking only a single pass of the two matrices $A$ and $B$. The straightforward way to do this is to (a) first sketch $A$ and $B$ individually, and then (b) find the top components using PCA on the sketch. Our algorithm in contrast retains additional summary information about $A,B$ (e.g. row and column norms etc.) and uses this additional information to obtain an improved approximation from the sketches. Our main analytical result establishes a comparable spectral norm guarantee to existing two-pass methods; in addition we also provide results from an Apache Spark implementation that shows better computational and statistical performance on real-world and synthetic evaluation datasets. |
Tasks | |
Published | 2016-10-21 |
URL | http://arxiv.org/abs/1610.06656v2 |
http://arxiv.org/pdf/1610.06656v2.pdf | |
PWC | https://paperswithcode.com/paper/single-pass-pca-of-matrix-products |
Repo | https://github.com/wushanshan/MatrixProductPCA |
Framework | none |
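A hedged sketch of the single-pass baseline the abstract describes: compress A and B with one shared random projection while streaming over them, then run PCA/SVD on the small sketched product. The paper's algorithm improves on this by also retaining row/column-norm side information, which is not reproduced here:

```python
# Sketch-then-PCA approximation of A^T B from a single pass over A and B.
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, m, rank = 10_000, 300, 200, 800, 10

A = rng.normal(size=(n, d1))
B = A[:, :d2] + 0.1 * rng.normal(size=(n, d2))   # correlated, so A^T B has structure

S = rng.normal(size=(m, n)) / np.sqrt(m)         # shared JL sketching matrix
SA, SB = S @ A, S @ B                            # computable in one pass over rows

approx = SA.T @ SB                               # (d1, d2), approximates A^T B
U, s, Vt = np.linalg.svd(approx, full_matrices=False)
top = (U[:, :rank] * s[:rank]) @ Vt[:rank]       # rank-k approximation

err = np.linalg.norm(top - A.T @ B, 2) / np.linalg.norm(A.T @ B, 2)
print(f"relative spectral error: {err:.3f}")
```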
f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization
Title | f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization |
Authors | Sebastian Nowozin, Botond Cseke, Ryota Tomioka |
Abstract | Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generative-adversarial training method allows training such models through the use of an auxiliary discriminative neural network. We show that the generative-adversarial approach is a special case of an existing, more general variational divergence estimation approach. We show that any f-divergence can be used for training generative neural samplers. We discuss the benefits of various choices of divergence functions on training complexity and the quality of the obtained generative models. |
Tasks | |
Published | 2016-06-02 |
URL | http://arxiv.org/abs/1606.00709v1 |
http://arxiv.org/pdf/1606.00709v1.pdf | |
PWC | https://paperswithcode.com/paper/f-gan-training-generative-neural-samplers |
Repo | https://github.com/mboudiaf/Mutual-Information-Variational-Bounds |
Framework | tf |
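The training objective is the variational lower bound $D_f(P \| Q) \ge \mathbb{E}_P[T(x)] - \mathbb{E}_Q[f^*(T(x))]$. A hedged sketch for the KL divergence ($f(u) = u \log u$, conjugate $f^*(t) = e^{t-1}$), where the known optimal critic for two Gaussians stands in for the discriminator an actual f-GAN would learn:

```python
# Variational KL estimate from samples, using the analytically optimal critic.
import numpy as np

rng = np.random.default_rng(0)
mu_p, mu_q, n = 1.0, 0.0, 200_000

xp = rng.normal(mu_p, 1.0, n)   # samples from P ("data")
xq = rng.normal(mu_q, 1.0, n)   # samples from Q ("generator")

def log_ratio(x):               # log p(x)/q(x) for unit-variance Gaussians
    return (mu_p - mu_q) * x + 0.5 * (mu_q**2 - mu_p**2)

def T(x):                       # optimal critic for KL: T*(x) = 1 + log p/q
    return 1.0 + log_ratio(x)

def f_star(t):                  # convex conjugate of f(u) = u log u
    return np.exp(t - 1.0)

bound = T(xp).mean() - f_star(T(xq)).mean()
true_kl = 0.5 * (mu_p - mu_q) ** 2   # closed form for unit-variance Gaussians
print(f"variational bound: {bound:.4f}, true KL: {true_kl:.4f}")
```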
Learning Video Object Segmentation from Static Images
Title | Learning Video Object Segmentation from Static Images |
Authors | Anna Khoreva, Federico Perazzi, Rodrigo Benenson, Bernt Schiele, Alexander Sorkine-Hornung |
Abstract | Inspired by recent advances of deep learning in instance segmentation and object tracking, we formulate video object segmentation as guided instance segmentation. Our model proceeds on a per-frame basis, guided by the output of the previous frame towards the object of interest in the next frame. We demonstrate that highly accurate object segmentation in videos can be enabled by using a convnet trained with static images only. The key ingredient of our approach is a combination of offline and online learning strategies, where the former produces a refined mask from the previous frame's estimate and the latter captures the appearance of the specific object instance. Our method can handle different types of input annotations, bounding boxes and segments, and can incorporate multiple annotated frames, making the system suitable for diverse applications. We obtain competitive results on three different datasets, independently of the type of input annotation. |
Tasks | Instance Segmentation, Object Tracking, Semantic Segmentation, Video Object Segmentation, Video Semantic Segmentation, Visual Object Tracking |
Published | 2016-12-08 |
URL | http://arxiv.org/abs/1612.02646v1 |
http://arxiv.org/pdf/1612.02646v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-video-object-segmentation-from |
Repo | https://github.com/birdman9390/MetaMaskTrack |
Framework | pytorch |
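A hedged sketch of the guidance signal: the convnet sees the current RGB frame plus the mask estimate from the previous frame as an extra input channel, so segmenting frame t reduces to refining that mask. The network itself, the offline mask-deformation training, and the online fine-tuning are not reproduced:

```python
# Build the 4-channel guided input: RGB frame + previous-frame mask estimate.
import numpy as np

h, w = 480, 854
frame_t = np.zeros((h, w, 3), dtype=np.float32)     # current RGB frame
prev_mask = np.zeros((h, w, 1), dtype=np.float32)   # mask predicted at t-1

net_input = np.concatenate([frame_t, prev_mask], axis=-1)
print(net_input.shape)                              # (480, 854, 4)
```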
SDP Relaxation with Randomized Rounding for Energy Disaggregation
Title | SDP Relaxation with Randomized Rounding for Energy Disaggregation |
Authors | Kiarash Shaloudegi, András György, Csaba Szepesvári, Wilsun Xu |
Abstract | We develop a scalable, computationally efficient method for the task of energy disaggregation for home appliance monitoring. In this problem the goal is to estimate the energy consumption of each appliance over time based on the total energy-consumption signal of a household. The current state of the art is to model the problem as inference in factorial HMMs, and use quadratic programming to find an approximate solution to the resulting quadratic integer program. Here we take a more principled approach, better suited to integer programming problems, and find an approximate optimum by combining convex semidefinite relaxations and randomized rounding with a scalable ADMM method that exploits the special structure of the resulting semidefinite program. Simulation results on both synthetic and real-world datasets demonstrate the superiority of our method. |
Tasks | |
Published | 2016-10-29 |
URL | http://arxiv.org/abs/1610.09491v1 |
http://arxiv.org/pdf/1610.09491v1.pdf | |
PWC | https://paperswithcode.com/paper/sdp-relaxation-with-randomized-rounding-for |
Repo | https://github.com/kiarashshaloudegi/FHMM_inference |
Framework | none |
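A hedged sketch of the generic SDP randomized-rounding step: factor the relaxed PSD solution X = Vᵀ V, draw a random Gaussian direction, and round each variable by the sign of its projection. The paper couples a rounding of this flavor with an ADMM solver specialized to the FHMM structure; the toy X and objective below are assumptions:

```python
# Goemans-Williamson-style randomized rounding of a relaxed SDP solution.
import numpy as np

rng = np.random.default_rng(0)

# Pretend X is the PSD solution of a relaxed {-1, +1} integer program.
X = np.array([[ 1.0,  0.9, -0.8],
              [ 0.9,  1.0, -0.7],
              [-0.8, -0.7,  1.0]])

# Factor X = V^T V via eigendecomposition (clipping tiny negative eigenvalues).
vals, vecs = np.linalg.eigh(X)
V = (vecs * np.sqrt(np.clip(vals, 0.0, None))).T

best = None
for _ in range(100):                       # keep the best of several roundings
    g = rng.normal(size=V.shape[0])
    x = np.sign(V.T @ g)                   # randomized rounding to {-1, +1}
    obj = x @ X @ x                        # placeholder objective
    if best is None or obj > best[0]:
        best = (obj, x)
print(best)
```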
House price estimation from visual and textual features
Title | House price estimation from visual and textual features |
Authors | Eman Ahmed, Mohamed Moustafa |
Abstract | Most existing automatic house price estimation systems rely only on some textual data like its neighborhood area and the number of rooms. The final price is estimated by a human agent who visits the house and assesses it visually. In this paper, we propose extracting visual features from house photographs and combining them with the house’s textual information. The combined features are fed to a fully connected multilayer Neural Network (NN) that estimates the house price as its single output. To train and evaluate our network, we have collected the first houses dataset (to our knowledge) that combines both images and textual attributes. The dataset is composed of 535 sample houses from the state of California, USA. Our experiments showed that adding the visual features increased the R-value by a factor of 3 and decreased the Mean Square Error (MSE) by one order of magnitude compared with textual-only features. Additionally, when trained on the benchmark textual-only features housing dataset, our proposed NN still outperformed the existing model published results. |
Tasks | |
Published | 2016-09-27 |
URL | http://arxiv.org/abs/1609.08399v1 |
http://arxiv.org/pdf/1609.08399v1.pdf | |
PWC | https://paperswithcode.com/paper/house-price-estimation-from-visual-and |
Repo | https://github.com/SPra03/Housedataset |
Framework | none |
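A hedged sketch of the fusion described in the abstract: concatenate visual features extracted from house photos with textual attributes and regress the price with a fully connected network. The feature dimensions, synthetic data, and the use of sklearn's MLPRegressor are illustrative assumptions:

```python
# Fuse visual + textual features and regress price with a small MLP.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 535                                        # dataset size from the paper

visual = rng.normal(size=(n, 128))             # e.g. image-descriptor features
textual = rng.normal(size=(n, 4))              # e.g. beds, baths, area, zip code
price = rng.uniform(1e5, 2e6, size=n)          # synthetic targets

X = np.hstack([visual, textual])               # combined feature vector
model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500).fit(X, price)
print(model.predict(X[:3]))
```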
Progressive Neural Networks
Title | Progressive Neural Networks |
Authors | Andrei A. Rusu, Neil C. Rabinowitz, Guillaume Desjardins, Hubert Soyer, James Kirkpatrick, Koray Kavukcuoglu, Razvan Pascanu, Raia Hadsell |
Abstract | Learning to solve complex sequences of tasks, while both leveraging transfer and avoiding catastrophic forgetting, remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architecture extensively on a wide variety of reinforcement learning tasks (Atari and 3D maze games), and show that it outperforms common baselines based on pretraining and finetuning. Using a novel sensitivity measure, we demonstrate that transfer occurs at both low-level sensory and high-level control layers of the learned policy. |
Tasks | |
Published | 2016-06-15 |
URL | http://arxiv.org/abs/1606.04671v3 |
http://arxiv.org/pdf/1606.04671v3.pdf | |
PWC | https://paperswithcode.com/paper/progressive-neural-networks |
Repo | https://github.com/GuangpingYuan/PNN_Pong_A3C |
Framework | tf |
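A hedged sketch of a two-column progressive network: column 1 is trained on task 1 and then frozen; column 2 learns task 2 while receiving lateral connections from column 1's hidden activations, so prior features transfer without being overwritten. The layer sizes and the plain linear lateral weights are illustrative (the paper uses adapter layers inside RL agents):

```python
# Forward pass of a two-column progressive MLP with one lateral connection.
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)
d_in, d_h = 16, 32

# Column 1 (frozen after task-1 training).
W1_a, W1_b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
# Column 2 (trainable) plus lateral weights U from column 1's layer-1 output.
W2_a, W2_b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
U = rng.normal(size=(d_h, d_h))

def forward(x):
    h1 = relu(W1_a @ x)              # column-1 layer 1 (frozen, reused)
    h2 = relu(W2_a @ x)              # column-2 layer 1
    # Column-2 layer 2 combines its own path with the lateral connection.
    return relu(W2_b @ h2 + U @ h1)

print(forward(rng.normal(size=d_in)).shape)
```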
Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation
Title | Parse Geometry from a Line: Monocular Depth Estimation with Partial Laser Observation |
Authors | Yiyi Liao, Lichao Huang, Yue Wang, Sarath Kodagoda, Yinan Yu, Yong Liu |
Abstract | Many standard robotic platforms are equipped with at least a fixed 2D laser range finder and a monocular camera. Although those platforms do not have sensors for 3D depth sensing capability, knowledge of depth is an essential part of many robotics activities. Therefore, there has recently been increasing interest in depth estimation from monocular images. As this task is inherently ambiguous, the data-driven estimated depth might be unreliable in robotics applications. In this paper, we attempt to improve the precision of monocular depth estimation by introducing 2D planar observations from the laser range finder already on the platform, at no extra cost. Specifically, we construct a dense reference map from the sparse laser range data, redefining the depth estimation task as estimating the distance between the real and the reference depth. To solve the problem, we construct a novel residual-of-residual neural network, and tightly combine classification and regression losses for continuous depth estimation. Experimental results suggest that our method achieves considerable improvement over state-of-the-art methods on both NYUD2 and KITTI, validating its effectiveness in leveraging the additional sensory information. We further demonstrate the potential of our method in obstacle avoidance, where it provides more comprehensive depth information than a monocular camera or a 2D laser range finder alone. |
Tasks | Depth Completion, Depth Estimation |
Published | 2016-10-17 |
URL | http://arxiv.org/abs/1611.02174v1 |
http://arxiv.org/pdf/1611.02174v1.pdf | |
PWC | https://paperswithcode.com/paper/parse-geometry-from-a-line-monocular-depth |
Repo | https://github.com/fangchangma/sparse-to-dense.pytorch |
Framework | pytorch |
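A hedged sketch of the reformulated target: instead of predicting absolute depth, the network regresses the residual between ground truth and a dense reference map built from the 2D laser scan. Here the reference is formed by naively repeating the laser row down each image column, which is only a stand-in for the paper's reference-map construction:

```python
# Residual depth target relative to a dense reference map from a laser line.
import numpy as np

rng = np.random.default_rng(0)
h, w = 64, 96

laser_row = rng.uniform(1.0, 10.0, size=w)     # one horizontal line of ranges
reference = np.tile(laser_row, (h, 1))         # dense reference map (h, w)

depth_gt = reference + rng.normal(scale=0.5, size=(h, w))   # synthetic scene
residual_target = depth_gt - reference         # what the network would regress

# At test time the prediction is reconstructed as reference + residual.
depth_pred = reference + residual_target       # perfect residual => exact depth
print(np.abs(depth_pred - depth_gt).max())     # ~0.0
```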