October 20, 2019

Paper Group AWR 304

SceneEDNet: A Deep Learning Approach for Scene Flow Estimation

Title SceneEDNet: A Deep Learning Approach for Scene Flow Estimation
Authors Ravi Kumar Thakur, Snehasis Mukherjee
Abstract Estimating scene flow in RGB-D videos is attracting much interest from computer vision researchers, due to its potential applications in robotics. State-of-the-art techniques for scene flow estimation typically rely on knowledge of the scene structure of the frame and the correspondence between frames. However, with the increasing amount of RGB-D data captured by sophisticated sensors such as the Microsoft Kinect, and with recent advances in deep learning, an efficient deep learning technique for direct scene flow estimation is becoming important. This paper presents a first effort to apply deep learning to direct estimation of scene flow, introducing a fully convolutional neural network with an encoder-decoder (ED) architecture. The proposed network, SceneEDNet, estimates the three-dimensional motion vectors of all scene points from sequences of stereo images. Training for direct scene flow estimation uses consecutive pairs of stereo images and the corresponding scene flow ground truth. The proposed architecture is evaluated on a large dataset and provides meaningful results.
Tasks Scene Flow Estimation
Published 2018-07-10
URL http://arxiv.org/abs/1807.03464v1
PDF http://arxiv.org/pdf/1807.03464v1.pdf
PWC https://paperswithcode.com/paper/sceneednet-a-deep-learning-approach-for-scene
Repo https://github.com/ravikt/sceneednet
Framework none
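
To make the encoder-decoder idea above concrete, here is a minimal PyTorch sketch of a fully convolutional ED network that regresses per-pixel 3D motion from stacked consecutive stereo pairs. The channel widths, layer counts, and the 12-channel input (two frames x two views x RGB) are illustrative assumptions, not SceneEDNet's exact architecture.

```python
import torch
import torch.nn as nn

class SceneFlowED(nn.Module):
    def __init__(self, in_ch=12):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),  # 3 motion channels
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

net = SceneFlowED()
x = torch.randn(1, 12, 128, 256)   # two consecutive stereo pairs, channel-stacked
flow = net(x)                      # (1, 3, 128, 256): per-pixel 3D motion vector
```

Supervision would then be a regression loss (e.g. MSE or endpoint error) against the scene flow ground truth mentioned in the abstract.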

LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics

Title LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics
Authors Sourav Garg, Niko Suenderhauf, Michael Milford
Abstract Human visual scene understanding is so remarkable that we are able to recognize a revisited place when entering it from the opposite direction it was first visited, even in the presence of extreme variations in appearance. This capability is especially apparent during driving: a human driver can recognize where they are when travelling in the reverse direction along a route for the first time, without having to turn back and look. The difficulty of this problem exceeds any addressed in past appearance- and viewpoint-invariant visual place recognition (VPR) research, in part because large parts of the scene are not commonly observable from opposite directions. Consequently, as shown in this paper, the precision-recall performance of current state-of-the-art viewpoint- and appearance-invariant VPR techniques is orders of magnitude below what would be usable in a closed-loop system. Current engineered solutions predominantly rely on panoramic camera or LIDAR sensing setups: an eminently suitable engineering solution, but one that is clearly very different from how humans navigate, which also has implications for how naturally humans could interact and communicate with the navigation system. In this paper we develop a suite of novel semantic- and appearance-based techniques to enable, for the first time, high-performance place recognition in this challenging scenario. We first propose a novel Local Semantic Tensor (LoST) descriptor of images using the convolutional feature maps from a state-of-the-art dense semantic segmentation network. Then, to verify the spatial semantic arrangement of the top matching candidates, we develop a novel approach for mining semantically-salient keypoint correspondences.
Tasks Scene Understanding, Semantic Segmentation, Visual Place Recognition
Published 2018-04-16
URL http://arxiv.org/abs/1804.05526v3
PDF http://arxiv.org/pdf/1804.05526v3.pdf
PWC https://paperswithcode.com/paper/lost-appearance-invariant-place-recognition
Repo https://github.com/oravus/lostX
Framework none
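
The core of the LoST descriptor is aggregating dense convolutional features over semantic regions. A minimal numpy sketch of that aggregation step follows; the normalization scheme and the set of classes used here are assumptions, and the paper's keypoint-correspondence verification stage is omitted.

```python
import numpy as np

def lost_descriptor(feat, seg, classes):
    # feat: (C, H, W) conv feature maps; seg: (H, W) semantic label map
    C = feat.shape[0]
    desc = []
    for c in classes:
        mask = (seg == c)
        if mask.any():
            v = feat[:, mask].mean(axis=1)     # average features over the class region
        else:
            v = np.zeros(C)
        v = v / (np.linalg.norm(v) + 1e-8)     # L2-normalize per class
        desc.append(v)
    return np.concatenate(desc)                # stacked per-class descriptor

# Place matching then reduces to comparing descriptors, e.g. by cosine distance,
# across the reference and query traverses.
```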

Multiple Instance Choquet Integral Classifier Fusion and Regression for Remote Sensing Applications

Title Multiple Instance Choquet Integral Classifier Fusion and Regression for Remote Sensing Applications
Authors Xiaoxiao Du, Alina Zare
Abstract In classifier (or regression) fusion, the aim is to combine the outputs of several algorithms to boost overall performance. Standard supervised fusion algorithms often require accurate and precise training labels. However, accurate labels may be difficult to obtain in many remote sensing applications. This paper proposes novel classification and regression fusion models that can be trained given ambiguously and imprecisely labeled training data, in which training labels are associated with sets of data points (i.e., “bags”) instead of individual data points (i.e., “instances”), following a multiple instance learning framework. Experiments with the proposed algorithms were conducted on both synthetic data and remote sensing applications such as target detection and crop yield prediction. The proposed algorithms show effective classification and regression performance.
Tasks Multiple Instance Learning
Published 2018-03-11
URL http://arxiv.org/abs/1803.04048v2
PDF http://arxiv.org/pdf/1803.04048v2.pdf
PWC https://paperswithcode.com/paper/multiple-instance-choquet-integral-classifier
Repo https://github.com/GatorSense/MICI
Framework none
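
The aggregation operator at the heart of this fusion framework is the discrete Choquet integral with respect to a fuzzy measure. Below is a hedged sketch of that integral for fusing classifier confidences; the toy measure `g` is hand-set, whereas the paper learns it from bag-level (multiple-instance) labels.

```python
import numpy as np

def choquet_integral(h, g):
    # h: (m,) classifier confidences for one sample; g: fuzzy measure on subsets
    order = np.argsort(h)[::-1]              # sort confidences descending
    total, prev_g = 0.0, 0.0
    for k in range(len(h)):
        subset = frozenset(order[:k + 1])    # top-k sources
        gk = g[subset]
        total += h[order[k]] * (gk - prev_g) # weight each value by measure increment
        prev_g = gk
    return total

# Toy monotone measure over 2 sources (g of the full set must be 1):
g = {frozenset([0]): 0.6, frozenset([1]): 0.5, frozenset([0, 1]): 1.0}
print(choquet_integral(np.array([0.9, 0.4]), g))   # 0.9*0.6 + 0.4*0.4 = 0.70
```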

Neural Guided Constraint Logic Programming for Program Synthesis

Title Neural Guided Constraint Logic Programming for Program Synthesis
Authors Lisa Zhang, Gregory Rosenblatt, Ethan Fetaya, Renjie Liao, William E. Byrd, Matthew Might, Raquel Urtasun, Richard Zemel
Abstract Synthesizing programs using example input/outputs is a classic problem in artificial intelligence. We present a method for solving Programming By Example (PBE) problems by using a neural model to guide the search of a constraint logic programming system called miniKanren. Crucially, the neural model uses miniKanren’s internal representation as input; miniKanren represents a PBE problem as recursive constraints imposed by the provided examples. We explore Recurrent Neural Network and Graph Neural Network models. We contribute a modified miniKanren, drivable by an external agent, available at https://github.com/xuexue/neuralkanren. We show that our neural-guided approach using constraints can synthesize programs faster in many cases, and importantly, can generalize to larger problems.
Tasks Program Synthesis
Published 2018-09-08
URL http://arxiv.org/abs/1809.02840v3
PDF http://arxiv.org/pdf/1809.02840v3.pdf
PWC https://paperswithcode.com/paper/neural-guided-constraint-logic-programming
Repo https://github.com/xuexue/neuralkanren
Framework pytorch
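
Abstractly, the method scores the solver's candidate branches with a learned model and expands the most promising one. The sketch below captures that loop as best-first search; `expand`, `score`, and `is_solution` are hypothetical stand-ins for the miniKanren interface and the RNN/GNN scorer, not the paper's actual API.

```python
import heapq

def guided_search(root, expand, score, is_solution, budget=1000):
    # Best-first search: always expand the branch the neural model scores highest.
    frontier = [(-score(root), 0, root)]
    tie = 1                                  # unique tiebreaker so states never compare
    for _ in range(budget):
        if not frontier:
            return None
        _, _, state = heapq.heappop(frontier)
        if is_solution(state):
            return state
        for child in expand(state):          # solver proposes constraint refinements
            heapq.heappush(frontier, (-score(child), tie, child))
            tie += 1
    return None
```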

Automatic Program Synthesis of Long Programs with a Learned Garbage Collector

Title Automatic Program Synthesis of Long Programs with a Learned Garbage Collector
Authors Amit Zohar, Lior Wolf
Abstract We consider the problem of automatically generating code given sample input-output pairs. We train a neural network to map from the current state and the outputs to the program’s next statement. The neural network optimizes multiple tasks concurrently: the next operation out of a set of high-level commands, the operands of the next statement, and which variables can be dropped from memory. Using our method we are able to create programs that are more than twice as long as existing state-of-the-art solutions, while improving the success rate for comparable lengths, and cutting the run-time by two orders of magnitude. Our code, including an implementation of various literature baselines, is publicly available at https://github.com/amitz25/PCCoder
Tasks Program Synthesis
Published 2018-09-12
URL http://arxiv.org/abs/1809.04682v2
PDF http://arxiv.org/pdf/1809.04682v2.pdf
PWC https://paperswithcode.com/paper/automatic-program-synthesis-of-long-programs
Repo https://github.com/amitz25/PCCoder
Framework pytorch
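
The abstract describes a model that, at each step, predicts the next statement and which variables can be dropped from memory. A greedy sketch of that loop is below; `model` and `apply_stmt` are hypothetical placeholders, and the actual system searches over the model's predictions rather than decoding greedily.

```python
def synthesize(inputs, outputs, model, apply_stmt, max_len=20):
    env = list(inputs)                       # current program state: live variables
    program = []
    for _ in range(max_len):
        stmt, drop = model(env, outputs)     # next statement + indices to garbage-collect
        env = [v for i, v in enumerate(env) if i not in drop]
        env.append(apply_stmt(stmt, env))    # execute statement, keep its result
        program.append(stmt)
        if env[-1] == outputs:               # all examples satisfied
            return program
    return None
```

Dropping dead variables keeps the state representation small, which is what lets the approach scale to much longer programs.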

The Mirage of Action-Dependent Baselines in Reinforcement Learning

Title The Mirage of Action-Dependent Baselines in Reinforcement Learning
Authors George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine
Abstract Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. To better understand this development, we decompose the variance of the policy gradient estimator and numerically show that learned state-action-dependent baselines do not in fact reduce variance over a state-dependent baseline in commonly tested benchmark domains. We confirm this unexpected result by reviewing the open-source code accompanying these prior papers, and show that subtle implementation decisions cause deviations from the methods presented in the papers and explain the source of the previously observed empirical gains. Furthermore, the variance decomposition highlights areas for improvement, which we demonstrate by illustrating a simple change to the typical value function parameterization that can significantly improve performance.
Tasks Policy Gradient Methods
Published 2018-02-27
URL http://arxiv.org/abs/1802.10031v3
PDF http://arxiv.org/pdf/1802.10031v3.pdf
PWC https://paperswithcode.com/paper/the-mirage-of-action-dependent-baselines-in
Repo https://github.com/brain-research/mirage-rl
Framework tf
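
The quantity under study is the variance of the score-function gradient estimator with and without a baseline. A tiny Monte Carlo illustration on a one-step Gaussian bandit is below; the numbers are toy and only demonstrate that a baseline leaves the gradient estimate unbiased while shrinking its variance.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.0
reward = lambda a: -(a - 2.0) ** 2           # quadratic reward, peak at a = 2

def grad_samples(baseline, n=100_000):
    a = rng.normal(mu, sigma, n)
    score = (a - mu) / sigma**2              # d log pi(a) / d mu
    return score * (reward(a) - baseline)    # REINFORCE estimator samples

avg_r = reward(rng.normal(mu, sigma, 100_000)).mean()
for b in [0.0, avg_r]:                       # no baseline vs. average-reward baseline
    g = grad_samples(b)
    print(f"baseline={b:+.2f}  grad mean={g.mean():+.3f}  var={g.var():.1f}")
```

Both estimators have the same mean (the true gradient, about +3 here), but the baselined one has far lower variance; the paper's decomposition asks how much further a state-action-dependent baseline could help.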

Training and Refining Deep Learning Based Denoisers without Ground Truth Data

Title Training and Refining Deep Learning Based Denoisers without Ground Truth Data
Authors Shakarim Soltanayev, Se Young Chun
Abstract Recently developed deep-learning-based denoisers often outperform state-of-the-art conventional denoisers such as BM3D. They are typically trained to minimize the mean squared error (MSE) between the output image of a deep neural network (DNN) and a ground truth image. Thus, it is important for deep-learning-based denoisers to use high-quality noiseless ground truth data for high performance. However, it is often challenging or even infeasible to obtain noiseless images in some applications. Here, we propose a method based on Stein’s unbiased risk estimator (SURE) for training DNN denoisers using only noisy training images with Gaussian noise. We demonstrate that our SURE-based method, without the use of ground truth data, is able to train DNN denoisers to yield performance close to that of networks trained with ground truth, for both grayscale and color images. We also propose a SURE-based refining method with a noisy test image for further performance improvement. Our quick refining method outperformed conventional BM3D, deep image prior, and often the networks trained with ground truth. A potential extension of our SURE-based methods to the Poisson noise model was also investigated.
Tasks Image Denoising
Published 2018-03-04
URL http://arxiv.org/abs/1803.01314v3
PDF http://arxiv.org/pdf/1803.01314v3.pdf
PWC https://paperswithcode.com/paper/training-deep-learning-based-denoisers
Repo https://github.com/Shakarim94/Net-SURE
Framework tf
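
The key ingredient is replacing the supervised MSE loss with a SURE estimate computed from the noisy images alone. Below is a hedged PyTorch sketch of the Monte-Carlo SURE loss for Gaussian noise of known sigma; the probe distribution and epsilon are standard choices rather than the paper's exact settings.

```python
import torch

def mc_sure_loss(f, y, sigma, eps=1e-3):
    # f: denoising network; y: noisy batch (no clean targets anywhere)
    n = y.numel()
    out = f(y)
    b = torch.randn_like(y)                          # random probe vector
    div = (b * (f(y + eps * b) - out)).sum() / eps   # Monte-Carlo divergence estimate
    # SURE = ||f(y) - y||^2 / n - sigma^2 + (2 sigma^2 / n) * div f(y)
    return ((out - y) ** 2).sum() / n - sigma**2 + 2 * sigma**2 * div / n
```

In expectation this equals the MSE against the unknown clean image (up to a constant), which is why minimizing it trains a denoiser without ground truth.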

Learning Human-Object Interactions by Graph Parsing Neural Networks

Title Learning Human-Object Interactions by Graph Parsing Neural Networks
Authors Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu
Abstract This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end. For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure represented by an adjacency matrix, and ii) the node labels. Within a message passing inference framework, GPNN iteratively computes the adjacency matrices and node labels. We extensively evaluate our model on three HOI detection benchmarks on images and videos: the HICO-DET, V-COCO, and CAD-120 datasets. Our approach significantly outperforms state-of-the-art methods, verifying that GPNN is scalable to large datasets and applies to spatio-temporal settings. The code is available at https://github.com/SiyuanQi/gpnn.
Tasks Human-Object Interaction Detection
Published 2018-08-23
URL http://arxiv.org/abs/1808.07962v1
PDF http://arxiv.org/pdf/1808.07962v1.pdf
PWC https://paperswithcode.com/paper/learning-human-object-interactions-by-graph
Repo https://github.com/SiyuanQi/gpnn
Framework pytorch
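
A schematic PyTorch version of GPNN's iteration, alternating soft adjacency inference with message-passing node updates, is below. The specific link, update, and readout functions are illustrative assumptions, not the paper's modules.

```python
import torch
import torch.nn as nn

class GraphParsing(nn.Module):
    def __init__(self, d=64, n_labels=10, steps=3):
        super().__init__()
        self.link = nn.Linear(2 * d, 1)      # scores an edge from a pair of node states
        self.update = nn.GRUCell(d, d)       # node state update from messages
        self.readout = nn.Linear(d, n_labels)
        self.steps = steps

    def forward(self, h):                    # h: (n, d) node features (humans + objects)
        n, d = h.shape
        for _ in range(self.steps):
            pair = torch.cat([h.unsqueeze(1).expand(n, n, d),
                              h.unsqueeze(0).expand(n, n, d)], dim=-1)
            A = torch.sigmoid(self.link(pair)).squeeze(-1)  # soft adjacency (n, n)
            msg = A @ h / n                  # aggregate messages from soft neighbors
            h = self.update(msg, h)
        return A, self.readout(h)            # parse graph structure + node labels

A, labels = GraphParsing()(torch.randn(5, 64))
```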

Multi-Level Contextual Network for Biomedical Image Segmentation

Title Multi-Level Contextual Network for Biomedical Image Segmentation
Authors Amirhossein Dadashzadeh, Alireza Tavakoli Targhi
Abstract Accurate and reliable image segmentation is an essential part of biomedical image analysis. In this paper, we consider the problem of biomedical image segmentation using deep convolutional neural networks. We propose a new end-to-end network architecture that effectively integrates local and global contextual patterns of histologic primitives to obtain a more reliable segmentation result. Specifically, we introduce a deep fully convolutional residual network with a new skip connection strategy to control the contextual information passed forward. Moreover, our trained model is also computationally inexpensive due to its small number of network parameters. We evaluate our method on two public datasets for epithelium segmentation and tubule segmentation tasks. Our experimental results show that the proposed method provides a fast and effective way of producing a pixel-wise dense prediction of biomedical images.
Tasks Semantic Segmentation
Published 2018-09-30
URL http://arxiv.org/abs/1810.00327v1
PDF http://arxiv.org/pdf/1810.00327v1.pdf
PWC https://paperswithcode.com/paper/multi-level-contextual-network-for-biomedical
Repo https://github.com/Plrbear/biomedical-image-segmentation
Framework tf
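
One way to read "a new skip connection strategy to control the contextual information passed forward" is as a learned gate on encoder features before fusion. The sketch below shows that generic idea; it is an interpretation, not the paper's exact connection, and it assumes the encoder and decoder features share a shape.

```python
import torch
import torch.nn as nn

class GatedSkip(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Conv2d(ch, ch, kernel_size=1)   # learns what to let through

    def forward(self, enc_feat, dec_feat):
        # Pass encoder context forward, scaled by a learned per-pixel gate.
        return dec_feat + torch.sigmoid(self.gate(enc_feat)) * enc_feat
```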

Reachability Analysis of Deep Neural Networks with Provable Guarantees

Title Reachability Analysis of Deep Neural Networks with Provable Guarantees
Authors Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska
Abstract Verifying correctness of deep neural networks (DNNs) is challenging. We study a generic reachability problem for feed-forward DNNs which, for a given set of inputs to the network and a Lipschitz-continuous function over its outputs, computes the lower and upper bound on the function values. Because the network and the function are Lipschitz continuous, all values in the interval between the lower and upper bound are reachable. We show how to obtain the safety verification problem, the output range analysis problem and a robustness measure by instantiating the reachability problem. We present a novel algorithm based on adaptive nested optimisation to solve the reachability problem. The technique has been implemented and evaluated on a range of DNNs, demonstrating its efficiency, scalability and ability to handle a broader class of networks than state-of-the-art verification approaches.
Tasks
Published 2018-05-06
URL http://arxiv.org/abs/1805.02242v1
PDF http://arxiv.org/pdf/1805.02242v1.pdf
PWC https://paperswithcode.com/paper/reachability-analysis-of-deep-neural-networks
Repo https://github.com/trustAI/DeepGO
Framework none
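
The nested optimisation rests on classic Lipschitz global optimisation: sampled values of an L-Lipschitz function certify interval bounds, and refinement targets the loosest interval. A one-dimensional toy reduction (Piyavskii/Shubert style) is below; it illustrates the certification idea, not the paper's adaptive nested algorithm over DNN inputs.

```python
import numpy as np

def lipschitz_bounds(f, a, b, L, iters=60):
    xs, ys = [a, b], [f(a), f(b)]
    for _ in range(iters):
        x, y = np.array(xs), np.array(ys)
        order = np.argsort(x)
        x, y = x[order], y[order]
        # Certified lower bound inside each gap (intersection of Lipschitz cones):
        lows = (y[:-1] + y[1:]) / 2 - L * np.diff(x) / 2
        i = int(np.argmin(lows))                         # refine the loosest gap
        x_new = (x[i] + x[i + 1]) / 2 + (y[i] - y[i + 1]) / (2 * L)
        xs.append(float(x_new))
        ys.append(f(float(x_new)))
    return float(lows.min()), min(ys)   # certified lower bound, best sampled value

lo, hi = lipschitz_bounds(lambda t: np.sin(3 * t) + 0.5 * t, 0.0, 3.0, L=3.5)
print(f"min f lies in [{lo:.4f}, {hi:.4f}]")
```

Because the true function can never dip below the cone intersections, every value between the bounds is guaranteed reachable as the bracket tightens, which is the property the paper exploits for verification.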

Multimodal Sentiment Analysis To Explore the Structure of Emotions

Title Multimodal Sentiment Analysis To Explore the Structure of Emotions
Authors Anthony Hu, Seth Flaxman
Abstract We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal is different from the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as “self-reported emotions.” We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model’s results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model and compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work provides a useful tool for the growing academic study of images - both photographs and memes - on social networks.
Tasks Multimodal Sentiment Analysis, Sentiment Analysis
Published 2018-05-25
URL http://arxiv.org/abs/1805.10205v1
PDF http://arxiv.org/pdf/1805.10205v1.pdf
PWC https://paperswithcode.com/paper/multimodal-sentiment-analysis-to-explore-the
Repo https://github.com/anthonyhu/tumblr-emotions
Framework tf
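
A minimal fusion model matching the abstract's setup, concatenating precomputed image features with a text encoding to classify emotion-word tags, is sketched below. The dimensions, the GRU text encoder, and the number of tags are assumptions.

```python
import torch
import torch.nn as nn

class MultimodalEmotion(nn.Module):
    def __init__(self, img_dim=2048, vocab=5000, emb=128, n_tags=15):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.text_enc = nn.GRU(emb, 128, batch_first=True)
        self.classify = nn.Linear(img_dim + 128, n_tags)

    def forward(self, img_feat, tokens):
        _, h = self.text_enc(self.embed(tokens))   # final hidden state of the GRU
        fused = torch.cat([img_feat, h[-1]], dim=-1)
        return self.classify(fused)                # logits over emotion tags

# Usage with random stand-ins for CNN image features and token ids:
logits = MultimodalEmotion()(torch.randn(4, 2048), torch.randint(0, 5000, (4, 30)))
```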

Efficient Exploration through Bayesian Deep Q-Networks

Title Efficient Exploration through Bayesian Deep Q-Networks
Authors Kamyar Azizzadenesheli, Animashree Anandkumar
Abstract We study reinforcement learning (RL) in high dimensional episodic Markov decision processes (MDP). We consider value-based RL when the optimal Q-value is a linear function of d-dimensional state-action feature representation. For instance, in deep-Q networks (DQN), the Q-value is a linear function of the feature representation layer (output layer). We propose two algorithms, one based on optimism, LINUCB, and another based on posterior sampling, LINPSRL. We guarantee frequentist and Bayesian regret upper bounds of O(d sqrt{T}) for these two algorithms, where T is the number of episodes. We extend these methods to deep RL and propose Bayesian deep Q-networks (BDQN), which uses an efficient Thompson sampling algorithm for high dimensional RL. We deploy the double DQN (DDQN) approach, and instead of learning the last layer of the Q-network using linear regression, we use Bayesian linear regression, resulting in an approximated posterior over the Q-function. This allows us to directly incorporate the uncertainty over the Q-function and deploy Thompson sampling on the learned posterior distribution, resulting in an efficient exploration/exploitation trade-off. We empirically study the behavior of BDQN on a wide range of Atari games. Since BDQN carries out more efficient exploration and exploitation, it is able to reach higher returns substantially faster than DDQN.
Tasks Atari Games, Efficient Exploration
Published 2018-02-13
URL https://arxiv.org/abs/1802.04412v4
PDF https://arxiv.org/pdf/1802.04412v4.pdf
PWC https://paperswithcode.com/paper/efficient-exploration-through-bayesian-deep-q
Repo https://github.com/kazizzad/BDQN-MxNet-Gluon
Framework mxnet
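
The core replacement is learning the last layer by Bayesian linear regression and acting via Thompson sampling from the resulting posterior. A numpy miniature of that piece is below; prior and noise scales are illustrative, and the full agent (one head per action, target features, replay) is omitted.

```python
import numpy as np

class BayesLinearQ:
    def __init__(self, d, prior_var=1.0, noise_var=1.0):
        self.P = np.eye(d) / prior_var       # posterior precision
        self.b = np.zeros(d)
        self.noise_var = noise_var

    def update(self, phi, target):           # phi: last-layer features of (s, a)
        self.P += np.outer(phi, phi) / self.noise_var
        self.b += phi * target / self.noise_var

    def sample_w(self, rng):
        cov = np.linalg.inv(self.P)
        mean = cov @ self.b
        return rng.multivariate_normal(mean, cov)   # one Thompson sample

# Acting: sample a weight vector per action head, then pick argmax_a phi(s)^T w_a;
# the sampling itself drives exploration, with no epsilon-greedy noise needed.
```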

Leveraging Financial News for Stock Trend Prediction with Attention-Based Recurrent Neural Network

Title Leveraging Financial News for Stock Trend Prediction with Attention-Based Recurrent Neural Network
Authors Huicheng Liu
Abstract Stock market prediction is one of the most attractive research topics, since successful prediction of the market’s future movement leads to significant profit. Traditional short-term stock market predictions are usually based on the analysis of historical market data, such as stock prices, moving averages or daily returns. However, financial news also contains useful information on public companies and the market. Existing methods in the finance literature exploit sentiment signal features, which are limited by not considering factors such as events and the news context. We address this issue by leveraging deep neural models to extract rich semantic features from news text. In particular, a bidirectional LSTM is used to encode the news text and capture the context information, and a self-attention mechanism is applied to distribute attention over the most relevant words, news items and days. In terms of predicting directional changes in both the Standard & Poor’s 500 index and individual companies’ stock prices, we show that this technique is competitive with other state-of-the-art approaches, demonstrating the effectiveness of recent NLP advances for computational finance.
Tasks Stock Market Prediction, Stock Trend Prediction
Published 2018-11-15
URL http://arxiv.org/abs/1811.06173v1
PDF http://arxiv.org/pdf/1811.06173v1.pdf
PWC https://paperswithcode.com/paper/leveraging-financial-news-for-stock-trend
Repo https://github.com/maobubu/stock-prediction
Framework none
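
A compact PyTorch sketch of the described encoder, a bidirectional LSTM over news tokens with additive self-attention pooling feeding an up/down classifier, is below. Vocabulary and hidden sizes are assumptions, and the paper's hierarchical attention over news items and days is collapsed to word level here.

```python
import torch
import torch.nn as nn

class NewsEncoder(nn.Module):
    def __init__(self, vocab=30000, emb=100, hid=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)     # scores each time step
        self.out = nn.Linear(2 * hid, 2)      # directional change: up / down

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))  # (B, T, 2*hid)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over words
        return self.out((w * h).sum(dim=1))   # attention-weighted pooling -> logits

logits = NewsEncoder()(torch.randint(0, 30000, (8, 50)))  # batch of 8 headlines
```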

Deep Geometric Prior for Surface Reconstruction

Title Deep Geometric Prior for Surface Reconstruction
Authors Francis Williams, Teseo Schneider, Claudio Silva, Denis Zorin, Joan Bruna, Daniele Panozzo
Abstract The reconstruction of a discrete surface from a point cloud is a fundamental geometry processing problem that has been studied for decades, with many methods developed. We propose the use of a deep neural network as a geometric prior for surface reconstruction. Specifically, we overfit a neural network representing a local chart parameterization to part of an input point cloud using the Wasserstein distance as a measure of approximation. By jointly fitting many such networks to overlapping parts of the point cloud, while enforcing a consistency condition, we compute a manifold atlas. By sampling this atlas, we can produce a dense reconstruction of the surface approximating the input cloud. The entire procedure does not require any training data or explicit regularization, yet, we show that it is able to perform remarkably well: not introducing typical overfitting artifacts, and approximating sharp features closely at the same time. We experimentally show that this geometric prior produces good results for both man-made objects containing sharp features and smoother organic objects, as well as noisy inputs. We compare our method with a number of well-known reconstruction methods on a standard surface reconstruction benchmark.
Tasks
Published 2018-11-27
URL http://arxiv.org/abs/1811.10943v2
PDF http://arxiv.org/pdf/1811.10943v2.pdf
PWC https://paperswithcode.com/paper/deep-geometric-prior-for-surface
Repo https://github.com/fwilliams/deep-geometric-prior
Framework pytorch
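
The building block is overfitting a small network, a local chart from 2D parameters to 3D, to one patch of the point cloud under a Wasserstein-type loss. The toy sketch below uses an exact assignment via scipy as the transport plan; the random `patch` is a stand-in for real data, and the paper's multi-chart consistency term is omitted.

```python
import torch
import torch.nn as nn
from scipy.optimize import linear_sum_assignment

chart = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, 3))            # 2D chart params -> 3D surface
uv = torch.rand(256, 2)                             # fixed samples of the chart domain
patch = torch.randn(256, 3)                         # stand-in for a point-cloud patch
opt = torch.optim.Adam(chart.parameters(), lr=1e-3)

for step in range(200):
    pred = chart(uv)
    cost = torch.cdist(pred, patch).detach().numpy()
    r, c = linear_sum_assignment(cost)              # optimal matching = transport plan
    loss = ((pred[r] - patch[c]) ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Sampling `uv` densely after fitting yields a dense reconstruction of the patch; the full method fits many overlapping charts and enforces consistency between them.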

Sports Camera Calibration via Synthetic Data

Title Sports Camera Calibration via Synthetic Data
Authors Jianhui Chen, James J. Little
Abstract Calibrating sports cameras is important for autonomous broadcasting and sports analysis. Here we propose a highly automatic method for calibrating sports cameras from a single image using synthetic data. First, we develop a novel camera pose engine. The camera pose engine has only three significant free parameters, so it can efficiently generate a large number of camera poses and the corresponding edge (i.e., field marking) images. Then, we learn compact deep features via a siamese network from paired edge images and camera poses, and build a feature-pose database. After that, we use a novel two-GAN (generative adversarial network) model to detect field markings in real images. Finally, we query an initial camera pose from the feature-pose database and refine camera poses using truncated distance images. We evaluate our method on both synthetic and real data. Our method not only demonstrates robustness on the synthetic data but also achieves state-of-the-art accuracy on a standard soccer dataset and very high performance on a volleyball dataset.
Tasks Calibration
Published 2018-10-25
URL http://arxiv.org/abs/1810.10658v1
PDF http://arxiv.org/pdf/1810.10658v1.pdf
PWC https://paperswithcode.com/paper/sports-camera-calibration-via-synthetic-data
Repo https://github.com/lood339/SCCvSD
Framework pytorch
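
The retrieval step reduces to nearest-neighbor search in a precomputed feature-pose database. A numpy miniature is below; the features and the 9-dimensional pose parameterization are hypothetical stand-ins for the siamese embedding and the paper's camera model.

```python
import numpy as np

def query_pose(edge_feat, db_feats, db_poses):
    # db_feats: (N, d) features of synthetic edge images; db_poses: (N, p) poses
    d = np.linalg.norm(db_feats - edge_feat, axis=1)
    return db_poses[int(np.argmin(d))]     # initial pose; refined afterwards

# Toy usage with random stand-ins for the learned features:
db_feats = np.random.rand(1000, 16)
db_poses = np.random.rand(1000, 9)         # e.g. focal length + rotation + position
print(query_pose(np.random.rand(16), db_feats, db_poses))
```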