Paper Group AWR 304
SceneEDNet: A Deep Learning Approach for Scene Flow Estimation. LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics. Multiple Instance Choquet Integral Classifier Fusion and Regression for Remote Sensing Applications. Neural Guided Constraint Logic Programming for Program Synthesis. Automatic Program Synthesis of Long Programs with a Learned Garbage Collector. The Mirage of Action-Dependent Baselines in Reinforcement Learning. Training and Refining Deep Learning Based Denoisers without Ground Truth Data. Learning Human-Object Interactions by Graph Parsing Neural Networks. Multi-Level Contextual Network for Biomedical Image Segmentation. Reachability Analysis of Deep Neural Networks with Provable Guarantees. Multimodal Sentiment Analysis To Explore the Structure of Emotions. Efficient Exploration through Bayesian Deep Q-Networks. Leveraging Financial News for Stock Trend Prediction with Attention-Based Recurrent Neural Network. Deep Geometric Prior for Surface Reconstruction. Sports Camera Calibration via Synthetic Data.
SceneEDNet: A Deep Learning Approach for Scene Flow Estimation
Title | SceneEDNet: A Deep Learning Approach for Scene Flow Estimation |
Authors | Ravi Kumar Thakur, Snehasis Mukherjee |
Abstract | Estimating scene flow in RGB-D videos is attracting considerable interest from computer vision researchers due to its potential applications in robotics. State-of-the-art techniques for scene flow estimation typically rely on knowledge of the scene structure of the frame and the correspondence between frames. However, with the increasing amount of RGB-D data captured by sophisticated sensors such as the Microsoft Kinect, and with recent advances in deep learning, an efficient deep learning technique for scene flow estimation is becoming important. This paper presents a first effort to apply deep learning directly to scene flow estimation, introducing a fully convolutional neural network with an encoder-decoder (ED) architecture. The proposed network, SceneEDNet, estimates the three-dimensional motion vectors of all scene points from a sequence of stereo images. Training for direct scene flow estimation is done on consecutive pairs of stereo images and the corresponding scene flow ground truth. The proposed architecture is applied to a large dataset and provides meaningful results. |
Tasks | Scene Flow Estimation |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03464v1 |
PDF | http://arxiv.org/pdf/1807.03464v1.pdf |
PWC | https://paperswithcode.com/paper/sceneednet-a-deep-learning-approach-for-scene |
Repo | https://github.com/ravikt/sceneednet |
Framework | none |
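For readers who want the shape of the idea in code, here is a minimal sketch of an encoder-decoder network of the kind the abstract describes. The layer sizes and the input packing (two consecutive stereo RGB pairs stacked along the channel axis) are assumptions for illustration, not the authors' exact SceneEDNet architecture.

```python
import torch
import torch.nn as nn

class EncoderDecoderFlow(nn.Module):
    """Toy encoder-decoder CNN mapping stacked stereo frames to a dense
    3-channel output: one 3D motion vector (scene flow) per pixel."""
    def __init__(self, in_channels=12):  # 2 time steps x 2 views x RGB (assumed packing)
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1))  # 3D flow per pixel
    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.randn(1, 12, 128, 256)   # two consecutive stereo RGB pairs, stacked
flow = EncoderDecoderFlow()(x)     # -> (1, 3, 128, 256)
```

Training against ground-truth scene flow would then be a plain regression loss, e.g. MSE or endpoint error between `flow` and the ground-truth field.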
LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics
Title | LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics |
Authors | Sourav Garg, Niko Suenderhauf, Michael Milford |
Abstract | Human visual scene understanding is so remarkable that we are able to recognize a revisited place when entering it from the opposite direction to that in which it was first visited, even in the presence of extreme variations in appearance. This capability is especially apparent during driving: a human driver can recognize where they are when travelling in the reverse direction along a route for the first time, without having to turn back and look. The difficulty of this problem exceeds any addressed in past appearance- and viewpoint-invariant visual place recognition (VPR) research, in part because large parts of the scene are not commonly observable from opposite directions. Consequently, as shown in this paper, the precision-recall performance of current state-of-the-art viewpoint- and appearance-invariant VPR techniques is orders of magnitude below what would be usable in a closed-loop system. Current engineered solutions predominantly rely on panoramic camera or LIDAR sensing setups: an eminently suitable engineering solution, but one that is clearly very different from how humans navigate, which also has implications for how naturally humans could interact and communicate with the navigation system. In this paper we develop a suite of novel semantic- and appearance-based techniques to enable, for the first time, high-performance place recognition in this challenging scenario. We first propose a novel Local Semantic Tensor (LoST) descriptor of images using the convolutional feature maps from a state-of-the-art dense semantic segmentation network. Then, to verify the spatial semantic arrangement of the top matching candidates, we develop a novel approach for mining semantically salient keypoint correspondences. |
Tasks | Scene Understanding, Semantic Segmentation, Visual Place Recognition |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05526v3 |
PDF | http://arxiv.org/pdf/1804.05526v3.pdf |
PWC | https://paperswithcode.com/paper/lost-appearance-invariant-place-recognition |
Repo | https://github.com/oravus/lostX |
Framework | none |
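The core of the LoST descriptor is aggregating dense convolutional features per semantic class. A hedged sketch of that aggregation follows; the mean-pooling and L2 normalization here are a simplification, not the paper's exact construction.

```python
import numpy as np

def lost_descriptor(feat, labels, num_classes):
    """Per-class aggregation of dense conv features (C, H, W): one C-dim
    vector per semantic class, pooled over the pixels the segmentation
    network assigns to that class. A simplified reading of LoST."""
    C, H, W = feat.shape
    flat = feat.reshape(C, -1)          # (C, H*W)
    lab = labels.reshape(-1)            # (H*W,) predicted class per pixel
    desc = np.zeros((num_classes, C))
    for k in range(num_classes):
        mask = lab == k
        if mask.any():
            desc[k] = flat[:, mask].mean(axis=1)
    norms = np.linalg.norm(desc, axis=1, keepdims=True)
    return desc / np.maximum(norms, 1e-8)   # L2-normalize each class vector

feat = np.random.rand(256, 32, 64)          # conv feature map
labels = np.random.randint(0, 20, (32, 64)) # dense semantic labels
print(lost_descriptor(feat, labels, 20).shape)  # (20, 256)
```

Matching two places then reduces to comparing these class-indexed tensors, which is what gives the descriptor its robustness to appearance change.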
Multiple Instance Choquet Integral Classifier Fusion and Regression for Remote Sensing Applications
Title | Multiple Instance Choquet Integral Classifier Fusion and Regression for Remote Sensing Applications |
Authors | Xiaoxiao Du, Alina Zare |
Abstract | In classifier (or regression) fusion, the aim is to combine the outputs of several algorithms to boost overall performance. Standard supervised fusion algorithms often require accurate and precise training labels. However, accurate labels may be difficult to obtain in many remote sensing applications. This paper proposes novel classification and regression fusion models that can be trained given ambiguously and imprecisely labeled training data, in which training labels are associated with sets of data points (i.e., “bags”) instead of individual data points (i.e., “instances”), following a multiple instance learning framework. Experiments were conducted with the proposed algorithms on both synthetic data and remote sensing applications such as target detection and crop yield prediction. The proposed algorithms show effective classification and regression performance. |
Tasks | Multiple Instance Learning |
Published | 2018-03-11 |
URL | http://arxiv.org/abs/1803.04048v2 |
PDF | http://arxiv.org/pdf/1803.04048v2.pdf |
PWC | https://paperswithcode.com/paper/multiple-instance-choquet-integral-classifier |
Repo | https://github.com/GatorSense/MICI |
Framework | none |
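The fusion operator here is the discrete Choquet integral, which combines sorted classifier outputs weighted by a fuzzy measure. Below is a standard implementation of that integral; learning the fuzzy measure from bag-level labels is the paper's MICI contribution and is not shown, so the toy measure values are placeholders.

```python
import numpy as np

def choquet_integral(h, g):
    """Discrete Choquet integral of classifier outputs h (length m) with
    respect to a fuzzy measure g, given as a dict mapping frozensets of
    source indices to measure values (g[full set] = 1, g[empty set] = 0)."""
    order = np.argsort(h)[::-1]        # sort sources by descending output
    total, prev = 0.0, 0.0
    subset = frozenset()
    for i in order:
        subset = subset | {i}          # grow the coalition of top sources
        gi = g[subset]
        total += h[i] * (gi - prev)    # weight by the measure increment
        prev = gi
    return total

# Toy fuzzy measure on two sources (in MICI these values would be learned
# from ambiguously labeled bags rather than hand-set).
g = {frozenset(): 0.0, frozenset({0}): 0.4,
     frozenset({1}): 0.5, frozenset({0, 1}): 1.0}
print(choquet_integral(np.array([0.9, 0.3]), g))  # fused confidence: 0.54
```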
Neural Guided Constraint Logic Programming for Program Synthesis
Title | Neural Guided Constraint Logic Programming for Program Synthesis |
Authors | Lisa Zhang, Gregory Rosenblatt, Ethan Fetaya, Renjie Liao, William E. Byrd, Matthew Might, Raquel Urtasun, Richard Zemel |
Abstract | Synthesizing programs using example input/outputs is a classic problem in artificial intelligence. We present a method for solving Programming By Example (PBE) problems by using a neural model to guide the search of a constraint logic programming system called miniKanren. Crucially, the neural model uses miniKanren’s internal representation as input; miniKanren represents a PBE problem as recursive constraints imposed by the provided examples. We explore Recurrent Neural Network and Graph Neural Network models. We contribute a modified miniKanren, drivable by an external agent, available at https://github.com/xuexue/neuralkanren. We show that our neural-guided approach using constraints can synthesize programs faster in many cases, and importantly, can generalize to larger problems. |
Tasks | Program Synthesis |
Published | 2018-09-08 |
URL | http://arxiv.org/abs/1809.02840v3 |
PDF | http://arxiv.org/pdf/1809.02840v3.pdf |
PWC | https://paperswithcode.com/paper/neural-guided-constraint-logic-programming |
Repo | https://github.com/xuexue/neuralkanren |
Framework | pytorch |
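The interaction pattern is the interesting part: miniKanren exposes candidate constraint states, and a neural model scores which branch to expand. A hedged sketch of that guidance loop follows; the tokenization, the `ConstraintScorer` module, and the candidate interface are illustrative stand-ins, not the released neuralkanren API.

```python
import torch
import torch.nn as nn

class ConstraintScorer(nn.Module):
    """Scores tokenized constraint states; the paper explores RNN and GNN
    variants, and this sketch uses a plain GRU."""
    def __init__(self, vocab=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, token_ids):                 # (batch, seq)
        h, _ = self.rnn(self.embed(token_ids))
        return self.head(h[:, -1]).squeeze(-1)    # one score per candidate

# At each search step the logic engine would expose candidate partial
# programs; the model picks which one the search expands next.
scorer = ConstraintScorer()
candidates = torch.randint(0, 128, (5, 20))       # 5 tokenized constraint states
best = int(torch.argmax(scorer(candidates)))      # branch to expand
```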
Automatic Program Synthesis of Long Programs with a Learned Garbage Collector
Title | Automatic Program Synthesis of Long Programs with a Learned Garbage Collector |
Authors | Amit Zohar, Lior Wolf |
Abstract | We consider the problem of automatically generating code from sample input-output pairs. We train a neural network to map from the current state and the outputs to the program’s next statement. The neural network optimizes multiple tasks concurrently: the next operation out of a set of high-level commands, the operands of the next statement, and which variables can be dropped from memory. Using our method we are able to create programs that are more than twice as long as existing state-of-the-art solutions, while improving the success rate for comparable lengths and cutting the run-time by two orders of magnitude. Our code, including an implementation of various literature baselines, is publicly available at https://github.com/amitz25/PCCoder. |
Tasks | Program Synthesis |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04682v2 |
PDF | http://arxiv.org/pdf/1809.04682v2.pdf |
PWC | https://paperswithcode.com/paper/automatic-program-synthesis-of-long-programs |
Repo | https://github.com/amitz25/PCCoder |
Framework | pytorch |
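The multi-task structure in the abstract maps naturally onto a network with three output heads. A sketch under assumed sizes is below; the state encoder, vocabulary of operations, and head shapes are illustrative rather than PCCoder's actual configuration.

```python
import torch
import torch.nn as nn

class SynthesisHeads(nn.Module):
    """Three concurrent predictions from a program-state embedding: the next
    operation, its operand, and which variables can be garbage-collected."""
    def __init__(self, state_dim=256, num_ops=40, num_vars=10):
        super().__init__()
        self.op_head = nn.Linear(state_dim, num_ops)        # next operation
        self.operand_head = nn.Linear(state_dim, num_vars)  # operand choice
        self.drop_head = nn.Linear(state_dim, num_vars)     # drop variable i?
    def forward(self, state):
        return (self.op_head(state),
                self.operand_head(state),
                torch.sigmoid(self.drop_head(state)))       # per-var drop prob

state = torch.randn(1, 256)   # embedding of current variables + target outputs
op_logits, operand_logits, drop_probs = SynthesisHeads()(state)
```

Dropping variables with high `drop_probs` is what keeps the state small, which is how the method scales to much longer programs.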
The Mirage of Action-Dependent Baselines in Reinforcement Learning
Title | The Mirage of Action-Dependent Baselines in Reinforcement Learning |
Authors | George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine |
Abstract | Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. To better understand this development, we decompose the variance of the policy gradient estimator and numerically show that learned state-action-dependent baselines do not in fact reduce variance over a state-dependent baseline in commonly tested benchmark domains. We confirm this unexpected result by reviewing the open-source code accompanying these prior papers, and show that subtle implementation decisions cause deviations from the methods presented in the papers and explain the source of the previously observed empirical gains. Furthermore, the variance decomposition highlights areas for improvement, which we demonstrate by illustrating a simple change to the typical value function parameterization that can significantly improve performance. |
Tasks | Policy Gradient Methods |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.10031v3 |
PDF | http://arxiv.org/pdf/1802.10031v3.pdf |
PWC | https://paperswithcode.com/paper/the-mirage-of-action-dependent-baselines-in |
Repo | https://github.com/brain-research/mirage-rl |
Framework | tf |
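The paper's analysis starts from the standard fact that a baseline b leaves the score-function gradient estimator unbiased while changing its variance; its claim is that extending b to depend on the action as well yields no real further reduction in the tested domains. The toy numerical check below illustrates only the baseline mechanics that the variance decomposition builds on, with a one-dimensional Gaussian policy chosen for illustration.

```python
import numpy as np

# Score-function estimator g = d/dmu[log pi(a)] * (R(a) - b) for a Gaussian
# policy pi = N(mu, 1) and reward R(a) = -(a - 1)^2. Any constant (or, in RL,
# state-dependent) baseline b leaves E[g] unchanged but shrinks Var[g].
rng = np.random.default_rng(0)
mu = 0.0
a = rng.normal(mu, 1.0, size=1_000_000)
R = -(a - 1.0) ** 2
score = a - mu                       # d/dmu log N(a; mu, 1)

for b in (0.0, R.mean()):            # no baseline vs. value-like baseline
    g = score * (R - b)
    print(f"baseline={b:+.3f}  mean={g.mean():+.4f}  var={g.var():.4f}")
# The mean (the true gradient, 2 at mu=0) is unchanged; only variance drops.
```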
Training and Refining Deep Learning Based Denoisers without Ground Truth Data
Title | Training and Refining Deep Learning Based Denoisers without Ground Truth Data |
Authors | Shakarim Soltanayev, Se Young Chun |
Abstract | Recently developed deep-learning-based denoisers often outperform state-of-the-art conventional denoisers such as BM3D. They are typically trained to minimize the mean squared error (MSE) between the output image of a deep neural network (DNN) and a ground truth image. Thus, it is important for deep-learning-based denoisers to use high-quality noiseless ground truth data to achieve high performance. However, it is often challenging or even infeasible to obtain noiseless images in some applications. Here, we propose a method based on Stein’s unbiased risk estimator (SURE) for training DNN denoisers using only noisy training images corrupted by Gaussian noise. We demonstrate that our SURE-based method, without the use of ground truth data, is able to train DNN denoisers to yield performance close to that of networks trained with ground truth, for both grayscale and color images. We also propose a SURE-based refining method that uses a noisy test image for further performance improvement. Our quick refining method outperformed the conventional BM3D, the deep image prior, and often even the networks trained with ground truth. A potential extension of our SURE-based methods to the Poisson noise model was also investigated. |
Tasks | Image Denoising |
Published | 2018-03-04 |
URL | http://arxiv.org/abs/1803.01314v3 |
PDF | http://arxiv.org/pdf/1803.01314v3.pdf |
PWC | https://paperswithcode.com/paper/training-deep-learning-based-denoisers |
Repo | https://github.com/Shakarim94/Net-SURE |
Framework | tf |
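The trick that makes ground-truth-free training possible is that SURE gives an unbiased estimate of the MSE to the unseen clean image using only the noisy image and the known noise level, with the divergence term approximated by a Monte Carlo probe. A minimal sketch of that Monte-Carlo SURE loss (PyTorch here, though the released code is TensorFlow):

```python
import torch

def mc_sure_loss(denoiser, y, sigma, eps=1e-3):
    """Monte-Carlo SURE: unbiased estimate of per-pixel MSE to the clean
    image, from the noisy image y alone (Gaussian noise, known std sigma).
    The network divergence tr(df/dy) is estimated with one random probe."""
    n = y.numel()
    fy = denoiser(y)
    b = torch.randn_like(y)
    div = (b * (denoiser(y + eps * b) - fy)).sum() / eps
    return ((fy - y) ** 2).sum() / n - sigma ** 2 + (2 * sigma ** 2 / n) * div

# Usage with any differentiable denoiser; a single conv layer stands in here.
net = torch.nn.Conv2d(1, 1, 3, padding=1)
y = torch.randn(1, 1, 32, 32)           # noisy observation, sigma assumed known
loss = mc_sure_loss(net, y, sigma=0.5)
loss.backward()                         # train with no ground truth at all
```

The refining step in the paper applies the same loss to the noisy test image itself, fine-tuning the trained network per image.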
Learning Human-Object Interactions by Graph Parsing Neural Networks
Title | Learning Human-Object Interactions by Graph Parsing Neural Networks |
Authors | Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu |
Abstract | This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end. For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure, represented by an adjacency matrix, and ii) the node labels. Within a message passing inference framework, GPNN iteratively computes the adjacency matrices and node labels. We extensively evaluate our model on three HOI detection benchmarks spanning images and videos: the HICO-DET, V-COCO, and CAD-120 datasets. Our approach significantly outperforms state-of-the-art methods, verifying that GPNN is scalable to large datasets and applies to spatio-temporal settings. The code is available at https://github.com/SiyuanQi/gpnn. |
Tasks | Human-Object Interaction Detection |
Published | 2018-08-23 |
URL | http://arxiv.org/abs/1808.07962v1 |
PDF | http://arxiv.org/pdf/1808.07962v1.pdf |
PWC | https://paperswithcode.com/paper/learning-human-object-interactions-by-graph |
Repo | https://github.com/SiyuanQi/gpnn |
Framework | pytorch |
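A single GPNN-style iteration has two moves: infer a soft adjacency matrix from pairwise node features, then pass messages along it and update node states. The sketch below shows one such step; the specific link, message, and update functions (a linear scorer and a GRU cell) are plausible choices for illustration, not the paper's exact modules.

```python
import torch
import torch.nn as nn

class GraphParsingStep(nn.Module):
    """One message-passing step with a learned soft graph structure."""
    def __init__(self, dim=64):
        super().__init__()
        self.link = nn.Linear(2 * dim, 1)     # scores each (i, j) node pair
        self.update = nn.GRUCell(dim, dim)    # node-state update function
    def forward(self, h):                     # h: (N, dim) node features
        N, d = h.shape
        pairs = torch.cat([h.unsqueeze(1).expand(N, N, d),
                           h.unsqueeze(0).expand(N, N, d)], dim=-1)
        adj = torch.sigmoid(self.link(pairs)).squeeze(-1)  # (N, N) soft graph
        messages = adj @ h                    # aggregate neighbor features
        return self.update(messages, h), adj

h = torch.randn(6, 64)                        # human + object node features
h_new, adjacency = GraphParsingStep()(h)      # iterate; then read node labels
```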
Multi-Level Contextual Network for Biomedical Image Segmentation
Title | Multi-Level Contextual Network for Biomedical Image Segmentation |
Authors | Amirhossein Dadashzadeh, Alireza Tavakoli Targhi |
Abstract | Accurate and reliable image segmentation is an essential part of biomedical image analysis. In this paper, we consider the problem of biomedical image segmentation using deep convolutional neural networks. We propose a new end-to-end network architecture that effectively integrates local and global contextual patterns of histologic primitives to obtain a more reliable segmentation result. Specifically, we introduce a deep fully convolutional residual network with a new skip-connection strategy to control the contextual information passed forward. Moreover, our trained model is computationally inexpensive due to its small number of network parameters. We evaluate our method on two public datasets for the epithelium segmentation and tubule segmentation tasks. Our experimental results show that the proposed method provides a fast and effective way of producing pixel-wise dense predictions for biomedical images. |
Tasks | Semantic Segmentation |
Published | 2018-09-30 |
URL | http://arxiv.org/abs/1810.00327v1 |
PDF | http://arxiv.org/pdf/1810.00327v1.pdf |
PWC | https://paperswithcode.com/paper/multi-level-contextual-network-for-biomedical |
Repo | https://github.com/Plrbear/biomedical-image-segmentation |
Framework | tf |
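One plausible reading of "a skip connection strategy to control the contextual information passed forward" is a gated residual connection, sketched below. This gating is an assumption for illustration, not the paper's exact strategy.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Residual block whose skip path is modulated by a learned gate,
    so the network can decide how much context to pass forward."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
        self.gate = nn.Conv2d(ch, ch, 1)   # 1x1 conv: what to let through
    def forward(self, x):
        return torch.relu(self.body(x) + torch.sigmoid(self.gate(x)) * x)

x = torch.randn(1, 64, 48, 48)
print(GatedResidualBlock()(x).shape)       # torch.Size([1, 64, 48, 48])
```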
Reachability Analysis of Deep Neural Networks with Provable Guarantees
Title | Reachability Analysis of Deep Neural Networks with Provable Guarantees |
Authors | Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska |
Abstract | Verifying the correctness of deep neural networks (DNNs) is challenging. We study a generic reachability problem for feed-forward DNNs: for a given set of inputs to the network and a Lipschitz-continuous function over its outputs, compute lower and upper bounds on the function values. Because the network and the function are Lipschitz continuous, all values in the interval between the lower and upper bound are reachable. We show how to obtain the safety verification problem, the output range analysis problem, and a robustness measure by instantiating the reachability problem. We present a novel algorithm, based on adaptive nested optimisation, to solve the reachability problem. The technique has been implemented and evaluated on a range of DNNs, demonstrating its efficiency, scalability, and ability to handle a broader class of networks than state-of-the-art verification approaches. |
Tasks | |
Published | 2018-05-06 |
URL | http://arxiv.org/abs/1805.02242v1 |
PDF | http://arxiv.org/pdf/1805.02242v1.pdf |
PWC | https://paperswithcode.com/paper/reachability-analysis-of-deep-neural-networks |
Repo | https://github.com/trustAI/DeepGO |
Framework | none |
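The core observation is that if the composed function is L-Lipschitz on an input region, then sampled values plus an L-based slack bracket the true extremum, so branch-and-bound can certify bounds to any tolerance. The 1-D illustration below captures that mechanism; the paper's algorithm nests this adaptively over high-dimensional inputs, which is not reproduced here.

```python
import heapq

def lipschitz_min(g, lo, hi, L, tol=1e-3):
    """Certified minimum of an L-Lipschitz function g on [lo, hi]:
    returns (best sampled value, certified lower bound), gap <= tol."""
    best = min(g(lo), g(hi))                    # best value seen so far
    heap = [(best - L * (hi - lo) / 2, lo, hi)] # (lower bound, a, b)
    while heap:
        bound, a, b = heapq.heappop(heap)       # interval with smallest bound
        if best - bound <= tol:
            return best, bound                  # true min lies in [bound, best]
        m = (a + b) / 2
        best = min(best, g(m))
        for x0, x1 in ((a, m), (m, b)):         # bisect and re-bound children
            lb = min(g(x0), g(x1)) - L * (x1 - x0) / 2
            if lb < best - tol:                 # prune provably hopeless pieces
                heapq.heappush(heap, (lb, x0, x1))
    return best, best - tol

print(lipschitz_min(lambda x: (x - 0.3) ** 2, 0.0, 1.0, L=2.0))
```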
Multimodal Sentiment Analysis To Explore the Structure of Emotions
Title | Multimodal Sentiment Analysis To Explore the Structure of Emotions |
Authors | Anthony Hu, Seth Flaxman |
Abstract | We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal is different from the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as “self-reported emotions.” We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model’s results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model, compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images, both photographs and memes, on social networks. |
Tasks | Multimodal Sentiment Analysis, Sentiment Analysis |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10205v1 |
PDF | http://arxiv.org/pdf/1805.10205v1.pdf |
PWC | https://paperswithcode.com/paper/multimodal-sentiment-analysis-to-explore-the |
Repo | https://github.com/anthonyhu/tumblr-emotions |
Framework | tf |
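Structurally, the multimodal model is a fusion of an image feature vector and a text feature vector into an emotion-tag classifier. The sketch below shows that fusion; the feature dimensions, the single fusion layer, and the count of emotion tags are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultimodalEmotionClassifier(nn.Module):
    """Concatenate image features (e.g. from a pretrained CNN) and text
    features (e.g. from a word-level encoder) and classify the post's
    self-reported emotion tag."""
    def __init__(self, img_dim=2048, txt_dim=300, num_emotions=15):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256), nn.ReLU(),
            nn.Linear(256, num_emotions))
    def forward(self, img_feat, txt_feat):
        return self.fuse(torch.cat([img_feat, txt_feat], dim=-1))

logits = MultimodalEmotionClassifier()(torch.randn(4, 2048), torch.randn(4, 300))
print(logits.shape)   # torch.Size([4, 15])
```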
Efficient Exploration through Bayesian Deep Q-Networks
Title | Efficient Exploration through Bayesian Deep Q-Networks |
Authors | Kamyar Azizzadenesheli, Animashree Anandkumar |
Abstract | We study reinforcement learning (RL) in high-dimensional episodic Markov decision processes (MDPs). We consider value-based RL when the optimal Q-value is a linear function of a d-dimensional state-action feature representation. For instance, in deep Q-networks (DQN), the Q-value is a linear function of the feature representation layer (output layer). We propose two algorithms, one based on optimism, LINUCB, and another based on posterior sampling, LINPSRL. We guarantee frequentist and Bayesian regret upper bounds of O(d√T) for these two algorithms, where T is the number of episodes. We extend these methods to deep RL and propose Bayesian deep Q-networks (BDQN), which uses an efficient Thompson sampling algorithm for high-dimensional RL. We deploy the double DQN (DDQN) approach, and instead of learning the last layer of the Q-network using linear regression, we use Bayesian linear regression, resulting in an approximate posterior over the Q-function. This allows us to directly incorporate uncertainty over the Q-function and deploy Thompson sampling on the learned posterior distribution, resulting in an efficient exploration/exploitation trade-off. We empirically study the behavior of BDQN on a wide range of Atari games. Since BDQN carries out more efficient exploration and exploitation, it is able to reach substantially higher returns faster than DDQN. |
Tasks | Atari Games, Efficient Exploration |
Published | 2018-02-13 |
URL | https://arxiv.org/abs/1802.04412v4 |
PDF | https://arxiv.org/pdf/1802.04412v4.pdf |
PWC | https://paperswithcode.com/paper/efficient-exploration-through-bayesian-deep-q |
Repo | https://github.com/kazizzad/BDQN-MxNet-Gluon |
Framework | mxnet |
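The BDQN-specific ingredient is replacing the Q-network's last linear layer with Bayesian linear regression over the feature layer, then Thompson-sampling a weight vector to act with. A NumPy sketch of that posterior-and-sample step follows; the prior variances, noise level, and feature shapes are illustrative assumptions.

```python
import numpy as np

def blr_posterior(Phi, y, sigma2=1.0, prior_var=1.0):
    """Posterior N(mean, cov) over last-layer weights given feature matrix
    Phi (n, d) and regression targets y (n,), under a Gaussian prior."""
    d = Phi.shape[1]
    prec = Phi.T @ Phi / sigma2 + np.eye(d) / prior_var  # posterior precision
    cov = np.linalg.inv(prec)
    mean = cov @ Phi.T @ y / sigma2
    return mean, cov

rng = np.random.default_rng(0)
Phi = rng.normal(size=(500, 32))   # last-layer features of state-action pairs
y = Phi @ rng.normal(size=32) + 0.1 * rng.normal(size=500)  # TD-style targets

mean, cov = blr_posterior(Phi, y)
w = rng.multivariate_normal(mean, cov)   # Thompson sample for this episode
q_values = Phi[:5] @ w                   # act greedily w.r.t. the sampled Q
```

Resampling w between episodes is what replaces epsilon-greedy dithering with posterior-driven exploration.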
Leveraging Financial News for Stock Trend Prediction with Attention-Based Recurrent Neural Network
Title | Leveraging Financial News for Stock Trend Prediction with Attention-Based Recurrent Neural Network |
Authors | Huicheng Liu |
Abstract | Stock market prediction is one of the most attractive research topics, since successful prediction of the market’s future movement leads to significant profit. Traditional short-term stock market predictions are usually based on the analysis of historical market data, such as stock prices, moving averages, or daily returns. However, financial news also contains useful information on public companies and the market. Existing methods in the finance literature exploit sentiment signal features, which are limited by not considering factors such as events and the news context. We address this issue by leveraging deep neural models to extract rich semantic features from news text. In particular, a bidirectional LSTM is used to encode the news text and capture the context information, and a self-attention mechanism is applied to distribute attention over the most relevant words, news items, and days. In terms of predicting directional changes in both the Standard & Poor’s 500 index and individual companies’ stock prices, we show that this technique is competitive with other state-of-the-art approaches, demonstrating the effectiveness of recent NLP technology advances for computational finance. |
Tasks | Stock Market Prediction, Stock Trend Prediction |
Published | 2018-11-15 |
URL | http://arxiv.org/abs/1811.06173v1 |
PDF | http://arxiv.org/pdf/1811.06173v1.pdf |
PWC | https://paperswithcode.com/paper/leveraging-financial-news-for-stock-trend |
Repo | https://github.com/maobubu/stock-prediction |
Framework | none |
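The news encoder described in the abstract, a bidirectional LSTM with self-attention pooling, looks roughly like the sketch below. Dimensions, the single attention head, and the word-level-only attention (the paper also attends over news items and days) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class NewsEncoder(nn.Module):
    """BiLSTM over word embeddings, with attention-weighted pooling that
    emphasizes the most relevant words, feeding an up/down classifier."""
    def __init__(self, vocab=10000, emb=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.bilstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, 2)        # directional change: up/down
    def forward(self, tokens):                     # (batch, seq)
        h, _ = self.bilstm(self.embed(tokens))     # (batch, seq, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)     # attention over words
        doc = (w * h).sum(dim=1)                   # weighted document summary
        return self.out(doc)

logits = NewsEncoder()(torch.randint(0, 10000, (8, 40)))
print(logits.shape)   # torch.Size([8, 2])
```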
Deep Geometric Prior for Surface Reconstruction
Title | Deep Geometric Prior for Surface Reconstruction |
Authors | Francis Williams, Teseo Schneider, Claudio Silva, Denis Zorin, Joan Bruna, Daniele Panozzo |
Abstract | The reconstruction of a discrete surface from a point cloud is a fundamental geometry processing problem that has been studied for decades, with many methods developed. We propose the use of a deep neural network as a geometric prior for surface reconstruction. Specifically, we overfit a neural network representing a local chart parameterization to part of an input point cloud, using the Wasserstein distance as a measure of approximation. By jointly fitting many such networks to overlapping parts of the point cloud, while enforcing a consistency condition, we compute a manifold atlas. By sampling this atlas, we can produce a dense reconstruction of the surface approximating the input cloud. The entire procedure requires no training data or explicit regularization, yet we show that it performs remarkably well: it avoids typical overfitting artifacts while closely approximating sharp features. We experimentally show that this geometric prior produces good results for both man-made objects containing sharp features and smoother organic objects, as well as for noisy inputs. We compare our method with a number of well-known reconstruction methods on a standard surface reconstruction benchmark. |
Tasks | |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10943v2 |
PDF | http://arxiv.org/pdf/1811.10943v2.pdf |
PWC | https://paperswithcode.com/paper/deep-geometric-prior-for-surface |
Repo | https://github.com/fwilliams/deep-geometric-prior |
Framework | pytorch |
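Fitting a single local chart is the building block: an MLP maps 2-D parameters to 3-D points and is deliberately overfit to one patch of the cloud. The sketch below substitutes a symmetric Chamfer distance for the paper's Wasserstein (Sinkhorn) distance for brevity; network sizes and iteration count are also illustrative.

```python
import torch
import torch.nn as nn

def chamfer(a, b):
    """Symmetric Chamfer distance between two point sets (stand-in for the
    Wasserstein distance used in the paper)."""
    d = torch.cdist(a, b)                        # pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

chart = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, 3))         # 2D parameters -> 3D surface
patch = torch.randn(256, 3)                      # one local piece of the cloud
uv = torch.rand(256, 2)                          # samples in parameter domain
opt = torch.optim.Adam(chart.parameters(), lr=1e-3)

for _ in range(200):                             # overfit on purpose: the net
    opt.zero_grad()                              # itself is the only prior
    loss = chamfer(chart(uv), patch)
    loss.backward()
    opt.step()

dense = chart(torch.rand(4096, 2))               # sample the chart densely
```

The full method fits many such charts to overlapping patches and enforces consistency where they overlap, yielding the manifold atlas.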
Sports Camera Calibration via Synthetic Data
Title | Sports Camera Calibration via Synthetic Data |
Authors | Jianhui Chen, James J. Little |
Abstract | Calibrating sports cameras is important for autonomous broadcasting and sports analysis. Here we propose a highly automatic method for calibrating sports cameras from a single image using synthetic data. First, we develop a novel camera pose engine. The camera pose engine has only three significant free parameters, so it can effectively generate a large number of camera poses and corresponding edge (i.e., field-marking) images. Then, we learn compact deep features via a siamese network from paired edge images and camera poses, and build a feature-pose database. After that, we use a novel two-GAN (generative adversarial network) model to detect field markings in real images. Finally, we query an initial camera pose from the feature-pose database and refine camera poses using truncated distance images. We evaluate our method on both synthetic and real data. Our method not only demonstrates robustness on synthetic data but also achieves state-of-the-art accuracy on a standard soccer dataset and very high performance on a volleyball dataset. |
Tasks | Calibration |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.10658v1 |
PDF | http://arxiv.org/pdf/1810.10658v1.pdf |
PWC | https://paperswithcode.com/paper/sports-camera-calibration-via-synthetic-data |
Repo | https://github.com/lood339/SCCvSD |
Framework | pytorch |
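The retrieval step at the heart of the pipeline is a nearest-neighbor query against the feature-pose database. A hedged sketch follows: the feature and pose dimensions are illustrative, the database entries are synthetic stand-ins, and the siamese embedding and the truncated-distance-image refinement are not shown.

```python
import numpy as np

# Database pairing deep edge-image features with the synthetic camera poses
# that generated them (here random placeholders of assumed dimensions).
rng = np.random.default_rng(0)
db_features = rng.normal(size=(100_000, 16))   # siamese features per pose
db_poses = rng.normal(size=(100_000, 9))       # e.g. pan, tilt, focal length, ...

def query_initial_pose(query_feat):
    """Return the pose whose stored feature is nearest to the feature of the
    detected field markings; this initial pose is then refined."""
    dists = np.linalg.norm(db_features - query_feat, axis=1)
    return db_poses[np.argmin(dists)]

init_pose = query_initial_pose(rng.normal(size=16))
print(init_pose.shape)   # (9,)
```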