Paper Group AWR 304
SceneEDNet: A Deep Learning Approach for Scene Flow Estimation. LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics. Multiple Instance Choquet Integral Classifier Fusion and Regression for Remote Sensing Applications. Neural Guided Constraint Logic Programming for Program Synthesis. Automatic Program Synthesis of Long Programs with a Learned Garbage Collector. The Mirage of Action-Dependent Baselines in Reinforcement Learning. Training and Refining Deep Learning Based Denoisers without Ground Truth Data. Learning Human-Object Interactions by Graph Parsing Neural Networks. Multi-Level Contextual Network for Biomedical Image Segmentation. Reachability Analysis of Deep Neural Networks with Provable Guarantees. Multimodal Sentiment Analysis To Explore the Structure of Emotions. Efficient Exploration through Bayesian Deep Q-Networks. Leveraging Financial News for Stock Trend Prediction with Attention-Based Recurrent Neural Network. Deep Geometric Prior for Surface Reconstruction. Sports Camera Calibration via Synthetic Data.
SceneEDNet: A Deep Learning Approach for Scene Flow Estimation
Title | SceneEDNet: A Deep Learning Approach for Scene Flow Estimation |
Authors | Ravi Kumar Thakur, Snehasis Mukherjee |
Abstract | Estimating scene flow in RGB-D videos is attracting considerable interest from computer vision researchers due to its potential applications in robotics. State-of-the-art techniques for scene flow estimation typically rely on knowledge of the scene structure of the frame and the correspondence between frames. However, with the increasing amount of RGB-D data captured by sophisticated sensors such as the Microsoft Kinect, and with recent advances in deep learning, an efficient deep learning technique for scene flow estimation is becoming important. This paper presents a first effort to apply deep learning directly to scene flow estimation, introducing a fully convolutional neural network with an encoder-decoder (ED) architecture. The proposed network, SceneEDNet, estimates the three-dimensional motion vectors of all scene points from a sequence of stereo images. Training for direct scene flow estimation is done on consecutive pairs of stereo images and the corresponding scene flow ground truth. The proposed architecture is applied to a large dataset and provides meaningful results. |
Tasks | Scene Flow Estimation |
Published | 2018-07-10 |
URL | http://arxiv.org/abs/1807.03464v1 |
PDF | http://arxiv.org/pdf/1807.03464v1.pdf |
PWC | https://paperswithcode.com/paper/sceneednet-a-deep-learning-approach-for-scene |
Repo | https://github.com/ravikt/sceneednet |
Framework | none |
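For readers who want the shape of the idea in code, here is a minimal sketch of an encoder-decoder network of the kind the abstract describes. The layer sizes and the input packing (two consecutive stereo RGB pairs stacked along the channel axis) are assumptions for illustration, not the authors' exact SceneEDNet architecture.

```python
import torch
import torch.nn as nn

class EncoderDecoderFlow(nn.Module):
    """Toy encoder-decoder CNN mapping stacked stereo frames to a dense
    3-channel output: one 3D motion vector (scene flow) per pixel."""
    def __init__(self, in_channels=12):  # 2 time steps x 2 views x RGB (assumed packing)
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1))  # 3D flow per pixel
    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.randn(1, 12, 128, 256)   # two consecutive stereo RGB pairs, stacked
flow = EncoderDecoderFlow()(x)     # -> (1, 3, 128, 256)
```

Training against ground-truth scene flow would then be a plain regression loss, e.g. MSE or endpoint error between `flow` and the ground-truth field.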
LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics
Title | LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics |
Authors | Sourav Garg, Niko Suenderhauf, Michael Milford |
Abstract | Human visual scene understanding is so remarkable that we are able to recognize a revisited place when entering it from the opposite direction to that in which it was first visited, even in the presence of extreme variations in appearance. This capability is especially apparent during driving: a human driver can recognize where they are when travelling in the reverse direction along a route for the first time, without having to turn back and look. The difficulty of this problem exceeds any addressed in past appearance- and viewpoint-invariant visual place recognition (VPR) research, in part because large parts of the scene are not commonly observable from opposite directions. Consequently, as shown in this paper, the precision-recall performance of current state-of-the-art viewpoint- and appearance-invariant VPR techniques is orders of magnitude below what would be usable in a closed-loop system. Current engineered solutions predominantly rely on panoramic camera or LIDAR sensing setups: an eminently suitable engineering solution, but one that is clearly very different from how humans navigate, which also has implications for how naturally humans could interact and communicate with the navigation system. In this paper we develop a suite of novel semantic- and appearance-based techniques to enable, for the first time, high-performance place recognition in this challenging scenario. We first propose a novel Local Semantic Tensor (LoST) descriptor of images using the convolutional feature maps from a state-of-the-art dense semantic segmentation network. Then, to verify the spatial semantic arrangement of the top matching candidates, we develop a novel approach for mining semantically salient keypoint correspondences. |
Tasks | Scene Understanding, Semantic Segmentation, Visual Place Recognition |
Published | 2018-04-16 |
URL | http://arxiv.org/abs/1804.05526v3 |
PDF | http://arxiv.org/pdf/1804.05526v3.pdf |
PWC | https://paperswithcode.com/paper/lost-appearance-invariant-place-recognition |
Repo | https://github.com/oravus/lostX |
Framework | none |
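The core of the LoST descriptor is aggregating dense convolutional features per semantic class. A hedged sketch of that aggregation follows; the mean-pooling and L2 normalization here are a simplification, not the paper's exact construction.

```python
import numpy as np

def lost_descriptor(feat, labels, num_classes):
    """Per-class aggregation of dense conv features (C, H, W): one C-dim
    vector per semantic class, pooled over the pixels the segmentation
    network assigns to that class. A simplified reading of LoST."""
    C, H, W = feat.shape
    flat = feat.reshape(C, -1)          # (C, H*W)
    lab = labels.reshape(-1)            # (H*W,) predicted class per pixel
    desc = np.zeros((num_classes, C))
    for k in range(num_classes):
        mask = lab == k
        if mask.any():
            desc[k] = flat[:, mask].mean(axis=1)
    norms = np.linalg.norm(desc, axis=1, keepdims=True)
    return desc / np.maximum(norms, 1e-8)   # L2-normalize each class vector

feat = np.random.rand(256, 32, 64)          # conv feature map
labels = np.random.randint(0, 20, (32, 64)) # dense semantic labels
print(lost_descriptor(feat, labels, 20).shape)  # (20, 256)
```

Matching two places then reduces to comparing these class-indexed tensors, which is what gives the descriptor its robustness to appearance change.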
Multiple Instance Choquet Integral Classifier Fusion and Regression for Remote Sensing Applications
Title | Multiple Instance Choquet Integral Classifier Fusion and Regression for Remote Sensing Applications |
Authors | Xiaoxiao Du, Alina Zare |
Abstract | In classifier (or regression) fusion, the aim is to combine the outputs of several algorithms to boost overall performance. Standard supervised fusion algorithms often require accurate and precise training labels. However, accurate labels may be difficult to obtain in many remote sensing applications. This paper proposes novel classification and regression fusion models that can be trained given ambiguously and imprecisely labeled training data, in which training labels are associated with sets of data points (i.e., “bags”) instead of individual data points (i.e., “instances”), following a multiple instance learning framework. Experiments were conducted with the proposed algorithms on both synthetic data and remote sensing applications such as target detection and crop yield prediction. The proposed algorithms show effective classification and regression performance. |
Tasks | Multiple Instance Learning |
Published | 2018-03-11 |
URL | http://arxiv.org/abs/1803.04048v2 |
PDF | http://arxiv.org/pdf/1803.04048v2.pdf |
PWC | https://paperswithcode.com/paper/multiple-instance-choquet-integral-classifier |
Repo | https://github.com/GatorSense/MICI |
Framework | none |
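The fusion operator here is the discrete Choquet integral, which combines sorted classifier outputs weighted by a fuzzy measure. Below is a standard implementation of that integral; learning the fuzzy measure from bag-level labels is the paper's MICI contribution and is not shown, so the toy measure values are placeholders.

```python
import numpy as np

def choquet_integral(h, g):
    """Discrete Choquet integral of classifier outputs h (length m) with
    respect to a fuzzy measure g, given as a dict mapping frozensets of
    source indices to measure values (g[full set] = 1, g[empty set] = 0)."""
    order = np.argsort(h)[::-1]        # sort sources by descending output
    total, prev = 0.0, 0.0
    subset = frozenset()
    for i in order:
        subset = subset | {i}          # grow the coalition of top sources
        gi = g[subset]
        total += h[i] * (gi - prev)    # weight by the measure increment
        prev = gi
    return total

# Toy fuzzy measure on two sources (in MICI these values would be learned
# from ambiguously labeled bags rather than hand-set).
g = {frozenset(): 0.0, frozenset({0}): 0.4,
     frozenset({1}): 0.5, frozenset({0, 1}): 1.0}
print(choquet_integral(np.array([0.9, 0.3]), g))  # fused confidence: 0.54
```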
Neural Guided Constraint Logic Programming for Program Synthesis
Title | Neural Guided Constraint Logic Programming for Program Synthesis |
Authors | Lisa Zhang, Gregory Rosenblatt, Ethan Fetaya, Renjie Liao, William E. Byrd, Matthew Might, Raquel Urtasun, Richard Zemel |
Abstract | Synthesizing programs using example input/outputs is a classic problem in artificial intelligence. We present a method for solving Programming By Example (PBE) problems by using a neural model to guide the search of a constraint logic programming system called miniKanren. Crucially, the neural model uses miniKanren’s internal representation as input; miniKanren represents a PBE problem as recursive constraints imposed by the provided examples. We explore Recurrent Neural Network and Graph Neural Network models. We contribute a modified miniKanren, drivable by an external agent, available at https://github.com/xuexue/neuralkanren. We show that our neural-guided approach using constraints can synthesize programs faster in many cases, and importantly, can generalize to larger problems. |
Tasks | Program Synthesis |
Published | 2018-09-08 |
URL | http://arxiv.org/abs/1809.02840v3 |
PDF | http://arxiv.org/pdf/1809.02840v3.pdf |
PWC | https://paperswithcode.com/paper/neural-guided-constraint-logic-programming |
Repo | https://github.com/xuexue/neuralkanren |
Framework | pytorch |
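The interaction pattern is the interesting part: miniKanren exposes candidate constraint states, and a neural model scores which branch to expand. A hedged sketch of that guidance loop follows; the tokenization, the `ConstraintScorer` module, and the candidate interface are illustrative stand-ins, not the released neuralkanren API.

```python
import torch
import torch.nn as nn

class ConstraintScorer(nn.Module):
    """Scores tokenized constraint states; the paper explores RNN and GNN
    variants, and this sketch uses a plain GRU."""
    def __init__(self, vocab=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, token_ids):                 # (batch, seq)
        h, _ = self.rnn(self.embed(token_ids))
        return self.head(h[:, -1]).squeeze(-1)    # one score per candidate

# At each search step the logic engine would expose candidate partial
# programs; the model picks which one the search expands next.
scorer = ConstraintScorer()
candidates = torch.randint(0, 128, (5, 20))       # 5 tokenized constraint states
best = int(torch.argmax(scorer(candidates)))      # branch to expand
```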
Automatic Program Synthesis of Long Programs with a Learned Garbage Collector
Title | Automatic Program Synthesis of Long Programs with a Learned Garbage Collector |
Authors | Amit Zohar, Lior Wolf |
Abstract | We consider the problem of automatically generating code from sample input-output pairs. We train a neural network to map from the current state and the outputs to the program’s next statement. The neural network optimizes multiple tasks concurrently: the next operation out of a set of high-level commands, the operands of the next statement, and which variables can be dropped from memory. Using our method we are able to create programs that are more than twice as long as existing state-of-the-art solutions, while improving the success rate for comparable lengths and cutting the run-time by two orders of magnitude. Our code, including an implementation of various literature baselines, is publicly available at https://github.com/amitz25/PCCoder. |
Tasks | Program Synthesis |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04682v2 |
PDF | http://arxiv.org/pdf/1809.04682v2.pdf |
PWC | https://paperswithcode.com/paper/automatic-program-synthesis-of-long-programs |
Repo | https://github.com/amitz25/PCCoder |
Framework | pytorch |
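The multi-task structure in the abstract maps naturally onto a network with three output heads. A sketch under assumed sizes is below; the state encoder, vocabulary of operations, and head shapes are illustrative rather than PCCoder's actual configuration.

```python
import torch
import torch.nn as nn

class SynthesisHeads(nn.Module):
    """Three concurrent predictions from a program-state embedding: the next
    operation, its operand, and which variables can be garbage-collected."""
    def __init__(self, state_dim=256, num_ops=40, num_vars=10):
        super().__init__()
        self.op_head = nn.Linear(state_dim, num_ops)        # next operation
        self.operand_head = nn.Linear(state_dim, num_vars)  # operand choice
        self.drop_head = nn.Linear(state_dim, num_vars)     # drop variable i?
    def forward(self, state):
        return (self.op_head(state),
                self.operand_head(state),
                torch.sigmoid(self.drop_head(state)))       # per-var drop prob

state = torch.randn(1, 256)   # embedding of current variables + target outputs
op_logits, operand_logits, drop_probs = SynthesisHeads()(state)
```

Dropping variables with high `drop_probs` is what keeps the state small, which is how the method scales to much longer programs.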
The Mirage of Action-Dependent Baselines in Reinforcement Learning
Title | The Mirage of Action-Dependent Baselines in Reinforcement Learning |
Authors | George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine |
Abstract | Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. To better understand this development, we decompose the variance of the policy gradient estimator and numerically show that learned state-action-dependent baselines do not in fact reduce variance over a state-dependent baseline in commonly tested benchmark domains. We confirm this unexpected result by reviewing the open-source code accompanying these prior papers, and show that subtle implementation decisions cause deviations from the methods presented in the papers and explain the source of the previously observed empirical gains. Furthermore, the variance decomposition highlights areas for improvement, which we demonstrate by illustrating a simple change to the typical value function parameterization that can significantly improve performance. |
Tasks | Policy Gradient Methods |
Published | 2018-02-27 |
URL | http://arxiv.org/abs/1802.10031v3 |
PDF | http://arxiv.org/pdf/1802.10031v3.pdf |
PWC | https://paperswithcode.com/paper/the-mirage-of-action-dependent-baselines-in |
Repo | https://github.com/brain-research/mirage-rl |
Framework | tf |
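The paper's analysis starts from the standard fact that a baseline b leaves the score-function gradient estimator unbiased while changing its variance; its claim is that extending b to depend on the action as well yields no real further reduction in the tested domains. The toy numerical check below illustrates only the baseline mechanics that the variance decomposition builds on, with a one-dimensional Gaussian policy chosen for illustration.

```python
import numpy as np

# Score-function estimator g = d/dmu[log pi(a)] * (R(a) - b) for a Gaussian
# policy pi = N(mu, 1) and reward R(a) = -(a - 1)^2. Any constant (or, in RL,
# state-dependent) baseline b leaves E[g] unchanged but shrinks Var[g].
rng = np.random.default_rng(0)
mu = 0.0
a = rng.normal(mu, 1.0, size=1_000_000)
R = -(a - 1.0) ** 2
score = a - mu                       # d/dmu log N(a; mu, 1)

for b in (0.0, R.mean()):            # no baseline vs. value-like baseline
    g = score * (R - b)
    print(f"baseline={b:+.3f}  mean={g.mean():+.4f}  var={g.var():.4f}")
# The mean (the true gradient, 2 at mu=0) is unchanged; only variance drops.
```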
Training and Refining Deep Learning Based Denoisers without Ground Truth Data
Title | Training and Refining Deep Learning Based Denoisers without Ground Truth Data |
Authors | Shakarim Soltanayev, Se Young Chun |
Abstract | Recently developed deep-learning-based denoisers often outperform state-of-the-art conventional denoisers such as BM3D. They are typically trained to minimize the mean squared error (MSE) between the output image of a deep neural network (DNN) and a ground truth image. Thus, it is important for deep-learning-based denoisers to use high-quality noiseless ground truth data to achieve high performance. However, it is often challenging or even infeasible to obtain noiseless images in some applications. Here, we propose a method based on Stein’s unbiased risk estimator (SURE) for training DNN denoisers using only noisy training images corrupted by Gaussian noise. We demonstrate that our SURE-based method, without the use of ground truth data, is able to train DNN denoisers to yield performance close to that of networks trained with ground truth, for both grayscale and color images. We also propose a SURE-based refining method that uses a noisy test image for further performance improvement. Our quick refining method outperformed the conventional BM3D, the deep image prior, and often even the networks trained with ground truth. A potential extension of our SURE-based methods to the Poisson noise model was also investigated. |
Tasks | Image Denoising |
Published | 2018-03-04 |
URL | http://arxiv.org/abs/1803.01314v3 |
PDF | http://arxiv.org/pdf/1803.01314v3.pdf |
PWC | https://paperswithcode.com/paper/training-deep-learning-based-denoisers |
Repo | https://github.com/Shakarim94/Net-SURE |
Framework | tf |
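The trick that makes ground-truth-free training possible is that SURE gives an unbiased estimate of the MSE to the unseen clean image using only the noisy image and the known noise level, with the divergence term approximated by a Monte Carlo probe. A minimal sketch of that Monte-Carlo SURE loss (PyTorch here, though the released code is TensorFlow):

```python
import torch

def mc_sure_loss(denoiser, y, sigma, eps=1e-3):
    """Monte-Carlo SURE: unbiased estimate of per-pixel MSE to the clean
    image, from the noisy image y alone (Gaussian noise, known std sigma).
    The network divergence tr(df/dy) is estimated with one random probe."""
    n = y.numel()
    fy = denoiser(y)
    b = torch.randn_like(y)
    div = (b * (denoiser(y + eps * b) - fy)).sum() / eps
    return ((fy - y) ** 2).sum() / n - sigma ** 2 + (2 * sigma ** 2 / n) * div

# Usage with any differentiable denoiser; a single conv layer stands in here.
net = torch.nn.Conv2d(1, 1, 3, padding=1)
y = torch.randn(1, 1, 32, 32)           # noisy observation, sigma assumed known
loss = mc_sure_loss(net, y, sigma=0.5)
loss.backward()                         # train with no ground truth at all
```

The refining step in the paper applies the same loss to the noisy test image itself, fine-tuning the trained network per image.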
Learning Human-Object Interactions by Graph Parsing Neural Networks
Title | Learning Human-Object Interactions by Graph Parsing Neural Networks |
Authors | Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing Shen, Song-Chun Zhu |
Abstract | This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end. For a given scene, GPNN infers a parse graph that includes i) the HOI graph structure, represented by an adjacency matrix, and ii) the node labels. Within a message passing inference framework, GPNN iteratively computes the adjacency matrices and node labels. We extensively evaluate our model on three HOI detection benchmarks spanning images and videos: the HICO-DET, V-COCO, and CAD-120 datasets. Our approach significantly outperforms state-of-the-art methods, verifying that GPNN is scalable to large datasets and applies to spatio-temporal settings. The code is available at https://github.com/SiyuanQi/gpnn. |
Tasks | Human-Object Interaction Detection |
Published | 2018-08-23 |
URL | http://arxiv.org/abs/1808.07962v1 |
PDF | http://arxiv.org/pdf/1808.07962v1.pdf |
PWC | https://paperswithcode.com/paper/learning-human-object-interactions-by-graph |
Repo | https://github.com/SiyuanQi/gpnn |
Framework | pytorch |
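A single GPNN-style iteration has two moves: infer a soft adjacency matrix from pairwise node features, then pass messages along it and update node states. The sketch below shows one such step; the specific link, message, and update functions (a linear scorer and a GRU cell) are plausible choices for illustration, not the paper's exact modules.

```python
import torch
import torch.nn as nn

class GraphParsingStep(nn.Module):
    """One message-passing step with a learned soft graph structure."""
    def __init__(self, dim=64):
        super().__init__()
        self.link = nn.Linear(2 * dim, 1)     # scores each (i, j) node pair
        self.update = nn.GRUCell(dim, dim)    # node-state update function
    def forward(self, h):                     # h: (N, dim) node features
        N, d = h.shape
        pairs = torch.cat([h.unsqueeze(1).expand(N, N, d),
                           h.unsqueeze(0).expand(N, N, d)], dim=-1)
        adj = torch.sigmoid(self.link(pairs)).squeeze(-1)  # (N, N) soft graph
        messages = adj @ h                    # aggregate neighbor features
        return self.update(messages, h), adj

h = torch.randn(6, 64)                        # human + object node features
h_new, adjacency = GraphParsingStep()(h)      # iterate; then read node labels
```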
Multi-Level Contextual Network for Biomedical Image Segmentation
Title | Multi-Level Contextual Network for Biomedical Image Segmentation |
Authors | Amirhossein Dadashzadeh, Alireza Tavakoli Targhi |
Abstract | Accurate and reliable image segmentation is an essential part of biomedical image analysis. In this paper, we consider the problem of biomedical image segmentation using deep convolutional neural networks. We propose a new end-to-end network architecture that effectively integrates local and global contextual patterns of histologic primitives to obtain a more reliable segmentation result. Specifically, we introduce a deep fully convolutional residual network with a new skip-connection strategy to control the contextual information passed forward. Moreover, our trained model is computationally inexpensive due to its small number of network parameters. We evaluate our method on two public datasets for the epithelium segmentation and tubule segmentation tasks. Our experimental results show that the proposed method provides a fast and effective way of producing pixel-wise dense predictions for biomedical images. |
Tasks | Semantic Segmentation |
Published | 2018-09-30 |
URL | http://arxiv.org/abs/1810.00327v1 |
PDF | http://arxiv.org/pdf/1810.00327v1.pdf |
PWC | https://paperswithcode.com/paper/multi-level-contextual-network-for-biomedical |
Repo | https://github.com/Plrbear/biomedical-image-segmentation |
Framework | tf |
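One plausible reading of "a skip connection strategy to control the contextual information passed forward" is a gated residual connection, sketched below. This gating is an assumption for illustration, not the paper's exact strategy.

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """Residual block whose skip path is modulated by a learned gate,
    so the network can decide how much context to pass forward."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
        self.gate = nn.Conv2d(ch, ch, 1)   # 1x1 conv: what to let through
    def forward(self, x):
        return torch.relu(self.body(x) + torch.sigmoid(self.gate(x)) * x)

x = torch.randn(1, 64, 48, 48)
print(GatedResidualBlock()(x).shape)       # torch.Size([1, 64, 48, 48])
```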
Reachability Analysis of Deep Neural Networks with Provable Guarantees
Title | Reachability Analysis of Deep Neural Networks with Provable Guarantees |
Authors | Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska |
Abstract | Verifying the correctness of deep neural networks (DNNs) is challenging. We study a generic reachability problem for feed-forward DNNs: for a given set of inputs to the network and a Lipschitz-continuous function over its outputs, compute lower and upper bounds on the function values. Because the network and the function are Lipschitz continuous, all values in the interval between the lower and upper bound are reachable. We show how to obtain the safety verification problem, the output range analysis problem, and a robustness measure by instantiating the reachability problem. We present a novel algorithm, based on adaptive nested optimisation, to solve the reachability problem. The technique has been implemented and evaluated on a range of DNNs, demonstrating its efficiency, scalability, and ability to handle a broader class of networks than state-of-the-art verification approaches. |
Tasks | |
Published | 2018-05-06 |
URL | http://arxiv.org/abs/1805.02242v1 |
PDF | http://arxiv.org/pdf/1805.02242v1.pdf |
PWC | https://paperswithcode.com/paper/reachability-analysis-of-deep-neural-networks |
Repo | https://github.com/trustAI/DeepGO |
Framework | none |
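The core observation is that if the composed function is L-Lipschitz on an input region, then sampled values plus an L-based slack bracket the true extremum, so branch-and-bound can certify bounds to any tolerance. The 1-D illustration below captures that mechanism; the paper's algorithm nests this adaptively over high-dimensional inputs, which is not reproduced here.

```python
import heapq

def lipschitz_min(g, lo, hi, L, tol=1e-3):
    """Certified minimum of an L-Lipschitz function g on [lo, hi]:
    returns (best sampled value, certified lower bound), gap <= tol."""
    best = min(g(lo), g(hi))                    # best value seen so far
    heap = [(best - L * (hi - lo) / 2, lo, hi)] # (lower bound, a, b)
    while heap:
        bound, a, b = heapq.heappop(heap)       # interval with smallest bound
        if best - bound <= tol:
            return best, bound                  # true min lies in [bound, best]
        m = (a + b) / 2
        best = min(best, g(m))
        for x0, x1 in ((a, m), (m, b)):         # bisect and re-bound children
            lb = min(g(x0), g(x1)) - L * (x1 - x0) / 2
            if lb < best - tol:                 # prune provably hopeless pieces
                heapq.heappush(heap, (lb, x0, x1))
    return best, best - tol

print(lipschitz_min(lambda x: (x - 0.3) ** 2, 0.0, 1.0, L=2.0))
```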
Multimodal Sentiment Analysis To Explore the Structure of Emotions
Title | Multimodal Sentiment Analysis To Explore the Structure of Emotions |
Authors | Anthony Hu, Seth Flaxman |
Abstract | We propose a novel approach to multimodal sentiment analysis using deep neural networks combining visual analysis and natural language processing. Our goal is different from the standard sentiment analysis goal of predicting whether a sentence expresses positive or negative sentiment; instead, we aim to infer the latent emotional state of the user. Thus, we focus on predicting the emotion word tags attached by users to their Tumblr posts, treating these as “self-reported emotions.” We demonstrate that our multimodal model combining both text and image features outperforms separate models based solely on either images or text. Our model’s results are interpretable, automatically yielding sensible word lists associated with emotions. We explore the structure of emotions implied by our model, compare it to what has been posited in the psychology literature, and validate our model on a set of images that have been used in psychology studies. Finally, our work also provides a useful tool for the growing academic study of images, both photographs and memes, on social networks. |
Tasks | Multimodal Sentiment Analysis, Sentiment Analysis |
Published | 2018-05-25 |
URL | http://arxiv.org/abs/1805.10205v1 |
PDF | http://arxiv.org/pdf/1805.10205v1.pdf |
PWC | https://paperswithcode.com/paper/multimodal-sentiment-analysis-to-explore-the |
Repo | https://github.com/anthonyhu/tumblr-emotions |
Framework | tf |
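Structurally, the multimodal model is a fusion of an image feature vector and a text feature vector into an emotion-tag classifier. The sketch below shows that fusion; the feature dimensions, the single fusion layer, and the count of emotion tags are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultimodalEmotionClassifier(nn.Module):
    """Concatenate image features (e.g. from a pretrained CNN) and text
    features (e.g. from a word-level encoder) and classify the post's
    self-reported emotion tag."""
    def __init__(self, img_dim=2048, txt_dim=300, num_emotions=15):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 256), nn.ReLU(),
            nn.Linear(256, num_emotions))
    def forward(self, img_feat, txt_feat):
        return self.fuse(torch.cat([img_feat, txt_feat], dim=-1))

logits = MultimodalEmotionClassifier()(torch.randn(4, 2048), torch.randn(4, 300))
print(logits.shape)   # torch.Size([4, 15])
```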
Efficient Exploration through Bayesian Deep Q-Networks
Title | Efficient Exploration through Bayesian Deep Q-Networks |
Authors | Kamyar Azizzadenesheli, Animashree Anandkumar |
Abstract | We study reinforcement learning (RL) in high-dimensional episodic Markov decision processes (MDPs). We consider value-based RL when the optimal Q-value is a linear function of a d-dimensional state-action feature representation. For instance, in deep Q-networks (DQN), the Q-value is a linear function of the feature representation layer (output layer). We propose two algorithms, one based on optimism, LINUCB, and another based on posterior sampling, LINPSRL. We guarantee frequentist and Bayesian regret upper bounds of O(d√T) for these two algorithms, where T is the number of episodes. We extend these methods to deep RL and propose Bayesian deep Q-networks (BDQN), which uses an efficient Thompson sampling algorithm for high-dimensional RL. We deploy the double DQN (DDQN) approach, and instead of learning the last layer of the Q-network using linear regression, we use Bayesian linear regression, resulting in an approximate posterior over the Q-function. This allows us to directly incorporate uncertainty over the Q-function and deploy Thompson sampling on the learned posterior distribution, resulting in an efficient exploration/exploitation trade-off. We empirically study the behavior of BDQN on a wide range of Atari games. Since BDQN carries out more efficient exploration and exploitation, it is able to reach substantially higher returns faster than DDQN. |
Tasks | Atari Games, Efficient Exploration |
Published | 2018-02-13 |
URL | https://arxiv.org/abs/1802.04412v4 |
PDF | https://arxiv.org/pdf/1802.04412v4.pdf |
PWC | https://paperswithcode.com/paper/efficient-exploration-through-bayesian-deep-q |
Repo | https://github.com/kazizzad/BDQN-MxNet-Gluon |
Framework | mxnet |
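The BDQN-specific ingredient is replacing the Q-network's last linear layer with Bayesian linear regression over the feature layer, then Thompson-sampling a weight vector to act with. A NumPy sketch of that posterior-and-sample step follows; the prior variances, noise level, and feature shapes are illustrative assumptions.

```python
import numpy as np

def blr_posterior(Phi, y, sigma2=1.0, prior_var=1.0):
    """Posterior N(mean, cov) over last-layer weights given feature matrix
    Phi (n, d) and regression targets y (n,), under a Gaussian prior."""
    d = Phi.shape[1]
    prec = Phi.T @ Phi / sigma2 + np.eye(d) / prior_var  # posterior precision
    cov = np.linalg.inv(prec)
    mean = cov @ Phi.T @ y / sigma2
    return mean, cov

rng = np.random.default_rng(0)
Phi = rng.normal(size=(500, 32))   # last-layer features of state-action pairs
y = Phi @ rng.normal(size=32) + 0.1 * rng.normal(size=500)  # TD-style targets

mean, cov = blr_posterior(Phi, y)
w = rng.multivariate_normal(mean, cov)   # Thompson sample for this episode
q_values = Phi[:5] @ w                   # act greedily w.r.t. the sampled Q
```

Resampling w between episodes is what replaces epsilon-greedy dithering with posterior-driven exploration.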
Leveraging Financial News for Stock Trend Prediction with Attention-Based Recurrent Neural Network
Title | Leveraging Financial News for Stock Trend Prediction with Attention-Based Recurrent Neural Network |
Authors | Huicheng Liu |
Abstract | Stock market prediction is one of the most attractive research topics, since successful prediction of the market’s future movement leads to significant profit. Traditional short-term stock market predictions are usually based on the analysis of historical market data, such as stock prices, moving averages, or daily returns. However, financial news also contains useful information on public companies and the market. Existing methods in the finance literature exploit sentiment signal features, which are limited by not considering factors such as events and the news context. We address this issue by leveraging deep neural models to extract rich semantic features from news text. In particular, a bidirectional LSTM is used to encode the news text and capture the context information, and a self-attention mechanism is applied to distribute attention over the most relevant words, news items, and days. In terms of predicting directional changes in both the Standard & Poor’s 500 index and individual companies’ stock prices, we show that this technique is competitive with other state-of-the-art approaches, demonstrating the effectiveness of recent NLP technology advances for computational finance. |
Tasks | Stock Market Prediction, Stock Trend Prediction |
Published | 2018-11-15 |
URL | http://arxiv.org/abs/1811.06173v1 |
PDF | http://arxiv.org/pdf/1811.06173v1.pdf |
PWC | https://paperswithcode.com/paper/leveraging-financial-news-for-stock-trend |
Repo | https://github.com/maobubu/stock-prediction |
Framework | none |
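The news encoder described in the abstract, a bidirectional LSTM with self-attention pooling, looks roughly like the sketch below. Dimensions, the single attention head, and the word-level-only attention (the paper also attends over news items and days) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class NewsEncoder(nn.Module):
    """BiLSTM over word embeddings, with attention-weighted pooling that
    emphasizes the most relevant words, feeding an up/down classifier."""
    def __init__(self, vocab=10000, emb=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.bilstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, 2)        # directional change: up/down
    def forward(self, tokens):                     # (batch, seq)
        h, _ = self.bilstm(self.embed(tokens))     # (batch, seq, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)     # attention over words
        doc = (w * h).sum(dim=1)                   # weighted document summary
        return self.out(doc)

logits = NewsEncoder()(torch.randint(0, 10000, (8, 40)))
print(logits.shape)   # torch.Size([8, 2])
```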
Deep Geometric Prior for Surface Reconstruction
Title | Deep Geometric Prior for Surface Reconstruction |
Authors | Francis Williams, Teseo Schneider, Claudio Silva, Denis Zorin, Joan Bruna, Daniele Panozzo |
Abstract | The reconstruction of a discrete surface from a point cloud is a fundamental geometry processing problem that has been studied for decades, with many methods developed. We propose the use of a deep neural network as a geometric prior for surface reconstruction. Specifically, we overfit a neural network representing a local chart parameterization to part of an input point cloud, using the Wasserstein distance as a measure of approximation. By jointly fitting many such networks to overlapping parts of the point cloud, while enforcing a consistency condition, we compute a manifold atlas. By sampling this atlas, we can produce a dense reconstruction of the surface approximating the input cloud. The entire procedure requires no training data or explicit regularization, yet we show that it performs remarkably well: it avoids typical overfitting artifacts while closely approximating sharp features. We experimentally show that this geometric prior produces good results for both man-made objects containing sharp features and smoother organic objects, as well as for noisy inputs. We compare our method with a number of well-known reconstruction methods on a standard surface reconstruction benchmark. |
Tasks | |
Published | 2018-11-27 |
URL | http://arxiv.org/abs/1811.10943v2 |
PDF | http://arxiv.org/pdf/1811.10943v2.pdf |
PWC | https://paperswithcode.com/paper/deep-geometric-prior-for-surface |
Repo | https://github.com/fwilliams/deep-geometric-prior |
Framework | pytorch |
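Fitting a single local chart is the building block: an MLP maps 2-D parameters to 3-D points and is deliberately overfit to one patch of the cloud. The sketch below substitutes a symmetric Chamfer distance for the paper's Wasserstein (Sinkhorn) distance for brevity; network sizes and iteration count are also illustrative.

```python
import torch
import torch.nn as nn

def chamfer(a, b):
    """Symmetric Chamfer distance between two point sets (stand-in for the
    Wasserstein distance used in the paper)."""
    d = torch.cdist(a, b)                        # pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

chart = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, 3))         # 2D parameters -> 3D surface
patch = torch.randn(256, 3)                      # one local piece of the cloud
uv = torch.rand(256, 2)                          # samples in parameter domain
opt = torch.optim.Adam(chart.parameters(), lr=1e-3)

for _ in range(200):                             # overfit on purpose: the net
    opt.zero_grad()                              # itself is the only prior
    loss = chamfer(chart(uv), patch)
    loss.backward()
    opt.step()

dense = chart(torch.rand(4096, 2))               # sample the chart densely
```

The full method fits many such charts to overlapping patches and enforces consistency where they overlap, yielding the manifold atlas.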
Sports Camera Calibration via Synthetic Data
Title | Sports Camera Calibration via Synthetic Data |
Authors | Jianhui Chen, James J. Little |
Abstract | Calibrating sports cameras is important for autonomous broadcasting and sports analysis. Here we propose a highly automatic method for calibrating sports cameras from a single image using synthetic data. First, we develop a novel camera pose engine. The camera pose engine has only three significant free parameters, so it can effectively generate a large number of camera poses and corresponding edge (i.e., field-marking) images. Then, we learn compact deep features via a siamese network from paired edge images and camera poses, and build a feature-pose database. After that, we use a novel two-GAN (generative adversarial network) model to detect field markings in real images. Finally, we query an initial camera pose from the feature-pose database and refine camera poses using truncated distance images. We evaluate our method on both synthetic and real data. Our method not only demonstrates robustness on synthetic data but also achieves state-of-the-art accuracy on a standard soccer dataset and very high performance on a volleyball dataset. |
Tasks | Calibration |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.10658v1 |
PDF | http://arxiv.org/pdf/1810.10658v1.pdf |
PWC | https://paperswithcode.com/paper/sports-camera-calibration-via-synthetic-data |
Repo | https://github.com/lood339/SCCvSD |
Framework | pytorch |
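The retrieval step at the heart of the pipeline is a nearest-neighbor query against the feature-pose database. A hedged sketch follows: the feature and pose dimensions are illustrative, the database entries are synthetic stand-ins, and the siamese embedding and the truncated-distance-image refinement are not shown.

```python
import numpy as np

# Database pairing deep edge-image features with the synthetic camera poses
# that generated them (here random placeholders of assumed dimensions).
rng = np.random.default_rng(0)
db_features = rng.normal(size=(100_000, 16))   # siamese features per pose
db_poses = rng.normal(size=(100_000, 9))       # e.g. pan, tilt, focal length, ...

def query_initial_pose(query_feat):
    """Return the pose whose stored feature is nearest to the feature of the
    detected field markings; this initial pose is then refined."""
    dists = np.linalg.norm(db_features - query_feat, axis=1)
    return db_poses[np.argmin(dists)]

init_pose = query_initial_pose(rng.normal(size=16))
print(init_pose.shape)   # (9,)
```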