January 27, 2020


Paper Group ANR 1073


AMNet: Deep Atrous Multiscale Stereo Disparity Estimation Networks


Title	AMNet: Deep Atrous Multiscale Stereo Disparity Estimation Networks
Authors	Xianzhi Du, Mostafa El-Khamy, Jungwon Lee
Abstract	In this paper, a new deep learning architecture for stereo disparity estimation is proposed. The proposed atrous multiscale network (AMNet) adopts an efficient feature extractor with depthwise-separable convolutions and an extended cost volume that deploys novel stereo matching costs on the deep features. A stacked atrous multiscale network is proposed to aggregate rich multiscale contextual information from the cost volume which allows for estimating the disparity with high accuracy at multiple scales. AMNet can be further modified to be a foreground-background aware network, FBA-AMNet, which is capable of discriminating between the foreground and the background objects in the scene at multiple scales. An iterative multitask learning method is proposed to train FBA-AMNet end-to-end. The proposed disparity estimation networks, AMNet and FBA-AMNet, show accurate disparity estimates and advance the state of the art on the challenging Middlebury, KITTI 2012, KITTI 2015, and Sceneflow stereo disparity estimation benchmarks.
Tasks	Disparity Estimation, Stereo Matching, Stereo Matching Hand
Published	2019-04-19
URL	http://arxiv.org/abs/1904.09099v1
PDF	http://arxiv.org/pdf/1904.09099v1.pdf
PWC	https://paperswithcode.com/paper/amnet-deep-atrous-multiscale-stereo-disparity
Repo
Framework
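
The abstract mentions a feature extractor built from depthwise-separable convolutions. As a rough illustration of why such layers are usually considered efficient, the sketch below compares weight counts for a standard convolution and a depthwise-separable one. The layer sizes are hypothetical and not taken from the paper, and biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```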

Toward Automated Quest Generation in Text-Adventure Games


Title	Toward Automated Quest Generation in Text-Adventure Games
Authors	Prithviraj Ammanabrolu, William Broniec, Alex Mueller, Jeremy Paul, Mark O. Riedl
Abstract	Interactive fictions, or text-adventures, are games in which a player interacts with a world entirely through textual descriptions and text actions. Text-adventure games are typically structured as puzzles or quests wherein the player must execute certain actions in a certain order to succeed. In this paper, we consider the problem of procedurally generating a quest, defined as a series of actions required to progress towards a goal, in a text-adventure game. Quest generation in text environments is challenging because the generated quests must be semantically coherent. We present and evaluate two quest generation techniques: (1) a Markov model, and (2) a neural generative model. We specifically look at generating quests about cooking and train our models on recipe data. We evaluate our techniques with human participant studies looking at perceived creativity and coherence.
Tasks
Published	2019-09-13
URL	https://arxiv.org/abs/1909.06283v3
PDF	https://arxiv.org/pdf/1909.06283v3.pdf
PWC	https://paperswithcode.com/paper/toward-automated-quest-generation-in-text
Repo
Framework
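
The first technique named in the abstract is a Markov model over quest actions. A minimal sketch of that general idea follows, assuming a first-order chain over whole action strings. The toy "recipes", the `<start>`/`<end>` tokens, and the function names are hypothetical; the paper's actual model and data may differ.

```python
import random
from collections import defaultdict

START, END = "<start>", "<end>"


def fit_markov(sequences):
    """Record, for each action, every action observed to follow it."""
    transitions = defaultdict(list)
    for seq in sequences:
        path = [START, *seq, END]
        for current, nxt in zip(path, path[1:]):
            transitions[current].append(nxt)  # repeats encode frequency
    return transitions


def generate_quest(transitions, rng, max_steps=10):
    """Sample actions until END is drawn or max_steps is reached."""
    quest, current = [], START
    for _ in range(max_steps):
        current = rng.choice(transitions[current])
        if current == END:
            break
        quest.append(current)
    return quest


# Hypothetical toy data standing in for recipe-derived action sequences.
recipes = [
    ["take knife", "chop carrot", "boil carrot", "serve soup"],
    ["take knife", "chop onion", "fry onion", "serve soup"],
    ["take pot", "boil carrot", "serve soup"],
]
model = fit_markov(recipes)
print(generate_quest(model, random.Random(0)))
```

Because each step depends only on the previous action, a chain like this can splice together fragments of different recipes, which is one way such a model can lose global coherence.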

PWOC-3D: Deep Occlusion-Aware End-to-End Scene Flow Estimation


Title	PWOC-3D: Deep Occlusion-Aware End-to-End Scene Flow Estimation
Authors	Rohan Saxena, René Schuster, Oliver Wasenmüller, Didier Stricker
Abstract	In the last few years, convolutional neural networks (CNNs) have demonstrated increasing success at learning many computer vision tasks including dense estimation problems such as optical flow and stereo matching. However, the joint prediction of these tasks, called scene flow, has traditionally been tackled using slow classical methods based on primitive assumptions which fail to generalize. The work presented in this paper overcomes these drawbacks efficiently (in terms of speed and accuracy) by proposing PWOC-3D, a compact CNN architecture to predict scene flow from stereo image sequences in an end-to-end supervised setting. Further, large motion and occlusions are well-known problems in scene flow estimation. PWOC-3D employs specialized design decisions to explicitly model these challenges. In this regard, we propose a novel self-supervised strategy to predict occlusions from images (learned without any labeled occlusion data). Leveraging several such constructs, our network achieves competitive results on the KITTI benchmark and the challenging FlyingThings3D dataset. Especially on KITTI, PWOC-3D achieves the second place among end-to-end deep learning methods with 48 times fewer parameters than the top-performing method.
Tasks	Optical Flow Estimation, Scene Flow Estimation, Stereo Matching, Stereo Matching Hand
Published	2019-04-12
URL	http://arxiv.org/abs/1904.06116v1
PDF	http://arxiv.org/pdf/1904.06116v1.pdf
PWC	https://paperswithcode.com/paper/pwoc-3d-deep-occlusion-aware-end-to-end-scene
Repo
Framework

Real-Time Dense Stereo Embedded in A UAV for Road Inspection


Title	Real-Time Dense Stereo Embedded in A UAV for Road Inspection
Authors	Rui Fan, Jianhao Jiao, Jie Pan, Huaiyang Huang, Shaojie Shen, Ming Liu
Abstract	The condition assessment of road surfaces is essential to ensure their serviceability while still providing maximum road traffic safety. This paper presents a robust stereo vision system embedded in an unmanned aerial vehicle (UAV). The perspective view of the target image is first transformed into the reference view, and this not only improves the disparity accuracy, but also reduces the algorithm’s computational complexity. The cost volumes generated from stereo matching are then filtered using a bilateral filter. The latter has been proved to be a feasible solution for the functional minimisation problem in a fully connected Markov random field model. Finally, the disparity maps are transformed by minimising an energy function with respect to the roll angle and disparity projection model. This makes the damaged road areas more distinguishable from the road surface. The proposed system is implemented on an NVIDIA Jetson TX2 GPU with CUDA for real-time purposes. It is demonstrated through experiments that the damaged road areas can be easily distinguished from the transformed disparity maps.
Tasks	Stereo Matching, Stereo Matching Hand
Published	2019-04-12
URL	http://arxiv.org/abs/1904.06017v1
PDF	http://arxiv.org/pdf/1904.06017v1.pdf
PWC	https://paperswithcode.com/paper/real-time-dense-stereo-embedded-in-a-uav-for
Repo
Framework
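
The abstract says the stereo cost volumes are filtered with a bilateral filter. The sketch below shows a plain 1-D bilateral filter applied to a single row of values, guided by a separate intensity row. It is a simplified illustration, not the paper's CUDA implementation, and the parameter names and defaults (`radius`, `sigma_s`, `sigma_r`) are assumptions.

```python
import math


def bilateral_filter_1d(values, guide, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Smooth `values`, weighting neighbours by distance and by similarity in `guide`."""
    out = []
    n = len(values)
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            spatial = (i - j) ** 2 / (2 * sigma_s ** 2)
            photometric = (guide[i] - guide[j]) ** 2 / (2 * sigma_r ** 2)
            w = math.exp(-spatial - photometric)
            num += w * values[j]
            den += w
        out.append(num / den)
    return out


# Hypothetical row with a step edge; smoothing should not leak across the edge.
guide = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
costs = [0.1, -0.1, 0.05, 0.0, 1.1, 0.9, 1.0, 1.05]
print([round(v, 3) for v in bilateral_filter_1d(costs, guide)])
```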

A fast multi-object tracking system using an object detector ensemble


Title	A fast multi-object tracking system using an object detector ensemble
Authors	Richard Cobos, Jefferson Hernandez, Andres G. Abad
Abstract	Multiple-Object Tracking (MOT) is of crucial importance for applications such as retail video analytics and video surveillance. Object detectors are often the computational bottleneck of modern MOT systems, limiting their use for real-time applications. In this paper, we address this issue by leveraging an ensemble of detectors, each running every f frames. We measured the performance of our system on the MOT16 benchmark. The proposed model surpassed other online entries of the MOT16 challenge in speed, while maintaining an acceptable accuracy.
Tasks	Multi-Object Tracking, Multiple Object Tracking, Object Tracking
Published	2019-08-06
URL	https://arxiv.org/abs/1908.04349v1
PDF	https://arxiv.org/pdf/1908.04349v1.pdf
PWC	https://paperswithcode.com/paper/a-fast-multi-object-tracking-system-using-an
Repo
Framework
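
The abstract describes an ensemble of detectors, "each running every f frames", without giving the schedule. One possible reading is sketched below: each detector gets a fixed offset and fires when the frame index matches that offset modulo f. The offsets and the function name are hypothetical, and the paper's actual scheduling may differ.

```python
def detectors_for_frame(frame_idx: int, offsets: list[int], f: int) -> list[int]:
    """Return indices of detectors scheduled on this frame.

    Assumed schedule: detector i runs when frame_idx % f == offsets[i] % f.
    """
    return [i for i, off in enumerate(offsets) if frame_idx % f == off % f]


# Hypothetical setup: two detectors, each running every 4 frames, staggered by 2.
f = 4
offsets = [0, 2]
for frame in range(8):
    print(frame, detectors_for_frame(frame, offsets, f))
```

Under this assumed schedule some detection arrives every two frames, while each individual detector only pays its cost once every four frames; frames with no detector would rely on the tracker alone.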

SDC - Stacked Dilated Convolution: A Unified Descriptor Network for Dense Matching Tasks


Title	SDC - Stacked Dilated Convolution: A Unified Descriptor Network for Dense Matching Tasks
Authors	René Schuster, Oliver Wasenmüller, Christian Unger, Didier Stricker
Abstract	Dense pixel matching is important for many computer vision tasks such as disparity and flow estimation. We present a robust, unified descriptor network that considers a large context region with high spatial variance. Our network has a very large receptive field and avoids striding layers to maintain spatial resolution. These properties are achieved by creating a novel neural network layer that consists of multiple, parallel, stacked dilated convolutions (SDC). Several of these layers are combined to form our SDC descriptor network. In our experiments, we show that our SDC features outperform state-of-the-art feature descriptors in terms of accuracy and robustness. In addition, we demonstrate the superior performance of SDC in state-of-the-art stereo matching, optical flow and scene flow algorithms on several famous public benchmarks.
Tasks	Optical Flow Estimation, Stereo Matching, Stereo Matching Hand
Published	2019-04-05
URL	http://arxiv.org/abs/1904.03076v1
PDF	http://arxiv.org/pdf/1904.03076v1.pdf
PWC	https://paperswithcode.com/paper/sdc-stacked-dilated-convolution-a-unified
Repo
Framework
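
The abstract describes a layer made of multiple parallel dilated convolutions that avoids striding to keep spatial resolution. A minimal 1-D sketch of that idea follows. It is a simplification under stated assumptions: the paper's layer operates on images with learned weights, whereas here one fixed kernel is shared across branches, and the dilation rates are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-size 1-D dilated cross-correlation, zero padding, stride 1 (odd-length kernel)."""
    half = (len(kernel) // 2) * dilation
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i - half + k * dilation  # input position sampled by tap k
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def parallel_dilated_layer(signal, kernel, dilations=(1, 2, 3, 4)):
    """Run the kernel at several dilation rates in parallel; one output channel per rate."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


# An impulse shows how each branch spreads its taps: spacing equals the dilation.
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
for d, channel in zip((1, 2, 3, 4), parallel_dilated_layer(impulse, [1, 2, 3])):
    print(d, channel)
```

Every output channel has the same length as the input, so resolution is preserved, while a 3-tap branch with dilation d covers 2d + 1 input positions.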

EdgeStereo: An Effective Multi-Task Learning Network for Stereo Matching and Edge Detection


Title	EdgeStereo: An Effective Multi-Task Learning Network for Stereo Matching and Edge Detection
Authors	Xiao Song, Xu Zhao, Liangji Fang, Hanwen Hu
Abstract	Recently, leveraging the development of end-to-end convolutional neural networks (CNNs), deep stereo matching networks have achieved remarkable performance far exceeding traditional approaches. However, state-of-the-art stereo frameworks still have difficulty finding correct correspondences in texture-less regions, detailed structures, small objects and near boundaries, which could be alleviated by geometric clues such as edge contours and corresponding constraints. To improve the quality of disparity estimates in these challenging areas, we propose an effective multi-task learning network, EdgeStereo, composed of a disparity estimation branch and an edge detection branch, which enables end-to-end predictions of both disparity map and edge map. To effectively incorporate edge cues, we propose the edge-aware smoothness loss and edge feature embedding for inter-task interactions. It is demonstrated that, based on our unified model, the edge detection task and the stereo matching task can promote each other. In addition, we design a compact module called residual pyramid to replace the commonly-used multi-stage cascaded structures or 3-D convolution based regularization modules in current stereo matching networks. At the time of paper submission, EdgeStereo achieves state-of-the-art performance on the FlyingThings3D dataset, KITTI 2012 and KITTI 2015 stereo benchmarks, outperforming other published stereo matching methods by a noteworthy margin. EdgeStereo also achieves comparable generalization performance for disparity estimation because of the incorporation of edge cues.
Tasks	Disparity Estimation, Edge Detection, Multi-Task Learning, Stereo Matching, Stereo Matching Hand
Published	2019-03-05
URL	https://arxiv.org/abs/1903.01700v2
PDF	https://arxiv.org/pdf/1903.01700v2.pdf
PWC	https://paperswithcode.com/paper/edgestereo-an-effective-multi-task-learning
Repo
Framework

Stochastic Convolutional Sparse Coding


Title	Stochastic Convolutional Sparse Coding
Authors	Jinhui Xiong, Peter Richtárik, Wolfgang Heidrich
Abstract	State-of-the-art methods for Convolutional Sparse Coding usually employ Fourier-domain solvers in order to speed up the convolution operators. However, this approach is not without shortcomings. For example, Fourier-domain representations implicitly assume circular boundary conditions and make it hard to fully exploit the sparsity of the problem as well as the small spatial support of the filters. In this work, we propose a novel stochastic spatial-domain solver, in which a randomized subsampling strategy is introduced while learning the sparse codes. Afterwards, we extend the proposed strategy in conjunction with online learning, scaling the CSC model up to very large sample sizes. In both cases, we show experimentally that the proposed subsampling strategy, with a reasonable selection of the subsampling rate, outperforms the state-of-the-art frequency-domain solvers in terms of execution time without losing the learning quality. Finally, we evaluate the effectiveness of the over-complete dictionary learned from large-scale datasets, which demonstrates an improved sparse representation of the natural images on account of more abundant learned image features.
Tasks
Published	2019-08-31
URL	https://arxiv.org/abs/1909.00145v1
PDF	https://arxiv.org/pdf/1909.00145v1.pdf
PWC	https://paperswithcode.com/paper/stochastic-convolutional-sparse-coding
Repo
Framework

A Method for Estimating the Proximity of Vector Representation Groups in Multidimensional Space. On the Example of the Paraphrase Task


Title	A Method for Estimating the Proximity of Vector Representation Groups in Multidimensional Space. On the Example of the Paraphrase Task
Authors	Artem Artemov, Boris Alekseev
Abstract	The following paper presents a method of comparing two sets of vectors. The method can be applied to any task in which it is necessary to measure the closeness of two objects presented as sets of vectors. It may be applicable when we compare the meanings of two sentences as part of the problem of paraphrasing. This is the problem of measuring the semantic similarity of two sentences (groups of words). The existing methods are not sensitive to the word order or syntactic connections in the considered sentences. The method appears to be advantageous because it neither presents a group of words as one scalar value, nor does it try to show the closeness through an aggregation vector, that is, the mean of the set of vectors. Instead, we measure the cosine of the angle as the mean for the projections of the first group's vectors (the context) on one side and each vector of the second group on the other side. The similarity of two sentences defined by these means does not lose any semantic characteristics and takes account of the words' traits. The method was verified on the comparison of sentence pairs in Russian.
Tasks	Semantic Similarity, Semantic Textual Similarity
Published	2019-08-25
URL	https://arxiv.org/abs/1908.09341v2
PDF	https://arxiv.org/pdf/1908.09341v2.pdf
PWC	https://paperswithcode.com/paper/a-method-for-estimating-the-proximity-of
Repo
Framework

What Do Adversarially Robust Models Look At?


Title	What Do Adversarially Robust Models Look At?
Authors	Takahiro Itazuri, Yoshihiro Fukuhara, Hirokatsu Kataoka, Shigeo Morishima
Abstract	In this paper, we address the open question: “What do adversarially robust models look at?” Recently, it has been reported in many works that there is a trade-off between standard accuracy and adversarial robustness. According to prior works, this trade-off is rooted in the fact that adversarially robust and standard accurate models might depend on very different sets of features. However, it has not been well studied what kind of difference actually exists. In this paper, we analyze this difference through various experiments visually and quantitatively. Experimental results show that adversarially robust models look at things at a larger scale than standard models and pay less attention to fine textures. Furthermore, although it has been claimed that adversarially robust features are not compatible with standard accuracy, there is even a positive effect from using them as pre-trained models, particularly on low-resolution datasets.
Tasks
Published	2019-05-19
URL	https://arxiv.org/abs/1905.07666v1
PDF	https://arxiv.org/pdf/1905.07666v1.pdf
PWC	https://paperswithcode.com/paper/what-do-adversarially-robust-models-look-at
Repo
Framework

Unsupervised Cross-spectral Stereo Matching by Learning to Synthesize


Title	Unsupervised Cross-spectral Stereo Matching by Learning to Synthesize
Authors	Mingyang Liang, Xiaoyang Guo, Hongsheng Li, Xiaogang Wang, You Song
Abstract	Unsupervised cross-spectral stereo matching aims at recovering disparity given cross-spectral image pairs without any supervision in the form of ground truth disparity or depth. The estimated depth provides additional information complementary to individual semantic features, which can be helpful for other vision tasks such as tracking, recognition and detection. However, there are large appearance variations between images from different spectral bands, which is a challenge for cross-spectral stereo matching. Existing deep unsupervised stereo matching methods are sensitive to the appearance variations and do not perform well on cross-spectral data. We propose a novel unsupervised cross-spectral stereo matching framework based on image-to-image translation. First, a style adaptation network transforms images across different spectral bands by cycle consistency and adversarial learning, during which appearance variations are minimized. Then, a stereo matching network is trained with image pairs from the same spectra using view reconstruction loss. Finally, the estimated disparity is utilized to supervise the spectral-translation network in an end-to-end way. Moreover, a novel style adaptation network, F-cycleGAN, is proposed to improve the robustness of spectral translation. Our method can tackle appearance variations and enhance the robustness of unsupervised cross-spectral stereo matching. Experimental results show that our method achieves good performance without using depth supervision or explicit semantic information.
Tasks	Image-to-Image Translation, Stereo Matching, Stereo Matching Hand
Published	2019-03-04
URL	http://arxiv.org/abs/1903.01078v1
PDF	http://arxiv.org/pdf/1903.01078v1.pdf
PWC	https://paperswithcode.com/paper/unsupervised-cross-spectral-stereo-matching
Repo
Framework

Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound


Title	Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound
Authors	Lin F. Yang, Mengdi Wang
Abstract	Exploration in reinforcement learning (RL) suffers from the curse of dimensionality when the state-action space is large. A common practice is to parameterize the high-dimensional value and policy functions using given features. However, existing methods either have no theoretical guarantee or suffer a regret that is exponential in the planning horizon $H$. In this paper, we propose an online RL algorithm, namely MatrixRL, that leverages ideas from linear bandits to learn a low-dimensional representation of the probability transition model while carefully balancing the exploitation-exploration tradeoff. We show that MatrixRL achieves a regret bound ${O}\big(H^2d\log T\sqrt{T}\big)$ where $d$ is the number of features. MatrixRL has an equivalent kernelized version, which is able to work with an arbitrary kernel Hilbert space without using explicit features. In this case, the kernelized MatrixRL satisfies a regret bound ${O}\big(H^2\widetilde{d}\log T\sqrt{T}\big)$, where $\widetilde{d}$ is the effective dimension of the kernel space. To the best of our knowledge, for RL using features or kernels, our results are the first regret bounds that are near-optimal in time $T$ and dimension $d$ (or $\widetilde{d}$) and polynomial in the planning horizon $H$.
Tasks
Published	2019-05-24
URL	https://arxiv.org/abs/1905.10389v2
PDF	https://arxiv.org/pdf/1905.10389v2.pdf
PWC	https://paperswithcode.com/paper/reinforcement-leaning-in-feature-space-matrix
Repo
Framework
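
One consequence of the regret bound stated in the abstract can be spelled out in a line: dividing the cumulative regret by $T$ gives the average per-step regret, which vanishes as $T$ grows for fixed $H$ and $d$. The symbol $\mathrm{Regret}(T)$ is notation introduced here for illustration and is not taken from the abstract.

$$
\frac{\mathrm{Regret}(T)}{T}
= O\!\left(\frac{H^2 d \log T \sqrt{T}}{T}\right)
= O\!\left(\frac{H^2 d \log T}{\sqrt{T}}\right)
\xrightarrow[T \to \infty]{} 0 .
$$

The same step applies to the kernelized bound with $d$ replaced by $\widetilde{d}$.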

Learning low-dimensional state embeddings and metastable clusters from time series data


Title	Learning low-dimensional state embeddings and metastable clusters from time series data
Authors	Yifan Sun, Yaqi Duan, Hao Gong, Mengdi Wang
Abstract	This paper studies how to find compact state embeddings from high-dimensional Markov state trajectories, where the transition kernel has a small intrinsic rank. In the spirit of diffusion maps, we propose an efficient method for learning a low-dimensional state embedding and capturing the process’s dynamics. This idea also leads to a kernel reshaping method for more accurate nonparametric estimation of the transition function. State embedding can be used to cluster states into metastable sets, thereby identifying the slow dynamics. Sharp statistical error bounds and a misclassification rate are proved. An experiment on a simulated dynamical system shows that the state clustering method indeed reveals metastable structures. We also experiment with time series generated by layers of a Deep-Q-Network when playing an Atari game. The embedding method identifies game states as similar if they share similar future events, even though their raw data are far different.
Tasks	Time Series
Published	2019-06-01
URL	https://arxiv.org/abs/1906.00302v2
PDF	https://arxiv.org/pdf/1906.00302v2.pdf
PWC	https://paperswithcode.com/paper/190600302
Repo
Framework

Competitive ratio versus regret minimization: achieving the best of both worlds


Title	Competitive ratio versus regret minimization: achieving the best of both worlds
Authors	Amit Daniely, Yishay Mansour
Abstract	We consider online algorithms under both the competitive ratio criterion and the regret minimization one. Our main goal is to build a unified methodology that would be able to guarantee both criteria simultaneously. For a general class of online algorithms, namely any Metrical Task System (MTS), we show that one can simultaneously guarantee the best known competitive ratio and a natural regret bound. For the paging problem we further show an efficient online algorithm (polynomial in the number of pages) with this guarantee. To this end, we extend an existing regret minimization algorithm (specifically, Kapralov and Panigrahy) to handle movement cost (the cost of switching between states of the online system). We then show how to use the extended regret minimization algorithm to combine multiple online algorithms. Our end result is an online algorithm that can combine a “base” online algorithm, having a guaranteed competitive ratio, with a range of online algorithms that guarantee a small regret over any interval of time. The combined algorithm guarantees both that the competitive ratio matches that of the base algorithm and a low regret over any time interval. As a by-product, we obtain an expert algorithm with close to optimal regret bound on every time interval, even in the presence of switching costs. This result is of independent interest.
Tasks
Published	2019-04-07
URL	http://arxiv.org/abs/1904.03602v1
PDF	http://arxiv.org/pdf/1904.03602v1.pdf
PWC	https://paperswithcode.com/paper/competitive-ratio-versus-regret-minimization
Repo
Framework

Sensitivity of quantum PageRank


Title	Sensitivity of quantum PageRank
Authors	Hirotada Honda
Abstract	In this paper, we discuss the sensitivity of quantum PageRank. By using the finite dimensional perturbation theory, we estimate the change of the quantum PageRank under a small analytical perturbation on the Google matrix. In addition, we will show the way to estimate the lower bound of the convergence radius as well as the error bound of the finite sum in the expansion of the perturbed PageRank.
Tasks
Published	2019-06-27
URL	https://arxiv.org/abs/1907.01641v1
PDF	https://arxiv.org/pdf/1907.01641v1.pdf
PWC	https://paperswithcode.com/paper/sensitivity-of-quantum-pagerank
Repo
Framework
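
The abstract concerns how quantum PageRank changes under a small perturbation of the Google matrix. The sketch below illustrates the analogous question for classical PageRank only, not the quantum version studied in the paper: it builds a Google matrix, perturbs it slightly toward another stochastic matrix, and measures how far the ranking vector moves. The graphs, the damping factor 0.85, and the perturbation form are assumptions chosen for illustration.

```python
def google_matrix(links, alpha=0.85):
    """Column-stochastic Google matrix for a dict {node: [out-neighbours]} on nodes 0..n-1."""
    n = len(links)
    G = [[(1 - alpha) / n] * n for _ in range(n)]
    for src, outs in links.items():
        if outs:
            for dst in outs:
                G[dst][src] += alpha / len(outs)
        else:  # dangling node: jump uniformly
            for dst in range(n):
                G[dst][src] += alpha / n
    return G


def pagerank(G, iters=200):
    """Power iteration starting from the uniform distribution."""
    n = len(G)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(G[i][j] * p[j] for j in range(n)) for i in range(n)]
    return p


def perturb(G, E, eps):
    """Convex mix (1 - eps) * G + eps * E, which stays column-stochastic."""
    n = len(G)
    return [[(1 - eps) * G[i][j] + eps * E[i][j] for j in range(n)] for i in range(n)]


# Hypothetical 4-node graphs, used only to have something to perturb.
G = google_matrix({0: [1, 2], 1: [2], 2: [0], 3: [2]})
E = google_matrix({0: [3], 1: [0], 2: [1], 3: [0]})
base = pagerank(G)
for eps in (1e-3, 1e-2):
    shifted = pagerank(perturb(G, E, eps))
    print(eps, sum(abs(a - b) for a, b in zip(base, shifted)))
```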