January 27, 2020

Paper Group ANR 1143

DAL – A Deep Depth-aware Long-term Tracker


Title	DAL – A Deep Depth-aware Long-term Tracker
Authors	Yanlin Qian, Alan Lukežič, Matej Kristan, Joni-Kristian Kämäräinen, Jiri Matas
Abstract	The best RGBD trackers provide high accuracy but are slow to run. On the other hand, the best RGB trackers are fast but clearly inferior on the RGBD datasets. In this work, we propose a deep depth-aware long-term tracker that achieves state-of-the-art RGBD tracking performance and is fast to run. We reformulate the deep discriminative correlation filter (DCF) to embed the depth information into deep features. Moreover, the same depth-aware correlation filter is used for target re-detection. Comprehensive evaluations show that the proposed tracker achieves state-of-the-art performance on the Princeton RGBD, STC, and the newly-released CDTB benchmarks and runs at 20 fps.
Tasks
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00660v1
PDF	https://arxiv.org/pdf/1912.00660v1.pdf
PWC	https://paperswithcode.com/paper/dal-a-deep-depth-aware-long-term-tracker
Repo
Framework

Score and Lyrics-Free Singing Voice Generation


Title	Score and Lyrics-Free Singing Voice Generation
Authors	Jen-Yu Liu, Yu-Hua Chen, Yin-Cheng Yeh, Yi-Hsuan Yang
Abstract	Generative models for singing voice have been mostly concerned with the task of “singing voice synthesis,” i.e., to produce singing voice waveforms given musical scores and text lyrics. In this work, we explore a novel yet challenging alternative: singing voice generation without pre-assigned scores and lyrics, in both training and inference time. In particular, we propose three either unconditioned or weakly conditioned singing voice generation schemes. We outline the associated challenges and propose a pipeline to tackle these new tasks. This involves the development of source separation and transcription models for data preparation, adversarial networks for audio generation, and customized metrics for evaluation.
Tasks	Audio Generation
Published	2019-12-26
URL	https://arxiv.org/abs/1912.11747v1
PDF	https://arxiv.org/pdf/1912.11747v1.pdf
PWC	https://paperswithcode.com/paper/score-and-lyrics-free-singing-voice-1
Repo
Framework

Neural networks-based backward scheme for fully nonlinear PDEs


Title	Neural networks-based backward scheme for fully nonlinear PDEs
Authors	Huyên Pham, Xavier Warin
Abstract	We propose a numerical method for solving high dimensional fully nonlinear partial differential equations (PDEs). Our algorithm estimates simultaneously by backward time induction the solution and its gradient by multi-layer neural networks, through a sequence of learning problems obtained from the minimization of suitable quadratic loss functions and training simulations. This methodology extends to the fully non-linear case the approach recently proposed in [HPW19] for semi-linear PDEs. Numerical tests illustrate the performance and accuracy of our method on several examples in high dimension with nonlinearity on the Hessian term including a linear quadratic control problem with control on the diffusion coefficient.
Tasks
Published	2019-07-31
URL	https://arxiv.org/abs/1908.00412v1
PDF	https://arxiv.org/pdf/1908.00412v1.pdf
PWC	https://paperswithcode.com/paper/neural-networks-based-backward-scheme-for
Repo
Framework

A First-Order Algorithmic Framework for Wasserstein Distributionally Robust Logistic Regression


Title	A First-Order Algorithmic Framework for Wasserstein Distributionally Robust Logistic Regression
Authors	Jiajin Li, Sen Huang, Anthony Man-Cho So
Abstract	Wasserstein distance-based distributionally robust optimization (DRO) has received much attention lately due to its ability to provide a robustness interpretation of various learning models. Moreover, many of the DRO problems that arise in the learning context admit exact convex reformulations and hence can be tackled by off-the-shelf solvers. Nevertheless, the use of such solvers severely limits the applicability of DRO in large-scale learning problems, as they often rely on general purpose interior-point algorithms. On the other hand, there are very few works that attempt to develop fast iterative methods to solve these DRO problems, which typically possess complicated structures. In this paper, we take a first step towards resolving the above difficulty by developing a first-order algorithmic framework for tackling a class of Wasserstein distance-based distributionally robust logistic regression (DRLR) problems. Specifically, we propose a novel linearized proximal ADMM to solve the DRLR problem, whose objective is convex but consists of a smooth term plus two non-separable non-smooth terms. We prove that our method enjoys a sublinear convergence rate. Furthermore, we conduct three different experiments to show its superb performance on both synthetic and real-world datasets. In particular, our method can achieve the same accuracy up to 800+ times faster than the standard off-the-shelf solver.
Tasks
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12778v1
PDF	https://arxiv.org/pdf/1910.12778v1.pdf
PWC	https://paperswithcode.com/paper/a-first-order-algorithmic-framework-for
Repo
Framework

Unsupervised Representation Learning for Gaze Estimation


Title	Unsupervised Representation Learning for Gaze Estimation
Authors	Yu Yu, Jean-Marc Odobez
Abstract	Although automatic gaze estimation is very important to a large variety of application areas, it is difficult to train accurate and robust gaze models, in great part due to the difficulty in collecting large and diverse data (annotating 3D gaze is expensive and existing datasets use different setups). To address this issue, our main contribution in this paper is to propose an effective approach to learn a low dimensional gaze representation without gaze annotations, which, to the best of our knowledge, is the first work to do so. The main idea is to rely on a gaze redirection network and use the gaze representation difference of the input and target images (of the redirection network) as the redirection variable. A redirection loss in image domain allows the joint training of both the redirection network and the gaze representation network. In addition, we propose a warping field regularization which not only provides an explicit physical meaning to the gaze representations but also avoids redirection distortions. Promising results on few-shot gaze estimation (competitive results can be achieved with 100 or fewer calibration samples), cross-dataset gaze estimation, gaze network pretraining, and another task (head pose estimation) demonstrate the validity of our framework.
Tasks	Calibration, Gaze Estimation, Head Pose Estimation, Pose Estimation, Representation Learning, Unsupervised Representation Learning
Published	2019-11-16
URL	https://arxiv.org/abs/1911.06939v3
PDF	https://arxiv.org/pdf/1911.06939v3.pdf
PWC	https://paperswithcode.com/paper/unsupervised-representation-learning-for-gaze
Repo
Framework

360 Panorama Synthesis from a Sparse Set of Images with Unknown Field of View


Title	360 Panorama Synthesis from a Sparse Set of Images with Unknown Field of View
Authors	Julius Surya Sumantri, In Kyu Park
Abstract	360 images represent scenes captured in all possible viewing directions and enable viewers to navigate freely around the scene thereby providing an immersive experience. Conversely, conventional images represent scenes in a single viewing direction with a small or limited field of view (FOV). As a result, only certain parts of the scenes are observed, and valuable information about the surroundings is lost. In this paper, a learning-based approach that reconstructs the scene in 360 x 180 from a sparse set of conventional images (typically 4 images) is proposed. The proposed approach first estimates the FOV of input images relative to the panorama. The estimated FOV is then used as the prior for synthesizing a high-resolution 360 panoramic output. The proposed method overcomes the difficulty that learning-based approaches have in synthesizing high resolution images (up to 512$\times$1024). Experimental results demonstrate that the proposed method produces 360 panoramas with reasonable quality. Results also show that the proposed method outperforms the alternative method and can be generalized for non-panoramic scenes and images captured by a smartphone camera.
Tasks
Published	2019-04-06
URL	https://arxiv.org/abs/1904.03326v4
PDF	https://arxiv.org/pdf/1904.03326v4.pdf
PWC	https://paperswithcode.com/paper/360-panorama-synthesis-from-a-sparse-set-of
Repo
Framework

Limited Lookahead in Imperfect-Information Games


Title	Limited Lookahead in Imperfect-Information Games
Authors	Christian Kroer, Tuomas Sandholm
Abstract	Limited lookahead has been studied for decades in perfect-information games. We initiate a new direction via two simultaneous deviation points: generalization to imperfect-information games and a game-theoretic approach. We study how one should act when facing an opponent whose lookahead is limited. We study this for opponents that differ based on their lookahead depth, based on whether they, too, have imperfect information, and based on how they break ties. We characterize the hardness of finding a Nash equilibrium or an optimal commitment strategy for either player, showing that in some of these variations the problem can be solved in polynomial time while in others it is PPAD-hard, NP-hard, or inapproximable. We proceed to design algorithms for computing optimal commitment strategies—for when the opponent breaks ties favorably, according to a fixed rule, or adversarially. We then experimentally investigate the impact of limited lookahead. The limited-lookahead player often obtains the value of the game if she knows the expected values of nodes in the game tree for some equilibrium—but we prove this is not sufficient in general. Finally, we study the impact of noise in those estimates and different lookahead depths.
Tasks
Published	2019-02-17
URL	https://arxiv.org/abs/1902.06335v2
PDF	https://arxiv.org/pdf/1902.06335v2.pdf
PWC	https://paperswithcode.com/paper/limited-lookahead-in-imperfect-information
Repo
Framework

Semi-supervised Learning on Graph with an Alternating Diffusion Process


Title	Semi-supervised Learning on Graph with an Alternating Diffusion Process
Authors	Qilin Li, Senjian An, Ling Li, Wanquan Liu
Abstract	Graph-based semi-supervised learning usually involves two separate stages, constructing an affinity graph and then propagating labels for transductive inference on the graph. It is suboptimal to solve them independently, as the correlation between the affinity graph and labels is not fully exploited. In this paper, we integrate the two stages into one unified framework by formulating the graph construction as a regularized function estimation problem similar to label propagation. We propose an alternating diffusion process to solve the two problems simultaneously, which allows us to learn the graph and unknown labels in an iterative fashion. With the proposed framework, we are able to adequately leverage both the given labels and estimated labels to construct a better graph, and effectively propagate labels on such a dynamic graph updated simultaneously with the newly obtained labels. Extensive experiments on various real-world datasets have demonstrated the superiority of the proposed method compared to other state-of-the-art methods.
Tasks	graph construction
Published	2019-02-16
URL	http://arxiv.org/abs/1902.06105v1
PDF	http://arxiv.org/pdf/1902.06105v1.pdf
PWC	https://paperswithcode.com/paper/semi-supervised-learning-on-graph-with-an
Repo
Framework
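
The abstract above builds on label propagation over an affinity graph. As a point of reference only, here is a minimal sketch of the classic propagation step on a fixed graph; it is not the paper's alternating diffusion process, which also updates the graph. The function name `propagate_labels`, the toy graph, and the values of `alpha` and `iters` are assumptions made for illustration.

```python
import math

def propagate_labels(W, Y, alpha=0.9, iters=200):
    """Classic label propagation on a fixed affinity graph (illustrative only).

    W: symmetric affinity matrix (list of lists) with a zero diagonal.
    Y: initial labels; row i is one-hot for labelled nodes, all zeros otherwise.
    Returns the predicted class index for every node.
    """
    n, k = len(W), len(Y[0])
    deg = [sum(row) for row in W]
    # Symmetrically normalised affinity S = D^{-1/2} W D^{-1/2}
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        # F <- alpha * S F + (1 - alpha) * Y
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(k)] for i in range(n)]
    return [max(range(k), key=lambda c: F[i][c]) for i in range(n)]

# Toy graph (assumed): two triangles joined by one weak edge,
# with a single labelled node in each triangle.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
Y = [[1, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 1]]
labels = propagate_labels(W, Y)
print(labels)
```

In this toy case each triangle takes the label of its single labelled node, because the weak bridge carries little label mass. A joint method in the spirit of the abstract would additionally re-estimate `W` from the current labels between propagation steps.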

Single Level Feature-to-Feature Forecasting with Deformable Convolutions


Title	Single Level Feature-to-Feature Forecasting with Deformable Convolutions
Authors	Josip Šarić, Marin Oršić, Tonći Antunović, Sacha Vražić, Siniša Šegvić
Abstract	Future anticipation is of vital importance in autonomous driving and other decision-making systems. We present a method to anticipate semantic segmentation of future frames in driving scenarios based on feature-to-feature forecasting. Our method is based on a semantic segmentation model without lateral connections within the upsampling path. Such design ensures that the forecasting addresses only the most abstract features on a very coarse resolution. We further propose to express feature-to-feature forecasting with deformable convolutions. This increases the modelling power due to being able to represent different motion patterns within a single feature map. Experiments show that our models with deformable convolutions outperform their regular and dilated counterparts while minimally increasing the number of parameters. Our method achieves state of the art performance on the Cityscapes validation set when forecasting nine timesteps into the future.
Tasks	Autonomous Driving, Decision Making, Semantic Segmentation
Published	2019-07-26
URL	https://arxiv.org/abs/1907.11475v1
PDF	https://arxiv.org/pdf/1907.11475v1.pdf
PWC	https://paperswithcode.com/paper/single-level-feature-to-feature-forecasting
Repo
Framework

Neural Simplex Architecture


Title	Neural Simplex Architecture
Authors	Dung T. Phan, Radu Grosu, Nils Jansen, Nicola Paoletti, Scott A. Smolka, Scott D. Stoller
Abstract	We present the Neural Simplex Architecture (NSA), a new approach to runtime assurance that provides safety guarantees for neural controllers (obtained e.g. using reinforcement learning) of autonomous and other complex systems without unduly sacrificing performance. NSA is inspired by the Simplex control architecture of Sha et al., but with some significant differences. In the traditional approach, the advanced controller (AC) is treated as a black box; when the decision module switches control to the baseline controller (BC), the BC remains in control forever. There is relatively little work on switching control back to the AC, and there are no techniques for correcting the AC’s behavior after it generates a potentially unsafe control input that causes a failover to the BC. Our NSA addresses both of these limitations. NSA not only provides safety assurances in the presence of a possibly unsafe neural controller, but can also improve the safety of such a controller in an online setting via retraining, without overly degrading its performance. To demonstrate NSA’s benefits, we have conducted several significant case studies in the continuous control domain. These include a target-seeking ground rover navigating an obstacle field, and a neural controller for an artificial pancreas system.
Tasks	Continuous Control
Published	2019-08-01
URL	https://arxiv.org/abs/1908.00528v2
PDF	https://arxiv.org/pdf/1908.00528v2.pdf
PWC	https://paperswithcode.com/paper/neural-simplex-architecture
Repo
Framework
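
The switching logic described in the abstract above can be illustrated with a toy one-dimensional example. This is a sketch under stated assumptions, not the NSA implementation: the controllers, the recoverability check, and all constants are hypothetical, and the online retraining of the advanced controller is omitted. Unlike the traditional Simplex scheme, control returns to the AC as soon as its proposed action is safe again.

```python
def advanced_controller(x):
    # Hypothetical stand-in for a neural controller: always pushes right.
    return 1.0

def baseline_controller(x):
    # Hypothetical verified fallback: steer back toward the origin.
    return -0.5 if x > 0 else 0.5

def is_recoverable(x, limit=5.0, margin=1.0):
    # Assumed safety check: stay at least `margin` away from the boundary.
    return abs(x) <= limit - margin

def simplex_step(x, dt=1.0):
    """One control step: use the AC unless its action leaves the recoverable set."""
    proposed = x + advanced_controller(x) * dt
    if is_recoverable(proposed):
        return proposed, "AC"
    # Fail over to the baseline controller for this step only.
    return x + baseline_controller(x) * dt, "BC"

x, trace = 0.0, []
for _ in range(8):
    x, who = simplex_step(x)
    trace.append((x, who))
print(trace)
```

The trace shows the AC driving the state toward the boundary, the BC taking over for two steps, and control then switching back to the AC.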

A System-Level Solution for Low-Power Object Detection


Title	A System-Level Solution for Low-Power Object Detection
Authors	Fanrong Li, Zitao Mo, Peisong Wang, Zejian Liu, Jiayun Zhang, Gang Li, Qinghao Hu, Xiangyu He, Cong Leng, Yang Zhang, Jian Cheng
Abstract	Object detection has made impressive progress in recent years with the help of deep learning. However, state-of-the-art algorithms are both computation and memory intensive. Though many lightweight networks are developed for a trade-off between accuracy and efficiency, it is still a challenge to make it practical on an embedded device. In this paper, we present a system-level solution for efficient object detection on a heterogeneous embedded device. The detection network is quantized to low bits and allows efficient implementation with shift operators. In order to make the most of the benefits of low-bit quantization, we design a dedicated accelerator with programmable logic. Inside the accelerator, a hybrid dataflow is exploited according to the heterogeneous property of different convolutional layers. We adopt a straightforward but resource-friendly column-prior tiling strategy to map the computation-intensive convolutional layers to the accelerator that can support arbitrary feature size. Other operations can be performed on the low-power CPU cores, and the entire system is executed in a pipelined manner. As a case study, we evaluate our object detection system on a real-world surveillance video with input size of 512x512, and it turns out that the system can achieve an inference speed of 18 fps at the cost of 6.9W (with display) with an mAP of 66.4 verified on the PASCAL VOC 2012 dataset.
Tasks	Object Detection, Quantization
Published	2019-09-24
URL	https://arxiv.org/abs/1909.10964v2
PDF	https://arxiv.org/pdf/1909.10964v2.pdf
PWC	https://paperswithcode.com/paper/a-system-level-solution-for-low-power-object
Repo
Framework
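
The abstract above mentions low-bit quantization that allows multiplication to be implemented with shift operators. The abstract does not specify the scheme, so the following is only a generic sketch of power-of-two weight quantization; `quantize_pow2`, `shift_mul`, `dot_shift`, and the fixed-point width `frac_bits` are hypothetical names and choices.

```python
import math

def quantize_pow2(w, min_exp=-4, max_exp=0):
    """Round a weight to a signed power of two; returns (sign, exponent)."""
    if w == 0:
        return 0, 0
    exp = round(math.log2(abs(w)))          # nearest exponent in log scale
    exp = max(min_exp, min(max_exp, exp))   # clamp to the representable range
    return (1 if w > 0 else -1), exp

def shift_mul(x_int, sign, exp, frac_bits=4):
    """Multiply an integer activation by sign * 2**exp using a left shift.

    The result is fixed-point with `frac_bits` fractional bits.
    """
    shift = frac_bits + exp  # non-negative as long as exp >= -frac_bits
    return sign * (x_int << shift)

def dot_shift(xs, ws, frac_bits=4):
    """Dot product of integer activations with power-of-two quantized weights."""
    acc = 0
    for x, w in zip(xs, ws):
        sign, exp = quantize_pow2(w, min_exp=-frac_bits)
        acc += shift_mul(x, sign, exp, frac_bits)
    return acc / (1 << frac_bits)  # convert fixed-point back to a float

# 0.9 is rounded to 2**0, so the result is 3*0.5 - 5*0.25 + 2*1.
result = dot_shift([3, 5, 2], [0.5, -0.25, 0.9])
print(result)
```

The accumulation loop uses only shifts and additions, which is the property that makes this style of quantization attractive for programmable-logic accelerators.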


On the Role of Weight Sharing During Deep Option Learning


Title	On the Role of Weight Sharing During Deep Option Learning
Authors	Matthew Riemer, Ignacio Cases, Clemens Rosenbaum, Miao Liu, Gerald Tesauro
Abstract	The options framework is a popular approach for building temporally extended actions in reinforcement learning. In particular, the option-critic architecture provides general purpose policy gradient theorems for learning actions from scratch that are extended in time. However, past work makes the key assumption that each of the components of option-critic has independent parameters. In this work we note that while this key assumption of the policy gradient theorems of option-critic holds in the tabular case, it is always violated in practice for the deep function approximation setting. We thus reconsider this assumption and consider more general extensions of option-critic and hierarchical option-critic training that optimize for the full architecture with each update. It turns out that not assuming parameter independence challenges a belief in prior work that training the policy over options can be disentangled from the dynamics of the underlying options. In fact, learning can be sped up by focusing the policy over options on states where options are actually likely to terminate. We put our new algorithms to the test in application to sample efficient learning of Atari games, and demonstrate significantly improved stability and faster convergence when learning long options.
Tasks	Atari Games
Published	2019-12-31
URL	https://arxiv.org/abs/1912.13408v2
PDF	https://arxiv.org/pdf/1912.13408v2.pdf
PWC	https://paperswithcode.com/paper/on-the-role-of-weight-sharing-during-deep
Repo
Framework

Recent Advances in Algorithmic High-Dimensional Robust Statistics


Title	Recent Advances in Algorithmic High-Dimensional Robust Statistics
Authors	Ilias Diakonikolas, Daniel M. Kane
Abstract	Learning in the presence of outliers is a fundamental problem in statistics. Until recently, all known efficient unsupervised learning algorithms were very sensitive to outliers in high dimensions. In particular, even for the task of robust mean estimation under natural distributional assumptions, no efficient algorithm was known. Recent work in theoretical computer science gave the first efficient robust estimators for a number of fundamental statistical tasks, including mean and covariance estimation. Since then, there has been a flurry of research activity on algorithmic high-dimensional robust estimation in a range of settings. In this survey article, we introduce the core ideas and algorithmic techniques in the emerging area of algorithmic high-dimensional robust statistics with a focus on robust mean estimation. We also provide an overview of the approaches that have led to computationally efficient robust estimators for a range of broader statistical tasks and discuss new directions and opportunities for future work.
Tasks
Published	2019-11-14
URL	https://arxiv.org/abs/1911.05911v1
PDF	https://arxiv.org/pdf/1911.05911v1.pdf
PWC	https://paperswithcode.com/paper/recent-advances-in-algorithmic-high
Repo
Framework
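
To make the robust mean estimation task in the abstract above concrete, here is a simplified filtering heuristic: repeatedly find the direction of largest empirical variance and drop the points that project farthest along it. It is a sketch in the spirit of spectral filtering, not an algorithm from the survey and without its guarantees; the function names, the synthetic data, `rounds`, and `drop_frac` are all assumptions.

```python
import random

def mean(points):
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]

def top_direction(points, mu, iters=50):
    """Power iteration for the top eigenvector of the empirical covariance."""
    d = len(mu)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = [0.0] * d
        for p in points:
            proj = sum((p[j] - mu[j]) * v[j] for j in range(d))
            for j in range(d):
                w[j] += proj * (p[j] - mu[j])  # accumulates Cov @ v (unscaled)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def filtered_mean(points, rounds=3, drop_frac=0.05):
    """Drop the most extreme points along the top-variance direction, then average."""
    pts = list(points)
    for _ in range(rounds):
        mu = mean(pts)
        v = top_direction(pts, mu)
        d = len(mu)
        # Sort by squared projection onto v; the largest scores end up last.
        pts.sort(key=lambda p: sum((p[j] - mu[j]) * v[j] for j in range(d)) ** 2)
        pts = pts[:len(pts) - max(1, int(drop_frac * len(pts)))]
    return mean(pts)

# Synthetic data (assumed): 190 standard Gaussian inliers around the origin
# plus 10 identical outliers, in 10 dimensions.
random.seed(0)
d = 10
data = [[random.gauss(0, 1) for _ in range(d)] for _ in range(190)]
data += [[5.0] * d for _ in range(10)]

def err(estimate):
    return sum(x * x for x in estimate) ** 0.5  # the true mean is the origin

naive_err = err(mean(data))
robust_err = err(filtered_mean(data))
print(naive_err, robust_err)
```

The outliers shift the plain sample mean along a single direction, which also inflates the variance in that direction; that signature is what the filter exploits.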

A Low-Power Domino Logic Architecture for Memristor-Based Neuromorphic Computing


Title	A Low-Power Domino Logic Architecture for Memristor-Based Neuromorphic Computing
Authors	Cory Merkel, Animesh Nikam
Abstract	We propose a domino logic architecture for memristor-based neuromorphic computing. The design uses the delay of memristor RC circuits to represent synaptic computations and a simple binary neuron activation function. Synchronization schemes are proposed for communicating information between neural network layers, and a simple linear power model is developed to estimate the design’s energy efficiency for a particular network size. Results indicate that the proposed architecture can achieve 0.61 fJ per classification per component (neurons and synapses) and outperforms other designs in terms of energy per % accuracy.
Tasks
Published	2019-06-13
URL	https://arxiv.org/abs/1906.05781v1
PDF	https://arxiv.org/pdf/1906.05781v1.pdf
PWC	https://paperswithcode.com/paper/a-low-power-domino-logic-architecture-for
Repo
Framework

DSAL-GAN: Denoising based Saliency Prediction with Generative Adversarial Networks


Title	DSAL-GAN: Denoising based Saliency Prediction with Generative Adversarial Networks
Authors	Prerana Mukherjee, Manoj Sharma, Megh Makwana, Ajay Pratap Singh, Avinash Upadhyay, Akkshita Trivedi, Brejesh Lall, Santanu Chaudhury
Abstract	Synthesizing high quality saliency maps from noisy images is a challenging problem in computer vision and has many practical applications. Samples generated by existing techniques for saliency detection cannot handle the noise perturbations smoothly and fail to delineate the salient objects present in the given scene. In this paper, we present a novel end-to-end coupled Denoising based Saliency Prediction with Generative Adversarial Network (DSAL-GAN) framework to address the problem of salient object detection in noisy images. DSAL-GAN consists of two generative adversarial networks (GANs) trained end-to-end to perform denoising and saliency prediction together in a holistic manner. The first GAN consists of a generator which denoises the noisy input image, and in the discriminator counterpart we check whether the output is a denoised image or ground truth original image. The second GAN predicts the saliency maps from raw pixels of the input denoised image using a data-driven metric based on saliency prediction method with adversarial loss. Cycle consistency loss is also incorporated to further improve salient region prediction. We demonstrate with comprehensive evaluation that the proposed framework outperforms several baseline saliency models on various performance benchmarks.
Tasks	Denoising, Object Detection, Saliency Detection, Saliency Prediction, Salient Object Detection
Published	2019-04-02
URL	http://arxiv.org/abs/1904.01215v1
PDF	http://arxiv.org/pdf/1904.01215v1.pdf
PWC	https://paperswithcode.com/paper/dsal-gan-denoising-based-saliency-prediction
Repo
Framework