January 29, 2020

Paper Group ANR 725

Nostalgin: Extracting 3D City Models from Historical Image Data

Title Nostalgin: Extracting 3D City Models from Historical Image Data
Authors Amol Kapoor, Hunter Larco, Raimondas Kiveris
Abstract What did it feel like to walk through a city from the past? In this work, we describe Nostalgin (Nostalgia Engine), a method that can faithfully reconstruct cities from historical images. Unlike existing work in city reconstruction, we focus on the task of reconstructing 3D cities from historical images. Working with historical image data is substantially more difficult, as there are significantly fewer images available and the parameters of the cameras that captured them are unknown. Nostalgin can generate a city model even if there is only a single image per facade, regardless of viewpoint or occlusions. To achieve this, our novel architecture combines image segmentation, rectification, and inpainting. We motivate our design decisions with experimental analysis of individual components of our pipeline, and show that we can improve on baselines in both speed and visual realism. We demonstrate the efficacy of our pipeline by recreating two 1940s Manhattan city blocks. We aim to deploy Nostalgin as an open source platform where users can generate immersive historical experiences from their own photos.
Tasks Semantic Segmentation
Published 2019-05-06
URL https://arxiv.org/abs/1905.01772v1
PDF https://arxiv.org/pdf/1905.01772v1.pdf
PWC https://paperswithcode.com/paper/nostalgin-extracting-3d-city-models-from
Repo
Framework
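Of the three pipeline stages, rectification is the most mechanical; here is a minimal OpenCV sketch under our own assumptions (the paper derives facade corners from its segmentation stage, while the corner coordinates and output size below are purely illustrative):

```python
# Hypothetical sketch of a facade-rectification step: warp a facade
# photographed at an angle to a fronto-parallel view via a homography.
import cv2
import numpy as np

def rectify_facade(image, corners, out_w=512, out_h=768):
    """Warp the quadrilateral `corners` (tl, tr, br, bl) onto a rectangle."""
    src = np.float32(corners)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, H, (out_w, out_h))

photo = np.zeros((1024, 768, 3), dtype=np.uint8)        # stand-in for a historical photo
quad = [(120, 80), (600, 130), (590, 900), (100, 850)]  # facade corners (illustrative)
flat = rectify_facade(photo, quad)
```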

Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task

Title Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task
Authors Aritra Bhowmik, Stefan Gumhold, Carsten Rother, Eric Brachmann
Abstract We address a core problem of computer vision: detection and description of 2D feature points for image matching. For a long time, hand-crafted designs like the seminal SIFT algorithm were unsurpassed in accuracy and efficiency. Recently, learned feature detectors have emerged that implement detection and description using neural networks. Training these networks usually resorts to optimizing low-level matching scores, often pre-defining sets of image patches which should or should not match, or which should or should not contain key points. Unfortunately, increased accuracy on these low-level matching scores does not necessarily translate to better performance in high-level vision tasks. We propose a new training methodology which embeds the feature detector in a complete vision pipeline, where the learnable parameters are trained in an end-to-end fashion. We overcome the discrete nature of key point selection and descriptor matching using principles from reinforcement learning. As an example, we address the task of relative pose estimation between a pair of images. We demonstrate that the accuracy of a state-of-the-art learning-based feature detector can be increased when trained for the task it is supposed to solve at test time. Our training methodology places few restrictions on the task to be learned and works for any architecture which predicts key point heat maps and descriptors for key point locations.
Tasks Pose Estimation
Published 2019-12-02
URL https://arxiv.org/abs/1912.00623v2
PDF https://arxiv.org/pdf/1912.00623v2.pdf
PWC https://paperswithcode.com/paper/reinforced-feature-points-optimizing-feature
Repo
Framework
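A minimal PyTorch sketch of the core trick, under heavy simplification: treat the key point heat map as a categorical distribution, sample discrete key points, and use the REINFORCE estimator so a task-level reward can reach the detector's parameters. The reward function here is a placeholder for, e.g., relative pose accuracy:

```python
# REINFORCE over discrete keypoint selection (simplified sketch).
import torch

def reinforce_keypoint_loss(heatmap, reward_fn, num_samples=64):
    """heatmap: (H, W) raw scores; reward_fn: sampled indices -> per-sample reward."""
    dist = torch.distributions.Categorical(logits=heatmap.flatten())
    idx = dist.sample((num_samples,))   # discrete keypoint locations
    reward = reward_fn(idx)             # placeholder task-level reward
    # REINFORCE: grad of E[reward] ~ reward * grad log p(sample)
    return -(reward * dist.log_prob(idx)).mean()

heatmap = torch.randn(64, 64, requires_grad=True)
loss = reinforce_keypoint_loss(heatmap, lambda idx: torch.ones(len(idx)))
loss.backward()  # gradients flow to the heat map despite discrete sampling
```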

Prediction and Sampling with Local Graph Transforms for Quasi-Lossless Light Field Compression

Title Prediction and Sampling with Local Graph Transforms for Quasi-Lossless Light Field Compression
Authors Mira Rizkallah, Thomas Maugey, Christine Guillemot
Abstract Graph-based transforms have been shown to be powerful tools in terms of image energy compaction. However, when the support increases to best capture signal dependencies, the computation of the basis functions rapidly becomes intractable. This problem is particularly acute for high-dimensional imaging data such as light fields. The use of local transforms with limited supports is a way to cope with this computational difficulty. Unfortunately, the locality of the support may not allow us to fully exploit long-term signal dependencies present in both the spatial and angular dimensions of light fields. This paper describes sampling and prediction schemes with local graph-based transforms that efficiently compact the signal energy and exploit dependencies beyond the local graph support. The proposed approach is investigated and shown to be very efficient in the context of spatio-angular transforms for quasi-lossless compression of light fields.
Tasks
Published 2019-03-08
URL http://arxiv.org/abs/1903.03546v1
PDF http://arxiv.org/pdf/1903.03546v1.pdf
PWC https://paperswithcode.com/paper/prediction-and-sampling-with-local-graph
Repo
Framework
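For intuition, a toy numpy sketch of a graph transform on a small local support (not the paper's exact construction): the basis functions are eigenvectors of the support's graph Laplacian, and it is exactly this eigendecomposition that becomes intractable as the support grows:

```python
# Graph transform = projection onto the Laplacian eigenbasis (toy example).
import numpy as np

def graph_transform(signal, adjacency):
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    _, basis = np.linalg.eigh(laplacian)  # columns = transform basis functions
    return basis.T @ signal               # transform coefficients

# 8-node path graph as a toy "local support"
A = np.zeros((8, 8))
for i in range(7):
    A[i, i + 1] = A[i + 1, i] = 1.0
coeffs = graph_transform(np.linspace(0.0, 1.0, 8), A)  # smooth signal -> compacted energy
```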

Deterministic Completion of Rectangular Matrices Using Ramanujan Bigraphs – II: Explicit Constructions and Phase Transitions

Title Deterministic Completion of Rectangular Matrices Using Ramanujan Bigraphs – II: Explicit Constructions and Phase Transitions
Authors Shantanu Prasad Burnwal, Mathukumalli Vidyasagar, Kaneenika Sinha
Abstract Matrix completion is a part of compressed sensing, and refers to determining an unknown low-rank matrix from a relatively small number of samples of its elements. The problem has applications in recommendation engines, sensor localization, quantum tomography, etc. In a companion paper (Part-1), the first and second authors showed that it is possible to guarantee exact completion of an unknown low-rank matrix if the sample set corresponds to the edge set of a Ramanujan bigraph. In this paper, we present for the first time an infinite family of unbalanced Ramanujan bigraphs with explicitly constructed biadjacency matrices. In addition, we show how to construct the adjacency matrices for the currently available families of Ramanujan graphs. In an attempt to determine how close the sufficient condition presented in Part-1 is to being necessary, we carried out numerical simulations of nuclear norm minimization on randomly generated low-rank matrices. The results revealed several noteworthy points, the most interesting of which is the existence of a phase transition. For square matrices, the maximum rank $\bar{r}$ for which nuclear norm minimization correctly completes all low-rank matrices is approximately $\bar{r} \approx d/3$, where $d$ is the degree of the Ramanujan graph. This upper limit appears to be independent of the specific family of Ramanujan graphs. The percentage of low-rank matrices that are recovered drops from 100% to 0% if the rank is increased by just two beyond $\bar{r}$. Again, this phenomenon appears to be independent of the specific family of Ramanujan graphs.
Tasks Matrix Completion
Published 2019-10-08
URL https://arxiv.org/abs/1910.03937v1
PDF https://arxiv.org/pdf/1910.03937v1.pdf
PWC https://paperswithcode.com/paper/deterministic-completion-of-rectangular-1
Repo
Framework
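The nuclear norm minimization experiment is easy to reproduce in miniature with cvxpy; this sketch samples entries uniformly at random rather than from a Ramanujan bigraph's edge set, so it illustrates the solver only, not the paper's deterministic constructions:

```python
# Matrix completion by nuclear norm minimization on a random sample pattern.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, r = 20, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r target
rows, cols = np.where(rng.random((n, n)) < 0.5)                # observed entries

X = cp.Variable((n, n))
problem = cp.Problem(cp.Minimize(cp.normNuc(X)),        # nuclear norm objective
                     [X[rows, cols] == M[rows, cols]])  # agree on samples
problem.solve()
print(np.allclose(X.value, M, atol=1e-3))  # exact recovery at low rank
```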

Deep Learning to Scale up Time Series Traffic Prediction

Title Deep Learning to Scale up Time Series Traffic Prediction
Authors Julien Monteil, Anton Dekusar, Claudio Gambella, Yassine Lassoued, Martin Mevissen
Abstract The transport literature is dense regarding short-term traffic predictions, up to the scale of one hour, yet sparser for long-term traffic predictions. The transport literature is also sparse when it comes to city-scale traffic predictions, mainly because of low data availability. The main question we try to answer in this work is to what extent the approaches used for short-term prediction at the link level can be scaled up for long-term prediction at the city scale. We investigate a city-scale traffic dataset with 14 weeks of speed observations collected every 15 minutes over 1098 segments in the hypercenter of Los Angeles, California. We look at a variety of machine learning and deep learning predictors for link-based predictions, and investigate ways to make such predictors scale up to larger areas, via brute force, clustering, and model design approaches. In particular we propose a novel deep learning spatio-temporal predictor inspired by recent work on recommender systems. We discuss the potential of including spatio-temporal features in the predictors, and conclude that modelling such features can be helpful for long-term predictions, while simpler predictors achieve very satisfactory performance for link-based and short-term forecasting. The trade-off is discussed not only in terms of prediction accuracy vs. prediction horizon but also in terms of training time and model sizing.
Tasks Recommendation Systems, Time Series, Traffic Prediction
Published 2019-11-29
URL https://arxiv.org/abs/1911.13042v1
PDF https://arxiv.org/pdf/1911.13042v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-to-scale-up-time-series-traffic
Repo
Framework
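One plausible reading of "a spatio-temporal predictor inspired by recommender systems" (our interpretation, not the paper's exact model): learn an embedding per road segment and per time slot and predict speed from their interaction, the way matrix factorization predicts ratings. A PyTorch sketch:

```python
# Matrix-factorization-style speed predictor: segment x time-slot embeddings.
import torch
import torch.nn as nn

class SegmentTimeFactorizer(nn.Module):
    def __init__(self, n_segments=1098, n_slots=7 * 24 * 4, dim=16):
        super().__init__()
        self.seg = nn.Embedding(n_segments, dim)   # "user" = road segment
        self.slot = nn.Embedding(n_slots, dim)     # "item" = weekly 15-min slot

    def forward(self, seg_idx, slot_idx):
        # Dot product of the two embeddings = predicted speed.
        return (self.seg(seg_idx) * self.slot(slot_idx)).sum(-1)

model = SegmentTimeFactorizer()
speed = model(torch.tensor([3]), torch.tensor([42]))
```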

Graph Convolution Networks for Probabilistic Modeling of Driving Acceleration

Title Graph Convolution Networks for Probabilistic Modeling of Driving Acceleration
Authors Jianyu Su, Peter A. Beling, Rui Guo, Kyungtae Han
Abstract The ability to model and predict the ego-vehicle’s surrounding traffic is crucial for autonomous pilots and intelligent driver-assistance systems. Acceleration prediction is important as one of the major components of traffic prediction. This paper proposes novel approaches to the acceleration prediction problem. By representing spatial relationships between vehicles with a graph model, we build a generalized acceleration prediction framework. This paper studies the effectiveness of the proposed Graph Convolution Networks, which operate on graphs to predict the acceleration distribution for vehicles driving on highways. We further investigate prediction improvements from integrating Recurrent Neural Networks to disentangle the temporal complexity inherent in the traffic data. Results from simulation studies using comprehensive performance metrics support the conclusion that our proposed networks outperform state-of-the-art methods in generating realistic trajectories over a prediction horizon.
Tasks Traffic Prediction
Published 2019-11-22
URL https://arxiv.org/abs/1911.09837v2
PDF https://arxiv.org/pdf/1911.09837v2.pdf
PWC https://paperswithcode.com/paper/graph-convolution-networks-for-probabilistic
Repo
Framework
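For reference, the propagation rule of a basic graph convolution layer of the kind such frameworks build on, in plain numpy; the vehicle-interaction graph construction and the probabilistic acceleration head are beyond this sketch:

```python
# Kipf & Welling-style graph convolution: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).
import numpy as np

def gcn_layer(H, A, W):
    """H: node features (N, F); A: adjacency (N, N); W: weights (F, F')."""
    A_hat = A + np.eye(len(A))                     # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # symmetric normalization
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU activation

H = np.random.randn(5, 8)  # 5 vehicles, 8 features each
A = (np.random.rand(5, 5) > 0.5).astype(float)
A = np.maximum(A, A.T)     # undirected interaction graph
out = gcn_layer(H, A, np.random.randn(8, 16))
```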

Sequential Graph Dependency Parser

Title Sequential Graph Dependency Parser
Authors Sean Welleck, Kyunghyun Cho
Abstract We propose a method for non-projective dependency parsing by incrementally predicting a set of edges. Since the edges do not have a pre-specified order, we propose a set-based learning method. Our method blends graph, transition, and easy-first parsing, and includes a prior state-of-the-art parser as a special case. The proposed transition-based method parses at near-state-of-the-art accuracy on both projective and non-projective languages, without assuming a particular parsing order.
Tasks Dependency Parsing
Published 2019-05-27
URL https://arxiv.org/abs/1905.10930v2
PDF https://arxiv.org/pdf/1905.10930v2.pdf
PWC https://paperswithcode.com/paper/sequential-graph-dependency-parser
Repo
Framework
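A toy sketch of order-free edge set prediction (a greedy caricature, not the authors' learned transition system): repeatedly commit the highest-scoring (head, dependent) arc whose dependent is still unattached. The score matrix stands in for the model's learned arc scores, and real parsers additionally enforce well-formedness of the resulting tree:

```python
# Greedy set-based arc selection for dependency parsing (illustrative only).
import numpy as np

def greedy_edge_set(scores):
    """scores[h, d]: score of arc head h -> dependent d (token 0 is the root)."""
    n = scores.shape[0]
    np.fill_diagonal(scores, -np.inf)  # no self-loops
    scores[:, 0] = -np.inf             # the root takes no head
    edges, attached = [], set()
    while len(attached) < n - 1:
        h, d = np.unravel_index(np.argmax(scores), scores.shape)
        edges.append((h, d))
        attached.add(d)
        scores[:, d] = -np.inf         # each word gets exactly one head
    return edges

arcs = greedy_edge_set(np.random.randn(5, 5))
```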

Multimodal Emotion Recognition Model using Physiological Signals

Title Multimodal Emotion Recognition Model using Physiological Signals
Authors Yuxuan Zhao, Xinyan Cao, Jinlong Lin, Dunshan Yu, Xixin Cao
Abstract As an important field of research in Human-Machine Interaction, emotion recognition based on physiological signals has become a research hotspot. Motivated by the outstanding performance of deep learning approaches in recognition tasks, we propose a Multimodal Emotion Recognition Model that consists of a 3D convolutional neural network model, a 1D convolutional neural network model, and a biologically inspired multimodal fusion model which integrates multimodal information at the decision level for emotion recognition. We use this model to classify four emotional regions of the arousal-valence plane, i.e., low arousal and low valence (LALV), high arousal and low valence (HALV), low arousal and high valence (LAHV) and high arousal and high valence (HAHV), in the DEAP and AMIGOS datasets. The 3D CNN model and 1D CNN model are used for emotion recognition based on electroencephalogram (EEG) signals and peripheral physiological signals respectively, and achieve accuracies of 93.53% and 95.86% with the original EEG signals on these two datasets. Compared with single-modal recognition, the multimodal fusion model improves the accuracy of emotion recognition by 5% ~ 25%, and the fusion of EEG signals (decomposed into four frequency bands) and peripheral physiological signals achieves accuracies of 95.77% and 97.27%, and 91.07% and 99.74%, on these two datasets respectively. When EEG signals and peripheral physiological signals are integrated, the model reaches its highest accuracy of about 99% on both datasets, showing that the proposed method offers clear advantages for emotion recognition tasks.
Tasks EEG, Emotion Recognition, Multimodal Emotion Recognition
Published 2019-11-29
URL https://arxiv.org/abs/1911.12918v1
PDF https://arxiv.org/pdf/1911.12918v1.pdf
PWC https://paperswithcode.com/paper/multimodal-emotion-recognition-model-using
Repo
Framework
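The four target classes are simply the quadrants of the arousal-valence plane. A small helper makes the labeling concrete; the threshold of 5 assumes the common 1-9 rating scale used by DEAP and AMIGOS:

```python
# Map arousal/valence ratings to the four quadrant labels used above.
def av_quadrant(arousal, valence, threshold=5.0):
    a = "HA" if arousal > threshold else "LA"  # high vs. low arousal
    v = "HV" if valence > threshold else "LV"  # high vs. low valence
    return a + v                               # one of LALV, HALV, LAHV, HAHV

assert av_quadrant(7.2, 3.1) == "HALV"
```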

CAGFuzz: Coverage-Guided Adversarial Generative Fuzzing Testing of Deep Learning Systems

Title CAGFuzz: Coverage-Guided Adversarial Generative Fuzzing Testing of Deep Learning Systems
Authors Pengcheng Zhang, Qiyin Dai, Patrizio Pelliccione
Abstract Deep Learning (DL) systems based on Deep Neural Networks (DNNs) are increasingly used in various aspects of our lives, including unmanned vehicles, speech processing, and robotics. However, due to limited datasets and the dependence on manually labeled data, DNNs often fail to detect their erroneous behaviors, which may lead to serious problems. Several approaches have been proposed to enhance the input examples for testing DL systems. However, they have the following limitations. First, they design and generate adversarial examples from the perspective of a single model, which may generalize poorly when applied to other models. Second, they only use surface feature constraints to judge the difference between a generated adversarial example and the original example; the deep feature constraints, which contain high-level semantic information such as image object category and scene semantics, are completely neglected. To address these two problems, in this paper we propose CAGFuzz, a Coverage-guided Adversarial Generative Fuzzing testing approach, which generates adversarial examples for a targeted DNN to discover its potential defects. First, we train an adversarial example generator (AEG) from the perspective of a general dataset. Second, we extract the deep features of the original and adversarial examples, and constrain the adversarial examples by cosine similarity to ensure that their semantic information remains unchanged. Finally, we use effective adversarial examples for retraining to improve the neuron coverage rate of testing. Based on several popular datasets, we design a set of dedicated experiments to evaluate CAGFuzz. The experimental results show that CAGFuzz can improve the neuron coverage rate, detect hidden errors, and also improve the accuracy of the target DNN.
Tasks
Published 2019-11-14
URL https://arxiv.org/abs/1911.07931v1
PDF https://arxiv.org/pdf/1911.07931v1.pdf
PWC https://paperswithcode.com/paper/cagfuzz-coverage-guided-adversarial
Repo
Framework
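The deep feature constraint reduces to a cosine similarity gate. A sketch, with the feature extractor and threshold left as placeholders:

```python
# Keep a generated example only if its deep features stay close (in cosine
# similarity) to the original's, so high-level semantics are preserved.
import numpy as np

def semantics_preserved(feat_orig, feat_adv, min_cos=0.9):
    cos = feat_orig @ feat_adv / (
        np.linalg.norm(feat_orig) * np.linalg.norm(feat_adv))
    return cos >= min_cos

f0, f1 = np.random.randn(512), np.random.randn(512)
keep = semantics_preserved(f0, 0.95 * f0 + 0.05 * f1)  # small perturbation passes
```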

Iterative Model-Based Reinforcement Learning Using Simulations in the Differentiable Neural Computer

Title Iterative Model-Based Reinforcement Learning Using Simulations in the Differentiable Neural Computer
Authors Adeel Mufti, Svetlin Penkov, Subramanian Ramamoorthy
Abstract We propose a lifelong learning architecture, the Neural Computer Agent (NCA), where a Reinforcement Learning agent is paired with a predictive model of the environment learned by a Differentiable Neural Computer (DNC). The agent and DNC model are trained in conjunction iteratively. The agent improves its policy in simulations generated by the DNC model and rolls out the policy to the live environment, collecting experiences in new portions or tasks of the environment for further learning. Experiments in two synthetic environments show that DNC models can continually learn from pixels alone to simulate new tasks as they are encountered by the agent, while the agents can be successfully trained to solve the tasks using Proximal Policy Optimization entirely in simulations.
Tasks
Published 2019-06-17
URL https://arxiv.org/abs/1906.07248v1
PDF https://arxiv.org/pdf/1906.07248v1.pdf
PWC https://paperswithcode.com/paper/iterative-model-based-reinforcement-learning
Repo
Framework
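The iterative scheme can be summarized as a three-step loop. The skeleton below treats the agent, model, and environment as hypothetical interfaces rather than working implementations:

```python
# Skeleton of the iterative model-based loop described above (interfaces only).
def iterative_training(agent, dnc_model, env, iterations=10):
    experience = []
    for _ in range(iterations):
        # 1. Improve the policy entirely inside the learned DNC simulator.
        agent.train_in_simulation(dnc_model)
        # 2. Roll the policy out in the live environment to gather new data.
        experience += agent.rollout(env)
        # 3. Refit the predictive model on everything seen so far.
        dnc_model.fit(experience)
```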

Scoring-Aggregating-Planning: Learning task-agnostic priors from interactions and sparse rewards for zero-shot generalization

Title Scoring-Aggregating-Planning: Learning task-agnostic priors from interactions and sparse rewards for zero-shot generalization
Authors Huazhe Xu, Boyuan Chen, Yang Gao, Trevor Darrell
Abstract Humans can learn task-agnostic priors from interactive experience and utilize the priors for novel tasks without any finetuning. In this paper, we propose Scoring-Aggregating-Planning (SAP), a framework that can learn task-agnostic semantics and dynamics priors from arbitrary-quality interactions under sparse reward, and then plan on unseen tasks in a zero-shot setting. The framework finds a neural score function for local regional state and action pairs that can be aggregated to approximate the quality of a full trajectory; moreover, a dynamics model learned with self-supervision can be incorporated for planning. Many previous works that leverage interactive data for policy learning either need massive on-policy environmental interactions or assume access to expert data, whereas we achieve a similar goal with purely off-policy, imperfect data. Instantiating our framework results in a policy that generalizes to unseen tasks. Experiments demonstrate that the proposed method can outperform baseline methods on a wide range of applications including gridworld, robotics tasks, and video games.
Tasks
Published 2019-10-17
URL https://arxiv.org/abs/1910.08143v1
PDF https://arxiv.org/pdf/1910.08143v1.pdf
PWC https://paperswithcode.com/paper/scoring-aggregating-planning-learning-task
Repo
Framework
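The aggregation step is simple to state in code: approximate a trajectory's quality by accumulating a learned score over local (state, action) pairs. Summation is our assumption for the aggregator, and `score_fn` stands in for the neural score function:

```python
# Aggregate local scores into an approximate trajectory quality.
import numpy as np

def trajectory_quality(states, actions, score_fn):
    return sum(score_fn(s, a) for s, a in zip(states, actions))

# Toy stand-in for the learned local score function.
score_fn = lambda s, a: -np.linalg.norm(s) + float(a == 0)
q = trajectory_quality([np.ones(3)] * 4, [0, 1, 0, 1], score_fn)
```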

VLUC: An Empirical Benchmark for Video-Like Urban Computing on Citywide Crowd and Traffic Prediction

Title VLUC: An Empirical Benchmark for Video-Like Urban Computing on Citywide Crowd and Traffic Prediction
Authors Renhe Jiang, Zekun Cai, Zhaonan Wang, Chuang Yang, Zipei Fan, Xuan Song, Kota Tsubouchi, Ryosuke Shibasaki
Abstract Nowadays, massive urban human mobility data are being generated from mobile phones, car navigation systems, and traffic sensors. Predicting the density and flow of crowds or traffic at a citywide level becomes possible by using this big data together with cutting-edge AI technologies. It has been a very significant research topic with high social impact, which can be widely applied to emergency management, traffic regulation, and urban planning. In particular, by meshing a large urban area into a number of fine-grained mesh-grids, citywide crowd and traffic information over a continuous time period can be represented like a video, where each timestamp can be seen as one video frame. Based on this idea, a series of methods have been proposed to address video-like prediction for citywide crowds and traffic. In this study, we publish a new aggregated human mobility dataset generated from a real-world smartphone application and build a standard benchmark for this kind of video-like urban computing with the new dataset and the existing open datasets. We first comprehensively review the state-of-the-art literature and formulate the density and in-out flow prediction problems, then conduct a thorough performance assessment of those methods. With this benchmark, we hope researchers can easily follow up and quickly launch new solutions on this topic.
Tasks Traffic Prediction
Published 2019-11-16
URL https://arxiv.org/abs/1911.06982v1
PDF https://arxiv.org/pdf/1911.06982v1.pdf
PWC https://paperswithcode.com/paper/vluc-an-empirical-benchmark-for-video-like
Repo
Framework
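The "video-like" representation amounts to binning mobility points into a fixed mesh-grid per timestamp, so each timestamp yields one density frame. A numpy sketch with illustrative grid size and city bounds:

```python
# Turn per-timestamp mobility points into a (T, H, W) "video" of densities.
import numpy as np

def density_frames(points_per_t, bounds, grid=(32, 32)):
    """points_per_t: list of (N_t, 2) lon/lat arrays, one per timestamp."""
    (lon0, lon1), (lat0, lat1) = bounds
    frames = [np.histogram2d(p[:, 1], p[:, 0], bins=grid,
                             range=[[lat0, lat1], [lon0, lon1]])[0]
              for p in points_per_t]
    return np.stack(frames)  # one density frame per timestamp

video = density_frames([np.random.rand(1000, 2) for _ in range(8)],
                       bounds=((0.0, 1.0), (0.0, 1.0)))
```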

Automatic Prostate Zonal Segmentation Using Fully Convolutional Network with Feature Pyramid Attention

Title Automatic Prostate Zonal Segmentation Using Fully Convolutional Network with Feature Pyramid Attention
Authors Yongkai Liu, Guang Yang, Sohrab Afshari Mirak, Melina Hosseiny, Afshin Azadikhah, Xinran Zhong, Robert E. Reiter, Yeejin Lee, Steven Raman, Kyunghyun Sung
Abstract Our main objective is to develop a novel deep learning-based algorithm for automatic segmentation of prostate zones and to evaluate the proposed algorithm on additional independent testing data in comparison with the inter-reader consistency between two experts. With IRB approval and HIPAA compliance, we designed a novel convolutional neural network (CNN) for automatic segmentation of the prostatic transition zone (TZ) and peripheral zone (PZ) on T2-weighted (T2w) MRI. The total study cohort included 359 patients from two sources: 313 from a deidentified publicly available dataset (SPIE-AAPM-NCI PROSTATEX challenge) and 46 from a large U.S. tertiary referral center with 3T MRI (external testing dataset (ETD)). The TZ and PZ contours were manually annotated by research fellows, supervised by genitourinary (GU) radiologists. The model was developed using 250 patients, tested internally using the remaining 63 patients from PROSTATEX (internal testing dataset (ITD)), and tested again (n=46) externally using the ETD. The Dice Similarity Coefficient (DSC) was used to evaluate segmentation performance. DSCs for PZ and TZ were 0.74 and 0.86 in the ITD, respectively. In the ETD, DSCs for PZ and TZ were 0.74 and 0.792, respectively. The inter-reader consistency (Expert 2 vs. Expert 1) was 0.71 (PZ) and 0.75 (TZ). This novel DL algorithm enabled automatic segmentation of PZ and TZ with high accuracy on both the ITD and the ETD, with no performance difference for PZ and less than a 10% difference for TZ. In the ETD, the proposed method is comparable to experts in the segmentation of prostate zones.
Tasks
Published 2019-10-31
URL https://arxiv.org/abs/1911.00127v1
PDF https://arxiv.org/pdf/1911.00127v1.pdf
PWC https://paperswithcode.com/paper/automatic-prostate-zonal-segmentation-using
Repo
Framework
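For reference, the evaluation metric used above: the Dice Similarity Coefficient between two binary masks, DSC = 2|A ∩ B| / (|A| + |B|), in a few lines of numpy:

```python
# Dice Similarity Coefficient between two binary segmentation masks.
import numpy as np

def dice(a, b, eps=1e-8):
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum() + eps)

m1 = np.zeros((64, 64)); m1[10:40, 10:40] = 1
m2 = np.zeros((64, 64)); m2[15:45, 15:45] = 1
print(round(dice(m1, m2), 3))  # overlap of two offset squares
```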

Effects of Blur and Deblurring to Visual Object Tracking

Title Effects of Blur and Deblurring to Visual Object Tracking
Authors Qing Guo, Wei Feng, Zhihao Chen, Ruijun Gao, Liang Wan, Song Wang
Abstract Intuitively, motion blur may hurt the performance of visual object tracking. However, we lack quantitative evaluation of tracker robustness to different levels of motion blur. Meanwhile, while image deblurring methods can produce visually clearer videos that are pleasing to human eyes, it is unknown whether visual object tracking can benefit from image deblurring or not. In this paper, we address these two problems by constructing a Blurred Video Tracking benchmark, which contains a variety of videos with different levels of motion blur, as well as ground truth tracking results for evaluating trackers. We extensively evaluate 23 trackers on this benchmark and observe several new and interesting results. Specifically, we find that light blur may improve the performance of many trackers, but heavy blur always hurts tracking performance. We also find that image deblurring may help to improve tracking performance on heavily blurred videos but hurt performance on lightly blurred videos. Based on these observations, we propose a new GAN-based scheme to improve tracker robustness to motion blur. In this scheme, a fine-tuned discriminator is used as an adaptive assessor to selectively deblur frames during the tracking process. We use this scheme to successfully improve the accuracy and robustness of 6 trackers.
Tasks Deblurring, Object Tracking, Visual Object Tracking
Published 2019-08-21
URL https://arxiv.org/abs/1908.07904v1
PDF https://arxiv.org/pdf/1908.07904v1.pdf
PWC https://paperswithcode.com/paper/190807904
Repo
Framework
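The proposed scheme can be summarized as a per-frame gate: the fine-tuned discriminator scores each frame, and only frames judged heavily blurred are deblurred before being passed to the tracker. A skeleton with all components left as hypothetical interfaces:

```python
# Adaptive deblurring during tracking (interface skeleton, not the paper's code).
def track_with_adaptive_deblur(frames, tracker, discriminator, deblurrer, tau=0.5):
    results = []
    for frame in frames:
        if discriminator.blur_score(frame) > tau:  # heavy blur detected
            frame = deblurrer(frame)               # deblur only when it should help
        results.append(tracker.update(frame))
    return results
```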

Jointly Trained Image and Video Generation using Residual Vectors

Title Jointly Trained Image and Video Generation using Residual Vectors
Authors Yatin Dandi, Aniket Das, Soumye Singhal, Vinay P. Namboodiri, Piyush Rai
Abstract In this work, we propose a modeling technique for jointly training image and video generation models by simultaneously learning to map latent variables with a fixed prior onto real images and interpolate over images to generate videos. The proposed approach models the variations in representations using residual vectors encoding the change at each time step over a summary vector for the entire video. We utilize the technique to jointly train an image generation model with a fixed prior along with a video generation model lacking constraints such as disentanglement. The joint training enables the image generator to exploit temporal information while the video generation model learns to flexibly share information across frames. Moreover, experimental results verify our approach’s compatibility with pre-training on videos or images and training on datasets containing a mixture of both. A comprehensive set of quantitative and qualitative evaluations reveal the improvements in sample quality and diversity over both video generation and image generation baselines. We further demonstrate the technique’s capabilities of exploiting similarity in features across frames by applying it to a model based on decomposing the video into motion and content. The proposed model allows minor variations in content across frames while maintaining the temporal dependence through latent vectors encoding the pose or motion features.
Tasks Image Generation, Video Generation
Published 2019-12-17
URL https://arxiv.org/abs/1912.07991v1
PDF https://arxiv.org/pdf/1912.07991v1.pdf
PWC https://paperswithcode.com/paper/jointly-trained-image-and-video-generation
Repo
Framework
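The latent structure is compact enough to sketch directly: each frame's latent is a shared summary vector for the whole video plus a per-frame residual encoding the change at that time step. Dimensions below are illustrative, and the decoder that maps latents to frames is omitted:

```python
# Summary-plus-residual latents for joint image/video generation (sketch).
import torch

def frame_latents(summary, residuals):
    """summary: (d,); residuals: (T, d) -> per-frame latents (T, d)."""
    return summary.unsqueeze(0) + residuals  # broadcast summary over frames

z = frame_latents(torch.randn(128), 0.1 * torch.randn(16, 128))
```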