January 25, 2020

Paper Group NAWR 23

6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images. TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts. Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks. Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes. A Do …

6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images


Title	6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images
Authors	Di Wu, Zhaoyong Zhuang, Canqun Xiang, Wenbin Zou and Xia Li
Abstract	We present a conceptually simple framework for 6DoF object pose estimation, especially for autonomous driving scenarios. Our approach efficiently detects traffic participants in a monocular RGB image while simultaneously regressing their 3D translation and rotation vectors. The method, called 6D-VNet, extends Mask R-CNN by adding customised heads for predicting the vehicle's finer class, rotation and translation. Unlike previous methods, the proposed 6D-VNet is trained end-to-end. Furthermore, we show that the inclusion of translational regression in the joint losses is crucial for the 6DoF pose estimation task, where object translation distance along the longitudinal axis varies significantly, e.g., in autonomous driving scenarios. Additionally, we incorporate the mutual information between traffic participants via a modified non-local block. As opposed to the original non-local block implementation, the proposed weighting modification takes the spatial neighbouring information into consideration whilst counteracting the effect of extreme gradient values. Our 6D-VNet reaches 1st place in the ApolloScape challenge 3D Car Instance task [21]. Code has been made available at: https://github.com/stevenwudi/6DVNET.
Tasks	Autonomous Driving, Pose Estimation
Published	2019-06-15
URL	http://openaccess.thecvf.com/content_CVPRW_2019/html/WAD/Wu_6D-VNet_End-To-End_6-DoF_Vehicle_Pose_Estimation_From_Monocular_RGB_Images_CVPRW_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPRW_2019/papers/WAD/Wu_6D-VNet_End-To-End_6-DoF_Vehicle_Pose_Estimation_From_Monocular_RGB_Images_CVPRW_2019_paper.pdf
PWC	https://paperswithcode.com/paper/6d-vnet-end-to-end-6dof-vehicle-pose
Repo	https://github.com/stevenwudi/6DVNET
Framework	pytorch

TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts


Title	TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts
Authors	Ziyang Hong, Yvan Petillot, David Lane, Yishu Miao, Sen Wang
Abstract	Visual place recognition is a fundamental problem for many vision-based applications. Sparse feature and deep learning based methods have been successful and dominant over the past decade. However, most of them do not explicitly leverage high-level semantic information to deal with challenging scenarios where they may fail. This paper proposes a novel visual place recognition algorithm, termed TextPlace, based on scene texts in the wild. Since scene texts are high-level information that is invariant to illumination changes and very distinct for different places when considering spatial correlation, they are beneficial for visual place recognition tasks under extreme appearance changes and perceptual aliasing. TextPlace also takes spatial-temporal dependence between scene texts into account for topological localization. Extensive experiments show that TextPlace achieves state-of-the-art performance, verifying the effectiveness of using high-level scene texts for robust visual place recognition in urban areas.
Tasks	Visual Place Recognition
Published	2019-10-01
URL	http://openaccess.thecvf.com/content_ICCV_2019/html/Hong_TextPlace_Visual_Place_Recognition_and_Topological_Localization_Through_Reading_Scene_ICCV_2019_paper.html
PDF	http://openaccess.thecvf.com/content_ICCV_2019/papers/Hong_TextPlace_Visual_Place_Recognition_and_Topological_Localization_Through_Reading_Scene_ICCV_2019_paper.pdf
PWC	https://paperswithcode.com/paper/textplace-visual-place-recognition-and
Repo	https://github.com/ziyanghong/dataset
Framework	none

Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks


Title	Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks
Authors	Kohei Hayashi, Taiki Yamaguchi, Yohei Sugawara, Shin-Ichi Maeda
Abstract	Tensor decomposition methods are widely used for model compression and fast inference in convolutional neural networks (CNNs). Although many decompositions are conceivable, only CP decomposition and a few others have been applied in practice, and no extensive comparisons have been made between available methods. Previous studies have not determined how many decompositions are available, nor which of them is optimal. In this study, we first characterize a decomposition class specific to CNNs by adopting a flexible graphical notation. The class includes such well-known CNN modules as depthwise separable convolution layers and bottleneck layers, but also previously unknown modules with nonlinear activations. We also experimentally compare the tradeoff between prediction accuracy and time/space complexity for modules found by enumerating all possible decompositions, or by using a neural architecture search. We find some nonlinear decompositions outperform existing ones.
Tasks	Model Compression, Neural Architecture Search
Published	2019-12-01
URL	http://papers.nips.cc/paper/8793-exploring-unexplored-tensor-network-decompositions-for-convolutional-neural-networks
PDF	http://papers.nips.cc/paper/8793-exploring-unexplored-tensor-network-decompositions-for-convolutional-neural-networks.pdf
PWC	https://paperswithcode.com/paper/exploring-unexplored-tensor-network
Repo	https://github.com/pfnet-research/einconv
Framework	pytorch
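
The abstract above names depthwise separable convolution layers and bottleneck layers as members of the decomposition class. As a rough illustration of why such modules are used for compression, the sketch below counts weights for each module type. The function names and the example channel sizes are hypothetical and are not taken from the paper or from the einconv repository.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def bottleneck_params(k: int, c_in: int, c_out: int, c_mid: int) -> int:
    """Weights in a 1 x 1 reduce, k x k conv, 1 x 1 expand bottleneck."""
    return c_in * c_mid + k * k * c_mid * c_mid + c_mid * c_out


if __name__ == "__main__":
    # Illustrative sizes only: 3 x 3 kernel, 64 -> 128 channels, 16 in the middle.
    print(conv_params(3, 64, 128))                 # 73728
    print(depthwise_separable_params(3, 64, 128))  # 8768
    print(bottleneck_params(3, 64, 128, 16))       # 5376
```

These counts cover weights only. The accuracy and runtime tradeoffs that the paper measures are not captured here.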

Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes


Title	Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes
Authors	Greg Yang
Abstract	Wide neural networks with random weights and biases are Gaussian processes, as observed by Neal (1995) for shallow networks, and more recently by Lee et al. (2018) and Matthews et al. (2018) for deep fully-connected networks, as well as by Novak et al. (2019) and Garriga-Alonso et al. (2019) for deep convolutional networks. We show that this Neural Network-Gaussian Process correspondence surprisingly extends to all modern feedforward or recurrent neural networks composed of multilayer perceptron, RNNs (e.g. LSTMs, GRUs), (nD or graph) convolution, pooling, skip connection, attention, batch normalization, and/or layer normalization. More generally, we introduce a language for expressing neural network computations, and our result encompasses all such expressible neural networks. This work serves as a tutorial on the tensor programs technique formulated in Yang (2019) and elucidates the Gaussian Process results obtained there. We provide open-source implementations of the Gaussian Process kernels of simple RNN, GRU, transformer, and batchnorm+ReLU network at github.com/thegregyang/GP4A. Please see our arxiv version for the complete and up-to-date version of this paper.
Tasks	Gaussian Processes
Published	2019-12-01
URL	http://papers.nips.cc/paper/9186-wide-feedforward-or-recurrent-neural-networks-of-any-architecture-are-gaussian-processes
PDF	http://papers.nips.cc/paper/9186-wide-feedforward-or-recurrent-neural-networks-of-any-architecture-are-gaussian-processes.pdf
PWC	https://paperswithcode.com/paper/wide-feedforward-or-recurrent-neural-networks
Repo	https://github.com/thegregyang/GP4A
Framework	none
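
To make the Neural Network-Gaussian Process correspondence concrete, here is a minimal sketch of the kernel recursion for the simplest case the abstract cites: a deep fully-connected ReLU network. It uses the standard closed form for the expected product of two ReLU units under a bivariate Gaussian (the degree-1 arc-cosine kernel). The function names, the input-layer scaling, and the default variances are assumptions made for illustration. This is not code from the GP4A repository and does not cover the RNN, attention, or normalization kernels the paper derives.

```python
import math


def relu_nngp_step(k_xx, k_xy, k_yy, sigma_w2=2.0, sigma_b2=0.0):
    """Push a 2 x 2 kernel (k_xx, k_xy, k_yy) through one wide ReLU layer."""
    norm = math.sqrt(k_xx * k_yy)
    if norm == 0.0:
        new_xy = sigma_b2
    else:
        # Clamp to guard against rounding pushing the cosine outside [-1, 1].
        cos_theta = max(-1.0, min(1.0, k_xy / norm))
        theta = math.acos(cos_theta)
        new_xy = sigma_b2 + sigma_w2 / (2.0 * math.pi) * norm * (
            math.sin(theta) + (math.pi - theta) * math.cos(theta)
        )
    # On the diagonal theta = 0, so E[relu(u)^2] = k / 2.
    new_xx = sigma_b2 + sigma_w2 * k_xx / 2.0
    new_yy = sigma_b2 + sigma_w2 * k_yy / 2.0
    return new_xx, new_xy, new_yy


def relu_nngp_kernel(x, y, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Kernel entries after `depth` hidden ReLU layers (assumed convention)."""
    d = len(x)

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    # Assumed input-layer kernel: sigma_b2 + sigma_w2 * <x, y> / d.
    k_xx = sigma_b2 + sigma_w2 * dot(x, x) / d
    k_xy = sigma_b2 + sigma_w2 * dot(x, y) / d
    k_yy = sigma_b2 + sigma_w2 * dot(y, y) / d
    for _ in range(depth):
        k_xx, k_xy, k_yy = relu_nngp_step(k_xx, k_xy, k_yy, sigma_w2, sigma_b2)
    return k_xx, k_xy, k_yy


if __name__ == "__main__":
    print(relu_nngp_kernel([1.0, 0.0], [0.0, 1.0], depth=1))
```

With `sigma_w2=2.0` and `sigma_b2=0.0` the diagonal entries stay constant across layers, which makes the effect of depth on the off-diagonal entry easy to inspect.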

A Domain Agnostic Measure for Monitoring and Evaluating GANs


Title	A Domain Agnostic Measure for Monitoring and Evaluating GANs
Authors	Paulina Grnarova, Kfir Y. Levy, Aurelien Lucchi, Nathanael Perraudin, Ian Goodfellow, Thomas Hofmann, Andreas Krause
Abstract	Generative Adversarial Networks (GANs) have shown remarkable results in modeling complex distributions, but their evaluation remains an unsettled issue. Evaluations are essential for: (i) relative assessment of different models and (ii) monitoring the progress of a single model throughout training. The latter cannot be determined by simply inspecting the generator and discriminator loss curves as they behave non-intuitively. We leverage the notion of duality gap from game theory to propose a measure that addresses both (i) and (ii) at a low computational cost. Extensive experiments show the effectiveness of this measure to rank different GAN models and capture the typical GAN failure scenarios, including mode collapse and non-convergent behaviours. This evaluation metric also provides meaningful monitoring on the progression of the loss during training. It highly correlates with FID on natural image datasets, and with domain specific scores for text, sound and cosmology data where FID is not directly suitable. In particular, our proposed metric requires no labels or a pretrained classifier, making it domain agnostic.
Tasks
Published	2019-12-01
URL	http://papers.nips.cc/paper/9377-a-domain-agnostic-measure-for-monitoring-and-evaluating-gans
PDF	http://papers.nips.cc/paper/9377-a-domain-agnostic-measure-for-monitoring-and-evaluating-gans.pdf
PWC	https://paperswithcode.com/paper/a-domain-agnostic-measure-for-monitoring-and
Repo	https://github.com/pgrnar/DualityGap
Framework	none
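
The duality gap mentioned in the abstract can be illustrated on a toy two-player game. For a payoff M(u, v) that one player minimises over u and the other maximises over v, the gap at (u, v) is the worst-case payoff against u minus the best-case payoff against v. It is non-negative and equals zero at a saddle point. The sketch below assumes a hand-picked quadratic payoff and uses exhaustive grid search for the inner optimisations. Both are stand-ins chosen for illustration and are not the estimation procedure used in the paper or in the DualityGap repository.

```python
def payoff(u: float, v: float) -> float:
    """Toy convex-concave payoff with a saddle point at (0, 0)."""
    return u * u - v * v + u * v


def duality_gap(u: float, v: float, grid) -> float:
    """max over v' of M(u, v') minus min over u' of M(u', v), via grid search."""
    worst_case_for_u = max(payoff(u, v_alt) for v_alt in grid)
    best_case_against_v = min(payoff(u_alt, v) for u_alt in grid)
    return worst_case_for_u - best_case_against_v


def make_grid(n: int = 2001, low: float = -1.0, high: float = 1.0):
    """Evenly spaced candidate strategies on [low, high]."""
    return [low + (high - low) * i / (n - 1) for i in range(n)]


if __name__ == "__main__":
    grid = make_grid()
    print(duality_gap(0.0, 0.0, grid))   # at the saddle point
    print(duality_gap(0.4, -0.2, grid))  # away from the saddle point
```

For this payoff the gap works out analytically to 1.25 * (u^2 + v^2), so it grows smoothly as the pair of strategies moves away from the equilibrium.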

Recurrent Highway Networks with Grouped Auxiliary Memory


Title	Recurrent Highway Networks with Grouped Auxiliary Memory
Authors	Wei Luo, Feng Yu
Abstract	Recurrent neural networks (RNNs) are challenging to train, let alone those with deep spatial structures. Architectures built upon highway connections such as Recurrent Highway Network (RHN) were developed to allow larger step-to-step transition depth, leading to more expressive models. However, problems that require capturing long-term dependencies still cannot be well addressed by these models. Moreover, the ability to keep long-term memories tends to diminish when the spatial depth increases, since a deeper structure may accelerate gradient vanishing. In this paper, we address these issues by proposing a novel RNN architecture based on RHN, namely the Recurrent Highway Network with Grouped Auxiliary Memory (GAM-RHN). The proposed architecture interconnects the RHN with a set of auxiliary memory units specifically for storing long-term information via reading and writing operations, which is analogous to Memory Augmented Neural Networks (MANNs). Experimental results on artificial long time lag tasks show that GAM-RHNs can be trained efficiently while being deep in both time and space. We also evaluate the proposed architecture on a variety of tasks, including language modeling, sequential image classification, and financial market forecasting. The potential of our approach is demonstrated by achieving state-of-the-art results on these tasks.
Tasks	Image Classification, Language Modelling, Sequential Image Classification, Stock Trend Prediction
Published	2019-12-13
URL	https://ieeexplore.ieee.org/document/8932404
PDF	https://ieeexplore.ieee.org/document/8932404
PWC	https://paperswithcode.com/paper/recurrent-highway-networks-with-grouped
Repo	https://github.com/WilliamRo/gam_rhn
Framework	tf

Preprocessing Method for Performance Enhancement in CNN-based STEMI Detection from 12-lead ECG


Title	Preprocessing Method for Performance Enhancement in CNN-based STEMI Detection from 12-lead ECG
Authors	YeongHyeon Park, Il Dong Yun, Si-Hyuck Kang
Abstract	ST elevation myocardial infarction (STEMI) is an acute life-threatening disease. Mortality risk is high when a patient is not treated within the golden time, so prompt diagnosis with limited information such as an electrocardiogram (ECG) is crucial. However, previous studies among physicians and paramedics have shown that the accuracy of STEMI diagnosis by ECG is not sufficient. Thus, we propose a detection algorithm based on a convolutional neural network (CNN) for detecting STEMI on 12-lead ECG in order to support physicians, especially in an emergency room. We mostly focus on enhancing the detection performance using a preprocessing technique. First, we reduce the noise of the ECG using a notch filter and a high-pass filter. We also segment pulses from the ECG to focus on the ST segment. We use 96 normal and 179 STEMI records provided by Seoul National University Bundang Hospital (SNUBH) for the experiment. The sensitivity, specificity, and area under the curve (AUC) of the receiver operating characteristic (ROC) curve increase from 0.685, 0.350, and 0.526 to 0.932, 0.896, and 0.943, respectively, depending on the preprocessing technique. As our results show, the proposed method is effective in enhancing STEMI detection performance. The proposed algorithm is also expected to help with the timely and accurate diagnosis of STEMI in clinical practice.
Tasks
Published	2019-07-24
URL	https://ieeexplore.ieee.org/abstract/document/8771175
PDF	https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8771175
PWC	https://paperswithcode.com/paper/preprocessing-method-for-performance
Repo	https://github.com/YeongHyeon/Enhancementing-Method-for-STEMI-Detection
Framework	tf
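
The abstract mentions a high-pass filter as one preprocessing step but does not give its design. As a hedged illustration of what such a step does to a signal, the sketch below implements a generic first-order (RC-style) discrete high-pass filter, which suppresses slow baseline drift. The filter order, the cutoff, the sampling rate, and the function names are all assumptions. They are not taken from the paper or its repository, and the notch filter and pulse segmentation steps are not shown.

```python
import math


def alpha_from_cutoff(cutoff_hz: float, sample_rate_hz: float) -> float:
    """Smoothing factor of a first-order RC high-pass for a given cutoff."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    return rc / (rc + dt)


def high_pass(samples, alpha: float):
    """First-order high-pass: y[i] = alpha * (y[i-1] + x[i] - x[i-1])."""
    if not samples:
        return []
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


if __name__ == "__main__":
    # Hypothetical settings: 0.5 Hz cutoff on a signal sampled at 500 Hz.
    alpha = alpha_from_cutoff(0.5, 500.0)
    drifting = [0.001 * i for i in range(1000)]  # slow ramp, no heartbeat
    filtered = high_pass(drifting, alpha)
    print(alpha, filtered[-1])
```

A constant offset decays to zero under this filter, while fast changes pass through largely unchanged, which is the behaviour wanted for baseline-wander removal.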

Activation Atlas


Title	Activation Atlas
Authors	Shan Carter, Zan Armstrong, Ludwig Schubert, Ian Johnson, Chris Olah
Abstract	By using feature inversion to visualize millions of activations from an image classification network, we create an explorable activation atlas of features the network has learned which can reveal how the network typically represents some concepts.
Tasks	Image Classification
Published	2019-03-06
URL	https://distill.pub/2019/activation-atlas/
PDF	https://distill.pub/2019/activation-atlas/
PWC	https://paperswithcode.com/paper/activation-atlas
Repo	https://github.com/tensorflow/lucid
Framework	tf

Practical Differentially Private Top-k Selection with Pay-what-you-get Composition


Title	Practical Differentially Private Top-k Selection with Pay-what-you-get Composition
Authors	David Durfee, Ryan M. Rogers
Abstract	We study the problem of top-k selection over a large domain universe subject to user-level differential privacy. Typically, the exponential mechanism or report noisy max are the algorithms used to solve this problem. However, these algorithms require querying the database for the count of each domain element. We focus on the setting where the data domain is unknown, which is different from the setting of frequent itemsets where an apriori type algorithm can help prune the space of domain elements to query. We design algorithms that ensure (approximate) differential privacy and only need access to the true top-k’ elements from the data for any chosen k’ ≥ k. This is a highly desirable feature for making differential privacy practical, since the algorithms require no knowledge of the domain. We consider both the setting where a user’s data can modify an arbitrary number of counts by at most 1, i.e. unrestricted sensitivity, and the setting where a user’s data can modify at most some small, fixed number of counts by at most 1, i.e. restricted sensitivity. Additionally, we provide a pay-what-you-get privacy composition bound for our algorithms. That is, our algorithms might return fewer than k elements when the top-k elements are queried, but the overall privacy budget only decreases by the size of the outcome set.
Tasks
Published	2019-12-01
URL	http://papers.nips.cc/paper/8612-practical-differentially-private-top-k-selection-with-pay-what-you-get-composition
PDF	http://papers.nips.cc/paper/8612-practical-differentially-private-top-k-selection-with-pay-what-you-get-composition.pdf
PWC	https://paperswithcode.com/paper/practical-differentially-private-top-k
Repo	https://github.com/rrogers386/DPComposition
Framework	none
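
For context, the baseline that the abstract contrasts against can be sketched in a few lines. The code below selects a single top element with the exponential mechanism, implemented through the Gumbel-max trick: adding Gumbel noise of scale 2 * sensitivity / epsilon to each count and taking the argmax samples an item with probability proportional to exp(epsilon * count / (2 * sensitivity)). It assumes the full domain and every count are known, which is exactly the requirement the paper sets out to remove. The function names are hypothetical, and this is not the paper's unknown-domain algorithm or its pay-what-you-get composition.

```python
import math
import random


def gumbel(rng: random.Random) -> float:
    """Draw one standard Gumbel(0, 1) sample by inverse transform."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def exponential_mechanism_argmax(counts, epsilon, sensitivity=1.0, rng=None):
    """Pick one key of `counts` via the Gumbel-max form of the exponential mechanism."""
    rng = rng or random.Random()
    scale = 2.0 * sensitivity / epsilon
    # The key function is evaluated once per item, so each item gets one noise draw.
    return max(counts, key=lambda item: counts[item] + scale * gumbel(rng))


if __name__ == "__main__":
    counts = {"a": 100, "b": 95, "c": 10}  # illustrative counts
    picks = [
        exponential_mechanism_argmax(counts, epsilon=0.5, rng=random.Random(seed))
        for seed in range(10)
    ]
    print(picks)
```

Smaller values of `epsilon` add more noise, so near-ties such as "a" and "b" above are resolved less reliably in exchange for stronger privacy.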

Performance prediction of data streams on high-performance architecture


Title	Performance prediction of data streams on high-performance architecture
Authors	Bhaskar Gautam, Annappa Basava
Abstract	Sensor streams worldwide keep growing in volume and velocity. To keep pace, large stream data processing systems are moving from homogeneous to rack-scale architectures, which raises serious concerns for workload optimization, scheduling, and resource management algorithms. Our proposed framework provides an architecture-independent performance prediction model to enable a resource-adaptive distributed stream data processing platform. It comprises seven pre-defined domains for dynamic data stream metrics, including a self-driven model that tries to fit these metrics using a ridge-regularized regression algorithm. Another significant contribution is a fully automated performance prediction model for distributed stream processing systems, inherited from a state-of-the-art distributed data management system, that uses Gaussian process regression and clusters metrics with the help of a dimensionality reduction algorithm. We implemented it on Apache Heron and evaluated it with the proposed benchmark suite comprising five domain-specific topologies. To assess the proposed methodologies, we deliberately introduce tuple skewness among the benchmarking topologies to set up the ground truth for predictions. We found that the accuracy of predicting data stream performance increased up to 80.62% from 66.36%, along with a reduction of error from 37.14% to 16.06%.
Tasks	Dimensionality Reduction, Gaussian Processes
Published	2019-01-07
URL	https://doi.org/10.1186/s13673-018-0163-4
PDF	https://rdcu.be/bMVaG
PWC	https://paperswithcode.com/paper/performance-prediction-of-data-streams-on
Repo	https://github.com/bhaskar24/StreamBenchmark
Framework	none

Optimal Decision Tree with Noisy Outcomes


Title	Optimal Decision Tree with Noisy Outcomes
Authors	Su Jia, Viswanath Nagarajan, Fatemeh Navidi, R Ravi
Abstract	A fundamental task in active learning involves performing a sequence of tests to identify an unknown hypothesis that is drawn from a known distribution. This problem, known as optimal decision tree induction, has been widely studied for decades and the asymptotically best-possible approximation algorithm has been devised for it. We study a generalization where certain test outcomes are noisy, even in the more general case when the noise is persistent, i.e., repeating the test on the scenario gives the same noisy output, disallowing simple repetition as a way to gain confidence. We design new approximation algorithms for both the non-adaptive setting, where the test sequence must be fixed a-priori, and the adaptive setting where the test sequence depends on the outcomes of prior tests. Previous work in the area assumed at most a constant number of noisy outcomes per test and per scenario and provided approximation ratios that were problem dependent (such as the minimum probability of a hypothesis). Our new approximation algorithms provide guarantees that are nearly best-possible and work for the general case of a large number of noisy outcomes per test or per hypothesis where the performance degrades smoothly with this number. Our results adapt and generalize methods used for submodular ranking and stochastic set cover. We evaluate the performance of our algorithms on two natural applications with noise: toxic chemical identification and active learning of linear classifiers. Despite our logarithmic theoretical approximation guarantees, our methods give solutions with cost very close to the information theoretic minimum, demonstrating the effectiveness of our methods.
Tasks	Active Learning
Published	2019-12-01
URL	http://papers.nips.cc/paper/8592-optimal-decision-tree-with-noisy-outcomes
PDF	http://papers.nips.cc/paper/8592-optimal-decision-tree-with-noisy-outcomes.pdf
PWC	https://paperswithcode.com/paper/optimal-decision-tree-with-noisy-outcomes
Repo	https://github.com/sjia1/ODT-with-noisy-outcomes
Framework	none

Pyramid U-Network for Skeleton Extraction From Shape Points


Title	Pyramid U-Network for Skeleton Extraction From Shape Points
Authors	Rowel Atienza
Abstract	The knowledge about the skeleton of a given geometric shape has many practical applications such as shape animation, shape comparison, shape recognition, and estimating structural strength. Skeleton extraction becomes a more challenging problem when the topology is represented in the point cloud domain. In this paper, we present the network architecture, PSPU-SkelNet, for TeamPH, which ranked 3rd in the Point SkelNetOn 2019 challenge. PSPU-SkelNet is a pyramid of three U-Nets that predicts the skeleton from a given shape point cloud. PSPU-SkelNet achieves a Chamfer Distance (CD) of 2.9105 on the final test dataset. The code of PSPU-SkelNet is available at https://github.com/roatienza/skelnet.
Tasks
Published	2019-06-17
URL	http://openaccess.thecvf.com/content_CVPRW_2019/papers/SkelNetOn/Atienza_Pyramid_U-Network_for_Skeleton_Extraction_From_Shape_Points_CVPRW_2019_paper.pdf
PDF	http://openaccess.thecvf.com/content_CVPRW_2019/papers/SkelNetOn/Atienza_Pyramid_U-Network_for_Skeleton_Extraction_From_Shape_Points_CVPRW_2019_paper.pdf
PWC	https://paperswithcode.com/paper/pyramid-u-network-for-skeleton-extraction
Repo	https://github.com/roatienza/skelnet
Framework	none
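
The Chamfer Distance reported in the abstract compares a predicted point set against a ground-truth point set. The sketch below shows one common definition: the mean nearest-neighbour distance from each set to the other, summed over both directions. The exact variant used by the SkelNetOn challenge (squared versus plain distances, sum versus mean) is not stated above, so treat this as an assumed form rather than the challenge's scoring code. The function name is hypothetical.

```python
import math


def chamfer_distance(points_a, points_b) -> float:
    """Symmetric Chamfer distance between two non-empty point sets."""

    def one_way(src, dst) -> float:
        # Mean distance from each source point to its nearest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (1.0, 0.0)]
    target = [(0.0, 0.0), (1.0, 1.0)]
    print(chamfer_distance(predicted, target))  # 1.0
```

This brute-force version is quadratic in the number of points. Larger point clouds would normally use a spatial index for the nearest-neighbour search.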

Sphere Generative Adversarial Network Based on Geometric Moment Matching


Title	Sphere Generative Adversarial Network Based on Geometric Moment Matching
Authors	Sung Woo Park, Junseok Kwon
Abstract	We propose sphere generative adversarial network (GAN), a novel integral probability metric (IPM)-based GAN. Sphere GAN uses the hypersphere to bound IPMs in the objective function. Thus, it can be trained stably. On the hypersphere, sphere GAN exploits the information of higher-order statistics of data using geometric moment matching, thereby providing more accurate results. In the paper, we mathematically prove the good properties of sphere GAN. In experiments, sphere GAN quantitatively and qualitatively surpasses recent state-of-the-art GANs for unsupervised image generation problems with the CIFAR-10, STL-10, and LSUN bedroom datasets. Source code is available at https://github.com/pswkiki/SphereGAN.
Tasks	Image Generation
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Park_Sphere_Generative_Adversarial_Network_Based_on_Geometric_Moment_Matching_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Park_Sphere_Generative_Adversarial_Network_Based_on_Geometric_Moment_Matching_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/sphere-generative-adversarial-network-based
Repo	https://github.com/taki0112/SphereGAN-Tensorflow
Framework	tf

Fixed That for You: Generating Contrastive Claims with Semantic Edits


Title	Fixed That for You: Generating Contrastive Claims with Semantic Edits
Authors	Christopher Hidey, Kathy McKeown
Abstract	Understanding contrastive opinions is a key component of argument generation. Central to an argument is the claim, a statement that is in dispute. Generating a counter-argument then requires generating a response in contrast to the main claim of the original argument. To generate contrastive claims, we create a corpus of Reddit comment pairs self-labeled by posters using the acronym FTFY (fixed that for you). We then train neural models on these pairs to edit the original claim and produce a new claim with a different view. We demonstrate significant improvement over a sequence-to-sequence baseline in BLEU score and a human evaluation for fluency, coherence, and contrast.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/N19-1174/
PDF	https://www.aclweb.org/anthology/N19-1174
PWC	https://paperswithcode.com/paper/fixed-that-for-you-generating-contrastive
Repo	https://github.com/chridey/fixedthat
Framework	pytorch

Sparse and noisy LiDAR completion with RGB guidance and uncertainty


Title	Sparse and noisy LiDAR completion with RGB guidance and uncertainty
Authors	Wouter Van Gansbeke, Davy Neven, Bert De Brabandere, Luc Van Gool
Abstract	This work proposes a new method to accurately complete sparse LiDAR maps guided by RGB images. For autonomous vehicles and robotics the use of LiDAR is indispensable in order to achieve precise depth predictions. A multitude of applications depend on the awareness of their surroundings, and use depth cues to reason and react accordingly. On the one hand, monocular depth prediction methods fail to generate absolute and precise depth maps. On the other hand, stereoscopic approaches are still significantly outperformed by LiDAR based approaches. The goal of the depth completion task is to generate dense depth predictions from sparse and irregular point clouds which are mapped to a 2D plane. We propose a new framework which extracts both global and local information in order to produce proper depth maps. We argue that simple depth completion does not require a deep network. However, we additionally propose a fusion method with RGB guidance from a monocular camera in order to leverage object information and to correct mistakes in the sparse input. This improves the accuracy significantly. Moreover, confidence masks are exploited in order to take into account the uncertainty in the depth predictions from each modality. This fusion method outperforms the state-of-the-art and ranks first on the KITTI depth completion benchmark.
Tasks	Autonomous Vehicles, Depth Completion, Depth Estimation
Published	2019-02-14
URL	https://arxiv.org/abs/1902.05356
PDF	https://arxiv.org/pdf/1902.05356.pdf
PWC	https://paperswithcode.com/paper/sparse-and-noisy-lidar-completion-with-rgb-1
Repo	https://github.com/wvangansbeke/Sparse-Depth-Completion
Framework	pytorch
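
The abstract says confidence masks are used to account for the uncertainty of each modality's depth prediction, without giving the formula. One plausible reading, sketched below, is a per-pixel weighted average in which the weights come from a softmax over the two confidence values. That weighting scheme, the flat-list representation of the maps, and the function name are assumptions made for illustration. They are not taken from the paper or from the Sparse-Depth-Completion repository.

```python
import math


def fuse_depths(depth_a, conf_a, depth_b, conf_b):
    """Fuse two depth maps pixel by pixel using softmax-normalised confidences."""
    fused = []
    for d_a, c_a, d_b, c_b in zip(depth_a, conf_a, depth_b, conf_b):
        # Subtract the max before exponentiating for numerical stability.
        top = max(c_a, c_b)
        w_a = math.exp(c_a - top)
        w_b = math.exp(c_b - top)
        fused.append((w_a * d_a + w_b * d_b) / (w_a + w_b))
    return fused


if __name__ == "__main__":
    # Two hypothetical branches that disagree on three pixels (depths in metres).
    branch_a = [2.0, 10.0, 5.0]
    branch_b = [4.0, 12.0, 5.5]
    conf_a = [0.0, 3.0, -2.0]
    conf_b = [0.0, -3.0, 2.0]
    print(fuse_depths(branch_a, conf_a, branch_b, conf_b))
```

Where the two confidences are equal the output is the plain average, and where one branch is much more confident the output follows that branch.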