Paper Group NAWR 23
6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images. TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts. Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks. Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes. A Do …
6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images
Title | 6D-VNet: End-to-end 6DoF Vehicle Pose Estimation from Monocular RGB Images |
Authors | Di Wu, Zhaoyong Zhuang, Canqun Xiang, Wenbin Zou and Xia Li |
Abstract | We present a conceptually simple framework for 6DoF object pose estimation, especially for autonomous driving scenario. Our approach efficiently detects traffic participants in a monocular RGB image while simultaneously regressing their 3D translation and rotation vectors. The method, called 6D-VNet, extends Mask R-CNN by adding customised heads for predicting vehicle's finer class, rotation and translation. The proposed 6D-VNet is trained end-to-end compared to previous methods. Furthermore, we show that the inclusion of translational regression in the joint losses is crucial for the 6DoF pose estimation task, where object translation distance along longitudinal axis varies significantly, e.g., in autonomous driving scenarios. Additionally, we incorporate the mutual information between traffic participants via a modified non-local block. As opposed to the original non-local block implementation, the proposed weighting modification takes the spatial neighbouring information into consideration whilst counteracting the effect of extreme gradient values. Our 6D-VNet reaches the 1st place in ApolloScape challenge 3D Car Instance task [21]. Code has been made available at: https://github.com/stevenwudi/6DVNET. |
Tasks | Autonomous Driving, Pose Estimation |
Published | 2019-06-15 |
URL | http://openaccess.thecvf.com/content_CVPRW_2019/html/WAD/Wu_6D-VNet_End-To-End_6-DoF_Vehicle_Pose_Estimation_From_Monocular_RGB_Images_CVPRW_2019_paper.html |
http://openaccess.thecvf.com/content_CVPRW_2019/papers/WAD/Wu_6D-VNet_End-To-End_6-DoF_Vehicle_Pose_Estimation_From_Monocular_RGB_Images_CVPRW_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/6d-vnet-end-to-end-6dof-vehicle-pose |
Repo | https://github.com/stevenwudi/6DVNET |
Framework | pytorch |
TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts
Title | TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts |
Authors | Ziyang Hong, Yvan Petillot, David Lane, Yishu Miao, Sen Wang |
Abstract | Visual place recognition is a fundamental problem for many vision based applications. Sparse feature and deep learning based methods have been successful and dominant over the decade. However, most of them do not explicitly leverage high-level semantic information to deal with challenging scenarios where they may fail. This paper proposes a novel visual place recognition algorithm, termed TextPlace, based on scene texts in the wild. Since scene texts are high-level information invariant to illumination changes and very distinct for different places when considering spatial correlation, it is beneficial for visual place recognition tasks under extreme appearance changes and perceptual aliasing. It also takes spatial-temporal dependence between scene texts into account for topological localization. Extensive experiments show that TextPlace achieves state-of-the-art performance, verifying the effectiveness of using high-level scene texts for robust visual place recognition in urban areas. |
Tasks | Visual Place Recognition |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Hong_TextPlace_Visual_Place_Recognition_and_Topological_Localization_Through_Reading_Scene_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Hong_TextPlace_Visual_Place_Recognition_and_Topological_Localization_Through_Reading_Scene_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/textplace-visual-place-recognition-and |
Repo | https://github.com/ziyanghong/dataset |
Framework | none |
Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks
Title | Exploring Unexplored Tensor Network Decompositions for Convolutional Neural Networks |
Authors | Kohei Hayashi, Taiki Yamaguchi, Yohei Sugawara, Shin-Ichi Maeda |
Abstract | Tensor decomposition methods are widely used for model compression and fast inference in convolutional neural networks (CNNs). Although many decompositions are conceivable, only CP decomposition and a few others have been applied in practice, and no extensive comparisons have been made between available methods. Previous studies have not determined how many decompositions are available, nor which of them is optimal. In this study, we first characterize a decomposition class specific to CNNs by adopting a flexible graphical notation. The class includes such well-known CNN modules as depthwise separable convolution layers and bottleneck layers, but also previously unknown modules with nonlinear activations. We also experimentally compare the tradeoff between prediction accuracy and time/space complexity for modules found by enumerating all possible decompositions, or by using a neural architecture search. We find some nonlinear decompositions outperform existing ones. |
Tasks | Model Compression, Neural Architecture Search |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8793-exploring-unexplored-tensor-network-decompositions-for-convolutional-neural-networks |
http://papers.nips.cc/paper/8793-exploring-unexplored-tensor-network-decompositions-for-convolutional-neural-networks.pdf | |
PWC | https://paperswithcode.com/paper/exploring-unexplored-tensor-network |
Repo | https://github.com/pfnet-research/einconv |
Framework | pytorch |
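As a rough illustration of the space saving behind one module the abstract names, the sketch below counts weights for a standard convolution and for a depthwise separable one. The function names and layer sizes are illustrative assumptions, not taken from the paper or the einconv repository, and biases are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # hypothetical layer sizes
    full = standard_conv_params(k, c_in, c_out)
    separable = depthwise_separable_params(k, c_in, c_out)
    print(full, separable, round(full / separable, 2))
```

Parameter count is only one axis of the tradeoff the abstract describes; prediction accuracy and run time have to be measured separately.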
Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes
Title | Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes |
Authors | Greg Yang |
Abstract | Wide neural networks with random weights and biases are Gaussian processes, as observed by Neal (1995) for shallow networks, and more recently by Lee et al. (2018) and Matthews et al. (2018) for deep fully-connected networks, as well as by Novak et al. (2019) and Garriga-Alonso et al. (2019) for deep convolutional networks. We show that this Neural Network-Gaussian Process correspondence surprisingly extends to all modern feedforward or recurrent neural networks composed of multilayer perceptron, RNNs (e.g. LSTMs, GRUs), (nD or graph) convolution, pooling, skip connection, attention, batch normalization, and/or layer normalization. More generally, we introduce a language for expressing neural network computations, and our result encompasses all such expressible neural networks. This work serves as a tutorial on the tensor programs technique formulated in Yang (2019) and elucidates the Gaussian Process results obtained there. We provide open-source implementations of the Gaussian Process kernels of simple RNN, GRU, transformer, and batchnorm+ReLU network at github.com/thegregyang/GP4A. Please see our arxiv version for the complete and up-to-date version of this paper. |
Tasks | Gaussian Processes |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9186-wide-feedforward-or-recurrent-neural-networks-of-any-architecture-are-gaussian-processes |
http://papers.nips.cc/paper/9186-wide-feedforward-or-recurrent-neural-networks-of-any-architecture-are-gaussian-processes.pdf | |
PWC | https://paperswithcode.com/paper/wide-feedforward-or-recurrent-neural-networks |
Repo | https://github.com/thegregyang/GP4A |
Framework | none |
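To make the correspondence concrete for the simplest case the abstract cites (deep fully-connected networks), here is a minimal sketch of the well-known kernel recursion for ReLU activations. It is not the tensor programs technique and is not code from GP4A; the function names and the default variances `sigma_w2` and `sigma_b2` are assumptions chosen for illustration.

```python
import math


def relu_nngp_step(k_xx, k_xy, k_yy, sigma_w2=2.0, sigma_b2=0.0):
    """One layer of the ReLU kernel recursion for a pair of inputs.

    Takes the previous layer's entries K(x,x), K(x,y), K(y,y) and
    returns the next layer's entries in the same order.
    Assumes K(x,x) and K(y,y) are strictly positive.
    """
    norm = math.sqrt(k_xx * k_yy)
    cos_theta = max(-1.0, min(1.0, k_xy / norm))
    theta = math.acos(cos_theta)
    # E[relu(u) * relu(v)] for zero-mean Gaussians u, v with this covariance
    cross = norm * (math.sin(theta) + (math.pi - theta) * cos_theta) / (2 * math.pi)
    return (
        sigma_b2 + sigma_w2 * k_xx / 2.0,  # E[relu(u)^2] = K(x,x) / 2
        sigma_b2 + sigma_w2 * cross,
        sigma_b2 + sigma_w2 * k_yy / 2.0,
    )


def relu_nngp_kernel(x, y, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Kernel value K(x, y) after `depth` ReLU layers on top of a linear input layer."""
    d = len(x)

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    k = (
        sigma_b2 + sigma_w2 * dot(x, x) / d,
        sigma_b2 + sigma_w2 * dot(x, y) / d,
        sigma_b2 + sigma_w2 * dot(y, y) / d,
    )
    for _ in range(depth):
        k = relu_nngp_step(*k, sigma_w2=sigma_w2, sigma_b2=sigma_b2)
    return k[1]
```

With `sigma_w2=2.0` and no bias, the diagonal entries are preserved from layer to layer, which makes it easy to see how the off-diagonal correlation changes with depth.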
A Domain Agnostic Measure for Monitoring and Evaluating GANs
Title | A Domain Agnostic Measure for Monitoring and Evaluating GANs |
Authors | Paulina Grnarova, Kfir Y. Levy, Aurelien Lucchi, Nathanael Perraudin, Ian Goodfellow, Thomas Hofmann, Andreas Krause |
Abstract | Generative Adversarial Networks (GANs) have shown remarkable results in modeling complex distributions, but their evaluation remains an unsettled issue. Evaluations are essential for: (i) relative assessment of different models and (ii) monitoring the progress of a single model throughout training. The latter cannot be determined by simply inspecting the generator and discriminator loss curves as they behave non-intuitively. We leverage the notion of duality gap from game theory to propose a measure that addresses both (i) and (ii) at a low computational cost. Extensive experiments show the effectiveness of this measure to rank different GAN models and capture the typical GAN failure scenarios, including mode collapse and non-convergent behaviours. This evaluation metric also provides meaningful monitoring on the progression of the loss during training. It highly correlates with FID on natural image datasets, and with domain specific scores for text, sound and cosmology data where FID is not directly suitable. In particular, our proposed metric requires no labels or a pretrained classifier, making it domain agnostic. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/9377-a-domain-agnostic-measure-for-monitoring-and-evaluating-gans |
http://papers.nips.cc/paper/9377-a-domain-agnostic-measure-for-monitoring-and-evaluating-gans.pdf | |
PWC | https://paperswithcode.com/paper/a-domain-agnostic-measure-for-monitoring-and |
Repo | https://github.com/pgrnar/DualityGap |
Framework | none |
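The duality gap is easiest to see on a small zero-sum matrix game rather than a GAN. The sketch below computes it for a pair of mixed strategies; the payoff matrix and function name are illustrative assumptions, and the code is not taken from the DualityGap repository. For a GAN the two best responses would have to be approximated by optimisation instead of enumerated.

```python
def duality_gap(payoff, p, q):
    """Duality gap of mixed strategies (p, q) in a zero-sum matrix game.

    payoff[i][j] is what the row player (minimiser) pays the column
    player (maximiser) when they play pure strategies i and j.
    """
    rows, cols = len(payoff), len(payoff[0])
    # Worst case for the row player: the column player best-responds to p.
    worst_for_min = max(
        sum(p[i] * payoff[i][j] for i in range(rows)) for j in range(cols)
    )
    # Worst case for the column player: the row player best-responds to q.
    worst_for_max = min(
        sum(payoff[i][j] * q[j] for j in range(cols)) for i in range(rows)
    )
    return worst_for_min - worst_for_max


if __name__ == "__main__":
    matching_pennies = [[1, -1], [-1, 1]]
    print(duality_gap(matching_pennies, [0.5, 0.5], [0.5, 0.5]))  # at equilibrium
    print(duality_gap(matching_pennies, [1.0, 0.0], [1.0, 0.0]))  # away from it
```

The gap is never negative and is zero exactly at an equilibrium, which is what makes it usable as a progress measure.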
Recurrent Highway Networks with Grouped Auxiliary Memory
Title | Recurrent Highway Networks with Grouped Auxiliary Memory |
Authors | Wei Luo, Feng Yu |
Abstract | Recurrent neural networks (RNNs) are challenging to train, let alone those with deep spatial structures. Architectures built upon highway connections such as Recurrent Highway Network (RHN) were developed to allow larger step-to-step transition depth, leading to more expressive models. However, problems that require capturing long-term dependencies still can not be well addressed by these models. Moreover, the ability to keep long-term memories tends to diminish when the spatial depth increases, since deeper structure may accelerate gradient vanishing. In this paper, we address these issues by proposing a novel RNN architecture based on RHN, namely the Recurrent Highway Network with Grouped Auxiliary Memory (GAM-RHN). The proposed architecture interconnects the RHN with a set of auxiliary memory units specifically for storing long-term information via reading and writing operations, which is analogous to Memory Augmented Neural Networks (MANNs). Experimental results on artificial long time lag tasks show that GAM-RHNs can be trained efficiently while being deep in both time and space. We also evaluate the proposed architecture on a variety of tasks, including language modeling, sequential image classification, and financial market forecasting. The potential of our approach is demonstrated by achieving state-of-the-art results on these tasks. |
Tasks | Image Classification, Language Modelling, Sequential Image Classification, Stock Trend Prediction |
Published | 2019-12-13 |
URL | https://ieeexplore.ieee.org/document/8932404 |
PWC | https://paperswithcode.com/paper/recurrent-highway-networks-with-grouped |
Repo | https://github.com/WilliamRo/gam_rhn |
Framework | tf |
Preprocessing Method for Performance Enhancement in CNN-based STEMI Detection from 12-lead ECG
Title | Preprocessing Method for Performance Enhancement in CNN-based STEMI Detection from 12-lead ECG |
Authors | YeongHyeon Park, Il Dong Yun, Si-Hyuck Kang |
Abstract | ST elevation myocardial infarction (STEMI) is an acute life-threatening disease. Mortality risk is high when a patient is not treated within the golden time, so prompt diagnosis with limited information such as an electrocardiogram (ECG) is crucial. However, previous studies among physicians and paramedics have shown that the accuracy of STEMI diagnosis by ECG is not sufficient. Thus, we propose a detection algorithm based on a convolutional neural network (CNN) for detecting STEMI on 12-lead ECG in order to support physicians, especially in an emergency room. We mostly focus on enhancing the detection performance using a preprocessing technique. First, we reduce the noise of the ECG using a notch filter and a high-pass filter. We also segment pulses from the ECG to focus on the ST segment. We use 96 normal and 179 STEMI records provided by Seoul National University Bundang Hospital (SNUBH) for the experiment. The sensitivity, specificity, and area under the curve (AUC) of the receiver operating characteristic (ROC) curve are increased from 0.685, 0.350, and 0.526 to 0.932, 0.896, and 0.943, respectively, depending on the preprocessing technique. As our result shows, the proposed method is effective at enhancing STEMI detection performance. Also, the proposed algorithm would be expected to help timely and accurate diagnosis of STEMI in clinical practice. |
Tasks | |
Published | 2019-07-24 |
URL | https://ieeexplore.ieee.org/abstract/document/8771175 |
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8771175 | |
PWC | https://paperswithcode.com/paper/preprocessing-method-for-performance |
Repo | https://github.com/YeongHyeon/Enhancementing-Method-for-STEMI-Detection |
Framework | tf |
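For readers unfamiliar with the two threshold metrics reported in the abstract, the sketch below shows how sensitivity and specificity are computed from binary labels. The function name and the toy labels are illustrative assumptions and have no connection to the SNUBH data or the authors' code; label 1 is taken to mean STEMI and 0 normal.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = positive class.

    Assumes y_true contains at least one positive and one negative example.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical labels for five records.
    print(sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```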
Activation Atlas
Title | Activation Atlas |
Authors | Shan Carter, Zan Armstrong, Ludwig Schubert, Ian Johnson, Chris Olah |
Abstract | By using feature inversion to visualize millions of activations from an image classification network, we create an explorable activation atlas of features the network has learned which can reveal how the network typically represents some concepts. |
Tasks | Image Classification |
Published | 2019-03-06 |
URL | https://distill.pub/2019/activation-atlas/ |
PWC | https://paperswithcode.com/paper/activation-atlas |
Repo | https://github.com/tensorflow/lucid |
Framework | tf |
Practical Differentially Private Top-k Selection with Pay-what-you-get Composition
Title | Practical Differentially Private Top-k Selection with Pay-what-you-get Composition |
Authors | David Durfee, Ryan M. Rogers |
Abstract | We study the problem of top-k selection over a large domain universe subject to user-level differential privacy. Typically, the exponential mechanism or report noisy max are the algorithms used to solve this problem. However, these algorithms require querying the database for the count of each domain element. We focus on the setting where the data domain is unknown, which is different than the setting of frequent itemsets where an apriori type algorithm can help prune the space of domain elements to query. We design algorithms that ensure (approximate) differential privacy and only need access to the true top-k’ elements from the data for any chosen k’ ≥ k. This is a highly desirable feature for making differential privacy practical, since the algorithms require no knowledge of the domain. We consider both the setting where a user’s data can modify an arbitrary number of counts by at most 1, i.e. unrestricted sensitivity, and the setting where a user’s data can modify at most some small, fixed number of counts by at most 1, i.e. restricted sensitivity. Additionally, we provide a pay-what-you-get privacy composition bound for our algorithms. That is, our algorithms might return fewer than k elements when the top-k elements are queried, but the overall privacy budget only decreases by the size of the outcome set. |
Tasks | |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8612-practical-differentially-private-top-k-selection-with-pay-what-you-get-composition |
http://papers.nips.cc/paper/8612-practical-differentially-private-top-k-selection-with-pay-what-you-get-composition.pdf | |
PWC | https://paperswithcode.com/paper/practical-differentially-private-top-k |
Repo | https://github.com/rrogers386/DPComposition |
Framework | none |
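For contrast with the unknown-domain setting studied here, the sketch below shows the kind of known-domain baseline the abstract refers to: add noise to every count and report the k largest. It uses Gumbel noise as one common choice. The function names are hypothetical, the relationship between `scale` and a concrete privacy guarantee is deliberately left out, and none of this is the authors' algorithm or code from the DPComposition repository.

```python
import math
import random


def gumbel(rng, scale):
    """Draw one Gumbel(0, scale) sample by inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() never returns 1.0
        u = rng.random()
    return -scale * math.log(-math.log(u))


def noisy_top_k(counts, k, scale, rng=None):
    """Return the k items with the largest noisy counts.

    counts maps every domain element to its true count, so this
    baseline must query the whole domain.
    """
    rng = rng or random.Random()
    noisy = {item: c + gumbel(rng, scale) for item, c in counts.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]


if __name__ == "__main__":
    counts = {"a": 10, "b": 5, "c": 1}  # hypothetical histogram
    print(noisy_top_k(counts, k=2, scale=1.0, rng=random.Random(0)))
```

The loop over `counts.items()` is exactly the per-element query that becomes infeasible when the domain is unknown or very large.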
Performance prediction of data streams on high-performance architecture
Title | Performance prediction of data streams on high-performance architecture |
Authors | Bhaskar Gautam, Annappa Basava |
Abstract | Worldwide sensor streams are expanding continuously with unbounded velocity in volume, and for this acceleration, there is an adaptation of large stream data processing systems from the homogeneous to rack-scale architecture, which raises serious concerns in the domain of workload optimization, scheduling, and resource management algorithms. Our proposed framework is based on providing an architecture-independent performance prediction model to enable a resource-adaptive distributed stream data processing platform. It comprises seven pre-defined domains for dynamic data stream metrics, including a self-driven model which tries to fit these metrics using a ridge regularization regression algorithm. Another significant contribution lies in a fully-automated performance prediction model inherited from the state-of-the-art distributed data management system for distributed stream processing systems using Gaussian process regression that clusters metrics with the help of a dimensionality reduction algorithm. We implemented its base on Apache Heron and evaluated it with the proposed Benchmark Suite comprising five domain-specific topologies. To assess the proposed methodologies, we forcefully ingest tuple skewness among the benchmarking topologies to set up the ground truth for predictions and found that the accuracy of predicting the performance of data streams increased up to 80.62% from 66.36%, along with a reduction of error from 37.14 to 16.06%. |
Tasks | Dimensionality Reduction, Gaussian Processes |
Published | 2019-01-07 |
URL | https://doi.org/10.1186/s13673-018-0163-4 |
https://rdcu.be/bMVaG | |
PWC | https://paperswithcode.com/paper/performance-prediction-of-data-streams-on |
Repo | https://github.com/bhaskar24/StreamBenchmark |
Framework | none |
Optimal Decision Tree with Noisy Outcomes
Title | Optimal Decision Tree with Noisy Outcomes |
Authors | Su Jia, Viswanath Nagarajan, Fatemeh Navidi, R Ravi |
Abstract | A fundamental task in active learning involves performing a sequence of tests to identify an unknown hypothesis that is drawn from a known distribution. This problem, known as optimal decision tree induction, has been widely studied for decades and the asymptotically best-possible approximation algorithm has been devised for it. We study a generalization where certain test outcomes are noisy, even in the more general case when the noise is persistent, i.e., repeating the test on the scenario gives the same noisy output, disallowing simple repetition as a way to gain confidence. We design new approximation algorithms for both the non-adaptive setting, where the test sequence must be fixed a-priori, and the adaptive setting where the test sequence depends on the outcomes of prior tests. Previous work in the area assumed at most a constant number of noisy outcomes per test and per scenario and provided approximation ratios that were problem dependent (such as the minimum probability of a hypothesis). Our new approximation algorithms provide guarantees that are nearly best-possible and work for the general case of a large number of noisy outcomes per test or per hypothesis where the performance degrades smoothly with this number. Our results adapt and generalize methods used for submodular ranking and stochastic set cover. We evaluate the performance of our algorithms on two natural applications with noise: toxic chemical identification and active learning of linear classifiers. Despite our logarithmic theoretical approximation guarantees, our methods give solutions with cost very close to the information theoretic minimum, demonstrating the effectiveness of our methods. |
Tasks | Active Learning |
Published | 2019-12-01 |
URL | http://papers.nips.cc/paper/8592-optimal-decision-tree-with-noisy-outcomes |
http://papers.nips.cc/paper/8592-optimal-decision-tree-with-noisy-outcomes.pdf | |
PWC | https://paperswithcode.com/paper/optimal-decision-tree-with-noisy-outcomes |
Repo | https://github.com/sjia1/ODT-with-noisy-outcomes |
Framework | none |
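The noiseless version of the problem can be sketched with a simple greedy rule: at each step, run the test that splits the remaining probability mass most evenly. The code below is an illustration of that classical heuristic only; it does not handle noisy outcomes and is not one of the paper's algorithms. All names and the toy instance are assumptions.

```python
def greedy_identify(prior, outcomes, truth):
    """Adaptively pick tests until the true hypothesis is isolated.

    prior[h] is the probability of hypothesis h.
    outcomes[t][h] is the deterministic result of test t under h.
    Returns (remaining hypotheses, list of tests performed).
    """
    alive = set(prior)
    asked = []
    while len(alive) > 1:
        best_test, best_mass = None, None
        for test, table in outcomes.items():
            mass = {}
            for h in alive:
                mass[table[h]] = mass.get(table[h], 0.0) + prior[h]
            if len(mass) < 2:
                continue  # this test cannot split the remaining hypotheses
            heaviest = max(mass.values())
            if best_mass is None or heaviest < best_mass:
                best_test, best_mass = test, heaviest
        if best_test is None:
            break  # remaining hypotheses are indistinguishable
        asked.append(best_test)
        observed = outcomes[best_test][truth]
        alive = {h for h in alive if outcomes[best_test][h] == observed}
    return alive, asked


if __name__ == "__main__":
    prior = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
    outcomes = {
        "t1": {"a": 0, "b": 0, "c": 1, "d": 1},
        "t2": {"a": 0, "b": 1, "c": 0, "d": 1},
        "t3": {"a": 0, "b": 1, "c": 1, "d": 1},
    }
    print(greedy_identify(prior, outcomes, truth="d"))
```

With persistent noise, repeating a test gives no new information, which is why this simple elimination step no longer works in the setting the abstract describes.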
Pyramid U-Network for Skeleton Extraction From Shape Points
Title | Pyramid U-Network for Skeleton Extraction From Shape Points |
Authors | Rowel Atienza |
Abstract | The knowledge about the skeleton of a given geometric shape has many practical applications such as shape animation, shape comparison, shape recognition, and estimating structural strength. Skeleton extraction becomes a more challenging problem when the topology is represented in point cloud domain. In this paper, we present the network architecture, PSPU-SkelNet, for TeamPH which ranked 3rd in the Point SkelNetOn 2019 challenge. PSPU-SkelNet is a pyramid of three U-Nets that predicts the skeleton from a given shape point cloud. PSPU-SkelNet achieves a Chamfer Distance (CD) of 2.9105 on the final test dataset. The code of PSPU-SkelNet is available at https://github.com/roatienza/skelnet. |
Tasks | |
Published | 2019-06-17 |
URL | http://openaccess.thecvf.com/content_CVPRW_2019/papers/SkelNetOn/Atienza_Pyramid_U-Network_for_Skeleton_Extraction_From_Shape_Points_CVPRW_2019_paper.pdf |
PWC | https://paperswithcode.com/paper/pyramid-u-network-for-skeleton-extraction |
Repo | https://github.com/roatienza/skelnet |
Framework | none |
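The Chamfer Distance quoted in the abstract compares two point sets by nearest-neighbour distances in both directions. Below is a minimal sketch of one common convention (mean Euclidean nearest-neighbour distance, summed over both directions). Other conventions use squared distances or sums instead of means, and the abstract does not say which one the challenge uses, so treat the exact form as an assumption.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Each point is a sequence of coordinates; both sets must share
    the same dimensionality.
    """
    def one_way(src, dst):
        # Average distance from each source point to its nearest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (1.0, 0.0)]  # hypothetical skeleton points
    target = [(0.0, 0.0), (1.0, 1.0)]
    print(chamfer_distance(predicted, target))
```

This brute-force version is quadratic in the number of points; larger clouds would normally use a spatial index for the nearest-neighbour search.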
Sphere Generative Adversarial Network Based on Geometric Moment Matching
Title | Sphere Generative Adversarial Network Based on Geometric Moment Matching |
Authors | Sung Woo Park, Junseok Kwon |
Abstract | We propose sphere generative adversarial network (GAN), a novel integral probability metric (IPM)-based GAN. Sphere GAN uses the hypersphere to bound IPMs in the objective function. Thus, it can be trained stably. On the hypersphere, sphere GAN exploits the information of higher-order statistics of data using geometric moment matching, thereby providing more accurate results. In the paper, we mathematically prove the good properties of sphere GAN. In experiments, sphere GAN quantitatively and qualitatively surpasses recent state-of-the-art GANs for unsupervised image generation problems with the CIFAR-10, STL-10, and LSUN bedroom datasets. Source code is available at https://github.com/pswkiki/SphereGAN. |
Tasks | Image Generation |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Park_Sphere_Generative_Adversarial_Network_Based_on_Geometric_Moment_Matching_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Park_Sphere_Generative_Adversarial_Network_Based_on_Geometric_Moment_Matching_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/sphere-generative-adversarial-network-based |
Repo | https://github.com/taki0112/SphereGAN-Tensorflow |
Framework | tf |
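One standard way to place feature vectors on a hypersphere is inverse stereographic projection, after which distances can be measured along the sphere to a fixed reference point. The sketch below shows that geometry only; whether it matches the paper's exact construction is not stated in the abstract, and the function names are hypothetical.

```python
import math


def inverse_stereographic(point):
    """Map a point in R^n onto the unit sphere S^n embedded in R^(n+1)."""
    sq_norm = sum(x * x for x in point)
    scale = 1.0 / (sq_norm + 1.0)
    return [2.0 * x * scale for x in point] + [(sq_norm - 1.0) * scale]


def geodesic_to_north_pole(sphere_point):
    """Arc length on the unit sphere from sphere_point to (0, ..., 0, 1)."""
    last = max(-1.0, min(1.0, sphere_point[-1]))  # guard against rounding
    return math.acos(last)


if __name__ == "__main__":
    s = inverse_stereographic([1.0, 0.0])
    print(s, geodesic_to_north_pole(s))
```

Because every projected point lies on a unit sphere, the geodesic distance is bounded by pi, which gives a bounded quantity regardless of how large the unprojected features are.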
Fixed That for You: Generating Contrastive Claims with Semantic Edits
Title | Fixed That for You: Generating Contrastive Claims with Semantic Edits |
Authors | Christopher Hidey, Kathy McKeown |
Abstract | Understanding contrastive opinions is a key component of argument generation. Central to an argument is the claim, a statement that is in dispute. Generating a counter-argument then requires generating a response in contrast to the main claim of the original argument. To generate contrastive claims, we create a corpus of Reddit comment pairs self-labeled by posters using the acronym FTFY (fixed that for you). We then train neural models on these pairs to edit the original claim and produce a new claim with a different view. We demonstrate significant improvement over a sequence-to-sequence baseline in BLEU score and a human evaluation for fluency, coherence, and contrast. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/N19-1174/ |
https://www.aclweb.org/anthology/N19-1174 | |
PWC | https://paperswithcode.com/paper/fixed-that-for-you-generating-contrastive |
Repo | https://github.com/chridey/fixedthat |
Framework | pytorch |
Sparse and noisy LiDAR completion with RGB guidance and uncertainty
Title | Sparse and noisy LiDAR completion with RGB guidance and uncertainty |
Authors | Wouter Van Gansbeke, Davy Neven, Bert De Brabandere, Luc Van Gool |
Abstract | This work proposes a new method to accurately complete sparse LiDAR maps guided by RGB images. For autonomous vehicles and robotics the use of LiDAR is indispensable in order to achieve precise depth predictions. A multitude of applications depend on the awareness of their surroundings, and use depth cues to reason and react accordingly. On the one hand, monocular depth prediction methods fail to generate absolute and precise depth maps. On the other hand, stereoscopic approaches are still significantly outperformed by LiDAR based approaches. The goal of the depth completion task is to generate dense depth predictions from sparse and irregular point clouds which are mapped to a 2D plane. We propose a new framework which extracts both global and local information in order to produce proper depth maps. We argue that simple depth completion does not require a deep network. However, we additionally propose a fusion method with RGB guidance from a monocular camera in order to leverage object information and to correct mistakes in the sparse input. This improves the accuracy significantly. Moreover, confidence masks are exploited in order to take into account the uncertainty in the depth predictions from each modality. This fusion method outperforms the state-of-the-art and ranks first on the KITTI depth completion benchmark. |
Tasks | Autonomous Vehicles, Depth Completion, Depth Estimation |
Published | 2019-02-14 |
URL | https://arxiv.org/abs/1902.05356 |
https://arxiv.org/pdf/1902.05356.pdf | |
PWC | https://paperswithcode.com/paper/sparse-and-noisy-lidar-completion-with-rgb-1 |
Repo | https://github.com/wvangansbeke/Sparse-Depth-Completion |
Framework | pytorch |
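One plausible reading of confidence-based fusion is a per-pixel weighted average of two depth predictions, with weights given by a softmax over their confidence scores. The sketch below implements that idea on plain nested lists. It is an assumption about the general mechanism, not the authors' network or code from the Sparse-Depth-Completion repository, and all names are hypothetical.

```python
import math


def fuse_depths(depth_a, conf_a, depth_b, conf_b):
    """Fuse two depth maps using per-pixel softmax weights from confidences.

    All four arguments are 2D lists with identical shapes.
    """
    fused = []
    for rows in zip(depth_a, conf_a, depth_b, conf_b):
        out_row = []
        for da, ca, db, cb in zip(*rows):
            m = max(ca, cb)  # subtract the max for numerical stability
            wa, wb = math.exp(ca - m), math.exp(cb - m)
            out_row.append((wa * da + wb * db) / (wa + wb))
        fused.append(out_row)
    return fused


if __name__ == "__main__":
    # Hypothetical 1 x 2 maps: equal confidence on the left pixel,
    # strong preference for the first prediction on the right pixel.
    print(fuse_depths([[2.0, 2.0]], [[0.0, 5.0]], [[4.0, 4.0]], [[0.0, 0.0]]))
```

Where the two confidences are equal the result is the plain average; as one confidence dominates, the fused value approaches that prediction.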