Paper Group ANR 195
Trajectron++: Multi-Agent Generative Trajectory Forecasting With Heterogeneous Data for Control. Practical Privacy Preserving POI Recommendation. Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video. Imbalanced classification: an objective-oriented review. Training Keyword Spotters with Limited and Synthesized Speech Data. Privacy Preserving Point-of-interest Recommendation Using Decentralized Matrix Factorization. The role of surrogate models in the development of digital twins of dynamic systems. Block Annotation: Better Image Annotation for Semantic Segmentation with Sub-Image Decomposition. Ellipse R-CNN: Learning to Infer Elliptical Object from Clustering and Occlusion. 3D Aggregated Faster R-CNN for General Lesion Detection. Fair Active Learning. Seeing Around Corners with Edge-Resolved Transient Imaging. Plug & Play Convolutional Regression Tracker for Video Object Detection. Estimating a Null Model of Scientific Image Reuse to Support Research Integrity Investigations. Graphon Pooling in Graph Neural Networks.
Trajectron++: Multi-Agent Generative Trajectory Forecasting With Heterogeneous Data for Control
Title | Trajectron++: Multi-Agent Generative Trajectory Forecasting With Heterogeneous Data for Control |
Authors | Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, Marco Pavone |
Abstract | Reasoning about human motion through an environment is an important prerequisite to safe and socially-aware robotic navigation. As a result, multi-agent behavior prediction has become a core component of modern human-robot interactive systems, such as self-driving cars. While there exist a multitude of methods for trajectory forecasting, many of them have only been evaluated with one semantic class of agents and only use prior trajectory information, ignoring a plethora of information available online to autonomous systems from common sensors. Towards this end, we present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of agents with distinct semantic classes while incorporating heterogeneous data (e.g. semantic maps and camera images). Our model is designed to be tightly integrated with robotic planning and control frameworks; it is capable of producing predictions that are conditioned on ego-agent motion plans. We demonstrate the performance of our model on several challenging real-world trajectory forecasting datasets, outperforming a wide array of state-of-the-art deterministic and generative methods. |
Tasks | Self-Driving Cars |
Published | 2020-01-09 |
URL | https://arxiv.org/abs/2001.03093v1 |
https://arxiv.org/pdf/2001.03093v1.pdf | |
PWC | https://paperswithcode.com/paper/trajectron-multi-agent-generative-trajectory |
Repo | |
Framework | |
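Trajectron++ itself is a graph-structured recurrent CVAE, and no reference implementation is reproduced here. The minimal NumPy sketch below only illustrates the interface the abstract describes: features summarizing the agent's own history, its neighbors, and the ego-agent's motion plan are combined and mapped to a distribution over future displacements, from which several candidate trajectories are sampled. The feature construction, the weight matrices `W_mu`/`W_sigma`, and all dimensions are illustrative assumptions, not the paper's model.

```python
import numpy as np

def sample_futures(history, neighbor_hists, ego_plan, W_mu, W_sigma, horizon=12, K=5, rng=None):
    """Sample K future trajectories conditioned on the agent's history, its
    neighbors, and the ego-agent's motion plan. Features here are just mean
    velocities; the real model uses recurrent encoders and a CVAE decoder."""
    rng = np.random.default_rng() if rng is None else rng
    own_vel = np.diff(history, axis=0).mean(axis=0)
    nbr_vel = np.mean([np.diff(h, axis=0).mean(axis=0) for h in neighbor_hists], axis=0)
    plan_vel = np.diff(ego_plan, axis=0).mean(axis=0)
    feat = np.concatenate([own_vel, nbr_vel, plan_vel])            # (6,) conditioning feature
    mu = feat @ W_mu                                               # mean displacement per step (2,)
    sigma = np.exp(feat @ W_sigma)                                 # per-step standard deviation (2,)
    steps = mu + sigma * rng.normal(size=(K, horizon, 2))
    return history[-1] + np.cumsum(steps, axis=1)                  # (K, horizon, 2) positions

# Toy usage with made-up histories, neighbors, plan, and weights.
rng = np.random.default_rng(0)
hist = np.cumsum(rng.normal(0.5, 0.1, size=(8, 2)), axis=0)
nbrs = [np.cumsum(rng.normal(0.4, 0.1, size=(8, 2)), axis=0) for _ in range(3)]
plan = np.cumsum(rng.normal(0.6, 0.1, size=(12, 2)), axis=0)
W_mu, W_sigma = rng.normal(0.1, 0.05, size=(6, 2)), rng.normal(-0.5, 0.1, size=(6, 2))
print(sample_futures(hist, nbrs, plan, W_mu, W_sigma).shape)       # (5, 12, 2)
```

The point of the sketch is only the plan-conditioned, multi-sample output format that makes such a predictor usable inside a planning and control loop.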
Practical Privacy Preserving POI Recommendation
Title | Practical Privacy Preserving POI Recommendation |
Authors | Chaochao Chen, Bingzhe Wu, Wenjin Fang, Jun Zhou, Li Wang, Yuan Qi, Xiaolin Zheng |
Abstract | Point-of-Interest (POI) recommendation has been extensively studied and successfully applied in industry recently. However, most existing approaches build centralized models on the basis of collecting users’ data. Both private data and models are held by the recommender, which causes serious privacy concerns. In this paper, we propose a novel Privacy preserving POI Recommendation (PriRec) framework. First, to protect data privacy, users’ private data (features and actions) are kept on their own side, e.g., cellphone or pad. Meanwhile, the public data that need to be accessed by all users are kept by the recommender to reduce the storage costs of users’ devices. These public data include: (1) static data related only to the status of a POI, such as POI categories, and (2) dynamic data that depend on user-POI actions, such as visit counts. The dynamic data can be sensitive, and we develop local differential privacy techniques to release such data to the public with privacy guarantees. Second, PriRec follows the representation of Factorization Machine (FM), which consists of a linear model and a feature interaction model. To protect model privacy, the linear model is kept on the users’ side, and we propose a secure decentralized gradient descent protocol for users to learn it collaboratively. The feature interaction model is kept by the recommender since it poses no privacy risk, and we adopt a secure aggregation strategy in the federated learning paradigm to learn it. In this way, PriRec keeps users’ private raw data and models in users’ own hands, and protects user privacy to a large extent. We apply PriRec to real-world datasets, and comprehensive experiments demonstrate that, compared with FM, PriRec achieves comparable or even better recommendation accuracy. |
Tasks | |
Published | 2020-03-05 |
URL | https://arxiv.org/abs/2003.02834v1 |
https://arxiv.org/pdf/2003.02834v1.pdf | |
PWC | https://paperswithcode.com/paper/practical-privacy-preserving-poi |
Repo | |
Framework | |
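The abstract mentions releasing dynamic, user-POI-dependent statistics (such as visit counts) under local differential privacy. As a hedged illustration of that step only, here is a standard Laplace-mechanism release of per-POI counts in NumPy; the function name, the sensitivity of 1, and the choice of epsilon are assumptions for the example and are not taken from the paper.

```python
import numpy as np

def laplace_perturb_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Release visit counts with differential privacy via the Laplace mechanism:
    add Laplace(0, sensitivity / epsilon) noise to each count. This is a generic
    DP primitive, not the exact mechanism proposed in the paper."""
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    noisy = counts + rng.laplace(loc=0.0, scale=scale, size=np.shape(counts))
    # Counts are non-negative; clipping is post-processing and stays DP-safe.
    return np.clip(noisy, 0.0, None)

# Example: a user's visit counts for 5 POIs, released with epsilon = 1.0.
true_counts = np.array([3, 0, 1, 7, 0], dtype=float)
print(laplace_perturb_counts(true_counts, epsilon=1.0))
```

Smaller epsilon gives stronger privacy and noisier released counts.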
Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video
Title | Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video |
Authors | Zhenfang Chen, Lin Ma, Wenhan Luo, Peng Tang, Kwan-Yee K. Wong |
Abstract | In this paper, we study the problem of weakly-supervised temporal grounding of sentence in video. Specifically, given an untrimmed video and a query sentence, our goal is to localize a temporal segment in the video that semantically corresponds to the query sentence, with no reliance on any temporal annotation during training. We propose a two-stage model to tackle this problem in a coarse-to-fine manner. In the coarse stage, we first generate a set of fixed-length temporal proposals using multi-scale sliding windows, and match their visual features against the sentence features to identify the best-matched proposal as a coarse grounding result. In the fine stage, we perform a fine-grained matching between the visual features of the frames in the best-matched proposal and the sentence features to locate the precise frame boundary of the fine grounding result. Comprehensive experiments on the ActivityNet Captions dataset and the Charades-STA dataset demonstrate that our two-stage model achieves compelling performance. |
Tasks | |
Published | 2020-01-25 |
URL | https://arxiv.org/abs/2001.09308v1 |
https://arxiv.org/pdf/2001.09308v1.pdf | |
PWC | https://paperswithcode.com/paper/look-closer-to-ground-better-weakly |
Repo | |
Framework | |
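The coarse stage described in the abstract (multi-scale sliding-window proposals matched against sentence features) can be sketched compactly. The snippet below is an illustrative NumPy version with made-up window sizes, stride ratio, and cosine-similarity matching; the paper's actual feature extractors and matching network are not reproduced.

```python
import numpy as np

def sliding_window_proposals(num_frames, window_sizes=(16, 32, 64), stride_ratio=0.5):
    """Generate multi-scale fixed-length temporal proposals as (start, end) frame indices."""
    proposals = []
    for w in window_sizes:
        stride = max(1, int(w * stride_ratio))
        for start in range(0, max(1, num_frames - w + 1), stride):
            proposals.append((start, min(start + w, num_frames)))
    return proposals

def best_matched_proposal(frame_feats, sent_feat, proposals):
    """Score each proposal by cosine similarity between its mean-pooled visual
    feature and the sentence feature; return the best one (coarse grounding)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    scores = [cos(frame_feats[s:e].mean(axis=0), sent_feat) for s, e in proposals]
    return proposals[int(np.argmax(scores))], max(scores)

# Toy example: 200 frames with 512-d features and one 512-d sentence embedding.
rng = np.random.default_rng(0)
frames, sentence = rng.normal(size=(200, 512)), rng.normal(size=512)
props = sliding_window_proposals(200)
print(best_matched_proposal(frames, sentence, props))
```

The fine stage would then compare per-frame features inside the winning proposal against the sentence feature to refine the boundaries.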
Imbalanced classification: an objective-oriented review
Title | Imbalanced classification: an objective-oriented review |
Authors | Yang Feng, Min Zhou, Xin Tong |
Abstract | A common issue for classification in scientific research and industry is the existence of imbalanced classes. When sample sizes of different classes are imbalanced in training data, naively implementing a classification method often leads to unsatisfactory prediction results on test data. Multiple resampling techniques have been proposed to address class imbalance issues. Yet there is no general guidance on when to use each technique. In this article, we provide an objective-oriented review of the common resampling techniques for binary classification under imbalanced class sizes. The learning objectives we consider include the classical paradigm that minimizes the overall classification error, the cost-sensitive learning paradigm that minimizes a cost-adjusted weighted combination of type I and type II errors, and the Neyman-Pearson paradigm that minimizes the type II error subject to a type I error constraint. Under each paradigm, we investigate combinations of the resampling techniques and a few state-of-the-art classification methods. For each pair of resampling technique and classification method, we use simulation studies to evaluate performance under different evaluation metrics. From these extensive simulation experiments, we demonstrate, under each classification paradigm, the complex dynamics among resampling techniques, base classification methods, evaluation metrics, and imbalance ratios. For practitioners, the take-away message is that, with imbalanced data, one should usually consider all combinations of resampling techniques and base classification methods. |
Tasks | |
Published | 2020-02-11 |
URL | https://arxiv.org/abs/2002.04592v1 |
https://arxiv.org/pdf/2002.04592v1.pdf | |
PWC | https://paperswithcode.com/paper/imbalanced-classification-an-objective |
Repo | |
Framework | |
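Since the review is organized around resampling techniques, a minimal sketch of the two simplest ones, random over-sampling and random under-sampling, is included below for orientation. It is generic NumPy code, not tied to any specific method evaluated in the paper.

```python
import numpy as np

def random_resample(X, y, minority_label=1, method="over", rng=None):
    """Random over-sampling (duplicate minority rows) or under-sampling
    (subsample majority rows) until the two classes are balanced."""
    rng = np.random.default_rng() if rng is None else rng
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    if method == "over":
        extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
        idx = np.concatenate([majority, minority, extra])
    else:  # "under"
        keep = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([keep, minority])
    rng.shuffle(idx)
    return X[idx], y[idx]

# 1000 negatives and 50 positives -> a balanced training set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1050, 4))
y = np.r_[np.zeros(1000, dtype=int), np.ones(50, dtype=int)]
Xb, yb = random_resample(X, y, method="over", rng=rng)
print(np.bincount(yb))
```

In practice these would be paired with a base classifier and judged under the paradigm-specific metric (overall error, cost-weighted error, or type II error at a fixed type I level).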
Training Keyword Spotters with Limited and Synthesized Speech Data
Title | Training Keyword Spotters with Limited and Synthesized Speech Data |
Authors | James Lin, Kevin Kilgour, Dominik Roblek, Matthew Sharifi |
Abstract | With the rise of low power speech-enabled devices, there is a growing demand to quickly produce models for recognizing arbitrary sets of keywords. As with many machine learning tasks, one of the most challenging parts in the model creation process is obtaining a sufficient amount of training data. In this paper, we explore the effectiveness of synthesized speech data in training small, spoken term detection models of around 400k parameters. Instead of training such models directly on the audio or low level features such as MFCCs, we use a pre-trained speech embedding model trained to extract useful features for keyword spotting models. Using this speech embedding, we show that a model which detects 10 keywords when trained on only synthetic speech is equivalent to a model trained on over 500 real examples. We also show that a model without our speech embeddings would need to be trained on over 4000 real examples to reach the same accuracy. |
Tasks | Keyword Spotting |
Published | 2020-01-31 |
URL | https://arxiv.org/abs/2002.01322v1 |
https://arxiv.org/pdf/2002.01322v1.pdf | |
PWC | https://paperswithcode.com/paper/training-keyword-spotters-with-limited-and |
Repo | |
Framework | |
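The paper trains small keyword-spotting models on top of a frozen, pre-trained speech embedding. As a stand-in, the sketch below fits a plain softmax classifier over fixed embedding vectors with NumPy; the embedding dimension, keyword count, and synthetic data are illustrative, and the real setup would feed embeddings computed from (synthesized or real) audio rather than random vectors.

```python
import numpy as np

def train_softmax_head(emb, labels, num_classes, lr=0.1, epochs=200, rng=None):
    """Train a tiny softmax classifier on top of frozen speech embeddings.
    `emb` has shape (N, D); only W (D x C) and b (C,) are learned."""
    rng = np.random.default_rng() if rng is None else rng
    N, D = emb.shape
    W = 0.01 * rng.normal(size=(D, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[labels]                      # one-hot targets
    for _ in range(epochs):
        logits = emb @ W + b
        logits -= logits.max(axis=1, keepdims=True)      # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / N                           # softmax cross-entropy gradient
        W -= lr * emb.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy stand-in: 96-d "embeddings" for 10 keywords drawn around class centers.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=500)
centers = rng.normal(size=(10, 96))
emb = centers[labels] + 0.5 * rng.normal(size=(500, 96))
W, b = train_softmax_head(emb, labels, num_classes=10, rng=rng)
print((np.argmax(emb @ W + b, axis=1) == labels).mean())   # training accuracy
```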
Privacy Preserving Point-of-interest Recommendation Using Decentralized Matrix Factorization
Title | Privacy Preserving Point-of-interest Recommendation Using Decentralized Matrix Factorization |
Authors | Chaochao Chen, Ziqi Liu, Peilin Zhao, Jun Zhou, Xiaolong Li |
Abstract | Point-of-interest (POI) recommendation has drawn much attention recently due to the increasing popularity of location-based networks, e.g., Foursquare and Yelp. Among the existing approaches to POI recommendation, Matrix Factorization (MF) based techniques have proven to be effective. However, existing MF approaches suffer from two major problems: (1) expensive computation and storage due to the centralized model training mechanism: the centralized learner has to maintain the whole user-item rating matrix, and potentially huge low-rank matrices; (2) privacy issues: the users’ preferences are at risk of leaking to malicious attackers via the centralized learner. To solve these problems, we present a Decentralized MF (DMF) framework for POI recommendation. Specifically, instead of maintaining all the low-rank matrices and sensitive rating data for training, we propose a random-walk-based decentralized training technique to train MF models on each user’s end, e.g., cell phone or tablet. By doing so, the ratings of each user are kept in one’s own hands; moreover, decentralized learning can be viewed as distributed learning with multiple learners (users), which alleviates the computation and storage issues. Experimental results on two real-world datasets demonstrate that, compared with classic and state-of-the-art latent factor models, DMF significantly improves recommendation performance in terms of precision and recall. |
Tasks | |
Published | 2020-03-12 |
URL | https://arxiv.org/abs/2003.05610v1 |
https://arxiv.org/pdf/2003.05610v1.pdf | |
PWC | https://paperswithcode.com/paper/privacy-preserving-point-of-interest |
Repo | |
Framework | |
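To make the decentralized idea concrete, here is a hedged sketch of a single on-device matrix-factorization update: the user's latent vector and ratings stay local, and only item-factor updates would ever be exchanged. The random-walk exchange protocol that the paper actually proposes is omitted; learning rate, regularization, and dimensions are illustrative.

```python
import numpy as np

def local_mf_step(u_i, V, rated_items, ratings, lr=0.05, reg=0.01):
    """One on-device SGD pass for a single user in a decentralized MF setting.
    The user's latent vector u_i and ratings never leave the device; only
    updates to the item factors V would be shared (exchange protocol omitted)."""
    for j, r in zip(rated_items, ratings):
        err = r - u_i @ V[j]
        u_i += lr * (err * V[j] - reg * u_i)
        V[j] += lr * (err * u_i - reg * V[j])
    return u_i, V

# Toy example: one user, 5 rated POIs, rank-8 factors.
rng = np.random.default_rng(0)
u = rng.normal(scale=0.1, size=8)
V = rng.normal(scale=0.1, size=(50, 8))
u, V = local_mf_step(u, V, rated_items=[1, 4, 7, 20, 33], ratings=[5, 3, 4, 1, 2])
print(u @ V[1])   # predicted rating for POI 1 after the local update
```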
The role of surrogate models in the development of digital twins of dynamic systems
Title | The role of surrogate models in the development of digital twins of dynamic systems |
Authors | Souvik Chakraborty, Sondipon Adhikari, Ranjan Ganguli |
Abstract | Digital twin technology has significant promise, relevance and potential for widespread applicability in various industrial sectors such as aerospace, infrastructure and automotive. However, adoption of this technology has been slow due to a lack of clarity around specific applications. A discrete damped dynamic system is used in this paper to explore the concept of a digital twin. As digital twins are also expected to exploit data and computational methods, there is a compelling case for the use of surrogate models in this context. Motivated by this synergy, we explore the possibility of using surrogate models within digital twin technology. In particular, the use of a Gaussian process (GP) emulator within the digital twin is explored. GP emulators have the inherent capability of addressing noise and sparse data and hence make a compelling case for use within the digital twin framework. Cases involving stiffness variation and mass variation are considered, individually and jointly, along with different levels of noise and sparsity in the data. Our numerical simulation results clearly demonstrate that surrogate models such as GP emulators have the potential to be an effective tool for the development of digital twins. Aspects related to data quality and sampling rate are analysed. Key concepts introduced in this paper are summarised, and ideas for urgent future research needs are proposed. |
Tasks | |
Published | 2020-01-25 |
URL | https://arxiv.org/abs/2001.09292v1 |
https://arxiv.org/pdf/2001.09292v1.pdf | |
PWC | https://paperswithcode.com/paper/the-role-of-surrogate-models-in-the |
Repo | |
Framework | |
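Because the paper's core ingredient is a Gaussian process emulator fitted to noisy, sparse data, a compact NumPy implementation of GP regression with an RBF kernel is sketched below. The toy "stiffness to natural frequency" example at the end is invented for illustration and is not the paper's dynamic system.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = variance * exp(-|a - b|^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-d2 / (2.0 * length_scale ** 2))

def gp_posterior(X_train, y_train, X_test, noise=0.1, **kern):
    """GP posterior mean and variance for noisy observations y = f(x) + eps."""
    K = rbf_kernel(X_train, X_train, **kern) + noise ** 2 * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_test, **kern)
    Kss = rbf_kernel(X_test, X_test, **kern)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.diag(cov)

# Toy "digital twin" use: emulate a stiffness-dependent natural frequency
# from sparse, noisy samples (the physical relation here is made up).
rng = np.random.default_rng(0)
k = rng.uniform(0.5, 2.0, size=(15, 1))                   # sampled stiffness values
freq = np.sqrt(k[:, 0]) + 0.05 * rng.normal(size=15)      # noisy observations
k_grid = np.linspace(0.5, 2.0, 100)[:, None]
mean, var = gp_posterior(k, freq, k_grid, noise=0.05, length_scale=0.3)
print(mean[:5], var[:5])
```

The posterior variance is what lets the emulator flag regions where the twin needs more data, which is the role the abstract highlights for GP surrogates.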
Block Annotation: Better Image Annotation for Semantic Segmentation with Sub-Image Decomposition
Title | Block Annotation: Better Image Annotation for Semantic Segmentation with Sub-Image Decomposition |
Authors | Hubert Lin, Paul Upchurch, Kavita Bala |
Abstract | Image datasets with high-quality pixel-level annotations are valuable for semantic segmentation: labelling every pixel in an image ensures that rare classes and small objects are annotated. However, full-image annotations are expensive, with experts spending up to 90 minutes per image. We propose block sub-image annotation as a replacement for full-image annotation. Despite the attention cost of frequent task switching, we find that block annotations can be crowdsourced at higher quality compared to full-image annotation at equal monetary cost using existing annotation tools developed for full-image annotation. Surprisingly, we find that annotating 50% of pixels with blocks allows semantic segmentation to achieve performance equivalent to annotating 100% of pixels. Furthermore, annotating as little as 12% of pixels yields performance as high as 98% of that obtained with dense annotation. In weakly-supervised settings, block annotation outperforms existing methods by 3-4% (absolute) given equivalent annotation time. To recover the necessary global structure for applications such as characterizing spatial context and affordance relationships, we propose an effective method to inpaint block-annotated images with high-quality labels without additional human effort. As such, fewer annotations can also be used for these applications compared to full-image annotation. |
Tasks | Semantic Segmentation |
Published | 2020-02-16 |
URL | https://arxiv.org/abs/2002.06626v1 |
https://arxiv.org/pdf/2002.06626v1.pdf | |
PWC | https://paperswithcode.com/paper/block-annotation-better-image-annotation-for |
Repo | |
Framework | |
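A minimal sketch of the sampling side of block annotation: choose square sub-image blocks until a target fraction of pixels is selected for labelling. Block size, image size, and the 50% target below are illustrative; the paper's crowdsourcing interface and label-inpainting method are not reproduced.

```python
import numpy as np

def sample_annotation_blocks(height, width, block=50, target_fraction=0.5, rng=None):
    """Return a boolean mask selecting non-overlapping square blocks until roughly
    `target_fraction` of the image's pixels are marked for annotation."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.zeros((height, width), dtype=bool)
    cells = [(r, c) for r in range(0, height, block) for c in range(0, width, block)]
    for idx in rng.permutation(len(cells)):               # visit blocks in random order
        if mask.mean() >= target_fraction:
            break
        r, c = cells[idx]
        mask[r:r + block, c:c + block] = True
    return mask

mask = sample_annotation_blocks(1024, 2048, block=64, target_fraction=0.5)
print(f"fraction of pixels to annotate: {mask.mean():.2f}")
```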
Ellipse R-CNN: Learning to Infer Elliptical Object from Clustering and Occlusion
Title | Ellipse R-CNN: Learning to Infer Elliptical Object from Clustering and Occlusion |
Authors | Wenbo Dong, Pravakar Roy, Cheng Peng, Volkan Isler |
Abstract | Images of heavily occluded objects in cluttered scenes, such as fruit clusters in trees, are hard to segment. To further retrieve the 3D size and 6D pose of each individual object in such cases, bounding boxes are not reliable from multiple views since only a small portion of the object’s geometry is captured. We introduce the first CNN-based ellipse detector, called Ellipse R-CNN, to represent and infer occluded objects as ellipses. We first propose a robust and compact ellipse regression based on the Mask R-CNN architecture for elliptical object detection. Our method can infer the parameters of multiple elliptical objects even when they are occluded by other neighboring objects. For better occlusion handling, we exploit refined feature regions for the regression stage, and integrate the U-Net structure for learning different occlusion patterns to compute the final detection score. The correctness of ellipse regression is validated through experiments performed on synthetic data of clustered ellipses. We further quantitatively and qualitatively demonstrate that our approach outperforms the state-of-the-art model (i.e., Mask R-CNN followed by ellipse fitting) and its three variants on both synthetic and real datasets of occluded and clustered elliptical objects. |
Tasks | Object Detection |
Published | 2020-01-30 |
URL | https://arxiv.org/abs/2001.11584v1 |
https://arxiv.org/pdf/2001.11584v1.pdf | |
PWC | https://paperswithcode.com/paper/ellipse-r-cnn-learning-to-infer-elliptical |
Repo | |
Framework | |
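The abstract's key idea is regressing ellipse parameters from proposal features. As a hedged illustration of what such regression targets can look like, the snippet below encodes an ellipse (center, semi-axes, rotation) relative to a proposal box, using log-scaled axes and a periodic angle encoding; the paper's exact parameterization and loss are not reproduced.

```python
import numpy as np

def encode_ellipse_target(cx, cy, a, b, theta, box):
    """Encode an ellipse (center, semi-axes a >= b, rotation theta) as regression
    targets relative to a proposal box (x1, y1, x2, y2). The angle is encoded as
    (sin 2*theta, cos 2*theta) so that theta and theta + pi map to the same target."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return np.array([
        (cx - (x1 + x2) / 2.0) / w,       # center offset, normalized by box size
        (cy - (y1 + y2) / 2.0) / h,
        np.log(2.0 * a / w),              # log-scale semi-axes, like box regression
        np.log(2.0 * b / h),
        np.sin(2.0 * theta),              # periodic angle encoding
        np.cos(2.0 * theta),
    ])

print(encode_ellipse_target(cx=60, cy=40, a=25, b=15, theta=0.4, box=(20, 10, 100, 70)))
```

A regression head would predict this 6-vector per proposal and be trained with a smooth L1 loss against such targets.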
3D Aggregated Faster R-CNN for General Lesion Detection
Title | 3D Aggregated Faster R-CNN for General Lesion Detection |
Authors | Ning Zhang, Yu Cao, Benyuan Liu, Yan Luo |
Abstract | Lesions are areas of damage or abnormality in tissues of the human body. Many of them can later turn into fatal diseases such as cancers. Detecting lesions is of great importance for early diagnosis and timely treatment. To this end, Computed Tomography (CT) scans often serve as the screening tool, allowing us to leverage modern object detection techniques to detect the lesions. However, lesions in CT scans are often small and sparse. The local area of a lesion can be very confusing, leading the region-based classifier branch of Faster R-CNN to fail easily. Therefore, most existing state-of-the-art solutions train two types of heterogeneous networks (multi-phase) separately for candidate generation and False Positive Reduction (FPR). In this paper, we propose an end-to-end 3D Aggregated Faster R-CNN solution by stacking an “aggregated classifier branch” on the backbone of the RPN. This classifier branch is equipped with Feature Aggregation and Local Magnification Layers to enhance its discriminative power. We demonstrate that our model achieves state-of-the-art performance on both the LUNA16 and DeepLesion datasets. In particular, we achieve the best single-model FROC performance on LUNA16, with an inference time of 4.2s per processed scan. |
Tasks | Computed Tomography (CT), Object Detection |
Published | 2020-01-29 |
URL | https://arxiv.org/abs/2001.11071v1 |
https://arxiv.org/pdf/2001.11071v1.pdf | |
PWC | https://paperswithcode.com/paper/3d-aggregated-faster-r-cnn-for-general-lesion |
Repo | |
Framework | |
Fair Active Learning
Title | Fair Active Learning |
Authors | Hadis Anahideh, Abolfazl Asudeh |
Abstract | Bias in training data and the use of proxy attributes are probably the main reasons for unfair machine learning outcomes. ML models are trained on historical data that are problematic due to inherent societal bias. Besides, collecting labeled data in societal applications is challenging and costly. Consequently, proxy attributes are often used as alternatives to labels. Yet, biased proxies cause model unfairness. In this paper, we introduce fair active learning (FAL) as a resolution. Given a limited labeling budget, FAL carefully selects data points to be labeled in order to balance model performance and fairness. Our comprehensive experiments on real datasets confirm a significant fairness improvement while maintaining model performance. |
Tasks | Active Learning |
Published | 2020-01-06 |
URL | https://arxiv.org/abs/2001.01796v3 |
https://arxiv.org/pdf/2001.01796v3.pdf | |
PWC | https://paperswithcode.com/paper/fair-active-learning |
Repo | |
Framework | |
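The abstract describes selecting points to label so as to balance model performance and fairness. The sketch below is one simple, hypothetical acquisition score of that flavor: a convex combination of predictive entropy and a term favoring sensitive groups under-represented in the labeled pool. It is not the criterion proposed in the paper; the mixing weight `alpha` and the group-share heuristic are assumptions made for the example.

```python
import numpy as np

def fal_acquisition(probs, groups, labeled_groups, alpha=0.5):
    """Score unlabeled points by mixing predictive uncertainty (entropy of the
    current model's class probabilities) with a fairness term that favors
    sensitive groups under-represented in the labeled pool."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    entropy = entropy / entropy.max()                       # normalize to [0, 1]
    # Fairness term: 1 minus the labeled-pool share of each point's group.
    group_ids, counts = np.unique(labeled_groups, return_counts=True)
    share = dict(zip(group_ids, counts / counts.sum()))
    fairness = np.array([1.0 - share.get(g, 0.0) for g in groups])
    return alpha * entropy + (1.0 - alpha) * fairness

# Toy example: 6 unlabeled points, a binary classifier, two sensitive groups.
probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.8, 0.2], [0.55, 0.45], [0.7, 0.3]])
groups = np.array([0, 0, 1, 1, 1, 0])
labeled_groups = np.array([0, 0, 0, 0, 1])    # group 1 is under-labeled so far
scores = fal_acquisition(probs, groups, labeled_groups)
print(np.argsort(scores)[::-1])               # query the highest-scoring points first
```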
Seeing Around Corners with Edge-Resolved Transient Imaging
Title | Seeing Around Corners with Edge-Resolved Transient Imaging |
Authors | Joshua Rapp, Charles Saunders, Julián Tachella, John Murray-Bruce, Yoann Altmann, Jean-Yves Tourneret, Stephen McLaughlin, Robin M. A. Dawson, Franco N. C. Wong, Vivek K Goyal |
Abstract | Non-line-of-sight (NLOS) imaging is a rapidly growing field seeking to form images of objects outside the field of view, with potential applications in search and rescue, reconnaissance, and even medical imaging. The critical challenge of NLOS imaging is that diffuse reflections scatter light in all directions, resulting in weak signals and a loss of directional information. To address this problem, we propose a method for seeing around corners that derives angular resolution from vertical edges and longitudinal resolution from the temporal response to a pulsed light source. We introduce an acquisition strategy, scene response model, and reconstruction algorithm that enable the formation of 2.5-dimensional representations – a plan view plus heights – and a 180$^{\circ}$ field of view (FOV) for large-scale scenes. Our experiments demonstrate accurate reconstructions of hidden rooms up to 3 meters in each dimension. |
Tasks | |
Published | 2020-02-17 |
URL | https://arxiv.org/abs/2002.07118v1 |
https://arxiv.org/pdf/2002.07118v1.pdf | |
PWC | https://paperswithcode.com/paper/seeing-around-corners-with-edge-resolved |
Repo | |
Framework | |
Plug & Play Convolutional Regression Tracker for Video Object Detection
Title | Plug & Play Convolutional Regression Tracker for Video Object Detection |
Authors | Ye Lyu, Michael Ying Yang, George Vosselman, Gui-Song Xia |
Abstract | Video object detection aims to simultaneously localize the bounding boxes of objects and identify their classes in a given video. One challenge for video object detection is to consistently detect all objects across the whole video. As the appearance of objects may deteriorate in some frames, features or detections from other frames are commonly used to enhance the prediction. In this paper, we propose a Plug & Play scale-adaptive convolutional regression tracker for the video object detection task, which can be easily integrated into current state-of-the-art detection networks. As the tracker reuses the features from the detector, it is a very lightweight addition to the detection network. The whole network runs at a speed close to that of a standard object detector. With our new video object detection pipeline design, image object detectors can be easily turned into efficient video object detectors without modifying any parameters. The performance is evaluated on the large-scale ImageNet VID dataset. Our Plug & Play design improves the mAP score of the image detector by around 5% with only a small speed drop. |
Tasks | Object Detection, Video Object Detection |
Published | 2020-03-02 |
URL | https://arxiv.org/abs/2003.00981v1 |
https://arxiv.org/pdf/2003.00981v1.pdf | |
PWC | https://paperswithcode.com/paper/plug-play-convolutional-regression-tracker |
Repo | |
Framework | |
Estimating a Null Model of Scientific Image Reuse to Support Research Integrity Investigations
Title | Estimating a Null Model of Scientific Image Reuse to Support Research Integrity Investigations |
Authors | Daniel E. Acuna, Ziyue Xiang |
Abstract | When there is a suspicious figure reuse case in science, research integrity investigators often find it difficult to rebut authors claiming that “it happened by chance”. In other words, when there is a “collision” of image features, it is difficult to establish whether such a collision is rare or not. In this article, we provide a method to predict the rarity of an image feature by statistically estimating the chance of it randomly occurring across all scientific imagery. Our method is based on high-dimensional density estimation of ORB features using 7+ million images in the PubMed Open Access Subset dataset. We show that this method can lead to meaningful feedback during research integrity investigations by providing a null hypothesis for scientific image reuse and thus a p-value during deliberations. We apply the model to a sample of increasingly complex imagery and confirm that it produces progressively smaller p-values, as expected. We discuss applications to research integrity investigations as well as future work. |
Tasks | Density Estimation |
Published | 2020-02-22 |
URL | https://arxiv.org/abs/2003.00878v1 |
https://arxiv.org/pdf/2003.00878v1.pdf | |
PWC | https://paperswithcode.com/paper/estimating-a-null-model-of-scientific-image |
Repo | |
Framework | |
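The method's output is essentially a rarity estimate for an image feature under a density model of a large reference corpus. The sketch below shows that logic with a generic kernel density estimate (SciPy's `gaussian_kde`) and random stand-in descriptors; the paper's actual pipeline uses ORB features and a high-dimensional density estimator over 7+ million PubMed images.

```python
import numpy as np
from scipy.stats import gaussian_kde

def feature_rarity_pvalue(reference_feats, query_feat):
    """Fit a kernel density estimate to a reference corpus of image-patch
    descriptors and return the fraction of reference descriptors whose density
    is at most that of the query: a simple empirical 'rarity' p-value."""
    kde = gaussian_kde(reference_feats.T)          # gaussian_kde expects (dims, n)
    ref_density = kde(reference_feats.T)
    query_density = kde(query_feat[:, None])[0]
    return float((ref_density <= query_density).mean())

# Toy stand-ins for descriptors: a common patch near the bulk of the corpus
# and a rare patch far from it.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(2000, 8))
common_patch = rng.normal(size=8)
rare_patch = common_patch + 6.0
print(feature_rarity_pvalue(corpus, common_patch), feature_rarity_pvalue(corpus, rare_patch))
```

A small value indicates the queried feature is rare under the null model, which is the evidence the investigators would weigh against the "it happened by chance" claim.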
Graphon Pooling in Graph Neural Networks
Title | Graphon Pooling in Graph Neural Networks |
Authors | Alejandro Parada-Mayorga, Luana Ruiz, Alejandro Ribeiro |
Abstract | Graph neural networks (GNNs) have been used effectively in different applications involving the processing of signals on irregular structures modeled by graphs. Relying on the use of shift-invariant graph filters, GNNs extend the operation of convolution to graphs. However, the operations of pooling and sampling are still not clearly defined, and the approaches proposed in the literature either modify the graph structure in a way that does not preserve its spectral properties, or require defining a policy for selecting which nodes to keep. In this work, we propose a new strategy for pooling and sampling on GNNs using graphons which preserves the spectral properties of the graph. To do so, we consider the graph layers in a GNN as elements of a sequence of graphs that converge to a graphon. In this way there is no ambiguity in node labeling when mapping signals from one layer to the next, and the spectral representation is consistent throughout the layers. We evaluate this strategy in a synthetic and a real-world numerical experiment, where we show that graphon pooling GNNs are less prone to overfitting and improve upon other pooling techniques, especially when the dimensionality reduction ratios between layers are large. |
Tasks | Dimensionality Reduction |
Published | 2020-03-03 |
URL | https://arxiv.org/abs/2003.01795v1 |
https://arxiv.org/pdf/2003.01795v1.pdf | |
PWC | https://paperswithcode.com/paper/graphon-pooling-in-graph-neural-networks |
Repo | |
Framework | |
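A hedged sketch of the pooling idea: treat node labels as points in [0, 1], partition the interval, and average the adjacency matrix and graph signal over the partition cells to obtain a coarser graph. This step-function coarsening is only an illustration of graphon-style pooling; the paper's GNN layers and convergence analysis are not reproduced.

```python
import numpy as np

def graphon_pool(adjacency, num_coarse_nodes):
    """Pool a graph by viewing its nodes as points on [0, 1] (ordered by label),
    partitioning [0, 1] into equal intervals, and averaging the adjacency matrix
    over each pair of intervals (a step-function, graphon-like coarsening)."""
    n = adjacency.shape[0]
    bins = np.floor(np.arange(n) * num_coarse_nodes / n).astype(int)
    coarse = np.zeros((num_coarse_nodes, num_coarse_nodes))
    counts = np.zeros_like(coarse)
    for i in range(n):
        for j in range(n):
            coarse[bins[i], bins[j]] += adjacency[i, j]
            counts[bins[i], bins[j]] += 1
    return coarse / np.maximum(counts, 1)

def pool_signal(x, num_coarse_nodes):
    """Average a graph signal over the same node partition."""
    n = len(x)
    bins = np.floor(np.arange(n) * num_coarse_nodes / n).astype(int)
    return np.array([x[bins == k].mean() for k in range(num_coarse_nodes)])

# Toy 60-node random graph pooled to 6 nodes, with a matching signal pooling.
rng = np.random.default_rng(0)
A = (rng.random((60, 60)) < 0.2).astype(float)
A = np.triu(A, 1); A = A + A.T                        # symmetric, no self-loops
print(graphon_pool(A, 6).shape, pool_signal(rng.normal(size=60), 6).shape)
```

Because the coarse graph is defined by interval averaging rather than node selection, there is no node-picking policy, which is the property the abstract emphasizes.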