Paper Group AWR 425
Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels. SumQE: a BERT-based Summary Quality Estimation Model. Onset detection: A new approach to QBH system. Learning in the Machine: To Share or Not to Share? SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds. Data-Efficient Classification of Birdcall Through Convolutional Neural Networks Transfer Learning. Reasoning and Generalization in RL: A Tool Use Perspective. The State of Sparsity in Deep Neural Networks. Analytical Methods for Interpretable Ultradense Word Embeddings. DirectPose: Direct End-to-End Multi-Person Pose Estimation. Trust Region-Guided Proximal Policy Optimization. Efficient Parameter-free Clustering Using First Neighbor Relations. Classification with Costly Features as a Sequential Decision-Making Problem. Learnable Triangulation of Human Pose. An Efficient Sampling-based Method for Online Informative Path Planning in Unknown Environments.
Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels
Title | Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels |
Authors | Daniel Y. Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, Kayvon Fatahalian |
Abstract | Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film. Unfortunately, pre-trained models to detect all the events of interest in video may not exist, and training new models from scratch can be costly and labor-intensive. In this paper, we explore the utility of specifying new events in video in a more traditional manner: by writing queries that compose outputs of existing, pre-trained models. To write these queries, we have developed Rekall, a library that exposes a data model and programming model for compositional video event specification. Rekall represents video annotations from different sources (object detectors, transcripts, etc.) as spatiotemporal labels associated with continuous volumes of spacetime in a video, and provides operators for composing labels into queries that model new video events. We demonstrate the use of Rekall in analyzing video from cable TV news broadcasts, films, static-camera vehicular video streams, and commercial autonomous vehicle logs. In these efforts, domain experts were able to quickly (in a few hours to a day) author queries that enabled the accurate detection of new events (on par with, and in some cases much more accurate than, learned approaches) and to rapidly retrieve video clips for human-in-the-loop tasks such as video content curation and training data curation. Finally, in a user study, novice users of Rekall were able to author queries to retrieve new events in video given just one hour of query development time. (A small illustrative sketch of interval-based composition follows this entry.) |
Tasks | |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02993v1 |
https://arxiv.org/pdf/1910.02993v1.pdf | |
PWC | https://paperswithcode.com/paper/rekall-specifying-video-events-using |
Repo | https://github.com/scanner-research/rekall |
Framework | none |
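The composition idea above lends itself to a tiny example. The following is an illustrative sketch only, not the rekall library API: labels are modeled as time intervals with payloads, and two hypothetical helpers (`overlap_join`, `coalesce`) compose them into a toy "interview" query.

```python
# Illustrative sketch only: a toy version of compositional event specification
# over time intervals, in the spirit of the Rekall data model. This is NOT the
# rekall library API; names (Interval, overlap_join, coalesce) are hypothetical.
from dataclasses import dataclass

@dataclass
class Interval:
    start: float  # seconds
    end: float    # seconds
    payload: dict

def overlap_join(a_list, b_list, predicate=lambda a, b: True):
    """Pair up intervals from two label sets that overlap in time."""
    out = []
    for a in a_list:
        for b in b_list:
            s, e = max(a.start, b.start), min(a.end, b.end)
            if s < e and predicate(a, b):
                out.append(Interval(s, e, {"a": a.payload, "b": b.payload}))
    return out

def coalesce(intervals, gap=1.0):
    """Merge intervals separated by less than `gap` seconds."""
    merged = []
    for iv in sorted(intervals, key=lambda x: x.start):
        if merged and iv.start - merged[-1].end <= gap:
            merged[-1] = Interval(merged[-1].start,
                                  max(merged[-1].end, iv.end),
                                  merged[-1].payload)
        else:
            merged.append(iv)
    return merged

# Toy query: "interview" = host face co-occurring with another face for a while.
host = [Interval(10, 60, {"name": "host"}), Interval(300, 360, {"name": "host"})]
faces = [Interval(20, 55, {"name": "guest"}), Interval(310, 330, {"name": "guest"})]
interviews = [iv for iv in coalesce(overlap_join(host, faces))
              if iv.end - iv.start > 20]
print(interviews)
```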
SumQE: a BERT-based Summary Quality Estimation Model
Title | SumQE: a BERT-based Summary Quality Estimation Model |
Authors | Stratos Xenouleas, Prodromos Malakasiotis, Marianna Apidianaki, Ion Androutsopoulos |
Abstract | We propose SumQE, a novel Quality Estimation model for summarization based on BERT. The model addresses linguistic quality aspects that are only indirectly captured by content-based approaches to summary evaluation, without involving comparison with human references. SumQE achieves very high correlations with human ratings, outperforming simpler models addressing these linguistic aspects. Predictions of the SumQE model can be used for system development, and to inform users of the quality of automatically produced summaries and other types of generated text. (A minimal sketch of a BERT-plus-regression-head model follows this entry.) |
Tasks | |
Published | 2019-09-02 |
URL | https://arxiv.org/abs/1909.00578v1 |
https://arxiv.org/pdf/1909.00578v1.pdf | |
PWC | https://paperswithcode.com/paper/sumqe-a-bert-based-summary-quality-estimation |
Repo | https://github.com/nlpaueb/SumQE |
Framework | tf |
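As a rough illustration of a BERT-based quality estimator, the sketch below attaches a small regression head to a pretrained BERT encoder. It assumes the transformers and torch packages; the class name, head design, and the choice of five output aspects (as in the DUC linguistic-quality questions) are ours, not the authors' released model.

```python
# Illustrative sketch, not the authors' SumQE code: a BERT encoder with a small
# regression head predicting linguistic-quality scores for a summary.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class QualityEstimator(nn.Module):
    def __init__(self, n_quality_aspects=5):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.head = nn.Linear(self.bert.config.hidden_size, n_quality_aspects)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]   # [CLS] representation
        return self.head(cls)               # one score per quality aspect

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = QualityEstimator()
batch = tokenizer(["The summary is fluent and coherent."],
                  return_tensors="pt", padding=True, truncation=True)
scores = model(batch["input_ids"], batch["attention_mask"])  # shape (1, 5)
```

Training such a head would typically minimize a regression loss (e.g., mean squared error) against human ratings.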
Onset detection: A new approach to QBH system
Title | Onset detection: A new approach to QBH system |
Authors | Ritwik Bhaduri, Soham Bonnerjee, Subhrajyoty Roy |
Abstract | Query by Humming (QBH) is a system that provides a user with the song(s) matching what the user hums to the system. Current QBH methods require the extraction of onset and pitch information in order to track similarity with various versions of different songs. Here, however, we focus on detecting precise onsets only and use them to build a QBH system that is better than existing methods in terms of speed and memory, and empirically in terms of accuracy. We also provide a statistical analogy for onset detection functions and a measure of error in our algorithm. (A generic onset-detection sketch follows this entry.) |
Tasks | |
Published | 2019-08-17 |
URL | https://arxiv.org/abs/1908.07409v2 |
https://arxiv.org/pdf/1908.07409v2.pdf | |
PWC | https://paperswithcode.com/paper/onset-detection-a-new-approach-to-qbh-system |
Repo | https://github.com/subroy13/OnsetDetection |
Framework | none |
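To make the notion of an onset detection function concrete, here is a standard spectral-flux detector in NumPy. It is a generic sketch, not the statistical method proposed in the paper, and all parameter values are arbitrary.

```python
# Illustrative sketch only: a standard spectral-flux onset detector, shown to
# make "onset detection function" concrete. The paper's statistical method differs.
import numpy as np

def onset_strength(signal, sr, frame=1024, hop=512):
    """Spectral flux: summed positive change in the magnitude spectrum per frame."""
    window = np.hanning(frame)
    n_frames = 1 + (len(signal) - frame) // hop
    spec = np.abs(np.stack([
        np.fft.rfft(window * signal[i * hop: i * hop + frame])
        for i in range(n_frames)]))
    flux = np.maximum(spec[1:] - spec[:-1], 0.0).sum(axis=1)
    return flux / (flux.max() + 1e-9)

def pick_onsets(strength, threshold=0.3):
    """Local maxima of the detection function above a fixed threshold."""
    return [i for i in range(1, len(strength) - 1)
            if strength[i] > threshold
            and strength[i] >= strength[i - 1]
            and strength[i] >= strength[i + 1]]

sr = 22050
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t) * (t > 0.5)   # a tone that starts at 0.5 s
frames = pick_onsets(onset_strength(audio, sr))
print([f * 512 / sr for f in frames])             # approximate onset times in s
```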
Learning in the Machine: To Share or Not to Share?
Title | Learning in the Machine: To Share or Not to Share? |
Authors | Jordan Ott, Erik Linstead, Nicholas LaHaye, Pierre Baldi |
Abstract | Weight-sharing is one of the pillars behind Convolutional Neural Networks and their successes. However, in physical neural systems such as the brain, weight-sharing is implausible. This discrepancy raises the fundamental question of whether weight-sharing is necessary. If so, to which degree of precision? If not, what are the alternatives? The goal of this study is to investigate these questions, primarily through simulations where the weight-sharing assumption is relaxed. Taking inspiration from neural circuitry, we explore the use of Free Convolutional Networks and neurons with variable connection patterns. Using Free Convolutional Networks, we show that while weight-sharing is a pragmatic optimization approach, it is not a necessity in computer vision applications. Furthermore, Free Convolutional Networks match the performance observed in standard architectures when trained using properly translated data (akin to video). Under the assumption of translationally augmented data, Free Convolutional Networks learn translationally invariant representations that yield an approximate form of weight sharing. (A sketch of a convolution-like layer without weight sharing follows this entry.) |
Tasks | |
Published | 2019-09-23 |
URL | https://arxiv.org/abs/1909.11483v2 |
https://arxiv.org/pdf/1909.11483v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-in-the-machine-to-share-or-not-to |
Repo | https://github.com/jordanott/WeightSharing |
Framework | none |
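A Free Convolutional Network relaxes weight-sharing: each output position gets its own filter bank. The sketch below is a minimal locally connected layer in PyTorch under that assumption; the class name and shapes are ours, not the authors' code.

```python
# Illustrative sketch, assuming PyTorch: a "locally connected" layer, i.e. a
# convolution whose weights are NOT shared across spatial positions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocallyConnected2d(nn.Module):
    def __init__(self, in_ch, out_ch, in_size, kernel=3):
        super().__init__()
        self.kernel = kernel
        self.out_size = in_size - kernel + 1          # 'valid' padding
        n_pos = self.out_size * self.out_size
        # one independent filter bank per output position
        self.weight = nn.Parameter(
            torch.randn(n_pos, out_ch, in_ch * kernel * kernel) * 0.01)
        self.bias = nn.Parameter(torch.zeros(n_pos, out_ch))

    def forward(self, x):                              # x: (B, C, H, W)
        patches = F.unfold(x, self.kernel)             # (B, C*k*k, n_pos)
        patches = patches.permute(2, 0, 1)             # (n_pos, B, C*k*k)
        out = torch.einsum("pbc,poc->pbo", patches, self.weight)
        out = out + self.bias[:, None]                 # (n_pos, B, out_ch)
        out = out.permute(1, 2, 0)                     # (B, out_ch, n_pos)
        return out.reshape(x.shape[0], -1, self.out_size, self.out_size)

layer = LocallyConnected2d(in_ch=1, out_ch=4, in_size=8)
y = layer(torch.randn(2, 1, 8, 8))                     # -> (2, 4, 6, 6)
```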
SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds
Title | SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds |
Authors | Minghui Liao, Boyu Song, Shangbang Long, Minghang He, Cong Yao, Xiang Bai |
Abstract | With the development of deep neural networks, the demand for a significant amount of annotated training data has become a performance bottleneck in many fields of research and application. Image synthesis can generate annotated images automatically and freely, and has therefore gained increasing attention recently. In this paper, we propose to synthesize scene text images from 3D virtual worlds, where precise descriptions of scenes, editable illumination/visibility, and realistic physics are provided. Different from previous methods, which paste rendered text onto static 2D images, our method renders the 3D virtual scene and text instances as an entirety. In this way, real-world variations, including complex perspective transformations, various illuminations, and occlusions, can be realized in our synthesized scene text images. Moreover, the same text instances can be produced from various viewpoints by randomly moving and rotating the virtual camera, which acts as human eyes. Experiments on standard scene text detection benchmarks using the generated synthetic data demonstrate the effectiveness and superiority of the proposed method. The code and synthetic data are available at: https://github.com/MhLiao/SynthText3D |
Tasks | Image Generation, Scene Text Detection |
Published | 2019-07-13 |
URL | https://arxiv.org/abs/1907.06007v2 |
https://arxiv.org/pdf/1907.06007v2.pdf | |
PWC | https://paperswithcode.com/paper/synthtext3d-synthesizing-scene-text-images |
Repo | https://github.com/MhLiao/SynthText3D |
Framework | none |
Data-Efficient Classification of Birdcall Through Convolutional Neural Networks Transfer Learning
Title | Data-Efficient Classification of Birdcall Through Convolutional Neural Networks Transfer Learning |
Authors | Dina B. Efremova, Mangalam Sankupellay, Dmitry A. Konovalov |
Abstract | Deep learning Convolutional Neural Network (CNN) models are powerful classification models but require a large amount of training data. In niche domains such as bird acoustics, it is expensive and difficult to obtain a large number of training samples. One method of classifying data with a limited number of training samples is to employ transfer learning. In this research, we evaluated the effectiveness of birdcall classification using transfer learning from a larger base dataset (2814 samples in 46 classes) to a smaller target dataset (351 samples in 10 classes) using the ResNet-50 CNN. We obtained 79% average validation accuracy on the target dataset in 5-fold cross-validation. The methodology of transfer learning from an ImageNet-trained CNN to a project-specific and much smaller set of classes and images was extended to the domain of spectrogram images, where the base dataset effectively played the role of ImageNet. (A minimal transfer-learning sketch follows this entry.) |
Tasks | Transfer Learning |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07526v1 |
https://arxiv.org/pdf/1909.07526v1.pdf | |
PWC | https://paperswithcode.com/paper/data-efficient-classification-of-birdcall |
Repo | https://github.com/dmitryako/aus10spectrograms |
Framework | none |
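A minimal transfer-learning recipe of the kind described above: load an ImageNet-pretrained ResNet-50, replace the final layer for 10 classes, and fine-tune on spectrogram images. This is a generic sketch, not the authors' exact pipeline; the batch below is random stand-in data.

```python
# Illustrative sketch of ResNet-50 transfer learning for 10 birdcall classes.
# Assumes torch/torchvision; freezing the backbone is one common choice, not
# necessarily what the paper did.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)          # ImageNet weights
for p in model.parameters():                      # optionally freeze the backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)    # new head for 10 classes

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# one hypothetical training step on a batch of spectrogram "images"
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 10, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```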
Reasoning and Generalization in RL: A Tool Use Perspective
Title | Reasoning and Generalization in RL: A Tool Use Perspective |
Authors | Sam Wenke, Dan Saunders, Mike Qiu, Jim Fleming |
Abstract | Learning to use tools to solve a variety of tasks is an innate ability of humans and has also been observed in animals in the wild. However, the underlying mechanisms required to learn to use tools are abstract and widely contested in the literature. In this paper, we study tool use in the context of reinforcement learning and propose a framework for analyzing generalization inspired by a classic study of tool-using behavior, the trap-tube task. Recently, it has become common in reinforcement learning to measure generalization performance on a single test set of environments. We instead propose transfers that produce multiple test sets, used to measure specified types of generalization, inspired by abilities demonstrated by animal and human tool users. The source code to reproduce our experiments is publicly available at https://github.com/fomorians/gym_tool_use. |
Tasks | |
Published | 2019-07-03 |
URL | https://arxiv.org/abs/1907.02050v1 |
https://arxiv.org/pdf/1907.02050v1.pdf | |
PWC | https://paperswithcode.com/paper/reasoning-and-generalization-in-rl-a-tool-use |
Repo | https://github.com/fomorians/gym_tool_use |
Framework | none |
The State of Sparsity in Deep Neural Networks
Title | The State of Sparsity in Deep Neural Networks |
Authors | Trevor Gale, Erich Elsen, Sara Hooker |
Abstract | We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet. Across thousands of experiments, we demonstrate that complex techniques (Molchanov et al., 2017; Louizos et al., 2017b) shown to yield high compression rates on smaller datasets perform inconsistently, and that simple magnitude pruning approaches achieve comparable or better results. Additionally, we replicate the experiments performed by (Frankle & Carbin, 2018) and (Liu et al., 2018) at scale and show that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization. Together, these results highlight the need for large-scale benchmarks in the field of model compression. We open-source our code, top performing model checkpoints, and results of all hyperparameter configurations to establish rigorous baselines for future work on compression and sparsification. (A global magnitude-pruning sketch follows this entry.) |
Tasks | Model Compression, Sparse Learning |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09574v1 |
http://arxiv.org/pdf/1902.09574v1.pdf | |
PWC | https://paperswithcode.com/paper/the-state-of-sparsity-in-deep-neural-networks |
Repo | https://github.com/ars-ashuha/variational-dropout-sparsifies-dnn |
Framework | tf |
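The simple baseline the paper finds competitive is magnitude pruning. Below is a hedged sketch of global magnitude pruning in PyTorch; the helper is ours and omits the gradual sparsification schedule typically used in practice.

```python
# Illustrative sketch of global magnitude pruning: zero out the smallest-
# magnitude weights across all layers. Not the paper's released code.
import torch
import torch.nn as nn

def magnitude_prune(model, sparsity=0.9):
    """Set the `sparsity` fraction of smallest-magnitude weights to zero."""
    weights = [m.weight for m in model.modules()
               if isinstance(m, (nn.Linear, nn.Conv2d))]
    all_w = torch.cat([w.detach().abs().flatten() for w in weights])
    threshold = torch.quantile(all_w, sparsity)
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).float())   # keep only large weights

model = nn.Sequential(nn.Linear(100, 50), nn.ReLU(), nn.Linear(50, 10))
magnitude_prune(model, sparsity=0.8)
kept = sum((m.weight != 0).sum().item()
           for m in model if isinstance(m, nn.Linear))
print(kept)   # roughly 20% of the 100*50 + 50*10 weights remain
```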
Analytical Methods for Interpretable Ultradense Word Embeddings
Title | Analytical Methods for Interpretable Ultradense Word Embeddings |
Authors | Philipp Dufter, Hinrich Schütze |
Abstract | Word embeddings are useful for a wide variety of tasks, but they lack interpretability. By rotating word spaces, interpretable dimensions can be identified while preserving the information contained in the embeddings. In this work, we investigate three methods for making word spaces interpretable by rotation: Densifier (Rothe et al., 2016), linear SVMs, and DensRay, a new method we propose. In contrast to Densifier, DensRay can be computed in closed form, is hyperparameter-free, and is thus more robust. We evaluate the three methods on lexicon induction and set-based word analogy. In addition, we provide qualitative insights into how interpretable word spaces can be used for removing gender bias from embeddings. (A simplified closed-form sketch follows this entry.) |
Tasks | Word Embeddings |
Published | 2019-04-18 |
URL | https://arxiv.org/abs/1904.08654v2 |
https://arxiv.org/pdf/1904.08654v2.pdf | |
PWC | https://paperswithcode.com/paper/analytical-methods-for-interpretable |
Repo | https://github.com/pdufter/densray |
Framework | none |
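DensRay finds an interpretable direction in closed form. The sketch below captures the general idea under simplifying assumptions (uniform pair weighting, a toy lexicon, random vectors): build a matrix from difference vectors of same- and different-class word pairs and take an extreme eigenvector. It is not the authors' released implementation.

```python
# Illustrative, simplified sketch of the closed-form "interpretable direction" idea.
import numpy as np

def interpretable_direction(emb, pos_words, neg_words):
    """Return a unit vector along which the two word groups separate."""
    dim = emb[pos_words[0]].shape[0]
    A = np.zeros((dim, dim))
    groups = {w: +1 for w in pos_words}
    groups.update({w: -1 for w in neg_words})
    words = list(groups)
    for i, u in enumerate(words):
        for v in words[i + 1:]:
            diff = emb[u] - emb[v]
            sign = 1.0 if groups[u] == groups[v] else -1.0
            A += sign * np.outer(diff, diff)
    # eigenvector of the smallest eigenvalue: shrinks within-group differences,
    # stretches between-group differences
    vals, vecs = np.linalg.eigh(A)
    return vecs[:, 0]

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ["good", "great", "bad", "awful"]}
direction = interpretable_direction(emb, ["good", "great"], ["bad", "awful"])
scores = {w: float(emb[w] @ direction) for w in emb}   # 1-D "sentiment" axis
print(scores)
```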
DirectPose: Direct End-to-End Multi-Person Pose Estimation
Title | DirectPose: Direct End-to-End Multi-Person Pose Estimation |
Authors | Zhi Tian, Hao Chen, Chunhua Shen |
Abstract | We propose the first direct end-to-end multi-person pose estimation framework, termed DirectPose. Inspired by recent anchor-free object detectors, which directly regress the two corners of target bounding-boxes, the proposed framework directly predicts instance-aware keypoints for all the instances from a raw input image, eliminating the need for heuristic grouping in bottom-up methods or bounding-box detection and RoI operations in top-down ones. We also propose a novel Keypoint Alignment (KPAlign) mechanism, which overcomes the main difficulty of this end-to-end framework: the lack of alignment between the convolutional features and the predictions. KPAlign improves the framework’s performance by a large margin while keeping the framework end-to-end trainable. With non-maximum suppression (NMS) as the only post-processing step, the proposed framework can detect multi-person keypoints with or without bounding-boxes in a single shot. Experiments demonstrate that the end-to-end paradigm can achieve performance competitive with or better than strong baselines from both bottom-up and top-down methods. We hope that our end-to-end approach can provide a new perspective for the human pose estimation task. (A minimal anchor-free keypoint-head sketch follows this entry.) |
Tasks | Multi-Person Pose Estimation, Pose Estimation |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07451v2 |
https://arxiv.org/pdf/1911.07451v2.pdf | |
PWC | https://paperswithcode.com/paper/directpose-direct-end-to-end-multi-person |
Repo | https://github.com/aim-uofa/AdelaiDet |
Framework | pytorch |
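As a rough illustration of direct keypoint regression, the head below predicts, at every feature-map location, K keypoint offsets plus an instance score. It is a simplified PyTorch sketch and omits KPAlign and the surrounding detection machinery of the actual method.

```python
# Illustrative sketch, not the DirectPose implementation (which adds KPAlign):
# an anchor-free head that directly regresses a full pose per location.
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    def __init__(self, in_ch=256, num_keypoints=17):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU())
        self.offsets = nn.Conv2d(in_ch, 2 * num_keypoints, 1)  # (dx, dy) per joint
        self.score = nn.Conv2d(in_ch, 1, 1)                    # instance-ness score

    def forward(self, feat):                 # feat: (B, C, H, W) from an FPN level
        x = self.tower(feat)
        return self.offsets(x), self.score(x).sigmoid()

head = KeypointHead()
offsets, score = head(torch.randn(1, 256, 32, 32))
# offsets: (1, 34, 32, 32) -> each location proposes a full 17-joint pose
```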
Trust Region-Guided Proximal Policy Optimization
Title | Trust Region-Guided Proximal Policy Optimization |
Authors | Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan |
Abstract | Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO relies heavily on the effectiveness of its exploratory policy search. In this paper, we give an in-depth analysis of the exploration behavior of PPO, and show that PPO is prone to insufficient exploration, especially under bad initialization, which may lead to training failure or entrapment in bad local optima. To address these issues, we propose a novel policy optimization method, named Trust Region-Guided PPO (TRGPPO), which adaptively adjusts the clipping range within the trust region. We formally show that this method not only improves the exploration ability within the trust region but also enjoys a better performance bound than the original PPO. Extensive experiments verify the advantage of the proposed method. (A sketch of the clipped surrogate objective follows this entry.) |
Tasks | |
Published | 2019-01-29 |
URL | https://arxiv.org/abs/1901.10314v2 |
https://arxiv.org/pdf/1901.10314v2.pdf | |
PWC | https://paperswithcode.com/paper/trust-region-guided-proximal-policy |
Repo | https://github.com/wangyuhuix/TRGPPO |
Framework | tf |
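The sketch below shows the standard PPO clipped surrogate with clipping bounds that may vary per sample, which is the hook TRGPPO exploits by deriving the bounds from a KL-based trust region. Computing the adaptive bounds themselves is omitted; the numbers are stand-ins, and this is not the authors' released code.

```python
# Illustrative sketch: PPO clipped surrogate with elementwise clipping bounds.
import torch

def clipped_surrogate(logp_new, logp_old, advantages, clip_low, clip_high):
    """PPO objective with (possibly per-sample) clipping bounds."""
    ratio = torch.exp(logp_new - logp_old)
    clip_low = torch.as_tensor(clip_low, dtype=ratio.dtype)
    clip_high = torch.as_tensor(clip_high, dtype=ratio.dtype)
    clipped = torch.minimum(torch.maximum(ratio, clip_low), clip_high)
    return torch.min(ratio * advantages, clipped * advantages).mean()

logp_new = torch.tensor([-0.9, -1.2, -0.3])
logp_old = torch.tensor([-1.0, -1.0, -1.0])
adv = torch.tensor([1.0, -0.5, 2.0])

# Standard PPO: a fixed range [1 - eps, 1 + eps] for every sample.
eps = 0.2
loss_ppo = -clipped_surrogate(logp_new, logp_old, adv, 1 - eps, 1 + eps)

# TRGPPO-style: per-sample bounds (stand-in values; the method derives them
# from a KL trust region around the old policy).
clip_low = torch.tensor([0.70, 0.80, 0.85])
clip_high = torch.tensor([1.40, 1.20, 1.15])
loss_trg = -clipped_surrogate(logp_new, logp_old, adv, clip_low, clip_high)
```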
Efficient Parameter-free Clustering Using First Neighbor Relations
Title | Efficient Parameter-free Clustering Using First Neighbor Relations |
Authors | M. Saquib Sarfraz, Vivek Sharma, Rainer Stiefelhagen |
Abstract | We present a new clustering method in the form of a single clustering equation that is able to directly discover groupings in the data. The main proposition is that the first neighbor of each sample is all one needs to discover large chains and find the groups in the data. In contrast to most existing clustering algorithms, our method does not require any hyper-parameters, distance thresholds, or a pre-specified number of clusters. The proposed algorithm belongs to the family of hierarchical agglomerative methods. The technique has a very low computational overhead, is easily scalable, and is applicable to large practical problems. Evaluation on well-known datasets from different domains, ranging between 1077 and 8.1 million samples, shows substantial performance gains when compared to existing clustering techniques. (A sketch of the first-neighbor grouping rule follows this entry.) |
Tasks | |
Published | 2019-02-28 |
URL | http://arxiv.org/abs/1902.11266v1 |
http://arxiv.org/pdf/1902.11266v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-parameter-free-clustering-using |
Repo | https://github.com/ssarfraz/FINCH-CLustering |
Framework | none |
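The first-neighbor rule can be written in a few lines: link two points if one is the other's first nearest neighbor or they share the same first neighbor, then take connected components. The NumPy/SciPy sketch below mirrors that core equation but omits the recursive merging passes of the full FINCH algorithm.

```python
# Illustrative sketch of the first-neighbor grouping rule (single pass only).
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def first_neighbor_partition(X):
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)                                 # first neighbor of each point
    n = len(X)
    # adjacency: i~j if nn[i]==j, nn[j]==i, or nn[i]==nn[j]
    adj = (nn[:, None] == np.arange(n)[None, :])
    adj = adj | adj.T | (nn[:, None] == nn[None, :])
    n_groups, labels = connected_components(csr_matrix(adj), directed=False)
    return n_groups, labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
k, labels = first_neighbor_partition(X)
print(k)   # a small number of groups; the full FINCH recursion merges further
```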
Classification with Costly Features as a Sequential Decision-Making Problem
Title | Classification with Costly Features as a Sequential Decision-Making Problem |
Authors | Jaromír Janisch, Tomáš Pevný, Viliam Lisý |
Abstract | This work focuses on a specific classification problem, where the information about a sample is not readily available but has to be acquired at a cost, and there is a per-sample budget. Inspired by real-world use-cases, we analyze average and hard variations of a directly specified budget. We formulate the problem explicitly and then convert it into an equivalent MDP, which can be solved with deep reinforcement learning. We also evaluate a real-world-inspired setting with a sparse training dataset with missing features. The presented method performs robustly well in all settings across several distinct datasets, outperforming other prior-art algorithms. The method is flexible, as showcased by all the mentioned modifications, and can be improved with any domain-independent advancement in RL. (A toy sketch of the feature-acquisition MDP follows this entry.) |
Tasks | Classification with Costly Features, Decision Making |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02564v1 |
https://arxiv.org/pdf/1909.02564v1.pdf | |
PWC | https://paperswithcode.com/paper/classification-with-costly-features-as-a |
Repo | https://github.com/jaara/classification-with-costly-features |
Framework | pytorch |
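A toy version of the sequential view described above: each episode reveals features one at a time for a cost and terminates when the agent commits to a class. The environment below is our own illustrative construction, not the authors' released code.

```python
# Illustrative sketch of classification with costly features as an MDP.
import numpy as np

class CostlyFeaturesEnv:
    def __init__(self, x, y, cost=0.01, n_classes=2):
        self.x, self.y, self.cost = x, y, cost
        self.n_features, self.n_classes = len(x), n_classes
        self.acquired = np.zeros(self.n_features, dtype=bool)

    def step(self, action):
        """Actions 0..n_features-1 acquire a feature; the remaining actions classify."""
        if action < self.n_features:
            self.acquired[action] = True
            obs = np.where(self.acquired, self.x, 0.0)
            return (obs, self.acquired.copy()), -self.cost, False
        predicted_class = action - self.n_features
        reward = 1.0 if predicted_class == self.y else 0.0
        return None, reward, True          # terminal transition

env = CostlyFeaturesEnv(x=np.array([0.2, -1.3, 0.7]), y=1)
_, r1, done = env.step(0)                  # buy feature 0, pay its cost
_, r2, done = env.step(3 + 1)              # classify as class 1, episode ends
```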
Learnable Triangulation of Human Pose
Title | Learnable Triangulation of Human Pose |
Authors | Karim Iskakov, Egor Burkov, Victor Lempitsky, Yury Malkov |
Abstract | We present two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views. The first (baseline) solution is a basic differentiable algebraic triangulation with the addition of confidence weights estimated from the input images. The second solution is based on a novel method of volumetric aggregation from intermediate 2D backbone feature maps. The aggregated volume is then refined via 3D convolutions that produce final 3D joint heatmaps and allow modelling a human pose prior. Crucially, both approaches are end-to-end differentiable, which allows us to directly optimize the target metric. We demonstrate transferability of the solutions across datasets and considerably improve the multi-view state of the art on the Human3.6M dataset. Video demonstration, annotations and additional materials will be posted on our project page (https://saic-violet.github.io/learnable-triangulation). (A weighted-triangulation sketch follows this entry.) |
Tasks | 3D Human Pose Estimation, Pose Estimation |
Published | 2019-05-14 |
URL | https://arxiv.org/abs/1905.05754v1 |
https://arxiv.org/pdf/1905.05754v1.pdf | |
PWC | https://paperswithcode.com/paper/190505754 |
Repo | https://github.com/karfly/learnable-triangulation-pytorch |
Framework | pytorch |
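The algebraic baseline can be illustrated with confidence-weighted DLT triangulation of a single joint from several calibrated views. The NumPy sketch below is ours; in the paper this step is implemented differentiably so the confidence weights can be learned end-to-end.

```python
# Illustrative sketch: confidence-weighted linear (DLT) triangulation.
import numpy as np

def weighted_triangulate(proj_matrices, points_2d, confidences):
    """proj_matrices: (V,3,4), points_2d: (V,2), confidences: (V,) -> 3D point."""
    rows = []
    for P, (u, v), w in zip(proj_matrices, points_2d, confidences):
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                                # null-space solution
    return X[:3] / X[3]                       # homogeneous -> 3D point

# Two toy cameras looking at the point (0, 0, 5).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                   # camera at origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])   # shifted 1 m on x
point = np.array([0.0, 0.0, 5.0, 1.0])
uv1 = (P1 @ point)[:2] / (P1 @ point)[2]
uv2 = (P2 @ point)[:2] / (P2 @ point)[2]
xyz = weighted_triangulate(np.stack([P1, P2]),
                           np.stack([uv1, uv2]),
                           np.array([1.0, 0.8]))
print(xyz)   # approximately [0, 0, 5]
```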
An Efficient Sampling-based Method for Online Informative Path Planning in Unknown Environments
Title | An Efficient Sampling-based Method for Online Informative Path Planning in Unknown Environments |
Authors | Lukas Schmid, Michael Pantic, Raghav Khanna, Lionel Ott, Roland Siegwart, Juan Nieto |
Abstract | The ability to plan informative paths online is essential to robot autonomy. In particular, sampling-based approaches are often used as they are capable of using arbitrary information gain formulations. However, they are prone to local minima, resulting in sub-optimal trajectories, and sometimes do not reach global coverage. In this paper, we present a new RRT*-inspired online informative path planning algorithm. Our method continuously expands a single tree of candidate trajectories and rewires segments to maintain the tree and refine intermediate trajectories. This allows the algorithm to achieve global coverage and maximize the utility of a path in a global context, using a single objective function. We demonstrate the algorithm’s capabilities in the applications of autonomous indoor exploration as well as accurate Truncated Signed Distance Field (TSDF)-based 3D reconstruction on board a Micro Aerial Vehicle (MAV). We study the impact of commonly used information gain and cost formulations in these scenarios and propose a novel TSDF-based 3D reconstruction gain and cost-utility formulation. Detailed evaluation in realistic simulation environments shows that our approach outperforms state-of-the-art methods in these tasks. Experiments on a real MAV demonstrate the ability of our method to robustly plan in real time, exploring an indoor environment solely with on-board sensing and computation. We make our framework available for future research. |
Tasks | 3D Reconstruction |
Published | 2019-09-20 |
URL | https://arxiv.org/abs/1909.09548v2 |
https://arxiv.org/pdf/1909.09548v2.pdf | |
PWC | https://paperswithcode.com/paper/an-efficient-sampling-based-method-for-online |
Repo | https://github.com/ethz-asl/mav_active_3d_planning |
Framework | none |