January 31, 2020

Paper Group AWR 425

Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels. SumQE: a BERT-based Summary Quality Estimation Model. Onset detection: A new approach to QBH system. Learning in the Machine: To Share or Not to Share?. SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds. Data-Efficient Classification of Birdcall Through Co …

Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels


Title	Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels
Authors	Daniel Y. Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, Kayvon Fatahalian
Abstract	Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film. Unfortunately, pre-trained models to detect all the events of interest in video may not exist, and training new models from scratch can be costly and labor-intensive. In this paper, we explore the utility of specifying new events in video in a more traditional manner: by writing queries that compose outputs of existing, pre-trained models. To write these queries, we have developed Rekall, a library that exposes a data model and programming model for compositional video event specification. Rekall represents video annotations from different sources (object detectors, transcripts, etc.) as spatiotemporal labels associated with continuous volumes of spacetime in a video, and provides operators for composing labels into queries that model new video events. We demonstrate the use of Rekall in analyzing video from cable TV news broadcasts, films, static-camera vehicular video streams, and commercial autonomous vehicle logs. In these efforts, domain experts were able to quickly (in a few hours to a day) author queries that enabled the accurate detection of new events (on par with, and in some cases much more accurate than, learned approaches) and to rapidly retrieve video clips for human-in-the-loop tasks such as video content curation and training data curation. Finally, in a user study, novice users of Rekall were able to author queries to retrieve new events in video given just one hour of query development time.
Tasks
Published	2019-10-07
URL	https://arxiv.org/abs/1910.02993v1
PDF	https://arxiv.org/pdf/1910.02993v1.pdf
PWC	https://paperswithcode.com/paper/rekall-specifying-video-events-using
Repo	https://github.com/scanner-research/rekall
Framework	none
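
The idea of composing labels from existing models into a new event can be sketched in plain Python. This is a minimal illustration only and does not use Rekall's actual API: the `Label` class, the `join`, `overlaps`, and `intersect` helpers, and the example detections are all hypothetical, and only the temporal dimension is modeled.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Label:
    """Hypothetical temporal label: half-open interval [start, end) in seconds."""
    start: float
    end: float
    payload: str


def overlaps(a: Label, b: Label) -> bool:
    """True when the two intervals share some span of time."""
    return a.start < b.end and b.start < a.end


def intersect(a: Label, b: Label) -> Label:
    """Merge two overlapping labels into one covering only the shared time."""
    return Label(max(a.start, b.start), min(a.end, b.end),
                 f"{a.payload}+{b.payload}")


def join(left: List[Label], right: List[Label],
         predicate: Callable[[Label, Label], bool],
         merge: Callable[[Label, Label], Label]) -> List[Label]:
    """Pair every left label with every right label that satisfies the predicate."""
    return [merge(a, b) for a in left for b in right if predicate(a, b)]


# Assumed outputs of two pre-trained face identification models.
host_faces = [Label(0.0, 30.0, "host"), Label(50.0, 80.0, "host")]
guest_faces = [Label(20.0, 60.0, "guest")]

# Candidate "interview" segments: times when host and guest are both on screen.
interview_candidates = join(host_faces, guest_faces, overlaps, intersect)
print(interview_candidates)
```

A nested-loop join is quadratic in the number of labels; it is used here only to keep the sketch short.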

SumQE: a BERT-based Summary Quality Estimation Model


Title	SumQE: a BERT-based Summary Quality Estimation Model
Authors	Stratos Xenouleas, Prodromos Malakasiotis, Marianna Apidianaki, Ion Androutsopoulos
Abstract	We propose SumQE, a novel Quality Estimation model for summarization based on BERT. The model addresses linguistic quality aspects that are only indirectly captured by content-based approaches to summary evaluation, without involving comparison with human references. SumQE achieves very high correlations with human ratings, outperforming simpler models addressing these linguistic aspects. Predictions of the SumQE model can be used for system development, and to inform users of the quality of automatically produced summaries and other types of generated text.
Tasks
Published	2019-09-02
URL	https://arxiv.org/abs/1909.00578v1
PDF	https://arxiv.org/pdf/1909.00578v1.pdf
PWC	https://paperswithcode.com/paper/sumqe-a-bert-based-summary-quality-estimation
Repo	https://github.com/nlpaueb/SumQE
Framework	tf

Onset detection: A new approach to QBH system


Title	Onset detection: A new approach to QBH system
Authors	Ritwik Bhaduri, Soham Bonnerjee, Subhrajyoty Roy
Abstract	Query by Humming (QBH) is a system that returns the song(s) a user hums to it. Current QBH methods require the extraction of onset and pitch information in order to track similarity with various versions of different songs. However, we here focus on detecting precise onsets only and use them to build a QBH system which is better than existing methods in terms of speed and memory and empirically in terms of accuracy. We also provide a statistical analogy for onset detection functions and provide a measure of error in our algorithm.
Tasks
Published	2019-08-17
URL	https://arxiv.org/abs/1908.07409v2
PDF	https://arxiv.org/pdf/1908.07409v2.pdf
PWC	https://paperswithcode.com/paper/onset-detection-a-new-approach-to-qbh-system
Repo	https://github.com/subroy13/OnsetDetection
Framework	none

Learning in the Machine: To Share or Not to Share?


Title	Learning in the Machine: To Share or Not to Share?
Authors	Jordan Ott, Erik Linstead, Nicholas LaHaye, Pierre Baldi
Abstract	Weight-sharing is one of the pillars behind Convolutional Neural Networks and their successes. However, in physical neural systems such as the brain, weight-sharing is implausible. This discrepancy raises the fundamental question of whether weight-sharing is necessary. If so, to which degree of precision? If not, what are the alternatives? The goal of this study is to investigate these questions, primarily through simulations where the weight-sharing assumption is relaxed. Taking inspiration from neural circuitry, we explore the use of Free Convolutional Networks and neurons with variable connection patterns. Using Free Convolutional Networks, we show that while weight-sharing is a pragmatic optimization approach, it is not a necessity in computer vision applications. Furthermore, Free Convolutional Networks match the performance observed in standard architectures when trained using properly translated data (akin to video). Under the assumption of translationally augmented data, Free Convolutional Networks learn translationally invariant representations that yield an approximate form of weight sharing.
Tasks
Published	2019-09-23
URL	https://arxiv.org/abs/1909.11483v2
PDF	https://arxiv.org/pdf/1909.11483v2.pdf
PWC	https://paperswithcode.com/paper/learning-in-the-machine-to-share-or-not-to
Repo	https://github.com/jordanott/WeightSharing
Framework	none

SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds


Title	SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds
Authors	Minghui Liao, Boyu Song, Shangbang Long, Minghang He, Cong Yao, Xiang Bai
Abstract	With the development of deep neural networks, the demand for a significant amount of annotated training data has become the performance bottleneck in many fields of research and applications. Image synthesis can generate annotated images automatically and freely, which has gained increasing attention recently. In this paper, we propose to synthesize scene text images from 3D virtual worlds, where the precise descriptions of scenes, editable illumination/visibility, and realistic physics are provided. Different from the previous methods which paste the rendered text on static 2D images, our method can render the 3D virtual scene and text instances as an entirety. In this way, real-world variations, including complex perspective transformations, various illuminations, and occlusions, can be realized in our synthesized scene text images. Moreover, the same text instances with various viewpoints can be produced by randomly moving and rotating the virtual camera, which acts as human eyes. The experiments on the standard scene text detection benchmarks using the generated synthetic data demonstrate the effectiveness and superiority of the proposed method. The code and synthetic data are available at: https://github.com/MhLiao/SynthText3D
Tasks	Image Generation, Scene Text Detection
Published	2019-07-13
URL	https://arxiv.org/abs/1907.06007v2
PDF	https://arxiv.org/pdf/1907.06007v2.pdf
PWC	https://paperswithcode.com/paper/synthtext3d-synthesizing-scene-text-images
Repo	https://github.com/MhLiao/SynthText3D
Framework	none

Data-Efficient Classification of Birdcall Through Convolutional Neural Networks Transfer Learning


Title	Data-Efficient Classification of Birdcall Through Convolutional Neural Networks Transfer Learning
Authors	Dina B. Efremova, Mangalam Sankupellay, Dmitry A. Konovalov
Abstract	Deep learning Convolutional Neural Network (CNN) models are powerful classification models but require a large amount of training data. In niche domains such as bird acoustics, it is expensive and difficult to obtain a large number of training samples. One method of classifying data with a limited number of training samples is to employ transfer learning. In this research, we evaluated the effectiveness of birdcall classification using transfer learning from a larger base dataset (2814 samples in 46 classes) to a smaller target dataset (351 samples in 10 classes) using the ResNet-50 CNN. We obtained 79% average validation accuracy on the target dataset in 5-fold cross-validation. The methodology of transfer learning from an ImageNet-trained CNN to a project-specific and a much smaller set of classes and images was extended to the domain of spectrogram images, where the base dataset effectively played the role of the ImageNet.
Tasks	Transfer Learning
Published	2019-09-17
URL	https://arxiv.org/abs/1909.07526v1
PDF	https://arxiv.org/pdf/1909.07526v1.pdf
PWC	https://paperswithcode.com/paper/data-efficient-classification-of-birdcall
Repo	https://github.com/dmitryako/aus10spectrograms
Framework	none

Reasoning and Generalization in RL: A Tool Use Perspective


Title	Reasoning and Generalization in RL: A Tool Use Perspective
Authors	Sam Wenke, Dan Saunders, Mike Qiu, Jim Fleming
Abstract	Learning to use tools to solve a variety of tasks is an innate ability of humans and has been observed in animals in the wild. However, the underlying mechanisms that are required to learn to use tools are abstract and widely contested in the literature. In this paper, we study tool use in the context of reinforcement learning and propose a framework for analyzing generalization inspired by a classic study of tool-using behavior, the trap-tube task. Recently, it has become common in reinforcement learning to measure generalization performance on a single test set of environments. We instead propose transfers that produce multiple test sets that are used to measure specified types of generalization, inspired by abilities demonstrated by animal and human tool users. The source code to reproduce our experiments is publicly available at https://github.com/fomorians/gym_tool_use.
Tasks
Published	2019-07-03
URL	https://arxiv.org/abs/1907.02050v1
PDF	https://arxiv.org/pdf/1907.02050v1.pdf
PWC	https://paperswithcode.com/paper/reasoning-and-generalization-in-rl-a-tool-use
Repo	https://github.com/fomorians/gym_tool_use
Framework	none

The State of Sparsity in Deep Neural Networks


Title	The State of Sparsity in Deep Neural Networks
Authors	Trevor Gale, Erich Elsen, Sara Hooker
Abstract	We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet. Across thousands of experiments, we demonstrate that complex techniques (Molchanov et al., 2017; Louizos et al., 2017b) shown to yield high compression rates on smaller datasets perform inconsistently, and that simple magnitude pruning approaches achieve comparable or better results. Additionally, we replicate the experiments performed by (Frankle & Carbin, 2018) and (Liu et al., 2018) at scale and show that unstructured sparse architectures learned through pruning cannot be trained from scratch to the same test set performance as a model trained with joint sparsification and optimization. Together, these results highlight the need for large-scale benchmarks in the field of model compression. We open-source our code, top performing model checkpoints, and results of all hyperparameter configurations to establish rigorous baselines for future work on compression and sparsification.
Tasks	Model Compression, Sparse Learning
Published	2019-02-25
URL	http://arxiv.org/abs/1902.09574v1
PDF	http://arxiv.org/pdf/1902.09574v1.pdf
PWC	https://paperswithcode.com/paper/the-state-of-sparsity-in-deep-neural-networks
Repo	https://github.com/ars-ashuha/variational-dropout-sparsifies-dnn
Framework	tf
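
The "simple magnitude pruning" baseline mentioned in the abstract can be illustrated with a one-shot sketch. The function name `magnitude_prune` and the example weights are hypothetical, and this version prunes a flat list of weights in a single step; it does not reproduce any gradual pruning schedule or training loop from the paper.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


# Hypothetical layer weights; half of them are removed.
weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.4]
sparse = magnitude_prune(weights, 0.5)
print(sparse)
```

In practice the resulting zero pattern is usually kept as a mask so that pruned weights stay at zero during further training.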

Analytical Methods for Interpretable Ultradense Word Embeddings


Title	Analytical Methods for Interpretable Ultradense Word Embeddings
Authors	Philipp Dufter, Hinrich Schütze
Abstract	Word embeddings are useful for a wide variety of tasks, but they lack interpretability. By rotating word spaces, interpretable dimensions can be identified while preserving the information contained in the embeddings without any loss. In this work, we investigate three methods for making word spaces interpretable by rotation: Densifier (Rothe et al., 2016), linear SVMs and DensRay, a new method we propose. In contrast to Densifier, DensRay can be computed in closed form, is hyperparameter-free and thus more robust than Densifier. We evaluate the three methods on lexicon induction and set-based word analogy. In addition we provide qualitative insights as to how interpretable word spaces can be used for removing gender bias from embeddings.
Tasks	Word Embeddings
Published	2019-04-18
URL	https://arxiv.org/abs/1904.08654v2
PDF	https://arxiv.org/pdf/1904.08654v2.pdf
PWC	https://paperswithcode.com/paper/analytical-methods-for-interpretable
Repo	https://github.com/pdufter/densray
Framework	none

DirectPose: Direct End-to-End Multi-Person Pose Estimation


Title	DirectPose: Direct End-to-End Multi-Person Pose Estimation
Authors	Zhi Tian, Hao Chen, Chunhua Shen
Abstract	We propose the first direct end-to-end multi-person pose estimation framework, termed DirectPose. Inspired by recent anchor-free object detectors, which directly regress the two corners of target bounding-boxes, the proposed framework directly predicts instance-aware keypoints for all the instances from a raw input image, eliminating the need for heuristic grouping in bottom-up methods or bounding-box detection and RoI operations in top-down ones. We also propose a novel Keypoint Alignment (KPAlign) mechanism, which overcomes the main difficulty: lack of the alignment between the convolutional features and predictions in this end-to-end framework. KPAlign improves the framework’s performance by a large margin while still keeping the framework end-to-end trainable. With the only postprocessing non-maximum suppression (NMS), our proposed framework can detect multi-person keypoints with or without bounding-boxes in a single shot. Experiments demonstrate that the end-to-end paradigm can achieve competitive or better performance than previous strong baselines, in both bottom-up and top-down methods. We hope that our end-to-end approach can provide a new perspective for the human pose estimation task.
Tasks	Multi-Person Pose Estimation, Pose Estimation
Published	2019-11-18
URL	https://arxiv.org/abs/1911.07451v2
PDF	https://arxiv.org/pdf/1911.07451v2.pdf
PWC	https://paperswithcode.com/paper/directpose-direct-end-to-end-multi-person
Repo	https://github.com/aim-uofa/AdelaiDet
Framework	pytorch

Trust Region-Guided Proximal Policy Optimization


Title	Trust Region-Guided Proximal Policy Optimization
Authors	Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan
Abstract	Proximal policy optimization (PPO) is one of the most popular deep reinforcement learning (RL) methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, as a model-free RL method, the success of PPO relies heavily on the effectiveness of its exploratory policy search. In this paper, we give an in-depth analysis of the exploration behavior of PPO, and show that PPO is prone to insufficient exploration, especially under bad initialization, which may lead to the failure of training or being trapped in bad local optima. To address these issues, we propose a novel policy optimization method, named Trust Region-Guided PPO (TRGPPO), which adaptively adjusts the clipping range within the trust region. We formally show that this method not only improves the exploration ability within the trust region but enjoys a better performance bound compared to the original PPO as well. Extensive experiments verify the advantage of the proposed method.
Tasks
Published	2019-01-29
URL	https://arxiv.org/abs/1901.10314v2
PDF	https://arxiv.org/pdf/1901.10314v2.pdf
PWC	https://paperswithcode.com/paper/trust-region-guided-proximal-policy
Repo	https://github.com/wangyuhuix/TRGPPO
Framework	tf
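
To make the role of the clipping range concrete, the sketch below evaluates the standard PPO clipped surrogate for a single sample while letting the lower and upper bounds vary. It is only an illustration: the function name is hypothetical, the bounds are passed in by hand, and the procedure TRGPPO uses to derive them from the trust region is not shown.

```python
def clipped_surrogate(ratio: float, advantage: float,
                      clip_low: float, clip_high: float) -> float:
    """PPO clipped objective for one sample.

    ratio     -- new policy probability divided by old policy probability
    advantage -- estimated advantage of the sampled action
    clip_low, clip_high -- clipping range; fixed at (1 - eps, 1 + eps) in
        plain PPO, assumed here to be supplied per sample
    """
    clipped_ratio = min(max(ratio, clip_low), clip_high)
    # Taking the minimum makes the objective a pessimistic bound.
    return min(ratio * advantage, clipped_ratio * advantage)


# Fixed range: the gain from raising the action probability is capped.
print(clipped_surrogate(1.5, 2.0, 0.8, 1.2))
# Wider range for the same sample: the update is no longer capped.
print(clipped_surrogate(1.5, 2.0, 0.6, 1.6))
```

A wider range for a given sample lets the policy move further on that sample before the objective flattens, which is how an adaptive range can affect exploration.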

Efficient Parameter-free Clustering Using First Neighbor Relations


Title	Efficient Parameter-free Clustering Using First Neighbor Relations
Authors	M. Saquib Sarfraz, Vivek Sharma, Rainer Stiefelhagen
Abstract	We present a new clustering method in the form of a single clustering equation that is able to directly discover groupings in the data. The main proposition is that the first neighbor of each sample is all one needs to discover large chains and find the groups in the data. In contrast to most existing clustering algorithms, our method does not require any hyper-parameters, distance thresholds, or a specified number of clusters. The proposed algorithm belongs to the family of hierarchical agglomerative methods. The technique has a very low computational overhead, is easily scalable and applicable to large practical problems. Evaluation on well known datasets from different domains ranging between 1077 and 8.1 million samples shows substantial performance gains when compared to the existing clustering techniques.
Tasks
Published	2019-02-28
URL	http://arxiv.org/abs/1902.11266v1
PDF	http://arxiv.org/pdf/1902.11266v1.pdf
PWC	https://paperswithcode.com/paper/efficient-parameter-free-clustering-using
Repo	https://github.com/ssarfraz/FINCH-CLustering
Framework	none
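
The first-neighbor idea can be sketched as follows, under the assumption that two samples are linked when one is the other's nearest neighbor or when they share the same nearest neighbor, and that clusters are the connected components of those links. The function name, the Euclidean distance, and the toy points are assumptions for illustration; only a single partition is computed, not the full hierarchy, and the brute-force neighbor search would not scale to large datasets.

```python
import math
from typing import Dict, List, Sequence


def first_neighbor_partition(points: Sequence[Sequence[float]]) -> List[int]:
    """Cluster points by chaining each sample to its first (nearest) neighbor."""
    n = len(points)
    if n < 2:
        return [0] * n

    # Brute-force first neighbor of every sample (ties go to the lower index).
    nearest = [
        min((j for j in range(n) if j != i),
            key=lambda j, i=i: math.dist(points[i], points[j]))
        for i in range(n)
    ]

    # Union-find: linking i to its first neighbor also connects samples
    # that share a first neighbor, because both attach to the same node.
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(nearest):
        parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = first_neighbor_partition(points)
print(labels)
```

No number of clusters or distance threshold is passed in; the grouping follows from the neighbor links alone.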

Classification with Costly Features as a Sequential Decision-Making Problem


Title	Classification with Costly Features as a Sequential Decision-Making Problem
Authors	Jaromír Janisch, Tomáš Pevný, Viliam Lisý
Abstract	This work focuses on a specific classification problem, where the information about a sample is not readily available, but has to be acquired for a cost, and there is a per-sample budget. Inspired by real-world use-cases, we analyze average and hard variations of a directly specified budget. We postulate the problem in its explicit formulation and then convert it into an equivalent MDP, that can be solved with deep reinforcement learning. Also, we evaluate a real-world inspired setting with sparse training dataset with missing features. The presented method performs robustly well in all settings across several distinct datasets, outperforming other prior-art algorithms. The method is flexible, as showcased with all mentioned modifications and can be improved with any domain independent advancement in RL.
Tasks	Classification with Costly Features, Decision Making
Published	2019-09-05
URL	https://arxiv.org/abs/1909.02564v1
PDF	https://arxiv.org/pdf/1909.02564v1.pdf
PWC	https://paperswithcode.com/paper/classification-with-costly-features-as-a
Repo	https://github.com/jaara/classification-with-costly-features
Framework	pytorch
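
One way to picture the conversion to an MDP is an episode in which the agent either pays to reveal a feature or ends the episode by classifying. The sketch below is an illustration under assumed details, not the authors' implementation: the class name, the reward values (a weighted negative cost per acquired feature, 0 for a correct label, -1 for a wrong one), and the sample data are all hypothetical, and the budget variants from the abstract are not modeled.

```python
from typing import Dict, List, Set


class CostlyFeatureEpisode:
    """Hypothetical single-sample episode: acquire features, then classify."""

    def __init__(self, features: List[float], label: str,
                 costs: List[float], cost_weight: float) -> None:
        self.features = features
        self.label = label
        self.costs = costs
        self.cost_weight = cost_weight  # assumed trade-off between cost and accuracy
        self.acquired: Set[int] = set()
        self.done = False

    def observation(self) -> Dict[int, float]:
        """State seen by the agent: only the features acquired so far."""
        return {i: self.features[i] for i in sorted(self.acquired)}

    def acquire(self, index: int) -> float:
        """Reveal one feature; the reward is its negative weighted cost."""
        if self.done:
            raise RuntimeError("episode already finished")
        if index in self.acquired:
            raise ValueError("feature already acquired")
        self.acquired.add(index)
        return -self.cost_weight * self.costs[index]

    def classify(self, predicted: str) -> float:
        """Terminal action; assumed reward 0 if correct, -1 otherwise."""
        if self.done:
            raise RuntimeError("episode already finished")
        self.done = True
        return 0.0 if predicted == self.label else -1.0


episode = CostlyFeatureEpisode(features=[5.1, 3.5, 1.4], label="setosa",
                               costs=[1.0, 2.0, 4.0], cost_weight=0.1)
rewards = [episode.acquire(0), episode.acquire(2), episode.classify("setosa")]
total = sum(rewards)
print(episode.observation(), total)
```

With this reward shape, an agent that maximizes return has to weigh the cost of each extra feature against the risk of a wrong classification.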

Learnable Triangulation of Human Pose


Title	Learnable Triangulation of Human Pose
Authors	Karim Iskakov, Egor Burkov, Victor Lempitsky, Yury Malkov
Abstract	We present two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views. The first (baseline) solution is a basic differentiable algebraic triangulation with an addition of confidence weights estimated from the input images. The second solution is based on a novel method of volumetric aggregation from intermediate 2D backbone feature maps. The aggregated volume is then refined via 3D convolutions that produce final 3D joint heatmaps and allow modelling a human pose prior. Crucially, both approaches are end-to-end differentiable, which allows us to directly optimize the target metric. We demonstrate transferability of the solutions across datasets and considerably improve the multi-view state of the art on the Human3.6M dataset. Video demonstration, annotations and additional materials will be posted on our project page (https://saic-violet.github.io/learnable-triangulation).
Tasks	3D Human Pose Estimation, Pose Estimation
Published	2019-05-14
URL	https://arxiv.org/abs/1905.05754v1
PDF	https://arxiv.org/pdf/1905.05754v1.pdf
PWC	https://paperswithcode.com/paper/190505754
Repo	https://github.com/karfly/learnable-triangulation-pytorch
Framework	pytorch

An Efficient Sampling-based Method for Online Informative Path Planning in Unknown Environments


Title	An Efficient Sampling-based Method for Online Informative Path Planning in Unknown Environments
Authors	Lukas Schmid, Michael Pantic, Raghav Khanna, Lionel Ott, Roland Siegwart, Juan Nieto
Abstract	The ability to plan informative paths online is essential to robot autonomy. In particular, sampling-based approaches are often used as they are capable of using arbitrary information gain formulations. However, they are prone to local minima, resulting in sub-optimal trajectories, and sometimes do not reach global coverage. In this paper, we present a new RRT*-inspired online informative path planning algorithm. Our method continuously expands a single tree of candidate trajectories and rewires segments to maintain the tree and refine intermediate trajectories. This allows the algorithm to achieve global coverage and maximize the utility of a path in a global context, using a single objective function. We demonstrate the algorithm’s capabilities in the applications of autonomous indoor exploration as well as accurate Truncated Signed Distance Field (TSDF)-based 3D reconstruction on-board a Micro Aerial Vehicle (MAV). We study the impact of commonly used information gain and cost formulations in these scenarios and propose a novel TSDF-based 3D reconstruction gain and cost-utility formulation. Detailed evaluation in realistic simulation environments shows that our approach outperforms state-of-the-art methods in these tasks. Experiments on a real MAV demonstrate the ability of our method to robustly plan in real-time, exploring an indoor environment solely with on-board sensing and computation. We make our framework available for future research.
Tasks	3D Reconstruction
Published	2019-09-20
URL	https://arxiv.org/abs/1909.09548v2
PDF	https://arxiv.org/pdf/1909.09548v2.pdf
PWC	https://paperswithcode.com/paper/an-efficient-sampling-based-method-for-online
Repo	https://github.com/ethz-asl/mav_active_3d_planning
Framework	none