Paper Group ANR 780
Real-Time Object Detection and Localization in Compressive Sensed Video on Embedded Hardware
Title | Real-Time Object Detection and Localization in Compressive Sensed Video on Embedded Hardware |
Authors | Yeshwanth Ravi Theja Bethi, Sathyaprakash Narayanan, Venkat Rangan, Chetan Singh Thakur |
Abstract | Every day around the world, interminable terabytes of data are being captured for surveillance purposes. A typical 1-2MP CCTV camera generates around 7-12GB of data per day. Frame-by-frame processing of such an enormous amount of data requires hefty computational resources. In recent years, compressive sensing approaches have shown impressive results in signal processing by reducing the sampling bandwidth. Different sampling mechanisms were developed to incorporate compressive sensing in image and video acquisition. Pixel-wise coded exposure is among the promising sensing paradigms for capturing videos in the compressed domain, and has also been realized in an all-CMOS sensor \cite{Xiong2017}. Though cameras that perform compressive sensing save a lot of bandwidth at the time of sampling and minimize the memory required to store videos, we cannot do much in terms of processing until the videos are reconstructed to the original frames. However, the reconstruction of compressive-sensed (CS) videos still takes a lot of time and is computationally expensive. In this work, we show that object detection and localization are possible directly on the CS frames (easily up to 20x compression). To our knowledge, this is the first time that the problem of object detection and localization on CS frames has been attempted. Hence, we also created a dataset for training in the CS domain. The proposed model achieves 46.27% mAP (Mean Average Precision) with an inference time of 23ms directly on the compressed frames (approx. 20 original-domain frames), which enables real-time inference, as verified on an NVIDIA TX2 embedded board. Our framework will significantly reduce the communication bandwidth, and thus the power consumption, as video compression is done at the image-sensor processing core. |
Tasks | Compressive Sensing, Object Detection, Real-Time Object Detection, Video Compression |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/1912.08519v2 |
https://arxiv.org/pdf/1912.08519v2.pdf | |
PWC | https://paperswithcode.com/paper/real-time-object-detection-and-localization |
Repo | |
Framework | |
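The pixel-wise coded exposure sampling this detector consumes can be illustrated with a short simulation. A minimal sketch, assuming a 20-frame window, a random per-pixel exposure start, and a fixed exposure length (none of which are the paper's exact settings):

```python
import numpy as np

# Pixel-wise coded exposure: each pixel integrates light only during one
# contiguous "bump" of frames within a T-frame window, collapsing T original
# frames into a single coded frame (~T x compression at the sensor).
T, H, W = 20, 64, 64                      # assumed window length and frame size
video = np.random.rand(T, H, W)           # stand-in for T original frames

bump = 4                                  # assumed per-pixel exposure length
start = np.random.randint(0, T - bump + 1, size=(H, W))
t = np.arange(T)[:, None, None]           # (T, 1, 1) time index
mask = (t >= start) & (t < start + bump)  # (T, H, W) binary exposure code

coded_frame = (video * mask).sum(axis=0)  # single compressed measurement
print(coded_frame.shape)                  # (64, 64): one frame per 20 inputs
```

A detector trained on such coded frames sees roughly 20 frames' worth of motion in a single image, which is why 23ms of inference per coded frame amounts to real-time processing of the underlying video.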
Logical reduction of metarules
Title | Logical reduction of metarules |
Authors | Andrew Cropper, Sophie Tourret |
Abstract | Many forms of inductive logic programming (ILP) use \emph{metarules}, second-order Horn clauses, to define the structure of learnable programs and thus the hypothesis space. Deciding which metarules to use for a given learning task is a major open problem and is a trade-off between efficiency and expressivity: the hypothesis space grows given more metarules, so we wish to use fewer metarules, but if we use too few metarules then we lose expressivity. In this paper, we study whether fragments of metarules can be logically reduced to minimal finite subsets. We consider two traditional forms of logical reduction: subsumption and entailment. We also consider a new reduction technique called \emph{derivation reduction}, which is based on SLD-resolution. We compute reduced sets of metarules for fragments relevant to ILP and theoretically show whether these reduced sets are reductions for more general infinite fragments. We experimentally compare learning with reduced sets of metarules on three domains: Michalski trains, string transformations, and game rules. In general, derivation-reduced sets of metarules outperform subsumption- and entailment-reduced sets, both in terms of predictive accuracies and learning times. |
Tasks | |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.10952v1 |
https://arxiv.org/pdf/1907.10952v1.pdf | |
PWC | https://paperswithcode.com/paper/logical-reduction-of-metarules |
Repo | |
Framework | |
Forward-backward-forward methods with variance reduction for stochastic variational inequalities
Title | Forward-backward-forward methods with variance reduction for stochastic variational inequalities |
Authors | Radu Ioan Bot, Panayotis Mertikopoulos, Mathias Staudigl, Phan Tu Vuong |
Abstract | We develop a new stochastic algorithm with variance reduction for solving pseudo-monotone stochastic variational inequalities. Our method builds on Tseng’s forward-backward-forward (FBF) algorithm, which is known in the deterministic literature to be a valuable alternative to Korpelevich’s extragradient method when solving variational inequalities over a convex and closed set governed by pseudo-monotone, Lipschitz continuous operators. The main computational advantage of Tseng’s algorithm is that it relies only on a single projection step and two independent queries of a stochastic oracle. Our algorithm incorporates a variance reduction mechanism and leads to almost sure (a.s.) convergence to an optimal solution. To the best of our knowledge, this is the first stochastic look-ahead algorithm achieving this by using only a single projection at each iteration. |
Tasks | |
Published | 2019-02-09 |
URL | http://arxiv.org/abs/1902.03355v1 |
http://arxiv.org/pdf/1902.03355v1.pdf | |
PWC | https://paperswithcode.com/paper/forward-backward-forward-methods-with |
Repo | |
Framework | |
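The FBF update with a stochastic oracle fits in a few lines. A minimal sketch on a toy monotone linear operator over a box constraint, with an assumed step size and a growing mini-batch schedule standing in for the paper's variance-reduction mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
A = A - A.T + 0.1 * np.eye(d)           # monotone (hence pseudo-monotone) operator
b = rng.standard_normal(d)

def F_stoch(x, batch):
    """Mini-batch stochastic oracle for F(x) = A x + b with averaged noise."""
    noise = rng.standard_normal((batch, d)).mean(axis=0)
    return A @ x + b + noise

proj = lambda z: np.clip(z, -1.0, 1.0)  # projection onto the box C = [-1, 1]^d

x, tau = np.zeros(d), 0.05
for k in range(1, 2001):
    batch = k                            # growing batch: the variance-reduction device
    Fx = F_stoch(x, batch)               # first oracle query
    y = proj(x - tau * Fx)               # forward-backward step (the only projection)
    Fy = F_stoch(y, batch)               # second, independent oracle query
    x = y + tau * (Fx - Fy)              # forward correction, projection-free
print(x)
```

Note the single projection per iteration: the correction step `x = y + tau * (Fx - Fy)` needs no second projection, which is the computational edge over extragradient-type methods the abstract highlights.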
Multi-Span Acoustic Modelling using Raw Waveform Signals
Title | Multi-Span Acoustic Modelling using Raw Waveform Signals |
Authors | Patrick von Platen, Chao Zhang, Philip Woodland |
Abstract | Traditional automatic speech recognition (ASR) systems often use an acoustic model (AM) built on handcrafted acoustic features, such as log Mel-filter bank (FBANK) values. Recent studies found that AMs with convolutional neural networks (CNNs) can directly use the raw waveform signal as input. Given sufficient training data, these AMs can yield a competitive word error rate (WER) to those built on FBANK features. This paper proposes a novel multi-span structure for acoustic modelling based on the raw waveform with multiple streams of CNN input layers, each processing a different span of the raw waveform signal. Evaluation on both the single-channel CHiME4 and AMI data sets shows that multi-span AMs give a lower WER than FBANK AMs by an average of about 5% (relative). Analysis of the trained multi-span model reveals that the CNNs can learn filters that are rather different to the log Mel filters. Furthermore, the paper shows that a widely used single-span raw-waveform AM can be improved by using a smaller CNN kernel size and an increased stride, yielding lower WERs. |
Tasks | Acoustic Modelling, Speech Recognition |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.11047v2 |
https://arxiv.org/pdf/1906.11047v2.pdf | |
PWC | https://paperswithcode.com/paper/multi-span-acoustic-modelling-using-raw |
Repo | |
Framework | |
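The multi-span front end is easy to picture as parallel 1-D convolutions over the same waveform. A minimal PyTorch sketch with two streams; the span lengths, strides, filter counts, and pooling are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class MultiSpanFrontEnd(nn.Module):
    """Two parallel CNN streams over different spans of the raw waveform."""
    def __init__(self, n_filt=40):
        super().__init__()
        # Short span: small kernel, fine temporal resolution.
        self.short = nn.Conv1d(1, n_filt, kernel_size=16, stride=8)
        # Long span: large kernel covering a wider waveform context.
        self.long = nn.Conv1d(1, n_filt, kernel_size=400, stride=160)

    def forward(self, wav):                          # wav: (batch, 1, samples)
        s = torch.relu(self.short(wav)).mean(dim=2)  # pool each stream over time
        l = torch.relu(self.long(wav)).mean(dim=2)
        return torch.cat([s, l], dim=1)              # fused feature for the AM

frontend = MultiSpanFrontEnd()
feats = frontend(torch.randn(4, 1, 16000))           # 1 s of 16 kHz audio
print(feats.shape)                                   # torch.Size([4, 80])
```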
Human Gait Database for Normal Walk Collected by Smart Phone Accelerometer
Title | Human Gait Database for Normal Walk Collected by Smart Phone Accelerometer |
Authors | Amir Vajdi, Mohammad Reza Zaghian, Saman Farahmand, Elham Rastegar, Kian Maroofi, Shaohua Jia, Marc Pomplun, Nurit Haspel, Akram Bayat |
Abstract | The goal of this study is to introduce a comprehensive gait database of 93 human subjects who walked between two endpoints during two different sessions, with their gait data recorded using two smartphones, one attached to the right thigh and the other to the left side of the waist. This data was collected with the intention of being utilized by deep learning-based methods, which require enough time points. Metadata including the age, gender, smoking habits, daily exercise time, height, and weight of each individual is recorded. This data set is publicly available. |
Tasks | |
Published | 2019-05-04 |
URL | https://arxiv.org/abs/1905.03109v2 |
https://arxiv.org/pdf/1905.03109v2.pdf | |
PWC | https://paperswithcode.com/paper/human-gait-database-for-normal-walk-collected |
Repo | |
Framework | |
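Since the database is pitched at deep learning methods that "require enough time points", a usual first step is to cut each session's accelerometer stream into fixed-length windows. A minimal sketch with a hypothetical window length, hop, and data layout:

```python
import numpy as np

def make_windows(signal, win=256, hop=128):
    """Cut a (n_samples, 3) accelerometer stream into overlapping windows
    of shape (n_windows, win, 3) for a deep model's training examples."""
    starts = range(0, len(signal) - win + 1, hop)
    return np.stack([signal[s:s + win] for s in starts])

# Stand-in for one subject's (ax, ay, az) recording from a session.
session = np.random.randn(5000, 3)
X = make_windows(session)
print(X.shape)                          # (38, 256, 3)
```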
Deep SCNN-based Real-time Object Detection for Self-driving Vehicles Using LiDAR Temporal Data
Title | Deep SCNN-based Real-time Object Detection for Self-driving Vehicles Using LiDAR Temporal Data |
Authors | Shibo Zhou, Ying Chen, Xiaohua Li, Arindam Sanyal |
Abstract | Real-time accurate detection of three-dimensional (3D) objects is a fundamental necessity for self-driving vehicles. Most existing computer-vision approaches are based on convolutional neural networks (CNNs). Although the CNN-based approaches can achieve high detection accuracy, their high energy consumption is a severe drawback. To resolve this problem, novel energy-efficient approaches should be explored. The spiking neural network (SNN) is a promising direction because it has orders-of-magnitude lower energy consumption than a CNN. Unfortunately, SNNs have been studied mostly in small networks only. The application of SNNs to large 3D object detection networks has not been explored. In this paper, we integrate a spiking convolutional neural network (SCNN) with temporal coding into the YOLOv2 architecture for real-time object detection. To take advantage of spiking signals, we develop a novel data preprocessing layer that translates 3D point-cloud data into spike time data. We also propose an analog circuit to implement the non-leaky integrate-and-fire neuron used in our SCNN, from which we obtain the energy consumption of each spike. Moreover, we present a method to estimate the network sparsity and the energy consumption of the overall object detection network. Extensive experiments have been conducted based on the KITTI dataset, which show that the proposed network can reach detection accuracy competitive with existing approaches, yet with much lower average energy consumption. When implemented in dedicated hardware, our network could have a mean sparsity of 56.24% and an extremely low total energy consumption of only 0.247mJ. Implemented on an NVIDIA GTX 1080i GPU, we achieve a 35.7 fps frame rate, high enough for real-time object detection. |
Tasks | 3D Object Detection, Object Detection, Real-Time Object Detection |
Published | 2019-12-17 |
URL | https://arxiv.org/abs/1912.07906v3 |
https://arxiv.org/pdf/1912.07906v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-scnn-based-real-time-object-detection |
Repo | |
Framework | |
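The preprocessing idea, translating continuous point-cloud values into spike times, is typically a time-to-first-spike code. A minimal sketch of one such scheme; the exact mapping below is an assumption standing in for the paper's layer:

```python
import numpy as np

def to_spike_times(values, t_max=1.0):
    """Time-to-first-spike coding: map normalized inputs in [0, 1] to spike
    times in [0, t_max], with stronger inputs firing earlier."""
    v = np.clip(values, 0.0, 1.0)
    return t_max * (1.0 - v)       # value 1 -> spike at t=0, value 0 -> t=t_max

# e.g. a voxelized occupancy/intensity grid derived from a LiDAR point cloud
grid = np.random.rand(4, 4)
print(to_spike_times(grid))
```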
GIBBONFINDR: An R package for the detection and classification of acoustic signals
Title | GIBBONFINDR: An R package for the detection and classification of acoustic signals |
Authors | Dena J. Clink, Holger Klinck |
Abstract | The recent improvements in recording technology, data storage and battery life have led to an increased interest in the use of passive acoustic monitoring for a variety of research questions. One of the main obstacles to implementing wide-scale acoustic monitoring programs in terrestrial environments is the lack of user-friendly, open-source programs for processing large sound archives. Here we describe the new, open-source R package GIBBONFINDR, which has functions for detection, classification and visualization of acoustic signals using a variety of readily available machine learning algorithms in the R programming environment. We provide a case study showing how GIBBONFINDR functions can be used in a workflow to detect and classify Bornean gibbon (Hylobates muelleri) calls in long-term acoustic data sets recorded in Danum Valley Conservation Area, Sabah, Malaysia. Machine learning is currently one of the most rapidly growing fields, with applications across many disciplines, and our goal is to make commonly used signal processing techniques and machine learning algorithms readily available for ecologists who are interested in incorporating bioacoustics techniques into their research. |
Tasks | Acoustic Modelling |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02572v2 |
https://arxiv.org/pdf/1906.02572v2.pdf | |
PWC | https://paperswithcode.com/paper/gibbonr-an-r-package-for-the-detection-and |
Repo | |
Framework | |
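GIBBONFINDR itself is an R package, so the snippet below is emphatically not its API; it is only a Python sketch, on stand-in data, of the generic detect-then-classify workflow the abstract describes: slide a window over a long recording, extract spectral features, and score them with an off-the-shelf classifier.

```python
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

SR = 16000

def mfcc_features(clip):
    """Summarize a clip as mean MFCCs, a standard bioacoustics feature."""
    return librosa.feature.mfcc(y=clip, sr=SR, n_mfcc=12).mean(axis=1)

# Stand-in training data: random clips labelled call (1) / background (0).
rng = np.random.default_rng(0)
clips = [rng.standard_normal(SR) for _ in range(20)]
labels = [i % 2 for i in range(20)]
clf = RandomForestClassifier(random_state=0).fit(
    np.stack([mfcc_features(c) for c in clips]), labels)

# Detection pass: slide a 2 s window over a long recording with 50% overlap.
recording = rng.standard_normal(60 * SR)          # one minute of audio
win, hop = 2 * SR, SR
for start in range(0, len(recording) - win + 1, hop):
    if clf.predict([mfcc_features(recording[start:start + win])])[0] == 1:
        print(f"candidate call at {start / SR:.1f} s")
```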
Heavy Hitters and Bernoulli Convolutions
Title | Heavy Hitters and Bernoulli Convolutions |
Authors | Alexander Kushkuley |
Abstract | A very simple event frequency approximation algorithm that is sensitive to event timeliness is suggested. The algorithm iteratively updates a categorical click-distribution, producing (the path of) a random walk on a standard $n$-dimensional simplex. Under certain conditions, this random walk is self-similar and corresponds to a biased Bernoulli convolution. Algorithm evaluation naturally leads to estimation of moments of biased (finite and infinite) Bernoulli convolutions. |
Tasks | |
Published | 2019-05-22 |
URL | https://arxiv.org/abs/1905.08930v2 |
https://arxiv.org/pdf/1905.08930v2.pdf | |
PWC | https://paperswithcode.com/paper/heavy-hitters-and-bernoulli-convolutions |
Repo | |
Framework | |
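One concrete reading of "iteratively updates a categorical click-distribution" is an exponentially decaying frequency estimate, whose update is a step toward a simplex vertex. A minimal sketch under that assumption (the decay rate and category count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 4, 0.1                       # n event categories, assumed decay rate
p = np.full(n, 1.0 / n)                 # point on the standard simplex

true_freq = np.array([0.5, 0.3, 0.15, 0.05])
for _ in range(5000):
    i = rng.choice(n, p=true_freq)      # observe one event ("click")
    p *= (1.0 - alpha)                  # geometric decay favours recent events
    p[i] += alpha                       # step toward the i-th vertex: one move
                                        # of the self-similar walk on the simplex
print(p.round(3))                       # timeliness-weighted frequency estimate
```

Each coordinate is then a geometric series over past events, $p_j = \alpha \sum_{k \ge 0} (1-\alpha)^k \mathbf{1}\{e_{t-k} = j\}$, which appears to be where the (biased) Bernoulli convolutions of the abstract enter.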
Deriving star cluster parameters with convolutional neural networks. II. Extinction and cluster/background classification
Title | Deriving star cluster parameters with convolutional neural networks. II. Extinction and cluster/background classification |
Authors | J. Bialopetravičius, D. Narbutis, V. Vansevičius |
Abstract | Context. Convolutional neural networks (CNNs) have been established as the go-to method for fast object detection and classification on natural images. This opens the door for astrophysical parameter inference on the exponentially increasing amount of sky survey data. Until now, star cluster analysis was based on integral or resolved stellar photometry, which limits the amount of information that can be extracted from individual pixels of cluster images. Aims. We aim to create a CNN capable of inferring star cluster evolutionary, structural, and environmental parameters from multi-band images, as well as to demonstrate its capabilities in discriminating genuine clusters from galactic stellar backgrounds. Methods. A CNN based on the deep residual network (ResNet) architecture was created and trained to infer cluster ages, masses, sizes, and extinctions, with respect to the degeneracies between them. Mock clusters placed on M83 Hubble Space Telescope (HST) images utilizing three photometric passbands (F336W, F438W, and F814W) were used. The CNN is also capable of predicting the likelihood of a cluster’s presence in an image, as well as quantifying its visibility (signal-to-noise). Results. The CNN was tested on mock images of artificial clusters and has demonstrated reliable inference results for clusters of ages $\lesssim$100 Myr, extinctions $A_V$ between 0 and 3 mag, masses between $3\times10^3$ and $3\times10^5$ ${\rm M_\odot}$, and sizes between 0.04 and 0.4 arcsec at the distance of the M83 galaxy. Real M83 galaxy cluster parameter inference tests were performed with objects taken from previous studies and have demonstrated consistent results. |
Tasks | Object Detection, Real-Time Object Detection |
Published | 2019-11-22 |
URL | https://arxiv.org/abs/1911.10059v1 |
https://arxiv.org/pdf/1911.10059v1.pdf | |
PWC | https://paperswithcode.com/paper/deriving-star-cluster-parameters-with-1 |
Repo | |
Framework | |
An error bound for Lasso and Group Lasso in high dimensions
Title | An error bound for Lasso and Group Lasso in high dimensions |
Authors | Antoine Dedieu |
Abstract | We leverage recent advances in high-dimensional statistics to derive new L2 estimation upper bounds for Lasso and Group Lasso in high dimensions. For Lasso, our bounds scale as $(k^*/n) \log(p/k^*)$, where $n\times p$ is the size of the design matrix and $k^*$ the dimension of the ground truth $\boldsymbol{\beta}^*$, and match the optimal minimax rate. For Group Lasso, our bounds scale as $(s^*/n) \log\left( G / s^* \right) + m^* / n$, where $G$ is the total number of groups and $m^*$ the number of coefficients in the $s^*$ groups which contain $\boldsymbol{\beta}^*$, and improve over existing results. We additionally show that when the signal is strongly group-sparse, Group Lasso is superior to Lasso. |
Tasks | |
Published | 2019-12-21 |
URL | https://arxiv.org/abs/1912.11398v2 |
https://arxiv.org/pdf/1912.11398v2.pdf | |
PWC | https://paperswithcode.com/paper/an-error-bound-for-lasso-and-group-lasso-in |
Repo | |
Framework | |
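For reference, the two rates from the abstract written as display equations (constants omitted; $\hat{\boldsymbol{\beta}}$ denotes the respective estimator):

```latex
\|\hat{\boldsymbol{\beta}}_{\mathrm{Lasso}} - \boldsymbol{\beta}^*\|_2^2
  \;\lesssim\; \frac{k^*}{n}\,\log\!\left(\frac{p}{k^*}\right),
\qquad
\|\hat{\boldsymbol{\beta}}_{\mathrm{Group}} - \boldsymbol{\beta}^*\|_2^2
  \;\lesssim\; \frac{s^*}{n}\,\log\!\left(\frac{G}{s^*}\right) + \frac{m^*}{n}.
```

Roughly, when the signal is strongly group-sparse (its $k^* \approx m^*$ nonzero coefficients concentrated in few groups $s^*$), the group penalty pays a $\log(G/s^*)$ factor per group instead of a $\log(p/k^*)$ factor per coefficient, which is the sense in which the abstract's final claim holds.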
xYOLO: A Model For Real-Time Object Detection In Humanoid Soccer On Low-End Hardware
Title | xYOLO: A Model For Real-Time Object Detection In Humanoid Soccer On Low-End Hardware |
Authors | Daniel Barry, Munir Shah, Merel Keijsers, Humayun Khan, Banon Hopman |
Abstract | With the emergence of onboard vision processing for areas such as the internet of things (IoT), edge computing and autonomous robots, there is increasing demand for computationally efficient convolutional neural network (CNN) models to perform real-time object detection on resource-constrained hardware devices. Tiny-YOLO is generally considered one of the faster object detectors for low-end devices and is the basis for our work. Our experiments on this network have shown that Tiny-YOLO can achieve 0.14 frames per second (FPS) on the Raspberry Pi 3 B, which is too slow for soccer-playing autonomous humanoid robots detecting goal and ball objects. In this paper we propose an adaptation of the YOLO CNN model named xYOLO, which can achieve object detection at a speed of 9.66 FPS on the Raspberry Pi 3 B. This is achieved by trading away an acceptable amount of accuracy, making the network approximately 70 times faster than Tiny-YOLO. Greater inference speed-ups were also achieved on a desktop CPU and GPU. Additionally we contribute an annotated Darknet dataset for goal and ball detection. |
Tasks | Object Detection, Real-Time Object Detection |
Published | 2019-10-08 |
URL | https://arxiv.org/abs/1910.03159v1 |
https://arxiv.org/pdf/1910.03159v1.pdf | |
PWC | https://paperswithcode.com/paper/xyolo-a-model-for-real-time-object-detection |
Repo | |
Framework | |
Face Parsing with RoI Tanh-Warping
Title | Face Parsing with RoI Tanh-Warping |
Authors | Jinpeng Lin, Hao Yang, Dong Chen, Ming Zeng, Fang Wen, Lu Yuan |
Abstract | Face parsing computes pixel-wise label maps for different semantic components (e.g., hair, mouth, eyes) from face images. The existing face parsing literature has demonstrated significant advantages from focusing on individual regions of interest (RoIs) for faces and facial components. However, the traditional crop-and-resize focusing mechanism ignores all contextual area outside the RoIs, and thus is not suitable when the component area is unpredictable, e.g. hair. Inspired by the physiological vision system of humans, we propose a novel RoI Tanh-warping operator that combines the central vision and the peripheral vision together. It addresses the dilemma between a limited-sized RoI for focusing and an unpredictable area of surrounding context for peripheral information. To this end, we propose a novel hybrid convolutional neural network for face parsing. It uses a hierarchical local-based method for inner facial components and global methods for outer facial components. The whole framework is simple and principled, and can be trained end-to-end. To facilitate future research on face parsing, we also manually relabel the training data of the HELEN dataset and will make it public. Experiments on both HELEN and LFW-PL benchmarks demonstrate that our method surpasses state-of-the-art methods. |
Tasks | |
Published | 2019-06-04 |
URL | https://arxiv.org/abs/1906.01342v1 |
https://arxiv.org/pdf/1906.01342v1.pdf | |
PWC | https://paperswithcode.com/paper/face-parsing-with-roi-tanh-warping-1 |
Repo | |
Framework | |
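The warping idea can be prototyped per axis: pass image coordinates through a tanh so the RoI centre keeps full resolution (central vision) while the periphery is compressed but never discarded (peripheral vision). A minimal separable sketch; the paper's operator warps the full RoI rectangle, and the gain `k` here is an assumed illustrative constant:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def tanh_axis_coords(size, center, k):
    """Source coordinates for one axis: the output spans (-1, 1) in tanh
    space, so pixels near `center` are magnified and the rest compressed."""
    u = np.linspace(-0.999, 0.999, size)   # avoid arctanh(+/-1)
    return center + np.arctanh(u) / k      # inverse of t = tanh(k * (s - c))

def roi_tanh_warp(img, center, k=0.02):
    ys = tanh_axis_coords(img.shape[0], center[0], k)
    xs = tanh_axis_coords(img.shape[1], center[1], k)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return map_coordinates(img, [yy, xx], order=1, mode="nearest")

face = np.random.rand(256, 256)                  # stand-in image
warped = roi_tanh_warp(face, center=(128, 128))  # RoI centred on the face
print(warped.shape)                              # (256, 256): context retained
```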
Learning More with Less: GAN-based Medical Image Augmentation
Title | Learning More with Less: GAN-based Medical Image Augmentation |
Authors | Changhee Han, Kohei Murao, Shin’ichi Satoh, Hideki Nakayama |
Abstract | Convolutional Neural Network (CNN)-based accurate prediction typically requires large-scale annotated training data. In Medical Imaging, however, both obtaining medical data and annotating them by expert physicians are challenging; to overcome this lack of data, Data Augmentation (DA) using Generative Adversarial Networks (GANs) is essential, since they can synthesize additional annotated training data to handle small and fragmented medical images from various scanners; those generated images, realistic but completely novel, can further fill the real image distribution uncovered by the original dataset. As a tutorial, this paper introduces GAN-based Medical Image Augmentation, along with tricks to boost classification/object detection/segmentation performance using them, based on our experience and related work. Moreover, we show our first GAN-based DA work using automatic bounding box annotation, for robust CNN-based brain metastases detection on 256 x 256 MR images; GAN-based DA can boost sensitivity in diagnosis by 10% with a clinically acceptable number of additional False Positives, even with highly rough and inconsistent bounding boxes. |
Tasks | Data Augmentation, Image Augmentation, Object Detection |
Published | 2019-03-29 |
URL | https://arxiv.org/abs/1904.00838v3 |
https://arxiv.org/pdf/1904.00838v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-more-with-less-gan-based-medical |
Repo | |
Framework | |
Mind2Mind : transfer learning for GANs
Title | Mind2Mind : transfer learning for GANs |
Authors | Yaël Frégier, Jean-Baptiste Gouray |
Abstract | We propose an approach for transfer learning with GAN architectures. In general, transfer learning enables deep networks for classification tasks to be trained with limited computing and data resources. However, a similar approach is missing in the specific context of generative tasks. This is partly because the extremal layers of the two networks of a GAN, which should be learned during transfer, are on two opposite sides. This requires back-propagating information through both networks, which is computationally expensive. We develop a method to directly train these extremal layers against each other, bypassing all the intermediate layers. We also prove rigorously, for Wasserstein GANs, a theorem ensuring the convergence of the learning of the transferred GAN. Finally, we compare our method to state-of-the-art methods and show that our method converges much faster and requires less data. |
Tasks | Transfer Learning |
Published | 2019-06-27 |
URL | https://arxiv.org/abs/1906.11613v1 |
https://arxiv.org/pdf/1906.11613v1.pdf | |
PWC | https://paperswithcode.com/paper/mind2mind-transfer-learning-for-gans |
Repo | |
Framework | |
Intra-Camera Supervised Person Re-Identification: A New Benchmark
Title | Intra-Camera Supervised Person Re-Identification: A New Benchmark |
Authors | Xiangping Zhu, Xiatian Zhu, Minxian Li, Vittorio Murino, Shaogang Gong |
Abstract | Existing person re-identification (re-id) methods rely mostly on a large set of inter-camera identity-labelled training data, requiring a tedious data collection and annotation process that leads to poor scalability in practical re-id applications. To overcome this fundamental limitation, we consider person re-identification without inter-camera identity association but only with identity labels independently annotated within each individual camera-view. This eliminates the most time-consuming and tedious inter-camera identity labelling process, significantly reducing the amount of human effort required during annotation. It hence gives rise to a more scalable and more feasible learning scenario, which we call Intra-Camera Supervised (ICS) person re-id. Under this ICS setting with weaker label supervision, we formulate a Multi-Task Multi-Label (MTML) deep learning method. Given no inter-camera association, MTML is specially designed to self-discover the inter-camera identity correspondence. This is achieved by inter-camera multi-label learning under a joint multi-task inference framework. In addition, MTML can also efficiently learn discriminative re-id feature representations by fully using the available identity labels within each camera-view. Extensive experiments demonstrate the performance superiority of our MTML model over state-of-the-art alternative methods on three large-scale person re-id datasets in the proposed intra-camera supervised learning setting. |
Tasks | Multi-Label Learning, Person Re-Identification |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.10344v1 |
https://arxiv.org/pdf/1908.10344v1.pdf | |
PWC | https://paperswithcode.com/paper/intra-camera-supervised-person-re |
Repo | |
Framework | |
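The multi-task half of this setup, one identity classifier per camera on a shared backbone so each camera's independent labels are fully used, can be sketched compactly. A minimal PyTorch sketch with assumed feature and image sizes; the inter-camera multi-label association that completes MTML is not shown:

```python
import torch
import torch.nn as nn

class PerCameraHeads(nn.Module):
    """Shared backbone with one identity classifier per camera, so each head
    trains only on that camera's independently annotated identity labels."""
    def __init__(self, ids_per_camera, feat_dim=256):
        super().__init__()
        # Toy backbone standing in for a deep re-id feature extractor.
        self.backbone = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 64 * 32, feat_dim), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, n) for n in ids_per_camera])

    def forward(self, images, camera_id):
        feats = self.backbone(images)
        return self.heads[camera_id](feats)  # route batch to its camera's task

model = PerCameraHeads(ids_per_camera=[100, 80, 120])
logits = model(torch.randn(8, 3, 64, 32), camera_id=0)
print(logits.shape)                          # torch.Size([8, 100])
```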