Paper Group ANR 359
Towards Precise End-to-end Weakly Supervised Object Detection Network
Title | Towards Precise End-to-end Weakly Supervised Object Detection Network |
Authors | Ke Yang, Dongsheng Li, Yong Dou |
Abstract | It is challenging for a weakly supervised object detection network to precisely predict the positions of objects, since no instance-level category annotations are available. Most existing methods tend to solve this problem with a two-phase learning procedure, i.e., a multiple instance learning detector followed by a fully supervised learning detector with bounding-box regression. Based on our observation, this procedure may lead to local minima for some object categories. In this paper, we propose to jointly train the two phases in an end-to-end manner to tackle this problem. Specifically, we design a single network with both multiple instance learning and bounding-box regression branches that share the same backbone. Meanwhile, a guided attention module using classification loss is added to the backbone to effectively extract the implicit location information in the features. Experimental results on public datasets show that our method achieves state-of-the-art performance. |
Tasks | Multiple Instance Learning, Object Detection, Weakly Supervised Object Detection |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12148v1 |
https://arxiv.org/pdf/1911.12148v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-precise-end-to-end-weakly-supervised-1 |
Repo | |
Framework | |
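The joint training idea in the abstract above — a multiple-instance-learning branch and a bounding-box-regression branch sharing one backbone, optimized in a single objective — can be illustrated with a toy sketch. This is an assumption-laden simplification, not the paper's actual network: the sum-pooled MIL aggregation, the smooth-L1 form, the pseudo-box mining, and the weight `lam` are all illustrative stand-ins.

```python
import math

def mil_image_loss(instance_scores, image_label):
    """MIL classification loss for one class: aggregate per-proposal scores
    into an image-level score, then binary cross-entropy against the
    image-level label (the only supervision available in the weak setting)."""
    image_score = max(min(sum(instance_scores), 1.0 - 1e-7), 1e-7)  # sum-pooled, clipped
    return -(image_label * math.log(image_score)
             + (1 - image_label) * math.log(1 - image_score))

def smooth_l1(pred_box, target_box):
    """Smooth-L1 regression loss summed over the 4 box coordinates."""
    total = 0.0
    for p, t in zip(pred_box, target_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def joint_loss(instance_scores, image_label, box_pred, pseudo_box, lam=1.0):
    """End-to-end objective: MIL branch + regression branch trained jointly,
    rather than in two phases. The regression target is a pseudo ground-truth
    box (e.g., mined from the MIL branch's highest-scoring proposal)."""
    return (mil_image_loss(instance_scores, image_label)
            + lam * smooth_l1(box_pred, pseudo_box))
```

Because both terms backpropagate into the same backbone, the regression signal can correct localization errors that a frozen two-phase MIL detector would lock in.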
Event-based attention and tracking on neuromorphic hardware
Title | Event-based attention and tracking on neuromorphic hardware |
Authors | Alpha Renner, Matthew Evanusa, Yulia Sandamirskaya |
Abstract | We present a fully event-driven vision and processing system for selective attention and tracking, realized on the neuromorphic processor Loihi interfaced to an event-based Dynamic Vision Sensor (DAVIS). The attention mechanism is realized as a recurrent spiking neural network that implements the attractor dynamics of dynamic neural fields. We demonstrate the capability of the system to create sustained activation that supports object tracking when distractors are present or when the object slows down or stops, reducing the number of generated events. |
Tasks | Object Tracking |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04060v1 |
https://arxiv.org/pdf/1907.04060v1.pdf | |
PWC | https://paperswithcode.com/paper/event-based-attention-and-tracking-on |
Repo | |
Framework | |
A Dual-Staged Context Aggregation Method Towards Efficient End-To-End Speech Enhancement
Title | A Dual-Staged Context Aggregation Method Towards Efficient End-To-End Speech Enhancement |
Authors | Kai Zhen, Mi Suk Lee, Minje Kim |
Abstract | In speech enhancement, an end-to-end deep neural network converts a noisy speech signal to clean speech directly in the time domain, without time-frequency transformation or mask estimation. However, aggregating contextual information from a high-resolution time-domain signal at an affordable model complexity remains challenging. In this paper, we propose a densely connected convolutional and recurrent network (DCCRN), a hybrid architecture, to enable dual-staged temporal context aggregation. With its dense connectivity and cross-component identical shortcut, DCCRN consistently outperforms competing convolutional baselines, with an average STOI improvement of 0.23 and PESQ improvement of 1.38 across three SNR levels. The proposed method is computationally efficient, with only 1.38 million parameters. Its generalization to unseen noise types is still decent considering its low complexity, although it is relatively weak compared to Wave-U-Net, which has 7.25 times more parameters. |
Tasks | Speech Enhancement |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06468v4 |
https://arxiv.org/pdf/1908.06468v4.pdf | |
PWC | https://paperswithcode.com/paper/efficient-context-aggregation-for-end-to-end |
Repo | |
Framework | |
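The dense connectivity that the DCCRN abstract credits for its efficiency is a general pattern: every layer receives the concatenation of the input and all preceding layer outputs, so early temporal context stays directly accessible to later stages. A minimal sketch of just that wiring pattern, not of DCCRN itself (the actual model uses convolutional and recurrent components):

```python
def dense_forward(x, layers):
    """Dense connectivity: each layer sees the channel-wise concatenation
    of the block input and every earlier layer's output, so no feature
    has to survive a long chain of transformations to be reused."""
    features = [x]  # list of feature lists, one per layer (plus the input)
    for layer in layers:
        concatenated = [v for f in features for v in f]  # channel-wise concat
        features.append(layer(concatenated))
    return features[-1]

# Illustrative "layers": tiny functions standing in for conv/recurrent blocks.
toy_layers = [
    lambda feats: [sum(feats)],  # first layer sees only the input
    lambda feats: [max(feats)],  # second layer sees input + layer-1 output
]
```

Here `dense_forward([1.0, 2.0], toy_layers)` runs the second toy layer over `[1.0, 2.0, 3.0]`, demonstrating that the raw input is still visible at depth 2.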
Self-adaptive Potential-based Stopping Criteria for Particle Swarm Optimization
Title | Self-adaptive Potential-based Stopping Criteria for Particle Swarm Optimization |
Authors | Bernd Bassimir, Manuel Schmitt, Rolf Wanka |
Abstract | We study the variant of Particle Swarm Optimization (PSO) that applies random velocities in a dimension, instead of the regular velocity update equations, as soon as the so-called potential of the swarm falls below a certain user-defined bound in this dimension. In this case, the swarm performs a forced move. In this paper, we are interested in how, by counting the forced moves, the swarm can decide for itself to stop its movement because it is improbable that it will find better solution candidates than it has already found. We formally prove that when the swarm is close to a (local) optimum, it behaves like a blindly searching cloud and the frequency of forced moves exceeds a certain value that is independent of the objective function. Based on this observation, we define stopping criteria and evaluate them experimentally, showing that good solution candidates can be found much faster than with other criteria. |
Tasks | |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1906.08867v1 |
https://arxiv.org/pdf/1906.08867v1.pdf | |
PWC | https://paperswithcode.com/paper/self-adaptive-potential-based-stopping |
Repo | |
Framework | |
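The stopping idea above — count forced moves and stop once their observed frequency exceeds an objective-function-independent threshold — can be sketched as a small sliding-window monitor. The window size and threshold below are illustrative assumptions, not values from the paper:

```python
from collections import deque

def make_forced_move_stopper(window=100, freq_threshold=0.8):
    """Stopping-criterion sketch: over a sliding window of recent iterations,
    track the fraction in which the swarm performed a forced move (a random
    velocity applied because the per-dimension potential fell below its
    bound). A persistently high fraction indicates the swarm is hovering
    near a (local) optimum like a blindly searching cloud, so further
    search is unlikely to improve the incumbent solution."""
    history = deque(maxlen=window)

    def observe(forced_move_happened):
        history.append(1 if forced_move_happened else 0)
        if len(history) < window:
            return False  # not enough evidence yet
        return sum(history) / window >= freq_threshold

    return observe
```

A PSO loop would call `observe(...)` once per iteration and terminate when it returns `True`; because only the forced-move indicator is inspected, the criterion never evaluates the objective function itself.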
Automatically Evaluating Balance: A Machine Learning Approach
Title | Automatically Evaluating Balance: A Machine Learning Approach |
Authors | Tian Bao, Brooke N. Klatt, Susan L. Whitney, Kathleen H. Sienko, Jenna Wiens |
Abstract | Compared to in-clinic balance training, in-home training is not as effective. This is, in part, due to the lack of feedback from physical therapists (PTs). Here, we analyze the feasibility of using trunk sway data and machine learning (ML) techniques to automatically evaluate balance, providing accurate assessments outside of the clinic. We recruited sixteen participants to perform standing balance exercises. For each exercise, we recorded trunk sway data and had a PT rate balance performance on a scale of 1 to 5. The rating scale was adapted from the Functional Independence Measure. From the trunk sway data, we extracted a 61-dimensional feature vector representing performance of each exercise. Given these labeled data, we trained a multi-class support vector machine (SVM) to map trunk sway features to PT ratings. Evaluated in a leave-one-participant-out scheme, the model achieved a classification accuracy of 82%. Compared to participant self-assessment ratings, the SVM outputs were significantly closer to PT ratings. The results of this pilot study suggest that in the absence of PTs, ML techniques can provide accurate assessments during standing balance exercises. Such automated assessments could reduce PT consultation time and increase user compliance outside of the clinic. |
Tasks | |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.05657v1 |
https://arxiv.org/pdf/1906.05657v1.pdf | |
PWC | https://paperswithcode.com/paper/automatically-evaluating-balance-a-machine |
Repo | |
Framework | |
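The evaluation protocol in the balance-assessment abstract — train on all but one participant's exercises, test on the held-out participant, and repeat — is worth making concrete, since ordinary random splits would leak a participant's data into both sets. The sketch below uses a nearest-centroid classifier as a simple stand-in for the paper's multi-class SVM, and the one-dimensional features are toy data, not the 61-dimensional trunk-sway features:

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Tiny stand-in classifier: assign x to the class whose mean feature
    vector (centroid) is closest in squared Euclidean distance."""
    by_class = {}
    for xi, yi in zip(train_X, train_y):
        by_class.setdefault(yi, []).append(xi)
    best_label, best_dist = None, float("inf")
    for label, rows in by_class.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        d = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def leave_one_participant_out(X, y, groups):
    """Leave-one-participant-out evaluation: hold out every sample from one
    participant, train on the rest, and pool accuracy over all folds."""
    correct = total = 0
    for held_out in set(groups):
        train = [(xi, yi) for xi, yi, g in zip(X, y, groups) if g != held_out]
        test = [(xi, yi) for xi, yi, g in zip(X, y, groups) if g == held_out]
        train_X = [t[0] for t in train]
        train_y = [t[1] for t in train]
        for xi, yi in test:
            correct += int(nearest_centroid_predict(train_X, train_y, xi) == yi)
            total += 1
    return correct / total
```

The key detail is that the split key is the participant, so the reported accuracy estimates performance on people the model has never seen.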
MLPerf Training Benchmark
Title | MLPerf Training Benchmark |
Authors | Peter Mattson, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, David Brooks, Dehao Chen, Debojyoti Dutta, Udit Gupta, Kim Hazelwood, Andrew Hock, Xinyuan Huang, Atsushi Ike, Bill Jia, Daniel Kang, David Kanter, Naveen Kumar, Jeffery Liao, Guokai Ma, Deepak Narayanan, Tayo Oguntebi, Gennady Pekhimenko, Lillian Pentecost, Vijay Janapa Reddi, Taylor Robie, Tom St. John, Tsuguchika Tabaru, Carole-Jean Wu, Lingjie Xu, Masafumi Yamazaki, Cliff Young, Matei Zaharia |
Abstract | Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges absent from other domains: optimizations that improve training throughput can increase the time to solution, training is stochastic and time to solution exhibits high variance, and software and hardware systems are so diverse that fair benchmarking with the same binary, code, and even hyperparameters is difficult. We therefore present MLPerf, an ML benchmark that overcomes these challenges. Our analysis quantitatively evaluates MLPerf’s efficacy at driving performance and scalability improvements across two rounds of results from multiple vendors. |
Tasks | |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.01500v3 |
https://arxiv.org/pdf/1910.01500v3.pdf | |
PWC | https://paperswithcode.com/paper/mlperf-training-benchmark |
Repo | |
Framework | |
Differentiable Scene Graphs
Title | Differentiable Scene Graphs |
Authors | Moshiko Raboh, Roei Herzig, Gal Chechik, Jonathan Berant, Amir Globerson |
Abstract | Reasoning about complex visual scenes involves perception of entities and their relations. Scene graphs (SGs) provide a natural representation for reasoning tasks, by assigning labels to both entities (nodes) and relations (edges). Unfortunately, reasoning systems based on SGs are typically trained in a two-step procedure: first, a model is trained to predict SGs from images; then, a separate model is trained to reason over the predicted SGs. In many domains it is preferable to train systems jointly in an end-to-end manner, but SGs are not commonly used as intermediate components in visual reasoning systems because, being discrete and sparse, scene-graph representations are non-differentiable and difficult to optimize. Here we propose Differentiable Scene Graphs (DSGs), an image representation that is amenable to differentiable end-to-end optimization and requires supervision only from the downstream tasks. DSGs provide a dense representation for all regions and pairs of regions, and do not spend modeling capacity on areas of the image that contain no objects or relations of interest. We evaluate our model on the challenging task of identifying referring relationships (RR) in three benchmark datasets: Visual Genome, VRD, and CLEVR. We describe a multi-task objective and train end-to-end, supervised by the downstream RR task. Using DSGs as an intermediate representation leads to new state-of-the-art performance. |
Tasks | Visual Reasoning |
Published | 2019-02-26 |
URL | https://arxiv.org/abs/1902.10200v5 |
https://arxiv.org/pdf/1902.10200v5.pdf | |
PWC | https://paperswithcode.com/paper/learning-latent-scene-graph-representations |
Repo | |
Framework | |
An Extensive Review of Computational Dance Automation Techniques and Applications
Title | An Extensive Review of Computational Dance Automation Techniques and Applications |
Authors | Manish Joshi, Sangeeta Jadhav |
Abstract | Dance is an art, and when technology meets this art form, the result is a novel endeavor in itself. Several researchers have attempted to automate aspects of dance, from dance notation to choreography, and dance automation has found applications such as e-learning and heritage preservation. Despite more than two decades of research attempts covering various styles of dance around the world, the only review paper portraying the research status of this area that we found dates to 1990 \cite{politis1990computers}. Hence, we present a comprehensive review article that showcases the many aspects of dance automation. This paper reviews the research work reported in the literature, and categorizes and groups the work completed so far in the field of automating dance. We explicitly identify six major categories corresponding to the use of computers in dance automation, namely dance representation, dance capturing, dance semantics, dance generation, dance processing approaches, and applications of dance automation systems. We classify research papers under these categories according to their research approach and functionality. With the help of the proposed categories and subcategories, one can easily determine the state of research and the avenues left for exploration in the field of dance automation. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00606v1 |
https://arxiv.org/pdf/1906.00606v1.pdf | |
PWC | https://paperswithcode.com/paper/190600606 |
Repo | |
Framework | |
Multi-Task Learning of Height and Semantics from Aerial Images
Title | Multi-Task Learning of Height and Semantics from Aerial Images |
Authors | Marcela Carvalho, Bertrand Le Saux, Pauline Trouvé-Peloux, Frédéric Champagnat, Andrés Almansa |
Abstract | Aerial or satellite imagery is a great source for land surface analysis, which may yield land use maps or elevation models. In this investigation, we present a neural network framework for learning semantics and local height together. We show how this joint multi-task learning benefits each task on the large dataset of the 2018 Data Fusion Contest. Moreover, our framework also yields an uncertainty map which allows assessing the predictions of the model. Code is available at https://github.com/marcelampc/mtl_aerial_images . |
Tasks | Multi-Task Learning |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07543v1 |
https://arxiv.org/pdf/1911.07543v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-learning-of-height-and-semantics |
Repo | |
Framework | |
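The joint learning of semantics and height described above amounts to optimizing a weighted sum of a classification term and a regression term over shared features. The sketch below is a minimal per-pixel version under assumed loss choices (cross-entropy for semantics, L1 for height) and an assumed trade-off weight `w`; the paper's exact losses and weighting may differ:

```python
import math

def multi_task_loss(class_probs, true_class, height_pred, height_true, w=0.5):
    """Joint objective sketch for one pixel: cross-entropy on the predicted
    semantic class distribution plus an L1 penalty on the predicted height,
    combined with trade-off weight w. Both terms backpropagate into the
    same shared encoder, which is what lets each task help the other."""
    ce = -math.log(max(class_probs[true_class], 1e-7))  # semantic term
    l1 = abs(height_pred - height_true)                 # height term
    return ce + w * l1
```

Summing this over all pixels gives the full training loss; tuning `w` balances how much the height head is allowed to shape the shared representation.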
RGait-NET: An Effective Network for Recovering Missing Information from Occluded Gait Cycles
Title | RGait-NET: An Effective Network for Recovering Missing Information from Occluded Gait Cycles |
Authors | Dhritimaan Das, Ayush Agarwal, Pratik Chattopadhyay, Lipo Wang |
Abstract | The gait of a person refers to his or her walking pattern, and according to medical studies, the gait of every individual is unique. Over the past decade, several computer vision-based gait recognition approaches have been proposed in which walking information corresponding to a complete gait cycle is used to construct gait features for person identification. These methods compute gait features under the inherent assumption that a complete gait cycle is always available. However, in most public places occlusion is inevitable, and because of it only a fraction of a gait cycle gets captured by the monitoring camera. The unavailability of complete gait cycle information drastically affects the accuracy of the extracted features, and to date only a few occlusion handling strategies for gait recognition have been proposed. None of these performs reliably and robustly in the presence of a single cycle with incomplete information, which severely limits the practical application of gait recognition. In this work, we develop a deep learning-based algorithm to accurately identify the affected frames and to predict the missing frames so as to reconstruct a complete gait cycle. While occlusion detection is carried out with a VGG-16 model, the model for frame reconstruction is based on a Long Short-Term Memory network trained to optimize a multi-objective function based on the dice coefficient and cross-entropy loss. The effectiveness of the proposed occlusion reconstruction algorithm is evaluated by computing the accuracy of the popular Gait Energy Feature on the reconstructed sequence. Experimental evaluation on public data sets and comparative analysis with other occlusion handling methods verify the effectiveness of our approach. |
Tasks | Gait Recognition, Person Identification |
Published | 2019-12-14 |
URL | https://arxiv.org/abs/1912.06765v3 |
https://arxiv.org/pdf/1912.06765v3.pdf | |
PWC | https://paperswithcode.com/paper/rgait-net-an-effective-network-for-recovering |
Repo | |
Framework | |
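The multi-objective function mentioned in the RGait-NET abstract combines a dice term with cross-entropy. A minimal sketch of such a combined loss over flattened binary silhouette masks is given below; the `alpha` trade-off weight and the exact way the two terms are mixed are assumptions, not details from the paper:

```python
import math

def dice_coefficient(pred, target, eps=1e-7):
    """Soft dice coefficient between a predicted and a target mask,
    each flattened to a 1-D list of values in [0, 1]."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def reconstruction_loss(pred, target, alpha=0.5):
    """Multi-objective loss sketch in the spirit of the paper: (1 - dice)
    rewards overlap with the target silhouette, while per-pixel binary
    cross-entropy penalizes confident wrong pixels; alpha is an assumed
    trade-off weight between the two terms."""
    bce = -sum(t * math.log(max(p, 1e-7)) + (1 - t) * math.log(max(1 - p, 1e-7))
               for p, t in zip(pred, target)) / len(pred)
    return alpha * (1.0 - dice_coefficient(pred, target)) + (1 - alpha) * bce
```

The dice term is region-level (robust to class imbalance between silhouette and background pixels), while cross-entropy is pixel-level, which is a common motivation for combining them.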
Reasoning about Qualitative Direction and Distance between Extended Objects using Answer Set Programming
Title | Reasoning about Qualitative Direction and Distance between Extended Objects using Answer Set Programming |
Authors | Yusuf Izmirlioglu |
Abstract | In this thesis, we introduce a novel formal framework to represent and reason about qualitative direction and distance relations between extended objects using Answer Set Programming (ASP). We take Cardinal Directional Calculus (CDC) as a starting point and extend CDC with new sorts of constraints involving defaults, preferences, and negation; we call this extended version nCDC. We then further extend nCDC by augmenting it with a qualitative distance relation, and name this extension nCDC+. For CDC, nCDC, and nCDC+, we introduce an ASP-based general framework to solve consistency checking problems, address composition and inversion of qualitative spatial relations, infer unknown or missing relations between objects, and find a configuration of objects that fulfills a given inquiry. |
Tasks | |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08257v1 |
https://arxiv.org/pdf/1909.08257v1.pdf | |
PWC | https://paperswithcode.com/paper/reasoning-about-qualitative-direction-and |
Repo | |
Framework | |
Neighborhood Watch: Representation Learning with Local-Margin Triplet Loss and Sampling Strategy for K-Nearest-Neighbor Image Classification
Title | Neighborhood Watch: Representation Learning with Local-Margin Triplet Loss and Sampling Strategy for K-Nearest-Neighbor Image Classification |
Authors | Phawis Thammasorn, Daniel Hippe, Wanpracha Chaovalitwongse, Matthew Spraker, Landon Wootton, Matthew Nyflot, Stephanie Combs, Jan Peeken, Eric Ford |
Abstract | Deep representation learning using a triplet network for classification suffers from a lack of theoretical foundation and from the difficulty of tuning both the network and the classifier for performance. To address this problem, we propose a local-margin triplet loss together with a local positive and negative mining strategy, along with a theory of how the strategy integrates the nearest-neighbor hyperparameter with triplet learning to increase subsequent classification performance. Results of experiments on two public datasets, MNIST and CIFAR-10, and two small medical image datasets demonstrate that the proposed strategy outperforms end-to-end softmax training and the typical triplet loss in settings without data augmentation, while maintaining the utility of the transferable features for related tasks. The method serves as a good performance baseline where end-to-end methods encounter difficulties, such as small-sample data with limited allowable data augmentation. |
Tasks | Data Augmentation, Image Classification, Representation Learning |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1911.07940v1 |
https://arxiv.org/pdf/1911.07940v1.pdf | |
PWC | https://paperswithcode.com/paper/neighborhood-watch-representation-learning |
Repo | |
Framework | |
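For readers unfamiliar with the triplet-loss form the abstract builds on, a minimal sketch follows. This shows only the standard hinge form with a fixed margin; the paper's contribution is a *local* margin adapted to the nearest-neighbor structure, and that adaptation rule is not reproduced here:

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-form triplet loss on embedding vectors: push the anchor's
    squared distance to the negative at least `margin` beyond its distance
    to the positive. A zero loss means the triplet already satisfies the
    margin; the paper's local-margin variant replaces the fixed `margin`
    with one derived from the anchor's neighborhood."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)
```

Tying the margin to the k-nearest-neighbor radius is what links the learned embedding to the downstream k-NN classifier, which is the integration the abstract's theory addresses.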
Biometric Recognition Using Deep Learning: A Survey
Title | Biometric Recognition Using Deep Learning: A Survey |
Authors | Shervin Minaee, Amirali Abdolrashidi, Hang Su, Mohammed Bennamoun, David Zhang |
Abstract | Deep learning-based models have been very successful in achieving state-of-the-art results in many computer vision, speech recognition, and natural language processing tasks in recent years. These models seem a natural fit for handling the ever-increasing scale of biometric recognition problems, from cellphone authentication to airport security systems, and they have increasingly been leveraged to improve the accuracy of different biometric recognition systems. In this work, we provide a comprehensive survey of more than 120 promising works on biometric recognition (including face, fingerprint, iris, palmprint, ear, voice, signature, and gait recognition) that deploy deep learning models, and show their strengths and potential in different applications. For each biometric, we first introduce the datasets that are widely used in the literature and their characteristics. We then describe several promising deep learning works developed for that biometric and show their performance on popular public benchmarks. We also discuss some of the main challenges in using these models for biometric recognition, as well as possible future directions for research in this area. |
Tasks | Gait Recognition, Speech Recognition |
Published | 2019-11-30 |
URL | https://arxiv.org/abs/1912.00271v1 |
https://arxiv.org/pdf/1912.00271v1.pdf | |
PWC | https://paperswithcode.com/paper/biometric-recognition-using-deep-learning-a |
Repo | |
Framework | |
Deep 1D-Convnet for accurate Parkinson disease detection and severity prediction from gait
Title | Deep 1D-Convnet for accurate Parkinson disease detection and severity prediction from gait |
Authors | Imanne El Maachi, Guillaume-Alexandre Bilodeau, Wassim Bouachir |
Abstract | Diagnosing Parkinson’s disease is a complex task that requires the evaluation of several motor and non-motor symptoms. During diagnosis, gait abnormalities are among the important symptoms that physicians should consider. However, gait evaluation is challenging and relies on the expertise and subjectivity of clinicians. In this context, an intelligent gait analysis algorithm may assist physicians and facilitate the diagnosis process. This paper proposes a novel intelligent Parkinson detection system based on deep learning techniques for analyzing gait information. We use a 1D convolutional neural network (1D-Convnet) to build a Deep Neural Network (DNN) classifier. The proposed model processes 18 1D signals coming from foot sensors measuring the vertical ground reaction force (VGRF). The first part of the network consists of 18 parallel 1D-Convnets corresponding to the system inputs. The second part is a fully connected network that takes the concatenated outputs of the 1D-Convnets to produce a final classification. We tested our algorithm on Parkinson’s detection and on prediction of the severity of the disease on the Unified Parkinson’s Disease Rating Scale (UPDRS). Our experiments demonstrate the high efficiency of the proposed method for detecting Parkinson’s disease from gait data, achieving an accuracy of 98.7%. To our knowledge, this is the state-of-the-art performance in Parkinson’s gait recognition. Furthermore, we achieved an accuracy of 85.3% in Parkinson’s severity prediction; to the best of our knowledge, this is the first algorithm to predict severity on the UPDRS. Our results show that the model is able to learn intrinsic characteristics from gait data and to generalize to unseen subjects, which could be helpful in clinical diagnosis. |
Tasks | Gait Recognition |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11509v3 |
https://arxiv.org/pdf/1910.11509v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-1d-convnet-for-accurate-parkinson |
Repo | |
Framework | |
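The first stage described above — one independent 1D convolutional branch per sensor signal, with the branch outputs concatenated for a fully connected head — can be sketched in miniature. The kernels below are fixed illustrative weights rather than learned filters, and the fully connected head is omitted:

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation) of one sensor
    signal with a kernel; output length is len(signal) - len(kernel) + 1."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def parallel_branches(signals, kernels):
    """Architecture sketch of the paper's first stage: each of the 18 VGRF
    sensor signals passes through its own 1-D conv branch (here a single
    kernel per branch), and the branch outputs are concatenated into one
    feature vector for the fully connected classifier that follows."""
    outputs = [conv1d(s, k) for s, k in zip(signals, kernels)]
    return [v for out in outputs for v in out]  # concatenation
```

Keeping the branches separate until concatenation lets each filter bank specialize to the dynamics of its own foot sensor before the signals are fused.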
Generalized Tensor Models for Recurrent Neural Networks
Title | Generalized Tensor Models for Recurrent Neural Networks |
Authors | Valentin Khrulkov, Oleksii Hrinchuk, Ivan Oseledets |
Abstract | Recurrent Neural Networks (RNNs) are very successful at solving challenging problems with sequential data. However, this observed efficiency is not yet entirely explained by theory. It is known that a certain class of multiplicative RNNs enjoys the property of depth efficiency: a shallow network of exponentially large width is necessary to realize the same score function as computed by such an RNN. Such networks, however, are not often applied to real-life tasks. In this work, we attempt to reduce the gap between theory and practice by extending the theoretical analysis to RNNs that employ various nonlinearities, such as the Rectified Linear Unit (ReLU), and show that they also benefit from the properties of universality and depth efficiency. Our theoretical results are verified by a series of extensive computational experiments. |
Tasks | |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.10801v1 |
http://arxiv.org/pdf/1901.10801v1.pdf | |
PWC | https://paperswithcode.com/paper/generalized-tensor-models-for-recurrent |
Repo | |
Framework | |