Paper Group ANR 359
Towards Precise End-to-end Weakly Supervised Object Detection Network
Title | Towards Precise End-to-end Weakly Supervised Object Detection Network |
Authors | Ke Yang, Dongsheng Li, Yong Dou |
Abstract | It is challenging for a weakly supervised object detection network to precisely predict the positions of objects, since no instance-level category annotations are available. Most existing methods tend to solve this problem with a two-phase learning procedure, i.e., a multiple instance learning detector followed by a fully supervised learning detector with bounding-box regression. Based on our observation, this procedure may lead to local minima for some object categories. In this paper, we propose to jointly train the two phases in an end-to-end manner to tackle this problem. Specifically, we design a single network with both multiple instance learning and bounding-box regression branches that share the same backbone. Meanwhile, a guided attention module using classification loss is added to the backbone to effectively extract the implicit location information in the features. Experimental results on public datasets show that our method achieves state-of-the-art performance. |
Tasks | Multiple Instance Learning, Object Detection, Weakly Supervised Object Detection |
Published | 2019-11-27 |
URL | https://arxiv.org/abs/1911.12148v1 |
https://arxiv.org/pdf/1911.12148v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-precise-end-to-end-weakly-supervised-1 |
Repo | |
Framework | |
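The joint training idea in the abstract above — a multiple-instance-learning branch and a bounding-box-regression branch sharing one backbone, optimized in a single objective — can be illustrated with a toy sketch. This is an assumption-laden simplification, not the paper's actual network: the sum-pooled MIL aggregation, the smooth-L1 form, the pseudo-box mining, and the weight `lam` are all illustrative stand-ins.

```python
import math

def mil_image_loss(instance_scores, image_label):
    """MIL classification loss for one class: aggregate per-proposal scores
    into an image-level score, then binary cross-entropy against the
    image-level label (the only supervision available in the weak setting)."""
    image_score = max(min(sum(instance_scores), 1.0 - 1e-7), 1e-7)  # sum-pooled, clipped
    return -(image_label * math.log(image_score)
             + (1 - image_label) * math.log(1 - image_score))

def smooth_l1(pred_box, target_box):
    """Smooth-L1 regression loss summed over the 4 box coordinates."""
    total = 0.0
    for p, t in zip(pred_box, target_box):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def joint_loss(instance_scores, image_label, box_pred, pseudo_box, lam=1.0):
    """End-to-end objective: MIL branch + regression branch trained jointly,
    rather than in two phases. The regression target is a pseudo ground-truth
    box (e.g., mined from the MIL branch's highest-scoring proposal)."""
    return (mil_image_loss(instance_scores, image_label)
            + lam * smooth_l1(box_pred, pseudo_box))
```

Because both terms backpropagate into the same backbone, the regression signal can correct localization errors that a frozen two-phase MIL detector would lock in.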
Event-based attention and tracking on neuromorphic hardware
Title | Event-based attention and tracking on neuromorphic hardware |
Authors | Alpha Renner, Matthew Evanusa, Yulia Sandamirskaya |
Abstract | We present a fully event-driven vision and processing system for selective attention and tracking, realized on the neuromorphic processor Loihi interfaced to an event-based Dynamic Vision Sensor (DAVIS). The attention mechanism is realized as a recurrent spiking neural network that implements the attractor dynamics of dynamic neural fields. We demonstrate the capability of the system to create sustained activation that supports object tracking when distractors are present or when the object slows down or stops, reducing the number of generated events. |
Tasks | Object Tracking |
Published | 2019-07-09 |
URL | https://arxiv.org/abs/1907.04060v1 |
https://arxiv.org/pdf/1907.04060v1.pdf | |
PWC | https://paperswithcode.com/paper/event-based-attention-and-tracking-on |
Repo | |
Framework | |
A Dual-Staged Context Aggregation Method Towards Efficient End-To-End Speech Enhancement
Title | A Dual-Staged Context Aggregation Method Towards Efficient End-To-End Speech Enhancement |
Authors | Kai Zhen, Mi Suk Lee, Minje Kim |
Abstract | In speech enhancement, an end-to-end deep neural network converts a noisy speech signal to clean speech directly in the time domain, without time-frequency transformation or mask estimation. However, aggregating contextual information from a high-resolution time-domain signal at an affordable model complexity remains challenging. In this paper, we propose a densely connected convolutional and recurrent network (DCCRN), a hybrid architecture, to enable dual-staged temporal context aggregation. With its dense connectivity and cross-component identical shortcut, DCCRN consistently outperforms competing convolutional baselines, with an average STOI improvement of 0.23 and PESQ improvement of 1.38 across three SNR levels. The proposed method is computationally efficient, with only 1.38 million parameters. Its generalization to unseen noise types is still decent considering its low complexity, although it is relatively weak compared to Wave-U-Net, which has 7.25 times more parameters. |
Tasks | Speech Enhancement |
Published | 2019-08-18 |
URL | https://arxiv.org/abs/1908.06468v4 |
https://arxiv.org/pdf/1908.06468v4.pdf | |
PWC | https://paperswithcode.com/paper/efficient-context-aggregation-for-end-to-end |
Repo | |
Framework | |
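The dense connectivity that the DCCRN abstract credits for its efficiency is a general pattern: every layer receives the concatenation of the input and all preceding layer outputs, so early temporal context stays directly accessible to later stages. A minimal sketch of just that wiring pattern, not of DCCRN itself (the actual model uses convolutional and recurrent components):

```python
def dense_forward(x, layers):
    """Dense connectivity: each layer sees the channel-wise concatenation
    of the block input and every earlier layer's output, so no feature
    has to survive a long chain of transformations to be reused."""
    features = [x]  # list of feature lists, one per layer (plus the input)
    for layer in layers:
        concatenated = [v for f in features for v in f]  # channel-wise concat
        features.append(layer(concatenated))
    return features[-1]

# Illustrative "layers": tiny functions standing in for conv/recurrent blocks.
toy_layers = [
    lambda feats: [sum(feats)],  # first layer sees only the input
    lambda feats: [max(feats)],  # second layer sees input + layer-1 output
]
```

Here `dense_forward([1.0, 2.0], toy_layers)` runs the second toy layer over `[1.0, 2.0, 3.0]`, demonstrating that the raw input is still visible at depth 2.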
Self-adaptive Potential-based Stopping Criteria for Particle Swarm Optimization
Title | Self-adaptive Potential-based Stopping Criteria for Particle Swarm Optimization |
Authors | Bernd Bassimir, Manuel Schmitt, Rolf Wanka |
Abstract | We study the variant of Particle Swarm Optimization (PSO) that applies random velocities in a dimension, instead of the regular velocity update equations, as soon as the so-called potential of the swarm falls below a certain user-defined bound in this dimension. In this case, the swarm performs a forced move. In this paper, we are interested in how, by counting the forced moves, the swarm can decide for itself to stop its movement because it is improbable that it will find better solution candidates than it has already found. We formally prove that when the swarm is close to a (local) optimum, it behaves like a blindly searching cloud and the frequency of forced moves exceeds a certain value that is independent of the objective function. Based on this observation, we define stopping criteria and evaluate them experimentally, showing that good solution candidates can be found much faster than with other criteria. |
Tasks | |
Published | 2019-05-29 |
URL | https://arxiv.org/abs/1906.08867v1 |
https://arxiv.org/pdf/1906.08867v1.pdf | |
PWC | https://paperswithcode.com/paper/self-adaptive-potential-based-stopping |
Repo | |
Framework | |
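The stopping idea above — count forced moves and stop once their observed frequency exceeds an objective-function-independent threshold — can be sketched as a small sliding-window monitor. The window size and threshold below are illustrative assumptions, not values from the paper:

```python
from collections import deque

def make_forced_move_stopper(window=100, freq_threshold=0.8):
    """Stopping-criterion sketch: over a sliding window of recent iterations,
    track the fraction in which the swarm performed a forced move (a random
    velocity applied because the per-dimension potential fell below its
    bound). A persistently high fraction indicates the swarm is hovering
    near a (local) optimum like a blindly searching cloud, so further
    search is unlikely to improve the incumbent solution."""
    history = deque(maxlen=window)

    def observe(forced_move_happened):
        history.append(1 if forced_move_happened else 0)
        if len(history) < window:
            return False  # not enough evidence yet
        return sum(history) / window >= freq_threshold

    return observe
```

A PSO loop would call `observe(...)` once per iteration and terminate when it returns `True`; because only the forced-move indicator is inspected, the criterion never evaluates the objective function itself.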
Automatically Evaluating Balance: A Machine Learning Approach
Title | Automatically Evaluating Balance: A Machine Learning Approach |
Authors | Tian Bao, Brooke N. Klatt, Susan L. Whitney, Kathleen H. Sienko, Jenna Wiens |
Abstract | Compared to in-clinic balance training, in-home training is not as effective. This is, in part, due to the lack of feedback from physical therapists (PTs). Here, we analyze the feasibility of using trunk sway data and machine learning (ML) techniques to automatically evaluate balance, providing accurate assessments outside of the clinic. We recruited sixteen participants to perform standing balance exercises. For each exercise, we recorded trunk sway data and had a PT rate balance performance on a scale of 1 to 5. The rating scale was adapted from the Functional Independence Measure. From the trunk sway data, we extracted a 61-dimensional feature vector representing performance of each exercise. Given these labeled data, we trained a multi-class support vector machine (SVM) to map trunk sway features to PT ratings. Evaluated in a leave-one-participant-out scheme, the model achieved a classification accuracy of 82%. Compared to participant self-assessment ratings, the SVM outputs were significantly closer to PT ratings. The results of this pilot study suggest that in the absence of PTs, ML techniques can provide accurate assessments during standing balance exercises. Such automated assessments could reduce PT consultation time and increase user compliance outside of the clinic. |
Tasks | |
Published | 2019-06-07 |
URL | https://arxiv.org/abs/1906.05657v1 |
https://arxiv.org/pdf/1906.05657v1.pdf | |
PWC | https://paperswithcode.com/paper/automatically-evaluating-balance-a-machine |
Repo | |
Framework | |
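The evaluation protocol in the balance-assessment abstract — train on all but one participant's exercises, test on the held-out participant, and repeat — is worth making concrete, since ordinary random splits would leak a participant's data into both sets. The sketch below uses a nearest-centroid classifier as a simple stand-in for the paper's multi-class SVM, and the one-dimensional features are toy data, not the 61-dimensional trunk-sway features:

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Tiny stand-in classifier: assign x to the class whose mean feature
    vector (centroid) is closest in squared Euclidean distance."""
    by_class = {}
    for xi, yi in zip(train_X, train_y):
        by_class.setdefault(yi, []).append(xi)
    best_label, best_dist = None, float("inf")
    for label, rows in by_class.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        d = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def leave_one_participant_out(X, y, groups):
    """Leave-one-participant-out evaluation: hold out every sample from one
    participant, train on the rest, and pool accuracy over all folds."""
    correct = total = 0
    for held_out in set(groups):
        train = [(xi, yi) for xi, yi, g in zip(X, y, groups) if g != held_out]
        test = [(xi, yi) for xi, yi, g in zip(X, y, groups) if g == held_out]
        train_X = [t[0] for t in train]
        train_y = [t[1] for t in train]
        for xi, yi in test:
            correct += int(nearest_centroid_predict(train_X, train_y, xi) == yi)
            total += 1
    return correct / total
```

The key detail is that the split key is the participant, so the reported accuracy estimates performance on people the model has never seen.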
MLPerf Training Benchmark
Title | MLPerf Training Benchmark |
Authors | Peter Mattson, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, David Brooks, Dehao Chen, Debojyoti Dutta, Udit Gupta, Kim Hazelwood, Andrew Hock, Xinyuan Huang, Atsushi Ike, Bill Jia, Daniel Kang, David Kanter, Naveen Kumar, Jeffery Liao, Guokai Ma, Deepak Narayanan, Tayo Oguntebi, Gennady Pekhimenko, Lillian Pentecost, Vijay Janapa Reddi, Taylor Robie, Tom St. John, Tsuguchika Tabaru, Carole-Jean Wu, Lingjie Xu, Masafumi Yamazaki, Cliff Young, Matei Zaharia |
Abstract | Machine learning (ML) needs industry-standard performance benchmarks to support design and competitive evaluation of the many emerging software and hardware solutions for ML. But ML training presents three unique benchmarking challenges absent from other domains: optimizations that improve training throughput can increase the time to solution, training is stochastic and time to solution exhibits high variance, and software and hardware systems are so diverse that fair benchmarking with the same binary, code, and even hyperparameters is difficult. We therefore present MLPerf, an ML benchmark that overcomes these challenges. Our analysis quantitatively evaluates MLPerf’s efficacy at driving performance and scalability improvements across two rounds of results from multiple vendors. |
Tasks | |
Published | 2019-10-02 |
URL | https://arxiv.org/abs/1910.01500v3 |
https://arxiv.org/pdf/1910.01500v3.pdf | |
PWC | https://paperswithcode.com/paper/mlperf-training-benchmark |
Repo | |
Framework | |
Differentiable Scene Graphs
Title | Differentiable Scene Graphs |
Authors | Moshiko Raboh, Roei Herzig, Gal Chechik, Jonathan Berant, Amir Globerson |
Abstract | Reasoning about complex visual scenes involves perception of entities and their relations. Scene graphs (SGs) provide a natural representation for reasoning tasks, by assigning labels to both entities (nodes) and relations (edges). Unfortunately, reasoning systems based on SGs are typically trained in a two-step procedure: first, a model is trained to predict SGs from images; then, a separate model is trained to reason over the predicted SGs. In many domains it is preferable to train systems jointly in an end-to-end manner, but SGs are not commonly used as intermediate components in visual reasoning systems because, being discrete and sparse, scene-graph representations are non-differentiable and difficult to optimize. Here we propose Differentiable Scene Graphs (DSGs), an image representation that is amenable to differentiable end-to-end optimization and requires supervision only from the downstream tasks. DSGs provide a dense representation for all regions and pairs of regions, and do not spend modeling capacity on areas of the image that contain no objects or relations of interest. We evaluate our model on the challenging task of identifying referring relationships (RR) in three benchmark datasets: Visual Genome, VRD, and CLEVR. We describe a multi-task objective and train end-to-end, supervised by the downstream RR task. Using DSGs as an intermediate representation leads to new state-of-the-art performance. |
Tasks | Visual Reasoning |
Published | 2019-02-26 |
URL | https://arxiv.org/abs/1902.10200v5 |
https://arxiv.org/pdf/1902.10200v5.pdf | |
PWC | https://paperswithcode.com/paper/learning-latent-scene-graph-representations |
Repo | |
Framework | |
An Extensive Review of Computational Dance Automation Techniques and Applications
Title | An Extensive Review of Computational Dance Automation Techniques and Applications |
Authors | Manish Joshi, Sangeeta Jadhav |
Abstract | Dance is an art, and when technology meets this art form, the result is a novel endeavor in itself. Several researchers have attempted to automate aspects of dance, from dance notation to choreography, and dance automation has found applications such as e-learning and heritage preservation. Despite more than two decades of research attempts covering various styles of dance around the world, the only review paper portraying the research status of this area that we found dates to 1990 \cite{politis1990computers}. Hence, we present a comprehensive review article that showcases the many aspects of dance automation. This paper reviews the research work reported in the literature, and categorizes and groups the work completed so far in the field of automating dance. We explicitly identify six major categories corresponding to the use of computers in dance automation, namely dance representation, dance capturing, dance semantics, dance generation, dance processing approaches, and applications of dance automation systems. We classify research papers under these categories according to their research approach and functionality. With the help of the proposed categories and subcategories, one can easily determine the state of research and the avenues left for exploration in the field of dance automation. |
Tasks | |
Published | 2019-06-03 |
URL | https://arxiv.org/abs/1906.00606v1 |
https://arxiv.org/pdf/1906.00606v1.pdf | |
PWC | https://paperswithcode.com/paper/190600606 |
Repo | |
Framework | |
Multi-Task Learning of Height and Semantics from Aerial Images
Title | Multi-Task Learning of Height and Semantics from Aerial Images |
Authors | Marcela Carvalho, Bertrand Le Saux, Pauline Trouvé-Peloux, Frédéric Champagnat, Andrés Almansa |
Abstract | Aerial or satellite imagery is a great source for land surface analysis, which may yield land use maps or elevation models. In this investigation, we present a neural network framework for learning semantics and local height together. We show how this joint multi-task learning benefits each task on the large dataset of the 2018 Data Fusion Contest. Moreover, our framework also yields an uncertainty map which allows assessing the predictions of the model. Code is available at https://github.com/marcelampc/mtl_aerial_images . |
Tasks | Multi-Task Learning |
Published | 2019-11-18 |
URL | https://arxiv.org/abs/1911.07543v1 |
https://arxiv.org/pdf/1911.07543v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-task-learning-of-height-and-semantics |
Repo | |
Framework | |
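The joint learning of semantics and height described above amounts to optimizing a weighted sum of a classification term and a regression term over shared features. The sketch below is a minimal per-pixel version under assumed loss choices (cross-entropy for semantics, L1 for height) and an assumed trade-off weight `w`; the paper's exact losses and weighting may differ:

```python
import math

def multi_task_loss(class_probs, true_class, height_pred, height_true, w=0.5):
    """Joint objective sketch for one pixel: cross-entropy on the predicted
    semantic class distribution plus an L1 penalty on the predicted height,
    combined with trade-off weight w. Both terms backpropagate into the
    same shared encoder, which is what lets each task help the other."""
    ce = -math.log(max(class_probs[true_class], 1e-7))  # semantic term
    l1 = abs(height_pred - height_true)                 # height term
    return ce + w * l1
```

Summing this over all pixels gives the full training loss; tuning `w` balances how much the height head is allowed to shape the shared representation.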
RGait-NET: An Effective Network for Recovering Missing Information from Occluded Gait Cycles
Title | RGait-NET: An Effective Network for Recovering Missing Information from Occluded Gait Cycles |
Authors | Dhritimaan Das, Ayush Agarwal, Pratik Chattopadhyay, Lipo Wang |
Abstract | The gait of a person refers to his or her walking pattern, and according to medical studies, the gait of every individual is unique. Over the past decade, several computer vision-based gait recognition approaches have been proposed in which walking information corresponding to a complete gait cycle is used to construct gait features for person identification. These methods compute gait features under the inherent assumption that a complete gait cycle is always available. However, in most public places occlusion is inevitable, and because of it only a fraction of a gait cycle gets captured by the monitoring camera. The unavailability of complete gait cycle information drastically affects the accuracy of the extracted features, and to date only a few occlusion handling strategies for gait recognition have been proposed. None of these performs reliably and robustly in the presence of a single cycle with incomplete information, which severely limits the practical application of gait recognition. In this work, we develop a deep learning-based algorithm to accurately identify the affected frames and to predict the missing frames so as to reconstruct a complete gait cycle. While occlusion detection is carried out with a VGG-16 model, the model for frame reconstruction is based on a Long Short-Term Memory network trained to optimize a multi-objective function based on the dice coefficient and cross-entropy loss. The effectiveness of the proposed occlusion reconstruction algorithm is evaluated by computing the accuracy of the popular Gait Energy Feature on the reconstructed sequence. Experimental evaluation on public data sets and comparative analysis with other occlusion handling methods verify the effectiveness of our approach. |
Tasks | Gait Recognition, Person Identification |
Published | 2019-12-14 |
URL | https://arxiv.org/abs/1912.06765v3 |
https://arxiv.org/pdf/1912.06765v3.pdf | |
PWC | https://paperswithcode.com/paper/rgait-net-an-effective-network-for-recovering |
Repo | |
Framework | |
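The multi-objective function mentioned in the RGait-NET abstract combines a dice term with cross-entropy. A minimal sketch of such a combined loss over flattened binary silhouette masks is given below; the `alpha` trade-off weight and the exact way the two terms are mixed are assumptions, not details from the paper:

```python
import math

def dice_coefficient(pred, target, eps=1e-7):
    """Soft dice coefficient between a predicted and a target mask,
    each flattened to a 1-D list of values in [0, 1]."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)

def reconstruction_loss(pred, target, alpha=0.5):
    """Multi-objective loss sketch in the spirit of the paper: (1 - dice)
    rewards overlap with the target silhouette, while per-pixel binary
    cross-entropy penalizes confident wrong pixels; alpha is an assumed
    trade-off weight between the two terms."""
    bce = -sum(t * math.log(max(p, 1e-7)) + (1 - t) * math.log(max(1 - p, 1e-7))
               for p, t in zip(pred, target)) / len(pred)
    return alpha * (1.0 - dice_coefficient(pred, target)) + (1 - alpha) * bce
```

The dice term is region-level (robust to class imbalance between silhouette and background pixels), while cross-entropy is pixel-level, which is a common motivation for combining them.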
Reasoning about Qualitative Direction and Distance between Extended Objects using Answer Set Programming
Title | Reasoning about Qualitative Direction and Distance between Extended Objects using Answer Set Programming |
Authors | Yusuf Izmirlioglu |
Abstract | In this thesis, we introduce a novel formal framework to represent and reason about qualitative direction and distance relations between extended objects using Answer Set Programming (ASP). We take Cardinal Directional Calculus (CDC) as a starting point and extend CDC with new sorts of constraints involving defaults, preferences, and negation; we call this extended version nCDC. We then further extend nCDC by augmenting it with a qualitative distance relation, and name this extension nCDC+. For CDC, nCDC, and nCDC+, we introduce an ASP-based general framework to solve consistency checking problems, address composition and inversion of qualitative spatial relations, infer unknown or missing relations between objects, and find a configuration of objects that fulfills a given inquiry. |
Tasks | |
Published | 2019-09-18 |
URL | https://arxiv.org/abs/1909.08257v1 |
https://arxiv.org/pdf/1909.08257v1.pdf | |
PWC | https://paperswithcode.com/paper/reasoning-about-qualitative-direction-and |
Repo | |
Framework | |
Neighborhood Watch: Representation Learning with Local-Margin Triplet Loss and Sampling Strategy for K-Nearest-Neighbor Image Classification
Title | Neighborhood Watch: Representation Learning with Local-Margin Triplet Loss and Sampling Strategy for K-Nearest-Neighbor Image Classification |
Authors | Phawis Thammasorn, Daniel Hippe, Wanpracha Chaovalitwongse, Matthew Spraker, Landon Wootton, Matthew Nyflot, Stephanie Combs, Jan Peeken, Eric Ford |
Abstract | Deep representation learning using a triplet network for classification suffers from a lack of theoretical foundation and from the difficulty of tuning both the network and the classifier for performance. To address this problem, we propose a local-margin triplet loss together with a local positive and negative mining strategy, along with a theory of how the strategy integrates the nearest-neighbor hyperparameter with triplet learning to increase subsequent classification performance. Results of experiments on two public datasets, MNIST and CIFAR-10, and two small medical image datasets demonstrate that the proposed strategy outperforms end-to-end softmax training and the typical triplet loss in settings without data augmentation, while maintaining the utility of the transferable features for related tasks. The method serves as a good performance baseline where end-to-end methods encounter difficulties, such as small-sample data with limited allowable data augmentation. |
Tasks | Data Augmentation, Image Classification, Representation Learning |
Published | 2019-10-28 |
URL | https://arxiv.org/abs/1911.07940v1 |
https://arxiv.org/pdf/1911.07940v1.pdf | |
PWC | https://paperswithcode.com/paper/neighborhood-watch-representation-learning |
Repo | |
Framework | |
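For readers unfamiliar with the triplet-loss form the abstract builds on, a minimal sketch follows. This shows only the standard hinge form with a fixed margin; the paper's contribution is a *local* margin adapted to the nearest-neighbor structure, and that adaptation rule is not reproduced here:

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-form triplet loss on embedding vectors: push the anchor's
    squared distance to the negative at least `margin` beyond its distance
    to the positive. A zero loss means the triplet already satisfies the
    margin; the paper's local-margin variant replaces the fixed `margin`
    with one derived from the anchor's neighborhood."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)
```

Tying the margin to the k-nearest-neighbor radius is what links the learned embedding to the downstream k-NN classifier, which is the integration the abstract's theory addresses.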
Biometric Recognition Using Deep Learning: A Survey
Title | Biometric Recognition Using Deep Learning: A Survey |
Authors | Shervin Minaee, Amirali Abdolrashidi, Hang Su, Mohammed Bennamoun, David Zhang |
Abstract | Deep learning-based models have been very successful in achieving state-of-the-art results in many computer vision, speech recognition, and natural language processing tasks in recent years. These models seem a natural fit for handling the ever-increasing scale of biometric recognition problems, from cellphone authentication to airport security systems, and they have increasingly been leveraged to improve the accuracy of different biometric recognition systems. In this work, we provide a comprehensive survey of more than 120 promising works on biometric recognition (including face, fingerprint, iris, palmprint, ear, voice, signature, and gait recognition) that deploy deep learning models, and show their strengths and potential in different applications. For each biometric, we first introduce the datasets that are widely used in the literature and their characteristics. We then describe several promising deep learning works developed for that biometric and show their performance on popular public benchmarks. We also discuss some of the main challenges in using these models for biometric recognition, as well as possible future directions for research in this area. |
Tasks | Gait Recognition, Speech Recognition |
Published | 2019-11-30 |
URL | https://arxiv.org/abs/1912.00271v1 |
https://arxiv.org/pdf/1912.00271v1.pdf | |
PWC | https://paperswithcode.com/paper/biometric-recognition-using-deep-learning-a |
Repo | |
Framework | |
Deep 1D-Convnet for accurate Parkinson disease detection and severity prediction from gait
Title | Deep 1D-Convnet for accurate Parkinson disease detection and severity prediction from gait |
Authors | Imanne El Maachi, Guillaume-Alexandre Bilodeau, Wassim Bouachir |
Abstract | Diagnosing Parkinson’s disease is a complex task that requires the evaluation of several motor and non-motor symptoms. During diagnosis, gait abnormalities are among the important symptoms that physicians should consider. However, gait evaluation is challenging and relies on the expertise and subjectivity of clinicians. In this context, an intelligent gait analysis algorithm may assist physicians and facilitate the diagnosis process. This paper proposes a novel intelligent Parkinson detection system based on deep learning techniques for analyzing gait information. We use a 1D convolutional neural network (1D-Convnet) to build a Deep Neural Network (DNN) classifier. The proposed model processes 18 1D signals coming from foot sensors measuring the vertical ground reaction force (VGRF). The first part of the network consists of 18 parallel 1D-Convnets corresponding to the system inputs. The second part is a fully connected network that takes the concatenated outputs of the 1D-Convnets to produce a final classification. We tested our algorithm on Parkinson’s detection and on prediction of the severity of the disease on the Unified Parkinson’s Disease Rating Scale (UPDRS). Our experiments demonstrate the high efficiency of the proposed method for detecting Parkinson’s disease from gait data, achieving an accuracy of 98.7%. To our knowledge, this is the state-of-the-art performance in Parkinson’s gait recognition. Furthermore, we achieved an accuracy of 85.3% in Parkinson’s severity prediction; to the best of our knowledge, this is the first algorithm to predict severity on the UPDRS. Our results show that the model is able to learn intrinsic characteristics from gait data and to generalize to unseen subjects, which could be helpful in clinical diagnosis. |
Tasks | Gait Recognition |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11509v3 |
https://arxiv.org/pdf/1910.11509v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-1d-convnet-for-accurate-parkinson |
Repo | |
Framework | |
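The first stage described above — one independent 1D convolutional branch per sensor signal, with the branch outputs concatenated for a fully connected head — can be sketched in miniature. The kernels below are fixed illustrative weights rather than learned filters, and the fully connected head is omitted:

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (cross-correlation) of one sensor
    signal with a kernel; output length is len(signal) - len(kernel) + 1."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def parallel_branches(signals, kernels):
    """Architecture sketch of the paper's first stage: each of the 18 VGRF
    sensor signals passes through its own 1-D conv branch (here a single
    kernel per branch), and the branch outputs are concatenated into one
    feature vector for the fully connected classifier that follows."""
    outputs = [conv1d(s, k) for s, k in zip(signals, kernels)]
    return [v for out in outputs for v in out]  # concatenation
```

Keeping the branches separate until concatenation lets each filter bank specialize to the dynamics of its own foot sensor before the signals are fused.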
Generalized Tensor Models for Recurrent Neural Networks
Title | Generalized Tensor Models for Recurrent Neural Networks |
Authors | Valentin Khrulkov, Oleksii Hrinchuk, Ivan Oseledets |
Abstract | Recurrent Neural Networks (RNNs) are very successful at solving challenging problems with sequential data. However, this observed efficiency is not yet entirely explained by theory. It is known that a certain class of multiplicative RNNs enjoys the property of depth efficiency: a shallow network of exponentially large width is necessary to realize the same score function as computed by such an RNN. Such networks, however, are not often applied to real-life tasks. In this work, we attempt to reduce the gap between theory and practice by extending the theoretical analysis to RNNs that employ various nonlinearities, such as the Rectified Linear Unit (ReLU), and show that they also benefit from the properties of universality and depth efficiency. Our theoretical results are verified by a series of extensive computational experiments. |
Tasks | |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.10801v1 |
http://arxiv.org/pdf/1901.10801v1.pdf | |
PWC | https://paperswithcode.com/paper/generalized-tensor-models-for-recurrent |
Repo | |
Framework | |