July 29, 2019

Paper Group ANR 25

Deep Detection of People and their Mobility Aids for a Hospital Robot

Title Deep Detection of People and their Mobility Aids for a Hospital Robot
Authors Andres Vasquez, Marina Kollmitz, Andreas Eitel, Wolfram Burgard
Abstract Robots operating in populated environments encounter many different types of people, some of whom might have an advanced need for cautious interaction, because of physical impairments or their advanced age. Robots therefore need to recognize such advanced demands to provide appropriate assistance, guidance or other forms of support. In this paper, we propose a depth-based perception pipeline that estimates the position and velocity of people in the environment and categorizes them according to the mobility aids they use: pedestrian, person in wheelchair, person in a wheelchair with a person pushing them, person with crutches and person using a walker. We present a fast region proposal method that feeds a Region-based Convolutional Network (Fast R-CNN). With this, we speed up the object detection process by a factor of seven compared to a dense sliding window approach. We furthermore propose a probabilistic position, velocity and class estimator to smooth the CNN’s detections and account for occlusions and misclassifications. In addition, we introduce a new hospital dataset with over 17,000 annotated RGB-D images. Extensive experiments confirm that our pipeline successfully keeps track of people and their mobility aids, even in challenging situations with multiple people from different categories and frequent occlusions. Videos of our experiments and the dataset are available at http://www2.informatik.uni-freiburg.de/~kollmitz/MobilityAids
Tasks Object Detection
Published 2017-08-02
URL http://arxiv.org/abs/1708.00674v1
PDF http://arxiv.org/pdf/1708.00674v1.pdf
PWC https://paperswithcode.com/paper/deep-detection-of-people-and-their-mobility
Repo
Framework

Contour and Centreline Tracking of Vessels from Angiograms using the Classical Image Processing Techniques

Title Contour and Centreline Tracking of Vessels from Angiograms using the Classical Image Processing Techniques
Authors Tache Irina Andra
Abstract This article addresses vessel edge and centerline detection using classical image processing techniques, chosen for their simplicity and ease of implementation. The method is divided into four steps: vessel enhancement with the non-linear filter proposed by Frangi, thresholding with Otsu's method, contour detection with the Canny edge detector (chosen for its good performance on small vessels), and morphological skeletonisation. The algorithms are tested on real data collected from a cardiac catheterization laboratory and are accurate for images with good spatial resolution (512 × 512). The output image can be used for further processing to find the vessel length or radius.
Tasks Contour Detection
Published 2017-06-13
URL http://arxiv.org/abs/1707.03710v1
PDF http://arxiv.org/pdf/1707.03710v1.pdf
PWC https://paperswithcode.com/paper/contour-and-centreline-tracking-of-vessels
Repo
Framework
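
The thresholding step named in the abstract above (Otsu's method) can be illustrated with a minimal pure-Python sketch. This is not the author's implementation: the function name `otsu_threshold` and the flat list of 8-bit intensities are assumptions made for illustration, and the Frangi, Canny, and skeletonisation steps are not shown.

```python
from typing import Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t are treated as background, the rest as foreground.
    `pixels` is a hypothetical flat list of integer intensities in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_bg = 0        # number of background pixels so far
    sum_bg = 0      # sum of background intensities so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue            # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break               # every pixel is background from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels: Sequence[int], threshold: int) -> list:
    """Map each pixel to 1 (foreground) if it exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in pixels]
```

In a pipeline like the one described, the threshold would be computed on the vessel-enhanced image rather than on the raw angiogram.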

Learning to Learn from Weak Supervision by Full Supervision

Title Learning to Learn from Weak Supervision by Full Supervision
Authors Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps
Abstract In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels. In our proposed model, we train two neural networks: a target network (the learner) and a confidence network (the meta-learner). The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to control the magnitude of the gradient updates to the target network using the scores provided by the confidence network, which is trained on a small amount of supervised data. This prevents weight updates computed from noisy labels from harming the quality of the target network model.
Tasks
Published 2017-11-30
URL http://arxiv.org/abs/1711.11383v1
PDF http://arxiv.org/pdf/1711.11383v1.pdf
PWC https://paperswithcode.com/paper/learning-to-learn-from-weak-supervision-by
Repo
Framework
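
The idea of scaling gradient updates by a confidence score can be sketched on a toy linear model. This is an illustrative assumption, not the paper's architecture: `weighted_sgd_step` is a hypothetical helper, the squared-error loss is chosen for simplicity, and the confidence value is passed in directly instead of being produced by a trained confidence network.

```python
from typing import List, Sequence


def weighted_sgd_step(
    weights: Sequence[float],
    features: Sequence[float],
    weak_label: float,
    confidence: float,
    lr: float = 0.1,
) -> List[float]:
    """One SGD step on a linear model, scaled by a confidence in [0, 1].

    The loss is 0.5 * (prediction - weak_label) ** 2, so its gradient with
    respect to the weights is error * features. Multiplying by `confidence`
    shrinks the update for weak labels that are judged unreliable.
    """
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - weak_label
    return [w - lr * confidence * error * x for w, x in zip(weights, features)]
```

With `confidence=0.0` the weak label is ignored entirely, and with `confidence=1.0` the step reduces to ordinary SGD.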

Learning Relevant Features of Data with Multi-scale Tensor Networks

Title Learning Relevant Features of Data with Multi-scale Tensor Networks
Authors E. M. Stoudenmire
Abstract Inspired by coarse-graining approaches used in physics, we show how similar algorithms can be adapted for data. The resulting algorithms are based on layered tree tensor networks and scale linearly with both the dimension of the input and the training set size. Computing most of the layers with an unsupervised algorithm, then optimizing just the top layer for supervised classification of the MNIST and fashion-MNIST data sets gives very good results. We also discuss mixing a prior guess for supervised weights together with an unsupervised representation of the data, yielding a smaller number of features nevertheless able to give good performance.
Tasks Tensor Networks
Published 2017-12-31
URL http://arxiv.org/abs/1801.00315v1
PDF http://arxiv.org/pdf/1801.00315v1.pdf
PWC https://paperswithcode.com/paper/learning-relevant-features-of-data-with-multi
Repo
Framework

Exploiting the potential of unlabeled endoscopic video data with self-supervised learning

Title Exploiting the potential of unlabeled endoscopic video data with self-supervised learning
Authors Tobias Ross, David Zimmerer, Anant Vemuri, Fabian Isensee, Manuel Wiesenfarth, Sebastian Bodenstedt, Fabian Both, Philip Kessler, Martin Wagner, Beat Müller, Hannes Kenngott, Stefanie Speidel, Annette Kopp-Schneider, Klaus Maier-Hein, Lena Maier-Hein
Abstract Surgical data science is a new research field that aims to observe all aspects of the patient treatment process in order to provide the right assistance at the right time. Due to the breakthrough successes of deep learning-based solutions for automatic image annotation, the availability of reference annotations for algorithm training is becoming a major bottleneck in the field. The purpose of this paper was to investigate the concept of self-supervised learning to address this issue. Our approach is guided by the hypothesis that unlabeled video data can be used to learn a representation of the target domain that boosts the performance of state-of-the-art machine learning algorithms when used for pre-training. The core of the method is an auxiliary task based on raw endoscopic video data of the target domain that is used to initialize the convolutional neural network (CNN) for the target task. In this paper, we propose the re-colorization of medical images with a generative adversarial network (GAN)-based architecture as the auxiliary task. A variant of the method involves a second pre-training step based on labeled data for the target task from a related domain. We validate both variants using medical instrument segmentation as the target task. The proposed approach can be used to radically reduce the manual annotation effort involved in training CNNs. Compared to the baseline approach of generating annotated data from scratch, our method reduces the number of labeled images needed by up to 75% in exploratory experiments without sacrificing performance. Our method also outperforms alternative methods for CNN pre-training, such as pre-training on publicly available non-medical or medical data using the target task (in this instance: segmentation). As it makes efficient use of available (non-)public and (un-)labeled data, the approach has the potential to become a valuable tool for CNN (pre-)training.
Tasks Colorization, Instance Segmentation, Semantic Segmentation
Published 2017-11-27
URL http://arxiv.org/abs/1711.09726v3
PDF http://arxiv.org/pdf/1711.09726v3.pdf
PWC https://paperswithcode.com/paper/exploiting-the-potential-of-unlabeled
Repo
Framework

Deep supervised learning using local errors

Title Deep supervised learning using local errors
Authors Hesham Mostafa, Vishwajith Ramesh, Gert Cauwenberghs
Abstract Error backpropagation is a highly effective mechanism for learning high-quality hierarchical features in deep networks. Updating the features or weights in one layer, however, requires waiting for the propagation of error signals from higher layers. Learning using delayed and non-local errors makes it hard to reconcile backpropagation with the learning mechanisms observed in biological neural networks as it requires the neurons to maintain a memory of the input long enough until the higher-layer errors arrive. In this paper, we propose an alternative learning mechanism where errors are generated locally in each layer using fixed, random auxiliary classifiers. Lower layers could thus be trained independently of higher layers and training could either proceed layer by layer, or simultaneously in all layers using local error information. We address biological plausibility concerns such as weight symmetry requirements and show that the proposed learning mechanism based on fixed, broad, and random tuning of each neuron to the classification categories outperforms the biologically-motivated feedback alignment learning technique on the MNIST, CIFAR10, and SVHN datasets, approaching the performance of standard backpropagation. Our approach highlights a potential biological mechanism for the supervised, or task-dependent, learning of feature hierarchies. In addition, we show that it is well suited for learning deep networks in custom hardware where it can drastically reduce memory traffic and data communication overheads.
Tasks
Published 2017-11-17
URL http://arxiv.org/abs/1711.06756v1
PDF http://arxiv.org/pdf/1711.06756v1.pdf
PWC https://paperswithcode.com/paper/deep-supervised-learning-using-local-errors
Repo
Framework

MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings

Title MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings
Authors Arijit Biswas, Mukul Bhutani, Subhajit Sanyal
Abstract E-commerce websites such as Amazon, Alibaba, Flipkart, and Walmart sell billions of products. Machine learning (ML) algorithms involving products are often used to improve the customer experience and increase revenue, e.g., product similarity, recommendation, and price estimation. The products are required to be represented as features before training an ML algorithm. In this paper, we propose an approach called MRNet-Product2Vec for creating generic embeddings of products within an e-commerce ecosystem. We learn a dense and low-dimensional embedding where a diverse set of signals related to a product are explicitly injected into its representation. We train a Discriminative Multi-task Bidirectional Recurrent Neural Network (RNN), where the input is a product title fed through a Bidirectional RNN and at the output, product labels corresponding to fifteen different tasks are predicted. The task set includes several intrinsic characteristics about a product such as price, weight, size, color, popularity, and material. We evaluate the proposed embedding quantitatively and qualitatively. We demonstrate that they are almost as good as sparse and extremely high-dimensional TF-IDF representation in spite of having less than 3% of the TF-IDF dimension. We also use a multimodal autoencoder for comparing products from different language-regions and show preliminary yet promising qualitative results.
Tasks
Published 2017-09-21
URL http://arxiv.org/abs/1709.07534v1
PDF http://arxiv.org/pdf/1709.07534v1.pdf
PWC https://paperswithcode.com/paper/mrnet-product2vec-a-multi-task-recurrent
Repo
Framework

3D Morphable Models as Spatial Transformer Networks

Title 3D Morphable Models as Spatial Transformer Networks
Authors Anil Bas, Patrik Huber, William A. P. Smith, Muhammad Awais, Josef Kittler
Abstract In this paper, we show how a 3D Morphable Model (i.e. a statistical model of the 3D shape of a class of objects such as faces) can be used to spatially transform input data as a module (a 3DMM-STN) within a convolutional neural network. This is an extension of the original spatial transformer network in that we are able to interpret and normalise 3D pose changes and self-occlusions. The trained localisation part of the network is independently useful since it learns to fit a 3D morphable model to a single image. We show that the localiser can be trained using only simple geometric loss functions on a relatively small dataset yet is able to perform robust normalisation on highly uncontrolled images including occlusion, self-occlusion and large pose changes.
Tasks
Published 2017-08-23
URL http://arxiv.org/abs/1708.07199v1
PDF http://arxiv.org/pdf/1708.07199v1.pdf
PWC https://paperswithcode.com/paper/3d-morphable-models-as-spatial-transformer
Repo
Framework

Visual Question Answering as a Meta Learning Task

Title Visual Question Answering as a Meta Learning Task
Authors Damien Teney, Anton van den Hengel
Abstract The predominant approach to Visual Question Answering (VQA) demands that the model represents within its weights all of the information required to answer any question about any image. Learning this information from any real training set seems unlikely, and representing it in a reasonable number of weights doubly so. We propose instead to approach VQA as a meta learning task, thus separating the question answering method from the information required. At test time, the method is provided with a support set of example questions/answers, over which it reasons to resolve the given question. The support set is not fixed and can be extended without retraining, thereby expanding the capabilities of the model. To exploit this dynamically provided information, we adapt a state-of-the-art VQA model with two techniques from the recent meta learning literature, namely prototypical networks and meta networks. Experiments demonstrate the capability of the system to learn to produce completely novel answers (i.e. never seen during training) from examples provided at test time. In comparison to the existing state of the art, the proposed method produces qualitatively distinct results with higher recall of rare answers, and a better sample efficiency that allows training with little initial data. More importantly, it represents an important step towards vision-and-language methods that can learn and reason on-the-fly.
Tasks Meta-Learning, Question Answering, Visual Question Answering
Published 2017-11-22
URL http://arxiv.org/abs/1711.08105v1
PDF http://arxiv.org/pdf/1711.08105v1.pdf
PWC https://paperswithcode.com/paper/visual-question-answering-as-a-meta-learning
Repo
Framework
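
The prototype idea mentioned in the abstract above can be sketched as nearest-prototype classification over a support set. This is a simplified illustration under stated assumptions: the embeddings are hypothetical fixed vectors (in the paper they would come from a VQA model), and `build_prototypes` and `nearest_answer` are names invented here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Embedding = Sequence[float]


def build_prototypes(
    support: Sequence[Tuple[Embedding, str]]
) -> Dict[str, List[float]]:
    """Average the support embeddings of each answer into one prototype."""
    grouped: Dict[str, List[Embedding]] = defaultdict(list)
    for embedding, answer in support:
        grouped[answer].append(embedding)
    return {
        answer: [sum(column) / len(embeddings) for column in zip(*embeddings)]
        for answer, embeddings in grouped.items()
    }


def nearest_answer(query: Embedding, prototypes: Dict[str, List[float]]) -> str:
    """Return the answer whose prototype is closest in squared Euclidean distance."""
    def squared_distance(a: Embedding, b: Embedding) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(prototypes, key=lambda ans: squared_distance(query, prototypes[ans]))
```

Because the prototypes are rebuilt from the support set, appending an example with a previously unseen answer makes that answer available without any retraining, which mirrors the extensibility the abstract describes.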

Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions

Title Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions
Authors Kleomenis Katevas, Ilias Leontiadis, Martin Pielot, Joan Serrà
Abstract We present a practical approach for processing mobile sensor time series data for continual deep learning predictions. The approach comprises data cleaning, normalization, capping, time-based compression, and finally classification with a recurrent neural network. We demonstrate the effectiveness of the approach in a case study with 279 participants. On the basis of sparse sensor events, the network continually predicts whether the participants would attend to a notification within 10 minutes. Compared to a random baseline, the classifier achieves a 40% performance increase (AUC of 0.702) on a withheld test set. This approach makes it possible to forgo resource-intensive, domain-specific, error-prone feature engineering, which may drastically increase the applicability of machine learning to mobile phone sensor data.
Tasks Feature Engineering, Time Series
Published 2017-05-17
URL http://arxiv.org/abs/1705.06224v1
PDF http://arxiv.org/pdf/1705.06224v1.pdf
PWC https://paperswithcode.com/paper/practical-processing-of-mobile-sensor-data
Repo
Framework

DNN adaptation by automatic quality estimation of ASR hypotheses

Title DNN adaptation by automatic quality estimation of ASR hypotheses
Authors Daniele Falavigna, Marco Matassoni, Shahab Jalalvand, Matteo Negri, Marco Turchi
Abstract In this paper we propose to exploit the automatic Quality Estimation (QE) of ASR hypotheses to perform the unsupervised adaptation of a deep neural network modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of “good quality” instances based on the word error rate (WER) scores predicted by a QE component. To validate this hypothesis, we run several experiments on the evaluation data sets released for the CHiME-3 challenge. First, we operate in oracle conditions in which manual transcriptions of the evaluation data are available, thus allowing us to compute the “true” sentence WER. In this scenario, we perform the adaptation with variable amounts of data, which are characterised by different levels of quality. Then, we move to realistic conditions in which the manual transcriptions of the evaluation data are not available. In this case, the adaptation is performed on data selected according to the WER scores “predicted” by a QE component. Our results indicate that: i) QE predictions allow us to closely approximate the adaptation results obtained in oracle conditions, and ii) the overall ASR performance based on the proposed QE-driven adaptation method is significantly better than the strong, most recent, CHiME-3 baseline.
Tasks
Published 2017-02-06
URL http://arxiv.org/abs/1702.01714v1
PDF http://arxiv.org/pdf/1702.01714v1.pdf
PWC https://paperswithcode.com/paper/dnn-adaptation-by-automatic-quality
Repo
Framework
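
The data-selection step described in the abstract above (keeping only hypotheses whose predicted WER indicates good quality) can be sketched as a simple filter. The `Hypothesis` record, the `select_adaptation_set` helper, and the threshold value are assumptions for illustration; the abstract does not state the selection rule actually used.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    """A hypothetical record for one automatically transcribed utterance."""
    utterance_id: str
    transcript: str
    predicted_wer: float  # WER score predicted by a QE component, in [0, 1]


def select_adaptation_set(
    hypotheses: Sequence[Hypothesis], max_wer: float
) -> List[Hypothesis]:
    """Keep hypotheses whose predicted WER does not exceed `max_wer`."""
    return [h for h in hypotheses if h.predicted_wer <= max_wer]
```

The selected transcripts would then serve as adaptation targets for the acoustic model, while the remainder of the evaluation data is left out of the adaptation.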

Improved Linear Embeddings via Lagrange Duality

Title Improved Linear Embeddings via Lagrange Duality
Authors Kshiteej Sheth, Dinesh Garg, Anirban Dasgupta
Abstract Near-isometric orthogonal embeddings to lower dimensions are a fundamental tool in data science and machine learning. In this paper, we present the construction of such embeddings that minimize the maximum distortion for a given set of points. We formulate the problem as a non-convex constrained optimization problem. We first construct a primal relaxation and then use the theory of Lagrange duality to create a dual relaxation. We also suggest a polynomial-time algorithm based on the theory of convex optimization to provably solve the dual relaxation. We provide a theoretical upper bound on the approximation guarantees for our algorithm, which depends only on the spectral properties of the dataset. We experimentally demonstrate the superiority of our algorithm compared to baselines in terms of scalability and the ability to achieve lower distortion.
Tasks
Published 2017-11-30
URL http://arxiv.org/abs/1711.11527v2
PDF http://arxiv.org/pdf/1711.11527v2.pdf
PWC https://paperswithcode.com/paper/improved-linear-embeddings-via-lagrange
Repo
Framework
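
The quantity being minimized, the maximum distortion of a linear embedding over a point set, can be computed directly for a given projection. The sketch below assumes one common definition (the largest relative change in squared pairwise distance); the abstract does not spell out the exact definition, and `max_distortion` is a hypothetical helper that only evaluates a projection, without optimizing it.

```python
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def max_distortion(points: Sequence[Vector], projection: Sequence[Vector]) -> float:
    """Largest | ||P(x - y)||^2 / ||x - y||^2 - 1 | over all pairs of points.

    `projection` is a list of rows of P, assumed here to be orthonormal.
    """
    worst = 0.0
    for x, y in combinations(points, 2):
        diff = [a - b for a, b in zip(x, y)]
        norm_sq = sum(d * d for d in diff)
        if norm_sq == 0.0:
            continue  # identical points carry no distance to preserve
        projected = [sum(r * d for r, d in zip(row, diff)) for row in projection]
        proj_sq = sum(p * p for p in projected)
        worst = max(worst, abs(proj_sq / norm_sq - 1.0))
    return worst
```

For example, dropping the third coordinate preserves all distances between points that lie in the first two coordinates, but halves the squared distance between the first and third standard basis vectors.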

Distributed Statistical Estimation and Rates of Convergence in Normal Approximation

Title Distributed Statistical Estimation and Rates of Convergence in Normal Approximation
Authors Stanislav Minsker, Nate Strawn
Abstract This paper presents a class of new algorithms for distributed statistical estimation that exploit the divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important characteristic for large distributed systems. We establish connections between the performance of these distributed algorithms and the rates of convergence in normal approximation, and prove non-asymptotic deviation guarantees, as well as limit theorems, for the resulting estimators. Our techniques are illustrated through several examples: in particular, we obtain new results for the median-of-means estimator, and provide performance guarantees for distributed maximum likelihood estimation.
Tasks
Published 2017-04-09
URL http://arxiv.org/abs/1704.02658v3
PDF http://arxiv.org/pdf/1704.02658v3.pdf
PWC https://paperswithcode.com/paper/distributed-statistical-estimation-and-rates
Repo
Framework
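
The median-of-means estimator mentioned in the abstract above is a simple instance of the divide-and-conquer strategy: split the sample into blocks, average each block, and take the median of the block means. The following is a minimal sketch; the interleaved block assignment and the function name are illustrative choices, not taken from the paper.

```python
import statistics
from typing import Sequence


def median_of_means(data: Sequence[float], num_blocks: int) -> float:
    """Estimate the mean by taking the median of `num_blocks` block means.

    The sample is split into interleaved blocks (element i goes to block
    i % num_blocks), which assumes the data are in no meaningful order.
    """
    if not 1 <= num_blocks <= len(data):
        raise ValueError("num_blocks must be between 1 and len(data)")
    blocks = [list(data[i::num_blocks]) for i in range(num_blocks)]
    return statistics.median(statistics.fmean(block) for block in blocks)
```

The robustness benefit is visible on a sample with one gross outlier: the outlier corrupts only the block it falls into, so the median of the block means is unaffected while the plain mean is not.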

Lecture video indexing using boosted margin maximizing neural networks

Title Lecture video indexing using boosted margin maximizing neural networks
Authors Di Ma, Xi Zhang, Xu Ouyang, Gady Agam
Abstract This paper presents a novel approach for lecture video indexing using a boosted deep convolutional neural network system. The indexing is performed by matching high-quality slide images, for which text is either known or extracted, to lower-resolution video frames with possible noise, perspective distortion, and occlusions. We propose a deep neural network integrated with a boosting framework composed of two sub-networks targeting feature extraction and similarity determination to perform the matching. The trained network takes as input a pair consisting of a slide image and a candidate video frame image and produces the similarity between them. A boosting framework is integrated into our proposed network during the training process. Experimental results show that the proposed approach is much more capable of handling occlusion, spatial transformations, and other types of noise when compared with known approaches.
Tasks
Published 2017-12-02
URL http://arxiv.org/abs/1712.00575v1
PDF http://arxiv.org/pdf/1712.00575v1.pdf
PWC https://paperswithcode.com/paper/lecture-video-indexing-using-boosted-margin
Repo
Framework

Benchmark Environments for Multitask Learning in Continuous Domains

Title Benchmark Environments for Multitask Learning in Continuous Domains
Authors Peter Henderson, Wei-Di Chang, Florian Shkurti, Johanna Hansen, David Meger, Gregory Dudek
Abstract As demand drives systems to generalize to various domains and problems, the study of multitask, transfer and lifelong learning has become an increasingly important pursuit. In discrete domains, performance on the Atari game suite has emerged as the de facto benchmark for assessing multitask learning. However, in continuous domains there is a lack of agreement on standard multitask evaluation environments which makes it difficult to compare different approaches fairly. In this work, we describe a benchmark set of tasks that we have developed in an extendable framework based on OpenAI Gym. We run a simple baseline using Trust Region Policy Optimization and release the framework publicly to be expanded and used for the systematic comparison of multitask, transfer, and lifelong learning in continuous domains.
Tasks
Published 2017-08-14
URL http://arxiv.org/abs/1708.04352v1
PDF http://arxiv.org/pdf/1708.04352v1.pdf
PWC https://paperswithcode.com/paper/benchmark-environments-for-multitask-learning
Repo
Framework