Paper Group ANR 25
Deep Detection of People and their Mobility Aids for a Hospital Robot. Contour and Centreline Tracking of Vessels from Angiograms using the Classical Image Processing Techniques. Learning to Learn from Weak Supervision by Full Supervision. Learning Relevant Features of Data with Multi-scale Tensor Networks. Exploiting the potential of unlabeled end …
Deep Detection of People and their Mobility Aids for a Hospital Robot
Title | Deep Detection of People and their Mobility Aids for a Hospital Robot |
Authors | Andres Vasquez, Marina Kollmitz, Andreas Eitel, Wolfram Burgard |
Abstract | Robots operating in populated environments encounter many different types of people, some of whom might have an advanced need for cautious interaction, because of physical impairments or their advanced age. Robots therefore need to recognize such advanced demands to provide appropriate assistance, guidance or other forms of support. In this paper, we propose a depth-based perception pipeline that estimates the position and velocity of people in the environment and categorizes them according to the mobility aids they use: pedestrian, person in wheelchair, person in a wheelchair with a person pushing them, person with crutches and person using a walker. We present a fast region proposal method that feeds a Region-based Convolutional Network (Fast R-CNN). With this, we speed up the object detection process by a factor of seven compared to a dense sliding window approach. We furthermore propose a probabilistic position, velocity and class estimator to smooth the CNN’s detections and account for occlusions and misclassifications. In addition, we introduce a new hospital dataset with over 17,000 annotated RGB-D images. Extensive experiments confirm that our pipeline successfully keeps track of people and their mobility aids, even in challenging situations with multiple people from different categories and frequent occlusions. Videos of our experiments and the dataset are available at http://www2.informatik.uni-freiburg.de/~kollmitz/MobilityAids |
Tasks | Object Detection |
Published | 2017-08-02 |
URL | http://arxiv.org/abs/1708.00674v1 |
PDF | http://arxiv.org/pdf/1708.00674v1.pdf |
PWC | https://paperswithcode.com/paper/deep-detection-of-people-and-their-mobility |
Repo | |
Framework | |
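The abstract above mentions a probabilistic position and velocity estimator that smooths the CNN's per-frame detections. As a rough illustration of that kind of smoothing, and not the authors' estimator (which also tracks class probabilities and handles occlusions), here is a minimal one-dimensional constant-velocity Kalman filter. The class name and the noise values `q` and `r` are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKalman1D:
    """Hypothetical 1-D tracker with state [position, velocity]; only position is measured."""
    x: float = 0.0     # position estimate
    v: float = 0.0     # velocity estimate
    p_xx: float = 1.0  # entries of the symmetric 2x2 state covariance P
    p_xv: float = 0.0
    p_vv: float = 1.0
    q: float = 0.01    # process noise added to the diagonal of P (assumed value)
    r: float = 0.25    # measurement noise variance on position (assumed value)

    def predict(self, dt: float) -> None:
        # Motion model F = [[1, dt], [0, 1]]: x <- x + v*dt, P <- F P F^T + Q
        self.x += self.v * dt
        p_xx = self.p_xx + 2 * dt * self.p_xv + dt * dt * self.p_vv
        p_xv = self.p_xv + dt * self.p_vv
        self.p_xx, self.p_xv = p_xx + self.q, p_xv
        self.p_vv += self.q

    def update(self, z: float) -> None:
        # Measurement model H = [1, 0]
        s = self.p_xx + self.r                   # innovation variance
        k_x, k_v = self.p_xx / s, self.p_xv / s  # Kalman gain
        innovation = z - self.x
        self.x += k_x * innovation
        self.v += k_v * innovation
        # P <- (I - K H) P, written out for the symmetric 2x2 case
        self.p_vv -= k_v * self.p_xv
        self.p_xv *= 1 - k_x
        self.p_xx *= 1 - k_x


# Usage: a detection moving at 1 m/s, observed once per second.
track = ConstantVelocityKalman1D()
for t in range(1, 51):
    track.predict(dt=1.0)
    track.update(z=float(t))
```

In a multi-person setting, one such filter (extended to 2-D) would typically run per track, with detections associated to tracks before each update.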
Contour and Centreline Tracking of Vessels from Angiograms using the Classical Image Processing Techniques
Title | Contour and Centreline Tracking of Vessels from Angiograms using the Classical Image Processing Techniques |
Authors | Tache Irina Andra |
Abstract | This article deals with the problem of vessel edge and centerline detection using classical image processing techniques, chosen for their simplicity and ease of implementation. The method is divided into four steps: vessel enhancement using the non-linear filter proposed by Frangi, thresholding using Otsu's method, contour detection using the Canny edge detector (chosen for its good performance on small vessels), and morphological skeletonisation. The algorithms are tested on real data collected from a cardiac catheterization laboratory, and the method is accurate for images with good spatial resolution (512×512). The output image can be used for further processing, such as finding the vessel length or radius. |
Tasks | Contour Detection |
Published | 2017-06-13 |
URL | http://arxiv.org/abs/1707.03710v1 |
PDF | http://arxiv.org/pdf/1707.03710v1.pdf |
PWC | https://paperswithcode.com/paper/contour-and-centreline-tracking-of-vessels |
Repo | |
Framework | |
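Of the four steps listed in the vessel-tracking abstract, Otsu thresholding is the easiest to show in isolation. The following is a generic, pure-Python sketch of Otsu's method on 8-bit intensities (it picks the threshold that maximises the between-class variance). It is not the paper's code, and the function name is hypothetical; in the described pipeline it would operate on the Frangi-enhanced image, assumed here to be rescaled to 0..255.

```python
from typing import Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return t maximising between-class variance; pixels <= t form the background class."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0  # pixel count and intensity sum of the class {value <= t}
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue  # one class is empty, so there is no split to score
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to the constant factor 1 / total^2
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Toy "vesselness" values: dark background and a bright vessel, already scaled to 0..255.
image = [12, 15, 10, 14, 11, 200, 210, 205, 13, 198]
t = otsu_threshold(image)
vessel_mask = [p > t for p in image]
```

The resulting binary mask is what the later Canny and skeletonisation steps would consume.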
Learning to Learn from Weak Supervision by Full Supervision
Title | Learning to Learn from Weak Supervision by Full Supervision |
Authors | Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps |
Abstract | In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels. In our proposed model, we train two neural networks: a target network (the learner) and a confidence network (the meta-learner). The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to control the magnitude of the gradient updates to the target network using the scores provided by the second, confidence network, which is trained on a small amount of supervised data. This prevents weight updates computed from noisy labels from harming the quality of the target network model. |
Tasks | |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11383v1 |
PDF | http://arxiv.org/pdf/1711.11383v1.pdf |
PWC | https://paperswithcode.com/paper/learning-to-learn-from-weak-supervision-by |
Repo | |
Framework | |
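The central idea of the weak-supervision abstract is that a confidence score scales the gradient update computed from each weakly labelled example. A toy sketch, assuming a one-parameter linear model with squared loss and confidence scores that are simply given (in the paper they come from a separately trained confidence network, which is not modelled here); all names are hypothetical.

```python
from typing import Sequence


def confidence_weighted_epoch(w: float, xs: Sequence[float], weak_ys: Sequence[float],
                              confidences: Sequence[float], lr: float = 0.1) -> float:
    """One SGD pass over y ~ w * x, scaling each example's step by its confidence in [0, 1]."""
    for x, y, c in zip(xs, weak_ys, confidences):
        grad = 2.0 * (w * x - y) * x  # derivative of the squared error (w*x - y)^2 w.r.t. w
        w -= lr * c * grad            # a low confidence shrinks the update from a likely-noisy label
    return w


# Toy data: the true relation is y = 2x, but the last weak label is badly wrong.
xs = [1.0, 1.0, 1.0, 1.0]
weak_ys = [2.0, 2.0, 2.0, -10.0]
confidences = [1.0, 1.0, 1.0, 0.0]  # assumed output of a confidence network

w_weighted, w_plain = 0.0, 0.0
for _ in range(100):
    w_weighted = confidence_weighted_epoch(w_weighted, xs, weak_ys, confidences)
    w_plain = confidence_weighted_epoch(w_plain, xs, weak_ys, [1.0] * len(xs))
```

With the noisy label down-weighted, `w_weighted` settles at the true slope, while the unweighted run keeps being pulled towards the wrong label.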
Learning Relevant Features of Data with Multi-scale Tensor Networks
Title | Learning Relevant Features of Data with Multi-scale Tensor Networks |
Authors | E. M. Stoudenmire |
Abstract | Inspired by coarse-graining approaches used in physics, we show how similar algorithms can be adapted for data. The resulting algorithms are based on layered tree tensor networks and scale linearly with both the dimension of the input and the training set size. Computing most of the layers with an unsupervised algorithm, then optimizing just the top layer for supervised classification of the MNIST and fashion-MNIST data sets gives very good results. We also discuss mixing a prior guess for supervised weights together with an unsupervised representation of the data, yielding a smaller number of features nevertheless able to give good performance. |
Tasks | Tensor Networks |
Published | 2017-12-31 |
URL | http://arxiv.org/abs/1801.00315v1 |
PDF | http://arxiv.org/pdf/1801.00315v1.pdf |
PWC | https://paperswithcode.com/paper/learning-relevant-features-of-data-with-multi |
Repo | |
Framework | |
Exploiting the potential of unlabeled endoscopic video data with self-supervised learning
Title | Exploiting the potential of unlabeled endoscopic video data with self-supervised learning |
Authors | Tobias Ross, David Zimmerer, Anant Vemuri, Fabian Isensee, Manuel Wiesenfarth, Sebastian Bodenstedt, Fabian Both, Philip Kessler, Martin Wagner, Beat Müller, Hannes Kenngott, Stefanie Speidel, Annette Kopp-Schneider, Klaus Maier-Hein, Lena Maier-Hein |
Abstract | Surgical data science is a new research field that aims to observe all aspects of the patient treatment process in order to provide the right assistance at the right time. Due to the breakthrough successes of deep learning-based solutions for automatic image annotation, the availability of reference annotations for algorithm training is becoming a major bottleneck in the field. The purpose of this paper was to investigate the concept of self-supervised learning to address this issue. Our approach is guided by the hypothesis that unlabeled video data can be used to learn a representation of the target domain that boosts the performance of state-of-the-art machine learning algorithms when used for pre-training. The core of the method is an auxiliary task based on raw endoscopic video data of the target domain that is used to initialize the convolutional neural network (CNN) for the target task. In this paper, we propose the re-colorization of medical images with a generative adversarial network (GAN)-based architecture as the auxiliary task. A variant of the method involves a second pre-training step based on labeled data for the target task from a related domain. We validate both variants using medical instrument segmentation as the target task. The proposed approach can be used to radically reduce the manual annotation effort involved in training CNNs. Compared to the baseline approach of generating annotated data from scratch, our method reduces the number of labeled images required by up to 75% in exploratory experiments, without sacrificing performance. Our method also outperforms alternative methods for CNN pre-training, such as pre-training on publicly available non-medical or medical data using the target task (in this instance: segmentation). As it makes efficient use of available (non-)public and (un-)labeled data, the approach has the potential to become a valuable tool for CNN (pre-)training. |
Tasks | Colorization, Instance Segmentation, Semantic Segmentation |
Published | 2017-11-27 |
URL | http://arxiv.org/abs/1711.09726v3 |
PDF | http://arxiv.org/pdf/1711.09726v3.pdf |
PWC | https://paperswithcode.com/paper/exploiting-the-potential-of-unlabeled |
Repo | |
Framework | |
Deep supervised learning using local errors
Title | Deep supervised learning using local errors |
Authors | Hesham Mostafa, Vishwajith Ramesh, Gert Cauwenberghs |
Abstract | Error backpropagation is a highly effective mechanism for learning high-quality hierarchical features in deep networks. Updating the features or weights in one layer, however, requires waiting for the propagation of error signals from higher layers. Learning using delayed and non-local errors makes it hard to reconcile backpropagation with the learning mechanisms observed in biological neural networks, as it requires the neurons to maintain a memory of the input long enough for the higher-layer errors to arrive. In this paper, we propose an alternative learning mechanism where errors are generated locally in each layer using fixed, random auxiliary classifiers. Lower layers can thus be trained independently of higher layers, and training can either proceed layer by layer or simultaneously in all layers using local error information. We address biological plausibility concerns such as weight symmetry requirements and show that the proposed learning mechanism, based on fixed, broad, and random tuning of each neuron to the classification categories, outperforms the biologically-motivated feedback alignment learning technique on the MNIST, CIFAR10, and SVHN datasets, approaching the performance of standard backpropagation. Our approach highlights a potential biological mechanism for the supervised, or task-dependent, learning of feature hierarchies. In addition, we show that it is well suited for learning deep networks in custom hardware, where it can drastically reduce memory traffic and data communication overheads. |
Tasks | |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06756v1 |
PDF | http://arxiv.org/pdf/1711.06756v1.pdf |
PWC | https://paperswithcode.com/paper/deep-supervised-learning-using-local-errors |
Repo | |
Framework | |
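To make the local-error mechanism in the abstract above concrete, here is a toy, pure-Python sketch of training one hidden layer from a local error only. A fixed random classifier matrix `B` maps the layer's activations to class scores, the cross-entropy error is computed there, and the gradient reaches the layer's weights `W` only through `B`, which is never updated. The tanh nonlinearity, softmax loss, Gaussian initialisation and all names are assumptions for illustration; the paper's architectures, hardware considerations and hyperparameters are not reproduced.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward(W: Matrix, B: Matrix, x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hidden activations h = tanh(W x) and class probabilities softmax(B h)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]
    z = [sum(b * hj for b, hj in zip(row, h)) for row in B]
    m = max(z)  # subtract the max for a numerically stable softmax
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return h, [ei / s for ei in e]


def local_loss(W: Matrix, B: Matrix, xs, ys) -> float:
    """Mean cross-entropy of the fixed random classifier on top of this layer."""
    return sum(-math.log(forward(W, B, x)[1][y]) for x, y in zip(xs, ys)) / len(xs)


def train_layer_locally(xs, ys, n_hidden: int, n_classes: int,
                        lr: float = 0.05, epochs: int = 100, seed: int = 0):
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.5) for _ in xs[0]] for _ in range(n_hidden)]          # trainable
    B = [[rng.gauss(0.0, 1.0) for _ in range(n_hidden)] for _ in range(n_classes)]  # fixed
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h, p = forward(W, B, x)
            dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]  # local error
            for j in range(n_hidden):
                # The error reaches hidden unit j only through the fixed column B[:, j]
                dpre = sum(B[k][j] * dz[k] for k in range(n_classes)) * (1 - h[j] ** 2)
                for i, xi in enumerate(x):
                    W[j][i] -= lr * dpre * xi
    return W, B


xs = [[1.0, 0.2], [0.9, -0.1], [-1.0, 0.1], [-0.8, -0.3]]
ys = [0, 0, 1, 1]
W, B = train_layer_locally(xs, ys, n_hidden=8, n_classes=2)
```

Because no error signal comes from layers above, each layer in a deeper stack could run this loop independently, which is the property the abstract highlights.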
MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings
Title | MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings |
Authors | Arijit Biswas, Mukul Bhutani, Subhajit Sanyal |
Abstract | E-commerce websites such as Amazon, Alibaba, Flipkart, and Walmart sell billions of products. Machine learning (ML) algorithms involving products are often used to improve the customer experience and increase revenue, e.g., product similarity, recommendation, and price estimation. Products must be represented as features before an ML algorithm can be trained on them. In this paper, we propose an approach called MRNet-Product2Vec for creating generic embeddings of products within an e-commerce ecosystem. We learn a dense and low-dimensional embedding where a diverse set of signals related to a product is explicitly injected into its representation. We train a Discriminative Multi-task Bidirectional Recurrent Neural Network (RNN), where the input is a product title fed through a Bidirectional RNN and at the output, product labels corresponding to fifteen different tasks are predicted. The task set includes several intrinsic characteristics of a product such as price, weight, size, color, popularity, and material. We evaluate the proposed embedding quantitatively and qualitatively. We demonstrate that the embeddings are almost as good as a sparse and extremely high-dimensional TF-IDF representation, despite having less than 3% of its dimensionality. We also use a multimodal autoencoder for comparing products from different language-regions and show preliminary yet promising qualitative results. |
Tasks | |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07534v1 |
PDF | http://arxiv.org/pdf/1709.07534v1.pdf |
PWC | https://paperswithcode.com/paper/mrnet-product2vec-a-multi-task-recurrent |
Repo | |
Framework | |
3D Morphable Models as Spatial Transformer Networks
Title | 3D Morphable Models as Spatial Transformer Networks |
Authors | Anil Bas, Patrik Huber, William A. P. Smith, Muhammad Awais, Josef Kittler |
Abstract | In this paper, we show how a 3D Morphable Model (i.e. a statistical model of the 3D shape of a class of objects such as faces) can be used to spatially transform input data as a module (a 3DMM-STN) within a convolutional neural network. This is an extension of the original spatial transformer network in that we are able to interpret and normalise 3D pose changes and self-occlusions. The trained localisation part of the network is independently useful since it learns to fit a 3D morphable model to a single image. We show that the localiser can be trained using only simple geometric loss functions on a relatively small dataset yet is able to perform robust normalisation on highly uncontrolled images including occlusion, self-occlusion and large pose changes. |
Tasks | |
Published | 2017-08-23 |
URL | http://arxiv.org/abs/1708.07199v1 |
PDF | http://arxiv.org/pdf/1708.07199v1.pdf |
PWC | https://paperswithcode.com/paper/3d-morphable-models-as-spatial-transformer |
Repo | |
Framework | |
Visual Question Answering as a Meta Learning Task
Title | Visual Question Answering as a Meta Learning Task |
Authors | Damien Teney, Anton van den Hengel |
Abstract | The predominant approach to Visual Question Answering (VQA) demands that the model represents within its weights all of the information required to answer any question about any image. Learning this information from any real training set seems unlikely, and representing it in a reasonable number of weights doubly so. We propose instead to approach VQA as a meta learning task, thus separating the question answering method from the information required. At test time, the method is provided with a support set of example questions/answers, over which it reasons to resolve the given question. The support set is not fixed and can be extended without retraining, thereby expanding the capabilities of the model. To exploit this dynamically provided information, we adapt a state-of-the-art VQA model with two techniques from the recent meta learning literature, namely prototypical networks and meta networks. Experiments demonstrate the capability of the system to learn to produce completely novel answers (i.e. never seen during training) from examples provided at test time. In comparison to the existing state of the art, the proposed method produces qualitatively distinct results with higher recall of rare answers, and a better sample efficiency that allows training with little initial data. More importantly, it represents an important step towards vision-and-language methods that can learn and reason on-the-fly. |
Tasks | Meta-Learning, Question Answering, Visual Question Answering |
Published | 2017-11-22 |
URL | http://arxiv.org/abs/1711.08105v1 |
PDF | http://arxiv.org/pdf/1711.08105v1.pdf |
PWC | https://paperswithcode.com/paper/visual-question-answering-as-a-meta-learning |
Repo | |
Framework | |
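The VQA abstract above names prototypical networks as one of the two meta-learning techniques it adapts. The core step of that technique can be sketched as follows: each answer's prototype is the mean embedding of its support examples, and a query is assigned to the nearest prototype. The embeddings below are hand-written toy vectors standing in for learned question/image features, and the function names are hypothetical; this is the generic technique, not the authors' VQA model.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple


def build_prototypes(support: Sequence[Tuple[List[float], str]]) -> Dict[str, List[float]]:
    """Average the embeddings of all support examples that share an answer label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for emb, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(emb)
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def classify(query: List[float], prototypes: Dict[str, List[float]]) -> str:
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))


support = [([0.0, 0.0], "no"), ([0.2, 0.0], "no"), ([1.0, 1.0], "yes"), ([1.0, 0.8], "yes")]
# The support set can grow at test time with a never-seen answer, with no retraining.
support.append(([5.0, 5.0], "two"))
prototypes = build_prototypes(support)
answer = classify([4.8, 5.1], prototypes)
```

Extending the model's answer vocabulary only means adding support examples and recomputing prototypes, which mirrors the abstract's point that the support set can grow without retraining.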
Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions
Title | Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions |
Authors | Kleomenis Katevas, Ilias Leontiadis, Martin Pielot, Joan Serrà |
Abstract | We present a practical approach for processing mobile sensor time series data for continual deep learning predictions. The approach comprises data cleaning, normalization, capping, time-based compression, and finally classification with a recurrent neural network. We demonstrate the effectiveness of the approach in a case study with 279 participants. On the basis of sparse sensor events, the network continually predicts whether the participants would attend to a notification within 10 minutes. Compared to a random baseline, the classifier achieves a 40% performance increase (AUC of 0.702) on a withheld test set. This approach makes it possible to forgo resource-intensive, domain-specific, error-prone feature engineering, which may drastically increase the applicability of machine learning to mobile phone sensor data. |
Tasks | Feature Engineering, Time Series |
Published | 2017-05-17 |
URL | http://arxiv.org/abs/1705.06224v1 |
PDF | http://arxiv.org/pdf/1705.06224v1.pdf |
PWC | https://paperswithcode.com/paper/practical-processing-of-mobile-sensor-data |
Repo | |
Framework | |
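The mobile-sensor abstract lists its preprocessing stages (cleaning, normalization, capping, time-based compression) without details. The sketch below shows one plausible reading of those stages for a single stream of timestamped sensor events, in the order the abstract gives them. The z-score normalization, the cap of 3, the 60-second bucket width and the function name are all assumptions, not the paper's settings.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

Event = Tuple[float, Optional[float]]  # (timestamp in seconds, sensor value or None)


def preprocess(events: List[Event], cap: float = 3.0, bucket_s: float = 60.0) -> Dict[int, float]:
    # 1. Cleaning: drop missing or non-finite readings.
    clean = [(t, v) for t, v in events if v is not None and math.isfinite(v)]
    if not clean:
        return {}
    values = [v for _, v in clean]
    # 2. Normalization: z-score with the stream's own mean and std (std 0 falls back to 1).
    mu, sigma = mean(values), pstdev(values) or 1.0
    # 3. Capping: clip normalized values to [-cap, cap] to limit the effect of outliers.
    capped = [(t, max(-cap, min(cap, (v - mu) / sigma))) for t, v in clean]
    # 4. Time-based compression: average all events that fall into the same time bucket.
    buckets: Dict[int, List[float]] = {}
    for t, v in capped:
        buckets.setdefault(int(t // bucket_s), []).append(v)
    return {b: mean(vs) for b, vs in sorted(buckets.items())}


events = [(0.0, 1.0), (10.0, None), (30.0, 3.0), (70.0, float("nan")), (90.0, 2.0)]
features = preprocess(events)  # {bucket index: mean capped z-score}
```

The compressed per-bucket values would then form the input sequence for the recurrent classifier.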
DNN adaptation by automatic quality estimation of ASR hypotheses
Title | DNN adaptation by automatic quality estimation of ASR hypotheses |
Authors | Daniele Falavigna, Marco Matassoni, Shahab Jalalvand, Matteo Negri, Marco Turchi |
Abstract | In this paper we propose to exploit the automatic Quality Estimation (QE) of ASR hypotheses to perform the unsupervised adaptation of a deep neural network modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of “good quality” instances based on the word error rate (WER) scores predicted by a QE component. To validate this hypothesis, we run several experiments on the evaluation data sets released for the CHiME-3 challenge. First, we operate in oracle conditions in which manual transcriptions of the evaluation data are available, thus allowing us to compute the “true” sentence WER. In this scenario, we perform the adaptation with variable amounts of data, which are characterised by different levels of quality. Then, we move to realistic conditions in which the manual transcriptions of the evaluation data are not available. In this case, the adaptation is performed on data selected according to the WER scores “predicted” by a QE component. Our results indicate that: i) QE predictions allow us to closely approximate the adaptation results obtained in oracle conditions, and ii) the overall ASR performance based on the proposed QE-driven adaptation method is significantly better than the strong, most recent, CHiME-3 baseline. |
Tasks | |
Published | 2017-02-06 |
URL | http://arxiv.org/abs/1702.01714v1 |
PDF | http://arxiv.org/pdf/1702.01714v1.pdf |
PWC | https://paperswithcode.com/paper/dnn-adaptation-by-automatic-quality |
Repo | |
Framework | |
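The ASR-adaptation abstract relies on two quantities: the "true" sentence WER, computable only in the oracle setting, and a predicted WER used to select adaptation data. The sketch below shows the standard word-level WER (edit distance divided by reference length) and a simple threshold-based selection. The `Hypothesis` type, the 0.2 cut-off, the function names and the toy transcripts are hypothetical; the abstract does not say how the subset size or cut-off was chosen, and the QE model itself is not modelled.

```python
from typing import List, NamedTuple


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length (oracle setting)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = cur
    return prev[-1] / max(len(ref), 1)


class Hypothesis(NamedTuple):
    utt_id: str
    transcript: str       # automatic transcription of an evaluation utterance
    predicted_wer: float  # WER estimated by a QE component, no reference needed


def select_for_adaptation(hyps: List[Hypothesis], max_wer: float = 0.2) -> List[Hypothesis]:
    """Keep 'good quality' hypotheses, best first, to use as adaptation targets."""
    return sorted((h for h in hyps if h.predicted_wer <= max_wer), key=lambda h: h.predicted_wer)


hyps = [
    Hypothesis("utt1", "turn left at the next junction", 0.35),
    Hypothesis("utt2", "call me back later", 0.05),
    Hypothesis("utt3", "the meeting starts at noon", 0.15),
]
selected = select_for_adaptation(hyps)
```

In the oracle condition, `wer` against the manual transcript would replace `predicted_wer` as the selection score.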
Improved Linear Embeddings via Lagrange Duality
Title | Improved Linear Embeddings via Lagrange Duality |
Authors | Kshiteej Sheth, Dinesh Garg, Anirban Dasgupta |
Abstract | Near-isometric orthogonal embeddings to lower dimensions are a fundamental tool in data science and machine learning. In this paper, we present the construction of such embeddings that minimizes the maximum distortion for a given set of points. We formulate the problem as a non-convex constrained optimization problem. We first construct a primal relaxation and then use the theory of Lagrange duality to create a dual relaxation. We also suggest a polynomial-time algorithm, based on the theory of convex optimization, that provably solves the dual relaxation. We provide a theoretical upper bound on the approximation guarantees for our algorithm, which depends only on the spectral properties of the dataset. We experimentally demonstrate the superiority of our algorithm compared to baselines in terms of scalability and the ability to achieve lower distortion. |
Tasks | |
Published | 2017-11-30 |
URL | http://arxiv.org/abs/1711.11527v2 |
PDF | http://arxiv.org/pdf/1711.11527v2.pdf |
PWC | https://paperswithcode.com/paper/improved-linear-embeddings-via-lagrange |
Repo | |
Framework | |
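The objective in the linear-embeddings abstract is the maximum distortion of a linear map over a point set. One common definition, assumed here since the paper's exact normalisation may differ, is the largest deviation of ‖A(x_i − x_j)‖² / ‖x_i − x_j‖² from 1 over all pairs. The small pure-Python helper below only evaluates this quantity for a given projection; it does not perform the paper's optimization, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def matvec(a: Sequence[Vector], x: Vector) -> List[float]:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def sq_norm(x: Vector) -> float:
    return sum(v * v for v in x)


def max_distortion(a: Sequence[Vector], points: Sequence[Vector]) -> float:
    """max over pairs of | ||A(x_i - x_j)||^2 / ||x_i - x_j||^2 - 1 |."""
    worst = 0.0
    for p, q in combinations(points, 2):
        d = [pi - qi for pi, qi in zip(p, q)]
        denom = sq_norm(d)
        if denom == 0:
            continue  # duplicate points define no direction
        worst = max(worst, abs(sq_norm(matvec(a, d)) / denom - 1.0))
    return worst


# Points in R^3 that all lie in the xy-plane.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
keep_xy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # orthonormal rows spanning the data
keep_x = [[1.0, 0.0, 0.0]]                    # collapses the y direction
```

Here `keep_xy` preserves every pairwise distance (distortion 0), whereas `keep_x` maps the secant along y to zero (distortion 1). The paper's contribution is choosing the orthogonal map that minimises this worst case.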
Distributed Statistical Estimation and Rates of Convergence in Normal Approximation
Title | Distributed Statistical Estimation and Rates of Convergence in Normal Approximation |
Authors | Stanislav Minsker, Nate Strawn |
Abstract | This paper presents a class of new algorithms for distributed statistical estimation that exploit the divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important characteristic for large distributed systems. We establish connections between the performance of these distributed algorithms and the rates of convergence in normal approximation, and prove non-asymptotic deviation guarantees, as well as limit theorems, for the resulting estimators. Our techniques are illustrated through several examples: in particular, we obtain new results for the median-of-means estimator, as well as provide performance guarantees for distributed maximum likelihood estimation. |
Tasks | |
Published | 2017-04-09 |
URL | http://arxiv.org/abs/1704.02658v3 |
PDF | http://arxiv.org/pdf/1704.02658v3.pdf |
PWC | https://paperswithcode.com/paper/distributed-statistical-estimation-and-rates |
Repo | |
Framework | |
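The median-of-means estimator mentioned at the end of the distributed-estimation abstract is a standard divide-and-conquer construction: split the sample into k disjoint blocks (which could live on different machines), average each block, and return the median of the block means. Below is a minimal sketch of the textbook version with a hypothetical function name. The paper studies more general estimators and their guarantees, which are not reproduced here.

```python
from statistics import mean, median
from typing import Sequence


def median_of_means(sample: Sequence[float], k: int) -> float:
    """Split `sample` into k contiguous, non-empty blocks, average each, return the median."""
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(sample)")
    bounds = [i * n // k for i in range(k + 1)]  # consecutive bounds differ by at least 1
    block_means = [mean(sample[bounds[i]:bounds[i + 1]]) for i in range(k)]
    return median(block_means)


# Nine well-behaved observations and one gross outlier.
sample = [1.0] * 9 + [1000.0]
robust = median_of_means(sample, k=5)  # the outlier corrupts only one of five block means
plain = median_of_means(sample, k=1)   # k = 1 reduces to the ordinary sample mean
```

The robustness the abstract refers to is visible here: a single corrupted block shifts only one block mean, which the median then ignores.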
Lecture video indexing using boosted margin maximizing neural networks
Title | Lecture video indexing using boosted margin maximizing neural networks |
Authors | Di Ma, Xi Zhang, Xu Ouyang, Gady Agam |
Abstract | This paper presents a novel approach for lecture video indexing using a boosted deep convolutional neural network system. The indexing is performed by matching high-quality slide images, for which text is either known or extracted, to lower-resolution video frames with possible noise, perspective distortion, and occlusions. We propose a deep neural network integrated with a boosting framework composed of two sub-networks targeting feature extraction and similarity determination to perform the matching. The trained network takes as input a pair consisting of a slide image and a candidate video frame and outputs their similarity. A boosting framework is integrated into our proposed network during the training process. Experimental results show that the proposed approach is much more capable of handling occlusion, spatial transformations, and other types of noise when compared with known approaches. |
Tasks | |
Published | 2017-12-02 |
URL | http://arxiv.org/abs/1712.00575v1 |
PDF | http://arxiv.org/pdf/1712.00575v1.pdf |
PWC | https://paperswithcode.com/paper/lecture-video-indexing-using-boosted-margin |
Repo | |
Framework | |
Benchmark Environments for Multitask Learning in Continuous Domains
Title | Benchmark Environments for Multitask Learning in Continuous Domains |
Authors | Peter Henderson, Wei-Di Chang, Florian Shkurti, Johanna Hansen, David Meger, Gregory Dudek |
Abstract | As demand drives systems to generalize to various domains and problems, the study of multitask, transfer and lifelong learning has become an increasingly important pursuit. In discrete domains, performance on the Atari game suite has emerged as the de facto benchmark for assessing multitask learning. However, in continuous domains there is a lack of agreement on standard multitask evaluation environments, which makes it difficult to compare different approaches fairly. In this work, we describe a benchmark set of tasks that we have developed in an extendable framework based on OpenAI Gym. We run a simple baseline using Trust Region Policy Optimization and release the framework publicly to be expanded and used for the systematic comparison of multitask, transfer, and lifelong learning in continuous domains. |
Tasks | |
Published | 2017-08-14 |
URL | http://arxiv.org/abs/1708.04352v1 |
PDF | http://arxiv.org/pdf/1708.04352v1.pdf |
PWC | https://paperswithcode.com/paper/benchmark-environments-for-multitask-learning |
Repo | |
Framework | |