July 29, 2019

Paper Group ANR 25

Deep Detection of People and their Mobility Aids for a Hospital Robot


Title	Deep Detection of People and their Mobility Aids for a Hospital Robot
Authors	Andres Vasquez, Marina Kollmitz, Andreas Eitel, Wolfram Burgard
Abstract	Robots operating in populated environments encounter many different types of people, some of whom may need especially cautious interaction because of physical impairments or advanced age. Robots therefore need to recognize such needs to provide appropriate assistance, guidance or other forms of support. In this paper, we propose a depth-based perception pipeline that estimates the position and velocity of people in the environment and categorizes them according to the mobility aids they use: pedestrian, person in wheelchair, person in a wheelchair with a person pushing them, person with crutches and person using a walker. We present a fast region proposal method that feeds a Region-based Convolutional Network (Fast R-CNN). With this, we speed up the object detection process by a factor of seven compared to a dense sliding window approach. We furthermore propose a probabilistic position, velocity and class estimator to smooth the CNN's detections and account for occlusions and misclassifications. In addition, we introduce a new hospital dataset with over 17,000 annotated RGB-D images. Extensive experiments confirm that our pipeline successfully keeps track of people and their mobility aids, even in challenging situations with multiple people from different categories and frequent occlusions. Videos of our experiments and the dataset are available at http://www2.informatik.uni-freiburg.de/~kollmitz/MobilityAids
Tasks	Object Detection
Published	2017-08-02
URL	http://arxiv.org/abs/1708.00674v1
PDF	http://arxiv.org/pdf/1708.00674v1.pdf
PWC	https://paperswithcode.com/paper/deep-detection-of-people-and-their-mobility
Repo
Framework
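The class part of the probabilistic estimator smooths noisy per-frame CNN labels over time. The following is a minimal sketch of one way to do that, assuming a discrete Bayes filter per track with a hypothetical `STAY_PROB` transition parameter that does not come from the paper. The authors' estimator also tracks position and velocity, and that part is omitted here.

```python
# Hypothetical sketch: discrete Bayes filter over mobility-aid classes for one track.
CLASSES = ["pedestrian", "wheelchair", "push_wheelchair", "crutches", "walker"]
STAY_PROB = 0.9  # assumed probability that a track keeps its class between frames


def predict(belief):
    """Transition step: mostly keep the class, leak the rest uniformly to the others."""
    leak = (1.0 - STAY_PROB) / (len(belief) - 1)
    return [STAY_PROB * b + leak * (1.0 - b) for b in belief]


def update(belief, likelihood):
    """Measurement step: weight by the detector's class scores and renormalise."""
    post = [b * l for b, l in zip(belief, likelihood)]
    total = sum(post)
    if total == 0.0:
        return belief  # detection carries no usable evidence; keep the prior
    return [p / total for p in post]


belief = [1.0 / len(CLASSES)] * len(CLASSES)
# Three frames that favour "walker", then one frame misclassified as "pedestrian".
frames = [
    [0.1, 0.05, 0.05, 0.1, 0.7],
    [0.1, 0.05, 0.05, 0.1, 0.7],
    [0.1, 0.05, 0.05, 0.1, 0.7],
    [0.7, 0.05, 0.05, 0.1, 0.1],
]
for scores in frames:
    belief = update(predict(belief), scores)
label = CLASSES[max(range(len(belief)), key=belief.__getitem__)]
print(label)
```

A single misclassified frame lowers the belief in "walker" but does not flip the label. This is the kind of robustness to misclassifications that the abstract attributes to the estimator.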

Contour and Centreline Tracking of Vessels from Angiograms using the Classical Image Processing Techniques


Title	Contour and Centreline Tracking of Vessels from Angiograms using the Classical Image Processing Techniques
Authors	Tache Irina Andra
Abstract	This article addresses vessel edge and centerline detection using classical image processing techniques, which were chosen for their simplicity and ease of implementation. The method has four steps: vessel enhancement with the non-linear filter proposed by Frangi, thresholding with Otsu's method, contour detection with the Canny edge detector (chosen for its good performance on small vessels), and morphological skeletonisation. The algorithms are tested on real data collected in a cardiac catheterization laboratory and are accurate for images with good spatial resolution (512×512). The output image can be used for further processing, for example to find the vessel length or radius.
Tasks	Contour Detection
Published	2017-06-13
URL	http://arxiv.org/abs/1707.03710v1
PDF	http://arxiv.org/pdf/1707.03710v1.pdf
PWC	https://paperswithcode.com/paper/contour-and-centreline-tracking-of-vessels
Repo
Framework
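Otsu's method selects the global threshold that maximises the between-class variance of the intensity histogram. Below is a minimal pure-Python sketch for 8-bit intensities. It assumes the input is the Frangi-enhanced image already flattened to a list of integers in 0..255. The function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return t maximising between-class variance; class 0 is values <= t, class 1 is values > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0, sum0 = 0, 0  # running pixel count and intensity sum of class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


pixels = [20] * 50 + [30] * 50 + [200] * 30 + [210] * 30
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]  # True marks the bright (high vesselness) class
print(t)
```

Because every threshold between the two intensity clusters produces the same split, the strict `>` comparison returns the first such value.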

Learning to Learn from Weak Supervision by Full Supervision


Title	Learning to Learn from Weak Supervision by Full Supervision
Authors	Mostafa Dehghani, Aliaksei Severyn, Sascha Rothe, Jaap Kamps
Abstract	In this paper, we propose a method for training neural networks when we have a large set of data with weak labels and a small amount of data with true labels. In our proposed model, we train two neural networks: a target network (the learner) and a confidence network (the meta-learner). The target network is optimized to perform a given task and is trained on a large set of unlabeled data that are weakly annotated. We propose to control the magnitude of the gradient updates to the target network using the scores provided by the confidence network, which is trained on a small amount of supervised data. This prevents weight updates computed from noisy labels from harming the quality of the target network.
Tasks
Published	2017-11-30
URL	http://arxiv.org/abs/1711.11383v1
PDF	http://arxiv.org/pdf/1711.11383v1.pdf
PWC	https://paperswithcode.com/paper/learning-to-learn-from-weak-supervision-by
Repo
Framework
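The central mechanism is that a confidence score scales how much each weakly labelled example moves the target network. The toy sketch below shows only that mechanism, under the following assumptions: a one-parameter linear model stands in for the target network, and the confidence scores are hard-coded. In the paper, those scores come from the confidence network trained on the small supervised set. The names `train` and `confidences` are hypothetical.

```python
def train(data, confidences, lr=0.01, epochs=200):
    """SGD on y ~ w * x where each example's step is scaled by its confidence score."""
    w = 0.0
    for _ in range(epochs):
        for (x, y_weak), c in zip(data, confidences):
            grad = (w * x - y_weak) * x  # gradient of 0.5 * (w * x - y_weak) ** 2
            w -= lr * c * grad           # low confidence -> small update
    return w


# Weak labels for y = 2x, with the last two labels corrupted.
data = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (2, 30), (3, 30)]
# Hypothetical confidence scores (stand-ins for the confidence network's output).
confidences = [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]

w_plain = train(data, [1.0] * len(data))
w_weighted = train(data, confidences)
print(round(w_plain, 2), round(w_weighted, 2))
```

With uniform weights, the two corrupted labels pull the slope well away from 2. With confidence weighting, the slope recovers the clean value.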

Learning Relevant Features of Data with Multi-scale Tensor Networks


Title	Learning Relevant Features of Data with Multi-scale Tensor Networks
Authors	E. M. Stoudenmire
Abstract	Inspired by coarse-graining approaches used in physics, we show how similar algorithms can be adapted for data. The resulting algorithms are based on layered tree tensor networks and scale linearly with both the dimension of the input and the training set size. Computing most of the layers with an unsupervised algorithm, then optimizing just the top layer for supervised classification of the MNIST and fashion-MNIST data sets gives very good results. We also discuss mixing a prior guess for supervised weights together with an unsupervised representation of the data, yielding a smaller number of features that nevertheless give good performance.
Tasks	Tensor Networks
Published	2017-12-31
URL	http://arxiv.org/abs/1801.00315v1
PDF	http://arxiv.org/pdf/1801.00315v1.pdf
PWC	https://paperswithcode.com/paper/learning-relevant-features-of-data-with-multi
Repo
Framework
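An unsupervised coarse-graining layer merges neighbouring sites. It keeps the leading eigenvectors of a reduced density matrix built from the training data. The sketch below performs one such step for a single pair of pixels, under these assumptions:

- The local map is φ(x) = [cos(πx/2), sin(πx/2)], a choice made here for illustration.
- Only one eigenvector is kept, whereas a real model would keep several.
- Power iteration replaces a full eigendecomposition.

All function names are hypothetical.

```python
import math


def feature_map(x):
    """Local map for a pixel value x in [0, 1]; every output has unit norm."""
    return [math.cos(math.pi * x / 2), math.sin(math.pi * x / 2)]


def pair_feature(x1, x2):
    """Tensor product of two local feature vectors, flattened to length 4."""
    a, b = feature_map(x1), feature_map(x2)
    return [ai * bj for ai in a for bj in b]


def reduced_density_matrix(pairs):
    """rho = (1/N) * sum_j phi_j phi_j^T for one pair of sites.
    With unit-norm product features, tracing out all other sites leaves exactly this."""
    n = len(pairs)
    rho = [[0.0] * 4 for _ in range(4)]
    for x1, x2 in pairs:
        phi = pair_feature(x1, x2)
        for i in range(4):
            for k in range(4):
                rho[i][k] += phi[i] * phi[k] / n
    return rho


def top_eigenvector(m, iters=200):
    """Dominant eigenvector of a symmetric positive semidefinite matrix via power iteration."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = [sum(row[k] * v[k] for k in range(len(v))) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical data: values of two neighbouring pixels across four training images.
pairs = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.9, 1.0)]
rho = reduced_density_matrix(pairs)
u = top_eigenvector(rho)  # a one-column isometry (truncation to dimension 1)
coarse = [sum(ui * pi for ui, pi in zip(u, pair_feature(a, b))) for a, b in pairs]
print([round(c, 3) for c in coarse])
```

Repeating this step over all neighbouring pairs, then over the resulting coarse sites, builds the layered tree. The abstract describes training only the top layer with supervision.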

Exploiting the potential of unlabeled endoscopic video data with self-supervised learning


Title	Exploiting the potential of unlabeled endoscopic video data with self-supervised learning
Authors	Tobias Ross, David Zimmerer, Anant Vemuri, Fabian Isensee, Manuel Wiesenfarth, Sebastian Bodenstedt, Fabian Both, Philip Kessler, Martin Wagner, Beat Müller, Hannes Kenngott, Stefanie Speidel, Annette Kopp-Schneider, Klaus Maier-Hein, Lena Maier-Hein
Abstract	Surgical data science is a new research field that aims to observe all aspects of the patient treatment process in order to provide the right assistance at the right time. Due to the breakthrough successes of deep learning-based solutions for automatic image annotation, the availability of reference annotations for algorithm training is becoming a major bottleneck in the field. The purpose of this paper was to investigate the concept of self-supervised learning to address this issue. Our approach is guided by the hypothesis that unlabeled video data can be used to learn a representation of the target domain that boosts the performance of state-of-the-art machine learning algorithms when used for pre-training. The core of the method is an auxiliary task based on raw endoscopic video data of the target domain that is used to initialize the convolutional neural network (CNN) for the target task. In this paper, we propose the re-colorization of medical images with a generative adversarial network (GAN)-based architecture as the auxiliary task. A variant of the method involves a second pre-training step based on labeled data for the target task from a related domain. We validate both variants using medical instrument segmentation as the target task. The proposed approach can be used to radically reduce the manual annotation effort involved in training CNNs. Compared to the baseline approach of generating annotated data from scratch, our method reduces the number of labeled images required by up to 75% in exploratory experiments, without sacrificing performance. Our method also outperforms alternative methods for CNN pre-training, such as pre-training on publicly available non-medical or medical data using the target task (in this instance: segmentation). As it makes efficient use of available (non-)public and (un-)labeled data, the approach has the potential to become a valuable tool for CNN (pre-)training.
Tasks	Colorization, Instance Segmentation, Semantic Segmentation
Published	2017-11-27
URL	http://arxiv.org/abs/1711.09726v3
PDF	http://arxiv.org/pdf/1711.09726v3.pdf
PWC	https://paperswithcode.com/paper/exploiting-the-potential-of-unlabeled
Repo
Framework
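The re-colorization auxiliary task needs no manual labels. Each unlabeled video frame supplies both the input (a grayscale version) and the target (the original colours). The sketch below builds such pairs, assuming frames are nested lists of `(r, g, b)` tuples and that grayscale is computed with ITU-R BT.601 luma weights. The paper's actual colour representation may differ, and the function names are hypothetical. The GAN generator that would be trained on these pairs is not shown.

```python
def to_gray(pixel):
    """Luma of an (r, g, b) pixel using ITU-R BT.601 weights (an assumed choice)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def make_recolorization_pairs(frames):
    """Turn unlabeled RGB frames into (grayscale input, colour target) training pairs."""
    pairs = []
    for frame in frames:  # frame: list of rows, each row a list of (r, g, b) tuples
        gray = [[to_gray(p) for p in row] for row in frame]
        pairs.append((gray, frame))
    return pairs


frame = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
pairs = make_recolorization_pairs([frame])
gray, target = pairs[0]
print([[round(v, 1) for v in row] for row in gray])
```

After the generator learns to predict `target` from `gray`, its weights would initialise the segmentation CNN, as the abstract describes.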

Deep supervised learning using local errors


Title	Deep supervised learning using local errors
Authors	Hesham Mostafa, Vishwajith Ramesh, Gert Cauwenberghs
Abstract	Error backpropagation is a highly effective mechanism for learning high-quality hierarchical features in deep networks. Updating the features or weights in one layer, however, requires waiting for the propagation of error signals from higher layers. Learning using delayed and non-local errors makes it hard to reconcile backpropagation with the learning mechanisms observed in biological neural networks as it requires the neurons to maintain a memory of the input long enough until the higher-layer errors arrive. In this paper, we propose an alternative learning mechanism where errors are generated locally in each layer using fixed, random auxiliary classifiers. Lower layers could thus be trained independently of higher layers and training could either proceed layer by layer, or simultaneously in all layers using local error information. We address biological plausibility concerns such as weight symmetry requirements and show that the proposed learning mechanism based on fixed, broad, and random tuning of each neuron to the classification categories outperforms the biologically-motivated feedback alignment learning technique on the MNIST, CIFAR10, and SVHN datasets, approaching the performance of standard backpropagation. Our approach highlights a potential biological mechanism for the supervised, or task-dependent, learning of feature hierarchies. In addition, we show that it is well suited for learning deep networks in custom hardware where it can drastically reduce memory traffic and data communication overheads.
Tasks
Published	2017-11-17
URL	http://arxiv.org/abs/1711.06756v1
PDF	http://arxiv.org/pdf/1711.06756v1.pdf
PWC	https://paperswithcode.com/paper/deep-supervised-learning-using-local-errors
Repo
Framework
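The idea of a fixed, random auxiliary classifier per layer can be shown for a single layer. This sketch makes the following assumptions, none of which come from the paper:

- a tanh hidden layer,
- a squared-error local loss against a one-hot target,
- arbitrary sizes and learning rate.

The names `W`, `B`, and `local_step` are hypothetical. The point to notice is that the update to `W` uses only this layer's own error, routed through the fixed matrix `B`. Nothing is propagated back from higher layers.

```python
import math
import random

rng = random.Random(0)
N_IN, N_HID, N_CLS = 4, 8, 3

# Trainable weights of a single hidden layer.
W = [[rng.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HID)]
# Fixed random auxiliary classifier attached to this layer; it is never updated.
B = [[rng.uniform(-1.0, 1.0) for _ in range(N_HID)] for _ in range(N_CLS)]


def forward(x):
    """Hidden activations h = tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def local_loss(h, target):
    """Squared error between the fixed classifier's output B h and the target."""
    y = [sum(b * hj for b, hj in zip(row, h)) for row in B]
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target)), y


def local_step(x, target, lr=0.05):
    """Gradient step on W using only the layer's local error."""
    h = forward(x)
    _, y = local_loss(h, target)
    err = [yk - tk for yk, tk in zip(y, target)]
    for j in range(N_HID):
        dh = sum(B[k][j] * err[k] for k in range(N_CLS))  # error routed through fixed B
        dpre = dh * (1.0 - h[j] ** 2)                      # derivative of tanh
        for i in range(N_IN):
            W[j][i] -= lr * dpre * x[i]


x, target = [1.0, 0.5, -0.5, 0.2], [0.0, 1.0, 0.0]
before = local_loss(forward(x), target)[0]
for _ in range(100):
    local_step(x, target)
after = local_loss(forward(x), target)[0]
print(f"local loss {before:.3f} -> {after:.3f}")
```

Because each layer needs only its input and its own fixed `B`, layers could be updated simultaneously. This is the property the abstract links to lower memory traffic in custom hardware.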

MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings


Title	MRNet-Product2Vec: A Multi-task Recurrent Neural Network for Product Embeddings
Authors	Arijit Biswas, Mukul Bhutani, Subhajit Sanyal
Abstract	E-commerce websites such as Amazon, Alibaba, Flipkart, and Walmart sell billions of products. Machine learning (ML) algorithms involving products, e.g., for product similarity, recommendation, and price estimation, are often used to improve the customer experience and increase revenue. Products must be represented as features before an ML algorithm can be trained on them. In this paper, we propose an approach called MRNet-Product2Vec for creating generic embeddings of products within an e-commerce ecosystem. We learn a dense, low-dimensional embedding into which a diverse set of product-related signals is explicitly injected. We train a Discriminative Multi-task Bidirectional Recurrent Neural Network (RNN) whose input is a product title fed through a bidirectional RNN and whose outputs are product labels for fifteen different tasks. The task set includes several intrinsic characteristics of a product such as price, weight, size, color, popularity, and material. We evaluate the proposed embeddings quantitatively and qualitatively and show that they are almost as good as the sparse, extremely high-dimensional TF-IDF representation despite having less than 3% of its dimensionality. We also use a multimodal autoencoder to compare products from different language regions and show preliminary yet promising qualitative results.
Tasks
Published	2017-09-21
URL	http://arxiv.org/abs/1709.07534v1
PDF	http://arxiv.org/pdf/1709.07534v1.pdf
PWC	https://paperswithcode.com/paper/mrnet-product2vec-a-multi-task-recurrent
Repo
Framework
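In a multi-task setup, one shared title embedding feeds several task-specific heads, and training minimises the combined loss. The sketch below computes such a combined loss for one product and omits the bidirectional RNN entirely. It assumes equal task weights and a split into classification tasks (cross-entropy) and regression tasks (squared error). Both choices, and the task names, are hypothetical illustrations rather than the paper's configuration.

```python
import math


def softmax_cross_entropy(logits, label):
    """-log softmax(logits)[label], computed in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def multitask_loss(outputs, targets, kinds):
    """Unweighted sum of per-task losses for one product."""
    total = 0.0
    for task, kind in kinds.items():
        if kind == "classification":
            total += softmax_cross_entropy(outputs[task], targets[task])
        else:  # "regression"
            total += 0.5 * (outputs[task] - targets[task]) ** 2
    return total


# Hypothetical heads: a 2-way colour classifier and a scalar price regressor.
kinds = {"color": "classification", "price": "regression"}
outputs = {"color": [0.0, 0.0], "price": 3.0}
targets = {"color": 1, "price": 5.0}
loss = multitask_loss(outputs, targets, kinds)
print(loss)  # log(2) + 0.5 * (3 - 5) ** 2
```

Because every head back-propagates into the same embedding, each task injects its signal into the shared product representation.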

3D Morphable Models as Spatial Transformer Networks


Title	3D Morphable Models as Spatial Transformer Networks
Authors	Anil Bas, Patrik Huber, William A. P. Smith, Muhammad Awais, Josef Kittler
Abstract	In this paper, we show how a 3D Morphable Model (i.e. a statistical model of the 3D shape of a class of objects such as faces) can be used to spatially transform input data as a module (a 3DMM-STN) within a convolutional neural network. This is an extension of the original spatial transformer network in that we are able to interpret and normalise 3D pose changes and self-occlusions. The trained localisation part of the network is independently useful since it learns to fit a 3D morphable model to a single image. We show that the localiser can be trained using only simple geometric loss functions on a relatively small dataset yet is able to perform robust normalisation on highly uncontrolled images including occlusion, self-occlusion and large pose changes.
Tasks
Published	2017-08-23
URL	http://arxiv.org/abs/1708.07199v1
PDF	http://arxiv.org/pdf/1708.07199v1.pdf
PWC	https://paperswithcode.com/paper/3d-morphable-models-as-spatial-transformer
Repo
Framework
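A 3D Morphable Model expresses a shape as a mean shape plus a linear combination of basis shapes. A spatial transformer built on it projects that shape into the image to decide where to sample. The sketch below covers only the shape synthesis and projection, assuming a scaled orthographic camera with a single yaw rotation. The paper's camera model may differ, and the function names and toy model are hypothetical. The image-sampling step is omitted.

```python
import math


def morph(mean, basis, alphas):
    """Shape = mean + sum_i alpha_i * basis_i, applied per vertex (x, y, z)."""
    shape = [list(v) for v in mean]
    for alpha, component in zip(alphas, basis):
        for v, d in zip(shape, component):
            for k in range(3):
                v[k] += alpha * d[k]
    return shape


def project(shape, yaw, scale, tx, ty):
    """Scaled orthographic projection after a rotation about the vertical (y) axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    points = []
    for x, y, z in shape:
        x_rot = c * x + s * z
        points.append((scale * x_rot + tx, scale * y + ty))
    return points


# Toy 3-vertex model with one shape component that stretches vertex 1 along x.
mean = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
basis = [[(0, 0, 0), (0.5, 0, 0), (0, 0, 0)]]
shape = morph(mean, basis, alphas=[2.0])
pts = project(shape, yaw=0.0, scale=10.0, tx=100.0, ty=50.0)
print(pts)
```

In the 3DMM-STN setting, a localiser network would predict the shape coefficients and pose parameters. The projected vertex positions would then serve as the sampling grid.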

Visual Question Answering as a Meta Learning Task


Title	Visual Question Answering as a Meta Learning Task
Authors	Damien Teney, Anton van den Hengel
Abstract	The predominant approach to Visual Question Answering (VQA) demands that the model represents within its weights all of the information required to answer any question about any image. Learning this information from any real training set seems unlikely, and representing it in a reasonable number of weights doubly so. We propose instead to approach VQA as a meta learning task, thus separating the question answering method from the information required. At test time, the method is provided with a support set of example questions/answers, over which it reasons to resolve the given question. The support set is not fixed and can be extended without retraining, thereby expanding the capabilities of the model. To exploit this dynamically provided information, we adapt a state-of-the-art VQA model with two techniques from the recent meta learning literature, namely prototypical networks and meta networks. Experiments demonstrate the capability of the system to learn to produce completely novel answers (i.e. never seen during training) from examples provided at test time. In comparison to the existing state of the art, the proposed method produces qualitatively distinct results with higher recall of rare answers, and a better sample efficiency that allows training with little initial data. More importantly, it represents an important step towards vision-and-language methods that can learn and reason on-the-fly.
Tasks	Meta-Learning, Question Answering, Visual Question Answering
Published	2017-11-22
URL	http://arxiv.org/abs/1711.08105v1
PDF	http://arxiv.org/pdf/1711.08105v1.pdf
PWC	https://paperswithcode.com/paper/visual-question-answering-as-a-meta-learning
Repo
Framework
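Prototypical networks, one of the two meta-learning techniques named in the abstract, represent each answer by the mean embedding of its support examples. A query is then assigned to the nearest prototype. The sketch below assumes the embeddings are already computed by some fixed VQA encoder, stands them in with small hand-made vectors, and uses hypothetical function names. It also shows why the support set can be extended without retraining.

```python
def prototypes(support):
    """support: dict mapping answer -> list of embedding vectors; returns the mean per answer."""
    protos = {}
    for answer, vectors in support.items():
        dim = len(vectors[0])
        protos[answer] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return protos


def classify(query, protos):
    """Answer whose prototype is nearest to the query in squared Euclidean distance."""
    def dist(p):
        return sum((q - c) ** 2 for q, c in zip(query, p))
    return min(protos, key=lambda answer: dist(protos[answer]))


support = {"red": [[1.0, 0.0], [0.8, 0.2]], "blue": [[0.0, 1.0], [0.2, 0.9]]}
protos = prototypes(support)
print(classify([0.7, 0.1], protos))  # red

# Adding examples at test time introduces a new answer without any retraining.
support["green"] = [[5.0, 5.0]]
protos = prototypes(support)
print(classify([4.8, 5.1], protos))  # green
```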

Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions


Title	Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions
Authors	Kleomenis Katevas, Ilias Leontiadis, Martin Pielot, Joan Serrà
Abstract	We present a practical approach for processing mobile sensor time series data for continual deep learning predictions. The approach comprises data cleaning, normalization, capping, time-based compression, and finally classification with a recurrent neural network. We demonstrate the effectiveness of the approach in a case study with 279 participants. On the basis of sparse sensor events, the network continually predicts whether the participants would attend to a notification within 10 minutes. Compared to a random baseline, the classifier achieves a 40% performance increase (AUC of 0.702) on a withheld test set. This approach makes it possible to forgo resource-intensive, domain-specific, error-prone feature engineering, which may drastically increase the applicability of machine learning to mobile phone sensor data.
Tasks	Feature Engineering, Time Series
Published	2017-05-17
URL	http://arxiv.org/abs/1705.06224v1
PDF	http://arxiv.org/pdf/1705.06224v1.pdf
PWC	https://paperswithcode.com/paper/practical-processing-of-mobile-sensor-data
Repo
Framework
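Three of the preprocessing steps listed in the abstract are time-based compression, capping, and normalisation. The sketch below applies them to one sparse event stream before it would be fed to the recurrent network. Several details are assumptions chosen for illustration: counting events per fixed window, the window length, and the cap value. The function names are hypothetical.

```python
def compress(events, start, window, n_windows):
    """Time-based compression: count sparse (timestamp, value) events per fixed window."""
    counts = [0] * n_windows
    for t, _value in events:
        idx = int((t - start) // window)
        if 0 <= idx < n_windows:  # events outside the observed span are dropped
            counts[idx] += 1
    return counts


def cap_and_normalize(xs, cap):
    """Clip outliers at `cap` (must be > 0), then scale non-negative counts to [0, 1]."""
    return [min(x, cap) / cap for x in xs]


# Hypothetical screen-on events (seconds since start, value), binned into 60 s windows.
events = [(0, 1), (10, 1), (65, 1), (70, 1), (75, 1), (80, 1), (85, 1), (200, 1)]
counts = compress(events, start=0, window=60, n_windows=4)
features = cap_and_normalize(counts, cap=4)
print(counts, features)
```

Each window then becomes one time step of the RNN input sequence, which is how fixed-length inputs can be obtained without hand-crafted features.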

DNN adaptation by automatic quality estimation of ASR hypotheses


Title	DNN adaptation by automatic quality estimation of ASR hypotheses
Authors	Daniele Falavigna, Marco Matassoni, Shahab Jalalvand, Matteo Negri, Marco Turchi
Abstract	In this paper we propose to exploit the automatic Quality Estimation (QE) of ASR hypotheses to perform the unsupervised adaptation of a deep neural network modeling acoustic probabilities. Our hypothesis is that significant improvements can be achieved by: i) automatically transcribing the evaluation data we are currently trying to recognise, and ii) selecting from it a subset of "good quality" instances based on the word error rate (WER) scores predicted by a QE component. To validate this hypothesis, we run several experiments on the evaluation data sets released for the CHiME-3 challenge. First, we operate in oracle conditions in which manual transcriptions of the evaluation data are available, thus allowing us to compute the "true" sentence WER. In this scenario, we perform the adaptation with variable amounts of data, which are characterised by different levels of quality. Then, we move to realistic conditions in which the manual transcriptions of the evaluation data are not available. In this case, the adaptation is performed on data selected according to the WER scores "predicted" by a QE component. Our results indicate that: i) QE predictions allow us to closely approximate the adaptation results obtained in oracle conditions, and ii) the overall ASR performance based on the proposed QE-driven adaptation method is significantly better than the strong, most recent, CHiME-3 baseline.
Tasks
Published	2017-02-06
URL	http://arxiv.org/abs/1702.01714v1
PDF	http://arxiv.org/pdf/1702.01714v1.pdf
PWC	https://paperswithcode.com/paper/dnn-adaptation-by-automatic-quality
Repo
Framework
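In the oracle condition, sentence WER is computed against manual transcriptions. In the realistic condition, a QE model's predicted WER takes its place, and the best-scoring utterances are kept for adaptation. The sketch below implements standard word-level WER and a simple selection rule that keeps a fixed fraction. The fraction-based rule is an assumption; the paper varies the amount and quality of selected data. The function names are hypothetical, and the QE model itself is not shown.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)  # assumes a non-empty reference


def select_best(utterances, scores, fraction):
    """Keep the given fraction of utterances with the lowest (true or predicted) WER."""
    ranked = sorted(zip(scores, range(len(utterances))))
    keep = max(1, int(len(utterances) * fraction))
    return [utterances[i] for _, i in ranked[:keep]]


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
print(select_best(["u1", "u2", "u3", "u4"], [0.4, 0.1, 0.3, 0.2], 0.5))  # ['u2', 'u4']
```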

Improved Linear Embeddings via Lagrange Duality


Title	Improved Linear Embeddings via Lagrange Duality
Authors	Kshiteej Sheth, Dinesh Garg, Anirban Dasgupta
Abstract	Near-isometric orthogonal embeddings to lower dimensions are a fundamental tool in data science and machine learning. In this paper, we present the construction of such embeddings that minimizes the maximum distortion for a given set of points. We formulate the problem as a non-convex constrained optimization problem. We first construct a primal relaxation and then use the theory of Lagrange duality to create a dual relaxation. We also suggest a polynomial-time algorithm, based on the theory of convex optimization, that provably solves the dual relaxation. We provide a theoretical upper bound on the approximation guarantees of our algorithm, which depends only on the spectral properties of the dataset. We experimentally demonstrate the superiority of our algorithm over baselines in terms of scalability and the ability to achieve lower distortion.
Tasks
Published	2017-11-30
URL	http://arxiv.org/abs/1711.11527v2
PDF	http://arxiv.org/pdf/1711.11527v2.pdf
PWC	https://paperswithcode.com/paper/improved-linear-embeddings-via-lagrange
Repo
Framework
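The quantity being minimised is the worst-case distortion of pairwise distances under a linear map. The sketch below only evaluates that objective for a given projection matrix; it does not implement the authors' Lagrange-dual algorithm. It measures distortion as |‖P(x_i - x_j)‖² / ‖x_i - x_j‖² - 1| over all pairs, which is a common convention assumed here. The function name is hypothetical.

```python
import itertools


def max_distortion(points, proj):
    """Largest |‖P d‖^2 / ‖d‖^2 - 1| over all pairwise differences (secants) d."""
    worst = 0.0
    for a, b in itertools.combinations(points, 2):
        diff = [x - y for x, y in zip(a, b)]
        norm2 = sum(d * d for d in diff)
        if norm2 == 0:
            continue  # duplicate points define no direction
        image = [sum(r * d for r, d in zip(row, diff)) for row in proj]
        ratio = sum(v * v for v in image) / norm2
        worst = max(worst, abs(ratio - 1.0))
    return worst


points = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
keep_xy = [[1, 0, 0], [0, 1, 0]]  # orthonormal rows: project onto the x-y plane
keep_x = [[1, 0, 0]]              # project onto the x axis only
print(max_distortion(points, keep_xy), max_distortion(points, keep_x))  # 0.0 1.0
```

The points lie in the x-y plane, so projecting onto that plane is an isometry on them. Dropping to the x axis collapses the secant along y entirely.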

Distributed Statistical Estimation and Rates of Convergence in Normal Approximation


Title	Distributed Statistical Estimation and Rates of Convergence in Normal Approximation
Authors	Stanislav Minsker, Nate Strawn
Abstract	This paper presents a class of new algorithms for distributed statistical estimation that exploit the divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important characteristic for large distributed systems. We establish connections between the performance of these distributed algorithms and the rates of convergence in normal approximation, and prove non-asymptotic deviation guarantees, as well as limit theorems, for the resulting estimators. Our techniques are illustrated through several examples: in particular, we obtain new results for the median-of-means estimator and provide performance guarantees for distributed maximum likelihood estimation.
Tasks
Published	2017-04-09
URL	http://arxiv.org/abs/1704.02658v3
PDF	http://arxiv.org/pdf/1704.02658v3.pdf
PWC	https://paperswithcode.com/paper/distributed-statistical-estimation-and-rates
Repo
Framework
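The median-of-means estimator mentioned in the abstract is the simplest divide-and-conquer example. The data are split into blocks (in a distributed setting, one per machine), each block is averaged locally, and the median of the block means is returned. The sketch below illustrates the robustness benefit with one gross outlier. The function name and the in-order block split are assumptions; a shuffled split is also common.

```python
import statistics


def median_of_means(xs, k):
    """Split xs into k equal blocks, average each block, and return the median of the means."""
    n = len(xs) // k  # block size; needs len(xs) >= k, and leftover points are dropped
    block_means = [statistics.fmean(xs[i * n:(i + 1) * n]) for i in range(k)]
    return statistics.median(block_means)


data = [1.0] * 9 + [1000.0]  # one gross outlier
mom = median_of_means(data, 5)
plain = statistics.fmean(data)
print(mom, plain)
```

The outlier corrupts only one block mean, which the median ignores. The plain mean, by contrast, is pulled to about 100.9.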

Lecture video indexing using boosted margin maximizing neural networks


Title	Lecture video indexing using boosted margin maximizing neural networks
Authors	Di Ma, Xi Zhang, Xu Ouyang, Gady Agam
Abstract	This paper presents a novel approach for lecture video indexing using a boosted deep convolutional neural network system. The indexing is performed by matching high-quality slide images, for which text is either known or extracted, to lower-resolution video frames with possible noise, perspective distortion, and occlusions. We propose a deep neural network integrated with a boosting framework, composed of two sub-networks targeting feature extraction and similarity determination, to perform the matching. The trained network is given as input a pair of slide image and a candidate video frame image and produces the similarity between them. A boosting framework is integrated into our proposed network during the training process. Experimental results show that the proposed approach is much more capable of handling occlusion, spatial transformations, and other types of noise when compared with known approaches.
Tasks
Published	2017-12-02
URL	http://arxiv.org/abs/1712.00575v1
PDF	http://arxiv.org/pdf/1712.00575v1.pdf
PWC	https://paperswithcode.com/paper/lecture-video-indexing-using-boosted-margin
Repo
Framework
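Once a network can score the similarity between a slide and a frame, indexing reduces to assigning each frame the slide with the highest score. The sketch below shows only that assignment step. It replaces the paper's learned, boosted similarity network with plain cosine similarity over hypothetical precomputed feature vectors. The function names and feature values are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity; a stand-in for the learned similarity sub-network."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def index_frames(frame_feats, slide_feats):
    """Map each video frame to the index of the most similar slide."""
    return [max(range(len(slide_feats)), key=lambda s: cosine(f, slide_feats[s]))
            for f in frame_feats]


slides = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frames = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95], [0.2, 0.8, 0.1]]
print(index_frames(frames, slides))  # [0, 2, 1]
```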

Benchmark Environments for Multitask Learning in Continuous Domains


Title	Benchmark Environments for Multitask Learning in Continuous Domains
Authors	Peter Henderson, Wei-Di Chang, Florian Shkurti, Johanna Hansen, David Meger, Gregory Dudek
Abstract	As demand drives systems to generalize to various domains and problems, the study of multitask, transfer and lifelong learning has become an increasingly important pursuit. In discrete domains, performance on the Atari game suite has emerged as the de facto benchmark for assessing multitask learning. However, in continuous domains there is a lack of agreement on standard multitask evaluation environments which makes it difficult to compare different approaches fairly. In this work, we describe a benchmark set of tasks that we have developed in an extendable framework based on OpenAI Gym. We run a simple baseline using Trust Region Policy Optimization and release the framework publicly to be expanded and used for the systematic comparison of multitask, transfer, and lifelong learning in continuous domains.
Tasks
Published	2017-08-14
URL	http://arxiv.org/abs/1708.04352v1
PDF	http://arxiv.org/pdf/1708.04352v1.pdf
PWC	https://paperswithcode.com/paper/benchmark-environments-for-multitask-learning
Repo
Framework