January 27, 2020

Paper Group ANR 1234

Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm


Title	Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm
Authors	Stefano Spigler, Mario Geiger, Matthieu Wyart
Abstract	How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-\beta}$ where $n$ is the number of training examples and $\beta$ an exponent that depends on both data and algorithm. In this work we measure $\beta$ when applying kernel methods to real datasets. For MNIST we find $\beta\approx 0.4$ and for CIFAR10 $\beta\approx 0.1$. Remarkably, $\beta$ is the same for regression and classification tasks, and for Gaussian or Laplace kernels. To rationalize the existence of non-trivial exponents that can be independent of the specific kernel used, we introduce the Teacher-Student framework for kernels. In this scheme, a Teacher generates data according to a Gaussian random field, and a Student learns them via kernel regression. With a simplifying assumption — namely that the data are sampled from a regular lattice — we derive analytically $\beta$ for translation invariant kernels, using previous results from the kriging literature. Provided that the Student is not too sensitive to high frequencies, $\beta$ depends only on the training data and their dimension. We confirm numerically that these predictions hold when the training points are sampled at random on a hypersphere. Overall, our results quantify how smooth Gaussian data should be to avoid the curse of dimensionality, and indicate that for kernel learning the relevant dimension of the data should be defined in terms of how the distance between nearest data points depends on $n$. With this definition one obtains reasonable effective smoothness estimates for MNIST and CIFAR10.
Tasks
Published	2019-05-26
URL	https://arxiv.org/abs/1905.10843v5
PDF	https://arxiv.org/pdf/1905.10843v5.pdf
PWC	https://paperswithcode.com/paper/asymptotic-learning-curves-of-kernel-methods
Repo
Framework
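
The abstract above describes generalization error decaying as $n^{-\beta}$. As a minimal sketch of how such an exponent could be estimated from measured errors, the snippet below fits a straight line in log-log space with ordinary least squares. The function name and the synthetic data are assumptions for illustration only; this is not the authors' measurement procedure.

```python
import math

def fit_learning_curve_exponent(ns, errors):
    """Least-squares fit of log(error) = log(c) - beta * log(n).

    Returns (beta, c). Hypothetical helper, not taken from the paper.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)

# Synthetic (made-up) learning curve with a known exponent of 0.4.
ns = [100, 200, 400, 800, 1600]
errors = [2.0 * n ** -0.4 for n in ns]
beta, c = fit_learning_curve_exponent(ns, errors)
```

On real measurements the fit would only be meaningful over the range of $n$ where the curve is actually a power law.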

Eyenet: Attention based Convolutional Encoder-Decoder Network for Eye Region Segmentation


Title	Eyenet: Attention based Convolutional Encoder-Decoder Network for Eye Region Segmentation
Authors	Priya Kansal, Sabari Nathan
Abstract	With the immersive development in the field of augmented and virtual reality, accurate and speedy eye-tracking is required. Facebook Research has organized a challenge, named the OpenEDS Semantic Segmentation challenge, for per-pixel segmentation of the key eye regions: the sclera, the iris, the pupil, and everything else (background). There are two constraints set for the participants, viz. MIOU and the computational complexity of the model. More recently, researchers have achieved quite good results using convolutional neural networks (CNNs) in segmenting eye regions. However, the environmental challenges involved in this task, such as low resolution, blur, unusual glint and illumination, off-angles, off-axis, use of glasses and different colors of the iris region, hinder the accuracy of segmentation. To address the challenges in eye segmentation, the present work proposes a robust and computationally efficient attention-based convolutional encoder-decoder network for segmenting all the eye regions. Our model, named EyeNet, includes modified residual units as the backbone, two types of attention blocks and multi-scale supervision for segmenting the aforesaid four eye regions. Our proposed model achieved a total score of 0.974 (EDS evaluation metric) on test data, which demonstrates superior results compared to the baseline methods.
Tasks	Eye Tracking, Semantic Segmentation
Published	2019-10-08
URL	https://arxiv.org/abs/1910.03274v1
PDF	https://arxiv.org/pdf/1910.03274v1.pdf
PWC	https://paperswithcode.com/paper/eyenet-attention-based-convolutional-encoder
Repo
Framework

Efficient ConvNet-based Object Detection for Unmanned Aerial Vehicles by Selective Tile Processing


Title	Efficient ConvNet-based Object Detection for Unmanned Aerial Vehicles by Selective Tile Processing
Authors	George Plastiras, Christos Kyrkou, Theocharis Theocharides
Abstract	Many applications utilizing Unmanned Aerial Vehicles (UAVs) require the use of computer vision algorithms to analyze the information captured from their on-board camera. Recent advances in deep learning have made it possible to use single-shot Convolutional Neural Network (CNN) detection algorithms that process the input image to detect various objects of interest. To keep the computational demands low, these neural networks typically operate on small image sizes, which, however, makes it difficult to detect small objects. This is further emphasized when considering UAVs equipped with cameras where, due to the viewing range, objects tend to appear relatively small. This paper therefore explores the trade-offs involved when maintaining the resolution of the objects of interest by extracting smaller patches (tiles) from the larger input image and processing them using a neural network. Specifically, we introduce an attention mechanism to focus on detecting objects only in some of the tiles and a memory mechanism to keep track of information for tiles that are not processed. Through the analysis of different methods and experiments, we show that by carefully selecting which tiles to process we can considerably improve the detection accuracy while maintaining comparable performance to CNNs that resize and process a single image, which makes the proposed approach suitable for UAV applications.
Tasks	Object Detection
Published	2019-11-14
URL	https://arxiv.org/abs/1911.06073v1
PDF	https://arxiv.org/pdf/1911.06073v1.pdf
PWC	https://paperswithcode.com/paper/efficient-convnet-based-object-detection-for
Repo
Framework
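
The tiling idea in the abstract above (extracting smaller patches from a large frame so that objects keep their resolution) can be sketched as follows. This only computes the tile grid; the paper's attention and memory mechanisms for choosing which tiles to process are not shown. The function name, the `overlap` parameter, and the example sizes are assumptions for illustration.

```python
def tile_boxes(width, height, tile, overlap=0):
    """Return (left, top, right, bottom) boxes that cover a width x height image.

    Hypothetical helper: tiles are square with side `tile`, and neighbouring
    tiles share `overlap` pixels. The last row/column is shifted back so that
    it ends flush with the image border.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile + 1, stride))
        if positions[-1] != size - tile:
            positions.append(size - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]

# Example: a 1000 x 600 frame split into 500-pixel tiles.
boxes = tile_boxes(1000, 600, 500)
```

A selection step would then pick a subset of `boxes` to pass to the detector in each frame.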

Learning Interactive Behaviors for Musculoskeletal Robots Using Bayesian Interaction Primitives


Title	Learning Interactive Behaviors for Musculoskeletal Robots Using Bayesian Interaction Primitives
Authors	Joseph Campbell, Arne Hitzmann, Simon Stepputtis, Shuhei Ikemoto, Koh Hosoda, Heni Ben Amor
Abstract	Musculoskeletal robots that are based on pneumatic actuation have a variety of properties, such as compliance and back-drivability, that render them particularly appealing for human-robot collaboration. However, programming interactive and responsive behaviors for such systems is extremely challenging due to the nonlinearity and uncertainty inherent to their control. In this paper, we propose an approach for learning Bayesian Interaction Primitives for musculoskeletal robots given a limited set of example demonstrations. We show that this approach is capable of real-time state estimation and response generation for interaction with a robot for which no analytical model exists. Human-robot interaction experiments on a ‘handshake’ task show that the approach generalizes to new positions, interaction partners, and movement velocities.
Tasks
Published	2019-08-15
URL	https://arxiv.org/abs/1908.05552v1
PDF	https://arxiv.org/pdf/1908.05552v1.pdf
PWC	https://paperswithcode.com/paper/learning-interactive-behaviors-for
Repo
Framework

Transfer Learning from Partial Annotations for Whole Brain Segmentation


Title	Transfer Learning from Partial Annotations for Whole Brain Segmentation
Authors	Chengliang Dai, Yuanhan Mo, Elsa Angelini, Yike Guo, Wenjia Bai
Abstract	Brain MR image segmentation is a key task in neuroimaging studies. It is commonly conducted using standard computational tools, such as FSL, SPM, multi-atlas segmentation, etc., which are often registration-based and suffer from expensive computation cost. Recently, there has been increased interest in using deep neural networks for brain image segmentation, which have demonstrated advantages in both speed and performance. However, neural network-based approaches normally require a large amount of manual annotations for optimising the massive amount of network parameters. For 3D networks used in volumetric image segmentation, this has become a particular challenge, as a 3D network consists of many more parameters compared to its 2D counterpart. Manual annotation of 3D brain images is extremely time-consuming and requires extensive involvement of trained experts. To address the challenge with limited manual annotations, here we propose a novel multi-task learning framework for brain image segmentation, which utilises a large amount of automatically generated partial annotations together with a small set of manually created full annotations for network training. Our method yields a high performance comparable to state-of-the-art methods for whole brain segmentation.
Tasks	Brain Image Segmentation, Brain Segmentation, Multi-Task Learning, Semantic Segmentation, Transfer Learning
Published	2019-08-28
URL	https://arxiv.org/abs/1908.10851v1
PDF	https://arxiv.org/pdf/1908.10851v1.pdf
PWC	https://paperswithcode.com/paper/transfer-learning-from-partial-annotations
Repo
Framework

Detecting Alzheimer’s Disease by estimating attention and elicitation path through the alignment of spoken picture descriptions with the picture prompt


Title	Detecting Alzheimer’s Disease by estimating attention and elicitation path through the alignment of spoken picture descriptions with the picture prompt
Authors	Bahman Mirheidari, Yilin Pan, Traci Walker, Markus Reuber, Annalena Venneri, Daniel Blackburn, Heidi Christensen
Abstract	Cognitive decline is a sign of Alzheimer’s disease (AD), and there is evidence that tracking a person’s eye movement, using eye tracking devices, can be used for the automatic identification of early signs of cognitive decline. However, such devices are expensive and may not be easy to use for people with cognitive problems. In this paper, we present a new way of capturing similar visual features, by using the speech of people describing the Cookie Theft picture - a common cognitive testing task - to identify regions in the picture prompt that will have caught the speaker’s attention and elicited their speech. After aligning the automatically recognised words with different regions of the picture prompt, we extract information inspired by eye tracking metrics such as coordinates of the areas of interest (AOIs), time spent in an AOI, time to reach the AOI, and the number of AOI visits. Using the DementiaBank dataset we train a binary classifier (AD vs. healthy control) using 10-fold cross-validation and achieve an 80% F1-score using the timing information from the forced alignments of the automatic speech recogniser (ASR); the classifier achieved around 72% when using the timing information from the ASR outputs.
Tasks	Eye Tracking
Published	2019-10-01
URL	https://arxiv.org/abs/1910.00515v1
PDF	https://arxiv.org/pdf/1910.00515v1.pdf
PWC	https://paperswithcode.com/paper/detecting-alzheimers-disease-by-estimating
Repo
Framework
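
The eye-tracking-inspired features named in the abstract above (time spent in an AOI, time to reach it, number of visits) can be illustrated with a small sketch. It assumes each recognised word has already been aligned to a time span and to an AOI label; the input format, the function name, and the AOI labels are hypothetical and do not reflect the authors' feature extraction code.

```python
def aoi_metrics(aligned_words):
    """Compute per-AOI timing features from time-aligned words.

    aligned_words: list of (start_s, end_s, aoi) tuples sorted by start time,
    where aoi is a label or None when the word maps to no region.
    """
    stats = {}
    previous = None
    for start, end, aoi in aligned_words:
        if aoi is not None:
            entry = stats.setdefault(
                aoi, {"time_in_aoi": 0.0, "time_to_reach": start, "visits": 0}
            )
            entry["time_in_aoi"] += end - start
            # A new visit starts whenever the speaker arrives from elsewhere.
            if aoi != previous:
                entry["visits"] += 1
        previous = aoi
    return stats

# Made-up alignment for illustration only.
words = [
    (0.0, 0.5, "boy"),
    (0.5, 1.0, "boy"),
    (1.0, 1.5, "cookie_jar"),
    (1.5, 2.0, None),
    (2.0, 2.5, "boy"),
]
metrics = aoi_metrics(words)
```

Here `metrics["boy"]` records two visits, because the description leaves that region and later returns to it.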

Continual Learning in Practice


Title	Continual Learning in Practice
Authors	Tom Diethe, Tom Borchert, Eno Thereska, Borja Balle, Neil Lawrence
Abstract	This paper describes a reference architecture for self-maintaining systems that can learn continually, as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML or Automatically Adaptive Machine Learning. We describe the challenges and propose a reference architecture.
Tasks	AutoML, Continual Learning
Published	2019-03-12
URL	http://arxiv.org/abs/1903.05202v2
PDF	http://arxiv.org/pdf/1903.05202v2.pdf
PWC	https://paperswithcode.com/paper/continual-learning-in-practice
Repo
Framework

Runtime Analysis of Fitness-Proportionate Selection on Linear Functions


Title	Runtime Analysis of Fitness-Proportionate Selection on Linear Functions
Authors	Duc-Cuong Dang, Anton Eremeev, Per Kristian Lehre
Abstract	This paper extends the runtime analysis of non-elitist evolutionary algorithms (EAs) with fitness-proportionate selection from the simple OneMax function to the linear functions. Not only does our analysis cover a larger class of fitness functions, it also holds for a wider range of mutation rates. We show that with overwhelmingly high probability, no linear function can be optimised in less than exponential time, assuming bitwise mutation rate $\Theta(1/n)$ and population size $\lambda=n^k$ for any constant $k>2$. In contrast to this negative result, we also show that for any linear function with polynomially bounded weights, the EA achieves a polynomial expected runtime if the mutation rate is reduced to $\Theta(1/n^2)$ and the population size is sufficiently large. Furthermore, the EA with mutation rate $\chi/n=\Theta(1/n)$ and modest population size $\lambda=\Omega(\ln n)$ optimises the scaled fitness function $e^{(\chi+\varepsilon)f(x)}$ for any linear function $f$ and any $\varepsilon>0$ in expected time $O(n\lambda\ln\lambda+n^2)$. These upper bounds also extend to some additively decomposed fitness functions, such as the Royal Road functions. We expect that the obtained results may be useful not only for the development of the theory of evolutionary algorithms, but also for biological applications, such as directed evolution.
Tasks
Published	2019-08-23
URL	https://arxiv.org/abs/1908.08686v1
PDF	https://arxiv.org/pdf/1908.08686v1.pdf
PWC	https://paperswithcode.com/paper/runtime-analysis-of-fitness-proportionate
Repo
Framework
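
For readers unfamiliar with the setting in the abstract above, one generation of a non-elitist EA with fitness-proportionate selection and bitwise mutation can be sketched as below. This is a generic textbook-style sketch under stated assumptions (non-negative weights, parents drawn with replacement, no crossover), not the exact algorithm analysed in the paper; all names and parameter values are illustrative.

```python
import math
import random

def linear_fitness(bits, weights):
    """f(x) = sum_i w_i * x_i for a bit string x."""
    return sum(w * b for w, b in zip(weights, bits))

def bitwise_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]

def next_generation(population, weights, chi, rng, scale=None):
    """Select parents proportionally to fitness, then mutate with rate chi / n.

    If `scale` is given, selection uses the scaled fitness exp(scale * f(x)).
    Assumes non-negative weights so that fitness values are valid weights.
    """
    n = len(weights)
    fitness = [linear_fitness(x, weights) for x in population]
    if scale is not None:
        # Subtracting the maximum leaves the selection probabilities
        # unchanged and avoids floating-point overflow.
        best = max(fitness)
        fitness = [math.exp(scale * (f - best)) for f in fitness]
    # Fall back to uniform selection if every individual has zero fitness.
    selection_weights = fitness if sum(fitness) > 0 else None
    parents = rng.choices(population, weights=selection_weights, k=len(population))
    return [bitwise_mutation(p, chi / n, rng) for p in parents]

rng = random.Random(0)
n, lam, chi = 10, 8, 1.0
weights = list(range(1, n + 1))
population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(lam)]
population = next_generation(population, weights, chi, rng, scale=chi + 0.1)
```

Passing `scale=chi + eps` corresponds to selecting on $e^{(\chi+\varepsilon)f(x)}$; leaving `scale=None` selects on the raw linear fitness.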

Towards Making a Dependency Parser See


Title	Towards Making a Dependency Parser See
Authors	Michalina Strzyz, David Vilares, Carlos Gómez-Rodríguez
Abstract	We explore whether it is possible to leverage eye-tracking data in an RNN dependency parser (for English) when such information is only available during training, i.e., no aggregated or token-level gaze features are used at inference time. To do so, we train a multitask learning model that parses sentences as sequence labeling and leverages gaze features as auxiliary tasks. Our method also learns to train from disjoint datasets, i.e. it can be used to test whether already collected gaze features are useful to improve the performance on new non-gazed annotated treebanks. Accuracy gains are modest but positive, showing the feasibility of the approach. It can serve as a first step towards architectures that can better leverage eye-tracking data or other complementary information available only for training sentences, possibly leading to improvements in syntactic parsing.
Tasks	Eye Tracking
Published	2019-09-03
URL	https://arxiv.org/abs/1909.01053v1
PDF	https://arxiv.org/pdf/1909.01053v1.pdf
PWC	https://paperswithcode.com/paper/towards-making-a-dependency-parser-see
Repo
Framework

Region Tracking in an Image Sequence: Preventing Driver Inattention


Title	Region Tracking in an Image Sequence: Preventing Driver Inattention
Authors	Matthew Kowal, Gillian Sandison, Len Yabuki-Soh, Raner la Bastide
Abstract	Driver inattention is a large problem on the roads around the world. The objective of this project was to develop an eye tracking algorithm with sufficient computational efficiency and accuracy, to successfully realize when the driver was looking away from the road for an extended period. The method of tracking involved the minimization of a functional, using the gradient descent and level set methods. The algorithm was then discretized and implemented using C and MATLAB. Multiple synthetic images, grey-scale and colour images were tested using the final design, with a desired region coverage of 82%. Further work is needed to decrease the computation time, increase the robustness of the algorithm, develop a small device capable of running the algorithm, as well as physically implement this device into various vehicles.
Tasks	Eye Tracking
Published	2019-08-23
URL	https://arxiv.org/abs/1908.08914v1
PDF	https://arxiv.org/pdf/1908.08914v1.pdf
PWC	https://paperswithcode.com/paper/region-tracking-in-an-image-sequence
Repo
Framework

cFineGAN: Unsupervised multi-conditional fine-grained image generation


Title	cFineGAN: Unsupervised multi-conditional fine-grained image generation
Authors	Gunjan Aggarwal, Abhishek Sinha
Abstract	We propose an unsupervised multi-conditional image generation pipeline, cFineGAN, that can generate an image conditioned on two input images such that the generated image preserves the texture of one and the shape of the other input. To achieve this goal, we extend the recently proposed FineGAN (Singh et al., 2018) and make use of standard as well as shape-biased pre-trained ImageNet models. We demonstrate both qualitatively and quantitatively the benefit of using the shape-biased network. We present our image generation results across three benchmark datasets: CUB-200-2011, Stanford Dogs and UT Zappos50k.
Tasks	Conditional Image Generation, Image Generation
Published	2019-12-06
URL	https://arxiv.org/abs/1912.05028v1
PDF	https://arxiv.org/pdf/1912.05028v1.pdf
PWC	https://paperswithcode.com/paper/cfinegan-unsupervised-multi-conditional-fine
Repo
Framework

Deep Lifetime Clustering


Title	Deep Lifetime Clustering
Authors	S Chandra Mouli, Leonardo Teixeira, Jennifer Neville, Bruno Ribeiro
Abstract	The goal of lifetime clustering is to develop an inductive model that maps subjects into $K$ clusters according to their underlying (unobserved) lifetime distribution. We introduce a neural-network based lifetime clustering model that can find cluster assignments by directly maximizing the divergence between the empirical lifetime distributions of the clusters. Accordingly, we define a novel clustering loss function over the lifetime distributions (of entire clusters) based on a tight upper bound of the two-sample Kuiper test p-value. The resultant model is robust to the modeling issues associated with the unobservability of termination signals, and does not assume proportional hazards. Our results in real and synthetic datasets show significantly better lifetime clusters (as evaluated by C-index, Brier Score, Logrank score and adjusted Rand index) as compared to competing approaches.
Tasks
Published	2019-10-01
URL	https://arxiv.org/abs/1910.00547v2
PDF	https://arxiv.org/pdf/1910.00547v2.pdf
PWC	https://paperswithcode.com/paper/deep-lifetime-clustering-1
Repo
Framework
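
The abstract above builds its loss on the two-sample Kuiper test. As a rough illustration of the underlying statistic only, the sketch below computes $V = D^+ + D^-$ between two empirical CDFs for fully observed lifetimes. It does not implement the paper's p-value upper bound, its loss function, or any handling of unobserved termination; the function name and sample values are assumptions.

```python
from bisect import bisect_right

def kuiper_statistic(sample_a, sample_b):
    """Two-sample Kuiper statistic V = D+ + D- between empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d_plus = 0.0   # largest amount by which F_a exceeds F_b
    d_minus = 0.0  # largest amount by which F_b exceeds F_a
    # Both CDFs are step functions, so checking the observed points suffices.
    for t in sorted(set(a) | set(b)):
        f_a = bisect_right(a, t) / len(a)
        f_b = bisect_right(b, t) / len(b)
        d_plus = max(d_plus, f_a - f_b)
        d_minus = max(d_minus, f_b - f_a)
    return d_plus + d_minus

# Made-up lifetimes for two clusters.
v_same = kuiper_statistic([1, 2, 3], [1, 2, 3])
v_disjoint = kuiper_statistic([1, 2, 3], [4, 5, 6])
v_shifted = kuiper_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

Larger values of the statistic indicate more dissimilar lifetime distributions, which is the direction a clustering objective of this kind would push.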

Structure Learning with Similarity Preserving


Title	Structure Learning with Similarity Preserving
Authors	Zhao Kang, Xiao Lu, Yiwei Lu, Chong Peng, Zenglin Xu
Abstract	Leveraging the underlying low-dimensional structure of data, low-rank and sparse modeling approaches have achieved great success in a wide range of applications. However, in many applications the data can display structures beyond simply being low-rank or sparse. Fully extracting and exploiting hidden structure information in the data is always desirable and favorable. To reveal more of the underlying effective manifold structure, in this paper we explicitly model the data relation. Specifically, we propose a structure learning framework that retains the pairwise similarities between the data points. Rather than just trying to reconstruct the original data based on self-expression, we also manage to reconstruct the kernel matrix, which functions as similarity preserving. Consequently, this technique is particularly suitable for the class of learning problems that are sensitive to sample similarity, e.g., clustering and semi-supervised classification. To take advantage of the representation power of deep neural networks, a deep auto-encoder architecture is further designed to implement our model. Extensive experiments on benchmark data sets demonstrate that our proposed framework can consistently and significantly improve performance on both evaluation tasks. We conclude that the quality of structure learning can be enhanced if similarity information is incorporated.
Tasks
Published	2019-12-03
URL	https://arxiv.org/abs/1912.01197v1
PDF	https://arxiv.org/pdf/1912.01197v1.pdf
PWC	https://paperswithcode.com/paper/structure-learning-with-similarity-preserving
Repo
Framework

Recovery of Superquadrics from Range Images using Deep Learning: A Preliminary Study


Title	Recovery of Superquadrics from Range Images using Deep Learning: A Preliminary Study
Authors	Tim Oblak, Klemen Grm, Aleš Jaklič, Peter Peer, Vitomir Štruc, Franc Solina
Abstract	It has been a longstanding goal in computer vision to describe the 3D physical space in terms of parameterized volumetric models that would allow autonomous machines to understand and interact with their surroundings. Such models are typically motivated by human visual perception and aim to represent all elements of the physical world, ranging from individual objects to complex scenes, using a small set of parameters. One of the de facto standards for approaching this problem is superquadrics - volumetric models that define various 3D shape primitives and can be fitted to actual 3D data (either in the form of point clouds or range images). However, existing solutions to superquadric recovery involve costly iterative fitting procedures, which limit the applicability of such techniques in practice. To alleviate this problem, we explore in this paper the possibility of recovering superquadrics from range images without time-consuming iterative parameter estimation techniques by using contemporary deep-learning models, more specifically, convolutional neural networks (CNNs). We pose the superquadric recovery problem as a regression task and develop a CNN regressor that is able to estimate the parameters of a superquadric model from a given range image. We train the regressor on a large set of synthetic range images, each containing a single (unrotated) superquadric shape, and evaluate the learned model in comparative experiments with the current state-of-the-art. Additionally, we also present a qualitative analysis involving a dataset of real-world objects. The results of our experiments show that the proposed regressor not only outperforms the existing state-of-the-art, but also ensures a 270x faster execution time.
Tasks
Published	2019-04-13
URL	https://arxiv.org/abs/1904.06585v2
PDF	https://arxiv.org/pdf/1904.06585v2.pdf
PWC	https://paperswithcode.com/paper/recovery-of-superquadrics-from-range-images
Repo
Framework

Stochastic Conditional Generative Networks with Basis Decomposition


Title	Stochastic Conditional Generative Networks with Basis Decomposition
Authors	Ze Wang, Xiuyuan Cheng, Guillermo Sapiro, Qiang Qiu
Abstract	While generative adversarial networks (GANs) have revolutionized machine learning, a number of open questions remain to fully understand them and exploit their power. One of these questions is how to efficiently achieve proper diversity and sampling of the multi-mode data space. To address this, we introduce BasisGAN, a stochastic conditional multi-mode image generator. By exploiting the observation that a convolutional filter can be well approximated as a linear combination of a small set of basis elements, we learn a plug-and-play basis generator to stochastically generate basis elements, with just a few hundred parameters, to fully embed stochasticity into convolutional filters. By sampling basis elements instead of filters, we dramatically reduce the cost of modeling the parameter space with no sacrifice on either image diversity or fidelity. To illustrate this proposed plug-and-play framework, we construct variants of BasisGAN based on state-of-the-art conditional image generation networks, and train the networks by simply plugging in a basis generator, without additional auxiliary components, hyperparameters, or training objectives. The experimental success is complemented with theoretical results indicating how the perturbations introduced by the proposed sampling of basis elements can propagate to the appearance of generated images.
Tasks	Conditional Image Generation, Image Generation
Published	2019-09-25
URL	https://arxiv.org/abs/1909.11286v2
PDF	https://arxiv.org/pdf/1909.11286v2.pdf
PWC	https://paperswithcode.com/paper/stochastic-conditional-generative-networks
Repo
Framework
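
The observation in the abstract above, that a convolutional filter can be written as a linear combination of a small set of basis elements, can be shown with a toy example. The sketch only composes one filter from fixed basis elements and coefficients; in the paper the basis elements come from a learned stochastic generator, which is not reproduced here. The function name and the 2x2 values are made up for illustration.

```python
def compose_filter(coefficients, basis):
    """Build a k x k filter as sum_m coefficients[m] * basis[m].

    basis: list of k x k nested lists (the basis elements).
    coefficients: one scalar per basis element.
    """
    size = len(basis[0])
    return [
        [
            sum(a * element[i][j] for a, element in zip(coefficients, basis))
            for j in range(size)
        ]
        for i in range(size)
    ]

# Two made-up 2x2 basis elements and their mixing coefficients.
basis = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
]
conv_filter = compose_filter([2, 3], basis)
```

Sampling a new set of basis elements while keeping the coefficients fixed would yield a different filter, which is how stochasticity could enter the convolution in a scheme of this kind.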