Paper Group ANR 527
Bootstrapping Robotic Ecological Perception from a Limited Set of Hypotheses Through Interactive Perception
Title | Bootstrapping Robotic Ecological Perception from a Limited Set of Hypotheses Through Interactive Perception |
Authors | Léni K. Le Goff, Ghanim Mukhtar, Alexandre Coninx, Stéphane Doncieux |
Abstract | To solve its task, a robot needs to be able to interpret its perceptions. In vision, this interpretation is particularly difficult and relies on an understanding of the structure of the scene, at least to the extent required by its task and sensorimotor abilities. A robot able to build and adapt this interpretation process according to its own tasks and capabilities would push back the limits of what robots can achieve in uncontrolled environments. A solution is to provide the robot with processes to build such representations that are not specific to an environment or a situation. Many works focus on object segmentation, recognition and manipulation. Defining an object solely on the basis of its visual appearance is challenging given the wide range of possible objects and environments. Therefore, current works make simplifying assumptions about the structure of a scene. Such assumptions reduce the adaptivity of the object extraction process to the environments in which they hold. To limit such assumptions, we introduce an exploration method aimed at identifying moveable elements in a scene without relying on the concept of an object. Using the interactive perception framework, we aim to bootstrap the acquisition of a representation of the environment with a minimum of context-specific assumptions. The robotic system builds a perceptual map, called a relevance map, which indicates the moveable parts of the current scene. A classifier is trained online to predict the category of each region (moveable or non-moveable). It is also used to select the region to interact with next, with the goal of minimizing the uncertainty of the classification. A specific classifier is introduced to fit these needs: the collaborative mixture models classifier. The method is tested on a set of scenarios of increasing complexity, using both simulations and a PR2 robot. |
Tasks | |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.10968v1 |
http://arxiv.org/pdf/1901.10968v1.pdf | |
PWC | https://paperswithcode.com/paper/bootstrapping-robotic-ecological-perception |
Repo | |
Framework | |
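To make the exploration loop concrete, below is a minimal Python sketch of uncertainty-driven interactive perception. It substitutes a generic online classifier (scikit-learn's `SGDClassifier` with `partial_fit`) for the paper's collaborative mixture models, and `segment_regions`, `extract_features`, and `push_and_observe` are hypothetical stand-ins for the robot's perception and interaction primitives.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier  # requires scikit-learn >= 1.1 for loss="log_loss"

def relevance_map_loop(scene, segment_regions, extract_features, push_and_observe,
                       n_interactions=50):
    """Illustrative uncertainty-driven interaction loop, not the paper's exact algorithm."""
    clf = SGDClassifier(loss="log_loss")        # stand-in for the collaborative mixture models
    classes = np.array([0, 1])                  # 0 = non-moveable, 1 = moveable
    regions = segment_regions(scene)
    features = np.stack([extract_features(region) for region in regions])
    fitted = False
    for _ in range(n_interactions):
        # Predicted probability of being moveable; before any interaction everything is uncertain.
        proba = clf.predict_proba(features)[:, 1] if fitted else np.full(len(regions), 0.5)
        # Interact with the region whose prediction is closest to 0.5 (highest uncertainty).
        idx = int(np.argmin(np.abs(proba - 0.5)))
        moved = push_and_observe(regions[idx])  # returns True if the pushed region moved
        clf.partial_fit(features[idx:idx + 1], np.array([int(moved)]), classes=classes)
        fitted = True
    return clf.predict_proba(features)[:, 1]    # the relevance map: P(moveable) per region
```

The point carried over from the abstract is that interaction both supplies labels and is targeted where the classifier is least certain.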
Towards Understanding the Importance of Shortcut Connections in Residual Networks
Title | Towards Understanding the Importance of Shortcut Connections in Residual Networks |
Authors | Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao |
Abstract | The Residual Network (ResNet) is undoubtedly a milestone in deep learning. ResNet is equipped with shortcut connections between layers and exhibits efficient training using simple first-order algorithms. Despite this great empirical success, the reason behind it is far from well understood. In this paper, we study a two-layer non-overlapping convolutional ResNet. Training such a network requires solving a non-convex optimization problem with a spurious local optimum. We show, however, that gradient descent combined with proper normalization avoids being trapped by the spurious local optimum and converges to a global optimum in polynomial time when the weight of the first layer is initialized at 0 and that of the second layer is initialized arbitrarily in a ball. Numerical experiments are provided to support our theory. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04653v3 |
https://arxiv.org/pdf/1909.04653v3.pdf | |
PWC | https://paperswithcode.com/paper/towards-understanding-the-importance-of |
Repo | |
Framework | |
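The analysis concerns a two-layer residual architecture with a shortcut connection and a specific initialization. The PyTorch block below is a loose sketch of such a block, assuming 3×3 convolutions and a uniform second-layer initialization; it is not the exact non-overlapping convolutional model analyzed in the paper.

```python
import torch
import torch.nn as nn

class TwoLayerResBlock(nn.Module):
    """A two-layer block with a shortcut connection (illustrative, assumed hyperparameters)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        # Initialization loosely following the description above: the first layer starts at
        # zero and the second layer is drawn from a small ball around the origin.
        nn.init.zeros_(self.conv1.weight)
        nn.init.uniform_(self.conv2.weight, -0.1, 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The shortcut adds the input back to the transformed signal, so at initialization
        # (conv1 = 0) the block is exactly the identity map.
        return x + self.conv2(torch.relu(self.conv1(x)))
```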
Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training
Title | Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training |
Authors | Astik Biswas, Raghav Menon, Ewald van der Westhuizen, Thomas Niesler |
Abstract | We present improvements in automatic speech recognition (ASR) for Somali, a currently extremely under-resourced language. This forms part of a continuing United Nations (UN) effort to employ ASR-based keyword spotting systems to support humanitarian relief programmes in rural Africa. Using just 1.57 hours of annotated speech data as a seed corpus, we increase the pool of training data by applying semi-supervised training to 17.55 hours of untranscribed speech. We make use of factorised time-delay neural networks (TDNN-F) for acoustic modelling, since these have recently been shown to be effective in resource-scarce situations. Three semi-supervised training passes were performed, where the decoded output from each pass was used for acoustic model training in the subsequent pass. The automatic transcriptions from the best-performing pass were used for language model augmentation. To ensure the quality of the automatic transcriptions, decoder confidence is used as a threshold. The acoustic and language models obtained from the semi-supervised approach show significant improvements in WER and perplexity compared to the baseline. Incorporating the automatically generated transcriptions yields a 6.55% improvement in language model perplexity, and the use of 17.55 hours of Somali acoustic data in semi-supervised training yields a relative improvement of 7.74% over the baseline. |
Tasks | Acoustic Modelling, Keyword Spotting, Language Modelling, Speech Recognition |
Published | 2019-07-06 |
URL | https://arxiv.org/abs/1907.03064v1 |
https://arxiv.org/pdf/1907.03064v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-low-resource-somali-speech |
Repo | |
Framework | |
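The training recipe lends itself to a compact pseudo-labelling loop. The sketch below illustrates the confidence-thresholded semi-supervised passes in plain Python; `train_asr` and `decode_with_confidence` are hypothetical callables standing in for the TDNN-F training and decoding pipeline, and the threshold value is an assumed placeholder.

```python
def semi_supervised_passes(seed_corpus, untranscribed, train_asr, decode_with_confidence,
                           num_passes=3, confidence_threshold=0.9):
    """Confidence-filtered semi-supervised training (illustrative sketch of the recipe above)."""
    model = train_asr(list(seed_corpus))
    for _ in range(num_passes):
        pseudo_labelled = []
        for utterance in untranscribed:
            hypothesis, confidence = decode_with_confidence(model, utterance)
            # Keep only automatic transcriptions the decoder is sufficiently confident about.
            if confidence >= confidence_threshold:
                pseudo_labelled.append((utterance, hypothesis))
        # Each pass retrains the acoustic model on the seed data plus the filtered pseudo-labels.
        model = train_asr(list(seed_corpus) + pseudo_labelled)
    return model
```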
Facial Makeup Transfer Combining Illumination Transfer
Title | Facial Makeup Transfer Combining Illumination Transfer |
Authors | Xin Jin, Rui Han, Ning Ning, Xiaodong Li, Xiaokun Zhang |
Abstract | To meet women’s appearance needs, we present a novel virtual-experience approach to facial makeup transfer, developed as a Windows desktop application. The makeup effects can be presented on the user’s input image in real time, with only a single reference image. The input and reference images are divided into three layers based on landmarked facial feature points: a facial structure layer, a facial color layer, and a facial detail layer. Besides processing these layers with different algorithms to generate the output image, we also add illumination transfer, so that the illumination effect of the reference image is automatically transferred to the input image. Our approach has three advantages: (1) black or dark and white facial makeup can be effectively transferred by introducing illumination transfer; (2) facial makeup is transferred efficiently, within seconds, compared to methods based on deep learning frameworks; (3) makeup can be transferred well even from reference images with air bangs. |
Tasks | Facial Makeup Transfer |
Published | 2019-07-08 |
URL | https://arxiv.org/abs/1907.03398v1 |
https://arxiv.org/pdf/1907.03398v1.pdf | |
PWC | https://paperswithcode.com/paper/facial-makeup-transfer-combining-illumination |
Repo | |
Framework | |
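A rough sense of the layer decomposition can be given in a few lines of OpenCV. The sketch below splits a face image into structure, detail, and colour layers in CIELAB space and blends the reference layers into the source; it assumes the two images are already aligned and of the same size, omits the landmark-driven warping and the illumination transfer described above, and the filter parameters and blend weight are assumed values.

```python
import cv2
import numpy as np

def decompose(image_bgr):
    """Split a face image into structure, detail, and colour layers (simplified sketch)."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    lightness, a, b = cv2.split(lab)
    # An edge-preserving filter keeps the large-scale facial structure ...
    structure = cv2.bilateralFilter(lightness, d=9, sigmaColor=40, sigmaSpace=9)
    detail = lightness - structure            # ... and the residual carries the skin detail.
    return structure, detail, cv2.merge([a, b])

def naive_makeup_transfer(source_bgr, reference_bgr, alpha=0.8):
    """Blend the reference's detail and colour layers into the source (no warping, no relighting)."""
    s_struct, s_detail, s_color = decompose(source_bgr)
    _r_struct, r_detail, r_color = decompose(reference_bgr)
    out_l = s_struct + (1 - alpha) * s_detail + alpha * r_detail
    out_color = (1 - alpha) * s_color + alpha * r_color
    lab = np.dstack([out_l, out_color[..., 0], out_color[..., 1]])
    return cv2.cvtColor(np.clip(lab, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)
```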
QC-Automator: Deep Learning-based Automated Quality Control for Diffusion MR Images
Title | QC-Automator: Deep Learning-based Automated Quality Control for Diffusion MR Images |
Authors | Zahra Riahi Samani, Jacob Antony Alappatt, Drew Parker, Abdol Aziz Ould Ismail, Ragini Verma |
Abstract | Quality assessment of diffusion MRI (dMRI) data is essential prior to any analysis, so that appropriate pre-processing can be used to improve data quality and ensure that the presence of MRI artifacts does not affect the results of subsequent image analysis. Manual quality assessment of the data is subjective, possibly error-prone, and infeasible, especially considering the growing number of consortium-like studies, underlining the need for automation of the process. In this paper, we have developed a deep-learning-based automated quality control (QC) tool, QC-Automator, for dMRI data, that can handle a variety of artifacts such as motion, multiband interleaving, ghosting, susceptibility, herringbone and chemical shifts. QC-Automator uses convolutional neural networks along with transfer learning to train the automated artifact detection on a labeled dataset of ~332,000 slices of dMRI data, from 155 unique subjects and 5 scanners with different dMRI acquisitions, achieving 98% accuracy in detecting artifacts. The method is fast and paves the way for efficient and effective artifact detection in large datasets. It is also demonstrated to be replicable on other datasets with different acquisition parameters. |
Tasks | Transfer Learning |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06816v1 |
https://arxiv.org/pdf/1911.06816v1.pdf | |
PWC | https://paperswithcode.com/paper/qc-automator-deep-learning-based-automated |
Repo | |
Framework | |
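As a rough illustration of the transfer-learning setup, the sketch below fine-tunes an ImageNet-pretrained backbone as a binary artifact / artifact-free slice classifier using torchvision; the backbone choice, class count, and training hyperparameters are assumptions, not QC-Automator's actual configuration.

```python
import torch.nn as nn
from torchvision import models

def build_artifact_classifier(num_classes: int = 2) -> nn.Module:
    """Pretrained backbone with a new head for slice-level artifact detection (sketch)."""
    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    # Transfer learning: keep the pretrained convolutional features, replace the classifier head.
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    return backbone

# Typical usage: fine-tune with a small learning rate on labelled dMRI slices, e.g.
# optimizer = torch.optim.Adam(build_artifact_classifier().parameters(), lr=1e-4)
```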
PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer
Title | PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer |
Authors | Wentao Jiang, Si Liu, Chen Gao, Jie Cao, Ran He, Jiashi Feng, Shuicheng Yan |
Abstract | In this paper, we address the makeup transfer task, which aims to transfer the makeup from a reference image to a source image. Existing methods have achieved promising progress in constrained scenarios, but transferring between images with large pose and expression differences remains challenging. Besides, they cannot realize customizable transfer that allows a controllable shade of makeup or specifies the part to transfer, which limits their applications. To address these issues, we propose the Pose and expression robust Spatial-aware GAN (PSGAN). It first utilizes a Makeup Distill Network to disentangle the makeup of the reference image into two spatial-aware makeup matrices. Then, an Attentive Makeup Morphing module is introduced to specify how the makeup of a pixel in the source image is morphed from the reference image. With the makeup matrices and the source image, a Makeup Apply Network performs the makeup transfer. Our PSGAN not only achieves state-of-the-art results even when large pose and expression differences exist, but is also able to perform partial and shade-controllable makeup transfer. We also collected a dataset containing facial images with various poses and expressions for evaluation. |
Tasks | |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.06956v2 |
https://arxiv.org/pdf/1909.06956v2.pdf | |
PWC | https://paperswithcode.com/paper/psgan-pose-robust-spatial-aware-gan-for |
Repo | |
Framework | |
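The core transfer step described above amounts to a spatially varying affine modulation of the source features by the distilled makeup matrices. The snippet below is a hedged reading of that operation; the tensor shapes and the shade-control trick are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def apply_makeup_matrices(source_features: torch.Tensor,
                          gamma: torch.Tensor,
                          beta: torch.Tensor,
                          shade: float = 1.0) -> torch.Tensor:
    """Spatial-aware makeup application (illustrative sketch).

    source_features: (N, C, H, W) features of the source image.
    gamma, beta:     (N, 1, H, W) makeup matrices distilled from the reference and already
                     morphed to the source's pose and expression.
    shade:           0 keeps the source untouched, 1 applies the full reference makeup.
    """
    # Interpolate toward the identity transform (gamma=1, beta=0) for partial/shade control.
    g = 1.0 + shade * (gamma - 1.0)
    b = shade * beta
    return g * source_features + b
```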
Spin Detection in Robotic Table Tennis
Title | Spin Detection in Robotic Table Tennis |
Authors | Jonas Tebbe, Lukas Klamt, Yapeng Gao, Andreas Zell |
Abstract | In table tennis, the rotation (spin) of the ball plays a crucial role. A table tennis match features a variety of strokes, each generating different amounts and types of spin. To develop a robot that can compete with a human player, the robot needs to detect spin so that it can plan an appropriate return stroke. In this paper we compare three methods for estimating spin. The first two approaches use a high-speed camera that captures the ball in flight at a frame rate of 380 Hz, which makes the movement of the circular brand logo printed on the ball visible. The first approach uses background differencing to determine the position of the logo. The second trains a CNN to predict the orientation of the logo. The third method evaluates the trajectory of the ball and derives the rotation from the effect of the Magnus force. This method gives the highest accuracy and is used for a demonstration: our robot successfully copes with different spin types in a real table tennis rally against a human opponent. |
Tasks | |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.07967v2 |
https://arxiv.org/pdf/1905.07967v2.pdf | |
PWC | https://paperswithcode.com/paper/spin-detection-in-robotic-table-tennis |
Repo | |
Framework | |
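The third, trajectory-based method rests on the fact that spin bends the flight path through the Magnus force. The sketch below integrates a simple ballistic model with gravity, drag, and a Magnus term; the coefficients are rough assumed values for a table tennis ball, not the calibrated constants from the paper, and spin estimation would then be a least-squares fit of the simulated trajectory to the observed ball positions.

```python
import numpy as np

# Assumed, order-of-magnitude aerodynamic constants for a 40 mm table tennis ball.
GRAVITY = np.array([0.0, 0.0, -9.81])
K_DRAG = 0.14      # combined drag coefficient / mass term, assumed
K_MAGNUS = 4.0e-3  # combined Magnus coefficient / mass term, assumed

def acceleration(velocity, spin):
    """Gravity + drag + Magnus force, where the Magnus term is proportional to spin x velocity."""
    speed = np.linalg.norm(velocity)
    drag = -K_DRAG * speed * velocity
    magnus = K_MAGNUS * np.cross(spin, velocity)
    return GRAVITY + drag + magnus

def simulate(position, velocity, spin, dt=1.0 / 380.0, steps=200):
    """Integrate the trajectory at the camera frame rate (380 Hz) with explicit Euler."""
    trajectory = [position.copy()]
    for _ in range(steps):
        velocity = velocity + dt * acceleration(velocity, spin)
        position = position + dt * velocity
        trajectory.append(position.copy())
    return np.array(trajectory)

# Spin estimation can then be posed as a fit: search for the spin vector whose simulated
# trajectory best matches the observed one in a least-squares sense.
```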
Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data
Title | Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data |
Authors | Zhuoxun He, Lingxi Xie, Xin Chen, Ya Zhang, Yanfeng Wang, Qi Tian |
Abstract | Data augmentation has been widely applied as an effective methodology to improve generalization, in particular when training deep neural networks. Recently, researchers proposed a few intensive data augmentation techniques which indeed improve accuracy, yet we notice that these augmentation methods also cause a considerable gap between clean and augmented data. In this paper, we revisit this problem from an analytical perspective, estimating the upper bound of the expected risk using two terms, namely the empirical risk and the generalization error. We develop an understanding of data augmentation as regularization, which highlights its major effects. As a result, data augmentation significantly reduces the generalization error, but meanwhile leads to a slightly higher empirical risk. On the assumption that data augmentation helps models converge to a better region, the model can then benefit from a lower empirical risk achieved by a simple method, i.e., using less-augmented data to refine the model trained on fully-augmented data. Our approach achieves consistent accuracy gains on a few standard image classification benchmarks, and the gains transfer to object detection. |
Tasks | Data Augmentation, Image Classification, Object Detection |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09148v2 |
https://arxiv.org/pdf/1909.09148v2.pdf | |
PWC | https://paperswithcode.com/paper/data-augmentation-revisited-rethinking-the |
Repo | |
Framework | |
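The refinement recipe itself is a two-stage schedule, sketched below in plain Python. `fit_one_epoch`, the augmentation policies, and the epoch counts are placeholders, not the paper's settings.

```python
def train_with_refinement(model, dataset, heavy_augment, light_augment, fit_one_epoch,
                          main_epochs=90, refine_epochs=10):
    """Train with heavy augmentation, then refine with lightly augmented data (sketch)."""
    for _ in range(main_epochs):
        # Heavy augmentation keeps the generalization error small ...
        fit_one_epoch(model, dataset, transform=heavy_augment)
    for _ in range(refine_epochs):
        # ... and a short refinement on less-augmented data recovers a lower empirical risk.
        fit_one_epoch(model, dataset, transform=light_augment)
    return model
```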
Lautum Regularization for Semi-supervised Transfer Learning
Title | Lautum Regularization for Semi-supervised Transfer Learning |
Authors | Daniel Jakubovitz, Miguel R. D. Rodrigues, Raja Giryes |
Abstract | Transfer learning is a very important tool in deep learning as it allows propagating information from one “source dataset” to another “target dataset”, especially in the case of a small number of training examples in the latter. Yet, discrepancies between the underlying distributions of the source and target data are commonplace and are known to have a substantial impact on algorithm performance. In this work we suggest a novel information theoretic approach for the analysis of the performance of deep neural networks in the context of transfer learning. We focus on the task of semi-supervised transfer learning, in which unlabeled samples from the target dataset are available during the network training on the source dataset. Our theory suggests that one may improve the transferability of a deep neural network by imposing a Lautum information based regularization that relates the network weights to the target data. We demonstrate the effectiveness of the proposed approach in various transfer learning experiments. |
Tasks | Transfer Learning |
Published | 2019-04-02 |
URL | https://arxiv.org/abs/1904.01670v3 |
https://arxiv.org/pdf/1904.01670v3.pdf | |
PWC | https://paperswithcode.com/paper/lautum-regularization-for-semi-supervised |
Repo | |
Framework | |
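For readers unfamiliar with the quantity, lautum information (Palomar and Verdú, 2008) is the Kullback-Leibler divergence taken in the reverse order of the one defining mutual information. The LaTeX below states that definition and a schematic regularized objective; the symbol names, the weighting λ, and the exact form of the penalty are illustrative only and follow the idea rather than the paper's notation.

```latex
% Lautum vs. mutual information: the two orderings of the same KL divergence.
L(X;Y) = D_{\mathrm{KL}}\!\left(P_X \otimes P_Y \,\middle\|\, P_{X,Y}\right),
\qquad
I(X;Y) = D_{\mathrm{KL}}\!\left(P_{X,Y} \,\middle\|\, P_X \otimes P_Y\right).

% Schematic semi-supervised transfer objective: the source-task loss plus a regularizer
% R_{lautum} that couples the network weights W to the unlabeled target data X_T.
\min_{W}\; \mathcal{L}_{\mathrm{source}}(W) \;+\; \lambda\, R_{\mathrm{lautum}}(W; X_T)
```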
Split Q Learning: Reinforcement Learning with Two-Stream Rewards
Title | Split Q Learning: Reinforcement Learning with Two-Stream Rewards |
Authors | Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi |
Abstract | Drawing inspiration from behavioral studies of human decision making, we propose a general parametric framework for a reinforcement learning problem which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing, with biases biologically associated with several neurological and psychiatric conditions, including Parkinson’s and Alzheimer’s diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For the AI community, the development of agents that react differently to different types of rewards can enable us to understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions and user preferences in long-term recommendation systems. |
Tasks | Decision Making, Q-Learning, Recommendation Systems |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.12350v2 |
https://arxiv.org/pdf/1906.12350v2.pdf | |
PWC | https://paperswithcode.com/paper/split-q-learning-reinforcement-learning-with |
Repo | |
Framework | |
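One plausible reading of the two-stream idea is a tabular Q-learner that keeps separate estimates for positive and negative rewards and acts on a weighted combination of the two. The class below is such a sketch; the parameter names, the reward split, and the update rule are assumptions rather than the paper's formal specification, and the stream weights are where condition-specific biases would enter.

```python
import numpy as np

class SplitQLearner:
    """Illustrative two-stream tabular Q-learner with separate positive/negative reward streams."""

    def __init__(self, n_states, n_actions, alpha_pos=0.1, alpha_neg=0.1,
                 w_pos=1.0, w_neg=1.0, gamma=0.95):
        self.q_pos = np.zeros((n_states, n_actions))
        self.q_neg = np.zeros((n_states, n_actions))
        self.alpha_pos, self.alpha_neg = alpha_pos, alpha_neg
        self.w_pos, self.w_neg = w_pos, w_neg
        self.gamma = gamma

    def combined(self, state):
        # Biases (e.g. over- or under-weighting losses) enter through w_pos and w_neg.
        return self.w_pos * self.q_pos[state] - self.w_neg * self.q_neg[state]

    def act(self, state, epsilon=0.1):
        if np.random.rand() < epsilon:
            return np.random.randint(self.q_pos.shape[1])
        return int(np.argmax(self.combined(state)))

    def update(self, state, action, reward, next_state):
        # Split the scalar reward into a positive and a negative stream.
        r_pos, r_neg = max(reward, 0.0), max(-reward, 0.0)
        best_next = int(np.argmax(self.combined(next_state)))
        td_pos = r_pos + self.gamma * self.q_pos[next_state, best_next] - self.q_pos[state, action]
        td_neg = r_neg + self.gamma * self.q_neg[next_state, best_next] - self.q_neg[state, action]
        self.q_pos[state, action] += self.alpha_pos * td_pos
        self.q_neg[state, action] += self.alpha_neg * td_neg
```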
Cognitive swarming in complex environments with attractor dynamics and oscillatory computing
Title | Cognitive swarming in complex environments with attractor dynamics and oscillatory computing |
Authors | Joseph D. Monaco, Grace M. Hwang, Kevin M. Schultz, Kechen Zhang |
Abstract | Neurobiological theories of spatial cognition were developed with respect to recording data from relatively small and/or simplistic environments compared to animals’ natural habitats. It has been unclear how to extend theoretical models to large or complex spaces. Complementarily, in autonomous systems technology, applications have been growing for distributed control methods that scale to large numbers of low-footprint mobile platforms. Animals and many-robot groups must solve common problems of navigating complex and uncertain environments. Here, we introduce the ‘NeuroSwarms’ control framework to investigate whether adaptive, autonomous swarm control of minimal artificial agents can be achieved by direct analogy to neural circuits of rodent spatial cognition. NeuroSwarms analogizes agents to neurons and swarming groups to recurrent networks. We implemented neuron-like agent interactions in which mutually visible agents operate as if they were reciprocally-connected place cells in an attractor network. We attributed a phase state to agents to enable patterns of oscillatory synchronization similar to hippocampal models of theta-rhythmic (5-12 Hz) sequence generation. We demonstrate that multi-agent swarming and reward-approach dynamics can be expressed as a mobile form of Hebbian learning and that NeuroSwarms supports a single-entity paradigm that directly informs theoretical models of animal cognition. We present emergent behaviors including phase-organized rings and trajectory sequences that interact with environmental cues and geometry in large, fragmented mazes. Thus, NeuroSwarms is a model artificial spatial system that integrates autonomous control and theoretical neuroscience to potentially uncover common principles to advance both domains. |
Tasks | |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06711v1 |
https://arxiv.org/pdf/1909.06711v1.pdf | |
PWC | https://paperswithcode.com/paper/cognitive-swarming-in-complex-environments |
Repo | |
Framework | |
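To give a flavour of the neurons-as-agents analogy, the toy update below couples the phases of mutually visible agents Kuramoto-style and gates their drift toward a reward location by local synchrony. It is only loosely inspired by the description above; the coupling form, the parameters, and the synchrony gating are assumptions, not the published NeuroSwarms equations.

```python
import numpy as np

def neuroswarms_step(positions, phases, reward_pos, visible, dt=0.05,
                     coupling=1.0, base_freq=8.0, speed=0.5):
    """One illustrative update of agents treated as oscillatory, place-cell-like units.

    positions: (N, 2) agent locations; phases: (N,) oscillator phases in radians;
    visible:   (N, N) boolean mutual-visibility matrix (line of sight in the maze).
    """
    n = len(positions)
    # Phase coupling only between mutually visible agents, echoing recurrent place-cell
    # connectivity and theta-rhythmic (5-12 Hz) synchronization.
    diff = phases[None, :] - phases[:, None]
    dphase = 2 * np.pi * base_freq + coupling * (visible * np.sin(diff)).sum(axis=1) / max(n - 1, 1)
    phases = (phases + dt * dphase) % (2 * np.pi)
    # Movement: drift toward the reward location, gated by how synchronized each agent is
    # with its visible neighbours (a crude co-activity signal in the spirit of Hebbian learning).
    sync = (visible * np.cos(diff)).sum(axis=1) / max(n - 1, 1)
    direction = reward_pos[None, :] - positions
    norms = np.linalg.norm(direction, axis=1, keepdims=True) + 1e-9
    positions = positions + dt * speed * np.clip(sync, 0.0, 1.0)[:, None] * direction / norms
    return positions, phases
```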
Aerodynamic Data Fusion Towards the Digital Twin Paradigm
Title | Aerodynamic Data Fusion Towards the Digital Twin Paradigm |
Authors | S. Ashwin Renganathan, Kohei Harada, Dimitri N. Mavris |
Abstract | We consider the fusion of two aerodynamic data sets originating from physical or computer experiments of differing fidelity. We specifically address the fusion of: 1) noisy and incomplete fields from wind tunnel measurements and 2) deterministic but biased fields from numerical simulations. These two data sources are fused in order to estimate the *true* field that best matches the measured quantities serving as the ground truth. For example, two sources of pressure fields about an aircraft are fused based on measured forces and moments from a wind-tunnel experiment. A fundamental challenge in this problem is that the true field is unknown and cannot be estimated with 100% certainty. We employ a Bayesian framework to infer the true fields conditioned on measured quantities of interest; essentially, we perform a *statistical correction* to the data. The fused data may then be used to construct more accurate surrogate models suitable for early stages of aerospace design. We also introduce an extension of the proper orthogonal decomposition with constraints to solve the same problem. Both methods are demonstrated by fusing the pressure distributions for flow past the RAE2822 airfoil and the Common Research Model wing at transonic conditions. A comparison of both methods reveals that the Bayesian method is more robust when data are scarce, while also being able to account for uncertainties in the data. Furthermore, given adequate data, the POD-based and Bayesian approaches lead to *similar* results. |
Tasks | |
Published | 2019-11-02 |
URL | https://arxiv.org/abs/1911.02924v1 |
https://arxiv.org/pdf/1911.02924v1.pdf | |
PWC | https://paperswithcode.com/paper/aerodynamic-data-fusion-towards-the-digital |
Repo | |
Framework | |
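As a much-simplified illustration of fusing a noisy, incomplete measured field with a biased simulated field, the sketch below performs a pointwise precision-weighted (Gaussian) combination. The paper's Bayesian method instead conditions on measured integrated quantities such as forces and moments, and its constrained-POD variant is different again; the variance values in the usage comment are assumed.

```python
import numpy as np

def fuse_fields(field_experiment, var_experiment, field_simulation, var_simulation):
    """Pointwise Gaussian (precision-weighted) fusion of two field estimates (sketch only)."""
    precision_e = 1.0 / var_experiment
    precision_s = 1.0 / var_simulation
    fused = (precision_e * field_experiment + precision_s * field_simulation) / (precision_e + precision_s)
    fused_var = 1.0 / (precision_e + precision_s)
    return fused, fused_var

# Missing wind-tunnel points can be handled by assigning them a very large variance so the
# simulation dominates there, e.g. var_experiment = np.where(np.isnan(p_wt), 1e6, 0.05**2).
```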
SalGaze: Personalizing Gaze Estimation Using Visual Saliency
Title | SalGaze: Personalizing Gaze Estimation Using Visual Saliency |
Authors | Zhuoqing Chang, Matias Di Martino, Qiang Qiu, Steven Espinosa, Guillermo Sapiro |
Abstract | Traditional gaze estimation methods typically require explicit user calibration to achieve high accuracy. This process is cumbersome, and recalibration is often required when there are changes in factors such as illumination and pose. To address this challenge, we introduce SalGaze, a framework that utilizes saliency information in the visual content to transparently adapt the gaze estimation algorithm to the user without explicit calibration. We design an algorithm to transform a saliency map into a differentiable loss map that can be used for the optimization of CNN-based models. SalGaze is also able to greatly augment standard point calibration data with implicit video saliency calibration data within a unified framework. We show accuracy improvements of over 24% when applying our technique to existing methods. |
Tasks | Calibration, Gaze Estimation |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10603v1 |
https://arxiv.org/pdf/1910.10603v1.pdf | |
PWC | https://paperswithcode.com/paper/salgaze-personalizing-gaze-estimation-using |
Repo | |
Framework | |
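The central trick is to turn a saliency map into a loss that is differentiable with respect to the predicted gaze point, so that free-viewing content can act as implicit calibration data. The sketch below samples a loss map at the predicted gaze with `grid_sample`; the map construction in the trailing comment and the coordinate conventions are assumptions, not SalGaze's exact transformation.

```python
import torch
import torch.nn.functional as F

def saliency_loss(predicted_gaze: torch.Tensor, loss_map: torch.Tensor) -> torch.Tensor:
    """Sample a loss map at the predicted gaze point, differentiably (illustrative sketch).

    predicted_gaze: (N, 2) gaze estimates in normalized [-1, 1] screen coordinates (x, y).
    loss_map:       (N, 1, H, W) map derived from the saliency of the viewed content, with
                    low values where a fixation is plausible and high values elsewhere.
    """
    grid = predicted_gaze.view(-1, 1, 1, 2)                  # (N, 1, 1, 2) sampling grid
    sampled = F.grid_sample(loss_map, grid, align_corners=False)
    return sampled.mean()

# One simple way to build such a map from a saliency map s in [0, 1]:
# loss_map = 1.0 - s / (s.amax(dim=(-2, -1), keepdim=True) + 1e-8)
```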
Online Algorithm for Unsupervised Sensor Selection
Title | Online Algorithm for Unsupervised Sensor Selection |
Authors | Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama |
Abstract | In many security and healthcare systems, the detection and diagnosis systems use a sequence of sensors/tests. Each test outputs a prediction of the latent state and carries an inherent cost. However, the correctness of the predictions cannot be evaluated since the ground truth annotations may not be available. Our objective is to learn strategies for selecting a test that gives the best trade-off between accuracy and costs in such Unsupervised Sensor Selection (USS) problems. Clearly, learning is feasible only if ground truth can be inferred (explicitly or implicitly) from the problem structure. It is observed that this happens if the problem satisfies the ‘Weak Dominance’ (WD) property. We set up the USS problem as a stochastic partial monitoring problem and develop an algorithm with sub-linear regret under the WD property. We argue that our algorithm is optimal and evaluate its performance on problem instances generated from synthetic and real-world datasets. |
Tasks | |
Published | 2019-01-15 |
URL | http://arxiv.org/abs/1901.04676v2 |
http://arxiv.org/pdf/1901.04676v2.pdf | |
PWC | https://paperswithcode.com/paper/online-algorithm-for-unsupervised-sensor |
Repo | |
Framework | |
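Why can a sensor be chosen without labels at all? One simplified, illustrative heuristic is to treat disagreement with the most accurate (and most expensive) sensor as a proxy for error and trade it off against cost, as sketched below. The paper's actual algorithm is an online stochastic partial-monitoring method with sub-linear regret under the weak dominance property; this batch rule only conveys the intuition, and the trade-off weight is an assumed parameter.

```python
import numpy as np

def select_sensor(predictions, costs, trade_off=1.0):
    """Pick a sensor using disagreement with the final sensor as an unsupervised error proxy.

    predictions: (T, K) array of 0/1 predictions from K sensors over T rounds.
    costs:       (K,) per-use cost of each sensor, assumed to increase with the index.
    """
    # Disagreement rate of every sensor with the last (most accurate) sensor in the cascade.
    disagreement = (predictions != predictions[:, [-1]]).mean(axis=0)
    scores = costs + trade_off * disagreement
    return int(np.argmin(scores))
```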
Learning Disentangled Representations via Mutual Information Estimation
Title | Learning Disentangled Representations via Mutual Information Estimation |
Authors | Eduardo Hugo Sanchez, Mathieu Serrurier, Mathias Ortner |
Abstract | In this paper, we investigate the problem of learning disentangled representations. Given a pair of images sharing some attributes, we aim to create a low-dimensional representation which is split into two parts: a shared representation that captures the common information between the images and an exclusive representation that contains the specific information of each image. To address this problem, we propose a model based on mutual information estimation that does not rely on image reconstruction or image generation. Mutual information maximization is performed to capture the attributes of the data in the shared and exclusive representations, while we minimize the mutual information between the shared and exclusive representations to enforce disentanglement. We show that these representations are useful for downstream tasks such as image classification and image retrieval based on the shared or exclusive component. Moreover, classification results show that our model outperforms the state-of-the-art models based on VAE/GAN approaches in representation disentanglement. |
Tasks | Image Classification, Image Generation, Image Reconstruction, Image Retrieval |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.03915v1 |
https://arxiv.org/pdf/1912.03915v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-disentangled-representations-via-1 |
Repo | |
Framework | |
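A hedged sketch of the training signal: maximize an estimated mutual information between the shared codes of the paired images while penalizing the estimated mutual information between each image's shared and exclusive codes. The sketch below uses an InfoNCE estimator as a generic stand-in for the paper's mutual information estimator (and omits the term that keeps the exclusive codes informative); minimizing a lower bound to suppress leakage is only a heuristic here, and λ is an assumed weight.

```python
import torch
import torch.nn.functional as F

def infonce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE lower bound on the mutual information between paired code batches of shape (N, D)."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return -F.cross_entropy(logits, targets)   # larger value = more estimated mutual information

def disentanglement_loss(shared_x, shared_y, excl_x, excl_y, lam: float = 0.1) -> torch.Tensor:
    """Shared codes should agree across the pair and carry no information about the exclusive codes."""
    share = infonce(shared_x, shared_y)                            # maximize: common attributes
    leak = infonce(shared_x, excl_x) + infonce(shared_y, excl_y)   # minimize: disentanglement
    return -share + lam * leak
```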