Paper Group ANR 527
Bootstrapping Robotic Ecological Perception from a Limited Set of Hypotheses Through Interactive Perception
Title | Bootstrapping Robotic Ecological Perception from a Limited Set of Hypotheses Through Interactive Perception |
Authors | Léni K. Le Goff, Ghanim Mukhtar, Alexandre Coninx, Stéphane Doncieux |
Abstract | To solve its task, a robot needs to be able to interpret its perceptions. In vision, this interpretation is particularly difficult and relies on an understanding of the structure of the scene, at least to the extent required by its task and sensorimotor abilities. A robot able to build and adapt this interpretation process according to its own tasks and capabilities would push back the limits of what robots can achieve in uncontrolled environments. A solution is to provide the robot with processes to build such representations that are not specific to an environment or a situation. Many works focus on object segmentation, recognition and manipulation. Defining an object solely on the basis of its visual appearance is challenging given the wide range of possible objects and environments. Therefore, current works make simplifying assumptions about the structure of a scene. Such assumptions reduce the adaptivity of the object extraction process to the environments in which they hold. To limit such assumptions, we introduce an exploration method aimed at identifying moveable elements in a scene without relying on the concept of an object. Using the interactive perception framework, we aim to bootstrap the acquisition of a representation of the environment with a minimum of context-specific assumptions. The robotic system builds a perceptual map, called a relevance map, which indicates the moveable parts of the current scene. A classifier is trained online to predict the category of each region (moveable or non-moveable). It is also used to select the region to interact with next, with the goal of minimizing the uncertainty of the classification. A specific classifier is introduced to fit these needs: the collaborative mixture models classifier. The method is tested on a set of scenarios of increasing complexity, using both simulations and a PR2 robot. |
Tasks | |
Published | 2019-01-30 |
URL | http://arxiv.org/abs/1901.10968v1 |
http://arxiv.org/pdf/1901.10968v1.pdf | |
PWC | https://paperswithcode.com/paper/bootstrapping-robotic-ecological-perception |
Repo | |
Framework | |
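To make the exploration loop concrete, below is a minimal Python sketch of uncertainty-driven interactive perception. It substitutes a generic online classifier (scikit-learn's `SGDClassifier` with `partial_fit`) for the paper's collaborative mixture models, and `segment_regions`, `extract_features`, and `push_and_observe` are hypothetical stand-ins for the robot's perception and interaction primitives.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier  # requires scikit-learn >= 1.1 for loss="log_loss"

def relevance_map_loop(scene, segment_regions, extract_features, push_and_observe,
                       n_interactions=50):
    """Illustrative uncertainty-driven interaction loop, not the paper's exact algorithm."""
    clf = SGDClassifier(loss="log_loss")        # stand-in for the collaborative mixture models
    classes = np.array([0, 1])                  # 0 = non-moveable, 1 = moveable
    regions = segment_regions(scene)
    features = np.stack([extract_features(region) for region in regions])
    fitted = False
    for _ in range(n_interactions):
        # Predicted probability of being moveable; before any interaction everything is uncertain.
        proba = clf.predict_proba(features)[:, 1] if fitted else np.full(len(regions), 0.5)
        # Interact with the region whose prediction is closest to 0.5 (highest uncertainty).
        idx = int(np.argmin(np.abs(proba - 0.5)))
        moved = push_and_observe(regions[idx])  # returns True if the pushed region moved
        clf.partial_fit(features[idx:idx + 1], np.array([int(moved)]), classes=classes)
        fitted = True
    return clf.predict_proba(features)[:, 1]    # the relevance map: P(moveable) per region
```

The point carried over from the abstract is that interaction both supplies labels and is targeted where the classifier is least certain.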
Towards Understanding the Importance of Shortcut Connections in Residual Networks
Title | Towards Understanding the Importance of Shortcut Connections in Residual Networks |
Authors | Tianyi Liu, Minshuo Chen, Mo Zhou, Simon S. Du, Enlu Zhou, Tuo Zhao |
Abstract | The Residual Network (ResNet) is undoubtedly a milestone in deep learning. ResNet is equipped with shortcut connections between layers and exhibits efficient training using simple first-order algorithms. Despite this great empirical success, the reason behind it is far from well understood. In this paper, we study a two-layer non-overlapping convolutional ResNet. Training such a network requires solving a non-convex optimization problem with a spurious local optimum. We show, however, that gradient descent combined with proper normalization avoids being trapped by the spurious local optimum and converges to a global optimum in polynomial time when the weight of the first layer is initialized at 0 and that of the second layer is initialized arbitrarily in a ball. Numerical experiments are provided to support our theory. |
Tasks | |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04653v3 |
https://arxiv.org/pdf/1909.04653v3.pdf | |
PWC | https://paperswithcode.com/paper/towards-understanding-the-importance-of |
Repo | |
Framework | |
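The analysis concerns a two-layer residual architecture with a shortcut connection and a specific initialization. The PyTorch block below is a loose sketch of such a block, assuming 3×3 convolutions and a uniform second-layer initialization; it is not the exact non-overlapping convolutional model analyzed in the paper.

```python
import torch
import torch.nn as nn

class TwoLayerResBlock(nn.Module):
    """A two-layer block with a shortcut connection (illustrative, assumed hyperparameters)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        # Initialization loosely following the description above: the first layer starts at
        # zero and the second layer is drawn from a small ball around the origin.
        nn.init.zeros_(self.conv1.weight)
        nn.init.uniform_(self.conv2.weight, -0.1, 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The shortcut adds the input back to the transformed signal, so at initialization
        # (conv1 = 0) the block is exactly the identity map.
        return x + self.conv2(torch.relu(self.conv1(x)))
```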
Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training
Title | Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training |
Authors | Astik Biswas, Raghav Menon, Ewald van der Westhuizen, Thomas Niesler |
Abstract | We present improvements in automatic speech recognition (ASR) for Somali, a currently extremely under-resourced language. This forms part of a continuing United Nations (UN) effort to employ ASR-based keyword spotting systems to support humanitarian relief programmes in rural Africa. Using just 1.57 hours of annotated speech data as a seed corpus, we increase the pool of training data by applying semi-supervised training to 17.55 hours of untranscribed speech. We make use of factorised time-delay neural networks (TDNN-F) for acoustic modelling, since these have recently been shown to be effective in resource-scarce situations. Three semi-supervised training passes were performed, where the decoded output from each pass was used for acoustic model training in the subsequent pass. The automatic transcriptions from the best-performing pass were used for language model augmentation. To ensure the quality of the automatic transcriptions, decoder confidence is used as a threshold. The acoustic and language models obtained from the semi-supervised approach show significant improvements in WER and perplexity compared to the baseline. Incorporating the automatically generated transcriptions yields a 6.55% improvement in language model perplexity, and the use of 17.55 hours of Somali acoustic data in semi-supervised training yields a relative improvement of 7.74% over the baseline. |
Tasks | Acoustic Modelling, Keyword Spotting, Language Modelling, Speech Recognition |
Published | 2019-07-06 |
URL | https://arxiv.org/abs/1907.03064v1 |
https://arxiv.org/pdf/1907.03064v1.pdf | |
PWC | https://paperswithcode.com/paper/improved-low-resource-somali-speech |
Repo | |
Framework | |
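The training recipe lends itself to a compact pseudo-labelling loop. The sketch below illustrates the confidence-thresholded semi-supervised passes in plain Python; `train_asr` and `decode_with_confidence` are hypothetical callables standing in for the TDNN-F training and decoding pipeline, and the threshold value is an assumed placeholder.

```python
def semi_supervised_passes(seed_corpus, untranscribed, train_asr, decode_with_confidence,
                           num_passes=3, confidence_threshold=0.9):
    """Confidence-filtered semi-supervised training (illustrative sketch of the recipe above)."""
    model = train_asr(list(seed_corpus))
    for _ in range(num_passes):
        pseudo_labelled = []
        for utterance in untranscribed:
            hypothesis, confidence = decode_with_confidence(model, utterance)
            # Keep only automatic transcriptions the decoder is sufficiently confident about.
            if confidence >= confidence_threshold:
                pseudo_labelled.append((utterance, hypothesis))
        # Each pass retrains the acoustic model on the seed data plus the filtered pseudo-labels.
        model = train_asr(list(seed_corpus) + pseudo_labelled)
    return model
```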
Facial Makeup Transfer Combining Illumination Transfer
Title | Facial Makeup Transfer Combining Illumination Transfer |
Authors | Xin Jin, Rui Han, Ning Ning, Xiaodong Li, Xiaokun Zhang |
Abstract | To meet women’s appearance needs, we present a novel virtual-experience approach to facial makeup transfer, developed as a Windows desktop application. The makeup effects can be presented on the user’s input image in real time, with only a single reference image. The input and reference images are divided into three layers based on landmarked facial feature points: a facial structure layer, a facial color layer, and a facial detail layer. Besides processing these layers with different algorithms to generate the output image, we also add illumination transfer, so that the illumination effect of the reference image is automatically transferred to the input image. Our approach has three advantages: (1) black or dark and white facial makeup can be effectively transferred by introducing illumination transfer; (2) facial makeup is transferred efficiently, within seconds, compared to methods based on deep learning frameworks; (3) makeup can be transferred well even from reference images with air bangs. |
Tasks | Facial Makeup Transfer |
Published | 2019-07-08 |
URL | https://arxiv.org/abs/1907.03398v1 |
https://arxiv.org/pdf/1907.03398v1.pdf | |
PWC | https://paperswithcode.com/paper/facial-makeup-transfer-combining-illumination |
Repo | |
Framework | |
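A rough sense of the layer decomposition can be given in a few lines of OpenCV. The sketch below splits a face image into structure, detail, and colour layers in CIELAB space and blends the reference layers into the source; it assumes the two images are already aligned and of the same size, omits the landmark-driven warping and the illumination transfer described above, and the filter parameters and blend weight are assumed values.

```python
import cv2
import numpy as np

def decompose(image_bgr):
    """Split a face image into structure, detail, and colour layers (simplified sketch)."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    lightness, a, b = cv2.split(lab)
    # An edge-preserving filter keeps the large-scale facial structure ...
    structure = cv2.bilateralFilter(lightness, d=9, sigmaColor=40, sigmaSpace=9)
    detail = lightness - structure            # ... and the residual carries the skin detail.
    return structure, detail, cv2.merge([a, b])

def naive_makeup_transfer(source_bgr, reference_bgr, alpha=0.8):
    """Blend the reference's detail and colour layers into the source (no warping, no relighting)."""
    s_struct, s_detail, s_color = decompose(source_bgr)
    _r_struct, r_detail, r_color = decompose(reference_bgr)
    out_l = s_struct + (1 - alpha) * s_detail + alpha * r_detail
    out_color = (1 - alpha) * s_color + alpha * r_color
    lab = np.dstack([out_l, out_color[..., 0], out_color[..., 1]])
    return cv2.cvtColor(np.clip(lab, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)
```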
QC-Automator: Deep Learning-based Automated Quality Control for Diffusion MR Images
Title | QC-Automator: Deep Learning-based Automated Quality Control for Diffusion MR Images |
Authors | Zahra Riahi Samani, Jacob Antony Alappatt, Drew Parker, Abdol Aziz Ould Ismail, Ragini Verma |
Abstract | Quality assessment of diffusion MRI (dMRI) data is essential prior to any analysis, so that appropriate pre-processing can be used to improve data quality and ensure that the presence of MRI artifacts does not affect the results of subsequent image analysis. Manual quality assessment of the data is subjective, possibly error-prone, and infeasible, especially considering the growing number of consortium-like studies, underlining the need for automation of the process. In this paper, we have developed a deep-learning-based automated quality control (QC) tool, QC-Automator, for dMRI data, that can handle a variety of artifacts such as motion, multiband interleaving, ghosting, susceptibility, herringbone and chemical shifts. QC-Automator uses convolutional neural networks along with transfer learning to train the automated artifact detection on a labeled dataset of ~332,000 slices of dMRI data, from 155 unique subjects and 5 scanners with different dMRI acquisitions, achieving 98% accuracy in detecting artifacts. The method is fast and paves the way for efficient and effective artifact detection in large datasets. It is also demonstrated to be replicable on other datasets with different acquisition parameters. |
Tasks | Transfer Learning |
Published | 2019-11-15 |
URL | https://arxiv.org/abs/1911.06816v1 |
https://arxiv.org/pdf/1911.06816v1.pdf | |
PWC | https://paperswithcode.com/paper/qc-automator-deep-learning-based-automated |
Repo | |
Framework | |
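As a rough illustration of the transfer-learning setup, the sketch below fine-tunes an ImageNet-pretrained backbone as a binary artifact / artifact-free slice classifier using torchvision; the backbone choice, class count, and training hyperparameters are assumptions, not QC-Automator's actual configuration.

```python
import torch.nn as nn
from torchvision import models

def build_artifact_classifier(num_classes: int = 2) -> nn.Module:
    """Pretrained backbone with a new head for slice-level artifact detection (sketch)."""
    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    # Transfer learning: keep the pretrained convolutional features, replace the classifier head.
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
    return backbone

# Typical usage: fine-tune with a small learning rate on labelled dMRI slices, e.g.
# optimizer = torch.optim.Adam(build_artifact_classifier().parameters(), lr=1e-4)
```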
PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer
Title | PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer |
Authors | Wentao Jiang, Si Liu, Chen Gao, Jie Cao, Ran He, Jiashi Feng, Shuicheng Yan |
Abstract | In this paper, we address the makeup transfer task, which aims to transfer the makeup from a reference image to a source image. Existing methods have achieved promising progress in constrained scenarios, but transferring between images with large pose and expression differences remains challenging. Besides, they cannot realize customizable transfer that allows a controllable shade of makeup or specifies the part to transfer, which limits their applications. To address these issues, we propose the Pose and expression robust Spatial-aware GAN (PSGAN). It first utilizes a Makeup Distill Network to disentangle the makeup of the reference image into two spatial-aware makeup matrices. Then, an Attentive Makeup Morphing module is introduced to specify how the makeup of a pixel in the source image is morphed from the reference image. With the makeup matrices and the source image, a Makeup Apply Network performs the makeup transfer. Our PSGAN not only achieves state-of-the-art results even when large pose and expression differences exist, but is also able to perform partial and shade-controllable makeup transfer. We also collected a dataset containing facial images with various poses and expressions for evaluation. |
Tasks | |
Published | 2019-09-16 |
URL | https://arxiv.org/abs/1909.06956v2 |
https://arxiv.org/pdf/1909.06956v2.pdf | |
PWC | https://paperswithcode.com/paper/psgan-pose-robust-spatial-aware-gan-for |
Repo | |
Framework | |
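The core transfer step described above amounts to a spatially varying affine modulation of the source features by the distilled makeup matrices. The snippet below is a hedged reading of that operation; the tensor shapes and the shade-control trick are assumptions for illustration, not the paper's exact formulation.

```python
import torch

def apply_makeup_matrices(source_features: torch.Tensor,
                          gamma: torch.Tensor,
                          beta: torch.Tensor,
                          shade: float = 1.0) -> torch.Tensor:
    """Spatial-aware makeup application (illustrative sketch).

    source_features: (N, C, H, W) features of the source image.
    gamma, beta:     (N, 1, H, W) makeup matrices distilled from the reference and already
                     morphed to the source's pose and expression.
    shade:           0 keeps the source untouched, 1 applies the full reference makeup.
    """
    # Interpolate toward the identity transform (gamma=1, beta=0) for partial/shade control.
    g = 1.0 + shade * (gamma - 1.0)
    b = shade * beta
    return g * source_features + b
```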
Spin Detection in Robotic Table Tennis
Title | Spin Detection in Robotic Table Tennis |
Authors | Jonas Tebbe, Lukas Klamt, Yapeng Gao, Andreas Zell |
Abstract | In table tennis, the rotation (spin) of the ball plays a crucial role. A table tennis match features a variety of strokes, each generating different amounts and types of spin. To develop a robot that can compete with a human player, the robot needs to detect spin so that it can plan an appropriate return stroke. In this paper we compare three methods for estimating spin. The first two approaches use a high-speed camera that captures the ball in flight at a frame rate of 380 Hz, which makes the movement of the circular brand logo printed on the ball visible. The first approach uses background differencing to determine the position of the logo. The second trains a CNN to predict the orientation of the logo. The third method evaluates the trajectory of the ball and derives the rotation from the effect of the Magnus force. This method gives the highest accuracy and is used for a demonstration: our robot successfully copes with different spin types in a real table tennis rally against a human opponent. |
Tasks | |
Published | 2019-05-20 |
URL | https://arxiv.org/abs/1905.07967v2 |
https://arxiv.org/pdf/1905.07967v2.pdf | |
PWC | https://paperswithcode.com/paper/spin-detection-in-robotic-table-tennis |
Repo | |
Framework | |
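The third, trajectory-based method rests on the fact that spin bends the flight path through the Magnus force. The sketch below integrates a simple ballistic model with gravity, drag, and a Magnus term; the coefficients are rough assumed values for a table tennis ball, not the calibrated constants from the paper, and spin estimation would then be a least-squares fit of the simulated trajectory to the observed ball positions.

```python
import numpy as np

# Assumed, order-of-magnitude aerodynamic constants for a 40 mm table tennis ball.
GRAVITY = np.array([0.0, 0.0, -9.81])
K_DRAG = 0.14      # combined drag coefficient / mass term, assumed
K_MAGNUS = 4.0e-3  # combined Magnus coefficient / mass term, assumed

def acceleration(velocity, spin):
    """Gravity + drag + Magnus force, where the Magnus term is proportional to spin x velocity."""
    speed = np.linalg.norm(velocity)
    drag = -K_DRAG * speed * velocity
    magnus = K_MAGNUS * np.cross(spin, velocity)
    return GRAVITY + drag + magnus

def simulate(position, velocity, spin, dt=1.0 / 380.0, steps=200):
    """Integrate the trajectory at the camera frame rate (380 Hz) with explicit Euler."""
    trajectory = [position.copy()]
    for _ in range(steps):
        velocity = velocity + dt * acceleration(velocity, spin)
        position = position + dt * velocity
        trajectory.append(position.copy())
    return np.array(trajectory)

# Spin estimation can then be posed as a fit: search for the spin vector whose simulated
# trajectory best matches the observed one in a least-squares sense.
```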
Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data
Title | Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data |
Authors | Zhuoxun He, Lingxi Xie, Xin Chen, Ya Zhang, Yanfeng Wang, Qi Tian |
Abstract | Data augmentation has been widely applied as an effective methodology to improve generalization, in particular when training deep neural networks. Recently, researchers proposed a few intensive data augmentation techniques which indeed improve accuracy, yet we notice that these augmentation methods also cause a considerable gap between clean and augmented data. In this paper, we revisit this problem from an analytical perspective, estimating the upper bound of the expected risk using two terms, namely the empirical risk and the generalization error. We develop an understanding of data augmentation as regularization, which highlights its major effects. As a result, data augmentation significantly reduces the generalization error, but meanwhile leads to a slightly higher empirical risk. On the assumption that data augmentation helps models converge to a better region, the model can then benefit from a lower empirical risk achieved by a simple method, i.e., using less-augmented data to refine the model trained on fully-augmented data. Our approach achieves consistent accuracy gains on a few standard image classification benchmarks, and the gains transfer to object detection. |
Tasks | Data Augmentation, Image Classification, Object Detection |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09148v2 |
https://arxiv.org/pdf/1909.09148v2.pdf | |
PWC | https://paperswithcode.com/paper/data-augmentation-revisited-rethinking-the |
Repo | |
Framework | |
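The refinement recipe itself is a two-stage schedule, sketched below in plain Python. `fit_one_epoch`, the augmentation policies, and the epoch counts are placeholders, not the paper's settings.

```python
def train_with_refinement(model, dataset, heavy_augment, light_augment, fit_one_epoch,
                          main_epochs=90, refine_epochs=10):
    """Train with heavy augmentation, then refine with lightly augmented data (sketch)."""
    for _ in range(main_epochs):
        # Heavy augmentation keeps the generalization error small ...
        fit_one_epoch(model, dataset, transform=heavy_augment)
    for _ in range(refine_epochs):
        # ... and a short refinement on less-augmented data recovers a lower empirical risk.
        fit_one_epoch(model, dataset, transform=light_augment)
    return model
```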
Lautum Regularization for Semi-supervised Transfer Learning
Title | Lautum Regularization for Semi-supervised Transfer Learning |
Authors | Daniel Jakubovitz, Miguel R. D. Rodrigues, Raja Giryes |
Abstract | Transfer learning is a very important tool in deep learning as it allows propagating information from one “source dataset” to another “target dataset”, especially in the case of a small number of training examples in the latter. Yet, discrepancies between the underlying distributions of the source and target data are commonplace and are known to have a substantial impact on algorithm performance. In this work we suggest a novel information theoretic approach for the analysis of the performance of deep neural networks in the context of transfer learning. We focus on the task of semi-supervised transfer learning, in which unlabeled samples from the target dataset are available during the network training on the source dataset. Our theory suggests that one may improve the transferability of a deep neural network by imposing a Lautum information based regularization that relates the network weights to the target data. We demonstrate the effectiveness of the proposed approach in various transfer learning experiments. |
Tasks | Transfer Learning |
Published | 2019-04-02 |
URL | https://arxiv.org/abs/1904.01670v3 |
https://arxiv.org/pdf/1904.01670v3.pdf | |
PWC | https://paperswithcode.com/paper/lautum-regularization-for-semi-supervised |
Repo | |
Framework | |
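For readers unfamiliar with the quantity, lautum information (Palomar and Verdú, 2008) is the Kullback-Leibler divergence taken in the reverse order of the one defining mutual information. The LaTeX below states that definition and a schematic regularized objective; the symbol names, the weighting λ, and the exact form of the penalty are illustrative only and follow the idea rather than the paper's notation.

```latex
% Lautum vs. mutual information: the two orderings of the same KL divergence.
L(X;Y) = D_{\mathrm{KL}}\!\left(P_X \otimes P_Y \,\middle\|\, P_{X,Y}\right),
\qquad
I(X;Y) = D_{\mathrm{KL}}\!\left(P_{X,Y} \,\middle\|\, P_X \otimes P_Y\right).

% Schematic semi-supervised transfer objective: the source-task loss plus a regularizer
% R_{lautum} that couples the network weights W to the unlabeled target data X_T.
\min_{W}\; \mathcal{L}_{\mathrm{source}}(W) \;+\; \lambda\, R_{\mathrm{lautum}}(W; X_T)
```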
Split Q Learning: Reinforcement Learning with Two-Stream Rewards
Title | Split Q Learning: Reinforcement Learning with Two-Stream Rewards |
Authors | Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi |
Abstract | Drawing inspiration from behavioral studies of human decision making, we propose a general parametric framework for a reinforcement learning problem which extends the standard Q-learning approach to incorporate a two-stream framework of reward processing, with biases biologically associated with several neurological and psychiatric conditions, including Parkinson’s and Alzheimer’s diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. For the AI community, the development of agents that react differently to different types of rewards can enable us to understand a wide spectrum of multi-agent interactions in complex real-world socioeconomic systems. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions and user preferences in long-term recommendation systems. |
Tasks | Decision Making, Q-Learning, Recommendation Systems |
Published | 2019-06-21 |
URL | https://arxiv.org/abs/1906.12350v2 |
https://arxiv.org/pdf/1906.12350v2.pdf | |
PWC | https://paperswithcode.com/paper/split-q-learning-reinforcement-learning-with |
Repo | |
Framework | |
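One plausible reading of the two-stream idea is a tabular Q-learner that keeps separate estimates for positive and negative rewards and acts on a weighted combination of the two. The class below is such a sketch; the parameter names, the reward split, and the update rule are assumptions rather than the paper's formal specification, and the stream weights are where condition-specific biases would enter.

```python
import numpy as np

class SplitQLearner:
    """Illustrative two-stream tabular Q-learner with separate positive/negative reward streams."""

    def __init__(self, n_states, n_actions, alpha_pos=0.1, alpha_neg=0.1,
                 w_pos=1.0, w_neg=1.0, gamma=0.95):
        self.q_pos = np.zeros((n_states, n_actions))
        self.q_neg = np.zeros((n_states, n_actions))
        self.alpha_pos, self.alpha_neg = alpha_pos, alpha_neg
        self.w_pos, self.w_neg = w_pos, w_neg
        self.gamma = gamma

    def combined(self, state):
        # Biases (e.g. over- or under-weighting losses) enter through w_pos and w_neg.
        return self.w_pos * self.q_pos[state] - self.w_neg * self.q_neg[state]

    def act(self, state, epsilon=0.1):
        if np.random.rand() < epsilon:
            return np.random.randint(self.q_pos.shape[1])
        return int(np.argmax(self.combined(state)))

    def update(self, state, action, reward, next_state):
        # Split the scalar reward into a positive and a negative stream.
        r_pos, r_neg = max(reward, 0.0), max(-reward, 0.0)
        best_next = int(np.argmax(self.combined(next_state)))
        td_pos = r_pos + self.gamma * self.q_pos[next_state, best_next] - self.q_pos[state, action]
        td_neg = r_neg + self.gamma * self.q_neg[next_state, best_next] - self.q_neg[state, action]
        self.q_pos[state, action] += self.alpha_pos * td_pos
        self.q_neg[state, action] += self.alpha_neg * td_neg
```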
Cognitive swarming in complex environments with attractor dynamics and oscillatory computing
Title | Cognitive swarming in complex environments with attractor dynamics and oscillatory computing |
Authors | Joseph D. Monaco, Grace M. Hwang, Kevin M. Schultz, Kechen Zhang |
Abstract | Neurobiological theories of spatial cognition were developed with respect to recording data from relatively small and/or simplistic environments compared to animals’ natural habitats. It has been unclear how to extend theoretical models to large or complex spaces. Complementarily, in autonomous systems technology, applications have been growing for distributed control methods that scale to large numbers of low-footprint mobile platforms. Animals and many-robot groups must solve common problems of navigating complex and uncertain environments. Here, we introduce the ‘NeuroSwarms’ control framework to investigate whether adaptive, autonomous swarm control of minimal artificial agents can be achieved by direct analogy to neural circuits of rodent spatial cognition. NeuroSwarms analogizes agents to neurons and swarming groups to recurrent networks. We implemented neuron-like agent interactions in which mutually visible agents operate as if they were reciprocally-connected place cells in an attractor network. We attributed a phase state to agents to enable patterns of oscillatory synchronization similar to hippocampal models of theta-rhythmic (5-12 Hz) sequence generation. We demonstrate that multi-agent swarming and reward-approach dynamics can be expressed as a mobile form of Hebbian learning and that NeuroSwarms supports a single-entity paradigm that directly informs theoretical models of animal cognition. We present emergent behaviors including phase-organized rings and trajectory sequences that interact with environmental cues and geometry in large, fragmented mazes. Thus, NeuroSwarms is a model artificial spatial system that integrates autonomous control and theoretical neuroscience to potentially uncover common principles to advance both domains. |
Tasks | |
Published | 2019-09-15 |
URL | https://arxiv.org/abs/1909.06711v1 |
https://arxiv.org/pdf/1909.06711v1.pdf | |
PWC | https://paperswithcode.com/paper/cognitive-swarming-in-complex-environments |
Repo | |
Framework | |
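To give a flavour of the neurons-as-agents analogy, the toy update below couples the phases of mutually visible agents Kuramoto-style and gates their drift toward a reward location by local synchrony. It is only loosely inspired by the description above; the coupling form, the parameters, and the synchrony gating are assumptions, not the published NeuroSwarms equations.

```python
import numpy as np

def neuroswarms_step(positions, phases, reward_pos, visible, dt=0.05,
                     coupling=1.0, base_freq=8.0, speed=0.5):
    """One illustrative update of agents treated as oscillatory, place-cell-like units.

    positions: (N, 2) agent locations; phases: (N,) oscillator phases in radians;
    visible:   (N, N) boolean mutual-visibility matrix (line of sight in the maze).
    """
    n = len(positions)
    # Phase coupling only between mutually visible agents, echoing recurrent place-cell
    # connectivity and theta-rhythmic (5-12 Hz) synchronization.
    diff = phases[None, :] - phases[:, None]
    dphase = 2 * np.pi * base_freq + coupling * (visible * np.sin(diff)).sum(axis=1) / max(n - 1, 1)
    phases = (phases + dt * dphase) % (2 * np.pi)
    # Movement: drift toward the reward location, gated by how synchronized each agent is
    # with its visible neighbours (a crude co-activity signal in the spirit of Hebbian learning).
    sync = (visible * np.cos(diff)).sum(axis=1) / max(n - 1, 1)
    direction = reward_pos[None, :] - positions
    norms = np.linalg.norm(direction, axis=1, keepdims=True) + 1e-9
    positions = positions + dt * speed * np.clip(sync, 0.0, 1.0)[:, None] * direction / norms
    return positions, phases
```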
Aerodynamic Data Fusion Towards the Digital Twin Paradigm
Title | Aerodynamic Data Fusion Towards the Digital Twin Paradigm |
Authors | S. Ashwin Renganathan, Kohei Harada, Dimitri N. Mavris |
Abstract | We consider the fusion of two aerodynamic data sets originating from physical or computer experiments of differing fidelity. We specifically address the fusion of: 1) noisy and incomplete fields from wind tunnel measurements and 2) deterministic but biased fields from numerical simulations. These two data sources are fused in order to estimate the *true* field that best matches the measured quantities serving as the ground truth. For example, two sources of pressure fields about an aircraft are fused based on measured forces and moments from a wind-tunnel experiment. A fundamental challenge in this problem is that the true field is unknown and cannot be estimated with 100% certainty. We employ a Bayesian framework to infer the true fields conditioned on measured quantities of interest; essentially, we perform a *statistical correction* to the data. The fused data may then be used to construct more accurate surrogate models suitable for early stages of aerospace design. We also introduce an extension of the proper orthogonal decomposition with constraints to solve the same problem. Both methods are demonstrated by fusing the pressure distributions for flow past the RAE2822 airfoil and the Common Research Model wing at transonic conditions. A comparison of both methods reveals that the Bayesian method is more robust when data are scarce, while also being able to account for uncertainties in the data. Furthermore, given adequate data, the POD-based and Bayesian approaches lead to *similar* results. |
Tasks | |
Published | 2019-11-02 |
URL | https://arxiv.org/abs/1911.02924v1 |
https://arxiv.org/pdf/1911.02924v1.pdf | |
PWC | https://paperswithcode.com/paper/aerodynamic-data-fusion-towards-the-digital |
Repo | |
Framework | |
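As a much-simplified illustration of fusing a noisy, incomplete measured field with a biased simulated field, the sketch below performs a pointwise precision-weighted (Gaussian) combination. The paper's Bayesian method instead conditions on measured integrated quantities such as forces and moments, and its constrained-POD variant is different again; the variance values in the usage comment are assumed.

```python
import numpy as np

def fuse_fields(field_experiment, var_experiment, field_simulation, var_simulation):
    """Pointwise Gaussian (precision-weighted) fusion of two field estimates (sketch only)."""
    precision_e = 1.0 / var_experiment
    precision_s = 1.0 / var_simulation
    fused = (precision_e * field_experiment + precision_s * field_simulation) / (precision_e + precision_s)
    fused_var = 1.0 / (precision_e + precision_s)
    return fused, fused_var

# Missing wind-tunnel points can be handled by assigning them a very large variance so the
# simulation dominates there, e.g. var_experiment = np.where(np.isnan(p_wt), 1e6, 0.05**2).
```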
SalGaze: Personalizing Gaze Estimation Using Visual Saliency
Title | SalGaze: Personalizing Gaze Estimation Using Visual Saliency |
Authors | Zhuoqing Chang, Matias Di Martino, Qiang Qiu, Steven Espinosa, Guillermo Sapiro |
Abstract | Traditional gaze estimation methods typically require explicit user calibration to achieve high accuracy. This process is cumbersome, and recalibration is often required when there are changes in factors such as illumination and pose. To address this challenge, we introduce SalGaze, a framework that utilizes saliency information in the visual content to transparently adapt the gaze estimation algorithm to the user without explicit calibration. We design an algorithm to transform a saliency map into a differentiable loss map that can be used for the optimization of CNN-based models. SalGaze is also able to greatly augment standard point calibration data with implicit video saliency calibration data within a unified framework. We show accuracy improvements of over 24% when applying our technique to existing methods. |
Tasks | Calibration, Gaze Estimation |
Published | 2019-10-23 |
URL | https://arxiv.org/abs/1910.10603v1 |
https://arxiv.org/pdf/1910.10603v1.pdf | |
PWC | https://paperswithcode.com/paper/salgaze-personalizing-gaze-estimation-using |
Repo | |
Framework | |
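The central trick is to turn a saliency map into a loss that is differentiable with respect to the predicted gaze point, so that free-viewing content can act as implicit calibration data. The sketch below samples a loss map at the predicted gaze with `grid_sample`; the map construction in the trailing comment and the coordinate conventions are assumptions, not SalGaze's exact transformation.

```python
import torch
import torch.nn.functional as F

def saliency_loss(predicted_gaze: torch.Tensor, loss_map: torch.Tensor) -> torch.Tensor:
    """Sample a loss map at the predicted gaze point, differentiably (illustrative sketch).

    predicted_gaze: (N, 2) gaze estimates in normalized [-1, 1] screen coordinates (x, y).
    loss_map:       (N, 1, H, W) map derived from the saliency of the viewed content, with
                    low values where a fixation is plausible and high values elsewhere.
    """
    grid = predicted_gaze.view(-1, 1, 1, 2)                  # (N, 1, 1, 2) sampling grid
    sampled = F.grid_sample(loss_map, grid, align_corners=False)
    return sampled.mean()

# One simple way to build such a map from a saliency map s in [0, 1]:
# loss_map = 1.0 - s / (s.amax(dim=(-2, -1), keepdim=True) + 1e-8)
```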
Online Algorithm for Unsupervised Sensor Selection
Title | Online Algorithm for Unsupervised Sensor Selection |
Authors | Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama |
Abstract | In many security and healthcare systems, the detection and diagnosis systems use a sequence of sensors/tests. Each test outputs a prediction of the latent state and carries an inherent cost. However, the correctness of the predictions cannot be evaluated since the ground truth annotations may not be available. Our objective is to learn strategies for selecting a test that gives the best trade-off between accuracy and costs in such Unsupervised Sensor Selection (USS) problems. Clearly, learning is feasible only if ground truth can be inferred (explicitly or implicitly) from the problem structure. It is observed that this happens if the problem satisfies the ‘Weak Dominance’ (WD) property. We set up the USS problem as a stochastic partial monitoring problem and develop an algorithm with sub-linear regret under the WD property. We argue that our algorithm is optimal and evaluate its performance on problem instances generated from synthetic and real-world datasets. |
Tasks | |
Published | 2019-01-15 |
URL | http://arxiv.org/abs/1901.04676v2 |
http://arxiv.org/pdf/1901.04676v2.pdf | |
PWC | https://paperswithcode.com/paper/online-algorithm-for-unsupervised-sensor |
Repo | |
Framework | |
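Why can a sensor be chosen without labels at all? One simplified, illustrative heuristic is to treat disagreement with the most accurate (and most expensive) sensor as a proxy for error and trade it off against cost, as sketched below. The paper's actual algorithm is an online stochastic partial-monitoring method with sub-linear regret under the weak dominance property; this batch rule only conveys the intuition, and the trade-off weight is an assumed parameter.

```python
import numpy as np

def select_sensor(predictions, costs, trade_off=1.0):
    """Pick a sensor using disagreement with the final sensor as an unsupervised error proxy.

    predictions: (T, K) array of 0/1 predictions from K sensors over T rounds.
    costs:       (K,) per-use cost of each sensor, assumed to increase with the index.
    """
    # Disagreement rate of every sensor with the last (most accurate) sensor in the cascade.
    disagreement = (predictions != predictions[:, [-1]]).mean(axis=0)
    scores = costs + trade_off * disagreement
    return int(np.argmin(scores))
```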
Learning Disentangled Representations via Mutual Information Estimation
Title | Learning Disentangled Representations via Mutual Information Estimation |
Authors | Eduardo Hugo Sanchez, Mathieu Serrurier, Mathias Ortner |
Abstract | In this paper, we investigate the problem of learning disentangled representations. Given a pair of images sharing some attributes, we aim to create a low-dimensional representation which is split into two parts: a shared representation that captures the common information between the images and an exclusive representation that contains the specific information of each image. To address this problem, we propose a model based on mutual information estimation that does not rely on image reconstruction or image generation. Mutual information maximization is performed to capture the attributes of the data in the shared and exclusive representations, while we minimize the mutual information between the shared and exclusive representations to enforce disentanglement. We show that these representations are useful for downstream tasks such as image classification and image retrieval based on the shared or exclusive component. Moreover, classification results show that our model outperforms the state-of-the-art models based on VAE/GAN approaches in representation disentanglement. |
Tasks | Image Classification, Image Generation, Image Reconstruction, Image Retrieval |
Published | 2019-12-09 |
URL | https://arxiv.org/abs/1912.03915v1 |
https://arxiv.org/pdf/1912.03915v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-disentangled-representations-via-1 |
Repo | |
Framework | |
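A hedged sketch of the training signal: maximize an estimated mutual information between the shared codes of the paired images while penalizing the estimated mutual information between each image's shared and exclusive codes. The sketch below uses an InfoNCE estimator as a generic stand-in for the paper's mutual information estimator (and omits the term that keeps the exclusive codes informative); minimizing a lower bound to suppress leakage is only a heuristic here, and λ is an assumed weight.

```python
import torch
import torch.nn.functional as F

def infonce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE lower bound on the mutual information between paired code batches of shape (N, D)."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return -F.cross_entropy(logits, targets)   # larger value = more estimated mutual information

def disentanglement_loss(shared_x, shared_y, excl_x, excl_y, lam: float = 0.1) -> torch.Tensor:
    """Shared codes should agree across the pair and carry no information about the exclusive codes."""
    share = infonce(shared_x, shared_y)                            # maximize: common attributes
    leak = infonce(shared_x, excl_x) + infonce(shared_y, excl_y)   # minimize: disentanglement
    return -share + lam * leak
```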