April 2, 2020

Paper Group ANR 299

The utility of tactile force to autonomous learning of in-hand manipulation is task-dependent

Title The utility of tactile force to autonomous learning of in-hand manipulation is task-dependent
Authors Romina Mir, Ali Marjaninejad, Francisco J. Valero-Cuevas
Abstract Tactile sensors provide information that can be used to learn and execute manipulation tasks. Different tasks, however, might require different levels of sensory information, which in turn likely affects learning rates and performance. This paper evaluates the role of tactile information on autonomous learning of manipulation with a simulated 3-finger tendon-driven hand. We compare the ability of the same learning algorithm (Proximal Policy Optimization, PPO) to learn two manipulation tasks (rolling a ball about the horizontal axis with and without rotational stiffness) with three levels of tactile sensing: no sensing, 1D normal force, and 3D force vector. Surprisingly, and contrary to recent work on manipulation, adding 1D force-sensing did not always improve learning rates compared to no sensing, likely depending on whether normal force is relevant to the task. Nonetheless, even though 3D force-sensing increases the dimensionality of the sensory input (which would in general hamper algorithm convergence), it resulted in faster learning rates and better performance. We conclude that, in general, sensory input is useful to learning only when it is relevant to the task, as is the case of 3D force-sensing for in-hand manipulation against gravity. Moreover, the utility of 3D force-sensing can even offset the added computational cost of learning with higher-dimensional sensory input.
Tasks
Published 2020-02-05
URL https://arxiv.org/abs/2002.02418v1
PDF https://arxiv.org/pdf/2002.02418v1.pdf
PWC https://paperswithcode.com/paper/the-utility-of-tactile-force-to-autonomous
Repo
Framework

Enhanced Self-Perception in Mixed Reality: Egocentric Arm Segmentation and Database with Automatic Labelling

Title Enhanced Self-Perception in Mixed Reality: Egocentric Arm Segmentation and Database with Automatic Labelling
Authors Ester Gonzalez-Sosa, Pablo Perez, Ruben Tolosana, Redouane Kachach, Alvaro Villegas
Abstract In this study, we focus on the egocentric segmentation of arms to improve self-perception in Augmented Virtuality (AV). The main contributions of this work are: i) a comprehensive survey of segmentation algorithms for AV; ii) an Egocentric Arm Segmentation Dataset, composed of more than 10,000 images, comprising variations of skin color and gender, among others. We provide all details required for the automated generation of groundtruth and semi-synthetic images; iii) the use of deep learning for the first time for segmenting arms in AV; iv) to showcase the usefulness of this database, we report results on different real egocentric hand datasets, including GTEA Gaze+, EDSH, EgoHands, Ego Youtube Hands, THU-Read, TEgO, FPAB, and Ego Gesture, which allow for direct comparisons with existing approaches utilizing color or depth. Results confirm the suitability of the EgoArm dataset for this task, achieving improvements of up to 40% with respect to the original network, depending on the particular dataset. Results also suggest that, while approaches based on color or depth can work in controlled conditions (lack of occlusion, uniform lighting, only objects of interest in the near range, controlled background, etc.), egocentric segmentation based on deep learning is more robust in real AV applications.
Tasks
Published 2020-03-27
URL https://arxiv.org/abs/2003.12352v1
PDF https://arxiv.org/pdf/2003.12352v1.pdf
PWC https://paperswithcode.com/paper/enhanced-self-perception-in-mixed-reality
Repo
Framework

AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization

Title AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization
Authors Yiming Li, Changhong Fu, Fangqiang Ding, Ziyuan Huang, Geng Lu
Abstract Most existing trackers based on discriminative correlation filters (DCF) try to introduce a predefined regularization term to improve the learning of target objects, e.g., by suppressing background learning or by restricting the change rate of correlation filters. However, predefined parameters require much effort to tune, and they still fail to adapt to new situations that the designer did not think of. In this work, a novel approach is proposed to automatically and adaptively learn the spatio-temporal regularization term online. Spatially local response map variation is introduced as spatial regularization to make DCF focus on the learning of trustworthy parts of the object, and global response map variation determines the updating rate of the filter. Extensive experiments on four UAV benchmarks have proven the superiority of our method compared to the state-of-the-art CPU- and GPU-based trackers, with a speed of ~60 frames per second running on a single CPU. Our tracker is additionally proposed to be applied in UAV localization. Extensive tests in practical indoor scenarios have proven the effectiveness and versatility of our localization method. The code is available at https://github.com/vision4robotics/AutoTrack.
Tasks Visual Tracking
Published 2020-03-29
URL https://arxiv.org/abs/2003.12949v1
PDF https://arxiv.org/pdf/2003.12949v1.pdf
PWC https://paperswithcode.com/paper/autotrack-towards-high-performance-visual
Repo
Framework

Character-independent font identification

Title Character-independent font identification
Authors Daichi Haraguchi, Shota Harada, Brian Kenji Iwana, Yuto Shinahara, Seiichi Uchida
Abstract There are countless fonts with various shapes and styles. In addition, there are many fonts that only have subtle differences in features. Due to this, font identification is a difficult task. In this paper, we propose a method of determining if any two characters are from the same font or not. This is difficult because the difference between fonts is typically smaller than the difference between alphabet classes. Additionally, the proposed method can be used with fonts regardless of whether they appear in the training set or not. In order to accomplish this, we use a Convolutional Neural Network (CNN) trained with various font image pairs. In the experiment, the network is trained on image pairs of various fonts. We then evaluate the model on a different set of fonts that are unseen by the network. The evaluation yields an accuracy of 92.27%. Moreover, we analyzed the relationship between character classes and font identification accuracy.
Tasks
Published 2020-01-24
URL https://arxiv.org/abs/2001.08893v1
PDF https://arxiv.org/pdf/2001.08893v1.pdf
PWC https://paperswithcode.com/paper/character-independent-font-identification
Repo
Framework

A Fully Online Approach for Covariance Matrices Estimation of Stochastic Gradient Descent Solutions

Title A Fully Online Approach for Covariance Matrices Estimation of Stochastic Gradient Descent Solutions
Authors Wanrong Zhu, Xi Chen, Wei Biao Wu
Abstract The stochastic gradient descent (SGD) algorithm is widely used for parameter estimation, especially in the online setting. While this recursive algorithm is popular for its computation and memory efficiency, the problem of quantifying the variability and randomness of its solutions has rarely been studied. This paper aims to conduct statistical inference on SGD-based estimates in the online setting. In particular, we propose a fully online estimator for the covariance matrix of averaged SGD iterates (ASGD). Based on the classic asymptotic normality results of ASGD, we construct asymptotically valid confidence intervals for model parameters. Upon receiving new observations, we can quickly update the covariance estimator and confidence intervals. This approach fits the online setting even if the total number of data points is unknown, and takes full advantage of SGD: efficiency in both computation and memory.
Tasks
Published 2020-02-10
URL https://arxiv.org/abs/2002.03979v1
PDF https://arxiv.org/pdf/2002.03979v1.pdf
PWC https://paperswithcode.com/paper/a-fully-online-approach-for-covariance
Repo
Framework
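
The abstract above builds on averaged SGD (ASGD), where the running mean of the iterates is the estimate whose variability is being quantified. The following is a minimal sketch of that underlying recursion only, not of the paper's covariance estimator. The least-squares model, the step-size schedule lr0 * t ** -decay, and the names averaged_sgd and make_stream are illustrative assumptions that do not come from the abstract.

```python
import random


def averaged_sgd(stream, dim, lr0=0.1, decay=0.6):
    """Run SGD on a least-squares loss and keep a running average of iterates.

    Both updates are fully online: each observation is used once and then
    discarded, so memory does not grow with the number of observations.
    """
    theta = [0.0] * dim       # current SGD iterate
    theta_bar = [0.0] * dim   # running average of all iterates so far
    for t, (x, y) in enumerate(stream, start=1):
        error = sum(w * xi for w, xi in zip(theta, x)) - y
        step = lr0 * t ** (-decay)  # assumed polynomially decaying step size
        theta = [w - step * error * xi for w, xi in zip(theta, x)]
        # Incremental mean: bar_t = bar_{t-1} + (theta_t - bar_{t-1}) / t
        theta_bar = [b + (w - b) / t for b, w in zip(theta_bar, theta)]
    return theta, theta_bar


def make_stream(n, true_theta, noise=0.1, seed=0):
    """Yield n synthetic (x, y) pairs from a noisy linear model (toy data)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_theta]
        y = sum(w * xi for w, xi in zip(true_theta, x)) + rng.gauss(0.0, noise)
        yield x, y


if __name__ == "__main__":
    last, averaged = averaged_sgd(make_stream(20000, [2.0, -1.0]), dim=2)
    print("last iterate:", last)
    print("averaged iterate:", averaged)
```

Because the average is updated incrementally, the total number of observations does not need to be known in advance, which is the property the abstract emphasizes for the online setting.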

Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint Decision and Feature Aggregation

Title Real-Time Semantic Segmentation via Auto Depth, Downsampling Joint Decision and Feature Aggregation
Authors Peng Sun, Jiaxiang Wu, Songyuan Li, Peiwen Lin, Junzhou Huang, Xi Li
Abstract To satisfy the stringent requirements on computational resources in the field of real-time semantic segmentation, most approaches focus on the hand-crafted design of light-weight segmentation networks. Recently, Neural Architecture Search (NAS) has been used to search for the optimal building blocks of networks automatically, but the network depth, downsampling strategy, and feature aggregation scheme are still set in advance by trial and error. In this paper, we propose a joint search framework, called AutoRTNet, to automate the design of these strategies. Specifically, we propose hyper-cells to jointly decide the network depth and downsampling strategy, and an aggregation cell to achieve automatic multi-scale feature aggregation. Experimental results show that AutoRTNet achieves 73.9% mIoU on the Cityscapes test set and 110.0 FPS on an NVIDIA TitanXP GPU card with 768x1536 input images.
Tasks Neural Architecture Search, Real-Time Semantic Segmentation, Semantic Segmentation
Published 2020-03-31
URL https://arxiv.org/abs/2003.14226v1
PDF https://arxiv.org/pdf/2003.14226v1.pdf
PWC https://paperswithcode.com/paper/real-time-semantic-segmentation-via-auto
Repo
Framework
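
The accuracy figure in the abstract above is reported as mIoU (mean intersection over union). For readers unfamiliar with the metric, here is a minimal sketch of how it can be computed from flattened label arrays. The function name mean_iou and the choice to skip classes that appear in neither array are assumptions made for illustration; benchmark toolkits typically accumulate counts over an entire dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for flat label sequences."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correct pixel counts once toward both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Classes absent from both prediction and target are skipped (assumption).
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy call, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.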

Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction

Title Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and Objects for 3D Hand Pose Estimation under Hand-Object Interaction
Authors Anil Armagan, Guillermo Garcia-Hernando, Seungryul Baek, Shreyas Hampali, Mahdi Rad, Zhaohui Zhang, Shipeng Xie, MingXiu Chen, Boshen Zhang, Fu Xiong, Yang Xiao, Zhiguo Cao, Junsong Yuan, Pengfei Ren, Weiting Huang, Haifeng Sun, Marek Hrúz, Jakub Kanis, Zdeněk Krňoul, Qingfu Wan, Shile Li, Linlin Yang, Dongheui Lee, Angela Yao, Weiguo Zhou, Sijia Mei, Yunhui Liu, Adrian Spurr, Umar Iqbal, Pavlo Molchanov, Philippe Weinzaepfel, Romain Brégier, Gregory Rogez, Vincent Lepetit, Tae-Kyun Kim
Abstract In this work, we study how well different types of approaches generalise in the task of 3D hand pose estimation under hand-object interaction and single hand scenarios. We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set. Unfortunately, since the space of hand poses is high-dimensional, it is inherently not feasible to cover the whole space densely, despite recent efforts in collecting large-scale training datasets. This sampling problem is even more severe when hands are interacting with objects and/or inputs are RGB rather than depth images, as RGB images also vary with lighting conditions and colors. To address these issues, we designed a public challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set. More exactly, our challenge is designed (a) to evaluate the influence of both depth and color modalities on 3D hand pose estimation, under the presence or absence of objects; (b) to assess the generalisation abilities with respect to four main axes: shapes, articulations, viewpoints, and objects; (c) to explore the use of a synthetic hand model to fill the gaps of current datasets. Through the challenge, the overall accuracy has dramatically improved over the baseline, especially on extrapolation tasks, from 27mm to 13mm mean joint error. Our analyses highlight the impacts of: data pre-processing, ensemble approaches, the use of the MANO model, and different HPE methods/backbones.
Tasks Hand Pose Estimation, Pose Estimation
Published 2020-03-30
URL https://arxiv.org/abs/2003.13764v1
PDF https://arxiv.org/pdf/2003.13764v1.pdf
PWC https://paperswithcode.com/paper/measuring-generalisation-to-unseen-viewpoints
Repo
Framework
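
The improvement quoted in the abstract above is stated as mean joint error in millimetres. As a small sketch of that kind of metric, the function below averages the Euclidean distance between predicted and ground-truth 3D joint positions. The name mean_joint_error and the input layout (a list of (x, y, z) tuples in millimetres) are assumptions; the challenge's exact evaluation protocol is not described in the abstract.

```python
import math


def mean_joint_error(pred_joints, gt_joints):
    """Average Euclidean distance between matching 3D joints.

    Inputs are equal-length sequences of (x, y, z) coordinates; the result
    is in the same unit as the inputs (millimetres in the abstract's case).
    """
    if len(pred_joints) != len(gt_joints):
        raise ValueError("prediction and ground truth must have the same length")
    if not pred_joints:
        raise ValueError("at least one joint is required")
    distances = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    print(mean_joint_error(pred, gt))
```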

Resolving Spurious Correlations in Causal Models of Environments via Interventions

Title Resolving Spurious Correlations in Causal Models of Environments via Interventions
Authors Sergei Volodin, Nevan Wichers, Jeremy Nixon
Abstract Causal models could increase interpretability, robustness to distributional shift and sample efficiency of RL agents. In this vein, we address the question of learning a causal model of an RL environment. This problem is known to be difficult due to spurious correlations. We overcome this difficulty by rewarding an RL agent for designing and executing interventions to discover the true model. We compare rewarding the agent for disproving uncertain edges in the causal graph, rewarding the agent for activating a certain node, or rewarding the agent for increasing the causal graph loss. We show that our methods result in a better causal graph than one generated by following the random policy, or a policy trained on the environment’s reward. We find that rewarding for the causal graph loss works the best.
Tasks
Published 2020-02-12
URL https://arxiv.org/abs/2002.05217v1
PDF https://arxiv.org/pdf/2002.05217v1.pdf
PWC https://paperswithcode.com/paper/resolving-spurious-correlations-in-causal
Repo
Framework

XCS Classifier System with Experience Replay

Title XCS Classifier System with Experience Replay
Authors Anthony Stein, Roland Maier, Lukas Rosenbauer, Jörg Hähner
Abstract XCS constitutes the most deeply investigated classifier system today. It has strong potential and comes with inherent capabilities for mastering a variety of different learning tasks. Besides outstanding successes in various classification and regression tasks, XCS also proved very effective in certain multi-step environments from the domain of reinforcement learning. Especially in the latter domain, recent advances have been mainly driven by algorithms which model their policies based on deep neural networks, among which the Deep-Q-Network (DQN) is a prominent representative. Experience Replay (ER) constitutes one of the crucial factors for the DQN’s successes, since it facilitates stabilized training of the neural network-based Q-function approximators. Surprisingly, XCS barely takes advantage of similar mechanisms that leverage stored raw experiences encountered so far. To bridge this gap, this paper investigates the benefits of extending XCS with ER. On the one hand, we demonstrate that for single-step tasks ER bears massive potential for improvements in terms of sample efficiency. On the downside, however, we reveal that the use of ER might further aggravate well-studied issues not yet solved for XCS when applied to sequential decision problems demanding long action chains.
Tasks
Published 2020-02-13
URL https://arxiv.org/abs/2002.05628v1
PDF https://arxiv.org/pdf/2002.05628v1.pdf
PWC https://paperswithcode.com/paper/xcs-classifier-system-with-experience-replay
Repo
Framework
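
Experience Replay, as referred to in the abstract above, stores raw transitions and later re-samples them for additional learning updates. The sketch below shows a generic fixed-capacity replay buffer of the kind commonly paired with DQN. It is not taken from the paper, and how the sampled transitions would be fed into XCS is not shown; the class name ReplayBuffer and its methods are illustrative assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        """Store one transition observed from the environment."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return up to batch_size distinct stored transitions."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3, seed=0)
    for step in range(5):
        buf.add(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
    print(len(buf), buf.sample(2))
```

Re-using stored transitions is what gives the sample-efficiency benefit the abstract reports for single-step tasks: each interaction with the environment can contribute to more than one update.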

Generating Followup Questions for Interpretable Multi-hop Question Answering

Title Generating Followup Questions for Interpretable Multi-hop Question Answering
Authors Christopher Malon, Bing Bai
Abstract We propose a framework for answering open domain multi-hop questions in which partial information is read and used to generate followup questions, to finally be answered by a pretrained single-hop answer extractor. This framework makes each hop interpretable, and makes the retrieval associated with later hops as flexible and specific as for the first hop. As a first instantiation of this framework, we train a pointer-generator network to predict followup questions based on the question and partial information. This provides a novel application of a neural question generation network, which is applied to give weak ground truth single-hop followup questions based on the final answers and their supporting facts. Learning to generate followup questions that select the relevant answer spans against downstream supporting facts, while avoiding distracting premises, poses an exciting semantic challenge for text generation. We present an evaluation using the two-hop bridge questions of HotpotQA.
Tasks Question Answering, Question Generation, Text Generation
Published 2020-02-27
URL https://arxiv.org/abs/2002.12344v1
PDF https://arxiv.org/pdf/2002.12344v1.pdf
PWC https://paperswithcode.com/paper/generating-followup-questions-for
Repo
Framework

Generating Embroidery Patterns Using Image-to-Image Translation

Title Generating Embroidery Patterns Using Image-to-Image Translation
Authors Mohammad Akif Beg, Jia Yuan Yu
Abstract In many scenarios in computer vision, machine learning, and computer graphics, there is a requirement to learn the mapping from an image of one domain to an image of another domain, called image-to-image translation. Examples include style transfer, object transfiguration, visually altering the appearance of weather conditions in an image, changing the appearance of a day image into a night image or vice versa, and photo enhancement, to name a few. In this paper, we propose two machine learning techniques to solve the embroidery image-to-image translation. Our goal is to generate a preview image which looks similar to an embroidered image, from a user-uploaded image. Our techniques are modifications of two existing techniques, neural style transfer and the cycle-consistent generative adversarial network. Neural style transfer renders the semantic content of an image from one domain in the style of a different image in another domain, whereas a cycle-consistent generative adversarial network learns the mapping from an input image to an output image without any paired training data, and also learns a loss function to train this mapping. Furthermore, the techniques we propose are independent of any embroidery attributes, such as elevation of the image, light-source, start and endpoints of a stitch, type of stitch used, fabric type, etc. Given the user image, our techniques can generate a preview image which looks similar to an embroidered image. We train and test our proposed techniques on an embroidery dataset which consists of simple 2D images. To do so, we prepare an unpaired embroidery dataset with more than 8000 user-uploaded images along with embroidered images. Empirical results show that these techniques successfully generate an approximate preview of an embroidered version of a user image, which can help users in decision making.
Tasks Decision Making, Image-to-Image Translation, Style Transfer
Published 2020-03-05
URL https://arxiv.org/abs/2003.02909v1
PDF https://arxiv.org/pdf/2003.02909v1.pdf
PWC https://paperswithcode.com/paper/generating-embroidery-patterns-using-image-to
Repo
Framework

HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing

Title HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing
Authors Xiyou Zhou, Zhiyu Chen, Xiaoyong Jin, William Yang Wang
Abstract Computation-intensive pretrained models have been taking the lead on many natural language processing benchmarks such as GLUE. However, energy efficiency in the process of model training and inference becomes a critical bottleneck. We introduce HULK, a multi-task energy efficiency benchmarking platform for responsible natural language processing. With HULK, we compare pretrained models’ energy efficiency from the perspectives of time and cost. Baseline benchmarking results are provided for further analysis. The fine-tuning efficiency of different pretrained models can differ a lot among different tasks, and a smaller number of parameters does not necessarily imply better efficiency. We analyze this phenomenon and demonstrate a method of comparing the multi-task efficiency of pretrained models. Our platform is available at https://sites.engineering.ucsb.edu/~xiyou/hulk/.
Tasks
Published 2020-02-14
URL https://arxiv.org/abs/2002.05829v1
PDF https://arxiv.org/pdf/2002.05829v1.pdf
PWC https://paperswithcode.com/paper/hulk-an-energy-efficiency-benchmark-platform
Repo
Framework

Adaptive Expansion Bayesian Optimization for Unbounded Global Optimization

Title Adaptive Expansion Bayesian Optimization for Unbounded Global Optimization
Authors Wei Chen, Mark Fuge
Abstract Bayesian optimization is normally performed within fixed variable bounds. In cases like hyperparameter tuning for machine learning algorithms, setting the variable bounds is not trivial. It is hard to guarantee that any fixed bounds will include the true global optimum. We propose a Bayesian optimization approach that only needs to specify an initial search space that does not necessarily include the global optimum, and expands the search space when necessary. However, over-exploration may occur during the search space expansion. Our method can adaptively balance exploration and exploitation in an expanding space. Results on a range of synthetic test functions and an MLP hyperparameter optimization task show that the proposed method outperforms or is at least as good as the current state-of-the-art methods.
Tasks Hyperparameter Optimization
Published 2020-01-12
URL https://arxiv.org/abs/2001.04815v1
PDF https://arxiv.org/pdf/2001.04815v1.pdf
PWC https://paperswithcode.com/paper/adaptive-expansion-bayesian-optimization-for
Repo
Framework
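
The abstract above describes starting from an initial search space and expanding it when necessary, but it does not state the expansion rule. Purely to illustrate the general idea of an expanding box, the sketch below grows a bound whenever the best point found so far lies close to it. This heuristic, the function name expand_bounds, and the margin and growth parameters are all assumptions and should not be read as the paper's adaptive method.

```python
def expand_bounds(bounds, best_x, margin=0.1, growth=0.5):
    """Grow each axis of a box whose edge the incumbent is close to.

    bounds: list of (low, high) pairs, one per variable.
    best_x: coordinates of the best point found so far.
    margin: fraction of the width that counts as "close to the edge".
    growth: fraction of the width added on the side that is expanded.
    """
    new_bounds = []
    for (low, high), x in zip(bounds, best_x):
        width = high - low
        new_low, new_high = low, high
        if x - low <= margin * width:
            new_low = low - growth * width    # incumbent hugs the lower edge
        if high - x <= margin * width:
            new_high = high + growth * width  # incumbent hugs the upper edge
        new_bounds.append((new_low, new_high))
    return new_bounds


if __name__ == "__main__":
    print(expand_bounds([(0.0, 1.0), (0.0, 1.0)], best_x=[0.95, 0.5]))
```

A fixed rule like this one can over-explore once the box becomes large, which is the issue the abstract says the proposed method addresses by balancing exploration and exploitation adaptively.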

Discriminative Adversarial Search for Abstractive Summarization

Title Discriminative Adversarial Search for Abstractive Summarization
Authors Thomas Scialom, Paul-Alexis Dray, Sylvain Lamprier, Benjamin Piwowarski, Jacopo Staiano
Abstract We introduce a novel approach for sequence decoding, Discriminative Adversarial Search (DAS), which has the desirable properties of alleviating the effects of exposure bias without requiring external metrics. Inspired by Generative Adversarial Networks (GANs), wherein a discriminator is used to improve the generator, our method differs from GANs in that the generator parameters are not updated at training time and the discriminator is only used to drive sequence generation at inference time. We investigate the effectiveness of the proposed approach on the task of Abstractive Summarization: the results obtained show that a naive application of DAS improves over the state-of-the-art methods, with further gains obtained via discriminator retraining. Moreover, we show how DAS can be effective for cross-domain adaptation. Finally, all results reported are obtained without additional rule-based filtering strategies, commonly used by the best performing systems available: this indicates that DAS can effectively be deployed without relying on post-hoc modifications of the generated outputs.
Tasks Abstractive Text Summarization, Domain Adaptation
Published 2020-02-24
URL https://arxiv.org/abs/2002.10375v1
PDF https://arxiv.org/pdf/2002.10375v1.pdf
PWC https://paperswithcode.com/paper/discriminative-adversarial-search-for
Repo
Framework
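
The abstract above says the discriminator is used only at inference time to steer sequence generation, with the generator left unchanged. One simple way to picture that is re-ranking candidate sequences by combining the generator's log-probability with a discriminator score, as sketched below. The scoring formula, the alpha weight, the function name rerank, and the toy discriminator are assumptions for illustration; the abstract does not give the exact way DAS integrates the discriminator into the search.

```python
import math


def rerank(candidates, discriminator, alpha=1.0):
    """Order candidate texts by generator log-prob plus a discriminator bonus.

    candidates: list of (text, generator_log_probability) pairs.
    discriminator: callable returning the probability in (0, 1] that a text
        looks human-written (hypothetical interface).
    alpha: weight of the discriminator term (assumed hyperparameter).
    """
    scored = []
    for text, gen_logprob in candidates:
        p_human = max(discriminator(text), 1e-12)  # guard against log(0)
        scored.append((gen_logprob + alpha * math.log(p_human), text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored]


def toy_discriminator(text):
    """Stand-in scorer: treats texts with repeated words as less human-like."""
    words = text.split()
    return 0.9 if len(set(words)) == len(words) else 0.1


if __name__ == "__main__":
    beams = [("the the the cat", -1.0), ("the cat sat down", -1.5)]
    print(rerank(beams, toy_discriminator))
```

In the toy call the repetitive candidate has the higher generator score but drops to second place once the discriminator term is added.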

On the impressive performance of randomly weighted encoders in summarization tasks

Title On the impressive performance of randomly weighted encoders in summarization tasks
Authors Jonathan Pilault, Jaehong Park, Christopher Pal
Abstract In this work, we investigate the performance of untrained randomly initialized encoders in a general class of sequence to sequence models and compare their performance with that of fully-trained encoders on the task of abstractive summarization. We hypothesize that random projections of an input text have enough representational power to encode the hierarchical structure of sentences and semantics of documents. Using a trained decoder to produce abstractive text summaries, we empirically demonstrate that architectures with untrained randomly initialized encoders perform competitively with respect to the equivalent architectures with fully-trained encoders. We further find that the capacity of the encoder not only improves overall model generalization but also closes the performance gap between untrained randomly initialized and fully-trained encoders. To our knowledge, this is the first time that general sequence to sequence models with attention have been assessed for trained and randomly projected representations on abstractive summarization.
Tasks Abstractive Text Summarization
Published 2020-02-21
URL https://arxiv.org/abs/2002.09084v1
PDF https://arxiv.org/pdf/2002.09084v1.pdf
PWC https://paperswithcode.com/paper/on-the-impressive-performance-of-randomly
Repo
Framework