January 25, 2020

Paper Group ANR 1690

Prioritized Sequence Experience Replay. Improving Neural Language Modeling via Adversarial Training. Phase Portraits as Movement Primitives for Fast Humanoid Robot Control. Gradientless Descent: High-Dimensional Zeroth-Order Optimization. Energy-Aware Neural Architecture Optimization with Fast Splitting Steepest Descent. A Comparative Study on Mach …

Prioritized Sequence Experience Replay


Title	Prioritized Sequence Experience Replay
Authors	Marc Brittain, Josh Bertram, Xuxi Yang, Peng Wei
Abstract	Experience replay is widely used in deep reinforcement learning algorithms and allows agents to remember and learn from past experiences. In an effort to learn more efficiently, researchers proposed prioritized experience replay (PER), which samples important transitions more frequently. In this paper, we propose Prioritized Sequence Experience Replay (PSER), a framework for prioritizing sequences of experience in an attempt to both learn more efficiently and obtain better performance. We compare the performance of PER and PSER sampling techniques in a tabular Q-learning environment and in DQN on the Atari 2600 benchmark. We prove theoretically that PSER is guaranteed to converge faster than PER and empirically show that PSER substantially improves upon PER.
Tasks	Q-Learning
Published	2019-05-25
URL	https://arxiv.org/abs/1905.12726v2
PDF	https://arxiv.org/pdf/1905.12726v2.pdf
PWC	https://paperswithcode.com/paper/190512726
Repo
Framework
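
The sequence-level prioritization described above can be illustrated with a small proportional replay buffer. This is an assumption-laden toy, not the authors' implementation: when a transition's priority is updated, a copy decayed by a factor `rho` per step is pushed backward onto up to `window` preceding transitions (both parameter names are hypothetical), never lowering a priority that is already higher. Episode boundaries and importance-sampling weights are ignored for brevity.

```python
import random


class SequencePriorityBuffer:
    """Toy proportional replay buffer with backward priority decay (PSER-style sketch).

    Hypothetical parameters: rho (decay per step back) and window (how many
    preceding transitions receive a decayed copy). Episode boundaries and
    importance-sampling weights are ignored for brevity.
    """

    def __init__(self, rho=0.5, window=3, eps=1e-6):
        self.rho = rho
        self.window = window
        self.eps = eps  # keeps every priority strictly positive
        self.transitions = []
        self.priorities = []

    def add(self, transition, priority=1.0):
        self.transitions.append(transition)
        self.priorities.append(priority + self.eps)

    def update(self, index, td_error):
        p = abs(td_error) + self.eps
        self.priorities[index] = p
        # Push a decayed copy backward, never lowering an existing higher priority.
        for i in range(1, self.window + 1):
            j = index - i
            if j < 0:
                break
            self.priorities[j] = max(self.priorities[j], p * self.rho ** i)

    def sample(self, k, rng=random):
        # Proportional sampling: index i is drawn with probability p_i / sum_j p_j.
        return rng.choices(range(len(self.transitions)), weights=self.priorities, k=k)


buf = SequencePriorityBuffer(rho=0.5, window=3)
for t in range(6):
    buf.add(("state%d" % t, "action", 0.0), priority=0.0)
buf.update(5, td_error=8.0)
print([round(p, 3) for p in buf.priorities])  # [0.0, 0.0, 1.0, 2.0, 4.0, 8.0]
```

With `rho=0.5`, the decayed copies halve with each step back, so the transitions that led up to a surprising event become more likely to be replayed alongside it.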

Improving Neural Language Modeling via Adversarial Training


Title	Improving Neural Language Modeling via Adversarial Training
Authors	Dilin Wang, Chengyue Gong, Qiang Liu
Abstract	Recently, substantial progress has been made in language modeling by using deep neural networks. However, in practice, large-scale neural language models have been shown to be prone to overfitting. In this paper, we present a simple yet highly effective adversarial training mechanism for regularizing neural language models. The idea is to introduce adversarial noise to the output embedding layer while training the models. We show that the optimal adversarial noise has a simple closed-form solution, which allows us to develop a simple and time-efficient algorithm. Theoretically, we show that our adversarial mechanism effectively encourages diversity among the embedding vectors, helping to increase the robustness of models. Empirically, we show that our method improves on the single-model state-of-the-art results for language modeling on Penn Treebank (PTB) and Wikitext-2, achieving test perplexity scores of 46.01 and 38.07, respectively. When applied to machine translation, our method improves the BLEU scores of various transformer-based translation baselines on the WMT14 English-German and IWSLT14 German-English tasks.
Tasks	Language Modelling, Machine Translation
Published	2019-06-10
URL	https://arxiv.org/abs/1906.03805v2
PDF	https://arxiv.org/pdf/1906.03805v2.pdf
PWC	https://paperswithcode.com/paper/improving-neural-language-modeling-via
Repo
Framework
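
As we read the abstract, the closed-form adversarial noise perturbs the output embedding of the target word. If only that embedding may move by a vector of L2 norm at most `eps`, the loss-maximizing shift is `-eps * h / ||h||` (by Cauchy-Schwarz), which lowers the target logit by exactly `eps * ||h||`. The pure-Python sketch below computes that adversarial cross-entropy for one prediction step; the hidden state and embedding values are made up, and the paper's full training procedure is not reproduced.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_entropy(logits, target):
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def adversarial_loss(h, W, target, eps):
    """Cross-entropy after the worst-case shift of the target word's output embedding.

    delta = -eps * h / ||h|| minimizes delta . h over ||delta|| <= eps (Cauchy-Schwarz),
    so only the target logit changes, and it drops by eps * ||h||.
    """
    logits = [dot(w, h) for w in W]
    logits[target] -= eps * math.sqrt(dot(h, h))
    return cross_entropy(logits, target)


h = [0.5, -1.0, 2.0]                                       # hypothetical hidden state
W = [[0.1, 0.2, 0.9], [0.3, -0.4, 0.0], [-0.2, 0.1, 0.1]]  # hypothetical output embeddings
clean = cross_entropy([dot(w, h) for w in W], 0)
adv = adversarial_loss(h, W, 0, eps=0.5)
print(clean < adv)  # True
```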

Phase Portraits as Movement Primitives for Fast Humanoid Robot Control


Title	Phase Portraits as Movement Primitives for Fast Humanoid Robot Control
Authors	Guilherme Maeda, Okan Koc, Jun Morimoto
Abstract	Current approaches to fast robot control rely largely on solving online optimal control problems. Such methods are known to be computationally intensive and sensitive to model accuracy. Animals, on the other hand, plan complex motor actions quickly and seemingly with little effort, even on unseen tasks. This natural sense of time and coordination motivates us to approach robot control from a motor skill learning perspective, designing fast and computationally light controllers that the robot can learn autonomously under mild modeling assumptions. This article introduces Phase Portrait Movement Primitives (PPMP), a primitive that predicts dynamics in a low-dimensional phase space, which in turn is used to govern the high-dimensional kinematics of the task. The key difference from other primitive formulations is a built-in mechanism for phase prediction in the form of coupled oscillators, which replaces model-based state estimators such as Kalman filters. The policy is trained by optimizing the parameters of the oscillators, whose output is connected to a kinematic distribution in the form of a phase portrait. The drastic reduction in dimensionality allows us to efficiently train and execute PPMPs on a real human-sized, dual-arm humanoid upper body in a task involving 20 degrees of freedom. We demonstrate PPMPs in interactions requiring fast reaction times while generating anticipative pose adaptation in both discrete and cyclic tasks.
Tasks
Published	2019-12-07
URL	https://arxiv.org/abs/1912.03535v1
PDF	https://arxiv.org/pdf/1912.03535v1.pdf
PWC	https://paperswithcode.com/paper/phase-portraits-as-movement-primitives-for
Repo
Framework

Gradientless Descent: High-Dimensional Zeroth-Order Optimization


Title	Gradientless Descent: High-Dimensional Zeroth-Order Optimization
Authors	Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, Qiuyi Zhang
Abstract	Zeroth-order optimization is the process of minimizing an objective $f(x)$, given oracle access to evaluations at adaptively chosen inputs $x$. In this paper, we present two simple yet powerful GradientLess Descent (GLD) algorithms that do not rely on an underlying gradient estimate and are numerically stable. We analyze our algorithms from a novel geometric perspective and show convergence to within an $\epsilon$-ball of the optimum in $O(kQ\log(n)\log(R/\epsilon))$ evaluations, for any monotone transform of a smooth and strongly convex objective with latent dimension $k < n$, where $n$ is the input dimension, $R$ is the diameter of the input space and $Q$ is the condition number. Our rates are the first of their kind to be both 1) poly-logarithmically dependent on dimensionality and 2) invariant under monotone transformations. We further leverage our geometric perspective to show that our analysis is optimal. Both monotone invariance and the ability to exploit a low latent dimensionality are key to the empirical success of our algorithms, as demonstrated on BBOB and MuJoCo benchmarks.
Tasks
Published	2019-11-14
URL	https://arxiv.org/abs/1911.06317v3
PDF	https://arxiv.org/pdf/1911.06317v3.pdf
PWC	https://paperswithcode.com/paper/gradientless-descent-high-dimensional-zeroth-1
Repo
Framework
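
The comparison-only nature of GLD can be sketched as follows, based on our reading of the GLD-Search variant: at each iteration, sample one candidate from balls of geometrically shrinking radii between `max_radius` and `min_radius` around the current point, and keep the best of the current point and all candidates. Because the loop only compares objective values, any strictly increasing transform of `f` yields the same iterates. Parameter names and sampling details are assumptions of this sketch, not the paper's reference code.

```python
import math
import random


def sample_ball(n, radius, rng):
    # Uniform sample in an n-ball: Gaussian direction, radius scaled by U ** (1 / n).
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g)) or 1.0
    scale = radius * rng.random() ** (1.0 / n)
    return [scale * v / norm for v in g]


def gld_search(f, x0, max_radius, min_radius, iters, seed=0):
    """GLD-Search-style loop (sketch): compares f-values only, never estimates gradients."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    n = len(x)
    num_radii = int(math.ceil(math.log2(max_radius / min_radius))) + 1
    for _ in range(iters):
        best, f_best = x, fx
        for k in range(num_radii):
            radius = max_radius * 2.0 ** (-k)  # geometrically shrinking search radii
            step = sample_ball(n, radius, rng)
            y = [a + b for a, b in zip(x, step)]
            fy = f(y)
            if fy < f_best:
                best, f_best = y, fy
        x, fx = best, f_best  # keep the best of the current point and all candidates
    return x, fx


def sphere(x):
    return sum((v - 1.0) ** 2 for v in x)


x_best, f_best = gld_search(sphere, [0.0] * 5, max_radius=4.0, min_radius=1e-3, iters=300)
print(f_best)
```

Running it on `2 * sphere` instead of `sphere` with the same `seed` produces identical iterates, a small instance of the monotone invariance the abstract mentions.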

Energy-Aware Neural Architecture Optimization with Fast Splitting Steepest Descent


Title	Energy-Aware Neural Architecture Optimization with Fast Splitting Steepest Descent
Authors	Dilin Wang, Meng Li, Lemeng Wu, Vikas Chandra, Qiang Liu
Abstract	Designing energy-efficient networks is of critical importance for enabling state-of-the-art deep learning in mobile and edge settings where the computation and energy budgets are highly limited. Recently, Wu et al. (2019) framed the search for efficient neural architectures as a continuous splitting process: it iteratively splits existing neurons into multiple offspring to achieve progressive loss minimization, thus finding novel architectures by gradually growing the neural network. However, this method was not specifically tailored to designing energy-efficient networks and is computationally expensive on large-scale benchmarks. In this work, we substantially improve on Wu et al. (2019) in two significant ways: 1) we incorporate the energy cost of splitting different neurons to better guide the splitting process, thereby discovering more energy-efficient network architectures; 2) we substantially speed up the splitting process of Wu et al. (2019), which requires expensive eigen-decomposition, by proposing a highly scalable Rayleigh-quotient stochastic gradient algorithm. Our fast algorithm reduces the computational cost of splitting to the level of typical back-propagation updates and enables efficient implementation on GPUs. Extensive empirical results show that our method can train highly accurate and energy-efficient networks on challenging datasets such as ImageNet, improving on a variety of baselines, including pruning-based methods and expert-designed architectures.
Tasks
Published	2019-10-07
URL	https://arxiv.org/abs/1910.03103v1
PDF	https://arxiv.org/pdf/1910.03103v1.pdf
PWC	https://paperswithcode.com/paper/energy-aware-neural-architecture-optimization
Repo
Framework
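
The eigen-decomposition that the Rayleigh-quotient algorithm replaces is a search for the minimum eigenvalue of a symmetric splitting matrix (in splitting steepest descent, as we understand it, a neuron is a candidate for splitting when that eigenvalue is negative). The sketch below is a deterministic, full-matrix stand-in for the paper's stochastic version: it descends the Rayleigh quotient R(v) = v^T S v / v^T v on a small explicit matrix and renormalizes after each step. The learning rate and iteration count are illustrative choices.

```python
import math


def matvec(S, v):
    return [sum(s * x for s, x in zip(row, v)) for row in S]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def min_eigpair_rayleigh(S, v0, lr=0.1, iters=500):
    """Minimum eigenpair of a symmetric matrix S by Rayleigh-quotient descent (sketch).

    For a unit vector v, R(v) = v^T S v, and its gradient is 2 * (S v - R(v) v).
    """
    v = normalize(v0)
    for _ in range(iters):
        Sv = matvec(S, v)
        rq = sum(a * b for a, b in zip(v, Sv))
        grad = [a - rq * b for a, b in zip(Sv, v)]  # gradient up to the factor 2
        v = normalize([b - lr * g for b, g in zip(v, grad)])  # step, then back to the unit sphere
    rq = sum(a * b for a, b in zip(v, matvec(S, v)))
    return rq, v


S = [[4.0, 1.0, 0.0], [1.0, 4.0, 0.0], [0.0, 0.0, 5.0]]  # eigenvalues 3, 5, 5
lam, vec = min_eigpair_rayleigh(S, [1.0, 0.2, 0.3])
print(round(lam, 6))  # 3.0
```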

A Comparative Study on Machine Learning Algorithms for the Control of a Wall Following Robot


Title	A Comparative Study on Machine Learning Algorithms for the Control of a Wall Following Robot
Authors	Issam Hammad, Kamal El-Sankary, Jason Gu
Abstract	This paper presents a comparison of the performance of various machine learning models for predicting the direction of a wall-following robot. The models were trained on an open-source dataset that contains 24 ultrasound sensor readings and the corresponding direction for each sample. This dataset was captured with a SCITOS G5 mobile robot by placing the sensors on the robot's waist. In addition to the full format with 24 sensors per record, the dataset has two simplified formats with 4 and 2 sensor readings per record. Several control models were previously proposed for this dataset using all three formats. This paper makes two primary research contributions. First, we present machine learning models with accuracies higher than all previously proposed models for this dataset in all three formats. A perfect solution for the 4- and 2-input sensor formats is obtained with a Decision Tree Classifier, which achieves a mean accuracy of 100%. With the 24 sensor inputs, a mean accuracy of 99.82% was achieved by employing the Gradient Boost Classifier. Second, we present a comparative study on the performance of different machine learning and deep learning algorithms on this dataset, providing overall insight into the performance of these algorithms for similar sensor fusion problems. All the models in this paper were evaluated using Monte-Carlo cross-validation.
Tasks	Sensor Fusion
Published	2019-12-26
URL	https://arxiv.org/abs/1912.11856v1
PDF	https://arxiv.org/pdf/1912.11856v1.pdf
PWC	https://paperswithcode.com/paper/a-comparative-study-on-machine-learning
Repo
Framework
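
Monte-Carlo cross-validation, used to evaluate every model in the study, repeats random train/test splits and averages the test accuracy. A minimal sketch, assuming a nearest-centroid classifier as a simple stand-in for the Decision Tree and Gradient Boost classifiers in the paper, and a synthetic two-feature dataset in place of the ultrasound readings; the split count and test fraction are arbitrary.

```python
import random
from statistics import mean


def nearest_centroid_fit(X, y):
    # Stand-in classifier (assumption): one mean feature vector per class.
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def monte_carlo_cv(X, y, fit, predict, n_splits=20, test_frac=0.3, seed=0):
    """Mean test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = max(1, int(len(X) * test_frac))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)  # a fresh random split each repetition
        test, train = idx[:n_test], idx[n_test:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / n_test)
    return mean(scores)


# Hypothetical two-feature "sensor" data with two well-separated classes.
rng = random.Random(1)
X, y = [], []
for _ in range(60):
    X.append([2.0 + rng.gauss(0, 0.1), 1.0 + rng.gauss(0, 0.1)])
    y.append("forward")
    X.append([0.5 + rng.gauss(0, 0.1), 1.0 + rng.gauss(0, 0.1)])
    y.append("turn")

acc = monte_carlo_cv(X, y, nearest_centroid_fit, nearest_centroid_predict)
print(acc)
```

Unlike k-fold cross-validation, the random splits may overlap, so the number of repetitions can be chosen independently of the test fraction.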

Learning a Domain-Invariant Embedding for Unsupervised Domain Adaptation Using Class-Conditioned Distribution Alignment


Title	Learning a Domain-Invariant Embedding for Unsupervised Domain Adaptation Using Class-Conditioned Distribution Alignment
Authors	Alex Gabourie, Mohammad Rostami, Philip Pope, Soheil Kolouri, Kyungnam Kim
Abstract	We address the problem of unsupervised domain adaptation (UDA) by learning a domain-agnostic embedding space in which the distance between the probability distributions of the source and target visual domains is minimized. We use the output space of a shared cross-domain deep encoder to model the embedding space and use the Sliced-Wasserstein Distance (SWD) to measure and minimize the distance between the embedded distributions of the source and target domains, enforcing the embedding to be domain-agnostic. Additionally, we use the labeled source-domain data to train a deep classifier from the embedding space to the label space, enforcing the embedding space to be discriminative. As a result of this training scheme, we provide an effective solution for training the deep classification network on the source domain such that it generalizes well to the target domain, where only unlabeled training data is accessible. To mitigate the challenge of class matching, we also align corresponding classes in the embedding space by using high-confidence pseudo-labels for the target domain, i.e., assigning the class for which the source classifier has a high prediction probability. We provide experimental results on UDA benchmark tasks demonstrating that our method is effective and achieves state-of-the-art performance.
Tasks	Domain Adaptation, Unsupervised Domain Adaptation
Published	2019-07-04
URL	https://arxiv.org/abs/1907.02271v2
PDF	https://arxiv.org/pdf/1907.02271v2.pdf
PWC	https://paperswithcode.com/paper/learning-a-domain-invariant-embedding-for
Repo
Framework
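
The Sliced-Wasserstein Distance mentioned above can be estimated by projecting both embedded sample sets onto random unit directions, sorting the 1-D projections, and averaging the squared differences of matched values. The sketch assumes equal sample sizes (so sorted matching is the optimal 1-D coupling) and returns the squared SW-2 estimate; it is a generic estimator, not the authors' training code.

```python
import math
import random


def random_direction(d, rng):
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def sliced_wasserstein_sq(X, Y, n_projections=50, seed=0):
    """Monte-Carlo estimate of the squared sliced 2-Wasserstein distance (sketch).

    Assumes X and Y contain the same number of d-dimensional points, so matching
    sorted 1-D projections is the optimal transport plan along each direction.
    """
    if len(X) != len(Y):
        raise ValueError("this sketch requires equal sample sizes")
    rng = random.Random(seed)
    d = len(X[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(d, rng)
        px = sorted(sum(a * b for a, b in zip(x, theta)) for x in X)
        py = sorted(sum(a * b for a, b in zip(p, theta)) for p in Y)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(X)
    return total / n_projections


rng = random.Random(3)
source = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
target = [[a + 1.0, b] for a, b in source]  # same cloud shifted by (1, 0)
print(sliced_wasserstein_sq(source, source), sliced_wasserstein_sq(source, target))
```

Shifting every point by a fixed vector `s` changes each projection by `s . theta`, so doubling the shift quadruples the estimate under the same random directions.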

Variational Quantum Algorithms for Dimensionality Reduction and Classification


Title	Variational Quantum Algorithms for Dimensionality Reduction and Classification
Authors	Jin-Min Liang, Shu-Qian Shen, Ming Li, Lei Li
Abstract	In this work, we present a quantum neighborhood preserving embedding and a quantum local discriminant embedding for dimensionality reduction and classification. We demonstrate that these two algorithms achieve an exponential speedup over their respective classical counterparts. Along the way, we propose a variational quantum generalized eigenvalue solver that finds the generalized eigenvalues and eigenstates of a matrix pencil $(\mathcal{G},\mathcal{S})$. As a proof of principle, we implement our algorithm to solve $2^5\times2^5$ generalized eigenvalue problems. Finally, our results offer two output options, in quantum or classical form, which can be applied directly in a subsequent quantum or classical machine learning process.
Tasks	Dimensionality Reduction
Published	2019-10-27
URL	https://arxiv.org/abs/1910.12164v2
PDF	https://arxiv.org/pdf/1910.12164v2.pdf
PWC	https://paperswithcode.com/paper/variational-quantum-algorithms-for
Repo
Framework
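
For orientation, the problem the variational solver targets is G v = λ S v for the matrix pencil (G, S). The classical 2×2 case below (no quantum circuit involved) simply finds the roots of det(G - λS) = 0, assuming symmetric G and symmetric positive-definite S so that both roots are real.

```python
import math


def generalized_eigenvalues_2x2(G, S):
    """Roots of det(G - lam * S) = 0 for symmetric G and symmetric positive-definite S (2x2)."""
    (g11, g12), (_, g22) = G
    (s11, s12), (_, s22) = S
    # det(G - lam*S) = a*lam**2 + b*lam + c
    a = s11 * s22 - s12 ** 2
    b = -(g11 * s22 + g22 * s11 - 2 * g12 * s12)
    c = g11 * g22 - g12 ** 2
    disc = math.sqrt(max(0.0, b * b - 4 * a * c))  # real roots for this pencil; clamp rounding noise
    return sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])


print(generalized_eigenvalues_2x2([[2.0, 0.0], [0.0, 3.0]], [[1.0, 0.0], [0.0, 1.0]]))  # [2.0, 3.0]
```

With S equal to the identity, the pencil reduces to the ordinary eigenvalue problem, which is why the example returns the diagonal of G.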

Semi-Bagging Based Deep Neural Architecture to Extract Text from High Entropy Images


Title	Semi-Bagging Based Deep Neural Architecture to Extract Text from High Entropy Images
Authors	Pranay Dugar, Anirban Chatterjee, Rajesh Shreedhar Bhat, Saswata Sahoo
Abstract	Extracting text of various sizes and shapes from images containing multiple objects is an important problem in many contexts, especially in connection with e-commerce, augmented reality assistance systems in natural scenes, etc. Existing works (based only on CNNs) often perform sub-optimally when the image contains high-entropy regions with multiple objects. This paper presents an end-to-end text detection strategy that combines a segmentation algorithm with an ensemble of multiple text detectors of different types to detect text in each image segment independently. The proposed strategy involves a super-pixel-based image segmenter that splits an image into multiple regions. A convolutional deep neural architecture is developed that works on each of the segments and detects text of multiple shapes, sizes, and structures. It outperforms competing methods in terms of coverage when detecting text in images, especially those where text of various types and sizes is packed into a small region along with various other objects. Furthermore, the proposed text detection method, combined with a text recognizer, outperforms existing state-of-the-art approaches in extracting text from high-entropy images. We validate the results on a dataset of product images from an e-commerce website.
Tasks
Published	2019-07-02
URL	https://arxiv.org/abs/1907.01284v1
PDF	https://arxiv.org/pdf/1907.01284v1.pdf
PWC	https://paperswithcode.com/paper/semi-bagging-based-deep-neural-architecture
Repo
Framework

Dynamic Region Division for Adaptive Learning Pedestrian Counting


Title	Dynamic Region Division for Adaptive Learning Pedestrian Counting
Authors	Gaoqi He, Zhenwei Ma, Binhao Huang, Bin Sheng, Yubo Yuan
Abstract	Accurate pedestrian counting algorithms are critical for eliminating insecurity in congested public scenes. However, counting pedestrians in crowded scenes often suffers from severe perspective distortion. In this paper, building on the straight-line double-region pedestrian counting method, we propose a dynamic region division algorithm that keeps the counted objects whole. Using the object bounding boxes obtained by YoloV3 and the expected division line of the scene, the boundary between the nearby and distant regions is generated so that every head is retained whole. Then, appropriate learning models are applied to count pedestrians in each region. In the distant region, a novel inception dilated convolutional neural network is proposed to address the problem of choosing the dilation rate. In the nearby region, YoloV3 is used to detect pedestrians at multiple scales. The total number of pedestrians in each frame is then obtained by fusing the results from the nearby and distant regions. A typical subway pedestrian video dataset is used for the experiments in this paper. The results demonstrate that the proposed algorithm outperforms existing machine-learning-based methods in overall performance.
Tasks
Published	2019-08-12
URL	https://arxiv.org/abs/1908.03978v1
PDF	https://arxiv.org/pdf/1908.03978v1.pdf
PWC	https://paperswithcode.com/paper/dynamic-region-division-for-adaptive-learning
Repo
Framework
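
The dynamic division step can be illustrated with a deliberately simplified rule. Everything here is a hypothetical sketch rather than the paper's algorithm: head boxes are (top, bottom) pixel rows, the expected division line is pushed down past any head it would cut so that head stays whole in the distant region, and the frame count is the number of heads entirely below the line (the detector's region) plus an externally supplied estimate for the distant region, standing in for the density network.

```python
def adjust_division_line(expected_row, head_boxes):
    """Push a horizontal division line down until it cuts through no head box (hypothetical rule).

    head_boxes: (top, bottom) pixel rows, with rows increasing downward.
    """
    row = expected_row
    moved = True
    while moved:
        moved = False
        for top, bottom in head_boxes:
            if top < row < bottom:  # the line would split this head
                row = bottom        # keep the whole head in the distant region
                moved = True
    return row


def fuse_counts(row, head_boxes, distant_estimate):
    # Nearby region: heads entirely below the line, as a detector would count them.
    nearby = sum(1 for top, _ in head_boxes if top >= row)
    # Distant region: supplied by a separate estimator (a stand-in for the density network).
    return nearby + distant_estimate


heads = [(90, 110), (105, 120), (30, 40), (130, 150)]
row = adjust_division_line(100, heads)
print(row, fuse_counts(row, heads, distant_estimate=7))  # 120 8
```

The loop always terminates because the line only moves down and can never pass the lowest box bottom.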

Fairest of Them All: Establishing a Strong Baseline for Cross-Domain Person ReID


Title	Fairest of Them All: Establishing a Strong Baseline for Cross-Domain Person ReID
Authors	Devinder Kumar, Parthipan Siva, Paul Marchwica, Alexander Wong
Abstract	Person re-identification (ReID) remains a very difficult challenge in computer vision and is critical for large-scale video surveillance scenarios where an individual could appear in different camera views at different times. There has been recent interest in tackling this challenge using cross-domain approaches, which leverage data from source domains that differ from the target domain. Such approaches are more practical for widespread real-world deployment given that they don’t require on-site training (as with unsupervised or domain transfer approaches) or on-site manual annotation and training (as with supervised approaches). In this study, we take a systematic approach to establishing a large baseline source domain and target domain for cross-domain person ReID. We accomplish this by conducting a comprehensive analysis of the similarities between source domains proposed in the literature and by studying the effects of incrementally increasing the size of the source domain. This allows us to establish a balanced source domain and target domain split that promotes variety in both source and target domains. Furthermore, using lessons learned from state-of-the-art supervised person re-identification methods, we establish a strong baseline method for cross-domain person ReID. Experiments show that a source domain composed of two of the largest person ReID domains (SYSU and MSMT) performs well across six commonly used target domains. Furthermore, we show that, surprisingly, two of the recent commonly used domains (PRID and GRID) have too few query images to provide meaningful insights. As such, based on our findings, we propose the following balanced baseline for cross-domain person ReID consisting of: i) a fixed multi-source domain consisting of SYSU, MSMT, Airport and 3DPeS, and ii) a multi-target domain consisting of Market-1501, DukeMTMC-reID, CUHK03, PRID, GRID and VIPeR.
Tasks	Person Re-Identification
Published	2019-07-28
URL	https://arxiv.org/abs/1907.12016v2
PDF	https://arxiv.org/pdf/1907.12016v2.pdf
PWC	https://paperswithcode.com/paper/fairest-of-them-all-establishing-a-strong
Repo
Framework

Automated Machine Learning in Practice: State of the Art and Recent Results


Title	Automated Machine Learning in Practice: State of the Art and Recent Results
Authors	Lukas Tuggener, Mohammadreza Amirian, Katharina Rombach, Stefan Lörwald, Anastasia Varlet, Christian Westermann, Thilo Stadelmann
Abstract	A main driver behind the digitization of industry and society is the belief that data-driven model building and decision making can contribute to higher degrees of automation and more informed decisions. Building such models from data often involves the application of some form of machine learning. Thus, there is an ever-growing demand for a workforce with the necessary skill set to do so. This demand has given rise to a new research topic concerned with fitting machine learning models fully automatically: AutoML. This paper gives an overview of the state of the art in AutoML with a focus on practical applicability in a business context, and provides recent benchmark results for the most important AutoML algorithms.
Tasks	AutoML, Decision Making
Published	2019-07-19
URL	https://arxiv.org/abs/1907.08392v1
PDF	https://arxiv.org/pdf/1907.08392v1.pdf
PWC	https://paperswithcode.com/paper/automated-machine-learning-in-practice-state
Repo
Framework

A Riemanian Approach to Blob Detection in Manifold-Valued Images


Title	A Riemanian Approach to Blob Detection in Manifold-Valued Images
Authors	Aleksei Shestov, Mikhail Kumskov
Abstract	This paper is devoted to the problem of blob detection in manifold-valued images. Our solution is based on new definitions of blob response functions. We define the blob response functions by means of curvatures of an image graph, considered as a submanifold. We call the proposed framework Riemannian blob detection. We prove that our approach can be viewed as a generalization of the grayscale blob detection technique. An expression of the Riemannian blob response functions through the image Hessian is derived. We provide experiments for the case of vector-valued images on 2D surfaces: the proposed framework is tested on the task of chemical compound classification.
Tasks
Published	2019-05-31
URL	https://arxiv.org/abs/1905.13653v1
PDF	https://arxiv.org/pdf/1905.13653v1.pdf
PWC	https://paperswithcode.com/paper/a-riemanian-approach-to-blob-detection-in
Repo
Framework
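
For comparison, a common single-scale grayscale technique scores blobs with the determinant of the image Hessian. The abstract does not say which grayscale formulation the paper generalizes, so treat this as background only: the sketch computes central-difference Hessians on a synthetic Gaussian blob and returns the pixel with the largest response. It is not the Riemannian blob response functions defined in the paper.

```python
import math


def gaussian_blob(size, cx, cy, sigma):
    # Synthetic grayscale image (list of rows) with one bright Gaussian blob at (cx, cy).
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]


def hessian_det(img, x, y):
    """Determinant of the image Hessian at interior pixel (x, y) via central differences."""
    fxx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    fyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    fxy = (img[y + 1][x + 1] - img[y + 1][x - 1] - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    return fxx * fyy - fxy ** 2


def strongest_blob(img):
    # Scan interior pixels and return the (x, y) with the largest determinant-of-Hessian response.
    size = len(img)
    candidates = [(hessian_det(img, x, y), (x, y))
                  for y in range(1, size - 1) for x in range(1, size - 1)]
    return max(candidates)[1]


print(strongest_blob(gaussian_blob(21, 10, 10, 3.0)))  # (10, 10)
```

For a continuous Gaussian of width sigma, the determinant is proportional to f(r)^2 (1 - r^2 / sigma^2), so the response is largest at the blob center and turns negative beyond radius sigma.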

Transferable Representation Learning in Vision-and-Language Navigation


Title	Transferable Representation Learning in Vision-and-Language Navigation
Authors	Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie
Abstract	Vision-and-Language Navigation (VLN) tasks such as Room-to-Room (R2R) require machine agents to interpret natural language instructions and learn to act in visually realistic environments to achieve navigation goals. The overall task requires competence in several perception problems: successful agents combine spatio-temporal, vision and language understanding to produce appropriate action sequences. Our approach adapts pre-trained vision and language representations to relevant in-domain tasks making them more effective for VLN. Specifically, the representations are adapted to solve both a cross-modal sequence alignment and sequence coherence task. In the sequence alignment task, the model determines whether an instruction corresponds to a sequence of visual frames. In the sequence coherence task, the model determines whether the perceptual sequences are predictive sequentially in the instruction-conditioned latent space. By transferring the domain-adapted representations, we improve competitive agents in R2R as measured by the success rate weighted by path length (SPL) metric.
Tasks	Representation Learning
Published	2019-08-09
URL	https://arxiv.org/abs/1908.03409v2
PDF	https://arxiv.org/pdf/1908.03409v2.pdf
PWC	https://paperswithcode.com/paper/transferable-representation-learning-in
Repo
Framework
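
The SPL metric mentioned above averages, over episodes, the success indicator weighted by the ratio of the shortest-path length to the longer of the agent's path and the shortest path. A minimal sketch of that standard definition; the episode tuples are illustrative.

```python
def spl(episodes):
    """Success weighted by Path Length.

    episodes: iterable of (success, shortest_path_length, agent_path_length).
    SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i).
    """
    episodes = list(episodes)
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)


# One optimal success, one success with a path twice as long, one failure.
print(spl([(True, 10.0, 10.0), (True, 10.0, 20.0), (False, 8.0, 5.0)]))  # 0.5
```

Failed episodes contribute zero regardless of path length, so SPL rewards agents that both reach the goal and avoid detours.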

Fruit Detection, Segmentation and 3D Visualisation of Environments in Apple Orchards


Title	Fruit Detection, Segmentation and 3D Visualisation of Environments in Apple Orchards
Authors	Hanwen Kang, Chao Chen
Abstract	Robotic harvesting of fruits in orchards is a challenging task, since the high density and overlap of fruits and branches can heavily impact the success rate of robotic harvesting. Therefore, the vision system is required to provide comprehensive information about the working environment to guide the manipulator and gripping system to successfully detach the target fruits. In this study, a deep-learning-based one-stage detector, DaSNet-V2, is developed to perform multi-task vision sensing in the working environment of apple orchards. DaSNet-V2 combines detection and instance segmentation of fruits and semantic segmentation of branches into a single network architecture. Meanwhile, a lightweight backbone network, LW-net, is utilised in the DaSNet-V2 model to improve its computational efficiency. In the experiment, DaSNet-V2 is tested and evaluated on RGB-D images of the orchard. In the experimental results, DaSNet-V2 with the lightweight backbone achieves an F1 score of 0.844 on detection and mean intersection over union of 0.858 and 0.795 on the instance segmentation of fruits and the semantic segmentation of branches, respectively. To provide a direct view of the working environment in orchards, the sensing results are illustrated through 3D visualisation. The robustness and efficiency of DaSNet-V2 in detection and segmentation are validated by experiments in a real apple-orchard environment.
Tasks	Instance Segmentation, Semantic Segmentation
Published	2019-11-28
URL	https://arxiv.org/abs/1911.12889v1
PDF	https://arxiv.org/pdf/1911.12889v1.pdf
PWC	https://paperswithcode.com/paper/fruit-detection-segmentation-and-3d
Repo
Framework
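
The two kinds of scores reported above can be computed as follows. This is a generic sketch: masks are sets of pixel coordinates, and how detections are matched to ground truth before counting true positives (for example, an IoU threshold) is not stated in the abstract, so it is left out.

```python
def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall; 0.0 when undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def iou(mask_a, mask_b):
    """Intersection over union of two binary masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    return len(mask_a & mask_b) / len(union) if union else 1.0


def mean_iou(pairs):
    # Average IoU over (predicted_mask, ground_truth_mask) pairs.
    pairs = list(pairs)
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


print(f1_score(8, 2, 2), iou({(0, 0), (0, 1)}, {(0, 1), (1, 1)}))
```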