February 1, 2020


Paper Group AWR 258

Personalized Hashtag Recommendation for Micro-videos. gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo. Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations. Fast and Incremental Loop Closure Detection Using Proximity Graphs. Exploiting the Sign of the Advantage Function to Lear …

Personalized Hashtag Recommendation for Micro-videos


Title	Personalized Hashtag Recommendation for Micro-videos
Authors	Yinwei Wei, Zhiyong Cheng, Xuzheng Yu, Zhou Zhao, Lei Zhu, Liqiang Nie
Abstract	Personalized hashtag recommendation methods aim to suggest hashtags that users can use to annotate, categorize, and describe their posts. The hashtags that a user assigns to a post (e.g., a micro-video) are the ones that, in her mind, best describe the post content she is interested in. This means that we should consider both users’ preferences on the post contents and their personal understanding of the hashtags. Most existing methods rely on modeling either the interactions between hashtags and posts or the interactions between users and hashtags for hashtag recommendation. These methods have not fully explored the complicated interactions among users, hashtags, and micro-videos. In this paper, towards personalized micro-video hashtag recommendation, we propose a Graph Convolution Network based Personalized Hashtag Recommendation (GCN-PHR) model, which leverages recent advances in GCN techniques to model the complicated interactions among <users, hashtags, micro-videos> and learn their representations. In our model, the users, hashtags, and micro-videos are three types of nodes in a graph, and they are linked based on their direct associations. In particular, a message-passing strategy is used to learn the representation of a node (e.g., a user) by aggregating the messages passed from directly linked nodes of the other types (e.g., hashtags and micro-videos). Because a user is often interested in only certain parts of a micro-video, and a hashtag is typically used to describe the part (of a micro-video) that the user is interested in, we leverage the attention mechanism to filter the messages passed from micro-videos to users and hashtags, which can significantly improve the representation capability. Extensive experiments conducted on two real-world micro-video datasets demonstrate that our model outperforms the state-of-the-art approaches by a large margin.
Tasks
Published	2019-08-27
URL	https://arxiv.org/abs/1908.09987v1
PDF	https://arxiv.org/pdf/1908.09987v1.pdf
PWC	https://paperswithcode.com/paper/personalized-hashtag-recommendation-for-micro
Repo	https://github.com/weiyinwei/GCN_PHR
Framework	pytorch
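
The attention-filtered message passing described in the abstract above can be illustrated with a small sketch. It is not the GCN-PHR implementation: the dot-product scoring, the softmax weighting, and the names `attend`, `query`, and `messages` are assumptions made for illustration, since the abstract does not say how the attention scores are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, messages):
    """Aggregate neighbor messages, weighted by their relevance to `query`.

    query    -- vector of the receiving node (e.g. a user), hypothetical
    messages -- vectors sent by linked micro-video nodes, hypothetical
    Returns (aggregated_vector, attention_weights).
    """
    # Assumed scoring rule: plain dot product between query and message.
    weights = softmax([dot(query, m) for m in messages])
    dim = len(messages[0])
    aggregated = [
        sum(w * m[d] for w, m in zip(weights, messages)) for d in range(dim)
    ]
    return aggregated, weights


# A user whose preference lies along the first dimension receives messages
# from two micro-videos; the first one matches the preference better.
user = [1.0, 0.0]
video_messages = [[2.0, 0.0], [0.0, 2.0]]
aggregated, weights = attend(user, video_messages)
print(weights, aggregated)
```

The better-matching message receives the larger weight, so the aggregated vector leans toward the part of the content the receiving node cares about.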

gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo


Title	gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo
Authors	Nestor Gonzalez Lopez, Yue Leire Erro Nuin, Elias Barba Moral, Lander Usategui San Juan, Alejandro Solano Rueda, Víctor Mayoral Vilches, Risto Kojcev
Abstract	This paper presents an upgraded, real-world application-oriented version of gym-gazebo, the Robot Operating System (ROS) and Gazebo based Reinforcement Learning (RL) toolkit, which complies with OpenAI Gym. The paper discusses the new ROS 2 based software architecture and summarizes the results obtained using Proximal Policy Optimization (PPO). Ultimately, this work provides a benchmarking system for robotics that allows different techniques and algorithms to be compared under the same virtual conditions. We have evaluated environments of different levels of complexity for the Modular Articulated Robotic Arm (MARA), reaching accuracies at the millimeter scale. The converged results show the feasibility and usefulness of the gym-gazebo2 toolkit, as well as its potential and applicability in industrial use cases using modular robots.
Tasks	Transfer Reinforcement Learning
Published	2019-03-14
URL	http://arxiv.org/abs/1903.06278v2
PDF	http://arxiv.org/pdf/1903.06278v2.pdf
PWC	https://paperswithcode.com/paper/gym-gazebo2-a-toolkit-for-reinforcement
Repo	https://github.com/AcutronicRobotics/gym-gazebo2
Framework	none

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations


Title	Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
Authors	Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum
Abstract	A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator. This is because IRL typically seeks a reward function that makes the demonstrator appear near-optimal, rather than inferring the underlying intentions of the demonstrator that may have been poorly executed in practice. In this paper, we introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations. When combined with deep reinforcement learning, T-REX outperforms state-of-the-art imitation learning and IRL methods on multiple Atari and MuJoCo benchmark tasks and achieves performance that is often more than twice the performance of the best demonstration. We also demonstrate that T-REX is robust to ranking noise and can accurately extrapolate intention by simply watching a learner noisily improve at a task over time.
Tasks	Imitation Learning
Published	2019-04-12
URL	https://arxiv.org/abs/1904.06387v5
PDF	https://arxiv.org/pdf/1904.06387v5.pdf
PWC	https://paperswithcode.com/paper/extrapolating-beyond-suboptimal
Repo	https://github.com/francidellungo/Minigrid_HCI-project
Framework	none
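
Learning a reward from ranked demonstrations, as the abstract above describes, is often done with a pairwise ranking loss. The sketch below assumes a Bradley-Terry style cross-entropy over summed predicted rewards; the abstract does not state the loss, and the names `trajectory_return`, `pairwise_ranking_loss`, and `linear_reward` are hypothetical.

```python
import math


def trajectory_return(reward_fn, trajectory):
    """Sum the predicted reward over every observation in a trajectory."""
    return sum(reward_fn(obs) for obs in trajectory)


def pairwise_ranking_loss(reward_fn, worse, better):
    """Cross-entropy loss that is small when `better` gets the higher return.

    Equals -log( exp(R_better) / (exp(R_worse) + exp(R_better)) ),
    computed with the log-sum-exp trick for numerical stability.
    """
    r_worse = trajectory_return(reward_fn, worse)
    r_better = trajectory_return(reward_fn, better)
    peak = max(r_worse, r_better)
    log_z = peak + math.log(math.exp(r_worse - peak) + math.exp(r_better - peak))
    return log_z - r_better


def linear_reward(weight):
    """Toy one-parameter reward model over scalar observations (assumption)."""
    return lambda obs: weight * obs


# Two toy demonstrations; `better` is ranked above `worse`.
worse = [0.1, 0.2]
better = [0.8, 0.9]
for weight in (-1.0, 0.0, 1.0):
    loss = pairwise_ranking_loss(linear_reward(weight), worse, better)
    print(weight, loss)
```

A reward model that agrees with the ranking (here `weight = 1.0`) yields a lower loss than an uninformative one (`weight = 0.0`), which in turn beats a model that inverts the ranking. Minimizing this loss over many ranked pairs is what lets the learned reward point beyond the best demonstration.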

Fast and Incremental Loop Closure Detection Using Proximity Graphs


Title	Fast and Incremental Loop Closure Detection Using Proximity Graphs
Authors	Shan An, Guangfu Che, Fangru Zhou, Xianglong Liu, Xin Ma, Yu Chen
Abstract	Visual loop closure detection, which can be considered as an image retrieval task, is an important problem in SLAM (Simultaneous Localization and Mapping) systems. The frequently used bag-of-words (BoW) models can achieve high precision and moderate recall. However, the requirement for lower time and memory costs in mobile robot applications is not well satisfied. In this paper, we propose a novel loop closure detection framework titled ‘FILD’ (Fast and Incremental Loop closure Detection), which focuses on on-line and incremental graph vocabulary construction for fast loop closure detection. The global and local features of frames are extracted using Convolutional Neural Networks (CNN) and SURF on the GPU, which guarantees extremely fast extraction speeds. The graph vocabulary construction is based on one type of proximity graph, named Hierarchical Navigable Small World (HNSW) graphs, which is modified to adapt to this specific application. In addition, this process is coupled with a novel strategy for real-time geometrical verification, which only keeps binary hash codes and significantly saves on memory usage. Extensive experiments on several publicly available datasets show that the proposed approach can achieve fairly good recall at 100% precision compared to other state-of-the-art methods. The source code can be downloaded at https://github.com/AnshanTJU/FILD for further studies.
Tasks	Image Retrieval, Loop Closure Detection, Simultaneous Localization and Mapping
Published	2019-11-25
URL	https://arxiv.org/abs/1911.10752v1
PDF	https://arxiv.org/pdf/1911.10752v1.pdf
PWC	https://paperswithcode.com/paper/fast-and-incremental-loop-closure-detection
Repo	https://github.com/AnshanTJU/FILD
Framework	none

Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains


Title	Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains
Authors	Matthieu Zimmer, Paul Weng
Abstract	In the context of learning deterministic policies in continuous domains, we revisit an approach, which was first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach is based on a policy update different from that of deterministic policy gradient (DPG). Previous work has observed its excellent performance empirically, but a theoretical justification is lacking. To fill this gap, we provide a theoretical explanation to motivate this unorthodox policy update by relating it to another update and making explicit the objective function of the latter. We furthermore discuss in depth the properties of these updates to get a deeper understanding of the overall approach. In addition, we extend it and propose a new trust region algorithm, Penalized NFAC (PeNFAC). Finally, we experimentally demonstrate in several classic control problems that it surpasses the state-of-the-art algorithms to learn deterministic policies.
Tasks
Published	2019-06-10
URL	https://arxiv.org/abs/1906.04556v2
PDF	https://arxiv.org/pdf/1906.04556v2.pdf
PWC	https://paperswithcode.com/paper/exploiting-the-sign-of-the-advantage-function
Repo	https://github.com/matthieu637/ddrl
Framework	tf

Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion


Title	Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion
Authors	Andy T. Liu, Po-chun Hsu, Hung-yi Lee
Abstract	We present an unsupervised end-to-end training scheme where we discover discrete subword units from speech without using any labels. The discrete subword units are learned under an ASR-TTS autoencoder reconstruction setting, where an ASR-Encoder is trained to discover a set of common linguistic units given a variety of speakers, and a TTS-Decoder trained to project the discovered units back to the designated speech. We propose a discrete encoding method, Multilabel-Binary Vectors (MBV), to make the ASR-TTS autoencoder differentiable. We found that the proposed encoding method offers automatic extraction of speech content from speaker style, and is sufficient to cover full linguistic content in a given language. Therefore, the TTS-Decoder can synthesize speech with the same content as the input of ASR-Encoder but with different speaker characteristics, which achieves voice conversion (VC). We further improve the quality of VC using adversarial training, where we train a TTS-Patcher that augments the output of TTS-Decoder. Objective and subjective evaluations show that the proposed approach offers strong VC results as it eliminates speaker identity while preserving content within speech. In the ZeroSpeech 2019 Challenge, we achieved outstanding performance in terms of low bitrate.
Tasks	Voice Conversion
Published	2019-05-28
URL	https://arxiv.org/abs/1905.11563v3
PDF	https://arxiv.org/pdf/1905.11563v3.pdf
PWC	https://paperswithcode.com/paper/unsupervised-end-to-end-learning-of-discrete
Repo	https://github.com/andi611/ZeroSpeech-TTS-without-T
Framework	pytorch

Direct Sparse Mapping


Title	Direct Sparse Mapping
Authors	Jon Zubizarreta, Iker Aguinaga, J. M. M. Montiel
Abstract	Photometric bundle adjustment (PBA) accurately estimates geometry from video. However, current PBA systems have a temporary map that cannot manage scene reobservations. We present DSM, a full monocular visual SLAM system based on PBA. Its persistent map handles reobservations, yielding the most accurate results to date on EuRoC for a direct method.
Tasks	Simultaneous Localization and Mapping
Published	2019-04-13
URL	http://arxiv.org/abs/1904.06577v1
PDF	http://arxiv.org/pdf/1904.06577v1.pdf
PWC	https://paperswithcode.com/paper/direct-sparse-mapping
Repo	https://github.com/jzubizarreta/dsm
Framework	none

Continuous Direct Sparse Visual Odometry from RGB-D Images


Title	Continuous Direct Sparse Visual Odometry from RGB-D Images
Authors	Maani Ghaffari, William Clark, Anthony Bloch, Ryan M. Eustice, Jessy W. Grizzle
Abstract	This paper reports on a novel formulation and evaluation of visual odometry from RGB-D images. Assuming a static scene, the developed theoretical framework generalizes the widely used direct energy formulation (photometric error minimization) technique for obtaining a rigid body transformation that aligns two overlapping RGB-D images to a continuous formulation. The continuity is achieved through functional treatment of the problem and representing the process models over RGB-D images in a reproducing kernel Hilbert space; consequently, the registration is not limited to the specific image resolution and the framework is fully analytical with a closed-form derivation of the gradient. We solve the problem by maximizing the inner product between two functions defined over RGB-D images, while the continuous action of the rigid body motion Lie group is captured through the integration of the flow in the corresponding Lie algebra. Energy-based approaches have been extremely successful and the developed framework in this paper shares many of their desired properties such as the parallel structure on both CPUs and GPUs, sparsity, semi-dense tracking, avoiding explicit data association which is computationally expensive, and possible extensions to the simultaneous localization and mapping frameworks. The evaluations on experimental data and comparison with the equivalent energy-based formulation of the problem confirm the effectiveness of the proposed technique, especially, when the lack of structure and texture in the environment is evident.
Tasks	Simultaneous Localization and Mapping, Visual Odometry
Published	2019-04-03
URL	https://arxiv.org/abs/1904.02266v3
PDF	https://arxiv.org/pdf/1904.02266v3.pdf
PWC	https://paperswithcode.com/paper/continuous-direct-sparse-visual-odometry-from
Repo	https://github.com/MaaniGhaffari/cvo-rgbd
Framework	none

Why So Down? The Role of Negative (and Positive) Pointwise Mutual Information in Distributional Semantics


Title	Why So Down? The Role of Negative (and Positive) Pointwise Mutual Information in Distributional Semantics
Authors	Alexandre Salle, Aline Villavicencio
Abstract	In distributional semantics, the pointwise mutual information ($\mathit{PMI}$) weighting of the cooccurrence matrix performs far better than raw counts. There is, however, an issue with unobserved pair cooccurrences as $\mathit{PMI}$ goes to negative infinity. This problem is aggravated by unreliable statistics from finite corpora which lead to a large number of such pairs. A common practice is to clip negative $\mathit{PMI}$ ($\mathit{\texttt{-} PMI}$) at $0$, also known as Positive $\mathit{PMI}$ ($\mathit{PPMI}$). In this paper, we investigate alternative ways of dealing with $\mathit{\texttt{-} PMI}$ and, more importantly, study the role that negative information plays in the performance of a low-rank, weighted factorization of different $\mathit{PMI}$ matrices. Using various semantic and syntactic tasks as probes into models which use either negative or positive $\mathit{PMI}$ (or both), we find that most of the encoded semantics and syntax come from positive $\mathit{PMI}$, in contrast to $\mathit{\texttt{-} PMI}$ which contributes almost exclusively syntactic information. Our findings deepen our understanding of distributional semantics, while also introducing novel $PMI$ variants and grounding the popular $PPMI$ measure.
Tasks
Published	2019-08-19
URL	https://arxiv.org/abs/1908.06941v1
PDF	https://arxiv.org/pdf/1908.06941v1.pdf
PWC	https://paperswithcode.com/paper/why-so-down-the-role-of-negative-and-positive
Repo	https://github.com/alexandres/lexvec
Framework	none
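
The PMI and PPMI weightings discussed in the abstract above can be made concrete with a toy co-occurrence list. This is a minimal sketch using the standard definition PMI(w, c) = log(P(w, c) / (P(w) P(c))); the corpus and the names `pmi_table`, `pmi`, and `ppmi` are illustrative assumptions, not part of the paper's code.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """Return {(word, context): PMI} for every observed (word, context) pair."""
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    table = {}
    for (w, c), n in pair_counts.items():
        # P(w, c) / (P(w) * P(c)) simplifies to n * total / (count(w) * count(c)).
        table[(w, c)] = math.log((n * total) / (word_counts[w] * ctx_counts[c]))
    return table


def pmi(table, word, context):
    """PMI of a pair; unobserved pairs go to negative infinity."""
    return table.get((word, context), float("-inf"))


def ppmi(pmi_value):
    """Positive PMI: clip negative values (including -inf) at zero."""
    return max(0.0, pmi_value)


# Toy corpus of (word, context) co-occurrences.
pairs = [
    ("cat", "purrs"), ("cat", "purrs"),
    ("dog", "barks"), ("dog", "barks"),
    ("cat", "barks"),
]
table = pmi_table(pairs)
for word, context in [("cat", "purrs"), ("cat", "barks"), ("dog", "purrs")]:
    value = pmi(table, word, context)
    print(word, context, value, ppmi(value))
```

Here ("cat", "barks") is observed less often than chance would predict, so its PMI is negative, while ("dog", "purrs") is never observed and goes to negative infinity. Clipping maps both to zero, which is exactly the negative information whose role the paper investigates.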

Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks


Title	Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks
Authors	Baris Gecer, Alexander Lattas, Stylianos Ploumpis, Jiankang Deng, Athanasios Papaioannou, Stylianos Moschoglou, Stefanos Zafeiriou
Abstract	Generating realistic 3D faces is of high importance for computer graphics and computer vision applications. Generally, research on 3D face generation revolves around linear statistical models of the facial surface. Nevertheless, these models cannot faithfully represent either the facial texture or the normals of the face, which are crucial for photo-realistic face synthesis. Recently, it was demonstrated that Generative Adversarial Networks (GANs) can be used for generating high-quality textures of faces. Nevertheless, the generation process either omits the geometry and normals, or independent processes are used to produce 3D shape information. In this paper, we present the first methodology that generates high-quality texture, shape, and normals jointly, which can be used for photo-realistic synthesis. To do so, we propose a novel GAN that can generate data from different modalities while exploiting their correlations. Furthermore, we demonstrate how we can condition the generation on the expression and create faces with various facial expressions. The qualitative results shown in this pre-print are compressed due to size limitations; full-resolution results and the accompanying video can be found at the project page: https://github.com/barisgecer/TBGAN.
Tasks	Face Generation
Published	2019-09-05
URL	https://arxiv.org/abs/1909.02215v1
PDF	https://arxiv.org/pdf/1909.02215v1.pdf
PWC	https://paperswithcode.com/paper/synthesizing-coupled-3d-face-modalities-by
Repo	https://github.com/barisgecer/TBGAN
Framework	none

The Landscape of R Packages for Automated Exploratory Data Analysis


Title	The Landscape of R Packages for Automated Exploratory Data Analysis
Authors	Mateusz Staniak, Przemyslaw Biecek
Abstract	The increasing availability of large but noisy data sets with a large number of heterogeneous variables leads to the increasing interest in the automation of common tasks for data analysis. The most time-consuming part of this process is the Exploratory Data Analysis, crucial for better domain understanding, data cleaning, data validation, and feature engineering. There is a growing number of libraries that attempt to automate some of the typical Exploratory Data Analysis tasks to make the search for new insights easier and faster. In this paper, we present a systematic review of existing tools for Automated Exploratory Data Analysis (autoEDA). We explore the features of twelve popular R packages to identify the parts of analysis that can be effectively automated with the current tools and to point out new directions for further autoEDA development.
Tasks	Feature Engineering
Published	2019-03-27
URL	https://arxiv.org/abs/1904.02101v3
PDF	https://arxiv.org/pdf/1904.02101v3.pdf
PWC	https://paperswithcode.com/paper/the-landscape-of-r-packages-for-automated
Repo	https://github.com/mstaniak/autoEDA-resources
Framework	none

The Convolutional Tsetlin Machine


Title	The Convolutional Tsetlin Machine
Authors	Ole-Christoffer Granmo, Sondre Glimsdal, Lei Jiao, Morten Goodwin, Christian W. Omlin, Geir Thore Berge
Abstract	Convolutional neural networks (CNNs) have obtained astounding successes for important pattern recognition tasks, but they suffer from high computational complexity and the lack of interpretability. The recent Tsetlin Machine (TM) attempts to address this lack by using easy-to-interpret conjunctive clauses in propositional logic to solve complex pattern recognition problems. The TM provides competitive accuracy in several benchmarks, while keeping the important property of interpretability. It further facilitates hardware-near implementation since inputs, patterns, and outputs are expressed as bits, while recognition and learning rely on straightforward bit manipulation. In this paper, we exploit the TM paradigm by introducing the Convolutional Tsetlin Machine (CTM), as an interpretable alternative to CNNs. Whereas the TM categorizes an image by employing each clause once to the whole image, the CTM uses each clause as a convolution filter. That is, a clause is evaluated multiple times, once per image patch taking part in the convolution. To make the clauses location-aware, each patch is further augmented with its coordinates within the image. The output of a convolution clause is obtained simply by ORing the outcome of evaluating the clause on each patch. In the learning phase of the TM, clauses that evaluate to 1 are contrasted against the input. For the CTM, we instead contrast against one of the patches, randomly selected among the patches that made the clause evaluate to 1. Accordingly, the standard Type I and Type II feedback of the classic TM can be employed directly, without further modification. The CTM obtains a peak test accuracy of 99.4% on MNIST, 96.31% on Kuzushiji-MNIST, 91.5% on Fashion-MNIST, and 100.0% on the 2D Noisy XOR Problem, which is competitive with results reported for simple 4-layer CNNs, BinaryConnect, Logistic Circuits and an FPGA-accelerated Binary CNN.
Tasks
Published	2019-05-23
URL	https://arxiv.org/abs/1905.09688v5
PDF	https://arxiv.org/pdf/1905.09688v5.pdf
PWC	https://paperswithcode.com/paper/the-convolutional-tsetlin-machine
Repo	https://github.com/zdx3578/pyTsetlinMachine
Framework	none
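
The convolution step described in the abstract above (evaluate a clause once per image patch, then OR the outcomes) can be sketched in a few lines. This is a simplified illustration, not the pyTsetlinMachine API: the clause representation as `(bit_index, expected_bit)` literals and the function names are assumptions, and the coordinate augmentation of patches is left out because the abstract does not say how coordinates are encoded.

```python
def patches(image, size):
    """Yield (row, col, bits) for every size x size window of a binary image."""
    rows, cols = len(image), len(image[0])
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            bits = [image[r + dr][c + dc] for dr in range(size) for dc in range(size)]
            yield r, c, bits


def clause_matches(clause, bits):
    """A clause is a conjunction of literals (bit_index, expected_bit)."""
    return all(bits[index] == expected for index, expected in clause)


def matching_patches(clause, image, size):
    """Top-left coordinates of all patches on which the clause evaluates to 1."""
    return [(r, c) for r, c, bits in patches(image, size) if clause_matches(clause, bits)]


def convolution_clause(clause, image, size):
    """Clause output: OR of the clause outcome over all patches."""
    return int(any(clause_matches(clause, bits) for _, _, bits in patches(image, size)))


# A 2x2 clause that recognizes a diagonal: top-left and bottom-right bits
# set, the other two cleared.
diagonal = [(0, 1), (1, 0), (2, 0), (3, 1)]
image = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
blank = [[0, 0, 0] for _ in range(3)]

print(convolution_clause(diagonal, image, 2))  # clause fires on one patch
print(matching_patches(diagonal, image, 2))
print(convolution_clause(diagonal, blank, 2))  # no patch matches
```

During learning, the abstract says the CTM contrasts against one patch chosen at random among those that made the clause evaluate to 1; in this sketch that would be a random pick from the list returned by `matching_patches`.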

Flappy Hummingbird: An Open Source Dynamic Simulation of Flapping Wing Robots and Animals


Title	Flappy Hummingbird: An Open Source Dynamic Simulation of Flapping Wing Robots and Animals
Authors	Fan Fei, Zhan Tu, Yilun Yang, Jian Zhang, Xinyan Deng
Abstract	Insects and hummingbirds exhibit extraordinary flight capabilities and can simultaneously master seemingly conflicting goals: stable hovering and aggressive maneuvering, unmatched by small scale man-made vehicles. Flapping Wing Micro Air Vehicles (FWMAVs) hold great promise for closing this performance gap. However, design and control of such systems remain challenging due to various constraints. Here, we present an open source high fidelity dynamic simulation for FWMAVs to serve as a testbed for the design, optimization and flight control of FWMAVs. For simulation validation, we recreated the hummingbird-scale robot developed in our lab in the simulation. System identification was performed to obtain the model parameters. The force generation, open-loop and closed-loop dynamic response between simulated and experimental flights were compared and validated. The unsteady aerodynamics and the highly nonlinear flight dynamics present challenging control problems for conventional and learning control algorithms such as Reinforcement Learning. The interface of the simulation is fully compatible with OpenAI Gym environment. As a benchmark study, we present a linear controller for hovering stabilization and a Deep Reinforcement Learning control policy for goal-directed maneuvering. Finally, we demonstrate direct simulation-to-real transfer of both control policies onto the physical robot, further demonstrating the fidelity of the simulation.
Tasks
Published	2019-02-25
URL	http://arxiv.org/abs/1902.09628v1
PDF	http://arxiv.org/pdf/1902.09628v1.pdf
PWC	https://paperswithcode.com/paper/flappy-hummingbird-an-open-source-dynamic
Repo	https://github.com/purdue-biorobotics/flappy
Framework	tf

Discovery of Physics from Data: Universal Laws and Discrepancy Models


Title	Discovery of Physics from Data: Universal Laws and Discrepancy Models
Authors	Brian de Silva, David M. Higdon, Steven L. Brunton, J. Nathan Kutz
Abstract	Machine learning (ML) and artificial intelligence (AI) algorithms are now being used to automate the discovery of physics principles and governing equations from measurement data alone. However, positing a universal physical law from data is challenging without simultaneously proposing an accompanying discrepancy model to account for the inevitable mismatch between theory and measurements. By revisiting the classic problem of modeling falling objects of different size and mass, we highlight a number of subtle and nuanced issues that must be addressed by modern data-driven methods for the automated discovery of physics. Specifically, we show that measurement noise and complex secondary physical mechanisms, such as unsteady fluid drag forces, can obscure the underlying law of gravitation, leading to an erroneous model. Without proposing an appropriate discrepancy model to handle these drag forces, the data supports an Aristotelian, versus a Galilean, theory of gravitation. Using the sparse identification of nonlinear dynamics (SINDy) algorithm, with the additional assumption that each separate falling object is governed by the same physical law, we are able to identify a viable discrepancy model to account for the fluid dynamic forces that explain the mismatch between a posited universal law of gravity and the measurement data. This work highlights the fact that the simple application of ML/AI will generally be insufficient to extract universal physical laws without further modification.
Tasks
Published	2019-06-19
URL	https://arxiv.org/abs/1906.07906v1
PDF	https://arxiv.org/pdf/1906.07906v1.pdf
PWC	https://paperswithcode.com/paper/discovery-of-physics-from-data-universal-laws
Repo	https://github.com/briandesilva/discovery-of-physics-from-data
Framework	none

Core Semantic First: A Top-down Approach for AMR Parsing


Title	Core Semantic First: A Top-down Approach for AMR Parsing
Authors	Deng Cai, Wai Lam
Abstract	We introduce a novel scheme for parsing a piece of text into its Abstract Meaning Representation (AMR): Graph Spanning based Parsing (GSP). One novel characteristic of GSP is that it constructs a parse graph incrementally in a top-down fashion. Starting from the root, at each step, a new node and its connections to existing nodes will be jointly predicted. The output graph spans the nodes by the distance to the root, following the intuition of first grasping the main ideas then digging into more details. The "core semantic first" principle emphasizes capturing the main ideas of a sentence, which is of great interest. We evaluate our model on the latest AMR sembank and achieve the state-of-the-art performance in the sense that no heuristic graph re-categorization is adopted. More importantly, the experiments show that our parser is especially good at obtaining the core semantics.
Tasks	AMR Parsing
Published	2019-09-10
URL	https://arxiv.org/abs/1909.04303v2
PDF	https://arxiv.org/pdf/1909.04303v2.pdf
PWC	https://paperswithcode.com/paper/core-semantic-first-a-top-down-approach-for
Repo	https://github.com/jcyk/AMR-parser
Framework	none