Paper Group AWR 258
Personalized Hashtag Recommendation for Micro-videos
Title | Personalized Hashtag Recommendation for Micro-videos |
Authors | Yinwei Wei, Zhiyong Cheng, Xuzheng Yu, Zhou Zhao, Lei Zhu, Liqiang Nie |
Abstract | Personalized hashtag recommendation methods aim to suggest hashtags that users can use to annotate, categorize, and describe their posts. The hashtags that a user assigns to a post (e.g., a micro-video) are the ones that, in her mind, best describe the parts of the post content she is interested in. This means that we should consider both users’ preferences for the post content and their personal understanding of the hashtags. Most existing methods rely on modeling either the interactions between hashtags and posts or the interactions between users and hashtags. These methods have not fully explored the complicated interactions among users, hashtags, and micro-videos. In this paper, towards personalized micro-video hashtag recommendation, we propose a Graph Convolution Network based Personalized Hashtag Recommendation (GCN-PHR) model, which leverages recent GCN techniques to model the complicated interactions among <users, hashtags, micro-videos> and learn their representations. In our model, users, hashtags, and micro-videos are three types of nodes in a graph, linked based on their direct associations. In particular, a message-passing strategy is used to learn the representation of a node (e.g., a user) by aggregating the messages passed from directly linked nodes of the other types (e.g., hashtags and micro-videos). Because a user is often interested in only certain parts of a micro-video, and a hashtag typically describes the part (of a micro-video) that the user is interested in, we leverage an attention mechanism to filter the messages passed from micro-videos to users and hashtags, which can significantly improve the representation capability. Extensive experiments on two real-world micro-video datasets demonstrate that our model outperforms state-of-the-art approaches by a large margin. |
Tasks | |
Published | 2019-08-27 |
URL | https://arxiv.org/abs/1908.09987v1, https://arxiv.org/pdf/1908.09987v1.pdf |
PWC | https://paperswithcode.com/paper/personalized-hashtag-recommendation-for-micro |
Repo | https://github.com/weiyinwei/GCN_PHR |
Framework | pytorch |
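The attention-filtered message passing described in the GCN-PHR abstract can be illustrated with a minimal sketch. It assumes plain dot-product attention and tiny hand-written embeddings; the function names and the scoring rule are hypothetical choices for illustration and are not taken from the paper or its repository.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate_with_attention(node_vec, neighbor_vecs):
    """Aggregate neighbor messages for one node, weighted by attention.

    Assumption: attention scores are dot products between the receiving
    node and each neighbor; the paper's actual scoring may differ.
    """
    weights = softmax([dot(node_vec, v) for v in neighbor_vecs])
    dim = len(node_vec)
    message = [sum(w * v[d] for w, v in zip(weights, neighbor_vecs))
               for d in range(dim)]
    return message, weights

# A user node receiving messages from two linked micro-video nodes.
user = [1.0, 0.0]
videos = [[1.0, 0.0], [0.0, 1.0]]
message, weights = aggregate_with_attention(user, videos)
```

The micro-video whose embedding is closer to the user's receives the larger weight, so its content dominates the aggregated message.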
gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo
Title | gym-gazebo2, a toolkit for reinforcement learning using ROS 2 and Gazebo |
Authors | Nestor Gonzalez Lopez, Yue Leire Erro Nuin, Elias Barba Moral, Lander Usategui San Juan, Alejandro Solano Rueda, Víctor Mayoral Vilches, Risto Kojcev |
Abstract | This paper presents an upgraded, real-world application oriented version of gym-gazebo, the Robot Operating System (ROS) and Gazebo based Reinforcement Learning (RL) toolkit, which complies with OpenAI Gym. The paper discusses the new ROS 2 based software architecture and summarizes the results obtained using Proximal Policy Optimization (PPO). Ultimately, this work provides a benchmarking system for robotics that allows different techniques and algorithms to be compared under the same virtual conditions. We have evaluated environments with different levels of complexity for the Modular Articulated Robotic Arm (MARA), reaching accuracies at the millimeter scale. The converged results show the feasibility and usefulness of the gym-gazebo2 toolkit, and its potential and applicability in industrial use cases using modular robots. |
Tasks | Transfer Reinforcement Learning |
Published | 2019-03-14 |
URL | http://arxiv.org/abs/1903.06278v2, http://arxiv.org/pdf/1903.06278v2.pdf |
PWC | https://paperswithcode.com/paper/gym-gazebo2-a-toolkit-for-reinforcement |
Repo | https://github.com/AcutronicRobotics/gym-gazebo2 |
Framework | none |
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations
Title | Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations |
Authors | Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum |
Abstract | A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator. This is because IRL typically seeks a reward function that makes the demonstrator appear near-optimal, rather than inferring the underlying intentions of the demonstrator that may have been poorly executed in practice. In this paper, we introduce a novel reward-learning-from-observation algorithm, Trajectory-ranked Reward EXtrapolation (T-REX), that extrapolates beyond a set of (approximately) ranked demonstrations in order to infer high-quality reward functions from a set of potentially poor demonstrations. When combined with deep reinforcement learning, T-REX outperforms state-of-the-art imitation learning and IRL methods on multiple Atari and MuJoCo benchmark tasks and achieves performance that is often more than twice the performance of the best demonstration. We also demonstrate that T-REX is robust to ranking noise and can accurately extrapolate intention by simply watching a learner noisily improve at a task over time. |
Tasks | Imitation Learning |
Published | 2019-04-12 |
URL | https://arxiv.org/abs/1904.06387v5, https://arxiv.org/pdf/1904.06387v5.pdf |
PWC | https://paperswithcode.com/paper/extrapolating-beyond-suboptimal |
Repo | https://github.com/francidellungo/Minigrid_HCI-project |
Framework | none |
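Learning a reward from ranked demonstrations, as T-REX does, can be sketched with a pairwise ranking loss: the reward model is penalized when the lower-ranked trajectory receives the higher predicted return. The sketch below assumes a linear reward over state features, which is a simplification for illustration only; the function names are hypothetical.

```python
import math

def predicted_return(weights, trajectory):
    """Sum of linear rewards w . s over the states of one trajectory."""
    return sum(sum(w * x for w, x in zip(weights, state))
               for state in trajectory)

def pairwise_ranking_loss(weights, worse, better):
    """Cross-entropy loss for the preference 'better' over 'worse'.

    Equals -log( exp(R_better) / (exp(R_worse) + exp(R_better)) ),
    computed with the usual max-shift for numerical stability.
    """
    r_worse = predicted_return(weights, worse)
    r_better = predicted_return(weights, better)
    m = max(r_worse, r_better)
    log_z = m + math.log(math.exp(r_worse - m) + math.exp(r_better - m))
    return log_z - r_better

# Two toy trajectories with one-dimensional state features.
w = [1.0]
low_ranked = [[0.0], [0.0]]
high_ranked = [[1.0], [1.0]]
loss_consistent = pairwise_ranking_loss(w, low_ranked, high_ranked)
loss_flipped = pairwise_ranking_loss(w, high_ranked, low_ranked)
```

Minimizing this loss over many ranked pairs pushes the reward model to order trajectories the way the rankings do, which is what allows extrapolation beyond the best demonstration.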
Fast and Incremental Loop Closure Detection Using Proximity Graphs
Title | Fast and Incremental Loop Closure Detection Using Proximity Graphs |
Authors | Shan An, Guangfu Che, Fangru Zhou, Xianglong Liu, Xin Ma, Yu Chen |
Abstract | Visual loop closure detection, which can be considered as an image retrieval task, is an important problem in SLAM (Simultaneous Localization and Mapping) systems. The frequently used bag-of-words (BoW) models can achieve high precision and moderate recall. However, the requirement for lower time and memory costs in mobile robot applications is not well satisfied. In this paper, we propose a novel loop closure detection framework titled 'FILD' (Fast and Incremental Loop closure Detection), which focuses on an on-line and incremental graph vocabulary construction for fast loop closure detection. The global and local features of frames are extracted using Convolutional Neural Networks (CNN) and SURF on the GPU, which guarantees extremely fast extraction speeds. The graph vocabulary construction is based on one type of proximity graph, named Hierarchical Navigable Small World (HNSW) graphs, which is modified to adapt to this specific application. In addition, this process is coupled with a novel strategy for real-time geometrical verification, which only keeps binary hash codes and significantly saves on memory usage. Extensive experiments on several publicly available datasets show that the proposed approach can achieve fairly good recall at 100% precision compared to other state-of-the-art methods. The source code can be downloaded at https://github.com/AnshanTJU/FILD for further studies. |
Tasks | Image Retrieval, Loop Closure Detection, Simultaneous Localization and Mapping |
Published | 2019-11-25 |
URL | https://arxiv.org/abs/1911.10752v1, https://arxiv.org/pdf/1911.10752v1.pdf |
PWC | https://paperswithcode.com/paper/fast-and-incremental-loop-closure-detection |
Repo | https://github.com/AnshanTJU/FILD |
Framework | none |
Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains
Title | Exploiting the Sign of the Advantage Function to Learn Deterministic Policies in Continuous Domains |
Authors | Matthieu Zimmer, Paul Weng |
Abstract | In the context of learning deterministic policies in continuous domains, we revisit an approach that was first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach is based on a policy update different from that of deterministic policy gradient (DPG). Previous work has observed its excellent performance empirically, but a theoretical justification is lacking. To fill this gap, we provide a theoretical explanation to motivate this unorthodox policy update by relating it to another update and making explicit the objective function of the latter. We furthermore discuss in depth the properties of these updates to get a deeper understanding of the overall approach. In addition, we extend it and propose a new trust region algorithm, Penalized NFAC (PeNFAC). Finally, we experimentally demonstrate in several classic control problems that it surpasses the state-of-the-art algorithms for learning deterministic policies. |
Tasks | |
Published | 2019-06-10 |
URL | https://arxiv.org/abs/1906.04556v2, https://arxiv.org/pdf/1906.04556v2.pdf |
PWC | https://paperswithcode.com/paper/exploiting-the-sign-of-the-advantage-function |
Repo | https://github.com/matthieu637/ddrl |
Framework | tf |
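The CACLA-style policy update that this paper revisits uses only the sign of the advantage estimate: the actor is moved toward an explored action when that action turned out better than expected, and is left alone otherwise. A minimal sketch, assuming a linear deterministic actor and a temporal-difference error as the advantage estimate (the function name, learning rate, and numbers are illustrative):

```python
def cacla_actor_update(theta, state, action, td_error, lr=0.1):
    """One CACLA-style update of a linear actor pi(s) = theta . s.

    The magnitude of td_error is ignored; only its sign decides
    whether the actor moves toward the explored action.
    """
    if td_error <= 0:
        # Action was no better than expected: leave the actor unchanged.
        return list(theta)
    predicted = sum(t * s for t, s in zip(theta, state))
    # Regression step that pulls pi(state) toward the explored action.
    return [t + lr * (action - predicted) * s for t, s in zip(theta, state)]

theta = [0.0, 0.0]
state = [1.0, 2.0]
updated = cacla_actor_update(theta, state, action=1.0, td_error=0.5)
unchanged = cacla_actor_update(theta, state, action=1.0, td_error=-0.5)
```

This differs from a deterministic policy gradient step, which would follow the gradient of the critic with respect to the action rather than regress toward a sampled action.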
Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion
Title | Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion |
Authors | Andy T. Liu, Po-chun Hsu, Hung-yi Lee |
Abstract | We present an unsupervised end-to-end training scheme where we discover discrete subword units from speech without using any labels. The discrete subword units are learned under an ASR-TTS autoencoder reconstruction setting, where an ASR-Encoder is trained to discover a set of common linguistic units given a variety of speakers, and a TTS-Decoder is trained to project the discovered units back to the designated speech. We propose a discrete encoding method, Multilabel-Binary Vectors (MBV), to make the ASR-TTS autoencoder differentiable. We found that the proposed encoding method offers automatic extraction of speech content from speaker style, and is sufficient to cover full linguistic content in a given language. Therefore, the TTS-Decoder can synthesize speech with the same content as the input of the ASR-Encoder but with different speaker characteristics, which achieves voice conversion (VC). We further improve the quality of VC using adversarial training, where we train a TTS-Patcher that augments the output of the TTS-Decoder. Objective and subjective evaluations show that the proposed approach offers strong VC results as it eliminates speaker identity while preserving content within speech. In the ZeroSpeech 2019 Challenge, we achieved outstanding performance in terms of low bitrate. |
Tasks | Voice Conversion |
Published | 2019-05-28 |
URL | https://arxiv.org/abs/1905.11563v3, https://arxiv.org/pdf/1905.11563v3.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-end-to-end-learning-of-discrete |
Repo | https://github.com/andi611/ZeroSpeech-TTS-without-T |
Framework | pytorch |
Direct Sparse Mapping
Title | Direct Sparse Mapping |
Authors | Jon Zubizarreta, Iker Aguinaga, J. M. M. Montiel |
Abstract | Photometric bundle adjustment (PBA) accurately estimates geometry from video. However, current PBA systems have a temporary map that cannot manage scene reobservations. We present DSM, a full monocular visual SLAM system based on PBA. Its persistent map handles reobservations, yielding the most accurate results to date on EuRoC for a direct method. |
Tasks | Simultaneous Localization and Mapping |
Published | 2019-04-13 |
URL | http://arxiv.org/abs/1904.06577v1, http://arxiv.org/pdf/1904.06577v1.pdf |
PWC | https://paperswithcode.com/paper/direct-sparse-mapping |
Repo | https://github.com/jzubizarreta/dsm |
Framework | none |
Continuous Direct Sparse Visual Odometry from RGB-D Images
Title | Continuous Direct Sparse Visual Odometry from RGB-D Images |
Authors | Maani Ghaffari, William Clark, Anthony Bloch, Ryan M. Eustice, Jessy W. Grizzle |
Abstract | This paper reports on a novel formulation and evaluation of visual odometry from RGB-D images. Assuming a static scene, the developed theoretical framework generalizes the widely used direct energy formulation (photometric error minimization) technique for obtaining a rigid body transformation that aligns two overlapping RGB-D images to a continuous formulation. The continuity is achieved through functional treatment of the problem and representing the process models over RGB-D images in a reproducing kernel Hilbert space; consequently, the registration is not limited to the specific image resolution and the framework is fully analytical with a closed-form derivation of the gradient. We solve the problem by maximizing the inner product between two functions defined over RGB-D images, while the continuous action of the rigid body motion Lie group is captured through the integration of the flow in the corresponding Lie algebra. Energy-based approaches have been extremely successful, and the framework developed in this paper shares many of their desired properties, such as the parallel structure on both CPUs and GPUs, sparsity, semi-dense tracking, avoiding explicit data association (which is computationally expensive), and possible extensions to simultaneous localization and mapping frameworks. The evaluations on experimental data and comparison with the equivalent energy-based formulation of the problem confirm the effectiveness of the proposed technique, especially when the lack of structure and texture in the environment is evident. |
Tasks | Simultaneous Localization and Mapping, Visual Odometry |
Published | 2019-04-03 |
URL | https://arxiv.org/abs/1904.02266v3, https://arxiv.org/pdf/1904.02266v3.pdf |
PWC | https://paperswithcode.com/paper/continuous-direct-sparse-visual-odometry-from |
Repo | https://github.com/MaaniGhaffari/cvo-rgbd |
Framework | none |
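The objective named in this abstract, an inner product between two functions in a reproducing kernel Hilbert space, reduces to a double sum of kernel evaluations when each function is a weighted sum of kernels centered at the points of a cloud. The sketch below assumes a squared-exponential kernel and scalar appearance labels (for example, intensity); both are illustrative assumptions, and the rigid-body optimization itself is omitted.

```python
import math

def rbf_kernel(p, q, lengthscale=1.0):
    """Squared-exponential kernel between two 3D points (assumed kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-d2 / (2.0 * lengthscale ** 2))

def cloud_inner_product(points_x, labels_x, points_z, labels_z, lengthscale=1.0):
    """<f_X, f_Z> for f_X = sum_i l_i k(., x_i) and f_Z = sum_j m_j k(., z_j).

    By the reproducing property this equals sum_ij l_i m_j k(x_i, z_j),
    so no explicit point-to-point association is needed.
    """
    total = 0.0
    for x, lx in zip(points_x, labels_x):
        for z, lz in zip(points_z, labels_z):
            total += lx * lz * rbf_kernel(x, z, lengthscale)
    return total

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
labels = [1.0, 1.0]
shifted = [(x + 0.5, y, z) for x, y, z in cloud]

aligned_score = cloud_inner_product(cloud, labels, cloud, labels)
shifted_score = cloud_inner_product(cloud, labels, shifted, labels)
```

The score is highest when the two clouds coincide, which is why maximizing it over rigid-body transformations acts as a registration objective.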
Why So Down? The Role of Negative (and Positive) Pointwise Mutual Information in Distributional Semantics
Title | Why So Down? The Role of Negative (and Positive) Pointwise Mutual Information in Distributional Semantics |
Authors | Alexandre Salle, Aline Villavicencio |
Abstract | In distributional semantics, the pointwise mutual information ($\mathit{PMI}$) weighting of the cooccurrence matrix performs far better than raw counts. There is, however, an issue with unobserved pair cooccurrences as $\mathit{PMI}$ goes to negative infinity. This problem is aggravated by unreliable statistics from finite corpora which lead to a large number of such pairs. A common practice is to clip negative $\mathit{PMI}$ ($-\mathit{PMI}$) at $0$, also known as Positive $\mathit{PMI}$ ($\mathit{PPMI}$). In this paper, we investigate alternative ways of dealing with $-\mathit{PMI}$ and, more importantly, study the role that negative information plays in the performance of a low-rank, weighted factorization of different $\mathit{PMI}$ matrices. Using various semantic and syntactic tasks as probes into models which use either negative or positive $\mathit{PMI}$ (or both), we find that most of the encoded semantics and syntax come from positive $\mathit{PMI}$, in contrast to $-\mathit{PMI}$ which contributes almost exclusively syntactic information. Our findings deepen our understanding of distributional semantics, while also introducing novel $\mathit{PMI}$ variants and grounding the popular $\mathit{PPMI}$ measure. |
Tasks | |
Published | 2019-08-19 |
URL | https://arxiv.org/abs/1908.06941v1, https://arxiv.org/pdf/1908.06941v1.pdf |
PWC | https://paperswithcode.com/paper/why-so-down-the-role-of-negative-and-positive |
Repo | https://github.com/alexandres/lexvec |
Framework | none |
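The quantities discussed in this abstract follow from the standard definition PMI(w, c) = log( P(w, c) / (P(w) P(c)) ). The sketch below computes PMI and its clipped variant PPMI from a small, made-up cooccurrence count matrix; the counts and function names are illustrative and do not come from the paper.

```python
import math

def pmi_matrix(counts):
    """PMI for a dense cooccurrence count matrix (rows: words, cols: contexts).

    Unobserved pairs (count 0) get negative infinity, which is the
    issue the abstract describes.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    pmi = []
    for i, row in enumerate(counts):
        pmi_row = []
        for j, c in enumerate(row):
            if c == 0:
                pmi_row.append(float("-inf"))
            else:
                pmi_row.append(math.log(c * total / (row_sums[i] * col_sums[j])))
        pmi.append(pmi_row)
    return pmi

def ppmi_matrix(counts):
    """Positive PMI: clip every negative PMI value (including -inf) at 0."""
    return [[max(0.0, v) for v in row] for row in pmi_matrix(counts)]

table = [[4, 0],
         [1, 5]]
pmi = pmi_matrix(table)
ppmi = ppmi_matrix(table)
```

Clipping discards both the unobserved pair and the pair that cooccurs less often than chance, which is exactly the negative information whose role the paper investigates.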
Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks
Title | Synthesizing Coupled 3D Face Modalities by Trunk-Branch Generative Adversarial Networks |
Authors | Baris Gecer, Alexander Lattas, Stylianos Ploumpis, Jiankang Deng, Athanasios Papaioannou, Stylianos Moschoglou, Stefanos Zafeiriou |
Abstract | Generating realistic 3D faces is of high importance for computer graphics and computer vision applications. Generally, research on 3D face generation revolves around linear statistical models of the facial surface. Nevertheless, these models cannot faithfully represent either the facial texture or the normals of the face, which are crucial for photo-realistic face synthesis. Recently, it was demonstrated that Generative Adversarial Networks (GANs) can be used for generating high-quality textures of faces. Nevertheless, the generation process either omits the geometry and normals, or independent processes are used to produce 3D shape information. In this paper, we present the first methodology that generates high-quality texture, shape, and normals jointly, which can be used for photo-realistic synthesis. To do so, we propose a novel GAN that can generate data from different modalities while exploiting their correlations. Furthermore, we demonstrate how we can condition the generation on the expression and create faces with various facial expressions. The qualitative results shown in this pre-print are compressed due to size limitations; full-resolution results and the accompanying video can be found at the project page: https://github.com/barisgecer/TBGAN. |
Tasks | Face Generation |
Published | 2019-09-05 |
URL | https://arxiv.org/abs/1909.02215v1, https://arxiv.org/pdf/1909.02215v1.pdf |
PWC | https://paperswithcode.com/paper/synthesizing-coupled-3d-face-modalities-by |
Repo | https://github.com/barisgecer/TBGAN |
Framework | none |
The Landscape of R Packages for Automated Exploratory Data Analysis
Title | The Landscape of R Packages for Automated Exploratory Data Analysis |
Authors | Mateusz Staniak, Przemyslaw Biecek |
Abstract | The increasing availability of large but noisy data sets with a large number of heterogeneous variables leads to the increasing interest in the automation of common tasks for data analysis. The most time-consuming part of this process is the Exploratory Data Analysis, crucial for better domain understanding, data cleaning, data validation, and feature engineering. There is a growing number of libraries that attempt to automate some of the typical Exploratory Data Analysis tasks to make the search for new insights easier and faster. In this paper, we present a systematic review of existing tools for Automated Exploratory Data Analysis (autoEDA). We explore the features of twelve popular R packages to identify the parts of analysis that can be effectively automated with the current tools and to point out new directions for further autoEDA development. |
Tasks | Feature Engineering |
Published | 2019-03-27 |
URL | https://arxiv.org/abs/1904.02101v3, https://arxiv.org/pdf/1904.02101v3.pdf |
PWC | https://paperswithcode.com/paper/the-landscape-of-r-packages-for-automated |
Repo | https://github.com/mstaniak/autoEDA-resources |
Framework | none |
The Convolutional Tsetlin Machine
Title | The Convolutional Tsetlin Machine |
Authors | Ole-Christoffer Granmo, Sondre Glimsdal, Lei Jiao, Morten Goodwin, Christian W. Omlin, Geir Thore Berge |
Abstract | Convolutional neural networks (CNNs) have obtained astounding successes for important pattern recognition tasks, but they suffer from high computational complexity and a lack of interpretability. The recent Tsetlin Machine (TM) attempts to address this lack by using easy-to-interpret conjunctive clauses in propositional logic to solve complex pattern recognition problems. The TM provides competitive accuracy in several benchmarks, while keeping the important property of interpretability. It further facilitates hardware-near implementation since inputs, patterns, and outputs are expressed as bits, while recognition and learning rely on straightforward bit manipulation. In this paper, we exploit the TM paradigm by introducing the Convolutional Tsetlin Machine (CTM), as an interpretable alternative to CNNs. Whereas the TM categorizes an image by applying each clause once to the whole image, the CTM uses each clause as a convolution filter. That is, a clause is evaluated multiple times, once per image patch taking part in the convolution. To make the clauses location-aware, each patch is further augmented with its coordinates within the image. The output of a convolution clause is obtained simply by ORing the outcome of evaluating the clause on each patch. In the learning phase of the TM, clauses that evaluate to 1 are contrasted against the input. For the CTM, we instead contrast against one of the patches, randomly selected among the patches that made the clause evaluate to 1. Accordingly, the standard Type I and Type II feedback of the classic TM can be employed directly, without further modification. The CTM obtains a peak test accuracy of 99.4% on MNIST, 96.31% on Kuzushiji-MNIST, 91.5% on Fashion-MNIST, and 100.0% on the 2D Noisy XOR Problem, which is competitive with results reported for simple 4-layer CNNs, BinaryConnect, Logistic Circuits and an FPGA-accelerated Binary CNN. |
Tasks | |
Published | 2019-05-23 |
URL | https://arxiv.org/abs/1905.09688v5, https://arxiv.org/pdf/1905.09688v5.pdf |
PWC | https://paperswithcode.com/paper/the-convolutional-tsetlin-machine |
Repo | https://github.com/zdx3578/pyTsetlinMachine |
Framework | none |
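The inference step described in the CTM abstract, one conjunctive clause evaluated on every image patch with the outcomes ORed together, can be sketched as follows. This is a simplified illustration: the clause representation as (bit index, required value) pairs is an assumption, patch coordinates are carried alongside each patch rather than encoded as extra input bits, and learning (Type I and Type II feedback) is not shown.

```python
def extract_patches(image, size):
    """All size x size patches of a binary image.

    Each patch is returned as ((row, col), flattened_bits); the
    coordinates are kept so a location-aware variant could use them.
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            bits = [image[r + dr][c + dc]
                    for dr in range(size) for dc in range(size)]
            patches.append(((r, c), bits))
    return patches

def clause_on_patch(literals, bits):
    """Conjunction of literals; each literal is (bit index, required value)."""
    return all(bits[i] == v for i, v in literals)

def convolution_clause(literals, image, size):
    """OR of the clause outcome over every patch of the image."""
    return int(any(clause_on_patch(literals, bits)
                   for _, bits in extract_patches(image, size)))

# Clause that fires on a fully set 2x2 patch.
all_ones = [(0, 1), (1, 1), (2, 1), (3, 1)]
image_with_block = [[0, 0, 0],
                    [0, 1, 1],
                    [0, 1, 1]]
empty_image = [[0, 0, 0],
               [0, 0, 0],
               [0, 0, 0]]

fires = convolution_clause(all_ones, image_with_block, 2)
silent = convolution_clause(all_ones, empty_image, 2)
```

Because the clause is a plain conjunction over bits, the pattern it detects can be read off directly from its literals, which is the interpretability argument made in the abstract.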
Flappy Hummingbird: An Open Source Dynamic Simulation of Flapping Wing Robots and Animals
Title | Flappy Hummingbird: An Open Source Dynamic Simulation of Flapping Wing Robots and Animals |
Authors | Fan Fei, Zhan Tu, Yilun Yang, Jian Zhang, Xinyan Deng |
Abstract | Insects and hummingbirds exhibit extraordinary flight capabilities and can simultaneously master seemingly conflicting goals: stable hovering and aggressive maneuvering, unmatched by small scale man-made vehicles. Flapping Wing Micro Air Vehicles (FWMAVs) hold great promise for closing this performance gap. However, design and control of such systems remain challenging due to various constraints. Here, we present an open source high fidelity dynamic simulation for FWMAVs to serve as a testbed for the design, optimization and flight control of FWMAVs. For simulation validation, we recreated the hummingbird-scale robot developed in our lab in the simulation. System identification was performed to obtain the model parameters. The force generation, open-loop and closed-loop dynamic response between simulated and experimental flights were compared and validated. The unsteady aerodynamics and the highly nonlinear flight dynamics present challenging control problems for conventional and learning control algorithms such as Reinforcement Learning. The interface of the simulation is fully compatible with OpenAI Gym environment. As a benchmark study, we present a linear controller for hovering stabilization and a Deep Reinforcement Learning control policy for goal-directed maneuvering. Finally, we demonstrate direct simulation-to-real transfer of both control policies onto the physical robot, further demonstrating the fidelity of the simulation. |
Tasks | |
Published | 2019-02-25 |
URL | http://arxiv.org/abs/1902.09628v1, http://arxiv.org/pdf/1902.09628v1.pdf |
PWC | https://paperswithcode.com/paper/flappy-hummingbird-an-open-source-dynamic |
Repo | https://github.com/purdue-biorobotics/flappy |
Framework | tf |
Discovery of Physics from Data: Universal Laws and Discrepancy Models
Title | Discovery of Physics from Data: Universal Laws and Discrepancy Models |
Authors | Brian de Silva, David M. Higdon, Steven L. Brunton, J. Nathan Kutz |
Abstract | Machine learning (ML) and artificial intelligence (AI) algorithms are now being used to automate the discovery of physics principles and governing equations from measurement data alone. However, positing a universal physical law from data is challenging without simultaneously proposing an accompanying discrepancy model to account for the inevitable mismatch between theory and measurements. By revisiting the classic problem of modeling falling objects of different size and mass, we highlight a number of subtle and nuanced issues that must be addressed by modern data-driven methods for the automated discovery of physics. Specifically, we show that measurement noise and complex secondary physical mechanisms, such as unsteady fluid drag forces, can obscure the underlying law of gravitation, leading to an erroneous model. Without proposing an appropriate discrepancy model to handle these drag forces, the data supports an Aristotelian, versus a Galilean, theory of gravitation. Using the sparse identification of nonlinear dynamics (SINDy) algorithm, with the additional assumption that each separate falling object is governed by the same physical law, we are able to identify a viable discrepancy model to account for the fluid dynamic forces that explain the mismatch between a posited universal law of gravity and the measurement data. This work highlights the fact that the simple application of ML/AI will generally be insufficient to extract universal physical laws without further modification. |
Tasks | |
Published | 2019-06-19 |
URL | https://arxiv.org/abs/1906.07906v1, https://arxiv.org/pdf/1906.07906v1.pdf |
PWC | https://paperswithcode.com/paper/discovery-of-physics-from-data-universal-laws |
Repo | https://github.com/briandesilva/discovery-of-physics-from-data |
Framework | none |
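SINDy, the algorithm named in this abstract, is commonly solved with sequentially thresholded least squares: fit all candidate terms, zero out small coefficients, and refit on the survivors. The sketch below applies that procedure to synthetic, noise-free data from an assumed falling-object model with linear drag, dv/dt = -9.8 - 0.5 v. The model, constants, and candidate library are illustrative assumptions, and the paper's shared-law constraint across objects is not implemented.

```python
def solve_least_squares(A, b):
    """Solve min ||A x - b|| via normal equations and Gauss-Jordan elimination."""
    n = len(A[0])
    # Augmented system [A^T A | A^T b].
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)]
         + [sum(row[i] * y for row, y in zip(A, b))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def stlsq(library, target, threshold=0.1, iterations=10):
    """Sequentially thresholded least squares over a candidate-term library."""
    n = len(library[0])
    active = list(range(n))
    coeffs = [0.0] * n
    for _ in range(iterations):
        if not active:
            break
        sub = [[row[j] for j in active] for row in library]
        fit = solve_least_squares(sub, target)
        coeffs = [0.0] * n
        for j, c in zip(active, fit):
            coeffs[j] = c
        # Drop terms whose coefficients fall below the threshold, then refit.
        active = [j for j in active if abs(coeffs[j]) >= threshold]
        coeffs = [c if abs(c) >= threshold else 0.0 for c in coeffs]
    return coeffs

# Synthetic data: velocities and their exact time derivatives.
velocities = [-0.5 * k for k in range(11)]
library = [[1.0, v, v * v] for v in velocities]   # candidate terms: 1, v, v^2
derivatives = [-9.8 - 0.5 * v for v in velocities]

coeffs = stlsq(library, derivatives)
```

Here the constant term plays the role of the universal law and the velocity term the role of a drag discrepancy; with noisy data or unmodeled drag, the recovered coefficients can be misleading, which is the caution the abstract raises.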
Core Semantic First: A Top-down Approach for AMR Parsing
Title | Core Semantic First: A Top-down Approach for AMR Parsing |
Authors | Deng Cai, Wai Lam |
Abstract | We introduce a novel scheme for parsing a piece of text into its Abstract Meaning Representation (AMR): Graph Spanning based Parsing (GSP). One novel characteristic of GSP is that it constructs a parse graph incrementally in a top-down fashion. Starting from the root, at each step, a new node and its connections to existing nodes are jointly predicted. The output graph spans the nodes by their distance to the root, following the intuition of first grasping the main ideas and then digging into more details. The "core semantic first" principle emphasizes capturing the main ideas of a sentence, which is of great interest. We evaluate our model on the latest AMR sembank and achieve state-of-the-art performance in the sense that no heuristic graph re-categorization is adopted. More importantly, the experiments show that our parser is especially good at obtaining the core semantics. |
Tasks | AMR Parsing |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04303v2, https://arxiv.org/pdf/1909.04303v2.pdf |
PWC | https://paperswithcode.com/paper/core-semantic-first-a-top-down-approach-for |
Repo | https://github.com/jcyk/AMR-parser |
Framework | none |