January 25, 2020

3077 words 15 mins read

Paper Group ANR 1659

Multimodal Dataset of Human-Robot Hugging Interaction. Heartbeat Anomaly Detection using Adversarial Oversampling. Multi-Modal Generative Adversarial Network for Short Product Title Generation in Mobile E-Commerce. Meta-Q-Learning. Neural Logic Reinforcement Learning. Learning Disentangled Representations with Reference-Based Variational Autoencode …

Multimodal Dataset of Human-Robot Hugging Interaction

Title Multimodal Dataset of Human-Robot Hugging Interaction
Authors Kunal Bagewadi, Joseph Campbell, Heni Ben Amor
Abstract A hug is a tight embrace and an expression of warmth, sympathy and camaraderie. Although a hug often takes only a few seconds, it is filled with details and nuances and is a highly complex process of coordination between two agents. For human-robot collaborative tasks, humans must develop trust and see the robot as a partner in performing a given task together. Datasets representing agent-agent interaction are scarce and, if available, of limited quality. To study the underlying phenomena and variations in a hug between a person and a robot, we deployed a Baxter humanoid robot and wearable sensors on participants to record 353 episodes of hugging activity. 33 people were given minimal instructions to hug the humanoid robot so that the interaction was as natural as possible. In this paper, we present our methodology and an analysis of the collected dataset. The dataset is intended to support machine learning methods that enable the humanoid robot to anticipate and react to the movements of a person approaching for a hug. In this regard, we show the significance of the dataset by highlighting certain of its features.
Tasks
Published 2019-09-16
URL https://arxiv.org/abs/1909.07471v1
PDF https://arxiv.org/pdf/1909.07471v1.pdf
PWC https://paperswithcode.com/paper/multimodal-dataset-of-human-robot-hugging
Repo
Framework

Heartbeat Anomaly Detection using Adversarial Oversampling

Title Heartbeat Anomaly Detection using Adversarial Oversampling
Authors Jefferson L. P. Lima, David Macêdo, Cleber Zanchettin
Abstract Cardiovascular diseases are one of the most common causes of death in the world. Prevention, knowledge of previous cases in the family, and early detection are the best strategies to reduce mortality. Different machine learning approaches to automatic diagnosis are being proposed for this task. As in most health problems, class imbalance is predominant and affects the performance of automated solutions. In this paper, we address the classification of heartbeat images into different cardiovascular diseases. We propose a two-dimensional Convolutional Neural Network for classification, after using an InfoGAN architecture to generate synthetic images for the underrepresented classes. We call this proposal Adversarial Oversampling and compare it with classical oversampling methods such as SMOTE, ADASYN, and RandomOversampling. The results show that the proposed approach improves the classifier's performance for the minority classes without harming performance on the balanced classes.
Tasks Anomaly Detection
Published 2019-01-28
URL http://arxiv.org/abs/1901.09972v1
PDF http://arxiv.org/pdf/1901.09972v1.pdf
PWC https://paperswithcode.com/paper/heartbeat-anomaly-detection-using-adversarial
Repo
Framework
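The InfoGAN generator itself is beyond a short sketch; as a minimal, hedged illustration of the oversampling step, the code below uses a hypothetical stand-in generator that interpolates between real minority samples (a SMOTE-like stand-in, not the paper's adversarial method):

```python
import random

def oversample_minority(data, target_count, generator):
    """Augment a minority class with synthetic samples until it
    reaches target_count. `generator` stands in for the trained
    InfoGAN generator described in the paper."""
    synthetic = [generator(data) for _ in range(target_count - len(data))]
    return data + synthetic

def smote_like_generator(data):
    # Stand-in generator: interpolate between two random real samples.
    a, b = random.sample(data, 2)
    t = random.random()
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

random.seed(0)
minority = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]
balanced = oversample_minority(minority, 10, smote_like_generator)
print(len(balanced))  # 10
```

In the paper's setting, `smote_like_generator` would be replaced by sampling from the trained InfoGAN and the balanced set would feed the 2-D CNN classifier.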

Multi-Modal Generative Adversarial Network for Short Product Title Generation in Mobile E-Commerce

Title Multi-Modal Generative Adversarial Network for Short Product Title Generation in Mobile E-Commerce
Authors Jian-Guo Zhang, Pengcheng Zou, Zhao Li, Yao Wan, Xiuming Pan, Yu Gong, Philip S. Yu
Abstract Nowadays, more and more customers browse and purchase products using mobile E-Commerce apps such as Taobao and Amazon. Since merchants tend to write redundant and over-informative product titles to attract customers' attention, it is important to display concise short product titles on the limited screens of mobile phones. Previous studies mainly consider the textual information of long product titles and lack a human-like view during the training and evaluation process. In this paper, we propose a Multi-Modal Generative Adversarial Network (MM-GAN) for short product title generation in E-Commerce, which innovatively incorporates image information and attribute tags from the product, as well as textual information from the original long titles. MM-GAN poses short title generation as a reinforcement learning process, where the generated titles are evaluated by the discriminator from a human-like view. Extensive experiments on a large-scale E-Commerce dataset demonstrate that our algorithm outperforms other state-of-the-art methods. Moreover, we deploy our model in a real-world online E-Commerce environment and effectively boost the click-through rate and click conversion rate by 1.66% and 1.87%, respectively.
Tasks
Published 2019-04-03
URL http://arxiv.org/abs/1904.01735v1
PDF http://arxiv.org/pdf/1904.01735v1.pdf
PWC https://paperswithcode.com/paper/multi-modal-generative-adversarial-network
Repo
Framework

Meta-Q-Learning

Title Meta-Q-Learning
Authors Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, Alexander J. Smola
Abstract This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, using a multi-task objective to maximize the average reward across the training tasks is an effective method to meta-train RL policies. Third, past data from the meta-training replay buffer can be recycled to adapt the policy on a new task using off-policy updates. MQL draws upon ideas in propensity estimation to do so and thereby amplifies the amount of available data for adaptation. Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with state-of-the-art meta-RL algorithms.
Tasks Continuous Control, Q-Learning
Published 2019-09-30
URL https://arxiv.org/abs/1910.00125v1
PDF https://arxiv.org/pdf/1910.00125v1.pdf
PWC https://paperswithcode.com/paper/meta-q-learning
Repo
Framework
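MQL's first ingredient is a context variable summarizing the past trajectory. As a hedged sketch only, a simple stand-in for such a summary (not the paper's learned representation) is an exponential moving average over per-step feature vectors:

```python
def context_from_trajectory(transitions, decay=0.9):
    """Summarize the recent trajectory into a fixed-size context
    vector via an exponential moving average, a simple stand-in for
    the learned context representation that MQL conditions Q on."""
    ctx = [0.0] * len(transitions[0])
    for step in transitions:
        ctx = [decay * c + (1 - decay) * x for c, x in zip(ctx, step)]
    return ctx

# Each step packs (state, action, reward) into one feature vector.
traj = [[0.0, 1.0, 0.5], [0.2, 0.0, 1.0], [0.4, 1.0, 0.0]]
ctx = context_from_trajectory(traj)
print([round(c, 3) for c in ctx])
```

The Q-function would then take `(state, ctx)` as input, letting the same network behave differently across tasks.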

Neural Logic Reinforcement Learning

Title Neural Logic Reinforcement Learning
Authors Zhengyao Jiang, Shan Luo
Abstract Deep reinforcement learning (DRL) has achieved significant breakthroughs in various tasks. However, most DRL algorithms generalize poorly: learning performance is strongly affected even by minor modifications of the training environment. Moreover, the use of deep neural networks makes the learned policies hard to interpret. To address these two challenges, we propose a novel algorithm named Neural Logic Reinforcement Learning (NLRL) to represent reinforcement learning policies in first-order logic. NLRL is based on policy gradient methods and differentiable inductive logic programming, which have demonstrated significant advantages in terms of interpretability and generalisability in supervised tasks. Extensive experiments conducted on cliff-walking and block-manipulation tasks demonstrate that NLRL can induce interpretable policies that achieve near-optimal performance while generalising well to environments with different initial states and problem sizes.
Tasks Policy Gradient Methods
Published 2019-04-24
URL https://arxiv.org/abs/1904.10729v2
PDF https://arxiv.org/pdf/1904.10729v2.pdf
PWC https://paperswithcode.com/paper/neural-logic-reinforcement-learning
Repo
Framework

Learning Disentangled Representations with Reference-Based Variational Autoencoders

Title Learning Disentangled Representations with Reference-Based Variational Autoencoders
Authors Adria Ruiz, Oriol Martinez, Xavier Binefa, Jakob Verbeek
Abstract Learning disentangled representations from visual data, where different high-level generative factors are independently encoded, is of importance for many computer vision tasks. Solving this problem, however, typically requires to explicitly label all the factors of interest in training images. To alleviate the annotation cost, we introduce a learning setting which we refer to as “reference-based disentangling”. Given a pool of unlabeled images, the goal is to learn a representation where a set of target factors are disentangled from others. The only supervision comes from an auxiliary “reference set” containing images where the factors of interest are constant. In order to address this problem, we propose reference-based variational autoencoders, a novel deep generative model designed to exploit the weak-supervision provided by the reference set. By addressing tasks such as feature learning, conditional image generation or attribute transfer, we validate the ability of the proposed model to learn disentangled representations from this minimal form of supervision.
Tasks Conditional Image Generation, Image Generation
Published 2019-01-24
URL http://arxiv.org/abs/1901.08534v1
PDF http://arxiv.org/pdf/1901.08534v1.pdf
PWC https://paperswithcode.com/paper/learning-disentangled-representations-with
Repo
Framework

Conjugate Gradients and Accelerated Methods Unified: The Approximate Duality Gap View

Title Conjugate Gradients and Accelerated Methods Unified: The Approximate Duality Gap View
Authors Jelena Diakonikolas, Lorenzo Orecchia
Abstract This note provides a novel, simple analysis of the method of conjugate gradients for the minimization of convex quadratic functions. In contrast with standard arguments, our proof is entirely self-contained and does not rely on the existence of Chebyshev polynomials. Another advantage of our development is that it clarifies the relation between the method of conjugate gradients and general accelerated methods for smooth minimization by unifying their analyses within the framework of the Approximate Duality Gap Technique that was introduced by the authors.
Tasks
Published 2019-06-29
URL https://arxiv.org/abs/1907.00289v3
PDF https://arxiv.org/pdf/1907.00289v3.pdf
PWC https://paperswithcode.com/paper/conjugate-gradients-and-accelerated-methods
Repo
Framework
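As a concrete reference point, here is a minimal implementation of the method of conjugate gradients for the setting the note analyzes, minimizing the convex quadratic f(x) = 0.5 x^T A x - b^T x with A symmetric positive definite (equivalently, solving Ax = b):

```python
def conjugate_gradient(A, b, x0, iters=50, tol=1e-10):
    """Method of conjugate gradients for minimizing
    f(x) = 0.5 x^T A x - b^T x with A symmetric positive definite."""
    def matvec(M, v):
        return [sum(mi * vi for mi, vi in zip(row, v)) for row in M]
    x = x0[:]
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]   # residual b - Ax
    p, rs = r[:], sum(ri * ri for ri in r)
    for _ in range(iters):
        Ap = matvec(A, p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        # New search direction, conjugate to the previous ones.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_star = conjugate_gradient(A, b, [0.0, 0.0])
print(x_star)  # converges to (1/11, 7/11) in two iterations
```

In exact arithmetic CG terminates in at most n iterations for an n-by-n system, which is the property the note's duality-gap analysis recovers without Chebyshev polynomials.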

Sequence to sequence deep learning models for solar irradiation forecasting

Title Sequence to sequence deep learning models for solar irradiation forecasting
Authors Bhaskar Pratim Mukhoty, Vikas Maurya, Sandeep Kumar Shukla
Abstract The energy output of a photovoltaic (PV) panel is a function of solar irradiation and weather parameters such as temperature and wind speed. A general measure of solar irradiation, Global Horizontal Irradiance (GHI), customarily reported in Watt/meter$^2$, is a generic indicator for this intermittent energy resource. Accurate prediction of GHI is necessary for reliable grid integration of the renewable as well as for power market trading. While several machine learning techniques, along with traditional time-series forecasting techniques, are well established, deep learning remains less explored for the task at hand. In this paper we present deep learning models suitable for sequence-to-sequence prediction of GHI. The deep learning models are evaluated for short-term (1-24 hour) forecasting, alongside state-of-the-art techniques such as Gradient Boosted Regression Trees (GBRT) and Feed-Forward Neural Networks (FFNN). We find that spatio-temporal features such as wind direction, wind speed, and the GHI of neighboring locations significantly improve the prediction accuracy of the deep learning models. Among the various sequence-to-sequence encoder-decoder models, the LSTM performed best, addressing shortcomings of the state-of-the-art techniques.
Tasks Time Series, Time Series Forecasting
Published 2019-04-30
URL http://arxiv.org/abs/1904.13081v1
PDF http://arxiv.org/pdf/1904.13081v1.pdf
PWC https://paperswithcode.com/paper/sequence-to-sequence-deep-learning-models-for
Repo
Framework
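Independent of the network architecture, sequence-to-sequence forecasting needs (input window, target window) pairs sliced from the time series. A minimal sketch of that slicing, with illustrative window lengths rather than the paper's exact setup:

```python
def make_seq2seq_windows(series, in_len, out_len):
    """Slice a time series (e.g. hourly GHI readings) into
    (input, target) sequence pairs for encoder-decoder training."""
    pairs = []
    for i in range(len(series) - in_len - out_len + 1):
        pairs.append((series[i:i + in_len],
                      series[i + in_len:i + in_len + out_len]))
    return pairs

ghi = list(range(10))  # toy stand-in for an hourly GHI series
pairs = make_seq2seq_windows(ghi, in_len=4, out_len=2)
print(len(pairs), pairs[0])  # 5 ([0, 1, 2, 3], [4, 5])
```

For the 1-24 hour horizons in the paper, `out_len` would range up to 24, and each input step would carry the spatio-temporal features (wind direction, wind speed, neighboring GHI) alongside the local reading.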

Curriculum Learning in Deep Neural Networks for Financial Forecasting

Title Curriculum Learning in Deep Neural Networks for Financial Forecasting
Authors Allison Koenecke, Amita Gajewar
Abstract For any financial organization, computing accurate quarterly forecasts for various products is one of the most critical operations. As the granularity at which forecasts are needed increases, traditional statistical time series models may not scale well. We apply deep neural networks in the forecasting domain by experimenting with techniques from Natural Language Processing (Encoder-Decoder LSTMs) and Computer Vision (Dilated CNNs), as well as incorporating transfer learning. A novel contribution of this paper is the application of curriculum learning to neural network models built for time series forecasting. We illustrate the performance of our models using Microsoft’s revenue data corresponding to Enterprise, and Small, Medium & Corporate products, spanning approximately 60 regions across the globe for 8 different business segments, and totaling on the order of tens of billions of USD. We compare our models’ performance to the ensemble model of traditional statistics and machine learning techniques currently used by Microsoft Finance. With this in-production model as a baseline, our experiments yield an approximately 30% improvement in overall accuracy on test data. We find that our curriculum learning LSTM-based model performs best, showing that it is reasonable to implement our proposed methods without overfitting on medium-sized data.
Tasks Time Series, Time Series Forecasting, Transfer Learning
Published 2019-04-29
URL https://arxiv.org/abs/1904.12887v2
PDF https://arxiv.org/pdf/1904.12887v2.pdf
PWC https://paperswithcode.com/paper/curriculum-learning-in-deep-neural-networks
Repo
Framework
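A generic curriculum schedule, not necessarily the paper's exact one, sorts training examples easiest-first and releases them to the model in stages, each stage adding harder examples to the pool:

```python
def curriculum_batches(examples, difficulty, n_stages):
    """Generic curriculum learning schedule: order examples by a
    difficulty score and expose them to training in growing stages.
    The difficulty function here is a placeholder assumption."""
    ordered = sorted(examples, key=difficulty)
    stage_size = max(1, len(ordered) // n_stages)
    return [ordered[:min(len(ordered), (i + 1) * stage_size)]
            for i in range(n_stages)]

# Toy data where the "difficulty" of an example is simply its value.
series = [5, 1, 4, 2, 3, 6]
stages = curriculum_batches(series, difficulty=lambda x: x, n_stages=3)
print(stages)  # [[1, 2], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6]]
```

For forecasting, the difficulty score could be something like a series' variance or forecast error under a simple baseline; the paper's specific criterion is not reproduced here.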

Probabilistic Forecasting of Sensory Data with Generative Adversarial Networks - ForGAN

Title Probabilistic Forecasting of Sensory Data with Generative Adversarial Networks - ForGAN
Authors Alireza Koochali, Peter Schichtel, Sheraz Ahmed, Andreas Dengel
Abstract Time series forecasting is one of the most challenging problems for humankind. Traditional forecasting methods based on mean regression have severe shortcomings in reflecting real-world fluctuations. While new probabilistic methods rush to the rescue, they struggle with technical difficulties such as quantile crossing or selecting a prior distribution. To combine the strengths of these fields while avoiding their weaknesses, and to push the boundary of the state of the art, we introduce ForGAN: one-step-ahead probabilistic forecasting with generative adversarial networks. ForGAN utilizes the power of the conditional generative adversarial network to learn the data-generating distribution and compute probabilistic forecasts from it. We discuss how ForGAN should be evaluated relative to regression methods. To investigate ForGAN’s probabilistic forecasting, we create a new dataset and demonstrate our method’s abilities on it. This dataset will be made publicly available for comparison. Furthermore, we test ForGAN on two publicly available datasets, the Mackey-Glass dataset and the Internet traffic dataset (A5M), where the impressive performance of ForGAN demonstrates its high capability in forecasting future values.
Tasks Time Series, Time Series Forecasting
Published 2019-03-29
URL http://arxiv.org/abs/1903.12549v1
PDF http://arxiv.org/pdf/1903.12549v1.pdf
PWC https://paperswithcode.com/paper/probabilistic-forecasting-of-sensory-data
Repo
Framework
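The core idea, sampling a conditional generator many times and summarizing the empirical distribution of the draws, can be sketched with a hypothetical stand-in generator (the real one would be the trained conditional GAN):

```python
import random
import statistics

def probabilistic_forecast(history, generator, n_samples=1000):
    """One-step-ahead probabilistic forecast in the ForGAN style:
    sample the conditional generator many times and summarize the
    empirical distribution of the draws."""
    draws = sorted(generator(history, random.gauss(0, 1))
                   for _ in range(n_samples))
    return {"p10": draws[int(0.1 * n_samples)],
            "median": statistics.median(draws),
            "p90": draws[int(0.9 * n_samples)]}

def toy_generator(history, noise):
    # Hypothetical stand-in for a trained generator: last observed
    # value plus a noise-driven perturbation.
    return history[-1] + 0.1 * noise

random.seed(1)
fc = probabilistic_forecast([1.0, 1.2, 1.1], toy_generator)
print(fc["p10"] < fc["median"] < fc["p90"])  # True
```

Because the forecast is a set of samples rather than fitted quantile functions, quantile crossing cannot occur by construction, which is one motivation the abstract cites.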

The Impact of Popularity Bias on Fairness and Calibration in Recommendation

Title The Impact of Popularity Bias on Fairness and Calibration in Recommendation
Authors Himan Abdollahpouri, Masoud Mansoury, Robin Burke, Bamshad Mobasher
Abstract Recently there has been a growing interest in fairness-aware recommender systems, including fairness in providing consistent performance across different users or groups of users. A recommender system could be considered unfair if the recommendations do not fairly represent the tastes of a certain group of users while other groups receive recommendations that are consistent with their preferences. In this paper, we use a metric called miscalibration for measuring how responsive a recommendation algorithm is to users’ true preferences, and we consider how various algorithms may result in different degrees of miscalibration. A well-known type of bias in recommendation is popularity bias, where a few popular items are over-represented in recommendations while the majority of other items do not get significant exposure. We conjecture that popularity bias is one important factor leading to miscalibration in recommendation. Our experimental results using two real-world datasets show that there is a strong correlation between how different user groups are affected by algorithmic popularity bias and their level of interest in popular items. Moreover, we show that algorithms with greater popularity bias amplification tend to have greater miscalibration.
Tasks Calibration, Recommendation Systems
Published 2019-10-13
URL https://arxiv.org/abs/1910.05755v3
PDF https://arxiv.org/pdf/1910.05755v3.pdf
PWC https://paperswithcode.com/paper/the-impact-of-popularity-bias-on-fairness-and
Repo
Framework
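A miscalibration metric of the kind the paper builds on compares the category (e.g. genre) distribution of a user's profile with that of their recommendation list. A minimal sketch using a smoothed KL divergence, where the smoothing constant `alpha` is illustrative:

```python
import math

def miscalibration(profile_dist, rec_dist, alpha=0.01):
    """KL-style miscalibration between the genre distribution of a
    user's profile and that of their recommendations, smoothed so
    the logarithm is always defined."""
    score = 0.0
    for genre, p in profile_dist.items():
        q = (1 - alpha) * rec_dist.get(genre, 0.0) + alpha * p
        if p > 0:
            score += p * math.log2(p / q)
    return score

profile = {"drama": 0.5, "comedy": 0.3, "horror": 0.2}
well_calibrated = miscalibration(profile, profile)
popular_only = miscalibration(profile, {"drama": 1.0})
print(well_calibrated < popular_only)  # True
```

A recommender that returns only the popular genre scores much worse than one matching the profile, which is exactly the popularity-bias effect the paper measures across user groups.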

On the Convergence of Approximate and Regularized Policy Iteration Schemes

Title On the Convergence of Approximate and Regularized Policy Iteration Schemes
Authors Elena Smirnova, Elvis Dohmatob
Abstract Entropy regularized algorithms such as Soft Q-learning and Soft Actor-Critic, recently showed state-of-the-art performance on a number of challenging reinforcement learning (RL) tasks. The regularized formulation modifies the standard RL objective and thus generally converges to a policy different from the optimal greedy policy of the original RL problem. Practically, it is important to control the sub-optimality of the regularized optimal policy. In this paper, we establish sufficient conditions for convergence of a large class of regularized dynamic programming algorithms, unified under regularized modified policy iteration (MPI) and conservative value iteration (VI) schemes. We provide explicit convergence rates to the optimality depending on the decrease rate of the regularization parameter. Our experiments show that the empirical error closely follows the established theoretical convergence rates. In addition to optimality, we demonstrate two desirable behaviours of the regularized algorithms even in the absence of approximations: robustness to stochasticity of environment and safety of trajectories induced by the policy iterates.
Tasks Q-Learning
Published 2019-09-20
URL https://arxiv.org/abs/1909.09621v2
PDF https://arxiv.org/pdf/1909.09621v2.pdf
PWC https://paperswithcode.com/paper/on-the-convergence-of-approximate-and
Repo
Framework
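The entropy-regularized backup underlying Soft Q-learning replaces the greedy max with a temperature-scaled log-sum-exp; as the regularization parameter shrinks (the decrease the paper's convergence rates depend on), the greedy backup is recovered:

```python
import math

def soft_value(q_values, tau):
    """Entropy-regularized state value: temperature-scaled
    log-sum-exp over action values. As tau -> 0 this recovers the
    greedy max of standard value iteration."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

qs = [1.0, 2.0, 0.5]
print(soft_value(qs, 1e-6))  # 2.0 (essentially the greedy max)
print(soft_value(qs, 1.0))   # larger than 2.0: entropy bonus
```

The gap between the soft value and the greedy max is the sub-optimality that the paper controls by scheduling the regularization parameter toward zero.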

Joint Inference of Reward Machines and Policies for Reinforcement Learning

Title Joint Inference of Reward Machines and Policies for Reinforcement Learning
Authors Zhe Xu, Ivan Gavran, Yousef Ahmad, Rupak Majumdar, Daniel Neider, Ufuk Topcu, Bo Wu
Abstract Incorporating high-level knowledge is an effective way to expedite reinforcement learning (RL), especially for complex tasks with sparse rewards. We investigate an RL problem where the high-level knowledge is in the form of reward machines, i.e., a type of Mealy machine that encodes the reward functions. We focus on a setting in which this knowledge is a priori not available to the learning agent. We develop an iterative algorithm that performs joint inference of reward machines and policies for RL (more specifically, q-learning). In each iteration, the algorithm maintains a hypothesis reward machine and a sample of RL episodes. It derives q-functions from the current hypothesis reward machine, and performs RL to update the q-functions. While performing RL, the algorithm updates the sample by adding RL episodes along which the obtained rewards are inconsistent with the rewards based on the current hypothesis reward machine. In the next iteration, the algorithm infers a new hypothesis reward machine from the updated sample. Based on an equivalence relationship we defined between states of reward machines, we transfer the q-functions between the hypothesis reward machines in consecutive iterations. We prove that the proposed algorithm converges almost surely to an optimal policy in the limit if a minimal reward machine can be inferred and the maximal length of each RL episode is sufficiently long. The experiments show that learning high-level knowledge in the form of reward machines can lead to fast convergence to optimal policies in RL, while standard RL methods such as q-learning and hierarchical RL methods fail to converge to optimal policies after a substantial number of training steps in many tasks.
Tasks Q-Learning
Published 2019-09-12
URL https://arxiv.org/abs/1909.05912v1
PDF https://arxiv.org/pdf/1909.05912v1.pdf
PWC https://paperswithcode.com/paper/joint-inference-of-reward-machines-and
Repo
Framework
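A reward machine is a Mealy machine whose transitions map (machine state, event) pairs to (next state, reward). A minimal sketch, with a hypothetical "visit A then B" task standing in for the machines the algorithm infers:

```python
class RewardMachine:
    """Minimal Mealy-machine reward encoding: transitions map
    (machine state, event) to (next state, reward)."""
    def __init__(self, transitions, start):
        self.transitions, self.start = transitions, start

    def run(self, events):
        state, total = self.start, 0.0
        for e in events:
            state, r = self.transitions[(state, e)]
            total += r
        return total

# Hypothetical task: reward 1.0 only for visiting A and then B.
rm = RewardMachine({("u0", "A"): ("u1", 0.0),
                    ("u0", "B"): ("u0", 0.0),
                    ("u1", "A"): ("u1", 0.0),
                    ("u1", "B"): ("u2", 1.0),
                    ("u2", "A"): ("u2", 0.0),
                    ("u2", "B"): ("u2", 0.0)}, "u0")
print(rm.run(["A", "B"]), rm.run(["B", "A"]))  # 1.0 0.0
```

In the paper's algorithm, episodes whose observed rewards disagree with `rm.run` on the episode's event trace are exactly the inconsistent episodes added to the sample for inferring the next hypothesis machine.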

SQLR: Short-Term Memory Q-Learning for Elastic Provisioning

Title SQLR: Short-Term Memory Q-Learning for Elastic Provisioning
Authors Constantine Ayimba, Paolo Casari, Vincenzo Mancuso
Abstract As more and more application providers transition to the cloud and deliver their services on a Software as a Service (SaaS) basis, cloud providers need to make their provisioning systems agile enough to meet Service Level Agreements. At the same time they should guard against over-provisioning which limits their capacity to accommodate more tenants. To this end we propose SQLR, a dynamic provisioning system employing a customized model-free reinforcement learning algorithm that is capable of reusing contextual knowledge learned from one workload to optimize resource provisioning for other workload patterns. SQLR achieves results comparable to those where resources are unconstrained, with minimal overhead. Our experiments show that we can reduce the amount of provisioned resources by almost 25% with less than 1% overall service unavailability (due to blocking) while delivering similar response times as those of an over-provisioned system.
Tasks Q-Learning
Published 2019-09-12
URL https://arxiv.org/abs/1909.05772v2
PDF https://arxiv.org/pdf/1909.05772v2.pdf
PWC https://paperswithcode.com/paper/sqlr-short-term-memory-q-learning-for-elastic
Repo
Framework

Kernel Trajectory Maps for Multi-Modal Probabilistic Motion Prediction

Title Kernel Trajectory Maps for Multi-Modal Probabilistic Motion Prediction
Authors Weiming Zhi, Lionel Ott, Fabio Ramos
Abstract Understanding the dynamics of an environment, such as the movement of humans and vehicles, is crucial for agents to achieve long-term autonomy in urban environments. This requires the development of methods to capture the multi-modal and probabilistic nature of motion patterns. We present Kernel Trajectory Maps (KTM) to capture the trajectories of movement in an environment. KTMs leverage the expressiveness of kernels from non-parametric modelling by projecting input trajectories onto a set of representative trajectories, to condition on a sequence of observed waypoint coordinates, and predict a multi-modal distribution over possible future trajectories. The output is a mixture of continuous stochastic processes, where each realisation is a continuous functional trajectory, which can be queried at arbitrarily fine time steps.
Tasks motion prediction
Published 2019-07-11
URL https://arxiv.org/abs/1907.05127v2
PDF https://arxiv.org/pdf/1907.05127v2.pdf
PWC https://paperswithcode.com/paper/kernel-trajectory-maps-for-multi-modal
Repo
Framework
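The projection step, representing a query trajectory by its kernel similarity to a set of representative trajectories, can be sketched with an RBF kernel over flattened waypoints; the kernel choice and `gamma` here are illustrative, not the paper's exact construction:

```python
import math

def kernel_features(traj, representatives, gamma=1.0):
    """Project a flattened waypoint trajectory onto representative
    trajectories via an RBF kernel, the projection step KTMs apply
    before predicting a distribution over future motion."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [math.exp(-gamma * sqdist(traj, r)) for r in representatives]

# Two representative trajectories, each two (x, y) waypoints flattened.
reps = [[0.0, 0.0, 1.0, 1.0], [2.0, 2.0, 3.0, 3.0]]
feats = kernel_features([0.1, 0.0, 1.0, 1.1], reps)
print(feats[0] > feats[1])  # True: query resembles the first trajectory
```

The resulting feature vector conditions the multi-modal predictive distribution; the mixture-of-stochastic-processes output itself is beyond this sketch.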