April 3, 2020

3234 words 16 mins read

Paper Group ANR 70


Estimation of Rate Control Parameters for Video Coding Using CNN. Ready Policy One: World Building Through Active Learning. Learning Autoencoders with Relational Regularization. Context Aware Image Annotation in Active Learning. Boosting API Recommendation with Implicit Feedback. Facial Feedback for Reinforcement Learning: A Case Study and Offline …

Estimation of Rate Control Parameters for Video Coding Using CNN

Title Estimation of Rate Control Parameters for Video Coding Using CNN
Authors Maria Santamaria, Ebroul Izquierdo, Saverio Blasi, Marta Mrak
Abstract Rate-control is essential to ensure efficient video delivery. Typical rate-control algorithms rely on bit allocation strategies to appropriately distribute bits among frames. As reference frames are essential for exploiting temporal redundancies, intra frames are usually assigned a larger portion of the available bits. In this paper, an accurate method to estimate the number of bits and the quality of intra frames is proposed, which can be used for bit allocation in a rate-control scheme. The algorithm is based on deep learning, where networks are trained using the original frames as inputs, while the distortions and sizes of the compressed frames after encoding are used as ground truths. Two approaches are proposed, predicting either local or global distortions.
Tasks
Published 2020-03-13
URL https://arxiv.org/abs/2003.06315v1
PDF https://arxiv.org/pdf/2003.06315v1.pdf
PWC https://paperswithcode.com/paper/estimation-of-rate-control-parameters-for
Repo
Framework
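
The entry above lends itself to a quick illustration. Below is a minimal, hypothetical PyTorch sketch of the kind of CNN regressor the abstract describes: original intra frames in, predicted coded size and distortion out. The architecture, shapes, and names are illustrative assumptions, not the authors' network.

```python
# Hypothetical sketch: a tiny CNN that regresses the coded size (bits) and
# distortion of an intra frame directly from its pixels.
import torch
import torch.nn as nn

class RateQualityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)  # predicts [bits, distortion]

    def forward(self, frame):
        return self.head(self.features(frame).flatten(1))

model = RateQualityNet()
frames = torch.rand(8, 1, 64, 64)   # batch of luma frames (inputs)
targets = torch.rand(8, 2)          # size/distortion ground truth from an encoder
loss = nn.functional.mse_loss(model(frames), targets)
loss.backward()
```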

Ready Policy One: World Building Through Active Learning

Title Ready Policy One: World Building Through Active Learning
Authors Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts
Abstract Model-Based Reinforcement Learning (MBRL) offers a promising direction for sample-efficient learning, often achieving state-of-the-art results for continuous control tasks. However, many existing MBRL methods rely on combining greedy policies with exploration heuristics, and even those which utilize principled exploration bonuses construct dual objectives in an ad hoc fashion. In this paper we introduce Ready Policy One (RP1), a framework that views MBRL as an active learning problem, where we aim to improve the world model in the fewest samples possible. RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization, allowing the algorithm to trade off reward vs. exploration at different stages of learning. In addition, we introduce a principled mechanism to terminate sample collection once we have a rich enough trajectory batch to improve the model. We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
Tasks Active Learning, Continuous Control
Published 2020-02-07
URL https://arxiv.org/abs/2002.02693v1
PDF https://arxiv.org/pdf/2002.02693v1.pdf
PWC https://paperswithcode.com/paper/ready-policy-one-world-building-through
Repo
Framework
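
To make the hybrid objective concrete, here is a toy sketch of a reward-plus-exploration objective whose trade-off coefficient changes over the course of optimization. RP1 adapts this trade-off in a principled way; the linear decay schedule and the stand-in rollout statistics below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def hybrid_objective(rewards, uncertainties, beta):
    # Exploitation (mean return) plus a beta-weighted exploration bonus
    # (e.g. disagreement among world-model ensemble members).
    return rewards.mean() + beta * uncertainties.mean()

for step in range(5):
    beta = 1.0 - step / 5                      # toy schedule: explore early
    rewards = rng.normal(1.0, 0.1, size=100)   # stand-in rollout returns
    uncertainties = rng.random(100)            # stand-in model uncertainty
    print(step, round(hybrid_objective(rewards, uncertainties, beta), 3))
```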

Learning Autoencoders with Relational Regularization

Title Learning Autoencoders with Relational Regularization
Authors Hongteng Xu, Dixin Luo, Ricardo Henao, Svati Shah, Lawrence Carin
Abstract A new algorithmic framework is proposed for learning autoencoders of data distributions. We minimize the discrepancy between the model and target distributions, with a \emph{relational regularization} on the learnable latent prior. This regularization penalizes the fused Gromov-Wasserstein (FGW) distance between the latent prior and its corresponding posterior, allowing one to flexibly learn a structured prior distribution associated with the generative model. Moreover, it helps co-training of multiple autoencoders even if they have heterogeneous architectures and incomparable latent spaces. We implement the framework with two scalable algorithms, making it applicable for both probabilistic and deterministic autoencoders. Our relational regularized autoencoder (RAE) outperforms existing methods, e.g., the variational autoencoder, the Wasserstein autoencoder, and their variants, on generating images. Additionally, our relational co-training strategy for autoencoders achieves encouraging results in both synthetic and real-world multi-view learning tasks.
Tasks Multi-View Learning
Published 2020-02-07
URL https://arxiv.org/abs/2002.02913v2
PDF https://arxiv.org/pdf/2002.02913v2.pdf
PWC https://paperswithcode.com/paper/learning-autoencoders-with-relational
Repo
Framework
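
The relational regularizer is an FGW distance between samples of the latent prior and the aggregated posterior. A minimal sketch using the POT (Python Optimal Transport) library follows; treating `ot.gromov.fused_gromov_wasserstein2` as the FGW implementation is an assumption about that library's API, and the Gaussian samples are placeholders.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (assumed API)

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2))   # samples from the latent prior
y = rng.normal(size=(30, 2))   # samples from the aggregated posterior

C1 = ot.dist(x, x)             # intra-space structure matrices
C2 = ot.dist(y, y)
M = ot.dist(x, y)              # cross-space feature cost
p, q = ot.unif(20), ot.unif(30)

# FGW discrepancy: the quantity penalized as the relational regularizer
fgw = ot.gromov.fused_gromov_wasserstein2(
    M, C1, C2, p, q, loss_fun="square_loss", alpha=0.5)
print(fgw)
```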

Context Aware Image Annotation in Active Learning

Title Context Aware Image Annotation in Active Learning
Authors Yingcheng Sun, Kenneth Loparo
Abstract Image annotation for active learning is labor-intensive. Various automatic and semi-automatic labeling methods have been proposed to save labeling cost, but a reduction in the number of labeled instances does not guarantee a reduction in cost, because the queries that are most valuable to the learner may be the most difficult or ambiguous cases, and therefore the most expensive for an oracle to label accurately. In this paper, we try to solve this problem by using image metadata to offer the oracle more clues about the image during the annotation process. We propose a Context Aware Image Annotation Framework (CAIAF) that uses image metadata as a similarity metric to cluster images into groups for annotation. We also present useful metadata information as context for each image on the annotation interface. Experiments show that CAIAF reduces the annotation cost compared to the conventional framework, while maintaining high classification performance.
Tasks Active Learning
Published 2020-02-06
URL https://arxiv.org/abs/2002.02775v1
PDF https://arxiv.org/pdf/2002.02775v1.pdf
PWC https://paperswithcode.com/paper/context-aware-image-annotation-in-active
Repo
Framework
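
As a toy version of the metadata-as-similarity idea, the sketch below clusters images by TF-IDF similarity of their metadata strings so that similar images can be annotated together. The metadata fields and cluster count are made up for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Fake metadata strings (tags, camera, date) standing in for real image metadata
metadata = [
    "beach sunset canon 2019",
    "beach waves canon 2019",
    "office meeting indoor fluorescent",
    "office desk indoor fluorescent",
]
X = TfidfVectorizer().fit_transform(metadata)
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(groups)  # images in the same group are shown to the oracle together
```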

Boosting API Recommendation with Implicit Feedback

Title Boosting API Recommendation with Implicit Feedback
Authors Yu Zhou, Xinying Yang, Taolue Chen, Zhiqiu Huang, Xiaoxing Ma, Harald Gall
Abstract Developers often need to use appropriate APIs to program efficiently, but it is usually difficult to identify the exact one they need from a vast number of candidates. To ease the burden, a multitude of API recommendation approaches have been proposed. However, most of the currently available API recommenders do not support the effective integration of users’ feedback into the recommendation loop. In this paper, we propose a framework, BRAID (Boosting RecommendAtion with Implicit FeeDback), which leverages learning-to-rank and active learning techniques to boost recommendation performance. By exploiting users’ feedback information, we train a learning-to-rank model to re-rank the recommendation results. In addition, we speed up the feedback learning process with active learning. Existing query-based API recommendation approaches can be plugged into BRAID. We select three state-of-the-art API recommendation approaches as baselines to demonstrate the performance enhancement of BRAID, measured by Hit@k (Top-k), MAP, and MRR. Empirical experiments show that, with acceptable overheads, the recommendation performance improves steadily and substantially as the percentage of feedback data increases, compared with the baselines.
Tasks Active Learning, Learning-To-Rank
Published 2020-02-04
URL https://arxiv.org/abs/2002.01264v1
PDF https://arxiv.org/pdf/2002.01264v1.pdf
PWC https://paperswithcode.com/paper/boosting-api-recommendation-with-implicit
Repo
Framework
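
A hedged sketch of the re-ranking step: a learning-to-rank model trained on feedback-derived relevance grades re-orders a base recommender's candidate list. LightGBM's `LGBMRanker` is used here as a generic LambdaRank implementation, not BRAID's actual model; the features, grades, and group sizes are fabricated.

```python
import numpy as np
from lightgbm import LGBMRanker

rng = np.random.default_rng(0)
X = rng.random((100, 4))                 # candidate features (e.g. base scores)
y = rng.integers(0, 3, size=100)         # relevance grades from implicit feedback
group = [50, 50]                         # candidates per training query

ranker = LGBMRanker(objective="lambdarank", n_estimators=30)
ranker.fit(X, y, group=group)

scores = ranker.predict(X[:10])          # re-rank ten candidates for a query
print(np.argsort(-scores))               # indices in re-ranked order
```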

Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework

Title Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework
Authors Guangliang Li, Hamdi Dibeklioğlu, Shimon Whiteson, Hayley Hung
Abstract Interactive reinforcement learning provides a way for agents to learn to solve tasks from evaluative feedback provided by a human user. Previous research showed that humans give copious feedback early in training but very sparsely thereafter. In this article, we investigate the potential of agents learning from trainers’ facial expressions by interpreting them as evaluative feedback. To do so, we implemented TAMER, a popular interactive reinforcement learning method, on the reinforcement learning benchmark problem Infinite Mario, and conducted the first large-scale study of TAMER, involving 561 participants. Using a purpose-designed CNN-RNN model, our analysis shows that telling trainers to use facial expressions, together with competition, can improve the accuracy of estimating positive and negative feedback from facial expressions. In addition, our results from a simulation experiment show that learning solely from feedback predicted from facial expressions is possible, and that with strong/effective prediction models or a regression method, facial responses could significantly improve the performance of agents. Furthermore, our experiment supports previous studies demonstrating the importance of bi-directional feedback and competitive elements in the training interface.
Tasks
Published 2020-01-23
URL https://arxiv.org/abs/2001.08703v1
PDF https://arxiv.org/pdf/2001.08703v1.pdf
PWC https://paperswithcode.com/paper/facial-feedback-for-reinforcement-learning-a
Repo
Framework
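
For readers unfamiliar with TAMER, the core loop is simple: learn a model H(s, a) of the human's evaluative feedback and act greedily on it. The tabular sketch below assumes the feedback has already been decoded from facial expressions into a scalar; everything else is illustrative.

```python
import numpy as np

n_states, n_actions = 10, 3
H = np.zeros((n_states, n_actions))  # learned model of human reward
alpha = 0.1

def update(state, action, feedback):
    # feedback in [-1, 1], e.g. the output of a facial-expression classifier
    H[state, action] += alpha * (feedback - H[state, action])

def act(state):
    return int(np.argmax(H[state]))  # TAMER acts greedily on predicted feedback

update(0, 1, +1.0)
update(0, 2, -0.5)
print(act(0))  # -> 1
```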

Information-Theoretic Probing with Minimum Description Length

Title Information-Theoretic Probing with Minimum Description Length
Authors Elena Voita, Ivan Titov
Abstract To measure how well pretrained representations encode some linguistic property, it is common to use the accuracy of a probe, i.e., a classifier trained to predict the property from the representations. Despite the widespread adoption of probes, differences in their accuracy fail to adequately reflect differences in representations. For example, they do not substantially favour pretrained representations over randomly initialized ones. Analogously, their accuracy can be similar when probing for genuine linguistic labels and when probing for random synthetic tasks. To see reasonable differences in accuracy with respect to these random baselines, previous work had to constrain either the amount of probe training data or its model size. Instead, we propose an alternative to the standard probes: information-theoretic probing with minimum description length (MDL). With MDL probing, training a probe to predict labels is recast as teaching it to effectively transmit the data. Therefore, the measure of interest changes from probe accuracy to the description length of labels given representations. In addition to probe quality, the description length evaluates “the amount of effort” needed to achieve the quality. This amount of effort characterizes either (i) the size of the probing model, or (ii) the amount of data needed to achieve high quality. We consider two methods for estimating MDL which can be easily implemented on top of standard probing pipelines: variational coding and online coding. We show that these methods agree in results and are more informative and stable than the standard probes.
Tasks
Published 2020-03-27
URL https://arxiv.org/abs/2003.12298v1
PDF https://arxiv.org/pdf/2003.12298v1.pdf
PWC https://paperswithcode.com/paper/information-theoretic-probing-with-minimum
Repo
Framework
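
Online coding is easy to sketch: transmit the first block with a uniform code, then repeatedly train a probe on everything sent so far and pay the cross-entropy (in bits) of the next block. The sketch below assumes integer labels 0..k-1 that all appear in the first block, so `predict_proba` columns align with labels; the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def online_codelength(X, y, fractions=(0.1, 0.2, 0.4, 0.8, 1.0)):
    n = len(y)
    n_classes = len(set(y))
    total = int(n * fractions[0]) * np.log2(n_classes)  # uniform code, first block
    for lo, hi in zip(fractions[:-1], fractions[1:]):
        i, j = int(n * lo), int(n * hi)
        clf = LogisticRegression(max_iter=1000).fit(X[:i], y[:i])
        probs = clf.predict_proba(X[i:j])               # columns = labels 0..k-1
        total += -np.log2(probs[np.arange(j - i), y[i:j]]).sum()
    return total  # description length in bits; lower = better representation

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 0] > 0).astype(int)  # a property that is easy to extract
print(online_codelength(X, y))
```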

Fair Transfer of Multiple Style Attributes in Text

Title Fair Transfer of Multiple Style Attributes in Text
Authors Karan Dabas, Nishtha Madan, Vijay Arya, Sameep Mehta, Gautam Singh, Tanmoy Chakraborty
Abstract To preserve anonymity and obfuscate their identity on online platforms, users may morph their text and portray themselves as a different gender or demographic. Similarly, a chatbot may need to customize its communication style to improve engagement with its audience. This manner of changing the style of written text has gained significant attention in recent years. Yet past research works largely cater to the transfer of single style attributes. The disadvantage of focusing on a single style alone is that this often results in target text where other existing style attributes behave unpredictably or are unfairly dominated by the new style. To counteract this behavior, it is desirable to have a style transfer mechanism that can transfer or control multiple styles simultaneously and fairly. Through such an approach, one could obtain obfuscated text incorporated with a desired degree of multiple soft styles, such as female-quality, politeness, or formality. In this work, we demonstrate that the transfer of multiple styles cannot be achieved by sequentially performing multiple single-style transfers, because each single-style transfer step often reverses or dominates the style incorporated by a previous transfer step. We then propose a neural network architecture for fairly transferring multiple style attributes in a given text. We test our architecture on the Yelp dataset to demonstrate its superior performance compared to single-style transfers performed in sequence.
Tasks Chatbot, Style Transfer
Published 2020-01-18
URL https://arxiv.org/abs/2001.06693v1
PDF https://arxiv.org/pdf/2001.06693v1.pdf
PWC https://paperswithcode.com/paper/fair-transfer-of-multiple-style-attributes-in
Repo
Framework
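
One simple way to control several style attributes jointly (rather than in sequence) is to condition a decoder on a vector of style intensities. The sketch below is a generic conditioning pattern under that assumption, not the paper's architecture; the dimensions and the three style attributes are invented.

```python
import torch
import torch.nn as nn

# Decoder input = token embedding concatenated with the desired style vector,
# so all attributes are imposed jointly in a single pass.
vocab, dim, n_styles, seq = 1000, 64, 3, 12
emb = nn.Embedding(vocab, dim)
dec = nn.GRU(dim + n_styles, 128, batch_first=True)
out = nn.Linear(128, vocab)

tokens = torch.randint(0, vocab, (4, seq))
styles = torch.tensor([[1.0, 0.5, 0.0]]).repeat(4, 1)  # e.g. gender, politeness, formality
x = torch.cat([emb(tokens), styles.unsqueeze(1).expand(-1, seq, -1)], dim=-1)
h, _ = dec(x)
logits = out(h)  # next-token distribution under the joint style condition
```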

A Kernel of Truth: Determining Rumor Veracity on Twitter by Diffusion Pattern Alone

Title A Kernel of Truth: Determining Rumor Veracity on Twitter by Diffusion Pattern Alone
Authors Nir Rosenfeld, Aron Szanto, David C. Parkes
Abstract Recent work in the domain of misinformation detection has leveraged rich signals in the text and user identities associated with content on social media. But text can be strategically manipulated and accounts reopened under different aliases, suggesting that these approaches are inherently brittle. In this work, we investigate an alternative modality that is naturally robust: the pattern in which information propagates. Can the veracity of an unverified rumor spreading online be discerned solely on the basis of its pattern of diffusion through the social network? Using graph kernels to extract complex topological information from Twitter cascade structures, we train accurate predictive models that are blind to language, user identities, and time, demonstrating for the first time that such “sanitized” diffusion patterns are highly informative of veracity. Our results indicate that, with proper aggregation, the collective sharing pattern of the crowd may reveal powerful signals of rumor truth or falsehood, even in the early stages of propagation.
Tasks
Published 2020-01-28
URL https://arxiv.org/abs/2002.00850v2
PDF https://arxiv.org/pdf/2002.00850v2.pdf
PWC https://paperswithcode.com/paper/a-kernel-of-truth-determining-rumor-veracity
Repo
Framework
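
The pipeline (cascade graphs, then a graph kernel, then a kernel classifier) can be sketched with the GraKeL library. The `Graph` construction and Weisfeiler-Lehman kernel shown here are an assumed API, and the two toy cascades carry only constant node labels, mirroring the paper's “sanitized” setting.

```python
# Assumed GraKeL API; cascades carry constant node labels only ("sanitized").
from grakel import Graph
from grakel.kernels import WeisfeilerLehman, VertexHistogram
from sklearn.svm import SVC

cascades = [
    Graph({0: [1, 2], 1: [0, 3], 2: [0], 3: [1]},
          node_labels={0: "n", 1: "n", 2: "n", 3: "n"}),
    Graph({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]},
          node_labels={0: "n", 1: "n", 2: "n", 3: "n"}),
]
labels = [1, 0]  # true rumor / false rumor

gk = WeisfeilerLehman(n_iter=3, base_graph_kernel=VertexHistogram, normalize=True)
K = gk.fit_transform(cascades)           # kernel matrix over diffusion patterns
clf = SVC(kernel="precomputed").fit(K, labels)
```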

A Scalable Chatbot Platform Leveraging Online Community Posts: A Proof-of-Concept Study

Title A Scalable Chatbot Platform Leveraging Online Community Posts: A Proof-of-Concept Study
Authors Sihyeon Jo, Sangwon Im, SangWook Han, Seung Hee Yang, Hee-Eun Kim, Seong-Woo Kim
Abstract The development of natural language processing algorithms and the explosive growth of conversational data are encouraging research on human-computer conversation. Still, getting qualified conversational data on a large scale is difficult and expensive. In this paper, we verify the feasibility of constructing a data-driven chatbot from processed online community posts by using them as pseudo-conversational data. We argue that chatbots for various purposes can be built extensively through a pipeline exploiting the common structure of community posts. Our experiment demonstrates that chatbots created along this pipeline can yield proper responses.
Tasks Chatbot
Published 2020-01-10
URL https://arxiv.org/abs/2001.03278v1
PDF https://arxiv.org/pdf/2001.03278v1.pdf
PWC https://paperswithcode.com/paper/a-scalable-chatbot-platform-leveraging-online
Repo
Framework
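
A minimal retrieval flavor of the idea: treat (post, best reply) pairs as pseudo-conversations and answer a user utterance with the reply of the most similar post. The pairs below are invented; a real pipeline would mine them from community posts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# (post title, top comment) pairs used as pseudo-conversational data
pairs = [
    ("how do I fix a flat bike tire", "use a patch kit or replace the inner tube"),
    ("best budget laptop for coding", "a refurbished thinkpad is a popular choice"),
]
vec = TfidfVectorizer().fit([post for post, _ in pairs])
P = vec.transform([post for post, _ in pairs])

def respond(utterance):
    sims = cosine_similarity(vec.transform([utterance]), P)
    return pairs[int(sims.argmax())][1]

print(respond("my bike tire is flat"))
```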

Deep Reinforcement Learning with Weighted Q-Learning

Title Deep Reinforcement Learning with Weighted Q-Learning
Authors Andrea Cini, Carlo D’Eramo, Jan Peters, Cesare Alippi
Abstract Overestimation of the maximum action-value is a well-known problem that hinders Q-Learning performance, leading to suboptimal policies and unstable learning. Among the several Q-Learning variants proposed to address this issue, Weighted Q-Learning (WQL) effectively reduces the bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated action-values, where the weights correspond to the probability of each action-value being the maximum; however, the computation of these probabilities is only practical in tabular settings. In this work, we provide methodological advances to benefit from the WQL properties in Deep Reinforcement Learning (DRL), by using neural networks with Dropout Variational Inference as an effective approximation of deep Gaussian processes. In particular, we adopt the Concrete Dropout variant to obtain calibrated estimates of epistemic uncertainty in DRL. We show that model uncertainty in DRL can be useful not only for action selection, but also for action evaluation. We analyze how the novel Weighted Deep Q-Learning algorithm reduces the bias w.r.t. relevant baselines and provide empirical evidence of its advantages on several representative benchmarks.
Tasks Gaussian Processes, Q-Learning
Published 2020-03-20
URL https://arxiv.org/abs/2003.09280v2
PDF https://arxiv.org/pdf/2003.09280v2.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-with-weighted-q
Repo
Framework
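
The WQL estimator itself fits in a few lines once you have posterior samples of the action-values (which the paper obtains via dropout): weight each action's mean value by the probability that it is the maximum. The samples below are synthetic stand-ins for MC-dropout forward passes.

```python
import numpy as np

rng = np.random.default_rng(0)
# 50 sampled Q-value vectors for 3 actions (stand-in for MC-dropout passes)
samples = rng.normal(loc=[1.0, 1.2, 0.8], scale=0.3, size=(50, 3))

# w[a] = estimated probability that action a has the maximal value
w = np.bincount(samples.argmax(axis=1), minlength=3) / len(samples)

q_mean = samples.mean(axis=0)
wql_estimate = float(w @ q_mean)  # weighted estimate of max_a Q(s, a)
print(w, wql_estimate)            # less biased than q_mean.max()
```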

Tackling Two Challenges of 6D Object Pose Estimation: Lack of Real Annotated RGB Images and Scalability to Number of Objects

Title Tackling Two Challenges of 6D Object Pose Estimation: Lack of Real Annotated RGB Images and Scalability to Number of Objects
Authors Juil Sock, Pedro Castro, Anil Armagan, Guillermo Garcia-Hernando, Tae-Kyun Kim
Abstract State-of-the-art methods for 6D object pose estimation typically train a deep neural network per object, with training data initially rendered from a 3D object mesh. Models trained with synthetic data alone do not generalise well, and training a single model for multiple objects sharply drops its accuracy. In this work, we address these two main challenges for 6D object pose estimation and investigate viable methods in experiments. To cope with the lack of real RGB data with pose annotations, we propose a novel self-supervision method based on pose consistency. For scalability to multiple objects, we apply additional parameterisation to a backbone network and distill knowledge from teachers to a student network for model compression. We further evaluate the combination of the two methods for settings where we are given only synthetic data and a single network for multiple objects. In experiments using the LINEMOD, LINEMOD OCCLUSION and T-LESS datasets, the methods significantly boost baseline accuracies and are comparable with the upper bounds, i.e., object-specific networks trained on real data with pose labels.
Tasks 6D Pose Estimation using RGB, Model Compression, Pose Estimation
Published 2020-03-27
URL https://arxiv.org/abs/2003.12344v1
PDF https://arxiv.org/pdf/2003.12344v1.pdf
PWC https://paperswithcode.com/paper/tackling-two-challenges-of-6d-object-pose
Repo
Framework
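
The distillation half of the recipe reduces to a familiar pattern: a single student pose head is trained to mimic teacher predictions. The sketch below uses toy linear heads, and the quaternion-plus-translation output is purely an assumed parameterisation.

```python
import torch
import torch.nn.functional as F

# Toy heads over a shared 128-d feature; 7 = quaternion (4) + translation (3),
# an assumed pose parameterisation for illustration.
teacher = torch.nn.Linear(128, 7)   # pretrained, object-specific
student = torch.nn.Linear(128, 7)   # compact, shared across objects

feats = torch.randn(16, 128)
with torch.no_grad():
    t_pose = teacher(feats)
loss = F.mse_loss(student(feats), t_pose)  # student mimics teacher predictions
loss.backward()
```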

Integrating Discrete and Neural Features via Mixed-feature Trans-dimensional Random Field Language Models

Title Integrating Discrete and Neural Features via Mixed-feature Trans-dimensional Random Field Language Models
Authors Silin Gao, Zhijian Ou, Wei Yang, Huifang Xu
Abstract There has been a long recognition that discrete features (n-gram features) and neural network based features have complementary strengths for language models (LMs). Improved performance can be obtained by model interpolation, which is, however, a suboptimal two-step integration of discrete and neural features. The trans-dimensional random field (TRF) framework has the potential advantage of being able to flexibly integrate a richer set of features. However, either discrete or neural features were used alone in previous TRF LMs. This paper develops a mixed-feature TRF LM and demonstrates its advantage in integrating discrete and neural features. Various LMs are trained over the PTB and Google one-billion-word datasets, and evaluated in N-best list rescoring experiments for speech recognition. Among all single LMs (i.e. without model interpolation), the mixed-feature TRF LMs perform the best, improving over both discrete TRF LMs and neural TRF LMs alone, and also being significantly better than LSTM LMs. Compared to interpolating two separately trained models with discrete and neural features respectively, the mixed-feature TRF LMs match the best interpolated model, with a simplified one-step training process and reduced training time.
Tasks Speech Recognition
Published 2020-02-14
URL https://arxiv.org/abs/2002.05967v1
PDF https://arxiv.org/pdf/2002.05967v1.pdf
PWC https://paperswithcode.com/paper/integrating-discrete-and-neural-features-via
Repo
Framework
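
The core modelling idea, a single unnormalized potential that sums a discrete n-gram score and a neural score, can be sketched as below. The feature extraction and the LSTM scorer are illustrative stand-ins, and the actual TRF training machinery (normalization over variable lengths) is omitted.

```python
import torch
import torch.nn as nn

class MixedPotential(nn.Module):
    """phi(x) = w . f(x) + nn(x): discrete n-gram score plus neural score."""
    def __init__(self, n_discrete_feats, vocab, dim=32):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(n_discrete_feats))  # discrete weights
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, tokens, discrete_feats):
        # discrete_feats: (batch, n_feats) counts of active n-gram features
        h, _ = self.rnn(self.emb(tokens))
        neural = self.out(h.mean(dim=1)).squeeze(-1)
        return discrete_feats @ self.w + neural

phi = MixedPotential(n_discrete_feats=5, vocab=100)
scores = phi(torch.randint(0, 100, (2, 7)), torch.rand(2, 5))
print(scores.shape)  # unnormalized log-scores for 2 sentences
```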

Curriculum in Gradient-Based Meta-Reinforcement Learning

Title Curriculum in Gradient-Based Meta-Reinforcement Learning
Authors Bhairav Mehta, Tristan Deleu, Sharath Chandra Raparthy, Chris J. Pal, Liam Paull
Abstract Gradient-based meta-learners such as Model-Agnostic Meta-Learning (MAML) have shown strong few-shot performance in supervised and reinforcement learning settings. However, specifically in the case of meta-reinforcement learning (meta-RL), we can show that gradient-based meta-learners are sensitive to task distributions. With the wrong curriculum, agents suffer the effects of meta-overfitting, shallow adaptation, and adaptation instability. In this work, we begin by highlighting intriguing failure cases of gradient-based meta-RL and show that task distributions can wildly affect algorithmic outputs, stability, and performance. To address this problem, we leverage insights from recent literature on domain randomization and propose meta Active Domain Randomization (meta-ADR), which learns a curriculum of tasks for gradient-based meta-RL in a similar way as ADR does for sim2real transfer. We show that this approach induces more stable policies on a variety of simulated locomotion and navigation tasks. We assess in- and out-of-distribution generalization and find that the learned task distributions, even in an unstructured task space, greatly improve the adaptation performance of MAML. Finally, we motivate the need for better benchmarking in meta-RL that prioritizes \textit{generalization} over single-task adaptation performance.
Tasks Meta-Learning
Published 2020-02-19
URL https://arxiv.org/abs/2002.07956v1
PDF https://arxiv.org/pdf/2002.07956v1.pdf
PWC https://paperswithcode.com/paper/curriculum-in-gradient-based-meta
Repo
Framework
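
As a toy stand-in for the curriculum (meta-ADR learns its sampler, whereas the difficulty heuristic here is hand-coded for illustration), the sketch below preferentially samples tasks on which the meta-learner currently adapts worst.

```python
import numpy as np

rng = np.random.default_rng(0)
tasks = np.linspace(-2.0, 2.0, 41)   # 1-D task parameter, e.g. goal velocity
scores = np.zeros_like(tasks)        # running post-adaptation returns

def sample_task():
    difficulty = scores.max() - scores + 1e-3   # worse return -> sampled more
    return rng.choice(len(tasks), p=difficulty / difficulty.sum())

for _ in range(200):
    i = sample_task()
    ret = -abs(tasks[i]) + 0.1 * rng.normal()   # stand-in for an adapted return
    scores[i] += 0.2 * (ret - scores[i])        # update the running estimate
```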

EEG-based Drowsiness Estimation for Driving Safety using Deep Q-Learning

Title EEG-based Drowsiness Estimation for Driving Safety using Deep Q-Learning
Authors Yurui Ming, Dongrui Wu, Yu-Kai Wang, Yuhui Shi, Chin-Teng Lin
Abstract Fatigue is one of the most important factors in road fatalities, and one manifestation of fatigue during driving is drowsiness. In this paper, we propose using deep Q-learning to analyze an electroencephalogram (EEG) dataset captured during a simulated endurance driving test. By measuring the correlation between drowsiness and driving performance, this experiment represents an important brain-computer interface (BCI) paradigm, especially from an application perspective. We adapt the terminology of the driving test to fit the reinforcement learning framework, and thus formulate drowsiness estimation as the optimization of a Q-learning task. Drawing on the latest deep Q-learning technologies and attending to the characteristics of EEG data, we tailor a deep Q-network for action proposition that can indirectly estimate drowsiness. Our results show that the trained model can trace the variations of mind state satisfactorily against the testing EEG data, demonstrating the feasibility and practicability of this new computation paradigm. We also show that our method outperforms the supervised learning counterpart and is better suited to real applications. To the best of our knowledge, we are the first to introduce deep reinforcement learning to this BCI scenario, and our method can potentially be generalized to other BCI cases.
Tasks EEG, Q-Learning
Published 2020-01-08
URL https://arxiv.org/abs/2001.02399v1
PDF https://arxiv.org/pdf/2001.02399v1.pdf
PWC https://paperswithcode.com/paper/eeg-based-drowsiness-estimation-for-driving
Repo
Framework
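
A single deep Q-learning update over EEG feature windows looks like the sketch below; the 30-d band-power features, three discrete action levels, and performance-derived rewards are all assumptions standing in for the paper's setup.

```python
import torch
import torch.nn as nn

qnet = nn.Sequential(nn.Linear(30, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

s = torch.randn(32, 30)         # EEG band-power features (fake)
a = torch.randint(0, 3, (32,))  # proposed actions (drowsiness-related levels)
r = torch.randn(32)             # reward derived from driving performance
s2 = torch.randn(32, 30)        # next EEG window

with torch.no_grad():
    target = r + 0.99 * qnet(s2).max(dim=1).values   # bootstrapped TD target
q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q, target)
opt.zero_grad(); loss.backward(); opt.step()
```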