January 28, 2020

2973 words 14 mins read

Paper Group ANR 1062

Conversational implicatures in English dialogue: Annotated dataset

Title Conversational implicatures in English dialogue: Annotated dataset
Authors Elizabeth Jasmi George, Radhika Mamidi
Abstract Human dialogue often contains utterances whose meanings are entirely different from the sentences used, yet which are clearly understood by the interlocutors. In human-computer interaction, however, the machine fails to understand the implicated meaning unless it is trained with a dataset containing the implicated meaning of an utterance along with the utterance and the context in which it is uttered. In linguistic terms, conversational implicatures are the meanings of the speaker’s utterance that are not part of what is explicitly said. In this paper, we introduce a dataset of dialogue snippets with three constituents: the context, the utterance, and the implicated meanings. These implicated meanings are the conversational implicatures. The utterances are collected by transcribing the listening comprehension sections of English tests such as TOEFL (Test of English as a Foreign Language), as well as by scraping dialogues from movie scripts available on IMSDb (Internet Movie Script Database). The utterances are manually annotated with implicatures.
Tasks
Published 2019-11-25
URL https://arxiv.org/abs/1911.10704v1
PDF https://arxiv.org/pdf/1911.10704v1.pdf
PWC https://paperswithcode.com/paper/conversational-implicatures-in-english
Repo
Framework
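
To make the three-constituent structure concrete, here is a minimal sketch of what one record and a training-ready flattening of it might look like; the field names and example text are hypothetical, not taken from the released dataset.

```python
# A minimal sketch of one dataset record with the three constituents the
# abstract describes: context, utterance, and implicated meaning.
# Field names and example text are hypothetical, not from the actual dataset.
record = {
    "context": "A: Are you coming to the party tonight?",
    "utterance": "B: I have an exam tomorrow morning.",
    "implicature": "B is not coming to the party.",
}

def format_for_training(rec):
    """Flatten a record into an input/target pair for a seq2seq model."""
    source = f"context: {rec['context']} utterance: {rec['utterance']}"
    target = rec["implicature"]
    return source, target

print(format_for_training(record))
```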

Benchmarking Adversarial Robustness

Title Benchmarking Adversarial Robustness
Authors Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Hang Su, Zihao Xiao, Jun Zhu
Abstract Deep neural networks are vulnerable to adversarial examples, which has become one of the most important research problems in the development of deep learning. While much effort has been made in recent years, correct and complete evaluation of adversarial attack and defense algorithms remains of great significance. In this paper, we establish a comprehensive, rigorous, and coherent benchmark to evaluate adversarial robustness on image classification tasks. After briefly reviewing a large number of representative attack and defense methods, we perform large-scale experiments with two robustness curves as fair evaluation criteria to fully understand the performance of these methods. Based on the evaluation results, we draw several important findings and provide insights for future research.
Tasks Adversarial Attack, Image Classification
Published 2019-12-26
URL https://arxiv.org/abs/1912.11852v1
PDF https://arxiv.org/pdf/1912.11852v1.pdf
PWC https://paperswithcode.com/paper/benchmarking-adversarial-robustness
Repo
Framework
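
As a rough illustration of a robustness curve, the sketch below evaluates accuracy as a function of the L-infinity perturbation budget, using single-step FGSM as a stand-in for the stronger attacks a benchmark like this covers; the paper's exact evaluation criteria and attack suite are not reproduced here.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step L-inf attack (FGSM); a stand-in for stronger attacks.
    Assumes image inputs scaled to [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

@torch.no_grad()
def accuracy(model, x, y):
    return (model(x).argmax(1) == y).float().mean().item()

def robustness_curve(model, loader, eps_grid=(0.0, 0.01, 0.02, 0.04, 0.08)):
    """Accuracy as a function of perturbation budget eps: one robustness curve."""
    curve = {}
    for eps in eps_grid:
        accs = []
        for x, y in loader:
            x_adv = x if eps == 0.0 else fgsm(model, x, y, eps)
            accs.append(accuracy(model, x_adv, y))
        curve[eps] = sum(accs) / len(accs)
    return curve
```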

Using Natural Language for Reward Shaping in Reinforcement Learning

Title Using Natural Language for Reward Shaping in Reinforcement Learning
Authors Prasoon Goyal, Scott Niekum, Raymond J. Mooney
Abstract Recent reinforcement learning (RL) approaches have shown strong performance in complex domains such as Atari games, but are often highly sample inefficient. A common approach to reduce interaction time with the environment is to use reward shaping, which involves carefully designing reward functions that provide the agent with intermediate rewards for progress towards the goal. However, designing appropriate shaping rewards is known to be difficult as well as time-consuming. In this work, we address this problem by using natural language instructions to perform reward shaping. We propose the LanguagE-Action Reward Network (LEARN), a framework that maps free-form natural language instructions to intermediate rewards based on actions taken by the agent. These intermediate language-based rewards can seamlessly be integrated into any standard reinforcement learning algorithm. We experiment with Montezuma’s Revenge from the Arcade Learning Environment, a popular benchmark in RL. Our experiments on a diverse set of 15 tasks demonstrate that, for the same number of interactions with the environment, language-based rewards lead to successful completion of the task 60% more often on average, compared to learning without language.
Tasks Atari Games, Montezuma’s Revenge
Published 2019-03-05
URL https://arxiv.org/abs/1903.02020v2
PDF https://arxiv.org/pdf/1903.02020v2.pdf
PWC https://paperswithcode.com/paper/using-natural-language-for-reward-shaping-in
Repo
Framework
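
The shaping idea can be pictured as an environment wrapper that adds a language-based bonus to the native reward. In this sketch, `language_reward_fn` is a placeholder for the trained LEARN network that scores how related the agent's recent actions are to the instruction; the wrapper itself is an assumption, not the paper's code.

```python
import gym

class LanguageRewardWrapper(gym.Wrapper):
    """Sketch of LEARN-style shaping: add an intermediate, language-based
    reward to the environment reward. Uses the classic 4-tuple gym step API."""

    def __init__(self, env, instruction, language_reward_fn, coeff=0.1):
        super().__init__(env)
        self.instruction = instruction
        self.language_reward_fn = language_reward_fn  # stands in for LEARN
        self.coeff = coeff
        self.recent_actions = []

    def reset(self, **kwargs):
        self.recent_actions = []
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.recent_actions.append(action)
        # Language-based bonus scored from the instruction and recent actions.
        shaped = self.coeff * self.language_reward_fn(
            self.instruction, self.recent_actions)
        return obs, reward + shaped, done, info
```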

Quantitative Logic Reasoning

Title Quantitative Logic Reasoning
Authors Marcelo Finger
Abstract In this paper we show several similarities among logic systems that deal simultaneously with deductive and quantitative inference. We claim it is appropriate to call the tasks those systems perform Quantitative Logic Reasoning. Analogous properties hold throughout that class, and for its members there exists a set of linear algebraic techniques applicable to the study of satisfiability decision problems. In this presentation, we consider as Quantitative Logic Reasoning the tasks performed by propositional Probabilistic Logic; first-order logic with counting quantifiers over a fragment containing unary and limited binary predicates; and propositional Lukasiewicz Infinitely-valued Probabilistic Logic.
Tasks
Published 2019-05-14
URL https://arxiv.org/abs/1905.05665v1
PDF https://arxiv.org/pdf/1905.05665v1.pdf
PWC https://paperswithcode.com/paper/quantitative-logic-reasoning
Repo
Framework
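
The linear-algebraic flavor of these satisfiability problems can be illustrated on propositional Probabilistic Logic: probabilistic satisfiability (PSAT) asks whether some distribution over truth assignments matches the assigned probabilities, which is a feasibility linear program. The sketch below is a brute-force toy for a handful of atoms, not the fragment-specific techniques the paper develops.

```python
from itertools import product
from scipy.optimize import linprog

def psat(formulas, probs, n_atoms):
    """Brute-force PSAT check: does a distribution pi over the 2^n truth
    assignments satisfy A @ pi = probs, pi >= 0, sum(pi) = 1, where
    A[i][j] = 1 iff formula i is true under assignment j?"""
    assignments = list(product([False, True], repeat=n_atoms))
    A = [[1.0 if f(v) else 0.0 for v in assignments] for f in formulas]
    A_eq = A + [[1.0] * len(assignments)]   # normalization row: sum(pi) = 1
    b_eq = list(probs) + [1.0]
    res = linprog(c=[0.0] * len(assignments), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * len(assignments), method="highs")
    return res.success

# Example: P(p) = 0.7 and P(p and q) = 0.5 are jointly satisfiable.
print(psat([lambda v: v[0], lambda v: v[0] and v[1]], [0.7, 0.5], n_atoms=2))
```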

Risk-Sensitive Compact Decision Trees for Autonomous Execution in Presence of Simulated Market Response

Title Risk-Sensitive Compact Decision Trees for Autonomous Execution in Presence of Simulated Market Response
Authors Svitlana Vyetrenko, Shaojie Xu
Abstract We demonstrate an application of risk-sensitive reinforcement learning to optimizing execution in limit order book markets. We represent order execution decisions based on limit order book knowledge as a Markov Decision Process, and train a trading agent in a market simulator that emulates multi-agent interaction by synthesizing the market response to our agent’s execution decisions from historical data. Due to market impact, executing high volume orders can incur significant cost. We learn trading signals from market microstructure in the presence of a simulated market response, and derive explainable decision-tree-based execution policies using risk-sensitive Q-learning to minimize execution cost subject to constraints on cost variance.
Tasks Q-Learning
Published 2019-06-05
URL https://arxiv.org/abs/1906.02312v1
PDF https://arxiv.org/pdf/1906.02312v1.pdf
PWC https://paperswithcode.com/paper/risk-sensitive-compact-decision-trees-for
Repo
Framework
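
For a flavor of risk-sensitive Q-learning, the sketch below shows one common mean-variance scheme: track the expected cost and its second moment, and act greedily with respect to cost plus a variance penalty. This is a generic illustration with assumed tabular state-action keys, not necessarily the paper's exact formulation.

```python
import collections

def risk_sensitive_q_update(Q, M, s, a, cost, s_next, actions,
                            alpha=0.1, gamma=0.99, lam=0.5):
    """One tabular update: Q tracks expected cost, M its second moment;
    the greedy policy minimizes Q + lam * variance."""
    def risk(sp, ap):
        variance = max(M[(sp, ap)] - Q[(sp, ap)] ** 2, 0.0)
        return Q[(sp, ap)] + lam * variance
    a_next = min(actions, key=lambda ap: risk(s_next, ap))
    target_q = cost + gamma * Q[(s_next, a_next)]
    # Standard second-moment recursion: E[(c + g*R)^2].
    target_m = (cost ** 2 + 2 * gamma * cost * Q[(s_next, a_next)]
                + gamma ** 2 * M[(s_next, a_next)])
    Q[(s, a)] += alpha * (target_q - Q[(s, a)])
    M[(s, a)] += alpha * (target_m - M[(s, a)])

Q = collections.defaultdict(float)  # expected execution cost
M = collections.defaultdict(float)  # second moment of execution cost
```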

Value of Information in Probabilistic Logic Programs

Title Value of Information in Probabilistic Logic Programs
Authors Sarthak Ghosh, C. R. Ramakrishnan
Abstract In medical decision making, we have to choose among several expensive diagnostic tests such that the certainty about a patient’s health is maximized while remaining within the bounds of resources like time and money. The expected increase in certainty in the patient’s condition due to performing a test is called the value of information (VoI) for that test. In general, VoI relates to acquiring additional information to improve decision-making based on probabilistic reasoning in an uncertain system. This paper presents a framework for acquiring information based on VoI in uncertain systems modeled as Probabilistic Logic Programs (PLPs). Optimal decision-making in uncertain systems modeled as PLPs has already been studied, but acquiring additional information to further improve the results of making the optimal decision has remained open in this context. We model decision-making in an uncertain system with a PLP and a set of top-level queries, with a set of utility measures over the distributions of these queries. The PLP is annotated with a set of atoms labeled as “observable”; in the medical diagnosis example, the observable atoms are the results of diagnostic tests. Each observable atom has an associated cost. This setting of optimally selecting observations based on VoI is more general than that considered by any prior work. Given a limited budget, optimally choosing observable atoms based on VoI is intractable in general. We give a greedy algorithm for constructing a “conditional plan” of observations: a schedule where the selection of what atom to observe next depends on earlier observations. We show that preempting the algorithm at any time before completion provides a usable result, that the result improves over time, and that, in the absence of a well-defined budget, it converges to the optimal solution.
Tasks Decision Making, Medical Diagnosis
Published 2019-09-18
URL https://arxiv.org/abs/1909.08234v1
PDF https://arxiv.org/pdf/1909.08234v1.pdf
PWC https://paperswithcode.com/paper/value-of-information-in-probabilistic-logic
Repo
Framework
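
A myopic version of the greedy selection might look like the sketch below: repeatedly observe the affordable atom with the largest expected utility gain per unit cost. Here `expected_utility` is a placeholder for PLP inference over the top-level queries, and, unlike the paper's conditional plan, this simplified version ignores the dependence on observation outcomes.

```python
def greedy_voi_plan(observables, cost, budget, expected_utility):
    """Myopic greedy observation selection by value of information.
    `expected_utility(observed)` stands in for PLP inference over the
    top-level queries given the observations made so far."""
    observed, spent = [], 0.0
    base = expected_utility(observed)
    while True:
        candidates = [o for o in observables
                      if o not in observed and spent + cost[o] <= budget]
        if not candidates:
            return observed
        def gain_rate(o):
            # Expected utility gain per unit cost of observing atom o.
            return (expected_utility(observed + [o]) - base) / cost[o]
        best = max(candidates, key=gain_rate)
        if gain_rate(best) <= 0:
            return observed         # no observation is worth its cost
        observed.append(best)
        spent += cost[best]
        base = expected_utility(observed)
```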

iCartoonFace: A Benchmark of Cartoon Person Recognition

Title iCartoonFace: A Benchmark of Cartoon Person Recognition
Authors Shichao Li, Yi Zheng, Xiangju Lu, Bo Peng
Abstract Cartoons receive increasing attention and have a huge global market, and cartoon person recognition has a wealth of application scenarios. However, there is no large, high-quality dataset for cartoon person recognition, which limits the development of recognition algorithms. In this paper, we propose the first large unconstrained cartoon database, called iCartoonFace. We have released the dataset publicly to promote cartoon person recognition research (the dataset can be requested by sending an email to zhengyi01@qiyi.com). The dataset contains 68,312 images of 2,639 identities, with persons drawn from cartoon videos. The samples are extracted from publicly available images on websites and from online videos of the iQiYi company. All images pass through a careful manual annotation process. We evaluated state-of-the-art image classification and face recognition algorithms on the iCartoonFace dataset as baselines, and we propose a dataset fusion method that utilizes face features to improve the performance of the cartoon recognition task. Experimental results show that the baseline models perform much worse than humans, and that the proposed dataset fusion method achieves a 4.74% improvement over the baseline model. In a word, state-of-the-art algorithms for classification and recognition are far from perfect for unconstrained cartoon person recognition.
Tasks Face Recognition, Image Classification, Person Recognition
Published 2019-07-31
URL https://arxiv.org/abs/1907.13394v2
PDF https://arxiv.org/pdf/1907.13394v2.pdf
PWC https://paperswithcode.com/paper/icartoonface-a-benchmark-of-cartoon-person
Repo
Framework
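
The abstract does not spell out the fusion mechanism, but one plausible reading is feature-level fusion: concatenate embeddings from a face branch pretrained on real faces with embeddings from a cartoon branch, then classify identity. The sketch below makes that assumption concrete; all names and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Hypothetical feature-level fusion: combine features from a face
    branch (pretrained on real faces) with features from a cartoon branch,
    then classify identity over the n_ids cartoon identities."""

    def __init__(self, cartoon_backbone, face_backbone, feat_dim, n_ids):
        super().__init__()
        self.cartoon_backbone = cartoon_backbone
        self.face_backbone = face_backbone
        for p in self.face_backbone.parameters():
            p.requires_grad = False          # keep face features fixed
        self.head = nn.Linear(2 * feat_dim, n_ids)

    def forward(self, x):
        feats = torch.cat([self.cartoon_backbone(x),
                           self.face_backbone(x)], dim=1)
        return self.head(feats)
```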

PLOTS: Procedure Learning from Observations using Subtask Structure

Title PLOTS: Procedure Learning from Observations using Subtask Structure
Authors Tong Mu, Karan Goel, Emma Brunskill
Abstract In many cases an intelligent agent may want to learn how to mimic a single observed demonstrated trajectory. In this work we consider how to perform such procedural learning from observation, which could help enable agents to better use the enormous amount of video data consisting of observation sequences. Our approach exploits the properties of this setting to incrementally build an open-loop action plan that can yield the desired observation subsequence, and can be used in both Markov and partially observable Markov domains. In addition, procedures commonly involve repeated, extended temporal action subsequences; our method optimistically explores actions to leverage potential repeated structure in the procedure. In comparison with some state-of-the-art approaches, we find that our explicit procedural learning from observation method is about 100 times faster than policy-gradient-based approaches that learn a stochastic policy, and is faster than model-based approaches as well. We also find that performing optimistic action selection yields substantial speed-ups when latent dynamical structure is present.
Tasks
Published 2019-04-17
URL http://arxiv.org/abs/1904.09162v1
PDF http://arxiv.org/pdf/1904.09162v1.pdf
PWC https://paperswithcode.com/paper/plots-procedure-learning-from-observations
Repo
Framework
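
Read literally, building an open-loop plan from a single observation sequence could be sketched as below for a deterministic, resettable environment; `env_reset` and `env_step` are hypothetical callables, and the optimistic action selection over repeated subsequences that gives the method its speed is omitted.

```python
import random

def learn_open_loop_plan(env_reset, env_step, demo_obs, actions, max_tries=50):
    """Incrementally extend an open-loop action plan until it reproduces
    the demonstrated observation sequence. Assumes a deterministic,
    resettable environment exposed through the hypothetical callables
    env_reset() and env_step(action) -> observation."""
    plan = []
    for target in demo_obs[1:]:
        for _ in range(max_tries):
            env_reset()
            for a in plan:                      # replay the plan so far
                env_step(a)
            candidate = random.choice(actions)  # (paper: optimistic choice)
            if env_step(candidate) == target:
                plan.append(candidate)
                break
        else:
            raise RuntimeError("failed to extend the plan")
    return plan
```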

Fully-Automatic Semantic Segmentation for Food Intake Tracking in Long-Term Care Homes

Title Fully-Automatic Semantic Segmentation for Food Intake Tracking in Long-Term Care Homes
Authors Kaylen J Pfisterer, Robert Amelard, Audrey G Chung, Braeden Syrnyk, Alexander MacLean, Alexander Wong
Abstract Malnutrition impacts quality of life and places an annually-recurring burden on the health care system. Half of older adults are at risk for malnutrition in long-term care (LTC). Monitoring and measuring nutritional intake is paramount, yet involves time-consuming and subjective visual assessment, limiting the reliability of current methods. The opportunity for automatic image-based estimation exists. Some progress outside LTC has been made (e.g., calories consumed, food classification); however, these methods have not been implemented in LTC, potentially due to the lack of an ability to independently evaluate automatic segmentation methods within the intake estimation pipeline. Here, we propose and evaluate a novel fully-automatic semantic segmentation method for pixel-level classification of food on a plate using a deep convolutional neural network (DCNN). The macroarchitecture of the DCNN is a multi-scale encoder-decoder food network (EDFN) comprising a residual encoder microarchitecture, a pyramid scene parsing decoder microarchitecture, and a specialized per-pixel food/no-food classification layer. The network was trained and validated on the pre-labelled UNIMIB 2016 food dataset (1027 tray images, 73 categories), and tested on our novel LTC plate dataset (390 plate images, 9 categories). Our fully-automatic segmentation method attained intersection over union similar to that of semi-automatic graph cuts (91.2% vs. 93.7%). Advantages of our proposed system include testing on a novel dataset, decoupled error analysis, and no user-initiated annotations, with similar segmentation accuracy and enhanced reliability in terms of the types of segmentation errors made. This may address several shortcomings currently limiting the utility of automated food intake tracking in time-constrained LTC and hospital settings.
Tasks Scene Parsing, Semantic Segmentation
Published 2019-10-24
URL https://arxiv.org/abs/1910.11250v1
PDF https://arxiv.org/pdf/1910.11250v1.pdf
PWC https://paperswithcode.com/paper/fully-automatic-semantic-segmentation-for
Repo
Framework
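
The comparison against graph cuts uses intersection over union, which for binary food/no-food masks is straightforward to compute:

```python
import numpy as np

def iou(pred_mask, true_mask):
    """Intersection over union for binary food/no-food masks, the metric
    used to compare the automatic segmentation against graph cuts."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement
    return np.logical_and(pred, true).sum() / union
```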

Maximum Probability Principle and Black-Box Priors

Title Maximum Probability Principle and Black-Box Priors
Authors Amir Emad Marvasti, Ehsan Emad Marvasti, Hassan Foroosh
Abstract We present an axiomatic way of assigning probabilities to probabilistic models. In particular, we quantify an upper bound on the probability of a model or, in information-theoretic terms, a lower bound on the amount of information that is assumed in a model. In our setup, maximizing the probabilities of models is equivalent to removing assumptions, or information, stored in the model. Furthermore, we present the problem of learning from an alternative view, in which the underlying probability space is considered directly; in this perspective, both the true underlying model (the oracle) and the model at hand are events. Subsequently, learning is presented from three perspectives: maximizing the likelihood of the oracle given the model, the intersection of the model and the oracle, and the symmetric difference complement of the model and the oracle.
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09417v2
PDF https://arxiv.org/pdf/1910.09417v2.pdf
PWC https://paperswithcode.com/paper/maximum-probability-principle-and-black-box
Repo
Framework

Clustering in Partially Labeled Stochastic Block Models via Total Variation Minimization

Title Clustering in Partially Labeled Stochastic Block Models via Total Variation Minimization
Authors Alexander Jung
Abstract A main task in data analysis is to organize data points into coherent groups or clusters. The stochastic block model is a probabilistic model for the cluster structure. This model prescribes different probabilities for the presence of edges within a cluster and between different clusters. We assume that the cluster assignments are known for at least one data point in each cluster. In such a partially labeled stochastic block model, clustering amounts to estimating the cluster assignments of the remaining data points. We study total variation minimization as a method for this clustering task. We implement the resulting clustering algorithm as a highly scalable message passing protocol. We also provide a condition on the model parameters such that total variation minimization allows for accurate clustering.
Tasks
Published 2019-11-03
URL https://arxiv.org/abs/1911.00958v1
PDF https://arxiv.org/pdf/1911.00958v1.pdf
PWC https://paperswithcode.com/paper/clustering-in-partially-labeled-stochastic
Repo
Framework
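
To make the objective concrete: total variation counts edges whose endpoints receive different labels, and clustering fixes the seed labels and minimizes TV over the rest. The sketch below uses a greedy majority-vote sweep as a simple local minimizer; the paper itself derives a scalable message-passing solver instead.

```python
import random

def tv(edges, labels):
    """Total variation: number of edges whose endpoints differ in label."""
    return sum(labels[u] != labels[v] for u, v in edges)

def greedy_tv_clustering(edges, n_nodes, seeds, n_sweeps=20):
    """Keep seed labels fixed; greedily relabel every other node to the
    majority label among its neighbors, which locally lowers TV."""
    neighbors = {i: [] for i in range(n_nodes)}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    cluster_labels = list(set(seeds.values()))
    labels = {i: seeds.get(i, random.choice(cluster_labels))
              for i in range(n_nodes)}
    for _ in range(n_sweeps):
        for i in range(n_nodes):
            if i in seeds:
                continue                    # labeled nodes stay fixed
            counts = {}
            for j in neighbors[i]:
                counts[labels[j]] = counts.get(labels[j], 0) + 1
            if counts:
                labels[i] = max(counts, key=counts.get)
    return labels
```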

Music Performance Analysis: A Survey

Title Music Performance Analysis: A Survey
Authors Alexander Lerch, Claire Arthur, Ashis Pati, Siddharth Gururani
Abstract Music Information Retrieval (MIR) tends to focus on the analysis of audio signals. Often, a single music recording is used as representative of a “song” even though different performances of the same song may reveal different properties. A performance is distinct in many ways from an (arguably more abstract) representation of a “song,” “piece,” or musical score. The characteristics of the (recorded) performance – as opposed to the score or musical idea – can have a major impact on how a listener perceives music. The analysis of music performance, however, has traditionally been only a peripheral topic for the MIR research community. This paper surveys the field of Music Performance Analysis (MPA) from various perspectives, discusses its significance to the field of MIR, and points out opportunities for future research in this field.
Tasks Information Retrieval, Music Information Retrieval
Published 2019-06-29
URL https://arxiv.org/abs/1907.00178v1
PDF https://arxiv.org/pdf/1907.00178v1.pdf
PWC https://paperswithcode.com/paper/music-performance-analysis-a-survey
Repo
Framework

Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning

Title Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning
Authors Gabriel V. de la Cruz Jr., Yunshu Du, Matthew E. Taylor
Abstract Deep Reinforcement Learning (DRL) algorithms are known to be data inefficient. One reason is that a DRL agent learns both the features and the policy tabula rasa. Integrating prior knowledge into DRL algorithms is one way to improve learning efficiency, since it helps to build helpful representations. In this work, we consider incorporating human knowledge to accelerate the asynchronous advantage actor-critic (A3C) algorithm by pre-training on a small amount of non-expert human demonstrations. We leverage the supervised autoencoder framework and propose a novel pre-training strategy that jointly optimizes a weighted supervised classification loss, an unsupervised reconstruction loss, and an expected return loss. The resulting pre-trained model learns more useful features than models trained independently in a supervised or unsupervised fashion. Our pre-training method drastically improved the learning performance of the A3C agent in the Atari games Pong and MsPacman, exceeding the performance of state-of-the-art algorithms at a much smaller number of game interactions. Our method is lightweight and easy to implement on a single machine. For reproducibility, our code is available at github.com/gabrieledcjr/DeepRL/tree/A3C-ALA2019
Tasks Atari Games
Published 2019-04-03
URL http://arxiv.org/abs/1904.02206v1
PDF http://arxiv.org/pdf/1904.02206v1.pdf
PWC https://paperswithcode.com/paper/jointly-pre-training-with-supervised
Repo
Framework
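
The joint objective can be written as a weighted sum of the three losses. In this sketch, `model` is assumed to return policy logits, a reconstruction of the input, and a value estimate in one forward pass; the loss weights are placeholders, not the paper's tuned values.

```python
import torch.nn.functional as F

def joint_pretrain_loss(model, states, actions, returns,
                        w_sup=1.0, w_recon=1.0, w_value=1.0):
    """Sketch of the joint pre-training objective on demonstration data:
    weighted sum of a supervised action-classification loss, an autoencoder
    reconstruction loss, and an expected-return (value) regression loss."""
    logits, recon, value = model(states)
    loss_sup = F.cross_entropy(logits, actions)          # mimic demo actions
    loss_recon = F.mse_loss(recon, states)               # autoencoder loss
    loss_value = F.mse_loss(value.squeeze(-1), returns)  # value regression
    return w_sup * loss_sup + w_recon * loss_recon + w_value * loss_value
```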

Gradient Ascent for Active Exploration in Bandit Problems

Title Gradient Ascent for Active Exploration in Bandit Problems
Authors Pierre Ménard
Abstract We present a new algorithm based on gradient ascent for a general Active Exploration bandit problem in the fixed confidence setting. This problem encompasses several well-studied problems such as Best Arm Identification and Thresholding Bandits. The algorithm consists of a new sampling rule based on online lazy mirror ascent. We prove that this algorithm is asymptotically optimal and, most importantly, computationally efficient.
Tasks
Published 2019-05-20
URL https://arxiv.org/abs/1905.08165v1
PDF https://arxiv.org/pdf/1905.08165v1.pdf
PWC https://paperswithcode.com/paper/gradient-ascent-for-active-exploration-in
Repo
Framework
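
Online lazy mirror ascent on the probability simplex with the entropic mirror map reduces to taking a softmax of the accumulated gradients (also known as dual averaging). The sketch below shows only that generic update; the exploration objective whose gradients drive the sampling rule is abstracted as an input.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def lazy_mirror_ascent(gradients, lr=0.1):
    """Play the mirror map of the raw gradient sum at each round ("lazy"
    updates). Returns the sequence of points on the simplex, which would
    serve as sampling proportions over the arms."""
    cum = np.zeros(len(gradients[0]))
    plays = []
    for g in gradients:
        plays.append(softmax(lr * cum))  # entropic mirror map of the sum
        cum += g                         # lazy: accumulate raw gradients
    return plays
```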

DQN with model-based exploration: efficient learning on environments with sparse rewards

Title DQN with model-based exploration: efficient learning on environments with sparse rewards
Authors Stephen Zhen Gou, Yuyang Liu
Abstract We propose Deep Q-Networks (DQN) with model-based exploration, an algorithm combining model-free and model-based approaches that explores better and learns environments with sparse rewards more efficiently. DQN is a general-purpose, model-free algorithm and has been proven to perform well in a variety of tasks, including Atari 2600 games, since it was first proposed by Mnih et al. However, like many other reinforcement learning (RL) algorithms, DQN suffers from poor sample efficiency when rewards are sparse in an environment. As a result, most of the transitions stored in the replay memory have no informative reward signal and provide limited value to the convergence and training of the Q-Network. However, one insight is that these transitions can be used to learn the dynamics of the environment as a supervised learning problem, and they also provide information about the distribution of visited states. Our algorithm utilizes these two observations to perform one-step planning during exploration, picking the action that leads to states least likely to have been seen, thus improving the performance of exploration. We demonstrate our agent’s performance in two classic environments with sparse rewards in OpenAI Gym: Mountain Car and Lunar Lander.
Tasks Atari Games
Published 2019-03-22
URL http://arxiv.org/abs/1903.09295v1
PDF http://arxiv.org/pdf/1903.09295v1.pdf
PWC https://paperswithcode.com/paper/dqn-with-model-based-exploration-efficient
Repo
Framework
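
The exploration rule can be sketched as one-step planning: predict the next state for each action with the learned dynamics model, then pick the action whose predicted state is least likely under the visited-state distribution. Both `dynamics_model` and `state_density` are placeholders for models fit on the replay memory (the dynamics as a supervised learning problem, the density e.g. as a Gaussian over visited states).

```python
import numpy as np

def exploratory_action(state, actions, dynamics_model, state_density):
    """One-step planning for exploration: choose the action whose predicted
    next state has the lowest likelihood under the distribution of states
    visited so far."""
    predicted = [dynamics_model(state, a) for a in actions]
    likelihoods = [state_density(s_next) for s_next in predicted]
    return actions[int(np.argmin(likelihoods))]
```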