Regret Minimization in Structured Reinforcement Learning

Author: Damianos Tranos; Alexandre Proutiere; Yevgeny Seldin; KTH

Keywords: Engineering and Technology; Reinforcement Learning; Electrical Engineering

Abstract: We consider a class of sequential decision-making problems in the presence of uncertainty, which belongs to the field of Reinforcement Learning (RL). Specifically, we study discrete Markov Decision Processes (MDPs), which model a decision maker, or agent, that interacts with a stochastic and dynamic environment and receives feedback from it in the form of a reward. The agent seeks to maximize a notion of cumulative reward. Because the environment (both the system dynamics and the reward function) is unknown, the agent faces an exploration-exploitation dilemma: it must balance exploring its available actions against exploiting the one it believes to be best. This dilemma is captured by the notion of regret, which compares the rewards that the agent has accumulated thus far with those that would have been obtained by an optimal policy. The agent is then said to behave optimally if it minimizes its regret. This thesis investigates the fundamental regret limits that can be achieved by any agent. We derive general asymptotic and problem-specific regret lower bounds for the cases of ergodic and deterministic MDPs. We make these explicit for ergodic MDPs that are unstructured, for MDPs with Lipschitz transitions and rewards, and for deterministic MDPs that satisfy a decoupling property. Furthermore, we propose DEL, an algorithm that is valid for any ergodic MDP with any structure and whose regret upper bound matches the associated regret lower bounds, thus being truly optimal. For this algorithm, we present theoretical regret guarantees as well as a numerical demonstration that verifies its ability to exploit the underlying structure.
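
The notion of regret described in the abstract can be illustrated with a minimal sketch. This is a toy calculation, not the thesis's formal definition (which may be stated in expectation and for specific MDP classes). The function name `cumulative_regret` and the reward sequences are hypothetical and chosen only for illustration. The sketch assumes the per-step rewards of an optimal policy are known, which is never the case for the learning agent itself.

```python
from itertools import accumulate


def cumulative_regret(optimal_rewards, agent_rewards):
    """Return the running regret after each step.

    Regret after t steps = (total reward an optimal policy would have
    collected in t steps) - (total reward the agent actually collected).
    """
    if len(optimal_rewards) != len(agent_rewards):
        raise ValueError("reward sequences must have the same length")
    # Per-step gap between the optimal policy and the agent.
    gaps = (o - a for o, a in zip(optimal_rewards, agent_rewards))
    # Running sum of the gaps gives the regret accumulated so far.
    return list(accumulate(gaps))


# Hypothetical toy data: the optimal policy earns 1.0 per step, while the
# agent loses reward during early exploration and then matches it.
optimal = [1.0, 1.0, 1.0, 1.0, 1.0]
agent = [0.0, 1.0, 0.5, 1.0, 1.0]
regret = cumulative_regret(optimal, agent)
print(regret)
```

In this toy run the regret stops growing once the agent matches the optimal policy. Regret lower bounds of the kind studied in the thesis characterize how slowly this quantity can grow for any learning agent.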
