Learning Sequential Decision Rules in Control Design: Regret-Optimal and Risk-Coherent Methods

Abstract: The engineering sciences deal with the problem of optimal design in the face of uncertainty. In particular, control engineering is concerned with designing policies, laws, or algorithms that take decisions sequentially given unreliable data. This thesis addresses two particular instances of optimal sequential decision making, for two different problems.

The first problem is the estimation of the H∞-norm (or, in general, the ℓ2-gain for nonlinear systems), which is a fundamental quantity in control design through, e.g., the small gain theorem. Given an unknown system, the goal is to find the maximum ℓ2-gain which, in a model-free approach, involves solving a sequential input design problem. The H∞-norm estimation problem (or simply “gain estimation problem”) is cast as the composition of a multi-armed bandit problem that generates data and an optimal estimation problem given that data. The problem of generating data is a sequential input-design problem in which, at every round, the decision-maker chooses one (or many) frequencies at which to sample the unknown frequency response of the system under study. We show that Thompson Sampling (TS), a classical bandit algorithm, is optimal within the class of algorithms that choose only one frequency per round. Additionally, we introduce Weighted Thompson Sampling (WTS), a TS-based algorithm that can sample many frequencies at every round, and we prove that WTS is an optimal bandit policy within the class of algorithms that can sample many frequencies simultaneously. The problem of estimating the H∞-norm of the system using the data provided by the bandit algorithm is also discussed.
In particular, we show that, for a proposed estimator and for every bandit policy in a wide class of algorithms, the expected estimation error of the gain of the system asymptotically matches the Cramér-Rao lower bound.

In the second part, we address the problem of risk-coherent optimal control design for disturbance rejection under uncertainty, where optimality is studied in both an H2 and an H∞ sense. We consider a parametric model for the plant and the noise spectrum, where the modeling error between the model and the real system is uncertain. This uncertainty is condensed in a probability density function over the different realizations of the parameters defining the model. We use this information to design a controller that minimizes the risk of falling into poor closed-loop performance, within the framework of the financial theory of risk. When the parameters of the plant are not known with sufficient accuracy for control purposes, we introduce a framework that allows us to tackle the joint-stabilization problem by means of sequential convex relaxations, each of them leading to a semi-definite program. When instead the noise spectrum is uncertain, we propose a systematic scenario approach for designing H2- and H∞-optimal controllers in terms of quadratically-constrained linear programs and sequential semi-definite programming, respectively. Simulations show that, from a risk-theoretical perspective, exploiting the information encoded in the probability density function of the parameters defining the models better balances the risk of falling into poor closed-loop performance.
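
To make the bandit view of gain estimation in the first part concrete, the following is a minimal sketch of Thompson Sampling over a finite frequency grid. It is an illustration under simplifying assumptions, not the algorithm analysed in the thesis: each experiment is assumed to return the gain at the chosen frequency corrupted by Gaussian noise of known standard deviation, the prior on each gain is assumed to be N(0, noise_std^2), and the names `thompson_gain_estimate`, `true_gains`, and `noise_std` are hypothetical.

```python
import math
import random


def thompson_gain_estimate(true_gains, noise_std=0.5, rounds=2000, seed=0):
    """Gaussian Thompson Sampling over a fixed frequency grid (illustrative)."""
    rng = random.Random(seed)
    k = len(true_gains)
    counts = [0] * k   # number of experiments run at each frequency
    sums = [0.0] * k   # sum of the noisy gain measurements at each frequency

    for _ in range(rounds):
        # Draw one plausible gain per frequency from its posterior, assuming
        # a N(0, noise_std^2) prior and Gaussian measurement noise.
        draws = [
            rng.gauss(sums[i] / (counts[i] + 1),
                      noise_std / math.sqrt(counts[i] + 1))
            for i in range(k)
        ]
        # Excite the system at the frequency whose sampled gain is largest.
        arm = max(range(k), key=draws.__getitem__)
        # Hypothetical experiment: noisy measurement of the gain at that frequency.
        y = true_gains[arm] + rng.gauss(0.0, noise_std)
        counts[arm] += 1
        sums[arm] += y

    # Gain estimate: sample mean at the most frequently sampled frequency.
    best = max(range(k), key=counts.__getitem__)
    return best, sums[best] / counts[best], counts
```

For example, `thompson_gain_estimate([0.4, 0.9, 1.7, 1.1, 0.6])` returns the index of the most-sampled grid point, the averaged measurement there (a stand-in for the H∞-norm estimate on this grid), and the per-frequency sample counts. The sketch chooses one frequency per round; sampling several frequencies per round, as WTS does, is not covered here.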
