Derivation of solution for simple control problem

While trying to understand the fundamental concepts of control theory, I was reading the article Dual Control for Approximate Bayesian Reinforcement Learning (chapter 3.1, "A toy problem") and came across the following solutions to a very simple problem:

Consider the linear, scalar system:

$x_{k+1} = ax_k + bu_k + \xi_k$

where $x_k$ denotes the state at time step $k$, $u_k$ the control action at time step $k$, and $\xi_k$ is normally distributed.

Consider the following cost function: $L(x, u) = \left[\sum_{k=0}^T (x_k - r_k)^T W_k (x_k - r_k) + \sum_{k=0}^{T-1} u_k^T U_k u_k\right]$, where $r = [r_0, \dots, r_T]$ is a target trajectory. $W_k$ and $U_k$ define the state and control costs, respectively.

If $a$ and $b$ are known, the optimal $u_k$ to drive the current state $x_k$ to zero in one step can be trivially verified to be $u_{k,\text{oracle}}^* = -\frac{a b x_k}{U + b^2}$.
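My attempt at the one-step calculation, as a sketch only: I am assuming a target of $r_{k+1} = 0$, a state weight of $W = 1$, a scalar control weight $U > 0$, and zero-mean noise with variance $q$ (the symbol $q$ is my own, not from the article). Under those assumptions the expected one-step cost is

$$
J(u_k) = \mathbb{E}\left[(a x_k + b u_k + \xi_k)^2\right] + U u_k^2 = (a x_k + b u_k)^2 + q + U u_k^2 .
$$

Setting the derivative with respect to $u_k$ to zero gives

$$
\frac{dJ}{du_k} = 2b\,(a x_k + b u_k) + 2U u_k = 0
\quad\Longrightarrow\quad
u_k = -\frac{a b x_k}{U + b^2},
$$

and since $\frac{d^2 J}{du_k^2} = 2(b^2 + U) > 0$, this stationary point is a minimum. The noise variance $q$ only adds a constant, so it does not affect the minimizer.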

Now let the parameter $b$ be uncertain, with current belief $p(b) = \mathcal{N}(b; \mu_k, \alpha_k^2)$ at time $k$. The naive option of simply replacing the parameter with the current mean estimate $\mu_k$ is known as certainty equivalence (CE) control in the dual control literature. The resulting control law is $u_{k,\text{ce}}^* = -\frac{a \mu_k x_k}{U + \mu_k^2}$.
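To sanity-check both formulas numerically, here is a minimal Python sketch. It assumes a zero target, unit state weight, and scalar control weight `U`, and it writes the belief mean as `mu`; all function names are my own and purely illustrative. It compares the closed-form expression against a brute-force ternary search on the (convex) one-step cost, with the constant noise variance dropped because it does not change the minimizer.

```python
def one_step_cost(u, x, a, b, U):
    """One-step cost (a*x + b*u)^2 + U*u^2, assuming r = 0 and W = 1.
    The constant noise-variance term is omitted."""
    return (a * x + b * u) ** 2 + U * u ** 2


def oracle_control(x, a, b, U):
    """Closed-form minimizer when a and b are known."""
    return -a * b * x / (U + b ** 2)


def ce_control(x, a, mu, U):
    """Certainty-equivalence control: plug the belief mean mu in for b."""
    return oracle_control(x, a, mu, U)


def numeric_minimizer(x, a, b, U, lo=-100.0, hi=100.0, iters=200):
    """Ternary search for the minimizer of the convex one-step cost on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if one_step_cost(m1, x, a, b, U) < one_step_cost(m2, x, a, b, U):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    x, a, b, U = 2.0, 0.9, 0.5, 1.0  # arbitrary illustrative values
    print("closed form:", oracle_control(x, a, b, U))
    print("numeric    :", numeric_minimizer(x, a, b, U))
```

The CE law is the same expression evaluated at `mu` instead of the true `b`, so it ignores the belief variance $\alpha_k^2$ entirely.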


How can I derive these solutions?