Can I use control theory when the output is stochastic and has delayed reactions to the controller?


I am a complete novice in control theory. My understanding of control theory is that it can be used to adjust parameters of a system based on feedback to reach some desired state. It sounds like a useful set of tools for a problem that I have, but I'm struggling to formulate it.

My problem is (essentially) as follows: Users are allowed to click on one of two buttons, A or B, whenever they want to, over the course of an hour. My desired state is that I want the number of A-presses to be about the same as the number of B-presses ($A-B = 0$). There is an internal parameter $u$ that can make one of the buttons more desirable to click than the other to an arbitrary degree, which will cause more users to click on that button in the future.

Let's say that after 10 minutes, A has been pressed 300 times and B has been pressed only 100 times ($y = 300-100 = 200$). Clearly, for some unknown external reason, pressing A is more desirable to the users than pressing B. I would like to take this feedback and adjust the internal parameter $u$ to make B more enticing and encourage users to press B more, with the goal of bringing $y$ nearer to zero.

Can I use control theory techniques to help with this problem? It seems like a perfect fit based on the description of control theory, but every example that I've seen so far has had a well-defined output from a physical system, such as the car's speed in cruise control, rather than something as stochastic as users' choices based on their desires. Additionally, the car's speed responds immediately, whereas the impact of changing $u$ on $y$ could be substantially delayed. I've tried implementing an ad-hoc solution for this problem with simulated users, but it regularly overshoots the target and oscillates. Any tips are greatly appreciated!

Simulation Details

The simulation is coded in Python and works as follows. At each timestep, one of the buttons is pressed by a "user". The probability that the pressed button is A is $P(A) = \sigma(u+v)$, where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid function, $u$ is the internal parameter, and $v$ is a pre-determined value indicating how the users feel about the buttons from the start; $P(B) = 1 - P(A)$.

For example, in the case that $v = 0.5$, the users will favour button A. To perfectly counteract this, $u$ needs to be set to $-0.5$.
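The described simulation can be sketched as follows; the function names and the seeding are my own, but the model ($P(A) = \sigma(u+v)$, $y = \#A - \#B$) is exactly the one above:

```python
import math
import random

def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_presses(u: float, v: float, n_clicks: int, seed: int = 0) -> int:
    """Simulate n_clicks button presses and return y = (#A - #B).

    P(A) = sigmoid(u + v), P(B) = 1 - P(A), as in the question.
    """
    rng = random.Random(seed)
    y = 0
    for _ in range(n_clicks):
        if rng.random() < sigmoid(u + v):
            y += 1  # user pressed A
        else:
            y -= 1  # user pressed B
    return y
```

With $v = 0.5$ and $u = 0$, $y$ drifts positive (users favour A); setting $u = -0.5$ makes $P(A) = 0.5$, so $y$ stays near zero on average.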


Accepted Answer

Model

Let $A$ be the number of clicks on button A, and $B$ the number of clicks on button B. Define $e := A - B$; the goal is to drive $e$ to zero.

At each click we have $$ e(t+1) = \begin{cases} e(t)+1 & \text{with probability } p(A), \\ e(t)-1 & \text{with probability } 1-p(A), \end{cases} $$ where $p(A) = \sigma(u(t)+v)$ as in the question. I assume that the next control action is taken after $N$ clicks have been collected, and thus I work with expectations. Then we obtain $e(t+1) = e(t) + N(2p(A)-1) = e(t) + N g(u(t))$, where $$ g(u(t)) = \frac{2}{1+e^{-u(t)-v}} - 1 \in (-1,1), $$ $u$ is the control signal, and $v$ is a constant.
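The expected dynamics above are a one-liner to check numerically; this small sketch (function names my own) confirms that $g(-v) = 0$, i.e. that $u = -v$ is the equilibrium input:

```python
import math

def g(u: float, v: float) -> float:
    """Expected per-click change in e: g(u) = 2*sigmoid(u + v) - 1, in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-u - v)) - 1.0

def expected_next_e(e: float, u: float, v: float, n_clicks: int) -> float:
    """Expected error after a batch of n_clicks clicks: e + N*g(u)."""
    return e + n_clicks * g(u, v)
```

For example, with $v = 0.5$: `g(-0.5, 0.5)` is exactly $0$, while `g(0.0, 0.5) > 0`, so with no control the error grows by about $N \cdot 0.245$ per batch.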

Control

Note that $g(u)$ is monotonically increasing in $u$. Without any deep analysis, I would suggest trying a simple PI controller $$ \begin{aligned} z(t) &= z(t-1)+\alpha e(t),\\ u(t) &= -z(t) - \beta e(t). \end{aligned} $$ Here the gains $\alpha$ and $\beta$ are to be tuned.
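Closing the loop on the question's simulation with this PI controller might look like the sketch below. The gains `alpha` and `beta`, the batch size `N`, and the number of batches are illustrative assumptions, not tuned values; the controller updates only between batches, which mirrors the delayed feedback the question worries about:

```python
import math
import random

def sigmoid(x: float) -> float:
    """Logistic function: P(A) = sigmoid(u + v)."""
    return 1.0 / (1.0 + math.exp(-x))

def run_pi(v: float = 0.5, alpha: float = 0.002, beta: float = 0.01,
           N: int = 100, batches: int = 200, seed: int = 0):
    """Run the stochastic click simulation under the PI law from the answer:

        z(t) = z(t-1) + alpha * e(t)
        u(t) = -z(t) - beta * e(t)

    Gains here are illustrative and need tuning. Returns the final (e, u).
    """
    rng = random.Random(seed)
    e, z, u = 0, 0.0, 0.0
    for _ in range(batches):
        # Collect a batch of N clicks under the current control u.
        for _ in range(N):
            e += 1 if rng.random() < sigmoid(u + v) else -1
        # PI update: integral state z, then control u.
        z += alpha * e
        u = -z - beta * e
    return e, u
```

With control off (`alpha = beta = 0`) the error grows roughly linearly with the click count; with the PI law, the integral term `z` should creep toward $v$ so that `u` settles near $-v$ and `e` hovers around zero. If the loop still oscillates, reducing `alpha` (slower integration) is the usual first adjustment.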