Optimizing with a probability distribution as the variable


I'm not sure if this is possible, but I am interested in optimization problems where the decision variable is the shape of a probability distribution. To learn more about this I have a toy problem:

Think of $N$ Roomba-like agents trying to explore a closed, square space in 2-D. They only update their headings when they bump into one of the boundaries. When this occurs, the $i$-th agent draws its new heading at random from its own probability distribution $\theta_{i}$, and these distributions are the optimization variables, for example:

$$
\begin{aligned}
&\max_{\theta_{1},\dots,\theta_{N}} \ \text{[exploration metric]} \\
&\text{s.t.} \quad \dot{x} = A_{\sigma}x \\
&\qquad \text{[boundaries]} \\
&\qquad \dots
\end{aligned}
$$

There is no benefit to the agents in revisiting space that has already been explored, so there is an incentive to alter the distributions. To constrain the problem further, there could also be a limit on time, iterations, or control energy.
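
To make the toy problem concrete, here is a minimal simulation sketch in Python. Everything specific in it is my own assumption for illustration, not part of the problem statement: each $\theta_{i}$ is reduced to a single parameter `kappas[i]` (the concentration of a von Mises distribution centered on the inward wall normal, with `kappa = 0` meaning a uniform heading), the space is the unit square, agents move in fixed-length steps, and the "exploration metric" is the fraction of grid cells visited within a fixed number of steps. The function name `simulate_coverage` and its arguments are hypothetical.

```python
import math
import random


def simulate_coverage(kappas, steps=2000, step_len=0.02, grid=20, seed=0):
    """Fraction of grid cells of the unit square visited by all agents.

    kappas[i] parametrizes agent i's heading distribution theta_i
    (assumed von Mises around the inward wall normal; 0 = uniform).
    """
    rng = random.Random(seed)
    visited = set()
    # Each agent is [x, y, heading]; start positions and headings are random.
    agents = [[rng.random(), rng.random(), rng.uniform(0.0, 2.0 * math.pi)]
              for _ in kappas]
    for _ in range(steps):
        for agent, kappa in zip(agents, kappas):
            x = agent[0] + step_len * math.cos(agent[2])
            y = agent[1] + step_len * math.sin(agent[2])
            # Components of the inward normal for any wall that was hit.
            nx = (x < 0.0) - (x > 1.0)
            ny = (y < 0.0) - (y > 1.0)
            if nx or ny:
                # Bump: stay on the wall and redraw the heading from theta_i,
                # rejecting draws that would point back out of the square.
                x = min(max(x, 0.0), 1.0)
                y = min(max(y, 0.0), 1.0)
                normal = math.atan2(ny, nx)
                while True:
                    h = normal + rng.vonmisesvariate(0.0, kappa)
                    checks = ((nx, math.cos(h)), (ny, math.sin(h)))
                    if all(n * c > 0.0 for n, c in checks if n):
                        break
                agent[2] = h
            agent[0], agent[1] = x, y
            cell = (min(int(x * grid), grid - 1), min(int(y * grid), grid - 1))
            visited.add(cell)
    return len(visited) / grid ** 2


# Crude comparison of a few candidate distributions for three agents.
for kappa in (0.0, 2.0, 8.0):
    print(kappa, simulate_coverage([kappa] * 3))
```

With the distribution reduced to a few parameters like this, the objective becomes a noisy function of a finite-dimensional vector, which could be averaged over several seeds and handed to any derivative-free optimizer. Whether such a parametrization loses too much compared with optimizing over the distribution itself is part of what I am asking.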

Is this a reasonable way to think about this type of problem? If so, could someone point me to some resources to learn more about it, both theoretical and applied? I am open to approaching this from a continuous perspective, a discrete one, or both. Thanks!