The discrete-time, finite-horizon LQR problem is well studied. We have the linear system $$x_{k+1}=A x_{k}+B u_{k}+w_k,$$ where $w_k$ is a random variable with mean $0$ and finite second moment, and we want to minimize $$J(\pi)=\mathbb{E} \Big\{x_{N}^{\top} P x_{N}+\sum_{k=0}^{N-1} \big(x_{k}^{\top} Q x_{k}+u_{k}^{\top} R u_{k}\big)\Big\}.$$
Given an initial state $x_0$ and i.i.d. noise sampled at each time step, we can find the optimal $u_k$ recursively by dynamic programming.
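
For reference, the recursion I have in mind for the standard problem is the backward Riccati recursion. The following is only a minimal sketch, assuming $R$ is positive definite and $Q$, $P$ are positive semidefinite; the function name `lqr_gains` is my own and not from any library. Since $w_k$ is zero-mean and additive, the noise does not change the gains, only the optimal cost.

```python
import numpy as np

def lqr_gains(A, B, Q, R, P, N):
    """Backward Riccati recursion for the finite-horizon LQR problem.

    Returns the list [K_0, ..., K_{N-1}] of feedback gains, with u_k = -K_k x_k.
    Assumes R is positive definite so the linear solve below is well posed.
    """
    S = P                      # cost-to-go matrix at the final time N
    gains = []
    for _ in range(N):         # iterate backward: k = N-1, ..., 0
        K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
        S = Q + A.T @ S @ (A - B @ K)
        gains.append(K)
    return gains[::-1]         # reorder so that gains[k] is K_k
```

Note that the resulting control is a feedback law: $u_k$ depends on the realized state $x_k$, so it differs from one noise realization to another.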
Now, my question is: what if we want to minimize an "empirical" version of the above expectation, i.e., an average over $T$ trials: $$J = \frac{1}{T}\sum_{i=1}^{T} \Big(x_{N}^{i\top} P x_{N}^{i}+\sum_{k=0}^{N-1} \big(x_{k}^{i\top} Q x_{k}^{i}+u_{k}^{\top} R u_{k}\big)\Big),$$ where $$x_{k+1}^i=A x_{k}^i+B u_{k}+w_k^i.$$
Of course, the initial point of each trial is the same, i.e., $x_0^i=x_0$ for all $i$. The key point here is that we want to find a SINGLE control sequence that minimizes the empirical mean over several independent trials, in which the states may differ because of the independent noise $w_k^i$. Is there any existing theory, or are there any ideas, for dealing with this problem? Thanks!
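
To make the objective concrete, here is a minimal sketch of how the empirical cost could be evaluated for one shared control sequence. The function name `empirical_cost` and the array layout (`U` of shape `(N, m)`, `W` of shape `(T, N, n)`) are my own assumptions for illustration, not an established API.

```python
import numpy as np

def empirical_cost(A, B, Q, R, P, x0, U, W):
    """Average cost of a single control sequence over T noise realizations.

    U has shape (N, m): the shared open-loop sequence u_0, ..., u_{N-1}.
    W has shape (T, N, n): W[i, k] is the noise w_k^i of trial i.
    """
    T, N, _ = W.shape
    total = 0.0
    for i in range(T):
        x = x0.copy()                      # every trial starts from the same x_0
        for k in range(N):
            total += x @ Q @ x + U[k] @ R @ U[k]   # stage cost at time k
            x = A @ x + B @ U[k] + W[i, k]         # dynamics with trial-i noise
        total += x @ P @ x                 # terminal cost on x_N^i
    return total / T
```

The same `U` is applied in every trial, while the state trajectories differ through `W[i]`; this is the quantity I would like to minimize over `U`.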