Say I have a probability distribution $P(s'|s, a)$ I'd like to estimate. By sampling $s'$ from a generative model $N$ times for each $(s, a)$ pair, I obtain the empirical estimate $\hat P(s'|s, a)$ for each $s'$, which is simply the number of times $s'$ occurs divided by $N$. It can be written as $\hat P(s'|s, a) = \frac{1}{N}\sum_{i=1}^N I(s_i=s')$, where $s_i$ is the $i$-th sample.
Now, for each fixed $s'$, I can use Hoeffding's inequality to show that: $$\mathcal P(|\hat P(s'|s,a) - P(s'|s,a)|\geq t) \leq 2e^{-2Nt^2}$$
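As a sanity check, a short simulation can compare how often the estimate deviates by at least $t$ with the Hoeffding bound $2e^{-2Nt^2}$ for indicator variables taking values in $[0,1]$. This is only a minimal sketch in Python with NumPy: the three-state distribution `p_true` and the values of `N`, `t`, and `trials` are made up for illustration, and the function names are my own.

```python
import numpy as np

def empirical_distribution(samples, num_states):
    """Return P_hat(s') = count(s') / N for integer-coded states."""
    counts = np.bincount(samples, minlength=num_states)
    return counts / len(samples)

def hoeffding_bound(n, t, value_range=1.0):
    """Two-sided Hoeffding bound 2 * exp(-2 n t^2 / range^2), capped at 1."""
    return min(1.0, 2.0 * np.exp(-2.0 * n * t**2 / value_range**2))

rng = np.random.default_rng(0)
p_true = np.array([0.5, 0.3, 0.2])  # hypothetical P(. | s, a) for one fixed (s, a)
N, t, trials = 200, 0.1, 2000       # illustrative values only

# Repeat the experiment many times and record how often the estimate
# of P(s' = 0 | s, a) misses the true value by at least t.
misses = 0
for _ in range(trials):
    samples = rng.choice(len(p_true), size=N, p=p_true)
    p_hat = empirical_distribution(samples, len(p_true))
    if abs(p_hat[0] - p_true[0]) >= t:
        misses += 1

violation_rate = misses / trials
bound = hoeffding_bound(N, t)
print(f"observed rate: {violation_rate:.4f}, Hoeffding bound: {bound:.4f}")
```

The observed rate should sit below the bound, usually by a wide margin, since Hoeffding's inequality does not use the variance of the indicator.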
However, suppose I use $\hat P$ to estimate another quantity, $E_{P}\{V^*(s')\} = \sum_{s'}P(s'|s,a)V^*(s')$, by $E_{\hat P}\{V^*(s')\} = \sum_{s'}\hat P(s'|s,a)V^*(s')$. How can I bound the difference $$\mathcal P(| \sum_{s'}P(s'|s,a)V^*(s')-\sum_{s'}\hat P(s'|s,a)V^*(s')|\geq t)\leq ??$$ by using the result above? Or can I simply apply Hoeffding's inequality directly? If so, I would still like to know the relationship between these two bounds.
Besides, $V^*$ can be considered a fixed vector (not a random variable), so the only source of randomness is the difference between $\hat P$ and $P$.
My question is equivalent to the following: if I use $N$ iid samples to estimate a tabular distribution $P(s'|s,a)$ by $\hat P(s'|s,a)$ (where the difference between $P$ and $\hat P$ can be bounded entry by entry with Hoeffding), what can I say about the difference between $E_{\hat P}(f(s'))$ and $E_{P}(f(s'))$?
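Re-edit: I think I have figured it out. For each $(s,a)$ pair, I still have $N$ iid samples $s_1,\dots,s_N$. Then, instead of estimating $P(s'|s,a)$ entry by entry, we calculate $$\begin{aligned}\frac{1}{N}\sum_{i=1}^N \sum_{s'\in\mathcal S}I(s_i=s')V^*(s') &=\sum_{s'\in\mathcal S}\sum_{i=1}^N \frac{I(s_i=s')}{N}V^*(s')\\&=\sum_{s'\in\mathcal S}\hat P(s'|s,a)V^*(s')\end{aligned}$$ Then we can use Hoeffding's inequality directly. Here, the key is to treat $X_i = \sum_{s'\in\mathcal S}I(s_i=s')V^*(s') = V^*(s_i)$, $i=1,\dots,N$, as our new iid random variables, whose true mean is $E_P\{V^*(s')\}$, and then bound the deviation of their sample mean from that true mean. The sample mean is exactly the expectation of $V^*$ w.r.t. $\hat P$, and the true mean is the expectation w.r.t. $P$. If $V^*(s')\in[a,b]$ for all $s'$, this gives $$\mathcal P(| \sum_{s'}P(s'|s,a)V^*(s')-\sum_{s'}\hat P(s'|s,a)V^*(s')|\geq t)\leq 2e^{-2Nt^2/(b-a)^2}$$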
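To make the identity concrete, here is a small numerical sketch in Python with NumPy. It checks that the plug-in expectation $\sum_{s'}\hat P(s'|s,a)V^*(s')$ coincides with the sample mean of $V^*(s_i)$, and compares the observed deviation frequency with the Hoeffding bound for variables bounded in an interval of width `value_range`. The distribution `p_true`, the vector `v`, and the values of `N`, `t`, and `trials` are hypothetical choices for illustration, not taken from any particular MDP.

```python
import numpy as np

rng = np.random.default_rng(1)
p_true = np.array([0.5, 0.3, 0.2])  # hypothetical P(. | s, a) for one fixed (s, a)
v = np.array([0.0, 1.0, 4.0])       # hypothetical fixed vector standing in for V*
N, t, trials = 200, 0.3, 2000       # illustrative values only

def plug_in_expectation(samples, values):
    """Compute sum_{s'} P_hat(s') * values[s'] from integer-coded samples."""
    p_hat = np.bincount(samples, minlength=len(values)) / len(samples)
    return float(p_hat @ values)

# 1) The plug-in expectation equals the sample mean of V*(s_i).
samples = rng.choice(len(p_true), size=N, p=p_true)
plug_in = plug_in_expectation(samples, v)
sample_mean = float(v[samples].mean())

# 2) Hoeffding for the bounded iid variables X_i = V*(s_i).
true_mean = float(p_true @ v)
value_range = float(v.max() - v.min())
bound = min(1.0, 2.0 * np.exp(-2.0 * N * t**2 / value_range**2))

# Each row of `batch` is one independent experiment of N samples.
batch = rng.choice(len(p_true), size=(trials, N), p=p_true)
estimates = v[batch].mean(axis=1)
violation_rate = float(np.mean(np.abs(estimates - true_mean) >= t))

print(f"plug-in: {plug_in:.4f}, sample mean: {sample_mean:.4f}")
print(f"observed rate: {violation_rate:.4f}, Hoeffding bound: {bound:.4f}")
```

The first check is just the exchange of sums written out above. The second illustrates why applying Hoeffding to $V^*(s_i)$ directly is convenient: the bound depends only on the range of $V^*$, not on the number of states.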