I'm reading the "Overthinking: Pareto-smoothed cross-validation" box in Section 7.4 of Richard McElreath's textbook *Statistical Rethinking* (2nd edition). The author writes:
Cross-validation estimates the out-of-sample log-pointwise-predictive-density (lppd, page 210). If you have N observations and fit the model N times, dropping a single observation $y_i$ each time, then the out-of-sample lppd is the sum of the average accuracy for each omitted $y_i$. $$\text{lppd}_{\text{CV}} = \sum_{i=1}^{N} \frac{1}{S} \sum_{s=1}^{S} \log \Pr(y_i | \theta_{-i,s})$$ where $s$ indexes samples from a Markov chain and $\theta_{-i,s}$ is the $s$-th sample from the posterior distribution computed for observations omitting $y_i$.
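For concreteness, here is how I would compute $\text{lppd}_{\text{CV}}$ from that formula, assuming I already had an N × S array of leave-one-out likelihoods (the array name and toy values below are mine, not from the book; in reality producing this array requires fitting the model N times):

```python
import math
import random

random.seed(1)

# Toy array: loo_lik[i][s] = Pr(y_i | theta_{-i,s}), the likelihood of the
# omitted observation y_i under the s-th posterior sample from the model
# fit without y_i. These are made-up values, not from a real model fit.
N, S = 5, 1000
loo_lik = [[random.uniform(0.1, 1.0) for _ in range(S)] for _ in range(N)]

# lppd_CV = sum_i (1/S) sum_s log Pr(y_i | theta_{-i,s})
lppd_cv = sum(
    sum(math.log(loo_lik[i][s]) for s in range(S)) / S
    for i in range(N)
)
print(lppd_cv)
```

Note that the inner average here is over the *log* likelihoods, matching the formula as the book prints it.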
Importance sampling replaces the computation of N posterior distributions by using an estimate of the importance of each $i$ to the posterior distribution. We draw samples from the full posterior distribution $p(\theta|y)$, but we want samples from the reduced leave-one-out posterior distribution $p(\theta|y_{-i})$. So we reweight each sample $s$ by the inverse of the probability of the omitted observation: $$ r(\theta_s) = \frac{1}{p(y_i|\theta_s)} $$ This weight is only relative, but it is normalized inside the calculation like this: $$\text{lppd}_{\text{IS}} = \sum_{i=1}^N \log \left( \frac{\sum_{s=1}^S r(\theta_s) p(y_i | \theta_s)}{\sum_{s=1}^S r(\theta_s)} \right)$$ And that is the importance sampling estimate of out-of-sample lppd.
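To make sure I am reading the formula correctly, this is a minimal sketch of how I would implement $\text{lppd}_{\text{IS}}$, assuming a matrix of pointwise likelihoods $p(y_i|\theta_s)$ from the *full* posterior (toy values of my own, not a real model):

```python
import math
import random

random.seed(0)

# Toy pointwise likelihoods: lik[i][s] = p(y_i | theta_s) for N observations
# and S samples from the full posterior (made-up values).
N, S = 5, 1000
lik = [[random.uniform(0.1, 1.0) for _ in range(S)] for _ in range(N)]

# Importance weight for sample s when leaving out observation i:
# r(theta_s) = 1 / p(y_i | theta_s)
r = [[1.0 / lik[i][s] for s in range(S)] for i in range(N)]

# lppd_IS = sum_i log( sum_s r(theta_s) p(y_i|theta_s) / sum_s r(theta_s) )
lppd_is = sum(
    math.log(
        sum(r[i][s] * lik[i][s] for s in range(S)) / sum(r[i][s] for s in range(S))
    )
    for i in range(N)
)
print(lppd_is)
```

Written this way, each numerator term $r(\theta_s)\,p(y_i|\theta_s)$ is computed literally as the formula states, which is exactly what prompts my question below.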
As seen above, the importance sampling weight, $r(\theta_s)$, is defined as the inverse of the likelihood of the omitted observation, such that:
$$ r(\theta_s) = \frac{1}{p(y_i|\theta_s)} $$
Given this definition, my intuition was that the product of $r(\theta_s)$ and the likelihood $p(y_i|\theta_s)$ should equal 1 in the numerator of $\text{lppd}_{\text{IS}}$, since:
$$ r(\theta_s) \cdot p(y_i|\theta_s) = \frac{1}{p(y_i|\theta_s)} \cdot p(y_i|\theta_s) = 1 $$
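Numerically, the cancellation I have in mind looks like this (toy likelihood values of my own choosing, powers of two so the floating-point reciprocals are exact):

```python
# Toy likelihood values p(y_i | theta_s) for a few posterior samples s.
toy_likelihoods = [0.25, 0.5, 0.125]

# For each sample, the weight is r(theta_s) = 1 / p(y_i | theta_s),
# so the product r(theta_s) * p(y_i | theta_s) is 1 for every s.
products = [(1.0 / p) * p for p in toy_likelihoods]
print(products)  # → [1.0, 1.0, 1.0]
```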
However, the way the $\text{lppd}_{\text{IS}}$ formula is presented suggests that this product is not meant to cancel to 1: if every term did, the numerator would simply sum to $S$ for each observation.
I am wondering if there is a conceptual detail I am overlooking, or perhaps a subtlety in the calculation that eludes me. Any insights would be greatly appreciated!