If we have a deterministic signal $x[n]$ and its transform
$$ X(f) = \sum\limits_{n=-\infty}^{\infty}x[n]\exp\left(-j2\pi fn\right)$$
I can think of this as containing knowledge of a discrete-time Fourier series (writing the function as a sum of sinusoids). In this sense, it shows the oscillatory nature of the signal and illuminates what will happen to it as it goes through a linear system (based on $H(f)$):
$$ y[n] = \sum\limits_{k=-\infty}^{\infty} x[n-k]h[k]$$
But with WSS stochastic signals $x[n]$ and $y[n]$, we transform a function of expectations. What exactly does $S_{xy}(f) = \sum\limits_{\tau=-\infty}^{\infty}R_{xy}[\tau]\exp\left(-j2\pi f\tau\right)$ show (where $\mathbb{E}x[n+\tau]y[n] := R_{xy}[\tau]$)? Or what about $S_{xx}$? I understand the argument that shows that these functions are a "distribution of expected covariance/expected power over frequency". I'm just unsure about applying the deterministic interpretation of frequencies/sinusoids to this stochastic variety. What exactly does it mean to say the expectations can be composed of sinusoids of varying phase and magnitude? I'm just not feeling a good connection to what these spectra show.
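For concreteness, the "expected power over frequency" reading can be checked numerically: averaging periodograms $|X_N(f)|^2/N$ over many independent realizations approaches $S_{xx}(f)$ (Wiener–Khinchin). A small sketch, using a made-up process (white noise through a short FIR filter, so the theoretical spectrum is just $|H(f)|^2$):

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 2000, 256  # M independent realizations of length N

# Hypothetical WSS process: unit-variance white noise through a short FIR filter
h = np.array([1.0, -0.6])
X = np.array([np.convolve(rng.standard_normal(N), h, mode="full")[:N]
              for _ in range(M)])

# Averaged periodogram: mean of |X(f)|^2 / N over realizations -> S_xx(f)
avg_pgram = np.mean(np.abs(np.fft.fft(X, axis=1)) ** 2, axis=0) / N

# Theory: S_xx(f) = |H(f)|^2 for unit-variance white input
S_theory = np.abs(np.fft.fft(h, N)) ** 2
print(np.max(np.abs(avg_pgram - S_theory)))  # shrinks as M grows
```

So the "expected" in "expected power" is an ensemble average over realizations, not a property of any single draw: one periodogram is very noisy, but its expectation is the spectrum of $R_{xx}$.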
The best "hand waving" explanation I have seen was: "If $S_{xx}(f)$ has concentrations at higher frequencies, then we sort of expect $R[\tau]$ to rapidly jiggle around, reaching its 'steady-state' ($m^2$, where $m$ is the mean) sooner. Hence the process is not predictable for long, and it seems reasonable that it too changes rapidly." Along this line of reasoning, it can be shown that if $x[n]$ goes through $h$, then $R_{xx}[\tau]$ goes through $h[k]$ and $h[-k]$ (so if $h$ makes $x$ oscillate faster, the statistics oscillate faster too, etc.).
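That last claim is easy to sanity-check numerically: for unit-variance white noise in, $R_{xx}[\tau]=\delta[\tau]$, so passing the autocorrelation through $h[k]$ and then $h[-k]$ predicts $R_{yy}[\tau]=\sum_k h[k]\,h[k+\tau]$. A sketch (the filter taps are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
x = rng.standard_normal(N)        # white WSS noise: R_xx[tau] = delta[tau]
h = np.array([1.0, 0.5, 0.25])    # arbitrary FIR filter for illustration
y = np.convolve(x, h, mode="full")[:N]

def acorr(s, tau):
    """Empirical autocorrelation estimate at lag tau."""
    return np.mean(s[:len(s) - tau] * s[tau:]) if tau else np.mean(s * s)

# Theory: R_yy[tau] = sum_k h[k] h[k+tau]  (R_xx through h[k], then h[-k])
theory = [np.sum(h[:len(h) - t] * h[t:]) for t in range(3)]
emp = [acorr(y, t) for t in range(3)]
print(np.round(emp, 3), np.round(theory, 3))
```

The empirical lags converge to $[1.3125,\ 0.625,\ 0.25]$ for these taps, matching the deterministic correlation of $h$ with itself.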
Another interpretation I have seen is that $R_{xx}[0] - R_{xx}[\tau]$ is proportional to the expected power of the difference signal $x[n+\tau]-x[n]$.
$\mathbb{E}x[n+\tau]y[n] := R_{xy}[\tau]$ is the expected value of the cross-correlation at lag $\tau$. This is the average over the signal of the degree to which the signal $y$ "now" can be used to estimate the signal $x$ at "now + $\tau$". Since these signals are assumed stochastic (but stationary enough for these expectations to be stationary) we cannot say what the real cross-correlation is, only what the average is. This expectation tells us about the predictability of one signal from the other on average.
$R_{xx}$ is the autocorrelation equivalent to the above. This expectation tells us about the predictability of a signal at various lags given its current value, on average.
The $S_{xy}$ and $S_{xx}$ are the power spectra of these expectations. If you can predict over long times, the correlations in the expectations will have their first zero-crossing(s) at large lags and will be "large" out to these large lags. After a few zero-crossings, we expect the correlations to meander around zero indicating that we have left predictability. The spectra will then have narrow bandwidth (around DC). Narrow bandwidth means that the peak near DC pokes up out of the noise floor and there is some consistent, predictable behaviour in the signal(s). Equivalently, the system has a low slew rate.
If you can only predict over short times, the correlations in the expectations will have their first zero-crossing(s) at small lags and will quickly become random noise centered at zero (so indicate unpredictability). The spectra will then have wide bandwidth (around DC). Wide bandwidth means that the smeared out lump around DC does not rise much out of the noise floor and the signal(s) are not strong predictors of their later behaviour. Equivalently, the system has a fast slew rate.
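The narrow-versus-wide bandwidth picture is easy to see with a first-order autoregressive process, whose normalized autocorrelation is $a^{|\tau|}$: a pole near 1 concentrates the spectrum near DC and gives long predictability, while a pole near 0 gives a nearly flat (wide) spectrum and almost none. A sketch (the pole values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

def ar1(a):
    """AR(1) process x[n] = a*x[n-1] + w[n]; R[tau]/R[0] = a^|tau|."""
    w = rng.standard_normal(N)
    x = np.empty(N)
    x[0] = w[0]
    for n in range(1, N):
        x[n] = a * x[n - 1] + w[n]
    return x

def ncorr(s, tau):
    """Normalized autocorrelation estimate at lag tau."""
    s = s - s.mean()
    return np.mean(s[:len(s) - tau] * s[tau:]) / np.mean(s * s)

slow = ar1(0.95)  # narrowband around DC: long predictability
fast = ar1(0.10)  # wideband: correlation dies almost immediately
print(ncorr(slow, 20), ncorr(fast, 20))  # ~0.95**20 ≈ 0.36 vs ~0
```

At lag 20 the narrowband process still remembers roughly a third of its value, while the wideband one is already indistinguishable from noise.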
Note that the signals may be better predictors at certain frequencies than at others (for example, if the source system is resonant at one of these frequencies, or the source system injects narrowband noise). Consequently, the correlations may show short or long durations of predictability (i.e. with the entire spectrum muddled together) but the transforms may show that the predictability at certain frequencies is vastly longer or vastly shorter than what one might estimate from zero-crossings or correlation decay rates.
$R_{xx}[0] - R_{xx}[r] = \mathbb{E}x[n]x[n] - \mathbb{E}x[n+r]x[n]$. Expanding the power of the difference signal, $\mathbb{E}\left(x[n+r]-x[n]\right)^2 = \mathbb{E}x[n+r]^2 - 2\,\mathbb{E}x[n+r]x[n] + \mathbb{E}x[n]^2 = 2\left(R_{xx}[0] - R_{xx}[r]\right)$, so it is indeed proportional to the expected power of the difference signal, as you mention. Note that this is another "on average" claim -- at any particular time the instantaneous power of the difference may be anything, but its average over long times will tend to this value.
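The identity $\mathbb{E}(x[n+r]-x[n])^2 = 2(R_{xx}[0]-R_{xx}[r])$ holds for any WSS process and is quick to verify numerically (the moving-average example here is made up):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
# Hypothetical WSS process: length-4 moving average of white noise
x = np.convolve(rng.standard_normal(N), np.ones(4) / 4, mode="full")[:N]

r = 2
R0 = np.mean(x * x)                                # R_xx[0]
Rr = np.mean(x[:N - r] * x[r:])                    # R_xx[r]
diff_power = np.mean((x[r:] - x[:N - r]) ** 2)     # avg power of x[n+r] - x[n]
print(diff_power, 2 * (R0 - Rr))  # these two agree (up to edge effects)
```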