Meaning of a term in stochastic gradient descent


This is in reference to the first two pages of Robbins and Monro, "A stochastic approximation method," https://projecteuclid.org/euclid.aoms/1177729586.

What is the meaning of the RHS in (8)? As I understand it, there is a measure space $(\Sigma,\mu)$, and $x_n$ and $y_n$ are real-valued measurable functions on $\Sigma.$ The LHS of (8) is a conditional probability, so for each number $y$ it is a function whose domain is $x_n(\Sigma)\subset\mathbb{R}.$ However, for each pair of numbers $x$ and $y$, $H(y\mid x)$ is a number, so for each number $y$ the RHS $H(y\mid x_n)$ seems to be a measurable function on $\Sigma.$ The two sides therefore appear to have different domains, and I don't understand the meaning of the equality.
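
To spell out the mismatch in symbols, here is a rough sketch. It assumes (8) reads $\Pr[y_n\le y\mid x_n]=H(y\mid x_n)$, which is my transcription from the paper and is not quoted above; the names $g_y$ and $h_y$ are labels I am introducing only for this illustration. For a fixed number $y$, the two sides as I read them are

$$g_y : x_n(\Sigma)\to[0,1],\qquad g_y(x)=\Pr[\,y_n\le y\mid x_n=x\,],$$

$$h_y : \Sigma\to[0,1],\qquad h_y(\omega)=H\bigl(y\mid x_n(\omega)\bigr).$$

So $g_y$ takes a real number as its argument, while $h_y$ takes a point of $\Sigma$, and it is not clear to me in what sense the two are being equated.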