In Bishop's Neural Networks for Pattern Recognition, he writes the sum-of-squares error function
$$ E= \frac{1}{2}\sum_{n=1}^{N}\sum_{k}\{y_k(\mathbf{x}^{n};\mathbf{w})-t^{n}_{k}\}^2$$
where $\mathbf{x}^{n}$ is an input vector from the dataset, $\mathbf{w}$ is the network's weight vector, $y_k(\mathbf{x}^{n};\mathbf{w})$ is the model's output, $t^{n}_{k}$ is the corresponding target value, $N$ is the size of the training set, and $k$ indexes the output units. He then multiplies by $\frac{1}{N}$ and takes the limit as $N$ approaches infinity to obtain
$$ E= \lim_{N \to \infty }\frac{1}{2N}\sum_{n=1}^{N}\sum_{k}\{y_k(\mathbf{x}^{n};\mathbf{w})-t^{n}_{k}\}^2$$
which is apparently
$$E = \frac{1}{2} \sum_{k}\int\!\!\int \{y_k(\mathbf{x};\mathbf{w})-t_{k}\}^2 \, p(t_k,\mathbf{x}) \, dt_k \, d\mathbf{x}.$$
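To see the claim numerically, here is a quick sketch with a hypothetical one-dimensional setup (a single output $k$, inputs $x \sim \mathrm{Uniform}(0,1)$, targets $t = x + $ Gaussian noise, and a fixed linear "model" $y(x;w) = wx$ — none of this is from Bishop, it is just chosen so the limiting integral can be computed in closed form). The finite-$N$ average $\frac{1}{2N}\sum_n (y(x^n;w) - t^n)^2$ should approach the density-weighted integral as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
w = 0.5        # fixed weight of the toy model y(x; w) = w * x
sigma = 0.1    # std. dev. of the target noise, t = x + noise

# Draw N samples (x^n, t^n) from the joint density p(t, x) = p(t|x) p(x)
N = 1_000_000
x = rng.uniform(0.0, 1.0, size=N)
t = x + rng.normal(0.0, sigma, size=N)

# Finite-N error: (1 / 2N) * sum of squared errors over the sample
finite_avg = 0.5 * np.mean((w * x - t) ** 2)

# The limiting integral (1/2) ∫∫ (y - t)^2 p(t, x) dt dx, evaluated
# analytically for this setup: E[(wx - t)^2] = (w-1)^2 E[x^2] + sigma^2,
# and E[x^2] = 1/3 for x ~ Uniform(0, 1).
analytic = 0.5 * ((w - 1) ** 2 / 3 + sigma ** 2)

print(finite_avg, analytic)  # the two values agree closely for large N
```

The point of the sketch: the $\frac{1}{N}\sum_n$ in the finite-sample formula is a Monte Carlo estimate of an expectation over $p(t,\mathbf{x})$, which is why the density appears as a weight in the integral.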
Why, in passing from the sum to the integral, does the integrand get weighted by the joint probability density $p(t_k,\mathbf{x})$?