Let $\mathbb H$ be an arbitrary separable Hilbert space. The covariance operator $C:\mathbb H\to\mathbb H$ between two $\mathbb H$-valued zero mean random elements $X$ and $Y$ with $\operatorname E\|X\|^2<\infty$ and $\operatorname E\|Y\|^2<\infty$ is defined by $$ C(h)=\operatorname E[\langle X,h\rangle Y] $$ for each $h\in\mathbb H$. Why is the covariance operator defined in this way?
It seems that we can arrive at this definition if we try to generalise the definition of the covariance matrix. Suppose for the moment that $\mathbb H=\mathbb R^n$. Then the covariance matrix is given by $\operatorname E[XY^T]$ which is a bounded linear operator from $\mathbb R^n$ to $\mathbb R^n$. Also, we have that $$ \operatorname E[YX^T](h) =\operatorname E[YX^Th] =\operatorname E[Y\langle X,h\rangle] =\operatorname E[\langle X,h\rangle Y] $$ for each $h\in\mathbb R^n$. The rightmost expression only depends on the inner product $\langle\cdot,\cdot\rangle$ and maybe we can say that this is the definition of the covariance operator for the arbitrary $\mathbb H$. Is this the right intuition behind the definition of the covariance operator?
Any help is much appreciated!
Your line of thought is one of the possible intuitions. One another is the following: Note that a matrix does not only represent a linear operator but also a bilinear form. We define the covariance form of $X,Y \colon \Omega \to H$ by $\def\(#1){\left<#1\right>}$ $$ c(h,k) = \mathbf E [\(X,h)\(Y,k)] $$ that is $$ c(h,k) = \mathrm{cov}\bigl(\(X,h), \(Y,k)\bigr) $$ we reduce it to the covariance of real - mean zero - random variables. This is a bounded, bilinear form. By the Riesz representation theorem, $c$ corresponds to an unique linear operator $C \colon H \to H$ by $$ \(Ch,k) = c(h,k), $$ Then we have, $$ \(Ch, k) = \mathbf E [\(X,h)\(Y,k)] = \({\mathbf E [\(X,h)Y]}, k ) $$ so $$ Ch = \mathbf E [\(X,h)Y]. $$