Suppose we are given $m$ points $\{a_1, \cdots, a_m\}$ with $a_i \in \mathbb{R}^n$ and corresponding target values $\{b_1, \cdots, b_m\}$ with $b_i \in \mathbb{R}$, and we want to solve the corresponding least-squares problem:
$$ \min_{x \in \mathbb{R}^n} \quad \frac{1}{2m} \sum_{i=1}^m (a_i^{\top}x - b_i)^2 $$
The true gradient is $\nabla f(x) = \frac{1}{m}\sum_{i=1}^m a_i(a_i^{\top}x - b_i)$.
It is claimed that $v_k = a_k(a_k^{\top}x - b_k)$, where the index $k$ is drawn uniformly at random from $\{1, \cdots, m\}$, is an unbiased estimator of the gradient.
How can we show this?
Let $X$ be a random variable with support $S_X =\{1, \cdots, m\}$, distributed uniformly: $P_X(X=k)=\frac{1}{m}$ for each $k \in S_X$. The definition of the expectation of a function of a discrete random variable gives
$$ \mathbb{E}_X [g(X)]= \sum_{ k \in S_X} g(k)\,P_X(k). $$
Taking $g(k) = a_k(a_k^{\top}x - b_k)$, we therefore get
$$ \mathbb{E}_X [a_X(a_X^{\top}x - b_X)] = \sum_{ k=1}^m a_k(a_k^{\top}x - b_k)\cdot \frac{1}{m}=\nabla f(x).$$
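The identity above can be checked numerically: averaging the stochastic gradient $v_k$ over all $m$ indices with equal weight $\frac{1}{m}$ should reproduce the full gradient exactly. Below is a minimal sketch with random data (the matrix `A` stacking the points $a_i$ as rows, and the names `full_grad`, `expected_v` are assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 3
A = rng.standard_normal((m, n))   # rows are a_1, ..., a_m
b = rng.standard_normal(m)
x = rng.standard_normal(n)

# Full gradient: (1/m) * sum_i a_i (a_i^T x - b_i) = (1/m) A^T (A x - b)
full_grad = A.T @ (A @ x - b) / m

# E_X[v_X] under the uniform distribution: sum_k v_k * (1/m),
# i.e. the plain average of the m stochastic gradients
stoch_grads = np.array([A[k] * (A[k] @ x - b[k]) for k in range(m)])
expected_v = stoch_grads.mean(axis=0)

print(np.allclose(expected_v, full_grad))  # True: the estimator is unbiased
```

Note that this is an exact identity over the finite support, not a Monte Carlo approximation: the expectation under the uniform distribution is just the sample mean of the $m$ per-point gradients.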