Consider the quadratic function: $$ f(x) := \frac{1}{2} \sum_{i=1}^d a_i x_i^2 $$ where $x, a \in \mathbb{R}^d$. According to Gradient Descent w.r.t. $x$, we update as follows: $$ x^{(k+1)} := x^{(k)} - t \nabla f(x^{(k)}) $$ With $t$ being the step size. Therefore, $$ x^{(k+1)}_i = (1- t a_i) x_i^{(k)} $$ Implying that $$ x_{i}^{(k)} = (1-t a_i)^{k} x_i^{(0)} $$ Now assume that $t |a_i| < 1$ for all $i$, and there exists exactly one $i$ s.t. $a_i < 0$. According to this source, it holds that:
The function diverges to negative infinity exponentially quickly from any randomly chosen starting point.
My question is: What exactly does it mean that ‘The function diverges to negative infinity’?
Your understanding is wrong . In the source context, he means if a single $a_i$ is negative, then the algorithm will let $x_j^k \to 0 ,\, j\not = i$ except the $x_i^k \to \infty ,as \,k\to \infty$ , then the value of function $f_k = \frac 12 a_i(x_i^k)^2\to-\infty$ because $a_i $ is negative