I am currently reading a paper, and a proof (see below) in the paper talks about the Hessian being positive semidefinite for a matrix-variable function $g:\mathbb{R}^{n\times k} \to \mathbb{R}$: it states that $\nabla^{2}g(U) \succeq 0$, where $U \in \mathbb{R}^{n\times k}$. I think $\nabla^{2}$ means the Hessian, but on the other hand I have also heard that this means $\left\langle \dot{U}, \nabla^{2}g(U)[\dot{U}] \right\rangle \geq 0$ for all $\dot{U}$, where $\nabla^{2}g(U)[\dot{U}] = \lim_{ t \to 0 }\frac{\nabla g(U+t\dot{U}) -\nabla g(U)}{t}$.
I think the Hessian is defined by treating the matrix variable as its vectorization in $\mathbb{R}^{nk}$, so its overall dimension should be $\mathbb{R}^{nk\times nk}$. Maybe it's also possible to treat the Hessian as a map $\mathbb{R}^{n\times k} \times \mathbb{R}^{n\times k} \times \mathbb{R}^{n\times k} \to \mathbb{R}$, in the sense that we fix the base point and then plug two directions into the Hessian as a bilinear form?
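To make the identification explicit (my own notation, not from the paper): if $H \in \mathbb{R}^{nk\times nk}$ denotes the Hessian of the vectorized function $u \mapsto g(\operatorname{vec}^{-1}(u))$, then I believe the operator form and the matrix form are related by $$ \operatorname{vec}\left(\nabla^2 g(U)[\dot{U}]\right) = H \operatorname{vec}(\dot{U}), \qquad \left\langle \dot{U}, \nabla^2 g(U)[\dot{U}] \right\rangle = \operatorname{vec}(\dot{U})^T H \operatorname{vec}(\dot{U}), $$ so $H \succeq 0$ as an $nk\times nk$ matrix would be the same as the quadratic form on the right being nonnegative for all $\dot{U}$.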
In any case, I am not sure how the Hessian being positive semidefinite is equivalent to (or implies) that the directional derivative of the gradient, paired with the direction itself, is nonnegative for every direction.
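If I unwind the definitions, my understanding is that this quadratic form is just the second derivative of $g$ along a line: $$ \left\langle \dot{U}, \nabla^2 g(U)[\dot{U}] \right\rangle = \left\langle \dot{U}, \lim_{t\to 0} \frac{\nabla g(U+t\dot{U}) - \nabla g(U)}{t} \right\rangle = \frac{d^2}{dt^2}\Big|_{t=0} g(U+t\dot{U}), $$ but I would appreciate confirmation that this is the right reading.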
Specifically, I have attached the proof I'm reading below, and I am confused about how $0 \leq \frac{1}{2}\left\langle\dot{U}, \nabla^2 g(U)[\dot{U}]\right\rangle$ in equation (8).
Lemma 1 Let $f(X)$ be a convex, twice continuously differentiable function of $X \in \mathcal{S}^{n \times n}$. Consider the convex problem $$ \underset{X \succeq 0}{\operatorname{minimize}} f(X) \tag{5} $$ Now consider the rank-constrained factorized version of the problem: $$ \underset{U \in \mathbb{R}^{n \times k}}{\operatorname{minimize}} g(U)=f\left(U U^T\right) \tag{6} $$ If $U$ is an SOSP of (6) with $\operatorname{rank}(U)<k$, then $U$ is a global minimum of (6) and $U U^T$ is a global minimum of (5). (Notice that such a point may not exist in general.)
Proof: Necessary and sufficient optimality conditions for (5) are: $\nabla f(X) \succeq 0$ and $\nabla f(X) X=0$. Let $U$ be an SOSP for (6) with $\operatorname{rank}(U)<k$ and define $X=U U^T$. Then, $\nabla g(U)=2 \nabla f\left(U U^T\right) U=0$ and $\nabla^2 g(U) \succeq 0$. The first statement readily shows that $\nabla f(X) X=0$. The Hessians of $f$ and $g$ are related by: $$ \frac{1}{2} \nabla^2 g(U)[\dot{U}]=\nabla f\left(U U^T\right) \dot{U}+\nabla^2 f\left(U U^T\right)\left[U \dot{U}^T+\dot{U} U^T\right] U \tag{7} $$ Since $\operatorname{rank}(U)<k$, there exists a vector $z \in \mathbb{R}^k$ such that $U z=0$ and $\|z\|_2=1$. For any $x \in \mathbb{R}^n$, set $\dot{U}=x z^T$ so that $U \dot{U}^T+\dot{U} U^T=0$. Using second-order stationarity of $U$, we find: $$ 0 \leq \frac{1}{2}\left\langle\dot{U}, \nabla^2 g(U)[\dot{U}]\right\rangle=\left\langle x z^T, \nabla f\left(U U^T\right) x z^T\right\rangle=x^T \nabla f\left(U U^T\right) x \tag{8} $$ This holds for all $x \in \mathbb{R}^n$, hence $\nabla f\left(U U^T\right) \succeq 0$ and $X=U U^T$ is optimal for (5). Since (5) is a relaxation of (6), it follows that $U$ is optimal for (6).
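As a sanity check on the relation (7), I verified it numerically with finite differences; the quadratic $f(X)=\frac{1}{2}\|X-B\|_F^2$ and the dimensions below are my own toy choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3

# Toy choice: f(X) = 0.5 * ||X - B||_F^2 with B symmetric,
# so grad f(X) = X - B and the Hessian of f is the identity map.
B = rng.standard_normal((n, n))
B = (B + B.T) / 2

def grad_f(X):
    return X - B

def hess_f(X, V):  # \nabla^2 f(X)[V] = V, since f is quadratic
    return V

def grad_g(U):  # \nabla g(U) = 2 \nabla f(U U^T) U
    return 2 * grad_f(U @ U.T) @ U

U = rng.standard_normal((n, k))
Udot = rng.standard_normal((n, k))

# Left side of (7): (1/2) \nabla^2 g(U)[Udot], approximated by a
# central finite difference of \nabla g along the direction Udot.
t = 1e-5
lhs = (grad_g(U + t * Udot) - grad_g(U - t * Udot)) / (4 * t)

# Right side of (7), evaluated exactly.
X = U @ U.T
rhs = grad_f(X) @ Udot + hess_f(X, U @ Udot.T + Udot @ U.T) @ U

print(np.max(np.abs(lhs - rhs)))  # small: only finite-difference error remains
```

For me the two sides agree up to finite-difference error, which at least confirms that $\nabla^2 g(U)[\dot{U}]$ in (7) is the directional derivative of $\nabla g$ in the sense I wrote above.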