Mean-value theorem for function of matrices

143 Views Asked by At

Consider a twice differentiable function $f: {\Bbb R}^{d \times d} \to {\Bbb R}$ with bounded derivatives

$$ \left | \frac{\partial^2 f}{\partial X_{kl} \partial X_{ij}} \right | \leq D $$

I am looking for a mean-value theorem for $f$ which, if adapted from general versions of Taylor's theorem for Banach spaces, would state the following: \begin{equation} f(X+Y) - f(X) \leq \text{tr}(\nabla f(X)^\top Y) + \frac12 D \|Y\|^2_F, \end{equation}

where $\|Y\|_F$ is the Frobenius norm of the matrix $Y$. Where can I find such a statement — ideally, with a proof?


Here is my own attempt to prove this statement (I have an additional constant $C$ that pops up though).

Define $g: [0,1] \to R$ as $g(t) = f(X + t Y)$. Then by applying the standard mean-value theorem to $g$, we get \begin{align} f(X+Y) - f(X) &= g(1) - g(0) = \int_0^1 g'(t) dt \\ &= g'(0) + \int_0^1 g'(t) - g'(0) dt \\ &= g'(0) + \int_0^1 \int_0^t g''(s) ds dt. \end{align}

For the first term, we simply apply the multivariate chain rule, $$ g'(t) = \sum_{i,j=1}^d \frac{\partial f}{\partial X_{ij}} (X + t Y) \cdot Y_{ij}, $$ which yields $$ g'(0) = \sum_{i,j=1}^d \frac{\partial f}{\partial X_{ij}} (X) \cdot Y_{ij} = \text{tr}(\nabla f(X)^\top Y). $$

For the second term, note that (by a change of variable $s = ut$) \begin{align} & \int_0^t g''(s) ds = \left( \int_0^1 g''(u) du \right) \cdot t \\ \implies & \int_0^1 \int_0^t g''(s) ds = \int_0^1 g''(u) du \cdot \int_0^1 t dt = \frac12 \int_0^1 g''(u) du. \end{align}

By combining the last equations, we obtain $$ f(X+Y) - f(X) = g'(0) + \frac12 \int_0^1 g''(u) du. $$

Next, we apply the multivariate chain rule again to $g''(u)$: \begin{align} \frac{d^2 g}{d t^2} &= \frac{d}{d t} \left( \frac{d g}{d t} \right) = \frac{d}{d t} \left( \sum_{i,j=1}^d \frac{\partial f}{\partial X_{ij}} (X + t Y) \cdot \frac{\partial X_{ij}(t)}{\partial t} \right) \\ &= \sum_{i,j=1}^d \frac{d}{d t} \left( \frac{\partial f}{\partial X_{ij}} (X + t Y) \right) \cdot \frac{\partial X_{ij}(t)}{\partial t} + \frac{\partial f}{\partial X_{ij}} (X + t Y) \cdot \frac{d}{d t} Y_{ij} \\ &= \sum_{i,j=1}^d \sum_{k,l=1}^d \frac{\partial}{\partial X_{kl}} \left( \frac{\partial f}{\partial X_{ij}} (X + t Y) \right) \cdot \frac{\partial X_{kl}(t)}{\partial t} \cdot \frac{\partial X_{ij}(t)}{\partial t} + 0 \\ &= \sum_{i,j,k,l=1}^d \frac{\partial^2 f}{\partial X_{kl} \partial X_{ij}} (X + t Y) \cdot Y_{kl} Y_{ij}. \end{align}

We can upper bound this term by noting that $$ \sum_{i,j,k,l=1}^d Y_{kl} Y_{ij} = \left ( \sum_{i,j}^d Y_{ij} \right)^2 \leq C \sum_{i,j}^d Y_{ij}^2 = C \| Y \|^2_F, $$ for a given constant $C > 0$ (which depends on $d$).

We therefore conclude that $$ f(X+Y) - f(X) \leq \text{tr}(\nabla f(X)^\top Y) + \frac12 C D \| Y \|^2_F. $$