Why does the gradient of a function $f$ only exist for functions that output scalars?

I am studying matrix calculus and I don't quite understand why the gradient only exists for functions that take $m \times n$ matrices as input and that output a scalar, i.e., functions of the type $\mathbb R^{m \times n} \to \mathbb R$.

Best answer

The gradient of a differentiable function $f : \mathbb R^{m \times n} \to \mathbb R$

$$\nabla f : \mathbb R^{m \times n} \to \mathbb R^{m \times n}$$

is the matrix-valued function that produces the directional derivative of $f$ in the direction of $\mathrm V \in \mathbb R^{m \times n}$ at $\mathrm X \in \mathbb R^{m \times n}$ via the following Frobenius inner product

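$$D_{\mathrm V} f (\mathrm X) = \langle \mathrm V, \nabla f (\mathrm X) \rangle = \operatorname{tr} \left( \mathrm V^\top \nabla f (\mathrm X) \right)$$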
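As a quick numerical sanity check of this identity, here is a minimal sketch in plain Python. The example function $f(\mathrm X) = \| \mathrm X \|_{\text F}^2$, its hand-derived gradient $2 \mathrm X$, and the helper names `frobenius_inner` and `directional_derivative` are my own choices for illustration and are not part of the original answer.

```python
def frobenius_inner(A, B):
    """Frobenius inner product: sum of entrywise products of two equal-shape matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def f(X):
    """Illustrative scalar-valued function: the squared Frobenius norm of X."""
    return frobenius_inner(X, X)


def grad_f(X):
    """Gradient of f, derived by hand: 2 X, a matrix of the same shape as X."""
    return [[2.0 * x for x in row] for row in X]


def directional_derivative(func, X, V, h=1e-6):
    """Central-difference estimate of d/dt func(X + t V) at t = 0."""
    plus = [[x + h * v for x, v in zip(rx, rv)] for rx, rv in zip(X, V)]
    minus = [[x - h * v for x, v in zip(rx, rv)] for rx, rv in zip(X, V)]
    return (func(plus) - func(minus)) / (2.0 * h)


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # point in R^{2 x 3}
V = [[0.5, -1.0, 0.0], [2.0, 0.0, 1.0]]  # direction in R^{2 x 3}

numeric = directional_derivative(f, X, V)      # scalar, from the limit definition
via_gradient = frobenius_inner(V, grad_f(X))   # scalar, from <V, grad f(X)>

print(numeric, via_gradient)  # both should be close to 25
```

Both numbers are scalars, which is exactly why a single matrix $\nabla f (\mathrm X)$ paired with an inner product is enough to encode every directional derivative of $f$ at $\mathrm X$.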

If the output of a function is not a scalar, then its directional derivative will not be a scalar either. Hence, there is no way of producing the directional derivative via some inner product, as inner products produce scalars (by definition).
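To make this concrete, here is a small sketch with a vector-valued function. The function $g$ (the row sums of $\mathrm X$, so $g : \mathbb R^{m \times n} \to \mathbb R^m$) and the helper name `directional_derivative` are assumptions chosen only for illustration.

```python
def g(X):
    """Illustrative vector-valued function: the row sums of X (R^{m x n} -> R^m)."""
    return [sum(row) for row in X]


def directional_derivative(func, X, V, h=1e-6):
    """Central-difference estimate of d/dt func(X + t V) at t = 0, taken entrywise."""
    plus = [[x + h * v for x, v in zip(rx, rv)] for rx, rv in zip(X, V)]
    minus = [[x - h * v for x, v in zip(rx, rv)] for rx, rv in zip(X, V)]
    return [(p - q) / (2.0 * h) for p, q in zip(func(plus), func(minus))]


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # point in R^{2 x 3}
V = [[0.5, -1.0, 0.0], [2.0, 0.0, 1.0]]  # direction in R^{2 x 3}

dg = directional_derivative(g, X, V)

# The result has one entry per output component of g, so it is a vector in R^2.
print(len(dg), dg)
```

Since $g$ is linear, its directional derivative along $\mathrm V$ is just $g(\mathrm V)$, a vector with $m$ entries. No single inner product $\langle \mathrm V, \cdot \rangle$ can return such an object, because an inner product always returns one number.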