How is the derivative of $xx^T$ with respect to $x$ equal to $2x^T$, where $x$ is a vector? ($x^T$ means the transpose of the vector $x$.)
Differentiation of $xx^T$ where $x$ is a vector
7.9k Views · Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail) · There are 3 answers below.
I'm posting this second answer at the request of Doctor Mohawk, who wants to see the derivative of $xx^T$ for column vectors $x$. We note the function $x \to xx^T$ maps column vectors $x \in \Bbb R^n$ into the space of real square $n \times n$ matrices, as opposed to $x \to x^Tx$, which maps into $\Bbb R$. In any event, proceeding in a manner similar to my other answer (which treats $x^Tx$), we find
$(x + h)(x + h)^T = xx^T + hx^T + xh^T + hh^T; \tag{1}$
we observe that for column vectors $y, z \in \Bbb R^n$
$zy^T = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix} \begin {pmatrix} y_1 & y_2 & \dots & y_n \end{pmatrix} = \begin{bmatrix} z_i y_j \end{bmatrix}, \tag{2}$
and so
$hh^T = \begin{bmatrix} h_i h_j \end{bmatrix}; \tag{3}$
now if we agree that the norm $ \Vert z \Vert$ of any $z \in \Bbb R^n$ is given by
$\Vert z \Vert^2 = z^Tz = \sum_i z_i^2, \tag{4}$
and the norm of a matrix
$M = \begin{bmatrix} m_{ij} \end{bmatrix} \tag {5}$
is $\Vert M \Vert^2 = \sum_{i, j} m_{ij}^2, \tag{6}$
then
$\Vert hh^T \Vert^2 = \sum_{i, j} h_i^2 h_j^2 = (h^Th)^2; \tag{7}$
we now return to (1) and see that
$(x + h)(x + h)^T - xx^T - ( hx^T + xh^T) = hh^T; \tag{8}$
for fixed $x$ the map
$h \to hx^T + xh^T \tag{9}$
is manifestly linear in $h$. Furthermore, from (7) we have
$\Vert hh^T \Vert = h^Th = \sum_i h_i^2, \tag{10}$
so in accord with (4),
$\Vert hh^T \Vert = \Vert h \Vert^2, \tag{11}$
whence, for $h \ne 0$,
$\dfrac{\Vert hh^T \Vert}{\Vert h \Vert} = \Vert h \Vert \to 0 \tag{12}$
as
$\Vert h \Vert \to 0; \tag{13}$
since, from (8),
$\Vert (x + h)(x + h)^T - xx^T - ( hx^T + xh^T) \Vert = \Vert hh^T \Vert, \tag{14}$
we see that the linear map $h \to hx^T + xh^T$ approximates the difference $(x + h)(x + h)^T - xx^T$ with an error that is of second order in $\Vert h \Vert$; but this is precisely the definition of the derivative.
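The derivation above can be checked numerically. The following sketch (my own addition, not part of the original answer) verifies with NumPy that the remainder $(x+h)(x+h)^T - xx^T - (hx^T + xh^T)$ equals $hh^T$ exactly, and that its Frobenius norm is $\Vert h \Vert^2$, hence vanishes faster than $\Vert h \Vert$:

```python
import numpy as np

# Numerical check of the derivative of f(x) = x x^T:
# the candidate linear map is h -> h x^T + x h^T, and the
# remainder should be exactly h h^T, whose Frobenius norm is ||h||^2.
rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)

for eps in (1e-1, 1e-2, 1e-3):
    h = eps * rng.standard_normal(n)
    fx = np.outer(x, x)                       # x x^T
    fxh = np.outer(x + h, x + h)              # (x+h)(x+h)^T
    linear = np.outer(h, x) + np.outer(x, h)  # h x^T + x h^T
    remainder = fxh - fx - linear
    assert np.allclose(remainder, np.outer(h, h))    # remainder = h h^T, eq. (8)
    assert np.isclose(np.linalg.norm(remainder), h @ h)  # ||h h^T|| = h^T h, eq. (10)
```

The second assertion is identity (10): the Frobenius norm of $hh^T$ equals $h^Th = \Vert h \Vert^2$, so the remainder is $o(\Vert h \Vert)$, as the argument requires.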
To the best of my knowledge, the derivative of $xx^T$ is a 3-dimensional tensor: each element of the matrix $xx^T$ is differentiated with respect to each element of the vector $x$. You can see the result of this derivative online at http://www.matrixcalculus.org/. Feel free to ask for further clarification.
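To make the third-order tensor concrete, here is a small sketch (my own addition): componentwise, $\partial (x_i x_j)/\partial x_k = \delta_{ik} x_j + \delta_{jk} x_i$, which we can build with NumPy and check slice by slice against finite differences:

```python
import numpy as np

# The derivative of x x^T is the 3-D tensor
#   T[i, j, k] = d(x_i x_j)/dx_k = delta_{ik} x_j + delta_{jk} x_i.
n = 3
x = np.array([1.0, 2.0, 3.0])
I = np.eye(n)
T = np.einsum('ik,j->ijk', I, x) + np.einsum('jk,i->ijk', I, x)

# Finite-difference check of each slice T[:, :, k] ~ (f(x + eps*e_k) - f(x)) / eps
eps = 1e-6
for k in range(n):
    e = np.zeros(n); e[k] = eps
    fd = (np.outer(x + e, x + e) - np.outer(x, x)) / eps
    assert np.allclose(T[:, :, k], fd, atol=1e-4)
```

Note that fixing $k$ and slicing gives exactly the matrix $e_k x^T + x e_k^T$, the linear map described in the other answer evaluated at the basis vector $e_k$.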
I think, if we use the usual convention that $x$ denotes a column vector, we want to differentiate $x^Tx$ instead of $xx^T$; the result doesn't appear to be true in the latter case, as is explained below.
Let $V$ denote a finite dimensional space of column vectors over $\Bbb R$.
Set
$f(x) = x^Tx; \tag{1}$
then
$f(x + h) = (x + h)^T(x + h) = (x^T + h^T)(x + h)$ $= x^Tx + x^Th + h^Tx + h^Th, \tag{2}$
so that
$f(x + h) - f(x) = x^Th + h^Tx + h^Th = x^Th + h^Tx + \vert h \vert^2, \tag{3}$
where $\vert h \vert = (h^Th)^{1/2}$ is of course a norm on $V$; since the term $h^Th = \vert h \vert^2$ is of second order in $\vert h \vert$, it follows from (3) that the derivative of $f(x)$ is the linear map $D_xf:V \to \Bbb R$, where
$D_xf(h) = x^Th + h^Tx. \tag{4}$
We now observe that, just as $x^Tx \in \Bbb R$ for all $x \in V$, $x^Th, h^Tx \in \Bbb R$ as well; all these quantities are scalars. This being the case, we have that
$(h^Tx)^T = h^Tx, \tag{5}$
but
$(h^Tx)^T = x^Th, \tag{6}$
so that
$h^Tx = x^Th; \tag{7}$
using this in (4) yields
$D_xf(h) = x^Th + x^Th = 2x^Th, \tag{8}$
i.e. the derivative of $x^Tx$ is $D_xf = 2x^T$; this of course taken in the sense of a linear functional from $V$ to $\Bbb R$.
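As a quick numerical sketch (my own addition, not part of the original answer), we can confirm that the gradient of $f(x) = x^Tx$ has components $2x$, i.e. that $D_xf = 2x^T$ as a row vector, by comparing against central finite differences:

```python
import numpy as np

# Check that for f(x) = x^T x the derivative is h -> 2 x^T h,
# i.e. the gradient (as a row vector) is 2 x^T.
x = np.array([1.0, -2.0, 0.5])
grad = 2 * x  # components of 2 x^T

eps = 1e-6
fd = np.array([  # central differences (f(x + eps*e) - f(x - eps*e)) / (2*eps)
    ((x + eps * e) @ (x + eps * e) - (x - eps * e) @ (x - eps * e)) / (2 * eps)
    for e in np.eye(len(x))
])
assert np.allclose(grad, fd, atol=1e-8)
```

Since $f$ is quadratic, the central difference is exact up to floating-point roundoff, which is why such a tight tolerance works here.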
Now, the reason I went with $f(x) = x^Tx$ instead of $xx^T$ is that, for column vectors $x$, $xx^T$ is an $n \times n$ real matrix; nevertheless, much of what we have said above still applies, though with minor alterations (i.e. systematically replacing $u^Tv$ with $vu^T$). We cannot, however, assert that $xh^T = hx^T$ in this case; to see this, simply work out the first row of each matrix: you will find $x_1 h^T$ and $h_1 x^T$, respectively, and these are not in general the same; here $x = (x_1, x_2, \ldots, x_n)^T$ and so forth.
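A two-line check of this last point (my own addition): for generic column vectors, $xh^T$ and $hx^T$ are different matrices, though each is the transpose of the other:

```python
import numpy as np

# x h^T and h x^T are generally different matrices;
# they are transposes of one another.
x = np.array([1.0, 2.0])
h = np.array([3.0, 5.0])
assert not np.allclose(np.outer(x, h), np.outer(h, x))  # x h^T != h x^T
assert np.allclose(np.outer(x, h).T, np.outer(h, x))    # (x h^T)^T = h x^T
```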
Finally, I have gone with real vector spaces in the above, since transpose "${}^T$" is usually used in the real case, with adjoint, "${}^\dagger$" being reserved for the complex.
Hope this helps. Cheers,
and as always,
Fiat Lux!!!