I'm trying to understand the chain rule in the context of vector–matrix calculus. I want to calculate the derivatives of several vector functions:
$q_1=x^Tx$, $q_2=x \cdot x$, $q_3=xx^T$, $q_4=xx^Tx$, $q_5=(xx^T)(xx^T)$
Differentiation is with respect to the vector $x$; the functions $q_{1\dots5}$ above are various combinations of the vector $x$, and the resulting objects are:
$q_1,q_2 \rightarrow$ scalars
$q_3 \rightarrow$ matrix
$q_4 \rightarrow$ vector
$q_5 \rightarrow$ matrix
The derivative of a vector with respect to itself is the identity matrix, i.e. $\frac{dx}{dx}=\boldsymbol{1}$.
Now let's see the results obtained through the chain rule:
$\frac{dq_1}{dx}=\frac{dx}{dx}^Tx+x^T\frac{dx}{dx}=\boldsymbol{1}^Tx+x^T\boldsymbol{1}$
$\frac{dq_2}{dx}=\boldsymbol{1}x+x\boldsymbol{1}$
$\frac{dq_3}{dx}=\boldsymbol{1}x^T+x\boldsymbol{1}^T$
$\frac{dq_4}{dx}=\boldsymbol{1}x^Tx+x\boldsymbol{1}^Tx+xx^T\boldsymbol{1}$
$\frac{dq_5}{dx}=\boldsymbol{1}x^T(xx^T)+x\boldsymbol{1}^T(xx^T)+(xx^T)\boldsymbol{1}x^T+(xx^T)x\boldsymbol{1}^T$
Now let's briefly analyze the results:

- $q_1$: the sum of a row vector and a column vector; to make the result computable, we have to transpose one of the two terms
- $q_2$: a similar situation, only this time in one of the terms we would have to swap $x$ and $\boldsymbol{1}$ by hand
- $q_3$: none of the terms is computable, but logically the result of the differentiation should be a tensor, so the ordinary products would have to be replaced by Kronecker products
- $q_4$: the first and third terms are matrices, which matches the expected form of the result, but the second term has a non-computable structure, and it is not clear how to convert it into a computable one
- $q_5$: logically the result should be a tensor, but the pattern of permutations among the terms is also hard to work out
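As a numerical sanity check of the two cases whose derivatives fit into ordinary matrix notation, here is a sketch using finite differences (the helper `jacobian_fd` and its name are mine, not part of any standard API). It confirms the standard closed forms: the gradient of $q_1 = x^Tx$ is the row vector $2x^T$, and the Jacobian of $q_4 = xx^Tx$ is $2xx^T + |x|^2\boldsymbol{1}$.

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Forward-difference Jacobian of f: R^n -> R^m at the point x."""
    fx = np.atleast_1d(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.atleast_1d(f(x + e)) - fx) / h
    return J

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# q1(x) = x^T x is a scalar; its derivative is the row vector 2 x^T.
J1 = jacobian_fd(lambda x: x @ x, x)
assert np.allclose(J1, 2 * x[None, :], atol=1e-4)

# q4(x) = x x^T x = (x . x) x is a vector; its Jacobian is 2 x x^T + |x|^2 I.
J4 = jacobian_fd(lambda x: (x @ x) * x, x)
assert np.allclose(J4, 2 * np.outer(x, x) + (x @ x) * np.eye(4), atol=1e-4)
```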
My question: there must be rules for transforming the "chain" expressions obtained by differentiating complex vector–matrix expressions via the chain rule into computable results. Are such rules known? I would be grateful for help in understanding this problem.
EDIT NUMBER 3:



$ \newcommand\DD[2]{\frac{\mathrm d#1}{\mathrm d#2}} \newcommand\tDD[2]{\mathrm d#1/\mathrm d#2} \newcommand\diff{\mathrm D} \newcommand\R{\mathbb R} $
Let's change perspectives. Your rule $\tDD xx = \mathbf 1$ tells me that what you want is the total derivative; this rule is equivalent to saying that the total derivative $\diff f_x$ at any point $x \in \R^n$ of the function $f(x) = x$ is the identity, i.e. $\diff f_x(v) = v$ for all $v \in \R^n$. Your transposes are essentially stand-ins for inner products. Let $\cdot$ be the standard inner product on $\R^n$. Then we may write each of your $q$'s as
$$ q_1(x) = q_2(x) = x\cdot x,\quad q_3(x; w) = x(x\cdot w),\quad q_4(x) = (x\cdot x)x,\quad q_5(x; w) = x(x\cdot x)(x\cdot w). $$
I've interpreted the outer products $xx^T$ as functions $w \mapsto x(x\cdot w)$, and in $q_5$ I've used the associativity of matrix multiplication to get
$$ (xx^T)(xx^T) = x(x^Tx)x^T. $$
When taking a total derivative $\diff f_x$, we may leave the point of evaluation $x$ implicit and write e.g. $\diff[f(x)]$ or even just $\diff f$ if $f$ is implicitly a function of $x$. If we want to differentiate with respect to a variable other than $x$, e.g. $y$, we will write e.g. $\diff_y[x + 2y](v) = 2v$. The total derivative has three fundamental properties:

1. It is linear: $\diff[af + bg] = a\,\diff f + b\,\diff g$ for constants $a, b$.
2. It satisfies the product (Leibniz) rule for any bilinear product $\star$: $\diff[f \star g] = \dot\diff[\dot f \star g] + \dot\diff[f \star \dot g]$, where the dot marks the factor the derivative acts on.
3. It satisfies the chain rule: $\diff[f \circ g]_x = \diff f_{g(x)} \circ \diff g_x$.
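The total derivative defined this way can be checked numerically: $\diff f_x(v)$ is just the directional derivative of $f$ at $x$ along $v$. A minimal sketch (the helper `total_derivative_fd` is my own name, not a library function), verifying $\diff[q_1](v) = 2v\cdot x$:

```python
import numpy as np

def total_derivative_fd(f, x, v, h=1e-6):
    """Approximate the total derivative D f_x(v) by a central difference
    along the direction v: (f(x + h v) - f(x - h v)) / (2 h)."""
    return (f(x + h * v) - f(x - h * v)) / (2 * h)

rng = np.random.default_rng(1)
x, v = rng.standard_normal(5), rng.standard_normal(5)

# For q1(x) = x . x, the total derivative is D[q1](v) = 2 v . x.
lhs = total_derivative_fd(lambda x: x @ x, x, v)
assert np.isclose(lhs, 2 * (v @ x), atol=1e-6)
```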
Let's apply these properties to each $q$:
$$ \diff[q_1](v) = \diff[x\cdot x](v) = \dot\diff[\dot x\cdot x](v) + \dot\diff[x\cdot\dot x](v) = 2\dot\diff[\dot x\cdot x](v) = 2v\cdot x, $$
$$ \diff[q_3](v) = \diff[x(x\cdot w)](v) = \dot\diff[\dot x(x\cdot w)](v) + \dot\diff[x(\dot x\cdot w)](v) = v(x\cdot w) + x(v\cdot w), $$
$$ \diff[q_4](v) = 2(v\cdot x)x + (x\cdot x)v, $$
$$ \diff[q_5](v) = v(x\cdot x)(x\cdot w) + 2x(v\cdot x)(x\cdot w) + x(x\cdot x)(v\cdot w), $$
in summary
$$ \diff[q_1](v) = 2v\cdot x,\quad \diff[q_3(x; w)](v) = v(x\cdot w) + x(v\cdot w),\quad \diff[q_4](v) = 2(v\cdot x)x + (x\cdot x)v, $$
$$ \diff[q_5(x; w)](v) = v(x\cdot x)(x\cdot w) + 2x(v\cdot x)(x\cdot w) + x(x\cdot x)(v\cdot w). $$
Note how $\diff[q_3]$ and $\diff[q_5]$ end up with two vector parameters $v, w$; this indicates that these derivatives are higher-order tensors (where by "tensor" we mean a multilinear map). The tensor types of each of the above are
$$ \diff[q_1]: (0, 1),\quad \diff[q_3]: (1, 2),\quad \diff[q_4]: (1, 1),\quad \diff[q_5]: (1, 2). $$
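These formulas can be verified against finite differences. A sketch using a central-difference helper (`dfd` is my own name), checking the summarized $\diff[q_4]$ and $\diff[q_5(x; w)]$ above:

```python
import numpy as np

def dfd(f, x, v, h=1e-6):
    """Central-difference approximation of the total derivative D f_x(v)."""
    return (f(x + h * v) - f(x - h * v)) / (2 * h)

rng = np.random.default_rng(2)
x, v, w = (rng.standard_normal(3) for _ in range(3))

# D[q4](v) = 2(v . x)x + (x . x)v, for q4(x) = (x . x) x.
num = dfd(lambda x: (x @ x) * x, x, v)
sym = 2 * (v @ x) * x + (x @ x) * v
assert np.allclose(num, sym, atol=1e-6)

# D[q5(x; w)](v) = v(x.x)(x.w) + 2x(v.x)(x.w) + x(x.x)(v.w),
# with q5(x; w) = x (x.x)(x.w), i.e. (x x^T)(x x^T) applied to w.
num = dfd(lambda x: x * (x @ x) * (x @ w), x, v)
sym = v * (x @ x) * (x @ w) + 2 * x * (v @ x) * (x @ w) + x * (x @ x) * (v @ w)
assert np.allclose(num, sym, atol=1e-5)
```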
In this case, $(p, q)$ says that $q$ vectors are inputs and $p$ vectors are outputs. We call $p + q$ the degree of the tensor. We can translate these back into index/tensor notation as follows:
$$ (\diff[q_1])_i = 2x_i \sim 2x^T, $$
$$ (\diff[q_3])_{ij}^k = \delta^k_ix_j + \delta_{ij}x^k \sim \mathbf1\otimes x^T + x\otimes\mathbf g, $$
$$ (\diff[q_4])_i^j = 2x_ix^j + x_kx^k\delta_i^j \sim 2x\otimes x^T + |x|^2\mathbf1, $$
$$ (\diff[q_5])_{ij}^k = \delta_i^kx_lx^lx_j + 2x^kx_ix_j + x^kx_lx^l\delta_{ij} \sim |x|^2\,\mathbf1\otimes x^T + 2x\otimes x^T\otimes x^T + |x|^2\,x\otimes\mathbf g. $$
In this context, $x^T$ is best thought of as the $(0,1)$ tensor dual to $x$. $\mathbf1$ is the $(1,1)$ identity tensor, which can be thought of as the identity matrix. Closely related is the metric tensor $\mathbf g(v, w) = v\cdot w$. Only $\diff[q_1]$ and $\diff[q_4]$ can be written in matrix notation, since they are the only tensors of degree $\leq2$; for $\diff[q_4]$ we could write
$$ \diff[q_4] \sim 2xx^T + |x|^2\mathbf1. $$
We can see from the above precisely where your equations fail