Partial derivative of product of complex matrices


Given a quantity

$$e(n) = d(n) - \mathbf{w}^{H}(n)\mathbf{u}(n)$$

where $\mathbf{w}(n)$ and $\mathbf{u}(n)$ are both $M \times 1$ complex vectors, I am trying to derive the gradient of $|e(n)|^{2}$ with respect to $\mathbf{w}(n)$. To do that,

$$e(n) = d(n) - \mathbf{w}^{H}(n)\mathbf{u}(n)$$ $$e^{*}(n) = d^{*}(n) - \mathbf{u}^{H}(n)\mathbf{w}(n)$$

Hence,

$$|e(n)|^{2} = |d(n)|^{2} - d^{*}(n)\mathbf{w}^{H}(n)\mathbf{u}(n) - d(n)\mathbf{u}^{H}(n)\mathbf{w}(n) + \mathbf{w}^{H}(n)\mathbf{u}(n)\mathbf{u}^{H}(n)\mathbf{w}(n)$$
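As a quick numerical sanity check of this expansion (a sketch using NumPy, with random complex data standing in for $d(n)$, $\mathbf{w}(n)$ and $\mathbf{u}(n)$):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4

# Hypothetical random complex data standing in for d(n), w(n), u(n).
d = rng.standard_normal() + 1j * rng.standard_normal()
w = rng.standard_normal(M) + 1j * rng.standard_normal(M)
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)

e = d - np.vdot(w, u)          # np.vdot(w, u) conjugates w, i.e. w^H u
lhs = abs(e) ** 2

# Expanded form: |d|^2 - d* w^H u - d u^H w + w^H u u^H w
rhs = (abs(d) ** 2
       - np.conj(d) * np.vdot(w, u)
       - d * np.vdot(u, w)
       + np.vdot(w, u) * np.vdot(u, w))

print(np.isclose(lhs, rhs))  # True
```

The two cross terms are conjugates of each other, so their sum is real, as the check confirms.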

When calculating the partial derivative w.r.t. $\mathbf{w}(n)$, I am not sure if I am doing it correctly.

\begin{align} \nabla|e(n)|^{2} =& \, 0 - 0 - d(n)\mathbf{u}^{H}(n) + \mathbf{w}^{H}(n)\mathbf{u}(n)\mathbf{u}^{H}(n) (?) \end{align}

Any help will be much appreciated.

There are 2 solutions below.

Answer 1:

Let me try my best to recall what I learned, but please do correct me if anything is wrong.

You are close to right; you are only missing a factor of $2$ in the last term (it comes from the real-vector quadratic-form rule derived below). \begin{align} \nabla|e(n)|^{2} = - d(n)\mathbf{u}^{H}(n) + 2\mathbf{w}^{H}(n)\mathbf{u}(n)\mathbf{u}^{H}(n) \end{align}


Updated

Consider a linear function of the form \begin{align} f(\mathbf{x})=\mathbf{a}^\mathrm{T}\mathbf{x} \end{align} where $\mathbf{a}$ and $\mathbf{x}$ are $n$-dimensional vectors. The superscript $\mathrm{T}$ denotes the transpose for real vectors; the corresponding complex version is the Hermitian transpose, denoted $\mathrm{H}$. We already know that \begin{align} \frac{\partial x_i}{\partial x_j}=\delta_{ij}, \end{align} and that the gradient of a scalar function is defined as a column vector whose components are the partial derivatives. Let's look inside the vectors to derive the gradient of a linear function (here obtained from a vector inner product). \begin{align} \frac{\partial}{\partial x_k}f(\mathbf{x})&=\frac{\partial}{\partial x_k}\left(\begin{bmatrix}a_1&\ldots&a_n\end{bmatrix}\begin{bmatrix} x_1\\ \vdots\\ x_n\end{bmatrix}\right)\\ &=\begin{bmatrix}a_1&\ldots&a_n\end{bmatrix}\frac{\partial}{\partial x_k}\left(\begin{bmatrix} x_1\\ \vdots\\ x_n\end{bmatrix}\right)\\ &=\begin{bmatrix}a_1&\ldots&a_n\end{bmatrix}\begin{bmatrix} \delta_{1k}\\ \vdots\\ \delta_{nk}\end{bmatrix}\\ &=a_k \end{align} We then assemble the resulting component derivatives for $k=1,\ldots,n$: \begin{align} \nabla f(\mathbf{x})=\begin{bmatrix} \frac{\partial}{\partial x_1}\\ \vdots \\ \frac{\partial}{\partial x_n} \end{bmatrix}f(\mathbf{x})=\begin{bmatrix}\vdots\\\frac{\partial}{\partial x_k}\\ \vdots\end{bmatrix}\mathbf{a}^\mathrm{T}\mathbf{x}=\begin{bmatrix}\vdots\\a_k\\\vdots\end{bmatrix}=\mathbf{a} \end{align} Note: this gradient definition should not be confused with the definition of the derivative of a function $f: \mathbb{R}^n \to \mathbb{R}$ w.r.t. the column vector $\mathbf{x}$.
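The identity $\nabla(\mathbf{a}^\mathrm{T}\mathbf{x})=\mathbf{a}$ can be checked numerically with central finite differences; a small NumPy sketch (the random data is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
a = rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda x: a @ x  # f(x) = a^T x

# Central finite-difference gradient, one component at a time.
h = 1e-6
grad = np.empty(n)
for k in range(n):
    ek = np.zeros(n); ek[k] = 1.0
    grad[k] = (f(x + h * ek) - f(x - h * ek)) / (2 * h)

print(np.allclose(grad, a))  # True
```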
If $y$ is a scalar function of an independent scalar variable $x$, then the differential $\mathrm{d}y$ of $y$ is related to $\mathrm{d}x$ by the formula \begin{align} \mathrm{d}y =\frac{\mathrm{d}y}{\mathrm{d}x}\mathrm{d}x \end{align} where $\mathrm{d}y/\mathrm{d}x$ denotes the derivative of $y$ with respect to $x$. The definition remains applicable when the independent variable has higher dimensions, like a vector or matrix. For instance, the differential of the scalar function $f:\mathbb{R}^n\to\mathbb{R}$ is still a scalar \begin{align} \mathrm{d}f(\mathbf{x})&=\begin{bmatrix}\frac{\partial f(\mathbf{x})}{\partial x_1}&\dots&\frac{\partial f(\mathbf{x})}{\partial x_n}\end{bmatrix}\mathrm{d}\mathbf{x} \end{align} which is also consistent with the result of a column vector $\mathrm{d}\mathbf{x}$ left-multiplied by a row vector. Since $f(\mathbf{x})= \mathbf{a}^\mathrm{T}\mathbf{x}$ is a scalar function, the result of differentiation w.r.t. $\mathbf{x}$ must be a row vector. In this way \begin{align} \frac{\mathrm{d}f(\mathbf{\mathbf{x}})}{\mathrm{d}\mathbf{x}}=\frac{\mathrm{d}\mathbf{a}^\mathrm{T}\mathbf{x}}{\mathrm{d}\mathbf{x}}=\frac{\mathrm{d}\mathbf{x}^\mathrm{T}\mathbf{a}}{\mathrm{d}\mathbf{x}}=\mathbf{a}^\mathrm{T}=\nabla^\mathrm{T}\!f(\mathbf{x}) \end{align} Therefore, we can conclude that the derivative with respect to a column vector ends up being a row vector. The full version of the chain rule for such composed functions can be written as \begin{align} \frac{\mathrm{d}f(g,h)}{\mathrm{d}\mathbf{x}}=\frac{\mathrm{d}(g(\mathbf{x})^T)}{\mathrm{d}\mathbf{x}} \frac{\partial f(g,h)}{\partial g}+\frac{\mathrm{d}(h(\mathbf{x})^T)}{\mathrm{d}\mathbf{x}} \frac{\partial f(g,h)}{\partial h} \end{align} where $f,g,h:\mathbb R^n\to\mathbb R$.
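The chain rule above can be exercised on a concrete (hypothetical) example, $f(g,h)=gh$ with $g(\mathbf{x})=\mathbf{a}^\mathrm{T}\mathbf{x}$ and $h(\mathbf{x})=\mathbf{b}^\mathrm{T}\mathbf{x}$, for which it predicts the gradient $h\,\mathbf{a}+g\,\mathbf{b}$; a NumPy finite-difference sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
a = rng.standard_normal(n)
b = rng.standard_normal(n)
x = rng.standard_normal(n)

# f(g, h) = g * h with g(x) = a^T x, h(x) = b^T x (hypothetical choices).
g, h = a @ x, b @ x
f = lambda x: (a @ x) * (b @ x)

# Chain rule prediction: grad f = h * grad g + g * grad h = h a + g b.
chain = h * a + g * b

# Central finite-difference check, one basis direction per component.
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])

print(np.allclose(chain, fd))  # True
```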

Since a scalar is invariant under transposition, $\mathbf{a}^\mathrm{T}\mathbf{x}=\mathbf{x}^\mathrm{T}\mathbf{a}$, which was used above. Now consider a quadratic function \begin{align} f(\mathbf{x})=\mathbf{x}^\mathrm{T}\mathbf{A}\mathbf{x} \end{align} where $\mathbf{x}$ is an $n$-dimensional vector and $\mathbf{A}$ is an $n\times n$ matrix. We can use the product rule (the chain rule above with $f(g,h)=gh$) to derive the gradient in matrix notation in a straightforward way. \begin{align} \frac{\partial}{\partial x_k}f(\mathbf{x})&=\frac{\partial}{\partial x_k}\left(\begin{bmatrix}x_1&\ldots&x_n\end{bmatrix}\begin{bmatrix}a_{11}&\ldots&a_{1n}\\\vdots&\ddots&\vdots\\a_{n1}&\ldots&a_{nn}\\\end{bmatrix}\begin{bmatrix}x_1\\\vdots\\x_n\end{bmatrix}\right)\\ &=\left(\frac{\partial}{\partial x_k}\begin{bmatrix}x_1&\ldots&x_n\end{bmatrix}\right)\mathbf{A}\begin{bmatrix}x_1\\\vdots\\x_n\end{bmatrix}+\begin{bmatrix}x_1&\ldots&x_n\end{bmatrix}\mathbf{A}\frac{\partial}{\partial x_k}\begin{bmatrix}x_1\\\vdots\\x_n\end{bmatrix}\\ &=\begin{bmatrix}\delta_{1k}&\ldots&\delta_{nk}\end{bmatrix}\mathbf{A}\begin{bmatrix}x_1\\\vdots\\x_n\end{bmatrix}+\begin{bmatrix}x_1&\ldots&x_n\end{bmatrix}\mathbf{A}\begin{bmatrix}\delta_{1k}\\\vdots\\\delta_{nk}\end{bmatrix}\\ &=a_{ki}x_i+x_ia_{ik} \end{align} (with summation over $i$ implied). Therefore, after assembling all the components, we have \begin{align} \nabla f(\mathbf{x})=(\mathbf{A}^\mathrm{T}+\mathbf{A})\mathbf{x} \end{align}
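The quadratic-form rule $\nabla(\mathbf{x}^\mathrm{T}\mathbf{A}\mathbf{x})=(\mathbf{A}^\mathrm{T}+\mathbf{A})\mathbf{x}$ also admits a quick finite-difference check; a NumPy sketch with a deliberately non-symmetric random $\mathbf{A}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))   # a general (non-symmetric) matrix
x = rng.standard_normal(n)

f = lambda x: x @ A @ x  # f(x) = x^T A x

grad_formula = (A.T + A) @ x

# Central finite-difference gradient for comparison.
eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

print(np.allclose(grad_formula, grad_fd))  # True
```

Note that only for symmetric $\mathbf{A}$ does this reduce to the familiar $2\mathbf{A}\mathbf{x}$.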

Please note the convention used here: the gradient is defined as a column vector.

Answer 2:

In this kind of situation, you usually need to treat $\mathbf{w}^*$ as a fixed variable. It follows that the 'correct' Wirtinger gradient is $$ -d[n] \mathbf{u}^*[n]+ \mathbf{u}^*[n] \mathbf{u}^T[n] \mathbf{w}^* = -\mathbf{u}^*[n] \left(d[n]-\mathbf{u}^T[n] \mathbf{w}^* \right)= -e[n] \mathbf{u}^*[n] $$
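This Wirtinger gradient can be verified numerically: with $\mathbf{w}=\mathbf{a}+i\mathbf{b}$, the Wirtinger derivative is $\partial/\partial w_k = \tfrac12(\partial/\partial a_k - i\,\partial/\partial b_k)$, which can be approximated by finite differences. A NumPy sketch with random complex data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
M = 4
d = rng.standard_normal() + 1j * rng.standard_normal()
w = rng.standard_normal(M) + 1j * rng.standard_normal(M)
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)

F = lambda w: abs(d - np.vdot(w, u)) ** 2   # |e(n)|^2, vdot(w, u) = w^H u

e = d - np.vdot(w, u)
grad_formula = -e * np.conj(u)              # the claimed -e[n] u*[n]

# Numerical Wirtinger derivative: dF/dw_k = (dF/da_k - i dF/db_k) / 2
h = 1e-6
grad_fd = np.empty(M, dtype=complex)
for k in range(M):
    ek = np.zeros(M); ek[k] = 1.0
    dre = (F(w + h * ek) - F(w - h * ek)) / (2 * h)       # vary Re(w_k)
    dim = (F(w + 1j * h * ek) - F(w - 1j * h * ek)) / (2 * h)  # vary Im(w_k)
    grad_fd[k] = 0.5 * (dre - 1j * dim)

print(np.allclose(grad_formula, grad_fd))  # True
```

Note there is no factor of $2$ here: under Wirtinger calculus the cross term involving $\mathbf{w}^*$ is held fixed, unlike in the real-vector derivation.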