Hessian of a squared bilinear form

132 Views Asked by At

I have to expand the function $f:\mathbb{R}^n \times \mathbb{R}^m \rightarrow \mathbb{R}$ in Taylor series where $$ f(x,y) = (x^TAy + B^Tx + C^Ty)^2$$

with $A\in\mathbb{R}^{n\times m}$, $B\in\mathbb{R}^n$ and $C\in\mathbb{R}^m$. I could rewrite it in two forms $$\begin{aligned} f(x,y) &= x^T(Ay+B)(Ay+B)^Tx + 2x^T(Ay+B)C^Ty +y^TCC^Ty\\ &= y^T(A^Tx+C)(A^Tx+C)^Ty + 2y^T(A^Tx+C)B^Tx +x^TBB^Tx \end{aligned}$$

To quickly check that $$ \begin{aligned} \frac{\partial f}{\partial x} (x,y) &= 2(Ay+B)(Ay+B)^Tx + 2(Ay+B)C^Ty\\ \frac{\partial f}{\partial y} (x,y) &= 2(A^Tx+C)(A^Tx+C)^Ty + 2(A^Tx+C)B^Tx \end{aligned} $$ and $$ \begin{aligned} \frac{\partial^2 f}{\partial x^2} (x,y) &= 2(Ay+B)(Ay+B)^T\\ \frac{\partial^2 f}{\partial y^2} (x,y) &= 2(A^Tx+C)(A^Tx+C)^T \end{aligned} $$

However, what about the cross term $\frac{\partial^2 f}{\partial x\partial y}$? How to obtain it? The only thing I could think of was to calculate the gradient of each row of $\frac{\partial f}{\partial x}$ with respect to $y$ and I obtained $$\frac{\partial^2 f_i}{\partial x\partial y} (x,y) = 2(a_i^Tx^TA+A^Txa_i)y +2A^Txb_i+2B^Txa_i^T + 2(Ca_i+a_i^TC^T)y + 2Cb_i$$ where $a_i$ and $b_i$ are the $i$-th rows of $A$ and $B$. However, I tried to construct the matrix $\frac{\partial^2 f}{\partial x\partial y}$ and I couldn't really find an single expression for it in terms of $A, B$ and $C$ in the end. Is it possible?

I would appreciate very much any suggestions. =)

1

There are 1 best solutions below

2
On BEST ANSWER

$\def\m#1{\left[\begin{array}{c}#1\end{array}\right]}\def\p{{\partial}}\def\grad#1#2{\frac{\p #1}{\p #2}}\def\hess#1#2#3{\frac{\p^2 #1}{\p #2\,\p #3^T}}$Combine vector pairs into partitioned vectors $$\eqalign{ w &= \pmatrix{x\\y} \qquad h &= \pmatrix{b\\c} \qquad w,h \in {\mathbb R}^{p} \\ }$$ where $p=m+n.\;$ Then consider the following partitioning of the identity matrix $$\eqalign{ I_p = \m{I_n&0_{nm}\\0_{mn}&I_m} = \big[{E_1\;\big|\; E_2}\big] }$$ The $E_k$ are matrix analogs of the standard basis vectors, which can be used either to extract components of a partitioned vector or matrix, e.g. $$\eqalign{ x &= E_1^Tw \qquad y = E_2^Tw \qquad M_{ij} = E_i^TME_j \\ }$$ or to construct a partitioned vector or matrix from such components, e.g. $$\eqalign{ w &= E_1x + E_2y \qquad M &= \sum_{i=1}^2 \sum_{i=j}^2 E_i M_{ij} E_j^T \qquad I_p &= \left(E_1E_1^T + E_2E_2^T\right) \\ }$$ Applying this to the current problem $$\eqalign{ x^TAy &= (E_1^Tw)^TA(E_2^Tw) = w^T(E_1AE_2^T)w = w^TMw \\ M &= \m{0_{nn}&A\\0_{mn}&0_{mm}}\\ }$$ Next, consider the following scalar function and its gradient and hessian. $$\eqalign{ \phi &= x^TAy + b^Tx + c^Ty \;=\; w^TMw + h^Tw \\ \grad{\phi}{w} &= 2Mw + h \\ \hess{\phi}{w}{w} &= 2M \\ }$$ Finally, write your function in terms of this function.
Then calculate its gradient and hessian with respect to $w$. $$\eqalign{ f &= \phi^2 \\ \grad{f}{w} &= 2\phi\left(\grad{\phi}{w}\right) \\ \hess{f}{w}{w} &= 2\phi\left(\hess{\phi}{w}{w}\right) + 2\left(\grad{\phi}{w}\right)\left(\grad{\phi}{w}\right)^T \\ }$$ Look at the partitioned form and extract the desired component hessians, e.g. $$\eqalign{ \hess{f}{w}{w} &= \m{ \hess{f}{x}{x} & \hess{f}{x}{y} \\ \hess{f}{y}{x} & \hess{f}{y}{y}} \quad\implies\quad \hess{f}{x}{y} &= E_1^T \left(\hess{f}{w}{w}\right) E_2 \\\\ }$$


But it looks like you're after the Taylor series.

Expanding about the point $w_0$ yields $$\eqalign{ f &= f(w) \\ f_0 &= f(w_0) \\ f &= f_0 + \left(\grad{f}{w}\right)^T(w-w_0) + \frac 12\,(w-w_0)^T\left(\hess{f}{w}{w}\right)(w-w_0) \\ }$$ which can be expressed much more concisely using differentials $$\eqalign{ df &= (f-f_0) \\ dw &= (w-w_0) \\ df &= \left(\grad{f}{w}\right)^Tdw + \frac 12\,dw^T\left(\hess{f}{w}{w}\right)dw \\ }$$ To break it into its block components, insert the identity matrix $I_p$ between the various factors $$\eqalign{ df &= \left(\grad{f}{w}\right)^T\left(E_1E_1^T + E_2E_2^T\right)dw \\ &+\; \frac 12\,dw^T\left(E_1E_1^T + E_2E_2^T\right)\left(\hess{f}{w}{w}\right)\left(E_1E_1^T + E_2E_2^T\right)dw \\ }$$ and expand to get an expression in terms of the original $(x,y)$ variables
$-$ I'll leave that exercise to you.