Given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^m$, let $f : \mathbb{R}^{n+m} \to \mathbb{R}$ be defined as $$f(x,y) := \frac{1}{2} \|Ax-(b^Ty)y\|_2^2$$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. Find the gradient $\nabla f(x,y)$ and the Hessian $\nabla^2 f(x,y)$.
I tried expanding the expression for $f(x,y)$ and computing the partial derivatives with respect to $x$ and $y$, but I could not carry the calculation through.
$\def\bb{\mathbb}$ Combine the vectors $x\in{\bb R}^{n}$ and $y\in{\bb R}^{m}$ into a single long vector
$$\eqalign{ w = \pmatrix{x\\y} \in{\bb R}^p\qquad p=m+n \\ }$$
and define block matrix analogs of the standard basis vectors
$$\eqalign{ &&e_1 = \pmatrix{{\tt1}\\0},\qquad &e_2=\pmatrix{0\\{\tt1}}\\ &&E_1 = \pmatrix{I_n\\0_n},\qquad &E_2 = \pmatrix{0_m\\I_m}, \qquad 0_n\in{\bb R}^{(p-n)\times n}, \quad 0_m\in{\bb R}^{(p-m)\times m} \\ &{\rm so\,that} \\ &&x = E_1^Tw, \qquad &y=E_2^Tw \\ }$$

The following vector and its differential will prove useful.
$$\eqalign{ h &= Ax - (b^Ty)y \\ &= AE_1^Tw - (b^TE_2^Tw)E_2^Tw \\ dh &= AE_1^Tdw - (b^TE_2^Tdw)E_2^Tw - (b^TE_2^Tw)E_2^Tdw \\ &= \Big(AE_1^T - E_2^Twb^TE_2^T - (b^TE_2^Tw)E_2^T\Big)\,dw \\ &= M\,dw \\ }$$

Write the function in terms of these new variables and calculate its gradient.
$$\eqalign{ f &= \tfrac 12 h^Th \\ df &= h^Tdh \\&= h^TM\,dw \\&= (M^Th)^Tdw \\ \nabla f \doteq \frac{\partial f}{\partial w} &= M^Th \\ &= \Big(AE_1^T - E_2^Twb^TE_2^T - (b^TE_2^Tw)E_2^T\Big)^Th \\ &= \Big(E_1A^T - E_2by^T - (y^Tb)E_2\Big)\, \Big(Ax - (b^Ty)y\Big) \\ &= \pmatrix{A^Th \\ -(y^Th)\,b - (b^Ty)\,h} \\ }$$

So that's the gradient vector. Its top block is $\partial f/\partial x$ and its bottom block is $\partial f/\partial y$.
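If you want to sanity-check the formula numerically, here is a minimal NumPy sketch that compares $M^Th$ against central finite differences of $f$. Multiplying out the block matrices, the $E_1$ part of $M^Th$ is $A^Th$ and the $E_2$ part is $-(y^Th)\,b-(b^Ty)\,h$, which is what the code uses. The function names (`f`, `grad_f`, `numeric_grad`), the random test data, and the step size `eps` are illustrative choices of mine, not part of the problem.

```python
import numpy as np

def f(x, y, A, b):
    """f(x, y) = 0.5 * ||A x - (b^T y) y||_2^2."""
    h = A @ x - (b @ y) * y
    return 0.5 * (h @ h)

def grad_f(x, y, A, b):
    """Analytic gradient M^T h with respect to w = (x, y), length n + m."""
    h = A @ x - (b @ y) * y
    gx = A.T @ h                        # E_1 block: A^T h
    gy = -(y @ h) * b - (b @ y) * h     # E_2 block: -(b y^T + (y^T b) I) h
    return np.concatenate([gx, gy])

def numeric_grad(x, y, A, b, eps=1e-6):
    """Central finite differences of f with respect to w = (x, y)."""
    n = x.size
    w = np.concatenate([x, y])
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        wp, wm = w + step, w - step
        g[i] = (f(wp[:n], wp[n:], A, b) - f(wm[:n], wm[n:], A, b)) / (2 * eps)
    return g

# Small random instance (sizes and seed are arbitrary).
rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)
y = rng.standard_normal(m)

# Largest componentwise disagreement between the two gradients.
max_err = np.max(np.abs(grad_f(x, y, A, b) - numeric_grad(x, y, A, b)))
print(max_err)
```

The printed discrepancy should be tiny compared with the size of the gradient entries; a large value would point to a sign or transpose slip somewhere in the derivation.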
I'll leave the Hessian matrix to you.
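If you would like something to check your result against, here is a sketch of where the calculation should lead, using the same notation as above. The shorthand $N$ is introduced here only for compactness, and you should verify each step yourself rather than take it on trust.

$$\eqalign{
N &= yb^T + (b^Ty)\,I_m \qquad \big({\rm so\,that}\ \ M = AE_1^T - NE_2^T\big) \\
d(\nabla f) &= M^Tdh + (dM)^Th \\
&= M^TM\,dw - E_2\Big(b\,dy^T + (b^Tdy)\,I_m\Big)h \\
&= \Big(M^TM - E_2\big(bh^T + hb^T\big)E_2^T\Big)\,dw \\
\nabla^2 f &= M^TM - E_2\big(bh^T + hb^T\big)E_2^T \\
&= \pmatrix{A^TA & -A^TN \\ -N^TA & N^TN - bh^T - hb^T} \\
}$$

The first term is the usual Gauss-Newton part $M^TM$. The second term appears because $h$ is quadratic in $y$, so $M$ itself depends on $w$.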