I have a function $f(\mathbf{x})$: $\mathbb{R}^N \rightarrow \mathbb{R}$ given by:
$f(\mathbf{x}) = \lvert \mathbf{A}(\mathbf{x} \circ \mathbf{x})-\mathbf{b} \rvert^2$,
with $\mathbf{A} \in \mathbb{R}^{N\times N}$ a constant matrix, $\mathbf{b} \in \mathbb{R}^N$ a known vector, $\lvert \cdot \rvert^2$ the squared $\ell_2$ norm, and $\circ$ the Hadamard product. I would like to know the expression for the gradient of $f$ with respect to $\mathbf{x}$.
For typing convenience, define the vector
$$w = A(x\circ x) - b$$
and use a colon to denote the trace/Frobenius product, i.e.
$$\eqalign{
A:B &= {\rm Tr}(AB^T) \;\;\;\,=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \\
A:(B\circ C) &= (A\circ B):C \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \\
A:A &= \big\|A\big\|^2_F \\
A:B &= B:A \;\;=\; B^T:A^T \\
CA:B &= C:BA^T = A:C^TB \\
}$$
Write the function in terms of the above, then calculate its differential and gradient.
$$\eqalign{
f &= w:w \\
df &= 2w:dw \\
&= 2w:A\,(x\circ dx + dx\circ x) \\
&= 2A^Tw:(2x\circ dx) \\
&= (4x\circ A^Tw):dx \\
\frac{\partial f}{\partial x} &= 4x\circ A^Tw \\
}$$
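As a sanity check, the formula $4x\circ A^Tw$ can be compared against a central finite-difference approximation. The following is a minimal sketch, assuming NumPy is available; the names `f`, `grad_f`, `numerical_grad`, the step `h`, and the random test data are illustrative choices, not part of the original problem.

```python
import numpy as np

def f(x, A, b):
    # f(x) = ||A (x o x) - b||^2, with o the Hadamard product
    w = A @ (x * x) - b
    return w @ w

def grad_f(x, A, b):
    # Closed-form gradient: 4 x o (A^T w)
    w = A @ (x * x) - b
    return 4.0 * x * (A.T @ w)

def numerical_grad(x, A, b, h=1e-6):
    # Central differences, one coordinate at a time (h is an assumed step size)
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e, A, b) - f(x - e, A, b)) / (2.0 * h)
    return g

# Arbitrary random test problem (assumed data, for illustration only)
rng = np.random.default_rng(0)
N = 5
A = rng.standard_normal((N, N))
b = rng.standard_normal(N)
x = rng.standard_normal(N)

max_abs_diff = np.max(np.abs(grad_f(x, A, b) - numerical_grad(x, A, b)))
print("largest discrepancy between the two gradients:", max_abs_diff)
```

The discrepancy should be small relative to the size of the gradient itself; a large value would point to an error in either the derivation or the code.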
Note that at least two types of solutions for $x$ yield a gradient of zero.
The first is $$x=0$$ and the second, assuming $A$ is invertible and every element of $A^{-1}b$ is nonnegative (so that the square root is real), is a family of up to $2^N$ solutions which can be written using the inverse matrix, an element-wise square root, and a sign vector with elements $s_k=\pm 1$ $$\eqalign{ &x = s\circ\big(A^{-1}b\big)^{\circ\frac 12} \\ &x\circ x = A^{-1}b \\ }$$ For this family $w=0$, so $f=0$ and these points are global minimizers.

The question in the comments is:
If you use a gradient descent algorithm, which solution will it converge to?
It likely depends on your initial guess for $x$, but it's possible that the iterations will be strongly attracted to one of the solutions no matter where you start the algorithm.
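One observation follows directly from the gradient formula: a gradient step with step size $t$ updates each component as $x_k \leftarrow x_k\,\big(1 - 4t\,(A^Tw)_k\big)$, so for sufficiently small steps the sign of each $x_k$ is preserved, and a component that starts at exactly zero stays there. The sketch below is one way to experiment with this numerically. It assumes NumPy, uses a backtracking (Armijo) line search as an illustrative choice, and builds a test problem in which $A^{-1}b$ is positive by construction; the function names, parameters, and data are all hypothetical.

```python
import numpy as np

def f(x, A, b):
    w = A @ (x * x) - b
    return w @ w

def grad_f(x, A, b):
    w = A @ (x * x) - b
    return 4.0 * x * (A.T @ w)

def gradient_descent(x0, A, b, iters=500, t0=1.0, c=1e-4):
    # Gradient descent with backtracking line search (assumed parameters).
    # A step is accepted only if it satisfies the Armijo decrease condition,
    # so f never increases from one iterate to the next.
    x = x0.copy()
    for _ in range(iters):
        g = grad_f(x, A, b)
        fx = f(x, A, b)
        t = t0
        accepted = False
        for _ in range(60):
            x_new = x - t * g
            if f(x_new, A, b) <= fx - c * t * (g @ g):
                accepted = True
                break
            t *= 0.5
        if not accepted:
            break  # no acceptable step found; stop at the current point
        x = x_new
    return x

# Test problem with A^{-1} b > 0 by construction (assumed data)
rng = np.random.default_rng(1)
N = 4
A = np.eye(N) + 0.1 * rng.standard_normal((N, N))
y = rng.uniform(0.5, 2.0, N)   # plays the role of A^{-1} b
b = A @ y
root = np.sqrt(np.linalg.solve(A, b))

results = []
for trial in range(5):
    x0 = rng.standard_normal(N)
    x_end = gradient_descent(x0, A, b)
    # Distance to the member of the family s o sqrt(A^{-1} b) whose signs match x_end
    dist = np.linalg.norm(x_end - np.sign(x_end) * root)
    results.append((f(x0, A, b), f(x_end, A, b), np.sign(x0), np.sign(x_end)))
    print(trial, "f start:", results[-1][0], "f end:", results[-1][1],
          "start signs:", results[-1][2], "end signs:", results[-1][3],
          "distance to matching root:", dist)
```

Comparing the start and end sign patterns across several initial guesses gives a quick empirical picture of which member of the solution family each run approaches. Because the line search may take large early steps, sign flips are possible in practice; capping `t0` at a small value makes the iteration follow the sign-preserving behavior more closely.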