I am trying to calculate the derivative of $$(\mathbf{Y-X \beta})^T\mathbf{P}(\mathbf{Y-X \beta}) $$ where $\mathbf{P}$ is a positive definite matrix. The actual dimensions of each element are not given in the question specification, but since the aim is to minimise over $\beta$ for regression analysis, I think $\mathbf{X}$ is $m\times n$, $\mathbf{\beta}\in \mathbf{R}^n$ and $\mathbf{Y}\in \mathbf{R}^m$ (so $\mathbf{P}$ would be $m\times m$). First, I expand the expression,
$$(\mathbf{Y-X \beta})^T\mathbf{P}(\mathbf{Y-X \beta}) = (\mathbf{Y^TP-\beta^T\mathbf{X}^TP})(\mathbf{Y-X \beta}) = \mathbf{Y^TPY-Y^TPX\beta -\beta^TX^TPY+\beta^TX^TPX\beta} $$
Now I take the derivative with respect to $\beta$. For the final term, I use that it is a quadratic form, and I think I am assuming $\mathbf{X^TPX}$ is symmetric. I am just using the identities from https://en.wikipedia.org/wiki/Matrix_calculus and I get,
$$\mathbf{-Y^TPX-Y^TPX}+2\mathbf{\beta^TX^TPX} = -2\mathbf{Y^TPX+2\beta^TX^TPX}$$
From here, I can set this equal to $0$ and take the transpose to solve for $\beta$ (assuming for now that everything is invertible).
$$\mathbf{\beta^TX^TPX=Y^TPX}\iff \mathbf{X^TPX\beta=X^TPY} \iff \beta=\mathbf{(X^TPX)^{-1}X^TPY}$$
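As a sanity check on this closed form, here is a minimal NumPy sketch. The dimensions, the random data and the helper names `weighted_ls` and `objective` are all assumptions made for illustration; $\mathbf{P}$ is built to be symmetric positive definite and $\mathbf{X^TPX}$ is assumed invertible.

```python
import numpy as np

def weighted_ls(X, Y, P):
    """Solve the normal equations X^T P X beta = X^T P Y for beta.

    Assumes X^T P X is invertible (illustrative helper, not from the question).
    """
    return np.linalg.solve(X.T @ P @ X, X.T @ P @ Y)

def objective(beta, X, Y, P):
    """Evaluate (Y - X beta)^T P (Y - X beta)."""
    r = Y - X @ beta
    return r @ P @ r

# Hypothetical test problem: m observations, n coefficients.
rng = np.random.default_rng(0)
m, n = 8, 3
X = rng.normal(size=(m, n))
Y = rng.normal(size=m)
A = rng.normal(size=(m, m))
P = A @ A.T + m * np.eye(m)  # symmetric positive definite by construction

beta_hat = weighted_ls(X, Y, P)

# The derivative should vanish at beta_hat (up to rounding error).
grad_at_hat = -2 * X.T @ P @ (Y - X @ beta_hat)
print("beta_hat:", beta_hat)
print("largest gradient entry:", np.abs(grad_at_hat).max())
```

Using `np.linalg.solve` on the normal equations rather than forming the inverse explicitly is a design choice for numerical stability; it gives the same $\beta$ as the formula above whenever the inverse exists.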
The solutions do it slightly differently. They say that since $(\mathbf{Y-X \beta})^T\mathbf{P}(\mathbf{Y-X \beta})$ is already a quadratic form, we can use this directly to calculate the derivative as $$\mathbf{-X^T}2\mathbf{P(Y-X\beta})=-2\mathbf{X^TPY} + 2{\mathbf{X^TPX\beta}}$$ As you can see, this is the same as my derivative, but transposed. Of course, once I transpose to solve for $\beta$, the difference disappears and we get the same final solution. I have two questions.
Is my method incorrect? That is, if the question were just to calculate the derivative, would my answer be wrong? If so, would you kindly point out where I have made my mistake?
Could anyone recommend some literature or a web page that explains the process the solutions used, i.e. taking the derivative by spotting that the expression is a quadratic form?
Thank you very much!
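The derivative you want is the Fréchet derivative (see https://en.wikipedia.org/wiki/Fr%C3%A9chet_derivative). Let $$ \mathbf f(\beta)=(\mathbf{Y-X\beta})^T\mathbf{P}(\mathbf{Y-X \beta}). $$ Since $\mathbf f(\beta+t\mathbf h)$ has residual $\mathbf{Y}-\mathbf{X}\beta-t\mathbf{Xh}$, we get \begin{eqnarray} D\mathbf f(\beta)\mathbf h&=&\lim_{t\to0}\frac{\mathbf f(\beta+t\mathbf h)-\mathbf f(\beta)}{t}\\ &=&\lim_{t\to0}\frac{-t(\mathbf{Y}-\mathbf{X}\beta)^T\mathbf{PXh}-t\mathbf{h}^T\mathbf{X}^T\mathbf{P}(\mathbf{Y}-\mathbf{X}\beta)+t^2\mathbf{h}^T\mathbf{X}^T\mathbf{PXh}}{t}\\ &=&-(\mathbf{Y}-\mathbf{X}\beta)^T\mathbf{PXh}-\mathbf{h}^T\mathbf{X}^T\mathbf{P}(\mathbf{Y}-\mathbf{X}\beta). \end{eqnarray} When $\mathbf{P}$ is symmetric the two terms are equal, so $D\mathbf f(\beta)\mathbf h=-2(\mathbf{Y}-\mathbf{X}\beta)^T\mathbf{PXh}$. As a linear map acting on $\mathbf h$, the derivative is the row vector $-2(\mathbf{Y}-\mathbf{X}\beta)^T\mathbf{PX}$, which is your result; its transpose $-2\mathbf{X}^T\mathbf{P}(\mathbf{Y}-\mathbf{X}\beta)$ is the gradient, which is what the solutions wrote. Both describe the same derivative, so your method is not wrong.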
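A quick numerical check of the directional derivative may help. Perturbing $\beta$ by $t\mathbf h$ changes the residual by $-t\mathbf{Xh}$, so the direction enters the formula through $\mathbf{Xh}$: $D\mathbf f(\beta)\mathbf h=-(\mathbf{Y}-\mathbf{X}\beta)^T\mathbf{PXh}-\mathbf{h}^T\mathbf{X}^T\mathbf{P}(\mathbf{Y}-\mathbf{X}\beta)$. The sketch below compares that expression with a central finite difference and with the gradient form $-2\mathbf{X}^T\mathbf{P}(\mathbf{Y}-\mathbf{X}\beta)$. The data, the step size `t` and the function names are assumptions for illustration, and $\mathbf{P}$ is constructed to be symmetric.

```python
import numpy as np

def f(beta, X, Y, P):
    """Evaluate (Y - X beta)^T P (Y - X beta)."""
    r = Y - X @ beta
    return r @ P @ r

def directional_derivative(beta, h, X, Y, P):
    """Df(beta)h from the limit definition; does not assume P is symmetric."""
    r = Y - X @ beta
    Xh = X @ h
    return -(r @ P @ Xh) - (Xh @ P @ r)

# Hypothetical test problem.
rng = np.random.default_rng(1)
m, n = 6, 2
X = rng.normal(size=(m, n))
Y = rng.normal(size=m)
A = rng.normal(size=(m, m))
P = A @ A.T + np.eye(m)  # symmetric positive definite by construction
beta = rng.normal(size=n)
h = rng.normal(size=n)

# Central finite difference along the direction h.
t = 1e-4
fd = (f(beta + t * h, X, Y, P) - f(beta - t * h, X, Y, P)) / (2 * t)

exact = directional_derivative(beta, h, X, Y, P)

# Gradient (column form); valid here because P is symmetric.
grad = -2 * X.T @ P @ (Y - X @ beta)

print("finite difference:", fd)
print("formula:          ", exact)
print("grad . h:         ", grad @ h)
```

All three printed numbers should agree to within rounding error, which illustrates that the row-vector and column-vector answers differ only by a transpose.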