I'm trying to understand the derivation of the least squares method in matrix terms: $$S(\beta) = y^Ty - 2 \beta^T X^Ty + \beta^T X^TX \beta,$$ where $\beta$ is an $m \times 1$ column vector, $X$ is an $n \times m$ matrix and $y$ is an $n \times 1$ vector. My question is: why is $$\frac{d(2\beta^T X^Ty)}{d \beta} = 2X^Ty?$$ I tried to derive it directly from the definition of the derivative: $$\frac{d(2\beta^T X^Ty)}{d \beta} = \lim_{\Delta \beta \to 0} \frac{2\Delta\beta^T X^T y}{\Delta \beta} = \lim_{\Delta \beta \to 0} 2\Delta\beta^T X^T y \cdot \Delta \beta^{-1}.$$ Maybe the last expression should instead be written as $$2\Delta\beta^T \Delta \beta^{-1} X^T y,$$ but either way I don't understand why this should equal $2X^Ty$. And what is $\Delta \beta^{-1}$ anyway? Vectors don't have inverses.
I have the same question about this equation: $$(\beta^T X^TX \beta)' = 2 X^T X \beta.$$
There are two approaches to taking vector derivatives. First, you can work in coordinates. This always works, but is not always pleasant. In this case $$S(\beta) = y^Ty - 2\sum_i \beta_i(X^Ty)_i + \sum_{i,j} \beta_i (X^TX)_{ij} \beta_j$$ so \begin{align*} \frac{\partial S}{\partial \beta_k} &= -2\sum_i \delta_{ik}(X^Ty)_i + \sum_{i,j} \delta_{ik}(X^TX)_{ij}\beta_j + \sum_{i,j} \beta_i(X^TX)_{ij}\delta_{jk}\\ &= -2(X^Ty)_k + \sum_j (X^TX)_{kj}\beta_j + \sum_i \beta_i(X^TX)_{ik}\\ \frac{\partial S}{\partial \beta} &= -2X^Ty + 2(X^TX)\beta. \end{align*} The last step uses the symmetry of $X^TX$: since $(X^TX)_{ik} = (X^TX)_{ki}$, both sums equal $(X^TX\beta)_k$.
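To see the coordinate result numerically, here is a minimal sketch that compares the closed form $-2X^Ty + 2X^TX\beta$ with a central-difference estimate of each partial $\partial S/\partial \beta_k$. The data `X`, `y`, `beta` and the helper names (`dot`, `matvec`, `transpose`, `grad_formula`, `grad_numeric`) are hypothetical, chosen only for illustration; plain Python lists are used so nothing beyond the standard language is needed.

```python
# Hypothetical example data (not from the question): n = 3, m = 2.
X = [[1.0, 2.0],
     [3.0, -1.0],
     [0.5, 4.0]]
y = [1.0, -2.0, 3.0]
beta = [0.7, -0.3]


def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Matrix-vector product A v, with A stored as a list of rows."""
    return [dot(row, v) for row in A]


def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


Xt = transpose(X)
Xty = matvec(Xt, y)  # X^T y, an m-vector


def S(b):
    """S(beta) = y^T y - 2 beta^T X^T y + beta^T X^T X beta."""
    return dot(y, y) - 2 * dot(b, Xty) + dot(b, matvec(Xt, matvec(X, b)))


def grad_formula(b):
    """Closed form from the coordinate computation: -2 X^T y + 2 X^T X beta."""
    XtXb = matvec(Xt, matvec(X, b))
    return [-2 * a + 2 * c for a, c in zip(Xty, XtXb)]


def grad_numeric(b, h=1e-6):
    """Approximate each partial dS/dbeta_k by a central difference in coordinate k."""
    g = []
    for k in range(len(b)):
        bp = list(b)
        bm = list(b)
        bp[k] += h
        bm[k] -= h
        g.append((S(bp) - S(bm)) / (2 * h))
    return g


print(grad_formula(beta))
print(grad_numeric(beta))
```

The two printed lists should agree to several decimal places. Because $S$ is quadratic, the central difference has no truncation error along a coordinate direction, so any remaining discrepancy comes from floating-point rounding.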
The second approach is to work with the differential $dS(\beta)[\delta \beta]$, which computes the directional derivative $\frac{d}{dt}S(\beta + t\delta \beta)\Big\vert_{t=0}$. Since the directional derivative is linear in $\delta \beta$, you must have $$dS(\beta)[\delta \beta] = \left(\frac{\partial S}{\partial \beta}\right)^T\delta \beta,$$ and so you can often recover an elegant, coordinate-free expression for the derivative from the differential. I wrote up some notes on this here: https://www.dropbox.com/s/7bj966ifgqiljmt/calculus.pdf?dl=0
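As a worked instance of this recipe applied separately to the two terms from the question (writing $A = X^TX$, which is symmetric), note that we only ever differentiate with respect to the scalar $t$, so no "$\Delta\beta^{-1}$" is needed: \begin{align*} \frac{d}{dt}\,2(\beta + t\,\delta\beta)^T X^Ty\Big\vert_{t=0} &= 2\,\delta\beta^T X^Ty = \left(2X^Ty\right)^T \delta\beta,\\ (\beta + t\,\delta\beta)^T A(\beta + t\,\delta\beta) &= \beta^TA\beta + t\left(\delta\beta^TA\beta + \beta^TA\,\delta\beta\right) + t^2\,\delta\beta^TA\,\delta\beta,\\ \frac{d}{dt}(\beta + t\,\delta\beta)^T A(\beta + t\,\delta\beta)\Big\vert_{t=0} &= \delta\beta^TA\beta + \beta^TA\,\delta\beta = \left((A + A^T)\beta\right)^T\delta\beta = \left(2A\beta\right)^T\delta\beta. \end{align*} Reading off the vector that multiplies $\delta\beta$ gives $2X^Ty$ and $2X^TX\beta$ respectively; the $t^2$ term does not contribute because its derivative vanishes at $t=0$.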
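In this case, using the fact that a scalar equals its own transpose (so $\delta \beta^TX^Ty = y^TX\delta \beta$ and $\delta \beta^TX^TX\beta = \beta^TX^TX\delta \beta$), \begin{align*} dS(\beta)[\delta \beta] &= -2\delta \beta^TX^Ty + \delta \beta^TX^TX\beta + \beta^TX^TX\delta \beta\\ &= \left[-2y^TX +2\beta^TX^TX\right]\delta \beta. \end{align*} Transposing the row vector in brackets gives $\frac{\partial S}{\partial \beta} = -2X^Ty + 2X^TX\beta$, the same result as in coordinates.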
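The differential identity can also be checked directly. This is a minimal sketch, assuming small hypothetical integer data (`X`, `y`, `beta`, and a direction `dbeta`, none of which come from the question), that compares $\left[-2y^TX + 2\beta^TX^TX\right]\delta\beta$ with the symmetric difference of $S$ along $\beta + t\,\delta\beta$. Because $S(\beta + t\,\delta\beta)$ is a quadratic polynomial in $t$, the symmetric difference equals the derivative at $t = 0$ exactly for any step, so with integers and $t = 1$ the comparison is exact.

```python
# Hypothetical integer data (not from the question) so every value is exact.
X = [[1, 2],
     [3, -1],
     [0, 4]]
y = [1, -2, 3]
beta = [2, -1]
dbeta = [1, 3]  # an arbitrary direction delta-beta


def dot(u, v):
    """Inner product u^T v of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Matrix-vector product A v, with A stored as a list of rows."""
    return [dot(row, v) for row in A]


def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


Xt = transpose(X)
Xty = matvec(Xt, y)  # X^T y


def S(b):
    """S(beta) = y^T y - 2 beta^T X^T y + beta^T X^T X beta."""
    return dot(y, y) - 2 * dot(b, Xty) + dot(b, matvec(Xt, matvec(X, b)))


# Entries of the row vector -2 y^T X + 2 beta^T X^T X
# (the same numbers as the column vector -2 X^T y + 2 X^T X beta).
row = [-2 * a + 2 * c for a, c in zip(Xty, matvec(Xt, matvec(X, beta)))]
dS = dot(row, dbeta)  # dS(beta)[dbeta] from the closed form

# S(beta + t*dbeta) - S(beta - t*dbeta) equals 2t * dS(beta)[dbeta] exactly,
# since the constant and t^2 terms cancel; with t = 1 integer division is exact.
t = 1
plus = [b + t * d for b, d in zip(beta, dbeta)]
minus = [b - t * d for b, d in zip(beta, dbeta)]
central = (S(plus) - S(minus)) // (2 * t)

print(dS, central)  # -182 -182
```

Any other integer direction `dbeta` should give matching values as well, which is exactly the statement that $-2X^Ty + 2X^TX\beta$ is the gradient.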