How does the author get from
$u = e^TSe - \lambda(e^Te-1)$
to
$\frac{\partial u}{\partial e} = 2Se - 2\lambda e$
?
I understand that the chain rule is being applied, but I don't know how I get from
$\frac{\partial e^T}{\partial e} Se + e^T\frac{\partial Se}{\partial e}$
to
$2Se$
?
Source: Pattern Classification by Duda, Hart & Stork, p. 116
In matrix calculus, the chain rule is not your friend. Working with differentials is much less error-prone.
Because a quantity and its differential (e.g. $Y$ and $dY$) have the same tensorial character, they can be treated (mathematically) exactly the same way.
However, a gradient, e.g. $\frac{\partial Y}{\partial X}$, has a completely different character. In most cases it is a higher-order tensor, which is difficult to calculate. And once calculated, it can be difficult to work with algebraically: when should it be transposed, how should it be multiplied, and so on.
So find the differential of your function:
$$\eqalign{ u &= e^TSe -\lambda(e^Te-1) \cr du &= (de)^TSe+e^TS\,de -\lambda\big((de)^Te+e^T\,de\big) \cr &= e^TS^T\,de+e^TS\,de -\lambda(e^T\,de+e^T\,de) \cr &= (e^TS^T+e^TS - 2\lambda e^T)\,de \cr &= (Se+S^Te - 2\lambda e)^T\,de \cr }$$
Replacing $(de)^TSe$ with $e^TS^T\,de$ (and $(de)^Te$ with $e^T\,de$) works because each of these terms is a scalar and therefore equals its own transpose.
The gradient $\Big(g=\frac{\partial u}{\partial e}\Big)$ is the vector which satisfies
$$du=g^T\,de$$
Therefore
$$\eqalign{ g &= Se+S^Te - 2\lambda e \cr &= 2Se - 2\lambda e \cr }$$
The last step is allowed because $S$ is symmetric, which the authors should have mentioned at some point.
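If you want to convince yourself numerically, here is a minimal sketch (not from the book) that compares the formula $g = Se + S^Te - 2\lambda e$ against a central finite-difference gradient. The function names `u`, `grad_u` and `numerical_grad`, the dimension, the random test data and the value of `lam` are all assumptions made up for illustration.

```python
import numpy as np

def u(e, S, lam):
    # u = e^T S e - lambda (e^T e - 1)
    return e @ S @ e - lam * (e @ e - 1.0)

def grad_u(e, S, lam):
    # Gradient obtained from the differential: S e + S^T e - 2 lambda e
    return S @ e + S.T @ e - 2.0 * lam * e

def numerical_grad(f, e, h=1e-6):
    # Central finite differences, one coordinate at a time
    g = np.zeros_like(e)
    for i in range(e.size):
        step = np.zeros_like(e)
        step[i] = h
        g[i] = (f(e + step) - f(e - step)) / (2.0 * h)
    return g

rng = np.random.default_rng(0)   # arbitrary seed, illustration only
n = 4                            # arbitrary dimension
A = rng.standard_normal((n, n))  # a generic, non-symmetric matrix
S = A + A.T                      # symmetrized, so S == S.T
e = rng.standard_normal(n)
lam = 0.7                        # arbitrary multiplier value

g_general = grad_u(e, S, lam)            # S e + S^T e - 2 lam e
g_symmetric = 2 * S @ e - 2 * lam * e    # 2 S e - 2 lam e
g_numeric = numerical_grad(lambda v: u(v, S, lam), e)

print(np.allclose(g_general, g_numeric, atol=1e-5))
print(np.allclose(g_general, g_symmetric))
```

Both checks should agree for the symmetric `S`. If you pass the non-symmetric `A` instead, `grad_u` still matches the finite-difference gradient, but the shortcut $2Se - 2\lambda e$ generally does not, which is exactly why the symmetry assumption matters.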