When deriving the Discrete Kalman Filter, there is an intermediate step where you take the derivative of the trace of $P_k$ with respect to the gain $K_k$ and set it equal to 0:
$P_k = E[e_k e_k^T]$
$ = P_k^- + K_kH_kP_k^-H_k^TK_k^T - P_k^-H_k^TK_k^T - K_kH_kP_k^- + K_kR_kK_k^T$
$\frac{d(\mathrm{tr}\,P_k)}{dK_k} = 2K_kH_kP_k^-H_k^T - 2(H_kP_k^-)^T + 2K_kR_k = 0$
Why does using this method give us the minimum mean-squared error?
In terms of the system state $x_i$, we know the trace of $P_k$ is
$\sum E[(x_i-\hat x_i)^2] = \sum E[x_i^2 - 2x_i\hat x_i + \hat x_i^2]$
So why does setting the derivative of this equal to 0 and solving for $K_k$ give us an optimal Kalman gain matrix $K_k$ with minimal error, rather than maximal?
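One way to convince yourself empirically that the stationary point is a minimum (rather than a maximum or saddle): $\mathrm{tr}\,P_k$ is quadratic in $K_k$ with quadratic term $\mathrm{tr}(K_k(H_kP_k^-H_k^T + R_k)K_k^T)$, which is positive definite, so perturbing the optimal gain in any direction should only increase the trace. A sketch with assumed illustrative matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values for illustration only
P_minus = np.array([[2.0, 0.5],
                    [0.5, 1.0]])   # prior covariance P_k^-
H = np.array([[1.0, 0.0]])         # measurement matrix H_k
R = np.array([[0.1]])              # measurement noise covariance R_k

def trace_P(K):
    # tr P_k = tr(P- + K H P- H^T K^T - P- H^T K^T - K H P- + K R K^T)
    P = (P_minus + K @ H @ P_minus @ H.T @ K.T
         - P_minus @ H.T @ K.T - K @ H @ P_minus + K @ R @ K.T)
    return np.trace(P)

S = H @ P_minus @ H.T + R
K_opt = P_minus @ H.T @ np.linalg.inv(S)

# Random perturbations of the optimal gain never decrease the trace,
# consistent with the stationary point being a minimum of the MSE
base = trace_P(K_opt)
print(all(trace_P(K_opt + 0.1 * rng.standard_normal(K_opt.shape)) >= base
          for _ in range(100)))  # True
```

The reason this works analytically is the same: the second derivative of $\mathrm{tr}\,P_k$ with respect to $K_k$ is $2(H_kP_k^-H_k^T + R_k) \succ 0$, so the quadratic is convex and its unique stationary point is the global minimum.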