What does it mean to minimize a matrix

In Gilbert Strang's 'Introduction to Applied Mathematics', in chapter 2.5 (Least Squares Estimation and the Kalman Filter), in proof 2I, he talks about 'minimizing the covariance matrix'. I can't determine what the criterion is for minimizing a matrix. Does anyone have insight to lend?

Best answer:

Strang’s style is quite informal and I don’t particularly care for it (though it apparently appeals to some people). In this case, a more formal presentation would have certainly spelled out what he meant. Looking over his book's proof of 2I, I would say that Strang most likely means minimization of the matrix $P$ in the spectral norm over all matrices $L$ satisfying $LA=I$. Here’s why I think so:

In Strang’s notation, $V$ is the covariance matrix, which is symmetric positive semi-definite. The matrix he says he wants to minimize has the form $P=LVL^T$, which (following some context-specific simplifications) he then breaks down as $P = L_0 VL_0^T + (L-L_0)V(L-L_0)^T$. $P$ is clearly a real symmetric (hence normal) positive semi-definite matrix, which implies that $$\left\| P \right\|_2 = \sigma _{\max } \left( P \right) = \lambda_{\max } \left( P \right) = \mathop {\max }\limits_{\left\| x \right\|_2 = 1} x^T Px.$$
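The chain of equalities above is easy to check numerically. Here is a small NumPy sketch (the matrix $P$ is a random symmetric PSD matrix of my own construction, not one from Strang's book) verifying that for such a $P$ the spectral norm, the largest eigenvalue, and the maximum unit-norm quadratic form all coincide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric positive semi-definite matrix: P = B B^T.
B = rng.standard_normal((4, 4))
P = B @ B.T

# Spectral norm = largest singular value.
spec_norm = np.linalg.norm(P, 2)

# P is symmetric PSD, so its eigenvalues are real and nonnegative;
# eigvalsh returns them in ascending order.
lam_max = np.linalg.eigvalsh(P)[-1]

# max_{||x||_2 = 1} x^T P x is attained at the top eigenvector.
x = np.linalg.eigh(P)[1][:, -1]
quad = x @ P @ x

assert np.isclose(spec_norm, lam_max)
assert np.isclose(lam_max, quad)
```

The maximum of the quadratic form is attained at the eigenvector for $\lambda_{\max}$, which is why the last two quantities agree.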

Note that $L$ is the only free parameter in Strang’s expression for $P$. If we assume that Strang is minimizing the spectral norm of $P$ over all $L$ satisfying $LA=I$, then the problem is to find $L$ attaining $\mathop {\min }\limits_{\{ L|LA = I\} } \left\| P \right\|_2 = \mathop {\min }\limits_{\{ L|LA = I\} } \mathop {\max }\limits_{\left\| x \right\|_2 = 1} x^T Px$. That is, $$\mathop {\min }\limits_{\{ L|LA = I\} } \left\| P \right\|_2 = \mathop {\min }\limits_{\{ L|LA = I\} } \mathop {\max }\limits_{\left\| x \right\|_2 = 1} \left\{ {x^T L_0 VL_0^T x + x^T (L - L_0 )V(L - L_0 )^T x} \right\}$$

The second quadratic form, the only component involving $L$, is nonnegative for every $x$ (it is a quadratic form in the positive semi-definite matrix $(L-L_0)V(L-L_0)^T$), and this lower bound of $0$ is achieved by taking $L=L_0$. This gives $P = L_0 VL_0^T$, which is what Strang wanted to show.
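One can check this minimization numerically. In the sketch below (random $A$ and $V$ of my own choosing, with $L_0 = (A^TV^{-1}A)^{-1}A^TV^{-1}$ as in Strang's setting), any other feasible $L$ is written as $L_0 + M$ with $MA = 0$, and $P - P_0 = MVM^T$ comes out positive semi-definite, so $P_0 = L_0VL_0^T$ is minimal in the Loewner order and hence in spectral norm:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3

A = rng.standard_normal((m, n))
B = rng.standard_normal((m, m))
V = B @ B.T + np.eye(m)           # symmetric positive definite covariance
Vinv = np.linalg.inv(V)

# L0 = (A^T V^-1 A)^-1 A^T V^-1, the weighted least-squares estimator.
L0 = np.linalg.solve(A.T @ Vinv @ A, A.T @ Vinv)
assert np.allclose(L0 @ A, np.eye(n))            # L0 is feasible: L0 A = I

# Any other feasible L = L0 + M with M A = 0; build M by projecting
# a random matrix onto the orthogonal complement of col(A).
C = rng.standard_normal((n, m))
M = C @ (np.eye(m) - A @ np.linalg.solve(A.T @ A, A.T))
assert np.allclose(M @ A, np.zeros((n, n)))
L = L0 + M

P0 = L0 @ V @ L0.T
P = L @ V @ L.T

# The cross terms vanish, so P - P0 = M V M^T is PSD:
# P0 is minimal in the Loewner order, hence also in spectral norm.
assert np.all(np.linalg.eigvalsh(P - P0) >= -1e-9)
assert np.linalg.norm(P0, 2) <= np.linalg.norm(P, 2) + 1e-9
```

Note that the argument actually gives something stronger than minimality in the spectral norm: $P_0 \preceq P$ for every feasible $L$, i.e. minimality in the positive semi-definite (Loewner) ordering, which is the usual sense in which a covariance matrix is "minimized" in Kalman filtering.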

As an aside, in looking over his proof, I noticed that Strang has a minor typo in one of the intermediate steps. He writes

$$(L - L_0 )VL_0 ^T = (L - L_0 )VV^{ - 1} A(A^T V^{ - 1} A)^{ - 1}$$

when he should have written

$$(L - L_0 )VL_0 ^T = (L - L_0 )VV^{ - 1} A\left\{ {(A^T V^{ - 1} A)^{ - 1} } \right\}^T.$$
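A quick numeric check of the corrected identity (again with random $A$ and $V$ of my own choosing) — note that because $A^TV^{-1}A$ is symmetric, its inverse equals its own transpose, which is why the missing transpose is only a formal typo and has no numerical consequence:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 2

A = rng.standard_normal((m, n))
B = rng.standard_normal((m, m))
V = B @ B.T + np.eye(m)           # symmetric positive definite
Vinv = np.linalg.inv(V)

# L0 = (A^T V^-1 A)^-1 A^T V^-1
G = A.T @ Vinv @ A                # symmetric, so inv(G).T == inv(G)
L0 = np.linalg.solve(G, A.T @ Vinv)

# V L0^T = V V^-1 A {(A^T V^-1 A)^-1}^T, the factor inside (L - L0) V L0^T.
lhs = V @ L0.T
rhs = V @ Vinv @ A @ np.linalg.inv(G).T
assert np.allclose(lhs, rhs)

# The symmetry that makes the typo harmless.
assert np.allclose(np.linalg.inv(G), np.linalg.inv(G).T)
```

And since $(L-L_0)A = LA - L_0A = I - I = 0$ for any feasible $L$, the whole cross term vanishes, as used in the proof above.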