Expressing a summation using matrix algebra


Consider the $r \times n$ matrix $$\begin{pmatrix} X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{r1} & X_{r2} & \cdots & X_{rn} \end{pmatrix}\text{.}$$ Define $$\begin{align*} &\bar{X} = \dfrac{\sum\limits_{i=1}^{r}\sum\limits_{j=1}^{n}X_{ij}}{nr} \\ &\bar{X}_{i} = \dfrac{\sum\limits_{j=1}^{n}X_{ij}}{n}\text{.} \end{align*}$$ I am interested in knowing if there is a possible way to write the summations $$\begin{align*} \hat{v}^{S} &= \sum\limits_{i=1}^{r}\sum\limits_{j=1}^{n}\left(X_{ij}-\bar{X}_{i}\right)^{2} \\ \hat{a}^{S} &= \sum\limits_{i=1}^{r}\left(\bar{X}_i - \bar{X}\right)^{2} \end{align*}$$ in terms of matrix operations (anything one would learn in a first course in linear algebra, such as multiplication of matrices, inverses of matrices, determinants, eigenvalues, etc.).
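For concreteness, the two summations can be computed directly in NumPy exactly as defined (a sketch; the matrix entries here are arbitrary example data):

```python
import numpy as np

# Hypothetical example data: an r x n matrix X (values are arbitrary).
r, n = 3, 4
X = np.arange(1.0, 1.0 + r * n).reshape(r, n)

X_bar = X.sum() / (n * r)          # overall mean  \bar{X}
row_means = X.sum(axis=1) / n      # row means     \bar{X}_i, shape (r,)

# The two summations, written exactly as defined.
v_hat = ((X - row_means[:, None]) ** 2).sum()
a_hat = ((row_means - X_bar) ** 2).sum()
```

Any matrix-algebra answer below should agree numerically with `v_hat` and `a_hat`.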

The reason is that I have to memorize these formulas for an actuarial exam I will be taking soon, and I would rather not memorize summations if there is a way to express them in matrix form.

There may not be an answer to what I seek, and I might just have to memorize these summations as is, but I thought I would ask in case there is.

ETA: I did pass the exam (scoring at least 93%), but I am still interested in knowing whether there is a solution to this problem.


There are two answers below.

Accepted answer:

Here are some expressions, although not necessarily easier to memorize!

Let $\mathbf{1}_{m}$ denote the column vector of all ones of length $m$, and $\mathbf{I}_{m}$ the $m\times m$ identity matrix. Then, $$ \hat{v}^{S} = \text{Tr}\left( \mathbf{X}^{T}\mathbf{X} \left(\mathbf{I}_{n}- \frac{1}{n}\mathbf{1}_{n}\mathbf{1}_{n}^{T}\right)\right), $$ where $\text{Tr}(\mathbf{M})$ denotes the trace of $\mathbf{M}$, and \begin{align*} \hat{a}^{S} &= \frac{1}{n^{2}} \left( \mathbf{1}_{n}^{T}\mathbf{X}^{T} \mathbf{X}\mathbf{1}_{n} - \frac{1}{r} (\mathbf{1}_{r}^{T}\mathbf{X}\mathbf{1}_{n})^{2} \right). \end{align*} Moving things around, you can alternatively write: \begin{align*} \hat{a}^{S} &= \frac{1}{n^{2}} \mathbf{1}_{n}^{T}\mathbf{X}^{T} \left( \mathbf{I}_{r} - \frac{1}{r} \mathbf{1}_{r}\mathbf{1}_{r}^{T} \right) \mathbf{X}\mathbf{1}_{n}. \end{align*} It is useful to note that $\mathbf{1}_{n}\mathbf{1}_{n}^{T}$ is the $n \times n$ all-ones matrix, and $\mathbf{1}_{r}^{T}\mathbf{X}\mathbf{1}_{n}$ is the sum of all entries in $\mathbf{X}$.
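Both closed forms are easy to check numerically against the original summations. A small sketch (the data is arbitrary; `ones_n` and `ones_r` play the roles of $\mathbf{1}_n$ and $\mathbf{1}_r$):

```python
import numpy as np

# Arbitrary test data for an r x n matrix X.
rng = np.random.default_rng(0)
r, n = 3, 4
X = rng.standard_normal((r, n))

ones_n = np.ones((n, 1))
ones_r = np.ones((r, 1))

row_means = X @ ones_n / n                    # column vector of \bar{X}_i
X_bar = (ones_r.T @ X @ ones_n).item() / (r * n)

# Direct summations, as in the question.
v_direct = ((X - row_means) ** 2).sum()
a_direct = ((row_means - X_bar) ** 2).sum()

# Matrix forms from the answer.
C = np.eye(n) - np.ones((n, n)) / n           # I_n - (1/n) 1_n 1_n^T
v_matrix = np.trace(X.T @ X @ C)
s = (ones_r.T @ X @ ones_n).item()            # sum of all entries of X
a_matrix = ((ones_n.T @ X.T @ X @ ones_n).item() - s ** 2 / r) / n ** 2
```

Here `v_matrix` and `a_matrix` should match `v_direct` and `a_direct` up to floating-point rounding.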

Why are the above true?

Note that $\bar{X}_{i}$ is the sum of all entries of the $i$th row of $\mathbf{X}$ normalized by $n$, which can be written as $\frac{1}{n}\mathbf{X}_{i,:}\mathbf{1}_{n}$, where $\mathbf{X}_{i,:}$ is the $i$th row of $\mathbf{X}$. Let $\mathbf{R}$ denote the column vector of length $r$ obtained by vertically stacking the $\bar{X}_{i}$'s. Then, $$ \mathbf{R} = \frac{1}{n}\mathbf{X}\mathbf{1}_{n}. $$ Also, note that $\mathbf{R}\mathbf{1}_{n}^{T}$ is an $r \times n$ matrix whose $i$th row has every entry equal to $R_{i} = \bar{X}_{i}$. Then, $\hat{v}^{S}$ is the sum of the squared entries of the matrix $$ \mathbf{X} - \mathbf{R}\mathbf{1}_{n}^{T}, $$ which is also known as the squared Frobenius norm of the matrix, denoted by $\|\cdot\|_{F}^{2}$; that is, \begin{align*} \hat{v}^{S} &= \| \mathbf{X} - \mathbf{R}\mathbf{1}_{n}^{T} \|_{F}^{2} \\ &= \| \mathbf{X} - \frac{1}{n}\mathbf{X}\mathbf{1}_{n}\mathbf{1}_{n}^{T} \|_{F}^{2} \\ &= \| \mathbf{X} \left(\mathbf{I}_{n}- \frac{1}{n}\mathbf{1}_{n}\mathbf{1}_{n}^{T}\right) \|_{F}^{2}\\ &= \text{Tr}\left( \mathbf{X} \left(\mathbf{I}_{n}- \frac{1}{n}\mathbf{1}_{n}\mathbf{1}_{n}^{T}\right)\left(\mathbf{I}_{n}- \frac{1}{n}\mathbf{1}_{n}\mathbf{1}_{n}^{T}\right)^{T}\mathbf{X}^{T}\right)\\ &= \text{Tr}\left( \mathbf{X}^{T}\mathbf{X} \left(\mathbf{I}_{n}- \frac{2}{n}\mathbf{1}_{n}\mathbf{1}_{n}^{T} +\frac{1}{n^{2}}\mathbf{1}_{n}\mathbf{1}_{n}^{T}\mathbf{1}_{n}\mathbf{1}_{n}^{T}\right)\right)\\ &= \text{Tr}\left( \mathbf{X}^{T}\mathbf{X} \left(\mathbf{I}_{n}- \frac{2}{n}\mathbf{1}_{n}\mathbf{1}_{n}^{T} +\frac{1}{n}\mathbf{1}_{n}\mathbf{1}_{n}^{T}\right)\right)\\ &= \text{Tr}\left( \mathbf{X}^{T}\mathbf{X} \left(\mathbf{I}_{n}- \frac{1}{n}\mathbf{1}_{n}\mathbf{1}_{n}^{T}\right)\right). \end{align*}
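Each intermediate step of this chain can be checked numerically. A minimal sketch on arbitrary test data, comparing the Frobenius-norm forms with the final trace form:

```python
import numpy as np

# Step-by-step check of the \hat{v}^S derivation on arbitrary test data.
rng = np.random.default_rng(1)
r, n = 3, 5
X = rng.standard_normal((r, n))

ones_n = np.ones((n, 1))
R = X @ ones_n / n                        # R = (1/n) X 1_n, the row means
C = np.eye(n) - np.ones((n, n)) / n       # I_n - (1/n) 1_n 1_n^T

# ||X - R 1_n^T||_F^2  ==  ||X C||_F^2  ==  Tr(X^T X C)
f1 = np.linalg.norm(X - R @ ones_n.T, "fro") ** 2
f2 = np.linalg.norm(X @ C, "fro") ** 2
f3 = np.trace(X.T @ X @ C)
```

All three quantities agree up to floating-point rounding, mirroring the equalities in the display above.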

For the second part, $\overline{\mathbf{X}}$ is a scalar, equal to the sum of all entries in $\mathbf{X}$ normalized by $rn$, i.e., $$ \overline{\mathbf{X}} = \frac{1}{rn} \mathbf{1}_{r}^{T} \mathbf{X} \mathbf{1}_{n}. $$ Then, \begin{align*} \hat{a}^{S} &= \sum\limits_{i=1}^{r}\left(\bar{X}_i - \bar{X}\right)^{2}\\ &= \|\mathbf{R} - \overline{\mathbf{X}} \cdot \mathbf{1}_{r} \|_{2}^{2}\\ &= \|\frac{1}{n}\mathbf{X}\mathbf{1}_{n} - \frac{1}{rn} \mathbf{1}_{r}^{T} \mathbf{X} \mathbf{1}_{n} \cdot \mathbf{1}_{r} \|_{2}^{2} \\ &= \frac{1}{n^{2}} \|\mathbf{X}\mathbf{1}_{n} - \frac{1}{r} (\mathbf{1}_{r}^{T}\mathbf{X}\mathbf{1}_{n}) \cdot \mathbf{1}_{r} \|_{2}^{2} \\ &= \frac{1}{n^{2}} \left( \mathbf{1}_{n}^{T}\mathbf{X}^{T} \mathbf{X}\mathbf{1}_{n} - \frac{2}{r} (\mathbf{1}_{r}^{T}\mathbf{X}\mathbf{1}_{n}) \cdot \mathbf{1}_{n}^{T}\mathbf{X}^{T}\mathbf{1}_{r} + \frac{1}{r^{2}}(\mathbf{1}_{r}^{T}\mathbf{X}\mathbf{1}_{n})^{2} \mathbf{1}_{r}^{T}\mathbf{1}_{r} \right)\\ &= \frac{1}{n^{2}} \left( \mathbf{1}_{n}^{T}\mathbf{X}^{T} \mathbf{X}\mathbf{1}_{n} - \frac{2}{r} (\mathbf{1}_{r}^{T}\mathbf{X}\mathbf{1}_{n})^{2} + \frac{1}{r}(\mathbf{1}_{r}^{T}\mathbf{X}\mathbf{1}_{n})^{2} \right)\\ &= \frac{1}{n^{2}} \left( \mathbf{1}_{n}^{T}\mathbf{X}^{T} \mathbf{X}\mathbf{1}_{n} - \frac{1}{r} (\mathbf{1}_{r}^{T}\mathbf{X}\mathbf{1}_{n})^{2} \right), \end{align*} which completes the proof.
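The $\hat{a}^{S}$ derivation, including the centered alternative form with $\mathbf{I}_{r} - \frac{1}{r}\mathbf{1}_{r}\mathbf{1}_{r}^{T}$, can likewise be verified numerically (a sketch on arbitrary test data):

```python
import numpy as np

# Checking the \hat{a}^S derivation on arbitrary test data.
rng = np.random.default_rng(2)
r, n = 4, 3
X = rng.standard_normal((r, n))

ones_n = np.ones((n, 1))
ones_r = np.ones((r, 1))

R = X @ ones_n / n                               # row means \bar{X}_i
X_bar = (ones_r.T @ X @ ones_n).item() / (r * n) # overall mean \bar{X}

# Direct summation: sum_i (\bar{X}_i - \bar{X})^2.
a_direct = ((R - X_bar) ** 2).sum()

# Expanded form: (1/n^2) (1_n^T X^T X 1_n - (1/r)(1_r^T X 1_n)^2).
s = (ones_r.T @ X @ ones_n).item()
a_expanded = ((ones_n.T @ X.T @ X @ ones_n).item() - s ** 2 / r) / n ** 2

# Centered form: (1/n^2) 1_n^T X^T (I_r - (1/r) 1_r 1_r^T) X 1_n.
Cr = np.eye(r) - np.ones((r, r)) / r
a_centered = (ones_n.T @ X.T @ Cr @ X @ ones_n).item() / n ** 2
```

All three values coincide up to floating-point rounding.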

Second answer:

There is a way to express them in matrix form, though it arguably offers less intuition than the statistical formulas.

Let $$ A_m := \begin{pmatrix}1/m \\ 1/m \\ \vdots \\ 1/m\end{pmatrix} \ \text{(an}\ m\times 1\ \text{column vector)} $$ for any $m\in{\mathbb N}$. Then, writing $\bar{X}_i$ (with a slight abuse of notation) for the column vector whose $i$th entry is the $i$th row mean: $$\bar{X}_i = X\,A_n;\quad \text{likewise, } \bar{X} = \bar{X}_i^{\mathsf T}\,A_r = A_r^{\mathsf T}\,X\,A_n\,. $$

The matrix with entries $X_{ij} - \bar{X}_i$ (let us denote it by $W$) can be expressed as $$W = X - n\cdot\bar{X}_i\,A_n^{\mathsf T} = X - n\cdot X\,A_n\,A_n^{\mathsf T} = X\,(I_n - n\,A_n\,A_n^{\mathsf T})\,, $$ where $I_n$ is the $n\times n$ identity matrix. The sum of squares is trickier. For a $1\times n$ row vector $\mathbf w$, we know that $$\sum\limits_{j=1}^{n} (w_j)^2 = {\mathbf w}\,{\mathbf w}^{\mathsf T}\,,$$ so if $r=1$, then $$\hat v^S = \sum\limits_{j=1}^{n} (X_{1j} - \bar{X}_1)^2 = W\,W^{\mathsf T} = X\,(I_n - n\,A_n\,A_n^{\mathsf T})^2\,X^{\mathsf T}. $$ What is $W\,W^{\mathsf T}$ for general $r$? It is an $r\times r$ matrix whose diagonal elements are the sums of squares of the respective rows of $W$. Their sum (the matrix trace, “$\operatorname{tr}$”) gives $\hat v^S$: $$\hat v^S = \operatorname{tr}(W\,W^{\mathsf T}) = \operatorname{tr}(X\,(I_n - n\,A_n\,A_n^{\mathsf T})^2\,X^{\mathsf T}). $$ And, finally, $$\hat a^S = \bar{X}_i^{\mathsf T}\,(I_r - r\,A_r\,A_r^{\mathsf T})^2\,\bar{X}_i = A_n^{\mathsf T}\,X^{\mathsf T}\,(I_r - r\,A_r\,A_r^{\mathsf T})^2\,X\,A_n\,. $$
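This $A_m$ formulation can also be checked numerically against the direct summations (a sketch; `A_n` and `A_r` are the averaging vectors defined above, and the data is arbitrary):

```python
import numpy as np

# Numerical check of the A_m formulation on arbitrary test data.
rng = np.random.default_rng(3)
r, n = 3, 4
X = rng.standard_normal((r, n))

A_n = np.full((n, 1), 1.0 / n)               # averaging vector A_n
A_r = np.full((r, 1), 1.0 / r)               # averaging vector A_r

row_means = X @ A_n                          # column vector of \bar{X}_i
X_bar = (A_r.T @ X @ A_n).item()             # overall mean \bar{X}

# Direct summations for reference.
v_direct = ((X - row_means) ** 2).sum()
a_direct = ((row_means - X_bar) ** 2).sum()

# The answer's formulas: tr(X P_n^2 X^T) and A_n^T X^T P_r^2 X A_n.
Pn = np.eye(n) - n * (A_n @ A_n.T)
Pr = np.eye(r) - r * (A_r @ A_r.T)
v_matrix = np.trace(X @ Pn @ Pn @ X.T)
a_matrix = (A_n.T @ X.T @ Pr @ Pr @ X @ A_n).item()
```

Both matrix expressions reproduce the direct summations up to floating-point rounding.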


Update: one can check that the matrix $I_m - m\,A_m\,A_m^{\mathsf T}$ (or “$\mathbf{I}_{m} - \frac{1}{m}\mathbf{1}_{m}\mathbf{1}_{m}^{T}$” in m.a.’s notation) is idempotent (in fact, it is the orthogonal projection onto a hyperplane). Hence, there is no difference between $(I_n - n\,A_n\,A_n^{\mathsf T})^2$ and $\mathbf{I}_{n} - \frac{1}{n}\mathbf{1}_{n}\mathbf{1}_{n}^{\mathsf T}$.
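The idempotence claim is easy to confirm numerically for any particular size (a small sketch, using $n = 5$):

```python
import numpy as np

# The centering matrix M = I_n - (1/n) 1_n 1_n^T (equivalently I_n - n A_n A_n^T)
# is idempotent: M @ M == M.
n = 5
M = np.eye(n) - np.ones((n, n)) / n
M_squared = M @ M
```

Since `M_squared` equals `M`, the square in the second answer's formulas can be dropped, recovering the first answer's expressions.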