Can I write this sum as a matrix product?

Let $\eta$ be an $n \times p$ matrix and $\Sigma$ a $p \times p$ matrix. Is it possible to rewrite the sum of row-wise quadratic forms,

$$\sum_{i=1}^n \eta_i^T \ \Sigma \ \eta_i,$$

where $\eta_i$ is the $i$-th row of $\eta$ (treated as a $p \times 1$ column vector), so that we eliminate the sum and obtain a single matrix expression (vectorization)?

Best Answer

There are multiple equivalent ways to represent this quantity; here are five of them:

  1. $y = \sum_i x_i^T A x_i = \sum_{ijk} X_{ij} X_{ik} A_{jk}$ (index notation)
  2. $y = A_{jk} X_{ij} X_{ik}$ (Einstein notation, ignoring upper/lower index distinction)
  3. $y = \operatorname{tr}(X A X^T)$ (trace formulation)
  4. $y = \big\langle A, X^T X\big\rangle_{\mathbf F}$ (Frobenius inner product)
  5. $y = A\cdot X^{\otimes 2} = A\cdot (X\otimes X)$ (where "$\cdot$" is the appropriate tensor contraction)

All of these are equivalent; if you need some guidance to understand the differences, let me know.
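For concreteness, here is a minimal numerical sketch (with arbitrary random stand-in data, where X and A play the roles of $\eta$ and $\Sigma$) checking that the row-wise sum from the question and several of the formulations above agree:

import numpy as np

# Minimal sketch with random stand-in data: X plays the role of eta, A of Sigma.
rng = np.random.default_rng(0)
n, p = 4, 3
X = rng.standard_normal((n, p))
A = rng.standard_normal((p, p))

y_sum    = sum(X[i] @ A @ X[i] for i in range(n))   # sum_i x_i^T A x_i
y_trace  = np.trace(X @ A @ X.T)                     # tr(X A X^T)
y_frob   = np.sum(A * (X.T @ X))                     # <A, X^T X>_F
y_einsum = np.einsum("ij,ik,jk->", X, X, A)          # index notation via einsum

assert np.allclose([y_trace, y_frob, y_einsum], y_sum)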

In any case, if you need to implement this, my recommendation is to get comfortable with Einstein notation and make use of the very powerful einsum function that many linear algebra frameworks offer nowadays. It is useful any time you deal with products of tensors. In essence, all we need is a description of the axes of the input tensors and how they should be combined. Here, in Einstein notation we have $y = A_{jk} X_{ij} X_{ik}$, so, for example, in Python we would write

y = np.einsum("ij,ik,jk->", X, X, A)

If, instead, you were interested in computing $x^T A x$ separately for different $x$'s (i.e. batch computation), then you would compute $y_i = A_{jk} X_{ij} X_{ik}$, i.e.

y = np.einsum("ij,ik,jk->i", X, X, A)

Or, if you want to compute your quantity in batch mode for multiple (capital) $X$'s, then you'd collect them in a third-order tensor/list of matrices $T_{mij} = X^{(m)}_{ij}$ and compute

y = np.einsum("mij,mik,jk->m", T, T, A)

which yields the list/vector containing the scalar results. As you can see, einsum is very versatile. Moreover, it is really fast, as it can internally apply several optimization schemes (cf. the matrix chain multiplication problem), as long as you can afford to keep everything in memory (higher-order tensors get very memory-heavy very fast).
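For illustration, here is a hedged sketch of that stacking step, using np.stack on a small list of randomly generated stand-in matrices and cross-checking the result against a per-matrix loop:

import numpy as np

# Sketch: stack m matrices of shape (n, p) into a tensor T of shape (m, n, p),
# then get one scalar per matrix with a single einsum call.
rng = np.random.default_rng(2)
Xs = [rng.standard_normal((4, 3)) for _ in range(6)]   # list of eta-like matrices
A = rng.standard_normal((3, 3))

T = np.stack(Xs)                            # shape (6, 4, 3)
y = np.einsum("mij,mik,jk->m", T, T, A)     # one scalar per matrix

y_loop = np.array([np.trace(X @ A @ X.T) for X in Xs])
assert np.allclose(y, y_loop)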

If, for any reason, you cannot use the einsum approach, you should use the Frobenius inner product variant, which boils down to

y = np.sum(A * (X.T.dot(X)))

In any case, never do np.trace(X.dot(A.dot(X.T))) as this is much less efficient: you end up computing all $n^2$ entries of $X A X^T$, while the trace ignores everything but the diagonal. The cost would be $\mathcal O(np^2 + n^2 p)$, as opposed to $\mathcal O(np^2)$ for the Frobenius version.
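If you want to see the cost difference on your own machine, a rough timing sketch (with arbitrary sizes chosen only for illustration) could look like this:

import numpy as np
from timeit import timeit

# Rough timing sketch: the trace variant materializes the full n x n matrix X A X^T.
rng = np.random.default_rng(3)
n, p = 2000, 50
X = rng.standard_normal((n, p))
A = rng.standard_normal((p, p))

t_frob = timeit(lambda: np.sum(A * (X.T @ X)), number=100)
t_trace = timeit(lambda: np.trace(X @ A @ X.T), number=100)
print(f"Frobenius: {t_frob:.3f} s, trace: {t_trace:.3f} s (100 runs each)")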