Matrix vector product vs. matrix computation


A weird question that has me confused. Suppose I have a symmetric matrix $A$ that has to be computed somehow. For example, the Hessian matrix is symmetric and is computed from the second partial derivatives of a function; a covariance matrix is another symmetric example. $A$ has $n^2$ entries, but really only about $n^2/2$ of them (exactly $n(n+1)/2$) need to be computed, since it is symmetric.

Now consider a vector $z$ of appropriate length. The product $Az$ yields a vector, not a matrix. So it seems that, in terms of computation time/steps, the product $Az$ can sometimes be calculated faster than $A$ itself? Has anyone ever thought of this, and if so, in what context could this be of use?



BEST ANSWER

There are some cases where this is true. For instance, if $A=ab^T=a\otimes b$ is a rank-1 matrix, then you should definitely not compute or store $A$. Just keep $a$ and $b$ around instead; it will be much faster and cheaper to compute $Av$ (or almost anything else, really).
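A minimal sketch of the rank-1 case in Python/NumPy (the variable names here are illustrative, not from the original post): by associativity, $Av = a(b^Tv)$ costs $O(n)$ instead of the $O(n^2)$ needed to form $A$ first.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
a = rng.standard_normal(n)
b = rng.standard_normal(n)
v = rng.standard_normal(n)

# Naive: form the rank-1 matrix explicitly -- O(n^2) time and memory.
A = np.outer(a, b)
y_naive = A @ v

# Better: use associativity, A v = a (b^T v) -- O(n) time and memory,
# and A is never materialized.
y_fast = a * (b @ v)

assert np.allclose(y_naive, y_fast)
```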

Another interesting case is Hessian-vector products. Say you want $H_f[x]\,u$, where $H_f[x]$ is the Hessian of $f$ at $x$ and $u$ is a vector, but $H_f$ is very expensive to compute or store (computing $\nabla f$ is usually feasible). Let $v(x) = \nabla f(x)$. Then $$ v(x+\delta) = \nabla f(x+\delta) \approx \nabla f(x) + H_f[x]\delta. $$ Taking $\delta = \epsilon u$ and substituting gives $$ v(x+\epsilon u) = \nabla f(x+\epsilon u) \approx \nabla f(x) + \epsilon H_f[x] u \;\;\implies\;\; H_f[x] u \approx \frac{\nabla f(x+\epsilon u) -\nabla f(x)}{\epsilon}, $$ so you can estimate $H_f[x] u$ with just two gradient evaluations, without ever computing $H_f$. (Note that the central, second-order finite difference is usually more accurate.)
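A quick numerical check of this finite-difference trick, using a quadratic test function so the exact Hessian is known (my own choice of test problem, not from the original post):

```python
import numpy as np

# Quadratic test function f(x) = 0.5 x^T M x with symmetric M,
# so grad f(x) = M x and the Hessian is exactly M.
rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n))
M = B + B.T  # symmetrize

def grad_f(x):
    return M @ x

x = rng.standard_normal(n)
u = rng.standard_normal(n)
eps = 1e-6

# Forward difference: H u ~ (grad f(x + eps u) - grad f(x)) / eps
hvp_fwd = (grad_f(x + eps * u) - grad_f(x)) / eps

# Central (second-order) difference, usually more accurate:
hvp_ctr = (grad_f(x + eps * u) - grad_f(x - eps * u)) / (2 * eps)

exact = M @ u
assert np.allclose(hvp_fwd, exact, atol=1e-4)
assert np.allclose(hvp_ctr, exact, atol=1e-4)
```

Only `grad_f` is ever evaluated; the $n \times n$ Hessian is never formed by the estimator itself (here $M$ exists only so we can verify the result).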

I'm not really sure how far this goes in general, though. Whether computing $A$ can be "skipped" probably depends on the mechanism by which $A$ is generated, or on the properties of $A$ itself.


Two cases that are only tangentially related to your question also come to mind.

When solving $Ax=b$, it is often a bad idea to compute $A^{-1}$. Instead, you can solve the system without ever forming $A^{-1}$, which might not fit in memory and is almost certainly slower and less numerically stable to calculate. If you use e.g. an LU factorization, the factorization can even be reused to solve against multiple right-hand sides. Some Krylov subspace methods have similar principles, I think: they only need matrix-vector products with $A$.
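A short sketch of the factor-once, solve-many pattern using SciPy (assuming SciPy is available; a well-conditioned random test matrix stands in for $A$):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominant, well-conditioned
b1 = rng.standard_normal(n)
b2 = rng.standard_normal(n)

# Factor once (O(n^3)), then reuse the factorization for each
# right-hand side (O(n^2) per solve). A^{-1} is never formed.
lu, piv = lu_factor(A)
x1 = lu_solve((lu, piv), b1)
x2 = lu_solve((lu, piv), b2)

assert np.allclose(A @ x1, b1)
assert np.allclose(A @ x2, b2)
```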

Another cool case is used in kernel methods from machine learning. Suppose you have an input space $X$ and a high-dimensional feature space $F$ (an inner-product space), with a map $\Phi:X\rightarrow F$. The point of $F$ is to restructure the data in a way that makes it more amenable to some task (e.g. more separable). If $x_1, x_2\in X$, then we might be interested in the inner product $\langle\Phi(x_1),\Phi(x_2)\rangle_F$ (e.g. to measure the similarity between the points). However, for many spaces $F$, there is a kernel function $K:X\times X\rightarrow \mathbb{R}$ such that $K(x_1,x_2) = \langle\Phi(x_1),\Phi(x_2)\rangle_F$ can be computed very cheaply in the lower-dimensional space, without ever calculating $\Phi$.
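A tiny concrete instance of this "kernel trick", using the homogeneous degree-2 polynomial kernel on $\mathbb{R}^2$ (my own worked example, not from the original post): here $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ and $K(x, y) = (x \cdot y)^2$.

```python
import numpy as np

# Explicit feature map for the homogeneous polynomial kernel of
# degree 2 on R^2: <Phi(x), Phi(y)> = (x . y)^2.
def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def kernel(x, y):
    return np.dot(x, y) ** 2  # computed entirely in the input space

x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, -1.0])

explicit = np.dot(phi(x1), phi(x2))  # inner product in feature space F
implicit = kernel(x1, x2)            # same value, Phi never needed

assert np.isclose(explicit, implicit)  # both equal (x1 . x2)^2 = 1.0
```

For degree-$d$ kernels on $\mathbb{R}^n$ the feature space has dimension on the order of $n^d$, so evaluating $K$ directly rather than going through $\Phi$ is a real saving.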

ANSWER

It sounds like something that would be discussed in a numerical linear algebra course. I haven't dealt with it directly, though, only with similar operations.