I asked this on SO first, but decided to move the math part of my question here.
Consider a $p \times p$ symmetric and positive definite matrix $\bf A$ (where $p=70000$, i.e., $\bf A$ is roughly 40 GB using 8-byte doubles). We want to calculate the first $3$ diagonal elements of the inverse matrix, $({\bf A}^{-1})_{11}$, $({\bf A}^{-1})_{22}$ and $({\bf A}^{-1})_{33}$.
I have found this paper by James R. Bunch who seems to solve this exact problem without calculating the full inverse $\bf A^{-1}$. If I understand it correctly he first calculates the Cholesky decomposition, i.e. the upper triangular matrix $\bf R$ which satisfies $\bf A=R^T R$, which needs $\frac16p^2+\frac12p^2-\frac23p$ floating point operations (multiplications/divisions) using the LINPACK function SPOFA. He then proceeds to calculate individual diagonal elements of the inverse $({\bf A^{-1}})_{ii}$ using an expression which exploits the sparsity of ${\bf R}^T{\bf y}={\bf e}_j$ and which requires $\frac12(p-i)^2+\frac52(p-i)+2$ floating point operations. (I don't understand the full details of this, so I can't currently sum it up correctly).
The paper is based on LINPACK; it isn't cited by anyone, so it seems nobody cared for the last 23 years? After reading this, I'm wondering whether this is still the best way of doing things, or whether a modern LAPACK-based approach could avoid the Cholesky decomposition?
In short, is there a quicker way to calculate those diagonal elements of the inverse?
I suggested that a partial Cholesky decomposition should be possible here, but based on a comment I received, I started thinking about my answer again and realised that my suggestion was incorrect. Apologies if I lead anyone astray.
You will need to use a full Cholesky decomposition, but the problem to deduce the inverse can be reduced in scale to save redundant computation.
If your SPD matrix $\mathbf{A}$ is written in block form as
$\mathbf{A}=\begin{bmatrix} \mathbf{A}_{00} & \mathbf{A}_{10}^T \\ \mathbf{A}_{10} & \mathbf{A}_{11} \end{bmatrix}$
with $\mathbf{A}_{00}$ being a $3\times3$ SPD block, its inverse will be given by
$\mathbf{A}^{-1}=\begin{bmatrix} \mathbf{B}_{00} & \mathbf{B}_{10}^T \\ \mathbf{B}_{10} & \mathbf{B}_{11} \end{bmatrix}$
The equivalent Cholesky decomposition of $\mathbf{A}$ is then given by
$\mathbf{R}=\begin{bmatrix} \mathbf{R}_{00} & \\ \mathbf{R}_{10} & \mathbf{R}_{11} \end{bmatrix}$
The resulting matrix equation to solve for the inverse block matrix $\mathbf{B}_{00}$ can then be reduced to
$\begin{bmatrix} \mathbf{R}_{00} & \\ \mathbf{R}_{10} & \mathbf{R}_{11} \end{bmatrix} \begin{bmatrix} \mathbf{R}_{00}^T & \mathbf{R}_{10}^T\\ & \mathbf{R}_{11}^T \end{bmatrix} \begin{bmatrix} \mathbf{B}_{00} \\ \mathbf{B}_{10} \end{bmatrix}=\begin{bmatrix}\mathbf{I}_{00} \\ \mathbf{0} \end{bmatrix}$
and solved via
$\begin{bmatrix} \mathbf{R}_{00} & \\ \mathbf{R}_{10} & \mathbf{R}_{11} \end{bmatrix} \begin{bmatrix} \mathbf{X}_{0} \\ \mathbf{X}_{1} \end{bmatrix} =\begin{bmatrix}\mathbf{I}_{00} \\ \mathbf{0} \end{bmatrix}$
and
$\begin{bmatrix} \mathbf{R}_{00}^T & \mathbf{R}_{10}^T\\ & \mathbf{R}_{11}^T \end{bmatrix} \begin{bmatrix} \mathbf{B}_{00} \\ \mathbf{B}_{10} \end{bmatrix}=\begin{bmatrix}\mathbf{X}_{0} \\ \mathbf{X}_{1} \end{bmatrix}$
The solution you are looking for will be the diagonal entries of $\mathbf{B}_{00}$. The right hand side of this expression is only $n\times 3$, and this combined with lower triangular structure of $\mathbf{R}$ should be about the most efficient way to get the solution you require.