How to calculate the variance of an SVD parameter estimate?


How do I calculate the variance of a parameter estimated by SVD?

I know that there is uncertainty in the dataset, but how can that be used to calculate the variance of a parameter?


Since you haven't introduced any notation, I'll simply assume that your regression model is of the form

$y = X\beta + \epsilon$

where $X$ is of size $n$ by $m$, $\beta$ is a vector of length $m$, and the vectors $y$ and $\epsilon$ are of length $n$. Here the $\beta$ coefficients are to be estimated, and $\epsilon$ has a multivariate normal distribution with mean $0$ and covariance matrix $I$. If the measurement standard deviations are not all equal but are known, and the errors are independent, then you can scale the equations to obtain a covariance of $I$. If the measurement errors in $y$ are correlated, things get a bit more complicated, and you'll need to consult a more advanced reference.
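The scaling trick for known, unequal standard deviations can be sketched in NumPy. Everything here is illustrative (synthetic data; `sigma` stands in for your known per-observation standard deviations), not a detail from the question:

```python
import numpy as np

# Synthetic problem with known, unequal measurement standard deviations.
rng = np.random.default_rng(4)
X = rng.standard_normal((20, 2))
beta = np.array([2.0, -1.0])
sigma = rng.uniform(0.5, 2.0, size=20)          # known per-observation std devs
y = X @ beta + sigma * rng.standard_normal(20)  # independent errors

# Dividing each equation by its standard deviation gives errors with covariance I.
Xw = X / sigma[:, None]
yw = y / sigma
beta_hat = np.linalg.pinv(Xw) @ yw
```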

The least squares estimate of the parameters can be written as

$\hat{\beta}=X^{\dagger}y$

where $X^{\dagger}$ is the pseudoinverse of $X$.
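As a minimal NumPy sketch of this estimate on synthetic data (all names and values are illustrative, not from the question), the pseudoinverse route agrees with the standard least-squares solver:

```python
import numpy as np

# Hypothetical design matrix and noisy observations.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))             # n = 50 rows, m = 3 parameters
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_normal(50)  # unit-variance noise

beta_hat = np.linalg.pinv(X) @ y             # least squares via the pseudoinverse
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)     # same estimate either way
```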

The expected value of $\hat{\beta}$ is then

$E[\hat{\beta}]=X^{\dagger} E[y]$

$E[\hat{\beta}]=X^{\dagger} X\beta$.

Similarly, the variance is

$Var(\hat{\beta})=Var(X^{\dagger}y)=X^{\dagger}IX^{\dagger T}$.

The nicest case occurs when $X$ has full column rank. In that case,

$X^{\dagger}=(X^{T}X)^{-1}X^{T}$.

The least squares estimate is unbiased in this case since

$E[\hat{\beta}]=X^{\dagger}X\beta=(X^{T}X)^{-1}X^{T}X\beta=\beta$.

The variance is

$Var(\hat{\beta})=(X^{T}X)^{-1}X^{T}X(X^{T}X)^{-1}=(X^{T}X)^{-1}$.
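These full-column-rank identities are easy to check numerically. Here is a quick NumPy sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 4))   # full column rank (with probability 1)

# For full column rank, the pseudoinverse reduces to the normal-equations form.
pinv_normal = np.linalg.inv(X.T @ X) @ X.T
assert np.allclose(pinv_normal, np.linalg.pinv(X))

# Covariance of beta_hat under unit-variance errors: X^+ (X^+)^T = (X^T X)^{-1}.
cov = np.linalg.inv(X.T @ X)
assert np.allclose(pinv_normal @ pinv_normal.T, cov)
```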

When we use the singular value decomposition, we can write $X$ in the "compact" form as

$X=U_{p}S_{p}V_{p}^{T}$,

where $p$ is the number of nonzero singular values of $X$ (in practice, some tolerance must be set below which small singular values are discarded), $S_{p}$ is a $p$ by $p$ diagonal matrix of singular values, $U_{p}$ is an $n$ by $p$ matrix with orthonormal columns, and $V_{p}$ is an $m$ by $p$ matrix with orthonormal columns.

The pseudoinverse is

$X^{\dagger}=V_{p}S_{p}^{-1}U_{p}^{T}$.
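Building this truncated pseudoinverse from the compact SVD can be sketched as follows. The data and the tolerance rule are illustrative (the tolerance shown is one common convention, not the only choice):

```python
import numpy as np

# Illustrative rank-deficient matrix (not from the question): rank 4, but m = 5.
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 4)) @ rng.standard_normal((4, 5))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
tol = max(X.shape) * np.finfo(float).eps * s[0]  # a common tolerance choice
p = int(np.sum(s > tol))                         # number of "nonzero" singular values

# Compact factors and the pseudoinverse X^+ = V_p S_p^{-1} U_p^T.
Up, sp, Vp = U[:, :p], s[:p], Vt[:p, :].T
X_pinv = Vp @ np.diag(1.0 / sp) @ Up.T

# Sanity check via the Moore-Penrose identity X X^+ X = X.
assert np.allclose(X @ X_pinv @ X, X)
```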

The expected value of $\hat{\beta}$ is now

$E[\hat{\beta}]=X^{\dagger} X\beta=V_{p}S_{p}^{-1}U_{p}^{T}U_{p}S_{p}V_{p}^{T}\beta$

and after a bit of simplification,

$E[\hat{\beta}]=V_{p}V_{p}^{T}\beta$.

If it happens that $X$ is of full column rank (that is $p=m$), then $V_{p}V_{p}^{T}=I$, and $\hat{\beta}$ is still an unbiased estimate of $\beta$. However, if $X$ is not of full column rank, then $\hat{\beta}$ will no longer be an unbiased estimate of $\beta$. In fact, you can't even bound the magnitude of this bias without making further assumptions about $\beta$!

As before, we can also compute the variance of $\hat{\beta}$. This is

$Var(\hat{\beta})=X^{\dagger}IX^{\dagger T}=V_{p}S_{p}^{-1}U_{p}^{T}U_{p}S_{p}^{-1}V_{p}^{T}=V_{p}S_{p}^{-2}V_{p}^{T}$.

You can easily check that when $X$ is of rank $m$ (that is, $p=m$), then $(X^{T}X)^{-1}=V_{p}S_{p}^{-2}V_{p}^{T}$. Numerically, it's more accurate to use the SVD of $X$ than it is to compute $(X^{T}X)^{-1}$, because forming $X^{T}X$ squares the condition number of $X$.
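That full-rank agreement between the two covariance formulas can be verified directly, again on illustrative NumPy data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((25, 3))          # full column rank, so p = m

U, s, Vt = np.linalg.svd(X, full_matrices=False)
cov_svd = Vt.T @ np.diag(s**-2) @ Vt      # V_p S_p^{-2} V_p^T
cov_normal = np.linalg.inv(X.T @ X)       # (X^T X)^{-1}
assert np.allclose(cov_svd, cov_normal)   # identical when X has full column rank
```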

When $p$ is less than $m$, it is misleading to report this covariance matrix for $\hat{\beta}$, because the estimate is biased, and the bias may be far larger than any uncertainty propagated from the uncertainty in $y$.