How to obtain the factor values in factor analysis?


I'm trying to understand factor analysis. Let's say we observed $n$ $d$-dimensional values and want to have $p$ factors. Then, after doing a factor analysis, the j-th component of one observation can be represented as:

$ X_j = \mu_j + \sum_{k=1}^p l_{j,k} \cdot f_{k} + \epsilon_j $

Hope this is right so far. Now my question: if we have done that, how can we obtain the factor values of each observation? Shouldn't there be $p$ values for every observation, so that we "create" $n \cdot p$ values?

Even doing a factor analysis using R I can't find any command to obtain these values. Can someone please help me?

Best answer:

Factor analysis is built upon what's called the orthogonal factor model, which in matrix form is:

$X - \mu = LF + \epsilon$

Your notation is a little off; it is supposed to be $X_i = \mu_i + \sum_{k=1}^m l_{i,k} \cdot f_k + \epsilon_i$ for $i = 1, \dots, p$, where $p$ is the number of variables and $m$ is the number of factors.

Factor Analysis Steps:

1) There are different methods of estimating the loadings $L$, e.g. principal components or maximum likelihood; from here on I will use principal components. Let $\Sigma$ be the covariance matrix of the data set you are observing. Then $\Sigma = LL' + 0$, where the columns of $L$ are the principal components (i.e. eigenvectors) of the covariance matrix. However, we want to 'reduce' our number of variables, so we delete the principal components that do not contribute a high proportion of the variance. If we reduce the number of principal components, then $\Sigma \approx LL' + \Psi$, where $L$ has fewer columns than there are eigenvectors and $\Psi$ is a diagonal matrix whose diagonal values are $\Psi_i = \sigma_{ii} - \sum_{k=1}^m l^2_{i,k}$ for $i = 1, \dots, p$.

2) We could now perform a factor rotation, but I'll skip this step for now.
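If you do want to try a rotation, base R's `stats` package has a built-in `varimax` function that takes a loading matrix. A minimal sketch (the loading values here are rounded, for illustration only):

```r
# Hypothetical 4 x 2 loading matrix (rounded values, illustration only)
L <- matrix(c( 0.36, -0.08, 0.86, 0.36,
              -0.66, -0.73, 0.17, 0.08), ncol = 2)
rot <- varimax(L)  # orthogonal varimax rotation
rot$loadings       # rotated loadings
rot$rotmat         # the orthogonal rotation matrix that was applied
```

The rotation redistributes variance between the factors to make the loadings easier to interpret; the model fit itself is unchanged.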

3) Now to estimate the factor scores $F$, we use ordinary least squares (since we are using principal components), which in matrix form is: $F = (L'L)^{-1} L' (X - \bar x)$.

If you are using R, look up the package FactoMineR; it's a good package for performing a full factor analysis.
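To answer the "R command" part of the question directly: base R's built-in `factanal` function (maximum-likelihood factor analysis, so its estimates will differ somewhat from the principal-component ones below) returns the scores when you set its `scores` argument. A minimal sketch using the iris data (with 4 variables, `factanal` only permits 1 factor):

```r
data <- iris[, 1:4]
fa <- factanal(data, factors = 1, scores = "regression")
fa$loadings      # estimated loadings L
head(fa$scores)  # one score per factor per observation: the n x p values you asked about
```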

Full Example Using R:

I am going to use the iris data set with only variables 1-4.

1)

data=iris[,1:4]
head(data) # just to show you what it looks like

which outputs

  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3          4.7         3.2          1.3         0.2
4          4.6         3.1          1.5         0.2
5          5.0         3.6          1.4         0.2
6          5.4         3.9          1.7         0.4

2) Now we will find the principal components (eigenvectors) of the covariance matrix.

eigen(cov(data))

which outputs:

$values
[1] 4.22824171 0.24267075 0.07820950 0.02383509

$vectors
            [,1]        [,2]        [,3]       [,4]
[1,]  0.36138659 -0.65658877 -0.58202985  0.3154872
[2,] -0.08452251 -0.73016143  0.59791083 -0.3197231
[3,]  0.85667061  0.17337266  0.07623608 -0.4798390
[4,]  0.35828920  0.07548102  0.54583143  0.7536574

Now, to see which principal components contribute the most variance to the data, we calculate the proportion of variance contributed by each component, which is the eigenvalue divided by the sum of the eigenvalues, $Var_k = \lambda_k / \sum_{i=1}^p \lambda_i$. The sum of the eigenvalues is about 4.57, so the first component contributes $4.228/4.57 \approx 92.5\%$ of the variance, the second component contributes $0.243/4.57 \approx 5.3\%$, and so forth. In principal component analysis we would cut off the components that do not contribute much variance to the data set, so here we would actually cut off components 2-4, as each contributes less than 10% of the variance, but I will keep component 2 as it's typical to include at least two components.
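These proportions can be computed directly from the eigenvalues:

```r
ev <- eigen(cov(iris[, 1:4]))$values
prop <- ev / sum(ev)    # variance proportion of each component
round(prop, 3)          # roughly 0.925 0.053 0.017 0.005
round(cumsum(prop), 3)  # cumulative proportion of variance explained
```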

3) Now our factor analysis model becomes $\Sigma \approx LL' + \Psi$, where $L$ contains only the first two principal components, because they capture most of the variance in the data:

L=eigen(cov(data))$vectors[,1:2]
L

which outputs the first and second eigenvectors:

            [,1]        [,2]
[1,]  0.36138659 -0.65658877
[2,] -0.08452251 -0.73016143
[3,]  0.85667061  0.17337266
[4,]  0.35828920  0.07548102

To calculate $\Psi$ we compute $\Psi_i = 1 - \sum_k l^2_{i,k}$, so for the first diagonal entry, using the first row of our $L$, $\Psi_1 = 1 - 0.3614^2 - (-0.6566)^2 \approx 0.4383$.

Psi = matrix(NA, 4, 4) # create an empty 4 by 4 matrix for Psi
for(i in 1:4) {
  for(j in 1:4) {
    if(i == j) { # diagonal values: Psi_i = 1 - sum of squared loadings in row i
      s = 0
      for(k in 1:2) {
        s = s + (L[i, k])^2
      }
      Psi[i, j] = 1 - s
    }
    else { # off-diagonal values set to 0
      Psi[i, j] = 0
    }
  }
}
Psi

which outputs:

          [,1]      [,2]      [,3]      [,4]
[1,] 0.4382909 0.0000000 0.0000000 0.0000000
[2,] 0.0000000 0.4597202 0.0000000 0.0000000
[3,] 0.0000000 0.0000000 0.2360574 0.0000000
[4,] 0.0000000 0.0000000 0.0000000 0.8659315

Now we can calculate the residual matrix, $\Sigma - LL' - \Psi$:

cov(data)-L%*%t(L)-Psi

which outputs:

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length   -0.3143065 -0.49130450    1.0785607  0.43634977
Sepal.Width    -0.4913045 -0.81002058   -0.1306584 -0.03624254
Petal.Length    1.0785607 -0.13065839    2.1162779  0.97558723
Petal.Width     0.4363498 -0.03624254    0.9755872 -0.41899374

So these are the errors we would have to add back onto our model to make it exact.
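Since the residual matrix is defined as exactly what is left over, adding it back reproduces the covariance matrix; a quick sanity check (rebuilding $\Psi$ in one line with `diag` and `rowSums` rather than the loop above):

```r
data <- iris[, 1:4]
L <- eigen(cov(data))$vectors[, 1:2]
Psi <- diag(1 - rowSums(L^2))        # same diagonal Psi as above, built in one line
resid <- cov(data) - L %*% t(L) - Psi
# Sigma = LL' + Psi + residual holds up to floating-point tolerance
isTRUE(all.equal(cov(data), L %*% t(L) + Psi + resid, check.attributes = FALSE))  # TRUE
```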

4) To calculate the factor scores $F$, we use $F = (L'L)^{-1} L' (X - \bar x)$, where $\bar x$ is the mean vector (the mean of each variable):

data=scale(data, scale=FALSE) # centers the data by subtracting each column by its mean
head(data)

which outputs:

      Sepal.Length Sepal.Width Petal.Length Petal.Width
[1,]   -0.7433333  0.44266667       -2.358  -0.9993333
[2,]   -0.9433333 -0.05733333       -2.358  -0.9993333
[3,]   -1.1433333  0.14266667       -2.458  -0.9993333
[4,]   -1.2433333  0.04266667       -2.258  -0.9993333
[5,]   -0.8433333  0.54266667       -2.358  -0.9993333
[6,]   -0.4433333  0.84266667       -2.058  -0.7993333

Now we calculate $F$ by:

F=t(solve(t(L)%*%L)%*%t(L)%*%t(data))
head(F)

which outputs:

          [,1]       [,2]
[1,] -2.684126 -0.3193972
[2,] -2.714142  0.1770012
[3,] -2.888991  0.1449494
[4,] -2.745343  0.3182990
[5,] -2.728717 -0.3267545
[6,] -2.280860 -0.7413304

These are the factor scores, hopefully this is what you were looking for.
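As a cross-check: because the columns of $L$ are orthonormal eigenvectors, $(L'L)^{-1} = I$, so the least-squares scores reduce to a projection and coincide with principal-component scores. They can therefore be compared against R's built-in `prcomp`:

```r
data <- iris[, 1:4]
L <- eigen(cov(data))$vectors[, 1:2]
X <- scale(data, scale = FALSE)  # center only
F <- X %*% L                     # (L'L)^{-1} = I, so OLS reduces to a projection
pc <- prcomp(data)$x[, 1:2]      # principal-component scores
max(abs(abs(F) - abs(pc)))       # near zero (eigenvector signs may flip between methods)
```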