PCA analysis on 1D set of observations


There are lots of discussions on the web about Principal Component Analysis (PCA) and how to do it with tools such as MATLAB or Octave. However, none of them fits my problem.

My problem is that I have a set of 1D observations. When I plot the data points in a 2D XY chart, the X axis holds the indices of the observations (integers) and the Y axis holds the observed values. For example, my observations are

0.7 0.4 0.55 0.2 0.63 0.83 0.48 0.91 0.73

When I run princomp in Octave, I get this result:

octave:8> X = [0.7 0.4 0.55 0.2 0.63 0.83 0.48 0.91 0.73];
octave:9> [pc, z, w, Tsq] = princomp (X)
warning: XXX FIXME XXX Tsq return from princomp fails some tests
pc =  1
z =

   0.096667  -0.203333  -0.053333  -0.403333   0.026667   0.226667  -0.123333   0.306667   0.126667

w =  0.049200
Tsq =  8

What does that mean? I want to see two axes that show the directions of variance.

UPDATE

Based on what Gottfried Helms said, I changed my input to

  octave:1> X=[1 0.7; 2 0.4; 3 0.55; 4 0.2; 5 0.63; 6 0.83; 7 0.48; 8 0.91; 9 0.73];

and here is what I get:

 octave:5> [pc,score,l,t] = princomp(X)
 warning: XXX FIXME XXX Tsq return from princomp fails some tests
 pc =

    -0.999358  -0.035833
    -0.035833   0.999358

 score =

    3.9940e+00   2.3994e-01
    3.0054e+00  -9.5704e-02
    2.0006e+00   1.8367e-02
    1.0138e+00  -3.6724e-01
   -9.5555e-04   2.6650e-02
   -1.0075e+00   1.9069e-01
   -1.9943e+00  -1.9492e-01
   -3.0091e+00   1.9897e-01
   -4.0020e+00  -1.6747e-02

 l =

    7.509591
    0.039609

 t =

    3.577651
    1.433997
    0.541503     
    3.541846
    0.017931
    1.053196
    1.488853
    2.205235
    2.139788

Now how can I plot the two lines that show the directions?

BEST ANSWER

You need to supply the data for the second axis. You said that you use the index of each data point as the second measure, so the complete data set to be analyzed with PCA is

 x: 1   2   3    4   5    6    7    8    9
 y: 0.7 0.4 0.55 0.2 0.63 0.83 0.48 0.91 0.73

Any statistical software that implements PCA will then give you a solution.
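
As an illustration, here is a minimal sketch (Python, standard library only) of a covariance-based PCA on these two columns, together with the endpoints of the two axis lines the question asks for. It assumes the n-1 denominator for the covariance matrix, which reproduces the l values princomp printed. The function names principal_axes and axis_segments, and drawing each line k = 2 standard deviations to either side of the mean, are illustrative choices, not part of Octave.

```python
import math


def principal_axes(xs, ys):
    """Covariance-based PCA of paired (x, y) data (n - 1 denominator).

    Returns the mean point and a list of (eigenvalue, unit direction)
    pairs, largest eigenvalue first.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    cyy = sum((y - my) ** 2 for y in ys) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    # Closed-form eigen-decomposition of the symmetric matrix
    # [[cxx, cxy], [cxy, cyy]]: eigenvalues are half_trace +/- radius, and
    # the major axis makes the angle theta with the x axis, where
    # tan(2 * theta) = 2 * cxy / (cxx - cyy).
    half_trace = (cxx + cyy) / 2
    radius = math.hypot((cxx - cyy) / 2, cxy)
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    major = (math.cos(theta), math.sin(theta))
    minor = (-math.sin(theta), math.cos(theta))  # perpendicular to major
    return (mx, my), [(half_trace + radius, major), (half_trace - radius, minor)]


def axis_segments(center, components, k=2.0):
    """One line per component through the mean, extending k standard
    deviations (k * sqrt(eigenvalue)) to either side."""
    cx, cy = center
    segments = []
    for lam, (vx, vy) in components:
        half = k * math.sqrt(max(lam, 0.0))
        segments.append(((cx - half * vx, cy - half * vy),
                         (cx + half * vx, cy + half * vy)))
    return segments


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    ys = [0.7, 0.4, 0.55, 0.2, 0.63, 0.83, 0.48, 0.91, 0.73]
    center, comps = principal_axes(xs, ys)
    for lam, (vx, vy) in comps:
        print(f"eigenvalue {lam:.6f}  direction ({vx:.6f}, {vy:.6f})")
    for (x0, y0), (x1, y1) in axis_segments(center, comps):
        print(f"line from ({x0:.3f}, {y0:.3f}) to ({x1:.3f}, {y1:.3f})")
```

For these nine points the eigenvalues come out at about 7.5096 and 0.0396, and the directions agree with the columns of princomp's pc up to sign (the sign of an eigenvector is arbitrary). In Octave, each segment can be drawn over the scatter plot with plot([x0 x1], [y0 y1]). Because x spans 1 to 9 while y stays below 1, the major axis is almost horizontal, the minor axis is short, and the two lines only look perpendicular with equal axis scaling (axis equal).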


[update] The solution which you get with SPSS is $$\small \begin{array} {ll|rr|rr} x & y & pc_1 & pc_2 & pc_1' & pc_2' \\ 1.00 & 0.70 & -0.60379 & -1.79251 &-0.6404&1.9012\\ 2.00 & 0.40 & -1.18552 & -0.16896 &-1.2574&0.1792\\ 3.00 & 0.55 & -0.57194 & -0.46302 &-0.6066&0.4911\\ 4.00 & 0.20 & -1.28649 & 1.37361 &-1.3645&-1.4569\\ 5.00 & 0.63 & 0.07083 & -0.11364 &0.0751&0.1205\\ 6.00 & 0.83 & 0.81722 & -0.62077 &0.8668&0.6584\\ 7.00 & 0.48 & 0.10267 & 1.21586 &0.1089&-1.2896\\ 8.00 & 0.91 & 1.45999 & -0.27139 &1.5486&0.2878 \\ 9.00 & 0.73 & 1.19701 & 0.84080 &1.2696&-0.8918 \end{array} $$

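Note that $pc_1$ and $pc_2$ are based on computing the correlation matrix with $n-1$ instead of $n$ in the denominator. Using $n$ we get $pc_1'$ and $pc_2'$, which equal the unprimed scores multiplied by $\sqrt{n/(n-1)}=\sqrt{9/8}\approx 1.0607$ (the sign of $pc_2'$ is flipped, which is arbitrary in PCA).
The SPSS command for $pc_1$ and $pc_2$ was: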

FACTOR 
/VARIABLES x y  
/ANALYSIS x  y 
/PRINT UNIVARIATE INITIAL CORRELATION REPR EXTRACTION FSCORE 
/CRITERIA FACTORS(2) ITERATE(25) 
/EXTRACTION PC 
/ROTATION NOROTATE 
/SAVE REG(ALL) 
/METHOD=CORRELATION.
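
To check the table without SPSS, here is a minimal sketch (Python, standard library only) of a PCA on the 2x2 correlation matrix. It assumes that the regression scores SPSS saves for a full principal-component solution equal the projection of the standardized data onto each eigenvector divided by the square root of its eigenvalue. The function name correlation_pca_scores is hypothetical. For two variables, the correlation matrix [[1, r], [r, 1]] always has eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2) with eigenvalues 1 + r and 1 - r, so no general eigen-solver is needed.

```python
import math


def correlation_pca_scores(xs, ys):
    """Standardized component scores from a PCA of the 2x2 correlation matrix.

    Both variables are z-scored with the n - 1 standard deviation. Each
    score is the projection onto a unit eigenvector divided by
    sqrt(eigenvalue), so every score column has unit (n - 1) variance.
    Assumes |r| < 1 (otherwise one eigenvalue is zero).
    """
    n = len(xs)

    def standardize(values):
        m = sum(values) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
        return [(v - m) / sd for v in values]

    zx, zy = standardize(xs), standardize(ys)
    r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)  # Pearson correlation

    # (eigenvalue, unnormalized eigenvector), largest eigenvalue first;
    # the 1/sqrt(2) normalization is folded into the divisor below.
    components = sorted([(1 + r, (1, 1)), (1 - r, (1, -1))], reverse=True)

    scores = []
    for a, b in zip(zx, zy):
        scores.append([(u * a + w * b) / math.sqrt(2 * lam)
                       for lam, (u, w) in components])
    return r, scores


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    ys = [0.7, 0.4, 0.55, 0.2, 0.63, 0.83, 0.48, 0.91, 0.73]
    r, scores = correlation_pca_scores(xs, ys)
    rescale = math.sqrt(len(xs) / (len(xs) - 1))  # n instead of n - 1
    print(f"r = {r:.5f}")
    for (x, y), (s1, s2) in zip(zip(xs, ys), scores):
        print(f"{x} {y:.2f}  pc1={s1:.5f} pc2={s2:.5f}  "
              f"pc1'={s1 * rescale:.4f} pc2'={s2 * rescale:.4f}")
```

For the nine points this reproduces the $pc_1$ and $pc_2$ columns to the printed precision (for example -0.60379 and -1.79251 in the first row), and the rescaled columns match $pc_1'$ and $pc_2'$ up to the sign of $pc_2'$. The eigenvector signs are fixed by convention in the code and happen to agree with SPSS here.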

ANSWER

Principal component analysis isn't really applicable to a set of scalar observations. The first (and only) principal component will be $1$, and the score of each observation for that component will simply be its mean-shifted value ($x-\mu$).

The core of PCA is the analysis of covariance, figuring out how the elements of a multidimensional observation relate to each other. If you've only got one scalar per observation, PCA doesn't have anything interesting to tell you about that.
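
A minimal sketch (Python, standard library only) of what PCA reduces to for scalar data. The function name pca_1d is hypothetical, and the n - 1 denominator is an assumption chosen because it reproduces the w = 0.049200 that princomp printed for the question's data.

```python
def pca_1d(values):
    """PCA of scalar observations.

    The covariance "matrix" is the 1x1 matrix [variance]; its only unit
    eigenvector is [1] (up to sign) and its eigenvalue is the variance, so
    each observation's score is just its deviation from the mean.
    """
    n = len(values)
    mu = sum(values) / n
    variance = sum((v - mu) ** 2 for v in values) / (n - 1)  # n - 1 denominator
    loading = 1.0  # the only direction available in one dimension
    scores = [loading * (v - mu) for v in values]
    return loading, scores, variance


if __name__ == "__main__":
    data = [0.7, 0.4, 0.55, 0.2, 0.63, 0.83, 0.48, 0.91, 0.73]
    loading, scores, variance = pca_1d(data)
    print("component:", loading)
    print("eigenvalue:", round(variance, 6))
    print("scores:", [round(s, 6) for s in scores])
```

For the question's nine values this gives the component 1, the eigenvalue 0.0492 and the same mean-shifted scores that princomp printed as pc, w and z; there is no second direction to draw.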