Average standardized distance between units?


Let $$ \pmb{X} = \begin{pmatrix} x_{11}& \cdots& x_{1j}& \cdots& x_{1p} \\ \vdots& & \vdots & & \vdots \\ x_{i1} & \cdots & x_{ij} & \cdots & x_{ip} \\ \vdots && \vdots && \vdots\\ x_{n1} & \cdots & x_{nj} & \cdots & x_{np} \end{pmatrix}. $$

Let $$\pmb{x}_i^{(ind)} = \begin{pmatrix} \frac{x_{i1}-\overline{x}_1}{\sqrt{n}s_1} \\ \vdots \\ \frac{x_{ij}-\overline{x}_j}{\sqrt{n}s_j} \\ \vdots \\ \frac{x_{ip}-\overline{x}_p}{\sqrt{n}s_p} \\ \end{pmatrix}$$ such that $$\pmb{X}_R = \begin{pmatrix} \pmb{x}_1^{(ind)^{\top}} \\ \vdots \\ \pmb{x}_i^{(ind)^{\top}} \\ \vdots \\ \pmb{x}_n^{(ind)^{\top}} \end{pmatrix}.$$

Moreover, define the squared standardized Euclidean distance between units $i$ and $l$ as $$d^2(\pmb{x}_i^{(ind)}, \pmb{x}_l^{(ind)}) = \sum_{j=1}^p \frac{(x_{ij}-x_{lj})^2}{ns_j^2}.$$
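
To make these definitions concrete, here is a minimal NumPy sketch. It assumes that $s_j$ is the standard deviation with denominator $n$ (the convention is not stated above, so this is an assumption), and the names `standardize_rows` and `d2` are made up for this illustration.

```python
import numpy as np

def standardize_rows(X):
    """Build X_R: center each column and divide by sqrt(n) * s_j.

    Assumption: s_j is the standard deviation with denominator n (ddof=0).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    s = X.std(axis=0)  # ddof=0 by default
    return (X - X.mean(axis=0)) / (np.sqrt(n) * s)

def d2(X, i, l):
    """Squared standardized distance between units i and l, from the raw data."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    s2 = X.var(axis=0)  # ddof=0 by default
    return float(np.sum((X[i] - X[l]) ** 2 / (n * s2)))

# Small random data set: n = 6 units, p = 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
XR = standardize_rows(X)

# The column means cancel in the difference, so the distance computed from
# the raw data equals the plain squared Euclidean distance between rows of X_R.
direct = d2(X, 0, 1)
via_rows = float(np.sum((XR[0] - XR[1]) ** 2))
print(direct, via_rows)
```

The two printed values should agree up to floating-point error, which is why the distance can be written in terms of the raw $x_{ij}$ even though it is defined on the standardized rows.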

I have three questions, two of which I think I already answered.

  1. What is the sum of the squares of the standardized distances over all pairs of units, i.e., $$ \sum_{i=1}^n \sum_{l=1}^n d^2(\pmb{x}_i^{(ind)}, \pmb{x}_l^{(ind)})?$$

  2. What is the sum of the squares of all the standardized distances between all the units and their mean vector?

  3. What is the average standardized distance between the units?


  1. For this question, the answer is $$ \sum_{i=1}^n \sum_{l=1}^n d^2(\pmb{x}_i^{(ind)}, \pmb{x}_l^{(ind)}) = \cdots = 2pn.$$

  2. For this, the answer is
    $$\sum_{i=1}^n d^2(\pmb{x}_i^{(ind)}, \pmb{\overline{x}}^{(ind)}) = \cdots = p$$

  3. This is where I'm stuck. I don't know what to compute. Should I use the Mahalanobis distance?
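
As a sanity check on the first two answers, the sums can be evaluated numerically. This is only a sketch on random data, and it again assumes that $s_j$ uses the denominator $n$; with the $n-1$ convention the totals would not come out as exactly $2pn$ and $p$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 8, 4
X = rng.normal(size=(n, p))

# Standardized rows x_i^(ind); assumption: s_j has denominator n (ddof=0).
Z = (X - X.mean(axis=0)) / (np.sqrt(n) * X.std(axis=0))

# All pairwise squared distances via broadcasting: D2[i, l] = ||z_i - z_l||^2.
diff = Z[:, None, :] - Z[None, :, :]   # shape (n, n, p)
D2 = np.sum(diff ** 2, axis=2)         # shape (n, n)

# Question 1: sum over all ordered pairs (i, l), including i == l.
total_pairs = D2.sum()

# Question 2: the mean vector of the standardized rows is zero,
# so the distance to the mean is just the squared norm of each row.
total_to_mean = np.sum(Z ** 2)

print(total_pairs, 2 * p * n)
print(total_to_mean, p)
```

Each printed pair should match up to rounding, which is consistent with the closed forms $2pn$ and $p$.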


Edit: For question 3, I have the following: $$\mathbb{E}[d^2(\pmb{x}_i^{(ind)}, \pmb{x}_l^{(ind)})] = \mathbb{E}\left[\sum_{j=1}^p \frac{(x_{ij}-x_{lj})^2}{ns_j^2} \right] = \frac{1}{n} \sum_{j=1}^p \frac{1}{s_j^2} \underbrace{\mathbb{E}[(x_{ij}-x_{lj})^2]}_{=s_j^2} = \frac{p}{n}. $$ Do you agree?
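
One possible cross-check against the answer to question 1 is sketched below. It assumes that "average" means the mean of $d^2$ over pairs of units and that $s_j^2$ uses the denominator $n$. Dividing the total $2pn$ by the number of pairs gives
$$ \frac{1}{n^2} \sum_{i=1}^n \sum_{l=1}^n d^2(\pmb{x}_i^{(ind)}, \pmb{x}_l^{(ind)}) = \frac{2pn}{n^2} = \frac{2p}{n}, \qquad \frac{1}{n(n-1)} \sum_{i \neq l} d^2(\pmb{x}_i^{(ind)}, \pmb{x}_l^{(ind)}) = \frac{2pn}{n(n-1)} = \frac{2p}{n-1}. $$
The first expression averages over all $n^2$ ordered pairs, including the zero distances with $i = l$; the second averages over distinct units only. Under this reading the underbraced term would be $2s_j^2$ rather than $s_j^2$, since $$ \frac{1}{n^2} \sum_{i=1}^n \sum_{l=1}^n (x_{ij}-x_{lj})^2 = \frac{2}{n} \sum_{i=1}^n (x_{ij}-\overline{x}_j)^2 = 2s_j^2. $$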