A research paper presents an equation for 'correlation distance between two vectors', but I cannot find information on its derivation.


Hello Mathematics StackExchange,

I am currently trying to pick apart a research paper titled 'Accounting for Label Uncertainty in Machine Learning for Detection of Acute Respiratory Distress Syndrome' (https://ieeexplore.ieee.org/document/8304750).

The problem I am having is understanding where the first equation in this paper comes from. It is not cited, so I haven't had any luck looking through the sources. It is presented as follows:

To implement this sampling strategy, we first calculated pairwise correlation distance matrices to represent dependency over the span of each patient’s time-series data. Given an m-by-n matrix for each patient’s data, where m is the number of times the patient was observed, and each observation is treated as 1-by-n row vectors, the correlation distance between vectors $X_a$ and $X_b$ for a single pair of observations is defined as: $$d_{ab} = 1 - \frac{(X_a-\tilde{X_a})(X_b-\tilde{X_b})}{\sqrt{(X_a-\tilde{X_a})(X_a-\tilde{X_a})^\prime} \sqrt{(X_b-\tilde{X_b})(X_b-\tilde{X_b})^\prime}}$$ where $\tilde{X_a}=\frac{1}{n}\sum_{j}X_{aj}$ and $\tilde{X_b}=\frac{1}{n}\sum_{j}X_{bj}$.

Using this correlation distance formula, an m-by-m correlation distance matrix can be derived for all observations on the patient, taken pairwise.
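For concreteness, here is a minimal NumPy sketch of one plausible reading of the quoted formula. It rests on assumptions the paper does not confirm: that the primes denote transposes, that the numerator is meant as the inner product $(X_a-\tilde{X_a})(X_b-\tilde{X_b})^\prime$, and that the scalar mean is subtracted from every element of the row vector. The function name `correlation_distance_matrix` and the random test data are hypothetical, chosen only for illustration.

```python
import numpy as np

def correlation_distance_matrix(X):
    """Pairwise correlation distance between the rows of an m-by-n matrix.

    Assumed reading of the quoted formula: primes are transposes, and each
    row's scalar mean is subtracted from every element of that row.
    """
    X = np.asarray(X, dtype=float)
    # Subtract each row's mean (a scalar) from all n entries of that row.
    centered = X - X.mean(axis=1, keepdims=True)
    # Denominator terms: sqrt((X_a - mean_a)(X_a - mean_a)') for each row.
    norms = np.sqrt((centered * centered).sum(axis=1))
    # Numerator for every pair (a, b): inner product of the centered rows.
    inner = centered @ centered.T
    # Note: a constant row has zero norm and would cause division by zero.
    return 1.0 - inner / np.outer(norms, norms)

# Hypothetical data: m = 5 observations, n = 8 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
D = correlation_distance_matrix(X)  # m-by-m matrix of distances
```

Under this reading, the fraction is the Pearson correlation between the two rows, so `D` should agree with `1 - np.corrcoef(X)`: the distance is 0 for perfectly correlated rows and 2 for perfectly anti-correlated rows.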

I have done some reading into correlation distance and have seen quite a few formulas that take the same form as above, but those formulas do not subtract from one. Here is an example (although not quite the same): https://stats.stackexchange.com/questions/269834/how-to-calculate-one-number-from-pearson-correlation-distance-of-more-than-two-v
Also, I do not understand what the primes signify in the denominator. I'm at a loss as to where to even begin.

So here are my main questions:

  • Why does this equation subtract from one?
  • What do the primes signify?
  • It seems $\tilde{X_a}$ and $\tilde{X_b}$ are scalars. I don't know how to interpret the subtraction of a scalar from a 1xn vector.