Bhattacharyya Distance (or a Measure of Similarity) on Matrices with Different Dimensions

We have a series of observations of different properties (such as heart rate, blood sugar level, and others) taken on different days from people in different geographical regions. The number of days on which observations are made is the same for all regions, but the number of participants may differ from region to region. The collected data can be organized in one of two ways:

CASE 1)

We have a matrix, $M_i$, for each region $i$, with $T$ rows (observations over time) and $N_i$ columns (participants from that region). Here, we consider each property separately.

CASE 2)

We have a matrix, $M_{i,j}$, for each region $i$ and property $j$, with $T$ rows (observations over time) and $N_{i,j}$ columns (participants from region $i$ measured for property $j$). Here, we consider all properties together as representing a region.

OBJECTIVE

The goal is to identify which regions have people with the most similar constitution.

QUESTION ONE

How can we compute the Bhattacharyya distance across the various matrices $M_i$ (in pairs or combined, whichever is simpler) separately for each property (corresponding to Case 1 of the data collection)?

QUESTION TWO

How can we compute the Bhattacharyya distance across the various matrices $M_{i,j}$ for all the properties combined for a region (corresponding to Case 2 of the data collection)?
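
For reference, a standard definition of the distance being asked about, for two probability densities $p$ and $q$ on the same space (the symbols $p$, $q$, and $x$ are introduced here only for illustration):

$$D_B(p, q) = -\ln \int \sqrt{p(x)\, q(x)}\, dx$$

If each row of a matrix is treated as one observation, then $p$ and $q$ live on spaces of different dimension whenever the numbers of columns differ, so the matrices cannot be compared directly with this formula. That is the difficulty in both questions.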

Dimension of matrix: $\operatorname{Dim}(M_{i,j}) = (T, N_{i,j})$

There are two cases here.

1) $N_{i,j} = N_{i,k}, \ \forall j,k$

That is, the number of participants in a region is the same for all properties being measured.

2) $N_{i,j} \neq N_{i,k}$ for some $j,k$

That is, the number of participants in a region could be different for some of the properties being measured.

If it helps, we can assume the simpler case 1), $N_{i,j} = N_{i,k}$.

Please let me know if anything is not clear and if you need any further information. This is also posted on the statistics website and has received some interest but no answers: https://stats.stackexchange.com/questions/188532/bhattacharya-distance-or-a-measure-of-similarity-on-matrices-with-differen


Answer to Question One:

Using a Johnson–Lindenstrauss transformation (a random projection), we can reduce the dimension of the larger dataset to match that of the smaller dataset and then compute the Bhattacharyya distance.

Link: "https://en.wikipedia.org/wiki/Johnson–Lindenstrauss_lemma"
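
A minimal sketch of the idea above, in Python with NumPy. Several things here are assumptions that the answer does not state: each row (day) is treated as one observation, the random projection is a plain Gaussian matrix, and the Bhattacharyya distance is computed with the closed form for two multivariate Gaussians fitted to the data, $D_B = \tfrac{1}{8}(\mu_1-\mu_2)^T \Sigma^{-1} (\mu_1-\mu_2) + \tfrac{1}{2}\ln\frac{\det\Sigma}{\sqrt{\det\Sigma_1 \det\Sigma_2}}$ with $\Sigma = (\Sigma_1+\Sigma_2)/2$. The function names, the `ridge` parameter, and the synthetic data are hypothetical.

```python
import numpy as np

def jl_project(X, k, rng):
    """Map a (T x N) matrix to (T x k) with a Gaussian random projection.

    The 1/sqrt(k) scaling approximately preserves row norms on average.
    """
    n_cols = X.shape[1]
    R = rng.normal(size=(n_cols, k)) / np.sqrt(k)
    return X @ R

def bhattacharyya_gaussian(X, Y, ridge=1e-6):
    """Bhattacharyya distance between Gaussians fitted to the rows of X and Y.

    Assumption: rows are observations and both matrices have the same
    number of columns. A small ridge keeps the covariances invertible.
    """
    d = X.shape[1]
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(X, rowvar=False)) + ridge * np.eye(d)
    cov_y = np.atleast_2d(np.cov(Y, rowvar=False)) + ridge * np.eye(d)
    cov = 0.5 * (cov_x + cov_y)

    diff = mu_x - mu_y
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)

    # slogdet returns (sign, log|det|); index 1 is the log-determinant.
    logdet = np.linalg.slogdet(cov)[1]
    logdet_x = np.linalg.slogdet(cov_x)[1]
    logdet_y = np.linalg.slogdet(cov_y)[1]
    cov_term = 0.5 * (logdet - 0.5 * (logdet_x + logdet_y))
    return float(mean_term + cov_term)

def region_distance(M_a, M_b, rng):
    """Project the wider matrix down to the narrower width, then compare."""
    k = min(M_a.shape[1], M_b.shape[1])
    if M_a.shape[1] > k:
        M_a = jl_project(M_a, k, rng)
    if M_b.shape[1] > k:
        M_b = jl_project(M_b, k, rng)
    return bhattacharyya_gaussian(M_a, M_b)

# Synthetic stand-ins for two regions: T = 200 days, 12 and 5 participants.
rng = np.random.default_rng(0)
M_1 = rng.normal(loc=0.0, size=(200, 12))
M_2 = rng.normal(loc=0.5, size=(200, 5))
d_12 = region_distance(M_1, M_2, rng)
print(d_12)
```

Because the projection is random, the result changes with the seed; averaging over several projections is one way to reduce that variability. The number of rows $T$ should comfortably exceed the projected width, otherwise the fitted covariances are close to singular and the ridge term dominates.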