Apply function on two set to create a new (+ process validation: bootstrap iterations)

39 Views Asked by At

I am new to algebra and help would be more than welcome to tell me if the process I have built is OK, and if my attempt to apply formula on two sets to create a new one is also OK.

Context

I have a database containing species records (e.g. 10 different species, with 100 rows by species ; in columns are quantitative variables). I want to compute Euclidean distances (considering all variables) between randomly sampled 20 row by species, between each species and species h. I want to bootstrap this calculation an increasing number of time to assess the effect of iteration augmentation on results linearity (to say: OK, we have reach linearity, results should be OK). The aim is to show a figure like that (1 line color = 1 species Euclidean distance to species h):

enter image description here

To explain the process, I illustrate it with distance calculation between species and species h:

  1. We define sets and as original species records.

  1. Then we define and as proper subsets of and composed of 20 records randomly sampled in and , without replacement, so that probability P(r) for records to be selected is:

  1. Then we define the following function to compute the mean Euclidean distance between all records of and

With n = 20 (variables) and n' = 20 (randomly sampled records ; size of and ).

  1. Then we define set D, which contains Euclidean distances between records and :

  1. Finally, we define set B containing number of iterations of the whole process, from sampling event (with replacement between each iteration, giving a probability for records to be selected between iterations) to sed D computation. The following formula allow computing set M:

With B = rounded values.

Mainly, I am not pretty sure that I have the right to build this way...

Could you please tell me if it is OK to call functions this way in sets ? And if you spot mistakes in the process ?

Many thanks for your help !