Applying a function to two sets to create a new one (+ process validation: bootstrap iterations)

56 Views Asked by RPO At 26 Apr 2025 - 3:59

I am new to algebra, and I would welcome help checking whether the process I have built is sound and whether my attempt to apply a formula to two sets to create a new one is valid.

Context

I have a database containing species records (e.g. 10 different species, with 100 rows per species; the columns are quantitative variables). I want to compute Euclidean distances (considering all variables) between 20 randomly sampled rows per species, between each species and species h. I want to bootstrap this calculation an increasing number of times to assess the effect of increasing the number of iterations on the linearity of the results (that is, to be able to say: we have reached linearity, so the results should be OK). The aim is to show a figure like this (one line color = the Euclidean distance from one species to species h):

To explain the process, I illustrate it with the distance calculation between species $\alpha$ and species h:

We define sets $R_\alpha$ and $R_h$ as original species records.

$R_\alpha = \left \{ 1,2,3,4,\dots,n \mid n\in \mathbb{N} \text{ and } n\geq 21 \right \}$

$R_h = \left \{ 1,2,3,4,\dots,n \mid n\in \mathbb{N} \text{ and } n\geq 21 \right \}$

Then we define $S_\alpha$ and $S_h$ as proper subsets of $R_\alpha$ and $R_h$, each composed of 20 records randomly sampled from $R_\alpha$ and $R_h$ without replacement, so that the probability $P(r)$ for records to be selected is:

$P(r)=\frac{(N-n)!}{N!}$

$S_\alpha \subset R_\alpha \text{ and } S_h \subset R_h, \text{ with } n(S)=20$
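For readers who prefer code, the sampling step can be sketched in Python as below. This is only an illustration under assumptions: the data are randomly generated placeholders (100 records with 20 variables each), and `sample_records` is a hypothetical helper name, not part of the original process.

```python
import random


def sample_records(records, k=20, rng=None):
    """Draw k distinct records from `records` (no replacement within one draw)."""
    if rng is None:
        rng = random.Random()
    if len(records) <= k:
        # A proper subset needs strictly more records than the sample size.
        raise ValueError("need more than k records to draw a proper subset")
    return rng.sample(records, k)


# Placeholder data (assumption): 100 records for one species, 20 variables each.
rng = random.Random(0)
R_alpha = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(100)]

S_alpha = sample_records(R_alpha, k=20, rng=rng)
print(len(S_alpha))  # number of sampled records
```

Calling `sample_records` again draws a fresh sample from the full set of records, which matches the idea of replacing records between iterations.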

Then we define the following function to compute the mean Euclidean distance between all records of $S_\alpha$ and $S_h$

$f(x,y)=\frac{1}{n'}\sum_{j=1}^{n'}\left ( \sqrt{\sum_{i=1}^{n}(y_i-x_i)^{2}} \right )_j$

With $n = 20$ (variables) and $n' = 20$ (randomly sampled records; the size of $S_\alpha$ and $S_h$).
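As a rough illustration of the function $f(x,y)$, here is a Python sketch. The formula has $n'$ terms, which I read as pairing the $j$-th record of one sample with the $j$-th record of the other; that pairing is an assumption. A second helper shows the alternative reading (all pairs of records) for comparison. Both function names are hypothetical.

```python
import math
from itertools import product


def mean_paired_distance(S_x, S_y):
    """Mean Euclidean distance between the j-th record of S_x and the j-th of S_y."""
    if len(S_x) != len(S_y):
        raise ValueError("both samples must have the same number of records")
    return sum(math.dist(x, y) for x, y in zip(S_x, S_y)) / len(S_x)


def mean_all_pairs_distance(S_x, S_y):
    """Alternative reading: mean distance over every (x, y) pair of records."""
    pairs = list(product(S_x, S_y))
    return sum(math.dist(x, y) for x, y in pairs) / len(pairs)


# Tiny example with 2 records and 2 variables each.
S_x = [[0.0, 0.0], [1.0, 1.0]]
S_y = [[3.0, 4.0], [1.0, 1.0]]
print(mean_paired_distance(S_x, S_y))  # (5 + 0) / 2 = 2.5
```

The two readings generally give different values, so it is worth stating in the text which one is meant.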

Then we define the set $D_{(\alpha ,h)}$ (set D for short), which contains the Euclidean distances between the records of $S_\alpha$ and $S_h$:

$D_{(\alpha ,h)}=\left \{ f(x,y) \mid x\in S_\alpha \text{ and } y\in S_h \right \}$

Finally, we define set B, containing the numbers of iterations of the whole process, from the sampling event (with replacement between iterations, giving a probability $P(r)=\frac{1}{N}$ for records to be selected between iterations) to the computation of set D. The following formula $f(x)$ allows computing set M:

$B\approx \left \{ 1.6^x \mid x\in \mathbb{N}_0 \text{ and } 0\leq x\leq 20\right \}$

$f(x)=\frac{1}{n''}\sum_{l=1}^{n''}x_{l}$

$M_{(\alpha ,h)}=\left \{ f(x) \mid x\in D \text{ and } n''\in B \right \}$

The values in B are rounded.
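To make the whole loop concrete, here is a minimal Python sketch of how B and M could be computed. It rests on several assumptions: the data are random placeholders, distances are paired record by record, and each iteration count in B gets its own fresh set of iterations rather than reusing earlier ones. All function names are hypothetical.

```python
import math
import random


def iteration_counts(base=1.6, max_exponent=20):
    """Rounded values of base**x for x = 0..max_exponent, duplicates removed."""
    return sorted({round(base ** x) for x in range(max_exponent + 1)})


def one_distance(R_x, R_y, k, rng):
    """One iteration: sample k records per species, then average paired distances."""
    S_x, S_y = rng.sample(R_x, k), rng.sample(R_y, k)
    return sum(math.dist(x, y) for x, y in zip(S_x, S_y)) / k


def bootstrap_means(R_x, R_y, counts, k=20, seed=0):
    """Map each iteration count b to the mean of b freshly computed distances."""
    rng = random.Random(seed)
    return {
        b: sum(one_distance(R_x, R_y, k, rng) for _ in range(b)) / b
        for b in counts
    }


# Placeholder data (assumption): 100 records per species, 20 variables each.
data_rng = random.Random(1)
R_alpha = [[data_rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(100)]
R_h = [[data_rng.gauss(1.0, 1.0) for _ in range(20)] for _ in range(100)]

B = iteration_counts(max_exponent=8)  # small exponent to keep the demo fast
M = bootstrap_means(R_alpha, R_h, B)
for b in B:
    print(b, round(M[b], 3))
```

Plotting the values of `M` against the iteration counts in `B` for each species would give one line per species, as in the intended figure.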

Mainly, I am not sure that I am allowed to build $M_{(\alpha ,h)}$ this way...

Could you please tell me whether it is OK to call functions this way in sets? And whether you spot any mistakes in the process?

Many thanks for your help!
