Average percentage overlap between two or more datasets


I am analyzing different mutual fund schemes. Each scheme holds many funds in its portfolio, and I want to analyze the overlap (of funds) between these schemes. I can compute the overlap of scheme1 with scheme2 as $|A \cap B| / |A|$, where $A$ is the set of funds in scheme1 and $B$ is the set of funds in scheme2. Is there a way to find the average overlap between two schemes?

Suppose this is my data (pandas DataFrames). scheme1 looks like this:

funds    Qty   Value   Asset%
fund_0   q_0   v_0      p_0
fund_1   q_1   v_1      p_1
fund_2   q_2   v_2      p_2
fund_3   q_3   v_3      p_3
fund_4   q_4   v_4      p_4
fund_5   q_5   v_5      p_5
fund_6   q_6   v_6      p_6

scheme2 looks like this:

funds    Qty    Value  Asset%
fund_0   q_0_2  v_0_2  p_0_2
fund_2   q_2_2  v_1_2  p_1_2
fund_5   q_5_2  v_6_2  p_6_2
fund_6   q_2    v_2    p_2
fund_7   q_3    v_3    p_3
fund_8   q_4    v_4    p_4
fund_9   q_5    v_5    p_5
fund_10  q_01   v_98   p_59

In this case, the funds common to both schemes are fund_0, fund_2, fund_5 and fund_6, so I can compute the overlap of each scheme with respect to the other. I have some questions:
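For concreteness, here is a minimal sketch of the count-based overlap described above. It assumes each scheme is a DataFrame with a `funds` column as in the tables; the helper name `directional_overlap` is made up for illustration, and the Qty/Value/Asset% columns are left out because the count-based measure does not use them.

```python
import pandas as pd

# Placeholder frames: only the "funds" column matters for a count-based overlap.
scheme1 = pd.DataFrame({"funds": [f"fund_{i}" for i in range(7)]})
scheme2 = pd.DataFrame({"funds": ["fund_0", "fund_2", "fund_5", "fund_6",
                                  "fund_7", "fund_8", "fund_9", "fund_10"]})


def directional_overlap(a: pd.DataFrame, b: pd.DataFrame) -> float:
    """Percentage of funds in `a` that also appear in `b`: 100 * |A n B| / |A|.

    Hypothetical helper; assumes `a` holds at least one fund.
    """
    funds_a = set(a["funds"])
    funds_b = set(b["funds"])
    return 100 * len(funds_a & funds_b) / len(funds_a)


# Funds held by both schemes.
common = sorted(set(scheme1["funds"]) & set(scheme2["funds"]))

print(common)
print(directional_overlap(scheme1, scheme2))  # 100 * 4 / 7
print(directional_overlap(scheme2, scheme1))  # 100 * 4 / 8
```

Note that the measure is not symmetric: the overlap of scheme1 with scheme2 differs from the overlap of scheme2 with scheme1 whenever the schemes hold different numbers of funds.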

  1. Is there anything like an average overlap that would make sense in the real world?
  2. Right now I am calculating the overlap from the number of common funds in each scheme. So the % overlap for scheme1 above would be 100 * 4 / 7, 7 being the total number of funds in scheme1. Is there a more meaningful metric than this?
  3. What would the overlap of portfolios or schemes mean when comparing 3 or more schemes?
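To make questions 1 and 3 concrete, this is a sketch of one possible reading, not an established definition: average the two directional overlaps for each pair of schemes, then average over all pairs. The third scheme and the function names `overlap_pct`, `symmetric_overlap` and `mean_pairwise_overlap` are hypothetical, introduced only for illustration.

```python
from itertools import combinations

# Each scheme reduced to its set of fund names.
schemes = {
    "scheme1": {f"fund_{i}" for i in range(7)},
    "scheme2": {"fund_0", "fund_2", "fund_5", "fund_6",
                "fund_7", "fund_8", "fund_9", "fund_10"},
    # Hypothetical third scheme, not part of my real data.
    "scheme3": {"fund_0", "fund_5", "fund_11"},
}


def overlap_pct(a: set, b: set) -> float:
    """Directional overlap 100 * |A n B| / |A| (assumes `a` is non-empty)."""
    return 100 * len(a & b) / len(a)


def symmetric_overlap(a: set, b: set) -> float:
    """Assumed 'average overlap' of a pair: mean of the two directions."""
    return (overlap_pct(a, b) + overlap_pct(b, a)) / 2


def mean_pairwise_overlap(named_schemes: dict) -> float:
    """Mean of the symmetric overlap over every unordered pair of schemes."""
    pairs = list(combinations(named_schemes.values(), 2))
    return sum(symmetric_overlap(a, b) for a, b in pairs) / len(pairs)


# A stricter alternative for 3+ schemes: funds held by every scheme.
common_to_all = set.intersection(*schemes.values())

print(symmetric_overlap(schemes["scheme1"], schemes["scheme2"]))
print(mean_pairwise_overlap(schemes))
print(sorted(common_to_all))
```

Both readings count funds only and ignore how much of each portfolio a fund represents, which is part of what question 2 is asking about.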