I have $m$ binary vectors, all of the same length $F$ bits, whose symbols are i.i.d. random bits. These vectors are $70\%$ similar to each other: for example, if the vectors are $100$ bits long, $70$ of those bits are exactly the same in all vectors. The vectors are clustered into $K$ clusters according to their similarity, under the constraint that the maximum Hamming distance within each cluster is at most $\Delta$.

Now I want to compute the conditional entropy of each vector within a cluster given that cluster's centroid, and then average it over all clusters. This quantity depends on the fraction of bits in which each vector differs from its centroid. For example, if vector $X_1$ differs from its centroid $C_1$ in cluster 1 in a fraction $p=0.03$ of its bits, then $H(X_1|C_1)=H(0.03)=-0.03\log 0.03-0.97\log 0.97$.
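A minimal sketch of the per-vector quantity described above. It assumes base-2 logarithms (entropy in bits), represents each vector as a list of 0/1 integers, and takes $p$ to be the Hamming distance between a vector and its centroid divided by $F$. The question does not say how centroids are formed, so `majority_centroid` (a bitwise majority vote) is a hypothetical choice for illustration, as are all the function names.

```python
import math
from typing import List, Sequence

Vector = Sequence[int]  # a bit vector of length F, entries 0 or 1


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits; H(0) = H(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def hamming_distance(x: Vector, y: Vector) -> int:
    """Number of positions where two equal-length bit vectors differ."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length F")
    return sum(a != b for a, b in zip(x, y))


def mismatch_fraction(x: Vector, c: Vector) -> float:
    """p = (Hamming distance between x and its centroid c) / F."""
    return hamming_distance(x, c) / len(x)


def majority_centroid(cluster: List[Vector]) -> List[int]:
    """Hypothetical centroid rule: bitwise majority vote, ties broken toward 1."""
    n = len(cluster)
    length = len(cluster[0])
    return [1 if 2 * sum(v[j] for v in cluster) >= n else 0 for j in range(length)]


# Reproduce the example from the question: X_1 differs from C_1 in 3 of 100 bits.
x1 = [0] * 100
c1 = [0] * 97 + [1] * 3
p = mismatch_fraction(x1, c1)       # 0.03
print(round(binary_entropy(p), 4))  # 0.1944
```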
My question is: how can I average this quantity over all vectors in all clusters, given that $p$ may be different for each vector? Should I take the average over $p$?
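One possible way to set this up, sketched under the assumption that clusters and centroids are already given: compute $H(p_i)$ for each vector first, then average. Writing $\mathcal{C}_k$ for the set of vectors in cluster $k$ and $\bar H_k$ for the mean of $H(p_i)$ within it (notation introduced here for illustration), the average over all $m$ vectors is

$$\bar H = \frac{1}{m}\sum_{k=1}^{K}\sum_{X_i\in\mathcal{C}_k} H(p_i) = \sum_{k=1}^{K}\frac{|\mathcal{C}_k|}{m}\,\bar H_k,$$

i.e. a size-weighted average of the cluster means, which differs from the unweighted mean of the $\bar H_k$ unless all clusters have the same size. Because $H$ is concave, Jensen's inequality gives $\bar H \le H(\bar p)$ with $\bar p$ the mean of the $p_i$, so plugging the average $p$ into $H$ yields an upper bound rather than the same value. The helper `average_entropy` below is hypothetical and the toy data are made up.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[int]  # a bit vector of length F, entries 0 or 1


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits; H(0) = H(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def average_entropy(
    clusters: List[List[Vector]], centroids: List[Vector]
) -> Tuple[float, float, float]:
    """Hypothetical helper. Returns
    (mean of H(p_i) over all m vectors,
     unweighted mean of the K per-cluster means,
     H(mean of p_i), for comparison)."""
    all_h: List[float] = []
    all_p: List[float] = []
    cluster_means: List[float] = []
    for cluster, c in zip(clusters, centroids):
        # Fraction of differing bits between each vector and its centroid.
        ps = [sum(a != b for a, b in zip(x, c)) / len(x) for x in cluster]
        hs = [binary_entropy(p) for p in ps]
        all_p.extend(ps)
        all_h.extend(hs)
        cluster_means.append(sum(hs) / len(hs))
    overall = sum(all_h) / len(all_h)  # equals the size-weighted cluster average
    unweighted = sum(cluster_means) / len(cluster_means)
    h_of_mean_p = binary_entropy(sum(all_p) / len(all_p))  # >= overall by Jensen
    return overall, unweighted, h_of_mean_p


# Toy example with F = 4 and K = 2 (made-up data).
clusters = [[[0, 0, 0, 0], [0, 0, 0, 1]], [[1, 1, 1, 1]]]
centroids = [[0, 0, 0, 0], [1, 1, 1, 1]]
print(average_entropy(clusters, centroids))
```

Which of the two cluster averages fits depends on whether each vector or each cluster should count equally; the third value illustrates that averaging $p$ first and then applying $H$ generally gives a larger number.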
I would be grateful for any help with this question.