I have the output of a fitted logistic regression model whose score is a function of $k$ features, $Y = Y(X_1, \ldots, X_k)$. By output I mean the predicted values in the interval $[0,1]$ produced by the logistic regression.
Consider the following problem. We are given a sample of $m$ observations of a new categorical feature $X_{k+1}$ with $\ell$ categories. For each observation in this sample we have the score $Y$, but not the values of the variables $X_i$, $i = 1, 2, \ldots, k$. We also have information on a characteristic $Z$ that determined the original binary target (it is some kind of loss information). The higher $Y$ is, the higher $Z$ should be.
Our aim is to estimate, based on this sample, how many more observations of the new feature $X_{k+1}$ we need in order to build the model with variables $X_1, \ldots, X_k, X_{k+1}$ at a given significance level. Do you know of any methods for estimating the required sample size, and which parameters should be taken into account when determining it?
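
For concreteness, one way the question could be framed is as a simulation-based power calculation. The sketch below is only an illustration under assumptions that are not part of the setup above: the existing score $Y$ is treated as a calibrated probability and enters as a fixed offset $\operatorname{logit}(Y)$, a binary target is available (for example, derived from $Z$), the new feature shifts the log-odds by a category-specific amount, scores are drawn uniformly, and categories are equally likely. The function names and the effect sizes in `shifts` are hypothetical; in practice the scores and category frequencies would be resampled from the $m$ pilot observations.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_statistic(scores, cats, targets, n_cats):
    """Score test for category-specific log-odds shifts, with logit(score)
    as a fixed offset. Under the null (no shift, calibrated score) this is
    approximately chi-square with n_cats degrees of freedom."""
    observed = [0.0] * n_cats
    expected = [0.0] * n_cats
    variance = [0.0] * n_cats
    for y, c, t in zip(scores, cats, targets):
        observed[c] += t
        expected[c] += y
        variance[c] += y * (1.0 - y)
    return sum((o - e) ** 2 / v
               for o, e, v in zip(observed, expected, variance) if v > 0)


def simulate_statistic(n, shifts, rng):
    """Draw one synthetic sample of size n and return its test statistic.
    ASSUMPTION: uniform scores and equally likely categories."""
    n_cats = len(shifts)
    scores = [rng.uniform(0.05, 0.95) for _ in range(n)]
    cats = [rng.randrange(n_cats) for _ in range(n)]
    targets = [1 if rng.random() < expit(logit(y) + shifts[c]) else 0
               for y, c in zip(scores, cats)]
    return score_statistic(scores, cats, targets, n_cats)


def estimate_power(n, shifts, alpha=0.05, n_sims=200, seed=0):
    """Monte Carlo power: the critical value is the empirical (1 - alpha)
    quantile of the statistic simulated under the null."""
    rng = random.Random(seed)
    zero = [0.0] * len(shifts)
    null_stats = sorted(simulate_statistic(n, zero, rng) for _ in range(n_sims))
    idx = min(n_sims - 1, int(math.ceil((1.0 - alpha) * n_sims)) - 1)
    critical = null_stats[idx]
    rejections = sum(simulate_statistic(n, shifts, rng) > critical
                     for _ in range(n_sims))
    return rejections / n_sims


def smallest_n(candidates, shifts, target_power=0.8, **kwargs):
    """Return the smallest candidate sample size reaching the target power,
    or None if no candidate does."""
    for n in sorted(candidates):
        if estimate_power(n, shifts, **kwargs) >= target_power:
            return n
    return None
```

A call such as `smallest_n([250, 500, 1000, 2000], shifts=[0.0, 0.0, 0.3])` would then scan candidate sample sizes for an assumed effect in one of $\ell = 3$ categories. Under this framing, the parameters that drive the answer are the significance level, the target power, the number of categories $\ell$ and their frequencies, the distribution of $Y$, and the smallest effect of $X_{k+1}$ one wants to detect.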