So I have the following question assuming I start with N academic papers, though I was thinking to make this simple I start with one academic paper. And say it has C citations, and each one of these C citations, has their own D citations. Is there a way I could estimate when there would be a convergence of the number of citations. Basically when I would end up reading citations that I had already read? (So basically D would be one of the N papers)
(The inspiration for this question is because I have to do a literature review in Bioengienering, and I was wondering, what depth I should go to to read relevant papers, that are cited by a relevant paper, that is sufficient to have done an in depth analysis of my sub field)
I'll sketch an example of how we can mathematically model this citation network to estimate the number of nodes. It's going to be a fairly crude model, but it should give an idea as to how more sophisticated models could work. (The idea presented here is similar to the mark and recapture method used to estimate animal population sizes.)
We model the citation network as a directed network (no loops, no parallel edges).
Empirical data. We first collect the following statistics:
Null model. The choice of null model will be critical for the accuracy of the result, but I'll assume a fairly basic model to illustrate the concept. We have a null model with the following parameters:
Then, $p$ is approximated by $$\hat{p}_n:=1-\frac{\displaystyle\binom{n-2}{d}\binom{n-2-d}{d}}{\displaystyle\binom{n-1}{d}^2}.$$ This estimates the probability that, in the null model, two distinct vertices $X'$ and $Y'$ are (a) not neighbours and (b) do not have a common out-neighbour.
Estimate for the number of nodes. Since we have estimates for $d$ and $p$, we can estimate $n$ from the above formula (pick the value of $n$ for which $\hat{p}_n$ is closest to $p$). This is the maximum likelihood estimate.
More sophisticated null models. More sophisticated null models would give better estimates, but would be mathematically harder to analyse.