I have a large set of values $t = \{t_i\}_{i=1}^N$. In actuality, these values (in some set of units) can range between $0$ and an unknown cutoff of the order of $10^7$, but they come from a numerical simulation which, due to memory constraints, I had to downsample: in the course of the simulation I dropped all values $t_i < 5.0$.
I would like to calculate the cumulative probability that $t > T$. When I count the number of $t_i$ greater than $T$ and plot it versus $T$, I get a nice-looking truncated-power-law-type distribution for the counts $N(t>T)$ as a function of $T$.
However, I cannot simply write $P(t>T) = N(t>T)/N$, because I neglected a great many values at $t<5.0$; I should really be normalizing by the total number of values, including those I dropped, rather than by the size of my downsampled data.
That is, the largest value of $P(t>T)$ should occur at $T=0$, not at $T=5.0$, which is where it would occur under this normalization.
How can I handle a truncated dataset of this form? I need to calculate a histogram using the frequency of occurrence of values, but I have no means to normalize the counts, because I don't know how many values should actually exist if I hadn't truncated the data.
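To make the normalization problem concrete, here is a minimal sketch (using NumPy and a synthetic Pareto-type sample as a stand-in for the simulation output, both assumptions not in the question): with only the kept values, the quantity one can actually compute is the conditional survival function $P(t>T \mid t \ge t_{\min})$, which overestimates the true $P(t>T)$ by the unknown factor $1/P(t \ge t_{\min})$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the full simulation output: a Pareto-type
# sample with values >= 1 (the real data are unavailable here).
full = rng.pareto(1.5, size=100_000) + 1.0

t_min = 5.0
kept = full[full >= t_min]  # what survives the downsampling

# Empirical conditional survival function P(t > T | t >= t_min):
# this needs only the kept values.
def surv_conditional(sample, T):
    return np.mean(sample > T)

T = 10.0
p_cond = surv_conditional(kept, T)  # computable from truncated data
p_true = np.mean(full > T)          # only knowable here because we kept `full`

# The two differ by exactly the truncation probability:
# p_cond * P(t >= t_min) == p_true.
```

So with the truncated data alone, the curve can only be pinned down up to the overall factor $P(t \ge t_{\min})$, which is precisely the unknown normalization described above.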
Any help is appreciated! Thanks
Why did you choose $5.0$? You are clearly losing critical information, and the data set you end up with is not a good sample. If you must downsize the sample, don't do it by truncating at an arbitrary threshold; instead, if possible, draw a random subsample that is small enough to store. That way you can hope to get a representative sample, which you evidently do not have now.
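If the simulation emits values one at a time and only a bounded number can be stored, the uniform random subsample suggested above can be maintained online with reservoir sampling (Algorithm R). A sketch, assuming the stream and reservoir size are placeholders for the real simulation:

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of size k from a stream of
    unknown length, using O(k) memory (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(x)
        else:
            # Replace a random slot with probability k / (i + 1),
            # which keeps every item equally likely to be retained.
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = x
    return reservoir

# Hypothetical usage: keep 10,000 of a million simulated values.
sample = reservoir_sample(range(1_000_000), k=10_000)
```

Because every item has the same probability $k/n$ of surviving, the histogram of the reservoir estimates the full distribution directly, with no unknown normalization factor.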