I'll try to be as specific as possible here. This is a problem I'm trying to solve at work. There are two questions:
Question 1: How can I prove I am within a range 95% of the time with 99.99% confidence using a discrete and dependent data set?
Question 2: How many useful data points can I get from a dependent data set to estimate PDFs with?
Setup:
I have GPS data from flight tests. I can figure out the autocorrelation between each data point and find points that are uncorrelated. Can I use these data points to do the statistics in Question 1?
One data point per flight test ensures independent data. This is the best-case scenario, but it costs more money, so I want to get as many useful data points per flight test as possible.
Any ideas?
Suppose the population distribution is $\mathsf{Gamma}(\text{shape}=5, \text{rate}=1/10),$ which might arise if you were adding five independent exponential waiting times (each with rate $1/10,$ or mean $10$).
A simple and traditional way to get an idea of the density function from data is to make a histogram. A more modern and sophisticated way is to use a 'kernel density estimator' of the data. (You can google that if you're interested.)
For samples of sizes $n = 50, 500,$ and $5000$ from this distribution, I'll show a histogram of the data and a KDE of the data (red curve) along with the true density of $\mathsf{Gamma}(5, 1/10)$ in black. As you can see, larger samples tend to give better approximations of the density function. [In practice, I suppose the true density function might not be known.]
Here is the R code that produced the figure, in case it is of any use.
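The sketch below shows one way such a figure could be produced. It is a minimal reconstruction under stated assumptions, not necessarily the exact program behind the figure: the seed, colors, and the `kde_err` summary are illustrative choices, while the distribution and sample sizes come from the text.

```r
set.seed(2024)                       # arbitrary seed, only for reproducibility
shape <- 5; rate <- 1/10             # population parameters from the text
sizes <- c(50, 500, 5000)            # sample sizes from the text
kde_err <- numeric(0)                # illustrative: max |KDE - true density| per n

op <- par(mfrow = c(1, 3))           # three panels side by side
for (n in sizes) {
  x <- rgamma(n, shape = shape, rate = rate)
  hist(x, freq = FALSE, col = "skyblue2",
       main = paste("n =", n), xlab = "x")
  d <- density(x)                    # kernel density estimate (default bandwidth)
  lines(d, col = "red", lwd = 2)     # KDE in red
  curve(dgamma(x, shape = shape, rate = rate),
        add = TRUE, lwd = 2)         # true Gamma(5, 1/10) density in black
  kde_err[as.character(n)] <-
    max(abs(d$y - dgamma(d$x, shape = shape, rate = rate)))
}
par(op)

kde_err                              # tends to shrink as n grows (random, not guaranteed)
```

The `kde_err` vector is only a rough numerical companion to the plots; with a different seed the values will differ, and the decrease with $n$ is a tendency rather than a certainty.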
Addendum on ACF: Roughly speaking, the autocorrelation for lag $g=10$ of a sequence $W_i,$ with $i = 1, \dots, 100,$ is the sample correlation of $(W_1, W_2, \dots, W_{90})$ and $(W_{11}, W_{12}, \dots, W_{100}).$ However, in finding this correlation, the sample mean and variance of all 100 observations are used. The ACF of $W_i$ consists of the autocorrelations for lags $g = 0$ (autocorrelation 1), $g = 1, 2, 3, \dots.$ You can google 'autocorrelation' for details.
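To make the lag-10 description concrete, here is a small sketch that computes the autocorrelation by hand and compares it with R's `acf`. The AR(1) test series and its coefficient 0.6 are assumptions chosen only to have some correlated data to work with.

```r
set.seed(1)                                   # arbitrary seed
# Hypothetical correlated series of length 100 (AR(1), coefficient 0.6)
w <- as.numeric(arima.sim(list(ar = 0.6), n = 100))

g    <- 10
n    <- length(w)
wbar <- mean(w)                               # mean of ALL 100 observations

# Lag-g autocorrelation: cross-products of (W_1..W_90) with (W_11..W_100),
# centered at the overall mean and scaled by the overall sum of squares
r_manual <- sum((w[1:(n - g)] - wbar) * (w[(1 + g):n] - wbar)) /
            sum((w - wbar)^2)

# Same quantity from acf(); element g + 1 because lag 0 comes first
r_acf <- acf(w, lag.max = g, plot = FALSE)$acf[g + 1]

# Ordinary correlation of the two subsequences, which uses their own
# means and variances and therefore differs slightly
r_naive <- cor(w[1:(n - g)], w[(1 + g):n])

c(manual = r_manual, acf = r_acf, naive = r_naive)
```

The first two values agree; the third is close but not identical, which is the "roughly speaking" in the description above.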
The program below simulates a Markov chain $W_i$ over $m = 10{,}000$ steps, and makes an ACF plot of the series $W_1, \dots, W_{10000}$ for various lags. For simplicity, the $W_i$ take only the values 0 and 1; the state space of the chain is $\{0,1\}.$ [This is a simulated weather process for sun=0 and rain=1 during the rainy season in a Mediterranean climate: e.g., Tel Aviv, San Francisco, Santiago.]
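A minimal sketch of such a simulation follows. The transition probabilities `a` and `b` are assumptions made for illustration (the values used for the original plot are not given in the text), as are the seed and the starting state.

```r
set.seed(1234)          # arbitrary seed
m <- 10000              # number of steps, from the text
a <- 0.10               # ASSUMED P(sun today -> rain tomorrow)
b <- 0.25               # ASSUMED P(rain today -> sun tomorrow)

w <- numeric(m)
w[1] <- 0               # assumed start: a sunny day
for (i in 2:m) {
  # Probability of rain tomorrow depends only on today's state
  p_rain <- if (w[i - 1] == 0) a else 1 - b
  w[i] <- rbinom(1, 1, p_rain)
}

mean(w)                 # long-run proportion of rainy days; theory: a / (a + b)

# ACF plot; broken blue lines mark +/- qnorm(0.975) / sqrt(m)
acf(w, lag.max = 40, main = "ACF of simulated 2-state chain")

# For a 2-state chain the theoretical lag-g autocorrelation is (1 - a - b)^g
round((1 - a - b)^(0:12), 4)
```

With these assumed values the theoretical autocorrelation is $0.65^g,$ which falls to roughly the width of the significance band around lag 9 or 10.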
It seems that autocorrelations decay to insignificance after about lag 10. [Knowing that it rains today has essentially no predictive value for rain ten days from now.] Then the second ACF plot of thinned data $W_1, W_{11}, \dots W_{9991}$ shows essentially no autocorrelations (beyond $g=0).$ Autocorrelations within the broken blue horizontal bands are taken as insignificant.
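The thinning step can be sketched as follows. The helper `sim_chain` and the transition probabilities are hypothetical, chosen only so the example is self-contained; the thinning interval of 10 is taken from the text.

```r
# Hypothetical helper: simulate a 2-state Markov chain of length m,
# with a = P(0 -> 1) and b = P(1 -> 0)
sim_chain <- function(m, a, b, start = 0) {
  w <- numeric(m)
  w[1] <- start
  for (i in 2:m) {
    w[i] <- rbinom(1, 1, if (w[i - 1] == 0) a else 1 - b)
  }
  w
}

set.seed(1234)                            # arbitrary seed
m <- 10000
w <- sim_chain(m, a = 0.10, b = 0.25)     # ASSUMED transition probabilities

# Keep every 10th observation: W_1, W_11, ..., W_9991
idx    <- seq(1, m, by = 10)
w_thin <- w[idx]
length(w_thin)                            # 1000 retained points

# ACF of the thinned series and its 95% significance band
r_thin <- acf(w_thin, lag.max = 40, main = "ACF of thinned chain")$acf
band   <- qnorm(0.975) / sqrt(length(w_thin))
c(lag1 = r_thin[2], band = band)
```

In this sketch, thinning trades 10,000 dependent observations for 1,000 that are approximately uncorrelated. Note that lack of autocorrelation is a weaker property than independence, so treating the thinned points as independent is an approximation.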