A question about calculating the empirical cumulative distribution function

I have a general question about calculating the ECDF (empirical cumulative distribution function) at the annual scale. Assume I have a basin with 10 climate stations, and I have 30 values for each of the 10 stations (representing 30 years). Now I want to calculate the ECDF at the annual scale for this basin. Is it correct to combine the 30 values from all 10 stations to get 300 values and then use the MATLAB ecdf function? Or should I average the 10 stations for each year to get 30 mean values and then use the MATLAB ecdf function? Any help or suggestions are highly appreciated.


The main question is "ECDF of what?" You don't say. I suspect you're supposed to get one ECDF for the basin, which would mean averaging the 10 stations for each of the 30 years.
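To make the two options in the question concrete, here is a minimal R sketch using simulated, purely hypothetical data: a 30 x 10 matrix (named annual here for illustration) holding one value per year and station. Averaging across stations gives one basin value per year, so the ECDF is built from 30 values; pooling gives 300 station-year values, whose ECDF describes a single station in a single year rather than the basin as a whole.

```r
# Hypothetical data: 30 years (rows) x 10 stations (columns).
# The values are simulated, not real climate data.
set.seed(1)
annual <- matrix(rnorm(30 * 10, mean = 800, sd = 120), nrow = 30, ncol = 10)

# Option 1: average the 10 stations within each year -> 30 basin values
basin_mean <- rowMeans(annual)
F_basin <- ecdf(basin_mean)

# Option 2: pool every station-year -> 300 values
pooled <- as.vector(annual)
F_pooled <- ecdf(pooled)

# Overlay the two step functions to compare their spread
plot(F_pooled, col = "grey50", main = "Pooled (300) vs. basin mean (30)")
plot(F_basin, col = "blue", add = TRUE)
```

In this simulation the stations are independent, so the yearly basin means vary much less than individual station-years and the blue ECDF is much steeper. With real stations, which are usually correlated within a year, the difference would be smaller, but the two ECDFs still answer different questions.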

You should understand how the ECDF is made from data before you blindly use MATLAB to get the ECDF.

The steps are: (1) Sort the 30 observations from smallest to largest. (2) Make a stairstep function that starts on the left at $0.$ Then at each data value the function jumps up by $1/n = 1/30,$ so that it reaches $1$ at the largest observation. Unlikely here, but if there are $k$ observations tied at a value, then the jump at that value is $k/n.$
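The two steps can be written out directly. The following is a minimal R sketch; the function name manual_ecdf is hypothetical and not part of R. It sorts the distinct values, counts the $k$ observations tied at each one, and accumulates jumps of $k/n$ into a staircase. On the same data, R's built-in ecdf should give the same values.

```r
# Hypothetical helper: builds the ECDF step function by hand
manual_ecdf <- function(data) {
  n <- length(data)
  vals <- sort(unique(data))                              # step 1: sorted distinct values
  k <- tabulate(match(data, vals), nbins = length(vals))  # number of ties at each value
  heights <- cumsum(k) / n                                # step 2: jump of k/n at each value
  function(t) {
    # findInterval counts how many distinct values are <= t;
    # a count of 0 means t is left of all the data, where the ECDF is 0
    c(0, heights)[findInterval(t, vals) + 1]
  }
}

# Usage: compare with R's built-in ecdf on simulated data
set.seed(2020)
x <- rnorm(30, 100, 10)
Fhat <- manual_ecdf(x)
all.equal(Fhat(x), ecdf(x)(x))
```

With a tie, for example the data 3, 1, 2, 2, the jump at 2 is $2/4$, so the function steps from $0.25$ to $0.75$ there.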

For illustration, here are 30 simulated data values from R statistical software. I'll show the data summary, a histogram, and finally, the ECDF made by R. The function rug in R puts small tick marks along the horizontal axis at each data point. It's fine for small samples.

```r
set.seed(2020)  # for reproducibility using R
x = rnorm(30, 100, 10)
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   69.61   97.91  102.50  101.57  109.39  121.74 
par(mfrow = c(1, 2))  # enables 2 panels per plot
hist(x, col = "skyblue2", labels = TRUE, ylim = c(0, 13)); rug(x)
plot(ecdf(x)); rug(x)
par(mfrow = c(1, 1))
```

[Figure: histogram of x (left) and ECDF of x (right), each with rug marks]

As a general principle, one hopes that the histogram of a large sample from a distribution will give an idea of the density function of the distribution. Similarly, an ECDF may give an idea of the CDF of the distribution.

A sample of size 30 is not large enough for this to work really well, but here is the histogram (this time on a density scale) along with the density function (red curve) of the distribution $\mathsf{Norm}(\mu=100,\sigma=10)$ that I used to generate the data. Also, the ECDF is shown along with the corresponding CDF.

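[Figure: density-scale histogram of x with the $\mathsf{Norm}(\mu=100,\sigma=10)$ density (red curve), and the ECDF of x with the corresponding CDF]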
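The code for this second figure is not shown above. A minimal R sketch that produces a similar pair of panels from the same simulated data might look like this; the panel titles and the choice of y-axis limit are assumptions, not taken from the original figure. The last line uses ks.test to report the largest vertical gap between the ECDF and the $\mathsf{Norm}(100, 10)$ CDF (the Kolmogorov-Smirnov statistic), which gives one number for how closely the ECDF follows the CDF.

```r
set.seed(2020)                      # same simulated data as above
x <- rnorm(30, 100, 10)

par(mfrow = c(1, 2))
# Left: histogram on the density scale, with the Norm(100, 10) density in red.
# The y-limit is chosen so that neither the bars nor the curve get clipped.
h <- hist(x, plot = FALSE)
ymax <- max(h$density, dnorm(100, mean = 100, sd = 10))
hist(x, freq = FALSE, col = "skyblue2", ylim = c(0, ymax),
     main = "Histogram (density scale)")
curve(dnorm(x, mean = 100, sd = 10), add = TRUE, col = "red", lwd = 2)
# Right: the ECDF, with the Norm(100, 10) CDF in red
plot(ecdf(x), main = "ECDF and population CDF")
curve(pnorm(x, mean = 100, sd = 10), add = TRUE, col = "red", lwd = 2)
par(mfrow = c(1, 1))

# Largest vertical distance between the ECDF and the population CDF
D <- ks.test(x, "pnorm", 100, 10)$statistic
```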

Of course, your data will be different, and results in MATLAB will look different from those in R, but this should give you an idea of what is going on.