Attempting to model the time distribution of earthquakes


I'm struggling to choose an appropriate distribution for the data I have. I have the number of earthquakes per year for 50 years, and my initial thought was to use a Poisson distribution to model the counts. However, I've now confused myself and am not sure whether something like a binomial distribution would be better. Thank you.


You should consider what you are modeling. The Poisson distribution models discrete counts of events occurring over a continuous interval (e.g., how many customers show up in an hour). The binomial distribution models discrete counts of successes over a fixed number of discrete trials (e.g., how many heads in 10 coin flips).

For what you are trying to do, Poisson seems best, since an earthquake can happen at any time, unless you are doing something unusual like counting all earthquakes in a discrete time period and place as a single earthquake.
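To make the contrast concrete, here is a small sketch (using a hypothetical rate of 12 quakes per year, not the asker's data) comparing the two models. The binomial version only makes sense if the year is chopped into a fixed number of discrete trials, here 365 days with at most one quake per day, which is exactly the kind of artificial discretization the Poisson model avoids:

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) for a Poisson(lam) random variable
    return math.exp(-lam) * lam**k / math.factorial(k)

def binom_pmf(k, n, p):
    # P(X = k) for a Binomial(n, p) random variable
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

lam = 12  # hypothetical average of 12 quakes per year
p_pois = poisson_pmf(10, lam)
# Binomial requires discretizing the year into trials,
# e.g. 365 days with at most one quake per day:
p_binom = binom_pmf(10, 365, lam / 365)

print(f"Poisson  P(X=10) = {p_pois:.4f}")
print(f"Binomial P(X=10) = {p_binom:.4f}")
```

With a large number of trials and small per-trial probability, the binomial pmf is nearly identical to the Poisson pmf, which is one reason the Poisson is the natural model for events that can happen at any instant.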


First, you need a clear definition of earthquake. (a) Place: worldwide? California? Japan? (b) Depth: any depth, or only up to 30 miles, etc.? (c) Magnitude: above 2? above 5? etc.? ('Seismic events' below magnitude 0.5 are not worth considering, because you don't know whether they are actually construction noise, mining activity, or a huge truck running into a bridge support; many small quakes also go undetected because they are beyond the range of any monitoring station.) It would also be interesting to know whether there is any clear upward or downward trend in the number of earthquakes per year over the 50-year period.

For example, I have data on earthquakes above magnitude 0.6, in a (roughly rectangular) region defined by longitude and latitude that includes most of California and some of Nevada, at depths up to 30 miles, for a period of 12 days. Numbers of quakes per day were: 33, 49, 28, 33, 37, 40, 51, 55, 38, 33, 33, 32. This is not enough data to answer your question, but enough to know some questions to ask.

You say you have the observed number $X_i$ of quakes per year for each of 50 years. If the criteria for inclusion are consistent across the time span, you could do a goodness-of-fit (GOF) test against the Poisson distribution. First estimate the rate by the average number of quakes per year, $\hat \lambda = \frac{1}{50}\sum_i X_i$.

Then, for each possible count $k$, the (estimated) expected number of years showing exactly $k$ quakes is $E_k = 50\,P_k$, where $P_k = e^{-\hat \lambda}\hat \lambda^{k}/k!$, and these expected counts are compared with the observed numbers of years via a chi-squared statistic.


Correction [added a day later]: Originally, I tried to show a method here for computing the chi-squared GOF statistic, but on further reflection I realize my method might not work for your data; I would have to see the data to be sure. If you are still following this, please list the 50 observations in your Question so I can cut and paste them for computation, and leave a Comment to get my attention. Alternatively, the example below may give you an idea of how to proceed.


For my pathetically inadequate data, Minitab statistical software combined count categories in an unsuccessful attempt to get large enough expected counts. Thus, the GOF test is not conclusive. But a plot of observed and expected values is not totally discouraging.

[Plot of observed and expected counts]
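As a rough sketch of the chi-squared computation on my 12-day counts listed above (the bins here are chosen by hand purely for illustration; software such as Minitab combines categories automatically, and as the correction above notes, the right binning depends on the actual data):

```python
import math

# Daily quake counts from the 12-day California/Nevada example above
counts = [33, 49, 28, 33, 37, 40, 51, 55, 38, 33, 33, 32]
n_days = len(counts)
lam_hat = sum(counts) / n_days  # MLE of the Poisson rate (here 38.5)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hand-chosen bins on the count axis, plus a final ">= 42" bin
bins = [(0, 32), (33, 36), (37, 41)]
obs = [sum(1 for c in counts if lo <= c <= hi) for lo, hi in bins]
exp = [n_days * sum(poisson_pmf(k, lam_hat) for k in range(lo, hi + 1))
       for lo, hi in bins]
obs.append(n_days - sum(obs))  # observed days with >= 42 quakes
exp.append(n_days - sum(exp))  # expected days with >= 42 quakes

chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
print(f"lambda_hat = {lam_hat}, chi-squared = {chi_sq:.2f}")
# degrees of freedom = 4 bins - 1 - 1 estimated parameter = 2
```

With only 12 observations the expected counts per bin are far too small for the chi-squared approximation to be trusted, which is exactly why the test above is inconclusive; with 50 years of data the same computation would stand on firmer ground.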

It is not clear to me how you would check GOF to a binomial distribution, unless you consider your highest annual count to be a true limit.

If you could get data on interarrival times between earthquakes, then you could do a Kolmogorov-Smirnov test of GOF to the exponential distribution. I suppose you know that USGS has excellent data on earthquakes worldwide, so maybe you could get such data from there.
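To illustrate that idea, here is a sketch of the one-sample Kolmogorov-Smirnov statistic computed against a fitted exponential. The interarrival times here are simulated, not real USGS data, since I don't have your data:

```python
import math
import random

# Hypothetical sketch: simulate interarrival times (in days), then
# compute the K-S statistic against an exponential fitted to them.
random.seed(1)
times = sorted(random.expovariate(0.5) for _ in range(200))
rate_hat = len(times) / sum(times)  # MLE of the exponential rate

def exp_cdf(t, rate):
    return 1 - math.exp(-rate * t)

n = len(times)
d_stat = max(
    max((i + 1) / n - exp_cdf(t, rate_hat),  # empirical CDF above fitted
        exp_cdf(t, rate_hat) - i / n)        # fitted CDF above empirical
    for i, t in enumerate(times)
)
print(f"K-S statistic D = {d_stat:.3f} for n = {n}")
# Because rate_hat was estimated from the same data, standard K-S
# critical values are conservative; a Lilliefors-type correction applies.
```

With real data you would compare $D$ against the appropriate critical value; note that estimating the rate from the same sample you are testing invalidates the off-the-shelf K-S tables, so a Lilliefors-style adjustment (or a parametric bootstrap) is the honest way to get a p-value.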

As a side note: I have pondered (so far, unproductively) whether there is a reasonable fit of magnitudes to a continuous distribution. My interest in the particular days for which I have data is that they include several days in 2000 just before and after a notable earthquake in Yountville, California (the far outlier in the histogram below). It has been suggested that a lognormal distribution may fit histograms such as this, but I feel that the drop-off in frequencies at the low end is arbitrary, owing to lack of detection, not lack of existence, for small magnitude events. (There may have been a local foreshock just before the Yountville quake and there were clearly a few small local aftershocks, but all clear effects were strictly local. So I doubt that the earthquakes shown here are atypical, except for the particular outlier of interest.)

[Histogram of earthquake magnitudes; the Yountville quake is the far outlier]