Calculating probabilities of tornadoes

First of all, I am neither a mathematician nor a statistician. I am an amateur who loves working with data. I have been trying to figure out how to calculate the probability of tornadoes happening in certain locations.

I am only looking at Ohio, and I have the following data for each 0.5-mile by 0.5-mile grid cell:

  • If that grid was not impacted by a tornado at all, the value is 0
  • If that grid was impacted by just one tornado, the value is 1
  • If that grid was impacted by multiple tornadoes (2 or more), the value is 2

The data are as follows:

  • 93.83% of Ohio was never impacted by any tornadoes. (0)
  • 4.66% of Ohio was only impacted by one tornado. (1)
  • 1.50% of Ohio was impacted by multiple tornadoes. (2)

I am trying to calculate the probability of a tornado occurring in a location that was already impacted by another tornado. My calculations are as follows:

Probability of a tornado occurring independent of its location:

(4.66+1.50) / 100 = 6.17%

Given a tornado has occurred, probability of that tornado impacting a location that was impacted before:

1.50%/6.17% = 22.18%

Given a tornado has occurred, probability of that tornado impacting a location that was not impacted before:

4.66%/6.17% = 77.82%

Therefore probability of a tornado occurring in a location that was impacted before is:

(6.17 * 22.18)/100 = 1.37%

I feel like I am making a mistake somewhere but I can't pinpoint the problem. Are my calculations correct? If not, where am I making a mistake?

There are 2 best solutions below

Best answer

Whenever you ask for the probability of an event, you always need to specify the space of potential events from which the event is to be drawn. Sometimes the space is implicit from context—for example, a fair die roll is assumed to be drawn from the uniform distribution over {1, 2, 3, 4, 5, 6}. But in your case, there are several different questions you could be asking:

  1. If we select a random location, what’s the probability that it was impacted by multiple tornadoes?
  2. If we select a random tornado, what’s the probability that it occurred in a location that was impacted by another tornado?
  3. If we select a random location that was impacted by a tornado, what’s the probability that another tornado occurred there?

Your work makes it appear as if you’re calculating (1). Your arithmetic in “1.50%/6.17% = 22.18%” is incorrect (1.50/6.17 is about 24.3%), and once you correct it, you’ll notice that the final result is just 1.50%, because you take 1.50%, divide it by 6.17%, and then multiply it by 6.17% again. That is indeed the correct answer to (1); you’ll have to decide whether (1) is the question you intended to ask.
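To make the difference between the questions concrete, here is a minimal Python sketch that uses only the three percentages from the question. The variable names are made up for illustration. Question (2) is left out because it needs per-tornado records, which the grid summary does not contain.

```python
# Shares of Ohio grid cells, taken from the question (in percent).
pct_one = 4.66     # impacted by exactly one tornado
pct_multi = 1.50   # impacted by two or more tornadoes

# Share of cells impacted at least once.
pct_any = pct_one + pct_multi

# Question (1): a random location was impacted by multiple tornadoes.
q1 = pct_multi

# Question (3): a random *impacted* location was impacted more than once.
q3 = 100 * pct_multi / pct_any

# The chain from the question: divide by pct_any, then multiply by it again.
round_trip = pct_any * (pct_multi / pct_any)

print(f"impacted at least once: {pct_any:.2f}%")
print(f"question (1):           {q1:.2f}%")
print(f"question (3):           {q3:.2f}%")
print(f"divide then multiply:   {round_trip:.2f}%")
```

The last line lands back on the figure for question (1), which is why the chain of steps in the question cannot produce anything new.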

Second answer

The other poster is correct. You need to keep in mind not only what event you are looking for, but also what your sample space is. Your work, as currently stated, calculates the following:

(4.66+1.50) / 100 = 6.17% answers the following question:

If we choose a random location in Ohio, what are the chances it has had at least one tornado?

1.50%/6.17% = 22.18% answers:

If we choose a random location with a tornado, what are the chances that it has had more than one?

4.66%/6.17% = 77.82% answers:

Out of the locations that have had at least one tornado, what fraction have had only one?

(6.17 * 22.18)/100 = (6.17 * 1.50%/6.17%) should equal 1.50%; this answers:

If we pick a random location in Ohio, what are the chances it has had more than one tornado?

This is the original statistic that you started out with, and this is not what you needed. Since you are looking for a 1-year probability of a tornado hitting an area that another tornado has hit, you need to interpret your data differently, as follows:

1) Separate your data into years.

2) For each year, calculate how many locations have seen their first tornado, and how many have seen a repeat tornado.

3) As you progress through the years, the number of new locations would theoretically diminish (as the number of locations without a tornado dwindles), and the number of repeat locations would theoretically increase (as the number of locations with at least one tornado increases). This kind of saturating trend is called a logistic model; fitting one to data is what graphing calculators label logistic regression.

4) Therefore, you would enter this data into a TI-83 calculator (or any other statistics software). The output would be a logistic equation for the number of new locations, and a logistic equation for the number of repeat locations.

5) In order to calculate a one-year probability, you would integrate this equation from this year to the next year. A 49-year probability would require integrating from 2020 to 2069.

6) Don't forget about the margin of error. The farther in advance you plan, the more error you will accumulate. In addition to simple standard deviation, there are factors, such as global warming, that will affect the actual numbers, but which cannot be effectively predicted and entered into your model. So chances are, your predictive model will be theory at best.
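The bookkeeping in steps 1, 2 and 5 could be sketched in Python roughly as follows. Everything here is an assumption made for illustration: the `records` list is invented data, the function names are hypothetical, and the curve parameters `c`, `a`, `b` are made-up stand-ins for whatever the fitting in step 4 would produce. The fitting itself is not shown. The logistic form c / (1 + a·e^(−bt)) is one common parameterisation, not necessarily the one your software uses.

```python
import math

# Invented records: (year, grid_cell_id) for each cell a tornado touched.
records = [
    (2015, "A1"), (2015, "B2"),
    (2016, "A1"), (2016, "C3"),
    (2017, "B2"), (2017, "C3"), (2017, "D4"),
]

def first_and_repeat_counts(records):
    """Steps 1-2: per year, count cells hit for the first time vs. hit again.

    A cell hit several times within the same year is counted once that year.
    """
    seen = set()            # cells hit in any earlier year
    first, repeat = {}, {}
    for year in sorted({y for y, _ in records}):
        cells = {c for y, c in records if y == year}
        repeat[year] = len(cells & seen)
        first[year] = len(cells - seen)
        seen |= cells
    return first, repeat

def logistic(t, c, a, b):
    """Logistic curve c / (1 + a * exp(-b * t))."""
    return c / (1.0 + a * math.exp(-b * t))

def integrate(f, lo, hi, steps=1000):
    """Step 5: trapezoidal approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

first, repeat = first_and_repeat_counts(records)
print("first-time hits per year:", first)
print("repeat hits per year:   ", repeat)

# Made-up parameters for the repeat-location curve; t is years since 2020.
c, a, b = 50.0, 9.0, 0.1
expected_repeats = integrate(lambda t: logistic(t, c, a, b), 0.0, 1.0)
print("expected repeat locations in the first year:", expected_repeats)
```

Note that the integral gives an expected number of locations rather than a probability, so it would still need to be related to the number of locations you are considering.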