Based on multiple daily weather observations, how can I calculate the "most similar" day to any other given day?

Question

Based on multiple daily weather observations, how can I calculate the "most similar" day to any other given day?

75 Views Asked by Bumbble Comm At 02 Apr 2026 - 8:36

This is definitely a case of "not even sure how to ask the question," but I am wondering if there is math available to solve a problem I have.

I have several years of daily weather observations (high temp., low temp., avg. temp., avg. humidity, avg. wind speed, total rainfall, avg. air pressure, total solar radiation, etc.). For any given day, I have records for each of these observations.

Here's an example set of the data (there are thousands of rows like these):

Date	High Temp.	Low Temp.	Avg. Temp	Avg. Humidity	Avg. Wind	Rain	Avg. Press.	Solar Radiation
2022-03-09	46.0	35.1	40.32	60.71	3.71	0.00	30.22	46281
2022-03-10	45.7	30.2	39.76	51.22	1.21	0.00	30.32	47012
2022-03-11	49.6	39.0	44.29	59.17	1.58	0.00	30.24	47745
2022-03-12	55.2	42.1	46.58	67.13	2.03	0.04	29.86	48477
2022-03-13	52.2	42.3	46.91	86.91	3.43	0.07	29.87	49219
2022-03-14	48.4	43.2	44.75	93.99	3.00	0.49	30.06	49958
2022-03-15	51.6	43.7	46.11	91.60	4.61	0.26	30.04	50699
2022-03-16	54.0	42.6	46.42	80.62	3.18	0.00	30.28	51442
2022-03-17	46.9	42.3	44.03	87.85	1.47	0.00	30.14	52185

I am hoping to use these data points to find days that are "most similar" to any other given day. So, out of the thousands of days, which day was most similar, weather-wise, based on the daily observations? Using the table example above, how would you determine which day was closest/most similar in weather observations to 2022-03-13?

I'm not even sure of the right terminology to use when asking this question; this is a new concept for me. If I had to describe in the words I know, I'd be saying that I'm seeking "a calculation for similarity based on multiple daily data points," but even that is probably off somehow. I'm not sure if I need to create an "index" for each day or not, but that doesn't seem right. Any help out there in steering me towards better knowledge would be sincerely appreciated. Thank you.

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

I was able to find a solution to this issue and would like to offer it here as a suggestion for others, and to solicit feedback if there are any other ideas or better ways of handling it.

I used a "Gower Distance" calculation (using the math on this page). This method seems to help with the issues of some data being single digit integers, while other data in the same row is in the five-digit range. This normalizes the differences and averages them.

I looped through the entire dataset doing an absolute difference between the test value and all other values, then summed those differences and divided by the number of measurements, preserving the distance number in a new column. Which ever item had the lowest score was the closest data.

For example, if the comparison date was 2022-03-09, I did this for 2022-03-10:

Date	High Temp.	Low Temp.	Avg. Temp	Avg. Humidity	Avg. Wind	Rain	Avg. Press.	Solar Radiation
2022-03-10 to 2022-03-09 Range	9.5	13.5	7.15	42.77	3.4	0.49	0.46	5904
2022-03-10 Abs. Diff	0.3	4.9	0.56	9.49	2.5	0.00	-0.1	-731
Abs of diff. / range	0.032	0.363	0.078	0.222	0.735	0.000	0.217	0.124

Resulting in

(0.032 + 0.363 + 0.078 + 0.222 + 0.735 + 0.000 + 0.217 + 0.124) / 8 = 0.221

I did that same calculation on each row, which resulted in Gower Scores like this:

Date	Gower Distance
2022-03-10	0.221405984033978
2022-03-11	0.2721254859974
2022-03-12	0.530359934486749
2022-03-13	0.525491314993568
2022-03-14	0.553721902424608
2022-03-15	0.586683106346751
2022-03-16	0.484598891766925
2022-03-17	0.451780599954877

March 10 was the day most similar to March 9, while March 15 is the least similar.

In lots of random sampling, this seems to be highly accurate for my purposes. I've even incorporated additional weighting for certain measures that are underrepresented.

For the future I am likely to turn this into a MySQL function of some kind, to reduce the programming overhead on my page.

Based on multiple daily weather observations, how can I calculate the "most similar" day to any other given day?

There are 1 best solutions below

Related Questions in CORRELATION

Trending Questions

Popular # Hahtags

Popular Questions