Will calculating standard deviations from percentages give different results from calculating them from the raw data?


I'll start off by saying I am not a mathematician, nor have I studied maths for many years, so please keep the answers simple :)

I am working with data that has a geographical context. Assume I have data about an area 10km by 10km. My data is split into 1km by 1km tiles, so I have 100 rows of data: one row for each km square in my area.

Values in my data include the following, to name a few:

Number of buildings

Number of addresses

Number of postcodes

% ground area covered by buildings in the km square

I am creating some choropleth maps to give an overview of the whole area. I am symbolising the data using standard deviation so I can show which km squares are above and below the mean.

This works well for things such as building counts, so I know if a single km square is above or below the mean for the whole 100km area.
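The classification described above can be sketched as follows. This is a minimal illustration with made-up building counts (the actual data and band boundaries are assumptions, not taken from the question):

```python
import statistics

# Hypothetical building counts for ten 1km x 1km tiles
counts = [12, 45, 30, 8, 55, 28, 33, 40, 22, 27]

mean = statistics.mean(counts)
sd = statistics.stdev(counts)

# Express each tile as a number of standard deviations from the mean,
# then label it for symbolisation on the map
z_scores = [(c - mean) / sd for c in counts]
bands = ["above mean" if z > 0 else "below mean" for z in z_scores]

for count, z, band in zip(counts, z_scores, bands):
    print(f"count={count:3d}  z={z:+.2f}  {band}")
```

A real choropleth tool would typically split the z-scores into several ranges (e.g. within one, two, or three standard deviations), but the principle is the same.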

My question is, am I able to calculate the standard deviation for my % values? i.e. % ground area covered by buildings, % ground area covered by roads etc... Will this give me a sensible result?

Or would it be best to calculate the standard deviation on the actual area covered by these features in each km tile?

Best answer

For the purpose of this answer I'll use the "% ground area covered by buildings". Since each of your data points corresponds to a $1$ km by $1$ km tile, it actually doesn't "matter" whether you use the percentage or the raw data.

Say one tile contains $.25 \text{km}^2$ of buildings. Then the percentage covering is $25\%$. In general, if the raw data is $X$ then the percentage is just $100X$. In effect all this does is scale the standard deviation: if the standard deviation of the raw data (in those units) is $\sigma$, then the standard deviation of the percentages is $100\sigma$. How you symbolize these deviations can then simply take the difference in scaling into account.
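To see this scaling numerically, here is a quick check with some made-up tile areas (the values themselves are assumptions, only the relationship matters):

```python
import math
import statistics

# Hypothetical building areas in km^2 for five 1km x 1km tiles
raw = [0.25, 0.10, 0.40, 0.05, 0.20]

# The same data expressed as a percentage of each 1 km^2 tile
pct = [100 * x for x in raw]

sd_raw = statistics.stdev(raw)   # standard deviation in km^2
sd_pct = statistics.stdev(pct)   # standard deviation in percentage points

# The percentage standard deviation is exactly 100x the raw one
print(math.isclose(sd_pct, 100 * sd_raw))  # → True
```

Because both versions rank the tiles identically, the above/below-mean classification on the map comes out the same either way.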