I have three data sets consisting of times (hours) when something happened. The data sets have 2005, 826, and 276 events respectively, so their sizes are very unequal.
I wanted to show how the events are distributed over 24 hours, so I created a histogram. However, because the first data set is much larger than the others, the resulting graph looks almost exactly like the first data set's distribution.
How can I transform my data so that each data set has the same influence on the resulting distribution?
I'm computing the histogram with Python's NumPy library (the numpy.histogram function), so a method using this library would be preferred.
I followed vadim123's advice and did the following: for each data set, I counted the number of events in each hour and divided by the total number of events in that data set. I then added the three resulting distributions together hour by hour and divided each value by the number of data sets (three), so the result is the average of the three normalized distributions.
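For anyone who finds this later, here is a minimal sketch of those steps with numpy.histogram. It is not my exact script: the function name balanced_hourly_distribution and the random stand-in data are made up for illustration, and it assumes every time is an hour value between 0 and 24.

```python
import numpy as np

def balanced_hourly_distribution(datasets):
    """Average the per-data-set hourly fractions so each data set weighs the same.

    Hypothetical helper; assumes every event time lies in the range [0, 24].
    """
    edges = np.arange(25)  # 25 edges give 24 one-hour bins
    fractions = []
    for hours in datasets:
        counts, _ = np.histogram(hours, bins=edges)
        # Fraction of this data set's events falling in each hour
        fractions.append(counts / counts.sum())
    # Sum the normalized histograms and divide by the number of data sets
    return np.mean(fractions, axis=0)

# Random stand-in data with the same sizes as my three data sets
rng = np.random.default_rng(0)
datasets = [rng.uniform(0, 24, size=n) for n in (2005, 826, 276)]
combined = balanced_hourly_distribution(datasets)
```

The 24 values in combined add up to 1, and a data set with 276 events contributes exactly as much to the shape as the one with 2005 events.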
Thanks for the help!