I aim to model weekly room occupancy as a percentage, but I observe a very "bipolar" (bimodal?) distribution: the majority of rooms have either 0% or 100% occupancy. This made me wonder whether I am doing something wrong in the way I calculate the weekly occupancy percentage for a room. The following shows a few example observations for the room with ID = 1 (there are thousands of rooms):
Room Id  Cal Date    Week Number  Booking  RoomFree
1        16/12/2019  51           0        1
1        17/12/2019  51           0        1
1        18/12/2019  51           0        1
1        19/12/2019  51           0        1
1        20/12/2019  51           0        1
1        21/12/2019  52           1        0
1        22/12/2019  52           1        0
1        23/12/2019  52           1        0
1        24/12/2019  52           1        0
1        25/12/2019  52           1        0
1        26/12/2019  52           1        0
1        27/12/2019  52           1        0
1        28/12/2019  1            1        0
1        29/12/2019  1            1        0
1        30/12/2019  1            1        0
1        31/12/2019  1            1        0
1        01/01/2020  1            0        0
1        02/01/2020  1            0        0
1        03/01/2020  1            0        0
1        04/01/2020  2            0        1
1        05/01/2020  2            0        1
1        06/01/2020  2            0        1
1        07/01/2020  2            0        1
1        08/01/2020  2            0        1
1        09/01/2020  2            0        1
1        10/01/2020  2            0        1
[Image: the same data, with week 1 highlighted in red]
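I think it is important to note that bookings (column Booking) can span several weeks (e.g. 21/12/2019 to 31/12/2019, which covers weeks 52 and 1). Some weeks also do not have the full capacity of 7 days: in week 1 (highlighted in red), for example, the room is neither booked nor free from 01/01/2020 to 03/01/2020, so only 4 of its 7 days count towards capacity.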
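To make these two observations concrete, here is a small sketch in standard-library Python that rebuilds the room 1 excerpt and (a) finds stretches of consecutive booked days together with the weeks they touch, and (b) counts booked, free and "neither" days per week. The names `SEGMENTS`, `expand`, `booking_runs` and `day_counts_by_week` are hypothetical helpers for illustration only, and the sketch assumes the rows are sorted by date with no missing days.

```python
from datetime import date, timedelta

# Room 1 excerpt from the table above, stored as runs of identical days:
# (first day, number of days, week number, Booking, RoomFree)
SEGMENTS = [
    (date(2019, 12, 16), 5, 51, 0, 1),
    (date(2019, 12, 21), 7, 52, 1, 0),
    (date(2019, 12, 28), 4, 1, 1, 0),
    (date(2020, 1, 1), 3, 1, 0, 0),
    (date(2020, 1, 4), 7, 2, 0, 1),
]


def expand(segments):
    """Expand the runs into one (date, week, booking, free) row per day."""
    rows = []
    for start, n_days, week, booking, free in segments:
        for i in range(n_days):
            rows.append((start + timedelta(days=i), week, booking, free))
    return rows


def booking_runs(rows):
    """Return (first_day, last_day, weeks_touched) for each stretch of booked days.

    Assumes rows are sorted by date with no missing days.
    """
    runs, current = [], None
    for day, week, booking, _free in rows:
        if booking:
            if current is None:
                current = [day, day, [week]]  # start a new booked stretch
            else:
                current[1] = day  # extend the stretch
                if week not in current[2]:
                    current[2].append(week)
        elif current is not None:
            runs.append(tuple(current))  # stretch ended on the previous day
            current = None
    if current is not None:
        runs.append(tuple(current))
    return runs


def day_counts_by_week(rows):
    """Count calendar, booked, free and 'neither' (Booking = RoomFree = 0) days per week."""
    counts = {}
    for _day, week, booking, free in rows:
        c = counts.setdefault(week, {"days": 0, "booked": 0, "free": 0, "neither": 0})
        c["days"] += 1
        c["booked"] += booking
        c["free"] += free
        c["neither"] += int(booking == 0 and free == 0)
    return counts


rows = expand(SEGMENTS)
print(booking_runs(rows))
# [(datetime.date(2019, 12, 21), datetime.date(2019, 12, 31), [52, 1])]
print(day_counts_by_week(rows)[1])
# {'days': 7, 'booked': 4, 'free': 0, 'neither': 3}
```

The single booking from 21/12/2019 to 31/12/2019 is split between weeks 52 and 1, and week 1 has 7 calendar days of which only 4 are either booked or free.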
I calculate the weekly occupancy percentage as follows. For each week, I add up Booking and RoomFree over the days of that week, which gives me the capacity:
Capacity = sum(Booking) + sum(RoomFree)
So in week 1 the capacity is 4 + 0 = 4. The occupancy percentage is then the sum of Booking divided by the capacity, so for week 1 it is 4/4 = 1.0 (i.e. 100%).
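A minimal sketch of this weekly calculation in standard-library Python on the room 1 excerpt. The helpers `SEGMENTS`, `expand` and `weekly_occupancy` are hypothetical names for illustration. The sketch sums Booking and RoomFree per week and divides, skipping any week whose capacity is 0 rather than dividing by zero.

```python
from datetime import date, timedelta

# Room 1 excerpt, as runs: (first day, number of days, week number, Booking, RoomFree)
SEGMENTS = [
    (date(2019, 12, 16), 5, 51, 0, 1),
    (date(2019, 12, 21), 7, 52, 1, 0),
    (date(2019, 12, 28), 4, 1, 1, 0),
    (date(2020, 1, 1), 3, 1, 0, 0),
    (date(2020, 1, 4), 7, 2, 0, 1),
]


def expand(segments):
    """One (date, week, booking, free) row per day."""
    return [
        (start + timedelta(days=i), week, booking, free)
        for start, n_days, week, booking, free in segments
        for i in range(n_days)
    ]


def weekly_occupancy(rows):
    """Occupancy per week = sum(Booking) / (sum(Booking) + sum(RoomFree)).

    Weeks with zero capacity are left out instead of dividing by zero.
    """
    booked, capacity = {}, {}
    for _day, week, booking, free in rows:
        booked[week] = booked.get(week, 0) + booking
        capacity[week] = capacity.get(week, 0) + booking + free
    return {w: booked[w] / capacity[w] for w in capacity if capacity[w] > 0}


print(weekly_occupancy(expand(SEGMENTS)))
# {51: 0.0, 52: 1.0, 1: 1.0, 2: 0.0}
```

With the full data you would also group by room, and use a week key that includes the year, since week numbers repeat every year. In pandas the same logic would be a `groupby` on those keys followed by summing Booking and RoomFree.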
Is this calculation flawed, and is that what causes the very "bipolar" distribution I observe? (The distribution was plotted with seaborn's automatic binning, so the actual percentages are more granular than the bins suggest.)
If I use a monthly grain instead, the distribution appears to be more uniform.
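One plausible reason the monthly grain looks more uniform is that a longer window is more likely to contain both booked and free days, whereas a single week often falls entirely inside one booking or one free stretch. The sketch below (hypothetical helper `occupancy_by`, standard-library Python) computes the same ratio at both grains for the room 1 excerpt. Note that the excerpt starts on 16/12/2019, so the December value only covers part of that month.

```python
from datetime import date, timedelta

# Room 1 excerpt, as runs: (first day, number of days, week number, Booking, RoomFree)
SEGMENTS = [
    (date(2019, 12, 16), 5, 51, 0, 1),
    (date(2019, 12, 21), 7, 52, 1, 0),
    (date(2019, 12, 28), 4, 1, 1, 0),
    (date(2020, 1, 1), 3, 1, 0, 0),
    (date(2020, 1, 4), 7, 2, 0, 1),
]


def expand(segments):
    """One (date, week, booking, free) row per day."""
    return [
        (start + timedelta(days=i), week, booking, free)
        for start, n_days, week, booking, free in segments
        for i in range(n_days)
    ]


def occupancy_by(rows, key):
    """sum(Booking) / (sum(Booking) + sum(RoomFree)) for each group given by key(row)."""
    booked, capacity = {}, {}
    for row in rows:
        _day, _week, booking, free = row
        k = key(row)
        booked[k] = booked.get(k, 0) + booking
        capacity[k] = capacity.get(k, 0) + booking + free
    return {k: booked[k] / capacity[k] for k in capacity if capacity[k] > 0}


rows = expand(SEGMENTS)
print(occupancy_by(rows, key=lambda r: r[1]))                     # weekly grain
# {51: 0.0, 52: 1.0, 1: 1.0, 2: 0.0}
print(occupancy_by(rows, key=lambda r: (r[0].year, r[0].month)))  # monthly grain
# {(2019, 12): 0.6875, (2020, 1): 0.0}
```

At the weekly grain every week of the excerpt is exactly 0 or 1, while the December window mixes 5 free days with 11 booked days and lands at 11/16.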
I hope this makes sense, and apologies for bothering proper mathematicians with such a possibly mundane and simple problem. Thanks in advance for any help.
PS:
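Maybe it is worth considering that different capacities allow different sets of occupancy percentages? For example, a week with a capacity of 4 days can only reach 0%, 25%, 50%, 75% or 100%, whereas a week with a capacity of 1 day can only be 0% or 100%.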
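To illustrate this point, a short sketch (hypothetical helper `possible_occupancies`, using Python's `fractions.Fraction` for exact values) that lists the values a week can take for a few capacities. Every capacity can reach 0 and 1, but only those two values are shared by all capacities from 1 to 7 days, which may contribute to values piling up at the extremes.

```python
from fractions import Fraction


def possible_occupancies(capacity):
    """Every value sum(Booking) / capacity can take for a week with `capacity` available days."""
    if capacity <= 0:
        return []
    return [Fraction(k, capacity) for k in range(capacity + 1)]


for cap in (1, 2, 4, 7):
    print(cap, [str(v) for v in possible_occupancies(cap)])
# 1 ['0', '1']
# 2 ['0', '1/2', '1']
# 4 ['0', '1/4', '1/2', '3/4', '1']
# 7 ['0', '1/7', '2/7', '3/7', '4/7', '5/7', '6/7', '1']

# Values reachable by every capacity from 1 to 7 days: only 0 and 1.
grids = [set(possible_occupancies(c)) for c in range(1, 8)]
print(sorted(str(v) for v in set.intersection(*grids)))  # ['0', '1']
print(len(set.union(*grids)))  # 19 distinct values in total
```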


