Probability distribution for modeling the hour of the day


I have a dataset of observations with several variables. One of the variables is the hour of the day at which the task corresponding to an observation was initiated, say $h\in\{0, 1, \ldots, 23\}$. I want to model this variable with a probability distribution and then use goodness-of-fit methods to evaluate the model (something like parametric density estimation).

One way is to think of $h$ as a discrete ordinal variable and estimate its PMF using relative frequencies, i.e., $\operatorname{Pr}(h=k) = \frac{\#\text{observations with}~~ h=k}{\#\text{all observations}},~~~~k=0, 1, 2, \ldots, 23$
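To make the relative-frequency estimate concrete, here is a minimal sketch in Python with NumPy. The function name `empirical_pmf` and the sample `hours` array are made up for illustration and are not part of my actual data.

```python
import numpy as np

def empirical_pmf(hours, n_bins=24):
    """Relative-frequency estimate of Pr(h = k) for k = 0, ..., n_bins - 1."""
    hours = np.asarray(hours, dtype=int)
    # Count observations per hour; minlength keeps empty hours as zeros.
    counts = np.bincount(hours, minlength=n_bins)
    return counts / counts.sum()

# Hypothetical observations (illustration only).
hours = np.array([0, 0, 1, 12, 12, 23, 23, 23])
pmf = empirical_pmf(hours)
```

Here `pmf[k]` is the estimated probability of hour `k`, and the 24 entries sum to one. Note that this estimate treats hours 23 and 0 as unrelated categories, which is the part that feels naive to me.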

However, I think it is naive to treat the hour this way, since it ignores the fact that hour 23 is adjacent to hour 0. My main question is: is there a good distribution for describing $h$? Eventually, I need the PDF of such a distribution.

Variables like $h$ are called cyclic variables, and in machine learning (ML) applications it is typical to convert them into two new continuous variables using the following transformation (and then use these two variables for training ML models):

$h_1 = \sin(2\pi h/24), \quad h_2 = \cos(2\pi h/24)$
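As a sketch of what this transformation does, the snippet below encodes all 24 hours and compares distances in the $(h_1, h_2)$ plane. The helper names `encode_hour` and `decode_hour` are my own, chosen for illustration.

```python
import numpy as np

def encode_hour(h, period=24):
    """Map hour h to (sin, cos) coordinates on the unit circle."""
    angle = 2 * np.pi * np.asarray(h, dtype=float) / period
    return np.sin(angle), np.cos(angle)

def decode_hour(h1, h2, period=24):
    """Recover the hour from its (sin, cos) encoding."""
    angle = np.arctan2(h1, h2) % (2 * np.pi)
    return np.rint(angle * period / (2 * np.pi)).astype(int) % period

h = np.arange(24)
h1, h2 = encode_hour(h)

# Euclidean distances between encoded hours.
d_23_0 = np.hypot(h1[23] - h1[0], h2[23] - h2[0])  # hours 23 and 0
d_0_1 = np.hypot(h1[1] - h1[0], h2[1] - h2[0])     # hours 0 and 1
d_0_12 = np.hypot(h1[12] - h1[0], h2[12] - h2[0])  # hours 0 and 12

recovered = decode_hour(h1, h2)
```

In the encoded space, hours 23 and 0 are as close as hours 0 and 1, while hours 0 and 12 sit at opposite ends of the circle. Every encoded point satisfies $h_1^2 + h_2^2 = 1$, so the pair $(h_1, h_2)$ lies on a one-dimensional curve rather than filling the plane.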

Is it OK to use $h_1$ and $h_2$ instead of $h$? If yes, is it OK to estimate their densities using kernel density estimation (KDE)?

P.S. If it helps, there is another variable: the day of the month on which the task corresponding to an observation was initiated. So, for every observation, we know the day of the month and the hour of that day at which its task was initiated (all observations were recorded within one specific month of one year).