Distribution real case


I have data on the number of kilograms of oranges a store sold each day, and I want to determine the distribution of this data in order to build a stock simulation. I have worked through some examples in my simulation class at college, but the distributions we used there were mostly Poisson, and for the firm I am doing this for I do not know how to determine which distribution to use. Does anyone have an idea of how I could do this? The data consists of (day, kilograms sold that day) pairs, and I have observations for 100 days.


There are 2 best solutions below

1
On

Here are some ideas:

  1. Compute the empirical CDF from your data, infer from it what the pdf looks like, and compare that shape to the standard distributions you have seen.
  2. Use a modern programming language (I've done this in Python) to numerically fit the parameters of each distribution available there, use a Kolmogorov-Smirnov statistic to compare the goodness of fit, then pick the best couple and inspect them by hand.
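
As a rough illustration of idea 2, here is a minimal Python sketch using only the standard library. It is a simplified stand-in, not the exact procedure described above: it fits just three candidates (normal, lognormal, exponential) with simple moment-style estimates, and `daily_kg` is synthetic placeholder data, not the asker's sales. The helper names `ks_statistic` and `fit_candidates` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of data and a candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def fit_candidates(data):
    """Fit a few candidate distributions and return {name: fitted CDF}."""
    m, s = fmean(data), stdev(data)
    logs = [math.log(x) for x in data]  # assumes strictly positive data
    lm, ls = fmean(logs), stdev(logs)
    return {
        "normal": NormalDist(m, s).cdf,
        "lognormal": lambda x: NormalDist(lm, ls).cdf(math.log(x)) if x > 0 else 0.0,
        "exponential": lambda x: 1.0 - math.exp(-x / m) if x > 0 else 0.0,
    }


# Synthetic placeholder for the 100 observed daily totals (kg); replace with real data.
random.seed(1)
daily_kg = [random.gauss(200, 28) for _ in range(100)]

# Smaller KS statistic = closer fit, so the best candidates come first.
ranking = sorted(
    (ks_statistic(daily_kg, cdf), name)
    for name, cdf in fit_candidates(daily_kg).items()
)
for d, name in ranking:
    print(f"{name:12s} KS = {d:.3f}")
```

Because the parameters are estimated from the same data, the KS value is best read as a ranking score and not as a formal test. A library such as SciPy offers many more distributions that could be looped over in the same way.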
1
On

It would be helpful to see some of your data or at least summaries thereof. I have seen successful modeling of similar situations in which the number of sales per day is $N \sim \mathsf{POIS}(\lambda)$ and the kg of oranges purchased by each customer is $X \sim \mathsf{Norm}(\mu, \sigma).$

Then total sales in a day are $S = \sum_{i=1}^N X_i.$ You can use standard methods ('random sum of random variables') to find $E(S)$ and $SD(S).$ Making reasonable assumptions and matching moments of $X$ to actual experience may enable you to estimate parameters $\lambda, \mu,$ and $\sigma.$ [Depending on experience, you might want to model the $X_i$ as exponential instead of normal.]

A simulation in R statistical software with $\lambda = 50, \mu=4, \sigma = 0.5$ shows approximate daily sales $S.$ In this case, the simulated distribution is roughly normal. The figure uses the simulated $E(S)$ and $SD(S)$ to plot a normal curve through the histogram of simulated daily sales.

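set.seed(309)
m = 10^5;  s = numeric(m)      # m simulated days; s holds the daily totals
lam = 50;  mu = 4;  sg = .5    # Poisson rate; mean and SD of kg per sale
for (i in 1:m) {
  N = rpois(1, lam)            # number of sales on day i
  x = rnorm(N, mu, sg)         # kg bought in each of the N sales
  s[i] = sum(x)                # total kg sold on day i
}
mean(s);  sd(s)
## 200.0123
## 28.57972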

[Figure: histogram of simulated daily sales $S$ with a normal curve based on the simulated $E(S)$ and $SD(S)$.]

Notes: (1) $E(S) = E(N)E(X) = 200$ is simple, but there are two components to $V(S),$ and so $SD(S)$ may not be obvious.
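
For reference, the standard result for a random sum with $N$ independent of the $X_i$ can be worked out for the parameter values used in the simulation. With $N$ Poisson, $E(N) = V(N) = \lambda,$ so

$$E(S) = E(N)E(X) = \lambda\mu = 50(4) = 200,$$

$$V(S) = E(N)V(X) + V(N)[E(X)]^2 = \lambda\sigma^2 + \lambda\mu^2 = 50(0.25) + 50(16) = 812.5,$$

$$SD(S) = \sqrt{812.5} \approx 28.50.$$

The first term of $V(S)$ comes from variability in purchase sizes and the second from variability in the number of sales. These values are in line with the simulated mean and SD shown above.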

(2) I used a loop structure in my program. A better approach in R would be to write an R function for daily sales and use replicate to get many realizations of $S,$ but I used a loop structure here because it may be more transparent for those who don't use R.
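
For readers who don't use R, the "one function for daily sales, called many times" structure from note (2) might look as follows in Python. This is only a sketch under the same assumed parameters ($\lambda = 50, \mu = 4, \sigma = 0.5$); `rpois` and `daily_sales` are hypothetical helper names, and since the standard library has no Poisson sampler, `rpois` uses Knuth's multiplication method, which is adequate for moderate $\lambda.$

```python
import math
import random
from statistics import fmean, stdev


def rpois(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def daily_sales(lam, mu, sigma, rng):
    """Total kg sold in one day: a Poisson number of normally distributed purchases."""
    n = rpois(lam, rng)
    return sum(rng.gauss(mu, sigma) for _ in range(n))


rng = random.Random(309)  # seeded for reproducibility; results will not match the R seed
sales = [daily_sales(50, 4, 0.5, rng) for _ in range(20_000)]
print(fmean(sales), stdev(sales))
```

The printed mean and SD should land close to the R results above, up to simulation error.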