I was given data on the number of passengers boarding and disembarking at every station on a certain train line.
My question is: given the mean number of disembarking passengers at each station (computed with R's summary), is there a way to approximate the probability of a passenger exiting at each station? The constraints are that no passengers disembark at the first station and that all remaining passengers disembark at the final station.
The data look as follows:
data
Trip  Station1  Station2  Station3  Station4  ...  StationN
1     0         10        4         70        ...  30
2     0         4         40        10        ...  2
...
summary(data)
Station1         Station2        Station3         ...
Min.   :0        Min.   :0.0     Min.   :0.00
1st Qu.:24.25    1st Qu.:0.0     1st Qu.:0.25
Median :32.50    Median :1.0     Median :1.00
Mean   :36.30    Mean   :0.6     Mean   :0.70
3rd Qu.:45.75    3rd Qu.:1.0     3rd Qu.:1.00
Max.   :63.00    Max.   :2       Max.   :3
Both the data and the summary output above are illustrative, not the actual values.
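I believe you can simply divide the mean for the $i$th station by the sum of the means over all stations (including the last one), i.e. $\hat{p}_i = \bar{x}_i / \sum_{j=1}^{N} \bar{x}_j$. If nobody disembarks at the first station, its mean is 0 and so is its estimated probability, and the estimates sum to 1 across stations. I may be misunderstanding the question, though. Is there a reason this simple approach wouldn't work?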
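As a rough illustration of the ratio-of-means idea, here is a minimal R sketch. The data frame `alightings` and its numbers are made up for the example (they are not the asker's data), and the names `station_means` and `exit_prob` are just hypothetical labels.

```r
# Hypothetical toy data: rows are trips, columns are stations.
# Station1 is the origin (nobody disembarks); Station4 is the terminus.
alightings <- data.frame(
  Station1 = c(0, 0, 0),
  Station2 = c(10, 4, 6),
  Station3 = c(4, 40, 16),
  Station4 = c(70, 10, 40)
)

# Mean number of passengers disembarking at each station
station_means <- colMeans(alightings)

# Estimated probability of exiting at each station:
# each station's mean divided by the sum of all station means
exit_prob <- station_means / sum(station_means)

print(round(exit_prob, 3))
```

With these made-up numbers the estimates come out to 0, 0.1, 0.3 and 0.6. Note that this is the overall share of exits at each station; it does not condition on where a passenger boarded, which may or may not be what the question is after.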