I am well aware that the response time formula for the M/M/1 queue is:
$E[R]_{M/M/1} = \frac{1}{\mu - \lambda}$.
But I am trying to understand the intuition behind the $\mu - \lambda$ term in the denominator.
For example, suppose we model a queue as a "delay server" with Poisson arrivals and exponential service times, where no job ever has to wait. We can say this is an M/M/$\infty$ queue, which has the following response time (for any $\lambda$):
$E[R]_{M/M/\infty} = \frac{1}{\mu}$.
Here, note there is no $\lambda$ term in the denominator.
So, in the M/M/1 response time formula, can we say that the $\lambda$ in the denominator is (essentially) a "penalty" term reflecting the fact that resources are limited (i.e., only 1 server), so response time grows as the load on the system increases?
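As a sanity check on the formula, here is a small simulation sketch (my own, not from any textbook) of an M/M/1 queue using Lindley's recursion for the waiting time; the empirical mean response time should come out near $1/(\mu - \lambda)$:

```python
import random

def mm1_mean_response(lam, mu, n=200_000, seed=1):
    """Simulate an M/M/1 queue and return the empirical mean
    response (sojourn) time, using Lindley's recursion:
    W_{n+1} = max(0, W_n + S_n - A_{n+1})."""
    rng = random.Random(seed)
    wq = 0.0       # queueing delay of the current customer
    total = 0.0
    for _ in range(n):
        s = rng.expovariate(mu)    # service time ~ Exp(mu)
        total += wq + s            # response = wait + service
        a = rng.expovariate(lam)   # interarrival time ~ Exp(lam)
        wq = max(0.0, wq + s - a)  # Lindley's recursion
    return total / n

lam, mu = 0.5, 1.0
# Theory predicts 1/(mu - lam) = 2.0; the simulated value is close.
print(mm1_mean_response(lam, mu))
```

Raising $\lambda$ toward $\mu$ in this sketch makes the simulated mean blow up, matching the intuition that the "penalty" grows with load.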
Does anyone have a better explanation?
Thanks,