Why is variance so prominent that it appears in almost half of any given probability textbook?
What is its history, and what does it help us evaluate in probability, such that a statistical measure like this would appear in so many textbooks?
In other words, what useful properties make variance get so much coverage in probability textbooks?
Sure, there are many reasons, related in one way or another to the stability of systems. I'll give you an example that I'm working on. Imagine you use a cloud computing system: you log on to a server and need to wait for some time to access your information. This is a queueing network where 'jobs' (customers) join the queue upon arrival and wait for the 'server' (the actual server bottleneck).
Imagine you have two options: one server has a mean waiting time of 5 min and a variance of 100, and the second has a mean of 10 and a variance of 0.1. What does that mean? The mean waiting time/queue length of server 1 is lower, but if you join it you pretty much don't know how long you'll wait: it could be 1 minute, 15 minutes, 30 seconds, 40 minutes, etc. Essentially, the mean is a bad estimate of how long you need to wait.
In contrast, the mean waiting time of the second server is a much better estimate, because the variance is very small: your wait is probably going to be something like 9.5 minutes, 10.1 minutes, etc.
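To make the contrast concrete, here is a minimal simulation sketch. It assumes the waiting times are gamma-distributed (a common, but here purely illustrative, choice that keeps times positive), with shape and scale chosen to match the means and variances in the example; the function name `gamma_waits` is hypothetical.

```python
import random
import statistics

random.seed(0)

def gamma_waits(mean, var, n):
    """Draw n gamma-distributed waiting times with the given mean/variance.

    For a gamma distribution: mean = shape * scale, var = shape * scale**2,
    so scale = var / mean and shape = mean**2 / var.
    """
    scale = var / mean
    shape = mean ** 2 / var
    return [random.gammavariate(shape, scale) for _ in range(n)]

n = 100_000
server1 = gamma_waits(5.0, 100.0, n)   # lower mean, huge variance
server2 = gamma_waits(10.0, 0.1, n)    # higher mean, tiny variance

for name, waits in [("server 1", server1), ("server 2", server2)]:
    print(f"{name}: mean={statistics.mean(waits):.2f} min, "
          f"stdev={statistics.pstdev(waits):.2f}, "
          f"min={min(waits):.2f}, max={max(waits):.2f}")
```

Running this, server 1's draws range from near-zero to very long waits, while server 2's cluster tightly around 10 minutes, which is exactly why the mean alone is a poor summary for server 1.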
So variance says something about the stability of the system.