Distribution of waiting time when lifetime is exponentially distributed


Suppose the lifetime of a PC hard drive is exponentially distributed with mean lifetime $\tau$. Now, $N$ hard drives are switched on simultaneously. Question: how is the waiting time until the first hard drive fails distributed?


I'm not especially adept at statistics, so I suppose my approach is quite wrong. But here is my idea:

I want to know the probability of one hard drive failing within the interval $[t,t+dt]$. This means that in the interval $[0,t]$ none of the $N$ hard drives are "allowed" to fail. The probability for this is

$$\int_0^t f(t') dt' = F(t)$$

where $f(t)$ is the (exponential) density function. Now the probability of one hard drive failing (within the mentioned interval) is

$$ 1 - f(t) dt$$

However, since the others still have to be functioning in that very same interval, the probability for exactly one malfunction is

$$ \big(1-f(t)dt \big) \cdot \big( F(t+dt)-F(t) \big)^{N-1} $$

Finally, the overall probability is the product of these:

$$\big( F(t) \big)^N \cdot \big( 1 - f(t) dt \big) \cdot \big( dF dt \big)^{N-1} \cdot N$$

The factor $N$ is there because of all possible combinations of one hard drive failing out of $N$.

As I said, I suppose this approach is not only incorrect, but also very wrong. I would appreciate some help in solving this problem.

Best answer

Let's label the $N$ drives with numbers $i \in \{1, 2, \ldots, N\}$ and denote the random lifetimes of the drives as $T_1, T_2, \ldots, T_N$. The $T_i$ are independent and identically distributed exponential random variables with mean $\tau$. Let $$F_{T_i}(t) = \Pr[T_i \le t]$$ be the cumulative distribution function that gives the probability that drive $i$ has failed by time $t$.

Now let $T_{(1)}$ represent the failure time of the first drive to fail when all $N$ drives are operated simultaneously. Then we have $$T_{(1)} = \min(T_1, T_2, \ldots, T_N);$$ that is to say, it is the minimum of the random failure times of all the drives. So for instance, if $N = 5$ and we ran each drive until failure and observed $(T_1, T_2, T_3, T_4, T_5) = (10, 25, 34, 15, 9)$, then we have $T_{(1)} = 9$, the smallest observed failure time.

The question you are interested in is: what is $$F_{T_{(1)}}(t) = \Pr[T_{(1)} \le t]?$$ It is easier to work with the complementary probability, the survival function $$S_{T_{(1)}}(t) = \Pr[T_{(1)} > t] = 1 - \Pr[T_{(1)} \le t] = 1 - F_{T_{(1)}}(t).$$ The survival function of the first (minimum) failure time is $$\Pr[T_{(1)} > t] = \Pr[\min(T_1, T_2, \ldots, T_N) > t] = \Pr[(T_1 > t) \cap (T_2 > t) \cap \cdots \cap (T_N > t)],$$ because if the smallest of the $T_i$ exceeds $t$, then all of the $T_i$ exceed $t$; and conversely, if all of the $T_i$ exceed $t$, then the smallest also exceeds $t$. This is why we switched from the CDF to the survival function: the same logic would not work with the CDF, since $T_{(1)} \le t$ does not guarantee that the other $T_i$ are also $t$ or less; they can be greater.

Now because the $T_i$ are independent, the probability of the intersection of the events $(T_1 > t) \cap (T_2 > t) \cap \cdots \cap (T_N > t)$ is simply the product of the probabilities of the individual events; i.e., $$\Pr[T_{(1)} > t] = \Pr[T_1 > t] \Pr[T_2 > t] \cdots \Pr[T_N > t].$$ And because the $T_i$ are identically distributed, the RHS is simply the $N^{\rm th}$ power of the probability of a single drive surviving past time $t$: $$\Pr[T_{(1)} > t] = (\Pr[T_1 > t])^N.$$ Written in terms of the CDF, we then have $$F_{T_{(1)}}(t) = 1 - S_{T_{(1)}}(t) = 1 - (\Pr[T_1 > t])^N = 1 - (1 - F_{T_1}(t))^N. \tag{1}$$
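As a sanity check on formula $(1)$, here is a minimal Python sketch (not part of the derivation itself) that compares the formula with a Monte Carlo estimate, using the exponential lifetimes from the question. The function names and the values $\tau = 1000$, $N = 5$, $t = 150$ are illustrative assumptions, not data from the question.

```python
import math
import random


def first_failure_cdf(single_cdf, n, t):
    """Formula (1): CDF of the minimum of n i.i.d. lifetimes, given one drive's CDF."""
    return 1.0 - (1.0 - single_cdf(t)) ** n


def simulate_first_failure_cdf(sample_lifetime, n, t, trials, rng):
    """Estimate Pr[T_(1) <= t] by drawing n lifetimes per trial and taking the minimum."""
    hits = 0
    for _ in range(trials):
        first = min(sample_lifetime(rng) for _ in range(n))
        if first <= t:
            hits += 1
    return hits / trials


# Hypothetical values chosen only for illustration.
tau = 1000.0      # mean lifetime of a single drive
n_drives = 5      # number of drives switched on together
t = 150.0         # time at which the CDF is evaluated
rng = random.Random(0)  # fixed seed so the run is repeatable


def exp_cdf(x):
    # CDF of an exponential lifetime with mean tau
    return 1.0 - math.exp(-x / tau)


def exp_sample(r):
    # expovariate takes the rate, i.e. 1 / mean
    return r.expovariate(1.0 / tau)


exact = first_failure_cdf(exp_cdf, n_drives, t)
estimate = simulate_first_failure_cdf(exp_sample, n_drives, t, 100_000, rng)
print(f"formula (1): {exact:.4f}   simulation: {estimate:.4f}")
```

The two printed numbers should agree up to Monte Carlo noise. Swapping in a different `single_cdf` and matching sampler would check the same formula for another lifetime distribution.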

Note that our derivation does not use the fact that the $T_i$ are exponentially distributed, so formula $(1)$ is distribution-free. The only requirement is that the $T_i$ are independent and identically distributed.
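
To specialize formula $(1)$ to the exponential case in the question: an exponential lifetime with mean $\tau$ has $F_{T_1}(t) = 1 - e^{-t/\tau}$ for $t \ge 0$, so $$F_{T_{(1)}}(t) = 1 - \left(1 - F_{T_1}(t)\right)^N = 1 - \left(e^{-t/\tau}\right)^N = 1 - e^{-Nt/\tau}, \quad t \ge 0.$$ This is again the CDF of an exponential distribution, so the waiting time until the first failure is exponentially distributed with mean $\tau/N$ and density $$f_{T_{(1)}}(t) = \frac{N}{\tau} e^{-Nt/\tau}, \quad t \ge 0.$$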