I know that this is a common problem for students, and evidently I'm no exception; but I simply cannot wrap my head around when we want the probability of an intersection of events and when we want a conditional probability.
Here is the example I have in mind (from Ross's Introduction to Probability Models):
Machine $1$ is currently working. Machine $2$ will be put in use at time $t$ from now. If the lifetime of machine $i$ is exponential with rate $\lambda_{i}, i = 1, 2,$ what is the probability that machine $1$ is the first machine to fail?
The solution is as follows:
$$ \mathbb P(M_1 \lt M_2) = \mathbb P(M_1 \lt M_2 \vert M_1 < t)\cdot \mathbb P(M_1 < t) + \mathbb P(M_1 < M_2 \vert M_1 > t) \cdot \mathbb P(M_1 > t). $$
Now, I understand this solution. By Bayes's Rule*, it says that we want the probability that the lifetime of machine $1$ is less than the lifetime of machine $2$ and less than $t$, plus the probability that the lifetime of machine $1$ is less than the lifetime of machine $2$ and greater than $t$. That all makes sense to me. My question, however, is: why is it not simply
$$ \mathbb P(M_1 \lt M_2) = \mathbb P(M_1 < t) + \mathbb P(M_1 < M_2 \vert M_1 > t). $$
Linguistically, this seems to satisfy the solution to the problem; i.e., "the probability that the lifetime of machine $1$ is less than $t$, in addition to the probability that the lifetime of machine $1$ is less than that of machine $2$ given that the lifetime of machine $1$ is greater than $t$." I understand that the first term here is actually the same as the solution's, and that the solution formulates it as it does for illustrative purposes; but the second term certainly has a different value from the solution's.
I hope my question here makes sense. If any further clarification is needed, please let me know. Also, I understand that this exact problem appears elsewhere on stackexchange, but it was more about the computation.
*The Bayes's Rule to which I'm referring is, of course, the following formulation:
$$ \mathbb P(AB) = \mathbb P(A \vert B) \cdot \mathbb P(B) $$
There's no justification for the "given that the lifetime of machine $1$ is greater than $t$" part: this is not given. It should be "and", as in your first paraphrase. If $P(M_1\gt t)$ is very low, the contribution from that term should be very low, whereas in your version its contribution could be up to $1$ no matter how unlikely $M_1\gt t$ is.
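To see the difference numerically, here is a minimal Python sketch comparing the weighted sum from the solution, the unweighted sum from the question, and a Monte Carlo estimate. It rests on some assumptions that are mine rather than the problem statement's: $M_1$ and $M_2$ are read as failure times measured from now, so $M_2 = t + X_2$ with $X_2$ exponential with rate $\lambda_2$; memorylessness is used to get $\mathbb P(M_1 < M_2 \vert M_1 > t) = \lambda_1/(\lambda_1+\lambda_2)$; and the function names and parameter values are purely illustrative.

```python
import math
import random


def exact_probability(lam1, lam2, t):
    """Weighted sum: each conditional probability times the probability of its condition."""
    p_m1_before_t = 1.0 - math.exp(-lam1 * t)   # P(M1 < t)
    p_m1_after_t = math.exp(-lam1 * t)          # P(M1 > t)
    # Assumption: by memorylessness, given M1 > t the race restarts at time t.
    p_first_given_after = lam1 / (lam1 + lam2)  # P(M1 < M2 | M1 > t)
    # P(M1 < M2 | M1 < t) = 1, because machine 2 cannot fail before time t.
    return 1.0 * p_m1_before_t + p_first_given_after * p_m1_after_t


def unweighted_sum(lam1, lam2, t):
    """The expression from the question: the conditional term is not weighted by P(M1 > t)."""
    return (1.0 - math.exp(-lam1 * t)) + lam1 / (lam1 + lam2)


def simulate(lam1, lam2, t, trials=200_000, seed=0):
    """Monte Carlo estimate of P(M1 < M2) under the assumed model."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        m1 = rng.expovariate(lam1)      # failure time of machine 1
        m2 = t + rng.expovariate(lam2)  # machine 2 starts at t, then runs Exp(lam2)
        if m1 < m2:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    lam1, lam2, t = 1.0, 2.0, 3.0  # illustrative values only
    print("weighted (solution):  ", exact_probability(lam1, lam2, t))
    print("unweighted (question):", unweighted_sum(lam1, lam2, t))
    print("simulation:           ", simulate(lam1, lam2, t))
```

With these illustrative values $\mathbb P(M_1 > t) = e^{-3}$ is small, so the weighted sum stays just below $1$ and should agree with the simulation, while the unweighted sum exceeds $1$ and so cannot be a probability at all.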