There are two Poisson processes running simultaneously but independently: event A occurs according to a Poisson process with rate $\lambda_{A}$, and event B occurs according to a Poisson process with rate $\lambda_{B}$. I would like to find the probability density function of the random variable $X$, where $X$ is the time from the start until the first event A that is preceded by at least one B.
My solution:
$F_{X}(x)=\Pr[X\leq x]=\int_{0}^{x}\lambda_{A}e^{-\lambda_{A}t}[1-e^{-\lambda_{B}t}]\,dt$ $=\int_{0}^{x}\lambda_{A}e^{-\lambda_{A}t}\,dt-\int_{0}^{x}\lambda_{A}e^{-(\lambda_{A}+\lambda_{B})t}\,dt=1-e^{-\lambda_{A}x}-\frac{\lambda_{A}}{\lambda_{A}+\lambda_{B}}[1-e^{-(\lambda_{A}+\lambda_{B})x}]$.
Therefore, $f_{X}(x)=\lambda_{A}e^{-\lambda_{A}x}[1-e^{-\lambda_{B}x}]$.
Explanation: The probability that A first happens in a small interval $[t, t+dt]$ is $\lambda_{A}e^{-\lambda_{A}t}dt$. The probability that B happens before time $t$ is $1-e^{-\lambda_{B}t}$. Because the two processes are independent, the joint probability is the product of the individual ones.
The correct answer is as follows: Let $X_B$ = the time from the start to the time of the 1st event B, and $X_A$ = the time from the 1st event B to the time of the 1st event A after it. Then $X=X_A+X_B$.
So its PDF is the convolution of the two exponential densities. Since both densities vanish for negative arguments, the integral runs over $0 \leq y \leq x$. For $x \geq0$, $f_{X}(x)=\int_{0}^{x}\lambda_{A}e^{-\lambda_{A}y}\lambda_{B}e^{-\lambda_{B}(x-y)}dy=\frac{\lambda_{A}\lambda_{B}}{\lambda_{B}-\lambda_{A}}(e^{-\lambda_{B}x}-e^{-\lambda_{A}x})$.
Please tell me where I am wrong and how to correct it.
Updates after @heropup's answer.
Update 1: @heropup is right, the correct answer is $\frac{\lambda_{A}\lambda_{B}}{\lambda_{A}-\lambda_{B}}(e^{-\lambda_{B}x}-e^{-\lambda_{A}x})$.
Update 2: I misunderstood the problem as "$X$ is the time until the first A happens, and the first A happens after at least one B". In that sense, $(A_1,A_2,A_3,B_1,A_4)$ is not in my sample space. Under this condition, will $f_{X}(x)=\frac{\lambda_{A}e^{-\lambda_{A}x}[1-e^{-\lambda_{B}x}]}{F_X(\infty)}$ be the correct answer? Dividing by $F_X(\infty)$ makes $f_{X}$ integrate to 1. My thought is that, since the unnormalized $f_{X}$ covers only the restricted sample space, normalizing it turns that subspace into a complete sample space.
What is the PDF of $X$ if $X$ is the time until the first A happens and the first A happens after at least one B?
You seem to be suggesting that $$\Pr[X \le x] = \int_{t=0}^x \Pr[B \le t \mid A = t]f_{A}(t) \, dt = \int_{t=0}^x \Pr[B \le t] f_{A}(t) \, dt.$$ But this is not true. To see why, suppose the sequence of events we observe is $$(A_1, A_2, A_3, B_1, A_4).$$ Then $X = A_4$ is the time we want, but three $A$-type events have already occurred before we observe the first $B$ event. Your computation does not account for this kind of outcome.
A few comments. First, the correct answer contains a typo, and is also incomplete: it should be $$f_X(x) = \begin{cases} \frac{\lambda_A \lambda_B}{\color{red}{\lambda_A - \lambda_B}} (e^{-\lambda_B x} - e^{-\lambda_A x} ), &\lambda_A \ne \lambda_B, \\ \lambda_A^2 x e^{-\lambda_A x}, & \lambda_A = \lambda_B.\end{cases}$$ This is because if $\lambda_A > \lambda_B > 0$, then $e^{-\lambda_B x} > e^{-\lambda_A x}$ for $x > 0$, so a denominator of $\lambda_B - \lambda_A$ would make the density negative; and if $\lambda_A = \lambda_B$, the derived expression is indeterminate; instead, the sum $X_A + X_B$ is Gamma distributed with shape $2$ and rate $\lambda_A = \lambda_B$.
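As a rough numerical sanity check of the piecewise density above, here is a minimal Python sketch. The names `density_x` and `integrate`, the tolerance used to detect equal rates, the example rates, and the truncation of the integral at $x = 80$ are all illustrative assumptions, not part of the original problem.

```python
import math

def density_x(x, rate_a, rate_b, tol=1e-12):
    """Density of X = X_B + X_A for independent exponential waiting times.

    rate_a and rate_b play the roles of lambda_A and lambda_B.
    """
    if x < 0:
        return 0.0
    if abs(rate_a - rate_b) <= tol * max(rate_a, rate_b):
        # Equal rates: Gamma density with shape 2 and rate rate_a.
        return rate_a ** 2 * x * math.exp(-rate_a * x)
    # Distinct rates: note the denominator is rate_a - rate_b.
    return (rate_a * rate_b / (rate_a - rate_b)
            * (math.exp(-rate_b * x) - math.exp(-rate_a * x)))

def integrate(f, lo, hi, steps=200_000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# Example rates chosen only for illustration; the upper limit 80 is an
# assumed cutoff beyond which the tail is negligible for these rates.
total_mass = integrate(lambda x: density_x(x, 2.0, 0.5), 0.0, 80.0)
total_mass_equal = integrate(lambda x: density_x(x, 1.5, 1.5), 0.0, 80.0)
print(total_mass, total_mass_equal)
```

Both integrals should come out very close to 1, and the density stays nonnegative for either ordering of the rates, which is what the corrected denominator is meant to guarantee.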
Second, one way to see that your result cannot be correct is to integrate your density: you will find yours integrates to $\lambda_B/(\lambda_A + \lambda_B) < 1$. Another way is to note that when $\lambda_B \ll \lambda_A$, that is to say, $B$-type events are much more rare than $A$-type events, the distribution of $X$ is mainly influenced by the random time it takes to observe $B$. For instance, if $\lambda_B = 1/1000$ but $\lambda_A = 1000$, then $X \approx X_B$. The correct answer behaves in this way, but yours does not.
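The two checks in this paragraph can also be tried by simulation. The following is a minimal Monte Carlo sketch, not a definitive implementation: the function names, the seed, the sample size, and the rates $\lambda_A = 3$, $\lambda_B = 0.5$ are assumptions chosen to keep the run short. It generates $A$ events one at a time and returns the first one that falls after the first $B$ event.

```python
import math
import random

def sample_x(rate_a, rate_b, rng):
    """Time of the first A event that occurs after the first B event."""
    first_b = rng.expovariate(rate_b)
    t = 0.0
    while True:
        t += rng.expovariate(rate_a)  # next A arrival
        if t > first_b:
            return t

def correct_cdf(x, rate_a, rate_b):
    """CDF obtained by integrating the corrected density (rates must differ)."""
    return 1.0 - (rate_a * math.exp(-rate_b * x)
                  - rate_b * math.exp(-rate_a * x)) / (rate_a - rate_b)

def asker_cdf(x, rate_a, rate_b):
    """Integral over [0, x] of lambda_A e^{-lambda_A t} (1 - e^{-lambda_B t})."""
    s = rate_a + rate_b
    return 1.0 - math.exp(-rate_a * x) - rate_a / s * (1.0 - math.exp(-s * x))

rng = random.Random(0)          # fixed seed, assumed for reproducibility
rate_a, rate_b = 3.0, 0.5       # illustrative rates only
samples = [sample_x(rate_a, rate_b, rng) for _ in range(100_000)]

empirical_mean = sum(samples) / len(samples)
theoretical_mean = 1.0 / rate_a + 1.0 / rate_b   # E[X_A] + E[X_B]
empirical_cdf_at_2 = sum(s <= 2.0 for s in samples) / len(samples)

print(empirical_mean, theoretical_mean)
print(empirical_cdf_at_2, correct_cdf(2.0, rate_a, rate_b), asker_cdf(2.0, rate_a, rate_b))
print(rate_b / (rate_a + rate_b))  # total mass of the asker's density
```

With these assumed rates, the empirical CDF should track `correct_cdf` rather than `asker_cdf`, and the last printed value is the total mass $\lambda_B/(\lambda_A + \lambda_B)$ of the original density.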
Third, how do we fix your approach? I do not see a simple way to do it if you are conditioning on $A$, because $A$ depends on whether we have observed $B$. Therefore, to condition on $A$ is in a sense circular logic: $B$ depends on $A$ but $A$ depends on $B$. You could condition on $B$, but this is precisely what the correct solution does: $$\Pr[X \le x] = \int_{t=0}^x \Pr[B < A \le x \mid B = t] f_B(t) \, dt = \int_{t=0}^x \Pr[A - B \le x - t \mid B = t] f_B(t) \, dt,$$ and then the solution uses the memoryless property and independence of $A$ and $B$ to conclude $A - B = X_A$, $B = X_B$, and $\Pr[A - B \le x - t \mid B = t] = \Pr[X_A \le x-t]$. After differentiating to get the density, we obtain the same result.
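For completeness, the differentiation step can be written out as follows. This is a sketch for the case $\lambda_A \ne \lambda_B$, using $\Pr[X_A \le x - t] = 1 - e^{-\lambda_A (x-t)}$, $f_B(t) = \lambda_B e^{-\lambda_B t}$, and the Leibniz rule for differentiating under the integral sign: $$\begin{aligned} \Pr[X \le x] &= \int_{t=0}^x \left(1 - e^{-\lambda_A (x-t)}\right) \lambda_B e^{-\lambda_B t} \, dt, \\ f_X(x) &= \underbrace{\left(1 - e^{0}\right) \lambda_B e^{-\lambda_B x}}_{=\,0} + \int_{t=0}^x \lambda_A e^{-\lambda_A (x-t)} \lambda_B e^{-\lambda_B t} \, dt \\ &= \lambda_A \lambda_B e^{-\lambda_A x} \int_{t=0}^x e^{(\lambda_A - \lambda_B) t} \, dt = \frac{\lambda_A \lambda_B}{\lambda_A - \lambda_B} \left(e^{-\lambda_B x} - e^{-\lambda_A x}\right). \end{aligned}$$ When $\lambda_A = \lambda_B$, the last integral equals $x$, which gives $\lambda_A^2 x e^{-\lambda_A x}$.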