Suppose that $Y_1, Y_2 \sim$ i.i.d. $\text{Expo}(b)$, where $b$ is the mean.
Find the pdf of the second order statistic, $U_2 = \max(Y_1,Y_2)$.
$F_{U_2}(u) = P(U_2<u) = P\{\max(Y_1,Y_2)<u\} = P(Y_1<u,\,Y_2<u) = P(Y_1<u)\,P(Y_2<u) = [P(Y_1<u)]^2 = (1-e^{-u/b})^2$ for $u > 0$.
Then you do some differentiation and get the answer.
The problem is that I am confused about two steps: how we go from $P\{\max(Y_1,Y_2)<u\}$ to $P(Y_1<u,\,Y_2<u)$, and how we go from $P(Y_1<u)\,P(Y_2<u)$ to $[P(Y_1<u)]^2$. Could someone please explain?
Distribution of the maximum of two exponential random variables.
With the help of Comments by @Callculus and @JusttoAnswer, I hope the derivation of the CDF of $W = \max(Y_1,Y_2)$ is clear, and that you have PDF $f_W(u) = (2/b)(1 - \exp(-u/b))\exp(-u/b),$ for $u > 0.$
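For completeness, here is one way to spell out the two steps asked about, followed by the differentiation. The maximum is below $u$ exactly when both variables are below $u$; independence turns the joint probability into a product; and because $Y_1$ and $Y_2$ are identically distributed, the two factors are equal:

$$
\begin{aligned}
\{\max(Y_1,Y_2) < u\} &= \{Y_1 < u\} \cap \{Y_2 < u\}, \\
F_W(u) = P(Y_1<u)\,P(Y_2<u) &= [P(Y_1<u)]^2 = (1-e^{-u/b})^2, \\
f_W(u) = \frac{d}{du}F_W(u) &= 2\,(1-e^{-u/b})\cdot\frac{1}{b}\,e^{-u/b} = \frac{2}{b}\,(1-e^{-u/b})\,e^{-u/b}, \qquad u>0.
\end{aligned}
$$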
This is an important relationship in applied probability theory, particularly in reliability and queueing applications.
From the PDF, you can find that $E(W) = b/2 + b.$ Here is an intuitive rationale for the mean in two steps:
(1) The first order statistic is $V = \min(Y_1, Y_2) \sim \text{Expo}(b/2),$ which has mean $E(V) = b/2.$ That makes sense intuitively. If a system consists of two components with exponential lifetimes connected in series, then the system fails when the first component fails. With two similar components at risk, the average time to the first failure should be half as long as with only one at risk.
(2) After the first component fails, we wait for the second to fail in order to get the average lifetime of the second order statistic. But, by the no-memory property of exponential distributions, the second component doesn't 'remember' it has been going for time $V$ already, so its additional lifetime averages $b$. Hence, $E(W) = b/2 + b = 1.5b.$
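Both steps can be checked directly. This is a sketch that assumes only independence and the $\text{Expo}(b)$ survival function $P(Y_i > v) = e^{-v/b}$. For step (1), the minimum exceeds $v$ exactly when both lifetimes do:

$$
P(V > v) = P(Y_1 > v)\,P(Y_2 > v) = e^{-v/b}\,e^{-v/b} = e^{-v/(b/2)}, \qquad v > 0,
$$

which is the survival function of an exponential distribution with mean $b/2.$ For the total, integrating against the PDF of $W$ gives the same answer as the two-step argument, because each integral below is the mean of an exponential distribution (with means $b$ and $b/2$, respectively):

$$
E(W) = \int_0^\infty u\,\frac{2}{b}\left(e^{-u/b} - e^{-2u/b}\right)du
= 2\int_0^\infty \frac{u}{b}\,e^{-u/b}\,du - \int_0^\infty \frac{2u}{b}\,e^{-2u/b}\,du
= 2b - \frac{b}{2} = \frac{3b}{2}.
$$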
When two components are connected in parallel, the average lifetime $E(W)$ of the parallel system is longer than the average lifetime $E(Y)$ of a single component. Sometimes communications satellites fail because the CPU of the onboard computer gets zapped by a cosmic ray. Such a strike does not depend on how long the CPU has already been running, so the exponential distribution is a good model for its lifetime. If two CPUs are connected in parallel, so that the second can take over when the first gets zapped, then the average lifetime of the computer may be extended from $b$ years to $1.5b$ years.
It is easy to illustrate the distribution of $W$ using R statistical software. In R, the parameter for an exponential distribution is the failure rate $1/b$ instead of the average lifetime $b.$ In the simulation, I let $b = 5.$
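The R session itself is not reproduced here. As a stand-in, the following is a minimal Python sketch of the same kind of simulation, using only the standard library. The function names (`simulate_max`, `pdf_max`, `histogram_density`), the seed, the sample size of 200,000, and the bin width are all illustrative assumptions, not the original code. Like R's `rexp`, Python's `random.expovariate` takes the rate $1/b$ rather than the mean $b.$

```python
import math
import random


def pdf_max(u, b):
    """PDF of W = max(Y1, Y2): (2/b) * (1 - exp(-u/b)) * exp(-u/b), for u > 0."""
    return (2.0 / b) * (1.0 - math.exp(-u / b)) * math.exp(-u / b)


def simulate_max(b, n, seed=1):
    """Draw n realizations of W = max(Y1, Y2), with Y1, Y2 i.i.d. exponential, mean b."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) for reproducibility
    rate = 1.0 / b             # expovariate expects the rate, not the mean
    return [max(rng.expovariate(rate), rng.expovariate(rate)) for _ in range(n)]


def histogram_density(values, width, n_bins):
    """Density-scaled histogram heights on bins [0, width), [width, 2*width), ..."""
    counts = [0] * n_bins
    for x in values:
        k = int(x // width)
        if k < n_bins:         # values beyond the last bin are ignored
            counts[k] += 1
    n = len(values)
    return [c / (n * width) for c in counts]


b = 5.0        # mean lifetime of one component, as in the text
n = 200_000    # illustrative sample size
width = 1.0    # illustrative bin width

w = simulate_max(b, n)
sample_mean = sum(w) / n
dens = histogram_density(w, width, n_bins=30)

print(f"sample mean of W: {sample_mean:.3f}   theory 1.5*b: {1.5 * b:.3f}")
for k in (0, 2, 5, 10, 20):
    mid = (k + 0.5) * width  # compare each bin height with the PDF at the bin midpoint
    print(f"u = {mid:5.1f}   histogram: {dens[k]:.4f}   PDF: {pdf_max(mid, b):.4f}")
```

The sample mean should land close to $1.5b = 7.5,$ and the histogram heights should track the PDF values at the bin midpoints, up to simulation error.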
The histogram below, based on a million realizations of $W$, approximates the PDF of $W$. The black curve is just a graph of the PDF of $W$, which you derived.