Question
I don't understand how the mean proximity is calculated here. It says to take the average of the $x$ components of these $16$ distances and add it to the average of their $y$ components. I thought this meant taking all the values of the $x$ coordinates and averaging them, doing the same for the $y$ coordinates, and then adding the two averages together, but that's not the case. In general, how does one compute the mean proximity of $2$ clusters using the Manhattan distance?


That’s a terrible text you have there; it’s full of grammatical mistakes and it adds the distances in a different order than it listed them before. If I were you I’d switch to a different text and/or course if at all possible.
The mean distance between the clusters is the distance averaged over all pairs of points, one from each cluster; here there are $16$ such pairs. For the Manhattan distance, this is
\begin{eqnarray} |A-B|&=&\frac1{16}\sum_{ij}|a_i-b_j| \\ &=& \frac1{16}\sum_{ij}\left(|a_{i1}-b_{j1}|+|a_{i2}-b_{j2}|\right)\;. \end{eqnarray}
Here $a_{i1}$ and $a_{i2}$ are the $x$ and $y$ coordinates of $a_i$, and likewise for $b_j$. The solution you quote computes this as
$$ \frac1{16}\sum_{ij}|a_{i1}-b_{j1}|+\frac1{16}\sum_{ij}|a_{i2}-b_{j2}|\;. $$
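To see that the two forms agree, here is a minimal sketch that computes the mean proximity directly, averaging the full Manhattan distance over all pairs, and then again one coordinate at a time. The clusters `A` and `B` are hypothetical points chosen only for illustration, not the data from your text, and the function names are made up for this example.

```python
from itertools import product

def manhattan(p, q):
    """Manhattan (L1) distance between two points of equal dimension."""
    return sum(abs(pk - qk) for pk, qk in zip(p, q))

def mean_proximity(A, B):
    """Average Manhattan distance over all pairs (a, b) with a in A and b in B."""
    pairs = list(product(A, B))
    return sum(manhattan(a, b) for a, b in pairs) / len(pairs)

def mean_proximity_by_coordinate(A, B):
    """Same quantity: average each coordinate's |a_k - b_k| over all pairs, then add."""
    pairs = list(product(A, B))
    dims = len(A[0])
    return sum(
        sum(abs(a[k] - b[k]) for a, b in pairs) / len(pairs)
        for k in range(dims)
    )

# Hypothetical clusters with 4 points each, giving 16 pairs.
A = [(1, 2), (1, 4), (3, 2), (3, 5)]
B = [(6, 1), (8, 3), (7, 6), (8, 7)]

print(mean_proximity(A, B))                # 7.75
print(mean_proximity_by_coordinate(A, B))  # 7.75 (= 5.25 from x + 2.5 from y)
```

Both functions average over the $16$ pairs; the second simply splits each pairwise distance into its $x$ and $y$ parts before averaging, which is exactly the rearrangement in the formula above.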
It’s unnecessarily hard to follow because the terms are added in a different order from the one in which they were listed. For instance, $|a_{11}-b_{41}|=5$, $|a_{21}-b_{41}|=5$, $|a_{31}-b_{41}|=7$ and $|a_{41}-b_{41}|=7$, which yields the first term in the solution.
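One way to keep track of the terms is to lay out the $16$ $x$-component differences $|a_{i1}-b_{j1}|$ as a $4\times4$ table, with one row per $a_i$ and one column per $b_j$. The sketch below does this for hypothetical clusters (not the data from your text); the helper name `coordinate_differences` is made up for illustration.

```python
# Hypothetical clusters (4 points each); NOT the data from the quoted solution.
A = [(1, 2), (1, 4), (3, 2), (3, 5)]
B = [(6, 1), (8, 3), (7, 6), (8, 7)]

def coordinate_differences(A, B, k):
    """Table whose (i, j) entry is |a_ik - b_jk| for coordinate k (0 = x, 1 = y)."""
    return [[abs(a[k] - b[k]) for b in B] for a in A]

x_table = coordinate_differences(A, B, 0)
for row in x_table:
    print(row)

# Summing row by row or column by column gives the same total,
# so the order in which the terms are added does not change the result.
row_order = sum(sum(row) for row in x_table)
column_order = sum(sum(row[j] for row in x_table) for j in range(len(B)))
print(row_order, column_order, row_order / (len(A) * len(B)))  # 84 84 5.25
```

Reading down a single column (one fixed $b_j$) corresponds to grouping the terms as in the example above, while reading across a row fixes $a_i$ instead; either way the total, and hence the $x$ term of the mean, is the same.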