A two-dimensional persistence diagram with values in $[0,1]$, say, is just a multiset of points of $\mathbb R^2$. Given two diagrams $P=\{p_1=(a_1,b_1),\ldots, p_n= (a_n,b_n)\}$ and $Q=\{q_1=(c_1,d_1),\ldots, q_m= (c_m,d_m)\}$, we define the Wasserstein distances by first letting $P',Q'$ be $P,Q$ with the addition of all points on the diagonal
$$\Delta_2 = \{(x,x): x \in [0,1]\}$$
with infinite multiplicity, and then defining
$$W_p^q(P,Q) = \inf_\varphi \left(\sum_{x \in P'} \| x - \varphi(x)\|^q_p \right)^{1/q},$$
where $\varphi$ ranges over all bijections $P' \to Q'$.
In other words, we pair up each point of $P$ and of $Q$ either with a point of the other diagram or with a point of the diagonal, and combine the resulting pairwise distances.
Typically $a_i$ is the birth time of some topological feature and $b_i$ is its death time. Pairing with the diagonal is allowed because a feature close to the diagonal is short-lived and can be considered unimportant, perhaps just the result of noise in the data set.
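To make the definition concrete, here is a naive sketch (my own illustration, not taken from any package; the function names are hypothetical) that computes $W_p^q$ for tiny two-dimensional diagrams by trying every partial matching. It assumes $p \ge 1$ and coordinates in $[0,1]$, so that the nearest point of $\Delta_2$ to $(a,b)$ is $((a+b)/2,(a+b)/2)$.

```python
import math


def lp_dist(u, v, p=math.inf):
    """l_p distance between two points given as tuples."""
    diffs = [abs(x - y) for x, y in zip(u, v)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)


def diag_dist(pt, p=math.inf):
    """l_p distance from (a, b) to the diagonal {(x, x)}.

    For p >= 1 the nearest diagonal point is ((a+b)/2, (a+b)/2).
    """
    a, b = pt
    half = abs(b - a) / 2
    return lp_dist((half, half), (0.0, 0.0), p)


def wasserstein_bruteforce(P, Q, q=1, p=math.inf):
    """W_p^q(P, Q) by exhaustive search over partial matchings of P into Q.

    Running time grows factorially, so this is only for tiny diagrams.
    """
    best = math.inf

    def search(i, used, acc):
        nonlocal best
        if acc >= best:
            return  # costs are non-negative, so this branch cannot improve
        if i == len(P):
            # Every point of Q left unmatched is sent to the diagonal.
            rest = sum(diag_dist(Q[j], p) ** q for j in range(len(Q)) if j not in used)
            best = min(best, acc + rest)
            return
        # Option 1: send P[i] to the diagonal.
        search(i + 1, used, acc + diag_dist(P[i], p) ** q)
        # Option 2: pair P[i] with a point of Q not used yet.
        for j in range(len(Q)):
            if j not in used:
                search(i + 1, used | {j}, acc + lp_dist(P[i], Q[j], p) ** q)

    search(0, frozenset(), 0.0)
    return best ** (1 / q)
```

For example, `wasserstein_bruteforce([(0.1, 0.9)], [(0.2, 0.8)])` pairs the two points directly (cost $0.1$ in $\ell_\infty$) rather than sending both to the diagonal (cost $0.4 + 0.3$).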
Suppose instead we are interested in the diagrams $P=\{p_1=(a_1,b_1, z_1),\ldots, p_n= (a_n,b_n,z_n)\}$ and $Q=\{q_1=(c_1,d_1,w_1),\ldots, q_m= (c_m,d_m,w_m)\}$, where the third component encodes, in some way, the size of the feature.
Of course the above generalises immediately: just add the diagonal $\Delta_3 = \{(x,x,x): x \in [0,1]\}$ to each diagram and use three-dimensional $p$-norms rather than two-dimensional ones.
The problem here is the physical meaning. The diagonal is no longer a true diagonal: the points have units of (time, time, size) rather than just (time, time), so a point such as $(1/2,1/2,1/2)$ is not meaningful in the way $(1/2,1/2)$ is.
Instead I am interested in computing the Wasserstein distance using the two-dimensional diagonal $\Delta_2^0 = \{(x,x,0): x \in [0,1]\}$. That means a point can only be considered unimportant if it is both short-lived and small; a short-lived but large feature is still considered important. When pairing up points we need them to be close in birth time, death time and size.
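The cost of sending a single point to $\Delta_2^0$ can be worked out by hand. Assuming $p \ge 1$ and $a,b \in [0,1]$, only the first two coordinates can move towards the diagonal, and $x \mapsto |a-x|^p + |b-x|^p$ is convex and symmetric about $\tfrac{a+b}{2}$, so
$$d_p\big((a,b,z),\Delta_2^0\big)^p = \min_{x\in[0,1]}\big(|a-x|^p + |b-x|^p\big) + |z|^p = 2\left|\frac{b-a}{2}\right|^p + |z|^p,$$
and for $p=\infty$ this becomes $\max\left(\tfrac{|b-a|}{2},\,|z|\right)$. The size term $|z|$ cannot be reduced, which is exactly the "large features stay important" behaviour described above.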
The problem is that considering all possible pairings is computationally expensive: done naively, the number of pairings to check grows factorially with the number of points. So rather than reinvent the wheel, I would like to use a more sophisticated and efficient software package.
Such packages exist, but the ones I can find all use the standard diagonal $\Delta_n$ for multisets of points in $\mathbb R^n$. Suppose I have such a package. Is there a way to feed it modified diagrams so that it computes distances with respect to $\Delta^0_2$?
Edit: I would also be interested in the same question for the tall diagonal $\Delta_2^* = \{(x,x,y): x,y \in [0,1]\}$. In that model a large but short-lived component is considered to be just noise. I would also be interested in alternatives to the Wasserstein distance, and in cheaper approximations to it.
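The same hand computation for the tall diagonal (again assuming $p \ge 1$ and $a,b,z \in [0,1]$) shows that size then plays no role in the diagonal cost, because the third coordinate can be matched exactly by taking $y = z$:
$$d_p\big((a,b,z),\Delta_2^*\big) = \min_{x,y\in[0,1]}\big(|a-x|^p + |b-x|^p + |z-y|^p\big)^{1/p} = 2^{1/p}\,\frac{|b-a|}{2},$$
with $\tfrac{|b-a|}{2}$ for $p=\infty$. Size still matters when two genuine points are paired with each other.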
(For the sake of completeness, turning my comment into a proper answer.)
I'm not sure how to reduce your calculation to one involving only ordinary persistence diagrams. However, I think the reduction is possible if you "go down one level" in the computational pipeline.
GUDHI, a popular Python library for TDA, computes Wasserstein distances by first turning a pair of persistence diagrams into a distance matrix that records the pairwise distances between points of the two diagrams, as well as each point's distance to the diagonal. This matrix is then passed to an optimal transport library (Python Optimal Transport, or POT, in the case of GUDHI), where the "magic" of computing Wasserstein distances and optimal matchings with clever algorithms happens.
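For reference, my understanding of how GUDHI sets this up with its POT backend (worth checking against the GUDHI source) is an $(n+1)\times(m+1)$ transport problem in which one extra row and one extra column stand for the diagonal. Writing $d_\Delta$ for the distance to the diagonal:
$$C_{ij} = \begin{cases} \|p_i - q_j\|_p^q & i\le n,\ j\le m,\\ d_\Delta(p_i)^q & i\le n,\ j=m+1,\\ d_\Delta(q_j)^q & i=n+1,\ j\le m,\\ 0 & i=n+1,\ j=m+1,\end{cases}$$
$$\alpha = (1,\ldots,1,m)\in\mathbb R^{n+1},\qquad \beta = (1,\ldots,1,n)\in\mathbb R^{m+1},$$
$$W_p^q(P,Q)^q = \min\Big\{\textstyle\sum_{i,j}\gamma_{ij}C_{ij} \;:\; \gamma\ge 0,\ \sum_j \gamma_{ij} = \alpha_i,\ \sum_i \gamma_{ij} = \beta_j\Big\}.$$
Since the marginals are integers, the transport polytope has integral vertices, so an optimal plan can be taken to be a genuine matching.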
It seems to me that a pair of your modified diagrams can be converted into a distance matrix using the metric you have defined, just as easily as is done for ordinary persistence diagrams. At this point, you can call exactly the same optimal transport methods to solve the matching problem. Since the optimal transport methods don't care where the distance matrix comes from, the pipeline works fine with this modification.
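Here is a minimal sketch of that idea, assuming NumPy and SciPy are available. To keep it short it swaps POT for SciPy's `linear_sum_assignment`, using the standard $(n+m)\times(n+m)$ assignment formulation (each diagram padded with copies of the diagonal) instead of GUDHI's $(n+1)\times(m+1)$ transport matrix; both formulations have the same optimum. The helper names are hypothetical, and the diagonal distances use the closed forms for $p \ge 1$ with coordinates in $[0,1]$.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def lp_norm(diff, p):
    """l_p norm along the last axis (p may be np.inf)."""
    diff = np.abs(diff)
    if np.isinf(p):
        return diff.max(axis=-1)
    return (diff ** p).sum(axis=-1) ** (1 / p)


def dist_to_flat_diagonal(pts, p):
    """Distance from rows (a, b, z) to Delta_2^0 = {(x, x, 0)}.

    The nearest point is ((a+b)/2, (a+b)/2, 0), so the size z always counts.
    """
    half = np.abs(pts[:, 1] - pts[:, 0]) / 2
    return lp_norm(np.stack([half, half, pts[:, 2]], axis=-1), p)


def dist_to_tall_diagonal(pts, p):
    """Distance from rows (a, b, z) to Delta_2^* = {(x, x, y)}.

    The nearest point is ((a+b)/2, (a+b)/2, z), so the size costs nothing.
    """
    half = np.abs(pts[:, 1] - pts[:, 0]) / 2
    return lp_norm(np.stack([half, half, np.zeros_like(half)], axis=-1), p)


def wasserstein_custom(P, Q, diag_dist, q=1.0, p=np.inf):
    """W_p^q between 3-D diagrams with a caller-supplied diagonal distance.

    P is padded with m diagonal rows and Q with n diagonal columns, turning
    the partial matching problem into a square assignment problem.
    """
    P = np.asarray(P, dtype=float).reshape(-1, 3)
    Q = np.asarray(Q, dtype=float).reshape(-1, 3)
    n, m = len(P), len(Q)
    C = np.zeros((n + m, n + m))
    # Real point to real point: ||p_i - q_j||_p^q
    C[:n, :m] = lp_norm(P[:, None, :] - Q[None, :, :], p) ** q
    # p_i sent to the diagonal (any of the n diagonal columns)
    C[:n, m:] = diag_dist(P, p)[:, None] ** q
    # q_j sent to the diagonal (any of the m diagonal rows)
    C[n:, :m] = diag_dist(Q, p)[None, :] ** q
    # Diagonal matched to diagonal costs nothing: C[n:, m:] stays 0.
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].sum() ** (1 / q)
```

Swapping `dist_to_flat_diagonal` for `dist_to_tall_diagonal` gives the variant from the edit. For instance, with the defaults $p=\infty$, $q=1$, a short-lived but large feature $(0.4, 0.5, 0.9)$ compared with an empty diagram costs $0.9$ with $\Delta_2^0$ but only $0.05$ with $\Delta_2^*$.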