I'm working in a unit simplex with $S$ corners, where hyperplanes are defined as $S$-dimensional vectors $H = \left \langle h_1, \dots, h_S \right \rangle$ and each $h_i$ gives the height of $H$ at the $i$-th simplex corner.
I'm interested in computing the distance $d$ between a point $p$ and the closest point on the intersection of two hyperplanes $H_1$ and $H_2$, which are guaranteed to intersect in that space.
I found some code that supposedly does this using the formula
$$d = \frac{|H_1 - H_2|\cdot p}{\left \| H_1 - H_2 \right \|_2}$$
But I don't understand how this is supposed to work. While searching I found this question and this Wikipedia article on the Gram-Schmidt process, so I guess some kind of projection is going on, but I can't see why that projection would give the distance I'm after.
I assume the numerator represents the "height" difference between the two planes at $p$ and the denominator represents the relative slope between them (so that the more similar their slopes are, the larger the distance will be), but I'm still unclear on what's happening.
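
For concreteness, here is a minimal NumPy sketch of how I read the formula (this is not the original code). I'm assuming the absolute value applies to the whole dot product $(H_1 - H_2)\cdot p$, and that $p$ is given in the same $S$ coordinates as the $H$ vectors; the function name `distance_to_intersection` is my own.

```python
import numpy as np

def distance_to_intersection(h1, h2, p):
    """Evaluate d = |(H1 - H2) . p| / ||H1 - H2||_2 (my reading of the formula)."""
    h1, h2, p = (np.asarray(v, dtype=float) for v in (h1, h2, p))
    diff = h1 - h2                  # per-corner height difference of the two planes
    norm = np.linalg.norm(diff)     # Euclidean (L2) norm of that difference
    if norm == 0.0:
        # Identical planes: the formula divides by zero and is undefined.
        raise ValueError("H1 and H2 are identical; the formula is undefined")
    return float(abs(diff @ p) / norm)

# Two corners (S = 2): the planes cross where both have the same height,
# which for these values is the midpoint p = (0.5, 0.5).
print(distance_to_intersection([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]))  # 0.0
```

With this reading, the result is zero exactly when both planes have the same height at $p$, which matches what I'd expect for a point lying on the intersection.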