I have two matrices, $A$ and $B$, with dimensions ($m$, $n$) and ($n$, $m$), respectively. I have an iterative machine learning algorithm (similar to a restricted Boltzmann machine) that changes the values of $A$ and $B$ over time. I have good reasons to suspect that, due to these changes, $B$ becomes similar to the inverse of $A$ (i.e. $A'$) as training progresses. (Aside: I have some data samples, where each data point is a vector $v$ of length $m$; transforming $v$ using $A$ gives a new vector $u$ of length $n$ with $u = Av$; applying $B$ to $u$ gives $w = Bu$, which over time becomes similar to the original $v$, i.e., the sum of squared errors between $v$ and $w$ decreases.)
I would like to characterize the degree to which $B$ becomes the inverse of $A$ over time, and I am looking for a well-behaved similarity measure to do so.
The best that I have come up with so far is based on the identity $I = AA'$: I define $J = AB$ and compute the ratio $Tr(J) / |J|$, i.e. the sum of the values on the diagonal of $J$ compared to the sum of all values of $J$ (if $B$ is the pseudo-inverse, then there should be no off-diagonal terms). The measure is nice in that it equals 1 if $B = A'$; however, it is unclear to me what values lower than 1 mean, and whether a change in the measure from 0.1 to 0.2 is comparable in some way to a change from 0.4 to 0.5.
I am hence looking for either
a) a more principled similarity measure, or
b) some insight into how $Tr(AB) / |AB|$ behaves as $B$ becomes less and less like the pseudo-inverse of $A$.
As you can probably tell, I am not a mathematician by training so any comments or suggestions on how to improve the question are more than welcome.
Since you are dealing with rectangular matrices, you might get $AA^+\ne I$ even when using the exact pseudoinverse $A^+$. Instead, you should use the Penrose condition $$AA^+A=A$$ to test how well $B$ approximates $A^+$.
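To make the point about rectangular matrices concrete, here is a minimal NumPy sketch. The shapes (5 by 3) and the random matrix are assumptions chosen only for illustration; they are not taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tall matrix; a random Gaussian matrix has full column rank
# with probability 1.
A = rng.standard_normal((5, 3))
A_pinv = np.linalg.pinv(A)          # exact Moore-Penrose pseudoinverse, shape (3, 5)

# A @ A_pinv is a 5x5 projector of rank 3, so it cannot equal the identity.
P = A @ A_pinv
is_identity = np.allclose(P, np.eye(5))

# The Penrose condition A A^+ A = A still holds.
penrose_ok = np.allclose(A @ A_pinv @ A, A)

print("A A^+ == I ?      ", is_identity)
print("A A^+ A == A ?    ", penrose_ok)
```

In this example the product on the other side, $A^+A$, does equal the 3 by 3 identity because $A$ has full column rank, which is why a test based on only one of the two products can mislead.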
You could use this condition in a ratio test to compare the initial iteration to the $k^{th}$ iteration, e.g. $$\rho_k = \frac{\|A_kB_kA_k-A_k\|}{\|A_0B_0A_0-A_0\|}$$ where $\|\cdot\|$ is any convenient matrix norm (the Frobenius norm, for instance). Obviously, $\rho_0=1$, and if your hunch is correct then $$\lim_{k\to\infty}\rho_k = 0.$$
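The ratio test could be sketched in NumPy as follows. The Frobenius norm is an assumption here, since the formula does not fix a particular norm, and the helper names `penrose_residual` and `rho` as well as the toy training sequence are hypothetical. In the toy sequence $A$ is held fixed and $B_k$ is the true pseudoinverse plus noise that halves at every step, so $\rho_k$ should halve as well.

```python
import numpy as np

def penrose_residual(A, B):
    """Frobenius norm of A B A - A; zero when B satisfies the first Penrose condition."""
    return np.linalg.norm(A @ B @ A - A)   # default norm for 2-D input is Frobenius

def rho(A_k, B_k, A_0, B_0):
    """Residual at iteration k relative to the residual at iteration 0."""
    return penrose_residual(A_k, B_k) / penrose_residual(A_0, B_0)

# Toy stand-in for training (an assumption, not the asker's algorithm).
rng = np.random.default_rng(1)
m, n = 4, 6
A = rng.standard_normal((m, n))
noise = rng.standard_normal((n, m))
A_pinv = np.linalg.pinv(A)

# B_k = A^+ + 0.5**k * noise, for k = 0..12
Bs = [A_pinv + (0.5 ** k) * noise for k in range(13)]
rhos = [rho(A, B_k, A, Bs[0]) for B_k in Bs]

for k, r in enumerate(rhos):
    print(f"k={k:2d}  rho_k={r:.6f}")
```

In a real run, one would record `rho(A_k, B_k, A_0, B_0)` at each training step, with both matrices taken from the same iteration. Note that $\rho_k$ is undefined if the initial pair already satisfies the condition exactly, because the denominator is then zero.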