Similarity between mathematical expressions


I'm currently working on a neural network that evaluates algebraic expressions. To validate the model we need a metric $D: X \times Y \to \mathbb{R}^+$, where $X$ and $Y$ are the sets of predicted and correct answers. Examples of $D(\textbf{x},\textbf{y})$ could be, for $Z \in \mathbb{Z}$, \begin{align*} & D\left(\frac{\pi^{Z}}{Z} + \log{Z},\:\:\frac{\pi^{Z}}{Z}\right)\\ & D\bigg(Z \log{Z},\:\:\log{(Z \cdot Z)}\bigg). \end{align*}

For example, if we consider the expression

\begin{equation} \frac{\pi^2}{6}, \end{equation}

our neural network might output a general expression, such as

\begin{equation} \frac{\pi^{Z}}{Z} + \log{Z}, \end{equation}

where $Z$ represents an integer (not necessarily the same integer!). In this case, since $\frac{\pi^2}{6}$ can be written as $\frac{\pi^Z}{Z}$, the output expression would have a "high" similarity score.

One idea we discussed is brute force: first pattern-match the terms, so that $\frac{\pi^Z}{Z}$ is instantiated as $\frac{\pi^2}{6}$, and then try different values of $Z$ in the remaining term $\log{Z}$.
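To make the brute-force idea concrete, here is a minimal Python sketch. It assumes that every $Z$ is an independent integer slot, that searching the integers 1 to 10 is enough, and that the generated expression is available as a callable; the names `best_fit`, `generated`, and `z_range` are made up for illustration. It skips the pattern-matching step and simply tries every combination of slot values.

```python
import itertools
import math


def best_fit(template, n_slots, target, z_range=range(1, 11)):
    """Try every integer filling of the Z slots and return
    (smallest absolute error, the filling that achieves it)."""
    best_err, best_zs = math.inf, None
    for zs in itertools.product(z_range, repeat=n_slots):
        try:
            err = abs(template(*zs) - target)
        except (ValueError, ZeroDivisionError, OverflowError):
            continue  # skip fillings outside the expression's domain
        if err < best_err:
            best_err, best_zs = err, zs
    return best_err, best_zs


# Correct answer: pi^2 / 6
target = math.pi ** 2 / 6

# Generated template pi^Z / Z + log(Z), with one argument per Z slot
# (assumption: the slots do not have to share a value).
generated = lambda a, b, c: math.pi ** a / b + math.log(c)

err, zs = best_fit(generated, 3, target)
is_match = err < 1e-9  # binary measure under an assumed tolerance
```

Here the filling `(2, 6, 1)` reproduces $\frac{\pi^2}{6}$ because $\log 1 = 0$. An error below some tolerance would give the binary measure, and the smallest error itself could serve as a crude quantitative score. The cost grows exponentially with the number of slots, so this is only practical for short expressions.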

Further examples of pairs that should receive a high similarity score (correct vs. generated) are

\begin{align} Z \log{Z} \; \; &\text{vs.} \; \; \log{(Z \cdot Z)} \\ Z \; \; &\text{vs.} \; \; Z + Z + Z \; \; \text{(this is probably a very unlikely scenario)} \end{align}

Does anyone know of a metric or method that gives a quantitative measure of how close the above expressions are? A binary measure, such as the brute-force method, would also work for us. Our data is in postfix notation, but we can of course convert it to another notation if needed.
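For reference, one purely structural candidate that works directly on postfix input can be sketched as follows: rebuild the expression tree, collect its subtrees, and take the Jaccard distance between the two subtree sets. The token vocabulary (`pi`, `Z`, `^`, `log`, ...) and the names `ARITY`, `postfix_to_tree`, `subtrees`, and `jaccard_distance` are assumptions for illustration and may not match our actual encoding. The score is syntactic only, so it will not recognise algebraically equivalent rewrites.

```python
# Assumed operator arities; any token not listed is treated as a leaf.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "^": 2, "log": 1}


def postfix_to_tree(tokens):
    """Convert a postfix token list into a nested tuple (op, child, ...)."""
    stack = []
    for tok in tokens:
        n = ARITY.get(tok, 0)
        if n > len(stack):
            raise ValueError(f"not enough operands for {tok!r}")
        cut = len(stack) - n
        args = tuple(stack[cut:])  # the last n entries are the operands
        del stack[cut:]
        stack.append((tok, *args))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


def subtrees(tree):
    """Yield the tree itself and every subtree below it."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


def jaccard_distance(a, b):
    """1 - |shared subtrees| / |all subtrees|, in the range [0, 1]."""
    sa, sb = set(subtrees(a)), set(subtrees(b))
    return 1 - len(sa & sb) / len(sa | sb)


correct = postfix_to_tree("pi Z ^ Z /".split())            # pi^Z / Z
generated = postfix_to_tree("pi Z ^ Z / Z log +".split())  # pi^Z / Z + log Z
d = jaccard_distance(correct, generated)
```

In this example the two trees share four distinct subtrees out of six in total, so `d` is $1 - \frac{4}{6} = \frac{1}{3}$, and identical expressions get distance 0. Because every $Z$ is the same token, the sketch treats all integer slots as interchangeable.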