Trying to determine an exact non-linear formula from a scatterplot of gathered values


I'm sorry to say that I have retained so little knowledge of mathematics from college I don't know where to begin with this:

I am trying to determine an unknown (to me) formula within a game using a scatterplot of known data that I've gathered by hand. The game does not, as far as I can tell, feature random variation, as these numbers were entirely consistent - therefore, I would expect a formula to exist which perfectly encompasses all of these values. The scatterplot of data I gathered is below:

[Figure: Scatterplot of gathered data]

In looking up the answer, it sounds like what I'm trying to do most closely aligns with Non-Linear Regression, but looking into any articles on this seems to discuss finding a "line of best fit" which does not perfectly encompass all data values, which does not match my use case - there is a formula hard-coded into this game, and I'm looking to reverse-engineer that exact formula.

Excusing the game language - the game features a "Defense" stat, which reduces the damage taken by attacks by a given amount. It seems to feature diminishing returns, and I would logically expect this to be a logarithmic equation; however, the tools I've used have claimed that there exists a polynomial line of best fit with a higher R value (which I think correlates to accuracy?).

I am not very good at reading / translating complex math nomenclature, so I've tried a few shortcuts to no avail. Excel, Google Sheets, and Wolfram Alpha have all given me lines of best fit which, once again, approximate the data set fairly well but are not exact. (My axes are "Defense" versus Percentage of Expected Damage, i.e. Actual Damage (tested) / Expected Damage (calculated), so while a fit may be accurate to within 0.1, that is a 10% variance in practice, which is a significant margin of error and essentially unusable.)

In Summary: Given the below data set, I am trying to identify what the formula/function defining the relationship between the two axes is. If someone could explain simply the process of identifying this formula, or point me to the name of the process I'm trying to conduct, or any simple / helpful guides on how to do it, I'd greatly appreciate it.

Apologies, I didn't previously add the data set because I didn't want it to come off as though I were just asking for the solution. I would like to learn! However, to respond to comments asking for such, here is the dataset I've been using:

(352, 1.5998),
(358, 1.5875),
(391, 1.5357),
(475, 1.4113),
(560, 1.3064),
(569, 1.2944),
(620, 1.2403),
(753, 1.1155),
(880, 1.0175),
(896, 1.0062),
(973, 0.9562),
(1182, 0.8421),
(1404, 0.7478),
(1429, 0.7382),
(1548, 0.6967),
(1882, 0.6018),
(2084, 0.5562),
(2122, 0.5483),
(2296, 0.5151),
(2792, 0.4390),
(2868, 0.4294),
(2922, 0.4229),
(3159, 0.3958),
(3841, 0.3348),
(6359, 0.2132)

There are 2 best solutions below

On BEST ANSWER

tl;dr: The underlying law of your data is probably $$\text{Percentage}=\frac{1}{0.387+0.000677\cdot\text{DEF}}$$

In order to make any progress on these "forensic mathematics"-type problems, one must assume that the underlying law is simple. As Robert Israel points out, even the polynomials are rich enough to match any function, but in a way that does not generalize (even in-sample!).

A simple function is plausible here, because game developers don't want to do complex math to figure out how hard their game rules might be. But my end result is inconsistent with "real life", so it's possible that the game uses a more sophisticated model that just happens to be approximated well by my simpler formula.

There are two common and simple functions that satisfy a consistent decay over the range of inputs, as is the case here:

  • exponential decay, which has the functional form $$\text{Percentage}=Ae^{-B\cdot(\text{DEF})}$$ and looks like so: [Figure: an exponential decay], and
  • inverse proportionality, which has the functional form $$\text{Percentage}=\frac{1}{A+B\cdot(\text{DEF})}$$ and looks like so: [Figure: inverse proportion decay]

The former has the advantage that it describes the underlying physical laws of combat well: if your "sword stroke" has uniformly distributed strength within a range, but then those strokes are subjected to a large number of filters ("armor"), each of which has a small probability of "absorbing the blow", then the end result is an exponential distribution. (This is a special case of the Poisson limit law.) The latter has the advantage that it looks more like your original plot to my experienced eye.
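As a rough sketch of where the exponential comes from, suppose (my simplifying assumptions, purely for illustration) that there are $n$ independent armor layers, each absorbing the blow with the same small probability $\lambda/n$, and that the total "absorbing power" $\lambda$ grows linearly with DEF, say $\lambda = B\cdot(\text{DEF})$. Then the chance that a blow gets through every layer is $$\left(1-\frac{\lambda}{n}\right)^{n}\;\xrightarrow{\;n\to\infty\;}\;e^{-\lambda}=e^{-B\cdot(\text{DEF})},$$ which is the exponential-decay form above, with $A$ playing the role of the unarmored baseline.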

Luckily, $2$ is a finite number, so we can just try all possibilities.

Whichever model we want to investigate, the first step is to change to different quantities which will linearize the model. In the exponential decay case, this means taking logarithms: $$\log(\text{Percentage})=\log(A)-B\cdot(\text{DEF})$$ In the proportionality case, it means inversion: $$(\text{Percentage})^{-1}=A+B\cdot(\text{DEF})$$ Then we just plot the transformed variables and see whether the result is in fact linear; if so, then we've found the underlying law.

In the exponential-transformed case, the resulting plot is: [Figure: Log(%) v. DEF]

IMHO, that plot doesn't look linear.

In the proportionality-transformed case, the resulting plot is: [Figure: 1/% v. DEF]

The latter looks to me like a straight line, suggesting no need for further statistical analysis.
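If you would rather not judge linearity by eye, the same comparison can be sketched numerically. The snippet below is only an illustration using the dataset from the question: it applies each transformation and reports $R^2$ for a straight line through the transformed points. The helper name `r_squared` is my own, not something from a library.

```python
import math

# Dataset from the question: (DEF, Percentage of Expected Damage)
DATA = [
    (352, 1.5998), (358, 1.5875), (391, 1.5357), (475, 1.4113), (560, 1.3064),
    (569, 1.2944), (620, 1.2403), (753, 1.1155), (880, 1.0175), (896, 1.0062),
    (973, 0.9562), (1182, 0.8421), (1404, 0.7478), (1429, 0.7382), (1548, 0.6967),
    (1882, 0.6018), (2084, 0.5562), (2122, 0.5483), (2296, 0.5151), (2792, 0.4390),
    (2868, 0.4294), (2922, 0.4229), (3159, 0.3958), (3841, 0.3348), (6359, 0.2132),
]


def r_squared(xs, ys):
    """R^2 of the least-squares straight line through the points (xs, ys).

    For a simple (one-variable) linear fit this equals the squared
    Pearson correlation coefficient.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


defense = [d for d, _ in DATA]

# Exponential model: log(Percentage) should be linear in DEF.
r2_log = r_squared(defense, [math.log(p) for _, p in DATA])

# Inverse-proportionality model: 1/Percentage should be linear in DEF.
r2_inv = r_squared(defense, [1.0 / p for _, p in DATA])

print(f"R^2 for log(Percentage) vs DEF: {r2_log:.6f}")
print(f"R^2 for 1/Percentage    vs DEF: {r2_inv:.6f}")
```

The transformation whose $R^2$ is closer to $1$ is the better candidate; a plot is still worth a look, since $R^2$ alone can hide systematic curvature.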

Running a linear regression on the linearized data recovers the law cited above; testing on all the original data recovers the output to roughly $3$ decimal places. If you need more digits, Mathematica's full result for the fitted line, with $x=\text{DEF}$ as input and $(\text{Percentage})^{-1}$ as output, was $$x\mapsto 0.387193+0.000676877 x$$
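For readers without Mathematica, here is a minimal sketch of the same regression in plain Python. It is an ordinary least-squares line through the points (DEF, 1/Percentage), which I am assuming is what the fit above amounts to. The helper names `fit_line` and `predicted_percentage` are mine.

```python
# Dataset from the question: (DEF, Percentage of Expected Damage)
DATA = [
    (352, 1.5998), (358, 1.5875), (391, 1.5357), (475, 1.4113), (560, 1.3064),
    (569, 1.2944), (620, 1.2403), (753, 1.1155), (880, 1.0175), (896, 1.0062),
    (973, 0.9562), (1182, 0.8421), (1404, 0.7478), (1429, 0.7382), (1548, 0.6967),
    (1882, 0.6018), (2084, 0.5562), (2122, 0.5483), (2296, 0.5151), (2792, 0.4390),
    (2868, 0.4294), (2922, 0.4229), (3159, 0.3958), (3841, 0.3348), (6359, 0.2132),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Linearize: 1/Percentage = A + B * DEF, then fit A and B.
defense = [d for d, _ in DATA]
inverse_pct = [1.0 / p for _, p in DATA]
A, B = fit_line(defense, inverse_pct)


def predicted_percentage(defense_value):
    """Back-transform the fitted line into the original quantity."""
    return 1.0 / (A + B * defense_value)


# Largest disagreement between the fitted law and the measured values.
max_err = max(abs(predicted_percentage(d) - p) for d, p in DATA)

print(f"A = {A:.6f}, B = {B:.9f}")
print(f"largest absolute error in Percentage: {max_err:.5f}")
```

Looking at the individual residuals (not just the largest one) is a reasonable next step: if they show a pattern and not just rounding noise, the game's true formula may differ slightly from this one.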

On

Exact interpolation (which is what you seem to be asking for here) is highly unstable. So we might find, say, a polynomial of degree $24$ that exactly reproduces the $25$ given values, but the graph between those data points would be a wild mess. Moreover, the values in the second column are likely not really exact anyway, but rather rounded to $4$ decimal places.
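To see this instability concretely, here is a small sketch that builds the degree-$24$ interpolating polynomial through the $25$ points from the question using Lagrange's formula. It then evaluates the polynomial at DEF $=5000$, a value I picked arbitrarily inside the large gap between the last two data points. Exact rational arithmetic (`fractions.Fraction`) is used so that floating-point error cannot be blamed for the result. The function name `lagrange` is mine.

```python
from fractions import Fraction

# Dataset from the question, with the second column kept as strings so that
# Fraction can represent the 4-decimal values exactly.
RAW = [
    (352, "1.5998"), (358, "1.5875"), (391, "1.5357"), (475, "1.4113"),
    (560, "1.3064"), (569, "1.2944"), (620, "1.2403"), (753, "1.1155"),
    (880, "1.0175"), (896, "1.0062"), (973, "0.9562"), (1182, "0.8421"),
    (1404, "0.7478"), (1429, "0.7382"), (1548, "0.6967"), (1882, "0.6018"),
    (2084, "0.5562"), (2122, "0.5483"), (2296, "0.5151"), (2792, "0.4390"),
    (2868, "0.4294"), (2922, "0.4229"), (3159, "0.3958"), (3841, "0.3348"),
    (6359, "0.2132"),
]
NODES = [(Fraction(x), Fraction(y)) for x, y in RAW]


def lagrange(x):
    """Evaluate the unique degree-24 polynomial through all 25 points at x."""
    x = Fraction(x)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(NODES):
        term = yi
        for j, (xj, _) in enumerate(NODES):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# The polynomial reproduces every data point exactly...
exact_at_nodes = all(lagrange(x) == y for x, y in NODES)
print("reproduces all 25 data points exactly:", exact_at_nodes)

# ...but between data points it bears no resemblance to the data,
# whose values all lie between roughly 0.2 and 1.6.
between = lagrange(5000)
print("interpolated value at DEF = 5000:", float(between))
```

The data themselves never leave the range $0.2$ to $1.6$, so any value far outside that range at DEF $=5000$ shows that the "exact" polynomial is useless for prediction.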