What is quartic interpolation, and how is it calculated?


I was reading the gist on the reward function used in OpenAI Five, but I didn't understand the way they calculate health's reward.

This is what they state: "Hero health is scored according to a quartic interpolation between 0 (dead) and 1 (full health) to force heavier weight when the agent is near dying."

I tried googling but didn't manage to find an easy enough explanation for me to understand. What exactly is quartic interpolation and how is it calculated?

I understand the normalization part of scoring it between 0 and 1, but how do they increase the weight of certain values?

Best answer

The explanation of this point is a bit unclear, but since "the agent's reward is the increase in score from one tick to the next", presumably there is some function $h(x)$ that maps the health $x$ to a score.

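If the function were $h(x) = x,$ a $0.1$ increase in health would give the same reward whether $x$ was near $0$ or near $1.$ But they wanted the reward to be greater if $x$ is near $0$, so they chose $h$ to be some other function, one that is steeper near $0$ than near $1.$ "Quartic interpolation" suggests they chose a fourth-degree polynomial, a function that can be written in the form $h(x) = ax^4 + bx^3 + cx^2 + dx + e.$ They don't say which polynomial. One possibility is $h(x) = 1 - (1 - x)^4 = -x^4 + 4x^3 - 6x^2 + 4x,$ which satisfies $h(0) = 0$ and $h(1) = 1$ and has slope $h'(x) = 4(1 - x)^3,$ largest at $x = 0$ and zero at $x = 1.$ (The bare $(1 - x)^4$ has the same steepness but decreases from $1$ to $0,$ so a health gain would lower the score.) It could be something else, though.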
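To make the effect concrete, here is a minimal Python sketch comparing the linear score with one hypothetical quartic candidate, $h(x) = 1 - (1 - x)^4.$ That polynomial is an assumption chosen for illustration (the source does not say which one is actually used), and the function names `linear_score`, `quartic_score`, and `reward` are made up for this example.

```python
from typing import Callable


def linear_score(x: float) -> float:
    """Identity mapping: the health fraction x in [0, 1] is its own score."""
    return x


def quartic_score(x: float) -> float:
    """Hypothetical quartic candidate h(x) = 1 - (1 - x)^4 (an assumption)."""
    return 1.0 - (1.0 - x) ** 4


def reward(score: Callable[[float], float], before: float, after: float) -> float:
    """Reward for one tick: the increase in score from `before` to `after`."""
    return score(after) - score(before)


if __name__ == "__main__":
    # Same 0.1 gain in health, once near death and once near full health.
    for before, after in [(0.0, 0.1), (0.9, 1.0)]:
        lin = reward(linear_score, before, after)
        quart = reward(quartic_score, before, after)
        print(f"health {before:.1f} -> {after:.1f}: "
              f"linear reward = {lin:.4f}, quartic reward = {quart:.4f}")
```

Under this assumed polynomial, going from $0$ to $0.1$ health is worth about $1 - 0.9^4 = 0.3439,$ while going from $0.9$ to $1$ is worth only about $0.1^4 = 0.0001.$ The linear score gives $0.1$ in both cases.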