"Homogeneous" combination of different functions


(I know that "homogeneous" is not the right word and is potentially even misleading, but I don't know a better one. I am thankful for suggestions!)

In reinforcement learning (RL), an agent learns from a reward that is given to it. Actions that lead to a desired state earn a positive reward; undesired states generate negative rewards. Over the course of many episodes, the agent learns which actions are beneficial and which are not.

This reward function is designed by the experimenter. For popular applications of RL, it can be as simple as an Elo rating, a high score, or simply the time the agent "survives" in the game. These types of reward functions focus on a single quantity, such as points or time.

However, in reality, things become more complex and the reward function needs to combine different aspects of the world the RL agent lives in. And these parts, being functions themselves, behave very differently: some might exhibit polynomial or logarithmic behavior. Some might describe the behavior of a system in terms of a probability distribution. Others might be perfectly linear.

The main question is now: How do I meaningfully combine different functions? Moreover, how do I do it in a way that allows me to add a weighting factor to them so that I can, for some designs, put an emphasis on different aspects of the overall system?

In my naive impression, I cannot simply put a weighting factor in front of two functions that behave very differently. I.e., if one function $f_1(x)$ is linear and the other, $f_2(x)$, is logarithmic, combining them as $f(x) = w_1 f_1(x) + w_2 f_2(x)$ does not seem to make sense: adjusting $w_1$ from 0.5 to 0.6 has a different influence on the overall result than adjusting $w_2$ from 0.5 to 0.6. A step of 0.1 in $w_1$ changes $f(x)$ by $0.1\,f_1(x)$, whereas the same step in $w_2$ changes it by $0.1\,f_2(x)$, and because $f_1$ grows linearly while $f_2$ grows logarithmically, these two amounts differ and drift further apart as $x$ grows.
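To illustrate the concern numerically, here is a minimal sketch. The choices $f_1(x) = x$ and $f_2(x) = \ln x$, the sample points, and the helper names `combined` and `weight_step_effects` are my own assumptions for illustration only; they are not taken from any particular RL setup.

```python
import math


def f1(x):
    """Assumed linear component: f1(x) = x."""
    return x


def f2(x):
    """Assumed logarithmic component: f2(x) = ln(x), defined for x > 0."""
    return math.log(x)


def combined(x, w1, w2):
    """Weighted sum f(x) = w1 * f1(x) + w2 * f2(x)."""
    return w1 * f1(x) + w2 * f2(x)


def weight_step_effects(x, step=0.1, w1=0.5, w2=0.5):
    """Return how much f(x) changes when w1 or w2 alone is raised by `step`."""
    base = combined(x, w1, w2)
    effect_w1 = combined(x, w1 + step, w2) - base  # equals step * f1(x)
    effect_w2 = combined(x, w1, w2 + step) - base  # equals step * f2(x)
    return effect_w1, effect_w2


# Compare the two effects at a few (arbitrary) sample points.
for x in (2.0, 10.0, 100.0, 1000.0):
    d1, d2 = weight_step_effects(x)
    print(f"x={x:7.1f}  w1 step changes f by {d1:8.3f}  w2 step changes f by {d2:6.3f}")
```

The same step of 0.1 moves the combined value by very different amounts depending on which weight is changed and on where $x$ currently is, which is why I doubt that the weights can be read as "relative importance" in this raw form.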

For a concrete example, suppose $f_1(t)$ is the velocity of a car at time $t$, and $f_2(t)$ is its loudness in dB(A). Perhaps I'd like a very fast car in one scenario, no matter the loudness, and a quiet car in another. Two different agents, one reward function, but different weights. In my naive view, I cannot assume that any $w$ is meaningfully linear in $[0.0, 1.0]$, right?
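To make the car example concrete, this is roughly the kind of reward function I have in mind. It is only a sketch: the value ranges (0 to 200 km/h, 40 to 100 dB(A)), the min-max rescaling to $[0, 1]$, and the names `normalize` and `reward` are all assumptions I made up for illustration. Whether such a rescaling actually makes the weights comparable is exactly what I am unsure about.

```python
def normalize(value, lo, hi):
    """Clip `value` to [lo, hi] and rescale it linearly to [0, 1]."""
    clipped = min(max(value, lo), hi)
    return (clipped - lo) / (hi - lo)


def reward(speed_kmh, loudness_dba, w_speed, w_quiet):
    """Weighted sum of a speed score and a quietness score, both in [0, 1].

    The ranges below are hypothetical; a real setup would need its own bounds.
    """
    speed_score = normalize(speed_kmh, 0.0, 200.0)
    # Lower loudness should be better, so invert the normalized value.
    quiet_score = 1.0 - normalize(loudness_dba, 40.0, 100.0)
    return w_speed * speed_score + w_quiet * quiet_score


# Same car state, two agents with different priorities.
fast_agent = reward(180.0, 95.0, w_speed=0.9, w_quiet=0.1)
quiet_agent = reward(180.0, 95.0, w_speed=0.1, w_quiet=0.9)
print(fast_agent, quiet_agent)
```

Note that dB(A) is itself a logarithmic scale, so even after rescaling, a fixed difference in `quiet_score` does not correspond to a fixed difference in sound pressure.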

Even more, I can have probability distributions, e.g., in computer networking, where traffic follows a "bursty" pattern. How can I include PDFs?

I am totally at a loss here. Perhaps this is just a problem in my head, but it seems to me that I need to account for the different behaviors of the functions to get a "homogeneous" combination. Can this be done in a simple way, or do I need a device like a Hilbert space transformation in order to treat all these pieces in an $n$-dimensional sphere? If so, how do PDFs figure in here?