How to weight two functions if I want to optimize over both of them? (different value range + different value distribution)


a) Question:

I have two functions $f_1(x, y)$ and $f_2(x, y)$ which measure the similarity of $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^N$. For both functions, a lower function value means that the inputs are more similar. If I want to optimize a specific $\tilde{x}$ given a constant input $\tilde{y}$ with respect to $f_1$ OR $f_2$, I can do this (for instance, via gradient descent).

But now I want to optimize $\tilde{x}$ given $\tilde{y}$ with respect to $f_1$ AND $f_2$, and additionally I want to be able to decide how much weight each of these functions gets in the optimization. A naive approach would be to weight the functions like this: $\ \ \ f_3(x,y) = \lambda \cdot f_1(x,y) + \beta \cdot f_2(x,y) \ \ \ $, and then just optimize with respect to $f_3$.
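As a minimal sketch of this naive weighted sum, assume a squared Euclidean distance for $f_1$ and a made-up $g$; both are hypothetical stand-ins, since the real functions are not given here. The finite-difference gradient and the step size are also just illustrative choices.

```python
import numpy as np

def f1(x, y):
    # Hypothetical stand-in for f_1: squared Euclidean distance.
    return float(np.sum((x - y) ** 2))

def g(x):
    # Hypothetical stand-in for the unknown g (the real g is not known).
    return 100.0 * x ** 2

def f2(x, y):
    # f_2(x, y) = f_1(g(x), g(y)), as defined in the question.
    return f1(g(x), g(y))

def f3(x, y, lam, beta):
    # Naive weighted sum of the two similarity measures.
    return lam * f1(x, y) + beta * f2(x, y)

def numerical_grad(fun, x, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (fun(x + step) - fun(x - step)) / (2 * eps)
    return grad

def minimize(fun, x0, lr, steps):
    # Plain gradient descent with a fixed step size.
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * numerical_grad(fun, x)
    return x

y_tilde = np.array([0.4, 0.6, 0.5])
x0 = np.array([0.5, 0.5, 0.5])
lam, beta = 0.5, 0.5

print("f1 at start:", f1(x0, y_tilde))
print("f2 at start:", f2(x0, y_tilde))

x_opt = minimize(lambda x: f3(x, y_tilde, lam, beta), x0, lr=1e-5, steps=2000)
print("x_opt:", x_opt)
```

With these stand-ins, $f_2$ is several orders of magnitude larger than $f_1$ at the starting point, so with equal weights the gradient of $f_3$ is dominated by the $f_2$ term and the step size has to be chosen to suit $f_2$.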

The problem with this approach is that the function values produced by $f_1$ and $f_2$ have very different ranges and distributions. This is because $f_2(\cdot)$ is actually:

$f_2(x,y) = f_1(g(x), g(y)) \ \ \ $ with an unknown function $\ g(\cdot)$.

However, we have a lot of empirical data of $x$ and $y$, and we can measure the distribution of $g(\cdot)$ given our input data.

So my question is: what can I do to get an $f_3$ such that I optimize over $f_1$ and $f_2$ and both functions have approximately equal influence on my result?


b) What I thought of:

I thought of bringing the input data for the second function into the same range as for the first function. This would imply that the function values of $f_1$ and $f_2$ are also in the same range. So basically, I could try scaling $g(x)$ to be in the same range as $x$.

From my empirical data, $x$ lies in the range [0, 1] and is roughly normally distributed with a mean close to the middle of the interval.

From my empirical data, $g(x)$ lies in the range [0, 10000] and seems to be exponentially distributed. Most of the mass is in the range [0, 0.1] and only a few outliers are above 0.1 (see the max range value :O).
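To make "most of the mass is below 0.1" concrete, a check like the following could be run on the measured values of $g(x)$. This is only a sketch: `g_values` is synthetic data generated to mimic the description above (an exponential bulk plus a few large outliers), not my real data, and `summarize` is a helper name made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the measured g(x) values (NOT real data):
# an exponential bulk plus a handful of very large outliers.
g_values = rng.exponential(scale=0.02, size=100_000)
outlier_idx = rng.choice(g_values.size, size=50, replace=False)
g_values[outlier_idx] = rng.uniform(0.1, 10_000.0, size=50)

def summarize(values, threshold):
    # Range, central tendency, an upper quantile, and the fraction of
    # values at or below the candidate clipping threshold.
    values = np.asarray(values, dtype=float)
    return {
        "min": float(values.min()),
        "max": float(values.max()),
        "median": float(np.median(values)),
        "q99": float(np.quantile(values, 0.99)),
        "frac_below": float(np.mean(values <= threshold)),
    }

summary = summarize(g_values, threshold=0.1)
print(summary)
```

Comparing `max` against `q99` and `frac_below` shows how far the outliers sit from the bulk, which is the information needed to judge whether a cutoff such as 0.1 is reasonable.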

Therefore, I thought of ignoring the outliers and scaling $g(x)$ in the following way:

$k(x) := \min(g(x), 0.1) \, / \, 0.1$

And then set $f_3(\cdot)$ to: $\ \ f_3(x,y) := 0.50 \cdot f_1(x,y) + 0.50 \cdot f_1(k(x),k(y))$.
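A minimal sketch of this idea, under two assumptions about what is intended: the outliers are clipped at 0.1 (that is, $\min(g(x), 0.1) / 0.1$, so the result lies in [0, 1]), and $f_1$ is applied to the rescaled values $k(x)$ and $k(y)$. The concrete `f1` and `g` below are hypothetical stand-ins, since the real ones are not given.

```python
import numpy as np

CAP = 0.1  # clipping threshold taken from the text above

def f1(x, y):
    # Hypothetical stand-in for f_1: squared Euclidean distance.
    return float(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def g(x):
    # Hypothetical stand-in for the unknown g, chosen only so that
    # some outputs exceed CAP for inputs in [0, 1].
    return 0.2 * np.asarray(x, dtype=float) ** 2

def k(x, cap=CAP):
    # Clip g(x) at cap, then rescale so the result lies in [0, 1].
    return np.minimum(g(x), cap) / cap

def f3(x, y, w1=0.5, w2=0.5):
    # Weighted sum of f_1 on the raw inputs and f_1 on the rescaled g values.
    return w1 * f1(x, y) + w2 * f1(k(x), k(y))

x = np.array([0.2, 0.5, 0.9])
y = np.array([0.3, 0.5, 1.0])
print("k(x):", k(x))
print("k(y):", k(y))
print("f3(x, y):", f3(x, y))
```

One thing this makes visible: wherever $g$ exceeds the cap for both inputs, the clipped values coincide, so the second term no longer distinguishes those components and contributes no gradient there.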

Now does this make any sense? Do I have to match the distribution as well as the range?

Please help :S