a) Question:
I have two functions $f_1(x, y)$ and $f_2(x, y)$ which measure the similarity of $x \in \mathbb{R}^N$ and $y \in \mathbb{R}^N$. For both functions, a lower function value means the inputs are more similar. If I want to optimize a specific $\tilde{x}$ given a constant input $\tilde{y}$ with respect to $f_1$ OR $f_2$, I can do this (for instance, via gradient descent).
But now I want to optimize $\tilde{x}$ given $\tilde{y}$ with respect to $f_1$ AND $f_2$, and additionally, I want to be able to decide how much weight each of these functions gets in my optimization. A naive approach would be to weight the functions like this: $f_3(x,y) = \lambda \cdot f_1(x,y) + \beta \cdot f_2(x,y)$, and then just optimize with respect to $f_3$.
The problem with this approach is that the function values produced by $f_1$ and $f_2$ have very different distributions and ranges. This is because $f_2$ is actually:
$f_2(x,y) = f_1(g(x), g(y))$ with an unknown function $g(\cdot)$.
However, we have a lot of empirical data of $x$ and $y$, and we can measure the distribution of $g(\cdot)$ given our input data.
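To make the scale mismatch concrete, here is a minimal sketch of how one could compare the empirical distributions of $f_1$ and $f_2$ on sample data. Everything specific in it is an assumption for illustration only: `f1` is a mean squared difference, `g` is an arbitrary heavy-tailed stand-in (the real $g$ is unknown), and the inputs are synthetic.

```python
import numpy as np

def f1(x, y):
    """Hypothetical stand-in for f_1: mean squared difference (lower = more similar)."""
    return float(np.mean((x - y) ** 2))

def g(x):
    """Hypothetical stand-in for the unknown g, chosen only to be heavy-tailed on [0, 1]."""
    return 1e4 * x ** 20

def f2(x, y):
    """f_2(x, y) = f_1(g(x), g(y)), as defined in the question."""
    return f1(g(x), g(y))

def summarize(values):
    """Return the 50th, 90th and 99th percentiles of a sample of function values."""
    return np.percentile(values, [50, 90, 99])

rng = np.random.default_rng(0)
# Synthetic inputs: roughly normal around 0.5, clipped to [0, 1].
xs = np.clip(rng.normal(0.5, 0.1, size=(1000, 16)), 0.0, 1.0)
ys = np.clip(rng.normal(0.5, 0.1, size=(1000, 16)), 0.0, 1.0)

f1_vals = np.array([f1(x, y) for x, y in zip(xs, ys)])
f2_vals = np.array([f2(x, y) for x, y in zip(xs, ys)])

print("f1 percentiles (50/90/99):", summarize(f1_vals))
print("f2 percentiles (50/90/99):", summarize(f2_vals))
```

With real data, `xs`, `ys` and the outputs of $g$ would replace the synthetic arrays; the percentile summary is just one simple way to see how far apart the two value ranges are.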
So my question is: What can I do to get an $f_3$ such that I optimize over both $f_1$ and $f_2$, and both functions have approximately equal influence on my result?
b) What I thought of:
I thought of bringing the input data for the second function into the same range as for the first function. This would imply that the function values of $f_1$ and $f_2$ are also in the same range. So basically, I could try scaling $g(x)$ to be in the same range as $x$.
From my empirical data, $x$ has a range of [0, 1] and is roughly normally distributed with a mean close to the middle of the interval.
From my empirical data, $g(x)$ has a range of [0, 10000] and seems to be exponentially distributed. Most of the mass is in the range [0, 0.1] and only a few outliers are above 0.1 (see the max range value :O).
Therefore, I thought of ignoring the outliers and scaling $g(x)$ in the following way:
$k(x) := \min(g(x), 0.1) \, / \, 0.1$
And then set $f_3(\cdot)$ to: $f_3(x,y) := 0.50 \cdot f_1(x,y) + 0.50 \cdot f_1(k(x),k(y))$.
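As a minimal sketch of this idea, the code below caps $g(x)$ at 0.1 (using `np.minimum`, which is how I read "ignoring the outliers"), rescales to [0, 1], and evaluates the second term as $f_1$ on the rescaled values. The concrete `f1` and `g` are hypothetical stand-ins, not the real functions, and the `cap`, `w1` and `w2` parameters are names introduced only for illustration.

```python
import numpy as np

def f1(x, y):
    """Hypothetical stand-in for f_1: mean squared difference."""
    return float(np.mean((x - y) ** 2))

def g(x):
    """Hypothetical stand-in for the unknown g (heavy-tailed on [0, 1])."""
    return 1e4 * x ** 20

def k(x, cap=0.1):
    """Clip g(x) at `cap` and rescale so the result lies in [0, 1]."""
    return np.minimum(g(x), cap) / cap

def f3(x, y, w1=0.5, w2=0.5):
    """Weighted sum of f_1 on the raw inputs and f_1 on the clipped, rescaled g-values."""
    return w1 * f1(x, y) + w2 * f1(k(x), k(y))

x = np.array([0.2, 0.5, 0.9])
y = np.array([0.3, 0.5, 0.6])
print("k(x):", k(x))
print("f3(x, y):", f3(x, y))
```

Note that clipping makes $k$ flat wherever $g(x) > 0.1$, so in that region the second term contributes no gradient with respect to $x$.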
Now does this make any sense? Do I have to match the distribution as well as the range?
Please help :S