I'm currently working on an iterative approach to solving an optimization problem. The implementation seems to be calculating biased directions, so a colleague suggested I look into parameter scaling. I found some basic material on the topic in a textbook (Gill, Murray, Wright. 1982. Practical Optimization. §7.5: 273-275) as well as on this website: http://www.esrf.eu/computing/scientific/FIT2D/MF/node2.html both of which seem to discuss the same idea.
Background:
For $f: \mathbb{R}^n \mapsto \mathbb{R}$, we have the problem: \begin{equation} \underset{\vec{x} \in \mathbb{R}^n}{\text{min}} f(\vec{x}) \tag{1} \end{equation}
We introduce a scaled parameter $\vec{y} \in \mathbb{R}^n$ for a scaling matrix $D \in \mathbb{R}^{n \times n}$ and a translating vector $\vec{c} \in \mathbb{R}^n$: \begin{equation} \vec{y} = D^{-1}(\vec{x} - \vec{c}) \tag{2} \end{equation}
from which we can obtain our 'original' parameter $\vec{x}$ using: \begin{equation} \vec{x} = D \vec{y} + \vec{c} \tag{3} \end{equation}
So far, so good. By smartly choosing $D$ and $\vec{c}$, I can have $y_1, y_2, \ldots, y_n \in [-1, 1]$, which is exactly what I'd like to have.
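For concreteness, here is a small sketch of how equations (2) and (3) give exactly that $[-1, 1]$ mapping. The box bounds `lo` and `hi` are made-up values for illustration; the question does not specify any: $D$ is the diagonal matrix of half-widths and $\vec{c}$ is the vector of midpoints.

```python
import numpy as np

# Hypothetical box bounds for each parameter x_i (illustrative values only):
lo = np.array([0.0, -5.0, 100.0])
hi = np.array([10.0, 5.0, 300.0])

# Choose D and c so that x_i in [lo_i, hi_i] maps to y_i in [-1, 1]:
D = np.diag((hi - lo) / 2.0)   # scaling matrix: diagonal of half-widths
c = (hi + lo) / 2.0            # translating vector: midpoints

def to_scaled(x):
    """y = D^{-1} (x - c), equation (2)."""
    return np.linalg.solve(D, x - c)

def to_original(y):
    """x = D y + c, equation (3)."""
    return D @ y + c

# The lower bounds map to -1 in every coordinate, the upper bounds to +1,
# and y = 0 maps back to the midpoint of each interval.
print(to_scaled(lo))             # [-1. -1. -1.]
print(to_scaled(hi))             # [ 1.  1.  1.]
print(to_original(np.zeros(3)))  # [  5.   0. 200.]
```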
Missing information/lack of understanding:
Now, as I seem to have understood it, the idea is to scale $\vec{x}$ to $\vec{y}$ and to solve the problem: \begin{equation} \underset{\vec{y} \in \mathbb{R}^n}{\text{min}} f(\vec{y}) \tag{4} \end{equation} and then to scale back to obtain our desired solution $\vec{x}^*$.
However, this doesn't make sense to me: by scaling $\vec{x}$ to $\vec{y}$, have we not simply changed the starting/initial point with which we run the minimization algorithm? The way I see it, we have not changed $f(\vec{x})$ or its gradient at all, which means it is the same problem but with a different starting point.
If I were not looking for a global minimum, how does this idea make sense? What am I overlooking about this concept?
Edit
I followed the suggestion on this site: http://www.fitzgibbon.ie/optimization-parameter-scaling and my algorithm converged correctly. Now I only wish to understand the theory behind why this works. Any help would be much appreciated!
End of edit
Thanks in advance, MotiveHunter


Yes, it's the same problem, and you can scale the initial condition too. The point of scaling here is that your algorithm can often be formulated much more simply if you can assume that all variables lie in $[-1, 1]$, rather than, e.g., taking values near the limits of what a double can hold.
Perhaps what you actually wanted is to scale the increment $x_{i+1} - x_i$ of a local optimization algorithm by some step size $s$, which can improve the stability of the algorithm and/or speed it up. The buzzword would then be "step size control", not scaling.
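One way to see concretely why scaling can help a gradient-based method, even though the minimizer is the same point under the change of variables: by the chain rule, minimizing $g(\vec{y}) = f(D\vec{y} + \vec{c})$ gives $\nabla g(\vec{y}) = D^T \nabla f(D\vec{y} + \vec{c})$, so the iterates are *not* the same as those of the unscaled problem. A sketch on a toy ill-conditioned quadratic (the function, matrices, and step sizes below are all made up for illustration, not taken from the question):

```python
import numpy as np

# A badly scaled quadratic (illustrative): f(x) = 0.5*(x1^2 + 1e6*x2^2),
# minimum at the origin, condition number 1e6.
A = np.diag([1.0, 1.0e6])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x

def gradient_descent(grad, z0, step, iters=100):
    """Plain fixed-step gradient descent."""
    z = z0.copy()
    for _ in range(iters):
        z = z - step * grad(z)
    return z

# Unscaled: the huge curvature ratio forces a tiny stable step size,
# so progress along the flat direction x1 is extremely slow.
x0 = np.array([1.0, 1.0])
x_direct = gradient_descent(grad_f, x0, step=1.9e-6)

# Scaled: with D = diag(1, 1e-3) and c = 0, g(y) = f(D y) is perfectly
# conditioned, and its gradient picks up a factor D^T via the chain rule.
D = np.diag([1.0, 1.0e-3])
grad_g = lambda y: D.T @ grad_f(D @ y)

y0 = np.linalg.solve(D, x0)          # same starting point, in y-coordinates
y_star = gradient_descent(grad_g, y0, step=0.9)
x_scaled = D @ y_star                # map back via x = D y + c

print(f(x_direct))   # still far from the minimum value 0
print(f(x_scaled))   # essentially at the minimum
```

So "same problem" refers to the set of minimizers, not to the trajectory of the algorithm: the factor $D^T$ in the gradient is exactly what changes the conditioning the iteration sees.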