I want to minimize a function $f \, : \, \mathbb{R}^{N} \, \longrightarrow \, \mathbb{R}$ (with $N \in \mathbb{N}^{\ast}$; in my problem, $N = 315$). I know that $f$ is differentiable on $\mathbb{R}^{N}$ and convex. Numerically, when I compute the gradient of $f$ at a given point, I have noticed that its coordinates have very different orders of magnitude ($\frac{\partial f}{\partial x_{1}} \gg \frac{\partial f}{\partial x_{2}}$ at every point). In practice, when I minimize $f$ using gradient descent ($x_{k+1} \, \leftarrow \, x_{k} - t_{k}\nabla f(x_{k})$, with $t_{k}$ determined by a backtracking line search), the coordinate $x_{2}$ barely changes from one iteration to the next. I have been told that my problem is badly scaled and that I can address this issue by minimizing $\tilde{f}(x) = f(Dx)$ instead, where $D$ is a diagonal matrix with positive coefficients. Doing so, I have:
$$ \nabla \tilde{f}(x) = D \nabla f(Dx) $$
for all $x \in \mathbb{R}^{N}$. My question is: how do I choose this matrix $D$? Are there (empirical) rules for choosing it?
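To make the question concrete, here is a minimal sketch of the setup on a toy badly-scaled quadratic. Everything in it (the function `gradient_descent`, the diagonal `a`, the starting point) is an illustration I made up, not part of my actual problem; the choice $D_{ii} \approx 1/\sqrt{\partial^2 f / \partial x_i^2}$ is one common heuristic I have seen mentioned, applied here where the Hessian diagonal is known exactly.

```python
import numpy as np

def gradient_descent(f, grad, x0, t0=1.0, beta=0.5, c=1e-4, tol=1e-8, max_iter=10_000):
    """Gradient descent with a backtracking (Armijo) line search.
    Returns the final iterate and the number of iterations used."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        t = t0
        # shrink t until the sufficient-decrease (Armijo) condition holds
        while f(x - t * g) > f(x) - c * t * g.dot(g):
            t *= beta
        x = x - t * g
    return x, max_iter

# Toy badly-scaled convex problem: f(x) = (1/2)(a1 x1^2 + a2 x2^2),
# so df/dx1 >> df/dx2 whenever a1 >> a2.
a = np.array([1e4, 1.0])
f = lambda x: 0.5 * np.sum(a * x**2)
grad = lambda x: a * x

x0 = np.array([1.0, 1.0])
x_plain, n_plain = gradient_descent(f, grad, x0)

# Heuristic scaling: D_ii = 1 / sqrt(d^2 f / dx_i^2).  Here the Hessian
# diagonal is known exactly (it is `a`); in practice one would estimate
# it, e.g. by finite differences at the starting point.
D = 1.0 / np.sqrt(a)                    # diagonal of D, stored as a vector
f_tilde = lambda y: f(D * y)
grad_tilde = lambda y: D * grad(D * y)  # chain rule: grad f_tilde(y) = D grad f(D y)

y_star, n_prec = gradient_descent(f_tilde, grad_tilde, x0 / D)
x_prec = D * y_star                     # map back to the original variables
```

On this toy problem, $\tilde{f}$ is perfectly conditioned with that $D$, so the scaled descent converges in far fewer iterations than the plain one; I would like to know whether there is a principled version of this rule.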