I often program with optimization problems, using well-known methods such as steepest descent and conjugate gradient search. Now I am wondering about a very simple rule: $$ x_i \leftarrow x_i - \frac{\mu}{ \frac{\partial f(x_1, x_2, \cdots, x_N)}{ \partial x_i}}, \quad \textrm{for } i=1,2,\cdots, N, $$ where $\mu$ is a small positive number (learning rate). Equivalently, we can write the update of all variables as $$ \Delta x= - \mu \begin{bmatrix} 1 \Big/ \frac{ \partial f(x_1, x_2, \cdots, x_N)}{\partial x_1} \\ 1 \Big/ \frac{ \partial f(x_1, x_2, \cdots, x_N)}{\partial x_2} \\ \vdots\\ 1 \Big/ \frac{ \partial f(x_1, x_2, \cdots, x_N)}{\partial x_N} \\ \end{bmatrix} $$
The change in $x_i$ is inversely proportional to the derivative $\partial f/ \partial x_i$. My intuition: a small derivative means the value of $f$ is not sensitive to $x_i$, so one should make a big change to $x_i$ to improve $f$, and vice versa.
I programmed this rule and it works well (on a relatively complex function with 23 variables). Why is such a rule not mentioned in textbooks? I googled it without finding any relevant pages. Any comments on this simple optimization method?
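For concreteness, here is a minimal Python sketch of the rule as written above. The quartic test function and all names are my own illustration, not the 23-variable function I actually used; the sketch also assumes every partial derivative stays away from zero, since the step $\mu / (\partial f/\partial x_i)$ blows up at a stationary point.

```python
import numpy as np

def inverse_gradient_step(grad, x, mu=1e-3, iters=100):
    """Proposed rule: x_i <- x_i - mu / (df/dx_i), applied elementwise.

    Assumes every partial derivative stays bounded away from zero;
    near a stationary point the reciprocal step becomes huge.
    """
    for _ in range(iters):
        g = grad(x)       # gradient vector of shape (N,)
        x = x - mu / g    # elementwise reciprocal step
    return x

# Illustrative test function with its minimum at the origin:
f = lambda x: np.sum(x**4)
grad = lambda x: 4.0 * x**3

x0 = np.full(3, 2.0)               # start away from the minimum
x1 = inverse_gradient_step(grad, x0, mu=1e-3, iters=100)
print(f(x1) < f(x0))               # → True: the objective decreased
```

From this starting point all partials are positive, so each step moves every coordinate toward the minimum and the objective decreases monotonically.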