Assume that I have a function:
$$f_{w_1,w_2}(x)=w_2(w_1x)$$
I want to find the optimal point of function $f$, so I will follow the gradient:
$$\frac{\partial f}{\partial w_1} = w_2x, \qquad \frac{\partial f}{\partial w_2} = w_1x$$
Then I update the parameters (with a learning rate of 1):
$$\Rightarrow \begin{cases} w'_1 = w_1-w_2x \\ w'_2=w_2-w_1x \end{cases}$$
$$\Rightarrow f_{w'_1,w'_2}(x)=(w_2-w_1x)(w_1-w_2x)x$$
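For reference, here is the updated function with the product written out. This is only the algebraic expansion of the expression above, still assuming a step size of 1:

$$f_{w'_1,w'_2}(x)=\left(w_1w_2-(w_1^2+w_2^2)\,x+w_1w_2\,x^2\right)x$$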
Now, let's assume $w=w_1w_2$ and do the same thing:
$$f_w(x)=wx$$ $$\frac{\partial f}{\partial w}=x$$ $$w' = w-x$$ $$f_{w'}(x) = (w-x)x$$
Clearly $f_{w'_1,w'_2}(x)\ne f_{w'}(x)$, which means that merely rewriting the function with fewer parameters changes the result of the optimization, for better or worse. This doesn't make sense to me, so I think I made a mistake somewhere. Where did I go wrong?
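The mismatch can be checked numerically. The sketch below takes one gradient step in each parameterization, following the updates written above with a learning rate of 1. The function names and the sample values ($w_1=2$, $w_2=3$, $x=0.5$) are made up for illustration.

```python
def step_two_params(w1, w2, x, lr=1.0):
    """One gradient step on f(x) = w2 * (w1 * x), updating w1 and w2 separately."""
    grad_w1 = w2 * x  # df/dw1
    grad_w2 = w1 * x  # df/dw2
    return w1 - lr * grad_w1, w2 - lr * grad_w2


def step_one_param(w, x, lr=1.0):
    """One gradient step on f(x) = w * x, where w stands for the product w1 * w2."""
    grad_w = x  # df/dw
    return w - lr * grad_w


# Illustrative values only (not from the question).
w1, w2, x = 2.0, 3.0, 0.5

w1_new, w2_new = step_two_params(w1, w2, x)
f_two = w2_new * w1_new * x          # f after the two-parameter update

w_new = step_one_param(w1 * w2, x)
f_one = w_new * x                    # f after the one-parameter update

print(f_two, f_one)  # 0.5 2.75
```

With these values the two-parameter update gives $f=0.5$ and the single-parameter update gives $f=2.75$, so the two procedures really do land on different functions after one step.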
If you set $w_1=w_2=w$, your function should become $f_w(x)=w^2x$.