To minimize an MSE, a common approach is gradient descent on the objective. For example, the derivative is: $\frac{d}{dw} \sum_{i=1}^n (t_i - w x_i)^2 = -\sum_{i=1}^n 2 (t_i - w x_i) x_i$ (note the minus sign from the inner derivative). My question is what happens if we do not apply the chain rule to the inner function. To be specific, suppose we perform gradient descent with $-\sum_{i=1}^n 2 (t_i - w x_i)$ instead ($x_i$ is not present here).
As far as I can tell, the location of the minimum point (the fixed point of the update) does not change. However, I suspect the convergence rate may differ. Could you suggest some ways to analyze and compare the convergence rate of the update with the chain rule against the one without it?
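For concreteness, here is a small numerical sketch of the setup I mean. The toy data below is my own illustrative assumption: it is noiseless ($t_i = w^* x_i$ exactly), so the fixed points of both updates coincide at $w^*$; with noisy data the chain-rule-free update would instead settle at $\sum_i t_i / \sum_i x_i$, which need not equal the least-squares minimizer.

```python
import numpy as np

# Hypothetical toy data (my assumption, not from the question):
# noiseless linear targets, so both updates share the fixed point w* = 3.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=50)
w_star = 3.0
t = w_star * x

def descend(grad, w0=0.0, lr=1e-3, steps=500):
    """Plain gradient descent driven by a supplied 'gradient' function."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# True gradient: d/dw sum (t_i - w x_i)^2 = -2 sum (t_i - w x_i) x_i
grad_chain = lambda w: -2.0 * np.sum((t - w * x) * x)

# Modified update without the chain-rule factor x_i
grad_no_chain = lambda w: -2.0 * np.sum(t - w * x)

w1 = descend(grad_chain)
w2 = descend(grad_no_chain)
print(w1, w2)  # both approach w* = 3 on this noiseless data
```

Since both updates are affine in $w$, each run is a linear recurrence, so their per-step contraction factors ($|1 - 2\eta \sum_i x_i^2|$ versus $|1 - 2\eta \sum_i x_i|$) are what I would like to compare rigorously.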
Thank you!