Can Gradient Descent be applied here to $f$?


If I have an objective function such as

$$ f(\theta_0,\theta_1) = \sum_{i=1}^n \begin{bmatrix}X_i - X_{i-1} + \frac{1}{n}(\theta_0 + \theta_1X_{i-1})Y_{i-1} \\ Y_i - Y_{i-1} - \frac{1}{n}(\theta_0 + \theta_1X_{i-1})Y_{i-1} \end{bmatrix}^{T}\begin{bmatrix}X_i - X_{i-1} + \frac{1}{n}(\theta_0 + \theta_1X_{i-1})Y_{i-1} \\ Y_i - Y_{i-1} - \frac{1}{n}(\theta_0 + \theta_1X_{i-1})Y_{i-1} \end{bmatrix} $$

Let $\Delta X_i = X_{i}-X_{i-1}$, and analogously for $\Delta Y_i$.

Then $$f(\theta_0,\theta_1) = \sum_{i=1}^n \left(\Delta X_i\right)^2+\left(\Delta Y_i\right)^2 + \frac{2}{n^2}\left(\theta_0+\theta_1X_{i-1}\right)^2Y_{i-1}^2 \\+ \frac{2}{n}\Delta X_i(\theta_0+\theta_1X_{i-1})Y_{i-1} - \frac{2}{n}\Delta Y_i(\theta_0+\theta_1X_{i-1})Y_{i-1} $$
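This expansion can be sanity-checked numerically against the original quadratic form. A minimal sketch (the data and parameter values below are arbitrary, chosen only to confirm the algebra):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=6)      # arbitrary data points, indices 0..n with n = 5
Y = rng.normal(size=6)
n = len(X) - 1
t0, t1 = 0.7, -0.3          # arbitrary (theta0, theta1)

a = (t0 + t1 * X[:-1]) * Y[:-1] / n   # a_i = (1/n)(theta0 + theta1 X_{i-1}) Y_{i-1}
dX, dY = np.diff(X), np.diff(Y)       # dX[i-1] = X_i - X_{i-1}, likewise dY

# Original quadratic form: sum of (dX_i + a_i)^2 + (dY_i - a_i)^2
f_direct = np.sum((dX + a)**2 + (dY - a)**2)

# Expanded form: sum of dX_i^2 + dY_i^2 + 2 a_i^2 + 2 dX_i a_i - 2 dY_i a_i
f_expanded = np.sum(dX**2 + dY**2 + 2*a**2 + 2*dX*a - 2*dY*a)

assert np.isclose(f_direct, f_expanded)
```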

I would like to approximate $\arg \min_{\theta_0,\theta_1}f(\theta_0,\theta_1)$, where the $(X_i,Y_i)$ are given data points. Treating this as an optimization problem, is it possible to apply gradient descent (GD) or stochastic gradient descent (SGD)?
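For concreteness, here is how I imagine a plain GD iteration would look (a sketch; the step size, iteration count, and function names are my own choices, and the gradient is derived by differentiating the sum of squared residuals by hand):

```python
import numpy as np

def f(theta, X, Y):
    """Objective: sum over i of (dX_i + a_i)^2 + (dY_i - a_i)^2, n = len(X) - 1."""
    n = len(X) - 1
    a = (theta[0] + theta[1] * X[:-1]) * Y[:-1] / n   # (1/n)(theta0 + theta1 X_{i-1}) Y_{i-1}
    r1 = np.diff(X) + a     # first residual component
    r2 = np.diff(Y) - a     # second residual component
    return np.sum(r1**2 + r2**2)

def grad_f(theta, X, Y):
    """Analytic gradient of f with respect to (theta0, theta1)."""
    n = len(X) - 1
    a = (theta[0] + theta[1] * X[:-1]) * Y[:-1] / n
    r1 = np.diff(X) + a
    r2 = np.diff(Y) - a
    # df/dtheta0 = (2/n) sum (r1_i - r2_i) Y_{i-1};  df/dtheta1 has an extra X_{i-1}
    g0 = (2.0 / n) * np.sum((r1 - r2) * Y[:-1])
    g1 = (2.0 / n) * np.sum((r1 - r2) * X[:-1] * Y[:-1])
    return np.array([g0, g1])

def gradient_descent(X, Y, lr=0.05, iters=20000):
    """Fixed-step gradient descent from theta = (0, 0)."""
    theta = np.zeros(2)
    for _ in range(iters):
        theta -= lr * grad_f(theta, X, Y)
    return theta
```

One observation: each residual is affine in $(\theta_0,\theta_1)$, so $f$ is a convex quadratic in the parameters; GD should converge provided the step size is small relative to the curvature, and the problem could even be solved in closed form as a linear least-squares system.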

Many of the examples I find for GD or SGD are for simple functions such as $g(x) = x^2$.

Or, if applying GD or SGD is not reasonable here, what is the cause for concern?

If $n=1$, a critical point would be of the form $\theta_1 = - \frac{\theta_0}{X_{0}}$, but I am concerned with the case $n \gg 1$.

Beyond this, $\theta_0$ and $\theta_1$ would also be subject to some constraints, but I would like to understand this one part at a time.