If I have an objective function such as
$$ f(\theta_0,\theta_1) = \sum_{i=1}^n \begin{bmatrix}X_i - X_{i-1} + \frac{1}{n}(\theta_0 + \theta_1X_{i-1} )Y_{i-1} \\ Y_i - Y_{i-1} - \frac{1}{n}(\theta_0 + \theta_1X_{i-1} )Y_{i-1} \end{bmatrix}^{T}\begin{bmatrix}X_i - X_{i-1} + \frac{1}{n}(\theta_0 + \theta_1X_{i-1} )Y_{i-1} \\ Y_i - Y_{i-1} - \frac{1}{n}(\theta_0 + \theta_1X_{i-1} )Y_{i-1} \end{bmatrix} $$
Let $\Delta X_i = X_{i}-X_{i-1}$ and analogously for $\Delta Y_i$
Then $$f(\theta_0,\theta_1) = \sum_{i=1}^n \left(\Delta X_i\right)^2+\left(\Delta Y_i\right)^2 + \frac{2}{n^2}\left(\theta_0+\theta_1X_{i-1}\right)^2 Y_{i-1}^2 \\+ \frac{2}{n}\Delta X_i(\theta_0+\theta_1X_{i-1})Y_{i-1} - \frac{2}{n}\Delta Y_i(\theta_0+\theta_1X_{i-1})Y_{i-1} $$
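For reference, here is my own computation of the gradient (please check it): writing $a_i = \frac{1}{n}\left(\theta_0+\theta_1X_{i-1}\right)Y_{i-1}$, so that $f = \sum_{i=1}^n \left(\Delta X_i + a_i\right)^2 + \left(\Delta Y_i - a_i\right)^2$, the partial derivatives are
$$\frac{\partial f}{\partial \theta_0} = \frac{2}{n}\sum_{i=1}^n\left(\Delta X_i - \Delta Y_i + 2a_i\right)Y_{i-1}, \qquad \frac{\partial f}{\partial \theta_1} = \frac{2}{n}\sum_{i=1}^n\left(\Delta X_i - \Delta Y_i + 2a_i\right)X_{i-1}Y_{i-1}.$$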
I would like to approximate $\arg \min_{\theta_0,\theta_1}f(\theta_0,\theta_1)$, where the $(X_i,Y_i)$ are given data points. Treating this as an optimization problem, is it possible to apply gradient descent (GD) or stochastic gradient descent (SGD)?
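To make the question concrete, here is a minimal plain-GD sketch of what I have in mind. The data below are synthetic (generated by running the recursion forward with an assumed "true" $(\theta_0,\theta_1) = (0.5,\,0.2)$), and the function names, step size, and iteration count are my own arbitrary choices:

```python
import numpy as np

def objective_and_grad(theta, X, Y):
    """f(theta0, theta1) and its gradient.

    X, Y are arrays of length n+1 holding X_0..X_n and Y_0..Y_n.
    The residuals are r1_i = dX_i + a_i and r2_i = dY_i - a_i,
    with a_i = (theta0 + theta1 * X_{i-1}) * Y_{i-1} / n.
    """
    theta0, theta1 = theta
    n = len(X) - 1
    dX, dY = np.diff(X), np.diff(Y)           # delta X_i, delta Y_i for i = 1..n
    a = (theta0 + theta1 * X[:-1]) * Y[:-1] / n
    r1, r2 = dX + a, dY - a
    f = np.sum(r1**2 + r2**2)
    common = 2.0 * (r1 - r2) / n              # factor shared by both partials
    grad = np.array([np.sum(common * Y[:-1]),
                     np.sum(common * X[:-1] * Y[:-1])])
    return f, grad

def gradient_descent(X, Y, lr=1.0, iters=20000):
    """Plain full-batch gradient descent from theta = (0, 0)."""
    theta = np.zeros(2)
    for _ in range(iters):
        _, g = objective_and_grad(theta, X, Y)
        theta -= lr * g
    return theta

# Synthetic data: run the recursion forward with an assumed true theta,
# so the residuals vanish exactly at theta_true.
theta_true = np.array([0.5, 0.2])
n = 50
X, Y = np.empty(n + 1), np.empty(n + 1)
X[0], Y[0] = 1.0, 1.0
for i in range(1, n + 1):
    a = (theta_true[0] + theta_true[1] * X[i - 1]) * Y[i - 1] / n
    X[i] = X[i - 1] - a
    Y[i] = Y[i - 1] + a

theta_hat = gradient_descent(X, Y)
```

On this exact (noise-free) data the estimate should recover `theta_true`; my worry is whether this still behaves well on real data and large $n$.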
Most of the examples I can find for GD or SGD work with simple functions such as $g(x) = x^2$.
Or, if applying GD or SGD is not reasonable here, what is the cause for concern?
If $n=1$, a critical point would be of the form $\theta_1 = - \frac{\theta_0}{X_{0}}$, but I am concerned with the case $n \gg 1$.
Beyond this, $\theta_0$ and $\theta_1$ would be subject to some constraints, but I would like to understand one part at a time.