I want to solve an optimization problem using multidimensional Rosenbrock function and gradient descent algorithm. The Rosenbrock function is given as follows:
$$ f(x) = \sum_{i=1}^{n-1} \left( 100 \left( x_{i+1} - x_i^2 \right)^2 + ( x_i - 1)^2 \right) $$
and its partial derivatives are
$$ \partial_{x_i} f (x) = \begin{cases} -400 x_i \left( x_{i+1} - x_i^2 \right) + 2 ( x_i - 1) & \text{if } i=1 \\ -400 x_i \left( x_{i+1} - x_i^2 \right) + 2 (x_i - 1) + 200 \left( x_i - x_{i-1}^2 \right) & \text{if } 1 < i < n \\ 200 \left( x_i - x_{i-1}^2 \right) & \text{if } i=n \end{cases} $$
I understand that I need to calculate the partial derivative with respect to each parameter of the Rosenbrock function. For the "normal" (two-dimensional) Rosenbrock function I would know how to do that, but for the multidimensional variant I do not understand why the solution above distinguishes between the cases $i=1$, $1<i<n$, and $i=n$. Could anyone provide some additional (more detailed) thoughts on how to calculate the partial derivatives in this case, especially regarding the summation in $f(x)$? Or a hint as to which rule is applied here?
Partial differentiation is a linear operation, so it distributes over the summation sign just as it does over an ordinary finite sum: $$ \partial_{x_k}f(x) = \partial_{x_k}\sum_{i=1}^{n-1} \left( 100 \left( x_{i+1} - x_i^2 \right)^2 + ( x_i - 1)^2 \right) = \sum_{i=1}^{n-1} \left[ \partial_{x_k}\left(100 \left( x_{i+1} - x_i^2 \right)^2\right) + \partial_{x_k}\left( ( x_i - 1)^2 \right) \right] $$ The key is to recognize that $x_k$ and $x_i$ are independent variables unless $k=i$, so you can write $$ \frac{\partial {x_i}}{\partial {x_k}}=\delta_{ik}=\begin{cases} 1, & \text{if}\ i=k \\ 0, & \text{otherwise.} \end{cases} $$ Applying the chain rule to each summand then gives the value in the bracket.
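To make the index bookkeeping concrete, here is a short NumPy sketch (function names are my own) that accumulates the two places $x_k$ can appear, summand $i=k$ and summand $i=k-1$, separately, and checks the analytic gradient against a central finite difference:

```python
import numpy as np

def rosenbrock(x):
    """f(x) = sum_{i=1}^{n-1} [ 100 (x_{i+1} - x_i^2)^2 + (x_i - 1)^2 ]."""
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (x[:-1] - 1.0)**2)

def rosenbrock_grad(x):
    """Analytic gradient, built from the two places x_k can appear."""
    g = np.zeros_like(x)
    # x_k playing the role of x_i in summand i = k (exists for k = 1..n-1):
    g[:-1] += -400.0 * x[:-1] * (x[1:] - x[:-1]**2) + 2.0 * (x[:-1] - 1.0)
    # x_k playing the role of x_{i+1} in summand i = k-1 (exists for k = 2..n):
    g[1:] += 200.0 * (x[1:] - x[:-1]**2)
    return g

def fd_grad(f, x, h=1e-6):
    """Central finite-difference gradient, for verification only."""
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g
```

For $k=1$ only the first accumulation line contributes, for $k=n$ only the second, and for $1<k<n$ both, which is exactly the three-case structure of the piecewise formula.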
That is what creates the difference between the $k=1$, $1<k<n$, and $k=n$ cases. For $k=1$, the variable $x_k$ appears only in the summand $i=1$; for $k=n$, it appears only in the summand $i=n-1$ (as $x_{i+1}$); for $1<k<n$, it appears in both summands $i=k-1$ and $i=k$, so all three contributions show up in the derivative.
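With the gradient in hand, plain gradient descent is just a repeated step against it. Below is a minimal, self-contained sketch; the starting point, step size, and iteration count are my own illustrative choices (the step size must be small enough for stability, since the Rosenbrock Hessian has large eigenvalues near the valley):

```python
import numpy as np

def rosenbrock(x):
    """f(x) = sum_{i=1}^{n-1} [ 100 (x_{i+1} - x_i^2)^2 + (x_i - 1)^2 ]."""
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (x[:-1] - 1.0)**2)

def rosenbrock_grad(x):
    """Analytic gradient via the two per-summand contributions."""
    g = np.zeros_like(x)
    g[:-1] += -400.0 * x[:-1] * (x[1:] - x[:-1]**2) + 2.0 * (x[:-1] - 1.0)
    g[1:] += 200.0 * (x[1:] - x[:-1]**2)
    return g

x = np.zeros(4)    # hypothetical starting point
lr = 1e-3          # step size (assumption; too large a value diverges)
for _ in range(50_000):
    x = x - lr * rosenbrock_grad(x)
# x slowly approaches the global minimum at (1, 1, ..., 1)
```

Plain gradient descent is notoriously slow on this function because the narrow curved valley forces tiny steps, which is precisely why Rosenbrock's function is a standard stress test for optimizers.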