Differential Dynamic Programming: intuition, key idea, and difference from Dynamic Programming


I came across 'Differential Dynamic Programming' (DDP) in a course on Optimal Control. In this course, we were introduced to Dynamic Programming (DP) prior to DDP.

I went through the Wikipedia article on Differential Dynamic Programming:

https://en.wikipedia.org/wiki/Differential_dynamic_programming

and also through the section on DDP in the prescribed textbook:

https://sites.engineering.ucsb.edu/~jbraw/mpc/MPC-book-2nd-edition-4th-printing.pdf

However, I failed to capture the intuition behind DDP and what problem it solves that distinguishes it from DP.

I noticed that, in solving an OCP through DP, we start from the 'cost-to-go' or 'value' function at the terminal stage and, working backwards, obtain the control at each stage as a function of the state at that stage (by minimizing the cost-to-go function). We then do a forward sweep from the initial state, applying the obtained sequence of controls.
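To make my understanding of this DP procedure concrete, here is a minimal sketch for a toy finite-horizon LQR problem (all of A, B, Q, R, and the horizon N are hypothetical values I made up for illustration): the backward sweep propagates the quadratic cost-to-go and produces a feedback gain at every stage, and the forward sweep then applies those gains starting from the initial state.

```python
import numpy as np

# Hypothetical double-integrator system, chosen only for illustration.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # dynamics: x_{k+1} = A x_k + B u_k
B = np.array([[0.0], [0.1]])
Q = np.eye(2)                            # stage state cost
R = np.array([[0.1]])                    # stage control cost
N = 20                                   # horizon length

# Backward sweep: Riccati recursion for the quadratic cost-to-go P_k,
# giving the control at each stage as a linear function u_k = K_k x_k.
P = Q.copy()                             # terminal cost-to-go
gains = []
for _ in range(N):
    K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # feedback gain
    P = Q + A.T @ P @ A + A.T @ P @ B @ K                # updated cost-to-go
    gains.append(K)
gains.reverse()                          # gains[k] is the gain at stage k

# Forward sweep: apply the precomputed feedback from the initial state.
x = np.array([[1.0], [0.0]])
for K in gains:
    u = K @ x
    x = A @ x + B @ u

print(np.linalg.norm(x))                 # state is driven toward the origin
```

Note that for this linear-quadratic case a single backward/forward pass suffices, which is part of what confuses me about the iteration in DDP.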

From the afore-mentioned textbook, I inferred that the difference being highlighted is that the optimal control input is always available as a function of the state (I might be wrong, but it seems to be available as a linear transformation of the state vector). Is the difference that, in DDP, we do not solve for the feedback (control input) exactly at each point, but instead start with a guess of it and keep improving it? If it isn't, then I would like to know what the difference between the key ideas of DDP and DP is.

In the Wikipedia article, the control is chosen as the value that minimizes the second-order Taylor expansion of the cost-to-go. I understand the rationale behind this, but could not see the need for iteration. In a hand-waving fashion, I understand that the iteration helps reduce the error introduced by the second-order Taylor approximation, but I am not sure about this point and would like to get it verified.
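To check my hand-waving intuition, here is a toy scalar example (the cost function is hypothetical, chosen only for illustration): repeatedly minimizing the second-order Taylor expansion of a non-quadratic cost about the current guess is just Newton's method. A single expand-and-minimize step is exact only when the cost is actually quadratic; otherwise it lands near, not at, the minimizer, which seems to be why the step has to be repeated.

```python
import math

# A non-quadratic cost, minimized at u = 0; hypothetical, for illustration.
def cost(u):
    return math.cosh(u)

def grad(u):
    return math.sinh(u)

def hess(u):
    return math.cosh(u)

u = 2.0                          # initial guess of the control
for i in range(10):
    # Minimize the quadratic model cost(u) + grad*du + 0.5*hess*du^2
    # about the current guess; this gives the Newton/DDP-style increment.
    du = -grad(u) / hess(u)
    u += du
    print(i, u)                  # the guess keeps improving each iteration
```

The first step moves the guess from 2.0 to roughly 1.0 (not to the true minimizer 0), and only the repeated expansions drive it to the minimum, which matches my "smoothing the Taylor error" picture.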

Looking forward to reading your thoughts on this.