I am given the Hermite interpolation formula directly in my textbook, without *any* explanation of how it was first derived (obviously someone constructed it for the first time using some sort of intuition).
The formula for $n+1$ data points $x_0,\ldots,x_n$ with values $f(x_0),\ldots,f(x_n)$ and derivatives $f^{\prime}(x_0),\ldots,f^{\prime}(x_n)$ is $$H_{2n+1}(x) = \sum_{j=0}^n f(x_j)H_{n,j}(x) + \sum_{j=0}^n f^{\prime}(x_j)\hat H_{n,j}(x)$$
where $$H_{n,j}(x) = [1 - 2(x - x_j)L^{\prime}_{n,j}(x_j)]L_{n,j}^2(x),$$
$$ \hat H_{n,j}(x) = (x-x_j) L_{n,j}^2(x), $$ and $L_{n,j}$ is the $j$-th Lagrange basis polynomial. I *do* understand the proof and why the polynomial agrees with the data and their derivatives.
I *do* understand the intuition behind Lagrange polynomials.
So I am looking for the intuition behind the formula (how it was constructed), especially the construction of $H$ and $\hat H$, so that instead of memorizing it I can learn it!
Let $P(x):=H_{2n+1}(x).$
Assume we are given a set of distinct nodes $S=\{x_0,x_1,\ldots,x_n\}.$
We want $P(x)$ to pass through the points $(x_j, f(x_j))$ and to have the same derivative as $f(x)$ at every node in $S.$
Note that we have $2(n+1)$ conditions on $P(x)$, so we look for $P(x)$ of degree at most $2n+1$: such a polynomial has $2n+2$ coefficients, exactly enough to be adjusted to meet all our conditions.
So, following the same intuition as for Lagrange polynomials, we look for a formula of the form
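To see that the count works out, here is a brute-force sketch in plain Python using exact `fractions.Fraction` arithmetic (the helper names `hermite_system` and `solve` are hypothetical, written only for this illustration). It writes $P(x)=\sum_{k=0}^{2n+1}c_kx^k$, turns each of the $2(n+1)$ conditions into one linear equation in the $c_k$, and solves the resulting square system. It only checks the counting argument; it is not the construction developed in this answer.

```python
from fractions import Fraction

def hermite_system(nodes, values, derivs):
    """Square system for c_0..c_{2n+1} in P(x) = sum_k c_k x**k."""
    m = 2 * len(nodes)  # 2(n+1) unknowns, one per condition
    rows, rhs = [], []
    for x, y, dy in zip(nodes, values, derivs):
        x = Fraction(x)
        # Condition P(x_i) = f(x_i)
        rows.append([x**k for k in range(m)])
        rhs.append(Fraction(y))
        # Condition P'(x_i) = f'(x_i); d/dx x**k = k x**(k-1)
        rows.append([k * x**(k - 1) if k > 0 else Fraction(0) for k in range(m)])
        rhs.append(Fraction(dy))
    return rows, rhs

def solve(rows, rhs):
    """Exact Gauss-Jordan elimination; raises ValueError if the system is singular."""
    size = len(rows)
    aug = [row[:] + [b] for row, b in zip(rows, rhs)]
    for col in range(size):
        pivot = next((r for r in range(col, size) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][size] for r in range(size)]

# Example: f(x) = x**3 - 2x + 1 at the nodes 0 and 1 (n = 1, degree 2n+1 = 3).
def f(t):
    return t**3 - 2 * t + 1

def df(t):
    return 3 * t**2 - 2

nodes = [0, 1]
coeffs = solve(*hermite_system(nodes, [f(t) for t in nodes], [df(t) for t in nodes]))
print(coeffs)  # c_0..c_3 of the recovered cubic
```

For distinct nodes this square system is nonsingular, which is another way of saying that the interpolant of degree at most $2n+1$ is unique; the Lagrange-style construction produces the same polynomial without solving any linear system.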
$$P(x) = \sum_{j=0}^n f(x_j)A_j(x) + \sum_{j=0}^n f^{\prime}(x_j) B_j(x)$$
Note 1: $A_j$ is the function multiplying $f(x_j)$ in the sum, and $B_j$ is the function multiplying $f^{\prime}(x_j).$
Note 2: in all formulae $x_i,x_j \in S.$
Now, since we want $P(x_i)=f(x_i)$ for all $i$, we need $$1.\quad A_j(x_i) = \begin{cases} 1, & {\text{ if }i=j} \\ 0, & {\text{ else } (\text{ if }i\ne j)}\\ \end{cases}$$
So, when we evaluate $P(x)$ at $x_i,$ only the term containing $f(x_i)$ survives (with coefficient $1$), and all the other terms in $\sum\limits_{j=0}^nf(x_j)A_j(x)$ vanish.
When calculating $P(x_i),$ we don't want any of the derivatives to appear in the result, so
$$2.\quad B_j(x_i)=0,\forall i$$
Since $$P^{\prime}(x)=\sum_{j=0}^nf(x_j)A^{\prime}_j(x)+\sum_{j=0}^n f^{\prime}(x_j)B^{\prime}_j(x),$$ the same argument applied to $P^{\prime}(x_i)=f^{\prime}(x_i)$ gives the conditions on $A^{\prime}_j$ and $B^{\prime}_j$:
$$3.\quad A^{\prime}_j(x_i)=0,\forall i$$
$$4.\quad B^{\prime}_j(x_i) = \begin{cases} 1, & {\text{ if } i=j} \\ 0, & {\text{ else }( \text{ if }i\ne j)}\\ \end{cases}$$
Now, note that the behavior of $A_j$ and $B_j$ is close to that of a Lagrange basis polynomial. For example, try $A_j = L_j.$ Then $L_j$ satisfies condition 1, but in general it does not meet condition 3, and its degree is only $n$ (remember that $\deg P(x)=2n+1$).
So, assuming each $A_j$ has the same degree $2n+1$ as $P(x)$, and wanting to reuse the property of the Lagrange basis polynomial in the construction of $A_j,$ we try
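A quick exact check of this claim, for the hypothetical nodes $0, 1, 2$ and $j=0$, where $L_0(x)=(x-1)(x-2)/2$. The helper names `lagrange_basis` and `lagrange_basis_deriv` are made up for this sketch; the derivative is computed with the product rule, one linear factor at a time.

```python
from fractions import Fraction
from math import prod

def lagrange_basis(nodes, j, x):
    """L_j(x) = prod_{m != j} (x - x_m) / (x_j - x_m), in exact arithmetic."""
    xj = Fraction(nodes[j])
    return prod((Fraction(x) - xm) / (xj - xm) for m, xm in enumerate(nodes) if m != j)

def lagrange_basis_deriv(nodes, j, x):
    """L_j'(x) by the product rule: differentiate one linear factor at a time."""
    xj = Fraction(nodes[j])
    total = Fraction(0)
    for k, xk in enumerate(nodes):
        if k == j:
            continue
        term = 1 / (xj - xk)  # derivative of the factor (x - x_k)/(x_j - x_k)
        for m, xm in enumerate(nodes):
            if m not in (j, k):
                term *= (Fraction(x) - xm) / (xj - xm)
        total += term
    return total

nodes = [0, 1, 2]  # hypothetical example nodes
print([lagrange_basis(nodes, 0, t) for t in nodes])        # values 1, 0, 0: condition 1 holds
print([lagrange_basis_deriv(nodes, 0, t) for t in nodes])  # values -3/2, -1/2, 1/2: condition 3 fails
```

The derivative is nonzero at every node, including $x_1$ and $x_2$, where $L_0$ has only a simple root.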
$$A_j=(a_j\cdot x+b_j)\cdot L^2_j$$
Note that by using $L_j^2$ instead of $L_j,$ the degree rises from $n$ to $2n$, and multiplying by the line $(a_j\cdot x+b_j)$ brings the degree of $A_j$ to $2n+1.$
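Squaring has a second effect that the derivation relies on: $L_j^2$ has a double root at every $x_i$ with $i\neq j$, so it vanishes there together with its derivative, while at $x_j$ only the value is already correct:
$$L_j^2(x_i)=0,\qquad \left.\frac{d}{dx}L_j^2(x)\right|_{x=x_i}=2L_j(x_i)L_j^{\prime}(x_i)=0\qquad(i\neq j),$$
$$L_j^2(x_j)=1,\qquad \left.\frac{d}{dx}L_j^2(x)\right|_{x=x_j}=2L_j^{\prime}(x_j)\qquad(\text{in general }\neq 0).$$
Correcting that one remaining derivative at $x_j$ is the job of the line factor $(a_j\cdot x+b_j)$.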
We have split $A_j$ into two factors: one of degree $1$ and one of degree $2n$. Why?
Only conditions $1$ and $3$ constrain $A_j$, and at $x=x_j$ they give two equations (at the other nodes they turn out to hold automatically). The factor $L_j^2$ is predetermined, meaning it contains no unknowns, so all the freedom sits in the line $(a_j\cdot x+b_j)$, which has exactly two unknowns. Two equations for two unknowns should determine $A_j$ uniquely.
The same reasoning applies to $B_j$, using conditions $2$ and $4$.
Now, assuming $A_j$ is of the form $A_j = (a_j\cdot x+b_j)\cdot L^2_j,$ and solving the equations for the conditions $1$ and $3:$
For condition $1,$ we know
$$A_j=(a_j\cdot x+b_j)\cdot L^2_j$$
If $i\ne j,$ then $L_j(x_i) = 0,$ so $A_j(x_i) = 0$ regardless of $a_j$ and $b_j$: condition 1 holds automatically and gives us no information about $a_j$ or $b_j$.
If $i=j,$ then we need $A_j(x_j)=1,$ so
$$a_j(x_jL_j^2(x_j))+b_jL_j^2(x_j)=1.$$
Since $L_j(x_j) = 1,$ then
$$a_j x_j + b_j = 1\\ \text{ so }\;\boxed{b_j=1-a_jx_j}$$
Now, for condition $3,$ we have
$$A^{\prime}_j(x)=a_jL^2_j(x)+2(a_j x + b_j)L_j(x)L^{\prime}_j(x)$$
If $i\ne j,$ then again both terms contain the factor $L_j(x_i)=0,$ so condition 3 holds automatically and gives no information about $a_j$ or $b_j.$
If $i=j,$ then $L_j(x_j)=1,$ so
$$a_j+2(a_jx_j+b_j)L^{\prime}_j(x_j)=0\\ \text{since } a_jx_j+b_j=1 \text{ from above, this becomes } a_j+2L^{\prime}_j(x_j)=0, \text{ so} \\ \boxed{a_j=-2L^{\prime}_j(x_j)}, $$
thus
$$ b_j = 1+2x_j L^{\prime}_j(x_j) $$
Substituting $\color{red}{a_j}$ and $\color{green}{b_j}$ into $A_j$, we get:
$$A_j(x)=(\boldsymbol{\color{red}{-2L^{\prime}_j(x_j)}}x+\boldsymbol{\color{green}{1+2x_j L^{\prime}_j(x_j)}})L^2_j(x)$$
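which, after collecting the terms containing $L^{\prime}_j(x_j)$ and writing $L_{n,j}$ for $L_j$, becomes
$$A_j(x) = [1 - 2(x - x_j)L^{\prime}_{n,j}(x_j)]L_{n,j}^2(x)$$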
So we have found $A_j(x)$, which is exactly $H_{n,j}(x)$ in the textbook formula.
In the same way, using conditions $2$ and $4$ we can find $B_j(x):$
$$ B_j(x) = (x-x_j) L_{n,j}^2(x)$$
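The steps for $B_j$ mirror those for $A_j$. Assuming, as before, $B_j(x)=(c_j\cdot x+d_j)\cdot L_j^2(x)$ (the names $c_j, d_j$ are introduced here only for illustration), conditions $2$ and $4$ hold automatically at $x_i$ with $i\neq j$ because $L_j(x_i)=0$, and at $x_j$ they read:
$$\begin{aligned}
\text{condition } 2:&\quad B_j(x_j)=c_jx_j+d_j=0 \;\Rightarrow\; d_j=-c_jx_j, \;\text{ so } B_j(x)=c_j(x-x_j)L_j^2(x),\\
\text{condition } 4:&\quad B_j^{\prime}(x_j)=c_jL_j^2(x_j)+2c_j(x_j-x_j)L_j(x_j)L_j^{\prime}(x_j)=c_j=1.
\end{aligned}$$
Hence $B_j(x)=(x-x_j)L_j^2(x)$; here the factor $(x-x_j)$ already makes $B_j(x_j)=0$, so no $L_j^{\prime}(x_j)$ term is needed.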
So we have found $\hat H_{n,j}(x).$
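Putting the pieces together, here is a minimal sketch that evaluates $H_{2n+1}(x)$ directly from the textbook formula, computing $L^{\prime}_{n,j}(x_j)$ as $\sum_{m\neq j}1/(x_j-x_m)$. The function name `hermite_eval` and the test polynomial are hypothetical. Pass `Fraction` nodes for exact arithmetic (plain floats also work, approximately). Because the interpolant of degree at most $2n+1$ is unique, it should reproduce any polynomial of that degree exactly.

```python
from fractions import Fraction
from math import prod

def hermite_eval(nodes, values, derivs, x):
    """Evaluate H_{2n+1}(x) = sum_j f(x_j) H_{n,j}(x) + sum_j f'(x_j) Hhat_{n,j}(x)."""
    total = 0
    for j, xj in enumerate(nodes):
        others = [xm for m, xm in enumerate(nodes) if m != j]
        L = prod((x - xm) / (xj - xm) for xm in others)  # L_{n,j}(x)
        dL_xj = sum(1 / (xj - xm) for xm in others)       # L'_{n,j}(x_j)
        H = (1 - 2 * (x - xj) * dL_xj) * L**2             # H_{n,j}(x)
        H_hat = (x - xj) * L**2                           # hat H_{n,j}(x)
        total += values[j] * H + derivs[j] * H_hat
    return total

def p(t):
    return t**5 - 3 * t**3 + t - 2  # degree 5 = 2n + 1 for three nodes

def dp(t):
    return 5 * t**4 - 9 * t**2 + 1

nodes = [Fraction(-1), Fraction(0), Fraction(2)]
vals = [p(t) for t in nodes]
ders = [dp(t) for t in nodes]
x = Fraction(1, 3)
print(hermite_eval(nodes, vals, ders, x) == p(x))  # True: exact reproduction
```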