In the paper here
http://www.ems.bbk.ac.uk/for_students/bsc_FinEcon/fin_economEMEC007U/VAR.pdf
the VAR(p) model is written as
$$ W_t = A_1W_{t-1} + A_2W_{t-2} + ... + A_pW_{t-p} + \epsilon_t $$
But then it rewrites this and says the formula above is equivalent to
$$ (I - A_1L - A_2L^2 - ... - A_pL^p)W_t = \epsilon _t $$
How does the author make this switch? Are all the $W_t$ vectors somehow combined to give $L$? And if so, why do the 2nd, 3rd, etc. powers come into play?
Thanks,
A previous article in the same series, devoted to (scalar) ARMA processes, explains that $L$ is the lag (delay) operator: http://www.ems.bbk.ac.uk/for_students/bsc_FinEcon/fin_economEMEC007U/arma.pdf
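To spell out the step, here is a short sketch using the standard definition of the lag operator (assumed here, not quoted from the paper). $L$ shifts a series back one period, and a power of $L$ means applying it repeatedly:
$$ L W_t = W_{t-1}, \qquad L^k W_t = L(L^{k-1} W_t) = W_{t-k} $$
Substituting $W_{t-k} = L^k W_t$ into the VAR(p) equation and moving the lagged terms to the left-hand side gives
$$ W_t - A_1 L W_t - A_2 L^2 W_t - \dots - A_p L^p W_t = \epsilon_t $$
and collecting the operators that act on $W_t$ yields
$$ (I - A_1L - A_2L^2 - \dots - A_pL^p)W_t = \epsilon_t $$
So $L$ is not built from the $W_t$ vectors. It is an operator acting on them, and $L^2$, $L^3$, ... simply stand for lags of 2, 3, ... periods.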
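Furthermore, when one applies the Z-transform (or, essentially equivalently, generating functions), a delay of $n$ steps maps to multiplication by $z^{-n}$ (or by $z^n$ in the generating-function convention), which is why powers of $L$ can be handled like powers of an ordinary variable in a polynomial.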
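As a numerical sanity check of the operator form, here is a minimal sketch in Python with NumPy. The 2-variable VAR(2), the coefficient matrices `A1` and `A2`, and the helper `lag` are all hypothetical choices made for illustration, not taken from the paper. It simulates the process and confirms that applying $(I - A_1L - A_2L^2)$ to $W_t$ gives back $\epsilon_t$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coefficient matrices for a 2-variable VAR(2) (illustrative only).
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])

T = 200
eps = rng.normal(size=(T, 2))   # noise terms epsilon_t, one row per period
W = np.zeros((T, 2))            # W[t] holds the vector W_t
for t in range(2, T):
    W[t] = A1 @ W[t - 1] + A2 @ W[t - 2] + eps[t]

def lag(X, k):
    """Apply L^k: row t of the result is X[t-k]; the first k rows are NaN."""
    out = np.full_like(X, np.nan)
    out[k:] = X[:-k] if k > 0 else X
    return out

# (I - A1 L - A2 L^2) W_t, computed for all t at once.
# Rows are time points, so A @ W_{t-k} becomes lag(W, k) @ A.T.
recovered = W - lag(W, 1) @ A1.T - lag(W, 2) @ A2.T

# The first two rows lack enough history, so compare from t = 2 onward.
max_err = np.max(np.abs(recovered[2:] - eps[2:]))
print(max_err)
```

The printed error should be at the level of floating-point rounding, showing that the two ways of writing the model describe the same relation.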