If we have variables $X$, $Y$ and an outcome $O$ that depends on them, why do we model it with $O = w_1X + w_2Y$ instead of $O = w_1Xw_2Y$? I recently described an outcome as the product of two weighted variables, $O = w_1Xw_2Y$, and thought "wait, isn't this kind of thing supposed to be a linear combination, am I doing something wrong?". I understand that $O = w_1Xw_2Y = w_1w_2XY = wXY$, but can't we determine $w$'s factors $w_1$ and $w_2$ separately? Should it be the best of both worlds, $O = w_1X + w_2Y + w_3XY$? I would also appreciate a reference where I can learn more about the general theory behind these kinds of models.
For example, the outcome might be a person's decision, the variables may be different kinds of observations they made, and the weights/coefficients are psychological factors that determine how these kinds of observations lead to decisions. I thought a product $w_1Xw_2Y$ would be a good way to model it and don't understand why I should use a linear combination $w_1X + w_2Y$ instead.
Typically, statistical models are linear in their terms. This is not because all phenomena are linear; rather, it is because linear models are simple and interpretable, and once we introduce randomness, that simplicity becomes even more valuable. Furthermore, linear models can be thought of as first-order approximations to the true relationship between $O$, $X$, and $Y$. Note also that the pure product form is not identifiable: from data generated by $O = w_1Xw_2Y = wXY$, you can only estimate the product $w = w_1w_2$, never the individual factors $w_1$ and $w_2$, since any factorization of $w$ produces exactly the same outcomes.
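To make the first-order-approximation point concrete, here is a sketch. Suppose the true relationship is $O = f(X, Y)$ for some smooth but unknown $f$. A Taylor expansion around a point $(x_0, y_0)$ gives

$$f(X, Y) \approx f(x_0, y_0) + \frac{\partial f}{\partial X}(X - x_0) + \frac{\partial f}{\partial Y}(Y - y_0),$$

which, after collecting constants, is exactly a linear combination $w_0 + w_1X + w_2Y$. Carrying the expansion to second order introduces the mixed term $\frac{\partial^2 f}{\partial X \partial Y}(X - x_0)(Y - y_0)$, which is where the $XY$ interaction term comes from.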
As for your question about adding an $XY$ term: that can indeed be useful. Models that include such interaction terms are important in many areas. If you know about drug interactions, you can imagine how clinical trials include interaction terms to assess how other factors might change a drug's efficacy.
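A minimal sketch of fitting your "best of both worlds" model $O = w_0 + w_1X + w_2Y + w_3XY$ by ordinary least squares, using NumPy. The true weights and the noiseless setup below are made up purely for illustration:

```python
import numpy as np

# Simulate data from O = w0 + w1*X + w2*Y + w3*X*Y.
# The weights here are arbitrary, chosen only to demonstrate recovery.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=n)
Y = rng.normal(size=n)
w_true = np.array([1.0, 2.0, 3.0, 0.5])  # intercept, X, Y, XY

# Design matrix: a column of ones, X, Y, and the interaction column X*Y.
# The interaction enters the model as just another linear term.
A = np.column_stack([np.ones(n), X, Y, X * Y])
O = A @ w_true  # noiseless outcome, for clarity

# Ordinary least squares recovers all four weights at once.
w_hat, *_ = np.linalg.lstsq(A, O, rcond=None)
print(np.round(w_hat, 3))  # should recover [1., 2., 3., 0.5]
```

The key point: although $XY$ is a nonlinear function of the inputs, the model is still *linear in the weights*, so the whole standard linear-regression machinery applies unchanged.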
My personal favorite reference for linear models is Weisberg's Applied Linear Regression (now in its fourth edition). Weisberg's writing is intuitive and well-paced, which makes the book especially good for self-study.