First, I'm having a lot of difficulty figuring out how to articulate this question because I lack general math knowledge. There are multiple questions posed below, but I feel that if I knew more they could be condensed into a single question, and I'm hoping someone can suggest an edit to that effect. Thank you in advance for your consideration and assistance on this point!
Background
I'm taking a course that includes descriptive statistics. The course describes how to calculate covariance and provides the equation for it. I find myself wondering: why was covariance defined the way it was, with multiplication rather than addition between the terms in the numerator? (I'm not even sure that description is accurate.)
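For reference, I assume the equation in question is the standard sample covariance. The course may divide by $n$ rather than $n - 1$, and the symbols here are my own choice rather than the course's notation:

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

The multiplication I mean is the product of the two deviations, $(x_i - \bar{x})$ and $(y_i - \bar{y})$, inside the sum.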
My question?
Are equations for things like covariance derived by looking at phenomena and 'cracking the code' of how those phenomena could be described mathematically, via a proof? Or does one, at some level of mathematical skill, say: "I want to model this phenomenon, and I want the model's output to have these features and qualities, so I choose this particular structure to achieve that goal," and then show that it behaves as intended through a proof?
If the latter, what progression of mathematical learning develops that skill set? Is the same skill set used in both cases?
Why I ask
If I understood the motivation of whoever created the covariance equation, I could compare it to my own motivation and perhaps come up with a different approach to the same problem that better fits my own goals, because our goals may be similar but not the same.
Thank you again for any advice on how to simplify this.