Shannon entropy is defined as the average information content, where the information content of outcome $i$ is $Q_i=-k\log(P_i)$, $k>0$:
$$ S = \langle Q \rangle = \sum_i Q_i P_i = -k\sum_i P_i\log(P_i) $$
That being said, my question is: why is $\langle Q\rangle = \sum_i Q_i P_i$? More specifically, why is each information content multiplied by the probability of the outcome it belongs to? If there are any resources you would suggest, I would appreciate that as well. Thank you!
The angle brackets around $Q$ mean that we are taking the expected value of the $Q$ distribution. For a random variable $X$ with probability distribution $\{P_i\equiv P(X_i)\}$, the definition of expected value (https://en.wikipedia.org/wiki/Expected_value) is:
$$\langle X\rangle=\sum_i X_iP(X_i)=\sum_iX_iP_i$$
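To make this concrete, here is a minimal Python sketch (the function name `shannon_entropy` and the fair-coin example are my own illustration, not from the question) that computes the entropy literally as the probability-weighted average of the information contents $Q_i=-k\log(P_i)$:

```python
import math

def shannon_entropy(probs, k=1.0):
    """Entropy as the expected value of the information content:
    S = sum_i P_i * Q_i, with Q_i = -k * log(P_i).
    Terms with P_i = 0 contribute nothing (the limit p*log(p) -> 0)."""
    return sum(p * (-k * math.log(p)) for p in probs if p > 0)

# Fair coin: two outcomes, each with probability 1/2.
# Choosing k = 1/ln(2) makes the logarithm base 2, so S is in bits.
probs = [0.5, 0.5]
print(shannon_entropy(probs, k=1 / math.log(2)))  # -> 1.0 (one bit)
```

Each outcome's surprise $Q_i$ is weighted by how often that outcome actually occurs, which is exactly the expected-value sum $\sum_i Q_i P_i$ in the answer above.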