How do we mathematically prove that the expectation value of an operator is the weighted average,
$$ \langle\hat{A}\rangle=\langle \psi|\hat{A}|\psi\rangle=\sum_{n}a_{n}P(a_{n}) \space \space \space? $$
Let's say,
$\phi_{n}$ are the eigenvectors of the state, then
$\hat{A}|\phi_{n}\rangle=a_{n}|\phi_{n}\rangle$ and $P(a_{n})=|\langle\phi|\psi\rangle|^{2}$
Given some discrete spectrum of non-degenerated eigen-values/ eigenstates $(a_i)_{i\in I}$ and $(|\phi_i\rangle)_{i\in I}$ associated to observable (hermitian operator) $\hat A$ one can decompose the state,
\begin{equation} |\psi\rangle = \sum_i |\phi_i\rangle \end{equation}
\begin{equation} \hat A|\psi\rangle = \sum_i \hat A|\phi_i\rangle = \sum_i a_i |\phi_i\rangle \end{equation}
Considering an orthogonal basis eigenstates, \begin{equation} \langle \phi_i | \phi_k \rangle = \delta_{ik} \mathcal P(a_i) \end{equation}
Then one gets, \begin{equation} \langle\psi|\hat A|\psi\rangle = \sum_{i,k} \langle\phi_i|\hat A|\phi_k\rangle = \sum_{i,k} a_k \langle\phi_i|\phi_k\rangle = \sum_i a_i \mathcal P(a_i) \end{equation}
The last equality resulting from orthogonality.
The same holds for a continuous spectrum.