I am reading about how the MSE of an estimator $\hat{\theta}$ of $\theta$ can be expressed as $E[(\hat{\theta} - \theta)^2]$. This can then be decomposed as $(E[\hat{\theta}] - \theta)^2 + Var[\hat{\theta}],$ where the first term is the square of the bias.
I am trying to understand what the expectation is taken over. A detailed explanation of what is happening here, or a link to one, would be very welcome.
Extended comment: Illustrating MSE, Variance, and Bias
Here is the simplest non-trivial example I can think of immediately. Suppose $X_1, \dots, X_5$ are iid $Unif(0, \theta).$ The MLE of $\theta$ is $\hat \theta = \max(X_i).$ However $E(\hat \theta) = (5/6)\theta,$ so $\hat \theta$ is biased for $\theta.$
In this simple case, you should be able to find the bias $E(\hat \theta) - \theta,$ the variance $Var(\hat \theta),$ and the MSE $E[(\hat \theta - \theta)^2]$ analytically to verify the relationship between MSE, bias, and variance. I'm hoping that doing that simple exercise will clarify the issues for you. I'm not quite sure whether my brief Comment above is enough.
Here is a brief simulation in R of 100,000 observations of $\hat \theta,$ where $n = 5$ and $\theta = 10.$ (Of course, $\hat \theta$ can be made unbiased by multiplying it by $6/5,$ but we do not include that in the simulation.)
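A simulation along the lines described might look like the following sketch (the seed, variable names, and use of a matrix with one sample per row are my own choices):

```r
set.seed(2023)                       # arbitrary seed for reproducibility
n <- 5; theta <- 10; m <- 100000     # sample size, true theta, replications
x <- matrix(runif(n*m, 0, theta), nrow = m)  # each row: one sample of size n
est.mle <- apply(x, 1, max)          # theta-hat = max(X_i) for each sample
mean(est.mle)                        # approx E(theta-hat) = (5/6)*10 = 8.333
var(est.mle)                         # approx Var = 5*theta^2/252 = 1.984
mean((est.mle - theta)^2)            # approx MSE = bias^2 + Var
                                     #   = (10/6)^2 + 500/252 = 200/42 = 4.762
```

The simulated mean, variance, and MSE should match the analytic values above to two or three decimal places.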
Also, notice that twice the sample mean of the five observations, call it $\tilde \theta,$ is an unbiased estimator, but with considerably higher MSE than $\hat \theta.$ Speaking to the point of your question, $\tilde \theta$ has a different support than does $\hat \theta$. (However, the PDF of $\tilde \theta$ is messy; you can get $E(\tilde \theta)$ and $Var(\tilde \theta)$ using rules for the mean and variance of $\bar X.$)
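The unbiased estimator can be simulated the same way; here is a self-contained sketch (again with arbitrary seed and my own variable names):

```r
set.seed(2023)
n <- 5; theta <- 10; m <- 100000
x <- matrix(runif(n*m, 0, theta), nrow = m)  # each row: one sample of size n
est.unb <- 2*rowMeans(x)             # theta-tilde = 2 * sample mean
mean(est.unb)                        # approx theta = 10 (unbiased)
mean((est.unb - theta)^2)            # approx Var = 4*theta^2/(12n)
                                     #   = theta^2/15 = 6.67 > 4.762, the MSE of the MLE
```

So even though $\tilde \theta$ is unbiased, its MSE (which here is all variance) exceeds that of the biased MLE.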
The figure below plots these simulated estimators using the same scale for easy comparison of their approximate distributions.
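A figure of this kind could be produced along the following lines (a sketch; the breaks, colors, and titles are my own choices, and a common set of breaks keeps the horizontal scales comparable):

```r
set.seed(2023)
n <- 5; theta <- 10; m <- 100000
x <- matrix(runif(n*m, 0, theta), nrow = m)
est.mle <- apply(x, 1, max)          # biased MLE: max of the sample
est.unb <- 2*rowMeans(x)             # unbiased estimator: twice the sample mean
cut <- seq(0, 20, by = 0.5)          # common breaks for both histograms
par(mfrow = c(2,1))
hist(est.mle, breaks = cut, prob = TRUE, col = "skyblue2",
     main = "MLE: max(X)", xlab = "Estimate")
hist(est.unb, breaks = cut, prob = TRUE, col = "skyblue2",
     main = "Unbiased: 2*mean(X)", xlab = "Estimate")
par(mfrow = c(1,1))
```

The top panel should show the skewed distribution of the MLE piled up just below $\theta = 10,$ and the bottom panel the more spread-out, roughly symmetric distribution of the unbiased estimator centered at $10.$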