A point estimator $\hat\theta$ of a parameter $\theta$ is a function of the sample $D=\{x_1,\dots,x_n\}$: $$\hat\theta=g(D).$$ Since $\hat\theta$ depends on the sample $D$ we draw, $\hat\theta$ is a random variable. The bias of this estimator is $$\mathrm{bias}(\hat\theta)=E[\hat\theta]-\theta_{\text{true}},$$ where $\theta_{\text{true}}$ is the true parameter, which is unknown.
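To make this concrete, here is a quick Monte Carlo sketch of the definition above (the uniform population and sample size are my own illustrative choices): the sample mean as an estimator of the mean of a uniform$(0,1)$ population, whose true mean is $0.5$.

```python
import random

# Hypothetical setup: estimate the mean of a uniform(0, 1) population
# with the sample-mean estimator theta_hat = g(D) = mean(D).
theta_true = 0.5  # true mean of uniform(0, 1)

def g(D):
    """The estimator: a function of the sample D."""
    return sum(D) / len(D)

random.seed(0)
# theta_hat is a random variable: it changes with each sample D we draw.
estimates = [g([random.random() for _ in range(10)]) for _ in range(50_000)]

# Approximate bias(theta_hat) = E[theta_hat] - theta_true by averaging.
bias = sum(estimates) / len(estimates) - theta_true
print(bias)  # close to 0: the sample mean is unbiased here
```

The empirical average of the estimates approximates $E[\hat\theta]$, so the printed value approximates the bias.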
Suppose I randomly draw the sample $D$ from the function $$f(x)=\beta_1x+\beta_0,$$ so the true parameters are $\beta_0$ and $\beta_1$. I try to estimate the true parameters from the sample, but I know absolutely nothing about the regularity hiding behind it; in particular, I don't know that the samples are drawn from a linear function $f(x)$. I still have to choose a model to fit the sample, and here I choose the constant function $$g(x)=\alpha,$$ which gives me the estimate $\hat\alpha$.
Here is my problem: my original intention was to estimate the true parameters $\beta_0$ and $\beta_1$, but since I chose the wrong model $g(x)$, I ended up with $\hat\alpha$ instead.
With a different $D$, I get a different $\hat\alpha$, so I could still try to compute the bias of $\hat\alpha$, $$\mathrm{bias}(\hat\alpha)=E[\hat\alpha]-\alpha_{\text{true}},$$ right? I don't think so, because there is no such thing as $\alpha_{\text{true}}$; what do exist are $\beta_{0,\text{true}}$ and $\beta_{1,\text{true}}$, since $f(x)=\beta_1x+\beta_0$ is the truth, not $g(x)=\alpha$. Even if I could compute or analyze $\mathrm{bias}(\hat\alpha)$, I still don't think it would be meaningful: what's the point of evaluating the bias of the estimate of the wrong parameter?
In practice, all we have is the plain data, without knowing anything about the true regularity hidden behind them, so chances are that we pick a wrong model to fit the data and end up with a wrong parameter estimate, right?
By randomly drawing a sample from $f(x)$, I mean randomly picking $x_i\in[0,1]$ and setting $y_i=f(x_i)$. Say the sample size is 2, so each sample consists of two pairs of data, $\{(x_1,y_1),(x_2,y_2)\}$.
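The setup above can be simulated. This is a sketch under my own assumptions: illustrative values $\beta_0=1$, $\beta_1=2$, and least-squares fitting of the constant model, under which $\hat\alpha$ is just the mean of the $y_i$'s. Since $y_i=\beta_1x_i+\beta_0$ with $x_i$ uniform on $[0,1]$, we get $E[\hat\alpha]=\beta_0+\beta_1/2$, so the average of $\hat\alpha$ over many samples settles on $2.0$ here.

```python
import random

# Illustrative (assumed) true parameters of f(x) = b1*x + b0
b0, b1 = 1.0, 2.0

def f(x):
    return b1 * x + b0

def draw_sample(n=2):
    """Draw n points: x_i uniform on [0, 1], y_i = f(x_i) (noise-free)."""
    xs = [random.random() for _ in range(n)]
    return [(x, f(x)) for x in xs]

def fit_constant(sample):
    """Least-squares fit of g(x) = alpha: alpha_hat is the mean of the y's."""
    return sum(y for _, y in sample) / len(sample)

random.seed(0)
estimates = [fit_constant(draw_sample()) for _ in range(100_000)]
mean_alpha_hat = sum(estimates) / len(estimates)

# E[alpha_hat] = b0 + b1 * E[x_i] = b0 + b1/2 = 2.0 under this sampling
# scheme: the value the misspecified estimator concentrates around.
print(mean_alpha_hat)
```

So $\hat\alpha$ does concentrate around a well-defined value, even though that value is not a parameter of the true model $f(x)$, which is exactly the situation my question is about.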