Add weights to inputs of x-value function to optimize regression


Say I have $n$ functions (not the regression function), each with $n$ inputs. These functions compute the x-values.

Each function is a simple summation: each input is multiplied by a weight, and the weighted inputs are summed.

I want to choose a weight for each input so that the regression fit is optimized. In other words, the weights are used in calculating the x-value of each point; by choosing the right weights, each point's x-value is shifted in a way that optimizes the fit.

Put another way: each x-axis value is calculated from $n$ inputs, where each input is multiplied by a weight.

How would I go about this?

For clarification, the y-value for each point on the plot is fixed. I am optimizing the regression by shifting x-values.
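A minimal sketch of this setup, with made-up data (the matrix `U`, the weights `w`, and the crude random-search optimizer are all illustrative assumptions, not anything given in the question): each point's x-value is a weighted sum of its inputs, and the weights are chosen to minimize the residual sum of squares of the fitted line. Note that with as many free weights as data points, the residuals can often be driven to (near) zero, i.e. this setup can badly overfit.

```python
import numpy as np

# Hypothetical inputs: U[i, j] is the j-th input of point i.
# Shared weights w give the x-values via x = U @ w; the y-values are fixed.
rng = np.random.default_rng(1)
n = 6
U = rng.normal(size=(n, n))   # made-up inputs for illustration
y = rng.normal(size=n)        # fixed y-values

def rss(w):
    """Residual sum of squares of the least-squares line fit to (U @ w, y)."""
    x = U @ w
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

# Crude random search over the weights (any numerical optimizer would do).
best_w, best = np.ones(n), rss(np.ones(n))
for _ in range(2000):
    cand = best_w + 0.1 * rng.normal(size=n)
    val = rss(cand)
    if val < best:
        best_w, best = cand, val
```

The random search is only a stand-in; the point is that "choosing the weights" amounts to minimizing the fit's residual error as a function of `w`.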

There is 1 answer below.


In a simple linear regression you have $$ Y_i = \alpha + \beta x_i + \varepsilon_i $$ and one often assumes $\varepsilon_1,\ldots,\varepsilon_n\sim \text{i.i.d. } N(0,\sigma^2)$. The values of $(x_i,Y_i)$ are observed for $i=1,\ldots,n$; the values of $\alpha$ and $\beta$ are to be estimated. I use lower-case $x$ and capital $Y$ since we are treating $x_i$ as not being random. That may be unrealistic, except that we are really trying to estimate the conditional distribution of $Y$ given $x$.

The assumption that $\varepsilon_1,\ldots,\varepsilon_n\sim \text{i.i.d. } N(0,\sigma^2)$ implies a far weaker assumption that is still strong enough to yield the optimality result described below. The weaker assumption is:

  • $\varepsilon_1,\ldots,\varepsilon_n$ are uncorrelated (not necessarily independent);
  • $\varepsilon_1,\ldots,\varepsilon_n$ all have expected value $0$ and equal finite variances (this is weaker than identical distribution).

In ordinary least squares one uses as estimates of $\alpha$ and $\beta$ the values $\hat\alpha$ and $\hat\beta$ that minimize the sum of squared errors, $\sum_{i=1}^n (Y_i - (\hat\alpha + \hat\beta x_i))^2$.
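As a quick sketch (the data here are simulated for illustration, not taken from the question), the minimizing values have the familiar closed form $\hat\beta = S_{xy}/S_{xx}$, $\hat\alpha = \bar Y - \hat\beta\bar x$:

```python
import numpy as np

# Simulated data from the model Y = alpha + beta*x + noise,
# with alpha = 2, beta = 3 (illustrative values).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 + 3 * x + rng.normal(0, 1, size=x.size)

# Closed-form ordinary least-squares estimates.
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
```

With 50 points and unit noise variance, the estimates land close to the true values 2 and 3.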

An important fact is that the mapping $(Y_1,\ldots,Y_n) \mapsto (\hat\alpha,\hat\beta)$ is linear. The Gauss–Markov theorem says that among all unbiased estimators that are linear in $(Y_1,\ldots,Y_n)$, the one that minimizes the mean squared error of estimation of either $\alpha$ or $\beta$ is the least-squares estimator. The hypotheses of the Gauss–Markov theorem are just the "weaker" assumptions above.

In some contexts those assumptions do not hold and other estimators are better. One such situation is this: multiple $Y$ values are observed for each $x$ value, but the data on which the estimates are based report only the average $Y$ value at each $x$ value and the number of $Y$ values observed there. In that case you would want to use weighted least squares, with each weight proportional to the number of observed $Y$ values at the corresponding $x$ value.
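A sketch of that grouped-data case, with hypothetical numbers (the x-values, group means, and counts below are made up): each point is weighted by how many observations were averaged to produce it, and the weighted normal equations $(X^\top W X)\,b = X^\top W \bar y$ are solved directly.

```python
import numpy as np

# Hypothetical grouped data: only the mean Y and the count at each x are reported.
x      = np.array([1.0, 2.0, 3.0, 4.0])
y_bar  = np.array([3.1, 4.9, 7.2, 9.0])   # average Y at each x
counts = np.array([5, 20, 3, 12])         # number of Y's averaged at each x

# Weighted least squares with weights proportional to the counts.
W = counts
X = np.column_stack([np.ones_like(x), x])
# Solve the weighted normal equations (X' W X) b = X' W y_bar.
coef = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y_bar))
alpha_hat, beta_hat = coef
```

Heavily sampled x-values (here $x=2$ with 20 observations) pull the fitted line more strongly than sparsely sampled ones, which is exactly what the answer recommends.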

Postscript in response to recent additions to the question: If the $x$ values are $x_1\le\cdots\le x_n$ and the errors satisfy the bulleted assumptions above, then the mean squared error of $\hat\beta$ as an estimator of $\beta$ is minimized by putting half of the weight on each of the two extreme $x$ values, the largest and the smallest. However, in so doing, one discards any evidence in the data that a straight-line fit may be inappropriate.