Linear Regression - Repeating value

I have a data set giving the training hours per day and the number of tournaments won for 5 randomly chosen players.

$$ \begin{array}{|c|c|c|c|c|c|} \hline \text{hours} & 1 & 1 & 2 & 3 & 3\\ \hline \text{won} & 0 & 1 & 2 & 2 & 5\\ \hline \end{array} $$

The task is to fit a linear regression model, specifically: "State the model and estimate the regression coefficients"

After that, I have to find out how many wins we expect from a person who trains 5 hours per day.

I tried to solve it the following way:

The model is $y_i = \beta_0 + \beta_1x_i + \epsilon_i$

I tried to get the coefficients using the following formulas:

$$\beta_1 = \frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sum(x_i-\bar{x})^2}$$

$$\beta_0 = \bar{y} - \beta_1\bar{x}$$

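As $\bar{x} = \bar{y} = 2$, the sums are $\sum(x_i-\bar{x})(y_i-\bar{y}) = 6$ and $\sum(x_i-\bar{x})^2 = 4$, so I got $\beta_1 = \frac{6}{4} = \frac{3}{2}$ and $\beta_0 = 2 - \frac{3}{2}\cdot 2 = -1$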

Hence

$$\hat{y} = \frac{3}{2}x - 1$$
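A quick numerical check of these formulas on the table above, as a minimal sketch in plain Python. The helper name `fit_line` is only illustrative; the data and formulas are the ones from the question, and the last step evaluates the fitted line at 5 hours per day.

```python
# Data from the question: training hours per day and tournaments won.
hours = [1, 1, 2, 3, 3]
won = [0, 1, 2, 2, 5]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


beta0, beta1 = fit_line(hours, won)

# Expected number of wins for a player training 5 hours per day.
predicted_wins = beta0 + beta1 * 5

print(beta0, beta1, predicted_wins)  # -1.0 1.5 6.5
```

Note that 5 hours lies outside the observed range of 1 to 3 hours, so the value 6.5 is an extrapolation of the fitted line.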

But I assume that this is not the correct way to solve it when there are repeating values, or that I forgot about something that also needs to be used here.

So should it be done another way?

Best answer

Repeating points are just fine. Repeating a point is equivalent to giving it a weight: in this case, the weight is the number of times the point is repeated.
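The equivalence between repetition and weighting can be illustrated with a small sketch. The function `weighted_fit` and the four-point data set below are made up for illustration (they are not from the question); the weighted formulas are the usual ones with each term of the sums multiplied by its weight.

```python
def weighted_fit(x, y, w):
    """Weighted least-squares line; returns (intercept, slope)."""
    total = sum(w)
    # Weighted means of x and y.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Same slope formula as before, with every term multiplied by its weight.
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data in which the point (1, 0) appears twice, all weights 1.
repeated = weighted_fit([1, 1, 2, 3], [0, 0, 2, 5], [1, 1, 1, 1])

# The same data with the duplicate collapsed into one point of weight 2.
weighted = weighted_fit([1, 2, 3], [0, 2, 5], [2, 1, 1])

print(repeated)
print(weighted)
```

Both calls give the same intercept and slope, up to floating-point rounding. In the question's table the $x$ values repeat but no $(x, y)$ pair does, so every point simply has weight 1 and the ordinary formulas apply unchanged.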