What is the correct format for the formula of linear regression?


Let's say you have $i$ response variables ($y_1$, $y_2$, ... $y_i$), and each of them has the SAME two predictors, $x_1$ and $x_2$. I thought that the formula for the linear regression model for each $y$ would be:

$$ y_i = \beta_0 + \beta_{i,1}x_1 + \beta_{i,2}x_2 + \epsilon_i $$

But based on Wikipedia, the formula looks like it would be:

$$ y_i = \beta_0 + \beta_1x_{i,1} + \beta_{2}x_{i,2} + \epsilon_i$$

Here is a picture from another site:

[image: regression formula from another site, with $p$ predictors per response]

This formula suggests there are $n \times p$ predictor variables, with $p$ unique ones for each response variable.

Why is my formula incorrect? Shouldn't the coefficients change depending on which response variable I am trying to model?


There are 2 best solutions below


In layman's terms, linear regression is about finding the best-fit line for your statistical data distribution, given a set of input variables (predictors) and responses (outputs). In a single-variable model with only one predictor, $y_i = mx_i + c$, where $m$ is the slope and $c$ is the bias (intercept), the best fit is achieved by tweaking $m$ and $c$:

STEP 1: Start with random values for $m$ and $c$.

STEP 2: Compute the response your model gives for the corresponding inputs.

STEP 3: Use a loss function (like MSE, mean squared error) to measure how well your model performed compared to the actual data.

You then return to step 1 with different values for $m$ and $c$, and repeat steps 2 and 3 until you find the values of $m$ and $c$ for which the loss function is minimized.

Notice that within each iteration we keep $m$ and $c$ fixed while varying the value of $x$ and recording the response $y$; across iterations we search for the optimal values of $m$ and $c$ for which our model works best.

I used the simple linear regression equation for ease of understanding.
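The loop above can be sketched in Python. The data here is made up, and gradient descent is used in place of purely random restarts as the rule for choosing the next $m$ and $c$:

```python
import numpy as np

# Hypothetical data roughly following y = 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)

m, c = 0.0, 0.0              # STEP 1: start with initial values for m and c
lr = 0.01                    # learning rate (step size)
for _ in range(5000):
    y_hat = m * x + c        # STEP 2: the model's response for these inputs
    err = y_hat - y
    mse = np.mean(err**2)    # STEP 3: MSE loss vs. the actual data
    # Move m and c in the direction that reduces the MSE
    m -= lr * 2 * np.mean(err * x)
    c -= lr * 2 * np.mean(err)

print(m, c)  # should end up close to the true slope 2 and intercept 1
```

Each pass evaluates the model, scores it with the loss, and adjusts $m$ and $c$ before trying again, which is exactly the three-step cycle described above.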


Assume that the data-generating process is $$ Y = \beta_0 + \beta_1X + \epsilon, $$ where $X$ is a variable (either random or not) and $\epsilon$ is a random variable. The parameters $\beta_0$ and $\beta_1$ are unknown. Now, you observe $n$ realizations of this process, resulting in $n$ data points $\{(y_i, x_i)\}_{i=1}^n$. Namely, you want to use these $n$ data points to estimate the unknown $\beta_0$ and $\beta_1$. Therefore, you fit $n$ linear equations of the form $$ y_i = \beta_0 + \beta_1x_i,\quad i=1,...,n $$ that you want to solve w.r.t. $\beta_0$ and $\beta_1$. Clearly there is no unique exact solution, hence you use the orthogonal projection of $y$ onto the column space of the design matrix, which results in the OLS estimators.
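This projection can be sketched numerically (with hypothetical data; `np.linalg.lstsq` computes the least-squares solution, i.e. the orthogonal projection of $y$ onto the column space of the design matrix):

```python
import numpy as np

# Hypothetical data from Y = beta0 + beta1*X + eps, with beta0 = 1, beta1 = 2
rng = np.random.default_rng(1)
x = rng.uniform(0, 5, 40)
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, 40)

# Design matrix: a column of ones for beta0 and the observed x for beta1.
# The n equations y_i = beta0 + beta1*x_i have no exact solution,
# so we take the least-squares (orthogonal projection) solution.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # OLS estimates [beta0_hat, beta1_hat], near [1, 2]
```

Note that a single pair $(\hat\beta_0, \hat\beta_1)$ is estimated for all $n$ observations: the coefficients do not vary with $i$.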

Using your logic, you suggest that $\beta_0$ and $\beta_1$ vary by observation. This is a different approach and resembles the random effects model (https://en.wikipedia.org/wiki/Random_effects_model), which is distinct from the classical linear regression problem.