Bias parameter in machine learning linear regression example


I am studying a linear regression example for machine learning. It makes the following definition:

As the name implies, linear regression solves a regression problem. In other words, the goal is to build a system that can take a vector $\mathbf{x} \in \mathbb{R}^n$ as input and predict the value of a scalar $y \in \mathbb{R}$ as its output. The output of linear regression is a linear function of the input. Let $\hat{y}$ be the value that our model predicts $y$ should take on. We define the output to be

$$\hat{y} = \mathbf{w}^T \mathbf{x}$$

where $\mathbf{w} \in \mathbb{R}^n$ is a vector of parameters.

Parameters are values that control the behaviour of the system. In this case, $w_i$ is the coefficient that we multiply by feature $x_i$ before summing up the contributions from all the features. We can think of $\mathbf{w}$ as a set of weights that determine how each feature affects the prediction. If a feature $x_i$ receives a positive weight $w_i$, then increasing the value of that feature increases the value of our prediction $\hat{y}$.
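To make the definition concrete, here is a minimal numpy sketch (with made-up weights and features, not values from the text) showing that the prediction $\hat{y} = \mathbf{w}^T \mathbf{x}$ is just a dot product:

```python
import numpy as np

# Made-up example: n = 3 features and their corresponding weights.
w = np.array([0.5, -1.0, 2.0])  # parameter vector w
x = np.array([1.0, 2.0, 3.0])   # feature vector x

# y_hat = w^T x = 0.5*1.0 + (-1.0)*2.0 + 2.0*3.0 = 4.5
y_hat = w @ x
print(y_hat)  # 4.5
```

Increasing $x_3$ here raises the prediction, since its weight $w_3 = 2.0$ is positive, matching the passage's remark about positive weights.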

It then says the following:

It is worth noting that the term linear regression is often used to refer to a slightly more sophisticated model with one additional parameter -- an intercept term $b$. In this model

$$\hat{y} = \mathbf{w}^T \mathbf{x} + b,$$

so the mapping from parameters to predictions is still a linear function but the mapping from features to predictions is now an affine function. This extension to affine functions means that the plot of the model's predictions still looks like a line, but it need not pass through the origin. Instead of adding the bias parameter $b$, one can continue to use the model with only weights but augment $\mathbf{x}$ with an extra entry that is always set to $1$. The weight corresponding to the extra $1$ entry plays the role of the bias parameter.

This is the first part that I have a question about:

so the mapping from parameters to predictions is still a linear function but the mapping from features to predictions is now an affine function.

Can someone please explain this in more detail?

This is the second part I have a question about:

Instead of adding the bias parameter $b$, one can continue to use the model with only weights but augment $\mathbf{x}$ with an extra entry that is always set to $1$.

So the vector $\mathbf{x}$ would just have one additional element (a value of $1$) at the end? And does this mean we can avoid the bias parameter and just use $\hat{y} = \mathbf{w}^T \mathbf{x}$?

Thank you.

Best answer:

mapping from parameters to predictions is still a linear function.

Note that the parameters are $(w,b)$, hence we have

$$\hat{y}=(w^T, b)\begin{bmatrix} x \\ 1\end{bmatrix}$$

which is linear with respect to the parameters $(w, b)$.

mapping from features to predictions is now affine

However, the original feature vector is just $x$, and the map from features to predictions is

$$\hat{y}=w^Tx+b$$

The prediction is translated by $b$ away from the origin, so the map is affine rather than linear.
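One quick way to see the distinction numerically (a sketch with made-up values): a linear map must satisfy $f(\alpha x) = \alpha f(x)$, which fails for $x \mapsto w^T x + b$ when $b \neq 0$, while the same prediction is linear in the parameters $(w, b)$ for a fixed $x$:

```python
import numpy as np

w = np.array([0.5, -1.0])  # made-up weights
b = 3.0                    # made-up bias
x = np.array([2.0, 1.0])

def f(x):
    """Map from features to predictions."""
    return w @ x + b

# Homogeneity fails: f(2x) != 2*f(x), so the map is affine, not linear.
print(f(2 * x), 2 * f(x))  # 3.0 vs 6.0

def g(w, b):
    """Map from parameters to predictions, for the fixed x above."""
    return w @ x + b

# Doubling the parameters doubles the prediction: linear in (w, b).
print(g(2 * w, 2 * b), 2 * g(w, b))  # both 6.0
```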

For your second question: you can either append the $1$ as the last entry of $x$ or prepend it as the first entry, as long as you are consistent, since

$$\hat{y}=(w^T, b)\begin{bmatrix} x \\ 1\end{bmatrix}= (b, w^T)\begin{bmatrix}1\\ x \end{bmatrix}$$

and reduce the analysis to the earlier case.
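To confirm the reduction concretely (again with made-up values): appending a $1$ to $x$ and absorbing $b$ into the weight vector gives exactly the same prediction as the model with an explicit bias term.

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])  # made-up weights
b = 4.0                         # made-up bias
x = np.array([1.0, 2.0, 3.0])

# Model with an explicit bias term: y_hat = w^T x + b
y_affine = w @ x + b

# Equivalent bias-free model: augment x with a trailing 1, append b to w.
w_aug = np.append(w, b)   # (w^T, b)
x_aug = np.append(x, 1.0) # [x; 1]
y_aug = w_aug @ x_aug

print(y_affine, y_aug)  # both 8.5
```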