Why does having $X_0 = 1$ mean that the hyperplane includes the origin?


I was just reading this question on stats.stackexchange, because I had the same question about why having $X_0 = 1$ means that the hyperplane includes the origin, and why it is an affine set cutting the $Y$-axis at the point $(0, \hat{\beta}_0)$ if the constant is not included in $X$. However, I don't think the answer actually explains this; rather, it seems to just restate it verbosely. And judging by the mathematics involved, I think it is a more appropriate question for the minds at math.stackexchange. So I am looking for a clear explanation of why this is the case; that is, why does having $X_0 = 1$ mean that the hyperplane includes the origin, and why is it an affine set cutting the $Y$-axis at the point $(0, \hat{\beta}_0)$ if the constant is not included in $X$?

The textbook section is 2.3.1 Linear Models and Least Squares from here. The relevant parts are all at the beginning of section 2.3.1.


EDIT:

The part that I'm interested in is

Often it is convenient to include the constant variable $1$ in $X$, ...

and

In the $(p + 1)$-dimensional input-output space, $(x, \hat{Y})$ represents a hyperplane. If the constant is included in $X$, then the hyperplane includes the origin and is a subspace; if not, it is an affine set cutting the $Y$-axis at the point $(0, \hat{\beta}_0)$.
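The bookkeeping behind that quoted sentence is easy to check numerically. Here is a minimal sketch (in numpy; the variable names are mine, not the book's) showing that prepending the constant $1$ to the input and $\beta_0$ to the coefficient vector reproduces exactly the same fitted value as keeping an explicit intercept:

```python
import numpy as np

p = 3
rng = np.random.default_rng(0)
beta0 = 2.5                   # intercept
beta = rng.normal(size=p)     # coefficients beta_1, ..., beta_p
x = rng.normal(size=p)        # input without the constant

# Fitted value with an explicit intercept term: y = beta_0 + beta . x
y_hat_affine = beta0 + beta @ x

# Fitted value after including the constant variable 1 in X
x_tilde = np.concatenate(([1.0], x))           # (1, x_1, ..., x_p)
beta_tilde = np.concatenate(([beta0], beta))   # (beta_0, beta_1, ..., beta_p)
y_hat_linear = beta_tilde @ x_tilde

assert np.isclose(y_hat_affine, y_hat_linear)
```

The augmented version is a single homogeneous linear form in $(p+1)$ variables, which is why the book can call the resulting set a subspace; the question below is whether that claim survives the constraint $X_0 = 1$.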


There are 3 answers below.

BEST ANSWER

The equation $y=\beta_0+\beta_1 x_1+\cdots+\beta_p x_p$, where the unknowns are $(x_1,\dots,x_p,y)$, describes an affine hyperplane $H$ of the affine space $\Bbb R^{p+1}$, which does not pass through the origin if $\beta_0\ne 0$.

The equation $y=\beta_0x_0+\beta_1 x_1+\cdots+\beta_p x_p$, where the unknowns are now $(x_0,x_1,\dots,x_p,y)$, describes a hyperplane of $\Bbb R^{p+2}$ that does pass through the origin (hence it's a subspace of dimension $p+1$ of $\Bbb R^{p+2}$ considered as a vector space).

However, to describe the same model, you have the additional constraint $x_0=1$, and these two equations together describe an affine subset $S$ of $\Bbb R^{p+2}$ that has $p$ dimensions. It is simply an embedding in $\Bbb R^{p+2}$ of the hyperplane $H$ defined above. It's not a subspace of $\Bbb R^{p+2}$, because every $x\in S$ has $x_0=1$, so $S$ is not closed under scaling. And it's not a hyperplane either, because it has dimension $p$, not $p+1$.
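The "not a subspace" point can be verified directly: scaling a point of $S$ stays on the homogeneous hyperplane through the origin but violates the constraint $x_0=1$. A minimal sketch (numpy; `in_S` and the sample values are my own illustration):

```python
import numpy as np

p = 2
beta0, beta = 1.0, np.array([2.0, -3.0])

def in_S(v):
    """v = (x0, x1, ..., xp, y). Membership in S requires x0 == 1
    and the model equation y = beta0*x0 + beta . x to hold."""
    x0, x, y = v[0], v[1:-1], v[-1]
    return bool(np.isclose(x0, 1.0) and np.isclose(y, beta0 * x0 + beta @ x))

x = np.array([0.5, 0.5])
v = np.concatenate(([1.0], x, [beta0 + beta @ x]))
assert in_S(v)

# Scaling v stays on the hyperplane through the origin, but leaves S,
# because the first coordinate is no longer 1:
assert not in_S(2 * v)
```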

Therefore, I regard the sentence "If the constant is included in $X$, then the hyperplane includes the origin and is a subspace; if not, it is an affine set cutting the $Y$-axis at the point $(0,\hat\beta_0)$" as wrong.

That being said, I think it's a minor error that does not impair the subsequent exposition of the linear model. I have another concern, namely that the randomness in the model is completely hidden here, but this is only the introduction of the chapter, and later on the epsilons are introduced as expected, to address inference.

ANSWER 2

In linear regression, the equation linking the features $x = (x_1,\ldots,x_p) \in \mathbb R^p$, the coefficients $\beta =(\beta_1,\ldots,\beta_p) \in \mathbb R^p$, the intercept $\beta_0 \in \mathbb R$, and the output $\widehat{y} \in \mathbb R$ is $$ \widehat{y} = \beta^Tx + \beta_0. \tag{1} $$

  • This is clearly the equation of a hyperplane in the $(p+1)$-dimensional space $\mathbb R^{p+1} = \{(x,y) \mid x \in \mathbb R^p,\; y \in \mathbb R\}$, which doesn't go through the origin (unless $\beta_0 = 0$).

  • Projectivization. Now let $\widetilde{x} := (1,x) = (1,x_1,\ldots,x_p)$ and $\widetilde{\beta} := (\beta_0,\beta) = (\beta_0,\beta_1,\ldots,\beta_p)$. We can rewrite this equation as $$\widehat{y} = \beta^Tx + \beta_0 = \sum_{j=1}^p\beta_j x_j + \beta_0\cdot 1 = \widetilde{\beta}^T\widetilde{x}, $$ which is clearly the equation of a hyperplane through the origin, in the $(p+1)$-dimensional real projective space $\mathbb P\mathbb R^{p+1} := \{(1,x,y) \mid x \in \mathbb R^p,\;y \in \mathbb R\} \cup \infty$, where $\infty := \{(0,x,y) \mid y \ne 0 \lor \exists j,\;x_j \ne 0\}$ is the set of points at infinity.

The apparent difference between the two geometric interpretations (a hyperplane not going through the origin versus one going through the origin) boils down to the difference between the real vector space $\mathbb R^{p+1}$ and its projectivization $\mathbb P\mathbb R^{p+1} := \{(1,z) \mid z \in \mathbb R^{p+1}\} \cup \infty$, where $\infty := \{(0,z) \mid z \in \mathbb R^{p+1},\; \exists j,\;z_j \ne 0\}$.
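The projectivization can be made concrete: every nonzero multiple $(\lambda, \lambda x, \lambda \widehat{y})$ of an augmented point represents the same affine point, which is recovered by dividing by the first coordinate. A small sketch (numpy; the values are my own illustration):

```python
import numpy as np

x = np.array([1.0, 2.0])
y_hat = 3.0
# Representative of the projective point with first coordinate 1
affine = np.concatenate(([1.0], x, [y_hat]))

# Any nonzero scalar multiple is the same projective point
lam = -4.0
homogeneous = lam * affine

# Dehomogenize: divide by the first coordinate to recover the affine point
recovered = homogeneous / homogeneous[0]
assert np.allclose(recovered, affine)
```

Points whose first coordinate is $0$ cannot be rescaled this way; those are exactly the points at infinity in the description above.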

Let me know if you need more details on anything I've said...

ANSWER 3

The space is described in terms of a projective space, which means that you are dealing with vectors of the form $(X_{0}, X_{1}, \ldots, X_{p}) \in \mathbb{R}^{p+1}$, where two vectors are considered equivalent if they are scalar multiples of each other.

You usually take representative vectors from each equivalence class by requiring that the first nonzero entry is $1$. You can consider the points with $X_{0}=0$ to be the hyperplane at infinity, so your affine points are those for which $X_{0} \neq 0$ (and can be assumed to have $X_{0}=1$). This gives a copy of the affine space $\mathbb{R}^{p}$ as the set of these vectors with $X_{0}=1$, and the "origin" here can be taken as the vector $(1,0,\ldots,0)$.

Okay, now from this, assuming we included the constant $1$ in our input vector, for each $X = (X_{1}, \ldots, X_{p})$ we define $\hat{Y} = (1, X_{1}, \ldots, X_{p})^{T} \hat{\beta}$. Then the vectors $(1, X_{1}, \ldots, X_{p}, \hat{Y})$ satisfy the equation $(1, X_{1}, \ldots, X_{p}, \hat{Y})^{T}(\beta_{0}, \beta_{1}, \ldots, \beta_{p}, -1) = 0$.

Okay, if you know a little projective geometry, the set of vectors $x$ satisfying a homogeneous equation $x^{T} b = 0$ for a fixed vector $b$ forms a hyperplane of the projective space. If we require that the first coordinate of $x$ is $1$, then we are looking at the intersection of this projective hyperplane with the affine space mentioned above, giving a hyperplane of the affine space (this is different from a hyperplane of a vector space, which must contain the origin). Therefore, the points of the form $(X, \hat{Y})$ give us a hyperplane of this affine space $\mathbb{R}^{p+1}$ (here I am using "hyperplane" to mean an affine subspace of codimension $1$, which differs from a vector subspace in that it can be a translation $v+H$ of a vector subspace $H$).

If we do it the way I described above, the hyperplane obtained does NOT contain the origin: if we fix $X_{1}=X_{2}=\cdots=X_{p}=0$, then we must have $\hat{Y}=\beta_{0}$, so it slices the $Y$-axis at $(0,\beta_{0})$. So we find ourselves in the case where we have not "included the constant variable $1$ in $X$". I don't totally understand what he means by this statement, except that he would like to fudge things a little to absorb this extra variable $X_{0}$ somehow so he does not have to worry about fiddly notation. In my reading, he is not explicit about how he includes this, and in fact he makes no mention at all of an $X_{0}$.
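Both claims in this answer, that the augmented points satisfy the homogeneous equation and that the fitted hyperplane cuts the $Y$-axis at $(0, \beta_0)$, can be checked mechanically. A minimal sketch (numpy; `fitted` and the random coefficients are my own illustration):

```python
import numpy as np

p = 3
rng = np.random.default_rng(1)
beta_hat = rng.normal(size=p + 1)   # (beta_0, beta_1, ..., beta_p)

def fitted(X):
    """Y_hat = (1, X_1, ..., X_p) . beta_hat, with the constant included."""
    return np.concatenate(([1.0], X)) @ beta_hat

X = rng.normal(size=p)
Y_hat = fitted(X)

# (1, X_1, ..., X_p, Y_hat) . (beta_0, ..., beta_p, -1) == 0
point = np.concatenate(([1.0], X, [Y_hat]))
normal = np.concatenate((beta_hat, [-1.0]))
assert np.isclose(point @ normal, 0.0)

# Setting X = 0 shows the hyperplane cuts the Y-axis at (0, beta_0)
assert np.isclose(fitted(np.zeros(p)), beta_hat[0])
```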

In general you will find that textbooks on this topic are not very mathematically clear or rigorous. For example, this author never defines a hyperplane; they only vaguely describe them in Chapter 4, where they mention that they play somewhat loosely with the usual definition. So I think you will find madness if you try to take anything more than a big-picture idea from a text like this written by a computer scientist.