hypothesis space - linear and logistic regression

I am new to machine learning and I came across the term "hypothesis space". I am trying to grasp what it is, and I am especially interested in the dimension of this "space". For example, in the context of linear regression, where one fits a linear polynomial to the data, would the dimension of the hypothesis space be $2$? What about in the context of logistic regression?


In the simplest instances of logistic regression one has independent random variables $Y_1,\ldots,Y_n$ for which $$ \begin{cases} \operatorname{logit} \Pr(Y_i=1) = \phantom{+(}\alpha + \beta x_i \\[8pt] \operatorname{logit} \Pr(Y_i=0) = -(\alpha+\beta x_i) \end{cases} $$ where $$ \operatorname{logit} p = \log \frac p {1-p}, $$ and

  • $\{(x_i, Y_i) : i=1,\ldots,n\}$ are observed;
  • $\alpha,\beta$ are not observed and are to be estimated based on the above observed data;
  • As mentioned, the $Y_i$ are random variables. On the other hand, the $x_i$ are treated as constant, i.e. non-random, even though they may change if a new sample of $n$ observations is taken. The justification is that one is really interested in the conditional distribution of $Y$ given $x.$

Least squares is not the method used for estimating $\alpha$ and $\beta;$ maximum likelihood is, and the MLE is typically computed by iteratively reweighted least squares (IRLS).
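To make the estimation step concrete, here is a minimal sketch of iteratively reweighted least squares for the one-predictor model above, written with NumPy. The function name `fit_logistic_irls`, the starting point $\alpha=\beta=0$, the stopping rule, and the simulated data are all assumptions made for illustration; this is not a reference implementation, and it assumes the data are not perfectly separable (otherwise the MLE does not exist).

```python
import numpy as np

def fit_logistic_irls(x, y, n_iter=25, tol=1e-10):
    """Sketch: estimate (alpha, beta) by maximum likelihood using IRLS."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])   # design matrix: intercept and x
    theta = np.zeros(2)                         # assumed start: alpha = beta = 0
    for _ in range(n_iter):
        eta = X @ theta                         # linear predictor alpha + beta * x_i
        p = 1.0 / (1.0 + np.exp(-eta))          # fitted probabilities Pr(Y_i = 1)
        w = p * (1.0 - p)                       # IRLS weights
        z = eta + (y - p) / w                   # working response
        # Weighted least-squares step: solve (X^T W X) theta = X^T W z
        theta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta

# Simulated data (illustrative only): true alpha = 0.5, true beta = 1.5
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))).astype(float)

alpha_hat, beta_hat = fit_logistic_irls(x, y)
print(alpha_hat, beta_hat)
```

Each pass solves an ordinary weighted least-squares problem, which is where the name comes from; the weights $p_i(1-p_i)$ are recomputed from the current fit on every iteration.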

The function of most interest may be $$ p = \operatorname{logit}^{-1} (\alpha + \beta x) = \frac 1 {1 + e^{-(\alpha+\beta x)}}. $$ Every such function is completely determined by the values of $\alpha$ and $\beta,$ and in this case $\alpha$ and $\beta$ can be any real numbers at all.

Therefore the hypothesis space, if that is defined as the set of functions the model is able to learn, is a $2$-dimensional manifold homeomorphic to the plane.

When the mapping from the parameter space to the hypothesis space is one-to-one and continuous, then the dimension of the hypothesis space is the same as the dimension of the parameter space. And "continuous" may be best defined in this context in such a way that it's always continuous, i.e. the mapping itself determines the topology on the hypothesis space.
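The one-to-one property can be checked directly for the logistic model: applying the logit to a hypothesis at two distinct points gives two linear equations that determine $\alpha$ and $\beta.$ The following sketch illustrates this; the helper names `hypothesis`, `logit`, and `recover_parameters` and the evaluation points $x_0=0,$ $x_1=1$ are hypothetical choices made for illustration.

```python
import math

def hypothesis(alpha, beta):
    """Return the function x -> 1 / (1 + exp(-(alpha + beta * x)))."""
    return lambda x: 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def recover_parameters(h, x0=0.0, x1=1.0):
    """Recover (alpha, beta) from a hypothesis h using two distinct points."""
    l0, l1 = logit(h(x0)), logit(h(x1))   # l = alpha + beta * x at each point
    beta = (l1 - l0) / (x1 - x0)          # slope of the log-odds line
    alpha = l0 - beta * x0                # intercept of the log-odds line
    return alpha, beta

h = hypothesis(-0.7, 2.3)
print(recover_parameters(h))  # approximately (-0.7, 2.3), up to rounding
```

Since the parameters can be read back from the function, two different pairs $(\alpha,\beta)$ never give the same hypothesis, which is the one-to-one condition used above.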