I've been self-studying *An Introduction to Statistical Learning*. From page 16 of the book:
"...suppose that we observe a quantitative response $Y$ and $p$ different predictors, $X_1$, $X_2$, $\ldots$, $X_p$. We assume that there is some relationship between $Y$ and $X = (X_1, X_2, \ldots, X_p)$, which can be written in the very general form $Y = f(X) + \epsilon$."
Later on, we analyze the approximation function
$$ \hat{Y} = \hat{f}(X). $$
Question 1: What is the domain of $Y$ and $\hat{Y}$? Is it tuples of observations, or tuples of values drawn from the domains of $X_1, \ldots, X_p$?
Question 2: Throughout the text, one sees references to notation like $\hat{f}(x_i)$. For example:
$$ MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{f}(x_i))^2 $$
Isn't the domain of $\hat{f}$ random variables (not tuples of observations $x_i$)? If so, isn't this an abuse of notation? Shouldn't we instead (technically) write something like
$$ (\hat{f}(X_1, \ldots, X_p))(x_i)? $$
NOTE: In case it's not clear, I mean to use the term "domain" in the purely mathematical sense: i.e., if $f: A \rightarrow B$, then $A$ is the domain (at least as I'm using the term).
When you are given a sample of a response variable $Y$ and some predictor variables $X_1, \ldots, X_p$, you are asked to find the regression of $Y$ on $X_1, \ldots, X_p$. In the simplest case, we use multiple linear regression $Y = a_0 + a_1 X_1 + \cdots + a_p X_p + \epsilon$, where $\epsilon$ is the noise (error) term.
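As a minimal sketch (made-up data, using NumPy; the coefficient values and sample size are assumptions for illustration), fitting the coefficients $a_0, a_1, \ldots, a_p$ by ordinary least squares looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 3                       # hypothetical sample size and number of predictors
X = rng.normal(size=(n, p))         # row i is the observation (x_i1, ..., x_ip)
a_true = np.array([2.0, -1.0, 0.5])
# Y = a0 + a1*X1 + ... + ap*Xp + epsilon, with a0 = 1.0 here
y = 1.0 + X @ a_true + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the intercept a0 is estimated too
X1 = np.column_stack([np.ones(n), X])
a_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(a_hat)  # roughly [1.0, 2.0, -1.0, 0.5]
```

With this much data and little noise, `a_hat` lands close to the true coefficients; the residual term $\epsilon$ is what keeps it from matching exactly.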
Q1. The domain of $Y$ is just the population from which you sampled your response variable; it can be either categorical or numerical.
Q2.
$\hat{f}(x_i)$ means the regression estimate evaluated at the observation $x_i$, where $x_i$ consists of $p$ components $x_{i1}, x_{i2}, \ldots, x_{ip}$.
For example, suppose we want to predict the height of a student, and our predictor variables are weight, age, gender, race, etc. Then $x_i$ is our measurement of the $i$th student's weight, age, gender, race, etc.
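To make the notation concrete, here is a small sketch (hypothetical data, NumPy) that evaluates a fitted $\hat{f}$ at each observation row $x_i$ and computes the training MSE exactly as in the formula from the question:

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 50, 2
X = rng.normal(size=(n, p))                 # row i is x_i = (x_i1, ..., x_ip)
y = 3.0 + X @ np.array([1.5, -2.0]) + rng.normal(scale=0.2, size=n)

# Fit f_hat by ordinary least squares (intercept via a column of ones)
X1 = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

def f_hat(x):
    """Evaluate the fitted regression at a single observation x of length p."""
    return coef[0] + np.dot(coef[1:], x)

# MSE = (1/n) * sum_i (y_i - f_hat(x_i))^2
mse = np.mean([(yi - f_hat(xi)) ** 2 for xi, yi in zip(X, y)])
print(mse)
```

Note that `f_hat` takes a concrete tuple of numbers, not random variables: the estimated function is applied to observed values $x_i$, which is exactly what the notation $\hat{f}(x_i)$ abbreviates.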