I've been self-studying *An Introduction to Statistical Learning*. From page 16 of the book:
"...suppose that we observe a quantitative response $Y$ and $p$ different predictors, $X_1$, $X_2$, $\ldots$, $X_p$. We assume that there is some relationship between $Y$ and $X = (X_1, X_2, \ldots, X_p)$, which can be written in the very general form $Y = f(X) + \epsilon$."
Later on, we analyze the approximation function
$$ \hat{Y} = \hat{f}(X). $$
Question 1: What is the domain of $Y$ and $\hat{Y}$? Is it tuples of observations, or tuples of values drawn from the domains of $X_1, \ldots, X_p$?
Question 2: Throughout the text, one sees references to notation like $\hat{f}(x_i)$. For example:
$$ MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{f}(x_i))^2 $$
Isn't the domain of $\hat{f}$ random variables (not tuples of observations $x_i$)? If so, isn't this an abuse of notation? Shouldn't we instead (technically) write something like
$$ (\hat{f}(X_1, \ldots, X_p))(x_i)? $$
NOTE: In case it's not clear, I mean to use the term "domain" in the purely mathematical sense: i.e., if $f: A \rightarrow B$, then $A$ is the domain (at least as I'm using the term).
When you are given a sample of a response variable $Y$ and some predictor variables $X_1, \ldots, X_p$, you are asked to find the regression of $Y$ on $X_1, \ldots, X_p$. In the simplest case, we use multiple linear regression $Y = a_0 + a_1 X_1 + \cdots + a_p X_p + \epsilon$, where $\epsilon$ is the noise (error) term.
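As a minimal sketch (made-up data, using NumPy; the coefficient values and sample size are assumptions for illustration), fitting the coefficients $a_0, a_1, \ldots, a_p$ by ordinary least squares looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 3                       # hypothetical sample size and number of predictors
X = rng.normal(size=(n, p))         # row i is the observation (x_i1, ..., x_ip)
a_true = np.array([2.0, -1.0, 0.5])
# Y = a0 + a1*X1 + ... + ap*Xp + epsilon, with a0 = 1.0 here
y = 1.0 + X @ a_true + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the intercept a0 is estimated too
X1 = np.column_stack([np.ones(n), X])
a_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(a_hat)  # roughly [1.0, 2.0, -1.0, 0.5]
```

With this much data and little noise, `a_hat` lands close to the true coefficients; the residual term $\epsilon$ is what keeps it from matching exactly.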
Q1. The domain of $Y$ is just the population from which you sampled your response variable; it can be either categorical or numerical.
Q2.
$\hat{f}(x_i)$ means the regression estimate evaluated at the observation $x_i$, where $x_i$ consists of $p$ components $x_{i1}, x_{i2}, \ldots, x_{ip}$.
For example, suppose we want to predict the height of a student, and our predictor variables are weight, age, gender, race, etc. Then $x_i$ is our measurement of the $i$th student's weight, age, gender, race, etc.
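To make the notation concrete, here is a small sketch (hypothetical data, NumPy) that evaluates a fitted $\hat{f}$ at each observation row $x_i$ and computes the training MSE exactly as in the formula from the question:

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 50, 2
X = rng.normal(size=(n, p))                 # row i is x_i = (x_i1, ..., x_ip)
y = 3.0 + X @ np.array([1.5, -2.0]) + rng.normal(scale=0.2, size=n)

# Fit f_hat by ordinary least squares (intercept via a column of ones)
X1 = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

def f_hat(x):
    """Evaluate the fitted regression at a single observation x of length p."""
    return coef[0] + np.dot(coef[1:], x)

# MSE = (1/n) * sum_i (y_i - f_hat(x_i))^2
mse = np.mean([(yi - f_hat(xi)) ** 2 for xi, yi in zip(X, y)])
print(mse)
```

Note that `f_hat` takes a concrete tuple of numbers, not random variables: the estimated function is applied to observed values $x_i$, which is exactly what the notation $\hat{f}(x_i)$ abbreviates.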