The following is a question from Rice's Mathematical Statistics - Ch. 14 - Q-5
I wanted to know if the following approach would be the right setup for the problem:
I looked at this as a problem of solving for the true values of the points $p_{1}, p_{2}, p_{3}$.
To do this we have to first specify the linear model we are going to use. I specified it in the following way:
$$p_{i} = \beta_{0} + \beta_{1}y_{i,1} + \beta_{2}y_{i,2} + \beta_{3}y_{i,3} + e_{i}$$
In matrix form this is:
$$\textbf{P} = \textbf{X} \boldsymbol{\beta} + \textbf{e}\ \text{where,} \\ \textbf{P} = \begin{bmatrix} p_{1} \\ p_{2} \\ p_{3} \\ \end{bmatrix}, \ \textbf{X} = \begin{bmatrix} 1 & y_{1,1} & y_{1,2} & y_{1,3} \\ 1 & y_{2,1} & y_{2,2} & y_{2,3} \\ 1 & y_{3,1} & y_{3,2} & y_{3,3} \\ \end{bmatrix}, \ \boldsymbol{\beta} = \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \end{bmatrix}, \ \textbf{e} = \begin{bmatrix} e_{1} \\ e_{2} \\ e_{3} \\ \end{bmatrix} \\ \\ \text{where,} \ y_{i,j} = \begin{cases} |p_{i} - p_{j}| & \text{if}\ i\neq j \\ |p_{i} - 0| & \text{if}\ i = j \\ \end{cases} $$
This puts me in a position to write down the least-squares criterion:
$$S(\boldsymbol{\beta}) = \sum_{i = 1}^{3} (p_{i} - \beta_{0} - \beta_{1}y_{i,1} - \beta_{2}y_{i,2} - \beta_{3}y_{i,3})^{2} \\ = ||\textbf{P} -\textbf{X} \boldsymbol{\beta}||^{2} \\ = ||\textbf{P} - \hat{\textbf{P}}||^{2}$$
So my concerns:
I looked at the question as regressing the $p_{i}$ onto the $y_{i}$, but the way the question is phrased makes it feel as if they wanted the opposite. I may be overthinking it, but since this is a first approach to linear regression, I don't think they would deviate much from how the text uses the $Y_{i}$ variable.
I introduced an intercept term $\beta_{0}$, thinking of it as perhaps the distance from the origin to the first point. But I also have $\beta_{1}$, which is estimated from distances that use the observed $p_{1}$ value. So is an intercept term needed here?
How is the setup in terms of the overall idea?

I think your setup is correct, but maybe it's not what you were asked to do. The question is about using the least-squares method to estimate (directly) the values of $p_i$ given the noisy measurements $Y_i$. If we wanted to state this question as a regression problem, we would be asked to regress $p_i$ given the values of $Y_i$.
The least-squares method is more general than regression: it finds the optimal parameters for some model given data.
We now have to model our problem. The most common choice is to assume that all measurements have iid errors drawn from a normal distribution, i.e. $Y_i = x_i + \varepsilon_i$, where $x_i$ is the true quantity and $\varepsilon_i \sim N(0, \sigma^2)$ are iid Gaussian noise terms (this can be justified by assuming all measurements are taken independently by the same person / instrument). In our problem we measure the $p_i$'s and the $d_{ij}$'s with some noise, so each $Y_i$ comes from a Gaussian centered at the true value of the corresponding quantity of interest. This gives us: $$ \begin{aligned} &Y_1 = p_1 + \varepsilon_1; ~~ Y_2 = p_2 + \varepsilon_2; ~~ Y_3 = p_3 + \varepsilon_3 \\ &Y_4 = d_{12} + \varepsilon_4; ~~ Y_5 = d_{13} + \varepsilon_5;~~ Y_6 = d_{23} + \varepsilon_6 \\ \end{aligned} $$ where $\varepsilon_i \sim N(0, \sigma^2)$ are iid random variables. Our problem then looks like a linear regression with only intercept terms. For iid Gaussian errors, the least-squares solution is equivalent to maximum likelihood estimation (MLE). To see this, let's write out the likelihood:
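As a concrete illustration of this measurement model, here is a minimal Python sketch. The helper name `measure`, the example positions $(1, 3, 7)$, and the use of signed differences $d_{ij} = p_j - p_i$ (assuming the points are ordered on the line, matching the design used below) are my assumptions for illustration, not part of the problem statement:

```python
import random

def measure(p, sigma, rng=random):
    """Simulate the six measurements: the three positions p_1, p_2, p_3
    and the three pairwise distances d_12, d_13, d_23, each observed
    with independent N(0, sigma^2) Gaussian noise."""
    p1, p2, p3 = p
    true_values = [p1, p2, p3, p2 - p1, p3 - p1, p3 - p2]
    return [v + rng.gauss(0.0, sigma) for v in true_values]

random.seed(0)
Y_noisy = measure((1.0, 3.0, 7.0), sigma=0.1)  # six noisy measurements
Y_exact = measure((1.0, 3.0, 7.0), sigma=0.0)  # sigma = 0 recovers the true values
```

With $\sigma = 0$ the six returned values are exactly the three positions and three distances; with $\sigma > 0$ each is perturbed independently, which is the data the least-squares estimate below is built from.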
$$ P(Y_1,Y_2,Y_3,Y_4,Y_5,Y_6 \mid p_1,p_2,p_3) = \prod_{i=1}^{6} P(Y_i \mid p_1,p_2,p_3) $$
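Explicitly, writing $p_4 = d_{12}$, $p_5 = d_{13}$, $p_6 = d_{23}$ for the remaining true quantities, each factor is the density of a Gaussian centered at the corresponding $p_i$, so the log of this product is
$$ \log P(Y_1,\ldots,Y_6 \mid p_1,p_2,p_3) = \sum_{i=1}^{6} \left[ -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(Y_i - p_i)^2}{2\sigma^2} \right] = -3\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{6} (Y_i - p_i)^2 $$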
We'd like to maximize this quantity over the $p_i$'s, i.e. find the most probable positions explaining our data. Observe that maximizing $\mathcal{L}(p_1, p_2, p_3) := P(Y_1,Y_2,Y_3,Y_4,Y_5,Y_6 \mid p_1,p_2,p_3)$ is the same as maximizing $\log \mathcal{L}(p_1, p_2, p_3)$, which gives:
$$ \begin{aligned} \arg\max_{p_1, p_2 , p_3} \log \mathcal{L}(p_1, p_2 , p_3) & = \arg\max_{p_1, p_2 , p_3} -\frac{1}{2\sigma^2} \sum_{i = 1}^{6} (Y_i - p_i)^2 \\ & = \arg\min_{p_1, p_2 , p_3} \sum_{i = 1}^{6} (Y_i - p_i)^2 \end{aligned} $$ where $p_4 = d_{12}$, $p_5 = d_{13}$ and $p_6 = d_{23}$, and the additive constants in the log-likelihood were dropped since they don't depend on the $p_i$'s and so don't change the maximizer. The last expression is precisely the least-squares problem. We were asked to find its matrix form; for that, let's define $T$ by:
$$ T = \left [ \begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 1 \\ \end{matrix} \right ] $$
Then, for $Y = [Y_1,Y_2,Y_3,Y_4,Y_5,Y_6]^T$ and $\mathbf{p} = [p_1, p_2, p_3]^{T}$ we finally have:
$$ \sum_{i = 1}^{6} (Y_i - p_i)^2 = \| Y - T\mathbf{p} \|^2, $$ so maximizing the log-likelihood over $p_1, p_2, p_3$ is exactly minimizing $\| Y - T\mathbf{p} \|^2$ over $\mathbf{p}$, i.e. an ordinary least-squares problem in matrix form. I hope it helps :)
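As a numerical sanity check, here is a minimal pure-Python sketch (the helper `least_squares` and the example measurements are mine, not from the book): the minimizer of $\|Y - T\mathbf{p}\|^2$ solves the normal equations $T^{\top}T\,\hat{\mathbf{p}} = T^{\top}Y$, which we can do directly by Gaussian elimination.

```python
# Design matrix T: rows 1-3 measure the positions, rows 4-6 the
# signed pairwise differences d_12, d_13, d_23.
T = [
    [ 1,  0,  0],
    [ 0,  1,  0],
    [ 0,  0,  1],
    [-1,  1,  0],
    [-1,  0,  1],
    [ 0, -1,  1],
]

def least_squares(T, Y):
    """Solve the normal equations T^T T p = T^T Y by Gaussian elimination."""
    n = len(T[0])
    # Build the augmented system [T^T T | T^T Y].
    A = [[sum(row[i] * row[j] for row in T) for j in range(n)]
         + [sum(row[i] * y for row, y in zip(T, Y))]
         for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution.
    p = [0.0] * n
    for i in reversed(range(n)):
        p[i] = (A[i][n] - sum(A[i][j] * p[j] for j in range(i + 1, n))) / A[i][i]
    return p

# Consistent (noise-free) measurements for true positions 1, 3, 7:
Y = [1.0, 3.0, 7.0, 2.0, 6.0, 4.0]
p_hat = least_squares(T, Y)  # recovers approximately [1.0, 3.0, 7.0]
```

On noise-free, consistent measurements the true positions are recovered up to floating-point error; with noisy $Y$ the same call returns the least-squares (equivalently, MLE) estimate $\hat{\mathbf{p}}$.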