The following is a question from Rice's Mathematical Statistics - Ch. 14 - Q-5
I wanted to know if the following approach would be the right setup for the problem:
I looked at this as a problem of solving for the true values of the points $p_{1}, p_{2}, p_{3}$.
To do this we have to first specify the linear model we are going to use. I specified it in the following way:
$$p_{i} = \beta_{0} + \beta_{1}y_{i,1} + \beta_{2}y_{i,2} + \beta_{3}y_{i,3} + e_{i}$$
In matrix form this is:
$$\textbf{P} = \textbf{X} \boldsymbol{\beta} + \textbf{e}\ \text{where,} \\ \textbf{P} = \begin{bmatrix} p_{1} \\ p_{2} \\ p_{3} \\ \end{bmatrix}, \ \textbf{X} = \begin{bmatrix} 1 & y_{1,1} & y_{1,2} & y_{1,3} \\ 1 & y_{2,1} & y_{2,2} & y_{2,3} \\ 1 & y_{3,1} & y_{3,2} & y_{3,3} \\ \end{bmatrix}, \ \boldsymbol{\beta} = \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \end{bmatrix}, \ \textbf{e} = \begin{bmatrix} e_{1} \\ e_{2} \\ e_{3} \\ \end{bmatrix} \\ \\ \text{where,} \ y_{i,j} = \begin{cases} |p_{i} - p_{j}| & \text{if}\ i\neq j \\ |p_{i} - 0| & \text{if}\ i = j \\ \end{cases} $$
This puts me in a position to write down the least-squares criterion:
$$S(\boldsymbol{\beta}) = \sum_{i = 1}^{3} (p_{i} - \beta_{0} - \beta_{1}y_{i,1} - \beta_{2}y_{i,2} - \beta_{3}y_{i,3})^{2} \\ = ||\textbf{P} -\textbf{X} \boldsymbol{\beta}||^{2} \\ = ||\textbf{P} - \hat{\textbf{P}}||^{2}$$
So my concerns:
I looked at the question as regressing the $p_{i}$ onto the $y_{i}$, but the way the question is phrased makes it feel as if they wanted the opposite. I may be overthinking it, but since this is a first approach to linear regression, I don't think they would deviate much from how the text uses the $Y_{i}$ variable.
I introduced an intercept term $\beta_{0}$, thinking of it as perhaps the distance from the origin to the first point. But I also have $\beta_{1}$, which is estimated from distances that use the observed $p_{1}$ value. So is an intercept term needed here?
How is the setup in terms of the overall idea?

I think your setup is correct, but maybe it's not what you were asked to do. The question is about using the least-squares method to estimate (directly) the values of $p_i$ given the noisy measurements $Y_i$. If we wanted to state this question as a regression problem, we would be asked to regress $p_i$ given the values of $Y_i$.
The least-squares method is more general than regression: it finds the optimal parameters for some model given data.
We now have to model our problem. The most common choice is to assume that all measurements have iid errors drawn from a normal distribution, i.e. $Y_i = x_i + \varepsilon_i$, where $x_i$ is the true quantity and $\varepsilon_i \sim N(0, \sigma^2)$ are iid Gaussian noise terms (this can be justified by assuming all measurements are taken independently by the same person / instrument). In our problem we measure the $p_i$'s and the $d_{ij}$'s with some noise, so each $Y_i$ comes from a Gaussian centered at the true value of the corresponding quantity of interest. This gives us: $$ \begin{aligned} &Y_1 = p_1 + \varepsilon_1; ~~ Y_2 = p_2 + \varepsilon_2; ~~ Y_3 = p_3 + \varepsilon_3 \\ &Y_4 = d_{12} + \varepsilon_4; ~~ Y_5 = d_{13} + \varepsilon_5;~~ Y_6 = d_{23} + \varepsilon_6 \\ \end{aligned} $$ where $\varepsilon_i \sim N(0, \sigma^2)$ are iid random variables. Our problem then looks like a linear regression with only intercept terms. For iid Gaussian errors, the least-squares solution is equivalent to maximum likelihood estimation (MLE). To see this, let's write out the likelihood:
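As a concrete illustration of this measurement model, here is a minimal Python sketch. The helper name `measure`, the example positions $(1, 3, 7)$, and the use of signed differences $d_{ij} = p_j - p_i$ (assuming the points are ordered on the line, matching the design used below) are my assumptions for illustration, not part of the problem statement:

```python
import random

def measure(p, sigma, rng=random):
    """Simulate the six measurements: the three positions p_1, p_2, p_3
    and the three pairwise distances d_12, d_13, d_23, each observed
    with independent N(0, sigma^2) Gaussian noise."""
    p1, p2, p3 = p
    true_values = [p1, p2, p3, p2 - p1, p3 - p1, p3 - p2]
    return [v + rng.gauss(0.0, sigma) for v in true_values]

random.seed(0)
Y_noisy = measure((1.0, 3.0, 7.0), sigma=0.1)  # six noisy measurements
Y_exact = measure((1.0, 3.0, 7.0), sigma=0.0)  # sigma = 0 recovers the true values
```

With $\sigma = 0$ the six returned values are exactly the three positions and three distances; with $\sigma > 0$ each is perturbed independently, which is the data the least-squares estimate below is built from.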
$$ P(Y_1,Y_2,Y_3,Y_4,Y_5,Y_6 \mid p_1,p_2,p_3) = \prod_{i=1}^{6} P(Y_i \mid p_1,p_2,p_3) $$
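Explicitly, writing $p_4 = d_{12}$, $p_5 = d_{13}$, $p_6 = d_{23}$ for the remaining true quantities, each factor is the density of a Gaussian centered at the corresponding $p_i$, so the log of this product is
$$ \log P(Y_1,\ldots,Y_6 \mid p_1,p_2,p_3) = \sum_{i=1}^{6} \left[ -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{(Y_i - p_i)^2}{2\sigma^2} \right] = -3\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{6} (Y_i - p_i)^2 $$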
We'd like to maximize this quantity over the $p_i$'s, i.e. find the most probable positions explaining our data. Observe that maximizing $\mathcal{L}(p_1, p_2, p_3) := P(Y_1,Y_2,Y_3,Y_4,Y_5,Y_6 \mid p_1,p_2,p_3)$ is the same as maximizing $\log \mathcal{L}(p_1, p_2, p_3)$, which gives:
$$ \begin{aligned} \arg\max_{p_1, p_2 , p_3} \log \mathcal{L}(p_1, p_2 , p_3) & = \arg\max_{p_1, p_2 , p_3} -\frac{1}{2\sigma^2} \sum_{i = 1}^{6} (Y_i - p_i)^2 \\ & = \arg\min_{p_1, p_2 , p_3} \sum_{i = 1}^{6} (Y_i - p_i)^2 \end{aligned} $$ where $p_4 = d_{12}$, $p_5 = d_{13}$ and $p_6 = d_{23}$, and the additive constants in the log-likelihood were dropped since they don't depend on the $p_i$'s and so don't change the maximizer. The last expression is precisely the least-squares problem. We were asked to find its matrix form; for that, let's define $T$ by:
$$ T = \left [ \begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 1 \\ \end{matrix} \right ] $$
Then, for $Y = [Y_1,Y_2,Y_3,Y_4,Y_5,Y_6]^T$ and $\mathbf{p} = [p_1, p_2, p_3]^{T}$ we finally have:
$$ \sum_{i = 1}^{6} (Y_i - p_i)^2 = \| Y - T\mathbf{p} \|^2, $$ so maximizing the log-likelihood over $p_1, p_2, p_3$ is exactly minimizing $\| Y - T\mathbf{p} \|^2$ over $\mathbf{p}$, i.e. an ordinary least-squares problem in matrix form. I hope it helps :)
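As a numerical sanity check, here is a minimal pure-Python sketch (the helper `least_squares` and the example measurements are mine, not from the book): the minimizer of $\|Y - T\mathbf{p}\|^2$ solves the normal equations $T^{\top}T\,\hat{\mathbf{p}} = T^{\top}Y$, which we can do directly by Gaussian elimination.

```python
# Design matrix T: rows 1-3 measure the positions, rows 4-6 the
# signed pairwise differences d_12, d_13, d_23.
T = [
    [ 1,  0,  0],
    [ 0,  1,  0],
    [ 0,  0,  1],
    [-1,  1,  0],
    [-1,  0,  1],
    [ 0, -1,  1],
]

def least_squares(T, Y):
    """Solve the normal equations T^T T p = T^T Y by Gaussian elimination."""
    n = len(T[0])
    # Build the augmented system [T^T T | T^T Y].
    A = [[sum(row[i] * row[j] for row in T) for j in range(n)]
         + [sum(row[i] * y for row, y in zip(T, Y))]
         for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution.
    p = [0.0] * n
    for i in reversed(range(n)):
        p[i] = (A[i][n] - sum(A[i][j] * p[j] for j in range(i + 1, n))) / A[i][i]
    return p

# Consistent (noise-free) measurements for true positions 1, 3, 7:
Y = [1.0, 3.0, 7.0, 2.0, 6.0, 4.0]
p_hat = least_squares(T, Y)  # recovers approximately [1.0, 3.0, 7.0]
```

On noise-free, consistent measurements the true positions are recovered up to floating-point error; with noisy $Y$ the same call returns the least-squares (equivalently, MLE) estimate $\hat{\mathbf{p}}$.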