Suppose we have two samples with a known correlation (which should be relatively high). Say both samples have $n$ data points. Now suppose we still know the correlation coefficient, but one sample consists of only its first 5 data points.
Could one still construct the remaining data points solely using the correlation with the other sample?
My idea would be to look at the relative differences in the known sample and compensate by the correlation. Could this work? Thanks for any assistance.
Use the formula for the correlation coefficient to generate an equation:$$\overline r=\large{\frac{\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)\left(y_{i}-\overline{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_{i}-\overline{x}\right)^2\sum_{i=1}^{n}\left(y_{i} - \overline{y} \right)^2}}}$$
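As a quick check of the formula, here is a minimal Python sketch that evaluates it directly. The function name `pearson_r` and the sample `y` values are illustrative assumptions, not part of the question; NumPy is assumed to be available.

```python
import numpy as np

def pearson_r(x, y):
    """Sample correlation coefficient, written exactly as in the formula above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations x_i - mean(x)
    dy = y - y.mean()  # deviations y_i - mean(y)
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Hypothetical data: y is an exact linear function of x, so r is 1 up to rounding.
x = [1, 3, 5]
y = [2, 4, 6]
r = pearson_r(x, y)
print(r)
```

The result should agree with `np.corrcoef(x, y)[0, 1]`, which computes the same quantity.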
Suppose you have three data points:
$\begin{array}{|c|c|c|c|} \hline x&1&3&5 \\ \hline y&2 & & \\ \hline\end{array}$
And the desired value for $\overline r$ is $0.9$.
The equation becomes
$0.9=\large{\frac{(-2)\cdot \left( 2-\frac{1}{3}\cdot (2+y_2+y_3)\right)+(0)\cdot \left( y_2-\frac{1}{3}\cdot (2+y_2+y_3)\right)+(2)\cdot \left( y_3-\frac{1}{3}\cdot (2+y_2+y_3)\right)}{\sqrt{\left( (-2)^2+0^2+2^2\right) \cdot \left( \left(2-\frac{1}{3}\cdot (2+y_2+y_3) \right)^2+\left(y_2-\frac{1}{3}\cdot (2+y_2+y_3) \right)^2+\left( y_3-\frac{1}{3}\cdot (2+y_2+y_3)\right)^2\right)}}}$
This equation has more than one solution, since it is a single equation in the two unknowns $y_2$ and $y_3$, and it is not easy to solve by hand. But in general it is possible.
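One way to get concrete numbers is to fix one unknown and solve for the other numerically. The sketch below is only an illustration under stated assumptions: the choice $y_3 = 6$ is arbitrary, the helper names `gap` and `bisect` are made up for this example, and plain bisection stands in for whatever root finder you prefer.

```python
import numpy as np

x = np.array([1.0, 3.0, 5.0])
target = 0.9   # desired correlation coefficient
y1 = 2.0       # the one known y value
y3 = 6.0       # ASSUMPTION: arbitrary choice for the second unknown

def gap(y2):
    """Correlation obtained with this y2, minus the target value."""
    y = np.array([y1, y2, y3])
    return np.corrcoef(x, y)[0, 1] - target

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# y2 = 4 puts the points exactly on a line (r = 1), so the target r = 0.9
# is crossed once on each side of it.
y2_low = bisect(gap, 2.0, 4.0)
y2_high = bisect(gap, 4.0, 6.0)
print(y2_low, y2_high)
```

With $y_3 = 6$ this gives two values of $y_2$, roughly $2.32$ and $5.68$, and a different choice of $y_3$ gives different pairs. That is the non-uniqueness in practice: the correlation alone does not pin down the missing points.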