[Figure: the plane and points]
Given a set of non-coplanar points ($\rm\color{green}{P_i}$) in $\color{green}{\text{Coordinate System 1}}$, the projections of those points ($\rm\color{blue}{S_i}$) onto the $\color{blue}{\text{Shadow Plane}}$ in $\color{blue}{\text{Coordinate System 2}}$, and the distance ($\rm\color{blue}{d}$) between the $\color{red}{\text{Light Source}}$ and the $\color{blue}{\text{Shadow Plane}}$ (in the same units as the $\color{blue}{\text{Shadow Plane}}$), how would one determine the positions of the points ($\rm\color{green}{P_i}$) in $\color{blue}{\text{Coordinate System 2}}$? How many point-projection ($\rm\color{green}{P_i}$-$\rm\color{blue}{S_i}$) pairs would be necessary for a fully constrained solution?
Note: $\color{green}{\text{Coordinate System 1}}$ and $\color{blue}{\text{Coordinate System 2}}$ differ by a rotation, a translation, and a scaling.
This can be viewed as a variant of what's known as the resectioning problem: recovering the $3\times4$ camera projection matrix $\mathtt P$ from a set of scene-to-image point correspondences. You can find more detail on how to do this in any standard reference, such as Hartley and Zisserman's *Multiple View Geometry in Computer Vision*. Once you have $\mathtt P$, you can recover from it the information that you need to construct "Coordinate System 2". In particular, you have the following:
If an image point $\mathbf x = (x,y,w)^T$ corresponds to the homogeneous scene point $\mathbf X$, then $\mathbf x$ and $\mathtt P\mathbf X$ are parallel, so $\mathbf x \times \mathtt P\mathbf X = \mathbf 0$. Writing $\mathbf p^{jT}$ for the $j$th row of $\mathtt P$, this becomes $$\begin{bmatrix}\mathbf 0^T & -w\mathbf X^T & y\mathbf X^T \\ w\mathbf X^T & \mathbf 0^T & -x\mathbf X^T \\ -y\mathbf X^T & x\mathbf X^T & \mathbf 0^T\end{bmatrix} \begin{pmatrix}\mathbf p^1 \\ \mathbf p^2 \\ \mathbf p^3\end{pmatrix} = \mathbf 0.$$ This is a set of homogeneous linear equations in the twelve entries of $\mathtt P$. Only two of the three equations are linearly independent, so each point correspondence contributes two equations. Since $\mathtt P$ is defined only up to scale, it has 11 degrees of freedom, so in general one needs $5\frac12$ point correspondences to determine $\mathtt P$ uniquely (in practice six points, using only one of the two equations from the sixth).

This number can be reduced for your problem, since you have other constraints on $\mathtt P$: namely the known camera center $\mathbf C$ (the light source), which satisfies $\mathtt P\mathbf C=\mathbf 0$ and so is effectively another point correspondence, and the distance to the image plane.

There are, of course, degenerate configurations for which the reconstruction is ambiguous: for example, if the camera center and the scene points all lie on a twisted cubic, or if the scene points all lie on the union of a plane and a single line through the camera center. Hartley and Zisserman discuss these degenerate configurations in more detail.
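As a rough illustration, here is a minimal Python sketch that stacks the two independent equations per correspondence and solves for $\mathtt P$ up to scale, then recovers the camera center (the light source) as the null vector of $\mathtt P$. It assumes noise-free data and, for simplicity, uses exact rational Gauss-Jordan elimination to find the null space; with measured data one would instead take the right singular vector of the stacked matrix for its smallest singular value, as in the DLT algorithm in Hartley and Zisserman. The helper names (`dlt_rows`, `null_vector`, `resection`, `camera_center`, `project`), the test matrix `P_true`, and the scene points are all hypothetical, and the sketch does not impose the known-center or known-distance constraints.

```python
from fractions import Fraction


def dlt_rows(X, x):
    """Two independent equations for one correspondence X <-> x.

    X is a homogeneous scene point (4 entries), x = (x, y, w) a homogeneous
    image point. Unknown vector p = (row 1 of P, row 2 of P, row 3 of P).
    """
    xi, yi, wi = x
    zero = [0, 0, 0, 0]
    row1 = zero + [-wi * c for c in X] + [yi * c for c in X]
    row2 = [wi * c for c in X] + zero + [-xi * c for c in X]
    return [row1, row2]


def null_vector(A):
    """Basis vector of the null space of A, which must be 1-dimensional.

    Exact Gauss-Jordan elimination over the rationals (noise-free data only).
    """
    rows = [[Fraction(a) for a in row] for row in A]
    n_cols = len(rows[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        if r == len(rows):
            break
        pr = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pr is None:
            continue  # no pivot in this column: it is a free variable
        rows[r], rows[pr] = rows[pr], rows[r]
        pivot = rows[r][c]
        rows[r] = [v / pivot for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n_cols) if c not in pivots]
    if len(free) != 1:
        raise ValueError(f"null space has dimension {len(free)}, expected 1")
    v = [Fraction(0)] * n_cols
    v[free[0]] = Fraction(1)
    for i, c in enumerate(pivots):
        v[c] = -rows[i][free[0]]
    return v


def resection(scene_pts, image_pts):
    """Recover the 3x4 matrix P, up to scale, from point correspondences."""
    A = []
    for X, x in zip(scene_pts, image_pts):
        A.extend(dlt_rows(X, x))
    p = null_vector(A)
    return [p[0:4], p[4:8], p[8:12]]


def camera_center(P):
    """The camera center C spans the null space of P (P C = 0)."""
    C = null_vector(P)
    return [c / C[3] for c in C]  # normalize, assuming a finite center


def project(P, X):
    """Homogeneous image point x = P X."""
    return [sum(P[i][j] * X[j] for j in range(4)) for i in range(3)]


# Hypothetical test data: a made-up P and seven scene points.
P_true = [[2, 0, 1, 3], [0, 3, 1, -1], [1, 1, 0, 4]]
scene = [(0, 0, 0, 1), (1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1),
         (1, 1, 1, 1), (2, -1, 3, 1), (3, 2, -1, 1)]
image = [project(P_true, X) for X in scene]
P_est = resection(scene, image)
print(camera_center(P_est))
```

For this `P_true` the recovered center should be $(-16/5, -4/5, 17/5, 1)^T$, since $\mathtt P_{\text{true}}$ maps that point to $\mathbf 0$. Seven correspondences are used, more than the minimum; with exact data the extra equations are simply consistent with the others.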