Consider a camera placed in $(XC,YC,ZC)$. The camera is looking down on a flat table $Z=0$:
Introduce a local coordinate system for the camera: $(x,y,z)$. Initially the axes of the local coordinate system for the camera are aligned with the main coordinate system: $(X,Y,Z)$.
Unfortunately during mounting the camera becomes slightly rotated. As a result a vector described in the camera coordinate system: $\vec{dir}$ is expressed as $\vec{\mathit{DIR}}$ the main coordinate system:
$$ \vec{\mathit{DIR}} = R \vec{\mathit{dir}} $$
where R is a rotation matrix. Describing R as a rotation with the angle $\omega$ around the unit vector $[kx,ky,kz]$ I find from Rodrigues' rotation formula: $$ R=\left[\begin{matrix} kx^2(1-\cos(\omega)) + \cos(\omega) & kx ky \left(1 - \cos{\left(\omega \right)}\right) - kz \sin{\left(\omega \right)} & kx kz \left(1 - \cos{\left(\omega \right)}\right) + ky \sin{\left(\omega \right)}\\kx ky \left(1 - \cos{\left(\omega \right)}\right) + kz \sin{\left(\omega \right)} & ky^2(1-\cos(\omega)) + \cos(\omega) & - kx \sin{\left(\omega \right)} + ky kz \left(1 - \cos{\left(\omega \right)}\right)\\kx kz \left(1 - \cos{\left(\omega \right)}\right) - ky \sin{\left(\omega \right)} & kx \sin{\left(\omega \right)} + ky kz \left(1 - \cos{\left(\omega \right)}\right) & kz^2(1-\cos(\omega)) + \cos(\omega) \end{matrix}\right] $$
Now the camera is a line camera (1D). Each pixel in the camera, i, corresponds to an angle $\theta_i$.
A ray from the i'th pixel in the camera has the direction (I am here using the principle of reversibility of light from optics): $$ \vec{\mathit{dir}} = [0,\sin(\theta_i),-\cos(\theta_i)] $$ in the camera coordinate system.
The equation for a ray from the camera is: $$ (X,Y,Z) = (XC,YC,ZC) + s \cdot \vec{\mathit{DIR}} $$ where $s \ge 0$.
The ray intersect the table at $Z=0$: $$ 0 = ZC + s \cdot \mathit{DIR}_Z $$ where $\mathit{DIR}_Z$ is the $z$ component of the $\vec{\mathit{DIR}}$. Solve this for s: $$ s = -\frac{ZC}{\mathit{DIR}_Z} $$ Substitute this into the equation for $X$ and $Y$: $$ X = XC + \alpha ZC $$ $$ Y = YC + \beta ZC $$ where I have introduced $$ \alpha = -\frac{\mathit{DIR}_X}{\mathit{DIR}_Z} $$ $$ \beta = -\frac{\mathit{DIR}_Y}{\mathit{DIR}_Z} $$
I now have a forward model that calculates a position in $R^2$: $(X,Y)$. The model uses 7 parameters that I need to estimate: $$\vec{C}=[XC,YC,ZC,kx,ky,kz,\omega]$$
I have $N$ measurements: $\{(\theta_i,X_i,Y_i)\}$. From these measurements I want to estimate $\vec{C}$.
I have made a Square Error expression involving the measurements and $\vec{C}$:
$$ SE = \frac{1}{2N}\sum_{i=1}^{N} [(XC - Xi + \alpha\ ZC))^2 + (YC - Yi + \beta ZC)^2] $$
I can minimize this expression eg. using scipy.optimize.minimize().
However there is one "snag" that I do not understand the consequences of; the length of the rotation vector must be one: $$ kx^2+ky^2+kz^2=1. $$ If I solve for $\vec{C}$ I could end up with a $\vec{k}$ with a norm different than 1. Could I then simply divide the found rotation by the length of the k vector: $$ \omega_\text{effective} = {\omega \over (\sqrt{kx^2+ky^2+kz^2})} \;? $$ Alternatively I could perhaps express $kx$, $ky$ and $kz$ in spherical coordinates before minimizing?
For instance: $$ \begin{align} kx &= \sin(\phi)\cos(\psi)\\ ky &= \sin(\phi)\sin(\psi)\\ kz &= \cos(\phi) \end{align} $$
As answered by Cesareo I could use constraints with scipy.optimize.minimize().
However does this reduce the possible algorithms I could use (to e.g. SLSQP)?

In another scenario, if you want to find the rotation matrix that specifies the orientation of the camera in space, then the vector $\{ \theta_i \}$ is not sufficient. In addition to $\theta_i$, you need to have an angle $ \phi_i $ that specifies the polar angle with which point $P_i$ was observed at the camera. If $R$ is the rotation matrix, and $C$ is the location of the camera, then
$ P_i = C + R \| P_i - C \| u_i $
where $u_i = ( \sin \theta_i \cos \phi_i , \sin \theta_i \sin \phi_i, \cos \theta_i ) $
and
$ \| P_i - C \| = \sqrt{ (P_i - C)^T (P_i - C) } $
We now define our error function as follows
$ E(C, R) = \displaystyle \sum_i (P_i - C - R \| P_i - C \| u_i )^T (P_i - C - R \| P_i - C \| u_i ) $
Expanding the above expression, and using $u_i^T u_i = 1 $ ($u_i$ is a unit vector ), and $R^T R = I $, we get
$ E(C, R) = \displaystyle \sum_i \bigg( 2 P_i^T P_i + 2 C^T C - 4 P_i^T C - 2 P_i^T R \| P_i - C \| u_i + 2 C^T R \| P_i - C \| u_i \bigg)$
The gradient of this function with respect to $C$ is
$\nabla_C (E) = \displaystyle \sum_i \bigg( 4 C - 4 P_i + 2 \dfrac{(P_i - C)^T R u_i}{\| P_i - C \| } (P_i - C) + 2 \| P_i - C \| R u_i \bigg) $
In addition, we have the matrix $R$, which instead of specifying using an axis of rotation and an angle of rotation, can be specified as the composition of three consecutive rotations, with angles $ \theta , \phi, \beta $ respectively, as follows
$ R = R_z( \phi ) R_y(\theta) R_z( \beta) $
where $R_z$ is the rotation matrix about the $z$ axis, and $R_y$ is the rotation matrix about the $y$ axis by the given angle.
So now our unknowns vector is $X = [C_x, C_y, C_z, \theta, \phi, \beta ]^T $
Computation of the gradient of $E$ with respect to $\theta, \phi, \beta$ is straight forward and it is as follows
$ \dfrac{\partial E}{\partial \theta} = \displaystyle \sum_i \bigg(-2 \| P_i - C \| (P_i - C) ^T \dfrac{\partial R}{\partial \theta} u_i \bigg) $
Similarly,
$ \dfrac{\partial E}{\partial \phi} = \displaystyle \sum_i \bigg(-2 \| P_i - C \| (P_i - C) ^T \dfrac{\partial R}{\partial \phi} u_i \bigg) $
$ \dfrac{\partial E}{\partial \beta} = \displaystyle \sum_i \bigg(-2 \| P_i - C \| (P_i - C) ^T \dfrac{\partial R}{\partial \beta} u_i \bigg) $
And, to evaluate these partial derivatives of the rotation matrix, we have
$ \dfrac{\partial R}{\partial \theta} = R_z(\phi) \dfrac{\partial R_y}{\partial \theta} R_z(\beta) $
$ \dfrac{\partial R}{\partial \phi} = \dfrac{\partial R_z}{\partial \phi} R_y(\theta) R_z(\beta) $
$ \dfrac{\partial R}{\partial \beta} = R_z(\phi) R_y(\theta) \dfrac{\partial R_z}{\partial \beta} $
That's all the formulation we need, the rest is applying optimization that utilizes the gradient information. This is what I did by using the steepest descent method, and it converged successfully to the true values of $C$ and $R$ in the simulation.
The code for minimization using the steepest descent is found in my other answer, which can be used here with obvious modifications.