Hidden variable resultant for solution of a system of polynomial equations


I am trying to implement a method from a paper that solves a practical problem. The core part is solving a system of polynomial equations of the same total degree.

We have a system of 10 polynomial equations in $x, y, z$ of degree 3: $$F_1(x, y,z)=F_2(x,y,z)=...=F_{10}(x,y,z)=0$$
The hidden variable method mentioned in the paper suggests the following representation: $$F_i = f_{i1}(z)x^3 + f_{i2}(z)y^3 + f_{i3}(z)x^2y + f_{i4}(z)xy^2 + ... + f_{i8}(z)x + f_{i9}(z)y + f_{i10}(z)\cdot 1$$ where $f_{ij}$ is a polynomial in $z$ of the corresponding degree, from 0 ($f_{i1}$, for example) to 3 ($f_{i10}$).

Now we can write our system of equations as $$A(z)X=0$$ where $A(z)=\{f_{ij}(z)\}$, $X=\begin{bmatrix}x^3 & y^3 & x^2y & xy^2 & x^2 & y^2 & xy & x & y & 1\end{bmatrix}^T$

From linear algebra we know that this system can have a nontrivial solution only if $\det(A(z))=0$.

So writing down $\det(A(z))$ as a function of $z$ gives us a single-variable polynomial in $z$. Its $k$ roots $\{z_k\}$ give us the possible values of $z$ that a solution of the system can have.

The part above is clear to me, but the next steps are not. It is said that eigenvectors of $A(z_k)$ are solutions of the system! So we have to find an eigenvector, take its 8th and 9th components (which correspond to $x$ and $y$ in the $X$ vector), and we get a valid solution of the whole system. That sounds like magic to me. Is this guaranteed? Will, for example, the 7th component of the vector (which corresponds to $xy$) be the product of the 8th and 9th? What gives such guarantees, etc.? It's also mentioned that this holds because $X$ contains all bivariate monomials in $x$ and $y$ of degree up to 3, and I have no idea why that is so.

I tried to check whether this method works on a system of 3 quadratic equations in $x$ and $y$. For example, consider the system

$$x^2 + 2xy + 3y^2 + 4x + 5y + 6 = 0$$ $$6x^2 + 3xy + 2y^2 + x + 5y + 4 = 0$$ $$3x^2 + 2xy + 6y^2 + 4x + y + 5 = 0$$

It can be rewritten as $A(y)X = 0$ where $$A = \begin{bmatrix}1 & 2y + 4 & 3y^2 + 5y + 6 \\ 6 & 3y + 1 & 2y^2 + 5y + 4 \\ 3 & 2y + 4 & 6y^2 + y + 5 \end{bmatrix}$$ and $$X = \begin{bmatrix}x^2 \\ x \\ 1 \end{bmatrix}$$

The determinant of this matrix is the polynomial $$\det(A(y)) = -37y^3 - 33y^2 + 111y + 43$$

The roots of this polynomial are: $$y_1 \approx -2.069961\\ y_2 \approx -0.364067 \\y_3 \approx 1.5421368$$
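For reference, here is a minimal sketch of this step (using sympy and numpy; this is my own check, not code from the paper): it builds $A(y)$ by collecting the coefficients of $x^2$, $x$ and $1$, computes $\det(A(y))$, and finds its roots numerically.

```python
# Minimal check (sympy + numpy): hide y, build A(y), compute det(A(y)) and its roots.
import numpy as np
import sympy as sp

x, y = sp.symbols('x y')

equations = [
    x**2 + 2*x*y + 3*y**2 + 4*x + 5*y + 6,
    6*x**2 + 3*x*y + 2*y**2 + x + 5*y + 4,
    3*x**2 + 2*x*y + 6*y**2 + 4*x + y + 5,
]

# Each row holds the coefficients of x^2, x^1, x^0; they are polynomials in the hidden y.
A = sp.Matrix([[sp.expand(F).coeff(x, k) for k in (2, 1, 0)] for F in equations])

detA = sp.expand(A.det())
print(detA)                                   # -37*y**3 - 33*y**2 + 111*y + 43
coeffs = [float(c) for c in sp.Poly(detA, y).all_coeffs()]
print(np.roots(coeffs))                       # approx. -2.07, -0.364, 1.542 (in some order)
```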

Now take, for example, $y_1$ and substitute it into $A(y)$:

$A(y_1) = \begin{bmatrix} 1.00000 & -0.13992 & 8.50441 \\ 6.00000 & -5.20988 & 2.21967 \\ 3.00000 & -0.13992 & 28.63848 \end{bmatrix}$

Using the SVD we get an eigenvector corresponding to the zero eigenvalue, $p = \begin{bmatrix} 0.668080 \\ 0.741125 \\ -0.066363 \end{bmatrix} $. Normalizing $p$ by dividing it by $-0.066363$ should give an answer (a valid $x$ such that $(x, y_1)$ satisfies our system), but it's not a valid answer. So no magic happened! :( Nor did it happen for the nullspace vectors of $A(y_2)$ and $A(y_3)$.
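In code, the check I did looks roughly like this (numpy only; the consistency test at the end is my own choice, not something prescribed by the paper):

```python
# Back-substitute y1, take the right singular vector for the smallest singular value,
# scale it so the last entry (the one multiplying "1") equals 1, and test X[0] == X[1]**2.
import numpy as np

y1 = -2.069961
A1 = np.array([
    [1.0, 2*y1 + 4, 3*y1**2 + 5*y1 + 6],
    [6.0, 3*y1 + 1, 2*y1**2 + 5*y1 + 4],
    [3.0, 2*y1 + 4, 6*y1**2 + y1 + 5],
])

_, s, Vh = np.linalg.svd(A1)
print(s[-1])                     # ~0, so A(y1) is (numerically) singular as expected
X = Vh[-1] / Vh[-1, -1]          # nullspace vector scaled so that X[2] = 1
x = X[1]
print(X[0], x**2)                # these do NOT agree, so (x, y1) is not a real solution
```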

Now let's consider a system that has solutions. For example, $$ x + 2y = 5 \\ 3x + 4y = 6$$

This can be rewritten as $$A(y)X = 0$$

where

$$A(y) = \begin{bmatrix} 1 & 2y-5 \\ 3 & 4y - 6 \end{bmatrix}$$ and $$X = \begin{bmatrix} x \\ 1 \end{bmatrix}$$

Then $\det(A(y))$ is the polynomial $-2y + 9$, with root $y_1 = 4.5$.

Substituting it back gives us $A(y_1) = \begin{bmatrix} 1 & 4 \\ 3 & 12 \end{bmatrix}$, whose nullspace is spanned by $p = \begin{bmatrix} -0.97014 & 0.24254 \end{bmatrix}^T$. Dividing it by its last component gives the correct answer $x = -4$, so the magic did happen indeed.
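The same check in code (again just my own numpy sketch):

```python
# Linear example: det(A(y)) = -2y + 9 vanishes at y = 4.5; the nullspace vector of
# A(4.5), scaled so its last entry is 1, yields x = -4.
import numpy as np

y1 = 4.5
A1 = np.array([[1.0, 2*y1 - 5],
               [3.0, 4*y1 - 6]])
_, _, Vh = np.linalg.svd(A1)
p = Vh[-1] / Vh[-1, -1]
print(p[0])                      # -4.0, and indeed x = -4, y = 4.5 solves both equations
```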

I understand that these questions are answered by algebraic geometry, so I started studying it (Cox, Little, O'Shea, Using Algebraic Geometry), but things are going really slowly. With my working schedule, it seems it will take me a year or more to get through it. But I hate using methods that I don't understand at least on some intuitive level.

Can the amount of theory needed be worked through in, say, a week? Any advice on books, or any intuitions or explanations, would be really, really helpful.


There are 2 best solutions below


Don't forget that eigenvectors are not unique: if $v$ is an eigenvector of a matrix $A$, then so is $-v$ (or any nonzero scalar multiple of $v$). By definition, an eigenvector $v$ of $A$ is any nonzero $v$ that satisfies $A v = \lambda v$, and then $-v$ satisfies $A (-v) = \lambda (-v)$ too.

Also, don't forget that a nontrivial solution of $Ax=0$ is a right singular vector of $A$ (associated with a zero singular value), not an eigenvector of $A$. Equivalently, first form the matrix $M=A^{T}A$; the solution is then the eigenvector of $M$ corresponding to the smallest eigenvalue of $M$.
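As a quick illustration (my own sketch, using an arbitrary rank-deficient matrix rather than the question's $A(y)$), both routes give the same direction up to sign:

```python
# Null vector of A via SVD vs. via the eigen-decomposition of M = A^T A.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])        # row 3 = row 1 + row 2, so A is singular

# Route 1: right singular vector for the smallest singular value
_, _, Vh = np.linalg.svd(A)
v_svd = Vh[-1]

# Route 2: eigenvector of M = A^T A for its smallest eigenvalue
M = A.T @ A
w, V = np.linalg.eigh(M)               # eigh returns eigenvalues in ascending order
v_eig = V[:, 0]

print(np.allclose(np.abs(v_svd), np.abs(v_eig)))   # True: same direction up to sign
```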


I happen to have the same doubt!

At first I thought the method doesn't work because the system you posted has no real solutions, but I realized that's not necessarily the case (see the example below).

First, let's show the system has no real solutions.

Just pick the first equation. We can show that it has no real solutions in several ways; maybe the simplest is to solve for $x$ (as if $y$ were fixed): $$x = -y - 2 \pm \sqrt{-2 y^{2} - y - 2} $$ For $x$ to be real, we need the expression inside the square root to be nonnegative. However, that expression is a downward parabola in $y$ (negative coefficient of $y^2$) which never even reaches the horizontal axis. You can see this by solving the quadratic equation $-2 y^{2} - y - 2 = 0$, which gives the complex roots $y = \frac{-1 \pm \sqrt{-15}}{4}$, or by plotting the parabola as I do below:

[Plot of $-2y^{2} - y - 2$]

So no value of $y$ will produce a real $x$; that is, the polynomial equation has no solution over the real numbers, and therefore neither does the polynomial system.

Now going back to the original question:

Why and how does the method work? And you're probably also wondering: does the method work when there are real solutions?

I'm also trying to understand the paper "Five-Point Motion Estimation Made Easy" which is probably where you read about this hidden variable resultant and nullspace technique, so please bear with me.

Let's try to work through an example, to see if we can get a better understanding. To start, I set up a system like yours (2 variables and 3 equations) that I know has real solutions. Having 2 variables allows me to plot it in 2 dimensions to get extra intuition.

Consider those 3 polynomial equations $$ \left\{ \begin{align*} \color{red}{f_1} &= (x -4)^2 + 2(y +2)^2 - 3^2 &=0 \\ \color{green}{f_2} &= 3(x -1)^2 + 3(y +4)^2 - 3\cdot 2^2 &=0 \\ \color{blue}{f_3} &= (x -3)(y +2) &=0 \end{align*} \right. $$

Geometrically, $f_1$ is just an ellipse centered at $(4,-2)$, $f_2$ is a circle centered at $(1,-4)$ with radius 2 (equation multiplied by 3 for added fun), and $f_3$ is the pair of lines $x=3$ and $y=-2$, as shown below in their respective colors:

[Plots of the three curves]

You can visually note that the solutions (the intersections of all 3 shapes) are $\left\{ x=1, y=-2 \right\}$ and $\left\{ x=3, y=-4 \right\}$.

We can rewrite them in the expanded form $$ \left\{ \begin{alignat*}{6} \color{red}{f_1} &= x^2 && && -8x && +2y^2 && +8y && +15 &=0 \\ \color{green}{f_2} &= 3x^2 && && -6x && +3y^2 && +24y && +39 &=0 \\ \color{blue}{f_3} &= && x y && +2x && && -3y && -6 &=0 \end{alignat*} \right. $$

which, hiding variable $y$, can be factored into: $$ \underbrace{\left[\begin{array}{rrr} 1 & -8 & 2 \, y^{2} + 8 \, y + 15 \\ 3 & -6 & 3 \, y^{2} + 24 \, y + 39 \\ 0 & y + 2 & -3 \, y - 6 \end{array}\right]}_{A(y)} \underbrace{\left[\begin{array}{r} x^{2} \\ x \\ 1 \end{array}\right]}_{X} = \left[\begin{array}{r} 0 \\ 0 \\ 0 \end{array}\right] $$

Setting $\det(A)=0$ results in $3 \, y^{3} + 6 \, y^{2} - 48 \, y - 96 =0$, which in turn gives us 3 solutions: $y_1 = -2$, $y_2 = 4$ and $y_3 = -4$.
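In case it helps, here is a small sympy sketch (my own, just mirroring the steps above) that builds $A(y)$ from $f_1, f_2, f_3$ and factors the determinant:

```python
# Hide y, build A(y) from the coefficients of x^2, x, 1, and factor det(A(y)).
import sympy as sp

x, y = sp.symbols('x y')
f1 = (x - 4)**2 + 2*(y + 2)**2 - 3**2
f2 = 3*(x - 1)**2 + 3*(y + 4)**2 - 3*2**2
f3 = (x - 3)*(y + 2)

A = sp.Matrix([[sp.expand(f).coeff(x, k) for k in (2, 1, 0)] for f in (f1, f2, f3)])
detA = sp.factor(A.det())
print(detA)                   # 3*(y - 4)*(y + 2)*(y + 4)
print(sp.solve(detA, y))      # the three candidates: -2, 4 and -4
```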

You might already notice that $y_2 = 4$ will not lead to a valid solution, since none of our solutions has that $y$ value. But let's continue as if we didn't know that.

Now, for each candidate $y_i$, let's compute the SVD $A(y_i)= U \, \Sigma \, V^T$. I rounded the results to 2 significant digits for readability:

$$ A(y_1)= \left[\begin{array}{rrr} -0.83 & -0.55 & 0.00 \\ -0.55 & 0.83 & 0.00 \\ 0.00 & 0.00 & 1.0 \end{array}\right] \left[\begin{array}{rrr} 13. & 0.00 & 0.00 \\ 0.00 & 2.4 & 0.00 \\ 0.00 & 0.00 & \mathbf{0.00} \end{array}\right] \left[\begin{array}{rrr} -0.20 & 0.78 & -0.59 \\ 0.79 & -0.23 & -0.57 \\ \mathbf{0.58} & \mathbf{0.58} & \mathbf{0.58} \end{array}\right] $$

$$ A(y_2)= \left[\begin{array}{rrr} -0.40 & 0.61 & 0.69 \\ -0.91 & -0.33 & -0.23 \\ 0.091 & -0.72 & 0.69 \end{array}\right] \left[\begin{array}{rrr} 200. & 0.00 & 0.00 \\ 0.00 & 7.2 & 0.00 \\ 0.00 & 0.00 & \mathbf{0.00} \end{array}\right] \left[\begin{array}{rrr} -0.016 & 0.046 & -1.0\\ -0.055 & -1.0 & -0.045 \\ \mathbf{1.0} & \mathbf{-0.054} & \mathbf{-0.018} \end{array}\right] $$

$$ A(y_3)= \left[\begin{array}{rrr} 0.87 & 0.38 & -0.31 \\ -0.36 & 0.93 & 0.10 \\ 0.33 & 0.023 & 0.94 \end{array}\right] \left[\begin{array}{rrr} 19. & 0.00 & 0.00 \\ 0.00 & 9.5 & 0.00 \\ 0.00 & 0.00 & \mathbf{0.00} \end{array}\right] \left[\begin{array}{rrr} -0.012 & -0.29 & 0.96 \\ 0.33 & -0.91 & -0.27 \\ \mathbf{-0.94} & \mathbf{-0.31} & \mathbf{-0.10} \end{array}\right] $$

The right nullspace of $A(y_i)$ is given by the span of the right singular vectors (highlighted in bold in $V^T$) corresponding to the zero singular values (also highlighted in bold in $\Sigma$). In all 3 cases there is only 1 zero singular value and therefore one corresponding right singular vector.
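For completeness, this is roughly how the highlighted vectors can be obtained in code (numpy; `A_of_y` is just my own helper name). Note that `np.linalg.svd` returns $V^T$, so the null vector is its last row whenever the smallest singular value is numerically zero:

```python
# Compute, for each candidate y, the smallest singular value and its right singular vector.
import numpy as np

def A_of_y(y):
    # the matrix A(y) from above
    return np.array([
        [1.0, -8.0, 2*y**2 + 8*y + 15],
        [3.0, -6.0, 3*y**2 + 24*y + 39],
        [0.0,  y + 2.0, -3*y - 6.0],
    ])

for y in (-2.0, 4.0, -4.0):
    U, s, Vh = np.linalg.svd(A_of_y(y))
    print(y, s[-1], Vh[-1])    # s[-1] ~ 0; Vh[-1] matches the bold rows of V^T above
```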

Let's pause for a second now: the paper only mentions

Step 4: Back-substitute the real roots of $z$. Compute the null-space of $C(z)$ and extract $x$, $y$ from the null-space.

It does not mention how one extracts $x$ and $y$ from this subspace. You correctly noted that you must first make the solution comply with the definition of $X$ by scaling the nullspace vector so that its last component is $1$. But why not also enforce the other implicit constraint in the definition of $X$, namely $X[0]=X[1]^2$ (where the brackets represent array indexing, that is, $X_0=X_1^2$)? That is, our definition of $X$ brings two implicit constraints:

  • Constraint 1: $X[0]=X[1]^2$
  • Constraint 2: $X[2]=1$

For this particular problem one can argue that, for each $y_i$, the nullspace is 1-dimensional (it has only 1 degree of freedom), so a single constraint suffices to fix that degree of freedom; still, both constraints have to be taken into consideration. Let's do that:

Let's call $\left[ a, b, c \right]^T$ the right singular vector associated with the $0$ singular value for a particular value of $y$:

$$ \mathrm{null}\left({A(y)}\right) = \mathrm{span} \left\{ \begin{bmatrix} a \\ b \\ c \\ \end{bmatrix} \right\} = \left\{ \begin{bmatrix} \lambda a \\ \lambda b\\ \lambda c\\ \end{bmatrix} : \lambda \in \mathbb{R} \right\} $$

  • Constraint 1 translates to $\lambda a = (\lambda b)^2 $ which means $\lambda = \frac{a}{b^2}$
  • Constraint 2 translates to $\lambda c = 1 $ which means $\lambda = \frac{1}{c}$

It is clear that, unless both values of $\lambda$ are the same, we won't be able to fulfill both conditions, and therefore that particular $y$ will not lead to a valid solution.

Let's apply that to our 3 candidate $y$'s and, whenever both lambdas coincide, calculate the solution:

$\require{cancel}$ \begin{alignat*}{3} \mathrm{null}\left({A(y_1)}\right) &= \mathrm{span} \left\{ \begin{bmatrix} 0.58 \\ 0.58 \\ 0.58 \\ \end{bmatrix} \right\} &\; \Longrightarrow \; &\lambda = \begin{cases} 1.73, & \text{for Constraint 1} \\ 1.73, & \text{for Constraint 2} \end{cases} &\; \Longrightarrow &\; s_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ \end{bmatrix} \\ \mathrm{null}\left({A(y_2)}\right) &= \mathrm{span} \left\{ \begin{bmatrix} 1.0 \\ -0.054 \\ -0.018 \\ \end{bmatrix} \right\} &\; \Longrightarrow \; &\lambda = \begin{cases} 336.67, & \text{for Constraint 1} \\ -55.09, & \text{for Constraint 2} \end{cases} &\; \Longrightarrow &\; \xcancel{s_2} \\ \mathrm{null}\left({A(y_3)}\right) &= \mathrm{span} \left\{ \begin{bmatrix} -0.94 \\ -0.31 \\ -0.10 \\ \end{bmatrix} \right\} &\; \Longrightarrow \; &\lambda = \begin{cases} -9.54, & \text{for Constraint 1} \\ -9.54, & \text{for Constraint 2} \end{cases} &\; \Longrightarrow &\; s_3 = \begin{bmatrix} 9 \\ 3 \\ 1 \\ \end{bmatrix} \end{alignat*}

We can now read the value of $x$ as $X[1]$ (or $\sqrt{X[0]}$, since it is the same here) and form the solution candidates $\left\{ x=1, y=-2 \right\}$ and $\left\{ x=3, y=-4 \right\}$.
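In code, the constraint check could look like this (a sketch; `extract_x` is my own helper name, and the tolerance is deliberately loose because the null vectors below are the 2-digit roundings shown above):

```python
# Given a nullspace basis vector [a, b, c], compare the lambda implied by X[0] = X[1]**2
# with the lambda implied by X[2] = 1; if they agree, scale the vector and read off x.
import numpy as np

def extract_x(null_vec, rtol=5e-2):
    a, b, c = null_vec
    lam1 = a / b**2              # Constraint 1: lambda*a == (lambda*b)**2
    lam2 = 1.0 / c               # Constraint 2: lambda*c == 1
    if not np.isclose(lam1, lam2, rtol=rtol):
        return None              # constraints incompatible -> this y is spurious
    X = lam2 * np.asarray(null_vec, dtype=float)   # now X = [x**2, x, 1]
    return X[1]

# Null vectors as rounded above (hence the loose tolerance), keyed by their y value:
candidates = {-2.0: [0.58, 0.58, 0.58],
               4.0: [1.0, -0.054, -0.018],
              -4.0: [-0.94, -0.31, -0.10]}

for y, v in candidates.items():
    print(y, extract_x(v))       # x ~ 1 for y = -2, None for y = 4, x ~ 3 for y = -4
```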

Note that, in your example, all possible solutions die at this stage.

As a last step we should not forget to substitute each solution candidate into the original system of equations and verify that it is indeed a solution.

Let's recap the logic behind this last part of the algorithm:

It is clear that any solution of the original problem must belong to the nullspace of $A$ (since the original system of equations is equivalent to $A X =0$), but we know that $X$ is not made of 3 independent variables (that is, we have created implicit constraints in the definition of $X$). Therefore we proceed in 2 steps: first we disregard the implicit constraints on $X$ (we solve for the nullspace of $A$), and then we apply those constraints to the previous set (the nullspace). In effect we are intersecting 2 sets: the set of solutions of $A X =0$ and the set of triples of real numbers that fulfill the implicit constraints. Note that the intersection could be empty, as in your example.

This is a recurring idea in the literature; in fact, it has already been used in the paper itself. That is, the 5-point algorithm stems from the idea of initially using only some of the constraints of the problem (the epipolar constraints) to obtain the set that satisfies those constraints (a 4-dimensional nullspace in this case), and then intersecting that set with the set of valid essential matrices.

For yet another example of intersecting a nullspace with constraints, check the famous paper "Hand-Eye Calibration Using Dual Quaternions", particularly section 6, in which the author forms a linear system of equations (see eq. 31) on variables that have some constraints (see eq. 32). He then obtains a 2-dimensional nullspace and intersects it with the constraints to obtain the final solution.

Regarding the mysterious sentence in "Five-Point Motion Estimation Made Easy":

We notice that the monomial vector of eq.(6) has exhausted all the bi-variate combinations of variables $x$ and $y$ up to order three. Therefore, one can find the solutions of $x$ and $y$ simply as the right null-space of C(z)

I also fail to see an obvious connection between the facts presented here and the fact that the monomial vector contains all bivariate combinations of the variables $x$ and $y$ up to order three.

Conclusion

Finally, let's conclude with this remark, extracted from the paper "Are resultant methods numerically unstable for multidimensional rootfinding?":

hidden-variable resultant methods are notoriously difficult, if not impossible, to make numerically robust. Most naive implementations will introduce unwanted spurious solutions, compute roots inaccurately, and unpredictably miss zeros [8]. Spurious solutions can be removed by manually checking that all the solutions satisfy (1.1), inaccurate roots can usually be polished by Newton's method, but entirely missing a zero is detrimental to a global root finding algorithm.

My advice, if you're working on Visual Odometry, is to check newer methods such as "An Efficient Solution to Non-Minimal Case Essential Matrix Estimation", "Certifiable Relative Pose Estimation" or "Graduated Non-Convexity for Robust Spatial Perception: From Non-Minimal Solvers to Global Outlier Rejection".

I hope that helps.