Can you efficiently determine the number of possible solutions for an arbitrary starting sudoku configuration?

55 Views Asked by At

I thought it would be fun to implement a solver which updates in real time to show how many possibilities remain as you fill in squares in any order. I'm able to find a number of resources explaining how to calculate the number of possible solutions exist in total, and they go into detail about some reductions that can be made by enumerating the possibilities for the top three rows, but it's not clear to me if or how that analysis can be mapped to, say, a board with 7 digits filled in randomly. The more I skim around the more computationally infeasible this seems, but I have yet to find any literature directly addressing this question. It feels plausible it could be done with a sufficiently clever pre-computed lookup table.

1

There are 1 best solutions below

0
On

Because of the way that the cells in a sudoku are tightly bound to one another, there is likely no easy way to count the number of possible solutions without explicitly finding them.

To give a very simple demonstration of this effect, if we look at 4x4 sudoku (because it's fairly simple to enumerate all the solutions), look at how adding given digits in different arrangements changes the number of solutions:

A 4x4 sudoku grid with a 1 in the top-left corner. With this grid, there are 72 solutions.

A 4x4 sudoku grid with a 1 in the top-left corner and a 2 next to it. Putting a 2 here reduces the number of solutions to 24.

A 4x4 sudoku grid with a 1 in the top-left corner and a 2 in the third column of the second row. Put the 2 here instead and now there are only 12 solutions.

A 4x4 sudoku grid with a 1 in the top-left corner and a 2 in the bottom-right corner. Put it here, and there are 18 solutions.

A 4x4 sudoku grid with 1s in the top-left corner and the third column of the second row. Putting a 1 in this position instead leaves us with 36 solutions.

A 4x4 sudoku grid with 1s in opposite corners. Putting it here is 18 solutions.

A 4x4 sudoku with 1s in adjacent cells. And of course putting it here reduces the solutions to zero.

As a bit of a basic test, I took a random sudoku from online and put it into f-puzzles. The solve took basically no time at all (because it was mostly a case of placing single digits with a small number of pairs). Removing a single given digit resulted in 3 possible solutions, which the solver took about 5 seconds to enumerate. Changing the same digit to another value resulted in a unique solution which also took about 5 seconds to find (because the solver had to use several more sophisticated techniques and eventually hit a point where it just brute forced the solve).

Using MiniZinc, which is an optimisation modelling language, and the Gecode solver, which is designed for combinatorial problems, you can certainly enumerate the solutions quite quickly in comparison to f-puzzles, and you can also explore the searches it uses to see that it sometimes has to search deeper to prove that a solution doesn't exist under certain conditions than to find an actual solution. And once you have more than, say, 10,000 valid solutions to a grid, even a really efficient solver is going to take a while to find them all, likely much longer than you'd want for a system that updates in real time.

I think you could definitely borrow ideas from both specialised and generalised solvers and do some rudimentary counting of solutions, but if I were implementing it I would probably do some variation of the following:

  1. Use fast constraint elimination methods to confirm that there is still probably a solution (without explicitly finding one).

  2. For fewer than, say, 10-15 filled digits, use some heuristic combinatorial formulas to approximate the number of solutions.

  3. Once the number of givens reaches a critical mass, you can start counting actual solutions, but cap the search time to keep it from making the whole thing too clunky.

You may be able to take advantage of the dynamic nature of the system slightly, if your solver can use information from previous runs to start from a more informed position (e.g. if you add a digit then it should be able to use the constraints from before and just add in the extra information to reduce its search space).