Interactively view the steps that this app uses to process the image using OpenCV, classify the digits using Tensorflow, and then solve the puzzle using a recursive backtracking algorithm.
The first step in processing the image is thresholding to binarize the image. Simple thresholding - applying a global threshold to binarize the entire image - would not be sufficient for this task since the image is likely affected by different environment variables such as inconsistent lightning conditions throughout the image. Adapative thresholding is helpful because the algorithm determines thresholds based on neighboring pixels, which should have similar lighting conditions. This step helps to detect the correct contours.
This step shows the outline of the sudoku grid found in the image. A contour is the outline of a shape. In this step, the algorithm detects all the polygons found in the image. Filtering is then done to detect the polygon that has 4 vertices and a large area, which should be the contour of the sudoku grid.
The next step is a geometric transformation. Finding the vertices of the sudoku grid allows us to isolate and warp the grid such that we have a bird's eye view of the puzzle. This is done by finding the transformation matrix given the 4 vertices from the image and the 4 destination vertices. Notice how even if you rotate or gently warp the sudoku puzzle, it will still be properly transformed to the bird's eye view. The transformation matrix will be important later as the inverse of that matrix is used to project the solution back onto the original image.
Trained a Tensorflow model using Python and serialized the model to produce inference in the browser with Tensorflow JS. The model was trained on MNIST (handwritten digits) in addition to a font dataset found here. I augmented the MNIST dataset with a font dataset because sudoku puzzles are typically generated using computer generated fonts rather than hand written digits, so adding this diversity improved the model to recognize digits.
I implemented a backtracking algorithm to solve the sudoku puzzle. This is a recursive algorithm that incrementally incrementally tries new solutions, checks if the solution is valid, and if it isn't valid then it 'backtracks' to a previous candidate that has more untried search branches. In this case, it determines the possible values for a cell and then recursively traverses the search tree from top left to bottom right until the board is completed. Backtracking algorithms are more efficient than brute force methods because they don't need to try every possible combination.
The final step. Inverse the perspective transformation matrix described above to project the predicted cell values and the solution back onto the original image, providing an AR-like feel.