Detect clusters in an RGB space

With OCR, one of the first stages in a processing pipeline is to accurately identify what is print and what is paper. This is generally achieved by some form of binarization.

In the good old days most scans were captured as grayscale images. For clean images with a high level of contrast, i.e. already largely black and white, Otsu's method (https://en.wikipedia.org/wiki/Otsu%27s_method) worked quite well.
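
For reference, here is a minimal sketch of what I mean by Otsu's method: it picks the grey level that maximises the between-class variance of the two resulting groups of pixels. It assumes the image has already been flattened to a list of 8-bit grey levels; the function name `otsu_threshold` and the sample values are just my illustration, not taken from any library.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0..255) that maximises between-class variance.

    Pixels <= t form one class (e.g. print), pixels > t the other (e.g. paper).
    """
    # Histogram of the 256 possible grey levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of grey levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break               # bright class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Hypothetical sample: dark print around 20-30, bright paper around 200-220.
sample = [20] * 50 + [30] * 50 + [200] * 100 + [220] * 100
threshold = otsu_threshold(sample)
binary = [0 if p <= threshold else 1 for p in sample]
```

This works when the histogram is clearly bimodal, which is exactly the assumption that breaks down in the cases described next.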

However, once things like yellowing of paper, brown water stains, notes in blue biro, and other artefacts that appear as dark grey or black in a greyscale image are added into the mix, along with faded print, we find ourselves in trouble.

These days almost all scanners and smartphones capture colour images, and I would like to take advantage of the three-channel nature of RGB colour, versus the single intensity channel of a grayscale image, to improve the accuracy of the process.

To that end, can you please help me with links to methods/algorithms that identify the centres of clusters of colour in a 3D RGB space?
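
To make the goal concrete, here is a rough sketch of the kind of output I am after. It uses plain k-means (Lloyd's algorithm) purely as a stand-in, with a simple farthest-point initialisation so that it is deterministic. The function names, the choice of `k`, and the sample pixels are all my own assumptions for illustration, and I do not know whether k-means is the right tool here.

```python
def dist2(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_rgb(pixels, k, iterations=20):
    """Return k cluster centres for a list of (r, g, b) tuples."""
    # Farthest-point initialisation: start from the first pixel, then
    # repeatedly add the pixel farthest from all centres chosen so far.
    centres = [pixels[0]]
    while len(centres) < k:
        centres.append(
            max(pixels, key=lambda p: min(dist2(p, c) for c in centres))
        )

    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centre.
        groups = [[] for _ in range(k)]
        for p in pixels:
            nearest = min(range(k), key=lambda i: dist2(p, centres[i]))
            groups[nearest].append(p)

        # Update step: each centre moves to the mean of its group
        # (an empty group keeps its previous centre).
        new_centres = [
            tuple(sum(channel) / len(g) for channel in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
        if new_centres == centres:
            break               # converged
        centres = new_centres
    return centres


# Hypothetical sample: black print, yellowed paper, blue biro.
sample_pixels = [
    (20, 20, 25), (30, 25, 20), (25, 30, 30),
    (230, 220, 180), (240, 225, 190), (235, 215, 185),
    (30, 40, 200), (40, 50, 210), (35, 45, 190),
]
centres = kmeans_rgb(sample_pixels, k=3)
```

Ideally I would end up with one centre per colour population (print, paper, biro, stains), which I could then use to decide which pixels count as print.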