How do I split a number grid into rectangular polygons?

140 Views Asked by At

I am starting with a grid of numbers of a similar format to this:

4,4,4,4,4,4,4,2,9,9,9,1,1,1,1,
4,4,4,4,4,2,2,1,9,9,9,1,1,1,1,
4,4,2,4,4,2,1,1,9,1,9,1,1,9,1,
2,1,2,1,1,1,2,1,1,1,1,1,9,1,1,
9,9,9,1,1,1,1,9,9,9,9,9,2,1,1,

Each horizontally or vertically contiguous grouping of digits corresponds to a particular polygon. For instance, the 9s form 5 distinct polygons, while the 4s form a single one.

enter image description here

I am attempting to programatically determine the vertices of these shapes. A pseudocode version of my algorithm would be:

foreach cell in the input
  if the cell has been visited
    continue
  else
    recursively add all neighbors that match the cell's digit to a list of cells
    when all neighbors have been added to the list, 
      find the top-most and then left-most cell. (I believe this is guaranteed to be a vertex)

      starting from the top-left corner of the cell, trace the polygon clockwise 
    endwhen
    set all cells in the list to "visited"'
endforeach

I have used this method successfully for simple polygons, both convex and concave, but there is one major flaw in this method -- it will never find holes in complex polygons.

I think I have determined an algorithm that will find the holes post-facto, but I thought I would take a step back and see if there wasn't a better, more-robust way of determining both vertices and holes simultaneously.

Any critiques / suggestions? Or am I on the right track, here?

1

There are 1 best solutions below

6
On BEST ANSWER

It can be much simpler than that, actually.

Add four boolean attributes for each cell, corresponding to the left, right, top, and bottom edges (borders). If the value in a cell differs from the value in a neighboring cell, then the border exists in the current cell.

If we visualize OP's data using this scheme, we get something like

   ▛▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▜▛▀▜▛▀▀▀▀▀▀▀▜▛▀▀▀▀▀▀▀▀▀▀▜
   ▌4  4  4  4  4  4  4▐▌2▐▌9  9  9▐▌1  1  1  1▐
   ▌              ▄▄▄▄▄▟▙▄▟▌       ▐▌          ▐
   ▌             ▐▛▀▀▀▀▜▛▀▜▌       ▐▌          ▐
   ▌4  4  4  4  4▐▌2  2▐▌1▐▌9  9  9▐▌1  1  1  1▐
   ▌     ▄▄▄     ▐▌  ▄▄▟▌ ▐▌  ▄▄▄  ▐▌     ▄▄▄  ▐
   ▌    ▐▛▀▜▌    ▐▌ ▐▛▀▀  ▐▌ ▐▛▀▜▌ ▐▌    ▐▛▀▜▌ ▐
   ▌4  4▐▌2▐▌4  4▐▌2▐▌1  1▐▌9▐▌1▐▌9▐▌1  1▐▌9▐▌1▐
   ▙▄▄▄▄▟▌ ▐▙▄▄▄▄▟▙▄▟▙▄▄  ▐▙▄▟▌ ▐▙▄▟▌  ▄▄▟▙▄▟▌ ▐
   ▛▀▜▛▀▜▌ ▐▛▀▀▀▀▀▀▀▜▛▀▜▌  ▀▀▀   ▀▀▀  ▐▛▀▜▛▀▀  ▐
   ▌2▐▌1▐▌2▐▌1  1  1▐▌2▐▌1  1  1  1  1▐▌9▐▌1  1▐
   ▙▄▟▙▄▟▙▄▟▌       ▐▙▄▟▙▄▄▄▄▄▄▄▄▄▄▄▄▄▟▙▄▟▌    ▐
   ▛▀▀▀▀▀▀▀▜▌        ▀▀▜▛▀▀▀▀▀▀▀▀▀▀▀▀▀▜▛▀▜▌    ▐
   ▌9  9  9▐▌1  1  1  1▐▌9  9  9  9  9▐▌2▐▌1  1▐
   ▙▄▄▄▄▄▄▄▟▙▄▄▄▄▄▄▄▄▄▄▟▙▄▄▄▄▄▄▄▄▄▄▄▄▄▟▙▄▟▙▄▄▄▄▟

If you implement a simple destructive walker, that can start at any given cell with one to four borders, consumes the borders it walks, and walks the region boundary in counterclockwise (or clockwise) order, you get exactly the polygons you want. Note that holes in polygons will then be drawn in the opposite order.

Note that you can use a double loop to examine each cell whether a walk should be started or not. The walker itself is a simple loop: if you walk the borders keeping the border on the right side (as if you kept your right hand on it), you are guaranteed to walk the boundary counterclockwise, and arrive at the point you started from. So, other than the edge information (4 bits per cell), no extra memory is needed.

Note the "rounded" corners. If you walk the regions counterclockwise, keeping the border on the right side, these occur when you arrive in a cell with no borders at all, and need to turn right (with respect to the direction you just walked). If you cannot turn right, the walk is complete.

(I used a small C program with wide-character output to draw the above Unicode diagram, just to verify the details, so this does work.)


Here is an example program, that reads a grid (of up to 252×252 cells) from standard input, accepting only decimal digits (0 to 9), and outputs an SVG image to standard output describing the grid, grid edges, and the polygons generated:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

#define   VALUE        15

#define   EDGE_TOP     16
#define   EDGE_RIGHT   32
#define   EDGE_BOTTOM  64
#define   EDGE_LEFT   128

#define   UP            0
#define   RIGHT         1
#define   DOWN          2
#define   LEFT          3

#ifndef   MAX_ROWS
#define   MAX_ROWS    254
#endif
#ifndef   MAX_COLS
#define   MAX_COLS    254
#endif
#ifndef   XSTEP
#define   XSTEP        32
#endif
#ifndef   YSTEP
#define   YSTEP        32
#endif

const char *svg_digit[16] = {
    "m0-8c-5 0-5 4-5 8s0 8 5 8 5-4 5-8 0-8-5-8z",
    "m-2-5 2-3v16",
    "m5+8h-10v-3c0-4 10-6 10-10 0-2-2-3-5-3s-5 1-5 3",
    "m-5+5c0 2 2 3 5 3s5-2 5-4c0 -3-2-4-5-4 3 0 5-1 5-4 0-2-2-4-5-4s-5 1-5 3",
    "m4-0h-9l7-8v16",
    "m5-8h-10v6c8 0 10 1 10 6 0 3-2 4-5 4-3 0-5-1-5-3",
    "m4-8c-8 0-8 8-8 11s1 5 4 5 4-2 4-4-1-4-4-4-4 2-4 4",
    "m-4-8h8v3s-4 7-4 13",
    "m-4-4c0-3 1-4 4-4s4 1 4 4c0 4-8 4-8 8 0 3 1 4 4 4s4-1 4-4c0-4-8-4-8-8z",
    "m-4+8c8 0 8-8 8-11s-1-5-4-5-4 2-4 4 1 4 4 4 4-2 4-4",
    "", "", "", "", "", ""
};

const char *svg_color[16] = {
    /* 0 */ "#000",
    /* 1 */ "#ff0000",
    /* 2 */ "#00ff00",
    /* 3 */ "#cccc00",
    /* 4 */ "#0000ff",
    /* 5 */ "#cc00cc",
    /* 6 */ "#00cccc",
    /* 7 */ "#999999",
    /* 8 */ "#ff9966",
    /* 9 */ "#6699ff",
    "#000", "#000", "#000", "#000", "#000"
};

int main(void)
{
    unsigned char cell[1+MAX_ROWS+1][1+MAX_COLS+1];
    int           rows = 0, cols = 0;
    int           row  = 1, col  = 1;
    int           ch;

    /* Clear grid. '0' is "outside". */
    memset(cell, 0, sizeof cell);

    /* Read decimal digit grid from standard input.
       For simplicity, the grid has one extra (empty, 0)
       cell around the actual data. */
    while ((ch = getc(stdin)) != EOF)
        if (ch == '\n') {
            row++;
            col = 1;
            continue;
        } else
        if (ch >= '0' && ch <= '9') {
            if (row <= MAX_ROWS && col <= MAX_ROWS)
                cell[row][col] = ch - '0';
            if (rows < row)
                rows = row;
            if (cols < col)
                cols = col;
            col++;
        }

    if (!rows || !cols) {
        fprintf(stderr, "No grid in standard input.\n");
        return EXIT_FAILURE;
    }
    if (rows > MAX_ROWS || cols > MAX_COLS) {
        fprintf(stderr, "Standard input grid is too large.\n");
        return EXIT_FAILURE;
    }

    /* Define the edges. */
    for (row = 1; row <= rows; row++)
        for (col = 1; col <= cols; col++) {
            if ((cell[row][col] & VALUE) != (cell[row][col-1] & VALUE))
                cell[row][col] |= EDGE_LEFT;
            if ((cell[row][col] & VALUE) != (cell[row][col+1] & VALUE))
                cell[row][col] |= EDGE_RIGHT;
            if ((cell[row][col] & VALUE) != (cell[row-1][col] & VALUE))
                cell[row][col] |= EDGE_TOP;
            if ((cell[row][col] & VALUE) != (cell[row+1][col] & VALUE))
                cell[row][col] |= EDGE_BOTTOM;
        }

    /* Output an SVG image showing the outlines. */
    printf("<?xml version=\"1.1\" encoding=\"UTF-8\" standalone=\"no\"?>\n");
    printf("<svg xmlns=\"http://www.w3.org/2000/svg\" viewBox=\"0 0 %d %d\">\n",
                 (cols+2)*XSTEP, (rows+2)*YSTEP);

    /* White background rectangle. */
    printf("<rect x=\"0\" y=\"0\" width=\"%dpx\" height=\"%dpx\" fill=\"#fff\" stroke=\"none\" />\n",
                 (cols+2)*XSTEP, (rows+2)*YSTEP);

    /* All values first. */
    printf("<path d=\"");
    for (row = 1; row <= rows; row++)
        for (col = 1; col <= cols; col++)
            printf("M %d%+d %s", col*XSTEP+XSTEP/2, row*YSTEP+YSTEP/2, svg_digit[cell[row][col] & VALUE]);
    printf("\" fill=\"none\" stroke=\"#000\" stroke-width=\"1px\" stroke-linejoin=\"round\" stroke-linecap=\"round\" />\n");

    /* All edge segments. */
    printf("<path d=\"");
    for (row = 0; row <= rows + 1; row++)
        for (col = 0; col <= cols + 1; col++) {
            if (cell[row][col] & EDGE_TOP)
                printf("M %d,%d h %d", col*XSTEP+1, row*YSTEP+1, XSTEP-2);
            if (cell[row][col] & EDGE_BOTTOM)
                printf("M %d,%d h %d", col*XSTEP+1, (row+1)*YSTEP-2, XSTEP-2);
            if (cell[row][col] & EDGE_LEFT)
                printf("M %d,%d v %d", col*XSTEP+1, row*YSTEP+1, YSTEP-2);
            if (cell[row][col] & EDGE_RIGHT)
                printf("M %d,%d v %d", (col+1)*XSTEP-2, row*YSTEP+1, YSTEP-2);
        }
    printf("\" stroke=\"#000\" stroke-width=\"1px\" fill=\"none\" />\n");

    /* Walker start loop. Note that each closed loop must have at least one top edge. */
    for (row = 1; row <= rows; row++)
        for (col = 1; col <= cols; col++)
            if (cell[row][col] & EDGE_TOP) {
                const unsigned char v = cell[row][col] & VALUE;
                int                 r = row;
                int                 c = col;
                int                 d = RIGHT;

                printf("<path title=\"%d\" d=\"M %d %d", v, c*XSTEP, r*YSTEP);

                while (1)
                    if (d == RIGHT) { /* Top edge, right */
                        cell[row][col] &= ~EDGE_TOP;
                        printf(" h%+d", XSTEP);

                        if ((cell[row][col+1] & VALUE) == v &&
                            (cell[row-1][col+1] & VALUE) == v &&
                            cell[row-1][col+1] & EDGE_LEFT) {
                            --row;
                            ++col;
                            d = UP;
                        } else
                        if ((cell[row][col+1] & VALUE) == v &&
                            cell[row][col+1] & EDGE_TOP) {
                            ++col;
                            d = RIGHT;
                        } else
                        if (cell[row][col] & EDGE_RIGHT) {
                            d = DOWN;
                        } else
                            break;

                    } else
                    if (d == DOWN) { /* Right edge, down */
                        cell[row][col] &= ~EDGE_RIGHT;
                        printf(" v%+d", YSTEP);

                        if ((cell[row+1][col] & VALUE) == v &&
                            (cell[row+1][col+1] & VALUE) == v &&
                            cell[row+1][col+1] & EDGE_TOP) {
                            ++row;
                            ++col;
                            d = RIGHT;
                        } else
                        if ((cell[row+1][col] & VALUE) == v &&
                            cell[row+1][col] & EDGE_RIGHT) {
                            ++row;
                            d = DOWN;
                        } else
                        if (cell[row][col] & EDGE_BOTTOM) {
                            d = LEFT;
                        } else
                            break;

                    } else
                    if (d == LEFT) { /* Bottom edge, left */
                        cell[row][col] &= ~EDGE_BOTTOM;
                        printf(" h%+d", -XSTEP);

                        if ((cell[row][col-1] & VALUE) == v &&
                            (cell[row+1][col-1] & VALUE) == v &&
                            cell[row+1][col-1] & EDGE_RIGHT) {
                            ++row;
                            --col;
                            d = DOWN;
                        } else
                        if ((cell[row][col-1] & VALUE) == v &&
                            cell[row][col-1] & EDGE_BOTTOM) {
                            --col;
                            d = LEFT;
                        } else
                        if (cell[row][col] & EDGE_LEFT) {
                            d = UP;
                        } else
                            break;

                    } else
                    if (d == UP) { /* Left edge, up */
                        cell[row][col] &= ~EDGE_LEFT;
                        printf(" v%+d", -YSTEP);

                        if ((cell[row-1][col] & VALUE) == v &&
                            (cell[row-1][col-1] & VALUE) == v &&
                            cell[row-1][col-1] & EDGE_BOTTOM) {
                            --row;
                            --col;
                            d = LEFT;
                        } else
                        if ((cell[row-1][col] & VALUE) == v &&
                            cell[row-1][col] & EDGE_LEFT) {
                            --row;
                            d = UP;
                        } else
                        if (cell[row][col] & EDGE_TOP) {
                            d = RIGHT;
                        } else
                            break;

                    } else
                        break; /* Should never occur. */

                printf("z\" fill=\"none\" stroke=\"%s\" opacity=\"0.5\" stroke-width=\"3px\" />\n", svg_color[v]);
            }

    printf("</svg>\n");

    return EXIT_SUCCESS;
}

Using OP's example input, the resulting SVG image looks like this: Grid image If you open the SVG image in e.g. Inkscape, you can select and examine each individual polygon. The polygon outlines are thicker, with 50% opacity.

It is not difficult to see or prove that if there are $N$ cells, the above uses $O(N)$ space complexity (the cell[][] array; $4 N$ bits for the edge data), and $O(N)$ time complexity. (For the time complexity, note that in the while loop, each cell can be visited at most $4 N$ times, because each visit eliminates one edge.)