I am working on an algorithms problem, and I came up with a solution that I do not believe gives the optimal result, but I don't know how to go about showing it mathematically or coming up with a counter example. The problem is:
Suppose you have a $n \times m$ grid of tiles, where each tile is either a water tile (representing a lake) or a land tile. Suppose that this grid is entirely surrounded by the ocean. A water tile is connected to another water tile if it is adjacent to it, but diagonal "adjacency" do not qualify as connected, e.g., if there's a water tile at (1,1) and another at (2,2), these 2 are not considered to be connected. A lake is defined as one or more adjacently connected water tiles. In one step, you can convert any land tile to a water tile. What is the minimum number of steps to connect all lakes to the ocean? Note that there could be water tiles in the grid initially that are already connected to the ocean, e.g., any coordinate $(i,j )$ (assume 0-based) where $i \in \{0, n - 1\}$ or $j \in \{0, m - 1\}$ that are a water tile are connected to the ocean, or any water tile that has a path to the border.
The idea that I have is you repeatedly find the lake that is closest to the ocean. If there are multiple lakes with the same minimal distance to the ocean, randomly choose one. Connect that lake to the ocean using the path that requires the least number of steps. If there are multiple paths that require the same minimum number of steps, randomly choose one. And then find the next lake that is closest to the ocean with the addition that the previously connected lake is part of the ocean now. So you do this recursively until you are left with no lakes. The underlying assumption it is optimal for each lake (after already connecting the closer lakes to the ocean) to be connected to the closest ocean.
Here's an example below. I couldn't get the format to align properly on SE, so I'm pasting a picture from my text editor. This is a $7 \times 8$ grid where $L$ means land and $W$ means water. There are 3 lakes. One with the coordinates $\{(5, 0), (5, 1), (5, 2)\}$. One with the coordinates $\{(3, 3), (4, 3)\}$. The last one with coordinates $\{(2,5)\}$. Note that the water tiles at $(3,6), (3, 7)$ are already connected to the ocean initially.
I don't think my idea is optimal for a few reasons: (1) I don't think I can just randomly choose a lake if there are multiple lakes of the same minimal distance from the ocean. (2) For the lake with the minimum distance to the ocean, there could be multiple ways to connect it to the ocean for the same exact cost, and here, I choose the path randomly, which I don't think should be allowed. (3) Although, I can't think of a quick example right now, but I think there might be a situation where it's not optimal to connect a lake to the closest ocean.
After some more thinking, I can come up with a simple example to show (2).
My approach definitely doesn't work, but I'll leave it here for now. If anyone has any ideas, please share.


At first I thought this was the minimum spanning tree (MST) problem on a grid graph. The greedy algorithm for that is indeed optimal, and you can break ties arbitrarily.
After the comment by @user144527, I realized that it is not MST but instead is the minimum node-weighted Steiner tree problem, which is very difficult. Searching for this phrase should yield an instance for which your greedy algorithm fails to find an optimal solution.