Suppose one is given a set of $M$ points distributed on a "grid", i.e. $$x_i = x_0 + \alpha n_i + \epsilon_i, \quad n_i\in\mathbb{Z}$$ This might look something like this:

[figure: points scattered around equally spaced grid positions]
That is, each of the points lies on a grid position with grid size $\alpha$, up to some amount of Gaussian noise $\epsilon_i \sim N(0,\sigma)$.
I have tried writing out the maximum likelihood estimator, but since the $n_i$ are not known a priori, one gets an equation for $\alpha$ that cannot be evaluated: $$\alpha = \frac{\sum (x_i - x_0)n_i}{\sum n_i^2}\ =\ ?$$
How would one estimate the grid size $\alpha$?
Intuitively, one can easily "see" the solution, but it's not clear how one could derive the MLE. To my mind, the problem stems from the fact that the integrality of the $n_i$ is not being exploited, though it's not clear how one would enforce this constraint.
I would probably use the following numerical approach. Assume for the moment that the parameters $x_0$ and $\alpha$ are known. We can then compute for each data point $i$ the most likely value of $n_i$: it is $(x_i - x_0) / \alpha$ rounded to the nearest integer. Next we compute the difference $\Delta x_i$ between the actual data point and the value obtained from the grid model. Now evaluate $S$, the sum of the squares of $\Delta x_i$. We expect $S$ to have a well-defined, sharp minimum as a function of $x_0$ and $\alpha$. The minimizing values are the maximum likelihood estimates.
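A minimal numerical sketch of this approach (the synthetic data, search ranges, and noise-based acceptance threshold are my own illustrative choices, not part of the answer). One caveat is baked in: every integer divisor of the true step fits the data at least as well, so the sketch reports the largest $\alpha$ whose misfit is consistent with the noise level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative values): true x0 = 0.3, alpha = 1.7, sigma = 0.05
true_x0, true_alpha, sigma = 0.3, 1.7, 0.05
m = rng.integers(0, 20, size=30)
x = true_x0 + true_alpha * m + rng.normal(0.0, sigma, size=m.size)

def S(x0, alpha, x):
    # Snap each point to its most likely grid index n_i, then sum squared residuals
    n_hat = np.round((x - x0) / alpha)
    dx = x - (x0 + alpha * n_hat)
    return float(np.sum(dx ** 2))

def S_min(alpha, x, n_offsets=64):
    # x0 only matters modulo alpha, so it suffices to scan offsets in [0, alpha)
    offsets = np.arange(n_offsets) / n_offsets * alpha
    return min(S(x0, alpha, x) for x0 in offsets)

alphas = np.linspace(0.5, 3.0, 2001)
scores = np.array([S_min(a, x) for a in alphas])

# Any divisor of the true alpha fits at least as well, so report the largest
# alpha whose misfit is consistent with the noise level (heuristic threshold)
threshold = 3 * x.size * sigma ** 2
alpha_hat = alphas[scores < threshold].max()
print(f"estimated alpha: {alpha_hat:.3f}")
```

A local optimizer could then refine $(x_0, \alpha)$ around this candidate, as the answer suggests.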
EDIT: I thought about this interesting problem for a couple of days and also ran some numerical tests. We can arrange the values $x_i$ in ascending order and compute the nearest-neighbour differences $y_i = x_{i+1} - x_i$. Ideally, the set of values is both large and compact (small multiples $n_i$), and the noise is very small compared to the step size. The ordered set of $y_i$ values might then look like $1, 1, 1, 2, 5, 6, 6, 6, 7, 10, 12, \ldots$ and it is a trivial matter to recognize that the step size equals $1$. [This value may then be fine-tuned further by minimizing $S$.]

On the other hand, the worst-case scenario is that the points are separated by large multiples of the step size and that the noise is of the order of the step size. Then the error function $S$ becomes extremely messy: it oscillates wildly, and there is no apparent correlation between the positions of its many minima and the actual step size. On that ground, I think it is unlikely that an analytic solution exists for the general case.
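In the easy regime, the nearest-neighbour-difference idea can be sketched as follows (the data and the least-squares refinement step are my own illustrative additions): take the smallest significant gap as a first guess for the step, then refine it by fitting each gap as an integer multiple of the step.

```python
import numpy as np

rng = np.random.default_rng(1)

# Easy regime (illustrative values): gaps are small multiples of the step,
# and the noise is much smaller than the step size
true_alpha, sigma = 1.0, 0.01
n = np.sort(rng.choice(60, size=40, replace=False))
x = 0.2 + true_alpha * n + rng.normal(0.0, sigma, size=n.size)

y = np.diff(np.sort(x))          # nearest-neighbour differences y_i
y = y[y > 5 * sigma]             # drop near-duplicates (points on the same node)
alpha0 = y.min()                 # crude first guess: the smallest gap

# Refine: each gap should be an integer multiple k_i of the step,
# so fit y_i = alpha * k_i by least squares through the origin
k = np.round(y / alpha0)
alpha_hat = np.sum(y * k) / np.sum(k ** 2)
print(f"refined step estimate: {alpha_hat:.4f}")
```

This only works when the smallest gap really is one step and the rounding of $y_i/\alpha_0$ is unambiguous, i.e. precisely the "large and compact, low noise" regime described above.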
EDIT 2: The OP's problem is stated so broadly that on the one hand it covers easy cases where finding the solution is almost trivial, while on the other hand it allows very tough cases (very small $\alpha$ and very large $n_i$, or noise of the order of $\alpha$) that may well be insolvable. For that reason I suggest the following approach, based on Fourier analysis, since it also provides a test to distinguish the "solvable" cases from the "insolvable" ones.
First compute the following four sum functions:
$$f_1(\alpha) = \frac{1}{N}\sum_{j=1}^{N} \cos(2\pi x_j/\alpha)$$
$$f_2(\alpha) = \frac{1}{N}\sum_{j=1}^{N} \sin(2\pi x_j/\alpha)$$
$$g_1(\alpha) = \frac{1}{N}\sum_{j=1}^{N} x_j \cos(2\pi x_j/\alpha)$$
$$g_2(\alpha) = \frac{1}{N}\sum_{j=1}^{N} x_j \sin(2\pi x_j/\alpha)$$
Now we define the error function $S(\alpha)$ of the fit as follows:
$$S(\alpha) = 1 - f_1^2(\alpha) - f_2^2(\alpha)$$
Note that $0 \le S \le 1$. The case $S = 0$ corresponds to a perfect fit of the grid size $\alpha$ to the data $x_j$; for arbitrary values of $\alpha$, $S$ is typically around $0.9$. The task is to seek the global minimum of $S$. Up to the prefactor $-4\pi/\alpha^2$, the derivative of $S(\alpha)$ with respect to $\alpha$ is given by $T(\alpha)$:
$$T(\alpha) = f_1(\alpha) g_2(\alpha) - f_2(\alpha) g_1(\alpha)$$
We must solve $T(\alpha) = 0$. In practice we will find many zeros of $T$, because the function $S$ has many local extrema. To distinguish a good candidate solution for the grid size $\alpha$ from a poor one, we simply compare the accompanying value of $S$ to a maximum allowed value, say $S_{\max} = 0.5$. Hence the method comes down to:

Find the values of $\alpha$ for which $T(\alpha) = 0$ and $S(\alpha) < S_{\max}$.
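A short numerical sketch of this Fourier recipe (the synthetic data and the scan grid are my own illustrative choices). Rather than locating zeros of $T$, the sketch simply evaluates $S(\alpha)$ on a fine grid and keeps values below the acceptance threshold; since every integer divisor of the true step also passes the test, it reports the largest candidate and then fine-tunes it at the bottom of the corresponding dip.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (illustrative values): x0 = 0.3, alpha = 1.7, sigma = 0.05
true_alpha, sigma = 1.7, 0.05
x = 0.3 + true_alpha * rng.integers(0, 20, size=40) + rng.normal(0.0, sigma, size=40)

def S(alpha, x):
    # S = 1 - f1^2 - f2^2, with f1, f2 the cosine and sine sums defined above
    phase = 2.0 * np.pi * x / alpha
    f1 = np.mean(np.cos(phase))
    f2 = np.mean(np.sin(phase))
    return 1.0 - f1 ** 2 - f2 ** 2

alphas = np.linspace(0.5, 3.0, 5000)
scores = np.array([S(a, x) for a in alphas])

threshold = 0.5                         # acceptance threshold from the text
candidates = alphas[scores < threshold]
# Divisors of the true step also yield small S, so keep the largest candidate,
# then fine-tune by taking the bottom of the dip around it
window = np.abs(alphas - candidates.max()) < 0.1
alpha_hat = alphas[window][np.argmin(scores[window])]
print(f"candidate grid size: {alpha_hat:.3f}")
```

In a tough case (large noise, widely spaced points) the `candidates` array simply comes out empty, which is exactly the built-in "insolvable" diagnostic described above.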
For easy cases this method works well at obtaining the correct grid size (as a candidate solution that can be further tested and/or fine-tuned). However, for tough cases, e.g. when the noise level is substantial, it is quite likely that no candidate solutions are found! In my opinion this is acceptable; one cannot expect more from a general method for this problem.