First check
I considered the Constant Elasticity of Variance (CEV) model:
$$
dS = S\mu dt + S^\alpha \sigma dW
$$
and numerically approximated it with the Milstein method:
$$
S(t+dt) = S(t) + S(t)\cdot \mu \cdot dt + S(t)^\alpha \cdot \sigma \cdot \varepsilon_t \cdot \sqrt{dt} + \frac{1}{2} \cdot \alpha \cdot S(t)^{2\alpha-1} \cdot \sigma^2 \cdot \left( (\varepsilon_t \cdot \sqrt{dt})^2 -dt \right)
$$
where $\varepsilon_t$ is a random variable with mean 0 and variance 1. Then I applied ito's lemma to the process $X= \ln S$:
$$
d \ln S = \left(\mu - \frac{S^{2\alpha - 2} \sigma^2}{2} \right)dt + S^{\alpha-1}\sigma dW
$$
Second check
I considered the following modification of the Constant Elasticity of Variance (CEV) model:
$$
dS = S\mu dt + S^\alpha \sigma_1 dW_1 + S\sigma_2dW_2
$$
which I call Modified CEV, and numerically approximated it with the Milstein method:
$$
S(t+dt) = S(t) + S(t)\cdot \mu \cdot dt + S(t)^\alpha \cdot \sigma_1 \cdot \varepsilon_t \cdot \sqrt{dt} + S(t) \cdot \sigma_2 \cdot \epsilon_t \cdot \sqrt{dt} + \frac{1}{2} \cdot \alpha \cdot S(t)^{2\alpha-1} \cdot \sigma_1^2 \cdot \left( (\varepsilon_t \cdot \sqrt{dt})^2 -dt \right) + \frac{1}{2} \cdot S(t) \cdot \sigma_2^2 \cdot \left( (\epsilon_t \cdot \sqrt{dt})^2 -dt \right)
$$
where $\varepsilon_t$ and $\epsilon_t$ are two random variables with mean 0 and variance 1. Then I applied ito's lemma to the process $X= \ln S$:
$$
d \ln S = \left(\mu - \frac{S^{2\alpha-2}\sigma_1^2 + \sigma_2^2+2S^{\alpha+1}\sigma_1 \sigma_2 \rho_{12}}{2} \right)dt + S^{\alpha-1} \sigma_1 dW_1 + \sigma_2 dW_2
$$
where I used the fact that $dW_1 \cdot dW_2 = \rho_{12}dt$.\
By implementing numerical simulation starting from the same diffusions ($\varepsilon_t$ and $\epsilon_t$) for $T=2$ years and $dt=1/12$ months, $\sigma_1 = 0.12$ and $\sigma_2 = 0.105$, $\mu=0$ I compared 4 different models:
- Modified CEV with $\alpha = 1$ approximated with Euler method;
- Modified CEV with $\alpha = 1$ approximated with Milstein method;
- Modified CEV with $\alpha \neq 1$ approximated with Euler method;
- Modified CEV with $\alpha \neq 1$ approximated with Milstein method;
The result is that differences in the process 1., 2., 4. dynamics are negligible, BUT the process 3. dynamic is significantly different from the others. I can't understand why this happens.
In the plot $\alpha = 0.5$:

While the proof of the orders of Milstein's method is more complicated, its construction can be easily obtained by one iteration of the integral equation. In the SDE (for scalar $X$) $$ dX_t=a(X_t)\,dt+\vec b(X_t)^Td\vec W_t $$ and for its first step with $X_0=x$ take the Euler-Maruyama formula as first approximation $$ \hat X_t=x+a(x)t+\vec b(x)^T\vec W_t $$ Then insert into the integral equation and use Taylor expansions of the coefficient formulas to get the next-higher error order $t^1$ or $\|\vec W_t\|^2$, \begin{align} X_t&=x+\int_0^t a(\hat X_s)\,ds+\int_0^t \vec b(\hat X_s)^T\,d\vec W_s\\ &=x+a(x)t+\int_0^t [\vec b(x)+\vec b'(x)\vec b(x)^T\vec W_s]^T\,d\vec W_s+O(t^{3/2})\\ &=x+a(x)t+\vec b(x)^T\vec W_t+{\rm trace}\left(\vec b'(x)\vec b(x)^T·\int_0^t\vec W_s\,d\vec W_s^T\right)+O(t^{3/2}) \end{align} Of the matrix of integrals in the last term, only the diagonal entries can be solved to the form you already used. The non-diagonal entries have to be sufficiently approximated by other means.