Let $\mathcal{X}=\{0,1,...,k-1\}=\mathcal{\hat{X}}$ for $k\geq 3$ and let $X$ be uniformly distributed over the source alphabet. Consider a distortion function given by \begin{align*} d(x,\hat{x}) = \begin{cases} 0 &: x=\hat{x}\\ 1 &: x=\hat{x}\pm1\mod l\\ \infty &: \text{otherwise} \end{cases} \end{align*}
Show that the rate distortion function is given by \begin{align*} R(D)= \begin{cases} \log l - D - H_b(D) &: 0 \leq D\leq\frac{2}{3}\\ \log l - \log3 &: \frac{2}{3}<D \end{cases} \end{align*}
where $H_b(\cdot)$ is the binary entropy function.
My attempt: Given the distortion function, we know that the conditional distribution is of the form.
$$ p(\hat{x}|x) = \begin{cases} 1-q &:\hat{x}=x\\ \frac{q}{2} &: \hat{x}=x\pm1\mod3\end{cases} $$
and
$$Ed(X,\hat{X}) = \frac{1}{l}\sum_\mathcal{X}(1-q)\cdot0+\frac{q}{2}\cdot1+\frac{q}{2}\cdot1 = q\leq D$$
and we can compute $$ p(\hat{x}) = \frac{1}{m}\\ p(x|\hat{x}) = p(\hat{x}|x) $$ hence $$H(X|\hat{X}) = H_b(q)+q$$ and $$I(X;\hat{X}) = H(X) - H(X|\hat{X}) = \log l - q - H_b(q)$$ and so by the distortion constraint computed above, this is minimized when $q=D$ and hence, $$R(D) = \log l -D - H_b(D)$$
I know I'm on the right track, but I'm not sure where the $0\leq D \leq \frac{2}{3}$ constraint in the final solution comes from.
I am assuming that $k = l$ in the question, which seems to make sense.
First, I'll give an intuitive answer. To meet a distortion level of $2/3,$ the test channel will spit out a $\hat{x}$ uniformly distributed over $\{x -1, x , x+1\}$, and the compression rate will be $\log(k/3)$. Now suppose that we wanted to attain a distortion of exactly $1$. The way you could do this is to spread the mass of $\hat{x}|x$ uniformly over $\{-1,1\}$. But observe that this is wasteful in terms of compression - we're reducing $H(\hat{X}|X)$ without any improvement in $H(\hat{X})$. So to meet distortion constraints of $D =1$, it is optimal just to produce a compression with distortion $2/3.$ This same effect holds for every $D > 2/3,$ and is where the flat bit in $R(D)$ comes from.
More mathematically, the definition of the rate distortion function is $$ R(D) := \min I(\hat{X};X) : \mathbb{E}[d(\hat{X},X)] \le D.$$ Notice the inequality constraint in the above. You've worked out the expressions for $I$ and $\mathbb{E}[d]$ correctly, giving us the program $$ \min \log k - q - H_b(q) : q \le D, 0 \le q \le 1. $$
Now observe that without the $q \le D$ constraint, the objective is minimised at $q_0 = 2/3$, and is monotonically decreasing for $q < 2/3.$ So, if we solve the program for $D \le 2/3,$ the $q \le D$ constraint will be saturated by the optimum. On the other hand, for $D > 2/3, $ this constraint will not be saturated since we can instead set $q = q_0 < D$ and minimise the objective globally. This in turn means that for all $D \ge 2/3, R(D) = \log(k/3).$