I have seen in some books (e.g. Liptser and Shiryayev (1977)) some concepts from optimal filtering. One idea is that, given a hidden state $\theta$ (say time invariant and taking finitely many values) and an observable process $\xi$ which satisfies $$ d \xi_t = A(\theta, \xi)dt + B(\xi) dW_t $$ we can find an SDE for the conditional probability of each value of $\theta$ given the information from the observed process (see Theorem 9.1 in LS).
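For concreteness, here is a sketch of what I understand that equation to look like in the scalar, time-invariant case. The notation is my own and not quoted from LS: $a_1, \dots, a_n$ are the possible values of $\theta$, $\pi_j(t) = P(\theta = a_j \mid \mathcal{F}^\xi_t)$, $\bar A_t = \sum_k \pi_k(t) A(a_k, \xi)$, and I assume $B(\xi) \neq 0$. Then $$ d\pi_j(t) = \pi_j(t)\,\frac{A(a_j, \xi) - \bar A_t}{B(\xi)}\, d\bar W_t, \qquad d\bar W_t = \frac{d\xi_t - \bar A_t\,dt}{B(\xi)}, $$ where $\bar W$ is the innovation process.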
With the same goal, I am interested in observable processes of the form $$ d \xi_t = A(\theta, \xi)dt + B(\theta,\xi) dW_t $$ or even the simpler $$ d \xi_t = A(\theta)dt + B(\theta)dW_t $$
For instance, this could be a filtering problem in which the hidden state is the variance of the noise in the signal. Are there any good references on this type of problem?
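To make the simpler model concrete, here is a minimal discrete-time sketch of the kind of posterior I would like to describe in continuous time. It assumes that $\theta$ takes two values, that the path is observed on a grid of step `dt`, and that each increment is Gaussian with mean $A(\theta)\,dt$ and variance $B(\theta)^2\,dt$. The names `posterior_update`, `drifts`, `vols`, and the numerical values are my own illustrative choices, not something taken from LS.

```python
import numpy as np


def posterior_update(log_post, d_xi, dt, drifts, vols):
    """One Bayes step on log-probabilities.

    Under hypothesis j the increment d_xi is N(drifts[j] * dt, vols[j]**2 * dt).
    """
    var = vols**2 * dt
    log_lik = -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (d_xi - drifts * dt) ** 2 / var
    log_post = log_post + log_lik
    # Normalise in log space so the probabilities sum to one.
    return log_post - np.logaddexp.reduce(log_post)


# Hypothetical example: same drift, different noise levels.
rng = np.random.default_rng(0)
drifts = np.array([0.0, 0.0])   # A(theta) under each hypothesis
vols = np.array([1.0, 2.0])     # B(theta) under each hypothesis
true_j = 1                      # index of the value of theta used to simulate
dt, n_steps = 0.01, 500

log_post = np.log(np.array([0.5, 0.5]))  # uniform prior
for _ in range(n_steps):
    d_xi = drifts[true_j] * dt + vols[true_j] * np.sqrt(dt) * rng.standard_normal()
    log_post = posterior_update(log_post, d_xi, dt, drifts, vols)

posterior = np.exp(log_post)
print(posterior)
```

This is only a sampled-data illustration; what I am asking about is the continuous-time analogue of this update when $B$ depends on $\theta$.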