Let $S = \{S_1,\ldots,S_N\}$ be the set of states of a hidden Markov process, and let $V = \{V_1,\ldots,V_M\}$ be the alphabet of possible observations one can make.
Suppose we are given a model $\lambda = (\pi,A,B)$ for a hidden Markov process, where $\pi$ is the initial state distribution, $A$ is the transition matrix, and $B$ is the emission matrix, and suppose an observation sequence $\mathcal{O} = \mathcal{O}_1,\ldots,\mathcal{O}_T$ is generated by the following procedure:

$(1)$ Choose an initial state $q_1 = S_i$ according to $\pi$. $(2)$ Set $t = 1$. $(3)$ Choose $\mathcal{O}_t = V_k$ according to $B_i(k)$, where $i$ is the index of the current state. $(4)$ Transition to a new state $q_{t+1} = S_j$ according to the probabilities $A_{ij}$. $(5)$ Set $t = t+1$. If $t \le T$, return to $(3)$; otherwise terminate the procedure.
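For concreteness, the generating procedure above can be sketched in NumPy as follows (the function name `sample_hmm` and the 0-indexed states/symbols are my own conventions, not part of the question):

```python
import numpy as np

def sample_hmm(pi, A, B, T, rng=None):
    """Generate an observation sequence O_1,...,O_T from the HMM (pi, A, B),
    following the procedure above. States and symbols are 0-indexed here."""
    rng = np.random.default_rng(rng)
    N = len(pi)          # number of states
    M = B.shape[1]       # alphabet size
    obs = np.empty(T, dtype=int)
    q = rng.choice(N, p=pi)              # step (1): draw q_1 from pi
    for t in range(T):
        obs[t] = rng.choice(M, p=B[q])   # step (3): emit O_t = V_k with prob B_q(k)
        q = rng.choice(N, p=A[q])        # step (4): move to q_{t+1} via row A_q
    return obs
```

With a deterministic model (0/1 probabilities) the output is fully determined; e.g. `sample_hmm([1, 0], [[0, 1], [1, 0]], np.eye(2), 4)` alternates between the two symbols.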
My question is this. If we have generated an observation sequence $\mathcal{O}$ by this process, and we fix our initial guess in the Baum-Welch algorithm at the true $B$-matrix that was used to generate $\mathcal{O}$, does the Baum-Welch algorithm just reduce to counting the transitions $\# i \to j$ for every pair of states $i,j \in \{1,\ldots,N\}$? That is, does the estimation reduce to that of an ordinary, fully observed Markov chain?
Or more generally: if, instead of the Baum-Welch algorithm, you use any other optimization procedure, does the problem reduce to estimating an ordinary Markov chain once you set your initial guess to the $B$-matrix that generated $\mathcal{O}$?
I don't believe this is the case, but I have heard people make this assertion, and I am trying to find a proof of why the two problems are not the same. I might be wrong.
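To make the question concrete, here is a minimal sketch (my own code, not from any reference) of a single Baum-Welch re-estimation of $A$ with $B$ held fixed at the truth. The point is that the update is built from soft posterior transition probabilities $\xi_t(i,j)$ obtained via forward-backward, not from hard counts $\# i \to j$, because the state path stays hidden even when $B$ is known exactly:

```python
import numpy as np

def reestimate_A(obs, pi, A, B):
    """One Baum-Welch re-estimation of A, with B held at the true emission
    matrix. The new A averages soft posterior transition probabilities
    xi_t(i,j); it is not a hard count of i -> j transitions, since the
    state sequence remains unobserved."""
    T, N = len(obs), len(pi)
    # Forward pass (alpha), row-normalized for numerical stability.
    alpha = np.empty((T, N))
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    # Backward pass (beta), same scaling convention.
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    # Accumulate soft transition counts xi and state occupancies gamma.
    xi = np.zeros((N, N))
    for t in range(T - 1):
        x = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi += x / x.sum()
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return xi / gamma[:-1].sum(axis=0)[:, None]   # re-estimated A
```

In the degenerate case where $B$ is a permutation matrix the posteriors collapse to a single path and the update really does equal the hard transition counts; for any non-degenerate $B$ the $\xi_t(i,j)$ stay fractional, which is exactly the gap my question is about.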