Let’s define a randomized acceptor as a tuple $V = (A, Q, \Omega, \mathfrak{F}, P, \phi, q_i, Q_t)$, where $A$ is the input alphabet, $Q$ is the set of states, $(\Omega, \mathfrak{F}, P)$ is a probability space, $\phi: Q \times A \times \Omega \to Q$ is the transition function, $q_i \in Q$ is the initial state, and $Q_t \subset Q$ is the set of terminal states. We will call $V$ finite iff both $A$ and $Q$ are finite.
Let’s extend the transition function $\phi$ from $Q \times A \times \Omega$ to $Q \times A^* \times \Omega$ using the recurrence formulas:
$$\phi(q, \Lambda, \omega) = q$$ $$\phi(q, \alpha a, \omega) = \phi(\phi(q, \alpha, \omega), a, \omega) \quad \forall a \in A,\ \alpha \in A^*$$
Now define the acceptance probability of a word $w \in A^*$ in $V$ as $P_V(w) := P(\{\omega \in \Omega \mid \phi(q_i, w, \omega) \in Q_t\})$. Using this, we can define for an arbitrary language $L \subset A^*$ the absolute error of $V$ with respect to it as $Err(V, L) := \sup\{|P_V(w) - \mathbb{I}_L(w)| \mid w \in A^* \}$, where $\mathbb{I}_L$ is the indicator function of $L$. Let’s call a formal language $L \subset A^*$ almost regular iff for every $\epsilon > 0$ there exists a finite randomized acceptor $V$ such that $Err(V, L) < \epsilon$.
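To make these definitions concrete, here is a small sketch (the alphabet, states and weights below are made up) which realizes $\Omega$ as the set of functions $Q \times A \to Q$ with independently drawn values, and estimates $P_V(w)$ by Monte Carlo:

```python
import random

# A toy finite randomized acceptor; the alphabet, states and weights are made up.
A = "ab"                       # input alphabet
Q = [0, 1, 2]                  # states
q_i, Q_t = 0, {2}              # initial state and set of terminal states

# Weights defining the distribution of phi(q, a, .) for each pair (q, a).
W = {(0, "a"): [0.1, 0.9, 0.0], (0, "b"): [0.8, 0.1, 0.1],
     (1, "a"): [0.0, 0.2, 0.8], (1, "b"): [0.5, 0.5, 0.0],
     (2, "a"): [0.0, 0.0, 1.0], (2, "b"): [0.0, 0.1, 0.9]}

def sample_omega():
    # One choice of probability space: omega is a random function Q x A -> Q,
    # drawn once per run, so phi(q, a, omega) is deterministic given omega.
    return {(q, a): random.choices(Q, weights=W[(q, a)])[0] for q in Q for a in A}

def phi_star(q, word, omega):
    # The extension of phi to A*: phi(q, Lambda, omega) = q and
    # phi(q, alpha a, omega) = phi(phi(q, alpha, omega), a, omega).
    for a in word:
        q = omega[(q, a)]
    return q

def P_V(word, runs=100_000):
    # Monte Carlo estimate of P({omega : phi(q_i, word, omega) in Q_t}).
    return sum(phi_star(q_i, word, sample_omega()) in Q_t for _ in range(runs)) / runs

print(P_V("aab"))   # estimated acceptance probability of the word "aab"
```

Estimating $Err(V, L)$ the same way only gives a lower bound, of course, since the supremum ranges over all words.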
It is not hard to see that all regular languages are almost regular. But is the converse true? Or does there exist an almost regular formal language which is not regular?
Yes - and "almost regular" can be weakened to say only that some machine exists for some $\varepsilon <1/2$. In particular, one may prove the following:
We can prove this by adapting some of the usual metric space notions about Markov chains to probabilistic automata, and by showing how to construct, from a probabilistic finite automaton with the given property, a deterministic one accepting the set of words that the probabilistic automaton was more likely to accept than to reject.
To do so, we first adopt a geometric view of probability: let $M(Q)$ be the set of probability measures on $Q$, which we will need in order to describe a probabilistic automaton usefully. Note that, since $Q$ is finite, this is best imagined as a simplex with $|Q|$ vertices - or, analytically, as the space of maps from $Q$ to $\mathbb R_{\geq 0}$ whose outputs sum to $1$.
Note that this space comes with a metric: if we imagine a measure to be a map $Q\rightarrow\mathbb R_{\geq 0}$, we can use the metric induced by the $L^1$ norm on this space. (This is also equal to twice the total variation distance on $M(Q)$, if we want to stay in measure-theoretic language.)
Each symbol $a\in A$ is associated to an affine map $T_a:M(Q)\rightarrow M(Q)$ representing the outcome of the machine reading the symbol $a$ when its state was previously distributed according to the input distribution. One should observe that $T_a$ does not increase any distances - in particular, in our metric, we have $d(T_a(\mu), T_a(\mu')) \leq d(\mu,\mu')$. We can extend this by composition to a map $T_{\omega}$ for any string $\omega \in A^*$.
Finally, if some set of states in $Q$ is designated "accepting", we can represent the probability of acceptance as another affine function $P:M(Q)\rightarrow [0,1]$, taking the value $1$ on pure accepting states and $0$ on pure rejecting states. This map also does not increase distances.
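As a sanity check on these two non-expansiveness claims, here is a short sketch (the stochastic matrix and the accepting set are arbitrary) in which distributions are vectors on the simplex, $T_a$ acts by a row-stochastic matrix, and $P$ is a dot product with the indicator of the accepting states:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                     # |Q|, illustrative

def d(mu, nu):
    # The L^1 metric on M(Q) used above.
    return np.abs(mu - nu).sum()

# T_a as an arbitrary row-stochastic matrix (rows sum to 1), acting on row vectors.
T_a = rng.random((n, n))
T_a /= T_a.sum(axis=1, keepdims=True)

# P as an affine functional: 1 on pure accepting states, 0 on the rest.
accept = np.array([0.0, 0.0, 1.0, 1.0])   # states 2 and 3 accept (illustrative)
P = lambda mu: float(mu @ accept)

# Two random points of the simplex M(Q).
mu, nu = rng.dirichlet(np.ones(n)), rng.dirichlet(np.ones(n))

# Both T_a and P are distance non-increasing in this metric.
assert d(mu @ T_a, nu @ T_a) <= d(mu, nu) + 1e-12
assert abs(P(mu) - P(nu)) <= d(mu, nu) + 1e-12
```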
With the definitions out of the way, we may now begin the more insightful portion of this proof. By hypothesis, if $\mu\in M(Q)$ is any distribution reachable from the starting distribution of the machine, then for every string $\omega$ we have $P(T_{\omega}(\mu)) \in [0,\varepsilon] \cup [1-\varepsilon, 1]$, since otherwise some word would be accepted with probability less than $1-\varepsilon$ but would also be rejected with probability less than $1-\varepsilon$, violating the hypothesis. Let's define $X$ to be the set of all $\mu$ that satisfy this condition for every $\omega$. Note that $X$ is closed because it is an intersection of closed sets, and thus compact because it is a closed subset of a compact space.
Now, let's say that two distributions $\mu$ and $\mu'$ in $X$ are equivalent if for every $\omega$, the values $P(T_{\omega}(\mu))$ and $P(T_{\omega}(\mu'))$ are either both above $1/2$ or both below $1/2$. This is, of course, an equivalence relation. Now, we can prove a simple lemma: if $\mu, \mu' \in X$ and $d(\mu,\mu') < 1-2\varepsilon$, then $\mu$ and $\mu'$ are equivalent.
The proof is easy: note that $|P(T_{\omega}(\mu)) - P(T_{\omega}(\mu'))| \leq d(\mu,\mu') < 1-2\varepsilon$, since all maps involved are distance non-increasing. However, since neither value can lie in the interval $(\varepsilon,1-\varepsilon)$, which has length $1-2\varepsilon$, this implies that they both lie on the same side of this interval.
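Since the final step is pure interval arithmetic, a quick brute-force check (with an arbitrary choice of $\varepsilon$, done in exact arithmetic to avoid boundary issues) confirms the dichotomy:

```python
import itertools
from fractions import Fraction

eps = Fraction(3, 10)                        # any fixed epsilon < 1/2 works here
grid = [Fraction(k, 200) for k in range(201)]
allowed = [x for x in grid if x <= eps or x >= 1 - eps]   # values outside (eps, 1-eps)

for p, q in itertools.product(allowed, repeat=2):
    if abs(p - q) < 1 - 2 * eps:
        # Same side of 1/2: either both "accepting" values or both "rejecting" values.
        assert (p > Fraction(1, 2)) == (q > Fraction(1, 2))
```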
Then, we're clear to finish: by the lemma, these equivalence classes are open in $X$, but $X$ is compact, so there are only finitely many equivalence classes. Let $X/\sim$ be the set of equivalence classes. Observe that $T_a$ maps $X$ into $X$ and, by the definition of the equivalence relation, descends to a map $X/\sim \rightarrow X/\sim$. But now we are done: we can define a deterministic finite automaton whose states are the classes in $X/\sim$, whose transition function is induced by the maps $T_a$, whose initial state is the class of the machine's starting distribution, and whose accepting states are the classes of distributions $\mu$ with $P(\mu) > 1/2$. This machine accepts exactly the set of words that the original machine was more likely to accept than to reject, hence we are done.
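For those who want to see the construction operationally, here is a rough sketch of the state-merging idea (the name `pfa_to_dfa`, the matrices, and the value of $\varepsilon$ are all illustrative). It explores the distributions reachable from the start and identifies two of them whenever their $L^1$ distance is below $1-2\varepsilon$, which the lemma makes sound; since the representatives kept are pairwise at distance at least $1-2\varepsilon$ and $X$ is compact, the exploration terminates, possibly with more states than $|X/\sim|$:

```python
import numpy as np

def pfa_to_dfa(start, mats, accept, eps):
    """Sketch: build a DFA from a PFA whose reachable acceptance probabilities
    all avoid (eps, 1-eps).  DFA states are representative reachable
    distributions; distributions closer than 1-2*eps are identified, which the
    lemma above justifies."""
    gap = 1 - 2 * eps
    reps = [np.asarray(start, dtype=float)]        # representatives of classes
    trans, frontier = {}, [0]
    while frontier:
        i = frontier.pop()
        for a, M in mats.items():
            nxt = reps[i] @ M                      # T_a applied to the representative
            # Find an existing representative equivalent to nxt, if any.
            j = next((k for k, r in enumerate(reps)
                      if np.abs(nxt - r).sum() < gap), None)
            if j is None:
                reps.append(nxt)
                j = len(reps) - 1
                frontier.append(j)
            trans[(i, a)] = j
    accepting = {i for i, r in enumerate(reps) if float(r @ accept) > 0.5}
    return trans, accepting                         # DFA with start state 0

# Illustrative usage with made-up data (a toy machine that is already
# deterministic; real inputs should satisfy the hypothesis of the theorem).
mats = {"a": np.array([[0.0, 1.0], [0.0, 1.0]]),
        "b": np.array([[1.0, 0.0], [1.0, 0.0]])}
trans, accepting = pfa_to_dfa([1.0, 0.0], mats, np.array([0.0, 1.0]), eps=0.1)
print(trans, accepting)
```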
Note: it would be possible to bound the number of states in $X/\sim$ if one desired, though it seems hard to get good bounds. This also shows that "biasing" the requirement doesn't change the situation - for instance, if we asked that words in the language be accepted with probability at least $p$ and words outside be accepted with probability at most $q$ where $q<p$, all of the same reasoning still applies.