I am having trouble understanding MAP estimation for discriminative models. I will use the notation from the first two pages of this paper: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/05/Bishop-Valencia-07.pdf As far as I understand, the posterior distribution of a discriminative model is $p(\theta|X, C)$, where $X = \{x_1,x_2,\dots,x_n\}$ is the training set and $C=\{c_1,c_2,\dots,c_n\}$ are the corresponding labels.
As usual, the posterior is written in terms of the prior and the likelihood:
$$p(\theta|X,C) \overset{?}{=} \frac{p(\theta)L(\theta)}{p(C|X)} = \frac{p(\theta)p(C|X,\theta)}{p(C|X)}$$
The step I do not understand is the one marked with "?".
Moreover, the same paper also notes that $p(\theta,C|X)=p(\theta)L(\theta)$. However, taking this a bit further:
$$\begin{aligned}
p(\theta,C|X) &= p(\theta)L(\theta) \\
\implies p(\theta,C|X) &= p(\theta)\,p(C|X,\theta) \\
\implies \frac{p(\theta,C,X)}{p(X)} &= \frac{p(\theta)\,p(\theta,C,X)}{p(X,\theta)} \\
\implies p(X)\,p(\theta) &= p(X,\theta)
\end{aligned}$$
Therefore, it seems that $X$ and $\theta$ are assumed to be independent, but I fail to see why. Given such independence, proving the step marked with "?" would also be straightforward.
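For what it's worth, I checked numerically that both identities do hold *if* one assumes $p(X,\theta)=p(X)p(\theta)$. Below is a toy discrete sketch (my own construction, not from the paper): I pick arbitrary distributions $p(\theta)$, $p(x)$, and $p(c|x,\theta)$, build the joint under the independence assumption, and verify both the paper's identity and the Bayes step marked with "?".

```python
import numpy as np

# Toy discrete variables: theta, x, c each take values in {0, 1}.
p_theta = np.array([0.3, 0.7])            # p(theta)
p_x     = np.array([0.6, 0.4])            # p(x)
rng = np.random.default_rng(0)
# p(c | x, theta): shape (x, theta, c), each row sums to 1
p_c_given = rng.dirichlet([1.0, 1.0], size=(2, 2))

# Joint p(theta, x, c), built ASSUMING p(x, theta) = p(x) p(theta)
joint = np.einsum('t,x,xtc->txc', p_theta, p_x, p_c_given)

# p(theta, c | x) = p(theta, x, c) / p(x)
p_tc_given_x = joint / p_x[None, :, None]

# Check the paper's identity: p(theta, c | x) = p(theta) p(c | x, theta)
rhs = p_theta[:, None, None] * np.transpose(p_c_given, (1, 0, 2))
assert np.allclose(p_tc_given_x, rhs)

# Check the step marked "?": p(theta | x, c) = p(theta) p(c | x, theta) / p(c | x)
p_c_given_x = joint.sum(axis=0) / p_x[:, None]        # p(c | x), shape (x, c)
posterior = joint / joint.sum(axis=0, keepdims=True)  # p(theta | x, c)
bayes = rhs / p_c_given_x[None, :, :]
assert np.allclose(posterior, bayes)
print("both identities hold under p(x, theta) = p(x) p(theta)")
```

So the algebra is consistent once the independence is granted; my question is where that independence assumption comes from.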