Suppose I need to find the posterior distribution of $\mathbf{w}$ given data $D=\{\mathbf{y},\mathbf{X}\}$. How do I arrive at the expression below using Bayes' rule?
$$p(w|y,X)=\frac{p(w,y,X)}{p(y,X)}=\frac{p(y|X,w)\,p(X,w)}{p(y|X)\,p(X)}$$
Now what does $p(X,w)$ decompose into: $p(X,w)=p(X|w)\,p(w)$ or $p(X,w)=p(w|X)\,p(X)$? And to cancel the $p(X)$ in the denominator, should we write $p(X|w)=p(X)$, since I think $X$ is independent of $w$, or should it be $p(w|X)=p(w)$ by the same reasoning? Which of the two is correct?
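For concreteness, here is the full chain of equalities I am trying to justify, written out under the independence assumption (both decompositions of $p(X,w)$ are valid chain-rule identities; independence is what makes either one cancel the denominator):

$$
\begin{align}
p(w \mid y, X)
&= \frac{p(y \mid X, w)\, p(X, w)}{p(y \mid X)\, p(X)} \\
&= \frac{p(y \mid X, w)\, p(X \mid w)\, p(w)}{p(y \mid X)\, p(X)}
   \qquad \text{(chain rule on } p(X, w)\text{)} \\
&= \frac{p(y \mid X, w)\, p(w)}{p(y \mid X)}
   \qquad \text{(if } p(X \mid w) = p(X)\text{)}.
\end{align}
$$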

$p(w|y,X)p(y|X)=p(w,y|X)$
and
$p(y|X,w)p(w)=p(w,y|X)$ only if $p(w)=p(w|X)$. So I guess it's saying that the prior distribution can't depend on even one aspect of the data, i.e. the distribution of $w$ is the same whether or not you have a bunch of predictors without knowing the response. But it would be good to know the context; sometimes authors take liberties in online articles.
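As a numerical sanity check (my own toy example, not from the post): enumerate a small discrete joint distribution in which $w$ and $X$ are independent, and verify that the Bayes-rule expression $p(y \mid X, w)\,p(w)/p(y \mid X)$ matches the posterior computed by brute-force normalization of the joint. The sizes and Dirichlet-sampled tables below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary toy sizes: 3 values of w, 2 of X, 4 of y.
nw, nx, ny = 3, 2, 4

# Independent priors p(w), p(X), and a likelihood table p(y | X, w).
p_w = rng.dirichlet(np.ones(nw))                          # shape (nw,)
p_x = rng.dirichlet(np.ones(nx))                          # shape (nx,)
p_y_given_xw = rng.dirichlet(np.ones(ny), size=(nx, nw))  # shape (nx, nw, ny)

# Joint p(w, X, y) = p(y | X, w) p(X) p(w), using independence of X and w.
joint = p_y_given_xw * p_x[:, None, None] * p_w[None, :, None]  # (nx, nw, ny)

# Posterior by brute force: p(w | y, X) = p(w, X, y) / sum_w p(w, X, y).
post_brute = joint / joint.sum(axis=1, keepdims=True)

# Posterior from the Bayes-rule expression: p(y | X, w) p(w) / p(y | X),
# where p(y | X) = sum_w p(y | X, w) p(w).
p_y_given_x = (p_y_given_xw * p_w[None, :, None]).sum(axis=1, keepdims=True)
post_formula = p_y_given_xw * p_w[None, :, None] / p_y_given_x

print(np.allclose(post_brute, post_formula))  # True
```

The agreement holds precisely because $p(w \mid X) = p(w)$ was baked into the joint; if you instead made the table `p_w` depend on the value of $X$, the two computations would diverge.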