I have a question about exchangeable random variables. I have always assumed that, given random variables $X$ and $Y$, the two joint distributions are equal, $p(X,Y)=p(Y,X)$ (isn't this what we need to derive Bayes' theorem?). But after learning the definition of exchangeable random variables, I realized this is only possible if $X$ and $Y$ are exchangeable.
But then, how do we get Bayes' theorem in the general case? Don't we define $p(Y|X)=p(Y,X)/p(X)$ and $p(X|Y)=p(X,Y)/p(Y)$?
Edit: I can understand the case where we interpret $(X,Y)$ as events or sets $(X=x,Y=y)$, as the answers below say. But I am still confused about the general case, namely where the joint distribution is $P_{X,Y}(dx,dy)=P^x_Y(dy)P_X(dx)$, with $P^x_Y$ the conditional distribution of $Y$ given $X=x$. When we apply Bayes' theorem, don't we need $P_{X,Y}(dx,dy)=P_{Y,X}(dy,dx)$? Exchanging $X$ and $Y$ above gives $P_{Y,X}(dy,dx)=P_X^y(dx)P_Y(dy)$, and Bayes' theorem is the statement $P_Y^x(dy)P_X(dx)=P_X^y(dx)P_Y(dy)$. So what goes wrong for general joint distributions, as opposed to equalities of set probabilities? This product rule for the joint distribution is taken from the general case in https://en.wikipedia.org/wiki/Bayes%27_theorem.
And in the case of continuous random variables, why is the numerator for both $f_{X|Y=y}(x,y)$ and $f_{Y|X=x}(x,y)$ equal to $f_{X,Y}(x,y)$, the joint density of $(X,Y)$? Shouldn't the second use $f_{Y,X}(y,x)$, the joint density of $(Y,X)$? I can't see why $f_{X,Y}(x,y)=f_{Y,X}(y,x)$, since this is not a set intersection: one is obtained by differentiating the joint distribution function $F_{X,Y}$ of $(X,Y)$ and the other by differentiating $F_{Y,X}$ of $(Y,X)$. So my main question is: why are we able to ignore the order of the random variables in joint distributions and joint densities when the variables are not exchangeable?
I think this is more of a notational confusion than a conceptual one. When we write $P(A, B)$, we usually mean that event $A$ and event $B$ both happen, so in this sense $P(A, B) = P(B, A) = P(A\cap B)$. Note that $P$ is not a function of two ordered arguments here; the comma is just a convention for the intersection of events.
On the other hand, if we are talking about functions, the meaning is completely different. In this case $P(A, B)$ means the first argument is $A$ and the second argument is $B$. In math, you can consider symmetric functions to better understand exchangeability. For example, $f(x,y) = x^2 + y^2$ is symmetric, so $f(x,y) = f(y,x)$, i.e. you can swap the first and second arguments. But this is not true in general, of course. Exchangeability is a similar idea in probability theory.
Update: I think I better understand where you got confused (and I must admit that it's more subtle than it looks). I will use the notation you used to clarify things.
First of all, suppose that $X$ and $Y$ both take values in the same set $[0,1]$, so that the support of the joint distribution doesn't complicate the discussion.
What you are saying is this: $$f_{X,Y}(X=a, Y=b) = f_{Y,X}(Y=b, X=a)$$ This is always true, because the identity follows from a simple change of variables (coordinates): you are expressing the joint density in the $X$-$Y$ plane on the LHS and in the $Y$-$X$ plane on the RHS. So $f_{X,Y}(X=a, Y=b)$ and $f_{Y,X}(Y=b, X=a)$ live in different coordinate systems, and of course the joint density at $X=a$, $Y=b$ should be the same regardless of the coordinate system, so they are equal.
However, the exchangeability concept is different. Its definition, in this notation, would be something like this: $$f_{X,Y}(X=a, Y=b) = f_{X,Y}(X=b, Y=a), \quad \forall (a,b)\in [0,1]\times [0,1]$$ So we require the joint density function to be symmetric with respect to the line $y=x$ within the same $X$-$Y$ coordinate system.
If you want a concrete example, take $f_{X,Y}(X=a, Y=b)= 6ab^2$ on $[0,1]\times[0,1]$. Then of course $f_{Y,X}(Y=b, X=a) = 6ab^2$, but $$f_{X,Y}(X=a, Y=b) = 6ab^2 \neq 6ba^2 = f_{X,Y}(X=b, Y=a) \quad \text{whenever } a\neq b \text{ and } ab\neq 0,$$ so the symmetry fails on most of the unit square and $X$ and $Y$ are not exchangeable.
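A quick numerical check may make the two statements easier to tell apart. This is only a sketch: the function names `f_xy` and `f_yx` and the test point $(0.2, 0.7)$ are my own choices for illustration.

```python
def f_xy(a, b):
    """Joint density of (X, Y) on the unit square: first argument is X, second is Y."""
    return 6 * a * b**2


def f_yx(b, a):
    """Joint density of (Y, X): the same distribution with the arguments listed as (Y, X)."""
    return f_xy(a, b)


a, b = 0.2, 0.7  # hypothetical test point with a != b

# Relabelling the axes: the density at X=a, Y=b is the same in either coordinate system.
relabel_ok = f_xy(a, b) == f_yx(b, a)

# Exchangeability: swap the VALUES inside the same coordinate system.
swap_ok = f_xy(a, b) == f_xy(b, a)

print(relabel_ok, swap_ok)
```

The first comparison holds by construction for every point, while the second compares $6ab^2$ with $6ba^2$ and fails at this point.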
On the other hand, the standard bivariate Gaussian distribution (zero means, unit variances, correlation $\rho$ with $|\rho|<1$, now on $\mathbb{R}^2$ rather than the unit square) is exchangeable: $$ f_{X,Y}(X=a, Y=b) = \frac{1}{2\pi \sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)} (a^2 - 2ab\rho + b^2 ) \right] $$ $$ f_{X,Y}(X=b, Y=a) = \frac{1}{2\pi \sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)} (b^2 - 2ba\rho + a^2 ) \right] $$ $$ \implies f_{X,Y}(X=a, Y=b) = f_{X,Y}(X=b, Y=a) $$
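The same kind of check can be run for the Gaussian case. The sketch below evaluates the standard bivariate density written above at a few points; the function name, the test points and the values of $\rho$ are assumptions made only for illustration.

```python
import math


def std_bivariate_normal_pdf(a, b, rho):
    """Standard bivariate normal density (zero means, unit variances) at X=a, Y=b."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    norm_const = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho**2))
    quad_form = (a**2 - 2.0 * rho * a * b + b**2) / (2.0 * (1.0 - rho**2))
    return norm_const * math.exp(-quad_form)


# Hypothetical test points and correlations, chosen only for illustration.
points = [(0.3, -1.2), (2.0, 0.5), (-0.7, -0.1)]
rhos = [-0.8, 0.0, 0.6]

# Swap the values a and b inside the same (X, Y) coordinate system.
symmetric_everywhere = all(
    math.isclose(
        std_bivariate_normal_pdf(a, b, rho),
        std_bivariate_normal_pdf(b, a, rho),
    )
    for (a, b) in points
    for rho in rhos
)

print(symmetric_everywhere)
```

Swapping the values leaves the quadratic form $a^2 - 2ab\rho + b^2$ unchanged, which is why the comparison succeeds at every tested point.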
Update 2, regarding your follow-up question: When you express the joint distribution, you decide which variable goes on the x-axis and which on the y-axis, so you choose to work in either the $(X,Y)$ or the $(Y,X)$ coordinate system. In that sense, the chosen coordinate system induces an order between $X$ and $Y$ (i.e. which one is the first variable).

On the other hand, the expressions $f_{Y|X=x}(y)$ and $f_{X|Y=y}(x)$ don't involve any order: there is no "first" or "second" variable in them. So if you were thinking "$f_{X|Y=y}(x)$ is computed from $f_{X,Y}(X,Y)$, so $f_{Y|X=x}(y)$ should be computed from $f_{Y,X}(Y,X)$", that is not true, because no order is implied in conditional densities.

In principle, you could compute $f_{Y|X=x}(y)$ using $f_{Y,X}(Y,X)$, but then you would be working in two coordinate systems, which is not practical. For example, when you compute both marginal densities from the same $f(X,Y)$, there is a nice geometric illustration of independence: at any point $(a,b)$, the joint density is the product of two marginal densities that intersect at a right angle. If you compute the marginals from different coordinate systems, they will live in two different planes!

In general, if you carry two coordinate systems around in your analysis, it becomes ambiguous which variable is $X$ and which is $Y$ in an expression like $f(a,b)$. You would have to write $f(X=a,Y=b)$ or $f(Y=a,X=b)$ each time, or use a different function, say $g$, for the $Y$-$X$ coordinate system, so that when you write $g(a,b)$ it is clear that $Y=a$ and $X=b$. Alternatively, as you did initially, the subscripts in $f_{X,Y}(X,Y)$ and $f_{Y,X}(Y,X)$ indicate which coordinate system we are working in, so the two are not the same function $f$. But again, there is no reason to let all this confusion in. We simply stick to one coordinate system.
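To see that Bayes' theorem needs only one joint distribution and no exchangeability, here is a minimal sketch with a discrete joint table instead of a density (discrete only to keep the arithmetic exact enough to check by hand). The table `p_xy` is a made-up example, not taken from the question.

```python
import math

# Hypothetical joint pmf of (X, Y); keys are (x, y). Deliberately not symmetric.
p_xy = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.2, (1, 1): 0.3}

# Both marginals are computed from the same joint table.
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}


def p_y_given_x(y, x):
    """Conditional pmf of Y given X = x, from the single joint table."""
    return p_xy[(x, y)] / p_x[x]


def p_x_given_y(x, y):
    """Conditional pmf of X given Y = y, from the same joint table."""
    return p_xy[(x, y)] / p_y[y]


# Bayes' theorem: p(y|x) p(x) = p(x|y) p(y) for every pair (x, y).
bayes_holds = all(
    math.isclose(p_y_given_x(y, x) * p_x[x], p_x_given_y(x, y) * p_y[y])
    for (x, y) in p_xy
)

# Exchangeability would require p_xy[(x, y)] == p_xy[(y, x)] for every pair.
exchangeable = all(math.isclose(p_xy[(x, y)], p_xy[(y, x)]) for (x, y) in p_xy)

print(bayes_holds, exchangeable)
```

Both conditionals share the numerator `p_xy[(x, y)]`, so the Bayes identity holds even though the table assigns $0.4$ to $(0,1)$ and $0.2$ to $(1,0)$.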