Let $X,X_1,X_2,\dots$ be random variables defined on a probability space $(\Omega,\mathcal{F},P)$ and taking values in a metric space $(S,d)$. We write $X_n\overset{p}{\to}X$ if $P[d(X_n,X)\geq \epsilon]\to 0$ as $n\to\infty$ for all $\epsilon>0$.
Question: Do we need $S$ to be separable for this to make sense?
Wikipedia indicates yes, but this answer indicates no. The definition on page 27 of Billingsley's Convergence of Probability Measures puts no restriction on $S$, but the definition on page 287 of Dudley's Real Analysis and Probability imposes separability.
The distance function $d:S\times S\to \mathbb{R}$ is continuous with respect to the product topology on $S\times S$, and so is $\mathcal{B}(S\times S)$-measurable. On the other hand, if $X,X_n:\Omega\to S$ are both Borel measurable, then the product map $(X_n,X):\Omega\to S\times S$ is only guaranteed to be measurable with respect to $\mathcal B(S)\otimes \mathcal B(S)$, which in general may be strictly smaller than $\mathcal{B}(S\times S)$. So it seems we need separability to ensure $\mathcal{B}(S\times S)=\mathcal B(S)\otimes \mathcal B(S)$, so that the composition $d(X_n,X)$ is measurable.
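To make the concern concrete, here is a sketch of the composition in question (the interval notation $[\epsilon,\infty)$ is the only thing introduced here):

$$\Omega \xrightarrow{\;(X_n,X)\;} S\times S \xrightarrow{\;d\;} \mathbb{R}, \qquad \{d(X_n,X)\geq\epsilon\} = (X_n,X)^{-1}\bigl(d^{-1}([\epsilon,\infty))\bigr).$$

The set $d^{-1}([\epsilon,\infty))$ is closed in $S\times S$ and so lies in $\mathcal{B}(S\times S)$, but the map $(X_n,X)$ is only known to pull sets in $\mathcal B(S)\otimes\mathcal B(S)$ back into $\mathcal{F}$. Since

$$\mathcal B(S)\otimes\mathcal B(S)\subseteq\mathcal{B}(S\times S)$$

always holds, with equality when $S$ is separable, the event is guaranteed to lie in $\mathcal{F}$ in the separable case.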
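However, if $X=c$ is a constant, then I believe this restriction is not needed, since $d(\cdot,c):S\to\mathbb{R}$ is continuous and hence $d(X_n,c)$ is measurable whenever $X_n$ is Borel measurable.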
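If I write out the constant case, assuming the notation $B(c,\epsilon)=\{s\in S: d(s,c)<\epsilon\}$ for the open ball (not used above), the event becomes

$$\{d(X_n,c)\geq\epsilon\}=X_n^{-1}\bigl(S\setminus B(c,\epsilon)\bigr),$$

and $S\setminus B(c,\epsilon)$ is closed, hence in $\mathcal B(S)$, so only the Borel measurability of $X_n$ is used and no product $\sigma$-algebra appears.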
Is this correct? Thanks for your help.