This paper on positive definite kernels introduces an extension of the radial basis function kernel (RBF, or Gaussian kernel)
$$ K_{rbf}(x,y) = e^{-\epsilon^2||x-y||^2} $$
for the unit sphere
$$ K_{srbf}(x,y) = e^{-2\epsilon(1- \langle x , y \rangle)} $$
Instead of the squared Euclidean distance $||x-y||^2$, it uses the cosine distance $1-\langle x , y \rangle$ (one minus the cosine similarity).
Both use the shape parameter $\epsilon$, with $\epsilon^2 = \frac{1}{\sigma^2}$ from the more common Gaussian form. The code used by this paper about kernel MRCD implements an RBF kernel and robustly estimates $\sigma$ for a dataset $D$ using
$$ \sigma = \sqrt{\underset{x,y \in D}{\operatorname{median}}\big(||x-y||^2\big)} $$
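To make the heuristic concrete, here is a minimal NumPy sketch of it. The function name `median_sigma` is hypothetical, and taking the median over distinct pairs only (excluding the zero self-distances) is my assumption; the paper's code may handle the pairs differently.

```python
import numpy as np


def median_sigma(X):
    """Median heuristic: sqrt of the median squared Euclidean distance.

    Assumption: the median runs over distinct pairs (i < j) only.
    """
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # Pairwise squared distances via ||x||^2 + ||y||^2 - 2<x, y>
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off
    sq_dists = np.maximum(sq_dists, 0.0)
    upper = np.triu_indices(X.shape[0], k=1)  # indices of distinct pairs
    return float(np.sqrt(np.median(sq_dists[upper])))
```

For the three points `[[0, 0], [3, 4], [6, 8]]` the squared distances are 25, 100 and 25, so the estimate is $\sigma = 5$.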
I am now testing kernel MRCD on data that lie on a hypersphere and want to extend the originally implemented RBF kernel to the sphere, but how should I then estimate $\sigma$? So far I have landed on
$$ \sigma = \sqrt{\underset{x,y \in D}{\operatorname{median}}\big((1-\langle x , y \rangle)^2\big)} $$
which simply replaces the Euclidean distance with the cosine distance. But since the extension of the kernel to the sphere itself was not just a change of distance, this does not feel quite right.
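For reference, this candidate estimate could be computed as in the sketch below. The function name `cosine_median_sigma` is hypothetical, and normalising the rows to unit length and taking the median over distinct pairs are both my assumptions. It only illustrates the formula above and is not meant as a validated heuristic.

```python
import numpy as np


def cosine_median_sigma(X):
    """Candidate spherical estimate: sqrt of the median squared cosine distance.

    Assumptions: rows are non-zero and are projected onto the unit sphere
    first; the median runs over distinct pairs (i < j) only.
    """
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    # Cosine distance 1 - <x, y>, with inner products clipped to [-1, 1]
    cos_dist = 1.0 - np.clip(X @ X.T, -1.0, 1.0)
    upper = np.triu_indices(X.shape[0], k=1)  # indices of distinct pairs
    return float(np.sqrt(np.median(cos_dist[upper] ** 2)))
```

For the unit vectors `[[1, 0], [0, 1], [-1, 0]]` the cosine distances are 1, 2 and 1, so the estimate is 1.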
What other heuristics can be used? The application is unsupervised outlier detection, so something like a grid search is not an option and a direct estimate from the dataset is needed.
Edit:
Thanks for pointing out that if $||x||=||y||=1$ then:
$$ ||x-y||^2 = ||x||^2 - 2\langle x , y \rangle + ||y||^2 = 2-2\langle x , y \rangle = 2(1-\langle x , y \rangle) $$
so on the unit sphere the squared Euclidean distance is exactly twice the cosine distance, i.e. the two coincide up to a constant factor. I am still wondering, though, why the second kernel uses $\epsilon$ instead of $\epsilon^2$? And does this way of estimating $\sigma$ still hold on the unit sphere?
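The identity is easy to check numerically. The sketch below (assuming NumPy, with random unit vectors and an arbitrary value `eps = 0.7` chosen only for illustration) also shows what follows from substituting it into the two kernel formulas: $K_{srbf}$ with parameter $\epsilon$ gives the same values as $K_{rbf}$ with shape parameter $\sqrt{\epsilon}$. This only relates the two parametrisations; it does not explain why the paper chose that convention.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # rows now lie on the unit sphere

gram = X @ X.T  # inner products <x, y>
diffs = X[:, None, :] - X[None, :, :]
sq_dists = np.sum(diffs ** 2, axis=-1)  # squared Euclidean distances

# ||x - y||^2 = 2 (1 - <x, y>) for unit vectors
identity_holds = np.allclose(sq_dists, 2.0 * (1.0 - gram))

eps = 0.7  # arbitrary illustrative value
k_srbf = np.exp(-2.0 * eps * (1.0 - gram))  # spherical kernel, parameter eps
eps_rbf = np.sqrt(eps)  # matching Euclidean shape parameter
k_rbf = np.exp(-(eps_rbf ** 2) * sq_dists)  # Euclidean RBF kernel
kernels_match = np.allclose(k_srbf, k_rbf)

print(identity_holds, kernels_match)
```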