Let $\mathcal{X}$ be a non-empty set, and let $k\colon\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ be a symmetric positive definite kernel. For instance, the Gaussian Radial Basis Function (RBF) kernel is given by $$ k(\mathbf{x}_i, \mathbf{x}_j)=\exp\Big(-\frac{1}{2}(\mathbf{x}_i - \mathbf{x}_j)^T \Sigma^{-1} (\mathbf{x}_i - \mathbf{x}_j)\Big), $$ where $\Sigma$ is a symmetric positive definite matrix (a covariance matrix). In general, $k$ expresses the similarity between the points $\mathbf{x}_i$ and $\mathbf{x}_j$ anisotropically; it becomes isotropic when $\Sigma=\sigma^2I_n$, in which case $$ k(\mathbf{x}_i, \mathbf{x}_j)=\exp\Big(-\gamma\big\lVert \mathbf{x}_i - \mathbf{x}_j \big\rVert^2\Big), $$ where $\gamma=\frac{1}{2\sigma^2}$. Now assume that we have points $\mathbf{x}_i\in\mathbb{R}^n$, $i=1,\ldots,l$, which are the mean vectors of $l$ multivariate Gaussian distributions with covariance matrices $\Sigma_i$.
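To fix notation, here is a minimal NumPy sketch of the two RBF variants above (function names are mine); with $\Sigma=\sigma^2 I_n$ and $\gamma=\frac{1}{2\sigma^2}$ the two coincide:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Isotropic Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * diff @ diff)

def anisotropic_rbf_kernel(x, y, Sigma):
    """Anisotropic RBF kernel: exp(-0.5 * (x - y)^T Sigma^{-1} (x - y)).

    Uses a linear solve instead of forming Sigma^{-1} explicitly.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))
```

For example, with $\Sigma = I_2$ (so $\sigma^2=1$, $\gamma=1/2$) both functions return the same value.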
We would like to construct a kernel function that expresses the similarity of two points, each of which is drawn from one of the above Gaussian distributions. To be more precise, let $\mathbf{x} \sim N(\mathbf{x}_i,\Sigma_i)$ and $\mathbf{y} \sim N(\mathbf{x}_j,\Sigma_j)$, $i,j=1,\ldots,l$. A kernel function with the desired property would hopefully be of the form $$ k(\mathbf{x}, \mathbf{y})=\exp\Big(-\frac{1}{2}(\mathbf{x} - \mathbf{y})^T \Sigma^{-1} (\mathbf{x} - \mathbf{y})\Big). $$
However: (a) We must somehow encode the association between $\mathbf{x}, \mathbf{y}$ and their respective covariance matrices $\Sigma_i,\Sigma_j$ (i.e., that $\mathbf{x}$ corresponds to $\Sigma_i$, while $\mathbf{y}$ corresponds to $\Sigma_j$). Should we construct a generalized kernel (maybe a hyperkernel similar, but not identical, to this) as $k( (\mathbf{x},\Sigma_i), (\mathbf{y},\Sigma_j) )$? Is this possible while preserving symmetry and positive definiteness? If so, (b) what would be a reasonable candidate covariance matrix $\Sigma$? Intuitively, it should be of the form $\Sigma=\phi(\Sigma_i,\Sigma_j)$, where $\phi$ is a symmetric function on $\mathbb{S}^{n}_{++}\times\mathbb{S}^{n}_{++}$, and $\mathbb{S}^{n}_{++}$ is the space of symmetric positive definite $n\times n$ matrices. For instance, $\Sigma = \Sigma_i+\Sigma_j$. Is this even reasonable/feasible?
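To make the idea concrete, here is a minimal NumPy sketch of the proposed pair kernel with the example choice $\phi(\Sigma_i,\Sigma_j)=\Sigma_i+\Sigma_j$ (the function name is mine). Symmetry of $\phi$ makes the kernel symmetric by construction; positive definiteness is exactly the part in question and is not established by this sketch:

```python
import numpy as np

def pair_kernel(x, Sigma_i, y, Sigma_j):
    """Proposed k((x, Sigma_i), (y, Sigma_j)) with Sigma = Sigma_i + Sigma_j.

    Symmetric because Sigma_i + Sigma_j = Sigma_j + Sigma_i; whether it is
    positive definite as a kernel is the open question in the post.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    Sigma = Sigma_i + Sigma_j  # the candidate phi(Sigma_i, Sigma_j)
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))
```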
Please keep in mind that I may be mistaken at any of the above stages. Please feel free to correct me and advise me appropriately! Many thanks!
I do not quite understand what you are trying to do. You said you want to define a kernel on $\boldsymbol{x} \sim \mathcal{N}(\boldsymbol{x}_i, \Sigma_i)$ and $\boldsymbol{y} \sim \mathcal{N}(\boldsymbol{x}_j, \Sigma_j)$.
Why does the distribution of your points $\{ \boldsymbol{x}, \boldsymbol{y} \}$ matter? In general, you do not need any distributional assumption. If you want to define a kernel on the two distributions directly, an obvious way is to define a kernel on the parameters $\boldsymbol{x}_i, \Sigma_i, \boldsymbol{x}_j, \Sigma_j$. The resulting kernel will depend on the parametrization. There are also many kernels on distributions that are independent of parametrization. See, say, Probability Product Kernels. The set $\mathcal{X}$ on which you can define a kernel can contain objects of any kind. In the latter case, it is a set of distributions.
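As a concrete instance: the probability product kernel with $\rho=1$ (the expected likelihood kernel) between two Gaussians has a closed form, $\int N(x;\mu_i,\Sigma_i)\,N(x;\mu_j,\Sigma_j)\,dx = N(\mu_i;\mu_j,\Sigma_i+\Sigma_j)$, which notably involves the very sum $\Sigma_i+\Sigma_j$ you proposed. A minimal NumPy sketch (function name mine):

```python
import numpy as np

def expected_likelihood_kernel(mu_i, Sigma_i, mu_j, Sigma_j):
    """Probability product kernel with rho = 1 between two Gaussians:
    k = int N(x; mu_i, Sigma_i) N(x; mu_j, Sigma_j) dx
      = N(mu_i; mu_j, Sigma_i + Sigma_j)   (closed form).
    """
    d = len(mu_i)
    S = Sigma_i + Sigma_j
    diff = np.asarray(mu_i, dtype=float) - np.asarray(mu_j, dtype=float)
    quad = diff @ np.linalg.solve(S, diff)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(S))
    return np.exp(-0.5 * quad) / norm
```

This kernel is symmetric and positive definite on the set of Gaussian distributions, since it is an inner product of densities in $L^2$.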
Updated: Assume that you want to define a kernel on distributions. Then the previously mentioned paper is relevant. Further, if your distributions are non-parametric, you may want to look at work on "mean embeddings of distributions"; see the Wikipedia article on kernel embedding of distributions. For example, the inner product of the mean-embedded distributions in the RKHS is a valid kernel on your distributions. This may or may not make sense to you; the point is that, given two arbitrary distributions, you can define a kernel on them.
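To illustrate the mean-embedding idea, here is a sketch of the standard empirical estimate of the RKHS inner product $\langle \mu_P, \mu_Q \rangle$ from samples of the two distributions, using an isotropic RBF base kernel (the function name and the default $\gamma$ are my choices):

```python
import numpy as np

def mean_embedding_inner_product(X, Y, gamma=0.5):
    """Empirical estimate of <mu_P, mu_Q> in the RKHS of an RBF kernel,
    from samples X ~ P (shape n x d) and Y ~ Q (shape m x d):
        (1 / (n * m)) * sum_{i,j} exp(-gamma * ||x_i - y_j||^2).
    """
    # Pairwise squared distances via broadcasting: shape (n, m).
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq).mean()
```

This is a valid kernel on (empirical) distributions, and it requires no parametric assumptions about $P$ and $Q$; with Gaussian $P$, $Q$ and an RBF base kernel the population inner product even has a closed form.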