Let $\mathcal{H}\colon\mathbf{w}^\top\mathbf{x}+b=0$ denote a hyperplane in $\Bbb{R}^n$ and let $\Omega_{\pm}=\{\mathbf{x}\in\Bbb{R}^n\colon\mathbf{w}^\top\mathbf{x}+b\gtrless0\}$ be the positive and negative halfspaces of $\Bbb{R}^n$ with respect to $\mathcal{H}$.
Also, let $d\colon\Bbb{R}^n\to\Bbb{R}$ be a function with $$ d(\mathbf{x})=\mathbf{w}^\top\mathbf{x}+b. $$
I use $d$ as a (signed) "distance" function to characterize the position of an arbitrary point $\mathbf{x}\in\Bbb{R}^n$ relative to the hyperplane $\mathcal{H}$. More specifically, I want $d$ to increase as the point goes deeper into the positive halfspace and to decrease as it goes deeper into the negative halfspace. $d$ does a good job for this purpose. A $2$D example is shown below.
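To make the behaviour of $d$ concrete, here is a minimal sketch in plain Python. The hyperplane $x_1+x_2-1=0$ and the sample points are hypothetical values chosen only for illustration. Note that $d$ equals the Euclidean signed distance only when $\|\mathbf{w}\|=1$; otherwise it is that distance scaled by $\|\mathbf{w}\|$.

```python
def d(x, w, b):
    """Signed 'distance' d(x) = w^T x + b (not normalized by ||w||)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical 2D hyperplane x1 + x2 - 1 = 0 (illustration only).
w, b = [1.0, 1.0], -1.0

deep_pos = d([3.0, 3.0], w, b)    # far inside the positive halfspace
near_pos = d([1.0, 0.5], w, b)    # positive halfspace, close to the hyperplane
on_plane = d([0.5, 0.5], w, b)    # exactly on the hyperplane
deep_neg = d([-2.0, -2.0], w, b)  # far inside the negative halfspace

print(deep_pos, near_pos, on_plane, deep_neg)
```

The value grows as the point moves deeper into the positive halfspace, is zero on $\mathcal{H}$, and becomes more negative deeper into the negative halfspace.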
Now, let's assume that $\mathbf{x}$ is a normal random vector with mean $\bar{\mathbf{x}}$ and covariance matrix $\Sigma$. A $2$D example is shown below. The ellipse denotes an iso-density locus that encloses the points with density greater than a certain value. I use it just for illustration purposes.
For computing a similar "distance" measure that fulfills the above criterion, one could compute the expected value of $d(\mathbf{x})$, which is given by $d(\bar{\mathbf{x}})=\mathbf{w}^\top\bar{\mathbf{x}}+b$. This would give a good answer when the ellipse lies entirely inside one halfspace (the positive one in this case), but what about the following example?
Intuitively, I would expect the computed "distance" measure to be less than the one computed in the previous example, since there is a part of the ellipse that belongs to the opposite halfspace, but using just the mean operator, these values will be the same.
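The point can be checked numerically. Since $d$ is affine, $d(\mathbf{x})$ is itself normal with mean $\mathbf{w}^\top\bar{\mathbf{x}}+b$ and variance $\mathbf{w}^\top\Sigma\mathbf{w}$. The sketch below uses a hypothetical hyperplane, mean, and two covariance matrices (a "tight" and a "wide" one), all chosen only for illustration; the helper name `mean_and_variance_of_d` is likewise made up.

```python
def mean_and_variance_of_d(w, b, mean, cov):
    """For x ~ N(mean, cov), d(x) = w^T x + b is N(w^T mean + b, w^T cov w)."""
    n = len(w)
    mu = sum(w[i] * mean[i] for i in range(n)) + b
    var = sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))
    return mu, var

# Hypothetical example values (illustration only).
w, b = [1.0, 1.0], -1.0
mean = [1.5, 1.5]
tight_cov = [[0.25, 0.0], [0.0, 0.25]]  # ellipse well inside the positive halfspace
wide_cov = [[4.0, 0.0], [0.0, 4.0]]     # ellipse spilling into the negative halfspace

mu_tight, var_tight = mean_and_variance_of_d(w, b, mean, tight_cov)
mu_wide, var_wide = mean_and_variance_of_d(w, b, mean, wide_cov)

print(mu_tight, var_tight)
print(mu_wide, var_wide)
```

Both cases give the same expected value of $d$, while the variance $\mathbf{w}^\top\Sigma\mathbf{w}$ differs, which is exactly the information the mean alone discards.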
Of course, taking the expected distance does account for the fact that the random vector also takes values in the negative halfspace, but I think that knowledge of the variance could also be used to arrive at a more meaningful "distance" measure.
Let's take for granted that I'm able to compute the partial expectations of $d(\mathbf{x})$ over the two halfspaces, i.e., to compute (analytically) the integrals $$ d_{\pm} = \int_{\Omega_{\pm}}d(\mathbf{x})f(\mathbf{x})\,\mathrm{d}\mathbf{x}, $$ where $f$ is the probability density function of $\mathbf{x}$. One could note that $$ d_+ + d_- = d(\bar{\mathbf{x}}),\quad d_+\geq0,\quad d_-\leq0. $$ I could use $d_{\pm}$ to "scale" $d(\bar{\mathbf{x}})$ appropriately, for instance.
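For reference, these integrals reduce to one-dimensional ones, because $d(\mathbf{x})$ is normal with mean $\mu=\mathbf{w}^\top\bar{\mathbf{x}}+b$ and variance $\sigma^2=\mathbf{w}^\top\Sigma\mathbf{w}$. Assuming $\sigma>0$, and writing $\Phi$ and $\varphi$ for the standard normal CDF and PDF (notation introduced here, not used above), the standard partial-expectation formulas give $$ d_+=\mu\,\Phi(\mu/\sigma)+\sigma\,\varphi(\mu/\sigma),\qquad d_-=\mu\,\Phi(-\mu/\sigma)-\sigma\,\varphi(\mu/\sigma). $$ A minimal sketch in plain Python follows; the function name `partial_expectations` and the numeric values are hypothetical.

```python
import math

def partial_expectations(mu, sigma):
    """Return (d_plus, d_minus) for Y ~ N(mu, sigma^2), assuming sigma > 0.

    d_plus  = E[Y * 1{Y > 0}],  d_minus = E[Y * 1{Y < 0}].
    """
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF at z
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF at z
    d_plus = mu * cdf + sigma * pdf
    d_minus = mu * (1.0 - cdf) - sigma * pdf
    return d_plus, d_minus

# Hypothetical values: same mean of d, small vs. large spread along w.
mu = 2.0
tight = partial_expectations(mu, math.sqrt(0.5))
wide = partial_expectations(mu, math.sqrt(8.0))

print(tight)
print(wide)
```

With the mean held fixed, a larger $\sigma$ pushes $d_-$ further below zero and $d_+$ further above $\mu$, so the pair $(d_+,d_-)$ does distinguish the two situations even though their sum does not.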
What are your thoughts? Thanks!


