How can we relate the mean of a function to the mean of some corresponding PDF?


Suppose $f:[a, b]\rightarrow\mathbb{R}$ is a real-valued function on a compact interval. For the sake of simplicity, let us also assume $f$ is continuously differentiable for now. The mean of $f$ is known to be

\begin{align}\tag{1} \mu = \frac{1}{b-a}\int_{a}^{b} f(x)\, dx. \end{align}

What I am wondering is, how can we relate this to the mean of a PDF?

The discrete analog of this question is easy enough. Given $\mu = (x_{1} + \cdots + x_{n})/n$ we can relate this to $\mu = \sum xp(x)$ by taking $p(x)$ to be the fraction of times $x$ appears in $x_{1}, \ldots, x_{n}$. However, it doesn't seem as simple when we are dealing with continuous functions.

Problem. To formulate my question more precisely, I'll state it in the form of a math problem. Given $f(x)$, can we find a probability density $p(y)$ such that, if we choose $x\in [a, b]$ uniformly at random (and apply $f$), then the probability density of obtaining $y = f(x)$ is $p(y)$? In particular, given such a $p$ we should have

\begin{align}\tag{2} \mu = \int_{-\infty}^{\infty} yp(y) \, dy. \end{align}
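As a quick numerical sanity check (a hedged sketch; the choice $f(x)=\sin x$ on $[0,\pi]$ is an illustrative assumption, not part of the question), one can verify that the mean $(1)$ of $f$ agrees with the mean of $y=f(x)$ for uniformly random $x$, which is exactly the mean $(2)$ of the push-forward density:

```python
import math
import random

# Sketch: compare the function mean (1) with the mean of y = f(x) under a
# uniformly random x, i.e. the mean (2) of the push-forward density p(y).
# The choice f(x) = sin(x) on [a, b] = [0, pi] is an illustrative assumption.
random.seed(0)
a, b = 0.0, math.pi
f = math.sin

# Mean (1) via a midpoint Riemann sum of (1/(b-a)) * integral of f.
n = 100_000
mu_riemann = sum(f(a + (b - a) * (k + 0.5) / n) for k in range(n)) / n

# Mean (2) via Monte Carlo: draw x uniformly on [a, b], average y = f(x).
mu_mc = sum(f(random.uniform(a, b)) for _ in range(200_000)) / 200_000

# Both estimate 2/pi ≈ 0.6366.
```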


Example 1. Suppose $f:[a, b]\rightarrow\mathbb{R}$ is a constant function $f(x) = c$. Then the corresponding probability density has to be $p(y) = \delta(y-c)$ where $\delta$ is the Dirac delta function.

Example 2. If $f(x) = A + \frac{x-a}{b-a}(B-A)$ then the corresponding probability density has to be

$$ p(y) = \begin{cases} \frac{1}{B-A} &\text{ if } y\in[A, B], \\ 0 &\text{ otherwise.} \end{cases} $$
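A small Monte Carlo check of this example (the endpoint values below are arbitrary illustrative assumptions): a linear $f$ should push the uniform distribution on $[a,b]$ forward to the uniform distribution on $[A,B]$, so each quarter of $[A,B]$ should receive mass $1/4$.

```python
import random

# Sketch: y = A + (x-a)/(b-a) * (B-A) with x uniform on [a, b] should be
# uniform on [A, B]. The endpoints below are arbitrary illustrative values.
random.seed(3)
a, b, A, B = 2.0, 5.0, -1.0, 3.0
N = 200_000
ys = [A + (random.uniform(a, b) - a) / (b - a) * (B - A) for _ in range(N)]

# Count how many samples land in each quarter of [A, B].
quarter = (B - A) / 4.0
counts = [0, 0, 0, 0]
for y in ys:
    k = min(int((y - A) / quarter), 3)  # clamp y == B into the last bin
    counts[k] += 1
masses = [c / N for c in counts]  # each should be close to 1/4
```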


Approach 1. My first idea was to divide the codomain of $f$ into discrete intervals, writing

$$ \mathbb{R} = \bigcup_{k\in\mathbb{Z}} \,[\tfrac{k}{n}, \tfrac{k+1}{n}]. $$

Then define

$$ p(y) = N \int_{a}^{b} I(\tfrac{\lfloor ny \rfloor}{n}\le f(x)\le \tfrac{\lfloor ny \rfloor + 1}{n}) \, dx $$

where $N = n/(b-a)$ is the normalization constant: the integral measures the set of $x$ mapping into the bin, dividing by $b-a$ turns that into a probability, and multiplying by $n$ converts bin probability into a density on a bin of width $1/n$. Here $I(\cdots)$ is the indicator function that is $1$ if and only if the condition in the parentheses is satisfied, and $0$ otherwise. I imagine we obtain the desired PDF by sending $n\rightarrow\infty$.
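A hedged sketch of this binning scheme at finite $n$ (with the illustrative choice $f(x)=\sin x$ on $[0,\pi]$): estimate the probability mass of each codomain bin from a grid of $x$ values, then recover the mean via $(2)$.

```python
import math
from collections import defaultdict

# Sketch of Approach 1: discretize the codomain into bins of width 1/n and
# estimate the mass of each bin as the fraction of x-values mapping into it,
# so p(y) ≈ n * mass on that bin (i.e. N = n/(b-a) after dividing by b-a).
# f(x) = sin(x) on [0, pi] is an illustrative assumption.
a, b = 0.0, math.pi
f = math.sin
n = 200          # codomain bins have width 1/n
n_x = 200_000    # resolution of the x-grid

mass = defaultdict(float)  # probability mass per bin index k = floor(n * f(x))
for i in range(n_x):
    x = a + (b - a) * (i + 0.5) / n_x
    mass[math.floor(n * f(x))] += 1.0 / n_x  # each grid point carries dx/(b-a)

# Mean via (2): sum over bins of (bin midpoint) * p * (bin width)
#             = sum over bins of (bin midpoint) * mass.
mu = sum(((k + 0.5) / n) * m for k, m in mass.items())
# mu should approximate (1/(b-a)) * ∫ sin(x) dx = 2/pi.
```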

Approach 2. Given the framing of my problem, $x$ has a uniform PDF $\lambda(x)$ on $[a, b]$. It seems that $y= f(x)$ is a transformation of variables. Assuming $f$ is strictly increasing or strictly decreasing, the change of variables formula for PDFs gives us

$$ p(y) = \lambda(f^{-1}(y)) \cdot |(f^{-1}(y))'| = \lambda(f^{-1}(y)) \frac{1}{|f'(f^{-1}(y))|}. $$

This makes sense: the steeper $f$ is near the output $y$, the less probability density accumulates at $y$. Unfortunately, this approach only works when $f$ is strictly increasing or strictly decreasing. I am wondering how we could incorporate the case where $f$ is constant, as in Example 1 above.
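For a concrete monotone instance (an illustrative assumption: $f(x)=x^2$ on $[1,2]$, where $f$ is strictly increasing), the formula gives $p(y) = \frac{1}{(b-a)\,2\sqrt{y}}$ on $[1,4]$, and one can check numerically that $p$ integrates to $1$ and that $(2)$ reproduces $(1) = \int_1^2 x^2\,dx = 7/3$:

```python
import math

# Sketch of Approach 2 on the illustrative example f(x) = x^2, [a, b] = [1, 2]:
# f^{-1}(y) = sqrt(y), so p(y) = 1 / ((b-a) * f'(f^{-1}(y))) = 1 / (2*sqrt(y)).
a, b = 1.0, 2.0

def p(y):
    return 1.0 / ((b - a) * 2.0 * math.sqrt(y))

# Midpoint-rule integrals of p and y*p over [f(a), f(b)] = [1, 4].
n = 200_000
dy = (4.0 - 1.0) / n
total = sum(p(1.0 + (i + 0.5) * dy) * dy for i in range(n))
mean = sum((y := 1.0 + (i + 0.5) * dy) * p(y) * dy for i in range(n))
# total ≈ 1 and mean ≈ ∫_1^2 x^2 dx = 7/3, matching (1).
```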


My question is: is there a general way of approaching this that handles all examples? In particular, can we do this if we drop the condition that $f$ is injective (as in Approach 2)? What if we drop the condition that $f$ is continuously differentiable?

Is this problem well-known or studied? It seems surprising that I can't find anything immediately pertaining to this, because it seems like a very natural question to ask what the relationship between $(1)$ and $(2)$ is.

There are 3 answers below.

Answer 1.

I have a naïve approach, but I'm not sure:
You have already solved the cases of strictly increasing, strictly decreasing, and constant functions.
Any continuously differentiable function can always be broken into intervals on which it is exactly one of those three. Break it down like that, obtain the corresponding probability densities, and combine them with weights proportional to the lengths of the respective intervals.

For example, if $f(x) = |x| $ for $x \in [-1,2]$

This can be broken down to: $g(x)=x$ on $[0,2]$ and $h(x)=-x$ on $[-1, 0]$

The corresponding densities are $p_g(y)=1/2$ on $[0,2]$ and $p_h(y)=1$ on $[0,1]$.

The weight of the interval $[-1,0]$ is $1/3$ and for $[0,2]$ it is $2/3$.

Then, the overall $p(y)$ is:

For $y \in [0,1]$, $p(y)=1*(1/3)+(1/2)*(2/3) = 2/3 $
For $y \in (1,2]$, $p(y)=(1/2)*(2/3) = 1/3 $
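The piecewise answer above can be checked by direct simulation (a hedged sketch; the sample size and tolerances are arbitrary):

```python
import random

# Sketch: for f(x) = |x| with x uniform on [-1, 2], the claimed density is
# p(y) = 2/3 on [0, 1] and 1/3 on (1, 2], so P(y <= 1) should be ~2/3 and
# the mean should be ∫ y p(y) dy = (2/3)(1/2) + (1/3)(3/2) = 5/6.
random.seed(1)
N = 300_000
ys = [abs(random.uniform(-1.0, 2.0)) for _ in range(N)]

frac_low = sum(1 for y in ys if y <= 1.0) / N  # mass of [0, 1], ~2/3
frac_high = 1.0 - frac_low                     # mass of (1, 2], ~1/3
mean_y = sum(ys) / N                           # ~5/6
```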


The question is basically: given the pdf of a RV $X$, how do we find the pdf of $f(X)$? This is a studied concept, and you can find lecture notes like this and this.

This blog concerns pdf of arbitrary transformations, and goes into technical details.

Answer 2.

I will give a more abstract answer. We work on the probability space $(\Omega,\mathscr{F},P)$. Let $f:\mathbb{R}\to \mathbb{R}$ be a measurable function and $X$ a random variable with law $P_X$. Then $$P(f(X)\in B)=P(X \in f^{-1}(B))=P_X(f^{-1}(B))=P_{f(X)}(B),\quad B \in \mathscr{B}(\mathbb{R}),$$ is the law of $f(X)$; this is a probability measure on the measurable space $(\mathbb{R},\mathscr{B}(\mathbb{R}))$. We are not guaranteed to have a (Lebesgue) density for $P_{f(X)}$. If $X \sim \textrm{Uniform}([a,b])$ we have $$P_{f(X)}((-\infty,t])=\frac{1}{b-a}\int_{[a,b]}\mathbf{1}_{f^{-1}((-\infty,t])}(x)\,dx=\frac{\lambda([a,b]\cap f^{-1}((-\infty,t]))}{b-a},\quad t \in \mathbb{R},$$ where $\lambda$ is the Lebesgue measure. So ultimately $$\frac{1}{b-a}\int_{[a,b]}f(x)\,dx=\int_\mathbb{R}f(x)\,P_X(dx)=\int_\mathbb{R}y\,P_{f(X)}(dy),$$ and we have related the integral to a statistical mean with respect to a probability law.
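A numerical illustration of the CDF formula above (with the illustrative, non-monotone choice $f(x)=x^2$ and $X\sim\textrm{Uniform}([-1,2])$): here $f^{-1}((-\infty,t]) = [-\sqrt{t},\sqrt{t}]$ for $t\ge 0$, so $P_{f(X)}((-\infty,t]) = \frac{2\sqrt{t}}{3}$ for $t\in[0,1]$ and $\frac{1+\sqrt{t}}{3}$ for $t\in(1,4]$.

```python
import math
import random

# Sketch: check P_{f(X)}((-inf, t]) = lambda([a,b] ∩ f^{-1}((-inf, t])) / (b-a)
# against an empirical CDF, for the illustrative case f(x) = x^2 on [-1, 2].
random.seed(2)
a, b = -1.0, 2.0

def cdf_exact(t):
    if t < 0.0:
        return 0.0
    if t <= 1.0:
        return 2.0 * math.sqrt(t) / (b - a)          # preimage [-sqrt(t), sqrt(t)]
    return min((1.0 + math.sqrt(t)) / (b - a), 1.0)  # preimage clipped at a = -1

N = 200_000
ys = [random.uniform(a, b) ** 2 for _ in range(N)]
max_err = max(abs(sum(1 for y in ys if y <= t) / N - cdf_exact(t))
              for t in (0.25, 0.5, 1.0, 2.0, 3.0))
```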

Answer 3.

If I understand your question correctly, you want $p(y)$ such that $\int_{y_1}^{y_2}p(y)\,dy$ is the probability that $x$ will be such that $y_1<f(x)<y_2$. So $\int_{y_1}^{y_2}p(y)\,dy$ is the measure of the pre-image of $[y_1,y_2]$ under $f$. For "nice" functions, $p(y)$ is then the derivative of $y\mapsto\int_{y_1}^{y}p(u)\,du$ (fundamental theorem of calculus), and is therefore equal to something that is in some sense the "derivative" of the measure of the pre-image at $y$. Since the pre-image is in some sense the "inverse" of the function, the "derivative" is the derivative of the "inverse", and thus the inverse of the derivative. The first "inverse" in the previous sentence refers to the functional inverse, while the last refers to the multiplicative one, but those are in some sense "the same": if $g(f(x)) = x$, then $g'(f(x)) = 1/f'(x)$. This can be understood with the following logic: if the derivative of $f$ at $f(x)=y$ is large, then $f$ doesn't spend much time in the neighborhood of $y$, and so $p(y)$ should be small. Another way of looking at it is to imagine a horizontal line that starts at height $y_0$ and is then moved vertically a distance $\Delta y$. We look at the $x$-coordinate of where the line originally intersected the curve and where it intersected afterwards, take the difference, and take the ratio of that to $\Delta y$. This is run over rise, the reciprocal of slope.

There are further complications. We have to scale by the probability density of $x$, which means dividing by $b-a$. Furthermore, if we had some other function that is just $f$ horizontally reflected on $[a,b]$, we should get the same $p(y)$, which leads to the conclusion that we have to take the absolute value of the derivative. If $f$ passes through $y$ at multiple points, then we need to sum over all those points. Finally, there is the issue of the derivative being zero. At those points, $x$ is not locally expressible as a function of $y$, and we have to look at higher derivatives to see the behavior. If the second derivative is non-zero, and the first derivative is non-zero in a punctured neighborhood, then the limit of $p(y)$ from one side is infinite while from the other side it is zero. If you imagine the horizontal line intersecting near a local maximum, then as the line approaches the maximum from below, the distance between the two intersections decreases faster and faster; coming from above, there are no intersections, so the distance remains a constant zero.
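The recipe in this answer — divide by $b-a$, take $|f'|$, and sum over all preimages — can be sketched numerically (an illustrative assumption: $f(x)=x^2$ on $[-1,2]$, whose single critical value $y=0$ has negligible effect on the integrals):

```python
import math

# Sketch of the "sum over preimages" rule:
#   p(y) = sum over x in f^{-1}(y) ∩ [a, b] of 1 / ((b - a) * |f'(x)|).
# Illustrative case f(x) = x^2 on [a, b] = [-1, 2]: preimages of y > 0 are ±sqrt(y).
a, b = -1.0, 2.0

def p(y):
    if y <= 0.0:
        return 0.0
    total = 0.0
    for x in (math.sqrt(y), -math.sqrt(y)):
        if a <= x <= b:
            total += 1.0 / ((b - a) * abs(2.0 * x))  # f'(x) = 2x
    return total

# p should integrate to 1, and its mean should match (1/(b-a)) * ∫ x^2 dx = 1.
n = 200_000
dy = 4.0 / n  # f([a, b]) = [0, 4]
total_mass = sum(p((i + 0.5) * dy) * dy for i in range(n))
mean = sum((y := (i + 0.5) * dy) * p(y) * dy for i in range(n))
```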