Can any function of the second moment of a random variable be recovered from its quantile function?

964 Views Asked by At

Summary of question

It is known that the expected value of a random variable can be obtained from integrating its survival function. This is easily restated in terms of the quantile function as: $$ \int_0^\infty S(x)\;dx = \int_0^1F^{-1}(x) $$ whose equivalence can be seen graphically as integrating over the same area. This is useful, as when dealing with empirical or stochastically generated CDFs, layer loss statistics can be estimated directly off of the empirical quantile/CDF. It would be useful to be able to extract layer loss variance information in a similar way. Is there any similar relationship or method to extract higher moments, limited or unlimited, from the quantile function?

Detailed question

Background

It is a known property that given a random variable $X$ with density $f(x)$, cumulative distribution function (CDF) $F(x)$, survival function $S(x)$, and quantile function $F^{-1}(x)$, there is the following relationship between S(x) and E(X): $$ E(X) = \int_0^\infty xf(x)\;dx = \int_0^\infty S(x)\;dx = \int_0^1F^{-1}(x)\;dx $$

The last form, obtained by solving for the area in terms of y instead of x, is particurlarly convenient when forced to estimate these calculations in tabular formal (like Excel).

Somewhat less known is that what we call the "layer loss cost" or the amount of the random variable that pierces a threshold but is capped by a limit is also recoverable from the survival function. Let $E(X\wedge A)$ be defined as the limited expected value of $X$ capped at $A$, or: $$ E(X\wedge A) = \int_0^A xf(x)\;dx + A\int_A^\infty f(x)\;dx $$ Given random variable $Y$, the loss in the layer between $A$ and $B$, defined as $\min(B-A, \max(0, X - A))$, or equvalently $\textrm{median}(0, B-A, X-A)$, the following relationship also holds: $$ \begin{align} E(Y) &= E(X\wedge B) - E(X\wedge A)\\ &= \int_A^B (x - A)f(x)\;dx + (B-A) \int_B^\infty f(x)\;dx\\ &= \int_A^B S(x)\;dx \end{align} $$

So it can be said that the CDF/survival function encodes information about the first moment (limited and unlimited) which can be relatively easy to extract.

Proof for first moment

Both relationships are usually proved by integration by parts, and can be proved generally without recourse to the specific form of the distribution at hand. $$ \begin{align} E(Y) &= \int_A^B (x - A)f(x)\;dx + (B-A) \int_B^\infty f(x)\;dx\\ &= \int_A^Bxf(x)\;dx - A \int_A^Bf(x)\;dx + (B - A)S(B)\\ &= \int_A^Bxf(x)\;dx - A\left[S(A) - S(B)\right] + BS(B) - AS(B)\\ &= \int_A^Bxf(x)\;dx - AS(A) + AS(B) + BS(B) - AS(B) \end{align} $$ Now to focus on the first term. Using the fact that $S(x) = 1 - F(x)$ we have $$ \frac{d}{dx}S(x) = -f(x) $$

So using integration by parts, we can let $u = x, du = 1, dv = f(x), v = -S(x)$ and state generally that: $$ \begin{align} \int_A^B xf(x)\;dx &= -xS(x)\bigg\vert_A^B + \int_A^B S(x) dx\\ &= \int_A^B S(x)\;dx - BS(B) + AS(A) \end{align} $$ Substituting the above in the original question we get $$ \begin{align} E(Y) = \int_A^B S(x)\;dx &- BS(B) + AS(A) \\ &- AS(A) + AS(B) + BS(B) - AS(B) \end{align} $$ All the non-integral terms cancel, leaving us with $E(Y) = \int_A^B S(x)\;dx\;\blacksquare$.

Attempt for second moment

I am interested in the variance of the random variable $Y$ as defined above. The most starightforward way to approach this would be to calculate $E(Y^2)$ and the subtract $E(Y)^2$ as obtained from the quantile function per above. This could be stated as: $$ E(Y^2) = \int_A^B (x - A)^2 f(x)\;dx + (B-A)^2\int_B^\infty f(x)\;dx $$ Expanding the terms, engaging in some algebraic manipulation and cancellation, and using the result from above for the first moment, we get (if I have not made a mistake): $$ \int_A^B x^2 f(x)\;dx - 2A\overbrace{\int_A^BS(x)\;dx}^{\textrm{Can use } F^{-1}(x)} - A^2S(A) + B^2S(B) $$

At this point I am stuck, since to use integration by parts to handle the first term would require knowing the antiderivative of the general survival function, which I don't know, nor do I know how to convert that into using the quantile function, as could be done for the survival function itself.

Specific problem

Using the same integration by parts technique on $\int_A^B x^2 f(x)\;dx$, the term $$ \int_A^B x S(x)\;dx $$

arises, and I do not know how to proceed from there.

Recap

As mentioned in the summary, often I will have an ground-up (0 to infinity) empirical distribution function for a random variable representing a loss severity, and I am interested not only in the expected loss in a layer but the variance of said loss, so being able to extract the second moment from the ECDF/quantile table would be very helpful. Thank you.

Updated with Answer

For completeness sake, for unlimited moments, just as: $$ E(X) = \int_0^\infty S(x)\;dx $$ the corresponding formula for the second moment is: $$ E(X^2) = 2\int_0^\infty xS(x)\;dx $$ And for layer values $Y$ as defined above, just as $$ E(Y) = \int_A^B S(x)\;dx $$ the second moment of $Y$ can be recovered as: $$ E(Y^2) = 2\int_A^B (x-A)S(x)\;dx $$

Updated with more complete and elegant answer to the general (non-layered) case

See this question for a closed form answer for getting the moments of a distribution through integrating its quantile function, as well as the solution to $\int_0^\infty xS(x)dx$.

1

There are 1 best solutions below

1
On BEST ANSWER

If you want to estimate some functional of the original distribution, then you can always apply the same functional to an estimate of that distribution (the "plug-in principle"). In the case of moments, this would just amount to integrating the EDF, or treating the jumps as point masses and calculating the moments of this discrete distribution.