How to define a PDF for data with unknown distribution?


I have a dataset of real values and I want to find the associated PDF. Is there a method for estimating the PDF of data whose distribution is unknown?


Best answer

There is no single right answer. If you know which family of distributions the data come from, you can estimate the parameters of that family and plug them in to get a pdf. If you do not wish to assume a particular family, you can approach the problem nonparametrically, for example with kernel density estimation (searching for that term turns up many explanations). The basic idea is the following:

Suppose you have an iid sample $X_1, \dots, X_n$. Take some pdf $g$, called the kernel (it need not be the pdf of the data!), and a smoothing parameter (also known as the bandwidth) $h>0$. The pdf of the data is then estimated by $$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^n g\left(\frac{x-X_i}{h}\right).$$
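To make the formula concrete, here is a minimal sketch in plain Python. Several things in it are my own assumptions rather than part of the question: the kernel $g$ is taken to be the standard normal pdf, the data are synthetic draws from a normal distribution, and the bandwidth comes from the common normal-reference rule of thumb $h = 1.06\,\hat\sigma\,n^{-1/5}$. The function names are hypothetical.

```python
import math
import random
import statistics


def gaussian_kernel(u):
    """Standard normal pdf, used here as the kernel g (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(sample, h, kernel=gaussian_kernel):
    """Return f_hat with f_hat(x) = 1/(n*h) * sum_i g((x - X_i) / h)."""
    n = len(sample)

    def f_hat(x):
        return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

    return f_hat


# Synthetic data for illustration only (not the asker's dataset).
random.seed(0)
sample = [random.gauss(5.0, 2.0) for _ in range(500)]

# Rule-of-thumb bandwidth (an assumption; other choices are possible).
h = 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)

f_hat = kde(sample, h)
print(f_hat(5.0))  # estimated density at x = 5
```

The result depends heavily on $h$: a small bandwidth gives a spiky estimate that follows individual data points, a large one smooths away real structure. In practice a library routine such as `scipy.stats.gaussian_kde` would usually replace this hand-written loop.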

However, which method is best depends on the situation. What do you already know (or believe) about the data, and what do you want to use the pdf for?

If you want to estimate the probability $\mathbb{P}(X_1<7)$ for example, the empirical CDF might be a much better method.
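For reference, the empirical CDF is $F_n(t) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{X_i \le t\}$, the fraction of observations at or below $t$. A small sketch follows, with made-up numbers and a hypothetical function name. For continuous data, $\mathbb{P}(X_1<7)$ and $\mathbb{P}(X_1\le 7)$ coincide, so $F_n(7)$ serves as the estimate.

```python
def ecdf(sample):
    """Return F_n, where F_n(t) is the fraction of observations <= t."""
    n = len(sample)

    def F_n(t):
        return sum(1 for xi in sample if xi <= t) / n

    return F_n


# Made-up observations for illustration only.
data = [2.1, 9.4, 6.8, 7.5, 3.3, 5.0, 8.2, 6.1]

F_n = ecdf(data)
print(F_n(7))  # 5 of the 8 values are <= 7, so this prints 0.625
```

No kernel or bandwidth has to be chosen here, which is part of why the empirical CDF can be the simpler tool when only a probability is needed.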