Creating a probability density function for a particular dataset


I want to create a probability density function for a particular dataset. First, I calculate the mean and the variance of the dataset. Then I use the mean and the variance as the parameters of a probability density function, for example a Gaussian distribution. Is my thinking correct?


There are 2 best solutions below

1
On BEST ANSWER

I encourage you to visualize the dataset.

For example, you have to consider if your data is symmetrical.
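As a rough numerical complement to a plot, one could also compute the sample skewness, which is close to zero for symmetric data. The following is only a sketch: the function name `sample_skewness` and both example datasets are made up for illustration, and a single number is not a substitute for looking at a histogram.

```python
import statistics

def sample_skewness(data):
    """Moment-based skewness g1 = m3 / m2**1.5, where m_k is the k-th central moment."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical datasets, chosen only to contrast the two cases.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]      # symmetric about 3
right_skewed = [1.0, 1.0, 1.5, 2.0, 9.0]   # long right tail

print(sample_skewness(symmetric), sample_skewness(right_skewed))
```

A value far from zero suggests that a symmetric model such as the normal distribution may fit poorly.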

If your data is symmetrical and you believe that a normal distribution would be a good fit, then estimating its parameters with the sample mean and the unbiased sample variance is indeed common practice.

This Wikipedia page describes your approach as follows:

For example, the parameter $\mu$ (the expectation) can be estimated by the mean of the data and the $\sigma^{2}$ (the variance) can be estimated from the standard deviation of the data. The mean is found as $m=\frac{1}{n}\sum X$, where $X$ is the data value and $n$ the number of data, while the standard deviation is calculated as $s=\sqrt{\frac{1}{n-1}\sum (X-m)^{2}}$. With these parameters many distributions, e.g. the normal distribution, are completely defined.
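The quoted recipe can be sketched in a few lines of Python using only the standard library. The dataset and the names `data`, `fitted` and `pdf` are assumptions made for illustration; `statistics.stdev` divides by $n-1$, matching the formula for $s$ above.

```python
import statistics

# Hypothetical sample; replace with your own data.
data = [4.9, 5.1, 5.0, 4.7, 5.3, 5.2, 4.8, 5.0]

m = statistics.mean(data)    # sample mean, estimate of mu
s = statistics.stdev(data)   # sample standard deviation (n - 1 in the denominator)

# Normal distribution with the estimated parameters plugged in.
fitted = statistics.NormalDist(mu=m, sigma=s)

def pdf(x):
    """Density of the fitted normal distribution at x."""
    return fitted.pdf(x)

print(f"m = {m:.4f}, s = {s:.4f}, pdf(m) = {pdf(m):.4f}")
```

Comparing `pdf` against a histogram of the data is a simple way to judge whether the normal assumption is reasonable.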

0
On

A nonparametric way to estimate a density corresponding to your data is through kernel density estimation.

Given an i.i.d. sample $(x_1,\ldots,x_n)$, this method estimates your density function at a point $x$ as

$$\widehat f_h(x)=\frac{1}{nh}\sum_{i=1}^n K\left(\frac{x-x_i}{h}\right)$$

for a suitable choice of a bandwidth parameter $h$ and kernel $K(\cdot)$.
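A minimal sketch of this estimator in plain Python follows. It assumes a Gaussian kernel for $K$ and picks $h$ with the normal-reference rule of thumb $h = 1.06\, s\, n^{-1/5}$. Both are illustrative choices rather than something prescribed here, and the sample and function names are hypothetical.

```python
import math
import statistics

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h):
    """Kernel density estimate at x: (1 / (n h)) * sum_i K((x - x_i) / h)."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)

def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth h = 1.06 * s * n**(-1/5); an assumed default."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)

sample = [1.2, 1.9, 2.1, 2.4, 3.0, 3.3, 4.8]  # hypothetical data
h = rule_of_thumb_bandwidth(sample)

# Sanity check: a density should integrate to roughly 1 (Riemann sum on a grid).
step = 0.01
grid = [i * step for i in range(-500, 1200)]
area = sum(kde(x, sample, h) for x in grid) * step

print(f"h = {h:.3f}, f_hat(2.0) = {kde(2.0, sample, h):.4f}, area = {area:.4f}")
```

A smaller $h$ gives a spikier estimate and a larger $h$ a smoother one, so it is worth trying a few values and comparing the result with a histogram.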

I encourage you to read the wiki article for details. Further lecture notes are here.