Suppose I have the following code in MATLAB:
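N = 2000;
w = zeros(1,N);   % preallocate instead of growing inside the loop
X1 = zeros(1,N);
for i=1:N
w(i) = rand(1);              % uniform sample on (0,1)
X1(i) = 4*tan(pi*w(i)/3);    % transformed sample
end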
%plot X1
figure
plot(w,X1,'.');
xlabel('w');
ylabel('X1');
%plot CDF of X1
figure
cdfplot(X1);
How do I generate a PDF of X1? There is no "pdfplot" function.
Use histogram(X1) or histogram(X1,n) instead. Note that a histogram is just a scaled estimate of the PDF. Since sampled data are discrete even when they are drawn from a continuous distribution, this is the appropriate approach.
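As a rough sketch of what this looks like in practice: histogram accepts a 'Normalization','pdf' option that scales the bars so their areas sum to 1. The analytic curve below is an addition, not part of the original question. It assumes w is uniform on (0,1), so that inverting $x = 4\tan(\pi w/3)$ gives $F(x) = \frac{3}{\pi}\arctan(x/4)$ and $f(x) = \frac{12}{\pi(16 + x^2)}$ on $0 < x < 4\sqrt{3}$.

```matlab
% Sample X1 as in the question (vectorized form of the same loop)
N  = 2000;
w  = rand(1, N);
X1 = 4*tan(pi*w/3);

figure
histogram(X1, 'Normalization', 'pdf');   % bar areas sum to 1
hold on

% Analytic density for comparison (derived under the uniform-w assumption)
x = linspace(0, 4*sqrt(3), 200);
f = 12 ./ (pi*(16 + x.^2));
plot(x, f, 'LineWidth', 1.5);
hold off

xlabel('X1');
ylabel('density');
legend('normalized histogram', 'analytic PDF');
```

With 2000 samples the bars should follow the curve fairly closely, though the match depends on the number of bins.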
At first glance, it may not appear as though a histogram is what you want. But it is.
Let's take a look at what CDFs and PDFs are for continuous distributions.
The CDF $F(x)$ evaluated at any value $x$ gives the probability that the random variable $X$ is less than or equal to $x$. As such, the CDF is nondecreasing and rises from 0 to 1 over the support of the random variable.
The PDF is the derivative of the CDF. The PDF $f(x)$ evaluated at a single point $x$ is a density, not a probability, so on its own it is practically meaningless. It's the shape of the PDF that matters, and at any value $x$, what we care about is the area under the curve to the left of that point, which is precisely the CDF.
When analyzing sampled data, it's very easy to construct CDFs: simply sort the data and at each data point compute the fraction of values less than or equal to that value.
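The sort-and-count recipe can be sketched in a few lines. The variable names xs and Fhat are illustrative only, and the data are regenerated here so the snippet runs on its own.

```matlab
% Empirical CDF by sorting (same X1 as in the question)
N  = 2000;
X1 = 4*tan(pi*rand(1, N)/3);

xs   = sort(X1);       % sorted sample values
Fhat = (1:N) / N;      % fraction of samples <= xs(k)

figure
stairs(xs, Fhat);      % step plot of the empirical CDF
xlabel('X1');
ylabel('empirical CDF');
```

This should produce essentially the same curve as cdfplot(X1).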
It's not so easy to generate a PDF of sampled data. How do we take a derivative of a sampled function? We can use finite differences, but that suffers from local sensitivity and will not give you the curve you're looking for. You can use moving averages, but this destroys the qualities of the PDF you're looking for. In short, there's not a good way to do it.
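To see the local sensitivity of finite differences, one can differentiate the empirical CDF directly. This is only an illustrative sketch, with fd as a made-up name: each estimate is 1/N divided by the gap between two neighbouring samples, so it jumps around with the random spacing of the data.

```matlab
% Finite-difference "PDF" from the empirical CDF
N  = 2000;
X1 = 4*tan(pi*rand(1, N)/3);

xs   = sort(X1);
Fhat = (1:N) / N;

% Slope between consecutive points of the empirical CDF:
% (1/N) divided by the gap between neighbouring samples
fd = diff(Fhat) ./ diff(xs);

figure
plot(xs(1:end-1), fd, '.');
xlabel('X1');
ylabel('finite-difference estimate');
```

The resulting scatter is typically very noisy and does not resemble a smooth density.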
But it doesn't matter that there's not a good way to do it, because any such approximation to the PDF, denoted $\hat{f}(x)$, will be essentially meaningless. It's the shape that matters -- histograms are what we use to give us that shape.
In a histogram, the sum of the heights of the bars equals the total number of samples $N$. If we divide each bar height by $N$ and by the bin width, the areas of the bars sum to 1 and we get a piecewise-constant estimate that resembles a PDF pretty closely. But because it's the shape of the histogram that really matters, re-scaling isn't very important in the long run.
If we do re-scale the histogram this way, we can add the areas of the bars from left to right to obtain an approximation of the CDF. Note that this process is exactly the same as starting with a PDF and computing a Riemann sum to approximate the CDF!
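A minimal sketch of that Riemann-sum idea, using histcounts to get the raw bin counts. The choice of 30 bins and the names pdfHat and cdfHat are arbitrary assumptions for illustration.

```matlab
% Re-scale a histogram into a density, then accumulate bar areas into a CDF
N  = 2000;
X1 = 4*tan(pi*rand(1, N)/3);

[counts, edges] = histcounts(X1, 30);    % raw counts in 30 bins
binWidth = diff(edges);

pdfHat = counts ./ (N * binWidth);       % bar heights; bar areas sum to 1
cdfHat = cumsum(pdfHat .* binWidth);     % running sum of bar areas

figure
stairs(sort(X1), (1:N)/N);               % empirical CDF for reference
hold on
plot(edges(2:end), cdfHat, 'o-');        % CDF recovered from the histogram
hold off
xlabel('X1');
ylabel('CDF');
legend('empirical CDF', 'cumulative histogram area', 'Location', 'southeast');
```

The accumulated areas are plotted at the right edge of each bin, where they should agree with the empirical CDF.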