How to find (and plot) a probability distribution function?


I'm working on my biometrics course, and I have to plot a PDF (I think it means probability density function, or probability distribution function). Here is a sample PDF graph: Introduction to Biometrics, page 5, figure 2.a

I have the data for genuine and impostor scores in MATLAB: two vectors, each containing hundreds of values between 0 and 1. But I have no idea how to plot their PDFs.

I tried MATLAB's probplot(data) function, but it gave an increasing function, whereas I was expecting a bell-curve shape. See the image: probplot

I also see that MATLAB has a pdf function, but it asks you to specify which kind of distribution to use. Maybe the normal distribution is the one for me, but I'm confused because it also asks for the mean and standard deviation. Shouldn't the function compute these for me?

In the end, how can I get a PDF plot like the one in the link?

Thanks for any help!

Edit:

ksdensity plot for the genuine scores: (image)

Edit 2:

Output of normpdf(scores, mean(scores), std(scores)): (image)

Edit 3:

This is the actual graph I'm trying to achieve (x-axis values are not important): (image)

Edit 4:

Output of the following (a hard-to-read, one-line version of this answer: http://www.mathworks.co.uk/matlabcentral/newsreader/view_thread/155832):

vector_to_pdfplot = genuine_scores; 
plot(min(vector_to_pdfplot) : ((max(vector_to_pdfplot) - min(vector_to_pdfplot))/1000) : max(vector_to_pdfplot) , normpdf(min(vector_to_pdfplot) : ((max(vector_to_pdfplot) - min(vector_to_pdfplot))/1000) : max(vector_to_pdfplot) , mean(vector_to_pdfplot) , std(vector_to_pdfplot) ));

(image)


There are 2 best solutions below

5
On

Try hist(data) to get a histogram.

You can also try

[f, xi] = ksdensity(data);   % f: estimated density, xi: points where it is evaluated
plot(xi, f)

to get a smoothed histogram.

You can restrict the domain (since the default is the real number line):

support = (0:0.01:1)';           % evaluation points restricted to [0, 1]
f = ksdensity(data, support);    % density estimate at those points only
plot(support, f)
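
To get both curves on one figure, as in the graph the question is aiming for, the same ksdensity call can be applied to each score vector. This is only a sketch: the two vectors below are synthetic placeholders (not the asker's data), and the names `f_genuine` and `f_impostor` are made up for illustration.

```matlab
% Placeholder data: replace with your real score vectors (these are synthetic).
genuine_scores  = min(max(0.7 + 0.1*randn(500,1), 0), 1);   % clipped to [0, 1]
impostor_scores = min(max(0.3 + 0.1*randn(500,1), 0), 1);   % clipped to [0, 1]

support = (0:0.01:1)';                              % shared evaluation points

% Kernel density estimate of each vector, evaluated on the same points
f_genuine  = ksdensity(genuine_scores,  support);
f_impostor = ksdensity(impostor_scores, support);

% Both curves on one set of axes
plot(support, f_genuine, 'b-', support, f_impostor, 'r--');
legend('genuine', 'impostor');
xlabel('score');
ylabel('estimated density');
```

Evaluating both estimates on the same support keeps the two curves directly comparable.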
6
On

I suggest adapting this code I've taken from the link I gave in my first comment.

>> A = rand(700,1);
>> M = mean(A);
>> S = std(A);
>> MAX = max(A);
>> MIN = min(A);
>> STEP = (MAX - MIN) / 1000;
>> PDF = normpdf(MIN:STEP:MAX, M, S);
>> plot(MIN:STEP:MAX, PDF);

In your case the data isn't randomly generated, but I imagine you can apply the principles used here.

In particular, the first argument should be a range built from the minimum, the maximum, and a step size that depends on the maximum, the minimum, and one other number (1000 in the example). I'm afraid I can't offer any help on how to choose that number in your case.
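
As a rough sketch of the same idea applied to a data vector: the mean and standard deviation are estimated from the data, and the grid is built with linspace, which gives the same points as MIN:STEP:MAX (up to rounding). The `scores` vector here is a synthetic placeholder, and the density is written out by hand so the snippet does not depend on a toolbox.

```matlab
% Placeholder data; substitute your own vector (e.g. genuine_scores).
scores = 0.7 + 0.1*randn(700,1);

mu    = mean(scores);     % sample mean, estimated from the data
sigma = std(scores);      % sample standard deviation

% 1001 evenly spaced points from min to max, i.e. MIN:STEP:MAX with
% STEP = (MAX - MIN)/1000. The point count only affects how smooth
% the plotted curve looks, not the shape of the density.
x = linspace(min(scores), max(scores), 1001);

% Normal density with the estimated parameters, written out explicitly;
% normpdf(x, mu, sigma) computes the same formula.
y = exp(-0.5*((x - mu)/sigma).^2) / (sigma*sqrt(2*pi));

plot(x, y);
xlabel('score');
ylabel('fitted normal density');
```

Note that this draws the normal curve that best matches the sample mean and spread; it will only resemble the data if the data are roughly normal.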

EDIT: maybe just modifying your code so that it's in this format will solve the problem:

x = -100:0.1:100;
plot(x,normpdf(x,0,20),'-')

What's the difference? The difference is that defining your x values beforehand alters MATLAB's scaling:

Quoting the answer I posted in the comments:

When you call plot with ONE argument, it plots those numbers on the y axis, using the index numbers of those values for the x axis. If you wanted the x axis scaled properly, you had to provide them in the first place. Thus...

x = -100:0.1:100;
plot(x,normpdf(x,0,20),'-')

produces very different results to

plot(normpdf((-100:0.1:100),0,20))

contrary to what one might expect.
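
A small side-by-side sketch of the two calls may make this concrete. The curve is identical in both panels; only the x axis changes (index 1 to 2001 on the left, the actual x values on the right). The subplot layout and titles are just for illustration.

```matlab
x = -100:0.1:100;          % 2001 evaluation points
y = normpdf(x, 0, 20);     % normal density, mean 0, standard deviation 20

subplot(1, 2, 1);
plot(y);                   % one argument: x axis is the index 1..numel(y)
title('plot(y)');

subplot(1, 2, 2);
plot(x, y);                % two arguments: x axis is the actual x values
title('plot(x, y)');
```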