I am trying to plot the distribution of a measured variable from a scientific experiment, in this case a velocity. After making a simple histogram, I have been reading this,
http://en.wikipedia.org/wiki/Kernel_density_estimation
This seems to suggest that one can treat each measured value as being represented by its own distribution (a kernel), typically a normal distribution, and that the overall density can be estimated as the sum of these individual kernels. The article goes on to discuss the art of choosing an appropriate bandwidth for the kernel.
My data, however, have known uncertainty. That is to say, I measure the velocity in six different ways, each of which gives a slightly different answer. I then take the mean of these values and use their standard deviation as an estimate of my measurement uncertainty.
My question is, can this extra information be included explicitly into the kernel density estimation process? Intuitively I feel that I could give each point a kernel whose bandwidth is the measured uncertainty for that point. Does this make sense? Is there an established way to do this?
Thanks in advance,
Nick
See this post, which I think addresses your question: https://stats.stackexchange.com/questions/88297/kernel-density-estimation-incorporating-uncertainties
My suggestion is very close to Glen_b's in the linked post. Basically, assume that each data point is known with 100% certainty, then determine the desired bandwidth $h$ based on that assumption. Then adjust $h$ by each point's uncertainty $\sigma$ to get an adjusted, point-specific bandwidth $h'=\sqrt{h^2+\sigma^2}$, and use these bandwidths to make your final density estimate.
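A minimal NumPy sketch of this idea. Note the function names, the use of Silverman's rule of thumb for the baseline $h$, and the example data are my own assumptions for illustration, not part of the linked answer; any standard bandwidth selector could supply $h$:

```python
import numpy as np

def baseline_bandwidth(values):
    # Silverman's rule of thumb, treating each point as exactly known.
    n = values.size
    return 1.06 * values.std(ddof=1) * n ** (-1 / 5)

def kde_with_uncertainty(grid, values, sigmas):
    # Point i gets the adjusted bandwidth h' = sqrt(h^2 + sigma_i^2).
    h = baseline_bandwidth(values)
    h_adj = np.sqrt(h ** 2 + sigmas ** 2)
    # One Gaussian kernel per data point, evaluated on the grid,
    # then averaged: f(x) = (1/n) * sum_i N(x; v_i, h'_i).
    z = (grid[:, None] - values[None, :]) / h_adj[None, :]
    kernels = np.exp(-0.5 * z ** 2) / (h_adj[None, :] * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

# Hypothetical example: 30 velocity means with per-point uncertainties.
rng = np.random.default_rng(0)
v = rng.normal(10.0, 2.0, size=30)     # measured velocities (means of repeats)
sig = rng.uniform(0.2, 1.0, size=30)   # per-point standard deviations
grid = np.linspace(v.min() - 5.0, v.max() + 5.0, 500)
density = kde_with_uncertainty(grid, v, sig)
print(density.sum() * (grid[1] - grid[0]))  # ≈ 1: a valid density
```

Points with larger $\sigma$ automatically contribute broader, flatter kernels, so noisier measurements smear out more in the final estimate, which is exactly the intuition in the question.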