Let's say that we had data on the heights of men and women.
r code:
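set.seed(1)                        # make the simulation reproducible
Women=rnorm(80, mean=168, sd=6)    # 80 women, heights in cm
Men=rnorm(120, mean=182, sd=7)     # 120 men, heights in cm
par(mfrow=c(2,1))                  # two panels, one above the other
hist(Men, xlim=c(150, 210), col="skyblue")
hist(Women, xlim=c(150, 210), col="pink")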
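While the labels are still known, the maximum likelihood estimates for each group have a closed form: the ML estimate of a normal mean is the sample mean, and the ML estimate of the standard deviation uses the divisor n rather than the n - 1 used by R's sd(). The sketch below computes these values, which are the targets that any method working on the pooled data should try to recover. The helper name ml_normal is hypothetical, not part of the original question.

```r
# Regenerate the same simulated data as above
set.seed(1)
Women <- rnorm(80, mean = 168, sd = 6)
Men <- rnorm(120, mean = 182, sd = 7)

# Hypothetical helper: ML estimates of a normal distribution's parameters
ml_normal <- function(x) {
  mu <- mean(x)                    # ML estimate of the mean
  sigma <- sqrt(mean((x - mu)^2))  # ML estimate of the sd (divisor n, not n - 1)
  c(mean = mu, sd = sigma)
}

truth <- rbind(Women = ml_normal(Women), Men = ml_normal(Men))
print(truth)
```

The difference between the two divisors is small here (a factor of sqrt(79/80) for women), but it matters for understanding what "maximum likelihood" returns.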
Unfortunately, something happened and we lost the information about who is a woman and who is a man.
r code:
heights=c(Men, Women)    # pool both samples; the labels are now lost
par(mfrow=c(1,1))
hist(heights, col="gray70")
rm(Women, Men)           # R is case-sensitive: the objects are Women and Men
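Once the labels are gone, each observed height x_i can be modelled as a draw from a two-component normal mixture. Writing π for the unknown proportion of women (0.4 in this simulation, 80 out of 200) and φ(x; μ, σ) for the normal density, the density of a single height and the log-likelihood to be maximized over (π, μ_W, σ_W, μ_M, σ_M) are:

```latex
f(x) = \pi\,\phi(x;\mu_W,\sigma_W) + (1-\pi)\,\phi(x;\mu_M,\sigma_M),
\qquad
\ell(\pi,\mu_W,\sigma_W,\mu_M,\sigma_M) = \sum_{i=1}^{n} \log f(x_i), \quad n = 200.
```

Unlike the single-normal case, this has no closed-form maximizer, so it must be maximized numerically. The two components are also interchangeable (label switching), so which one is "women" has to be decided afterwards, for example by the smaller mean. Finally, the likelihood is unbounded if one σ shrinks to zero on a single data point, so in practice one looks for a sensible local maximum.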
Could we somehow estimate the mean heights and standard deviations of women and men using the maximum likelihood method? We know that the heights of both men and women are normally distributed.
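One direct answer is to write the mixture log-likelihood as an R function and hand it to a general-purpose optimizer. The sketch below assumes a two-component normal mixture with an unknown mixing proportion and uses base R's optim() with the BFGS method. It reparameterizes (logit for the proportion, log for the standard deviations) so that the optimizer works on an unconstrained scale. The function name negloglik and the starting values are illustrative choices, not part of the original question. Component 1 is treated as "women" only because it starts at the lower quartile, i.e. we assume women are shorter on average.

```r
# Regenerate the pooled data from the question
set.seed(1)
Women <- rnorm(80, mean = 168, sd = 6)
Men <- rnorm(120, mean = 182, sd = 7)
heights <- c(Men, Women)

# Negative log-likelihood of a two-component normal mixture.
# theta = (logit of proportion of component 1, mu1, mu2, log sd1, log sd2)
negloglik <- function(theta, x) {
  p <- plogis(theta[1])
  mu <- theta[2:3]
  sigma <- exp(theta[4:5])
  dens <- p * dnorm(x, mu[1], sigma[1]) + (1 - p) * dnorm(x, mu[2], sigma[2])
  -sum(log(dens))
}

# Starting values: equal proportions, means at the quartiles, pooled sd
start <- c(0, unname(quantile(heights, c(0.25, 0.75))), rep(log(sd(heights)), 2))

fit <- optim(start, negloglik, x = heights, method = "BFGS",
             control = list(maxit = 1000))

# Back-transform to the original scale (component 1 = women by assumption)
est <- c(p_women = plogis(fit$par[1]),
         mu_women = fit$par[2], mu_men = fit$par[3],
         sd_women = exp(fit$par[4]), sd_men = exp(fit$par[5]))
print(round(est, 2))
```

Because the mixture likelihood can have several local maxima, it is worth trying a few different starting values and keeping the fit with the smallest fit$value.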
This is a classic clustering, or unsupervised classification, problem. The usual solution is the well-known EM (expectation-maximization) algorithm, which is described in many places online.
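A compact EM sketch for the same two-component model follows, again assuming normal components and starting from the quartiles of the pooled data. The E-step computes, for each height, the posterior probability (the "responsibility") that it belongs to component 1. The M-step then re-estimates the proportion, means and standard deviations as responsibility-weighted versions of the usual ML formulas. The function name em_two_normals, the tolerance and the starting values are hypothetical choices for illustration.

```r
# Regenerate the pooled data from the question
set.seed(1)
Women <- rnorm(80, mean = 168, sd = 6)
Men <- rnorm(120, mean = 182, sd = 7)
heights <- c(Men, Women)

# Hypothetical helper: EM for a mixture of two normal distributions
em_two_normals <- function(x, mu, sigma, p = 0.5, tol = 1e-8, max_iter = 1000) {
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: weighted component densities and responsibilities of component 1
    d1 <- p * dnorm(x, mu[1], sigma[1])
    d2 <- (1 - p) * dnorm(x, mu[2], sigma[2])
    r <- d1 / (d1 + d2)
    # Log-likelihood at the parameters used in this E-step
    loglik <- sum(log(d1 + d2))
    # M-step: responsibility-weighted ML estimates
    p <- mean(r)
    mu <- c(sum(r * x) / sum(r), sum((1 - r) * x) / sum(1 - r))
    sigma <- c(sqrt(sum(r * (x - mu[1])^2) / sum(r)),
               sqrt(sum((1 - r) * (x - mu[2])^2) / sum(1 - r)))
    # Stop once the log-likelihood no longer changes noticeably
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(p = p, mu = mu, sigma = sigma, loglik = loglik, iter = iter)
}

fit <- em_two_normals(heights,
                      mu = unname(quantile(heights, c(0.25, 0.75))),
                      sigma = rep(sd(heights), 2))
print(fit)
```

Here fit$p, fit$mu[1] and fit$sigma[1] describe the component started at the lower quartile, which we interpret as women. EM never decreases the log-likelihood from one iteration to the next, but like any local method it can stop at a local maximum, so several starting values are advisable. Tested implementations exist in packages such as mixtools (normalmixEM()) and mclust.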