I'm following Andrew Ng's machine learning course and am currently studying the Gaussian Discriminant Analysis (GDA) model from these notes.
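Now I understand the following model equations:

$$
\begin{aligned}
y &\sim \mathrm{Bernoulli}(\phi) \\
x \mid y = 0 &\sim \mathcal{N}(\mu_0, \Sigma) \\
x \mid y = 1 &\sim \mathcal{N}(\mu_1, \Sigma)
\end{aligned}
$$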
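I also understand why its log-likelihood is written as follows:

$$
\ell(\phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} p(x^{(i)}, y^{(i)}; \phi, \mu_0, \mu_1, \Sigma) = \log \prod_{i=1}^{m} p(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma)\, p(y^{(i)}; \phi)
$$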
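Writing out the Bernoulli and Gaussian densities shows how the log-likelihood separates into a part that depends only on φ and a part that depends only on μ0, μ1 and Σ. This is a sketch assuming the standard setup of m training examples with x^(i) ∈ ℝ^n; here μ_{y^(i)} is shorthand for μ0 when y^(i) = 0 and μ1 when y^(i) = 1.

$$
\begin{aligned}
\ell(\phi,\mu_0,\mu_1,\Sigma) &= \sum_{i=1}^{m} \log p(y^{(i)};\phi) + \sum_{i=1}^{m} \log p(x^{(i)} \mid y^{(i)}; \mu_0,\mu_1,\Sigma) \\
&= \sum_{i=1}^{m} \Big[ y^{(i)} \log\phi + (1-y^{(i)}) \log(1-\phi) \Big] \\
&\quad - \sum_{i=1}^{m} \Big[ \tfrac{n}{2}\log(2\pi) + \tfrac{1}{2}\log|\Sigma| + \tfrac{1}{2}\,(x^{(i)}-\mu_{y^{(i)}})^{T}\,\Sigma^{-1}\,(x^{(i)}-\mu_{y^{(i)}}) \Big]
\end{aligned}
$$

Because each parameter appears in only one of these sums, each can be maximized on its own by setting the corresponding gradient to zero. For φ, for example:

$$
\frac{\partial \ell}{\partial \phi} = \sum_{i=1}^{m} \left( \frac{y^{(i)}}{\phi} - \frac{1-y^{(i)}}{1-\phi} \right) = 0
\quad\Longrightarrow\quad
\phi = \frac{1}{m} \sum_{i=1}^{m} 1\{y^{(i)} = 1\}
$$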
But what I don't understand is how the maximization with respect to the parameters φ, Σ, μ0 and μ1 is carried out, i.e. how the expressions for φ, Σ, μ0 and μ1 are derived.
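For concreteness, here is a minimal pure-Python sketch of the closed-form estimates as they are usually stated for GDA: φ is the fraction of examples with y = 1, μ0 and μ1 are the per-class means of x, and Σ is the average outer product of each x^(i) centred on its own class mean (dividing by m, the maximum-likelihood choice). The function name `gda_mle` and the list-of-lists data layout are hypothetical choices for illustration, and the sketch assumes both classes occur at least once in the data.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def gda_mle(X: List[Vector], y: List[int]) -> Tuple[float, Vector, Vector, Matrix]:
    """Hypothetical helper: closed-form ML estimates for GDA with a shared covariance.

    Assumes labels are 0/1 and that both classes appear at least once.
    """
    m = len(X)
    n = len(X[0])

    # phi: fraction of training examples labelled 1
    m1 = sum(1 for label in y if label == 1)
    m0 = m - m1
    phi = m1 / m

    # mu0, mu1: average of the x^(i) that belong to each class
    mu0 = [sum(x[j] for x, label in zip(X, y) if label == 0) / m0 for j in range(n)]
    mu1 = [sum(x[j] for x, label in zip(X, y) if label == 1) / m1 for j in range(n)]

    # Sigma: (1/m) * sum of (x^(i) - mu_{y^(i)})(x^(i) - mu_{y^(i)})^T over all examples
    sigma = [[0.0] * n for _ in range(n)]
    for x, label in zip(X, y):
        mu = mu1 if label == 1 else mu0
        d = [x[j] - mu[j] for j in range(n)]
        for a in range(n):
            for b in range(n):
                sigma[a][b] += d[a] * d[b] / m

    return phi, mu0, mu1, sigma


if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    y = [0, 0, 1, 1]
    print(gda_mle(X, y))
    # → (0.5, [2.0, 3.0], [6.0, 7.0], [[1.0, 1.0], [1.0, 1.0]])
```

Each line of the function mirrors one of the stationary-point conditions ∂ℓ/∂φ = 0, ∂ℓ/∂μ0 = 0, ∂ℓ/∂μ1 = 0 and ∂ℓ/∂Σ = 0, which makes it easy to check the formulas numerically against small hand-made datasets.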
I've already searched for this but can't find an exact proof or derivation. I'm not asking for the full derivation or proof; an appropriate reference with some explanation will do. Thanks in advance.

