How to do the derivation of the MLE for Linear Discriminant Analysis


$$ \ell(\phi, \mu, \Sigma) = \log \prod_{i=1}^{M} p(x^{(i)}, y^{(i)}; \phi, \mu, \Sigma) $$

$$ = \log \prod_{i=1}^{M} p(x^{(i)}|y^{(i)}; \mu, \Sigma) p(y^{(i)}; \phi) $$

$$ = \log \prod_{i=1}^{M} \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x^{(i)} - \mu_{y^{(i)}})^T \Sigma^{-1} (x^{(i)} - \mu_{y^{(i)}})\right) \prod_{c=1}^{C} \phi_c^{I[y^{(i)}=c]} $$

$$ = \sum_{i=1}^{M} \left[-\frac{N}{2} \log(2\pi) - \frac{1}{2} \log|\Sigma| - \frac{1}{2}(x^{(i)} - \mu_{y^{(i)}})^T \Sigma^{-1} (x^{(i)} - \mu_{y^{(i)}}) + \sum_{c=1}^{C} I[y^{(i)} = c] \log \phi_c\right]. $$
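To make the final log-likelihood expression concrete, here is a minimal numerical sketch of it (the function and variable names are my own, not from the post):

```python
import numpy as np

def lda_log_likelihood(X, y, phi, mus, Sigma):
    """Joint log-likelihood sum_i [log N(x^(i); mu_{y^(i)}, Sigma) + log phi_{y^(i)}].

    X: (M, N) data, y: (M,) integer class labels in {0, ..., C-1},
    phi: (C,) class priors, mus: (C, N) class means, Sigma: (N, N) shared covariance.
    """
    M, N = X.shape
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)          # log|Sigma|, numerically stable
    diffs = X - mus[y]                            # rows are x^(i) - mu_{y^(i)}
    # Quadratic forms (x^(i) - mu)^T Sigma^{-1} (x^(i) - mu), one per sample
    quad = np.einsum('ij,jk,ik->i', diffs, Sigma_inv, diffs)
    return np.sum(-0.5 * N * np.log(2 * np.pi) - 0.5 * logdet
                  - 0.5 * quad + np.log(phi[y]))
```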

Now we need to take partial derivatives with respect to each parameter and equate it to zero. For $\mu_c$,

$$ \frac{\partial\ell(\phi, \mu_c, \Sigma)}{\partial \mu_c} = \sum_{i=1}^{M} I[y^{(i)} = c] \Sigma^{-1}(x^{(i)} - \mu_c) = 0. $$
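(For reference, this last step is not shown in the post, but setting the gradient to zero and solving gives the familiar class-mean estimator; a sketch:)

```latex
% Left-multiply by \Sigma (invertible), then solve for \mu_c:
\sum_{i=1}^{M} I[y^{(i)} = c]\,(x^{(i)} - \mu_c) = 0
\;\Longrightarrow\;
\hat{\mu}_c = \frac{\sum_{i=1}^{M} I[y^{(i)} = c]\, x^{(i)}}{\sum_{i=1}^{M} I[y^{(i)} = c]}
```

i.e. the sample mean of the $x^{(i)}$ belonging to class $c$.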

My question

My question is that I don't know how to carry out this kind of vector derivative, i.e., how to go from this step: $$ = \sum_{i=1}^{M} \left[-\frac{N}{2} \log(2\pi) - \frac{1}{2} \log|\Sigma| - \frac{1}{2}(x^{(i)} - \mu_{y^{(i)}})^T \Sigma^{-1} (x^{(i)} - \mu_{y^{(i)}}) + \sum_{c=1}^{C} I[y^{(i)} = c] \log \phi_c\right] $$ to this step: $$ \frac{\partial\ell(\phi, \mu_c, \Sigma)}{\partial \mu_c} = \sum_{i=1}^{M} I[y^{(i)} = c] \Sigma^{-1}(x^{(i)} - \mu_c) = 0. $$

I know basic linear algebra and calculus, but I have not encountered this kind of derivation before and don't know where to learn it. I have been stuck here for a long time. Could someone provide a step-by-step proof of going from the first step to the second?

Best answer

The notation is a bit confusing, since the log-likelihood is written in terms of the collection of mean vectors $\mu = (\mu_1, \dots, \mu_C)$, but the derivative is taken with respect to a single one of them, $\mu_c$. Recall that since we have multiple classes, there is a separate mean for each class $c$.

In any case, the way to proceed is to remember that we want the derivative of our function with respect to the variable $\mu_c$ and proceed as usual:

$$\frac{\partial}{\partial \mu_c}\left ( \sum_{i=1}^{M} \left[-\frac{N}{2} \log(2\pi) - \frac{1}{2} \log|\Sigma| - \frac{1}{2}(x^{(i)} - \mu_{y^{(i)}})^T \Sigma^{-1} (x^{(i)} - \mu_{y^{(i)}}) + \sum_{c=1}^{C} I[y^{(i)} = c] \log \phi_c\right]\right ) = \frac{\partial}{\partial \mu_c}\left ( \sum_{i=1}^{M}- \frac{1}{2}(x^{(i)} - \mu_{y^{(i)}})^T \Sigma^{-1} (x^{(i)} - \mu_{y^{(i)}})\right )$$

where the equality follows since the other terms are constant with respect to $\mu_c$. Now since the derivative is a linear operator, we can take the derivative of each term in the sum separately. Further, each term will be constant with respect to $\mu_c$ unless $\mu_{y^{(i)}} = \mu_c$, so:

$$=- \frac{1}{2}\sum_{i=1}^{M}\frac{\partial}{\partial \mu_c}(x^{(i)} - \mu_{y^{(i)}})^T \Sigma^{-1} (x^{(i)} - \mu_{y^{(i)}}) = \sum_{i=1}^{M}I[y^{(i)} = c]\, \Sigma^{-1} (x^{(i)} - \mu_c)$$

where the last step follows by the chain rule (the inner derivative $\partial(x^{(i)} - \mu_c)/\partial \mu_c = -I$ contributes a factor of $-1$, which combines with the leading $-\frac{1}{2}$ and the factor of $2$ below to give $+1$) and the fact that for symmetric $A$:

$$\frac{\partial}{\partial x} x^TAx = 2Ax$$
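This identity is easy to sanity-check numerically with central finite differences (a quick illustration, not part of the original answer):

```python
import numpy as np

# Check d/dx (x^T A x) = 2 A x for a random symmetric A.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
A = (A + A.T) / 2                      # symmetrize
x = rng.normal(size=4)

f = lambda v: v @ A @ v                # the quadratic form x^T A x
eps = 1e-6
# Central difference in each coordinate direction e
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(4)])
print(np.allclose(grad_fd, 2 * A @ x, atol=1e-5))   # True
```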

Let me know which of these steps are confusing for you and I can clarify.

Finally, see chapter 2 of the Matrix Cookbook: https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf (specifically equation 81 in the case where $B$ is symmetric)
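Putting the pieces together, the closed-form MLEs can be sketched as below. The post only derives the condition for $\mu_c$; the estimates for $\phi$ and $\Sigma$ (class frequencies and the pooled covariance of the class-centered data) follow from the same log-likelihood by analogous derivatives, so treat those two lines as an assumption here rather than something derived above:

```python
import numpy as np

def fit_lda_mle(X, y, C):
    """Closed-form LDA MLEs: class priors, class means, shared covariance."""
    M, _ = X.shape
    phi = np.bincount(y, minlength=C) / M                       # class frequencies
    mus = np.array([X[y == c].mean(axis=0) for c in range(C)])  # class means
    diffs = X - mus[y]                                          # x^(i) - mu_{y^(i)}
    Sigma = diffs.T @ diffs / M                                 # pooled covariance (MLE divides by M)
    return phi, mus, Sigma
```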