I am referring to this formula. I am not sure I understand exactly how to calculate it. My data consists of 2 predictor variables $X_1$ and $X_2$, and a class variable {$0, 1$}.
I understand the first step is to split the data according to the class values {$0, 1$}, but when the formula says:
$(x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T$
how do you account for the observations in each column, $X_1$ and $X_2$? The subscript $i$ seems to index only the observation number, not the predictor.
I have a similar confusion about the mean vector $\hat{\mu}_k$, as the formula seems to suggest it is ($2 \times 1$), but I have calculated mine as ($2 \times 2$), where the rows denote the predictor variables {$X_1, X_2$} and the columns denote the classes {$0, 1$}.
How exactly does one go about computing this covariance matrix $\hat{\Sigma}$, and what is its final dimension given my circumstances?
In the literature, $x^{(i)}$ is the $i$-th data point (written $x_i$ in your formula), a column vector holding both predictors: $x^{(i)}= \begin{bmatrix} x^{(i)}_1 \\ x^{(i)}_2\end{bmatrix} \in \mathbb{R}^2$.
$\mu_k$ is the class centroid for the $k$-th class. Just take those $x^{(i)}$ that belong to the $k$-th class and average them.
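Each class mean is a $2 \times 1$ vector: $\mu_1 \in \mathbb{R}^{2 \times 1}$ and $\mu_2 \in \mathbb{R}^{2 \times 1}$ (with your labels, these are the means of classes $0$ and $1$). Your $2 \times 2$ array is simply these two vectors placed side by side as columns.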
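The covariance matrix is of size $2 \times 2$, that is, $\hat{\Sigma} \in \mathbb{R}^{2 \times 2}$, because each term $(x^{(i)} - \mu_k)(x^{(i)} - \mu_k)^T$ is a $2 \times 1$ vector times a $1 \times 2$ vector.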
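As a minimal sketch of the steps above, here is how the computation could look in NumPy. It assumes the formula in question is the usual pooled within-class estimate $\hat{\Sigma} = \frac{1}{N-K}\sum_{k}\sum_{i:\, y_i = k}(x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T$; the $N-K$ normalization is an assumption, since the formula is not reproduced in the post. The toy data and the function name `pooled_covariance` are made up for illustration.

```python
import numpy as np

def pooled_covariance(X, y):
    """Pooled within-class covariance, ASSUMING normalization by N - K.

    X: (N, p) array, one observation per row.
    y: (N,) array of class labels.
    """
    classes = np.unique(y)
    n, p = X.shape
    sigma = np.zeros((p, p))
    means = {}
    for k in classes:
        X_k = X[y == k]                 # observations belonging to class k
        mu_k = X_k.mean(axis=0)         # class centroid, one entry per predictor
        means[k.item()] = mu_k
        for x_i in X_k:
            d = (x_i - mu_k).reshape(p, 1)   # column vector, shape (p, 1)
            sigma += d @ d.T                 # outer product, shape (p, p)
    return means, sigma / (n - len(classes))

# Hypothetical toy data: two predictors X_1, X_2 and a class label in {0, 1}.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 8.0]])
y = np.array([0, 0, 0, 1, 1, 1])

means, sigma_hat = pooled_covariance(X, y)
print(means[0].shape, sigma_hat.shape)   # (2,) (2, 2)
```

Each class mean has one entry per predictor, and the accumulated matrix stays $2 \times 2$ no matter how many observations or classes there are.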
I believe the main confusion comes from the fact that most of the literature uses the column convention (each data point is a column vector), whereas the data science community uses the row convention (each data point is a row of the data table).
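To make the two conventions concrete, the sketch below (with made-up numbers) checks that summing the outer products of column vectors gives the same $2 \times 2$ matrix as a single matrix product on row-wise centered data. The variable names are only illustrative.

```python
import numpy as np

# Hypothetical observations of one class, stored row-wise: shape (3, 2).
X_k = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]])
mu_k = X_k.mean(axis=0)     # shape (2,), one mean per predictor
D = X_k - mu_k              # centered rows, shape (3, 2)

# Column convention: sum over observations of (x_i - mu)(x_i - mu)^T.
S_outer = sum(np.outer(d, d) for d in D)

# Row convention: the same 2 x 2 matrix from one matrix product.
S_matrix = D.T @ D

print(np.allclose(S_outer, S_matrix))   # True
```

So the columns $X_1$ and $X_2$ are accounted for inside each vector $x_i$, and the index $i$ only runs over rows of the data table.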