Confusion about optimizing k-means with non-spherical parameters

When finding the gradient of $\sum_{x_{i}\in C_{j}}(x_{i}-\mu_{j})^{T}M_{j}(x_{i}-\mu_{j}) +Reg(M)$ with respect to $M$, where $M$ is positive definite and $C_{j}$ is a single cluster, the lecture slides give the derivative as $\sum_{x_{i}\in C_{j}}(x_{i}-\mu_{j})(x_{i}-\mu_{j})^{T}+\partial(Reg(M))$. Why does the derivative reverse the order of the vector multiplication? Shouldn't it be $\sum_{x_{i}\in C_{j}}(x_{i}-\mu_{j})^{T}(x_{i}-\mu_{j})+\partial(Reg(M))$? (I know one reason this second formula cannot be right is that the dimensions do not match.)
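
For reference, here is a sketch of how the outer product can arise, under two assumptions that are not stated in the question: the shorthand $a_{i}=x_{i}-\mu_{j}\in\mathbb{R}^{d}$ is introduced only for illustration, and the entries of $M_{j}$ are treated as independent variables (the symmetry and positive-definiteness constraints are ignored). Writing one term of the sum entry by entry:

$$a_{i}^{T}M_{j}a_{i}=\sum_{p=1}^{d}\sum_{q=1}^{d}(a_{i})_{p}\,(M_{j})_{pq}\,(a_{i})_{q}$$

$$\frac{\partial}{\partial (M_{j})_{pq}}\left(a_{i}^{T}M_{j}a_{i}\right)=(a_{i})_{p}(a_{i})_{q}=\left(a_{i}a_{i}^{T}\right)_{pq}$$

$$\frac{\partial}{\partial M_{j}}\sum_{x_{i}\in C_{j}}a_{i}^{T}M_{j}a_{i}=\sum_{x_{i}\in C_{j}}a_{i}a_{i}^{T}$$

Under these assumptions nothing is being reversed: the gradient with respect to a $d\times d$ matrix must itself be $d\times d$, and collecting the entrywise derivatives $(a_{i})_{p}(a_{i})_{q}$ into a matrix gives exactly the outer product $a_{i}a_{i}^{T}$. The inner product $a_{i}^{T}a_{i}$ is a scalar, which matches the dimension problem already noted above. The same result can be read off from the identity $a_{i}^{T}M_{j}a_{i}=\operatorname{tr}(M_{j}a_{i}a_{i}^{T})$.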