I'm trying to derive the gradient of a generalized method of moments (GMM) cost that is based on the first and second moments of a Gaussian mixture model, but so far I've failed.
Some notation:
K - number of clusters,
N - dimension of the samples,
T - number of samples,
A - N*K matrix whose columns are the K cluster means.
Let's assume that all Gaussians share the same isotropic noise covariance (σ^2*I) and that the prior probabilities p of a sample being generated from each cluster are known. Denote θ = {A, p}.
We draw T samples and denote them {y_i}, i = 1..T.
The sample moments are averages over the T samples: M_1 = (1/T) ∑_i y_i and M_2 = (1/T) ∑_i y_i y_i^T.
We will try to apply GMM to the 1st and 2nd moments. The moment conditions have the form:
g(y,θ) = [g_1(y,θ), g_2(y,θ)] = [M_1 - ∑_k p_k A_k, M_2 - (∑_k p_k A_k A_k^T + σ^2 I_N)]
Note: p_k is the k-th element of p, A_k is column k of A, and the sums run over the K clusters.
cost(A) = g(y,θ)' * W * g(y,θ), where W is the optimal weighting matrix according to the GMM formulation.
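To make the setup concrete, here is a minimal NumPy sketch of the cost as I read it from the definitions above. The function name `gmm_cost`, the row-wise sample layout `Y`, the row-major flattening of g_2, and the identity weight (a common first-step choice, not the optimal W) are all my own assumptions for illustration.

```python
import numpy as np

def gmm_cost(A, p, sigma, Y, W):
    """Quadratic cost g' W g built from the first two sample moments.

    A: (N, K) cluster means, p: (K,) priors, sigma: noise std,
    Y: (T, N) samples stored as rows, W: (N + N*N, N + N*N) weight matrix.
    """
    T, N = Y.shape
    M1 = Y.mean(axis=0)                # first sample moment, length N
    M2 = Y.T @ Y / T                   # second sample moment, N x N
    g1 = M1 - A @ p                    # M_1 - sum_k p_k A_k
    g2 = M2 - ((A * p) @ A.T + sigma**2 * np.eye(N))  # M_2 - (sum_k p_k A_k A_k^T + sigma^2 I)
    g = np.concatenate([g1, g2.ravel()])  # stack g_1 and the flattened g_2
    return g @ W @ g

# Synthetic data from a known mixture (all values here are illustrative).
rng = np.random.default_rng(0)
N, K, T = 3, 2, 2000
A_true = rng.normal(size=(N, K))
p = np.array([0.3, 0.7])
sigma = 0.5
labels = rng.choice(K, size=T, p=p)                    # cluster of each sample
Y = A_true[:, labels].T + sigma * rng.normal(size=(T, N))

W = np.eye(N + N * N)  # stand-in weight, NOT the optimal GMM weighting
cost_at_truth = gmm_cost(A_true, p, sigma, Y, W)
cost_elsewhere = gmm_cost(A_true + 1.0, p, sigma, Y, W)
```

With enough samples the cost at the true means should be close to zero and clearly smaller than at a shifted A.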
I haven't found any elegant way to calculate the N*K gradient of this scalar cost.
I couldn't match this problem to a simple form from The Matrix Cookbook. In addition, I calculated grad(g_1(y,θ)) and grad(g_2(y,θ)) by hand, by generalizing small problems to a closed-form solution.
I've tried applying an SVD to the W matrix to get a simple g_1'(y,θ) * W1 * g_1(y,θ) + g_2'(y,θ) * W2 * g_2(y,θ) form, but that failed as well...
- a closed form for d/dx (f(x)' * W * g(x)), where f(x) and g(x) are vectors, would be nice as well
What direction would lead me to a simple solution?
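For reference, the bilinear form asked about above follows from the product rule, assuming W does not depend on x. The Jacobian symbols J_f and J_g are notation introduced here, not part of the original problem statement.

```latex
\frac{\partial}{\partial x}\Big( f(x)^{\top} W\, g(x) \Big)
  = J_f(x)^{\top} W\, g(x) + J_g(x)^{\top} W^{\top} f(x),
\qquad
J_f = \frac{\partial f}{\partial x}, \quad J_g = \frac{\partial g}{\partial x}.

% Special case f = g (the GMM cost):
\frac{\partial}{\partial x}\Big( g(x)^{\top} W\, g(x) \Big)
  = J_g(x)^{\top} \big( W + W^{\top} \big)\, g(x)
  = 2\, J_g(x)^{\top} W\, g(x) \quad \text{if } W = W^{\top}.
```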
So I found the solution to the problem. Partition W into blocks:
W = [W11 , W12; W21 , W22]
g(y,θ)' * W * g(y,θ) = g_1' * W11 * g_1 + g_1' * W12 * g_2 + g_2' * W21 * g_1 + g_2' * W22 * g_2
All 4 terms can be differentiated using the 2 gradient calculations above (grad g_1 and grad g_2), and that's it.
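Below is a rough NumPy sketch of how the block decomposition and the resulting N*K gradient could be checked numerically. It assumes W and σ are held fixed while differentiating with respect to A, and that g_2 is flattened row-major before stacking; the function names, the random placeholder moments, and the stand-in W are hypothetical. The closed form used in `cost_gradient` is my own derivation from d(g' W g) = dg' (W + W') g, not something taken from the post.

```python
import numpy as np

def moment_conditions(A, p, sigma, M1, M2):
    """Return g_1 (length N) and g_2 (N x N) for cluster means A."""
    g1 = M1 - A @ p
    g2 = M2 - ((A * p) @ A.T + sigma**2 * np.eye(A.shape[0]))
    return g1, g2

def cost(A, p, sigma, M1, M2, W):
    g1, g2 = moment_conditions(A, p, sigma, M1, M2)
    g = np.concatenate([g1, g2.ravel()])
    return g @ W @ g

def cost_blockwise(A, p, sigma, M1, M2, W):
    """Same cost, written as the sum of the four block terms."""
    N = A.shape[0]
    g1, g2 = moment_conditions(A, p, sigma, M1, M2)
    g2 = g2.ravel()
    W11, W12, W21, W22 = W[:N, :N], W[:N, N:], W[N:, :N], W[N:, N:]
    return g1 @ W11 @ g1 + g1 @ W12 @ g2 + g2 @ W21 @ g1 + g2 @ W22 @ g2

def cost_gradient(A, p, sigma, M1, M2, W):
    """Analytic N x K gradient of the cost with respect to A (W held fixed)."""
    N = A.shape[0]
    g1, g2 = moment_conditions(A, p, sigma, M1, M2)
    g = np.concatenate([g1, g2.ravel()])
    u = (W + W.T) @ g                       # d cost = u' dg
    u1, U2 = u[:N], u[N:].reshape(N, N)     # split to match g_1 and g_2
    # dg_1 = -dA p  and  dg_2 = -(dA D A' + A D dA'), with D = diag(p)
    return -np.outer(u1, p) - (U2 + U2.T) @ (A * p)

# Illustrative inputs only: random placeholder moments and weight matrix.
rng = np.random.default_rng(0)
N, K = 3, 2
p = np.array([0.3, 0.7])
sigma = 0.5
M1 = rng.normal(size=N)
S = rng.normal(size=(N, N))
M2 = S @ S.T                                # symmetric placeholder for M_2
B = rng.normal(size=(N + N * N, N + N * N))
W = B @ B.T / B.shape[0]                    # symmetric PSD stand-in, not the optimal W
A = rng.normal(size=(N, K))

# Central finite differences as an independent check of the gradient.
eps = 1e-5
numeric = np.zeros_like(A)
for i in range(N):
    for k in range(K):
        E = np.zeros_like(A)
        E[i, k] = eps
        numeric[i, k] = (cost(A + E, p, sigma, M1, M2, W)
                         - cost(A - E, p, sigma, M1, M2, W)) / (2 * eps)

analytic = cost_gradient(A, p, sigma, M1, M2, W)
max_abs_err = np.max(np.abs(analytic - numeric))
```

If the optimal W is re-estimated from θ at every step, the gradient would pick up an extra term that this sketch ignores.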