Why isn't my EM algorithm increasing the log-likelihood after each iteration?


I used the EM algorithm to estimate the parameters of the following time-invariant process. $$y=Ct+e_y$$ $$x=Pt+e_x$$

where $y \in R$, $x \in R^d$, $t\sim N(0,I_p)$, $e_y \sim N(0,\Sigma_y)$, and $e_x \sim N(0,\Sigma_x)$. Both $\Sigma_x$ and $\Sigma_y$ are diagonal covariance matrices. The set of parameters to be estimated is $\theta=(C,P,\Sigma_x, \Sigma_y)$. I first wrote down the joint probability density function $$p(x,y,t)=N(y;Ct,\Sigma_y)\,N(x;Pt,\Sigma_x)\,N(t;0,I),$$ whose exponent, up to an additive constant, is $$-2\ln p(x,y,t)=\text{const}+x^T\Sigma_x^{-1}x+y^T\Sigma_y^{-1}y-2t^T(C^T\Sigma_y^{-1}y+P^T\Sigma_x^{-1}x)+t^T(C^T\Sigma_y^{-1}C+P^T\Sigma_x^{-1}P+I)t.$$ Knowing that $p(t|x,y)\propto p(x,y,t)$, completing the square in $t$ gives $$Var[t|x,y]=(C^T\Sigma_y^{-1}C+P^T\Sigma_x^{-1}P+I)^{-1}$$ $$E[t|x,y]=(C^T\Sigma_y^{-1}C+P^T\Sigma_x^{-1}P+I)^{-1}(C^T\Sigma_y^{-1}y+P^T\Sigma_x^{-1}x)$$ and, since $$Var[t|x,y]=E[(t-E[t|x,y])(t-E[t|x,y])^T]=E[tt^T|x,y]-E[t|x,y]E[t^T|x,y],$$ $$E[tt^T|x,y]=Var[t|x,y]+E[t|x,y]E[t^T|x,y].$$ Now I have the posterior moments required for the M-step, namely $E[t]$ and $E[tt^T]$ (for conciseness the conditioning on $x,y$ is omitted). In the M-step, because the distributions of $y$ and $x$ are independent given $t$, the pairs $(C,\Sigma_y)$ and $(P,\Sigma_x)$ can be maximized independently of each other. The expected complete-data log-likelihood is, up to constants, $$L(X=[x_n]_{n=1}^N;\theta)=\dfrac{N}{2}\ln|\Sigma_x^{-1}|-\dfrac{1}{2}trace[\Sigma_x^{-1} \sum_{n=1}^N(x_nx_n^T-x_nE[t_n^T]P^T-PE[t_n]x_n^T+PE[t_nt_n^T]P^T)]$$
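To sanity-check the posterior moments, here is a small NumPy sketch (a check of the math, not part of the MATLAB code attached below; all parameter values are made up for illustration). It compares the completed-square formulas against the standard Gaussian conditioning rule applied to the stacked observation $[y;x]$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 4, 2

# Made-up illustrative parameters (not fitted values from the question).
C = rng.normal(size=(1, p))                  # y = C t + e_y, y is scalar
P = rng.normal(size=(d, p))                  # x = P t + e_x
Sy = np.diag(rng.uniform(0.5, 1.5, size=1))  # diagonal Sigma_y
Sx = np.diag(rng.uniform(0.5, 1.5, size=d))  # diagonal Sigma_x

x = rng.normal(size=d)
y = rng.normal(size=1)

# Posterior moments from the completed-square formulas above.
M = C.T @ np.linalg.inv(Sy) @ C + P.T @ np.linalg.inv(Sx) @ P + np.eye(p)
V = np.linalg.inv(M)                                   # Var[t | x, y]
m = V @ (C.T @ np.linalg.inv(Sy) @ y + P.T @ np.linalg.inv(Sx) @ x)
Ett = V + np.outer(m, m)                               # E[t t^T | x, y]

# Cross-check: condition t on z = [y; x] with the standard Gaussian
# conditioning rule; Cov(z) = A A^T + blockdiag(Sy, Sx), Cov(t, z) = A^T.
A = np.vstack([C, P])
S = np.block([[Sy, np.zeros((1, d))], [np.zeros((d, 1)), Sx]]) + A @ A.T
m2 = A.T @ np.linalg.solve(S, np.concatenate([y, x]))
V2 = np.eye(p) - A.T @ np.linalg.solve(S, A)
assert np.allclose(m, m2) and np.allclose(V, V2)
```

The agreement of the two routes is a consequence of the Woodbury identity, so it holds for any invertible diagonal $\Sigma_x$, $\Sigma_y$.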

$$L(Y=[y_n]_{n=1}^N;\theta)=\dfrac{N}{2}\ln|\Sigma_y^{-1}|-\dfrac{1}{2}trace[\Sigma_y^{-1} \sum_{n=1}^N(y_ny_n^T-y_nE[t_n^T]C^T-CE[t_n]y_n^T+CE[t_nt_n^T]C^T)]$$

Subsequently, taking derivatives of the above equations with respect to each parameter and setting them to zero yields (the derivation is similar to equations 13 to 16 of http://mlg.eng.cam.ac.uk/zoubin/course04/tr-96-2.pdf): $$C_{new}=(\sum_{n=1}^Ny_nE[t_n^T])(\sum_{n=1}^NE[t_nt_n^T])^{-1}$$ $$P_{new}=(\sum_{n=1}^Nx_nE[t_n^T])(\sum_{n=1}^NE[t_nt_n^T])^{-1}$$ $$\Sigma_x=\dfrac{1}{N}\sum_{n=1}^N(x_nx_n^T-P_{new}E[t_n]x_n^T)$$ $$\Sigma_y=\dfrac{1}{N}\sum_{n=1}^N(y_ny_n^T-C_{new}E[t_n]y_n^T)$$ This completes one step of EM. The marginal log-likelihood of the data is computed as follows, treating $p(x)$ and $p(y)$ as independent (I am not sure this is correct, since $x$ and $y$ share the latent $t$). This log-likelihood should increase after each step of EM. $$L(X,Y;\theta_{new})=-\dfrac{N(d+1)}{2}\ln(2\pi)-\dfrac{N}{2}\ln|\Sigma_y+C_{new}C^T_{new}|-\dfrac{N}{2}\ln|\Sigma_x+P_{new}P^T_{new}|-\dfrac{1}{2}\sum_{n=1}^N\left[x_n^T(\Sigma_x+P_{new}P^T_{new})^{-1}x_n+y_n^T(\Sigma_y+C_{new}C^T_{new})^{-1}y_n\right]$$
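As a sanity check on these updates, here is a compact NumPy sketch of the full E/M loop (a translation for checking the math, not the attached MATLAB; the synthetic data and random initialization are made up for illustration). It monitors the log-likelihood of the stacked vector $[y;x]$, whose covariance keeps the cross block $PC^T$ that a product of separate marginals would drop; that joint quantity is what EM is guaranteed to increase:

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, p = 200, 4, 2

# Synthetic data from a made-up ground-truth model (illustrative only).
C_true, P_true = rng.normal(size=(1, p)), rng.normal(size=(d, p))
T = rng.normal(size=(N, p))
Y = T @ C_true.T + rng.normal(size=(N, 1))
X = T @ P_true.T + rng.normal(size=(N, d))

def joint_ll(C, P, Sx, Sy):
    # Log-likelihood of z = [y; x] ~ N(0, A A^T + blockdiag(Sy, Sx)),
    # which retains the cross-covariance P C^T induced by the shared t.
    A = np.vstack([C, P])
    S = np.zeros((d + 1, d + 1))
    S[0, 0], S[1:, 1:] = Sy[0, 0], Sx
    Cov = S + A @ A.T
    Z = np.hstack([Y, X])
    _, logdet = np.linalg.slogdet(Cov)
    quad = np.einsum('ni,ij,nj->', Z, np.linalg.inv(Cov), Z)
    return -0.5 * N * ((d + 1) * np.log(2 * np.pi) + logdet) - 0.5 * quad

def em_step(C, P, Sx, Sy):
    # E-step: posterior moments of t_n, using the formulas derived above.
    V = np.linalg.inv(np.eye(p) + C.T @ np.linalg.inv(Sy) @ C
                      + P.T @ np.linalg.inv(Sx) @ P)      # Var[t_n | x_n, y_n]
    Et = (Y @ np.linalg.inv(Sy) @ C + X @ np.linalg.inv(Sx) @ P) @ V  # N x p
    Ett = N * V + Et.T @ Et                               # sum_n E[t_n t_n^T]
    # M-step: closed-form updates.
    C_new = Y.T @ Et @ np.linalg.inv(Ett)
    P_new = X.T @ Et @ np.linalg.inv(Ett)
    Sx_new = np.diag(np.diag(X.T @ X - P_new @ (Et.T @ X))) / N
    Sy_new = np.diag(np.diag(Y.T @ Y - C_new @ (Et.T @ Y))) / N
    return C_new, P_new, Sx_new, Sy_new

C, P = rng.normal(size=(1, p)), rng.normal(size=(d, p))
Sx, Sy = np.eye(d), np.eye(1)
lls = []
for _ in range(20):
    lls.append(joint_ll(C, P, Sx, Sy))
    C, P, Sx, Sy = em_step(C, P, Sx, Sy)
assert all(b >= a - 1e-6 for a, b in zip(lls, lls[1:]))  # monotone increase
```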

I have implemented this in MATLAB, but the log-likelihood decreases after each step. I am attaching the code and the test data with this question. Could anybody help me identify the problem?

function net=PPLS_EM(X,Y,p,iter,tol)

net=struct('type','PPLS_EM','C',[],'P',[],'Ex',[],'Ey',[],'mux',[],'muy',[],'LL',[]);
[N,din]=size(X); %N by d input data matrix column [6 9 11 13 16 18] of the d00_te. data
[~,dout]=size(Y); %N by 1 output data matrix column 7 of the d00_te.dat

const1=(2*pi)^(-din/2);
const2=(2*pi)^(-dout/2);
lik=0;
LL=[];
tiny=exp(-700);

if nargin<5
tol=0.0001;
end

if nargin<4
iter=100;
end

if nargin<3
p=2;
end

Vcur=zeros(p,p,N); % p must be set before it is used for preallocation
tcur=zeros(p,N);

YY=diag(sum(Y.*Y)');
XX=diag(sum(X.*X)');


%% Initialization
C=ones(dout,p);
Ey=ones(dout,1);
Ey=diag(Ey);

P=ones(din,p);
Ex=ones(din,1);
Ex=diag(Ex);

Ex=Ex+(Ex==0)*tiny;
Ey=Ey+(Ey==0)*tiny;

Etxsum=0;
Etysum=0;
Vttsum=0;

for t=1:iter
%% E-step

oldlik=lik;
for n=1:N
    tcur(:,n)=inv(eye(p)+P'*inv(Ex)*P+C'*inv(Ey)*C)*(P'*inv(Ex)*X(n,:)'+C'*inv(Ey)*Y(n,:)');
    Vcur(:,:,n)=inv(eye(p)+P'*inv(Ex)*P+C'*inv(Ey)*C)+tcur(:,n)*tcur(:,n)';

    invx=inv(Ex+P*P');
    invy=inv(Ey+C*C');

    Etx=tcur(:,n)*X(n,:);
    Ety=tcur(:,n)*Y(n,:);

    Etxsum=Etxsum+Etx;
    Etysum=Etysum+Ety;

    Vttsum=Vttsum+Vcur(:,:,n);

    % calculate likelihood
    detiEx=sqrt(det(invx));
    detiEy=sqrt(det(invy));
    if (isreal(detiEx) && detiEx>0)
        lik=lik+log(detiEx)-0.5*sum(sum(X(n,:).*(X(n,:)*invx)))+log(detiEy)-0.5*sum(sum(Y(n,:).*(Y(n,:)*invy))); %log-likelihood of input data matrix
    else
        break;
    end;
end
lik=lik+N*log(const1)+N*log(const2);

if (t<=2)
    likbase=lik;
elseif (lik<oldlik)
    %fprintf('Oops!');
elseif ((lik-likbase)<(1 + tol)*(oldlik-likbase)||~isfinite(lik))
    break;
end;

LL=[LL lik];

%% M-step
C=Etysum'*inv(Vttsum);
P=Etxsum'*inv(Vttsum);
Ex=diag(diag(XX-(P*Etxsum)))/N;
Ey=diag(diag(YY-(C*Etysum)))/N;
end

net.C=C;
net.P=P;
net.Ex=Ex;
net.Ey=Ey;
net.LL=LL;

Data can be accessed at https://www.dropbox.com/s/i36kw4qq6al0pmn/d00_te.dat?dl=0

Best answer:

Problem Solved

The decreasing likelihood was due to a coding error: $lik$ is accumulated not only over the data samples in the inner loop but also across EM iterations, because it is never reset at the start of each iteration. Each iteration therefore adds its (negative) total log-likelihood onto the previous iteration's total, so the reported value keeps decreasing. I am attaching the revised code here for anyone interested. No restriction on how you might use it.
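The effect is easy to reproduce in miniature. The following Python sketch (hypothetical numbers, unrelated to the model) shows how an accumulator that is never reset turns a constant per-iteration log-likelihood into an apparently decreasing one:

```python
def per_sample_loglik():
    # Stand-in for one sample's log-likelihood contribution (hypothetical value).
    return -1.0

N, iters = 5, 3

# Buggy version: lik is initialised once, outside the EM loop, so each
# iteration's total is added on top of the previous iteration's total.
lik, buggy_trace = 0.0, []
for _ in range(iters):
    for _ in range(N):
        lik += per_sample_loglik()
    buggy_trace.append(lik)   # -5, -10, -15: looks like a decreasing likelihood

# Fixed version: reset the accumulator at the top of every EM iteration.
fixed_trace = []
for _ in range(iters):
    lik = 0.0
    for _ in range(N):
        lik += per_sample_loglik()
    fixed_trace.append(lik)   # -5, -5, -5: constant, as it should be here
```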

function net=PPLS_EM(X,Y,p,iter,tol)

net=struct('type','PPLS_EM','C',[],'P',[],'Ex',[],'Ey',[],'LL',[]);
[N,din]=size(X); %N by din input data matrix; din and dout are needed below
[~,dout]=size(Y); %N by dout output data matrix

LL=[];
lik=0;

YY=diag(sum(Y.*Y)');
XX=diag(sum(X.*X)');

if nargin<5
tol=0.0001;
end

if nargin<4
iter=100;
end

if nargin<3
p=2;
end


%% Initialization

C=ones(dout,p);
Ey=ones(dout,1);
Ey=diag(Ey);

P=ones(din,p);
Ex=ones(din,1);
Ex=diag(Ex);

for t=1:iter
%% E-step
oldlik=lik;
[lik,Vtt,Etxsum,Etysum]=PPLS_E(C,P,Ex,Ey,X,Y);
if (t<=2)
    likbase=lik;
elseif (lik<oldlik)
    %fprintf('Oops!');
elseif ((lik-likbase)<(1 + tol)*(oldlik-likbase)||~isfinite(lik))
    break;
end;

LL=[LL lik];

%% M-step
C=Etysum'*inv(Vtt);
P=Etxsum'*inv(Vtt);
Ex=diag(diag(XX-(P*Etxsum)))/N;
Ey=diag(diag(YY-(C*Etysum)))/N;
end

net.C=C;
net.P=P;
net.Ex=Ex;
net.Ey=Ey;
net.LL=LL;

And the code for PPLS_E:

    function [lik,Vtt,Etxsum,Etysum]=PPLS_E(C,P,Ex,Ey,X,Y)

    [N,din]=size(X);
    [~,dout]=size(Y);

    const1=(2*pi)^(-din/2);
    const2=(2*pi)^(-dout/2);

    lik=0;
    p=size(P,2);

    Vcur=zeros(p,p,N);
    tcur=zeros(p,N);

    tiny=exp(-700);
    I=eye(p);

    Ex=Ex+(Ex==0)*tiny;
    Ey=Ey+(Ey==0)*tiny;

    Etxsum=zeros(p,din);
    Etysum=zeros(p,dout);
    Vtt=0;

    Sigma_XY=I+C'*inv(Ey)*C+P'*inv(Ex)*P;
    for n=1:N
        tcur(:,n)=inv(Sigma_XY)*(P'*inv(Ex)*X(n,:)'+C'*inv(Ey)*Y(n,:)');
        Vcur(:,:,n)=inv(Sigma_XY)+tcur(:,n)*tcur(:,n)';

        Etx=tcur(:,n)*X(n,:);
        Ety=tcur(:,n)*Y(n,:);

        Etxsum=Etxsum+Etx;
        Etysum=Etysum+Ety;

        Vtt=Vtt+Vcur(:,:,n);

        % calculate likelihood
        Ydiff=Y(n,:)'-C*tcur(:,n);
        Xdiff=X(n,:)'-P*tcur(:,n);
        inx=inv(Ex+P*Vcur(:,:,n)*P');
        iny=inv(Ey+C*Vcur(:,:,n)*C');
        detiEx=sqrt(det(inx));
        detiEy=sqrt(det(iny));
        if (isreal(detiEx) && detiEx>0 && isreal(detiEy) && detiEy>0)
            lik=lik+log(detiEx)-0.5*sum(Xdiff'.*(Xdiff'*inx))+log(detiEy)-0.5*sum(Ydiff'.*(Ydiff'*iny)); %log-likelihood of input data matrix
        else
            break;
        end;
    end
    lik=lik+N*log(const1)+N*log(const2);