Is there a way to simplify or improve the efficiency of my LU factorization algorithm?


I have attempted to implement the LU factorization from pages 3-6 of this document:

http://www4.ncsu.edu/~kksivara/ma505/handouts/lu-pivot.pdf

An example of running the code below for a simple 4x4 matrix:

A=[2 1 1 0; 4 3 3 1; 8 7 9 5 ; 6 7 9 8];
intch=[];flag=0;
>> [Bp]=Gauss2(A,4,intch,flag);
>> B=lu(A);
>> Bp

Bp =

    8.0000    7.0000    9.0000    5.0000
    0.7500    1.7500    2.2500    4.2500
    0.5000   -0.2857   -0.8571   -0.2857
    0.2500   -0.4286    0.3333    0.6667

>> B

B =

    8.0000    7.0000    9.0000    5.0000
    0.7500    1.7500    2.2500    4.2500
    0.5000   -0.2857   -0.8571   -0.2857
    0.2500   -0.4286    0.3333    0.6667

As you can see, the matrices Bp and B are the same, which shows that my LU factorization algorithm works correctly for a 4x4 matrix A. I have also tested 60x60 and 256x256 matrices, both of which work correctly (i.e. Bp==B). However, I am now running this code on a 4000x4000 dataset, and my program does not finish (even after running overnight for 10 hours).

In MATLAB, while the program is running, I can click the "Pause" button. This places the program in debug mode and shows the precise line of code the program is spending most of its time on. It turns out that, in the "Gauss2" function below, the program spends most of its time in the two nested for loops at the end of the function (marked "BOTTLENECK" in the comments).

How do I improve the speed of the matrix multiplication here? If you go to the very bottom of pg. 5 of http://www4.ncsu.edu/~kksivara/ma505/handouts/lu-pivot.pdf you will see Equations 21.6 and 21.7, which say:

L_k' = P(m-1)...P(k+1)*L_k*P(k+1)...P(m-1)

and

L = inverse(L(m-1)' ... L(2)'L(1)')

This is exactly what the two nested for loops below are trying to do. Is there a smarter way to compute L_k' ?
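One observation (standard linear algebra, not specific to the notes): each P(j,intch(j),n) is the identity with two rows interchanged, so right-multiplying by it merely swaps two columns. Each O(n^3) product X*P(...) can therefore be replaced by an O(n) column swap. A small check of this equivalence:

```matlab
n = 4; j = 2; m = 4;                   % example dimension and pivot indices
X = magic(n);                          % any n-by-n matrix
P = eye(n); P([j,m],:) = P([m,j],:);   % permutation matrix, as in P.m below
Y1 = X * P;                            % full O(n^3) matrix product
Y2 = X; Y2(:,[j,m]) = Y2(:,[m,j]);     % O(n) column swap
isequal(Y1, Y2)                        % returns 1 (logical true)
```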

function [Bp]= Gauss2(A, n, intch, flag)

    intch=zeros(n,1);
    flag=0;
    a=zeros(n,n);

    for k=1:n-1
        [amax,amax_index] = max(abs(A(k:n,k)));
        amax_index=amax_index+k-1;
        if (amax == 0)
            flag=1; % matrix is singular
            intch(k)=0;
        else
            intch(k)=amax_index;
            if (amax_index ~= k)
                % pivoting
% interchange rows k and amax_index
                A([k,amax_index],:) = A([amax_index,k],:);
            end

% calculate multipliers and store them in "a"
            for i=k+1:n
                a(i,k)=-A(i,k)/A(k,k);
            end

            % perform elimination for column "k"
            for i=k+1:n
                A(i,:)=A(i,:)+a(i,k)*A(k,:);

            end 
        end   
    end
    if (A(n,n)==0)
        flag=1;
        intch(n)=0;
    else
        intch(n)=n;
    end

    U=A; % this is the upper triangular matrix

    % next, need to determine "L_k'" from algorithm 
    % for all k
    prevX=eye(n);
    for k=1:n-1
        X=eye(n);
        for j=(n-1):-1:(k+1)
            X=X*P(j,intch(j),n); % <---- BOTTLENECK HERE
        end
        X=X*L(a,k,n);
        for j=(k+1):(n-1)
            X=X*P(j,intch(j),n); % < --- BOTTLENECK HERE
        end

        prevX=X*prevX;

    end


    Lower=inv(prevX);

    Bp=U+Lower-eye(n);

end

Here is my code for P.m and L.m below. These are the functions used in the code above to generate the permutation matrices P_1, P_2, ..., P_k and the matrices L_1, L_2, ..., L_k. I need to compute these in order to figure out the L_k' matrices, which can then be used to figure out the final lower triangular matrix L.

function [Y]=L(a,k,n)
    Y=eye(n);
    Y((k+1):n,k)=a(k+1:n,k);
end


function [X]=P(i,j,n)
    X=eye(n);
    X([i,j],:) = X([j,i],:);
end
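For reference, a standard way to avoid forming the P_j and L_k matrices entirely is to apply each interchange directly to the rows of the stored multipliers during elimination: when rows k and amax_index of A are swapped, swap the same rows in columns 1..k-1 of "a" as well. Then L = eye(n) - a (since "a" stores the negated multipliers), so Bp = U + L - eye(n) is simply A - a. A sketch along these lines (the function name Gauss2fast is mine, and the singular-case bookkeeping with intch/flag is omitted for brevity):

```matlab
function [Bp] = Gauss2fast(A, n)
    % Same output as Gauss2, but O(n^3) overall: row interchanges are
    % applied directly to the stored multipliers instead of via
    % repeated products with full permutation matrices.
    a = zeros(n,n);                        % negated multipliers, as in Gauss2
    for k = 1:n-1
        [amax, p] = max(abs(A(k:n,k)));
        p = p + k - 1;
        if amax ~= 0
            if p ~= k
                A([k,p],:) = A([p,k],:);           % pivot rows of A
                a([k,p],1:k-1) = a([p,k],1:k-1);   % ...and of earlier multipliers
            end
            a(k+1:n,k) = -A(k+1:n,k) / A(k,k);
            A(k+1:n,:) = A(k+1:n,:) + a(k+1:n,k) * A(k,:);  % rank-1 update
        end
    end
    Bp = A - a;    % U + L - eye(n), since L = eye(n) - a
end
```

On the 4x4 example above this reproduces Bp exactly. Since it removes the accumulation of permutation products (roughly O(n^5) work in the nested loops as written), the 4000x4000 case should become feasible.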

1 Answer


There are a number of ways to improve the efficiency. One way, off the top of my head, is to use a block LU decomposition. Instead of the original

$$ A= LU \tag{1}$$

we do $$ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22}\end{bmatrix} = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22}\end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22}\end{bmatrix} \tag{2}$$

there is code here.

Another aspect to consider is sparsity. Also, some randomized algorithms multiply by random matrices to approximate the decomposition.

Depending on the structure: if the matrix is symmetric positive definite, you can do a Cholesky decomposition instead, which is roughly twice as fast.
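For example, with MATLAB's built-in chol (the test matrix below is just an illustration):

```matlab
B = rand(500); A = B'*B + 500*eye(500);  % a symmetric positive definite matrix
R = chol(A);                             % A = R'*R, with R upper triangular
norm(R'*R - A, 'fro')                    % small (on the order of eps*norm(A))
```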

Note that with block LU we solve the subproblems

$$ A_{11} = L_{11} U_{11} \tag{3} $$ $$ A_{12} = L_{11}U_{12} \tag{4} $$ $$ A_{21} = L_{21}U_{11} \tag{5} $$ $$ A_{22} = L_{21}U_{12} + L_{22}U_{22} \tag{6} $$

I believe your $4000 \times 4000$ matrix will then become four $2000 \times 2000$ blocks.
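A minimal sketch of one level of this block scheme, following equations (3)-(6) (no pivoting is shown, so the test matrix is made diagonally dominant to keep the elimination stable; illustrative only, not production code):

```matlab
n = 8; b = n/2;
A = rand(n) + n*eye(n);                 % diagonally dominant: safe without pivoting
A11 = A(1:b,1:b);    A12 = A(1:b,b+1:n);
A21 = A(b+1:n,1:b);  A22 = A(b+1:n,b+1:n);
[L11, U11] = lu(A11);                   % (3)  A11 = L11*U11
U12 = L11 \ A12;                        % (4)  A12 = L11*U12
L21 = A21 / U11;                        % (5)  A21 = L21*U11
[L22, U22] = lu(A22 - L21*U12);         % (6)  Schur complement of A11
L = [L11, zeros(b); L21, L22];
U = [U11, U12; zeros(b), U22];
norm(L*U - A, 'fro')                    % small (machine precision level)
```

One caveat: the two-output form of lu returns a row-permuted ("psychologically") lower factor, so L here reconstructs A but need not be strictly triangular; the three-output form [L,U,P] = lu(...) gives a truly triangular L.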