Lagrange multipliers and Maxima CAS

167 Views Asked by At

I am considering the system

eqns : [
a11*q1+a12*q2+a13*q3-q1 = 0, 
a21*q1+a22*q2+a23*q3-q2 = 0,
a31*q1+a32*q2+a33*q3-q3 = 0, 
a11+a21+a31-1 = 0, 
a12+a22+a32-1 = 0,
a13+a23+a33-1 = 0, 
2*a11*(1-q1)*q1-L1*q1-L4 = 0,
2*a12*(1-q2)*q2-L1*q2-L5 = 0, 
2*a13*(1-q3)*q3-L1*q3-L6 = 0,
2*a21*(1-q1)*q1-L2*q1-L4 = 0, 
2*a22*(1-q2)*q2-L2*q2-L5 = 0, 
2*a23*(1-q3)*q3-L2*q3-L6 = 0, 
2*a31*(1-q1)*q1-L3*q1-L4 = 0,
2*a32*(1-q2)*q2-L3*q2-L5 = 0,
2*a33*(1-q3)*q3-L3*q3-L6 = 0];

inside maxima, with solve(eqns,[a11,a12,a13,a21,a22,a23,a31,a32,a33]). The solution looks like

(%i18) linsolve(eqns,[a11,a12,a13,a21,a22,a23,a31,a32,a33,L1,L2,L3,L4,L5,L6]);

solve: dependent equations eliminated: (3)
(%o18) [a11 = 
             2      2                          2
  (q2 - 1) q3  + (q2  - 5 q1 q2 + 4 q1) q3 - q2  + 4 q1 q2 - 3 q1
- ---------------------------------------------------------------, 
       ((9 q1 - 6) q2 - 6 q1 + 3) q3 + (3 - 6 q1) q2 + 3 q1
                    2                        2                          2
a12 = - ((q1 - 1) q3  + ((1 - 2 q1) q2 - 2 q1  + 3 q1) q3 + q1 q2 + 2 q1
 - 3 q1)/(((9 q1 - 6) q2 - 6 q1 + 3) q3 + (3 - 6 q1) q2 + 3 q1), 
                                            2        2                  2
a13 = (((2 q1 - 1) q2 - q1) q3 + (1 - q1) q2  + (2 q1  - 3 q1) q2 - 2 q1
 + 3 q1)/(((9 q1 - 6) q2 - 6 q1 + 3) q3 + (3 - 6 q1) q2 + 3 q1), 

The system above models a minimization problem using Lagrange multipliers for the function $f(a_{11},a_{12},\ldots,a_{33}) = \sum_{i=1}^{3}\sum_{j=1}^{3}a^{2}_{ij}p_1q_1,$ with constraints $\sum_{j=1}^{3}a_{ij}q_j = q_i,$ for all $i \in [1..3],$ and column-wise constraints $\sum_{i=1}^3a_{ij} = 1,$ for all $j \in [1..3].$

However, maxima's output is not what I am expecting to see: When using cvxpy (https://www.cvxpy.org/) like so:

def solve(self,u,verbose=True):
    x = cp.Variable(self.ncols)
    variances = sum(x[self.ids[(i,j)]]**2*self.p[j]*(1-self.p[j]) for (i,j) in itertools.product(bit_utils.bits(u),range(self.n)))
    # expectances = sum(-x[self.ids[(i,j)]]*self.p[j] for (i,j) in itertools.product(bit_utils.bits(u),range(self.n)))
    # pr = variances.__add__(expectances)
    pr = variances
    # constraints = [0 <= x, self.M*x == self.b]
    constraints = [self.M*x == self.b]
    print("Constraints: ",constraints)
    if verbose:
        print("Curvature: ",pr.curvature)
    objective = cp.Minimize(pr)
    prob = cp.Problem(objective,constraints)
    if verbose:
        print(prob)
        result = prob.solve(verbose=True)
    else:
        result = prob.solve(verbose=False)
    if verbose:
        my_utils.print_matrix(self.M)
        print('problem state: ', prob.status)
    alpha = np.zeros((self.n,self.n))
    # vec = A*x.value-b
    for i in range(self.n):
        for j in range(self.n):
            alpha[i,j] = x.value[self.ids[(i,j)]]
    if verbose:
        print("The norm of the residual, i.e. of ||Ax-b||_2 is %.6f" % (np.linalg.norm(self.M.dot(x.value)-self.b)))
        print("The objective function's optimal value: ",prob.value)
    return (x,alpha)

I get something like $a_{11} = a_{12} = a_{13} = \frac{q_1}{q_1+q_2+q_3},$ and similarly for the other two rows. How to reconcile these two results?

1

There are 1 best solutions below

0
On

The equations you derived are not correct. So let's do the the full calculation in *Maxima'.

The problem: Minimize

$$f(a_{11},a_{12},\ldots,a_{33}) = \sum_{i=1}^{3}\sum_{j=1}^{3}a^{2}_{ij}p_1q_1$$ under the contraints $$\sum_{j=1}^{3}a_{ij}q_j = q_i, \; \forall i \in \{1,2,3\}$$ $$\sum_{i=1}^3a_{ij} = 1, \; \forall j \in \{1,2,3\}$$ The variable f holds the term you want to minimize. The variables $a_{ij}$ and $q_i$ will be represented by indexed variables a[i,j]and q[i].

(%i1) f:sum(sum(a[i,j]^2*q[i]*q[j],j,1,3),i,1,3)
(%o1) q[3]^2*a[3,3]^2+q[2]*q[3]*a[3,2]^2+q[1]*q[3]*a[3,1]^2+q[2]*a[2,3]^2*q[3]
                     +q[1]*a[1,3]^2*q[3]+q[2]^2*a[2,2]^2+q[1]*q[2]*a[2,1]^2
                     +q[1]*a[1,2]^2*q[2]+q[1]^2*a[1,1]^2

Create the constraints

(%i2) create_list(sum(a[i,j]*q[j],j,1,3)-q[i],i,[1,2,3])
(%o2) [a[1,3]*q[3]+a[1,2]*q[2]+q[1]*a[1,1]-q[1],
       a[2,3]*q[3]+q[2]*a[2,2]+q[1]*a[2,1]-q[2],
       q[3]*a[3,3]+q[2]*a[3,2]+q[1]*a[3,1]-q[3]]
(%i3) create_list(sum(a[i,j],i,1,3)-1,j,[1,2,3])
(%o3) [a[3,1]+a[2,1]+a[1,1]-1,a[3,2]+a[2,2]+a[1,2]-1,a[3,3]+a[2,3]+a[1,3]-1]

Concatenate the created lists of constraints and store them in a list named constrs.

(%i4) constrs:append(%th(2),%th(1))
(%o4) [a[1,3]*q[3]+a[1,2]*q[2]+q[1]*a[1,1]-q[1],
       a[2,3]*q[3]+q[2]*a[2,2]+q[1]*a[2,1]-q[2],
       q[3]*a[3,3]+q[2]*a[3,2]+q[1]*a[3,1]-q[3],a[3,1]+a[2,1]+a[1,1]-1,
       a[3,2]+a[2,2]+a[1,2]-1,a[3,3]+a[2,3]+a[1,3]-1]

This is the auxiliary function we define using the Lagrange Mulzipliers L[1],...,L[6]

(%i5) aux:f+sum(L[t]*constrs[t],t,1,6)
(%o5) (a[3,3]+a[2,3]+a[1,3]-1)*L[6]+(a[3,2]+a[2,2]+a[1,2]-1)*L[5]
                                   +(a[3,1]+a[2,1]+a[1,1]-1)*L[4]
                                   +q[3]^2*a[3,3]^2
                                   +L[3]
                                    *(q[3]*a[3,3]+q[2]*a[3,2]+q[1]*a[3,1]
                                                 -q[3])+q[2]*q[3]*a[3,2]^2
                                   +q[1]*q[3]*a[3,1]^2
                                   +L[2]
                                    *(a[2,3]*q[3]+q[2]*a[2,2]+q[1]*a[2,1]
                                                 -q[2])
                                   +L[1]
                                    *(a[1,3]*q[3]+a[1,2]*q[2]+q[1]*a[1,1]
                                                 -q[1])+q[2]*a[2,3]^2*q[3]
                                   +q[1]*a[1,3]^2*q[3]+q[2]^2*a[2,2]^2
                                   +q[1]*q[2]*a[2,1]^2+q[1]*a[1,2]^2*q[2]
                                   +q[1]^2*a[1,1]^2

The list of variables vars contains 15 Variables

(%i6) create_list(L[t],t,[1,2,3,4,5,6])
(%o6) [L[1],L[2],L[3],L[4],L[5],L[6]]
(%i7) create_list(a[i,j],i,[1,2,3],j,[1,2,3])
(%o7) [a[1,1],a[1,2],a[1,3],a[2,1],a[2,2],a[2,3],a[3,1],a[3,2],a[3,3]]
(%i8) vars:append(%th(2),%th(1))
(%o8) [L[1],L[2],L[3],L[4],L[5],L[6],a[1,1],a[1,2],a[1,3],a[2,1],a[2,2],
       a[2,3],a[3,1],a[3,2],a[3,3]]

By differentiating the auxiliary function we get 15 equations eqs

(%i9) eqs:create_list(diff(aux,v),v,vars)
(%o9) [a[1,3]*q[3]+a[1,2]*q[2]+q[1]*a[1,1]-q[1],
       a[2,3]*q[3]+q[2]*a[2,2]+q[1]*a[2,1]-q[2],
       q[3]*a[3,3]+q[2]*a[3,2]+q[1]*a[3,1]-q[3],a[3,1]+a[2,1]+a[1,1]-1,
       a[3,2]+a[2,2]+a[1,2]-1,a[3,3]+a[2,3]+a[1,3]-1,
       L[4]+2*q[1]^2*a[1,1]+L[1]*q[1],L[5]+2*q[1]*a[1,2]*q[2]+L[1]*q[2],
       L[6]+2*q[1]*a[1,3]*q[3]+L[1]*q[3],L[4]+2*q[1]*q[2]*a[2,1]+q[1]*L[2],
       L[5]+2*q[2]^2*a[2,2]+L[2]*q[2],L[6]+2*q[2]*a[2,3]*q[3]+L[2]*q[3],
       L[4]+2*q[1]*q[3]*a[3,1]+q[1]*L[3],L[5]+2*q[2]*q[3]*a[3,2]+q[2]*L[3],
       L[6]+2*q[3]^2*a[3,3]+L[3]*q[3]]

Now we solve the equations

(%i10) solve(eqs,vars)

solve: dependent equations eliminated: (3)
(%o10) [[L[1] = (q[3]*%r8+q[2]*%r8+q[1]*%r8+2*q[3]^2-2*q[1]^2)
              /(q[3]+q[2]+q[1]),
         L[2] = (q[3]*%r8+q[2]*%r8+q[1]*%r8+2*q[3]^2-2*q[2]^2)
              /(q[3]+q[2]+q[1]),L[3] = %r8,
         L[4] = -(q[1]*q[3]*%r8+q[1]*q[2]*%r8+q[1]^2*%r8+2*q[1]*q[3]^2)
              /(q[3]+q[2]+q[1]),
         L[5] = -(q[2]*q[3]*%r8+q[2]^2*%r8+q[1]*q[2]*%r8+2*q[2]*q[3]^2)
              /(q[3]+q[2]+q[1]),
         L[6] = -(q[3]*(q[2]*%r8+q[1]*%r8)+q[3]^2*%r8+2*q[3]^3)
              /(q[3]+q[2]+q[1]),a[1,1] = q[1]/(q[3]+q[2]+q[1]),
         a[1,2] = q[1]/(q[3]+q[2]+q[1]),a[1,3] = q[1]/(q[3]+q[2]+q[1]),
         a[2,1] = q[2]/(q[3]+q[2]+q[1]),a[2,2] = q[2]/(q[3]+q[2]+q[1]),
         a[2,3] = q[2]/(q[3]+q[2]+q[1]),a[3,1] = q[3]/(q[3]+q[2]+q[1]),
         a[3,2] = q[3]/(q[3]+q[2]+q[1]),a[3,3] = q[3]/(q[3]+q[2]+q[1])]]

The solutions are the same as in your cvxpy calculation. So lets compare our equations with the equations in your Maxima calculation

(%i11) for term in eqs do print(term)
a[1,3]*q[3]+a[1,2]*q[2]+q[1]*a[1,1]-q[1] 
a[2,3]*q[3]+q[2]*a[2,2]+q[1]*a[2,1]-q[2] 
q[3]*a[3,3]+q[2]*a[3,2]+q[1]*a[3,1]-q[3] 
a[3,1]+a[2,1]+a[1,1]-1 
a[3,2]+a[2,2]+a[1,2]-1 
a[3,3]+a[2,3]+a[1,3]-1 
L[4]+2*q[1]^2*a[1,1]+L[1]*q[1] 
L[5]+2*q[1]*a[1,2]*q[2]+L[1]*q[2] 
L[6]+2*q[1]*a[1,3]*q[3]+L[1]*q[3] 
L[4]+2*q[1]*q[2]*a[2,1]+q[1]*L[2] 
L[5]+2*q[2]^2*a[2,2]+L[2]*q[2] 
L[6]+2*q[2]*a[2,3]*q[3]+L[2]*q[3] 
L[4]+2*q[1]*q[3]*a[3,1]+q[1]*L[3] 
L[5]+2*q[2]*q[3]*a[3,2]+q[2]*L[3] 
L[6]+2*q[3]^2*a[3,3]+L[3]*q[3] 
(%o11) done

So some of your equations were wrong.