How to differentiate $f(w) = \sum\left[(w.\Omega w)^2\right]-\sum\left[w.\Omega w\right]^2$?


I have a function $$f(w) = \sum\left[(w.\Omega w)^2\right]-\sum\left[w.\Omega w\right]^2$$ where $.$ denotes element-wise multiplication, $w$ is a vector, $\Omega$ is a symmetric positive definite matrix, and $\sum[x]$ denotes the sum of the elements of the vector $x$.

I need to optimize the above function w.r.t. $w$. The size of my problem is small enough to do this with any old non-linear optimizer, but I was curious to know what the gradient looks like, should I ever want to feed it into the optimizer.

So this became a little toy puzzle for me to play with whilst alone in the office, but my persistent failure at finding the answer has become annoying.

In attempting to solve this, I noted that at some point I would need to find $$\frac{\partial (w.\Omega w)}{\partial w}$$

I'm pretty sure I have correctly worked out that this is $$w.\Omega + \operatorname{diag}(\Omega w)$$ where $w.\Omega$ scales the rows of $\Omega$ by the elements of $w$ (i.e. $\operatorname{diag}(w)\,\Omega$), and $\operatorname{diag}(x)$ converts the vector $x$ into a diagonal matrix.
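For anyone who prefers NumPy, here is a quick sketch of the same finite-difference check (my own translation, with a made-up positive definite matrix; only the shapes matter). The term $w.\Omega$ corresponds to `np.diag(w) @ omega`:

```python
import numpy as np

# Sketch: finite-difference check of the claimed Jacobian of w .* (omega @ w),
#   J = diag(w) @ omega + diag(omega @ w).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
omega = A @ A.T + 3 * np.eye(3)        # made-up symmetric positive definite matrix
w = np.array([0.1, 0.5, 0.4])

v = w * (omega @ w)                    # the vector w . omega w
J = np.diag(w) @ omega + np.diag(omega @ w)

d = 1e-7
i = 2                                  # perturb the third element
w_ = w.copy()
w_[i] += d
v_ = w_ * (omega @ w_)
print((v_ - v) / d)                    # should be close to J[:, i]
print(J[:, i])
```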

To verify this, I have the following Matlab code. I can translate to R if required for this question.

a = randn(100000, 1);
b = a + randn(100000, 1);
c = a - b + randn(100000, 1);
omega = cov([a b c], 1);                  % 3x3 covariance matrix
w = [0.1 0.5 0.4];

r = w' .* omega * w';                     % the vector w . omega w
drdw = omega .* w' + diag(omega * w');    % claimed Jacobian
d = 1e-9;
i = 3;
w_ = w;
w_(i) = w_(i) + d;                        % perturb the i-th element
r_ = w_' .* omega * w_';
disp((r_ - r) ./ d)                       % finite-difference column
disp(drdw(:,i))                           % analytic column

So with that result in my pocket, I focused on the first term of $f$. My reasoning is as follows

$$\frac{\partial \sum\left[(w.\Omega w)^2\right]}{\partial w} = \sum\left[\frac{\partial \left((w.\Omega w)^2\right)}{\partial w}\right]$$ and an application of the chain rule gives $$2\sum\left[(w.\Omega w)\, \frac{\partial (w.\Omega w)}{\partial w}\right]$$ and we already have the pocket solution from earlier.
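In matrix terms this first-term gradient is $2\,J^\top(w.\Omega w)$, with $J$ the Jacobian worked out earlier. A NumPy sketch (again with made-up data, not the original code) that checks it against a finite difference:

```python
import numpy as np

# Sketch: gradient of sum((w .* (omega @ w))**2) via the chain rule,
#   grad = 2 * J.T @ v,  with  v = w * (omega @ w)
#   and  J = diag(w) @ omega + diag(omega @ w).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
omega = A @ A.T + 3 * np.eye(3)        # made-up symmetric positive definite matrix
w = np.array([0.1, 0.5, 0.4])

v = w * (omega @ w)
J = np.diag(w) @ omega + np.diag(omega @ w)
grad = 2 * J.T @ v

# finite-difference check of the i-th component
d = 1e-7
i = 2
w_ = w.copy()
w_[i] += d
fd = (np.sum((w_ * (omega @ w_))**2) - np.sum(v**2)) / d
print(fd)                              # should be close to grad[i]
print(grad[i])
```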

But now when I try to verify this solution, it goes wrong.

r = sum(w' .* omega * w');
i = 3;
drdw = sum((w' .* (omega * w')) .* (omega .* w' + diag(omega * w'))) * 2;
d = 1e-9;
w_ = w;
w_(i) = w_(i) + d;
r_ = sum(w_' .* omega * w_');
disp((r_ - r) ./ d)
disp(drdw(i))

But where does it go wrong?

EDIT

As noted by Abdullah, there was a bug in my code rather than a flaw in my maths. The correct code is

r = sum((w' .* omega * w').^2);           % note the .^2 that was missing before
i = 3;
drdw = sum((w' .* (omega * w')) .* (omega .* w' + diag(omega * w'))) * 2;
d = 1e-9;
w_ = w;
w_(i) = w_(i) + d;
r_ = sum((w_' .* omega * w_').^2);
disp((r_ - r) ./ d)
disp(drdw(i))


BEST ANSWER

I don't have MATLAB installed on my personal laptop and it will take a while to download Octave, so the following is hypothetical, i.e. debugging on paper, sigh.

I think it goes wrong in how you use MATLAB's sum function. Since you are not passing a dimension argument and (w'.*(omega*w')).*(omega.*w' + diag(omega*w')) is a matrix, sum does the following, according to the MATLAB reference documentation:

If A is a matrix, then sum(A) returns a row vector containing the sum of each column.

Try sum((w'.*(omega*w')).*(omega.*w' + diag(omega*w')),2).

Also r_ = sum(w_'.*omega*w_'); seems fishy. w_'.*omega*w_' is a vector, so sum will return a scalar and r_ is supposed to be a vector, too. But I don't know your application so I am not sure.

Edit: if $r = \sum [(w.\Omega w)^2]$ then r = sum(w'.*omega*w'); is not the correct MATLAB code to implement it: the squaring is missing.
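To make the missing squaring concrete, here's a tiny NumPy illustration (my sketch, with a made-up positive definite matrix) of the two different quantities the two expressions compute:

```python
import numpy as np

# Sketch: sum(v) vs sum(v**2), where v = w .* (omega @ w).
# r = sum(w'.*omega*w') computes the first; the intended quantity is the second.
omega = np.array([[2.0, 0.5, 0.1],
                  [0.5, 1.5, 0.2],
                  [0.1, 0.2, 1.0]])    # made-up symmetric positive definite matrix
w = np.array([0.1, 0.5, 0.4])
v = w * (omega @ w)

print(np.sum(v))       # sum(w'.*omega*w')       -> sum of v
print(np.sum(v**2))    # sum((w'.*omega*w').^2)  -> sum of squares of v
```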

ANSWER

I think I've had a little epiphany, actually. Burning the midnight oil a bit, but oh well.

$dw$ is shorthand, at least in our beloved application-speak, for "a small variation in $w$." The core of the issue is that the code doesn't implement a proper definition of a "small variation" in a vector.

To linearly approximate $dw$, your code adds a small constant to each term in the $w$ vector. This addition does not accurately capture all the degrees of freedom of $w$, as adding a fixed constant to all three elements only moves the vector along a single direction. I'd imagine that taking the sum of $w$ and some vector with three small, independently varying elements would give accurate results.
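If I understand the suggestion, it amounts to a directional finite difference: perturb $w$ along a small arbitrary direction $u$ and compare with $\nabla g \cdot u$. A NumPy sketch (my own, with made-up data, using the first term of $f$ as $g$):

```python
import numpy as np

# Sketch: directional finite-difference check for g(w) = sum((w * (omega @ w))**2),
# perturbing w by d*u for a random direction u, instead of one coordinate at a time.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
omega = A @ A.T + 3 * np.eye(3)        # made-up symmetric positive definite matrix
w = np.array([0.1, 0.5, 0.4])

def g(w):
    return np.sum((w * (omega @ w))**2)

v = w * (omega @ w)
J = np.diag(w) @ omega + np.diag(omega @ w)
grad = 2 * J.T @ v                     # analytic gradient of g

u = rng.standard_normal(3)             # small variation: all elements vary freely
d = 1e-7
fd = (g(w + d * u) - g(w)) / d
print(fd)                              # should be close to grad @ u
print(grad @ u)
```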

As to why the code happened to give accurate results for the first example but failed for the second, I don't have a good answer.