Multiplications and transposed vectors


This might seem trivial to you, but for a beginner like me it's crucial to understand; it comes up in the first lecture of a course I want to take. I want to expand the inner part of this sum: $\sum_{n=1}^N(y_n - \theta^Tx_n)^2$. This is the least-squares objective in regression, where $y_n$ is the output (let's say it's a number), $x_n$ the input, and $\theta$ the parameters we are trying to estimate (so these last two are vectors). Here is the derivation I am trying to follow (I know it holds):

$J(\theta) = \sum_{n=1}^N(y_n - \theta^Tx_n)^2$. I want to take the gradient of this, w.r.t. θ. So I have:

$J(\theta) = \sum_{n=1}^N[(y_n - \theta^Tx_n)(y_n - \theta^Tx_n)] = \;?$

I am having trouble making the multiplication. Here is what I have in my textbook:

$J(\theta) = \sum_{n=1}^N(y_n^2 - y_nx_n^T\theta - \theta^Tx_ny_n + \theta^Tx_nx_n^T\theta)$

$ =\sum_{n=1}^N(y_n^2 - 2y_nx_n^T\theta + \theta^Tx_nx_n^T\theta)$

What troubles me is that we had the transpose marking on $\theta$, but it has now moved to $x_n$.

Best answer:

I will just detail some of the work left out, dropping the subscript $n$ for brevity. Since $y - \theta^T x$ is a scalar (a $1 \times 1$ matrix), it equals its own transpose, so we can write

$$(y - \theta^T x)^2 = (y - \theta^T x)\cdot(y - \theta^T x) = (y - \theta^T x)^T(y - \theta^T x)$$

Now transposition is a linear operator, so we can distribute it through the difference:

$$(y - \theta^T x)^T(y - \theta^T x) = (y^T - (\theta^T x)^T)(y - \theta^T x) = (y^T - x^T \theta)(y - \theta^T x)$$
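As a quick numerical sanity check (with hypothetical example vectors, using NumPy), one can confirm that $(\theta^T x)^T = x^T \theta$:

```python
import numpy as np

# Hypothetical example vectors, treated as column vectors of shape (3, 1)
theta = np.array([[1.0], [2.0], [3.0]])
x = np.array([[4.0], [5.0], [6.0]])

# theta^T x is a 1x1 matrix; by (AB)^T = B^T A^T its transpose is x^T theta
lhs = (theta.T @ x).T
rhs = x.T @ theta
print(lhs, rhs)  # both are [[32.]]
```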

Here I have used the identity $(AB)^T = B^TA^T$. Next we expand (you can even FOIL it; this is just the expansion of a product of two binomials):

$$(y^T - x^T \theta)(y - \theta^T x) = y^T y - y^T \theta^T x - x^T\theta y + x^T \theta \theta^T x$$

Now the middle two terms are actually the exact same number: all the objects listed here are $1 \times 1$ matrices, i.e. scalars, and the transpose of a scalar is itself. To see why they are equal, apply the matrix identity again:

$$ (y^T \theta^T x)^T = (\theta^T x)^T (y^T)^T = x^T \theta\, y$$

Since the middle terms are transposes of each other but are scalars, they are equal, and you can combine them into one term:

$$ y^T y - 2 x^T \theta y + x^T \theta \theta^T x = y^2 - 2x^T \theta y + x^T \theta \theta^T x$$
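The equality of the two middle terms can also be checked numerically. In this sketch (hypothetical values), $y$ is a scalar, so $y^T = y$:

```python
import numpy as np

# Hypothetical values: y is a scalar, x and theta are column vectors
y = 2.0
theta = np.array([[1.0], [2.0]])
x = np.array([[3.0], [4.0]])

# The two middle terms are 1x1 matrices that are transposes of each other,
# so they hold the same scalar value
t1 = y * (theta.T @ x)   # y^T theta^T x  (y scalar, so y^T = y)
t2 = (x.T @ theta) * y   # x^T theta y
print(t1.item(), t2.item())  # 22.0 22.0
```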

When you bring this back down to the component level (restoring the $n$ subscripts), the order of the scalar factors within each term doesn't matter, so we obtain

$$ \sum_{n=1}^N \left( y_n^2 - 2x_n^T \theta\, y_n + x_n^T \theta\, \theta^T x_n \right) = \sum_{n=1}^N \left( y_n^2 - 2y_n x_n^T \theta + \theta^T x_n x_n^T \theta \right) $$
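Finally, the whole identity can be verified on random data. This is a minimal sketch with hypothetical dimensions ($N = 5$ samples, $d = 3$ features); note that $\theta^T x_n x_n^T \theta = (x_n^T \theta)^2$ since $x_n^T \theta$ is a scalar:

```python
import numpy as np

# Hypothetical data: N = 5 samples, d = 3 features
rng = np.random.default_rng(0)
N, d = 5, 3
X = rng.normal(size=(N, d))   # rows are x_n^T
y = rng.normal(size=N)        # the y_n are scalars
theta = rng.normal(size=d)

# Left side: sum of squared residuals
lhs = np.sum((y - X @ theta) ** 2)

# Right side: expanded form  sum_n y_n^2 - 2 y_n x_n^T theta + theta^T x_n x_n^T theta
rhs = np.sum(y**2 - 2 * y * (X @ theta) + (X @ theta) ** 2)

print(np.isclose(lhs, rhs))  # True
```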