machine learning octave code gradient descent question


I'm taking the Coursera Machine Learning course, so anyone who has taken it may be able to help with this problem.

This is the Octave code for the gradient descent update step.

     theta = theta - alpha / m * ((X * theta - y)' * X)';  % this is the answer key provided

First question: as I understand gradient descent, theta(0) and theta(1) should each be updated separately, as follows:

     theta(0) = theta(0) - alpha / m * ((X * theta(0) - y)')';  % my answer
     theta(1) = theta(1) - alpha / m * ((X * theta(1) - y)')';  % my answer

But I'm not sure why the answer key uses only this single equation:

     theta = theta - alpha / m * ((X * theta - y)' * X)';

Second question: what does the ' (single quote) do in the Octave code?

     % What do the two ' marks do here, one after "- y)" and one after "* X)"?
     theta = theta - alpha / m * ((X * theta - y)' * X)';

There is 1 best solution below


The ' operator is the transpose. It is used here to make the matrix dimensions line up for multiplication. Example: size of X = 97x2, y = 97x1, theta = 2x1.

The first calculation is X * theta, which gives a 97x1 matrix. Subtracting y, which is also 97x1, leaves a 97x1 matrix. Next, this result has to be multiplied with X, but the sizes do not match: (97x1) * (97x2).

Transposing the first matrix makes the multiplication possible: (1x97) * (97x2). The result is a new matrix of size 1x2 (a row vector). But theta is of size 2x1 (a column vector), hence the final transpose.
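
The size bookkeeping above can be checked directly in Octave. The sketch below is only an illustration: the data is random, alpha = 0.01 is an assumed learning rate, and the names err, g_row, g_col, theta_vec and theta_loop are made up for this example. It also compares the one-line update with an update written one component at a time, which relates to the first question. Octave indexes from 1, so theta(1) and theta(2) play the role of the course's theta 0 and theta 1.

```octave
% Hypothetical data with the sizes from the example above (values are random).
m = 97;
X = [ones(m, 1), rand(m, 1)];   % 97x2: a column of ones plus one feature
y = rand(m, 1);                 % 97x1
theta = zeros(2, 1);            % 2x1
alpha = 0.01;                   % assumed learning rate

err   = X * theta - y;          % (97x2)*(2x1) - (97x1) gives 97x1
g_row = err' * X;               % (1x97)*(97x2) gives a 1x2 row vector
g_col = g_row';                 % final transpose gives 2x1, same shape as theta

disp(size(err));
disp(size(g_row));
disp(size(g_col));

% One-line (vectorized) update, as in the answer key.
theta_vec = theta - alpha / m * (err' * X)';

% The same update written one component at a time. Every component uses the
% same err, computed from the old theta, so the update is simultaneous.
theta_loop = theta;
for j = 1:2
  theta_loop(j) = theta(j) - alpha / m * sum(err .* X(:, j));
end

% Largest difference between the two results; expected to be zero up to rounding.
disp(max(abs(theta_vec - theta_loop)));
```

The loop shows why a single matrix expression is enough: column j of X picks out the terms for component j, and err' * X computes those sums for every column at once.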