What is the partial derivative of the loss function (root mean squared) w.r.t. theta when theta is the exponent of x?


I'm trying to solve a regression problem using Python 3 without any machine learning libraries.
The input data is a CSV file of x,y float pairs which should fit the hypothesis: y = x^theta

I need to use regression to find the value of theta.

This differs from the more common regression problem in that theta is an exponent of x rather than a coefficient.

I believe the appropriate loss function is root mean squared. Non-vectorized in Python:

total = 0  # renamed from `sum` to avoid shadowing the builtin
for i in range(N):
    total += (Y[i] - X[i]**theta)**2  # square each residual inside the sum
cost = total / (2 * N)
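For reference, the same cost can be computed vectorized with NumPy. This is a minimal sketch under the assumption that X and Y are float arrays of equal length; the function name `cost` is mine:

```python
import numpy as np

def cost(theta, X, Y):
    # Mean of squared residuals for the hypothesis y = x**theta,
    # with the conventional 1/(2N) factor.
    residuals = Y - X**theta
    return np.sum(residuals**2) / (2 * len(X))
```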

I do not know multivariate calculus so I am uncertain how to compute the partial derivative for the update rule for this perhaps uncommon hypothesis.

I know that the derivative of a^x w.r.t. x is ln(a)a^x.
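As a quick sanity check, a central finite difference agrees with ln(a)·a^x (the values of a and x below are arbitrary illustrative choices):

```python
import math

# Central finite difference vs. the analytic derivative of a**x
# at an arbitrary point.
a, x, h = 2.0, 1.5, 1e-6
numeric = (a**(x + h) - a**(x - h)) / (2 * h)
analytic = math.log(a) * a**x
```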

I also know that the partial derivatives of the loss function for univariate linear regression are:

derivative_of_the_loss_w.r.t._theta0:  1/m * sum_i (h(x_i) - y_i)
derivative_of_the_loss_w.r.t._theta1:  1/m * sum_i (h(x_i) - y_i) * x_i

where m is the size of the dataset and x_i is the ith input in the dataset.

So the update rules look like:

theta0 = theta0 - learning_rate * derivative_of_the_loss_w.r.t._theta0

and

theta1 = theta1 - learning_rate * derivative_of_the_loss_w.r.t._theta1
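For concreteness, those two updates can be run in a loop. This is a minimal sketch for ordinary linear regression with h(x) = theta0 + theta1*x; the function name and defaults are mine:

```python
import numpy as np

def linear_fit(X, Y, learning_rate=0.1, steps=2000):
    # Gradient descent with the two update rules above,
    # for the hypothesis h(x) = theta0 + theta1 * x.
    theta0, theta1 = 0.0, 0.0
    m = len(X)
    for _ in range(steps):
        err = (theta0 + theta1 * X) - Y        # h(x_i) - y_i
        theta0 -= learning_rate * np.sum(err) / m
        theta1 -= learning_rate * np.sum(err * X) / m
    return theta0, theta1
```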

I've taken a few guesses, but none lead the learner to converge:

gradient = 0
for i in range(N):
    # gradient += X[i]**theta * np.log(theta)
    # gradient += X[i]**theta * np.log(X[i])
    gradient += (Y[i] - X[i]**theta) * np.log(X[i])
# print('gradient=|' + str(gradient) + '|')
theta_new = theta - learning_rate * gradient

There is 1 best solution below.

BEST ANSWER

Correct me if I'm wrong. What I understood is that you are interested in
$$
\begin{split}
\frac{d}{d\theta}\sqrt{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2}
&= \frac{\frac{d}{d\theta}\left(\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2\right)}{2\sqrt{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2}}
= \frac{\frac1n\sum_{i=1}^n\frac{d(x_i^\theta-y_i)^2}{d\theta}}{2\sqrt{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2}} \\
&= \frac{\frac1n\sum_{i=1}^n2(x_i^\theta-y_i)\frac{d(x_i^\theta-y_i)}{d\theta}}{2\sqrt{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2}}
= \frac{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)x_i^\theta\log(x_i)}{\sqrt{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2}}.
\end{split}
$$
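Translating that derivative back into the asker's setting, a minimal gradient-descent sketch might look like this. The function names, the zero-RMSE guard, the step size, and the synthetic test data are my assumptions, not part of the derivation:

```python
import numpy as np

def rmse_gradient(theta, X, Y, eps=1e-12):
    # d/dtheta of sqrt( (1/n) * sum_i (x_i**theta - y_i)**2 ),
    # exactly the formula derived above.  Guard against division
    # by zero when the fit is already perfect.
    r = X**theta - Y
    rmse = np.sqrt(np.mean(r**2))
    if rmse < eps:
        return 0.0
    return np.mean(r * X**theta * np.log(X)) / rmse

def fit_theta(X, Y, theta=0.5, learning_rate=0.05, steps=3000):
    for _ in range(steps):
        theta -= learning_rate * rmse_gradient(theta, X, Y)
    return theta
```

Note that the gradient of the root loss does not shrink to zero near a perfect fit, so a fixed step size oscillates around the optimum; shrinking the learning rate (or minimizing the squared loss instead, which has the same minimizer) tightens the final estimate.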