I'm trying to solve a regression problem using Python 3 without any machine learning libraries.
The input data consists of a csv file of x,y floats which should fit the hypothesis:
y = x^theta
I need to use regression to find the value of theta.
This is different than the more common regression problem because theta is an exponent of x rather than a coefficient.
I believe the appropriate loss function is the mean squared error (or its square root). Non-vectorized in Python (the square goes inside the sum, and I avoid naming the accumulator `sum`, which would shadow the builtin):

total = 0
for i in range(N):
    total += (Y[i] - X[i]**theta)**2
cost = 1/(2*N) * total
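For comparison, the halved mean-squared-error cost can be vectorized with NumPy; a minimal sketch (the function name is mine, and X and Y are assumed to be equal-length 1-D arrays):

```python
import numpy as np

def mse_cost(X, Y, theta):
    """Halved mean squared error for the hypothesis y = x**theta."""
    residuals = Y - X**theta
    return np.sum(residuals**2) / (2 * len(X))

X = np.array([1.0, 2.0, 3.0])
Y = X**1.5  # synthetic data with true theta = 1.5
print(mse_cost(X, Y, 1.5))  # → 0.0
```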
I do not know multivariate calculus so I am uncertain how to compute the partial derivative for the update rule for this perhaps uncommon hypothesis.
I know that the derivative of a^x w.r.t. x is ln(a)·a^x.
I also know that the partial derivatives of the mean squared loss function for univariate linear regression are:
derivative_of_the_loss_w.r.t._theta0: 1/m * sum over i of (h(x_i) - y_i)
derivative_of_the_loss_w.r.t._theta1: 1/m * sum over i of (h(x_i) - y_i) * x_i
where m is the size of the dataset and x_i is the feature of the ith example.
so the update rule(s) look like:
theta0 = theta0 - learning_rate * derivative_of_the_loss_w.r.t._theta0
and
theta1 = theta1 - learning_rate * derivative_of_the_loss_w.r.t._theta1
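In code, one simultaneous update for the ordinary linear case might look like this sketch (function name and data are illustrative):

```python
import numpy as np

def linear_gd_step(X, Y, theta0, theta1, lr):
    """One simultaneous gradient-descent update for h(x) = theta0 + theta1*x."""
    m = len(X)
    errors = (theta0 + theta1 * X) - Y      # h(x_i) - y_i
    grad0 = np.sum(errors) / m              # d(loss)/d(theta0)
    grad1 = np.sum(errors * X) / m          # d(loss)/d(theta1)
    return theta0 - lr * grad0, theta1 - lr * grad1

# quick check on y = 2x + 1
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = 2 * X + 1
t0, t1 = 0.0, 0.0
for _ in range(5000):
    t0, t1 = linear_gd_step(X, Y, t0, t1, 0.05)
print(round(t0, 3), round(t1, 3))  # should be close to 1 and 2
```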
I've taken a few guesses, but none lead the learner to converge:
gradient = 0
for i in range(N):
    # gradient += X[i]**theta * np.log(theta)
    # gradient += X[i]**theta * np.log(X[i])
    gradient += (Y[i] - X[i]**theta) * np.log(X[i])
    # print('gradient=|' + str(gradient) + '|')
theta_new = theta - learning_rate * gradient
Correct me if I'm wrong. What I understood is that you are interested in
$$
\begin{split}
\frac{d}{d\theta}\sqrt{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2}
&= \frac{\frac{d}{d\theta}\left(\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2\right)}{2\sqrt{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2}}
= \frac{\frac1n\sum_{i=1}^n\frac{d(x_i^\theta-y_i)^2}{d\theta}}{2\sqrt{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2}} \\
&= \frac{\frac1n\sum_{i=1}^n2(x_i^\theta-y_i)\frac{d(x_i^\theta-y_i)}{d\theta}}{2\sqrt{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2}}
= \frac{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)x_i^\theta\log(x_i)}{\sqrt{\frac1n\sum_{i=1}^n(x_i^\theta-y_i)^2}} .
\end{split}
$$
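Putting that derivative to work, here is a minimal gradient-descent sketch. It drops the square root and descends the plain MSE instead — its gradient, (1/n)·Σ(x_i^θ − y_i)·x_i^θ·ln(x_i), omits the denominator above but has the same minimizer, and it avoids dividing by zero as the residuals vanish. The data, learning rate, and function name are illustrative, and all x_i are assumed positive so the log is defined:

```python
import numpy as np

def fit_theta(X, Y, lr=0.01, iters=5000, theta=1.0):
    """Gradient descent on the MSE for the hypothesis y = x**theta.
    Uses d/dtheta (x**theta) = x**theta * ln(x), so the gradient is
    (1/n) * sum((x_i**theta - y_i) * x_i**theta * ln(x_i))."""
    n = len(X)
    for _ in range(iters):
        pred = X**theta
        grad = np.sum((pred - Y) * pred * np.log(X)) / n
        theta -= lr * grad
    return theta

X = np.array([1.5, 2.0, 2.5, 3.0, 3.5])
Y = X**1.5  # synthetic data, true theta = 1.5
print(fit_theta(X, Y))  # converges near 1.5
```

Note that the gradient attempt in the question was missing the x_i^θ factor from the chain rule, which is why it would not converge.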