Least relative error linear regression solution


Consider the optimization problem of minimizing the function $f(a, b)$ defined as:

$$f(a, b) = \sum_{i=1}^n \left(\frac{x_i - \frac{y_i - b}{a}}{x_i}\right)^2,$$

where $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ is a set of points in $\mathbb{R}^2$ with positive coordinates. The goal is to find real numbers $a$ and $b$ that minimize this function.
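For concreteness, the objective translates directly into code. A minimal sketch, assuming plain Python lists of points (the names `f`, `xs`, `ys` and the sample data are illustrative, not from the post):

```python
# Sum of squared relative errors in x for the line y = a*x + b
# (illustrative names; data are made up to lie roughly on y = 2x + 1).
def f(a, b, xs, ys):
    return sum(((x - (y - b) / a) / x) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
```

For this data, `f(2.0, 1.0, xs, ys)` evaluates to roughly 0.005, much smaller than at parameters far from the data, as expected for points near $y = 2x + 1$.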

I followed the standard approach of taking partial derivatives with respect to both parameters and equating each one to 0. This led to the following system of equations:

\begin{align*} \sum_{i=1}^n \left(1 - \frac{y_i - b}{a x_i}\right) \frac{y_i - b}{a^2 x_i} &= 0, \\ \sum_{i=1}^n \left(1 - \frac{y_i - b}{a x_i}\right) \frac{1}{a x_i} &= 0. \end{align*}
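As a sanity check on this derivation, the two left-hand sides should equal $\frac12\,\partial f/\partial a$ and $\frac12\,\partial f/\partial b$. A hedged numerical check against central finite differences (all names and data are illustrative):

```python
def f(a, b, xs, ys):
    # Objective from the problem statement.
    return sum(((x - (y - b) / a) / x) ** 2 for x, y in zip(xs, ys))

def stationarity(a, b, xs, ys):
    # Left-hand sides of the two stationarity equations.
    eq_a = sum((1 - (y - b) / (a * x)) * (y - b) / (a * a * x)
               for x, y in zip(xs, ys))
    eq_b = sum((1 - (y - b) / (a * x)) / (a * x) for x, y in zip(xs, ys))
    return eq_a, eq_b

xs = [1.0, 2.0, 3.0, 4.0]   # made-up data roughly on y = 2x + 1
ys = [3.1, 4.9, 7.2, 8.8]
a0, b0, h = 1.5, 0.5, 1e-6

# Central finite differences of f should match 2*eq_a and 2*eq_b.
dfda = (f(a0 + h, b0, xs, ys) - f(a0 - h, b0, xs, ys)) / (2 * h)
dfdb = (f(a0, b0 + h, xs, ys) - f(a0, b0 - h, xs, ys)) / (2 * h)
eq_a, eq_b = stationarity(a0, b0, xs, ys)
```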

Solving the first equation yields the expression for $a$:

$$a = \frac{\sum_{i=1}^n \left(\frac{y_i - b}{x_i}\right)^2}{\sum_{i=1}^n \frac{y_i - b}{x_i}}.$$

Similarly, solving the second equation yields an expression for $b$:

$$ b = \frac{\sum_{i=1}^n \frac{y_i - a x_i}{x_i^2}}{\sum_{i=1}^n \frac{1}{x_i^2}}. $$

I'm unsure how to proceed from here, since each of these expressions depends on the other unknown. Any help would be appreciated.
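One workable route is to alternate the two updates: plug the current $b$ into the expression for $a$, then the new $a$ into the expression for $b$, and repeat. Each update exactly minimizes $f$ in one variable with the other held fixed (note the $b$-update's denominator, $\sum_i 1/x_i^2$, which comes from solving the second stationarity equation for $b$). A sketch with illustrative names and made-up data:

```python
xs = [1.0, 2.0, 3.0, 4.0]   # made-up data roughly on y = 2x + 1
ys = [3.1, 4.9, 7.2, 8.8]

a, b = 1.0, 0.0  # starting guess
for _ in range(500):
    # a-update: a = sum(((y - b)/x)^2) / sum((y - b)/x)
    a = (sum(((y - b) / x) ** 2 for x, y in zip(xs, ys))
         / sum((y - b) / x for x, y in zip(xs, ys)))
    # b-update: b = sum((y - a*x)/x^2) / sum(1/x^2)
    b = (sum((y - a * x) / x ** 2 for x, y in zip(xs, ys))
         / sum(1 / x ** 2 for x in xs))
```

For this data the iteration settles near $a \approx 1.95$, $b \approx 1.14$, consistent with the closed-form solution given in the answers below.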

Accepted answer:

The solution is not simple, so I will only sketch the ideas. We rewrite $f(a,b)$ as $$ f(a, b)= \sum_{i=1}^n \Big(\frac{x_i - \frac{y_i - b}{a}}{x_i}\Big)^2 = \sum_{i=1}^n \Big(1-\frac{y_i-b}{ax_i}\Big)^2=\sum_{i=1}^n \Big(1-\frac{1}{a}\frac{y_i}{x_i}+\frac{b}{a}\frac1{x_i}\Big)^2. $$

Now we let $\tilde a=\frac1a$, $\tilde b=\frac ba$, $\tilde y_i=\frac{y_i}{x_i}$ and $\tilde x_i=\frac1{x_i}$, so that we have $$ f(a,b)=\sum_{i=1}^n(1-\tilde a\tilde y_i+\tilde b\tilde x_i)^2. $$ Take partial derivatives and set them to zero. You will obtain a linear system in $\tilde a,\tilde b$: $$ \begin{aligned} \sum_{i=1}^n(1-\tilde a\tilde y_i+\tilde b\tilde x_i)\tilde y_i&=0,\\ \sum_{i=1}^n(1-\tilde a\tilde y_i+\tilde b\tilde x_i)\tilde x_i&=0. \end{aligned}\implies \begin{aligned} \Big(\sum_{i=1}^n\tilde y_i^2\Big)\tilde a-\Big(\sum_{i=1}^n\tilde x_i\tilde y_i\Big)\tilde b&=\sum_{i=1}^n\tilde y_i, \\ \Big(\sum_{i=1}^n\tilde x_i\tilde y_i\Big)\tilde a-\Big(\sum_{i=1}^n\tilde x_i^2\Big)\tilde b&=\sum_{i=1}^n\tilde x_i. \end{aligned} $$ Solve for $\tilde a,\tilde b$; then the original parameters are recovered as $a=1/\tilde a$ and $b=\tilde b/\tilde a$.
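Under these substitutions the normal equations form an ordinary $2\times 2$ linear system that can be solved directly, e.g. by Cramer's rule. A sketch (the variable names and sample data are illustrative):

```python
# Made-up points lying roughly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

# Substituted variables: ty_i = y_i/x_i, tx_i = 1/x_i.
ty = [y / x for x, y in zip(xs, ys)]
tx = [1 / x for x in xs]

# Coefficients of the 2x2 system in (ta, tb) = (1/a, b/a).
Syy = sum(v * v for v in ty)
Sxy = sum(u * v for u, v in zip(tx, ty))
Sxx = sum(u * u for u in tx)
Sy = sum(ty)
Sx = sum(tx)

# Solve  Syy*ta - Sxy*tb = Sy,  Sxy*ta - Sxx*tb = Sx  by Cramer's rule.
det = Sxy * Sxy - Syy * Sxx
ta = (Sxy * Sx - Sy * Sxx) / det
tb = (Syy * Sx - Sxy * Sy) / det

# Undo the substitution: a = 1/ta, b = tb/ta.
a = 1 / ta
b = tb / ta
```

For this data the recovered parameters come out near $a \approx 1.95$, $b \approx 1.14$.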

Another answer:

Just for completeness, here are the closed-form solutions for $a$ and $b$ based on the answer by @ImbalanceDream:

\begin{align*} a &= \frac{\left(\sum_{i=1}^n \frac{y_i}{x_i^2}\right)^2 - \sum_{i=1}^n \frac{y_i^2}{x_i^2}\sum_{i=1}^n \frac{1}{x_i^2}}{\sum_{i=1}^n \frac{y_i}{x_i^2}\sum_{i=1}^n \frac{1}{x_i} - \sum_{i=1}^n \frac{y_i}{x_i}\sum_{i=1}^n \frac{1}{x_i^2}},\\ b &= \frac{\sum_{i=1}^n \frac{y_i^2}{x_i^2}\sum_{i=1}^n \frac{1}{x_i} - \sum_{i=1}^n \frac{y_i}{x_i^2} \sum_{i=1}^n \frac{y_i}{x_i}}{\sum_{i=1}^n \frac{y_i}{x_i^2}\sum_{i=1}^n \frac{1}{x_i} - \sum_{i=1}^n \frac{y_i}{x_i}\sum_{i=1}^n \frac{1}{x_i^2}}. \end{align*}

To make this easier to compute, we can rewrite them as $a = p/q$ and $b = r/q$ where, with $z_i=y_i/x_i$ and $t_i=1/x_i$, \begin{align*} p &= \left(\sum_{i=1}^n z_i t_i\right)^2 - \sum_{i=1}^n z_i^2\sum_{i=1}^n t_i^2,\\ q &= \sum_{i=1}^n z_i t_i\sum_{i=1}^n t_i - \sum_{i=1}^n z_i\sum_{i=1}^n t_i^2,\\ r &= \sum_{i=1}^n z_i^2\sum_{i=1}^n t_i - \sum_{i=1}^n z_i t_i\sum_{i=1}^n z_i. \end{align*}
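These formulas translate directly into code. A sketch with illustrative names and made-up sample data (roughly on $y = 2x + 1$):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

# z_i = y_i/x_i and t_i = 1/x_i, as in the answer above.
z = [y / x for x, y in zip(xs, ys)]
t = [1 / x for x in xs]

Szt = sum(zi * ti for zi, ti in zip(z, t))
Szz = sum(zi * zi for zi in z)
Stt = sum(ti * ti for ti in t)
Sz = sum(z)
St = sum(t)

# a = p/q and b = r/q.
p = Szt ** 2 - Szz * Stt
q = Szt * St - Sz * Stt
r = Szz * St - Szt * Sz

a = p / q
b = r / q
```

Each of the five sums is a single pass over the data, so the whole solution runs in $O(n)$ time and constant extra space.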