Determining a value $c$ such that $f(c) = \sum_{i=1}^n \left|\frac{y_i-c}{y_i}\right|v_i$ is minimized

I'm trying to construct a prediction model for a variable of interest, based on a set of input values. I have a set of validation data and the corresponding predictions from my model, and now I need to assess whether my model is any good. I'm using the following error measure for my model:

$$\displaystyle\sum_{i=1}^n \left|\frac{y_i-\hat{y}_i}{y_i}\right|\times v_i,$$

where $n$ is the number of data points, $y_i$ is the real value for data point $i$, $\hat{y}_i$ is the corresponding prediction and $v_i$ is a specific constant associated with observation $i$. The constants $v_i\;(i=1, \ldots,n)$ satisfy $\sum_{i=1}^n v_i=1$ and $v_i\geq0\; \forall i$. The constants $v_i$ are included in the error function for a specific reason, but that reason is not relevant for this problem (unless someone really wants to know).
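For concreteness, the error measure above could be computed along the lines of the following minimal sketch. The function name `weighted_relative_error` and the sample numbers are illustrative assumptions, not part of my actual model.

```python
def weighted_relative_error(y, y_hat, v):
    """Return sum_i |(y_i - y_hat_i) / y_i| * v_i.

    Assumes every y_i is nonzero and that y, y_hat and v have equal length.
    """
    if not (len(y) == len(y_hat) == len(v)):
        raise ValueError("y, y_hat and v must have the same length")
    return sum(abs((yi - pi) / yi) * vi for yi, pi, vi in zip(y, y_hat, v))


# Hypothetical validation data; the weights v sum to 1.
y = [2.0, 4.0, 5.0]
y_hat = [1.0, 4.0, 6.0]
v = [0.5, 0.25, 0.25]
error = weighted_relative_error(y, y_hat, v)
```

Replacing every prediction $\hat{y}_i$ by one constant $c$ in this function gives the baseline function $f(c)$.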

In order to assess the goodness of my model, I need to compare the value of the above function with a baseline value. My problem is the following: find a value $c$ such that the function

$$f(c) = \displaystyle\sum_{i=1}^n \left|\frac{y_i-c}{y_i}\right|\times v_i,$$

is minimized. Any suggestions on finding the minimizing value $c$? My question is closely related to my earlier question, but the error function is now slightly different, with the weights $v_i$ added.

Thank you for any help!

P.S.

Please comment if my question is unclear =)

On BEST ANSWER

Assume that the $y_i$ are ordered as $0<y_1<\ldots<y_n$ and that $v_i>0$ for all $i$ (terms with $v_i=0$ contribute nothing to $f$ and can be dropped). Absorbing the ${1\over y_i}$ into the $v_i$, we have to minimize the piecewise linear function $$f(c):=\sum_{i=1}^n p_i\> |y_i-c|\ ,$$ where $$p_i:={v_i\over y_i}>0,\qquad\sum_{i=1}^n p_i=:P\ .$$

When $c$ is different from all $y_i$ then $$f(c)=\sum_{y_i<c} p_i(c-y_i)+\sum_{y_i>c} p_i(y_i-c)\ .$$ It follows that $$f'(c)=\sum_{y_i<c} p_i-\sum_{y_i>c} p_i$$ for these $c$.

When $c<y_1$ one has $f'(c)=-P<0$. Then at each $y_i$ $\>(1\leq i\leq n)$ the slope of $f$ jumps by $2p_i$, since $p_i$ is counted negative when $c<y_i$ and positive when $c>y_i$. Finally, for all $c>y_n$ one has $f'(c)=P>0$. Since the slope of $f$ is nondecreasing, $f$ is convex, and its minimum is attained at the $y_i$ where $f'(y_i-0)<0$ and $f'(y_i+0)\geq0$.

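Define $i_*\in\{1,\ldots,n\}$ by $$\sum_{i<i_*} p_i<{P\over2},\qquad \sum_{i\leq i_*} p_i\geq{P\over2}\ .$$ The optimal value for $c$ is then $c:=y_{i_*}$, that is, a weighted median of the $y_i$ with weights $p_i$. If the second inequality holds with equality and $i_*<n$, then every $c\in[y_{i_*},y_{i_*+1}]$ is optimal as well.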
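The rule for $i_*$ can be sketched in a few lines of Python. This is only an illustration under the assumptions of this answer (all $y_i>0$); the names `minimizing_c` and `f` and the sample data are hypothetical. Sorting handles unordered input, and repeated $y_i$ values simply accumulate their weights.

```python
def f(c, y, v):
    """Objective: sum_i |(y_i - c) / y_i| * v_i."""
    return sum(abs((yi - c) / yi) * vi for yi, vi in zip(y, v))


def minimizing_c(y, v):
    """Return a minimizer of f: a weighted median of y with weights v_i / y_i.

    Assumes all y_i > 0, all v_i >= 0 and at least one v_i > 0.
    """
    if len(y) != len(v) or not y:
        raise ValueError("y and v must be non-empty and of equal length")
    if any(yi <= 0 for yi in y):
        raise ValueError("this sketch assumes all y_i > 0")
    # Pair each y_i with p_i = v_i / y_i and sort by y_i.
    pairs = sorted((yi, vi / yi) for yi, vi in zip(y, v))
    total = sum(p for _, p in pairs)  # this is P
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    running = 0.0
    for yi, p in pairs:
        running += p
        # First index where the cumulative weight reaches P / 2.
        if running >= total / 2:
            return yi
    return pairs[-1][0]  # only reachable through floating-point rounding


# Hypothetical data: p = (0.2, 0.15, 0.125), P = 0.475, so i_* = 2.
y = [1.0, 2.0, 4.0]
v = [0.2, 0.3, 0.5]
c_opt = minimizing_c(y, v)  # 2.0
```

Here `f(c_opt, y, v)` is about 0.45, compared with about 0.525 at $c=1$ and 0.9 at $c=4$, so the baseline value to compare the model against is 0.45.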