minimizing the sum of weighted absolute distance

1.9k Views Asked by At

Let $x_1, \ldots, x_n \in \mathbb{R}^d$ denote $n$ points in $d$-dimensional Euclidean space, and $w_1, \ldots, w_n \in \mathbb{R}_{\geq 0}$ any non-negative weights.

$\arg\min_{\mu \in \mathbb{R}^d} \sum_{i=1}^n w_i | x_i-\mu| = median\{x_1, \ldots, x_n \}?$

I understand why the median of $\{x_1, \ldots, x_n \}$ minimizes the function without the weights. But I am not sure if the median is what minimizes the function with the weights, should I use Lagrange multiplier to solve this?

2

There are 2 best solutions below

1
On

What minimizes your weighted is again the median !!! But not the median of $x_1,x_2,... $.

A value that minimizes your sum will be a number $\mu $ such that

$$ \sum_{i\in \{1,2,...,n\}, x_i>\mu} w_i < \frac {\sum_{i=1}^n w_i}{2} $$

$$ \sum_{ i\in \{1,2,...,n\},\,\, x_i<\mu} w_i < \frac {\sum_{i=1}^n w_i}{2} $$

Note that this $\mu $ will also a median. It is a median of the random variable defined on the sample space ${1,2,...,n} $ that sends $i $ to $x_i $. In this probability space, the probability of occurrence of $i $ is $\frac {w_i} {w_1+...+w_n} $


Opps! ! I just noted that your $x_i $ can be vectors, therefore my answer only works when $d=1$

0
On

The problem you have stated is the weighted L1 median (sometimes called the weighted geometric median) of your data. There are many references on the topic: https://www.google.com/search?q=l1+median

A concise reference along with an algorithm is: http://www.stat.rutgers.edu/home/cunhui/papers/39.pdf

However, note that the Vardi-Zhang process needs to be adjusted for data having repeated values. If you scan the literature, there are some references that describe this adjustment.