How to introduce different costs by class in a binary logistic regression?


What is the form of the negative log-likelihood objective function in logistic regression if we introduce different costs per class (e.g. the cost of erring on class 1 is errc1, and on class 2 is errc2)?

Accepted answer:

The logistic regression hypothesis (sigmoid) is defined as: $$h_w(x) = \sigma(w^T x)$$ where $\sigma(z) = \frac{1}{1 + \exp(-z)}$. Logistic regression is fit by maximum likelihood, i.e. we want to maximize the probability of the observed data: $$\arg\max_{\mathbf{w}} \; p(x^{(1)},y^{(1)})\cdot p(x^{(2)},y^{(2)})\cdot\ldots\cdot p(x^{(N)},y^{(N)})$$ which is equivalent to (dropping the marginals $p(x^{(i)})$, which do not depend on $\mathbf{w}$): $$\arg\max_{\mathbf{w}} \; p(y^{(1)}|x^{(1)})\cdot p(y^{(2)}|x^{(2)})\cdot\ldots\cdot p(y^{(N)}|x^{(N)})$$

By assumption, these conditional probabilities are given by the sigmoid, $p(y_i=1|x_i) = h_w(x_i)$, so our objective (through MLE, and assuming independent samples) is to maximize the likelihood: $$ L(\mathbf{w}) = \prod_{i=1}^{N} \;(p(y_i=1|x_i))^{y_i}\cdot (p(y_i=0|x_i))^{1-y_i}$$ Taking the negative logarithm gives the usual negative log-likelihood: $$ J(\mathbf{w}) = -\sum_{i=1}^{N} \left[ y_i \log h_w(x_i) + (1-y_i)\log\bigl(1 - h_w(x_i)\bigr) \right]$$
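As a quick sanity check on the unweighted case, here is a minimal numpy sketch of the negative log-likelihood above (the function names `sigmoid` and `neg_log_likelihood` are my own, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(w, X, y):
    # NLL = -sum[ y*log(p) + (1-y)*log(1-p) ], with p = sigmoid(X w)
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# with w = 0 every prediction is p = 0.5, so the NLL is N*log(2)
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
print(np.isclose(neg_log_likelihood(np.zeros(2), X, y), 3 * np.log(2)))
```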

If we introduce different costs, this is equivalent to weighting the corresponding class probabilities, e.g. if the cost of erring on class 1 is twice the cost of erring on class 0, this is like having every class-1 case appear twice in the training data. Thus, it is equivalent to squaring that case's probability factor, i.e. doubling its contribution to the log-likelihood:

$$ L(\mathbf{w}) = \prod_{i=1}^{N} \;(p(y_i=1|x_i))^{c_1 y_i}\cdot (p(y_i=0|x_i))^{c_0 (1-y_i)}$$ where $c_1$ and $c_0$ are the misclassification costs for class 1 and class 0 (only their ratio matters). The corresponding cost-weighted negative log-likelihood is: $$ J(\mathbf{w}) = -\sum_{i=1}^{N} \left[ c_1\, y_i \log h_w(x_i) + c_0\, (1-y_i)\log\bigl(1 - h_w(x_i)\bigr) \right]$$
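The "duplicating a case" intuition can be verified numerically: a cost-weighted NLL with class-1 cost 2 should equal the ordinary NLL on a dataset where the class-1 example is duplicated. A minimal sketch (the helper names are hypothetical, and the costs are passed explicitly rather than as a cost ratio):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_nll(w, X, y, c1, c0):
    # c1 weights the class-1 terms, c0 the class-0 terms;
    # with c1 = c0 = 1 this reduces to the ordinary NLL
    p = sigmoid(X @ w)
    return -np.sum(c1 * y * np.log(p) + c0 * (1 - y) * np.log(1 - p))

X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
w = np.array([0.1, -0.2])

# weighting class 1 by 2 == duplicating the class-1 row in the data
lhs = weighted_nll(w, X, y, 2.0, 1.0)
rhs = weighted_nll(w, np.vstack([X, X[:1]]), np.append(y, 1.0), 1.0, 1.0)
print(np.isclose(lhs, rhs))
```

This also makes clear why only the cost ratio matters: multiplying both $c_1$ and $c_0$ by a constant just rescales $J(\mathbf{w})$ without moving its minimizer.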