Logistic regression - Prove That the Cost Function Is Convex

52.4k Views Asked by At

I'm reading about Hole House (HoleHouse) - Stanford Machine Learning Notes - Logistic Regression.

You can do a find on "convex" to see the part that relates to my question.

Background:
$h_\theta(X) = sigmoid(\theta^T X)$ --- hypothesis/prediction function
$y \in \{0,1\}$

Normally, we would have the cost function for one sample $(X,y)$ as:

$y(1 - h_\theta(X))^2 + (1-y)(h_\theta(X))^2$

It's just the squared distance from 1 or 0 depending on y.

However, the lecture notes mention that this is a non-convex function so it's bad for gradient descent (our optimisation algorithm).

So, we come up with one that is supposedly convex:

$y * -log(h_\theta(X)) + (1 - y) * -log(1 - h_\theta(X))$

You can see why this makes sense if we plot -log(x) from 0 to 1:

we get a corresponding range from <span class=$\infty$ to 0">

i.e. if y = 1 then the cost goes from $\infty$ to 0 as the hypothesis/prediction moves from 0 to 1.

My question is:
How do we know that this new cost function is convex?

Here is an example of a hypothesis function that will lead to a non-convex cost function:

$h_\theta(X) = sigmoid(1 + x^2 + x^3)$

leading to cost function (for y = 1):

$-log(sigmoid(1 + x^2 + x^3))$

which is a non-convex function as we can see when we graph it:

non-convex

2

There are 2 best solutions below

1
On BEST ANSWER

You are looking at the wrong variable. The need is for $J(\theta)$ to be convex (as a function of $\theta$), so you need $Cost(h_{\theta}(x), y)$ to be a convex function of $\theta$, not $x$. Note that the function inside the sigmoid is linear in $\theta$.

1
On

Consider a twice differentiable function of one variable $f(z)$. If the second derivative of $f(z)$ is (always) non-negative, then $f(z)$ is convex. So consider the function $$ j(z) = -y\log(\sigma(z)) - (1-y)\log(1-\sigma(z)) $$ where $\sigma(x) =sigmoid(x)$ and $0\leq y \leq 1$ is a constant. You can show that $j(z)$ is convex by taking the second derivative. Compare with the case that you take $$ k(z) = y\sigma(z)^2 + (1-y)(1-\sigma(z))^2 $$ is $k(z)$ convex? What if you take $\tilde{\sigma}(z) = sigmoid(1+z^2+z^3)$ instead of $\sigma$(z)?

Now, the composition of a convex function with a linear function is convex (can you show this?). Note that $Z(\theta) := \theta^T \cdot X $ is a linear function in $\theta$ (where $X$ is a constant matrix). Therefore, $J(\theta) := j(Z(\theta))$ is convex as a function in $\theta$.