Cost Function of Neural Network (Forward Propagation)


This question is related to Andrew Ng's machine learning course on Coursera. When I calculate the cost function of a neural network, I use the following formula described by Ng, where $m$ is the number of training examples and $K$ is the number of output units: $$ J(\theta) = \frac 1m \sum_{i=1}^m\sum_{k=1}^K\left[-y_k^{(i)}\log((h_\theta(x^{(i)}))_k) - (1-y_k^{(i)})\log(1-(h_\theta(x^{(i)}))_k)\right] $$
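In code I compute this roughly as follows (a minimal NumPy sketch with my own variable names, not from the exercise; `Y` is the $m \times K$ label matrix and `H` is the matrix of outputs $h_\theta(x)$):

```python
import numpy as np

def nn_cost(Y, H):
    """Unregularized cost J(theta).

    Y -- m x K binary (one-hot) label matrix, one row per training example
    H -- m x K matrix of network outputs (h_theta(x))_k, one row per example
    """
    m = Y.shape[0]
    return (1 / m) * np.sum(-Y * np.log(H) - (1 - Y) * np.log(1 - H))

# Made-up example: m = 2 examples, K = 3 output units.
Y = np.array([[0, 1, 0],
              [1, 0, 0]])
H = np.array([[0.2, 0.7, 0.1],
              [0.6, 0.3, 0.1]])
print(nn_cost(Y, H))
```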

Let's call: \begin{align} j_1 &= -y_k^{(i)}\log((h_\theta(x^{(i)}))_k)\\j_2&= - (1-y_k^{(i)})\log(1-(h_\theta(x^{(i)}))_k)\\j&= j_1+j_2 \end{align} The exercise asked me to convert the $y$ and $h_\theta(x)$ matrices into binary matrices before performing the computation. This is where I get stuck: either I am making a mistake or I am misunderstanding a concept.

If $y_k=0$ and $(h_\theta(x))_k=0$, then $j_1=0$, $j_2=0$, $j=0$. Similarly,

if $y_k=1$ and $(h_\theta(x))_k=1$, then $j_1=0$, $j_2=0$, $j=0$.

This seems fair, since these cases should contribute nothing to the overall cost function.

How about...

If $y_k=1$ and $(h_\theta(x))_k=0$, then $j_1$ is undefined, since I cannot evaluate $\log(0)$. And also:

If $y_k=0$ and $(h_\theta(x))_k=1$, then I face the same $\log(0)$ problem in $j_2$.

I think I am interpreting something incorrectly, but I am not sure what...
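To make the problem concrete, here is a small NumPy check of the two problematic cases (my own sketch, not part of the exercise):

```python
import numpy as np

# Case y_k = 1, (h_theta(x))_k = 0: j1 = -1 * log(0)
y, h = 1, 0.0
print(-y * np.log(h))            # inf (NumPy warns about log(0) instead of raising an error)

# Case y_k = 0, (h_theta(x))_k = 1: j2 = -(1 - 0) * log(1 - 1)
y, h = 0, 1.0
print(-(1 - y) * np.log(1 - h))  # inf again, for the same reason
```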

Best answer:

When the output of the neural network is to be interpreted as a probability distribution (e.g. over $K$ visual object categories), the final layer is typically a softmax: $$ h_\theta(x)_k = \frac{e^{z_k(x)}}{\sum_{j=1}^K e^{z_j(x)}} $$ where each $z_j$ is a "neuron activation", which itself is a differentiable function of the input $x$.
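A minimal NumPy sketch of that output layer (the shift by the maximum is a standard numerical-stability trick, not part of the formula itself):

```python
import numpy as np

def softmax(z):
    """Map a vector of K activations z to a probability distribution."""
    z = z - np.max(z)        # subtract the max for numerical stability; result is unchanged
    e = np.exp(z)
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5])   # some finite neuron activations
h = softmax(z)
print(h, h.sum())                # every entry is strictly in (0, 1) and they sum to 1
```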

Now notice that the softmax is always positive for finite inputs $z_k$. That is, we can never have $h_\theta(x)_k = 0$ (although it can be very small, and this can lead to numerical problems).

Similarly, the softmax cannot output a value exactly equal to $1$ for finite inputs, because every term $e^{z_j}$ in the denominator is strictly positive, so the denominator is strictly larger than any single numerator.
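So the $\log(0)$ case from the question cannot occur when $h_\theta(x)$ is a softmax output. If you do end up with entries that are exactly $0$ or $1$ (for example from rounding, or from binarizing the predictions as described in the question), a common safeguard, which is my addition rather than part of the course, is to clip the values away from the endpoints before taking the logarithm:

```python
import numpy as np

eps = 1e-12
H = np.array([[1.0, 0.0, 0.0]])               # degenerate "probabilities" that would break log()
H_safe = np.clip(H, eps, 1 - eps)             # force every entry strictly inside (0, 1)
print(-np.log(H_safe), -np.log(1 - H_safe))   # both terms of the cost are finite now
```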