I'm trying to derive formulas used in backpropagation for a neural network that uses a binary cross entropy loss function. When I perform the differentiation, however, my signs do not come out right:
Binary cross entropy loss function: $$J(\hat y) = \frac{-1}{m}\sum_{i=1}^m \big[y_i\log(\hat y_i)+(1-y_i)\log(1-\hat y_i)\big]$$
where
$m = $ number of training examples
$y = $ true y value
$\hat y = $ predicted y value
When I attempt to differentiate this for one training example, I do the following process:
Constant multiple rule: $$ \frac{dJ}{d\hat y_i} = -1\Big(\frac{d}{d\hat y_i}\big(y_i\log(\hat y_i)+(1-y_i)\log(1-\hat y_i)\big)\Big) $$
Sum rule: $$ = -1\Big(\frac{d}{d\hat y_i}y_i\log(\hat y_i)+\frac{d}{d\hat y_i}(1-y_i)\log(1-\hat y_i)\Big) $$
Constant multiple rule (treating $y_i$ as a constant) and derivative of the natural log: $$ = -1\Big(\frac{y_i}{\hat y_i} + \frac{1-y_i}{1 - \hat y_i}\Big)$$
However, this is different from the expected result: $$ \frac{dJ}{d\hat y_i} = -1(\frac{y_i}{\hat y_i} - \frac{1-y_i}{1 - \hat y_i}) $$
I'm sure I'm doing something incorrectly, but I can't figure out what it is. Any help is appreciated!
Let's denote the inner/Frobenius product by $a:b= a^Tb$
and the elementwise/Hadamard product by $a\odot b$
and elementwise/Hadamard division by $\frac{a}{b}$
and note that the $\log$ function is to be applied elementwise.
For convenience, let's use a modified loss function $$L=-mJ$$ Then the differential and gradient of $L$ can be calculated as $$\eqalign{ L &= y:\log({\hat y}) + (1-y):\log(1-{\hat y}) \cr \cr dL &= y:d\log({\hat y}) + (1-y):d\log(1-{\hat y}) \cr &= \frac{y}{{\hat y}}:d{\hat y} + \frac{1-y}{1-{\hat y}}:d(1-{\hat y}) \cr &= \Big(\frac{y}{{\hat y}} - \frac{1-y}{1-{\hat y}}\Big):d{\hat y} \cr &= \Big(\frac{y-{\hat y}}{{\hat y}-{\hat y}\odot{\hat y}}\Big):d{\hat y} \cr \cr \frac{\partial L}{\partial{\hat y}} &= \frac{y-{\hat y}}{{\hat y}-{\hat y}\odot{\hat y}} \cr \cr }$$ Note the step from the second to the third line of $dL$: since $d(1-\hat y)=-d\hat y$, the second term picks up a minus sign. That inner derivative is the sign you dropped in your attempt. And the gradient of the original cost function is $$\eqalign{ \frac{\partial J}{\partial{\hat y}} &= -\frac{1}{m}\frac{\partial L}{\partial{\hat y}} = \frac{{\hat y}-y}{m\,({\hat y}-{\hat y}\odot{\hat y})} \cr }$$
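If you want to convince yourself of the sign, you can check the closed-form gradient numerically. Here's a quick NumPy sketch (variable names are my own) comparing $\frac{\hat y - y}{m(\hat y - \hat y\odot\hat y)}$ against a central finite difference of $J$:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
y = rng.integers(0, 2, size=m).astype(float)  # true labels in {0, 1}
yhat = rng.uniform(0.1, 0.9, size=m)          # predictions, kept inside (0, 1)

def J(yhat):
    # binary cross entropy: -(1/m) * sum(y log(yhat) + (1-y) log(1-yhat))
    return -np.mean(y * np.log(yhat) + (1 - y) * np.log(1 - yhat))

# closed-form gradient from the derivation above
grad = (yhat - y) / (m * (yhat - yhat * yhat))

# central finite difference, one coordinate at a time
eps = 1e-6
num = np.zeros(m)
for i in range(m):
    e = np.zeros(m)
    e[i] = eps
    num[i] = (J(yhat + e) - J(yhat - e)) / (2 * eps)

print(np.allclose(grad, num, atol=1e-6))  # True
```

Flipping the sign of the second term in `grad` (i.e. using your $+$ version) makes the check fail, which pins the error on the missing chain-rule factor from $\log(1-\hat y_i)$.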