How do I read out the loss function used in YOLO? I somehow need it for a class that I'm attending.
EDIT
Got an answer on Reddit!
It's a bit of an unexpected question, but I guess I would read it out by describing one term at a time. (Hopefully you meant a high-level description, not literally a phonetic sequence.) I'd say something like this when "reading it out":
Overall, we want to perform simultaneous object detection and classification. The indicator function $(\unicode{x1D7D9}_{ij}^{ \text{obj} })$ denotes whether the $j$th box predictor in cell $i$ is responsible for the object there (i.e. among that cell's predictions, the $j$th has the highest IoU with the true box). Similarly, the indicator $(\unicode{x1D7D9}_{i}^{ \text{obj} })$ denotes whether there is an object in cell $i$. Hatted quantities (e.g. $\widehat{x}$, $\widehat{C}$, $\widehat{p}_i$) are predictions of their unhatted counterparts. The sums over $i$ run over the grid cells of the image, while the sums over $j$ run over the bounding box predictors within each cell.
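For reference, and assuming the question is about the loss from the original YOLO (v1) paper, the five terms discussed here would be written as follows. The symbols $S^2$ (number of grid cells), $B$ (box predictors per cell), and the weights $\lambda_{\text{coord}}$ and $\lambda_{\text{noobj}}$ come from that paper, not from the question itself; the paper sets them to $5$ and $0.5$ respectively.

$$
\begin{aligned}
\mathcal{L} ={} & \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \unicode{x1D7D9}_{ij}^{\text{obj}} \left[ (x_i - \widehat{x}_i)^2 + (y_i - \widehat{y}_i)^2 \right] \\
& + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \unicode{x1D7D9}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\widehat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\widehat{h}_i}\right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \unicode{x1D7D9}_{ij}^{\text{obj}} \left( C_i - \widehat{C}_i \right)^2 \\
& + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \unicode{x1D7D9}_{ij}^{\text{noobj}} \left( C_i - \widehat{C}_i \right)^2 \\
& + \sum_{i=0}^{S^2} \unicode{x1D7D9}_{i}^{\text{obj}} \sum_{c \,\in\, \text{classes}} \left( p_i(c) - \widehat{p}_i(c) \right)^2
\end{aligned}
$$

Here $\unicode{x1D7D9}_{ij}^{\text{noobj}}$ is the complement of $\unicode{x1D7D9}_{ij}^{\text{obj}}$, so the fourth term only penalizes confidence for box predictors that are not responsible for any object.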
The first term checks that the predicted object box centers are close to the real ones, based on the squared distance between the centers.
The second term checks that the sizes (width $w$ and height $h$) of the predicted and true boxes are close to each other, to maximize overlap between them.
The third and fourth terms measure the existence confidence (or objectness), i.e. $C_i$ gives the probability of an object being in cell $i$ at all, so the loss wants the confidence of our learner to match whether or not an object is actually present.
The fifth term is the classification loss, so that the network correctly categorizes each object if an object exists there.
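If it helps to see the five terms as a computation, here is a minimal pure-Python sketch, assuming the YOLOv1-style sum-of-squares loss. The function name `yolo_v1_loss`, the data layout, and the toy numbers are all made up for illustration; the default weights 5 and 0.5 are the ones from the original paper. It also assumes predicted widths and heights are non-negative, since the size term takes square roots.

```python
import math


def yolo_v1_loss(pred_boxes, true_boxes, pred_probs, true_probs, responsible,
                 lambda_coord=5.0, lambda_noobj=0.5):
    """Return the five (weighted) loss terms and their total.

    Hypothetical layout, chosen for readability:
      pred_boxes[i][j] = (x, y, w, h, C) for box predictor j in cell i
      true_boxes[i]    = (x, y, w, h, C) target for cell i, or None if empty
      pred_probs[i], true_probs[i] = per-class probabilities for cell i
      responsible[i]   = index j of the responsible predictor, or None if empty
    """
    center = size = conf_obj = conf_noobj = cls = 0.0
    for i, boxes in enumerate(pred_boxes):
        for j, (x, y, w, h, c) in enumerate(boxes):
            if responsible[i] == j:
                tx, ty, tw, th, tc = true_boxes[i]
                # Term 1: squared distance between box centers.
                center += (tx - x) ** 2 + (ty - y) ** 2
                # Term 2: squared difference of square-rooted sizes.
                size += (math.sqrt(tw) - math.sqrt(w)) ** 2
                size += (math.sqrt(th) - math.sqrt(h)) ** 2
                # Term 3: confidence error for the responsible predictor.
                conf_obj += (tc - c) ** 2
            else:
                # Term 4: non-responsible predictors should report confidence 0.
                conf_noobj += (0.0 - c) ** 2
        if responsible[i] is not None:
            # Term 5: classification error, only for cells containing an object.
            cls += sum((t - p) ** 2 for t, p in zip(true_probs[i], pred_probs[i]))
    terms = {
        "center": lambda_coord * center,
        "size": lambda_coord * size,
        "conf_obj": conf_obj,
        "conf_noobj": lambda_noobj * conf_noobj,
        "class": cls,
    }
    terms["total"] = sum(terms.values())
    return terms


# Toy example: two cells, two box predictors each, two classes.
# Cell 0 contains an object (predictor 1 is responsible); cell 1 is empty.
pred_boxes = [
    [(0.1, 0.1, 0.04, 0.04, 0.2), (0.4, 0.5, 0.16, 0.25, 0.8)],
    [(0.3, 0.3, 0.09, 0.09, 0.1), (0.2, 0.2, 0.01, 0.01, 0.3)],
]
true_boxes = [(0.5, 0.5, 0.25, 0.25, 1.0), None]
pred_probs = [[0.9, 0.1], [0.5, 0.5]]
true_probs = [[1.0, 0.0], None]
responsible = [1, None]

terms = yolo_v1_loss(pred_boxes, true_boxes, pred_probs, true_probs, responsible)
print(terms)
```

Returning the terms separately rather than a single number makes it easier to see which part of the loss dominates, which is also a natural way to talk through the formula one term at a time.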
It might be helpful to look at other YOLO questions: [1], [2], [3], [4], [5], [6].