My question is mathematical, but it touches on biology since the data are images of cells. Imagine we have 3 images of a negative control. Since I can use only one, I need to normalize these 3 images into 1. Imagine we have the following dataframe:

| total area | number of cells |
|------------|-----------------|
| 300        | 5               |
| 600        | 3               |
| 500        | 6               |

where total area is the area that the signal occupies in the image, and number of cells is the number of cells that produce that area. In the 1st image, 5 cells occupy a total area of 300; in the 2nd, 3 cells occupy 600; and in the 3rd, 6 cells occupy 500.

-------- 1st method-------
- Calculate the mean of the total area: (300+600+500)/3 = 466.7
- Divide that mean by the sum of the cells: 466.7/14 = 33.3

-------- 2nd method-------
- Calculate the mean area per cell for each image: 300/5 = 60, 600/3 = 200, 500/6 = 83.3
- Sum these means and find the average: (60+200+83.3)/3 = 114.4
As you can see, the outcomes are very different, and I am not sure which method is correct. Interestingly, the trend across the images stays the same with both methods, but the per-cell values are different.
The short answer is that you need to divide the sum of all areas by the total number of cells, that is, $1400/14 = 100$. This is because you can think of the three images as one large image whose average you want (I am guessing that is what you mean by normalize; if not, please clarify).
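
As a quick check of the arithmetic, here is a minimal Python sketch of the pooled average next to the first method from the question. The variable names (`areas`, `cells`, `pooled`, `method_1`) are only illustrative and do not come from the original post.

```python
# Data from the question: total signal area and cell count per image.
areas = [300, 600, 500]
cells = [5, 3, 6]

# Pooled average: treat the three images as one large image.
pooled = sum(areas) / sum(cells)
print(pooled)  # 100.0

# First method from the question: mean area per image divided by ALL cells.
method_1 = (sum(areas) / len(areas)) / sum(cells)
print(round(method_1, 2))  # 33.33

# The first method is the pooled average divided by the number of images.
print(round(pooled / len(areas), 2))  # 33.33
```

The last line suggests why the first method gives such a small value: it shrinks the pooled average by a factor equal to the number of images.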
The longer answer is that the assumption that all images come from a single large image representing one type of cell might be incorrect.
For example, suppose that each image represents a different kind of cell (call them $A, B, C$) with a different size distribution. Then you might be interested in averaging each image independently (e.g. $300/5 = 60;\ 600/3 = 200;\ 500/6 \approx 83$) and noting that each image has a different average cell size. Furthermore, just because there are 6 cells in the third image doesn't mean that the $C$ cells are more abundant, so you might want to do a weighted average that takes this into account (this is similar to your second method). In any case, the first method is wrong, because it divides the average area of a single image by the number of cells in all three images.
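
The per-image averages and their weighted combinations can be sketched in Python as follows. This is only an illustration: the helper `weighted_mean` and the variable names are assumptions made for the example, not part of the original post.

```python
# Data from the question: total signal area and cell count per image.
areas = [300, 600, 500]
cells = [5, 3, 6]

# Average area per cell within each image: 60, 200, about 83.3.
per_image = [a / n for a, n in zip(areas, cells)]

def weighted_mean(values, weights):
    """Weighted average of values using the given non-negative weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Equal weights: every image counts the same (the question's second method).
equal_weights = weighted_mean(per_image, [1, 1, 1])
print(round(equal_weights, 2))  # 114.44

# Weights equal to the cell counts: every cell counts the same.
by_cell_count = weighted_mean(per_image, cells)
print(round(by_cell_count, 2))  # 100.0
```

Weighting each image by its cell count gives back the pooled value $1400/14 = 100$, while equal weights reproduce the second method. Which weights are appropriate depends on whether the images or the individual cells are the unit you care about.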