I have a list of documents, each of which belongs to a category. Each document has n pages. Each page is analyzed with an OCR engine, and the time taken to process each page is saved into a table.
So, I have:
CATEGORY | DOC | PAGE | TIME
1        | 1   | 1    | 2
1        | 1   | 2    | 3
1        | 2   | 1    | 3
1        | 3   | 3    | 4
1        | 3   | 2    | 4
2        | 5   | 1    | 5
...
Now, I group the rows by document to calculate the average time to process a page of each document. For instance:
CATEGORY | DOC | AVG TIME
1        | 1   | 2.5
1        | 2   | 3
...
Now, I'd like to know the actual average time to process a page in each category. But adding up these per-document averages and dividing by the number of docs within that category does not seem right to me, since the documents have different numbers of pages.
How can I do this? Can I calculate a weighted average in this case, and if so, how?
Thanks
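You need a weighted average. The weight in your case is the number of pages in each document. Multiplying each document's average time by its page count recovers the total time spent on that document, so the sum over all documents in a category is the total time for all of its pages, which you then divide by the total number of pages: $$\text{Average}=\frac{\sum_{\text{DOC}}\text{AvgTime}\times \text{pages}}{\sum_{\text{DOC}}\text{pages}}$$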
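To make this concrete, here is a minimal Python sketch of the calculation, using only the six sample rows shown in the question. The function names `per_doc_stats` and `weighted_avg_per_category` are made up for illustration, and the sketch assumes the rows are available as plain `(category, doc, page, time)` tuples rather than in a database table.

```python
from collections import defaultdict

# (category, doc, page, time) rows copied from the sample table in the question.
rows = [
    (1, 1, 1, 2),
    (1, 1, 2, 3),
    (1, 2, 1, 3),
    (1, 3, 3, 4),
    (1, 3, 2, 4),
    (2, 5, 1, 5),
]


def per_doc_stats(rows):
    """Return {(category, doc): (avg_time, page_count)} (hypothetical helper)."""
    totals = defaultdict(lambda: [0.0, 0])  # [sum of times, number of pages]
    for category, doc, _page, time in rows:
        totals[(category, doc)][0] += time
        totals[(category, doc)][1] += 1
    return {key: (total / count, count) for key, (total, count) in totals.items()}


def weighted_avg_per_category(doc_stats):
    """Weight each document's average by its page count (hypothetical helper)."""
    numerator = defaultdict(float)  # sum of AvgTime * pages per category
    denominator = defaultdict(int)  # sum of pages per category
    for (category, _doc), (avg_time, pages) in doc_stats.items():
        numerator[category] += avg_time * pages
        denominator[category] += pages
    return {category: numerator[category] / denominator[category] for category in numerator}


doc_stats = per_doc_stats(rows)
category_avg = weighted_avg_per_category(doc_stats)
print(category_avg)
```

For category 1 in the sample, the per-document averages are 2.5 (2 pages), 3 (1 page) and 4 (2 pages), so the weighted average is (2.5·2 + 3·1 + 4·2) / 5 = 16 / 5 = 3.2, which is the same as averaging the five page times directly. The unweighted mean of the three averages would be about 3.17 instead. If the per-page table is still available, averaging TIME grouped by CATEGORY gives the same result without the intermediate step.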