I have a dataset such as:
Person Code Value
1 A 5
1 B 6
1 C 7
2 A 10
2 B 11
2 C 12
3 A 10
4 B 8
The way to interpret the data is that person 1 performed code A 5 times and code B 6 times. I am interested in the ratio of code A to code B. Some codes are missing for certain people: for example, person 4 does not have code A. I replace missing values with 0, since person 4 did not perform code A. Likewise, person 3 has no code B, so the ratio for person 3 is infinite. I calculate the ratio A/B for everyone, which gives:
Person RatioAB
1 5/6
2 10/11
3 Inf
4 0
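For concreteness, here is a minimal Python sketch of how I build this table. The names `rows`, `counts` and `ratio_ab` are just illustrative, and returning NaN for a 0/0 case is my own assumption (that case does not occur in the toy data).

```python
import math
from collections import defaultdict

# Toy data from the table above: (person, code, value)
rows = [
    (1, "A", 5), (1, "B", 6), (1, "C", 7),
    (2, "A", 10), (2, "B", 11), (2, "C", 12),
    (3, "A", 10),
    (4, "B", 8),
]

# counts[person][code] -> value
counts = defaultdict(dict)
for person, code, value in rows:
    counts[person][code] = value

def ratio_ab(codes):
    """Return A/B for one person, treating a missing code as 0."""
    a = codes.get("A", 0)
    b = codes.get("B", 0)
    if b == 0:
        # A/0 is reported as Inf when A > 0; 0/0 is left undefined (assumption)
        return math.inf if a > 0 else math.nan
    return a / b

ratios = {person: ratio_ab(codes) for person, codes in sorted(counts.items())}

for person, r in ratios.items():
    print(person, r)
```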
My goal is to find the average ratio A/B for all 4 people.
What is the correct way to do this in a statistical study?
- Should I consider only persons 1, 2 and 4 (dropping the infinite ratio) and take their mean ratio, which is 0.58?
- Should I first remove any person who is missing either A or B (person 3 and person 4), and calculate the average ratio based only on persons 1 and 2?
- Should I instead compute the average of A divided by the average of B?
My actual dataset consists of millions of people.
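To make the three candidates concrete, here is a rough sketch of each on the toy data, using only the Python standard library. The dictionary `ab` and the variable names are hypothetical, and I am assuming that a count of 0 always means the code was missing.

```python
from statistics import mean

# Per-person (A, B) counts from the toy example, with missing codes set to 0
ab = {1: (5, 6), 2: (10, 11), 3: (10, 0), 4: (0, 8)}

# Option 1: mean of the finite ratios (drops person 3, whose ratio is Inf)
finite = [a / b for a, b in ab.values() if b > 0]
mean_finite = mean(finite)

# Option 2: mean ratio over people who have both A and B (persons 1 and 2)
both = [a / b for a, b in ab.values() if a > 0 and b > 0]
mean_both = mean(both)

# Option 3: ratio of averages, mean(A) / mean(B), which equals sum(A) / sum(B)
ratio_of_means = mean(a for a, _ in ab.values()) / mean(b for _, b in ab.values())

print(round(mean_finite, 2))   # 0.58
print(round(mean_both, 2))     # 0.87
print(ratio_of_means)          # 1.0
```

On this small example the three options give noticeably different answers (about 0.58, 0.87 and 1.0), which is why I am unsure which one is appropriate.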