I'm having trouble determining whether one variable is independent of or dependent on another in multidimensional data. For this I have:
Two statistical variables $x$ and $y$ are said to be independent if for all $i \in \{1, \dots, k\}$ and all $j \in \{1, \dots, l\}$ it holds that $f_{ij} = f_{i\cdot} \, f_{\cdot j}$.
If $x$ and $y$ are independent, then the conditional distribution of $x$ given $y = y_j$ is the same whatever the value $y_j$. That is, the distribution of $x$ conditioned on $y = y_j$ coincides with the marginal distribution of $x$.

Likewise, if $x$ and $y$ are independent, then the conditional distribution of $y$ given $x = x_i$ is the same whatever the value $x_i$: the distribution of $y$ conditioned on $x = x_i$ coincides with the marginal distribution of $y$.
But when it comes to applying this to an exercise with actual numbers and checking the result, I do not know how to proceed. Is there a way to test for statistical dependence or independence between two variables?
For example for
| $x \backslash y$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| **1** | 8 | 7 | 4 | 1 |
| **2** | 10 | 35 | 26 | 9 |
how can I see whether $x$ is independent of or dependent on $y$, and vice versa?
Here is an outline toward a solution:
Start by augmenting your table with row and column totals. (Please do that now, before going on.)
Then look at the $8$ at upper-left: $P(X=1,Y=0)=8/100.$ Then $P(X=1)=20/100$ and $P(Y=0)=18/100.$ Do you find that $P(X=1,Y=0)=P(X=1)P(Y=0),$ as required for independence? If not, you're done. One such failure is enough to contradict independence.
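That first-cell check can be sketched in a few lines of Python (using only the counts from the table above; working with integer counts instead of fractions avoids floating-point comparison issues):

```python
# Check cell (X=1, Y=0): independence requires
#   n_10 / N == (row_1 / N) * (col_0 / N),
# which cross-multiplies to n_10 * N == row_1 * col_0.
n_10  = 8              # count in cell (X=1, Y=0)
row_1 = 8 + 7 + 4 + 1  # row total for X=1  -> 20
col_0 = 8 + 10         # column total for Y=0 -> 18
N     = 100            # grand total

print(n_10 * N, row_1 * col_0)  # 800 360 -- unequal, so X and Y are dependent
```

Since $800 \ne 360$, this single cell already contradicts independence.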
If so, move on to the next entry, $7$, in the body of the table. Only if the multiplication rule holds for all eight cells do you have independence.
Note: Technically, you wouldn't really need to check all eight entries to verify independence. With some algebra, one can show that you would find a contradiction among elements $(1,1), (1,2),$ or $(1,3)$ of such a $2 \times 4$ table, unless independence holds. [From those three cells and the row and column totals, you could reconstruct the body of the table.] For now with such simple tables, it's probably best to check every cell to be sure.
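As a companion to the outline above, here is a short Python sketch that runs the full check over every cell (table values taken from the question; the integer cross-multiplication is an assumption-free restatement of $f_{ij} = f_{i\cdot} f_{\cdot j}$):

```python
# Contingency table: rows are X = 1, 2; columns are Y = 0, 1, 2, 3.
table = [
    [8, 7, 4, 1],
    [10, 35, 26, 9],
]

N        = sum(sum(row) for row in table)           # grand total: 100
row_tot  = [sum(row) for row in table]              # [20, 80]
col_tot  = [sum(col) for col in zip(*table)]        # [18, 42, 30, 10]

# f_ij = f_i. * f.j  <=>  n_ij * N == row_i * col_j  for every cell.
independent = all(
    table[i][j] * N == row_tot[i] * col_tot[j]
    for i in range(len(table))
    for j in range(len(col_tot))
)
print(independent)  # False: e.g. cell (1,0) gives 8*100 = 800 vs 20*18 = 360
```

The same structure works for a table of any size; as soon as one cell fails the cross-multiplication, `all(...)` short-circuits to `False`.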