I am puzzled by a problem in which the Pearson correlation coefficient ($r$) serves as an objective function to be maximized over a contingency table with specified marginal distributions for two ordinal variables. While I'm aware that $r$ is typically not suitable for ordinal data, my interest lies in treating this as a constrained optimization problem: the constraints are the given marginal distributions, and the objective is to find the maximal possible value of $r$.
Consider two ordinal variables, $A$ and $B$, each with $K$ categories, measured on the same scale (e.g., a Likert scale with different prompts). The marginal distribution for each variable is known, but the joint distribution is unspecified, constrained only by the requirement to fit these marginals.
Here are some examples that I hope illustrate the problem.
Example 1:
For $A$ and $B$ both having two categories ('High' and 'Low'):
- Marginal Distribution of $A$: High = 70, Low = 30
- Marginal Distribution of $B$: High = 70, Low = 30
In this scenario, the maximum $r$ consistent with these marginals is 1, demonstrated by the following joint distribution:
| | B = High | B = Low |
|---|---|---|
| A = High | 70 | 0 |
| A = Low | 0 | 30 |
Example 2:
Changing the marginal distributions:
- Marginal Distribution of $A$: High = 90, Low = 10
- Marginal Distribution of $B$: High = 50, Low = 50
Here, the maximum $r$ is less than 1. The joint distribution that maximizes $r$ under these constraints is:
| | B = High | B = Low |
|---|---|---|
| A = High | 50 | 40 |
| A = Low | 0 | 10 |
This yields $r = 1/3$. (I don't have a formal proof of this claim, but there is an informal argument for why this joint distribution maximizes $r$: in a $2 \times 2$ table with fixed marginals, there is only one degree of freedom left to 'play with' among the cell counts — akin to the degrees of freedom in chi-squared tests for contingency tables.)
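Both examples can be checked numerically by computing $r$ directly from the cell counts. A minimal sketch (the function name and the High = 1, Low = 0 coding are my own choices; $r$ is invariant to any increasing affine recoding of the categories):

```python
import numpy as np

def pearson_r_from_table(table, row_vals, col_vals):
    """Pearson r for a contingency table of counts.

    Rows index the values of A (row_vals), columns the values of B
    (col_vals); moments are computed directly from the cell counts.
    """
    t = np.asarray(table, dtype=float)
    a = np.asarray(row_vals, dtype=float)
    b = np.asarray(col_vals, dtype=float)
    n = t.sum()
    pa = t.sum(axis=1) / n        # marginal distribution of A
    pb = t.sum(axis=0) / n        # marginal distribution of B
    ea, eb = a @ pa, b @ pb       # E[A], E[B]
    eab = a @ t @ b / n           # E[AB]
    va = (a - ea) ** 2 @ pa       # Var(A)
    vb = (b - eb) ** 2 @ pb       # Var(B)
    return (eab - ea * eb) / np.sqrt(va * vb)

# Example 2's joint distribution, coding High = 1, Low = 0
table = [[50, 40],   # A = High: (B = High, B = Low)
         [0, 10]]    # A = Low
print(pearson_r_from_table(table, row_vals=[1, 0], col_vals=[1, 0]))  # ≈ 1/3
```

The same call on Example 1's table `[[70, 0], [0, 30]]` returns $r = 1$, matching the claim there.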
Question:
Is there a generalized approach or formula to mathematically determine the maximal value of $r$ for any given marginal distributions of $A$ and $B$? I suspect the solution depends on: how the marginal distributions are represented (e.g., counts, in which case the sample size comes into play, or proportions, in which case one could conceivably appeal to continuous multivariate optimization techniques); the number of categories $K$; the marginal distributions themselves; and potentially some suitable measure of the discrepancy between the marginals (such as the Kolmogorov–Smirnov distance, Bhattacharyya distance, or Kullback–Leibler divergence). Insights into how these factors influence the maximum value of $r$, and any relevant mathematical methods or literature references, would be greatly appreciated.
This is a linear optimization problem. The joint distributions form a $K^2$-dimensional vector space, and specifying the marginal distributions imposes $2K-1$ linear constraints ($-1$ because both sets of constraints include the same normalization constraint $\sum_{i,j} p_{ij} = 1$). That leaves a $(K-1)^2$-dimensional affine subspace. Because the marginals are fixed, the means and variances of $A$ and $B$ are constants, so the correlation coefficient is an affine function of the joint distribution: only $E[AB]$ varies, and it is linear in the cell probabilities. Maximizing it under the linear constraints imposed by the non-negativity of the joint distribution is therefore a linear program.
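This linear program can be sketched directly with an off-the-shelf solver. A hedged sketch (the function `max_pearson_r` and the category codings are my own; the feasible set is the transportation polytope, and for increasing codings the optimum is known to be attained by the comonotone, i.e. Fréchet upper-bound, coupling of the marginals):

```python
import numpy as np
from scipy.optimize import linprog

def max_pearson_r(pa, pb, a_vals, b_vals):
    """Maximize Pearson r over joint distributions with given marginals.

    With the marginals fixed, E[A], E[B], Var(A), Var(B) are constants,
    so maximizing r reduces to maximizing E[AB] -- a linear function of
    the cell probabilities p_ij -- over the transportation polytope
    {p >= 0, row sums = pa, column sums = pb}.
    """
    pa, pb = np.asarray(pa, float), np.asarray(pb, float)
    a, b = np.asarray(a_vals, float), np.asarray(b_vals, float)
    K = len(pa)
    # linprog minimizes, so negate the coefficients a_i * b_j of E[AB]
    c = -np.outer(a, b).ravel()
    # Equality constraints: row sums = pa, column sums = pb
    A_eq = []
    for i in range(K):
        row = np.zeros((K, K)); row[i, :] = 1
        A_eq.append(row.ravel())
    for j in range(K):
        col = np.zeros((K, K)); col[:, j] = 1
        A_eq.append(col.ravel())
    b_eq = np.concatenate([pa, pb])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    eab = -res.fun
    ea, eb = a @ pa, b @ pb
    va = (a - ea) ** 2 @ pa
    vb = (b - eb) ** 2 @ pb
    return (eab - ea * eb) / np.sqrt(va * vb)

# Example 2: P(A = High) = 0.9, P(B = High) = 0.5, coding High = 1, Low = 0
print(max_pearson_r([0.9, 0.1], [0.5, 0.5], [1, 0], [1, 0]))  # ≈ 0.3333
```

On Example 1's marginals (`[0.7, 0.3]` for both variables) the same call returns $1$, and on Example 2 it recovers the $1/3$ claimed above, so the LP formulation agrees with the hand-constructed tables.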