Assume all matrix games are 2 person zero-sum games. This gives us uniqueness of player 1 NE payoffs and therefore a notion of game 'value'.
Suppose we have a matrix game $M$ and we want to find its NE strategies and value. This is maybe extraneous but the probability is 1 that there is a unique NE strategy profile for $M$. Also assume that all matrix entries are $\in [-1, 1]$
If we knew $M$ exactly we could just solve it using linear programming. However there is a catch. We don't know the entries of $M$ exactly, instead each 'round' we observe an entry which consists of sampling from a PDF whose mean is the entry's value. At the end of each round, we have a matrix $M_t$ whose entries are the means of the observations. (If an entry has not been observed set it to 0, or we could also assume that all entries are observed at least once. Also it seems to me that we should consider more than just the means, but bear with me.) At each $M_t$ we can solve it and get a (a.s.) unique NES profile and value. Our goal is then to get the calculated value/strategies as close to reality as possible. Note the incredibly vague phrasing, as its not clear to me which objective is the most worthwhile.
It seems to me that for any matrix, some entries matter to the NE strategies and value more, in the sense that variations in the value will induce greater changes in NES/game value. Some entries will have a gradient in this sense, but not others. For example, if $M$ has a unique pure NE then only the entry corresponding to that NE will have a non-zero gradient.
So there's kind of a notion of gradient with respect to variance, and there's also the variance that we can observe as the rounds pass. I think a good idea is to choose at each round using this information.
This is a request for thoughts and references. I can't find anything like this on arxiv.