Statistical concepts like spaces of probability distributions, "metrics" like Fisher information or relative entropy, and convergence with respect to these quantities are needed in my research problems. I need to come up to speed on these tools quickly, but can't find any standard, rigorous treatment of them.
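To fix notation, the standard definitions I have in mind (for densities $p$, $q$ with respect to a common dominating measure, and a smooth parametric family $p_\theta$) are

$$D(p\,\|\,q)=\int p(x)\log\frac{p(x)}{q(x)}\,dx, \qquad I(\theta)=\mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}\log p_\theta(X)\right)^{2}\right],$$

where relative entropy $D$ compares two distributions and Fisher information $I$ measures the local sensitivity of the family to its parameter.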
Can anyone share good references on this and the needed prerequisite knowledge?
It looks like even to start with this topic you need quite a lot of background in stochastic processes (I am using the two volumes on diffusions by Rogers and Williams as a reference), functional analysis (using Folland and Conway), and maybe some differential geometry (using Lee's three books).
Edit: The conference paper linked from this question gives a strong argument that studying these things from a geometric perspective is misguided, which rules out information geometry references. But there is still plenty of motivation to study relative entropy and Fisher information.
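One standard fact behind that motivation (under the usual regularity conditions on the family $p_\theta$): relative entropy between nearby members of the family is locally quadratic, with Fisher information as the coefficient,

$$D(p_\theta \,\|\, p_{\theta+\delta}) = \tfrac{1}{2}\, I(\theta)\,\delta^{2} + o(\delta^{2}).$$

A quick numerical sanity check of this, my own sketch rather than anything from the references above: for the Gaussian location family $N(\theta, \sigma^2)$ the Fisher information is $I(\theta) = 1/\sigma^2$, and for this particular family the quadratic expression is in fact exact.

```python
import numpy as np

sigma = 1.5
theta0, dtheta = 0.0, 0.1

# Dense grid for numerical integration; the tails at +/-20 are
# negligible but still representable in double precision.
x = np.linspace(-20.0, 20.0, 200001)
dx = x[1] - x[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

p = gauss(x, theta0, sigma)           # density of N(theta0, sigma^2)
q = gauss(x, theta0 + dtheta, sigma)  # density of N(theta0 + dtheta, sigma^2)

# Relative entropy D(p || q) by direct numerical integration.
kl = np.sum(p * np.log(p / q)) * dx

# Fisher-information quadratic term, with I(theta) = 1 / sigma^2.
approx = 0.5 * dtheta**2 / sigma**2

print(kl, approx)  # for this family the two agree, up to grid error
```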
Information geometry is introduced in Amari's book.
A concise treatment of the relation between information theory and statistics can be found in "Information Theory and Statistics: A Tutorial" by Csiszár and Shields. Sections 3 and 4 in particular are devoted to information geometry. They are credible researchers on this topic.
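As a taste of the material there, one central result (stated roughly from memory, so check the tutorial for the precise hypotheses) is the "Pythagorean" inequality for I-projections: if $P^{*}$ minimizes $D(\cdot\,\|\,Q)$ over a convex set $\mathcal{E}$ of distributions, then for every $P \in \mathcal{E}$

$$D(P\,\|\,Q) \;\ge\; D(P\,\|\,P^{*}) + D(P^{*}\,\|\,Q),$$

which is the sense in which relative entropy behaves like a squared distance.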
There is also the chapter on information theory and statistics in the Cover-Thomas book on information theory, which targets mostly engineers.
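A representative result from that chapter, stated informally: the Chernoff-Stein lemma says that when testing i.i.d. samples from $P_1$ against $P_2$, with the type I error held below a fixed level $\epsilon$, the best achievable type II error $\beta_n^{\epsilon}$ decays exponentially at a rate given by relative entropy,

$$\lim_{n\to\infty} \frac{1}{n}\log \beta_n^{\epsilon} = -D(P_1\,\|\,P_2).$$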
However, spaces of probability distributions are discussed in many classic probability books, particularly where the notion of convergence in distribution is treated. Billingsley's "Convergence of Probability Measures" is a classic here.
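For reference, the notion treated there is weak convergence: probability measures $P_n$ on a metric space converge weakly to $P$ (written $P_n \Rightarrow P$) when

$$\int f \, dP_n \;\longrightarrow\; \int f \, dP \quad \text{for every bounded continuous } f,$$

and Billingsley develops the metrization (e.g. the Prokhorov metric) and compactness theory for this mode of convergence.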