A good, self-study statistical computing book

I'm looking for a book on introductory statistical computing that has proofs for the methods as well as examples. I'd like proofs at about the same level as (or lower than) those in Statistical Inference by Casella and Berger. I'd also like examples in R or MATLAB. If no good book combines these two features, then please recommend two books.

There is 1 best solution below

Rice's book has worthwhile examples of various computing techniques using real data, but the main proofs are what you will find in Casella/Berger, perhaps at a slightly lower mathematical, pre-measure-theoretic level.

One of several very good books introducing basic statistical methods (descriptive statistics, t and chi-squared tests, ANOVA models for one and two factors, simple and multiple linear regression) using R on real data is by Dalgaard. (It is a Springer book, but he has posted a free .pdf on a Danish website.) The first 3-4 chapters are mostly about R itself; skip beyond those to see the statistical applications. There are extensive data files resulting from serious investigations, mainly in medical and biological science.

You might want to browse the entire lists of 'use R' books from Springer and Chapman-Hall, read reviews and sample previews, and pick statistical topics of interest to you personally. I can speak from personal experience that both publishers provide authors with serious help from anonymous reviewers as to content, relevance, and correctness. (These days some 'publishers' put more effort into the pretty covers than what's between them.)

R is not the only computer package useful for statistical analysis. Python (with SciPy) and others are also used. However, as of today, I think R is your best bet: free; lots of authoritative books; expert, open, curated programming of procedures; available for Windows, Mac, and UNIX; RStudio, R Commander, etc., in addition to basic R; several appealing libraries for graphical display; and (did I mention) free. [MATLAB does amazing things, but it is expensive and for most students available only in math dept settings.]

You may have difficulty finding both useful computational guidance and solid mathematical-statistics proofs in the same book. As discussed in the comments, one reason is that handling both would make a book of unwieldy length.

Just as important is that a huge contribution of statistical computation is to allow the practical analysis of data that cannot be handled by standard methods covered in math-stat books. Many statistical software packages (e.g., SAS, SPSS, Minitab, JMP, and R) have procedures for handling standard situations. Heavy-duty and innovative computation arises when assumptions or models are beyond the territory covered by standard methods.

The validity of basic simulation methods is covered by the Weak Law of Large Numbers (WLLN), sometimes with a little extra work. Some maximum likelihood computations require computer-intensive deterministic computations (justified by numerical analysis, not statistics). Bootstrap CIs and permutation tests use extensive simulation. Many Bayesian models require Gibbs sampling or the Metropolis-Hastings algorithm, both Markov Chain Monte Carlo (MCMC) methods. There are open research questions about the convergence and speed of convergence of some of these methods; meanwhile, numerical and graphical descriptive methods are required to check that they are working as intended.
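To make two of these ideas concrete, here is a minimal sketch (in Python, standard library only; the same few lines are just as easy in R). The first part estimates a known expectation by simulation, which is the WLLN at work: the average of many draws of U^2, with U uniform on (0, 1), should settle near the exact value 1/3. The second part is a percentile bootstrap CI for a mean. The function names `mc_mean` and `bootstrap_ci` are my own, the percentile method is only one of several bootstrap CI styles, and the numbers in `sample` are made-up placeholders, not real measurements; substitute real data.

```python
import random
import statistics


def mc_mean(f, n, rng):
    """Monte Carlo estimate of E[f(U)], U ~ Uniform(0, 1), from n draws."""
    return sum(f(rng.random()) for _ in range(n)) / n


def bootstrap_ci(data, stat=statistics.mean, reps=2000, level=0.95, rng=None):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = rng or random.Random()
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    alpha = 1 - level
    lo = boots[int((alpha / 2) * reps)]
    hi = boots[int((1 - alpha / 2) * reps) - 1]
    return lo, hi


rng = random.Random(2024)  # fixed seed so runs are repeatable

# WLLN in action: estimates should drift toward the exact value 1/3 as n grows.
for n in (100, 10_000, 100_000):
    print(n, mc_mean(lambda u: u * u, n, rng))

# Placeholder values for illustration only (NOT real data).
sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.8, 3.9, 5.0]
print(statistics.mean(sample), bootstrap_ci(sample, rng=rng))
```

A fixed seed is used only so the output can be reproduced; in real work you would also want to look at how the estimates vary from run to run, which is exactly the kind of numerical check mentioned above.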

Crucially, I would like to stress that you should use real data as much as possible as you get into statistical computing. You can be reasonably sure that someone cared enough about the research that yielded the data that there are engaging issues to explore and potentially important conclusions to be drawn. Fake and simulated data may have pedagogical uses, but they may also be used to promote marvelously clever and endlessly intricate methods to answer questions no one is seriously asking. There are plenty of challenging questions to answer about real data to keep you learning important stuff for a long while.

Finally, for 'recreational' reading to get into a useful frame of mind, you might read "Turing's Cathedral" by Dyson and "The Signal and the Noise" by Silver. A bit of history and a bit of rational vision toward the future.