Probability and Statistics Books for Distributions and Introduction to Data Mining/Machine Learning


In college, I took a probability class using Sheldon Ross' A First Course in Probability. It was not my best semester, to say the least. However, I am returning to probability and statistics because it relates to what I want to do later in life. Since then I have learned some basic data mining, regression modeling, and more general statistics topics, but without any real theory. I would like to learn the theory because, as the modeling gets more complicated, more theory comes into play, and I would like to understand more than the general explanation. However, my foundation in probability is not well-rounded. I know about the mean, standard deviation, variance, hypothesis testing, and linear model assumptions, but these are not that complex.

Specifically, I would like to explore more of the different types of distributions (gamma, Poisson, etc.) and topics related to modeling (logistic regression, support vector machines, random forests, etc.), as well as topics in Data Mining and Machine Learning.

I have a B.A. in Mathematics and Economics from an okay school, and I have taken courses and understood topics in Linear Algebra, Multivariate Calculus, Econometrics, Statistics (for Economics), Mathematical Models (which covered predator-prey models, linear regression, and differential equations), and Analysis (which I mostly understood).

Based on the above, I am looking for books with DETAILED examples and walkthroughs that would help me get to where I want to be. I am not a fan of books that say "this is trivial" or make general assumptions without explaining the topic. I know it won't all be in one book. The two books I have are: An Introduction to Statistical Learning: with Applications in R and R Data Mining: Implement data mining techniques through practical use cases and real world datasets. I am in the process of finishing the second book, but I just encountered Maximum Likelihood and got thrown for a loop. In case you can't tell by the titles, I am also learning R. Any advice would be well received, as would suggestions for free copies. Thank you.


Best answer

I've linked these notes before, but the notes here and here are a pretty good (free) set, which I used in my mathematical statistics courses at Rice University. They heavily influenced this book (its author had the author of the notes as an instructor at UW Madison), which I think is probably my favorite statistical theory book.

Those references are more theoretical treatments of statistics, and they give a good understanding of what's going on.

If you want good books on Machine Learning specifically, then these three books are probably good places to go, with the last being kind of canonical in the field, and all three written by major players in Machine Learning.

None of this is cutting edge, but it would get you started for sure.