For a project I need a large regression (least squares) dataset:
If $n$ is the number of samples and $p$ the number of features, then I need $p < n$ with both $p$ and $n$ very large, for example $n = 10{,}000{,}000$ and $p = 1{,}000{,}000$.
Does anybody know of such a dataset, or at least where I could get one? I was thinking about natural language processing data, but I couldn't find any.
Why not generate one yourself with a small piece of code? You could add noise to both the dependent and the independent variables. As for large real datasets, the Census Bureau may be able to provide one, or perhaps a consumer association. I suggest having a look at http://www.census.gov/main/www/access.html
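To make the "generate one yourself" suggestion concrete, here is a minimal sketch in Python with NumPy. It is one possible approach, not a recommendation of specific settings. The function name `make_regression_stream`, the Gaussian design, and the parameters `chunk_size`, `noise_x`, and `noise_y` are all assumptions chosen for illustration. The data is produced in row chunks because a dense $10^7 \times 10^6$ matrix of 64-bit floats has $10^{13}$ entries, roughly 80 TB, so it cannot be held in memory at once.

```python
import numpy as np


def make_regression_stream(n, p, chunk_size=10_000,
                           noise_x=0.1, noise_y=0.1, seed=0):
    """Return (beta, chunks) for a synthetic least-squares problem.

    beta   : the true coefficient vector of length p
    chunks : a generator yielding (X_obs, y) blocks of at most chunk_size rows

    Assumption: features and noise are i.i.d. standard normal, scaled by
    noise_x (noise in the independent variables) and noise_y (noise in
    the dependent variable).
    """
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(p)

    def chunks():
        for start in range(0, n, chunk_size):
            rows = min(chunk_size, n - start)
            # Noise-free features for this block of samples.
            X_true = rng.standard_normal((rows, p))
            # Response: linear model plus noise in the dependent variable.
            y = X_true @ beta + noise_y * rng.standard_normal(rows)
            # Observed features: add noise to the independent variables.
            X_obs = X_true + noise_x * rng.standard_normal((rows, p))
            yield X_obs, y

    return beta, chunks()


# Small demonstration; scale n and p up as far as storage allows.
beta, stream = make_regression_stream(n=1_000, p=20, chunk_size=256)
total_rows = sum(X.shape[0] for X, _ in stream)
```

Each chunk can be written to disk or fed straight into an out-of-core solver. If the target is sparse data, as with natural language features, the dense Gaussian block would need to be replaced by a sparse generator, which this sketch does not cover.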