Can a sampling-based method estimate how many species exist?


I've gotten into a bit of a debate online and I'm hoping some people here can help clear it up. The position I'm arguing against is "It's impossible to even come up with a ballpark estimate for how many species exist."

My logic is:

  1. Estimating the total number of species (known and unknown) can be treated as a distinct-value estimation problem.
  2. Sampling-based estimators can be used to estimate the number of distinct values.

Therefore it's possible to estimate the total number of species.

Is my logic sound?

4

There are 4 best solutions below

3
On

If all species were equally abundant and easy to catch, and assuming you can recognize an organism as belonging to a new species when you catch it, then you could use such estimators. But if there are lots of very rare species that would have very low probabilities of showing up in samples, there's no way to estimate that (unless you catch everything!).
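The effect of rare species can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the community sizes, the abundance weights, and the helper name `observed_species` are all hypothetical, chosen to contrast an evenly abundant community with one dominated by rare species.

```python
import random

def observed_species(weights, n_samples, seed=0):
    """Draw n_samples individuals at random (with replacement), where
    species i is caught with probability proportional to weights[i].
    Returns the number of distinct species that appear in the sample."""
    rng = random.Random(seed)
    species = list(range(len(weights)))
    draws = rng.choices(species, weights=weights, k=n_samples)
    return len(set(draws))

total = 1000
# Assumption: every species is equally abundant and equally easy to catch.
even = [1.0] * total
# Assumption: 100 common species plus 900 species that are 1000x rarer.
skewed = [1.0] * 100 + [0.001] * 900

seen_even = observed_species(even, 5000)
seen_skewed = observed_species(skewed, 5000)
print(seen_even, seen_skewed)
```

Both communities contain 1000 species, but in the skewed one the same sampling effort turns up only a small fraction of them, and most of the rare species leave no trace in the sample for an estimator to work with.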

5
On

This is a very intriguing question.

It resembles a heated discussion over at the Biology Stack Exchange. Here I summarize that post.

To relate it to your specific question: the answer I describe here is a conjecture about an upper limit on the number of species that can possibly exist, rather than a ballpark value.

Proposed Conjecture

The maximum number of species must be limited by the maximum combinatorial/permutational space that can be occupied by DNA. Thus, if there is a maximum physical genome size, that size determines the maximum number of species that can possibly exist.

Explanation

For example, let's say the maximum number of DNA base pairs able to fit in a genome was 3, and each can be one of {A,G,T,C}. Then there are 4^3 = 64 possible genomes, so there can be at most 64 distinct species. Extrapolating to a maximum genome size of X, at most 4^X species can possibly exist.

Formulation

The maximum number of possible species ≤ 4^X
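To get a feel for how large this bound is, here is a minimal sketch that evaluates 4^X. The function names are hypothetical, and the genome length of 3 billion base pairs (roughly the size of the human genome) is used purely as an illustrative assumption, not as the maximum physical genome size.

```python
import math

def max_genomes(x):
    """Number of distinct sequences of length x over {A, G, T, C}."""
    return 4 ** x

def log10_max_genomes(x):
    """log10 of 4**x, i.e. roughly how many decimal digits the bound has."""
    return x * math.log10(4)

# The toy example from the text: 3 base pairs.
print(max_genomes(3))

# Assumption for illustration only: X = 3 billion base pairs.
print(log10_max_genomes(3_000_000_000))
```

For the illustrative X, the bound has on the order of a billion decimal digits, so it holds as an upper limit but says very little about a ballpark value.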

0
On

For estimating the number of species in a specific habitat, look at:

http://viceroy.eeb.uconn.edu/estimates/

Especially look at Anna Chao's work, which is really quite brilliant. If you observe a total of $S$ species, then a rather reliable lower bound on the number of species you have not observed is $n_1^2/(2n_2)$, where $n_1$ is the number of species observed exactly once, and $n_2$ is the number of species observed exactly twice. The total number of species is then estimated to be at least $S + n_1^2/(2n_2)$.
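As a minimal sketch of the bound just described, the following computes $S + n_1^2/(2n_2)$ from a list of observed individuals. The function name `chao_lower_bound` and the toy sample are hypothetical, and raising an error when $n_2 = 0$ is a choice made here only because the formula as stated is undefined in that case.

```python
from collections import Counter

def chao_lower_bound(sample):
    """Lower bound on total species richness: S + n1^2 / (2 * n2).

    sample: iterable of species labels, one per observed individual.
    """
    per_species = Counter(sample)             # individuals seen per species
    freq = Counter(per_species.values())      # how many species were seen k times
    s_obs = len(per_species)                  # S: distinct species observed
    n1, n2 = freq[1], freq[2]                 # singletons and doubletons
    if n2 == 0:
        # The formula in the text divides by n2, so it is undefined here.
        raise ValueError("no species observed exactly twice")
    return s_obs + n1 ** 2 / (2 * n2)

# Toy data: 7 species observed, 4 of them once and 2 of them twice.
sample = ["a", "a", "a", "b", "b", "c", "c", "d", "e", "f", "g"]
print(chao_lower_bound(sample))  # 7 + 4**2 / (2 * 2) = 11.0
```

Many singletons relative to doubletons signal that a lot of species are still unseen, which is what pushes the bound above the observed count.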

I also co-authored papers on this subject:

http://www.math.missouri.edu/~stephen/preprints/class-novel.html

http://www.math.missouri.edu/~stephen/preprints/class-bayesian.html

http://www.math.missouri.edu/~stephen/preprints/schmidt-paper.html

The last paper was published, and is based on Chao's work.

0
On

I would argue yes and no.

Yes: I saw the claim in E. O. Wilson's Biodiversity that the unknown species are dominated by insects in the rain forest. If you believe this, you can essentially use sample-recapture techniques. I have read that it is standard practice to set off a gas bomb by a tree and count and identify everything that falls out. Do this for a bunch of trees and you can get a distribution of new species. Say each additional tree of the same species now yields five new insect species, whereas back when you had sampled only half as many trees, each one yielded ten. Now try a new species of tree and find a hundred new insect species. You can use these discovery rates (over many species of trees) to estimate a distribution.

No: do we believe that claim? Think about the relatively recent discovery of the hydrothermal vents in the ocean and the life around them. There is also new work on microscopic life buried in rocks and soil, and there is a lot of volume there. The rain forest work can give a lower bound, but there might be some large population we have missed entirely.