How does population size affect sampling?


I have two components: a service (creates logs) and a logger (writes logs to a datastore).

The writing mechanism of the logger is sampled: the logger writes only 1 in 10 logs to the datastore.

A log is just a string, e.g. "foo".

In my application, there is a population of 1M possible unique strings.

Say my service produces 100 logs every second.

  1. Can I be assured that every generated log would be logged to the datastore? What is the likelihood that a generated log would never be written to the datastore?

  2. What if the population size increased to 1B - would the likelihood still be the same?

Intuitively, it seems to me that if the population size decreased, the likelihood of every log generated by the service being written to the datastore would increase.
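To make that intuition concrete, here is a rough Python sketch of how the numbers could be worked out. It rests on two assumptions that are not stated above: the logger samples each log independently with probability 0.1 (rather than writing exactly every 10th log), and the service picks each string uniformly at random from the population. The function names and the one-hour window are made up for illustration.

```python
import math


def prob_missed_given_occurrences(occurrences, sample_rate=0.1):
    """Chance a string is never written, given it was generated `occurrences` times.

    Assumes each occurrence is sampled independently with probability sample_rate.
    """
    return (1.0 - sample_rate) ** occurrences


def prob_never_written(population, logs_per_sec=100, seconds=3600, sample_rate=0.1):
    """Chance that one particular string is never written within the time window.

    Assumes every generated log is a uniform random draw from `population` strings,
    so a single log is "this string AND sampled" with probability sample_rate / population.
    """
    per_log_hit = sample_rate / population
    total_logs = logs_per_sec * seconds
    # log1p keeps precision when per_log_hit is tiny (e.g. a population of 1B).
    return math.exp(total_logs * math.log1p(-per_log_hit))


if __name__ == "__main__":
    for population in (10**6, 10**9):
        p = prob_never_written(population)
        print(f"population={population:>13,}: P(a given string is never written in 1h) = {p:.6f}")
```

Under these assumptions, the chance that a given string is missed is lower for the 1M population than for the 1B one, simply because each string is generated more often. The per-log sampling rate (1 in 10) is the same in both cases; what changes is how many times each string gets a chance to be sampled.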

I am a complete dummy, so would appreciate some guidance on how to think about such problems :)