This is a problem I have recently come across and have been trying to solve it mathematically.
- Suppose there is a pond with 100 fish
- Each day, there is a 5% chance that population of the pond increases by 5% of its current population
- A 5% chance that the population of the pond decreases by 5% of its current population a 90% chance that the population of the pond stays the same
In the past, I have come up with methods to estimate the pond population using recursive formulas (e.g. Expected Score in a Coin Flipping Game ) as well as simulation based techniques to show how the pond population is expected to change (R Code):
set.seed(123)
n_days <- 1000
pond_population <- rep(0, n_days)
pond_population[1] <- 100
for (i in 2:n_days) {
prob <- runif(1)
if (prob <= 0.05) {
pond_population[i] <- pond_population[i-1] + round(pond_population[i-1] * 0.05)
} else if (prob > 0.05 && prob <= 0.10) {
pond_population[i] <- pond_population[i-1] - round(pond_population[i-1] * 0.05)
} else {
pond_population[i] <- pond_population[i-1]
}
}
plot(pond_population, type = "l", main = "Pond Population Over Time", xlab = "Day", ylab = "Population")
I thought of an extension to this problem: Suppose each day you catch 10 distinct fish from this pond, tag these fish, and then put them back into the pond. Naturally, it is possible that some days you will catch fish that you previously caught in the past - and it is also possible that some of the fish you caught in the past will die. After you have finished fishing on the 1000th day - what percent of the current fish pond population will be known to you?
I am not sure how to create a mathematical formula which can estimate the expected number of fish you have seen on the 1000th day. I also don't know how to create a mathematical formula which can estimate the probability of observing X% of the fish on the 1000th day.
Recently, I was shown (https://stackoverflow.com/a/76701995/13203841) how to use a simulation based approach to estimate the percentage of the fish pond seen on the 1000th day:
library(tidyverse)
set.seed(123)
fish_pond_sim <- function(pop=100, days=1000, fished_per_day=10) {
fish <- tibble(
id = 1:pop,
seen = FALSE,
dead = FALSE
)
results <- tibble(
population = pop,
day = 1,
seen = 0,
dead = 0,
seen_and_alive = 0
)
living <- pop
for (i in 2:days) {
prob <- runif(1)
five_percent <- round(length(living) * 0.05)
if (prob <= 0.05) {
five_pct_sample <- sample(living, five_percent, replace = FALSE)
fish[fish$id %in% five_pct_sample,]$dead <- TRUE
} else if (prob > 0.05 && prob <= 0.10) {
fish <- fish %>% add_row(
id = max(fish$id):(max(fish$id) + five_percent),
seen = FALSE,
dead = FALSE
)
}
fished_sample <- sample(living, fished_per_day, replace = FALSE)
fish[fish$id %in% fished_sample,]$seen <- TRUE
living <- fish[!fish$dead,]$id
results <- results %>% add_row(
population = length(living),
day = i,
seen = sum(fish$seen),
dead = sum(fish$dead),
seen_and_alive = sum(fish$seen & !fish$dead)
)
}
return(results)
}
result <- fish_pond_sim(1000, 1000)
result %>%
slice_tail() %>%
pull(seen_and_alive_percentage) # 84.00424
result %>%
ggplot(aes(x = day)) +
geom_line(aes(y = population, color = "Population")) +
geom_line(aes(y = seen_and_alive, color = "Seen and Alive")) +
theme_bw()
But I was wondering if someone could help me figure out a mathematical formula to estimate these probabilities?
Thanks!

