Let $X_1, X_2, \ldots$ be a sequence of independent random variables such that $X_i\sim Geo(p_i)$, and let $T\in\mathbb N$ be a fixed positive integer.
Next, define $N\triangleq \min\{{n\in\mathbb N}\mid \sum\limits_{i=1}^nX_i\ge T\}$.
How can we find $\mathbb E[N]$?
(An upper bound would also work).
This seems like an inverse version of the general form of Wald's equation.
Intuitively, I'd like to say that $\mathbb E[N]\approx\min\{{n\in\mathbb N}\mid \sum\limits_{i=1}^n \mathbb E[X_i]\ge T\} = \min\{{n\in\mathbb N}\mid \sum\limits_{i=1}^n 1/p_i \ge T\}$, but I'm looking for a formal argument.
If this helps, we can assume that $0.01\ge p_1\ge p_2\ge p_3\ge\ldots$
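This intuition is easy to check numerically. Below is a minimal Monte Carlo sketch; the rates $p_i = 1/(100+i)$ are only an illustrative choice satisfying the assumption above, not part of the question. It compares the empirical mean of $N$ against the mean-field guess $\min\{n \mid \sum_{i\le n} 1/p_i \ge T\}$.

```python
import math
import random

def p(i):
    # illustrative rates only: p_i = 1/(100 + i), so 0.01 >= p_1 >= p_2 >= ...
    return 1.0 / (100 + i)

def mean_field(T):
    # min{n : sum_{i<=n} 1/p_i >= T}, the heuristic guess for E[N]
    s, n = 0.0, 0
    while s < T:
        n += 1
        s += 1.0 / p(n)
    return n

def sample_N(T, rng):
    # one draw of N = min{n : X_1 + ... + X_n >= T}, with X_i ~ Geo(p_i) on {1, 2, ...}
    total, n = 0, 0
    while total < T:
        n += 1
        # inverse-CDF sample of a geometric with success probability p(n)
        u = rng.random()
        total += max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p(n))))
    return n

T = 100_000
rng = random.Random(0)
reps = 200
avg_N = sum(sample_N(T, rng) for _ in range(reps)) / reps
print("mean-field:", mean_field(T), " empirical E[N]:", avg_N)
```

For these rates the mean-field guess and the empirical average land close together, which is consistent with (but of course does not prove) the heuristic.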
In his answer, Ian gave a solution that handles two extreme cases: one where $\min\{{n\in\mathbb N}\mid \sum\limits_{i=1}^n 1/p_i \ge T\} = \Theta(T)$, and one where the $p_i$ decrease exponentially, so that looking for the $i$ with $p_i^{-1}\approx T$ gives a good approximation.
However, this approach doesn't seem to work well for what I think are the interesting cases. Since $P(B)=\prod_{i=1}^d(1-p_i)^{x_i-1}\le \prod_{i=1}^d(1-p_d)^{x_i-1}=(1-p_d)^{T-d}$, it fails in the following example: if $p_i = \Theta(1/i)$, then Ian's approach gives a bound of $\mathbb E[N]\le T/\log T$ (after optimizing $d$), while we expect $N$ to be of order $\sqrt T$. This is a nearly quadratic gap that the method cannot close.
The simplest thing to do is to split the space into a single good set and a single bad set. The good set is meant to be highly probable and yet also have good values of $N$; the bad set is meant to be highly improbable but perhaps contain bad values of $N$, hopefully still not so bad that we can't control them. One way to do this is to consider the "good" set
$$B=B(x)=\{ X_1 \geq x_1,X_2 \geq x_2,\dots,X_d \geq x_d \}$$
where $x$ is a vector of nonnegative integers satisfying $\sum_{i=1}^d x_i=T.$
We have
$$P(B)=\prod_{i=1}^d q_i^{x_i-1}$$
where $q_i=1-p_i$. On $B(x)$, $N \leq d(x)$, where $d(x)$ denotes the number of coordinates of $x$. On $B(x)^c$, $N \leq T$ (and this is the best we can do uniformly over all of $B(x)^c$). Thus
$$E[N] \leq d(x) P(B(x)) + (1-P(B(x)))T$$
for any $x$. So the question is to find a good $x$. Suppose $f(T)$ is the "mean field" solution, i.e.
$$f(T)=\min \left \{ n : \sum_{i=1}^n p_i^{-1} \geq T \right \}.$$
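As a way to experiment with choices of $x$, here is a numeric sketch. The rates $p_i = 1/(100+i)$ are only an illustrative choice, and the integer rounding of $x$ is glossed over: it computes $f(T)$ and evaluates the bound $d\,P(B(x)) + (1-P(B(x)))T$ with $x_i$ proportional to $1/p_i$ over the first $d$ coordinates.

```python
import math

def p(i):
    # illustrative rates only: p_i = 1/(100 + i)
    return 1.0 / (100 + i)

def f(T):
    # mean-field solution: min{n : sum_{i<=n} 1/p_i >= T}
    s, n = 0.0, 0
    while s < T:
        n += 1
        s += 1.0 / p(n)
    return n

def bound(d, T):
    # E[N] <= d*P(B(x)) + (1 - P(B(x)))*T, with x_i proportional to 1/p_i,
    # rescaled so that sum_i x_i = T (integer rounding of x is ignored here)
    inv = [1.0 / p(i) for i in range(1, d + 1)]
    scale = T / sum(inv)
    log_PB = sum((scale * inv[i - 1] - 1.0) * math.log(1.0 - p(i))
                 for i in range(1, d + 1))
    PB = math.exp(log_PB)
    return d * PB + (1.0 - PB) * T

T = 100_000
d0 = f(T)
for d in (d0, 2 * d0, 4 * d0):
    print(d, round(bound(d, T), 1))
```

Scanning $d$ this way shows how the two terms trade off; for slowly decaying rates like these, $P(B(x))$ stays tiny and the bound stays near $T$, which is consistent with the limitation discussed in the question.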
Presumably $N$ is typically around $f(T)$. So we should choose $x$ with $d(x)$ a bit bigger than $f(T)$, so that $1-P(B(x))$ is small, but not so much bigger that the first term becomes large. If we don't care about constant factors, then there are three cases: