What's the probability of completing the illustrated "binomial walk" without ever visiting a node above the baseline?

247 Views Asked by At

enter image description here

Consider the illustrated binomial (not binary) tree with $n$ steps (drawn for $n=5$, but consider $n$ variable). Start a random walk at the left-hand node, and at each step you have probability $p$ of visiting the node to your lower-right, and probability $1-p$ of visiting the node to your upper-right.

What's the probability, call it $P_n$ (or whatever else you like:), of completing an $n$-step walk without ever visiting any node above the dotted-line baseline, i.e., never visiting one of the labelled "prohibited nodes". You're allowed to visit nodes "touching" the baseline (possible on even-numbered steps only), but never nodes above it.

    Edit #3 (new binomial coeff identity???)
-------------------------------------------------------
@BrianTung answered the question below, and below that I added another "answer", just numerically confirming Brian's result, based on his $_nD_k$ "modified binomial coefficients" (see Brian's answer). And I subsequently modified that program, just to see if my original effort at an answer, based on my $N^m_n$ coefficients (see Edit#2 below), might also work. Kind of surprised me to find that my answer works, too.

And what just occurred to me is that since both answers involve exactly the same (somewhat modified) cumulative binomial distribution sum of the form $\sum_k \mbox{coef}_{n,k}p^k(1-p)^{n-k}$, therefore both our coefficeints must be identical. And (after juggling my non-standard $m,n$-notation in $N^m_n$ to Brian's standard $n,k$'s) that leads to the following necessary-but-goofy-looking identity, $$ \frac{n-2k+1}{n-k+1}\binom{n}{k} = \binom{n-1}{k} \ - \ \sum_{i=1}^{k-1} i\ \binom{n-i-2}{k-i-1}, \hspace{10pt} k\leq\frac n2$$ where Brian's $_nD_k$ is on the lhs and mine's on the rhs. Just to be sure, I numerically checked it, and it's correct. And maybe it algebraically simplifies to some common binomial coefficient formula, but I'm not seeing how to do that. And if it doesn't simplify, then you probably "saw it here first":).

    Motivation
----------------------
Firstly, much thanks to @saulspatz and @BrianTung for their elaborate effort replying to this question. So I thought I should add a paragraph explaining why I'd asked it (beyond recreational). That relates to the comment link https://en.wikipedia.org/wiki/Binomial_options_pricing_model I posted below   ( chiding :) @jorwiki for his terminology usage formality ). Anyway...

The purpose is pricing (getting the current value of) cashflows from consumer mortgages along a binomial tree of interest rates (determined from treasury notes and bonds, and from a given volatility parameter). But mortgages are like options due to prepayments, e.g., if you sell your house then you typically pay off the entire mortgage and all cashflows abruptly stop. Statistics for that kind of prepayment are pretty much actuarially determined from observed behaviors. But if interest rates go down a lot, people may choose to refinance their mortgages to take advantage of lower monthly payments (after accounting for refinance costs, etc). That's called "rational exercise", although prepayment models usually model plenty of not-so-rational behavior, too.

So the "baseline" here represents rational exercise -- the rate below which the mortgage is refinanced and cashflows stop. The actual model's on a computer, and this idealized closed-form solution is therefore irrelevant. But for testing and comparison purposes, it's a useful tool to have. Of course, however, I seem to have underestimated the number-theory-type stuff required for that solution :)

    EDIT#1 (some preceding work)
---------------------------------------------

As per @saulspatz's comment request below, I'm showing some work I'd already tried. But it's just to demonstrate that I tried. Don't actually try reading it too carefully. It's not well-written, and in the end it didn't lead much of anywhere.

For $p=.5$ I tried arguing heads/tails "paths", somewhat along the following lines.

                       tails<-----     oN00  ----->heads
                                      / \
 +-------------------------------+   /   \   +--------------------------------+
 | Nht denotes number of "legal"/   oN01  o   \An "illegal" path on the right |
 | paths from N00 to the node  /   / \   / \   \ side of the tree has more    |
 | representing h heads and   /   /   \ /   \   \ heads than tails before the |
 | t tails.  "Legal" paths   /   oN02  oN11  o   \     n-th trial.  The lower |
 | must stay within the     /   / \   / \   / \   \      left of the tree has |
 | labelled portion        /   /   \ /   \ /   \   \        too many tails to |
 | of the tree.           /   oN03  oN21  o     o   \          catch up by n. |
 +-----------------------+   / \   / \   / \   / \   +------------------------+
                            /   \ /   \ /   \ /   \
                           oN04  oN13  oN22  o     o
                          / \   / \   / \   / \   / \
                         /   \ /   \ /   \ /   \ /   \
                        oN05  oN14  oN23  o     o     o
                       / \   / \   / \   / \   / \   / \
                      /   \ /   \ /   \ /   \ /   \ /   \
                     oN06  oN15  oN24  oN33  o     o     o

For any $N^0_t$, note that $N^0_1=N^0_2=\cdots=1$ since there's only one possible path along the ``left edge'' of the tree, i.e., toss all tails).

For $N^1_6$, there's exactly one path from $N^0_6$ to $N^1_6$, one path from $N^0_5$ to $N^1_6$ ($N^0_5\rightarrow N^1_5\rightarrow N^1_6$) in addition to the preceding path through $N^0_6$, one path from $N^0_4$ to $N^1_6$ ($N^0_4\rightarrow N^1_4\rightarrow N^1_5\rightarrow N^1_6$) in addition to the preceding paths, $\ldots$, and, finally, one additional path from $N^0_1$ to $N^1_6$ (there are no additional paths from $N^0_0$ to $N^1_6$).

By similar reasoning we obtain the following sequence of iterative formulas:
$N^1_6$ = $N^0_6$ + $N^0_5$ + $N^0_4$ + $N^0_3$ + $N^0_2$ + $N^0_1$
$N^2_6$ = $N^1_6$ + $N^1_5$ + $N^1_4$ + $N^1_3$ + $N^1_2$
$N^3_6$ = $N^2_6$ + $N^2_5$ + $N^2_4$ + $N^2_3$
$N^4_6$ = $N^3_6$ + $N^3_5$ + $N^3_4$
$N^5_6$ = $N^4_6$ + $N^4_5$
$N^6_6$ = $N^6_6$

For "interior" nodes, the same reasoning applies, e.g.:
$N^3_5$ = $N^2_5$ + $N^2_4$ + $N^2_3$
etc.

For the general case we therefore conclude:
$N^m_n$ = $\sum_{k=m}^n N^{m-1}_k$
        = $\sum_{k=1}^n N^{m-1}_k$ $-$ $\sum_{k=1}^{m-1} N^{m-1}_k$

with $N^0_n=1$ (and, trivially, $N^1_n=n$).



    EDIT#2 (some recurrence relations)
-------------------------------------------------

Following along with @BrianTung's work below, I'm posting some recurrence relations that seemed to be useful while I was trying to find a closed-form solution for my $N^m_n$. I'll continue using my notation rather than Brian's pretty-much-corresponding ${}_nD_k$, just so I can transcribe my notes with hopefully minimal mistakes...

Firstly, a generalization of the usual $\sum_1^ni=\frac{n(n+1)}2$,
define $H^m_n$ as follows,
$ \begin{array}{cclcl} H^1_n & = & & & \mbox{$1$ for all $n$} \\ H^2_n & = & \sum_{i=1}^n H^1_i & = & n \\ H^3_n & = & \sum_{i=1}^n H^2_i & = & \frac{n(n+1)}{2!} \mbox{as usual}\\ H^4_n & = & \sum_{i=1}^n H^3_i & \stackrel{?}{=} & \underbrace{an^3+bn^2+cn}_{ \mbox{solve for $a,b,c$ $\ldots$}} \\ & & & = & \frac{1}{6}n^3 + \frac{1}{2}n^2 + \frac{1}{3}n \\ & & & = & \frac{n(n+1)(n+2)}{3!} \\ \end{array}$

Now, by examination we infer the general expression

$\begin{array}{ccl} H^m_n & = & \frac{1}{(m-1)!} \prod_{k=1}^{m-1} (n+k-1) \\ & = & \frac{(n+m-2)!}{(m-1)!(n-1)!} \\ & = & \binom{n+m-2}{m-1} = \binom{n+m-2}{n-1} \\ \end{array}$


So now a closed-form solution for $N^m_n$ can tentatively be developed from our earlier (from Edit#1)
$N^m_n$ = $\sum_{k=m}^n N^{m-1}_k$
        = $\sum_{k=1}^n N^{m-1}_k$ - $\sum_{k=1}^{m-1} N^{m-1}_k$

by using the above $H^m_n$ formula to iteratively evaluate...

$\begin{array}{cclclclcl} N^0_n &=& 1 \\ &\equiv& H^1_n \\ N^1_n &=& \sum_{k=1}^n N^0_k - \underbrace{\sum_{k=1}^0 N^0_k}_{=0} \\ &=& \sum_{k=1}^{n} H^1_n \\ &=& H^2_n \\ N^2_n &=& \sum_{k=1}^n N^1_k - \sum_{k=1}^1 N^1_k \\ &=& \sum_{k=1}^n H^2_k - \sum_{k=1}^1 H^2_k \\ &=& H^3_n - \underbrace{H^3_1}_{=1=1H^1_n} \\ &=& H^3_n - H^1_n \\ N^3_n &=& \sum_{k=1}^n(H^3_k - H^1_k) - \sum_{k=1}^2(H^3_k - H^1_k) \\ &=& (H^4_n -H^2_n) - \underbrace{(H^4_2 - H^2_2)}_{=2=2H^1_n} \\ N^4_n &=& (H^5_n - H^3_n - 2H^2_n) - \underbrace{(H^5_3 - H^3_3 - 2H^2_3)}_{=3=3H^1_n} \\ N^5_n &=& (H^6_n - H^4_n - 2H^3_n - 3H^2_n) - \underbrace{(H^6_4-H^4_4-2H^3_4-3H^2_4)}_{=4=4H^1_n} \\ \vdots \\ N^m_n &=& H^{m+1}_n - \sum_{k=1}^{m-1} kH^{m-k}_n \\ & & \mbox{iff for any integer $m$ we have } m \stackrel{?}{=} H^{m+2}_m - \sum_{k=1}^{m-1} kH^{m+1-k}_m\\ & & \mbox{which seems to work out for the cases I tested} \end{array}$

3

There are 3 best solutions below

4
On BEST ANSWER

ETA2: I rolled back, because the OP's notation is a little different from mine. OP uses $_mN_k$ for the number of ways for $k$ upward moves to be chosen out of $k+m$ total moves; I use $_nD_k$ for the number of ways for $k$ upward moves to be chosen out of $n$ total moves.


Partial answer. It occurs to me that one can proceed as one normally does with the binomial distribution, but one must use different coefficients. Whereas the normal binomial coefficients $_nC_k$ have a recurrence of

$$ _nC_k ={} _{n-1}C_{k-1} +{} _{n-1}C_k $$

our modified coefficients $_nD_k$ have a recurrence of

$$ _nD_k = \begin{cases} _{n-1}D_{k-1} +{} _{n-1}D_k & k \leq n/2 \\ 0 & \text{otherwise} \end{cases} $$

Once this is done, it seems to me that we can compute

$$ P(\text{permissible path of length $n$}) = \sum_{k=0}^{\lfloor n/2 \rfloor} {} _nD_k (1-p)^k p^{n-k} $$

A little pencil-and-paper noodling and trawling on OEIS seems to show that

$$ _nD_k = \frac{n-2k+1}{n-k+1} \binom{n}{k} $$

Note that this puts the Catalan numbers right on the critical line. I still have to figure out $_nD_k$ from first principles, though.

5
On

This is a follow-up to Brian Tung's answer, just confirming his solution. It really belongs in a comment, but it doesn't fit.

To verify that the formula for $_nD_k$ is correct, we have to check that is satisfies appropriate initial conditions, and check that it satisfies the recurrence. The formula gives $$_1D_1=0,\ _1D_0=1,$$ which is correct.

Now suppose that $k\le(n-1)/2$. Then also $k-1\le (n-1)/2$ and $k\le n/2$ so in terms of the recurrence, $D$ is not 0 and is given by the formula. We have$$ {_nD_k}-{_{n-1}D_{k-1}}-{_{n-1}D_k}=\\ \begin{align} &=\frac{n-2k+1}{n-k+1}{n\choose k}-\frac{n-2k+2}{n-k+1}{n-1\choose k-1}-\frac{n-2k}{n-k}{n-1\choose k}\\ &=\left(\frac{n-2k+1}{n-k+1}-\frac{n-2k+2}{n-k+1}\right){n-1\choose k-1}+ \left(\frac{n-2k+1}{n-k+1}-\frac{n-2k}{n-k}\right){n-1\choose k}\\ &=\frac{-1}{n-k+1}{n-1\choose k-1}+\frac{k}{(n-k)(n-k+1)}{n-1\choose k}\\&=0,\\ \end{align} $$ since $${n-1\choose k}=\frac{n-k}{k}{n-1\choose k-1}$$

Now suppose $k>(n-1)/2.$ The only case we have to check is when $k=n/2$ since if $k$ is any larger we have $_nD_k=0.$ Now we have $${_{2k}D_k}-{_{2k-1}D_{k-1}}-{_{2k-1}D_{k}}=\\ \begin{align} &={_{2k}D_k}-{_{2k-1}D_{k-1}}-0\\ &=\frac{1}{k+1}{2k\choose k}-\frac{2}{k+1}{2k-1\choose k-1}\\ &=\frac{1}{k+1}\left(\frac{(2k)!}{k!k!}-2{2k-1\choose k-1 }\right)\\ &=\frac{1}{k+1}\left(\frac{2(2k-1)!}{(k-1)!k!}-2{2k-1\choose k-1 } \right)\\&=0 \end{align} $$

NOTE Can someone tell me how to improve the formatting, please? On my Mac, at least, in something like $$_nD_k-_{n-1}D_{k-1}-_{n-1}D_k$$ the $n-1$ pre-subscripts are too far way from the $D\text'$s and too close to the minus signs.

EDIT All I had to do was to put the terms in braces to get $${_nD_k}-{_{n-1}D_{k-1}}-{_{n-1}D_k}$$

2
On

As per @saulspatz's answer, "This is [also] a follow-up to Brian Tung's answer, just confirming his solution." In this case, the confirmation's numerical. I coded a recursive solution to the problem, as well as Brian's closed-form solution, printing both results. And they both agree. For very small $n$, the answer's obvious, and we're both unmistakably correct. For larger $n$, correctness is hopefully inferred from the continued agreement.

So first, my code's below. And that's followed by the examples cited above, and a little further discussion. Here's the code...

/* ---
 * standard headers
 * ------------------- */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* ---
 * globals (to reduce recursive stack)
 * -------------------------------------- */
static  int n         = 16;     /* total# periods/steps in binomial tree */
static  double pup    = 0.5;    /* probability of "up"-step (pdown=1-pup) */
static  int maxup     = 999999; /* maximum allowable kup-kdown */
static  double npaths = 0.0;    /* #paths satisfying maxup constraint */

/****************************************************************************
 * Function:    pmoddyck ( k, kup )
 * Purpose:     See https://math.stackexchange.com/questions/2860403/
 *              ...recursively enumerate "modified dyck paths" through
 *              n-period binomial tree, and calculate probability
 *              of successfully completing the entire "binomial walk"
 * --------------------------------------------------------------------------
 * Arguments:   k (I)           int containing number periods/steps
 *                              already completed in binomial tree
 *              kup (I)         int containing number of up-steps
 * --------------------------------------------------------------------------
 * Returns:     (double)        probability of walking n-period tree without
 *                              ever crossing past "critical baseline"
 * --------------------------------------------------------------------------
 * Notes:     o call pmoddyck(0,0) to start the recursion,
 *              and return the overall probability (with npaths
 *              returning the #paths satisfying the "maxup" constraint)
 ***************************************************************************/
double  pmoddyck ( int k, int kup ) {
  /* note: kdown=k-kup, so kup-kdown=2*kup-k */
  if ( 2*kup - k > maxup ) return ( 0.0 ); /* abort failed paths */
  if ( k >= n ) {                       /*completed entire path successfully*/
    npaths += 1.0;                      /*#successful paths=2^n if maxup=999*/
    return ( pow(pup,(double)kup)*pow(1.-pup,(double)(k-kup)) ); }
  return ( pmoddyck(k+1,kup) + pmoddyck(k+1,kup+1) ); /* next step: down+up */
  } /* --- end-of-function pmoddyck() --- */

/****************************************************************************
 * Function:    Dcoef ( n, k )
 * Purpose:     Brian Tung's "modified binomial coefficient",
 *              see https://math.stackexchange.com/questions/2860403/
 * --------------------------------------------------------------------------
 * Arguments:   n (I)           n items...
 *              k (I)           ...taken k at a time
 * Returns:     (double)        as above, or 0 for any argument error
 * --------------------------------------------------------------------------
 * Notes:     o
 ***************************************************************************/
/* --- entry point --- */
double  Dcoef ( int n, int k ) {
  double bcoef(), dcoef = 0.0;
  if ( k <= n/2 )
    dcoef = bcoef(n,k) * ((double)(n-2*k+1))/((double)(n-k+1));
  return ( dcoef ); }


/****************************************************************************
 * Function:    bcoef ( n, k )
 * Purpose:     binomial coefficient = n!/(k!(n-k)!)
 * --------------------------------------------------------------------------
 * Arguments:   n (I)           n items...
 *              k (I)           ...taken k at a time
 * Returns:     (double)        as above, or 0 for any argument error
 * --------------------------------------------------------------------------
 * Notes:     o Algorithm avoids dividing one (very) large number
 *              by another using
 *              n!/k!(n-k)! = (n-k+1)(n-k+2)...(n-k+i)/1*2*...*k   if k<=n-k,
 *                    or    = (k+1)(k+2)...(k+(n-k))/1*2*...*(n-k) if k>n-k
 *              In both cases the #terms in numerator and denom is the same.
 ***************************************************************************/
/* --- entry point --- */
double  bcoef ( int n, int k ) {
  double coef = 1.0;                    /* init with multiplicative ident */
  /* --- bcoef(n,k)=bcoef(n,n-k), so choose smaller number terms --- */
  int   kterm=0, nterms=n-k;            /* number of terms... */
  if ( k<nterms ) nterms=k;             /* ...is lesser of k,n-k */
  /* --- accumulate coef=coef*(n-nterms+kterm)/kterm, kterm=1...nterms --- */
  while ( kterm++ < nterms )            /* need another term */
    coef *= ((double)(n-nterms+kterm))/((double)kterm);
  return ( coef );                      /* return binomial coef to caller */
  } /* --- end-of-function bcoef() --- */


/****************************************************************************
 * Program:     moddyck  n  maxup  pup
 * Purpose:     Test Brian Tung's closed-form solution to
 *                https://math.stackexchange.com/questions/2860403/
 *              as well as my recursive numerical evaluation
 *              of the same problem, and see if they agree.
 * --------------------------------------------------------------------------
 * Command-Line Arguments:
 *              n (I)           int containing #periods/steps in tree
 *              maxup (I)       int containing maximum allowed
 *                                 #up-steps - #down-steps
 *                              anywhere along path...
 *                                 maxup=0   is the stackexchange problem
 *                                 maxup=999 permits all paths for a check
 * --------------------------------------------------------------------------
 * Notes:     o
 ***************************************************************************/
int     main ( int argc, char *argv[] ) {
  /* ---
   * allocations and declarations
   * ------------------------------- */
  double pmoddyck(),                    /* recursive numerical evaluation */
         pBrian = 0.0,                  /* Brian's probability, */
         Bpaths = 0.0;                  /* and Brian's #paths count */
  double Dcoef();                       /* Brian's modified binomial coef */
  int k=0;                              /* current period/step in tree */
  /* ---
   * command-line args
   * -------------------- */
  n     = ( argc > 1 ?  atoi(argv[1])  :  16 );     /*16-period tree*/
  maxup = ( argc > 2 ?  atoi(argv[2])  :  999999 ); /*allow all paths*/
  pup   = ( argc > 3 ?  atof(argv[3])  :  0.5 );    /*up/down=50/50*/
  /* ---
   * recursive evalutaion
   * ----------------------- */
  npaths = 0.0;
  printf ( " #paths=%.2lf, pmoddyck(n=%d,maxup=%d,pup=%.3lf) = %.8lf\n",
    npaths, n,maxup,pup, pmoddyck(0,0) );
  /* ---
   * Brian Tung's solution
   * ----------------------- */
  Bpaths = 0.0;
  for ( k=0; k<=(n/2); k++ ) {
    double Dnk = Dcoef(n,k);
    double p = 1.0 - pup;               /* oops -- I reversed my p's */
    Bpaths += Dnk;
    pBrian += Dnk*pow(1.-p,(double)k)*pow(p,(double)(n-k));
    } /* --- end-of-for(k) --- */
  printf ( " Bpaths=%.2lf, pBrian(n=%d,maxup=0,pup=%.3lf) = %.8lf\n",
    Bpaths, n,pup, pBrian );
  exit ( 0 );
  } /* --- end-of-function main() --- */
/* ----------------------- END-OF-FILE MODDYCK.C ------------------------- */

So if you want to run this yourself, cut-and-paste it into file moddyck.c (modified Dyck paths, as per @saulspatz's reference cited in the comments to the original question, and Brian's mention of Catalan numbers). Then compile and run it as

cc moddyck.c -lm -o moddyck

./moddyck 16 0 .5

where that run is for $n=16$, $\mbox{maxup}=0$ (explained below), and $p=0.5$. The zero for maxup puts the dotted baseline/critical-line exactly where shown along the center of the binomial tree. $\mbox{maxup}=1$ tells my recursive algorithm to allow paths that are one node higher. However, Brian's solution doesn't accommodate that, so to compare the solutions you have to stick with $\mbox{maxup}=0$.

And here are some promised small-$n$ comparisons...

bash-4.3$ ./moddyck 1 0 .5
 #paths=1.00, pmoddyck(n=1,maxup=0,pup=0.500) = 0.50000000
 Bpaths=1.00, pBrian(n=1,maxup=0,pup=0.500) = 0.50000000
bash-4.3$ ./moddyck 2 0 .5
 #paths=2.00, pmoddyck(n=2,maxup=0,pup=0.500) = 0.50000000
 Bpaths=2.00, pBrian(n=2,maxup=0,pup=0.500) = 0.50000000
bash-4.3$ ./moddyck 3 0 .5
 #paths=3.00, pmoddyck(n=3,maxup=0,pup=0.500) = 0.37500000
 Bpaths=3.00, pBrian(n=3,maxup=0,pup=0.500) = 0.37500000
bash-4.3$ ./moddyck 4 0 .5
 #paths=6.00, pmoddyck(n=4,maxup=0,pup=0.500) = 0.37500000
 Bpaths=6.00, pBrian(n=4,maxup=0,pup=0.500) = 0.37500000
bash-4.3$ ./moddyck 5 0 .5
 #paths=10.00, pmoddyck(n=5,maxup=0,pup=0.500) = 0.31250000
 Bpaths=10.00, pBrian(n=5,maxup=0,pup=0.500) = 0.31250000

There are always $2^n$ total paths (duh, it's a binomial tree:), and our displayed #paths/Bpaths are the number of them that successfully traverse the tree. So, for $n=1,2,3$ you can pretty much draw the successful paths in your head, and see that we're both right. But you ain't gonna want to try that with the next few examples below, where I also changed the $p$'s around a little, just to check that also...

bash-4.3$ ./moddyck 16 0 .5
 #paths=12870.00, pmoddyck(n=16,maxup=0,pup=0.500) = 0.19638062
 Bpaths=12870.00, pBrian(n=16,maxup=0,pup=0.500) = 0.19638062
bash-4.3$ ./moddyck 18 0 .75
 #paths=48620.00, pmoddyck(n=18,maxup=0,pup=0.750) = 0.00308244
 Bpaths=48620.00, pBrian(n=18,maxup=0,pup=0.750) = 0.00308244
bash-4.3$ ./moddyck 20 0 .25
 #paths=184756.00, pmoddyck(n=20,maxup=0,pup=0.250) = 0.66734600
 Bpaths=184756.00, pBrian(n=20,maxup=0,pup=0.250) = 0.66734600
bash-4.3$ ./moddyck 22 0 .5
 #paths=705432.00, pmoddyck(n=22,maxup=0,pup=0.500) = 0.16818810
 Bpaths=705432.00, pBrian(n=22,maxup=0,pup=0.500) = 0.16818810
bash-4.3$ ./moddyck 24 0 .75
 #paths=2704156.00, pmoddyck(n=24,maxup=0,pup=0.750) = 0.00091751
 Bpaths=2704156.00, pBrian(n=24,maxup=0,pup=0.750) = 0.00091751