Closed Form for Sum of Nodes in Binary Tree

Question

Closed Form for Sum of Nodes in Binary Tree

894 Views Asked by Bumbble Comm At 31 Mar 2026 - 7:43

Consider a binary tree $T$ with nodes in $\mathbb{Z}^+$, where level $k$ of $T$ contains nodes $2^k$ through $2^{k + 1} - 1$. I have some problems that involve visiting the nodes of $T$ in their natural order up to a given node and computing some value that depends on the nodes encountered. Here is an example of the nodes encountered when visiting every node up to $6$ and returning to the root:

$$ \mathbf{1}, \ \mathbf{2}, \ 1, \ \mathbf{3}, \ 1, \ 2, \ \mathbf{4}, \ 2, \ \mathbf{5}, \ 2, \ 1, \ 3, \ \mathbf{6}, \ 3, \ 1 $$

The problem that's giving me trouble is computing the sum $S_N(n)$ of nodes encountered when visiting every node of $T$ up to $n$ and returning to the root. For $n = 2^{k + 1} - 1$, some examples of $S_N(n)$ are: $$ \begin{alignat*}{3} & S_N(1) && ~=~ \mathbf{1} \\ & S_N(3) && ~=~ S(1) + \mathbf{2} + 1 + \mathbf{3} + 1 ~=~ 8 \\ & S_N(7) && ~=~ S(3) + 2 + \mathbf{4} + 2 + \mathbf{5} + 2 + 1 + 3 + \mathbf{6} + 3 + \mathbf{7} + 3 + 1 ~=~ 47 \end{alignat*} $$

To compute the sum $S_L(k)$ of nodes encountered when visiting the nodes of $T$ up to level $k$ and returning to the root, note that every node at levels $1$ to $k - 1$ must be visited three times (the first node is visited twice). Since $S_L(0) = 1$, it follows that

$$ \begin{alignat*}{3} & S_L(k) && ~=~ 1 + \sum_{j = 1}^k \left ( 3 \sum_{i = 1}^{2^j - 1} i + \sum_{i = 2^j}^{2^{j + 1} - 1} i - 1 \right ) \\ & && ~=~ 1 + \sum_{j = 1}^k \left ( 3 \cdot 2^{2j} - 2^{j + 1} - 1 \right ) \\ & && ~=~ 4(4^k - 2^k) - k + 1 \end{alignat*}$$

However, I haven't been able to find a closed form for $S_N(n)$. Let $\mathsf{lca}(n)$ denote the least common ancestor of $n - 1$ and $n$, where $n$ has level $k$ and $n > 2^k$. To find the nodes that must be added to $S_N(n - 1)$ to obtain $S_N(n)$ when moving from $n - 1$ to $n$ on $k$, it suffices to add the sum of $n$, $\mathsf{lca}(n)$, and twice the sum of the interior nodes along the path from $\mathsf{lca}(n)$ to $n$, to $S_N(n - 1)$. This sum can be found by evaluating the function

$$ \begin{alignat*}{3} & \tau(n, 0) && ~=~ \lfloor n / 2 \rfloor \\ & \tau(n, i) && ~=~ 2 \lfloor n / 2 \rfloor + \tau(\lfloor n / 2 \rfloor, i - 1) \end{alignat*} $$

at the point $(n, \mathsf{bits}_1(n - 1))$ and adding $n$, where $\mathsf{bits}_1(m)$ denotes the number of trailing $1$ bits in $m$. You can also obtain this value by recursion on the binary representations of $n - 1$ and $n$. Unfortunately, both functions give unwieldy expressions for $S_N(n)$. Suppose that $n$ occurs at level $j$. If $k = 2^j$, then for $n > 1$ it should follow that

$$\begin{alignat*}{3} & S_N(n) && ~=~ S_L(j - 1) + 2 \sum_{i = 0}^{j - 1} 2^i + \sum_{i = k}^n i + \sum_{m = k + 1}^n \tau(m, \mathsf{bits}_1(m - 1)) - 1\\ & && ~=~ S_L(j - 1) + 2k + \frac{1}{2}(n - k + 1)(n + k) + \sum_{m = k + 1}^n \tau(m, \mathsf{bits}_1(m - 1)) - 3 \\ \end{alignat*}$$

Assuming I haven't made any mistakes, this appears to be correct. It agrees with small cases when computed by hand or computer. However, it isn't a closed form. Furthermore, it relies on recursive functions that encode algorithms over binary trees. I think the approach of computing least common ancestors and summing down paths is correct, but the end result is lacking.

I can find a closed form for $\mathsf{bits}_1(n)$, but not $\tau(n, i)$ or the unevaluated sum. Is there a better way to think about this problem, or express the solution, to avoid the special functions and mess of notation?

Original Q&A

There are 1 best solutions below

**Bumbble Comm** · Accepted Answer

Let's try again. I think I understand what you intend now.

Problem statement: Let there be a heap (a complete binary tree) containing $n$ nodes (internal and leaf). Array the tree so that the root is at the "top" and all nodes of depth $k$ are in the $k^\text{th}$ row. Number the nodes sequentially starting with "$1$" at the root and incrementing, sequentially (left to right) among parent node children, sequentially through parents (left to right) in each depth, and sequentially (top to bottom) through the set of depths. Thus, the left child of the root (node $1$) is node $2$ and the right child is node $3$. To help clarify this numbering, the lowest numbered node at depth $k$ is $2^{k-1}$ and the highest is $2^k - 1$.

We now have a heap each node of which has a number. We traverse the tree, accumulating node numbers every time we reach a node (descending, switching from left to right sibling, or ascending). In particular, (where by "common ancestor", we mean "common ancestor of greatest depth")

for j = 1 to n
    If j-1 is not the common ancestor of j and j-1, ascend to the 
        common ancestor of j$ and $j-1$, accumulating node numbers 
        as we visit them.
    Descend to the node j, accumulating node numbers as we visit them
Next j.
Ascend to node 1, accumulating node numbers as we visit them.

Note:

The condition on common ancestors is to provide the initial sequence $(1, 2, 1)$ rather than $(1, 1, 2, 1)$.
For a node numbered $2^k - 1$, the common ancestor with its successor is node $1$. This will provide a natural place to break our accumulations into chunks.
For all $m > n$, the sequence of accumulations up to $j=n$ is a prefix of the accumulations up to $j=m$.

For $n = 2^k - 1$, the process is very simple because each depth is completely filled. We perform a union of pre-, in-, and post-order traversal, accumulating node numbers each of the (up to) three times that we visit them, once for each level of the tree. That is, we sum the traversal on the filled heap $\{1\}$, then the filled heap $\{1, \{2\}, \{3\}\}$, and so on until we have summed the filled heaps of each depth. Unfortunately, the depth $1$ subtree is "different", so we have to manually adjust. However, all the other filled trees of each depth behave the same -- each internal node is visited three times and each leaf is visited once. Let $s(k)$ be the sum for the single pass of the filled tree of depth $k$ (that is, the accumulations for $j \in [2^{k-1}, 2^k -1]$). We have \begin{align*} s(1) &= 1 \text{, and for $k > 1$, } \\ s(k) &= 3 \sum_{j=1}^{2^{k-1}-1} j + \sum_{j=2^{k-1}}^{2^k -1} j \\ &= 2^{k-2}(3 \cdot 2^k - 4) \text{.} \end{align*} and the total accumulation over all $k$ filled trees is \begin{align*} S_N(2^k -1) &= 1 + \sum_{j=2}^k s(j) - 1 \\ &= (2^k - 1)^2 +1 - k \text{.} \end{align*} (This would be simpler if we were treating the transition from node 1 to node 2 as if there were ascending to node 1 from node 1, accumulating a $1$. If we did so, the sum would start with $j=1$. Also, this subtracts out a "1" because after ascending to node 1 at the end of one filled tree, we do not accumulate to start at 1 on the next filled tree.)

Now we have to figure out what to do with partially filled greatest depths. To do so, we write $n$ in binary. The position of the leading $1$ tells us the depth of the tree (which is the depth that is partially filled). Each subsequent bit encodes which of the children of the "current" node is the node leading to node $n$. For each subsequent $0$ bit, only the left child has descendants in the partially filled depth. For each subsequent $1$ bit, the left child has a filled subtree all the way down to the partially filled depth, and the right child also has descendants in the partially filled depth (but not a filled subtree unless $n$ is handled in the "easy" case above). Examples to set intuition:

$n=2 = 10_2$: The partially filled depth is $k=2$ and the root node has no right descendants in the partially filled depth, so will be visited twice: once in pre-order and once in post-order.
$n = 6 = 110_2$: The partially filled depth is $k=3$. From the second "$1$", the root node has a filled left subtree all the way down to depth $3$ and the root node's right subtree is present but not filled. The root node will be visited three times: once each in pre-, in- (transitioning from left subtree to right subtree), and post- order. From the "$0$", node 3's right subtree (consisting of 7) is absent and left subtree (consisting of 6) is present. So node 3 will be visited twice, once pre-, and once post-.

So now we need to evaluate sums of nodes on filled subtrees. Consider the subtree rooted at $m$. It's children are the nodes $[2m, 2m+1]$. Their children are the nodes $[4m, 4m+3]$, and so on. We only want to sum nodes down to level $k$ (so only $k - \lfloor \log_2(m) \rfloor$ levels, including the level containing node $m$). These are all internal nodes, so are visited three times, except for the leaves. The sum we are interested in is (where, as per standard practice, an empty sum is zero), \begin{align*} k &= 1 + \lfloor \log_2 n \rfloor \\ \ell &= \lfloor \log_2 m \rfloor \\ s_m(k) &= 3 \sum_{j=0}^{k-2-\ell} \sum_{i = 0}^{2^j - 1} (2^j m + i) + \sum_{i = 0}^{2^{k-1-\ell} - 1} (2^{k-1-\ell} m + i) \\ &= 2^{-2 \ell -3} \left(\left(3 \cdot 2^{j+1} m-1\right) 2^{k+\ell +1} \right. \\ &\qquad \left. {} - 3 \cdot 2^{j+2\ell +2} \left(-2^j k+\left(2^j-1\right) \ell +2^j+k+2 m-1\right) \right. \\ &\qquad \left. {} + 4^k (2m+1)\right) \text{.} \end{align*}

(I will not be terribly surprised if there are bugs in the above. That was a lot of ugly algebra.)

So, to recap:

Sum up the filled trees. If the last level is incompletely filled, ...
Use the binary expansion of $n$ to determine which full subtrees exist in the sum.
Use the last equation to provide contributions for those subtrees.

Closed Form for Sum of Nodes in Binary Tree

There are 1 best solutions below

Related Questions in DISCRETE-MATHEMATICS

Related Questions in COMPUTER-SCIENCE

Related Questions in TREES

Trending Questions

Popular # Hahtags

Popular Questions