\(N\) Internal Nodes">

The Average Number of Leaves in a Binary Tree of \(N\) Internal Nodes

Posted by Beetle B. on Mon 22 September 2014

Let’s use generating functions to count the average number of leaves in a binary tree with \(N\) internal nodes.

Definitions

  • Let \(P\) be the set of all binary trees.
  • Let \(p\in P\) be a given binary tree
  • Let \(|p|\) be the number of internal nodes in a binary tree.
  • Let \(P_{N}=\) the number of binary trees with \(N\) internal nodes. We showed this earlier to be \(\frac{1}{N+1}\dbinom{2N}{N}\).
  • \(P(z)=\frac{1}{2}\left(1-\sqrt{1-4z}\right)\)
  • Let \(cost(p)\) be the number of leaves in \(p\).

The cumulative cost function is:

\begin{equation*} C(z)=\sum_{p\in P}cost(p)z^{|p|} \end{equation*}
\begin{equation*} C(z)=z+\sum_{p_{L}}\sum_{p_{R}}\left[cost(p_{L})+cost(p_{R})\right]z^{|p_{L}|+|p_{R}|+1} \end{equation*}

This is the cost of:

  • A tree with a single node and 2 leaves (\(z\)).
  • A tree with the root (which explains the 1 in the exponent), and the left and right subtrees.

If you expand the terms out, you’ll get:

\begin{equation*} C(z) = z\left[1+2C(z)P(z)\right] \end{equation*}
\begin{equation*} C(z)=\frac{z}{\sqrt{1-4z}} \end{equation*}

Expanding it out using the Binomial expansion and simplifying, we get:

\begin{equation*} C_{N}=\dbinom{2N-2}{N-1} \end{equation*}

And so the average number of leaves is:

\begin{equation*} \frac{N^{2}(N+1)}{2N(2N-1)} \end{equation*}

As you take the limit in large \(N\), you get \(N/4\).