Gradient Boosting (XGBoost) definition help

I was hoping somebody could help explain the definition below. In particular, I have trouble understanding the definition of the space of regression trees $$F = \{ f(\mathbf{x}) = w_{q(\mathbf{x})} \} \quad (q: \mathbb{R}^m \to T,\ w \in \mathbb{R}^T)$$

$q$ is a function that maps a vector to the number of trees? Doesn't make sense to me.

$w_i$ is the score in the $i^{th}$ leaf, but what is meant by $w_{q(x)}$ then? Is $w$ indexed by the structure of the tree $q$?

$w \in \mathbb{R}^T$: I understand this as there being $T$ leaves, with $w$ a vector of $T$ leaf scores.

But what is meant by $F=\{f(x) = w_{q(x)}\}$?

Image from: "XGBoost: A Scalable Tree Boosting System": https://arxiv.org/abs/1603.02754

Best answer

$q_k$ is a function that maps a data point $x_i$ (called an "example" in the article) to a leaf index between $1$ and $T_k$, where $T_k$ is the number of leaves in the $k$th tree (it can vary from tree to tree) and $k$ runs from $1$ to $K$. So $T$ counts the leaves of a single tree, not the number of trees. The article is a bit confusing here because it writes $q$ and $T$ for every tree, without the index $k$.

$w_{k_i}$ is the score of the $i$th leaf of the $k$th tree (shown in the article as $w_i$). This is also a bit confusing, since each tree has its own set of $w$'s and its own range of values for $i$ (from $1$ to $T_k$). In this notation, $w_{k_{q_k(x_i)}}$ is the score of the $q_k(x_i)$th leaf of the $k$th tree, i.e. the $q_k(x_i)$th entry of $w_k$. (In particular, $q_k:\mathbb R^m\rightarrow \{1,2,...,T_k\}$ and $w_k=(w_{k_1},w_{k_2},...,w_{k_{T_k}})\in\mathbb R^{T_k}$.)
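To make $f(x)=w_{q(x)}$ concrete, here is a minimal sketch of a single tree in Python. The split thresholds and leaf scores are made up for illustration and do not come from the article, and the leaves are numbered from $0$ rather than $1$ because Python arrays are 0-indexed.

```python
import numpy as np

def q(x):
    """Hypothetical tree structure with T = 3 leaves.

    Maps a vector x in R^2 to a leaf index in {0, 1, 2}.
    The thresholds are invented for this example.
    """
    if x[0] < 0.5:
        return 0
    return 1 if x[1] < 2.0 else 2

# Leaf scores: w is a vector in R^T, one score per leaf (values invented).
w = np.array([-1.0, 0.3, 2.5])

def f(x):
    """The tree as a function: look up the score of the leaf that x falls into."""
    return w[q(x)]

x = np.array([0.9, 3.0])
print(q(x), f(x))  # leaf index chosen by q, then the score stored at that index
```

So the subscript $q(x)$ is just an index into the vector $w$: the structure $q$ decides which leaf, and $w$ stores what that leaf outputs.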

Lastly, $\mathscr F=\{f(x)=w_{q(x)}\}$ is the space of all functions of this form: each one takes an example $x_i$, maps it to a leaf of its tree via $q$, and returns that leaf's score. The $K$ trees of the ensemble are $K$ members of this space, $f_k(x)=w_{k_{q_k(x)}}$ for $k=1,2,...,K$; given $x_i$, tree $k$ returns $w_{k_j}$ with $j=q_k(x_i)$. The ensemble prediction is the sum of the (continuous) leaf scores selected in each tree: $\hat y_i=\sum_{k=1}^K f_k(x_i)$.
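The summation over trees can be sketched the same way. This is only an illustration under assumed values: the two trees below, their thresholds, and their scores are hypothetical, and leaves are numbered from $0$ to match Python indexing. Note that the trees have different numbers of leaves ($T_1=2$, $T_2=3$).

```python
import numpy as np

def q1(x):
    """Hypothetical structure of tree 1 (T_1 = 2 leaves)."""
    return 0 if x[0] < 0.5 else 1

def q2(x):
    """Hypothetical structure of tree 2 (T_2 = 3 leaves)."""
    if x[1] < 1.0:
        return 0
    return 1 if x[1] < 3.0 else 2

w1 = np.array([0.4, -0.2])       # leaf scores of tree 1, a vector in R^{T_1}
w2 = np.array([1.0, 0.1, -0.5])  # leaf scores of tree 2, a vector in R^{T_2}

# Each tree f_k is fully described by its pair (q_k, w_k).
trees = [(q1, w1), (q2, w2)]

def predict(x):
    """Ensemble output: sum over k of w_k[q_k(x)]."""
    return sum(w_k[q_k(x)] for q_k, w_k in trees)

x = np.array([0.2, 2.0])
print(predict(x))  # tree 1 picks leaf 0, tree 2 picks leaf 1; their scores are added
```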