TreeHFD

TreeHFD decomposition of a single tree.

class treehfd.tree.TreeHFD(tree_table: DataFrame, interaction_order: int, interaction_list: ndarray | None, depth_variable: int)

This class implements the TreeHFD decomposition of a single tree of an ensemble. A least-squares problem is solved to obtain the coefficients that define the components of the decomposition. XGBTreeHFD calls this class sequentially, once per tree, to obtain the TreeHFD decomposition of an xgboost tree ensemble.
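As a rough illustration of the least-squares step, the sketch below fits piecewise-constant main effects to the output of a hypothetical additive tree with two splits, using one indicator column per cell and np.linalg.lstsq. It uses reference-cell coding and ignores interactions. It is a simplification under these stated assumptions, not the constrained problem that TreeHFD actually solves.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))

def toy_tree(X):
    """Hypothetical additive tree: splits at x0 = 0.5 and x1 = 0.3."""
    return np.where(X[:, 0] < 0.5, 1.0, 3.0) + np.where(X[:, 1] < 0.3, -2.0, 0.5)

y_tree = toy_tree(X)

# Cell index of each observation along each variable (0 = below split, 1 = above).
cells0 = (X[:, 0] >= 0.5).astype(int)
cells1 = (X[:, 1] >= 0.3).astype(int)

# Design matrix: intercept plus one indicator per non-reference cell.
n = X.shape[0]
design = np.column_stack([np.ones(n), cells0, cells1]).astype(float)

# Least-squares coefficients: [intercept, x0 effect, x1 effect].
coeffs, *_ = np.linalg.lstsq(design, y_tree, rcond=None)
y_fit = design @ coeffs
```

Because this toy tree is exactly additive, the fit reproduces y_tree and the coefficients are the reference-cell value (-1.0) and the two jumps (2.0 and 2.5).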

Parameters:
  • tree_table (pd.DataFrame) – The table with the structure of the considered tree, obtained from xgb_model.get_booster().trees_to_dataframe().

  • interaction_order (int, default=2) – Set to 1 to fit only main effects, or to 2 to also include second-order interactions in the TreeHFD decomposition.

  • interaction_list (np.ndarray, default=None) – Predefined list of second-order interactions to include in the decomposition. Each row defines one interaction as a pair of integer variable indices. If None (default), interactions are automatically extracted from the tree paths.

  • depth_variable (int) – Only variables used for splits in the first depth_variable levels of the tree are selected for the components of the decomposition.
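To make the expected inputs concrete, the sketch below builds an interaction_list by hand (two integer columns, one row per variable pair) and shows one way to collect the variables split within the first depth_variable levels. The parallel-array tree layout (split_var, left, right, with -1 marking leaves) and the helper variables_up_to_depth are hypothetical, and counting the root as the first level is an assumption; treehfd's internals may differ.

```python
import numpy as np

# Predefined second-order interactions: one row per pair of variable indices.
interaction_list = np.array([[0, 1], [0, 2]], dtype=int)

# Hypothetical tree as parallel arrays; node 0 is the root, -1 marks a leaf.
split_var = np.array([0, 1, 2, -1, -1, 3, -1, -1, -1])
left = np.array([1, 3, 5, -1, -1, 7, -1, -1, -1])
right = np.array([2, 4, 6, -1, -1, 8, -1, -1, -1])

def variables_up_to_depth(split_var, left, right, depth_variable):
    """Sorted split variables found in the first depth_variable levels (root = level 1)."""
    selected = set()
    frontier = [0]  # nodes of the current level, starting at the root
    for _ in range(depth_variable):
        next_frontier = []
        for node in frontier:
            if left[node] == -1:  # leaf: no split variable
                continue
            selected.add(int(split_var[node]))
            next_frontier.extend([left[node], right[node]])
        frontier = next_frontier
    return sorted(selected)

print(variables_up_to_depth(split_var, left, right, depth_variable=2))  # [0, 1, 2]
```

Variable 3 only appears at the third level of this tree, so it is excluded with depth_variable=2.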

Attributes:

tree_structure

Structure of the tree, i.e., the splitting variables, child node indices, and splitting values.

Type: tuple
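One plausible way to derive such a tuple from tree_table is sketched below. It assumes the column layout returned by XGBoost's Booster.trees_to_dataframe() (ID, Feature, Split, Yes, No, Gain, with Feature equal to "Leaf" on leaves, leaf values in the Gain column, and default feature names f0, f1, ...), mocked here as a list of dicts so the example runs without XGBoost. The helper parse_tree_table and the exact tuple layout are hypothetical, and the Missing column is ignored.

```python
import math

import numpy as np

# Rows mimicking Booster.trees_to_dataframe() for a single tree.
rows = [
    {"ID": "0-0", "Feature": "f0", "Split": 0.5, "Yes": "0-1", "No": "0-2", "Gain": 10.0},
    {"ID": "0-1", "Feature": "f1", "Split": 0.3, "Yes": "0-3", "No": "0-4", "Gain": 4.0},
    {"ID": "0-2", "Feature": "Leaf", "Split": math.nan, "Yes": None, "No": None, "Gain": 3.0},
    {"ID": "0-3", "Feature": "Leaf", "Split": math.nan, "Yes": None, "No": None, "Gain": -1.0},
    {"ID": "0-4", "Feature": "Leaf", "Split": math.nan, "Yes": None, "No": None, "Gain": 1.5},
]

def parse_tree_table(rows):
    """Return (split_var, split_value, left, right, leaf_value); -1 marks leaves."""
    index = {row["ID"]: i for i, row in enumerate(rows)}  # node ID -> array position
    n = len(rows)
    split_var = np.full(n, -1)
    split_value = np.full(n, np.nan)
    left = np.full(n, -1)
    right = np.full(n, -1)
    leaf_value = np.full(n, np.nan)
    for i, row in enumerate(rows):
        if row["Feature"] == "Leaf":
            leaf_value[i] = row["Gain"]  # leaf value is stored in the Gain column
        else:
            split_var[i] = int(row["Feature"].lstrip("f"))  # "f3" -> 3
            split_value[i] = row["Split"]
            left[i] = index[row["Yes"]]  # "Yes" branch: x < split
            right[i] = index[row["No"]]
    return split_var, split_value, left, right, leaf_value

tree_structure = parse_tree_table(rows)
```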

interaction_order

Set to 1 to fit only main effects, or to 2 to also include second-order interactions in the TreeHFD decomposition.

Type: int, default=2

interaction_list

The list of interactions, defined as variable pairs, that occur in the tree paths.

Type: list
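Extracting the variable pairs that occur in the tree paths can be sketched by enumerating root-to-leaf paths and collecting every pair of distinct split variables on each path. The tree arrays and the helper interactions_from_paths are hypothetical, and treehfd may apply further filtering not shown here.

```python
from itertools import combinations

import numpy as np

# Hypothetical tree: node 0 is the root, -1 marks a leaf; variable 0 is reused at node 5.
split_var = np.array([0, 1, 2, -1, -1, 0, -1, -1, -1])
left = np.array([1, 3, 5, -1, -1, 7, -1, -1, -1])
right = np.array([2, 4, 6, -1, -1, 8, -1, -1, -1])

def interactions_from_paths(split_var, left, right):
    """Sorted unique (i, j) pairs, i < j, of variables sharing a root-to-leaf path."""
    pairs = set()
    stack = [(0, ())]  # (node, variables split on the path so far)
    while stack:
        node, path_vars = stack.pop()
        if left[node] == -1:  # leaf: record all pairs of distinct variables on the path
            pairs.update(combinations(sorted(set(path_vars)), 2))
            continue
        path_vars = path_vars + (int(split_var[node]),)
        stack.append((left[node], path_vars))
        stack.append((right[node], path_vars))
    return sorted(pairs)

print(interactions_from_paths(split_var, left, right))  # [(0, 1), (0, 2)]
```

Repeated splits on the same variable along a path do not create a self-interaction, since only distinct variables are paired.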

eta0

Intercept of the TreeHFD decomposition of the tree.

Type: float, default=0

cartesian_partition

Cartesian tree partitions, i.e., the variable indices for main effects, the cell index of each component partition, the list of splits for each variable, the list of cells for each interaction, and the size of these cells.

Type: CartesianTreePartition
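The cell indices of a Cartesian partition can be illustrated from per-variable split lists: along variable j, the sorted split values define intervals, and an interaction (j, k) uses the product grid of the two variables' intervals. The sketch below uses np.searchsorted with side="right", so a value equal to a split falls in the upper interval, consistent with XGBoost sending x < split to the left child. The split values, the row-major cell numbering and the reading of "cell size" as an observation count are assumptions, not the CartesianTreePartition layout.

```python
import numpy as np

# Hypothetical split values collected per variable from the tree.
splits = {0: np.array([0.2, 0.5]), 1: np.array([0.3])}

X = np.array([
    [0.1, 0.9],
    [0.2, 0.1],
    [0.7, 0.3],
    [0.4, 0.2],
])

def main_cells(X, j):
    """Interval index of each observation along variable j (0 .. len(splits[j]))."""
    return np.searchsorted(splits[j], X[:, j], side="right")

def interaction_cells(X, j, k):
    """Row-major cell index on the product grid of variables j and k."""
    n_k = len(splits[k]) + 1  # number of intervals along variable k
    return main_cells(X, j) * n_k + main_cells(X, k)

cells = interaction_cells(X, 0, 1)
n_cells = (len(splits[0]) + 1) * (len(splits[1]) + 1)
cell_counts = np.bincount(cells, minlength=n_cells)  # observations per cell
```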

hfd_coeffs

Array with coefficients defining the values of the decomposition components in each cell of the Cartesian tree partitions.

Type: np.ndarray
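Because the components are piecewise constant on the cells, evaluating a component reduces to indexing its coefficients by cell index. The sketch below assumes a hypothetical layout with one coefficient vector per main effect (one value per interval of the variable); how hfd_coeffs actually organizes all components in a single array is not specified here.

```python
import numpy as np

# Hypothetical split values and main-effect coefficients (len(splits) + 1 values each).
splits = {0: np.array([0.5]), 1: np.array([0.3, 0.6])}
coeffs = {0: np.array([-1.0, 1.0]), 1: np.array([0.5, 0.0, -0.5])}

def main_effect(X, j):
    """Value of the main effect of variable j for each row of X."""
    cells = np.searchsorted(splits[j], X[:, j], side="right")  # interval index per row
    return coeffs[j][cells]

X_new = np.array([[0.2, 0.1], [0.8, 0.7]])

# One column per variable, in increasing variable index.
y_main = np.column_stack([main_effect(X_new, j) for j in sorted(splits)])
```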

treehfd.tree.TreeHFD.fit(self, X: ndarray, y_tree: ndarray) → None

Fit TreeHFD decomposition of a single tree.

Parameters:
  • X (np.ndarray) – The input data used to train the xgboost model.

  • y_tree (np.ndarray) – Output of the original tree for the training data.
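For illustration, y_tree is simply the vector of leaf values reached by each training row. The sketch below evaluates a tree stored as hypothetical parallel arrays (-1 marks leaves), sending x < split to the left child as XGBoost does; missing values are not handled.

```python
import numpy as np

# Hypothetical tree: root splits x0 at 0.5; its left child splits x1 at 0.3.
split_var = np.array([0, 1, -1, -1, -1])
split_value = np.array([0.5, 0.3, np.nan, np.nan, np.nan])
left = np.array([1, 3, -1, -1, -1])
right = np.array([2, 4, -1, -1, -1])
leaf_value = np.array([np.nan, np.nan, 3.0, -1.0, 1.5])

def tree_output(X):
    """Route each row of X from the root to a leaf and return the leaf values."""
    out = np.empty(X.shape[0])
    for i, x in enumerate(X):
        node = 0
        while left[node] != -1:  # descend until a leaf is reached
            if x[split_var[node]] < split_value[node]:
                node = left[node]
            else:
                node = right[node]
        out[i] = leaf_value[node]
    return out

X = np.array([[0.2, 0.1], [0.2, 0.9], [0.8, 0.0]])
y_tree = tree_output(X)  # -> [-1.0, 1.5, 3.0]
```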

treehfd.tree.TreeHFD.predict(self, X_new: ndarray) → tuple[ndarray, ndarray]

Predict TreeHFD components of a single tree for new input data.

Parameters:

X_new (np.ndarray) – New input data where TreeHFD predictions are computed.

Returns:

y_main (np.ndarray) – Predictions of the main effects.

y_order2 (np.ndarray) – Predictions of the second-order interactions (columns are ordered according to interaction_list).

Return type: tuple
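After y_main, y_order2 = tree_hfd.predict(X_new), where tree_hfd is a fitted instance, the outputs can be summed with the intercept eta0 to approximate the tree prediction. The sketch below assumes y_main has one column per variable and y_order2 one column per interaction; the numbers are made up, and the sum need not reproduce the tree exactly when the tree contains interactions beyond second order. The helper recombine is hypothetical.

```python
import numpy as np

def recombine(eta0, y_main, y_order2):
    """Per-row sum of the intercept, main effects and second-order interactions."""
    return eta0 + y_main.sum(axis=1) + y_order2.sum(axis=1)

# Made-up outputs for 3 observations, 2 variables and 1 interaction.
eta0 = 0.5
y_main = np.array([[0.1, -0.2], [0.3, 0.0], [-0.4, 0.2]])
y_order2 = np.array([[0.05], [-0.1], [0.0]])

y_hat = recombine(eta0, y_main, y_order2)
```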