TreeHFD
- class treehfd.tree.TreeHFD(tree_table: DataFrame, interaction_order: int, interaction_list: ndarray | None, depth_variable: int)
TreeHFD decomposition of a single tree.
This class computes the TreeHFD decomposition of a single tree of an ensemble. A least-squares problem is solved to obtain the coefficients defining the components of the decomposition. XGBTreeHFD calls this class sequentially to build the TreeHFD of an xgboost tree ensemble.
- Parameters:
tree_table (pd.DataFrame) – The table with the structure of the considered tree, obtained from xgb_model.get_booster().trees_to_dataframe().
interaction_order (int, default=2) – Set to 1 to fit only main effects, or to 2 to also include second-order interactions in the TreeHFD decomposition.
interaction_list (np.ndarray, default=None) – Predefined list of second-order interactions to be estimated in the decomposition. Each row defines one interaction as a pair of integer variable indices. If None (the default), interactions are automatically extracted from the tree paths.
depth_variable (int) – Only variables selected within the first depth_variable levels of the tree are used for the components of the decomposition.
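The expected layout of interaction_list (one row per interaction, two integer variable indices per row) can be sketched with plain NumPy. The helper `check_interaction_list` below is hypothetical and not part of treehfd; the range check and the requirement that the two variables differ are assumptions made for illustration.

```python
import numpy as np


def check_interaction_list(interaction_list, n_features):
    """Hypothetical helper (not part of treehfd): validate variable pairs."""
    pairs = np.asarray(interaction_list)
    # One row per interaction, two columns for the two variable indices.
    if pairs.ndim != 2 or pairs.shape[1] != 2:
        raise ValueError("expected one row per interaction, two columns per row")
    if not np.issubdtype(pairs.dtype, np.integer):
        raise ValueError("variable indices must be integers")
    # Assumption: indices refer to columns of the input data X.
    if pairs.min() < 0 or pairs.max() >= n_features:
        raise ValueError("variable index out of range")
    # Assumption: an interaction involves two distinct variables.
    if np.any(pairs[:, 0] == pairs[:, 1]):
        raise ValueError("an interaction needs two distinct variables")
    return pairs


# Illustrative values: interactions between variables (0, 1) and (0, 3)
# in a data set assumed to have 5 input variables.
interaction_list = check_interaction_list([[0, 1], [0, 3]], n_features=5)
```

An array built this way has shape (n_interactions, 2), which matches the row-per-interaction description of the parameter.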
- tree_structure
Structure of the tree, i.e., the splitting variables, children node indices, and splitting values.
- Type:
tuple
- interaction_order
Set to 1 to fit only main effects, or to 2 to also include second-order interactions in the TreeHFD decomposition.
- Type:
int, default=2
- interaction_list
The list of interactions, defined as variable pairs, that occur in the tree paths.
- Type:
list
- eta0
Intercept of the TreeHFD decomposition of the tree.
- Type:
float, default=0
- cartesian_partition
Cartesian tree partitions, i.e., variable indices for main effects, cell index of each component partition, list of splits for each variable, list of cells for each interaction, and size of these cells.
- Type:
CartesianTreePartition
- hfd_coeffs
Array with coefficients defining the values of the decomposition components in each cell of the Cartesian tree partitions.
- Type:
np.ndarray
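To give some intuition for coefficients that define a component's value in each cell of a partition, here is a toy sketch with a single variable and plain NumPy. It is not the library's actual least-squares problem: the split values, cell values, and one-hot design are illustrative assumptions, and the intercept, interactions, and any further structure of the real formulation are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: one input variable and two hypothetical split values,
# which define three cells on the real line.
x = rng.uniform(size=200)
splits = np.array([0.3, 0.7])
true_cell_values = np.array([1.0, -2.0, 0.5])

# Cell index of each sample (0, 1, or 2).
cells = np.searchsorted(splits, x)

# Piecewise-constant target standing in for a tree output.
y = true_cell_values[cells]

# One-hot design matrix: column j indicates membership in cell j.
design = np.eye(len(splits) + 1)[cells]

# Least-squares fit: one coefficient per cell.
coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
```

In this toy case the fitted coefficients coincide with the per-cell values of the target, since the target is exactly constant on each cell.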
- treehfd.tree.TreeHFD.fit(self, X: ndarray, y_tree: ndarray) → None
Fit TreeHFD decomposition of a single tree.
- Parameters:
X (np.ndarray) – The input data used to train the xgboost model.
y_tree (np.ndarray) – Output of the original tree for the training data.
- treehfd.tree.TreeHFD.predict(self, X_new: ndarray) → tuple[ndarray, ndarray]
Predict TreeHFD components of a single tree for new input data.
- Parameters:
X_new (np.ndarray) – New input data where TreeHFD predictions are computed.
- Returns:
- y_main (np.ndarray) – Array with the predictions of main effects.
- y_order2 (np.ndarray) – Array with the predictions of second-order interactions (columns are ordered according to interaction_list).
- Return type:
tuple
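A minimal sketch of how the two returned arrays might be combined with the intercept eta0. It assumes that both arrays hold one row per sample and one column per component, and that the tree output is approximated by the intercept plus the sum of all components. The function `recombine` and the numeric values are placeholders for illustration, not outputs of treehfd.

```python
import numpy as np


def recombine(eta0, y_main, y_order2):
    """Hypothetical helper: intercept plus the sum of all components per sample."""
    return eta0 + y_main.sum(axis=1) + y_order2.sum(axis=1)


# Placeholder values: 2 samples, 2 main effects, 1 second-order interaction.
eta0 = 0.5
y_main = np.array([[0.1, -0.2],
                   [0.3, 0.0]])
y_order2 = np.array([[0.05],
                     [-0.1]])

y_hat = recombine(eta0, y_main, y_order2)
```

Keeping the components in separate columns, rather than summing them immediately, preserves the per-variable and per-interaction contributions for inspection.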