Disentangled Attribution Curves

Disentangled Attribution Curves (DAC) are a method for interpreting tree ensemble models through feature importance curves. These curves show how the importance of a feature, or a group of features, changes as its values vary.

What are Tree Ensemble Methods?

Tree Ensemble Methods are models that combine a collection of decision trees to perform classification or regression tasks. A decision tree is a flowchart of nodes and edges in which each internal node represents a decision. The tree learns to map input features to output targets by partitioning the data into disjoint subsets based on conditions on feature values, with each subset corresponding to a node in the tree. Tree Ensemble Methods build multiple decision trees and aggregate their predictions. Examples of Tree Ensemble Methods are Random Forest, Extra Trees, and Gradient Boosting.
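
As a concrete reference point, here is a minimal scikit-learn sketch that fits two such ensembles on synthetic data; the dataset and hyperparameter settings are illustrative assumptions, not part of DAC itself.

```python
# A minimal sketch: fitting two tree ensembles on synthetic data with scikit-learn.
# The dataset and hyperparameter values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=0)

# A random forest averages the votes of many trees grown on bootstrapped samples.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Gradient boosting grows trees sequentially, each correcting the ensemble's errors.
boosted = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X, y)

print(forest.score(X, y), boosted.score(X, y))
```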

What is Feature Importance?

Feature Importance is a measure of how useful a feature is in predicting the target variable of a model. It can give insights into the relations between the input variables and the output variable. Features that are highly important can be used to improve the model or to explain the model's behavior to a non-expert audience.
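
For comparison with the curves that DAC produces, the sketch below prints the standard impurity-based importance scores exposed by a fitted scikit-learn forest: a single number per feature rather than a curve. The feature names are made up for illustration.

```python
# A minimal sketch of a conventional (non-DAC) feature importance: the
# impurity-based scores a fitted scikit-learn forest exposes. One number per
# feature, with no information about how importance varies with feature value.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

for name, score in zip(["A", "B", "C", "D"], forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```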

What are Disentangled Attribution Curves?

Disentangled Attribution Curves (DAC) are an extension of feature importance methods to tree ensemble models. Tree Ensemble models can be challenging to interpret because they are non-linear, non-parametric, and can capture interactions between features that are difficult to discern. DAC attempts to unravel these interactions by plotting changes in feature importance as the value of the feature changes.

DAC calculates the importance of each feature by measuring how much the model's predictions change when that feature's values change. It applies a randomized perturbation to the feature values and measures the resulting change in the model's predictions; the larger the change, the more important the feature. DAC uses a permutation scheme that preserves the correlation between features, so interactions between features are properly accounted for when the importance is calculated.

For example, suppose the model uses four features `A`, `B`, `C`, and `D` to predict whether a customer will purchase a product. The DAC method would produce a curve for each of the four features showing their respective importance in the prediction task. The curves would illustrate how feature importance varies over the value range of the feature.
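
The following is a simplified sketch of such a perturbation-style curve. It sweeps each feature over a grid of values and records the shift in the model's mean predicted probability; this is closer to a partial-dependence computation than to the published DAC algorithm, and the data and feature names `A`–`D` are illustrative assumptions.

```python
# A simplified, perturbation-style importance curve: for each feature, sweep a
# grid of values, overwrite that feature for every row, and record how much the
# model's mean predicted probability moves. This is a stand-in for a DAC curve,
# not the published algorithm; data and feature names are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
baseline = model.predict_proba(X)[:, 1].mean()

for j, name in enumerate(["A", "B", "C", "D"]):
    grid = np.linspace(X[:, j].min(), X[:, j].max(), 5)
    curve = []
    for v in grid:
        X_pert = X.copy()
        X_pert[:, j] = v                      # force the feature to one value
        shift = model.predict_proba(X_pert)[:, 1].mean() - baseline
        curve.append(shift)
    print(name, np.round(curve, 3))
```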

How do Disentangled Attribution Curves work?

The Disentangled Attribution Curve (DAC) method works by decomposing the feature importance of a group of features into its individual feature contributions. This decomposition helps to uncover the interactions between the features and how they affect the model's performance.

To illustrate how DAC works, let us consider a binary classification problem. Suppose we want to predict whether a student will pass an exam based on two features: `hours of study` and `hours of sleep`. In this example, we want to know whether the student needs more sleep or more study time to pass the exam.

We use a decision tree to model the relationship between study hours, sleep hours, and the exam result. The decision tree splits the data set into smaller subsets based on the values of the features. The tree predicts the class label of a new instance by traversing the tree from the root node down to a leaf node.
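
The sketch below builds a toy version of this example. The synthetic data and the pass rule (enough study and enough sleep together) are assumptions made purely for illustration.

```python
# A minimal sketch of the exam example. The synthetic data and the pass rule
# (enough study *and* enough sleep) are assumptions made for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
study = rng.uniform(0, 10, 300)   # hours of study
sleep = rng.uniform(3, 9, 300)    # hours of sleep
passed = ((study > 4) & (sleep > 6)).astype(int)  # toy interaction: both matter

X = np.column_stack([study, sleep])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, passed)

# Predict for a new student: 6 hours of study, 5 hours of sleep.
print(tree.predict([[6.0, 5.0]]))
```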

We can calculate the importance of the features by measuring their contribution to the accuracy of the model. The DAC method provides an intuitive way to visualize the contribution of each feature to the overall accuracy of the model at different values of the features.
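
One simplified way to obtain such a curve is to permute a feature and measure the accuracy drop separately within bins of that feature's value, as in the sketch below. This is a stand-in for a DAC curve rather than the published algorithm, and it reuses the same toy exam data as the previous sketch.

```python
# A simplified importance-versus-value curve: permute one feature, then measure
# the accuracy drop within bins of that feature's value. The toy exam data
# mirrors the previous sketch and is an assumption made for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
study = rng.uniform(0, 10, 300)   # hours of study
sleep = rng.uniform(3, 9, 300)    # hours of sleep
passed = ((study > 4) & (sleep > 6)).astype(int)
X = np.column_stack([study, sleep])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, passed)

X_perm = X.copy()
X_perm[:, 0] = rng.permutation(X_perm[:, 0])   # break the study/outcome link

bins = np.linspace(0, 10, 6)
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (study >= lo) & (study < hi)
    drop = tree.score(X[mask], passed[mask]) - tree.score(X_perm[mask], passed[mask])
    print(f"study in [{lo:.1f}, {hi:.1f}): accuracy drop ~ {drop:.3f}")
```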

The DAC method decomposes the feature importance into two parts:

  1. Marginal Feature Importance (MFI)
  2. Interaction Feature Importance (IFI)

The Marginal Feature Importance (MFI) is the importance of a feature in isolation. It measures how much the performance of the model depends on the value of the feature independent of any other feature. If the MFI of a feature is high, then the feature is crucial to the model's accuracy. If the MFI is low, then the feature provides little information for the prediction.

Interaction Feature Importance (IFI) measures the importance of the interaction between features. It captures the effect of using multiple features together to obtain accurate predictions. IFI is high when features used together contribute more to the prediction than they do in isolation.
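
A rough way to make this decomposition concrete is to compare the accuracy drop from permuting each feature alone with the drop from permuting both together. Treating the difference as the IFI is an assumption made for illustration, not a formula taken from the DAC paper; the toy exam data is the same as in the earlier sketches.

```python
# A rough MFI/IFI split: compare the accuracy drop from permuting each feature
# alone with the drop from permuting both together. Defining
# IFI = joint drop - sum of individual drops is an illustrative assumption.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
study = rng.uniform(0, 10, 300)
sleep = rng.uniform(3, 9, 300)
passed = ((study > 4) & (sleep > 6)).astype(int)
X = np.column_stack([study, sleep])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, passed)
base = tree.score(X, passed)

def drop_when_permuted(cols):
    """Accuracy lost when the given feature columns are independently shuffled."""
    X_p = X.copy()
    for c in cols:
        X_p[:, c] = rng.permutation(X_p[:, c])
    return base - tree.score(X_p, passed)

mfi_study = drop_when_permuted([0])
mfi_sleep = drop_when_permuted([1])
ifi = drop_when_permuted([0, 1]) - (mfi_study + mfi_sleep)

print(f"MFI(study)={mfi_study:.3f}  MFI(sleep)={mfi_sleep:.3f}  IFI={ifi:.3f}")
```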

Why are Disentangled Attribution Curves useful?

Disentangled Attribution Curves (DAC) provide a detailed and intuitive way to interpret tree ensemble models. DAC makes it possible to visualize the contribution of each feature to the model's accuracy, both in isolation and in interaction with other features. This information can be used to adjust the model and improve its performance or to explain the model's behavior to stakeholders.

DAC helps data scientists to understand how features interact with each other in a model. This understanding can be used to create new models that take into account these interactions or improve the interpretability of the model by accounting for them.

DAC is a powerful tool for identifying which features are essential to the performance of the model. This information can be used to select the important features to include in a model or to eliminate irrelevant features that do not improve its performance. Removing such features can reduce the model's dimensionality and improve its computational efficiency.

Disentangled Attribution Curves (DAC) provide a powerful and intuitive way of interpreting tree ensemble models. DAC can help data scientists to identify the importance of individual features and their interactions with other features. The information obtained from DAC can be used to adjust the model to improve its performance, reduce its dimensionality, or explain its behavior to stakeholders. DAC is a valuable addition to the growing set of feature importance methods and provides an excellent way to interpret complex models.
