How to Use the SHAP-IQ Package to Uncover and Visualize Feature Interactions in Machine Learning Models Using Shapley Interaction Indices (SII)
In this tutorial, we explore how to use the SHAP-IQ package to uncover and visualize feature interactions in machine learning models using Shapley Interaction Indices (SII), building on the foundation of traditional Shapley values. Shapley values are effective at explaining individual feature contributions in AI models, but they do not capture how features interact. Shapley interactions address this by separating individual effects from interaction effects, offering deeper insights, for example how longitude and latitude jointly influence house prices. Here, we'll get started with the SHAP-IQ package to compute and explore these Shapley interactions for any model.
Installing the Dependencies
To get started, install the necessary packages:
!pip install shapiq overrides scikit-learn pandas numpy
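Because the shapiq API is still evolving across releases, it can help to confirm which version you're running before following along. This is an optional check, assuming shapiq exposes a standard __version__ attribute like most Python packages:
import shapiq
print(shapiq.__version__)  # confirm the installed shapiq version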
Data Loading and Pre-Processing
We’ll use the Bike Sharing dataset from OpenML. After loading the data, we’ll split it into training and testing sets to prepare it for model training and evaluation.
import shapiq
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
import numpy as np
# Load data
X, y = shapiq.load_bike_sharing(to_numpy=True)
# Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
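As a quick optional sanity check (not part of the original snippet), you can confirm the shapes of the resulting splits before training:
# Optional: verify the split sizes and feature count
print(f"Train: {X_train.shape}, Test: {X_test.shape}")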
Model Training and Performance Evaluation
# Train model
model = RandomForestRegressor()
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"R² Score: {r2:.4f}")
print(f"Mean Absolute Error: {mae:.4f}")
print(f"Root Mean Squared Error: {rmse:.4f}")
Setting Up an Explainer
We set up a TabularExplainer using the SHAP-IQ package to compute Shapley interaction values based on the k-SII (k-order Shapley Interaction Index) method. By specifying max_order=4, we allow the explainer to consider interactions of up to 4 features simultaneously, enabling deeper insights into how groups of features collectively impact model predictions.
# set up an explainer with k-SII interaction values up to order 4
explainer = shapiq.TabularExplainer(
    model=model,
    data=X,
    index="k-SII",
    max_order=4,
)
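To get a feel for why the max_order setting and the evaluation budget matter, note that the number of candidate interaction terms grows combinatorially with the order. The plain-Python sketch below (not part of the shapiq API) counts the feature subsets of size up to 4 for this dataset:
from math import comb
# Count all feature subsets of size 1 to 4 for the loaded dataset
n_features_total = X.shape[1]
n_terms = sum(comb(n_features_total, k) for k in range(1, 5))
print(f"Interaction terms up to order 4 for {n_features_total} features: {n_terms}")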
Explaining a Local Instance
We select a specific test instance (index 100) to generate local explanations. This helps us understand the exact inputs passed to the model and sets the context for interpreting the Shapley interaction explanations that follow.
# get the feature names by reloading the dataset as a DataFrame (the default, without to_numpy=True)
X_df, _ = shapiq.load_bike_sharing()
feature_names = list(X_df.columns)
n_features = len(feature_names)
# select a local instance to be explained
instance_id = 100
x_explain = X_test[instance_id]
y_true = y_test[instance_id]
y_pred = model.predict(x_explain.reshape(1, -1))[0]
print(f"Instance {instance_id}, True Value: {y_true}, Predicted Value: {y_pred}")
for i, feature in enumerate(feature_names):
    print(f"{feature}: {x_explain[i]}")
Analyzing Interaction Values
We use the explainer.explain() method to compute Shapley interaction values for the selected instance (x_explain) with a budget of 256 model evaluations. This returns an InteractionValues object, which captures how individual features and their combinations influence the model's output. Since the explainer was created with max_order=4, interactions involving up to 4 features are considered.
interaction_values = explainer.explain(x_explain, budget=256)
# analyse interaction values
print(interaction_values)
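To go beyond the printed summary, you can rank the strongest interaction terms by absolute value. The sketch below is hedged: it assumes the returned InteractionValues object exposes an interaction_lookup dictionary (mapping feature-index tuples to positions) and a values array, so check the shapiq documentation if the attribute names differ in your version.
# Hedged sketch: rank interaction terms by absolute contribution
scores = {
    coalition: interaction_values.values[pos]
    for coalition, pos in interaction_values.interaction_lookup.items()
}
top_terms = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:10]
for coalition, value in top_terms:
    names = " x ".join(feature_names[i] for i in coalition)
    print(f"{names or 'baseline'}: {value:.3f}")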
First-Order Interaction Values
To keep things simple, we compute first-order interaction values—i.e., standard Shapley values that capture only individual feature contributions (no interactions).
explainer = shapiq.TreeExplainer(model=model, max_order=1, index="SV")
si_order = explainer.explain(x=x_explain)
print(si_order)
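Before plotting, you can read off the baseline (the model's expected output with no feature information) directly from the explanation object. This assumes the InteractionValues object carries a baseline_value attribute, so verify against your shapiq version.
# Hedged sketch: inspect the baseline used by the waterfall chart below
print(si_order.baseline_value)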
Plotting a Waterfall Chart
A Waterfall chart visually breaks down a model’s prediction into individual feature contributions. It starts from the baseline prediction and adds/subtracts each feature’s Shapley value to reach the final predicted output. The baseline value (i.e., the model’s expected output without any feature information) is 190.717. As we add the contributions from individual features (order-1 Shapley values), we can observe how each feature influences the prediction.
si_order.plot_waterfall(feature_names=feature_names, show=True)
Features like Weather and Humidity have a positive contribution, while features like Temperature and Year negatively impact the prediction. Overall, the Waterfall chart provides valuable insights into the model’s decision-making process.
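If you'd like a second view of the same order-1 explanation, shapiq also ships other visualizations. The call below is a hedged sketch that assumes a plot_force method with the same calling convention as plot_waterfall; verify against your installed version.
# Hedged sketch: force-plot view of the same Shapley values
si_order.plot_force(feature_names=feature_names, show=True)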
For further exploration, feel free to check out our GitHub Page for tutorials, code, and notebooks. You can also subscribe to our newsletter for the latest updates.