[ ]:
import fdfi
print('FDFI version:', fdfi.__version__)
Quickstart: FDFI in 5 Minutes
This tutorial introduces the basics of FDFI (Flow-Disentangled Feature Importance). By the end, you’ll be able to:
Create an explainer for any model
Compute feature importance
Interpret the results
Get confidence intervals
Setup
First, let’s import the necessary libraries:
[ ]:
import numpy as np
from fdfi.explainers import OTExplainer
from fdfi.plots import confidence_interval_plot, summary_bar
# Set random seed for reproducibility
np.random.seed(42)
Create a Simple Model
Let’s create a simple linear model where we know the true feature importance. Only Features 0, 1, and 2 enter the model (with coefficients 1, 2, and 0.5); the remaining seven features are pure noise:
[ ]:
def model(X):
    """Simple model: y = x0 + 2*x1 + 0.5*x2"""
    return X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2]
# Create training data (used as background distribution)
n_samples = 200
n_features = 10
X_train = np.random.randn(n_samples, n_features)
# Create test data to explain
X_test = np.random.randn(100, n_features)
print(f"Training data shape: {X_train.shape}")
print(f"Test data shape: {X_test.shape}")
print(f"Model predictions for test data: {model(X_test)[:5]}")
Create an Explainer
The OTExplainer uses Gaussian optimal transport to compute feature importance:
[ ]:
# Create the explainer
explainer = OTExplainer(
model, # The model to explain
data=X_train, # Background data
nsamples=50, # Monte Carlo samples per feature
)
print("Explainer created!")
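The tutorial does not spell out what "Gaussian optimal transport" computes. A standard fact helps here: if the background data are modeled as N(μ, Σ), the optimal transport map (under squared Euclidean cost) onto a standard normal N(0, I) is the symmetric whitening x ↦ Σ^(-1/2)(x − μ). The sketch below computes that map with NumPy only. It assumes OTExplainer builds its disentangled Z-space in this spirit, which is not confirmed here, and gaussian_ot_map is a hypothetical helper name, not part of the FDFI API.

```python
import numpy as np


def gaussian_ot_map(X):
    """Hypothetical helper: fit N(mu, Sigma) to X and return the Gaussian OT
    map onto N(0, I), i.e. z = Sigma^{-1/2} (x - mu) with the symmetric root."""
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    # Symmetric inverse square root via eigendecomposition (Sigma is SPD).
    eigvals, eigvecs = np.linalg.eigh(sigma)
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T

    def transport(X_new):
        # Applied row-wise; inv_sqrt is symmetric, so no transpose is needed.
        return (X_new - mu) @ inv_sqrt

    return transport


# Correlated background data: mix independent normals with a fixed matrix.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.0, 0.0],
                   [0.8, 0.6, 0.0],
                   [0.3, 0.2, 0.9]])
X_bg = rng.standard_normal((500, 3)) @ mixing.T

to_z = gaussian_ot_map(X_bg)
Z_bg = to_z(X_bg)
# Covariance of the transported data is the identity up to rounding.
print(np.round(np.cov(Z_bg, rowvar=False), 6))
```

Any whitening (a Cholesky factor, for example) also yields uncorrelated coordinates, but only the symmetric choice is the gradient of a convex function, which is what makes it the OT map (Brenier's theorem).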
Compute Feature Importance
Call the explainer on test data to get feature importance:
[ ]:
# Compute feature importance
results = explainer(X_test)
# Print the results
print("Feature Importance (phi_X):")
for i, phi in enumerate(results["phi_X"]):
    print(f" Feature {i}: {phi:.4f}")
Visualize Feature Importance
After running results = explainer(X_test), pass the global importance scores and their standard errors to summary_bar for a quick visual check:
[ ]:
feature_names = [f"X{i}" for i in range(n_features)]
fig, ax, importance_table = summary_bar(
results["phi_X"],
results["se_X"],
feature_names,
max_display=8,
show=False,
)
importance_table.head()
Interpret the Results
The results dictionary contains:
phi_X: Feature importance in the original X-space
phi_Z: Feature importance in the disentangled Z-space
se_X, se_Z: Standard errors for uncertainty quantification
Higher values indicate more important features. Since the model is x0 + 2*x1 + 0.5*x2, we expect Features 0, 1, and 2 to have the highest importance and the seven noise features to score near zero.
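For this model the ground truth can be worked out by hand. The features are independent standard normals, so the variance of the model output splits additively across features. This is a reference point for the expected ordering, not a claim that FDFI's scores equal these shares:

```latex
\operatorname{Var}\big(f(X)\big)
  = 1^2\,\operatorname{Var}(X_0) + 2^2\,\operatorname{Var}(X_1) + 0.5^2\,\operatorname{Var}(X_2)
  = 1 + 4 + 0.25 = 5.25
```

Feature 1 accounts for 4/5.25 ≈ 76% of the output variance, Feature 0 for 1/5.25 ≈ 19%, and Feature 2 for 0.25/5.25 ≈ 5%, while Features 3 to 9 contribute nothing. A reasonable importance method should therefore rank Feature 1 first, then Feature 0, then Feature 2.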
[ ]:
# Sort features by importance
importance = results["phi_X"]
sorted_idx = np.argsort(importance)[::-1]
print("Features ranked by importance:")
for rank, idx in enumerate(sorted_idx):
    print(f" Rank {rank+1}: Feature {idx} (importance = {importance[idx]:.4f})")
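As an independent sanity check on this ranking, you can compute a simple permutation importance. This is a different, model-agnostic technique, not what FDFI does: shuffle one column at a time and measure how far the predictions move. The sketch uses only NumPy and regenerates its own data so it runs on its own. permutation_importance is a hypothetical helper name, and its scores are on a different scale from phi_X, so compare rankings rather than values.

```python
import numpy as np


def model(X):
    """Same toy model as above: y = x0 + 2*x1 + 0.5*x2."""
    return X[:, 0] + 2 * X[:, 1] + 0.5 * X[:, 2]


def permutation_importance(f, X, rng):
    """Hypothetical helper: mean squared change in f(X) when each column
    is shuffled on its own (breaking its link to the other columns)."""
    base = f(X)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        scores[j] = np.mean((base - f(X_perm)) ** 2)
    return scores


rng = np.random.default_rng(42)
X_check = rng.standard_normal((2000, 10))
perm_scores = permutation_importance(model, X_check, rng)
perm_ranking = np.argsort(perm_scores)[::-1]
print("Top 3 features by permutation importance:", perm_ranking[:3])
```

For a linear model with independent inputs, this score is about 2·a_j² for coefficient a_j, so roughly 8, 2, and 0.5 for Features 1, 0, and 2. It is exactly zero for the noise features, because the model never reads those columns.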
Get Confidence Intervals
FDFI provides statistical inference via conf_int(). Here it runs a one-sided test of whether each feature’s X-space importance is greater than zero:
[ ]:
# Compute confidence intervals
ci = explainer.conf_int(
alpha=0.05, # 95% confidence level
target="X", # Use X-space importance
alternative="greater" # Test if importance > 0
)
print("\nConfidence Intervals (95%, one-sided):")
print("-" * 70)
print(f"{'Feature':>8} {'Estimate':>10} {'SE':>10} {'Z-score':>10} {'Rank':>6} {'P-value':>10}")
print("-" * 70)
for i in range(n_features):
    sig = "*" if ci["reject_null"][i] else ""
    print(f"{i:>8} {ci['score'][i]:>10.4f} {ci['se'][i]:>10.4f} "
          f"{ci['zscore'][i]:>10.4f} {ci['ranking'][i]:>6} {ci['pvalue'][i]:>10.4f} {sig}")
print("\n* = significant at alpha=0.05")
print("\nNote: 'zscore' = (score - margin) / se, 'ranking' = rank by descending z-score.")
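The columns printed above match the usual one-sided, normal-approximation recipe. The sketch below reproduces that arithmetic with NumPy and the standard library’s statistics.NormalDist, assuming margin = 0 and that reject_null means p < alpha. Whether conf_int() computes its interval exactly this way is an assumption, and one_sided_test is a hypothetical name, not an FDFI function.

```python
from statistics import NormalDist

import numpy as np


def one_sided_test(score, se, alpha=0.05, margin=0.0):
    """Hypothetical helper: one-sided z-test of H0: importance <= margin."""
    score = np.asarray(score, dtype=float)
    se = np.asarray(se, dtype=float)
    std_normal = NormalDist()
    zscore = (score - margin) / se
    # Upper-tail p-value: P(Z >= z) under the standard normal.
    pvalue = np.array([1.0 - std_normal.cdf(z) for z in zscore])
    # One-sided (1 - alpha) lower confidence bound for the importance.
    lower = score - std_normal.inv_cdf(1.0 - alpha) * se
    # Rank 1 = largest z-score.
    order = np.argsort(-zscore)
    ranking = np.empty(len(zscore), dtype=int)
    ranking[order] = np.arange(1, len(zscore) + 1)
    return {"zscore": zscore, "pvalue": pvalue, "lower": lower,
            "reject_null": pvalue < alpha, "ranking": ranking}


# Illustrative numbers only (not FDFI output).
demo = one_sided_test(score=[0.5, 2.0, 0.01], se=[0.1, 0.2, 0.05])
print(demo["ranking"], demo["reject_null"])  # [2 1 3] [ True  True False]
```

With se > 0, rejecting at level alpha is equivalent to the one-sided lower bound lying above the margin, so the stars in the table and the intervals in the plot below should agree.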
[ ]:
confidence_interval_plot(
ci,
feature_names=feature_names,
max_display=8,
show=False,
)
View Summary
Use the built-in summary() method to print a formatted report:
[ ]:
# Print formatted summary
explainer.summary(alpha=0.05, alternative="greater")
Next Steps
Now that you’ve learned the basics, check out these tutorials:
OT Explainer Deep Dive: Learn more about the Gaussian OT method
EOT Explainer: Entropic OT for non-Gaussian data
Confidence Intervals: Advanced statistical inference