Explainers
Overview
The fdfi.explainers module provides classes for computing flow-disentangled
feature importance. The main classes are:
Base Explainer
- class fdfi.explainers.Explainer(model, data=None, **kwargs)[source]
Bases: object
Base class for DFI explainers.
This class provides the interface for computing feature importance using disentangled methods, similar to SHAP explainers. It also provides post-hoc confidence intervals via conf_int() and formatted summaries via summary().
- Parameters:
model (callable) – The model to explain. Should be a function that takes a numpy array and returns predictions.
data (numpy.ndarray, optional) – Background data to use for explanations.
**kwargs (dict) – Additional parameters for the explainer.
- model
The model being explained.
- Type:
callable
- data
Background data for explanations.
- Type:
numpy.ndarray or None
Examples
>>> import numpy as np
>>> from fdfi import Explainer
>>>
>>> # Define a simple model
>>> def model(x):
...     return x.sum(axis=1)
>>>
>>> # Create an explainer
>>> explainer = Explainer(model)
>>>
>>> # Compute explanations (when implemented)
>>> # explanations = explainer(X_test)
- conf_int(alpha=0.05, target='X', groups=None, threshold_null=True, multitest_method=None, var_floor_c=0.1, var_floor_method='mixture', var_floor_quantile=0.95, margin=0.0, margin_method='auto', margin_quantile=0.95, alternative='two-sided', verbose=False)[source]
Compute confidence intervals and significance statistics for feature importance.
If groups is provided, computes importance and uncertainty at the group level.
- Parameters:
alpha (float, default=0.05) – Significance level.
target (str, default='X') – Which space to use: ‘X’ (original) or ‘Z’ (latent).
groups (dict, numpy.ndarray, or pandas.DataFrame, optional) – Group assignment for features. Accepts:
  - dict: {group_name: [feature_indices]}
  - numpy.ndarray: 1-D array of length d with group labels.
  - pandas.DataFrame: binary indicator matrix (features × groups).
threshold_null (bool, default=True) – Zero out per-feature UEIFs with negative mean before summing.
multitest_method (str, optional) – Multiple testing correction method. Supports methods from statsmodels.stats.multitest.multipletests, e.g., ‘bonferroni’, ‘holm’, ‘fdr_bh’ (Benjamini-Hochberg), ‘fdr_by’.
var_floor_c (float, default=0.1) – Constant for the variance floor.
var_floor_method (str, default='mixture') – Method for variance floor calculation (‘mixture’ or ‘fixed’).
var_floor_quantile (float, default=0.95) – Quantile for the ‘mixture’ variance floor method.
margin (float, default=0.0) – Hypothesized margin for null hypothesis.
margin_method (str, default='auto') – Method to estimate the margin (‘auto’, ‘mixture’, ‘gap’, or ‘fixed’).
margin_quantile (float, default=0.95) – Quantile for the ‘mixture’ margin method.
alternative (str, default='two-sided') – Alternative hypothesis (‘two-sided’, ‘greater’, or ‘less’).
verbose (bool, default=False) – Whether to print debug information.
- Returns:
Dictionary with the following keys (each an array of length d or G):
  - 'score': estimated feature importance (mean UEIF).
  - 'se': standard error of the mean UEIF (after variance floor).
  - 'zscore': signed z-statistic (score - margin) / se.
  - 'ranking': integer rank by descending z-score (1 = most important).
  - 'ci_lower': lower confidence interval bound.
  - 'ci_upper': upper confidence interval bound.
  - 'reject_null': boolean array, True where null is rejected.
  - 'pvalue': two-sided or one-sided p-value.
  - 'margin': null hypothesis margin used.
  - 'margin_method': method used to select the margin.
  - 'alternative': alternative hypothesis string.
Additional keys added when applicable:
  - 'groups': list of group names (when groups is provided).
  - 'pvalue_adj': multiple-testing-adjusted p-values (when multitest_method is provided).
- Return type:
dict
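The keys returned by conf_int() relate to one another roughly as sketched below. This is a minimal NumPy illustration, not the library's implementation: it assumes a per-sample UEIF matrix of shape (n, d), uses a two-sided normal approximation, and skips the variance floor and automatic margin selection. The helper name `sketch_conf_int` is hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def sketch_conf_int(ueif, alpha=0.05, margin=0.0):
    """Illustrative only: two-sided normal-approximation inference per feature."""
    n, d = ueif.shape
    score = ueif.mean(axis=0)                      # mean UEIF per feature
    se = ueif.std(axis=0, ddof=1) / math.sqrt(n)   # no variance floor applied here
    zscore = (score - margin) / se                 # signed z-statistic
    # Two-sided p-value under a standard normal null: P(|N| >= |z|).
    pvalue = np.array([math.erfc(abs(z) / math.sqrt(2.0)) for z in zscore])
    zcrit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Rank 1 goes to the largest z-score.
    ranking = np.empty(d, dtype=int)
    ranking[np.argsort(-zscore)] = np.arange(1, d + 1)
    return {
        "score": score,
        "se": se,
        "zscore": zscore,
        "ranking": ranking,
        "ci_lower": score - zcrit * se,
        "ci_upper": score + zcrit * se,
        "pvalue": pvalue,
        "reject_null": pvalue < alpha,
    }


# Toy UEIFs: feature 0 has mean 2, feature 1 has mean exactly 0.
signs = np.tile([1.0, -1.0], 50)
ueif = np.column_stack([2.0 + signs, signs])
out = sketch_conf_int(ueif)
print(out["ranking"], out["reject_null"])
```

In this toy input the first feature is ranked first and flagged as significant, while the zero-mean feature is not.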
- summary(alpha=0.05, print_output=True, **kwargs)[source]
Print and return a formatted feature importance summary table.
Computes confidence intervals via conf_int() and formats the results as a human-readable table. Supports both individual-feature and group-level summaries, as well as multiple-testing correction.
- Parameters:
alpha (float, default=0.05) – Significance level passed to conf_int().
print_output (bool, default=True) – If True, print the table to stdout.
**kwargs – All keyword arguments are forwarded to conf_int(). Common options include:
  - target ('X' or 'Z'): which feature space to report.
  - groups: dict, 1-D array, or binary DataFrame for group-level summaries (new in 0.0.5).
  - multitest_method: e.g. 'bonferroni', 'fdr_bh' for multiple-testing correction (new in 0.0.5).
  - threshold_null: zero out negative-mean UEIFs before group aggregation (new in 0.0.5).
  - var_floor_method, var_floor_c, var_floor_quantile
  - margin, margin_method, margin_quantile
  - alternative ('two-sided', 'greater', 'less')
  - verbose
- Returns:
The formatted summary string (same text that is printed when print_output=True).
- Return type:
str
Examples
Individual-feature summary:
explainer(X_test, y=y_test)
explainer.summary(alpha=0.05, target="X")
Group-level summary with Bonferroni correction:
explainer.summary(
    alpha=0.05,
    target="X",
    groups=df_groups,
    threshold_null=True,
    multitest_method="bonferroni",
)
- group_importance(groups, target='X', threshold_null=True, se_adjustment=0.1, alpha=0.05)[source]
Compute group-level feature importance with uncertainty.
Deprecated since version 0.0.5: Use conf_int() with the groups argument instead.
- Parameters:
groups (dict, numpy.ndarray, or pandas.DataFrame) – Group assignment for features. Accepts:
  - dict: {group_name: [feature_indices]}
  - numpy.ndarray: 1-D array of length d with group labels.
  - pandas.DataFrame: binary indicator matrix (features × groups).
target (str, default='X') – Which space to aggregate: 'X' or 'Z'.
threshold_null (bool, default=True) – Zero out per-feature UEIFs with negative mean before summing.
se_adjustment (float, default=0.1) – Finite-sample SE correction constant. Set to 0.0 to disable.
alpha (float, default=0.05) – Significance level.
- Returns:
'groups', 'importance', 'se', 'zscore', 'pvalue': each an array of length G (number of groups).
- Return type:
- diagnose(X_orig=None, Z_full=None, report_title=None)[source]
Public API to compute (or recompute) diagnostics.
- __call__(X, **kwargs)[source]
Compute feature importance for the given input.
- Parameters:
X (numpy.ndarray) – Input data to explain. Shape (n_samples, n_features).
**kwargs (dict) – Additional parameters for explanation.
- Returns:
Feature importance values. Shape (n_samples, n_features).
- Return type:
- Raises:
NotImplementedError – This method must be implemented by subclasses.
- shap_values(X, **kwargs)[source]
Compute SHAP-like values (alias for __call__).
- Parameters:
X (numpy.ndarray) – Input data to explain.
**kwargs (dict) – Additional parameters.
- Returns:
Feature importance values.
- Return type:
Tree-Based Models
- class fdfi.explainers.TreeExplainer(model, data=None, **kwargs)[source]
Bases: Explainer
Explainer for tree-based models.
This explainer is optimized for tree-based models like Random Forests, Gradient Boosting, etc.
- Parameters:
model (object) – A tree-based model (e.g., sklearn RandomForest, XGBoost, LightGBM).
data (numpy.ndarray, optional) – Background data.
**kwargs (dict) – Additional parameters.
- __call__(X, **kwargs)[source]
Compute feature importance for tree-based models.
- Parameters:
X (numpy.ndarray) – Input data to explain.
**kwargs (dict) – Additional parameters.
- Returns:
Feature importance values.
- Return type:
Linear Models
- class fdfi.explainers.LinearExplainer(model, data=None, **kwargs)[source]
Bases: Explainer
Explainer for linear models.
This explainer is optimized for linear models like Linear Regression, Logistic Regression, etc.
- Parameters:
model (object) – A linear model.
data (numpy.ndarray, optional) – Background data.
**kwargs (dict) – Additional parameters.
- __call__(X, **kwargs)[source]
Compute feature importance for linear models.
- Parameters:
X (numpy.ndarray) – Input data to explain.
**kwargs (dict) – Additional parameters.
- Returns:
Feature importance values.
- Return type:
Kernel Methods
- class fdfi.explainers.KernelExplainer(model, data, **kwargs)[source]
Bases: Explainer
Explainer using kernel-based methods.
This is a model-agnostic explainer that can work with any model.
- Parameters:
model (callable) – The model to explain.
data (numpy.ndarray) – Background data (required for kernel methods).
**kwargs (dict) – Additional parameters.
- __call__(X, **kwargs)[source]
Compute feature importance using kernel methods.
- Parameters:
X (numpy.ndarray) – Input data to explain.
**kwargs (dict) – Additional parameters.
- Returns:
Feature importance values.
- Return type:
Gaussian Optimal Transport (OTExplainer)
The OTExplainer implements Gaussian optimal-transport DFI (Disentangled
Feature Importance) without cross-fitting. This is the recommended starting
point for most use cases.
- class fdfi.explainers.OTExplainer(model, data, nsamples=50, sampling_method='resample', random_state=0, **kwargs)[source]
Bases: Explainer
Optimal-transport DFI explainer using Gaussian transport.
This is the Gaussian DFI estimator without cross-fitting.
- Parameters:
- __init__(model, data, nsamples=50, sampling_method='resample', random_state=0, **kwargs)[source]
Initialize the OTExplainer.
- __call__(X, **kwargs)[source]
Compute feature importance for the given input.
- Parameters:
X (numpy.ndarray) – Input data to explain. Shape (n_samples, n_features).
**kwargs (dict) – Additional parameters for explanation.
- Returns:
Feature importance values. Shape (n_samples, n_features).
- Return type:
- Raises:
NotImplementedError – This method must be implemented by subclasses.
Example:
import numpy as np
from fdfi.explainers import OTExplainer
# Create model and data
def model(X):
    return X[:, 0] + 2 * X[:, 1]
X_background = np.random.randn(100, 10)
X_test = np.random.randn(10, 10)
# Create explainer and compute importance
explainer = OTExplainer(model, data=X_background, nsamples=50)
results = explainer(X_test)
print("Feature importance (X-space):", results["phi_X"])
print("Standard errors:", results["se_X"])
# Compute confidence intervals with FDR control (Benjamini-Hochberg)
ci = explainer.conf_int(multitest_method='fdr_bh', alpha=0.05)
print("Significant features after FDR control:", np.where(ci["reject_null"])[0])
print("Adjusted p-values:", ci["pvalue_adj"])
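For readers unfamiliar with the 'fdr_bh' option used above, the following stand-alone sketch shows what a Benjamini-Hochberg adjustment does to a vector of p-values. It is a NumPy re-implementation for illustration only; per the parameter description, the library relies on statsmodels for this step, and the helper name `bh_adjust` is hypothetical.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ascending p-values
    scaled = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(pvals)
reject = adj <= 0.05   # features kept at a 5% false discovery rate
print(adj, reject)
```

The adjusted values play the role of ci["pvalue_adj"]: a feature is declared significant when its adjusted p-value does not exceed alpha.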
Entropic Optimal Transport (EOTExplainer)
The EOTExplainer uses entropic optimal transport with Sinkhorn iterations.
It supports adaptive epsilon, stochastic transport sampling, and both Gaussian
and empirical transport targets.
- class fdfi.explainers.EOTExplainer(model, data, nsamples=50, epsilon=0.1, auto_epsilon=False, sampling_method='resample', random_state=0, **kwargs)[source]
Bases: Explainer
Entropic optimal-transport DFI explainer using semicontinuous transport and population backward attribution.
Uses the population EOT coupling between the empirical source and continuous N(0, I) target. The forward map is analytical:
Z = c_ε · X_whitened, c_ε = √(1 + ε) / (1 + ε/2)
Backward attribution uses the best linear projection:
E[X_whitened | Z] = M_w · Z
where M_w = E_π[ZZ^T]^{-1} E_π[ZX_w^T] is computed analytically from the semicontinuous coupling moments. This gives the weight matrix W = L @ M_w used for the decomposition:
φ_X_j = Σ_k W[j,k]² · φ_Z_k
Feature importance is measured via the uncentered efficient influence function (UEIF):
UEIF_{i,j} = (Y_i - ŷ_{-j,i})²
where ŷ_{-j} averages predictions over counterfactual resamples of feature j.
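Two of the formulas above are easy to check numerically. The sketch below evaluates the forward-map scaling c_ε for a few values of ε and applies the squared-weight decomposition to a small example. The weight matrix W and the latent importances are made-up numbers for illustration, and the helper names are hypothetical rather than part of the library.

```python
import numpy as np


def c_eps(epsilon):
    """Forward-map scaling c_eps = sqrt(1 + eps) / (1 + eps / 2)."""
    return np.sqrt(1.0 + epsilon) / (1.0 + epsilon / 2.0)


def phi_x_from_phi_z(W, phi_Z):
    """phi_X[j] = sum_k W[j, k]**2 * phi_Z[k]."""
    return (np.asarray(W) ** 2) @ np.asarray(phi_Z)


for eps in (0.0, 0.1, 1.0):
    print(f"epsilon={eps}: c_eps={c_eps(eps):.6f}")

# Made-up weight matrix and latent importances, purely for illustration.
W = np.array([[1.0, 0.0],
              [0.6, 0.8]])
phi_Z = np.array([2.0, 0.5])
phi_X = phi_x_from_phi_z(W, phi_Z)
print(phi_X)
```

Because (1 + ε/2)² = 1 + ε + ε²/4 ≥ 1 + ε, the scaling equals 1 at ε = 0 and falls below 1 as ε grows, so larger ε shrinks the latent representation.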
- Parameters:
model (callable) – The model to explain. Takes (n, d) array, returns (n,) predictions.
data (numpy.ndarray) – Background data for whitening and resampling. Shape (n, d).
nsamples (int, default=50) – Number of Monte Carlo samples per feature for counterfactual resampling.
epsilon (float, default=0.1) – EOT regularization parameter. Smaller ε → closer to exact OT; larger ε → more Gaussian shrinkage.
auto_epsilon (bool, default=False) – If True, set ε from a median-distance heuristic in whitened space.
sampling_method (str, default='resample') – How to draw counterfactual Z_j values:
  - ‘resample’: sample from the background Z pool
  - ‘permutation’: permute within the test set
  - ‘normal’: sample from N(0, 1)
random_state (int, default=0) – Random seed for reproducibility.
**kwargs (dict) – Extra arguments forwarded to the base Explainer.
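To make the roles of nsamples and the 'resample' option concrete, here is a minimal sketch of the UEIF computation described above. It assumes the data are already in a disentangled space, so no whitening, transport, or mapping back to X is performed, and it is not the library's code. The function name `sketch_ueif` is hypothetical.

```python
import numpy as np


def sketch_ueif(model, Z_test, y, Z_pool, nsamples=50, seed=0):
    """UEIF[i, j] = (y[i] - mean_b f(Z with column j resampled))**2."""
    rng = np.random.default_rng(seed)
    n, d = Z_test.shape
    ueif = np.empty((n, d))
    for j in range(d):
        preds = np.empty((nsamples, n))
        for b in range(nsamples):
            Z_cf = Z_test.copy()
            # 'resample': draw counterfactual values for feature j from the pool.
            Z_cf[:, j] = rng.choice(Z_pool[:, j], size=n, replace=True)
            preds[b] = model(Z_cf)
        y_hat = preds.mean(axis=0)          # average predictions over resamples
        ueif[:, j] = (y - y_hat) ** 2
    return ueif


def model(x):
    return x[:, 0]                          # only feature 0 matters


rng = np.random.default_rng(1)
Z_pool = rng.standard_normal((500, 3))
Z_test = rng.standard_normal((200, 3))
ueif = sketch_ueif(model, Z_test, model(Z_test), Z_pool)
print(ueif.mean(axis=0))
```

Resampling a feature the model ignores leaves every prediction unchanged, so its UEIF is zero, while the one feature the model uses receives a clearly positive score.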
- __init__(model, data, nsamples=50, epsilon=0.1, auto_epsilon=False, sampling_method='resample', random_state=0, **kwargs)[source]
Initialize the Explainer.
- __call__(X, y=None, **kwargs)[source]
Compute feature importance for the given input.
- Parameters:
X (numpy.ndarray) – Input data to explain. Shape (n_samples, n_features).
y (numpy.ndarray or None)
**kwargs (Any) – Additional parameters for explanation.
- Returns:
Feature importance values. Shape (n_samples, n_features).
- Return type:
- Raises:
NotImplementedError – This method must be implemented by subclasses.
Example with advanced options:
from fdfi.explainers import EOTExplainer
explainer = EOTExplainer(
    model.predict,
    X_background,
    auto_epsilon=True,          # Adaptive regularization
    stochastic_transport=True,  # Sample from transport kernel
    n_transport_samples=10,     # Number of transport samples
    target="gaussian",          # or "empirical"
)
results = explainer(X_test)
Flow-Based DFI (FlowExplainer)
The FlowExplainer implements Flow-Disentangled Feature Importance using
normalizing flows. It supports both CPI (Conditional Permutation Importance)
and SCPI (Sobol-CPI). The key difference is the order of averaging:
CPI: Average predictions first, then squared difference: $(Y - E[f(\tilde{X})])^2$
SCPI: Squared differences first, then average: $E[(Y - f(\tilde{X}_b))^2]$
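The effect of the averaging order can be seen in a few lines of NumPy. The sketch below uses random numbers as a stand-in for the counterfactual predictions, so it illustrates the two formulas only and makes no claim about the library's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 5, 50                          # test points, counterfactual draws
y = rng.standard_normal(n)            # observed responses Y_i
preds = rng.standard_normal((B, n))   # stand-in for f(X_tilde_b), one row per draw b

# CPI: average predictions over draws first, then square the residual.
cpi = (y - preds.mean(axis=0)) ** 2

# SCPI: square each residual first, then average over draws.
scpi = ((y - preds) ** 2).mean(axis=0)

# The gap is the (population) variance of the predictions across draws.
gap = preds.var(axis=0)
print(np.allclose(scpi, cpi + gap))
```

For each test point the two quantities differ by exactly the variance of the predictions over the draws, so SCPI is never smaller than CPI.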
- class fdfi.explainers.FlowExplainer(model, data, flow_model=None, fit_flow=True, nsamples=50, sampling_method='resample', permuter=None, method='cpi', random_state=None, verbose='final', compute_diagnostics=True, **kwargs)[source]
Bases: Explainer
Flow-based DFI explainer using normalizing flows.
Implements CPI (Conditional Permutation Importance) and SCPI (Sobol-CPI) methods. Both measure feature importance in Z-space:
CPI: Squared difference after averaging predictions: (Y - E[f(X_tilde)])^2
SCPI: Conditional variance of predictions: Var[f(X_tilde)]
For L2 loss with independent (disentangled) features, CPI and SCPI give similar results. SCPI is related to the Sobol total-order sensitivity index.
Z-space importance is transformed to X-space using the Jacobian of the flow:
phi_X[l] = sum_k H[l,k]^2 * phi_Z[k]
where H = dX/dZ is the Jacobian of the decoder transformation.
- Parameters:
model (callable) – The model to explain. Should take (n, d) array and return (n,) predictions.
data (numpy.ndarray) – Background data for fitting flow and resampling. Shape (n, d).
flow_model (object, optional) – Pre-trained flow model. If None, will create default FlowMatchingModel.
fit_flow (bool, default=True) – Whether to fit flow model during initialization.
nsamples (int, default=50) – Number of Monte Carlo samples per feature.
sampling_method (str, default='resample') – Method for generating counterfactual Z values:
  - ‘resample’: Sample from encoded background data
  - ‘permutation’: Permute within test set
  - ‘normal’: Sample from standard normal
  - ‘condperm’: Conditional permutation (regress Z_j | Z_{-j})
permuter (object, optional) – Regressor for conditional permutation method. Defaults to LinearRegression.
method (str, default='cpi') – Which importance method to use:
  - ‘cpi’: Conditional Permutation Importance (average predictions first)
  - ‘scpi’: Sobol-CPI (average squared differences)
  - ‘both’: Compute both CPI and SCPI
random_state (int, optional) – Random seed for reproducibility.
verbose (bool or str, default='final') – Controls training output:
  - True or ‘all’: Show full progress bar
  - ‘final’: Only print final step status (default)
  - False: Silent
compute_diagnostics (bool, default=True) – Whether to compute disentanglement diagnostics at setup time.
flow_solver_rtol (float, default=1e-3) – Relative tolerance for default ODE integration in flow encode/decode.
flow_solver_atol (float, default=1e-5) – Absolute tolerance for default ODE integration in flow encode/decode.
diagnostics_solver_rtol (float, default=1e-6) – Relative tolerance for diagnostics round-trip integration.
diagnostics_solver_atol (float, default=1e-8) – Absolute tolerance for diagnostics round-trip integration.
**kwargs (dict) – Additional arguments passed to FlowMatchingModel if creating default.
- Z_full
Encoded background data in latent space.
- Type:
Examples
>>> import numpy as np
>>> from fdfi.explainers import FlowExplainer
>>>
>>> # Define a simple model
>>> def model(x):
...     return x[:, 0] + 2 * x[:, 1]
>>>
>>> # Create background data
>>> X_train = np.random.randn(200, 5)
>>> X_test = np.random.randn(50, 5)
>>>
>>> # CPI only (default)
>>> explainer = FlowExplainer(model, X_train, method='cpi')
>>> results = explainer(X_test)
>>>
>>> # SCPI (Sobol-CPI - different averaging order)
>>> explainer = FlowExplainer(model, X_train, method='scpi')
>>> results = explainer(X_test)
- __init__(model, data, flow_model=None, fit_flow=True, nsamples=50, sampling_method='resample', permuter=None, method='cpi', random_state=None, verbose='final', compute_diagnostics=True, **kwargs)[source]
Initialize the FlowExplainer.
- fit_flow(X=None, num_steps=5000, verbose=None, **kwargs)[source]
Fit the flow model on data.
Can be called after initialization with fit_flow=False, or to refit on new data.
- Parameters:
X (numpy.ndarray, optional) – Data to fit on. If None, uses self.data.
num_steps (int, default=5000) – Number of training steps.
verbose (bool or str, optional) – Controls training output. If None, uses self.verbose.
  - True or ‘all’: Show full progress bar
  - ‘final’: Only print final step status (default)
  - False: Silent
**kwargs – Additional arguments passed to flow_model.fit().
- Returns:
For method chaining.
- Return type:
self
- set_flow(flow_model)[source]
Set a user-provided flow model.
The flow model must have a sample_batch(x, t_span) method where:
  - t_span=(1, 0) encodes X to Z
  - t_span=(0, 1) decodes Z to X
- Parameters:
flow_model (object) – A flow model with sample_batch(x, t_span) method.
- Returns:
For method chaining.
- Return type:
self
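As an illustration of the sample_batch(x, t_span) contract, the sketch below defines a toy object whose "flow" is an invertible linear whitening map. The class `LinearWhiteningFlow` is hypothetical, and it is an assumption that NumPy arrays are an acceptable input and output type. The documentation above only specifies the method name and the meaning of t_span, so treat this as a shape-of-the-interface example rather than a drop-in flow model.

```python
import numpy as np


class LinearWhiteningFlow:
    """Hypothetical stand-in for a flow: an invertible linear (whitening) map."""

    def __init__(self, X):
        self.mean_ = X.mean(axis=0)
        # Cholesky factor L with cov = L @ L.T
        self.L_ = np.linalg.cholesky(np.cov(X, rowvar=False))

    def sample_batch(self, x, t_span):
        x = np.asarray(x, dtype=float)
        if tuple(t_span) == (1, 0):      # encode: X -> Z
            return np.linalg.solve(self.L_, (x - self.mean_).T).T
        if tuple(t_span) == (0, 1):      # decode: Z -> X
            return x @ self.L_.T + self.mean_
        raise ValueError("t_span must be (1, 0) or (0, 1)")


rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.2, 0.3, 1.0]])
X = rng.standard_normal((300, 3)) @ A.T          # correlated features
flow = LinearWhiteningFlow(X)
Z = flow.sample_batch(X, t_span=(1, 0))
X_back = flow.sample_batch(Z, t_span=(0, 1))
print(np.allclose(X_back, X))
print(np.round(np.cov(Z, rowvar=False), 6))
```

Encoding followed by decoding recovers the input, and the encoded features have identity sample covariance, which is the disentanglement property a real flow aims for in the nonlinear case.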
- __call__(X, **kwargs)[source]
Compute feature importance.
- Parameters:
X (numpy.ndarray) – Input data to explain. Shape (n_samples, n_features).
**kwargs (dict) – Additional parameters (unused, for API compatibility).
- Returns:
Dictionary containing:
  - phi_Z: Z-space importance (d,), CPI or SCPI depending on method
  - std_Z: Standard deviation (d,)
  - se_Z: Standard error (d,)
  - phi_X: X-space importance (d,), transformed via Jacobian
  - std_X: Standard deviation (d,)
  - se_X: Standard error (d,)
When method='both', also includes phi_Z_scpi, std_Z_scpi, se_Z_scpi.
- Return type:
dict
Example with CPI (default):
from fdfi.explainers import FlowExplainer
explainer = FlowExplainer(
    model.predict,
    X_background,
    fit_flow=True,      # Fit normalizing flow during init
    method='cpi',       # CPI (default)
    num_steps=200,      # Flow training iterations
    nsamples=50,        # Monte Carlo samples
    random_state=42,
)
results = explainer(X_test)
print("Z-space importance (CPI):", results["phi_Z"])
ci = explainer.conf_int(alpha=0.05, target="Z")
print("Confidence intervals:", ci["ci_lower"], ci["ci_upper"])
Example with SCPI (Sobol-CPI):
from fdfi.explainers import FlowExplainer
explainer = FlowExplainer(
    model.predict,
    X_background,
    fit_flow=True,
    method='scpi',      # SCPI (Sobol-CPI)
    num_steps=200,
    nsamples=50,
)
results = explainer(X_test)
print("Importance (SCPI):", results["phi_Z"])
Using external flow models:
from fdfi.explainers import FlowExplainer
from fdfi.models import FlowMatchingModel
# Train flow externally
flow = FlowMatchingModel(X_background, dim=X_background.shape[1])
flow.fit(num_steps=500, verbose='final')
# Use in explainer
explainer = FlowExplainer(model.predict, X_background, fit_flow=False)
explainer.set_flow(flow)
results = explainer(X_test)
DFIExplainer Alias
DFIExplainer is an alias for OTExplainer for backward compatibility:
- fdfi.explainers.DFIExplainer
Alias for OTExplainer.
Cross-Fitting (Crossfitting)
The Crossfitting class wraps any of the above explainers and performs
K-fold cross-fitting so that the disentanglement map is never evaluated on
its own training data. This yields valid standard errors and confidence
intervals even when the sample size is small.
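The idea of never evaluating a map on its own training data can be sketched without the library. In the example below, a per-feature standardization stands in for the disentanglement map, and `kfold_indices` is a hypothetical helper that mimics a shuffled K-fold split in plain NumPy (the class itself uses scikit-learn splitters).

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled K-fold split."""
    perm = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(perm, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


rng = np.random.default_rng(0)
X = rng.standard_normal((103, 4)) * [1.0, 2.0, 3.0, 4.0] + 5.0

Z = np.full_like(X, np.nan)
times_scored = np.zeros(len(X), dtype=int)
for train_idx, test_idx in kfold_indices(len(X), k=5):
    # "Fit" a toy map (per-feature standardization) on the training part only...
    mu = X[train_idx].mean(axis=0)
    sd = X[train_idx].std(axis=0, ddof=1)
    # ...and apply it to the held-out part, so no row is mapped by a fit that saw it.
    Z[test_idx] = (X[test_idx] - mu) / sd
    times_scored[test_idx] += 1

print(times_scored.min(), times_scored.max(), np.isnan(Z).any())
```

Every row is transformed exactly once, by a map fitted on the other folds, which is the property cross-fitting relies on for valid standard errors.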
- class fdfi.explainers.Crossfitting(model, data, explainer_class=<class 'fdfi.explainers.OTExplainer'>, cv=5, y=None, groups=None, cv_kwargs=None, random_state=None, **kwargs)[source]
Bases: Explainer
Cross-fitted DFI explainer for valid inference at small sample sizes.
Wraps any Explainer subclass and performs cross-fitting using a scikit-learn cross-validation splitter. The disentanglement map is fitted on the training portion of each split and importance is evaluated on the held-out portion. Final estimates are the ensemble average of cross-fitted predictors.
- Parameters:
model (callable) – The model to explain. Takes (n, d) array, returns (n,) predictions.
data (numpy.ndarray) – Full dataset. Shape (n, d).
explainer_class (type, default=OTExplainer) – The explainer class to instantiate per split. Must be a subclass of Explainer (e.g., OTExplainer, EOTExplainer, FlowExplainer).
cv (int or sklearn cross-validation splitter, default=5) – Controls how data is split for cross-fitting. Pass an int for KFold(n_splits=cv, shuffle=True), or any scikit-learn splitter instance (e.g. KFold, StratifiedKFold, ShuffleSplit, RepeatedKFold, GroupKFold). Any object implementing .split(X, y, groups) is accepted.
y (array-like of shape (n,), optional) – Target / response variable. Required only when using a stratified splitter so that fold assignment preserves class distribution.
groups (array-like of shape (n,), optional) – Group labels for group-aware splitters (GroupKFold, etc.).
random_state (int or None, default=None) – Random seed for the default KFold splitter (when cv is int) and passed to child explainers.
**kwargs (dict) – Additional keyword arguments forwarded to each split’s explainer constructor (e.g., nsamples, epsilon, sampling_method, num_steps).
cv_kwargs (dict | None)
- cv_
The resolved cross-validation splitter.
- Type:
sklearn splitter instance
- fold_indices
(train_idx, test_idx) for each split.
- Type:
- ueifs_X
Per-sample X-space UEIFs, shape (n, d), after calling with X=None.
- Type:
numpy.ndarray or None
- ueifs_Z
Per-sample Z-space UEIFs, shape (n, d), after calling with X=None.
- Type:
numpy.ndarray or None
- __init__(model, data, explainer_class=<class 'fdfi.explainers.OTExplainer'>, cv=5, y=None, groups=None, cv_kwargs=None, random_state=None, **kwargs)[source]
Initialize the Explainer.
- __call__(X=None, **kwargs)[source]
Compute cross-fitted feature importance.
If X is None, performs full cross-fitting on self.data: each split’s test set is the held-out portion of the data.
If X is provided, uses the ensemble of fitted fold explainers to compute importance on X and averages the results.
- Parameters:
X (numpy.ndarray or None) – If None, cross-fit on self.data (recommended for valid inference). If provided, shape (m, d), ensemble-predict on new data.
kwargs (Any)
- Returns:
Same format as OTExplainer / FlowExplainer: phi_X, std_X, se_X, phi_Z, std_Z, se_Z.
- Return type:
Example — cross-fitted OTExplainer (default KFold):
from fdfi.explainers import Crossfitting, OTExplainer
cf = Crossfitting(
    model.predict,
    data=X_background,
    explainer_class=OTExplainer,
    cv=5,               # 5-fold KFold (default)
    nsamples=50,
    random_state=42,
)
results = cf() # cross-fit on X_background
ci = cf.conf_int(alpha=0.05)
cf.summary()
Example — using a custom sklearn splitter:
from sklearn.model_selection import StratifiedKFold, ShuffleSplit
from fdfi.explainers import Crossfitting, EOTExplainer
# Stratified K-Fold (preserves class balance)
cf = Crossfitting(
    model.predict, X_background,
    explainer_class=EOTExplainer,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    y=y_train,          # required for stratification
    nsamples=50,
)
results = cf()
# ShuffleSplit (random train/test splits)
cf = Crossfitting(
    model.predict, X_background,
    explainer_class=OTExplainer,
    cv=ShuffleSplit(n_splits=10, test_size=0.2, random_state=0),
)
results = cf()
Ensemble prediction on new data:
# After cross-fitting, predict on unseen data
results_new = cf(X_test) # averages across all fold explainers
Group Importance
All explainer classes support group-level feature importance via the
groups argument in conf_int(). After running an explainer (so that
per-sample UEIFs are available), call conf_int(groups=...) to obtain
group-level importance, standard errors, and p-values.
Groups can be specified as:
- A dict mapping group names to lists of feature indices.
- A 1-D numpy array of group labels (one per feature).
- A binary pandas.DataFrame (features × groups); features may belong to multiple groups.
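One way to think about the three formats is that they all describe the same (features × groups) indicator matrix. The sketch below converts the dict and label-array forms into that matrix and sums per-sample scores within each group. The helper `groups_to_indicator`, the first-seen ordering of labels, and the toy UEIF values are assumptions made for illustration. The library's internal conversion may differ, and a binary DataFrame would presumably already be in this matrix form.

```python
import numpy as np


def groups_to_indicator(groups, d):
    """Return (names, M): M is a (d, G) 0/1 matrix with M[j, g] = 1 if feature j is in group g."""
    if isinstance(groups, dict):
        names = list(groups)
        M = np.zeros((d, len(names)), dtype=int)
        for g, name in enumerate(names):
            M[list(groups[name]), g] = 1
        return names, M
    labels = np.asarray(groups)
    if labels.shape != (d,):
        raise ValueError("label array must have one entry per feature")
    names = list(dict.fromkeys(labels.tolist()))   # unique labels, first-seen order
    M = (labels[:, None] == np.array(names)[None, :]).astype(int)
    return names, M


names_a, M_a = groups_to_indicator({"signal": [0, 1], "noise": [2, 3]}, d=4)
names_b, M_b = groups_to_indicator(np.array(["signal", "signal", "noise", "noise"]), d=4)

# Group-level per-sample scores as sums of member-feature UEIFs (toy numbers).
ueif = np.arange(12, dtype=float).reshape(3, 4)
group_ueif = ueif @ M_a
print(names_a, group_ueif.mean(axis=0))
```

Overlapping groups need no special handling in this representation: a feature that belongs to two groups simply has two ones in its row.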
Example — dict input:
from fdfi.explainers import OTExplainer
explainer = OTExplainer(model.predict, X_background, nsamples=50)
explainer(X_test)
groups = {"signal": [0, 1, 2], "noise": [3, 4, 5, 6, 7, 8, 9]}
res = explainer.conf_int(groups=groups)
for name, imp, se, p in zip(
    res["groups"], res["score"], res["se"], res["pvalue"]
):
    print(f"{name}: importance={imp:.4f} se={se:.4f} p={p:.4f}")
Example — pandas DataFrame (overlapping groups):
import pandas as pd
# Features can belong to multiple groups
df_groups = pd.DataFrame({
    "group_A": [1, 1, 0, 0, 0],
    "group_B": [0, 1, 1, 0, 0],  # feature 1 in both A and B
    "group_C": [0, 0, 0, 1, 1],
})
res = explainer.conf_int(groups=df_groups)
Example — with Crossfitting:
from fdfi.explainers import Crossfitting, OTExplainer
cf = Crossfitting(model.predict, X_background, cv=5, nsamples=50)
cf() # cross-fit first
res = cf.conf_int(groups={"signal": [0, 1, 2], "noise": [3, 4]})