Explainers

Overview

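The fdfi.explainers module provides classes for computing flow-disentangled feature importance. The main classes are:

  • Explainer – the base class, which provides conf_int(), summary() and diagnose().

  • TreeExplainer, LinearExplainer and KernelExplainer – explainers for tree-based, linear and arbitrary models.

  • OTExplainer (alias DFIExplainer) – Gaussian optimal-transport DFI.

  • EOTExplainer – entropic optimal-transport DFI.

  • FlowExplainer – normalizing-flow DFI with CPI and SCPI.

  • Crossfitting – a K-fold cross-fitting wrapper around any of the above.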

Base Explainer

class fdfi.explainers.Explainer(model, data=None, **kwargs)

Bases: object

Base class for DFI explainers.

This class provides the interface for computing feature importance using disentangled methods, similar to SHAP explainers. It also provides post-hoc confidence intervals via conf_int() and formatted summaries via summary().

Parameters:
  • model (callable) – The model to explain. Should be a function that takes a numpy array and returns predictions.

  • data (numpy.ndarray, optional) – Background data to use for explanations.

  • **kwargs (dict) – Additional parameters for the explainer.

model

The model being explained.

Type:

callable

data

Background data for explanations.

Type:

numpy.ndarray or None

Examples

>>> import numpy as np
>>> from fdfi.explainers import Explainer
>>>
>>> # Define a simple model
>>> def model(x):
...     return x.sum(axis=1)
>>>
>>> # Create an explainer
>>> explainer = Explainer(model)
>>>
>>> # The base class does not implement __call__; calling it raises
>>> # NotImplementedError. Use a subclass such as OTExplainer instead.
__init__(model, data=None, **kwargs)

Initialize the Explainer.

conf_int(alpha=0.05, target='X', groups=None, threshold_null=True, multitest_method=None, var_floor_c=0.1, var_floor_method='mixture', var_floor_quantile=0.95, margin=0.0, margin_method='auto', margin_quantile=0.95, alternative='two-sided', verbose=False)

Compute confidence intervals and significance statistics for feature importance.

If groups is provided, computes importance and uncertainty at the group level.

Parameters:
  • alpha (float, default=0.05) – Significance level.

  • target (str, default='X') – Which space to use: ‘X’ (original) or ‘Z’ (latent).

  • groups (dict, numpy.ndarray, or pandas.DataFrame, optional) –

    Group assignment for features. Accepts:

    • dict: {group_name: [feature_indices]}

    • numpy.ndarray: 1-D array of length d with group labels.

    • pandas.DataFrame: binary indicator matrix (features × groups).

  • threshold_null (bool, default=True) – Zero out per-feature uncentered UEIFs with negative mean before summing.

  • multitest_method (str, optional) – Multiple testing correction method. Supports methods from statsmodels.stats.multitest.multipletests, e.g., ‘bonferroni’, ‘holm’, ‘fdr_bh’ (Benjamini-Hochberg), ‘fdr_by’.

  • var_floor_c (float, default=0.1) – Constant for the variance floor.

  • var_floor_method (str, default='mixture') – Method for variance floor calculation (‘mixture’ or ‘fixed’).

  • var_floor_quantile (float, default=0.95) – Quantile for the ‘mixture’ variance floor method.

  • margin (float, default=0.0) – Hypothesized margin for null hypothesis.

  • margin_method (str, default='auto') – Method to estimate the margin (‘auto’, ‘mixture’, ‘gap’, or ‘fixed’).

  • margin_quantile (float, default=0.95) – Quantile for the ‘mixture’ margin method.

  • alternative (str, default='two-sided') – Alternative hypothesis (‘two-sided’, ‘greater’, or ‘less’).

  • verbose (bool, default=False) – Whether to print debug information.

Returns:

Dictionary containing ‘score’, ‘se’, ‘ci_lower’, ‘ci_upper’, ‘reject_null’, ‘pvalue’, ‘margin’, and ‘alternative’. If groups is provided, ‘groups’ (list of names) is also included. If multitest_method is provided, ‘pvalue_adj’ is also included.

Return type:

dict
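To make the returned keys concrete, here is a minimal sketch of per-feature Wald statistics computed from per-sample UEIFs, plus a Benjamini-Hochberg adjustment of the kind selected by multitest_method='fdr_bh'. It is an illustration under stated assumptions, not conf_int() itself: it skips the variance floor (var_floor_*) and margin estimation (margin_method), treats margin as fixed, and uses a normal approximation. wald_summary and bh_adjust are hypothetical helper names.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def wald_summary(ueif, alpha=0.05, margin=0.0, alternative="two-sided"):
    # Hypothetical helper. ueif: per-sample UEIFs, shape (n, d), one column per feature.
    # No variance floor or data-driven margin, unlike conf_int().
    ueif = np.asarray(ueif, dtype=float)
    n = ueif.shape[0]
    score = ueif.mean(axis=0)                      # importance estimate
    se = ueif.std(axis=0, ddof=1) / np.sqrt(n)     # standard error of the mean
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha=0.05
    z = (score - margin) / se                      # Wald statistic against the margin
    cdf = np.array([_STD_NORMAL.cdf(v) for v in z])
    if alternative == "two-sided":
        pvalue = 2 * np.minimum(cdf, 1 - cdf)
    elif alternative == "greater":
        pvalue = 1 - cdf
    elif alternative == "less":
        pvalue = cdf
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return {
        "score": score,
        "se": se,
        "ci_lower": score - z_crit * se,
        "ci_upper": score + z_crit * se,
        "pvalue": pvalue,
        "reject_null": pvalue < alpha,
    }


def bh_adjust(pvalues):
    # Benjamini-Hochberg step-up adjusted p-values (the idea behind "fdr_bh").
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted
```

For example, bh_adjust([0.01, 0.04, 0.03]) gives [0.03, 0.04, 0.04].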

summary(alpha=0.05, print_output=True, **kwargs)

Produce a formatted summary of the explainer's results.

Return type:

str

group_importance(groups, target='X', threshold_null=True, se_adjustment=0.1, alpha=0.05)

Compute group-level feature importance with uncertainty.

Deprecated since version 0.2.0: Use conf_int() with the groups argument instead.

Parameters:
  • groups (dict, numpy.ndarray, or pandas.DataFrame) –

    Group assignment for features. Accepts:

    • dict: {group_name: [feature_indices]}

    • numpy.ndarray: 1-D array of length d with group labels.

    • pandas.DataFrame: binary indicator matrix (features × groups).

  • target (str, default='X') – Which space to aggregate: 'X' or 'Z'.

  • threshold_null (bool, default=True) – Zero out per-feature UEIFs with negative mean before summing.

  • se_adjustment (float, default=0.1) – Finite-sample SE correction constant. Set to 0.0 to disable.

  • alpha (float, default=0.05) – Significance level.

Returns:

Dictionary with keys 'groups', 'importance', 'se', 'zscore' and 'pvalue'; each value has length G (the number of groups).

Return type:

dict

diagnose(X_orig=None, Z_full=None, report_title=None)

Compute (or recompute) the disentanglement diagnostics described under Shared Disentanglement Diagnostics.
Return type:

dict

__call__(X, **kwargs)

Compute feature importance for the given input.

Parameters:
  • X (numpy.ndarray) – Input data to explain. Shape (n_samples, n_features).

  • **kwargs (dict) – Additional parameters for explanation.

Returns:

Feature importance values. Shape (n_samples, n_features).

Return type:

numpy.ndarray

Raises:

NotImplementedError – This method must be implemented by subclasses.

shap_values(X, **kwargs)

Compute SHAP-like values (alias for __call__).

Parameters:
  • X (numpy.ndarray) – Input data to explain.

  • **kwargs (dict) – Additional parameters.

Returns:

Feature importance values.

Return type:

numpy.ndarray

Tree-Based Models

class fdfi.explainers.TreeExplainer(model, data=None, **kwargs)

Bases: Explainer

Explainer for tree-based models.

This explainer is optimized for tree-based models like Random Forests, Gradient Boosting, etc.

Parameters:
  • model (object) – A tree-based model (e.g., sklearn RandomForest, XGBoost, LightGBM).

  • data (numpy.ndarray, optional) – Background data.

  • **kwargs (dict) – Additional parameters.

__init__(model, data=None, **kwargs)

Initialize the TreeExplainer.

__call__(X, **kwargs)

Compute feature importance for tree-based models.

Parameters:
  • X (numpy.ndarray) – Input data to explain.

  • **kwargs (dict) – Additional parameters.

Returns:

Feature importance values.

Return type:

numpy.ndarray

Linear Models

class fdfi.explainers.LinearExplainer(model, data=None, **kwargs)

Bases: Explainer

Explainer for linear models.

This explainer is optimized for linear models like Linear Regression, Logistic Regression, etc.

Parameters:
  • model (object) – A linear model.

  • data (numpy.ndarray, optional) – Background data.

  • **kwargs (dict) – Additional parameters.

__init__(model, data=None, **kwargs)

Initialize the LinearExplainer.

__call__(X, **kwargs)

Compute feature importance for linear models.

Parameters:
  • X (numpy.ndarray) – Input data to explain.

  • **kwargs (dict) – Additional parameters.

Returns:

Feature importance values.

Return type:

numpy.ndarray

Kernel Methods

class fdfi.explainers.KernelExplainer(model, data, **kwargs)

Bases: Explainer

Explainer using kernel-based methods.

This is a model-agnostic explainer that can work with any model.

Parameters:
  • model (callable) – The model to explain.

  • data (numpy.ndarray) – Background data (required for kernel methods).

  • **kwargs (dict) – Additional parameters.

__init__(model, data, **kwargs)

Initialize the KernelExplainer.

__call__(X, **kwargs)

Compute feature importance using kernel methods.

Parameters:
  • X (numpy.ndarray) – Input data to explain.

  • **kwargs (dict) – Additional parameters.

Returns:

Feature importance values.

Return type:

numpy.ndarray

Gaussian Optimal Transport (OTExplainer)

The OTExplainer implements Gaussian optimal-transport DFI (Disentangled Feature Importance) without cross-fitting. This is the recommended starting point for most use cases.

class fdfi.explainers.OTExplainer(model, data, nsamples=50, sampling_method='resample', random_state=0, **kwargs)

Bases: Explainer

Optimal-transport DFI explainer using Gaussian transport.

This is the Gaussian DFI estimator without cross-fitting.
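The Gaussian OT map from N(m, Σ) to N(0, I) is the symmetric whitening map Z = Σ^{-1/2}(X - m), whose inverse X = m + Σ^{1/2} Z has the constant Jacobian Σ^{1/2}. A NumPy sketch follows, assuming the Gaussian is fitted to the background data with the sample mean and covariance; the page does not spell out OTExplainer's exact estimator, and gaussian_ot_maps is a hypothetical name.

```python
import numpy as np


def gaussian_ot_maps(X):
    # Hypothetical sketch: OT map from a Gaussian fitted to X onto N(0, I).
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Symmetric matrix square roots via the eigendecomposition of the covariance.
    evals, evecs = np.linalg.eigh(cov)
    cov_sqrt = evecs @ np.diag(np.sqrt(evals)) @ evecs.T         # Sigma^{1/2}
    cov_isqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T  # Sigma^{-1/2}

    def encode(x):
        # X -> Z for row vectors; cov_isqrt is symmetric, so right-multiplying works.
        return (np.asarray(x, dtype=float) - mean) @ cov_isqrt

    def decode(z):
        # Z -> X; the Jacobian dX/dZ is the constant matrix Sigma^{1/2}.
        return np.asarray(z, dtype=float) @ cov_sqrt + mean

    return encode, decode, cov_sqrt
```

On the fitted data the encoded sample has identity covariance, and decode(encode(X)) recovers X.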

__init__(model, data, nsamples=50, sampling_method='resample', random_state=0, **kwargs)

Initialize the OTExplainer.

__call__(X, **kwargs)

Compute feature importance for the given input.

Parameters:
  • X (numpy.ndarray) – Input data to explain. Shape (n_samples, n_features).

  • **kwargs (dict) – Additional parameters for explanation.

Returns:

Dictionary containing phi_X, std_X and se_X (X-space importance, standard deviation and standard error, each of shape (n_features,)) and their Z-space counterparts phi_Z, std_Z and se_Z.

Return type:

dict

Example:

import numpy as np
from fdfi.explainers import OTExplainer

# Create model and data
def model(X):
    return X[:, 0] + 2 * X[:, 1]

X_background = np.random.randn(100, 10)
X_test = np.random.randn(10, 10)

# Create explainer and compute importance
explainer = OTExplainer(model, data=X_background, nsamples=50)
results = explainer(X_test)

print("Feature importance (X-space):", results["phi_X"])
print("Standard errors:", results["se_X"])

# Compute confidence intervals with FDR control (Benjamini-Hochberg)
ci = explainer.conf_int(multitest_method='fdr_bh', alpha=0.05)
print("Significant features after FDR control:", np.where(ci["reject_null"])[0])
print("Adjusted p-values:", ci["pvalue_adj"])

Entropic Optimal Transport (EOTExplainer)

The EOTExplainer uses entropic optimal transport with Sinkhorn iterations. It supports adaptive epsilon, stochastic transport sampling, and both Gaussian and empirical transport targets.

class fdfi.explainers.EOTExplainer(model, data, nsamples=50, epsilon=0.1, auto_epsilon=False, sampling_method='resample', random_state=0, **kwargs)

Bases: Explainer

Entropic optimal-transport DFI explainer using semicontinuous transport and population backward attribution.

Uses the population EOT coupling between the empirical source and continuous N(0, I) target. The forward map is analytical:

Z = c_ε · X_whitened, c_ε = √(1 + ε) / (1 + ε/2)

Backward attribution uses the best linear projection:

E[X_whitened | Z] = M_w · Z

where M_w = E_π[ZZ^T]^{-1} E_π[ZX_w^T] is computed analytically from the semicontinuous coupling moments. This gives the weight matrix W = L @ M_w used for the decomposition:

φ_X_j = Σ_k W[j,k]² · φ_Z_k
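The scale factor c_ε and the Z-to-X decomposition are each a one-liner in NumPy. In this sketch W is taken as given (computing M_w from the coupling moments, and the matrix L, is not reproduced), and eot_scale, eot_forward and phi_z_to_x are hypothetical names.

```python
import numpy as np


def eot_scale(eps):
    # c_eps = sqrt(1 + eps) / (1 + eps / 2), the factor in the forward map.
    return np.sqrt(1.0 + eps) / (1.0 + eps / 2.0)


def eot_forward(X_whitened, eps):
    # Forward map Z = c_eps * X_whitened, applied to already-whitened data.
    return eot_scale(eps) * np.asarray(X_whitened, dtype=float)


def phi_z_to_x(W, phi_Z):
    # phi_X[j] = sum_k W[j, k]**2 * phi_Z[k]: square W entrywise, then a matrix-vector product.
    W = np.asarray(W, dtype=float)
    return (W ** 2) @ np.asarray(phi_Z, dtype=float)
```

c_ε equals 1 at ε = 0 and decreases as ε grows, since c_ε² = (1 + ε) / (1 + ε + ε²/4). This matches the description of epsilon below as adding shrinkage. With W = [[1.0, 0.5], [0.0, 2.0]] and φ_Z = [0.2, 0.1], phi_z_to_x returns [0.225, 0.4].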

Feature importance is measured via the uncentered efficient influence function (UEIF):

UEIF_{i,j} = (Y_i - ŷ_{-j,i})²

where ŷ_{-j} averages predictions over counterfactual resamples of feature j.
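Below is a direct NumPy transcription of the UEIF formula for the 'resample' strategy. It assumes that Y_i is the model's prediction at the unmodified point (the explainers take a model and data, not a response), that latent codes are decoded back to X before calling the model, and that Z_j is redrawn independently for each Monte Carlo sample. ueif_resample is a hypothetical name.

```python
import numpy as np


def ueif_resample(model, Z_test, Z_pool, decode, nsamples=50, seed=0):
    # Hypothetical sketch of UEIF_{i,j} = (Y_i - yhat_{-j,i})**2 in latent space.
    rng = np.random.default_rng(seed)
    Z_test = np.asarray(Z_test, dtype=float)
    Z_pool = np.asarray(Z_pool, dtype=float)
    n, d = Z_test.shape
    y = model(decode(Z_test))                 # Y_i: prediction at the original point
    ueif = np.empty((n, d))
    for j in range(d):
        preds = np.empty((nsamples, n))
        for b in range(nsamples):
            Z_tilde = Z_test.copy()
            # 'resample': redraw coordinate j from the background latent pool.
            Z_tilde[:, j] = rng.choice(Z_pool[:, j], size=n, replace=True)
            preds[b] = model(decode(Z_tilde))
        y_minus_j = preds.mean(axis=0)        # average over counterfactual resamples
        ueif[:, j] = (y - y_minus_j) ** 2
    return ueif
```

With an identity decoder and a model that ignores feature 1, the second UEIF column is zero because resampling that coordinate never changes the prediction.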

Parameters:
  • model (callable) – The model to explain. Takes (n, d) array, returns (n,) predictions.

  • data (numpy.ndarray) – Background data for whitening and resampling. Shape (n, d).

  • nsamples (int, default=50) – Number of Monte Carlo samples per feature for counterfactual resampling.

  • epsilon (float, default=0.1) – EOT regularization parameter. Smaller ε → closer to exact OT; larger ε → more Gaussian shrinkage.

  • auto_epsilon (bool, default=False) – If True, set ε from a median-distance heuristic in whitened space.

  • sampling_method (str, default='resample') –

    How to draw counterfactual Z_j values:

    • ‘resample’: sample from the background Z pool

    • ‘permutation’: permute within the test set

    • ‘normal’: sample from N(0, 1)

  • random_state (int, default=0) – Random seed for reproducibility.

  • **kwargs (dict) – Extra arguments forwarded to the base Explainer.

__init__(model, data, nsamples=50, epsilon=0.1, auto_epsilon=False, sampling_method='resample', random_state=0, **kwargs)

Initialize the EOTExplainer.

__call__(X, **kwargs)

Compute feature importance for the given input.

Parameters:
  • X (numpy.ndarray) – Input data to explain. Shape (n_samples, n_features).

  • **kwargs (dict) – Additional parameters for explanation.

Returns:

Dictionary of importance estimates in the same format as OTExplainer: phi_X, std_X, se_X, phi_Z, std_Z and se_Z.

Return type:

dict

Example with advanced options:

from fdfi.explainers import EOTExplainer

explainer = EOTExplainer(
    model.predict,
    X_background,
    auto_epsilon=True,           # Adaptive regularization
    stochastic_transport=True,   # Sample from transport kernel
    n_transport_samples=10,      # Number of transport samples
    target="gaussian",           # or "empirical"
)
results = explainer(X_test)

Shared Disentanglement Diagnostics

OTExplainer, EOTExplainer, and FlowExplainer expose a shared diagnostics interface via:

  • explainer.diagnostics (computed at setup by default)

  • explainer.diagnose(...) (recompute manually)

The diagnostics dictionary contains:

  • latent_independence_dcor (pairwise dCor matrix)

  • latent_independence_median and latent_independence_label

  • distribution_fidelity_mmd and distribution_fidelity_label

diag = explainer.diagnostics
# or: diag = explainer.diagnose()
print(diag["latent_independence_median"], diag["latent_independence_label"])
print(diag["distribution_fidelity_mmd"], diag["distribution_fidelity_label"])
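These diagnostics are standard statistics. The sketch below computes a pairwise distance-correlation matrix with the biased (V-statistic) estimator, the median of its off-diagonal entries, and a biased squared MMD with a Gaussian kernel. Which estimators, kernel bandwidth and label thresholds fdfi actually uses is not documented on this page, so treat these hypothetical functions as illustrations of the quantities rather than reproductions of latent_independence_dcor or distribution_fidelity_mmd.

```python
import numpy as np


def distance_correlation(x, y):
    # Biased (V-statistic) distance correlation between two 1-D samples.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def centered(v):
        a = np.abs(v[:, None] - v[None, :])   # pairwise distances
        return a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()

    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2, 0.0) / denom))


def pairwise_dcor(Z):
    # Symmetric (d, d) matrix of distance correlations between latent coordinates.
    Z = np.asarray(Z, dtype=float)
    d = Z.shape[1]
    D = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            D[i, j] = D[j, i] = distance_correlation(Z[:, i], Z[:, j])
    return D


def offdiag_median(D):
    # Median over pairs i != j (lower values suggest more independent coordinates).
    mask = ~np.eye(D.shape[0], dtype=bool)
    return float(np.median(D[mask]))


def rbf_mmd2(X, Y, bandwidth=1.0):
    # Biased squared MMD with a Gaussian (RBF) kernel.
    def k(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq / (2 * bandwidth ** 2))

    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return float(k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean())
```

Perfectly dependent coordinates (for example, y = 2x + 1) give a distance correlation of 1, and identical samples give an MMD of 0.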

Flow-Based DFI (FlowExplainer)

The FlowExplainer implements Flow-Disentangled Feature Importance using normalizing flows. It supports both CPI (Conditional Permutation Importance) and SCPI (Sobol-CPI). The key difference is the order of averaging:

  • CPI: Average predictions first, then take the squared difference: $(Y - E[f(\tilde{X})])^2$

  • SCPI: Squared differences first, then average: $E[(Y - f(\tilde{X}_b))^2]$
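For a single observation with B counterfactual predictions f_b = f(X̃_b), the two orders of averaging differ by an exact term: (1/B) Σ_b (Y - f_b)² = (Y - f̄)² + (1/B) Σ_b (f_b - f̄)², where f̄ is the mean of the f_b. The sketch below checks this identity on synthetic values (Y and the f_b are made up for illustration), so the per-sample gap between the SCPI and CPI quantities is the spread of the counterfactual predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
y = 1.3                                          # Y for one observation (synthetic)
f_b = rng.normal(loc=1.0, scale=0.5, size=50)    # f(X_tilde_b), b = 1..B (synthetic)

cpi = (y - f_b.mean()) ** 2                      # average predictions first, then square
scpi = np.mean((y - f_b) ** 2)                   # square first, then average
spread = f_b.var()                               # variance with 1/B normalization (ddof=0)

# The SCPI quantity exceeds the CPI quantity by exactly the spread of the predictions.
assert np.isclose(scpi, cpi + spread)
```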

class fdfi.explainers.FlowExplainer(model, data, flow_model=None, fit_flow=True, nsamples=50, sampling_method='resample', permuter=None, method='cpi', random_state=None, verbose='final', compute_diagnostics=True, **kwargs)

Bases: Explainer

Flow-based DFI explainer using normalizing flows.

Implements CPI (Conditional Permutation Importance) and SCPI (Sobol-CPI) methods. Both measure feature importance in Z-space:

  • CPI: Squared difference after averaging predictions: (Y - E[f(X_tilde)])^2

  • SCPI: Conditional variance of predictions: Var[f(X_tilde)]

For L2 loss with independent (disentangled) features, CPI and SCPI give similar results. SCPI is related to the Sobol total-order sensitivity index.

Z-space importance is transformed to X-space using the Jacobian of the flow:

phi_X[l] = Σ_k H[l,k]² · phi_Z[k]

where H = ∂X/∂Z is the Jacobian of the decoder transformation.
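A sketch of the Jacobian transformation follows, assuming decode is any callable mapping an (n, d) latent array to an (n, d) data array. It estimates H = ∂X/∂Z by central finite differences and, as an assumption of this sketch (the page does not say how per-sample Jacobians are aggregated), averages H² over the latent points before applying the formula. decoder_jacobian and phi_x_from_flow are hypothetical names, not part of fdfi.

```python
import numpy as np


def decoder_jacobian(decode, z, h=1e-5):
    # Central finite-difference Jacobian H[l, k] = dX_l / dZ_k at one latent point z.
    z = np.asarray(z, dtype=float)
    d = z.size
    out_dim = decode(z[None, :]).shape[1]
    H = np.empty((out_dim, d))
    for k in range(d):
        step = np.zeros(d)
        step[k] = h
        plus = decode((z + step)[None, :])[0]
        minus = decode((z - step)[None, :])[0]
        H[:, k] = (plus - minus) / (2 * h)
    return H


def phi_x_from_flow(decode, Z, phi_Z):
    # Average the squared Jacobian over latent points (an assumption of this sketch),
    # then apply phi_X[l] = sum_k H[l, k]**2 * phi_Z[k].
    H2 = np.mean([decoder_jacobian(decode, z) ** 2 for z in np.asarray(Z, dtype=float)], axis=0)
    return H2 @ np.asarray(phi_Z, dtype=float)
```

For a linear decoder X = Z Aᵀ the estimated Jacobian is A at every point, so the result reduces to the same weighted sum used by the EOT decomposition.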

Parameters:
  • model (callable) – The model to explain. Should take (n, d) array and return (n,) predictions.

  • data (numpy.ndarray) – Background data for fitting flow and resampling. Shape (n, d).

  • flow_model (object, optional) – Pre-trained flow model. If None, a default FlowMatchingModel is created.

  • fit_flow (bool, default=True) – Whether to fit flow model during initialization.

  • nsamples (int, default=50) – Number of Monte Carlo samples per feature.

  • sampling_method (str, default='resample') –

    Method for generating counterfactual Z values:

    • ‘resample’: sample from encoded background data

    • ‘permutation’: permute within the test set

    • ‘normal’: sample from the standard normal

    • ‘condperm’: conditional permutation (regress Z_j on Z_{-j})

  • permuter (object, optional) – Regressor for conditional permutation method. Defaults to LinearRegression.

  • method (str, default='cpi') –

    Which importance method to use:

    • ‘cpi’: Conditional Permutation Importance (average predictions first)

    • ‘scpi’: Sobol-CPI (average squared differences)

    • ‘both’: compute both CPI and SCPI

  • random_state (int, optional) – Random seed for reproducibility.

  • verbose (bool or str, default='final') –

    Controls training output:

    • True or ‘all’: show the full progress bar

    • ‘final’: print only the final step status (default)

    • False: silent

  • compute_diagnostics (bool, default=True) – Whether to compute disentanglement diagnostics at setup time.

  • flow_solver_rtol (float, default=1e-3) – Relative tolerance for default ODE integration in flow encode/decode.

  • flow_solver_atol (float, default=1e-5) – Absolute tolerance for default ODE integration in flow encode/decode.

  • diagnostics_solver_rtol (float, default=1e-6) – Relative tolerance for diagnostics round-trip integration.

  • diagnostics_solver_atol (float, default=1e-8) – Absolute tolerance for diagnostics round-trip integration.

  • **kwargs (dict) – Additional arguments passed to FlowMatchingModel when the default flow model is created.
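Conditional permutation for a single latent coordinate j can be sketched as follows, assuming the common residual-permutation scheme: regress Z_j on Z_{-j}, keep the fitted values, and add back a permutation of the residuals. Ordinary least squares with an intercept (the model the default LinearRegression permuter fits) is done here with numpy.linalg.lstsq to keep the example dependency-free. conditional_permutation is a hypothetical name, and fdfi's exact scheme may differ.

```python
import numpy as np


def conditional_permutation(Z, j, seed=0):
    # Redraw column j by permuting residuals of a linear fit of Z_j on Z_{-j}.
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    others = np.delete(Z, j, axis=1)
    design = np.column_stack([np.ones(len(Z)), others])    # intercept + Z_{-j}
    coef, *_ = np.linalg.lstsq(design, Z[:, j], rcond=None)
    fitted = design @ coef                                   # linear estimate of E[Z_j | Z_{-j}]
    residuals = Z[:, j] - fitted
    Z_tilde = Z.copy()
    Z_tilde[:, j] = fitted + rng.permutation(residuals)      # keep the fit, shuffle the noise
    return Z_tilde
```

Only column j changes; the other coordinates, and hence the conditioning information, are left intact.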

flow_model

The fitted normalizing flow model.

Type:

object

Z_full

Encoded background data in latent space.

Type:

numpy.ndarray

method

The importance method being used (‘cpi’, ‘scpi’, or ‘both’).

Type:

str

Examples

>>> import numpy as np
>>> from fdfi.explainers import FlowExplainer
>>>
>>> # Define a simple model
>>> def model(x):
...     return x[:, 0] + 2 * x[:, 1]
>>>
>>> # Create background data
>>> X_train = np.random.randn(200, 5)
>>> X_test = np.random.randn(50, 5)
>>>
>>> # CPI only (default)
>>> explainer = FlowExplainer(model, X_train, method='cpi')
>>> results = explainer(X_test)
>>>
>>> # SCPI (Sobol-CPI - different averaging order)
>>> explainer = FlowExplainer(model, X_train, method='scpi')
>>> results = explainer(X_test)
__init__(model, data, flow_model=None, fit_flow=True, nsamples=50, sampling_method='resample', permuter=None, method='cpi', random_state=None, verbose='final', compute_diagnostics=True, **kwargs)

Initialize the FlowExplainer.

fit_flow(X=None, num_steps=5000, verbose=None, **kwargs)

Fit the flow model on data.

Can be called after initialization with fit_flow=False, or to refit on new data.

Parameters:
  • X (numpy.ndarray, optional) – Data to fit on. If None, uses self.data.

  • num_steps (int, default=5000) – Number of training steps.

  • verbose (bool or str, optional) –

    Controls training output. If None, uses self.verbose.

    • True or ‘all’: show the full progress bar

    • ‘final’: print only the final step status (default)

    • False: silent

  • **kwargs – Additional arguments passed to flow_model.fit().

Returns:

For method chaining.

Return type:

self

set_flow(flow_model)

Set a user-provided flow model.

The flow model must have a sample_batch(x, t_span) method where:

  • t_span=(1, 0) encodes X to Z

  • t_span=(0, 1) decodes Z to X

Parameters:

flow_model (object) – A flow model with sample_batch(x, t_span) method.

Returns:

For method chaining.

Return type:

self
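Any object with a sample_batch(x, t_span) method follows the documented contract. The hypothetical AffineFlow below standardizes each feature to encode and undoes the standardization to decode; it is only a stand-in that shows the interface. This page does not say whether FlowExplainer needs tensors or differentiable outputs from sample_batch (for example, to compute the Jacobian), so a NumPy-only object like this one may not be accepted.

```python
import numpy as np


class AffineFlow:
    # Hypothetical flow-like object exposing sample_batch(x, t_span).

    def __init__(self, X):
        X = np.asarray(X, dtype=float)
        self.mean = X.mean(axis=0)
        self.scale = X.std(axis=0)

    def sample_batch(self, x, t_span):
        x = np.asarray(x, dtype=float)
        if tuple(t_span) == (1, 0):   # encode: X -> Z
            return (x - self.mean) / self.scale
        if tuple(t_span) == (0, 1):   # decode: Z -> X
            return x * self.scale + self.mean
        raise ValueError(f"unsupported t_span: {t_span!r}")
```

If accepted, it would be attached with explainer.set_flow(AffineFlow(X_background)) on an explainer created with fit_flow=False.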

__call__(X, **kwargs)

Compute feature importance.

Parameters:
  • X (numpy.ndarray) – Input data to explain. Shape (n_samples, n_features).

  • **kwargs (dict) – Additional parameters (unused, for API compatibility).

Returns:

Dictionary containing:

  • phi_Z: Z-space importance (d,); CPI or SCPI depending on method

  • std_Z: standard deviation (d,)

  • se_Z: standard error (d,)

  • phi_X: X-space importance (d,), transformed via the Jacobian

  • std_X: standard deviation (d,)

  • se_X: standard error (d,)

When method='both', the dictionary also includes phi_Z_scpi, std_Z_scpi and se_Z_scpi.

Return type:

dict

Example with CPI (default):

from fdfi.explainers import FlowExplainer

explainer = FlowExplainer(
    model.predict,
    X_background,
    fit_flow=True,           # Fit normalizing flow during init
    method='cpi',            # CPI (default)
    num_steps=200,           # Flow training iterations
    nsamples=50,             # Monte Carlo samples
    random_state=42,
)
results = explainer(X_test)

print("Z-space importance (CPI):", results["phi_Z"])
ci = explainer.conf_int(alpha=0.05, target="Z")
print("95% confidence intervals:")
print("  lower:", ci["ci_lower"])
print("  upper:", ci["ci_upper"])

Example with SCPI (Sobol-CPI):

from fdfi.explainers import FlowExplainer

explainer = FlowExplainer(
    model.predict,
    X_background,
    fit_flow=True,
    method='scpi',           # SCPI (Sobol-CPI)
    num_steps=200,
    nsamples=50,
)
results = explainer(X_test)

print("Importance (SCPI):", results["phi_Z"])

Using external flow models:

from fdfi.explainers import FlowExplainer
from fdfi.models import FlowMatchingModel

# Train flow externally
flow = FlowMatchingModel(X_background, dim=X_background.shape[1])
flow.fit(num_steps=500, verbose='final')

# Use in explainer
explainer = FlowExplainer(model.predict, X_background, fit_flow=False)
explainer.set_flow(flow)
results = explainer(X_test)

DFIExplainer Alias

DFIExplainer is an alias for OTExplainer for backward compatibility:

fdfi.explainers.DFIExplainer

Alias for OTExplainer.

Cross-Fitting (Crossfitting)

The Crossfitting class wraps any of the above explainers and performs K-fold cross-fitting so that the disentanglement map is never evaluated on its own training data. This yields valid standard errors and confidence intervals even when the sample size is small.

class fdfi.explainers.Crossfitting(model, data, explainer_class=OTExplainer, cv=5, y=None, groups=None, random_state=None, **kwargs)

Bases: Explainer

Cross-fitted DFI explainer for valid inference at small sample sizes.

Wraps any Explainer subclass and performs cross-fitting using a scikit-learn cross-validation splitter. The disentanglement map is fitted on the training portion of each split and importance is evaluated on the held-out portion. Final estimates are the ensemble average of cross-fitted predictors.
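The cross-fitting loop itself can be sketched with a scikit-learn splitter. fit_map and ueif_fn are hypothetical placeholders for fitting a disentanglement map on the training rows and scoring UEIFs on the held-out rows; the estimate and standard error are then taken over all n held-out UEIFs. This illustrates the idea only and is not the Crossfitting implementation, which also keeps the fitted explainers for ensemble prediction.

```python
import numpy as np
from sklearn.model_selection import KFold


def crossfit_ueifs(X, fit_map, ueif_fn, cv=None):
    # fit_map(X_train) -> fitted object; ueif_fn(fitted, X_test) -> (m, d) UEIFs.
    X = np.asarray(X, dtype=float)
    cv = cv if cv is not None else KFold(n_splits=5, shuffle=True, random_state=0)
    ueifs = np.full(X.shape, np.nan)
    for train_idx, test_idx in cv.split(X):
        fitted = fit_map(X[train_idx])               # never sees the held-out rows
        ueifs[test_idx] = ueif_fn(fitted, X[test_idx])
    n = X.shape[0]
    phi = ueifs.mean(axis=0)                         # cross-fitted importance
    se = ueifs.std(axis=0, ddof=1) / np.sqrt(n)      # standard error over held-out UEIFs
    return ueifs, phi, se
```

With KFold every row is held out exactly once. With a splitter such as ShuffleSplit some rows may never be held out, and this sketch would leave them as NaN.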

Parameters:
  • model (callable) – The model to explain. Takes (n, d) array, returns (n,) predictions.

  • data (numpy.ndarray) – Full dataset. Shape (n, d).

  • explainer_class (type, default=OTExplainer) – The explainer class to instantiate per split. Must be a subclass of Explainer (e.g., OTExplainer, EOTExplainer, FlowExplainer).

  • cv (int or sklearn cross-validation splitter, default=5) –

    Controls how data is split for cross-fitting:

    • int: number of folds for KFold(shuffle=True).

    • sklearn splitter instance: used directly, e.g. KFold, StratifiedKFold, ShuffleSplit, RepeatedKFold, GroupKFold, etc.

    Any object implementing .split(X, y, groups) is accepted.

  • y (array-like of shape (n,), optional) – Target / response variable. Required only when using a stratified splitter so that fold assignment preserves class distribution.

  • groups (array-like of shape (n,), optional) – Group labels for group-aware splitters (GroupKFold, etc.).

  • random_state (int or None, default=None) – Random seed for the default KFold splitter (when cv is int) and passed to child explainers.

  • **kwargs (dict) – Additional keyword arguments forwarded to each split’s explainer constructor (e.g., nsamples, epsilon, sampling_method, num_steps).

cv_

The resolved cross-validation splitter.

Type:

sklearn splitter instance

fold_explainers

The fitted explainer instances (one per split).

Type:

list[Explainer]

fold_indices

(train_idx, test_idx) for each split.

Type:

list[tuple[numpy.ndarray, numpy.ndarray]]

ueifs_X

Per-sample X-space UEIFs, shape (n, d), after calling with X=None.

Type:

numpy.ndarray or None

ueifs_Z

Per-sample Z-space UEIFs, shape (n, d), after calling with X=None.

Type:

numpy.ndarray or None

__init__(model, data, explainer_class=OTExplainer, cv=5, y=None, groups=None, random_state=None, **kwargs)

Initialize the Crossfitting explainer.

__call__(X=None, **kwargs)

Compute cross-fitted feature importance.

If X is None, performs full cross-fitting on self.data: each split’s test set is the held-out portion of the data.

If X is provided, uses the ensemble of fitted fold explainers to compute importance on X and averages the results.

Parameters:
  • X (numpy.ndarray or None) – If None, cross-fit on self.data (recommended for valid inference). If provided, shape (m, d), ensemble-predict on new data.

  • **kwargs (Any) – Additional keyword arguments.

Returns:

Same format as OTExplainer / FlowExplainer: phi_X, std_X, se_X, phi_Z, std_Z, se_Z.

Return type:

dict

Example — cross-fitted OTExplainer (default KFold):

from fdfi.explainers import Crossfitting, OTExplainer

cf = Crossfitting(
    model.predict,
    data=X_background,
    explainer_class=OTExplainer,
    cv=5,                    # 5-fold KFold (default)
    nsamples=50,
    random_state=42,
)
results = cf()               # cross-fit on X_background
ci = cf.conf_int(alpha=0.05)
cf.summary()

Example — using a custom sklearn splitter:

from sklearn.model_selection import StratifiedKFold, ShuffleSplit
from fdfi.explainers import Crossfitting, EOTExplainer

# Stratified K-Fold (preserves class balance)
cf = Crossfitting(
    model.predict, X_background,
    explainer_class=EOTExplainer,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    y=y_train,               # required for stratification
    nsamples=50,
)
results = cf()

# ShuffleSplit (random train/test splits)
cf = Crossfitting(
    model.predict, X_background,
    explainer_class=OTExplainer,
    cv=ShuffleSplit(n_splits=10, test_size=0.2, random_state=0),
)
results = cf()

Ensemble prediction on new data:

# After cross-fitting, predict on unseen data
results_new = cf(X_test)     # averages across all fold explainers

Group Importance

All explainer classes support group-level feature importance via the groups argument in conf_int(). After running an explainer (so that per-sample UEIFs are available), call conf_int(groups=...) to obtain group-level importance, standard errors, and p-values.

Groups can be specified as:

  • A dict mapping group names to lists of feature indices.

  • A 1-D numpy array of group labels (one per feature).

  • A binary pandas.DataFrame (features × groups) — features may belong to multiple groups.
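Below is a sketch of how the three group formats can be turned into one (d, G) indicator matrix, and how per-sample group UEIFs follow by zeroing features with negative mean UEIF (threshold_null=True) and summing within each group. group_indicator and group_ueifs are hypothetical names; conf_int() additionally applies its variance floor and testing machinery, which this sketch omits.

```python
import numpy as np
import pandas as pd


def group_indicator(groups, d):
    # Convert a dict, 1-D label array, or DataFrame group spec into a (d, G) 0/1 matrix.
    if isinstance(groups, pd.DataFrame):
        return list(groups.columns), groups.to_numpy(dtype=float)
    if isinstance(groups, dict):
        names = list(groups)
        M = np.zeros((d, len(names)))
        for g, idx in enumerate(groups.values()):
            M[list(idx), g] = 1.0
        return names, M
    labels = np.asarray(groups)
    names = list(dict.fromkeys(labels.tolist()))   # unique labels in first-seen order
    M = np.array([[1.0 if lab == name else 0.0 for name in names] for lab in labels])
    return names, M


def group_ueifs(ueif, groups, threshold_null=True):
    # Per-sample group UEIFs, shape (n, G).
    ueif = np.asarray(ueif, dtype=float)
    names, M = group_indicator(groups, ueif.shape[1])
    if threshold_null:
        # Zero out features whose mean UEIF is negative before summing.
        ueif = np.where(ueif.mean(axis=0) < 0, 0.0, ueif)
    return names, ueif @ M
```

Because a DataFrame column is just a 0/1 mask, a feature that belongs to several groups contributes to each of them.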

Example — dict input:

from fdfi.explainers import OTExplainer

explainer = OTExplainer(model.predict, X_background, nsamples=50)
explainer(X_test)

groups = {"signal": [0, 1, 2], "noise": [3, 4, 5, 6, 7, 8, 9]}
res = explainer.conf_int(groups=groups)

for name, imp, se, p in zip(
    res["groups"], res["score"], res["se"], res["pvalue"]
):
    print(f"{name}: importance={imp:.4f}  se={se:.4f}  p={p:.4f}")

Example — pandas DataFrame (overlapping groups):

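import pandas as pd

# One row per feature (10 here, matching the explainer above) and one
# column per group; a feature may belong to several groups.
df_groups = pd.DataFrame({
    "group_A": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "group_B": [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],   # feature 1 is in both A and B
    "group_C": [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
})
res = explainer.conf_int(groups=df_groups)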

Example — with Crossfitting:

from fdfi.explainers import Crossfitting, OTExplainer

cf = Crossfitting(model.predict, X_background, cv=5, nsamples=50)
cf()  # cross-fit first
res = cf.conf_int(groups={"signal": [0, 1, 2], "noise": [3, 4]})