[ ]:
import fdfi
print('FDFI version:', fdfi.__version__)

EOTExplainer: Semicontinuous Entropic Optimal Transport

This tutorial covers the EOTExplainer, which uses semicontinuous entropic optimal transport with population backward attribution (best linear projection) to compute disentangled feature importance.

What You’ll Learn

  1. How EOT whitening + semicontinuous transport disentangles features

  2. How the population backward attribution maps Z-importance to X-importance

  3. How to run attribution inference with confidence intervals

  4. How epsilon controls the transport shrinkage

[ ]:
import numpy as np
from fdfi.explainers import EOTExplainer
from fdfi.plots import confidence_interval_plot, diagnostics_plot, summary_bar

np.random.seed(42)

Why Semicontinuous EOT?

Gaussian OT whitens data via \(\Sigma^{-1/2}\), which assumes linear structure. Semicontinuous EOT solves the entropic transport problem between the empirical source and a continuous \(\mathcal{N}(0, I)\) target analytically:

\[Z = s \cdot X_{\text{whitened}}, \quad s = \frac{2}{2 + \varepsilon}\]
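The forward map above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the library's implementation: the helper name `whiten_and_shrink` is hypothetical, and the whitening uses the symmetric inverse square root of the sample covariance.

```python
import numpy as np

def whiten_and_shrink(X, epsilon):
    """Hypothetical helper: whiten X with Sigma^{-1/2}, then scale by s = 2 / (2 + epsilon)."""
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)
    # Symmetric inverse square root of the sample covariance via eigendecomposition
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    X_w = Xc @ Sigma_inv_sqrt
    s = 2.0 / (2.0 + epsilon)
    return s * X_w, s

rng_demo = np.random.default_rng(0)
cov_demo = np.array([[1.0, 0.7], [0.7, 1.0]])
X_demo = rng_demo.multivariate_normal(np.zeros(2), cov_demo, size=500)

Z_demo, s_demo = whiten_and_shrink(X_demo, epsilon=0.1)
cov_Z = np.cov(Z_demo, rowvar=False)
print("s =", round(s_demo, 4))
print("cov(Z) / s^2 =")
print(np.round(cov_Z / s_demo**2, 3))
```

Because the whitening uses the same sample covariance it is evaluated on, the covariance of \(Z\) is \(s^2 I\) up to floating-point error, so the only effect of \(\varepsilon\) in this sketch is the uniform scale \(s\).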

The population backward attribution computes the best linear projection of \(X_w\) (the whitened features \(X_{\text{whitened}}\)) onto \(Z\), a linear approximation to \(E[X_w \mid Z]\), using the analytically known coupling moments:

\[M_w = E_\pi[ZZ^\top]^{-1} E_\pi[ZX_w^\top]\]

Then the weight matrix \(W = L \cdot M_w\), where \(L\) is the factor used by plain Gaussian whitening (the case \(W = L\)), maps Z-space importance to X-space via:

\[\phi_{X,j} = \sum_k W_{jk}^2 \cdot \phi_{Z,k}\]
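The last formula is a matrix-vector product with the elementwise square of \(W\). A minimal sketch with a made-up \(2 \times 2\) weight matrix follows; the function name `z_to_x_importance` and the toy values are illustrative assumptions, not part of the fdfi API.

```python
import numpy as np

def z_to_x_importance(W, phi_Z):
    """phi_X[j] = sum_k W[j, k]**2 * phi_Z[k] (hypothetical helper)."""
    W = np.asarray(W, dtype=float)
    phi_Z = np.asarray(phi_Z, dtype=float)
    return (W ** 2) @ phi_Z

# Toy weights: X_0 loads only on Z_0; X_1 loads on both Z_0 and Z_1
W_toy = np.array([[1.0, 0.0],
                  [0.7, 0.5]])
phi_Z_toy = np.array([4.0, 1.0])

phi_X_toy = z_to_x_importance(W_toy, phi_Z_toy)
print(phi_X_toy)  # [4.0, 0.49 * 4.0 + 0.25 * 1.0] = [4.0, 2.21]
```

In this toy case the second feature inherits most of its importance from the first latent coordinate, which is how a correlated feature can receive nonzero X-space importance.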

Synthetic Data: Relevant vs Null Features

We build a dataset with correlated features. The model directly uses \(X_0, X_2, X_4\), but because \(X_1\) is highly correlated with \(X_0\) (\(\rho = 0.7\)) and \(X_3\) is correlated with \(X_2\) (\(\rho = 0.5\)), they also carry predictive signal. FDFI’s design goal is to detect all features that provide predictive information, so the relevant set is \(\{X_0, X_1, X_2, X_3, X_4\}\), while \(X_5, \ldots, X_9\) are truly null.
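To see why the correlated features carry signal, note that for a linear response \(y = \beta^\top X\) with \(X \sim \mathcal{N}(0, \Sigma)\), the covariance of each feature with \(y\) is \(\Sigma \beta\). The short check below uses the covariance described above and the coefficients 3.0, 2.0, 1.5 on \(X_0, X_2, X_4\) from this tutorial's model; the variable names are only for illustration.

```python
import numpy as np

d_demo = 10
Sigma_demo = np.eye(d_demo)
Sigma_demo[0, 1] = Sigma_demo[1, 0] = 0.7
Sigma_demo[2, 3] = Sigma_demo[3, 2] = 0.5

# Coefficients of the linear model on X_0, X_2, X_4
beta = np.zeros(d_demo)
beta[[0, 2, 4]] = [3.0, 2.0, 1.5]

cov_xy = Sigma_demo @ beta          # Cov(X_j, y) for each feature j
var_y = beta @ Sigma_demo @ beta    # population variance of the response

for j, c in enumerate(cov_xy):
    print(f"Cov(X_{j}, y) = {c:.2f}")
print(f"Var(y) = {var_y:.2f}")
```

The population covariances are 3.0, 2.1, 2.0, 1.0, 1.5 for \(X_0, \ldots, X_4\) and exactly zero for \(X_5, \ldots, X_9\), and the response variance is 15.25. Sample estimates from the training data will differ slightly.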

[ ]:
# Correlated synthetic data with known active features
n_train = 400
n_test = 150
d = 10

# Build a covariance matrix with block correlations
rng = np.random.default_rng(42)
Sigma = np.eye(d)
# Correlate features 0-1 and 2-3
Sigma[0, 1] = Sigma[1, 0] = 0.7
Sigma[2, 3] = Sigma[3, 2] = 0.5

X_train = rng.multivariate_normal(np.zeros(d), Sigma, size=n_train)
X_test = rng.multivariate_normal(np.zeros(d), Sigma, size=n_test)

# Model directly uses features 0, 2, 4
active_idx = [0, 2, 4]
# Features 1 and 3 carry predictive signal via correlation
# → all 5 are "relevant"; features 5-9 are truly null
relevant_idx = [0, 1, 2, 3, 4]
null_idx = [5, 6, 7, 8, 9]

def exp_model(X):
    return 3.0 * X[:, 0] + 2.0 * X[:, 2] + 1.5 * X[:, 4]

print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)
print("Relevant features:", relevant_idx, "(model uses 0,2,4; 1,3 correlated)")
print("Null features:", null_idx)
print("Correlation(X0, X1):", f"{np.corrcoef(X_train[:, 0], X_train[:, 1])[0, 1]:.3f}")
print("Correlation(X2, X3):", f"{np.corrcoef(X_train[:, 2], X_train[:, 3])[0, 1]:.3f}")
[ ]:
# Sanity check: model predictions
y_preview = exp_model(X_test[:5])
print("Preview predictions:", np.round(y_preview, 3))
print("Response variance:", f"{np.var(exp_model(X_train)):.3f}")

Basic EOTExplainer Usage

Create an explainer, compute importance, and inspect results.

[ ]:
explainer = EOTExplainer(
    exp_model,
    data=X_train,
    nsamples=60,
    auto_epsilon=True,
    random_state=0,
)

results = explainer(X_test)
phi_X = results["phi_X"]

print("Feature importance (phi_X):")
print("-" * 55)
print(f"{'Feature':>8} {'phi_X':>10} {'Status':>12}")
print("-" * 55)
for i in range(d):
    status = "model" if i in active_idx else ("correlated" if i in relevant_idx else "null")
    print(f"{'X_' + str(i):>8} {phi_X[i]:>10.4f} {status:>12}")

print(f"\nAuto epsilon: {explainer.epsilon:.4f}")
print(f"Forward shrinkage s: {explainer.s_fwd:.4f}")
print(f"Backward weight matrix W shape: {explainer.W.shape}")
[ ]:
feature_names = [f"X{i}" for i in range(d)]

summary_bar(
    results["phi_X"],
    results["se_X"],
    feature_names,
    show=False,
)

Attribution Inference

Use one-sided testing to identify features with significant predictive importance. We expect all 5 relevant features (\(X_0, \ldots, X_4\)) to be detected, while the 5 null features (\(X_5, \ldots, X_9\)) should not be.
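For intuition, a generic one-sided z-test with a margin can be sketched with the standard library alone. How fdfi actually forms its margin and test statistic is not shown in this tutorial, so the function below is a textbook version under that assumption, and its name is hypothetical.

```python
import math

def one_sided_pvalue(score, se, margin=0.0):
    """P-value for H0: importance <= margin vs H1: importance > margin (normal approximation)."""
    z = (score - margin) / se
    # Upper-tail probability of a standard normal, 1 - Phi(z)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

alpha_demo = 0.05
p_strong = one_sided_pvalue(score=2.0, se=0.4)    # z = 5.0
p_weak = one_sided_pvalue(score=0.02, se=0.05)    # z = 0.4

print(f"strong feature: p = {p_strong:.2e}, reject = {p_strong < alpha_demo}")
print(f"weak feature:   p = {p_weak:.3f}, reject = {p_weak < alpha_demo}")
```

A positive margin raises the bar for rejection, which guards against flagging features whose estimated importance is small but not exactly zero.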

[ ]:
# Default conf_int: margin_method="auto" (gap for d<30, mixture for d>=30)
ci = explainer.conf_int(
    alpha=0.05,
    target="X",
    alternative="greater",
    verbose=True,
)

attribution_idx = np.where(ci["reject_null"])[0]
expected = set(relevant_idx)
detected = set(attribution_idx.tolist())

print(f"\nMargin method: {ci['margin_method']}, margin: {ci['margin']:.4f}")
print("Detected features:", sorted(detected))
print("Relevant features:", sorted(expected))
print("True positives:", sorted(expected & detected))
print("False positives:", sorted(detected - expected))
print("Missed:", sorted(expected - detected))
print()
for i in range(d):
    tag = "*" if ci["reject_null"][i] else ""
    status = "model" if i in active_idx else ("corr" if i in relevant_idx else "null")
    print(f"  X_{i} [{status:>5}]: phi={ci['score'][i]:.4f}  se={ci['se'][i]:.4f}"
          f"  z={ci['zscore'][i]:.2f}  rank={ci['ranking'][i]:>2}  p={ci['pvalue'][i]:.4f} {tag}")

[ ]:
feature_names = [f"X{i}" for i in range(d)]

confidence_interval_plot(
    ci,
    feature_names=feature_names,
    show=False,
)

Z-Space vs X-Space Importance

The EOT decomposition first computes importance in the disentangled Z-space, then maps back to X-space via the backward weight matrix \(W\).

[ ]:
phi_Z = results["phi_Z"]
phi_X = results["phi_X"]

print(f"{'Feature':>8} {'phi_Z':>10} {'phi_X':>10}")
print("-" * 32)
for i in range(d):
    print(f"{'X_' + str(i):>8} {phi_Z[i]:>10.4f} {phi_X[i]:>10.4f}")

print(f"\nTotal phi_Z: {phi_Z.sum():.4f}")
print(f"Total phi_X: {phi_X.sum():.4f}")
print("\nNote: phi_Z measures importance in the disentangled space.")
print("phi_X maps it back to original features via the backward weights W.")

Effect of Epsilon on Attribution

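Epsilon controls the strength of the EOT regularization. Smaller epsilon gives sharper transport (closer to exact OT, with \(s\) near 1), while larger epsilon shrinks the forward map more strongly, since \(s = 2/(2 + \varepsilon)\) decreases as epsilon grows.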
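The shrinkage factor \(s = 2/(2 + \varepsilon)\) from the forward map can be tabulated directly. This sketch does not call fdfi; the value \(\varepsilon = 1\) is added here only to show a visibly stronger shrinkage.

```python
# Shrinkage factor s = 2 / (2 + epsilon) for a few epsilon values
eps_grid = [1e-3, 0.01, 0.1, 1.0]
shrinkage = {eps: 2.0 / (2.0 + eps) for eps in eps_grid}

for eps, s in shrinkage.items():
    print(f"eps={eps:g}  s={s:.4f}")
```

For the three values used in this section, \(s\) stays above 0.95, so differences in \(\phi_X\) across them are expected to be modest if the forward shrinkage is the main channel through which epsilon acts.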

[ ]:
epsilons = [1e-3, 0.01, 0.1]
all_phi = {}

for eps in epsilons:
    exp_eps = EOTExplainer(
        exp_model,
        data=X_train,
        nsamples=60,
        epsilon=eps,
        random_state=0,
    )
    res = exp_eps(X_test)
    all_phi[eps] = res["phi_X"]
    # Use :g so eps=0.001 is not printed as 0.00; average nulls over null_idx only
    print(f"eps={eps:g}  s={exp_eps.s_fwd:.4f}  "
          f"active_mean={res['phi_X'][active_idx].mean():.4f}  "
          f"null_mean={res['phi_X'][null_idx].mean():.4f}")

print()
header = f"{'Feature':>8}" + "".join(f"{'eps=' + str(e):>12}" for e in epsilons)
print(header)
print("-" * len(header))
for i in range(d):
    row = f"{'X_' + str(i):>8}"
    for eps in epsilons:
        row += f"{all_phi[eps][i]:>12.4f}"
    print(row)

Compare with OTExplainer (Gaussian Baseline)

The OTExplainer uses plain Gaussian whitening (\(W = L\)). The EOTExplainer adds the population backward projection (\(W = L \cdot M_w\)), which can better handle non-Gaussian structure.

[ ]:
from fdfi.explainers import OTExplainer

explainer_ot = OTExplainer(
    exp_model,
    data=X_train,
    nsamples=60,
    random_state=0,
)
results_ot = explainer_ot(X_test)

phi_ot = results_ot["phi_X"]
phi_eot = results["phi_X"]

print(f"{'Feature':>8} {'OT (Gauss)':>12} {'EOT (Semicont)':>15} {'Status':>10}")
print("-" * 49)
for i in range(d):
    status = "model" if i in active_idx else ("corr" if i in relevant_idx else "null")
    print(f"{'X_' + str(i):>8} {phi_ot[i]:>12.4f} {phi_eot[i]:>15.4f} {status:>10}")

ratio_ot = phi_ot[relevant_idx].mean() / phi_ot[null_idx].mean()
ratio_eot = phi_eot[relevant_idx].mean() / phi_eot[null_idx].mean()
print(f"\nRelevant/null ratio (OT):  {ratio_ot:.2f}x")
print(f"Relevant/null ratio (EOT): {ratio_eot:.2f}x")

Diagnostics and Summary

Use diagnostics to inspect transport quality and summary() for a tabular overview.

[ ]:
diag = explainer.diagnostics
print("Diagnostics:")
print(f"  Latent independence (median dCor): {diag['latent_independence_median']:.6f} [{diag['latent_independence_label']}]")
print(f"  Distribution fidelity (MMD):       {diag['distribution_fidelity_mmd']:.6f} [{diag['distribution_fidelity_label']}]")
print()

# Standardized summary table
_ = explainer.summary(alpha=0.05, target="X", alternative="greater")


diagnostics_plot(diag, feature_names=feature_names, show=False)

Quick Reference

import numpy as np
from fdfi.explainers import EOTExplainer

explainer = EOTExplainer(
    model,
    data=X_train,
    nsamples=60,
    auto_epsilon=True,   # median-distance heuristic
    random_state=0,
)

results = explainer(X_test)
# results["phi_X"]  — X-space feature importance
# results["phi_Z"]  — Z-space (disentangled) importance

# Attribution inference
ci = explainer.conf_int(alpha=0.05, target="X", alternative="greater")
significant = np.where(ci["reject_null"])[0]

# Inspect transport quality
explainer.diagnostics

Summary

Key takeaways:

  1. EOTExplainer uses semicontinuous entropic OT — the forward map \(Z = s \cdot X_w\) is analytical (no Sinkhorn needed).

  2. Population backward attribution computes \(W = L \cdot M_w\) using the best linear projection from the coupling moments.

  3. FDFI detects all features with predictive signal, including correlated features — not only those directly in the model.

  4. epsilon controls regularization: smaller → closer to exact OT, larger → more Gaussian shrinkage.

  5. Use auto_epsilon=True for automatic tuning via the median-distance heuristic.

  6. conf_int() provides attribution inference: per-feature standard errors, z-scores, p-values, and rejection decisions alongside confidence intervals.