[ ]:
import fdfi
print('FDFI version:', fdfi.__version__)
EOTExplainer: Semicontinuous Entropic Optimal Transport
This tutorial covers the EOTExplainer, which uses semicontinuous entropic optimal transport with population backward attribution (best linear projection) to compute disentangled feature importance.
What You’ll Learn
How EOT whitening + semicontinuous transport disentangles features
How the population backward attribution maps Z-importance to X-importance
How to run attribution inference with confidence intervals
How epsilon controls the transport shrinkage
[ ]:
import numpy as np
from fdfi.explainers import EOTExplainer
from fdfi.plots import confidence_interval_plot, diagnostics_plot, summary_bar
np.random.seed(42)
Why Semicontinuous EOT?
Gaussian OT whitens data via \(\Sigma^{-1/2}\), which assumes linear structure. Semicontinuous EOT instead solves the entropic transport problem between the empirical source and a continuous \(\mathcal{N}(0, I)\) target analytically, so the forward map has the closed form \(Z = s \cdot X_w\), where \(X_w\) is the whitened data and \(s\) is the forward shrinkage factor.
The population backward attribution computes the best linear projection \(E[X_w \mid Z]\) using the analytically known coupling moments.
The resulting weight matrix \(W = L \cdot M_w\) then maps Z-space importance to X-space.
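To make the Gaussian baseline concrete, here is a minimal NumPy sketch of whitening with \(\Sigma^{-1/2}\). It is an illustration only, not FDFI's internal code: the two-feature covariance, the sample size, and the use of the sample covariance with a symmetric inverse square root are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-feature example with correlation 0.7
Sigma = np.array([[1.0, 0.7], [0.7, 1.0]])
X = rng.multivariate_normal(np.zeros(2), Sigma, size=5000)

# Symmetric inverse square root of the sample covariance (eigendecomposition)
S = np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(S)
S_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T

# Whitened data: centered, then multiplied by S^{-1/2}
X_w = (X - X.mean(axis=0)) @ S_inv_sqrt
cov_w = np.cov(X_w, rowvar=False)

print("Covariance before whitening:\n", np.round(S, 3))
print("Covariance after whitening:\n", np.round(cov_w, 3))
```

After whitening, the sample covariance is the identity up to floating-point error, which is exactly the linear decorrelation that the Gaussian approach relies on.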
Synthetic Data: Relevant vs Null Features
We build a dataset with correlated features. The model directly uses \(X_0, X_2, X_4\), but because \(X_1\) is highly correlated with \(X_0\) (\(\rho = 0.7\)) and \(X_3\) is correlated with \(X_2\) (\(\rho = 0.5\)), they also carry predictive signal. FDFI’s design goal is to detect all features that provide predictive information, so the relevant set is \(\{X_0, X_1, X_2, X_3, X_4\}\), while \(X_5, \ldots, X_9\) are truly null.
[ ]:
# Correlated synthetic data with known active features
n_train = 400
n_test = 150
d = 10
# Build a covariance matrix with block correlations
rng = np.random.default_rng(42)
Sigma = np.eye(d)
# Correlate features 0-1 and 2-3
Sigma[0, 1] = Sigma[1, 0] = 0.7
Sigma[2, 3] = Sigma[3, 2] = 0.5
X_train = rng.multivariate_normal(np.zeros(d), Sigma, size=n_train)
X_test = rng.multivariate_normal(np.zeros(d), Sigma, size=n_test)
# Model directly uses features 0, 2, 4
active_idx = [0, 2, 4]
# Features 1 and 3 carry predictive signal via correlation
# → all 5 are "relevant"; features 5-9 are truly null
relevant_idx = [0, 1, 2, 3, 4]
null_idx = [5, 6, 7, 8, 9]
def exp_model(X):
    return 3.0 * X[:, 0] + 2.0 * X[:, 2] + 1.5 * X[:, 4]
print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)
print("Relevant features:", relevant_idx, "(model uses 0,2,4; 1,3 correlated)")
print("Null features:", null_idx)
print("Correlation(X0, X1):", f"{np.corrcoef(X_train[:, 0], X_train[:, 1])[0, 1]:.3f}")
print("Correlation(X2, X3):", f"{np.corrcoef(X_train[:, 2], X_train[:, 3])[0, 1]:.3f}")
[ ]:
# Sanity check: model predictions
y_preview = exp_model(X_test[:5])
print("Preview predictions:", np.round(y_preview, 3))
print("Response variance:", f"{np.var(exp_model(X_train)):.3f}")
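Because the model is linear and the features are Gaussian with a known covariance, the population moments can be worked out directly. The sketch below uses the marginal covariance between each feature and the response as a simple proxy for "carries predictive signal". This proxy is an assumption for illustration and is not FDFI's importance measure.

```python
import numpy as np

d = 10
Sigma = np.eye(d)
Sigma[0, 1] = Sigma[1, 0] = 0.7
Sigma[2, 3] = Sigma[3, 2] = 0.5

# Coefficients of the linear model y = 3*X0 + 2*X2 + 1.5*X4
beta = np.zeros(d)
beta[[0, 2, 4]] = [3.0, 2.0, 1.5]

cov_xy = Sigma @ beta        # Cov(X_j, y) for every feature j
var_y = beta @ Sigma @ beta  # Var(y)

print("Cov(X_j, y):", np.round(cov_xy, 3))
print("Var(y):", round(float(var_y), 3))
```

Features 1 and 3 have nonzero covariance with the response (2.1 and 1.0) even though their model coefficients are zero, while features 5 to 9 have exactly zero. The population response variance is 15.25, so the sample value printed above should land near that figure.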
Basic EOTExplainer Usage
Create an explainer, compute importance, and inspect results.
[ ]:
explainer = EOTExplainer(
    exp_model,
    data=X_train,
    nsamples=60,
    auto_epsilon=True,
    random_state=0,
)
results = explainer(X_test)
phi_X = results["phi_X"]
print("Feature importance (phi_X):")
print("-" * 55)
print(f"{'Feature':>8} {'phi_X':>10} {'Status':>12}")
print("-" * 55)
for i in range(d):
    status = "model" if i in active_idx else ("correlated" if i in relevant_idx else "null")
    print(f"{'X_' + str(i):>8} {phi_X[i]:>10.4f} {status:>12}")
print(f"\nAuto epsilon: {explainer.epsilon:.4f}")
print(f"Forward shrinkage s: {explainer.s_fwd:.4f}")
print(f"Backward weight matrix W shape: {explainer.W.shape}")
[ ]:
feature_names = [f"X{i}" for i in range(d)]
summary_bar(
    results["phi_X"],
    results["se_X"],
    feature_names,
    show=False,
)
Attribution Inference
Use one-sided testing to identify features with significant predictive importance. We expect all 5 relevant features (\(X_0\)–\(X_4\)) to be detected, while the 5 null features (\(X_5\)–\(X_9\)) should not.
[ ]:
# Default conf_int: margin_method="auto" (gap for d<30, mixture for d>=30)
ci = explainer.conf_int(
    alpha=0.05,
    target="X",
    alternative="greater",
    verbose=True,
)
attribution_idx = np.where(ci["reject_null"])[0]
expected = set(relevant_idx)
detected = set(attribution_idx.tolist())
print(f"\nMargin method: {ci['margin_method']}, margin: {ci['margin']:.4f}")
print("Detected features:", sorted(detected))
print("Relevant features:", sorted(expected))
print("True positives:", sorted(expected & detected))
print("False positives:", sorted(detected - expected))
print("Missed:", sorted(expected - detected))
print()
for i in range(d):
    tag = "*" if ci["reject_null"][i] else ""
    status = "model" if i in active_idx else ("corr" if i in relevant_idx else "null")
    print(f" X_{i} [{status:>5}]: phi={ci['score'][i]:.4f} se={ci['se'][i]:.4f}"
          f" z={ci['zscore'][i]:.2f} rank={ci['ranking'][i]:>2} p={ci['pvalue'][i]:.4f} {tag}")
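For intuition about what a one-sided test with `alternative="greater"` does, the following sketch implements a textbook one-sided z-test against a margin. It is not the library's implementation: how FDFI estimates the standard error and chooses the margin is not shown here, and the function name `one_sided_pvalue` and its arguments are hypothetical.

```python
import math


def one_sided_pvalue(score, se, margin=0.0):
    """P-value for H0: importance <= margin versus H1: importance > margin."""
    z = (score - margin) / se
    # Standard normal survival function P(Z > z), written with erfc
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p_large = one_sided_pvalue(score=0.9, se=0.1)       # far above the margin
p_zero = one_sided_pvalue(score=0.0, se=0.1)        # exactly at the margin
p_border = one_sided_pvalue(score=1.645, se=1.0)    # near the 5% critical value

print(f"p_large={p_large:.3g}  p_zero={p_zero:.3f}  p_border={p_border:.4f}")
```

A feature is flagged when its p-value falls below `alpha`. Raising the margin makes the test stricter, which helps keep null features with small positive scores from being detected.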
[ ]:
feature_names = [f"X{i}" for i in range(d)]
confidence_interval_plot(
    ci,
    feature_names=feature_names,
    show=False,
)
Z-Space vs X-Space Importance
The EOT decomposition first computes importance in the disentangled Z-space, then maps back to X-space via the backward weight matrix \(W\).
[ ]:
phi_Z = results["phi_Z"]
phi_X = results["phi_X"]
print(f"{'Feature':>8} {'phi_Z':>10} {'phi_X':>10}")
print("-" * 32)
for i in range(d):
    print(f"{'X_' + str(i):>8} {phi_Z[i]:>10.4f} {phi_X[i]:>10.4f}")
print(f"\nTotal phi_Z: {phi_Z.sum():.4f}")
print(f"Total phi_X: {phi_X.sum():.4f}")
print("\nNote: phi_Z measures importance in the disentangled space.")
print("phi_X maps it back to original features via the backward weights W.")
Effect of Epsilon on Attribution
Epsilon controls the EOT regularization. Smaller epsilon gives sharper transport (closer to exact OT), while larger epsilon shrinks toward Gaussian transport.
[ ]:
epsilons = [1e-3, 0.01, 0.1]
all_phi = {}
for eps in epsilons:
    exp_eps = EOTExplainer(
        exp_model,
        data=X_train,
        nsamples=60,
        epsilon=eps,
        random_state=0,
    )
    res = exp_eps(X_test)
    all_phi[eps] = res["phi_X"]
    # Use :g so that eps=0.001 is not rounded to 0.00 in the printout
    print(f"eps={eps:g} s={exp_eps.s_fwd:.4f} "
          f"active_mean={res['phi_X'][active_idx].mean():.4f} "
          f"null_mean={res['phi_X'][null_idx].mean():.4f}")
print()
header = f"{'Feature':>8}" + "".join(f"{'eps=' + str(e):>12}" for e in epsilons)
print(header)
print("-" * len(header))
for i in range(d):
    row = f"{'X_' + str(i):>8}"
    for eps in epsilons:
        row += f"{all_phi[eps][i]:>12.4f}"
    print(row)
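The `auto_epsilon=True` option is described in this tutorial as a median-distance heuristic. One common form of that heuristic sets the regularization to the median of the pairwise squared Euclidean distances in the data. The sketch below shows that form only. Whether FDFI uses squared distances, applies an additional scale factor, or computes the distances on whitened data is not stated here, so treat `median_distance_epsilon` as a hypothetical helper.

```python
import numpy as np


def median_distance_epsilon(X):
    """Median of pairwise squared Euclidean distances between rows of X."""
    diffs = X[:, None, :] - X[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Keep each unordered pair once and drop the zero diagonal
    iu = np.triu_indices(X.shape[0], k=1)
    return float(np.median(sq_dists[iu]))


# Tiny example: points 0, 1, 3 on a line give squared distances 1, 9, 4
X_toy = np.array([[0.0], [1.0], [3.0]])
eps_toy = median_distance_epsilon(X_toy)
print("Heuristic epsilon:", eps_toy)
```

A value chosen this way scales with the data: multiplying every feature by a constant `c` multiplies the result by `c**2`, which is why such heuristics are a convenient default when the feature scale is not known in advance.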
Compare with OTExplainer (Gaussian Baseline)
The OTExplainer uses plain Gaussian whitening (\(W = L\)). The EOTExplainer adds the population backward projection (\(W = L \cdot M_w\)), which can better handle non-Gaussian structure.
[ ]:
from fdfi.explainers import OTExplainer
explainer_ot = OTExplainer(
    exp_model,
    data=X_train,
    nsamples=60,
    random_state=0,
)
results_ot = explainer_ot(X_test)
phi_ot = results_ot["phi_X"]
phi_eot = results["phi_X"]
print(f"{'Feature':>8} {'OT (Gauss)':>12} {'EOT (Semicont)':>15} {'Status':>10}")
print("-" * 49)
for i in range(d):
    status = "model" if i in active_idx else ("corr" if i in relevant_idx else "null")
    print(f"{'X_' + str(i):>8} {phi_ot[i]:>12.4f} {phi_eot[i]:>15.4f} {status:>10}")
ratio_ot = phi_ot[relevant_idx].mean() / phi_ot[null_idx].mean()
ratio_eot = phi_eot[relevant_idx].mean() / phi_eot[null_idx].mean()
print(f"\nRelevant/null ratio (OT): {ratio_ot:.2f}x")
print(f"Relevant/null ratio (EOT): {ratio_eot:.2f}x")
Diagnostics and Summary
Use diagnostics to inspect transport quality and summary() for a tabular overview.
[ ]:
diag = explainer.diagnostics
print("Diagnostics:")
print(f" Latent independence (median dCor): {diag['latent_independence_median']:.6f} [{diag['latent_independence_label']}]")
print(f" Distribution fidelity (MMD): {diag['distribution_fidelity_mmd']:.6f} [{diag['distribution_fidelity_label']}]")
print()
# Standardized summary table
_ = explainer.summary(alpha=0.05, target="X", alternative="greater")
diagnostics_plot(diag, feature_names=feature_names, show=False)
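The latent independence diagnostic is reported as a median distance correlation (dCor). As a rough guide to what that number means, here is a generic NumPy sketch of sample distance correlation and its median over coordinate pairs. It is a plain V-statistic implementation written for illustration. The library's own estimator, pairing scheme, and label thresholds may differ, and both function names are hypothetical.

```python
import numpy as np


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D arrays (V-statistic form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each pairwise distance matrix
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = max(float((A * B).mean()), 0.0)
    denom = float(np.sqrt((A * A).mean() * (B * B).mean()))
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0


def median_pairwise_dcor(Z):
    """Median dCor over all unordered pairs of columns of Z."""
    k = Z.shape[1]
    vals = [distance_correlation(Z[:, i], Z[:, j])
            for i in range(k) for j in range(i + 1, k)]
    return float(np.median(vals))


rng = np.random.default_rng(0)
Z_indep = rng.standard_normal((300, 4))            # independent coordinates
z = Z_indep[:, 0]
Z_dep = np.column_stack([z, z ** 2, np.abs(z), np.sin(z)])  # all driven by z

print("Median dCor, independent:", round(median_pairwise_dcor(Z_indep), 3))
print("Median dCor, dependent:  ", round(median_pairwise_dcor(Z_dep), 3))
```

Values near zero indicate that the latent coordinates are close to independent. Unlike Pearson correlation, dCor also picks up nonlinear dependence such as the relationship between `z` and `z ** 2`.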
Quick Reference
import numpy as np
from fdfi.explainers import EOTExplainer
explainer = EOTExplainer(
    model,
    data=X_train,
    nsamples=60,
    auto_epsilon=True,  # median-distance heuristic
    random_state=0,
)
results = explainer(X_test)
# results["phi_X"]: X-space feature importance
# results["phi_Z"]: Z-space (disentangled) importance
# Attribution inference
ci = explainer.conf_int(alpha=0.05, target="X", alternative="greater")
significant = np.where(ci["reject_null"])[0]
# Inspect transport quality
explainer.diagnostics
Summary
Key takeaways:
EOTExplainer uses semicontinuous entropic OT: the forward map \(Z = s \cdot X_w\) is analytical (no Sinkhorn needed).
Population backward attribution computes \(W = L \cdot M_w\) using the best linear projection from the coupling moments.
FDFI detects all features with predictive signal, including correlated features, not only those directly in the model.
epsilon controls regularization: smaller → closer to exact OT, larger → more Gaussian shrinkage.
Use auto_epsilon=True for automatic tuning via the median-distance heuristic.
conf_int() provides rigorous attribution inference with confidence intervals.