[ ]:
import fdfi
print('FDFI version:', fdfi.__version__)

Confidence Intervals and Statistical Inference

This tutorial covers statistical inference with FDFI, including confidence intervals, hypothesis testing, and feature selection.

What You’ll Learn

  1. Computing confidence intervals with conf_int()

  2. One-sided vs two-sided tests

  3. Variance floor for stable inference

  4. Practical significance margins

  5. Statistically-driven feature selection

[ ]:
import numpy as np
import matplotlib.pyplot as plt
from fdfi.explainers import OTExplainer
from fdfi.plots import confidence_interval_plot

np.random.seed(42)

Setup

Create a model where we know the true feature importance:

[ ]:
n_features = 10
n_train = 500
n_test = 100

# True importance: features 0, 1, 2 are important; rest are noise
true_coefs = np.array([2.0, 1.5, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

def model(X):
    return X @ true_coefs

X_train = np.random.randn(n_train, n_features)
X_test = np.random.randn(n_test, n_features)

# Create explainer
explainer = OTExplainer(model, data=X_train, nsamples=100)
results = explainer(X_test)

print("True coefficients:", true_coefs)
print("Estimated importance:", results["phi_X"].round(3))

Basic Confidence Intervals

The conf_int() method computes pointwise confidence intervals:

[ ]:
# Two-sided 95% confidence intervals
ci = explainer.conf_int(alpha=0.05, alternative="two-sided")

print("Two-sided 95% Confidence Intervals:")
print("-" * 70)
print(f"{'Feature':>8} {'Estimate':>10} {'SE':>10} {'CI Lower':>10} {'CI Upper':>10} {'P-value':>10}")
print("-" * 70)
for i in range(n_features):
    sig = "*" if ci["pvalue"][i] < 0.05 else ""
    print(f"{i:>8} {ci['score'][i]:>10.4f} {ci['se'][i]:>10.4f} "
          f"{ci['ci_lower'][i]:>10.4f} {ci['ci_upper'][i]:>10.4f} {ci['pvalue'][i]:>10.4f} {sig}")

Visualize Confidence Intervals

[ ]:
feature_names = [f"X{i}" for i in range(n_features)]

confidence_interval_plot(
    ci,
    feature_names=feature_names,
    show=False,
)

One-Sided Tests

For feature importance, we often care if a feature has positive importance. Use alternative="greater":

[ ]:
# One-sided test: H0: phi <= 0 vs H1: phi > 0
ci_greater = explainer.conf_int(alpha=0.05, alternative="greater")

print("One-sided test (phi > 0):")
print("-" * 60)
print(f"{'Feature':>8} {'Estimate':>10} {'CI Lower':>10} {'P-value':>10} {'Significant':>12}")
print("-" * 60)
for i in range(n_features):
    sig = "Yes" if ci_greater["reject_null"][i] else "No"
    print(f"{i:>8} {ci_greater['score'][i]:>10.4f} "
          f"{ci_greater['ci_lower'][i]:>10.4f} {ci_greater['pvalue'][i]:>10.4f} {sig:>12}")

Variance Floor

When some features have very small variance in their importance estimates, confidence intervals can become too narrow. The variance floor adds a minimum standard error.

Two methods are available:

  • fixed: Use a constant floor value

  • mixture: Fit a two-component mixture to estimate the floor

[ ]:
# Without variance floor
ci_no_floor = explainer.conf_int(alpha=0.05, var_floor_c=0)

# With fixed variance floor
ci_fixed = explainer.conf_int(alpha=0.05, var_floor_method="fixed", var_floor_c=0.1)

# With mixture-based floor
ci_mixture = explainer.conf_int(alpha=0.05, var_floor_method="mixture", var_floor_quantile=0.95)

print("Standard errors comparison:")
print("-" * 55)
print(f"{'Feature':>8} {'No Floor':>12} {'Fixed':>12} {'Mixture':>12}")
print("-" * 55)
for i in range(n_features):
    print(f"{i:>8} {ci_no_floor['se'][i]:>12.4f} {ci_fixed['se'][i]:>12.4f} {ci_mixture['se'][i]:>12.4f}")

Practical Significance Margin

Instead of testing \(H_0: \phi = 0\), you can test against a practical threshold \(\delta\):

\[H_0: \phi \leq \delta \quad \text{vs} \quad H_1: \phi > \delta\]

This identifies features that are not just statistically different from zero, but also practically meaningful.

[ ]:
# Test with practical margin of 0.5
margin = 0.5
ci_margin = explainer.conf_int(
    alpha=0.05,
    alternative="greater",
    margin=margin
)

print(f"Testing H0: phi <= {margin}")
print("-" * 50)
print(f"{'Feature':>8} {'Estimate':>10} {'P-value':>10} {'Significant':>12}")
print("-" * 50)
for i in range(n_features):
    sig = "Yes" if ci_margin["reject_null"][i] else "No"
    print(f"{i:>8} {ci_margin['score'][i]:>10.4f} {ci_margin['pvalue'][i]:>10.4f} {sig:>12}")

Automatic Margin via Mixture Model

Use margin_method="mixture" to automatically estimate a practical margin:

[ ]:
ci_auto_margin = explainer.conf_int(
    alpha=0.05,
    alternative="greater",
    margin_method="mixture",
    margin_quantile=0.95,
)

print(f"Automatically selected margin: {ci_auto_margin['margin']:.4f}")
print(f"Significant features: {np.where(ci_auto_margin['reject_null'])[0]}")

Feature Selection with Statistical Guarantees

Use the confidence intervals to select features with controlled false discovery:

[ ]:
def statistical_feature_selection(explainer, X_test, alpha=0.05, margin=0.0):
    """Select features with statistical significance."""
    # Compute importance
    results = explainer(X_test)

    # Get confidence intervals
    ci = explainer.conf_int(
        alpha=alpha,
        alternative="greater",
        margin=margin,
        var_floor_method="mixture",
    )

    # Select significant features
    selected = np.where(ci["reject_null"])[0]

    # Sort by importance
    sorted_idx = np.argsort(ci["score"][selected])[::-1]

    return selected[sorted_idx], ci

# Run feature selection
selected_features, ci_result = statistical_feature_selection(
    explainer, X_test, alpha=0.05, margin=0.0
)

print("Selected Features (sorted by importance):")
print("-" * 40)
for i, feat in enumerate(selected_features):
    print(f"  {i+1}. Feature {feat} (importance = {ci_result['score'][feat]:.4f})")

print(f"\nTrue important features: 0, 1, 2")
print(f"Correctly identified: {set(selected_features) & {0, 1, 2}}")

The summary() Method

For a quick formatted view, use the built-in summary() method:

[ ]:
# Print formatted summary
explainer.summary(
    alpha=0.05,
    alternative="greater",
    var_floor_method="mixture",
)

Multiple Testing Correction

When explaining models with many features, it is important to control the false discovery rate (FDR). You can specify a multitest_method in both conf_int() and summary():

[ ]:
# Summary with FDR control (Benjamini-Hochberg)
explainer.summary(
    alpha=0.05,
    alternative="greater",
    multitest_method="fdr_bh"
)

Summary

Key takeaways:

  1. conf_int() provides confidence intervals and p-values for feature importance

  2. Use alternative="greater" for one-sided tests of positive importance

  3. Variance floor (var_floor_method) stabilizes inference for small effects

  4. Practical margin (margin) tests against meaningful thresholds

  5. Use summary() for quick formatted output