Choosing an Explainer
=====================

DFI provides several explainer classes for different use cases. This guide 
helps you choose the right one.

Quick Decision Guide
--------------------

.. list-table::
   :header-rows: 1
   :widths: 30 30 40

   * - Situation
     - Recommended
     - Notes
   * - General use, continuous data
     - ``OTExplainer``
     - Fast, stable, good default
   * - Non-Gaussian data
     - ``EOTExplainer``
     - Adaptive epsilon, more flexible
   * - Complex multimodal data
     - ``FlowExplainer``
     - Learns data distribution via normalizing flow
   * - Small sample / valid inference
     - ``Crossfitting``
     - Wraps any explainer with K-fold cross-fitting
   * - Mixed data types
     - ``EOTExplainer`` with Gower
     - Use ``cost_metric="gower"``
   * - Tree-based models
     - ``TreeExplainer``
     - Optimized for RF, XGBoost, etc. (currently a placeholder)
   * - Linear models
     - ``LinearExplainer``
     - Exact for linear models (currently a placeholder)
   * - Any black-box model
     - ``OTExplainer`` or ``KernelExplainer``
     - Model-agnostic (``KernelExplainer`` is currently a placeholder)

OTExplainer (Gaussian OT)
-------------------------

**Best for:** Continuous data that is approximately Gaussian

**Pros:**

- Fast closed-form computation
- Stable and reliable
- Good starting point for most problems

**Cons:**

- Assumes Gaussian structure
- May be suboptimal for heavy-tailed or multimodal data

**Example:**

.. code-block:: python

   from fdfi.explainers import OTExplainer

   explainer = OTExplainer(
       model.predict,
       data=X_background,
       nsamples=50,            # Monte Carlo samples per feature
       sampling_method="resample",  # or "permutation", "normal"
   )
   results = explainer(X_test)
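
To make the "closed-form" point concrete: for Gaussian data, the
optimal-transport map from N(mu, Sigma) to N(0, I) is the whitening transform
z = Sigma^{-1/2} (x - mu). The sketch below illustrates that map with NumPy
only. It shows the general technique under a Gaussian assumption and is not a
description of ``OTExplainer`` internals; ``gaussian_whiten`` and ``X_demo``
are hypothetical names.

.. code-block:: python

   import numpy as np

   def gaussian_whiten(X):
       """Map rows of X to roughly N(0, I) via z = Sigma^{-1/2} (x - mu)."""
       mu = X.mean(axis=0)
       cov = np.cov(X, rowvar=False)
       # Symmetric inverse square root from the eigendecomposition of cov.
       eigvals, eigvecs = np.linalg.eigh(cov)
       inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
       return (X - mu) @ inv_sqrt

   rng = np.random.default_rng(0)
   # Correlated Gaussian data (hypothetical stand-in for X_background).
   A = np.array([[1.0, 0.8], [0.0, 0.6]])
   X_demo = rng.normal(size=(2000, 2)) @ A.T
   Z_demo = gaussian_whiten(X_demo)
   # The whitened sample covariance is the identity up to rounding error.
   print(np.round(np.cov(Z_demo, rowvar=False), 6))

No iterative solver is involved, which is why this family of maps is cheap
compared with entropic or flow-based alternatives.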

EOTExplainer (Entropic OT)
--------------------------

**Best for:** Non-Gaussian, multimodal, or mixed-type data

**Pros:**

- Relaxes Gaussian assumption
- Adaptive regularization (``auto_epsilon=True``)
- Supports categorical features via Gower distance
- Stochastic transport for variance reduction

**Cons:**

- Slower than Gaussian OT
- More hyperparameters to tune

**Key options:**

.. code-block:: python

   from fdfi.explainers import EOTExplainer

   explainer = EOTExplainer(
       model.predict,
       data=X_background,
       # Regularization
       auto_epsilon=True,      # Auto-tune epsilon from the median distance
       epsilon=0.1,            # Manual value, used only if auto_epsilon=False
       
       # Transport target
       target="gaussian",      # or "empirical"
       
       # Stochastic transport
       stochastic_transport=True,
       n_transport_samples=10,
       
       # Cost function for mixed data
       cost_metric="sqeuclidean",  # or "gower", "auto"
   )
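
For mixed data, the Gower distance averages a range-normalized absolute
difference over numeric features and a 0/1 mismatch over categorical ones. The
NumPy sketch below shows that textbook definition. ``gower_distance``, the
``is_categorical`` mask, and ``ranges`` are hypothetical names; how
``cost_metric="gower"`` detects column types inside the library is not covered
here.

.. code-block:: python

   import numpy as np

   def gower_distance(x, y, is_categorical, ranges):
       """Gower distance between two rows with mixed feature types.

       x, y: 1-D arrays; categorical features are encoded as numeric codes.
       is_categorical: boolean mask, True where the feature is categorical.
       ranges: per-feature range (max - min), used for numeric features only.
       """
       x = np.asarray(x, dtype=float)
       y = np.asarray(y, dtype=float)
       per_feature = np.empty_like(x)
       cat = is_categorical
       num = ~is_categorical
       # Categorical features contribute 0 when equal and 1 when different.
       per_feature[cat] = (x[cat] != y[cat]).astype(float)
       # Numeric features contribute |x - y| scaled into [0, 1] by the range.
       per_feature[num] = np.abs(x[num] - y[num]) / ranges[num]
       return per_feature.mean()

   # Two numeric columns (ranges 10 and 100) and one categorical column.
   is_cat = np.array([False, False, True])
   feature_ranges = np.array([10.0, 100.0, 1.0])
   d = gower_distance([2.0, 50.0, 0], [7.0, 50.0, 1], is_cat, feature_ranges)
   print(d)  # (0.5 + 0.0 + 1.0) / 3 = 0.5

Because every feature contributes a value in [0, 1], no single column dominates
the cost the way an unscaled numeric feature would under squared Euclidean
distance.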

FlowExplainer (Flow-Based DFI)
------------------------------

**Best for:** Complex, non-Gaussian data where normalizing flows can capture the
underlying distribution structure

**Pros:**

- Handles complex, multimodal distributions
- Maps data to Gaussian latent space via learned normalizing flow
- Supports both CPI and SCPI (Sobol-CPI) methods with different averaging orders
- Flexible flow training and pre-trained model support

**Cons:**

- Requires PyTorch and torchdiffeq dependencies
- Flow training can be slow for large datasets

**Key options:**

.. code-block:: python

   from fdfi.explainers import FlowExplainer

   explainer = FlowExplainer(
       model.predict,
       data=X_background,
       
       # Flow fitting
       fit_flow=True,          # Fit flow during init (or fit later)
       num_steps=200,          # Flow training iterations
       
       # Method selection
       method='cpi',           # 'cpi', 'scpi', or 'both'
       
       # Counterfactual sampling
       nsamples=50,            # Monte Carlo samples per feature
       sampling_method='resample',  # 'resample', 'permutation', 'normal', 'condperm'
       
       # Reproducibility
       random_state=42,
   )
   
   results = explainer(X_test)

**Understanding CPI vs SCPI:**

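- **CPI (Conditional Permutation Importance)**: Average the counterfactual
  predictions first, then square the difference from the response:

  .. math::

     \phi_j^{\mathrm{CPI}} = \left(Y - \mathbb{E}_b\left[f(\tilde{X}_b^{(j)})\right]\right)^2

- **SCPI (Sobol-CPI)**: Square each difference first, then average (Sobol
  sensitivity index formulation):

  .. math::

     \phi_j^{\mathrm{SCPI}} = \mathbb{E}_b\left[\left(Y - f(\tilde{X}_b^{(j)})\right)^2\right]

Here :math:`Y` is the response, :math:`f` is the model,
:math:`\tilde{X}_b^{(j)}` is the :math:`b`-th counterfactual sample for feature
:math:`j`, and :math:`\mathbb{E}_b` averages over the ``nsamples`` draws.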
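
The two averaging orders can be compared directly on an array of counterfactual
predictions. The sketch below uses synthetic numbers; ``y`` and ``preds`` are
hypothetical stand-ins for the response and for the model's predictions on the
counterfactual draws. Because squaring is convex, the SCPI value is never
smaller than the CPI value computed from the same draws, and the gap equals the
variance of the predictions across draws.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)
   n, B = 5, 50
   y = rng.normal(size=n)                          # responses, shape (n,)
   preds = y[:, None] + rng.normal(size=(n, B))    # predictions, shape (n, B)

   # CPI: average over the B draws first, then square the residual.
   cpi = (y - preds.mean(axis=1)) ** 2
   # SCPI: square each residual first, then average over the B draws.
   scpi = ((y[:, None] - preds) ** 2).mean(axis=1)

   # SCPI - CPI equals the (ddof=0) variance of the predictions over draws.
   gap = scpi - cpi
   print(np.allclose(gap, preds.var(axis=1)))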

**External flow models:**

.. code-block:: python

   from fdfi.models import FlowMatchingModel

   # Train flow externally with custom settings
   flow = FlowMatchingModel(X_background, dim=X_background.shape[1])
   flow.fit(num_steps=500, verbose='final')

   # Use pre-trained flow in explainer
   explainer = FlowExplainer(model.predict, X_background, fit_flow=False)
   explainer.set_flow(flow)

Shared Diagnostics (OT / EOT / Flow)
------------------------------------

All disentangled explainers expose a shared ``diagnostics`` payload with two
metrics, each paired with a qualitative label:

- ``latent_independence_median`` (dCor), labeled in ``latent_independence_label``
- ``distribution_fidelity_mmd`` (MMD), labeled in ``distribution_fidelity_label``

Lower is better for both metrics. Each metric is labeled separately, using the
same thresholds across explainers:

- ``GOOD``: dCor < 0.10, MMD < 0.05
- ``MODERATE``: dCor < 0.25, MMD < 0.15
- ``POOR``: otherwise

.. code-block:: python

   from fdfi.explainers import OTExplainer

   explainer = OTExplainer(model.predict, X_background)
   diag = explainer.diagnostics
   print(diag["latent_independence_median"], diag["latent_independence_label"])
   print(diag["distribution_fidelity_mmd"], diag["distribution_fidelity_label"])
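
The thresholds in this section amount to a simple lookup applied to each metric
separately. As a sketch (``label_metric`` is a hypothetical helper, not part of
the library, and the input values are made up):

.. code-block:: python

   def label_metric(value, good_below, moderate_below):
       """Return a qualitative label for a lower-is-better diagnostic."""
       if value < good_below:
           return "GOOD"
       if value < moderate_below:
           return "MODERATE"
       return "POOR"

   # Thresholds quoted in this section: dCor (0.10, 0.25), MMD (0.05, 0.15).
   dcor_label = label_metric(0.08, good_below=0.10, moderate_below=0.25)
   mmd_label = label_metric(0.20, good_below=0.05, moderate_below=0.15)
   print(dcor_label, mmd_label)  # GOOD POOR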

TreeExplainer
-------------

**Best for:** Tree ensemble models (Random Forest, Gradient Boosting, XGBoost, 
LightGBM)

**Pros:**

- Optimized tree traversal algorithms
- Exact or approximate Shapley computation

**Note:** Currently a placeholder; full implementation coming soon.

.. code-block:: python

   from fdfi.explainers import TreeExplainer
   from sklearn.ensemble import RandomForestRegressor

   model = RandomForestRegressor().fit(X_train, y_train)
   explainer = TreeExplainer(model, data=X_background)

LinearExplainer
---------------

**Best for:** Linear models (Linear/Logistic Regression, Ridge, Lasso)

**Pros:**

- Exact Shapley values for linear models
- Very fast computation

**Note:** Currently a placeholder; full implementation coming soon.

.. code-block:: python

   from fdfi.explainers import LinearExplainer
   from sklearn.linear_model import LinearRegression

   model = LinearRegression().fit(X_train, y_train)
   explainer = LinearExplainer(model, data=X_background)

KernelExplainer
---------------

**Best for:** Any model where you have no prior knowledge of structure

**Pros:**

- Works with any callable model
- Fully model-agnostic

**Cons:**

- Slowest method
- Can have high variance

**Note:** Currently a placeholder; full implementation coming soon.

.. code-block:: python

   from fdfi.explainers import KernelExplainer

   explainer = KernelExplainer(model.predict, data=X_background)

Crossfitting (Cross-Fitted Inference)
-------------------------------------

**Best for:** Small-to-moderate sample sizes where valid confidence intervals
are critical

**Pros:**

- Eliminates overfitting bias in the disentanglement map
- Yields valid standard errors and CIs even at small *n*
- Works with any explainer class (``OTExplainer``, ``EOTExplainer``,
  ``FlowExplainer``)
- Supports any scikit-learn cross-validation splitter (``KFold``,
  ``StratifiedKFold``, ``ShuffleSplit``, ``RepeatedKFold``, ``GroupKFold``,
  etc.)

**Cons:**

- K× slower than a single explainer (fits one per fold)
- For ``FlowExplainer`` folds, this means K separate flow trainings

**Key options:**

.. code-block:: python

   from fdfi.explainers import Crossfitting, OTExplainer
   from sklearn.model_selection import RepeatedKFold

   # Default: 5-fold KFold
   cf = Crossfitting(
       model.predict,
       data=X_background,
       explainer_class=OTExplainer,
       cv=5,
       nsamples=50,
       random_state=42,
   )
   results = cf()          # cross-fit on X_background
   ci = cf.conf_int(alpha=0.05)
   cf.summary()

   # RepeatedKFold for lower-variance estimates
   cf = Crossfitting(
       model.predict, X_background,
       explainer_class=OTExplainer,
       cv=RepeatedKFold(n_splits=5, n_repeats=3, random_state=0),
       nsamples=50,
   )
   results = cf()
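
Conceptually, cross-fitting fits the disentanglement map on K-1 folds and
scores importance only on the held-out fold, so no observation is scored by a
map that saw it during fitting. The loop below sketches that pattern with NumPy
alone. ``fit_map`` and ``score`` are hypothetical placeholders rather than fdfi
functions, and the ``Crossfitting`` class may organize the work differently.

.. code-block:: python

   import numpy as np

   def fit_map(X_train):
       # Hypothetical "fit" step: estimate a centering vector on training folds.
       return X_train.mean(axis=0)

   def score(X_eval, fitted_mean):
       # Hypothetical per-sample score computed with out-of-fold parameters.
       return ((X_eval - fitted_mean) ** 2).sum(axis=1)

   rng = np.random.default_rng(42)
   X_demo = rng.normal(size=(23, 3))
   n, K = X_demo.shape[0], 5

   # Shuffle indices once, then split them into K roughly equal folds.
   folds = np.array_split(rng.permutation(n), K)
   scores = np.full(n, np.nan)
   for k, eval_idx in enumerate(folds):
       train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
       fitted = fit_map(X_demo[train_idx])                  # fit on K-1 folds
       scores[eval_idx] = score(X_demo[eval_idx], fitted)   # score held-out rows

   # Every row is scored exactly once, by a map fitted without it.
   print(np.isnan(scores).sum())  # 0

The K separate fits in the loop are the source of the K-fold slowdown listed
under the cons.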

Hyperparameter Guidelines
-------------------------

nsamples
~~~~~~~~

Number of Monte Carlo samples for counterfactual estimation.

- **Low (10-30)**: Fast but high variance
- **Medium (50-100)**: Good balance (recommended)
- **High (200+)**: Low variance but slow
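
The trade-off follows the usual Monte Carlo rate: the standard error of an
average over ``nsamples`` draws shrinks like 1/sqrt(nsamples), so quadrupling
``nsamples`` roughly halves the noise while quadrupling the cost. A small
simulation with synthetic unit-variance draws (not fdfi output) illustrates the
scaling; ``mc_std`` is a hypothetical helper.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)

   def mc_std(nsamples, n_repeats=4000):
       """Empirical std of a Monte Carlo mean of `nsamples` unit-variance draws."""
       draws = rng.normal(size=(n_repeats, nsamples))
       return draws.mean(axis=1).std()

   std_25 = mc_std(25)
   std_100 = mc_std(100)
   # Theory predicts 1/sqrt(25) = 0.2 and 1/sqrt(100) = 0.1, a ratio near 2.
   print(std_25, std_100, std_25 / std_100)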

sampling_method
~~~~~~~~~~~~~~~

How to generate counterfactual feature values:

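- ``"resample"``: Sample from the background data (default; preserves the
  marginal distribution)
- ``"permutation"``: Permute within the test set (introduces no new values)
- ``"normal"``: Sample from a standard normal (strong Gaussian assumption)
- ``"condperm"``: Also accepted by ``FlowExplainer``, as listed in its key
  options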
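
As a rough illustration of how resampling, permutation, and normal draws
differ, the sketch below generates replacement values for a single column with
NumPy. It mirrors the descriptions in the list rather than the library's code;
the array names are hypothetical, and whether fdfi applies these draws in the
original or the latent space is not specified here.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)
   background_col = rng.exponential(size=500)   # column j in the background data
   test_col = rng.exponential(size=100)         # column j in the test set

   # "resample": draw with replacement from the background column.
   resampled = rng.choice(background_col, size=test_col.size, replace=True)
   # "permutation": shuffle the test column, so no new values appear.
   permuted = rng.permutation(test_col)
   # "normal": draw from a standard normal, ignoring the observed marginal.
   normal_draws = rng.normal(size=test_col.size)

   print(np.isin(resampled, background_col).all())               # True
   print(np.array_equal(np.sort(permuted), np.sort(test_col)))   # True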

epsilon (EOTExplainer)
~~~~~~~~~~~~~~~~~~~~~~

Entropic regularization strength:

- **Small (0.01)**: Sharp transport, may be unstable
- **Medium (0.1)**: Good balance
- **Large (1.0+)**: Smooth transport, loses structure
- ``auto_epsilon=True``: Recommended, auto-tunes from data
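
To see why epsilon controls sharpness, the sketch below runs a plain Sinkhorn
iteration on a small squared-Euclidean cost matrix and reports the entropy of
the resulting plan: a small epsilon concentrates mass on few pairs, while a
large epsilon spreads it toward the independent coupling. This is a textbook
implementation, not fdfi's solver. Scaling epsilon by the median cost is one
common heuristic; whether ``auto_epsilon`` uses exactly this rule is an
assumption, and the factors 0.1 and 10 are illustrative only.

.. code-block:: python

   import numpy as np

   def sinkhorn_plan(cost, epsilon, n_iter=500):
       """Entropic OT plan between two uniform discrete measures."""
       n, m = cost.shape
       a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
       K = np.exp(-cost / epsilon)          # Gibbs kernel
       u, v = np.ones(n), np.ones(m)
       for _ in range(n_iter):
           u = a / (K @ v)                  # rescale to match row marginals
           v = b / (K.T @ u)                # rescale to match column marginals
       return u[:, None] * K * v[None, :]

   def plan_entropy(P):
       # Tiny offset avoids log(0) for entries that underflow to zero.
       return -np.sum(P * np.log(P + 1e-300))

   rng = np.random.default_rng(0)
   X_src = rng.normal(size=(30, 2))
   X_tgt = rng.normal(size=(30, 2))
   cost = ((X_src[:, None, :] - X_tgt[None, :, :]) ** 2).sum(axis=-1)

   # Express epsilon relative to the median cost so it is scale-free.
   scale = np.median(cost)
   sharp = sinkhorn_plan(cost, 0.1 * scale)
   smooth = sinkhorn_plan(cost, 10.0 * scale)
   print(plan_entropy(sharp), plan_entropy(smooth))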

target (EOTExplainer)
~~~~~~~~~~~~~~~~~~~~~

Transport target distribution:

- ``"gaussian"``: Standard normal target (default)
- ``"empirical"``: Permuted data as target

Computing Confidence Intervals
------------------------------

All explainers support post-hoc confidence intervals:

.. code-block:: python

   import numpy as np

   # Compute importance
   results = explainer(X_test)

   # Get confidence intervals
   ci = explainer.conf_int(
       alpha=0.05,
       target="X",              # or "Z" for latent space
       alternative="two-sided", # or "greater", "less"
       var_floor_method="mixture",  # Stabilize small variances
       margin=0.0,              # Practical significance threshold
   )

   print("Significant features:", np.where(ci["reject_null"])[0])
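
For intuition, a normal-approximation interval of this kind can be built from
per-sample importance scores: take the mean, divide the sample standard
deviation by sqrt(n) to get the standard error, and test the mean against the
margin. The sketch below uses only NumPy and the standard library. It shows the
general recipe under stated assumptions, omits the variance floor, and is not a
reimplementation of ``conf_int``; ``normal_ci`` and ``demo_scores`` are
hypothetical names.

.. code-block:: python

   import numpy as np
   from statistics import NormalDist

   def normal_ci(scores, alpha=0.05, margin=0.0):
       """Two-sided normal CI for the mean score, plus a one-sided margin test."""
       scores = np.asarray(scores, dtype=float)
       n = scores.shape[0]
       mean = scores.mean(axis=0)
       se = scores.std(axis=0, ddof=1) / np.sqrt(n)
       z = NormalDist().inv_cdf(1 - alpha / 2)
       lower, upper = mean - z * se, mean + z * se
       # One-sided test of H0: mean <= margin at level alpha.
       z_one_sided = NormalDist().inv_cdf(1 - alpha)
       reject = (mean - margin) / se > z_one_sided
       return {"lower": lower, "upper": upper, "reject_null": reject}

   rng = np.random.default_rng(0)
   # Synthetic per-sample scores: column 0 is important, column 1 is not.
   demo_scores = np.column_stack([
       rng.normal(loc=1.0, scale=0.5, size=200),
       rng.normal(loc=0.0, scale=0.5, size=200),
   ])
   ci_demo = normal_ci(demo_scores, alpha=0.05, margin=0.1)
   print(np.where(ci_demo["reject_null"])[0])

A positive ``margin`` makes the test stricter: a feature is flagged only when
its mean score exceeds the margin by more than sampling noise can explain.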