Transportability and Policy Learning

This vignette demonstrates two advanced applications of the causaldef framework: 1. Transportability: Generalizing experimental results to a new target population. 2. Policy Learning Bounds: Quantifying the limits of decision-making under confounding.

We utilize classical datasets (Lalonde NSW and Right Heart Catheterization) to illustrate these concepts.

library(causaldef)
library(stats)

# Helper for plot resizing
if (!exists("deparse1", envir = baseenv())) {
  deparse1 <- function(expr, collapse = " ", width.cutoff = 500L, ...) {
    paste(deparse(expr, width.cutoff, ...), collapse = collapse)
  }
}

1. Transportability: Lalonde’s Job Training

A common challenge in causal inference is external validity: Can we apply the results of a Randomized Controlled Trial (RCT) to a diferent target population?

We use the Lalonde dataset to simulate a transportability problem. * Source Population (\(S=1\)): The NSW experimental participants (typically disjoint from the general population). * Target Population (\(S=0\)): The CPS comparison group (representative of the broader population).

data("nsw_benchmark")

# Define Source: Experimental Sample
source_data <- subset(nsw_benchmark, sample_id %in% c("nsw_treated", "nsw_control"))

# Define Target: CPS Control Group (Broader population)
target_data <- subset(nsw_benchmark, sample_id == "cps_control")

# Covariates available for transport
transport_vars <- c("age", "education", "black", "hispanic", "married", "nodegree", "re74", "re75")

# Comparison of demographics
print(summary(source_data[, c("age", "education", "re74")]))
#>       age          education         re74        
#>  Min.   :17.00   Min.   : 3.0   Min.   :    0.0  
#>  1st Qu.:20.00   1st Qu.: 9.0   1st Qu.:    0.0  
#>  Median :24.00   Median :10.0   Median :    0.0  
#>  Mean   :25.37   Mean   :10.2   Mean   : 2102.3  
#>  3rd Qu.:28.00   3rd Qu.:11.0   3rd Qu.:  824.4  
#>  Max.   :55.00   Max.   :16.0   Max.   :39570.7
print(summary(target_data[, c("age", "education", "re74")]))
#>       age          education          re74      
#>  Min.   :16.00   Min.   : 0.00   Min.   :    0  
#>  1st Qu.:24.00   1st Qu.:11.00   1st Qu.: 4403  
#>  Median :31.00   Median :12.00   Median :15124  
#>  Mean   :33.23   Mean   :12.03   Mean   :14017  
#>  3rd Qu.:42.00   3rd Qu.:13.00   3rd Qu.:23584  
#>  Max.   :55.00   Max.   :18.00   Max.   :25862

The target population (CPS) is significantly wealthier (re74 mean is much higher) and slightly older. We want to know: What would be the effect of job training if applied to the CPS population?

Transport Deficiency

We calculate the Transport Deficiency \(\delta(E_S, E_T)\). This measures how much information is lost due to the distributional shift between Source and Target.

# Create causal specification for the SOURCE
source_spec <- causal_spec(
  data = source_data,
  treatment = "treat",
  outcome = "re78",
  covariates = transport_vars
)
#> ✔ Created causal specification: n=445, 8 covariate(s)

# Compute Transport Deficiency
trans_def <- transport_deficiency(
  source_spec,
  target_data = target_data,
  transport_vars = transport_vars,
  method = "iptw",
  n_boot = 50 # Low for vignette speed
)
#> ✖ Transport proxy: 0.995 (ESS: 13.3%)
#> ℹ Severe distribution shift; transport may be unreliable

print(trans_def)
#> 
#> ── Transport Diagnostic Analysis ───────────────────────────────────────────────
#> Method: "iptw"
#> Source n: 445 | Target n: 15992
#> 
#> ── Transport Diagnostic ──
#> 
#>   transport proxy:       0.9954
#>   Standard error:        0.0250
#>   95% CI:               [1.1735, 1.2776]
#>   Effective sample size: 59.0 (13.3% of source)
#>   Note: this is a heuristic transport-risk proxy, not an exact transport deficiency.
#> ── Covariate Shift ──
#>            variable shift_metric severity
#> age             age   0.84595761   severe
#> education education   0.76555133   severe
#> black         black   2.36239417   severe
#> hispanic   hispanic   0.05755963      low
#> married     married   1.30665468   severe
#> nodegree   nodegree   1.11659926   severe
#> re74           re74   1.53592792   severe
#> re75           re75   1.77276154   severe
#> ── Effect Estimates ──
#>   ATE (source):     1676.3426
#>   ATE (transport):  778.8984
plot(trans_def, type = "shift")

Interpretation: * Covariate Shift: The plot shows which variables differ most (likely re74 and re75). * Transported ATE: The estimated effect in the target population. * Deficiency: A low delta implies we can reliably transport the result. A high delta warns that the populations are too distinct (lack of overlap or extreme weights).


2. Policy Learning Bounds: RHC

In Policy Learning, we seek an optimal treatment rule \(\pi(X)\) to maximize utility. However, with observational data, our estimate of a policy’s value is biased by confounding.

We use the Right Heart Catheterization (RHC) dataset to evaluate a risk-based policy. * Decision: Treat with RHC? * Outcome: 30-day Mortality (lower is better). * Policy: “Treat only high-risk patients” (e.g., APACHE score > 50).

data("rhc")

# Preprocessing
if (is.factor(rhc$swang1)) rhc$treat <- as.numeric(rhc$swang1) - 1 else rhc$treat <- rhc$swang1
if (is.factor(rhc$dth30)) rhc$outcome <- as.numeric(rhc$dth30) - 1 else rhc$outcome <- rhc$dth30

# Variables for adjustment
covariates <- c("age", "sex", "race", "aps1", "cat1") 

spec_rhc <- causal_spec(
  data = rhc,
  treatment = "treat",
  outcome = "outcome",
  covariates = covariates
)
#> ✔ Created causal specification: n=5735, 5 covariate(s)

Policy Evaluation

We compare two policies: 1. Treat All: Everyone gets RHC. 2. Risk-Based: Treat only if APACHE III score (aps1) > 50.

We estimate the observational value of these policies using IPW.

The Safety Floor

Even if the Risk-Based policy looks better, can we trust it? The Safety Floor tells us the worst-case error in our value estimate due to unmeasured confounding.

Conclusion: * The Safety Floor represents the irreducible uncertainty. * If the difference between Treat All and Risk-Based is smaller than the safety floor, we cannot be confident the new policy is actually superior to the baseline, regardless of sample size. * This illustrates the fundamental limit of offline policy learning from observational data.