StatsPAI: Agent-Native Causal Inference & Econometrics
StatsPAI is the agent-native Python package for causal inference and applied econometrics. One import (`import statspai as sp`), 390+ functions, covering the complete empirical research workflow.
Source: https://github.com/brycewang-stanford/StatsPAI
PyPI: pip install statspai
Paper: Submitted to Journal of Open Source Software (JOSS) — under review
Why StatsPAI for Agents?
StatsPAI is the first econometrics toolkit purpose-built for LLM-driven empirical research:
- Self-describing API: `sp.list_functions()`, `sp.describe_function("did")`, `sp.function_schema("rdrobust")` — agents discover and understand functions without doc lookup.
- Unified result objects: every estimator returns a `CausalResult` with `.summary()`, `.plot()`, `.to_latex()`, `.to_word()`, `.to_excel()`, `.cite()`, and a structured `.diagnostics` dict for agent logic.
- One import covers the full pipeline: descriptive stats → research-question DSL → DAG discovery → estimation → diagnostics → robustness, all behind `sp.<func>`.
- Estimand-first decisions: `sp.causal_question` and `sp.causal` make the "DID vs RD vs IV?" choice explicit and defensible — not a guess.
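A minimal sketch of what this buys an agent in practice (the `sp.did` call and column names reuse the labor-economics running example below and are purely illustrative):
import statspai as sp
print(sp.describe_function("did"))           # agent reads the contract before calling
res = sp.did(df, y="wage", treat="training", time="post")
print(res.summary())                         # point estimate + SE + CI
print(res.diagnostics)                       # structured dict for agent branching
res.to_latex("tables/tab1.tex")              # same export surface on every estimator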
End-to-End Empirical Pipeline (v1.6 P1) — the one-stop flow
The canonical agent loop, after the dataset is loaded:
| Step | Task | Tools |
|---|---|---|
| 0 | Data cleaning | pandas (NOT StatsPAI — see Scope below) |
| 1 | EDA & descriptives | sp.sumstats / sp.balance_table / sp.describe |
| 2 | Pre-flight checks | sp.balance_panel / sp.diagnose / overlap & missing |
| 3 | Research question | sp.causal_question(...).identify() |
| 4 | Model exploration | sp.llm_dag_propose → sp.llm_dag_validate → sp.llm_dag_constrained |
| 5 | Estimation | sp.causal(...) OR sp.<specific_estimator>(...) |
| 6 | Diagnostics & robustness | sp.diagnose / sp.spec_curve / sp.honest_did / sp.evalue |
sp.paper() exists for end-to-end draft generation but is out of scope for this skill — stop at Step 6 and hand the CausalResult back to the user.
Note on code blocks below. All Step 0–6 examples share one labor-economics running narrative (training → wage, with `worker_id`/`firm_id`/`year`/`age`/`edu`/`tenure`) purely for readability. Every column name, `population` string, `estimand`, and `design` value is illustrative — agents must substitute the actual columns and design from the user's DataFrame and research question. Only the `sp.*` function names and argument shapes are normative.
Scope boundary — what StatsPAI does NOT do
StatsPAI assumes you arrive with an analysis-ready DataFrame. Do these in pandas (or your preferred ETL) before calling any sp.* function:
import numpy as np
import pandas as pd
df = pd.read_csv("raw.csv")
df = df.dropna(subset=["y", "treatment"])                # missing on key vars
df["year"] = pd.to_numeric(df["year"], errors="coerce")  # type coercion
df = df.merge(covariates, on="firm_id", how="left")      # joins (covariates is another DataFrame)
df["log_wage"] = np.log(df["wage"].clip(lower=1))        # transforms
If the agent skips Step 0 and feeds dirty data into sp.*, estimators will either error or silently drop rows — both are bugs you own, not StatsPAI's.
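As a minimal guard (plain pandas; the column names follow the running example and must be substituted), an agent can verify cleaning actually finished before the first sp.* call:
key_vars = ["wage", "training", "worker_id", "year"]  # substitute the user's actual key columns
n_bad = int(df[key_vars].isna().any(axis=1).sum())
if n_bad:
    raise ValueError(f"{n_bad} rows still have missing key variables; finish Step 0 first")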
Step 1 — Descriptive statistics & EDA
sp.sumstats(df, vars=["wage", "edu", "exp"], by="treated", output="table1.docx")
sp.describe(df) # variable labels & types
sp.balance_table(df, treat="treated",
covariates=["age", "edu", "income"], test="ttest") # Table 1
Step 2 — Pre-flight checks (catch design failures before estimation)
# Panel structure — balance_panel DROPS unbalanced units (returns the balanced panel)
balanced = sp.balance_panel(df, entity="firm_id", time="year")
if len(balanced) < len(df):
    print(f"Dropped {len(df) - len(balanced)} rows from unbalanced units — "
          "decide drop vs keep before DID.")
# Treatment timing (staggered adoption?)
df.groupby("first_treat_year").size() # cohort sizes
# Covariate & outcome sanity (generic EDA, returns a dict report)
report = sp.diagnose(df, y="wage", x=["age", "edu", "tenure"])
# Full identification-level diagnostics (PT, weak IV, overlap, leverage) run
# automatically inside sp.causal(...) in Step 5 — don't duplicate here.
Step 3 — Estimand-first research question (the "DID vs RD vs IV?" decision)
sp.causal_question is the estimand-first DSL: declare population, treatment, outcome, estimand, design — then .identify() picks the estimator and writes down the assumptions you're committing to. This is the decision layer; sp.spec_curve is the results-layer multiverse and is not a substitute.
q = sp.causal_question(
treatment="training",
outcome="wage",
data=df,
population="manufacturing workers, 2010–2020",
estimand="ATT",
design="auto", # or "did" / "rdd" / "iv" / "rct" / "obs"
time_structure="panel",
time="year", id="worker_id",
covariates=["age", "edu", "tenure"],
)
plan = q.identify() # IdentificationPlan: estimator + assumptions + fallbacks
print(plan) # human-readable Methods paragraph
result = q.estimate() # runs the picked estimator
print(result.summary())
Step 4 — Model exploration via LLM-assisted DAG (closed loop)
# Propose: LLM (or heuristic backend) drafts a DAG from variable names + domain
proposal = sp.llm_dag_propose(
variables=df.columns.tolist(),
domain="labor economics: training, wages, tenure",
client=my_llm_client, # any object with .complete(prompt) -> str; None = heuristic
)
# Validate: per-edge CI test against the data
validation = sp.llm_dag_validate(proposal, df, alpha=0.05)
print(validation.edge_evidence) # which proposed edges the data supports
# Constrained discovery: propose → constrain → CI-validate → demote, until stable
discovered = sp.llm_dag_constrained(
df,
descriptions={"wage": "monthly wage USD", "training": "0/1 program"},
oracle=my_llm_client.suggest_edges, # optional; falls back to plain PC without it
max_iter=3,
)
Pass the resulting DAG into sp.causal(..., dag=discovered.dag) so identification reasoning uses it.
Step 5 — Estimation (one-call orchestration OR specific estimator)
One-call (recommended for agents):
w = sp.causal(df, y="wage", treatment="training",
id="worker_id", time="year", design="did",
covariates=["age", "edu", "tenure"],
dag=discovered.dag) # optional
print(w.diagnostics) # identification verdict
print(w.recommendation) # which estimator + why
print(w.result.summary()) # point estimate + SE + CI
print(w.robustness_findings) # automated robustness battery
Or call a specific estimator directly — see "Method Catalog" below.
Step 6 — Diagnostics & robustness
sp.diagnose_result(result) # PT / weak-IV / overlap / leverage battery on a fitted result
sp.honest_did(result, method="smoothness") # Rambachan–Roth PT sensitivity (DID)
# E-value takes point estimate + CI (or SE), NOT a result object. Extract from .params / .conf_int().
sp.evalue(estimate=result.params["training"],
ci=tuple(result.conf_int().loc["training"]),
measure="RR") # unmeasured-confounder sensitivity
# Specification curve: controls must be a LIST OF LISTS (each inner list = one spec).
sp.spec_curve(df, y="wage", x="training",
controls=[["age"],
["age", "edu"],
["age", "edu", "tenure"]],
subsets={"all": None, "manuf": df["industry"].eq("manufacturing")})
sp.bacon_decomposition(df, y="wage", treat="training",
time="year", id="worker_id") # TWFE diagnostic for staggered DID
sp.estat(result, test="all") # Stata-style postestimation (single test name or 'all')
Method Catalog
Classical Econometrics
# regress / ivreg take FORMULA first, then data (statsmodels-style)
sp.regress("y ~ x1 + x2", df, cluster="firm_id") # OLS
sp.ivreg("y ~ (x1 ~ z1 + z2) + x2", df, cluster="state") # IV/2SLS — syntax: (endog ~ instruments) + exog
sp.panel(df, "y ~ x1 + x2", entity="firm", time="year", method="fe") # Panel FE (kwarg is `method=`)
sp.heckman(df, y="wage", x=["age", "edu"], # Heckman selection
select="in_labor_force", z=["marital", "kids"]) # y/x for outcome, select+z for selection
sp.qreg(df, formula="y ~ x1 + x2", quantile=0.5) # Quantile regression
Difference-in-Differences
# 2x2 DID: `time` must take exactly 2 values (a post indicator works).
sp.did(df, y="y", treat="treated", time="post")
# Staggered DID estimators REQUIRE a unit-id column `i=`.
sp.callaway_santanna(df, y="y", g="first_treat_year", t="year", i="firm_id") # CS 2021
sp.sun_abraham(df, y="y", g="first_treat_year", t="year", i="firm_id") # SA 2021 event study
sp.bacon_decomposition(df, y="y", treat="treated", time="year", id="firm_id") # TWFE diagnostic
sp.honest_did(result, method="smoothness") # PT sensitivity (RR 2023)
sp.continuous_did(df, y="y", dose="dose", time="year", id="firm_id") # Continuous treatment
Regression Discontinuity
# Cutoff argument is `c=` (not `cutoff=`) across rdrobust / rkd.
sp.rdrobust(df, y="y", x="running_var", c=0) # Sharp RD (CCT 2014)
sp.rdrobust(df, y="y", x="running_var", c=0, fuzzy="treatment") # Fuzzy RD
sp.rddensity(df, x="running_var", c=0) # McCrary density test
sp.rdmc(df, y="y", x="running_var", cutoffs=[0, 5, 10]) # Multi-cutoff RD
sp.rkd(df, y="y", x="running_var", c=0) # Regression kink design
Matching & Reweighting
# Signature is (data, y, treat, covariates, ...) — outcome comes BEFORE treat.
sp.match(df, y="wage", treat="training", covariates=["age", "edu"], method="nearest") # PSM (method='nearest' is default)
sp.match(df, y="wage", treat="training", covariates=["age", "edu"], method="cem") # Coarsened exact matching
sp.ebalance(df, y="wage", treat="training", covariates=["age", "edu"]) # Entropy balancing
Synthetic Control
# Kwarg is treatment_time (NOT treated_period); treated_unit / treatment_time are singular.
sp.synth(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000) # ADH SCM (method='augmented' default)
sp.sdid(df, outcome="y", unit="unit", time="time",
treated_unit=1, treatment_time=2000) # Synthetic DID (Arkhangelsky et al. 2021)
Machine Learning Causal Inference
sp.dml(df, y="wage", treat="training", covariates=["age", "edu"], model="plr") # Double/Debiased ML
sp.causal_forest(formula="wage ~ training | age + edu", data=df) # Causal Forest (GRF) — formula API
sp.metalearner(df, y="wage", treat="training", covariates=["age", "edu"], learner="dr") # DR-Learner
sp.tmle(df, y="wage", treat="training", covariates=["age", "edu"]) # Targeted MLE
sp.aipw(df, y="wage", treat="training", covariates=["age", "edu"]) # Augmented IPW
Neural Causal Models
sp.tarnet(df, y="wage", treat="training", covariates=["age", "edu"]) # TARNet
sp.cfrnet(df, y="wage", treat="training", covariates=["age", "edu"]) # CFRNet
sp.dragonnet(df, y="wage", treat="training", covariates=["age", "edu"]) # DragonNet
Text Causal (v1.6 P1, experimental)
# text_treatment_effect: pass the TEXT COLUMN name (text_col=), not pre-computed embeddings.
# The `embedder=` argument controls how text is vectorised ('hash' default, or a callable).
sp.causal_text.text_treatment_effect( # Veitch–Wang–Blei (2020) text-as-treatment
df, text_col="doc", outcome="y", treatment="t",
covariates=["age", "edu"], embedder="hash", n_components=20)
# llm_annotator_correct: takes aligned pd.Series (NOT a DataFrame).
# annotations_human must be same length as annotations_llm — mark unlabelled rows with NaN
# if you only have human labels on a validation subset.
sp.causal_text.llm_annotator_correct( # Egami–Hinck–Stewart–Wei (2024)
annotations_llm=df["t_llm"], # LLM-annotated treatment (all rows)
annotations_human=df["t_true"], # human labels aligned; NaN where missing
outcome=df["y"],
covariates=df[["age", "edu"]],
method="hausman")
Robustness & Workflow
# spec_curve: controls is List[List[str]] (each inner list = one spec), not a flat list.
sp.spec_curve(df, y="wage", x="training",
controls=[["age"], ["age", "edu"], ["age", "edu", "tenure"]])
# robustness_report: takes (data, formula, x, ...) — NOT a result object.
sp.robustness_report(df, formula="wage ~ training + age + edu",
x="training", cluster_var="firm_id")
# subgroup_analysis: (data, formula, x, by={<label>: <col>}, ...). Wald test is automatic.
sp.subgroup_analysis(df, formula="wage ~ training + age + edu",
x="training", by={"gender": "female", "age_bin": "age_quartile"})
result.to_latex() # Export to LaTeX
result.to_word("output.docx") # Export to Word
result.cite() # Auto-generate citation
Interactive Visualization (v0.6+)
fig = result.plot()
sp.interactive(fig) # Stata Graph Editor-style WYSIWYG editing, 29 academic themes
Agent Integration Pattern
import statspai as sp
# Step 1: Discover available functions
functions = sp.list_functions()
# Step 2: Understand a specific function
info = sp.describe_function("callaway_santanna")
# Step 3: Get JSON schema for structured calls
schema = sp.function_schema("callaway_santanna")
# Step 4: Execute and get structured results (callaway_santanna requires `i=` unit id)
result = sp.callaway_santanna(df, y="y", g="first_treat_year", t="year", i="firm_id")
print(result.summary())
result.to_latex("tables/did_results.tex")
When to Use StatsPAI vs Other Packages
| Scenario | Use StatsPAI | Alternative |
|---|---|---|
| One-stop empirical pipeline (EDA → estimand → DAG → estimate → robustness) | ✅ Single import covers all six steps | Assemble 10+ R/Python packages |
| Agent-driven analysis with self-describing API | ✅ list_functions / describe_function / function_schema | pyfixest, statsmodels (no agent API) |
| "DID vs RD vs IV?" decision (estimand-first) | ✅ sp.causal_question + sp.causal recommender | None — usually a manual judgement call |
| LLM-assisted DAG discovery loop | ✅ sp.llm_dag_propose / validate / constrained | causal-learn (no LLM oracle integration) |
| Staggered DID with diagnostics | ✅ CS + SA + Bacon + HonestDID in one place | differences (partial) |
| Neural causal models | ✅ TARNet / CFRNet / DragonNet | econml (partial) |
| Stata users migrating to Python | ✅ Stata-equivalent names (sp.regress, sp.estat, sp.sumstats) | linearmodels (limited) |
Validation and Error Handling
After running any estimation, check the result object before proceeding:
result = sp.did(df, "y", "treated", "post")
# Always inspect the human-readable summary first
print(result.summary())
# Inspect structured diagnostics for agent-driven logic
if hasattr(result, "diagnostics"):
    print(result.diagnostics)
Common pitfalls to guard against:
- Convergence warnings — surfaced in `result.summary()` (check before trusting SEs).
- Weak instruments for IV — require first-stage F ≥ 10 (Stock–Yogo rule of thumb); StatsPAI exposes this in `result.diagnostics["First-stage F (<endog>)"]`.
- Parallel trends for DID — run an event study and verify pre-treatment coefficients are statistically indistinguishable from zero; follow up with `sp.honest_did(result)` for Rambachan–Roth sensitivity.
- Bandwidth sensitivity for RDD — re-run `sp.rdrobust` at half and double the MSE-optimal bandwidth; agreement within one SE is reassuring.
- Missing data — StatsPAI drops rows with missing values by default; check that `result.data_info["n_obs"]` matches your expected sample size.
- Overlap / common support — for matching, DML, and meta-learners, inspect propensity-score distributions before interpreting CATEs.
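A minimal guard sketch for the weak-IV and missing-data pitfalls, using only the `result.diagnostics` key and `result.data_info["n_obs"]` field documented above (the instrument `distance` and all column names are illustrative):
iv_res = sp.ivreg("wage ~ (training ~ distance) + age + edu", df, cluster="firm_id")
# Weak-IV guard: diagnostics key as documented above, filled in for the endogenous var
f_stat = iv_res.diagnostics.get("First-stage F (training)")
if f_stat is not None and f_stat < 10:
    print("Weak instrument (first-stage F < 10): do not trust the 2SLS estimate")
# Missing-data guard: StatsPAI drops incomplete rows silently
expected_n = len(df.dropna(subset=["wage", "training", "distance", "age", "edu"]))
if iv_res.data_info["n_obs"] != expected_n:
    print("Estimation sample smaller than expected: revisit Step 0 cleaning")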