causal-inference-mixtape

This skill should be used when the user asks to "implement a DiD regression", "write a causal inference pipeline", "set up an event study", "implement instrumental variables", "run a regression discontinuity design", "build a synthetic control model", "implement propensity score

Price: free
Protocol: skill
Verified: no

What it does

Causal Inference: The Mixtape — Code Skill

Practitioner-oriented causal inference skill built from Scott Cunningham's Causal Inference: The Mixtape repository. Covers 10 identification strategies with ready-to-run code templates in Python, R, and Stata.


Methods Covered

| Method | Python | R | Stata | Reference |
| --- | --- | --- | --- | --- |
| OLS / Regression | statsmodels | estimatr | reg/reghdfe | references/method-patterns.md §1 |
| Difference-in-Differences | statsmodels + C() | lfe/fixest | xtreg/reghdfe | references/method-patterns.md §2 |
| Event Study (Dynamic DiD) | manual lead/lag | estimatr | reghdfe | references/method-patterns.md §3 |
| Staggered DiD / TWFE | statsmodels | bacondecomp | bacondecomp | references/method-patterns.md §4 |
| Regression Discontinuity | statsmodels polynomial | rdrobust | rdplot/rdrobust | references/method-patterns.md §5 |
| Instrumental Variables | linearmodels IV2SLS | AER/ivreg | ivregress 2sls | references/method-patterns.md §6 |
| Synthetic Control | rpy2 → R Synth | Synth + SCtools | synth | references/method-patterns.md §7 |
| Matching / PSM / IPW | manual logit + weights | MatchIt + Zelig | teffects/cem | references/method-patterns.md §8 |
| DAGs / Collider Bias | dagitty (conceptual) | dagitty/ggdag | | references/method-patterns.md §9 |
| Randomization Inference | permutation loop | ri2 | ritest | references/method-patterns.md §10 |
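
The "permutation loop" listed for randomization inference in Python can be sketched as follows. This is a minimal illustration, not the template from references/method-patterns.md: the function name `permutation_p_value`, the difference-in-means test statistic, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def permutation_p_value(y, d, n_perm=2000, seed=0):
    """Two-sided randomization-inference p-value for a difference in means."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=bool)
    observed = y[d].mean() - y[~d].mean()

    # Re-draw the assignment vector, keeping the number of treated units fixed
    draws = np.empty(n_perm)
    for i in range(n_perm):
        d_perm = rng.permutation(d)
        draws[i] = y[d_perm].mean() - y[~d_perm].mean()

    # Share of placebo effects at least as extreme as the observed one
    # (the +1 terms count the observed assignment itself)
    count = np.sum(np.abs(draws) >= abs(observed))
    p_value = (1 + count) / (1 + n_perm)
    return observed, p_value

# Simulated data (hypothetical): 100 treated, 100 control, true effect 1.0
rng = np.random.default_rng(42)
d = rng.permutation(np.repeat([True, False], 100))
y = rng.normal(size=200) + 1.0 * d

effect, p = permutation_p_value(y, d)
print(f"effect = {effect:.3f}, p = {p:.4f}")
```

The same loop works with any test statistic (a regression coefficient, a rank statistic) as long as the permutation mirrors how treatment was actually assigned, for example permuting within strata or at the cluster level.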

Core Workflow

Implement a Causal Method

  1. Identify the method from the table above
  2. Load the appropriate template from references/method-patterns.md
  3. Adapt variable names, fixed effects, and clustering to the user's data
  4. Add robustness checks (parallel trends for DiD, McCrary for RDD, first-stage F for IV)

Choose the Right Language

| Scenario | Recommendation |
| --- | --- |
| ML pipeline integration | Python (statsmodels + linearmodels) |
| Synthetic Control | R (Synth package) or Stata (synth); Python lacks a mature implementation |
| Bacon decomposition | R (bacondecomp) or Stata; no Python equivalent |
| Publication-ready tables | Stata (outreg2/esttab) or R (stargazer/modelsummary) |
| Coarsened Exact Matching | Stata (cem) or R (MatchIt); no Python equivalent |
| Quick prototyping | Python with statsmodels |

Cross-Language Equivalents

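| Task | Python | R | Stata |
| --- | --- | --- | --- |
| OLS with robust SE | `smf.ols().fit(cov_type='HC1')` | `lm_robust()` | `reg y x, robust` |
| Cluster SE | `fit(cov_type='cluster', cov_kwds={'groups': g})` | `felm(y ~ x \| 0 \| 0 \| g)` | `reg y x, cluster(g)` |
| Two-way FE | `C(id) + C(time)` in formula | `felm(y ~ x \| id + time)` | `reghdfe y x, absorb(id time)` |
| IV / 2SLS | `IV2SLS.from_formula('y ~ 1 + exog + [endog ~ inst]')` | `ivreg(y ~ exog + endog \| exog + inst)` | `ivregress 2sls y exog (endog = inst)` |
| DiD | `C(treat)*C(post)` | `treat:post` in formula | `did_multiplegt` or interaction |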

Key Python Patterns

DiD with Cluster-Robust SE

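import statsmodels.formula.api as smf

# `controls` is a placeholder for the user's covariates; the coefficient on the
# treated-by-post interaction is the DiD estimate
model = smf.ols('y ~ C(treated)*C(post) + controls', data=df)
# Cluster at the firm level; drop missing rows first so `groups` matches the fit
results = model.fit(cov_type='cluster', cov_kwds={'groups': df['firm_id']})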
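
To see the pattern recover a known effect, here is a small end-to-end sketch on simulated data. The panel dimensions, the true effect of 2.0, and the column names are assumptions chosen for illustration. It uses numeric 0/1 indicators rather than `C()`, so the interaction coefficient is simply named `treated:post`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated firm panel (hypothetical): 200 firms, 6 periods,
# the first 100 firms are treated from period 3 onward.
rng = np.random.default_rng(0)
n_firms, n_periods = 200, 6
df = pd.DataFrame({
    'firm_id': np.repeat(np.arange(n_firms), n_periods),
    'period': np.tile(np.arange(n_periods), n_firms),
})
df['treated'] = (df['firm_id'] < n_firms // 2).astype(int)
df['post'] = (df['period'] >= 3).astype(int)

# Firm-level shocks induce within-firm correlation, which is why we cluster
firm_effect = rng.normal(size=n_firms)[df['firm_id'].to_numpy()]
df['y'] = (1.0 + 0.5 * df['treated'] + 0.3 * df['post']
           + 2.0 * df['treated'] * df['post']   # true DiD effect = 2.0
           + firm_effect + rng.normal(size=len(df)))

results = smf.ols('y ~ treated * post', data=df).fit(
    cov_type='cluster', cov_kwds={'groups': df['firm_id']})

did = results.params['treated:post']
se = results.bse['treated:post']
print(f"DiD estimate = {did:.3f} (clustered SE = {se:.3f})")
```

With `C(treated)*C(post)` the estimate is the same, but the coefficient is labeled `C(treated)[T.1]:C(post)[T.1]`, which matters when pulling it out of `results.params` by name.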

Event Study (Lead/Lag)

# Create relative-time dummies for k = -4..4 (negative k is named rel_m<|k|>)
rel_cols = []
for k in range(-4, 5):
    col = f'rel_{k}' if k >= 0 else f'rel_m{abs(k)}'
    df[col] = (df['relative_time'] == k).astype(int)
    rel_cols.append(col)

# Drop t=-1 as the reference period; add unit and year fixed effects
formula = 'y ~ ' + ' + '.join(c for c in rel_cols if c != 'rel_m1') + ' + C(id) + C(year)'
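
The sketch below runs the lead/lag pattern end to end on a simulated panel and collects the coefficients one would plot. The data-generating process (100 units, treatment in 2005 for half of them, a constant effect of 1.5) is an assumption for illustration. Never-treated units have a missing `relative_time`, so all of their dummies are zero.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated panel (hypothetical): years 2001-2009, half the units treated in 2005
rng = np.random.default_rng(1)
n_units, years = 100, np.arange(2001, 2010)
df = pd.DataFrame({
    'id': np.repeat(np.arange(n_units), len(years)),
    'year': np.tile(years, n_units),
})
treated = df['id'] < n_units // 2
df['relative_time'] = np.where(treated, df['year'] - 2005, np.nan)
unit_fe = rng.normal(size=n_units)[df['id'].to_numpy()]
df['y'] = (unit_fe + 0.1 * (df['year'] - 2001)
           + 1.5 * (df['relative_time'] >= 0)      # true effect = 1.5 after treatment
           + rng.normal(scale=0.5, size=len(df)))

# Relative-time dummies, with t = -1 omitted as the reference period
rel_cols = []
for k in range(-4, 5):
    col = f'rel_{k}' if k >= 0 else f'rel_m{abs(k)}'
    df[col] = (df['relative_time'] == k).astype(int)
    rel_cols.append(col)
kept = [c for c in rel_cols if c != 'rel_m1']

formula = 'y ~ ' + ' + '.join(kept) + ' + C(id) + C(year)'
results = smf.ols(formula, data=df).fit(
    cov_type='cluster', cov_kwds={'groups': df['id']})

# Lead coefficients should sit near zero; lag coefficients near the true effect
event_study = pd.DataFrame({'coef': results.params[kept], 'se': results.bse[kept]})
print(event_study.round(3))
```

Lead coefficients close to zero are the visual evidence for parallel pre-trends. Plotting `coef` with `coef ± 1.96 * se` against relative time gives the usual event-study figure.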

IV / 2SLS

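from linearmodels.iv import IV2SLS

# Bracket syntax: [endogenous regressors ~ excluded instruments]
model = IV2SLS.from_formula('y ~ 1 + exog + [endog ~ instrument]', data=df)
# Cluster-robust covariance, clustered on `cluster_var`
results = model.fit(cov_type='clustered', clusters=df['cluster_var'])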
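
A first-stage strength check can be done with statsmodels alone, as sketched here on simulated data. The variable names follow the formula above; the data-generating process and the choice of HC1 standard errors are assumptions for illustration. The idea is to regress the endogenous variable on the instrument plus the exogenous controls and test that the instrument's coefficient is zero.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (hypothetical): `u` is an unobserved confounder that makes
# `endog` correlated with the error term in the outcome equation.
rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({
    'instrument': rng.normal(size=n),
    'exog': rng.normal(size=n),
})
u = rng.normal(size=n)
df['endog'] = 0.8 * df['instrument'] + 0.5 * df['exog'] + u + rng.normal(size=n)
df['y'] = 1.0 + 2.0 * df['endog'] + 0.3 * df['exog'] + u + rng.normal(size=n)

# First stage: endogenous regressor on the instrument and exogenous controls
first_stage = smf.ols('endog ~ 1 + exog + instrument', data=df).fit(cov_type='HC1')

# Robust F-test that the excluded instrument has no effect on `endog`
f_test = first_stage.f_test('instrument = 0')
first_stage_f = float(np.squeeze(f_test.fvalue))
print(f"first-stage F = {first_stage_f:.1f}")
```

With several excluded instruments, the restriction string lists all of them (for example `'z1 = 0, z2 = 0'`) so that the F-statistic tests them jointly.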

Robustness Check Patterns

| Method | Required Checks |
| --- | --- |
| DiD | Parallel trends (event study plot), placebo treatment dates |
| RDD | McCrary density test, bandwidth robustness (half/double IK optimal), polynomial robustness |
| IV | First-stage F > 10, exclusion restriction argument, over-identification test |
| Synthetic Control | Pre-treatment RMSPE, placebo distribution, leave-one-out |
| Matching | Covariate balance table, caliper sensitivity |
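
The bandwidth-robustness check for RDD can be sketched as a loop over half, baseline, and double bandwidths. This is an illustration under stated assumptions: the data are simulated with a true jump of 1.0 at a cutoff of zero, the baseline bandwidth `h0` is a hypothetical stand-in for an IK-optimal value (it is not computed here), and the estimator is a plain local linear regression with a uniform kernel rather than what `rdrobust` does.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data (hypothetical): running variable on [-1, 1], cutoff at 0
rng = np.random.default_rng(3)
n = 4000
df = pd.DataFrame({'running': rng.uniform(-1, 1, size=n)})
df['above'] = (df['running'] >= 0).astype(int)
df['y'] = 0.5 * df['running'] + 1.0 * df['above'] + rng.normal(scale=0.5, size=n)

def rd_estimate(data, h):
    """Local linear RD estimate using observations within h of the cutoff."""
    window = data[data['running'].abs() <= h]
    # Separate slopes on each side; the coefficient on `above` is the jump
    fit = smf.ols('y ~ above * running', data=window).fit(cov_type='HC1')
    return fit.params['above'], fit.bse['above']

h0 = 0.4  # hypothetical baseline bandwidth, standing in for an IK-optimal value
estimates = {h: rd_estimate(df, h) for h in (h0 / 2, h0, 2 * h0)}

for h, (jump, se) in estimates.items():
    print(f"h = {h:.2f}: jump = {jump:.3f} (SE = {se:.3f})")
```

An estimate that stays stable across the three bandwidths, with standard errors shrinking as the window widens, is the pattern this check looks for. Large swings suggest the result depends on functional-form assumptions away from the cutoff.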

Common Pitfalls

  1. TWFE with staggered treatment: standard two-way FE can be biased when treatment timing varies and effects differ across cohorts or over time, because already-treated units end up serving as controls. Use the Bacon decomposition to diagnose the problem and the Sun & Abraham or Callaway & Sant'Anna estimators to address it.
  2. Synthetic Control with many treated units: the Synth package handles one treated unit. For multiple treated units, use augmented synthetic control or a stacked approach.
  3. RDD without McCrary test: always test for manipulation of the running variable at the cutoff before estimating.
  4. IV weak instruments: always report the first-stage F-statistic. By the conventional rule of thumb, a value below 10 signals a weak instrument and potentially biased 2SLS estimates.
  5. Python Synth gap: no mature Python Synth package exists. Use rpy2 to call R's Synth from Python.

Additional Resources

Reference Files

  • references/method-patterns.md — Detailed code templates for all 10 methods with full examples
  • references/r-stata-comparison.md — Cross-language package comparison and method coverage gaps

Prompt Files

  • prompts/01-implement-method.md — Copy-paste prompt for implementing any causal method
  • prompts/02-robustness-checks.md — Copy-paste prompt for generating robustness check code

Capabilities

skill, source-brycewang-stanford, skill-10-jill0099-causal-inference-mixtape, topic-academic-research, topic-agent-skills, topic-ai-agent, topic-awesome-list, topic-communication, topic-copaper, topic-economics, topic-education, topic-empirical-research, topic-international-relations, topic-political-science, topic-psychology

Quality

0.70 / 1.00

Deterministic score 0.70 from registry signals: indexed on github topic:agent-skills · 598 github stars · SKILL.md body (5,391 chars)

Provenance

Indexed from: github
Enriched: 2026-05-02 12:52:55Z · deterministic:skill-github:v1 · v1
First seen: 2026-04-18
Last seen: 2026-05-02
