Skillquality 0.70
causal-inference-mixtape
This skill should be used when the user asks to "implement a DiD regression", "write a causal inference pipeline", "set up an event study", "implement instrumental variables", "run a regression discontinuity design", "build a synthetic control model", "implement propensity score
Price
free
Protocol
skill
Verified
no
What it does
Causal Inference: The Mixtape — Code Skill
Practitioner-oriented causal inference skill built from Scott Cunningham's Causal Inference: The Mixtape repository. Covers 10 identification strategies with ready-to-run code templates in Python, R, and Stata.
Methods Covered
| Method | Python | R | Stata | Reference |
|---|---|---|---|---|
| OLS / Regression | statsmodels | estimatr | reg/reghdfe | references/method-patterns.md §1 |
| Difference-in-Differences | statsmodels + C() | lfe/fixest | xtreg/reghdfe | references/method-patterns.md §2 |
| Event Study (Dynamic DiD) | manual lead/lag | estimatr | reghdfe | references/method-patterns.md §3 |
| Staggered DiD / TWFE | statsmodels | bacondecomp | bacondecomp | references/method-patterns.md §4 |
| Regression Discontinuity | statsmodels polynomial | rdrobust | rdplot/rdrobust | references/method-patterns.md §5 |
| Instrumental Variables | linearmodels IV2SLS | AER/ivreg | ivregress 2sls | references/method-patterns.md §6 |
| Synthetic Control | rpy2 → R Synth | Synth + SCtools | synth | references/method-patterns.md §7 |
| Matching / PSM / IPW | manual logit + weights | MatchIt + Zelig | teffects/cem | references/method-patterns.md §8 |
| DAGs / Collider Bias | dagitty (conceptual) | dagitty/ggdag | — | references/method-patterns.md §9 |
| Randomization Inference | permutation loop | ri2 | ritest | references/method-patterns.md §10 |
Core Workflow
Implement a Causal Method
- Identify the method from the table above
- Load the appropriate template from
references/method-patterns.md - Adapt variable names, fixed effects, and clustering to the user's data
- Add robustness checks (parallel trends for DiD, McCrary for RDD, first-stage F for IV)
Choose the Right Language
| Scenario | Recommendation |
|---|---|
| ML pipeline integration | Python (statsmodels + linearmodels) |
| Synthetic Control | R (Synth package) or Stata (synth) — Python lacks mature implementation |
| Bacon decomposition | R (bacondecomp) or Stata — no Python equivalent |
| Publication-ready tables | Stata (outreg2/esttab) or R (stargazer/modelsummary) |
| Coarsened Exact Matching | Stata (cem) or R (MatchIt) — no Python equivalent |
| Quick prototyping | Python with statsmodels |
Cross-Language Equivalents
| Task | Python | R | Stata |
|---|---|---|---|
| OLS with robust SE | smf.ols().fit(cov_type='HC1') | lm_robust() | reg y x, robust |
| Cluster SE | fit(cov_type='cluster', cov_kwds={'groups': g}) | `felm(y ~ x | 0 |
| Two-way FE | C(id) + C(time) in formula | `felm(y ~ x | id + time)` |
| IV / 2SLS | IV2SLS.from_formula('y ~ 1 + exog + [endog ~ inst]') | `ivreg(y ~ exog | inst)` |
| DiD | C(treat)*C(post) | treat:post in formula | did_multiplegt or interaction |
Key Python Patterns
DiD with Cluster-Robust SE
import statsmodels.formula.api as smf
model = smf.ols('y ~ C(treated)*C(post) + controls', data=df)
results = model.fit(cov_type='cluster', cov_kwds={'groups': df['firm_id']})
Event Study (Lead/Lag)
# Create relative time dummies
for k in range(-4, 5):
col = f'rel_{k}' if k >= 0 else f'rel_m{abs(k)}'
df[col] = (df['relative_time'] == k).astype(int)
# Drop t=-1 as reference
formula = 'y ~ ' + ' + '.join([c for c in rel_cols if c != 'rel_m1']) + ' + C(id) + C(year)'
IV / 2SLS
from linearmodels.iv import IV2SLS
model = IV2SLS.from_formula('y ~ 1 + exog + [endog ~ instrument]', data=df)
results = model.fit(cov_type='clustered', clusters=df['cluster_var'])
Robustness Check Patterns
| Method | Required Checks |
|---|---|
| DiD | Parallel trends (event study plot), placebo treatment dates |
| RDD | McCrary density test, bandwidth robustness (half/double IK optimal), polynomial robustness |
| IV | First-stage F > 10, exclusion restriction argument, over-identification test |
| Synthetic Control | Pre-treatment RMSPE, placebo distribution, leave-one-out |
| Matching | Covariate balance table, caliper sensitivity |
Common Pitfalls
- TWFE with staggered treatment — standard two-way FE is biased when treatment timing varies. Use Bacon decomposition or Sun & Abraham / Callaway & Sant'Anna estimators.
- Synthetic Control with many treated units — the Synth package handles one treated unit. For multiple, use augmented synthetic control or stacked approach.
- RDD without McCrary test — always test for manipulation at the cutoff before estimating.
- IV weak instruments — report first-stage F-statistic. Below 10 indicates weak instrument bias.
- Python Synth gap — no mature Python Synth package exists. Use
rpy2to call R'sSynthfrom Python.
Additional Resources
Reference Files
references/method-patterns.md— Detailed code templates for all 10 methods with full examplesreferences/r-stata-comparison.md— Cross-language package comparison and method coverage gaps
Prompt Files
prompts/01-implement-method.md— Copy-paste prompt for implementing any causal methodprompts/02-robustness-checks.md— Copy-paste prompt for generating robustness check code
Capabilities
skillsource-brycewang-stanfordskill-10-jill0099-causal-inference-mixtapetopic-academic-researchtopic-agent-skillstopic-ai-agenttopic-awesome-listtopic-communicationtopic-copapertopic-economicstopic-educationtopic-empirical-researchtopic-international-relationstopic-political-sciencetopic-psychology
Install
Installnpx skills add brycewang-stanford/Awesome-Agent-Skills-for-Empirical-Research
Transportskills-sh
Protocolskill
Quality
0.70/ 1.00
deterministic score 0.70 from registry signals: · indexed on github topic:agent-skills · 598 github stars · SKILL.md body (5,391 chars)
Provenance
Indexed fromgithub
Enriched2026-05-02 12:52:55Z · deterministic:skill-github:v1 · v1
First seen2026-04-18
Last seen2026-05-02