{"id":"368583f4-85b3-4ec1-87ca-aa457e548543","shortId":"3p4jXE","kind":"skill","title":"replicate-study","tagline":"Replicate an existing cohort study's methodology on a different database. Extracts study design from a source paper, maps variables to the target DB via harmonization table, generates analysis code, and produces a replication difference report.","description":"# Replicate Study Skill\n\nYou are assisting a medical researcher in replicating an existing published study's methodology\non a different database. This is a common research strategy: take a validated methodology from\nPaper A (e.g., NHIS cohort study) and apply it to Database B (e.g., KNHANES, NHANES, or another\ncohort) to produce a new paper with the same analytical rigor.\n\n## When to Use\n\n- Researcher has a published paper they want to replicate on their own data\n- Swapping exposure/outcome variables within the same DB\n- Cross-national replication (e.g., Korean study → US data, or vice versa)\n- Extending a single-institution study to a national cohort\n\n## Inputs\n\n1. **Source paper**: PDF, DOI, or markdown of the paper to replicate\n2. **Target database path**: CSV/SAS data file(s) to use\n3. **Harmonization table** (optional): CSV mapping source → target variables\n   - Default: `${SKILL_DIR}/references/harmonization_knhanes_nhanes.csv` (if KNHANES↔NHANES)\n\n## Reference Files\n\n- `${SKILL_DIR}/references/methodology_extraction_template.md` — checklist for extracting study design\n- `${SKILL_DIR}/references/harmonization_knhanes_nhanes.csv` — KNHANES↔NHANES variable mapping (67 rows)\n- `${SKILL_DIR}/references/harmonization_3country.csv` — KNHANES+NHANES+CHNS 3-country mapping (45 rows, if available)\n- Upstream templates (read on demand):\n  - `medsci-skills/skills/write-paper/references/paper_types/nhis_cohort.md`\n  - `medsci-skills/skills/write-paper/references/paper_types/cross_national.md`\n  - `medsci-skills/skills/analyze-stats/references/analysis_guides/survey_weighted.md`\n  - `medsci-skills/skills/analyze-stats/references/analysis_guides/propensity_score.md`\n\n## Workflow\n\n### Phase 1: Source Paper Analysis\n\n1. Read the source paper (PDF → text, or markdown).\n2. Extract methodology using the extraction template:\n   - **Study design**: cohort / cross-sectional / case-control\n   - **Database**: name, country, years, N\n   - **Population**: inclusion/exclusion criteria, age range\n   - **Exposure**: variable name, definition, coding\n   - **Outcome**: variable name, definition, coding\n   - **Covariates**: full list with definitions\n   - **Statistical methods**: regression type, adjustment model, subgroup analyses\n   - **Survey design**: weights, strata, PSU (if applicable)\n   - **Sensitivity analyses**: list all\n3. Output: structured extraction summary for user review.\n\n### Phase 2: Variable Mapping\n\n1. Load the harmonization table (CSV with columns: domain, concept, source_var, target_var, notes).\n2. For each extracted variable (exposure, outcome, covariates):\n   - Find the matching row in the harmonization table\n   - Flag: DIRECT_MATCH / RECODE_NEEDED / NOT_AVAILABLE / PROXY_AVAILABLE\n3. Generate a **mapping report**:\n   - Green: directly available (no recoding)\n   - Yellow: available but needs recoding (document transformation)\n   - Red: not available in target DB (propose proxy or exclusion)\n4. Output: variable mapping table for user approval.\n\n### Phase 3: Code Generation\n\n1. Generate analysis code (Python with `pandas` + R via `subprocess` for survey-weighted):\n   a. **Data loading & cleaning**: read target DB, apply inclusion/exclusion\n   b. **Variable derivation**: recode variables per mapping table\n   c. **Survey design setup**: define svydesign object (strata, PSU, weights)\n   d. **Table 1**: demographics by exposure group (weighted)\n   e. **Main analysis**: replicate the primary model (logistic/Cox/linear regression)\n   f. **Subgroup analyses**: if specified in source paper\n   g. **Sensitivity analyses**: replicate all listed in source paper\n2. Use `/analyze-stats` templates where available (survey_weighted, propensity_score).\n3. All code must be self-contained and reproducible.\n\n### Phase 4: Difference Report\n\nGenerate a structured difference report documenting:\n\n| Section | Content |\n|---------|---------|\n| Study Design | Same / Modified (explain) |\n| Database | Source DB → Target DB (N, years, country) |\n| Population | Inclusion/exclusion differences |\n| Variable Mapping | Full mapping table with match status |\n| Unavailable Variables | What's missing and how handled |\n| Methodological Differences | Any forced changes (e.g., BMI cutoffs, LDL calculation) |\n| Expected Differences | Why results may differ (population, measurement, cultural) |\n\nSave as `replication_report.md` in the working directory.\n\n### Phase 5: Validation Checklist\n\nBefore reporting completion, verify:\n\n- [ ] All source paper covariates accounted for (mapped, proxied, or documented as missing)\n- [ ] Survey weights correctly applied (NEVER analyze unweighted if source used weights)\n- [ ] Obesity/BMI cutoffs match target population standards (Asian vs WHO)\n- [ ] Fasting requirements matched (fasting glucose, lipids)\n- [ ] Age restrictions applied correctly\n- [ ] Code runs without errors on target data\n- [ ] Output tables match source paper structure\n\n## Critical Rules\n\n1. **Never pool data across surveys**. Analyze each country's data with its own survey design.\n2. **Document every deviation** from the source methodology in the difference report.\n3. **Asian BMI cutoffs** (≥25 for obesity) when analyzing Korean data, even if source used WHO (≥30).\n4. **LDL calculation**: note if source used direct measurement vs Friedewald.\n5. **Weighted analysis is mandatory** for KNHANES/NHANES — never run unweighted models.\n6. **IRB**: note that KNHANES/NHANES are de-identified public data (IRB exempt or waived).\n7. **Outdated source definitions**: if the source paper used a pre-2023 definition that has since been superseded (e.g., NAFLD → MASLD 2023, CKD-EPI 2009 → 2021 race-free), call `/define-variables` to cross-check whether to mirror the legacy definition (pure replication) or upgrade to current (extension). Document the choice explicitly in the difference report.\n\n## Output Files\n\n```\n{working_dir}/\n├── replication_report.md     — Structured difference report\n├── variable_mapping.csv      — Variable mapping table with match status\n├── analysis_code.py          — Main analysis script (Python + R calls)\n├── analysis_code.R           — R script for survey-weighted analysis\n└── results/\n    ├── table1.csv            — Demographics table\n    ├── main_results.csv      — Primary analysis results\n    └── subgroup_results.csv  — Subgroup analysis results (if applicable)\n```\n\n## Example Invocation\n\n```\n/replicate-study\n\nSource paper: Joo 2026 (Psychiatry Research) — depression/diabetes cross-national\nTarget DB: /path/to/knhanes/HN18.csv\nHarmonization: /path/to/harmonization_knhanes_nhanes.csv\n```\n\n## Anti-Hallucination\n\n- **Never fabricate variable names, dataset column names, or variable codings.** If a variable mapping is uncertain, output `[VERIFY: variable_name]` and ask the user to confirm against the data dictionary.\n- **Never fabricate statistical results** — no invented p-values, effect sizes, confidence intervals, or sample sizes. All numbers must come from executed code output.\n- **Never generate references from memory.** Use `/search-lit` for all citations.\n- If a function, package, or API does not exist or you are unsure, say so explicitly rather than guessing.","tags":["replicate","study","medsci","skills","aperivue","agent-skills","biostatistics","claude-code","claude-skills","clinical-research","diagnostic-accuracy","irb-protocol"],"capabilities":["skill","source-aperivue","skill-replicate-study","topic-agent-skills","topic-biostatistics","topic-claude-code","topic-claude-skills","topic-clinical-research","topic-diagnostic-accuracy","topic-irb-protocol","topic-literature-review","topic-manuscript","topic-medical-ai","topic-medical-research","topic-meta-analysis"],"categories":["medsci-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/Aperivue/medsci-skills/replicate-study","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add Aperivue/medsci-skills","source_repo":"https://github.com/Aperivue/medsci-skills","install_from":"skills.sh"}},"qualityScore":"0.499","qualityRationale":"deterministic score 0.50 from registry signals: · indexed on github topic:agent-skills · 98 github stars · SKILL.md body (7,163 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T18:56:31.387Z","embedding":null,"createdAt":"2026-05-13T12:57:46.307Z","updatedAt":"2026-05-18T18:56:31.387Z","lastSeenAt":"2026-05-18T18:56:31.387Z","tsv":"'-2023':726 '/analyze-stats':480 '/define-variables':746 '/path/to/harmonization_knhanes_nhanes.csv':834 '/path/to/knhanes/hn18.csv':832 '/references/harmonization_3country.csv':205 '/references/harmonization_knhanes_nhanes.csv':180,196 '/references/methodology_extraction_template.md':188 '/replicate-study':819 '/search-lit':898 '/skills/analyze-stats/references/analysis_guides/propensity_score.md':236 '/skills/analyze-stats/references/analysis_guides/survey_weighted.md':232 '/skills/write-paper/references/paper_types/cross_national.md':228 '/skills/write-paper/references/paper_types/nhis_cohort.md':224 '1':146,239,243,324,403,446,633 '2':158,252,321,339,478,649 '2009':740 '2021':741 '2023':736 '2026':823 '25':665 '3':168,209,312,364,400,488,661 '30':677 '4':391,499,678 '45':212 '5':569,689 '6':700 '67':201 '7':715 'account':580 'across':637 'adjust':297 'age':276,614 'analys':300,309,463,471 'analysi':32,242,405,454,691,789,794,802,809,813 'analysis_code.py':787 'analyt':98 'analyz':593,639,669 'anoth':88 'anti':836 'anti-hallucin':835 'api':907 'appli':79,424,591,616 'applic':307,816 'approv':398 'asian':605,662 'ask':859 'assist':45 'avail':215,361,363,371,375,383,483 'b':83,426 'bmi':548,663 'c':434 'calcul':551,680 'call':745,793 'case':266 'case-control':265 'chang':546 'check':750 'checklist':189,571 'chns':208 'choic':766 'citat':901 'ckd':738 'ckd-epi':737 'clean':420 'code':33,282,287,401,406,490,618,847,890 'code.r':795 'cohort':7,76,89,144,261 'column':331,843 'come':887 'common':64 'complet':574 'concept':333 'confid':879 'confirm':863 'contain':495 'content':509 'control':267 'correct':590,617 'countri':210,270,522,641 'covari':288,346,579 'criteria':275 'critic':631 'cross':124,263,749,828 'cross-check':748 'cross-nat':123,827 'cross-sect':262 'csv':172,329 'csv/sas':162 'cultur':560 'current':762 'cutoff':549,600,664 'd':444 'data':115,131,163,418,624,636,643,671,710,866 'databas':14,60,82,160,268,515 'dataset':842 'db':27,122,386,423,517,519,831 'de':707 'de-identifi':706 'default':177 'defin':438 'definit':281,286,292,718,727,756 'demand':220 'demograph':447,805 'depression/diabetes':826 'deriv':428 'design':17,193,260,302,436,511,648 'deviat':652 'dictionari':867 'differ':13,38,59,500,505,525,543,553,557,659,770,778 'dir':179,187,195,204,775 'direct':356,370,685 'directori':567 'document':379,507,585,650,764 'doi':150 'domain':332 'e':452 'e.g':74,84,127,547,733 'effect':877 'epi':739 'error':621 'even':672 'everi':651 'exampl':817 'exclus':390 'execut':889 'exempt':712 'exist':6,52,910 'expect':552 'explain':514 'explicit':767,917 'exposur':278,344,449 'exposure/outcome':117 'extend':135 'extens':763 'extract':15,191,253,257,315,342 'f':461 'fabric':839,869 'fast':608,611 'file':164,185,773 'find':347 'flag':355 'forc':545 'free':744 'friedewald':688 'full':289,528 'function':904 'g':469 'generat':31,365,402,404,502,893 'glucos':612 'green':369 'group':450 'guess':920 'hallucin':837 'handl':541 'harmon':29,169,327,353,833 'identifi':708 'inclusion/exclusion':274,425,524 'input':145 'institut':139 'interv':880 'invent':873 'invoc':818 'irb':701,711 'joo':822 'knhane':85,182,197,206 'knhanes/nhanes':695,704 'korean':128,670 'ldl':550,679 'legaci':755 'lipid':613 'list':290,310,474 'load':325,419 'logistic/cox/linear':459 'main':453,788 'main_results.csv':807 'mandatori':693 'map':22,173,200,211,323,367,394,432,527,529,582,782,851 'markdown':152,251 'masld':735 'match':349,357,532,601,610,627,785 'may':556 'measur':559,686 'medic':47 'medsci':222,226,230,234 'medsci-skil':221,225,229,233 'memori':896 'method':294 'methodolog':10,56,70,254,542,656 'mirror':753 'miss':538,587 'model':298,458,699 'modifi':513 'must':491,886 'n':272,520 'nafld':734 'name':269,280,285,841,844,857 'nation':125,143,829 'need':359,377 'never':592,634,696,838,868,892 'new':93 'nhane':86,183,198,207 'nhis':75 'note':338,681,702 'number':885 'obes':667 'obesity/bmi':599 'object':440 'option':171 'outcom':283,345 'outdat':716 'output':313,392,625,772,854,891 'p':875 'p-valu':874 'packag':905 'panda':409 'paper':21,72,94,107,148,155,241,247,468,477,578,629,722,821 'path':161 'pdf':149,248 'per':431 'phase':238,320,399,498,568 'pool':635 'popul':273,523,558,603 'pre':725 'primari':457,808 'produc':35,91 'propens':486 'propos':387 'proxi':362,388,583 'psu':305,442 'psychiatri':824 'public':709 'publish':53,106 'pure':757 'python':407,791 'r':410,792,796 'race':743 'race-fre':742 'rang':277 'rather':918 'read':218,244,421 'recod':358,373,378,429 'red':381 'refer':184,894 'regress':295,460 'replic':2,4,37,40,50,111,126,157,455,472,758 'replicate-studi':1 'replication_report.md':563,776 'report':39,368,501,506,573,660,771,779 'reproduc':497 'requir':609 'research':48,65,103,825 'restrict':615 'result':555,803,810,814,871 'review':319 'rigor':99 'row':202,213,350 'rule':632 'run':619,697 'sampl':882 'save':561 'say':915 'score':487 'script':790,797 'section':264,508 'self':494 'self-contain':493 'sensit':308,470 'setup':437 'sinc':730 'singl':138 'single-institut':137 'size':878,883 'skill':42,178,186,194,203,223,227,231,235 'skill-replicate-study' 'sourc':20,147,174,240,246,334,467,476,516,577,596,628,655,674,683,717,721,820 'source-aperivue' 'specifi':465 'standard':604 'statist':293,870 'status':533,786 'strata':304,441 'strategi':66 'structur':314,504,630,777 'studi':3,8,16,41,54,77,129,140,192,259,510 'subgroup':299,462,812 'subgroup_results.csv':811 'subprocess':412 'summari':316 'supersed':732 'survey':301,415,435,484,588,638,647,800 'survey-weight':414,799 'svydesign':439 'swap':116 'tabl':30,170,328,354,395,433,445,530,626,783,806 'table1.csv':804 'take':67 'target':26,159,175,336,385,422,518,602,623,830 'templat':217,258,481 'text':249 'topic-agent-skills' 'topic-biostatistics' 'topic-claude-code' 'topic-claude-skills' 'topic-clinical-research' 'topic-diagnostic-accuracy' 'topic-irb-protocol' 'topic-literature-review' 'topic-manuscript' 'topic-medical-ai' 'topic-medical-research' 'topic-meta-analysis' 'transform':380 'type':296 'unavail':534 'uncertain':853 'unsur':914 'unweight':594,698 'upgrad':760 'upstream':216 'us':130 'use':102,167,255,479,597,675,684,723,897 'user':318,397,861 'valid':69,570 'valu':876 'var':335,337 'variabl':23,118,176,199,279,284,322,343,393,427,430,526,535,781,840,846,850,856 'variable_mapping.csv':780 'verifi':575,855 'versa':134 'via':28,411 'vice':133 'vs':606,687 'waiv':714 'want':109 'weight':303,416,443,451,485,589,598,690,801 'whether':751 'within':119 'without':620 'work':566,774 'workflow':237 'year':271,521 'yellow':374","prices":[{"id":"77884232-131b-4cb2-a6a5-08329ceab158","listingId":"368583f4-85b3-4ec1-87ca-aa457e548543","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"Aperivue","category":"medsci-skills","install_from":"skills.sh"},"createdAt":"2026-05-13T12:57:46.307Z"}],"sources":[{"listingId":"368583f4-85b3-4ec1-87ca-aa457e548543","source":"github","sourceId":"Aperivue/medsci-skills/replicate-study","sourceUrl":"https://github.com/Aperivue/medsci-skills/tree/main/skills/replicate-study","isPrimary":false,"firstSeenAt":"2026-05-13T12:57:46.307Z","lastSeenAt":"2026-05-18T18:56:31.387Z"}],"details":{"listingId":"368583f4-85b3-4ec1-87ca-aa457e548543","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"Aperivue","slug":"replicate-study","github":{"repo":"Aperivue/medsci-skills","stars":98,"topics":["agent-skills","biostatistics","claude-code","claude-skills","clinical-research","diagnostic-accuracy","irb-protocol","literature-review","manuscript","medical-ai","medical-research","meta-analysis","physician-researcher","prisma","pubmed","radiology","reporting-guidelines","strobe","systematic-review","tripod-ai"],"license":"other","html_url":"https://github.com/Aperivue/medsci-skills","pushed_at":"2026-05-17T20:50:52Z","description":"Claude Code skills for medical research — literature search, reporting guidelines, statistical analysis, publication figures. Built by a physician-researcher, tested on real publications. MIT licensed.","skill_md_sha":"274d93a7d815683fbb26737615b821dfe56f4e4e","skill_md_path":"skills/replicate-study/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/Aperivue/medsci-skills/tree/main/skills/replicate-study"},"layout":"multi","source":"github","category":"medsci-skills","frontmatter":{"name":"replicate-study","description":"Replicate an existing cohort study's methodology on a different database. Extracts study design from a source paper, maps variables to the target DB via harmonization table, generates analysis code, and produces a replication difference report."},"skills_sh_url":"https://skills.sh/Aperivue/medsci-skills/replicate-study"},"updatedAt":"2026-05-18T18:56:31.387Z"}}