{"id":"d1403561-0084-45c7-9a38-0c6ad7ddbec2","shortId":"u6ermM","kind":"skill","title":"protein-qc","tagline":"Quality control metrics and filtering thresholds for protein design. Use this skill when: (1) Evaluating design quality for binding, expression, or structure, (2) Setting filtering thresholds for pLDDT, ipTM, PAE, (3) Checking sequence liabilities (cysteines, deamidation, polybas","description":"# Protein Design Quality Control\n\n## Critical Limitation\n\n**Individual metrics are weak predictors of binding**. Research shows:\n- Individual structure-based metrics (ipTM, PAE) reach a ROC AUC of only ~0.64-0.66, slightly better than random (0.5)\n- Metrics are **pre-screening filters**, not affinity predictors\n- **Composite scoring is essential** for meaningful ranking\n\nThese thresholds filter out poor designs but do NOT predict binding affinity.\n\n## QC Organization\n\nQC is organized by **purpose** and **level**:\n\n| Purpose | What it assesses | Key metrics |\n|---------|------------------|-------------|\n| **Binding** | Interface quality, binding geometry | ipTM, PAE, SC, dG, dSASA |\n| **Expression** | Manufacturability, solubility | Instability, GRAVY, pI, cysteines |\n| **Structural** | Fold confidence, consistency | pLDDT, pTM, scRMSD |\n\nEach category has two levels:\n- **Metric-level**: Calculated values compared against thresholds (e.g., pLDDT > 0.85)\n- **Design-level**: Pattern/motif detection (e.g., odd cysteine counts, NG deamidation sites)\n\n---\n\n## Quick Reference: All Thresholds\n\npLDDT, pTM, and ipTM are on a 0-1 scale (AlphaFold2 reports pLDDT as 0-100, so divide by 100 first). PAE and scRMSD are in Å; interface_dG is in Rosetta energy units (REU).\n\n| Category | Metric | Standard | Stringent | Source |\n|----------|--------|----------|-----------|--------|\n| **Structural** | pLDDT | > 0.85 | > 0.90 | AF2/Chai/Boltz |\n| | pTM | > 0.70 | > 0.80 | AF2/Chai/Boltz |\n| | scRMSD | < 2.0 Å | < 1.5 Å | Design vs. prediction |\n| **Binding** | ipTM | > 0.50 | > 0.60 | AF2/Chai/Boltz |\n| | PAE_interaction | < 12 Å | < 10 Å | AF2/Chai/Boltz |\n| | Shape complementarity (SC) | > 0.50 | > 0.60 | PyRosetta |\n| | interface_dG | < -10 REU | < -15 REU | PyRosetta |\n| **Expression** | Instability | < 40 | < 30 | 
BioPython |\n| | GRAVY | < 0.4 | < 0.2 | BioPython |\n| | ESM2 PLL (normalized) | > 0.0 | > 0.2 | ESM2 |\n\n### Design-Level Checks (Expression)\n\n| Pattern | Risk | Action |\n|---------|------|--------|\n| Odd cysteine count | Unpaired disulfides | Redesign |\n| NG/NS/NT motifs | Deamidation | Flag/avoid |\n| >= 3 consecutive K/R | Proteolysis | Flag |\n| >= 6-residue hydrophobic run | Aggregation | Redesign |\n\nSee: references/binding-qc.md, references/expression-qc.md, references/structural-qc.md\n\n---\n\n## Sequential Filtering Pipeline\n\n```python\nimport pandas as pd\n\n# Standard thresholds from the Quick Reference table (pLDDT/ipTM on a 0-1 scale, PAE in Å)\ndesigns = pd.read_csv('designs.csv')\n\n# Stage 1: Structural confidence\ndesigns = designs[designs['pLDDT'] > 0.85]\n\n# Stage 2: Self-consistency\ndesigns = designs[designs['scRMSD'] < 2.0]\n\n# Stage 3: Binding quality\ndesigns = designs[(designs['ipTM'] > 0.5) & (designs['PAE_interaction'] < 12)]\n\n# Stage 4: Sequence plausibility\ndesigns = designs[designs['esm2_pll_normalized'] > 0.0]\n\n# Stage 5: Expression checks (design-level)\ndesigns = designs[designs['cysteine_count'] % 2 == 0]  # Even count: no unpaired cysteines\n# .copy() detaches the filtered frame so later column assignments do not warn\ndesigns = designs[designs['instability_index'] < 40].copy()\n\nprint(f\"{len(designs)} designs pass all stages\")\n```\n\n---\n\n## Composite Scoring (Required for Ranking)\n\nIndividual metrics are too weak to rank designs on their own. 
Combine them into a weighted composite score (pLDDT, ipTM, and shape complementarity on a 0-1 scale):\n\n```python\ndef composite_score(row):\n    return (\n        0.30 * row['pLDDT'] +\n        0.20 * row['ipTM'] +\n        0.20 * max(0.0, 1 - row['PAE_interaction'] / 20) +  # PAE >= 20 Å contributes 0\n        0.15 * row['shape_complementarity'] +\n        0.15 * row['esm2_pll_normalized']\n    )\n\n# designs: the filtered DataFrame from the Sequential Filtering Pipeline\ndesigns['score'] = designs.apply(composite_score, axis=1)\ntop_designs = designs.nlargest(100, 'score')\n```\n\nFor advanced composite scoring, see references/composite-scoring.md.\n\n---\n\n## Tool-Specific Filtering\n\n### BindCraft Filter Levels\n\n| Level | Use Case | Stringency |\n|-------|----------|------------|\n| Default | Standard design | Most stringent |\n| Relaxed | Need more designs | Higher failure rate |\n| Peptide | Designs < 30 AA | ~5-10x lower success rate |\n\n### BoltzGen Filtering\n\n```bash\nboltzgen run ... \\\n  --budget 60 \\\n  --alpha 0.01 \\\n  --filter_biased true \\\n  --refolding_rmsd_threshold 2.0 \\\n  --additional_filters 'ALA_fraction<0.3'\n```\n\n`--alpha` trades off quality against diversity in the final ranking:\n\n- `alpha=0.0`: Quality-only ranking\n- `alpha=0.01`: Default (slight diversity)\n- `alpha=1.0`: Diversity-only\n\n---\n\n## Design-Level Severity Scoring\n\nFor pattern-based checks, map each design's severity score to an action:\n\n| Severity Level | Score | Action |\n|----------------|-------|--------|\n| LOW | 0-15 | Proceed |\n| MODERATE | 16-35 | Review flagged issues |\n| HIGH | 36-60 | Redesign recommended |\n| CRITICAL | 61+ | Redesign required |\n\n---\n\n## Experimental Correlation\n\n| Metric | ROC AUC | Use |\n|--------|---------|-----|\n| ipTM | ~0.64 | Pre-screening |\n| PAE | ~0.65 | Pre-screening |\n| ESM2 PLL | ~0.72 | Best single metric |\n| Composite | ~0.75+ | **Always use** |\n\n**Key insight**: Metrics work as **filters** (eliminating failures), not **predictors** (ranking successes).\n\n---\n\n## Campaign Health Assessment\n\nJudge the overall campaign by the fraction of designs that pass all filters:\n\n| Pass Rate | Status | Interpretation |\n|-----------|--------|----------------|\n| > 15% | Excellent | Above average, proceed |\n| 10-15% | Good | 
Normal, proceed |\n| 5-10% | Marginal | Below average, review issues |\n| < 5% | Poor | Significant problems, diagnose |\n\n---\n\n## Failure Recovery Trees\n\n### Too Few Pass pLDDT Filter (< 5% with pLDDT > 0.85)\n\n```\nLow pLDDT across campaign\n├── Check scRMSD distribution\n│   ├── High scRMSD (>2.5Å): Backbone issue\n│   │   └── Fix: Regenerate backbones with lower noise_scale (0.5-0.8)\n│   └── Low scRMSD but low pLDDT: Disordered regions\n│       └── Fix: Check design length, simplify topology\n├── Try more sequences per backbone\n│   └── modal run modal_proteinmpnn.py --num-seq-per-target 32 --sampling-temp 0.1\n├── Use SolubleMPNN instead of ProteinMPNN\n│   └── Better for expression-optimized sequences\n└── Consider different design tool\n    └── BindCraft (integrated design) may work better\n```\n\n### Too Few Pass ipTM Filter (< 5% with ipTM > 0.5)\n\n```\nLow ipTM across campaign\n├── Review hotspot selection\n│   ├── Are hotspots surface-exposed? (SASA > 20Å²)\n│   ├── Are hotspots conserved? (check MSA)\n│   └── Try 3-6 different hotspot combinations\n├── Increase binder length (more contact area)\n│   └── Try 80-100 AA instead of 60-80 AA\n├── Check interface geometry\n│   ├── Is target flat? → Try helical binders\n│   └── Is target concave? 
→ Try smaller binders\n└── Try all-atom design tool\n    └── BoltzGen (all-atom, better packing)\n```\n\n### High scRMSD (> 50% with scRMSD > 2.0Å)\n\n```\nSequences don't specify intended structure\n├── ProteinMPNN issue\n│   ├── Lower temperature: --sampling-temp 0.1\n│   ├── Increase sequences: --num-seq-per-target 32\n│   └── Check fixed_positions aren't over-constraining\n├── Backbone geometry issue\n│   ├── Backbones may be unusual/strained\n│   ├── Regenerate with lower noise_scale (0.5-0.8)\n│   └── Reduce diffuser.T to 30-40\n└── Try different sequence design\n    └── ColabDesign (AF2 gradient-based) may work better\n```\n\n### Everything Passes But No Experimental Hits\n\n```\nIn silico metrics don't predict affinity\n├── Generate MORE designs (10x current)\n│   └── Computational metrics have high false positive rate\n├── Increase diversity\n│   ├── Higher ProteinMPNN temperature (0.2-0.3)\n│   ├── Different backbone topologies\n│   └── Different hotspot combinations\n├── Try different design approach\n│   ├── BindCraft (different algorithm)\n│   ├── ColabDesign (AF2 hallucination)\n│   └── BoltzGen (all-atom diffusion)\n└── Check if target is druggable\n    └── Some targets are inherently difficult\n```\n\n### Too Many Designs Pass (> 50%)\n\n```\nSuspiciously high pass rate\n├── Check if thresholds are too lenient\n│   └── Use stringent thresholds: pLDDT > 0.90, ipTM > 0.60\n├── Verify prediction quality\n│   ├── Are predictions actually running? 
Check output files\n│   └── Are complexes being predicted, not just monomers?\n├── Check for data issues\n│   ├── Same sequence being predicted multiple times?\n│   └── Wrong FASTA format (missing chain separator)?\n└── Apply diversity filter\n    └── Cluster at 70% identity, take top per cluster\n```\n\n---\n\n## Diagnostic Commands\n\n### Quick Campaign Assessment\n\n```python\nimport pandas as pd\n\ndf = pd.read_csv('designs.csv')\n\n# Boolean masks for each standard filter\nplddt_ok = df['pLDDT'] > 0.85\niptm_ok = df['ipTM'] > 0.50\nscrmsd_ok = df['scRMSD'] < 2.0\n\n# Pass rate for each filter independently, then for all combined\nprint(f\"Total designs: {len(df)}\")\nprint(f\"pLDDT > 0.85: {plddt_ok.mean():.1%}\")\nprint(f\"ipTM > 0.50: {iptm_ok.mean():.1%}\")\nprint(f\"scRMSD < 2.0: {scrmsd_ok.mean():.1%}\")\nprint(f\"All filters: {(plddt_ok & iptm_ok & scrmsd_ok).mean():.1%}\")\n\n# Identify the top issue (cutoffs match the Failure Recovery Trees above)\nif plddt_ok.mean() < 0.05:\n    print(\"ISSUE: Low pLDDT - check backbone or sequence quality\")\nelif iptm_ok.mean() < 0.05:\n    print(\"ISSUE: Low ipTM - check hotspots or interface geometry\")\nelif scrmsd_ok.mean() < 0.5:\n    print(\"ISSUE: High scRMSD - sequences don't specify backbone\")\nelse:\n    print(\"No single dominant failure mode - see Campaign Health Assessment\")\n```\n\n---","tags":["protein","design","skills","adaptyvbio","agent-skills","claude-code","protein-design","protein-engineering"],"capabilities":["skill","source-adaptyvbio","skill-protein-qc","topic-agent-skills","topic-claude-code","topic-protein-design","topic-protein-engineering"],"categories":["protein-design-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/adaptyvbio/protein-design-skills/protein-qc","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add adaptyvbio/protein-design-skills","source_repo":"https://github.com/adaptyvbio/protein-design-skills","install_from":"skills.sh"}},"qualityScore":"0.513","qualityRationale":"deterministic score 0.51 from registry signals: · indexed on github topic:agent-skills · 126 github stars · SKILL.md body (8,433 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-02T12:54:48.959Z","embedding":null,"createdAt":"2026-04-18T22:10:12.237Z","updatedAt":"2026-05-02T12:54:48.959Z","lastSeenAt":"2026-05-02T12:54:48.959Z","prices":[{"id":"0c8e67fd-c757-4ba7-be66-91820a3864c3","listingId":"d1403561-0084-45c7-9a38-0c6ad7ddbec2","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"adaptyvbio","category":"protein-design-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T22:10:12.237Z"}],"sources":[{"listingId":"d1403561-0084-45c7-9a38-0c6ad7ddbec2","source":"github","sourceId":"adaptyvbio/protein-design-skills/protein-qc","sourceUrl":"https://github.com/adaptyvbio/protein-design-skills/tree/main/skills/protein-qc","isPrimary":false,"firstSeenAt":"2026-04-18T22:10:12.237Z","lastSeenAt":"2026-05-02T12:54:48.959Z"}],"details":{"listingId":"d1403561-0084-45c7-9a38-0c6ad7ddbec2","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"adaptyvbio","slug":"protein-qc","github":{"repo":"adaptyvbio/protein-design-skills","stars":126,"topics":["agent-skills","claude-code","protein-design","protein-engineering"],"license":"mit","html_url":"https://github.com/adaptyvbio/protein-design-skills","pushed_at":"2026-01-19T13:06:29Z","description":"Claude Code skills for protein design","skill_md_sha":"837bb9d5571c99fe61921d25d026859394018af9","skill_md_path":"skills/protein-qc/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/adaptyvbio/protein-design-skills/tree/main/skills/protein-qc"},"layout":"multi","source":"github","category":"protein-design-skills","frontmatter":{"name":"protein-qc","license":"MIT","description":"Quality control metrics and filtering thresholds for protein design. 
Use this skill when: (1) Evaluating design quality for binding, expression, or structure, (2) Setting filtering thresholds for pLDDT, ipTM, PAE, (3) Checking sequence liabilities (cysteines, deamidation, polybasic clusters), (4) Creating multi-stage filtering pipelines, (5) Computing PyRosetta interface metrics (dG, SC, dSASA), (6) Checking biophysical properties (instability, GRAVY, pI), (7) Ranking designs with composite scoring.  This skill provides research-backed thresholds from binder design competitions and published benchmarks."},"skills_sh_url":"https://skills.sh/adaptyvbio/protein-design-skills/protein-qc"},"updatedAt":"2026-05-02T12:54:48.959Z"}}