{"id":"df3ba2c9-5ba7-4c00-bbc2-d9ed00a2bbae","shortId":"snSkKh","kind":"skill","title":"evaluating-skills","tagline":"Evaluates and creates agent skills following best practices. Use when reviewing, writing, or refactoring skills, or asking about skill structure, format, or specification.","description":"# Evaluating & Creating Skills\n\n## Quick Start\n\n1. **Validating**: Run `skills validate <skill-dir>` for structural checks\n2. **Scoring**: Run `python scripts/score-skills.py <skill-dir>` for spec-grounded LLM evaluation\n\n## When to Use This Skill\n\n- User wants to create a new skill\n- User asks to review or evaluate an existing skill\n- User needs help with skill format or structure\n- User asks about skill best practices\n- User wants to refactor or improve a skill\n- Keywords: \"skill\", \"SKILL.md\", \"create skill\", \"evaluate skill\", \"skill quality\"\n\n## Authoritative References\n\nThe scorer grounds evaluation against these live documents (with vendored snapshot fallback):\n- [Specification](https://agentskills.io/specification.md) - Field constraints, structure rules\n- [Best Practices](https://agentskills.io/skill-creation/best-practices.md) - Quality criteria\n- [Evaluating Skills](https://agentskills.io/skill-creation/evaluating-skills.md) - Evaluation methodology\n\n## Skill Anatomy\n\n```\nskill-name/                    # Gerund form (verb-ing)\n├── SKILL.md                   # Main documentation (<500 lines)\n└── references/                # Optional detailed references\n    ├── topic-1.md            # One level deep only\n    └── topic-2.md\n```\n\n### Frontmatter (Required)\n\n```yaml\n---\nname: skill-name                    # Gerund, lowercase, hyphens, max 64 chars\ndescription: \"Third person description with trigger keywords. Max 1024 chars.\"\n---\n```\n\n**Description Rules:**\n- Third person: \"Analyzes data...\" not \"I help you...\"\n- Include trigger keywords for agent activation\n- Describe what AND when to use\n\n## Recommended Section Order\n\n| Section | Purpose | Guidelines |\n|---------|---------|------------|\n| Quick Start | Immediate value | 2-5 lines, actionable |\n| When to Use | Activation triggers | Bullet points, keywords |\n| Core Concepts | Mental models | Build understanding |\n| Workflow/Procedures | Step-by-step | Progressive complexity |\n| Examples | Concrete patterns | Code blocks, scenarios |\n| Common Pitfalls | Mistakes to avoid | 5-10 items |\n| References | Deep dives | Link to references/ with trigger context |\n\n## Skill Types & Patterns\n\n### Exploratory Skills\nExplain concepts, provide reference material, build mental models.\n- Lead with fundamentals\n- Include terminology glossary\n- Show common patterns\n\n### Procedural Skills\nStep-by-step guides for completing tasks.\n- Start with quick start\n- Show code examples early\n- Progress simple → complex\n\n### Decision/Framework Skills\nHelp make choices between options.\n- Lead with decision trees (ASCII)\n- Provide decision matrices\n- Include keyword signals\n\n### Analytical Skills\nInterpret data or outputs.\n- Explain interpretation frameworks\n- Pattern recognition guidance\n- Good vs bad examples\n\n## Evaluation Checklist\n\n### Frontmatter\n- [ ] Name uses gerund form (verb-ing)\n- [ ] Name is lowercase with hyphens only\n- [ ] Name matches directory name\n- [ ] Description is third person\n- [ ] Description includes trigger keywords\n- [ ] Description < 1024 characters\n\n### Structure\n- [ ] SKILL.md body < 500 lines\n- [ ] Total skill < 5000 tokens\n- [ ] References one level deep only\n- [ ] Has Quick Start section\n- [ ] Has When to Use section\n\n### Content Quality\n- [ ] Paragraphs 3-5 lines max\n- [ ] Uses headers for organization\n- [ ] Code in fenced blocks with language\n- [ ] Tables for comparisons\n- [ ] Concrete examples (not abstract)\n- [ ] No time-sensitive information\n- [ ] Consistent terminology\n\n### Common Pitfalls\n- [ ] Includes pitfalls section\n- [ ] 5-10 specific mistakes\n- [ ] Explains why they're wrong\n\n## Creating a New Skill\n\n### Step 1: Choose the Name\n\n```\nGood: analyzing-data, creating-reports, managing-users\nBad:  data-analysis, report-creator, user-management\n```\n\nUse gerund form (verb + -ing). The action should be clear.\n\n### Step 2: Write the Description\n\nTemplate:\n```\n\"{Verb}s {what} for {purpose}. Use when {trigger conditions}.\"\n```\n\nExample:\n```\n\"Analyzes chart visualizations to extract insights. Use when interpreting\ndashboards, identifying trends, or explaining data patterns to stakeholders.\"\n```\n\n### Step 3: Structure Content\n\n1. Start with Quick Start (2-5 actionable lines)\n2. Add When to Use (bullet list of triggers)\n3. Write core content (concepts, workflows, examples)\n4. Add Common Pitfalls\n5. Move detailed content to references/ with loading triggers (e.g. \"Read when implementing X\")\n\n### Step 4: Validate\n\nRun through the evaluation checklist above.\n\n## Using the Scorer\n\n### Validate Only (fast, no LLM)\n```bash\nuv run python scripts/score-skills.py <skill-dir> --validate_only\n```\n\n### Full Scoring (with spec grounding)\n```bash\nuv run python scripts/score-skills.py <skill-dir>\n```\n\n### Batch All Skills\n```bash\nuv run python scripts/score-skills.py . --scan_all\n```\n\n## Common Pitfalls\n\n1. **First-person descriptions** - Use \"Analyzes...\" not \"I analyze...\"\n2. **Missing trigger keywords** - Agents can't find the skill\n3. **Too long SKILL.md** - Move details to references/ and add trigger context (e.g. \"Read when working with X\")\n4. **Nested reference folders** - Only one level allowed\n5. **Abstract examples** - Use concrete, real scenarios\n6. **Noun-form names** - Use \"analyzing-data\" not \"data-analyzer\"\n7. **No Quick Start** - Users abandon without immediate value\n8. **Inconsistent terminology** - Pick terms and stick with them\n9. **Missing pitfalls section** - Helps users avoid mistakes\n10. **Time-sensitive content** - Skills should be evergreen\n\n## References\n\n- [skill-checklist.md](references/skill-checklist.md) - Read when scoring or reviewing a skill to get the full rubric breakdown, anti-patterns, and evaluation template\n- [examples.md](references/examples.md) - Read when creating a new skill or refactoring an existing one to see concrete patterns from well-designed skills","tags":["evaluating","skills","altertable-ai","agent-skills","ai-agents","altertable"],"capabilities":["skill","source-altertable-ai","skill-evaluating-skills","topic-agent-skills","topic-ai-agents","topic-altertable"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/altertable-ai/skills/evaluating-skills","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add altertable-ai/skills","source_repo":"https://github.com/altertable-ai/skills","install_from":"skills.sh"}},"qualityScore":"0.453","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 7 github stars · SKILL.md body (5,844 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:14:20.087Z","embedding":null,"createdAt":"2026-05-18T13:21:54.888Z","updatedAt":"2026-05-18T19:14:20.087Z","lastSeenAt":"2026-05-18T19:14:20.087Z","tsv":"'-10':256,435 '-5':220,402,526 '/skill-creation/best-practices.md)':129 '/skill-creation/evaluating-skills.md)':136 '/specification.md)':120 '1':32,448,520,609 '10':701 '1024':185,373 '2':40,219,483,525,529,619 '3':401,517,538,629 '4':545,564,647 '5':255,434,549,655 '500':152,378 '5000':382 '6':662 '64':175 '7':675 '8':684 '9':693 'abandon':680 'abstract':421,656 'action':222,478,527 'activ':202,226 'add':530,546,638 'agent':7,201,623 'agentskills.io':119,128,135 'agentskills.io/skill-creation/best-practices.md)':127 'agentskills.io/skill-creation/evaluating-skills.md)':134 'agentskills.io/specification.md)':118 'allow':654 'analysi':465 'analyt':328 'analyz':191,454,498,615,618,669,674 'analyzing-data':453,668 'anatomi':140 'anti':727 'anti-pattern':726 'ascii':321 'ask':20,64,81 'authorit':103 'avoid':254,699 'bad':342,462 'bash':580,592,600 'batch':597 'best':10,84,125 'block':248,412 'bodi':377 'breakdown':725 'build':235,277 'bullet':228,534 'char':176,186 'charact':374 'chart':499 'check':39 'checklist':345,570 'choic':314 'choos':449 'clear':481 'code':247,304,409 'common':250,287,429,547,607 'comparison':417 'complet':297 'complex':243,309 'concept':232,273,542 'concret':245,418,659,747 'condit':496 'consist':427 'constraint':122 'content':398,519,541,552,705 'context':266,640 'core':231,540 'creat':6,28,59,97,443,457,736 'creating-report':456 'creator':468 'criteria':131 'dashboard':507 'data':192,331,455,464,512,670,673 'data-analysi':463 'data-analyz':672 'decis':319,323 'decision/framework':310 'deep':161,259,387 'describ':203 'descript':177,180,187,364,368,372,486,613 'design':752 'detail':156,551,634 'directori':362 'dive':260 'document':112,151 'e.g':558,641 'earli':306 'evalu':2,4,27,50,68,99,108,132,137,344,569,730 'evaluating-skil':1 'evergreen':709 'exampl':244,305,343,419,497,544,657 'examples.md':732 'exist':70,743 'explain':272,334,438,511 'exploratori':270 'extract':502 'fallback':116 'fast':577 'fenc':411 'field':121 'find':626 'first':611 'first-person':610 'folder':650 'follow':9 'form':145,350,474,665 'format':24,77 'framework':336 'frontmatt':164,346 'full':587,723 'fundament':282 'gerund':144,171,349,473 'get':721 'glossari':285 'good':340,452 'ground':48,107,591 'guid':295 'guidanc':339 'guidelin':214 'header':406 'help':74,195,312,697 'hyphen':173,358 'identifi':508 'immedi':217,682 'implement':561 'improv':91 'includ':197,283,325,369,431 'inconsist':685 'inform':426 'ing':148,353,476 'insight':503 'interpret':330,335,506 'item':257 'keyword':94,183,199,230,326,371,622 'languag':414 'lead':280,317 'level':160,386,653 'line':153,221,379,403,528 'link':261 'list':535 'live':111 'llm':49,579 'load':556 'long':631 'lowercas':172,356 'main':150 'make':313 'manag':460,471 'managing-us':459 'match':361 'materi':276 'matric':324 'max':174,184,404 'mental':233,278 'methodolog':138 'miss':620,694 'mistak':252,437,700 'model':234,279 'move':550,633 'name':143,167,170,347,354,360,363,451,666 'need':73 'nest':648 'new':61,445,738 'noun':664 'noun-form':663 'one':159,385,652,744 'option':155,316 'order':211 'organ':408 'output':333 'paragraph':400 'pattern':246,269,288,337,513,728,748 'person':179,190,367,612 'pick':687 'pitfal':251,430,432,548,608,695 'point':229 'practic':11,85,126 'procedur':289 'progress':242,307 'provid':274,322 'purpos':213,492 'python':43,583,595,603 'qualiti':102,130,399 'quick':30,215,301,390,523,677 're':441 'read':559,642,713,734 'real':660 'recognit':338 'recommend':209 'refactor':17,89,741 'refer':104,154,157,258,263,275,384,554,636,649,710 'references/examples.md':733 'references/skill-checklist.md':712 'report':458,467 'report-cr':466 'requir':165 'review':14,66,717 'rubric':724 'rule':124,188 'run':34,42,566,582,594,602 'scan':605 'scenario':249,661 'score':41,588,715 'scorer':106,574 'scripts/score-skills.py':44,584,596,604 'section':210,212,392,397,433,696 'see':746 'sensit':425,704 'show':286,303 'signal':327 'simpl':308 'skill':3,8,18,22,29,35,55,62,71,76,83,93,95,98,100,101,133,139,142,169,267,271,290,311,329,381,446,599,628,706,719,739,753 'skill-checklist.md':711 'skill-evaluating-skills' 'skill-nam':141,168 'skill.md':96,149,376,632 'snapshot':115 'source-altertable-ai' 'spec':47,590 'spec-ground':46 'specif':26,117,436 'stakehold':515 'start':31,216,299,302,391,521,524,678 'step':239,241,292,294,447,482,516,563 'step-by-step':238,291 'stick':690 'structur':23,38,79,123,375,518 'tabl':415 'task':298 'templat':487,731 'term':688 'terminolog':284,428,686 'third':178,189,366 'time':424,703 'time-sensit':423,702 'token':383 'topic-1.md':158 'topic-2.md':163 'topic-agent-skills' 'topic-ai-agents' 'topic-altertable' 'total':380 'tree':320 'trend':509 'trigger':182,198,227,265,370,495,537,557,621,639 'type':268 'understand':236 'use':12,53,208,225,348,396,405,472,493,504,533,572,614,658,667 'user':56,63,72,80,86,461,470,679,698 'user-manag':469 'uv':581,593,601 'valid':33,36,565,575,585 'valu':218,683 'vendor':114 'verb':147,352,475,488 'verb-':146,351 'visual':500 'vs':341 'want':57,87 'well':751 'well-design':750 'without':681 'work':644 'workflow':543 'workflow/procedures':237 'write':15,484,539 'wrong':442 'x':562,646 'yaml':166","prices":[{"id":"52bd0176-86c8-414d-8aae-db710464953f","listingId":"df3ba2c9-5ba7-4c00-bbc2-d9ed00a2bbae","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"altertable-ai","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:21:54.888Z"}],"sources":[{"listingId":"df3ba2c9-5ba7-4c00-bbc2-d9ed00a2bbae","source":"github","sourceId":"altertable-ai/skills/evaluating-skills","sourceUrl":"https://github.com/altertable-ai/skills/tree/main/skills/evaluating-skills","isPrimary":false,"firstSeenAt":"2026-05-18T13:21:54.888Z","lastSeenAt":"2026-05-18T19:14:20.087Z"}],"details":{"listingId":"df3ba2c9-5ba7-4c00-bbc2-d9ed00a2bbae","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"altertable-ai","slug":"evaluating-skills","github":{"repo":"altertable-ai/skills","stars":7,"topics":["agent-skills","ai-agents","altertable"],"license":"mit","html_url":"https://github.com/altertable-ai/skills","pushed_at":"2026-05-14T10:34:10Z","description":"Agent Skills for Altertable","skill_md_sha":"ee3f918beb2cc01700cfd5b04a7a0cdbf79c7dd8","skill_md_path":"skills/evaluating-skills/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/altertable-ai/skills/tree/main/skills/evaluating-skills"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"evaluating-skills","description":"Evaluates and creates agent skills following best practices. Use when reviewing, writing, or refactoring skills, or asking about skill structure, format, or specification."},"skills_sh_url":"https://skills.sh/altertable-ai/skills/evaluating-skills"},"updatedAt":"2026-05-18T19:14:20.087Z"}}