Evaluating & Creating Skills

Quick Start

Validating: Run skills validate <skill-dir> for structural checks
Scoring: Run python scripts/score-skills.py <skill-dir> for spec-grounded LLM evaluation

When to Use This Skill

User wants to create a new skill
User asks to review or evaluate an existing skill
User needs help with skill format or structure
User asks about skill best practices
User wants to refactor or improve a skill
Keywords: "skill", "SKILL.md", "create skill", "evaluate skill", "skill quality"

Authoritative References

The scorer grounds its evaluation in these live documents, falling back to vendored snapshots:

Specification - Field constraints, structure rules
Best Practices - Quality criteria
Evaluating Skills - Evaluation methodology

Skill Anatomy

skill-name/                    # Gerund form (verb-ing)
├── SKILL.md                   # Main documentation (<500 lines)
└── references/                # Optional detailed references
    ├── topic-1.md            # One level deep only
    └── topic-2.md
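
The layout rules in the tree above (a SKILL.md under 500 lines, references kept one level deep) can be checked mechanically. The following is a minimal sketch, not the scorer's actual implementation; `check_layout` and `MAX_SKILL_LINES` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Taken from the "<500 lines" note in the tree above.
MAX_SKILL_LINES = 500


def check_layout(skill_dir):
    """Return a list of layout problems for one skill directory (hypothetical helper)."""
    root = Path(skill_dir)
    problems = []

    skill_md = root / "SKILL.md"
    if not skill_md.is_file():
        problems.append("SKILL.md is missing")
    else:
        line_count = len(skill_md.read_text(encoding="utf-8").splitlines())
        if line_count >= MAX_SKILL_LINES:
            problems.append(
                f"SKILL.md has {line_count} lines; expected fewer than {MAX_SKILL_LINES}"
            )

    # references/ is optional, but it may not contain subdirectories.
    refs = root / "references"
    if refs.is_dir():
        for entry in sorted(refs.iterdir()):
            if entry.is_dir():
                problems.append(
                    f"references/{entry.name}/ is nested; only one level is allowed"
                )

    return problems
```

An empty list means the directory passes these two checks only; it says nothing about content quality.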

Frontmatter (Required)

---
name: skill-name                    # Gerund, lowercase, hyphens, max 64 chars
description: "Third person description with trigger keywords. Max 1024 chars."
---

Description Rules:

Third person: "Analyzes data..." not "I help you..."
Include trigger keywords for agent activation
Describe what the skill does AND when to use it
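
The frontmatter constraints above lend themselves to a small validator. This is a sketch under stated assumptions: `check_frontmatter` is a hypothetical helper, allowing digits in names is an assumption beyond "lowercase, hyphens", and the first-person test is a rough heuristic that only looks at the opening word.

```python
import re

# Lowercase words joined by single hyphens. Allowing digits is an assumption.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
# Heuristic: a description that opens with a first-person word.
FIRST_PERSON = re.compile(r"^\s*(I|We|My|Our)\b", re.IGNORECASE)


def check_frontmatter(name, description):
    """Return a list of problems with a skill's name and description."""
    problems = []
    if len(name) > 64:
        problems.append("name exceeds 64 characters")
    if not NAME_PATTERN.match(name):
        problems.append("name must be lowercase words joined by single hyphens")
    if not description.strip():
        problems.append("description is empty")
    if len(description) > 1024:
        problems.append("description exceeds 1024 characters")
    if FIRST_PERSON.match(description):
        problems.append("description should be third person")
    return problems
```

For example, `check_frontmatter("Data_Analysis", "I help you analyze data.")` reports both a name problem and a first-person description.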

Recommended Section Order

Section	Purpose	Guidelines
Quick Start	Immediate value	2-5 lines, actionable
When to Use	Activation triggers	Bullet points, keywords
Core Concepts	Mental models	Build understanding
Workflow/Procedures	Step-by-step	Progressive complexity
Examples	Concrete patterns	Code blocks, scenarios
Common Pitfalls	Mistakes to avoid	5-10 items
References	Deep dives	Link to references/ with trigger context
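
The recommended order in the table above can be compared against the headings of an existing SKILL.md. The sketch below assumes headings are Markdown `#` lines and that the file has no fenced code containing `#` comment lines (those would be misread as headings); `RECOMMENDED_ORDER`, `section_positions`, and `out_of_order` are hypothetical names, and "Workflow/Procedures" is matched on the prefix "Workflow" only.

```python
RECOMMENDED_ORDER = [
    "Quick Start",
    "When to Use",
    "Core Concepts",
    "Workflow",
    "Examples",
    "Common Pitfalls",
    "References",
]


def section_positions(markdown_text):
    """Map each recommended section to the index of the first heading that starts with it."""
    headings = [
        line.lstrip("#").strip()
        for line in markdown_text.splitlines()
        if line.startswith("#")
    ]
    positions = {}
    for index, heading in enumerate(headings):
        for section in RECOMMENDED_ORDER:
            if heading.lower().startswith(section.lower()) and section not in positions:
                positions[section] = index
    return positions


def out_of_order(markdown_text):
    """Return recommended sections that appear before a section that should precede them."""
    positions = section_positions(markdown_text)
    present = [section for section in RECOMMENDED_ORDER if section in positions]
    misplaced = []
    for earlier, later in zip(present, present[1:]):
        if positions[earlier] > positions[later]:
            misplaced.append(later)
    return misplaced
```

Sections missing from the file are ignored here; only the relative order of the ones that exist is checked.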

Skill Types & Patterns

Exploratory Skills

Explain concepts, provide reference material, build mental models.

Lead with fundamentals
Include terminology glossary
Show common patterns

Procedural Skills

Step-by-step guides for completing tasks.

Start with quick start
Show code examples early
Progress simple → complex

Decision/Framework Skills

Help make choices between options.

Lead with decision trees (ASCII)
Provide decision matrices
Include keyword signals

Analytical Skills

Interpret data or outputs.

Explain interpretation frameworks
Give pattern recognition guidance
Contrast good and bad examples

Evaluation Checklist

Frontmatter

Name uses gerund form (verb-ing)
Name is lowercase with hyphens only
Name matches directory name
Description is third person
Description includes trigger keywords
Description is at most 1024 characters

Structure

Content Quality

Common Pitfalls

Includes pitfalls section
5-10 specific mistakes
Explains why they're wrong

Creating a New Skill

Step 1: Choose the Name

Good: analyzing-data, creating-reports, managing-users
Bad:  data-analysis, report-creator, user-management

Use gerund form (verb + -ing). The action should be clear.
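
A rough way to screen names for gerund form is to look at the first hyphen-separated word. This is only a heuristic sketch with a hypothetical helper name: it accepts any first word ending in "ing", so a noun such as "string" would pass, and it cannot judge whether the action is clear.

```python
def looks_like_gerund_name(name):
    """Heuristic: the first hyphen-separated word ends in "ing" and is longer than four letters."""
    first_word = name.split("-")[0]
    return len(first_word) > 4 and first_word.endswith("ing")


good = ["analyzing-data", "creating-reports", "managing-users"]
bad = ["data-analysis", "report-creator", "user-management"]

for candidate in good + bad:
    verdict = "ok" if looks_like_gerund_name(candidate) else "rename"
    print(f"{candidate}: {verdict}")
```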

Step 2: Write the Description

Template:

"{Verb}s {what} for {purpose}. Use when {trigger conditions}."

Example:

"Analyzes chart visualizations to extract insights. Use when interpreting
dashboards, identifying trends, or explaining data patterns to stakeholders."
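
The template can also be filled programmatically, which makes the length limit easy to enforce. The sketch below is illustrative; `build_description` and `join_triggers` are hypothetical helpers, and the purpose is passed as a free phrase because the example above uses "to extract insights" rather than a literal "for ...".

```python
def join_triggers(triggers):
    """Join trigger phrases as "a", "a or b", or "a, b, or c"."""
    if len(triggers) == 1:
        return triggers[0]
    if len(triggers) == 2:
        return f"{triggers[0]} or {triggers[1]}"
    return ", ".join(triggers[:-1]) + f", or {triggers[-1]}"


def build_description(verb, what, purpose, triggers):
    """Build a third-person description with a "Use when" clause (max 1024 chars)."""
    if not triggers:
        raise ValueError("at least one trigger condition is required")
    description = f"{verb} {what} {purpose}. Use when {join_triggers(triggers)}."
    if len(description) > 1024:
        raise ValueError("description exceeds 1024 characters")
    return description


print(
    build_description(
        "Analyzes",
        "chart visualizations",
        "to extract insights",
        [
            "interpreting dashboards",
            "identifying trends",
            "explaining data patterns to stakeholders",
        ],
    )
)
```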

Step 3: Structure Content

Start with Quick Start (2-5 actionable lines)
Add When to Use (bullet list of triggers)
Write core content (concepts, workflows, examples)
Add Common Pitfalls
Move detailed content to references/ with loading triggers (e.g. "Read when implementing X")

Step 4: Validate

Run through the evaluation checklist above.

Using the Scorer

Validate Only (fast, no LLM)

uv run python scripts/score-skills.py <skill-dir> --validate_only

Full Scoring (with spec grounding)

uv run python scripts/score-skills.py <skill-dir>

Batch All Skills

uv run python scripts/score-skills.py . --scan_all
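
When the scorer is driven from another Python script, the three invocations above can be assembled from one helper. This is a sketch only: `scorer_command` and `run_scorer` are hypothetical wrappers, the flags are the ones shown above, and whether the scorer accepts both flags together is not stated here, so the helper does not combine them by default.

```python
import subprocess


def scorer_command(target, validate_only=False, scan_all=False):
    """Build the argument list for one of the scorer invocations shown above."""
    command = ["uv", "run", "python", "scripts/score-skills.py", target]
    if validate_only:
        command.append("--validate_only")
    if scan_all:
        command.append("--scan_all")
    return command


def run_scorer(target, **options):
    """Run the scorer and return its exit code (requires uv and the script to be present)."""
    return subprocess.run(scorer_command(target, **options), check=False).returncode
```

Passing the command as a list avoids shell quoting problems when a skill directory path contains spaces.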

Common Pitfalls

First-person descriptions - Use "Analyzes..." not "I analyze..."
Missing trigger keywords - Agents can't find the skill
Overlong SKILL.md - Move details to references/ and add trigger context (e.g. "Read when working with X")
Nested reference folders - Only one level allowed
Abstract examples - Use concrete, real scenarios
Noun-form names - Use "analyzing-data" not "data-analyzer"
No Quick Start - Users abandon without immediate value
Inconsistent terminology - Pick terms and stick with them
Missing pitfalls section - A pitfalls section helps users avoid mistakes
Time-sensitive content - Skills should be evergreen

References

skill-checklist.md - Read when scoring or reviewing a skill to get the full rubric breakdown, anti-patterns, and evaluation template
examples.md - Read when creating a new skill or refactoring an existing one to see concrete patterns from well-designed skills

Capabilities

skill · source-altertable-ai · skill-evaluating-skills · topic-agent-skills · topic-ai-agents · topic-altertable

Install

Install: npx skills add altertable-ai/skills

Source: https://github.com/altertable-ai/skills/tree/main/skills/evaluating-skills

skills.sh: https://skills.sh/altertable-ai/skills/evaluating-skills

Transport: skills-sh

Protocol: skill

Quality

0.45 / 1.00

Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 7 GitHub stars · SKILL.md body (5,844 chars)

Provenance

Indexed from: github

Enriched: 2026-05-18 19:14:20Z · deterministic:skill-github:v1 · v1

First seen: 2026-05-18

Last seen: 2026-05-18

Agent access

JSON: https://clawmart.sh/api/listings/snSkKh