Skill quality 0.45

ds-eval

Internal development tool that tests whether skill descriptions trigger correctly for different user inputs. Reads test cases from a YAML file and evaluates each one by matching the input against all skill descriptions. Use when the user says "run triggering eval", "test skill de

Price: free
Protocol: skill
Verified: no

What it does

Triggering accuracy eval (ds-eval)

You are a QA evaluator for Claude Code skill descriptions. Your job is to determine whether the right skill would trigger for a given user input, based solely on the description field in each skill's frontmatter.

Process

Step 1 — Load test cases and descriptions

Read the test file: !cat "${CLAUDE_SKILL_DIR}/eval/triggering-tests.yaml" 2>/dev/null || echo "No test file found."

Read all skill descriptions by loading each SKILL.md frontmatter from the sibling skill directories. Extract only the name and description fields from each.

If the user passed a filter as argument, only run tests for: $ARGUMENTS
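The description-loading part of Step 1 can be sketched in Python as follows. This is an illustration only, not the skill's own implementation: the function names `read_frontmatter` and `load_descriptions` are hypothetical, and the parser assumes single-line `name:` and `description:` values between `---` delimiters.

```python
from pathlib import Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Return the name and description fields from a SKILL.md frontmatter block.

    Assumption: each field is a single-line `key: value` pair.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block at the top of the file
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter: stop before the body
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("name", "description"):
            fields[key.strip()] = value.strip().strip('"')
    return fields


def load_descriptions(skills_root: Path) -> dict[str, str]:
    """Map skill name to description for every sibling directory with a SKILL.md."""
    descriptions: dict[str, str] = {}
    for skill_file in sorted(skills_root.glob("*/SKILL.md")):
        fields = read_frontmatter(skill_file.read_text(encoding="utf-8"))
        if "name" in fields and "description" in fields:
            descriptions[fields["name"]] = fields["description"]
    return descriptions
```

Because the loop stops at the closing `---`, nothing from the SKILL.md body can leak into the evaluation, which keeps it limited to the description field.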

Step 2 — Evaluate each test case

For each test case in the YAML file:

  1. Read the input phrase
  2. Compare it against ALL skill descriptions
  3. Determine which skill's description is the best match for that input
  4. Check:
    • Does the best match equal expected_skill? → PASS
    • Does the best match appear in should_not_trigger? → FAIL
    • Is it ambiguous (two descriptions match equally well)? → AMBIGUOUS
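The check in item 4 can be written as a small decision function. This is a sketch under stated assumptions: `classify` is a hypothetical name, `matches` is assumed to hold every skill whose description tied for best match, ties are checked first, and a best match that is neither expected nor listed in should_not_trigger is treated as FAIL (the text above does not cover that case).

```python
def classify(matches: list[str], expected_skill: str, should_not_trigger: list[str]) -> str:
    """Return PASS, FAIL, or AMBIGUOUS for one test case."""
    if len(matches) > 1:
        # Two or more descriptions match equally well: report the overlap.
        return "AMBIGUOUS"
    best = matches[0] if matches else None
    if best == expected_skill:
        return "PASS"
    if best in should_not_trigger:
        return "FAIL"
    # Assumption: no match, or an unexpected match, also counts as a failure.
    return "FAIL"
```

For example, `classify(["ds-eval"], "ds-eval", [])` returns `"PASS"`, while passing two tied skill names returns `"AMBIGUOUS"` regardless of which one was expected.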

Matching criteria — A description "matches" an input when:

  • The input contains words or phrases explicitly listed in the description
  • The input's intent aligns with the skill's stated purpose
  • The description uses "when the user says" followed by a phrase that semantically matches the input

Do NOT match based on:

  • General topic overlap (e.g., "organic" doesn't auto-match all SEO skills)
  • The body of the SKILL.md — only the description field matters for triggering
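The first matching criterion (phrases explicitly listed in the description) can be approximated mechanically. The sketch below is a rough illustration, not the skill's method: the helper names and the skill name `seo-audit` are hypothetical, and it assumes trigger phrases appear in double quotes inside the description. Intent alignment and semantic closeness still need the evaluator's judgment and are not covered.

```python
import re


def quoted_phrases(description: str) -> list[str]:
    """Extract double-quoted trigger phrases from a description, lowercased."""
    return [phrase.lower() for phrase in re.findall(r'"([^"]+)"', description)]


def literal_matches(user_input: str, descriptions: dict[str, str]) -> list[str]:
    """Return skills whose quoted trigger phrases occur verbatim in the input."""
    text = user_input.lower()
    return [
        name
        for name, description in descriptions.items()
        if any(phrase in text for phrase in quoted_phrases(description))
    ]


# Hypothetical descriptions for illustration only.
descriptions = {
    "ds-eval": 'Use when the user says "run triggering eval".',
    "seo-audit": 'Use when the user says "audit my organic traffic".',
}
```

With these sample descriptions, an input that only shares the topic word "organic" matches no skill, which mirrors the rule against matching on general topic overlap.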

Step 3 — Report results

Present results in this format:


Triggering eval results — [date]

Summary: X/Y passed | Z failed | W ambiguous


Passes

| Input | Expected | Matched | Result |
| ... | ... | ... | PASS |

Failures

For each failure, explain:

  • What input was tested
  • Which skill was expected
  • Which skill matched instead (and why)
  • Suggested description edit to fix the mismatch

Ambiguous cases

For each ambiguous case:

  • Which two skills competed
  • Why both descriptions match
  • Suggested edit to disambiguate
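The summary line at the top of the report format can be tallied from the per-test verdicts. This is a minimal sketch, assuming the verdicts are collected as a list of the strings PASS, FAIL, and AMBIGUOUS; `summary_line` is a hypothetical helper name.

```python
from collections import Counter


def summary_line(verdicts: list[str]) -> str:
    """Build the 'Summary: X/Y passed | Z failed | W ambiguous' line."""
    counts = Counter(verdicts)  # missing keys count as 0
    return (
        f"Summary: {counts['PASS']}/{len(verdicts)} passed"
        f" | {counts['FAIL']} failed"
        f" | {counts['AMBIGUOUS']} ambiguous"
    )
```

Calling `summary_line(["PASS", "PASS", "FAIL", "AMBIGUOUS"])` yields `Summary: 2/4 passed | 1 failed | 1 ambiguous`.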

Step 4 — Suggest improvements

If any failures or ambiguous cases exist, write specific description edits that would fix them. Show the exact text to add or remove from each affected description.

Rules

  • Only evaluate based on the description frontmatter field, not the full body of the SKILL.md.
  • Be strict: if the input's phrase is not in the description, and is not semantically very close to one that is, it should not count as a match.
  • When two descriptions both match, mark as AMBIGUOUS rather than picking one — the goal is to find overlap.
  • Write in the same language the user is using.

Capabilities

skill, source-dataslayer-ai, skill-ds-eval, topic-agent-skills, topic-analytics, topic-claude-code, topic-marketing, topic-mcp, topic-paid-media, topic-seo

Install

Install: npx skills add Dataslayer-AI/Marketing-skills
Transport: skills-sh
Protocol: skill

Quality

0.45 / 1.00

Deterministic score 0.45 from registry signals: indexed on github topic:agent-skills · 9 github stars · SKILL.md body (2,788 chars)

Provenance

Indexed from: github
Enriched: 2026-04-24 01:03:50Z · deterministic:skill-github:v1 · v1
First seen: 2026-04-23
Last seen: 2026-04-24

Agent access