skill-scorer
Evaluates Agent Skills (Cursor / Claude / OpenClaw compatible) and produces a quantitative, rubric-based score with actionable improvement suggestions. Use when the user asks to review, rate, audit, grade, lint, or improve a SKILL.md file, a skill folder, or a skill archive, or s
What it does
skill-scorer
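A "skill that evaluates skills". It accepts the source files of any Agent Skill and must rely on this repository's official scoring entry point and rubric/rubric.yaml
to produce a 100-point score across 5 pillars, a grade, evidence citations, and improvement suggestions. The rubric has built-in differentiation for three skill types (atomic / pipeline / composite); sub-dimensions are enabled automatically according to the skill's structure, controlled by the applies_to field. It is compatible with all three specs: Cursor / Claude / OpenClaw.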
When to use
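- The user provides a SKILL.md, a skill folder, a .zip, or a GitHub URL and asks for scoring, an audit, or improvement suggestions.
- The user asks "How well is my skill written?", "How can I improve my skill's quality?", or "Help me align with official best practices."
- Not applicable to: evaluating non-Skill documents (ordinary READMEs, blog posts, prompt templates).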
Code Agent Quick Start
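If you are Cursor, WorkBuddy, Hermes, 小龙虾, or a similar code agent, read USAGE.md first.
We recommend running the CLI wizard first so the user can choose between the general review and the finance expert edition. If the finance expert edition is chosen, the wizard goes on to confirm the finance sub-scenario and prints the official follow-up commands: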
python3 skills/skill-scorer/scripts/score.py --agent-wizard <path-to-skill-zip-dir-or-SKILL.md>
Rule-score preview:
python3 skills/skill-scorer/scripts/score.py <path-to-skill-zip-dir-or-SKILL.md>
Full agent-side Deep Review (uses the code agent's own model plan and does not consume the SkillLens server-side key):
python3 skills/skill-scorer/scripts/score.py --agent-prompt <path-to-skill-zip-dir-or-SKILL.md> > agent-deep-review-prompt.md
# Hand agent-deep-review-prompt.md in full to the current code agent's model, and save the strict JSON it returns as agent-llm-results.json
python3 skills/skill-scorer/scripts/score.py --llm-results agent-llm-results.json <path-to-skill-zip-dir-or-SKILL.md>
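Do not generate an ad hoc custom scoring script to replace the official CLI; the final score must come from the output of the official CLI in the last step.
The finance expert edition (optional) should preferably be selected through --agent-wizard. When running manually, you must add the same --domain finance --scenario <scenario-id> to both the --agent-prompt and --llm-results steps. See USAGE.md for the supported scenarios.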
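The manual two-step flow above can be sketched as a small helper that builds both argument lists from one place, so the finance flags cannot drift between the prompt and merge steps. This is a minimal sketch under stated assumptions: the helper name `deep_review_commands` is hypothetical, and placing `--domain` / `--scenario` before the input path is an assumption, so check USAGE.md for the exact ordering.

```python
from typing import List, Optional

SCORER = "skills/skill-scorer/scripts/score.py"  # path as given in the quick start


def deep_review_commands(skill_path: str, scenario: Optional[str] = None) -> List[List[str]]:
    """Build argv lists for the prompt-generation and merge steps.

    Hypothetical helper; flag placement before the path is an assumption.
    """
    # For the finance expert edition, the same flags must appear in both steps.
    domain_flags = ["--domain", "finance", "--scenario", scenario] if scenario else []
    prompt_cmd = ["python3", SCORER, "--agent-prompt", *domain_flags, skill_path]
    merge_cmd = ["python3", SCORER, "--llm-results", "agent-llm-results.json",
                 *domain_flags, skill_path]
    return [prompt_cmd, merge_cmd]
```

Each list can be passed to `subprocess.run`; the output of the first command would be redirected to agent-deep-review-prompt.md as in the commands above.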
Inputs
- A SKILL.md text, or
- a skill directory (containing scripts/, references/, assets/, etc.), or
- a skill packaged as a .zip, or
- a GitHub URL pointing to a skill repository or subdirectory (supported on the Web tool side).
Outputs
{
  "spec": "claude | openclaw",
  "language": "zh | en",
  "score": 0-100,
  "grade": "S | A | B | C | D",
  "pillars": [
    {
      "id": "business_value",
      "score": 0-25,
      "dimensions": [
        {
          "id": "...",
          "checks": [
            {
              "id": "...",
              "status": "pass|partial|fail|n_a",
              "evidence": "<primary-language alias>",
              "evidence_zh": "Chinese diagnosis (current state)",
              "evidence_en": "English diagnosis",
              "fix": "<primary-language alias>",
              "fix_zh": "Chinese fix",
              "fix_en": "English fix"
            }
          ]
        }
      ]
    }
  ],
  "bonus": 0-5,
  "suggestions": [
    {
      "title": "<primary-language alias>",
      "title_zh": "Chinese top improvement",
      "title_en": "English Top Improvement",
      "why": "<primary-language alias>",
      "why_zh": "Chinese why (current state)",
      "why_en": "English why",
      "how": "<primary-language alias>",
      "how_zh": "Chinese how (fix)",
      "how_en": "English how"
    }
  ],
  "deepReviewCertificate": {
    "status": "verified"
  }
}
evidence_zh + evidence_en (and fix_zh + fix_en, why_zh + why_en, how_zh + how_en, title_zh + title_en) are the canonical bilingual fields as of engineVersion 0.4.1 and later. The unsuffixed evidence / fix / why / how / title are preserved as back-compat aliases pointing at the primary language so older readers keep working. The HTML report's ZH/EN toggle uses the suffixed fields to switch body content and falls back to the bare field when the JSON predates the bilingual schema.
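The fallback described above can be illustrated with a short lookup function. This is a sketch only: `pick_text` is a hypothetical name, not part of the SkillLens codebase, and treating an empty suffixed field as missing is an assumption.

```python
from typing import Any, Dict


def pick_text(item: Dict[str, Any], field: str, lang: str) -> str:
    """Prefer the suffixed bilingual field, else fall back to the bare alias."""
    suffixed = item.get(f"{field}_{lang}")
    if suffixed:
        return suffixed
    # Older single-language JSON only carries the unsuffixed field.
    return item.get(field, "")
```

For a check object, `pick_text(check, "evidence", "en")` would read `evidence_en` when it exists and `evidence` otherwise.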
Workflow
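- Locate SkillLens root: first locate the SkillLens repository root that contains skills/skill-scorer/rubric/rubric.yaml.
- Run official scorer: run the official CLI; do not generate an ad hoc replacement scoring script:
  python3 skills/skill-scorer/scripts/score.py <path-to-skill-zip-dir-or-SKILL.md>
- Choose review mode: prefer running --agent-wizard. When running manually, you must confirm whether to enable the domain expert edition; the current MVP supports finance, and the specific --scenario must also be confirmed.
- Agent-side Deep Review when requested: for a full deep review, first run --agent-prompt to generate the official prompt, have the current code agent's model return strict JSON, then run --llm-results to merge it. The domain expert edition must carry the same --domain / --scenario on both commands.
- Use official JSON only: the total score, grade, and pillar/dimension/check scores must come from the official CLI's final JSON output; the agent must not recompute them or top them up.
- Verify certificate: a full Deep Review must include deepReviewCertificate.status="verified"; the finance expert edition must also include domainExpert and deepReviewCertificate.domain. Without a certificate, the result may only be called a rule-score preview or an unofficial result.
- Render: following the user's reading language (zh / en), take the bilingual fields from the JSON (evidence_zh + evidence_en, fix_zh + fix_en, why_zh + why_en, how_zh + how_en) to render the report. Top improvements must come from the JSON's suggestions; older single-language JSON can fall back to evidence / fix / why / how.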
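The certificate step can be sketched as a small check over the final JSON. This is a minimal sketch, not the official verifier: `review_label` is a hypothetical name, the `finance` argument stands in for however the caller knows the domain expert edition was requested, and only the presence of `domainExpert` and `deepReviewCertificate.domain` is tested.

```python
from typing import Any, Dict


def review_label(result: Dict[str, Any], finance: bool = False) -> str:
    """Return the label a report may use for this result."""
    cert = result.get("deepReviewCertificate") or {}
    verified = cert.get("status") == "verified"
    if finance:
        # The finance expert edition needs two extra fields on top of the certificate.
        verified = verified and bool(result.get("domainExpert")) and bool(cert.get("domain"))
    return "full deep review" if verified else "rule-only preview"
```

Anything that does not pass this check would be reported as a rule-score preview, never as a completed Deep Review.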
Official Tool Contract
- MUST call skills/skill-scorer/scripts/score.py for local tool use, or call the deployed SkillLens Web/API endpoint when the user explicitly provides that service address.
- SHOULD start with --agent-wizard for agent-side Deep Review so the user explicitly chooses general vs. finance expert review.
- MUST use the official --agent-prompt → model JSON → --llm-results flow for agent-side Deep Review.
- MUST ask before enabling domain expert review when not using the wizard; for finance, pass the same --domain finance --scenario <scenario-id> in prompt generation and merge.
- MUST NOT paste or synthesize a new python3 <<'PYEOF' ... scoring script to replace the official scorer.
- MUST NOT claim "comprehensive check", "Deep Review complete", "all 43 checks passed", or "100/100" unless those exact values appear in official SkillLens output.
- MUST NOT call a result an official full Deep Review unless deepReviewCertificate.status is exactly verified.
- MUST preserve llmComplete=false / llmCoverage in the rendered report. If LLM checks are skipped, say so clearly.
- MUST include the scoring source in every report, for example: source: official SkillLens CLI or source: SkillLens Web Deep Review.
- MUST treat rubric/rubric.yaml as read-only scoring data. Do not alter weights, thresholds, or pass/partial/fail mapping during evaluation.
Guardrails
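- Rule scores must be deterministic and consistent across languages (the TS front end and the Python CLI behave equivalently).
- LLM review is used only for checks with type: llm and must not override or rewrite rule-score results.
- The report language always follows the primary language of the skill under test, unless the user switches it manually on the Web side.
- Do not echo any possible secret or credential strings from the original skill in the report.
- If the official CLI cannot be run and the official Web/API cannot be reached, stop and explain why; do not fall back to a homemade scorer.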
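The credential guardrail could be approximated by masking secret-looking substrings before quoting evidence in a report. The patterns below are illustrative assumptions, not SkillLens rules, and `redact` is a hypothetical helper; a real filter would need a broader pattern set.

```python
import re

# Illustrative patterns only (assumptions, not taken from the SkillLens rubric).
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),   # API-key-like token with an "sk-" prefix
    re.compile(r"AKIA[0-9A-Z]{16}"),        # shape of an AWS access key ID
    re.compile(r"(password|secret|token)\s*[:=]\s*\S+", re.IGNORECASE),  # key=value pairs
]


def redact(text: str) -> str:
    """Replace every secret-looking match with a fixed placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Applying `redact` to each evidence string before rendering keeps quoted snippets useful while avoiding a verbatim echo of credentials.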
Files
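- rubric/rubric.yaml: scoring rubric (the single source of truth shared by the Web side and the CLI)
- domains/finance/rubric.yaml: finance expert rubric (an additional expert report on top of the general score)
- scripts/score.py: official local CLI scoring script (rule-score preview; it never fabricates an LLM Deep Review)
- USAGE.md: official invocation contract for code agents such as Cursor / WorkBuddy / Hermes / 小龙虾
- references/best-practices.md: Skill-writing best practices (for LLM few-shot use and human reading)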
Report Rendering Rules
Render the official JSON into a concise report. Do not use a fixed sample score. Use this shape:
# SkillLens Report
source: official SkillLens CLI | SkillLens Web Deep Review
mode: rule-only preview | full deep review
llmComplete: true | false
**Total**: <score from JSON> / 100 · **Grade**: <grade from JSON>
## Pillars
| Pillar | Score | LLM coverage |
|---|---:|---:|
| <pillar.name_zh/name_en> | <pillar.score>/<pillar.weight> | <evaluated>/<total> |
## Top Improvements
1. <suggestion.title from JSON>
- Current state / Why: <suggestion.why>
- Fix / How: <suggestion.how>
If the CLI output says llmComplete=false, explicitly call the result a rule-only preview. Never upgrade it to a full deep review.
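A renderer following these rules might look like the sketch below. It is an illustration under stated assumptions: `render_report` is a hypothetical name, a missing `llmComplete` key is treated as false, and the pillar table is omitted because the exact JSON field names for LLM coverage are not given here.

```python
from typing import Any, Dict


def render_report(result: Dict[str, Any], source: str, lang: str = "en") -> str:
    """Render the header, total, and top improvements from official JSON."""
    llm_complete = bool(result.get("llmComplete", False))  # assumption: absent means false
    cert = result.get("deepReviewCertificate") or {}
    # Never upgrade: both a complete LLM pass and a verified certificate are required.
    full = llm_complete and cert.get("status") == "verified"
    lines = [
        "# SkillLens Report",
        f"source: {source}",
        f"mode: {'full deep review' if full else 'rule-only preview'}",
        f"llmComplete: {'true' if llm_complete else 'false'}",
        f"**Total**: {result['score']} / 100 · **Grade**: {result['grade']}",
        "## Top Improvements",
    ]
    for i, item in enumerate(result.get("suggestions", []), start=1):
        # Prefer the suffixed bilingual field, then the bare back-compat alias.
        title, why, how = (
            item.get(f"{key}_{lang}") or item.get(key, "")
            for key in ("title", "why", "how")
        )
        lines += [f"{i}. {title}", f"   - Why: {why}", f"   - How: {how}"]
    return "\n".join(lines)
```

The score and grade are copied straight from the JSON, so the renderer has no path by which it could recompute or inflate them.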
Quality
deterministic score 0.48 from registry signals: · indexed on github topic:agent-skills · 56 github stars · SKILL.md body (6,892 chars)