Skill quality: 0.46

trueskill-rank

Domain-agnostic TrueSkill batch ranking via LLM-as-judge. Ranks any list of text items using overlapping subsets dispatched to Codex Spark workers. Swappable rubrics. Use when you need to rank, score, curate, or sort a collection by quality.

Price: free
Protocol: skill
Verified: no

What it does

TrueSkill Rank

Rank any collection of text items by quality using TrueSkill + LLM-as-judge.

Setup

pip install trueskill
  • Python 3.11+ required
  • agent-mux for parallel dispatch (optional -- falls back to direct OpenAI API if OPENAI_API_KEY is set)
  • Claude Code: copy this skill folder into .claude/skills/trueskill-rank/
  • Codex CLI: append this SKILL.md content to your project's root AGENTS.md

For the full installation walkthrough (prerequisites, verification, API fallback), see references/installation-guide.md.

Staying Updated

This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.

After installing, tell your agent: "Check UPDATES.md in the trueskill-rank skill for any new features or changes."

When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."

Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.


Quick Start

PYTHON="python3"
CLI="$HOME/.claude/skills/trueskill-rank/scripts/trueskill-rank.py"

# Full pipeline: prepare + dispatch + aggregate
$PYTHON $CLI run \
  --input items.json \
  --overlap 3 \
  --rubric ~/.claude/skills/trueskill-rank/rubrics/practitioner-signal.md \
  --output results.json

# Or step by step:
$PYTHON $CLI prepare --input items.json --overlap 3 \
  --rubric ~/.claude/skills/trueskill-rank/rubrics/practitioner-signal.md \
  --output-dir /tmp/ts-run/
# Dispatch is handled internally by trueskill-rank.py (no separate script needed)
$PYTHON $CLI aggregate --run-dir /tmp/ts-run/ --output results.json

Cost: each subset of 10 items produces C(10,2)=45 implicit pairwise comparisons. 100 items at overlap 3 = 30 subsets = 30 API calls = 1,350 implicit comparisons.
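The cost arithmetic above can be sketched as a quick estimator (`cost_estimate` is a hypothetical helper for planning runs, not part of the CLI):

```python
from math import comb

def cost_estimate(n_items: int, overlap: int, subset_size: int = 10):
    """Estimate subset count, API calls, and implicit pairwise comparisons."""
    # Each item appears `overlap` times, so total slots / subset size = subsets.
    subsets = n_items * overlap // subset_size
    # One subset of n items yields C(n, 2) implicit pairwise comparisons.
    comparisons = subsets * comb(subset_size, 2)
    return subsets, comparisons

print(cost_estimate(100, 3))  # (30, 1350): 30 subsets, 30 API calls
```

Subsets equal API calls in batch mode, so this is also your call budget at a given overlap.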

Decision Tree

Mode Selection

| Question | Answer | Mode |
|---|---|---|
| Ranking individual items (messages, posts)? | Yes | --mode batch (default) |
| Comparing entities (channels, sources, candidates)? | Yes | --mode pairwise |
| Items > 50? | Yes | --mode batch (far more efficient) |
| Items < 20, need binary signal? | Yes | --mode pairwise |

Overlap Selection

| Overlap | When to use | Cost |
|---|---|---|
| --overlap 2 | Quick scan, low stakes, small sets | Lowest |
| --overlap 3 | Default. Good balance of speed and confidence | Medium |
| --overlap 4 | High-stakes curation, final rankings | Highest |

Rubric Selection

| Rubric | Use for |
|---|---|
| practitioner-signal.md | General content quality. 6 criteria led by practitioner signal |
| signal-serendipity-entropy.md | Content curation emphasizing surprise and cross-domain bridges |
| Custom rubric via --rubric | Any domain -- create from example-template.md |

Input Format

{"items": [{"id": "item_001", "text": "...", "metadata": {...}}, ...]}

Source doesn't matter -- TG messages, HN posts, articles, papers, tweets. The id field is required, text is the content to rank, metadata is optional and passed through to output.
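A minimal loader enforcing this format might look like the following (`load_items` is illustrative, not the skill's actual code):

```python
import json

def load_items(path: str) -> list[dict]:
    """Load and validate the {"items": [...]} input format."""
    with open(path) as f:
        data = json.load(f)
    items = data.get("items")
    if not items:
        raise ValueError('input must be {"items": [...]} with at least one item')
    for item in items:
        # id is required; text is the content to rank; metadata is optional
        if "id" not in item or "text" not in item:
            raise ValueError(f"item missing required id/text fields: {item!r}")
    return items
```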

CLI Reference

prepare

Generate subsets and prompt files from input items.

$PYTHON $CLI prepare \
  --input items.json \
  --mode batch \
  --overlap 3 \
  --subset-size 10 \
  --rubric rubric.md \
  --output-dir /tmp/ts-run/ \
  --seed 42 \
  --text-cap 1500

--input        Required. JSON with {"items": [...]}
--mode         batch (default) or pairwise
--overlap      2-4, controls statistical robustness
--subset-size  Items per subset (batch) or matchups per batch (pairwise)
--rubric       Required. Scoring criteria file
--output-dir   Where to write subsets.json and prompts/
--seed         Random seed for reproducibility
--text-cap     Max chars per item in prompts

Output: subsets.json + prompts/subset-NN.txt (batch) or prompts/batch-NN.txt (pairwise)
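The actual partitioning lives in trueskill-rank.py; one plausible scheme matching the documented guarantees (each item appears exactly --overlap times, seeded shuffles for reproducibility, no duplicates within a subset) is:

```python
import random

def make_subsets(ids: list[str], overlap: int = 3,
                 subset_size: int = 10, seed: int = 42) -> list[list[str]]:
    """Sketch: `overlap` independent shuffles, each chunked into subsets."""
    rng = random.Random(seed)  # mirrors --seed for reproducibility
    subsets: list[list[str]] = []
    for _ in range(overlap):
        order = list(ids)
        rng.shuffle(order)
        # chunk one full shuffle into subsets of subset_size
        subsets.extend(order[i:i + subset_size]
                       for i in range(0, len(order), subset_size))
    return subsets
```

With 100 ids at overlap 3 this yields 30 subsets, each item appearing exactly 3 times.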

aggregate

Parse dispatch results and run TrueSkill rating.

$PYTHON $CLI aggregate \
  --run-dir /tmp/ts-run/ \
  --output results.json

--run-dir  Directory with subsets.json and results/
--output   Final rankings JSON
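Conceptually, aggregation turns each judge-ordered subset into implicit pairwise outcomes before the TrueSkill update; a sketch of that decomposition (not the script's actual code):

```python
from itertools import combinations

def pairwise_outcomes(ranked_ids: list[str]) -> list[tuple[str, str]]:
    """Turn one judge-ordered subset (best first) into (winner, loser) pairs."""
    # Every earlier item implicitly beats every later item: C(n, 2) pairs.
    return list(combinations(ranked_ids, 2))

# These win/loss pairs drive each item's (mu, sigma) rating update.
```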

run

Full pipeline: prepare + dispatch + aggregate.

$PYTHON $CLI run \
  --input items.json \
  --overlap 3 \
  --rubric rubric.md \
  --output results.json

Dispatch runs automatically via the built-in dispatch_workers() function. No external scripts required.

Output Format

{
  "rankings": [
    {"id": "item_001", "mu": 35.2, "sigma": 2.1, "conservative": 28.9,
     "rank": 1, "appearances": 3, "wins": 8, "losses": 2}
  ],
  "stats": {
    "total_items": 100, "subsets": 30, "results_parsed": 30,
    "parse_errors": 0, "coverage_gaps": 0, "mode": "batch",
    "overlap": 3, "rubric": "practitioner-signal"
  }
}

conservative = mu - 3*sigma. This is the ranking key. Penalizes items with few appearances (high uncertainty).
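The ranking-key arithmetic can be illustrated directly (item_001's values come from the example output above; item_002 is a hypothetical contrast case):

```python
def conservative(mu: float, sigma: float) -> float:
    """Ranking key: mu - 3*sigma penalizes high-uncertainty items."""
    return mu - 3 * sigma

rankings = [
    {"id": "item_001", "mu": 35.2, "sigma": 2.1},
    {"id": "item_002", "mu": 36.0, "sigma": 4.0},  # higher mu, but more uncertain
]
rankings.sort(key=lambda r: conservative(r["mu"], r["sigma"]), reverse=True)
# item_001 ranks first: 35.2 - 6.3 = 28.9 beats 36.0 - 12.0 = 24.0
```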

Dispatch

Built into trueskill-rank.py via dispatch_workers(). No external scripts.

Primary: agent-mux --engine codex --model gpt-5.3-codex-spark --reasoning low --effort low (free via GPT subscription). Resolves agent-mux via AGENT_MUX_PATH env var, then which agent-mux, then relative to skill directory. Runs 6 workers in parallel via concurrent.futures.ThreadPoolExecutor.

Fallback: If agent-mux not found, falls back to direct OpenAI API via urllib.request (stdlib, zero deps). Requires OPENAI_API_KEY env var. Uses gpt-4o-mini. Results written as {"success": true, "response": "..."} JSON.
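Assuming the fallback targets the standard Chat Completions endpoint with the payload shape that API expects, the stdlib-only request could be constructed like this (illustrative sketch; the request is built but never sent here):

```python
import json
import os
import urllib.request

def build_fallback_request(prompt: str) -> urllib.request.Request:
    """Sketch of the zero-dependency OpenAI fallback request."""
    body = json.dumps({
        "model": "gpt-4o-mini",  # fallback model named in the docs
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
        method="POST",
    )
```

Sending it via `urllib.request.urlopen(...)` and writing `{"success": true, "response": "..."}` per subset would complete the fallback path.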

Creating Custom Rubrics

Copy rubrics/example-template.md and fill in:

# Rubric Name

## Criteria (ordered by importance)
1. **CRITERION** -- Description
2. **CRITERION** -- Description

## Tiebreaker
How to break ties.

## Context
What kind of content this is for.

3-6 criteria recommended. Order matters -- most important first.


Anti-Patterns

| Do NOT | Do instead |
|---|---|
| Use overlap 2 for high-stakes curation | Use overlap 3-4 for reliable rankings |
| Use pairwise mode for 50+ items | Use batch ranking (far more efficient at scale) |
| Skip the rubric file | Always specify a rubric via --rubric |
| Run without trueskill installed | pip install trueskill first |
| Parse result files manually | Use the aggregate subcommand |

Error Handling

| Problem | Solution |
|---|---|
| trueskill not installed | pip install trueskill |
| agent-mux not found + no OPENAI_API_KEY | Install agent-mux or set OPENAI_API_KEY env var |
| Parse errors in results | Check result files in results/ dir, retry failed subsets |
| Coverage gaps in aggregate output | Increase overlap coefficient (--overlap 3 or --overlap 4) |
| Empty items array error | Check input JSON format -- must be {"items": [...]} with at least one item |

Bundled Resources Index

| Path | What | When to load |
|---|---|---|
| ./SKILL.md | Skill runbook (this file) | Always |
| ./UPDATES.md | Structured changelog for AI agents | When checking for new features or updates |
| ./UPDATE-GUIDE.md | Instructions for AI agents performing updates | When updating this skill |
| ./scripts/trueskill-rank.py | Main CLI script -- prepare, dispatch, aggregate | Always (execution) |
| ./references/algorithm.md | TrueSkill math, N-player mode, convergence, cost scaling | When tuning parameters or understanding ranking behavior |
| ./references/prior-runs.md | Previous runs with statistics and lessons learned | When calibrating overlap, rubrics, or interpreting results |
| ./references/installation-guide.md | Detailed install walkthrough for Claude Code and Codex CLI | First-time setup or environment repair |
| ./rubrics/practitioner-signal.md | General content quality rubric (6 criteria) | Default rubric for most ranking tasks |
| ./rubrics/signal-serendipity-entropy.md | Curation rubric emphasizing surprise and cross-domain bridges | Content discovery and curation |
| ./rubrics/example-template.md | Template for creating custom rubrics | When creating a new domain-specific rubric |

Capabilities

skill · source-buildoak · skill-trueskill-rank · topic-agent-skills · topic-ai-agents · topic-ai-tools · topic-automation · topic-browser-automation · topic-claude-code · topic-claude-skills · topic-codex

Install

Install: npx skills add buildoak/fieldwork-skills
Transport: skills-sh
Protocol: skill

Quality

0.46 / 1.00

Deterministic score 0.46 from registry signals: indexed on GitHub topic agent-skills · 15 GitHub stars · SKILL.md body (8,129 chars)

Provenance

Indexed from: github
Enriched: 2026-04-22 19:06:33Z · deterministic:skill-github:v1 · v1
First seen: 2026-04-18
Last seen: 2026-04-22

Agent access