Skill quality: 0.46

trueskill-rank

Domain-agnostic TrueSkill batch ranking via LLM-as-judge. Ranks any list of text items using overlapping subsets dispatched to Codex Spark workers. Swappable rubrics. Use when you need to rank, score, curate, or sort a collection by quality.

Price: free
Protocol: skill
Verified: no

What it does

TrueSkill Rank

Rank any collection of text items by quality using TrueSkill + LLM-as-judge.

Setup

pip install trueskill
  • Python 3.11+ required
  • agent-mux for parallel dispatch (optional -- falls back to direct OpenAI API if OPENAI_API_KEY is set)
  • Claude Code: copy this skill folder into .claude/skills/trueskill-rank/
  • Codex CLI: append this SKILL.md content to your project's root AGENTS.md

For the full installation walkthrough (prerequisites, verification, API fallback), see references/installation-guide.md.

Staying Updated

This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.

After installing, tell your agent: "Check UPDATES.md in the trueskill-rank skill for any new features or changes."

When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."

Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.


Quick Start

PYTHON="python3"
CLI="$HOME/.claude/skills/trueskill-rank/scripts/trueskill-rank.py"

# Full pipeline: prepare + dispatch + aggregate
$PYTHON $CLI run \
  --input items.json \
  --overlap 3 \
  --rubric ~/.claude/skills/trueskill-rank/rubrics/practitioner-signal.md \
  --output results.json

# Or step by step:
$PYTHON $CLI prepare --input items.json --overlap 3 \
  --rubric ~/.claude/skills/trueskill-rank/rubrics/practitioner-signal.md \
  --output-dir /tmp/ts-run/
# Dispatch is handled internally by trueskill-rank.py (no separate script needed)
$PYTHON $CLI aggregate --run-dir /tmp/ts-run/ --output results.json

Cost: each subset of 10 items produces C(10,2)=45 implicit pairwise comparisons. 100 items at overlap 3 = 30 subsets = 30 API calls = 1,350 implicit comparisons.
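The cost arithmetic above can be sketched as a quick estimator (`cost_estimate` is a hypothetical helper for planning runs, not part of the CLI):

```python
from math import comb

def cost_estimate(n_items: int, overlap: int, subset_size: int = 10):
    """Estimate subset count, API calls, and implicit pairwise comparisons."""
    # Each item appears `overlap` times, so total slots / subset size = subsets.
    subsets = n_items * overlap // subset_size
    # One subset of n items yields C(n, 2) implicit pairwise comparisons.
    comparisons = subsets * comb(subset_size, 2)
    return subsets, comparisons

print(cost_estimate(100, 3))  # (30, 1350): 30 subsets, 30 API calls
```

Subsets equal API calls in batch mode, so this is also your call budget at a given overlap.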

Decision Tree

Mode Selection

| Question | Answer | Mode |
|---|---|---|
| Ranking individual items (messages, posts)? | Yes | --mode batch (default) |
| Comparing entities (channels, sources, candidates)? | Yes | --mode pairwise |
| Items > 50? | Yes | --mode batch (far more efficient) |
| Items < 20, need binary signal? | Yes | --mode pairwise |

Overlap Selection

| Overlap | When to use | Cost |
|---|---|---|
| --overlap 2 | Quick scan, low stakes, small sets | Lowest |
| --overlap 3 | Default. Good balance of speed and confidence | Medium |
| --overlap 4 | High-stakes curation, final rankings | Highest |

Rubric Selection

| Rubric | Use for |
|---|---|
| practitioner-signal.md | General content quality. 6 criteria led by practitioner signal |
| signal-serendipity-entropy.md | Content curation emphasizing surprise and cross-domain bridges |
| Custom rubric via --rubric | Any domain -- create from example-template.md |

Input Format

{"items": [{"id": "item_001", "text": "...", "metadata": {...}}, ...]}

Source doesn't matter -- TG messages, HN posts, articles, papers, tweets. The id field is required, text is the content to rank, metadata is optional and passed through to output.
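A minimal loader enforcing this format might look like the following (`load_items` is illustrative, not the skill's actual code):

```python
import json

def load_items(path: str) -> list[dict]:
    """Load and validate the {"items": [...]} input format."""
    with open(path) as f:
        data = json.load(f)
    items = data.get("items")
    if not items:
        raise ValueError('input must be {"items": [...]} with at least one item')
    for item in items:
        # id is required; text is the content to rank; metadata is optional
        if "id" not in item or "text" not in item:
            raise ValueError(f"item missing required id/text fields: {item!r}")
    return items
```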

CLI Reference

prepare

Generate subsets and prompt files from input items.

$PYTHON $CLI prepare \
  --input items.json \
  --mode batch \
  --overlap 3 \
  --subset-size 10 \
  --rubric rubric.md \
  --output-dir /tmp/ts-run/ \
  --seed 42 \
  --text-cap 1500

--input        Required. JSON with {"items": [...]}
--mode         batch (default) or pairwise
--overlap      2-4, controls statistical robustness
--subset-size  Items per subset (batch) or matchups per batch (pairwise)
--rubric       Required. Scoring criteria file
--output-dir   Where to write subsets.json and prompts/
--seed         Random seed for reproducibility
--text-cap     Max chars per item in prompts

Output: subsets.json + prompts/subset-NN.txt (batch) or prompts/batch-NN.txt (pairwise)
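The actual partitioning lives in trueskill-rank.py; one plausible scheme matching the documented guarantees (each item appears exactly --overlap times, seeded shuffles for reproducibility, no duplicates within a subset) is:

```python
import random

def make_subsets(ids: list[str], overlap: int = 3,
                 subset_size: int = 10, seed: int = 42) -> list[list[str]]:
    """Sketch: `overlap` independent shuffles, each chunked into subsets."""
    rng = random.Random(seed)  # mirrors --seed for reproducibility
    subsets: list[list[str]] = []
    for _ in range(overlap):
        order = list(ids)
        rng.shuffle(order)
        # chunk one full shuffle into subsets of subset_size
        subsets.extend(order[i:i + subset_size]
                       for i in range(0, len(order), subset_size))
    return subsets
```

With 100 ids at overlap 3 this yields 30 subsets, each item appearing exactly 3 times.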

aggregate

Parse dispatch results and run TrueSkill rating.

$PYTHON $CLI aggregate \
  --run-dir /tmp/ts-run/ \
  --output results.json

--run-dir  Directory with subsets.json and results/
--output   Final rankings JSON
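Conceptually, aggregation turns each judge-ordered subset into implicit pairwise outcomes before the TrueSkill update; a sketch of that decomposition (not the script's actual code):

```python
from itertools import combinations

def pairwise_outcomes(ranked_ids: list[str]) -> list[tuple[str, str]]:
    """Turn one judge-ordered subset (best first) into (winner, loser) pairs."""
    # Every earlier item implicitly beats every later item: C(n, 2) pairs.
    return list(combinations(ranked_ids, 2))

# These win/loss pairs drive each item's (mu, sigma) rating update.
```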

run

Full pipeline: prepare + dispatch + aggregate.

$PYTHON $CLI run \
  --input items.json \
  --overlap 3 \
  --rubric rubric.md \
  --output results.json

Dispatch runs automatically via the built-in dispatch_workers() function. No external scripts required.

Output Format

{
  "rankings": [
    {"id": "item_001", "mu": 35.2, "sigma": 2.1, "conservative": 28.9,
     "rank": 1, "appearances": 3, "wins": 8, "losses": 2}
  ],
  "stats": {
    "total_items": 100, "subsets": 30, "results_parsed": 30,
    "parse_errors": 0, "coverage_gaps": 0, "mode": "batch",
    "overlap": 3, "rubric": "practitioner-signal"
  }
}

conservative = mu - 3*sigma. This is the ranking key. Penalizes items with few appearances (high uncertainty).
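The ranking-key arithmetic can be illustrated directly (item_001's values come from the example output above; item_002 is a hypothetical contrast case):

```python
def conservative(mu: float, sigma: float) -> float:
    """Ranking key: mu - 3*sigma penalizes high-uncertainty items."""
    return mu - 3 * sigma

rankings = [
    {"id": "item_001", "mu": 35.2, "sigma": 2.1},
    {"id": "item_002", "mu": 36.0, "sigma": 4.0},  # higher mu, but more uncertain
]
rankings.sort(key=lambda r: conservative(r["mu"], r["sigma"]), reverse=True)
# item_001 ranks first: 35.2 - 6.3 = 28.9 beats 36.0 - 12.0 = 24.0
```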

Dispatch

Built into trueskill-rank.py via dispatch_workers(). No external scripts.

Primary: agent-mux --engine codex --model gpt-5.3-codex-spark --reasoning low --effort low (free via GPT subscription). Resolves agent-mux via AGENT_MUX_PATH env var, then which agent-mux, then relative to skill directory. Runs 6 workers in parallel via concurrent.futures.ThreadPoolExecutor.

Fallback: If agent-mux not found, falls back to direct OpenAI API via urllib.request (stdlib, zero deps). Requires OPENAI_API_KEY env var. Uses gpt-4o-mini. Results written as {"success": true, "response": "..."} JSON.
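Assuming the fallback targets the standard Chat Completions endpoint with the payload shape that API expects, the stdlib-only request could be constructed like this (illustrative sketch; the request is built but never sent here):

```python
import json
import os
import urllib.request

def build_fallback_request(prompt: str) -> urllib.request.Request:
    """Sketch of the zero-dependency OpenAI fallback request."""
    body = json.dumps({
        "model": "gpt-4o-mini",  # fallback model named in the docs
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
        method="POST",
    )
```

Sending it via `urllib.request.urlopen(...)` and writing `{"success": true, "response": "..."}` per subset would complete the fallback path.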

Creating Custom Rubrics

Copy rubrics/example-template.md and fill in:

# Rubric Name

## Criteria (ordered by importance)
1. **CRITERION** -- Description
2. **CRITERION** -- Description

## Tiebreaker
How to break ties.

## Context
What kind of content this is for.

3-6 criteria recommended. Order matters -- most important first.


Anti-Patterns

| Do NOT | Do instead |
|---|---|
| Use overlap 2 for high-stakes curation | Use overlap 3-4 for reliable rankings |
| Use pairwise mode for 50+ items | Use batch ranking (far more efficient at scale) |
| Skip the rubric file | Always specify a rubric via --rubric |
| Run without trueskill installed | pip install trueskill first |
| Parse result files manually | Use the aggregate subcommand |

Error Handling

| Problem | Solution |
|---|---|
| trueskill not installed | pip install trueskill |
| agent-mux not found + no OPENAI_API_KEY | Install agent-mux or set OPENAI_API_KEY env var |
| Parse errors in results | Check result files in results/ dir, retry failed subsets |
| Coverage gaps in aggregate output | Increase overlap coefficient (--overlap 3 or --overlap 4) |
| Empty items array error | Check input JSON format -- must be {"items": [...]} with at least one item |

Bundled Resources Index

| Path | What | When to load |
|---|---|---|
| ./SKILL.md | Skill runbook (this file) | Always |
| ./UPDATES.md | Structured changelog for AI agents | When checking for new features or updates |
| ./UPDATE-GUIDE.md | Instructions for AI agents performing updates | When updating this skill |
| ./scripts/trueskill-rank.py | Main CLI script -- prepare, dispatch, aggregate | Always (execution) |
| ./references/algorithm.md | TrueSkill math, N-player mode, convergence, cost scaling | When tuning parameters or understanding ranking behavior |
| ./references/prior-runs.md | Previous runs with statistics and lessons learned | When calibrating overlap, rubrics, or interpreting results |
| ./references/installation-guide.md | Detailed install walkthrough for Claude Code and Codex CLI | First-time setup or environment repair |
| ./rubrics/practitioner-signal.md | General content quality rubric (6 criteria) | Default rubric for most ranking tasks |
| ./rubrics/signal-serendipity-entropy.md | Curation rubric emphasizing surprise and cross-domain bridges | Content discovery and curation |
| ./rubrics/example-template.md | Template for creating custom rubrics | When creating a new domain-specific rubric |

Capabilities

skill · source-buildoak · skill-trueskill-rank · topic-agent-skills · topic-ai-agents · topic-ai-tools · topic-automation · topic-browser-automation · topic-claude-code · topic-claude-skills · topic-codex

Install

Install: npx skills add buildoak/fieldwork-skills
Transport: skills-sh
Protocol: skill

Quality

0.46 / 1.00

Deterministic score 0.46 from registry signals: indexed on GitHub topic agent-skills · 15 GitHub stars · SKILL.md body (8,129 chars)

Provenance

Indexed from: github
Enriched: 2026-04-22 19:06:33Z · deterministic:skill-github:v1 · v1
First seen: 2026-04-18
Last seen: 2026-04-22

Agent access