{"id":"186827d9-2a1e-4b27-9cb1-fc93f0d05960","shortId":"dHTuva","kind":"skill","title":"trueskill-rank","tagline":"Domain-agnostic TrueSkill batch ranking via LLM-as-judge. Ranks any list of text items using overlapping subsets dispatched to Codex Spark workers. Swappable rubrics. Use when you need to rank, score, curate, or sort a collection by quality.","description":"# TrueSkill Rank\n\nRank any collection of text items by quality using TrueSkill + LLM-as-judge.\n\n## Setup\n\n```bash\npip install trueskill\n```\n\n- **Python 3.11+** required\n- **agent-mux** for parallel dispatch (optional -- falls back to direct OpenAI API if `OPENAI_API_KEY` is set)\n- **Claude Code:** copy this skill folder into `.claude/skills/trueskill-rank/`\n- **Codex CLI:** append this SKILL.md content to your project's root `AGENTS.md`\n\nFor the full installation walkthrough (prerequisites, verification, API fallback), see [references/installation-guide.md](references/installation-guide.md).\n\n## Staying Updated\n\nThis skill ships with an `UPDATES.md` changelog and `UPDATE-GUIDE.md` for your AI agent.\n\nAfter installing, tell your agent: \"Check `UPDATES.md` in the trueskill-rank skill for any new features or changes.\"\n\nWhen updating, tell your agent: \"Read `UPDATE-GUIDE.md` and apply the latest changes from `UPDATES.md`.\"\n\nFollow `UPDATE-GUIDE.md` so customized local files are diffed before any overwrite.\n\n---\n\n## Quick Start\n\n```bash\nPYTHON=\"python3\"\n# Use $HOME, not a quoted ~: tilde expansion is skipped inside double quotes\nCLI=\"$HOME/.claude/skills/trueskill-rank/scripts/trueskill-rank.py\"\n\n# Full pipeline: prepare + dispatch + aggregate\n$PYTHON $CLI run \\\n  --input items.json \\\n  --overlap 3 \\\n  --rubric ~/.claude/skills/trueskill-rank/rubrics/practitioner-signal.md \\\n  --output results.json\n\n# Or step by step:\n$PYTHON $CLI prepare --input items.json --overlap 3 \\\n  --rubric ~/.claude/skills/trueskill-rank/rubrics/practitioner-signal.md \\\n  --output-dir /tmp/ts-run/\n# Dispatch is handled internally by 
trueskill-rank.py (no separate script needed)\n$PYTHON $CLI aggregate --run-dir /tmp/ts-run/ --output results.json\n```\n\nCost: each subset of 10 items produces C(10,2)=45 implicit pairwise comparisons. 100 items at overlap 3 = 30 subsets = 30 API calls = 1,350 implicit comparisons.\n\n## Decision Tree\n\n### Mode Selection\n\n| Question | Answer | Mode |\n|----------|--------|------|\n| Ranking individual items (messages, posts)? | Yes | `--mode batch` (default) |\n| Comparing entities (channels, sources, candidates)? | Yes | `--mode pairwise` |\n| Items > 50? | Yes | `--mode batch` (far more efficient) |\n| Items < 20, need binary signal? | Yes | `--mode pairwise` |\n\n### Overlap Selection\n\n| Overlap | When to use | Cost |\n|---------|-------------|------|\n| `--overlap 2` | Quick scan, low stakes, small sets | Lowest |\n| `--overlap 3` | Default. Good balance of speed and confidence | Medium |\n| `--overlap 4` | High-stakes curation, final rankings | Highest |\n\n### Rubric Selection\n\n| Rubric | Use for |\n|--------|---------|\n| `practitioner-signal.md` | General content quality. 6 criteria led by practitioner signal |\n| `signal-serendipity-entropy.md` | Content curation emphasizing surprise and cross-domain bridges |\n| Custom rubric via `--rubric` | Any domain -- create from `example-template.md` |\n\n## Input Format\n\n```json\n{\"items\": [{\"id\": \"item_001\", \"text\": \"...\", \"metadata\": {...}}, ...]}\n```\n\nSource doesn't matter -- TG messages, HN posts, articles, papers, tweets. The `id` field is required, `text` is the content to rank, `metadata` is optional and passed through to output.\n\n## CLI Reference\n\n### prepare\n\nGenerate subsets and prompt files from input items.\n\n```bash\n$PYTHON $CLI prepare \\\n  --input items.json \\         # Required. 
JSON with {\"items\": [...]}\n  --mode batch \\               # batch (default) or pairwise\n  --overlap 3 \\                # 2-4, controls statistical robustness\n  --subset-size 10 \\           # Items per subset (batch) or matchups per batch (pairwise)\n  --rubric rubric.md \\         # Required. Scoring criteria file\n  --output-dir /tmp/ts-run/ \\  # Where to write subsets.json and prompts/\n  --seed 42 \\                  # Random seed for reproducibility\n  --text-cap 1500              # Max chars per item in prompts\n```\n\nOutput: `subsets.json` + `prompts/subset-NN.txt` (batch) or `prompts/batch-NN.txt` (pairwise)\n\n### aggregate\n\nParse dispatch results and run TrueSkill rating.\n\n```bash\n$PYTHON $CLI aggregate \\\n  --run-dir /tmp/ts-run/ \\     # Directory with subsets.json and results/\n  --output results.json        # Final rankings JSON\n```\n\n### run\n\nFull pipeline: prepare + dispatch + aggregate.\n\n```bash\n$PYTHON $CLI run \\\n  --input items.json \\\n  --overlap 3 \\\n  --rubric rubric.md \\\n  --output results.json\n```\n\nDispatch runs automatically via the built-in `dispatch_workers()` function. No external scripts required.\n\n## Output Format\n\n```json\n{\n  \"rankings\": [\n    {\"id\": \"item_001\", \"mu\": 35.2, \"sigma\": 2.1, \"conservative\": 28.9,\n     \"rank\": 1, \"appearances\": 3, \"wins\": 8, \"losses\": 2}\n  ],\n  \"stats\": {\n    \"total_items\": 100, \"subsets\": 30, \"results_parsed\": 30,\n    \"parse_errors\": 0, \"coverage_gaps\": 0, \"mode\": \"batch\",\n    \"overlap\": 3, \"rubric\": \"practitioner-signal\"\n  }\n}\n```\n\n`conservative` = mu - 3*sigma. This is the ranking key. Penalizes items with few appearances (high uncertainty).\n\n## Dispatch\n\nBuilt into `trueskill-rank.py` via `dispatch_workers()`. No external scripts.\n\nPrimary: `agent-mux --engine codex --model gpt-5.3-codex-spark --reasoning low --effort low` (free via GPT subscription). 
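The parallel fan-out is a plain thread-pool map over the prepared prompt files. A minimal, hypothetical sketch of that shape (the function name `dispatch_prompts` and the injected `run_one` callable are illustrative, not the script's actual internals):

```python
import json
import pathlib
from concurrent.futures import ThreadPoolExecutor

def dispatch_prompts(prompt_dir, results_dir, run_one, workers=6):
    # run_one(prompt_text) -> response string; inject an agent-mux
    # subprocess call or a direct API call here.
    prompts = sorted(pathlib.Path(prompt_dir).glob('subset-*.txt'))
    out = pathlib.Path(results_dir)
    out.mkdir(parents=True, exist_ok=True)

    def judge(path):
        # One worker judges one subset and writes a result file in the
        # same success/response JSON shape the fallback path produces.
        reply = run_one(path.read_text())
        payload = {'success': True, 'response': reply}
        (out / (path.stem + '.json')).write_text(json.dumps(payload))
        return path.stem

    # Six threads by default, mirroring the documented worker count.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(judge, prompts))
```

Injecting the judge call as `run_one` keeps the fan-out engine-agnostic, so the same loop serves both the agent-mux path and the API fallback.
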
Resolves agent-mux via `AGENT_MUX_PATH` env var, then `which agent-mux`, then relative to skill directory. Runs 6 workers in parallel via `concurrent.futures.ThreadPoolExecutor`.\n\nFallback: If `agent-mux` not found, falls back to direct OpenAI API via `urllib.request` (stdlib, zero deps). Requires `OPENAI_API_KEY` env var. Uses `gpt-4o-mini`. Results written as `{\"success\": true, \"response\": \"...\"}` JSON.\n\n## Creating Custom Rubrics\n\nCopy `rubrics/example-template.md` and fill in:\n\n```markdown\n# Rubric Name\n\n## Criteria (ordered by importance)\n1. **CRITERION** -- Description\n2. **CRITERION** -- Description\n\n## Tiebreaker\nHow to break ties.\n\n## Context\nWhat kind of content this is for.\n```\n\n3-6 criteria recommended. Order matters -- most important first.\n\n---\n\n## Anti-Patterns\n\n| Do NOT | Do Instead |\n|--------|------------|\n| Use overlap 2 for high-stakes curation | Use overlap 3-4 for reliable rankings |\n| Use pairwise mode for 50+ items | Use batch ranking (far more efficient at scale) |\n| Skip the rubric file | Always specify a rubric via `--rubric` |\n| Run without trueskill installed | `pip install trueskill` first |\n| Parse result files manually | Use the `aggregate` subcommand |\n\n## Error Handling\n\n| Problem | Solution |\n|---------|----------|\n| `trueskill` not installed | `pip install trueskill` |\n| agent-mux not found + no `OPENAI_API_KEY` | Install agent-mux or set `OPENAI_API_KEY` env var |\n| Parse errors in results | Check result files in `results/` dir, retry failed subsets |\n| Coverage gaps in aggregate output | Increase overlap coefficient (`--overlap 3` or `--overlap 4`) |\n| Empty items array error | Check input JSON format -- must be `{\"items\": [...]}` with at least one item |\n\n---\n\n## Bundled Resources Index\n\n| Path | What | When to Load |\n|------|------|-------------|\n| `./SKILL.md` | Skill runbook (this file) | Always |\n| `./UPDATES.md` | Structured changelog for AI agents | When 
checking for new features or updates |\n| `./UPDATE-GUIDE.md` | Instructions for AI agents performing updates | When updating this skill |\n| `./scripts/trueskill-rank.py` | Main CLI script -- prepare, dispatch, aggregate | Always (execution) |\n| `./references/algorithm.md` | TrueSkill math, N-player mode, convergence, cost scaling | When tuning parameters or understanding ranking behavior |\n| `./references/prior-runs.md` | Previous runs with statistics and lessons learned | When calibrating overlap, rubrics, or interpreting results |\n| `./references/installation-guide.md` | Detailed install walkthrough for Claude Code and Codex CLI | First-time setup or environment repair |\n| `./rubrics/practitioner-signal.md` | General content quality rubric (6 criteria) | Default rubric for most ranking tasks |\n| `./rubrics/signal-serendipity-entropy.md` | Curation rubric emphasizing surprise and cross-domain bridges | Content discovery and curation |\n| `./rubrics/example-template.md` | Template for creating custom rubrics | When creating a new domain-specific rubric |","tags":["trueskill","rank","fieldwork","skills","buildoak","agent-skills","ai-agents","ai-tools","automation","browser-automation","claude-code","claude-skills"],"capabilities":["skill","source-buildoak","skill-trueskill-rank","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-automation","topic-browser-automation","topic-claude-code","topic-claude-skills","topic-codex"],"categories":["fieldwork-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/buildoak/fieldwork-skills/trueskill-rank","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add buildoak/fieldwork-skills","source_repo":"https://github.com/buildoak/fieldwork-skills","install_from":"skills.sh"}},"qualityScore":"0.457","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 15 github stars · SKILL.md body (8,129 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T19:06:33.587Z","embedding":null,"createdAt":"2026-04-18T23:07:17.183Z","updatedAt":"2026-04-22T19:06:33.587Z","lastSeenAt":"2026-04-22T19:06:33.587Z","tsv":"","prices":[{"id":"882a83b2-59f7-4c91-acf1-686cc9048b53","listingId":"186827d9-2a1e-4b27-9cb1-fc93f0d05960","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"buildoak","category":"fieldwork-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T23:07:17.183Z"}],"sources":[{"listingId":"186827d9-2a1e-4b27-9cb1-fc93f0d05960","source":"github","sourceId":"buildoak/fieldwork-skills/trueskill-rank","sourceUrl":"https://github.com/buildoak/fieldwork-skills/tree/main/skills/trueskill-rank","isPrimary":false,"firstSeenAt":"2026-04-18T23:07:17.183Z","lastSeenAt":"2026-04-22T19:06:33.587Z"}],"details":{"listingId":"186827d9-2a1e-4b27-9cb1-fc93f0d05960","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"buildoak","slug":"trueskill-rank","github":{"repo":"buildoak/fieldwork-skills","stars":15,"topics":["agent-skills","ai-agents","ai-tools","automation","browser-automation","claude-code","claude-skills","codex"],"license":"apache-2.0","html_url":"https://github.com/buildoak/fieldwork-skills","pushed_at":"2026-03-18T08:36:25Z","description":"Battle-tested skills for AI agents that do real work","skill_md_sha":"12ed42bb04fde3d8e099b934ea7e6ffc88c85853","skill_md_path":"skills/trueskill-rank/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/buildoak/fieldwork-skills/tree/main/skills/trueskill-rank"},"layout":"multi","source":"github","category":"fieldwork-skills","frontmatter":{"name":"trueskill-rank","description":"Domain-agnostic TrueSkill batch ranking via LLM-as-judge. Ranks any list of text items using overlapping subsets dispatched to Codex Spark workers. Swappable rubrics. 
Use when you need to rank, score, curate, or sort a collection by quality."},"skills_sh_url":"https://skills.sh/buildoak/fieldwork-skills/trueskill-rank"},"updatedAt":"2026-04-22T19:06:33.587Z"}}