Skill quality 0.45

Prove whether a prompt or model variant really won before shipping with promptstats

Run statistically sound comparisons on eval results so prompt and model changes are judged by confidence bounds, not bar-chart vibes.

Price

free

Protocol

skill

Verified

Endpoint

https://skills.sh/agentskillexchange/skills/prove-whether-a-prompt-or-model-variant-really-won-before-shipping-with-promptstats

What it does

Run statistically sound comparisons on eval results so prompt and model changes are judged by confidence bounds, not bar-chart vibes.
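To make "judged by confidence bounds" concrete, the sketch below is a generic paired percentile bootstrap in plain Python, not evalstats' implementation. It assumes one score per input for each variant, aligned by input. The function name `paired_bootstrap_ci` and the toy scores are hypothetical.

```python
import random
import statistics


def paired_bootstrap_ci(scores_a, scores_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the mean per-input difference (b - a).

    Hypothetical helper: scores_a[i] and scores_b[i] must come from the same
    input i, so each resample draws whole inputs (paired differences).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need one score per input for each variant")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed for a reproducible interval
    boot_means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lo = boot_means[int(alpha / 2 * n_resamples)]
    hi = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), (lo, hi)


# Toy, made-up per-input scores for a baseline and a candidate prompt.
baseline = [0.62, 0.71, 0.55, 0.80, 0.67, 0.59, 0.74, 0.66]
candidate = [0.70, 0.73, 0.61, 0.82, 0.69, 0.66, 0.75, 0.72]

mean_diff, (lo, hi) = paired_bootstrap_ci(baseline, candidate)
if lo > 0:
    verdict = "candidate better"
elif hi < 0:
    verdict = "baseline better"
else:
    verdict = "no clear winner (CI includes 0)"
print(f"mean diff = {mean_diff:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]: {verdict}")
```

If the whole interval lies above zero, the candidate's gain on these inputs is unlikely to be resampling noise at the chosen alpha; if it straddles zero, a taller bar is not evidence of a win. Resampling whole inputs keeps the pairing, which removes per-input difficulty from the comparison.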

Prerequisites

Python environment; the evalstats package (published from the promptstats repository, see Installation); eval result tables or per-input score arrays; prompt or model experiment outputs to compare

Installation

Use the upstream install or setup path that matches your environment:

pip install evalstats
pip install "evalstats[xlsx]"
pip install "evalstats[all]"
pip install "evalstats[lmm]"

Requirements and caveats from upstream:

If you set method="lmm", analyze() switches to a mixed-effects path (score ~ template + (1|input)) with Wald CIs and parametric rank distributions. By default this uses statsmodels (pure Python, no additional setup re...
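Read as a standard random-intercept model, the formula score ~ template + (1|input) can be written out as below. This is the conventional interpretation of that formula notation, not an excerpt from the evalstats docs: i indexes inputs, t indexes templates, one template is the reference level with \beta_t = 0, and the Wald interval uses the normal quantile z_{1-\alpha/2} (about 1.96 for 95%).

```latex
y_{it} = \mu + \beta_t + u_i + \varepsilon_{it}, \qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma^2)

% Wald confidence interval for the effect of template t
\hat{\beta}_t \pm z_{1-\alpha/2}\, \widehat{\mathrm{SE}}(\hat{\beta}_t)
```

The random intercept u_i absorbs per-input difficulty, so templates are compared within inputs rather than across them.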
Python API
The main use case of evalstats is as a Python API, whose entry point is the analyze() function. Pass your benchmark data, in the format analyze() expects, to get a battery of results.

Basic usage or getting-started notes:

What statistics test should I run in X situation?
as well as example code (which will, obviously, tend to use evalstats, but
Running estats.analyze() and then estats.print_analysis_summary(analysis) prints a full statistical report to the terminal, including confidence interval line plots, pairwise comparisons between prompt templates, and...
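The pairwise comparisons and rank summaries in that report rest on resampling or modelling the per-input scores. evalstats' exact procedure isn't shown here, so the following is a generic nonparametric sketch: a bootstrap over inputs that estimates how often each template comes out on top, a resampling analogue of the rank distributions mentioned for the lmm path. The function name `bootstrap_first_place`, the dict-of-lists layout, and the template names are hypothetical.

```python
import random
import statistics
from collections import Counter


def bootstrap_first_place(scores_by_template, n_resamples=5_000, seed=0):
    """Estimate how often each template has the highest mean score when the
    inputs are resampled with replacement (nonparametric bootstrap).

    Hypothetical layout: template name -> list of per-input scores, with the
    inputs in the same order for every template. Ties go to the template
    listed first.
    """
    names = list(scores_by_template)
    n_inputs = len(scores_by_template[names[0]])
    if any(len(s) != n_inputs for s in scores_by_template.values()):
        raise ValueError("every template needs one score per input")

    rng = random.Random(seed)
    first_place = Counter()
    for _ in range(n_resamples):
        # Resample inputs, not individual scores, so all templates see the same draw.
        idx = [rng.randrange(n_inputs) for _ in range(n_inputs)]
        means = {
            name: statistics.fmean([scores[i] for i in idx])
            for name, scores in scores_by_template.items()
        }
        first_place[max(names, key=means.__getitem__)] += 1
    return {name: first_place[name] / n_resamples for name in names}


# Toy, made-up scores for three hypothetical templates on six inputs.
scores = {
    "concise": [0.60, 0.72, 0.55, 0.81, 0.64, 0.70],
    "step_by_step": [0.66, 0.69, 0.61, 0.78, 0.70, 0.73],
    "baseline": [0.40, 0.52, 0.35, 0.60, 0.45, 0.50],
}
for name, p in bootstrap_first_place(scores).items():
    print(f"P({name} ranks first) = {p:.2f}")
```

Because every template is scored on the same resampled inputs, the estimate respects the pairing; a probability near 1 for one template is stronger evidence than a mean that happens to be highest on a single run.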
Source: https://github.com/ianarawjo/promptstats
Extracted from upstream docs: https://raw.githubusercontent.com/ianarawjo/promptstats/HEAD/README.md

Documentation

https://statsforevals.com

Source

Agent Skill Exchange

Capabilities

skillsource-agentskillexchange, skill-prove-whether-a-prompt-or-model-variant-really-won-before-shipping-with-promptstats, topic-agent-skills, topic-ai-agents, topic-ai-tools, topic-awesome-list, topic-claude-code, topic-codex, topic-cursor, topic-llm, topic-mcp, topic-npx-skills, topic-openclaw, topic-skills-catalog

Install

Install: npx skills add agentskillexchange/skills

Source: https://github.com/agentskillexchange/skills/tree/main/skills/prove-whether-a-prompt-or-model-variant-really-won-before-shipping-with-promptstats

skills.sh: https://skills.sh/agentskillexchange/skills/prove-whether-a-prompt-or-model-variant-really-won-before-shipping-with-promptstats

Transport: skills-sh

Protocol: skill

Quality

0.45 / 1.00

Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,826 chars)

Provenance

Indexed from: GitHub

Enriched: 2026-05-18 19:11:56Z · deterministic:skill-github:v1 · v1

First seen: 2026-05-18

Last seen: 2026-05-18

Agent access

JSON: https://clawmart.sh/api/listings/vCYY34
