MCP · quality 0.60

ForgeJudge

Open evaluation leaderboard and CI gate for autonomous coding agents with sandboxed execution and public traces.

Price: free
Protocol: mcp
Verified: no

What it does

ForgeJudge is an open-source evaluation platform for autonomous coding agents. It runs every patch in an isolated sandbox, grades the result with a deterministic SWE-bench-based harness against a curated golden test set, and publishes the full OpenTelemetry traces. A multi-seed regression gate blocks performance degradation between agent versions, so teams building LLM-powered coding tools can use ForgeJudge as a CI gate.
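The listing does not say how the multi-seed regression gate reaches its decision. As a rough illustration of the general idea only, the sketch below compares mean pass rates across seeds for a baseline and a candidate agent version. The function name `regression_gate`, the `tolerance` parameter, and the sample pass rates are all hypothetical and are not taken from ForgeJudge.

```python
from statistics import mean


def regression_gate(baseline, candidate, tolerance=0.02):
    """Return True when the candidate passes the gate.

    baseline, candidate: per-seed pass rates in [0, 1], one entry per seed.
    tolerance: allowed drop in mean pass rate (hypothetical default).
    This is an illustrative rule, not ForgeJudge's actual gating logic.
    """
    if not baseline or not candidate:
        raise ValueError("need at least one seed per version")
    # Averaging over seeds dampens run-to-run noise before comparing versions.
    return mean(candidate) >= mean(baseline) - tolerance


if __name__ == "__main__":
    baseline_runs = [0.62, 0.60, 0.61]   # made-up pass rates, three seeds
    candidate_runs = [0.55, 0.57, 0.56]  # made-up pass rates, three seeds
    passed = regression_gate(baseline_runs, candidate_runs)
    print("gate passed" if passed else "gate failed")
```

In a CI job, a failing result would typically be turned into a non-zero exit code so the pipeline stops before the new agent version ships.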

Capabilities

mcp · transport-stdio · open-source · pkg-pypi

Server

Transport: stdio
Protocol: mcp

Quality

0.60 / 1.00

Deterministic score 0.60 from registry signals: indexed on pulsemcp · has source repo · registry-generated description present

Provenance

Indexed from: pulsemcp
Enriched: 2026-06-20 05:22:33Z · deterministic:mcp:v1 · v1
First seen: 2026-05-31
Last seen: 2026-06-20
