MCP · quality 0.60

ForgeJudge

Open evaluation leaderboard and CI gate for autonomous coding agents with sandboxed execution and public traces.

Price

free

Protocol

mcp

Verified

Endpoint

https://github.com/ahmedeid1/forgejudge

What it does

ForgeJudge is an open-source evaluation platform for autonomous coding agents. It runs every patch in an isolated sandbox, grades results using a deterministic SWE-bench-based harness against a curated golden test set, and publishes full OpenTelemetry traces publicly. A multi-seed regression gate prevents performance degradation across agent versions, making ForgeJudge a reliable CI gate for teams building LLM-powered coding tools.
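The listing does not document how the multi-seed regression gate is computed. As a rough illustration only, the sketch below assumes each agent version is evaluated once per random seed, producing one pass rate per seed, and that the gate compares mean pass rates with a small tolerance. The function name `regression_gate`, the `tolerance` parameter, and the sample numbers are hypothetical and are not taken from ForgeJudge.

```python
from statistics import mean


def regression_gate(baseline, candidate, tolerance=0.02):
    """Return True when the candidate version passes the gate.

    baseline, candidate: per-seed pass rates (floats in [0, 1]).
    tolerance: hypothetical allowed drop in mean pass rate; the real
    ForgeJudge threshold, if any, is not stated in the listing.
    """
    if not baseline or not candidate:
        raise ValueError("each version needs at least one seed result")
    # Averaging over seeds damps run-to-run noise before comparing versions.
    return mean(candidate) >= mean(baseline) - tolerance


# Made-up pass rates for three seeds per version.
baseline_runs = [0.42, 0.40, 0.44]
candidate_runs = [0.41, 0.43, 0.42]
print(regression_gate(baseline_runs, candidate_runs))  # True
```

In a CI job, a `False` result would typically be turned into a non-zero exit code so the pipeline blocks the new agent version.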

Capabilities

mcp · transport-stdio · open-source · pkg-pypi

Server

URL: https://github.com/ahmedeid1/forgejudge

Transport: stdio

Protocol: mcp

Quality

0.60 / 1.00

Deterministic score 0.60 from registry signals: indexed on pulsemcp · has source repo · registry-generated description present

Provenance

Indexed from: pulsemcp

Enriched: 2026-06-20 05:22:33Z · deterministic:mcp:v1 · v1

First seen: 2026-05-31

Last seen: 2026-06-20

Agent access

JSON: https://clawmart.sh/api/listings/9X6bjb