Benchmark OpenClaw coding agents against repeatable real tasks before rollout with PinchBench
Run a real-task benchmark suite against OpenClaw agents so model or harness changes can be compared before they hit production workflows.
What it does
Run a real-task benchmark suite against OpenClaw agents so model or harness changes can be compared before they hit production workflows.
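Comparing a baseline run with a candidate run can be sketched as below. This is only an illustration under assumptions: it supposes each run has already been reduced to a mapping from task name to a numeric score, and it does not reflect PinchBench's actual output format. The function name `compare_runs`, the task names, and the scores are all hypothetical.

```python
def compare_runs(baseline, candidate, tolerance=0.0):
    """Return {task: score_delta} for tasks whose score dropped by more than `tolerance`.

    `baseline` and `candidate` are hypothetical dicts of task name -> score;
    this is NOT PinchBench's real result format.
    """
    regressions = {}
    for task, base_score in baseline.items():
        if task not in candidate:
            # Tasks missing from the candidate run are skipped, not counted.
            continue
        delta = candidate[task] - base_score
        if delta < -tolerance:
            regressions[task] = delta
    return regressions


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    before = {"task_a": 0.75, "task_b": 0.5}
    after = {"task_a": 0.5, "task_b": 0.5}
    print(compare_runs(before, after))
```

A nonzero `tolerance` may be useful when scores vary slightly between otherwise identical runs.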
Prerequisites
A running OpenClaw instance, Python 3.10+, uv, a checkout of the PinchBench repository, and model provider credentials as documented upstream.
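The local tooling among these prerequisites can be checked before cloning. The following is a minimal sketch, assuming `uv` and `git` are expected on `PATH` (`git` is inferred from the clone step under Installation). The helper name `check_prerequisites` is hypothetical and not part of PinchBench or OpenClaw, and the sketch does not verify the OpenClaw instance or the provider credentials.

```python
import shutil
import sys


def check_prerequisites(version_info=None, which=shutil.which):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: only checks the Python version and that the
    `uv` and `git` executables can be found on PATH.
    """
    if version_info is None:
        version_info = sys.version_info
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    for tool in ("uv", "git"):
        if which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real environment.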
Installation
Use the upstream install or setup path that matches your environment:
- git clone https://github.com/pinchbench/skill.git
Requirements and caveats from upstream:
- Note: Model IDs must include their provider prefix (e.g. openrouter/, anthropic/). OpenRouter is the default provider used for routing.
- Python 3.10+
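The provider-prefix requirement can be checked before a run is started. The sketch below assumes a model ID has the shape `provider/rest`, which is inferred from the `openrouter/` and `anthropic/` examples in the note. The function name `has_provider_prefix` and the model IDs used are hypothetical placeholders, not real model names or PinchBench code.

```python
def has_provider_prefix(model_id: str) -> bool:
    """True if `model_id` looks like 'provider/rest' with both parts non-empty.

    Assumption: the provider prefix is everything before the first '/'.
    This does not check that the provider is one PinchBench supports.
    """
    provider, sep, rest = model_id.partition("/")
    return bool(provider) and bool(sep) and bool(rest)


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    for candidate in ("anthropic/example-model", "example-model"):
        print(candidate, has_provider_prefix(candidate))
```

Splitting on the first `/` only means an ID with further slashes after the provider, such as `openrouter/vendor/example-model`, still passes.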
Basic usage or getting-started notes:
- Tool usage: can the model call the right tools with the right parameters?
- Clone the skill (bash), using the git clone command listed under Installation.
Extracted from upstream docs: https://raw.githubusercontent.com/pinchbench/skill/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills; 8 GitHub stars; SKILL.md body (1,250 chars).