Benchmark browser agents on a fixed stealth and task suite with browser-use benchmark
Compare browser-agent reliability on a repeatable task and anti-bot suite before choosing a stack or claiming progress.
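To decide whether one agent is actually more reliable than another, look at per-task pass/fail outcomes rather than a single headline number. The minimal sketch below assumes you have already exported each run's results as a list of booleans, one per task. That format is hypothetical, and the benchmark's own output may differ. The sketch computes success rates with a 95% Wilson score interval and only calls a candidate "clearly better" when the two intervals do not overlap. This is a deliberately conservative rule of thumb, not the benchmark's own scoring method.

```python
import math
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Success rate for one agent run with a Wilson score interval."""
    name: str
    successes: int
    total: int
    low: float
    high: float

    @property
    def rate(self) -> float:
        return self.successes / self.total


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def summarize(name: str, outcomes: list[bool]) -> RunSummary:
    """Turn per-task pass/fail outcomes (hypothetical format) into a summary."""
    successes = sum(outcomes)
    low, high = wilson_interval(successes, len(outcomes))
    return RunSummary(name, successes, len(outcomes), low, high)


def clearly_better(a: RunSummary, b: RunSummary) -> bool:
    """Conservative check: True only if a's interval lies entirely above b's."""
    return a.low > b.high


# Illustrative numbers only, not real benchmark results.
baseline = summarize("agent-a", [True] * 30 + [False] * 20)
candidate = summarize("agent-b", [True] * 34 + [False] * 16)
print(f"{baseline.rate:.2f} vs {candidate.rate:.2f}", clearly_better(candidate, baseline))
```

With 50 tasks, a jump from 60% to 68% still has overlapping intervals. A gap of that size is a reason to run more tasks, not to claim progress.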
Prerequisites
Python, uv, the benchmark repository's dependencies, API keys for the judge model and the selected browser provider, and a configuration for the browser agent you want to test.
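A quick preflight can catch missing prerequisites before a long evaluation run. The sketch below checks that the uv executable is on PATH and that the API-key variables are set and non-empty. The variable names in REQUIRED_ENV_VARS are hypothetical placeholders, not the benchmark's real key names. Replace them with whatever .env.example actually lists.

```python
import os
import shutil
from typing import Iterable, Mapping, Optional

# Hypothetical placeholder names: replace with the keys your .env.example lists.
REQUIRED_ENV_VARS = ["JUDGE_API_KEY", "BROWSER_PROVIDER_API_KEY"]
REQUIRED_TOOLS = ["uv"]


def missing_tools(tools: Iterable[str]) -> list[str]:
    """Return the executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_env(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the variables that are unset or blank in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


def preflight() -> bool:
    """Print any problems and return True only when everything is present."""
    problems = [f"missing tool: {t}" for t in missing_tools(REQUIRED_TOOLS)]
    problems += [f"missing or empty env var: {n}" for n in missing_env(REQUIRED_ENV_VARS)]
    for problem in problems:
        print("-", problem)
    return not problems
```

Note that missing_env reads the process environment by default. If your keys live only in .env, load that file first or pass its parsed values as env.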
Installation
Use the upstream install or setup path that matches your environment:
- pip install uv
- uv sync
- uv run python run_eval.py --browser <provider>
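To run the same suite against several providers in one pass, a thin wrapper around the documented command is enough. The sketch below uses only the --browser flag shown above. The provider names are hypothetical placeholders, and the sketch assumes you run it from the repository root after uv sync. By default it performs a dry run that only prints the commands.

```python
import subprocess
from typing import Sequence

# Hypothetical provider names: substitute the values run_eval.py actually accepts.
PROVIDERS = ["provider-a", "provider-b"]


def eval_command(provider: str) -> list[str]:
    """Build the documented invocation: uv run python run_eval.py --browser <provider>."""
    return ["uv", "run", "python", "run_eval.py", "--browser", provider]


def run_all(providers: Sequence[str], dry_run: bool = True) -> dict[str, int]:
    """Run (or, with dry_run, just print) one evaluation per provider.

    Returns each provider's exit code; dry runs report 0 without executing anything.
    """
    results: dict[str, int] = {}
    for provider in providers:
        cmd = eval_command(provider)
        print(" ".join(cmd))
        if dry_run:
            results[provider] = 0
            continue
        # check=False so one failing provider does not abort the remaining runs.
        results[provider] = subprocess.run(cmd, check=False).returncode
    return results


run_all(PROVIDERS)  # dry run; pass dry_run=False from the repo root to execute
```

Running providers sequentially rather than in parallel keeps rate limits and browser sessions from interfering with each other. That matters when the goal is a repeatable comparison.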
Requirements and caveats from upstream:
- python -c "
Basic usage or getting-started notes:
- Set up your .env (see .env.example):
  cp .env.example .env
- Run the evaluation with the run_eval.py command listed under Installation.
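The .env step can also be scripted so that an existing .env is never overwritten and you can see which keys still need values. This is a minimal sketch using only the standard library. Its parser handles plain KEY=VALUE lines, comments, an optional export prefix, and one layer of quotes. It is not a full dotenv implementation: it has no multiline values, inline comments, or interpolation. The helper names are illustrative, not part of the benchmark.

```python
import shutil
from pathlib import Path


def ensure_env_file(repo: Path) -> Path:
    """Copy .env.example to .env (like `cp .env.example .env`) unless .env already exists."""
    target = repo / ".env"
    if not target.exists():
        shutil.copyfile(repo / ".env.example", target)
    return target


def parse_env(path: Path) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, strips one layer of quotes."""
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def blank_keys(path: Path) -> list[str]:
    """Keys present in the file but still without a value."""
    return [key for key, value in parse_env(path).items() if not value]


# Usage from the repository root: print(blank_keys(ensure_env_file(Path("."))))
```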
Extracted from upstream docs: https://raw.githubusercontent.com/browser-use/benchmark/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed under GitHub topic:agent-skills, 8 GitHub stars, SKILL.md body of 1,135 characters.