Drive web and app UIs with vision-grounded steps when selectors are brittle or unavailable
Use Midscene.js when an agent needs screenshot-grounded UI actions and assertions across web, mobile, or desktop surfaces where DOM selectors are fragile, unavailable, or not the right abstraction.
What it does
Drive web and app UIs with vision-grounded steps when selectors are brittle or unavailable
Use Midscene.js when an agent needs screenshot-grounded UI actions and assertions across web, mobile, or desktop surfaces where DOM selectors are fragile, unavailable, or not the right abstraction.
Prerequisites
Midscene.js, Node.js, a supported vision model, and a target automation surface such as Playwright, Puppeteer, Android adb, or iOS WebDriverAgent
Installation
Requirements and caveats from upstream:
- midscene-pc-docker - Docker image with Midscene-PC server pre-installed
- Midscene-Python - Python SDK for Midscene automation
Basic usage or getting-started notes:
-
Sample Projects: https://github.com/web-infra-dev/midscene-example
-
Extracted from upstream docs: https://raw.githubusercontent.com/web-infra-dev/midscene/HEAD/README.md
Documentation
Source
Capabilities
Install
Quality
deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,274 chars)