peekaboo
Native macOS GUI automation via peekaboo 3.0+. AX-first hybrid with VLM fallback, FSM-based workflows with bug-specific transitions. Safari web automation, native app control, social media operations, system dialogs, cross-app workflows.
What it does
Peekaboo
Native macOS GUI automation via peekaboo, using an AX-first hybrid approach that prioritizes accessibility-tree interaction when reliable and switches to screenshot, coordinate, JavaScript, or vision-based fallbacks when AX coverage breaks down in real apps.
Terminology used in this file:
- AX (Accessibility tree): The macOS accessibility hierarchy used for deterministic element discovery and interaction (
B1,T2, etc.). - peekaboo: The macOS automation CLI used for discovery (
see,image), interaction (click,type,hotkey), and app/window/dialog control. - UI-TARS: Vision model server used when AX discovery fails or times out on complex UIs.
- FSM: Finite State Machine controlling retries, transitions, and recovery paths.
- Retina scale: Display pixel ratio (typically
2.0) used to map screenshot coordinates correctly on Retina displays. - snapshot ID: The
snapshot_idreturned bysee; required for stable element resolution and invalidated by UI changes.
Setup
brew install jq
brew install --cask peekaboo
mkdir -p ./scripts
chmod +x ./scripts/peekaboo-safe.sh ./scripts/health-check.sh
./scripts/health-check.sh
- Grant Accessibility and Screen Recording permissions for your terminal agent host (for example, Terminal, iTerm2, Claude Code).
- Use
./scripts/peekaboo-safe.shas the only invocation entrypoint for peekaboo commands. - For full setup details and troubleshooting, see
references/installation-guide.mdif present in your local skill copy.
Staying Updated
This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.
After installing, tell your agent: "Check UPDATES.md in the peekaboo skill for any new features or changes."
When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."
Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.
Quick Start
Run a 3-step baseline flow: health check, discover, interact.
# 1) Health check
./scripts/health-check.sh
# 2) Discover UI state
./scripts/peekaboo-safe.sh see --app Safari --json --path /tmp/peekaboo/
# 3) Interact
./scripts/peekaboo-safe.sh click --on B1 --app Safari
Hard Rules (MUST FOLLOW)
- All peekaboo calls MUST use
./scripts/peekaboo-safe.shwrapper - never directpeekaboocalls - Sequential execution only - no parallel GUI operations (causes race conditions)
- NEVER run
tccutil reset- resets ALL app permissions, not just one - Always
--mode screen- window mode returns 698B stubs (beta3 bug) - Health check required - run
./scripts/health-check.shbefore automation - Fresh snapshots mandatory - re-run
seebefore each click sequence (IDs expire) - Use see --analyze for Q&A - 50x more token-efficient than full screenshots
Navigation Strategy: Navigate Programmatically, Interact Visually
Core principle: Never waste turns clicking through UI to reach a target. Get there programmatically, then use peekaboo for the last-mile interaction.
| Method | When | Example |
|---|---|---|
| URL scheme | System Settings, Safari pages | open "x-apple.systempreferences:com.apple.Desktop-Settings.extension" |
| AppleScript | Apps with scripting dictionary | osascript -e 'tell app "Finder" to open folder "Desktop"' |
| CLI/flags | Apps with CLI args | open -a TextEdit /path/to/file |
| Menu command | Navigate within already-open app | ./scripts/peekaboo-safe.sh menu click --path "View > Show Sidebar" |
| AX sidebar click | Last resort - only if above fail | Unreliable for tightly-packed items (System Settings proven broken) |
Hostile AX apps (use URL scheme / AppleScript only):
- System Settings - SwiftUI toggles invisible to AX roles, sidebar clicks misfire
Routing Algorithm (Interaction Tier)
if target_is_native_macos_ui():
use_ax_interaction() # click --on B1, type, hotkey
elif target_is_swiftui_toggle():
use_screenshot_coordinate_click() # screenshot -> identify -> click --coords
elif target_is_web_form():
use_safari_js_injection() # osascript JavaScript
elif target_is_web_button():
use_coordinate_click() # click --coords X,Y --no-auto-focus
elif see_timeout_complex_app():
start_ui_tars_server() # vision provider required
elif dialog_or_sheet():
add_no_auto_focus_flag() # prevent focus conflicts
else:
escalate_to_human()
FSM State Machine
| State | Entry Condition | Actions | Exit Criteria | Retry Budget | Next State |
|---|---|---|---|---|---|
| PREFLIGHT | Start | health-check.sh, mkdir /tmp/peekaboo | all_checks_pass | 1 | CLASSIFY_TARGET |
| CLASSIFY_TARGET | Health pass | Determine interaction tier | tier_identified | 1 | ACQUIRE_WINDOW |
| ACQUIRE_WINDOW | Target known | app switch, window focus | app_frontmost | 3 | WAIT_FOR_UI_STABLE |
| WAIT_FOR_UI_STABLE | Window focused | sleep, check spinner state | ui_elements_stable | 2 | DISCOVER |
| DISCOVER | UI stable | see --app --json --path | elements_found | 3 | INTERACT |
| INTERACT | Elements found | click/type/menu based on tier | action_completed | 5 | VERIFY |
| VERIFY | Action attempted | see/image for state change | success_confirmed | 2 | EXTRACT or CLEANUP |
| EXTRACT | Data needed | see --analyze or JS injection | data_extracted | 2 | CLEANUP |
| CLEANUP | Task complete | clean snapshots, restore clipboard | cleanup_done | 1 | DONE |
| RECOVER | Error detected | Retry from previous state | retry_limit_hit | 3 | HUMAN_HANDOFF or ABORT |
| HUMAN_HANDOFF | Auth/CAPTCHA/TCC | Log issue, preserve evidence | human_resolved | inf | CLASSIFY_TARGET |
| ABORT | Fatal error | Log failure, preserve evidence | - | - | - |
| DONE | Success | Return results | - | - | - |
Bug-Specific Transitions
seetimeout -> start UI-TARS -> retry DISCOVER- Safari AX click misfires -> force coordinate click path
typetargets URL bar -> force JS injection path- daemon file-not-found ->
daemon run --mode manual & - Chrome window capture fail -> switch to Safari
- System Settings AX sidebar misfire -> use URL scheme:
open "x-apple.systempreferences:com.apple.PANE_ID" - SwiftUI toggle invisible to AX -> screenshot + coordinate click (not AX click)
Core Commands (Canonical Forms Only)
Discovery
./scripts/peekaboo-safe.sh see --app AppName --json --path /tmp/peekaboo/./scripts/peekaboo-safe.sh see --analyze "What is the current state?" --app AppName./scripts/peekaboo-safe.sh image --mode screen --path /tmp/peekaboo/screenshot.png
Interaction
./scripts/peekaboo-safe.sh click --on B1 --snapshot "$sid" --app AppName./scripts/peekaboo-safe.sh click --coords X,Y --no-auto-focus --app AppName./scripts/peekaboo-safe.sh type "text" --return --profile human --app AppName./scripts/peekaboo-safe.sh hotkey --keys "cmd,w" --app AppName
App Control
./scripts/peekaboo-safe.sh app switch --to AppName./scripts/peekaboo-safe.sh window focus --window-title "Title" --app AppName./scripts/peekaboo-safe.sh menu click --path "File > Save As..." --app AppName
Safari Workarounds
- Navigate:
./scripts/peekaboo-safe.sh open "https://url" --app Safari - Form fill:
osascript -e 'tell application "Safari" to do JavaScript "document.querySelector(\"input[name=field]\").value = \"value\"" in document 1' - Button click:
./scripts/peekaboo-safe.sh click --coords X,Y --no-auto-focus --app Safari
Agent Mode
./scripts/peekaboo-safe.sh agent "Natural language task" --model gpt-4o --max-steps 10./scripts/peekaboo-safe.sh agent --resume --max-steps 5
Token Budget
see --analyze ~100 tokens | see --json ~600 | image --mode screen ~1400. Soft limit 50 turns, hard limit 100. Prefer see --analyze over full AX trees. Details in references/token-budget.md.
Capability Matrix
| Feature Area | Status | Last Validated | Notes |
|---|---|---|---|
| Native macOS UI | validated | 2026-02-22 | 20 tests passed, core capability |
| Agent mode | validated | 2026-02-22 | Multi-step automation works |
| Safari forms | partial | 2026-02-22 | JS injection required, see playbooks |
| System dialogs | validated | 2026-02-22 | Use --no-auto-focus |
| File pickers | partial | 2026-02-22 | Compact mode issues |
| Menu navigation | validated | 2026-02-22 | App menus work, Apple menu limited |
| Spaces management | validated | 2026-02-22 | Full virtual desktop support |
| Live capture | validated | 2026-02-22 | Recording and diff detection |
| Chrome automation | unsupported | 2026-02-22 | Issue #67, use Safari |
| Social media ops | validated | 2026-02-22 | Reddit shadow DOM login, Booking URL-first search, X signup, cookie extraction, CAPTCHA detection -- see playbooks |
Playbook Index
Each playbook is a self-contained field-tested recipe. Status: validated = tested in production, experimental = untested or partially tested.
| Playbook | Use Case | Status |
|---|---|---|
native-app-automation.md | Any macOS app workflow | validated |
safari-login.md | Generic web authentication | validated |
dialog-and-file-picker.md | System dialogs, save/open | validated |
cross-app-data-transfer.md | Clipboard transfer between apps | validated |
reddit-login.md | Reddit auth (shadow DOM + React events) | validated |
reddit-data-extraction.md | Post Insights extraction (author-only analytics) | validated |
booking-search.md | Hotel search (URL-first, VLM extraction) | validated |
x-signup.md | X/Twitter account registration + email verification | validated |
file-upload.md | macOS file dialog from web apps (Cmd+Shift+G) | validated |
captcha-solver.md | CAPTCHA detection and handling | experimental |
cookie-extractor.md | Browser session cookie extraction | experimental |
instagram-monitor.md | Instagram content monitoring | experimental |
Reference Index
references/commands.md- Complete command reference with all flagsreferences/safari-workarounds.md- Web automation canonical patternsreferences/patterns.md- Common automation workflows and recipesreferences/failures.md- Error recovery procedures and symptom tablereferences/validated-tests.md- Field validation results and capabilitiesreferences/token-budget.md- Token costs and optimization strategiesreferences/hard-tests.md- Edge-case and stress test catalogreferences/installation-guide.md- Full setup from zero to working
Pre-flight Checklist
./scripts/health-check.sh
# Must return: cli_exists=true, accessibility_granted=true, ax_test_success=true
mkdir -p /tmp/peekaboo
Error Recovery
After 3 retries on same state: log state + evidence screenshot + escalate to HUMAN_HANDOFF. Quick recovery: app switch + fresh see + health-check.sh.
Capabilities
Install
Quality
deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 15 github stars · SKILL.md body (11,095 chars)