agent-computer-use
REQUIRED for any task that involves operating a desktop application — opening apps, clicking buttons, typing into fields, pressing keys, scrolling, dragging, reading what's on screen, moving or resizing windows, or verifying state after an action. Always use the `agent-cu` CLI co
What it does
agent-computer-use — Computer use Skill
You have access to agent-cu, a CLI tool that controls desktop applications. You can click buttons, type text, read screens, scroll, drag files, move windows — all from the terminal.
First-run setup (user action)
The first time you run an agent-cu command, Claude Code prompts the user to approve it. The user has three options, in order of preference:
-
Easiest — run
agent-cu setupin a terminal. Interactive wizard that writesBash(agent-cu *)to the right settings file. One-time. If the user seems frustrated by repeated approval prompts, suggest this.If you (the agent) are running it yourself and can't answer prompts, use the non-interactive flags:
agent-cu setup --yes→ unsupervised + global (recommended default)agent-cu setup --mode <unsupervised|supervised> --scope <global|project>for specific combos
-
Quickest — pick "Yes, and don't ask again for:
agent-cu *" in the approval prompt. Covers only that specific subcommand pattern; you'll get prompted again for other subcommands. -
Manual — add
"Bash(agent-cu *)"to~/.claude/settings.jsonunderpermissions.allow.
Do not attempt to modify the user's settings file yourself — direct them to agent-cu setup instead.
How to think
Think like a human sitting at the computer. Before you act, ask yourself: what would I see on screen? What would I click? What would I type?
A human:
- Looks at the screen (snapshot)
- Finds what they need (identify refs)
- Does one action (click, type, key)
- Checks what changed (re-snapshot)
You must do the same. Never skip steps. Never assume the UI didn't change after an action.
Core loop
snapshot → identify → act → verify
agent-cu snapshot -a Music -i -c # what's on screen?
# read the output, find the right @ref
agent-cu click @e5 # do one thing
agent-cu snapshot -a Music -i -c # what changed?
Every action changes the UI. Your previous refs are now stale. Always re-snapshot.
Opening apps
Always wait for the app to be ready before doing anything:
agent-cu open Safari --wait
agent-cu snapshot -a Safari -i -c
Never interact with an app you haven't opened and snapshotted first.
Finding elements
Step 1: Snapshot with -i -c (interactive + compact):
agent-cu snapshot -a Calculator -i -c
This shows only clickable/typeable elements with refs like @e1, @e5, @e12.
Step 2: Read the output. Find the element you need by its name, role, or id.
Step 3: Use the ref. Refs are the fastest and most reliable way to target elements.
If elements are missing, increase depth:
agent-cu snapshot -a Safari -i -c -d 8
Clicking
For buttons, links, menu items — use click:
agent-cu click @e5 # single click (AXPress, headless)
agent-cu click @e5 --count 2 # double-click (opens files, plays songs)
click tries AXPress first (background, no focus steal). Only falls back to mouse simulation for double-click or right-click.
For elements with stable IDs (won't change between snapshots):
agent-cu click 'id="play"' -a Music
agent-cu click 'id~="track-123"' -a Music # partial id match
Typing
With a target element (preferred — uses AXSetValue, headless):
agent-cu type "hello world" -s @e3
Into the focused field (keyboard simulation, needs app focus):
agent-cu type "hello world" -a Safari
Always prefer -s @ref when you have a ref. It's more reliable.
Key presses
agent-cu key Return -a Calculator
agent-cu key cmd+k -a Slack
agent-cu key cmd+a -a TextEdit
agent-cu key Escape -a Safari
Scrolling
agent-cu scroll down -a Music # scroll main content area
agent-cu scroll down --amount 10 -a Music # scroll more
agent-cu scroll-to @e42 # scroll element into view (headless)
Scroll needs the app to be focused. Use scroll-to for headless.
Reading content
agent-cu text -a Calculator # all visible text
agent-cu get-value @e5 # one element's value/state
agent-cu get-value 'id="title"' -a Music # by selector
Use get-value on specific elements instead of text on large apps.
Window management
agent-cu move-window -a Notes --x 100 --y 100
agent-cu resize-window -a Notes --width 800 --height 600
agent-cu windows -a Finder # get window positions and sizes
These are instant and headless — use AXSetPosition/AXSetSize.
Drag and drop
Drag needs the app to be focused and two visible, non-overlapping areas.
Think like a human: you need to see both the source and destination.
# Step 1: Set up windows side by side
agent-cu move-window -a Finder --x 0 --y 25
agent-cu resize-window -a Finder --width 720 --height 475
# (open a second Finder window for destination)
# Step 2: Snapshot to find the file
agent-cu snapshot -a Finder -i -c -d 8
# Step 3: Get the file's position
agent-cu get-value @e32 # check position
# Step 4: Drag to destination
agent-cu drag @e32 @e50 -a Finder # drag by refs
# or by coordinates:
agent-cu drag --from-x 300 --from-y 55 --to-x 1000 --to-y 200 -a Finder
Selector syntax
Refs (always prefer these)
@e1, @e2, @e3 # from most recent snapshot
DSL
'role=button name="Submit"' # role + exact name
'name="Login"' # exact name
'id="AllClear"' # exact id (most stable)
'id~="track-123"' # id contains (case-insensitive)
'name~="Clear"' # name contains (case-insensitive)
'button "Submit"' # shorthand: role name
'"Login"' # shorthand: just name
'role=button index=2' # 3rd match (0-based)
'css=".my-button"' # CSS selector (Electron apps only)
Chains
'id=sidebar >> role=button index=0' # first button inside sidebar
'name="Form" >> button "Submit"' # submit button inside form
Electron apps (CDP)
Electron apps (Slack, Cursor, VS Code, Postman, Discord) are automatically detected. agent-cu relaunches them with CDP support on first use.
Everything works headless — no window activation, no mouse, no focus steal:
agent-cu snapshot -a Slack -i -c # full DOM tree via CDP
agent-cu click @e5 # JS element.click()
agent-cu key cmd+k -a Slack # CDP key dispatch
agent-cu type "hello" -a Slack # CDP insertText
agent-cu scroll down -a Slack # JS scrollBy()
agent-cu text -a Slack # document.body.innerText
Typing in Electron apps: insertText goes to the focused element. If you need to type into a specific input:
agent-cu snapshot -a Slack -i -c # find the input ref
agent-cu click @e18 # click to focus it
agent-cu key cmd+a -a Slack # select all
agent-cu key backspace -a Slack # clear
agent-cu type "your text" -a Slack # now type
Verification
Never assume an action worked. Verify by checking a state-bearing attribute, not just by looking at the tree again.
The id vs name distinction (critical)
Many apps give a button a fixed id (the slot) and a changing name (the current label).
Music's transport button is the canonical example:
idis always"play"— it identifies the button as "the transport button", even when currently playing.nameflips between"play"and"pause"depending on playback state.
To detect state, read name, not id:
# check if music is playing
agent-cu find 'id="play"' -a Music --compact
# → [{"name":"pause", ...}] ← means: playback is ON
# → [{"name":"play", ...}] ← means: playback is OFF
The same pattern appears in many apps: bookmark/unbookmark, mute/unmute, expand/collapse, follow/unfollow. When you want to confirm a toggle worked, always read the element's current name after the action.
Inline verification with --expect
agent-cu click @e5 --expect 'name="Dashboard"'
# clicks, then polls for an element with name="Dashboard". Fails if it never appears.
Reading values
agent-cu get-value @e3 # one element's value + role + position
agent-cu find 'id="play"' -a Music --compact # most stable if id is known
agent-cu snapshot -a Safari -i -c # broad check
Idempotent typing
agent-cu ensure-text @e3 "hello" # only types if value differs
Reading dynamic computed values (e.g., Calculator result)
Some apps don't surface the result as a normal value on a labeled element — it's hidden in a staticText node. Use tree and walk for any node with a value:
agent-cu tree -a Calculator -d 8 --compact | python3 -c "
import json, sys
d = json.load(sys.stdin)
def walk(n):
if n.get('value'): print(n.get('role'), '=', repr(n['value']))
for c in n.get('children', []): walk(c)
walk(d)
"
# → staticText = '1,234×7'
# → staticText = '8,638'
Locale gotcha: numbers are locale-formatted. Indian locale shows 7^8 = 57,64,801, international shows 5,764,801. They're the same value. Before comparing, strip commas and spaces.
Waiting
When UI takes time to load:
agent-cu wait-for 'name="Dashboard"' # poll until element appears
agent-cu wait-for 'role=button' --timeout 15
sleep 2 # simple delay after navigation
Batch operations
Chain multiple commands to avoid per-command startup:
echo '[["click","@e5"],["key","Return","-a","Music"]]' | agent-cu batch --bail
Real-world patterns
Search and play a song in Music (verified flow)
# 1. Open and snapshot
agent-cu open Music --wait
agent-cu snapshot -a Music -i -c
# → @e1 is the Search sidebar item
# 2. Click Search
agent-cu click @e1 -a Music
# 3. Type into the search field — use role=textField, not a ref (the ref
# for the search field changes as the view switches)
agent-cu type "Espresso Sabrina Carpenter" -s 'role=textField' -a Music --submit
sleep 2 # let search results populate
# 4. Pick a result. Grep the snapshot for items matching the track name —
# the `id` embeds a stable catalog id, so grab that.
agent-cu snapshot -a Music -c | grep -i "espresso" | head -5
# → [@e53] other("axcell") "Espresso" id=Music.shelfItem.TopSearchLockup[id=top-search-section-top-1744253558,...]
# 5. Open the album (double-click). Use the full id string, not the ref —
# refs can drift between snapshots during long flows.
agent-cu click 'id="Music.shelfItem.TopSearchLockup[id=top-search-section-top-1744253558,parentId=top-search-section-top]"' -a Music --count 2
sleep 2
# 6. Find the track row and play. The track has a stable id pattern.
agent-cu snapshot -a Music -c | grep "track-lockup" | head -3
# → [@e52] group "Espresso" id=Music.shelfItem.AlbumTrackLockup[...]
# 7. Select, then try Return to play. If that fails, fall back to the
# transport play button directly.
agent-cu click @e52 -a Music
agent-cu key Return -a Music
sleep 1
# 8. Verify via the transport button's `name` — "pause" means playing.
agent-cu find 'id="play"' -a Music --compact
# if name="play", playback didn't start. Fallback:
agent-cu click 'id="play"' -a Music
Key lessons from this flow:
- Refs (
@e53) can go stale between snapshots separated by major UI changes. Prefer fullid="..."for cross-snapshot targeting. type -s 'role=textField' --submitcombines typing, clearing, and pressing Return reliably.- Double-click on a search result often opens the item, not plays it. Drill into the detail view, then trigger playback.
- Always verify playback via the
nameof the transport button (id="play"is the slot;nameholds state). - If
Returndoesn't trigger playback, click the transport play button as a fallback. Have a plan B.
Compute a multi-step calculation in Calculator (verified flow)
# 1. Open and snapshot — Calculator buttons have stable ids (Seven, Multiply, Equals, etc.)
agent-cu open Calculator --wait
agent-cu snapshot -a Calculator -i -c
# 2. Use batch for the whole keystroke sequence. Avoids 17 per-process starts.
# Example: 7^8 = 7×7×7×7×7×7×7×7 = 5,764,801
echo '[
["click","id=\"AllClear\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Multiply\"","-a","Calculator"],
["click","id=\"Seven\"","-a","Calculator"],
["click","id=\"Equals\"","-a","Calculator"]
]' | agent-cu batch
# 3. Read the result — walk the tree for any staticText with a value.
agent-cu tree -a Calculator -d 8 --compact | python3 -c "
import json, sys
d = json.load(sys.stdin)
for n in [d]:
def walk(n):
if n.get('value'): print(n.get('role'), '=', repr(n['value']))
for c in n.get('children', []): walk(c)
walk(n)
"
# → staticText = '7×7×7×7×7×7×7×7' ← expression
# → staticText = '57,64,801' ← result (Indian locale; strip commas → 5764801)
Key lessons:
- When an app exposes stable
id=on every interactive element (Calculator does), skip snapshotting between clicks — just batch them. - The result isn't on a labeled element; walk the tree for nodes with a
value. - Numbers come locale-formatted. Strip commas/spaces before parsing.
Open a DM in Slack and send a message
agent-cu key cmd+k -a Slack
sleep 1
agent-cu snapshot -a Slack -i -c
# find and click the search input
agent-cu click @e18
agent-cu key cmd+a -a Slack
agent-cu key backspace -a Slack
agent-cu type "Vukasin" -a Slack
sleep 1
agent-cu key Return -a Slack
sleep 2
agent-cu type "hey, check this out" -a Slack
agent-cu key Return -a Slack
Check calendar events
agent-cu open Calendar --wait
agent-cu snapshot -a Calendar -i -c -d 6
# read the visible dates
agent-cu text -a Calendar
# navigate to next month
agent-cu click @e3 # next month button
sleep 1
agent-cu text -a Calendar
Fill a web form
agent-cu open Safari --wait
agent-cu snapshot -a Safari -i -c
agent-cu type "https://example.com/form" -s @e34
agent-cu key Return -a Safari
sleep 3
agent-cu snapshot -a Safari -i -c -d 8
agent-cu type "John Doe" -s @e5
agent-cu type "john@example.com" -s @e6
agent-cu type "Hello world" -s @e7
agent-cu click @e8 # submit button
agent-cu snapshot -a Safari -i -c # verify submission
Drag a file between Finder windows
# set up side-by-side windows
agent-cu move-window -a Finder --x 0 --y 25
# (ensure two windows open, Downloads left, Desktop right)
agent-cu snapshot -a Finder -i -c -d 8
# find the file (look for textfield with val="filename")
agent-cu click @e32 # select the file
agent-cu drag --from-x 300 --from-y 55 --to-x 1000 --to-y 200 -a Finder
Browse App Store
agent-cu open "App Store" --wait
agent-cu snapshot -a "App Store" -i -c -d 10
agent-cu click 'id="AppStore.tabBar.discover"' -a "App Store"
sleep 2
agent-cu scroll down --amount 5 -a "App Store"
agent-cu snapshot -a "App Store" -i -c -d 10
agent-cu text -a "App Store"
Rules
- Always snapshot before acting. You cannot interact with what you cannot see.
- Always re-snapshot after acting. The UI changed. Your refs are stale.
- Refs for the next step, IDs for the long haul. Refs (
@e5) are fastest for the immediate next action after a snapshot. Fullid="..."selectors survive across snapshots and are the right choice for anything you'll return to after other UI changes. - Use
-i -con snapshots. Interactive + compact reduces noise by 10x. - Prefer
id=overname=when the app provides stable ids. IDs don't change with state; names often do. - For state detection, read the
nameattribute — notid. Many apps keepidas a fixed slot and flipnameto reflect current state (play/pause, mute/unmute, bookmark/unbookmark). - Wait after navigation.
wait-fora known landmark first; fall back tosleep 2-3if there's none. - One action, then verify. Don't chain multiple mutations without checking state between them.
- Use
type -s @refovertype -a App. The selector path uses AXSetValue (reliable). The app path uses keyboard simulation (fragile, needs focus). - Use
scroll-to @refwhen you know the element. It's headless.scroll downneeds focus. - Never hardcode app-specific logic. Don't write a helper like
play_song_in_music(). Use the same primitives (snapshot,click,type,key,find) for every app. The scene tree tells you what to do — read it, act on it, verify. - Have a fallback for every action. If a
clickdoesn't produce the expected state change, escalate: double-click → scroll-to + click → keyboard shortcut → direct action button (e.g., transportid="play"instead of track-row Return). See the Recovery playbook. - Batch when ids are stable and numerous. Calculator, forms, keypad flows benefit from piping a JSON array into
agent-cu batch— one process start instead of N.
Recovery playbook
When something doesn't work, the principle is: diagnose the exact failure mode, then apply the matching fallback. Never retry the same command blindly.
Element not found (stale ref)
Symptom: error: element not found matching SelectorChain { ... } on a @ref that came from an earlier snapshot.
Cause: refs resolve to a specific tree path; major UI changes (navigation, scroll, modal open) invalidate them.
Fix: re-snapshot and use the new ref, or switch to a stable id=:
agent-cu snapshot -a Safari -i -c
# …find it again, or use the full id you saw earlier:
agent-cu click 'id="the-stable-id"' -a Safari
Prefer id="..." for any target you'll act on across multiple snapshots. Refs are for single-step actions only.
Ambiguous selector
Symptom: error: ambiguous selector: found N elements matching.
Fix: add index= or narrow by role/name:
agent-cu click 'role=button index=0' -a Music
agent-cu click 'role=button name="Save"' -a TextEdit
Click succeeded but nothing happened
Symptom: {"success": true, "message": "clicked at (X,Y)"} but the expected state change (new page, playback, menu) did not occur.
Diagnose in order:
- Was AXPress used or did it fall back to coordinates?
pressed "Name" at (X,Y)= AXPress succeeded (best).clicked at (X,Y)= coordinate click; the element might have moved or be obscured.
- Is the element's state actually what you think? Read the
nameattribute of the state-bearing element (e.g., Music's transport button), not just the tree. - Does the action require focus? Some apps (especially web UI) need the app to be frontmost.
agent-cu open App --waitactivates it.
Fallback ladder:
# Ladder 1: AXPress failed — try double-click
agent-cu click @e5 --count 2
# Ladder 2: Element offscreen — bring it in, then click
agent-cu scroll-to @e5
agent-cu click @e5
# Ladder 3: The clicked element isn't the one that triggers the action —
# drill into the container and find the explicit action button
agent-cu snapshot -a Music -c | grep -E "play|pause|Next"
agent-cu click 'id="play"' -a Music
# Ladder 4: Keyboard shortcut fallback
agent-cu key Return -a Music # activate selected item
agent-cu key space -a Music # play/pause
Type didn't land
Symptom: text didn't appear in the field.
Causes + fixes:
# Cause: no ref — fell back to keyboard sim which needs app focus
agent-cu type "hello" -a Safari
# Fix: use selector path (AXSetValue, reliable)
agent-cu type "hello" -s @e3
# Cause: field wasn't focused when keyboard sim fired
agent-cu click @e3 # focus it first
agent-cu type "hello" -s @e3 # then type
# Cause: existing value not cleared
agent-cu type "new" -s @e3 --append # wrong: keeps old text
agent-cu type "new" -s @e3 # right: clears first
Search submitted but results aren't in the tree yet
Symptom: snapshot right after --submit shows the old page.
Fix: wait, then re-snapshot. Prefer polling over sleep:
agent-cu type "query" -s 'role=textField' -a Music --submit
agent-cu wait-for 'name~="Top Results"' --timeout 5
agent-cu snapshot -a Music -i -c
If wait-for doesn't know a stable landmark, fall back to sleep 2.
Snapshot is too shallow — can't find the element
Bump depth:
agent-cu snapshot -a Safari -i -c -d 8
agent-cu snapshot -a Safari -i -c -d 12 # for deeply nested apps
Ref points to the wrong thing
Symptom: @e5 clicks something but it's not what you expected.
Cause: the snapshot re-ran silently (you took two) and refs got renumbered.
Fix: always inspect the ref's resolved element before acting on anything important:
agent-cu get-value @e5 -a Safari
# confirm role, name, id are what you expect, THEN click
Electron app not using CDP
agent-cu auto-detects Electron apps. If CDP isn't working:
agent-cu snapshot -a Slack -i -c -v # verbose shows CDP status
First run takes ~5s as the app relaunches with --remote-debugging-port. Every subsequent run uses a cached connection (~15ms).
DSL selector parse error
Symptom: unknown filter key: 'id_contains'.
Fix: the syntax is id~= (tilde-equals) for partial match, not id_contains=:
agent-cu click 'id~="track-"' -a Music
agent-cu click 'name~="Submit"' -a Safari
Output format
All output is JSON:
{"success": true, "message": "pressed \"7\" at (453, 354)"}
{"error": true, "type": "element_not_found", "message": "..."}
{"role": "button", "name": "Submit", "value": null, "position": {"x": 450, "y": 320}}
Capabilities
Install
Quality
deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 10 github stars · SKILL.md body (22,689 chars)