{"id":"e7e25037-7a7c-4c9e-9c99-600b8e59cd1a","shortId":"HwUaKZ","kind":"skill","title":"Tarsier Vision Utilities for Web Interaction Agents","tagline":"Tarsier is a Python library by Reworkd that provides vision utilities for AI web interaction agents. It visually tags interactable elements on web pages with bracketed IDs, enabling LLMs to take actions like CLICK [23], and includes an OCR algorithm that converts page screenshots","description":"# Tarsier Vision Utilities for Web Interaction Agents\n\nTarsier is a Python library by Reworkd that provides vision utilities for AI web interaction agents. It visually tags interactable elements on web pages with bracketed IDs, enabling LLMs to take actions like CLICK [23], and includes an OCR algorithm that converts page screenshots into whitespace-structured text representations that even text-only LLMs can understand.\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- pip install tarsier\n- npm run build\n\nRequirements and caveats from upstream:\n- <img alt=\"Python\" src=\"https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54\" />\n- python\n- This compiles the TypeScript into JavaScript, which can then be utilized in the Python package.\n\nBasic usage or getting-started notes:\n- If you've tried using an LLM to automate web interactions, you've probably run into questions like:\n- shell\n- Visit our [cookbook](https://github.com/reworkd/Tarsier/tree/main/cookbook) for agent examples using Tarsier:\n\n- Source: https://github.com/reworkd/tarsier\n- Extracted from upstream docs: https://raw.githubusercontent.com/reworkd/tarsier/HEAD/README.md\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/tarsier-vision-utilities-web-interaction-agents/)","tags":["tarsier","vision","utilities","web","interaction","agents","skills","agentskillexchange","agent-skills","ai-agents","ai-tools","awesome-list"],"capabilities":["skill","source-agentskillexchange","skill-tarsier-vision-utilities-web-interaction-agents","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/tarsier-vision-utilities-web-interaction-agents","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,348 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:12:44.780Z","embedding":null,"createdAt":"2026-05-18T13:19:48.820Z","updatedAt":"2026-05-18T19:12:44.780Z","lastSeenAt":"2026-05-18T19:12:44.780Z","tsv":"'/reworkd/tarsier':196 '/reworkd/tarsier/head/readme.md':203 '/reworkd/tarsier/tree/main/cookbook)':187 '/skills/tarsier-vision-utilities-web-interaction-agents/)':210 '23':42,93 'action':39,90 'agent':7,23,58,74,189,205 'agentskillexchange.com':209 'agentskillexchange.com/skills/tarsier-vision-utilities-web-interaction-agents/)':208 'ai':20,71 'algorithm':47,98 'autom':171 'basic':156 'bracket':33,84 'build':134 'caveat':137 'click':41,92 'compil':142 'convert':49,100 'cookbook':184 'doc':200 'element':28,79 'enabl':35,86 'environ':128 'even':110 'exampl':190 'exchang':207 'extract':197 'get':160 'getting-start':159 'github.com':186,195 'github.com/reworkd/tarsier':194 'github.com/reworkd/tarsier/tree/main/cookbook)':185 'id':34,85 'includ':44,95 'instal':117,121,130 'interact':6,22,27,57,73,78,173 'javascript':146 'librari':12,63 'like':40,91,180 'llm':169 'llms':36,87,114 'match':126 'note':162 'npm':132 'ocr':46,97 'packag':155 'page':31,50,82,101 'path':124 'pip':129 'probabl':176 'provid':16,67 'python':11,62,140,154 'question':179 'raw.githubusercontent.com':202 'raw.githubusercontent.com/reworkd/tarsier/head/readme.md':201 'represent':108 'requir':135 'reworkd':14,65 'run':133,177 'screenshot':51,102 'setup':123 'shell':181 'skill':206 'skill-tarsier-vision-utilities-web-interaction-agents' 'sourc':193,204 'source-agentskillexchange' 'start':161 'structur':106 'tag':26,77 'take':38,89 'tarsier':1,8,52,59,131,192 'text':107,112 'text-on':111 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'tri':166 'typescript':144 'understand':116 'upstream':120,139,199 'usag':157 'use':118,167,191 'util':3,18,54,69,151 've':165,175 'vision':2,17,53,68 'visit':182 'visual':25,76 'web':5,21,30,56,72,81,172 'whitespac':105 'whitespace-structur':104","prices":[{"id":"9b8a3c6e-fe47-4ba1-922f-0584c4110627","listingId":"e7e25037-7a7c-4c9e-9c99-600b8e59cd1a","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:19:48.820Z"}],"sources":[{"listingId":"e7e25037-7a7c-4c9e-9c99-600b8e59cd1a","source":"github","sourceId":"agentskillexchange/skills/tarsier-vision-utilities-web-interaction-agents","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/tarsier-vision-utilities-web-interaction-agents","isPrimary":false,"firstSeenAt":"2026-05-18T13:19:48.820Z","lastSeenAt":"2026-05-18T19:12:44.780Z"}],"details":{"listingId":"e7e25037-7a7c-4c9e-9c99-600b8e59cd1a","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"tarsier-vision-utilities-web-interaction-agents","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"9a8b167e675d86ea7465645a7f1d1fd9de448b2a","skill_md_path":"skills/tarsier-vision-utilities-web-interaction-agents/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/tarsier-vision-utilities-web-interaction-agents"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Tarsier Vision Utilities for Web Interaction Agents","description":"Tarsier is a Python library by Reworkd that provides vision utilities for AI web interaction agents. It visually tags interactable elements on web pages with bracketed IDs, enabling LLMs to take actions like CLICK [23], and includes an OCR algorithm that converts page screenshots into whitespace-structured text representations that even text-only LLMs can understand."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/tarsier-vision-utilities-web-interaction-agents"},"updatedAt":"2026-05-18T19:12:44.780Z"}}