Tarsier Vision Utilities for Web Interaction Agents
What it does
Tarsier is a Python library by Reworkd that provides vision utilities for AI web interaction agents. It visually tags interactable elements on web pages with bracketed IDs, enabling LLMs to take actions like CLICK [23], and includes an OCR algorithm that converts page screenshots into whitespace-structured text representations that even text-only LLMs can understand.
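To make "whitespace-structured text" concrete, here is a minimal sketch of the general idea, not Tarsier's actual algorithm: OCR'd words are placed on a character grid so that their spacing roughly mirrors their position in the screenshot. The Word type, the render_layout function, and the cell sizes (char_width, line_height) are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word with the top-left corner of its bounding box, in pixels."""
    text: str
    x: int
    y: int


def render_layout(words, char_width=10, line_height=20):
    """Place words on a character grid so spacing mirrors the screenshot.

    char_width and line_height are assumed pixel sizes of one grid cell.
    """
    # Bucket words into text rows by their vertical position.
    rows = {}
    for word in words:
        rows.setdefault(word.y // line_height, []).append(word)

    lines = []
    for row in range(max(rows, default=-1) + 1):
        line = ""
        for word in sorted(rows.get(row, []), key=lambda w: w.x):
            col = word.x // char_width
            if line and len(line) >= col:
                # The previous word already reaches this column: keep one space.
                line += " "
            else:
                # Pad with spaces up to the word's column.
                line = line.ljust(col)
            line += word.text
        lines.append(line)  # rows with no words become blank lines
    return "\n".join(lines)


sample = [
    Word("[1]", 0, 0), Word("Home", 40, 0),
    Word("[2]", 200, 0), Word("Login", 240, 0),
    Word("Welcome", 0, 40),
]
print(render_layout(sample))
```

In the sample, the two bracketed tags stay far apart on the first line and the blank row between the navigation bar and the heading is preserved, which is the kind of visual structure a text-only model can pick up from plain text.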
Installation
Use the upstream install or setup path that matches your environment:
- pip install tarsier (to use the published package)
- npm run build (to build from a source checkout)
Requirements and caveats from upstream:
- Python
- npm run build compiles the TypeScript into JavaScript, which can then be used in the Python package.
Basic usage or getting-started notes:
- Upstream opens with the questions you run into when using an LLM to automate web interactions, which the tagging and OCR features are meant to address.
- The upstream README links to a cookbook with agent examples using Tarsier.
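As a getting-started illustration of how bracketed IDs let an agent act on a page, the sketch below parses a model reply such as CLICK [23] and looks the tag up in a tag-to-XPath mapping. This is an assumption-laden example, not upstream code: parse_action, resolve_target, the reply grammar, and the hand-written mapping are hypothetical, and only the CLICK [23] form comes from the description above.

```python
import re

# Assumed reply grammar: VERB [id] optional-argument, e.g. "CLICK [23]".
ACTION_PATTERN = re.compile(r"^\s*([A-Z_]+)\s+\[(\d+)\]\s*(.*?)\s*$")


def parse_action(reply):
    """Split a reply such as 'CLICK [23]' into (verb, tag_id, argument)."""
    match = ACTION_PATTERN.match(reply)
    if match is None:
        raise ValueError(f"unrecognised action: {reply!r}")
    verb, tag_id, argument = match.groups()
    return verb, int(tag_id), argument


def resolve_target(reply, tag_to_xpath):
    """Return (verb, xpath, argument) for a reply, using a tag-to-XPath mapping."""
    verb, tag_id, argument = parse_action(reply)
    if tag_id not in tag_to_xpath:
        raise KeyError(f"the model referenced unknown tag [{tag_id}]")
    return verb, tag_to_xpath[tag_id], argument


# Hypothetical mapping, written by hand for this example.
tag_to_xpath = {23: "//a[@id='login']"}
print(resolve_target("CLICK [23]", tag_to_xpath))
```

Rejecting unknown tags explicitly is a deliberate choice in this sketch: a model can emit an ID that was never on the page, and failing loudly is easier to debug than acting on the wrong element.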
Extracted from upstream docs: https://raw.githubusercontent.com/reworkd/tarsier/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,348 chars)