{"id":"cdd9da39-ddc4-4595-87e5-5f2dd305e46e","shortId":"nT5C4g","kind":"skill","title":"Trafilatura Web Text Extraction and Crawling Toolkit","tagline":"Trafilatura is a Python package and CLI tool for gathering text from the web. It handles crawling, downloading, and extracting main text content, metadata, and comments from raw HTML, outputting clean structured data in CSV, JSON, Markdown, XML, and TXT formats.","description":"# Trafilatura Web Text Extraction and Crawling Toolkit\n\nTrafilatura is a Python package and CLI tool for gathering text from the web. It handles crawling, downloading, and extracting main text content, metadata, and comments from raw HTML, outputting clean structured data in CSV, JSON, Markdown, XML, and TXT formats.\n\n## Installation\n\nTrafilatura is a cutting-edge **Python package and command-line tool** published on PyPI. Install it with `pip install trafilatura`.\n\n[![Python package](https://img.shields.io/pypi/v/trafilatura.svg)](https://pypi.python.org/pypi/trafilatura) [![Python versions](https://img.shields.io/pypi/pyversions/trafilatura.svg)](https://pypi.python.org/pypi/trafilatura)\n\n## Getting started\n\n[Getting started with Trafilatura](https://trafilatura.readthedocs.io/en/latest/quickstart.html) is straightforward. 
For more information and detailed guides, visit\n\n- Source: https://github.com/adbar/trafilatura\n- Extracted from upstream docs: https://raw.githubusercontent.com/adbar/trafilatura/HEAD/README.md\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/trafilatura-web-text-extraction-crawling/)","tags":["trafilatura","web","text","extraction","crawling","skills","agentskillexchange","agent-skills","ai-agents","ai-tools","awesome-list","claude-code"],"capabilities":["skill","source-agentskillexchange","skill-trafilatura-web-text-extraction-crawling","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/trafilatura-web-text-extraction-crawling","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,213 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:12:53.132Z","embedding":null,"createdAt":"2026-05-18T13:19:59.970Z","updatedAt":"2026-05-18T19:12:53.132Z","lastSeenAt":"2026-05-18T19:12:53.132Z","tsv":"'/adbar/trafilatura':162 '/adbar/trafilatura/head/readme.md':169 '/en/latest/quickstart.html)':149 '/pypi/pyversions/trafilatura.svg)](https://pypi.python.org/pypi/trafilatura)':112 
'/pypi/v/trafilatura.svg)](https://pypi.python.org/pypi/trafilatura)':107 '/skills/trafilatura-web-text-extraction-crawling/)':176 'agent':171 'agentskillexchange.com':175 'agentskillexchange.com/skills/trafilatura-web-text-extraction-crawling/)':174 'basic':126 'caveat':100 'clean':38,86 'cli':14,62 'command':123 'command-lin':122 'comment':33,81 'content':30,78 'crawl':6,24,54,72 'csv':42,90 'cut':117 'cutting-edg':116 'data':40,88,140 'detail':156 'doc':166 'download':25,73 'edg':118 'evalu':136 'exchang':173 'extract':4,27,52,75,163 'format':48,96 'gather':17,65 'get':130,143 'getting-start':129 'github.com':161 'github.com/adbar/trafilatura':160 'guid':157 'handl':23,71 'html':36,84 'img.shields.io':106,111 'img.shields.io/pypi/pyversions/trafilatura.svg)](https://pypi.python.org/pypi/trafilatura)':110 'img.shields.io/pypi/v/trafilatura.svg)](https://pypi.python.org/pypi/trafilatura)':105 'inform':154 'instal':97 'json':43,91 'latest':139 'line':124 'main':28,76 'markdown':44,92 'metadata':31,79 'note':132 'output':37,85 'packag':12,60,104,120,142 'python':11,59,103,108,119 'raw':35,83 'raw.githubusercontent.com':168 'raw.githubusercontent.com/adbar/trafilatura/head/readme.md':167 'requir':98 'run':134 'skill':172 'skill-trafilatura-web-text-extraction-crawling' 'sourc':159,170 'source-agentskillexchange' 'start':131,144 'straightforward':151 'structur':39,87 'text':3,18,29,51,66,77 'tool':15,63,125 'toolkit':7,55 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'trafilatura':1,8,49,56,113,146 'trafilatura.readthedocs.io':148 'trafilatura.readthedocs.io/en/latest/quickstart.html)':147 'txt':47,95 'upstream':102,165 'usag':127 'version':109 'visit':158 'web':2,21,50,69 
'xml':45,93","prices":[{"id":"7f3e3af6-beae-40cc-bf8c-0feb2b00b3e6","listingId":"cdd9da39-ddc4-4595-87e5-5f2dd305e46e","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:19:59.970Z"}],"sources":[{"listingId":"cdd9da39-ddc4-4595-87e5-5f2dd305e46e","source":"github","sourceId":"agentskillexchange/skills/trafilatura-web-text-extraction-crawling","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/trafilatura-web-text-extraction-crawling","isPrimary":false,"firstSeenAt":"2026-05-18T13:19:59.970Z","lastSeenAt":"2026-05-18T19:12:53.132Z"}],"details":{"listingId":"cdd9da39-ddc4-4595-87e5-5f2dd305e46e","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"trafilatura-web-text-extraction-crawling","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"aa1bdbedf58f28e97ab6a19665c68ba6fb322753","skill_md_path":"skills/trafilatura-web-text-extraction-crawling/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/trafilatura-web-text-extraction-crawling"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Trafilatura Web Text Extraction and Crawling Toolkit","description":"Trafilatura is a Python package 
and CLI tool for gathering text from the web. It handles crawling, downloading, and extracting main text content, metadata, and comments from raw HTML, outputting clean structured data in CSV, JSON, Markdown, XML, and TXT formats."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/trafilatura-web-text-extraction-crawling"},"updatedAt":"2026-05-18T19:12:53.132Z"}}