{"id":"4fcf8483-4378-40d7-b15f-f4eecb1503bf","shortId":"KTPHCF","kind":"skill","title":"Extract schema.org, Open Graph, and JSON-LD metadata from web pages for indexing","tagline":"Uses extruct to pull machine-readable metadata from raw HTML so an agent can classify, deduplicate, or enrich pages without brittle full-page parsing. It is best for metadata harvesting workflows, not for crawling an entire site or rendering JavaScript-heavy pages.","description":"# Extract schema.org, Open Graph, and JSON-LD metadata from web pages for indexing\n\nUses extruct to pull machine-readable metadata from raw HTML so an agent can classify, deduplicate, or enrich pages without brittle full-page parsing. It is best for metadata harvesting workflows, not for crawling an entire site or rendering JavaScript-heavy pages.\n\n## Prerequisites\n\nPython 3 environment\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- pip install extruct\n- pip install 'extruct[cli]'\n- pip install -r requirements-dev.txt\n\nRequirements and caveats from upstream:\n- PyPI package: https://pypi.python.org/pypi/extruct\n- rdflib: https://pypi.python.org/pypi/rdflib/\n\nBasic usage or getting-started notes:\n- First fetch the HTML using python-requests, then feed the response body to extruct.\n\n- Source: https://github.com/scrapinghub/extruct\n- Extracted from upstream docs: https://raw.githubusercontent.com/scrapinghub/extruct/HEAD/README.rst\n\n## Documentation\n\n- https://github.com/scrapinghub/extruct#readme\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing/)","tags":["extract","schema","org","open","graph","and","json","metadata","from","web","pages","for"],"capabilities":["skill","source-agentskillexchange","skill-extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,238 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:10:24.506Z","embedding":null,"createdAt":"2026-05-18T13:16:28.997Z","updatedAt":"2026-05-18T19:10:24.506Z","lastSeenAt":"2026-05-18T19:10:24.506Z","tsv":"'/pypi/extruct':154 '/pypi/rdflib/':158 '/scrapinghub/extruct':185 '/scrapinghub/extruct#readme':196 '/scrapinghub/extruct/head/readme.rst':192 '/skills/extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing/)':203 '3':121 'agent':28,87,198 'agentskillexchange.com':202 'agentskillexchange.com/skills/extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing/)':201 'basic':175 'best':43,102 'bodi':172 'brittl':36,95 'caveat':148 'classifi':30,89 'cli':141 'crawl':50,109 'dedupl':31,90 'doc':189 'document':193 'enrich':33,92 'entir':52,111 'environ':122,134 'exchang':200 'extract':1,60,186 'extruct':16,75,137,140,174 'feed':169 'fetch':160 'first':159 'full':38,97 'full-pag':37,96 'get':179 'getting-start':178 'github.com':184,195 'github.com/scrapinghub/extruct':183 'github.com/scrapinghub/extruct#readme':194 'graph':4,63 'harvest':46,105 'heavi':58,117 'html':25,84,162 'index':14,73 'instal':123,127,136,139,143 'javascript':57,116 'javascript-heavi':56,115 'json':7,66 'json-ld':6,65 'ld':8,67 'machin':20,79 'machine-read':19,78 'match':132 'metadata':9,22,45,68,81,104 'note':181 'open':3,62 'page':12,34,39,59,71,93,98,118 'pars':40,99 'path':130 'pip':135,138,142 'prerequisit':119 'pull':18,77 'pypi.python.org':153,157 'pypi.python.org/pypi/extruct':152 'pypi.python.org/pypi/rdflib/':156 'python':120,165 'python-request':164 'r':144 'raw':24,83 'raw.githubusercontent.com':191 'raw.githubusercontent.com/scrapinghub/extruct/head/readme.rst':190 
'rdflib':155 'readabl':21,80 'render':55,114 'request':166 'requir':146 'requirements-dev.txt':145 'respons':171 'schema.org':2,61 'setup':129 'site':53,112 'skill':199 'skill-extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing' 'sourc':182,197 'source-agentskillexchange' 'start':180 'target':151 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'upstream':126,150,188 'usag':176 'use':15,74,124,163 'web':11,70 'without':35,94 'workflow':47,106","prices":[{"id":"7eeeda20-8553-455d-8fa9-359cb553ebcd","listingId":"4fcf8483-4378-40d7-b15f-f4eecb1503bf","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:16:28.997Z"}],"sources":[{"listingId":"4fcf8483-4378-40d7-b15f-f4eecb1503bf","source":"github","sourceId":"agentskillexchange/skills/extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing","isPrimary":false,"firstSeenAt":"2026-05-18T13:16:28.997Z","lastSeenAt":"2026-05-18T19:10:24.506Z"}],"details":{"listingId":"4fcf8483-4378-40d7-b15f-f4eecb1503bf","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx
-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"7a946d82cb6eb0d077e1545c12e2784e404e0bf2","skill_md_path":"skills/extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Extract schema.org, Open Graph, and JSON-LD metadata from web pages for indexing","description":"Uses extruct to pull machine-readable metadata from raw HTML so an agent can classify, deduplicate, or enrich pages without brittle full-page parsing. It is best for metadata harvesting workflows, not for crawling an entire site or rendering JavaScript-heavy pages."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/extract-schema-org-open-graph-and-json-ld-metadata-from-web-pages-for-indexing"},"updatedAt":"2026-05-18T19:10:24.506Z"}}