{"id":"0d352e37-9c1d-4009-9364-2feb82badfde","shortId":"NF5XRD","kind":"skill","title":"using-web-scraping","tagline":"Search and scrape public web content with headless Chrome and DuckDuckGo using safe practices.","description":"# Web Scraping Skill — Chrome (Playwright) + DuckDuckGo\n\nA privacy-minded, agent-facing web-scraping skill that uses headless Chrome (Playwright/Puppeteer) and DuckDuckGo for search. Focuses on: reliable navigation, extracting structured text, obeying robots.txt, and rate-limiting.\n\n## When to use\n- Collect public webpage content for summarization, metadata extraction, or link discovery.\n- Use DuckDuckGo for queries when you want a privacy-respecting search source.\n- NOT for bypassing paywalls, scraping private/logged-in content, or violating Terms of Service.\n\n## Safety & etiquette\n- Always check and respect `/robots.txt` before scraping a site.\n- Rate-limit requests (default: 1 request/sec) and use polite `User-Agent` strings.\n- Avoid executing arbitrary user-provided JavaScript on scraped pages.\n- Only scrape public content; if login is required, return `login_required` instead of attempting to bypass.\n\n## Capabilities\n- Search DuckDuckGo and return top-N result links.\n- Visit result pages in headless Chrome and extract `title`, `meta description`, `main` text (or best-effort article text), and `canonical` URL.\n- Return results as structured JSON for downstream consumption.\n\n## Examples\n### Node.js (Playwright)\n```javascript\nconst { chromium } = require('playwright');\n\nasync function ddgSearchAndScrape(query) {\n  const browser = await chromium.launch({ headless: true });\n  const page = await browser.newPage({ userAgent: 'open-skills-bot/1.0' });\n\n  // DuckDuckGo search\n  await page.goto('https://duckduckgo.com/');\n  await page.fill('input[name=\"q\"]', query);\n  await page.keyboard.press('Enter');\n  await page.waitForSelector('.result__title a');\n\n  // collect top result URL\n  const href = await 
page.getAttribute('.result__title a', 'href');\n  if (!href) { await browser.close(); return []; }\n\n  // visit result and extract\n  await page.goto(href, { waitUntil: 'domcontentloaded' });\n  const title = await page.title();\n  const description = await page.locator('meta[name=\"description\"]').getAttribute('content').catch(() => null);\n  // canonical URL if the page declares one; a short timeout avoids a long wait when it is absent\n  const canonical = await page.locator('link[rel=\"canonical\"]').first().getAttribute('href', { timeout: 2000 }).catch(() => null);\n  const article = await page.locator('article, main, #content').first().innerText().catch(() => null);\n\n  await browser.close();\n  return [{ url: href, canonical, title, description, text: article }];\n}\n\n// usage\n// ddgSearchAndScrape('open-source agent runtimes').then(console.log);\n```\n\n## Agent prompt (copy/paste)\n```text\nYou are an agent with a web-scraping skill. For any `search:` task, use DuckDuckGo to find relevant pages, then open each page in a headless Chrome instance (Playwright/Puppeteer) and extract `title`, `meta description`, `main text`, and `canonical` URL. Always:\n- Check and respect robots.txt\n- Rate-limit requests (<=1 req/sec)\n- Use a clear `User-Agent` and do not execute arbitrary page JS\nReturn results as JSON: [{url,canonical,title,description,text}] or `login_required` if a page needs authentication.\n```\n\n## Quick setup\n- Node: `npm i playwright` and run `npx playwright install` for browser binaries.\n- Python: `pip install playwright` and `playwright install`.\n\n## Tips\n- Use `page.route` to block large assets (images, fonts) when you only need text.\n- Respect site terms and introduce exponential backoff for retries.\n\n## See also\n- [using-youtube-download.md](using-youtube-download.md): media-specific scraping and download 
examples.","tags":["using","web","scraping","open","skills","besoeasy","agent-skills","ai-agents","claude-code","clawdbot","clawdbot-skill","llm-tools"],"capabilities":["skill","source-besoeasy","skill-using-web-scraping","topic-agent-skills","topic-ai-agents","topic-claude-code","topic-clawdbot","topic-clawdbot-skill","topic-llm-tools","topic-mcp-server","topic-openai","topic-openclaw","topic-vibe-coding","topic-vibecoding"],"categories":["open-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/besoeasy/open-skills/using-web-scraping","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add besoeasy/open-skills","source_repo":"https://github.com/besoeasy/open-skills","install_from":"skills.sh"}},"qualityScore":"0.505","qualityRationale":"deterministic score 0.51 from registry signals: · indexed on github topic:agent-skills · 111 github stars · SKILL.md body (3,319 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-02T12:55:05.246Z","embedding":null,"createdAt":"2026-04-18T22:10:58.734Z","updatedAt":"2026-05-02T12:55:05.246Z","lastSeenAt":"2026-05-02T12:55:05.246Z","tsv":"'/'');':222 '/1.0':215 '/robots.txt':103 '1':113,360 'agent':30,120,303,307,314,367 'agent-fac':29 'also':436 'alway':99,351 'arbitrari':124,372 'articl':175,279,282,297 'asset':418 'async':196 'attempt':145 'authent':390 'avoid':122 'await':202,208,218,223,229,232,243,251,258,265,269,280,289 'backoff':432 'best':173 'best-effort':172 'binari':404 'block':416 'bot':214 'browser':201,403 'browser.close':252,290 'browser.newpage':209 'bypass':87,147 'canon':178,349 'capabl':148 'catch':276,287 'check':100,352 'chrome':13,22,39,163,338 'chromium':193 'chromium.launch':203 
'clear':364 'collect':61,237 'console.log':306 'const':192,200,206,241,263,267,278 'consumpt':187 'content':10,64,91,135,275,284 'copy/paste':309 'ddgsearchandscrap':198,299 'default':112 'descript':168,268,273,295,345,381 'discoveri':71 'domcontentload':262 'download':444 'downstream':186 'duckduckgo':15,24,42,73,150,216,326 'duckduckgo.com':221 'duckduckgo.com/'');':220 'effort':174 'enter':231 'etiquett':98 'exampl':188,445 'execut':123,371 'exponenti':431 'extract':49,68,165,257,342 'face':31 'find':328 'first':285 'focus':45 'font':420 'function':197 'getattribut':274 'headless':12,38,162,204,337 'href':242,248,250,260,293 'imag':419 'innertext':286 'input':225 'instal':401,407,411 'instanc':339 'instead':143 'introduc':430 'javascript':128,191 'js':374 'json':184,378 'larg':417 'limit':57,110,358 'link':70,157 'login':137,141,384 'main':169,283,346 'media':440 'media-specif':439 'meta':167,271,344 'metadata':67 'mind':28 'n':155 'name':226,272 'navig':48 'need':389,424 'node':393 'node.js':189 'npm':394 'npx':399 'null':277,288 'obey':52 'open':212,301,332 'open-skills-bot':211 'open-sourc':300 'page':131,160,207,330,334,373,388 'page.fill':224 'page.getattribute':244 'page.goto':219,259 'page.keyboard.press':230 'page.locator':270,281 'page.route':414 'page.title':266 'page.waitforselector':233 'paywal':88 'pip':406 'playwright':23,190,195,396,400,408,410 'playwright/puppeteer':40,340 'polit':117 'practic':18 'privaci':27,81 'privacy-mind':26 'privacy-respect':80 'private/logged-in':90 'prompt':308 'provid':127 'public':8,62,134 'python':405 'q':227 'queri':75,199,228 'quick':391 'rate':56,109,357 'rate-limit':55,108,356 'relev':329 'reliabl':47 'req/sec':361 'request':111,359 'request/sec':114 'requir':139,142,194,385 'respect':82,102,354,426 'result':156,159,181,234,239,245,255,376 'retri':434 'return':140,152,180,253,291,375 'robots.txt':53,355 'run':398 'runtim':304 'safe':17 'safeti':97 'scrape':4,7,20,34,89,105,130,133,319,442 
'search':5,44,83,149,217,323 'see':435 'servic':96 'setup':392 'site':107,427 'skill':21,35,213,320 'skill-using-web-scraping' 'sourc':84,302 'source-besoeasy' 'specif':441 'string':121 'structur':50,183 'summar':66 'task':324 'term':94,428 'text':51,170,176,296,310,347,382,425 'tip':412 'titl':166,235,246,264,294,343,380 'top':154,238 'top-n':153 'topic-agent-skills' 'topic-ai-agents' 'topic-claude-code' 'topic-clawdbot' 'topic-clawdbot-skill' 'topic-llm-tools' 'topic-mcp-server' 'topic-openai' 'topic-openclaw' 'topic-vibe-coding' 'topic-vibecoding' 'true':205 'url':179,240,292,350,379 'usag':298 'use':2,16,37,60,72,116,325,362,413 'user':119,126,366 'user-ag':118,365 'user-provid':125 'userag':210 'using-web-scrap':1 'using-youtube-download.md':437,438 'violat':93 'visit':158,254 'waituntil':261 'want':78 'web':3,9,19,33,318 'web-scrap':32,317 'webpag':63","prices":[{"id":"19e4b7be-8156-4491-a55b-9c5366bc4e55","listingId":"0d352e37-9c1d-4009-9364-2feb82badfde","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"besoeasy","category":"open-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T22:10:58.734Z"}],"sources":[{"listingId":"0d352e37-9c1d-4009-9364-2feb82badfde","source":"github","sourceId":"besoeasy/open-skills/using-web-scraping","sourceUrl":"https://github.com/besoeasy/open-skills/tree/main/skills/using-web-scraping","isPrimary":false,"firstSeenAt":"2026-04-18T22:10:58.734Z","lastSeenAt":"2026-05-02T12:55:05.246Z"}],"details":{"listingId":"0d352e37-9c1d-4009-9364-2feb82badfde","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"besoeasy","slug":"using-web-scraping","github":{"repo":"besoeasy/open-skills","stars":111,"topics":["agent-skills","ai","ai-agents","claude-code","clawdbot","clawdbot-skill
","llm-tools","mcp-server","openai","openclaw","vibe-coding","vibecoding"],"license":null,"html_url":"https://github.com/besoeasy/open-skills","pushed_at":"2026-03-31T13:05:30Z","description":"Battle-tested skill library for AI agents. Save 98% of API costs with ready-to-use code for crypto, PDFs, search, web scraping & more. No trial-and-error, no expensive APIs.","skill_md_sha":"3f13a96ce4e14018f513b4b73b07f8ebbffdfd73","skill_md_path":"skills/using-web-scraping/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/besoeasy/open-skills/tree/main/skills/using-web-scraping"},"layout":"multi","source":"github","category":"open-skills","frontmatter":{"name":"using-web-scraping","description":"Search and scrape public web content with headless Chrome and DuckDuckGo using safe practices."},"skills_sh_url":"https://skills.sh/besoeasy/open-skills/using-web-scraping"},"updatedAt":"2026-05-02T12:55:05.246Z"}}