{"id":"5fc778b8-c02c-449a-a6b4-0fd5bd632578","shortId":"AA9Lbc","kind":"skill","title":"web-scraper","tagline":"Intelligent multi-strategy web scraping. Extracts structured data from web pages (tables, lists, prices). Pagination, monitoring, and CSV/JSON export.","description":"# Web Scraper\n\n## Overview\n\nIntelligent multi-strategy web scraping. Extracts structured data from web pages (tables, lists, prices). Pagination, monitoring, and CSV/JSON export.\n\n## When to Use This Skill\n\n- When the user mentions \"scraper\" or related topics\n- When the user mentions \"scraping\" or related topics\n- When the user mentions \"extrair dados web\" or related topics\n- When the user mentions \"web scraping\" or related topics\n- When the user mentions \"raspar dados\" or related topics\n- When the user mentions \"coletar dados site\" or related topics\n\n## Do Not Use This Skill When\n\n- The task is unrelated to web scraping\n- A simpler, more specific tool can handle the request\n- The user needs general-purpose assistance without domain expertise\n\n## How It Works\n\nExecute phases in strict order. Each phase feeds the next.\n\n```\n1. CLARIFY  ->  2. RECON  ->  3. STRATEGY  ->  4. EXTRACT  ->  5. TRANSFORM  ->  6. VALIDATE  ->  7. FORMAT\n```\n\nNever skip Phase 1 or Phase 2. They prevent wasted effort and failed extractions.\n\n**Fast path**: If the user provides a URL and a clear data target, and the request is simple\n(single page, one data type), compress Phases 1-3 into a single action:\nfetch, classify, and extract in one WebFetch call. 
Still validate and format.\n\n---\n\n## Capabilities\n\n- **Multi-strategy**: WebFetch (static), Browser automation (JS-rendered), Bash/curl (APIs), WebSearch (discovery)\n- **Extraction modes**: table, list, article, product, contact, FAQ, pricing, events, jobs, custom\n- **Output formats**: Markdown tables (default), JSON, CSV\n- **Pagination**: auto-detect and follow (page numbers, infinite scroll, load-more)\n- **Multi-URL**: extract same structure across sources with comparison and diff\n- **Validation**: confidence ratings (HIGH/MEDIUM/LOW) on every extraction\n- **Auto-escalation**: WebFetch fails silently -> automatic Browser fallback\n- **Data transforms**: cleaning, normalization, deduplication, enrichment\n- **Differential mode**: detect changes between scraping runs\n\n## Web Scraper\n\nMulti-strategy web data extraction with intelligent approach selection,\nautomatic fallback escalation, data transformation, and structured output.\n\n## Phase 1: Clarify\n\nEstablish extraction parameters before touching any URL.\n\n## Required Parameters\n\n| Parameter     | Resolve                              | Default        |\n|:--------------|:-------------------------------------|:---------------|\n| Target URL(s) | Which page(s) to scrape?             | *(required)*   |\n| Data Target   | What specific data to extract?       | *(required)*   |\n| Output Format | Markdown table, JSON, CSV, or text?  | Markdown table |\n| Scope         | Single page, paginated, or multi-URL?| Single page    |\n\n## Optional Parameters\n\n| Parameter     | Resolve                                | Default      |\n|:--------------|:---------------------------------------|:-------------|\n| Pagination    | Follow pagination? Max pages?          | No, 1 page   |\n| Max Items     | Maximum number of items to collect?    | Unlimited    |\n| Filters       | Data to exclude or include?            | None         |\n| Sort Order    | How to sort results?                   
| Source order  |\n| Save Path     | Save to file? Which path?              | Display only |\n| Language      | Respond in which language?             | User's lang  |\n| Diff Mode     | Compare with previous run?             | No           |\n\n## Clarification Rules\n\n- If user provides a URL and clear data target, proceed directly to Phase 2.\n  Do NOT ask unnecessary questions.\n- If request is ambiguous (e.g. \"scrape this site\"), ask ONLY:\n  \"What specific data do you want me to extract from this page?\"\n- Default to Markdown table output. Mention alternatives only if relevant.\n- Accept requests in any language. Always respond in the user's language.\n- If user says \"everything\" or \"all data\", perform recon first, then present\n  what's available and let user choose.\n\n## Discovery Mode\n\nWhen user has a topic but no specific URL:\n1. Use WebSearch to find the most relevant pages\n2. Present top 3-5 URLs with descriptions\n3. Let user choose which to scrape, or scrape all\n4. Proceed to Phase 2 with selected URL(s)\n\nExample: \"find and extract pricing data for CRM tools\"\n-> WebSearch(\"CRM tools pricing comparison 2026\")\n-> Present top results -> User selects -> Extract\n\n---\n\n## Phase 2: Reconnaissance\n\nAnalyze the target page before extraction.\n\n## Step 2.1: Initial Fetch\n\nUse WebFetch to retrieve and analyze the page structure:\n\n```\nWebFetch(\n  url = TARGET_URL,\n  prompt = \"Analyze this page structure and report:\n    1. Page type: article, product listing, search results, data table,\n       directory, dashboard, API docs, FAQ, pricing page, job board, events, or other\n    2. Main content structure: tables, ordered/unordered lists, card grid, free-form text,\n       accordion/collapsible sections, tabs\n    3. Approximate number of distinct data items visible\n    4. 
JavaScript rendering indicators: empty containers, loading spinners,\n       SPA framework markers (React root, Vue app, Angular), minimal HTML with heavy JS\n    5. Pagination: next/prev links, page numbers, load-more buttons,\n       infinite scroll indicators, total results count\n    6. Data density: how much structured, extractable data exists\n    7. List the main data fields/columns available for extraction\n    8. Embedded structured data: JSON-LD, microdata, OpenGraph tags\n    9. Available download links: CSV, Excel, PDF, API endpoints\"\n)\n```\n\n## Step 2.2: Evaluate Fetch Quality\n\n| Signal                                      | Interpretation                    | Action                    |\n|:--------------------------------------------|:----------------------------------|:--------------------------|\n| Rich content with data clearly visible      | Static page                       | Strategy A (WebFetch)     |\n| Empty containers, \"loading...\", minimal text | JS-rendered                       | Strategy B (Browser)      |\n| Login wall, CAPTCHA, 403/401 response       | Blocked                           | Report to user            |\n| Content present but poorly structured       | Needs precision                   | Strategy B (Browser)      |\n| JSON or XML response body                   | API endpoint                      | Strategy C (Bash/curl)    |\n| Download links for CSV/Excel available      | Direct data file                  | Strategy C (download)     |\n\n## Step 2.3: Content Classification\n\nClassify into an extraction mode:\n\n| Mode       | Indicators                                 | Examples                          |\n|:-----------|:-------------------------------------------|:----------------------------------|\n| `table`    | HTML `<table>`, grid layout with headers   | Price comparison, statistics, specs|\n| `list`     | Repeated similar elements, card grids      | Search results, product listings  |\n| `article`  | 
Long-form text with headings/paragraphs    | Blog post, news article, docs     |\n| `product`  | Product name, price, specs, images, rating | E-commerce product page           |\n| `contact`  | Names, emails, phones, addresses, roles    | Team page, staff directory        |\n| `faq`      | Question-answer pairs, accordions          | FAQ page, help center             |\n| `pricing`  | Plan names, prices, features, tiers        | SaaS pricing page                 |\n| `events`   | Dates, locations, titles, descriptions     | Event listings, conferences       |\n| `jobs`     | Titles, companies, locations, salaries     | Job boards, career pages          |\n| `custom`   | User-specified CSS selectors or fields     | Anything not matching above       |\n\nRecord: **page type**, **extraction mode**, **JS rendering needed (yes/no)**,\n**available fields**, **structured data present (JSON-LD etc.)**.\n\nIf user asked for \"everything\", present the available fields and let them choose.\n\n---\n\n## Phase 3: Strategy Selection\n\nChoose the extraction approach based on recon results.\n\n## Decision Tree\n\n```\nStructured data (JSON-LD, microdata) has what we need?\n |\n +-- YES --> STRATEGY E: Extract structured data directly\n |\n +-- NO: Content fully visible in WebFetch?\n      |\n      +-- YES: Need precise element targeting?\n      |    |\n      |    +-- NO  --> STRATEGY A: WebFetch + AI extraction\n      |    +-- YES --> STRATEGY B: Browser automation\n      |\n      +-- NO: JavaScript rendering detected?\n           |\n           +-- YES --> STRATEGY B: Browser automation\n           +-- NO:  API/JSON/XML endpoint or download link?\n                |\n                +-- YES --> STRATEGY C: Bash (curl + jq)\n                +-- NO  --> Report access issue to user\n```\n\n## Strategy A: WebFetch with AI Extraction\n\n**Best for**: Static pages, articles, simple tables, well-structured HTML.\n\nUse WebFetch with a targeted extraction prompt tailored to the 
mode:\n\n```\nWebFetch(\n  url = URL,\n  prompt = \"Extract [DATA_TARGET] from this page.\n    Return ONLY the extracted data as [FORMAT] with these columns/fields: [FIELDS].\n    Rules:\n    - If a value is missing or unclear, use 'N/A'\n    - Do not include navigation, ads, footers, or unrelated content\n    - Preserve original values exactly (numbers, currencies, dates)\n    - Include ALL matching items, not just the first few\n    - For each item, also extract the URL/link if available\"\n)\n```\n\n**Auto-escalation**: If WebFetch returns suspiciously few items (less than\n50% of expected from recon), or mostly empty fields, automatically escalate\nto Strategy B without asking user. Log the escalation in notes.\n\n## Strategy B: Browser Automation\n\n**Best for**: JS-rendered pages, SPAs, interactive content, lazy-loaded data.\n\nSequence:\n1. Get tab context: `tabs_context_mcp(createIfEmpty=true)` -> get tabId\n2. Navigate to URL: `navigate(url=TARGET_URL, tabId=TAB)`\n3. Wait for content to load: `computer(action=\"wait\", duration=3, tabId=TAB)`\n4. Check for cookie/consent banners: `find(query=\"cookie consent or accept button\", tabId=TAB)`\n   - If found, dismiss it (prefer privacy-preserving option)\n5. Read page structure: `read_page(tabId=TAB)` or `get_page_text(tabId=TAB)`\n6. Locate target elements: `find(query=\"[DESCRIPTION]\", tabId=TAB)`\n7. 
Extract with JavaScript for precise data via `javascript_tool`\n\n```javascript\n// Table extraction\nconst rows = document.querySelectorAll('TABLE_SELECTOR tr');\nconst data = Array.from(rows).map(row => {\n  const cells = row.querySelectorAll('td, th');\n  return Array.from(cells).map(c => c.textContent.trim());\n});\nJSON.stringify(data);\n```\n\n```javascript\n// List/card extraction\nconst items = document.querySelectorAll('ITEM_SELECTOR');\nconst data = Array.from(items).map(item => ({\n  field1: item.querySelector('FIELD1_SELECTOR')?.textContent?.trim() || null,\n  field2: item.querySelector('FIELD2_SELECTOR')?.textContent?.trim() || null,\n  link: item.querySelector('a')?.href || null,\n}));\nJSON.stringify(data);\n```\n\n8. For lazy-loaded content, scroll and re-extract:\n   `computer(action=\"scroll\", scroll_direction=\"down\", tabId=TAB)`\n   then `computer(action=\"wait\", duration=2, tabId=TAB)`\n\n## Strategy C: Bash (Curl + Jq)\n\n**Best for**: REST APIs, JSON endpoints, XML feeds, CSV/Excel downloads.\n\n```bash\n\n## Json Api\n\ncurl -s \"API_URL\" | jq '[.items[] | {field1: .key1, field2: .key2}]'\n\n## Csv Download\n\ncurl -s \"CSV_URL\" -o /tmp/scraped_data.csv\n\n## Xml Parsing\n\ncurl -s \"XML_URL\" | python3 -c \"\nimport xml.etree.ElementTree as ET, json, sys\ntree = ET.parse(sys.stdin)\n\n## ... Parse And Output Json\n\n\"\n```\n\n## Strategy D: Hybrid\n\nWhen a single strategy is insufficient, combine:\n1. WebSearch to discover relevant URLs\n2. WebFetch for initial content assessment\n3. Browser automation for JS-heavy sections\n4. Bash for post-processing (jq, python for data cleaning)\n\n## Strategy E: Structured Data Extraction\n\nWhen JSON-LD, microdata, or OpenGraph is present:\n1. 
Use Browser `javascript_tool` to extract structured data:\n```javascript\nconst scripts = document.querySelectorAll('script[type=\"application/ld+json\"]');\nconst data = Array.from(scripts).map(s => {\n  try { return JSON.parse(s.textContent); } catch { return null; }\n}).filter(Boolean);\nJSON.stringify(data);\n```\n2. This often provides cleaner, more reliable data than DOM scraping\n3. Fall back to DOM extraction only for fields not in structured data\n\n## Pagination Handling\n\nWhen pagination is detected and user wants multiple pages:\n\n**Page-number pagination (any strategy):**\n1. Extract data from current page\n2. Identify URL pattern (e.g. `?page=N`, `/page/N`, `&offset=N`)\n3. Iterate through pages up to user's max (default: 5 pages)\n4. Show progress: \"Extracting page 2/5...\"\n5. Concatenate all results, deduplicate if needed\n\n**Infinite scroll (Browser only):**\n1. Extract currently visible data\n2. Record item count\n3. Scroll down: `computer(action=\"scroll\", scroll_direction=\"down\", tabId=TAB)`\n4. Wait: `computer(action=\"wait\", duration=2, tabId=TAB)`\n5. Extract newly loaded data\n6. Compare count - if no new items after 2 scrolls, stop\n7. Repeat until no new content or max iterations (default: 5)\n\n**\"Load More\" button (Browser only):**\n1. Extract currently visible data\n2. Find button: `find(query=\"load more button\", tabId=TAB)`\n3. Click it: `computer(action=\"left_click\", ref=REF, tabId=TAB)`\n4. Wait and extract new content\n5. 
Repeat until button disappears or max iterations reached\n\n---\n\n## Phase 4: Extract\n\nExecute the selected strategy using mode-specific patterns.\nSee [references/extraction-patterns.md](references/extraction-patterns.md)\nfor CSS selectors and JavaScript snippets.\n\n## Table Mode\n\nWebFetch prompt:\n```\n\"Extract ALL rows from the table(s) on this page.\nReturn as a markdown table with exact column headers.\nInclude every row - do not truncate or summarize.\nPreserve numeric precision, currencies, and units.\"\n```\n\n## List Mode\n\nWebFetch prompt:\n```\n\"Extract each [ITEM_TYPE] from this page.\nFor each item, extract: [FIELD_LIST].\nReturn as a JSON array of objects with these keys: [KEY_LIST].\nInclude ALL items, not just the first few. Include link/URL for each item if available.\"\n```\n\n## Article Mode\n\nWebFetch prompt:\n```\n\"Extract article metadata:\n- title, author, date, tags/categories, word count estimate\n- Key factual data points, statistics, and named entities\nReturn as structured markdown. Summarize the content; do not reproduce full text.\"\n```\n\n## Product Mode\n\nWebFetch prompt:\n```\n\"Extract product data with these exact fields:\n- name, brand, price, currency, originalPrice (if discounted),\n  availability, description (first 200 chars), rating, reviewCount,\n  specifications (as key-value pairs), productUrl, imageUrl\nReturn as JSON. Use null for missing fields.\"\n```\n\nAlso check for JSON-LD `Product` schema (Strategy E) first.\n\n## Contact Mode\n\nWebFetch prompt:\n```\n\"Extract contact information for each person/entity:\n- name, title, role, email, phone, address, organization, website, linkedinUrl\nReturn as a markdown table. 
Only extract real contacts visible on the page.\"\n```\n\n## Faq Mode\n\nWebFetch prompt:\n```\n\"Extract all question-answer pairs from this page.\nFor each FAQ item extract:\n- question: the exact question text\n- answer: the answer text (first 300 chars if long)\n- category: the section/category if grouped\nReturn as a JSON array of objects.\"\n```\n\n## Pricing Mode\n\nWebFetch prompt:\n```\n\"Extract all pricing plans/tiers from this page.\nFor each plan extract:\n- planName, monthlyPrice, annualPrice, currency\n- features (array of included features)\n- limitations (array of limits or excluded features)\n- ctaText (call-to-action button text)\n- highlighted (true if marked as recommended/popular)\nReturn as JSON. Use null for missing fields.\"\n```\n\n## Events Mode\n\nWebFetch prompt:\n```\n\"Extract all events/sessions from this page.\nFor each event extract:\n- title, date, time, endTime, location, description (first 200 chars)\n- speakers (array of names), category, registrationUrl\nReturn as JSON. Use null for missing fields.\"\n```\n\n## Jobs Mode\n\nWebFetch prompt:\n```\n\"Extract all job listings from this page.\nFor each job extract:\n- title, company, location, salary, salaryRange, type (full-time/part-time/contract)\n- postedDate, description (first 200 chars), applyUrl, tags\nReturn as JSON. Use null for missing fields.\"\n```\n\n## Custom Mode\n\nWhen user provides specific selectors or field descriptions:\n- Use Browser automation with `javascript_tool` and user's CSS selectors\n- Or use WebFetch with a prompt built from user's field descriptions\n- Always confirm extracted schema with user before proceeding to multi-URL\n\n## Multi-Url Extraction\n\nWhen extracting from multiple URLs:\n1. Extract from the **first URL** to establish the data schema\n2. Show user the first results and confirm the schema is correct\n3. Extract from remaining URLs using the same schema\n4. Add a `source` column/field to every record with the origin URL\n5. 
Combine all results into a single output\n6. Show progress: \"Extracting 3/7 URLs...\"\n\n---\n\n## Phase 5: Transform\n\nClean, normalize, and enrich extracted data before validation.\nSee [references/data-transforms.md](references/data-transforms.md) for patterns.\n\n## Automatic Transforms (Always Apply)\n\n| Transform              | Action                                               |\n|:-----------------------|:-----------------------------------------------------|\n| Whitespace cleanup     | Trim, collapse multiple spaces, remove `\\n` in cells |\n| HTML entity decode     | `&amp;` -> `&`, `&lt;` -> `<`, `&#39;` -> `'`       |\n| Unicode normalization  | NFKC normalization for consistent characters          |\n| Empty string to null   | `\"\"` -> `null` (for JSON), `\"\"` -> `N/A` (for tables)|\n\n## Conditional Transforms (Apply When Relevant)\n\n| Transform             | When                         | Action                                  |\n|:----------------------|:-----------------------------|:----------------------------------------|\n| Price normalization   | Product/pricing modes        | Extract numeric value + currency symbol |\n| Date normalization    | Any dates found              | Normalize to ISO-8601 (YYYY-MM-DD)      |\n| URL resolution        | Relative URLs extracted      | Convert to absolute URLs                |\n| Phone normalization   | Contact mode                 | Standardize to E.164 format if possible |\n| Deduplication         | Multi-page or multi-URL      | Remove exact duplicate rows             |\n| Sorting               | User requested or natural    | Sort by user-specified field            |\n\n## Data Enrichment (Only When Useful)\n\n| Enrichment             | When                         | Action                                |\n|:-----------------------|:-----------------------------|:--------------------------------------|\n| Currency conversion    | User asks for single currency| Note original + convert 
(approximate) |\n| Domain extraction      | URLs in data                 | Add domain column from full URLs      |\n| Word count             | Article mode                 | Count words in extracted text         |\n| Relative dates         | Dates present                | Add \"X days ago\" column if useful     |\n\n## Deduplication Strategy\n\nWhen combining data from multiple pages or URLs:\n1. Exact match: rows with identical values in all fields -> keep first\n2. Near match: rows with same key fields (name+source) but different details\n   -> keep most complete (fewer nulls), flag in notes\n3. Report: \"Removed N duplicate rows\" in delivery notes\n\n---\n\n## Phase 6: Validate\n\nVerify extraction quality before delivering results.\n\n## Validation Checks\n\n| Check                | Action                                              |\n|:---------------------|:----------------------------------------------------|\n| Item count           | Compare extracted count to expected count from recon |\n| Empty fields         | Count N/A or null values per field                   |\n| Data type consistency| Numbers should be numeric, dates parseable           |\n| Duplicates           | Flag exact duplicate rows (post-dedup)               |\n| Encoding             | Check for HTML entities, garbled characters           |\n| Completeness         | All user-requested fields present in output          |\n| Truncation           | Verify data wasn't cut off (check last items)        |\n| Outliers             | Flag values that seem anomalous (e.g. 
$0.00 price)  |\n\n## Confidence Rating\n\nAssign to every extraction:\n\n| Rating     | Criteria                                                        |\n|:-----------|:----------------------------------------------------------------|\n| **HIGH**   | All fields populated, count matches expected, no anomalies      |\n| **MEDIUM** | Minor gaps (<10% empty fields) or count slightly differs        |\n| **LOW**    | Significant gaps (>=10% empty), structural issues, partial data |\n\nAlways report confidence with specifics:\n> Confidence: **HIGH** - 47 items extracted, all 6 fields populated,\n> matches expected count from page analysis.\n\n## Auto-Recovery (Try Before Reporting Issues)\n\n| Issue              | Auto-Recovery Action                                  |\n|:-------------------|:------------------------------------------------------|\n| Missing data       | Re-attempt with Browser if WebFetch was used          |\n| Encoding problems  | Apply HTML entity decode + unicode normalization      |\n| Incomplete results | Check for pagination or lazy-loading, fetch more      |\n| Count mismatch     | Scroll/paginate to find remaining items               |\n| All fields empty   | Page likely JS-rendered, switch to Browser strategy   |\n| Partial fields     | Try JSON-LD extraction as supplement                  |\n\nLog all recovery attempts in delivery notes.\nInform user of any irrecoverable gaps with specific details.\n\n---\n\n## Phase 7: Format And Deliver\n\nStructure results according to user preference.\nSee [references/output-templates.md](references/output-templates.md)\nfor complete formatting templates.\n\n## Delivery Envelope\n\nALWAYS wrap results with this metadata header:\n\n```markdown\n\n## Extraction Results\n\n**Source:** [Page Title](http://example.com)\n**Date:** YYYY-MM-DD HH:MM UTC\n**Items:** N records (M fields each)\n**Confidence:** HIGH | MEDIUM | LOW\n**Strategy:** A (WebFetch) | B (Browser) | C (API) | D (Hybrid) | E (Structured Data)\n**Format:** 
Markdown Table | JSON | CSV\n\n---\n\n[DATA HERE]\n\n---\n\n**Notes:**\n- [Any gaps, issues, or observations]\n- [Transforms applied: deduplication, normalization, etc.]\n- [Pages scraped if paginated: \"Pages 1-5 of 12\"]\n- [Auto-escalation if it occurred: \"Escalated from WebFetch to Browser\"]\n```\n\n## Markdown Table Rules\n\n- Left-align text columns (`:---`), right-align numbers (`---:`)\n- Consistent column widths for readability\n- Include summary row for numeric data when useful (totals, averages)\n- Maximum 10 columns per table; split wider data into multiple tables\n  or suggest JSON format\n- Truncate long cell values to 60 chars with `...` indicator\n- Use `N/A` for missing values, never leave cells empty\n- For multi-page results, show combined table (not per-page)\n\n## Json Rules\n\n- Use camelCase for keys (e.g. `productName`, `unitPrice`)\n- Wrap in metadata envelope:\n  ```json\n  {\n    \"metadata\": {\n      \"source\": \"URL\",\n      \"title\": \"Page Title\",\n      \"extractedAt\": \"ISO-8601\",\n      \"itemCount\": 47,\n      \"fieldCount\": 6,\n      \"confidence\": \"HIGH\",\n      \"strategy\": \"A\",\n      \"transforms\": [\"deduplication\", \"priceNormalization\"],\n      \"notes\": []\n    },\n    \"data\": [ ... 
]\n  }\n  ```\n- Pretty-print with 2-space indentation\n- Numbers as numbers (not strings), booleans as booleans\n- null for missing values (not empty strings)\n\n## Csv Rules\n\n- First row is always headers\n- Quote any field containing commas, quotes, or newlines\n- UTF-8 encoding with BOM for Excel compatibility\n- Use `,` as delimiter (standard)\n- Include metadata as comments: `# Source: URL`\n\n## File Output\n\nWhen user requests file save:\n- Markdown: `.md` extension\n- JSON: `.json` extension\n- CSV: `.csv` extension\n- Confirm path before writing\n- Report full file path and item count after saving\n\n## Multi-Url Comparison Format\n\nWhen comparing data across multiple sources:\n- Add `Source` as the first column/field\n- Use short identifiers for sources (domain name or user label)\n- Group by source or interleave based on user preference\n- Highlight differences if user asks for comparison\n- Include summary: \"Best price: $X at store-b.com\"\n\n## Differential Output\n\nWhen user requests change detection (diff mode):\n- Compare current extraction with previous run\n- Mark new items with `[NEW]`\n- Mark removed items with `[REMOVED]`\n- Mark changed values with `[WAS: old_value]`\n- Include summary: \"Changes since last run: +5 new, -2 removed, 3 modified\"\n\n---\n\n## Rate Limiting\n\n- Maximum 1 request per 2 seconds for sequential page fetches\n- For multi-URL jobs, process sequentially with pauses\n- If a site returns 429 (Too Many Requests), stop and report to user\n\n## Access Respect\n\n- If a page blocks access (403, CAPTCHA, login wall), report to user\n- Do NOT attempt to bypass bot detection, CAPTCHAs, or access controls\n- Do NOT scrape behind authentication unless user explicitly provides access\n- Respect robots.txt directives when known\n\n## Copyright\n\n- Do NOT reproduce large blocks of copyrighted article text\n- For articles: extract factual data, statistics, and structured info;\n  summarize narrative content\n- Always include 
source attribution (http://example.com) in output\n\n## Data Scope\n\n- Extract ONLY what the user explicitly requested\n- Warn user before collecting potentially sensitive data at scale\n  (emails, phone numbers, personal information)\n- Do not store or transmit extracted data beyond what the user sees\n\n## Failure Protocol\n\nWhen extraction fails or is blocked:\n1. Explain the specific reason (JS rendering, bot detection, login, etc.)\n2. Suggest alternatives (different URL, API if available, manual approach)\n3. Never retry aggressively or escalate access attempts\n\n---\n\n## Quick Reference: Mode Cheat Sheet\n\n| User Says...                         | Mode      | Strategy  | Output Default   |\n|:-------------------------------------|:----------|:----------|:-----------------|\n| \"extract the table\"                  | table     | A or B    | Markdown table   |\n| \"get all products/prices\"            | product   | E then A  | Markdown table   |\n| \"scrape the listings\"                | list      | A or B    | Markdown table   |\n| \"extract contact info / team page\"   | contact   | A         | Markdown table   |\n| \"get the article data\"               | article   | A         | Markdown text    |\n| \"extract the FAQ\"                    | faq       | A or B    | JSON             |\n| \"get pricing plans\"                  | pricing   | A or B    | Markdown table   |\n| \"scrape job listings\"                | jobs      | A or B    | Markdown table   |\n| \"get event schedule\"                 | events    | A or B    | Markdown table   |\n| \"find and extract [topic]\"           | discovery | WebSearch | Markdown table   |\n| \"compare prices across sites\"        | multi-URL | A or B    | Comparison table |\n| \"what changed since last time\"       | diff      | any       | Diff format      |\n\n---\n\n## References\n\n- **Extraction patterns**: [references/extraction-patterns.md](references/extraction-patterns.md)\n  CSS selectors, JavaScript 
snippets, JSON-LD parsing, domain tips.\n\n- **Output templates**: [references/output-templates.md](references/output-templates.md)\n  Markdown, JSON, CSV templates with complete examples.\n\n- **Data transforms**: [references/data-transforms.md](references/data-transforms.md)\n  Cleaning, normalization, deduplication, enrichment patterns.\n\n## Best Practices\n\n- Provide clear, specific context about your project and requirements\n- Review all suggestions before applying them to production code\n- Combine with other complementary skills for comprehensive analysis\n\n## Common Pitfalls\n\n- Using this skill for tasks outside its domain expertise\n- Applying recommendations without understanding your specific context\n- Not providing enough project context for accurate analysis\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.","tags":["web","scraper","antigravity","awesome","skills","sickn33","agent-skills","agentic-skills","ai-agent-skills","ai-agents","ai-coding","ai-workflows"],"capabilities":["skill","source-sickn33","skill-web-scraper","topic-agent-skills","topic-agentic-skills","topic-ai-agent-skills","topic-ai-agents","topic-ai-coding","topic-ai-workflows","topic-antigravity","topic-antigravity-skills","topic-claude-code","topic-claude-code-skills","topic-codex-cli","topic-codex-skills"],"categories":["antigravity-awesome-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/sickn33/antigravity-awesome-skills/web-scraper","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add 
sickn33/antigravity-awesome-skills","source_repo":"https://github.com/sickn33/antigravity-awesome-skills","install_from":"skills.sh"}},"qualityScore":"0.700","qualityRationale":"deterministic score 0.70 from registry signals: · indexed on github topic:agent-skills · 37911 github stars · SKILL.md body (28,434 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T18:51:59.804Z","embedding":null,"createdAt":"2026-04-18T21:47:25.194Z","updatedAt":"2026-05-18T18:51:59.804Z","lastSeenAt":"2026-05-18T18:51:59.804Z","tsv":"'+5':3032 '-2':3034 '-3':200 '-5':546,2738 '-8':2898 '-8601':2277,2846 '/page/n':1561 '/part-time/contract':2072 '/tmp/scraped_data.csv':1396 '0.00':2508 '1':151,168,199,326,388,533,623,1181,1428,1473,1548,1593,1654,2142,2384,2737,3041,3184 '10':2530,2540,2780 '12':2740 '2':153,171,453,542,564,591,645,1192,1358,1434,1507,1554,1598,1619,1635,1659,2153,2396,2864,3044,3195 '2.1':600 '2.2':744 '2.3':814 '2/5':1581 '200':1852,2032,2076 '2026':583 '3':155,545,550,661,958,1202,1212,1440,1518,1564,1602,1669,2165,2417,3036,3205 '3/7':2198 '300':1943 '4':157,560,669,1215,1448,1576,1613,1680,1696,2174 '403':3079 '403/401':776 '429':3063 '47':2553,2848 '5':159,690,1238,1574,1582,1622,1648,1686,2186,2201 '50':1141 '6':161,706,1252,1627,2194,2427,2557,2850 '60':2799 '7':163,715,1261,1638,2653 '8':724,1334 '9':734 'absolut':2289 'accept':491,1225 'access':1033,3072,3078,3095,3106,3211 'accord':2659 'accordion':884 'accordion/collapsible':658 'accur':3419 'across':270,2952,3313 'action':204,750,1209,1346,1355,1606,1616,1673,1994,2221,2259,2331,2438,2577 'ad':1100 'add':2175,2348,2367,2955 'address':873,1898 'aggress':3208 'ago':2370 'ai':1003,1041 'align':2757,2762 'also':1124,1872 'altern':487,3197 
'alway':496,2121,2218,2546,2672,2887,3134 'ambigu':462 'analysi':2565,3394,3420 'analyz':593,608,617 'angular':684 'annualpric':1976 'anomal':2506 'anomali':2526 'answer':882,1923,1938,1940 'anyth':922 'api':229,635,741,797,1369,1378,1381,2710,3200 'api/json/xml':1020 'app':683 'appli':2219,2254,2591,2728,3382,3406 'application/ld':1488 'applyurl':2078 'approach':315,964,3204 'approxim':662,2342 'array':1774,1956,1979,1984,2035 'array.from':1282,1292,1309,1492 'articl':236,626,845,855,1047,1797,1802,2356,3120,3123,3262,3264 'ask':456,467,946,1156,2335,2984,3454 'assess':1439 'assign':2512 'assist':134 'attempt':2582,2639,3088,3212 'attribut':3137 'authent':3101 'author':1805 'auto':253,284,1131,2567,2575,2742 'auto-detect':252 'auto-escal':283,1130,2741 'auto-recoveri':2566,2574 'autom':224,1009,1018,1166,1442,2100 'automat':289,317,1150,2216 'avail':517,721,735,806,935,951,1129,1796,1849,3202 'averag':2778 'b':771,790,1007,1016,1154,1164,2707,3230,3248,3274,3282,3291,3300,3320 'back':1520 'banner':1219 'base':965,2976 'bash':1028,1363,1376,1449 'bash/curl':228,801 'behind':3100 'best':1043,1167,1366,2989,3367 'beyond':3171 'block':778,3077,3117,3183 'blog':852 'board':641,912 'bodi':796 'bom':2901 'boolean':1504,2872,2874 'bot':3091,3191 'boundari':3462 'brand':1843 'browser':223,290,772,791,1008,1017,1165,1441,1475,1591,1652,2099,2584,2625,2708,2751 'built':2115 'button':699,1226,1651,1661,1666,1689,1995 'bypass':3090 'c':800,811,1027,1295,1362,1404,2709 'c.textcontent.trim':1296 'call':212,1992 'call-to-act':1991 'camelcas':2827 'capabl':217 'captcha':775,3080,3093 'card':652,839 'career':913 'catch':1500 'categori':1947,2038 'cell':1287,1293,2231,2796,2810 'center':888 'chang':301,2999,3020,3028,3324 'char':1853,1944,2033,2077,2800 'charact':2241,2481 'cheat':3216 'check':1216,1873,2436,2437,2476,2498,2599 'choos':521,553,956,961 'clarif':438,3456 'clarifi':152,327 'classif':816 'classifi':206,817 'clean':294,1458,2203,3362 'cleaner':1511 'cleanup':2223 
'clear':185,446,755,3370,3429 'click':1670,1675 'code':3386 'coletar':100 'collaps':2225 'collect':397,3153 'column':1737,2350,2371,2759,2765,2781 'column/field':2178,2960 'columns/fields':1084 'combin':1427,2187,2377,2818,3387 'comma':2893 'comment':2912 'commerc':866 'common':3395 'compani':908,2064 'compar':433,1628,2441,2950,3003,3311 'comparison':273,582,832,2947,2986,3321 'compat':2904 'complementari':3390 'complet':2411,2482,2667,3356 'comprehens':3393 'compress':197 'comput':1208,1345,1354,1605,1615,1672 'concaten':1583 'condit':2252 'confer':905 'confid':277,2510,2548,2551,2700,2851 'confirm':2122,2160,2931 'consent':1223 'consist':2240,2460,2764 'const':1274,1280,1286,1302,1307,1483,1490 'contact':238,869,1883,1888,1910,2293,3252,3256 'contain':674,763,2892 'content':647,752,782,815,989,1104,1175,1205,1339,1438,1643,1685,1825,3133 'context':1184,1186,3372,3412,3417 'control':3096 'convers':2333 'convert':2287,2341 'cooki':1222 'cookie/consent':1218 'copyright':3112,3119 'correct':2164 'count':705,1601,1629,1809,2355,2358,2440,2443,2446,2451,2522,2534,2562,2608,2941 'createifempti':1188 'criteria':2517,3465 'crm':576,579 'css':918,1711,2107,3337 'csv':250,362,738,1389,1393,2718,2882,2928,2929,3353 'csv/excel':805,1374 'csv/json':23,46 'ctatext':1990 'curl':1029,1364,1379,1391,1399 'currenc':1110,1750,1845,1977,2267,2332,2338 'current':1552,1595,1656,3004 'custom':243,915,2088 'cut':2496 'd':1419 'dado':11,34,73,92,101 'dashboard':634 'data':186,195,292,311,320,349,353,400,447,471,509,574,631,666,707,713,719,727,754,808,938,972,986,1070,1079,1179,1267,1281,1298,1308,1333,1457,1462,1481,1491,1506,1514,1530,1550,1597,1626,1658,1813,1837,2151,2208,2324,2347,2378,2458,2493,2545,2579,2713,2719,2774,2786,2859,2951,3126,3141,3156,3170,3263,3358 'date':899,1111,1806,2026,2269,2272,2364,2365,2465,2686 'day':2369 'dd':2281,2690 'de':13,36 'decis':969 'decod':2234,2594 'dedup':2474 'dedupl':296,1586,2301,2374,2729,2856,3364 'default':248,339,381,481,1573,1647,3223 
'delimit':2907 'deliv':2433,2656 'deliveri':2424,2641,2670 'densiti':708 'describ':3433 'descript':549,902,1258,1850,2030,2074,2097,2120 'detail':2408,2651 'detect':254,300,1013,1536,3000,3092,3192 'diff':275,431,3001,3328,3330 'differ':2407,2536,2981,3198 'differenti':298,2994 'direct':450,807,987,1349,1609,3109 'directori':633,878 'disappear':1690 'discount':1848 'discov':1431 'discoveri':231,522,3307 'dismiss':1231 'display':421 'distinct':665 'doc':636,856 'document.queryselectorall':1276,1304,1485 'dom':1516,1522 'domain':136,2343,2349,2966,3345,3404 'download':736,802,812,1023,1375,1390 'duplic':2311,2421,2467,2470 'durat':1211,1357,1618 'e':21,44,865,983,1460,1881,2711,3237 'e-commerc':864 'e.164':2297 'e.g':463,1558,2507,2830 'effort':175 'element':838,997,1255 'email':871,1896,3159 'embed':725 'empti':673,762,1148,2242,2449,2531,2541,2617,2811,2880 'encod':2475,2589,2899 'endpoint':742,798,1021,1371 'endtim':2028 'enough':3415 'enrich':297,2206,2325,2329,3365 'entiti':1818,2233,2479,2593 'envelop':2671,2836 'environ':3445 'environment-specif':3444 'escal':285,319,1132,1151,1160,2743,2747,3210 'establish':328,2149 'estim':1810 'estrategia':9,32 'estruturado':12,35 'et':1408 'et.parse':1412 'etc':943,2731,3194 'evalu':745 'event':241,642,898,903,2011,2023,3295,3297 'events/sessions':2017 'everi':281,1740,2180,2514 'everyth':506,948 'exact':1108,1736,1840,1935,2310,2385,2469 'exampl':569,824,3357 'example.com':2685,3138 'excel':739,2903 'exclud':402,1988 'execut':141,1698 'exist':714 'expect':1143,2445,2524,2561 'expert':3450 'expertis':137,3405 'explain':3185 'explicit':3104,3148 'export':22,45 'extens':2924,2927,2930 
'extract':158,178,208,232,267,282,312,329,355,477,572,589,598,712,723,820,929,963,984,1004,1042,1059,1069,1078,1125,1262,1273,1301,1344,1463,1479,1523,1549,1579,1594,1623,1655,1683,1697,1720,1757,1767,1801,1835,1887,1908,1919,1932,1963,1973,2015,2024,2052,2062,2123,2136,2138,2143,2166,2197,2207,2264,2286,2344,2361,2430,2442,2515,2555,2633,2680,3005,3124,3143,3169,3179,3224,3251,3268,3305,3333 'extractedat':2844 'extrai':10,33 'extrair':72 'factual':1812,3125 'fail':177,287,3180 'failur':3176 'fall':1519 'fallback':291,318 'faq':239,637,879,885,1915,1930,3270,3271 'fast':179 'featur':893,1978,1982,1989 'feed':148,1373 'fetch':205,602,746,2606,3049 'fewer':2412 'field':921,936,952,1085,1149,1526,1768,1841,1871,2010,2047,2087,2096,2119,2323,2393,2403,2450,2457,2487,2520,2532,2558,2616,2628,2698,2891 'field1':1313,1315,1385 'field2':1320,1322,1387 'fieldcount':2849 'fields/columns':720 'file':418,809,2915,2920,2937 'filter':399,1503 'find':537,570,1220,1256,1660,1662,2612,3303 'first':512,1119,1788,1851,1882,1942,2031,2075,2146,2157,2395,2884,2959 'flag':2414,2468,2502 'follow':256,383 'footer':1101 'form':656,848 'format':164,216,245,358,1081,2298,2654,2668,2714,2793,2948,3331 'found':1230,2273 'framework':678 'free':655 'free-form':654 'full':1829,2070,2352,2936 'full-tim':2069 'fulli':990 'gap':2529,2539,2648,2723 'garbl':2480 'general':132 'general-purpos':131 'get':1182,1190,1247,3233,3260,3276,3294 'grid':653,827,840 'group':1951,2971 'handl':125,1532 'header':830,1738,2678,2888 'headings/paragraphs':851 'heavi':688,1446 'help':887 'hh':2691 'high':2518,2552,2701,2852 'high/medium/low':279 'highlight':1997,2980 'href':1330 'html':686,826,1053,2232,2478,2592 'hybrid':1420 'ident':2389 'identifi':1555,2963 'imag':862 'imageurl':1863 'import':1405 'includ':404,1098,1112,1739,1782,1790,1981,2769,2909,2987,3026,3135 'incomplet':2597 'indent':2866 'indic':672,702,823,2802 'infinit':259,700,1589 'info':3130,3253 'inform':1889,2643,3163 'initi':601,1437 'input':3459 
'insuffici':1426 'inteligent':6,29 'intellig':314 'interact':1174 'interleav':2975 'interpret':749 'irrecover':2647 'iso':2276,2845 'issu':1034,2543,2572,2573,2724 'item':391,395,667,1115,1123,1138,1303,1305,1310,1312,1384,1600,1633,1759,1766,1784,1794,1931,2439,2500,2554,2614,2694,2940,3011,3016 'item.queryselector':1314,1321,1328 'itemcount':2847 'iter':1565,1646,1693 'javascript':670,1011,1264,1269,1271,1299,1476,1482,1714,2102,3339 'job':242,640,906,911,2048,2054,2061,3054,3286,3288 'jq':1030,1365,1383,1454 'js':226,689,768,931,1170,1445,2621,3189 'js-heavi':1444 'js-render':225,767,1169,2620 'json':249,361,729,792,941,974,1370,1377,1409,1417,1466,1489,1773,1866,1876,1955,2005,2042,2082,2248,2631,2717,2792,2824,2837,2925,2926,3275,3342,3352 'json-ld':728,940,973,1465,1875,2630,3341 'json.parse':1498 'json.stringify':1297,1332,1505 'keep':2394,2409 'key':1779,1780,1811,1859,2402,2829 'key-valu':1858 'key1':1386 'key2':1388 'known':3111 'label':2970 'lang':430 'languag':423,427,495,502 'larg':3116 'last':2499,3030,3326 'layout':828 'lazi':1177,1337,2604 'lazy-load':1176,1336,2603 'ld':730,942,975,1467,1877,2632,3343 'leav':2809 'left':1674,2756 'left-align':2755 'less':1139 'let':519,551,954 'like':2619 'limit':1983,1986,3039,3421 'link':693,737,803,1024,1327 'link/url':1791 'linkedinurl':1901 'list':235,628,651,716,835,844,904,1753,1769,1781,2055,3244,3245,3287 'list/card':1300 'lista':17,40 'load':262,675,697,764,1178,1207,1338,1625,1649,1664,2605 'load-mor':261,696 'locat':900,909,1253,2029,2065 'log':1158,2636 'login':773,3081,3193 'long':847,1946,2795 'long-form':846 'low':2537,2703 'm':2697 'main':646,718 'mani':3065 'manual':3203 'map':1284,1294,1311,1494 'mark':2000,3009,3014,3019 'markdown':246,359,365,483,1733,1822,1905,2679,2715,2752,2922,3231,3240,3249,3258,3266,3283,3292,3301,3309,3351 'marker':679 'match':924,1114,2386,2398,2523,2560,3430 'max':385,390,1572,1645,1692 'maximum':392,2779,3040 'mcp':1187 'md':2923 'medium':2527,2702 
'mention':55,63,71,81,90,99,486 'metadata':1803,2677,2835,2838,2910 'microdata':731,976,1468 'minim':685,765 'minor':2528 'mismatch':2609 'miss':1091,1870,2009,2046,2086,2578,2806,2877,3467 'mm':2280,2689,2692 'mode':233,299,432,523,821,822,930,1064,1704,1717,1754,1798,1832,1884,1916,1960,2012,2049,2089,2263,2294,2357,3002,3215,3220 'mode-specif':1703 'modifi':3037 'monitoramento':20,43 'monthlypric':1975 'most':1147 'much':710 'multi':8,31,219,265,308,373,2131,2134,2303,2307,2814,2945,3052,3316 'multi-estrategia':7,30 'multi-pag':2302,2813 'multi-strategi':218,307 'multi-url':264,372,2130,2133,2306,2944,3051,3315 'multipl':1540,2140,2226,2380,2788,2953 'n':1560,1563,2229,2420,2695 'n/a':1095,2249,2452,2804 'name':859,870,891,1817,1842,1893,2037,2404,2967 'narrat':3132 'natur':2317 'navig':1099,1193,1196 'near':2397 'need':130,787,933,980,995,1588 'never':165,2808,3206 'new':1632,1642,1684,3010,3013,3033 'newli':1624 'newlin':2896 'news':854 'next':150 'next/prev':692 'nfkc':2237 'none':405 'normal':295,2204,2236,2238,2261,2270,2274,2292,2596,2730,3363 'note':1162,2339,2416,2425,2642,2721,2858 'null':1319,1326,1331,1502,1868,2007,2044,2084,2245,2246,2413,2454,2875 'number':258,393,663,695,1109,1544,2461,2763,2867,2869,3161 'numer':1748,2265,2464,2773 'o':1395 'object':1776,1958 'observ':2726 'occur':2746 'offset':1562 'often':1509 'old':3024 'one':194,210 'opengraph':732,1470 'option':377,1237 'order':145,407,413 'ordered/unordered':650 'organ':1899 'origin':1106,2184,2340 'originalpric':1846 'outlier':2501 'output':244,324,357,485,1416,2193,2490,2916,2995,3140,3222,3347,3439 'outsid':3402 'overview':26 'page':193,257,344,369,376,386,389,480,541,596,610,619,624,639,694,758,868,876,886,897,914,927,1046,1074,1172,1240,1243,1248,1541,1543,1553,1559,1567,1575,1580,1729,1763,1914,1927,1969,2020,2058,2304,2381,2564,2618,2683,2732,2736,2815,2823,2842,3048,3076,3255 'page-numb':1542 'pagin':251,370,382,384,691,1531,1534,1545,2601,2735 'pagina':14,37 'paginacao':19,42 
'pair':883,1861,1924 'paramet':330,336,337,378,379 'pars':1398,1414,3344 'parseabl':2466 'partial':2544,2627 'path':180,415,420,2932,2938 'pattern':1557,1706,2215,3334,3366 'paus':3058 'pdf':740 'per':2456,2782,2822,3043 'per-pag':2821 'perform':510 'permiss':3460 'person':3162 'person/entity':1892 'phase':142,147,167,170,198,325,452,563,590,957,1695,2200,2426,2652 'phone':872,1897,2291,3160 'pitfal':3396 'plan':890,1972,3278 'plannam':1974 'plans/tiers':1966 'point':1814 'poor':785 'popul':2521,2559 'possibl':2300 'post':853,1452,2473 'post-dedup':2472 'post-process':1451 'postedd':2073 'potenti':3154 'practic':3368 'precis':788,996,1266,1749 'preco':18,41 'prefer':1233,2662,2979 'present':514,543,584,783,939,949,1472,2366,2488 'preserv':1105,1236,1747 'pretti':2861 'pretty-print':2860 'prevent':173 'previous':435,3007 'price':240,573,581,638,831,860,889,892,896,1844,1959,1965,2260,2509,2990,3277,3279,3312 'pricenorm':2857 'print':2862 'privaci':1235 'privacy-preserv':1234 'problem':2590 'proceed':449,561,2128 'process':1453,3055 'product':237,627,843,857,858,867,1831,1836,1878,3236,3385 'product/pricing':2262 'productnam':2831 'products/prices':3235 'producturl':1862 'progress':1578,2196 'project':3375,3416 'prompt':616,1060,1068,1719,1756,1800,1834,1886,1918,1962,2014,2051,2114 'protocol':3177 'provid':183,442,1510,2092,3105,3369,3414 'purpos':133 'python':1455 'python3':1403 'qualiti':747,2431 'queri':1221,1257,1663 'question':458,881,1922,1933,1936 'question-answ':880,1921 'quick':3213 'quot':2889,2894 'raspar':91 'rate':278,863,1854,2511,2516,3038 're':1343,2581 're-attempt':2580 're-extract':1342 'reach':1694 'react':680 'read':1239,1242 'readabl':2768 'real':1909 'reason':3188 'recommend':3407 'recommended/popular':2002 'recon':154,511,967,1145,2448 'reconnaiss':592 'record':926,1599,2181,2696 'recoveri':2568,2576,2638 'ref':1676,1677 'refer':3214,3332 'references/data-transforms.md':2212,2213,3360,3361 
'references/extraction-patterns.md':1708,1709,3335,3336 'references/output-templates.md':2664,2665,3349,3350 'registrationurl':2039 'relat':58,66,76,85,94,104,2284,2363 'relev':490,540,1432,2256 'reliabl':1513 'remain':2168,2613 'remov':2228,2309,2419,3015,3018,3035 'render':227,671,769,932,1012,1171,2622,3190 'repeat':836,1639,1687 'report':622,779,1032,2418,2547,2571,2935,3069,3083 'reproduc':1828,3115 'request':127,189,460,492,2315,2486,2919,2998,3042,3066,3149 'requir':335,348,356,3377,3458 'resolut':2283 'resolv':338,380 'respect':3073,3107 'respond':424,497 'respons':777,795 'rest':1368 'result':411,586,630,704,842,968,1585,2158,2189,2434,2598,2658,2674,2681,2816 'retri':3207 'retriev':606 'return':1075,1135,1291,1497,1501,1730,1770,1819,1864,1902,1952,2003,2040,2080,3062 'review':3378,3451 'reviewcount':1855 'rich':751 'right':2761 'right-align':2760 'robots.txt':3108 'role':874,1895 'root':681 'row':1275,1283,1285,1722,1741,2312,2387,2399,2422,2471,2771,2885 'row.queryselectorall':1288 'rule':439,1086,2754,2825,2883 'run':304,436,3008,3031 's.textcontent':1499 'saa':895 'safeti':3461 'salari':910,2066 'salaryrang':2067 'save':414,416,2921,2943 'say':505,3219 'scale':3158 'schedul':3296 'schema':1879,2124,2152,2162,2173 'scope':367,3142,3432 'scrape':5,28,64,83,303,347,464,556,558,1517,2733,3099,3242,3285 'scraper':3,25,56,118,306 'script':1484,1486,1493 'scroll':260,701,1340,1347,1348,1590,1603,1607,1608,1636 'scroll/paginate':2610 'search':629,841 'second':3045 'section':659,1447 'section/category':1949 'see':1707,2211,2663,3175 'seem':2505 'select':316,566,588,960,1700 'selector':919,1278,1306,1316,1323,1712,2094,2108,3338 'sensit':3155 'sequenc':1180 'sequenti':3047,3056 'sheet':3217 'short':2962 'show':1577,2154,2195,2817 'signal':748 'signific':2538 'silent':288 'similar':837 'simpl':191,1048 'simpler':120 'sinc':3029,3325 'singl':192,203,368,375,1423,2192,2337 'site':102,466,3061,3314 'skill':51,110,3391,3399,3424 'skill-web-scraper' 'skip':166 
'slight':2535 'snippet':1715,3340 'sort':406,410,2313,2318 'sourc':271,412,2177,2405,2682,2839,2913,2954,2956,2965,2973,3136 'source-sickn33' 'spa':677 'space':2227,2865 'spas':1173 'speaker':2034 'spec':834,861 'specif':122,352,470,531,1705,1856,2093,2550,2650,3187,3371,3411,3446 'specifi':917,2322 'spinner':676 'split':2784 'staff':877 'standard':2295,2908 'static':222,757,1045 'statist':833,1815,3127 'step':599,743,813 'still':213 'stop':1637,3067,3452 'store':3166 'store-b.com':2993 'strategi':156,220,309,759,770,789,799,810,959,982,1000,1006,1015,1026,1037,1153,1163,1361,1418,1424,1459,1547,1701,1880,2375,2626,2704,2853,3221 'strict':144 'string':2243,2871,2881 'structur':269,323,611,620,648,711,726,786,937,971,985,1052,1241,1461,1480,1529,1821,2542,2657,2712,3129 'substitut':3442 'success':3464 'suggest':2791,3196,3380 'summar':1746,1823,3131 'summari':2770,2988,3027 'supplement':2635 'suspici':1136 'switch':2623 'symbol':2268 'sys':1410 'sys.stdin':1413 'tab':660,1183,1185,1201,1214,1228,1245,1251,1260,1352,1360,1612,1621,1668,1679 'tabela':16,39 'tabid':1191,1200,1213,1227,1244,1250,1259,1351,1359,1611,1620,1667,1678 'tabl':234,247,360,366,484,632,649,825,1049,1272,1277,1716,1725,1734,1906,2251,2716,2753,2783,2789,2819,3226,3227,3232,3241,3250,3259,3284,3293,3302,3310,3322 'tag':733,2079 'tags/categories':1807 'tailor':1061 'target':187,340,350,448,595,614,998,1058,1071,1198,1254 'task':113,3401,3428 'td':1289 'team':875,3254 'templat':2669,3348,3354 'test':3448 'text':364,657,766,849,1249,1830,1937,1941,1996,2362,2758,3121,3267 'textcont':1317,1324 'th':1290 'tier':894 'time':2027,2071,3327 'tip':3346 'titl':901,907,1804,1894,2025,2063,2684,2841,2843 'tool':123,577,580,1270,1477,2103 'top':544,585 'topic':59,67,77,86,95,105,528,3306 'topic-agent-skills' 'topic-agentic-skills' 'topic-ai-agent-skills' 'topic-ai-agents' 'topic-ai-coding' 'topic-ai-workflows' 'topic-antigravity' 'topic-antigravity-skills' 'topic-claude-code' 'topic-claude-code-skills' 
'topic-codex-cli' 'topic-codex-skills' 'total':703,2777 'touch':332 'tr':1279 'transform':160,293,321,2202,2217,2220,2253,2257,2727,2855,3359 'transmit':3168 'treat':3437 'tree':970,1411 'tri':1496,2569,2629 'trim':1318,1325,2224 'true':1189,1998 'truncat':1744,2491,2794 'type':196,625,928,1487,1760,2068,2459 'unclear':1093 'understand':3409 'unicod':2235,2595 'unit':1752 'unitpric':2832 'unless':3102 'unlimit':398 'unnecessari':457 'unrel':115,1103 'url':184,266,334,341,374,444,532,547,567,613,615,1066,1067,1195,1197,1199,1382,1394,1402,1433,1556,2132,2135,2141,2147,2169,2185,2199,2282,2285,2290,2308,2345,2353,2383,2840,2914,2946,3053,3199,3317 'url/link':1127 'use':49,108,534,603,1054,1094,1474,1702,1867,2006,2043,2083,2098,2110,2170,2328,2373,2588,2776,2803,2826,2905,2961,3397,3422 'user':54,62,70,80,89,98,129,182,428,441,500,504,520,525,552,587,781,916,945,1036,1157,1538,1570,2091,2105,2117,2126,2155,2314,2321,2334,2485,2644,2661,2918,2969,2978,2983,2997,3071,3085,3103,3147,3151,3174,3218 'user-request':2484 'user-specifi':2320 'utc':2693 'utf':2897 'valid':162,214,276,2210,2428,2435,3447 'valu':1089,1107,1860,2266,2390,2455,2503,2797,2807,2878,3021,3025 'verifi':2429,2492 'via':1268 'visibl':668,756,991,1596,1657,1911 'vue':682 'wait':1203,1210,1356,1614,1617,1681 'wall':774,3082 'want':474,1539 'warn':3150 'wasn':2494 'wast':174 'web':2,4,15,24,27,38,74,82,117,305,310 'web-scrap':1 'webfetch':211,221,286,604,612,761,993,1002,1039,1055,1065,1134,1435,1718,1755,1799,1833,1885,1917,1961,2013,2050,2111,2586,2706,2749 'websearch':230,535,578,1429,3308 'websit':1900 'well':1051 'well-structur':1050 'whitespac':2222 'wider':2785 'width':2766 'without':135,1155,3408 'word':1808,2354,2359 'work':140 'wrap':2673,2833 'write':2934 'x':2368,2991 'xml':794,1372,1397,1401 'xml.etree.elementtree':1406 'yes':981,994,1005,1014,1025 'yes/no':934 'yyyi':2279,2688 
'yyyy-mm-dd':2278,2687","prices":[{"id":"84919dfa-e6c2-4e5b-9eec-a60829cb2543","listingId":"5fc778b8-c02c-449a-a6b4-0fd5bd632578","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"sickn33","category":"antigravity-awesome-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T21:47:25.194Z"}],"sources":[{"listingId":"5fc778b8-c02c-449a-a6b4-0fd5bd632578","source":"github","sourceId":"sickn33/antigravity-awesome-skills/web-scraper","sourceUrl":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/web-scraper","isPrimary":false,"firstSeenAt":"2026-04-18T21:47:25.194Z","lastSeenAt":"2026-05-18T18:51:59.804Z"},{"listingId":"5fc778b8-c02c-449a-a6b4-0fd5bd632578","source":"skills_sh","sourceId":"sickn33/antigravity-awesome-skills/web-scraper","sourceUrl":"https://skills.sh/sickn33/antigravity-awesome-skills/web-scraper","isPrimary":true,"firstSeenAt":"2026-05-07T20:43:58.613Z","lastSeenAt":"2026-05-07T22:42:30.877Z"}],"details":{"listingId":"5fc778b8-c02c-449a-a6b4-0fd5bd632578","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"sickn33","slug":"web-scraper","github":{"repo":"sickn33/antigravity-awesome-skills","stars":37911,"topics":["agent-skills","agentic-skills","ai-agent-skills","ai-agents","ai-coding","ai-workflows","antigravity","antigravity-skills","claude-code","claude-code-skills","codex-cli","codex-skills","cursor","cursor-skills","developer-tools","gemini-cli","gemini-skills","kiro","mcp","skill-library"],"license":"mit","html_url":"https://github.com/sickn33/antigravity-awesome-skills","pushed_at":"2026-05-18T08:24:49Z","description":"Installable GitHub library of 1,400+ agentic skills for Claude Code, Cursor, Codex CLI, Gemini CLI, Antigravity, and more. 
Includes installer CLI, bundles, workflows, and official/community skill collections.","skill_md_sha":"25dac85a7003e47c1f8d92d4b09526617d713e8b","skill_md_path":"skills/web-scraper/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/web-scraper"},"layout":"multi","source":"github","category":"antigravity-awesome-skills","frontmatter":{"name":"web-scraper","description":"Intelligent multi-strategy web scraping. Extracts structured data from web pages (tables, lists, prices). Pagination, monitoring, and CSV/JSON export."},"skills_sh_url":"https://skills.sh/sickn33/antigravity-awesome-skills/web-scraper"},"updatedAt":"2026-05-18T18:51:59.804Z"}}