Skill quality: 0.46

web-search

Web search and content extraction skill for AI coding agents. Zero API keys required. Decision tree with fallback chains across 5 tools.

Price: free
Protocol: skill
Verified: no

What it does

Web Search

Web search, scraping, and content extraction for AI coding agents. Zero API keys required. Five tools organized in fallback chains: WebSearch and Crawl4AI as primary, Jina as secondary, duckduckgo-search and WebFetch as fallbacks. Use when your agent needs web information -- finding pages, extracting content, or conducting research.

Terminology used in this file:

  • Playwright: Browser automation framework used by Crawl4AI for JavaScript-rendered pages.
  • SPA: Single-page application; content is rendered dynamically in JavaScript.
  • MCP: Model Context Protocol, a standard for exposing tool servers to AI agents.

Setup

python3 -m pip install crawl4ai duckduckgo-search
crawl4ai-setup
  • Claude Code: copy this skill folder into .claude/skills/web-search/
  • Codex CLI: append this SKILL.md content to your project's root AGENTS.md

For the full installation walkthrough (prerequisites, verification, troubleshooting), see references/installation-guide.md.

Staying Updated

This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.

After installing, tell your agent: "Check UPDATES.md in the web-search skill for any new features or changes."

When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."

Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.


Quick Start

Run this minimal fallback-safe sequence:

# 1) Find candidate pages
python3 -c "from duckduckgo_search import DDGS; import json; print(json.dumps(DDGS().text('your query', max_results=5), indent=2))"

# 2) Extract one page quickly (no local deps)
curl -s "https://r.jina.ai/http://example.com/article" | head -80

# 3) Escalate to Crawl4AI if JS rendering is needed
crwl https://example.com/app -o markdown --bypass-cache

Use this routing rule: search with WebSearch first, extract with Jina/WebFetch for simple pages, escalate to Crawl4AI for JS-heavy targets.

Decision Tree

Need info from the web?
  |
  +-- Need to SEARCH for pages/answers?
  |     +-- Default first choice --> WebSearch (built-in, zero setup)
  |     +-- WebSearch unavailable? --> Jina s.jina.ai (no key needed)
  |     +-- Both fail? --> duckduckgo-search Python lib (emergency fallback)
  |
  +-- Need to EXTRACT content from a known URL?
  |     +-- JS-heavy SPA, dynamic content? --> Crawl4AI crwl (full browser rendering)
  |     +-- Simple text page (article, docs, blog)? --> Jina r.jina.ai (fast, no install)
  |     +-- Jina/Crawl4AI unavailable? --> WebFetch (built-in, AI-summarized)
  |     +-- Need structured data extraction? --> Crawl4AI with extraction strategy
  |     +-- Multiple URLs in batch? --> Crawl4AI batch mode
  |
  +-- Need DEEP RESEARCH (search + extract + combine)?
        --> WebSearch to find URLs --> Crawl4AI/Jina extract each --> synthesize

Rule of thumb: WebSearch for finding, Jina for reading, Crawl4AI for rendering.
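The rule of thumb can be sketched as a small routing helper. This is illustrative only: the returned strings are labels for the tools above, not real APIs, and the `task`/`js_heavy`/`batch` parameters are assumptions made for the sketch.

```python
def pick_tool(task: str, js_heavy: bool = False, batch: bool = False) -> str:
    """Map a task to the preferred tool per the decision tree above."""
    if task == "search":
        return "WebSearch"          # fall back to Jina s.jina.ai, then duckduckgo-search
    if task == "extract":
        if batch or js_heavy:
            return "Crawl4AI"       # full browser rendering / batch mode
        return "Jina r.jina.ai"     # fast path for static articles, docs, blogs
    if task == "research":
        return "WebSearch + Crawl4AI/Jina"  # find URLs, extract each, synthesize
    raise ValueError(f"unknown task: {task}")

print(pick_tool("extract", js_heavy=True))  # Crawl4AI
```

An agent following this skill would apply the same branching mentally; the helper just makes the priorities explicit.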

Tool Reference

WebSearch (Built-in) -- Primary Search

What: Claude Code built-in web search tool. Returns search results with links and snippets.
Install required: None (built-in to Claude Code)
Strengths: Zero setup, zero API keys, integrated into agent workflow, always available
Weaknesses: No direct SDK/CLI access (tool-only); results are search-result blocks, not raw JSON

# Invoked as a Claude Code tool:
WebSearch(query="your search query")

# Supports domain filtering:
WebSearch(query="your query", allowed_domains=["docs.python.org"])
WebSearch(query="your query", blocked_domains=["pinterest.com"])

Returns: Search result blocks with titles, URLs, and content snippets.

WebFetch (Built-in) -- Fallback URL Extraction

What: Claude Code built-in URL fetcher. Fetches page content, converts HTML to markdown, processes with AI.
Install required: None (built-in to Claude Code)
Strengths: Zero setup, AI-processed output, handles redirects, 15-min cache
Weaknesses: Cannot handle authenticated/private URLs; may summarize large content

# Invoked as a Claude Code tool:
WebFetch(url="https://example.com/page", prompt="Extract the main content")

Limitations:

  • Will fail for authenticated URLs (Google Docs, Confluence, Jira, private GitHub)
  • HTTP auto-upgraded to HTTPS
  • Large content may be summarized rather than returned in full
  • When redirected to a different host, returns redirect URL instead of content

Crawl4AI -- JS-Rendering Web Scraper

What: Open-source scraper with full Playwright browser rendering. Outputs LLM-friendly markdown.
Install required: pip install crawl4ai && crawl4ai-setup
Strengths: Full JS rendering, handles SPAs, batch crawling, structured extraction
Weaknesses: Requires Playwright install; heavier than Jina

# CLI (simplest)
crwl https://example.com
crwl https://example.com -o markdown

# Python API (async: must run inside an event loop)
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url='https://example.com')
        print(result.markdown)

asyncio.run(main())

Jina Reader/Search -- Zero-Install Extraction & Search

What: URL-to-markdown converter and search via HTTP API. No install needed -- just curl.
API key: Not required. JINA_API_KEY is optional and only increases rate limits.
Strengths: Zero install, fast (~1s), works everywhere curl works, search + extract in one service
Weaknesses: No JS rendering; rate limited without API key

# Read a URL (returns markdown)
curl -s 'https://r.jina.ai/https://example.com'

# Search (returns search results)
curl -s 'https://s.jina.ai/your+search+query'

# With API key (higher rate limits, optional)
curl -s -H "Authorization: Bearer $JINA_API_KEY" 'https://r.jina.ai/https://example.com'
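The same calls can be issued from Python with the standard library. This sketch only assumes the URL-prefixing and optional Authorization header shown in the curl examples above; the `jina_request` helper name is ours.

```python
import os
import urllib.request

def jina_request(kind: str, target: str) -> urllib.request.Request:
    """Build a Jina Reader ('read') or Search ('search') request.

    Reader prefixes the full target URL; Search takes a plus-encoded query.
    """
    base = "https://r.jina.ai/" if kind == "read" else "https://s.jina.ai/"
    req = urllib.request.Request(base + target)
    key = os.environ.get("JINA_API_KEY")  # optional: raises rate limits
    if key:
        req.add_header("Authorization", f"Bearer {key}")
    return req

req = jina_request("read", "https://example.com")
print(req.full_url)  # https://r.jina.ai/https://example.com
# body = urllib.request.urlopen(req, timeout=30).read().decode()  # actual network call
```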

duckduckgo-search -- Emergency Search Fallback

What: Python library for DuckDuckGo search. Zero API keys, zero registration.
Install required: pip install duckduckgo-search
Strengths: Completely free, no API key, reliable fallback
Weaknesses: Less AI-optimized results than WebSearch; Python-only; rapid repeated calls can hit rate limits

from duckduckgo_search import DDGS
results = DDGS().text("your query", max_results=5)
for r in results:
    print(r['title'], r['href'], r['body'])
# One-liner from CLI
python3 -c "from duckduckgo_search import DDGS; import json; print(json.dumps(DDGS().text('your query', max_results=5), indent=2))"

Core Workflows

Pattern 1: Quick Web Search

When: You need factual answers or want to find relevant pages

  1. Use WebSearch: WebSearch(query="your query here")
  2. Parse results: each result has title, URL, and content snippet
  3. Fallback: curl -s 'https://s.jina.ai/your+query+here'
  4. Emergency: python3 -c "from duckduckgo_search import DDGS; ..."
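The four steps above can be wrapped in one helper that tries each backend in order and returns the first non-empty result. This is a sketch: the lambda backends stand in for the real WebSearch/Jina/duckduckgo-search calls, and `search_with_fallback` is an illustrative name.

```python
def search_with_fallback(query, backends):
    """Try each (name, fn) backend in order; return the first non-empty result."""
    errors = []
    for name, fn in backends:
        try:
            results = fn(query)
            if results:
                return name, results
        except Exception as exc:  # rate limits, network errors, etc.
            errors.append((name, exc))
    raise RuntimeError(f"all backends failed: {errors}")

# Stand-in backends; replace with the real WebSearch/Jina/duckduckgo calls:
chain = [
    ("WebSearch", lambda q: []),  # pretend the primary returned nothing
    ("Jina", lambda q: [{"title": q, "url": "https://example.com"}]),
]
name, hits = search_with_fallback("your query", chain)
print(name)  # Jina
```

The same shape works for the extraction chain in Pattern 2: swap in extraction callables and treat empty markdown as a failure.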

Pattern 2: URL Content Extraction

When: Have a URL, need its content as clean text/markdown

a) JS-heavy site: crwl URL (Crawl4AI, full rendering)
b) Lightweight static page: curl -s 'https://r.jina.ai/URL' (Jina)
c) Both fail: WebFetch(url="URL", prompt="Extract the main content")

Decision: Is it a SPA or otherwise JS-heavy? Use Crawl4AI. Static content? Use Jina first. If the output is empty or broken, escalate to the next tool.

Pattern 3: Deep Research

When: Need comprehensive research on a topic with multiple sources

  1. WebSearch to find relevant pages
  2. Pick top 3-5 URLs from results
  3. Extract each with Crawl4AI or Jina
  4. If any extraction fails (JS site), use the other tool
  5. Synthesize extracted content into research summary

Token budget: roughly 5K tokens per extracted page; budget ~25K total for 5 pages.
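A rough way to enforce that budget is to cap each page's text before synthesis. This sketch assumes the common heuristic of ~4 characters per token; the function name and the "[truncated]" marker are ours.

```python
def truncate_to_tokens(text: str, max_tokens: int = 5000, chars_per_token: int = 4) -> str:
    """Crude token cap: keep roughly max_tokens worth of characters."""
    limit = max_tokens * chars_per_token
    return text if len(text) <= limit else text[:limit] + "\n[truncated]"

page = "x" * 30_000          # pretend extracted markdown
capped = truncate_to_tokens(page)
print(len(capped))           # ~20K chars plus the truncation marker
```

For precise budgeting you would count tokens with your model's actual tokenizer; the character heuristic is just cheap and dependency-free.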

Pattern 4: Batch URL Scraping

When: Need content from multiple URLs (5+)

import asyncio
from crawl4ai import AsyncWebCrawler
urls = ['url1', 'url2', 'url3']

async def batch():
    async with AsyncWebCrawler() as crawler:
        for url in urls:
            result = await crawler.arun(url=url)
            print(f'--- {url} ---')
            print(result.markdown[:2000])

asyncio.run(batch())
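The loop above crawls pages one at a time; they can also be crawled concurrently with asyncio.gather. In this sketch a stand-in `fetch` coroutine replaces `crawler.arun` so the concurrency shape is visible without the Crawl4AI dependency.

```python
import asyncio

async def fetch(url: str) -> str:
    # Stand-in for `await crawler.arun(url=url)`; returns fake markdown.
    await asyncio.sleep(0)
    return f"# {url}"

async def batch_concurrent(urls):
    """Crawl all URLs concurrently and pair each with its result."""
    results = await asyncio.gather(*(fetch(u) for u in urls))
    return dict(zip(urls, results))

out = asyncio.run(batch_concurrent(["url1", "url2", "url3"]))
print(out["url2"])  # # url2
```

With real sites, add a semaphore or small batch size so you don't hammer one host with dozens of simultaneous requests.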

Pattern 5: Fallback Chain

When: Primary tool fails

Search chain: WebSearch (built-in) --> Jina s.jina.ai --> duckduckgo-search

Extract chain: Crawl4AI crwl --> Jina r.jina.ai --> WebFetch (built-in)

Always try the primary tool first, escalate on failure.


MCP Configuration

Jina MCP (optional enhancement, not required):

{
  "jina-reader": {
    "command": "npx",
    "args": ["-y", "jina-ai-reader-mcp"]
  }
}

MCP (Model Context Protocol) is optional. Your agent can use CLI/Python/built-in tools directly.


Environment Setup

Zero API keys required. All tools work out of the box.

Optional:

  • JINA_API_KEY (get from https://jina.ai) -- increases rate limits, not required
export JINA_API_KEY='jina_...'  # optional

Install:

  • pip install crawl4ai duckduckgo-search
  • crawl4ai-setup # installs Playwright browsers

Built-in tools (WebSearch, WebFetch) require no installation.

Verify: ./scripts/search-check.sh


Anti-Patterns

Do NOT --> Do instead

  • Use Crawl4AI for simple text pages --> Use Jina r.jina.ai (zero overhead)
  • Use Jina for JS-heavy SPAs --> Use Crawl4AI (Jina has no JS rendering)
  • Skip the fallback chain --> Always have a backup: WebSearch -> Jina -> duckduckgo; Crawl4AI -> Jina -> WebFetch
  • Extract full pages when you need one fact --> Use WebSearch (returns relevant snippets directly)
  • Batch with Jina for 10+ URLs --> Use Crawl4AI batch mode (designed for it)
  • Forget rate limits --> Jina without an API key has stricter limits
  • Use WebFetch for authenticated URLs --> It will fail; use the browser-ops skill or direct API access

Error Handling

Each entry lists Symptom (Tool) -- Cause. Fix.

  • No results returned (WebSearch) -- Query too specific or topic too niche. Fix: broaden the query; try Jina s.jina.ai or duckduckgo-search.
  • Redirect notification (WebFetch) -- URL redirects to a different host. Fix: make a new WebFetch request with the provided redirect URL.
  • Auth failure (WebFetch) -- Authenticated/private URL. Fix: use the browser-ops skill or direct API access instead.
  • Content summarized (WebFetch) -- Page content too large. Fix: use Jina r.jina.ai or Crawl4AI for full content.
  • 429 Too Many Requests (Jina) -- Rate limit hit. Fix: add the JINA_API_KEY header, or add a delay between requests.
  • Empty/truncated output (Jina) -- JS-rendered content not captured. Fix: escalate to Crawl4AI: crwl URL.
  • crwl: command not found (Crawl4AI) -- Not installed or not on PATH. Fix: pip install crawl4ai && crawl4ai-setup.
  • Playwright browser not found (Crawl4AI) -- crawl4ai-setup not run. Fix: run crawl4ai-setup.
  • TimeoutError (Crawl4AI) -- Page too slow or blocking. Fix: add a timeout parameter; check if the site blocks bots.
  • SSL certificate error (Any) -- Expired or self-signed cert. Fix: retry; for Crawl4AI add ignore_https_errors=True.
  • 403 Forbidden (Jina/Crawl4AI) -- Site blocking automated access. Fix: try a different tool from the fallback chain.
  • ImportError: duckduckgo_search (duckduckgo-search) -- Package not installed. Fix: pip install duckduckgo-search.
  • RatelimitException (duckduckgo-search) -- Too many requests too fast. Fix: add a 1-2s delay between calls, or switch to WebSearch.
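For the rate-limit symptoms, a small delay-and-retry wrapper is often enough. This is a sketch: it catches any exception for brevity, where real code would catch the specific rate-limit exception raised by your installed library version.

```python
import time

def with_retries(fn, attempts=3, delay=1.5):
    """Call fn(); on failure wait `delay` seconds and retry, doubling the delay."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(delay)
            delay *= 2  # simple exponential backoff

# Demonstration with a function that fails once, then succeeds:
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 2:
        raise RuntimeError("429 Too Many Requests")
    return "ok"

print(with_retries(flaky, delay=0.01))  # ok
```

If retries keep failing, fall through to the next tool in the fallback chain rather than retrying indefinitely.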

Bundled Resources Index

Each entry lists Path -- What. When to load.

  • ./UPDATES.md -- Structured changelog for AI agents. Load when checking for new features or updates.
  • ./UPDATE-GUIDE.md -- Instructions for AI agents performing updates. Load when updating this skill.
  • ./references/installation-guide.md -- Detailed install walkthrough for Claude Code and Codex CLI. Load for first-time setup or environment repair.
  • ./references/tool-comparison.md -- Side-by-side comparison: latency, cost, JS support, accuracy. Load when choosing between tools for a specific use case.
  • ./references/error-patterns.md -- Detailed failure modes and recovery per tool. Load when debugging a failed extraction or search.
  • ./scripts/search-check.sh -- Health check: verifies all tools are available. Run before the first web search task in a session.
  • ./scripts/setup.sh -- One-shot installer for all dependencies. Run for first-time setup or after environment reset.

Capabilities

skill · source-buildoak · skill-web-search · topic-agent-skills · topic-ai-agents · topic-ai-tools · topic-automation · topic-browser-automation · topic-claude-code · topic-claude-skills · topic-codex

Install

Install: npx skills add buildoak/fieldwork-skills
Transport: skills-sh
Protocol: skill

Quality

0.46 / 1.00

deterministic score 0.46 from registry signals: indexed on github topic:agent-skills · 15 github stars · SKILL.md body (12,279 chars)

Provenance

Indexed from: github
Enriched: 2026-04-22 19:06:34Z · deterministic:skill-github:v1 · v1
First seen: 2026-04-18
Last seen: 2026-04-22

Agent access