Skill quality: 0.45

Common Crawl URL Index Miner

Queries the Common Crawl Index API and CC-MAIN collections to surface historical URL coverage, MIME types, and crawl snapshots at scale. Handy for research workflows that need broad web recall without building a full crawler from scratch.

Price: free
Protocol: skill
Verified: no

What it does

Queries the Common Crawl Index API and CC-MAIN collections to surface historical URL coverage, MIME types, and crawl snapshots at scale. Handy for research workflows that need broad web recall without building a full crawler from scratch.
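
As a rough illustration of the kind of query described above, the Python sketch below builds a request URL for the public Common Crawl Index server and tallies MIME types from a newline-delimited JSON response. It is not taken from the skill's own code: the host index.commoncrawl.org, the url, output, and limit parameters, the "mime" field, and all helper names are assumptions based on the public CDX-style index API, so check them against the upstream documentation before relying on them.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: public CDX-style endpoint of the Common Crawl Index server.
INDEX_HOST = "https://index.commoncrawl.org"


def build_query_url(collection, url_pattern, limit=50):
    """Build a query URL for one crawl collection (an ID such as "CC-MAIN-2024-10")."""
    params = urlencode({"url": url_pattern, "output": "json", "limit": limit})
    return f"{INDEX_HOST}/{collection}-index?{params}"


def parse_records(body):
    """Parse a newline-delimited JSON response body into a list of dicts."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def mime_counts(records):
    """Count captures per MIME type; records without a "mime" field count as "unknown"."""
    return Counter(rec.get("mime", "unknown") for rec in records)


def fetch_records(collection, url_pattern, limit=50):
    """Run the query over the network. Nothing in this sketch calls it automatically."""
    url = build_query_url(collection, url_pattern, limit)
    with urlopen(url, timeout=30) as resp:
        return parse_records(resp.read().decode("utf-8"))
```

Calling fetch_records("CC-MAIN-2024-10", "example.com/*") would send the request; the collection ID is only an example, and the index server publishes the list of available CC-MAIN collections.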

Installation

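Upstream documents a Docker-based setup. Run the commands from the directory that contains the Dockerfile:

  • docker build . -t cc-index-table
    Builds the image from the current directory and tags it cc-index-table.
  • docker run --rm -ti cc-index-table --help
    Starts a throwaway container and passes --help to the image's entrypoint.
  • docker run --rm --entrypoint=/opt/spark/bin/spark-submit cc-index-table
    Overrides the entrypoint so that spark-submit is invoked directly.
  • docker run --mount=type=bind,source=/tmp/data,destination=/data --rm cc-index-table /data/in /data/out
    Bind-mounts the host directory /tmp/data at /data in the container and passes /data/in and /data/out as arguments (presumably the input and output paths).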
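
The Docker steps above can also be scripted. The Python sketch below only assembles the same command lines as argument lists and, by default, prints them instead of running them. The helper names (build_cmd, job_cmd, run) and the dry_run switch are hypothetical and not part of the upstream project; the image tag and paths are copied from the commands listed above.

```python
import shlex
import subprocess

IMAGE = "cc-index-table"  # image tag used in the upstream commands


def build_cmd(context="."):
    """Command that builds the image from the Dockerfile in `context`."""
    return ["docker", "build", context, "-t", IMAGE]


def job_cmd(host_dir, in_path="/data/in", out_path="/data/out"):
    """Command that bind-mounts `host_dir` at /data and passes two path arguments."""
    mount = f"--mount=type=bind,source={host_dir},destination=/data"
    return ["docker", "run", mount, "--rm", IMAGE, in_path, out_path]


def run(cmd, dry_run=True):
    """Print the command when dry_run is true; otherwise execute it with Docker."""
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    run(build_cmd())
    run(job_cmd("/tmp/data"))
```

Passing dry_run=False executes the commands and therefore needs a working Docker installation and the upstream Dockerfile in the build context.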

Requirements and caveats from upstream:

  • Both building and running go through Docker.
  • A Dockerfile is provided to compile the project and run the Spark job in a Docker container.
  • Build the Docker image (docker build . -t cc-index-table) before running the job.

Capabilities

  • skill
  • source-agentskillexchange
  • skill-common-crawl-url-index-miner
  • topic-agent-skills
  • topic-ai-agents
  • topic-ai-tools
  • topic-awesome-list
  • topic-claude-code
  • topic-codex
  • topic-cursor
  • topic-llm
  • topic-mcp
  • topic-npx-skills
  • topic-openclaw
  • topic-skills-catalog

Quality

0.45 / 1.00

Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,594 chars)

Provenance

Indexed from: github
Enriched: 2026-05-18 19:09:53Z · deterministic:skill-github:v1 · v1
First seen: 2026-05-18
Last seen: 2026-05-18
