Skill quality: 0.45

Common Crawl URL Index Miner

Queries the Common Crawl Index API and CC-MAIN collections to surface historical URL coverage, MIME types, and crawl snapshots at scale. Handy for research workflows that need broad web recall without building a full crawler from scratch.
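As a rough illustration of the kind of lookup described above, the sketch below builds a query URL for the public Common Crawl Index API and parses its newline-delimited JSON response. It is not taken from the skill itself. The helper names (`build_index_query`, `parse_index_lines`, `summarize_mime_types`) are hypothetical, and the collection ID `CC-MAIN-2024-10` is only an example value. The sketch assumes each response line is a JSON object with a `mime` field, and it makes no network request.

```python
import json
from urllib.parse import urlencode

# Public host of the Common Crawl Index (CDX) API.
INDEX_HOST = "https://index.commoncrawl.org"


def build_index_query(collection, url_pattern, limit=None):
    """Build a query URL for one CC-MAIN collection (hypothetical helper)."""
    params = {"url": url_pattern, "output": "json"}
    if limit is not None:
        params["limit"] = str(limit)
    return f"{INDEX_HOST}/{collection}-index?{urlencode(params)}"


def parse_index_lines(body):
    """Parse a newline-delimited JSON response body into a list of dicts."""
    records = []
    for line in body.splitlines():
        line = line.strip()
        if line:  # skip blank lines between records
            records.append(json.loads(line))
    return records


def summarize_mime_types(records):
    """Count records per MIME type, assuming each record has a 'mime' field."""
    counts = {}
    for record in records:
        mime = record.get("mime", "unknown")
        counts[mime] = counts.get(mime, 0) + 1
    return counts
```

To use it, fetch the URL returned by `build_index_query` with any HTTP client and pass the response text to `parse_index_lines`. Keeping URL construction and parsing separate from the fetch makes both easy to check offline.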

Price

free

Protocol

skill

Verified

Endpoint

https://skills.sh/agentskillexchange/skills/common-crawl-url-index-miner

What it does

Common Crawl URL Index Miner

Installation

Use the upstream install or setup path that matches your environment:

docker build . -t cc-index-table
docker run --rm -ti cc-index-table --help
docker run --rm --entrypoint=/opt/spark/bin/spark-submit cc-index-table
docker run --mount=type=bind,source=/tmp/data,destination=/data --rm cc-index-table /data/in /data/out

Requirements and caveats from upstream:

Building and running using Docker
A Dockerfile is provided to compile the project and run the Spark job in a Docker container.
The first step is to build the Docker image (see the docker build command above).

Basic usage or getting-started notes:

This project provides a comprehensive set of example SQL queries, plus Java code to fetch and process the WARC records matched by a SQL query.
To check or fix code formatting, run mvn spotless:check and mvn spotless:apply; see the Spotless Maven guide. Java formatting rules are defined in [eclipse-formatter.xml](eclips...
To run the table converter tool, pass it the input and output paths; the --help option (shown in the docker run command above) prints the command-line help.
Source: https://github.com/commoncrawl/cc-index-table
Extracted from upstream docs: https://raw.githubusercontent.com/commoncrawl/cc-index-table/HEAD/README.md
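The upstream example queries are written in SQL against the columnar index. As a hedged sketch of what such a query might look like, the helper below assembles a MIME-type count for one registered domain in one crawl. The helper name `build_mime_count_query` is hypothetical. The table name `ccindex` and the column names (`crawl`, `subset`, `url_host_registered_domain`, `content_mime_type`) are assumptions based on the upstream schema and should be checked against the repository before use.

```python
def build_mime_count_query(crawl, registered_domain, table="ccindex"):
    """Return a SQL string counting MIME types for one domain in one crawl.

    Table and column names are assumptions; verify them against the
    upstream cc-index-table schema.
    """
    # Minimal guard only: reject values that could break out of the quoted
    # literals. A real client should use parameterized queries instead.
    for value in (crawl, registered_domain, table):
        if "'" in value or ";" in value:
            raise ValueError(f"unsafe value in query: {value!r}")
    return (
        "SELECT content_mime_type, COUNT(*) AS n\n"
        f"FROM {table}\n"
        f"WHERE crawl = '{crawl}'\n"
        "  AND subset = 'warc'\n"
        f"  AND url_host_registered_domain = '{registered_domain}'\n"
        "GROUP BY content_mime_type\n"
        "ORDER BY n DESC"
    )


if __name__ == "__main__":
    # Example values only; the crawl ID is illustrative.
    print(build_mime_count_query("CC-MAIN-2024-10", "example.com"))
```

Filtering on `crawl` and `subset` first is a deliberate choice here, on the assumption that the table is partitioned by those columns, so the query engine can skip data it does not need.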

Source

Agent Skill Exchange

Capabilities

skill · source-agentskillexchange · skill-common-crawl-url-index-miner · topic-agent-skills · topic-ai-agents · topic-ai-tools · topic-awesome-list · topic-claude-code · topic-codex · topic-cursor · topic-llm · topic-mcp · topic-npx-skills · topic-openclaw · topic-skills-catalog

Install

Install: npx skills add agentskillexchange/skills

Source: https://github.com/agentskillexchange/skills/tree/main/skills/common-crawl-url-index-miner

skills.sh: https://skills.sh/agentskillexchange/skills/common-crawl-url-index-miner

Transport: skills-sh

Protocol: skill

Quality

0.45 / 1.00

Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,594 chars)

Provenance

Indexed from: github

Enriched: 2026-05-18 19:09:53Z · deterministic:skill-github:v1 · v1

First seen: 2026-05-18

Last seen: 2026-05-18

Agent access

JSON: https://clawmart.sh/api/listings/fFhVdQ
