Skill quality 0.45

Turn captured WARC pages into clean text and language-tagged records with warc2text

Use warc2text when an agent already has WARC captures and needs readable text, language identification, and exportable records for review, search, or corpus building instead of re-crawling pages.

Price

free

Protocol

skill

Verified

Endpoint

https://skills.sh/agentskillexchange/skills/turn-captured-warc-pages-into-clean-text-and-language-tagged-records-with-warc2text

What it does

Use warc2text when an agent already has WARC captures and needs readable text, language identification, and exportable records for review, search, or corpus building instead of re-crawling pages.

Prerequisites

warc2text build or binary, WARC input files, local output storage

Installation

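Use the upstream install or setup path that matches your environment.

Clone the repository together with its submodules:
git clone --recurse-submodules https://github.com/bitextor/warc2text.git

Or clone it on its own, in which case the submodules still have to be fetched afterwards (for example with git submodule update --init):
git clone https://github.com/bitextor/warc2text.git

On Mac, install the dependencies with Homebrew:
brew install uchardet libzip

Configure the build from a build subdirectory, replacing /your/prefix/path with the install prefix:
cmake -DCMAKE_INSTALL_PREFIX=/your/prefix/path ..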
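The setup commands listed above can be collected into an ordered plan before anything is run. The following is a minimal sketch, not an upstream script: build_plan is a hypothetical helper name, it only returns the commands as argument lists and executes nothing, and it stops at the cmake configure step because the compile and install steps are not part of this listing.

```python
import shlex

# Repository URL as given in the listing.
REPO_URL = "https://github.com/bitextor/warc2text.git"


def build_plan(prefix, macos=False):
    """Return the listed setup commands, in order, as argv lists.

    Hypothetical helper for illustration; it does not run anything.
    """
    plan = []
    if macos:
        # Homebrew dependencies named in the listing.
        plan.append(["brew", "install", "uchardet", "libzip"])
    # Clone the repository and its submodules in one step.
    plan.append(["git", "clone", "--recurse-submodules", REPO_URL])
    # Configure with the chosen install prefix; the ".." argument implies
    # that cmake is run from a build subdirectory of the checkout.
    plan.append(["cmake", f"-DCMAKE_INSTALL_PREFIX={prefix}", ".."])
    return plan


if __name__ == "__main__":
    # Print the plan as shell-quoted lines for review.
    for argv in build_plan("/your/prefix/path", macos=True):
        print(shlex.join(argv))
```

Keeping the plan as data makes it easy to review or log the exact commands before handing them to a shell or to subprocess.run.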

Requirements and caveats from upstream:

On a node with EasyBuild installed, you can install warc2text as a module (see the upstream README for the command).
--skip-text-extraction: Skip text extraction and output only HTML. This option is incompatible with the "text" value of the -f option and also requires skipping language identification.
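The --skip-text-extraction caveat above amounts to two checks that an agent can make before launching a run. The sketch below is an illustration under stated assumptions: check_options and its parameters are hypothetical names, output_files stands for the values passed to -f, and skip_langid is a plain boolean because this listing does not say which option disables language identification.

```python
def check_options(output_files, skip_text_extraction=False, skip_langid=False):
    """Return a list of problems; an empty list means the caveat is satisfied.

    Hypothetical pre-flight check, not part of warc2text itself.
    """
    problems = []
    if skip_text_extraction:
        # The caveat rules out the "text" value of -f in this mode.
        if "text" in output_files:
            problems.append(
                '--skip-text-extraction cannot be combined with "text" in -f'
            )
        # The caveat also requires language identification to be skipped.
        if not skip_langid:
            problems.append(
                "--skip-text-extraction requires skipping language identification"
            )
    return problems
```

For example, check_options(["html"], skip_text_extraction=True, skip_langid=True) returns an empty list, while requesting "text" without skipping language identification reports both problems.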

Basic usage or getting-started notes:

On Debian/Ubuntu/Mint, install the build dependencies:
apt-get install build-essential cmake libuchardet-dev libzip-dev libboost-thread-dev libboost-regex-dev libboost-filesystem-dev libboost-log-dev libboost-iostreams-dev libboost-locale-dev libboost-program-options-dev
On Mac, use the brew install uchardet libzip command listed under Installation.

Source: https://github.com/bitextor/warc2text
Extracted from upstream docs: https://raw.githubusercontent.com/bitextor/warc2text/HEAD/README.md

Documentation

https://github.com/bitextor/warc2text

Source

Agent Skill Exchange

Capabilities

skill, source-agentskillexchange, skill-turn-captured-warc-pages-into-clean-text-and-language-tagged-records-with-warc2text, topic-agent-skills, topic-ai-agents, topic-ai-tools, topic-awesome-list, topic-claude-code, topic-codex, topic-cursor, topic-llm, topic-mcp, topic-npx-skills, topic-openclaw, topic-skills-catalog

Install

Install: npx skills add agentskillexchange/skills

Source: https://github.com/agentskillexchange/skills/tree/main/skills/turn-captured-warc-pages-into-clean-text-and-language-tagged-records-with-warc2text

skills.sh: https://skills.sh/agentskillexchange/skills/turn-captured-warc-pages-into-clean-text-and-language-tagged-records-with-warc2text

Transport: skills-sh

Protocol: skill

Quality

0.45 / 1.00

Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,626 chars)

Provenance

Indexed from: github

Enriched: 2026-05-18 19:12:55Z · deterministic:skill-github:v1 · v1

First seen: 2026-05-18

Last seen: 2026-05-18

Agent access

JSON: https://clawmart.sh/api/listings/cPxARY
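An agent could read this listing programmatically from the JSON address above. The following is a minimal sketch using only the Python standard library; fetch_listing and its opener parameter are hypothetical names, and the response field layout is not documented here, so the function returns whatever JSON the endpoint sends without assuming any keys.

```python
import json
import urllib.request

# JSON address as given in the listing.
LISTING_URL = "https://clawmart.sh/api/listings/cPxARY"


def fetch_listing(url=LISTING_URL, opener=urllib.request.urlopen, timeout=10):
    """Fetch the listing and decode the response body as JSON.

    The opener argument exists so a stub can replace the network call;
    no particular response schema is assumed.
    """
    with opener(url, timeout=timeout) as response:
        return json.load(response)
```

Passing a stub as opener keeps the parsing logic checkable without network access.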