Normalize and filter noisy URL lists before crawling or queueing
Uses Courlan to clean, normalize, de-track, and language-filter raw URL inventories before a crawler, scraper, or analyst queue touches them. Best when an agent already has too many candidate links and needs a smaller, cleaner frontier, not a full crawling stack.
What it does
Uses Courlan to clean, normalize, de-track, and language-filter raw URL inventories before a crawler, scraper, or analyst queue touches them. Best when an agent already has too many candidate links and needs a smaller, cleaner frontier, not a full crawling stack.
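To make the clean, normalize, and de-track steps concrete, here is a minimal standard-library sketch of the general idea. It is not Courlan's API or implementation: the helper names `tidy_url` and `build_frontier` and the short tracker list are hypothetical, chosen only for illustration, and language filtering is left out entirely.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracker list for illustration only; Courlan applies its own rules.
TRACKING_KEYS = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)


def tidy_url(raw):
    """Return a normalized http(s) URL, or None if the input is unusable."""
    try:
        parts = urlsplit(raw.strip())
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return None
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    # Drop query parameters that look like trackers, keep the rest.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_KEYS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    # urlsplit lowercases the scheme, and .hostname lowercases the host.
    # Any user:password part of the URL is intentionally dropped here.
    host = parts.hostname if port is None else f"{parts.hostname}:{port}"
    path = parts.path or "/"
    # Sort parameters for a stable form and discard the fragment.
    return urlunsplit((parts.scheme, host, path, urlencode(sorted(kept)), ""))


def build_frontier(raw_urls):
    """Normalize each URL, skip rejects, and deduplicate while keeping order."""
    seen, frontier = set(), []
    for raw in raw_urls:
        url = tidy_url(raw)
        if url and url not in seen:
            seen.add(url)
            frontier.append(url)
    return frontier
```

Calling `build_frontier` on a raw link list yields a smaller, deduplicated list that can be handed to a crawler or queue. In practice Courlan would replace both helpers and add the language-aware filtering this sketch omits.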
Prerequisites
Python 3, pip, command line
Installation
Use the upstream install or setup path that matches your environment:
- $ pip install courlan # pip3 install on systems where both Python 2 and 3 are installed
- $ pip install --upgrade courlan # to make sure you have the latest version
- $ pip install git+https://github.com/adbar/courlan.git # latest available code
Requirements and caveats from upstream:
- Courlan is tested on Linux, macOS and Windows systems.
Basic usage or getting-started notes:
- Courlan is available on the package repository PyPI.
- Source: https://github.com/adbar/courlan
- Extracted from upstream docs: https://raw.githubusercontent.com/adbar/courlan/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,583 chars)