{"id":"384ffb61-ca53-43b7-afe7-24e994e2d63b","shortId":"GtmDtd","kind":"skill","title":"Normalize and filter noisy URL lists before crawling or queueing","tagline":"Uses Courlan to clean, normalize, de-track, and language-filter raw URL inventories before a crawler, scraper, or analyst queue touches them. Best when an agent already has too many candidate links and needs a smaller, cleaner frontier, not a full crawling stack.","description":"# Normalize and filter noisy URL lists before crawling or queueing\n\nUses Courlan to clean, normalize, de-track, and language-filter raw URL inventories before a crawler, scraper, or analyst queue touches them. Best when an agent already has too many candidate links and needs a smaller, cleaner frontier, not a full crawling stack.\n\n## Prerequisites\n\nPython 3, pip, command line\n\n## Installation\n\nUse the upstream install or setup path that matches your environment:\n- $ pip install courlan # pip3 install on systems where both Python 2 and 3 are installed\n- $ pip install --upgrade courlan # to make sure you have the latest version\n- $ pip install git+https://github.com/adbar/courlan.git # latest available code (see build status above)\n\nRequirements and caveats from upstream:\n- [![Python package](https://img.shields.io/pypi/v/courlan.svg)](https://pypi.python.org/pypi/courlan)\n- [![Python versions](https://img.shields.io/pypi/pyversions/courlan.svg)](https://pypi.python.org/pypi/courlan)\n- Usable with Python or on the command-line\n\nBasic usage or getting-started notes:\n- is tested on Linux, macOS and Windows systems.\n- Courlan is available on the package repository [PyPI](https://pypi.org/)\n- bash\n\n- Source: https://github.com/adbar/courlan\n- Extracted from upstream docs: https://raw.githubusercontent.com/adbar/courlan/HEAD/README.md\n\n## Documentation\n\n- https://adrien.barbaresi.eu/blog/easy-content-aware-url-filtering.html\n\n## Source\n\n- [Agent Skill Exchange](https://agentskillexchange.com/skills/normalize-and-filter-noisy-url-lists-before-crawling-or-queueing/)","tags":["normalize","and","filter","noisy","url","lists","before","crawling","queueing","skills","agentskillexchange","agent-skills"],"capabilities":["skill","source-agentskillexchange","skill-normalize-and-filter-noisy-url-lists-before-crawling-or-queueing","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-awesome-list","topic-claude-code","topic-codex","topic-cursor","topic-llm","topic-mcp","topic-npx-skills","topic-openclaw","topic-skills-catalog"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/agentskillexchange/skills/normalize-and-filter-noisy-url-lists-before-crawling-or-queueing","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add agentskillexchange/skills","source_repo":"https://github.com/agentskillexchange/skills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (1,583 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:11:26.133Z","embedding":null,"createdAt":"2026-05-18T13:17:55.817Z","updatedAt":"2026-05-18T19:11:26.133Z","lastSeenAt":"2026-05-18T19:11:26.133Z","tsv":"'/)':218 '/adbar/courlan':223 '/adbar/courlan.git':161 '/adbar/courlan/head/readme.md':230 '/blog/easy-content-aware-url-filtering.html':234 '/pypi/pyversions/courlan.svg)](https://pypi.python.org/pypi/courlan)':183 '/pypi/v/courlan.svg)](https://pypi.python.org/pypi/courlan)':178 '/skills/normalize-and-filter-noisy-url-lists-before-crawling-or-queueing/)':241 '2':139 '3':113,141 'adrien.barbaresi.eu':233 'adrien.barbaresi.eu/blog/easy-content-aware-url-filtering.html':232 'agent':38,93,236 'agentskillexchange.com':240 'agentskillexchange.com/skills/normalize-and-filter-noisy-url-lists-before-crawling-or-queueing/)':239 'alreadi':39,94 'analyst':31,86 'avail':163,210 'bash':219 'basic':193 'best':35,90 'build':166 'candid':43,98 'caveat':171 'clean':14,69 'cleaner':49,104 'code':164 'command':115,191 'command-lin':190 'courlan':12,67,131,147,208 'crawl':8,54,63,109 'crawler':28,83 'de':17,72 'de-track':16,71 'doc':227 'document':231 'environ':128 'exchang':238 'extract':224 'filter':3,22,58,77 'frontier':50,105 'full':53,108 'get':197 'getting-start':196 'git':158 'github.com':160,222 'github.com/adbar/courlan':221 'github.com/adbar/courlan.git':159 'img.shields.io':177,182 'img.shields.io/pypi/pyversions/courlan.svg)](https://pypi.python.org/pypi/courlan)':181 'img.shields.io/pypi/v/courlan.svg)](https://pypi.python.org/pypi/courlan)':176 'instal':117,121,130,133,143,145,157 'inventori':25,80 'languag':21,76 'language-filt':20,75 'latest':154,162 'line':116,192 'link':44,99 'linux':203 'list':6,61 'maco':204 'make':149 'mani':42,97 'match':126 'need':46,101 'noisi':4,59 'normal':1,15,56,70 'note':199 'packag':175,213 'path':124 'pip':114,129,144,156 'pip3':132 'prerequisit':111 'pypi':215 'pypi.org':217 'pypi.org/)':216 'python':112,138,174,179,186 'queue':10,32,65,87 'raw':23,78 'raw.githubusercontent.com':229 'raw.githubusercontent.com/adbar/courlan/head/readme.md':228 'repositori':214 'requir':169 'scraper':29,84 'see':165 'setup':123 'skill':237 'skill-normalize-and-filter-noisy-url-lists-before-crawling-or-queueing' 'smaller':48,103 'sourc':220,235 'source-agentskillexchange' 'stack':55,110 'start':198 'status':167 'sure':150 'system':135,207 'test':201 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-awesome-list' 'topic-claude-code' 'topic-codex' 'topic-cursor' 'topic-llm' 'topic-mcp' 'topic-npx-skills' 'topic-openclaw' 'topic-skills-catalog' 'touch':33,88 'track':18,73 'upgrad':146 'upstream':120,173,226 'url':5,24,60,79 'usabl':184 'usag':194 'use':11,66,118 'version':155,180 'window':206","prices":[{"id":"f509b269-8761-4ca9-8153-a1bb9732f154","listingId":"384ffb61-ca53-43b7-afe7-24e994e2d63b","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"agentskillexchange","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:17:55.817Z"}],"sources":[{"listingId":"384ffb61-ca53-43b7-afe7-24e994e2d63b","source":"github","sourceId":"agentskillexchange/skills/normalize-and-filter-noisy-url-lists-before-crawling-or-queueing","sourceUrl":"https://github.com/agentskillexchange/skills/tree/main/skills/normalize-and-filter-noisy-url-lists-before-crawling-or-queueing","isPrimary":false,"firstSeenAt":"2026-05-18T13:17:55.817Z","lastSeenAt":"2026-05-18T19:11:26.133Z"}],"details":{"listingId":"384ffb61-ca53-43b7-afe7-24e994e2d63b","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"agentskillexchange","slug":"normalize-and-filter-noisy-url-lists-before-crawling-or-queueing","github":{"repo":"agentskillexchange/skills","stars":8,"topics":["agent-skills","ai-agents","ai-tools","awesome-list","claude-code","codex","cursor","llm","mcp","npx-skills","openclaw","skills-catalog"],"license":"mit","html_url":"https://github.com/agentskillexchange/skills","pushed_at":"2026-05-18T19:02:17Z","description":"The open catalog of AI agent skills — 2,000+ security-scanned skills for Claude Code, Cursor, Codex, and more.","skill_md_sha":"f70dde370035bb879d4a87c1762a966236df75f9","skill_md_path":"skills/normalize-and-filter-noisy-url-lists-before-crawling-or-queueing/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/agentskillexchange/skills/tree/main/skills/normalize-and-filter-noisy-url-lists-before-crawling-or-queueing"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"Normalize and filter noisy URL lists before crawling or queueing","description":"Uses Courlan to clean, normalize, de-track, and language-filter raw URL inventories before a crawler, scraper, or analyst queue touches them. Best when an agent already has too many candidate links and needs a smaller, cleaner frontier, not a full crawling stack."},"skills_sh_url":"https://skills.sh/agentskillexchange/skills/normalize-and-filter-noisy-url-lists-before-crawling-or-queueing"},"updatedAt":"2026-05-18T19:11:26.133Z"}}