Scrapy Distributed Crawler Framework
Orchestrates large-scale web crawling using Scrapy with scrapy-redis for distributed job queuing. Integrates Splash for JavaScript rendering, stores results in MongoDB via scrapy-mongodb pipeline, and respects robots.txt with AutoThrottle.
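The components named above are normally wired together in a Scrapy project's ``settings.py``. The following is a minimal sketch, not the configuration of this framework: the setting names are the ones documented by Scrapy, scrapy-redis, scrapy-splash, and scrapy-mongodb, while every host, port, database name, collection name, and numeric value is a placeholder assumption.

```python
# settings.py sketch. All URLs, names, and numbers below are placeholder assumptions.

# scrapy-redis: share the request queue and duplicate filter across workers.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_PERSIST = True                 # keep the queue in Redis between runs
REDIS_URL = "redis://localhost:6379"     # assumed local Redis instance

# scrapy-splash: route selected requests through a Splash rendering service.
SPLASH_URL = "http://localhost:8050"     # assumed local Splash instance
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# scrapy-mongodb: write scraped items to a MongoDB collection.
ITEM_PIPELINES = {
    "scrapy_mongodb.MongoDBPipeline": 300,
}
MONGODB_URI = "mongodb://localhost:27017"  # assumed local MongoDB instance
MONGODB_DATABASE = "crawler"               # hypothetical database name
MONGODB_COLLECTION = "pages"               # hypothetical collection name

# Scrapy core: obey robots.txt and adapt the request rate to server latency.
ROBOTSTXT_OBEY = True
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5.0
AUTOTHROTTLE_MAX_DELAY = 60.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
```

Note that the scrapy-splash documentation also suggests its own duplicate filter (``scrapy_splash.SplashAwareDupeFilter``). Only one ``DUPEFILTER_CLASS`` can be active, so this sketch keeps the scrapy-redis filter; how the two are reconciled in practice is not stated in the source.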
Installation
Use the upstream install or setup path that matches your environment:
- pip install scrapy
Requirements and caveats from upstream:
- Scrapy is cross-platform and requires Python 3.10+. It is maintained by Zyte.
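The Python 3.10+ requirement can be checked before installing. The helper below is a small sketch using only the standard library; ``meets_minimum`` and ``MIN_PYTHON`` are hypothetical names introduced for illustration, not part of Scrapy.

```python
import sys

# Minimum interpreter version stated in the upstream requirements.
MIN_PYTHON = (3, 10)


def meets_minimum(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not meets_minimum():
        raise SystemExit("Python 3.10 or newer is required")
```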
Basic usage or getting-started notes:

Follow the documentation_ to learn how to use it.

.. _documentation: https://docs.scrapy.org/en/latest/

Source: https://github.com/scrapy/scrapy

Extracted from upstream docs: https://raw.githubusercontent.com/scrapy/scrapy/HEAD/README.rst
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic agent-skills · 8 GitHub stars · SKILL.md body (962 chars)