Newspaper4k Python Article Extraction and NLP Library
Newspaper4k is an actively maintained fork of the popular Newspaper3k library for Python. It extracts articles, titles, images, authors, and metadata from news websites, with built-in NLP for keyword extraction and text summarization.
What it does
Newspaper4k is an actively maintained fork of the popular Newspaper3k library for Python. It extracts articles, titles, images, authors, and metadata from news websites, with built-in NLP for keyword extraction and text summarization.
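The extraction and NLP steps described above can be sketched in Python roughly as follows. This is a minimal sketch, not the upstream reference usage: the helper name `extract_article` and the shape of the returned dictionary are illustrative assumptions, and the import is done inside the function so the helper can be defined even where the package is not installed.

```python
def extract_article(url, run_nlp=True):
    """Download and parse one article, optionally running the built-in NLP.

    Hypothetical helper for illustration; the dict layout is an assumption.
    """
    # Imported lazily so defining this function does not require the package.
    import newspaper

    article = newspaper.Article(url)
    article.download()  # fetch the HTML over the network
    article.parse()     # extract title, authors, text, images, metadata

    result = {
        "title": article.title,
        "authors": article.authors,
        "publish_date": article.publish_date,
        "top_image": article.top_image,
        "text": article.text,
    }

    if run_nlp:
        article.nlp()  # populates keywords and summary
        result["keywords"] = article.keywords
        result["summary"] = article.summary

    return result
```

Calling `extract_article("https://edition.cnn.com/2023/11/17/success/job-seekers-use-ai/index.html")` needs network access and an installed `newspaper4k`; pass `run_nlp=False` if only the extracted fields are needed.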
Installation
Use the upstream install or setup path that matches your environment:
- pip install newspaper4k
- pip install newspaper4k[gnews]
- brew install libxml2 libxslt
- brew install libtiff libjpeg webp little-cms2
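After installing, a quick check from Python can confirm that the package is importable and that the interpreter meets the Python 3.10+ minimum listed by upstream. The helper name `newspaper4k_status` is hypothetical and uses only the standard library; it assumes the distribution is named `newspaper4k` and the import name is `newspaper`, as the commands in this document suggest.

```python
import sys
from importlib import metadata, util


def newspaper4k_status():
    """Return (python_ok, importable, version) for the current environment.

    Hypothetical helper for illustration only.
    """
    # Upstream lists Python 3.10+ as the minimum supported version.
    python_ok = sys.version_info >= (3, 10)

    # The CLI is run as `python -m newspaper`, so the import name is `newspaper`.
    importable = util.find_spec("newspaper") is not None

    # Installed distribution version, or None if the package is not installed.
    try:
        version = metadata.version("newspaper4k")
    except metadata.PackageNotFoundError:
        version = None

    return python_ok, importable, version
```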
Requirements and caveats from upstream:
- Python compatibility: Python 3.10+ minimum
Basic usage or getting-started notes:
- Using the CLI: you can start directly from the command line, using the included CLI:
- python -m newspaper --url="https://edition.cnn.com/2023/11/17/success/job-seekers-use-ai/index.html" --language=en --output-format=json --output-file=article.json
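To drive the CLI from a script, the command line can be assembled as an argument list. The sketch below uses only the four flags that appear in this document (`--url`, `--language`, `--output-format`, `--output-file`); the function name `build_cli_command` and its defaults are illustrative assumptions, not part of the library.

```python
import sys


def build_cli_command(url, language="en", output_format="json",
                      output_file="article.json"):
    """Build the argument list for the newspaper CLI.

    Hypothetical helper; only flags shown in the upstream example are used.
    """
    return [
        sys.executable,  # run the CLI with the current interpreter
        "-m",
        "newspaper",
        f"--url={url}",
        f"--language={language}",
        f"--output-format={output_format}",
        f"--output-file={output_file}",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Using a list rather than a single shell string avoids quoting problems with URLs that contain special characters.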
Extracted from upstream docs: https://raw.githubusercontent.com/AndyTheFactory/newspaper4k/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,193 chars)