Beautiful Soup Academic Paper Parser
Extracts structured citation data from academic repositories using BeautifulSoup4 with the lxml parser. Parses DOI metadata, author affiliations, and reference lists from the HTML of PubMed, arXiv, and Semantic Scholar pages.
Installation
Use the upstream install or setup path that matches your environment:
- pip install beautifulsoup4
- pip install lxml (a separate package; needed for the lxml parser named in the description)
- Optional: the upstream documentation is in Sphinx format. Run make html in the documentation directory to create HTML.
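Because lxml is installed separately from beautifulsoup4, it can help to check for it before choosing a parser. The following is a minimal sketch, not part of the upstream project: the helper name pick_parser is hypothetical, and falling back to Python's built-in "html.parser" is an assumption about what is acceptable when lxml is missing.

```python
from importlib.util import find_spec


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return the preferred parser name if its package is importable.

    Hypothetical helper for illustration. "lxml" and "html.parser" are
    parser names that Beautiful Soup accepts; the fallback choice is an
    assumption, not something the upstream project prescribes.
    """
    # find_spec returns None when a top-level package is not installed.
    return preferred if find_spec(preferred) is not None else fallback


parser_name = pick_parser()
print(parser_name)
```

The returned name can be passed as the second argument to BeautifulSoup. Note that the two parsers may repair malformed HTML differently, so results can vary between environments.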
Requirements and caveats from upstream:
- Requires: Python >=3.7.0
- PyPI classifiers: Programming Language :: Python, Programming Language :: Python :: 3
Basic usage or getting-started notes:
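from bs4 import BeautifulSoup

# Name the parser explicitly; omitting it makes bs4 guess one and emit a warning.
soup = BeautifulSoup("<p>Some<b>bad<i>HTML", "html.parser")
# Print the repaired tree, indented with one tag per line.
print(soup.prettify())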
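The description mentions DOI metadata, author affiliations, and reference lists, so here is a rough sketch of how such fields might be pulled from a page. It assumes the page exposes Highwire Press-style citation_* meta tags and an ol.references list; real PubMed, arXiv, and Semantic Scholar markup differs and changes over time. The function name extract_citation, the sample HTML, and the selectors are all hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; real repository markup varies and may change.
SAMPLE_HTML = """
<html><head>
<meta name="citation_title" content="An Example Paper">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_author_institution" content="Example University">
<meta name="citation_author" content="Roe, Richard">
<meta name="citation_doi" content="10.1234/example.5678">
</head><body>
<ol class="references">
  <li>First cited work.</li>
  <li>Second cited work.</li>
</ol>
</body></html>
"""


def extract_citation(html, parser="html.parser"):
    """Collect citation fields from citation_* meta tags (assumed layout)."""
    # Pass "lxml" as the parser if it is installed; "html.parser" is built in.
    soup = BeautifulSoup(html, parser)

    def meta_values(name):
        # Gather the content attribute of every <meta name=...> match.
        tags = soup.find_all("meta", attrs={"name": name})
        return [t["content"].strip() for t in tags if t.get("content")]

    titles = meta_values("citation_title")
    dois = meta_values("citation_doi")
    return {
        "title": titles[0] if titles else None,
        "doi": dois[0] if dois else None,
        "authors": meta_values("citation_author"),
        "affiliations": meta_values("citation_author_institution"),
        # The ol.references selector is an assumption about the page layout.
        "references": [
            li.get_text(" ", strip=True)
            for li in soup.select("ol.references li")
        ],
    }


record = extract_citation(SAMPLE_HTML)
print(record)
```

Returning a plain dictionary keeps the sketch easy to serialize. A real parser would likely need per-site selectors and handling for missing fields.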
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (830 chars)