Parquet Schema Extractor for S3
What it does
Extracts and validates Parquet file schemas from Amazon S3 using the PyArrow library and AWS S3 SDK (boto3). Compares schemas across multiple partitions to detect schema drift and incompatible type changes. Outputs a schema diff report with partition paths and affected column details.
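The comparison step described above can be sketched roughly as follows. This is a minimal illustration and not the skill's actual implementation: the function names `read_s3_schema` and `diff_schemas`, the `{column: type}` dictionary representation, and the report tuple layout are all assumptions made for this example. The reader downloads the whole object for simplicity, and `boto3` and `pyarrow` are imported lazily so the diff logic can be tried without them installed.

```python
import io
from typing import Dict, List, Tuple


def read_s3_schema(bucket: str, key: str) -> Dict[str, str]:
    """Hypothetical helper: return {column name: type string} for one Parquet object."""
    # Lazy imports: only needed when actually talking to S3.
    import boto3
    import pyarrow.parquet as pq

    # Simplification: fetch the full object, then parse only its schema.
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    schema = pq.read_schema(io.BytesIO(body))
    return {field.name: str(field.type) for field in schema}


def diff_schemas(
    baseline: Dict[str, str], candidate: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: list (column, change kind, detail) for each difference."""
    report = []
    for name in sorted(baseline.keys() | candidate.keys()):
        old, new = baseline.get(name), candidate.get(name)
        if old is None:
            report.append((name, "added", new))
        elif new is None:
            report.append((name, "removed", old))
        elif old != new:
            report.append((name, "type_changed", f"{old} -> {new}"))
    return report


if __name__ == "__main__":
    # Illustrative schemas for two partitions (made-up column names and types).
    first = {"id": "int64", "price": "double", "ts": "timestamp[us]"}
    second = {"id": "string", "price": "double", "region": "string"}
    for column, kind, detail in diff_schemas(first, second):
        print(f"{column}: {kind} ({detail})")
```

In practice each partition path would be passed through `read_s3_schema` and compared against a chosen baseline partition, with the partition path attached to every report row.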
Installation
Use the upstream install or setup path that matches your environment:
- $ npm install parquetjs
Requirements and caveats from upstream:
- This project requires a major overhaul, as well as handling and sorting through dozens of issues and PRs.
- Fully asynchronous, pure Node.js implementation of the Parquet file format.
- To use parquet.js with Node.js, install it using npm.
Basic usage or getting-started notes:
- Once you have installed the parquet.js library, you can import it as a single
Extracted from upstream docs: https://raw.githubusercontent.com/ironSource/parquetjs/HEAD/README.md
Quality
Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (1,135 chars)