Skill quality 0.46

spark-history-cli

Query a running Apache Spark History Server from Copilot CLI. Use this whenever the user wants to inspect SHS applications, jobs, stages, executors, SQL executions, environment details, or event logs, especially when they mention Spark History Server, SHS, event log history, benc

Price

free

Protocol

skill

Verified

Endpoint

https://skills.sh/yaooqinn/spark-history-cli/spark-history-cli

What it does

spark-history-cli

Use this skill when the task is about exploring or debugging data exposed by a running Apache Spark History Server.

Installation

pip install spark-history-cli

Or, if spark-history-cli is not on PATH after installation, run it as a module:

python -m spark_history_cli --json apps

Why use this skill

It gives you a purpose-built CLI instead of scraping the Spark History Server web UI.
It wraps the REST API cleanly and already handles attempt-ID resolution for multi-attempt apps.
It supports --json, which makes downstream reasoning and comparisons much easier.

Workflow

Prefer the CLI over raw REST calls.
Prefer --json unless the user explicitly wants a human-formatted table.
Use --server <url> or SPARK_HISTORY_SERVER to point at the right SHS. If the user does not specify one, assume http://localhost:18080.
Start broad, then drill down:
- list applications
- choose the relevant app
- inspect jobs, stages, executors, SQL executions, environment, or logs
If the user says "latest app", "recent run", or similar, list apps first and choose the most relevant recent application before continuing.
If the CLI is unavailable and tool permissions allow it, install it with python -m pip install spark-history-cli.
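The first workflow steps (choosing the server, then picking the most recent application) could be sketched roughly as follows. This is a minimal sketch, not the CLI's own logic: the helper names `resolve_server` and `pick_latest_app` are hypothetical, the precedence of an explicit URL over the environment variable is a choice made here, and the shape of the `apps` JSON (an `attempts` list with `startTimeEpoch`, as in the Spark REST API) is an assumption about what `--json apps` returns.

```python
import os

DEFAULT_SERVER = "http://localhost:18080"  # default named in the workflow above


def resolve_server(explicit=None, env=None):
    """Choose the SHS URL: explicit value, then SPARK_HISTORY_SERVER, then the default.

    ASSUMPTION: this precedence is this sketch's choice, not documented CLI behavior.
    """
    env = os.environ if env is None else env
    return explicit or env.get("SPARK_HISTORY_SERVER") or DEFAULT_SERVER


def pick_latest_app(apps):
    """Return the app whose newest attempt started most recently, or None if empty.

    ASSUMPTION: each app looks like {"id": ..., "attempts": [{"startTimeEpoch": ...}]},
    mirroring the Spark REST API; the CLI's --json output may differ.
    """
    def latest_start(app):
        starts = (a.get("startTimeEpoch", 0) for a in app.get("attempts", []))
        return max(starts, default=0)

    return max(apps, key=latest_start, default=None)
```

Keeping both helpers free of I/O means the selection logic can be checked against a saved `apps` listing before anything is run against a live server.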

Command patterns

spark-history-cli --json --server http://localhost:18080 apps
spark-history-cli --json --server http://localhost:18080 app <app-id>
spark-history-cli --json --server http://localhost:18080 --app-id <app-id> jobs
spark-history-cli --json --server http://localhost:18080 --app-id <app-id> stages
spark-history-cli --json --server http://localhost:18080 --app-id <app-id> executors --all
spark-history-cli --json --server http://localhost:18080 --app-id <app-id> sql
spark-history-cli --json --server http://localhost:18080 --app-id <app-id> sql-plan <exec-id> --view final
spark-history-cli --server http://localhost:18080 --app-id <app-id> sql-plan <exec-id> --dot -o plan.dot
spark-history-cli --json --server http://localhost:18080 --app-id <app-id> sql-jobs <exec-id>
spark-history-cli --json --server http://localhost:18080 --app-id <app-id> summary
spark-history-cli --json --server http://localhost:18080 --app-id <app-id> env
spark-history-cli --server http://localhost:18080 --app-id <app-id> logs output.zip
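When these commands are driven from a script, the shared shape of the patterns above (global options first, then the subcommand and its arguments) can be captured in one helper. The sketch below is illustrative only: `build_command` and `run_json` are hypothetical names, and `run_json` assumes that `--json` output is written to stdout as a single JSON document.

```python
import json
import subprocess


def build_command(subcommand, *args, server="http://localhost:18080",
                  app_id=None, json_output=True):
    """Assemble an argv list following the command patterns listed above."""
    argv = ["spark-history-cli"]
    if json_output:
        argv.append("--json")
    argv += ["--server", server]
    if app_id is not None:
        argv += ["--app-id", app_id]
    argv.append(subcommand)
    argv += list(args)
    return argv


def run_json(argv):
    """Run a --json command and parse its output.

    ASSUMPTION: the CLI prints one JSON document to stdout.
    Needs the CLI installed and a reachable Spark History Server.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `build_command("executors", "--all", app_id="<app-id>")` yields the same argument order as the `executors --all` pattern above, and passing a list to `subprocess.run` avoids shell quoting issues with application IDs.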

If spark-history-cli is not on PATH, use:

python -m spark_history_cli --json apps
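A script can make the same choice automatically. The following is a small sketch with a hypothetical helper name (`cli_prefix`); it only checks whether the `spark-history-cli` executable is on PATH and does not verify that the module itself is importable.

```python
import shutil
import sys


def cli_prefix(which=shutil.which):
    """Return the argv prefix for invoking the CLI.

    Uses the console script when it is on PATH, otherwise falls back to
    running the module with the current interpreter.
    """
    if which("spark-history-cli"):
        return ["spark-history-cli"]
    return [sys.executable, "-m", "spark_history_cli"]
```

Appending `["--json", "apps"]` to the returned prefix reproduces either form of the command shown above.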

What to reach for

apps for recent runs, durations, status, and picking candidates
app <id> for high-level details about one run
attempts for multi-attempt apps (list or show specific attempt details)
jobs, job <id> for job-level failures or progress
job-stages <id> for stages belonging to a job
stages, stage <id> for task/stage bottlenecks
stage-summary <id> for task metric quantiles (p5/p25/p50/p75/p95) — duration, GC, memory, shuffle, I/O
stage-tasks <id> for individual task details — sorted by runtime to find stragglers
executors --all for executor churn or skew investigations
sql for SQL execution history and plan graph data
sql-plan <id> for SQL plan extraction:
- --view full (default): full plan text
- --view initial: only the Initial Plan (pre-AQE)
- --view final: only the Final Plan (post-AQE)
- --dot: Graphviz DOT output for visualizing the plan DAG
- --json + --view: structured JSON with isAdaptive, sectionCount, plan, and sections
- -o <file>: write output to file instead of stdout
sql-jobs <id> for jobs associated with a SQL execution (fetches all linked jobs by ID)
summary for a concise application overview: app info, resource config (driver/executor/shuffle), and workload stats (jobs/stages/tasks/SQL)
env for Spark config/runtime context
logs only when the user explicitly wants the event log archive saved locally
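The `sql-plan` options listed above can be assembled with a small validating helper. This is a sketch under stated assumptions: `sql_plan_args` is a hypothetical name, and treating `--dot` and `--view` as alternatives simply mirrors the two command patterns shown earlier; whether the CLI accepts both together is not stated in this document.

```python
VALID_VIEWS = ("full", "initial", "final")  # view names listed above


def sql_plan_args(exec_id, view="full", dot=False, output=None):
    """Build the subcommand portion of a sql-plan invocation.

    ASSUMPTION: --dot and --view are emitted as alternatives, matching the
    documented patterns; combining them is untested here.
    """
    if view not in VALID_VIEWS:
        raise ValueError(f"view must be one of {VALID_VIEWS}, got {view!r}")
    args = ["sql-plan", str(exec_id)]
    if dot:
        args.append("--dot")
    else:
        args += ["--view", view]
    if output:
        args += ["-o", output]
    return args
```

Rejecting an unknown view before the command runs gives a clearer error than a failed CLI call would.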

Practical guidance

Preserve the user's server URL if they gave one explicitly.
Summarize findings after retrieving JSON; do not dump raw JSON unless the user asked for it.
Treat event logs and benchmark history as potentially sensitive. Download them only when necessary and keep them local.
This CLI needs a running Spark History Server. It does not replace SHS and it does not parse raw event logs directly.

Troubleshooting

| Issue | Solution |
| --- | --- |
| `Connection refused` | SHS not running; start it with `$SPARK_HOME/sbin/start-history-server.sh` |
| `404 Not Found` on app | App ID may include an attempt suffix; use `apps` to list valid IDs |
| No apps listed | Check that `spark.history.fs.logDirectory` points to the right event log path |
| `ModuleNotFoundError` | CLI not installed; run `pip install spark-history-cli` |
| Wrong server | Set the `SPARK_HISTORY_SERVER` env var or use `--server <url>` |
| Timeout on large apps | SHS may be parsing event logs; wait and retry, or check the SHS logs |
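The troubleshooting entries can also be applied programmatically to captured error output. The sketch below is illustrative: `suggest_fix` is a hypothetical helper, and the substrings it matches are taken from the issues listed here rather than from the CLI's actual error messages, which may be worded differently.

```python
# (substrings to look for, suggested fix); wording follows the troubleshooting entries.
HINTS = [
    (("connection refused",),
     "SHS not running; start it with $SPARK_HOME/sbin/start-history-server.sh"),
    (("404",),
     "App ID may include an attempt suffix; run `apps` to list valid IDs"),
    (("modulenotfounderror",),
     "CLI not installed; run pip install spark-history-cli"),
    (("timeout", "timed out"),
     "SHS may still be parsing event logs; wait and retry, or check the SHS logs"),
]


def suggest_fix(error_text):
    """Return the first matching hint for an error message, or None.

    ASSUMPTION: matching is by case-insensitive substring; real CLI errors
    may not contain these exact phrases.
    """
    lowered = error_text.lower()
    for needles, hint in HINTS:
        if any(needle in lowered for needle in needles):
            return hint
    return None
```

A caller would typically pass the stderr of a failed command and fall back to showing the raw error when no hint matches.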

Capabilities

skill, source-yaooqinn, skill-spark-history-cli, topic-agent-skills, topic-benchmark, topic-cli, topic-diagnostics, topic-gluten, topic-performance, topic-spark, topic-spark-history-server, topic-tpc-ds, topic-velox

Install

Install: npx skills add yaooqinn/spark-history-cli

Source: https://github.com/yaooqinn/spark-history-cli/tree/main/skills/spark-history-cli

skills.sh: https://skills.sh/yaooqinn/spark-history-cli/spark-history-cli

Transport: skills-sh

Protocol: skill

Quality

0.46 / 1.00

Deterministic score 0.46 from registry signals: indexed on GitHub topic:agent-skills · 22 GitHub stars · SKILL.md body (4,905 chars)

Provenance

Indexed from: github

Enriched: 2026-04-23 07:00:58Z · deterministic:skill-github:v1 · v1

First seen: 2026-04-18

Last seen: 2026-04-23

Agent access

JSON: https://clawmart.sh/api/listings/La6erq