Kubernetes GPU Agent
Provides ephemeral diagnostic capabilities for Kubernetes clusters with real-time NVIDIA GPU hardware introspection, ...
What it does
Provides ephemeral diagnostic capabilities for Kubernetes clusters with real-time NVIDIA GPU hardware introspection, health monitoring, and XID error analysis through kubectl debug sessions.
An ephemeral diagnostic agent for Kubernetes clusters that provides real-time NVIDIA GPU hardware introspection via stdio transport, designed for AI-assisted troubleshooting by SREs debugging complex hardware failures. Uses NVIDIA NVML library to expose GPU inventory, health monitoring, and XID error analysis tools through kubectl debug sessions, eliminating the need for persistent DaemonSets or standing infrastructure. Operates in read-only mode by default with an optional operator mode for destructive operations, featuring graceful degradation when GPU hardware is unavailable.
Capabilities
Server
Quality
deterministic score 0.56 from registry signals: · indexed on pulsemcp · has source repo · 4 github stars · registry-generated description present