{"id":"c5ddcd94-1789-4cb3-8302-2ed0c72fde13","shortId":"VxSw9h","kind":"skill","title":"observability-engineer","tagline":"Build production-ready monitoring, logging, and tracing systems. Implements comprehensive observability strategies, SLI/SLO management, and incident response workflows.","description":"You are an observability engineer specializing in production-grade monitoring, logging, tracing, and reliability systems for enterprise-scale applications.\n\n## Use this skill when\n\n- Designing monitoring, logging, or tracing systems\n- Defining SLIs/SLOs and alerting strategies\n- Investigating production reliability or performance regressions\n\n## Do not use this skill when\n\n- You only need a single ad-hoc dashboard\n- You cannot access metrics, logs, or tracing data\n- You need application feature development instead of observability\n\n## Instructions\n\n1. Identify critical services, user journeys, and reliability targets.\n2. Define signals, instrumentation, and data retention.\n3. Build dashboards and alerts aligned to SLOs.\n4. Validate signal quality and reduce alert noise.\n\n## Safety\n\n- Avoid logging sensitive data or secrets.\n- Use alerting thresholds that balance coverage and noise.\n\n## Purpose\nExpert observability engineer specializing in comprehensive monitoring strategies, distributed tracing, and production reliability systems. 
Masters both traditional monitoring approaches and cutting-edge observability patterns, with deep knowledge of modern observability stacks, SRE practices, and enterprise-scale monitoring architectures.\n\n## Capabilities\n\n### Monitoring & Metrics Infrastructure\n- Prometheus ecosystem with advanced PromQL queries and recording rules\n- Grafana dashboard design with templating, alerting, and custom panels\n- InfluxDB time-series data management and retention policies\n- Datadog enterprise monitoring with custom metrics and synthetic monitoring\n- New Relic APM integration and performance baseline establishment\n- Comprehensive AWS service monitoring and cost optimization with CloudWatch\n- Nagios and Zabbix for traditional infrastructure monitoring\n- Custom metrics collection with StatsD, Telegraf, and collectd\n- High-cardinality metrics handling and storage optimization\n\n### Distributed Tracing & APM\n- Jaeger distributed tracing deployment and trace analysis\n- Zipkin trace collection and service dependency mapping\n- AWS X-Ray integration for serverless and microservice architectures\n- OpenTracing and OpenTelemetry instrumentation standards\n- Application Performance Monitoring with detailed transaction tracing\n- Service mesh observability with Istio and Envoy telemetry\n- Correlation between traces, logs, and metrics for root cause analysis\n- Performance bottleneck identification and optimization recommendations\n- Distributed system debugging and latency analysis\n\n### Log Management & Analysis\n- ELK Stack (Elasticsearch, Logstash, Kibana) architecture and optimization\n- Fluentd and Fluent Bit log forwarding and parsing configurations\n- Splunk enterprise log management and search optimization\n- Loki for cloud-native log aggregation with Grafana integration\n- Log parsing, enrichment, and structured logging implementation\n- Centralized logging for microservices and distributed systems\n- Log retention policies and cost-effective storage strategies\n- 
Security log analysis and compliance monitoring\n- Real-time log streaming and alerting mechanisms\n\n### Alerting & Incident Response\n- PagerDuty integration with intelligent alert routing and escalation\n- Slack and Microsoft Teams notification workflows\n- Alert correlation and noise reduction strategies\n- Runbook automation and incident response playbooks\n- On-call rotation management and fatigue prevention\n- Post-incident analysis and blameless postmortem processes\n- Alert threshold tuning and false positive reduction\n- Multi-channel notification systems and redundancy planning\n- Incident severity classification and response procedures\n\n### SLI/SLO Management & Error Budgets\n- Service Level Indicator (SLI) definition and measurement\n- Service Level Objective (SLO) establishment and tracking\n- Error budget calculation and burn rate analysis\n- SLA compliance monitoring and reporting\n- Availability and reliability target setting\n- Performance benchmarking and capacity planning\n- Customer impact assessment and business metrics correlation\n- Reliability engineering practices and failure mode analysis\n- Chaos engineering integration for proactive reliability testing\n\n### OpenTelemetry & Modern Standards\n- OpenTelemetry collector deployment and configuration\n- Auto-instrumentation for multiple programming languages\n- Custom telemetry data collection and export strategies\n- Trace sampling strategies and performance optimization\n- Vendor-agnostic observability pipeline design\n- Protocol Buffers and gRPC telemetry transmission\n- Multi-backend telemetry export (Jaeger, Prometheus, Datadog)\n- Observability data standardization across services\n- Migration strategies from proprietary to open standards\n\n### Infrastructure & Platform Monitoring\n- Kubernetes cluster monitoring with Prometheus Operator\n- Docker container metrics and resource utilization tracking\n- Cloud provider monitoring across AWS, Azure, and GCP\n- Database performance 
monitoring for SQL and NoSQL systems\n- Network monitoring and traffic analysis with SNMP and flow data\n- Server hardware monitoring and predictive maintenance\n- CDN performance monitoring and edge location analysis\n- Load balancer and reverse proxy monitoring\n- Storage system monitoring and capacity forecasting\n\n### Chaos Engineering & Reliability Testing\n- Chaos Monkey and Gremlin fault injection strategies\n- Failure mode identification and resilience testing\n- Circuit breaker pattern implementation and monitoring\n- Disaster recovery testing and validation procedures\n- Load testing integration with monitoring systems\n- Dependency failure simulation and cascading failure prevention\n- Recovery time objective (RTO) and recovery point objective (RPO) validation\n- System resilience scoring and improvement recommendations\n- Automated chaos experiments and safety controls\n\n### Custom Dashboards & Visualization\n- Executive dashboard creation for business stakeholders\n- Real-time operational dashboards for engineering teams\n- Custom Grafana plugins and panel development\n- Multi-tenant dashboard design and access control\n- Mobile-responsive monitoring interfaces\n- Embedded analytics and white-label monitoring solutions\n- Data visualization best practices and user experience design\n- Interactive dashboard development with drill-down capabilities\n- Automated report generation and scheduled delivery\n\n### Observability as Code & Automation\n- Infrastructure as Code for monitoring stack deployment\n- Terraform modules for observability infrastructure\n- Ansible playbooks for monitoring agent deployment\n- GitOps workflows for dashboard and alert management\n- Configuration management and version control strategies\n- Automated monitoring setup for new services\n- CI/CD integration for observability pipeline testing\n- Policy as Code for compliance and governance\n- Self-healing monitoring infrastructure design\n\n### Cost Optimization & Resource 
Management\n- Monitoring cost analysis and optimization strategies\n- Data retention policy optimization for storage costs\n- Sampling rate tuning for high-volume telemetry data\n- Multi-tier storage strategies for historical data\n- Resource allocation optimization for monitoring infrastructure\n- Vendor cost comparison and migration planning\n- Open source vs commercial tool evaluation\n- ROI analysis for observability investments\n- Budget forecasting and capacity planning\n\n### Enterprise Integration & Compliance\n- SOC2, PCI DSS, and HIPAA compliance monitoring requirements\n- Active Directory and SAML integration for monitoring access\n- Multi-tenant monitoring architectures and data isolation\n- Audit trail generation and compliance reporting automation\n- Data residency and sovereignty requirements for global deployments\n- Integration with enterprise ITSM tools (ServiceNow, Jira Service Management)\n- Corporate firewall and network security policy compliance\n- Backup and disaster recovery for monitoring infrastructure\n- Change management processes for monitoring configurations\n\n### AI & Machine Learning Integration\n- Anomaly detection using statistical models and machine learning algorithms\n- Predictive analytics for capacity planning and resource forecasting\n- Root cause analysis automation using correlation analysis and pattern recognition\n- Intelligent alert clustering and noise reduction using unsupervised learning\n- Time series forecasting for proactive scaling and maintenance scheduling\n- Natural language processing for log analysis and error categorization\n- Automated baseline establishment and drift detection for system behavior\n- Performance regression detection using statistical change point analysis\n- Integration with MLOps pipelines for model monitoring and observability\n\n## Behavioral Traits\n- Prioritizes production reliability and system stability over feature velocity\n- Implements comprehensive monitoring before issues 
occur, not after\n- Focuses on actionable alerts and meaningful metrics over vanity metrics\n- Emphasizes correlation between business impact and technical metrics\n- Considers cost implications of monitoring and observability solutions\n- Uses data-driven approaches for capacity planning and optimization\n- Implements gradual rollouts and canary monitoring for changes\n- Documents monitoring rationale and maintains runbooks religiously\n- Stays current with emerging observability tools and practices\n- Balances monitoring coverage with system performance impact\n\n## Knowledge Base\n- Latest observability developments and tool ecosystem evolution (2024/2025)\n- Modern SRE practices and reliability engineering patterns with Google SRE methodology\n- Enterprise monitoring architectures and scalability considerations for Fortune 500 companies\n- Cloud-native observability patterns and Kubernetes monitoring with service mesh integration\n- Security monitoring and compliance requirements (SOC2, PCI DSS, HIPAA, GDPR)\n- Machine learning applications in anomaly detection, forecasting, and automated root cause analysis\n- Multi-cloud and hybrid monitoring strategies across AWS, Azure, GCP, and on-premises\n- Developer experience optimization for observability tooling and shift-left monitoring\n- Incident response best practices, post-incident analysis, and blameless postmortem culture\n- Cost-effective monitoring strategies scaling from startups to enterprises with budget optimization\n- OpenTelemetry ecosystem and vendor-neutral observability standards\n- Edge computing and IoT device monitoring at scale\n- Serverless and event-driven architecture observability patterns\n- Container security monitoring and runtime threat detection\n- Business intelligence integration with technical monitoring for executive reporting\n\n## Response Approach\n1. **Analyze monitoring requirements** for comprehensive coverage and business alignment\n2. 
**Design observability architecture** with appropriate tools and data flow\n3. **Implement production-ready monitoring** with proper alerting and dashboards\n4. **Include cost optimization** and resource efficiency considerations\n5. **Consider compliance and security** implications of monitoring data\n6. **Document monitoring strategy** and provide operational runbooks\n7. **Implement gradual rollout** with monitoring validation at each stage\n8. **Provide incident response** procedures and escalation workflows\n\n## Example Interactions\n- \"Design a comprehensive monitoring strategy for a microservices architecture with 50+ services\"\n- \"Implement distributed tracing for a complex e-commerce platform handling 1M+ daily transactions\"\n- \"Set up cost-effective log management for a high-traffic application generating 10TB+ daily logs\"\n- \"Create SLI/SLO framework with error budget tracking for API services with 99.9% availability target\"\n- \"Build real-time alerting system with intelligent noise reduction for 24/7 operations team\"\n- \"Implement chaos engineering with monitoring validation for Netflix-scale resilience testing\"\n- \"Design executive dashboard showing business impact of system reliability and revenue correlation\"\n- \"Set up compliance monitoring for SOC2 and PCI requirements with automated evidence collection\"\n- \"Optimize monitoring costs while maintaining comprehensive coverage for startup scaling to enterprise\"\n- \"Create automated incident response workflows with runbook integration and Slack/PagerDuty escalation\"\n- \"Build multi-region observability architecture with data sovereignty compliance\"\n- \"Implement machine learning-based anomaly detection for proactive issue identification\"\n- \"Design observability strategy for serverless architecture with AWS Lambda and API Gateway\"\n- \"Create custom metrics pipeline for business KPIs integrated with technical monitoring\"\n\n## Limitations\n- Use this skill only when the task 
clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.","tags":["observability","engineer","antigravity","awesome","skills","sickn33","agent-skills","agentic-skills","ai-agent-skills","ai-agents","ai-coding","ai-workflows"],"capabilities":["skill","source-sickn33","skill-observability-engineer","topic-agent-skills","topic-agentic-skills","topic-ai-agent-skills","topic-ai-agents","topic-ai-coding","topic-ai-workflows","topic-antigravity","topic-antigravity-skills","topic-claude-code","topic-claude-code-skills","topic-codex-cli","topic-codex-skills"],"categories":["antigravity-awesome-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/sickn33/antigravity-awesome-skills/observability-engineer","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add sickn33/antigravity-awesome-skills","source_repo":"https://github.com/sickn33/antigravity-awesome-skills","install_from":"skills.sh"}},"qualityScore":"0.700","qualityRationale":"deterministic score 0.70 from registry signals: · indexed on github topic:agent-skills · 34666 github stars · SKILL.md body (13,024 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-23T06:51:39.319Z","embedding":null,"createdAt":"2026-04-18T21:41:33.829Z","updatedAt":"2026-04-23T06:51:39.319Z","lastSeenAt":"2026-04-23T06:51:39.319Z","tsv":"'1':97,1311 '10tb':1427 '1m':1410 '2':106,1321 '2024/2025':1162 '24/7':1455 '3':113,1331 '4':121,1342 '5':1350 '50':1397 '500':1182 '6':1359 '7':1367 '8':1377 '99.9':1441 
'access':82,754,931 'across':585,613,1225 'action':1089 'activ':924 'ad':77 'ad-hoc':76 'advanc':192 'agent':811 'aggreg':366 'agnost':564 'ai':984 'alert':57,117,127,137,203,405,407,414,424,452,818,1016,1090,1339,1448 'algorithm':996 'align':118,1320 'alloc':886 'analysi':273,320,332,335,395,447,497,526,630,648,857,904,1007,1011,1038,1058,1217,1251 'analyt':762,998 'analyz':1312 'anomali':988,1210,1533 'ansibl':807 'api':1438,1549 'apm':227,266 'applic':43,90,296,1208,1425 'approach':163,1117,1310 'appropri':1326 'architectur':184,290,341,936,1176,1290,1324,1395,1523,1544 'ask':1595 'assess':515 'audit':940 'auto':543 'auto-instrument':542 'autom':431,719,785,794,826,946,1008,1042,1214,1492,1508 'avail':503,1442 'avoid':130 'aw':235,281,614,1226,1546 'azur':615,1227 'backend':576 'backup':971 'balanc':140,650,1146 'base':1154,1532 'baselin':231,1043 'behavior':1050,1068 'benchmark':509 'best':771,1246 'bit':347 'blameless':449,1253 'bottleneck':322 'boundari':1603 'breaker':679 'budget':476,492,908,1267,1435 'buffer':569 'build':4,114,1444,1518 'burn':495 'busi':517,732,1100,1300,1319,1474,1556 'calcul':493 'call':438 'canari':1127 'cannot':81 'capabl':185,784 'capac':511,659,911,1000,1119 'cardin':258 'cascad':700 'categor':1041 'caus':319,1006,1216 'cdn':642 'central':377 'chang':978,1056,1130 'channel':461 'chao':527,661,665,720,1459 'ci/cd':832 'circuit':678 'clarif':1597 'classif':469 'clear':1570 'cloud':363,610,1185,1220 'cloud-nat':362,1184 'cloudwatch':233 'cluster':598,1017 'code':793,797,840 'collect':250,276,552,1494 'collectd':255 'collector':538 'commerc':1407 'commerci':900 'compani':1183 'comparison':893 'complex':1404 'complianc':397,499,842,915,921,944,970,1199,1352,1484,1527 'comprehens':14,150,234,1080,1316,1389,1500 'comput':1278 'configur':352,541,820,983 'consid':1105,1351 'consider':1179,1349 'contain':604,1293 'control':724,755,824 'corpor':964 'correl':311,425,519,1010,1098,1481 'cost':239,389,851,856,867,892,1106,1257,1344,1416,1497 
'cost-effect':388,1256,1415 'coverag':141,1148,1317,1501 'creat':1430,1507,1551 'creation':730 'criteria':1606 'critic':99 'cultur':1255 'current':1139 'custom':205,220,248,513,549,725,742,1552 'cut':166 'cutting-edg':165 'daili':1411,1428 'dashboard':79,115,199,726,729,738,751,778,816,1341,1472 'data':87,111,133,211,551,583,635,769,861,876,884,938,947,1115,1329,1358,1525 'data-driven':1114 'databas':618 'datadog':216,581 'debug':329 'deep':171 'defin':54,107 'definit':481 'deliveri':790 'depend':279,696 'deploy':270,539,801,812,954 'describ':1574 'design':48,200,567,752,776,850,1322,1387,1470,1539 'detail':300 'detect':989,1047,1053,1211,1299,1534 'develop':92,747,779,1157,1233 'devic':1281 'directori':925 'disast':684,973 'distribut':153,264,268,327,382,1400 'docker':603 'document':1131,1360 'drift':1046 'drill':782 'drill-down':781 'driven':1116,1289 'dss':918,1203 'e':1406 'e-commerc':1405 'ecosystem':190,1160,1270 'edg':167,646,1277 'effect':390,1258,1417 'effici':1348 'elasticsearch':338 'elk':336 'embed':761 'emerg':1141 'emphas':1097 'engin':3,27,147,521,528,662,740,1168,1460 'enrich':372 'enterpris':41,181,217,354,913,957,1174,1265,1506 'enterprise-scal':40,180 'environ':1586 'environment-specif':1585 'envoy':309 'error':475,491,1040,1434 'escal':417,1383,1517 'establish':232,488,1044 'evalu':902 'event':1288 'event-driven':1287 'evid':1493 'evolut':1161 'exampl':1385 'execut':728,1307,1471 'experi':721,775,1234 'expert':145,1591 'export':554,578 'failur':524,672,697,701 'fals':456 'fatigu':442 'fault':669 'featur':91,1077 'firewal':965 'flow':634,1330 'fluent':346 'fluentd':344 'focus':1087 'forecast':660,909,1004,1026,1212 'fortun':1181 'forward':349 'framework':1432 'gateway':1550 'gcp':617,1228 'gdpr':1205 'generat':787,942,1426 'gitop':813 'global':953 'googl':1171 'govern':844 'grade':32 'gradual':1124,1369 'grafana':198,368,743 'gremlin':668 'grpc':571 'handl':260,1409 'hardwar':637 'heal':847 'high':257,873,1423 'high-cardin':256 'high-traff':1422 
'high-volum':872 'hipaa':920,1204 'histor':883 'hoc':78 'hybrid':1222 'identif':323,674,1538 'identifi':98 'impact':514,1101,1152,1475 'implement':13,376,681,1079,1123,1332,1368,1399,1458,1528 'implic':1107,1355 'improv':717 'incid':20,408,433,446,467,1244,1250,1379,1509 'includ':1343 'indic':479 'influxdb':207 'infrastructur':188,246,594,795,806,849,890,977 'inject':670 'input':1600 'instead':93 'instruct':96 'instrument':109,294,544 'integr':228,285,369,411,529,692,833,914,928,955,987,1059,1195,1302,1514,1558 'intellig':413,1015,1301,1451 'interact':777,1386 'interfac':760 'invest':907 'investig':59 'iot':1280 'isol':939 'issu':1083,1537 'istio':307 'itsm':958 'jaeger':267,579 'jira':961 'journey':102 'kibana':340 'knowledg':172,1153 'kpis':1557 'kubernet':597,1190 'label':766 'lambda':1547 'languag':548,1034 'latenc':331 'latest':1155 'learn':986,995,1023,1207,1531 'learning-bas':1530 'left':1242 'level':478,485 'limit':1562 'load':649,690 'locat':647 'log':9,34,50,84,131,314,333,348,355,365,370,375,378,384,394,402,1037,1418,1429 'logstash':339 'loki':360 'machin':985,994,1206,1529 'maintain':1135,1499 'mainten':641,1031 'manag':18,212,334,356,440,474,819,821,854,963,979,1419 'map':280 'master':159 'match':1571 'meaning':1092 'measur':483 'mechan':406 'mesh':304,1194 'methodolog':1173 'metric':83,187,221,249,259,316,518,605,1093,1096,1104,1553 'microservic':289,380,1394 'microsoft':420 'migrat':587,895 'miss':1608 'mlop':1061 'mobil':757 'mobile-respons':756 'mode':525,673 'model':992,1064 'modern':174,535,1163 'modul':803 'monitor':8,33,49,151,162,183,186,218,224,237,247,298,398,500,596,599,612,620,627,638,644,654,657,683,694,759,767,799,810,827,848,855,889,922,930,935,976,982,1065,1081,1109,1128,1132,1147,1175,1191,1197,1223,1243,1259,1282,1295,1305,1313,1336,1357,1361,1372,1390,1462,1485,1496,1561 'monkey':666 'multi':460,575,749,878,933,1219,1520 'multi-backend':574 'multi-channel':459 'multi-cloud':1218 'multi-region':1519 'multi-ten':748,932 'multi-ti':877 
'multipl':546 'nagio':241 'nativ':364,1186 'natur':1033 'need':73,89 'netflix':1466 'netflix-scal':1465 'network':626,967 'neutral':1274 'new':225,830 'nois':128,143,427,1019,1452 'nosql':624 'notif':422,462 'object':486,705,710 'observ':2,15,26,95,146,168,175,305,565,582,791,805,835,906,1067,1111,1142,1156,1187,1237,1275,1291,1323,1522,1540 'observability-engin':1 'occur':1084 'on-cal':436 'on-premis':1230 'open':592,897 'opentelemetri':293,534,537,1269 'opentrac':291 'oper':602,737,1365,1456 'optim':240,263,325,343,359,561,852,859,864,887,1122,1235,1268,1345,1495 'output':1580 'pagerduti':410 'panel':206,746 'pars':351,371 'pattern':169,680,1013,1169,1188,1292 'pci':917,1202,1489 'perform':63,230,297,321,508,560,619,643,1051,1151 'permiss':1601 'pipelin':566,836,1062,1554 'plan':466,512,896,912,1001,1120 'platform':595,1408 'playbook':435,808 'plugin':744 'point':709,1057 'polici':215,386,838,863,969 'posit':457 'post':445,1249 'post-incid':444,1248 'postmortem':450,1254 'practic':178,522,772,1145,1165,1247 'predict':640,997 'premis':1232 'prevent':443,702 'priorit':1070 'proactiv':531,1028,1536 'procedur':472,689,1381 'process':451,980,1035 'product':6,31,60,156,1071,1334 'production-grad':30 'production-readi':5,1333 'program':547 'prometheus':189,580,601 'promql':193 'proper':1338 'proprietari':590 'protocol':568 'provid':611,1364,1378 'proxi':653 'purpos':144 'qualiti':124 'queri':194 'rate':496,869 'rational':1133 'ray':284 'readi':7,1335 'real':400,735,1446 'real-tim':399,734,1445 'recognit':1014 'recommend':326,718 'record':196 'recoveri':685,703,708,974 'reduc':126 'reduct':428,458,1020,1453 'redund':465 'region':1521 'regress':64,1052 'reliabl':37,61,104,157,505,520,532,663,1072,1167,1478 'relic':226 'religi':1137 'report':502,786,945,1308 'requir':923,951,1200,1314,1490,1599 'resid':948 'resili':676,714,1468 'resourc':607,853,885,1003,1347 'respons':21,409,434,471,758,1245,1309,1380,1510 'retent':112,214,385,862 'revenu':1480 'revers':652 'review':1592 
'roi':903 'rollout':1125,1370 'root':318,1005,1215 'rotat':439 'rout':415 'rpo':711 'rto':706 'rule':197 'runbook':430,1136,1366,1513 'runtim':1297 'safeti':129,723,1602 'saml':927 'sampl':557,868 'scalabl':1178 'scale':42,182,1029,1261,1284,1467,1504 'schedul':789,1032 'scope':1573 'score':715 'search':358 'secret':135 'secur':393,968,1196,1294,1354 'self':846 'self-heal':845 'sensit':132 'seri':210,1025 'server':636 'serverless':287,1285,1543 'servic':100,236,278,303,477,484,586,831,962,1193,1398,1439 'servicenow':960 'set':507,1413,1482 'setup':828 'sever':468 'shift':1241 'shift-left':1240 'show':1473 'signal':108,123 'simul':698 'singl':75 'skill':46,69,1565 'skill-observability-engineer' 'sla':498 'slack':418 'slack/pagerduty':1516 'sli':480 'sli/slo':17,473,1431 'slis/slos':55 'slo':487 'slos':120 'snmp':632 'soc2':916,1201,1487 'solut':768,1112 'sourc':898 'source-sickn33' 'sovereignti':950,1526 'special':28,148 'specif':1587 'splunk':353 'sql':622 'sre':177,1164,1172 'stabil':1075 'stack':176,337,800 'stage':1376 'stakehold':733 'standard':295,536,584,593,1276 'startup':1263,1503 'statist':991,1055 'statsd':252 'stay':1138 'stop':1593 'storag':262,391,655,866,880 'strategi':16,58,152,392,429,555,558,588,671,825,860,881,1224,1260,1362,1391,1541 'stream':403 'structur':374 'substitut':1583 'success':1605 'synthet':223 'system':12,38,53,158,328,383,463,625,656,695,713,1049,1074,1150,1449,1477 'target':105,506,1443 'task':1569 'team':421,741,1457 'technic':1103,1304,1560 'telegraf':253 'telemetri':310,550,572,577,875 'templat':202 'tenant':750,934 'terraform':802 'test':533,664,677,686,691,837,1469,1589 'threat':1298 'threshold':138,453 'tier':879 'time':209,401,704,736,1024,1447 'time-seri':208 'tool':901,959,1143,1159,1238,1327 'topic-agent-skills' 'topic-agentic-skills' 'topic-ai-agent-skills' 'topic-ai-agents' 'topic-ai-coding' 'topic-ai-workflows' 'topic-antigravity' 'topic-antigravity-skills' 'topic-claude-code' 'topic-claude-code-skills' 
'topic-codex-cli' 'topic-codex-skills' 'trace':11,35,52,86,154,265,269,272,275,302,313,556,1401 'track':490,609,1436 'tradit':161,245 'traffic':629,1424 'trail':941 'trait':1069 'transact':301,1412 'transmiss':573 'treat':1578 'tune':454,870 'unsupervis':1022 'use':44,67,136,990,1009,1021,1054,1113,1563 'user':101,774 'util':608 'valid':122,688,712,1373,1463,1588 'vaniti':1095 'veloc':1078 'vendor':563,891,1273 'vendor-agnost':562 'vendor-neutr':1272 'version':823 'visual':727,770 'volum':874 'vs':899 'white':765 'white-label':764 'workflow':22,423,814,1384,1511 'x':283 'x-ray':282 'zabbix':243 'zipkin':274","prices":[{"id":"aa59eafb-418b-488b-9842-fb472dacdec2","listingId":"c5ddcd94-1789-4cb3-8302-2ed0c72fde13","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"sickn33","category":"antigravity-awesome-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T21:41:33.829Z"}],"sources":[{"listingId":"c5ddcd94-1789-4cb3-8302-2ed0c72fde13","source":"github","sourceId":"sickn33/antigravity-awesome-skills/observability-engineer","sourceUrl":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/observability-engineer","isPrimary":false,"firstSeenAt":"2026-04-18T21:41:33.829Z","lastSeenAt":"2026-04-23T06:51:39.319Z"}],"details":{"listingId":"c5ddcd94-1789-4cb3-8302-2ed0c72fde13","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"sickn33","slug":"observability-engineer","github":{"repo":"sickn33/antigravity-awesome-skills","stars":34666,"topics":["agent-skills","agentic-skills","ai-agent-skills","ai-agents","ai-coding","ai-workflows","antigravity","antigravity-skills","claude-code","claude-code-skills","codex-cli","codex-skills","cursor","cursor-skills","developer-tools","gemini-cli","gemini-skills",
"kiro","mcp","skill-library"],"license":"mit","html_url":"https://github.com/sickn33/antigravity-awesome-skills","pushed_at":"2026-04-23T06:41:03Z","description":"Installable GitHub library of 1,400+ agentic skills for Claude Code, Cursor, Codex CLI, Gemini CLI, Antigravity, and more. Includes installer CLI, bundles, workflows, and official/community skill collections.","skill_md_sha":"a780d377e881a0d7bcc4751214b1dd9f1ff8c46f","skill_md_path":"skills/observability-engineer/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/observability-engineer"},"layout":"multi","source":"github","category":"antigravity-awesome-skills","frontmatter":{"name":"observability-engineer","description":"Build production-ready monitoring, logging, and tracing systems. Implements comprehensive observability strategies, SLI/SLO management, and incident response workflows."},"skills_sh_url":"https://skills.sh/sickn33/antigravity-awesome-skills/observability-engineer"},"updatedAt":"2026-04-23T06:51:39.319Z"}}