{"id":"13516452-8142-46f5-81f0-534745f83ff0","shortId":"pcTT5m","kind":"skill","title":"LLM Tracing and Observability Setup","tagline":"Configures end-to-end tracing for an LLM application using OpenTelemetry with LangSmith, Langfuse, or Helicone — span naming, metadata tagging, latency thresholds, and cost tracking.","description":"# LLM Tracing and Observability Setup\n\n## What this skill does\n\nThis skill sets up production-grade observability for LLM applications. Without tracing, debugging a broken LLM pipeline means guessing — you can't see what prompt was sent, what the model returned, which tool call failed, or why latency spiked. This skill configures the right tracing layer for your stack and shows what to instrument.\n\n## How to use\n\n### Claude Code / Cline\n\nCopy this file to `.agents/skills/llm-tracing-setup/SKILL.md` in your project root.\n\nThen ask:\n- *\"Use the LLM Tracing Setup skill to add observability to our LangChain app.\"*\n- *\"Set up Langfuse tracing for our OpenAI API calls.\"*\n\nProvide:\n- LLM framework in use (LangChain, direct API, LlamaIndex, custom)\n- Preferred tracing backend (LangSmith, Langfuse, Helicone, or open to suggestions)\n- Language (Python or TypeScript)\n- Whether you need cost tracking, latency alerting, or user feedback collection\n\n### Cursor / Codex\n\nPaste your LLM call code alongside these instructions and specify the tracing backend.\n\n## The Prompt / Instructions for the Agent\n\n### Step 1 — Choose a tracing backend\n\n| Backend | Best for | Cost model |\n|---|---|---|\n| **LangSmith** | LangChain / LangGraph apps | Free tier + usage |\n| **Langfuse** | Any LLM stack, self-hostable | Free tier + open source |\n| **Helicone** | OpenAI / Anthropic direct API | Per-request fee |\n| **OpenTelemetry + Jaeger** | Full control, existing OTel infra | Self-hosted |\n| **Braintrust** | Eval-heavy teams, prompt versioning | Per-event |\n\n**Recommendation:** Langfuse for most teams — framework-agnostic, self-hostable, free tier generous, good UI.\n\n### Step 2a — Langfuse setup (any stack)\n\n```python\n# pip install langfuse\nimport os\nfrom langfuse import Langfuse\nfrom langfuse.decorators import observe, langfuse_context\n\nos.environ[\"LANGFUSE_PUBLIC_KEY\"] = \"pk-lf-...\"\nos.environ[\"LANGFUSE_SECRET_KEY\"] = \"sk-lf-...\"\nos.environ[\"LANGFUSE_HOST\"] = \"https://cloud.langfuse.com\"\n\nlangfuse = Langfuse()\n\n# Decorate any function that calls an LLM\n@observe()\ndef generate_response(user_query: str) -> str:\n    # Automatically traces: input, output, latency, model\n    response = openai_client.chat.completions.create(\n        model=\"gpt-4o-mini\",\n        messages=[{\"role\": \"user\", \"content\": user_query}]\n    )\n    return response.choices[0].message.content\n\n# Add custom metadata\n@observe()\ndef process_document(doc_id: str, query: str) -> str:\n    langfuse_context.update_current_observation(\n        metadata={\"doc_id\": doc_id, \"pipeline_version\": \"v2.1\"},\n        tags=[\"document-qa\", \"production\"]\n    )\n    return generate_response(query)\n```\n\n**For LangChain integration:**\n```python\nfrom langfuse.callback import CallbackHandler\n\nhandler = CallbackHandler()\n\n# Pass to any LangChain chain or agent\nchain.invoke({\"query\": user_input}, config={\"callbacks\": [handler]})\n```\n\n### Step 2b — LangSmith setup (LangChain / LangGraph)\n\n```python\nimport os\nos.environ[\"LANGCHAIN_TRACING_V2\"] = \"true\"\nos.environ[\"LANGCHAIN_API_KEY\"] = \"ls__...\"\nos.environ[\"LANGCHAIN_PROJECT\"] = \"my-project-prod\"\n\n# That's it — all LangChain calls are automatically traced\n# For LangGraph, same env vars apply\n```\n\nAdd metadata to traces:\n```python\nfrom langchain_core.tracers.context import tracing_v2_enabled\n\nwith tracing_v2_enabled(tags=[\"user_type:premium\", \"feature:search\"]):\n    result = chain.invoke(user_input)\n```\n\n### Step 2c — Helicone setup (direct OpenAI / Anthropic API)\n\n```python\n# No SDK needed — just change the base URL\nfrom openai import OpenAI\n\nclient = OpenAI(\n    api_key=os.environ[\"OPENAI_API_KEY\"],\n    base_url=\"https://oai.helicone.ai/v1\",\n    default_headers={\n        \"Helicone-Auth\": f\"Bearer {os.environ['HELICONE_API_KEY']}\",\n        \"Helicone-Property-UserId\": user_id,       # custom metadata\n        \"Helicone-Property-Feature\": \"document-qa\"\n    }\n)\n# All calls now traced automatically\n```\n\n### Step 3 — What to instrument\n\n**Always trace:**\n- Every LLM call: model, prompt, response, latency, token count\n- Tool calls: tool name, input, output, duration\n- Retrieval steps: query, top-K results, reranking scores\n\n**Add custom spans for:**\n```python\n# Manual span for non-LLM operations\nwith langfuse.start_as_current_span(\"document-retrieval\") as span:\n    span.update(metadata={\"query\": query, \"index\": \"prod-v2\"})\n    results = vector_store.search(query)\n    span.update(metadata={\"results_count\": len(results)})\n```\n\n**Tag every trace with:**\n- `user_id` or session ID (for debugging user-reported issues)\n- `environment`: production / staging\n- `pipeline_version`: lets you compare v1 vs v2 side-by-side\n- `feature` or `use_case`: document-qa, chatbot, summarization\n\n### Step 4 — Key metrics to monitor\n\n| Metric | Alert threshold | How to track |\n|---|---|---|\n| p95 LLM latency | > 8 seconds | Langfuse latency histogram |\n| Error rate | > 2% | Langfuse error traces |\n| Cost per request | > $0.05 | Token count × model price |\n| Cache hit rate | < 20% | Custom metadata tag |\n| LLM-as-judge score | < 3.5/5 | Langfuse scores API |\n\n### Step 5 — Collecting user feedback\n\nConnect real user feedback to traces to build an eval dataset:\n\n```python\n# After the user rates a response\ndef record_feedback(trace_id: str, score: int, comment: str):\n    langfuse.score(\n        trace_id=trace_id,\n        name=\"user_rating\",\n        value=score,          # 1-5\n        comment=comment\n    )\n```\n\nTraces with low scores become your eval regression dataset automatically.\n\n### Step 6 — Production readiness checklist\n\n- [ ] All LLM calls traced with model, latency, and token counts\n- [ ] User/session ID attached to every trace\n- [ ] Environment tag (prod/staging) on all traces\n- [ ] Latency alert set for p95 > 8s\n- [ ] Error rate alert set for > 2%\n- [ ] Cost dashboard showing daily spend by feature\n- [ ] User feedback collection wired to trace IDs\n- [ ] Sampling configured for high-volume apps (trace 10% in prod, 100% in staging)","tags":["llm","tracing","setup","openagentskills","notysoty","agent-skills","claude","claude-code","claude-skills","cline","cursor","llm-skills"],"capabilities":["skill","source-notysoty","skill-llm-tracing-setup","topic-agent-skills","topic-claude","topic-claude-code","topic-claude-skills","topic-cline","topic-cursor","topic-llm","topic-llm-skills","topic-skills"],"categories":["openagentskills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/Notysoty/openagentskills/llm-tracing-setup","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add Notysoty/openagentskills","source_repo":"https://github.com/Notysoty/openagentskills","install_from":"skills.sh"}},"qualityScore":"0.454","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (5,967 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:13:22.659Z","embedding":null,"createdAt":"2026-05-18T13:20:43.929Z","updatedAt":"2026-05-18T19:13:22.659Z","lastSeenAt":"2026-05-18T19:13:22.659Z","tsv":"'-5':738 '/5':690 '/v1':501 '0':343 '0.05':672 '1':192,737 '10':812 '100':815 '2':665,789 '20':680 '2a':266 '2b':403 '2c':469 '3':534 '3.5':689 '4':644 '4o':333 '5':695 '6':752 '8':658 '8s':783 'add':120,345,443,565 'agent':190,394 'agents/skills/llm-tracing-setup/skill.md':106 'agnost':256 'alert':165,650,779,786 'alongsid':177 'alway':538 'anthrop':222,474 'api':133,142,224,418,475,491,495,511,693 'app':125,205,810 'appli':442 'applic':15,51 'ask':112 'attach':768 'auth':506 'automat':322,435,532,750 'backend':147,184,196,197 'base':483,497 'bearer':508 'becom':745 'best':198 'braintrust':239 'broken':56 'build':706 'cach':677 'call':75,134,175,311,433,529,542,550,758 'callback':400 'callbackhandl':385,387 'case':637 'chain':392 'chain.invoke':395,465 'chang':481 'chatbot':641 'checklist':755 'choos':193 'claud':99 'client':489 'cline':101 'cloud.langfuse.com':304 'code':100,176 'codex':171 'collect':169,696,799 'comment':725,739,740 'compar':626 'config':399 'configur':6,83,805 'connect':699 'content':338 'context':286 'control':232 'copi':102 'cost':30,162,200,669,790 'count':548,601,674,765 'current':359,580 'cursor':170 'custom':144,346,519,566,681 'daili':793 'dashboard':791 'dataset':709,749 'debug':54,614 'decor':307 'def':315,349,717 'default':502 'direct':141,223,472 'doc':352,362,364 'document':351,371,526,583,639 'document-qa':370,525,638 'document-retriev':582 'durat':555 'enabl':453,457 'end':8,10 'end-to-end':7 'env':440 'environ':619,772 'error':663,667,784 'eval':241,708,747 'eval-heavi':240 'event':248 'everi':540,605,770 'exist':233 'f':507 'fail':76 'featur':462,524,634,796 'fee':228 'feedback':168,698,702,719,798 'file':104 'framework':137,255 'framework-agnost':254 'free':206,216,260 'full':231 'function':309 'generat':316,375 'generous':262 'good':263 'gpt':332 'gpt-4o-mini':331 'grade':47 'guess':60 'handler':386,401 'header':503 'heavi':242 'helicon':22,150,220,470,505,510,514,522 'helicone-auth':504 'helicone-property-featur':521 'helicone-property-userid':513 'high':808 'high-volum':807 'histogram':662 'hit':678 'host':238,303 'hostabl':215,259 'id':353,363,365,518,609,612,721,729,731,767,803 'import':275,279,283,384,409,450,487 'index':591 'infra':235 'input':324,398,467,553 'instal':273 'instruct':179,187 'instrument':95,537 'int':724 'integr':380 'issu':618 'jaeger':230 'judg':687 'k':561 'key':290,297,419,492,496,512,645 'langchain':124,140,203,379,391,406,412,417,422,432 'langchain_core.tracers.context':449 'langfus':20,128,149,209,250,267,274,278,280,285,288,295,302,305,306,660,666,691 'langfuse.callback':383 'langfuse.decorators':282 'langfuse.score':727 'langfuse.start':578 'langfuse_context.update':358 'langgraph':204,407,438 'langsmith':19,148,202,404 'languag':155 'latenc':27,79,164,326,546,657,661,762,778 'layer':87 'len':602 'let':624 'lf':293,300 'llamaindex':143 'llm':1,14,32,50,57,115,136,174,211,313,541,575,656,685,757 'llm-as-judg':684 'low':743 'ls':420 'manual':570 'mean':59 'messag':335 'message.content':344 'metadata':25,347,361,444,520,588,599,682 'metric':646,649 'mini':334 'model':71,201,327,330,543,675,761 'monitor':648 'my-project-prod':424 'name':24,552,732 'need':161,479 'non':574 'non-llm':573 'oai.helicone.ai':500 'oai.helicone.ai/v1':499 'observ':4,35,48,121,284,314,348,360 'open':152,218 'openai':132,221,473,486,488,490,494 'openai_client.chat.completions.create':329 'opentelemetri':17,229 'oper':576 'os':276,410 'os.environ':287,294,301,411,416,421,493,509 'otel':234 'output':325,554 'p95':655,782 'pass':388 'past':172 'per':226,247,670 'per-ev':246 'per-request':225 'pip':272 'pipelin':58,366,622 'pk':292 'pk-lf':291 'prefer':145 'premium':461 'price':676 'process':350 'prod':427,593,814 'prod-v2':592 'prod/staging':774 'product':46,373,620,753 'production-grad':45 'project':109,423,426 'prompt':66,186,244,544 'properti':515,523 'provid':135 'public':289 'python':156,271,381,408,447,476,569,710 'qa':372,527,640 'queri':319,340,355,377,396,558,589,590,597 'rate':664,679,714,734,785 'readi':754 'real':700 'recommend':249 'record':718 'regress':748 'report':617 'request':227,671 'rerank':563 'respons':317,328,376,545,716 'response.choices':342 'result':464,562,595,600,603 'retriev':556,584 'return':72,341,374 'right':85 'role':336 'root':110 'sampl':804 'score':564,688,692,723,736,744 'sdk':478 'search':463 'second':659 'secret':296 'see':64 'self':214,237,258 'self-host':213,236,257 'sent':68 'session':611 'set':43,126,780,787 'setup':5,36,117,268,405,471 'show':92,792 'side':631,633 'side-by-sid':630 'sk':299 'sk-lf':298 'skill':39,42,82,118 'skill-llm-tracing-setup' 'sourc':219 'source-notysoty' 'span':23,567,571,581,586 'span.update':587,598 'specifi':181 'spend':794 'spike':80 'stack':90,212,270 'stage':621,817 'step':191,265,402,468,533,557,643,694,751 'str':320,321,354,356,357,722,726 'suggest':154 'summar':642 'tag':26,369,458,604,683,773 'team':243,253 'threshold':28,651 'tier':207,217,261 'token':547,673,764 'tool':74,549,551 'top':560 'top-k':559 'topic-agent-skills' 'topic-claude' 'topic-claude-code' 'topic-claude-skills' 'topic-cline' 'topic-cursor' 'topic-llm' 'topic-llm-skills' 'topic-skills' 'trace':2,11,33,53,86,116,129,146,183,195,323,413,436,446,451,455,531,539,606,668,704,720,728,730,741,759,771,777,802,811 'track':31,163,654 'true':415 'type':460 'typescript':158 'ui':264 'url':484,498 'usag':208 'use':16,98,113,139,636 'user':167,318,337,339,397,459,466,517,608,616,697,701,713,733,797 'user-report':615 'user/session':766 'userid':516 'v1':627 'v2':414,452,456,594,629 'v2.1':368 'valu':735 'var':441 'vector_store.search':596 'version':245,367,623 'volum':809 'vs':628 'whether':159 'wire':800 'without':52","prices":[{"id":"e6549f21-23f6-43f6-88ee-d30aa12d6641","listingId":"13516452-8142-46f5-81f0-534745f83ff0","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"Notysoty","category":"openagentskills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:20:43.929Z"}],"sources":[{"listingId":"13516452-8142-46f5-81f0-534745f83ff0","source":"github","sourceId":"Notysoty/openagentskills/llm-tracing-setup","sourceUrl":"https://github.com/Notysoty/openagentskills/tree/main/skills/llm-tracing-setup","isPrimary":false,"firstSeenAt":"2026-05-18T13:20:43.929Z","lastSeenAt":"2026-05-18T19:13:22.659Z"}],"details":{"listingId":"13516452-8142-46f5-81f0-534745f83ff0","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"Notysoty","slug":"llm-tracing-setup","github":{"repo":"Notysoty/openagentskills","stars":8,"topics":["agent-skills","claude","claude-code","claude-skills","cline","cursor","llm","llm-skills","skills"],"license":"mit","html_url":"https://github.com/Notysoty/openagentskills","pushed_at":"2026-03-28T06:50:19Z","description":"A  community-driven library of reusable AI agent skills for Claude Code, Cursor, Codex, Cline, and more.","skill_md_sha":"990c585064273106762133f81e99d780c2664f8d","skill_md_path":"skills/llm-tracing-setup/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/Notysoty/openagentskills/tree/main/skills/llm-tracing-setup"},"layout":"multi","source":"github","category":"openagentskills","frontmatter":{"name":"LLM Tracing and Observability Setup","description":"Configures end-to-end tracing for an LLM application using OpenTelemetry with LangSmith, Langfuse, or Helicone — span naming, metadata tagging, latency thresholds, and cost tracking."},"skills_sh_url":"https://skills.sh/Notysoty/openagentskills/llm-tracing-setup"},"updatedAt":"2026-05-18T19:13:22.659Z"}}