LLM Tracing and Observability Setup

Configures end-to-end tracing for an LLM application with LangSmith, Langfuse, Helicone, or OpenTelemetry: span naming, metadata tagging, latency thresholds, and cost tracking.

Price: free
Protocol: skill
Verified: no

What this skill does

This skill sets up production-grade observability for LLM applications. Without tracing, debugging a broken LLM pipeline means guessing — you can't see what prompt was sent, what the model returned, which tool call failed, or why latency spiked. This skill configures the right tracing layer for your stack and shows what to instrument.

How to use

Claude Code / Cline

Copy this file to .agents/skills/llm-tracing-setup/SKILL.md in your project root.

Then ask:

  • "Use the LLM Tracing Setup skill to add observability to our LangChain app."
  • "Set up Langfuse tracing for our OpenAI API calls."

Provide:

  • LLM framework in use (LangChain, direct API, LlamaIndex, custom)
  • Preferred tracing backend (LangSmith, Langfuse, Helicone, or open to suggestions)
  • Language (Python or TypeScript)
  • Whether you need cost tracking, latency alerting, or user feedback collection

Cursor / Codex

Paste your LLM call code alongside these instructions and specify the tracing backend.

The Prompt / Instructions for the Agent

Step 1 — Choose a tracing backend

| Backend | Best for | Cost model |
|---|---|---|
| LangSmith | LangChain / LangGraph apps | Free tier + usage |
| Langfuse | Any LLM stack, self-hostable | Free tier + open source |
| Helicone | OpenAI / Anthropic direct API | Per-request fee |
| OpenTelemetry + Jaeger | Full control, existing OTel infra | Self-hosted |
| Braintrust | Eval-heavy teams, prompt versioning | Per-event |

Recommendation: Langfuse for most teams. It is framework-agnostic and self-hostable, with a generous free tier and a good UI.

Step 2a — Langfuse setup (any stack)

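# pip install "langfuse<3" openai
# This section uses the Langfuse Python SDK v2 decorator API (langfuse.decorators);
# SDK v3 moved these imports, so pin the version or adapt the imports.
import os
from langfuse import Langfuse
from langfuse.decorators import observe, langfuse_context
from openai import OpenAI

# Placeholders: prefer setting these in the deployment environment, not in code
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

langfuse = Langfuse()
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Decorate any function that calls an LLM
@observe()
def generate_response(user_query: str) -> str:
    # @observe() records the function's input, output, and latency.
    # To also capture model name and token usage, import OpenAI from
    # langfuse.openai (Langfuse's drop-in wrapper) instead of from openai.
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": user_query}]
    )
    return response.choices[0].message.content

# Add custom metadata
@observe()
def process_document(doc_id: str, query: str) -> str:
    langfuse_context.update_current_observation(
        metadata={"doc_id": doc_id, "pipeline_version": "v2.1"}
    )
    # Tags are a trace-level attribute, so set them on the trace
    langfuse_context.update_current_trace(tags=["document-qa", "production"])
    return generate_response(query)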

For LangChain integration:

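# SDK v2 import path; in SDK v3 the handler is imported from langfuse.langchain
from langfuse.callback import CallbackHandler

handler = CallbackHandler()

# Pass to any LangChain chain or agent (chain and user_input are your own objects)
chain.invoke({"query": user_input}, config={"callbacks": [handler]})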

Step 2b — LangSmith setup (LangChain / LangGraph)

import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls__..."
os.environ["LANGCHAIN_PROJECT"] = "my-project-prod"

# That's it — all LangChain calls are automatically traced
# For LangGraph, same env vars apply

Add metadata to traces:

from langchain_core.tracers.context import tracing_v2_enabled

with tracing_v2_enabled(tags=["user_type:premium", "feature:search"]):
    result = chain.invoke(user_input)

Step 2c — Helicone setup (direct OpenAI / Anthropic API)

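# No Helicone SDK needed: route requests through the Helicone proxy by changing the base URL
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://oai.helicone.ai/v1",
    default_headers={
        "Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}",
        "Helicone-Property-Feature": "document-qa",  # custom metadata, same for every call
    },
)

# The user ID changes per request, so send it with the call rather than on the client
def ask(user_id: str, question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        extra_headers={"Helicone-Property-UserId": user_id},
    )
    return response.choices[0].message.content
# All calls made through this client are now logged by Helicone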

Step 3 — What to instrument

Always trace:

  • Every LLM call: model, prompt, response, latency, token count
  • Tool calls: tool name, input, output, duration
  • Retrieval steps: query, top-K results, reranking scores
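As a rough illustration of the fields listed above, the following sketch packages one LLM call into a record. It is backend-agnostic and not part of any tracing SDK: `LLMCallRecord`, `traced_call`, and `fake_llm` are hypothetical names, and `fake_llm` stands in for a real client so the example runs offline.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class LLMCallRecord:
    """Fields worth capturing for every LLM call (names are illustrative)."""
    model: str
    prompt: str
    response: str
    latency_s: float
    prompt_tokens: int
    completion_tokens: int
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def traced_call(
    model: str,
    prompt: str,
    call: Callable[[str, str], tuple[str, int, int]],
) -> LLMCallRecord:
    """Time `call` and package its result as a record.

    `call` is a hypothetical adapter that returns
    (response_text, prompt_tokens, completion_tokens).
    """
    start = time.perf_counter()
    text, prompt_tokens, completion_tokens = call(model, prompt)
    latency = time.perf_counter() - start
    return LLMCallRecord(model, prompt, text, latency, prompt_tokens, completion_tokens)


def fake_llm(model: str, prompt: str) -> tuple[str, int, int]:
    # Stand-in for a real client: echoes the prompt and counts words as "tokens".
    return f"echo: {prompt}", len(prompt.split()), 2


record = traced_call("gpt-4o-mini", "hello tracing world", fake_llm)
print(record.model, record.total_tokens, f"{record.latency_s:.4f}s")
```

In a real setup the tracing backend captures most of these fields for you; a record like this is mainly useful as a checklist of what each trace should contain.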

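Add custom spans for non-LLM operations such as retrieval:

# A nested @observe() function becomes a child span of the calling trace (SDK v2 decorator API)
from langfuse.decorators import observe, langfuse_context

@observe(name="document-retrieval")
def retrieve_documents(query: str):
    results = vector_store.search(query)  # vector_store: your existing retriever
    langfuse_context.update_current_observation(
        metadata={"query": query, "index": "prod-v2", "results_count": len(results)}
    )
    return results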

Tag every trace with:

  • user_id or session ID (for debugging user-reported issues)
  • environment: production / staging
  • pipeline_version: lets you compare v1 vs v2 side-by-side
  • feature or use_case: document-qa, chatbot, summarization
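One way to keep these tags consistent is to build them in a single place and reject incomplete sets. This is a minimal sketch; `build_trace_metadata` and the allowed environment names are assumptions made for illustration, and the resulting dict can be attached as metadata or tags, depending on the backend.

```python
ALLOWED_ENVIRONMENTS = {"production", "staging"}


def build_trace_metadata(
    user_id: str,
    environment: str,
    pipeline_version: str,
    feature: str,
) -> dict[str, str]:
    """Return the standard metadata attached to every trace.

    Raises ValueError if a field is empty or the environment is unknown,
    so a missing tag fails loudly instead of producing untagged traces.
    """
    values = {
        "user_id": user_id,
        "environment": environment,
        "pipeline_version": pipeline_version,
        "feature": feature,
    }
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise ValueError(f"missing trace metadata: {', '.join(missing)}")
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return values


metadata = build_trace_metadata("u-123", "staging", "v2.1", "document-qa")
print(metadata)
```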

Step 4 — Key metrics to monitor

| Metric | Alert threshold | How to track |
|---|---|---|
| p95 LLM latency | > 8 seconds | Langfuse latency histogram |
| Error rate | > 2% | Langfuse error traces |
| Cost per request | > $0.05 | Token count × model price |
| Cache hit rate | < 20% | Custom metadata tag |
| LLM-as-judge score | < 3.5/5 | Langfuse scores API |
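The first three thresholds can be checked from exported trace data with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: the prices in `PRICE_PER_MTOK` are made-up placeholders (not real provider rates), "cost per request" is taken to mean the average cost, and p95 uses the nearest-rank method.

```python
# Hypothetical prices in USD per 1M tokens; replace with your provider's current rates.
PRICE_PER_MTOK = {"example-model": {"input": 1.00, "output": 2.00}}


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one request: token count times model price."""
    price = PRICE_PER_MTOK[model]
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1_000_000


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def check_thresholds(latencies_s: list[float], errors: int, costs_usd: list[float]) -> list[str]:
    """Return the names of the alerts that fire for one monitoring window."""
    alerts = []
    if p95(latencies_s) > 8.0:
        alerts.append("p95_latency")
    if errors / len(latencies_s) > 0.02:
        alerts.append("error_rate")
    if sum(costs_usd) / len(costs_usd) > 0.05:
        alerts.append("cost_per_request")
    return alerts


latencies = [1.0] * 18 + [9.0] * 2          # 20 requests, two slow ones
costs = [request_cost("example-model", 1000, 500)] * 20
print(check_thresholds(latencies, errors=1, costs_usd=costs))
```

With 20 requests, two slow responses are enough to push the nearest-rank p95 over the 8-second threshold, which shows why small sample windows make percentile alerts noisy.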

Step 5 — Collecting user feedback

Connect real user feedback to traces to build an eval dataset:

# After the user rates a response (SDK v2 method; SDK v3 names it create_score)
def record_feedback(trace_id: str, score: int, comment: str = ""):
    langfuse.score(
        trace_id=trace_id,
        name="user_rating",
        value=score,          # 1-5
        comment=comment
    )

Traces with low scores are the natural candidates for an eval regression dataset: filter by score and add the failing cases to the dataset.
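A minimal sketch of turning low-rated traces into regression cases, assuming the traces and their ratings have already been exported from the tracing backend. `ScoredTrace`, `select_regression_cases`, and the rating cutoff of 2 are hypothetical choices for illustration, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredTrace:
    trace_id: str
    user_input: str
    model_output: str
    user_rating: int  # 1-5, as recorded by the feedback call


def select_regression_cases(traces: list[ScoredTrace], max_rating: int = 2) -> list[dict]:
    """Keep traces rated at or below `max_rating` as eval regression cases."""
    return [
        {
            "trace_id": trace.trace_id,
            "input": trace.user_input,
            "bad_output": trace.model_output,
        }
        for trace in traces
        if trace.user_rating <= max_rating
    ]


exported = [
    ScoredTrace("t-1", "Summarize the contract", "Short summary", 5),
    ScoredTrace("t-2", "What is the refund policy?", "I don't know", 1),
    ScoredTrace("t-3", "Translate to French", "Wrong language output", 2),
]
cases = select_regression_cases(exported)
print([case["trace_id"] for case in cases])
```

Keeping the original bad output alongside the input lets a later eval run check that a new prompt or model version no longer reproduces it.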

Step 6 — Production readiness checklist

  • All LLM calls traced with model, latency, and token counts
  • User/session ID attached to every trace
  • Environment tag (prod/staging) on all traces
  • Latency alert set for p95 > 8s
  • Error rate alert set for > 2%
  • Cost dashboard showing daily spend by feature
  • User feedback collection wired to trace IDs
  • Sampling configured for high-volume apps (trace 10% in prod, 100% in staging)
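The sampling item can be sketched as a deterministic, hash-based decision, so that every span belonging to the same trace or session is kept or dropped together. `should_trace` and `SAMPLE_RATES` are hypothetical names; some backends also offer a built-in sample-rate setting, which is usually preferable when available.

```python
import hashlib

# Fraction of traces to keep per environment (10% in prod, 100% in staging).
SAMPLE_RATES = {"production": 0.10, "staging": 1.00}


def should_trace(trace_key: str, environment: str) -> bool:
    """Decide whether to record a trace, deterministically per key.

    `trace_key` is typically a trace or session ID. Unknown environments
    default to tracing everything.
    """
    rate = SAMPLE_RATES.get(environment, 1.0)
    digest = hashlib.sha256(trace_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(rate * 2**64)


kept = sum(should_trace(f"session-{i}", "production") for i in range(10_000))
print(f"kept {kept} of 10000 production traces")
```

Hashing the key instead of calling a random number generator means retries and multi-service pipelines reach the same decision for the same session.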

Capabilities

skill · source-notysoty · skill-llm-tracing-setup · topic-agent-skills · topic-claude · topic-claude-code · topic-claude-skills · topic-cline · topic-cursor · topic-llm · topic-llm-skills · topic-skills

Install

Install: npx skills add Notysoty/openagentskills
Transport: skills-sh
Protocol: skill

Quality

0.45 / 1.00

Deterministic score 0.45 from registry signals: indexed on GitHub topic:agent-skills · 8 GitHub stars · SKILL.md body (5,967 chars)

Provenance

Indexed from: github
Enriched: 2026-05-18 19:13:22Z · deterministic:skill-github:v1 · v1
First seen: 2026-05-18
Last seen: 2026-05-18
