{"id":"99f5f890-372c-48ea-92d2-0f9bd9a8b10f","shortId":"rzEjYf","kind":"skill","title":"dspy-debugging-observability","tagline":"This skill should be used when the user asks to \"debug DSPy programs\", \"trace LLM calls\", \"monitor production DSPy\", \"use MLflow with DSPy\", mentions \"inspect_history\", \"custom callbacks\", \"observability\", \"production monitoring\", \"cost tracking\", or needs to debug, trace, and monitor DSPy applications in development and production.","description":"# DSPy Debugging & Observability\n\n## Goal\n\nDebug, trace, and monitor DSPy programs using built-in inspection, MLflow tracing, and custom callbacks for production observability.\n\n## When to Use\n\n- Debugging unexpected outputs\n- Understanding multi-step program flow\n- Production monitoring (cost, latency, errors)\n- Analyzing optimizer behavior\n- Tracking LLM API usage\n\n## Related Skills\n\n- Optimize programs: [dspy-miprov2-optimizer](../dspy-miprov2-optimizer/SKILL.md)\n- Evaluate quality: [dspy-evaluation-suite](../dspy-evaluation-suite/SKILL.md)\n- Build agents: [dspy-react-agent-builder](../dspy-react-agent-builder/SKILL.md)\n\n## Inputs\n\n| Input | Type | Description |\n|-------|------|-------------|\n| `program` | `dspy.Module` | Program to debug/monitor |\n| `callback` | `BaseCallback` | Optional custom callback (subclass of `dspy.utils.callback.BaseCallback`) |\n\n## Outputs\n\n| Output | Type | Description |\n|--------|------|-------------|\n| `GLOBAL_HISTORY` | `list[dict]` | Raw execution trace from `dspy.clients.base_lm` |\n| `metrics` | `dict` | Cost, latency, token counts from callbacks |\n\n## Workflow\n\n### Phase 1: Basic Inspection with inspect_history()\n\nThe simplest debugging approach:\n\n```python\nimport dspy\n\ndspy.configure(lm=dspy.LM(\"openai/gpt-4o-mini\"))\n\n# Run program\nqa = dspy.ChainOfThought(\"question -> answer\")\nresult = qa(question=\"What is the capital of France?\")\n\n# Inspect last execution (prints to console)\ndspy.inspect_history(n=1)\n\n# To access raw history 
programmatically:\nfrom dspy.clients.base_lm import GLOBAL_HISTORY\nfor entry in GLOBAL_HISTORY[-1:]:\n    print(f\"Model: {entry['model']}\")\n    print(f\"Usage: {entry.get('usage', {})}\")\n    print(f\"Cost: {entry.get('cost', 0)}\")\n```\n\n### Phase 2: MLflow Tracing\n\nMLflow integration requires explicit setup:\n\n```python\nimport dspy\nimport mlflow\n\n# Setup MLflow (2 steps required)\n# 1. Set tracking URI and experiment\nmlflow.set_tracking_uri(\"http://localhost:5000\")\nmlflow.set_experiment(\"DSPy\")\n\n# 2. Enable DSPy autologging\nmlflow.dspy.autolog(\n    log_traces=True,              # Log traces during inference\n    log_traces_from_compile=True, # Log traces when compiling/optimizing\n    log_traces_from_eval=True,    # Log traces during evaluation\n    log_compiles=True,            # Log optimization process info\n    log_evals=True                # Log evaluation call info\n)\n\ndspy.configure(lm=dspy.LM(\"openai/gpt-4o-mini\"))\n\n# Configure retriever (required before using dspy.Retrieve)\nrm = dspy.ColBERTv2(url=\"http://20.102.90.50:2017/wiki17_abstracts\")\ndspy.configure(rm=rm)\n\nclass RAGPipeline(dspy.Module):\n    def __init__(self):\n        super().__init__()\n        self.retrieve = dspy.Retrieve(k=3)\n        self.generate = dspy.ChainOfThought(\"context, question -> answer\")\n\n    def forward(self, question):\n        context = self.retrieve(question).passages\n        return self.generate(context=context, question=question)\n\npipeline = RAGPipeline()\nresult = pipeline(question=\"What is machine learning?\")\n\n# View traces in MLflow UI (run in terminal): mlflow ui --port 5000\n```\n\nMLflow captures LLM calls, token usage, costs, and execution times when autolog is enabled.\n\n### Phase 3: Custom Callbacks for Production\n\nBuild custom callbacks for specialized monitoring:\n\n```python\nimport dspy\nfrom dspy.utils.callback import BaseCallback\nimport logging\nimport time\nfrom typing import Any\n\nlogger = logging.getLogger(__name__)\n\nclass 
ProductionMonitoringCallback(BaseCallback):\n    \"\"\"Track cost, latency, and errors in production.\"\"\"\n\n    def __init__(self):\n        super().__init__()\n        self.total_cost = 0.0\n        self.total_tokens = 0\n        self.call_count = 0\n        self.errors = []\n        self.start_times = {}\n\n    def on_lm_start(self, call_id: str, instance: Any, inputs: dict[str, Any]):\n        \"\"\"Called when LM is invoked.\"\"\"\n        self.start_times[call_id] = time.time()\n\n    def on_lm_end(self, call_id: str, outputs: dict[str, Any] | None, exception: Exception | None = None):\n        \"\"\"Called after LM finishes.\"\"\"\n        if exception:\n            self.errors.append(str(exception))\n            logger.error(f\"LLM error: {exception}\")\n            return\n\n        # Calculate latency\n        start = self.start_times.pop(call_id, time.time())\n        latency = time.time() - start\n\n        # Extract usage from outputs\n        usage = outputs.get('usage', {}) if isinstance(outputs, dict) else {}\n        tokens = usage.get('total_tokens', 0)\n        model = outputs.get('model', 'unknown') if isinstance(outputs, dict) else 'unknown'\n        cost = self._estimate_cost(model, usage)\n\n        self.total_tokens += tokens\n        self.total_cost += cost\n        self.call_count += 1\n\n        logger.info(f\"LLM call: {latency:.2f}s, {tokens} tokens, ${cost:.4f}\")\n\n    def _estimate_cost(self, model: str, usage: dict[str, int]) -> float:\n        \"\"\"Estimate cost based on model pricing (update rates for 2026).\"\"\"\n        pricing = {\n            'gpt-4o-mini': {'input': 0.00015 / 1000, 'output': 0.0006 / 1000},\n            'gpt-4o': {'input': 0.0025 / 1000, 'output': 0.01 / 1000},\n        }\n        model_key = next((k for k in pricing if k in model), 'gpt-4o-mini')\n        input_cost = usage.get('prompt_tokens', 0) * pricing[model_key]['input']\n        output_cost = usage.get('completion_tokens', 0) * 
pricing[model_key]['output']\n        return input_cost + output_cost\n\n    def get_metrics(self) -> dict[str, Any]:\n        \"\"\"Return aggregated metrics.\"\"\"\n        return {\n            'total_cost': self.total_cost,\n            'total_tokens': self.total_tokens,\n            'call_count': self.call_count,\n            'avg_cost_per_call': self.total_cost / max(self.call_count, 1),\n            'error_count': len(self.errors)\n        }\n\n# Usage\nmonitor = ProductionMonitoringCallback()\ndspy.configure(lm=dspy.LM(\"openai/gpt-4o-mini\"), callbacks=[monitor])\n\n# Run your program\nqa = dspy.ChainOfThought(\"question -> answer\")\nquestions = [\"What is DSPy?\", \"How does MLflow tracing work?\"]  # example workload\nfor question in questions:\n    result = qa(question=question)\n\n# Get metrics\nmetrics = monitor.get_metrics()\nprint(f\"Total cost: ${metrics['total_cost']:.2f}\")\nprint(f\"Total calls: {metrics['call_count']}\")\nprint(f\"Errors: {metrics['error_count']}\")\n```\n\n### Phase 4: Sampling for High-Volume Production\n\nFor high-traffic applications, sample traces to reduce overhead:\n\n```python\nimport random\nfrom dspy.utils.callback import BaseCallback\nfrom typing import Any\n\nclass SamplingCallback(BaseCallback):\n    \"\"\"Sample 10% of traces.\"\"\"\n\n    def __init__(self, sample_rate: float = 0.1):\n        super().__init__()\n        self.sample_rate = sample_rate\n        self.sampled_calls = []\n\n    def on_lm_end(self, call_id: str, outputs: dict[str, Any] | None, exception: Exception | None = None):\n        \"\"\"Sample a subset of LM calls.\"\"\"\n        if random.random() < self.sample_rate:\n            self.sampled_calls.append({\n                'call_id': call_id,\n                'outputs': outputs,\n                'exception': exception\n            })\n\n# Use with high-volume apps\ncallback = SamplingCallback(sample_rate=0.1)\ndspy.configure(lm=dspy.LM(\"openai/gpt-4o-mini\"), callbacks=[callback])\n```\n\n## Best Practices\n\n1. 
**Use inspect_history() for debugging** - Quick inspection during development\n2. **MLflow for comprehensive tracing** - Automatic instrumentation in production\n3. **Sample high-volume traces** - Reduce overhead with 1-10% sampling\n4. **Privacy-aware logging** - Redact PII before logging\n5. **Async callbacks** - Non-blocking callbacks for production\n\n## Limitations\n\n- Callbacks are synchronous by default (can block LLM calls)\n- MLflow tracing adds ~5-10ms overhead per call\n- inspect_history() only stores recent calls (last 100 by default)\n- Custom callbacks don't capture internal optimizer steps\n- Cost estimation requires manual pricing table updates\n\n## Official Documentation\n\n- **DSPy Documentation**: https://dspy.ai/\n- **DSPy GitHub**: https://github.com/stanfordnlp/dspy\n- **Observability Guide**: https://dspy.ai/tutorials/observability/","tags":["dspy","debugging","observability","skills","omidzamani","agent-skills","claude-code","claude-skills","llm","prompt-optimization","rag"],"capabilities":["skill","source-omidzamani","skill-dspy-debugging-observability","topic-agent-skills","topic-claude-code","topic-claude-skills","topic-dspy","topic-llm","topic-prompt-optimization","topic-rag"],"categories":["dspy-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/OmidZamani/dspy-skills/dspy-debugging-observability","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add OmidZamani/dspy-skills","source_repo":"https://github.com/OmidZamani/dspy-skills","install_from":"skills.sh"}},"qualityScore":"0.487","qualityRationale":"deterministic score 0.49 from registry signals: · indexed on github topic:agent-skills · 74 github stars · SKILL.md body (8,000 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-02T06:55:44.135Z","embedding":null,"createdAt":"2026-04-18T22:14:10.311Z","updatedAt":"2026-05-02T06:55:44.135Z","lastSeenAt":"2026-05-02T06:55:44.135Z","prices":[{"id":"f13e309c-adae-44d3-a18c-e8d58b663f24","listingId":"99f5f890-372c-48ea-92d2-0f9bd9a8b10f","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"OmidZamani","category":"dspy-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T22:14:10.311Z"}],"sources":[{"listingId":"99f5f890-372c-48ea-92d2-0f9bd9a8b10f","source":"github","sourceId":"OmidZamani/dspy-skills/dspy-debugging-observability","sourceUrl":"https://github.com/OmidZamani/dspy-skills/tree/master/skills/dspy-debugging-observability","isPrimary":false,"firstSeenAt":"2026-04-18T22:14:10.311Z","lastSeenAt":"2026-05-02T06:55:44.135Z"}],"details":{"listingId":"99f5f890-372c-48ea-92d2-0f9bd9a8b10f","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"OmidZamani","slug":"dspy-debugging-observability","github":{"rep
o":"OmidZamani/dspy-skills","stars":74,"topics":["agent-skills","claude-code","claude-skills","dspy","llm","prompt-optimization","rag"],"license":"mit","html_url":"https://github.com/OmidZamani/dspy-skills","pushed_at":"2026-02-21T12:49:43Z","description":"Collection of Claude Skills for DSPy framework - program language models, optimize prompts, and build RAG pipelines systematically","skill_md_sha":"cd3f2889321b3e4511b932d43676ebb29223af26","skill_md_path":"skills/dspy-debugging-observability/SKILL.md","default_branch":"master","skill_tree_url":"https://github.com/OmidZamani/dspy-skills/tree/master/skills/dspy-debugging-observability"},"layout":"multi","source":"github","category":"dspy-skills","frontmatter":{"name":"dspy-debugging-observability","description":"This skill should be used when the user asks to \"debug DSPy programs\", \"trace LLM calls\", \"monitor production DSPy\", \"use MLflow with DSPy\", mentions \"inspect_history\", \"custom callbacks\", \"observability\", \"production monitoring\", \"cost tracking\", or needs to debug, trace, and monitor DSPy applications in development and production."},"skills_sh_url":"https://skills.sh/OmidZamani/dspy-skills/dspy-debugging-observability"},"updatedAt":"2026-05-02T06:55:44.135Z"}}