{"id":"dd60bd4e-ed9a-444d-b7d8-b374bf22cd5d","shortId":"yzL84k","kind":"skill","title":"bigquery-pipeline-audit","tagline":"Audits Python + BigQuery pipelines for cost safety, idempotency, and production readiness. Returns a structured report with exact patch locations.","description":"# BigQuery Pipeline Audit: Cost, Safety and Production Readiness\n\nYou are a senior data engineer reviewing a Python + BigQuery pipeline script.\nYour goals: catch runaway costs before they happen, ensure reruns do not corrupt\ndata, and make sure failures are visible.\n\nAnalyze the codebase and respond in the structure below (A to F + Final).\nReference exact function names and line locations. Suggest minimal fixes, not\nrewrites.\n\n---\n\n## A) COST EXPOSURE: What will actually get billed?\n\nLocate every BigQuery job trigger (`client.query`, `load_table_from_*`,\n`extract_table`, `copy_table`, DDL/DML via query) and every external call\n(APIs, LLM calls, storage writes).\n\nFor each, answer:\n- Is this inside a loop, retry block, or async gather?\n- What is the realistic worst-case call count?\n- For each `client.query`, is `QueryJobConfig.maximum_bytes_billed` set?\n  For load, extract, and copy jobs, is the scope bounded and counted against MAX_JOBS?\n- Is the same SQL and params being executed more than once in a single run?\n  Flag repeated identical queries and suggest query hashing plus temp table caching.\n\n**Flag immediately if:**\n- Any BQ query runs once per date or once per entity in a loop\n- Worst-case BQ job count exceeds 20\n- `maximum_bytes_billed` is missing on any `client.query` call\n\n---\n\n## B) DRY RUN AND EXECUTION MODES\n\nVerify a `--mode` flag exists with at least `dry_run` and `execute` options.\n\n- `dry_run` must print the plan and estimated scope with zero billed BQ execution\n  (BigQuery dry-run estimation via job config is allowed) and zero external API or LLM calls\n- `execute` requires explicit confirmation for prod (`--env=prod --confirm`)\n- Prod must not be the default environment\n\nIf missing, propose a minimal `argparse` patch with safe defaults.\n\n---\n\n## C) BACKFILL AND LOOP DESIGN\n\n**Hard fail if:** the script runs one BQ query per date or per entity in a loop.\n\nCheck that date-range backfills use one of:\n1. A single set-based query with `GENERATE_DATE_ARRAY`\n2. A staging table loaded with all dates then one join query\n3. Explicit chunks with a hard `MAX_CHUNKS` cap\n\nAlso check:\n- Is the date range bounded by default (suggest 14 days max without `--override`)?\n- If the script crashes mid-run, is it safe to re-run without double-writing?\n- For backdated simulations, verify data is read from time-consistent snapshots\n  (`FOR SYSTEM_TIME AS OF`, partitioned as-of tables, or dated snapshot tables).\n  Flag any read from a \"latest\" or unversioned table when running in backdated mode.\n\nSuggest a concrete rewrite if the current approach is row-by-row.\n\n---\n\n## D) QUERY SAFETY AND SCAN SIZE\n\nFor each query, check:\n- **Partition filter** is on the raw column, not `DATE(ts)`, `CAST(...)`, or\n  any function that prevents pruning\n- **No `SELECT *`**: only columns actually used downstream\n- **Joins will not explode**: verify join keys are unique or appropriately scoped\n  and flag any potential many-to-many\n- **Expensive operations** (`REGEXP`, `JSON_EXTRACT`, UDFs) only run after\n  partition filtering, not on full table scans\n\nProvide a specific SQL fix for any query that fails these checks.\n\n---\n\n## E) SAFE WRITES AND IDEMPOTENCY\n\nIdentify every write operation. Flag plain `INSERT`/append with no dedup logic.\n\nEach write should use one of:\n1. `MERGE` on a deterministic key (e.g., `entity_id + date + model_version`)\n2. Write to a staging table scoped to the run, then swap or merge into final\n3. Append-only with a dedupe view:\n   `QUALIFY ROW_NUMBER() OVER (PARTITION BY <key>) = 1`\n\nAlso check:\n- Will a re-run create duplicate rows?\n- Is the write disposition (`WRITE_TRUNCATE` vs `WRITE_APPEND`) intentional\n  and documented?\n- Is `run_id` being used as part of the merge or dedupe key? If so, flag it.\n  `run_id` should be stored as a metadata column, not as part of the uniqueness\n  key, unless you explicitly want multi-run history.\n\nState the recommended approach and the exact dedup key for this codebase.\n\n---\n\n## F) OBSERVABILITY: Can you debug a failure?\n\nVerify:\n- Failures raise exceptions and abort with no silent `except: pass` or warn-only\n- Each BQ job logs: job ID, bytes processed or billed when available,\n  slot milliseconds, and duration\n- A run summary is logged or written at the end containing:\n  `run_id, env, mode, date_range, tables written, total BQ jobs, total bytes`\n- `run_id` is present and consistent across all log lines\n\nIf `run_id` is missing, propose a one-line fix:\n`run_id = run_id or datetime.utcnow().strftime('%Y%m%dT%H%M%S')`\n\n---\n\n## Final\n\n**1. PASS / FAIL** with specific reasons per section (A to F).\n**2. Patch list** ordered by risk, referencing exact functions to change.\n**3. If FAIL: Top 3 cost risks** with a rough worst-case estimate\n(e.g., \"loop over 90 dates x 3 retries = 270 BQ jobs\").","tags":["bigquery","pipeline","audit","awesome","copilot","github","agent-skills","agents","custom-agents","github-copilot","hacktoberfest","prompt-engineering"],"capabilities":["skill","source-github","skill-bigquery-pipeline-audit","topic-agent-skills","topic-agents","topic-awesome","topic-custom-agents","topic-github-copilot","topic-hacktoberfest","topic-prompt-engineering"],"categories":["awesome-copilot"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/github/awesome-copilot/bigquery-pipeline-audit","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add github/awesome-copilot","source_repo":"https://github.com/github/awesome-copilot","install_from":"skills.sh"}},"qualityScore":"0.700","qualityRationale":"deterministic score 0.70 from registry signals: · indexed on github topic:agent-skills · 33270 github stars · SKILL.md body (4,801 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T18:52:06.210Z","embedding":null,"createdAt":"2026-04-18T20:26:08.988Z","updatedAt":"2026-05-18T18:52:06.210Z","lastSeenAt":"2026-05-18T18:52:06.210Z","tsv":"'/append':547 '1':335,558,600,773 '14':377 '2':346,570,784 '20':218 '270':817 '3':358,586,795,799,815 '90':812 'abort':688 'across':744 'actual':94,484 'allow':270 'also':367,601 'analyz':64 'answer':124 'api':117,274 'append':588,619 'append-on':587 'approach':447,667 'appropri':497 'argpars':299 'array':345 'as-of':418 'async':133 'audit':4,5,26 'avail':709 'b':228 'backdat':401,438 'backfil':305,331 'base':340 'bigqueri':2,7,24,41,99,261 'bigquery-pipeline-audit':1 'bill':96,150,221,258,707 'block':131 'bound':161,373 'bq':198,214,259,316,699,734,818 'byte':149,220,704,737 'c':304 'cach':193 'call':116,119,142,227,277 'cap':366 'case':141,213,807 'cast':473 'catch':46 'chang':794 'check':326,368,462,534,602 'chunk':360,365 'client.query':102,146,226 'codebas':66,675 'column':469,483,648 'concret':442 'config':268 'confirm':281,286 'consist':410,743 'contain':724 'copi':108,156 'corrupt':56 'cost':10,27,48,90,800 'count':143,163,216 'crash':385 'creat':608 'current':446 'd':453 'data':36,57,404 'date':203,319,329,344,353,371,423,471,567,729,813 'date-rang':328 'datetime.utcnow':764 'day':378 'ddl/dml':110 'debug':680 'dedup':550,592,634,671 'default':292,303,375 'design':308 'determinist':562 'disposit':614 'document':622 'doubl':398 'double-writ':397 'downstream':486 'dri':229,242,247,263 'dry-run':262 'dt':768 'duplic':609 'durat':713 'e':535 'e.g':564,809 'end':723 'engin':37 'ensur':52 'entiti':207,322,565 'env':284,727 'environ':293 'estim':254,265,808 'everi':98,114,541 'exact':21,78,670,791 'exceed':217 'except':686,692 'execut':174,232,245,260,278 'exist':238 'expens':507 'explicit':280,359,658 'explod':490 'exposur':91 'extern':115,273 'extract':106,154,511 'f':75,676,783 'fail':310,532,775,797 'failur':61,682,684 'filter':464,517 'final':76,585,772 'fix':86,527,758 'flag':182,194,237,426,500,544,638 'full':520 'function':79,476,792 'gather':134 'generat':343 'get':95 'goal':45 'h':769 'happen':51 'hard':309,363 'hash':189 'histori':663 'id':566,625,641,703,726,739,750,760,762 'idempot':12,539 'ident':184 'identifi':540 'immedi':195 'insert':546 'insid':127 'intent':620 'job':100,157,166,215,267,700,702,735,819 'join':356,487,492 'json':510 'key':493,563,635,655,672 'latest':431 'least':241 'line':82,747,757 'list':786 'llm':118,276 'load':103,153,350 'locat':23,83,97 'log':701,718,746 'logic':551 'loop':129,210,307,325,810 'm':767,770 'make':59 'mani':504,506 'many-to-mani':503 'max':165,364,379 'maximum':219 'merg':559,583,632 'metadata':647 'mid':387 'mid-run':386 'millisecond':711 'minim':85,298 'miss':223,295,752 'mode':233,236,439,728 'model':568 'multi':661 'multi-run':660 'must':249,288 'name':80 'number':596 'observ':677 'one':315,333,355,556,756 'one-lin':755 'oper':508,543 'option':246 'order':787 'overrid':381 'param':172 'part':629,651 'partit':417,463,516,598 'pass':693,774 'patch':22,300,785 'per':202,206,318,321,779 'pipelin':3,8,25,42 'plain':545 'plan':252 'plus':190 'potenti':502 'present':741 'prevent':478 'print':250 'process':705 'prod':283,285,287 'product':14,30 'propos':296,753 'provid':523 'prune':479 'python':6,40 'qualifi':594 'queri':112,185,188,199,317,341,357,454,461,530 'queryjobconfig.maximum':148 'rais':685 'rang':330,372,730 'raw':468 're':394,606 're-run':393,605 'read':406,428 'readi':15,31 'realist':138 'reason':778 'recommend':666 'refer':77 'referenc':790 'regexp':509 'repeat':183 'report':19 'requir':279 'rerun':53 'respond':68 'retri':130,816 'return':16 'review':38 'rewrit':88,443 'risk':789,801 'rough':804 'row':450,452,595,610 'row-by-row':449 'run':181,200,230,243,248,264,314,388,395,436,514,579,607,624,640,662,715,725,738,749,759,761 'runaway':47 'safe':302,391,536 'safeti':11,28,455 'scan':457,522 'scope':160,255,498,576 'script':43,313,384 'section':780 'select':481 'senior':35 'set':151,339 'set-bas':338 'silent':691 'simul':402 'singl':180,337 'size':458 'skill' 'skill-bigquery-pipeline-audit' 'slot':710 'snapshot':411,424 'source-github' 'specif':525,777 'sql':170,526 'stage':348,574 'state':664 'storag':120 'store':644 'strftime':765 'structur':18,71 'suggest':84,187,376,440 'summari':716 'sure':60 'swap':581 'system':413 'tabl':104,107,109,192,349,421,425,434,521,575,731 'temp':191 'time':409,414 'time-consist':408 'top':798 'topic-agent-skills' 'topic-agents' 'topic-awesome' 'topic-custom-agents' 'topic-github-copilot' 'topic-hacktoberfest' 'topic-prompt-engineering' 'total':733,736 'trigger':101 'truncat':616 'ts':472 'udf':512 'uniqu':495,654 'unless':656 'unvers':433 'use':332,485,555,627 'verifi':234,403,491,683 'version':569 'via':111,266 'view':593 'visibl':63 'vs':617 'want':659 'warn':696 'warn-on':695 'without':380,396 'worst':140,212,806 'worst-cas':139,211,805 'write':121,399,537,542,553,571,613,615,618 'written':720,732 'x':814 'y':766 'zero':257,272","prices":[{"id":"0ef0c24c-768a-4438-85c8-82d525619433","listingId":"dd60bd4e-ed9a-444d-b7d8-b374bf22cd5d","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"github","category":"awesome-copilot","install_from":"skills.sh"},"createdAt":"2026-04-18T20:26:08.988Z"}],"sources":[{"listingId":"dd60bd4e-ed9a-444d-b7d8-b374bf22cd5d","source":"github","sourceId":"github/awesome-copilot/bigquery-pipeline-audit","sourceUrl":"https://github.com/github/awesome-copilot/tree/main/skills/bigquery-pipeline-audit","isPrimary":false,"firstSeenAt":"2026-04-18T21:48:28.161Z","lastSeenAt":"2026-05-18T18:52:06.210Z"},{"listingId":"dd60bd4e-ed9a-444d-b7d8-b374bf22cd5d","source":"skills_sh","sourceId":"github/awesome-copilot/bigquery-pipeline-audit","sourceUrl":"https://skills.sh/github/awesome-copilot/bigquery-pipeline-audit","isPrimary":true,"firstSeenAt":"2026-04-18T20:26:08.988Z","lastSeenAt":"2026-05-07T22:40:18.720Z"}],"details":{"listingId":"dd60bd4e-ed9a-444d-b7d8-b374bf22cd5d","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"github","slug":"bigquery-pipeline-audit","github":{"repo":"github/awesome-copilot","stars":33270,"topics":["agent-skills","agents","ai","awesome","custom-agents","github-copilot","hacktoberfest","prompt-engineering"],"license":"mit","html_url":"https://github.com/github/awesome-copilot","pushed_at":"2026-05-18T01:26:59Z","description":"Community-contributed instructions, agents, skills, and configurations to help you make the most of GitHub Copilot.","skill_md_sha":"5894927e2b6be515b22e78e7ae0d18ee8f736c30","skill_md_path":"skills/bigquery-pipeline-audit/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/github/awesome-copilot/tree/main/skills/bigquery-pipeline-audit"},"layout":"multi","source":"github","category":"awesome-copilot","frontmatter":{"name":"bigquery-pipeline-audit","description":"Audits Python + BigQuery pipelines for cost safety, idempotency, and production readiness. Returns a structured report with exact patch locations."},"skills_sh_url":"https://skills.sh/github/awesome-copilot/bigquery-pipeline-audit"},"updatedAt":"2026-05-18T18:52:06.210Z"}}