{"id":"dd60bd4e-ed9a-444d-b7d8-b374bf22cd5d","shortId":"yzL84k","kind":"skill","title":"Bigquery Pipeline Audit","tagline":"Awesome Copilot skill by Github","description":"# BigQuery Pipeline Audit: Cost, Safety and Production Readiness\n\nYou are a senior data engineer reviewing a Python + BigQuery pipeline script.\nYour goals: catch runaway costs before they happen, ensure reruns do not corrupt\ndata, and make sure failures are visible.\n\nAnalyze the codebase and respond in the structure below (A to F + Final).\nReference exact function names and line locations. Suggest minimal fixes, not\nrewrites.\n\n---\n\n## A) COST EXPOSURE: What will actually get billed?\n\nLocate every BigQuery job trigger (`client.query`, `load_table_from_*`,\n`extract_table`, `copy_table`, DDL/DML via query) and every external call\n(APIs, LLM calls, storage writes).\n\nFor each, answer:\n- Is this inside a loop, retry block, or async gather?\n- What is the realistic worst-case call count?\n- For each `client.query`, is `QueryJobConfig.maximum_bytes_billed` set?\n  For load, extract, and copy jobs, is the scope bounded and counted against MAX_JOBS?\n- Is the same SQL and params being executed more than once in a single run?\n  Flag repeated identical queries and suggest query hashing plus temp table caching.\n\n**Flag immediately if:**\n- Any BQ query runs once per date or once per entity in a loop\n- Worst-case BQ job count exceeds 20\n- `maximum_bytes_billed` is missing on any `client.query` call\n\n---\n\n## B) DRY RUN AND EXECUTION MODES\n\nVerify a `--mode` flag exists with at least `dry_run` and `execute` options.\n\n- `dry_run` must print the plan and estimated scope with zero billed BQ execution\n  (BigQuery dry-run estimation via job config is allowed) and zero external API or LLM calls\n- `execute` requires explicit confirmation for prod (`--env=prod --confirm`)\n- Prod must not be the default environment\n\nIf missing, propose a minimal `argparse` patch with safe defaults.\n\n---\n\n## C) BACKFILL AND LOOP DESIGN\n\n**Hard fail if:** the script runs one BQ query per date or per entity in a loop.\n\nCheck that date-range backfills use one of:\n1. A single set-based query with `GENERATE_DATE_ARRAY`\n2. A staging table loaded with all dates then one join query\n3. Explicit chunks with a hard `MAX_CHUNKS` cap\n\nAlso check:\n- Is the date range bounded by default (suggest 14 days max without `--override`)?\n- If the script crashes mid-run, is it safe to re-run without double-writing?\n- For backdated simulations, verify data is read from time-consistent snapshots\n  (`FOR SYSTEM_TIME AS OF`, partitioned as-of tables, or dated snapshot tables).\n  Flag any read from a \"latest\" or unversioned table when running in backdated mode.\n\nSuggest a concrete rewrite if the current approach is row-by-row.\n\n---\n\n## D) QUERY SAFETY AND SCAN SIZE\n\nFor each query, check:\n- **Partition filter** is on the raw column, not `DATE(ts)`, `CAST(...)`, or\n  any function that prevents pruning\n- **No `SELECT *`**: only columns actually used downstream\n- **Joins will not explode**: verify join keys are unique or appropriately scoped\n  and flag any potential many-to-many\n- **Expensive operations** (`REGEXP`, `JSON_EXTRACT`, UDFs) only run after\n  partition filtering, not on full table scans\n\nProvide a specific SQL fix for any query that fails these checks.\n\n---\n\n## E) SAFE WRITES AND IDEMPOTENCY\n\nIdentify every write operation. Flag plain `INSERT`/append with no dedup logic.\n\nEach write should use one of:\n1. `MERGE` on a deterministic key (e.g., `entity_id + date + model_version`)\n2. Write to a staging table scoped to the run, then swap or merge into final\n3. Append-only with a dedupe view:\n   `QUALIFY ROW_NUMBER() OVER (PARTITION BY <key>) = 1`\n\nAlso check:\n- Will a re-run create duplicate rows?\n- Is the write disposition (`WRITE_TRUNCATE` vs `WRITE_APPEND`) intentional\n  and documented?\n- Is `run_id` being used as part of the merge or dedupe key? If so, flag it.\n  `run_id` should be stored as a metadata column, not as part of the uniqueness\n  key, unless you explicitly want multi-run history.\n\nState the recommended approach and the exact dedup key for this codebase.\n\n---\n\n## F) OBSERVABILITY: Can you debug a failure?\n\nVerify:\n- Failures raise exceptions and abort with no silent `except: pass` or warn-only\n- Each BQ job logs: job ID, bytes processed or billed when available,\n  slot milliseconds, and duration\n- A run summary is logged or written at the end containing:\n  `run_id, env, mode, date_range, tables written, total BQ jobs, total bytes`\n- `run_id` is present and consistent across all log lines\n\nIf `run_id` is missing, propose a one-line fix:\n`run_id = run_id or datetime.utcnow().strftime('%Y%m%dT%H%M%S')`\n\n---\n\n## Final\n\n**1. PASS / FAIL** with specific reasons per section (A to F).\n**2. Patch list** ordered by risk, referencing exact functions to change.\n**3. If FAIL: Top 3 cost risks** with a rough worst-case estimate\n(e.g., \"loop over 90 dates x 3 retries = 270 BQ jobs\").","tags":["bigquery","pipeline","audit","awesome","copilot","github"],"capabilities":["skill","source-github","category-awesome-copilot"],"categories":["awesome-copilot"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/github/awesome-copilot/bigquery-pipeline-audit","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"install_from":"skills.sh"}},"qualityScore":"0.300","qualityRationale":"deterministic score 0.30 from registry signals: · indexed on skills.sh · published under github/awesome-copilot","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T09:40:13.762Z","embedding":null,"createdAt":"2026-04-18T20:26:08.988Z","updatedAt":"2026-04-22T09:40:13.762Z","lastSeenAt":"2026-04-22T09:40:13.762Z","tsv":"'/append':532 '1':320,543,585,758 '14':362 '2':331,555,769 '20':203 '270':802 '3':343,571,780,784,800 '90':797 'abort':673 'across':729 'actual':79,469 'allow':255 'also':352,586 'analyz':49 'answer':109 'api':102,259 'append':573,604 'append-on':572 'approach':432,652 'appropri':482 'argpars':284 'array':330 'as-of':403 'async':118 'audit':3,11 'avail':694 'awesom':4 'b':213 'backdat':386,423 'backfil':290,316 'base':325 'bigqueri':1,9,26,84,246 'bill':81,135,206,243,692 'block':116 'bound':146,358 'bq':183,199,244,301,684,719,803 'byte':134,205,689,722 'c':289 'cach':178 'call':101,104,127,212,262 'cap':351 'case':126,198,792 'cast':458 'catch':31 'category-awesome-copilot' 'chang':779 'check':311,353,447,519,587 'chunk':345,350 'client.query':87,131,211 'codebas':51,660 'column':454,468,633 'concret':427 'config':253 'confirm':266,271 'consist':395,728 'contain':709 'copi':93,141 'copilot':5 'corrupt':41 'cost':12,33,75,785 'count':128,148,201 'crash':370 'creat':593 'current':431 'd':438 'data':21,42,389 'date':188,304,314,329,338,356,408,456,552,714,798 'date-rang':313 'datetime.utcnow':749 'day':363 'ddl/dml':95 'debug':665 'dedup':535,577,619,656 'default':277,288,360 'design':293 'determinist':547 'disposit':599 'document':607 'doubl':383 'double-writ':382 'downstream':471 'dri':214,227,232,248 'dry-run':247 'dt':753 'duplic':594 'durat':698 'e':520 'e.g':549,794 'end':708 'engin':22 'ensur':37 'entiti':192,307,550 'env':269,712 'environ':278 'estim':239,250,793 'everi':83,99,526 'exact':63,655,776 'exceed':202 'except':671,677 'execut':159,217,230,245,263 'exist':223 'expens':492 'explicit':265,344,643 'explod':475 'exposur':76 'extern':100,258 'extract':91,139,496 'f':60,661,768 'fail':295,517,760,782 'failur':46,667,669 'filter':449,502 'final':61,570,757 'fix':71,512,743 'flag':167,179,222,411,485,529,623 'full':505 'function':64,461,777 'gather':119 'generat':328 'get':80 'github':8 'goal':30 'h':754 'happen':36 'hard':294,348 'hash':174 'histori':648 'id':551,610,626,688,711,724,735,745,747 'idempot':524 'ident':169 'identifi':525 'immedi':180 'insert':531 'insid':112 'intent':605 'job':85,142,151,200,252,685,687,720,804 'join':341,472,477 'json':495 'key':478,548,620,640,657 'latest':416 'least':226 'line':67,732,742 'list':771 'llm':103,261 'load':88,138,335 'locat':68,82 'log':686,703,731 'logic':536 'loop':114,195,292,310,795 'm':752,755 'make':44 'mani':489,491 'many-to-mani':488 'max':150,349,364 'maximum':204 'merg':544,568,617 'metadata':632 'mid':372 'mid-run':371 'millisecond':696 'minim':70,283 'miss':208,280,737 'mode':218,221,424,713 'model':553 'multi':646 'multi-run':645 'must':234,273 'name':65 'number':581 'observ':662 'one':300,318,340,541,741 'one-lin':740 'oper':493,528 'option':231 'order':772 'overrid':366 'param':157 'part':614,636 'partit':402,448,501,583 'pass':678,759 'patch':285,770 'per':187,191,303,306,764 'pipelin':2,10,27 'plain':530 'plan':237 'plus':175 'potenti':487 'present':726 'prevent':463 'print':235 'process':690 'prod':268,270,272 'product':15 'propos':281,738 'provid':508 'prune':464 'python':25 'qualifi':579 'queri':97,170,173,184,302,326,342,439,446,515 'queryjobconfig.maximum':133 'rais':670 'rang':315,357,715 'raw':453 're':379,591 're-run':378,590 'read':391,413 'readi':16 'realist':123 'reason':763 'recommend':651 'refer':62 'referenc':775 'regexp':494 'repeat':168 'requir':264 'rerun':38 'respond':53 'retri':115,801 'review':23 'rewrit':73,428 'risk':774,786 'rough':789 'row':435,437,580,595 'row-by-row':434 'run':166,185,215,228,233,249,299,373,380,421,499,564,592,609,625,647,700,710,723,734,744,746 'runaway':32 'safe':287,376,521 'safeti':13,440 'scan':442,507 'scope':145,240,483,561 'script':28,298,369 'section':765 'select':466 'senior':20 'set':136,324 'set-bas':323 'silent':676 'simul':387 'singl':165,322 'size':443 'skill':6 'slot':695 'snapshot':396,409 'source-github' 'specif':510,762 'sql':155,511 'stage':333,559 'state':649 'storag':105 'store':629 'strftime':750 'structur':56 'suggest':69,172,361,425 'summari':701 'sure':45 'swap':566 'system':398 'tabl':89,92,94,177,334,406,410,419,506,560,716 'temp':176 'time':394,399 'time-consist':393 'top':783 'total':718,721 'trigger':86 'truncat':601 'ts':457 'udf':497 'uniqu':480,639 'unless':641 'unvers':418 'use':317,470,540,612 'verifi':219,388,476,668 'version':554 'via':96,251 'view':578 'visibl':48 'vs':602 'want':644 'warn':681 'warn-on':680 'without':365,381 'worst':125,197,791 'worst-cas':124,196,790 'write':106,384,522,527,538,556,598,600,603 'written':705,717 'x':799 'y':751 'zero':242,257","prices":[{"id":"0ef0c24c-768a-4438-85c8-82d525619433","listingId":"dd60bd4e-ed9a-444d-b7d8-b374bf22cd5d","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"github","category":"awesome-copilot","install_from":"skills.sh"},"createdAt":"2026-04-18T20:26:08.988Z"}],"sources":[{"listingId":"dd60bd4e-ed9a-444d-b7d8-b374bf22cd5d","source":"github","sourceId":"github/awesome-copilot/bigquery-pipeline-audit","sourceUrl":"https://github.com/github/awesome-copilot/tree/main/skills/bigquery-pipeline-audit","isPrimary":false,"firstSeenAt":"2026-04-18T21:48:28.161Z","lastSeenAt":"2026-04-22T06:52:15.720Z"},{"listingId":"dd60bd4e-ed9a-444d-b7d8-b374bf22cd5d","source":"skills_sh","sourceId":"github/awesome-copilot/bigquery-pipeline-audit","sourceUrl":"https://skills.sh/github/awesome-copilot/bigquery-pipeline-audit","isPrimary":true,"firstSeenAt":"2026-04-18T20:26:08.988Z","lastSeenAt":"2026-04-22T09:40:13.762Z"}],"details":{"listingId":"dd60bd4e-ed9a-444d-b7d8-b374bf22cd5d","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"github","slug":"bigquery-pipeline-audit","source":"skills_sh","category":"awesome-copilot","skills_sh_url":"https://skills.sh/github/awesome-copilot/bigquery-pipeline-audit"},"updatedAt":"2026-04-22T09:40:13.762Z"}}