{"id":"24984c2c-62d5-4a5a-88de-469d39908f1d","shortId":"LAYSVb","kind":"skill","title":"nutmeg-wrangle","tagline":"Transform, filter, reshape, join, and manipulate football data. Use when the user needs to clean data, merge datasets, convert between formats, handle missing values, work with large datasets, or do any data manipulation task on football data.","description":"# Wrangle\n\nHelp the user manipulate football data effectively. This skill is about the mechanics of working with data, adapted to the user's language and tools.\n\n## Accuracy\n\nRead and follow `docs/accuracy-guardrail.md` before answering any question about provider-specific facts (IDs, endpoints, schemas, coordinates, rate limits). Always use `search_docs` — never guess from training data.\n## First: check profile\n\nRead `.nutmeg.user.md`. If it doesn't exist, tell the user to run `/nutmeg` first. Use their profile for language preference and stack.\n\n## Core operations\n\n### Coordinate transforms\n\nFootball data coordinates vary by provider. Always verify and convert before combining data.\n\nUse `search_docs(query=\"coordinate system\", provider=\"[provider]\")` to look up the specific system. 
Key conversions:\n\n- Opta (0-100) to StatsBomb (120x80): `x = x * 1.2`, `y = (100 - y) * 0.8` (scale, then invert Y because StatsBomb's origin is top-left)\n- Wyscout to Opta: `x` stays, `y = 100 - y` (invert Y)\n- Any to kloppy normalised: use kloppy's `.transform()` in Python\n\n### Filtering events\n\nCommon filtering patterns for football event data:\n\n**By event type:**\n- Shots: filter for shot/miss/goal/saved event types\n- Passes in final third: filter passes where x > 66.7 (Opta coords)\n- Defensive actions: tackles + interceptions + ball recoveries\n\n**By match state:**\n- Open play only: exclude set pieces (corners, free kicks, throw-ins, penalties)\n- First half vs second half: use periodId or timestamp\n- Score state: track running score to filter \"when winning\", \"when losing\"\n\n**By zone:**\n- Penalty area actions: x > 83, 21 < y < 79 (Opta coords)\n- High press: actions in opponent's defensive third (x > 66.7)\n\n### Joining datasets\n\nCommon joins in football data:\n\n| Join | Key | Notes |\n|------|-----|-------|\n| Events + lineups | player_id + match_id | Get player names/positions for each event |\n| Events + xG | match_id + event sequence | Match xG to specific shots |\n| Multiple providers | match date + team names | Fuzzy matching often needed |\n| Season data + Elo | date | Join Elo rating at time of match |\n\n**Fuzzy team name matching** is a constant pain. 
Build a mapping table:\n```python\nTEAM_MAP = {\n    'Man City': 'Manchester City',\n    'Man United': 'Manchester United',\n    'Spurs': 'Tottenham Hotspur',\n    'Wolves': 'Wolverhampton Wanderers',\n    # ...\n}\n```\n\n### Reshaping\n\nCommon reshaping operations:\n\n- **Wide to long:** Season stats tables (one column per stat) to tidy format (one row per stat per team)\n- **Events to possession chains:** Group consecutive events by the same team into possession sequences\n- **Match-level to season aggregates:** Group by team, sum/average per-match values\n- **Player-match to player-season:** Aggregate across matches, weight by minutes played\n\n### Handling large datasets\n\nFull event data for a PL season is ~500MB+ (380 matches x ~1700 events). Strategies:\n\n**Python:**\n- Use polars instead of pandas for 5-10x speed improvement\n- Process match-by-match in a loop, don't load all into memory\n- Use DuckDB for SQL queries on Parquet files without loading into memory\n\n**JavaScript/TypeScript:**\n- Stream JSON files with `readline` or `JSONStream`\n- Use SQLite (better-sqlite3) for local queries\n- Process files in parallel with worker threads\n\n**R:**\n- Use data.table instead of tidyverse for large datasets\n- Arrow/Parquet for out-of-memory processing\n\n### Data quality checks\n\nAlways validate after wrangling:\n\n| Check | What to look for |\n|-------|-----------------|\n| Event counts | ~1500-2000 events per PL match. Much less = data issue |\n| Coordinate range | Should be within provider's expected range |\n| Missing player IDs | Some events lack player attribution (ball out, etc.) 
|\n| Duplicate events | Same event_id appearing twice |\n| Time gaps | Large gaps in event timestamps within a match |\n| Team attribution | Verify home/away assignment is consistent |\n\n### Format conversion\n\n| From | To | Tool/method |\n|------|-----|------------|\n| JSON events | DataFrame | pandas/polars `read_json` or manual parsing |\n| CSV | Parquet | `df.write_parquet()` (polars) or `df.to_parquet()` (pandas) |\n| Provider format | kloppy model | `kloppy.{provider}.load()` in Python |\n| kloppy model | DataFrame | `dataset.to_df()` |\n| Any | SQLite | Load into SQLite for ad-hoc queries |","tags":["wrangle","nutmeg","withqwerty","agent-skills","claude-code","claude-code-plugin","football-analytics","football-data","mcp","opta","sports-analytics","statsbomb"],"capabilities":["skill","source-withqwerty","skill-wrangle","topic-agent-skills","topic-claude-code","topic-claude-code-plugin","topic-football-analytics","topic-football-data","topic-mcp","topic-opta","topic-sports-analytics","topic-statsbomb"],"categories":["nutmeg"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/withqwerty/nutmeg/wrangle","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add withqwerty/nutmeg","source_repo":"https://github.com/withqwerty/nutmeg","install_from":"skills.sh"}},"qualityScore":"0.458","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 17 github stars · SKILL.md body (4,266 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-23T01:02:06.791Z","embedding":null,"createdAt":"2026-04-18T23:06:54.672Z","updatedAt":"2026-04-23T01:02:06.791Z","lastSeenAt":"2026-04-23T01:02:06.791Z","tsv":"'-10':451 
'-100':156 '-2000':535 '/nutmeg':111 '0':155 '0.8':163 '1.2':161 '100':170 '120x80':159 '1500':534 '1700':440 '21':262 '380':437 '5':450 '500mb':436 '66.7':210,276 '79':264 '83':261 'accuraci':67 'across':419 'action':214,259,269 'ad':631 'ad-hoc':630 'adapt':59 'aggreg':402,418 'alway':87,131,523 'answer':73 'appear':569 'area':258 'arrow/parquet':513 'assign':585 'attribut':560,582 'ball':217,561 'better':492 'better-sqlite3':491 'build':339 'chain':386 'check':97,522,527 'citi':347,349 'clean':18 'column':371 'combin':136 'common':186,279,361 'consecut':388 'consist':587 'constant':337 'convers':153,589 'convert':22,134 'coord':212,266 'coordin':84,123,127,142,544 'core':121 'corner':228 'count':533 'csv':602 'data':11,19,35,40,47,58,95,126,137,192,283,321,430,520,542 'data.table':506 'datafram':595,621 'dataset':21,31,278,427,512 'dataset.to':622 'date':313,323 'defens':213,273 'df':623 'df.to':608 'df.write':604 'doc':90,140 'docs/accuracy-guardrail.md':71 'doesn':103 'duckdb':470 'duplic':564 'effect':48 'elo':322,325 'endpoint':82 'etc':563 'event':185,191,194,200,287,298,299,303,383,389,429,441,532,536,557,565,567,576,594 'exclud':225 'exist':105 'expect':551 'fact':80 'file':476,484,498 'filter':5,184,187,197,206,250 'final':204 'first':96,112,235 'follow':70 'footbal':10,39,46,125,190,282 'format':24,376,588,612 'free':229 'full':428 'fuzzi':316,331 'gap':572,574 'get':293 'group':387,403 'guess':92 'half':236,239 'handl':25,425 'help':42 'high':267 'hoc':632 'home/away':584 'hotspur':356 'id':81,290,292,302,555,568 'improv':454 'in':233 'instead':446,507 'intercept':216 'invert':172 'issu':543 'javascript/typescript':481 'join':7,277,280,284,324 'json':483,593,598 'jsonstream':488 'key':152,285 'kick':230 'kloppi':176,179,613,619 'kloppy.load':615 'lack':558 'languag':64,117 'larg':30,426,511,573 'less':541 'level':399 'limit':86 'lineup':288 'load':465,478,626 'local':495 'long':366 'look':147,530 'loop':462 'lose':254 'man':346,350 'manchest':348,352 
'manipul':9,36,45 'manual':600 'map':341,345 'match':220,291,301,305,312,317,330,334,398,409,413,420,438,457,459,539,580 'match-by-match':456 'match-level':397 'mechan':54 'memori':468,480,518 'merg':20 'minut':423 'miss':26,553 'model':614,620 'much':540 'multipl':310 'name':315,333 'names/positions':295 'need':16,319 'never':91 'normalis':177 'note':286 'nutmeg':2 'nutmeg-wrangl':1 'nutmeg.user.md':100 'often':318 'one':370,377 'open':222 'oper':122,363 'oppon':271 'opta':154,166,211,265 'out-of-memori':515 'pain':338 'panda':448,610 'pandas/polars':596 'parallel':500 'parquet':475,603,605,609 'pars':601 'pass':202,207 'pattern':188 'penalti':234,257 'per':372,379,381,408,537 'per-match':407 'periodid':241 'piec':227 'pl':433,538 'play':223,424 'player':289,294,412,416,554,559 'player-match':411 'player-season':415 'polar':445,606 'possess':385,395 'prefer':118 'press':268 'process':455,497,519 'profil':98,115 'provid':78,130,144,145,311,549,611,616 'provider-specif':77 'python':183,343,443,618 'qualiti':521 'queri':141,473,496,633 'question':75 'r':504 'rang':545,552 'rate':85,326 'read':68,99,597 'readlin':486 'recoveri':218 'reshap':6,360,362 'row':378 'run':110,247 'schema':83 'score':244,248 'search':89,139 'season':320,367,401,417,434 'second':238 'sequenc':304,396 'set':226 'shot':196,309 'shot/miss/goal/saved':199 'skill':50 'skill-wrangle' 'source-withqwerty' 'specif':79,150,308 'speed':453 'spur':354 'sql':472 'sqlite':490,625,628 'sqlite3':493 'stack':120 'stat':368,373,380 'state':221,245 'statsbomb':158 'stay':168 'strategi':442 'stream':482 'sum/average':406 'system':143,151 'tabl':342,369 'tackl':215 'task':37 'team':314,332,344,382,393,405,581 'tell':106 'third':205,274 'thread':503 'throw':232 'throw-in':231 'tidi':375 'tidyvers':509 'time':328,571 'timestamp':243,577 'tool':66 'tool/method':592 'topic-agent-skills' 'topic-claude-code' 'topic-claude-code-plugin' 'topic-football-analytics' 'topic-football-data' 'topic-mcp' 'topic-opta' 
'topic-sports-analytics' 'topic-statsbomb' 'tottenham':355 'track':246 'train':94 'transform':4,124,181 'twice':570 'type':195,201 'unit':351,353 'use':12,88,113,138,178,240,444,469,489,505 'user':15,44,62,108 'valid':524 'valu':27,410 'vari':128 'verifi':132,583 'vs':237 'wander':359 'weight':421 'wide':364 'win':252 'within':548,578 'without':477 'wolv':357 'wolverhampton':358 'work':28,56 'worker':502 'wrangl':3,41,526 'wyscout':164 'x':160,167,209,260,275,439,452 'xg':300,306 'y':162,169,171,173,263 'zone':256","prices":[{"id":"f4643284-c7ab-4ab7-9311-59102db1b2cd","listingId":"24984c2c-62d5-4a5a-88de-469d39908f1d","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"withqwerty","category":"nutmeg","install_from":"skills.sh"},"createdAt":"2026-04-18T23:06:54.672Z"}],"sources":[{"listingId":"24984c2c-62d5-4a5a-88de-469d39908f1d","source":"github","sourceId":"withqwerty/nutmeg/wrangle","sourceUrl":"https://github.com/withqwerty/nutmeg/tree/main/skills/wrangle","isPrimary":false,"firstSeenAt":"2026-04-18T23:06:54.672Z","lastSeenAt":"2026-04-23T01:02:06.791Z"}],"details":{"listingId":"24984c2c-62d5-4a5a-88de-469d39908f1d","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"withqwerty","slug":"wrangle","github":{"repo":"withqwerty/nutmeg","stars":17,"topics":["agent-skills","claude-code","claude-code-plugin","football-analytics","football-data","mcp","opta","sports-analytics","statsbomb","xg"],"license":null,"html_url":"https://github.com/withqwerty/nutmeg","pushed_at":"2026-04-16T02:33:15Z","description":"Football data analytics toolkit for Claude Code. 
Covers Opta, StatsBomb, Wyscout, SportMonks, and free sources.","skill_md_sha":"d96334dafa94ac902cebdfcd2d1f612d9fd5e0ac","skill_md_path":"skills/wrangle/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/withqwerty/nutmeg/tree/main/skills/wrangle"},"layout":"multi","source":"github","category":"nutmeg","frontmatter":{"name":"nutmeg-wrangle","description":"Transform, filter, reshape, join, and manipulate football data. Use when the user needs to clean data, merge datasets, convert between formats, handle missing values, work with large datasets, or do any data manipulation task on football data."},"skills_sh_url":"https://skills.sh/withqwerty/nutmeg/wrangle"},"updatedAt":"2026-04-23T01:02:06.791Z"}}