
nutmeg-wrangle

Transform, filter, reshape, join, and manipulate football data. Use when the user needs to clean data, merge datasets, convert between formats, handle missing values, work with large datasets, or do any data manipulation task on football data.

Price: free
Protocol: skill
Verified: no

What it does

Wrangle

Help the user manipulate football data effectively. This skill is about the mechanics of working with data, adapted to the user's language and tools.

Accuracy

Read and follow docs/accuracy-guardrail.md before answering any question about provider-specific facts (IDs, endpoints, schemas, coordinates, rate limits). Always use search_docs — never guess from training data.

First: check profile

Read .nutmeg.user.md. If it doesn't exist, tell the user to run /nutmeg first. Use their profile for language preference and stack.

Core operations

Coordinate transforms

Football data coordinates vary by provider. Always verify and convert before combining data.

Use search_docs(query="coordinate system", provider="[provider]") to look up the specific system. Key conversions:

  • Opta (0-100) to StatsBomb (120x80): x * 1.2, y * 0.8
  • Wyscout to Opta: x stays, y = 100 - y (invert Y)
  • Any to kloppy normalised: use kloppy's .transform() in Python
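The two simple conversions above can be sketched as plain functions. This is a minimal sketch of the listed formulas only; verify each provider's actual coordinate system with search_docs before relying on it.

```python
# Sketches of the conversions listed above. Always confirm the target
# provider's coordinate system with search_docs first.

def opta_to_statsbomb(x: float, y: float) -> tuple[float, float]:
    """Opta (0-100, 0-100) -> StatsBomb (0-120, 0-80): scale both axes."""
    return x * 1.2, y * 0.8

def wyscout_to_opta(x: float, y: float) -> tuple[float, float]:
    """Wyscout -> Opta: x unchanged, Y axis inverted."""
    return x, 100 - y

print(opta_to_statsbomb(50, 50))  # pitch centre -> (60.0, 40.0)
print(wyscout_to_opta(30, 40))    # -> (30, 60)
```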

Filtering events

Common filtering patterns for football event data:

By event type:

  • Shots: filter for shot/miss/goal/saved event types
  • Passes in final third: filter passes where x > 66.7 (Opta coords)
  • Defensive actions: tackles + interceptions + ball recoveries

By match state:

  • Open play only: exclude set pieces (corners, free kicks, throw-ins, penalties)
  • First half vs second half: use periodId or timestamp
  • Score state: track running score to filter "when winning", "when losing"
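Score-state filtering is the least obvious of these, since the state must be derived by replaying goals in order. A minimal sketch, assuming a hypothetical event shape of dicts with `type` and `team` keys:

```python
# Label each event with the acting team's score state ("winning",
# "drawing", "losing") by tracking the running score. The event shape
# here is a hypothetical example, not any specific provider's schema.

def label_score_state(events, home, away):
    score = {home: 0, away: 0}
    labelled = []
    for ev in events:
        team = ev["team"]
        opp = away if team == home else home
        diff = score[team] - score[opp]
        state = "winning" if diff > 0 else "losing" if diff < 0 else "drawing"
        labelled.append({**ev, "score_state": state})
        if ev["type"] == "goal":
            score[team] += 1  # the goal event itself happens at the old state
    return labelled

events = [
    {"type": "pass", "team": "A"},
    {"type": "goal", "team": "A"},
    {"type": "pass", "team": "B"},
    {"type": "pass", "team": "A"},
]
states = [e["score_state"] for e in label_score_state(events, "A", "B")]
print(states)  # ['drawing', 'drawing', 'losing', 'winning']
```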

By zone:

  • Penalty area actions: x > 83, 21 < y < 79 (Opta coords)
  • High press: actions in opponent's defensive third (x > 66.7)
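The filters above reduce to simple predicates once the coordinate system is known. A sketch using the Opta-coordinate thresholds given above, on a hypothetical event shape with `type`, `x`, and `y` keys:

```python
# Event-type and zone filters from the lists above (Opta 0-100 coords).

SHOT_TYPES = {"shot", "miss", "goal", "saved"}

def shots(events):
    return [e for e in events if e["type"] in SHOT_TYPES]

def final_third_passes(events):
    return [e for e in events if e["type"] == "pass" and e["x"] > 66.7]

def penalty_area_actions(events):
    return [e for e in events if e["x"] > 83 and 21 < e["y"] < 79]

events = [
    {"type": "pass", "x": 70.0, "y": 50.0},   # final-third pass
    {"type": "goal", "x": 90.0, "y": 48.0},   # shot, in the penalty area
    {"type": "pass", "x": 30.0, "y": 10.0},   # neither
]
print(len(shots(events)), len(final_third_passes(events)),
      len(penalty_area_actions(events)))  # 1 1 1
```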

Joining datasets

Common joins in football data:

| Join | Key | Notes |
| --- | --- | --- |
| Events + lineups | player_id + match_id | Get player names/positions for each event |
| Events + xG | match_id + event sequence | Match xG to specific shots |
| Multiple providers | match date + team names | Fuzzy matching often needed |
| Season data + Elo | date | Join Elo rating at time of match |
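The events + lineups join is a straight composite-key lookup. A minimal stdlib sketch (the field names are illustrative); `pandas.merge(on=["player_id", "match_id"])` does the same thing at scale:

```python
# Join events to lineups on (player_id, match_id) via a lookup dict.

lineups = [
    {"player_id": 7, "match_id": 1, "name": "Son", "position": "FW"},
    {"player_id": 4, "match_id": 1, "name": "Dier", "position": "DF"},
]
events = [
    {"event_id": 100, "player_id": 7, "match_id": 1, "type": "shot"},
    {"event_id": 101, "player_id": 4, "match_id": 1, "type": "tackle"},
]

lookup = {(p["player_id"], p["match_id"]): p for p in lineups}

joined = [
    {**ev, **{k: lookup[(ev["player_id"], ev["match_id"])][k]
              for k in ("name", "position")}}
    for ev in events
]
print(joined[0]["name"], joined[1]["position"])  # Son DF
```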

Fuzzy team name matching is a constant pain. Build a mapping table:

TEAM_MAP = {
    'Man City': 'Manchester City',
    'Man United': 'Manchester United',
    'Spurs': 'Tottenham Hotspur',
    'Wolves': 'Wolverhampton Wanderers',
    # ...
}

Reshaping

Common reshaping operations:

  • Wide to long: Season stats tables (one column per stat) to tidy format (one row per stat per team)
  • Events to possession chains: Group consecutive events by the same team into possession sequences
  • Match-level to season aggregates: Group by team, sum/average per-match values
  • Player-match to player-season: Aggregate across matches, weight by minutes played
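The events-to-possession-chains reshape is the trickiest of these: chains are runs of consecutive events by the same team, which is exactly what `itertools.groupby` splits on. A minimal sketch with an illustrative event shape:

```python
from itertools import groupby

# Group consecutive same-team events into possession chains.
events = [
    {"team": "A", "type": "pass"},
    {"team": "A", "type": "pass"},
    {"team": "B", "type": "interception"},
    {"team": "A", "type": "tackle"},
    {"team": "A", "type": "pass"},
]

chains = []
for chain_id, (team, run) in enumerate(groupby(events, key=lambda e: e["team"])):
    run = list(run)
    chains.append({"chain_id": chain_id, "team": team, "n_events": len(run)})

print(chains)
# [{'chain_id': 0, 'team': 'A', 'n_events': 2},
#  {'chain_id': 1, 'team': 'B', 'n_events': 1},
#  {'chain_id': 2, 'team': 'A', 'n_events': 2}]
```

Note that team A gets two separate chains: possession changed hands in between, so the runs are not merged.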

Handling large datasets

Full event data for a PL season is ~500MB+ (380 matches x ~1700 events). Strategies:

Python:

  • Use polars instead of pandas for 5-10x speed improvement
  • Process match-by-match in a loop, don't load all into memory
  • Use DuckDB for SQL queries on Parquet files without loading into memory
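The match-by-match pattern looks like this in plain Python: read one file, reduce it to the aggregate you need, discard it, move on. The one-JSON-list-per-match layout here is a hypothetical example:

```python
import json
import tempfile
from pathlib import Path

def count_shots_per_match(match_dir: Path) -> dict[str, int]:
    """Aggregate one match file at a time; never hold the season in memory."""
    totals = {}
    for path in sorted(match_dir.glob("*.json")):
        events = json.loads(path.read_text())  # only one match loaded here
        totals[path.stem] = sum(1 for e in events if e["type"] == "shot")
    return totals

# Demo with two tiny synthetic match files.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "match_001.json").write_text(json.dumps(
        [{"type": "shot"}, {"type": "pass"}, {"type": "shot"}]))
    (root / "match_002.json").write_text(json.dumps(
        [{"type": "pass"}, {"type": "shot"}]))
    print(count_shots_per_match(root))  # {'match_001': 2, 'match_002': 1}
```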

JavaScript/TypeScript:

  • Stream JSON files with readline or JSONStream
  • Use SQLite (better-sqlite3) for local queries
  • Process files in parallel with worker threads

R:

  • Use data.table instead of tidyverse for large datasets
  • Arrow/Parquet for out-of-memory processing

Data quality checks

Always validate after wrangling:

| Check | What to look for |
| --- | --- |
| Event counts | ~1500-2000 events per PL match. Much less = data issue |
| Coordinate range | Should be within provider's expected range |
| Missing player IDs | Some events lack player attribution (ball out, etc.) |
| Duplicate events | Same event_id appearing twice |
| Time gaps | Large gaps in event timestamps within a match |
| Team attribution | Verify home/away assignment is consistent |
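The first few checks in the table are cheap to run as a single pass over one match's events. A sketch, assuming Opta-style 0-100 coordinates and an `event_id` field; thresholds should be tuned per provider:

```python
# Run basic quality checks over one match's events, returning warnings.

def quality_warnings(events, x_max=100, y_max=100):
    warnings = []
    if not 1500 <= len(events) <= 2000:  # typical PL match range
        warnings.append(f"unusual event count: {len(events)}")
    seen = set()
    for e in events:
        if not (0 <= e["x"] <= x_max and 0 <= e["y"] <= y_max):
            warnings.append(f"out-of-range coords on event {e['event_id']}")
        if e["event_id"] in seen:
            warnings.append(f"duplicate event_id {e['event_id']}")
        seen.add(e["event_id"])
    return warnings

events = [
    {"event_id": 1, "x": 50, "y": 50},
    {"event_id": 1, "x": 120, "y": 50},  # duplicate id AND bad x
]
print(quality_warnings(events))
```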

Format conversion

| From | To | Tool/method |
| --- | --- | --- |
| JSON events | DataFrame | pandas/polars read_json or manual parsing |
| CSV | Parquet | df.write_parquet() (polars) or df.to_parquet() (pandas) |
| Provider format | kloppy model | kloppy.load_{provider}() in Python |
| kloppy model | DataFrame | dataset.to_df() |
| Any | SQLite | Load into SQLite for ad-hoc queries |
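The "Any to SQLite" row needs nothing beyond the stdlib: `sqlite3` accepts dicts with named placeholders, so a list of event dicts loads straight into a queryable table. A minimal sketch with illustrative columns:

```python
import json
import sqlite3

events_json = json.dumps([
    {"event_id": 1, "type": "pass", "x": 45.0, "y": 50.0},
    {"event_id": 2, "type": "shot", "x": 88.0, "y": 52.0},
])

conn = sqlite3.connect(":memory:")  # use a file path for persistence
conn.execute("CREATE TABLE events (event_id INTEGER, type TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO events VALUES (:event_id, :type, :x, :y)",
    json.loads(events_json),  # dicts bind to the named placeholders
)
n_shots, = conn.execute(
    "SELECT COUNT(*) FROM events WHERE type = 'shot'").fetchone()
print(n_shots)  # 1
```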

Capabilities

skill · source-withqwerty · skill-wrangle · topic-agent-skills · topic-claude-code · topic-claude-code-plugin · topic-football-analytics · topic-football-data · topic-mcp · topic-opta · topic-sports-analytics · topic-statsbomb

Install

Install: npx skills add withqwerty/nutmeg
Transport: skills-sh
Protocol: skill

Quality

0.46 / 1.00

Deterministic score 0.46 from registry signals: indexed on github topic:agent-skills · 17 github stars · SKILL.md body (4,266 chars)

Provenance

Indexed from: github
Enriched: 2026-04-23 01:02:06Z · deterministic:skill-github:v1 · v1
First seen: 2026-04-18
Last seen: 2026-04-23

Agent access