Skill · quality 0.49

gpt-image

General-purpose image generation and reference-image editing via OpenAI GPT Image 2 (`gpt-image-2`). Wraps the two official endpoints from the OpenAI cookbook — `/v1/images/generations` for text-to-image and `/v1/images/edits` for reference-image edits (including alpha-channel masks).

Price: free
Protocol: skill
Verified: no

What it does

gpt-image

General image generation/editing CLI for OpenAI's gpt-image-2. Designed for agents: all API parameters are first-class flags, defaults are sane, output is a file on disk. The skill auto-loads when Claude detects an image-generation intent — no slash command needed.

One-line usage

# As a Claude Code plugin (installed via /plugin install):
uv run "$CLAUDE_PLUGIN_ROOT/skills/gpt-image/scripts/generate.py" -p "PROMPT" [-f OUT] [-i REF...] [-m MASK] [options]

# As a direct CLI (installed via uvx or uv tool install):
uvx --from git+https://github.com/wuyoscar/gpt_image_2_skill gpt-image -p "PROMPT" [-f OUT] [-i REF...] [-m MASK] [options]

# Or once installed globally:
gpt-image -p "PROMPT" [-f OUT] [-i REF...] [-m MASK] [options]

Reads OPENAI_API_KEY from env. Writes to OUT (or auto-named YYYY-MM-DD-HH-MM-SS-<slug>.png in ./fig/ or cwd). Prints output path(s) on stdout. Exit 0 on success, 1 on API error, 2 on bad args / missing key.
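The auto-naming rule above can be sketched as follows. This is an illustrative reconstruction, not the script's actual code: the slug rules (lowercasing, hyphen joining, length cap) and the `auto_filename` name are assumptions.

```python
import os
import re
from datetime import datetime

def auto_filename(prompt: str, ext: str = "png") -> str:
    # Slug: lowercase the prompt, collapse non-alphanumerics to hyphens,
    # cap the length so filenames stay manageable (cap is an assumption).
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")[:40]
    # Timestamp prefix matches the documented YYYY-MM-DD-HH-MM-SS pattern.
    stamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    # Prefer ./fig/ when it exists, otherwise the current directory.
    out_dir = "fig" if os.path.isdir("fig") else "."
    return os.path.join(out_dir, f"{stamp}-{slug}.{ext}")
```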

CLI flags (complete reference)

Each flag below is listed as: flag (type / values, default; applies to): description.

  • -p, --prompt (str, required; both): Text prompt for generation, or an edit instruction.
  • -f, --file (path, default auto; both): Output path. Auto-generated if omitted; extension follows --format.
  • -i, --image (path, repeatable; edits): Reference image(s). Presence routes the request through /v1/images/edits (the official endpoint per the OpenAI cookbook).
  • -m, --mask (path; edits): Alpha-channel PNG mask. Opaque pixels are preserved, transparent pixels are regenerated. Edits endpoint only; requires -i.
  • --input-fidelity (low | high; edits): Controls how closely the output tracks the reference. Supported on gpt-image-1 and gpt-image-1.5; silently ignored by gpt-image-2 (already high fidelity by default).
  • --model (str, default gpt-image-2; both): Model ID. Fallbacks: gpt-image-1.5, gpt-image-1, gpt-image-1-mini.
  • --size (literal or shortcut, default 1024x1024; both): Literals: 1024x1024, 1536x1024, 1024x1536, 2048x2048, 2048x1152, 3840x2160, 2160x3840, or any 16-px multiple up to a 3840 max edge (3:1 ratio cap, 655k–8.3M total pixels). Shortcuts: 1k, 2k, 4k, portrait, landscape, square, wide, tall.
  • --quality (auto | low | medium | high, default high; both): Cost rises roughly 4–8× per step: low ≈ $0.005/img, medium ≈ $0.04, high ≈ $0.17. The CLI default stays high, but agents should choose deliberately: low for cheap drafts / large sweeps, medium for normal exploration, high for final assets, typography, Chinese text, diagrams, or anything shipping-facing.
  • -n, --n (int, default 1; both): Number of images to return. With n > 1, filenames are suffixed _0, _1, …
  • --background (auto | opaque, API default; generations only): opaque disables the transparent background.
  • --moderation (auto | low, API default; generations only): low relaxes the content filter where policy allows.
  • --format (png | jpeg | webp, default png; both): Response encoding.
  • --compression (int 0–100; both): JPEG/WebP compression level. Ignored for PNG.
  • --user (str; both): Optional end-user identifier for OpenAI abuse tracking.
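The size constraints in the --size flag can be expressed as a small checker. This is a minimal sketch of the stated rules (16-px multiples, 3840 max edge, 3:1 ratio cap, 655k–8.3M total pixels), not the script's actual validation code:

```python
def valid_size(size: str) -> bool:
    # Parse a "WIDTHxHEIGHT" literal, e.g. "1536x1024".
    w, h = (int(x) for x in size.split("x"))
    if w % 16 or h % 16:            # must be 16-px multiples
        return False
    if max(w, h) > 3840:            # max edge 3840
        return False
    if max(w, h) / min(w, h) > 3:   # aspect ratio capped at 3:1
        return False
    # Total pixel count must fall in the documented range.
    return 655_000 <= w * h <= 8_300_000
```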

Budget / quality policy for agents

Use --quality as the budget dial. There is no separate --budget flag in this CLI.

  • low — cheap draft mode. Use for broad prompt exploration, collecting many variants, gallery mining, rough composition checks, or when the user explicitly wants low cost / fast iteration.
  • medium — balanced mode. Use for normal one-off exploration, style probing, or cases where readability matters but the output is not yet final.
  • high — shipping / report mode. Use for Chinese text, posters, infographics, paper figures, dense labels, multi-panel layouts, banners, or any asset likely to be kept.

Rule of thumb for autonomous agents:

  • If the user asks for many variants, cheap, draft, explore, or collect, start with low.
  • If the user asks for polished but still exploratory, use medium.
  • If the user asks for final, fancy, hero, paper figure, poster, diagram, or exact text, use high.
  • If unsure, keep the CLI default high for text-heavy / delivery-facing outputs; otherwise prefer medium during exploration.
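The rule of thumb above can be sketched as a keyword heuristic. The keyword lists and function name here are illustrative assumptions, not part of the CLI:

```python
LOW_HINTS = {"variants", "cheap", "draft", "explore", "collect"}
HIGH_HINTS = {"final", "fancy", "hero", "poster", "diagram", "figure", "exact"}

def pick_quality(request: str, exploratory: bool = True) -> str:
    # Match request words against the low/high trigger lists above;
    # fall back to medium during exploration, high otherwise.
    words = set(request.lower().split())
    if words & LOW_HINTS:
        return "low"
    if words & HIGH_HINTS:
        return "high"
    return "medium" if exploratory else "high"
```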

Endpoint selection (official OpenAI cookbook pattern)

  • Generate from prompt (no -i) → POST /v1/images/generations (JSON body)
  • Edit / reference-based (-i one or more times) → POST /v1/images/edits (multipart form)
  • Inpaint with mask (-i + -m) → POST /v1/images/edits with a mask file

Both endpoints accept gpt-image-2 as of April 2026 — verified against OpenAI's official cookbook prompting guide. The skill uses the official openai Python SDK under the hood (from openai import OpenAI; client.images.generate(...) / client.images.edit(...)) — the CLI is a thin wrapper that exposes every SDK parameter as a flag.

Content policy: gpt-image-2 enforces its own content rules on the edits endpoint. Real-person-likeness edits usually refuse (400 error with a moderation message). The skill surfaces the response body verbatim on stderr and exits 1.

Canonical examples

# 1. Vanilla generate, 1K square, auto quality
uv run generate.py -p "a photorealistic convenience store at 10pm"

# 2. 2K portrait poster with exact Chinese text, high quality
uv run generate.py \
  -p 'Design a 3:4 tea poster. Exact copy: "山川茶事" / "冷泡系列" / "中杯 16 元"' \
  --size portrait --quality high -f poster.png

# 3. 4-image grid, transparent background disabled, webp
uv run generate.py -p "isometric furniture, minimalist" \
  -n 4 --background opaque --format webp --compression 85

# 4. Edit / colorize existing image
uv run generate.py -p "colorize this manga page and translate to Chinese" \
  -i page.jpg -f colored.png

# 5. Multi-reference brand collab
uv run generate.py -p "77 (the cat) × KFC employee poster" \
  -i cat.png -i kfc_logo.png -f collab.png --size portrait

# 6. Masked inpaint — replace sky only
uv run generate.py -p "replace sky with aurora, keep foreground intact" \
  -i photo.jpg -m sky_mask.png -f aurora.png --quality high

# 7. 4K widescreen render
uv run generate.py -p "cinematic Shanghai skyline at dusk" \
  --size 4k --quality high -f skyline.png

Response handling

  • API returns data: [{ b64_json: "…" }] by default; the script decodes base64 and writes bytes.
  • If the API returns url instead, the script GETs the URL and writes the downloaded bytes.
  • With -n > 1, files are suffixed: out.png → out_0.png, out_1.png, …
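The suffixing and decode steps above can be sketched as follows; these helper names are illustrative, not the script's actual functions:

```python
import base64

def output_paths(base: str, n: int) -> list[str]:
    # n == 1 keeps the filename as-is; n > 1 inserts _0, _1, ...
    # before the extension, per the list above.
    if n == 1:
        return [base]
    stem, dot, ext = base.rpartition(".")
    return [f"{stem}_{i}{dot}{ext}" for i in range(n)]

def write_image(path: str, b64_json: str) -> None:
    # The API's default response carries base64 bytes in data[].b64_json;
    # decode and write them verbatim.
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64_json))
```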

Error surface

  • OPENAI_API_KEY unset → exit 2, stderr: error: OPENAI_API_KEY not set. ...
  • --mask without -i → exit 2, stderr: error: --mask requires --image (edits endpoint only)
  • -i or -m path missing → exit 2, stderr: error: --image not found: PATH
  • OpenAI returns non-2xx → exit 1, stderr: error: <status> from OpenAI: <body> (first 2000 chars of response)
  • Response has no image data → exit 1, stderr: error: no image data in response: <json>

When an agent hits exit 1, it should surface the response body verbatim — it usually names the problem (rate limit, moderation block, invalid size).
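An agent-side dispatch on this exit-code contract might look like the sketch below. The substring checks on stderr are heuristic assumptions (the body format is whatever OpenAI returned), not a documented API:

```python
def classify_exit(code: int, stderr: str) -> str:
    # Exit 0: success, stdout holds the output path(s).
    if code == 0:
        return "ok"
    # Exit 2: bad args or missing key; fix inputs, do not retry.
    if code == 2:
        return "usage-error"
    # Exit 1: API error; the surfaced body usually names the cause.
    s = stderr.lower()
    if "rate limit" in s:
        return "retry-later"
    if "moderation" in s:
        return "content-blocked"
    return "api-error"
```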

Size picking guide

  • Default / social square: 1024x1024 (1k)
  • Mobile screenshot, portrait poster, beauty/skincare: 1024x1536 (portrait)
  • Landscape photo, gameplay screenshot: 1536x1024 (landscape)
  • Hi-res print, paper figure: 2048x2048 (2k)
  • Widescreen cinematic, dashboard hero: 3840x2160 (4k)
  • Tall story banner, vertical video thumbnail: 2160x3840 (tall)
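The shortcut-to-literal mapping implied by this guide can be sketched as a lookup table. The exact mapping for shortcuts not shown above (square, wide) is an assumption inferred from their names:

```python
SIZE_SHORTCUTS = {
    "1k": "1024x1024", "square": "1024x1024",
    "2k": "2048x2048",
    "4k": "3840x2160", "wide": "3840x2160",
    "portrait": "1024x1536", "landscape": "1536x1024",
    "tall": "2160x3840",
}

def resolve_size(value: str) -> str:
    # Shortcuts resolve to literals; anything else passes through
    # unchanged (e.g. an explicit "1536x1024").
    return SIZE_SHORTCUTS.get(value.lower(), value)
```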

Prompt-craft references (optional, load only when needed)

These are not required to use the script — they exist for prompt-quality uplift when the user's intent needs more structure than a one-liner.

  • references/craft.md — 12 cross-cutting principles: exact-text-in-quotes, aspect-ratio-first, camera/shot language, scene density, style anchoring, negation, reference-based unlocks, dense Chinese text, three-glances test.
  • references/gallery.md — 56 community-curated templates across 8 categories: photography, games, UI/UX, typography, infographics, character consistency, editing, collage. Each entry keeps its original Source: @handle attribution.
  • references/openai-cookbook.md — verbatim Markdown capture of OpenAI's official GPT Image prompting guide. Load this when the user asks about OpenAI's own parameter semantics, wants a use-case beyond what our gallery covers (UI mockups, pitch-deck slides, scientific diagrams, virtual try-on, billboard mockups, translation edits), or needs the authoritative parameter-coverage table.

Load a reference file only when the user's request signals that category (e.g. asks for a poster → load typography section of gallery; asks about rendering Chinese → load craft.md sections 1, 7, 10; asks "how does the edits endpoint actually work?" → load openai-cookbook.md).

Attribution

Prompt patterns curated from ZeroLu/awesome-gpt-image under CC BY 4.0. Individual @handle attributions preserved per-entry in references/gallery.md.

Capabilities

skill · source-wuyoscar · skill-gpt-image · topic-agent-skills · topic-gpt-image-2 · topic-image-generation

Install

Install: npx skills add wuyoscar/gpt_image_2_skill
Transport: skills-sh
Protocol: skill

Quality

0.49 / 1.00

deterministic score 0.49 from registry signals: indexed on github topic:agent-skills · 80 github stars · SKILL.md body (9,646 chars)

Provenance

Indexed from: github
Enriched: 2026-04-23 18:56:48Z · deterministic:skill-github:v1 · v1
First seen: 2026-04-23
Last seen: 2026-04-23

Agent access